Detecting page language?

Discussions about machine vision support in GdPicture.
ryancole11
Posts: 21
Joined: Fri May 21, 2010 7:19 pm

Detecting page language?

Post by ryancole11 » Fri May 21, 2010 7:23 pm

Hello,

I saw the thread below this one asking about detecting the page language of a document being OCR'd. The admin's response said they have not looked into this feature, so I assume it does not exist in the current version of the Tesseract OCR engine plugin.

I guess I will have to come up with some way to automate that part of the OCR process. Does anyone have any neat tricks for automatically detecting what language a document is in? We will be OCR'ing hundreds of documents at a time, and they usually come from all over the world. I'd like to detect the document language and then OCR using that language's dictionary, if possible.

Thanks,
Ryan

Loïc
Site Admin
Posts: 5881
Joined: Tue Oct 17, 2006 10:48 pm
Location: France

Re: Detecting page language?

Post by Loïc » Tue May 25, 2010 3:37 pm

Hi Ryan,

Unfortunately, we don't have this feature, and I can't see a stable enough solution for such a need.

Thank you for your understanding.

With best regards,

Loïc
