Detecting page language?

Post by **ryancole11** » Fri May 21, 2010 7:23 pm

Hello,

I saw the thread below this one asking about detecting the language of a document being OCR'd. The admin's response said they have not looked into this feature, so I assume it does not exist in the current version of the Tesseract OCR engine plugin.

I guess that I will have to come up with some way to automate that part of the OCR process. Does anyone have any neat tricks that they use to detect, automatically, what language a document is in? We will be OCR'ing hundreds of documents at a time, and usually we have documents from all over the world. I'd like to detect the document language and then OCR using that dictionary, if possible.

Thanks,
Ryan

Post by **Loïc** » Tue May 25, 2010 3:37 pm

Hi Ryan,

Unfortunately, we don't have this feature, and I can't see a stable enough solution for such a need.

Thank you for your understanding.

With best regards,

Loïc
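
For readers looking for a workaround to the question above, one common heuristic is to run a rough first OCR pass with a default dictionary, guess the language from the resulting text, and then re-run OCR with the matching dictionary. The sketch below is not a feature of the plugin and is not a robust detector: the function name `guess_language`, the `min_hits` threshold, and the tiny stopword lists are all hypothetical, and the keys simply assume Tesseract-style three-letter language codes. It only illustrates the idea of counting very common words per language.

```python
import re

# Tiny illustrative stopword lists (assumption: a real setup would need
# much larger lists and more languages to be reliable).
STOPWORDS = {
    "eng": {"the", "and", "of", "to", "in", "is", "that", "for", "with", "this"},
    "fra": {"le", "la", "les", "et", "de", "des", "un", "une", "est", "pour"},
    "deu": {"der", "die", "das", "und", "ist", "nicht", "ein", "eine", "mit", "für"},
    "spa": {"el", "los", "las", "y", "de", "que", "en", "una", "es", "para"},
}


def guess_language(text, min_hits=3):
    """Return the code whose stopwords occur most often, or None if unsure."""
    # Extract runs of letters (Unicode-aware), ignoring digits and punctuation.
    words = re.findall(r"[^\W\d_]+", text.lower())
    # Count how many words of the text belong to each language's stopword list.
    scores = {lang: sum(w in sw for w in words) for lang, sw in STOPWORDS.items()}
    best = max(scores, key=scores.get)
    # Too few hits means the first-pass text is too short or too noisy to trust.
    return best if scores[best] >= min_hits else None
```

A batch job could call `guess_language` on the first-pass text of each document and fall back to a default dictionary, or to manual review, whenever it returns `None`. Noisy first-pass OCR and closely related languages will reduce accuracy, which is consistent with the caution in the reply above.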
