Hello,
I saw the thread below this one asking about detecting the page language of a document being OCR'd. I saw the response by the admin saying they have not looked into this feature, and therefore I assume it does not exist in the current version of the Tesseract OCR engine plugin.
I guess that I will have to come up with some way to automate that part of the OCR process. Does anyone have any neat tricks that they use to detect, automatically, what language a document is in? We will be OCR'ing hundreds of documents at a time, and usually we have documents from all over the world. I'd like to detect the document language and then OCR using that dictionary, if possible.
Thanks,
Ryan
Detecting page language?
Re: Detecting page language?
Hi Ryan,
Unfortunately we don't have this feature, and I don't see a stable enough solution for this need.
Thank you for your understanding.
With best regards,
Loïc
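For anyone who wants to experiment with the workflow Ryan describes, here is a rough sketch of one possible heuristic, not a feature of the plugin and not a stable solution. It assumes you first OCR the page with some default language to get approximate text, then count common stopwords to guess the language, and finally re-run the OCR with the matching language data. The function name `guess_language`, the tiny stopword lists, and the `min_hits` threshold are all made up for illustration; only the three-letter codes (`eng`, `fra`, `deu`, `spa`) follow Tesseract's language naming.

```python
import re
from collections import Counter

# Hypothetical, deliberately tiny stopword lists keyed by Tesseract
# language code. A real setup would need much larger lists.
STOPWORDS = {
    "eng": {"the", "and", "of", "to", "in", "is", "that", "for", "with", "this"},
    "fra": {"le", "la", "les", "et", "de", "des", "un", "une", "est", "pour"},
    "deu": {"der", "die", "das", "und", "ist", "nicht", "ein", "eine", "mit", "für"},
    "spa": {"el", "los", "las", "y", "de", "que", "es", "una", "por", "para"},
}


def guess_language(text, default="eng", min_hits=3):
    """Guess a language code from rough first-pass OCR text.

    Counts how many words of the text appear in each stopword list and
    returns the best-scoring code, or `default` when fewer than
    `min_hits` stopwords were found (assumed threshold).
    """
    # Extract runs of letters only (Unicode-aware), lowercased.
    words = re.findall(r"[^\W\d_]+", text.lower())
    counts = Counter(words)
    scores = {
        lang: sum(counts[w] for w in stopwords)
        for lang, stopwords in STOPWORDS.items()
    }
    best = max(scores, key=scores.get)
    return best if scores[best] >= min_hits else default


if __name__ == "__main__":
    sample = "Le chat est sur la table et les enfants jouent pour le plaisir"
    print(guess_language(sample))
```

The returned code could then be passed as the language for the second OCR pass (for the Tesseract command line, that is the `-l` option). Expect wrong guesses on short, noisy, or mixed-language pages, which is why the sketch falls back to a default.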