Why OCR is not OK?

Post by **fierikit** » Tue Jan 12, 2010 10:24 am

Sorry to bother you.
I use Tesseract OCR to read documents that are printed with our software and then scanned.
I'm sending you an example document. Something strange must be going on, because your OCR usually works very well, but with this type of document many words are misrecognized every time.
We only need to read a few words, but we must read 'Numero pratica <Number>', and these words are never recognized (they are on the first line, in the upper right corner).

I use gdpicturepro_5_11_13.exe.
Can you help me?

Post by **Loïc** » Tue Jan 12, 2010 12:46 pm

Hi,

This is a somewhat complicated problem, which I will try to explain:

If you run OCR on a whole page, the OCR engine tries to identify font families, font styles, font sizes... Sometimes it fails, especially on pages that mix font families and styles.
Consequently, your issue comes from a "bad decision" by the engine. It is rare, but it can happen.

However, if you run OCR on a specific area (for example, a rectangle bounding your digits), you should get very satisfying results.
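As a rough illustration of the area-based approach, here is a minimal Python sketch. It is not GdPicture code: `ocr_region` is a hypothetical callback standing in for whatever call your OCR toolkit offers to recognize text inside a rectangle, and the corner fractions and the "Numero pratica" pattern are assumptions to adjust for your documents.

```python
import re
from typing import Callable, Optional, Tuple

# (left, top, right, bottom) in pixels, origin at the top-left of the page.
Box = Tuple[int, int, int, int]


def upper_right_region(page_width: int, page_height: int,
                       width_frac: float = 0.5,
                       height_frac: float = 0.1) -> Box:
    """Return a rectangle covering the upper right corner of the page.

    width_frac and height_frac are assumed values: the share of the page
    width and height that the rectangle should cover.
    """
    left = int(page_width * (1.0 - width_frac))
    bottom = int(page_height * height_frac)
    return (left, 0, page_width, bottom)


# Assumed pattern: the words "Numero pratica", an optional ':' or '.',
# then the digits we want to capture.
PRATICA_RE = re.compile(r"numero\s+pratica\s*[:.]?\s*(\d+)", re.IGNORECASE)


def find_pratica_number(text: str) -> Optional[str]:
    """Return the digits following 'Numero pratica', or None if absent."""
    match = PRATICA_RE.search(text)
    return match.group(1) if match else None


def read_pratica_number(page_size: Tuple[int, int],
                        ocr_region: Callable[[Box], str]) -> Optional[str]:
    """Run OCR only on the upper right corner and extract the number.

    ocr_region is a hypothetical callback: it receives the rectangle and
    returns the recognized text for that area only.
    """
    box = upper_right_region(*page_size)
    return find_pratica_number(ocr_region(box))
```

Restricting recognition to the rectangle means the engine only has to decide on the font used inside that area, rather than across the whole mixed-font page.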

With best regards,

Loïc

Post by **fierikit** » Tue Jan 12, 2010 12:57 pm

Thanks a lot for your very fast reply.
... I will try some other approaches, because I cannot know where the user will put the information.
Maybe I can try using only one font, one style, and one size.

Thanks
