Sorry to bother you
I use Tesseract OCR to read documents that are printed with our software and then scanned.
I'm sending you an example document. Something must be odd about it, because your OCR usually works very well, but with this type of document a lot of words are misrecognized every time.
We only need to read a few words, but we must read 'Numero pratica <Number>', and these words are never recognized (they are on the first line, in the upper right corner).
I use gdpicturepro_5_11_13.exe
Can you help me?
Why OCR is not OK?
Re: Why OCR is not OK?
Hi,
This is a somewhat complicated problem, which I will try to explain.
When you run OCR on a whole page, the OCR engine tries to identify fonts, font styles, font sizes... Sometimes it fails, especially on pages that mix font families and styles.
Consequently, your issue comes from a "bad decision" by the engine. It is rare, but it can happen.
However, if you run OCR on a specific area (for example, a rectangle bounding your digits), you will get very satisfying results.
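To illustrate the idea of OCR on a specific area, here is a rough Python sketch. It is not GdPicture code: it assumes the Pillow and pytesseract packages, and the helper names, the region fractions (right half, top 15% of the page) and the assumption that the number after 'Numero pratica' is made of digits only are all hypothetical choices for this example.

```python
import re

def upper_right_box(page_width, page_height, width_fraction=0.5, height_fraction=0.15):
    """Return (left, top, right, bottom) for the upper right part of a page.

    The default fractions are guesses and need tuning on real scans.
    """
    left = int(page_width * (1 - width_fraction))
    bottom = int(page_height * height_fraction)
    return (left, 0, page_width, bottom)

# Assumption: the value after 'Numero pratica' is a run of digits.
PRATICA_PATTERN = re.compile(r"numero\s+pratica\s*[:.]?\s*(\d+)", re.IGNORECASE)

def find_pratica_number(ocr_text):
    """Return the digits following 'Numero pratica', or None if not found."""
    match = PRATICA_PATTERN.search(ocr_text)
    return match.group(1) if match else None

def read_pratica_number(image_path):
    """Crop the upper right region of a scanned page and OCR only that part."""
    # Third-party imports are kept local so the helpers above work without them.
    from PIL import Image
    import pytesseract

    with Image.open(image_path) as page:
        region = page.crop(upper_right_box(*page.size))
        # '--psm 6' asks Tesseract to treat the crop as one uniform block of text.
        text = pytesseract.image_to_string(region, config="--psm 6")
    return find_pratica_number(text)
```

Because the crop contains only the header line, the engine has far less mixed font material to make a wrong decision on. If the field can move around the page, the same approach could be tried on a few candidate regions in turn.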
With best regards,
Loïc
Re: Why OCR is not OK?
Thank you very much for your very fast reply.
... I will try some other approaches, because I cannot know where the user will put the information.
Maybe I can try using only one font, one style and one size.
Thanks