Why OCR is not OK?

Discussions about machine vision support in GdPicture.
fierikit
Posts: 20
Joined: Mon Nov 05, 2007 7:03 pm
Location: Italy

Post by fierikit » Tue Jan 12, 2010 10:24 am

Sorry to bother you.
I use Tesseract OCR to read documents that are printed by our software and then scanned.
I'm sending you a sample document. Something strange must be going on, because your OCR usually works very well, but with this type of document many words are misrecognized every time.
We only need to read a few words, but we must be able to read 'Numero pratica <Number>', and these words are never recognized (they are on the first line, in the upper right corner).

I use gdpicturepro_5_11_13.exe.
Can you help me?
Attachments
00004929_001.tif
Sample document

Loïc
Site Admin
Posts: 5881
Joined: Tue Oct 17, 2006 10:48 pm
Location: France

Re: Why OCR is not OK?

Post by Loïc » Tue Jan 12, 2010 12:46 pm

Hi,

This is a somewhat complicated problem; I will try to explain.

When you run OCR on a whole page, the engine tries to identify the fonts, font styles, font sizes, and so on. Sometimes it fails, especially on pages that mix font families and styles.
Consequently, your issue comes from a "bad decision" made by the engine. It is rare, but it can happen.

However, if you run OCR on a specific area (for example, a rectangle bounding your digits), you should get very satisfying results.

With best regards,

Loïc
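
The region-based approach described above can be sketched as follows. This is only an illustration in plain Python, not GdPicture code: the `Rect` type, the `upper_right_region` helper, and the default fractions (right half of the page, top 10%) are assumptions chosen for a field that sits on the first line in the upper right corner. The resulting rectangle would then be passed to whatever region-of-interest setting your OCR toolkit provides.

```python
from typing import NamedTuple


class Rect(NamedTuple):
    """A pixel rectangle (hypothetical helper type, not part of GdPicture)."""
    left: int
    top: int
    width: int
    height: int


def upper_right_region(page_width: int, page_height: int,
                       width_frac: float = 0.5,
                       height_frac: float = 0.1) -> Rect:
    """Return a rectangle covering the upper right corner of a page.

    width_frac and height_frac are assumed fractions of the page size;
    tune them to the real layout of the printed document.
    """
    if not (0 < width_frac <= 1 and 0 < height_frac <= 1):
        raise ValueError("fractions must be in the range (0, 1]")
    width = round(page_width * width_frac)
    height = round(page_height * height_frac)
    # Anchor the rectangle to the right edge and the top of the page.
    return Rect(left=page_width - width, top=0, width=width, height=height)


# Example: an A4 page scanned at 300 dpi is about 2480 x 3508 pixels.
region = upper_right_region(2480, 3508)
```

Restricting recognition to a small rectangle like this means the engine only has to decide on one font and size, which is the situation where the reply above expects good results.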

fierikit
Posts: 20
Joined: Mon Nov 05, 2007 7:03 pm
Location: Italy

Re: Why OCR is not OK?

Post by fierikit » Tue Jan 12, 2010 12:57 pm

Many thanks for your very fast reply.
... I will try some other approaches, because I cannot know where the user will put the information.
Maybe I can try using only one font, one style, and one size.

Thanks
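
When the position of the field is not known in advance, one possible way to still use region-based OCR is to split the page into overlapping horizontal bands, run OCR on each band, and search the text of each band for the expected words. The sketch below is an assumption, not something proposed in this thread: `ocr_band` is a hypothetical callable standing in for the real OCR call on a region, and the band height, overlap, and regular expression are illustrative values.

```python
import re
from typing import Callable, Iterator, Optional, Tuple

Band = Tuple[int, int]  # (top, height) in pixels

# Tolerant pattern: flexible whitespace, optional ':' or '.', digits captured.
PRATICA_RE = re.compile(r"numero\s+pratica\s*[:.]?\s*(\d+)", re.IGNORECASE)


def horizontal_bands(page_height: int, band_height: int,
                     overlap: int) -> Iterator[Band]:
    """Yield overlapping horizontal bands that together cover the page."""
    if not 0 <= overlap < band_height:
        raise ValueError("overlap must be >= 0 and smaller than band_height")
    step = band_height - overlap
    top = 0
    while top < page_height:
        # The last band is clipped to the bottom edge of the page.
        yield top, min(band_height, page_height - top)
        if top + band_height >= page_height:
            break
        top += step


def find_pratica_number(page_height: int,
                        ocr_band: Callable[[int, int], str],
                        band_height: int = 400,
                        overlap: int = 100) -> Optional[str]:
    """Return the first 'Numero pratica' number found in any band, or None.

    ocr_band(top, height) is a hypothetical stand-in for running the real
    OCR engine on one band and returning the recognized text.
    """
    for top, height in horizontal_bands(page_height, band_height, overlap):
        match = PRATICA_RE.search(ocr_band(top, height))
        if match:
            return match.group(1)
    return None
```

The overlap is there so that a line of text cut by one band boundary still appears whole in the neighbouring band. Whether this gives better results than whole-page OCR on these documents would have to be tested on real scans.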
