Character After String of Numbers Is OCRed as a Number
Hello GDPictureTeam,
I have a clear document which is being scanned and OCRed. The document contains only a few lines of text, all of which are clear, and the resulting scan is of good quality as well. However, the string "TITLE 1120S K-1" consistently produces the OCRed text "TITLE 11208 K-1". We are wondering what to do about this. Is the engine seeing a string of numbers and assuming the next character is also a number? Is there anything we can do to fix this?
Thanks
Doc.It Development Team
- Posts: 352
- Joined: Tue Sep 27, 2011 11:47 am
Re: Character After String of Numbers Is OCRed as a Number
Hi,
Could you please attach the image you are talking about?
Best,
Sami
Re: Character After String of Numbers Is OCRed as a Number
Please take a look at the attached file. When the documents are OCRed, "TITLE 1120S K-1" consistently comes out as "TITLE 11208 K-1".
Attachments: Documents.zip (images, 70.32 KiB)
Re: Character After String of Numbers Is OCRed as a Number
We need to get this working as soon as possible.
The zip file attached to my previous post contains one PDF file and one TIFF image file. Both of them produce the same result. Please let us know how we can make this work.
Thanks,
Doc.It Development
Re: Character After String of Numbers Is OCRed as a Number
Hi,
Unfortunately, there is nothing we can do. The Tesseract engine has classified this as a single word made up of digits, and therefore treats the 'S' as a broken '8', especially since the stroke ends of the 'S' sit closer to the middle of the character than they do in most fonts.
Best,
Sami
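Since the engine itself cannot be changed here, one possible application-side workaround is to post-process the OCR text. The following is only a minimal sketch, assuming the identifiers expected on these documents (such as "1120S") are known in advance. The names `KNOWN_FORM_IDS`, `DIGIT_TO_LETTER` and `fix_form_ids` are hypothetical and are not part of GdPicture or Tesseract.

```python
import re

# Assumption: identifiers we expect to find on these documents.
KNOWN_FORM_IDS = {"1120S"}

# Digits that OCR often confuses with a look-alike letter.
DIGIT_TO_LETTER = {"8": "S", "5": "S", "0": "O", "1": "I"}


def fix_form_ids(text, known_ids=KNOWN_FORM_IDS):
    """Repair all-digit tokens whose last digit is a misread letter.

    A token is rewritten only if swapping its final digit for the
    look-alike letter yields one of the known identifiers.
    """
    def repair(match):
        token = match.group(0)
        letter = DIGIT_TO_LETTER.get(token[-1])
        if letter is not None:
            candidate = token[:-1] + letter
            if candidate in known_ids:
                return candidate
        # Leave every other number untouched.
        return token

    return re.sub(r"\b\d+\b", repair, text)


print(fix_form_ids("TITLE 11208 K-1"))
```

Note that this would also rewrite a genuine number 11208 appearing elsewhere in the text, so in practice the replacement should probably be limited to the line or field where the identifier is expected.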