Formatted Text Output?

Discussions about machine vision support in GdPicture.
dwg
Posts: 6
Joined: Sun Apr 29, 2012 1:20 am

Formatted Text Output?

Post by dwg » Sun Mar 29, 2020 11:52 pm

When saving the OCR result as plain text (.txt), will the formatting be preserved? This is important, for example, if amounts need to line up under a certain column name. Like this (I had to add the --- because posting the question removes the extra spaces!):

Date--------Description---------------Credit------------Debit
01/12-------text------------------------100.00
01/14-------text-----------------------------------------2,392.00

If the formatting is removed it could look like:

Date Description Credit Debit
01/12 text 100.00
01/14 text 2,392.00

Which makes it impossible to tell debit from credit.
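
To illustrate why the spacing matters, here is a minimal Python sketch that reads the amounts back out of column-aligned text by slicing each row at the character offsets of the header names. The helper names `column_starts` and `split_row` and the sample column widths are assumptions made up for this example, and nothing here is part of GdPicture. It also assumes each cell starts at or after its header's offset, which would not hold for right-aligned amounts.

```python
def column_starts(header, names):
    """Return the character offset at which each column name begins in the header."""
    return [header.index(name) for name in names]


def split_row(row, starts):
    """Slice a row at the header offsets; an empty string means the cell is blank."""
    bounds = starts + [None]  # None lets the last slice run to the end of the row
    return [row[a:b].strip() for a, b in zip(bounds, bounds[1:])]


# Sample data rebuilt from the question, with assumed column widths of 12, 26 and 18.
header = "Date".ljust(12) + "Description".ljust(26) + "Credit".ljust(18) + "Debit"
row_credit = "01/12".ljust(12) + "text".ljust(26) + "100.00"
row_debit = "01/14".ljust(12) + "text".ljust(26) + "".ljust(18) + "2,392.00"

starts = column_starts(header, ["Date", "Description", "Credit", "Debit"])
credit_cells = split_row(row_credit, starts)
debit_cells = split_row(row_debit, starts)
```

With the padding intact, `credit_cells` has the amount in the third position and `debit_cells` has it in the fourth. Once the runs of spaces are collapsed to single spaces, that positional information is gone and the two cases cannot be told apart.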

Hugo
Posts: 227
Joined: Tue Dec 18, 2018 10:09 am

Re: Formatted Text Output?

Post by Hugo » Tue Mar 31, 2020 4:14 pm

Hi Dwg,

In our latest minor release we have improved text formatting when extracting text after OCR and saving the results as .txt.

This feature was greatly improved/implemented a few weeks ago.
I suggest you try this. Feel free to provide any document you are having trouble with and we'll take a look at it and fix it if necessary.

Regards,

dwg
Posts: 6
Joined: Sun Apr 29, 2012 1:20 am

Re: Formatted Text Output?

Post by dwg » Thu Apr 02, 2020 11:38 pm

Can you check this example PDF doc? It is important to keep the amounts under the correct columns...
I think I would like to evaluate v14 if this looks good. Thanks
Attachments
exampleB_good.pdf

Hugo
Posts: 227
Joined: Tue Dec 18, 2018 10:09 am

Re: Formatted Text Output?

Post by Hugo » Fri Apr 03, 2020 1:19 pm

Hi Dwg,

Currently this is implemented but improvements can still be made. This is quite complex to implement as it needs to take into account the font style as well as the spaces.
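
As a rough illustration of the general idea (not GdPicture's actual implementation), the Python sketch below places each recognized word on a fixed-width text grid according to its horizontal position on the page. The `(x, text)` word tuples, the `render_line` helper and the single average `char_width` are all assumptions for this example. A real page uses proportional fonts of varying sizes, which is one reason a single character width is only an approximation.

```python
def render_line(words, char_width):
    """Place each (x, text) word at text column round(x / char_width), padding with spaces."""
    line = ""
    for x, text in sorted(words):
        col = round(x / char_width)
        # Never overwrite earlier text: if the target column is already taken,
        # fall back to a single separating space after the previous word.
        if line and col <= len(line):
            col = len(line) + 1
        line = line.ljust(col) + text
    return line


# Hypothetical word positions (in pixels) for one row of the statement.
words = [(0, "01/14"), (120, "text"), (560, "2,392.00")]
rendered = render_line(words, char_width=10)
```

Here the debit amount lands at character column 56 of `rendered`, far to the right of the description, so the gap left by the empty credit column is preserved in the text output.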

This is how our OCR demo currently renders it. See the attachments.

Regards
Attachments
Screenshot_48.png
.txt render results
Screenshot_47.png
ocr page section using ROI
