Problems with OCRTesseractGetCharTop/Left
Hi
We have the following problem: after running OCR, we extract all words and their coordinates from the recognized text. These words are filtered, and we then try to show them to the customer. For this we use AddRegion to paint a rectangle around each word, using the coordinates from OCRTesseractGetCharTop/Left/Right/Bottom. Unfortunately these coordinates are in pixels, and when we add the region we sometimes have trouble: either the image in the PDF is rotated, or the resolution differs from the resolution used for OCR.
What can we do so that AddRegion always paints in the right place on the PDF?
Re: Problems with OCRTesseractGetCharTop/Left
Hi
I tried to use CoordDocumentToViewer, but this function returns negative values. What I need is something like AddPageRotation on the PDF document, so that the region is added in the right place in the viewer. Do you have any advice for me?
Re: Problems with OCRTesseractGetCharTop/Left
Hi
Can you give me some advice? I have tested a lot of combinations, but I still have documents where the rectangles are not correct. To paint a rectangle on a PDF, we use the AddRegionInches method. This method needs inches, so we calculate the coordinates using the resolution of the OCR. Sometimes we get the wrong coordinates.
Is there another possibility?
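For readers following along, the pixel-to-inch step described here can be sketched as below. This is only an illustration of the arithmetic, not GdPicture code: the function name, the Rect type and the (left, top, width, height) ordering are assumptions made for the example. The important point is that the DPI values must be those of the bitmap that was actually passed to the OCR engine, not the resolution of the viewer or of the original scan.

```python
from typing import NamedTuple


class Rect(NamedTuple):
    """Rectangle in inches (hypothetical helper type for this sketch)."""
    left: float
    top: float
    width: float
    height: float


def pixel_box_to_inches(left_px, top_px, right_px, bottom_px, dpi_x, dpi_y):
    """Convert an OCR bounding box in pixels to inches.

    dpi_x / dpi_y are assumed to be the resolution of the bitmap
    that was given to the OCR engine.
    """
    if dpi_x <= 0 or dpi_y <= 0:
        raise ValueError("resolution must be positive")
    return Rect(
        left=left_px / dpi_x,
        top=top_px / dpi_y,
        width=(right_px - left_px) / dpi_x,
        height=(bottom_px - top_px) / dpi_y,
    )


# A word found at (300, 600)-(900, 750) on a page rendered at 300 DPI
print(pixel_box_to_inches(300, 600, 900, 750, 300, 300))
```

If the page was rendered at one resolution and the conversion uses another, every rectangle is scaled by the ratio of the two, which would match the "sometimes wrong" symptom.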
Re: Problems with OCRTesseractGetCharTop/Left
Hi
No answer ??
Re: Problems with OCRTesseractGetCharTop/Left
Hi,
Unfortunately your messages are really unclear to me, especially after reading "Sometimes we got the wrong coordinates".
To go further, I suggest that you reproduce your problem in a standalone application that you can share with our team through our helpdesk: https://www.gdpicture.com/support/getting-support-from-our-team
Kind regards,
Loïc
Re: Problems with OCRTesseractGetCharTop/Left
Oh, sorry. Let me clarify.
We automatically process PDF documents. The user loads a PDF document in the viewer.
Then he starts the OCR recognition. In this action we render the PDF image to an invisible GDPictureImage and execute the OCRTesseractDoOCR function. After the execution we build a word list (through OCRTesseractGetChar and OCRTesseractGetCharSpaces) and search for phrases in this list. If we find the phrase, we calculate the rectangle coordinates of this word on the document.
Up to this point it works great.
So that the user can find the phrase on the document, I use the AddRegionInches method of the viewer. I convert the rectangle coordinates to inches (pixels / DPI), but on some documents the rectangle is drawn in the wrong place. This happens when the PDF has a rotated image. I tried the method CoordDocumentToViewer, but I did not get the right coordinates.
Do you have any advice for me?
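To make the rotation issue concrete, here is a minimal sketch of how a box found on a rotated bitmap could be mapped back to the unrotated page before dividing by the DPI. It is plain coordinate arithmetic, not a GdPicture API: the function name and the convention (top-left origin, y pointing down, rotation given clockwise in multiples of 90 degrees) are assumptions for this example, and whether the viewer expects rotated or unrotated coordinates has to be checked against the actual document.

```python
def unrotate_box(box, image_size, rotation_cw):
    """Map a (left, top, right, bottom) pixel box from a bitmap that is the
    page rotated clockwise by rotation_cw degrees back to the unrotated page.

    image_size is (width, height) of the rotated bitmap, i.e. the one
    that was passed to OCR. Origin is top-left, y grows downwards.
    """
    left, top, right, bottom = box
    img_w, img_h = image_size
    if rotation_cw == 0:
        return (left, top, right, bottom)
    if rotation_cw == 90:
        # x = y', y = img_w - x'
        return (top, img_w - right, bottom, img_w - left)
    if rotation_cw == 180:
        # x = img_w - x', y = img_h - y'
        return (img_w - right, img_h - bottom, img_w - left, img_h - top)
    if rotation_cw == 270:
        # x = img_h - y', y = x'
        return (img_h - bottom, left, img_h - top, right)
    raise ValueError("rotation must be 0, 90, 180 or 270")


# Box found on a 100 x 200 bitmap that shows the page rotated 90 degrees clockwise
print(unrotate_box((10, 20, 30, 50), (100, 200), 90))  # (20, 70, 50, 90)
```

Note that for 90 and 270 degrees the unrotated page has its width and height swapped relative to the bitmap, so a region computed without this step can end up outside the page, which is one plausible source of negative values.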
OCRTesseractGetCharTop/left/bottom/right to physical pixels
We use Tesseract OCR to assist with screen reading. The documentation for OCRTesseractGetCharTop/Left/Bottom/Right says the values are returned 'in pixels'. However, I use physical pixels for 'targeting' the area of the screen to read.
The physical pixel conversion is important because I want to target an area of the screen without being perfectly specific. In a second pass I take the coordinates returned by OCRTesseractGetCharTop etc. to tune where to target the screen again.
I find that the size of the area being targeted is important for recognition accuracy.
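One possible reading of this two-pass approach is sketched below. It assumes that the OCR coordinates are relative to the captured bitmap, that the capture started at a known screen offset, and that the bitmap may have been scaled before OCR. All names and parameters are hypothetical and only illustrate the arithmetic.

```python
import math


def refine_capture_region(box, capture_origin, scale, margin, screen_size):
    """Turn an OCR box (in captured-bitmap pixels) into a screen rectangle
    for a second, tighter capture.

    box            -- (left, top, right, bottom) as returned by OCR
    capture_origin -- (x, y) of the first capture on the screen, physical pixels
    scale          -- bitmap pixels per physical screen pixel (1.0 = unscaled)
    margin         -- extra physical pixels to keep around the text
    screen_size    -- (width, height) of the screen in physical pixels
    """
    left, top, right, bottom = box
    origin_x, origin_y = capture_origin
    screen_w, screen_h = screen_size
    # Undo the scaling, shift by the capture offset, then pad and clamp.
    new_left = max(0, math.floor(origin_x + left / scale) - margin)
    new_top = max(0, math.floor(origin_y + top / scale) - margin)
    new_right = min(screen_w, math.ceil(origin_x + right / scale) + margin)
    new_bottom = min(screen_h, math.ceil(origin_y + bottom / scale) + margin)
    return (new_left, new_top, new_right, new_bottom)


# First capture started at (500, 300) and was upscaled 2x before OCR
print(refine_capture_region((40, 20, 140, 60), (500, 300), 2.0, 10, (1920, 1080)))
```

The margin parameter reflects the observation that the size of the targeted area affects recognition accuracy; a suitable value would have to be found by experiment.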
Re: Problems with OCRTesseractGetCharTop/Left
Hello,
It is not quite clear what you mean by "the PDF has a rotated Image". To get rid of internal rotation in PDF documents, you should normalize pages in the document before OCR:
https://www.gdpicture.com/guides/gdpicture/web ... ePage.html
Re: OCRTesseractGetCharTop/left/bottom/right to physical pixels
Hello,
I'm not sure what you mean by "physical pixels". Maybe the posts above in this topic can be useful for you.