How to know a PDF page has text (invisible / we can search) (from OCR or it was created by Word, Excel or any other tool

Post by **reisrf** » Thu Jun 07, 2018 11:38 pm

I will receive PDFs in which some pages have invisible, searchable text (for example, PDFs produced by OCR tools, Office tools, or others) and other pages are scanned images with no OCR content, so they can't be searched. For the pages without OCR content I need to run OCR and create the hidden text at the right locations (this part I know how to do). My question is: how do I detect whether a page has invisible text or not?

Thanks in advance

Robson Reis

Post by **Loïc** » Sun Jun 10, 2018 9:29 pm

See: https://www.gdpicture.com/guides/gdpicture/web ... lean).html

Post by **reisrf** » Mon Jun 11, 2018 5:00 pm

Thank you!

Post by **reisrf** » Tue Jun 19, 2018 8:52 pm

The PageHasText method returns True even if the page contains only special characters like \r, \n, \l, .... I have created my own PageHasText, using GetPageText:

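// Requires: using System.Text.RegularExpressions;
// Strip everything except ASCII letters and digits; anything left counts as real text.
string pageText = Regex.Replace(_gdPDF.GetPageText(), "[^0-9a-zA-Z]+", string.Empty);
return pageText.Length > 0;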

The snippet above returns True if the page text contains at least one digit or letter (lowercase or uppercase, a-z only) and False if there are only spaces or special characters.
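One caveat with the character class above: it only keeps ASCII letters, so a page whose text consists solely of accented or non-Latin letters would be reported as empty. A minimal sketch of a Unicode-aware variant follows. `PageTextCheck` and `HasLetterOrDigit` are hypothetical names, not part of GdPicture, and the helper only inspects a string that the caller has already extracted.

```csharp
using System;
using System.Linq;

public static class PageTextCheck
{
    // Hypothetical helper, not a GdPicture API.
    // Returns true when the extracted text contains at least one
    // Unicode letter or digit; whitespace, control characters and
    // punctuation alone do not count as text.
    public static bool HasLetterOrDigit(string pageText)
    {
        if (string.IsNullOrEmpty(pageText))
        {
            return false;
        }
        return pageText.Any(c => char.IsLetterOrDigit(c));
    }
}
```

Assuming the same `_gdPDF` object as in the snippet above, it could be called as `PageTextCheck.HasLetterOrDigit(_gdPDF.GetPageText())`. Whether accented or non-Latin letters should count as text depends on the documents being processed.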

Post by **Gabriela** » Mon Jan 21, 2019 4:48 pm

Hi,

The PageHasText() method returns true if any text at all is on the page. Special characters count as text, so the method is working correctly. Your workaround is nice, and it clearly works well for you. It always depends on the requirements of your application; methods intended for general use need to behave correctly for all users. You can open a ticket on our support platform if you need a "custom" method, so we can investigate further and offer you a solution.

Post by **reisrf** » Mon Jan 21, 2019 6:13 pm

No worries. My custom code is in place and working as expected. The case can be closed. Many thanks.

Post by **Gabriela** » Mon Jan 21, 2019 9:02 pm

Hi,

Thank you for your feedback. Please do not hesitate to contact us if you need a custom solution or further technical assistance.
