
KPV and Table extraction in PDF File

Posted: Tue Oct 17, 2023 2:26 pm
by jloizagah
Hi all.

I am starting to use the new KPV and table extraction functionality, following your examples, and it seems that these new functions rely on the OCR library. Even in your examples, if you want to extract key-value pairs or tables from a PDF file, the PDF is rasterized and an OCR process is performed. Is this always necessary? If I am using a PDF file that has its own text layer, why do I have to convert it to images and run OCR?

Best regards.

Re: KPV and Table extraction in PDF File

Posted: Wed Nov 08, 2023 11:07 am
by lindamat
Hello. I think that if your PDF file has an embedded text layer and the library supports extracting text directly from that layer, OCR and rasterization may not be necessary. In that case the library should be able to read the text information directly and extract the desired key-value pairs or tables without OCR.
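
To make the idea concrete, here is a minimal sketch in plain Python of the "use the text layer when it exists, fall back to OCR otherwise" decision. It does not use the KPV or table extraction API discussed in this thread. The names `has_text_layer`, `extract_key_values`, `process_page`, the `MIN_TEXT_CHARS` threshold and the `ocr_fallback` callback are all hypothetical, and the sketch assumes you already have the page's text-layer string from whatever PDF reader you use. The "Key: Value" regex is only a stand-in for real key-value detection.

```python
import re

# Hypothetical threshold: a page with fewer non-whitespace characters than this
# is treated as scanned (no usable text layer). Tune it for your documents.
MIN_TEXT_CHARS = 20

# Matches simple "Key: Value" lines; the key stops at the first colon.
_KV_PATTERN = re.compile(r"^\s*([^:\n]{1,60}?)\s*:\s*(\S.*?)\s*$")


def has_text_layer(page_text, min_chars=MIN_TEXT_CHARS):
    """Return True when the extracted text looks like a real text layer."""
    non_space = "".join(page_text.split())
    return len(non_space) >= min_chars


def extract_key_values(page_text):
    """Collect "Key: Value" pairs from plain text, one pair per line."""
    pairs = {}
    for line in page_text.splitlines():
        match = _KV_PATTERN.match(line)
        if match:
            pairs[match.group(1)] = match.group(2)
    return pairs


def process_page(page_text, ocr_fallback):
    """Use the text layer if present; otherwise call the OCR callback.

    ocr_fallback is a zero-argument callable (an assumption of this sketch)
    that rasterizes the page, runs OCR, and returns the recognized text.
    """
    if has_text_layer(page_text):
        return extract_key_values(page_text)
    return extract_key_values(ocr_fallback())
```

With this shape, OCR only runs for pages whose text layer is missing or nearly empty, for example `process_page(text_from_pdf, lambda: run_ocr(page))`, where `run_ocr` is whatever OCR call you already have. Whether the library's own KPV and table functions can consume text-layer input directly is a question for the vendor.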