Extract text from PDF

Post by **Nemo** » Mon Feb 15, 2016 4:01 pm

Hi all!
How can I extract the text from an OCRed PDF into a string using C#?

I tried the code below, but I get empty results.

// First try the text that is already embedded in the page.
ocrText = oGdPicturePDF.GetPageText();
if (String.IsNullOrEmpty(ocrText))
{
    // No text found: OCR the page, then read the text again.
    if (oGdPicturePDF.OcrPage("eng", Environment.CurrentDirectory + @"\OCR\", "", 200f) != GdPictureStatus.OK)
    {
        // i is presumably the current page number, set by code not shown here.
        Console.WriteLine("OCR problem on page " + i.ToString() + ". Error: " + oGdPicturePDF.GetStat().ToString());
    }
    ocrText = oGdPicturePDF.GetPageText();
}
// Append the page text and a separator line to the output file.
using (StreamWriter sw = File.AppendText("D:\\output.txt"))
{
    sw.WriteLine(ocrText);
    sw.WriteLine("=====================================================");
}

Post by **David** » Tue Feb 16, 2016 10:09 am

Hi,

GdPicture.NET ships with a fully functional sample that illustrates this feature. You can find it in the folder "GdPicture.NET 11\Samples\WinForm\C#\OCR".

In a nutshell, the sample does the following:
- Load the PDF.
- Set up the OCR engine.
- Run the OCR and get the result as a string:
[...]
sOCR = oGdPictureImaging.OCRTesseractDoOCR(m_ImageID, txtLang.Text, TextBox1.Text, "");
[...]
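
As a rough illustration of the overall flow (this is not the code from the sample), the sketch below loops over the pages, keeps any text a page already contains, and runs OCR only on pages that come back empty. `IPdfTextSource`, `FakePdf` and `PdfTextExtractor` are hypothetical names made up for this example and are not part of GdPicture.NET. In a real project the two interface methods would presumably wrap the `GetPageText` and `OcrPage` calls shown in the first post.

```csharp
using System;
using System.Text;

// Hypothetical abstraction over a PDF document. NOT a GdPicture.NET type.
public interface IPdfTextSource
{
    int PageCount { get; }
    string GetPageText(int page); // text layer of the page (1-based), may be empty
    bool OcrPage(int page);       // run OCR on the page; true on success
}

public static class PdfTextExtractor
{
    // Returns the text of all pages, one line per page.
    // OCR is attempted only for pages that have no text layer yet.
    public static string ExtractAll(IPdfTextSource pdf)
    {
        var sb = new StringBuilder();
        for (int page = 1; page <= pdf.PageCount; page++)
        {
            string text = pdf.GetPageText(page);
            if (string.IsNullOrEmpty(text))
            {
                if (pdf.OcrPage(page))
                    text = pdf.GetPageText(page); // re-read after OCR
                else
                    Console.Error.WriteLine("OCR problem on page " + page);
            }
            sb.AppendLine(text ?? string.Empty);
        }
        return sb.ToString();
    }
}

// In-memory stand-in, used only to exercise the loop without a PDF library.
public sealed class FakePdf : IPdfTextSource
{
    private readonly string[] _text;      // current text layer per page
    private readonly string[] _ocrResult; // what OCR would find (null = OCR fails)

    public FakePdf(string[] text, string[] ocrResult)
    {
        _text = (string[])text.Clone();
        _ocrResult = ocrResult;
    }

    public int PageCount { get { return _text.Length; } }

    public string GetPageText(int page) { return _text[page - 1]; }

    public bool OcrPage(int page)
    {
        string recognised = _ocrResult[page - 1];
        if (recognised == null) return false;
        _text[page - 1] = recognised; // OCR adds a text layer to the page
        return true;
    }
}

public static class Program
{
    public static void Main()
    {
        var pdf = new FakePdf(
            new string[] { "Page one already has text.", "" },
            new string[] { null, "Page two came from OCR." });
        Console.Write(PdfTextExtractor.ExtractAll(pdf));
    }
}
```

Keeping the loop behind a small interface is only a design choice for this sketch: it lets the fallback logic be checked without a license key or an OCR dictionary folder.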

If you are still having trouble, could you share the PDF you are trying to read?

Thank you,

David
