Unable to parse Text

Post by **josef** » Sat Mar 05, 2011 4:38 pm

Hi,
The code below runs without returning an error, but the output string is garbage. At first I blamed the quality of the image I am trying to scan (attached), which is very poor. Then I took a screen capture of a PDF page and ran OCR on that, and it produced garbage as well. I am including the code I am using too, to make sure this isn't user error.

Here is the code. I will run it in batch mode over dozens of .tif files, extract the text, and work with that text later in the code.

            // Create the imaging object and set the license keys for the toolkit and the Tesseract OCR plugin.
            GdPictureImaging oGdPictureImaging = new GdPictureImaging();
            oGdPictureImaging.SetLicenseNumber("my key");
            oGdPictureImaging.SetLicenseNumberOCRTesseract("my key");

            // Load the image, run OCR with the English dictionary, and print the recognized text.
            int ImageId = oGdPictureImaging.CreateGdPictureImageFromFile(@"C:\projects\pdf conversion\OCR\3-5-2011 8-19-37 AM.png");
            string output = oGdPictureImaging.OCRTesseractDoOCR(ImageId, TesseractDictionary.TesseractDictionaryEnglish, "C:/Program Files/GdPicture.NET/Redist/OCR/", "");
            Console.WriteLine(output);
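
The batch run over a folder of .tif files might look roughly like the sketch below. It is only an illustration: `BatchOcr`, `Run`, and the `ocrFile` callback are hypothetical names, and the callback is assumed to wrap the single-image calls shown above and return the recognized text for one file path. Only standard .NET file APIs are used here.

```csharp
using System;
using System.Collections.Generic;
using System.IO;

static class BatchOcr
{
    // Runs the supplied OCR callback over every .tif file in a folder.
    // ocrFile is a hypothetical delegate: it takes a file path and returns the recognized text.
    public static Dictionary<string, string> Run(string folder, Func<string, string> ocrFile)
    {
        var results = new Dictionary<string, string>();
        foreach (string path in Directory.EnumerateFiles(folder, "*.tif"))
        {
            try
            {
                // Keep the text keyed by file path so it can be processed later.
                results[path] = ocrFile(path);
            }
            catch (Exception ex)
            {
                // One unreadable file should not stop the whole batch.
                Console.Error.WriteLine("Skipping " + path + ": " + ex.Message);
            }
        }
        return results;
    }
}
```

Calling `BatchOcr.Run(@"C:\scans", path => MyOcr(path))`, where `MyOcr` is a placeholder for a method containing the code above, would return one entry per file that was processed without an exception.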

Any help would be appreciated.

Thanks,
Josef

Post by **Loïc** » Tue Mar 08, 2011 6:38 pm

Hi Josef,

Unfortunately I can't help. The quality of the document is definitely too poor to get good accuracy with the Tesseract engine.

Kind regards,

Loïc
