OCR Individual Pages

Posted: Mon Mar 05, 2012 12:16 am
by rgoodson40
Hello,

I need to OCR individual pages of TIFF files, but I can't figure out an easy way to do that. Basically, I am looping through each page of a document, OCRing the page, and then storing the text in a database. I need to go page by page in order to show progress.

The problem is that the OCRTesseractDoOCR method OCRs an entire GdPicture image, so it appears I could use it if I could load individual pages of a document into a GdPicture image object. I can't figure out how to do that, though. By the way, the images do not need to be displayed; this will all be done behind the scenes, apart from the progress information.

Thanks,
Reagan

Re: OCR Individual Pages

Posted: Mon Mar 05, 2012 11:01 am
by Loïc
Hello Reagan,

Do you mean you want to OCR a multipage TIFF image?

Regards,

Loïc

Re: OCR Individual Pages

Posted: Mon Mar 05, 2012 6:13 pm
by rgoodson40
Yes. But I would like to be able to do one page at a time so that I can show progress for it.

Thanks,
Reagan

Re: OCR Individual Pages

Posted: Tue Mar 06, 2012 5:48 pm
by Loïc
Hello,

OK, it's easy to do:

1- Open the image
2- Select the desired page by using the TiffSelectPage() method
3- Run the ocr process

Repeat steps 2-3 for each page of your file.
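
The steps above can be sketched as a small loop. This is only a minimal sketch, not GdPicture's own sample: PageOcrLoop and its delegate parameters are hypothetical names, and the library calls are passed in as delegates so that the snippet compiles without the SDK installed.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper (not part of GdPicture): runs OCR one page at a time
// and reports progress after each page.
public static class PageOcrLoop
{
    // selectPage:     makes the given 1-based page the active page.
    // ocrCurrentPage: returns the text of the currently selected page.
    // onProgress:     called as (pagesDone, pageCount) after each page.
    public static List<string> Run(
        int pageCount,
        Action<int> selectPage,
        Func<string> ocrCurrentPage,
        Action<int, int> onProgress)
    {
        var pages = new List<string>();
        for (int page = 1; page <= pageCount; page++)
        {
            selectPage(page);             // step 2: select the desired page
            pages.Add(ocrCurrentPage());  // step 3: OCR that page only
            onProgress(page, pageCount);  // e.g. update a progress bar
        }
        return pages;                     // one entry per page, in page order
    }
}
```

Assuming the image was opened in step 1 and its ID is held in a variable such as imageId (a hypothetical name), selectPage could wrap TiffSelectPage(imageId, page) and ocrCurrentPage could wrap OCRTesseractDoOCR; the exact signatures should be checked against the documentation for your version. Keeping one string per page, rather than one concatenated string, makes it straightforward to store each page as its own database row.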

Let me know if I am not clear enough.

Kind regards,

Loïc

Re: OCR Individual Pages

Posted: Thu Mar 08, 2012 1:11 am
by rgoodson40
Thanks. That worked.

Reagan

Re: OCR Individual Pages

Posted: Wed Oct 10, 2012 10:18 am
by mdelbene
Hi Loïc,
I read your hint about running OCR on a multipage TIFF file, but I'm encountering some problems. Let me explain.
I'm using the sample C# project installed in GdViewerSamplesv8\OCR\ with some changes.

I open a multipage TIFF, then loop over the pages and call OCR on each page.
This is the code:

Code:

// opens the file
int m_ImageID = oGdPictureImaging.CreateGdPictureImageFromFile(fileName);
string sOCR = string.Empty;

// loop pages
if (oGdPictureImaging.TiffIsMultiPage(m_ImageID))
{
	int pageCount = oGdPictureImaging.TiffGetPageCount(m_ImageID);
	for (int i = 1; i <= pageCount; i++)
	{
		if (i > 1)
		    oGdPictureImaging.TiffSelectPage(m_ImageID, i);

		oGdPictureImaging.Scale(m_ImageID, 300, System.Drawing.Drawing2D.InterpolationMode.HighQualityBicubic);
		oGdPictureImaging.OCRTesseractReinit();
		sOCR += oGdPictureImaging.OCRTesseractDoOCR(m_ImageID, txtLang.Text, TextBox1.Text, string.Empty);
		oGdPictureImaging.OCRTesseractClear();
	}
}
At the end of the procedure, sOCR contains the text of the first page repeated three times (my TIFF file has three pages).
I tried using the TiffOpenMultiPageForWrite property, but nothing changed.

The only way I have found to get the intended result is to use

Code:

 m_ImageID = oGdPictureImaging.TiffCreateMultiPageFromFile(fileName);
instead of

Code:

m_ImageID = oGdPictureImaging.CreateGdPictureImageFromFile(fileName);
The problem with this method is that sometimes I don't have a file name, only a stream, in which case I use the method gdPicture.CreateGdPictureImageFromStream(binaryContent).

I'm probably doing something wrong.
Can you help me?

Thank you in advance.
Michela

P.S. I'm using GdPicture v. 8.3.

Re: OCR Individual Pages

Posted: Wed Oct 10, 2012 12:23 pm
by Loïc
Hello,

First, I suggest you upgrade to the latest 8.X edition. To get the download link, please create a ticket here: https://www.gdpicture.com/support/getting-support-from-our-team

Also, have you tried replacing CreateGdPictureImageFromFile with the TiffCreateMultiPageFromFile() method?

Kind regards,

Loïc

Re: OCR Individual Pages

Posted: Wed Oct 10, 2012 2:37 pm
by mdelbene
Yes, using the TiffCreateMultiPageFromFile() method I get the expected behaviour.
Thanks a lot.
Michela