IsBlank Method (GdPictureImaging)

Post by mtnguyen » Mon Dec 11, 2017 4:47 am

Hi,

Is there any difference between V11 and V14 regarding the IsBlank method?
I have just upgraded to V14 to use the OCRPages method, and I noticed that the IsBlank method now returns different results for the same multipage TIF image. The method is called as follows:

if (m_GdPictureImaging.IsBlank(m_ImageID, float.Parse(this.tifToPDFConfig.DeleteBlanksBlankPixelThresholdInt), true))

this.tifToPDFConfig.DeleteBlanksBlankPixelThresholdInt was set to 99.95

Note: before making this post, I downloaded the latest version, 14.0.0.27. The version that gave the different result was 11.2.0.7.

Regards

Post by David » Tue Dec 12, 2017 11:47 am

Hi,

We have changed the blank page detection in order to improve its accuracy. This change might require a slight adjustment of the threshold under certain circumstances.

Feel free to share the impacted images in case you need us to have a look.

Regards,

David
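
One way to find an adjusted threshold is to sweep a few candidate values over pages whose blank status is already known and keep the value that disagrees least. The sketch below is only an illustration: `FindBestThreshold` and the `isBlank` delegate are hypothetical names, and the delegate stands in for a real detector call such as `IsBlank(imageId, threshold, true)` so that the example runs without the library.

```csharp
using System;
using System.Collections.Generic;

public static class ThresholdCalibration
{
    // Returns the candidate threshold whose blank/non-blank decisions disagree
    // least with the expected labels. Ties go to the first candidate tried.
    // 'isBlank' is a hypothetical stand-in for the real detector call.
    public static float FindBestThreshold(
        IReadOnlyList<bool> expectedBlank,
        IEnumerable<float> candidates,
        Func<int, float, bool> isBlank)
    {
        float best = float.NaN;
        int bestMismatches = int.MaxValue;

        foreach (float threshold in candidates)
        {
            // Count the pages where the detector disagrees with the known label.
            int mismatches = 0;
            for (int page = 0; page < expectedBlank.Count; page++)
            {
                if (isBlank(page, threshold) != expectedBlank[page])
                    mismatches++;
            }

            if (mismatches < bestMismatches)
            {
                bestMismatches = mismatches;
                best = threshold;
            }
        }

        return best; // NaN when no candidates were supplied
    }
}
```

In practice the labelled pages would come from a set of documents that the old version handled correctly, one sweep per document group if different groups use different settings.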

Post by mtnguyen » Thu Dec 14, 2017 1:45 am

Hi David,

I have attached my test images. This is the process that we have been using with Ver 11 for TIF to PDF OCR:
1. Remove punch holes on even pages only, using GdPictureImaging.RemoveHolePunch
- Note: Ver 11 does not have AccountForPunchHoles in the IsBlank method; with Ver 14 I can remove this step.

2. Auto rotate

3. Detect and delete blank pages on even pages only, using GdPictureImaging.IsBlank(m_ImageID, float.Parse(this.tifToPDFConfig.DeleteBlanksBlankPixelThresholdInt), true)
Add or delete the page based on the IsBlank result, with threshold = 99.95:
- GdPictureImaging.TiffAddToMultiPageFile(m_ImageID, i, tiffcompression)
- GdPictureImaging.TiffDeletePage(m_ImageID, i)

4. OCR using GdPictureImaging.PdfOCRCreateFromMultipageTIFF
Note: Two tests were run, one referencing Ver 11 and one referencing Ver 14, with exactly the same settings, and Ver 11 seems to give better results.

You mentioned that the blank page detection was changed. Is there an equivalent scale between the two versions? We have been using different settings for different groups of documents.

I am looking forward to being able to use V14 with its OCRPages method and PDFCompression settings.

Regards,
Tri
Attachment: TestDoc.zip (1.99 MiB)
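
Steps 1 and 3 above apply only to even pages, and a setting of this kind usually takes a value such as "all", "odd" or "even". A minimal sketch of that selection, assuming 1-based page numbers; `PageFilter` and `AppliesToPage` are hypothetical names, not part of GdPicture:

```csharp
using System;

public static class PageFilter
{
    // Returns true when a 1-based page number is selected by a mode string
    // such as "all", "odd" or "even" (case-insensitive).
    // Any other value (for example "none") selects no page.
    public static bool AppliesToPage(string mode, int pageNumber)
    {
        if (string.Equals(mode, "all", StringComparison.OrdinalIgnoreCase))
            return true;
        if (string.Equals(mode, "odd", StringComparison.OrdinalIgnoreCase))
            return pageNumber % 2 != 0;
        if (string.Equals(mode, "even", StringComparison.OrdinalIgnoreCase))
            return pageNumber % 2 == 0;
        return false;
    }
}
```

Keeping the rule in one helper means punch-hole removal and blank-page deletion cannot drift apart in how they interpret the same setting.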

Post by Loïc » Thu Dec 21, 2017 6:44 pm

Hi,

What are we supposed to do with the image? Could you provide a code snippet and describe the expected result vs the obtained result?

Kind regards,

Loïc

Post by mtnguyen » Mon Jan 29, 2018 1:39 am

Hi Loïc,

I apologise for the late response.
As described in my first post, the IsBlank method returns different results after updating from V11 to V14 (using the same threshold, 99.95): blank pages that were detected by V11 are not detected by V14.

David mentioned in his reply that the blank page detection was changed to improve accuracy, hence the change of threshold.

Improving the method is good, but we did not expect the threshold to change, as it makes a "blind" upgrade to V14 impractical.
Is there an equivalent scale between the two versions?
Is there a way to use the IsBlank method of V11 together with the OCRPages method of V14?

Here is the code snippet:

using (GdPictureImaging m_GdPictureImaging = new GdPictureImaging())
{
	int m_ImageID;
	m_GdPictureImaging.TiffOpenMultiPageForWrite(true); // For performance
	m_ImageID = m_GdPictureImaging.CreateGdPictureImageFromFile(sourcePath);
	_PDFtoOCR.NewPDF(this.tifToPDFConfig.PDFA);
	_PDFtoOCR.OcrPagesProgress += this.OcrPagesProgress;
	_PDFtoOCR.OcrPagesDone += this.OcrPagesDone;
	float resolution = System.Math.Max(this.tifToPDFConfig.PDFDPI_IfLessThen_Int_ElseOriginal, m_GdPictureImaging.GetVerticalResolution(m_ImageID));
	switch (this.tifToPDFConfig.PDFCompressionBitonalSetting.ToUpper())
	{
		case "CCITT4":
			_PDFtoOCR.SetCompressionForBitonalImage(PdfCompression.PdfCompressionCCITT4);
			break;
		case "JBIG2":
			_PDFtoOCR.SetCompressionForBitonalImage(PdfCompression.PdfCompressionJBIG2);
			break;
		default:
			break;
	}

	switch (this.tifToPDFConfig.PDFCompressionColorSetting.ToUpper())
	{
		case "JPEG":
			_PDFtoOCR.SetCompressionForColorImage(PdfCompression.PdfCompressionJPEG);
			_PDFtoOCR.SetJpegQuality(this.tifToPDFConfig.PDFJPEGQuality_0_Worse_100_Best);
			break;
		case "JPEG2000":
			_PDFtoOCR.SetCompressionForColorImage(PdfCompression.PdfCompressionJPEG2000);
			_PDFtoOCR.SetJpeg2000Quality(this.tifToPDFConfig.PDFJPEG2000Quality_1_Best_512_Worse);
			break;
		default:
			break;
	}

	if (m_GdPictureImaging.TiffIsMultiPage(m_ImageID))
	{
		#region Tiff is multipage
		int NumberofDeletedPages = 0;
		int NumberOfPages = m_GdPictureImaging.TiffGetPageCount(m_ImageID);
		//loop through pages

		for (int i = 1; i <= NumberOfPages; i++)
		{
			//select each page in TIFF file
			m_GdPictureImaging.TiffSelectPage(m_ImageID, i);

			if (this.tifToPDFConfig.Detect_CompressBW == true)
			{
				m_GdPictureImaging.ColorDetection(m_ImageID, true, true, true);
			}

			if (this.tifToPDFConfig.BrightenColourInt > 0)
			{
				m_GdPictureImaging.SetBrightness(m_ImageID, this.tifToPDFConfig.BrightenColourInt);
			}

			if (this.tifToPDFConfig.RequiredResolution > 0)
			{
				ResizePage(m_ImageID, m_GdPictureImaging);
			}

			// 2016-10-20 : L. Oliver : Start of Addition
			if (this.tifToPDFConfig.PunchHoleRemovalAllOddEvenNone.ToLower() == "all" ||
				(this.tifToPDFConfig.PunchHoleRemovalAllOddEvenNone.ToLower() == "odd" && (i % 2 != 0)) ||
				(this.tifToPDFConfig.PunchHoleRemovalAllOddEvenNone.ToLower() == "even" && (i % 2 == 0)))
			{
				List<String> lstPunchHoleMargins = new List<String>(this.tifToPDFConfig.PunchHoleLocationsRightLeftTopBottom.ToLower().Split(new char[] { ',' }));

				// Combine the configured margins into a single flags value
				// instead of enumerating every possible combination.
				HolePunchMargins margins = 0;
				if (lstPunchHoleMargins.Contains("left"))
					margins |= HolePunchMargins.MarginLeft;
				if (lstPunchHoleMargins.Contains("right"))
					margins |= HolePunchMargins.MarginRight;
				if (lstPunchHoleMargins.Contains("top"))
					margins |= HolePunchMargins.MarginTop;
				if (lstPunchHoleMargins.Contains("bottom"))
					margins |= HolePunchMargins.MarginBottom;

				// No recognised margin configured: fall back to all four margins.
				if (margins == 0)
					margins = HolePunchMargins.MarginLeft | HolePunchMargins.MarginRight | HolePunchMargins.MarginTop | HolePunchMargins.MarginBottom;

				status = m_GdPictureImaging.RemoveHolePunch(m_ImageID, margins);

			}

			if (this.tifToPDFConfig.AutoRotateEnabled)
			{
				int intPageRotation = m_GdPictureImaging.OCRTesseractGetOrientation(m_ImageID, this.tifToPDFConfig.Language, this.tifToPDFConfig.DictionaryPath);
				if (intPageRotation != 0)
					status = m_GdPictureImaging.RotateAngle(m_ImageID, 360 - intPageRotation);
			}
			
			if (this.tifToPDFConfig.DeleteBlanks == true)
			{
				if (m_GdPictureImaging.IsBlank(m_ImageID, float.Parse(this.tifToPDFConfig.DeleteBlanksBlankPixelThresholdInt), true))
				{
					if (this.tifToPDFConfig.DeleteBlanksAllOddEven.ToLower() == "all" ||
						(this.tifToPDFConfig.DeleteBlanksAllOddEven.ToLower() == "odd" && (i % 2 != 0)) ||
						(this.tifToPDFConfig.DeleteBlanksAllOddEven.ToLower() == "even" && (i % 2 == 0)))
					{
						// GdPicture11
						//status = m_GdPictureImaging.TiffDeletePage(m_ImageID, i);
						
						// GdPicture14
						NumberofDeletedPages++;
					}
					else
					{
						// GdPicture11
						//status = m_GdPictureImaging.TiffAddToMultiPageFile(m_ImageID, i, tiffcompression);
						
						// GdPicture14
						_PDFtoOCR.AddImageFromGdPictureImage(m_ImageID, false, true);
					}
				}
				else
				{
					// GdPicture11
					//status = m_GdPictureImaging.TiffAddToMultiPageFile(m_ImageID, i, tiffcompression);
					
					// GdPicture14
					_PDFtoOCR.AddImageFromGdPictureImage(m_ImageID, false, true);
				}
			}
			else
			{
				// GdPicture11
				//status = m_GdPictureImaging.TiffAddToMultiPageFile(m_ImageID, i, tiffcompression);
				
				// GdPicture14
				_PDFtoOCR.AddImageFromGdPictureImage(m_ImageID, false, true);
			}
		}
		// m_ImageID is released once, after both branches, near the end of the using block.

		// check if searchable PDF req
		if (this.tifToPDFConfig.SearchablePDF)
		{
			// GdPicture11 code
			
			// GdPicture14 code
			
		}
		else
		{
			// GdPicture11 code
			
			// GdPicture14 code
		}

		int pageCount = 0;

		using (GdPicturePDF gdPicturePDF = new GdPicturePDF())
		{
			gdPicturePDF.LoadFromFile(this.GetTempOutputFilename(file), true);
			pageCount = gdPicturePDF.GetPageCount();
		}

		if (NumberOfPages != (pageCount + NumberofDeletedPages))
		{
			success = false;
		}

		#endregion
	}
	else // is single page
	{
		#region Tiff is single
		#endregion
	}

	if (!(status == GdPictureStatus.OK))
	{
		success = false;
	}

	_PDFtoOCR.CloseDocument();
	m_GdPictureImaging.ReleaseGdPictureImage(m_ImageID);
	// No explicit Dispose() here: the enclosing using block disposes m_GdPictureImaging.
}
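
A side note on the threshold in the snippet above: `float.Parse(string)` uses the current culture, so a configured value like "99.95" can be read differently on a machine whose regional settings use a comma as the decimal separator. A minimal sketch of culture-independent parsing follows. `ThresholdParsing` and `TryParseThreshold` are hypothetical names, and the 0 to 100 range check is an assumption about what a valid threshold looks like, not something confirmed in this thread.

```csharp
using System;
using System.Globalization;

public static class ThresholdParsing
{
    // Parses a threshold such as "99.95" with the invariant culture, so the
    // result does not depend on the machine's regional settings.
    // Returns false when the text is not a number or lies outside 0..100
    // (the range is an assumption made for this sketch).
    public static bool TryParseThreshold(string text, out float threshold)
    {
        bool ok = float.TryParse(
            text,
            NumberStyles.Float,
            CultureInfo.InvariantCulture,
            out threshold);

        return ok && threshold >= 0f && threshold <= 100f;
    }
}
```

Parsing the value once before the page loop, rather than on every page, would also make a bad configuration value fail early.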

Post by Loïc » Fri Feb 09, 2018 6:01 pm

Hi,

Could you attach the image that is badly recognized with the engine?

Kind regards,

Loïc

Post by mtnguyen » Tue Feb 13, 2018 8:54 am

Hi Loïc,

I attached the document in my previous reply.

I have attached it again here anyway.

Regards,
Tri Nguyen
Attachment: TestDoc.zip (1.99 MiB)

Post by Loïc » Mon Mar 12, 2018 5:36 pm

Hi,

The latest version correctly detects the provided image as non-blank.

Kind regards,

Loïc

Post by mtnguyen » Tue Mar 13, 2018 2:09 am

Thanks Loïc,

I will check out your latest release.

Kind Regards,
Tri

Post by mtnguyen » Mon Mar 19, 2018 1:31 am

Hi Loïc,

Firstly, I would like to thank the GdPicture team for the recent release. The blank page drop-out is now much better.

Secondly, what we are trying to do is print a document ID on the back of every page (at an adjustable position), and we expect the "account for margins" option in the IsBlank method to take care of it.
Could you please explain how AccountForMargins works in the IsBlank(Int32,Single,Boolean,Boolean) method:

public bool IsBlank(
int ImageID,
float Confidence,
bool AccountForMargins,
bool AccountForPunchHoles
)

I found a discussion regarding this. (viewtopic.php?t=4071)

If you need more info, please ask me.

Thank you for any help you can offer.
Tri
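
To make the question concrete, here is a conceptual sketch of what "accounting for margins" could mean: measure the share of white pixels only inside the page, ignoring a border of fixed width, so that marks printed near the edge do not count. This is purely an illustration of the idea and not how GdPicture implements `IsBlank`; `BlankRatio`, `WhitePercentInsideMargins`, the `bool[,]` page representation and the uniform margin are all assumptions made for the example.

```csharp
using System;

public static class BlankRatio
{
    // Percentage (0..100) of white pixels inside the page after ignoring a
    // border of 'margin' pixels on every side.
    // 'white[y, x]' is true when the pixel at row y, column x is white.
    public static double WhitePercentInsideMargins(bool[,] white, int margin)
    {
        if (margin < 0)
            throw new ArgumentOutOfRangeException(nameof(margin));

        int height = white.GetLength(0);
        int width = white.GetLength(1);
        int total = 0;
        int whiteCount = 0;

        for (int y = margin; y < height - margin; y++)
        {
            for (int x = margin; x < width - margin; x++)
            {
                total++;
                if (white[y, x])
                    whiteCount++;
            }
        }

        // An empty inner region is treated as fully white in this sketch.
        return total == 0 ? 100.0 : 100.0 * whiteCount / total;
    }
}
```

Under this model a document ID would only be ignored if it falls entirely within the excluded border, which is why the actual margin size used by the library matters for the adjustable position.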
