How can I remove blank pages from a PDF?

Example requests & Code samples for GdPicture Toolkits.
sharale
Posts: 2
Joined: Fri Dec 30, 2011 11:30 am


Post by sharale » Fri Dec 30, 2011 11:46 am

I have a PDF made up of scanned pages. Some of the pages are blank, and they get scanned into the file as they are.
I need to load the file using GdPicturePDF or GdPictureImaging, run through the pages, and check whether each page is blank (no image content and no text).
If a page is blank, I should remove it.
If a page contains a color image, I should convert it to grayscale or black and white, which will reduce the size of the newly generated file.
I am having trouble understanding how to loop through the pages and move to the next page.
Please let me know how I should proceed. I need a code sample.

I am using the code below. It works for text in the PDF, but not for images in the file.


Code:

// Requires: using System.IO; using GdPicture;
private void ProcessPDF()
{
    GdPicture.LicenseManager OBJ_License = new GdPicture.LicenseManager();
    OBJ_License.RegisterKEY("XXX");
    // Get all the PDF files from the folder and its subfolders
    FileInfo[] allFiles = new System.IO.DirectoryInfo(@"D:\PDF\OLD\").GetFiles("*.pdf", SearchOption.AllDirectories);

    foreach (var f in allFiles) // Loop through each file
    {
        GdPicturePDF oGdPicturePDF = new GdPicturePDF();
        oGdPicturePDF.LoadFromFile(f.FullName, false); // Load the file into the PDF object
        oGdPicturePDF.EnableCompression(true);
        int pages = oGdPicturePDF.GetPageCount(); // Count the pages in the file

        // Walk backwards so that deleting a page does not shift the pages still to be checked
        for (int i = pages; i >= 1; i--)
        {
            oGdPicturePDF.SelectPage(i);
            string txt = oGdPicturePDF.GetPageText(); // Get the text on that page

            if (string.IsNullOrEmpty(txt)) // Check for text on the page
            {
                oGdPicturePDF.DeletePage(i); // The original used PageNO here, which was never declared
            }
        }
        oGdPicturePDF.SaveToFile(@"D:\PDF\NEW\" + f.Name, false); // Save the file to the new location
        oGdPicturePDF.CloseDocument(); // Close the document
    }
}

Loïc
Site Admin
Posts: 5881
Joined: Tue Oct 17, 2006 10:48 pm
Location: France

Re: How Can i Remove the Blank Pages from PDF

Post by Loïc » Fri Dec 30, 2011 3:15 pm

Hi,

Here is a suggestion.

Code:

            const int RASTER_DPI = 200;
            GdPictureImaging oGdPictureImaging = new GdPictureImaging();
            GdPicturePDF oInputPDF = new GdPicturePDF();
            GdPicturePDF oOutputPDF = new GdPicturePDF();
            oInputPDF.LoadFromFile(@"c:\input.pdf", false);
            oOutputPDF.NewPDF();
            for (int i = 1; i <= oInputPDF.GetPageCount(); i++)
            {
                oInputPDF.SelectPage(i);
                //render the page to a raster image so it can be tested for blankness
                int rasterImageID = oInputPDF.RenderPageToGdPictureImageEx(RASTER_DPI, false);
                if (!oGdPictureImaging.IsBlank(rasterImageID))
                {
                    //page is not blank: threshold it to 1bpp and add it to the output PDF
                    oGdPictureImaging.ConvertTo1BppAT(rasterImageID);
                    oOutputPDF.AddImageFromGdPictureImage(rasterImageID, false, true);
                }
                oGdPictureImaging.ReleaseGdPictureImage(rasterImageID);
            }
            oInputPDF.CloseDocument();
            oOutputPDF.SaveToFile(@"c:\newpdf.pdf", false);
            oOutputPDF.CloseDocument();
Kind regards,

Loïc

sharale
Posts: 2
Joined: Fri Dec 30, 2011 11:30 am

Re: How Can i Remove the Blank Pages from PDF

Post by sharale » Sat Dec 31, 2011 10:34 am

It works great!

I tested it with a 7.49 MB sample file. It was reduced to 336 KB and the quality is still good: the output files are readable, graphics included.

Thanks

Loïc
Site Admin
Posts: 5881
Joined: Tue Oct 17, 2006 10:48 pm
Location: France

Re: How Can i Remove the Blank Pages from PDF

Post by Loïc » Thu Jan 05, 2012 9:29 pm

Hello,

Here is a better version that works with GdPicture.NET 8.5.3 and higher:

Code:

            const int RASTER_DPI = 200;
            GdPictureImaging oGdPictureImaging = new GdPictureImaging();
            GdPicturePDF oGdPicturePDF = new GdPicturePDF();
            oGdPicturePDF.LoadFromFile(@"c:\input.pdf", false);
            oGdPicturePDF.SetMeasurementUnit(PdfMeasurementUnit.PdfMeasurementUnitPoint);
            oGdPicturePDF.EnableCompression(true);
            for (int i = 1; i <= oGdPicturePDF.GetPageCount(); i++)
            {
                oGdPicturePDF.SelectPage(i);
                int rasterImageID = oGdPicturePDF.RenderPageToGdPictureImageEx(RASTER_DPI, false);
                int bitDepth = oGdPictureImaging.GetBitDepth(rasterImageID);
                if (oGdPictureImaging.IsBlank(rasterImageID))
                {
                    //page is blank, we remove it
                    oGdPicturePDF.DeletePage(i--);
                }
                else
                {
                    //if the page is based on a single color image, we convert it to 1bpp. Warning: not 100% safe
                    if (bitDepth > 8 && !oGdPicturePDF.PageHasText() && !oGdPicturePDF.PageHasShape() && oGdPicturePDF.GetPageImageCount() == 1)
                    {
                        if (oGdPictureImaging.ConvertTo1BppAT(rasterImageID) == GdPictureStatus.OK)
                        {
                            //we remove the page then create a new one to remove unused resources on pack
                            float pageWidth = oGdPicturePDF.GetPageWidth();
                            float pageHeight = oGdPicturePDF.GetPageHeight();
                            oGdPicturePDF.DeletePage(i);
                            oGdPicturePDF.InsertPage(pageWidth, pageHeight, i);
                            string pdfImageResName = oGdPicturePDF.AddImageFromGdPictureImage(rasterImageID, false, false);
                            if (pdfImageResName != "")
                            {
                                oGdPicturePDF.DrawImage(pdfImageResName, 0, 0, pageWidth, pageHeight);
                            }
                            else
                            {
                                throw new Exception("Error embedding bitmap in PDF");
                            }
                        }
                        else
                        {
                            throw new Exception("Error during bitmap thresholding");
                        }
                    }
                }
                oGdPictureImaging.ReleaseGdPictureImage(rasterImageID);
            }
            oGdPicturePDF.SaveToFile(@"c:\newpdf.pdf", true); //we have to pack the doc to remove unused embedded bitmaps
            oGdPicturePDF.CloseDocument();
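
The `DeletePage(i--)` call in the loop above compensates for the remaining pages shifting down by one after a deletion. Here is a minimal sketch of the same index handling that does not depend on GdPicture. It uses a plain `List<string>` as a stand-in for the document. The `PageFilter` class, the `RemoveBlank` method, and the convention that an empty string means a blank page are all hypothetical and only serve the illustration.

```csharp
using System;
using System.Collections.Generic;

public static class PageFilter
{
    // Removes blank "pages" in place while iterating forward.
    // Positions are 1-based to mirror the PDF loop; an empty string stands for a blank page.
    public static void RemoveBlank(List<string> pages)
    {
        for (int i = 1; i <= pages.Count; i++)
        {
            if (pages[i - 1].Length == 0)
            {
                // After the removal, the following page slides into position i,
                // so step back by one to re-check position i on the next pass.
                pages.RemoveAt(i - 1);
                i--;
            }
        }
    }
}
```

Without the `i--`, the second of two consecutive blank pages would be skipped, because it moves into the position that was just checked. Iterating from the last page down to the first avoids the adjustment altogether.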
