PDF Optimization: An Overview



Hi everyone,

We continue our PDF Optimization series (started with two articles about fonts, Pack and Optimize Fonts in PDF and Fonts Optimization in PDF) with a broader overview of what you can do to optimize PDFs.

It’s no news that electronic documents are everywhere.
Electronic invoices in PDF format, for example, make up a considerable share of the documents in circulation, together with many other documents destined for archiving. All of them need to be stored. As we like to say, storage space is like the air we breathe: we only appreciate its real value when we are at risk of running out of it.

So, the time has come to offer a super-fast and powerful PDF compression tool within the GdPicture.NET toolkit. It can help anyone reduce existing PDF files by up to 80% more than competing products.
Please welcome the GdPicture.NET PDF Reducer SDK!

PDF Reducer SDK

The newly introduced GdPicturePDFReducer class provides innovative and highly sophisticated techniques based on ten years of continuous research. With the help of the PDFReducerConfiguration class, you can selectively apply features that address all areas of compression and optimization, with a focus on font optimization, data compression, and image analysis.

Optimizing PDF documents in general

The usage is effortless and straightforward. Here is what it looks like:

GdPicturePDFReducer gdpicturePDFReducer = new GdPicturePDFReducer();

//PDFReducerConfiguration class provides different properties and useful options.
gdpicturePDFReducer.PDFReducerConfiguration.Author = "GdPicture.NET PDF Reducer SDK";
gdpicturePDFReducer.PDFReducerConfiguration.ProducerName = "Orpalis";
gdpicturePDFReducer.PDFReducerConfiguration.Title = "Sample document";

//When compressing your PDF files, you have the possibility to decide which version of PDF to use.
gdpicturePDFReducer.PDFReducerConfiguration.OutputFormat = PDFReducerPDFVersion.PdfVersionRetainExisting;

//By setting options on the PDFReducerConfiguration class,
//you enable or disable the features you want to emphasize. For example,
//the following option greatly reduces the output file size by focusing on fonts.
gdpicturePDFReducer.PDFReducerConfiguration.PackFonts = true;

//Packing the document content before saving.
gdpicturePDFReducer.PDFReducerConfiguration.PackDocument = true;

//Processing the specified document.
GdPictureStatus status = gdpicturePDFReducer.ProcessDocument("input.pdf", "output.pdf");

See it in the documentation
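Since ProcessDocument returns a GdPictureStatus, it is worth checking the result before using the output file. The following is only a minimal sketch: the GdPicture14 namespace is an assumption that depends on the installed GdPicture.NET version, license registration is omitted, and the file names are the placeholders used above.

```csharp
using System;
using System.IO;
using GdPicture14; // Assumed namespace; adjust it to your installed GdPicture.NET version.

class ReducerStatusCheck
{
    static void Main()
    {
        // License registration is omitted here; the toolkit may require it before processing.
        GdPicturePDFReducer gdpicturePDFReducer = new GdPicturePDFReducer();
        gdpicturePDFReducer.PDFReducerConfiguration.PackFonts = true;
        gdpicturePDFReducer.PDFReducerConfiguration.PackDocument = true;

        GdPictureStatus status = gdpicturePDFReducer.ProcessDocument("input.pdf", "output.pdf");

        if (status == GdPictureStatus.OK)
        {
            // Compare the file sizes on disk to see what the reduction achieved.
            long before = new FileInfo("input.pdf").Length;
            long after = new FileInfo("output.pdf").Length;
            Console.WriteLine("Reduced from " + before + " to " + after + " bytes.");
        }
        else
        {
            Console.WriteLine("The document could not be processed. Status: " + status);
        }
    }
}
```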

Content removal

Let’s go deeper to see how robust this new feature is.
PDF optimization is about chaining several compression algorithms to overcome the limitations of individual compression schemes, while removing unwanted or unused objects and applying several other techniques when necessary (you can find more details on content stream compression and object stream generation in our detailed article).

To entice you to read it, here are examples of the compression we can deliver: from 726 KB to 63 KB, or from 54.5 MB to 15.69 MB. You can remove redundant, unwanted, or unused objects from a PDF document by directly selecting which objects you want to eliminate:

//Content removal.
gdpicturePDFReducer.PDFReducerConfiguration.RemoveAnnotations = true;
gdpicturePDFReducer.PDFReducerConfiguration.RemoveBlankPages = true;
gdpicturePDFReducer.PDFReducerConfiguration.RemoveBookmarks = true;
gdpicturePDFReducer.PDFReducerConfiguration.RemoveEmbeddedFiles = true;
gdpicturePDFReducer.PDFReducerConfiguration.RemoveFormFields = true;
gdpicturePDFReducer.PDFReducerConfiguration.RemoveHyperlinks = true;
gdpicturePDFReducer.PDFReducerConfiguration.RemoveJavaScript = true;
gdpicturePDFReducer.PDFReducerConfiguration.RemoveMetadata = true;
gdpicturePDFReducer.PDFReducerConfiguration.RemovePageThumbnails = true;

See it in the documentation
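To put the sizes quoted in this section into perspective, here is a small illustrative calculation of the space saved. The PercentSaved helper is a hypothetical name introduced only for this sketch; it is not part of the SDK.

```csharp
using System;
using System.Globalization;

static class ReductionMath
{
    // Returns the percentage of the original size that was saved.
    // Both sizes must be expressed in the same unit (KB, MB, ...).
    public static double PercentSaved(double originalSize, double reducedSize)
    {
        return (1.0 - reducedSize / originalSize) * 100.0;
    }

    static void Main()
    {
        // Invariant culture keeps the decimal separator a dot on every machine.
        CultureInfo inv = CultureInfo.InvariantCulture;
        Console.WriteLine("726 KB -> 63 KB: " + PercentSaved(726, 63).ToString("F1", inv) + "% saved");
        Console.WriteLine("54.5 MB -> 15.69 MB: " + PercentSaved(54.5, 15.69).ToString("F1", inv) + "% saved");
    }
}
```

For the two examples above, this works out to roughly 91.3% and 71.2% of the original size saved.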

Re-compressing images

Images are the next big part of the optimization process.
You can take control of image compression by recompressing the existing images in the PDF document. Lowering unnecessarily high resolutions can dramatically reduce the file size without affecting the viewing experience. Another benefit is that the engine optimizes an image only when doing so reduces its size.

//Recompressing images.
gdpicturePDFReducer.PDFReducerConfiguration.RecompressImages = true;
gdpicturePDFReducer.PDFReducerConfiguration.ImageQuality = PDFReducerImageQuality.ImageQualityMedium;
//Reducing the size by decreasing the image resolution.
gdpicturePDFReducer.PDFReducerConfiguration.DownscaleImages = true;
gdpicturePDFReducer.PDFReducerConfiguration.DownscaleResolution = 200;

See it in the documentation

Control image compression

For further details on image compression with the JPEG 2000 (for high-definition images) and JBIG2 (for bitonal images) schemes, we again invite you to read our article.
With dedicated settings for reducing images in the PDF document you can save, for example, up to 70.43% of space:

//Automatic color detection.
gdpicturePDFReducer.PDFReducerConfiguration.EnableColorDetection = true;
//JBIG2 and JPEG 2000 settings.
gdpicturePDFReducer.PDFReducerConfiguration.EnableJBIG2 = true;
gdpicturePDFReducer.PDFReducerConfiguration.JBIG2PMSThreshold = 0.65f;
gdpicturePDFReducer.PDFReducerConfiguration.EnableJPEG2000 = true;
//Repairing characters.
gdpicturePDFReducer.PDFReducerConfiguration.EnableCharRepair = true;

See it in the documentation

MRC compression

The tool we are most proud of is our compression engine based on MRC (Mixed Raster Content) techniques, also known as Hyper Compression.
In some cases it can make images 8 to 10 times smaller than JPEG while also improving the rendering quality of the compressed documents. This approach uses image segmentation to compress each area with the algorithm best suited to its characteristics. The method produces optimal results with documents that mix text, graphics, and images.

//Enabling MRC compression.
gdpicturePDFReducer.PDFReducerConfiguration.EnableMRC = true;
gdpicturePDFReducer.PDFReducerConfiguration.DownscaleResolutionMRC = 200;
gdpicturePDFReducer.PDFReducerConfiguration.PreserveSmoothing = true;

See it in the documentation

Fonts optimization

Font optimization changes little to nothing in how the PDF document is displayed, but it has a significant impact on file size.
For example, deduplicating fonts or reducing embedded fonts to only the characters actually used can shrink a file from 527 KB to 34 KB. For more details, please refer to our article.

//Packing fonts.
gdpicturePDFReducer.PDFReducerConfiguration.PackFonts = true;

And finally, here is the option you can use to keep a PDF document comfortable to read, even when the file is large:

//Fast web view (linearization).
gdpicturePDFReducer.PDFReducerConfiguration.FastWebView = true;

As we said before, storage is essential for electronic documents to survive.
So, go ahead and try the GdPicture.NET PDF Reducer SDK to free up storage for the other important aspects of our lives.

And of course, we’re here if you have any questions or if you need any help with your PDF files.

See you next time!

Gabriela

