Methods with Losses: Mixed Raster Content (MRC) Compression
Previously in our PDF Optimization In-depth Series:
- 1. Introduction to the Optimization of Existing PDF files: Methods
- 2. Lossless Methods: Optimization of Document Content
- 3. Lossless Methods: Compression of Streams and Fonts
- 4. Methods with Losses: JPEG 2000 and JBIG2
- 5. Methods with losses: resizing, color detection, and other techniques
MRC compression is highly efficient at reducing the size of PDF documents without visibly degrading their rendering.
MRC, also known as hyper-compression, is an image compression technique based on image segmentation. It is particularly useful for images that contain text and continuous-tone graphics.
This method can reduce file size by up to 8 to 10 times compared to JPEG.
In some cases, it also improves the rendering quality of documents.
The method combines many techniques, among which segmentation plays a key role.
The crucial point is to separate certain regions of an image into three separate images, the so-called layers.
Each of these layers is then processed and compressed optimally. The PDF specification allows the original document to be reconstructed from these layers using specific rendering instructions.
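To make the reconstruction concrete, here is a minimal Python sketch of the usual three-layer MRC compositing rule: wherever the binary layer is set, a pixel takes its color from the foreground layer; everywhere else it comes from the background layer. This is an illustration under our own assumptions, not the engine's implementation or a PDF renderer's code: images are grayscale nested lists, and layers stored at a lower resolution are upsampled with nearest-neighbour lookup. The names `compose_mrc` and `sample` are hypothetical.

```python
def sample(layer, y, x, full_h, full_w):
    """Nearest-neighbour lookup in a layer that may be stored at lower resolution."""
    h, w = len(layer), len(layer[0])
    return layer[y * h // full_h][x * w // full_w]


def compose_mrc(mask, foreground, background):
    """Rebuild the page: mask == 1 selects the foreground, 0 the background.

    The mask (binary layer) is assumed to be at full resolution.
    """
    full_h, full_w = len(mask), len(mask[0])
    return [
        [
            sample(foreground, y, x, full_h, full_w) if mask[y][x]
            else sample(background, y, x, full_h, full_w)
            for x in range(full_w)
        ]
        for y in range(full_h)
    ]
```

For example, a 2×4 mask `[[0, 1, 1, 0], [0, 1, 1, 0]]` combined with a single-pixel foreground `[[10]]` and a 1×2 background `[[200, 220]]` gives `[[200, 10, 10, 220], [200, 10, 10, 220]]`: the low-resolution layers cost very little, while the sharp edges come entirely from the binary layer.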
Let’s describe these layers:
- The binary layer
It holds the text and graphic elements, such as lines and flat areas, in uniform (single-color) form. This layer is ideally saved at high resolution and compressed with the JBIG2 algorithm.
- The background layer
It is what remains of the original image once the content of the binary layer has been removed. Its resolution is set by the user or by decision algorithms. It is then ideally saved with the JPEG 2000 compression algorithm.
- The foreground layer
It contains the colors of the binary layer, so it acts, in a way, as that layer’s chromatic channel. As with the color channels in JPEG and JPEG 2000 compression, the resolution of this layer can be reduced significantly. It is usually compressed with JPEG 2000.
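The three layers can be illustrated with a toy Python sketch. It assumes a grayscale image and builds the binary layer with a plain global threshold, which is only a stand-in for real segmentation. The foreground (colors of the binary-layer pixels) and the background (everything else) are then stored at reduced resolution by block averaging. The names, the threshold of 128, the downsampling factor and the fill values for empty blocks are all hypothetical choices made for illustration.

```python
def threshold_mask(image, threshold=128):
    """Hypothetical binarization: dark pixels (text, lines) become 1."""
    return [[1 if v < threshold else 0 for v in row] for row in image]


def block_average(image, mask, want, factor, fill):
    """Downsample by `factor`, averaging only pixels whose mask value == want.

    Blocks that contain no such pixel receive the value `fill`.
    """
    h, w = len(image), len(image[0])
    out = []
    for by in range(0, h, factor):
        out_row = []
        for bx in range(0, w, factor):
            values = [
                image[y][x]
                for y in range(by, min(by + factor, h))
                for x in range(bx, min(bx + factor, w))
                if mask[y][x] == want
            ]
            out_row.append(sum(values) // len(values) if values else fill)
        out.append(out_row)
    return out


def split_mrc(image, factor=2, threshold=128):
    """Return (binary layer, foreground layer, background layer)."""
    mask = threshold_mask(image, threshold)  # kept at full resolution
    # Foreground: colors of the binary-layer pixels, at reduced resolution.
    foreground = block_average(image, mask, 1, factor, fill=0)
    # Background: what remains once the binary-layer pixels are removed.
    background = block_average(image, mask, 0, factor, fill=255)
    return mask, foreground, background
```

For the 2×4 image `[[200, 30, 40, 220], [210, 20, 50, 230]]`, this yields the mask `[[0, 1, 1, 0], [0, 1, 1, 0]]`, a foreground of `[[25, 45]]` and a background of `[[205, 225]]`. A production encoder would fill the holes left in the background more carefully than a flat average, but the division of labor between the layers is the same.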
How the MRC engine works
Here are the essential steps the engine follows:
1. Pretreatment of the image
The primary goal is to improve the accuracy of the next step.
2. Segmentation
This is the detection of paragraphs, lines, words, characters, graphic elements, etc.
Each segmented element needs to be placed on the most suitable layer.
The goal here is to improve the compression of each layer as well as the quality of the final rendering, using techniques such as noise suppression and contrast adjustment.
6. Compression of each layer to create the resulting compressed image.
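The sequence of steps can be sketched end to end in Python. This toy version is not the engine’s algorithm, and every name and parameter in it is hypothetical: pretreatment is a linear contrast stretch, segmentation groups dark pixels into 4-connected components instead of detecting paragraphs, words and characters, components smaller than `min_size` are kept off the binary layer as noise, and zlib stands in for JBIG2 and JPEG 2000, which are not part of the Python standard library.

```python
import zlib
from collections import deque


def pretreat(image):
    """Pretreatment (toy): stretch gray levels linearly to the full 0-255 range."""
    lo = min(min(row) for row in image)
    hi = max(max(row) for row in image)
    if hi == lo:
        return [row[:] for row in image]
    return [[(v - lo) * 255 // (hi - lo) for v in row] for row in image]


def segment(image, threshold=128):
    """Segmentation (toy): group dark pixels into 4-connected components."""
    h, w = len(image), len(image[0])
    seen = [[False] * w for _ in range(h)]
    components = []
    for y in range(h):
        for x in range(w):
            if seen[y][x] or image[y][x] >= threshold:
                continue
            seen[y][x] = True
            component, queue = [], deque([(y, x)])
            while queue:  # breadth-first flood fill
                cy, cx = queue.popleft()
                component.append((cy, cx))
                for ny, nx in ((cy + 1, cx), (cy - 1, cx), (cy, cx + 1), (cy, cx - 1)):
                    inside = 0 <= ny < h and 0 <= nx < w
                    if inside and not seen[ny][nx] and image[ny][nx] < threshold:
                        seen[ny][nx] = True
                        queue.append((ny, nx))
            components.append(component)
    return components


def build_mask(height, width, components, min_size=2):
    """Layer assignment (toy): components of at least `min_size` pixels go to
    the binary layer; smaller specks are treated as noise and left out."""
    mask = [[0] * width for _ in range(height)]
    for component in components:
        if len(component) >= min_size:
            for y, x in component:
                mask[y][x] = 1
    return mask


def compress_layer(layer):
    """Compression stand-in: zlib instead of JBIG2 / JPEG 2000."""
    return zlib.compress(bytes(v for row in layer for v in row))


def mrc_pipeline(image, threshold=128, min_size=2):
    clean = pretreat(image)
    h, w = len(clean), len(clean[0])
    mask = build_mask(h, w, segment(clean, threshold), min_size)
    # Foreground: one average color for all binary-layer pixels (a 1x1 layer).
    fg_values = [v for row, mrow in zip(clean, mask) for v, m in zip(row, mrow) if m]
    foreground = [[sum(fg_values) // len(fg_values) if fg_values else 0]]
    # Background: binary-layer pixels replaced by white (a crude hole fill).
    background = [[255 if m else v for v, m in zip(row, mrow)]
                  for row, mrow in zip(clean, mask)]
    layers = {"mask": mask, "foreground": foreground, "background": background}
    return layers, {name: compress_layer(layer) for name, layer in layers.items()}
```

On a small image containing a 2×2 dark block and a single isolated dark pixel, the block ends up on the binary layer, while the lone pixel, being below `min_size`, stays in the background. Keeping each stage as a separate function mirrors the step-by-step structure above and makes it easy to swap in a real segmenter or codec.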
Here are some numbers for comparison.
Our recent blog article about MRC shows more interesting examples of file size reduction.
You can find an example of usage here.
We’re reaching the end of our PDF Optimization In-depth Series!
We hope you found these posts useful and that they helped you with the never-ending task of PDF compression…
You can download the original article on which this series was based here.
Because we love working on complex files and we cannot say no to a challenge, please feel free to send us any file that causes you trouble. We’ll make sure (we’ll try our best!) our engine gives you a satisfactory result.
Loïc & Elodie