Lossless Methods: Optimization of Document Content

The first article of our PDF Optimization In-depth Series is available here. The PDF format is interactive. During the release cycle of a PDF document, different people will use tools such as forms, annotations, attachments, and more. Archived PDF documents generally do not have the same use as those in circulation in a collaborative context. […]

How to redact content in PDF…efficiently.

Don’t get your fingers burned with uncovered sensitive data. Have you ever heard of PDF redaction? If not, either you don’t have sensitive content to hide or you are doing it the wrong way. If so, ask yourself: am I redacting my files properly? In this article, you will learn how to redact PDF […]

General PDF, PDF optimization

Lossless Methods: Compression of Streams and Fonts

A PDF document is composed of various data structures, described as different objects. For optimization, one is particularly important: the stream object. This object stores binary data representing the content of an image, font file, color profile, embedded […]

New events

Deep Learning for Text Recognition

Machine learning for text recognition is transforming OCR (optical character recognition) by using advanced algorithms to improve accuracy and speed. In the previous articles of our series, we covered some of the deep learning techniques used in text detection, which is the first part of any OCR system. In this article, we will cover the second […]

blog Tutorial

Unlocking the Power of 2D Data Matrix Barcode Scanners with GdPicture.NET SDK

In the fast-paced world of digital data capture, businesses and developers rely on efficient barcode scanning solutions to streamline operations. Among the most versatile barcode types is the 2D Data Matrix barcode, widely used in manufacturing, healthcare, retail, and logistics. If you’re looking for a robust, accurate, and high-performance barcode scanning solution for .NET applications, […]

blog Tutorial

Read Text from PDFs Using C# with GdPicture.NET OCR

Sometimes, text in a scanned PDF isn’t selectable or searchable. This is where Optical Character Recognition (OCR) comes in. By using OCR, you can extract text from PDF files and save it in a file for editing or further processing. In this guide, you’ll learn how to use GdPicture.NET’s OCR engine to recognize text from […]

General PDF, PDF optimization

Optimization of Existing PDF files: Methods

February 14, 2025

This paper was originally a presentation in French delivered at the PDF Day – France by Loïc Carrère, CEO of ORPALIS, in April 2019. Organized by the PDF Association, PDF Days are the meeting place of the PDF industry, where experts conduct educational (non-commercial) presentations, panels, and discussion-based sessions about the format. The richness […]

webinar

Automatically Export a Table from PDF to Excel

May 22, 2023

To export a table from PDF to Excel, you need an efficient tool that ensures accurate data extraction while preserving the structure. This process is essential when dealing with complex tables in PDFs that need to be converted into editable, usable formats like Excel. Various methods, including software solutions and online tools, can help streamline […]

New events

Table Extraction Series – Part 3: Layout Understanding Approach

December 29, 2022

In this third article of the Table Extraction series, we’ll see how the GdPicture.NET engine goes beyond OCR and Deep Learning methods thanks to the Layout Understanding approach. You can read the first articles of the Table Extraction series here: Part 1: Challenges and Part 2: Deep Learning Approaches. The Layout Understanding approach. A performant table extraction […]

New events

Table Extraction Series – Part 1: Challenges

December 22, 2022

Extracting tables from PDFs and electronic documents can be easy or very complex, depending on the nature of the file. In this series, we’ll see how automatic table extraction can help companies overcome various challenges. We will also compare the different approaches available on the market (OCR, Deep Learning, and Layout Analysis) and tell […]
