Extracting images from PDF in the most performant way possible

Post by **caiosm1005** » Fri Sep 08, 2023 1:32 am

Hi all!

We have a use case in which users can extract images from PDF files on demand. We are using GdPicture.NET v14.

Our archive of PDF files is hundreds of GB in size, so we'd like to avoid running a batch script to pre-extract and cache every image from every document, as this would roughly double our storage consumption. Moreover, only a small percentage of those documents will ever have their images extracted on demand.

Since the extraction will run on demand, in real time, upon a user's request, performance is critical. I am trying to find a way to extract the images and return their binary content to the client as-is, with as little image processing as possible.

I tried using GdPictureImaging.SaveAsByteArray(), but I noticed that it takes a file format parameter. I don't need to convert the image; I only want to grab it as a byte[], with no compression or encoding.

This made me think there might be a faster method. Is GdPictureImaging.GetBitmapFromGdPictureImage() more suitable for this, then? I'd imagine that it simply reads the image data and returns it as a Bitmap without any image processing whatsoever. Can anyone confirm this?

Thank you!