Extracting images from PDF in the most performant way possible

Post by caiosm1005 » Fri Sep 08, 2023 1:32 am

Hi all!

We have a use case in which users may extract images from PDF files on demand. We are using GdPicture.NET v14.

Our archive of PDF files is hundreds of GB in size, so we'd like to avoid running a batch script to pre-extract and cache all images from all documents, as this would virtually double storage consumption. Moreover, only a small percentage of those documents will ever have their images requested.

Since the extraction will run on demand, in real time at the user's request, performance is critical. I am trying to figure out a way to extract the images and return their binary content to the client as-is, with as little image processing as possible.

I tried using GdPictureImaging.SaveAsByteArray(), but I noticed that it takes a file format parameter. I don't need to convert the image; I only want to grab it as a byte[], with no compression or re-encoding.

This led me to believe that maybe there's a faster method. Is using GdPictureImaging.GetBitmapFromGdPictureImage() more suitable for this, then? I'd imagine that it just scans the image data and returns it as a Bitmap without any image processing whatsoever. Could this be confirmed?

Thank you!
