PDF + OCR not PDF/A

rromeijn · Post by **rromeijn** » Tue Jul 06, 2010 1:20 pm

How can I save the image as a plain PDF with OCR?
This sample code:

Imaging1.CreateImageFromFile ("image.tif")
Imaging1.SaveAsPDFOCR("output.pdf", TesseractDictionaryEnglish, App.Path & "\AppData")  'AppData includes dictionary files
Imaging1.CloseNativeImage

saves the image as PDF/A + OCR.
I want plain PDF + OCR.

rromeijn · Post by **rromeijn** » Mon Aug 16, 2010 11:24 am

A lot of views, but not one reply.

eagleman · Post by **eagleman** » Mon Aug 16, 2010 3:22 pm

@rromeijn

I do the following:

imageID = Imaging1.CreateGdPictureImageFromFile("00000001.JPG");
iPdfId = Imaging1.TwainPdfStart("00000001.PDF", true, "", "", "", "", "");
Imaging1.TwainAddGdPictureImageToPdf(iPdfId, imageID);
Imaging1.TwainPdfStop(iPdfId);
Imaging1.ReleaseGdPictureImage(imageID);

Good luck.

Eagleman

rromeijn · Post by **rromeijn** » Mon Aug 16, 2010 3:25 pm

Thanks,

but that saves the image as a PDF without OCR.
I need PDF with OCR, but not PDF/A with OCR.

Post by **Loïc** » Mon Aug 16, 2010 3:31 pm

Hi,

This option is not available.
Why is PDF/A a problem for you? PDF/A is certified 100% PDF compliant.

A workaround is to remove the PDF/A flag by replacing the header information "%âãÏÓ" with " " in the generated PDF. But in my humble opinion, it makes no sense to do that...

Kind regards,

Loïc
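
The workaround Loïc describes above could be sketched in Python roughly as follows. This is only an illustration under stated assumptions: the marker "%âãÏÓ" is assumed to be stored in the file as the Latin-1 bytes 25 E2 E3 CF D3, the function names are hypothetical, and padding with the same number of spaces (rather than a single " ") is a choice made here so that byte offsets elsewhere in the file do not shift. Whether a given viewer or validator then stops treating the file as PDF/A is not confirmed in this thread.

```python
from pathlib import Path

# "%âãÏÓ" written out as raw Latin-1 bytes (assumption: this is how the
# marker is stored in the generated PDF).
BINARY_MARKER = b"%\xe2\xe3\xcf\xd3"


def blank_binary_marker(pdf_bytes: bytes) -> bytes:
    """Overwrite the first occurrence of the marker with spaces."""
    # Same-length padding keeps all later byte offsets unchanged.
    padding = b" " * len(BINARY_MARKER)
    return pdf_bytes.replace(BINARY_MARKER, padding, 1)


def strip_marker_from_file(src: str, dst: str) -> None:
    """Hypothetical helper: read src, blank the marker, write to dst."""
    Path(dst).write_bytes(blank_binary_marker(Path(src).read_bytes()))
```

A call such as `strip_marker_from_file("output.pdf", "output_plain.pdf")` would write a modified copy and leave the original file untouched.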

eagleman · Post by **eagleman** » Mon Aug 16, 2010 8:58 pm

To do OCR on image and save as PDF:

imageID = Imaging1.CreateGdPictureImageFromFile("00000001.JPG");
iPdfId = Imaging1.PdfOCRStart("00000001.PDF", true, "", "", "", "", "");
Imaging1.PdfAddGdPictureImageToPdfOCR(iPdfId,
    imageID,
    GdPicture.TesseractDictionary.TesseractDictionaryDutch,
    Application.StartupPath.ToString() + "\\OCR",
    "");
Imaging1.PdfOCRStop(iPdfId);
Imaging1.ReleaseGdPictureImage(imageID);

Eagleman

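Note: According to the manual, the second parameter of PdfOCRStart is a boolean: True to generate the PDF in PDF/A format, otherwise False. So passing false there, instead of the true shown above, should give a plain PDF with OCR.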

rromeijn · Post by **rromeijn** » Tue Aug 17, 2010 8:30 am

Eagleman,

According to my manual, this function doesn't even exist.

rromeijn · Post by **rromeijn** » Tue Aug 17, 2010 8:34 am

Loic,

As you know, the PDF/A format has several restrictions that plain PDF (1.3) does not have
(hyperlinks are not allowed).
I also have a customer who can only display PDF up to version 1.3 in his (expensive) software.

I will look into the option you described, but an option to save plain PDF would be nice.

eagleman · Post by **eagleman** » Tue Aug 17, 2010 5:08 pm

@rromeijn,

Make sure you have the latest manual. The manual does not show a version number, but its file name is "GdPicture_NET Document Imaging SDK.pdf" and it is about 7.1 MB.

The function I mentioned does exist. Try the code I wrote earlier.

Good luck.

Regards,
Eagleman
