PDF + OCR not PDF/A

Discussions about machine vision support in GdPicture.
rromeijn
Posts: 28
Joined: Fri Jun 18, 2010 4:21 pm

Post by rromeijn » Tue Jul 06, 2010 1:20 pm

How can I save the image as a plain PDF with OCR?
This sample code:

Imaging1.CreateImageFromFile ("image.tif")
Imaging1.SaveAsPDFOCR("output.pdf", TesseractDictionaryEnglish, App.Path & "\AppData")  'AppData includes dictionary files
Imaging1.CloseNativeImage
saves the image as PDF/A + OCR.
I want plain PDF + OCR.

rromeijn
Posts: 28
Joined: Fri Jun 18, 2010 4:21 pm

Re: PDF + OCR not PDF/A

Post by rromeijn » Mon Aug 16, 2010 11:24 am

A lot of views, but not one reply.

eagleman
Posts: 27
Joined: Mon Jan 25, 2010 1:48 pm

Re: PDF + OCR not PDF/A

Post by eagleman » Mon Aug 16, 2010 3:22 pm

@rromeijn

I do the following:

imageID = Imaging1.CreateGdPictureImageFromFile("00000001.JPG");
iPdfId = Imaging1.TwainPdfStart("00000001.PDF", true, "", "", "", "", "");
Imaging1.TwainAddGdPictureImageToPdf(iPdfId, imageID);
Imaging1.TwainPdfStop(iPdfId);
Imaging1.ReleaseGdPictureImage(imageID);


Good luck.

Eagleman

rromeijn
Posts: 28
Joined: Fri Jun 18, 2010 4:21 pm

Re: PDF + OCR not PDF/A

Post by rromeijn » Mon Aug 16, 2010 3:25 pm

Thanks,

but that saves the image as a PDF without OCR.
I need a PDF with OCR, but not a PDF/A with OCR.

Loïc
Site Admin
Posts: 5881
Joined: Tue Oct 17, 2006 10:48 pm
Location: France

Re: PDF + OCR not PDF/A

Post by Loïc » Mon Aug 16, 2010 3:31 pm

Hi,

This option is not available.
Why is PDF/A a problem for you? PDF/A is certified 100% PDF compliant.

A workaround is to remove the PDF/A flag by replacing the header information "%âãÏÓ" with " " in the generated PDF. But in my humble opinion there is no point in doing that...

Kind regards,

Loïc

eagleman
Posts: 27
Joined: Mon Jan 25, 2010 1:48 pm

Re: PDF + OCR not PDF/A

Post by eagleman » Mon Aug 16, 2010 8:58 pm

To run OCR on an image and save it as a PDF:

imageID = Imaging1.CreateGdPictureImageFromFile("00000001.JPG");
// 2nd argument: true = PDF/A, false = plain PDF (according to the manual)
iPdfId = Imaging1.PdfOCRStart("00000001.PDF", false, "", "", "", "", "");
Imaging1.PdfAddGdPictureImageToPdfOCR(iPdfId,
    imageID,
    GdPicture.TesseractDictionary.TesseractDictionaryDutch,
    Application.StartupPath + "\\OCR",
    "");
Imaging1.PdfOCRStop(iPdfId);
Imaging1.ReleaseGdPictureImage(imageID);


Eagleman

Note: According to the manual, the second parameter of PdfOCRStart is a boolean: True to generate the PDF in PDF/A format, otherwise False. So passing False should give a plain PDF.

rromeijn
Posts: 28
Joined: Fri Jun 18, 2010 4:21 pm

Re: PDF + OCR not PDF/A

Post by rromeijn » Tue Aug 17, 2010 8:30 am

Eagleman,

According to my manual, this function doesn't even exist.

rromeijn
Posts: 28
Joined: Fri Jun 18, 2010 4:21 pm

Re: PDF + OCR not PDF/A

Post by rromeijn » Tue Aug 17, 2010 8:34 am

Loïc,

As you know, the PDF/A format has several restrictions that plain PDF (1.3) does not have
(hyperlinks are not allowed).
I also have a customer whose (expensive) software can only display PDF up to version 1.3.

I will look into the option you described, but an option to save plain PDF would be nice.
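
For anyone who wants to check what a generated file actually declares, here is a minimal Python sketch that does not depend on GdPicture. It reads the version from the `%PDF-x.y` header and looks for the `pdfaid:part` entry that PDF/A files carry in their XMP metadata. The function names are made up for this example, and the check assumes the XMP packet is stored uncompressed, so treat a negative result as a hint rather than proof.

```python
import re


def pdf_header_version(data: bytes):
    """Return the version from the %PDF-x.y header, or None if no header is found."""
    # Only the start of the file is searched; the header belongs on the first line.
    match = re.search(rb"%PDF-(\d\.\d)", data[:1024])
    return match.group(1).decode("ascii") if match else None


def claims_pdfa(data: bytes) -> bool:
    """True if the raw bytes contain a PDF/A identification entry (pdfaid:part)."""
    # Assumption: the XMP metadata is not compressed, so a plain byte search can see it.
    return b"pdfaid:part" in data


def describe_pdf(path: str) -> str:
    """Summarize the declared PDF version and whether the file claims PDF/A."""
    with open(path, "rb") as handle:
        data = handle.read()
    version = pdf_header_version(data) or "unknown"
    kind = "PDF/A marker found" if claims_pdfa(data) else "no PDF/A marker found"
    return f"version {version}, {kind}"
```

Calling `describe_pdf("output.pdf")` on the file written by the SDK shows whether the declared version is above 1.3, which is what matters for a viewer limited to that version.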

eagleman
Posts: 27
Joined: Mon Jan 25, 2010 1:48 pm

Re: PDF + OCR not PDF/A

Post by eagleman » Tue Aug 17, 2010 5:08 pm

@rromeijn,

Make sure you have the latest manual. The manual does not show a version number, but the file is named "GdPicture_NET Document Imaging SDK.pdf" and is about 7.1 MB.

The function I mentioned does exist. Try the code I wrote earlier.

Good luck.

Regards,
Eagleman
