Normal searchable PDF

Posted: Mon May 04, 2009 10:22 am
by lripoll
Hi,

I am running several tests with the OCR plugin and I've noticed that the resulting PDF files are much bigger than the input files. The size ratios in my tests vary a lot, for example 318 KB to 2.08 MB, 336 KB to 2.43 MB, 123 KB to 1 MB, 1.4 MB to 14.68 MB, and the most spectacular was 200 MB to 1.6 GB.
I'm assuming this increase is due to the OCR plugin creating PDF/A, which necessarily produces bigger files whether you want it or not. PDF/A is not a requirement for my customer, so I'm wondering whether it is possible to create normal (non-PDF/A) searchable PDFs.

So, in short: is my assumption correct? Is the size increase due to the PDF/A format?
If so, is there another way to create searchable PDFs without increasing the size so much?

These are the relevant lines of the code I'm using.
For the TIFF-to-PDF process:

If oImaging.TiffIsMultiPage(nImageID) Then
   oImaging.PdfOCRCreateFromMultipageTIFFEx nImageID, pathOut, TesseractDictionarySpanish, App.Path & "\AppData"
Else
   oImaging.SaveAsPDFOCREx pathOut, TesseractDictionarySpanish, App.Path & "\AppData"  'AppData must contain the needed dictionary files
End If
For the PDF-to-PDF process:

For nPage = 1 To oGdViewer.PageCount
   oGdViewer.DisplayFrame (nPage)

   RasterizedPage = oGdViewer.GetNativeImage

   If nPage = 1 Then oImaging.TwainPdfOCRStartEx (pathOut) 'Creates the PDF/A file
    
   Call oImaging.TwainAddGdPictureImageToPdfOCR(RasterizedPage, TesseractDictionarySpanish, App.Path & "\AppData")
Next nPage

Re: Normal searchable PDF

Posted: Tue May 05, 2009 9:53 am
by Loïc
Hi Luis,

1 - Check that you are using the latest edition: we added better compression support for bitonal images.
2 - For your PDF-to-PDF conversion there are two ways to reduce the output size: reduce the value of the PDFDPIRendering property of the GdViewer control, and convert each page to a 1bpp image (PDF rasterization produces a 32bpp bitmap). For example:

For nPage = 1 To oGdViewer.PageCount
   oGdViewer.DisplayFrame (nPage)
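   oImaging.SetNativeImage (oImaging.CreateClonedImage(oGdViewer.GetNativeImage)) 'Work on a copy of the rasterized page
   oImaging.ConvertTo1Bpp                                                          'Reduce the copy from 32bpp to 1bpp

   RasterizedPage = oImaging.GetNativeImage  'Use the 1bpp copy, not the viewer's 32bpp page

   If nPage = 1 Then oImaging.TwainPdfOCRStartEx (pathOut) 'Creates the PDF/A file

   Call oImaging.TwainAddGdPictureImageToPdfOCR(RasterizedPage, TesseractDictionarySpanish, App.Path & "\AppData")
   oImaging.CloseNativeImage  'Release the copy before processing the next page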
Next nPage
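
To put rough numbers on the two options in point 2, here is the uncompressed size of one rasterized page, where W and H are the page width and height in inches, d is the rendering DPI and b is the bit depth. The A4 page (210 × 297 mm, about 8.27 × 11.69 in) and the 300 and 150 DPI values are example assumptions only. The PDF stores these pixels compressed, so real file sizes will be smaller; what matters here are the ratios.

```latex
\begin{aligned}
S &= \frac{(W d)\,(H d)\,b}{8}\ \text{bytes} \\
\text{A4 at } 300\ \text{DPI:}\quad 2480 \times 3508 &= 8\,699\,840\ \text{px} \\
b = 32:\quad S &= 8\,699\,840 \times 4 = 34\,799\,360\ \text{bytes} \approx 34.8\ \text{MB} \\
b = 1:\quad S &= 8\,699\,840 / 8 = 1\,087\,480\ \text{bytes} \approx 1.09\ \text{MB} \\
\text{A4 at } 150\ \text{DPI},\ b = 32:\quad S &= (1240 \times 1754) \times 4 = 8\,699\,840\ \text{bytes} \approx 8.7\ \text{MB}
\end{aligned}
```

Going from 32bpp to 1bpp divides the raw page data by 32, and halving the DPI divides it by 4, so the two reductions combine. Keep in mind that 1bpp keeps only black and white, so it suits text documents rather than colour or greyscale pages.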

Kind regards,

Loïc