Searchable pdf from colored scanned pdf files

Discussions about machine vision support in GdPicture.
charuvas1
Posts: 38
Joined: Tue Dec 02, 2008 1:49 pm

Post by charuvas1 » Sat Aug 01, 2009 2:46 pm

Hi,

I have been trying to create a searchable PDF from a color scanned PDF file. The following is the code I picked up from this forum:

Code:

    ' Start the OCR PDF; strflname is the output file path
    PDFid = oGdPictureImaging.PdfOCRStart(strflname, False, "", "", "", "", "")
    For i As Integer = 1 To GdViewer1.PageCount
        imageid = GdViewer1.PdfRenderPageToGdPictureImage(200, i)
        ' The line in question: it converts the page image to black and white
        oGdPictureImaging.ConvertTo1Bpp(imageid)
        oGdPictureImaging.PdfAddGdPictureImageToPdfOCR(PDFid, imageid, TesseractDictionary.TesseractDictionaryEnglish, TextBox1.Text, "")
        GdViewer1.ReleaseGdPictureImage(imageid)
    Next
    oGdPictureImaging.PdfOCRStop(PDFid)
This creates a black-and-white searchable PDF file. If I comment out the line that converts the image to black and white, I get a deadlock soon. Can you guide me on how to create a searchable PDF file without losing its colors?

Thank you
Charu

Loïc
Site Admin
Posts: 5881
Joined: Tue Oct 17, 2006 10:48 pm
Location: France

Re: Searchable pdf from colored scanned pdf files

Post by Loïc » Sun Aug 02, 2009 6:55 pm

Hi,

I don't understand what you mean by "I get a deadlock soon".

I tried the following code without any issue:

Code:

    Private Sub Button1_Click(ByVal sender As System.Object, ByVal e As System.EventArgs) Handles Button1.Click
        Dim PDFid As Integer
        Dim imageid As Integer
        GdViewer1.DisplayFromFile("c:\input.pdf")
        PDFid = oGdPictureImaging.PdfOCRStart("c:\output.pdf", False, "", "", "", "", "")
        For i As Integer = 1 To GdViewer1.PageCount
            imageid = GdViewer1.PdfRenderPageToGdPictureImage(200, i)
            'oGdPictureImaging.ConvertTo1Bpp(imageid)
            oGdPictureImaging.PdfAddGdPictureImageToPdfOCR(PDFid, imageid, TesseractDictionary.TesseractDictionaryEnglish, TextBox1.Text, "")
            GdViewer1.ReleaseGdPictureImage(imageid)
        Next
        oGdPictureImaging.PdfOCRStop(PDFid)
    End Sub
Kind regards,

Loïc
