Extract text from an area of a pdf page

Posted: Thu Sep 24, 2009 12:59 pm
by GabiBan
Hi,
I made an application which extracts text from an area of a PDF page. To do this I used GdViewer's PdfGetPageTextArea function, but it didn't return anything.
How do I retrieve text from a particular area of a given page?

PS. My application doesn't have a graphical interface; the PDF, the areas and the page numbers are given by the user at runtime via parameters.

Re: Extract text from an area of a pdf page

Posted: Thu Sep 24, 2009 4:40 pm
by Loïc
Hi,

Please have a look at the sample named "Document Viewer With Thumbnails". There is a menu to extract text from a particular area.

Kind regards,

Loïc

Re: Extract text from an area of a pdf page

Posted: Fri Sep 25, 2009 9:40 am
by GabiBan
Yes, I know this example, but the function doesn't work if no page is displayed. I use this component in a program which doesn't have a graphical user interface, and I don't display any page to the user.


1. Does PdfGetPageTextArea fail unless DisplayPage has been called first?
2. Does PdfGetPageTextArea extract no text unless the displayed page has first been set as the current page?

Re: Extract text from an area of a pdf page

Posted: Fri Sep 25, 2009 11:31 am
by Loïc
Hi,

Try this:

    Private Sub Button1_Click(ByVal sender As System.Object, ByVal e As System.EventArgs) Handles Button1.Click
        Dim oGdViewer As New GdPicture.GdViewer

        oGdViewer.SetLicenseNumber("XXX") ' Replace XXX with a valid demo or commercial key
        oGdViewer.DisplayFromFile("c:\test.pdf")
        MsgBox(oGdViewer.PdfGetPageTextArea(0, 0, 8.5, 11))
    End Sub

Kind regards,

Loïc

Re: Extract text from an area of a pdf page

Posted: Fri Sep 25, 2009 12:53 pm
by GabiBan
This is what I have done, but it doesn't work. It doesn't return anything.

Re: Extract text from an area of a pdf page

Posted: Fri Sep 25, 2009 12:54 pm
by Loïc
Could you attach the PDF you are using?

Kind regards,

Loïc

Re: Extract text from an area of a pdf page

Posted: Tue Oct 20, 2009 1:18 pm
by mattewan
I am also getting this same problem:

? GdViewer.PdfGetPageText()
""
? GdViewer.PdfGetPageText(1)
""

When I open the PDF in Adobe Reader, I can select the text fine.

However, I am unable to attach the PDF, as it contains sensitive information.

Re: Extract text from an area of a pdf page

Posted: Tue Oct 20, 2009 1:19 pm
by Loïc
Hi,

Please send your PDF to esupport(at) gdpicture (dot) com

Kind regards,

Loïc

Re: Extract text from an area of a pdf page

Posted: Thu Dec 22, 2011 6:01 pm
by DBr
Hi,

I hope it's ok to revive this thread, or should I have opened a new one?

I have the same problem in a slightly different setup. I'm using a UI, so the page I want to get the text from is displayed. The PdfGetPageText() method works fine and returns the whole text of the page, but the PdfGetPageTextArea() method always returns an empty string "". The only exception is when I select the whole page (for example 0, 0, PageWidth, PageHeight): then it returns the whole text again, like the first method.

Is there anything important I should know about the PDF in this case? Does it have to be a text-based PDF? Unfortunately I can't send a sample PDF; if one is needed, I will first have to find a "sendable" PDF that shows the same effect.


Kind Regards,
Dominik Braun

Re: Extract text from an area of a pdf page

Posted: Thu Dec 22, 2011 6:19 pm
by Loïc
Hi Dominik,

I suppose you are using bad coordinates.

Please do the following test:
- Start the "Document Viewer" demo application that comes with the package
- Click Options / Left Click / Area Selection
- Open a PDF
- Draw a rectangle over the text area to catch
- Click Text / Show text within the area of selection

This sample is available in C# and VB.NET.

Hope this helps!

Kind regards,

Loïc
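
Editor's sketch: a common way to end up with "bad coordinates" is mixing units, for instance passing a selection measured in pixels to a call that expects another unit. The 0, 0, 8.5, 11 call earlier in this thread looks like a US Letter page in inches, but that is an inference and not something the thread confirms. Under that assumption, a pixel selection could be converted as below. PixelsToInches and AreaInInches are hypothetical names made up for this illustration and are not part of GdPicture; the 96 dpi value is also just an example.

```vbnet
Imports System

Module SelectionUnits
    ' Hypothetical helper type (not part of GdPicture): a rectangle in inches.
    Structure AreaInInches
        Public Left As Single
        Public Top As Single
        Public Width As Single
        Public Height As Single
    End Structure

    ' Converts a rectangle measured in pixels to inches at the given resolution.
    ' Assumption: the extraction call expects inches, which this thread only suggests.
    Function PixelsToInches(ByVal leftPx As Single, ByVal topPx As Single,
                            ByVal widthPx As Single, ByVal heightPx As Single,
                            ByVal dpi As Single) As AreaInInches
        If dpi <= 0 Then
            Throw New ArgumentOutOfRangeException("dpi", "Resolution must be positive.")
        End If

        Dim area As AreaInInches
        area.Left = leftPx / dpi
        area.Top = topPx / dpi
        area.Width = widthPx / dpi
        area.Height = heightPx / dpi
        Return area
    End Function

    Sub Main()
        ' A 288 x 48 pixel selection whose top-left corner is at (96, 192), at 96 dpi,
        ' corresponds to left 1 in, top 2 in, width 3 in, height 0.5 in.
        Dim area As AreaInInches = PixelsToInches(96.0F, 192.0F, 288.0F, 48.0F, 96.0F)
        Console.WriteLine("{0}; {1}; {2}; {3}", area.Left, area.Top, area.Width, area.Height)
    End Sub
End Module
```

The four resulting values could then be passed in the same order as the four arguments of the PdfGetPageTextArea call shown earlier in the thread. Comparing them with what the "Document Viewer" demo passes for the same selection is a quick way to check the unit assumption.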

Re: Extract text from an area of a pdf page

Posted: Thu Dec 22, 2011 6:50 pm
by DBr
Ok,

Sorry, a look at the demos would have helped immediately ^_^'

Thanks anyway