Extract text from an area of a pdf page

Discussions about document viewing.
GabiBan
Posts: 4
Joined: Tue Sep 01, 2009 8:45 am

Extract text from an area of a pdf page

Post by GabiBan » Thu Sep 24, 2009 12:59 pm

Hi,
I made an application which extracts text from an area of a PDF page. To do this I used GdViewer's PdfGetPageTextArea function, but it didn't return anything.
How do I retrieve text from a particular area of a given page?

PS. My application doesn't have a graphical interface; the PDF, areas, and page numbers are given by the user at runtime via parameters.

Loïc
Site Admin
Posts: 5881
Joined: Tue Oct 17, 2006 10:48 pm
Location: France

Re: Extract text from an area of a pdf page

Post by Loïc » Thu Sep 24, 2009 4:40 pm

Hi,

Please have a look at the sample named "Document Viewer With Thumbnails". It has a menu to extract text from a particular area.

Kind regards,

Loïc

GabiBan
Posts: 4
Joined: Tue Sep 01, 2009 8:45 am

Re: Extract text from an area of a pdf page

Post by GabiBan » Fri Sep 25, 2009 9:40 am

Yes, I know this example, but the function doesn't work if no page is displayed. I use this component in a program which doesn't have a graphical user interface, and I don't display any page to the user.

1. Does PdfGetPageTextArea not work unless DisplayPage has been called before it?
2. Does PdfGetPageTextArea fail to extract text from a page unless that page was previously set as the current displayed page?

Loïc
Site Admin
Posts: 5881
Joined: Tue Oct 17, 2006 10:48 pm
Location: France

Re: Extract text from an area of a pdf page

Post by Loïc » Fri Sep 25, 2009 11:31 am

Hi,

Try this:

    Private Sub Button1_Click(ByVal sender As System.Object, ByVal e As System.EventArgs) Handles Button1.Click
        Dim oGdViewer As New GdPicture.GdViewer

        ' Replace XXX by a valid demo or commercial key
        oGdViewer.SetLicenseNumber("XXX")
        ' Open the PDF
        oGdViewer.DisplayFromFile("c:\test.pdf")
        ' Extract the text found within the given area and show it
        MsgBox(oGdViewer.PdfGetPageTextArea(0, 0, 8.5, 11))
    End Sub

Kind regards,

Loïc

GabiBan
Posts: 4
Joined: Tue Sep 01, 2009 8:45 am

Re: Extract text from an area of a pdf page

Post by GabiBan » Fri Sep 25, 2009 12:53 pm

This is what I have done, but it doesn't work. It doesn't return anything.

Loïc
Site Admin
Posts: 5881
Joined: Tue Oct 17, 2006 10:48 pm
Location: France

Re: Extract text from an area of a pdf page

Post by Loïc » Fri Sep 25, 2009 12:54 pm

Could you attach the PDF you are using?

Kind regards,

Loïc

mattewan
Posts: 33
Joined: Fri Apr 03, 2009 5:58 pm

Re: Extract text from an area of a pdf page

Post by mattewan » Tue Oct 20, 2009 1:18 pm

I am also getting this same problem:

? GdViewer.PdfGetPageText()
""
? GdViewer.PdfGetPageText(1)
""

When I open the PDF in Adobe Reader, I can select the text fine.

However, I am unable to attach the PDF, as it contains sensitive information.

Loïc
Site Admin
Posts: 5881
Joined: Tue Oct 17, 2006 10:48 pm
Location: France

Re: Extract text from an area of a pdf page

Post by Loïc » Tue Oct 20, 2009 1:19 pm

Hi,

Please send your PDF to esupport(at) gdpicture (dot) com

Kind regards,

Loïc

DBr
Posts: 21
Joined: Thu Dec 09, 2010 8:21 pm

Re: Extract text from an area of a pdf page

Post by DBr » Thu Dec 22, 2011 6:01 pm

Hi,

I hope it's OK to revive this thread, or should I have opened a new one?

I have the same problem with somewhat different settings. I'm using a UI, so the page I want to get the text from is displayed. The PdfGetPageText() method works just fine and returns the whole text of the page, but the PdfGetPageTextArea() method always returns an empty string "". The only exception is when I select the whole page (for example 0, 0, PageWidth, PageHeight): then it returns the whole text again, like the first method.

Is there anything important one has to know about the PDF in this case? Does it have to be a text-based PDF? Unfortunately I can't send a sample PDF; if one is needed, I will first have to find a "sendable" PDF that shows the same effect.


Kind Regards,
Dominik Braun

Loïc
Site Admin
Posts: 5881
Joined: Tue Oct 17, 2006 10:48 pm
Location: France

Re: Extract text from an area of a pdf page

Post by Loïc » Thu Dec 22, 2011 6:19 pm

Hi Dominik,

I suppose you are using incorrect coordinates.

Please do the following test:
- Start the demo application that comes with the package, "Document Viewer"
- Click Options / Left Click / Area Selection
- Open a PDF
- Draw a rectangle over the text area to capture
- Click Text / Show text within the area of selection

This sample is available in C# and VB.NET.

Hope this helps!
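
A coordinate problem of this kind is often a units mismatch. Here is a minimal sketch, assuming PdfGetPageTextArea expects its area in inches (as the 0, 0, 8.5, 11 call earlier in this thread suggests) while the selection is measured in PDF points or in pixels. The AreaUnits module and its two functions are hypothetical helpers for illustration, not part of GdPicture, and the pixel conversion assumes the values were measured at the stated resolution with no zoom applied.

```vb
Imports System

Module AreaUnits
    ' PDF user space defines 72 points per inch.
    Private Const PointsPerInch As Double = 72.0

    ' Hypothetical helper: converts a length in PDF points to inches.
    Function PointsToInches(ByVal points As Double) As Double
        Return points / PointsPerInch
    End Function

    ' Hypothetical helper: converts a length in pixels to inches
    ' for a given resolution in dots per inch.
    Function PixelsToInches(ByVal pixels As Double, ByVal dpi As Double) As Double
        If dpi <= 0 Then Throw New ArgumentOutOfRangeException("dpi")
        Return pixels / dpi
    End Function

    Sub Main()
        ' US Letter is 612 x 792 points, which is 8.5 x 11 inches.
        Console.WriteLine(PointsToInches(612))
        Console.WriteLine(PointsToInches(792))
        ' A 150-pixel offset on a page rendered at 96 DPI.
        Console.WriteLine(PixelsToInches(150, 96))
    End Sub
End Module
```

Converting each of the four area values this way before the call is one way to check whether units are the cause. The demo's own area selection code remains the authoritative reference.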

Kind regards,

Loïc

DBr
Posts: 21
Joined: Thu Dec 09, 2010 8:21 pm

Re: Extract text from an area of a pdf page

Post by DBr » Thu Dec 22, 2011 6:50 pm

Ok,

Sorry, a look at the demos would have helped immediately ^_^'

Thanks anyway
