Hi,
I've built an invoice recognition/learning product that relies on the Recostar OCR engine. In particular, it processes the XML OCR results file that Recostar creates. Because I can't find any way to license Recostar, and because I want to build a web-based point-and-click indexing solution, I'm considering GdPicture.
What I'd like to know is: does Tesseract produce a similar kind of output file, giving all characters and words along with their locations? Or do you have to use this command to get that information: PdfReaderGetPageTextWithCoords?
Note that I searched for that command in the online documentation and got no hits, which is a bit of a worry.
Thanks, Turhan
OCR results file
Re: OCR results file
Do GdPicture staff monitor this forum at all? I've seen many very sensible questions in the forum go unanswered, and I've had no response in over five days. To me that almost rules this product out, because a product is really only as good as the support behind it. Add to that the fact that back in 2010 this new method, "PdfReaderGetPageTextWithCoords", was released:
post9116.html?hilit=PdfReaderGetPageTex ... ords#p9116
So why can't I find any mention of it in documentation six years later? That's pretty much inexcusable from my perspective.
All of this is such a huge shame because the product actually looks really good. But there is no way I can risk launching a commercial product without quality support for the underlying engine driving it.
Re: OCR results file
PdfReaderGetPageTextWithCoords is a method that was introduced in GdPicture.NET 7, a long-discontinued version, and this method no longer exists in the product.
The reason is simple: since GdPicture.NET 8, the PDF features have grown a lot, and there is now a separate PDF plugin in charge of everything PDF-related, including text extraction.
In the current GdPicture.NET release (GdPicture.NET 12) the method you are looking for is in the GdPicturePDF class and is called GetPageTextWithCoords.
Here is a link to the corresponding documentation: https://www.gdpicture.com/guides/gdpicture/web ... oords.html
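On the Tesseract half of the original question: as far as I know, the standalone Tesseract command-line tool can write word-level results with bounding boxes as TSV (for example `tesseract invoice.png out tsv`) or as hOCR. That is separate from the GdPicture API, and nothing below says how GdPicture exposes its own OCR results. As a rough sketch only, this is how such a TSV file could be read into word boxes in Python. The `WordBox` type, the `parse_tesseract_tsv` helper and the sample rows are made up for illustration.

```python
import csv
import io
from dataclasses import dataclass


@dataclass
class WordBox:
    """One recognised word and its bounding box in pixels (hypothetical type)."""
    text: str
    left: int
    top: int
    width: int
    height: int
    conf: float


def parse_tesseract_tsv(tsv_text):
    """Return one WordBox per word row (level 5) of Tesseract TSV output."""
    reader = csv.DictReader(
        io.StringIO(tsv_text), delimiter="\t", quoting=csv.QUOTE_NONE
    )
    words = []
    for row in reader:
        # Levels 1-4 describe page, block, paragraph and line; 5 is a word.
        if row["level"] != "5":
            continue
        text = (row["text"] or "").strip()
        if not text:
            continue
        words.append(
            WordBox(
                text=text,
                left=int(row["left"]),
                top=int(row["top"]),
                width=int(row["width"]),
                height=int(row["height"]),
                conf=float(row["conf"]),
            )
        )
    return words


# Made-up sample in the column layout Tesseract uses for TSV output.
SAMPLE = (
    "level\tpage_num\tblock_num\tpar_num\tline_num\tword_num"
    "\tleft\ttop\twidth\theight\tconf\ttext\n"
    "1\t1\t0\t0\t0\t0\t0\t0\t800\t600\t-1\t\n"
    "5\t1\t1\t1\t1\t1\t40\t52\t110\t24\t96\tInvoice\n"
    "5\t1\t1\t1\t1\t2\t160\t52\t60\t24\t91\t#1042\n"
)

if __name__ == "__main__":
    for word in parse_tesseract_tsv(SAMPLE):
        print(word)
```

In a real run you would read `out.tsv` from disk and pass its contents to the helper. Character-level boxes are not part of this TSV layout, so a per-character workflow would need a different output.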