pdf bad ocr
Hello,
I'm creating searchable PDFs from TIFF files with Tesseract OCR, but for some files (faxes, for example) the OCR result is completely wrong.
I have attached a sample TIFF file.
Can you help me?
Mirko
Re: pdf bad ocr
Hi,
This file has different horizontal (204 dpi) and vertical (98 dpi) resolutions.
I suggest resizing the image so that both resolutions match; this should noticeably improve the OCR result:
Code: Select all
Dim ResFactor As Single
Dim Hres As Single, Vres As Single
Hres = oGdPictureImaging.GetHorizontalResolution(m_ImageID)
Vres = oGdPictureImaging.GetVerticalResolution(m_ImageID)
' A 204 x 98 dpi fax page is squashed vertically: stretch the height by
' Hres / Vres so the vertical resolution matches the horizontal one.
ResFactor = Hres / Vres
If ResFactor <> 1 Then
    Call oGdPictureImaging.Resize(m_ImageID, oGdPictureImaging.GetWidth(m_ImageID), CInt(oGdPictureImaging.GetHeight(m_ImageID) * ResFactor), Drawing2D.InterpolationMode.HighQualityBicubic)
End If
Best regards,
Loïc
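The arithmetic behind the resize can be checked on its own. The Python sketch below is not GdPicture code; `equalize_resolution` and the 1728 x 1143 page size are illustrative assumptions. It keeps each axis's physical size (pixels / dpi) and stretches the lower-resolution axis up to the higher dpi. For a 204 x 98 dpi fax, that means multiplying the height by 204 / 98 (about 2.08) and leaving the width alone.

```python
def equalize_resolution(width_px: int, height_px: int,
                        h_dpi: float, v_dpi: float) -> tuple[int, int, float]:
    """Return (new_width, new_height, dpi) with both axes at the higher dpi.

    Physical size on each axis is pixels / dpi. Keeping that size while
    raising the lower-resolution axis to the higher dpi means scaling that
    axis's pixel count by (higher dpi / lower dpi).
    """
    if h_dpi <= 0 or v_dpi <= 0:
        raise ValueError("resolutions must be positive")
    if h_dpi > v_dpi:
        # Typical fax case (e.g. 204 x 98 dpi): stretch the height.
        return width_px, round(height_px * h_dpi / v_dpi), h_dpi
    if v_dpi > h_dpi:
        # Opposite case: stretch the width.
        return round(width_px * v_dpi / h_dpi), height_px, v_dpi
    # Already square pixels: nothing to do.
    return width_px, height_px, h_dpi


if __name__ == "__main__":
    # Example values for a standard-resolution fax page (assumed, not from the thread).
    print(equalize_resolution(1728, 1143, 204, 98))
```

Python's `round()` rounds exact halves to even, as `CInt` and `Convert.ToInt32` do, so the pixel counts should agree with the VB.NET and C# snippets in this thread.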
Re: pdf bad ocr
Hi,
It works fine for single-page TIFFs, but not for multipage TIFFs: it creates a one-page PDF.
This is my code:
Mirko
Code: Select all
Single ResFactor;
Single Hres;
Single Vres;
Vres = oGdPictureImaging.GetVerticalResolution(ImageID);
Hres = oGdPictureImaging.GetHorizontalResolution(ImageID);
ResFactor = Hres / Vres;
if (ResFactor != 1)
oGdPictureImaging.Resize(ImageID,
oGdPictureImaging.GetWidth(ImageID),
Convert.ToInt32(oGdPictureImaging.GetHeight(ImageID) * ResFactor),
System.Drawing.Drawing2D.InterpolationMode.HighQualityBicubic);
if (oGdPictureImaging.TiffIsMultiPage(ImageID))
{
if (makeSearchable == false)
oGdPictureImaging.PdfCreateFromMultipageTIFF(ImageID, filePDF, pdfA, "", "", "", "", "");
else
oGdPictureImaging.PdfOCRCreateFromMultipageTIFF(ImageID,
GetTesseractDictionary(dizionario),
_dirOCR, "", filePDF, pdfA, "", "", "", "", "");
}
else
{
if (makeSearchable == false)
oGdPictureImaging.SaveAsPDF(ImageID, filePDF, pdfA, "", "", "", "", "");
else
oGdPictureImaging.SaveAsPDFOCR(ImageID, filePDF,
GetTesseractDictionary(dizionario),
_dirOCR, "", pdfA, "", "", "", "", "");
}
oGdPictureImaging.ReleaseGdPictureImage(ImageID);
Re: pdf bad ocr
Hi,
You need to open the multipage TIFF for read and write access.
To do that, just call
Code: Select all
TiffOpenMultiPageForWrite(True)
before opening the file.
Then you have to resize every page in a loop:
Code: Select all
For i = 1 To oGdPictureImaging.TiffGetPageCount(imageid)
    oGdPictureImaging.TiffSelectPage(imageid, i)
    ' Resize the selected page with the same arguments as in the single-page code
    oGdPictureImaging.Resize(imageid...)
Next i
Let me know if you have any other problems with this issue.
Kind regards,
Loïc
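The point of the loop is to visit every page, not only the one currently selected. Here is a minimal Python sketch of the same idea. The `Page` class and `normalize_all_pages` are hypothetical stand-ins for the GdPicture image and are not part of its API. Each TIFF page (IFD) carries its own XResolution and YResolution tags, so the sketch computes the factor per page instead of once for the whole file. The forum code does not do this; it is an added assumption.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Page:
    """Hypothetical stand-in for one TIFF page: pixel size plus dpi per axis."""
    width: int
    height: int
    h_dpi: float
    v_dpi: float


def normalize_page(page: Page) -> Page:
    """Stretch the lower-resolution axis so both axes share the higher dpi."""
    if page.h_dpi > page.v_dpi:
        new_height = round(page.height * page.h_dpi / page.v_dpi)
        return replace(page, height=new_height, v_dpi=page.h_dpi)
    if page.v_dpi > page.h_dpi:
        new_width = round(page.width * page.v_dpi / page.h_dpi)
        return replace(page, width=new_width, h_dpi=page.v_dpi)
    return page


def normalize_all_pages(pages: list[Page]) -> list[Page]:
    """Resize every page, mirroring the TiffSelectPage loop, with a per-page factor."""
    result = []
    for page in pages:
        # The factor is recomputed for each page because resolutions may differ.
        result.append(normalize_page(page))
    return result


if __name__ == "__main__":
    # Example values (assumed): a standard-mode page and a fine-mode page.
    fax = [Page(1728, 1143, 204, 98), Page(1728, 2286, 204, 196)]
    for p in normalize_all_pages(fax):
        print(p)
```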
Re: pdf bad ocr
Thank you, it works fine.
Mirko