Support To Create .traineddata files.
Posted: Mon Nov 03, 2014 3:47 pm
Problem Statement:
-On some PDF files, the OCR engine does not recognize some digits correctly (for example, 0 (zero) is recognized as the letter "O", and 1 (one) is recognized as the letter "I").
Proposed Solution:
To resolve such problems, we are trying to create a 'Trainable OCR Tool'. In this tool, the user can manually select the characters that the OCR engine did not recognize properly and, from these selected characters, create a custom dictionary (.traineddata file).
Query on GDPicture:
We would like to know whether GDPicture provides any mechanism to create a .traineddata file from text input supplied to it.
For example, for Google tesseract-ocr we found the following information on creating a .traineddata file:
-Using a third-party tool (such as txt2image or ghostscript), we can create a .tif image file from either a txt file or a PDF file, and then create a .box file by passing the .tif file to tesseract.exe.
-Using this .tif file and .box file, and following the procedure given at https://code.google.com/p/tesseract-ocr ... Tesseract3, we can generate a .traineddata file.
Does GDPicture provide such capability?
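
To make the Tesseract side of the question concrete, here is a rough Python sketch of the box-file step as we understand it from the Tesseract 3 training notes. It only builds the command line and formats box-file entries; it does not run tesseract or touch GDPicture. The language code "eng", the font name "arial", and the helper names (makebox_command, BoxEntry, correct_box) are our own illustrative assumptions, not part of any library.

```python
from dataclasses import dataclass
from typing import List


def training_base_name(lang: str, font: str, exp: int = 0) -> str:
    """Tesseract 3 training files are named [lang].[fontname].exp[num]."""
    return f"{lang}.{font}.exp{exp}"


def makebox_command(lang: str, font: str, exp: int = 0,
                    tesseract_exe: str = "tesseract") -> List[str]:
    """Build (but do not run) the command that produces a .box file from a .tif."""
    base = training_base_name(lang, font, exp)
    return [tesseract_exe, f"{base}.tif", base, "batch.nochop", "makebox"]


@dataclass
class BoxEntry:
    """One box-file line: character, then left/bottom/right/top, then page.

    Coordinates are in pixels with the origin at the bottom-left of the image.
    """
    char: str
    left: int
    bottom: int
    right: int
    top: int
    page: int = 0

    def to_line(self) -> str:
        return (f"{self.char} {self.left} {self.bottom} "
                f"{self.right} {self.top} {self.page}")


def correct_box(entry: BoxEntry, right_char: str) -> BoxEntry:
    """Return a copy of the entry with the character the user selected instead."""
    return BoxEntry(right_char, entry.left, entry.bottom,
                    entry.right, entry.top, entry.page)


if __name__ == "__main__":
    # Hypothetical example: a zero that was recognized as the letter "O".
    print(" ".join(makebox_command("eng", "arial")))
    print(correct_box(BoxEntry("O", 10, 20, 30, 48), "0").to_line())
```

In our tool, the user's manual selection would correspond to something like correct_box: the rectangle stays the same and only the character is replaced before the corrected box file is fed into the rest of the training procedure.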