GdPictureOCR Class Members

The following tables list the members exposed by GdPictureOCR.

Public Constructors

	Name	Description
	GdPictureOCR Constructor	Creates a new instance of the GdPictureOCR class. This instance is a wrapper that you need in order to perform all available OCR operations. It also lets you retrieve all required results in one place.

Public Properties

	Name	Description
	CharacterBlackList	Defines the character blacklist, that is, the characters that the engine is not allowed to recognize during subsequent OCR processes. Recognition is disabled for the provided characters, which means the engine doesn't consider them when processing. For example, if you don't want the characters "0X@" to be recognized, set this property to "0X@".
	CharacterSet	Defines the character whitelist, that is, the only characters that the engine is allowed to recognize during subsequent OCR processes. Recognition is limited to the provided characters, which means the engine returns only those characters when processing. For example, if you want to recognize only numeric characters, set this property to "0123456789". If you want to recognize only uppercase letters, set it to "ABCDEFGHIJKLMNOPQRSTUVWXYZ". Set this property to the empty string to recognize all characters.
	Context	Specifies the OCR context to be used during subsequent OCR processes. You have to inform the engine of the layout type of the data you want to process.
	EnableOrientationDetection	Specifies whether the engine will try to automatically detect the page orientation, that is, the standard rotation of the page, during subsequent OCR processes. Enabling this option may noticeably slow down the engine.
	EnablePreprocessing	Specifies if image preprocessing is activated.
	EnableSkewDetection	Specifies whether the engine will try to automatically detect the page skew during subsequent OCR processes.
	EnableVigorousDespeckle	Specifies whether the OCR engine must try to vigorously remove noise during the recognition process. It corresponds to the Tesseract textord_heavy_nr parameter.
	ExpectedSymbolCount	Specifies the number of expected symbols to detect and decode. Use 0 if this number is undefined.
	LanguageModelPenaltyNonDictWords	Specifies the penalty applied by the engine to all words that are not listed in the dictionary (word_dawg / user_words wordlists). It must be a value within the interval from 0 to 1.
	LanguageModelPenaltyNonFreqDictWords	Specifies the penalty applied by the engine to all words that are not listed in the frequent words dictionary (freq_dawg wordlist). It must be a value within the interval from 0 to 1.
	LoadFreqWordsDictionary	Defines whether the engine should load the frequent words dictionaries for all added languages.
	LoadMainDictionary	Defines whether the engine should load the main dictionary for all added languages.
	MaxCharHeight	Specifies the maximal accepted height, in pixels, for each recognized character.
	MaxCharWidth	Specifies the maximal accepted width, in pixels, for each recognized character.
	MaxThreadCount	Specifies the maximum number of threads this instance can allocate. The default value is equal to the number of logical cores of the hosting machine.
	MinCharHeight	Specifies the minimal accepted height, in pixels, for each recognized character.
	MinCharWidth	Specifies the minimal accepted width, in pixels, for each recognized character.
	OCRMode	Defines the OCR mode to be used during subsequent OCR processes. You can choose between speed and accuracy.
	OrientationDetectionAccuracyLevel	Specifies the accuracy level of the orientation detection. It must be a value in the range 1 (worst accuracy, best speed) to 10 (best accuracy, worst speed).
	ResourcesFolder	Specifies the path to the directory containing the engine resources (mostly dictionaries).
	Timeout	Defines the timeout, in milliseconds, that is, the maximum time allowed for subsequent OCR processes before they are automatically interrupted.
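
Several of the properties above document a valid range: both language model penalties must lie between 0 and 1, and OrientationDetectionAccuracyLevel between 1 and 10. The following is a minimal, library-independent sketch of a caller-side check of those ranges. The function name check_ocr_settings and its parameters are hypothetical and not part of GdPicture; how the engine itself reacts to out-of-range values is not stated here.

```python
def check_ocr_settings(non_dict_penalty, non_freq_dict_penalty, accuracy_level):
    """Return a list of problems; an empty list means every value is in range.

    Hypothetical helper: it only mirrors the ranges documented in the table
    above and does not call any GdPicture API.
    """
    problems = []
    penalties = {
        "LanguageModelPenaltyNonDictWords": non_dict_penalty,
        "LanguageModelPenaltyNonFreqDictWords": non_freq_dict_penalty,
    }
    for name, value in penalties.items():
        # Both penalties are documented as values from 0 to 1.
        if not 0 <= value <= 1:
            problems.append(f"{name} must be between 0 and 1, got {value}")
    # The accuracy level is documented as a value from 1 to 10.
    if not 1 <= accuracy_level <= 10:
        problems.append(
            "OrientationDetectionAccuracyLevel must be between 1 and 10, "
            f"got {accuracy_level}"
        )
    return problems


# In-range values produce no problems; out-of-range values are reported.
print(check_ocr_settings(0.1, 0.5, 5))
print(check_ocr_settings(1.5, 0.5, 11))
```

Running such a check before assigning the properties keeps configuration mistakes separate from recognition errors.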

Public Methods

	Name	Description
	AddCustomDictionary	Adds a custom language dictionary from the defined resource folder to be used during subsequent OCR processes. You are able to add multiple custom languages by calling this method for each custom language file according to your preference. The specified language is then added internally in the current GdPictureOCR object.
	AddLanguage	Adds a known language from the defined resource folder to be used during subsequent OCR processes. You are able to add multiple languages by calling this method for each language according to your preference. The specified language is then added internally in the current GdPictureOCR object.
	Dispose	Disposes of the GdPictureOCR object completely. All related resources used by this object are released. All used OCR results within the current GdPictureOCR object are released too.
	GetAvailableLanguage	Returns the name of the known language dictionary available in the currently defined resource folder at the index you specify. You can use the GetAvailableLanguageCount method to determine the number of available language dictionaries. The index is an integer value within the interval from 0 to GetAvailableLanguageCount-1.
	GetAvailableLanguageCount	Returns the number of known language dictionaries available in the currently defined resource folder.
	GetAvailableLanguages	Returns a list of the names of all known language dictionaries available in the currently defined resource folder.
	GetAverageWordConfidence	Returns the average word confidence of a specified OCR result.
	GetBlockBottom	Returns the bottom y-coordinate of the bounding box of the specified block, that is a part of a specified OCR result. This method uses a coordinate system, where the origin is in the top-left corner of the processed image and the units are pixels.
	GetBlockCount	Returns the number of blocks within a specified OCR result.
	GetBlockFirstParagraphIndex	Returns the index of the first paragraph in the specified block, that is a part of a specified OCR result.
	GetBlockLeft	Returns the left x-coordinate of the bounding box of the specified block, that is a part of a specified OCR result. This method uses a coordinate system, where the origin is in the top-left corner of the processed image and the units are pixels.
	GetBlockOrientation	Returns the orientation of the specified block, that is a part of a specified OCR result.
	GetBlockParagraphCount	Returns the number of paragraphs within the specified block, that is a part of a specified OCR result.
	GetBlockRight	Returns the right x-coordinate of the bounding box of the specified block, that is a part of a specified OCR result. This method uses a coordinate system, where the origin is in the top-left corner of the processed image and the units are pixels.
	GetBlockSpecialFormat	Returns the special format of the specified block within a specified OCR result.
	GetBlockSpecialFormatData	Returns the special format data of the specified block within a specified OCR result, in JSON format.
	GetBlockTop	Returns the top y-coordinate of the bounding box of the specified block, that is a part of a specified OCR result. This method uses a coordinate system, where the origin is in the top-left corner of the processed image and the units are pixels.
	GetBlockType	Returns the type of the specified block within a specified OCR result.
	GetBlockWritingDirection	Returns the writing direction of the specified block, that is a part of a specified OCR result.
	GetCharacterAlternativeConfidence	Returns the confidence of a specific alternative character.
	GetCharacterAlternativeCount	Returns the number of alternative symbols of a specific character recognized by the engine.
	GetCharacterAlternativeValue	Returns the value of a specific alternative character.
	GetCharacterBottom	Returns the bottom y-coordinate of the bounding box of the specified character, that is a part of a specified OCR result. This method uses a coordinate system, where the origin is in the top-left corner of the processed image and the units are pixels.
	GetCharacterConfidence	Returns the confidence of the specified character, that is a part of a specified OCR result.
	GetCharacterCount	Returns the number of characters within a specified OCR result.
	GetCharacterLeft	Returns the left x-coordinate of the bounding box of the specified character, that is a part of a specified OCR result. This method uses a coordinate system, where the origin is in the top-left corner of the processed image and the units are pixels.
	GetCharacterRight	Returns the right x-coordinate of the bounding box of the specified character, that is a part of a specified OCR result. This method uses a coordinate system, where the origin is in the top-left corner of the processed image and the units are pixels.
	GetCharacterTop	Returns the top y-coordinate of the bounding box of the specified character, that is a part of a specified OCR result. This method uses a coordinate system, where the origin is in the top-left corner of the processed image and the units are pixels.
	GetCharacterValue	Gets the value of the specified character, that is a part of a specified OCR result.
	GetCharacterWordIndex	Returns the index of the word that contains the specified character within a specified OCR result.
	GetFormFieldCount	Returns the number of extracted form fields within a specified OCR result. Form fields extraction is automatically performed during each OCR process.
	GetFormFieldKeyRect	Returns the location of the key part of a specified form field.
	GetFormFieldKeyText	Returns the text of the key part of a specified form field.
	GetFormFieldType	Returns the type of a specified form field.
	GetFormFieldValueRect	Returns the location of the value part of a specified form field.
	GetFormFieldValueText	Returns the text of the value part of a specified form field.
	GetKeyValuePairConfidence	Returns the detection confidence of a specified key-value pair.
	GetKeyValuePairCount	Returns the number of extracted key-value pairs within a specified OCR result. Key-value pairs extraction is automatically performed during each OCR process.
	GetKeyValuePairDataType	Returns the data type of a specified key-value pair.
	GetKeyValuePairIsStrong	Returns whether a specific key-value pair is strong. A pair is marked as strong when a semantic relationship has been established during the detection process.
	GetKeyValuePairKeyRect	Returns the location of the key part of a specified key-value pair.
	GetKeyValuePairKeyString	Returns the string representation of the key part of a specified key-value pair.
	GetKeyValuePairPublicName
	GetKeyValuePairValueRect	Returns the location of the value part of a specified key-value pair.
	GetKeyValuePairValueString	Returns the string representation of the value part of a specified key-value pair.
	GetOCRResultText	Overloaded. Returns the recognized text of the provided OCR result, identifiable by its unique ID, as a formatted string. The empty lines, if recognized, are not provided in the resulting text.
	GetOrientation	Computes the page orientation of the image previously set by the SetImage method. In order to correct the resulting orientation, you need to rotate the image by the following angle: 360 - returned value.
	GetPageRotation	Returns the page rotation, in degrees, clockwise, detected during a specific OCR process.
	GetPageSkewAngle	Returns the page skew angle, in degrees, clockwise, detected during a specific OCR process.
	GetParagraphBlockIndex	Returns the index of the block that contains the specified paragraph within a specified OCR result.
	GetParagraphBottom	Returns the bottom y-coordinate of the bounding box of the specified paragraph, that is a part of a specified OCR result. This method uses a coordinate system, where the origin is in the top-left corner of the processed image and the units are pixels.
	GetParagraphCount	Returns the number of paragraphs within a specified OCR result.
	GetParagraphFirstTextLineIndex	Returns the index of the first text line in the specified paragraph, that is a part of a specified OCR result.
	GetParagraphJustification	Returns the justification of the specified paragraph, that is a part of a specified OCR result.
	GetParagraphLeft	Returns the left x-coordinate of the bounding box of the specified paragraph, that is a part of a specified OCR result. This method uses a coordinate system, where the origin is in the top-left corner of the processed image and the units are pixels.
	GetParagraphRight	Returns the right x-coordinate of the bounding box of the specified paragraph, that is a part of a specified OCR result. This method uses a coordinate system, where the origin is in the top-left corner of the processed image and the units are pixels.
	GetParagraphTextLineCount	Returns the number of text lines within the specified paragraph, that is a part of a specified OCR result.
	GetParagraphTop	Returns the top y-coordinate of the bounding box of the specified paragraph, that is a part of a specified OCR result. This method uses a coordinate system, where the origin is in the top-left corner of the processed image and the units are pixels.
	GetSerializedResult	Returns the whole OCR result, based on internal GdPicture structures, serialized as a JSON string.
	GetStat	Returns the status of the last executed operation with the current GdPictureOCR object.
	GetTableCellRect	Returns the location of a cell in a specified table.
	GetTableCellText	Returns the text content of a cell in a specified table.
	GetTableColumnCount	Returns the number of columns in a specified table.
	GetTableColumnRect	Returns the location of a column in a specified table.
	GetTableCount	Returns the number of detected tables within a specified OCR result.
	GetTableRect	Returns the location of a specified table.
	GetTableRowCount	Returns the number of rows in a specified table.
	GetTableRowRect	Returns the location of a row in a specified table.
	GetTextLineBottom	Returns the bottom y-coordinate of the bounding box of the specified text line, that is a part of a specified OCR result. This method uses a coordinate system, where the origin is in the top-left corner of the processed image and the units are pixels.
	GetTextLineCount	Returns the number of text lines within a specified OCR result. The resulting value doesn't contain any empty lines, as they are not provided in the OCR result.
	GetTextLineFirstWordIndex	Returns the index of the first word in the specified text line, that is a part of a specified OCR result.
	GetTextLineLeft	Returns the left x-coordinate of the bounding box of the specified text line, that is a part of a specified OCR result. This method uses a coordinate system, where the origin is in the top-left corner of the processed image and the units are pixels.
	GetTextLineParagraphIndex	Returns the index of the paragraph that contains the specified text line within a specified OCR result.
	GetTextLineRight	Returns the right x-coordinate of the bounding box of the specified text line, that is a part of a specified OCR result. This method uses a coordinate system, where the origin is in the top-left corner of the processed image and the units are pixels.
	GetTextLineTop	Returns the top y-coordinate of the bounding box of the specified text line, that is a part of a specified OCR result. This method uses a coordinate system, where the origin is in the top-left corner of the processed image and the units are pixels.
	GetTextLineValue	Returns the value of the specified text line, that is a part of a specified OCR result.
	GetTextLineWordCount	Returns the number of words within the specified text line, that is a part of a specified OCR result.
	GetWordBottom	Returns the bottom y-coordinate of the bounding box of the specified word, that is a part of a specified OCR result. This method uses a coordinate system, where the origin is in the top-left corner of the processed image and the units are pixels.
	GetWordCharacterCount	Returns the number of characters within the specified word, that is a part of a specified OCR result.
	GetWordConfidence	Returns the confidence of the specified word within a specified OCR result.
	GetWordCount	Returns the number of words within a specified OCR result.
	GetWordFirstCharacterIndex	Returns the index of the first character in the specified word, that is a part of a specified OCR result.
	GetWordFontSize	Returns the size of the detected font of the specified word, that is a part of a specified OCR result.
	GetWordIsFromDictionary	Returns whether the specified word within a specified OCR result has been found in the added dictionaries.
	GetWordLeft	Returns the left x-coordinate of the bounding box of the specified word, that is a part of a specified OCR result. This method uses a coordinate system, where the origin is in the top-left corner of the processed image and the units are pixels.
	GetWordLineIndex	Returns the index of the text line that contains the specified word within a specified OCR result.
	GetWordRecognitionLanguage	Returns the name of the language used to recognize the specified word, that is a part of a specified OCR result.
	GetWordRight	Returns the right x-coordinate of the bounding box of the specified word, that is a part of a specified OCR result. This method uses a coordinate system, where the origin is in the top-left corner of the processed image and the units are pixels.
	GetWordSpacesBefore	Returns the number of spaces before the specified word, that is a part of a specified OCR result.
	GetWordTop	Returns the top y-coordinate of the bounding box of the specified word, that is a part of a specified OCR result. This method uses a coordinate system, where the origin is in the top-left corner of the processed image and the units are pixels.
	GetWordValue	Returns the value of the specified word, that is a part of a specified OCR result.
	IsHeaderCell	Specifies whether the cell at the given coordinates is located in the table's header.
	ReleaseOCRResult	Releases an OCR result specified by its unique identifier. Each OCR result, identifiable by its unique ID, is internally attached to the GdPictureOCR object that executed the OCR process. By disposing of the current GdPictureOCR object you also release all attached OCR results.
	ReleaseOCRResults	Releases all results of all previously executed OCR processes within the current GdPictureOCR object. Each OCR result is internally attached to the GdPictureOCR object that executed the OCR process. By disposing of the current GdPictureOCR object you also release all attached OCR results.
	ResetParameters	Resets all parameters of the current GdPictureOCR object to their default values, including added languages, custom dictionaries and internal engine variables. This does not apply to OCR results used in the current object.
	ResetROI	Resets the previously specified region of interest, that is, completely removes the region's data.
	ResetSelectedDictionaries	Resets, that is, completely releases, all languages and dictionaries previously added by the AddLanguage and AddCustomDictionary methods.
	RunOCR	Overloaded. Executes the OCR using the available parameters you have specified within the current GdPictureOCR object.
	SaveAsDOCX	Overloaded. Saves the specified OCR result to a DOCX file.
	SaveAsHTML	Overloaded. Saves the specified OCR result to an HTML file.
	SaveAsText	Overloaded. Saves the specified OCR result to a text file.
	SaveAsXLSX	Overloaded. Saves the specified OCR results to an xlsx file.
	SetImage	Sets up the specified image object, so that it is subsequently used when you start the next OCR process. This step is mandatory before running any OCR. This approach lets you greatly improve performance when running multiple subsequent OCR processes on the same image, for example using different regions of interest or different character sets.
	SetROI	Sets up the new region of interest (ROI) of an image that is subsequently processed using the OCR. Only the specified region is included in the next OCR process.
	SetVariable	Sets up a specified value for an internal parameter of the OCR engine. The Tesseract engine has a large number of control parameters to modify its behavior.
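
Two conventions in the table above lend themselves to a short illustration. First, the result hierarchy (blocks, paragraphs, text lines, words, characters) is exposed through "first index" and "count" pairs, such as GetBlockFirstParagraphIndex and GetBlockParagraphCount. Second, GetOrientation states that the image must be rotated by 360 minus the returned value. The sketch below is library-independent and calls no GdPicture API. The helper names are hypothetical, and it assumes that the children of an element occupy consecutive indices and that the orientation is expressed in degrees.

```python
def child_indices(first_index, count):
    """Indices of the children of one parent element.

    Assumption: children occupy consecutive indices starting at first_index,
    as the "first index" + "count" method pairs suggest.
    """
    return range(first_index, first_index + count)


def correction_angle(orientation):
    """Angle to rotate the image by, given the value reported by GetOrientation."""
    # 360 - orientation, reduced modulo 360 so that 0 maps to 0 rather than 360.
    return (360 - orientation) % 360


# Hypothetical values standing in for the results of
# GetBlockFirstParagraphIndex and GetBlockParagraphCount for one block.
first_paragraph, paragraph_count = 3, 2
for paragraph_index in child_indices(first_paragraph, paragraph_count):
    print("paragraph", paragraph_index)

# Hypothetical orientation value standing in for a GetOrientation result.
print(correction_angle(90))
```

The same index arithmetic would apply one level down, for example with GetParagraphFirstTextLineIndex and GetParagraphTextLineCount, under the same assumption of consecutive indices.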

Reference

GdPictureOCR Class
GdPicture14 Namespace
GdPictureOCR Constructor
Dispose Method