Extract labeled data from unstructured and semi-structured documents with intelligent document processing. Automatically identify field labels and their corresponding values, and then extract and validate the data — no manual templates required.
A key-value pair links a label (key) to its corresponding data (value). The key is fixed and describes the data type, while the value is the variable part of the pair.
Example: Date (key): 06/04/22 (value)
Key-value pairs capture common business data from invoices and other documents:
Different document types contain different key-value pairs. Invoice fields differ from survey or government form fields. Structured documents like Excel files are easy to parse because values are already labeled, while unstructured documents require intelligent extraction.
Documents without a predefined data model or consistent organization contain unstructured data. Such documents make up roughly 80 percent of all business documents.
Key-value pair extraction engines automatically identify and capture information from unstructured documents — no manual data entry required.
Built on the GdPicture.NET OCR engine, the KVP extractor uses a hybrid approach combining heuristics, mathematics, and machine learning techniques. Advanced OCR and MLP techniques enable the engine to automatically adapt to each document and select the optimal extraction strategy.
This hybrid approach overcomes the typical weaknesses of both traditional OCR and pure machine learning engines.
Adaptive denoising enables text recognition even in poor-quality images with noise, graphics, and complex table structures.
The engine recognizes dotted lines, touching characters, and broken letterforms that pure ML engines can’t handle, and it improves accuracy with reliable segmentation.
It preprocesses colored backgrounds using thresholding and image segmentation, and it supports underlined and styled text.
It accurately recognizes skewed text and handles orientation issues across challenging page layouts.
Beyond the key and the value, the extraction engine returns two additional data points for each pair.
Data type: identifies the nature of the extracted content, such as a phone number, IBAN, name, or credit card number.
Confidence: a score calculated from OCR results at the character and word levels, together with the key type, page location, and other details.
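As a rough illustration of how these two extra data points might be used downstream, the C# sketch below routes extracted pairs by confidence. The `ExtractedPair` record, the `PairTriage.Split` helper, the sample values, and the 80 percent threshold are all hypothetical and are not part of the GdPicture.NET API. The sketch also assumes the confidence is reported as a percentage from 0 to 100.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Made-up sample values standing in for engine output (not real extraction results).
var pairs = new List<ExtractedPair>
{
    new("Invoice Number", "12345", "Number", 97.4),
    new("Date", "06/04/22", "Date", 88.0),
    new("Phone", "555-0100", "PhoneNumber", 41.2),
};

// Assumed cut-off; tune it against your own documents.
const double Threshold = 80.0;

var (accepted, needsReview) = PairTriage.Split(pairs, Threshold);

foreach (var p in accepted)
    Console.WriteLine($"accept | {p.Key} = {p.Value} ({p.DataType})");
foreach (var p in needsReview)
    Console.WriteLine($"review | {p.Key} = {p.Value} ({p.DataType})");

// Hypothetical container for one extracted pair; not a GdPicture.NET type.
public record ExtractedPair(string Key, string Value, string DataType, double Confidence);

public static class PairTriage
{
    // Splits pairs into those at or above the threshold and those below it.
    public static (List<ExtractedPair> Accepted, List<ExtractedPair> NeedsReview) Split(
        IEnumerable<ExtractedPair> pairs, double threshold)
    {
        var list = pairs.ToList();
        return (list.Where(p => p.Confidence >= threshold).ToList(),
                list.Where(p => p.Confidence < threshold).ToList());
    }
}
```

Pairs that fall below the threshold could be sent to a manual review queue rather than discarded, and the data type could be used to pick a validation rule for each value.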
GET STARTED
Download and install the GdPicture.NET package to access compiled demo applications and multi-language sample projects with full source code.
The relevant folders are \Samples\Bin\ and \Samples\WinForm\.
This example extracts data items from an invoice.

C#:

    using System;
    using GdPicture14; // GdPicture.NET 14 namespace

    using GdPictureOCR gdpictureOCR = new GdPictureOCR();
    using GdPictureImaging gdpictureImaging = new GdPictureImaging();
    // Load the source document.
    int imageId = gdpictureImaging.CreateGdPictureImageFromFile(@"C:\temp\source.png");
    // Configure the OCR process.
    gdpictureOCR.ResourceFolder = @"C:\GdPicture.NET 14\Redist\OCR";
    gdpictureOCR.AddLanguage(OCRLanguage.English);
    gdpictureOCR.SetImage(imageId);
    // Run the OCR process.
    string ocrResultId = gdpictureOCR.RunOCR();
    // Collect each key-value pair with its data type and confidence.
    string keyValuePairsData = "";
    for (int pairIndex = 0; pairIndex < gdpictureOCR.GetKeyValuePairCount(ocrResultId); pairIndex++)
    {
        keyValuePairsData += $"| Key: {gdpictureOCR.GetKeyValuePairKeyString(ocrResultId, pairIndex)} | " +
            $"Value: {gdpictureOCR.GetKeyValuePairValueString(ocrResultId, pairIndex)} | " +
            $"Data Type: {gdpictureOCR.GetKeyValuePairDataType(ocrResultId, pairIndex)} | " +
            $"Confidence Level: {Math.Round(gdpictureOCR.GetKeyValuePairConfidence(ocrResultId, pairIndex), 1)}% |\n";
    }
    // Write the output to the console.
    Console.WriteLine(keyValuePairsData);
    // Release unnecessary resources.
    gdpictureImaging.ReleaseGdPictureImage(imageId);
    gdpictureOCR.ReleaseOCRResults();

VB.NET:

    ' Place this inside a Sub; it requires "Imports GdPicture14" at the top of the file.
    Using gdpictureOCR As GdPictureOCR = New GdPictureOCR()
    Using gdpictureImaging As GdPictureImaging = New GdPictureImaging()
        ' Load the source document.
        Dim imageId As Integer = gdpictureImaging.CreateGdPictureImageFromFile("C:\temp\source.png")
        ' Configure the OCR process.
        gdpictureOCR.ResourceFolder = "C:\GdPicture.NET 14\Redist\OCR"
        gdpictureOCR.AddLanguage(OCRLanguage.English)
        gdpictureOCR.SetImage(imageId)
        ' Run the OCR process.
        Dim ocrResultId As String = gdpictureOCR.RunOCR()
        ' Collect each key-value pair with its data type and confidence.
        Dim keyValuePairsData As String = ""
        For pairIndex As Integer = 0 To gdpictureOCR.GetKeyValuePairCount(ocrResultId) - 1
            keyValuePairsData += $"| Key: {gdpictureOCR.GetKeyValuePairKeyString(ocrResultId, pairIndex)} | " &
                $"Value: {gdpictureOCR.GetKeyValuePairValueString(ocrResultId, pairIndex)} | " &
                $"Data Type: {gdpictureOCR.GetKeyValuePairDataType(ocrResultId, pairIndex)} | " &
                $"Confidence Level: {Math.Round(gdpictureOCR.GetKeyValuePairConfidence(ocrResultId, pairIndex), 1)}% |" & vbLf
        Next
        ' Write the output to the console.
        Console.WriteLine(keyValuePairsData)
        ' Release unnecessary resources.
        gdpictureImaging.ReleaseGdPictureImage(imageId)
        gdpictureOCR.ReleaseOCRResults()
    End Using
    End Using

Key-value pair extraction is an intelligent document processing technique that automatically identifies and extracts labeled data from unstructured and semi-structured documents. A key-value pair consists of a label (key) that describes a data type and its corresponding value. For example, in an invoice, “Invoice Number” is the key and “12345” is the value.
The GdPicture KVP extraction engine uses a hybrid approach combining heuristics, mathematics, and machine learning techniques to automatically identify field labels and their corresponding values without requiring manual templates.
The GdPicture key-value pair extraction engine works through a multistep process:
The hybrid approach ensures accurate extraction, even from noisy documents, skewed text, and challenging layouts that pure ML engines struggle with.
Key-value pair extraction is particularly beneficial for documents containing structured data fields, including:
The technology excels with unstructured documents that represent roughly 80 percent of all business documents — those without a predefined data model or consistent organization.
Key-value pair extraction faces several technical challenges that the GdPicture engine is designed to overcome:
The GdPicture hybrid approach addresses these challenges through advanced OCR techniques, machine learning, and intelligent preprocessing.
Implementing key-value pair extraction with GdPicture.NET is straightforward:
The SDK includes compiled demo applications and multi-language sample projects with full source code in C# and VB.NET to help you get started quickly.