Key Value Pair Extraction SDK

Bring Intelligent Document Understanding and Processing features to your unstructured and semi-structured documents with the new key-value pair data extractor.

The engine detects labeled information in a document, extracts each label together with its value, and qualifies the value, all in an instant.

KVP and Intelligent Document Processing

Key Value Pair

A key-value pair consists of two related data items: a key and a value. The key identifies the data and is fixed; the value is variable and holds the content for that key.

Date (key) : 06/14/22 (value)

The key-value pair fields depend on the type of document. For instance, the fields on an invoice differ from those on a survey or a government form.
It’s easy to get key-value pairs from structured documents like Excel files because the values are all named.
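
To illustrate why structured data is the easy case, here is a minimal Python sketch. It assumes a hypothetical spreadsheet exported as CSV with made-up invoice values; it does not use the GdPicture.NET API. Because the header row names every column, each row already is a set of key-value pairs.

```python
import csv
import io

# Hypothetical spreadsheet export: the header row names every value.
sheet = io.StringIO(
    "Invoice Number,Date,Total Amount,Taxes\n"
    "INV-001,06/14/22,120.00,20.00\n"
)

# DictReader uses the header row as keys, so no extraction engine is needed.
rows = list(csv.DictReader(sheet))

for key, value in rows[0].items():
    print(f"{key} (key) : {value} (value)")
```

An unstructured document offers no such header row, which is where an extraction engine comes in.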

Key-value pair fields for invoices can be:

  • Invoice Number
  • Date
  • Total Amounts
  • Taxes

Any document that does not have a pre-defined data model or is not organized in a pre-defined manner contains unstructured data. Such documents represent about 90% of all documents generated.
For these documents, you will need a KVP extraction engine to retrieve the information.

The GdPicture.NET KVP engine

The KVP extraction engine is fully integrated into the GdPicture.NET OCR engine. Like the other OCR technologies (MICR, MRZ, OMR, contextual OCR, and more), it benefits from a hybrid approach that combines heuristics, mathematics, and ML capabilities.
We also use adaptive layout understanding and the same underlying techniques as NLP technologies.

The GdPicture.NET engine automatically adapts to the document and searches for the right approach, making the best use of resources available.

This approach gives excellent results where traditional OCR and pure Machine Learning engines are usually weak, especially with:

  • Text recognition in documents with lots of noise thanks to adaptive despeckling,
  • Dotted lines: ML engines often fail to recognize them,
  • Touching & broken characters, thanks to character segmentation,
  • Text on colored backgrounds: thresholding and image segmentation convert a color or grayscale image into a binary image that is easier to analyze,
  • Underlined text,
  • Skewed text,
  • Text in graphics and tables.
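
The thresholding step mentioned in the list above can be sketched in a few lines of Python. This is a generic global threshold using the mean intensity, shown only to illustrate the idea of turning a grayscale image into a binary one. It is an assumption for illustration, not the method used by the GdPicture.NET engine, and the function name and sample pixel values are hypothetical.

```python
def binarize(gray, threshold=None):
    """Convert a grayscale image (rows of 0-255 ints) into a binary image.

    If no threshold is given, the mean intensity of the image is used
    as a simple global threshold.
    """
    pixels = [p for row in gray for p in row]
    if threshold is None:
        threshold = sum(pixels) / len(pixels)
    # 1 = dark pixel (likely ink), 0 = light pixel (likely background)
    return [[1 if p < threshold else 0 for p in row] for row in gray]


# Hypothetical 3x4 patch: a dark vertical stroke on a mid-tone background.
patch = [
    [180, 30, 180, 175],
    [178, 35, 182, 181],
    [179, 28, 177, 180],
]

binary = binarize(patch)
for row in binary:
    print(row)
```

A single global threshold works on this tiny patch because the stroke and the background are clearly separated; uneven backgrounds generally call for a locally adaptive threshold instead.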

The GdPicture.NET KVP extraction engine also provides two additional fields besides Key and Value: Type and Accuracy.

  • The Type data provides the nature of the content. For example: phone number, IBAN, name, credit card number, etc.
  • The Accuracy data is a confidence level. This confidence level is computed by taking into account various parameters like OCR results at character and word levels, type of key, position on the page, and more.
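
As a rough illustration of how the four fields might be consumed, the Python sketch below models one result as a record and flags low-confidence pairs for manual review. The class, the function, the 0-100 accuracy scale, the cutoff, and the sample values are all assumptions made for this example; they are not the GdPicture.NET API.

```python
from dataclasses import dataclass


@dataclass
class KvpResult:
    """Hypothetical container mirroring the four fields described above."""
    key: str
    value: str
    type: str        # nature of the content, e.g. "IBAN" or "phone number"
    accuracy: float  # confidence level, assumed here to range from 0 to 100


def needs_review(results, min_accuracy=80.0):
    """Return the pairs whose confidence falls below the chosen cutoff."""
    return [r for r in results if r.accuracy < min_accuracy]


# Made-up sample values for illustration only.
results = [
    KvpResult("IBAN", "GB82 WEST 1234 5698 7654 32", "IBAN", 97.5),
    KvpResult("Date", "06/14/22", "date", 91.0),
    KvpResult("Phone", "555 01O0", "phone number", 62.0),
]

flagged = needs_review(results)
for r in flagged:
    print(f"Review {r.key}: {r.value!r} ({r.type}, accuracy {r.accuracy})")
```

Routing only the low-confidence pairs to a person is one common way to use a confidence score; the right cutoff depends on the documents and on how costly an error is.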

KVP results for a bank statement
GdPicture.NET KVP extractor results on an IBAN

Capabilities and benefits

Document management capabilities

  • enhanced document indexing
  • automatic labeling
  • automatic removal of sensitive information (helps with redaction)
  • invoice processing

Global benefits

  • Fewer errors
  • Saves time… and money
  • Easier compliance

Areas of use

Banking & Finance

How to use

You will find compiled demo applications in
[Install directory]\Samples\Bin\

You will find C# and VB.NET demo applications including source code in
[Install directory]\Samples\WinForm\

You will find other code snippets in the online reference guide: GdPicture.NET Guides

Our Intelligent Document Processing technologies

