Extract structured data from tables in any document using AI-powered detection. GdPicture.NET’s table extraction engine uses machine vision and artificial intelligence for fast, accurate results across all business documents.
Built on GdPicture’s intelligent document processing technology, the table extraction engine handles the challenges that break traditional OCR.
Integrate table extraction into your workflows with just a few lines of code.
Extract tables at 100 pages per second — 60× faster than traditional ML approaches. Streamline document processing and reduce manual data entry.
Adaptive resource management supports high-volume processing. Works on all document types, including scanned PDFs, digital PDFs, and images.
Export extracted tables to Excel (XLSX), JSON, or Markdown. Deploy on-premises or embedded in your application.
HOW IT WORKS
USE CASES
GET STARTED
Download and install the GdPicture.NET package to access compiled demo applications and multi-language sample projects with full source code.
\Samples\Bin\.\Samples\WinForm\.This example shows how to extract data from an invoice using the preconfigured invoice template with four additional custom fields.
static void RunExtraction(){ Configuration.RegisterGdPictureKey("GDPICTURE_KEY"); Configuration.RegisterLLMProvider(new OpenAIProvider("OPENAI_API_KEY")); Configuration.ResourcesFolder = "resources";
// Building the component. ProcessorComponent component = buildComponent();
// Process the document. ProcessorResult result = new DocumentProcessor().Process("invoice.pdf", component);
// Analyze results. if (result.ExtractedFields != null) { foreach (var item in result.ExtractedFields) { Console.WriteLine($"Field name: {item.FieldName} | Field value: {item.Value} | Validation state: ({item.ValidationState})"); } }}
static ProcessorComponent buildComponent(){ return new ProcessorComponent() { EnableClassifier = false, // Classification is not required, as you're using a single template. EnableFieldsExtraction = true, // Enabling extraction of fields specified from the templates defined in the "Templates" field below. Templates = new DocumentTemplate[] { buildInvoiceTemplate() } };}
static DocumentTemplate buildInvoiceTemplate(){ // Template setup: getting an instance of the preconfigured invoice template. DocumentTemplate invoiceTemplate = DocumentTemplates.Invoice;
// Adding a custom field to the invoice template instance // to get the order ID of a specific format. invoiceTemplate.AddField(new() { Name = "Order ID", Format = FieldDataFormat.Text, SemanticDescription = "The order ID in the invoice", RegexValidationMethods = new List<RegexFieldValidationMethod> { new RegexFieldValidationMethod("^[A-Z][0-9]{1,6}$") } });
// Adding a custom field to the invoice template instance // to calculate the payment due date based on the information in the invoice. invoiceTemplate.AddField(new() { Name = "Payment due date", Format = FieldDataFormat.Text, SemanticDescription = "The date that the payment is due", StandardValidationMethods = new List<StandardFieldValidationMethod>() { new StandardFieldValidationMethod(StandardFieldValidation.DateIntegrity) } });
// Adding a custom field to the template instance // to get the unique item count. invoiceTemplate.AddField(new() { Name = "Unique item count", Format = FieldDataFormat.Number, SemanticDescription = "The total number of unique items", StandardValidationMethods = new List<StandardFieldValidationMethod>() { new StandardFieldValidationMethod(StandardFieldValidation.NumberIntegrity) } });
// Adding a custom field to the template instance // to get the sum total of all items. invoiceTemplate.AddField(new() { Name = "Total item count", Format = FieldDataFormat.Number, SemanticDescription = "The sum of all items, including multiplying the quantities of each item", StandardValidationMethods = new List<StandardFieldValidationMethod>() { new StandardFieldValidationMethod(StandardFieldValidation.NumberIntegrity) } });
return invoiceTemplate;}The table extraction engine can process all types of documents, including scanned PDFs, digital PDFs, and images. It handles documents with low image quality, noise, skewed pages, and various table formats, including bordered, semi-bordered, and borderless tables.
The engine uses machine vision and artificial intelligence to handle challenges that break traditional OCR. It can accurately extract tables from documents with colored cells, characters touching cell borders, tables spanning page breaks, and various other complex scenarios.
Yes. Extracted tables can be exported to multiple formats, including Excel (XLSX), JSON, and Markdown. You can extract individual tables or all tables from a document, preview each table, or combine multiple tables in a single output file.
Yes. The engine can detect and process multiple tables within a single document. It identifies all table regions on each page, including bounding boxes for borderless tables, and can extract them individually or all at once.
Common use cases include invoice automation for extracting line items and totals, quality control for verifying data accuracy across documents, form automation for processing standardized documents, and data reconciliation for comparing tables across multiple documents for financial reporting.
Yes. GdPicture.NET is one of Nutrient’s core products, and the table extraction technology is shared across Nutrient’s desktop, web, and mobile SDKs, delivering consistent structured data capture from complex document layouts across platforms.
60-day free trial