False Positives

Posted: Mon Mar 11, 2013 3:25 pm
by ShaneH
Hi,

I am using ADR to split large PDFs created on a copier into individual documents. However, I get a lot of false positives where the scanned image looks nothing like the template image. This happens on about 10-20% of non-matching pages.

Code:

For pgTurner = 1 To gdIncomingPDF.GetPageCount()
    gdIncomingPDF.SelectPage(pgTurner)
    ' Extract the scanned image of the current page and match it against the templates.
    Dim ImgNo As Integer = gdIncomingPDF.ExtractPageImage(1)
    Dim nCloserTemplate As Integer = gdImage.ADRGetCloserTemplateForGdPictureImage(ImgNo)
    ' Read the confidence once so that every test below uses the same value.
    Dim conf = gdImage.ADRGetLastConfidence()
    ' Release the extracted image so that it does not pile up in memory on large PDFs.
    gdImage.ReleaseGdPictureImage(ImgNo)

    If conf > TemplateConfidence Then
        If EndPage > 0 Then
            ' A new document starts on this page: save the pages collected so far.
            gdNewPDF.NewPDF()
            gdNewPDF.SetKeyWords(String.Format("TemplateID={0};Confidence={1}", PrevTemplateID, PrevTemplateConfidence))
            For pg = StartPage To EndPage
                gdNewPDF.ClonePage(gdIncomingPDF, pg)
            Next
            gdNewPDF.SaveToFile(String.Format("{0}_{1}.pdf", filename, DocNo))
            gdNewPDF.CloseDocument()
            DocNo += 1
            StartPage = pgTurner
        End If

        ' Remember the template that opened the document now being collected.
        ' The match result is reused here instead of running recognition a second time.
        Dim tmpTemplate As TemplateItems = CurrentTemplates.First(Function(tmp As TemplateItems) tmp.templateID = nCloserTemplate)
        PrevTemplateConfidence = conf
        PrevTemplateID = tmpTemplate.templateID
    End If

    EndPage = pgTurner
Next

If PrevTemplateID = -1 Then
    ' PrevTemplateID still has its initial value: no page matched, so there is nothing to split.
    tmpItem.Processed = True
Else
    ' Save the final document, which runs to the last page.
    gdNewPDF.NewPDF()
    gdNewPDF.SetKeyWords(String.Format("TemplateID={0};Confidence={1}", PrevTemplateID, PrevTemplateConfidence))
    For pg = StartPage To EndPage
        gdNewPDF.ClonePage(gdIncomingPDF, pg)
    Next
    gdNewPDF.SaveToFile(String.Format("{0}_{1}.pdf", filename, DocNo))
    gdNewPDF.CloseDocument()
End If

For most images where there ought to be a match, confidence levels are 80-90%, but some pages that are not a match come back at the same confidence levels. Is there a way to improve the accuracy? I have tried adding numerous images to the template, without much difference.
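
One way to see how much a given threshold matters is to replay the splitting rule from the loop above on recorded confidence values, without re-running recognition. The following is only a sketch in Python, kept independent of the GdPicture API: the names `PageResult`, `Split` and `split_pages` are hypothetical, the sample confidences are made up for illustration, and it assumes the per-page template ID and confidence have already been logged.

```python
from typing import List, NamedTuple


class PageResult(NamedTuple):
    """Recorded recognition result for one page (hypothetical structure)."""
    template_id: int
    confidence: float


class Split(NamedTuple):
    """One output document: an inclusive, 1-based page range plus its template."""
    start_page: int
    end_page: int
    template_id: int
    confidence: float


def split_pages(results: List[PageResult], threshold: float) -> List[Split]:
    """Apply the same rule as the loop above: a page whose confidence
    exceeds the threshold starts a new document."""
    splits: List[Split] = []
    start_page = 1
    prev_id, prev_conf = -1, 0.0  # -1 means no template has matched yet

    for page_no, result in enumerate(results, start=1):
        if result.confidence > threshold:
            if page_no > 1:
                # Close the document collected so far (pages before this one).
                splits.append(Split(start_page, page_no - 1, prev_id, prev_conf))
                start_page = page_no
            prev_id, prev_conf = result.template_id, result.confidence

    # Emit the trailing document only if at least one page matched.
    if results and prev_id != -1:
        splits.append(Split(start_page, len(results), prev_id, prev_conf))
    return splits


# Illustrative values only: page 4 is a false positive at confidence 82.
pages = [
    PageResult(1, 88), PageResult(0, 30), PageResult(2, 85),
    PageResult(3, 82), PageResult(0, 20),
]
for threshold in (80, 84):
    print(threshold, split_pages(pages, threshold))
```

With these made-up numbers, raising the threshold from 80 to 84 removes the spurious split at page 4. That only helps when false positives score lower than true matches, which is not the case for the pages described above that come back at the same 80-90% as real matches.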

The attached files are heavily redacted, but included here so you can get a feel for the different layouts of the documents.

Thanks

Shane

Re: False Positives

Posted: Tue Mar 12, 2013 3:11 pm
by Loïc
Hello,

Unfortunately, there is nothing we can do today to improve your results. The ADR engine, in its current version, expects to work with structured documents (both the templates and the documents to identify).
We have research & development plans to make it more accurate with any kind of document, but since we are talking about "research", I can't give any indication of when an enhanced version will be available.

Thank you for your understanding.

Kind regards,

Loïc

Re: False Positives

Posted: Tue Mar 12, 2013 6:16 pm
by ShaneH
OK. In that case, would the forms recognition functions (CreateAnchorTemplate / FindAnchor) be any more accurate?