I've noticed some very strange behavior:
When I load a PDF file that was previously saved by GdPicturePDF, remove all hidden text, run OCR on it again and save it, the file gets bigger every time I repeat this.
Here is a short code sample that reproduces the problem:
using GdPicture14;

for (var i = 0; i <= 10; i++)
{
    // Each round reads the previous output (sample0.pdf is the original input)
    using (var gdPicturePdf = new GdPicturePDF())
    {
        gdPicturePdf.LoadFromFile($"sample{i}.pdf");
        // Remove the invisible text layer added by the previous OCR round
        gdPicturePdf.RemoveHiddenText();
        gdPicturePdf.OcrPages("*", 0, "eng+deu", @"C:\GdPicture.NET 14\Redist\OCR", string.Empty, 300, OCRMode.FavorAccuracy, int.MaxValue, true);
        gdPicturePdf.SaveToFile($"sample{i + 1}.pdf", true, false);
    }
}
As you can see when running the sample project, "sample1.pdf" (the result of the first save with GdPicturePDF) is 231 KB, while after ten more iterations "sample11.pdf" has grown to 289 KB, about 25% larger!
What's the reason for this?
Since the hidden text is always removed before the next OCR round, I would expect the file size to stay the same.
Why does it keep growing with every iteration?
Thanks
Riso