Introducing PDF/A-4 Support in our PDF and Document Processing SDKs
We’re happy to provide conversion and validation support for all PDF/A versions, including the latest PDF/A-4 ISO standard (19005-4:2020) and its two conformance levels, PDF/A-4e and PDF/A-4f.
Introducing conversion and validation support for a new standard generally depends on two things: a good understanding of the specification and many examples to test the engines. In the case of PDF/A, taking care of the conversion engine is only the first step.
PDF/A is a critical format used for legal archival and preservation of electronic documents, and companies must be sure that the converted documents pass validation tests.
Building the PDF/A-4 converter was relatively straightforward, especially since we have been implementing the PDF/A specification since its first version. The validator is a different story. Adoption of PDF/A-4 is still recent: although the specification was released in 2020, “real-world” documents, that is, documents not created solely for testing, are still scarce.
The market has been slow to adopt PDF/A-4 because software vendors are waiting for competitors to release their converter and validator engines, so that they have a point of comparison before building their own. But companies cannot develop engines out of thin air if nobody uses the format. So until recently, we were all in a chicken-and-egg situation with PDF/A-4.
This is why we are only releasing PDF/A-4 support now: we have finally been able to gather enough resources to provide reliable engines.
Do you want to know more about PDF/A conversion and validation challenges? We presented a webinar about it at the last PDF Days Europe. Here is the replay:
The PDF/A format: a quick recap
The format’s history started back in 2005, even before PDF itself was standardized (ISO 32000-1:2008). Up to version 3, PDF/A included various conformance levels: a, b, and u, detailed below.
PDF/A-1 ISO 19005-1:2005
- Based on PDF 1.4.
- PDF/A-1a (with a for “accessible”) aims to increase the accessibility of the file thanks to a logical document structure and necessary tags and information to help users with assistive technologies.
- PDF/A-1b (with b for “basic”) includes only the features needed for reliable reproduction of the document’s visual appearance.
PDF/A-2 ISO 19005-2:2011
- Based on PDF 1.7 (ISO 32000-1:2008)
- We find three conformance levels in this version: PDF/A-2a, PDF/A-2b, and PDF/A-2u.
- Because it is based on PDF 1.7, PDF/A-2 can use features added to the PDF specification after PDF 1.4, such as transparency and optional content (layers).
- Another significant addition is the ability to embed PDF/A files, which makes it easier to archive sets of documents in a single file.
- The new conformance level PDF/A-2u (with u for “Unicode”) requires all text in the document to have Unicode mapping.
PDF/A-3 ISO 19005-3:2012
- Based on PDF 1.7 (ISO 32000-1:2008)
- Three conformance levels are available: PDF/A-3a, PDF/A-3b, and PDF/A-3u.
- PDF/A-3 allows embedding arbitrary file formats (such as XML, CSV, CAD, word-processing, spreadsheet, and others) into PDF/A documents.
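The recap above can be condensed into a small lookup table. This is only an illustrative sketch: the names `PdfAPart`, `PDFA_PARTS`, and `KNOWN_FLAVORS` are made up for this example and are not part of any SDK.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PdfAPart:
    """One part of the PDF/A standard, as summarized in the recap above."""
    iso: str          # ISO reference of the part
    base_pdf: str     # PDF version the part is based on
    levels: tuple     # conformance levels defined by the part


# Hypothetical table built from the list above (parts 1 to 3 only).
PDFA_PARTS = {
    1: PdfAPart("ISO 19005-1:2005", "PDF 1.4", ("a", "b")),
    2: PdfAPart("ISO 19005-2:2011", "PDF 1.7 (ISO 32000-1:2008)", ("a", "b", "u")),
    3: PdfAPart("ISO 19005-3:2012", "PDF 1.7 (ISO 32000-1:2008)", ("a", "b", "u")),
}

# Every flavor name that can be formed from the table, e.g. "PDF/A-2u".
KNOWN_FLAVORS = {
    f"PDF/A-{number}{level}"
    for number, part in PDFA_PARTS.items()
    for level in part.levels
}

if __name__ == "__main__":
    for name in sorted(KNOWN_FLAVORS):
        print(name)
```

Note that "PDF/A-1u" is absent from the resulting set, since the u level only appeared with PDF/A-2.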
The different PDF/A-4 versions
PDF/A-4 ISO 19005-4:2020 is based on PDF 2.0 (ISO 32000-2:2020)
It comes in three flavors, like the previous versions but with a plot twist. We now have a “base” version and two new optional conformance levels:
- PDF/A-4: without any conformance level
Originally, when work on PDF/A-4 started, the plan was to simplify the standard. The first step was to remove the conformance levels and leave a single version of the standard.
However, this solution turned out not to be optimal for every use case, so the optional conformance levels e and f were added to cover some specific ones.
So, in the end, the PDF/A-4 base replaces the previous b and u conformance levels.
As for the accessibility features brought by conformance level a, the PDF/UA standard (ISO 14289-1:2014, with ISO 14289-2 coming soon), which is specifically dedicated to accessible PDF, takes over that role from PDF/A.
PDF/A-4f (conformance level f) allows embedding files in any other format.
PDF/A-4e is intended for engineering documents and acts as a successor to the PDF/E-1 standard (the PDF standard for engineering). PDF/A-4e supports Rich Media and 3D Annotations as well as embedded files.
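To make the three flavors concrete, here is a minimal sketch of how a tool might read the flavor a file declares about itself. It assumes the usual PDF/A identification properties in the document's XMP metadata (`pdfaid:part`, plus an optional `pdfaid:conformance` that would be E or F for PDF/A-4). The function names and the sample packet are made up for this example, and extracting the XMP packet from the PDF is left out. Reading the declared flavor is not validation: a validator still has to check the whole file against the standard.

```python
import xml.etree.ElementTree as ET

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
PDFAID_NS = "http://www.aiim.org/pdfa/ns/id/"


def _prop(description, name):
    """Read a pdfaid property stored either as an attribute or as a child element."""
    key = f"{{{PDFAID_NS}}}{name}"
    if key in description.attrib:
        return description.attrib[key].strip()
    child = description.find(key)
    if child is not None and child.text:
        return child.text.strip()
    return None


def read_declared_flavor(xmp):
    """Return the declared flavor (e.g. 'PDF/A-4f'), or None if no part is declared."""
    root = ET.fromstring(xmp)
    for description in root.iter(f"{{{RDF_NS}}}Description"):
        part = _prop(description, "part")
        if part is None:
            continue
        # The base PDF/A-4 flavor declares no conformance level at all.
        conformance = _prop(description, "conformance") or ""
        return f"PDF/A-{part}{conformance.lower()}"
    return None


# Hypothetical XMP metadata for a PDF/A-4f file, trimmed to the relevant properties.
SAMPLE = """<x:xmpmeta xmlns:x="adobe:ns:meta/">
  <rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#">
    <rdf:Description rdf:about="" xmlns:pdfaid="http://www.aiim.org/pdfa/ns/id/">
      <pdfaid:part>4</pdfaid:part>
      <pdfaid:rev>2020</pdfaid:rev>
      <pdfaid:conformance>F</pdfaid:conformance>
    </rdf:Description>
  </rdf:RDF>
</x:xmpmeta>"""

if __name__ == "__main__":
    print(read_declared_flavor(SAMPLE))
```

The helper accepts both the attribute form and the element form because XMP allows simple properties to be serialized either way.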
Why should you use PDF/A-4 now?
Now that we’ve described the format’s history, let’s look at the benefits of using PDF/A-4.
Benefits of PDF/A-4
- Based on the PDF 2.0 specification (ISO 32000-2:2020)
It’s always better to work with up-to-date specifications and technologies. A great effort has been made in how the PDF 2.0 specification is written, allowing a clearer understanding of the standard.
- Support for modern and current-generation technologies
PDF/A-3 was released in 2012. Since then, the technologies it relies on have evolved considerably, Unicode in particular. PDF/A-4 takes these changes into account.
You will find more arguments for adopting PDF/A-4 on the PDF Association website.
PDF/A-4 support in our PDF SDK
Since the end of 2021, we have supported PDF/A-4 conversion and validation in our various SDKs:
Test our conversion engine with your documents! For our demos, we use our AvePDF web app that runs with our engines:
Test our validator with your documents:
Remember, PDF/A-4 is still young, so if you encounter any issue with one of your files, please do not hesitate to let us know! The more we process problematic files, the more robust the conversion and validation engines become.