BFO PDF Library 2.28.4 - with PDF/UA-2 and WTPDF

Released late last Friday, this is a point release, but it has some big news: it contains the final profile for PDF/UA-2, which is being published tomorrow.

To be fair, this is mostly big news for us: BFO have been part of the working groups for PDF/UA-2 and WTPDF for several years now, so this is the end of a long process. But PDF/UA-2 and the related "Well-Tagged PDF" specification will likely form the basis for accessible PDF over the next decade or more, so we think this is significant - even though it's probably the first you've heard of them.

Here's a very short primer on these two specifications.

PDF/UA-2 and Well-Tagged PDF

PDF/UA is also known as ISO 14289, and is the current standard for accessibility in PDF documents. If your PDF documents comply with "Section 508" in the USA, "EN 301 549" in Europe or WCAG for PDF then you're probably using PDF/UA already, perhaps without even realising it.

PDF/UA-1 (ISO 14289-1) was first published in 2012 and revised in 2014, and defines the accessibility rules for PDF 1.x.

PDF/UA-2 (ISO 14289-2) updates this for PDF 2.x, which was itself published in 2017. It's a big improvement, and if we had to pick the most important reasons why, they would be:

  • The ambiguities in PDF/UA-1 which resulted in validators giving different "pass" or "fail" verdicts on files are fixed. It is a much tighter specification, thanks in part to ISO 32005 defining the tag hierarchy rules (rules that were not well defined for PDF/UA-1).
  • PDF/UA-2 supports the PDF 2.0 concept of namespaces on the logical structure. Namespaces let us keep a lot of semantic information which was being lost in PDF/UA-1. If you're converting from HTML to PDF, for example, you can keep the original HTML tags - very important if you later want to convert the PDF back to HTML. We'll come back to this topic a bit later.
  • PDF/UA-2 supports MathML, which is part of PDF 2.x. For academic publishers this will be a very significant improvement.
  • PDF/UA-2 is completely compatible with PDF/A-4, and we expect these two profiles to be used together in many cases.

There's also WTPDF, short for Well-Tagged PDF. It is an Adobe-led initiative, and the short summary is that WTPDF is essentially identical to PDF/UA-2 but published by the PDF Association rather than ISO, which means it's free to download and will be actively maintained - two phrases rarely associated with ISO. Of course this is over-simplified; for a longer and more accurate explanation we'd recommend you go to the source at https://pdfa.org/wtpdf/.

A PDF file can be compliant with PDF/UA-2, WTPDF or both. Exactly how the industry adopts this slightly unfortunate acronym-soup of standards remains to be seen, but for now best practice when creating an accessible PDF is: always target both standards. The cost is negligible (700 bytes per file), and your documents will be future-proof.
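To give a feel for what "targeting both" looks like inside a file, here is a minimal sketch in plain Java. It is not the BFO API. It assumes, as we understand the two specifications, that PDF/UA-2 is claimed through the pdfuaid:part property in the XMP metadata and that WTPDF is claimed through a PDF Declaration whose conformsTo value is a URI. The class name, method names, sample metadata and the exact declaration URI are illustrative assumptions and should be checked against the published specifications.

```java
import java.util.regex.Pattern;

// Sketch only: looks for conformance *claims* in XMP text. It does not validate anything.
class ClaimSketch {
    // Assumed identifier for the WTPDF accessibility declaration; verify against the WTPDF specification.
    static final String WTPDF_ACCESSIBILITY = "http://pdfa.org/declarations/wtpdf#accessibility1.0";

    // Matches pdfuaid:part="2" (attribute form) or <pdfuaid:part>2</pdfuaid:part> (element form),
    // assuming the conventional "pdfuaid" prefix is used in the metadata.
    static final Pattern PDFUA2_PART =
        Pattern.compile("pdfuaid:part(?:\\s*=\\s*[\"']|\\s*>)\\s*2\\s*[\"'<]");

    static boolean claimsPdfUa2(String xmp) {
        return PDFUA2_PART.matcher(xmp).find();
    }

    static boolean claimsWtpdfAccessibility(String xmp) {
        return xmp.contains(WTPDF_ACCESSIBILITY);
    }

    public static void main(String[] args) {
        // Hypothetical, heavily simplified metadata fragment; real XMP packets are more verbose.
        String xmp = "<rdf:Description pdfuaid:part=\"2\" pdfuaid:rev=\"2024\">"
            + "<pdfd:conformsTo>" + WTPDF_ACCESSIBILITY + "</pdfd:conformsTo>"
            + "</rdf:Description>";
        System.out.println("PDF/UA-2 claimed: " + claimsPdfUa2(xmp));
        System.out.println("WTPDF claimed: " + claimsWtpdfAccessibility(xmp));
    }
}
```

A claim in the metadata is only a claim: whether the file actually meets either standard still has to be established by validation.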

BFO, PDF/UA-2 and WTPDF

This 2.28.4 release adds new profiles to the OutputProfile class for PDF/UA-2 as well as WTPDF. Validation and creation of these files is really no different to PDF/UA-1, and support has unofficially been in BFO Publisher for several months - it was used to create one of the two sample WTPDF files published at the PDF Association WTPDF site.

We'll demonstrate the exact process of creating a PDF/UA-2 file with the API in an upcoming article. The process of validating a PDF/UA-2 (or WTPDF) file is identical to validating PDF/UA-1.

Deriving HTML from PDF

BFO have also been involved in another PDF Association project, this one initiated by Foxit rather than Adobe. Deriving HTML from PDF defines a set of rules for converting a suitably tagged PDF file (such as a PDF/UA file) back to HTML.

This proposal is still in its early days, but release 2.28.4 of our PDF Library includes an implementation of the algorithm in the HtmlDerivation class.

This will only work on a tagged PDF - the algorithm is not a magic bullet. Even in 2024 the vast majority of PDF files in existence are not tagged, and a fair proportion of the rest are tagged badly. But it's a potentially useful area for many workflows, and we already have tooling for converting HTML to PDF (publisher.bfo.com) and for working with tagged PDF, thanks to our support for PDF/UA. So the BFO PDF Library will continue to support and track this specification as it evolves.
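To show why tags and namespaces matter here, below is a toy sketch in plain Java. It does not use the HtmlDerivation class and it is not the algorithm from the specification. The Elem class, the toHtml method and the fallback mapping are all hypothetical. The only idea it illustrates is that an element tagged in the HTML namespace can be written back out unchanged, while an element in the PDF 2.0 standard namespace has to be mapped to some HTML equivalent.

```java
import java.util.List;
import java.util.Map;

// Toy model only: not the BFO API and not the real derivation algorithm.
class DerivationSketch {
    // Namespace URIs for the PDF 2.0 standard structure types and for (X)HTML.
    static final String PDF2_NS = "http://iso.org/pdf2/ssn";
    static final String HTML_NS = "http://www.w3.org/1999/xhtml";

    // Hypothetical fallback mapping from a few PDF standard tags to HTML elements.
    static final Map<String, String> FALLBACK = Map.of(
        "Document", "div", "P", "p", "H1", "h1", "L", "ul", "LI", "li", "Span", "span");

    // One node of a simplified structure tree: a namespaced tag, its text, and child nodes.
    static final class Elem {
        final String ns, tag, text;
        final List<Elem> kids;
        Elem(String ns, String tag, String text, List<Elem> kids) {
            this.ns = ns; this.tag = tag; this.text = text; this.kids = kids;
        }
    }

    // HTML-namespaced tags are kept as they are; anything else goes through the fallback table.
    static String toHtml(Elem e) {
        String name = HTML_NS.equals(e.ns) ? e.tag : FALLBACK.getOrDefault(e.tag, "div");
        StringBuilder sb = new StringBuilder("<" + name + ">");
        sb.append(escape(e.text));
        for (Elem kid : e.kids) sb.append(toHtml(kid));
        return sb.append("</" + name + ">").toString();
    }

    // Escape the characters that would otherwise be read as markup.
    static String escape(String s) {
        return s.replace("&", "&amp;").replace("<", "&lt;").replace(">", "&gt;");
    }

    public static void main(String[] args) {
        Elem doc = new Elem(PDF2_NS, "Document", "", List.of(
            new Elem(HTML_NS, "article", "", List.of(
                new Elem(PDF2_NS, "H1", "Release notes", List.of()),
                new Elem(HTML_NS, "em", "2.28.4", List.of())))));
        // prints <div><article><h1>Release notes</h1><em>2.28.4</em></article></div>
        System.out.println(toHtml(doc));
    }
}
```

An untagged PDF has no such tree to walk at all, which is why derivation cannot help with those files.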

The rest of 2.28.4

As usual this release includes a mixed bag of bug-fixes and small improvements, all of which are documented in the Release Notes. A fair summary of those in this release is "nothing big" - most are very subtle issues resulting from interop testing with other tools, and from an audit ensuring we are 100% up-to-date with the published errata.

Download the latest release from bfo.com/download