
I am trying to produce a structured or 'tagged' PDF from pdfTeX (TeX Live 2013) that passes the automated tagging tests in Adobe Acrobat. These tests are the de facto indicator of a document's accessibility, which universities, government agencies and others worldwide often require for published documents.

To test a document for accessibility / tagging, try this:

  1. Open the PDF in Acrobat Reader
  2. 'File' -> 'Document Properties' -> 'Description'
  3. Under 'Advanced', you'll see a field, 'Tagged PDF'. Ideally it should say 'Yes'. But it doesn't :(

Question: does anyone know of any way to create a PDF which passes Acrobat's 'Tagged PDF' test directly from LaTeX?

The ideal solution would be a package or two that can be called from the preamble of a LaTeX document. I want something that is very low effort for the author, and can thus be very easily integrated with existing workflows.

Note: There has been discussion about this in the past (see tags and links on the right), but as of May 2014, there was no clear solution to this. Much of the existing discussion is from 2012 or earlier (see How can tagged PDFs be created that support Universal Accessibility and reflowing?), and so I'd like to see if we can kick-start this discussion.

Why this isn't a duplicate question: When I first posted this question in July 2013, it was flagged as a duplicate of a question from 2010. The answer references a presentation given at TUG 2010. That presentation basically says "we're working on it", and is not an answer that allows me to implement a solution.

Because it is now 2014, I think it reasonable to expect that:

  1. There have been new developments in this field through new packages or updates to core LaTeX
  2. Attempting to comply with Section 508 will have required that people figure out solutions. This is a relatively recent requirement and so solutions may have changed since 2010
  3. The software that is typically used to judge compliance (Adobe Acrobat) has changed several times since the earlier question went unanswered
  4. Packages that address accessibility have been proposed but have not come into widespread use, or have even disappeared (e.g. the accessibility package, http://www.babs.gmxhome.de/da_ergeb.htm), and it is therefore reasonable to revisit this issue with fresh eyes.
Andy Clifton
  • I can't recall any details, but there was some discussion about it on the ConTeXt mailing list - try searching its archives. How much are you attached to LaTeX? ConTeXt might be a better choice, especially in a non-academic setting. – mbork Jul 16 '13 at 19:34
  • @Werner - thank you, very relevant, but from 2009, and it includes this comment: "This kind of coding, directly in pdfTEX primitives, is really only useful for testing and “proof of concept” examples". I was hoping that there would be a 2013 version of this called "How to produce ADA-compliant documents from LaTeX"... – Andy Clifton Jul 16 '13 at 19:38
  • @mbork Umm, all of our templates are in LaTeX, and if I suddenly tell the authors to use ConTeXt there will be tantrums. That said, can you add an example demonstrating this? – Andy Clifton Jul 16 '13 at 19:40
  • Can you please have a look at these videos: [1], [2] and [3]. I guess you will get more info regarding tagged PDFs from the last two videos. – Jagath Jul 17 '13 at 02:55
  • So, how do I get this reopened as not a duplicate? Please see edits explaining why this is not a duplicate. – Andy Clifton Jul 19 '13 at 00:13
  • Did you try this patch (admittedly very old)? I've opened the PDF there and Acrobat claims it to be tagged. – Vedran Šego Jul 20 '13 at 23:47
  • @VedranŠego. Thanks for the suggestion. I was aware of the patch, but patching TeX doesn't really meet my requirement for something that's transparent to the user and works with TeX Live 2012. That said, if you wanted to propose this as an answer, maybe others can build on this and convince me that this is the solution? – Andy Clifton Jul 21 '13 at 00:11
  • I don't think anyone will try to convince you of anything. The patch (I've seen it recommended for TeX Live 2010 somewhere) seems the only way to accomplish what you want. However, if you do decide to try it out on your own, I suggest trying on a non-live system, maybe one installed solely for this test, i.e., in a virtual box. Good luck! – Vedran Šego Jul 21 '13 at 00:48
  • @LostBrit: There is not much new; certainly no working solution. There is a German computer science master's thesis from ca. 2007 which shows a fairly complete solution; some German people are looking into it. But no developer has stepped forward as of now. – Martin Schröder Jul 22 '13 at 09:27
  • @MartinSchröder: Do you mean the accessibility package? That's available again at http://www.babs.gmxhome.de/download/da_pdftex/accessibility.sty with (German) documentation at http://www.babs.gmxhome.de/download/da_pdftex/dok_pdf.pdf. This is the closest thing I've seen to a solution (single package, transparent to authors and passes most of the PDF tests). I'm going to put some time into that as a possible solution. I understand that the package has also been submitted to CTAN, so there might be a "formal", licensed, release as well. – Andy Clifton Aug 13 '13 at 20:20
  • @LostBrit: Yes, that's what I meant. Thanks. – Martin Schröder Aug 15 '13 at 10:20
  • You could look at the hyperref package (http://ctan.org/pkg/hyperref). – ppr Sep 30 '13 at 22:27
  • @ppr: Hyperref gives me some of what I need, but I would need to build a framework to use hyperref to then construct the document structure. I think there are other options that might be simpler, which is what I would like to find. – Andy Clifton Sep 30 '13 at 23:58
  • Glad to know you guys are still working on addressing the accessibility issue. BTW, I flew my rather simple idea of using the LaTeX code as written by the document author as the alt text for an equation by an unsighted mathematician colleague. He loved the idea. – David Hammen Jan 18 '14 at 19:59
  • @DavidHammen I read the comments as suggesting that not much work was being done on this. Certainly the people posting here are not among those they mention as having touched this. (I am probably not the only one who would have no clue how to do it, although perhaps I underestimate the resources of the average commentator on this question.) – cfr Jan 29 '14 at 04:56
  • If I ever need to produce accessible documentation, I'm going to have to use something other than TeX, I think. The gold standard seems to be Word (or was a year or two ago). What seems odd about this, when I think about it, is that semantic markup ought to be more amenable, although I think that you do probably have to write your Word document in a particular way for it to work (i.e. use styles etc. and not just do it visually). I keep hoping there'll be a solution by the time somebody comes along who needs me to do this... – cfr Jan 29 '14 at 04:56
  • It is possible to use tex4ht to generate XML formats suited for screen readers: http://www.cse.ohio-state.edu/~gurari/laspeak/. Another option is to make an HTML file with MathML for the math. The latter option is probably better, as HTML and MathML should be well supported by current screen readers. – michal.h21 May 09 '14 at 08:17
  • According to accessibility.sty it is distributed under a licence contained in access.tex which I can't find and may be distributed only with that file. According to accessibility.sty it is licensed under the LPPL. So there are conflicting licence statements which probably mean it is illegal to do much with it. (Of course, if access.tex were available, that might say something consistent with LPPL but it is hard to tell otherwise.) It is unfortunate that this seems not to have been uploaded to CTAN or even made clearly available. – cfr May 11 '14 at 00:29

2 Answers

Experience with the Accessibility Package

I downloaded the accessibility.sty style file from Babett Schalitz's website. This file is available under an LPPL license.

Using \usepackage[tagged]{accessibility} in my preamble allowed me to generate a basic tagged PDF file that passes tests in Adobe Acrobat. However, the package didn't work if I needed to use roman numerals for the first page. This error seems to be because accessibility.sty uses displayed page numbers to build the document tag structure.

Updating the Accessibility Package

To fix the numbering problem I added the count1to package and replaced a few of the \pageref with \count1 in accessibility.sty. I've called this modified file accessibility-meta.sty for now, and posted it to GitHub. The package now seems to compile both articles and reports, and the output shows up as "tagged" in Acrobat.

Make sure that \usepackage[tagged]{accessibility} or \usepackage[tagged]{accessibility-meta} is pretty much the last thing in your preamble. In the event of a 'tex counter overflow', try commenting out various bits of your code. I've found that the tagging blows up quite regularly, especially with complex documents. Compile the document several times so that the page numbering and the position of text and floats settle down and the tags are properly generated.
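
As a rough illustration of that load order, a minimal document might look like the sketch below. The document class, the two placeholder packages and the body text are assumptions chosen only for illustration; the `tagged` option and the package names are the ones mentioned above. It may well fail to compile on a current TeX distribution, so treat it as a starting point rather than a tested recipe.

```latex
\documentclass{article}

% Ordinary packages first (these two are only placeholders).
\usepackage[T1]{fontenc}
\usepackage{graphicx}

% Load the tagging package as late as possible in the preamble.
% Swap in accessibility-meta to try the modified file instead.
\usepackage[tagged]{accessibility}

\begin{document}

\section{Introduction}
Some body text to be tagged.

\end{document}
```

Run pdflatex on this several times before checking the result in Acrobat, since the tags depend on page information from earlier runs.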

Other Steps

I've also taken the steps that ndbd suggested:

  1. Upgrading to TeX Live 2015.
  2. Adding \pdfinterwordspaceon to the preamble to fix the loss of inter-word spacing in text (requires TeX Live 2014 or later).
  3. Adding the cmap package to fix the mapping of characters to Unicode.
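
Put together, the second and third steps amount to two extra lines in the preamble. The following is only a sketch, assuming pdfTeX from TeX Live 2014 or later. Loading cmap before fontenc is my reading of the cmap documentation, and the class and body text are placeholders.

```latex
\documentclass{article}

% cmap: embed CMap tables so that text can be mapped back to Unicode.
% Loaded early, before fontenc (assumption based on the cmap documentation).
\usepackage{cmap}
\usepackage[T1]{fontenc}

% pdfTeX primitive (TeX Live 2014 or later): insert real space characters
% between words so that extracted text is not run together.
\pdfinterwordspaceon

% Tagging package last, as recommended above (or accessibility-meta).
\usepackage[tagged]{accessibility}

\begin{document}
Text with spaces that should survive copy and paste.
\end{document}
```

A quick sanity check is to copy a sentence out of the resulting PDF and paste it into a text editor: the words should come out separated by spaces and with the correct characters.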

Testing and Feedback

I produced a test PDF using TeX Live 2015, the cmap package, \pdfinterwordspaceon, and the modified accessibility package. This seems to be "tagged". The PDF is available here.

Because I don't work with tagged documents on a regular basis, it's quite likely that my testing of the resulting document is not detailed enough. I'd be keen to know if this PDF passes pre-flight testing in Acrobat, and if the accessibility-meta.sty file works on other documents. I'm particularly interested in feedback from anyone who deals with tagged publications professionally.

Babett Schalitz deserves all of the credit for having produced the original accessibility.sty file, upon which accessibility-meta.sty is based. I would not have had a chance without her work.

Update August 2015: I am working towards getting accessibility-meta.sty onto CTAN. Please feel free to register issues on the GitHub repository, or provide suggestions for improvements.


Update June 2020:

In 2019 I got in contact with Babett and got the files for the original accessibility package; she also allowed me to take over the maintenance of the package. I tidied up accessibility enough to get it to CTAN, but didn't update the functionality.

Releasing accessibility to CTAN has shown that, unfortunately, there are now quite a few problems with it. When it was developed back in the early 2010s it worked a lot better, and it looks to be very sensitive to developments in other packages. It no longer compiles a basic MWE reliably.

Because of this I no longer think that accessibility is fit for purpose, and I will be contacting CTAN to look into getting it taken off CTAN (if possible).

However, I will leave the code at https://github.com/AndyClifton/accessibility. If anyone reading has coding skills and would like to contribute to the package, please leave an issue there.

Andy Clifton
  • This is very interesting. However, have you checked the resulting PDF with Acrobat Pro for accessibility? There are lots of errors. Most are due to not correctly mapping the characters to Unicode, which can probably be overcome by using the cmap package. The biggest hurdle, however, is the lack of spaces. Text within tags looks like: "Mostareduetonotcorrectlymapping..." – ndbd Aug 05 '14 at 17:01
  • @ndbd thanks for the comments. This is exactly the kind of feedback I need. I don't have any experience of looking at PDFs from a publisher's perspective, and it wouldn't have occurred to me to look for this. – Andy Clifton Aug 05 '14 at 17:06
  • I just checked, and in the new TeX Live 2014 there seems to be a new command, \pdfinterwordspaceon, which does exactly this. I haven't had time to tinker with it yet, though. – ndbd Aug 05 '14 at 17:08
  • What are the chances of either the original or your modified version actually making it to CTAN? – cfr Aug 15 '15 at 21:26
  • Is there a link for the package? If so, accessibility.sty could be redistributed. Without the source files, though, the licence doesn't permit it. I can't navigate the site well as I don't read more than a few words of German. Although not specifying a version of the LPPL is problematic. – cfr Aug 15 '15 at 21:32
  • The package file on GitHub is probably strictly illegal as it claims to provide accessibility.sty, even though it doesn't. And the licensing information does not apply to the modified version posted there so it can't be legally used. This might seem pedantic, but I'd like to see this work available generally and that means getting it to CTAN and into TeX distributions. TeX Live, in particular, won't take stuff without clear OSS licences. So it matters. Take a look at nfssext-cfr.sty for an example of a derivative work, but it requires a clear licence on the original which we don't have. – cfr Aug 15 '15 at 21:41
  • @cfr I emailed the person who wrote the accessibility package a year or so ago and they then tried to post it to CTAN. That didn't work for some reason. I haven't pestered them recently, but will try again. Also, the modified style file seems a bit alpha-version for CTAN. I almost wonder if it might be worth starting again...? – Andy Clifton Aug 16 '15 at 01:07
  • @AndyClifton It would be worth it. As for the modified version: you are the best judge of that. I think it is worth posting if you don't expect to find time to start again any time soon, though. Just because it seems efforts in this direction are (a) important and (b) not ending up being widely disseminated, so people keep finding nothing, trying to reinvent the wheel, getting stuck when the square they've made doesn't roll properly and so on. You can even post it as "unmaintained" under the LPPL if you don't want to be bothered by it. – cfr Aug 16 '15 at 02:27
  • @AndyClifton It appears that two years ago you deprecated accessibilityMeta in favor of CorporateLaTeX. But is that a different thing? Are you still aiming to get accessibilityMeta into CTAN? It would be helpful to have an officialish way to produce accessible pdfs. – Teepeemm Apr 09 '19 at 16:06
  • @AndyClifton Have you been in contact at all with Ulrike? I wonder what might be 'salvageable' as part of LaTeX team efforts – Joseph Wright Jun 25 '20 at 09:17
  • @JosephWright - yes, just talked to her today. Should have done that a long time ago. I'd also be interested in talking with you and the rest of the team to explore ideas. – Andy Clifton Jun 26 '20 at 22:40

This question has a lot of votes, but hasn't received an answer. So I am giving a ConTeXt solution. To create a tagged pdf, simply add:

\setuptagging[state=start]

at the top of your document. For example:

\setuptagging[state=start]

\starttext
\startsection
    [title={A section title}]
  \input ward
\stopsection
\stoptext

Then, pdfinfo test.pdf gives:

Title:          test
Creator:        ConTeXt - 2014.04.24 09:39
Producer:       LuaTeX-0.79.1
CreationDate:   Fri May 30 11:58:32 2014
ModDate:        Fri May 30 11:58:32 2014
Tagged:         yes
Form:           none
Pages:          1
Encrypted:      no
Page size:      595.276 x 841.89 pts (A4)
Page rot:       0
File size:      10992 bytes
Optimized:      no
PDF version:    1.6
Aditya
  • Aditya - thanks for the contribution. However, to fit with existing workflows I really need a LaTeX solution, so I can't accept this as an answer. – Andy Clifton May 30 '14 at 16:06
  • @AndyClifton: I understand. However, I believe that there is significant interest in tagged pdf. I wrote this answer in the hope that it will be of interest to others. – Aditya May 30 '14 at 16:51