Lately I've been coming across more and more places that use PDF files as a non-editable format to ensure that the data in them is not altered in any way. (Whether that is a good idea is another question altogether; the general public "understands" PDFs and knows how to generate them easily from their systems.)
However, what I'm having trouble with is coercing a PDF into a text-friendly, parseable format that allows the data in it to be analyzed. pdftotext goes a long way, but there's almost always one small problem with its output that makes it a less-than-ideal solution.
Are there any solutions to this dilemma? Is there a text-friendly file format that can be reasonably assured to be the original output from some software (without jumping through the hoops of signatures and encryption) and that a layman would be able to open and read easily?
Note: I'm well aware that all formats are editable, but that is not readily apparent to the average user; i.e., they would probably not know how to edit a PDF without some searching. Also, I'm not advocating for this practice; I'm just curious whether a text-friendlier format exists. I'm not going to be able to get many people to understand what file signatures are, let alone generate them properly.