Does a PDF file contain the metadata of the images used in it? Is it possible to read the metadata of images that were used in a PDF file? Of course, the PDF file was created with LaTeX (TeX Live).
1 Answer
Yes
In principle, embedded JPG files in a PDF can contain valid EXIF metadata.
Unfortunately, the responsibility for removing metadata from images included in PDF files rests with the user, unless the program used to generate the PDF includes a feature for that.
Amped, a software company specializing in software for image forensics, has an interesting blog post about this topic. Here's a quote from their example of a photo extracted from a PDF:
There are traces of processing with Adobe Photoshop left in the Exif metadata! Not only, Exif dates tell us that the image was captured well before February 2019, but it was edited on February 20th, 2019!
We notice there is plenty of Exif metadata: taking a look at them with the EXIF tool reveals that the image also contains GPS coordinates [...]
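To check a given PDF for such leftovers, a rough sketch in Python is shown here. It assumes the JPEGs were embedded verbatim (as DCTDecode streams, which is my understanding of what pdfTeX does) and are not wrapped in another compression layer. The function name find_exif_segments is made up for this example; it only locates Exif segments in the raw bytes and does not decode the tags.

```python
import struct


def find_exif_segments(data: bytes) -> list[int]:
    """Return byte offsets of JPEG APP1/Exif segments found in raw PDF bytes.

    Assumption: JPEG data sits uncompressed inside the PDF, so the JPEG
    segment headers are directly visible in the file.
    """
    offsets = []
    pos = 0
    while True:
        # JPEG start-of-image marker (FF D8) followed by another marker (FF ..)
        soi = data.find(b"\xff\xd8\xff", pos)
        if soi == -1:
            break
        i = soi + 2
        # Walk the segment headers that precede the compressed image data
        while i + 4 <= len(data) and data[i] == 0xFF:
            marker = data[i + 1]
            if marker == 0xDA:  # start of scan: header segments are over
                break
            # Segment length is big-endian and includes its own two bytes
            (length,) = struct.unpack(">H", data[i + 2:i + 4])
            if marker == 0xE1 and data[i + 4:i + 10] == b"Exif\x00\x00":
                offsets.append(i)
            i += 2 + length
        pos = soi + 2
    return offsets
```

Calling it on the bytes of a file read in binary mode returns a list of offsets; an empty list means no Exif block was found under the assumptions above, not that the PDF is guaranteed to be free of metadata. To actually read the tags, extract the image and run a dedicated Exif tool on it.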
- Both ImageMagick and GraphicsMagick can -strip metadata from images. In fact, this is a requirement if the PDF is in format PDF/X-1a:2001 (print to paper, using some technologies). I assume that you (Andrea) arrived here via one of the graphic arts forums, where (I hope) this is well-known. But here at TeX, hardly anyone knows it. – rallg Mar 14 '24 at 17:35
- @rallg correct. I often use said command line argument to strip the metadata. Given that the question asked if it's possible to read it, I thought I didn't need to mention how to remove it. – Andrea Lazzarotto Mar 16 '24 at 13:59
- There is an old American joke. Something like this: Clem and Josh are small farmers (long ago). Clem says, "My horse has colic. When your horse had colic, what did you feed him?" Josh says, "When my horse had colic, I fed him alfalfa." The following week, they meet again. Clem says, "I fed my horse alfalfa, but the horse died." Josh says, "So did mine." My point: Clem asked what the horse was fed. He didn't ask if it helped. – rallg Mar 16 '24 at 18:28
qpdf --stream-data=uncompress input.pdf output.pdf) and inspecting the raw pdf data in a text editor. – Marijn May 20 '20 at 13:14