When I drag and copy text from a PDF file that was compiled by LaTeX, the content is often not fully preserved. Diacritics are separated from their base characters, some math symbols change into other symbols, and even control characters appear. All encodings are UTF-8. Why does this happen?
"All encodings are UTF-8", but none of the fonts used by pdftex for the text in the PDF are Unicode fonts. pdftex can only use fonts with at most 256 characters, and most have only 128, so all the code points in the PDF are < 256. It is (sometimes) possible to embed a mapping back to Unicode so that a PDF reader can generate Unicode text for the clipboard, but the details are tricky and font dependent. – David Carlisle Oct 04 '22 at 09:48
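
As an illustration of the "mapping back to Unicode" mentioned above, here is a minimal sketch for pdfLaTeX. The choice of T1 encoding and Latin Modern fonts is my assumption, not something stated in the question, and how well copy and paste works afterwards still depends on the fonts and on the PDF viewer. Math symbols in particular may still come out wrong.

```latex
% Minimal sketch, assuming pdfLaTeX (not LuaLaTeX or XeLaTeX).
\documentclass{article}
\usepackage[T1]{fontenc}  % 8-bit font encoding: accented letters are single glyphs
\usepackage{lmodern}      % assumption: Latin Modern, available as Type 1 fonts in T1
\input{glyphtounicode}    % load the standard glyph-name-to-Unicode table
\pdfgentounicode=1        % ask pdfTeX to write ToUnicode mappings into the PDF
\begin{document}
Caf\'e, na\"ive, \AA ngstr\"om, and $\alpha \le \beta$.
\end{document}
```

With the default OT1 encoding, an accented letter is typeset as two separate glyphs (accent plus base letter), which is one reason diacritics appear detached when copied. With T1, the accented letter is a single glyph that can be mapped to a single Unicode code point.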