A common shortcoming of pdflatex output is how poorly text can be selected in the generated PDF file and then copied and pasted elsewhere. This is essentially a matter of having the right output encoding. There are at least two important aspects to this:
- In the ideal case, Unicode would be the output encoding.
- Ligature glyphs (such as ﬁ) are ideally decomposed into their component letters in the copied output (here: "fi").
It seems that output encodings are specified through the fontenc package. What is the range of possible output encodings, and most importantly: Is there a way to specify a Unicode output encoding and one that deals with ligatures in the intended way?
(Note: It is important to distinguish output from input encodings. How to use different input encodings in LaTeX has been documented widely. My understanding is that this is best handled by the inputenx package, as it supersedes the inputenc package.)
Quick guide to the solutions: The information in the answers and comment threads is somewhat scattered, so here is a short summary: one approach is to use \input glyphtounicode together with \pdfgentounicode=1; the other approach is to use cmap/mmap.
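As a minimal sketch of the two approaches summarized above (assuming pdflatex, and with T1 encoding and Latin Modern chosen only for illustration), a test document could look like this. The placement of cmap before fontenc follows the usual advice to load it early in the preamble; treat that ordering as an assumption to check against the package documentation.

```latex
\documentclass{article}

% Approach 2 (alternative): uncomment the next line and remove the two
% lines of Approach 1 below. cmap is usually loaded before fontenc.
% \usepackage{cmap}

\usepackage[T1]{fontenc}
\usepackage{lmodern}

% Approach 1: load the glyph-name-to-Unicode table shipped with pdfTeX,
% then ask pdfTeX to write ToUnicode maps for the fonts it embeds.
\input{glyphtounicode}
\pdfgentounicode=1

\begin{document}
% Sample text with ligatures (fi, ffi, fl) and accented letters,
% to be selected and copied from the resulting PDF.
difficult office flow, na\"ive caf\'e
\end{document}
```

Whether the copied text comes out as intended also depends on the PDF viewer, so it is worth trying the result in more than one.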
Addendum: Sometimes one will need to load the package accsupp and enclose one's macro definition in \BeginAccSupp{method=hex,unicode,ActualText=<codepoint>}[...]\EndAccSupp{} to generate a specific code point. See the caveat about non-BMP code points here and the fix (starting from accsupp version v0.4, 2012/11/18) here, which provides the new unichar package option.
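To illustrate the accsupp approach, here is a small sketch that attaches a specific code point to a math symbol. The macro name \copyablesucceq is hypothetical, the code point 2AB0 is the one that \succeq should map to, and the \mathrel wrapper is an assumption added here to keep relation spacing.

```latex
\documentclass{article}
\usepackage{accsupp}

% \copyablesucceq is a hypothetical name used only for this example.
% ActualText=2AB0 gives the replacement text as a hexadecimal code point
% (U+2AB0); \mathrel keeps the spacing of a relation symbol.
\newcommand*{\copyablesucceq}{%
  \mathrel{\BeginAccSupp{method=hex,unicode,ActualText=2AB0}%
    \succeq
  \EndAccSupp{}}%
}

\begin{document}
$a \copyablesucceq b$
\end{document}
```

The same pattern should carry over to other symbols by changing the code point and the enclosed material.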
\usepackage{ae} is a partial solution - the copy'n'paste doesn't produce gibberish, but characters get copied decomposed (e.g. 'ó' -> '`o'). – Jakub Narębski Feb 25 '15 at 14:58
With \usepackage{lmodern}, \usepackage[T1]{fontenc} and \usepackage[utf8]{inputenc} (a combination which permits copying many characters, including those given in the OP examples, from the PDF), copying \succeq from the resulting PDF does not work. Adding \input glyphtounicode \pdfgentounicode=1 makes it work: \succeq gets copied to Unicode ⪰. – Olivier Cailloux Feb 26 '19 at 22:57