For quite a while I have had the problem that copying and searching text in my PDFs is difficult because ligatures are not properly translated. I am using XeLaTeX with Libertine/Biolinum.

I am a simple user, so I tried the workarounds I found on the internet ("Make ligatures in Linux Libertine copyable (and searchable)", "\pdfglyphtounicode with XeTeX", and "Can PDF search find words with ligatures in XeLaTeX-documents?"), but none of them work.

Here's my MWE

%!TEX TS-program = xelatex 
%!TEX encoding = UTF-8 Unicode 
\documentclass{scrreprt}
\usepackage{fontspec}
%\defaultfontfeatures{Ligatures=Historic}
%\setmainfont{Linux Libertine O}
\usepackage{libertine}
\begin{document}
fluffier soufflé fisticuffs fb fh ffh fj ffj fk ffk ft fft tt Qu Th ch ck ct
\end{document}

When I copy the text from the resulting PDF, it pastes as

u er sou é sticu s ch ck ct

for the above and

u er sou é icu s ch ck

when I use the historic ligatures.

Using \input{glyphtounicode} workaround I get:

Undefined control sequence. l.7 \pdfglyphtounicode{A}{0041}

Using \usepackage[t1]{fontenc} I get:

/usr/local/texlive/2014/texmf-dist/tex/latex/base/fontenc.sty:100: LaTeX Error: Encoding scheme `t1' unknown.

See the LaTeX manual or LaTeX Companion for explanation.

Type H for immediate help.

l.100 \fontencoding\encodingdefault\selectfont

Experimenting with other fonts shows very mixed results, so while it's obviously possible that the problem is in the fonts, is there something, anything, I can do to work around this and still keep ligatures?

Something like the above-mentioned

\input{glyphtounicode}

\pdfglyphtounicode{f_f}{FB00}

where I could "translate" the ligatures by hand - the above doesn't work for me, though.

Thorsten
  • If I compile your very example, select the words in my PDF previewer and paste, I get what you can see here: fluffier soufflé fisticuffs fb fh ffh fj ffj fk ffk ft fft tt Qu Th ch ck ct – egreg Jan 21 '16 at 13:29
  • It depends on where you paste the text. A while ago it happened to me with certain ligatures, and nowadays it still occurs with accents appearing as ’e but it only occurs in some editors. If you paste it somewhere else and then copy it back, it works fine. – ienissei Jan 21 '16 at 13:35
  • @egreg - In a way I am glad to hear that, but what am I doing wrong then? I am using the latest Libertine fonts (well, they haven't been updated for ages, but still) and "This is XeTeX, Version 3.14159265-2.6-0.99991 (TeX Live 2014)". BTW, I am on OSX. – Thorsten Jan 21 '16 at 17:03
  • My try was with TL 2015, but if I do the same with TL 2014, I paste here like this: fluffier soufflé fisticuffs fb fh ffh fj ffj fk ffk ft fft tt Qu Th ch ck ct – egreg Jan 21 '16 at 17:07
  • Have you made any progress resolving this? – jan May 02 '16 at 09:36
  • @jan Nobody can even reproduce the problem, so it is hard to see how anybody might be able to solve it. When other people test, there is no problem. – cfr May 03 '16 at 03:01
  • @jan - No, I did not make any progress. It just doesn't work. And while I concede that it is difficult for anyone to provide help if the problem doesn't show when they try, I sort of resent the possible implication that the only reason for this is that I am either an idiot, devious, or both. – Thorsten May 07 '16 at 07:39
  • @Thorsten I assume we're having the same problem although with a different font. I opened a new thread here after doing some more tests. It looks like it has either to do with OSX's pdf handling or with the fonts themselves, but I haven't made any progress. Again, no one (even people on a mac) is able to reproduce it. So I totally feel your frustration. – jan May 08 '16 at 10:29
  • @jan - yes, after trying these workarounds specifically for Libertine my conclusion was, too, that the font is not the problem. There must be something else going on. Are you using MacTeX/TeXShop as well? Whether I copy "offered" from the TeXShop PDF; from the OSX Preview PDF or from Skim's PDF - it always pastes as "o ered". But I don't know enough about the PDF creation process (or whether there are options that are set wrongly) to address this... – Thorsten May 18 '16 at 07:21
  • Yes, I am also on Mac/TeXShop (same problem with Preview/Skim). I don't think it has to do only with LaTeX, since creating a PDF through the Mac print dialog from TextEdit or Pages with certain fonts also produces the problem, and also because Acrobat Reader can read most ligatures without problem. But given that XeTeX and LuaTeX give different results, i.e., different ligatures are/are not readable in Preview, I still hope there is a way to change something on the LaTeX side of things to produce PDFs safe for all readers. I can't believe we're the only ones experiencing this. – jan May 18 '16 at 16:03
  • Unfortunately, the only solution I've been able to find is to abandon Libertine for another font, like BaskervaldX. – bishopcranmer Nov 06 '16 at 04:33
  • I have been having similar problems recently, though with LuaTeX instead of XeTeX. My finding is that although the standard ligatures have Unicode code points, some fonts opt to use the Private Use Area instead, and this is almost certainly the case for non-standard ligatures. This might contribute to the difficulty of copying from the PDF. In the end, a ligature in a PDF is just one glyph, and the PDF reader might have difficulty "decoding" it into separate letters if that glyph has a Private Use Area code point. – Yan Zhou Jan 24 '17 at 19:26
  • @jan I can also reproduce Thorsten’s problem on a Mac using either Skim or Preview. However, in Acrobat Reader everything works as it should, and I can correctly copy and paste ligatures. – Marco Varisco Feb 01 '18 at 23:06

1 Answer

Try adding \XeTeXgenerateactualtext=1 at the start of your document.

(IIRC, this requires XeTeX from TeX Live 2016 or later, or an equivalent from another distribution such as MiKTeX. The result of copy/paste will also depend on the PDF reader used, as not all PDF viewers support ActualText.)
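
A minimal sketch of how this could look when applied to the example from the question. Placing the setting in the preamble and wrapping it in an \ifdefined guard are my own choices for illustration, not something the original example used; the guard only keeps the file compiling on an older XeTeX that does not know the primitive.

```latex
%!TEX TS-program = xelatex
%!TEX encoding = UTF-8 Unicode
\documentclass{scrreprt}
% Ask XeTeX to write ActualText entries into the PDF so that viewers
% which support them can map ligature glyphs back to the source characters.
% The \ifdefined guard is an assumption added for older XeTeX versions,
% where the primitive would otherwise be an undefined control sequence.
\ifdefined\XeTeXgenerateactualtext
  \XeTeXgenerateactualtext=1
\fi
\usepackage{fontspec}
\usepackage{libertine}
\begin{document}
fluffier soufflé fisticuffs fb fh ffh fj ffj fk ffk ft fft tt Qu Th ch ck ct
\end{document}
```

If the guard silently skips the setting, nothing changes in the PDF, so it is worth checking the XeTeX version in the log when copy/paste still fails.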

  • This answer worked for me and solved a long standing problem that I faced when copying Devanagari letters from the XeLaTeX generated PDF into an Unicode-aware text editor. Thanks a lot @jfkthame !! – vrgovinda Jul 14 '21 at 06:22