17

The aim is to generate a .pdf with accented characters (the .tex file mixes macro and Unicode input) in such a way that the text can be selected in the .pdf and copy-pasted correctly.

An example:

\documentclass{article}

\usepackage[utf8x]{inputenc}
\usepackage[T1]{fontenc}
\usepackage{tgpagella}

\begin{document}

Unicode input: ā ī ū ṃ ṅ ñ

Macro input: \=a \={\i} \=u \d{m} \.n \~n

\end{document}

Compiled with pdflatex, the above visually produces the desired characters, but when you select and copy-paste them from the .pdf, you get

Unicode input: a  ̄ u m n ñ
 ̄ı ̄ .  ̇
Macro input: a  ̄ u m n ñ

Edit:
Ulrike's answer explains what pdflatex is doing here.

Gambhiro

3 Answers

11

pdflatex doesn't use "unicode compounds". You are using the T1 encoding, and for accented characters not available in this encoding pdflatex uses various methods to build them. E.g., the dot below the m is actually a small tabular with the m in the first row and a dot in the second:

\DeclareTextCommand{\d}{T1}[1]
   {\hmode@bgroup
    \o@lign{\relax#1\crcr\hidewidth\ltx@sh@ft{-1ex}.\hidewidth}\egroup}

In theory you can get correct glyphs with pdflatex (if your font contains them). In practice it would mean a lot of work. Better to use xelatex or lualatex.
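
To give a feel for the manual work involved under pdflatex, here is a minimal sketch of a different workaround: instead of fixing the glyphs, it tags each constructed character with the Unicode text that copy-paste should return, using the accsupp package. The helper name \withactualtext is hypothetical (it is not part of any package), the hex codes are the UTF-16BE code points U+1E43 and U+1E45, and whether the replacement text is honoured depends on the PDF viewer.

```latex
\documentclass{article}

\usepackage[T1]{fontenc}
\usepackage{tgpagella}
\usepackage{accsupp}

% Hypothetical helper: typeset #2 as usual, but attach the Unicode
% character with hex code #1 as the text to be used for copy-paste.
\newcommand*{\withactualtext}[2]{%
  \BeginAccSupp{method=hex,unicode,ActualText=#1}#2\EndAccSupp{}%
}

\begin{document}

% U+1E43 = m with dot below, U+1E45 = n with dot above
Macro input: \withactualtext{1E43}{\d{m}} \withactualtext{1E45}{\.n}

\end{document}
```

Every accented character missing from T1 would need its own wrapper like this, which is one reason switching engines is less effort.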

Ulrike Fischer
  • With XeLaTeX and LuaLaTeX, use, accordingly, \usepackage[EU1]{fontenc} and \usepackage[EU2]{fontenc}, or just load fontspec. – Andrey Vihrov Jul 03 '11 at 17:04
  • Thanks, I didn't realize that. I removed the latter part of the question accordingly. I see that I'll need to update lualatex to work with fontspec. – Gambhiro Jul 03 '11 at 17:43
6

Try adding \usepackage{cmap}. Or switch to xelatex/lualatex.
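
A minimal sketch of the cmap suggestion, assuming the question's T1 setup. The package is normally loaded early in the preamble, before fontenc. It should only help for characters that have their own slot in the font encoding, such as ñ in T1; characters that pdflatex assembles from pieces are not expected to be fixed by it.

```latex
\documentclass{article}

\usepackage{cmap}        % load early, before fontenc
\usepackage[T1]{fontenc}
\usepackage{tgpagella}

\begin{document}

% \~n maps to a real T1 glyph, so cmap can give it a Unicode mapping
Macro input: \~n

\end{document}
```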

  • Thanks! There should be a phrase on this site for the "yet another latex package I didn't know about" feeling. – Gambhiro Jul 06 '11 at 16:08
  • Did "cmap" work for anybody? It's been 11 years (to the day!), and I still can't copy extended latin characters properly using pdftex =) I guess it's time to switch to XeTeX for good. – Nikolaj Š. Jul 06 '22 at 11:18
  • According to cmap's documentation, it only works if the character is already in the specified encoding, so this solution is useless in this case, where the character isn't even included in the T1 encoding. – user202729 Dec 21 '23 at 09:03
4

This is the path to victory:

  1. Install TeX Live 2010 which has a luatex and fontspec version that work together. Follow the instructions on tug.org.

  2. On Linux, don't forget to update the $PATH! If you already have TeX Live installed with your package manager, give the new path priority over the old one. For example, at the end of ~/.bashrc, put PATH=/usr/local/texlive/2010/bin/i386-linux:$PATH; export PATH

  3. Log out and back in (or just open a new terminal) so that your $PATH updates.

Save this test to test.tex somewhere, and compile with
lualatex --interaction=nonstopmode test.tex

\documentclass{article}

\usepackage{fontspec}
\setmainfont[Ligatures=TeX]{TeX Gyre Pagella}

\begin{document}

Unicode input: ā ī ū ṃ ṅ ñ

Macro input: \=a \={\i} \=u \d{m} \.n \~n

\end{document}

Open the resulting test.pdf, and when you copy-paste from it, you get

Unicode input: ā ī ū ṃ ṅ ñ
Macro input: ā ī ū ṃ ṅ ñ

Nice!

Gambhiro