
I am using LaTeX to produce accented characters. I use the resulting PDF file as a palette from which I can select characters for copying and pasting into (say) a Word document or the input field of a webpage.

Here is an example of an acute accent above "e":

\documentclass{article}
\begin{document}
\Huge
\'{e}
\end{document}

Unfortunately, I cannot select the entire accented vowel in the resulting PDF. I can select the "e" or I can select the accent.

Is there a way to have LaTeX produce the accented vowel as a single character so that I can select it from the PDF for copying and pasting?

  • In this example the é also copies fine if you load \usepackage[T1]{fontenc} and use pdfLaTeX. But T1 doesn't have all characters available either: \={o} won't copy as desired, for example. – moewe Jun 14 '21 at 04:46
  • @moewe: Thanks! I can live with vowels that I don't accent until I run into them. It's trivial to revert to other engines if I encounter a problem vowel accent. – user2153235 Jun 14 '21 at 05:50
  • Just remember to remove \usepackage[T1]{fontenc} when you switch to LuaLaTeX or XeLaTeX (they don't like T1: https://tex.stackexchange.com/q/470976/35864). Plus the font handling of the Unicode engines is different, so switching engines may mean more than just calling a different binary. – moewe Jun 14 '21 at 06:28
  • @moewe: Good to know. Thanks. For real reports, I've always used pdflatex and always had \usepackage[T1]{fontenc} (inherited with dozens of other includes and various types of definitions). MWEs are the only circumstance where I leave out that huge payload of stuff. This is the first time I've experimented with other LaTeX engines, so I've never encountered the problem you describe. I appreciate the heads up. – user2153235 Jun 14 '21 at 12:30
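
The engine caveat in the comments above can be handled in the preamble itself. This is a minimal sketch, assuming the iftex package is installed (it is not mentioned in the thread) and that fontspec with its default font is acceptable on the Unicode engines:

```latex
\documentclass{article}
\usepackage{iftex}         % assumption: supplies the \ifPDFTeX test
\ifPDFTeX
  \usepackage[T1]{fontenc} % 8-bit engine: use pre-composed accented letters
\else
  \usepackage{fontspec}    % LuaLaTeX or XeLaTeX: Unicode fonts, no T1
\fi
\begin{document}
\Huge
\'{e}
\end{document}
```

The same file should then compile under pdflatex, lualatex, or xelatex without editing the preamble, although the fonts used will differ between engines.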

2 Answers


Any document using non-ASCII accented characters with pdftex should be using the T1 encoding for 8-bit fonts that include pre-composed letters for most Western European languages (or another suitable encoding, such as LGR for Greek or T2 for Cyrillic). Without that, hyphenation will be wrong, even without considering cut-and-paste.

In pdftex, as in luatex and xetex, LaTeX's \' command will use a pre-composed character if it exists in the font encoding being used.

So given

\documentclass{article}
\usepackage[T1]{fontenc}
\begin{document}
\Huge
\'{e}
\end{document}

Cut and paste can be expected to work from a pdftex-generated PDF, and as shown below, that is what happens (copying from xpdf in a Cygwin X server to Word running on the same machine).

[Screenshot: the é copied from xpdf and pasted into Word]
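
For the palette use case in the question, the same preamble can cover several accents at once. This is a sketch only: the choice of vowels and accents is illustrative rather than taken from the answer. All of these combinations have pre-composed slots in T1, whereas a combination such as \={o} does not, as noted in the comments on the question.

```latex
\documentclass{article}
\usepackage[T1]{fontenc} % pre-composed accented letters for pdfLaTeX
\begin{document}
\Huge
% One row per accent: acute, grave, circumflex, diaeresis
\'{a} \'{e} \'{i} \'{o} \'{u}\par
\`{a} \`{e} \`{i} \`{o} \`{u}\par
\^{a} \^{e} \^{i} \^{o} \^{u}\par
\"{a} \"{e} \"{i} \"{o} \"{u}
\end{document}
```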

David Carlisle

pdfLaTeX is based on 8-bit font encodings (actually 7-bit in the default cases) so it doesn't actually set a composite character when you type \'{e}, but rather positions the character ´ over e. This is why you're getting the results you're experiencing.

But there's a simple solution. Use one of the Unicode-based TeX engines in place of pdfLaTeX. If you generate your PDF using xelatex or lualatex, you will get the expected results when you copy and paste your character.
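
To try this route, the question's example can be compiled unchanged with a Unicode engine. This is a sketch, assuming a reasonably current TeX distribution with its default OpenType font. The file name file.tex is hypothetical, and the directly typed é is an addition for comparison that requires the file to be saved as UTF-8.

```latex
% Compile with: lualatex file.tex   or   xelatex file.tex
\documentclass{article}
\begin{document}
\Huge
\'{e} é % accent command, then the same letter typed directly
\end{document}
```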

Don Hosek
  • Thanks, Don Hosek. I have lualatex for both Windows MiKTeX and Cygwin. The PDF from the Cygwin lualatex causes the accented "e" to be pasted into Word as an apostrophe followed by an "e" (undesirable). The MiKTeX lualatex causes the accented "e" to be properly pasted into Word. For xelatex, I have only the MiKTeX installation. It also behaves properly. For the MiKTeX installations of lualatex and xelatex, I am prompted for administrator privileges, which I deny. It doesn't seem to prevent them from working, so I think the prompt is to launch a package update checker. – user2153235 Jun 14 '21 at 04:21
  • Sheesh, that was a mess. I just used Cygwin's package manager to explicitly either install or update xetex and luatex. Now my accented vowels paste from PDF to Word just fine. Lesson learned. – user2153235 Jun 14 '21 at 05:47
  • This is rather misleading. There may be reasons to switch to a Unicode TeX, but this isn't one of them. \'{e} will use a composite character even in pdftex if the font encoding being used has an e acute. No document using accented characters should be using OT1 encoding (as there is no hyphenation in that case). With \usepackage[T1]{fontenc} the composite character is used and you get correct hyphenation of words with accented characters. – David Carlisle Jun 14 '21 at 06:49
  • Thanks David. I'm not ignoring your comment, I'm just having difficulty understanding it due to my limited insight into the vast machineries of LaTeX. Googling OT1 and composite characters hasn't dispelled that fog. Did I understand correctly in that my posted problem is that the accented character doesn't use composite characters? When you say "No document...should be using OT1", are you saying that we don't expect the engines to operate this way, or that the user should avoid doing so? I'm sorry, I just don't know enough about what goes on under the hood. – user2153235 Jun 14 '21 at 12:43
  • @DavidCarlisle For the OP's purposes, anything other than Unicode would result in erroneous results (he wants to be able to copy and paste characters to webforms or Word, both of which would be expecting Unicode-encoded characters). As for “No document using accented characters should be using OT1 encoding,” it's problematic then that the default in LaTeX is OT1. Given that it's 2021, it's long past time to deprecate the 8-bit encodings, not to mention the 7-bit ones and stop having them be the default. – Don Hosek Jun 14 '21 at 15:50
  • @DonHosek cut and paste will work in current releases (the tounicode mapping is applied at the PDF font inclusion). As for changing the default font: you haven't got to worry about arXiv and other archives with millions of documents that are expected to use the same fonts with new releases, so it's a lot easier for you to say than for us to do :-). (It's not impossible that some way of changing the default for pdftex is found, but it's not easy.) We can't deprecate 8-bit encodings unless we deprecate pdftex and tex in favour of luatex and xetex; that may be the future, but we are not there yet. – David Carlisle Jun 14 '21 at 15:57
  • @user2153235 I mean that, as an alternative to switching to luatex or xetex, you can use pdftex if you add \usepackage[T1]{fontenc}. Furthermore, even without cut and paste, every document using accented characters should have that already, as hyphenation patterns in pdftex assume that encoding. – David Carlisle Jun 14 '21 at 16:04
  • @DonHosek LuaTeX is not archival-stable, XeTeX borderline: of the major engines, only pdfTeX really makes the grade there. That limits the team's room to manoeuvre, as David has said. (For 'personal' projects I can consider LuaTeX, but not for pushing to the AMS!) – Joseph Wright Jun 14 '21 at 16:29
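
The tounicode mapping mentioned in the comments can also be requested explicitly. This is a sketch for pdfLaTeX only, assuming an older installation where the format does not already set it up. On current releases these two lines should be redundant.

```latex
\documentclass{article}
\usepackage[T1]{fontenc}
\input{glyphtounicode} % glyph-name to Unicode table shipped with pdfTeX
\pdfgentounicode=1     % ask pdfTeX to write ToUnicode maps into the PDF
\begin{document}
\Huge
\'{e}
\end{document}
```

As far as I can tell, the mapping works glyph by glyph, so it would not merge a separately positioned accent and base letter into one character. That is why the T1 line is kept in this sketch.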