Angle symbols which can be copied and pasted from PDF

Question

I'm looking for a way to use the left and right narrow angle symbols (〈〉, U+3008 and U+3009) so that they are not changed when copied and pasted from a PDF.

These symbols are used e.g. by the doc package for \meta{...} and other formatting macros to indicate the content of an argument, e.g.: \command[〈options〉]{〈filename〉} would be written with |\command|\oarg{options}\marg{filename}. The doc package uses the \langle and \rangle math symbols. The problem with this is that if this code is copied from the generated PDF, the symbols are changed to h and i, e.g.: \command[hoptionsi]{hfilenamei}.

Detexify pointed me to \textlangle and \textrangle from the textcomp package. These angles are copied as the ASCII characters < and >, respectively, which is already much better, but I would like to get the correct narrow angles instead.

How can I insert these angles so that they can be copied correctly from the PDF? I tried to use [utf8]{inputenc} and insert them as Unicode symbols in Vim manually (CTRL+V u 3008 or 3009, respectively), but got an error that these symbols are not set up for use with LaTeX. I need this for my ydoc class/package, an alternative to doc and ltxdoc, and would like to avoid hassle with fonts and encodings as much as possible. This should work with pdfLaTeX and not just with XeLaTeX or LuaLaTeX.

In theory this could also be fixed using the answer to "Is it possible to provide alternative text to use when copying text from the PDF?". For the angles this would be overkill, but it might be good for other symbols. – Martin Scharrer, May 18 '11 at 10:30

egreg · Accepted Answer · 2011-05-17T16:25:43.510

With

\usepackage{cmap}

the two math symbols \langle and \rangle are associated with U+27E8 MATHEMATICAL LEFT ANGLE BRACKET and U+27E9 MATHEMATICAL RIGHT ANGLE BRACKET when copied. If you want to input the angles directly as Unicode characters, here is how it can be done:

\documentclass[a4paper]{article}
\usepackage[utf8]{inputenc}
\usepackage{cmap}
\usepackage{textcomp}
\usepackage{newunicodechar}
\newunicodechar{〈}{$\langle$} % U+27E8
\newunicodechar{〉}{$\rangle$} % U+27E9
\begin{document}
\textlangle\textrangle

〈〉
\end{document}

Another choice might be

\newunicodechar{〈}{\textlangle} % U+2329
\newunicodechar{〉}{\textrangle} % U+232A

You can also use \DeclareUnicodeCharacter directly, of course, with

\DeclareUnicodeCharacter{2329}{$\langle$}
\DeclareUnicodeCharacter{232A}{$\rangle$}

instead of \newunicodechar.
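
As a minimal sketch of the \DeclareUnicodeCharacter variant, the document below declares both code-point pairs, so the input should work whether the editor inserts U+2329/U+232A or U+3008/U+3009 (the code points mentioned in the question). Using \ensuremath instead of $...$ is my own choice here, not part of the original answer.

```latex
\documentclass{article}
\usepackage[utf8]{inputenc}
\usepackage{cmap}
% Declare both pairs: which one ends up in the source file
% depends on the editor and on any Unicode normalization.
\DeclareUnicodeCharacter{2329}{\ensuremath{\langle}}
\DeclareUnicodeCharacter{232A}{\ensuremath{\rangle}}
\DeclareUnicodeCharacter{3008}{\ensuremath{\langle}}
\DeclareUnicodeCharacter{3009}{\ensuremath{\rangle}}
\begin{document}
% The angles are typed directly as Unicode characters.
〈options〉
\end{document}
```

This only makes the characters usable as input; what is copied from the PDF still depends on the font's Unicode mapping and on the viewer.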

This should allow copying and pasting the symbols from the PDF viewer, but it seems that the code point of the copied character is viewer dependent when \textlangle and \textrangle are used. :(
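
For comparison, here is a minimal sketch of the variant with cmap and the math symbols only, which according to the comments on this answer copies as U+27E8 and U+27E9. The macro name \argmeta is a hypothetical helper, loosely modelled on \meta from doc; it is not defined by any of the packages involved.

```latex
\documentclass{article}
\usepackage{cmap} % load early, before any font packages
% \argmeta is a hypothetical helper for this example only.
\newcommand*{\argmeta}[1]{%
  \ensuremath{\langle}\textit{#1}\ensuremath{\rangle}}
\begin{document}
\texttt{\textbackslash command}[\argmeta{options}]\{\argmeta{filename}\}
\end{document}
```

The result of copying may still differ between PDF viewers, so it is worth testing with the viewers your readers are likely to use.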

Thanks to Martin Scharrer for having suggested improvements to the answer.

edited May 17 '11 at 16:25

answered May 13 '11 at 11:15


I was about to ask about the spaces around the angle brackets, but then I realized that those glyphs have a generous amount of space built in, at least with the fonts I am viewing this in. – Harald Hanche-Olsen May 13 '11 at 11:24
You might have mentioned that the newunicodechar package can be found on CTAN. It is not in TeX Live 2010, and so the vast majority (?) of readers probably won't have it in their installations. – Harald Hanche-Olsen May 13 '11 at 11:28
the newunicodechar package is in my tl2010 (updated yesterday) – wasteofspace May 13 '11 at 11:56
@Harald: The first version of the package was uploaded to CTAN on February 18, and has been in TeX Live since one or two days later. The space indeed depends on the font the system decides to use for showing the symbol. – egreg May 13 '11 at 12:16
I get the correct angles in the PDF, but copy and pasting under Ubuntu from Adobe Reader results in normal ASCII angles < >, exactly like with textcomp alone. If I use pdftotext I get Unicode angles. – Martin Scharrer May 13 '11 at 12:40
@Martin: I get <> with Adobe Reader 9 also on Mac OS X. :( With Skim and Preview I get U+2329 and U+232A. – egreg May 13 '11 at 12:50
I ran tlmgr update --all, and now my TeXlive too has newunicodechar.sty. – Harald Hanche-Olsen May 13 '11 at 13:16
There seems to be a problem: While this lets you get 〈〉 into the document, if I copy them from the resulting PDF I get ␣␣ back (where the two glyphs are U+2423 OPEN BOX). I believe this is what the OP asked about. – Harald Hanche-Olsen May 13 '11 at 13:21
@Harald: It doesn't happen to me (except with Adobe Reader). Actually, cmap doesn't do anything, I get the same result (and I can copy-paste the right symbols) either with the default fonts or the Latin Modern fonts. – egreg May 13 '11 at 13:39
@egreg: cmap does the trick when the math symbols \langle/\rangle are used: \documentclass{ltxdoc} \usepackage{cmap} \begin{document} \meta{test} \end{document} or \documentclass{article} \usepackage{cmap} \begin{document} $\langle$text$\rangle$ \end{document} work like a charm. – Martin Scharrer May 13 '11 at 14:21
@Martin: you've the strange habit of calling me "egrep". :) I see that U+2329 and U+232A seem to be deprecated, but I guess that changing their CMAP into U+3008 and U+3009 requires updating the fonts or the cmap files. – egreg May 13 '11 at 14:25
@egreg: Sorry for the typo! ;-) I had a look and the code in my last comment produces the characters U+27e8 and U+27e9 (MATHEMATICAL LEFT/RIGHT ANGLE BRACKETs). – Martin Scharrer May 13 '11 at 14:30
Hmm, fascinating. I compiled it with pdftex 3.1415926-1.40.11 (TeX Live 2010) on a Mac and got the result I indicated from Skim and Preview on the Mac. With Adobe Acrobat Pro, copying gave me <>, but PDFPen gave me U+2329 and U+232A. I copied the PDF file to a linux machine and got U+3008 and U+3009 from evince. It all seems very confused. – Harald Hanche-Olsen May 13 '11 at 15:08
@egrep: The package cmap suggested by you does it for me with the normal math symbols as shown in one of my last comments. If you would update your answer and add this, maybe with a note that your other code seems to be viewer dependent, I would accept your answer. – Martin Scharrer May 17 '11 at 13:56

Ulrike Fischer · Answer 2 · 2011-05-13T16:13:11.333

You can use glyphtounicode. Unlike cmap, it also works with virtual fonts.

\documentclass{article}
\usepackage[T1]{fontenc}
\usepackage{mathptmx}
\input{glyphtounicode.tex}
\pdfgentounicode=1
\pdfglyphtounicode{less}{3008}
\pdfglyphtounicode{greater}{3009}

\begin{document}

< >

\end{document}

If you compile with pdflatex, you can copy the brackets from the PDF, e.g. to Notepad. This works fine in a Unicode-aware editor, but in an ASCII editor (e.g. WinEdt) the pasted result is not correct.
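
The same mechanism could in principle be applied to the math angles \langle and \rangle. The following is only a sketch: it assumes that the math symbol font in use names these glyphs angbracketleft and angbracketright (as I believe the Type 1 Computer Modern symbol fonts do); with other fonts the glyph names may differ and the mapping would have no effect.

```latex
\documentclass{article}
\input{glyphtounicode.tex}
\pdfgentounicode=1
% Assumption: the symbol font calls the glyphs
% angbracketleft and angbracketright.
\pdfglyphtounicode{angbracketleft}{3008}
\pdfglyphtounicode{angbracketright}{3009}
\begin{document}
$\langle$options$\rangle$
\end{document}
```

Because \pdfglyphtounicode works on glyph names, the remapping applies to every use of that glyph name in the document, not just to a single occurrence.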

Edit. I just remembered: Another possibility is accsupp. It works with pdftex, luatex and xetex:

\documentclass{article}
\usepackage{accsupp}
\begin{document}
\BeginAccSupp{method=hex,unicode,ActualText=3008}%
angleleft%
\EndAccSupp{}%

\end{document}
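
To use this for documentation macros, the accsupp calls can be wrapped in small helpers. This is a sketch under assumptions: the macro names \copylangle, \copyrangle and \copymeta are hypothetical, and the visible glyphs are taken from textcomp's \textlangle and \textrangle.

```latex
\documentclass{article}
\usepackage{textcomp}
\usepackage{accsupp}
% Hypothetical helpers: print the textcomp angle, but attach
% U+3008 / U+3009 as the text to be used when copying.
\newcommand*{\copylangle}{%
  \BeginAccSupp{method=hex,unicode,ActualText=3008}%
  \textlangle
  \EndAccSupp{}}
\newcommand*{\copyrangle}{%
  \BeginAccSupp{method=hex,unicode,ActualText=3009}%
  \textrangle
  \EndAccSupp{}}
\newcommand*{\copymeta}[1]{\copylangle\textit{#1}\copyrangle}
\begin{document}
\texttt{\textbackslash command}[\copymeta{options}]\{\copymeta{filename}\}
\end{document}
```

Whether the ActualText entry is honoured when copying depends on the PDF viewer.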
