As described in the answers to this question, the accsupp package can be used to have symbols paste as arbitrary Unicode codepoints. For example, you might want to write something like the following, so that the blackboard-bold symbols denoting natural numbers, booleans, and strings paste as the corresponding Unicode blackboard-bold characters ℕ, 𝔹, 𝕊 (U+2115, U+1D539, U+1D54A) rather than as plain "N, B, S":
\documentclass{article}
\usepackage{amsfonts}
\usepackage{accsupp}
\newcommand*{\setNat}{\BeginAccSupp{method=hex,unicode,ActualText=2115}\mathbb{N}\EndAccSupp{}}
\newcommand*{\setBool}{\BeginAccSupp{method=hex,unicode,ActualText=1D539}\mathbb{B}\EndAccSupp{}}
\newcommand*{\setStr}{\BeginAccSupp{method=hex,unicode,ActualText=1D54A}\mathbb{S}\EndAccSupp{}}
\begin{document}
\(\setNat, \setBool, \setStr\)
\end{document}
However, the latter two symbols paste incorrectly as ᵓ and ᵔ (U+1D53 and U+1D54); that is, only the first four hex digits of each codepoint appear to be used. This problem arises whenever Unicode codepoints above U+FFFF (outside the Basic Multilingual Plane, BMP) are used. How can one fix this?
The same issue arises with \pdfglyphtounicode lines (with glyphtounicode.tex); see the question linked above for examples involving ordinary BMP characters. So I would also like to ask: how can one fix it there?
accsupp documentation. (Maybe all this is obvious to an expert already, but it wasn't to me, even though I know Unicode very well.) :-) – Lover of Structure Oct 03 '12 at 06:34
accsupp (currently here, installation hints inside until the next release of my bundle) adds a new method unichar, where the text can be given as comma separated list of Unicode code point numbers. Numbers outside the BMP are automatically converted to surrogate pairs. – Heiko Oberdiek Nov 18 '12 at 03:59
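The last comment mentions surrogate pairs. For readers on an accsupp version without the unichar method, here is a minimal sketch of doing that conversion by hand. It assumes that method=hex together with the unicode option reads ActualText as UTF-16BE, so each codepoint above U+FFFF has to be written as two 16-bit units. The values D835DD39 and D835DD4A are the surrogate pairs computed for U+1D539 and U+1D54A; the arithmetic is shown in the comments.

```latex
\documentclass{article}
\usepackage{amsfonts}
\usepackage{accsupp}
% Assumption: with method=hex,unicode the ActualText value is UTF-16BE,
% so a codepoint above U+FFFF must be given as a surrogate pair, e.g.
%   U+1D539 - 0x10000                 = 0x0D539
%   high = 0xD800 + (0x0D539 >> 10)   = 0xD835
%   low  = 0xDC00 + (0x0D539 & 0x3FF) = 0xDD39
% and likewise U+1D54A gives D835 DD4A.
\newcommand*{\setNat}{\BeginAccSupp{method=hex,unicode,ActualText=2115}\mathbb{N}\EndAccSupp{}}
\newcommand*{\setBool}{\BeginAccSupp{method=hex,unicode,ActualText=D835DD39}\mathbb{B}\EndAccSupp{}}
\newcommand*{\setStr}{\BeginAccSupp{method=hex,unicode,ActualText=D835DD4A}\mathbb{S}\EndAccSupp{}}
\begin{document}
\(\setNat, \setBool, \setStr\)
\end{document}
```

BMP codepoints such as U+2115 are unchanged in UTF-16, which is why \setNat needs no adjustment. Whether the pasted text actually comes out as 𝔹 and 𝕊 may also depend on the PDF viewer used for copying.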