
I've just found the question "How do I enter an arbitrary Unicode code point into my document?", but I want to do the same with pdfTeX, not LuaTeX. Is this possible?

einpoklum



I assume that the Unicode code point is given as a hexadecimal number with uppercase digits A–F, without leading zeros and at most 10FFFF (that is, one to six digits), as in \uni{266A} or \uni{B1}. Then the macro \uni performs the following steps:

  • First, the hexadecimal string is padded with leading zeros to eight hexadecimal digits, the full representation of the code point in the encoding UTF-32BE.
  • \pdfunescapehex converts these hexadecimal digits into the corresponding four bytes of UTF-32BE.
  • \StringEncodingConvert of package stringenc converts these bytes from UTF-32BE to UTF-8.
  • \scantokens rescans the resulting string, whose characters all have category code 12 (other), so that the UTF-8 bytes get the category codes (active characters) required by the option utf8 of package inputenc.
  • Because the definition contains assignments, it would break in moving arguments such as section titles; therefore the macro is made robust by \DeclareRobustCommand.
  • Additionally, hyperref support is added so that \uni can also be used in PDF bookmarks, where it is mapped to hyperref's \unichar.
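The conversion part of these steps can be checked in isolation for a single code point. The following test document is a minimal sketch, assuming a pdfLaTeX run (for the pdfTeX primitives \pdfunescapehex and \pdfescapehex) and the stringenc package; it starts from the already padded UTF-32BE string 0000266A for U+266A and prints the resulting UTF-8 bytes in hexadecimal instead of typesetting them.

```latex
\documentclass{article}
\usepackage{stringenc}% provides \StringEncodingConvert
\begin{document}
% Steps 1 and 2: the padded UTF-32BE hex string for U+266A
% is turned into four bytes by \pdfunescapehex.
% Step 3: stringenc converts these bytes from UTF-32BE to UTF-8
% and stores them (category code 12) in \x.
\StringEncodingConvert\x{\pdfunescapehex{0000266A}}{utf32be}{utf8}%
% \pdfescapehex turns the bytes back into hex digits for display.
UTF-8 bytes of U+266A: \texttt{\pdfescapehex{\x}}
\end{document}
```

For U+266A (♪) the output should show the three UTF-8 bytes E2 99 AA. Only the \scantokens step is missing here, which is why the bytes are displayed as hex rather than as the character itself.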

Full example:

\documentclass{article}
\usepackage[T1]{fontenc}
\usepackage{lmodern}
\usepackage{textcomp}
\usepackage[utf8]{inputenc}

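\usepackage{stringenc}% provides \StringEncodingConvert
\newcommand*{\uni}{}% raises an error if \uni is already defined
\DeclareRobustCommand*{\uni}[1]{%
  \begingroup
    % Pad the code point to eight hex digits (UTF-32BE), convert
    % them to four bytes and re-encode these bytes as UTF-8 in \x
    \StringEncodingConvert\x{%
      \pdfunescapehex{%
        00%
        \ifnum"#1<"100000 0\fi
        \ifnum"#1<"10000 0\fi
        \ifnum"#1<"1000 0\fi
        \ifnum"#1<"100 0\fi
        \ifnum"#1<"10 0\fi
        #1%
      }%
    }{utf32be}{utf8}%
    % Local settings for \scantokens: avoid an end-of-file error
    % inside \edef and suppress the end-of-line character
    \everyeof{\noexpand}%
    \endlinechar=-1 %
    % Rescan the bytes with the current category codes; the
    % rescanned tokens follow \endgroup and thus survive the group
    \edef\x{%
      \endgroup
      \scantokens\expandafter{%
        \expandafter\unexpanded\expandafter{\x}%
      }%
    }\x
}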

% hyperref support
\usepackage[pdfencoding=auto]{hyperref}
\pdfstringdefDisableCommands{%
  \def\uni#1{\unichar{"#1}}%
}

\begin{document}
\section{Musical note \uni{266A} in title}
Symbols: \uni{266A}, \uni{B1}, \uni{20AC}, \uni{DF}.
\end{document}


Heiko Oberdiek
  • For what it's worth, I just enter the Unicode character directly into my document. There's no general reason to keep everything in ASCII, although you may have specific requirements otherwise. – Derek Sep 16 '16 at 03:01