310

How do I enter Unicode characters in LaTeX? What packages do I need to install and what escape sequence do I type to specify Unicode characters in an ASCII source file?

John D. Cook
  • 3,331

15 Answers

180

"Unicode" in this context could mean either in the input or in the output. I assume you're looking to insert something like "©" into your source and have it do something meaningful.

For full support for unicode input and unicode fonts, take a look at XeTeX; it's easy to get started — just select an appropriate font and the unicode characters in your input are directly typeset as unicode glyphs in the output. Switching engines is not always a possibility, however, and sometimes you'll want to stick with pdfTeX for its other useful features.

The best that regular LaTeX (i.e., based on pdfTeX in a modern distribution) can do is recognise UTF-8 sequences in the text and expand macros based on what it sees. Load the inputenc package to select the UTF-8 input encoding:

\usepackage[utf8]{inputenc}

Note that the resulting input file must not have a byte-order mark (BOM) at the beginning, or else it won't compile. (You can also use the [utf8x] option which has more extensive coverage but is not as well supported. I don't have any experience using this option.)

To define behaviour for unicode characters, use the \DeclareUnicodeCharacter command that is then defined. Here's an example for binding the control sequence \dash to the input character "—"; i.e., a literal em-dash, U+2014, in the source:

\DeclareUnicodeCharacter{2014}{\dash}

\dash can then be defined in the usual manner; I use:

\DeclareRobustCommand\dash{%
  \unskip\nobreak\thinspace\textemdash\allowbreak\thinspace\ignorespaces}

This defines a dash that has a small space on either side and will only allow a line break after it.
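
For reference, the pieces above can be assembled into one complete file. This is only a sketch: it assumes pdfLaTeX and a source file saved as UTF-8 without a BOM, and the T1 font encoding line is an addition for illustration, not part of the recipe above.

```latex
% Compile with pdflatex; the file must be saved as UTF-8 without a BOM.
\documentclass{article}
\usepackage[utf8]{inputenc}
\usepackage[T1]{fontenc} % assumption: added here, not required by the recipe

% The macro that does the actual typesetting.
\DeclareRobustCommand\dash{%
  \unskip\nobreak\thinspace\textemdash\allowbreak\thinspace\ignorespaces}

% Bind the input character U+2014 (a literal em-dash) to \dash.
\DeclareUnicodeCharacter{2014}{\dash}

\begin{document}
A parenthetical remark — like this one — now goes through \verb|\dash|.
\end{document}
```

The binding only takes effect where the literal character appears in the source; typing \dash by hand works with or without the \DeclareUnicodeCharacter line.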

  • 6
    \usepackage[utf8]{inputenc} worked for me, cheers – Grzenio Mar 21 '09 at 15:50
  • 14
    Doesn't work for me. \DeclareUnicodeCharacter has no effect, whether it's in or not, the \dash command works. OTOH, if \DeclareRobustCommand is missing, \dash doesn't work. And where does the Unicode character enter anyways? \DeclareRobustCommand uses \textemdash.

    (Of course this works in a way for the dash, but I tried to transfer it to another Unicode character, U+2318, the "twiddle" known from the Apple command key.)

    –  Nov 19 '09 at 09:04
  • 2
    I suggest creating a minimal example and asking a new question. – Will Robertson Nov 19 '09 at 13:24
  • Note though that in practice there seem to be no constraints against line breaks either before or after an (em-/en-)dash used for parenthetical purposes. See my now updated answer to this question about hyphens and dashes. – Lover of Structure Jul 24 '12 at 19:42
  • @user14996 I'm fairly sure this is discussed in the TeXbook, and I have no problem consulting Knuth as an authority in this area. Happy to concede that most publications don't do it though — possibly due to the software they use. – Will Robertson Jul 26 '12 at 03:03
  • @WillRobertson The only implicit reference I can find is the bottom of p. 95. I'd say Bringhurst is a definite authority on such matters (but I don't know whether he discusses this); Knuth has had an influence on typesetting via TeX, but I don't agree with all of his ideas. I don't find any restrictions on line breaks to the left of dashes in the Chicago Manual of Style. If you can dig up an unambiguous reference, more power to you; but plz also include all sources you've checked that turned out to lack such a prescription ;-) – Lover of Structure Jul 26 '12 at 06:06
  • @user14996 I guess I've been mistaken, then. I wonder where I got this idea? I do prefer it. – Will Robertson Jul 26 '12 at 06:21
  • @WillRobertson I don't think such a constraint would be crazy or bad and wouldn't be surprised if some publishers obeyed it; the truth is that such finer points of style guides differ. Still, "in the wild" I can't observe such a constraint being there/widespread. Though that definitely doesn't mean that typography in the wild is always best practice. Though on this issue I personally don't see a strong reason for a prohibition against breaking before a dash. – Lover of Structure Jul 26 '12 at 06:44
  • I already had \usepackage[utf8]{inputenc} so I tried \usepackage[utf8x]{inputenc}, but it did not help. I will find some other solution. – Henke - Нава́льный П с м Oct 13 '20 at 12:46
82

Have you considered using XeTeX? This is an adaptation of TeX that adds Unicode support, and is included in the latest TeX Live and MiKTeX distributions. This Wikipedia article gives a good introduction.
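
A minimal sketch of what this looks like in practice. The font name DejaVu Serif is an assumption; any installed font that covers the characters you need will do.

```latex
% Compile with xelatex (lualatex should also work).
\documentclass{article}
\usepackage{fontspec}
% Assumption: DejaVu Serif is installed; substitute any font that
% covers the scripts used below.
\setmainfont{DejaVu Serif}

\begin{document}
Grüße, Ελληνικά, кириллица, ©.
\end{document}
```

Characters missing from the selected font are not typeset, so the choice of font matters more than any package.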

ChrisN
  • 1,400
  • 4
    Not only Unicode support (that was partially available as an ugly hack-job before) but proper modern font support as well. Very nice, but a pain to get working, at least here :-) – Joey Jul 25 '11 at 12:48
  • 51
    Can you post a minimal example of using xetex that illustrates what you mean? At minimum, it seems you need to set the default font to something that covers the range of characters you want -- otherwise, characters not covered are simply (and silently!) ignored. – ShreevatsaR Jan 16 '12 at 05:54
  • 21
    Unlike the other answers, this doesn't seem to answer the question. – András Salamon Feb 27 '13 at 19:35
  • 1
    You need to add \usepackage{unicode-math}, see https://tex.stackexchange.com/a/394109/85164 – AnthonyC Apr 17 '19 at 03:21
  • 1
    Then use \char" followed by the four digit Unicode value: https://stackoverflow.com/a/56707992/1458208 – ancestral Jun 21 '19 at 17:26
  • This doesn't answer the question. – Robert Dodier Nov 27 '23 at 05:08
71

This is a minimal example that finally worked for me without using XeTeX:

\documentclass{article}
\usepackage[mathletters]{ucs}
\usepackage[utf8x]{inputenc}

\begin{document}
    The vorticity $ω$ is defined as $ω = ∇ × u$.
\end{document}
  • 1
    Thanks Roberto, this is a nice trick when you are bound to use no XeTeX. E.g. with texi2dvi which I used in R I did not know how to switch the engine. I had to reprogram my rendering functions if it wasn't for your hint here, mathletters did the trick, YAY! – hans0l0 Nov 23 '12 at 18:44
  • 3
    Also works for pdflatex – Alec Jacobson Aug 27 '13 at 11:58
  • 5
    Not great to use utf8x. I have an MWE for xelatex, but I can't publish it, unfortunately. Let me know if you want me to "ask the right questions" that might prompt the right answers for you. –  Jan 23 '19 at 02:19
  • 3
    This is a great solution! For those who oppose this solution, I would like to know better what exactly can go wrong with utf8x. Even if I was willing to use "better" compilers, I really need backward compatibility to be 100% sure that my co-authors will be able to compile my files, so yes I'll be using good old pdflatex for a long time – user4929 Apr 22 '19 at 01:33
  • Remark: Unlike unicode-math, x²³ will create "double superscript error", which causes problem (incorrect typesetting) with x²³₄₅ for example. – user202729 Sep 13 '21 at 15:34
  • Does it work for Pdftex on overleaf? – dodo Dec 04 '23 at 03:53
37

Try \char"hexcode, for example \char"2012 for ‒ (the figure dash). This command works in XeLaTeX and probably other engines.

doncherry
  • 54,637
  • 4
    Welcome to Stack Overflow! This will only work in certain TeX engines, especially the unicode-capable ones (XeTeX, LuaTeX). Could you add to your answer in which engine your example worked? – Paŭlo Ebermann Nov 12 '11 at 18:19
  • 1
    and how to insert a hexidecimal? – Малъ Скрылевъ Jan 30 '15 at 13:32
  • 3
    Thank you. I have been searching for hours, only to be misdirected to things that don't work. This is the first solution that works at least for XeLaTeX. It is kind of shocking, really, that there is NO standard way to simply specify a unicode codepoint in a document and have it work everywhere. – AgilePro Apr 05 '15 at 23:04
  • 1
    This does not work for me in XeLaTeX – 71GA May 26 '20 at 15:21
  • Note that (1) this does not work in pdfLaTeX: the TeXbook explicitly mentions that "AB only works for at most 2 hexadecimal digits; (2) in most cases it's possible to simply type the Unicode character directly (⌀ instead of \char"2300) and it will work; (3) if the character is not available in the font, it will be silently skipped, see https://tex.stackexchange.com/questions/45796/unicode-characters-not-displaying-in-xetex – user202729 Sep 13 '21 at 15:50
15

In case anyone is not satisfied with any of the answers: I just had the same problem and came up with my own little solution. I didn't want to dig into another distribution but stay with pdflatex. So I created a textfield in inkscape, put the character in, cropped it, and saved as pdf. You can include the pdf in your document like this:

\includegraphics[width=1em]{symbol.pdf}
Moritz H.
  • 159
  • Welcome to TeX.SE! Yes the solution of using an image has been mentioned a few times on this site recently (e.g. at this question), but it hasn't been mentioned at this question from 2008 so I've upvoted your answer. Of course, this would be a highly impractical solution in other cases, such as when one has a large number of Unicode characters one wants to enter. – ShreevatsaR May 16 '17 at 21:29
  • 3
    Problem is that images can't be text-searched. –  Jan 23 '19 at 02:21
  • 3
    Odds are, when you trying to include non ASCII characters, people don’t want to search for them. In my case it was the “airplane” character. Of course, I don’t know how popular Tex is in non ASCII-languages. – Moritz H. Jan 24 '19 at 07:17
  • Use replace inkscape with Xelatex. Just generate the character you want in Xelatex as a standalone document class and proceed include it as you mentioned above. – Mahmoud Jun 29 '20 at 20:54
  • @MoritzH. TeX is incredibly popular in non ASCII-languages (as well), and these languages exceed by far (in their number and in their number of speakers) the "ASCII-only world" – Daniel Diniz Jul 17 '22 at 21:22
14

As of 2020, Arthur Reutenauer says that XeTeX has “gone into maintenance mode,” and the future of TeX development is LuaTeX. I would therefore recommend using LuaTeX when you can, then XeTeX if you have to, and PDFTeX if it’s all that your publisher supports.

Now that LuaTeX supports complex scripts, the main XeTeX feature I use that LuaTeX does not have (as of July 2020) is interchar tokens. There are, on the other hand, many LuaTeX features that XeTeX does not have. I use microtype font expansion in nearly every document I create.

If you’re asking what syntax to use to enter Unicode characters, you can use the syntax ^^^^abcd for U+ABCD, \char"ABCD, \symbol{"ABCD}, or any of the macros defined by the LaTeX kernel or unicode-math.
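
As a small sketch of those notations side by side, assuming XeLaTeX or LuaLaTeX with the default Latin Modern font (which contains é, U+00E9). Note that the caret notation takes lowercase hexadecimal digits.

```latex
% Compile with xelatex or lualatex.
\documentclass{article}
\begin{document}
% Four ways to obtain U+00E9; all should print the same glyph.
^^^^00e9 \quad \char"00E9 \quad \symbol{"00E9} \quad é
\end{document}
```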

Davislor
  • 44,045
11

This answer explains more of the background "what happens internally, and why it seems to be so complex".

| | PDFTeX | XeTeX/LuaTeX |
|---|---|---|
| Text mode | Maintained by LaTeX team (inputenc), compose accent on characters | Available by default, if font has character |
| Math mode (legacy font) | Not officially supported | Not officially supported |
| Math mode (OpenType math font) | Not available | Maintained by LaTeX team (unicode-math) |

1   Text mode

1.1   PDFTeX

Explanation

It's commonly said that "PDFTeX does not support Unicode", but

  • it doesn't mean you can't type a non-ASCII character like ǫ and get that character in the output.

What it means is

  • Each non-ASCII UTF-8 character (a multi-byte sequence) is read as several character tokens, one per byte (2 or 3 for most characters), not as 1

  • TeX cannot natively "pick character at position X from OpenType font Y and put on the paper". (source 1 2 3 )

    However, it can pick the character at position X from an old-style (METAFONT) font file Y, so it's possible to "convert" an OpenType font to old font files (each font file contains at most 256 characters). See the CJK package for an example.

More commonly, characters with accents are composed from a letter and an accent, which is what inputenc package does.

Particular example:

  • In the case of the letter ǫ, LaTeX internally uses the inputenc package to "convert" the 2 bytes c7 ab to the macro \k o, which puts the character o and the hook on the PDF page.

    If you copy the content from the PDF page, you'll get the character o and the hook separately.

    (it's possible to make the copied content and the displayed content different, see comment, but that's not the point here.)

How to fix it?

As explained above, there's no completely general way.

Either define some way to "draw" the character (\DeclareUnicodeCharacter), or find some package that draws these characters for you.

In the latter case, use DeTeXify or the Comprehensive LaTeX symbols list. See also: Comprehensive LaTeX symbols, indexed by code point.

1.1.1   PDFTeX versions older than 2018

In these versions, you also need to include

\usepackage[utf8]{inputenc}

so that the UTF-8 input encoding is used. In newer versions, UTF-8 is the default.

1.2   XeTeX/LuaTeX

In this case, the engine can typeset the character directly, so usually there's no problem. (In XeTeX/LuaTeX the default encoding is TU, which allows typesetting a character from an OpenType font by specifying its Unicode code point directly.)

Special case: if the font does not have the character (for example the character α in the default Latin Modern font), it will drop an empty space in. See https://tex.stackexchange.com/a/377729/250119.

Note: do not ever use \usepackage[T1]{fontenc} on XeTeX/LuaTeX! (unless you know what you're doing)

Side note: by default (TU encoding), LaTeX automatically converts an accent command such as \`y into the corresponding Unicode character, so copy-paste from the PDF works correctly.

How to fix it?

Look for a different font.
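
A sketch of switching fonts locally for the characters the main font lacks. The command name \greekfont and the font DejaVu Serif are assumptions made for illustration; use any installed font with the coverage you need.

```latex
% Compile with xelatex or lualatex.
\documentclass{article}
\usepackage{fontspec}
% Hypothetical helper family; DejaVu Serif is assumed to be installed.
\newfontfamily\greekfont{DejaVu Serif}

\begin{document}
The default font has no Greek text letters, so switch fonts
for just that character: {\greekfont α}.
\end{document}
```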

2   Math mode

2.1   PDFTeX

Explanation

The LaTeX team does not define math symbols in the inputenc package, the most popular package for processing Unicode input; see 1 2 3.

The rationale behind this decision is not entirely clear, but one possible reason, mentioned in the documentation of the inpmath package (and also in this answer), is that there is no known way to implement it in a bulletproof way: if the \ifmmode check is used without an initial \relax, it fails at the start of an \halign alignment entry, but with \relax added it breaks kerning.

That remark was written back in 2016. Nevertheless, the package itself uses \protected\def\protected@empty{}, which appears to work without breaking the kerning, so I don't know what the real rationale is, or whether there's any other disadvantage.

So if you use e.g. ∈ in math mode, it will not work.

Compare:

| Character | utf8 | utf8x (ucs) | [mathletters]{ucs} |
|---|---|---|---|
| ǫ | good | good | good |
| ∈ | not supported | \ensuremath{\in} | same |
| α | not supported (requires babel or similar) | \textalpha (still requires babel or similar) | \ensuremath{\alpha} |
| ⁿ | not supported | \textsuperscript n | \ensuremath{^n} |

As mentioned in ucs package documentation of mathletters option:

This option is disabled by default, because using math greek in a normal text does not look good

How to fix it?

Unfortunately, you have to either define them yourself, or resort to third-party packages (none of them are particularly complete at the moment, unfortunately). See part 3 below.

2.2   XeTeX/LuaTeX

Explanation

Without unicode-math, the engine uses traditional math font source, so it has the same problem as above.

With unicode-math, the engine uses the Latin Modern Math font by default, so you can use characters like α in math mode without any problem. This also explains why the output differs slightly with and without unicode-math: the glyphs in Latin Modern Math are different.

Internally, the package unicode-math:

  • defines the mathcode for each Unicode character (unlike PDFTeX, it's not necessary to make these characters active, because the code point of the character to be typeset is the same as the input character), and
  • defines the corresponding command (\alpha, \varnothing, \mathbb) to translate to the corresponding Unicode character.

This should work fine, no need to fix.
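
A minimal sketch of this setup, assuming XeLaTeX or LuaLaTeX and the default Latin Modern Math font:

```latex
% Compile with xelatex or lualatex.
\documentclass{article}
\usepackage{unicode-math} % loads fontspec; math font defaults to Latin Modern Math

\begin{document}
The vorticity $ω$ is defined as $ω = ∇ × u$.
% The macro form maps to the same Unicode characters:
Equivalently, $\omega = \nabla \times u$.
\end{document}
```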

2.3   Special case

If the character is not a Unicode math symbol, but rather a "letter" (and you can already get the letter to show up normally in text mode, see part 1 above), then usually you'd want to use \text{...}. Example question.

3   Use Unicode input with legacy math font

Recall that this is not supported by the LaTeX team.

  • find some package that defines it

    • unicode-math-input package

      (disclaimer: I'm the maintainer) I aim to support all the characters that unicode-math package supports.

    • commonunicode package -- works in both engines; however this redefines all characters as active, which means it breaks in e.g. fancyvrb verbatim environments; as well as using \ensuremath everywhere, for more details see documentation of unicode-math-input package.

      This is the list of characters included in this package but missing in unicode-math[-input].

    • utf8x option (ucs package). This is highly incompatible with several packages however, see comparison with [utf8]inputenc.

    • inputenx package: has some math characters, but using them depends on the removed inpmath package (its source code can now be found on the Internet Archive). See also comparison.

    • unixode package for example, mentioned in other answer.

  • defining the characters yourself.

3.1   How to define Unicode characters?

  • \DeclareUnicodeCharacter 1 2: for latex and pdflatex only.
  • newunicodechar package 1 2: for all engines.

Potentially with some disadvantages e.g. the latter may break usage of these Unicode characters in fancyvrb environment, see documentation of unicode-math-input for details.
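
A sketch of both forms under pdfLaTeX. Which symbols to bind, and wrapping the replacements in \ensuremath, are choices made here for illustration only.

```latex
% Compile with pdflatex (\DeclareUnicodeCharacter is pdfLaTeX-only).
\documentclass{article}
\usepackage[utf8]{inputenc} % only needed on older LaTeX releases
\usepackage{newunicodechar}

% pdfLaTeX-only form: the code point is given in hexadecimal.
\DeclareUnicodeCharacter{2207}{\ensuremath{\nabla}}
% newunicodechar form: the character itself is the first argument.
\newunicodechar{ω}{\ensuremath{\omega}}

\begin{document}
The vorticity $ω$ is defined as $ω = ∇ \times u$.
\end{document}
```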

3.2   How to find a table of correspondences?

  • An extensive list of LaTeX symbols and Unicode equivalents?

  • from source code of commonunicode, unicode-math-input, unicode-math or unixode package.

  • from ucs source code. Has approximately 700 (math) symbols. (based on counting number of lines with ensuremath in the generated .def files)

  • a list from flowfram / texparser source code.

  • from unicode-math source code. Has approximately 2500 symbols.

    Note that, as explained above, this table is meant to define the command to map to the Unicode character, so

    • there are some Unicode characters that are listed twice.
    • some escape sequences are not defined in the default "legacy math" mode (for example \muprho "math upright rho")
  • from unixode source code. Has approximately 100 symbols.

  • Miscellaneous unclassified: texmf-dist/tex/generic/enctex/utf8raw.tex, texmf-dist/tex/latex/pdfx/l8umath-penc.def, texmf-dist/tex/luatex/markdown/markdown.lua, texmf-dist/tex/luatex/luaxml/luaxml-namedentities.lua

3.3   Additional information

  • Supporting multiple consecutive superscript/subscript characters such as x²³⁴:
    • there's a "simple" way of defining ² to map to {}^2 for example (approach used by OpTeX in an answer below), but this will clearly put the superscript in incorrect positions (because the superscript is relative to the height of the {}, not the formula before)
    • there are more complicated ways, refer to Interpreting unicode ², ³, etc... characters in math mode.
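
A sketch of the "simple" mapping, assuming pdfLaTeX and the newunicodechar package (the \ensuremath wrapper is an addition here so the characters also survive in text mode). It avoids the double-superscript error but shows the placement problem described above.

```latex
% Compile with pdflatex.
\documentclass{article}
\usepackage[utf8]{inputenc} % only needed on older LaTeX releases
\usepackage{newunicodechar}

% Each superscript character attaches to an empty group, not to the
% preceding formula, so consecutive superscripts do not clash.
\newunicodechar{²}{\ensuremath{{}^2}}
\newunicodechar{³}{\ensuremath{{}^3}}

\begin{document}
Compare $x²³$ with the conventional $x^{23}$, and $X²$ with $X^2$.
\end{document}
```

newunicodechar is expected to warn that ² and ³ are being redefined, since inputenc already maps them to text superscripts.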
user202729
  • 7,143
  • .note. Looks like you can use computer modern math as OpenType font https://tex.stackexchange.com/a/191293/250119 – didn't figure out how. – user202729 Dec 28 '21 at 07:03
10

In order to use XeLaTeX (and even both pdflatex and xelatex on the same document), you can use the simple unixode package:

\documentclass{article}
\usepackage{unixode}

\begin{document}
    The vorticity $ω$ is defined as $ω = ∇ × u$.
\end{document}

You may then compile your document either with pdflatex or with xelatex.

Note: the package is in development; the aim is to support as many unicode equivalents as possible.

Olivier
  • 3,151
  • 3
    Looking at the code, it only supports Unicode math symbols. – Metamorphic Sep 06 '18 at 19:47
  • Also doesn't look like "in development". Last commit in 2016, a pull request from 2016 not merged. (also, not included in TeX live, so need to download source from GitHub.) – user202729 Dec 28 '21 at 06:15
7

As of today, both XeTeX and LuaTeX will let you input unicode without complaining.

raphink
  • 31,894
  • 14
    Whether it "complains" depends on HOW you enter it. You need to be specific on exactly what you mean by "will let you input". – AgilePro Apr 05 '15 at 21:13
  • 3
    @ℝaphink Could you show an example on how you would input unicode in LuaLaTeX ? – SDrolet Mar 30 '18 at 19:46
5

Sorry, I'm not an expert on this, but hope I can at least provide some useful leads.

A lot of the early multi-lingual support for LaTeX predates the widespread adoption of Unicode, although it looks like there's been some consolidation around Unicode recently. So you might find something useful in specific language support packages, e.g. CJK LaTeX (for Chinese, Japanese and Korean).

Another Unicode package for LaTeX has a new name (formerly unicode; now ucs). For a list of Unicode packages, see https://ctan.org/topic/unicode .

You might also have a look at the excellent book The LaTeX Companion, which includes a section on multilingual text.

2

This is a minimal example that finally worked for me using OpTeX:

\fontfam[lm]

The vorticity $ω$ is defined as $ω = ∇ × u$. \bye

wipet
  • 74,238
1

If you are looking for Unicode characters defined in a standard font, you can do the following. Use either XeLaTeX or LuaLaTeX.

\documentclass{article}
\usepackage[utf8]{inputenc}
\usepackage{fontspec}
\begin{document}
Print some leaves: {\fontspec{Symbola} % the font name
  \symbol{"1F343}} % Unicode code point of the symbol
\end{document}

Schroeder
  • 121
  • Thank you! This finally solved the problem I was struggling for a long time. On Ubuntu I had to first manually download the Symbola font with sudo apt install fonts-symbola. – Eerik Sven Puudist Aug 08 '20 at 10:13
  • I'm still struggling to be able to do this on non-standard fonts, particularly in Overleaf. – Schroeder Aug 19 '20 at 19:23
  • 1
    Something is wrong with this solution. If you use XeLaTeX or LuaLaTeX then you don't need inputenc at all. – user202729 Oct 26 '21 at 03:09
  • Unfortunately I don't have much time to play around and fine tune this answer. If you do and can find a better way, please post an improved answer. – Schroeder Oct 27 '21 at 19:33
1

I found the following tweak helpful when you are dealing with very specific (often weird-looking) unicodes. For instance, if you want to render the Cuneiform Sign Dugud character (U+12082), simply copy-paste the following into a .tex file and compile it:

\documentclass[]{article}

% The required package
\usepackage{unicode-math}

% The definition of the (locally installed) font family
% that supports the Unicode symbol
\newfontfamily\NotoSansCuneiform{Noto Sans Cuneiform}

% The definition of the literal custom Unicode symbol.
% The font command and the character must be encapsulated within "\text{}",
% otherwise the text will be rendered in Noto Sans Cuneiform
% all the way to the end of the document
\newcommand{\𒂂}{\text{\NotoSansCuneiform 𒂂}}

\begin{document}
The Cuneiform Sign Dugud (U+12082) has the following glyph: \𒂂.
\end{document}

[Screenshot: the LaTeX code for rendering the U+12082 character, in case your browser cannot render it]

[Screenshot: the output of rendering the U+12082 character, compiled using KDE Kile]

Look how easy it is to display the U+12082 character!

While Davislor's approach of using \symbol{"12082} certainly works, the code above has one advantage: you can copy-paste it into a word processor with a compatible font family, and the Cuneiform Sign Dugud character (plus a leading "\") will be displayed literally, instead of just the number "12082".

However, there are three things to note when using this method:

  • You must ensure that the local font family that you are using is compatible with the unicode character that you want to render (e.g., using Unicode Map application in your computer to view a font's supported unicodes).
  • Each unicode character must be defined individually. And if there are two glyphs that are mapped by two distinct font families, both fonts must be defined in the document as well.
  • As ChrisN has pointed out, the above LaTeX code should be compiled using XeTeX. This is because the command \newfontfamily{} is not supported by PDFLaTeX (the default compiler in several LaTeX editors).
0

This question is really ambiguous, and I believe the answers are to the wrong interpretation. To have LaTeX handle Unicode is what is being answered, what I understand is being asked is how to enter such characters into the file. And that depends on the editor used... I've even copy&pasted some from Wikipedia pages into xemacs to go around that. The methods given in the Unicode FAQ clash with xemacs definitions or get interpreted at random by gnome-terminal :-(

vonbrand
  • 5,473
0

Just linking to this other answer that uses unicode-math’s source file to automatically define all symbols: Unicode maths in pdflatex

tobiasBora
  • 8,684