
I am using Doxygen to document my C code project. My C source files are saved with UTF-8 encoding. Within some of the files I have mathematical symbols, such as this line:

∴ ∀ FOO ∈ ℕ ≤ BAR

The symbols are copy-pasted from the fileformat.info website, so they are definitely the correct UTF-8 characters. My Doxygen build uses a config file (encoded in UTF-8) that tells it to produce UTF-8 encoded LaTeX output. It also instructs it to add amsmath and amssymb.

The Doxygen build runs without errors or warnings.

Yet when I attempt to build the LaTeX, it fails with:

("C:\Program Files\MiKTeX 2.9\tex\latex\amsfonts\umsb.fd") [1{C:/Users/Toby/App Data/Local/MiKTeX/2.9/pdftex/config/pdftex.map}] [2] [1] [2] Chapter 1. (group__pmb.tex

! Package inputenc Error: Unicode char Ôê┤ (U+2234) (inputenc) not set up for use with LaTeX.

See the inputenc package documentation for explanation. Type H for immediate help. ...

l.12 ...+E+L+O+W+E+R+B+I+TS))\mbox{]} Ôê┤ ÔêÇ A+D+D+R+_++N+...

?

It seems to error on the first symbol (∴) that it encounters.

I'm not a TeX person, I just want to document my C program well (which worked on my last PC, of course running older versions of all software involved). What more can I do to get it to understand the symbol characters?

I am using the latest version of MiKTeX (64-bit) and Ghostscript (32-bit).

Toby

1 Answer


You should add

\DeclareUnicodeCharacter{2234}{\therefore}

to your document preamble. I don't know how to do this for Doxygen. You also need \usepackage{amssymb}.
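
As a minimal sketch of this manual approach, the five symbols in the example line from the question could be declared one by one as follows. The choice of `\mathbb{N}` as the rendering of ℕ and the test formula in the body are assumptions made for illustration; how to get these lines into the preamble that Doxygen generates is not covered here.

```latex
\documentclass{article}
\usepackage[utf8]{inputenc}
\usepackage{amsmath,amssymb}

% One declaration per character: hexadecimal code point, then the replacement.
\DeclareUnicodeCharacter{2234}{\therefore}  % U+2234, \therefore comes from amssymb
\DeclareUnicodeCharacter{2200}{\forall}     % U+2200
\DeclareUnicodeCharacter{2208}{\in}         % U+2208
\DeclareUnicodeCharacter{2115}{\mathbb{N}}  % U+2115, \mathbb comes from amssymb (assumed rendering)
\DeclareUnicodeCharacter{2264}{\leq}        % U+2264

\begin{document}
% The declared characters expand to math commands, so they must appear in math mode.
$∴ ∀ x ∈ ℕ,\ x ≤ y$
\end{document}
```

Each character used in the sources needs its own declaration, which is why a table-driven approach may be more convenient once the list grows.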

You can somewhat automate the correspondence between Unicode point and command name with something like

\documentclass{article}
\usepackage[utf8]{inputenc}
\usepackage{amsmath,amssymb}

\makeatletter
\newcommand\UnicodeMathSymbol[4]{%
  \ifnum#1>"FF
    \expandafter\DeclareUnicodeCharacter\expandafter{\@gobble#1}{#2}%
  \fi
}
\makeatother
\input{unicode-math-table}

\begin{document}

$∴$

\end{document}

This is based on the assumption that the command name offered by unicode-math-table is the same as the amssymb name.

egreg
  • Maybe also \ensuremath? Not sure if Doxygen would be smart enough to put the symbols in math mode. – Willie Wong Dec 05 '16 at 15:40
  • Ah I meant I have amssymb, not asmsymb, d'oh. Will I need to do this for every UTF-8 symbol that I use? Surely if LaTeX supports UTF-8 symbols then it should recognise them without this for each symbol? – Toby Dec 05 '16 at 15:41
  • @WillieWong The error message seems to be in a math formula; I wouldn't use \ensuremath anyway, because the symbol is math and should stay in math. – egreg Dec 05 '16 at 15:42
  • @Toby No, the utf8 support means that it will decode the UTF-8 encoding and know that you want U+2234; it does not define all the thousands of characters that could be accessed by number and allocate a font for each one. An alternative would be to use xelatex and unicode-math, which uses OpenType math fonts that do have a large range of math characters in a single font. – David Carlisle Dec 05 '16 at 15:47
  • @DavidCarlisle OK, where can I find the names to use for each character (e.g. the \therefore)? Or do they match the UTF-8 names? Bonus question: How does one know which package to use? I selected amsmath based on its name but haven't a notion what it really provides, or any of the other packages TBH. Is there a canonical listing with explanation somewhere? – Toby Dec 05 '16 at 15:53
  • You just have to know. For math characters the ams* packages almost certainly cover what you need, but if it was, say, U+A880, I'd have no idea what font would support that, other than to google for information. Perhaps you would do better with xelatex (or lualatex) and just use the unicode-math package, but that's really not the subject of this question/answer. – David Carlisle Dec 05 '16 at 15:57
  • @Toby: If you don't know the LaTeX commands, detexify can be handy. Alternatively, egreg's answer here can also be useful for looking things up if you know the unicode. – Willie Wong Dec 05 '16 at 16:07
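
The xelatex/lualatex alternative mentioned in the comments can be sketched as below. This is a minimal standalone document, not Doxygen output; it assumes the default math font that unicode-math selects (Latin Modern Math) is installed, and the test formula is made up for illustration. Whether the preamble Doxygen generates works with these engines is a separate matter not addressed here.

```latex
% Compile with xelatex or lualatex, not pdflatex.
\documentclass{article}
\usepackage{amsmath}
\usepackage{unicode-math}  % loads fontspec and sets up an OpenType math font

\begin{document}
% No per-character declarations: the glyphs come straight from the math font.
$∴ ∀ x ∈ ℕ,\ x ≤ y$
\end{document}
```

With this route inputenc is not used at all, since both engines read UTF-8 input natively.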