
Motivation: Many specific characters are mentioned in How to solve the `Package inputenc Error: Unicode char not set up for use with LaTeX` problem? (this question can be considered an extension of it), and there are even more questions on this site asking about similar errors for individual characters. I know that some of these characters can be handled properly by inputenc if you load the right packages. But is there a list (even an incomplete one) of which Unicode characters are implemented in which packages? Or is there a general guide on how to find such a package when I encounter a character that triggers this error?

Note: I know that I can manually use \DeclareUnicodeCharacter for the problematic characters, and I also know about the option of using LuaTeX or XeTeX, as mentioned in the only comment on the linked question. Still, I believe such a table/mapping would be useful to many people if it exists, especially if it is indexed by Unicode code point.

As a specific example, a demonstration of how to find which package supports "parenthesized Latin small letter a ⒜ (U+249C)" would be helpful (not which package to use, but how to find such packages).

I will also be happy to accept an answer arguing, with sound evidence, that such a table/list does not or should not exist.

Edit: With the help of the answer by @David Carlisle I found this related question: Mapping from Unicode character to LaTeX-Symbol for BibTeX?

Weijun Zhou

2 Answers


The unicode.xml file available from https://github.com/w3c/xml-entities has a lot of information about names for Unicode characters. It includes several LaTeX names, although only the unicode-math set (as used by the unicode-math package for XeTeX/LuaTeX, or by the stix and stix2 packages for pdfTeX) has been checked recently. It is also the source of the TeX names shown if you paste text into

https://w3c.github.io/xml-entities/unicode-names.html
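For anyone who wants to parse the file, here is a minimal sketch using Python's standard `xml.etree.ElementTree`. It assumes that each character is a `<character id="U…">` element whose TeX names sit in children such as `<latex>`, `<varlatex>` or `<mathlatex set="…">`; check that against your copy of unicode.xml. The embedded XML excerpt and the helper name `tex_names` are made up for illustration. The excerpt deliberately gives U+249C an entry without a TeX mapping, matching the situation described in the comments below.

```python
import xml.etree.ElementTree as ET

# Illustrative excerpt only (made up for this sketch), shaped like the
# assumed <character> records in unicode.xml. For the real data use:
#   root = ET.parse("unicode.xml").getroot()
SAMPLE = r"""<unicode>
 <charlist>
  <character id="U000A9" dec="169" mode="text" type="other">
   <latex>\textcopyright</latex>
   <description unicode="1.1">COPYRIGHT SIGN</description>
  </character>
  <character id="U0249C" dec="9372" mode="text" type="other">
   <description unicode="1.1">PARENTHESIZED LATIN SMALL LETTER A</description>
  </character>
 </charlist>
</unicode>"""


def tex_names(root, codepoint):
    """Return (tag, set, text) for each *latex child of the code point's entry.

    Returns None if the code point has no <character> entry at all and an
    empty list if it has an entry but no TeX mapping.
    """
    char_id = f"U{codepoint:05X}"  # assumed id format, e.g. U000A9
    for ch in root.iter("character"):
        if ch.get("id") == char_id:
            # Covers <latex>, <varlatex> and <mathlatex set="..."> children.
            return [(c.tag, c.get("set"), c.text)
                    for c in ch if c.tag.endswith("latex")]
    return None


root = ET.fromstring(SAMPLE)
print(tex_names(root, 0xA9))    # [('latex', None, '\\textcopyright')]
print(tex_names(root, 0x249C))  # []
```

An empty list and `None` are kept distinct so you can tell "known character, no TeX name yet" apart from "not in the file".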

David Carlisle
  • Thank you for the great resources. I will try to parse the XML. Sadly it doesn't seem to work for the specific example ⒜ given in the question. – Weijun Zhou Apr 25 '19 at 18:32
  • @WeijunZhou No. Due to the history and existing uses of that file (it is also the master source of entity names in HTML and MathML), it is mainly targeted at math characters and European accented characters. There are entries for all of Unicode 11, but most have no TeX mapping. TeX mappings could be added though..... – David Carlisle Apr 25 '19 at 18:35
  • Thank you anyway. I will try to work on something like that and may release to the public if it is sufficiently satisfying. – Weijun Zhou Apr 25 '19 at 18:37

Will Robertson’s “Symbols Defined by unicode-math” contains nearly every math-mode symbol and its Unicode code point.
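If you want the same mapping in machine-readable form, unicode-math ships a table file, unicode-math-table.tex. The sketch below assumes each entry there has the shape `\UnicodeMathSymbol{"<hex>}{<command>}{<math class>}{<description>}`; verify this against your installed copy. The two sample lines and the helper name `load_table` are illustrative only, not quoted from the real file.

```python
import re

# Assumed entry shape, one per line of unicode-math-table.tex:
#   \UnicodeMathSymbol{"<hex>}{<command>}{<math class>}{<description>}%
ENTRY = re.compile(
    r'\\UnicodeMathSymbol\{"([0-9A-Fa-f]+)\}'  # code point in hex
    r'\{(\\[^\s}]+)\s*\}'                       # command, possibly space-padded
    r'\{(\\[^}]+)\}'                            # math class such as \mathbin
    r'\{([^}]*)\}'                              # free-text description
)


def load_table(text):
    """Map each code point to (command, math class, description)."""
    return {int(m.group(1), 16): (m.group(2), m.group(3), m.group(4))
            for m in ENTRY.finditer(text)}


# Illustrative lines in the assumed format; for the real table read the file
# located by `kpsewhich unicode-math-table.tex`.
SAMPLE = r'''
\UnicodeMathSymbol{"02192}{\rightarrow              }{\mathrel}{rightwards arrow}%
\UnicodeMathSymbol{"02295}{\oplus                   }{\mathbin}{circled plus operator}%
'''

table = load_table(SAMPLE)
print(table.get(0x2295))  # ('\\oplus', '\\mathbin', 'circled plus operator')
print(table.get(0x249C))  # None: not a math symbol in this sample
```

Keying the dictionary by integer code point makes lookups like `table.get(0x249C)` independent of how the hex digits are padded in the file.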

The documentation for the utf8 option of inputenc comes with a listing of every commonly-used text-mode symbol by its Unicode codepoint. With legacy 7- and 8-bit encodings, these are defined by a .def file, and in the modern toolchain, they are defined by fontspec with compatible commands.
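To find which file declares a given character for pdfLaTeX's `utf8` input, one approach is to search the installed TeX tree for `\DeclareUnicodeCharacter{<hex>}`. The `utf8` mappings live in `*.dfu` files, and some packages call `\DeclareUnicodeCharacter` in their own `.sty` or `.def` files. This is a minimal sketch under those assumptions. The helper name `find_declarations` and the tree root are hypothetical; on TeX Live the root could be the directory reported by `kpsewhich -var-value=TEXMFDIST`. A hit only shows that a mapping exists. The command it maps to may still need its font encoding (e.g. TS1) to be loaded.

```python
import os
import re

# .dfu files hold inputenc's utf8 mappings; .def and .sty catch packages
# that call \DeclareUnicodeCharacter themselves.
EXTENSIONS = (".dfu", ".def", ".sty")


def find_declarations(root_dir, codepoint):
    """Yield (path, line) for each \\DeclareUnicodeCharacter of `codepoint`."""
    # The hex argument may carry leading zeros (00A9) and use either case.
    pattern = re.compile(r"\\DeclareUnicodeCharacter\{0*%X\}" % codepoint,
                         re.IGNORECASE)
    for dirpath, _dirs, files in os.walk(root_dir):
        for name in files:
            if not name.endswith(EXTENSIONS):
                continue
            path = os.path.join(dirpath, name)
            # latin-1 decodes any byte sequence, so odd files never raise.
            with open(path, encoding="latin-1") as fh:
                for line in fh:
                    if pattern.search(line):
                        yield path, line.strip()


# Hypothetical usage; the tree root is an assumption, e.g. the output of
#   kpsewhich -var-value=TEXMFDIST
# for path, line in find_declarations("/path/to/texmf-dist", 0x249C):
#     print(path, line)
```

If the search finds nothing for a code point such as U+249C, that is a strong hint that no installed package maps it for pdfLaTeX, and `\DeclareUnicodeCharacter` in your own preamble is the remaining option.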

Davislor