Where can one find a listing of all characters per TeX category code? I need to construct a symbol table for parsing purposes and I have been unable to locate a complete listing. I don't mind if I have to generate it programmatically, but any pointers in the right direction would be appreciated.
This prints the catcode of every character that has a non-zero catcode at the current point.
Processed with pdflatex, you get a 6-page document with the table.
Processed with xelatex, you get a 24220-page document.
The first page of the xelatex version looks like this: [screenshot not preserved]
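\documentclass{article}
\begin{document}
% Unicode engines (XeTeX, LuaTeX) define \Umathchar; use that to choose the highest character code
\ifx\Umathchar\undefined
\def\maxchar{"FF }
\else
\def\maxchar{"10FFFF }
\fi
\newcount\zz
% Step \zz through every character code and print those whose catcode is non-zero
\loop
\ifnum\catcode\zz>0
Catcode \the\zz=\the\catcode\zz\par
\fi
\ifnum\zz<\maxchar
\advance\zz 1
\repeat
\end{document}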
David Carlisle
This is pretty much what I was looking for. Now I just need to convert it to a text file and parse it in my build script. – sesodesa Nov 24 '20 at 15:29
@SeSodesa what is the use case here? It just gives the values at that point; if you load babel, for example, the catcodes of several characters will be different and your build script will be wrong. (But it would be easy to change that loop to write a text file rather than print the output.) – David Carlisle Nov 24 '20 at 15:35
@SeSodesa e.g. change `Catcode \the\zz=\the\catcode\zz\par` to `\typeout{\the\zz,\the\catcode\zz}%` and you'll get comma-separated values in the log. – David Carlisle Nov 24 '20 at 15:38
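Following the suggestion to write a text file instead of printing, here is a minimal sketch that keeps the answer's loop but sends each `code,catcode` pair to a separate file with `\newwrite` and `\immediate\write`. The file name `\jobname-catcodes.csv` and the register name `\catcodefile` are my own choices for illustration, not part of the answer. As noted above, the values only describe the catcodes at the point where the loop runs, and under xelatex or lualatex the file will be very long.

```latex
\documentclass{article}
% Hypothetical output stream and file name, chosen for this sketch
\newwrite\catcodefile
\immediate\openout\catcodefile=\jobname-catcodes.csv
\begin{document}
% Unicode engines define \Umathchar, so scan all code points there
\ifx\Umathchar\undefined
  \def\maxchar{"FF }
\else
  \def\maxchar{"10FFFF }
\fi
\newcount\zz
\loop
  \ifnum\catcode\zz>0
    % One "code,catcode" line per character with a non-zero catcode
    \immediate\write\catcodefile{\the\zz,\the\catcode\zz}%
  \fi
  \ifnum\zz<\maxchar
  \advance\zz 1
\repeat
\immediate\closeout\catcodefile
\typeout{Catcode table written to \jobname-catcodes.csv}
\end{document}
```

Nothing is typeset, so LaTeX will report that there are no pages of output; the CSV file is the only result, which keeps it out of the log and makes it straightforward for a build script to read.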
I'm building a reStructuredText to LaTeX (fragment) transpiler, so I need to change the output based on which character the parser encounters. reST documents can contain arbitrary Unicode, and I want to be able to handle at least some of the characters pdflatex might have trouble with. – sesodesa Nov 24 '20 at 15:44
@SeSodesa it seems you are attacking this from the wrong end; the table is very large and at best only partially useful for that use case. For example, if you are generating text from rst and that source has `save 10% only $3`, you could worry that the standard catcodes of `%` and `$` make them special, but I wouldn't; I would just arrange that your generated text has ``\catcode`\%=12`` and ``\catcode`\$=12`` so that `%` and `$` are not special. – David Carlisle Nov 24 '20 at 15:49
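A minimal sketch of that suggestion, assuming the transpiler wraps each generated fragment in a group so the catcode changes stay local. The sample text is the one from the comment. Catcode changes only affect input that TeX has not tokenized yet, so this will not work if the fragment ends up inside a macro argument, and it only neutralizes `%` and `$`; characters such as `\`, `{`, `}`, `#`, `&`, `^`, `_` and `~` keep their usual catcodes here.

```latex
\documentclass{article}
\begin{document}
% A hypothetical fragment as a transpiler might emit it.
% The group keeps the catcode changes local to the fragment.
% No comments may appear inside the group: once % has catcode 12
% it is typeset like any other character instead of starting a comment.
\begingroup
\catcode`\%=12
\catcode`\$=12
save 10% only $3
\endgroup
\end{document}
```

Each `\catcode` assignment sits on its own line because TeX reads input line by line, so the new catcodes are already in force when the line with the reST text is tokenized.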
That is something I will consider, but I also need to take the needs of the inverse transpiler into account. I need to discuss this with the author of that transpiler. – sesodesa Nov 24 '20 at 15:55

... `utf8` encoding makes some other bytes active. – campa Nov 24 '20 at 14:49
... `\catcode88=5` then the catcode of character 88 is 5. – David Carlisle Nov 24 '20 at 15:02
... `expl3`, `pgf`/`tikz`, packages for verbatim code, etc.). LaTeX itself plays with @'s catcode (`\makeatletter`/`\makeatother`) to protect macros. Outside LaTeX, you have Plain, ConTeXt and others with different catcode regimes... – Nov 24 '20 at 15:17
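To see how the catcode of a single character depends on where you look, as these comments point out, here is a small check using `\typeout` with `@` as the example. In a standard LaTeX run it should log 12, then 11, then 12 again, because `\makeatletter` sets the catcode of `@` to 11 (letter) and `\makeatother` sets it back to 12 (other). A table generated at one point in a document, like the one from the answer's loop, therefore only describes that point.

```latex
\documentclass{article}
\begin{document}
% Log the catcode of @ before, inside and after a \makeatletter region
\typeout{catcode of @ in the document body: \the\catcode`\@}
\makeatletter
\typeout{catcode of @ after makeatletter: \the\catcode`\@}
\makeatother
\typeout{catcode of @ after makeatother: \the\catcode`\@}
\end{document}
```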