Where can one find a listing of all characters per TeX category code? I need to construct a symbol table for parsing purposes and I have been unable to locate a complete listing. I don't mind if I have to generate it programmatically, but any pointers in the right direction would be appreciated.

David Carlisle
sesodesa
  • I'm not sure what you mean by "list of all characters per TeX category code". The standard catcodes are given e.g. here. utf8 encoding makes some other bytes active. – campa Nov 24 '20 at 14:49
  • In Unicode TeXs the code points run up to 1114111 (hex 10FFFF). Do you want a list of the catcodes for every character? The catcode of any character can be changed at any time: if you write \catcode88=5, then the catcode of character 88 is 5. – David Carlisle Nov 24 '20 at 15:02
  • Also, you're not specifying a format, an environment, etc. Even without looking outside LaTeX, many packages change catcodes on the fly (expl3, pgf/tikz, packages for verbatim code, etc.). LaTeX itself plays with the catcode of @ (\makeatletter/\makeatother) to protect macros. Outside LaTeX, you have Plain, ConTeXt and others with different catcode regimes... –  Nov 24 '20 at 15:17
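
The point made in these comments, that a catcode is a setting that can change at any moment rather than a fixed property of a character, can be seen in a small test file. This is only a sketch: it assumes a default LaTeX document in which nothing else has touched the catcodes of @ or of character 88 (the letter X), and it reports the values with \typeout.

```latex
\documentclass{article}
\begin{document}
% In a LaTeX document body @ is "other" unless something changes it.
\typeout{@ by default: \the\catcode`\@}
\makeatletter
\typeout{@ after makeatletter: \the\catcode`\@}
\makeatother
\typeout{@ after makeatother: \the\catcode`\@}

% Character 88 is the letter X. The change is kept local to the
% group, and no X is typed while the change is in force.
\begingroup
  \catcode88=5
  \typeout{char 88 in group: \the\catcode88}
\endgroup
\typeout{char 88 after group: \the\catcode88}
Done.
\end{document}
```

Under those assumptions the log should show @ going from 12 to 11 and back to 12, and character 88 going from 11 to 5 and back to 11, so any table of catcodes is only a snapshot of one point in one document.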

1 Answer

This prints the catcode of every character that has a nonzero catcode at the current point.

Processed with pdflatex, you get a 6-page document containing the table.

Processed with xelatex, you get a 24220-page document.

The first page of the xelatex version looks like this:

[image: first page of the xelatex output]

\documentclass{article}

\begin{document}
\ifx\Umathchar\undefined
\def\maxchar{"FF }
\else
\def\maxchar{"10FFFF }
\fi

\newcount\zz

\loop
\ifnum\catcode\zz>0
Catcode \the\zz=\the\catcode\zz\par
\fi
\ifnum\zz<\maxchar
\advance\zz 1
\repeat

\end{document}
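
For use in a build script, the same loop can write its results to a text file instead of typesetting them. The following is a minimal sketch of that idea, not a tested tool: the stream name \catfile and the .cat file extension are hypothetical choices made for illustration.

```latex
\documentclass{article}
\begin{document}
% Upper bound: "FF on 8-bit engines, "10FFFF on Unicode engines
\ifx\Umathchar\undefined
  \def\maxchar{"FF }
\else
  \def\maxchar{"10FFFF }
\fi

\newcount\zz
\newwrite\catfile                        % hypothetical stream name
\immediate\openout\catfile=\jobname.cat  % hypothetical file extension

\loop
  \ifnum\catcode\zz>0
    % one "codepoint,catcode" record per line
    \immediate\write\catfile{\the\zz,\the\catcode\zz}%
  \fi
\ifnum\zz<\maxchar
  \advance\zz 1
\repeat

\immediate\closeout\catfile
Done.
\end{document}
```

Each line of the resulting file holds a decimal code point and its catcode separated by a comma. As with the printed table, the values reflect only the catcodes in force at the point where the loop runs.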

David Carlisle
  • This is pretty much what I was looking for. Now I just need to convert it to a text file and parse it in my build script. – sesodesa Nov 24 '20 at 15:29
  • @SeSodesa what is the use case here? It just gives the values at that point; if you load babel, for example, the catcodes of several characters will be different and your build script will be wrong. (But it would be easy to change that loop to write a text file rather than printing the output.) – David Carlisle Nov 24 '20 at 15:35
  • @SeSodesa e.g. change Catcode \the\zz=\the\catcode\zz\par to \typeout{\the\zz,\the\catcode\zz}% and you'll get comma-separated values in the log – David Carlisle Nov 24 '20 at 15:38
  • I'm building a reStructuredText to LaTeX (fragment) transpiler, so I need to change the output based on which character the parser encounters. reST documents can contain arbitrary Unicode, and I want to be able to handle at least some of the characters pdflatex might have trouble with. – sesodesa Nov 24 '20 at 15:44
  • @SeSodesa it seems you are attacking this from the wrong end; the table is very large and at best only partially useful for that use case. For example, suppose you are generating text from rst and that source has "save 10% only $3". You could worry that the standard catcodes of % and $ are special, but I wouldn't. I would just arrange that in your generated text you have \catcode`\%=12 and \catcode`\$=12 so that % and $ are not special. – David Carlisle Nov 24 '20 at 15:49
  • That is something I will take into consideration, but I need to take the needs of the inverse transpiler into consideration as well. I need to discuss this with the author of said transpiler. – sesodesa Nov 24 '20 at 15:55
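
The suggestion above of making % and $ non-special in the generated text, rather than escaping them one by one, might look roughly like this. It is a sketch under two assumptions: the generated fragment is wrapped in a group so the change stays local, and the fragment is not passed as a macro argument, since catcode changes do not affect text that has already been tokenized.

```latex
\documentclass{article}
\begin{document}
Outside the group, \% and \$ still need escaping.

% Comments like this one must come before the catcode change.
\begingroup
  \catcode`\%=12
  \catcode`\$=12
  save 10% only $3
\endgroup

% After the group, % starts a comment and $ toggles math mode again.
\end{document}
```

Note that no comment can follow the \catcode lines inside the group: once % has catcode 12 it is typeset like any other character.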