Print catcodes as subscripts

Question

I have the following code to print catcodes as subscripts. Can it be improved so that it also works for the backslash and the braces? And what about the space? (I'm not asking about the comment character or ignored characters, but those would be interesting too.)

\documentclass{article}
\usepackage{expl3,xparse}
\usepackage[utf8]{inputenc}
\usepackage[T1]{fontenc}
\ExplSyntaxOn

\NewDocumentCommand\showcatcodes { m }
 {
  \tl_map_inline:nn { #1 } 
   {
     \tl_to_str:n {##1} \textsubscript{\char_value_catcode:n{`##1}}
   }
 }  

\ExplSyntaxOff


\begin{document}
\showcatcodes{abcde123!$_ ^€\{\}} 
\end{document}

Probably this can be done by varying the output of \tl_show_analysis:N — Joseph Wright, Mar 15 '17 at 19:21
Something like \def\showcatcodes#1#2\relax{% \string#1% \ifcat ###1$_{6}$\else% \ifcat &#1$_{4}$\else% \ifcat A#1$_{11}$\else% \ifcat .#1$_{12}$\else% \ifcat $#1$_{3}$\else% \ifcat _#1$_{8}$\else% \ifcat ^#1$_{7}$\else% \ifcat \noexpand\relax\noexpand#1$_{0}$\else% \fi\fi\fi\fi\fi\fi\fi\fi% \ifx\relax#2\relax\else\showcatcodes#2\relax\fi% } will show \{ and \} and other macros as catcode 0, but it does not handle actual braces { and }. Note invocation as \showcatcodes abc&de#123!$_ ^€\{\}\relax — Steven B. Segletes, Mar 15 '17 at 19:38
@JosephWright: I liked the "probably" -- it means that I didn't overlook something obvious ;-). I checked \tl_show_analysis:N but it returns things like \abc as a single token so I'm not sure if it is suitable. — Ulrike Fischer, Mar 16 '17 at 08:58

score 10 · Answer 1 · answered Mar 15 '17 at 20:30

Apart from spaces, you could do:

\documentclass{article}
\usepackage{expl3,xparse}
\usepackage[utf8]{inputenc}
\usepackage[T1]{fontenc}
\ExplSyntaxOn

\NewDocumentCommand\showcatcodes { m }
 {
   \expandafter\tl_map_inline:nn\expandafter{ \detokenize{#1} }
   {
     \tl_to_str:n {##1} \textsubscript{\char_value_catcode:n{`##1}}
   }
 }

\ExplSyntaxOff


\begin{document}

\showcatcodes{a bcde123!$_ ^€\{\} {} } 
\end{document}

Spaces ought to be doable by using \scantokens to change the catcode of the space after making everything else safe with \detokenize, but \scantokens is a dangerous beast and it's biting back at present.
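The \expandafter ... \detokenize construction above can also be written in a more expl3-flavoured way by generating an o-type variant of \tl_map_inline:nn. The following is only a sketch under that assumption; the variant \tl_map_inline:on is not predefined and is created here, and the test input deliberately leaves out the space and the € character.

```latex
\documentclass{article}
\usepackage{expl3,xparse}
\usepackage[T1]{fontenc}
\ExplSyntaxOn
% Assumption: no "on" variant exists yet, so generate one. The "o"
% argument is expanded exactly once before the mapping starts.
\cs_generate_variant:Nn \tl_map_inline:nn { on }

\NewDocumentCommand \showcatcodes { m }
  {
    % One expansion of \tl_to_str:n leaves the detokenized argument.
    \tl_map_inline:on { \tl_to_str:n {#1} }
      {
        % ##1 is now a catcode-12 character; the subscript is the
        % catcode currently assigned to its character code.
        \tl_to_str:n {##1}
        \textsubscript { \char_value_catcode:n { `##1 } }
      }
  }
\ExplSyntaxOff

\begin{document}
\showcatcodes{abcde123!$_^\{\}{}}
\end{document}
```

As with the original, spaces are still skipped, because \tl_map_inline:nn ignores them between items.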

David Carlisle


That's good. But I think you should use \cs_generate_variant:Nn \tl_map_inline:nn { on } and \tl_map_inline:on { \tl_to_str:n {#1} } ;-) – Ulrike Fischer Mar 15 '17 at 20:51
I was expecting egreg to reprimand me on poor l3 style, not you :-) I did actually try \tl_map_inline:on and :xn, but as they didn't exist I reverted to form and used \expandafter :-) – David Carlisle Mar 15 '17 at 21:03
Actually I don't understand why it works. After \detokenize, isn't the argument a list of characters with catcode 12? So why are the subscripts "correct"? – Ulrike Fischer Mar 15 '17 at 21:20
@UlrikeFischer you are not looking up the catcodes of the tokens passed in, you are looking up the current catcode values. If you want to inspect the catcodes of the tokens then you need \ifcat a #1 11 \else \ifcat^#1 7 \else.... – David Carlisle Mar 15 '17 at 21:26
I'm not sure if we have a standard function in L3 to return the catcode of a token as a number... – David Carlisle Mar 15 '17 at 21:27
@DavidCarlisle I think that l3tl-analysis could be tweaked into doing it, but some trick by Bruno is needed. – egreg Mar 15 '17 at 23:55
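To illustrate the distinction made in the comments: \char_value_catcode:n reports the catcode currently assigned to a character code, not the catcode carried by a token that has already been read. A minimal sketch of the alternative, inspecting the token itself with the l3token tests, might look as follows. The names \demo_token_catcode:N and \showtokencatcodes are hypothetical, only five catcodes are covered, and spaces and brace groups are not handled.

```latex
\documentclass{article}
\usepackage{expl3,xparse}
\usepackage[T1]{fontenc}
\ExplSyntaxOn
% Hypothetical helper: return the catcode of the token #1 itself,
% using the token tests from l3token. Anything not covered gives "?".
\cs_new:Npn \demo_token_catcode:N #1
  {
    \token_if_letter:NTF #1 { 11 }
      {
        \token_if_other:NTF #1 { 12 }
          {
            \token_if_math_toggle:NTF #1 { 3 }
              {
                \token_if_math_subscript:NTF #1 { 8 }
                  {
                    \token_if_math_superscript:NTF #1 { 7 } { ? }
                  }
              }
          }
      }
  }

\NewDocumentCommand \showtokencatcodes { m }
  {
    % Map over the tokens as they were tokenized, without \detokenize.
    \tl_map_inline:nn {#1}
      {
        \tl_to_str:n {##1}
        \textsubscript { \demo_token_catcode:N ##1 }
      }
  }
\ExplSyntaxOff

\begin{document}
\showtokencatcodes{ab12!$_^}
\end{document}
```

Because the argument is grabbed with an m-type specifier, the subscripts here reflect the catcodes the tokens had when they were read, which can differ from what \char_value_catcode:n returns if the catcode table has changed in between.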

egreg · Accepted Answer · 2017-03-16T11:43:23.397

You can use l3regex:

\documentclass{article}
\usepackage{expl3,xparse,l3regex}
\usepackage[utf8]{inputenc}
\usepackage[T1]{fontenc}
\ExplSyntaxOn

\NewDocumentCommand\showcatcodes { v }
 {
  \regex_extract_all:nnN { . } { #1 } \l_tmpa_seq
  \seq_map_inline:Nn \l_tmpa_seq
   { \ulrike_value_catcode:x { \tl_to_str:n {##1} } }
 }
\cs_new_protected:Nn \ulrike_value_catcode:n
 {
  #1\textsubscript{\char_value_catcode:n { `#1 }}
 }
\cs_generate_variant:Nn \ulrike_value_catcode:n { x }
\ExplSyntaxOff


\begin{document}
\showcatcodes{abcde123!$_ ^€\{\}}
\end{document}

This is the output with XeLaTeX (after removing inputenc and fontenc): [output image not included]

Final variant with \textvisiblespace and monospaced fonts for the characters:

\documentclass{article}
\usepackage{expl3,xparse,l3regex}
\usepackage[utf8]{inputenc}
\usepackage[T1]{fontenc}
\ExplSyntaxOn

\NewDocumentCommand\showcatcodes { v }
 {
  \group_begin:
  \ttfamily
  \regex_extract_all:nnN { . } { #1 } \l_tmpa_seq
  \seq_map_inline:Nn \l_tmpa_seq
   { \ulrike_value_catcode:x { \tl_to_str:n {##1} } }
  \group_end:
 }
\cs_new_protected:Nn \ulrike_value_catcode:n
 {
  \tl_if_blank:nTF { #1 } { \textvisiblespace } { #1 }
  \textsubscript{\normalfont\char_value_catcode:n { `#1 }}
 }
\cs_generate_variant:Nn \ulrike_value_catcode:n { x }
\ExplSyntaxOff


\begin{document}
\showcatcodes{abcde123!$_ ^€\{\}}
\end{document}

That works fine. I added a test \tl_if_blank:nTF to print the space as \textvisiblespace. But what does \tl_set:Nn \l_tmpa_tl { #1 } do? Is it a leftover from some older code? — Ulrike Fischer, Mar 16 '17 at 08:54
