
I have the following code to print catcodes as subscripts. Can one improve it to work also for the backslash and the braces? And what about the space? (I'm not asking about the comment char and ignored characters but it would be interesting too).

\documentclass{article}
\usepackage{expl3,xparse}
\usepackage[utf8]{inputenc}
\usepackage[T1]{fontenc}
\ExplSyntaxOn

\NewDocumentCommand\showcatcodes { m }
 {
  \tl_map_inline:nn { #1 } 
   {
     \tl_to_str:n {##1} \textsubscript{\char_value_catcode:n{`##1}}
   }
 }  

\ExplSyntaxOff


\begin{document}
\showcatcodes{abcde123!$_ ^€\{\}} 
\end{document}


Ulrike Fischer
  • Probably this can be done by varying the output of \tl_show_analysis:N – Joseph Wright Mar 15 '17 at 19:21
  • Something like \def\showcatcodes#1#2\relax{% \string#1% \ifcat ###1$_{6}$\else% \ifcat &#1$_{4}$\else% \ifcat A#1$_{11}$\else% \ifcat .#1$_{12}$\else% \ifcat $#1$_{3}$\else% \ifcat _#1$_{8}$\else% \ifcat ^#1$_{7}$\else% \ifcat \noexpand\relax\noexpand#1$_{0}$\else% \fi\fi\fi\fi\fi\fi\fi\fi% \ifx\relax#2\relax\else\showcatcodes#2\relax\fi% } will show \{ and \} and other macros as catcode 0, but it does not handle actual braces { and }. Note invocation as \showcatcodes abc&de#123!$_ ^€\{\}\relax – Steven B. Segletes Mar 15 '17 at 19:38
  • @JosephWright: I liked the "probably" -- it means that I didn't overlook something obvious ;-). I checked \tl_show_analysis:N but it returns things like \abc as a single token so I'm not sure if it is suitable. – Ulrike Fischer Mar 16 '17 at 08:58
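
The l3tl-analysis idea from the comments can be sketched as follows. This is only a sketch, assuming an expl3 release recent enough to provide `\tl_analysis_map_inline:nn` (it may not have been available when the question was asked). The command name `\showtokencatcodes` is hypothetical. Inside the mapping, `##2` is the character code (-1 for a control sequence) and `##3` is the category code as a capital hexadecimal digit. As noted above, a control sequence is reported as one token, so it is printed here as the placeholder `[cs]`.

```latex
\documentclass{article}
\usepackage[T1]{fontenc}
\usepackage{expl3,xparse}
\ExplSyntaxOn
% \showtokencatcodes is a hypothetical name used only for this sketch.
\NewDocumentCommand \showtokencatcodes { m }
 {
  \group_begin:
  \ttfamily
  \tl_analysis_map_inline:nn { #1 }
   {
    % ##2: character code (-1 for a control sequence)
    % ##3: category code as a capital hexadecimal digit
    \int_compare:nNnTF { ##2 } < { 0 }
     { [cs] }
     { \char_generate:nn { ##2 } { 12 } }
    \textsubscript { \normalfont \int_from_hex:n { ##3 } }
   }
  \group_end:
 }
\ExplSyntaxOff
\begin{document}
\showtokencatcodes{a b{c}$_^\{\}}
\end{document}
```

Unlike `\char_value_catcode:n`, this reports the category code carried by each token in the argument, so explicit braces and spaces are included. Because the argument is grabbed as a normal `m` argument, the braces must be balanced.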

2 Answers


Apart from spaces, you could do:


\documentclass{article}
\usepackage{expl3,xparse}
\usepackage[utf8]{inputenc}
\usepackage[T1]{fontenc}
\ExplSyntaxOn

\NewDocumentCommand\showcatcodes { m }
 {
   \expandafter\tl_map_inline:nn\expandafter{ \detokenize{#1} }
   {
     \tl_to_str:n {##1} \textsubscript{\char_value_catcode:n{`##1}}
   }
 }

\ExplSyntaxOff


\begin{document}

\showcatcodes{a bcde123!$_ ^€\{\} {} } 
\end{document}

You ought to be able to handle spaces by using \scantokens to change the catcode of the space after making everything else safe with \detokenize, but \scantokens is a dangerous beast and it's biting back at present.

David Carlisle
  • That's good. But I think you should use \cs_generate_variant:Nn \tl_map_inline:nn { on } and \tl_map_inline:on { \tl_to_str:n {#1} } ;-) – Ulrike Fischer Mar 15 '17 at 20:51
  • I was expecting egreg to reprimand me on poor l3 style, not you :-) I did actually try \tl_map_inline:on and :xn, but as they didn't exist I reverted to form and used \expandafter :-) – David Carlisle Mar 15 '17 at 21:03
  • Actually I don't understand why it works. After \detokenize, isn't the argument a list of chars with catcode 12? So why are the subscripts "correct"? – Ulrike Fischer Mar 15 '17 at 21:20
  • @UlrikeFischer you are not looking up the catcodes of the tokens passed in, you are looking up the current catcode values. If you want to inspect the catcodes of the tokens then you need \ifcat a #1 11 \else \ifcat^#1 7 \else.... – David Carlisle Mar 15 '17 at 21:26
  • I'm not sure if we have a standard function in L3 to return the catcode of a token as a number... – David Carlisle Mar 15 '17 at 21:27
  • @DavidCarlisle I think that l3tl-analysis could be tweaked into doing it, but some trick by Bruno is needed. – egreg Mar 15 '17 at 23:55
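
The point made in the comments, that `\char_value_catcode:n` reports the current catcode setting of a character code rather than the catcode stored in a token, can be illustrated with a minimal sketch. The choice of `@` as the test character is only an assumption for illustration; in the document body it normally has category code 12.

```latex
\documentclass{article}
\usepackage{expl3}
\begin{document}
\ExplSyntaxOn
% In the document body @ normally has category code 12.
\char_value_catcode:n { `\@ } \par
\group_begin:
  % Change the current catcode table locally: @ becomes a letter.
  \char_set_catcode_letter:N \@
  % The same query now reports 11, although no token list was changed.
  \char_value_catcode:n { `\@ } \par
\group_end:
\ExplSyntaxOff
\end{document}
```

This is why the subscripts look right after `\detokenize`: the lookup uses only the character code of each token and the catcode table in force when the command runs.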

You can use l3regex:

\documentclass{article}
\usepackage{expl3,xparse,l3regex}
\usepackage[utf8]{inputenc}
\usepackage[T1]{fontenc}
\ExplSyntaxOn

\NewDocumentCommand\showcatcodes { v }
 {
  \regex_extract_all:nnN { . } { #1 } \l_tmpa_seq
  \seq_map_inline:Nn \l_tmpa_seq
   { \ulrike_value_catcode:x { \tl_to_str:n {##1} } }
 }
\cs_new_protected:Nn \ulrike_value_catcode:n
 {
  #1\textsubscript{\char_value_catcode:n { `#1 }}
 }
\cs_generate_variant:Nn \ulrike_value_catcode:n { x }
\ExplSyntaxOff


\begin{document}
\showcatcodes{abcde123!$_ ^€\{\}}
\end{document}


This is the output with XeLaTeX (after removing inputenc and fontenc):

[image: output with XeLaTeX]

Final variant with \textvisiblespace and monospaced fonts for the characters:

\documentclass{article}
\usepackage{expl3,xparse,l3regex}
\usepackage[utf8]{inputenc}
\usepackage[T1]{fontenc}
\ExplSyntaxOn

\NewDocumentCommand\showcatcodes { v }
 {
  \group_begin:
  \ttfamily
  \regex_extract_all:nnN { . } { #1 } \l_tmpa_seq
  \seq_map_inline:Nn \l_tmpa_seq
   { \ulrike_value_catcode:x { \tl_to_str:n {##1} } }
  \group_end:
 }
\cs_new_protected:Nn \ulrike_value_catcode:n
 {
  \tl_if_blank:nTF { #1 } { \textvisiblespace } { #1 }
  \textsubscript{\normalfont\char_value_catcode:n { `#1 }}
 }
\cs_generate_variant:Nn \ulrike_value_catcode:n { x }
\ExplSyntaxOff


\begin{document}
\showcatcodes{abcde123!$_ ^€\{\}}
\end{document}


egreg
  • That works fine. I added a test \tl_if_blank:nTF to print the space as \textvisiblespace. But what does \tl_set:Nn \l_tmpa_tl { #1 } do? Is it a leftover from some older code? – Ulrike Fischer Mar 16 '17 at 08:54
  • @UlrikeFischer Yes, it should have been removed. – egreg Mar 16 '17 at 11:38