Scan characters in a string

Question

What I want to do is take some input string, process each character or "token" individually, and output something based on what said character/token is. Is there a way to do this? Perhaps it could look something like this:

\for\char\in#1
\ifx\char\textbackslash
...
\fi
...
\fi

You might look at the xstring package, or https://tex.stackexchange.com/questions/233085/basics-of-parsing?r=SearchResults&s=1|22.2927 – John Kormylo, Apr 14 '20 at 16:18
TeX doesn't have strings, just tokens, and it depends a bit on what you mean by "character"; for example, £ is two tokens (with hex codes C2 A3). – David Carlisle, Apr 14 '20 at 16:18
@DavidCarlisle Off topic: the fact that £ is two tokens is only a development deadlock and a very bad concept. For example, in pdfcsplain (using pdfTeX) £ is a single token, and in XeTeX and LuaTeX £ is a single token too, of course. On topic: TeX must know what sequence of tokens is to be treated. How is this specified? "Someone" should give more information about what his intention is. – wipet, Apr 14 '20 at 16:31
@wipet I think there is a fairly high probability that it's two tokens in the system the OP is using. – David Carlisle, Apr 14 '20 at 16:34
Note that you cannot test "normal" TeX code with \ifx\nextchat\textbackslash, because \textbackslash almost never occurs as a token in TeX code. The tokenizer treats the backslash in a very special manner (with its default settings) and almost never generates a single backslash token. – wipet, Apr 14 '20 at 16:42
It would be better if you showed a more sensible example and described in more detail the strings you expect to loop over. – egreg, Apr 14 '20 at 17:00
I don't need that many special characters, just "normal" code. – Someone, Apr 15 '20 at 13:11

score 3 · Answer 1 · answered Apr 14 '20 at 16:32


Note that TeX doesn't have strings, and character tokens do not necessarily correspond to what you might call a character; for example, £ is two tokens. However, LaTeX has a built-in loop over tokens, \@tfor:

\documentclass{article}

\begin{document}

\makeatletter

\def\zzz{b}

\@tfor\tmp:=abcdef\do{
[ \tmp\ is
\ifx\tmp\zzz
 b
\else
 not b
\fi
]\par}


\end{document}

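produces one paragraph per token: [ b is b ] for the token b, and [ a is not b ], [ c is not b ], and so on for the others.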


David Carlisle

757,742

Actually, I plopped this in, and it doesn't really work. – Someone Apr 15 '20 at 13:20
I'm using XeTeX – Someone Apr 15 '20 at 13:21
@Someone It works exactly as designed. "Doesn't really work" isn't very descriptive and presumably means "doesn't do what you expected". Perhaps so, but only you know what you expect, especially as your question is so unclear, with no test file or use case. – David Carlisle Apr 15 '20 at 14:00
Compiling it gives an error. – Someone Apr 15 '20 at 14:01
@Someone then you did something wrong. The above runs without error and produces the output shown. I used pdflatex in the answer but you get identical output from xelatex. – David Carlisle Apr 15 '20 at 14:03
It must be the \makeatletter then. What's the difference between that and \makeatother? – Someone Apr 15 '20 at 14:06
\makeatletter makes @ a letter, so it can be used in command names; \makeatother makes @ a non-letter, so it cannot be used in command names. – David Carlisle Apr 15 '20 at 14:35
oh, at as in @. – Someone Apr 15 '20 at 14:39
@Someone https://tex.stackexchange.com/questions/8351/what-do-makeatletter-and-makeatother-do – David Carlisle Apr 15 '20 at 14:40
Perhaps what @Someone was trying to point out is that there is a missing \makeatother in your code. – Luis Turcio Jun 16 '23 at 01:44
@LuisTurcio it's not missing, just not used. – David Carlisle Jun 16 '23 at 06:56
I know this is an example showing how to do what the OP wants (+1, by the way). But if later, in a real document, one wants to write an @, is \makeatother still not needed? (I'm just a simple user.) – Luis Turcio Jun 17 '23 at 01:53
You could, and possibly should, use it, but there are few documents where it makes any difference: only those where you use \@ to control end-of-sentence space. – David Carlisle Jun 17 '23 at 07:26
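As a sketch of the \makeatletter/\makeatother point discussed in these comments: the loop from the answer can be wrapped in a preamble macro, with the @-letter setting switched on only around the definition. The names \scanchars and \scan@b are hypothetical, chosen for illustration; only \@tfor comes from the LaTeX kernel.

```latex
\documentclass{article}

\makeatletter              % @ is a letter from here on, so \@tfor can be typed
\def\scan@b{b}             % hypothetical helper: the token we look for
% \scanchars is a hypothetical name, not a LaTeX kernel macro.
\newcommand{\scanchars}[1]{%
  \@tfor\tmp:=#1\do{%      % \tmp is redefined to hold each token in turn
    \ifx\tmp\scan@b
      \textbf{\tmp}%       % bold the token if it is b
    \else
      (\tmp)%              % otherwise wrap it in parentheses
    \fi
  }%
}
\makeatother               % @ is a non-letter again for the rest of the document

\begin{document}
\scanchars{abcdef}
\end{document}
```

Because \makeatother is issued before \begin{document}, an @ typed in the document body behaves normally. Note that \@tfor skips spaces between tokens and hands over a brace group as a single item with its braces removed.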

Steven B. Segletes · Accepted Answer · 2020-04-14T17:14:08.117

3

The tokcycle package is designed to cycle through input tokens, and take actions based on whether the token is a "character", a group, a macro/command sequence, or a space.

The directives allow one to apply conditional tests to each token to achieve the desired output. Here I place parentheses around every character token except e, which I make bold. If a macro is \today, it is set in italics; if it is \textbackslash, it is \fboxed; otherwise it is merely echoed to the output. Spaces are converted to \textvisiblespace, while also allowing for line breaks.

Notably, the token cycle works its way into group content unless one deliberately prevents that. It is shown below in its pseudo-environment form, but it has macro forms as well.

\documentclass{article}
\usepackage{tokcycle}
\begin{document}
\tokencycle
{\ifx e#1\addcytoks{\textbf{#1}}\else\addcytoks{(#1)}\fi}%
{\processtoks{#1}}%
{\ifx\today#1\addcytoks{\textit{#1}}\else
 \ifx\textbackslash#1\addcytoks{\fbox{#1}}\else\addcytoks{#1}\fi\fi}%
{\addcytoks{\textvisiblespace\allowbreak}}%
These are \underline{difficult times}, \today{} of all days!

Note that I seek out instances of \textbackslash today in order to make
  it italic.  Paragraphs are not a problem.
\endtokencycle
\end{document}
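For comparison, here is a minimal sketch of one of the macro forms mentioned above. It assumes, as I understand the package documentation, that \tokcycle takes the input as a fifth braced argument and collects its result in the token register \cytoks, which then has to be output explicitly; please check the tokcycle manual before relying on this.

```latex
\documentclass{article}
\usepackage{tokcycle}
\begin{document}
% Macro form (assumed from the package documentation): four directives,
% then the input stream as a fifth braced argument.
\tokcycle
  {\addcytoks{(#1)}}% character directive: wrap each character in parentheses
  {\processtoks{#1}}% group directive: recurse into the group content
  {\addcytoks{#1}}%   macro directive: echo macros unchanged
  {\addcytoks{#1}}%   space directive: echo spaces unchanged
  {Scan \textbf{these} tokens}
% The processed tokens are assumed to sit in \cytoks until typeset here.
\the\cytoks
\end{document}
```

The directives are the same four as in the pseudo-environment; only the way the input is delimited and the output is released differs.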

edited Apr 14 '20 at 17:14

answered Apr 14 '20 at 16:59

Steven B. Segletes

237,551

Apparently \today takes an argument. What can you put in it? – Someone Apr 15 '20 at 13:59
@Someone no standard definition of \today takes an argument. – David Carlisle Apr 15 '20 at 14:01
@Someone \today does not take an argument. However, the syntax \today{} is used so that the spaces following the macro name are not auto-absorbed. Thus \today x has no space before the x, whereas \today{} x has a space before the x. – Steven B. Segletes Apr 15 '20 at 14:02
Couldn't you just use ~? – Someone Sep 28 '20 at 23:05
@Someone Yes, one could use it. This was just for demonstration purposes. – Steven B. Segletes Sep 28 '20 at 23:34
How about processing only THE last character? – Someone Oct 07 '20 at 17:50
@Someone Do you mean last char in input stream or last character in each word? In either case, one has to save the character under consideration until the next token is examined, and only then decide if and in what manner to add the prior token to the output stream. – Steven B. Segletes Oct 07 '20 at 20:37
the last character of the entire string. – Someone Oct 08 '20 at 17:06
\tokcycleenvironment\lastchar {\gdef\recentchar{##1}} {\processtoks{##1}\gdef\recentchar{\egroup}} {\gdef\recentchar{##1}} {\gdef\recentchar{##1}} {\gdef\recentchar{##1}} \lastchar {xyz}\today\endlastchar [\detokenize\expandafter{\recentchar}]. This example gives \today as the last token of the input {xyz}\today. The only issue is that this simple approach gives \egroup if the last character is }. @Someone – Steven B. Segletes Oct 08 '20 at 18:14
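The "remember the most recent token" idea from this comment thread can also be sketched with the kernel loop \@tfor from the other answer instead of tokcycle. This is a swapped-in technique, not the approach in the comment above, and \lasttoken and \last@tok are hypothetical names.

```latex
\documentclass{article}

\makeatletter
% \lasttoken is a hypothetical helper, not part of LaTeX or tokcycle.
\newcommand{\lasttoken}[1]{%
  \def\last@tok{}%                    % start empty, in case the input is empty
  \@tfor\tmp:=#1\do{\let\last@tok\tmp}% overwrite with each token in turn
  \last@tok                           % after the loop, only the last one remains
}
\makeatother

\begin{document}
Last token of xyz: \lasttoken{xyz}
\end{document}
```

With the input xyz this typesets z. Since \@tfor skips spaces and strips the braces from a group, a trailing {ab} would be returned as ab rather than as a single character.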
