Replace character defined by ASCII code (by something else)

Question

I am trying to replace some characters in a given string. Each character to be replaced is given by its ASCII code. For example, in abc, I want to replace character 97 (which is a, but I don't know this yet) by something else, say a dot (.).

Now, a, \char97 and \char\thenumber all look the same to me when printed in a document, but when I pass each of them as the argument to a macro that replaces a in a string, only the first works. Why is that, and how can I solve it? This is my MWE (unchanged from the original question):

\documentclass{book}
\usepackage{xstring}
\newcounter{number}
\setcounter{number}{97}
\newcommand{\replace}[1]{\StrSubstitute{#1}{a}{.}}
\begin{document}
a

\char97

\char\thenumber

\replace{a}

\expandafter\replace{\char97}

\expandafter\expandafter\expandafter\replace{\char\thenumber}

\end{document}

\char97 is not the same as a: the former is three tokens, the latter just one. There are situations where a and \char97 might result in very different output (math mode, in particular). The construction \expandafter\replace\expandafter{\char97} can't work (in yours \expandafter does nothing) because \char is not expandable. — egreg, Aug 10 '14 at 19:25
BTW, \char\value{number} or \symbol{\value{number}} is much better than \char\thenumber, because \thenumber can expand into something that is not a number. And if it expands to a number, then it does not stop looking for digits in the following text. — Heiko Oberdiek, Aug 10 '14 at 19:29
\expandafter only expands the token after the next token, thus the example expands the unexpandable { three times in total. And even if you insert \expandafter after \replace, \char is not expandable. — Heiko Oberdiek, Aug 10 '14 at 19:34
Aaaaalright, \char is not expandable. That's the information I was missing. No, \number is not what I was looking for: I wanted to use \StrSubstitute. — bers, Aug 10 '14 at 19:35
@bers why do you want to use \StrSubstitute? Joseph's comment answers your question as written (and is far more efficient than using string substitution macros) — David Carlisle, Aug 10 '14 at 20:32
@David Carlisle: \number`a returns 97, and I do not see how this helps. Yes, \char\number`a outputs a, but: 1. I want to replace a list of characters given by ASCII codes in a string (so I already have the result of \number), and 2. \replace{\char\number`a} does not work in my MWE either. — bers, Aug 11 '14 at 05:35
@bers replacing a by 97 is exactly replacing a character by its ascii code. If you don't want that perhaps you should edit to change the question as I can't guess what else you do want given the question title? — David Carlisle, Aug 11 '14 at 06:57
Oh, now I get it. The title is confusing, sorry! I meant to ask how to "Replace character defined by ASCII code (by something else)" ... — bers, Aug 11 '14 at 10:50
@bers Since you have an answer you should post it (it's not bad style). However… I did not understand what you want; the sentence “replace a character given by its ASCII code” and the title don't look clear to me. What do you mean by “given by its ASCII code”? So, apart from answering it, I think you should edit the question so it is clearer. — Manuel, Aug 11 '14 at 15:14

Joseph Wright · Answer 1 · 2014-08-11T12:26:46.253

The TeX \char primitive is not expandable. This means that in the example the attempted expansion

\expandafter\replace\expandafter{\char97}

does nothing and is equivalent to

\replace{\char97}

which for hopefully-obvious reasons fails to match a literal a. What is needed is therefore some expandable way to turn a number into a char. Classical TeX offers us the \ifcase primitive, which is used in the general form

\ifcase<number> %
   <case 0>\or
   <case 1>\or
   <case 2>\or
   ...
   <case n>\else
   <no match>
\fi

Thus with appropriate set up it's possible to do a case-by-case conversion. Such an approach is taken by LaTeX's \@alph and \@Alph, which index lower-/upper-case letters, respectively, from 1:

\@alph{1}% => a

Clearly some maths may be needed here to get the offset right. Such an approach is expandable but is obviously tedious for long lists of chars.
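A minimal sketch of that offset arithmetic, assuming the e-TeX primitive \numexpr is available (it is in all current LaTeX engines). Lowercase ASCII letters start at 97 and \@alph counts from 1, so the offset is 96; for uppercase letters and \@Alph it is 64:

```latex
\makeatletter
\@alph{\numexpr 97-96\relax}% => a
\@Alph{\numexpr 65-64\relax}% => A
\makeatother
```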

It's possible to set up a 'selective' case statement that is expandable; expl3, for example, has one pre-built:

\int_case:nnF { <number> }
  {
    {  97 } { a }
    { 101 } { e }
    { 105 } { i }
    { 111 } { o }
    { 117 } { u }
  }
  { No~match }

There is a performance hit compared with the primitive for short lists of contiguous integer ranges, but the input format is better and clearer, particularly for long lists. (I think there are other implementations of the same concept in addition to the expl3 one.)
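As a sketch of how such a case statement could be combined with the \StrSubstitute call from the question: the macro name \vowelfromcode and the fallback ? are made up for this illustration, and the example relies on xstring fully expanding its arguments, which is its default mode.

```latex
\documentclass{article}
\usepackage{expl3,xstring}
\ExplSyntaxOn
% Hypothetical helper: expands to the vowel with ASCII code #1,
% or to ? if the code is not in the list.
\cs_new:Npn \vowelfromcode #1
  {
    \int_case:nnF {#1}
      {
        {  97 } { a }
        { 101 } { e }
        { 105 } { i }
        { 111 } { o }
        { 117 } { u }
      }
      { ? }
  }
\ExplSyntaxOff
\begin{document}
% xstring expands \vowelfromcode{97} to a before searching
\StrSubstitute{abcabc}{\vowelfromcode{97}}{.}
\end{document}
```

With these assumptions the document should typeset .bc.bc.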

Any case-by-case approach becomes a problem if you are looking beyond ASCII to the entire Unicode range. If LuaTeX is in use, there is an expandable primitive that will do the job:

\Uchar97 % => a

That's obviously the easiest solution if you can be sure of the engine in use.

score 4 · Accepted Answer · answered Aug 11 '14 at 13:38

There is a standard \lowercase trick which creates a token with a given ASCII code. For example, suppose you need to replace all ASCII 97 characters in the string abcabcabc by a double ?. Then you can try:

\bgroup\lccode`X=97 \lowercase{\egroup \StrSubstitute{abcabcabc}{X}{??}}
output:  ??bc??bc??bc

This code creates a token with the given ASCII code in place of the letter X, and then the \StrSubstitute macro is executed.

Of course, your processed string cannot contain the letter X itself. If this is a problem then you can create the following \replace macro:

\def\replace#1{\bgroup \lccode`X=#1 \lowercase{\egroup \replaceA{X}}{abcabcabcXuv}}
\def\replaceA#1#2{\StrSubstitute{#2}{#1}{??}}

\replace{97}
output:  ??bc??bc??bcXuv
\replace{98}
output:  a??ca??ca??cXuv
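The same trick can be sketched with the string and the replacement passed as arguments instead of being hard-coded. The names \replacecode and \replacecodeA are hypothetical, and \relax replaces the space after the code so that a counter such as \value{number} can be passed without leaving a stray space. As in the macro above, the generated token has the category code of the letter X, so this sketch assumes the characters to be replaced are letters.

```latex
\documentclass{article}
\usepackage{xstring}
\newcounter{number}
\setcounter{number}{97}
% Hypothetical generalisation: #1 = ASCII code, #2 = string, #3 = replacement.
% Only the X inside \lowercase is converted; #2 and #3 are left untouched.
\def\replacecode#1#2#3{%
  \bgroup \lccode`X=#1\relax
  \lowercase{\egroup \replacecodeA{X}}{#2}{#3}}
\def\replacecodeA#1#2#3{\StrSubstitute{#2}{#1}{#3}}
\begin{document}
\replacecode{97}{abcXuv}{.}

\replacecode{\value{number}}{abcabc}{??}
\end{document}
```

The first call should give .bcXuv and the second ??bc??bc.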

answered Aug 11 '14 at 13:38

wipet


Perhaps use ^^@ to avoid a char likely to be in the input? – Joseph Wright Aug 11 '14 at 13:43
@JosephWright I don't know \StrSubstitute, but maybe it expects an exact match of ASCII code and category code. That would mean you need to say \catcode`^^@=11 before using it, because the a, b characters are letters. Or you need to convert the processed string to tokens of catcode 12 (with the \meaning primitive, for example). – wipet Aug 11 '14 at 14:03
I was thinking something like \begingroup\catcode`\^^@=11\relax\gdef\replace#1{\bgroup...so that the catcode would be 'right'. Of course, that can't cover the case of different chars with different catcodes, but it's not clear to me what is wanted in that respect from the question. – Joseph Wright Aug 11 '14 at 14:05
Not having tried it, but this looks very much like what I wanted (without having to use LuaTeX or expl3 or ...). – bers Aug 11 '14 at 18:50

score 0 · Answer 3 · answered Aug 11 '14 at 18:47

I achieved what I wanted using \@Alph and \@alph, since my input ASCII codes are in the a-z, A-Z range. \@Alph etc. appear to expand to the 'real' letters. As several others have pointed out, \char is not expandable. I hope this may help someone else sometime.
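A sketch of what this could look like when applied to the MWE from the question. The helper name \asciiletter is made up for this illustration, \numexpr is an e-TeX primitive, and the example relies on xstring fully expanding its arguments by default.

```latex
\documentclass{article}
\usepackage{xstring}
\newcounter{number}
\setcounter{number}{97}
\makeatletter
% Hypothetical helper: expands to the letter with ASCII code #1.
% Assumes #1 is in 65..90 (A-Z) or 97..122 (a-z); other values are not handled.
\newcommand{\asciiletter}[1]{%
  \ifnum#1<97
    \@Alph{\numexpr#1-64\relax}%
  \else
    \@alph{\numexpr#1-96\relax}%
  \fi}
\makeatother
\newcommand{\replace}[1]{\StrSubstitute{#1}{\asciiletter{\value{number}}}{.}}
\begin{document}
\replace{abc}
\end{document}
```

With the counter set to 97, this should typeset .bc.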
