
Goal: reverse a list of characters, going from \def\mylist{abcdefgh} to \def\mylist{hgfedcba}. This is easy using a marker which does not appear in the list, such as \relax:

\def\mylist{abcdefgh}
\def\reverse #1%
  {\edef #1{\expandafter \reverseloop #1\relax \marker }}
\def\reverseloop #1#2\marker
  {\ifx#1\relax\expandafter\reverseend\fi \reverseloop #2\marker #1}
\def\reverseend #1\marker #2{}
\reverse\mylist
\show\mylist

So far, so good. Unfortunately, this wastes a large amount of memory: applying the same function to a \mylist of a few thousand characters already blows up. Indeed, each call to \reverseloop reads the whole remaining token list as its #2 argument, and this copy is not flushed from TeX's memory by tail recursion. Because of the #1 left after \marker, TeX does not reach the end of the replacement text of \reverseloop until the very end, once all the \reverseloop macros have been expanded. You can see this from the call trace of

\def\fiveup{\edef\mylist{\mylist\mylist\mylist\mylist\mylist}}
\fiveup \fiveup \fiveup \fiveup
\tracingall
\reverse\mylist

Thus, the whole process consumes memory proportional to the square of the number of characters; for a few thousand characters this already reaches millions of tokens, the typical size of TeX's main memory. How can I implement such a reversal using only a linear amount of memory?
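A rough count shows where the quadratic growth comes from, assuming, as the question explains, that every pending \reverseloop level keeps its own copy of #2 until the very end. With n characters followed by \relax, the k-th call reads n+1-k tokens as #2, and the test list built with four \fiveup calls has n = 8·5^4 = 5000 characters:

```latex
\[
  \sum_{k=1}^{n+1} (n+1-k) \;=\; \sum_{j=0}^{n} j \;=\; \frac{n(n+1)}{2},
  \qquad
  n = 8\cdot 5^{4} = 5000
  \;\Longrightarrow\;
  \frac{5000\cdot 5001}{2} = 12\,502\,500 .
\]
```

That is roughly twelve and a half million tokens held at once, well beyond the few million words that TeX's main memory typically provides.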

It should easily scale up to 100000 characters, even if it is a bit slow there: of course, we cannot avoid quadratic time. I don't care too much about expandability.

  • I'm not sure that such a beast really exists, unless you can index the string: \def\i{a}\def\ii{b}\def\iii{c}\iii\ii\i. That is, index the string via macros and expand them from the last back. Of course it's not "linear". – egreg Nov 24 '11 at 12:09
  • You can look at page 379 in the TeXbook – egreg Nov 24 '11 at 12:13
  • @BrunoLeFloch Your title implies speed, but your text is about memory use. Could you clarify one or the other? – Joseph Wright Nov 24 '11 at 13:01
  • What's wrong here with using two macros and moving tokens one at a time? Slow for long lists, but would be the usual approach. – Joseph Wright Nov 24 '11 at 13:04
  • @egreg Joseph rightfully pointed out that my title was misleading. I guess that a more interesting question would be "what are the most efficient ways (plural) to reverse a string?". I've been experimenting with many approaches. Expandably, I cannot do better than O(n^2) time and O(n) space (but I wouldn't be too surprised to see a crazy divide-and-conquer algorithm in O(n log n)). Non-expandably, I can reach linear time for token lists shorter than 32768 characters (by storing the various characters in TeX's toks registers, in a group). – Bruno Le Floch Nov 24 '11 at 13:40
  • @Joseph: I need to benchmark the various codes I ended up with. Using two macros is slow (quadratic in time), and non-expandable, but indeed won't use more than a linear amount of memory. – Bruno Le Floch Nov 24 '11 at 13:45
  • @BrunoLeFloch We are all curious to see your solutions :) I have posted one with a stack (pretty much Joseph's suggestion) that managed not to bomb out with over 116,000 chars. It took a full coffee and a cigarette (bad habits) to complete the loop :) – yannisl Nov 24 '11 at 16:44
  • @Bruno I'm not able to compute the complexity of my solution: I'm a mathematician, after all. :) You surely know better than me. – egreg Nov 24 '11 at 17:13
  • @BrunoLeFloch: Do you want to link to the solution by LaTeX 3: http://tex.stackexchange.com/questions/40225/how-can-i-reverse-the-order-of-letters-tokens/40227#40227 – Marco Daniel Jan 06 '12 at 16:43
  • @MarcoDaniel It's terribly inefficient, because it makes sure to preserve spaces. I still need to write up a full answer to the current question, comparing various methods, and what can be achieved, expandably or not. – Bruno Le Floch Jan 06 '12 at 21:33
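Joseph Wright's suggestion of moving tokens one at a time can be sketched as follows. This is a minimal plain-TeX sketch, not code from the thread: the names \reversetwo, \revstep, \revstop and \revresult are hypothetical, and the list is assumed to contain only single non-space, non-brace tokens such as letters. Each \revstep reads one token and prepends it to an accumulator macro, and its replacement text is used up before the next call, so tail recursion should keep memory linear; time remains quadratic because every prepend copies the accumulator.

```tex
% Hypothetical names; plain TeX, no e-TeX needed.
\def\revstop{\revstop}% end marker, only compared, never expanded
\def\reversetwo#1{%
  \def\revresult{}%
  \expandafter\revstep#1\revstop
  \let#1\revresult}
\def\revstep#1{%
  \ifx#1\revstop
  \else
    % prepend #1, i.e. \def\revresult{#1<old contents of \revresult>}
    \expandafter\def\expandafter\revresult\expandafter
      {\expandafter#1\revresult}%
    \expandafter\revstep
  \fi}
\def\mylist{abcdefgh}
\reversetwo\mylist
\show\mylist % should show hgfedcba
```

The tokens are taken directly from the expansion of the list rather than from a second macro that gets redefined at every step, which saves one of the two copies per step.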

7 Answers

\def\firstoftwo#1#2{#1}
\def\secondoftwo#1#2{#2}

\def\rev#1#2\revA#3\revB{%
  \if\relax\detokenize{#2}\relax
    \expandafter\firstoftwo
  \else
    \expandafter\secondoftwo
  \fi{#1#3}{\rev#2\revA#1#3\revB}}

\edef\x{\rev abcde\revA\revB}\show\x

A string with 10000 characters is reversed in about 20 seconds on my machine, without clobbering the memory.

For your list I get

8.16 real         5.16 user         0.05 sys

(the real time is longer than the user time only because I had to respond to the \show prompt)

In #3 there is the "reversed-so-far" string; at each step of the recursion I put in front of it the first token in the remaining string, which is #1#2. When #2 is empty, the recursion ends.
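A short trace may make the recursion easier to follow, together with one way (not from the answer itself) to apply \rev to the question's \mylist: \expandafter expands the macro once, so that \rev sees its characters. This assumes \rev, \firstoftwo and \secondoftwo as defined above, and an e-TeX engine for \detokenize.

```tex
% Trace of \rev abc\revA\revB (#3 accumulates the reversed part):
%   \rev abc\revA\revB    #1=a, #2=bc, #3 empty  -> \rev bc\revA a\revB
%   \rev bc\revA a\revB   #1=b, #2=c,  #3=a      -> \rev c\revA ba\revB
%   \rev c\revA ba\revB   #1=c, #2 empty, #3=ba  -> cba
% Reversing the contents of a macro in place:
\def\mylist{abcdefgh}
\edef\mylist{\expandafter\rev\mylist\revA\revB}
\show\mylist % should show hgfedcba
```

Since every argument is consumed by the next call, no tokens are left behind in the input levels, which is why this version does not accumulate memory the way \reverseloop does.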

The "linear" reversal can be obtained with

\catcode`\@=11
\def\reverse#1{\count@=\z@\def\temp{}%
  \expandafter\doreverse#1\doreverse
  \loop\ifnum\count@>\z@
    \edef\temp{\temp\csname @@\romannumeral\count@\endcsname}%
    \advance\count@\m@ne
  \repeat
  \expandafter\def\expandafter#1\expandafter{\temp}%
}
\def\doreverse#1{%
  \unless\ifx#1\doreverse
    \advance\count@\@ne
    \expandafter\def\csname @@\romannumeral\count@\endcsname{#1}%
    \expandafter\doreverse
  \fi}
\catcode`\@=12

which is limited only by the available memory, since it uses the space for control sequences.

With \def\mylist{<string>}, \reverse\mylist successively defines \@@i, \@@ii and so on as the tokens forming the list, and at the end collects them in reverse order in \temp, whose contents then become the replacement text of \mylist. So after

\def\mylist{abcdefgh}
\reverse\mylist

\mylist will expand to hgfedcba. It doesn't work as is for braced groups, but the modification in that case should be trivial.

I've reversed a 40000-character string in 42 seconds. TeX refuses to reverse a 100000-character string, because it exhausts the pool size. (I removed \begingroup and \endgroup because they made TeX run out of save size.)

egreg
  • How are the braces in the elements to be preserved? \edef\x{\rev {a}bcd{e}\revA\revB} – Ahmed Musa Nov 25 '11 at 01:23
  • @AhmedMusa You can say \unless\ifnum\pdfstrcmp{\detokenize{#1}}{\string\doreverse\space}=0 instead of \ifx#1\doreverse (requires pdftex; in xetex use \strcmp instead of \pdfstrcmp; or input pdftexcmds.sty and use \pdf@strcmp). – egreg Nov 25 '11 at 01:30
  • Changing \count@ outside a local group is dangerous! – Ahmed Musa Nov 25 '11 at 01:39
  • (1) I don't see how \unless\ifnum\pdfstrcmp{\detokenize{#1}}{\string\doreverse\space}=0 preserves outer braces in the reversed elements. The braces are lost during argument grabbing, not inside \doreverse. (2) \doreverse is not expandable: not interesting. (3) \doreverse might define 10000k temporary commands, unless they're localized. – Ahmed Musa Nov 25 '11 at 01:47
  • @AhmedMusa Yes, the braces are lost, but an additional test might reinsert them. Localizing the definitions would rapidly exhaust the save size. It's just an exercise: reversing long strings is better done with a different program. – egreg Nov 25 '11 at 11:01
  • (1) Please, which additional test will preserve outer braces? I wanted to reverse a{ax}{by}cde preserving the braces and the forms of {ax} and {by}. (2) \@tfor performs well without the need for a potentially unsustainable number of intermediate macros. – Ahmed Musa Nov 25 '11 at 12:41
  • @AhmedMusa Then use \@tfor! :) I wouldn't search for an "efficient" (but awkward) algorithm just to reverse short strings. – egreg Nov 25 '11 at 13:11
  • I don't understand the test for preserving the outer braces that you've suggested, but I have a new solution that achieves that. It isn't really efficient. Please, where should I post it? – Ahmed Musa Nov 26 '11 at 01:24
  • Hm, actually your solution is not really linear, because each append to \temp already takes linear time. I think you need to \edef \temp to an expandable loop. – user202729 Aug 06 '22 at 13:31
  • @user202729 I don't think I like to be reproached for something I never claimed (note the quotes around linear) by someone who does their best to hide their identity. – egreg Aug 07 '22 at 10:15
  • Huh, I didn't notice the quotes (nor can I immediately see that that's what the quotes mean), never mind then. As for the latter half, it's not like it matters anyway...? – user202729 Aug 07 '22 at 10:50
  • @user202729 As far as I know, you're the only one providing packages under anonymity. Think about this. – egreg Aug 07 '22 at 11:56

I didn't check the memory, but a Lua solution would be:

\def\StrRev#1{\directlua{tex.print(string.reverse('#1'))}}

abcdefgh\par
\StrRev{abcdefgh}
\bye

Which prints abcdefgh on the first line and hgfedcba on the second.

I measured the running time of two string lengths. On my machine:

Using \nullfont

100 000 chars : 0.21 s
1 000 000 chars : 1.24 s

With font

100 000 chars : 0.74 s
1 000 000 chars : 6.21 s
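One caveat not raised in the answer: Lua's string.reverse works on bytes, while LuaTeX reads its input as UTF-8, so any non-ASCII character (such as é) would be split and scrambled. The following is a minimal sketch of a character-aware variant, assuming plain LuaTeX; the Lua function reverseutf and the macro \StrRevU are hypothetical names. It collects the UTF-8 characters matched by a byte pattern and emits them in reverse order, still in linear time, and like \StrRev it breaks if the argument contains a quote or a backslash.

```tex
% Plain LuaTeX sketch; reverseutf and \StrRevU are hypothetical names.
% The Lua code avoids the hash, percent and tilde characters and
% backslashes, which TeX would interpret before Lua sees the text.
\directlua{
  function reverseutf(s)
    local pat = "[" .. string.char(1) .. "-" .. string.char(127, 194)
      .. "-" .. string.char(244) .. "][" .. string.char(128) .. "-"
      .. string.char(191) .. "]*";
    local t, n = {}, 0;
    for c in string.gmatch(s, pat) do n = n + 1; t[n] = c end;
    local r = {};
    for i = n, 1, -1 do r[n - i + 1] = t[i] end;
    return table.concat(r)
  end
}
\def\StrRevU#1{\directlua{tex.print(reverseutf('#1'))}}
\StrRevU{abcdefgh}
\bye
```

The pattern matches one lead byte (ASCII or a multibyte start) followed by any continuation bytes, so each match is one UTF-8 character.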

Marco
  • 1) Please post a minimal working example. 2) \luacode should be replaced by \luaexec if you use the luacode package. 3) Otherwise: nice solution. – topskip Nov 24 '11 at 12:18
  • Sorry, but I cannot see how this is a good answer: the question was tagged tex-core. – Davy Landman Nov 24 '11 at 12:33
  • @DavyLandman This site is also good for future reference - and perhaps someone with a similar problem using LuaTeX sees this question and finds the related and constructive answer by Marco. I welcome such answers (I tend to give LuaTeX solutions, too :)) – topskip Nov 24 '11 at 12:36
  • @Patrick \luaexec doesn't seem to be defined in plain LuaTeX. It throws Undefined control sequence. I changed the example to ConTeXt. – Marco Nov 24 '11 at 12:48
  • @Marco As he mentioned, you have to load the luacode package. – Torbjørn T. Nov 24 '11 at 12:50
  • It is defined in the LaTeX package luacode. Forgot to mention that it is a LaTeX package. – topskip Nov 24 '11 at 12:50
  • I was using plain, not LaTeX. Is there a \luacode or \luaexec replacement for plain defined? – Marco Nov 24 '11 at 12:59
  • I got it: \directlua. – Marco Nov 24 '11 at 13:00
  • @DavyLandman Yes, this answer does not help me much, but it may very well be of interest to others. – Bruno Le Floch Nov 24 '11 at 13:47