I don't know what OP might have come up with in the intervening 11 years, but there doesn't seem to be anything "interesting" in source3.pdf; in particular, \tl_reverse_items:n takes O(n²) time. I'm not sure about the memory usage, and there isn't a \str_reverse_items:n.
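For intuition, the quadratic behaviour can be modeled with a short Python sketch (a cost model only, not the actual expl3 code; the function name is made up): in a token stream, putting each new item in front of the already-reversed part re-copies that part, so the k-th item costs O(k).

```python
def reverse_items_quadratic(items):
    # Cost model for a naive expandable reversal: each new item is put
    # in front of everything reversed so far, and the accumulator is
    # copied in full on every step, so step k costs O(k) and the total
    # is O(n^2).
    acc = []
    for item in items:
        acc = [item] + acc  # full copy of acc on each step
    return acc
```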
With the \expanded primitive available, it would be possible in O(n log n) (divide and conquer).
Otherwise, I think the best case (with f-type expansion or similar) is O(n √n).
The idea is that we
- count the number of items,
- reverse the first √n items (takes O(n) time),
- move them after the remaining n−√n items (takes O(n) time),
- then continue reversing the remaining n−√n items.
Each time √n items are reversed, O(n) time is spent, so the total time complexity is O(n √n).
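The steps above can be sketched in Python (a cost model, not TeX; the function name is made up). Each pass rebuilds the lists, standing in for one O(n) expandable pass over the token stream, and there are about √n passes:

```python
import math

def reverse_sqrt_chunks(items):
    # Model of the O(n sqrt(n)) scheme: repeatedly reverse the first
    # ~sqrt(n) items and throw them after everything else.
    chunk = max(1, math.isqrt(len(items)))
    stream = list(items)   # the not-yet-reversed input
    done = []              # already-reversed chunks, in final order
    while stream:
        head, stream = stream[:chunk], stream[chunk:]
        # "move them after the remaining items": in the final result
        # the reversed chunk lands before the chunks reversed earlier
        done = head[::-1] + done
    return done
```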
I tried implementing it. The implementation takes 2.83 s to reverse 524288 characters on LuaLaTeX, which looks as expected.
The implementation only handles non-space string characters, however; the convert-everything-to-space part can also be implemented in O(n √n) using the same idea described above, though, and I would not expect a large difference.
For comparison, \tl_reverse_items:n takes 1.74 s to reverse 8192 items
(although the comparison is unfair to \tl_reverse_items:n, since it does not grab 8 items at a time and has to return the braces).
I think this is optimal under this constraint, as even the simple task of expandably collecting n undelimited items from the input stream and putting them in a group seems to require time quadratic in n without \expanded (but takes linear time with it).
Time complexity O(n log n) is easy, because everything \expanded can do (except avoiding hash halving), \edef can do unexpandably in similar time complexity; at least in this particular case, where it doesn't need to nest, \edef can replace the role of \expanded in the divide-and-conquer approach.
(The implementation is not very convenient to post because it depends on a bunch of unpublished libraries, etc.)
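The divide-and-conquer scheme itself is simple enough to sketch in Python (a cost model, not TeX; the function name is made up): each level of the recursion copies O(n) tokens when the halves are concatenated, and there are O(log n) levels.

```python
def reverse_dc(s):
    # Divide-and-conquer reversal: concatenating the two reversed halves
    # (the \edef / \expanded step) copies O(n) tokens per level, and
    # there are O(log n) levels, hence O(n log n) overall.
    if len(s) <= 1:
        return s
    mid = len(s) // 2
    # reverse(whole) = reverse(second half) followed by reverse(first half)
    return reverse_dc(s[mid:]) + reverse_dc(s[:mid])
```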
Alternatively, there is a solution that takes O(kn) time and O(k × n^(1/k)) toks registers (or hash table entries, assuming TeX hashing takes O(1)).
Below I'll describe the solution that takes O(n) time (see note below) and O(√n) toks registers. Generalizing (e.g., to a solution that takes O(n) time and O(∛n) toks registers) is not difficult.
- split the string into √n parts, storing each in a toks register. This can be done in linear time.
- reverse each part (whose size is √n) in time linear in the part size, using √n toks registers.
- concatenate the parts into the result in reverse order. This can also be done in linear time.
(I tried implementing this one and it takes 1.45 s to reverse 524288 characters.
Marginally faster than the above, I guess, although the implementation could be optimized a bit, e.g., by chunking 8 characters at once in the latter half.)
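The three steps can be modeled in Python (a cost model, not TeX; the function name is made up), with each list entry standing in for one toks register:

```python
import math

def reverse_via_registers(s):
    # Model of the O(n)-time scheme: split into ~sqrt(n) parts, reverse
    # each part, then concatenate the parts in reverse order.
    if not s:
        return s
    size = max(1, math.isqrt(len(s)))                        # part size ~ sqrt(n)
    parts = [s[i:i + size] for i in range(0, len(s), size)]  # split: O(n)
    parts = [p[::-1] for p in parts]                         # reverse each part: O(n) total
    return "".join(reversed(parts))                          # concatenate in reverse: O(n)
```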
The \tl_build_put_left implementation, despite having time complexity O(n log₅ n) instead of O(n), is surprisingly fast at 1.74 s.
In retrospect, there's a problem with this approach: a number in the range 1..n takes log₁₀ n decimal digits to represent, so the actual time complexity is O(n log n).
In my algorithm, both the string-split part and the concatenation part are affected by this.
So, workarounds:
- first, for the string-split part, I can't come up with anything better than defining √n control sequences to count the steps. Each control sequence needs a csname of length at least log₁₀ √n = O(log n).
- for the concatenation part, similarly define √n control sequences, each
\toksdef'ed to the corresponding toks register; then define √n control sequences, each expanding to the following control sequence plus the \toksdef'ed control sequence,
so that the full expansion of any one of them equals the concatenation of all √n toks registers from that point on.
The time complexity of these parts is O(√n log n), but fortunately they only need to be done once, so they're insignificant compared to O(n).
Another workaround (for the concatenation part only) is to split into parts of size O(log n) and handle those with the naive algorithm; the time complexity would then be O(n log log n).
Applying that idea recursively, I guess it would become O(n log* n) or something similar.
The final time complexity is still O(n), but memory usage becomes O(√n log n). Of course a similar idea exists for O(∛n log n) etc., although I haven't worked out the details completely.
Another (rather weird, in my opinion) workaround is to use the e-TeX extension \currentiflevel, which can be incremented and decremented expandably in O(1).
I tested the limit of this one: on LuaLaTeX it seems to be unbounded, while on LaTeX, pdfLaTeX and XeLaTeX it runs to more than 2000000 before reporting "TeX capacity exceeded". In any case that's large enough for any purpose, and in a theoretical analysis we can assume it takes linear memory, bounded above only by the available memory (unlike, e.g., the \romannumeral, \expandafter or \dimexpr nesting levels).
Although if the value is already large (unlikely in practice, but possible in theory) it might be a little difficult, I'm not sure... I can see a few ways:
- repeatedly execute \fi until it reaches a small value, then repeatedly execute \ifcase\z@ until it reaches the original value. (I thought this could not affect the behavior of a normal program, except possibly changing the content of the log; but I was wrong, since \currentiftype exists.)
- hope that the TeX implementation evaluates the internal number \numexpr\currentiflevel-\originalvalue\relax in O(1) time
where an ⟨internal number⟩ is expected, which is \the\toks\numexpr\currentiflevel-\originalvalue\relax in this case.
(This is possible in theory, since at any point in time only O(log n) tokens are generated, although I'm not sure how it's implemented in reality.)
\def\i{a}\def\ii{b}\def\iii{c}\iii\ii\i. That is, index the string via macros and expand them from the last back. Of course it's not "linear". – egreg Nov 24 '11 at 12:09
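egreg's macro-indexing idea can be modeled in Python (dictionary keys stand in for csnames; the function name is made up). With literal names \i, \ii, \iii, ... the k-th name is k characters long, so merely writing out all n names already costs O(n²) in this unary-name model; real roman-numeral names would be shorter but the total is still superlinear.

```python
def reverse_by_indexing(s):
    # "Index the string via macros": one 'macro' per character, named
    # with a unary csname ("i", "ii", "iii", ...), then expand the
    # names from the last index back to the first.
    macros = {"i" * (k + 1): ch for k, ch in enumerate(s)}
    return "".join(macros["i" * k] for k in range(len(s), 0, -1))
```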