I want to implement something similar to \detokenize as a Lua function. But there is one case where I am prevented from accomplishing my goal.

\directlua{tex.sprint("[" .. token.get_next().csname .. "]")}\par
\directlua{tex.sprint("[" .. token.get_next().csname .. "]")}\UNDEFINED

The first line above outputs [par] and the other outputs []. Replacing token.get_next().csname by token.scan_csname() doesn't help. How do I obtain the name of an undefined control sequence in LuaTeX?

  • good question, I'd have expected .csname or scan_csname to work, but as you say, they don't. Might be better to ask on the luatex list, see what Hans says! – David Carlisle Oct 26 '16 at 10:16
  • Reading the LuaTeX manual I suspect this is 'by design' behaviour, as it's not actually the user input that token.get_next().csname returns, rather the csname linked to this input where it exists. – Joseph Wright Oct 26 '16 at 11:16
  • raised on luatex list as http://tug.org/pipermail/luatex/2016-October/006257.html – David Carlisle Oct 26 '16 at 11:40
  • BTW, I suspect a \detokenize equivalent won't work, as you can't tell whether a token was an active char (due to dropping of the escape char): ~ and \~ yield the same from token.get_next().csname, for example. – Joseph Wright Oct 26 '16 at 12:48
  • @JosephWright: \directlua{tex.sprint(token.get_next().active and "active character" or "control sequence")} – user3840170 Oct 26 '16 at 16:36
  • But wait, there's more: token.create("UNDEFINED").csname is empty, while token.create("par").csname is "par". – user3840170 Oct 27 '16 at 14:05

2 Answers


Note that you get an empty string not because \UNDEFINED is undefined, but because it has never been scanned by TeX (so there is no internal token of that name).

This plain LuaTeX file

\long\def\z#1{%
\directlua{texio.write_nl("[" .. token.scan_csname() .. "]")}#1%
\directlua{texio.write_nl("[" .. token.get_next().csname .. "]")}#1%
}

\z\par
\z\UNDEFINED

\bye

produces the following in the log,

[par]
[par]
[UNDEFINED]
[UNDEFINED]

showing that merely having \UNDEFINED seen by TeX's #1 argument scanner is enough to get the behaviour you expected.

David Carlisle
  • It's a bug, then? – user3840170 Oct 26 '16 at 16:52
  • @user114332 No, it's by design (not a design I'd have used myself, especially for .scan_csname, which could return the string that it scanned; it's more reasonable for get_next(), as there is no token defined for \UNDEFINED until TeX has scanned it). See the email list discussion. – David Carlisle Oct 26 '16 at 17:19

One possible way to get the name is to use the expansion capability of scan_toks() to expand/execute some of the following tokens, similar to this answer.

In the example below,

  • function f does a simple token.get_next() (which does not handle the case where the following token is not in the hash table)
  • function g uses scan_toks(false, true) to expand the input, and \immediateassignment combined with \futurelet to force TeX to tokenize the next token.
  • function h uses a coroutine-like trick (auxiliary functions are used instead of an actual coroutine) to call \futurelet instead. \immediateassignment can be added if it needs to work in an expansion-only context; nevertheless, in an o-expansion context it will still fail to work.
%! TEX program = lualatex
\documentclass{article}
\usepackage{luacode}

\ExplSyntaxOn
\use_none:n { \__unused } % put the token \__unused into the hash table
\cs_new_protected:Npn \__h_aux { \directlua{h_aux()} }
\ExplSyntaxOff

\begin{luacode*}

print("\n\n\n\n")

function f()
    print("f: csname =", token.get_next().csname)
end

function g()
    token.put_next {
        token.create(string.byte "{", 1),
        token.create "immediateassignment",
        token.create "futurelet",
        token.create "__unused",
        token.create(string.byte "}", 2),
    }
    token.scan_toks(false, true)
    print("g: csname =", token.get_next().csname)
end

function h()
    tex.sprint {
        token.create "futurelet",
        token.create "__unused",
        token.create "__h_aux",
    }
end

function h_aux()
    print("h: csname =", token.get_next().csname)
end
\end{luacode*}


% normal behavior: tokens never seen before (not in the hash table) result in an empty csname
\directlua{f()}\par
\directlua{f()}\undefined
\directlua{f()}\undefineda
\directlua{f()}\undefineda
\directlua{f()}\undefinedb

% use function g instead
\directlua{g()}\par
\directlua{g()}\undefined
\directlua{g()}\undefineda
\directlua{g()}\undefineda

% just checking, undefinedb is still not in the hash table
\directlua{f()}\undefinedb

% use function h instead
\directlua{h()}\par
\directlua{h()}\undefined
\directlua{h()}\undefinedb
\directlua{h()}\undefinedb

\begin{luacode*}
print("\n\n\n\n")
\end{luacode*}
\begin{document}
\end{document}

The output is

f: csname =     par
f: csname =     undefined
f: csname =
f: csname =
f: csname =
g: csname =     par
g: csname =     undefined
g: csname =     undefineda
g: csname =     undefineda
f: csname =
h: csname =     par
h: csname =     undefined
h: csname =     undefinedb
h: csname =     undefinedb

You can see that functions g() and h() get the csname correctly, namely undefineda and undefinedb.
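
The idea behind h can also be sketched in plain LuaTeX, without expl3 or luacode. This is only a minimal, untested sketch under that assumption: the macro names \showname and \showaux are hypothetical and chosen purely for illustration, and \next is used as a scratch token for \futurelet.

```latex
% Plain LuaTeX sketch; \showname and \showaux are hypothetical names.
% \futurelet makes TeX tokenize the token that follows \showaux,
% which enters it into the hash table before Lua inspects it.
\def\showname{\futurelet\next\showaux}
% \showaux removes that token from the input and writes its csname to the log.
\def\showaux{\directlua{texio.write_nl("[" .. token.get_next().csname .. "]")}}

\showname\par
\showname\UNDEFINED

\bye
```

As with h, this relies on an assignment (\futurelet), so it is not suitable for expansion-only contexts.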

user202729
  • As far as I know futurelet is the safest way instead of e.g. applying \noexpand on it then expand it once, which might make it lose the noexpand property (although I'm not sure whether futurelet does the same -- or whether noexpand-ness is preserved by get_token() either) – user202729 Jul 02 '22 at 15:24