I hope I can communicate this clearly. I've been trying to pass an argument containing LaTeX commands to a Lua function. Normally this works fine, but when Lua matches or substitutes %s (whitespace) within that argument, the command behaves as if its argument had no curly braces and acts only on the single character that follows it. So, with the code in the file 'new.lua' like this...
local function test(str)
local newstr = str:gsub("(-+)","X")
tex.print(newstr)
end
return {test=test}
and the following in LaTeX:
\documentclass{article}
\usepackage{luacode}
\directlua{lua = require("new.lua")}
\newcommand{\test}[1]{\directlua{lua.test(\luastringN{#1})}}
\begin{document}
\test{aaa-\textbf{bbb}-ccc}
\end{document}
I obtain the desired result:
aaaXbbbXccc
but if I try the same with whitespace instead of -, like so:
(Lua)
local function test(str)
local newstr = str:gsub("(%s+)","X")
tex.print(newstr)
end
return {test=test}
(LaTeX)
\documentclass{article}
\usepackage{luacode}
\directlua{lua = require("new.lua")}
\newcommand{\test}[1]{\directlua{lua.test(\luastringN{#1})}}
\begin{document}
\test{aaa \textbf{bbb} ccc}
\end{document}
I get the incorrect output:
aaaXbbbXccc
and the following error:
! Undefined control sequence.
l.1 aaaX\textbfX
{bbb}Xccc
l.8 \test{aaa \textbf{bbb} ccc}
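The error line suggests that the string reaching Lua already has a space after \textbf, which the substitution then turns into an X. Here is a minimal plain-Lua sketch of what I think is happening. The literal assigned to `received` is my assumption about what TeX hands over (inferred from the log above, not something I printed from inside LuaTeX):

```lua
-- Assumption: TeX serialises the tokens with a space after the control word,
-- so Lua sees "\textbf {bbb}" rather than "\textbf{bbb}".
local received = [[aaa \textbf {bbb} ccc]]

-- Same substitution as in new.lua; gsub also returns a match count,
-- which is discarded here by assigning to a single variable.
local replaced = received:gsub("(%s+)", "X")

print(replaced)  --> aaaX\textbfX{bbb}Xccc
```

If that assumption holds, \textbfX is read as a single (undefined) control word, which would explain the error.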
While trying to figure this out, I noticed that replacing the X here with a non-alphabetic character makes the command operate only on that character, as though whitespace had been "added" right after the command name and then substituted. So, with the following Lua...
local function test(str)
local newstr = str:gsub("(%s+)","1")
tex.print(newstr)
end
return {test=test}
and the LaTeX the same as the previous example, I get:
aaa11bbb1ccc
Ideally, I would like the match/sub to treat LaTeX commands as if they involved no "hidden" whitespace, if that makes sense; that is, I only want to match the whitespace that is present in the actual written LaTeX.
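A rough sketch of the behaviour I am after, in plain Lua so it can be tried outside LuaTeX. The helper name `sub_written_spaces` and the marker byte are hypothetical, and the sketch assumes that control words consist only of ASCII letters and that the only "extra" whitespace is a single space directly after a control word:

```lua
-- Hypothetical helper: substitute whitespace runs, but leave alone the single
-- space that follows a control word such as \textbf.
local function sub_written_spaces(str, repl)
  local marker = "\1"  -- assumption: this byte never occurs in the input

  -- 1. Hide the space directly after a control word (backslash + letters).
  local protected = str:gsub("(\\%a+) ", "%1" .. marker)

  -- 2. Replace the remaining whitespace runs (the marker is not whitespace).
  local replaced = protected:gsub("%s+", repl)

  -- 3. Restore the hidden spaces so control words stay delimited.
  local restored = replaced:gsub(marker, " ")
  return restored
end

print(sub_written_spaces([[aaa \textbf {bbb} ccc]], "X"))
--> aaaX\textbf {bbb}Xccc
```

This would not cope with names containing @ under \makeatletter, with a control symbol immediately followed by letters and a space (for example \\ followed by a word), or with a replacement string containing %, so it is only meant to illustrate the intent.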
I know this issue stems from an improper understanding of the way TeX handles tokens, but I'm not sure how to rectify that, or how to understand it properly.

(+-) (same as in the first block), is that intended or is that a typo/copy-paste mistake? – Marijn May 17 '20 at 12:15