Ulrich Diez's answer already covers the "how to do it" part. In the spirit of "one hundred percent correct", the following algorithm always takes at most O(N log N) time (and O(N) time if the delimiter token is not present in the token list).
Currently there's no implementation, but the idea should work, and we are relatively sure that the (asymptotic) time complexity is computed correctly.
The algorithm.
The description of the algorithm follows.
- First, use #{ to gobble until the first {} (needless to say, add a trailing {} to cover the case where there's no braced group in the token list),
- then deal with the empty-token-list case accordingly.
- Then, count the number of items in the head. (This can be done without explicitly taking the head: put the items forward in the token list, then after counting, skip past the count itself to add a brace and remove the tail part.)
- Use a function to remove exactly that many items.
- Deal with any remaining space characters. This part is not hard.
- Stringify and get the result. Recall that the stringification of } is a single token, so this is possible in linear time.
Step 2 is the bottleneck here with O(N log N) time complexity. The rest takes linear time. (at the cost of linear \romannumeral recursion depth)
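To make the first step a bit more concrete, here is a minimal sketch of the #{ trick on its own. It is not the algorithm above, only an illustration of how a parameter text ending in #{ grabs everything before the first explicit {. The names \sketchhead, \sketchdrop and \sketchstop are made up for this example, and the sketch assumes the ordinary catcode regime.

```latex
\documentclass{article}
\begin{document}
% Hypothetical helper names, for illustration only.
% Because the parameter text ends in #{, #1 is delimited by the first
% explicit {, and TeX re-inserts that { right after the replacement
% text, so \sketchdrop sees "{...rest...\sketchstop".
\def\sketchhead#1#{\detokenize{#1}\sketchdrop}
% Throw away everything up to the end marker.
\def\sketchdrop#1\sketchstop{}
% The trailing {} guarantees that a delimiter is always found,
% even when the token list itself contains no braced group.
\typeout{[\sketchhead abc{def}ghi{}\sketchstop]}
\typeout{[\sketchhead xyz{}\sketchstop]}
\end{document}
```

The first \typeout should print [abc] and the second [xyz]: in the second case it is the appended {} that acts as the delimiter.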
Old answer
(uses the approach identical to Ulrich Diez's answer)
Since I think writing code such as the above by hand is way too complex, I decided to write a "compiler" that compiles easy-to-understand imperative-style code such as this (the definition of the macro that extracts the string of the first })
\rdeflinenumbered \firstegroup #x {}!
\while {}{ \ifnotempty {#x} } {
\conditional {\ifbrace {#x}} {
% found the first group.
% first make sure the open brace has charcode `{` (anything not a space will do.).
\assignoperate #x {\string #x} {
\expandonce
\rcall{\putnextbgroup}
}
% then empty out that group.
\while {
\assignr #\firstcomponent {\firstarg{#x}}
} {
\ifnotempty {#\firstcomponent}
}{
% firstcomponent is still nonempty. Pick one item
\conditional{\ifspace {#\firstcomponent}} {
\assignoperate #x {#x} {
% following in the input stream: { <space> ... } ...
\putnext{\string} \expandonce
% following in the input stream: '{' <space> ... } ... where the initial { is stringified and is definitely not a space
\matchrm{#1 ~}
% following in the input stream: ... } ...
\rcall{\putnextbgroup}
}
} {
\assignoperate #x {#x} {
% following in the input stream: { <item> ... } ...
\putnext{\string} \expandonce
% following in the input stream: '{' <item> ... } ... where the initial { is stringified and is definitely not a space
\matchrm{#1 #2}
% following in the input stream: ... } ...
\rcall{\putnextbgroup}
}
}
% now firstcomponent is shorter.
}
% finally firstcomponent is empty now. (and the opening brace is guaranteed to be non-space)
\assignoperate #x {#x} {
% following in the input stream: { } ...
\putnext{\string} \expandonce
% following in the input stream: '{' } ...
\matchrm{#1}
% following in the input stream: } ...
\putnext{\string} \expandonce
% following in the input stream: '}' ...
}
% finally done
\assignr #x{\firstcomponent{#x}}
\return{#x}
} {
\conditional {\ifspace {#x}} {
\assignr #x{\removespace {#x}}
} {
\assignr #x{\dropfirst {#x}}
}
}
}
\return {}
!
to a set of TeX macros.
Currently there's little to no documentation, but hopefully you can roughly understand what the code does. For instance:
- while does a while loop
- conditional does a conditional ("if" statement)
- assign "assigns" values to token lists
- assignr "computes" the result of a "function", then assigns it to token lists
- matchrm matches a pattern forward in the input stream, removes it, and assigns the matched values to token lists
- putnext puts tokens forward in the input stream
- rcall calls a "subroutine"
- etc.
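As an illustration of what "putting a token forward in the input stream" can mean for an unbalanced brace, here is a minimal sketch of the well-known \iffalse}\fi trick. This is not the compiler's output; \sketchputbgroup is a made-up name, and the sketch assumes it is only ever called directly after \romannumeral.

```latex
\documentclass{article}
\begin{document}
\makeatletter
% Hypothetical helper, for illustration only.
% Inside \romannumeral it leaves a single unmatched { in the input
% stream. The "\iffalse}\fi" part only serves to keep the braces in
% this definition balanced; the } is skipped when \iffalse is expanded.
\def\sketchputbgroup{\expandafter\z@\expandafter{\iffalse}\fi}
% The } after "abc" closes the brace inserted by the helper,
% the last } closes the definition of \test.
\expandafter\def\expandafter\test
  \expandafter{\romannumeral\sketchputbgroup abc}}
\typeout{\meaning\test}
\makeatother
\end{document}
```

This should print macro:->{abc}. The \z@ stops \romannumeral once the brace is in place, so the whole thing stays expandable.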
Currently the compiler is quite slow and generates bloated code; this will be improved later. Also, it's not on CTAN. If you're interested, you can try to find some way to run it from source.
The compiler is implemented in LuaTeX, but the generated code can be used in any compiler.
For a demo, try running the following code:
%! TEX program = pdflatex
\documentclass{article}
\usepackage{filecontentsdef}
\begin{document}
\ExplSyntaxOn
% ======== some auxiliary macros ========
\def\__process_char #1 #2 {
%\prettye:n{\expandafter \expandafter \expandafter \noexpand \char_generate:nn {`#2} {"#1}}
\expandafter \expandafter \expandafter \noexpand \char_generate:nn {`#2} {"#1}
\__process_s
}
\def\__process_space_other_cat #1 {
\expandafter \expandafter \expandafter \noexpand \char_generate:nn {32} {"#1}
\__process_s
}
\def\__process_cs #1 / {
\expandafter \noexpand \csname #1 \endcsname
\__process_s
}
\def\__process_s#1{
\token_if_eq_charcode:NNTF #1 0 { % 0 <name> / → the control sequence
\__process_cs
} {
\token_if_eq_charcode:NNTF #1 s { ~ \__process_s
} {
\token_if_eq_charcode:NNTF #1 S { % S <cat> → a space
\__process_space_other_cat
} {
\token_if_eq_charcode:NNF #1 . { % . → end
\__process_char #1
}
}
}
}
}
% main handler function, will exec the resulting token list.
\def\__process_all#1{
\begingroup \exp_last_unbraced:Nx \endgroup {\__process_s #1}
}
\ExplSyntaxOff
\begin{filecontentsdefmacro}{\data}
0def/0stzz241/s1{0exp_end:/2}0def/0removespace/6#6#C11{0expandafter/0stzz584/0expandafter/1{0exp:w/0stzz241/6#6#C12}2}0def/0ifempty/6#6#C11{0iffalse/1{0fi/0expandafter/0use_none:n/0expandafter/1{0expandafter/1{0string/6#6#C12}0@ifempty@casei/2}0@ifempty@caseii/2}0use_i:nn/2}0def/0@ifempty@casei/1{0exp:w/0removenextegroup/2}0def/0@ifempty@caseii/1{0expandafter/0stzz309/0exp:w/0removenextegroup/2}0def/0stzz309/6#6#C11{0use_ii:nn/2}0def/0ifbrace/6#6#C11{0iffalse/1{0fi/0expandafter/0use_none:n/0expandafter/1{0expandafter/1{0string/6#6#C12}0@ifbrace@casei/2}0@ifbrace@caseii/2}0use_ii:nn/2}0def/0@ifbrace@casei/1{0expandafter/0stzz353/0exp:w/0removenextegroup/2}0def/0stzz353/1{0expandafter/0stzz354/0exp:w/0removeuntilegroup/2}0def/0stzz354/6#6#C11{0use_i:nn/2}0def/0@ifbrace@caseii/1{0exp:w/0removenextegroup/2}0def/0removeuntilegroup/1{0expandafter/0stzz400/0expandafter/1{0iffalse/2}0fi/2}0def/0putnextbgroup/1{0expandafter/0exp_end:/0expandafter/1{0iffalse/2}0fi/2}0def/0putnextegroup/1{0expandafter/0exp_end:/0iffalse/1{0fi/2}2}0def/0stzz400/6#6#C11{0exp_end:/2}0def/0dropfirst/6#6#C11{0expandafter/0stzz584/0expandafter/1{0exp:w/0stzz400/6#6#C12}2}0def/0stzz411/6#6#C1s1{0expandafter/0stzz425/0expandafter/1{0exp:w/0dropfirst/1{6#6#C12}2}2}0def/0stzz425/6#6#C11{0ifempty/1{6#6#C12}0stzz426/0stzz429/2}0def/0stzz429/1{0expandafter/0stzz434/0expandafter/0use_ii:nn/0exp:w/0removeuntilegroup/2}0def/0stzz426/1{0expandafter/0stzz434/0expandafter/0use_i:nn/0exp:w/0removeuntilegroup/2}0def/0stzz434/6#6#C11{0expandafter/0stzz584/0expandafter/6#6#C10exp:w/0putnextegroup/2}0def/0ifspace/6#6#C11{0expandafter/0stzz439/0expandafter/1{0exp:w/0stzz411/C.6#6#C1s2}2}0def/0stzz439/6#6#C11{6#6#C12}0def/0ifnotempty/6#6#C11{0ifempty/1{6#6#C12}0use_ii:nn/0use_i:nn/2}0def/0stzz464/6#6#C16#6#C20relax/1{0exp_end:/1{6#6#C12}2}0def/0firstarg/6#6#C11{0expandafter/0stzz463c/0expandafter/1{0exp:w/0dropfirst/1{6#6#C10relax/2}2}1{6#6#C10relax/2}2}0def/0stzz463a/6#6#C11{0expandafter/0stzz463b/0expandafter/1{0exp:
w/0stzz464/6#6#C12}2}0def/0stzz463b/6#6#C11{0expandafter/0stzz463c/0expandafter/1{0exp:w/0dropfirst/1{6#6#C12}2}1{6#6#C12}2}0def/0stzz463c/6#6#C16#6#C21{0ifnotempty/1{6#6#C12}0stzz463a/0stzz472/1{6#6#C22}2}0def/0stzz472/6#6#C11{0expandafter/0stzz584/0expandafter/1{0use:n/6#6#C12}2}0def/0firstargsingletoken/6#6#C11{0expandafter/0stzz484/0removebgroup/1{6#6#C12}2}0def/0stzz484/6#6#C11{0stzz486/6#6#C12}0def/0stzz486/6#6#C11{0expandafter/0stzz487/0expandafter/6#6#C10exp:w/0putnextbgroup/2}0def/0stzz487/6#6#C16#6#C21{0exp_end:/6#6#C12}0def/0firstcomponent/6#6#C11{0ifspace/1{6#6#C12}0stzz495/1{0stzz497/1{6#6#C12}2}2}0def/0stzz497/6#6#C11{0expandafter/0stzz584/0expandafter/1{0exp:w/0firstarg/1{6#6#C12}2}2}0def/0stzz495/1{0exp_end:/s2}0def/0stzz509/1{0expandafter/0putnextbgroup/2}0def/0firstbgroup/6#6#C11{0ifnotempty/1{6#6#C12}1{0stzz505a/1{6#6#C12}2}0exp_end:/2}0def/0stzz505a/6#6#C11{0ifbrace/1{6#6#C12}0stzz520c/0stzz519/1{6#6#C12}2}0def/0stzz519/6#6#C11{0ifspace/1{6#6#C12}0stzz522b/0stzz522/1{6#6#C12}2}0def/0stzz522/6#6#C11{0expandafter/0stzz520a/0expandafter/1{0exp:w/0dropfirst/1{6#6#C12}2}2}0def/0stzz522b/6#6#C11{0expandafter/0stzz520a/0expandafter/1{0exp:w/0removespace/1{6#6#C12}2}2}0def/0stzz520a/6#6#C11{0ifnotempty/1{6#6#C12}1{0stzz505a/1{6#6#C12}2}0exp_end:/2}0def/0stzz520c/6#6#C11{0expandafter/0stzz515/0expandafter/1{0exp:w/0stzz509/0string/6#6#C12}2}0def/0stzz515/6#6#C11{0expandafter/0stzz583/0expandafter/1{0exp:w/0firstarg/1{6#6#C12}2}2}0def/0stzz537/1{0expandafter/0putnextbgroup/2}0def/0stzz559/1{0expandafter/0stzz563/0string/2}0def/0stzz563/6#6#C16#6#C21{0putnextbgroup/2}0def/0stzz550/1{0expandafter/0stzz554/0string/2}0def/0stzz554/6#6#C1s1{0putnextbgroup/2}0def/0stzz572/1{0expandafter/0stzz576/0string/2}0def/0stzz576/6#6#C11{0expandafter/0exp_end:/0string/2}0def/0firstegroup/6#6#C11{0ifnotempty/1{6#6#C12}1{0stzz532a/1{6#6#C12}2}0exp_end:/2}0def/0stzz532a/6#6#C11{0ifbrace/1{6#6#C12}0stzz588c/0stzz587/1{6#6#C12}2}0def/0stzz587/6#6#C11{0ifspace/1{6#6#C12}0stzz590
b/0stzz590/1{6#6#C12}2}0def/0stzz590/6#6#C11{0expandafter/0stzz588a/0expandafter/1{0exp:w/0dropfirst/1{6#6#C12}2}2}0def/0stzz590b/6#6#C11{0expandafter/0stzz588a/0expandafter/1{0exp:w/0removespace/1{6#6#C12}2}2}0def/0stzz588a/6#6#C11{0ifnotempty/1{6#6#C12}1{0stzz532a/1{6#6#C12}2}0exp_end:/2}0def/0stzz588c/6#6#C11{0expandafter/0stzz550b/0expandafter/1{0exp:w/0stzz537/0string/6#6#C12}2}0def/0stzz543a/6#6#C16#6#C21{0ifspace/1{6#6#C22}0stzz559c/0stzz559a/1{6#6#C12}2}0def/0stzz559a/6#6#C11{0expandafter/0stzz550b/0expandafter/1{0exp:w/0stzz559/6#6#C12}2}0def/0stzz559c/6#6#C11{0expandafter/0stzz550b/0expandafter/1{0exp:w/0stzz550/6#6#C12}2}0def/0stzz550b/6#6#C11{0expandafter/0stzz544a/0expandafter/1{0exp:w/0firstarg/1{6#6#C12}2}1{6#6#C12}2}0def/0stzz544a/6#6#C16#6#C21{0ifnotempty/1{6#6#C12}1{0stzz543a/1{6#6#C22}1{6#6#C12}2}1{0stzz572a/1{6#6#C22}2}2}0def/0stzz572a/6#6#C11{0expandafter/0stzz583/0expandafter/1{0exp:w/0stzz572/6#6#C12}2}0def/0stzz583/6#6#C11{0expandafter/0stzz584/0expandafter/1{0exp:w/0firstcomponent/1{6#6#C12}2}2}0def/0stzz584/6#6#C11{0exp_end:/6#6#C12}.
\end{filecontentsdefmacro}
\ExplSyntaxOn
\exp_args:NV \__process_all \data
\ExplSyntaxOff
\let\removenextegroup\removeuntilegroup
\message{%
^^JResult: |\romannumeral\firstbgroup{ n A { X{m } j}{}jh}|%
}%
\begingroup
\catcode`\Y=1
\message{%
^^JResult: |\romannumeral\firstbgroup{ n A Y X{m } j}{}jh}|%
}%
\endgroup
\begingroup
\catcode`\~=10 \catcode`\ =1~%
\message{%
^^JResult:~~|\romannumeral\firstbgroup{~n~A~ ~X{m~}~j}{}jh}|%
}%
\endgroup
\message{%
^^JResult: |\romannumeral\firstbgroup{ n A Xm jjh}|%
}%
\begingroup
\catcode`\Y=1
\expandafter\expandafter\expandafter\def
\expandafter\expandafter\expandafter\test
\expandafter\expandafter\expandafter{%
\romannumeral\firstbgroup{ n A Y X{m } j}{}jh}%
}%
\message{%
^^JResult: |\meaning\test|%
}%
\endgroup
\message{%
^^J------------------------------------------------------%
}%
\message{%
^^JResult: |\romannumeral\firstegroup{ n A { X{m } j}{}jh}|%
}%
\begingroup
\catcode`\Y=2
\message{%
^^JResult: |\romannumeral\firstegroup{ n A { X{m } jY{}jh}|%
}%
\endgroup
\begingroup
\catcode`\~=10 \catcode`\ =2~%
\message{%
^^JResult:|\romannumeral\firstegroup{~n~A~{~X{m~}~j {}jh}|%
}%
\endgroup
\message{%
^^JResult: |\romannumeral\firstegroup{ n A Xm jjh}|%
}%
\begingroup
\catcode`\Y=2
\expandafter\expandafter\expandafter\def
\expandafter\expandafter\expandafter\test
\expandafter\expandafter\expandafter{%
\romannumeral\firstegroup{ n A { X{m } jY{}jh}%
}%
\message{%
^^JResult: |\meaning\test|%
}%
\endgroup
\end{document}
The demo part is copied from Ulrich Diez's answer.
The algorithm used is mostly identical to that answer as well.
For the source code, see https://github.com/user202729/TeXlib/blob/main/test_imperative.tex#L487 etc. if you're interested. (That file is runnable in LuaLaTeX with the appropriate libraries. The output is shown in the HTML file, although nowadays I prefer using my prettytok package to display the output.)
Because of some limitations, \relax is used in place of the frozen \relax token, which makes the algorithm slower in certain cases.
Regarding the compiled code: the output is a set of expandable macros, but I make the program generate tokens that do not tokenize well under the normal catcodes, hence the auxiliary function to reconstruct the token list.
Example generated code (as I've mentioned before, the current code is very bloated. Will be fixed later)

\tl_analysis_map_inline or \peek_analysis_map_inline can be used to do this. (As far as I can see, tokcycle cannot do this easily.) – user202729 Mar 26 '22 at 02:29