I want to execute some code if certain words are present in a LaTeX3 string. I came up with my own implementation, which loops over the string with \str_map_inline:Nn and accumulates the current word with \str_put_right:Nn, but it turns out to be about 500x slower than I would expect (compared with \str_if_in:NnTF, which should perform roughly the same number of operations). This makes my overall library 20% slower for one tiny operation. Any idea what I did wrong?
MWE:
\documentclass{article}
\usepackage{l3benchmark}
\begin{document}
Test
\ExplSyntaxOn
%%%%%%%%%%%%%% Library to make more efficient
% \__robExt_auto_forward_words:NN \commandToRunOnEachWord \stringToSearchOn
\cs_set:Nn \__robExt_auto_forward_words:NN {
% \l_tmpa_str will contain the current word read so far
\str_set:Nn \l_tmpa_str {}%
\str_map_inline:Nn #2 {
% \token_case_charcode:NnTF ##1 {} {} {}
\__robExt_if_letter:nTF {##1} {
\str_put_right:Nn \l_tmpa_str {##1}
}{
\str_if_empty:NTF \l_tmpa_str { } {
% if the current word is not empty, we run the command on it
#1 \l_tmpa_str%
\str_set:Nn \l_tmpa_str {}% we reset its value
}
}
}
}
%% \__robExt_if_letter:nTF {char} {true} {false} tests if an element is a letter
%% https://tex.stackexchange.com/a/700864/116348
\prg_new_conditional:Npnn \__robExt_if_letter:n #1 { TF }
{
\bool_lazy_or:nnTF
{
\bool_lazy_and_p:nn
{ \int_compare_p:nNn { `#1 } > { `a - 1 } }
{ \int_compare_p:nNn { `#1 } < { `z + 1 } }
}
{
\bool_lazy_and_p:nn
{ \int_compare_p:nNn { `#1 } > { `A - 1 } }
{ \int_compare_p:nNn { `#1 } < { `Z + 1 } }
}
\prg_return_true:
\prg_return_false:
}
% \robExt_register_match_word {namespace that defaults to empty} {word} {code to run if word is present}
\cs_set:Nn \robExt_register_match_word:nnn {
\cs_set:cn {l__robExt_execute_if_word_present_#1_#2:} {#3}
}
% \robExt_try_to_execute_if_match_word:nn {namespace} {word}
\cs_set:Nn \robExt_try_to_execute_if_match_word:nn {
\cs_if_exist:cTF {l__robExt_execute_if_word_present_#1_#2:} {%
\cs_if_exist:cTF {l__robExt_execute_if_word_present_#1_#2__already_forwarded:}{\message{Already forwarded}}{
\use:c {l__robExt_execute_if_word_present_#1_#2:}%
% define it so that we do not import twice next time
\cs_set:cx {l__robExt_execute_if_word_present_#1_#2__already_forwarded:} {}
}
} { }
}
\cs_generate_variant:Nn \robExt_try_to_execute_if_match_word:nn { nV }
%%%%%%%%%%%%%% Usage
\robExt_register_match_word:nnn {} {grapes} {I~like~grapes.\\}
\robExt_register_match_word:nnn {} {grapefruits} {I~hate~grapefruits.\\}
%% This string is already created for other reasons, so you can safely assume it exists
\str_new:N \l_my_str
\str_set:Nn \l_my_str {In~the~market~you~can~find~some~grapes~and~grapefruits.}
My~string~is~``\l_my_str''.\newline
\NewDocumentCommand{\testAutoForward}{}{
\cs_set:Nn \__robExt_tmp_fct:N {
\message{I will try to run ##1}
\robExt_try_to_execute_if_match_word:nV {} ##1
}
\__robExt_auto_forward_words:NN \__robExt_tmp_fct:N \l_my_str
}
\cs_new:Nn \robExt_benchmark_me:n {
\benchmark:n {#1}
Number~of~operations~taken~by:\par\texttt{\detokenize{#1}}\par~is~\fp_to_scientific:N\g_benchmark_ops_fp.
Time~taken~by:\par\texttt{\detokenize{#1}}\par is~\fp_to_scientific:N\g_benchmark_time_fp.
}
\fp_new:N \l_robExt_fp
\robExt_benchmark_me:n {\testAutoForward}
% save the time of the first benchmark before the second one overwrites it
\fp_set_eq:NN \l_robExt_fp \g_benchmark_time_fp
\par Second test (reference time I'd like to reach):\par
\robExt_benchmark_me:n {
\str_if_in:NnTF \l_my_str {grapes}{%
% Not sure why I cannot print this without getting "TeX capacity exceeded", I guess because it repeats it a lot?
% I~like~grapes.
}{}
\str_if_in:NnTF \l_my_str {grapefruits}{}{}
}
% Not sure why this prints "ERROR: Use of ??? doesn't match its definition."
% The~reference~implementation~is~\fp_eval:n{(\g_benchmark_time_fp) / (\l_robExt_fp)}~times~faster.
\ExplSyntaxOff
\end{document}
EDIT
To answer the comments: more precisely, I have a string \mystring (a LaTeX3 string, i.e. every character should have catcode "other" or "space", I think), and I want to extract all words ([a-zA-Z]+) and run the corresponding code that may have been registered beforehand via \registerWord{myWord}{some code}. So, if \mystring contains:
In the market you can find some grapes, apples, and grapefruits.
and if I ran \registerWord{grapes}{\message{I like grapes}}, then running \extractAndExecuteWords \mystring should run \message{I like grapes}.
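To make the intended behavior concrete, here is a minimal sketch of that interface built on l3regex. The internal names (\l__demo_words_seq and the __demo_word_...: macros) are hypothetical and only for illustration, and it omits the "run only once" bookkeeping of the MWE. l3regex is not known for speed, so this is meant to show the semantics I am after, not to solve the timing problem.

```latex
\documentclass{article}
\begin{document}
\ExplSyntaxOn
% Hypothetical names, for illustration only.
\seq_new:N \l__demo_words_seq
% Make sure the V-type variant is available (harmless if it already exists).
\cs_generate_variant:Nn \regex_extract_all:nnN { nV }

% \registerWord{word}{code}: store the code in a macro named after the word.
\NewDocumentCommand \registerWord { m +m }
  { \cs_set:cpn { __demo_word_ \tl_to_str:n {#1} : } {#2} }

% \extractAndExecuteWords \stringVar: split the string into [A-Za-z]+ words
% and run the registered code (if any) for each of them.
\NewDocumentCommand \extractAndExecuteWords { m }
  {
    \regex_extract_all:nVN { [A-Za-z]+ } #1 \l__demo_words_seq
    \seq_map_inline:Nn \l__demo_words_seq
      { \cs_if_exist_use:c { __demo_word_ ##1 : } }
  }

\registerWord { grapes } { \message { I~like~grapes } }
% \l_tmpa_str stands in for \mystring here.
\str_set:Nn \l_tmpa_str
  { In~the~market~you~can~find~some~grapes,~apples,~and~grapefruits. }
\extractAndExecuteWords \l_tmpa_str
\ExplSyntaxOff
\end{document}
```

Only "grapes" has code registered in this sketch, so the other words (including "grapefruits") are extracted but nothing is run for them.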
My first try with plain LaTeX has multiple issues: spaces are removed from the string, and I can't find how to insert braces in the macro, so I insert \bgroup/\egroup instead, which is not equivalent (and the approach from "How can I add a single curly brace to a macro?" gives me weird errors):
\documentclass{article}
\begin{document}
\ExplSyntaxOn
\str_new:N \l_my_str
\str_set:Nn \l_my_str {In~the~market~you~can~find~some~grapes, apples,~and~grapefruits.}
\let\myString\l_my_str
\ExplSyntaxOff
\makeatletter
% \autoForwardWords \commandToRunOnEachWord \stringToSearchOn
\def\autoForwardWords#1#2{%
\def\robExt@tmp@word{}%
\let\robExt@cmd@to@run#1%
\message{AAAAAAAAA #2}%
\edef\robExt@list@of@commands{%
\noexpand\robExt@cmd@to@run\noexpand\bgroup%
\expandafter\autoForwardWords@aux#2\robExt@end@of@string% \robExt@end@of@string marks the end of the string
}%
%% This shows the command to run, with two issues:
%% 1) it removed spaces in the string
%% 2) I can't find how to add braces instead of bgroups.
%% I tried https://tex.stackexchange.com/questions/506613/how-can-i-add-a-single-curly-brace-to-a-macro
%% but I was getting errors.
%%\show\robExt@list@of@commands
\robExt@list@of@commands
}
\def\autoForwardWords@aux#1{%
\ifx#1\robExt@end@of@string% We arrived at the end of the string
\noexpand\egroup% close the group opened for the last word
\else%
\ifnum`#1>\numexpr`a-1\relax%
\ifnum`#1<\numexpr`z+1\relax%
#1%
\else%
\noexpand\egroup\noexpand\robExt@cmd@to@run\noexpand\bgroup%
\fi%
\else%
\ifnum`#1>\numexpr`A-1\relax%
\ifnum`#1<\numexpr`Z+1\relax%
#1%
\else%
\noexpand\egroup\noexpand\robExt@cmd@to@run\noexpand\bgroup%
\fi%
\else%
\noexpand\egroup\noexpand\robExt@cmd@to@run\noexpand\bgroup%
\fi%
\fi%
\expandafter\autoForwardWords@aux% let it grab the next character
\fi%
}
\def\robExt@end@of@string{}
\def\printWord#1{I saw --((#1))--.}
\autoForwardWords\printWord\myString
\makeatother
\end{document}

f-type expansion approach to avoid assignments. I'll see if I can write something later, but you might want to look at the implementation of \text_lowercase:n (in general terms). – Joseph Wright Nov 15 '23 at 16:53

\documentclass{article} \usepackage{listofitems} \ignoreemptyitems \begin{document} \def\mystring{In the market you can find some grapes, apples, and grapefruits.} \setsepchar[+]{ ||,||.||?||!||;||:||'||/}% define punctuations% \readlist\myterms\mystring \foreachitem\z\in\myterms[]{``\z'' } \end{document}, you will see that the listofitems package can parse input, using defined punctuation and spaces as delimiters. The array \myterms[<n>] gives the n'th word in your input. The extraction should, I believe, be fast. – Steven B. Segletes Nov 16 '23 at 01:58