In answering "Automatically put certain inputs (e.g. punctuation marks) outside of the environment/command", I wrote something very similar to this:
\documentclass{article}
\usepackage{xparse}
\newcommand\Any[1]{``\textbf{#1}''\space}% a dummy command
\ExplSyntaxOn
\seq_new:N \l_word_seq % define a new sequence
\NewDocumentCommand\IterateOverPunctutation{ m }{
% apply \Any to the "words" in #1 between the punctuation characters
\regex_split:nnN { ([\(\)\.,;\:\s]+) } { #1 } \l_word_seq% split #1 at punctuation; the capture group keeps the delimiters as items
\cs_set:Nn \l_map_two:n {
\regex_match:nnTF{ [\(\)\.,;\:\s]+ }{##1}
{##1}% matches a punctuation character
{\Any{##1}}% apply \Any to ##1
}
\seq_map_function:NN \l_word_seq \l_map_two:n
}
\ExplSyntaxOff
\begin{document}
\IterateOverPunctutation{A, (B: C. D)}
\IterateOverPunctutation{A), E, G H(;;,) (B: C. D)}
\IterateOverPunctutation{abc,a:b::def:f}
\end{document}
This code produces:
Can anyone explain to me why the empty double quotes appear at the end of the first two lines?
What is happening is that an empty string is being passed through to
\regex_match:nnTF{ [\(\)\.,;\:\s]+ }{##1}{##1}{\Any{##1}}
As the empty string does not match the regular expression, it is then printed as \Any{}. My question really is: why is \regex_split:nnN putting an empty string into the sequence \l_word_seq?
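To see exactly which items end up in the sequence, it may help to print each item between visible delimiters. The following is only a minimal sketch: the names \ShowSplit and \l_demo_seq are hypothetical and not part of the code above; the regular expression is the one from the question.

```latex
\documentclass{article}
\usepackage{xparse}
\ExplSyntaxOn
\seq_new:N \l_demo_seq % hypothetical sequence, used only for this test
\NewDocumentCommand \ShowSplit { m }
  {
    % same split as in the question: the capture group keeps the delimiters
    \regex_split:nnN { ([\(\)\.,;\:\s]+) } { #1 } \l_demo_seq
    % write the items to the log file for inspection
    \seq_log:N \l_demo_seq
    % typeset every item in square brackets so that empty items are visible
    \seq_map_inline:Nn \l_demo_seq { [##1] }
    \par
  }
\ExplSyntaxOff
\begin{document}
\ShowSplit{A, (B: C. D)}   % input ends with punctuation
\ShowSplit{abc,a:b::def:f} % input ends with a word
\end{document}
```

If the behaviour described above holds, the first line should end with an empty pair of brackets and the second should not.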
If we change the match to
\regex_match:nnTF{ ^[\(\)\.,;\:\s]*$ }{##1}{##1}{\Any{##1}}
then we get the output that I expected:
because the new regular expression matches the "punctuation" and the empty string, but none of the "words". So it solves the problem, but I still don't understand why the empty string can appear in the sequence returned by \regex_split:nnN.
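An alternative to changing the regular expression is to skip empty items explicitly before testing them. This is a sketch under assumptions, not necessarily the best approach: \IterateSkippingEmpty, \demo_handle_item:n and \l_demo_word_seq are hypothetical names, and the test uses \tl_if_empty:nF, a variant of the test suggested in the comments.

```latex
\documentclass{article}
\usepackage{xparse}
\newcommand\Any[1]{``\textbf{#1}''\space}% the dummy command from the question
\ExplSyntaxOn
\seq_new:N \l_demo_word_seq % hypothetical sequence name
% Hypothetical helper: drop empty items, print punctuation unchanged,
% and wrap everything else in \Any
\cs_new_protected:Npn \demo_handle_item:n #1
  {
    \tl_if_empty:nF { #1 }
      {
        \regex_match:nnTF { [\(\)\.,;\:\s]+ } { #1 }
          { #1 }          % a run of punctuation or spaces
          { \Any { #1 } } % a "word"
      }
  }
\NewDocumentCommand \IterateSkippingEmpty { m }
  {
    \regex_split:nnN { ([\(\)\.,;\:\s]+) } { #1 } \l_demo_word_seq
    \seq_map_function:NN \l_demo_word_seq \demo_handle_item:n
  }
\ExplSyntaxOff
\begin{document}
\IterateSkippingEmpty{A, (B: C. D)}

\IterateSkippingEmpty{abc,a:b::def:f}
\end{document}
```

Defining the helper once with \cs_new_protected:Npn, rather than redefining it on every call, also avoids using an \l_ prefix for a function name.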



\regex_split:nnN adding an empty item when there is terminal punctuation and this is not a common feature of regular expressions in other languages. Is it intentional? Is it a bug? (Thanks for telling me about \tl_if_empty:xT, although my use of \regex_match:nnTF to deal with the issue seems more efficient in this instance.) – Oct 03 '17 at 04:56

;-) – egreg Oct 03 '17 at 06:31