Automatically put certain inputs (e.g. punctuation marks) outside of the environment/command

Question

I have some code, modified from this answer by user "ShreevatsaR", which can be called using

\tooltips{...}

The code works great, for any characters inside it.

For example, for:

\tooltips{疊音詞疊音詞疊音詞}

Only ... the code will jam for punctuation marks.

The code is not designed for punctuation marks, which is fine, because it runs a character-encoding conversion that punctuation marks do not need.

The problem is that I currently cannot compile something like this:

\tooltips{疊音詞（疊、音、詞。）疊音詞}

In fact, I have to manually copy and paste until all of the punctuation marks end up outside of the command, re-invoking the command for the parts that are not punctuation marks:

\tooltips{疊音詞}（\tooltips{疊}、\tooltips{音}、\tooltips{詞}。）\tooltips{疊音詞}

This is a bit time-consuming. Is there any way to do this automatically, please?

Please note that, for this question, the answer doesn't require the specific \tooltips{...} command to be used. For my part, you can replace the command by any sort of command, e.g. as such:

From:

\any{疊音詞（疊、音、詞。）疊音詞}

To:

\any{疊音詞}（\any{疊}、\any{音}、\any{詞}。）\any{疊音詞}

Also, the Chinese characters are not necessary. They could be anything which is not in a pre-defined list of items to be excluded, e.g.:

From:

\any{AQ（D、Y、F。）PIOP}

To:

\any{AQ}（\any{D}、\any{Y}、\any{F}。）\any{PIOP}

The point is to exclude any punctuation marks (in this case （ and 、 and 。 and ）).

I would be satisfied with:

an answer that excludes all possible punctuation marks, however TeX may figure out those ...
or an answer that allows a list of punctuation marks to be specified (e.g. , and ? and . and ' and - in the following example).

From:

\any{AQ,D?Y.F'-PIOP}

To:

\any{AQ},\any{D}?\any{Y}.\any{F}'-\any{PIOP}

@Andrew Structurally, the question is very easy. My question is to go from \something{A Z B C} to \something{A} Z \something{B C}. That is, to take any Z and place it outside of a command (so not to let the command apply its algorithm to the Z). — O0123, Oct 02 '17 at 07:52
I have answered this version of your question :) One of the points about giving a MWE is that it makes it much easier to check whether the solution fits your use-case. The answer below probably does, but as you have not given code to check against I cannot guarantee this. MWEs typically also clarify your question and make it easier for someone to help because they have some code to start from. — , Oct 02 '17 at 08:40
I understand @Andrew I should have been clearer from the beginning indeed. My apologies. I have now edited the OP. — O0123, Oct 02 '17 at 08:47

score 5 · Answer 1 · 2017-10-04T05:54:04.957

New answer - that keeps the punctuation

The only way that I know of to split text on an arbitrary set of characters is by using \regex_split from expl3. For me this is scary territory.

The code below works on (essentially) the following lines:

\Any{A, (B: C. D)} 
\Any{A), E,  G  H(;;,) (B: C. D)}
\Any{abc,a:b::def:f}
\Any{A,A,,AAA}

Here is the full code:

\documentclass{article}
\usepackage{xparse}
\ExplSyntaxOn

\seq_new:N \l_word_seq % define a new sequence
\NewDocumentCommand\IterateOverPunctutation{ m m }{
  % apply "function" #2 to the "words" in #1 between the punctuation characters
  \regex_split:nnN { ([\(\)\.,;\:\s]+) } { #1 } \l_word_seq % split #1 into a sequence, keeping the separators
  \cs_set:Nn \my_map_two:n {
    \regex_match:nnTF { ^[\(\)\.,;\:\s]*$ } { ##1 }
      { ##1 }      % punctuation only, or the empty string: print it unchanged
      { #2{##1} }  % a "word": apply #2 to it
  }
  \seq_map_function:NN \l_word_seq \my_map_two:n % apply \my_map_two:n to each item
}
\ExplSyntaxOff
\begin{document}

% make a wrapper for \Any to apply \IterateOverPunctutation
\newcommand\realAny[1]{``\textbf{#1}''\space}% a dummy \Any command
\newcommand\Any[1]{\IterateOverPunctutation{#1}\realAny}

\Any{A, (B: C. D)}

\Any{A), E,  G  H(;;,) (B: C. D)}

\Any{abc,a:b::def:f}

\Any{A,A,,AAA}

\end{document}

As I find expl3 to be quite scary let me explain a little about how the code works.

The regular expression ([\(\)\.,;\:\s]+) matches a run of one or more "punctuation characters" ().,;: or whitespace characters. The [...] says match any of these characters and the + says they should occur one or more times. We have to "escape" the characters ().: as they have other meanings in regular expressions, and \s is any "space" character. Finally, the (...) makes this a "matching group", which means that it will be remembered later. See the l3regex documentation for more details.
The \regex_split:nnN splits the "word" #1 into a sequence separated by the things in the regular expression. The key thing here is that the matching group in the regular expression, which is the punctuation, gets put into the sequence too! This means that we can iterate over the sequence, using \seq_map_function:NN, and use \regex_match:nnTF to either print the punctuation or apply #2 (here \realAny) to the "word" in the sequence.
The \cs_set:Nn defines a new macro that is applied to each element of the sequence, which includes the punctuation because of the capture group in the regular expression (unlike \cs_new:Nn, this does not complain if the command is already defined).
If the "sentence" ends in a punctuation character then there will be an "empty" item in \l_word_seq. To cater for this, \regex_match:nnTF looks for ^[\(\)\.,;\:\s]*$, which matches 0 or more occurrences of the "punctuation" but only if it is all of ##1. That is, it accepts either the empty string or a string consisting entirely of punctuation characters.
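
As a minimal sketch of the key point above (a capture group makes \regex_split:nnN keep the separators), the following stand-alone document splits on runs of commas and semicolons and wraps every item of the resulting sequence in brackets. The names \ShowSplit and \l_demo_parts_seq are made up for this illustration and are not part of the code above.

```latex
\documentclass{article}
\usepackage{xparse}
\ExplSyntaxOn
\seq_new:N \l_demo_parts_seq % hypothetical name, used only in this sketch
\NewDocumentCommand \ShowSplit { m }
 {
  % the parentheses form a capture group, so each separator run
  % is stored as an item of the sequence as well
  \regex_split:nnN { ([,;]+) } { #1 } \l_demo_parts_seq
  % show every item, word or separator, in brackets
  \seq_map_inline:Nn \l_demo_parts_seq { [##1] }
 }
\ExplSyntaxOff
\begin{document}
\ShowSplit{abc,def;;gh}
\end{document}
```

With this input the sequence should hold five items, so the line is typeset as [abc][,][def][;;][gh]: words and separator runs alternate, which is what lets the mapping function treat them differently.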

Original answer

I'll answer the version of the question given in the comments, which is to apply a command \something to each element of a comma separated list. This is very easy to do using the \docsvlist command from the etoolbox package.

As there is no MWE and no definition of \something, the MWE below takes \something to be \textbf:

\documentclass{article}
\usepackage{etoolbox}
\begin{document}

\renewcommand*\do[1]{\textbf{#1}}
\docsvlist{A,B,C,D}
\end{document}

A space separated list is marginally harder, and probably not worth the effort. Given that commas were used in the original question I assume this is OK. Of course, something may well go wrong if \something is replaced by the \tooltips command of the OP, but in almost all cases this will be OK.
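
If a separator other than a comma is wanted, etoolbox can generate a similar parser with \DeclareListParser. The sketch below assumes a semicolon as the separator; \dosemicolonlist is a name invented here, and, as with \docsvlist, the separators themselves are discarded rather than printed.

```latex
\documentclass{article}
\usepackage{etoolbox}
% define a parser like \docsvlist, but splitting at semicolons
% (\dosemicolonlist is a hypothetical name for this sketch)
\DeclareListParser{\dosemicolonlist}{;}
\begin{document}

\renewcommand*\do[1]{\textbf{#1} }% applied to each item in turn
\dosemicolonlist{A;B;C;D}
\end{document}
```

This handles one separator per parser, so it does not by itself cover a whole set of punctuation marks.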

I can learn a lot from your answer, but I am looking for something extra. Could we make the separators to consist of a list of predefined punctuation marks? The OP shows not only "western-style" comma's (,) to be taken out, but actually many punctuation markers, e.g. （ and 、 and 。 and）. — O0123, Oct 02 '17 at 08:45
Please also note that it is possible for two or more punctuation markers to follow up one another (e.g. in 。） at the OP). — O0123, Oct 02 '17 at 08:59
I am excited to dig into your new answer @Andrew. Can you compile this straight away in LaTeX, or which compiler to use please? — O0123, Oct 02 '17 at 10:46
I had to update my installation. I am very very happy with this answer, but only one question: How to still output the original punctuation marks? — O0123, Oct 02 '17 at 11:52
I have now made a follow-up question, entitled: Regular expressions … how to also return the sequence splitting characters? — O0123, Oct 02 '17 at 12:12
@VincentMiaEdieVerheyenmn I have added a variation that keeps the punctuation — , Oct 02 '17 at 12:15
@Andrew The problem with the code now seems to be that you seem not to be able to have more than one instance? For example, if you paste two copies of the line \IterateOverPunctutation{A, B: C. D}\Any then the code will fail. — O0123, Oct 02 '17 at 13:15
@VincentMiaEdieVerheyen It seems that I should use \cs_set:Nn rather than \cs_new:Nn to define \my_map_two. This said, for reasons that I don't understand, when you add \)\( into the regular expression, as in your question, then you also capture spurious white space. — , Oct 02 '17 at 15:45
On my system \Any{A,A,,AAA} currently returns “A”,“A”,,A whereas it should return “A”,“A”,,“AAA” (I am disregarding spacing here). — O0123, Oct 04 '17 at 02:15
@VincentMiaEdieVerheyen Sorry, I forgot to take out a hack that I had previously needed in the definition of \realAny. I have updated the code and also used the marginally more efficient regular expressions from my question https://tex.stackexchange.com/questions/394278/regular-expression-weirdness/394284?noredirect=1#comment980285_394284. Yesterday I tested your example \Any{A,A,,AAA} using the code from my post, which is why I thought it worked. — , Oct 04 '17 at 05:24

score 2 · Answer 2 · answered Oct 02 '17 at 13:16

This seems to work; first we split the input at spaces, then process each item.

It requires defining \tooltipsA as the macro that does the real work with tooltips, whereas \tooltip is used just for processing the input for taking care of punctuation.

\documentclass{article}
\usepackage{xparse}

\ExplSyntaxOn

\NewDocumentCommand{\tooltipsA}{m}
 {
  \fbox{#1} % this should be the real action of your \tooltip command
 }

\NewDocumentCommand{\tooltip}{m}
 {
  \seq_set_split:Nnn \l_tmpa_seq { ~ } { #1 }
  \seq_clear:N \l_tmpb_seq
  \seq_map_function:NN \l_tmpa_seq \vincent_process:n
  \seq_use:Nn \l_tmpb_seq { ~ }
 }

\cs_new_protected:Nn \vincent_process:n
 {
  \tl_set:Nn \l_tmpa_tl { #1 }
  \regex_replace_all:nnN
   { ( [^\(\)]+? ) ( [,.]+? ) } % <==== Here define your punctuations
   { \c{tooltipsA}\cB\{ \1 \cE\} \2 }
   \l_tmpa_tl
  \regex_match:nVF { \c{tooltipsA} } \l_tmpa_tl
   {
    \regex_replace_once:nnN
     { ( \(? ) ( [^\(\)]* ) }
     { \1 \c{tooltipsA}\cB\{ \2 \cE\} }
     \l_tmpa_tl
   }
  \seq_put_right:NV \l_tmpb_seq \l_tmpa_tl
 }
\cs_generate_variant:Nn \regex_match:nnF { nV }

\ExplSyntaxOff

\begin{document}

\tooltip{a,b,c.}

\tooltip{abc (d,ef,g.) hij (uuu vvv)}

\end{document}

Here I gave a dummy definition of \tooltipsA, just for showing that the result is as expected.
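
The replacement syntax used in \vincent_process:n can be tried in isolation. The following is a small sketch, not part of the answer: \BoldWords is a made-up command that wraps every run of lowercase letters in \textbf by inserting a control sequence (\c{...}) and explicit begin-group and end-group characters (\cB\{ and \cE\}) through \regex_replace_all:nnN.

```latex
\documentclass{article}
\usepackage{xparse}
\ExplSyntaxOn
\NewDocumentCommand \BoldWords { m } % hypothetical demo command
 {
  \tl_set:Nn \l_tmpa_tl { #1 }
  \regex_replace_all:nnN
   { ([a-z]+) }                      % group 1: a run of lowercase letters
   { \c{textbf}\cB\{ \1 \cE\} }      % becomes \textbf{<group 1>}
   \l_tmpa_tl
  \tl_use:N \l_tmpa_tl               % typeset the rewritten token list
 }
\ExplSyntaxOff
\begin{document}
\BoldWords{abc,def.}
\end{document}
```

Here the token list ends up as \textbf{abc},\textbf{def}. so the punctuation stays outside the inserted command.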

Thank you very much. There's a problem with \tooltip{a,b,c..d} though (compile and you can see what I mean). Also problems with \tooltip{a,b,c;,;d} or e.g. with \tooltip{a,b,c,,.d}. It's not consistent in comparison with e.g. \tooltip{a,b,c;;;d}. — O0123, Oct 02 '17 at 14:03
Compiling \tooltip{abc,a:b::def:f} perhaps shows the problems even more clearly. — O0123, Oct 02 '17 at 14:19

wipet · Accepted Answer · 2017-10-04T09:56:02.713

2

You can use \replacestrings from OPmac. Each call replaces one of the listed punctuation marks by & followed by that punctuation mark. Then you can define a simple \anyA macro whose parameter is delimited by &.

The code defining \replacestrings (copied here from OPmac) can be omitted when you are using plain TeX plus OPmac, which already provides it.

\long\def\addto#1#2{\expandafter\def\expandafter#1\expandafter{#1#2}}
\bgroup \catcode`!=3 \catcode`?=3
\gdef\replacestrings#1#2{\long\def\replacestringsA##1#1{\def\tmpb{##1}\replacestringsB}%
   \long\def\replacestringsB##1#1{\ifx!##1\relax \else\addto\tmpb{#2##1}%
      \expandafter\replacestringsB\fi}%     improved version <May 2016> inspired 
   \expandafter\replacestringsA\tmpb?#1!#1% from pysyntax.tex by Petr Krajnik
   \long\def\replacestringsA##1?{\def\tmpb{##1}}\expandafter\replacestringsA\tmpb
}
\egroup


\def\any#1{\def\tmpb{#1}%
   \replacestrings {,}  {&,}%
   \replacestrings {.}  {&.}%
   \replacestrings {'-} {&{'-}}%
   \replacestrings {'}  {&'}%
   \replacestrings {(}  {&(}%
   \replacestrings {)}  {&)}%
   \replacestrings {?}  {&?}%
   \expandafter\anyA\tmpb&{}%
}
\def\anyA#1#2&#3{\anyX{#1#2}#3\ifx&#3&\else\expandafter\anyB\fi}
\def\anyB{\futurelet\next\anyC}
\def\anyC{\expandafter\ifx\space\next\space\fi\anyA}


% just for testing:
\def\anyX#1{any[#1]}

\any{AQ,D?Y.F'-PIOP}
% result:  any[AQ],any[D]?any[Y].any[F]'-any[PIOP]

\bye
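
To see what \anyA actually receives, it can help to look at \tmpb after a couple of replacements. The sketch below is a separate plain TeX file and assumes OPmac is installed, so that \input opmac supplies \replacestrings; alternatively, the \addto and \replacestrings definitions from the answer can be pasted in its place.

```latex
% plain TeX; assumption: OPmac is available and defines \replacestrings
\input opmac
\def\tmpb{AQ,D?Y.F}      % \replacestrings always works on \tmpb
\replacestrings{,}{&,}   % put & in front of every comma
\replacestrings{.}{&.}   % put & in front of every full stop
\message{[\meaning\tmpb]}% write the current contents of \tmpb to the log
\bye
```

The log should then show [macro:->AQ&,D?Y&.F]: each listed mark is preceded by &, and the ? is untouched because no replacement was requested for it. The & is what the delimited parameter of \anyA uses to find the end of each "word".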

edited Oct 04 '17 at 09:56

answered Oct 02 '17 at 16:42


Of all the answers from this OP and from the related OP "Regular expressions … how to also return the sequence splitting characters?", I found this answer to be the best compatible with other commands, such as this one adapted from the tooltip package. – O0123 Oct 04 '17 at 03:17
I notice that the flexible part here is \anyX#1{...#1...}. For my purpose, I can add horizontal space here, to be inserted around the punctuation marks. I was wondering whether it is easy to adapt your answer so as to fine-tune this spacing for certain punctuation marks only? That is, how to issue slightly different outputs around certain of the separators? – O0123 Oct 04 '17 at 03:19
Another problem seems to be the character ～. Somehow, \replacestrings {～} {&～}% doesn't seem to work. One can partially solve this by using \replacestrings {～} {&～}% and replacing every instance of ～ by \textasciitilde, but that only works if the next character is either a space or another separator. – O0123 Oct 04 '17 at 04:20
I can't estimate why the character ~ "doesn't seem to work". It is active character by default, it means no-breakable space. If you need to convert active character to non-active character then you can use \replacestrings {~} {&{\string~}}. – wipet Oct 04 '17 at 09:53
I added two lines of code in my answer which allow to use space after punctuation character. The existence of such space is tested by \futurelet primitive. – wipet Oct 04 '17 at 10:02
Regarding ～, I have asked a new question entitled "What's so special about ﹑and ～" which also uses your code. There we will find out what "doesn't seem to work" and hopefully why. – O0123 Oct 04 '17 at 12:37
I still very much enjoy your code, but have found there is some sort of incompatibility with including \kern into it (well, when combining it with a Tooltip command as already hinted at in this OP). I have therefore asked a new OP at "Why is there a conflict between \kern and this customized Tooltips/Separator command". – O0123 Oct 12 '17 at 11:17

score 1 · Answer 4 · answered Oct 02 '17 at 19:53

The listofitems package can make quick work of it.

\documentclass{article}
\usepackage{listofitems,amsmath}
\newcommand\any[1]{\fbox{#1}}
%%%
\let\svany\any
\renewcommand\any[1]{%
  \setsepchar{,||?||.||'||-||(||)|| }%
  \readlist\mylist{#1}%
  \foreachitem\i\in\mylist{%
    \expandafter\ifx\expandafter\relax\i\else\svany{\i}\fi%
    \ifnum\icnt<\mylistlen\relax\mylistsep[\icnt]\fi%
  }%
}
%%%
\begin{document}
\any{AQ,D?Y.F'-PIOP}

\any{a,b,c.}

\any{abc (d,ef,g.) hij (uuu vvv)}
\end{document}
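
The two listofitems features the code relies on can be seen in a smaller sketch (the list name \demo is chosen here only for illustration): inside \setsepchar, || separates alternative separators of the same level, and after \readlist the separator that followed item n is available as \demosep[n].

```latex
\documentclass{article}
\usepackage{listofitems}
\begin{document}
\setsepchar{,||.}%      split at a comma OR a full stop
\readlist\demo{a,b.c}%  defines \demo[n], \demolen and \demosep[n]
Items: \demo[1] \demo[2] \demo[3]

Separators: \demosep[1] \demosep[2]

Number of items: \demolen
\end{document}
```

Because each separator is remembered next to the item it follows, the loop in the answer can print \mylistsep[\icnt] after each item and so put the original punctuation back outside the command.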
