Macro parameter delimited by more than one delimiter

Question

A TeX user can easily define a macro with a parameter delimited by an arbitrary sequence of tokens. For example:

\def\macro #1delimiter{...#1...}

Is it possible to create a macro whose parameter can be ended by any one of several delimiters? I mean something like this:

\mydef\macro #1[first or second]{...#1...}
\macro text first  % -> ...text ... (delimiter is first)
\macro text second % -> ...text ... (delimiter is second)

The parameter text should be scanned up to the first delimiter from the list of delimiters, and no further. I know that one can put a fixed delimiter later in the input (\par, for example), read the longer parameter, and then search it for the words "first" or "second". But this approach reads more input than needed: text beyond the delimiter is tokenized prematurely, it may be unbalanced, etc.

You can certainly pick up such delimiters, but not using a single macro. Are we to assume that the following text must be first if it starts f and second if it starts s, or is the situation more complex than that? — Joseph Wright, Sep 10 '14 at 08:46
@JosephWright My question concerns the most general situation, if that is possible. No spaces are assumed around the delimiter; the letters f and s are only an example. And balanced text is assumed, of course. This is the same as when the parameter is delimited by a single delimiter: it has to be balanced too. — nov222, Sep 10 '14 at 09:28
@nov222 Welcome to TeX.SX! You can have a look at our starter guide to familiarize yourself further with our format. Please, could you tell us what you need it for? — yo', Sep 10 '14 at 09:39
@tohecz Welcome too. I need to know whether there exists a generalized case of the delimited parameters provided by \def. For example \macro text-to-the-first-comma-or-space-or-dot. Or \another text-to-the-first "and" or "And" or "AND". Etc. — nov222, Sep 10 '14 at 10:00
The short answer is no, but the longer answer is that it is possible to parse the input stream token by token and build up the potential delimiters letter by letter checking at each stage if the letters so far are an initial substring of each possible delimiter, if you really want to do that — David Carlisle, Sep 10 '14 at 10:07
@DavidCarlisle OK. The question is, how complicated it is. It seems to me usable. Maybe such macro was already done by somebody... — nov222, Sep 10 '14 at 12:34
@nov222 A recent question of mine has some sort of “parsing token by token”, however the delimiter is a single token only, not a string. So it would be possible to add to that solution a check for a string… — Manuel, Sep 10 '14 at 12:44
It would be very complicated even just for the special case of testing for first and second. To implement a system taking an arbitrary list of possible delimiters would be vastly more complicated: certainly hundreds, probably thousands of lines of code I'd guess. — David Carlisle, Sep 10 '14 at 12:45

score 10 · Accepted Answer · edited Apr 13 '17 at 12:35

You can use my macro \seplist for exactly the purpose described in your question. The macro \seplist has the following syntax:

\def\yourmacro#1{... macro with normal #1 without separator}
\seplist{list of separators}\yourmacro parameter text

For example:

\def\macro#1{parameter is: "#1"}
\def\separed{\seplist{{sepA}{SepB}{SEPC}}}

\separed\macro text separated by sepA % sepA is separator
\separed\macro text separated by SepB % SepB is separator
\separed\macro text separated by SEPC % SEPC is separator

The separator actually used is stored globally in the \sepused macro, so the macro programmer can test it.
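A minimal sketch of such a test, assuming \seplist is loaded and \separed is defined as above. The helper name \isSepA is hypothetical and only holds the text to compare against; since \sepused is an ordinary macro without parameters, \ifx can compare the two directly:

```tex
% \isSepA is a hypothetical helper: a macro whose body is the separator text
\def\isSepA{sepA}
\def\macro#1{%
  \ifx\sepused\isSepA
    \message{parameter "#1" was ended by sepA}%
  \else
    \message{parameter "#1" was ended by \sepused}%
  \fi}
\separed\macro text separated by SepB
```

The comparison works because \sepused is already set when \macro is executed.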

The input stream is read up to the first occurrence of any of the listed separators, and no further. Each separator in the list is enclosed in braces. A separator consisting of a single token does not need braces. Example:

\seplist{0123456789}\macro text to the first decimal digit 7

The separators can consist of any tokens except { } and # (more precisely, it is the category codes of these characters that matter). This is the same restriction as for a single separator used with the \def primitive. The parameter text can contain these characters, but it must always be balanced. Consequently, a separator hidden inside braces is ignored, just as with ordinary delimited parameters. Example:

\seplist{0123456789}\macro this text {1234} is separated by five: 5
\seplist{\undefined \defined}\macro this text ends by \undefined or \defined
\seplist{{\undefined\defined}\par}\macro this text ends by \undefined\defined or by \par

If several separators are completed by the same token (for example, when one separator is the tail of another), the longer separator wins. Example:

\def\m#1{\message{param: "#1", separator: "\sepused"}}
\seplist{{BC}{ABC}}\m ahaABC % -> param: "aha", separator: "ABC"
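
This rule concerns separators that end at the same token. As a contrasting sketch, traced by hand from the implementation given later and not taken from the original examples: when one separator is a prefix of another, the shorter one is completed first, so the scan stops there and the remaining tokens stay in the input.

```tex
\def\m#1{\message{param: "#1", separator: "\sepused"}}
% AB is completed at the token B, before ABC can be completed
\seplist{{AB}{ABC}}\m ahaABC % -> param: "aha", separator: "AB"; the C stays in the input
```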

If your \macro is defined as \long, then \par can be scanned into the parameter. Warning: even if your \macro isn't defined as \long, the parameter scanning doesn't stop at \par! If it should stop there, you have to add \par to the separator list. Example:

\def\parinmacro{\par}
\def\thismacro{\ifx\sepused\parinmacro \message{something wrong}\fi ...}
\seplist{{ab}{cd}\par}\thismacro This text skips the hidden {ab} and {cd}
                                 and it stops at the end of the paragraph.

You can define a \sepdef macro with a syntax similar to the one you asked for:

\def\sepdef #1#2[#3]{\def#1{\seplist{#3}{\csname.\string#1\endcsname}}%
  \long\expandafter\def\csname.\string#1\endcsname ##1}

\sepdef\test #1[{first}{second}]{Parameter is: "#1", separator is "\sepused".}

\test text separated by first
\test text separated by second
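
To see what \sepdef does, the \test definition above can be written out by hand. The following sketch shows the two definitions that result; BODY is a placeholder for the replacement text given to \sepdef:

```tex
% Equivalent of \sepdef\test #1[{first}{second}]{BODY}:
% 1. the user-level macro starts the separator scan and names a worker macro
\def\test{\seplist{{first}{second}}{\csname.\string\test\endcsname}}
% 2. the worker macro, whose name is ".\test", receives the scanned text as #1
\long\expandafter\def\csname.\string\test\endcsname #1{BODY}
```

The worker is declared \long, so a \par inside the scanned text does not cause an error.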

The implementation of the \seplist macro follows. It uses only primitives and basic macros, so it should be usable in any TeX format.

\long\def\addto#1#2{\expandafter\def\expandafter#1\expandafter{#1#2}}
\newtoks\seplistT

\long\def\seplistD#1{%
   \seplistS##2\seplistE{\def\tmpa{##1}\def\tmpb{##2}\seplistE}%
   \def\tmpb{\tmpa #1}\expandafter\tmpb \tmp\seplistD\seplistE
}
\long\def\seplistE#1{%
   \ifx\tmpa\empty
      \seplistS\seplistD{\def\tmpb{##1}}\expandafter\tmpa\tmpb
      \ifx\tmpb\empty \seplistQ{#1}%
      \else \expandafter\addto\expandafter\seplistLx
                           \expandafter {\expandafter\seplistD\expandafter{\tmpb}{#1}}%
   \fi\fi
}
\def\seplistS{\long\expandafter\def\expandafter\tmpa\expandafter##\expandafter1\tmp}
\long\def\seplistQ#1#2\seplistA{\fi\fi\gdef\sepused{#1}\seplistZ}

\long\def\seplist#1#2{\begingroup
  \toks0={#2}\let\bgroup=\relax \let\egroup=\relax
  \def\seplistL{}\def\seplistLx{}\seplistI#1{}\gdef\sepused{}%
  \ifx\seplistL\empty \expandafter\endgroup \the\toks0\else
  \seplistT={}\expandafter\seplistA\fi
}
\def\seplistA{\futurelet\tmp\seplistB}
\def\seplistB{\let\next=\seplistP
   \expandafter\ifx\space\tmp \let\next=\seplistC \let\nexxt=\seplistM \fi
   \ifx##\tmp \let\next=\seplistC \let\nexxt=\seplistH \fi
   \ifx{\tmp  \let\next=\seplistG \fi
   \ifx}\tmp  \let\next=\seplistC \let\nexxt=\seplistF \fi
   \next
}
\def\seplistC{\afterassignment\nexxt \let\next= }
\long\def\seplistP#1{\seplistX#1\def\tmp{#1}\seplistN}
\def\seplistM{\seplistX{ }\def\tmp{ }\seplistN}
\def\seplistH{\seplistX{##}\def\seplistLx{}\seplistA}
\def\seplistN{\edef\seplistLx{\expandafter}\seplistLx \seplistL \seplistA}
\long\def\seplistG#1{\def\seplistLx{}\seplistX{{#1}}\seplistA}
\def\seplistF{\seplistT\expandafter{\expandafter{\the\seplistT}}\seplistZ}
\long\def\seplistX#1{\seplistT\expandafter{\the\seplistT#1}}
\def\seplistZ{\let\tmp=\sepused 
   \expandafter\seplistS\expandafter{\the\toks0{##1}}%
   \expandafter\expandafter\expandafter\endgroup\expandafter\tmpa\the\seplistT
}
\long\def\seplistI#1{\ifx\seplistI#1\seplistI\else
   \addto\seplistL{\seplistD{#1}{#1}}\expandafter\seplistI \fi
}

Comments on the implementation

We read the parameter token by token, similarly to the solution in this thread, and store the tokens in the \seplistT token list. The internal macro \seplistL holds the list of separators in the form:

\seplistD{sepA}{sepA}\seplistD{sepB}{sepB}\seplistD{sepC}{sepC}...

We store the token just read in \tmp and run \seplistL. More precisely: at the start, the temporary list \seplistLx is empty. For each token read, we expand \seplistLx and \seplistL into the input stream, and before executing them we reset \seplistLx to empty. Then the input stream is executed, i.e. the \seplistD macro is processed for each separator. The task of \seplistD{sepA}{sepA} is to test whether \tmp is equal to the first token of its first parameter (s in this example). If it is, then \seplistD (using \seplistE) adds the text \seplistD{epA}{sepA} (with the first token of the first parameter removed) to the temporary list \seplistLx, which will be executed for the next token. If \tmp isn't equal to the first token of the first parameter, then \seplistD does nothing.

For example, suppose the next token read into \tmp is e. Then \seplistD{epA}{sepA} saves \seplistD{pA}{sepA} to \seplistLx, because its first letter is e. If the next token in \tmp is p, then \seplistD{A}{sepA} is stored in \seplistLx. Finally, if the next token in \tmp is A, then \seplistD{A}{sepA} does not store \seplistD{}{sepA}; instead it concludes that the separator has been found, because nothing remains of the first parameter. It defines \sepused as its second parameter (sepA) and ends the game via \seplistQ and \seplistZ. If the last token in \tmp isn't A, then \seplistD{A}{sepA} does nothing and the chain is broken, because \seplistLx is reset to empty at each step. A new chain can be started at any time, because \seplistD{sepA}{sepA} always remains in \seplistL, which doesn't change during the scan.
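
The steps above can be summarized as a trace. This is a sketch derived by hand from the implementation, for the illustrative call \seplist{{sepA}}\macro xsepA, where \seplistL holds only \seplistD{sepA}{sepA}:

```tex
% token in \tmp | \seplistLx after the step
% --------------+------------------------------------------------
%      x        | (empty)                 x is not the first token of sepA
%      s        | \seplistD{epA}{sepA}    a chain starts from \seplistL
%      e        | \seplistD{pA}{sepA}
%      p        | \seplistD{A}{sepA}
%      A        | nothing remains: \gdef\sepused{sepA}, then \seplistZ
%
% At this point \seplistT holds xsepA, and \seplistZ splits it at sepA,
% so the call ends as \macro{x}.
```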

Macro programming in TeX is beautiful, but it differs from the classical techniques of "normal" programming. Here we exploit the fact that code can create code, i.e. data and code aren't strictly separated.

@DavidCarlisle : "To implement a system taking an arbitrary list of possible delimiters would be vastly more complicated: certainly hundreds, probably thousands of lines of code I'd guess." ... You can compare your guess with the reality: 50 lines of code. — wipet, Sep 10 '14 at 15:46
+1 yes well I suppose hundreds would have been my guess "thousands" was a bit of hyperbole to stress to the user that it wouldn't be just a simple change like using xdef instead of \def. 50 isn't hundreds either but then you've done this kind of macro programming before:-) — David Carlisle, Sep 16 '14 at 08:29
@DavidCarlisle "you've done this kind of macro programming before" ... yes, I've done the reading token-per-token before. But the separators calculation was my reaction to this thread. It was an example of a subtle modification of my previous code. — wipet, Sep 16 '14 at 15:20
@wipet thank you for this fantastic tool, have you released it somewhere? (I'm mostly asking because I'd like to acknowledge it properly and use correct compatible license in my own project). Perhaps more importantly for my own application: is there any way to add \end{document} as one of delimiters? — Łukasz Grabowski, Sep 15 '23 at 19:13
