How do I split a string?

Question

I need to split a string into one or more substrings. I know that I could use the xstring package, but I'd like to do it using only built-in TeX/LaTeX commands. So, if I say

\def\MyTeXKnowledge{Not good enough}

what is the simplest way to extract the substrings "Not", "good" and "enough" from the macro \MyTeXKnowledge and store them in variables?

score 33 · Accepted Answer · edited Jun 08 '19 at 10:28

You need to define a macro which has the separation character in the parameter text:

\def\testthreewords#1{\threewords#1\relax}
\def\threewords#1 #2 #3\relax{ First: (#1), Second: (#2), Third: (#3) }
\testthreewords{Now good enough}
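
Since the snippet above is not a complete document, here is a minimal sketch that compiles as is. Instead of typesetting the words, it stores them in \firstword, \secondword and \thirdword; those macro names are made up for this example. It still assumes that the argument contains exactly two spaces.

\documentclass{article}
% #1 ends at the first space, #2 at the second space, #3 at \relax
\def\threewords#1 #2 #3\relax{%
  \def\firstword{#1}%
  \def\secondword{#2}%
  \def\thirdword{#3}%
}
\def\testthreewords#1{\threewords#1\relax}
\begin{document}
\testthreewords{Not good enough}
Third: \thirdword, second: \secondword, first: \firstword.
\end{document}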

If you want to be able to provide a macro as the argument, you need to expand it first. This can be done either once (only the first token of the argument is expanded, by one level):

\def\testthreewords#1{\expandafter\threewords#1\relax}

or completely:

\def\testthreewords#1{%
    \begingroup
    \edef\@tempa{#1}%
    \expandafter\endgroup
    \expandafter\threewords\@tempa\relax
}
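
The two variants can be compared in a small test document. This is a sketch; \testonce, \testfull and \Wrapped are names made up for the comparison. With \Wrapped, a single expansion step only yields \MyTeXKnowledge, so only the complete expansion reaches the actual words. Keep in mind that \edef expands everything, so arguments containing fragile commands may cause errors.

\documentclass{article}
\def\threewords#1 #2 #3\relax{ First: (#1), Second: (#2), Third: (#3) }
% hypothetical name: expands the argument once
\def\testonce#1{\expandafter\threewords#1\relax}
% hypothetical name: expands the argument completely
\makeatletter
\def\testfull#1{%
    \begingroup
    \edef\@tempa{#1}%
    \expandafter\endgroup
    \expandafter\threewords\@tempa\relax
}
\makeatother
\def\MyTeXKnowledge{Not good enough}
% a macro whose expansion is just another macro
\def\Wrapped{\MyTeXKnowledge}
\begin{document}
\testonce{\MyTeXKnowledge}\par
\testfull{\Wrapped}\par
% \testonce{\Wrapped} would only turn \Wrapped into \MyTeXKnowledge,
% which contains no spaces, so \threewords would not find its delimiters
\end{document}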

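The \relax here is used as an end marker and must not occur in the argument; if it might, use a different end marker such as \@nnil. The grouping keeps the temporary definition of \@tempa local. Because \@tempa contains @, these definitions must go into a package or class file, or between \makeatletter and \makeatother.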
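
A sketch of the fully expanding variant with \@nnil (defined by the LaTeX kernel) as the end marker might look like this. The test argument deliberately hides a \relax inside the last word: with the \relax marker, #3 would end early at "en" and the rest of the word would be typeset after the closing text.

\documentclass{article}
\makeatletter
% \@nnil is only used as a delimiter here, it is never executed
\def\threewords#1 #2 #3\@nnil{ First: (#1), Second: (#2), Third: (#3) }
\def\testthreewords#1{%
    \begingroup
    \edef\@tempa{#1}%
    \expandafter\endgroup
    \expandafter\threewords\@tempa\@nnil
}
\makeatother
\begin{document}
% the \relax inside the third word no longer ends #3
\testthreewords{Not good en\relax ough}
\end{document}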

However, this setup fails with an error if the argument does not contain two spaces. To be on the safe side, read every substring on its own and append the separator to the end as a fail-safe. Then test whether the end was reached:

\def\testwords#1{%
    \begingroup
    \edef\@tempa{#1\space}%
    \expandafter\endgroup
    \expandafter\readwords\@tempa\relax
}
\def\readwords#1 #2\relax{%
      \doword{#1}%  #1 = substr, #2 = rest of string
      \begingroup
      \ifx\relax#2\relax  % is #2 empty?
         \def\next{\endgroup\endtestwords}% your own end-macro if required
      \else
         \def\next{\endgroup\readwords#2\relax}%
      \fi
      \next
}
\def\doword#1{(#1)}
\def\endtestwords{}


\testwords{Now good enough}% Gives `(Now)(good)(enough)`
\testwords{Now good}% Gives `(Now)(good)`
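
To actually store the substrings in variables, as the question asks, \doword can be redefined. The following compilable sketch keeps \testwords and \readwords from above and assumes a hypothetical counter \wordcount plus macros \wordi, \wordii, ... whose names are built with \romannumeral, so the number of words is not limited. Reset \wordcount before each call.

\documentclass{article}
\makeatletter
\def\testwords#1{%
    \begingroup
    \edef\@tempa{#1\space}%
    \expandafter\endgroup
    \expandafter\readwords\@tempa\relax
}
\makeatother
\def\readwords#1 #2\relax{%
    \doword{#1}%  #1 = substr, #2 = rest of string
    \begingroup
    \ifx\relax#2\relax  % is #2 empty?
       \def\next{\endgroup\endtestwords}%
    \else
       \def\next{\endgroup\readwords#2\relax}%
    \fi
    \next
}
\def\endtestwords{}
% hypothetical storage: word n is saved in \word<n in roman numerals>
\newcount\wordcount
\def\doword#1{%
    \advance\wordcount by 1
    \expandafter\def\csname word\romannumeral\wordcount\endcsname{#1}%
}
\begin{document}
\def\MyTeXKnowledge{Not good enough}
\wordcount=0
\testwords{\MyTeXKnowledge}
There are \the\wordcount\ words: \wordi; \wordii; \wordiii.
\end{document}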

thanks. With your help I have achieved the result that I needed. I think a modification is needed before I accept your answer, though. If I define \MyTeXKnowledge as above and then say \testthreewords{\MyTeXKnowledge} I get an error (presumably because \MyTeXKnowledge counts as only one argument). — Ian Thompson, Mar 06 '11 at 22:49
@Ian: I wasn't sure about the exact interface you want to use. You need to expand the macro first. I will update my answer. — Martin Scharrer, Mar 06 '11 at 22:53
Any chance of a MWE? This is such a good example - but rather complicated for a newbie like myself. An MWE will allow me to play around with it and help me understand it! — 3kstc, Jul 24 '19 at 04:40

score 9 · Answer 2 · Alain Matthes · 2011-03-07T07:47:12.780

Another way: the words are stored in the macros \worda, \wordb, etc.

\documentclass[a4paper]{article}  

\newcount\nbofwords
\makeatletter  
\def\myutil@empty{}
\def\multiwords#1 #2\@nil{% 
 \def\NextArg{#2}%
 \advance\nbofwords by  1 %   
 \expandafter\edef\csname word\@alph\nbofwords\endcsname{#1}% 
 \ifx\myutil@empty\NextArg
     \let\next\@gobble
 \fi
 \next#2\@nil
}%    

\def\GetWords#1{%
   \let\next\multiwords 
   \nbofwords=0 %
   \expandafter\next#1 \@nil %
}% 
\makeatother

\begin{document}
 \def\MyTeXKnowledge{Not good  enough the end}
\GetWords{\MyTeXKnowledge}

There are \the\nbofwords\ words: \worda; \wordb; \wordc; \wordd; \worde.

\end{document}

Now a macro such as \MyTeXKnowledge is accepted as the argument.
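
The stored words can also be retrieved in a loop, using \nbofwords as the upper bound. The sketch below repeats the preamble code from above and adds a hypothetical helper \ShowWords built on \loop ... \repeat. Note that \@alph only covers the values 1 to 26, so with more than 26 words LaTeX stops with a "Counter too large" error.

\documentclass{article}
\newcount\nbofwords
\newcount\wordindex
\makeatletter
\def\myutil@empty{}
\def\multiwords#1 #2\@nil{%
 \def\NextArg{#2}%
 \advance\nbofwords by 1 %
 \expandafter\edef\csname word\@alph\nbofwords\endcsname{#1}%
 \ifx\myutil@empty\NextArg
     \let\next\@gobble
 \fi
 \next#2\@nil
}
\def\GetWords#1{%
   \let\next\multiwords
   \nbofwords=0 %
   \expandafter\next#1 \@nil
}
% hypothetical helper: typeset every stored word in parentheses
\def\ShowWords{%
  \wordindex=0 %
  \loop\ifnum\wordindex<\nbofwords
    \advance\wordindex by 1 %
    (\csname word\@alph\wordindex\endcsname)%
  \repeat
}
\makeatother
\begin{document}
\def\MyTeXKnowledge{Not good enough the end}
\GetWords{\MyTeXKnowledge}
There are \the\nbofwords\ words: \ShowWords.
\end{document}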

answered Mar 06 '11 at 23:40 · edited Mar 07 '11 at 07:47 · Alain Matthes

@Ian: Now \MyTeXKnowledge is accepted, and the substrings from the macro \MyTeXKnowledge are stored in variables. – Alain Matthes Mar 07 '11 at 07:48
@AlainMatthes: I really like this solution since we can use it for an arbitrary number of words, right? I'm just learning TeX, but I will try to understand this code better since it appears to solve another question I posted. – Sergio Parreiras May 12 '14 at 15:58

score 2 · Answer 3 · answered Oct 27 '23 at 09:06

As of 2023 there are other options, e.g. the expl3 programming environment:

\documentclass{book}
\ExplSyntaxOn
\NewDocumentCommand{\getNth}{mmm}
  {
    % #1 string, #2 separator, #3 index
    \seq_set_split:Nnx \l_tmpa_seq { #2 } { #1 }
    \seq_item:Nn \l_tmpa_seq { #3 }
  }
\ExplSyntaxOff
\begin{document}
\def\mywords{first second third last}
% split by spaces and get the first item
\getNth{\mywords}{ }{1}
% split by spaces and get the last item
\getNth{\mywords}{ }{-1}
\end{document}

outputs

first
last
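
Since the question asks for the substrings to be stored in variables, the same split can also assign every item to its own macro. This is a sketch; \storeWords and the macro names \wordi, \wordii, ... (generated with \int_to_roman:n) are assumptions made for this example:

\documentclass{article}
\ExplSyntaxOn
% hypothetical command \storeWords{<text>}{<separator>}:
% item n is stored in \word<n in lowercase roman numerals>
\NewDocumentCommand{\storeWords}{mm}
  {
    \seq_set_split:Nnx \l_tmpa_seq { #2 } { #1 }
    \seq_map_indexed_inline:Nn \l_tmpa_seq
      { \tl_set:cn { word \int_to_roman:n { ##1 } } { ##2 } }
  }
\ExplSyntaxOff
\begin{document}
\def\MyTeXKnowledge{Not good enough}
\storeWords{\MyTeXKnowledge}{ }
First: \wordi; second: \wordii; third: \wordiii.
\end{document}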

Or if you want to apply a function to every item:

\documentclass{book}
\ExplSyntaxOn
\NewDocumentCommand{\mapToFunction}{m}
  {
    % split by space "~"
    \seq_set_split:Nnx \l_tmpa_seq { ~ } { #1 }
    \seq_map_indexed_function:NN \l_tmpa_seq \__xyz_myfunction:nn
  }
\cs_new:Nn \__xyz_myfunction:nn
  {
    % #1 is the 1-based index and #2 is the current item
    % if necessary check the index with \int_compare, \int_case, or \bool_case
    % do something
    \par #1~#2 
  }
\ExplSyntaxOff
\begin{document}
\def\mywords{first second third last}
\mapToFunction{\mywords}
\end{document}

outputs

1 first
2 second
3 third
4 last
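
The function does not have to be defined separately: \seq_map_indexed_inline:Nn takes the body inline, where ##1 is the index and ##2 the item inside a document command. The sketch below (the command name \emphFirst is made up) uses \int_compare:nNnTF, as the code comment above suggests, to emphasize only the first item:

\documentclass{article}
\ExplSyntaxOn
% hypothetical command: like \mapToFunction, but with an inline body
\NewDocumentCommand{\emphFirst}{m}
  {
    \seq_set_split:Nnx \l_tmpa_seq { ~ } { #1 }
    \seq_map_indexed_inline:Nn \l_tmpa_seq
      {
        % ##1 = 1-based index, ##2 = current item
        \int_compare:nNnTF { ##1 } = { 1 }
          { \emph { ##2 } }
          { ~ ##2 }
      }
  }
\ExplSyntaxOff
\begin{document}
\def\mywords{first second third last}
\emphFirst{\mywords}
\end{document}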

I'd not use x expansion, but possibly o; however, I'd prefer a different approach with a *-version that takes as argument a control sequence. There's large room for improvements. — egreg, Oct 27 '23 at 12:06

score 2 · Answer 4 · answered Oct 27 '23 at 12:58

Inspired by wolfrevo's attempt:

\documentclass{article}
\ExplSyntaxOn
% the prefix is `clint' because of the OP's avatar
\NewDocumentCommand{\definechunkcontainer}{s m O{~} m}
 {% #1 = boolean
  % #2 = symbolic name
  % #3 = separator (default a space)
  % #4 = text or control sequence
  \IfBooleanTF { #1 }
   {
    \clint_chunk_define:onnn { #4 } { #2 } { #3 }
   }
   {
    \clint_chunk_define:nnnn { #4 } { #2 } { #3 }
   }
 }
\NewExpandableDocumentCommand{\getchunk}{o m}
 {% #1 = chunk number; if omitted we get the number of chunks
  % #2 = symbolic name
  \IfNoValueTF { #1 }
   {
    \seq_count:c { l__clint_chunk_#2_seq }
   }
   {
    \seq_item:cn { l__clint_chunk_#2_seq } { #1 }
   }
 }
\NewDocumentCommand{\processchunks}{m o +m}
 {% #1 = symbolic name
  % #2 = optional tokens to be inserted between chunks
  % #3 = template where #1 stands for the chunk number and #2 for the chunk
  \IfNoValueTF { #2 }
   {% easier processing
    \clint_chunk_process:nn { #1 } { #3 }
   }
   {% more complex processing
    \clint_chunk_process:nnn { #1 } { #2 } { #3 }
   }
 }
\seq_new:N \l__clint_chunk_temp_seq
\cs_generate_variant:Nn \seq_set_split:Nnn { c }
\cs_generate_variant:Nn \seq_map_indexed_function:NN { c }
\cs_generate_variant:Nn \seq_map_indexed_inline:Nn { c }
\cs_new_protected:Nn \clint_chunk_define:nnnn
 {% #1 = text to be split
  % #2 = symbolic name
  % #3 = separator
  \seq_clear_new:c { l__clint_chunk_#2_seq }
  \seq_set_split:cnn { l__clint_chunk_#2_seq } { #3 } { #1 }
 }
\cs_generate_variant:Nn \clint_chunk_define:nnnn { o }
\cs_new_protected:Nn \clint_chunk_process:nn
 {
  \cs_set:Nn \__clint_chunk_process_do:nn { #2 }
  \seq_map_indexed_function:cN { l__clint_chunk_#1_seq } \__clint_chunk_process_do:nn
 }
\cs_new_protected:Nn \clint_chunk_process:nnn
 {
  \seq_clear:N \l__clint_chunk_temp_seq
  \cs_set:Nn \__clint_chunk_process_do:nn { #3 }
  \seq_map_indexed_inline:cn { l__clint_chunk_#1_seq }
   {
    \seq_put_right:Nn \l__clint_chunk_temp_seq { \__clint_chunk_process_do:nn { ##1 } { ##2 } }
   }
  \seq_use:Nn \l__clint_chunk_temp_seq { #2 }
 }
\ExplSyntaxOff
\begin{document}
% a couple of containers
\definechunkcontainer{myTeXknowledge}{not good enough}
\newcommand{\gbu}{The Good -- The Bad -- The Ugly}
\definechunkcontainer*{movie}[--]{\gbu}
% now let's test
\getchunk{myTeXknowledge} (expected: 3)
\getchunk[2]{myTeXknowledge} (expected: good)
\getchunk[3]{movie} (expected: The Ugly)
\processchunks{myTeXknowledge}{#1: #2\par}
\processchunks{movie}[/]{#2}
\begin{itemize}
\processchunks{movie}{\item[#1)] #2}
\end{itemize}
\begin{enumerate}
\processchunks{movie}{\item #2}
\end{enumerate}
\end{document}
