17

I have a list of strings called mylist:

mylist = {"[a]", "a", "a", "[b]", "b", "b", "[ c ]", "c", "c"};

I would like to split mylist by "section headers." Strings that begin with the character [ are section headers in my application. Thus, I would like to split mylist in such a way as to obtain this output:

{{"[a]", "a", "a"}, {"[b]", "b", "b"}, {"[ c ]", "c", "c"}}

(The as, bs, and cs represent any characters; the string inside the section header does not necessarily match the strings that follow in that section. Also, the number of strings in each section can vary.)

I have tried:

SplitBy[mylist, StringMatchQ[#, "[" ~~ ___] &]

But this is not correct; I obtain:

{{"[a]"}, {"a", "a"}, {"[b]"}, {"b", "b"}, {"[ c ]"}, {"c", "c"}}

Likewise, using Split (since it applies the test function only to adjacent elements) does not work. The command:

Split[mylist, StringMatchQ[#, "[" ~~ ___] &]

yields:

{{"[a]", "a"}, {"a"}, {"[b]", "b"}, {"b"}, {"[ c ]", "c"}, {"c"}}

Do you have any advice? Thanks.

Andrew

8 Answers

19

Here's my suggestion:

mylist = {"[a]", "a", "a", "[b]", "b", "b", "[ c ]", "c", "c"};

Split[mylist, ! StringMatchQ[#2, "[*"] &]

and we get:

{{"[a]", "a", "a"}, {"[b]", "b", "b"}, {"[ c ]", "c", "c"}}
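A minimal sketch of why testing #2 works, assuming Split applies its test to each pair of adjacent elements and starts a new sublist whenever the test gives False. The name headerQ is a hypothetical helper, not part of the answer, and StringStartsQ assumes a reasonably recent version of Mathematica.

```mathematica
mylist = {"[a]", "a", "a", "[b]", "b", "b", "[ c ]", "c", "c"};

(* headerQ is a hypothetical helper: True for strings that start with "[" *)
headerQ = StringStartsQ["["];

(* One test result per adjacent pair; False marks a boundary between sublists *)
pairTests = (! headerQ[#[[2]]] &) /@ Partition[mylist, 2, 1]
(* {True, True, False, True, True, False, True, True} *)

(* The same test in the form Split expects: #2 is the right-hand element of each pair *)
sections = Split[mylist, ! headerQ[#2] &]
(* {{"[a]", "a", "a"}, {"[b]", "b", "b"}, {"[ c ]", "c", "c"}} *)
```

The Split attempt in the question tests # (that is, #1), the left-hand element of each pair, so a run continues only while the left element is a header. That is why it breaks right after the first string that follows each header.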
J. M.'s missing motivation
Murta
9

At the risk of being annoying, I will pitch linked lists again. Here is the code using linked lists:

ClearAll[split];
split[{}] = {};
split[l_List] :=
  Reap[split[{}, Fold[{#2, #1} &, {}, Reverse@l]]][[2, 1]];

split[accum_, {h_, tail : {_?sectionQ, _} | {}}] :=
  split[Sow[Flatten[{accum, h}]]; {}, tail];

split[accum_, {h_, tail_}] := split[{accum, h}, tail];

The function sectionQ has been stolen from the answer of @rm-rf. The usage is

split[mylist]

(* {{[a],a,a},{[b],b,b},{[ c ],c,c}} *)
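For readers unfamiliar with the representation, here is a small sketch of the nested-pair linked list that the Fold expression inside split builds. The name toLinkedList is hypothetical and used only for illustration.

```mathematica
(* toLinkedList is a hypothetical name; its body is the Fold expression used inside split *)
toLinkedList[l_List] := Fold[{#2, #1} &, {}, Reverse[l]];

linked = toLinkedList[{"[a]", "a", "a"}]
(* {"[a]", {"a", {"a", {}}}} *)

(* Head and tail are obtained by destructuring the outermost pair *)
{head, tail} = linked;
head  (* "[a]" *)
tail  (* {"a", {"a", {}}} *)
```

The two-argument rules for split match on this {head, tail} shape, so each step looks only at the current head and at the first element of the remaining tail.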

The advantages I see in using linked lists are that they allow one to produce solutions which are

  • Easily generalizable to more complex problems
  • Straightforward to implement
  • Easy to reason about (in terms of algorithmic complexity etc.)

They may not be the fastest though, so may not always be suitable for performance-critical applications.

Leonid Shifrin
7

Here's one method, using a slightly modified example:

mylist = {"[a]", "a", "[b]", "b", "b", "b", "[ c ]", "c", "c"};

pos = Append[Flatten[Position[mylist,
             s_String /; StringMatchQ[s, "[" ~~ ___]]], Length[mylist] + 1]
   {1, 3, 7, 10}

Take[mylist, {#1, #2 - 1}] & @@@ Partition[pos, 2, 1]
   {{"[a]", "a"}, {"[b]", "b", "b", "b"}, {"[ c ]", "c", "c"}}
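The intermediate step can be made visible with a short sketch that reuses the positions computed above. The name ranges is introduced here only for illustration.

```mathematica
mylist = {"[a]", "a", "[b]", "b", "b", "b", "[ c ]", "c", "c"};
pos = {1, 3, 7, 10};  (* header positions plus Length[mylist] + 1, as computed above *)

(* Overlapping pairs: each pair is {start of a section, start of the next section} *)
ranges = Partition[pos, 2, 1]
(* {{1, 3}, {3, 7}, {7, 10}} *)

(* Each section runs from its start up to one element before the next start *)
firstSection = Take[mylist, {1, 3 - 1}]
(* {"[a]", "a"} *)
```

Appending Length[mylist] + 1 to the positions is what gives the last section an end marker.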
J. M.'s missing motivation
7

Here's one approach using FixedPoint and Replace:

sectionQ := ! StringFreeQ[#, "["] &;
FixedPoint[
    Replace[#, {h___, sec_?sectionQ, Longest[x___?(! sectionQ@# &)], t___} :> {h, t, {sec, x}}] &, 
    mylist]

(* {{"[a]", "a", "a"}, {"[b]", "b", "b"}, {"[ c ]", "c", "c"}} *)
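To see how the repeated replacement converges, the following sketch swaps FixedPoint for FixedPointList so that every intermediate list is kept. The name step is hypothetical; the rule itself is copied from the code above.

```mathematica
mylist = {"[a]", "a", "a", "[b]", "b", "b", "[ c ]", "c", "c"};
sectionQ := ! StringFreeQ[#, "["] &;

(* step is a hypothetical name for the single Replace pass used above *)
step = Replace[#, {h___, sec_?sectionQ, Longest[x___?(! sectionQ@# &)], t___} :>
     {h, t, {sec, x}}] &;

(* FixedPointList keeps every intermediate result, ending with the repeated final value *)
steps = FixedPointList[step, mylist];

steps[[2]]
(* {"[b]", "b", "b", "[ c ]", "c", "c", {"[a]", "a", "a"}} *)

Last[steps]
(* {{"[a]", "a", "a"}, {"[b]", "b", "b"}, {"[ c ]", "c", "c"}} *)
```

Each pass wraps the leftmost remaining section in a sublist and moves it to the end. Once only sublists remain, nothing matches sec_?sectionQ, the list stops changing, and the iteration ends.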
rm -rf
  • 1
    One can also define sectionQ = StringMatchQ[#, "[" ~~ ___] &, as in the question to allow for [ occurring somewhere (except the first position) in the strings following the section headers. – rm -rf Oct 12 '12 at 18:55
3

Here's an answer based on Murta's solution that recursively parses a list using several delimiters, each of which can be a pattern or a string pattern. This can be useful, for example, for parsing debug output in which loops are involved.

splitByPattern[l_List,p_?System`Dump`validStringExpressionQ]:=splitByPattern[l, _String?(StringMatchQ[#, p] &)];
splitByPattern[l_List,p_]:=Split[l,!MatchQ[#2,p]&];

splitByPatternFold[l_,{},True|False]:=l;
splitByPatternFold[l_,{p_},False]:=splitByPattern[l,p];
splitByPatternFold[l_,{p_},True]:=Join[{First@l},splitByPattern[Rest@l,p]];
splitByPatternFold[l_,{p_,rest__},False]:=splitByPatternFold[#,{rest},True]&/@splitByPattern[l,p];
splitByPatternFold[l_,{p_,rest__},True]:=Join[{First@l},splitByPatternFold[#,{rest},True]&/@splitByPattern[Rest@l,p]];
splitByPatternFold[l_List,patterns_List,hasHeader_:False]:=splitByPatternFold[l,patterns,hasHeader];

To access the split elements you can use this function

splitAccess[l_, indices_] :=
Module[{offsets = Table[1, {Length@indices}]},
   offsets[[1]] = 0;
   l[[Sequence @@ (indices + offsets)]]
]

Example

l={a, b, c, d, e, f, a, b, c, d, e, f};

x = splitByPatternFold[l,{a,b,c,d,e}]
> {{a,{b,{c,{d,{e,f}}}}},{a,{b,{c,{d,{e,f}}}}}}

splitAccess[x,{2,1}]
> {b, {c, {d, {e, f}}}}

The answer to the question would be written as

mylist={"[a]",a,"a","[b]",b,"b","[ c ]",c,"c"};
splitByPattern[mylist,"[*"]  

Note that not all elements need to be strings when a string pattern is given as the argument.

faysou
2

Here's my version based on Position.

mylist = {"[a]", "a", "a", "[b]", "b", "b", "[ c ]", "c", "c"};

split[lst_List, pat_String] := Module[{len, pos},
  len = Length[lst];
  pos = Partition[Flatten[{Position[lst, _String?(StringMatchQ[#, pat ~~ __] &)],len + 1}], 2, 1];
  lst[[#[[1]] ;; #[[2]] - 1]] & /@ pos]

Usage:

split[mylist, "["]

Output:

{{"[a]", "a", "a"}, {"[b]", "b", "b"}, {"[ c ]", "c", "c"}}

Lou
0
Split[mylist, StringFreeQ["["] @ #2 &]

{{"[a]", "a", "a"}, {"[b]", "b", "b"}, {"[ c ]", "c", "c"}}

SequenceCases[mylist, a:{_?(!StringFreeQ[ "["]@#&),__?(StringFreeQ[ "["])}:> {a}]

{{"[a]", "a", "a"}, {"[b]", "b", "b"}, {"[ c ]", "c", "c"}}

kglr
0
Clear["Global`*"];

mylist = {"[a]", "a", "a", "[b]", "b", "b", "[ c ]", "c", "c"};
patt = StartOfString ~~ Whitespace ... ~~ "[" ~~ Whitespace ... ~~ _ ~~
    Whitespace ... ~~ "]" ~~ Whitespace ... ~~ EndOfString;

Testing:

StringMatchQ[patt] /@ mylist

{True, False, False, True, False, False, True, False, False}

Finally:

Split[mylist, StringMatchQ[#1, patt] || StringFreeQ[#2, patt] &]

This solution is tolerant of any combination of (additionally) inserted whitespace around "[" and "]".


Result

{{"[a]", "a", "a"}, {"[b]", "b", "b"}, {"[ c ]", "c", "c"}}
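A quick sketch of the whitespace tolerance, using a few hypothetical test strings that are not taken from the question. The pattern is repeated so that the block stands on its own.

```mathematica
patt = StartOfString ~~ Whitespace ... ~~ "[" ~~ Whitespace ... ~~ _ ~~
    Whitespace ... ~~ "]" ~~ Whitespace ... ~~ EndOfString;

(* Hypothetical test strings, chosen only to probe the pattern *)
samples = {"[a]", "  [ a ]  ", "a[b]", "[ab]"};

checks = StringMatchQ[samples, patt]
(* {True, True, False, False} *)
```

Note that `_` stands for exactly one character, so as written the pattern appears to accept only single-character header names such as "[a]"; a header like "[ab]" is not matched.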

Syed