11

Suppose I have a list of arbitrary length:

{1, 2, 3, "Open", 3, 2, "Close", 9, 3, 4, "Open", 1, 0, "Close", 3, 5}

and I am trying to extract the sequences delimited by the "Open"/"Close" tags, i.e. the answer I want is:

{{3, 2}, {1, 0}}

What's the right way to do this? The data I am actually working with is a large XML document and I am trying to extract sections of it by identifying certain tags as boundaries. I've fiddled with different pattern matching functions but can't figure out how to operate on a list in this way since I am trying to match patterns on an intermediate level between individual elements and the entire list.

EDIT

To clarify, I don't know in advance how many such sequences the data will contain or how many elements will be present between any particular set of tags.

Syed
mfvonh

10 Answers

8

Here's a plain pattern approach, though I'm not sure how robust it is:

ReplaceList[expr, {___, "Open", x : Except["Close"] ..., "Close", ___} :> {x}]

Also take a look at Longest and Shortest, which may come in handy.
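As a rough illustration of the Longest/Shortest hint, the sketch below uses Replace, which returns a single match, rather than ReplaceList. The variable names shortestRun and longestRun are made up for this example, and it assumes the sample list from the question.

```wolfram
lst = {1, 2, 3, "Open", 3, 2, "Close", 9, 3, 4, "Open", 1, 0, "Close", 3, 5};

(* Replace yields one match only; Shortest asks for the smallest run between the tags *)
shortestRun = Replace[lst, {___, "Open", Shortest[x__], "Close", ___} :> {x}];

(* Longest lets x swallow intermediate tags, running up to the last "Close" *)
longestRun = Replace[lst, {___, "Open", Longest[x__], "Close", ___} :> {x}];
```

Because Replace stops at one match, this only shows how the two wrappers steer the matcher; collecting every delimited run still needs ReplaceList or a similar function.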

amr
7

Try this one:

ReplaceList[expr, {__, PatternSequence["Open", v__ /; Count[{v}, _String] == 0, 
            _String], __} -> {v}]
Spawn1701D
  • If you don't Flatten the data from the XML, Cases will be a good alternative. – Spawn1701D Apr 30 '13 at 20:51
  • Good call on ReplaceList – mfvonh Apr 30 '13 at 20:53
  • Maybe change it to PatternSequence["Open", v__ /; Count[{v}, _?(# == "Close" &)] == 0, "Close"] in case the list contains strings other than "Open"|"Close". – b.gates.you.know.what Apr 30 '13 at 21:02
  • @b.gatessucks Yes, the correct structure of the pattern will depend on the actual data; if the data also contain strings, the OP should exclude just the XML tags. In my opinion, the XMLObject and XMLElement objects should be exploited instead of a flattened list. – Spawn1701D Apr 30 '13 at 21:09
  • I would use the pattern {__, PatternSequence["Open", v : Except["Close"] .., "Close"], __} from the answer below. That should be more efficient than using a test function. – Sjoerd Smit Jul 29 '17 at 19:26
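A minimal sketch of the suggestion in the comments to work on the XMLElement structure directly rather than on a flattened list. The sample document and the tag names "section" and "b" are hypothetical, chosen only for illustration; the real tags depend on the OP's data.

```wolfram
(* Hypothetical sample document; "section" and "b" are assumed tag names *)
xml = ImportString[
   "<root><a>1</a><section><b>3</b><b>2</b></section><a>9</a><section><b>1</b><b>0</b></section></root>",
   "XML"];

(* Collect the children of every <section> element at any depth *)
sections = Cases[xml, XMLElement["section", _, children_] :> children, Infinity];

(* Pull the text of the <b> children out of each section and convert it to numbers *)
values = Cases[#, XMLElement["b", _, {text_String}] :> ToExpression[text]] & /@ sections
```

Here the element boundaries play the role of the "Open"/"Close" markers, so no sequence pattern is needed.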
5

In versions 10.1+, you can use SequenceCases:

 lst={1, 2, 3, "Open", 3, 2, "Close", 9, 3, 4, "Open", 1, 0, "Close", 3, 5};
 SequenceCases[lst, {"Open", x:Except["Close"].., "Close"} :> {x}]

{{3, 2}, {1, 0}}

kglr
  • How could one use the Shortest construct here for x? – Syed Sep 10 '23 at 16:13
  • SequenceCases[lst, {"Open", Shortest[x__Integer], "Close"} :> {x}] works, but, for some reason, SequenceCases[lst, {"Open", Shortest[x__], "Close"} :> {x}] does not. – kglr Sep 10 '23 at 17:34
3
Split[lst, (#1 =!= "Close" && #2 =!= "Open") &] // 
 Cases[{"Open", x__, "Close"} :> {x}]

{{3, 2}, {1, 0}}

Syed
2

My humble attempt:

list = {1, 2, 3, "Open", 3, 2, "Close", 9, 3, 4, "Open", 1, 0, 
   "Close", 3, 5};
SplitBy[
 Select[list, 
  (open = # != "Close" && (# == "Open" || open)) &], # == "Open" &] 
    //. "Open" | {} -> Sequence[]
swish
2

I feel this is way too complicated, but anyway:

l = {1, 2, 3, "Open", 3, 4, 5, 2, "Close", 9, 3, 4, "Open", 0, 
   "Close", "Close", 3, 5};

Reap[l //. {a___, 
     PatternSequence["Open", mid : _?NumericQ .., "Close"], 
     b___} :> {a, Sow[{mid}]; mid, b}][[2, 1]]

{{3, 4, 5, 2}, {0}}

This example might have some limited instructional value showing one convoluted possible use of Sow and Reap.
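For comparison, a more procedural use of Sow and Reap might look like the sketch below. It scans the list once and sows each completed run. The function name collectBetween is invented for this example, and it assumes tags are not nested and that a stray "Close" outside a run should be ignored.

```wolfram
(* Hypothetical helper: a single pass that sows each run found between "Open" and "Close" *)
collectBetween[list_List] := Module[{inside = False, current = {}},
  Flatten[
   Last[
    Reap[
     Do[
      Which[
       item === "Open", inside = True; current = {},       (* start a new run *)
       item === "Close" && inside, Sow[current]; inside = False, (* finish the run *)
       inside, AppendTo[current, item]                     (* collect an element *)
       ],
      {item, list}]
     ]
    ],
   1]  (* Reap wraps the sown runs in one extra list; Flatten removes it *)
  ]

collectBetween[{1, 2, 3, "Open", 3, 4, 5, 2, "Close", 9, 3, 4, "Open", 0, "Close", "Close", 3, 5}]
```

When nothing is sown, Reap returns an empty list in its second part, so the function returns {}.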

Yves Klett
1
positions = (data // {PositionIndex[#]["Open"], PositionIndex[#]["Close"]} & // Transpose)

{{4, 7}, {11, 14}}

data[[Span[# + {1, -1}]]] & /@ positions

{{3, 2}, {1, 0}}

user1066
1

Splitting twice:

list = {1, 2, 3, "Open", 3, 2, "Close", 9, 3, 4, "Open", 1, 0, "Close", 3, 5};

Join @@ Cases[{{_?NumberQ, ___}}] @ Split[SplitBy[list, NumberQ], #1 != {"Open"} && #2 != {"Close"} &]

{{3, 2}, {1, 0}}

eldo
1

Using Subsequences and Cases:

Subsequences[lst] // Cases[{"Open", x__?NumberQ, "Close"} :> {x}]

{{3, 2}, {1, 0}}

E. Chan-López
1

What about this one?

ReplaceList[list, {___, "Open", x__ /; FreeQ[{x}, "Close"], "Close", ___} -> {x}]

(although amr's one seems better)

J. M.'s missing motivation