Several times recently I have needed to match a sequence that repeats cyclically. For example, given the sequence

Sequence[a, b, c]

how would I write a pattern that matched against

a
a, b
a, b, c
a, b, c, a

etc., but not any of the surrounding elements?

For a two-element sequence that makes up the entire list, this is straightforward, if a bit verbose:

MatchQ[#, {a | PatternSequence[a, b] .., a | PatternSequence[]}] & /@ 
 {{a}, {a, b}, {a, b, a}, {a, b, a, b, a, b, a}}
(* {True, True, True, True} *)

But this does not work when the sequence is part of a bigger list, e.g.

Cases[
 {q, r, a, b, a, b, a, s, e, f, a}, 
 x : PatternSequence[a|(PatternSequence[a, b] ..), a | PatternSequence[]] :> {x}, 
 Infinity
]
(* {{a}, {a}, {a}, {a}} *)

Even reversing the order of the first alternative, to (PatternSequence[a, b] ..)|a, returns the same thing.

So, my questions are:

  1. How can I write the pattern so it extracts parts of larger lists that may contain other elements?

  2. How can I generalize the pattern to match against larger cyclically repeating sequences?

rcollyer
  • Partial success: Cases[{{q, r, a, b, a, b, a, s, e, f, a}}, {y___, x : Longest@PatternSequence[(PatternSequence[a, b] ...), a | PatternSequence[]], z___} :> {x}]. I don't think PatternSequence matches a sub-Sequence by itself but needs to appear inside a Head. I might be wrong, though. – Michael E2 Mar 11 '14 at 15:07
  • I think @Michael is right: it generally doesn't seem to be possible to match any sequence on its own, only as part of a bigger expression. ___ alone will only match a single element, but {___} will match a list with an arbitrary number of elements. It is only the complete list that will match, not its elements separately. Example: ReplaceAll[Range[10], Longest[___Integer] -> x]. This can't replace the whole sequence of numbers in one go. This can: ReplaceAll[Range[10], z_[___Integer] :> z[x]]. – Szabolcs Mar 11 '14 at 16:01
  • So the key is to somehow also match on whatever contains that sequence. What function do you want to use this in, in practice? Cases? Replace? MatchQ? Or a function definition? It might work best in the last case. – Szabolcs Mar 11 '14 at 16:01
  • @Szabolcs in general, I'm using MatchQ, but I would like it to be adaptable to both Cases and Replace. Usually, I'm working with lists, but I'd like this as general as possible. – rcollyer Mar 11 '14 at 16:35

3 Answers

findCyclicMatches[u_List, cycle_List] := Module[{form, x, y},
  form[w_List] := 
   PatternSequence[
    x : Longest@Repeated[PatternSequence @@ w, {0, Infinity}], 
    y : Alternatives @@ 
      Table[Longest@Repeated[PatternSequence @@ w[[;; i]], {0, 1}], {i, Length[w], 1, -1}]];

  Last@Reap@ NestWhileList[First@Cases[{#}, {r___, Longest@form@cycle, s___} :>
                                                    (Sow@{x, y}; {r, "Separator", s})] &, 
     u, 
    (Length@Last@{##} <= Length[{##}[[-2]]]) &, 2]
  ]
u = {a, b, c, kk, a, b, c, k, a, b, c, a, b, k, a, b, c, a};
findCyclicMatches[u, {a, b, c}]

(*
 {{{a, b, c, a, b}, {a, b, c, a}, {a, b, c}, {a, b, c}, {}}}
*)
Dr. belisarius

Here is a reasonable first implementation of a cyclical pattern matcher that behaves as described in the question. It takes an input cyclic list (single cycle) and a list that is to be tested:

ClearAll@cyclicPatternMatchQ
cyclicPatternMatchQ[cycList_][testList_] := MatchQ[cycList, 
    testList /. {Shortest[h___], (PatternSequence @@ cycList) ... , 
        Shortest[t___]} :> {h, t, ___}]

Here is a sample test case with results:

{#, cyclicPatternMatchQ[{a, b, c}]@#} & /@ {{a}, {a, b}, {a, b, 
    c}, {a, b, c, a}, {a, b, c, a, b}, {x, y, z, p, q}, {d, a, b, c, 
    d}, {a, b, c, d, a}, {a, b, d, c}} // Grid

You can combine this with regular patterns to match cyclical patterns inside another list:

{d, a, b, c, a, e, f} /. 
    {h___, Longest@m__, t___} /; cyclicPatternMatchQ[{a, b, c}][{m}] :> {h, t}
(* {d, e, f} *)
rm -rf
  • +1. How would you make this amenable to using PatternTest instead of Condition? – rcollyer Mar 11 '14 at 17:00
  • @rcollyer Is there a reason you want to use PatternTest? PatternTest tests per element, whereas you want to test against the entire sequence, so I don't think it's the right tool. – rm -rf Mar 11 '14 at 17:01
  • More habit than anything else. But, I'm not sure you get that much savings with Condition, i.e. {d, a, b, c, a, e, f} /. {h___, Longest@m__, t___} /; cyclicPatternMatchQ[{a, b, c}][Print[{m}]; {m}] :> {h, t}. Using Condition is certainly easier, but I'm mostly just curious at this point. – rcollyer Mar 11 '14 at 17:10

A very interesting question. I thought of a much plainer approach than the other responders, but it proves to perform quite well: I simply PadRight the reference sequence to match the length of the test sequence.

Update: limited extension to patterns within ref and timings updated for version 10.1.0.

Functions

cycQ[ref_][test_] := test ~MatchQ~ PadRight[ref, Length @ test, ref]

cycpat[f_, r___] := p : PatternSequence[f, ___] /; cycQ[{f, r}][{p}] // Identity

cycQ tests one sequence against another:

cycQ[{1, 2, 3}] /@ {{}, {1}, {1, 2, 3}, {1, 2, 3, 1}, {2, 3}}
{True, True, True, True, False}
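
To make the padding step concrete, here is a small sketch of what PadRight produces when the padding argument is the reference list itself. The variable name ref is only for illustration; everything else is the built-in PadRight and MatchQ already used by cycQ.

```mathematica
ref = {1, 2, 3};

(* a longer target length: ref is extended with cyclic repeats of itself *)
PadRight[ref, 7, ref]
(* {1, 2, 3, 1, 2, 3, 1} *)

(* a target length shorter than ref truncates instead *)
PadRight[ref, 2, ref]
(* {1, 2} *)

(* so the cyclic test reduces to an ordinary MatchQ against the padded list *)
MatchQ[{1, 2, 3, 1}, PadRight[ref, 4, ref]]
(* True *)
```

This is also why the empty list passes: padding to length 0 gives {}, which matches {}.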

cycpat is the pattern-building function:

cycpat[1, 2, 3]
p$ : PatternSequence[1, ___] /; cycQ[{1, 2, 3}][{p$}]

Sample applications

Sample data:

SeedRandom[1]
test = RandomInteger[{1, 3}, 20]
{2, 1, 2, 2, 1, 1, 1, 2, 1, 1, 1, 1, 3, 1, 2, 3, 1, 1, 2, 2}

Finding the single longest sequence in the list:

test /. {___, x : Longest @ cycpat[1, 2, 3], ___} :> {x}
{1, 2, 3, 1}

Finding all sequence fragments in a list, length 2 or greater:

ReplaceList[test, {___, x : cycpat[1, 2, 3] /; Length[{x}] > 1, ___} :> {x}]
{{1, 2}, {1, 2}, {1, 2}, {1, 2, 3}, {1, 2, 3, 1}, {1, 2}}

Performance

rm -rf's cyclicPatternMatchQ, while certainly interesting, isn't fast enough to be widely applicable:

SeedRandom[1]
a = RandomInteger[{1, 5}, 300];

a /. {___, x : Longest@cycpat[1, 2, 3, 4, 5], ___} :> {x} // Timing

a /. {___, Longest@m__, ___} /; 
    cyclicPatternMatchQ[{1, 2, 3, 4, 5}][{m}] :> {m}      // Timing
{0.145, {1, 2, 3, 4}}

{6.16204, {1, 2, 3, 4}}

belisarius's form function is much faster but still not as fast as cycpat:

form[w_List] := (* Note I removed the x and y patterns *)
 PatternSequence[Longest@Repeated[PatternSequence @@ w, {0, Infinity}], 
  Alternatives @@ 
    Table[Longest@Repeated[PatternSequence @@ w[[;; i]], {0, 1}], {i, Length[w], 1, -1}]]

SeedRandom[10]
big = RandomInteger[{1, 5}, 1200];

big /. {___, q : Longest @ cycpat[1, 2, 3, 4, 5], ___} :> {q}  // Timing
big /. {___, q : Longest @ form @ {1, 2, 3, 4, 5}, ___} :> {q} // Timing
{6.18, {1, 2, 3, 4, 5, 1}}

{10.80, {1, 2, 3, 4, 5, 1}}

It is worth noting, however, that form slows down as the sequence it is given gets longer, while cycpat does not:

big /. {___, q : Longest[cycpat @@ Range[50]], ___} :> {q} // Timing
big /. {___, q : Longest @ form @ Range[50], ___} :> {q}   // Timing
{6.282, {1, 2, 3, 4, 5}}

{63.586, {1, 2, 3, 4, 5}}

cycpat still seems rather slow for a list of only 1200 elements but I was unable to improve its performance. Possibly a form of memoization would speed the highly repetitive application of cycQ without unacceptable memory consumption.
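
For reference, the usual memoization idiom applied to cycQ might look like the sketch below. The name cycQmemo is hypothetical and I have not timed it, so whether it helps at all, and what it costs in memory, remains an open question.

```mathematica
ClearAll[cycQmemo]

(* same test as cycQ, but each (ref, test) result is stored as a new
   definition the first time it is computed *)
cycQmemo[ref_][test_] := cycQmemo[ref][test] =
  MatchQ[test, PadRight[ref, Length @ test, ref]]

cycQmemo[{1, 2, 3}][{1, 2, 3, 1}]
(* True *)

(* the general rule plus one cached result; this grows with every distinct test *)
Length @ SubValues[cycQmemo]
(* 2 *)
```

Every distinct candidate sequence adds one stored definition, which is the memory concern mentioned above.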

Mr.Wizard
  • I am not sure if it is THAT slow. There should be about Binomial[1200+2,2] or 721801 sequences to try. Do[f[i], {i, 1*^6}] // Timing already takes on the order of a second, so I doubt it can be improved by more than an order of magnitude at most – Rojo Apr 07 '14 at 11:54
  • +1, I wish I had thought of this. Very interesting. – rcollyer Apr 07 '14 at 12:45
  • @Rojo I suppose that for a standard pattern expression that may be correct. I suspect that a stand-alone search would permit a more intelligent algorithm, but that is not what was requested. – Mr.Wizard Apr 07 '14 at 17:14
  • @rcollyer Thank you. Does your application require a pattern (that may be combined with other patterns, etc.) or do you simply want to search a list? I suspect the latter can be much faster; for example if I know a priori that the longest match in big is 1, 2, 3, 4, 5, 1 I can find its position in 0.00005 second. – Mr.Wizard Apr 07 '14 at 17:18
  • I was just thinking there are cases where a pattern is absolutely necessary. I'll have to play with it a bit, and see. – rcollyer Apr 07 '14 at 18:30
  • I was just trying to use this like MatchQ[list, _?(cycpat[p1, p2, ...])] where the pi are patterns, and it keeps returning false. On the contrary, MatchQ[list, _?(cycQ[{p1, p2, ...}])] returns True. On the surface it looked like _?(cycpat[...]) wasn't getting expanded, but adding an UpValue doesn't fix it. Not sure how to correct it to work with MatchQ, so I'll keep playing with it. – rcollyer Jun 22 '15 at 19:20
  • @rcollyer I wrote this only for a literal match, not patterns within the cycle itself. Using MatchQ in place of SameQ should extend this to patterns that match a single element only, and I'll update my answer with that, but it's still far from general pattern handling! – Mr.Wizard Jun 22 '15 at 22:01
  • MatchQ wasn't cutting it either, at least using cycpat. But, I used the idea of building the pattern on the fly using its length to get something that worked just fine, e.g. cyclicPattern[p_List][test_List] /; Length@p <= Length@test := test~MatchQ~PadRight[{}, Length@test, p] which is used like MatchQ[{a,3,b}, _?(cyclicPattern[{_Symbol, _?NumericQ}])]. – rcollyer Jun 23 '15 at 01:36
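
The definition from the last comment, laid out as a block with a couple of checks. The symbols x1 and x2 are placeholders I chose because a is assigned a value earlier on this page; the rest follows the comment as written.

```mathematica
ClearAll[cyclicPattern]

(* build a pattern list as long as the test list by cycling through p *)
cyclicPattern[p_List][test_List] /; Length @ p <= Length @ test :=
  MatchQ[test, PadRight[{}, Length @ test, p]]

(* x1 and x2 are placeholder symbols with no values assigned *)
MatchQ[{x1, 3, x2}, _?(cyclicPattern[{_Symbol, _?NumericQ}])]
(* True *)

MatchQ[{x1, 3, 4}, _?(cyclicPattern[{_Symbol, _?NumericQ}])]
(* False: the third element would have to be a Symbol *)
```

Because of the Length condition, a test list shorter than p leaves cyclicPattern[...][...] unevaluated, so the PatternTest fails and MatchQ returns False. That differs from cycQ above, which accepts a partial first cycle.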