11

I was trying to construct a pattern that is easy for a human to perceive but hard to express in Mathematica when I came across this problem. (Check the original problem here.)

Let's check this simple case:

I have a list {5,1,2,1,2,1,2,1,2,4,3,3,3,3,3,3,10} and I would like to find every repeating period together with the sequences before and after it. Matching by eye, you can see two possible matches in {5,1,2,1,2,1,2,1,2,4,3,3,3,3,3,3,10}: the repeated 1,2 and the repeated 3.

If I only want to find one of them, the following code works as desired:

Replace[{5, 1, 2, 1, 2, 1, 2, 1, 2, 4, 3, 3, 3, 3, 3, 3, 10}, 
{Shortest[pre___, 3], Longest[Repeated[Shortest[rep__, 1], {2, Infinity}]], Shortest[inc___, 2]}
:> {{pre}, {rep}, {inc}}]

(*{{5}, {1, 2}, {4, 3, 3, 3, 3, 3, 3, 10}}*)

But if I want to find all of them, it is not so direct: simply changing Replace to ReplaceList, as in the following code, does not work:

r1=
ReplaceList[{5, 1, 2, 1, 2, 1, 2, 1, 2, 4, 3, 3, 3, 3, 3, 3, 10}, 
{Shortest[pre___, 3], Longest[Repeated[Shortest[rep__, 1], {2, Infinity}]], Shortest[inc___, 2]}
:> {{pre}, {rep}, {inc}}]

The result is extremely long: it includes every possible match and appears to ignore all of the Shortest and Longest specifications. It contains the same matches as the pattern without them:

r2 = ReplaceList[{5, 1, 2, 1, 2, 1, 2, 1, 2, 4, 3, 3, 3, 3, 3, 3, 10},
{pre___, Repeated[rep__, {2, Infinity}], inc___} :> {{pre}, {rep}, {inc}}]

Sort@r1==Sort@r2

(*True*)

This is, of course, not the desired result. How can I make the pattern matcher do this? And is there a reason that ReplaceList ignores all of these Shortest and Longest specifications? Any help, or any approach other than mine, is appreciated, but the final goal is to solve this using ReplaceList or similar functions.

Wjx

2 Answers

8

To me this is an interesting problem, however I think the question is a misguided one.

Briefly: Longest and Shortest only change the order in which an expression is searched for a match. They are not filters that eliminate potential matches.

As noted in the documentation:

If no explicit Shortest or Longest is given, ordinary expression patterns are normally effectively assumed to be Shortest[p], while string patterns are assumed to be Longest[p].

If several Shortest objects occur in the same expression, those that appear first are given higher priority to match shortest sequences.

If several Longest objects occur in the same expression, those that appear first are given higher priority to match longest sequences.

Realize then that all patterns already have a length priority as established by their type and position.

For example this always gives the same output:

Replace[{1, 2, 3, Pi, 4.4, 1/5}, {a__Integer, b__?NumericQ} :> {{a}, {b}}]
{{1}, {2, 3, π, 4.4, 1/5}}

This pattern is the same as {Shortest[a__Integer], Shortest[b__?NumericQ]}. The priorities determine the order in which different alignments are tried and the first match is returned. This process is always the same therefore the result is always the same. Using a different length priority results in a different search order and (possibly) a different match but it is always deterministic.
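A quick way to check this equivalence is to compare the two forms directly. This is only a sketch; the names implicit and explicit are chosen here for illustration and are not part of the original example:

(* default priorities, as in the example above *)
implicit = Replace[{1, 2, 3, Pi, 4.4, 1/5}, {a__Integer, b__?NumericQ} :> {{a}, {b}}];

(* the same priorities written out explicitly *)
explicit = Replace[{1, 2, 3, Pi, 4.4, 1/5}, {Shortest[a__Integer], Shortest[b__?NumericQ]} :> {{a}, {b}}];

(* should give True: both find the alignment with the shortest a first *)
implicit === explicit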

So what happens when we use ReplaceList? The same priorities are respected but now the pattern engine keeps looking for matches rather than stopping with the first one found, and all matches are returned in the order searched. Compare these outputs:

expr = {1, 2, 3, Pi, 4.4, 1/5};

ReplaceList[expr, {a__Integer, b__?NumericQ} :> {{a}, {b}}]

ReplaceList[expr, {Longest[a__Integer], b__?NumericQ} :> {{a}, {b}}]

{{{1}, {2, 3, π, 4.4, 1/5}}, {{1, 2}, {3, π, 4.4, 1/5}}, {{1, 2, 3}, {π, 4.4, 1/5}}}

{{{1, 2, 3}, {π, 4.4, 1/5}}, {{1, 2}, {3, π, 4.4, 1/5}}, {{1}, {2, 3, π, 4.4, 1/5}}}

Note that Longest is not ignored; it changes the search order, as it always does.

While the order is different the set of matches is the same; it has to be because the same input expression and the same patterns are used in each case. All that changes is the order in which the alignments are checked.
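The same point can be checked programmatically. The following is a small sketch using the example above; plain and long are names introduced here for illustration only:

expr = {1, 2, 3, Pi, 4.4, 1/5};

plain = ReplaceList[expr, {a__Integer, b__?NumericQ} :> {{a}, {b}}];
long = ReplaceList[expr, {Longest[a__Integer], b__?NumericQ} :> {{a}, {b}}];

(* same set of matches once the order is ignored *)
Sort[plain] === Sort[long]

(* for this particular example the search order is exactly reversed *)
plain === Reverse[long]

(* Replace returns the first alignment that ReplaceList would list *)
First[long] === Replace[expr, {Longest[a__Integer], b__?NumericQ} :> {{a}, {b}}]

All three comparisons should give True for the outputs shown above.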

To illustrate this in a different way, without ReplaceList, we can use a condition that always fails to echo back each alignment as it is tried.

expr /. {a__Integer, b__?NumericQ} /; Print[{a}, " ", {b}] -> Null;

{1} {2,3,π,4.4,1/5}

{1,2} {3,π,4.4,1/5}

{1,2,3} {π,4.4,1/5}

expr /. {Longest[a__Integer], b__?NumericQ} /; Print[{a}, " ", {b}] -> Null;

{1,2,3} {π,4.4,1/5}

{1,2} {3,π,4.4,1/5}

{1} {2,3,π,4.4,1/5}

Mr.Wizard
  • This is inspiring, I always consider Longest and Shortest as some sort of selection criteria instead of the specification of search order...... So, may I additionally ask how to make things go like my original intention? I mean: use normal pattern matching style method to create a result like my additional information or Edmund's answer. So I mean are there any convenient way to add another level of selection after ReplaceList. – Wjx Aug 01 '16 at 06:03
  • I think maybe you can point things out more clearly so other viewers can grasp the main point, or actually my main mistake :(, at first glimpse. I think simply stating: Longest and Shortest ONLY change the search ORDER, not FILTER the search RESULT, will be quite clear~ – Wjx Aug 01 '16 at 06:07
  • @Wjx The truth is that Mathematica patterns are usually pretty poor for working with long sequences. Experienced users have for years used the "trick" of converting expressions into a String and operating on that, as the string library is better adapted to those cases. This is exactly what Alexey did for (121855) and if you search hard enough you will find a lot of other examples of that method. A number of functions that deal with sequences were added around version 10 and in some cases these do provide better solutions to old problems. – Mr.Wizard Aug 01 '16 at 06:33
  • @Wjx However they also can be a bit disappointing at times. I created a Question to address this to some degree; see (83325). The point is that IMHO long sequence operations are a present weakness in Mathematica. Nevertheless I shall think on this particular question and see if I can provide something interesting. Regarding the second comment I shall add a more prominent note to my answer as you suggest. – Mr.Wizard Aug 01 '16 at 06:36
  • Yes, I agree that long pattern matching with lists are quite disappointing...... I will think of using String next time I met this sort of problem~ thanks! – Wjx Aug 01 '16 at 06:42
  • But in a lot of cases, using StringPattern means to throw away the flexibility of Mathematica patterns I suppose? – Wjx Aug 01 '16 at 06:45
  • @Wjx String patterns actually allow quite a bit of flexibility as they can exist in a hybrid form; those elements which Mathematica cannot convert into Regular Expression form are retained as a higher level construct. I have been operating on the assumption (or poor recollection?) that introducing these exceptions caused string patterns to slow down to a level similar to the native patterns but Alexey questioned this and I need to explore it further. (See the comments under 121855 linked above.) – Mr.Wizard Aug 01 '16 at 06:49
  • But if I want to do calculations or manipulation while matching, a string pattern will not be suitable, I suppose? For purely structural things it seems StringPattern will be very effective. And in this series of questions I asked, yes, using String will be very effective as it doesn't require any numerical or symbolic computation on the match. I mean, extended use of ToExpression is not good, at least from my first thought...... Am I correct this time? Are there any examples where the String approach will still be more effective even with ToExpression inside? – Wjx Aug 01 '16 at 06:57
  • I've seen your links under 121855 (http://stackoverflow.com/questions/8484299/patterntest-not-optimized/8485700#8485700), and I found another small problem: If I know the pattern matching will go correctly without multiple tests, can I tell the pattern matcher do test only once for a single element? This may speed things up a lot I suppose? Or should I open another question for this? – Wjx Aug 01 '16 at 07:06
  • @Wjx As far as I know there is no way to do that. (I am not including memoization of a test function, which is a somewhat different matter.) That is the point of that old question of mine, and I haven't come across a "solution" since then. If you wish to post a new question about this you have my support and I think it deserves more attention. – Mr.Wizard Aug 01 '16 at 07:38
  • Check this question, will this post clarify our points? :) – Wjx Aug 01 '16 at 13:20
  • Also, have you checked this question? I've updated a new solution, have you got any better solution? – Wjx Aug 01 '16 at 13:21
  • @Wjx I'll look at both later in detail. The existing vote on your self-answer is mine, by the way. I did think about that problem but I never came up with a satisfying method. Likely I shall learn something from your own solution and I look forward to that. – Mr.Wizard Aug 01 '16 at 22:51
  • Thanks for your appreciation :) – Wjx Aug 01 '16 at 23:58
3

As a workaround, you can use Select to filter down the results of ReplaceList to the set you want by using MatchQ and SequencePosition.

v = {5, 1, 2, 1, 2, 1, 2, 1, 2, 4, 3, 3, 3, 3, 3, 3, 10};
rl = ReplaceList[v, {Shortest[pre___, 3], 
    Longest[Repeated[Shortest[rep__, 1], {2, Infinity}]], 
    Shortest[inc___, 2]} :> {{pre}, {rep}, {inc}}];

Then

Select[
 Not[MatchQ[#[[2]], {Repeated[x__, {2, Infinity}]}]] &&
 Max@Flatten@SequencePosition[#[[1]], #[[2]]] != Length@#[[1]] && 
 Min@Flatten@SequencePosition[#[[3]], #[[2]]] != 1 &
]@rl

(*
{{{5}, {1, 2}, {4, 3, 3, 3, 3, 3, 3, 10}}, 
 {{5, 1}, {2, 1}, {2, 4, 3, 3, 3, 3, 3, 3, 10}},
 {{5, 1, 2, 1, 2, 1, 2, 1, 2, 4}, {3}, {10}}}
*)

The ReplaceList results of interest are those in which rep__ itself contains no repeated sequence, pre___ does not end with rep__'s sequence, and inc___ does not begin with it. Note that a third possible match has been found that was not initially considered by eye.
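To make the three criteria easier to read, they can be sketched as a named predicate. The name keepQ is hypothetical and introduced here only for illustration; the body is the same test used in the Select above, and it assumes x has no global value:

(* True when a candidate {pre, rep, inc} passes all three criteria *)
keepQ[{pre_List, rep_List, inc_List}] :=
 ! MatchQ[rep, {Repeated[x__, {2, Infinity}]}] && (* rep is not itself a repetition *)
  Max@Flatten@SequencePosition[pre, rep] != Length[pre] && (* pre does not end with rep *)
  Min@Flatten@SequencePosition[inc, rep] != 1 (* inc does not begin with rep *)

(* a candidate that is kept *)
keepQ[{{5}, {1, 2}, {4, 3, 3, 3, 3, 3, 3, 10}}]

(* a candidate that is rejected because pre ends with {1, 2} *)
keepQ[{{5, 1, 2}, {1, 2}, {1, 2, 4, 3, 3, 3, 3, 3, 3, 10}}]

With this definition, Select[keepQ]@rl should reproduce the filtered list shown above.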

Hope this helps.

Edmund
  • En, Actually I know how to do it by select, just want to ask a question more about the property of ReplaceList itself......And, you may also add one select condition to eliminate something like 2},{1,2},{1...... – Wjx Jul 05 '16 at 12:53
  • also, in more complex situations involving other forms of Longest and Shortest, I think it will be quite hard to implement this. – Wjx Jul 05 '16 at 12:58
  • @Wjx The second find of {2, 1} matches the conditions. As long as you are seeking to isolate a repeating pattern in your list the above should work. – Edmund Jul 05 '16 at 15:05