12

In a list of positive integers, I would like to find the largest n such that Range[n] is a subset of the list. So for example given {1,2,3,5,7} I want the answer to be 3, while if the list is {2,4,5}, the answer should be 0.

I can think of various algorithms that involve looping over the list, but there must be a better way. In my particular case, the lists are pretty small (well under 100 elements).

Mr.Wizard
  • 271,378
  • 34
  • 587
  • 1,371
rogerl
  • 4,209
  • 3
  • 27
  • 42

6 Answers6

10

If speed is important, I think that this is a typical example in which compilation of a straightforward algorithm gives better results than making use of the advanced functions of Mathematica. Here is such a compiled function:

f= Compile[{{lst, _Integer, 1}}, Module[{result=0,counting=0},
  Do[ 
    If[n==counting+1,
      counting=n,
      If[counting>result, result=counting]; counting=If[n==1, 1, 0] ],
  {n, lst}];
  Max[{result, counting}]]]

On my computer, for a list of 10^6 positive integers, this function is more than 20 times faster than the splitting technique:

SeedRandom[42];
list=RandomInteger[{1,10},10^6];
Select[Split[list,#1+1==#2&],#[[1]]==1&]//Max // Timing
f[list] // Timing
fC[list] // Timing

(* {2.418016,5} *)
(* {0.109201,5} *)
(* {0.015600,5} *)

Here the function fC is identical to f, but compiled with the option CompilationTarget->"C". So when available, this gives another factor about 9.

Fred Simons
  • 10,181
  • 18
  • 49
7

This seems to be substantially faster.

extend[list_List, pos_List /; Length[pos] > 0, depth_Integer] := 
 extend[list, 
  pos[[Flatten[
      Position[list[[DeleteCases[Flatten[pos], Length[list]] + 1]], 
       depth + 1]]]] + 1, depth + 1]

extend[list_List, {}, depth_Integer] := depth - 1

extend[list, Position[list, 1], 1]

Also, using Max gives negative infinity when there is no element 1 in the list.

This seems analogous to Boyer-Moore. Therefore, a further improvement would be to extend the first matching substring as much as possible (let's say the length of the substring is n), and then at the second matching substring check to see if it could be longer (by checking the element n positions after the start of the second matching substring), etc.

Karsten7
  • 27,448
  • 5
  • 73
  • 134
xcah
  • 538
  • 3
  • 7
  • Nice idea! But there seems to be some bugs. Your code cancelled with SeedRandom[42]; list = RandomInteger[{1, 4}, 100]; extend[list, Position[list, 1], 1]. Any idea if what's happening? – Dr. belisarius Dec 12 '14 at 06:24
  • Should have been Drop[Flatten[pos] + 1, -1] (since the last element is not extensible). Fixed that, but I still get a different answer than the code based on Split on a few test cases. I will look into that. – xcah Dec 12 '14 at 08:29
  • This needs to be reworked a little more, but the idea is right. The problem is that the Positions in the sublist need to be translated back to their position in the original list (when searching for the higher elements). – xcah Dec 12 '14 at 08:43
  • OK, I think that should do it. The Drop[..,-1] had to be changed to DeleteCases[..,Length[list]] becase we only want to delete the last of the matches with the value we are currently searching if it is the last element in the list. And the pos[[some_fun[pos]]] allows us to map positions in the sublist back to positions in the original list. Over a bunch of random examples, the output of the above matches the code based on Split above. – xcah Dec 12 '14 at 08:53
  • @xcah I have been notified of a pending edit to this post. Is the item you added different from the very first item in the list, addressed in (13529)? – Mr.Wizard Dec 12 '14 at 17:53
  • Yes, the changed between 9 and 10. And there is no documentation of how to compute the function that was present in 9 (this forced me to recompute a bunch of magic data). On 10, Hash["abcde"] yields 3335709116491347743. On 9, the same yields 8840055571401358647. As a result, I highly recommend never using Hash without specifying the algorithm. – xcah Dec 12 '14 at 18:31
  • @xcah Thanks for your contribution. I too cannot find a way to generate the old value. – Mr.Wizard Dec 12 '14 at 19:13
  • @xcah Would you please post a question asking for a way to generate the legacy hash value? I would like to know myself but since you brought the problem to my attention I think the question should come from you. – Mr.Wizard Dec 12 '14 at 19:30
  • @Mr.Wizard Done, here. – xcah Dec 12 '14 at 19:44
6
ClearAll[f1, f2]
f1 = Module[{i = 1}, While[SequencePosition[#, Range[i], 1] =!= {}, i++]; i-1]&;
f2 = Module[{i = 0}, While[SequencePosition[#, Range[++i], 1] =!= {}]; i-1]&;

f1 /@ {{1, 2, 3, 5, 7}, {3, 2, 1, 5, 7}, {3, 2, 7, 5}}

{3, 1, 0}

f2 /@ {{1, 2, 3, 5, 7}, {3, 2, 1, 5, 7}, {3, 2, 7, 5}}

{3, 1, 0}

Timings:

Both f1 and f2 are faster than all the methods (except possibly fC in Fred Simon's answer which I cannot use because I don't have a C compiler installed) posted so far.

SeedRandom[42];
list = RandomInteger[{1, 10}, 10^6];
rsel = Max @ Select[Split[list,#1+1==#2&],#[[1]]==1&] // Timing; (* belisarius *)
rext = extend[list, Position[list, 1], 1] // Timing; (* xcah *)
rf = f[list] // Timing; (* fred simons *)
rf1 = f1[list] //Timing;

Grid[{{"Select", "extend", "f", "f1"}, {rsel, rext, rf, rf1}}, Dividers->All]// TeXForm

$\begin{array}{|c|c|c|c|c|} \hline \text{Select} & \text{extend} & f & \text{f1} & \text{f2} \\ \hline \{1.453,5\} & \{0.137,5\} & \{0.039,5\} & \{0.008,5\} & \{0.008,5\} \\ \hline \end{array}$

With list = RandomInteger[{1, 1000}, 10^6];

$\begin{array}{|c|c|c|c|c|} \hline \text{Select} & \text{extend} & f & \text{f1} & \text{f2} \\ \hline \{1.68,2\} & \{0.047,2\} & \{0.039,2\} & \{0.012,2\} & \{0.011,2\} \\ \hline \end{array}$

kglr
  • 394,356
  • 18
  • 477
  • 896
4
list = {1, 2, 3, 5, 7, 4, 5, 6, 7, 1, 6, 7, 8, 9};

MaximalBy[Split[list, #2 - #1 == 1 &], Length]

{{4, 5, 6, 7}, {6, 7, 8, 9}}

MaximalBy[Split[list, #2 - #1 == 1 &], Length, 1]

{{4, 5, 6, 7}}

eldo
  • 67,911
  • 5
  • 60
  • 168
4
SeedRandom[42];
list = RandomInteger[{1, 7}, 1000];
Select[Split[list, #1 + 1 == #2 &], #[[1]] == 1 &] // Max
(* 3*)
Dr. belisarius
  • 115,881
  • 13
  • 203
  • 453
2

Using SequenceCases and ConsecutiveQ:

ConsecutiveQ = Most[#] == Rest[#] - 1 &;(*By Kuba*)

f = Max@Map[Length@If[Range[Last@#] === #, #, {}] &, Cases[Subsequences[#, {1, Length[#], 1}], _?ConsecutiveQ]] &;

l1 = {4, 2, 5}; l2 = {2, 4, 5}; l3 = {3, 2, 7, 5}; l4 = {1, 2, 3, 5, 7}; l5 = {3, 2, 1, 5, 7}; l6 = {1, 2, 3, 5, 7, 4, 5, 6, 7, 1, 6, 7, 8, 9};(By Eldo)

f /@ {l1, l2, l3, l4, l5, l6}

({0, 0, 0, 3, 1, 3})

E. Chan-López
  • 23,117
  • 3
  • 21
  • 44