For example, f[{1,2,3,4},{2,3}] should give True and f[{1,2,3,4},{2,3,4}] should give True, but f[{1,2,3,4},{2,4}] should give False. Which function should I use? I don't want to reinvent the wheel.
Thanks in advance
It really depends on how fast you need it and how many elements your list will have. Let's assume the worst:
list = Range[1000000];
contain = {1, 2, 3, 4, 5, 6};
Checking fun from Lotus
fun[list, contain] // RepeatedTiming
(* {0.27, True} *)
and f from kglr (the SequencePosition version):
f[list, contain] // RepeatedTiming
(* {0.043, True} *)
Here is a version that is another order of magnitude faster, although it looks awful:
h[list_, sub_] := With[{l = Length[sub]},
Catch[Developer`PartitionMap[If[# === sub, Throw[True]] &, list, l, 1]] === True
]
h[list, contain] // RepeatedTiming
(* {0.00271, True} *)
However
As you can see, my example was designed to test the influence of a large input list combined with a match near the start of the list. I should note that when contain is located at the far end of list, @kglr's version works best, and its running time seems to stay constant regardless of where the match is.
Final notes
I tried to find a simple algorithm that matches the speed of SequencePosition and is better in most cases. A satisfactory solution seems to be to go linearly through the list and test whether the current element equals the first element of the sublist. Only if it does do we compare the full sublist.
In high-level Mathematica this still cannot compete, but when compiled down to C it appears to be fast. A clear disadvantage of this solution is that it only works with typed lists (here, integers):
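uglyC = Compile[{{list, _Integer, 1}, {sub, _Integer, 1}},
(* assumes sub is non-empty, since sub[[1]] is taken up front *)
With[{l = Length[sub], first = sub[[1]]},
(* try every start position that leaves room for the whole sublist *)
Do[
(* cheap scalar test first; the full sublist comparison is meant to run only when it succeeds *)
If[list[[i]] === first &&
list[[i ;; i + l - 1]] === sub,
Return[True]
], {i, 1, Length[list] - l + 1}
] === True
], CompilationTarget -> "C", RuntimeOptions -> "Speed"
]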
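For comparison, the same first-element filter can be sketched without Compile. The name firstFilterQ is hypothetical, and the sketch assumes Mathematica 10.2 or later for AnyTrue. Unlike uglyC it accepts lists of any element type, but it has not been timed here; given the remark above about high-level code, expect it to be much slower than the compiled version.

```mathematica
(* Hypothetical uncompiled sketch of the first-element filter, not the uglyC above *)
ClearAll[firstFilterQ]
(* an empty sublist is trivially contained *)
firstFilterQ[_List, {}] := True
firstFilterQ[list_List, sub_List] := With[{l = Length[sub], first = First[sub]},
 AnyTrue[
  (* every start index that leaves room for the whole sublist;
     Range of a non-positive number is {}, so a too-long sub gives False *)
  Range[Length[list] - l + 1],
  (* cheap element test first; And stops at the first False,
     so the slice is compared only when the first element matches *)
  list[[#]] === first && list[[# ;; # + l - 1]] === sub &
 ]
]

firstFilterQ[{1, 2, 3, 4}, #] & /@ {{2, 3}, {2, 3, 4}, {2, 4}, {3, 2}}
(* {True, True, False, False} *)
firstFilterQ[{"a", "b", "c"}, {"b", "c"}]
(* True *)
```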
Let's run some test cases on a large random list:
list = RandomInteger[100, 1000000];
First, we try a long sublist that is not contained in list (RandomInteger[100, ...] never produces values above 100). For the compiled code I need to use AbsoluteTiming to get meaningful results:
f[list, Range[50, 150]] // RepeatedTiming
(* {0.0057, False} *)
Median@Table[First@AbsoluteTiming[uglyC[list, Range[50, 150]]], {50}]
(* 0.004662 *)
Sublists that match, but where the match is at the very end of list, have an equivalent runtime. However, the closer the match is to the front of list, the sooner the Do loop returns:
f[list, list[[100 ;; 110]]] // RepeatedTiming
(* {0.0057, True} *)
Median@Table[First@AbsoluteTiming[uglyC[list, list[[100 ;; 110]]]], {50}]
(* 5.*10^-6 *)
I have not tested all scenarios, and I believe the ugly compiled code should not be used unless it is absolutely time-critical. A simple SequencePosition should be preferred.
On my machine, kglr's f2 seems to be significantly slower than f1. I think it's also important to examine a negative case, such as {889, 893, 894, 895}. h's construction seems like it would have quite a bit of difficulty getting False results quickly. – eyorble May 29 '18 at 10:18
@eyorble Yes, that was what I tried to express in my last section. All mapping methods check each sublist, and the worst case is when the sublist is not in the original list. For this case, kglr's approach still works fast while all others drop like hell. A fast solution could theoretically be done in a Do loop, but I don't get it fast enough (not without compiling, at least). – halirutan May 29 '18 at 10:40
f1 = SequencePosition[##] != {}&;
f1b = Length[SequencePosition[##]] > 0&; (* thanks: @Henrik Schumacher *)
f1[{1, 2, 3, 4}, #] & /@ {{2, 3}, {2, 3, 4}, {2, 4},{2, 3, 1}}
{True, True, False, False}
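To see why comparing against {} works, it helps to look at what SequencePosition returns: a list of {start, end} spans, one per match, and {} when nothing matches. SequencePosition also accepts a third argument that keeps at most the first n matches; f1c below is a hypothetical variant that asks for just one. It has not been benchmarked against f1.

```mathematica
(* each match is reported as a {start, end} span *)
SequencePosition[{1, 2, 3, 4}, {2, 3}]
(* {{2, 3}} *)
SequencePosition[{1, 2, 3, 4}, {2, 4}]
(* {} *)
(* hypothetical variant: request only the first match *)
f1c = SequencePosition[#1, #2, 1] =!= {} &;
f1c[{1, 2, 3, 4}, #] & /@ {{2, 3}, {2, 3, 4}, {2, 4}, {2, 3, 1}}
(* {True, True, False, False} *)
```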
f2 = MemberQ[Subsequences[#, {Length @ #2}],#2]&;
f2[{1, 2, 3, 4},#] & /@ {{2, 3}, {2, 3, 4}, {2, 4}, {2, 3, 1}}
{True, True, False, False}
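f2 works by building every contiguous window of the sublist's length and then scanning those windows with MemberQ. The window list can be inspected directly:

```mathematica
(* every contiguous window of length 2 *)
Subsequences[{1, 2, 3, 4}, {2}]
(* {{1, 2}, {2, 3}, {3, 4}} *)
(* {2, 4} is not one of the windows *)
MemberQ[Subsequences[{1, 2, 3, 4}, {2}], {2, 4}]
(* False *)
```

Because all windows are created before MemberQ starts scanning, the intermediate list grows with the input, which is worth keeping in mind for lists as large as the ones used in the timings above.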
Two variations on halirutan's approach:
ClearAll[f3,f4]
f3[lst_,sub_]:= Or @@ BlockMap[# === sub&, lst, Length@sub, 1]
f4[lst_,sub_]:=Catch[BlockMap[If[# === sub, Throw[True]]&, lst, Length@sub, 1]] === True
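The difference between f3 and f4 is easiest to see in the intermediate BlockMap results. f3 collects one Boolean per window and combines them with Or, while f4 throws True at the first matching window. Without a match nothing is thrown, and Catch simply returns the list of Nulls produced by If, which is why f4 compares the result with True:

```mathematica
(* one Boolean per window of length 2, offset 1 *)
BlockMap[# === {2, 3} &, {1, 2, 3, 4}, 2, 1]
(* {False, True, False} *)
Or @@ BlockMap[# === {2, 3} &, {1, 2, 3, 4}, 2, 1]
(* True *)
(* no window equals {2, 4}, so Catch returns the Nulls from If *)
Catch[BlockMap[If[# === {2, 4}, Throw[True]] &, {1, 2, 3, 4}, 2, 1]]
(* {Null, Null, Null} *)
```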
It is often a bit more efficient to test lists for Length > 0 instead of comparing against {}. Length[SequencePosition[##]] > 0 & should be a bit faster than f1. – Henrik Schumacher May 29 '18 at 10:27
Nice, now we have cyclic references between our posts :) I still like your SequencePosition best. It's clear, short and should work for a wide range of cases. +1 – halirutan May 29 '18 at 10:42
We may use pattern matching:
f[a_, {ss__}] := MatchQ[a, {___, ss, ___}]
Testing:
f[{1, 2, 3, 4}, {2, 3}] (* True *)
f[{1, 2, 3, 4}, {2, 3, 4}] (* True *)
f[{1, 2, 3, 4}, {2, 4}] (* False *)
This should work in any version of Mathematica, and its performance is nearly equivalent to SequencePosition (as measured in version 10.1 under Windows):
f1[Range[10000], {5000, 5001, 5002}] // RepeatedTiming
(* {0.000444, True} *)
f[Range[10000], {5000, 5001, 5002}] // RepeatedTiming
(* {0.000475, True} *)
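One caveat of the pattern-matching approach: the elements of the sublist are spliced into the pattern, so pattern objects inside it act as wildcards. For instance, f[{1, 2, 3, 4}, {_, 4}] returns True with the definition above, because _ matches 3. If the sublist may contain such expressions, one option is to wrap each element in Verbatim, as in the hypothetical fLiteral below (not timed):

```mathematica
(* hypothetical variant: Verbatim makes each sublist element match only itself *)
fLiteral[a_, {ss__}] := MatchQ[a, {___, Sequence @@ (Verbatim /@ {ss}), ___}]

fLiteral[{1, 2, 3, 4}, {2, 3}]   (* True *)
fLiteral[{1, 2, 3, 4}, {2, 4}]   (* False *)
fLiteral[{1, 2, 3, 4}, {_, 4}]   (* False *)
```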
How about this?
fun[list1_, list2_] := Module[{n, list3},
n = Length[list2];
list3 = Partition[list1, n, 1];
MemberQ[list3, list2]
]
In[10]:= fun[{1, 2, 3, 4}, {2, 3}]
Out[10]= True
In[8]:= fun[{1, 2, 3, 4}, {2, 3, 4}]
Out[8]= True
In[9]:= fun[{1, 2, 3, 4}, {2, 4}]
Out[9]= False
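fun relies on Partition[list, n, 1], which returns all overlapping windows of length n and drops incomplete ones. Its output also answers the ordering question raised in the comments: a window has to match element by element, so {3, 2} is not found in {1, 2, 3, 4}.

```mathematica
(* overlapping windows of length 2, offset 1 *)
Partition[{1, 2, 3, 4}, 2, 1]
(* {{1, 2}, {2, 3}, {3, 4}} *)
(* order matters: {3, 2} is not one of the windows *)
MemberQ[Partition[{1, 2, 3, 4}, 2, 1], {3, 2}]
(* False *)
(* a sublist longer than the list produces no windows, so fun gives False *)
Partition[{1, 2}, 3, 1]
(* {} *)
```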
f[{1,2,3,4},{3,2}]? – halirutan May 29 '18 at 09:39
SequencePosition[##]!={}&? – kglr May 29 '18 at 09:42