I have a set of, say, 100 numbers {1,3,7,11,19,...3971}. All elements are determined in advance. I want to check whether 376 belongs to this set. What is the fastest way?
Thanks
4 Answers
As andre notes this can be done simply with MemberQ. However, the set you show is ordered, so other methods may be faster. It probably won't matter for a set of "say 100 numbers" but it can make a big difference in longer sets.
Starting with a set of ordered unique elements:
set = Union @ RandomInteger[1*^7, 1*^6];
Timing using MemberQ:
Do[MemberQ[set, i], {i, 1, 1*^7, 77777}] // Timing // First
5.834
Timing using BinarySearch from the Combinatorica package:
Needs["Combinatorica`"]
Do[IntegerQ @ BinarySearch[set, i], {i, 1, 1*^7, 77777}] // Timing // First
0.015
Also, if you are going to perform this test repeatedly, or if the set is not ordered, it is worth building a hash table:
rls = Dispatch @ Append[Thread[set -> True], _ -> False];
Now with a denser sampling:
Do[IntegerQ @ BinarySearch[set, i], {i, 1, 1*^7, 500}] // Timing // First
Do[Replace[i, rls], {i, 1, 1*^7, 500}] // Timing // First
1.748
0.015
Of course if you can test them all at once it's even better (note very dense sampling):
Do[Replace[i, rls], {i, 1, 1*^7, 15}] // Timing // First
Replace[Range[1, 1*^7, 15], rls, {1}] // Timing // First
0.671
0.39
Now, if all your elements are machine-size positive integers we can take this farther by building an array, then extracting values with Part:
sa = SparseArray[Partition[set, 1] -> True, 1*^7, False];
sa[[ Range[1, 1*^7, 15] ]] // Timing // First
0.0268
This is about 1.1 million times faster per lookup than MemberQ on this set. It requires every test element to be a valid index into the array (1 through 10^7 here); anything outside that range produces Part::partw error messages. You could, however, Clip the input, sending out-of-bounds values to a known-False position. Building the SparseArray has some overhead (~0.34 second), but once it is built, element tests are very fast.
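The Clip idea mentioned above could look roughly like the sketch below. It is only an illustration on a tiny set, not the answerer's code: the names limit, smallSet, mask, and inSetQ are hypothetical, and it assumes every query is an integer.

```mathematica
(* Hypothetical small example; limit, smallSet, mask and inSetQ are illustrative names *)
limit = 20;                      (* largest value the array indexes directly *)
smallSet = {1, 3, 7, 11, 19};    (* positive integers, all <= limit *)

(* One spare slot at limit + 1 is never set, so it always holds False *)
mask = SparseArray[Partition[smallSet, 1] -> True, limit + 1, False];

(* Clip sends every query below 1 or above limit to the spare slot,
   so Part never sees an out-of-range (or zero/negative) index *)
inSetQ[queries_List] :=
  Normal @ mask[[Clip[queries, {1, limit}, {limit + 1, limit + 1}]]];

inSetQ[{3, 4, 0, 25, 19}]
```

Clipping also covers zero and negative queries, which Part would otherwise treat as the head or as positions counted from the end rather than reject.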
You need to add IntegerQ to the output of BinarySearch to get a Boolean result. – rm -rf Feb 05 '14 at 19:53
@Mr.Wizard I just love it when a Q turns into a benchmarking competition :D +1 – Sektor Feb 05 '14 at 20:27
@Mr.Wizard The sparse array approach is fast, but you're making an implicit assumption that all the keys being searched are less than the max value in the list. If the integer being checked is larger, then Part will complain. Of course, you can rectify this with a check, but that will probably also kill all the speed gains and perhaps make it slower than the rest of the approaches. – rm -rf Feb 05 '14 at 20:37
@rm-rf True, and I should have stated this. One could Clip outliers to a known-False position to handle it efficiently I think, but I've spent enough time on this problem now. Also, one could consider a non-sparse array or even a bitmask as alternatives... I'll leave those to someone else. – Mr.Wizard Feb 05 '14 at 20:50
I'm trying to apply the SparseArray method. Apparently, under Mathematica v10.0.2, your code for creating the SparseArray now returns a SparseArray with Length@set dimensions. A Thread[set->True] is necessary in some way, according to the current documentation. I'm not sure if it's an implementation bug, a documentation bug, a previous-version solved bug... – unlikely Jan 15 '15 at 08:36
@unlikely Apparently this code didn't work in version 7 either so I made a mistake that no one called me on until you! Thanks! – Mr.Wizard Jan 15 '15 at 13:25
I think it would be nice to note here that MemberQ unpacks and that it can easily be compiled. I suppose a reference to Leonid's compiled function would also be nice (somehow I couldn't easily get it to work). – Jacob Akkerboom Jun 03 '15 at 17:09
@Jacob The reason I did not use Leonid's code is that, as written, it does not work the way I want: one cannot tell directly from the result whether an element is present or not. – Mr.Wizard Jun 03 '15 at 17:31
If you're going to do several lookups repeatedly in a single set, using Associations in version 10 is orders of magnitude faster than BinarySearch. You can try it out if you have Mathematica 10 for Raspberry Pi (publicly available) or the pre-release version.
set = Union @ RandomInteger[1*^7, 1*^6];
assoc = <|Thread[set -> True]|>; (* One time operation *)
Do[Lookup[assoc, i, False], {i, 1, 1*^7, 77777}] // Timing // First
(* 0.000152 *)
Here are the timings for BinarySearch on my computer:
Do[IntegerQ@BinarySearch[set, i], {i, 1, 1*^7, 77777}] // Timing // First
(* 0.016991 *)
which is about 100 times slower!
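When many candidates need checking at once, Lookup also accepts a list of keys, so the explicit Do loop is not required. The sketch below uses a tiny hypothetical set (smallSet and smallAssoc are illustrative names, not part of the answer above):

```mathematica
(* Hypothetical small example; smallSet and smallAssoc are illustrative names *)
smallSet = {1, 3, 7, 11, 19};
smallAssoc = <|Thread[smallSet -> True]|>;   (* one-time construction *)

(* A list of keys returns one result per key; False is the default for missing keys *)
Lookup[smallAssoc, {3, 4, 376, 19}, False]
(* {True, False, False, True} *)

(* For a single key, KeyExistsQ gives the Boolean directly *)
KeyExistsQ[smallAssoc, 376]
(* False *)
```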
I was about to add a Dispatch table method to my answer, but that kind of took the wind out of my sails. I suppose I still should for pre-v10 users. – Mr.Wizard Feb 05 '14 at 20:03
@Mr.Wizard Yes, Dispatch should still be useful. Association is basically Dispatch on steroids – rm -rf Feb 05 '14 at 20:05
Do you know how the implementations (low level) differ? How is the performance in a direct comparison? – Mr.Wizard Feb 05 '14 at 20:06
@Mr.Wizard On my machine, Association is about twice as fast as Dispatch. I believe one of the developers mentioned somewhere that Association is highly optimized and implemented at a low level in the kernel, but I'm not privy to the details. Association serves the (much needed) role of key-value pair/dictionary, for which we were using rules up until now. – rm -rf Feb 05 '14 at 20:09
I notice that you did not use the <| |> syntax here; a matter of clarity, preference, or need? – Mr.Wizard Feb 05 '14 at 20:18
@Mr.Wizard Association being very new, the <||> syntax doesn't come naturally yet... :) I've updated the post. – rm -rf Feb 05 '14 at 20:20
It sure is a lot cleaner than Dispatch that way! I added a new method to my answer, racing toward ultimate speed. (And I hope other v10/Pi users come by and vote for this answer soon.) – Mr.Wizard Feb 05 '14 at 20:24
+1. Out of curiosity, are these timings on your Raspberry Pi? If so what is the processor speed? – RunnyKine Feb 05 '14 at 20:32
@RunnyKine My "Raspberry Pi" has a quad core 2.6 GHz i7 processor with 16 GB memory ;) – rm -rf Feb 05 '14 at 20:38
@RunnyKine I benchmarked this on my Raspberry Pi and these are the results: For a set with 10^7 elements the Pi runs out of memory. So I reduced it to 10^6 elements. BinarySearch takes about the same time as Lookup (0.002725 and 0.003627 seconds, respectively), but the initial Association takes 21 seconds, so it's slower in most cases. – shrx Feb 05 '14 at 22:04
You can easily Compile binary search, to get competitive performance (I've done that many times). You can also construct a massive compiled binary search, which I believe can be even quite a bit faster. But of course, +1. – Leonid Shifrin Feb 05 '14 at 22:37
@rm-rf The main advantages of Association w.r.t. say a DownValues-based hash table, apart from speed, are that it is stateless / immutable and cheap to copy. This allows Association to play well with the core Mathematica constructs. OTOH, it also has advantages over Dispatch, since you can add new or update old key-value pairs in constant time, getting a new Association, while Dispatch you can't efficiently update once it is formed. – Leonid Shifrin Feb 05 '14 at 22:42
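For readers wondering what a compiled binary search might look like, here is a minimal sketch. It is not Leonid Shifrin's code (which is not shown in this thread); the name bsearchQ is hypothetical, and the sketch assumes a non-empty, sorted list of machine integers. It returns a Boolean directly, so no IntegerQ wrapper is needed.

```mathematica
(* Hypothetical sketch; bsearchQ is an illustrative name.
   Assumes list is sorted ascending and holds machine-size integers. *)
bsearchQ = Compile[{{list, _Integer, 1}, {x, _Integer}},
   Module[{lo = 1, hi = Length[list], mid = 0, found = 0},
    (* Halve the search interval until x is found or the interval is empty *)
    While[lo <= hi && found == 0,
     mid = Quotient[lo + hi, 2];
     If[list[[mid]] == x,
      found = 1,
      If[list[[mid]] < x, lo = mid + 1, hi = mid - 1]]];
    found == 1]];

bsearchQ[{1, 3, 7, 11, 19}, 7]     (* True *)
bsearchQ[{1, 3, 7, 11, 19}, 376]   (* False *)
```

An integer flag is used for found so that both branches of each If have the same type inside Compile.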
Here is a timing comparison between Dispatch and Association, creating a memberQFunction from the solutions by @rm-rf and @Mr.Wizard.
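(* Association-based lookup; missing keys default to False *)
memberQFunction1[set_] := Module[{f, ass},
  ass = <|Thread[set -> True]|>;
  f[x_] := Lookup[ass, x, False];
  f
]
(* Dispatch table with a catch-all rule that returns False *)
memberQFunction2[set_] := Module[{f, rule},
  rule = Dispatch @ Append[Thread[set -> True], _ -> False];
  f[x_] := x /. rule;
  f
]
(* Dispatch table without a catch-all; the fourth argument of If
   returns False when x /. rule is not a Boolean *)
memberQFunction3[set_] := Module[{f, rule},
  rule = Dispatch @ Map[# -> True &, set];
  f[x_] := If[x /. rule, True, False, False];
  f
]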
Now let's create our hashed functions:
set = DeleteDuplicates[RandomInteger[1000000, 1000000]];
setSample = RandomSample[set, 100000];
mQ1 = memberQFunction1[set];
mQ2 = memberQFunction2[set];
mQ3 = memberQFunction3[set];
Testing it we get:
mQ1 /@ setSample // AbsoluteTiming // First
mQ2 /@ setSample // AbsoluteTiming // First
mQ3 /@ setSample // AbsoluteTiming // First
0.156000 (*Association*)
0.202800 (*Dispatch1*)
0.218400 (*Dispatch2*)
Association wins!
I had hoped that the new MemberQ operator form would be hashed, in the same way that Nearest creates a NearestFunction, but that is not the case, so memberQFunction (with Association) is a good alternative.
V 12.1 introduced CreateDataStructure:
A link to its many members: DataStructures
A suitable choice for the question at hand would be SortedMultiset
1. Structure
ds = CreateDataStructure["SortedMultiset"];
Scan[ds["Insert", #] &, Range @ 10]
ds["Elements"]
{1, 2, 3, 4, 5, 6, 7, 8, 9, 10}
ds["Visualization"]
2. Timings
list = Union @ RandomInteger[1*^7, 1*^6];
ds = CreateDataStructure["SortedMultiset"];
Scan[ds["Insert", #] &, list] // Timing // First
5.7673
Once inserted, access to its elements is ultra-fast:
Do[ds["MemberQ", i], {i, 1, 1*^7, 77777}] // Timing // First
0.00152
Compare to "normal" MemberQ:
Do[MemberQ[list, i], {i, 1, 1*^7, 77777}] // Timing // First
4.74134
Interesting. But you will need quite a lot of queries to amortize the huge insertion time of the tree. And I doubt that it will ever be more efficient than a Sort combined with a binary search, at least for simple data types. – Henrik Schumacher Nov 17 '23 at 16:50
Scan[ds["Insert", #] &, list] must have an enormous calling overhead. One can also try to build the tree with CreateDataStructure["SortedMultiset", list]. I had hoped that it would improve the timings, but it barely did so. Which tells me that the data structure is badly designed. I guess this constructor is not implemented on the C/C++ side of the backend. – Henrik Schumacher Nov 17 '23 at 16:50
But you might also consider that a SINGLE MemberQ-Query takes 4.74 seconds. Doesn't amortization come soon enough? On the other hand I agree that a compiled binary search would be more efficient, but not everybody has the expertise to write such code. – eldo Nov 17 '23 at 16:59
nf = Nearest[list];found = Flatten[nf[Range[1, 1*^7, 7777], {1, 0}]];might be a good compromise between built time and lookup time. It outputs all the elements found and not the boolean flags. But that can easily be converted. – Henrik Schumacher Nov 17 '23 at 17:03
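One way the conversion to Boolean flags mentioned in the comment above might be done is sketched here. It is an assumption-laden illustration rather than the commenter's code: smallList, smallNF, queries, and flags are hypothetical names, and it relies on a radius of 0 returning an empty list for values that are not in the data.

```mathematica
(* Hypothetical small example; smallList, smallNF, queries and flags are illustrative names *)
smallList = {1, 3, 7, 11, 19};
smallNF = Nearest[smallList];       (* one-time construction *)

queries = {3, 4, 376, 19};

(* Each query yields {x} when x is in the data and {} otherwise,
   so a non-empty result means "member" *)
flags = (# =!= {}) & /@ smallNF[queries, {1, 0}]
```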

MemberQ[listOfElements,376] – andre314 Feb 05 '14 at 19:54