
Let L be a list of integers, and ⓝ denote an explicit integer.

I thought the most natural way to test whether ⓝ belongs to L is

In[1]  MemberQ[L, ⓝ]

But there are problems.

1>> If Length[L] is too big (around 10^9), the code is actually harmful: Mathematica automatically quits the kernel, and every definition is lost.

2>> Try

 In[2]  MemberQ[Range[10^8],1]//Timing
 Out[2]  (* Takes about 3 seconds, too long *)

If we know ⓝ appears very early in L, then using MemberQ may be inefficient.

The AnyTrue technique is the following:

 In[3]  AnyTrue[L, #==ⓝ&]

Its advantages:

  1. The code works well even when Length[L] is very large.
  2. If ⓝ appears very early in L, it takes very little time (more precisely, the earlier it appears, the shorter the time).
  3. Its usage is much more diverse than that of MemberQ.
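The early-exit behaviour in point 2 and the flexibility in point 3 can be checked with a small sketch. The counter `calls` and the short lists are hypothetical names and data introduced only for illustration; they are not part of the question.

```mathematica
(* Count how many elements the test function is actually applied to. *)
calls = 0;
AnyTrue[Range[10^6], (calls++; # == 3) &]   (* True *)
calls   (* number of elements tested before AnyTrue stopped *)

(* Any predicate can be used, not only equality with a fixed integer. *)
AnyTrue[{4, 6, 8, 9, 11}, PrimeQ]           (* True, because 11 is prime *)
```

If AnyTrue stops at the first match, `calls` should stay far below the length of the list.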

So I'll never use MemberQ if Length[L] is too big or if I know ⓝ appears early in L.

But I found one advantage of MemberQ: it was faster for irregular data.

L=RandomSample[Range[10^8]];
 MemberQ[L, ⓝ]//Timing
 AnyTrue[L, #==ⓝ&]//Timing

There may be better methods. Can you tell me more about this kind of thing? I mean the performance of various tests that can tell whether something exists in a list.

imida k

4 Answers

One possibility for lists of integers is to use Clip:

iMemberQ[l_, s_] := s == Max @ Clip[l, {s, s}, {s-1, s-1}]
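To see why this definition works, here is a minimal sketch on a tiny list (the name `small` and its values are chosen only for illustration). Clip maps every element below s and every element above s to s - 1 and leaves elements equal to s alone, so the maximum is s exactly when s occurs in the list.

```mathematica
(* Hypothetical small list, used only to illustrate the Clip trick. *)
small = {1, 5, 9};

Clip[small, {5, 5}, {4, 4}]         (* {4, 5, 4}: only the element equal to 5 survives *)
Max @ Clip[small, {5, 5}, {4, 4}]   (* 5, so 5 is a member *)

Clip[small, {6, 6}, {5, 5}]         (* {5, 5, 5}: the maximum is 5, not 6, so 6 is absent *)
```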

For your examples:

L = Range[10^8];
iMemberQ[L, 1] //AbsoluteTiming
iMemberQ[L, 10^8] //AbsoluteTiming
iMemberQ[L, 10^9] //AbsoluteTiming

{0.729907, True}

{0.648677, True}

{0.631598, False}

L = RandomSample[Range[10^8]];
iMemberQ[L, 1] //AbsoluteTiming
iMemberQ[L, 10^8] //AbsoluteTiming
iMemberQ[L, 10^9] //AbsoluteTiming  

{0.651394, True}

{0.616799, True}

{0.613475, False}

If you need to do membership tests for multiple elements with the same list, you could consider using Nearest (Nearest is a little slow, but only needs to be done once):

SeedRandom[1];
L = RandomSample @ Range[10^8];

nf = Nearest[Sort @ L]; //AbsoluteTiming
nMemberQ[nf_, s_] := Length[nf[s, {1, 0}]] > 0

nMemberQ[nf, 1] //AbsoluteTiming
nMemberQ[nf, 10^8] //AbsoluteTiming
nMemberQ[nf, -1] //AbsoluteTiming

{21.915, Null}

{0.000957, True}

{0.000041, True}

{0.000995, False}
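The second argument {1, 0} asks the NearestFunction for at most one element within distance 0 of s, so the result is non-empty only when s itself is in the list. A minimal sketch on a tiny list (the name `nfSmall` is hypothetical):

```mathematica
(* Small NearestFunction, used only to illustrate the {n, r} argument. *)
nfSmall = Nearest[{1, 3, 5}];

nfSmall[3, {1, 0}]   (* {3}: 3 is in the list *)
nfSmall[4, {1, 0}]   (* {}: nothing lies within distance 0 of 4 *)
```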

Carl Woll

list1 = Range[10^8];
list2 = RandomInteger[10^10, 10^8];
list3 = RandomInteger[10, 10^8];

elem = 5;


MemberQ[list1, elem] // AbsoluteTiming
MemberQ[list2, elem] // AbsoluteTiming
MemberQ[list3, elem] // AbsoluteTiming

memberQCompiled = Compile[{{a, _Integer, 1}, {e, _Integer}}, MemberQ[a, e]];
memberQCompiled[list1, elem] // AbsoluteTiming
memberQCompiled[list2, elem] // AbsoluteTiming
memberQCompiled[list3, elem] // AbsoluteTiming


Random`Private`PositionsOf[list1, elem] =!= {} // AbsoluteTiming
Random`Private`PositionsOf[list2, elem] =!= {} // AbsoluteTiming
Random`Private`PositionsOf[list3, elem] =!= {} // AbsoluteTiming

(* Using librarylink, need a C++ compiler *)
isMember[list1, elem] // AbsoluteTiming
isMember[list2, elem] // AbsoluteTiming
isMember[list3, elem] // AbsoluteTiming


Needs["CCompilerDriver`"];
$CCompiler = {"Compiler" -> CCompilerDriver`GenericCCompiler`GenericCCompiler, "CompilerInstallation" -> "C:/msys64/mingw64", "CompilerName" -> "g++.exe", "CompileOptions" -> "-O2"};

src = "
#include \"WolframLibrary.h\"
#include <algorithm>

// LibraryLink entry point: returns 0 on success, result is passed back through Res.
EXTERN_C DLLEXPORT int is_member(WolframLibraryData libData, mint Argc, MArgument *Args, MArgument Res)
{
    mint *in_data;
    MTensor in = MArgument_getMTensor(Args[0]);
    mint elem = MArgument_getInteger(Args[1]);
    in_data = libData->MTensor_getIntegerData(in);
    mint len = *libData->MTensor_getDimensions(in);
    // Compare as mint so that values outside the int range are not truncated.
    bool ret = std::any_of(in_data, in_data + len, [&](mint x) { return x == elem; });
    MArgument_setBoolean(Res, ret);
    return 0;
}
";

lib = CreateLibrary[src, "lib"];
isMember = LibraryFunctionLoad[lib, "is_member", {{Integer, 1, "Shared"}, Integer}, True | False];
isMember[Range[3], #] & /@ {2, 4}

chyanog

You are much better off using Not[FreeQ[expr, form]] in place of MemberQ in almost all cases. MemberQ has a number of problems, including the tests it uses for equality and the depth of its search into expr. I don't remember the specifics, but there are enough of them that you are justified in ignoring MemberQ completely. This is not to disparage the other answers to your question, which look interesting.
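As one illustration of the search-depth difference, here is a minimal sketch on a hypothetical nested list. By default MemberQ looks only at level 1, while FreeQ searches all levels and also looks at heads, so the two are not drop-in replacements for each other on nested data.

```mathematica
(* Hypothetical nested list, used only to show the different search depths. *)
nested = {{1, 2}, 3};

MemberQ[nested, 1]             (* False: 1 is not at level 1 *)
! FreeQ[nested, 1]             (* True: FreeQ searches all levels *)
MemberQ[nested, 1, Infinity]   (* True: an explicit level specification widens the search *)

MemberQ[{1, 2}, List]          (* False: heads are not examined by default *)
! FreeQ[{1, 2}, List]          (* True: FreeQ also matches the head List *)
```

For a flat list of integers, as in the question, both tests give the same answer.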

CElliott

The problem with some functions and performance is that they unpack packed arrays. When the packed array is large, the unpacked array is larger and may exhaust the memory resources available. Functions such as MemberQ, FreeQ, FirstPosition, etc. unpack, but AnyTrue does not unpack. This is the principal difference in speed on an integer array.

Packed example:

ll = Range[10^7];

MemberQ[ll, 1] // RepeatedTiming
AnyTrue[ll, # == 1 &] // RepeatedTiming
! FreeQ[ll, 1] // RepeatedTiming
(*
  {0.462118, True}
  {6.21252*10^-7, True}
  {0.421446, True}
*)

MemberQ[ll, 10^7] // RepeatedTiming
AnyTrue[ll, # == 10^7 &] // RepeatedTiming
! FreeQ[ll, 10^7] // RepeatedTiming
(*
  {0.680443, True}
  {6.57062, True}    <-- AnyTrue pretty bad here!
  {0.660157, True}
*)

Unpacked example:

lup = Developer`FromPackedArray@ll; // AbsoluteTiming
(*  {0.424071, Null}  *)

MemberQ[lup, 1] // RepeatedTiming
AnyTrue[lup, # == 1 &] // RepeatedTiming
! FreeQ[lup, 1] // RepeatedTiming
(*
  {4.34429*10^-7, True}    <-- All methods roughly equivalent
  {5.30303*10^-7, True}    <--
  {5.77685*10^-7, True}    <--
*)

MemberQ[lup, 10^7] // RepeatedTiming
AnyTrue[lup, # == 10^7 &] // RepeatedTiming
! FreeQ[lup, 10^7] // RepeatedTiming
(*
  {0.223027, True}
  {5.92986, True}    <-- AnyTrue is still pretty bad here!
  {0.231383, True}
*)

The timing for unpacking is roughly equal to the timing of MemberQ on the packed example.
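Whether a given list is packed can be checked directly before choosing a method. This is a minimal sketch; the name `packed` is hypothetical, and the last line is left as a comment because it only changes message settings.

```mathematica
packed = Range[10];

Developer`PackedArrayQ[packed]                              (* True *)
Developer`PackedArrayQ[Developer`FromPackedArray[packed]]   (* False *)
Developer`PackedArrayQ[Developer`ToPackedArray[{1, 2, 3}]]  (* True *)

(* On["Packing"] turns on messages that report when an array is unpacked. *)
```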

Michael E2