8

I want to compare two lists. One is very large, and I need to run the comparison many times.

Let's say Length[list1] < Length[list2].

I need to know how many times list1 occurs in list2.

list1 = {1, 0}
list2 = {1, 0, 1, 0, 1, 1}

So the result would be 2 (at positions 1 and 3).

Furthermore, list1 can contain wildcards:

list1 = {1, 2}

Here 2 is a wildcard, so with list2 from above the result would be 3 (at positions 1, 3, and 5).

I solved this with a few For loops. It works, but it is really slow, and I need it to be much faster.

What I got:

With list1 as lMask and list2 as BitData:

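GetFits[i_] := Block[{icount, lMask, ii},
  icount = 0;
  (* The base-3 digits of i form the mask; the digit 2 is the wildcard *)
  lMask = IntegerDigits[i, 3];
  (* Masks that begin or end with a wildcard are rejected with -1 *)
  If[lMask[[-1]] != 2 ,
   If[ lMask[[1]] != 2,
    (* Try every alignment of the mask against BitData *)
    For[ii = 1, ii <= Length[BitData] + 1 - Length[lMask], ii++,
     If[FitAt[lMask, ii] == 1, icount++;];
     ];
    icount
    , -1]
   , -1]
  ]


(* Returns 1 if lMask matches BitData starting at iPos, otherwise 0 *)
FitAt[lMask_, iPos_] := (For[i = 1, i <= Length[lMask], i++,
   (* Wildcard entries (2) match anything and are skipped *)
   If[lMask[[i]] != 2,
     If[lMask[[i]] != BitData[[i + iPos - 1]],
       Return[0]
       ];
     ];
   ];
  1)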
kglr

5 Answers

7
list2 = {1, 0, 1, 0, 1, 1};
list1 = {1, 0};

p2 = Partition[list2, Length[list1], 1];
Count[p2, list1]
Flatten@Position[p2, list1]

2

{1, 3}

Now with 2 as a wildcard.

list1 = {1, 2};
list1 = list1 /. {2 -> _};

Then run the same code again:

p2 = Partition[list2, Length[list1], 1];
Count[p2, list1]
Flatten@Position[p2, list1]

3

{1, 3, 5}

Chris Degnen
6

I assume your lists contain numbers. You can convert them to strings and use StringCount or StringCases:

MyList1 = {1, 0, 0, 1, 0, 0, 1, 0, 0, 1, 1, 1};
MyTest = {1, 0};

StringCount[StringJoin[ToString /@ MyList1], 
StringJoin[ToString /@ MyTest]]

Length[StringCases[StringJoin[ToString /@ MyList1], 
StringJoin[ToString /@ MyTest]]]

This has the added benefit that you can choose to allow or disallow overlapping cases, with Overlaps -> True or Overlaps -> False within StringCount.
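
As a small sketch of the Overlaps option mentioned above (the helper names toStr and countInString are hypothetical, and the approach assumes every list element is a single-digit integer, so that each element maps to exactly one character):

```mathematica
(* Hypothetical helpers, not part of the original answer.
   Assumes every element is a single-digit integer. *)
toStr[l_List] := StringJoin[ToString /@ l];
countInString[list_List, sub_List, overlaps_ : True] :=
  StringCount[toStr[list], toStr[sub], Overlaps -> overlaps];

countInString[{1, 1, 1, 1}, {1, 1}, True]   (* overlapping matches allowed: 3 *)
countInString[{1, 1, 1, 1}, {1, 1}, False]  (* matches must not overlap: 2 *)

(* A wildcard can be written as the string pattern _ (any one character) *)
StringCount["101011", "1" ~~ _, Overlaps -> True]  (* 3 *)
```

For the sliding-window count asked about in the question, Overlaps -> True is probably the setting you want, since the For-loop version tests every starting position.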

xyz
David G. Stork
4

In versions 10.1 and later, there is SequenceCount, which does exactly what is needed:

SequenceCount[list,sub] gives a count of the number of times sub appears as a sublist of list.

list1 = {1, 0};
list2 = {1, 0, 1, 0, 1, 1};

SequenceCount[list2, list1]

2

It also works with patterns:

SequenceCount[list2, {1, _}]

3

Note: The last one could be very slow. See: Performance problems in new Sequence functions.

kglr
3

If your lists are actually binary, then pre-partitioning can be done in an efficient way:

biglist = RandomInteger[{0, 1}, 2000];
Clear[partition];
partition[len_] := partition[len] =
   FromDigits[#, 2] & /@ Partition[biglist, len, 1];
findsub[small_List] :=
  Flatten@Position[partition[Length[small]],
    FromDigits[small, 2], 1, Heads -> False]

 findsub[{1, 1, 0, 1, 1, 0, 1, 0}]

{146, 677, 699, 1220, 1238, 1286, 1663, 1717}

 biglist[[146 ;; 146 + 7]]

{1, 1, 0, 1, 1, 0, 1, 0}

Wildcard version:

findsub[small_List] := Module[
  {s = Flatten@Position[small, Except[2], {1}, Heads -> False]},
  Flatten@Position[partition[Length[small]],
    x_ /; IntegerDigits[x, 2, Length[small]][[s]] == small[[s]],
    1, Heads -> False]]
george2079
  • If the lists are not binary, one can interpret the parts (of length Length@smallist) as numbers with an appropriate base. Then this should work also. – mgamer Nov 08 '14 at 08:00
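
The suggestion in this comment could be sketched as follows. The function names encodeWindows and findSubBase are hypothetical, the sketch assumes all elements are integers in the range 0 to base - 1, and unlike the answer above it does not memoize the encoded windows:

```mathematica
(* Sketch only: encode each window of length len as one integer in the given base.
   Assumes every element lies in 0 .. base - 1. *)
encodeWindows[list_List, len_Integer, base_Integer] :=
  FromDigits[#, base] & /@ Partition[list, len, 1];

(* Positions at which small occurs in list, compared via the encoded integers *)
findSubBase[list_List, small_List, base_Integer] :=
  Flatten@Position[encodeWindows[list, Length[small], base],
    FromDigits[small, base], {1}, Heads -> False];

findSubBase[{2, 0, 1, 2, 0, 1, 1}, {2, 0, 1}, 3]  (* {1, 4} *)
```

Because two windows of the same length have equal encodings only if they are identical, comparing the integers is equivalent to comparing the windows themselves.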
2

Not the best method, but an option that explores the Slot function:

list1={1,0};list2={1,0,1,0,1,1};
Flatten[Position[({list2[[#1]],list2[[#1+1]]}&)/@Range[Length[list2]-1],list1]]
Length[%]

{1,3}

2

list1={1,2};list1=list1/.{2->_};list2={1,0,1,0,1,1};
Flatten[Position[({list2[[#1]],list2[[#1+1]]}&)/@Range[Length[list2]-1],list1]]
Length[%]

{1,3,5}

3
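
The code above hardcodes a pattern of length 2. A possible generalization to any pattern length, using Span inside the pure function, might look like this (the names data, sub, n, and windows are illustrative only):

```mathematica
(* Sketch: same Slot-based idea, but for a pattern of arbitrary length n *)
data = {1, 0, 1, 0, 1, 1};
sub = {1, 0, 1};
n = Length[sub];

(* Build every contiguous window of length n with a pure function *)
windows = data[[# ;; # + n - 1]] & /@ Range[Length[data] - n + 1];

Flatten[Position[windows, sub]]  (* {1, 3} *)
Length[%]                        (* 2 *)
```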

Luciano
LCarvalho