18

Is there a short and easier way to count number of sequences in a list?

Let's say I have three lists:

list1 = {0,0,0,0,0,0,0,1,1,1,1,1,0,0,0,0,0,0,0}

list2 = {0,0,1,1,1,0,0,0,0,0,0,0,0,0,1,1,1,1,1}

list3 = {1,1,1,0,0,0,0,0,1,1,0,0,1,1,1,1,0,0,0}

There is one sub-list of 1's at list1, 2 sub-lists (two islands) of 1's at list2 and 3 sub-lists of 1 at list3. So counting function should return to me 1 for list1, 2 for list2 and 3 for list3. I am not interested in number of 1's in a sub-list.

I am using a previous Mathematica version, so I cannot use SequenceCount function. Thank you.

gurluk
  • 475
  • 2
  • 9

14 Answers14

20

Here is one way:

oneseqs[lst_] := Total @ Unitize @ Total[Split[lst], {2}]
Leonid Shifrin
  • 114,335
  • 15
  • 329
  • 420
18

This s/b quite quick (particularly with long lists):

MorphologicalEulerNumber[Image@{list}]

And this is even faster...

Length[With[{d = Differences@Prepend[list, 0]}, Pick[d, d, 1]]]

This seems quite quick:

f = Compile[{{z, _Integer, 1}}, Module[{c = True, cnt = 0},
   Do[
    If[c && x == 1, cnt++; c = False; Continue[]];
    If[x == 0, c = True];,
    {x, z}];
   cnt]];
ciao
  • 25,774
  • 2
  • 58
  • 139
13

Another method:

oneseqs[list_] := Count[Append[list, 0] - Prepend[list, 0], 1]

Technically, this counts the number of times the sequence shifts from 0 to 1, but that's effectively the same as the number of blocks of 1's.

EDIT: Here are some performance figures for the four methods proposed thus far:

@Leonard Shifrin:

randlist = RandomInteger[1, 10^6];
Timing[Total@Unitize@Total[Split[randlist], {2}]]

(* {0.834908, 249707} *)

@Michael Seifert:

Timing[Count[Append[randlist, 0] - Prepend[randlist, 0], 1]]  

(* {0.077662, 249707} *)

@SquareOne:

Timing[StringJoin[ToString /@ randlist] // StringCount[#, "1" ..] &] 

(* {1.362467, 249707} *)

@algohi, appropriately modified:

count[list_, n_] := Total@Cases[Split[list], {n ..} :> 1] 
Timing[count[randlist, 1]]  

(* {0.304680, 249707} *)
Michael Seifert
  • 15,208
  • 31
  • 68
  • 1
    Interestingly, I wanted to test the timing of the new (v10.1) function SequenceCount but for it seems it takes ... infinite time. Do you get the same behaviour ? : randlist = RandomInteger[1, 10^4] ; SequenceCount[randlist, {1 ..}]. (It works for 10^3). – SquareOne Jul 02 '15 at 21:59
  • 1
    @SquareOne Please read (83325) – Mr.Wizard Jul 03 '15 at 01:18
  • @Mr.Wizard I missed that post. It's a shame ... And why aren't these functions coded at low-level ... I wonder what they would do without MSE contributors. – SquareOne Jul 03 '15 at 07:43
  • @SquareOne Directorial emphasis appears to be strongly in favor of more functionality over good functionality (fast and bug-free). This is a shame. Nevertheless there are improvements being made. SequencePosition is much faster for verbatim (pattern-free) sequences thanks to the low-level-coded GroupTheory`Tools`SublistPosition which it uses in such cases. – Mr.Wizard Jul 03 '15 at 13:18
11

I am late to the party but here is my terse contribution:

f1 = Tr @ Split[#][[All, 1]] &

This is quite a bit faster than Leonid's oneseqs:

x = RandomInteger[1, 500000];

oneseqs[x] // RepeatedTiming
f1[x]      // RepeatedTiming
{0.2992, 125166}

{0.0697, 125166}

For speed I propose:

f2 = Length@# - Tr@# & @ UnitStep @ Differences[# ~Append~ 0] &

This is somewhat faster than rasher/ciao's Differences implementation:

Length[With[{d = Differences@Prepend[x, 0]}, Pick[d, d, 1]]] // RepeatedTiming

f2[x] // RepeatedTiming
{0.00900, 125166}

{0.00782, 125166}

Update

Seeking greater speed I considered working in the binary realm. For that:

f3 = Tr @ IntegerDigits[BitShiftLeft@# ~BitXor~ #, 2]/2 &

The array must be converted to an integer first but even with that overhead it is faster:

x = RandomInteger[1, 1*^7];  (* larger array *)

f2[x]                 // RepeatedTiming

f3 @ FromDigits[x, 2] // RepeatedTiming
{0.155, 2498958}

{0.051, 2498958}

Mr.Wizard
  • 271,378
  • 34
  • 587
  • 1,371
  • Haha. I was looking at this on my phone and saw someone was using the Tr function. I just knew it was MrWizard before I even got finished reading the answer. – kale Jul 03 '15 at 02:20
  • Nice optimization you sly dog! +1 – ciao Jul 03 '15 at 04:03
  • @ciao @Mr.Wizard It seems it's even a little bit faster if you use Join instead of Append or Prepend. – SquareOne Jul 03 '15 at 08:48
  • @SquareOne On a packed integer array Join and and Append appear to perform similarly in 10.1.0. What specific code are you using? – Mr.Wizard Jul 03 '15 at 13:22
  • @ciao Thanks! By the way which nickname would you prefer I use? – Mr.Wizard Jul 03 '15 at 13:38
  • @Mr.Wizard I replaced #~Append~0 with #~Join~{0}. I tested 10^6, 10^7 random integers, Join is always 1-2% faster. – SquareOne Jul 03 '15 at 13:44
  • 1
    @Mr.Wizard: call me whatever you'd like - I'm called lots of things ;-} – ciao Jul 03 '15 at 22:44
8

After all these ingenious and interesting solutions only some simple remarks.

Jacob Akkerboom's solution shows that totally readable is a subjective concept. It took me some time before I really understood it (but it is very hot here today). In fact, his solution is an ingenious implementation of what could be done with Fold as well, thereby producing another (slow) solution for this problem:

clusterFu2[list_] := Module[{d = 0}, Fold[(If[#1 < #2, d++]; #2) &, 0, list]; d]

A timing:

randlist = RandomInteger[1, 5*10^6];
clusterFu[randlist] // Timing
clusterFu2[randlist] // Timing
(*
{1.98121, 1251006}
{7.37885, 1251006}
*)

So, to my surprise, Jacob's solution is about 3.5 times as fast as the construction with Fold.

But the Fold construction can be easily compiled:

clusterFu3 = Compile[{{list, _Integer, 1}}, Module[{last = 0, result = 0},
   Do[If[last < z, result++]; last = z, {z, list}]; result],
   CompilationTarget -> "C"];

Now it is even faster than Mr.Wizard's intruiging function f2:

clusterFu[randlist] // Timing
clusterFu3[randlist] // Timing
f2[randlist] // Timing

(*
{1.96561, 1251006}
{0.0312002, 1251006}
{0.0936006, 1251006}
*)
Fred Simons
  • 10,181
  • 18
  • 49
  • I hope I didn't mislead you with my "totally readable" comment. It was meant as a joke, because the solution is so unconventional. I hope it was a nice puzzle :). I think the reason why it is faster than the Fold solution is quite subtle, but still it makes some sense to me. +1, especially for the Compile solution. – Jacob Akkerboom Jul 03 '15 at 15:17
  • Hi @Jacob. I am always too serious ... But indeed, your solution is, at least for me, very unusual. Essentially, you have an If statement, of which you placed the second argument as part 0 of your uneveluated expression and the third argument as part 1, and then use the Part function for evaluating al these If's for the consecutive elements of list. Wonderful, deep, and as such a nice puzzle as well. – Fred Simons Jul 03 '15 at 15:39
  • I can't compile to C (still!) but I'll trust your timings and give a +1. I updated my answer to squeeze as much as possible from top-level Mathematica code and I think I got close to your clusterFu3 based the ratio of timings. – Mr.Wizard Jul 03 '15 at 19:28
  • I pushed it a bit further and it is now 3X faster than f2 so about the same as your code. – Mr.Wizard Jul 03 '15 at 20:48
  • @Mr.Wizard According to my tests, your f3 is faster (10-20)% than clusterFu3 for 10^4-10^5 rand.int., slower by (5-10)% only for 10^6-10^7 rand., and for 10^8-10¨^9 it's 50-80% slower. – SquareOne Jul 03 '15 at 22:38
  • @Mr.Wizard No need for C - in a quick test the new addition to my answer faster than f3... – ciao Jul 03 '15 at 22:39
  • @ciao My f3 is nearly an order of magnitude faster than your Compile code on my machine on the test in my answer. I can't imagine the CPU used makes that much difference. Would you check your results again? – Mr.Wizard Jul 03 '15 at 22:44
  • @Mr.Wizard: Yeah, just did, and weirdly my Diff. based is faster than both on this machine - vagaries of the loungbook, I suppose (but since I throw them out every six to twelve months since cigar lounge smoke destroys them, new one inbound with a bit more modern specs - though nothing crazy - I won't blow clams on a box that's for fun and is going to die from smoke inhalation pretty quickly ... ( – ciao Jul 03 '15 at 22:47
6

for any number you can use:

count[list_, n_] := Total@Cases[Split[list], {n,n ..} :> 1]
count[list1, 1]
(*1*)
count[list1, 0]
(*2*)
Basheer Algohi
  • 19,917
  • 1
  • 31
  • 78
5

Using StringCount:

StringJoin[ToString /@ list3] // StringCount[#, "1"..] &

3

SquareOne
  • 7,575
  • 1
  • 15
  • 34
5

Not very fast, but totally readable

clusterFu[list_] :=
 Module[
  {d = 0,
   bool = True},
  Unevaluated[
    Set[bool, True]@If[bool, bool = False; d++]
    ][[list]]; d
  ]
Jacob Akkerboom
  • 12,215
  • 45
  • 79
3

Another way:

f[list_] := Plus @@@ ImageData@ HitMissTransform[Image@{list}, {{-1, 1}}, Padding -> 0]
Dr. belisarius
  • 115,881
  • 13
  • 203
  • 453
3

Untested, but this is effectively the same strategy as Michael's:

Count[ListCorrelate[{-1, 1}, list, {-1, 1}, 0], 1]
J. M.'s missing motivation
  • 124,525
  • 11
  • 401
  • 574
  • 1
    It works, and it is faster. Its run time does not seem to be sensitive to whether the input is packed or not, and is 85% that of Michael's in the packed case, and 60% otherwise. (I upvoted before I checked this, because I had faith... but I was not expecting the performance to be insensitive to packing.) – Oleksandr R. Jul 04 '15 at 12:57
  • I am thankful that the faith was deserved in this case, @Oleksandr. :) – J. M.'s missing motivation Jul 04 '15 at 13:09
2
a = {0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0};
b = {0, 0, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1};
c = {1, 1, 1, 0, 0, 0, 0, 0, 1, 1, 0, 0, 1, 1, 1, 1, 0, 0, 0};

Using SequenceCount (new in 10.1)

SequenceCount[#, {1 ..}] & /@ {a, b, c}

{1, 2, 3}

eldo
  • 67,911
  • 5
  • 60
  • 168
2
oneSeqs = Tr @* DeleteAdjacentDuplicates;

oneSeqs /@ {list1, list2, list3}
{1, 2, 3}
WolframLanguageData["DeleteAdjacentDuplicates", "VersionIntroduced"]
13.1
kglr
  • 394,356
  • 18
  • 477
  • 896
1

Using SequenceSplit and Cases:

l1 = {0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0};
l2 = {0, 0, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1};
l3 = {1, 1, 1, 0, 0, 0, 0, 0, 1, 1, 0, 0, 1, 1, 1, 1, 0, 0, 0};

f = Length@Cases[SequenceSplit[#, s : {1 ..} :> s], {1 ..}] &;

f /@ {l1, l2, l3}

({1, 2, 3})

Or using SequenceSplit, Pick and ContainsOnly:

f = Length@Pick[#, ContainsOnly[#, {1}] & /@ #] 
    &@SequenceSplit[#, s : {1 ..} :> s] &;

f /@ {l1, l2, l3}

({1, 2, 3})

E. Chan-López
  • 23,117
  • 3
  • 21
  • 44
1

Using FindPeaks:

list1 = {0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0};
list2 = {0, 0, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1};
list3 = {1, 1, 1, 0, 0, 0, 0, 0, 1, 1, 0, 0, 1, 1, 1, 1, 0, 0, 0};

Length@*FindPeaks /@ {list1, list2, list3}

{1, 2, 3}

Syed
  • 52,495
  • 4
  • 30
  • 85