Count the sequences in an array

Question

Is there a short and easier way to count number of sequences in a list?

Let's say I have three lists:

list1 = {0,0,0,0,0,0,0,1,1,1,1,1,0,0,0,0,0,0,0}

list2 = {0,0,1,1,1,0,0,0,0,0,0,0,0,0,1,1,1,1,1}

list3 = {1,1,1,0,0,0,0,0,1,1,0,0,1,1,1,1,0,0,0}

There is one sub-list of 1's at list1, 2 sub-lists (two islands) of 1's at list2 and 3 sub-lists of 1 at list3. So counting function should return to me 1 for list1, 2 for list2 and 3 for list3. I am not interested in number of 1's in a sub-list.

I am using a previous Mathematica version, so I cannot use SequenceCount function. Thank you.

Do "blocks" consisting of a single 1 count? In other words, should {1,0,1,0} return 0 or 2? — Michael Seifert, Jul 02 '15 at 21:09

score 20 · Accepted Answer · answered Jul 02 '15 at 20:45

20

Here is one way:

oneseqs[lst_] := Total @ Unitize @ Total[Split[lst], {2}]

answered Jul 02 '15 at 20:45

Leonid Shifrin

114,335
15
329
420

I was going to say that it failed to count the 0 sequences, but then I noticed that is exactly what the op wanted. – rcollyer Jul 02 '15 at 20:50
@Leonid That's very elegant and fast. Thank you! – gurluk Jul 02 '15 at 20:50
It counts for this '{0, 1, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0}' two sequence. – Basheer Algohi Jul 02 '15 at 21:06
@Algohi Yes, and this is what it should return in this case, if I understood the question correctly. – Leonid Shifrin Jul 02 '15 at 21:30

ciao · Answer 2 · 2015-07-03T22:40:38.630

18

This s/b quite quick (particularly with long lists):

MorphologicalEulerNumber[Image@{list}]

And this is even faster...

Length[With[{d = Differences@Prepend[list, 0]}, Pick[d, d, 1]]]

This seems quite quick:

f = Compile[{{z, _Integer, 1}}, Module[{c = True, cnt = 0},
   Do[
    If[c && x == 1, cnt++; c = False; Continue[]];
    If[x == 0, c = True];,
    {x, z}];
   cnt]];

edited Jul 03 '15 at 22:40

answered Jul 02 '15 at 21:56

ciao

25,774
2
58
139

wow, 2.5x faster than Michael's ! – SquareOne Jul 02 '15 at 22:06
@SquareOne: LOL, I'd not even tested it, just recalled it from some work I did for the same kind of problem - counting "islands" of some value. IIRC, I came up with a really fast routine that beat this handily - looking for it now... – ciao Jul 02 '15 at 23:57
@SquareOne: Give the new one a spin... – ciao Jul 03 '15 at 00:36
I wanted to post Differences but you beat me to it. A favorite method of mine for related problems. – Mr.Wizard Jul 03 '15 at 00:51

Michael Seifert · Answer 3 · 2015-07-02T21:38:06.220

13

Another method:

oneseqs[list_] := Count[Append[list, 0] - Prepend[list, 0], 1]

Technically, this counts the number of times the sequence shifts from 0 to 1, but that's effectively the same as the number of blocks of 1's.

EDIT: Here are some performance figures for the four methods proposed thus far:

@Leonard Shifrin:

randlist = RandomInteger[1, 10^6];
Timing[Total@Unitize@Total[Split[randlist], {2}]]

(* {0.834908, 249707} *)

@Michael Seifert:

Timing[Count[Append[randlist, 0] - Prepend[randlist, 0], 1]]  

(* {0.077662, 249707} *)

@SquareOne:

Timing[StringJoin[ToString /@ randlist] // StringCount[#, "1" ..] &] 

(* {1.362467, 249707} *)

@algohi, appropriately modified:

count[list_, n_] := Total@Cases[Split[list], {n ..} :> 1] 
Timing[count[randlist, 1]]  

(* {0.304680, 249707} *)

edited Jul 02 '15 at 21:38

answered Jul 02 '15 at 20:51

Michael Seifert

15,208
31
68

1

Interestingly, I wanted to test the timing of the new (v10.1) function SequenceCount but for it seems it takes ... infinite time. Do you get the same behaviour ? : randlist = RandomInteger[1, 10^4] ; SequenceCount[randlist, {1 ..}]. (It works for 10^3). – SquareOne Jul 02 '15 at 21:59
1

@SquareOne Please read (83325) – Mr.Wizard Jul 03 '15 at 01:18
@Mr.Wizard I missed that post. It's a shame ... And why aren't these functions coded at low-level ... I wonder what they would do without MSE contributors. – SquareOne Jul 03 '15 at 07:43
@SquareOne Directorial emphasis appears to be strongly in favor of more functionality over good functionality (fast and bug-free). This is a shame. Nevertheless there are improvements being made. SequencePosition is much faster for verbatim (pattern-free) sequences thanks to the low-level-coded GroupTheory`Tools`SublistPosition which it uses in such cases. – Mr.Wizard Jul 03 '15 at 13:18

Mr.Wizard · Answer 4 · 2015-07-03T20:47:37.860

11

I am late to the party but here is my terse contribution:

f1 = Tr @ Split[#][[All, 1]] &

This is quite a bit faster than Leonid's oneseqs:

x = RandomInteger[1, 500000];

oneseqs[x] // RepeatedTiming
f1[x]      // RepeatedTiming

{0.2992, 125166}
{0.0697, 125166}

For speed I propose:

f2 = Length@# - Tr@# & @ UnitStep @ Differences[# ~Append~ 0] &

This is somewhat faster than rasher/ciao's Differences implementation:

Length[With[{d = Differences@Prepend[x, 0]}, Pick[d, d, 1]]] // RepeatedTiming

f2[x] // RepeatedTiming

{0.00900, 125166}
{0.00782, 125166}

Update

Seeking greater speed I considered working in the binary realm. For that:

f3 = Tr @ IntegerDigits[BitShiftLeft@# ~BitXor~ #, 2]/2 &

The array must be converted to an integer first but even with that overhead it is faster:

x = RandomInteger[1, 1*^7];  (* larger array *)

f2[x]                 // RepeatedTiming

f3 @ FromDigits[x, 2] // RepeatedTiming

{0.155, 2498958}
{0.051, 2498958}

edited Jul 03 '15 at 20:47

answered Jul 03 '15 at 01:16

Mr.Wizard

271,378
34
587
1,371

Haha. I was looking at this on my phone and saw someone was using the Tr function. I just knew it was MrWizard before I even got finished reading the answer. – kale Jul 03 '15 at 02:20
Nice optimization you sly dog! +1 – ciao Jul 03 '15 at 04:03
@ciao @Mr.Wizard It seems it's even a little bit faster if you use Join instead of Append or Prepend. – SquareOne Jul 03 '15 at 08:48
@SquareOne On a packed integer array Join and and Append appear to perform similarly in 10.1.0. What specific code are you using? – Mr.Wizard Jul 03 '15 at 13:22
@ciao Thanks! By the way which nickname would you prefer I use? – Mr.Wizard Jul 03 '15 at 13:38
@Mr.Wizard I replaced #~Append~0 with #~Join~{0}. I tested 10^6, 10^7 random integers, Join is always 1-2% faster. – SquareOne Jul 03 '15 at 13:44
1

@Mr.Wizard: call me whatever you'd like - I'm called lots of things ;-} – ciao Jul 03 '15 at 22:44

score 8 · Answer 5 · answered Jul 03 '15 at 14:21

8

After all these ingenious and interesting solutions only some simple remarks.

Jacob Akkerboom's solution shows that totally readable is a subjective concept. It took me some time before I really understood it (but it is very hot here today). In fact, his solution is an ingenious implementation of what could be done with Fold as well, thereby producing another (slow) solution for this problem:

clusterFu2[list_] := Module[{d = 0}, Fold[(If[#1 < #2, d++]; #2) &, 0, list]; d]

A timing:

randlist = RandomInteger[1, 5*10^6];
clusterFu[randlist] // Timing
clusterFu2[randlist] // Timing
(*
{1.98121, 1251006}
{7.37885, 1251006}
*)

So, to my surprise, Jacob's solution is about 3.5 times as fast as the construction with Fold.

But the Fold construction can be easily compiled:

clusterFu3 = Compile[{{list, _Integer, 1}}, Module[{last = 0, result = 0},
   Do[If[last < z, result++]; last = z, {z, list}]; result],
   CompilationTarget -> "C"];

Now it is even faster than Mr.Wizard's intruiging function f2:

clusterFu[randlist] // Timing
clusterFu3[randlist] // Timing
f2[randlist] // Timing

(*
{1.96561, 1251006}
{0.0312002, 1251006}
{0.0936006, 1251006}
*)

answered Jul 03 '15 at 14:21

Fred Simons

10,181
18
49

I hope I didn't mislead you with my "totally readable" comment. It was meant as a joke, because the solution is so unconventional. I hope it was a nice puzzle :). I think the reason why it is faster than the Fold solution is quite subtle, but still it makes some sense to me. +1, especially for the Compile solution. – Jacob Akkerboom Jul 03 '15 at 15:17
Hi @Jacob. I am always too serious ... But indeed, your solution is, at least for me, very unusual. Essentially, you have an If statement, of which you placed the second argument as part 0 of your uneveluated expression and the third argument as part 1, and then use the Part function for evaluating al these If's for the consecutive elements of list. Wonderful, deep, and as such a nice puzzle as well. – Fred Simons Jul 03 '15 at 15:39
I can't compile to C (still!) but I'll trust your timings and give a +1. I updated my answer to squeeze as much as possible from top-level Mathematica code and I think I got close to your clusterFu3 based the ratio of timings. – Mr.Wizard Jul 03 '15 at 19:28
I pushed it a bit further and it is now 3X faster than f2 so about the same as your code. – Mr.Wizard Jul 03 '15 at 20:48
@Mr.Wizard According to my tests, your f3 is faster (10-20)% than clusterFu3 for 10^4-10^5 rand.int., slower by (5-10)% only for 10^6-10^7 rand., and for 10^8-10¨^9 it's 50-80% slower. – SquareOne Jul 03 '15 at 22:38
@Mr.Wizard No need for C - in a quick test the new addition to my answer faster than f3... – ciao Jul 03 '15 at 22:39
@ciao My f3 is nearly an order of magnitude faster than your Compile code on my machine on the test in my answer. I can't imagine the CPU used makes that much difference. Would you check your results again? – Mr.Wizard Jul 03 '15 at 22:44
@Mr.Wizard: Yeah, just did, and weirdly my Diff. based is faster than both on this machine - vagaries of the loungbook, I suppose (but since I throw them out every six to twelve months since cigar lounge smoke destroys them, new one inbound with a bit more modern specs - though nothing crazy - I won't blow clams on a box that's for fun and is going to die from smoke inhalation pretty quickly ... ( – ciao Jul 03 '15 at 22:47

Basheer Algohi · Answer 6 · 2015-07-02T21:01:34.800

6

for any number you can use:

count[list_, n_] := Total@Cases[Split[list], {n,n ..} :> 1]
count[list1, 1]
(*1*)
count[list1, 0]
(*2*)

edited Jul 02 '15 at 21:01

answered Jul 02 '15 at 20:56

Basheer Algohi

19,917
1
31
78

count[{1, 0, 1}, 0] and count[{1, 0, 1}, 1] both return 0, which I don't think is the desired result. – Michael Seifert Jul 02 '15 at 21:07
Probable that is correct because no sequence of any number ! – Basheer Algohi Jul 02 '15 at 21:08
The OP clarified that sequences of a single 1 should count towards the total. – Michael Seifert Jul 02 '15 at 21:30
In this case replace {n,n ..} by {n ..}. it will work fine. – Basheer Algohi Jul 02 '15 at 21:37

score 5 · Answer 7 · answered Jul 02 '15 at 21:00

5

Using StringCount:

StringJoin[ToString /@ list3] // StringCount[#, "1"..] &

3

answered Jul 02 '15 at 21:00

SquareOne

7,575
1
15
34

Jacob Akkerboom · Answer 8 · 2015-07-03T08:39:00.480

5

Not very fast, but totally readable

clusterFu[list_] :=
 Module[
  {d = 0,
   bool = True},
  Unevaluated[
    Set[bool, True]@If[bool, bool = False; d++]
    ][[list]]; d
  ]

edited Jul 03 '15 at 08:39

answered Jul 03 '15 at 08:06

Jacob Akkerboom

12,215
45
79

score 3 · Answer 9 · answered Jul 02 '15 at 22:14

3

Another way:

f[list_] := Plus @@@ ImageData@ HitMissTransform[Image@{list}, {{-1, 1}}, Padding -> 0]

answered Jul 02 '15 at 22:14

Dr. belisarius

115,881
13
203
453

score 3 · Answer 10 · answered Jul 02 '15 at 22:20

3

Untested, but this is effectively the same strategy as Michael's:

Count[ListCorrelate[{-1, 1}, list, {-1, 1}, 0], 1]

answered Jul 02 '15 at 22:20

J. M.'s missing motivation

124,525
11
401
574

1

It works, and it is faster. Its run time does not seem to be sensitive to whether the input is packed or not, and is 85% that of Michael's in the packed case, and 60% otherwise. (I upvoted before I checked this, because I had faith... but I was not expecting the performance to be insensitive to packing.) – Oleksandr R. Jul 04 '15 at 12:57
I am thankful that the faith was deserved in this case, @Oleksandr. :) – J. M.'s missing motivation Jul 04 '15 at 13:09

score 2 · Answer 11 · answered Dec 27 '23 at 00:39

a = {0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0};
b = {0, 0, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1};
c = {1, 1, 1, 0, 0, 0, 0, 0, 1, 1, 0, 0, 1, 1, 1, 1, 0, 0, 0};

Using SequenceCount (new in 10.1)

SequenceCount[#, {1 ..}] & /@ {a, b, c}

{1, 2, 3}

score 2 · Answer 12 · answered Dec 27 '23 at 00:54

2

oneSeqs = Tr @* DeleteAdjacentDuplicates;

oneSeqs /@ {list1, list2, list3}

{1, 2, 3}

WolframLanguageData["DeleteAdjacentDuplicates", "VersionIntroduced"]

13.1

answered Dec 27 '23 at 00:54

kglr

394,356
18
477
896

E. Chan-López · Answer 13 · 2023-12-27T01:30:24.947

Using SequenceSplit and Cases:

l1 = {0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0};
l2 = {0, 0, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1};
l3 = {1, 1, 1, 0, 0, 0, 0, 0, 1, 1, 0, 0, 1, 1, 1, 1, 0, 0, 0};
f = Length@Cases[SequenceSplit[#, s : {1 ..} :> s], {1 ..}] &;
f /@ {l1, l2, l3}
({1, 2, 3})

Or using SequenceSplit, Pick and ContainsOnly:

f = Length@Pick[#, ContainsOnly[#, {1}] & /@ #] 
    &@SequenceSplit[#, s : {1 ..} :> s] &;
f /@ {l1, l2, l3}
({1, 2, 3})

score 1 · Answer 14 · answered Dec 27 '23 at 07:59

Using FindPeaks:

list1 = {0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0};
list2 = {0, 0, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1};
list3 = {1, 1, 1, 0, 0, 0, 0, 0, 1, 1, 0, 0, 1, 1, 1, 1, 0, 0, 0};
Length@*FindPeaks /@ {list1, list2, list3}

{1, 2, 3}

Count the sequences in an array

14 Answers14

Update

Linked