
I am trying to tally the lengths of consecutive runs of values above a certain threshold in a list.

For instance:

data = {55.5, 65.3, 100.3, 100.1, 100.5, 98.7, 75.2, 101.3, 102.1}

The totals for this data set would be {3, 2} if the tally criterion were all values greater than 100, since the qualifying runs are {100.3, 100.1, 100.5} and {101.3, 102.1}.

While working on this problem, I found the following code:

a = RandomInteger[1, {100}];
Cases[Split[a], {1, ___}] // Tally // Sort

This is very close to what I am trying to do: the code tallies the runs of consecutive 1s and sorts the result.
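
For example, on a short hand-written list in place of the random one (a small illustration of that snippet):

a = {1, 0, 1, 1, 0, 1, 1, 1};
Cases[Split[a], {1, ___}] // Tally // Sort
(* {{{1}, 1}, {{1, 1}, 1}, {{1, 1, 1}, 1}} *)

Each pair is {run, count}: runs of 1s of lengths 1, 2, and 3, each occurring once here.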

However, I can't seem to modify the decision criterion so that it tallies only values above a certain threshold.

How might I go about this?

dixontw

7 Answers


Edit: there is an ambiguity in the question (and my answer) as to whether or not you want to include single-element "runs" in your tally.


If you merely want to count the elements in each run and not also return the elements, I propose:

Cases[Split @ UnitStep[100 - data], x : {0, __} :> Length@x]
{3, 2}
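
To see why this works, here are the intermediate steps on the question's data (a walk-through of the expression above, nothing new):

data = {55.5, 65.3, 100.3, 100.1, 100.5, 98.7, 75.2, 101.3, 102.1};
UnitStep[100 - data]  (* {1, 1, 0, 0, 0, 1, 1, 0, 0} *)
Split[%]              (* {{1, 1}, {0, 0, 0}, {1, 1}, {0, 0}} *)

UnitStep marks every value above 100 with 0 and everything else with 1, so the runs of 0 are exactly the runs above the threshold, and the pattern {0, __} keeps only those of length two or more.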

If you (also) want to return the elements, you could use this:

Module[{us = UnitStep[100 - data], tally},
 tally = Cases[Split[us], x : {0, ___} :> Length@x];
 dynamicPartition[Pick[data, us, 0], tally]
]
{{100.3, 100.1, 100.5}, {101.3, 102.1}}

You'll need my dynamicPartition function described here.
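
dynamicPartition itself isn't reproduced in this post. As a stand-in, a minimal sketch that splits a list into consecutive sublists of the given lengths might look like this (an assumption to keep the example self-contained, not Mr.Wizard's actual optimized implementation):

dynamicPartition[list_, lengths_] :=
 Module[{i = 0},
  (* take the next run of each requested length in turn *)
  Take[list, {i + 1, i += #}] & /@ lengths]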


Timings

Comparing my proposal to rm-rf's:

data = RandomReal[200, 750000];

Cases[Split@UnitStep[100 - data], x : {0, __} :> Length@x] // Length // Timing
{0.109, 93479}

Length /@ Split[data, # > 100 && #2 > 100 &] /. 1 -> Sequence[] // Length // Timing
{0.671, 93479}

Select[Total[Split@Clip[data, {100, 100}, {0, 1}], {2}], # > 1 &] // Length // Timing
{0.655, 93479}

Mr.Wizard

One direct way is to use the second argument of Split and keep only sublists of length > 1:

Length /@ Split[data, # > 100 && #2 > 100 &] /. 1 -> Sequence[]

or by clipping the values before splitting:

Select[Total[Split@Clip[data, {100, 100}, {0, 1}], {2}], # > 1 &]
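
Again on the question's data, the intermediate values look like this (a walk-through of the second form):

Clip[data, {100, 100}, {0, 1}]  (* {0, 0, 1, 1, 1, 0, 0, 1, 1} *)
Total[Split@%, {2}]             (* {0, 3, 0, 2} *)

Clip sends values below 100 to 0 and values above 100 to 1, Total sums each run, and the final Select drops both the zero runs and the singletons.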
rm -rf

Not sure about the efficiency, but how about defining a test function:

testFunction = (# > 100 &)

and then

Length /@ Select[Split[testFunction /@ data], First]

or if you don't want singletons

Length /@ Cases[Split[testFunction /@ data], {True, __}]
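
As a quick check, on the question's sample data (which happens to contain no singleton runs, so the two forms agree):

data = {55.5, 65.3, 100.3, 100.1, 100.5, 98.7, 75.2, 101.3, 102.1};
Length /@ Select[Split[testFunction /@ data], First]      (* {3, 2} *)
Length /@ Cases[Split[testFunction /@ data], {True, __}]  (* {3, 2} *)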
MikeLimaOscar
  • This does not give output that matches mine or rm-rf's; please check it. – Mr.Wizard Aug 13 '13 at 14:30
  • Okay, I'm not sure about the meaning of "consecutive occurrences" -- yours takes one interpretation, mine takes another. Clarification is needed. – Mr.Wizard Aug 13 '13 at 14:36
  • No, you're correct, mine reports "runs" of length one which probably aren't wanted. – MikeLimaOscar Aug 13 '13 at 14:39
  • I'm not so sure; the code in the question includes ___ (BlankNullSequence) which would match your interpretation. It's not clear IMHO. By the way, Thread should be at least slightly faster: Length /@ Select[Split[Thread[data > 100]], First] – Mr.Wizard Aug 13 '13 at 14:41
  • I've added an alternative that excludes singletons, using Cases, but still no Thread -- using the test function means it could be written as a generic RunLengths[data,test] function. – MikeLimaOscar Aug 13 '13 at 14:51

This answer is general and fast: it allows any predicate (not just ones based on a numerical threshold) and uses the fact that the split list alternates between runs of elements that satisfy the predicate and runs that don't:

runLengths[list_, pred_] :=
 With[{bits = Boole[pred@#] & /@ list},
  Length /@ Part[
    Split[bits],
    If[bits[[1]] == 0,
     Span[2, All, 2],
     Span[1, All, 2]]]]
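
For instance, on the question's sample data:

data = {55.5, 65.3, 100.3, 100.1, 100.5, 98.7, 75.2, 101.3, 102.1};
runLengths[data, # > 100 &]  (* {3, 2} *)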

EDIT to add: Some timings:

bigData = RandomReal[{0, 200}, {10000}];

Do[runLengths[bigData, # >= 100 &], {100}] // Timing

{0.218401, Null}

For comparison, here's Mr. Wizard's approach on my machine:

Do[
  Cases[Split@UnitStep[100 - bigData], x : {0, __} :> Length@x],
  {100}] // Timing

{0.249602, Null}

So it's a little faster, and a little more flexible.

EDIT again to add:

I'm assuming we want to keep runs of length 1; those can always be deleted later in any event.

EDIT the third: getting rid of the 1s at the end adds very little overhead:

Do[DeleteCases[runLengths[bigData, # >= 100 &], 1], {100}] // Timing

{0.234002, Null}

Pillsy
  • Strange, using your own code my method is much faster. I am getting {0.78, Null} and {0.156, Null} for the two outputs. Version differences I guess? Also, your method includes "singletons" which may or may not be desired; can your solution be adapted to eliminate these if required? – Mr.Wizard Aug 13 '13 at 14:59
  • Could be version or even architecture variations. FWIW, I'm using Mathematica 9.0.1. – Pillsy Aug 13 '13 at 15:04
  • I have no reason to doubt your timings and you addressed the one concern. +1 – Mr.Wizard Aug 13 '13 at 15:08

As Mr.Wizard pointed out in the comments, my original code tested for elements greater than or equal to 100; a corrected version is in the edit below.

I interpreted the question to include singletons and came up with the following, which I think performs decently:

Length /@ DeleteCases[Split[UnitStep[data - 100]], {0 ..}]

Edit

The following tests for elements greater than 100

Length /@ DeleteCases[Split[UnitStep[100 - data]], {1 ..}]
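
On the question's sample data the surviving runs and the final tally are (a quick check):

DeleteCases[Split[UnitStep[100 - data]], {1 ..}]  (* {{0, 0, 0}, {0, 0}} *)
Length /@ %                                       (* {3, 2} *)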

Thanks to Mr.Wizard for pointing out mistakes and performance issues in this post!

sebhofer
  • I've got more critique, but please know I wouldn't bother except that I hope it helps. You don't need to use 1 - UnitStep[. . . as you can simply delete the 1s instead. Also, your second method is inefficient as you pattern-match twice; it would be better to either use Cases as I did, or Length /@ DeleteCases . . . as you did in your first method. If I were you I'd delete the second form of each one. +1 for the corrected versions. – Mr.Wizard Aug 14 '13 at 09:32
  • @Mr.Wizard True again. Do you just want to take over this answer? :) Editing... – sebhofer Aug 14 '13 at 09:49
  • No, it looks perfect now. :^) – Mr.Wizard Aug 14 '13 at 10:01

Using SequenceSplit, which was introduced in Version 11.3:

Length /@ SequenceSplit[data, {x_ /; x <= 100}]

{3, 2}

{Length /@ #, #} &[SequenceSplit[data, {x_ /; x <= 100}]]

{{3, 2}, {{100.3, 100.1, 100.5}, {101.3, 102.1}}}

eldo

Another way is to use ConsecutiveQ to pick out runs of consecutive positions, as follows:

Length /@ SequenceCases[Position[data, x_ /; x > 100], {__}?ConsecutiveQ]

{3, 2}

{Length@#, Extract[data, #]} & /@ SequenceCases[Position[data, x_ /; x > 100], {__}?ConsecutiveQ]

{{3, {100.3, 100.1, 100.5}}, {2, {101.3, 102.1}}}

E. Chan-López