I have a stream of data like this:

0001100111100000111111001110000001111111111000000111000111110000...

(I can represent it as a list, such as {0,0,0,1,1,...}; I guess that's easier to work with.)

Now I want to count how many sequences of two "1"s, three "1"s, etc. there are (the lengths of the zero runs are not important; they're just separators), so that I can show them in a histogram. I have no problem doing this procedurally, but functional programming remains difficult for me. While I don't mind pausing for a cup of coffee (there are 4.8 million data points), I guess functional programming will be orders of magnitude faster. How do I do this with functional programming?

Note
"0011100" counts only as one sequence of length 3; the two sub-sequences of length 2 should not be counted.

stevenvh

5 Answers

If your data is in list form (converting from a string will swamp the advantage), this should be quite a bit faster: 5 to 50+ times faster than the existing answers. The timings below are from the "loungebook", so I'd expect 10+ times faster for all on a workstation:

tOnes = Module[{p = Append[Pick[Range@Length@#, #, 1], 0], sa},
    If[p === {0}, {},
     sa = SparseArray[Subtract[Rest@p, Most@p], Automatic, 1]["AdjacencyLists"];
     Tally[Differences[Prepend[sa, 0]]]]] &;

Comparable in speed, and arguably prettier:

tOnes2 = With[{d = Join[{0}, #, {0}]}, 
    Tally[Differences@DeleteDuplicates@Pick[Accumulate@d, d, 0]]] &;
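
To see why tOnes2 works, it can help to run its steps one at a time on a tiny list. The following is only an illustrative sketch; the names small, acc, atZeros and runLengths are made up for this walkthrough and are not part of the answer above.

```mathematica
(* Small hand-checkable input: runs of ones of lengths 2 and 4 *)
small = {0, 0, 0, 1, 1, 0, 0, 1, 1, 1, 1, 0};

(* Pad with zeros so every run of ones is bounded on both sides *)
d = Join[{0}, small, {0}];

(* Running count of ones seen so far *)
acc = Accumulate[d]
(* {0, 0, 0, 0, 1, 2, 2, 2, 3, 4, 5, 6, 6, 6} *)

(* Keep the running count only at the positions that hold a zero *)
atZeros = Pick[acc, d, 0]
(* {0, 0, 0, 0, 2, 2, 6, 6} *)

(* The count only changes across a run of ones, so the differences
   between the distinct values are the run lengths *)
runLengths = Differences[DeleteDuplicates[atZeros]]
(* {2, 4} *)

Tally[runLengths]
(* {{2, 1}, {4, 1}} *)
```

Because the running count never decreases, DeleteDuplicates only ever collapses adjacent equal values here, which is what makes the differences line up with the runs.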

Comparison:

(* make some data & string/digit equivalents for string/Mr.W solutions *)
data = RandomInteger[{0, 1}, 4000000];
strng = StringJoin[ToString /@ data];
mwdata = FromDigits[data];
ClearSystemCache[]

(* eldo *)
eldotim = 
  First@Timing[
    eldo = Tally@
       Select[StringLength /@ StringSplit[strng, "0"], # > 0 &];];

(* Mr. W *)
mwtim = First@
   Timing[mwr = 
      Tally[Length /@ Split[IntegerDigits@mwdata][[;; ;; 2]]];];

(* 2012rcampion *)
rctim = First@Timing[
    lengths = Cases[Split[data], l : {1, ___} :> Length[l]];
    tally = Tally[lengths];
    ];

(* kguler *)
kgtim = First@
   Timing[tally2 = Tally@StringLength@StringCases[strng, "1" ..];];

(* Me *)
me1tim = First@Timing[me = tOnes@data;];
me2tim = First@Timing[me2 = tOnes2@data;];

Transpose[{{"Mr.W", "eldo", "2012rcampion", "kguler", "Me1", "Me2"},
   {mwtim, eldotim, rctim, kgtim, me1tim, me2tim}}] // TableForm

(* Check *)
me == me2 == tally == eldo == tally2 == mwr

(* True *)

ciao

strng = "0001100111100000111111001110000001111111111000000111000111110000";

tally = Tally@StringLength@StringSplit[strng, "0" ..]
tally2 = Tally@StringLength@StringCases[strng, "1" ..]
tally3 = Tally@StringCases[strng, s : "1" .. :> StringLength[s]]

all give

(* {{2, 1}, {4, 1}, {6, 1}, {3, 2}, {10, 1}, {5, 1}} *)

kglr

data = "0001100111100000111111001110000001111111111000000111000111110000";

Tally@Select[StringLength /@ StringSplit[data, "0"], # > 0 &]

{{2, 1}, {4, 1}, {6, 1}, {3, 2}, {10, 1}, {5, 1}}

eldo

Generate 4M data points:

data = RandomInteger[1, 4000000];

Split into runs of identical elements, then find all sequences of 1s and compute their Lengths:

lengths = Cases[Split[data], l : {1 ..} :> Length[l]];
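
As a quick illustration of the two steps on a list small enough to check by hand (the names small and runs are chosen here only for the example):

```mathematica
(* Tiny input with runs of ones of lengths 2 and 4 *)
small = {0, 0, 0, 1, 1, 0, 0, 1, 1, 1, 1, 0};

(* Split groups consecutive identical elements into sublists *)
runs = Split[small]
(* {{0, 0, 0}, {1, 1}, {0, 0}, {1, 1, 1, 1}, {0}} *)

(* Keep only the sublists made of ones and replace each by its length *)
Cases[runs, l : {1 ..} :> Length[l]]
(* {2, 4} *)
```

The zero runs never match the pattern {1 ..}, so they drop out without needing a separate filtering step.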

From here you can either Tally up all the lengths to use later:

Tally[lengths]//Sort
(* {{1, 499797}, {2, 249940}, {3, 124555}, {4, 62985}, {5, 31233}, {6, 
  15454}, {7, 7912}, {8, 3987}, {9, 1912}, {10, 969}, {11, 459}, {12, 
  238}, {13, 100}, {14, 59}, {15, 30}, {16, 6}, {17, 9}, {18, 6}, {19,
  1}, {20, 1}, {22, 2}} *)

... or make a Histogram directly:

Histogram[lengths, {1}, {"Log", "Count"}]

Update

It's slightly faster (1.5 s vs 1.6 s) to use {1, ___} instead of {1 ..} as the pattern.
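
One way to check this on your own machine is sketched below. The variable names, the fixed seed and the smaller data size are assumptions made for the sketch, and the timings will differ from the figures quoted above.

```mathematica
SeedRandom[1];                      (* fixed seed so both patterns see the same data *)
test = RandomInteger[1, 1000000];
runs = Split[test];

(* Time each pattern on the same pre-split runs *)
t1 = First@AbsoluteTiming[r1 = Cases[runs, l : {1 ..} :> Length[l]];];
t2 = First@AbsoluteTiming[r2 = Cases[runs, l : {1, ___} :> Length[l]];];

{t1, t2}    (* seconds; values depend on the machine *)
r1 === r2   (* expected to be True: every run from Split is homogeneous *)
```

The shortcut is safe only because Split guarantees that a run starting with 1 contains nothing but 1s, so {1, ___} needs to inspect just the first element.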

2012rcampion

Starting with Integer input:

in = 0001100111100000111111001110000001111111111000000111000111110000;

Length /@ Split[IntegerDigits @ in][[;; ;; 2]]
{2, 4, 6, 3, 10, 3, 5}

You can Tally that if you want:

Tally @ %
{{2, 1}, {4, 1}, {6, 1}, {3, 2}, {10, 1}, {5, 1}}

Mr.Wizard