I was asked to look into writing some simulation code for a user, where the simulation output (x) would oscillate between three possible states:
states[x_, γ_] := Which[
x < -γ, 1 (*top*),
x < γ , -1 (*bottom*),
True , 0 (*middle*)]
The simulation essentially boiled down to the following, and what they were interested in is computing the lengths runs in each state - which could be easily computed with SplitBy:
γ = 0.9;
sampleSimulation = RandomChoice[{-1, 0, 1}, 1000000];
runLengths = Switch[Sign[#[[1]]],
1, {"top", Length[#]},
-1, {"bottom", Length[#]},
0, {"middle", Length[#]}
] & /@ SplitBy[sampleSimulation, states[#, γ] &]
However, if the simulation output moved between the "top" and "bottom" states (in either direction) without occupying the "middle" state they needed this information in runLengths via the string "unoccupied middle".
The following code achieves that and is surprisingly performant and the experimenter was happy. However it feels unnecessary to SplitBy and then Reap and Sow with a Do loop to find where the simulation moves seemlessly between "top" and "bottom". What would be a more natural, and less wasteful method to solve such a problem?
runLengthsFunc[list_, γ_] :=
Block[{splitting},
splitting =
Switch[Sign[#[[1]]],
1, {"top", Length[#]}, -1, {"bottom", Length[#]},
0, {"middle", Length[#]}] & /@
SplitBy[list, states[#, γ] &];
Append[Flatten[
Reap[Do[If[(splitting[[k, 1]] == "top" &&
splitting[[k + 1, 1]] == "bottom") || (splitting[[k, 1]] ==
"bottom" && splitting[[k + 1, 1]] == "top"),
Sow[{splitting[[k]], "unoccupied middle"}],
Sow[{splitting[[k]]}]],
{k, Length[splitting] - 1}
]][[2]], 2], splitting[[-1]]]]
Timing:
In[10]:=AbsoluteTiming[runLengthsFunc[sampleSimulation, γ];]
Out[10]:={7.526282, Null}
Solution Choosing
Thanks for the three very different solutions, as requested the focus was on different methodologies rather than sheer blistering speed.
I'm going to accept Mr Wizard's answer as it provided the most different approach (and best learning experience for me) as well as exceptional speed - under a second for an initial set of 1,000,000 states. However, there was an apparently superfluous process; Unitize where a list already contained only 0,1.
Kguler's method falls apart for more than ~3000 elements, but is a great example of pattern matching. And it's great to see in both this and Mr Wizard's solutions the convergent evolution of SE toward Composition.
Leonid's answer introduced me to linked lists which will be a useful tool for similar problems I have. Unfortunately, I can't benchmark Leonid's as he elected to focus on methodology rather than speed so did not include an implementation of runLengths - but the linking of lists is much more performant than my Reap[Do[Sow] approach.
Position[Abs[Differences@sampleSimulation], 2]will give you list of positions where next position is a jump up/down past 0. You can then use that to insert the unoccupied tag. – ciao May 01 '14 at 00:16Unitize; at the moment I agree, it appears superfluous. – Mr.Wizard Jun 05 '14 at 07:47