I'm trying to create a function that randomly returns a value from a list but remembers the values that have been given before. At the end when the list is empty it should return an empty list. Basically like emptying a bucket full of eggs one at a time.

Suppose I have two lists:

data1 = Range[10];
data2 = Range[20];

Assume a function

getRandomItem[l_List]

I tried playing with down-values but that doesn't work.

Calling getRandomItem[data1] twice would give, e.g., {1} and {3}. Calling getRandomItem[data2] twice would give, e.g., {15} and {20}.

At the end as stated before when all items are chosen both getRandomItem[data1] and getRandomItem[data2] should return {}.

I would like to do that without declaring data1 and data2 as global variables, and I don't wish to change or alter them. So I presume the function itself should remember which data it has been given and where it left off the previous time.

rm -rf
Lou
  • Yes I realize that but using global variables is also a bit cumbersome (could add a counter field in the list e.g). So I just wondered if there's a neat way to do this. Perhaps there isn't.. – Lou Apr 11 '12 at 14:15
  • Yes I noticed that :) Using getRandomItem[{1,2,3}] as a downvalue didn't work either since it's up in the downvalue pattern matching stack. – Lou Apr 11 '12 at 14:19
  • Have you looked at RandomSample? If the critical component is sampling without replacement, that is a good place to start. – Andy Ross Apr 11 '12 at 14:21
  • What you describe reminds me a bit of closures (which I haven't used much personally). Please see if my answer is helpful. – Szabolcs Apr 11 '12 at 14:26
  • I did, but it doesn't seem to solve anything. RandomChoice is perfect already. I wanted to push the housekeeping into the function, but it seemed complicated and perhaps best avoided. Calling the function with any list until it's empty is essentially the goal. – Lou Apr 11 '12 at 14:30
  • @Lou I suggest looking at celtschk's answer below. Also, it's generally advisable to wait a little before accepting an answer, as often a better answer comes a little late. – Eli Lansey Apr 11 '12 at 15:42
  • @Eli Ok you're right. It seems that celtschk's answer would be the best solution. I'm not very sure about the ethics of choosing answers and switching one's opinion. It also seems a combined effort. Any guiding rules? – Lou Apr 11 '12 at 16:23
  • @Lou I believe the general idea is that you can change your vote to the best (current) answer, as you see it. – Eli Lansey Apr 11 '12 at 16:34
  • Sounds like what you really want is a random permutation of indices. Look up card-shuffling algorithms, they are extremely simple (or, knowing Mathematica, there's probably a function for that already :P) – BlueRaja - Danny Pflughoeft Apr 11 '12 at 16:52
  • @BlueRaja: It's not that simple... most answers already use the random permutation of indices, but the OP wants only a certain number of those at a time, and wants the function to remember previously returned values till the list is empty – rm -rf Apr 11 '12 at 18:26
  • @R.M. Right, so you create a list of indices, and treat each index like a card in a deck of cards. Then just shuffle the deck. Every time he wants n values from the list, return the top n cards. When the deck is empty, every index has been returned once. (It's much easier/more efficient to remember which numbers need to be returned, rather than which numbers have already been returned, but both have the same effect). – BlueRaja - Danny Pflughoeft Apr 11 '12 at 18:41

5 Answers

One thing you can do is

data1Random = RandomSample[data1];
data2Random = RandomSample[data2];

This gives you a random ordering of the initial dataset, without any repetitions. Then you can just pick them out from that list one by one in order.
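
A minimal sketch of that "pick them out in order" step, assuming a small example list. Part and Take are just one way to read the items off; nothing here tracks the position for you:

```mathematica
data1 = Range[10];
data1Random = RandomSample[data1];   (* random order, no repetitions *)

data1Random[[1]]            (* first draw *)
data1Random[[2]]            (* second draw, a different element of data1 *)
Take[data1Random, {3, 5}]   (* the next three draws at once *)

Sort[data1Random] === data1   (* True: the shuffle is a permutation of data1 *)
```

The original data1 is left untouched; only the shuffled copy is read.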

Edit Thinking along the lines of Szabolcs's answer, I've come up with a possible approach to the "drops in a bucket" element of the question. If you use:

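(* state = {shuffled list, number of items not yet handed out} *)
data1Random = {RandomSample[data1], Length[data1]};
(* returns {item, updated state}; the item is {} once the counter reaches 0 *)
getRandomItem[data_] := If[data[[2]] > 0, 
 {data[[1, data[[2]]]], {data[[1]], data[[2]] - 1}}, 
 {{}, {data[[1]], 0}}]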

you can keep track of how many things you've used already. Here's how you'd use this:

{drop, data1Random} = getRandomItem[data1Random]

where drop is the random value, and you re-assign data1Random's counter each time. A quick benchmark:

data1 = Range[100000];
data1Random = {RandomSample[data1], Length[data1]}; (* rebuild the state for the larger list *)
Do[{drop, data1Random} = getRandomItem[data1Random], {Length@data1}]; // AbsoluteTiming

(* ==> 0.9531128 *)

Compare this with Szabolcs's second answer (using his definition of makeDrippingBucket):

bucket = makeDrippingBucket[data1]
Do[bucket[], {Length@data1}]; // AbsoluteTiming

(* ==> 12.9529592 *)

Both are slower than celtschk's solution (the same calls, but with his definition of makeDrippingBucket loaded instead):

bucket = makeDrippingBucket[data1]
Do[bucket[], {Length@data1}]; // AbsoluteTiming

(* ==> 0.3593727 *)

Further Edit Here's a way which does the left-hand-side reassignment within the function:

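(* HoldAll passes the symbol itself, so the assignment below updates the caller's variable *)
SetAttributes[getRandomItem2, HoldAll]
getRandomItem2[data_] := ({drop, data} = If[data[[2]] > 0,
    {data[[1, data[[2]]]], {data[[1]], data[[2]] - 1}}, {{}, {data[[1]], 0}}];
  drop)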

Then, for usage:

data1Random = {RandomSample[data1], Length[data1]};
getRandomItem2[data1Random]

only outputs the random number, and can be re-evaluated until every item has been used, after which it outputs {}. This is actually faster than the previous version (the same benchmark runs in 0.7657132) and is simpler to use.

Eli Lansey

How about making a closure? A closure is a function with an internal state.

makeDrippingBucket[list_] := 
 Module[{bucket = list}, 
  If[bucket === {}, {}, 
    With[{item = RandomChoice[bucket]}, 
     bucket = DeleteCases[bucket, item]; {item}]] &]

Then use this to make a "bucket", like this:

bucket = makeDrippingBucket[{1,2,3,4,5}]

This object has an internal state that changes every time you call it. Every time you call bucket[], it will give you a new number, until it gets empty.

bucket[]

(* ==> {3} *)

EDIT

The same thing, using @Eli's solution of pre-randomizing the list:

makeDrippingBucket[list_] := 
 Module[{bucket = RandomSample[list]}, 
  If[bucket === {}, {}, 
    With[{item = Last[bucket]}, bucket = Most[bucket]; {item}]] &]
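
A usage sketch for the two lists from the question, assuming the pre-randomized definition above (repeated here so the snippet runs on its own). The names bucket1, bucket2 and drips1 are made up for illustration:

```mathematica
makeDrippingBucket[list_] :=
 Module[{bucket = RandomSample[list]},
  If[bucket === {}, {},
    With[{item = Last[bucket]}, bucket = Most[bucket]; {item}]] &]

data1 = Range[10];
data2 = Range[20];

(* each call creates a fresh Module variable, so the buckets are independent *)
bucket1 = makeDrippingBucket[data1];
bucket2 = makeDrippingBucket[data2];

drips1 = Table[bucket1[], {Length[data1]}];   (* ten one-element lists *)
bucket1[]    (* bucket1 is now empty, so this gives {} *)
bucket2[]    (* bucket2 is unaffected and still has 20 items to give *)

Sort[Flatten[drips1]] === data1   (* True: every item came out exactly once *)
data1 === Range[10]               (* True: the original list is unchanged *)
```

The state lives in the Module variable captured by each pure function, so neither data1 nor data2 needs to be modified or tracked globally.
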
Szabolcs
  • That's clever. I imagine that for large arrays this would be computationally expensive, though? – Eli Lansey Apr 11 '12 at 14:26
  • @Eli You're right that your solution is probably faster, I was concentrating on the hidden state :-) – Szabolcs Apr 11 '12 at 14:32
  • I've suggested an alternate version similar to your idea. Any idea which is more efficient? – Eli Lansey Apr 11 '12 at 14:40
  • @Eli I haven't benchmarked at all ... – Szabolcs Apr 11 '12 at 14:42
  • @Szabolcs Yes that seems to be good. I can just define several buckets at time needed. I added an If to check when the list is empty before RandomChoice is called. Thanks! – Lou Apr 11 '12 at 14:45
  • @Szabolcs I did a quick benchmark. This is pretty slow. Not sure why, though, with the improvement. The initial one I gave up on evaluating. – Eli Lansey Apr 11 '12 at 14:49
  • @EliLansey: If you replace Last/Most with First/Rest, does it get faster? – celtschk Apr 11 '12 at 15:04
  • @celtschk Slower, by a little. – Eli Lansey Apr 11 '12 at 15:06
  • Using DeleteCases will remove all duplicates in the list, so your bucket will soon be empty. Try bucket = makeDrippingBucket[{9, 9, 9, 9}] – rm -rf Apr 11 '12 at 15:12
  • @EliLansey: What about this: makeDrippingBucket[list_] := Module[{bucket = RandomSample[list], index = 0, len = Length[list]}, If[index == len, {}, bucket[[++index]]]&] – celtschk Apr 11 '12 at 15:16
  • @celtschk Wow, MUCH faster (0.34). That's actually the thing I had thought to do initially but couldn't quite work out. That's why I include an explicit counter in my version. I'd post it as an answer. – Eli Lansey Apr 11 '12 at 15:20
  • @EliLansey: I've now added the code as answer. – celtschk Apr 11 '12 at 15:29

Here's my solution, based on Szabolcs' solution which used Eli Lansey's solution of pre-randomizing. Basically I've replaced the list manipulation with index calculation.

makeDrippingBucket[list_] :=
  Module[{bucket = RandomSample[list],
          index = 0,
          len = Length[list]},
    If[index == len, {}, bucket[[++index]]]&]

Of the solutions Eli benchmarked, up to now it seems to be the fastest (see the corresponding comment on this post).
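
If the call signature from the question, getRandomItem[list], is wanted, one possible sketch is to cache one closure per distinct list. This is an assumption layered on top of the code above, not part of it: bucketFor is a hypothetical helper, and the item is wrapped in a list here to match the {1}, {3} output the question asks for.

```mathematica
ClearAll[makeDrippingBucket, bucketFor, getRandomItem];

(* index-based closure as above, but returning {item} instead of item *)
makeDrippingBucket[list_] :=
  Module[{bucket = RandomSample[list], index = 0, len = Length[list]},
    If[index == len, {}, {bucket[[++index]]}] &]

(* hypothetical helper: build the closure on first use and memoize it,
   keyed on the contents of the list *)
bucketFor[l_List] := bucketFor[l] = makeDrippingBucket[l]

getRandomItem[l_List] := bucketFor[l][]

data1 = Range[10];
data2 = Range[20];
getRandomItem[data1]                 (* e.g. {7} *)
getRandomItem[data2]                 (* e.g. {15} *)
Table[getRandomItem[data1], {9}];    (* drain the rest of data1 *)
getRandomItem[data1]                 (* {} *)
```

Because the cache is keyed on the list's value, two different variables holding equal lists would share one bucket, and ClearAll[bucketFor] followed by re-evaluating its definition is needed to refill it. Keeping the memoized values on a separate symbol avoids mixing them with the general getRandomItem definition.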

celtschk

Here's a more imperative solution, still based on closing over a mutable symbol, if for some reason you don't want to pre-randomize the list. I'm not sure I'd recommend it, but it's an alternative approach that might be interesting:

Pillsy`DrippingBucket[list_List] := 
 Module[{array = list, fill = Length@list},
  Function[{},
   If[fill == 0, (* bucket is empty! *) 
    $Failed,
    With[{k = RandomInteger[{1, fill}]},
     array[[{k, fill}]] = array[[{fill, k}]];
     array[[fill--]]]]]]

Its speed is comparable to Eli Lansey's approach for a 100000-element list if you're going to drip away the whole bucket; the advantage comes about if you're only using a small number of drips, because you only have to pay for the ones you use. Still, for most applications I'd just use RandomSample.
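
A short usage sketch of the "only a few drips" case. The definition is repeated from above so the snippet runs on its own; the list size and the name firstFive are arbitrary choices for illustration:

```mathematica
Pillsy`DrippingBucket[list_List] :=
 Module[{array = list, fill = Length@list},
  Function[{},
   If[fill == 0, $Failed,
    With[{k = RandomInteger[{1, fill}]},
     array[[{k, fill}]] = array[[{fill, k}]];  (* swap the pick into the last filled slot *)
     array[[fill--]]]]]]                       (* return it and shrink the filled region *)

bucket = Pillsy`DrippingBucket[Range[10^6]];
firstFive = Table[bucket[], {5}]   (* five items, one random integer drawn per drip *)
DuplicateFreeQ[firstFive]          (* True: each slot is handed out at most once *)
```

Note that this version returns the bare item and gives $Failed, rather than {}, once the bucket is empty.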

Pillsy
  • +1 Very nice. This is much faster for fewer drips, but slower by approx a factor of 2 for the whole bucket. – Eli Lansey Apr 11 '12 at 15:23

Here's a solution using Internal`Bag et al. It also does not involve the computational cost of pre-randomizing your list if you aren't going to empty your bucket fully.

Begin["Lou`"];
bag; data;
CreateBucket[list_List] := (bag = Internal`Bag[]; data = list;)
EmptyBucket[] := If[data === {}, {}, 
    ((Internal`StuffBag[bag, data[[#]]];
      data = Drop[data, {#}];)&@RandomInteger[{1, Length[data]}];
      Internal`BagPart[bag, -1])
];

ListSoFar[] := Internal`BagPart[bag, All];
End[];

You can now use it in the following manner:

AppendTo[$ContextPath, "Lou`"];
CreateBucket[{1, 2, 3, 4, 5}]; (* create a bucket of data *)
EmptyBucket[]                  (* empty your bucket one by one *)
(* Out[1]= {5} *)

EmptyBucket[]
(* Out[2]= {1} *)

ListSoFar[]                    (* see what has been output so far *)
(* Out[3]= {5, 1} *)

When you finally empty your bucket, it returns {}.

rm -rf