24

As I understand it, when using Sow[expr] you throw the expr on some private stack which you can Reap afterwards.

Questions: What happens if you don't Reap? Does the sown data remain on this stack? Can this cause issues (memory leaks) if you sow large amounts of data?

Alexey Popkov
Gert

2 Answers

28

Preamble

This complements the answer of Roman with a few more details.

The Reap - Sow implementation is based on an internal object, Internal`Bag, which is properly garbage-collectable. Once an expression wrapped in Reap, inside which Sow has been used, goes out of scope or finishes evaluating, all these objects are GC-ed and the memory is released.

Also, whenever there is no Reap around Sow, no collection of sown values seems to be attempted at all, and no extra memory is used (i.e., Sow then acts like Identity or #&).
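As a minimal illustration of the two modes described above (using only the documented Reap and Sow functions), the following can be evaluated in any session:

```mathematica
(* With no enclosing Reap, Sow simply returns its argument *)
Sow[42]                    (* 42 *)

(* Inside Reap, sown values are collected next to the result *)
Reap[Sow[1]; Sow[2]; 3]    (* {3, {{1, 2}}} *)

(* Reap around code that never calls Sow collects nothing *)
Reap[3]                    (* {3, {}} *)
```

This only shows the visible behavior; it says nothing about memory, which is what the measurements below address.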

Memory consumption measurements

Reap - Sow measurements

Let us now illustrate that. First, measure memory consumption for 3 cases: simple computation (idle function mapped on a large list), same with Sow in place of an idle function, and same with Reap wrapped around code with Sow inside:

ClearAll[$dataSize, data, wors, rwos, rws, storedBag]

$HistoryLength = 0;

$dataSize = 100000;
data = Range[$dataSize];
wors = MaxMemoryUsed[f /@ data] (* No Reap - Sow at all *)
rwos = MaxMemoryUsed[Sow /@ data] (* Sow without Reap *)
rwos - wors (* Possible extra memory used by Sow without Reap *)
rws = MaxMemoryUsed[Reap[Sow /@ data]] (* Sow with surrounding Reap *)
storedBag = rws - rwos (* How much memory the internal structures storing sown results take *)

(* 7989840 7990136 296 8856288 866152 *)

The first conclusion we can draw is similar to what Roman has stated: pretty much no memory is wasted when Sow is used without Reap (the difference is just a few bytes, 296 here), even for decently sized data.

The last value 866152 is what has been used to internally store sown data.

Internal`Bag[] measurements

Let us now experiment with the Internal`Bag[] structure:

bag = Internal`Bag[]; (* Initialize the bag *)
Do[Internal`StuffBag[bag, i], {i, data}]; (* Fill the bag with the same data *)
ByteCount[bag] (* Unfortunately, ByteCount does not give the correct value for bags *)
mu = MemoryInUse[]; (* which is why here we measure the used memory using MemoryInUse *)
Remove[bag]
bagUse = mu - MemoryInUse[]

(* 33 866248 *)

The first number shows that ByteCount cannot be trusted for bags.
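For readers who have not used bags before, here is a small sketch of the bag operations involved. The Internal` context is undocumented, so the behavior shown is what is commonly observed rather than a guaranteed API; the name demoBag is only for illustration:

```mathematica
demoBag = Internal`Bag[];           (* create an empty bag *)
Internal`StuffBag[demoBag, 1];      (* append elements in place *)
Internal`StuffBag[demoBag, 2];
Internal`BagLength[demoBag]         (* 2 *)
Internal`BagPart[demoBag, All]      (* {1, 2} *)
```

Since ByteCount is not reliable here, the element count from Internal`BagLength together with a MemoryInUse difference is about the best available indication of a bag's footprint.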

Comparison

The second number we can compare to the value of storedBag variable, which we obtained earlier and in a completely different way:

bagUse - storedBag

(* 96 *)

I wasn't able to track this remaining difference of 96 bytes down and explain it, but it stays fairly constant when we vary $dataSize within some range, and is a pretty small residual value, compared to the total amount of memory used.

Please note: when running the above code on a fresh kernel, you may need to ignore the first few runs, to start getting stable results similar to those I quoted above. The reason probably has to do with some initialization / autoloading process, although this is just a guess.
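One hedged way to see this warm-up effect is to repeat the same measurement several times in a row and compare the entries. This is only a sketch that assumes the setup used above; the actual numbers will vary between systems and versions:

```mathematica
$HistoryLength = 0;
data = Range[100000];

(* Repeat the "Reap + Sow" minus "Sow only" measurement five times;
   on a fresh kernel the first entries may differ from the later ones *)
Table[
  MaxMemoryUsed[Reap[Sow /@ data]] - MaxMemoryUsed[Sow /@ data],
  {5}
]
```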

What happens if code inside Reap returns early

This has been asked in comments and is a good question. Here is an illustration:

rwsReturn = MaxMemoryUsed[
  Reap[If[# > $dataSize /2, Return[#, Reap], Sow[#]] & /@ data]
]; (* Sow with surrounding Reap, but exiting early *)

storedBagReturn = rwsReturn - rwos (* How much memory the internal structures storing sown results take *)

(* 434160 *)

In this case, I used a 2-argument Return to return early, but the same would've happened had I used Throw / Catch instead. What we see is that memory still has been filled with data up to the point of early return - we get almost exactly half the memory used in the full evaluation case, which is what we would expect here.

Here is a crude way to model how this works:

ClearAll[reap, sow, $storage, $inReap]
$inReap = False;
SetAttributes[{reap}, HoldAll]
reap[code_] := # &@ Block[
  {$inReap = True, $storage}, 
  {code, Internal`BagPart[$storage, All]}
]
sow[arg_] /; !TrueQ[$inReap] := arg;
sow[arg_] := If[ 
  ! ValueQ[$storage],  
  $storage  = Internal`Bag[{arg}]; arg,
  Internal`StuffBag[$storage, arg]; arg
];

The # &@ part in the reap implementation is needed if one wants to be able to use the 2-argument Return on reap; otherwise one can remove it.

This gives the exact same results:

storedBagReturn = MaxMemoryUsed[
  reap[If[# > $dataSize /2, Return[#, reap], sow[#]] & /@ data]
]
storedBagReturn = rwsReturn - rwos (* How much memory the internal structures storing sown results take *)

(* 8424264 434160 *)

So, even though Reap has been interrupted, and both the sown results and the result of the evaluation have been discarded, Reap and Sow otherwise work as usual. What matters is that wrapping Reap around the code creates a dynamic environment in which Sow does collect the data, rather than being idle (which happens when there is no Reap around the code). Whether or not the evaluation is interrupted does not affect the "collecting" vs "idle" mode for Sow.
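The "collecting" vs "idle" distinction, and the discarding of collected data on an early exit, can also be sketched without bags. This is a deliberately simplified model, not how Reap and Sow are actually implemented; myReap, mySow, $collecting and $sown are hypothetical names, and a plain list stands in for the bag:

```mathematica
ClearAll[myReap, mySow, $collecting, $sown];
$collecting = False;
SetAttributes[myReap, HoldAll];

(* Block sets up the dynamic "collecting" environment and a fresh store *)
myReap[code_] := Block[{$collecting = True, $sown = {}}, {code, $sown}];

(* Outside myReap this is idle: it just returns its argument *)
mySow[x_] := (If[TrueQ[$collecting], AppendTo[$sown, x]]; x);

mySow[1]                                           (* 1, nothing is stored *)
myReap[mySow[1]; mySow[2]; 3]                      (* {3, {1, 2}} *)

(* Early exit: Block restores $sown, so the collected data is dropped *)
Catch[myReap[mySow[1]; Throw["early"]; mySow[2]]]  (* "early" *)
ValueQ[$sown]                                      (* False *)
```

In this model the value sown before the Throw did occupy memory while the code was running, which matches the roughly half-sized footprint measured above, and nothing remains referenced once Block exits.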

Summary

The above analysis indicates that:

  • Sow without surrounding Reap does not use any noticeable extra memory (w.r.t. computations without Sow).
  • Memory consumption of Sow with surrounding Reap is in good agreement with what one would expect based on the behavior of the underlying Internal`Bag[] structure.
  • We have seen that bags are automatically GC-ed once they are no longer referenced, which explains why Reap and Sow behave the same way.
Leonid Shifrin
  • 2
    My inference has been that Reap[] sets up bag(s) to store (appropriately tagged) objects. Sow[] looks for bag(s) in which to stuff an object. If there are no bags appropriate for the (tagged) object, Sow[] does nothing. And if nothing, then no memory is used. Perhaps I didn't read your answer carefully enough, but does my description seem an accurate model? -- Further, I'd assume Sow[] stores a pointer and itself has a minimal impact on memory. The real impact comes when Reap constructs the lists of sown objects; then objects might be copied or at least blocked from GC. (+1) – Michael E2 Aug 30 '22 at 18:22
  • 1
    @MichaelE2 I can't think of anything wrong in your description. Of course, my guess here would be as good as yours, since I didn't see the internal code for Reap/Sow. My understanding is that when Reap is wrapped around, it sets some kind of dynamic Block - like environment, which tells Sow to create bag-based containers for intermediate results as needed (i.e. every time a new tag appears, a new container is created and starts populating). When the code inside Reap is done executing, Reap performs tag filtering and copies collected objects into lists, and then bags are released. – Leonid Shifrin Aug 30 '22 at 20:10
  • Substituting for data, Developer`FromPackedArray[data], Table[Range[$dataSize], 10] and (a+b)^2000 // Expand yields varying results for me that have me confused. – Michael E2 Aug 30 '22 at 22:22
  • @MichaelE2 I am getting the same results every time (for the last input I had to use List @@ ((a+b)^2000 // Expand) to run the bag measurement code which expects a list). But I forgot to set $HistoryLength = 0 at the start of my post, while this setting is critical for correct measurements. Also, one has to ignore the first couple of runs of the code on a fresh kernel, since presumably some autoloading / nontrivial state changes take place. I am on Mac OSX, and use a pretty old build 12.1.1. on this machine. – Leonid Shifrin Aug 30 '22 at 22:52
  • @MichaelE2 I made changes to the answer, adding $HistoryLength = 0 and mentioning that the first few runs of code might have to be ignored to reach stable results. – Leonid Shifrin Aug 30 '22 at 23:08
  • 2
    Here's the unpacked example: https://i.stack.imgur.com/6Pa0F.png -- I may have just figured that one out (by giving f a definition): https://i.stack.imgur.com/mw5y9.png -- Reply to new comment: I noticed the need for some warm-up trials. – Michael E2 Aug 30 '22 at 23:09
  • 1
    @LeonidShifrin: if I would create a list using Reap[ Do[ Sow[ expr ] ... ] ] and Reap doesn't get evaluated because, e.g. of a throw statement inside expr, would that also count as Reap getting out of scope or its evaluation finishing? – Gert Sep 02 '22 at 09:24
  • @Gert That's a good question. I have added a new section to my answer, to address it. Check it out. – Leonid Shifrin Sep 02 '22 at 11:27
22

It looks like there's no memory being used by Sow when there is no encompassing Reap.

With Reap

Exit[]
a = Reap[Do[Sow[i], {i, 2^28}]];
MaxMemoryUsed[]
(*    4532982944    *)

which is a reasonable amount of memory: 4 Gigabytes are required for 2^28 numbers taking 16 bytes each.
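The arithmetic behind this estimate can be checked directly. The 16 bytes per number is the figure assumed in this answer; the remainder in the last line is simply the part of the measured peak that the estimate does not account for:

```mathematica
2^28 * 16                  (* 4294967296 bytes *)
2^28 * 16 / 2^30           (* 4, i.e. 4 GiB *)
4532982944 - 2^28 * 16     (* 238015648 bytes beyond the estimate *)
```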

Without Reap

Exit[]
Do[Sow[i], {i, 2^28}];
MaxMemoryUsed[]
(*    69922256    *)

which is 70 Megabytes: there seems to be no storage of the sown numbers anywhere, not even a temporary storage.

Roman