Preamble
This complements Roman's answer with a few more details.
The Reap - Sow implementation is based on an internal object, Internal`Bag, which is properly garbage-collectable. Once an expression wrapped in Reap (with Sow used inside it) goes out of scope or finishes evaluating, all these objects are GC-ed and the memory is released.
Also, when there is no Reap around Sow, no collection of sown values seems to be attempted at all, and no extra memory is used (i.e., Sow then acts like Identity or #&).
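A quick way to see the two modes side by side is the following minimal sketch, which relies only on the documented behavior of Sow and Reap:

```wolfram
(* Without an enclosing Reap, Sow simply returns its argument *)
Sow[5]
(* 5 *)

(* Inside Reap, the same kind of call is collected *)
Reap[Sow[1]; Sow[2]; 3]
(* {3, {{1, 2}}} *)
```

The first call behaves like Identity; the second returns the value of the body together with the list of sown values.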
Memory consumption measurements
Reap - Sow measurements
Let us now illustrate that. First, measure memory consumption for 3 cases: a simple computation (an idle function mapped over a large list), the same with Sow in place of the idle function, and the same with Reap wrapped around the code with Sow inside:
ClearAll[$dataSize, data, wors, rwos, rws, storedBag]
$HistoryLength = 0;
$dataSize = 100000;
data = Range[$dataSize];
wors = MaxMemoryUsed[f /@ data] (* No Reap - Sow at all *)
rwos = MaxMemoryUsed[Sow /@ data] (* Sow without Reap *)
rwos - wors (* Possible extra memory used by Sow without Reap *)
rws = MaxMemoryUsed[Reap[Sow /@ data]] (* Sow with surrounding Reap *)
storedBag = rws - rwos (* How much memory the internal structures storing sown results take *)
(*
7989840
7990136
296
8856288
866152
*)
The first conclusion is similar to what Roman has stated: practically no memory is wasted when Sow is used without Reap (the difference is just a few bytes, 296 here), even for decently sized data.
The last value, 866152, is the memory used to internally store the sown data.
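To put that number in perspective, one can divide it by the number of sown elements. The sketch below only redoes the arithmetic with the values quoted above. Reading the result as roughly one 8-byte slot per element plus some overhead is an assumption, not something this measurement proves.

```wolfram
storedBag = 866152;   (* value quoted above *)
$dataSize = 100000;   (* number of sown elements *)

N[storedBag/$dataSize]   (* bytes of storage per sown element *)
(* 8.66152 *)
```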
Internal`Bag[] measurements
Let us now experiment with the Internal`Bag[] structure:
bag = Internal`Bag[]; (* Initialize the bag *)
Do[Internal`StuffBag[bag, i], {i, data}]; (* Fill the bag with the same data *)
ByteCount[bag] (* Unfortunately, ByteCount does not give the correct value for bags *)
mu = MemoryInUse[]; (* which is why we measure the memory used via MemoryInUse here *)
Remove[bag]
bagUse = mu - MemoryInUse[]
(*
33
866248
*)
The first number shows that ByteCount cannot be trusted for bags.
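For readers who have not used bags before, here is a small sketch of how their contents can be inspected. The Internal` context is undocumented, so treat these calls as commonly used conventions rather than a stable API; demoBag is just an illustrative name.

```wolfram
(* Internal`Bag functions are undocumented and may change between versions *)
demoBag = Internal`Bag[];
Do[Internal`StuffBag[demoBag, i^2], {i, 5}];

Internal`BagLength[demoBag]      (* number of stored elements *)
(* 5 *)

Internal`BagPart[demoBag, All]   (* contents as an ordinary list *)
(* {1, 4, 9, 16, 25} *)
```

Since ByteCount is not informative here, the element count and a MemoryInUse difference, as used above, are the practical ways to gauge the size of a bag.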
Comparison
The second number can be compared to the value of the storedBag variable, which we obtained earlier in a completely different way:
bagUse - storedBag
(* 96 *)
I wasn't able to track down and explain this remaining difference of 96 bytes, but it stays fairly constant when we vary $dataSize within some range, and it is a pretty small residual compared to the total amount of memory used.
Please note: when running the above code on a fresh kernel, you may need to ignore the first few runs before you start getting stable results similar to those I quoted above. The reason probably has to do with some initialization / autoloading process, although this is just a guess.
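One way to reduce the effect of those unstable first runs is to repeat a measurement several times and keep a robust statistic. The helper below is a hypothetical convenience (stableMaxMemory is not a built-in function), and the choice of the median over an odd number of runs is an assumption, not the only reasonable option.

```wolfram
ClearAll[stableMaxMemory];
SetAttributes[stableMaxMemory, HoldFirst];

(* Evaluate expr n times and return the median of the MaxMemoryUsed readings *)
stableMaxMemory[expr_, n_Integer : 5] :=
  Median @ Table[MaxMemoryUsed[expr], {n}]

(* Example: Sow without a surrounding Reap *)
stableMaxMemory[Sow /@ Range[100000]]
```

HoldFirst matters here: without it, the expression would be evaluated once before the helper is entered, and there would be nothing left to measure.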
What happens if code inside Reap returns early
This has been asked in comments and is a good question. Here is an illustration:
rwsReturn = MaxMemoryUsed[
Reap[If[# > $dataSize /2, Return[#, Reap], Sow[#]] & /@ data]
]; (* Sow with surrounding Reap, but exiting early *)
storedBagReturn = rwsReturn - rwos (* How much memory the internal structures storing sown results take *)
(* 434160 *)
In this case, I used a 2-argument Return to return early, but the same would've happened had I used Throw / Catch instead. What we see is that memory has still been filled with data up to the point of the early return: we get almost exactly half the memory used in the full evaluation case, which is what we would expect here.
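For completeness, here is a sketch of the Throw / Catch variant of the same experiment. The names rwsThrow and storedBagThrow are introduced here for illustration only, and no output is quoted because the exact number depends on the session.

```wolfram
$dataSize = 100000;
data = Range[$dataSize];
rwos = MaxMemoryUsed[Sow /@ data];   (* baseline: Sow without Reap *)

(* Same early exit as above, but via Throw / Catch instead of 2-argument Return *)
rwsThrow = MaxMemoryUsed[
   Catch[Reap[If[# > $dataSize/2, Throw[#], Sow[#]] & /@ data]]
   ];

storedBagThrow = rwsThrow - rwos   (* memory attributable to values sown before the Throw *)
```

If the claim above holds, storedBagThrow should come out close to storedBagReturn.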
Here is a crude way to model how this works:
ClearAll[reap, sow, $storage, $inReap]
$inReap = False;
SetAttributes[{reap}, HoldAll]
reap[code_] := # &@ Block[
{$inReap = True, $storage},
{code, Internal`BagPart[$storage, All]}
]
sow[arg_] /; !TrueQ[$inReap] := arg;
sow[arg_] := If[
! ValueQ[$storage],
$storage = Internal`Bag[{arg}]; arg,
Internal`StuffBag[$storage, arg]; arg
];
The # &@ part in the reap implementation is needed if one wants to be able to use the 2-argument Return on reap; otherwise one can remove it.
This gives exactly the same results:
storedBagReturn = MaxMemoryUsed[
reap[If[# > $dataSize /2, Return[#, reap], sow[#]] & /@ data]
]
storedBagReturn = rwsReturn - rwos (* How much memory the internal structures storing sown results take *)
(*
8424264
434160
*)
So, even though Reap has been interrupted, and the sown results have been discarded along with the result of the evaluation, Reap - Sow otherwise work as usual. What matters is that Reap, being wrapped around the code, creates a dynamic environment in which Sow does collect the data rather than being idle (which is what happens when there is no Reap around the code). Whether or not the evaluation is interrupted does not affect the "collecting" vs "idle" mode for Sow.
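The dynamic nature of this environment can be seen in a small sketch, where g is a hypothetical helper defined outside of any Reap:

```wolfram
ClearAll[g];
g[x_] := (Sow[x]; x^2);   (* Sow is buried inside a function defined outside any Reap *)

g[3]
(* 9 : idle mode, nothing is collected *)

Reap[g /@ {1, 2, 3}]
(* {{1, 4, 9}, {{1, 2, 3}}} : collecting mode *)

Catch[Reap[Sow[1]; Throw[2]; Sow[3]]]
(* 2 : the evaluation is interrupted, and the sown 1 is discarded together with the Reap *)
```

The same definition of g collects or stays idle depending only on whether a Reap is active at the time it is called.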
Summary
The above analysis indicates that:
- Sow without surrounding Reap does not use any noticeable extra memory (w.r.t. computations without Sow).
- Memory consumption of Sow with surrounding Reap is in good agreement with what one would expect based on the behavior of the underlying Internal`Bag[] structure.
- We have seen that bags are automatically GC-ed once they are no longer referenced, which explains why Reap and Sow behave the same way.