19

I have read that using Reap and Sow to generate a list of unknown length is much more efficient than using AppendTo over and over. I believe AppendTo works by creating a new list that is one element longer than the given list, iterating over every element in the given list to add it to the new list, and then adding in the new element. Therefore, it makes perfect sense that using AppendTo is inefficient for long lists. But how do Reap and Sow do it better?

The documentation says that Sow[e] "specifies that e should be collected by the nearest enclosing Reap". My best guess is that Sow appends e to a linked list, and Reap converts this to an array–type list at the end. Is this correct? How does the program keep track of the values that Sow specifies should be collected? It has to store it somehow.

In addition, if you evaluate an expression that has Sow in it and do not enclose this in a Reap[], does the information that Sow generates get thrown out, is it not generated in the first place, or is it retained in some hidden way so that if that result is passed to Reap, Reap returns the same thing it would if it was given the unevaluated expression?

Michael E2
  • 235,386
  • 17
  • 334
  • 747
Allison B
  • 366
  • 1
  • 8

2 Answers2

18

I do not know how Sow and Reap are actually implemented, but to show you that it isn't magic, I can provide the following toy implementation for you.

ClearAll[sow, reap];
sow[x_, tag_] := x;

SetAttributes[reap, HoldFirst]; reap[expression_] := Block[{sow, a}, a = <||>; sow[x_, tag_] := With[{exp = x}, If[ MissingQ[a[tag]] , AssociateTo[a, tag -> <|1 -> exp|>]; , AssociateTo[a[tag], Length[a[tag]] + 1 -> exp]; ]; exp ]; {expression, Values /@ a} ]

Observe that reap locally overwrites the definition of sow. This way, sow will do anything nontrivial only if it is wrapped by reap. The accumulation can be done in any dynamic data type like a linked list or a hash table. I use a nested Association here, which is basically a hash table. Here is a simple usage example:

reap[
 Table[sow[i, Mod[i, 3]], {i, 1, 6}]
 ]

{{1, 2, 3, 4, 5, 6}, <|1 -> {1, 4}, 2 -> {2, 5}, 0 -> {3, 6}|>}

You may compare the output of

Trace[
 Table[sow[i, Mod[i, 3]], {i, 1, 6}]
 ]

with

Trace[
 reap[
  Table[sow[i, Mod[i, 3]], {i, 1, 6}]
  ]
 ]

to see why this local redefinition technique makes sense. Well, this brought me to the idea to run

Trace[
 Reap[
  Table[Sow[i, Mod[i, 3]], {i, 1, 6}]
  ]
 ]

vs.

Trace[
  Table[Sow[i, Mod[i, 3]], {i, 1, 6}]
]

and there something surprising happens: The latter reveals that the trace is actually longer without Reap because some obscure Message passing is going on. Message is typically used for error handling and usually it is very, very slow. In indeed, for some reason, Sow without Reap is actually slower than with Reap oO

n = 100000;
Reap[data1 = Table[Sow[i, Mod[i, 3]], {i, 1, n}]]; // AbsoluteTiming // First
data2 = Table[Sow[i, Mod[i, 3]], {i, 1, n}]; // AbsoluteTiming // First
data1 == data2

0.087347

0.24273

True

Henrik Schumacher
  • 106,770
  • 7
  • 179
  • 309
  • 8
    The last bit is a nice find. People, too, often work more efficiently and with less grumbling (= Message) if they feel there is some purpose to their work. :) – Michael E2 Apr 03 '21 at 21:51
  • @Henrik Schumacher - That's a nice implementation, and if I can't get a less hypothetical answer, I will accept yours. Are you sure that Reap overwrites the definition of Sow? The fact that Sow is less efficient without Reap makes me think that it's not being overwritten. – Allison B Apr 03 '21 at 22:35
  • 1
    No, as I said, I am not sure. It is also possible that Sow and Reap are not at all implemented in the language itsself, but rather on the "C-side". Moreover, one can nest Reaps in each other and IIRC Sow is supposed to be collected only in the most recent Reap. This is also not reflect in the implementation. As I said, it is just a toy model. – Henrik Schumacher Apr 03 '21 at 23:01
  • 1
    This was also quite nice. At one time I sometimes used hashing via pattern-free DownValues to emulate the functionality of expr bags. Those experiments long predate the appearance of Association. – Daniel Lichtblau Apr 05 '21 at 13:17
15

I will make an exception to the usual practice I try to follow on this site, and fully replicate my post from this answer of mine here, since it answers this question just as directly or even more, than the one I originally posted it for. I hope this will not cause objections.

In addition to the standard Reap - Sow functionality, this one also allows access to intermediate accumulated results at any time during the execution - something that standard Reap - Sow don't.


Here is a drop-in Reap - Sow replacement based on Internal`Bag structure, which is also the one that the actual Reap and Sow are based on:

SetAttributes[withSideEffect, HoldRest];
withSideEffect[code_, sideEffectCode_] := (sideEffectCode; code);

$storage = <||>

ClearAll[reap] SetAttributes[reap, HoldFirst]

reap[expr_] := reap[expr, _] reap[expr_, patt_] := reap[expr, patt, #2&]

(call : reap[expr_, patt_, func_]) /; !TrueQ[$inReap] := Block[{$storage = <||>, $inReap = True}, call ]

reap[expr_, patt:Except[_List], func_:Function[#2]] := MapAt[First, reap[expr, {patt}, func], 2]

reap[expr_, patt_List, func_] := Module[{result = expr, reaped, tag, matchingTags}, {reaped, matchingTags} = Reap[ Map[ With[{matchingTagsData = KeySelect[$storage, MatchQ[#]]}, Sow[Keys @ matchingTagsData, tag]; Composition[ KeyValueMap[func], Map[Internal`BagPart[#, All]&] ] @ matchingTagsData ]& , patt ], tag ]; If[matchingTags =!= {}, KeyDropFrom[$storage, First @ matchingTags]; ]; {result, reaped} ]

ClearAll[sow, $globalTag] sow[expr_] := sow[expr, $globalTag] sow[expr_, _] /; !TrueQ[$inReap] := expr sow[expr_, tags_List] := First @ Map[sow[expr, #]&, tags] sow[expr_, tag_] := withSideEffect[expr, If[!KeyExistsQ[$storage, tag], $storage[tag] = InternalBag[{expr}], (* else *) InternalStuffBag[$storage[tag], expr] ] ]

ClearAll[getCurrentData] getCurrentData[part: _Integer | {__Integer} | _Span | All : All] := getCurrentData[part, $globalTag]

getCurrentData[part: Integer | {__Integer} | _Span | All : All, tag] := Replace[ Lookup[$storage, tag, {}], bag: Except[{}] :> Internal`BagPart[bag, part] ]

It can serve as an explanation of how Reap-Sow work, but it can also be used to access the sowed data at any time, something that built-in Reap-Sow can't do:

reap[
  sow[1, {h[1], h[2], g[1]}];
  sow[2, g[2]];
  Print["The current data for tag ", h[1], " is ", getCurrentData[h[1]]];
  sow[3, {h[1], g[2], g[3]}]
  , 
  {_h, _g}
  , f
]

(*

During evaluation of In[283]:= The current data for tag h[1] is {1}

{3, {{f[h[1], {1, 3}], f[h[2], {1}]}, {f[g[1], {1}], f[g[2], {2, 3}], f[g[3], {3}]}}} *)

In terms of performance, these will be slower, but should not be too much slower than the built-ins, since they are based on the same underlying data structure.

Leonid Shifrin
  • 114,335
  • 15
  • 329
  • 420
  • 2
    I will be happy to expand on specific points of the above implementation which will cause questions or will be found hard to understand. Note also that while the above implementation uses actual Sow internally, this isn't critical and was only done as a short-cut, which one can in principle do without. – Leonid Shifrin Apr 03 '21 at 22:59
  • 2
    This is so slick we should all downvote it, and claim you had so many upvotes that the counter overflowed to negative. – Daniel Lichtblau Apr 04 '21 at 15:23
  • 2
    @DanielLichtblau Thanks :). That's the most interesting praise I've got for a post during my entire presence here on this site :). – Leonid Shifrin Apr 04 '21 at 16:12
  • 5
    That's what happens when you mix admiration and envy... – Daniel Lichtblau Apr 04 '21 at 16:37
  • 2
    @DanielLichtblau lol. I think this is mutual :) – Leonid Shifrin Apr 04 '21 at 19:45