
Is this possible?

If I have a simple function, say:

f=If[#>0,1,2]& 

then for each value of # this will re-evaluate the body of f, right?

Is it possible to define a pure function like this:

f:=f=If[#>0,1,2]&

such that previous values of the function are stored for future use?

maria

2 Answers


General

The conceptual problem with memoized pure functions is that pure functions typically (in fact, normally by their very definition) do not cause side effects, while memoization necessarily requires side effects (changes of state). What is probably meant here is to construct a memoized anonymous (lambda) function - this is possible, because such functions can manipulate mutable state.

A note on pure functions and terminology

Somewhat of a side note, but a rather important one: the standard notion of a pure function in Computer Science is exactly this - a function without side effects. It is important to emphasize (as suggested by WReach in the comments) that Mathematica's notion of a pure function is different: in Mathematica, a pure function is any function built with the keyword Function, regardless of whether or not applying such a function may cause side effects. This is an important distinction to keep in mind, particularly for those who come from other languages supporting pure functions (in the usual sense).

Speaking of side effects, their presence always means that the function manipulates some global state. While the essence is the same, this may take different forms:

  • Manipulating an external mutable state by using it implicitly in the body of the function

    var = 1;
    Function[var++]
    
  • Leaking internal state (Module-generated variables and such) and manipulating it (applies to closures constructed using Module or similar):

    Module[{var = 1}, Function[var++]]
    
  • Mutating external variables, using (an emulation of) pass-by-reference semantics:

    var = 0;
    Function[Null, #++, HoldFirst][var]
    

For the solution suggested below, we will be using the second form of side effects - the one relevant for mutable closures.

And once again: the functions constructed this way are still called pure in Mathematica, but would not be called pure elsewhere in the CS literature.

The case at hand

In Mathematica, by pure function one usually means a function built with the Function keyword (as opposed to functions which are essentially global rules), and as such, it can contain side effects. So, you can do something like this:

ff = 
  Module[{f = <||>},
   Function[
     If[KeyExistsQ[f, #],
       f[#],
       f[#] = If[# > 0, 1, 2]
     ]
   ]
  ]

(* If[KeyExistsQ[f$1407, #1], f$1407[#1], f$1407[#1] = If[#1 > 0, 1, 2]]& *)

which would effectively work similarly to a memoized function.
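
To see the caching in action, one can add a Print to the body; ffVerbose below is just an illustrative name for this check:

ffVerbose =
  Module[{f = <||>},
   Function[
     If[KeyExistsQ[f, #],
       f[#],
       f[#] = (Print["computing for ", #]; If[# > 0, 1, 2])
     ]
   ]
  ];

ffVerbose[5]  (* prints "computing for 5", then returns 1 *)
ffVerbose[5]  (* returns 1 silently - the stored value is reused *)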

Automation

The process can be automated with the following constructor:

ClearAll[makeMemoPF];
SetAttributes[makeMemoPF, HoldFirst];
makeMemoPF[body_, start_: <||>] :=
  Module[{fn = start},
    Function[If[KeyExistsQ[fn, #], fn[#], fn[#] = body]]]

where now you can simply write:

ff = makeMemoPF[If[# > 0, 1, 2]]
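
As a quick sanity check (the function returns 1 for positive arguments and 2 otherwise, caching each result on first use):

ff /@ {-2, 0, 3}

(* {2, 2, 1} *)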

Advantages of this construct

One advantage I can see in this construct, compared to a usual memoized function, is that, as with other functions based on Function, you can pass it around without storing it in a variable. The good thing is that, once such a function is no longer referenced, it will be automatically garbage-collected - and that is also true for the inner variable f used to store the mutable state (the memoized values).

Let me illustrate this aspect with the example of Fibonacci numbers. Suppose we just need to compute the first 20 (say) of those, but want to use a recursive function and take advantage of memoization. We would write:

Map[makeMemoPF[#0[# - 1] + #0[# - 2], <|0 -> 1, 1 -> 1|>], Range[20]]

(* {1, 2, 3, 5, 8, 13, 21, 34, 55, 89, 144, 233, 377, 610, 987, 1597, 2584, 4181, 6765, 10946} *)

and one can check that the inner variable we used for memoization did not leak after this code executed - so it has been successfully garbage-collected. (For those who are puzzled by #0: this is the syntax used to call a Function recursively in Mathematica. More details can be found in the docs, and also e.g. here.)
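
One minimal way to perform such a check, relying on the fact that the constructor above names its Module variable fn, is to look for leaked temporary symbols once the computation is done:

Names["fn$*"]

(* {} *)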

Extension: building controllable-size garbage-collectable caches

The technique above can also be extended in another interesting direction, where standard memoization does not provide a simple solution: what if we want to limit the size of the cache (that is, of the collection of memoized values)? I will only consider the simpler case, where we limit the number of stored elements; the case where we limit based on ByteCount can be tackled too, but is more complex.

Here is the code that implements this. First, we need two auxiliary functions. The first one is a macro, used to avoid With when we need to execute some code after we obtain the result, but before returning it:

ClearAll[withCodeAfter];
SetAttributes[withCodeAfter, HoldRest];
withCodeAfter[before_, after_] := (after; before);
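
A small usage example, to make the evaluation order explicit: the first argument is evaluated at call time (it is not held), then the held second argument runs, and the first argument's result is returned:

withCodeAfter[Print["computing"]; 42, Print["cleaning up"]]

(* prints "computing", then "cleaning up", and returns 42 *)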

The other function we need is one to shrink an association to a given size, dropping key-value pairs from the start:

ClearAll[assocShrink];
assocShrink[a_Association, size_] /; Length[a] > size := Drop[a, Length[a] - size];
assocShrink[a_Association, _] := a;
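
For example:

assocShrink[<|"a" -> 1, "b" -> 2, "c" -> 3|>, 2]

(* <|"b" -> 2, "c" -> 3|> *)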

Finally, the constructor for the cache:

ClearAll[makeCachedPF];
SetAttributes[makeCachedPF, HoldFirst];
makeCachedPF[body_, start_: <||>, cacheLimit_: Infinity] :=
  Module[{f = start},
    Function[
      If[KeyExistsQ[f, #],
        f[#]
        ,
        withCodeAfter[
          f[#] = body
          ,
          (* trim the cache back to the limit after each insertion *)
          f = assocShrink[f, cacheLimit]
        ]
      ]]];

What this does is pretty simple: it uses the fact that, when assignment is used, new key-value pairs are appended on the right of an association. Then, every time we add a new key-value pair, we also remove the "oldest" one from the left, if the total number of values stored in the cache has exceeded the given limit. In this way, we keep the maximal number of cached values under control.
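
Here is a small illustration of the eviction at work; cf and the Print are used only to make recomputation visible. With a cache limit of 2, the value for 1 is evicted when 3 is added, so it has to be recomputed on the second encounter:

cf = makeCachedPF[(Print["computing ", #]; #^2), <||>, 2];
cf /@ {1, 2, 3, 1}

(* prints "computing 1", "computing 2", "computing 3", then "computing 1" again *)
(* {1, 4, 9, 1} *)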

Let us see how this works, using an example: here is our data:

data = RandomInteger[{1000, 1100}, 10000];

that is, a large number of values between 1000 and 1100. We now want to compute, on this data, a function that determines the total number of primes in Range[x], where x is a data point.

Map[Total[Boole@PrimeQ@Range[#]]&,data]//Short//AbsoluteTiming

(* {6.59317,{169,183,172,180,183,180,179,172,181,176 <<9980>>,168,175,180,168,184,174,168,174,169,174}} *)

Now we can do the same with our cache construction, and since we know that we only have 100 different points, we can restrict our cache size to 100:

Map[makeCachedPF[Total[Boole@PrimeQ@Range[#]],<||>, 100],data]//Short//AbsoluteTiming

(* {0.166174,{169,183,172,180,183,180,179,172,181,176 <<9980>>,168,175,180,168,184,174,168,174,169,174}} *)

We see very significant savings in computation time, while the cache size was fully controlled and fairly small. And again, once the computation finished, the cache (the internal variable used to store it) was garbage-collected, so we don't have to think about that at all.

Obviously, in this case, because the number of different values was small, the initial memoized function would do just as well in terms of cache memory consumption. It turns out to be about twice as fast (on this example) as the controlled-cache version:

Map[makeMemoPF[Total[Boole@PrimeQ@Range[#]],<||>],data]//Short//AbsoluteTiming

(* {0.088639,{169,183,172,180,183,180,179,172,181,176 <<9980>>,168,175,180,168,184,174,168,174,169,174}} *)

However, in general, we may either not know how many different values the function would be computed on, or find it unacceptable to store memoized values for all those different points.

One thing I did not implement, but which is easy to add, is a version where, every time a value already in the cache is encountered again, it is moved to the right of the cache association. That would somewhat improve the cache, at the price of slowing down the lookup of cached values from the cached function. It may make sense to do this if the function being computed is relatively expensive; a sketch of such code is given below.
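
Here is a minimal sketch of that variant, assuming withCodeAfter and assocShrink from above are defined; makeLRUCachedPF is a hypothetical name used only for illustration. On a cache hit, the key-value pair is re-appended on the right, so that the leftmost (dropped) pairs are always the least recently used ones:

ClearAll[makeLRUCachedPF];
SetAttributes[makeLRUCachedPF, HoldFirst];
makeLRUCachedPF[body_, start_: <||>, cacheLimit_: Infinity] :=
  Module[{f = start},
    Function[
      If[KeyExistsQ[f, #],
        (* read the stored value, then re-append its key on the right *)
        withCodeAfter[f[#], f = Append[KeyDrop[f, #], # -> f[#]]],
        withCodeAfter[f[#] = body, f = assocShrink[f, cacheLimit]]
      ]]];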

Conclusions

So, in conclusion, this is a very good question and there indeed may be an advantage in using such constructs in certain circumstances, in terms of automatic garbage collection of memoized definitions when they are no longer needed.

I've also shown how one can extend this technique to create cached versions of pure functions, which differ from memoized versions in that the size of the cache can be controlled, so that it does not exceed a certain number of stored values.

Note that the presence of Association in the language helps a great deal. One could probably do without it (e.g. using System`Utilities`HashTable), but one would still need some hash-table-like data structure that is automatically garbage-collectable - which is what the usual approach based on DownValues does not provide.

Leonid Shifrin
  • This is a really in depth answer, thanks for taking the time to explain it all. – maria Jun 12 '15 at 13:09
  • @maria Glad I could help. Thanks for the accept. In general, though, it is a good idea to give some more time for others to come up with more answers, before accepting - this way some better answers have more chances to appear. – Leonid Shifrin Jun 12 '15 at 13:18
  • +1 I'd just like to doubly emphasize the point you made in the second paragraph: the Mathematica jargon "pure function" is not the same thing as the contemporary notion of "pure function" -- despite the coincidental use of the same words. – WReach Jun 12 '15 at 16:04
  • @WReach Thanks. I guess I was lucky to answer this first, I am sure that otherwise you'd leave me no chance to add anything :). I added a section on terminology, to stress that point more - see if you like the new text better. – Leonid Shifrin Jun 12 '15 at 16:50
  • @Leonid Is the performance of the pure functions with memoization better or worse than of the traditional approach? – Alexey Popkov Jun 12 '15 at 17:23
  • @AlexeyPopkov I did some simple benchmarks, it looks like the traditional approach is about twice faster for code that does very little. If the computation of the function is even a little expensive, the difference will likely be negligible. – Leonid Shifrin Jun 12 '15 at 17:39
  • @WReach, can you expand please? Pure functions and state are contentious topics: eg http://www.johndcook.com/blog/2010/05/18/pure-functions-have-side-effects/, and http://stackoverflow.com/questions/2271417/is-erlang-really-a-functional-language, just to show diversity of opinions. – alancalvitti Jun 12 '15 at 17:54
  • @Leonid Yes, I like that change. Thanks. – WReach Jun 12 '15 at 18:29
  • @alancalvitti I discuss the jargon "pure function" in somewhat more detail in (64624). I argue there that the Mathematica jargon "pure function" means nothing more than "anonymous function" and makes no judgment about side effects (observable or otherwise). – WReach Jun 12 '15 at 18:36
  • A very nice answer with sophisticated and elegant examples, so +1, of course. I have probably misunderstood something because I have not played with Association (or version 10) very much so far. But, what is the insertion complexity for Association versus downvalues? Is it $O(N)$? And if Association is a hash table, how can we be certain that the older keys are to the left of the structure, or is this information stored separately? A miscellaneous point (which surely you are aware of): deleting keys rigidly by insertion order is not necessarily the best policy. – Oleksandr R. Jun 13 '15 at 14:10
  • @OleksandrR. Thanks, I appreciate! Re: Association insert complexity - it should be O(log N) with a fairly small constant, so effectively constant time up to very large N. Association is not a usual hash table - it is a persistent data structure. I wrote a bit more about it here and here, but the main idea is that while addition of new / removal of old key-value pairs is cheap, you still get a brand new copy of the original association. It is better than DownValues, which, while having similar ... – Leonid Shifrin Jun 13 '15 at 14:19
  • @OleksandrR. ... insert / delete complexity, do not produce new immutable structures (they are mutable), so they require manual memory management (they are not automatically garbage-collected), and also, for many applications, would require deep-copying of all key-value pairs, which would indeed be an O(N) operation. Re: suboptimal strategy - yes, I actually noted that in one paragraph at the end of that section. I was too lazy to add an implementation for that, but it's pretty straightforward. Besides, this would still be a tradeoff, since other strategies would slow down the lookup of cached values. – Leonid Shifrin Jun 13 '15 at 14:22
  • Thanks; I had not appreciated the advantages of immutability for Association until reading your discussion of it. And sorry for the superfluous comment: I should have read your answer completely before commenting. Do you know if there are any signs of higher-performance (not reference counting) garbage collection in future versions? If everyone starts using Association to its full advantage as you do here, it seems that this could become a significant performance bottleneck. Also, are the keys stored by reference or by value in the Association? – Oleksandr R. Jun 13 '15 at 14:36
  • @OleksandrR. One thing I forgot to mention here is that Association is ordered, so the order in which key-value pairs were added is preserved (which is another difference between it and DownValues). This is why I can be sure that the newly added key-value pairs will always be on the right, and the older ones on the left. – Leonid Shifrin Jun 13 '15 at 14:37
  • @OleksandrR. I am not aware of plans to change the garbage collector in the near future, but that doesn't mean there aren't any - I may well not be aware of them (will ask around when I get a chance). FWIW, Associations are already very widely used internally, to the extent that it's kind of hard to imagine how we managed to live without them for so long. But they are mostly used (as far as I could see) as more high-level constructs, while my use above is pretty hard-core, so I fully share your concerns. – Leonid Shifrin Jun 13 '15 at 14:42
  • @OleksandrR. The keys are stored by value, but it is a little trickier than just that: you can do, e.g., ClearAll[a]; assoc = <|a->1|>; a=1; assoc, to discover that assoc still returns <|a -> 1|> - so the keys are computed at the time they are added to an association, but then they don't change later, even when / if they are changed outside. Of course, in this example, you then can't really get the value any more either by assoc[a] or by assoc[1] - you would need assoc[Unevaluated@a]. – Leonid Shifrin Jun 13 '15 at 14:46
  • Leonid, I love your answers but I had to chuckle at the difference in our philosophy regarding withCodeAfter; so much code for what I just write as #&[before, after] :^) – Mr.Wizard Jun 26 '15 at 01:18
  • @Mr.Wizard You know, this form you proposed here simply did not come to my mind :). I agree that it is superior. Very nice! I am off for today, but perhaps will update my answer with your code some time soon. Or feel free to do this yourself. – Leonid Shifrin Jun 26 '15 at 01:40
  • @Mr.Wizard But in defense of my method, it communicates more clearly what's going on. Your method is something that needs some thought when seen in code, and may puzzle the uninitiated. If it becomes widely used by the community and so becomes an idiom, that would change. Still, I would argue that in the large body of code, one pays high price for many compact but cryptic expressions, and quite often that price is just too high. I've been working with rather large code bases in recent years, and my preferences certainly have been affected by the type of work I do. B.t.w., thx for the upvote! – Leonid Shifrin Jun 26 '15 at 01:47
  • @Mr.Wizard A compromise here would perhaps just be this: withCodeAfter = # & , and then use withCodeAfter - which is probably the best solution here. – Leonid Shifrin Jun 26 '15 at 01:55
  • I imagine you've seen it before and just forgot as I've used it a number of times here before. (I can only think of (38956) at the moment.) I understand your concern but I also am of the opinion that in this case one would be better served by learning the method and how to read it than by adding an abstraction. I do not believe that compact implies cryptic; even I feel that keeping code well condensed can have benefits for readability as more of it is seen at a glance which can give one a better overview of the code if formatted well. – Mr.Wizard Jun 26 '15 at 02:27
  • @Mr.Wizard Here is the thing: the way one works with larger code bases is really different from the smaller ones. The complexity grows, you have to read much more code than you write (typically), and the important factor is to have a continuous flow of thought when you read the code. The point is, pure functions usually are associated with other things but not control flow, so typically when I see such use in code, it breaks the flow of my reading, even if I am well-familiar with the idiom. There are exceptions, surely, but those typically are for really widely used idioms. – Leonid Shifrin Jun 26 '15 at 15:16
  • @Leonid Thanks (as always) for your insight. I don't work with large code bases so I don't know what it's like. If I have several hundred lines of code I think of it as a significant project. :-) Could you comment further on the factors affecting "you have to read much more code than you write"? – Mr.Wizard Jun 26 '15 at 15:34
  • @Mr.Wizard Well, for large projects, big portions of the code stabilize, and no longer have frequent changes (unless, of course, a decision is made to rewrite that piece of code). However, the code remains "live" - other changes in other places may require generalizations, bugs are being fixed, etc. So, typically every so often the code is read by those who work with it, without necessarily being modified much or at all. – Leonid Shifrin Jun 26 '15 at 17:48
  • Makes sense; thanks. – Mr.Wizard Jun 26 '15 at 17:49
  • @Mr.Wizard For smaller projects, code is typically changed frequently, so the write / read ratio is much larger. Also, a very important thing is that for smaller projects, it is typically much easier to directly test the code, and the interactivity and ability to test interactively makes it not as critical to have the code perfectly readable. While for larger pieces of code, often it isn't very easy to immediately run a given piece of it in isolation, and then readability becomes much more important still. – Leonid Shifrin Jun 26 '15 at 17:51

@Leonid's answer is quite good. However, there is an easier way; you will find examples of it throughout the Wolfram code you inspect, and it is quite similar to the syntax you used. It is also Wolfram's documented method.

For functions whose outputs depend only on their arguments (so functions that do not rely on additional state, like a character being available on an input buffer or the current date/time), you can memoize via

f[x_]:=f[x]=If[x>0,1,2]

Doing so uses Mathematica's mechanism for resolving overloaded functions, based on its usual rules for more and less general arguments. This mechanism may or may not be more efficient than the map lookup in Leonid's solution. (It really can go either way, depending on implementation details that may very well change in every version of the kernel. I have timed implementation alternatives of the solution above and something similar to Leonid's (using explicit hashing to find matches, because maps weren't intrinsic types at the time), and the best choice was kernel-version dependent.)
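
One can make this mechanism visible by inspecting DownValues after a call: each remembered value is stored as an additional, more specific rule, which Mathematica orders before the general definition. A quick check, assuming the definition of f above:

f[5];
DownValues[f]

(* {HoldPattern[f[5]] :> 1, HoldPattern[f[x_]] :> (f[x] = If[x > 0, 1, 2])} *)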

Note that this idiom does "odd things" if the output of the function depends on anything other than its arguments. It behaves as if the external state at the first time a particular set of arguments is used were the external state every time that particular set of arguments is used. As a consequence, this idiom (and Leonid's solution) is useful if your function is closer to what a computer scientist would call a "pure function". The Mathematica documentation uses that term for any anonymous function.
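
A small illustration of those "odd things" (g and someGlobal are hypothetical names used only for this example):

someGlobal = 10;
g[x_] := g[x] = x + someGlobal;
g[1]

(* 11 *)

someGlobal = 100;
g[1]

(* 11 - the stale result computed with someGlobal == 10 was memoized *)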

Eric Towers
  • Well, the main thing that I found interesting about a method I proposed is that it leads to automatic garbage-collection of results which are no longer needed - which is quite important in practice. The other thing which is quite hard to control in the standard approach you describe here, is a number of memoized results - in case when such control is desired, while in my suggested approach this is relatively easy. – Leonid Shifrin Jun 12 '15 at 23:11
  • As to the standard memoization idiom you have described here, there have been multiple discussions of it here before, for example here, or here (the section called "Memoization / caching"). So, when answering, I was assuming that this standard idiom is well known to the OP, who actually knew about it but wanted to use pure functions (Function). – Leonid Shifrin Jun 12 '15 at 23:14
  • @LeonidShifrin: I saw that, but the OP's use of a named function, "f", was incompatible with the (Wolfram) definition of pure function that they mentioned. This suggested some disconnect in terminology. – Eric Towers Jun 12 '15 at 23:31
  • Well, my guess is that the OP simply didn't know how to formulate better what she wanted, and used that f as a kind of hint to herself and the readers of the question. – Leonid Shifrin Jun 12 '15 at 23:33
  • @LeonidShifrin: And I provided the alternate choice: that she wanted a (CS) pure function named "f". Now she has a full set of solutions. – Eric Towers Jun 12 '15 at 23:36