
Python's built-in function enumerate takes an iterable over $(a_0, a_1, \dots )$ as argument and returns an iterable over the sequence of pairs $((0, a_0), (1, a_1), \dots)$. For example:

>>> for p in enumerate(('a', 'b', 'c', 'd')):
...     print(p)
... 
(0, 'a')
(1, 'b')
(2, 'c')
(3, 'd')

Furthermore, the value returned by enumerate is an iterator (an enumerate object, which behaves much like a "generator"). This means that it produces the $(i, a_i)$ pairs lazily, one at a time, as they are iterated over. (This matters, of course, when the iteration is over a very large number of items. In fact, enumerate accepts potentially "infinite" arguments.)
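For illustration, here is a minimal sketch of that laziness, assuming Python 3. enumerate wraps the infinite iterator itertools.count(), and itertools.islice takes only a finite prefix, so only the requested pairs are ever produced. The start argument of enumerate (available since Python 2.6) shifts the first index; start=1 gives Mathematica-style 1-based positions.

```python
from itertools import count, islice

# count(10) yields 10, 11, 12, ... without end; enumerate wraps it lazily,
# so no (index, value) pair exists until something asks for it.
pairs = enumerate(count(10), start=1)

# islice pulls just the first three pairs from the infinite stream.
first_three = list(islice(pairs, 3))
print(first_three)  # [(1, 10), (2, 11), (3, 12)]

# The iterator resumes where it stopped rather than starting over.
fourth = next(pairs)
print(fourth)  # (4, 13)
```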

Now, given some arbitrary List X, the expression

Transpose[{Range[Length[X]], X}]

will produce a similar list of pairs, but I'd like to know if Mathematica has a built-in analogue of Python's enumerate (hopefully with lazy evaluation as well).

(Asked by kjo; edited by Leonid Shifrin.)
  • Something like MapIndexed[Flatten[{##}] &, {x, y, z}]? I'm not sure about the lazy evaluation. – march Jun 05 '15 at 22:39
  • Well, there is no lazy evaluation in Mathematica as standard, so for that part of the question we can answer a definite "no". But certainly there are examples of lazy approaches available on this site, so it should not be too difficult to build what you need. – Oleksandr R. Jun 05 '15 at 22:47
  • @OleksandrR. Now there is some (undocumented, see my answer), although I can't say whether that would stay or not. – Leonid Shifrin Jun 06 '15 at 18:24


Streaming` module - general, and the case at hand

Starting with V10.1, there is undocumented support for certain lazy operations in Mathematica. However, the primary goal of Streaming` is to support out-of-core computations reasonably efficiently; lazy operations are only a secondary goal.

Example: lazy infinite lists and an analog of enumerate

Here is an example.

Load the Streaming` module:

Needs["Streaming`"]

Define an infinite lazy list of integers:

integers = LazyRange[Infinity];

Form an (infinite) lazy list of primes:

primes = Select[integers, PrimeQ];

Enumerate this list (lazily):

enumerated = MapIndexed[{#2[[1]], #1} &, primes]

Extract some elements:

Take[enumerated,{10000,20000}]//Normal//Short

(*
 {{10000,104729},{10001,104743},{10002,104759},<<9996>>,{19999,224729},{20000,224737}}
*)
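For comparison with the question's Python, here is a sketch of the same pipeline built from plain Python iterators. is_prime is a hypothetical trial-division helper standing in for PrimeQ; count(1) plays the role of LazyRange[Infinity], filter the role of Select, and enumerate(..., start=1) the role of MapIndexed. Unlike the chunked LazyList evaluation discussed in this answer's notes, these iterators are lazy element by element.

```python
from itertools import count, islice

def is_prime(n):
    """Trial division by 2 and odd numbers; fine for the sizes used here."""
    if n < 2:
        return False
    if n % 2 == 0:
        return n == 2
    d = 3
    while d * d <= n:
        if n % d == 0:
            return False
        d += 2
    return True

# Infinite lazy streams, mirroring LazyRange[Infinity] and Select[integers, PrimeQ].
integers = count(1)
primes = filter(is_prime, integers)

# 1-based (index, prime) pairs, mirroring MapIndexed[{#2[[1]], #1} &, primes].
enumerated = enumerate(primes, start=1)

# Take[..., {10000, 20000}] is 1-based and inclusive; islice is 0-based and
# half-open, so positions 9999..19999 give indices 10000..20000.
taken = list(islice(enumerated, 9999, 20000))
print(taken[:3], taken[-2:], len(taken))
```

The first and last pairs should agree with the Mathematica output above, e.g. (10000, 104729) at the start and (20000, 224737) at the end.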

Example: traversing a large list, and saving memory

Consider the following example: we have a huge list of matrices whose elements are only 0 or 1, and we must traverse it, for example to select only those matrices that satisfy a certain criterion.

In-memory version

To be specific, consider this code on a fresh kernel:

Quit
(tuplesMem= Tuples@Table[Tuples[{0,1},11],{i,1,2}])//ByteCount//AbsoluteTiming

(*  {0.381172,738197664} *)

We now select the matrices that have exactly 3 non-zero elements:

Select[tuplesMem,Total[Flatten[#]]==3&]//Short//AbsoluteTiming

(* {13.9526,{{{0,0,0,0,0,0,0,0,0,0,0},{0,0,0,0,0,0,0,0,1,1,1}},<<1538>>,{{1,1,1,0,0,0,0,0,0,0,0},<<1>>}}}
*)

We can inspect how much memory was required to carry out this operation:

MaxMemoryUsed[]

(* 2008377104 *)

and see that it was about 2 GB of RAM.

Lazy / out-of-core version

Now, let us try to use the out-of-core machinery that Streaming` provides. Here is some preparatory code (we'll need to quit the kernel to have a clean experiment):

Quit

Needs["Streaming`"];
(* keep LazyList chunk caches in a temporary directory, creating it if needed *)
Streaming`PackageScope`$LazyListCachingDirectory = $StreamingCacheBase =
  FileNameJoin[{$TemporaryDirectory, "Streaming", "Cache"}];
If[! DirectoryQ[$StreamingCacheBase], CreateDirectory[$StreamingCacheBase]];

We will also need to load the code for a lazy version of Tuples, which is not yet part of Streaming:

Import["https://gist.githubusercontent.com/lshifr/56c6fcfe7cafcd73bdf8/raw/LazyTuples.m"]

Now we are ready to test things:

(lazyTuples = LazyTuples[Table[Tuples[{0, 1}, 11], {i, 1, 2}], 
 "ChunkSize" -> 100000]); // AbsoluteTiming

(* {0.410596, Null} *)

which defines a lazy list of tuples. Now we can try using Select:

(sel = Select[lazyTuples, Total[Flatten[#]] == 3 &]); // AbsoluteTiming

(* {0.00379, Null} *)

which takes almost no time, since Select applied to a lazy list is itself lazy. We can check that, at this point, no disk space is used yet, and RAM usage is still modest:

MaxMemoryUsed[]
Total[FileByteCount /@ FileNames["*.mx", {$StreamingCacheBase}]]

(* 
  41693800

  0
*)

Now, the real work in this approach happens when we request data from the list:

Normal[sel]//Short//AbsoluteTiming

(* {38.6308,{{{0,0,0,0,0,0,0,0,0,0,0},{0,0,0,0,0,0,0,0,1,1,1}},<<1538>>,{{1,1,1,0,0,0,0,0,0,0,0},<<1>>}}} *)

We see that getting the result took about 3 times longer with this approach than with the in-memory one (38.6 s vs. 14.0 s). Let us now look at memory use:

MaxMemoryUsed[]
Total[FileByteCount /@ FileNames["*.mx", {$StreamingCacheBase}]]

(*
   112128792

   738209516
*)

What we see is far more modest RAM use (almost 18 times less: about 112 MB vs. 2 GB), but substantial use of disk space (about 738 MB), where the chunks of the LazyList were saved.
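The same "build lazily, then force" pattern can be sketched with Python generators, assuming Python 3.8+ (for math.comb). matrices and has_k_ones are hypothetical helpers mirroring Tuples@Table[Tuples[{0,1},11],{i,1,2}] and the selection criterion. Unlike Streaming, nothing is cached on disk here: each candidate is simply discarded once tested.

```python
from itertools import product
from math import comb

def matrices(width):
    """Lazily yield every 2 x width 0/1 matrix as a pair of row tuples, in the
    same lexicographic order as Tuples@Table[Tuples[{0, 1}, width], {i, 1, 2}]."""
    rows = list(product((0, 1), repeat=width))  # 2**width rows: small
    return product(rows, repeat=2)              # 4**width pairs: never stored

def has_k_ones(m, k=3):
    """True if the matrix m contains exactly k entries equal to 1."""
    return sum(map(sum, m)) == k

# Building the pipeline is instant: no matrix has been generated yet.
sel = filter(has_k_ones, matrices(11))

# Pulling the first match touches only the first 8 of the 4**11 candidates.
first = next(sel)
print(first)

# Forcing a whole (smaller) stream plays the role of Normal[...]; each
# candidate is dropped as soon as it is tested, so memory stays flat.
small = list(filter(has_k_ones, matrices(4)))
print(len(small), comb(8, 3))  # 56 56
```

Forcing the full width-11 stream instead should give C(22, 3) = 1540 matrices (22 binary entries, exactly 3 of them equal to 1), consistent with the 1538 elided plus 2 displayed elements in the outputs above.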

Garbage collection issues

If we now destroy our two lazy lists:

LazyListDestroy /@ {sel, lazyTuples}

(* {Streaming`Common`ID[{3642634309, 1}], Streaming`Common`ID[{3642634221, 0}]} *)

those files will be deleted automatically by the Streaming garbage collector:

Total[FileByteCount /@ FileNames["*.mx", {$StreamingCacheBase}]]

(* 0 *)

If the lists are only needed for one particular computation, LazyListBlock makes sure that they are destroyed automatically once it finishes:

LazyListBlock[
  Normal @ Select[
     LazyTuples[Table[Tuples[{0,1},11],{i,1,2}],"ChunkSize"->100000],
     Total[Flatten[#]]==3&
  ]
]//Short//AbsoluteTiming

(* {35.9029,{{{0,0,0,0,0,0,0,0,0,0,0},{0,0,0,0,0,0,0,0,1,1,1}},<<1538>>,{{1,1,1,0,0,0,0,0,0,0,0},<<1>>}}} *)

and in this case, there are no files left on disk after the code has finished:

Total[FileByteCount /@ FileNames["*.mx", {$StreamingCacheBase}]]

(* 0 *)
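As a loose analogy only, here is a toy Python model of the out-of-core idea plus scoped cleanup: chunks of a stream are spilled to pickle files in a temporary directory and read back lazily one file at a time, and tempfile.TemporaryDirectory deletes everything when the with block exits, much as LazyListBlock cleans up here. spill_chunks and read_chunks are hypothetical helpers; this says nothing about how Streaming actually stores its .mx chunk files or runs its garbage collector.

```python
import os
import pickle
import tempfile
from itertools import islice

def spill_chunks(iterable, chunk_size, directory):
    """Write consecutive chunks of `iterable` to numbered pickle files and
    return their paths (a toy stand-in for cached LazyList chunks)."""
    paths = []
    it = iter(iterable)
    while True:
        chunk = list(islice(it, chunk_size))
        if not chunk:
            break
        path = os.path.join(directory, f"chunk{len(paths):05d}.pkl")
        with open(path, "wb") as f:
            pickle.dump(chunk, f)
        paths.append(path)
    return paths

def read_chunks(paths):
    """Lazily load the cached chunks back, one file at a time."""
    for path in paths:
        with open(path, "rb") as f:
            yield pickle.load(f)

# TemporaryDirectory removes the directory and every chunk file on exit.
with tempfile.TemporaryDirectory() as cache:
    paths = spill_chunks(range(1_000_000), chunk_size=100_000, directory=cache)
    total = sum(sum(chunk) for chunk in read_chunks(paths))
    n_files = len(os.listdir(cache))

print(n_files, total, os.path.exists(cache))  # 10 499999500000 False
```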

Notes

This answer should not be considered a tutorial on this functionality, just an illustration. Also, there is no guarantee that this functionality will remain in future versions or keep the same syntax. It may also suffer from efficiency problems, to a smaller or greater extent depending on the task, since it has been implemented in top-level Mathematica code.

Note, by the way, that technically the lists constructed above are not fully lazy. What really happens is that the data is divided into chunks, and a given operation (Map or whatever) is applied to an entire chunk at once. The chunk size can be controlled, but the laziness exists only at the coarse-grained level (per chunk); this was done to keep the performance reasonable. In principle, for most of the implemented lazy functions one can set the chunk size to a single element, but that would very seriously degrade performance.
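The coarse-grained laziness described above can be sketched in Python (3.8+, for the := operator). chunked, chunk_map and chunk_select are hypothetical helpers, not Streaming functions: each stage is lazy across chunks but eager within a chunk, so asking for the first result forces a whole chunk of inputs, not just one element.

```python
from itertools import count, islice

def chunked(iterable, size):
    """Group a (possibly infinite) iterable into lists of `size` elements."""
    it = iter(iterable)
    while chunk := list(islice(it, size)):
        yield chunk

def chunk_map(f, chunks):
    """Lazy across chunks, eager within each chunk."""
    for chunk in chunks:
        yield [f(x) for x in chunk]

def chunk_select(pred, chunks):
    """Keep matching elements chunk by chunk, skipping chunks left empty."""
    for chunk in chunks:
        kept = [x for x in chunk if pred(x)]
        if kept:
            yield kept

evaluated = []  # records every input that square() was actually called on

def square(x):
    evaluated.append(x)
    return x * x

# An infinite stream of integers, processed in chunks of 1000.
squares = chunk_map(square, chunked(count(1), 1000))
evens = chunk_select(lambda y: y % 2 == 0, squares)

# Even if we only wanted the first even square, a whole chunk is computed.
first_chunk = next(evens)
print(first_chunk[0], len(first_chunk), len(evaluated))  # 4 500 1000
```

Setting the chunk size to 1 would restore element-by-element laziness but pay the per-chunk overhead on every element, which is the trade-off the note describes.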

(Answer by Leonid Shifrin.)
    Niiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiice – Rojo Jun 06 '15 at 21:31
  • @Rojo Well, thanks, glad you liked it! But this is really a rather small side effect of what Streaming is supposed to do. I can add some more examples when time permits. Besides, the lazy version of MapIndexed is currently implemented very sub-optimally, and there are some bugs with infinite lists. – Leonid Shifrin Jun 06 '15 at 21:36
  • @Rojo Ok, I added one example that would give some more real taste of what Streaming` is basically about. – Leonid Shifrin Jun 06 '15 at 23:09
  • Thanks @Leonid, this will be useful! Perhaps now I can stop saving for a new PC with 32GB RAM and settle for 8 :) I've been playing around with Haskell lately and grown fond of it and its lazy data structures; glad to see some of that around here. On a related note, earlier I was counting, given a random permutation of Range@n, the distribution of the number of elements whose value coincides with their position. Had to stop at n=7 or 8 approx. due to RAM. Perhaps with this I could have gone a couple of numbers higher – Rojo Jun 07 '15 at 00:11
  • @Rojo Re: settle for 8 Gb - I'd be happy to learn that this stuff would help you towards that goal, but in reality there are a number of limitations for Streaming still. But in principle, and in time, hopefully it would serve this purpose. This was in fact one of the main goals for it. Re: random permutation of Range - I guess that could work, although one would need an implementation of a lazy random permutation (which isn't yet there in Streaming), and there are a few more technicalities that need to be taken care of. Let me know if you decide to tackle this using Streaming. – Leonid Shifrin Jun 07 '15 at 00:25
  • @Rojo Re: Haskell + lazy structures - well, as I mentioned in the post, the structures in Streaming are, generally, not fully lazy, which was necessary to keep decent performance given that Streaming is currently entirely top-level code. In practice, however, in many data-processing applications with large amounts of data, those operations will have the "look and feel" of lazy ones. They just have coarser granularity (than one element). – Leonid Shifrin Jun 07 '15 at 00:29
  • Can you add some more examples of how to use the Lazy functionalities? I tried using LazyFold and it wasn't giving me the expected result. – RunnyKine Jul 07 '15 at 07:05
  • @RunnyKine I don't have much time right now to add significant examples, but LazyFold indeed works a little differently from Fold. The function to be folded takes 2 arguments: the previous result (#1), and the next chunk (note: a list of elements, not a single element) (#2). So, for example, to compute the total of a LazyList, you can use Fold[#1+Total[#2]&, 0, lazylist], where for usual lists and usual Fold, you'd use Fold[#1+#2&, 0, list]. Let me know if this makes things work for you. – Leonid Shifrin Jul 07 '15 at 14:10
  • That works for me, thanks. One more question; I know you're busy, and I appreciate your response. How can we create a LazyList other than by using, e.g., LazyRange? Can we use LazyListCreate, for example, or some other way to achieve this? For example, let's assume we want a LazyList of the triangle numbers, which we can create normally like so: FoldList[#1+#2&, 0, Range[1,10^6]]. How can we create a LazyList out of that? – RunnyKine Jul 07 '15 at 16:36
  • If you have some normal list of data, you can use LazyListCreate[lst, size-of-the-chunk] to create a LazyList out of it, with the specified chunk size (a length, i.e. number of elements, not a ByteCount). There are other ways too (also using LazyListCreate), but they are somewhat more involved. – Leonid Shifrin Jul 07 '15 at 17:19
  • @RunnyKine Sure, np. At some point I will write up a much more detailed tutorial on this stuff, but right now I don't have the time, and the functionality itself has not stabilized. B.t.w., keep in mind that chunk size may affect performance quite seriously. For large lists it is generally preferable, for best performance, to keep chunks large enough, like several thousand elements or more (of course, that also depends on how large the elements are, on average). – Leonid Shifrin Jul 07 '15 at 17:51
  • Is it just me, or does the Streaming` package no longer work in 11.2? At least, not the way it seems to work in this answer. – Sjoerd Smit Nov 24 '17 at 08:51
  • @SjoerdSmit There are two possible reasons for what you observe. One is that Streaming is indeed broken in recent versions of Mathematica. The patch to make it work is described here. The other is that the Streaming API has changed somewhat since the above post was written. I will try to find the time to update the examples above, and perhaps extend them a bit, but meanwhile you can try the patch and see if it works for you. – Leonid Shifrin Nov 24 '17 at 14:59
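To make the chunk-wise fold convention from these comments concrete, here is a Python sketch (3.8+, for := and the initial argument of itertools.accumulate). chunked is a hypothetical helper that splits a sequence into fixed-length lists, loosely like the chunk-size argument of LazyListCreate mentioned above. Because each folded item is a whole chunk, the folding function must total the chunk itself, which is why the comment uses #1 + Total[#2] rather than #1 + #2.

```python
from functools import reduce
from itertools import accumulate, islice

def chunked(iterable, size):
    """Yield consecutive lists of `size` elements (the last may be shorter)."""
    it = iter(iterable)
    while chunk := list(islice(it, size)):
        yield chunk

data = range(1, 10**6 + 1)

# Element-wise fold, like Fold[#1 + #2 &, 0, list]: #2 is one element.
total_elementwise = reduce(lambda acc, x: acc + x, data, 0)

# Chunk-wise fold, like Fold[#1 + Total[#2] &, 0, lazylist]: #2 is a list,
# so the folding function has to total the chunk before adding it.
total_chunkwise = reduce(lambda acc, chunk: acc + sum(chunk), chunked(data, 10_000), 0)

# Triangle numbers, as in FoldList[#1 + #2 &, 0, Range[1, 10^6]], produced
# lazily and grouped into chunks of 10,000.
triangle_chunks = chunked(accumulate(data, initial=0), 10_000)
first_chunk = next(triangle_chunks)

print(total_elementwise == total_chunkwise, first_chunk[:5])  # True [0, 1, 3, 6, 10]
```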

Here are some ways this could be done:

list = CharacterRange["a", "g"];
Thread[{Range[Length@list], list}]
Transpose[{Range@Length@list, list}]
Table[{j, list[[j]]}, {j, Length@list}]
MapIndexed[{#2[[1]], #1} &, list]
Inner[List, Range@Length@list, list, List]

You could re-index from 0, as Python's enumerate does by default, by using Range[0, Length@list - 1] in place of Range@Length@list

or

i = 1;
{i++, #} & /@ list

or more ridiculously,

j = 1;
Fold[Append[#1, {j++, #2}] &, {}, list]

Partition[Riffle[Range@Length@list, list], 2]

k = 1;
list /. x_String :> {k++, x}
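For readers coming from Python, the eager idioms above correspond roughly to the following sketch (variable names are illustrative). zip over a range mirrors Thread/Transpose with Range, enumerate's start argument gives the same 1-based pairs, and enumerate's default start of 0 matches the Range[0, Length@list - 1] variant.

```python
# Like CharacterRange["a", "g"]: the seven letters a..g.
letters = [chr(c) for c in range(ord("a"), ord("g") + 1)]

# Like Transpose[{Range@Length@list, list}]: 1-based (index, item) pairs.
pairs_zip = list(zip(range(1, len(letters) + 1), letters))

# The same pairs via enumerate's start argument.
pairs_enum = list(enumerate(letters, start=1))

# Zero-based pairs, like using Range[0, Length@list - 1] instead.
pairs_zero = list(enumerate(letters))

print(pairs_enum[:2], pairs_zero[-1])  # [(1, 'a'), (2, 'b')] (6, 'g')
```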
(Answer by ubpdqn.)