Manage to save large arrays

Question

Some time ago I asked the following question:

It was answered quickly and I was satisfied with the answer. However, when computing such combinatorics I very quickly saturated the RAM, both on my computer and on a cluster.

I would like to know whether there is a way to save the output of Distribute to disk while Mathematica is running, perhaps in separate files (I was thinking one per component, but there may be a more sophisticated way), so that it does not occupy RAM and I can load parts of the output later when I need them. That should solve the RAM saturation.

I'm also open to other ways of storing the large output of Distribute (or an equivalent) so that I can actually carry out the computation without problems.

I have the strong feeling that this is an X-Y question. By using Outer, Tuples, or Distribute from the other answer, you just bloat your memory without getting any further bit of information. Why don't you appreciate that the data was seemingly well compressed in the first place? My suggestion: Find a way to generate small chunks of the uncompressed data from the compressed one as needed. — Henrik Schumacher, Feb 27 '20 at 20:57
Also another memory-friendly suggestion: Try to avoid symbols in large arrays. Instead, index your symbols by integers (you can use a simple flat list as dictionary and its PositionIndex as dictionary). Then you have the chance to work with packed arrays which can be stored and processed very efficiently. Plus, you can employ Compile to speed up procedures on your data. — Henrik Schumacher, Feb 27 '20 at 21:04
Do you really need to generate all the tuples, to only use (part of) them later? Why not just generate the tuples you need by index? — ciao, Feb 27 '20 at 21:58
@ciao, yes I need all the tuples. What I meant was that I need to call them one by one later, but I need to call them all. So I cannot generate only those I need because I need them all. — Alessandro Mininno, Feb 28 '20 at 08:14
My idea was maybe to generate them and use them and then discard them maybe. So that I don't need to save them. It's an intermediate step in my code, so I just need the tuple to manipulate later in different functions, but I need to repeat such manipulation for any Tuple — Alessandro Mininno, Feb 28 '20 at 08:15
@henrik yes, I don't have symbols in my array. In the other question the letters were just a short way to write an actual array with numbers. The dimensions of such array can change but it doesn't matter, what I needed was the combinatorics with all the other arrays. I'm not good with compressed data management, maybe I didn't get what you're suggesting: hash tables? — Alessandro Mininno, Feb 28 '20 at 08:20
@AlessandroMininno - in that case, just write a simple function to generate the tuples, say by index, and call that function with the sequence of index values.
That said, I have the same feeling as Henrik that this may be an X-Y question: What precisely is the end game here? Are you generating some combinatorial structure and then counting members that meet some criteria? If so, why not just do it directly using combinatorial means? — ciao, Feb 28 '20 at 08:21
@ciao The idea is that I actually have a matrix as in the previous question. I need to generate all possible combinations because I need the matrices formed by such combinations to compute the polytopes given by those vectors. The example in the previous question is small; suppose that instead of a vector {a,b} I have a vector containing 20 components, each component is an array of 10 components, and I have 18 such {a,b}-like vectors. How can I efficiently create such tuples? — Alessandro Mininno, Feb 28 '20 at 08:32
@AlessandroMininno Hm. Then the question is what you are about to do with the polytopes? Do you really, really need them all at the same time? Just as an example consider linear programming: There one tries to minimize a linear function on a polytope. To this end, one often applies the simplex algorithm: In each step one sits at one of the boundary vertices of the polytope and one requires only knowledge about the neighboring boundary facets. And such information can probably be computed on the fly (and discarded afterwards). — Henrik Schumacher, Feb 28 '20 at 08:39
@HenrikSchumacher well, I don't need all the polytopes at the same time, no, you're right. This is why I was asking if there is a way to generate the first polytope, maybe saving the matrix to a file, move on to the second one, and so on. Or maybe I can generate the first polytope, do what I need to do with it, save my final output, and move on to the second polytope. I agree with you: in fact, I don't need to generate all possible tuples at once. I can also generate them one by one, but they need to be different from one another. — Alessandro Mininno, Feb 28 '20 at 08:42
@HenrikSchumacher If you know the code of how the function Distribute or Tuples is written, maybe it's sufficient to add a line "Export" to the end, so that instead of storing it in an array, it stores it in a file and then it doesn't memorize it. Is it a crazy idea? — Alessandro Mininno, Feb 28 '20 at 08:43
@HenrikSchumacher I didn't know. So I think that maybe the best idea is still to see how Distribute works and, instead of storing the result in an array, just select one tuple, use it, and then move on to the next tuple, everything inside the Distribute function. That should be like using the data without first generating it all. Right? Do you have any suggestions? — Alessandro Mininno, Feb 28 '20 at 08:45
Yes, it is probably crazy. Saving the data will be 1000 times slower than recomputing it. Even on an SSD, hard drive is waaaay slower than RAM access and RAM access is slow compared to what modern CPUs can bite through. — Henrik Schumacher, Feb 28 '20 at 08:45

score 2 · Answer 1 · answered Feb 28 '20 at 09:05

Here's a quick-n-dirty way to generate the tuples by index. This will handle things that would be preposterous to try to generate first using Tuples. This should give you a start; it can easily be modified to generate tuples in batches of arbitrary size.

tups[l_, n_] := Module[{l1 = Length@l, l2 = Rest[Length /@ l]},
   (* write n - 1 in mixed radix, one digit per sublist, then add 1 to get 1-based positions *)
   Extract[l, Transpose[{Range@l1, IntegerDigits[n - 1, MixedRadix[l2], l1] + 1}]]];
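
As a sanity check on a small input, the index-based function can be compared with Tuples directly. The list small below is a made-up example, not taken from the question. The mixed-radix digits of n - 1 pick one position in each sublist, with the last sublist varying fastest, which is the ordering Tuples uses as well.

```mathematica
(* assumes tups from the definition above has been evaluated *)
(* hypothetical small input: 2 * 3 * 2 = 12 tuples in total *)
small = {{a, b}, {c, d, e}, {f, g}};
total = Times @@ Length /@ small;

(* tups[small, k] should reproduce the k-th element of Tuples[small] *)
Table[tups[small, k], {k, total}] === Tuples[small]
```

This should evaluate to True. Note that valid indices run from 1 to Times @@ Length /@ l; the function does not check that n is in range.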

An example input for tuple generation: a list of 20 elements, each with 10 elements, each of which has two elements:

example = ArrayReshape[Range@400, {20, 10, 2}];

This would generate 10^20 tuples...

First tuple:

tups[example, 1] // Short // AbsoluteTiming

{0.0002863,{{1,2},{21,22},{41,42},{61,62},<<12>>,{321,322},{341,342},{361,362},{381,382}}}

A random deep tuple:

tups[example, 95675769776785995775] // Short // AbsoluteTiming

{0.0002038,{{19,20},{31,32},{53,54},{75,76},{91,92},<<11>>,{331,332},{355,356},{375,376},{389,390}}}
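
The batch variant mentioned at the start of this answer could look roughly like the sketch below. The names tupsBatch and process are hypothetical: process only stands in for whatever is really done with each tuple (here it just sums all entries). The point is that each tuple is generated, used, and discarded, so memory use does not grow with the number of tuples visited.

```mathematica
(* assumes tups and example from above have been evaluated *)

(* placeholder for the real per-tuple computation: sum of all entries *)
process[t_] := Total[t, 2];

(* tuples with indices start, start + 1, ..., start + size - 1 *)
tupsBatch[l_, start_, size_] :=
  Table[tups[l, k], {k, start, start + size - 1}];

(* a batch of 5 tuples starting deep inside the index range *)
batch = tupsBatch[example, 10^6 + 1, 5];
Length[batch]

(* stream over the first 1000 tuples; only the running result is kept *)
acc = 0;
Do[acc += process[tups[example, k]], {k, 1000}];
acc
```

For parallel work, the index range can be split into disjoint blocks and each block handed to a separate kernel, since every tuple depends only on its own index.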

That's great! So I just need to know the number of tuples that I expect, and the function computes such a combination in almost no time. I think that this is exactly what I was looking for. I'll try it on some of my examples! Thank you! — Alessandro Mininno, Feb 28 '20 at 09:36
