Efficient memory usage while building a large sparse matrix

Question

I am sure the following problem has been solved already, but I am unable to find any solution... Any help appreciated!

So I am building a pretty huge matrix (or tensor, actually) using

mat=SparseArray[ParallelTable[UnitStep[f[a,b,c,d]],{a,1,D},{b,1,D},{c,1,D},{d,1,D}]]]

In my specific case I have D=180 and f is a simple function (basically addition or subtraction of certain elements stored in a short list), such that the majority of the values UnitStep[f[...]] are zeros and the resulting matrix is indeed sparse.

Now, constructing this array takes quite some time, but more problematically, it uses around 12 GB of memory during the computation. I want to use less memory, but I do not know how. However, I think it should be possible because it is a very sparse array containing only zeros and a few ones. In particular, using

Export["mat.mx",max]

the final file on my computer is only 12 MB (instead of GB). Is there a better way?

Without the specific definition of f, it's hard to give advice. Anyway, if I have to guess, building the SparseArray from rules rather than converting a normal tensor to a SparseArray should help. (See the examples in document of SparseArray for more info. ) — xzczd, Jan 26 '23 at 03:56
Memoizing f may be more efficient than parallelizing. But as @xzczd says, nothing can be said without details on f. — Roman, Jan 26 '23 at 06:01
I vote to reopen the question. Even though the author never stated that explicitly, looking at the form of the SparseArray construction, which uses UnitStep, and given the information that the resulting sparse matrix is much smaller than the intermediate result, it is possible to conclude that UnitStep[f[...]] produces mostly zeros, and that the author wants to avoid consuming too much memory for the intermediate result. After that assumption, the question becomes well-defined. — Victor K., Feb 01 '23 at 08:15
mat = SparseArray[Reap[Do[val = UnitStep[f[a, b, c, d]]; If[val != 0, Sow[{a, b, c, d} -> val]], {a, 1, top}, {b, 1, top}, {c, 1, top}, {d, 1, top}]][[2, 1]]]; — Daniel Lichtblau, Feb 05 '23 at 20:10

Victor K. · Accepted Answer · 2023-01-26T06:37:42.873

Even though you have shared no details about f, as others have mentioned, I think I understand your problem.

Consider this (note that D is a protected symbol in Mathematica):

dim = 40;
x = ParallelTable[
  UnitStep[a - b - c - d], 
  {a, 1, dim}, {b, 1, dim}, {c, 1, dim}, {d, 1, dim}];
y = SparseArray[x];

On my machine, x occupies 65 MB of memory while y just 3 MB. So when you do

z = SparseArray[ParallelTable[
    UnitStep[a - b - c - d], 
    {a, 1, dim}, {b, 1, dim}, {c, 1, dim}, {d, 1, dim}]]; // AbsoluteTiming
(* {0.600104, Null} *)

Mathematica uses 65 MB of memory to store the intermediate calculation result.

Instead, you can generate the list of rules for SparceArray directly:

zz = SparseArray[Reap[Do[
       If[UnitStep[a - b - c - d] == 1, Sow[{a, b, c, d} -> 1]],
       {a, 1, dim}, {b, 1, dim}, {c, 1, dim}, {d, 1, dim}
     ]][[2, 1]], {dim, dim, dim, dim}]; // AbsoluteTiming
(* {3.55455, Null} *)
z == zz (* True *)

The code above is not parallelized and is 5.6x slower than the original on my machine, which has 8 cores. You cannot naively parallelize Sow, but you can use Mr.Wizard's trick with minimal performance loss over the original solution (and significant memory gain!).

ParallelEvaluate[foo = {}];
sow[x_] := (foo = {foo, x};)
ParallelDo[
   If[UnitStep[a - b - c - d] == 1, sow[{a, b, c, d} -> 1]],
   {a, 1, dim}, {b, 1, dim}, {c, 1, dim}, {d, 1, dim}]; // AbsoluteTiming
(zzz = SparseArray[
     Flatten@ParallelEvaluate[Flatten@foo], 
     {dim, dim, dim, dim}];) // AbsoluteTiming
{0.444935, Null}
{0.299598, Null}
zzz == z (* True *)

Dear Victor K. Thanks for the answer, that solved exactly the problem (although unfortunately I am not able to understand Mr.Wizard's code to the detail, but it works!). And indeed, f was pretty much the function you wrote down, with the only difference that it called elements list[[a]] from a list instead of directly adding/substracting a,b,c,d. However, I think this neither slows down the code nor does it suck up memory (unless , maybe,```list''' is huge, but it isn't). Thx! — Philipp Strasberg, Jan 27 '23 at 17:36

Efficient memory usage while building a large sparse matrix

1 Answers1

Linked