Your naive approach works for 1D lists, so one idea is to flatten the matrix into a vector, extract and zero the elements there, and then reshape back into a matrix:
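For a self-contained test, sample data can be set up along these lines (the dimensions and the number of positions are illustrative guesses, not the question's actual setup; the matrix a, dimensions m and n, and the list pos of {row, column} pairs are assumed throughout):

```mathematica
(* Hypothetical sample data; sizes are illustrative only *)
m = n = 3000;
a = RandomReal[1, {m, n}];
pos = DeleteDuplicates @ RandomInteger[{1, m}, {10^6, 2}];
```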
b = ArrayReshape[a, m n]; //AbsoluteTiming
newpos = {n, 1} . (Transpose[Developer`ToPackedArray @ pos] - {1, 0}); //AbsoluteTiming
e = b[[newpos]]; //AbsoluteTiming
b[[newpos]] = 0; //AbsoluteTiming
newa = ArrayReshape[b, {m, n}];//AbsoluteTiming
{0.110141, Null}
{0.014826, Null}
{1.03683, Null}
{1.03194, Null}
{0.067154, Null}
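The newpos line converts each {i, j} pair into its position in the flattened (row-major) vector: {n, 1}.({i, j} - {1, 0}) is n (i - 1) + j. A quick sanity check on a small 4×5 example:

```mathematica
mat = ArrayReshape[Range[20], {4, 5}];  (* 4×5, so n = 5 columns *)
{5, 1} . ({2, 3} - {1, 0})              (* 8, the linear index of {2, 3} *)
mat[[2, 3]] == Flatten[mat][[8]]        (* True *)
```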
To do a comparison, I will truncate the length of pos to 1000:
pos = pos[[;;1000]];
b = ArrayReshape[a, m n]; //AbsoluteTiming
newpos = {n, 1} . (Transpose[Developer`ToPackedArray @ pos] - {1, 0}); //AbsoluteTiming
e = b[[newpos]]; //AbsoluteTiming
b[[newpos]] = 0; //AbsoluteTiming
newa = ArrayReshape[b, {m, n}];//AbsoluteTiming
{0.091493, Null}
{0.001057, Null}
{0.963493, Null}
{1.02483, Null}
{0.066962, Null}
It is interesting that the extraction and assignment steps take essentially the same time whether pos has length 1000 or 10^6. Your "slow" approach:
val = Table[{i,j}=e; v=a[[i,j]]; a[[i,j]]=0; v, {e, pos}];//AbsoluteTiming
{2.81587, Null}
Check:
a == newa
SparseArray[val] == e
True
True
So, the timing is comparable when pos has a length of ~1000. On the other hand, your approach for a length of ~10^6 will be about 1000 times slower.
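If this becomes a common operation, the steps above can be wrapped into one function (extractAndZero is a hypothetical name, and this is just a sketch of the vector-based approach):

```mathematica
extractAndZero[a_?MatrixQ, pos_List] :=
 Module[{m, n, b, newpos, e},
  {m, n} = Dimensions[a];
  b = Flatten[a];  (* copy of a as a vector *)
  newpos = {n, 1} . (Transpose[Developer`ToPackedArray @ pos] - {1, 0});
  e = b[[newpos]];  (* extracted values *)
  b[[newpos]] = 0;  (* zero them out *)
  {e, ArrayReshape[b, {m, n}]}
  ]
```

Then {e, newa} = extractAndZero[a, pos] gives both the extracted values and the zeroed matrix in one call.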
Addendum
Another possibility is to create a mask and use it to zero out the positions. SparseArray[pos -> 0, {m, n}, 1] is an array with background value 1 and explicit zeros at pos, so multiplying it elementwise by a zeroes exactly those entries:
masked = SparseArray[pos -> 0, {m, n}, 1] a; //AbsoluteTiming
masked == newa
{0.104856, Null}
True
It is also possible to use this approach to recreate the e vector from before:
index = SparseArray[pos->1, {m, n}]; //AbsoluteTiming
v = index a; //AbsoluteTiming
(* Shift by Min[v] - 1 so every value at pos becomes positive and survives "NonzeroValues", then undo the shift: *)
values = SparseArray[index (v - Min[v] + 1)]["NonzeroValues"] + Min[v] - 1; //AbsoluteTiming
(* "NonzeroValues" is in row-major order; Ordering @ Ordering @ pos gives the rank of each position in that order, restoring the original order of pos: *)
newe = values[[Ordering @ Ordering @ pos]]; //AbsoluteTiming
newe == e
{0.030043, Null}
{0.008961, Null}
{0.06608, Null}
{0.323891, Null}
True
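The Ordering @ Ordering step deserves a comment: applied to a list, it returns the rank of each element in sorted order, which is what maps the row-major-sorted "NonzeroValues" back to the original order of pos. A small illustration:

```mathematica
list = {30, 10, 20};
Ordering[list]              (* {2, 3, 1}: positions that sort list *)
Ordering @ Ordering @ list  (* {3, 1, 2}: rank of each element *)
Sort[list][[Ordering @ Ordering @ list]] === list  (* True *)
```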
e = b[[newpos]] and b[[newpos]] = 0 are way too memory-inefficient. Even for a $10^4 \times 10^4$ matrix with density $10^{-3}$, they use up more than 1GB of RAM. My goal is to work with $10^7 \times 10^7$ sparse matrices. Hope someone provides a new answer : ). – Leo Jun 28 '19 at 17:15