Adding two SparseArrays produces zeros in the reported "NonzeroValues"

Question

When two SparseArrays are added together and new zero values are created, these new zero values are reported as "NonzeroValues". For example, in Mathematica version 10.2:

tst = SparseArray[{1, 0, 1, 0, 1}] - SparseArray[{1, 0, 1, 0, 1}];

tst["NonzeroValues"]
{0, 0, 0}

tst["NonzeroPositions"]
{{1}, {3}, {5}}

It appears that SparseArrays constructed in this way can become "polluted" with lots of false non-zeros. Is there a way to get Mathematica to quickly compact such a SparseArray and strip out the introduced zeros? In my application, I produce large sparse vectors through many such additions, and I need to quickly identify the positions of nonzero entries.

Edit: My application is similar to RowReduce. I have a large sparse matrix of mostly zeros and ones, and I am implementing pivoting, with selection rules based on the number of nonzero elements in the rows and columns. After a pivot, the number of nonzero elements will change for many of the rows of the matrix. My matrices have hundreds of rows and columns, with densities of around 1%.

score 11 · Accepted Answer · answered Oct 17 '15 at 02:35

Re-applying SparseArray[] to a matrix or vector generated in this way usually restores the sparsity.

p = SparseArray[{Band[{1, 1}] -> {1, 2, 4}, Band[{2, 1}] -> {5, -3}}, {3, 3}];
q = SparseArray[{{2, 2} -> -2, {3, 2} -> 3, {3, 1} -> -1}, {3, 3}];

r = p + q;
rs = SparseArray[r];

Complement[r["NonzeroPositions"], rs["NonzeroPositions"]]
   {{2, 2}, {3, 2}}
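
Applied to the vector from the question, the same idea can be sketched as follows. This is a minimal sketch; the name compact is introduced here only for illustration, and the outputs noted in the code comments are the ones reported for this example in the question and in the comments of this thread.

```mathematica
(* vector from the question: exact subtraction leaves explicit zeros stored *)
tst = SparseArray[{1, 0, 1, 0, 1}] - SparseArray[{1, 0, 1, 0, 1}];

(* re-wrap in SparseArray to drop the explicitly stored zeros *)
compact = SparseArray[tst];

tst["NonzeroValues"]      (* {0, 0, 0}, as reported in the question *)
compact["NonzeroValues"]  (* {}, as reported in the comments *)
```

Note that tst itself is unchanged; to keep only the compacted version, assign the result back to the same symbol.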

J. M.'s missing motivation

This solves the problem, but it requires recreation of the entire SparseArray. In my application, the number of nonzero elements is in the thousands or tens of thousands. Perhaps Mathematica's SparseArray implementation simply does not give a better way to do this. – dcutrell Oct 17 '15 at 02:49
I'm not aware of any tidier way, either. In this example, of course, I could have done r = SparseArray[p + q]; directly; I just wanted to demonstrate that the new zeroes do get recognized. – J. M.'s missing motivation Oct 17 '15 at 02:52
I see now that this does solve my problem in an acceptable way. I am updating a large matrix in a row-by-row fashion, with a need for accurate zero counts after each row change. Hence, I only need to recreate the SparseArray for the particular row I am changing, not the whole matrix. – dcutrell Oct 17 '15 at 03:07
Yes, a construction like {SparseArray[{1, 0, 1, 0}], SparseArray[{0, 0, 1, 1}]} is certainly possible. After doing your row operations, you can then apply SparseArray[] to the whole thing. – J. M.'s missing motivation Oct 17 '15 at 03:36
@J.M. Is it a bug or expected behavior? – Alexey Popkov Oct 17 '15 at 07:38
@dcutrell "but it requires recreation of the entire SparseArray" <- What evidence do you have that this operation is inefficient? Re-applying the SparseArray head is just one possible user interface for compacting the sparse array, and it's not at all clear how this is implemented internally. It could, in principle, be as efficient as possible. – Szabolcs Oct 17 '15 at 07:46
@Alexey, which one? If you're referring to the results of subtraction still being counted as nonzero entries, I'm loath to call it a bug. The case of inexact arithmetic is especially murky; for that, how does one determine the threshold for zeroing out differences that are near, but not quite zero? – J. M.'s missing motivation Oct 17 '15 at 08:02
@J.M. As I understand it, both examples (the one in the question and yours) actually demonstrate exact arithmetic. For inexact cases like SparseArray[{1., 0., 1., 0., 1.}] - SparseArray[{1., 0., 1., 0., 1.}], a zero threshold would be sufficient: 0.` is zero, and anything else isn't. – Alexey Popkov Oct 17 '15 at 08:15
@Alexey, Yes, I was just thinking of a rationale on why "fake" nonzero entries aren't automatically zeroed out in general. This is a relatively easy case, of course. – J. M.'s missing motivation Oct 17 '15 at 08:17
@J.M. The only rationale that comes to mind is the performance cost of finding zero positions after every modification of a SparseArray: it would be logical to determine such things only when necessary, not at every stage of a computation with SparseArrays. That is the reason for my question: is it "by design", so that the user has to apply SparseArray to the final result before using things like "NonzeroPositions", or is it a bug? Earlier I thought that "NonzeroPositions" weren't stored but calculated on the fly. – Alexey Popkov Oct 17 '15 at 08:22
@AlexeyPopkov I would like to remind you that the whole "Nonzero*" family is undocumented, and thus should be immune from prosecution. :) – Silvia Oct 17 '15 at 13:23
@Szabolcs The operation is necessarily inefficient because afterwards there are two complete copies of the given SparseArray. "Re-applying the SparseArray" does not change the original SparseArray. After tst = SparseArray[{1, 0, 1, 0, 1}] - SparseArray[{1, 0, 1, 0, 1}]; newTst = SparseArray[tst]; the call tst["NonzeroValues"] returns {0, 0, 0}, while newTst["NonzeroValues"] returns {}. – dcutrell Oct 18 '15 at 17:19
@dcutrell, you can overwrite the original with the sparsified one (tst = SparseArray[SparseArray[{1, 0, 1, 0, 1}] - SparseArray[{1, 0, 1, 0, 1}]];), but is your memory really that taxed for you not to want to maintain two copies? – J. M.'s missing motivation Oct 18 '15 at 17:22
@dcutrell I still don't understand. You don't need to keep both. You can sa = SparseArray[sa];, then there will only be one copy left. – Szabolcs Oct 18 '15 at 17:23
@Szabolcs, it is true that Mathematica's memory collection will reclaim the memory from the old copy if no reference to it is retained. However, the new copy must still be constructed in the first place. If the array is very large, this construction is not cheap. This is also a response to J.M. – dcutrell Oct 18 '15 at 17:30
@dcutrell That's really a theory, but you have not shown any evidence for it. If you really worry about it, then you can do this: write a LibraryLink function that uses Shared passing to ensure no copy will be made, call MSparseArray_resetImplicitValues on it, then return. But once again: 1. What evidence do you have to show that this is not what happens when you do sa = SparseArray[sa]? 2. Are you sure that the recomputing can be done at all without a temporary internal copy? If yes, explain why. Based on my limited familiarity with the SparseArray internal structure, ... – Szabolcs Oct 18 '15 at 17:47
@dcutrell ... the arrays used to store the column indices, row pointers and the explicit values will need to be reallocated anyway, as part of shortening them. So the equivalent of a temporary copy will happen. Low-level memory allocation functions usually can't release part of a memory block while retaining the rest. The way to trim an array is to reallocate the memory, copy the array, and release the old large one. – Szabolcs Oct 18 '15 at 17:58
@Szabolcs, I agree with you. Some temporary memory overhead will be necessary regardless of the underlying technique. In my case, I'm able to avoid recreating the entire matrix since I can limit the changes to specific rows of the matrix, and do the SparseArray wrapper trick on just those. There would be more overhead if I were to wrap the entire matrix in a SparseArray wrapper. I don't know what tricks Mathematica uses internally when a row of a SparseArray matrix is replaced, but it's slightly better than recreating the entire matrix. I tested this in my larger examples and... – dcutrell Oct 18 '15 at 18:42
@Szabolcs ... I observed a 10% timing improvement between wrapping only the changed rows in SparseArray, versus wrapping the entire matrix in SparseArray. Anyway, thanks for the probing discussion. – dcutrell Oct 18 '15 at 18:45
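
The row-wise approach discussed in the comments above might look like the following sketch. It assumes the matrix is kept as a list of sparse row vectors, as suggested in the comments; the names rows, nonzeroCount and m are hypothetical, and the row operation shown is only a stand-in for a real pivot step.

```mathematica
(* matrix stored as a list of sparse rows, as suggested in the comments *)
rows = {SparseArray[{1, 0, 1, 0}], SparseArray[{0, 0, 1, 1}]};

(* hypothetical helper: number of explicitly stored entries in a sparse vector *)
nonzeroCount[v_SparseArray] := Length[v["NonzeroValues"]]

(* stand-in row operation; wrapping the result in SparseArray
   recompacts only the row that changed *)
rows[[1]] = SparseArray[rows[[1]] - rows[[2]]];

(* per-row counts, now free of explicitly stored zeros in the updated row *)
nonzeroCount /@ rows

(* once all row operations are done, assemble a single sparse matrix *)
m = SparseArray[rows];
```

Only the changed row is rebuilt at each step, which is the saving reported in the comments; whether it pays off for a given matrix size is something to measure rather than assume.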
