When I use DeleteDuplicates to delete equal rows of a sparse array, the output is not a sparse array anymore. In my application, the sparse arrays are huge but very sparse, so converting back and forth is quite annoying.

Is there a version of, or alternative to, DeleteDuplicates that always preserves sparse arrays? (i.e. without converting back and forth between dense and sparse representations)

Gert
  • You mean, you want to delete duplicate rows? – Henrik Schumacher Jun 16 '23 at 12:00
  • Yes. But preferably still with the possibility to use a test function of choice (e.g. when two rows are equal up to sign) – Gert Jun 16 '23 at 13:43
  • You can avoid unsparsifying rows by doing SparseArray @ DeleteDuplicates[List @@ sparse] – Carl Woll Jun 16 '23 at 14:23
  • Deleting duplicate rows would change the dimensions of the array. That in turn would change the coordinate->value rules. You won't be preserving much of anything. So, if DeleteDuplicates did return a sparse array, you'd just have a whole other set of annoyances to deal with. Are you sure DeleteDuplicates is the operation you want? Maybe there is a more algebraic operation that you're looking for. – lericr Jun 16 '23 at 15:44

1 Answer

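It often helps when the OP provides their own test case. Lacking one, here is an example, on the assumption that any test case illustrates the problem well enough. The idea is to deduplicate the row indices, using each sparse row as the key, and then extract the surviving rows with Part, so the result stays a SparseArray: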

Quit[]

SeedRandom[0]; sparse = SparseArray[ Thread[RandomInteger[{1, 1000}, {100, 2}] -> 1], {1000, 1000}]

MaxMemoryUsed[]
(*  150671984  *)
sparse[[DeleteDuplicatesBy[Range@Length@sparse, sparse[[#]] &]]]
MaxMemoryUsed[]
(*  150671984  *)

Or, per the OP's comment under the question (rows that are equal up to sign):

SeedRandom[0];
sparse = SparseArray[
  Thread[RandomInteger[{1, 1000}, {100, 2}] -> 
    RandomChoice[{-1, 1}, 100]], {1000, 1000}]
sparse[[
 DeleteDuplicatesBy[Range@Length@sparse, Abs[sparse[[#]]] &]]]

(The result has the same dimensions as in the first example, since Abs[-1] == 1.)
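
Note that keying on Abs merges any two rows whose entries agree in absolute value, which is broader than equality up to an overall sign (for example, {1, -1} and {1, 1} would be treated as duplicates). If only an overall sign should be ignored, one possible sketch is to normalize each row so that its first stored entry is positive. The helper name signKey is made up for illustration, and the sketch assumes real-valued entries, a default element of 0, no explicitly stored zeros, and the sparse matrix defined above:

```mathematica
(* signKey is a hypothetical helper, not a built-in.
   It builds a key from the stored entries of one sparse row,
   rescaled so that the first stored value is positive. *)
signKey[row_SparseArray] := Module[{rules = Most[ArrayRules[row]]},
  If[rules === {},
   {},  (* an all-zero row gets the empty key *)
   {rules[[All, 1]], rules[[All, 2]]*Sign[rules[[1, 2]]]}]]

(* Deduplicate the row indices by that key, then extract the rows with Part. *)
sparse[[DeleteDuplicatesBy[Range@Length@sparse, signKey[sparse[[#]]] &]]]
```

Only the stored entries are inspected through ArrayRules, so no row should need to be converted to a dense list along the way.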

Michael E2