Rank of singular, large, sparse matrices

Question

I need to find the rank (and eventually do the Gaussian elimination) of a large, sparse, non-square matrix of integers. There are a few methods in Mathematica that find the rank of a non-square matrix (or decompose it, and thereby find the rank) and work more or less out of the box: MatrixRank, SingularValueList, QRDecomposition, HermiteDecomposition.

For large and sparse matrices there is an ultra-fast LU decomposition available through LinearSolve[..., Method -> "Multifrontal"]. It needs a square matrix, but a rectangular one can be made square simply by padding it with zeros (the "Background" value, in SparseArray terms), which does not change the rank. This gives a very fast way to calculate the rank, implemented as SparseMatrixRank below.
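
To see why padding is cheap, here is a minimal C++ sketch (C++ because that is the language of the rankQR.cpp helper further down). It assumes a plain coordinate (triplet) storage; the names Triplets and padToSquare are hypothetical. In such a format, padding with background zeros changes only the stored dimensions, not the list of nonzeros, and appending zero rows or columns leaves the rank unchanged. SparseMatrixRank transposes a tall matrix and then appends rows; appending zero columns, as here, gives the same rank.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Coordinate (triplet) storage: only nonzeros are stored, every other
// entry is the implicit background value 0.
struct Triplets {
    std::size_t rows = 0, cols = 0;
    std::vector<std::size_t> i, j;  // 0-based row and column indices
    std::vector<long long> v;       // matching values
};

// Make a rectangular matrix square by enlarging its smaller dimension.
// The added rows (or columns) are all background zeros, so no entries are
// stored for them and the rank does not change.
Triplets padToSquare(Triplets a) {
    assert(a.i.size() == a.v.size() && a.j.size() == a.v.size());
    const std::size_t n = a.rows > a.cols ? a.rows : a.cols;
    a.rows = n;
    a.cols = n;
    return a;
}
```

For a 2 x 5 matrix with two stored entries, the padded matrix is 5 x 5 and still stores exactly those two entries.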

The problem is that the underlying algorithm does not always work when the matrix is singular (LinearSolve also issues a warning), but if the decomposition reproduces the original matrix, we can trust the result.
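
The trust test in SparseMatrixRank compares the permuted product (L . U)[[p, q]] with the padded matrix. A dense C++ sketch of the same check follows, assuming 0-based permutation vectors and an absolute tolerance because the factors are floating-point; the function name reconstructs and the default tolerance 1e-9 are illustrative choices, not part of LinearSolve.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

using Matrix = std::vector<std::vector<double>>;

// Returns true if (L*U)[p[r]][q[c]] matches a[r][c] for every entry of the
// n x n matrix a, up to an absolute tolerance. This mirrors the test
// (L . U)[[p, q]] == a, with 0-based permutations.
bool reconstructs(const Matrix& L, const Matrix& U,
                  const std::vector<std::size_t>& p,
                  const std::vector<std::size_t>& q,
                  const Matrix& a, double tol = 1e-9) {
    const std::size_t n = a.size();
    assert(L.size() == n && U.size() == n && p.size() == n && q.size() == n);
    for (std::size_t r = 0; r < n; ++r) {
        for (std::size_t c = 0; c < n; ++c) {
            double s = 0.0;  // entry (p[r], q[c]) of L*U
            for (std::size_t k = 0; k < n; ++k) s += L[p[r]][k] * U[k][q[c]];
            if (std::fabs(s - a[r][c]) > tol) return false;
        }
    }
    return true;
}
```

For the singular factors L = {{1, 0}, {2, 1}} and U = {{1, 3}, {0, 0}} the product is {{1, 3}, {2, 6}}, so the check passes for that matrix (with identity permutations) and fails for {{1, 3}, {2, 7}}.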

In my real case it does not work, perhaps because the system is too big (30k x 100k), too dense (~4/30k), or because of its shape. So I am doing a very simple LU decomposition by hand that gives me the rank. I was hoping to use the functionality of SparseArray to make it as fast as possible while using as little memory as possible.

So far, my algorithm always gives the correct result, even for somewhat denser matrices where Multifrontal fails. But it is nowhere near as fast, and sometimes it eats all my RAM.
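
The elimination that simpleLU performs can also be sketched outside Mathematica. The C++ sketch below uses the same sweep (each nonzero row becomes a pivot row, its first nonzero column is the pivot, and that column is cleared in all rows below), but, as a swapped-in technique, it works modulo the prime 2^31 - 1 instead of with exact rationals, so every entry stays a fixed-size integer. The rank modulo a prime can never exceed the rank over the rationals, and it is smaller only if the prime divides every nonzero minor of maximal size; that is unlikely for a large prime but not impossible. The names P, SparseRow, toMod and rankModP are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <map>
#include <vector>

using u64 = std::uint64_t;
const u64 P = 2147483647ULL;  // the prime 2^31 - 1; products of residues fit in 64 bits

// One sparse row: column index -> nonzero value modulo P (zeros are never stored).
using SparseRow = std::map<std::size_t, u64>;

// b^e mod P by repeated squaring.
u64 powMod(u64 b, u64 e) {
    u64 r = 1;
    b %= P;
    while (e > 0) {
        if (e & 1) r = r * b % P;
        b = b * b % P;
        e >>= 1;
    }
    return r;
}

// Residue of a (possibly negative) integer entry.
u64 toMod(long long v) {
    const long long p = static_cast<long long>(P);
    const long long r = v % p;
    return static_cast<u64>(r < 0 ? r + p : r);
}

// Rank modulo P with the simpleLU sweep. The rows are taken by value,
// so the caller's matrix is left untouched.
std::size_t rankModP(std::vector<SparseRow> rows) {
    std::size_t rank = 0;
    for (std::size_t i = 0; i < rows.size(); ++i) {
        if (rows[i].empty()) continue;                 // zero row: not a pivot row
        ++rank;
        const std::size_t ck = rows[i].begin()->first; // first nonzero column
        assert(rows[i].begin()->second != 0);
        const u64 inv = powMod(rows[i].begin()->second, P - 2);  // Fermat inverse of the pivot
        for (std::size_t j = i + 1; j < rows.size(); ++j) {
            auto it = rows[j].find(ck);
            if (it == rows[j].end()) continue;         // already zero below the pivot
            const u64 f = it->second * inv % P;        // multiplier, L[[j, i]] in simpleLU
            for (const auto& entry : rows[i]) {        // rows[j] -= f * rows[i]
                u64& x = rows[j][entry.first];
                x = (x + P - f * entry.second % P) % P;
                if (x == 0) rows[j].erase(entry.first);
            }
        }
    }
    return rank;
}
```

Counting pivot rows gives the rank because each pivot row has a nonzero in a column that every later row has zero in, so the pivot rows are linearly independent, while row operations never change the row space. For {{1, 2, 3}, {2, 4, 6}, {0, 0, 5}} the sweep finds two pivot rows, i.e. rank 2.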

If you have suggestions on how to improve this code, for example by exploiting the Methods of SparseArray or by spotting obvious inefficiencies in my coding, I think it could be useful for many people who do not want to resort to other programming languages to tackle large systems.

My simpleLU

Clear[simpleLU]
simpleLU[A_SparseArray] :=
 Module[{L, U, m, n, rk, ck, rank = 0, auxA = A, js, subt},
  {m, n} = Dimensions[A];
  L = IdentityMatrix[m, SparseArray];
  subt = ConstantArray[0, m];
  (*Loop over all rows i that are not zero*)
  Do[
   If[Total[Abs[auxA[[i]]]] != 0,
    rank++;
    (*Pivot has indices {rk,ck}*)
    rk = i;
    ck = auxA[[i]]["NonzeroPositions"][[
        FirstPosition[Flatten[auxA[[i]]["NonzeroValues"]], _?(# != 0 &)][[1]]]][[1]];
    (*Preselect which rows (js) have nonzero elements below the pivot*)
    js = Select[Flatten[auxA[[All, ck]]["NonzeroPositions"]], (# >= rk + 1 &)];
    js = Select[js, (auxA[[#, ck]] != 0 &)];
    (*Make zero all elements below the pivot auxA[[rk,ck]]*)
    L[[js, rk]] = auxA[[js, ck]]/auxA[[rk, ck]];
    Do[subt[[j]] = L[[j, rk]]*auxA[[rk]], {j, js}];
    auxA[[js]] -= subt[[js]]; (*<- bottleneck*)
    ],
   {i, m}];
  U = auxA;
  Print["simpleLU worked? ", A == L . U];
  Print["simpleLU rank: ", rank];
  {L, U}]

The SparseMatrixRank that does not always work

SparseMatrixRank[A_SparseArray?MatrixQ] :=
 Module[{a, S, U, L, p, q, d1, d2},
  {d1, d2} = Dimensions[A];
  (*Pad with background rows to make the matrix square (transpose first if it is tall)*)
  a = If[d1 > d2,
    Join[Transpose[A], SparseArray[{}, {d1 - d2, d1}, A["Background"]]],
    Join[A, SparseArray[{}, {d2 - d1, d2}, A["Background"]]]];
  S = Quiet@LinearSolve[a, Method -> "Multifrontal"];
  U = S["getU"];
  L = S["getL"];
  {p, q} = S["getPermutations"];
  Print["Multifrontal worked? ", (L . U)[[p, q]] == a];
  Print["Multifrontal rank: ", Total[Unitize[Diagonal[Chop[U]]]]];
  ]

A denser test matrix on which SparseMatrixRank sometimes fails

n = 3000;
d2 = 100;
d1 = 50;
i = RandomInteger[{1, d1}, n];
j = RandomInteger[{1, d2}, n];
v = RandomInteger[{-10, 10}, n];
A = SparseArray[Transpose[{i, j}] -> v, {d1, d2}];
A // SparseMatrixRank // EchoTiming;
A // simpleLU // EchoTiming;

Update:

I have improved some basic syntax of simpleLU, and it now runs 5x faster (the code above is updated). Depending on the density and size of the matrix, it outperforms other built-in methods (SingularValueList, QRDecomposition, HermiteDecomposition), and it always gives the correct rank (unlike Multifrontal). It is still slow for very large systems, but so far it is the best I have for my problem.

If someone is willing to have a look, I'm fairly sure it can still be improved; I have marked the bottleneck.
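
The marked bottleneck, auxA[[js]] -= subt[[js]], replaces each row j by row j minus L[[j, rk]] times the pivot row. On rows stored as sorted index lists, each such update is one merge pass over the two lists, linear in their number of nonzeros. The C++ sketch below uses a fraction-free variant (row <- a*row - b*pivot, with a = pivot[ck] and b = row[ck]) in place of the rational multiplier, so everything stays in integers. Scaling a row by the nonzero a does not change the rank, but entries can grow, so long long may overflow on long elimination chains. Row and eliminate are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// One sparse row: strictly increasing column indices with matching nonzero values.
struct Row {
    std::vector<std::size_t> col;
    std::vector<long long> val;
};

// Fraction-free update  row <- a * row - b * pivot.
// With a = pivot[ck] and b = row[ck], column ck cancels exactly.
// Both rows are sorted, so a single merge pass suffices; entries that
// become zero are not stored.
Row eliminate(const Row& row, const Row& pivot, long long a, long long b) {
    assert(row.col.size() == row.val.size() && pivot.col.size() == pivot.val.size());
    Row out;
    std::size_t r = 0, p = 0;
    while (r < row.col.size() || p < pivot.col.size()) {
        std::size_t c;
        long long x;
        if (p == pivot.col.size() ||
            (r < row.col.size() && row.col[r] < pivot.col[p])) {
            c = row.col[r]; x = a * row.val[r]; ++r;           // only in row
        } else if (r == row.col.size() || pivot.col[p] < row.col[r]) {
            c = pivot.col[p]; x = -b * pivot.val[p]; ++p;      // only in pivot
        } else {                                               // in both
            c = row.col[r]; x = a * row.val[r] - b * pivot.val[p]; ++r; ++p;
        }
        if (x != 0) { out.col.push_back(c); out.val.push_back(x); }
    }
    return out;
}
```

For example, with pivot row {2 at column 0, 4 at column 2} and row {3 at column 0, 5 at column 1}, the call eliminate(row, pivot, 2, 3) cancels column 0 and returns {10 at column 1, -12 at column 2}.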

As @HenrikSchumacher pointed out, it would be great to have an interface to the methods of SuiteSparse. If you have intractably large systems, you probably have to go there. Besides C++, SuiteSparse also has a MATLAB interface.

Yeah (+1), it would be a great thing to have a Mathematica interface for SuiteSparse's rank-revealing SPQR function. — Henrik Schumacher, Apr 19 '23 at 17:49
Yes it would! I have tweaked my function a bit and now it runs faster. I also reproduced your comment in the question to help other people who may end up here — Albercoc, Apr 20 '23 at 13:30
I am trying to improve the algorithm itself. Is there a predefined function to swap rows of a SparseArray? In particular, I want to reorder the rows to put those with fewer entries at the top — Albercoc, Apr 24 '23 at 12:57
Yes, finding suitable reorderings of the rows is a big thing in sparse matrix algorithms. But it depends quite a lot on what exactly you want to do with the matrix. For example, for Gaussian elimination and the like, SparseArray`NestedDissection is typically a good starting point, at least to compute a decent elimination tree... What you have in mind is probably closer to SparseArray`ApproximateMinimumDegree. — Henrik Schumacher, Apr 24 '23 at 13:30
But if you are new to sparse matrix algorithms, then I would recommend calling SPQR from SuiteSparse via LibraryLink. — Henrik Schumacher, Apr 24 '23 at 13:32
Hi Henrik. I am learning LibraryLink to use SPQR, following your suggestion. If (when) I run into problems I'll probably make another post. — Albercoc, May 02 '23 at 12:46
But testing the methods of SparseArray, I get "There is no method ApproximateMinimumDegree for SparseArray objects." Both ApproximateMinimumDegree and NestedDissection appear under Names["SparseArray`*"]. Can anybody give a very minimal example of how to use these methods? — Albercoc, May 02 '23 at 12:49
You really have to call these functions with the context in front, i.e., SparseArray`ApproximateMinimumDegree[A] and SparseArray`NestedDissection[A] (of course A has to be a SparseArray that is also a matrix). — Henrik Schumacher, May 02 '23 at 18:37
I know from experience that these work on Intel and Apple CPUs under macOS. But my experience with Windows or Linux is very limited. It could be that some backend libraries are missing on these operating systems, but I would be surprised. — Henrik Schumacher, May 02 '23 at 18:40
Thank you! I did p=SparseArray`ApproximateMinimumDegree[A]; A=A[[p,p]] and it saves about 30% of the time (!) if the matrix is already square. Unfortunately, this is offset by the larger matrix you get when a rectangular matrix has to be made square. On the other hand, it seems that NestedDissection only works with structurally symmetric matrices — Albercoc, May 03 '23 at 07:47
"NestedDissection only works with structurally symmetric matrices". Oh, of course! Sorry, I forgot about that. — Henrik Schumacher, May 03 '23 at 08:26

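On the original question of putting the rows with fewer entries at the top: such a permutation needs only the row lengths, which the CSR row pointers give as differences of consecutive entries (in Mathematica, Differences[A["RowPointers"]]). The C++ sketch below is a simple greedy heuristic, not the approximate minimum degree ordering discussed in the comments; for a row sweep like simpleLU, reordering the rows cannot change the rank, only how much fill-in appears. rowsBySparsity is a hypothetical name, and the permutation it returns is 0-based (add 1 before using it as a Part specification in Mathematica).

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <numeric>
#include <vector>

// Given CSR row pointers (size m + 1; row r owns entries rowPtr[r] .. rowPtr[r+1] - 1),
// return a row permutation that puts rows with fewer nonzeros first.
// stable_sort keeps the original order among rows with equal counts.
std::vector<std::size_t> rowsBySparsity(const std::vector<std::size_t>& rowPtr) {
    assert(!rowPtr.empty());
    const std::size_t m = rowPtr.size() - 1;
    std::vector<std::size_t> perm(m);
    std::iota(perm.begin(), perm.end(), std::size_t{0});
    std::stable_sort(perm.begin(), perm.end(), [&](std::size_t a, std::size_t b) {
        return rowPtr[a + 1] - rowPtr[a] < rowPtr[b + 1] - rowPtr[b];
    });
    return perm;
}
```

With row pointers {0, 3, 4, 4, 6} the row lengths are 3, 1, 0, 2, so the permutation is {2, 1, 3, 0}.
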
score 5 · Accepted Answer · answered May 23 '23 at 07:54

I am now able to use SuiteSparse with Mathematica. My advice if you are facing the same problem I had:

SparseMatrixRank is the fastest and simplest to implement. You will get warnings, but if the output is "Multifrontal worked? True", you can trust the result.
simpleLU is significantly slower but always gives the correct answer. The memory problems I was having are fixed in the code above. Wrap the loop in Monitor[..., rank] to see the progress.
SuiteSparseQR: if the matrix is huge and the other solutions do not work for you, you need third-party code. SuiteSparse contains several extraordinarily fast methods for sparse matrices. The downside is that you need to install it (together with all its dependencies) and get its C++ interface running.
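
SPQR can report a rank because its QR factorization is rank-revealing: roughly speaking, columns whose remaining norm is at most a tolerance are treated as dependent. SPQR does this with its own sparse multifrontal machinery; the dense C++ sketch below only illustrates the idea with the textbook alternative, Householder QR with column pivoting and an absolute tolerance. The name qrRank and the default tolerance 1e-10 are illustrative assumptions, and a scale-relative tolerance would be more robust in practice.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <utility>
#include <vector>

using Dense = std::vector<std::vector<double>>;  // row-major, m x n

// Numerical rank via Householder QR with column pivoting. At step k the
// remaining column with the largest norm (over rows k..m-1) is swapped into
// position k; the rank is the number of steps completed before that largest
// remaining norm drops to tol or below.
std::size_t qrRank(Dense a, double tol = 1e-10) {
    const std::size_t m = a.size();
    const std::size_t n = m ? a[0].size() : 0;
    for (const auto& row : a) assert(row.size() == n);
    std::size_t k = 0;
    for (; k < std::min(m, n); ++k) {
        // column pivoting: find the largest remaining column norm
        std::size_t best = k;
        double bestSq = -1.0;
        for (std::size_t j = k; j < n; ++j) {
            double s = 0.0;
            for (std::size_t i = k; i < m; ++i) s += a[i][j] * a[i][j];
            if (s > bestSq) { bestSq = s; best = j; }
        }
        const double norm = std::sqrt(bestSq);
        if (norm <= tol) break;  // the trailing block is numerically zero
        for (std::size_t i = 0; i < m; ++i) std::swap(a[i][k], a[i][best]);

        // Householder vector v = x - alpha*e_k (sign chosen to avoid cancellation),
        // so that H = I - 2 v v^T / (v^T v) maps column k to alpha*e_k.
        const double alpha = a[k][k] >= 0.0 ? -norm : norm;
        std::vector<double> v(m, 0.0);
        for (std::size_t i = k; i < m; ++i) v[i] = a[i][k];
        v[k] -= alpha;
        double vv = 0.0;
        for (std::size_t i = k; i < m; ++i) vv += v[i] * v[i];  // > 0 since |v[k]| >= norm
        for (std::size_t j = k; j < n; ++j) {  // apply H to the trailing columns
            double dot = 0.0;
            for (std::size_t i = k; i < m; ++i) dot += v[i] * a[i][j];
            const double f = 2.0 * dot / vv;
            for (std::size_t i = k; i < m; ++i) a[i][j] -= f * v[i];
        }
    }
    return k;
}
```

For {{1, 2}, {2, 4}, {3, 6}} the first reflection leaves only rounding noise below the first row, so the sketch reports rank 1.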

Linking SuiteSparseQR to Mathematica

After many attempts to use LibraryLink, I gave up. Many simple cases are nicely documented, and getting it to work with a simple C++ program is possible. But with third-party libraries the kernel crashed, and I found no way to debug it.

The workaround is almost embarrassingly simple: save the SparseArray to an ASCII (Matrix Market) file with Export, then call the C++ program from Mathematica with RunProcess. Wrapped in a function, it is (for my purpose) as good as using LibraryLink. It takes only a few seconds for my 30k x 200k matrices (!!)
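
The .mtx file that Export writes for a SparseArray is expected to be in the Matrix Market coordinate format: a banner line, a line with the number of rows, columns and stored entries, then one line per entry with 1-based row and column indices. A minimal C++ writer for an integer matrix follows, assuming the "integer general" banner; what Mathematica actually writes may differ, for example in the field type. Entry and toMatrixMarket are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <sstream>
#include <string>
#include <vector>

struct Entry { std::size_t row, col; long long value; };  // 0-based indices

// Write an m x n integer matrix in Matrix Market coordinate format:
// a banner line, a size line "rows cols nnz", then one "i j value" line per
// stored entry with 1-based indices.
std::string toMatrixMarket(std::size_t m, std::size_t n,
                           const std::vector<Entry>& entries) {
    std::ostringstream out;
    out << "%%MatrixMarket matrix coordinate integer general\n";
    out << m << ' ' << n << ' ' << entries.size() << '\n';
    for (const Entry& e : entries) {
        assert(e.row < m && e.col < n);
        out << e.row + 1 << ' ' << e.col + 1 << ' ' << e.value << '\n';
    }
    return out.str();
}
```

For the 3 x 4 example matrix below (diagonal entries 1, 2, 3), this produces the size line "3 4 3" followed by the lines "1 1 1", "2 2 2" and "3 3 3".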

Clear[rankSparseQR]
rankSparseQR[A_SparseArray] :=
 Module[{matrixPath, cmdCompile, cmdRun, env, stdout},
  (*Save the matrix in Matrix Market format; compile; run SPQR with the matrix file as argument*)
  matrixPath = Export[NotebookDirectory[] <> "tmp/mat.mtx", A];
  cmdCompile = "g++ " <> NotebookDirectory[] <> "SPQR/rankQR.cpp " <>
    "-lcholmod -lspqr -lsuitesparseconfig -o " <> NotebookDirectory[] <> "SPQR/tmp";
  cmdRun = NotebookDirectory[] <> "SPQR/tmp " <> matrixPath;
  (*Make the shared SuiteSparse libraries visible to the executable*)
  env = <|"LD_LIBRARY_PATH" -> "/home/etlar/albercoc/Programs/SuiteSparse-7.0.1/CHOLMOD/build:$LD_LIBRARY_PATH"|>;
  RunProcess[{$SystemShell, "-c", cmdCompile}, "StandardError"] // Print;
  (*The run command is passed to the shell as standard input*)
  stdout = RunProcess[$SystemShell, "StandardOutput", cmdRun, ProcessEnvironment -> env];
  stdout // Print;
  (*The rank is printed in the last 10 characters of the output*)
  stdout // StringTake[#, -10] & // ToExpression
  ]
sa = SparseArray[{{1, 1} -> 1, {2, 2} -> 2, {3, 3} -> 3, {3, 4} -> 0}];
rankSparseQR[sa]

3
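
rankSparseQR recovers the rank with StringTake[#, -10], which works because rankQR.cpp prints the rank with printf("%10ld", ...) as the very last thing, after CHOLMOD's own diagnostic output. The contract can be sketched in C++; appendRank and parseRank are hypothetical helpers that stand in for the two sides.

```cpp
#include <cassert>
#include <cstdio>
#include <string>

// Producer side (what rankQR.cpp does): append the rank, right-aligned in a
// 10-character field, as the very last output after any diagnostic text.
std::string appendRank(const std::string& log, long rank) {
    char field[32];
    std::snprintf(field, sizeof field, "%10ld", rank);
    return log + field;
}

// Consumer side (what StringTake[#, -10] & // ToExpression does): read the
// last 10 characters; std::stol skips the leading padding spaces.
long parseRank(const std::string& output) {
    assert(output.size() >= 10);
    return std::stol(output.substr(output.size() - 10));
}
```

The design relies on nothing being printed after the rank; anything appended later would shift the 10-character window.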

The code of rankQR.cpp

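#include <cstdio>
#include "SuiteSparseQR.hpp"
#include "SuiteSparse_config.h"

int main (int argc, char *argv [])
{
    cholmod_common Common, *cc ;
    cholmod_sparse *A ;

    if (argc < 2)
    {
        fprintf (stderr, "usage: %s matrix.mtx\n", argv [0]) ;
        return 1 ;
    }

    // start CHOLMOD (64-bit integer "_l" interface)
    cc = &Common ;
    cholmod_l_start (cc) ;

    // load A from the Matrix Market file given as the first argument
    int mtype ;
    FILE *fp = fopen (argv [1], "r") ;
    //test
    //FILE* fp = fopen("/home/albercoc/Programs/SuiteSparse-7.0.1/SPQR/Matrix/b1_ss.mtx", "r");
    if (fp == NULL)
    {
        fprintf (stderr, "cannot open %s\n", argv [1]) ;
        cholmod_l_finish (cc) ;
        return 1 ;
    }
    A = (cholmod_sparse *) cholmod_l_read_matrix (fp, 1, &mtype, cc) ;
    fclose (fp) ;
    if (A == NULL || mtype != CHOLMOD_SPARSE)
    {
        fprintf (stderr, "%s is not a sparse matrix\n", argv [1]) ;
        cholmod_l_finish (cc) ;
        return 1 ;
    }

    // rank-revealing QR factorization; the rank estimate ends up in cc->SPQR_istat [4]
    auto *QR = SuiteSparseQR_factorize <double> (SPQR_ORDERING_DEFAULT, SPQR_DEFAULT_TOL, A, cc) ;

    // print info
    cholmod_l_print_sparse (A, "A", cc) ;

    // print the rank last, using 10 characters
    printf ("%10ld", (long) cc->SPQR_istat [4]) ;

    // free everything and finish CHOLMOD
    SuiteSparseQR_free (&QR, cc) ;
    cholmod_l_free_sparse (&A, cc) ;
    cholmod_l_finish (cc) ;
    return 0 ;
}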

Thank you @Henrik Schumacher for pointing me to SuiteSparseQR, and thanks to all the people who developed it.

Hm. It could be that the LibraryLink method failed because SuiteSparse uses 32-bit integers by default while Mathematica uses 64-bit integers. I faced similar problems when I installed SuiteSparse via Homebrew. There are two possible workarounds: recompile SuiteSparse manually with 64-bit integers (which is probably cumbersome, and I don't know for sure that it will work), or first copy the 64-bit integer arrays into 32-bit integer arrays in the LibraryLink code... If I had more time, I would investigate it. Anyway, congratulations on finding a solution that works for you! (+1) — Henrik Schumacher, May 23 '23 at 10:16
Oh, and a further typical issue when moving sparse arrays between Mathematica and C is, of course, that Mathematica uses 1-based indices while C uses 0-based indices. There could be a compile switch in SuiteSparse that changes the behavior to 1-based indexing. But again, the simplest workaround is probably to subtract 1 during the above copying process. (You have to subtract 1 only from the "ColumnIndices", not from the "RowPointers" of the SparseArray.) — Henrik Schumacher, May 23 '23 at 10:19
These are valuable comments. To avoid confusion: in the solution above, the sparse matrix is passed in a standard format (Matrix Market) that both Mathematica and SuiteSparse read and write. — Albercoc, May 23 '23 at 11:01
Yes, that's understood. The person who wrote the export filter has taken care of this off-by-one issue. And because mtx stores data as strings (and not in binary), the SuiteSparse importer can cast to whatever types it wants. =) — Henrik Schumacher, May 23 '23 at 12:52
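
The copying workaround described in these comments (64-bit to 32-bit indices, subtracting 1 only from the column indices) can be sketched as follows. It assumes the two CSR arrays have already been obtained and flattened from the SparseArray; the LibraryLink plumbing is omitted, and Csr32 and toZeroBased32 are hypothetical names.

```cpp
#include <cassert>
#include <cstdint>
#include <limits>
#include <stdexcept>
#include <vector>

// CSR arrays as described above: "RowPointers" already start at 0,
// "ColumnIndices" are 1-based; Mathematica stores both as 64-bit integers.
struct Csr32 {
    std::vector<std::int32_t> rowPointers;    // copied unchanged
    std::vector<std::int32_t> columnIndices;  // shifted to 0-based
};

Csr32 toZeroBased32(const std::vector<std::int64_t>& rowPointers,
                    const std::vector<std::int64_t>& columnIndices) {
    const std::int64_t maxInt = std::numeric_limits<std::int32_t>::max();
    Csr32 out;
    out.rowPointers.reserve(rowPointers.size());
    out.columnIndices.reserve(columnIndices.size());
    for (std::int64_t p : rowPointers) {
        if (p < 0 || p > maxInt) throw std::out_of_range("row pointer does not fit in int32");
        out.rowPointers.push_back(static_cast<std::int32_t>(p));
    }
    for (std::int64_t c : columnIndices) {
        if (c < 1 || c - 1 > maxInt) throw std::out_of_range("column index out of range");
        out.columnIndices.push_back(static_cast<std::int32_t>(c - 1));  // 1-based -> 0-based
    }
    return out;
}
```

The range checks matter because a narrowing cast would otherwise silently corrupt indices of very large matrices.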
