
I have a complex symmetric matrix, around 50,000 elements per side, arising from a Method of Moments formulation. Is it feasible to use a standard direct solver such as LAPACK to compute a matrix decomposition such as $LU$ or $LDL^T$? I need to solve for approximately 7200 right hand side vectors with this matrix, so if I were to take an iterative approach I would need to run it 7200 times.

Costis
  • Is the matrix dense? What fraction of your main memory does it fill? – Arnold Neumaier Jul 05 '12 at 15:03
  • How accurate do you want the solutions to be? How fast does the kernel decay? How much effort are you willing to invest in "fast methods"? – Jed Brown Jul 05 '12 at 18:37
  • @ArnoldNeumaier: Yes, the matrix is dense. Since it is symmetric, storing only about half the elements in double precision will take roughly 18-19 GB of RAM. This is about 15% of the RAM available on the target machine (128 GB total). – Costis Jul 05 '12 at 22:25
  • @JedBrown: At least to within 1e-4 relative tolerance; 1e-6 would be ideal. It is a 3D time-harmonic electromagnetics problem, so the Green's function is $\frac{e^{-jkR}}{R}$, which decays essentially as $\frac{1}{R}$ with the distance between elements once they are far enough apart. I am willing to invest effort as long as it does not come at the cost of accuracy. I have read about methods such as the Fast Multipole Method, but I feel like it will sacrifice accuracy in the near-field results, which is unacceptable in my situation. – Costis Jul 05 '12 at 22:29
  • FMM evaluates the near-field exactly; it's the mid-/far-field that is approximate. You can always increase the number of moments to get higher accuracy. A reasonable and well-established method here would be to use FMM to apply the operator and a block Krylov method for the right hand sides. – Jed Brown Jul 06 '12 at 00:02
  • @JedBrown: I will look into this, thanks. Is it not feasible to consider a direct solve? – Costis Jul 06 '12 at 00:50
  • A direct solve is going to be pretty expensive at this size. It's certainly possible, just time consuming (more than an hour if you only have a small machine available, and needs 20GB minimum just to store the matrix). You could get it faster with enough parallelism, but I don't think you'll be able to beat the FMM+Krylov methods in time or memory. – Jed Brown Jul 06 '12 at 01:25
  • I definitely won't beat a Krylov method in terms of memory, that is for sure. Assuming I have enough memory, though, which should be true for this case (the computer running the solver has 128GB of RAM), I am not sure how much of a speedup I would get with FMM. Without FMM, a Krylov subspace method relies on matrix-vector multiplies, which are $N^2$ FLOPs each. If I need to solve for 7200 RHS vectors, then if the iterative method does more than 7 matrix-vector multiplies per RHS, it will take more than $N^3$ FLOPs total. An $LDL^T$ decomposition should be about $\frac{N^3}{3}$ FLOPs. – Costis Jul 06 '12 at 07:37

2 Answers


[Moving the comment thread down here]

A direct solve will be pretty expensive, likely on the order of an hour for the sort of machine you describe. If you want to invest time in something faster using more specialized methods, I suggest FMM with a block Krylov method. FMM applies the operator in $\mathcal O(N)$ memory and time. FMM has a large constant, but it is still well below $N = 50000$, so it should be faster than a dense matrix multiply. I don't know whether you'll find a library implementation that applies the operator to multiple vectors at once.

Block Krylov methods apply to $k$ right hand sides at once, converging as much as $k$ times faster than with a single right hand side. The cost per iteration is $\mathcal O(N k)$ to apply the operator using FMM, plus $\mathcal O(N k^2)$ to orthogonalize (with an additional factor for the restart length if using GMRES) and $\mathcal O(k^3)$ "scalar" work. You can choose $k = 7200$, or some smaller number to reduce the $Nk^2$ and $k^3$ costs at the expense of (slightly) slower convergence.
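To make the cost structure concrete, here is a minimal sketch of one unrestarted block GMRES cycle in Python/NumPy. Everything here is illustrative rather than production code: `matvec` stands in for whatever applies the operator (an FMM library or a dense multiply), and a real solver would add restarts, convergence monitoring, and breakdown handling.

```python
import numpy as np

def block_gmres_cycle(matvec, B, m=30):
    """One unrestarted block GMRES cycle for A X = B, starting from X0 = 0.

    matvec -- applies the operator to an (N, k) block, e.g. an FMM apply
    B      -- (N, k) array whose columns are the right hand sides
    m      -- number of block Arnoldi steps
    """
    N, k = B.shape
    Q0, R0 = np.linalg.qr(B)                  # V_1 R0 = B (reduced QR)
    V = [Q0]                                  # orthonormal block basis
    H = np.zeros(((m + 1) * k, m * k), dtype=B.dtype)
    for j in range(m):
        W = matvec(V[j])                      # O(Nk) per step with FMM
        for i in range(j + 1):                # block modified Gram-Schmidt:
            Hij = V[i].conj().T @ W           # the O(Nk^2) orthogonalization
            H[i*k:(i+1)*k, j*k:(j+1)*k] = Hij
            W = W - V[i] @ Hij
        Qj, Rj = np.linalg.qr(W)              # next orthonormal block
        H[(j+1)*k:(j+2)*k, j*k:(j+1)*k] = Rj
        V.append(Qj)
    # Small least-squares problem min ||E1 R0 - H Y|| -- the "scalar" work
    rhs = np.zeros(((m + 1) * k, k), dtype=B.dtype)
    rhs[:k, :] = R0
    Y = np.linalg.lstsq(H, rhs, rcond=None)[0]
    return np.hstack(V[:m]) @ Y               # X = [V_1 ... V_m] Y
```

With $k = 7200$ the $\mathcal O(Nk^2)$ orthogonalization term dominates each iteration, which is the quantitative reason a smaller block size can pay off.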

Given its source in integral equations, I assume that the continuum operator associated with your problem is of the form $I + K$, where $K$ is a compact operator. Being compact, $K$ has essentially finite rank, so the discretized system has an essentially constant number of eigenvalues that are not "near" $1$, and we can therefore expect the block Krylov method to converge very fast.

Jed Brown
  • Just a quick comment that for non-harmonic kernels, such as the Green's function in electromagnetic scattering, the scaling for single-level FMM is $\mathcal{O}(N^{3/2})$. For multi-level FMM, the scaling is either $\mathcal{O}(N \log N)$ or $\mathcal{O}(N \log^2 N)$, depending on whether you use local (e.g. Lagrange) or global (e.g. FFT) methods for the interpolation/anterpolation. See Rokhlin, V. (1990). Rapid solution of integral equations of scattering theory in two dimensions. Journal of Computational Physics, 86(2). – OscarB Jul 11 '12 at 12:20

It will be slow but feasible. (You can estimate the likely time by first solving, for several values of $k$, reduced problems of dimension $50000/k$ with $7200/k$ right hand sides, and multiplying the resulting times by $k^3$.)
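A minimal sketch of that timing experiment, assuming Python/SciPy and dense random stand-ins for the actual matrix (sizes taken from the question). Both the $\mathcal O(n^3)$ factorization and the $\mathcal O(n^2 \cdot n_{rhs})$ block solve shrink by $k^3$, so one multiplier extrapolates both terms:

```python
import time
import numpy as np
from scipy.linalg import lu_factor, lu_solve

N, NRHS = 50000, 7200                    # full problem size from the question
for k in (20, 10, 5):                    # several reduction factors
    n, nrhs = N // k, NRHS // k
    A = np.random.rand(n, n) + 1j * np.random.rand(n, n)
    B = np.random.rand(n, nrhs) + 1j * np.random.rand(n, nrhs)
    t0 = time.perf_counter()
    lu, piv = lu_factor(A)               # pivoted LU (LAPACK zgetrf)
    lu_solve((lu, piv), B)               # triangular solves, all RHS at once
    dt = time.perf_counter() - t0
    print(f"k={k:3d}: {dt:8.2f} s reduced -> ~{dt * k**3:10.0f} s extrapolated")
```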

You need pivoting for accuracy (unpivoted $LDL^T$ may be unstable; see http://eprints.ma.man.ac.uk/344/), and you should process all right hand sides simultaneously (i.e., solve $AX = B$ with a matrix $B$ composed of all right hand sides) to take maximal advantage of the BLAS3 kernels.
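In SciPy terms, both points amount to a single call; here is a toy-sized sketch (the sizes and random test matrix are placeholders). `assume_a='sym'` routes `scipy.linalg.solve` to LAPACK's diagonally pivoted symmetric-indefinite driver (`?sysv`), i.e. a pivoted $LDL^T$ that works for complex symmetric (non-Hermitian) matrices:

```python
import numpy as np
from scipy.linalg import solve

n, nrhs = 2000, 288                          # toy stand-ins for 50000 / 7200
A = np.random.rand(n, n) + 1j * np.random.rand(n, n)
A = A + A.T                                  # complex symmetric, not Hermitian
B = np.random.rand(n, nrhs) + 1j * np.random.rand(n, nrhs)

# Passing the whole block B amortizes the single O(n^3) pivoted LDL^T
# factorization over every right hand side via blocked (BLAS3) solves.
X = solve(A, B, assume_a='sym')
print(np.max(np.abs(A @ X - B)))             # residual sanity check
```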

But it is still a brute force approach, and a multipole method is the more elegant choice. Moreover, you can combine the latter with the method in https://scicomp.stackexchange.com/a/2422/1117 to avoid having to solve 7200 independent problems.

Arnold Neumaier