
I am a physicist with limited numerical methods knowledge and I am trying to speed up the inversion of a very ill-conditioned problem (condition number $>10^{30}$). The same sparse square matrix is used repeatedly, so our current setup relies on a one-time LU factorisation and a triangular solver.
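
Roughly, the current setup looks like this (a minimal sketch, not the production code; s and n are the matrix and its size as loaded by the script further down, and b stands in for one of the many right-hand sides):

% Sketch of the current approach: factor once, reuse the triangular factors
[L,U,P,Q] = lu(s);            % one-time sparse LU factorisation, P*s*Q = L*U
b = rand(n,1);                % stand-in for one of the many right-hand sides
x = Q*(U\(L\(P*b)));          % two triangular solves per right-hand side
norm(s*x-b)                   % residual check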

I have at my disposal a very powerful GPU and would like to use it (using CUDA with cuSPARSE, MAGMA, ...). My current understanding is that triangular solvers are very sequential by nature and do not map well onto parallel SIMD architectures, whereas iterative solvers do work well on GPUs; sadly, my problem is far too ill-conditioned for them.

Since the matrix is used a large number of times, taking some time to compute a preconditioner seems like the way to go. I am therefore looking for ways to design a preconditioner suitable for a parallel solver.

If you want to take a look at it, I have included links to the kind of matrices I am working with.

Big Matrix
size = 114708 × 114708
one-based numbering
34 MB

Small Matrix
size = 12890 × 12890
one-based numbering
3 MB

Here is also a script to load the matrices in MATLAB:

% Choose which matrix to load
n = 12890;
%n = 114708;
fileID = fopen('SmallMatrix.txt','r');
%fileID = fopen('BigMatrix.txt','r');
% Each line of the file is: row index, column index, value
dataArray = textscan(fileID, '%f%f%f%[^\n\r]', 'Delimiter', ' ', 'MultipleDelimsAsOne', true);
fclose(fileID);
U1 = dataArray{:, 1};          % row indices
U2 = dataArray{:, 2};          % column indices
V  = dataArray{:, 3};          % values
s  = sparse(U1, U2, V, n, n);  % assemble the sparse matrix
figure;
spy(s);                        % visualise the sparsity pattern

Edit: Despite the very high condition number, the direct solvers I have used (MATLAB, MUMPS, UMFPACK and SuperLU) have no problem solving it. In MATLAB,

cond(s)
x=rand(n,1);
y=s\x;
norm(s*y-x)

returns:

1.4996e+34
2.7318e-10

Edit2: Here is the kind of structure my matrix might have (very big values are always on the diagonal):

$$\begin{pmatrix} 0.1 & 2.1 & 3.2 \\ 4.1 & -1.1 & 21 \\ 8 & -0.2 & 10^{30} \end{pmatrix}$$

And the RHS: $\begin{pmatrix} 3.1 \\ 4 \\ 0.3 \end{pmatrix}$. The solution is: $\begin{pmatrix} 1.3544 \\ 1.4117 \\ 0 \end{pmatrix}$.

It is very ill-conditioned, but every direct solver is completely okay with it. It is just an easy way to impose boundary conditions (not very graceful, I have to admit).
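
To reproduce this toy example in MATLAB (just a quick check, using the values above):

A = [0.1 2.1 3.2; 4.1 -1.1 21; 8 -0.2 1e30];
b = [3.1; 4; 0.3];
cond(A)   % on the order of 10^30
A\b       % approximately [1.3544; 1.4117; 0]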

Edit3: Following Brian Borchers's and Tobias's advice, I have removed from my matrix and my RHS all the rows and columns where the very large values appear. It works great, in the sense that the solution for a given RHS is the same and the condition number is a more reasonable $10^{5}$. Here is the script used to do that:

% Creating the reduced matrix: delete every row/column whose diagonal
% entry is one of the very large penalty values
sCopy = s;
nRemoved = nnz(abs(sCopy) > 10^29);
for index = 1:n-nRemoved
    while abs(sCopy(index,index)) > 10^29
        sCopy(index,:) = [];
        sCopy(:,index) = [];
    end
end
nnz(abs(sCopy) > 10^29)    % should now be zero

% Updating the RHS accordingly
[i, j] = find(abs(s) > 10^29);
NewRHS = RHS;
NewRHS(i) = [];
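
A loop-free version of the same reduction (this relies on the assumption, stated above, that the huge entries only ever appear on the diagonal):

big = find(abs(diag(s)) > 10^29);   % indices of the penalised rows/columns
sCopy = s;
sCopy(big,:) = [];
sCopy(:,big) = [];
NewRHS = RHS;
NewRHS(big) = [];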

With that out of the way, my main question remains: the condition number is still high enough that I do not expect an iterative method to converge without a good preconditioner.
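
For concreteness, this is the kind of preconditioned iterative solve I have in mind as a CPU baseline before moving to the GPU (only a sketch: the ILU settings are guesses that would need tuning, and sCopy/NewRHS are the reduced system from the script above):

setup.type = 'ilutp';          % incomplete LU with threshold and pivoting
setup.droptol = 1e-4;          % drop tolerance, would need tuning
[Lp,Up] = ilu(sCopy, setup);   % one-time preconditioner construction
tol = 1e-10; maxit = 500; restart = 50;
[y,flag,relres,iter] = gmres(sCopy, NewRHS, restart, tol, maxit, Lp, Up);
flag, relres                   % flag = 0 means gmres converged to tol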

RYegavian
  • With a condition number of $10^{30}$ compared to machine epsilon of about $10^{-16}$ for double precision, the results you get are basically going to be amplified numerical noise regardless of how you solve it. – Nick Alger Nov 22 '14 at 23:57
  • Do you mean that iterative solving is hopeless or that any solving is hopeless? MATLAB does return a cond() of 10^34, but also gives me a solution (with a residual norm of 10^-10). – RYegavian Nov 23 '14 at 00:08
  • 3
    Any method of solution (using double precision floating point arithmetic) is hopeless. You should be looking at strategies for regularizing the problem. – Brian Borchers Nov 23 '14 at 00:11
  • @BrianBorchers, thanks for your input. The reason why the matrix is so ill-conditioned is that I use some very high coefficients in some places to impose boundary conditions. My problem is therefore completely solvable despite the high condition number. – RYegavian Nov 23 '14 at 00:36
  • 1
    @RYegavian: What Brian is saying is that if your condition number is $10^{30}$, then if you perturb the right hand side by $10^{-16}$ (machine precision-level roundoff, which is unavoidable), the solution will change by $10^{14}$. This is likely an unacceptably large error. – Wolfgang Bangerth Nov 23 '14 at 00:49
  • @WolfgangBangerth, thank you for your answer. I have included an example at the end of my post where the condition number is 10^30 but the solver is completely fine with it. – RYegavian Nov 23 '14 at 01:16
  • 3
    If that's the only reason for the ill-conditioning, then just set all of those variables equal to 0 and substitute them into the other equations and look at the condition number of the resulting system of equations. It may be fine. – Brian Borchers Nov 23 '14 at 03:41
  • 2
    Note that having a small residual norm is not necessarily equivalent to having an extremely accurate solution- that's the nature of ill-conditioned systems of equations. – Brian Borchers Nov 23 '14 at 03:42
  • 1
    Elaborating a bit on Brian's answer: Assume that row k has such a huge diagonal element (in the order of 10^30). Divide the corresponding equation by that diagonal element. The resulting k'th equation will be x[k] = 0 if you ignore the coefficients and the right-hand-side in the order of 10^-30. – Tobias Nov 23 '14 at 10:05
  • 2
    Note, that in your case you can easily avoid the ill-condition by balancing the matrix (already row-scaling is suffient in your case). As Brian already noted this just leads to zeroing the unknowns with the huge diagonal elements. (@BrianBorchers) Brian should write that as an answer to your question and you should accept it. – Tobias Nov 23 '14 at 10:13
  • @BrianBorchers and Tobias, I have updated my post using your tips. – RYegavian Nov 23 '14 at 11:53
  • 1
    Designing a preconditioner is very problem dependent- I would suggest that you start a new question in which you explain what your underlying problem is (e.g. what PDE you're solving and how you've discretized it.) Perhaps then we could suggest ways to precondition the system. – Brian Borchers Nov 23 '14 at 16:44
  • 2
    I tested your small matrix. It is symmetric within a absolute tolerance of 10^-10 but not with absolute tolerance of 10^-16. Is the matrix supposed to be symmetric? Furthermore, it looks like it is a diagonal-dominant matrix. Might it be that this is the discretization of a Laplacian? Maybe, reverse Cuthill McKee and Skyline Cholesky is already sufficient for this problem? Maybe, you can use incomplete LU with a tri-diagonal Cholesky solver as pre-conditioner? That are just wild guesses. – Tobias Nov 23 '14 at 19:26
  • @Tobias, this is indeed the discretization of a Laplacian. Thanks for taking the time to take a look. I will definitely do some research on Cuthill-McKee and skyline Cholesky to evaluate the performance. About incomplete LU, I thought it was not suited for parallelization, but maybe I'm wrong? – RYegavian Nov 23 '14 at 22:05
  • 1
    @RYegavian: I agree with BrianBorchers here. Preconditioning is often problem-dependent, and supplying numerical values is not as helpful as describing the continuous problem you would like to solve, its physics, and how you are going about discretizing it. I suggest you close this question and post a new one with this information, in addition to the information you've posted about wanting to use a GPU. – Geoff Oxberry Nov 24 '14 at 21:37
  • 1
    Sorry. I've not done so much with parallelization yet. Maybe, my suggestions are not optimal for your usecase. The only thing is that this approach would get you working... But, now I've seen that you have already found a solution. – Tobias Nov 26 '14 at 06:30

1 Answer


I found a solver that seems to be suitable for my problem. My very ill-conditioned matrix is actually the FEM discretisation of a Poisson equation on a two-dimensional unstructured mesh. First results with a geometric multigrid approach (multiple V-cycles) show very good weak scaling on the CPU. And since most of the time is spent multiplying big sparse matrices by vectors (on the fine mesh), I expect the GPU performance to be okay.
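
To illustrate why the cost is dominated by sparse matrix-vector products, here is a toy two-grid V-cycle for a 1-D Poisson problem (purely illustrative, not my actual 2-D FEM code): the smoothing and residual steps are all SpMV.

n = 2^10 - 1;  h = 1/(n+1);                   % fine-grid interior points
A = gallery('tridiag', n, -1, 2, -1) / h^2;   % 1-D Laplacian
b = ones(n,1);
nc = (n-1)/2;  j = 1:nc;                      % coarse-grid size
P = sparse([2*j-1, 2*j, 2*j+1], [j, j, j], ...
           [0.5*ones(1,nc), ones(1,nc), 0.5*ones(1,nc)], n, nc);  % interpolation
R = 0.5 * P';                                 % full-weighting restriction
Ac = R * A * P;                               % Galerkin coarse operator
Dinv = 1 ./ full(diag(A));  w = 2/3;          % weighted-Jacobi smoother
x = zeros(n,1);
for cycle = 1:20
    for k = 1:3, x = x + w*(Dinv .* (b - A*x)); end   % pre-smoothing (SpMV)
    x = x + P * (Ac \ (R * (b - A*x)));               % coarse-grid correction
    for k = 1:3, x = x + w*(Dinv .* (b - A*x)); end   % post-smoothing (SpMV)
    fprintf('cycle %2d   residual %.2e\n', cycle, norm(b - A*x));
end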

RYegavian