Dear Mathematica users,
In my present research I am faced with a real dense $n\times n$ matrix $A$ where $n \geq 3000$ (hopefully even more). The coefficients of this matrix are fixed, but I will have to repeatedly multiply it by a variable vector: $Ax$.
I am not complaining about Mathematica's speed at this job, which seems quite good, but since I will need to do this very many times, I was wondering whether there is a way to optimise the process. Perhaps declaring it in a `With` would help, but beyond that I am out of ideas.
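For concreteness, here is a minimal baseline sketch of what I mean (a random matrix stands in for the real $A$; the sizes are placeholders). Keeping $A$ as a packed machine-precision array ensures `Dot` dispatches to the optimized BLAS routine:

```
(* Baseline sketch: random data stands in for the real A. *)
n = 3000;
A = RandomReal[{-1, 1}, {n, n}];
x = RandomReal[{-1, 1}, n];

Developer`PackedArrayQ[A]                   (* True: Dot uses the fast BLAS path *)
First@AbsoluteTiming[Do[A.x, {100}]]/100.   (* average seconds per product *)
```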
An alternative would be a low-rank approximation via the SVD. My thinking was: with the SVD I can write \begin{equation} A = \sigma_1 u_1 v_1^\text{T} + \sigma_2 u_2 v_2^\text{T} + \ldots \end{equation} so that \begin{equation} Ax = (\sigma_1 v_1^\text{T}x)\, u_1 + (\sigma_2 v_2^\text{T}x)\, u_2 + \ldots \end{equation} As an example, using a rank-100 approximation to a $3000\times 3000$ matrix (which gives a relative Frobenius error of roughly $5$–$10\,\%$), I was able to reduce the computation time by a factor of roughly 3 or 4.
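A minimal sketch of the rank-$k$ idea, reusing the placeholder `A` and `x` from above ($k = 100$ and the name `approxDot` are just for illustration):

```
(* Truncated SVD: u, v are n x k, s is k x k, with A ~ u.s.Transpose[v]. *)
{u, s, v} = SingularValueDecomposition[A, 100];
svT = s.Transpose[v];               (* precompute once, reuse for every new x *)
approxDot[x_] := u.(svT.x);         (* two thin products instead of one n x n product *)

Norm[A.x - approxDot[x]]/Norm[A.x]  (* relative error of the rank-100 truncation *)
```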
Thank you in advance for any feedback.
Best regards,
Gabriel Landi
Edit: I forgot to say that $A$ is symmetric and has a zero diagonal.
Comments:

- `CUDADot` is very fast, but depending on the problem the overhead for a call can make it slower than a standard `Dot`. – Yves Klett Jan 15 '13 at 14:02
- Do you have several $x$ vectors at a time, so you can put them in a matrix? – ssch Jan 15 '13 at 14:14
- `` LinearAlgebra`BLAS`GEMV ``. – Oleksandr R. Jan 15 '13 at 14:34
- Does $A$ have full rank? If so, can you work in its eigenspace? Alternatively, can you calculate $AX$ where $X = (\begin{matrix}\vec{x}_1&\vec{x}_2&\cdots\end{matrix})$, instead? That would allow you to use CUDA with full matrix-matrix computations, and the eigenspace idea is still applicable. – rcollyer Jan 15 '13 at 15:30
- In the eigenbasis the product reduces to `{a11, a22, ...} {x1, x2, ...}`, which is very fast. But I think you're likely only to get a speed-up if $x$ is already in that basis. If $x$ isn't easily expressible in the basis of $A$, then it may be expressible either in terms of the group representations of $A$ or, barring that, as spherical components. Either way would reduce the computational load, if set up right. But it depends on how $x$ is generated. – rcollyer Jan 15 '13 at 17:03
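To illustrate two of the suggestions above (batching several $x$ vectors into one `Dot`, and working in the eigenbasis of the symmetric $A$), here is a sketch using a symmetrized random placeholder; `eigDot` is just an illustrative name. As rcollyer notes, the eigenbasis route only pays off if $x$ is already expressed in that basis, since the change of basis itself costs two dense products.

```
(* Symmetric, zero-diagonal placeholder for the real A (see the Edit above). *)
S = With[{m = RandomReal[{-1, 1}, {n, n}]}, (m + Transpose[m])/2];
S = S - DiagonalMatrix[Diagonal[S]];

(* 1. Batch several x vectors as rows and use one matrix-matrix Dot (BLAS level 3). *)
xs = RandomReal[{-1, 1}, {10, n}];
ys = xs.S;                           (* row i of ys equals S.xs[[i]], since S is symmetric *)

(* 2. Eigenbasis: S == Transpose[Q].DiagonalMatrix[vals].Q, eigenvectors as rows of Q. *)
{vals, Q} = Eigensystem[S];
eigDot[x_] := Transpose[Q].(vals (Q.x));   (* elementwise multiply in the eigenbasis *)
Norm[S.x - eigDot[x]]/Norm[S.x]            (* sanity check, reusing x from above *)
```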