I don't know much about GPU computing at the moment, so please pardon the simple question. Can one invert local matrices in parallel on the GPU? cuBLAS doesn't seem to support factorization, and most of the LU/QR/Cholesky libraries I've found for GPUs aim instead to accelerate a single direct factorization.
For example, if the mass matrices had to be recomputed for an explicit DG method, is there a way to re-invert them locally on the GPU (i.e., in more of an MPI fashion, with each warp/block/etc. computing its own factorization in parallel with the others)?
Edit: I'm trying to see if it's possible to assemble and invert a large number of small matrices on a GPU.
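To make the operation concrete, here is a minimal CPU sketch in C++ of what I mean by inverting many small matrices independently. It assumes the matrices are symmetric positive definite (as mass matrices are), dense, all of the same size `n`, and stored contiguously in row-major order; it uses a Cholesky factorization rather than forming explicit inverses. The function names (`cholesky_in_place`, `cholesky_solve`, `factor_batch`) are made up for illustration and are not from any library. The loop in `factor_batch` has no dependencies between iterations, so that loop is the part I would hope to map onto GPU threads or blocks.

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

// Factor one n-by-n SPD matrix in place (row-major) as A = L * L^T.
// On return the lower triangle holds L; the strict upper triangle is untouched.
// Returns false if a non-positive pivot shows the matrix is not SPD.
bool cholesky_in_place(double* a, std::size_t n) {
    for (std::size_t j = 0; j < n; ++j) {
        double d = a[j * n + j];
        for (std::size_t k = 0; k < j; ++k) d -= a[j * n + k] * a[j * n + k];
        if (d <= 0.0) return false;
        d = std::sqrt(d);
        a[j * n + j] = d;
        for (std::size_t i = j + 1; i < n; ++i) {
            double s = a[i * n + j];
            for (std::size_t k = 0; k < j; ++k) s -= a[i * n + k] * a[j * n + k];
            a[i * n + j] = s / d;
        }
    }
    return true;
}

// Solve (L * L^T) x = b for one matrix, overwriting b with x.
void cholesky_solve(const double* l, std::size_t n, double* b) {
    // Forward substitution: L y = b.
    for (std::size_t i = 0; i < n; ++i) {
        double s = b[i];
        for (std::size_t k = 0; k < i; ++k) s -= l[i * n + k] * b[k];
        b[i] = s / l[i * n + i];
    }
    // Backward substitution: L^T x = y.
    for (std::size_t i = n; i-- > 0;) {
        double s = b[i];
        for (std::size_t k = i + 1; k < n; ++k) s -= l[k * n + i] * b[k];
        b[i] = s / l[i * n + i];
    }
}

// Factor `count` matrices stored back to back in `mats`.
// Each iteration touches only its own n*n block, so the iterations are
// independent; this is the loop one would hand to separate GPU threads/blocks.
// Returns the number of matrices that failed to factor.
std::size_t factor_batch(std::vector<double>& mats, std::size_t n, std::size_t count) {
    std::size_t failures = 0;
    for (std::size_t e = 0; e < count; ++e) {
        if (!cholesky_in_place(&mats[e * n * n], n)) ++failures;
    }
    return failures;
}
```

For example, with two 2-by-2 matrices stored as `{4, 2, 2, 3, 9, 0, 0, 1}`, calling `factor_batch(mats, 2, 2)` factors both, and `cholesky_solve(&mats[0], 2, b)` then applies the first inverse to a right-hand side `b`.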
We were just hoping to parallelize over elements and quadrature for now.
– Jesse Chan Aug 09 '13 at 21:43