I am doing work that requires matrix multiplication (MM) to be as fast as possible, and I want to double-check with this community that the Winograd variant of Strassen's MM algorithm is the fastest practical algorithm (so not Coppersmith-Winograd).
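For concreteness, here is a minimal sketch of the recursion I mean by the Winograd variant: 7 block multiplications and 15 block additions per level. It is pure Python on nested lists, for illustration only, and assumes square matrices; the helper names and the `cutoff` parameter are my own, and a real implementation would call a tuned BLAS kernel below the cutoff.

```python
def mat_add(X, Y):
    return [[x + y for x, y in zip(rx, ry)] for rx, ry in zip(X, Y)]

def mat_sub(X, Y):
    return [[x - y for x, y in zip(rx, ry)] for rx, ry in zip(X, Y)]

def naive_mul(X, Y):
    # Conventional O(n^3) product, used as the base case.
    Yt = list(zip(*Y))
    return [[sum(x * y for x, y in zip(row, col)) for col in Yt] for row in X]

def split(M):
    # Split an even-sized square matrix into four quadrants.
    h = len(M) // 2
    return ([r[:h] for r in M[:h]], [r[h:] for r in M[:h]],
            [r[:h] for r in M[h:]], [r[h:] for r in M[h:]])

def winograd_mul(A, B, cutoff=64):
    n = len(A)
    # Fall back to the conventional product for small or odd sizes.
    if n <= cutoff or n % 2:
        return naive_mul(A, B)
    A11, A12, A21, A22 = split(A)
    B11, B12, B21, B22 = split(B)

    # 8 pre-additions
    S1 = mat_add(A21, A22)
    S2 = mat_sub(S1, A11)
    S3 = mat_sub(A11, A21)
    S4 = mat_sub(A12, S2)
    T1 = mat_sub(B12, B11)
    T2 = mat_sub(B22, T1)
    T3 = mat_sub(B22, B12)
    T4 = mat_sub(T2, B21)

    # 7 recursive block multiplications
    P1 = winograd_mul(A11, B11, cutoff)
    P2 = winograd_mul(A12, B21, cutoff)
    P3 = winograd_mul(S4, B22, cutoff)
    P4 = winograd_mul(A22, T4, cutoff)
    P5 = winograd_mul(S1, T1, cutoff)
    P6 = winograd_mul(S2, T2, cutoff)
    P7 = winograd_mul(S3, T3, cutoff)

    # 7 post-additions
    C11 = mat_add(P1, P2)
    U2 = mat_add(P1, P6)
    U3 = mat_add(U2, P7)
    U4 = mat_add(U2, P5)
    C12 = mat_add(U4, P3)
    C21 = mat_sub(U3, P4)
    C22 = mat_add(U3, P5)

    # Reassemble the four quadrants into one matrix.
    top = [left + right for left, right in zip(C11, C12)]
    bottom = [left + right for left, right in zip(C21, C22)]
    return top + bottom
```

With integer entries the result should match `naive_mul` exactly, which makes a convenient correctness check before switching to doubles.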
I do a lot of processing on the data before, between, and after each MM, to the point where using Mathematica or MATLAB would be a hindrance.
Also, does anyone have a good estimate of the numerical error of regular Strassen versus the Winograd variant? In "Exploiting Parallelism in Matrix Computation Kernels for SMP systems", D'Alberto et al. briefly mention that Strassen is more accurate, but this seems counter-intuitive since Winograd has fewer operations overall (15 block additions per recursion level instead of 18, with the same 7 block multiplications).
Edit: We are using matrices up to size 2^16 x 2^16 (about 4.3 billion doubles each), so a sub-cubic algorithm is definitely faster than the naive one.
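A rough back-of-the-envelope check of those numbers, as a sketch only: it counts scalar multiplications and ignores the extra additions, memory traffic, and temporaries that the recursion needs. The cutoff `n0` (the block size below which the conventional product takes over) and the function name are assumptions for illustration.

```python
n = 2 ** 16

elements = n * n                 # entries per matrix: 2^32
bytes_per_matrix = elements * 8  # 8 bytes per double

def strassen_mults(n, n0):
    """Scalar multiplications when recursing from size n down to n0.

    Assumes n and n0 are powers of two with n0 <= n. Each level replaces
    8 half-size products with 7; the base case costs n0^3.
    """
    levels = (n // n0).bit_length() - 1  # log2(n / n0)
    return 7 ** levels * n0 ** 3

naive_mults = n ** 3
ratio_full = naive_mults / strassen_mults(n, 1)   # recurse down to 1x1
ratio_n0_64 = naive_mults / strassen_mults(n, 64)  # stop at 64x64 blocks
```

Each matrix comes to 32 GiB. The saving in multiplications is (8/7)^k for k recursion levels, so roughly a factor of 3.8 with a cutoff of 64 at this size.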
Edit 2: On the accuracy of Strassen vs. Winograd, in case anyone is interested: in "Accuracy and Stability of Numerical Algorithms", Higham gives an in-depth analysis of the error of the two algorithms and shows that Strassen has slower error growth with respect to the size of the matrices. Also of note, Strassen is more error-prone (that is, compared with its own behavior on other matrices) for matrices with all positive entries.
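To put numbers on the growth rates, here is a small sketch of the normwise bound factors as I understand them from Higham's chapter on fast matrix multiplication. The constants are quoted from memory and should be checked against the book. Each factor multiplies u * ||A|| * ||B|| (u the unit roundoff, the norm being the largest absolute entry), with n a power of two and n0 the size at which the recursion switches to conventional multiplication; the function names are my own.

```python
import math

def strassen_bound_factor(n, n0):
    # Growth like n^(log2 12), about n^3.58, for fixed n0.
    return (n / n0) ** math.log2(12) * (n0 ** 2 + 5 * n0) - 5 * n

def winograd_bound_factor(n, n0):
    # Growth like n^(log2 18), about n^4.17, for fixed n0.
    return (n / n0) ** math.log2(18) * (n0 ** 2 + 6 * n0) - 6 * n

# Example: how much larger the Winograd factor is at n = 2^16, n0 = 64.
n, n0 = 2 ** 16, 64
gap = winograd_bound_factor(n, n0) / strassen_bound_factor(n, n0)
```

With no recursion (n0 = n) both factors reduce to n^2, the conventional bound, so the difference comes entirely from the recursion levels. These are worst-case bounds, so the errors observed in practice can be much smaller.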
Thanks