To use newer accelerators like this, I need to perform matmul in 4-bit precision. How do I tell whether this operation is stable? I'm wondering if there are common heuristics in terms of properties of the matrices $\{A_i\}$.
For classification tasks, I'd call the matrix multiplication stable if $\langle y, y_{\text{4-bit}}\rangle>0$ with high probability for random $x$, where $y=A_d\cdots A_2 A_1 x$ and $y_{\text{4-bit}}$ is the same product computed in 4-bit precision.
Orthogonal matrices seem more likely to work in 4-bit precision, but requiring orthogonality is too strong a restriction.
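For context, here is the kind of Monte Carlo check I have in mind for the stability criterion above. This is only a sketch: I'm assuming a simple symmetric per-matrix 4-bit quantization (integer levels in $[-7, 7]$ with one scale per matrix), which real accelerator formats may not match, and the names `quantize_4bit` / `stability_estimate` are my own.

```python
import numpy as np

def quantize_4bit(A):
    # Symmetric per-matrix quantization (an assumption, not a specific
    # hardware format): snap entries to the 15 levels k * scale, k in [-7, 7].
    scale = np.abs(A).max() / 7.0
    return np.round(A / scale) * scale

def stability_estimate(mats, n_trials=1000, rng=None):
    # Estimate P(<y, y_4bit> > 0) over random Gaussian inputs x,
    # where y = A_d ... A_1 x and y_4bit uses the quantized matrices.
    rng = np.random.default_rng(rng)
    qmats = [quantize_4bit(A) for A in mats]
    n = mats[0].shape[1]  # input dimension; mats is [A_1, ..., A_d]
    hits = 0
    for _ in range(n_trials):
        x = rng.standard_normal(n)
        y, yq = x, x
        for A, Aq in zip(mats, qmats):  # apply A_1 first, then A_2, ...
            y, yq = A @ y, Aq @ yq
        hits += np.dot(y, yq) > 0
    return hits / n_trials

# Example: a chain of random orthogonal matrices (Q factors of Gaussians).
gen = np.random.default_rng(0)
orth = [np.linalg.qr(gen.standard_normal((32, 32)))[0] for _ in range(4)]
print(stability_estimate(orth, rng=1))
```

In my experiments a chain of orthogonal matrices keeps the estimate close to 1, which is what motivates the orthogonality remark, but I don't know how far that can be relaxed.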