Since $A=R_1R_2^T$ with $R_1, R_2\in \mathbb{R}^{n \times r}$, we have
$$
\sum_i A_{\pi(i), i} = \sum_i (P_{\pi} A)_{i, i} = \text{trace}(P_{\pi}R_1R_2^T)
$$
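where $P_{\pi}$ is the permutation matrix corresponding to $\pi$, i.e. $(P_{\pi})_{i,j} = 1$ if $j = \pi(i)$ and $0$ otherwise, so that row $i$ of $P_{\pi}A$ is row $\pi(i)$ of $A$.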
For any $\pi$, the trace can be computed as
$$
\text{trace}(P_{\pi}R_1R_2^T) = \sum_{i} \sum_{k} (P_{\pi}R_1)_{i,k} (R_2^T)_{k,i}
= \sum_{i,k} ((P_{\pi}R_1)\circ R_2)_{i,k}.
$$
(Here $\circ$ is the entrywise (Hadamard) product; the resulting sum is also known as the Frobenius inner product of $P_{\pi}R_1$ and $R_2$, written $P_{\pi}R_1:R_2$.)
This idea doesn't remove the burden of going through all permutations and brute-force searching for the maximum of the Frobenius products, and in fact it has the same arithmetic complexity as explicitly computing $A=R_1R_2^T$. However, it has much lower memory requirements, since you never have to form $A$: you only store $R_1$ and $R_2$, which is $O(nr)$ entries instead of $O(n^2)$.
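
As a rough illustration, here is a minimal Python sketch of the brute-force search using only the factors. The function names `permuted_trace` and `best_permutation` are made up for this example, and the matrices are assumed to be given as lists of rows; `perm[i]` plays the role of $\pi(i)$.

```python
from itertools import permutations


def permuted_trace(R1, R2, perm):
    """Return sum_i A[perm[i]][i] for A = R1 R2^T without forming A.

    Row i of P_pi R1 is row perm[i] of R1, so the trace is the sum of
    the entrywise products of (P_pi R1) and R2.
    """
    return sum(
        R1[perm[i]][k] * R2[i][k]
        for i in range(len(R2))
        for k in range(len(R2[i]))
    )


def best_permutation(R1, R2):
    """Brute-force search over all n! permutations (only feasible for small n)."""
    n = len(R1)
    best_val, best_perm = None, None
    for perm in permutations(range(n)):
        val = permuted_trace(R1, R2, perm)
        if best_val is None or val > best_val:
            best_val, best_perm = val, perm
    return best_val, best_perm


# Small example with n = 3, r = 2.
R1 = [[1, 0], [0, 1], [1, 1]]
R2 = [[2, 1], [0, 3], [1, 1]]
value, perm = best_permutation(R1, R2)
```

Each call to `permuted_trace` touches only the $2nr$ stored entries of $R_1$ and $R_2$, so the $n \times n$ matrix $A$ never exists in memory. With NumPy arrays, the same quantity could presumably be written as `np.sum(R1[list(perm)] * R2)`.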