To further add to @Omnomnomnom's answer:
The calculations you posted show that the original problem is equivalent to the case where you have a diagonal matrix $\Delta$ instead of a merely symmetric $\Sigma$. Note that your $B$ again has orthonormal columns. You can now solve this new problem
$$\max \lbrace {\rm tr} ( B^T \Delta B ) \mid B \in \mathbb{R}^{n,r}, \, B^T B={\rm Id}_r \rbrace,$$
which is a bit easier than the original, and the maximum will be the same. Also, to get the desired matrix you can then set $U=VB$.
It turns out that the maximum is the sum of the $r$ largest eigenvalues, attained at $B=(e_1,\ldots,e_r) \in \mathbb{R}^{n,r}$ (the first $r$ standard basis vectors), which then yields $U = VB = (v_1,\ldots,v_r)$, where the $v_i$ are eigenvectors of $\Sigma$ ordered by decreasing eigenvalue.
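If it helps, here is a quick numerical sketch of this claim (my own addition, using NumPy; $n=6$, $r=3$ and a random $\Sigma$ are arbitrary choices):

```python
import numpy as np

rng = np.random.default_rng(0)
n, r = 6, 3

# Build a random symmetric Sigma and its eigendecomposition Sigma = V Delta V^T
A = rng.standard_normal((n, n))
Sigma = (A + A.T) / 2
lam, V = np.linalg.eigh(Sigma)     # eigh returns ascending eigenvalues
lam, V = lam[::-1], V[:, ::-1]     # reorder so lambda_1 >= ... >= lambda_n

B = np.eye(n)[:, :r]               # B = (e_1, ..., e_r)
U = V @ B                          # U = V B = (v_1, ..., v_r)

print(np.trace(U.T @ Sigma @ U))   # objective at the claimed maximizer
print(lam[:r].sum())               # sum of the r largest eigenvalues -- agrees
```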
The author of the proof you are following suggests that the solution of the new maximization problem is obvious; Omnomnomnom and I disagree. You could therefore take the approach shown in his or her answer, or you could try to finish your proof. The latter is actually quite tedious, as I will show below, so I'd suggest using Omnomnomnom's proof or Nick's suggestion for the upper bound.
In order to finish your proof, start by calculating
$${\rm tr} ( B^T \Delta B ) = \sum_{j=1}^r \sum_{i=1}^n b_{i,j}^2 \lambda_i = \sum_{i=1}^n \Big(\sum_{j=1}^r b_{i,j}^2\Big)\lambda_i = \sum_{i=1}^n h_i \lambda_i,$$ where $B=(b_{i,j})$ and $h_i := \sum_{j=1}^r b_{i,j}^2$. You could also have arrived at this point by expressing the columns of $U$ as linear combinations of an orthonormal basis of eigenvectors.
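As a sanity check of this identity, here is a small sketch (my addition; it assumes $\Delta = \operatorname{diag}(\lambda_1,\ldots,\lambda_n)$ and draws a random $B$ with orthonormal columns via a QR decomposition):

```python
import numpy as np

rng = np.random.default_rng(1)
n, r = 6, 3
lam = np.sort(rng.standard_normal(n))[::-1]   # eigenvalues in decreasing order
Delta = np.diag(lam)

# Random B with orthonormal columns, B^T B = Id_r
B, _ = np.linalg.qr(rng.standard_normal((n, r)))
h = (B**2).sum(axis=1)                        # h_i = sum_j b_{i,j}^2

print(np.trace(B.T @ Delta @ B))              # tr(B^T Delta B)
print(h @ lam)                                # sum_i h_i lambda_i -- same value
```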
Now we'll show two properties of the coefficients $h_i$. For the first one, extend $B$ with additional columns to an orthogonal square matrix $C=[B \mid R]=(c_{i,j})$ and note that for each $i$
$$0 \le h_i = \sum_{j=1}^r b_{i,j}^2 = \sum_{j=1}^r c_{i,j}^2 \le \sum_{j=1}^n c_{i,j}^2 = 1$$
holds because $CC^T={\rm Id}_n$. Secondly, for the sum of the coefficients we have
$$\sum_{i=1}^n h_i = \sum_{j=1}^r \sum_{i=1}^n b_{i,j}^2 = \sum_{j=1}^r \langle b_j, b_j \rangle = r \cdot 1 = r,$$ where $b_j$ denotes the $j$-th column of $B$.
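Both properties are easy to confirm numerically for a random feasible $B$ (again my own sketch):

```python
import numpy as np

rng = np.random.default_rng(2)
n, r = 6, 3
B, _ = np.linalg.qr(rng.standard_normal((n, r)))  # B^T B = Id_r
h = (B**2).sum(axis=1)

print(bool(h.min() >= 0 and h.max() <= 1))  # True: 0 <= h_i <= 1
print(bool(np.isclose(h.sum(), r)))         # True: sum_i h_i = r
```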
Now we've arrived at exactly this problem from convex optimization (with $a=(\lambda_1,\ldots,\lambda_n)^T$, $x_{\min}=0$ and $x_{\max}=1$). Instead of solving it with those methods, we can find the maximum by hand. Assuming the eigenvalues are ordered so that $\lambda_1 \ge \cdots \ge \lambda_n$, we bound
$$\sum_{i=1}^n h_i \lambda_i \le \sum_{i=1}^{r-1} h_i \lambda_i + \lambda_r \sum_{i=r}^n h_i = \sum_{i=1}^{r-1} h_i \lambda_i + \lambda_r \Big(r-\sum_{i=1}^{r-1} h_i\Big) = r \lambda_r + \sum_{i=1}^{r-1} h_i (\lambda_i - \lambda_r).$$
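Before continuing, the linear program above can also be handed to an off-the-shelf solver to check the claimed maximum; here is a sketch using scipy.optimize.linprog (my addition, with arbitrary $n$, $r$ and random eigenvalues):

```python
import numpy as np
from scipy.optimize import linprog

rng = np.random.default_rng(3)
n, r = 6, 3
lam = np.sort(rng.standard_normal(n))[::-1]   # lambda_1 >= ... >= lambda_n

# maximize lam . h  subject to  0 <= h_i <= 1 and sum_i h_i = r
res = linprog(c=-lam,                         # linprog minimizes, so negate
              A_eq=np.ones((1, n)), b_eq=[r],
              bounds=[(0, 1)] * n)

print(-res.fun)           # optimum of the relaxed problem
print(lam[:r].sum())      # sum of the r largest eigenvalues -- agrees
```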
Now we can establish an upper bound for our maximization problem:
$$\max \lbrace {\rm tr} ( B^T \Delta B ) \mid B \in \mathbb{R}^{n,r}, \, B^T B={\rm Id}_r \rbrace \le \max \lbrace \sum_{i=1}^n h_i \lambda_i \mid 0 \le h_i \le 1, \, \sum_{i=1}^n h_i = r \rbrace \\ \le \max \lbrace r \lambda_r + \sum_{i=1}^{r-1} h_i (\lambda_i -\lambda_r) \mid 0 \le h_i \le 1 \rbrace \le \sum_{i=1}^r \lambda_i.$$
For the last inequality we've set $h_i=1$, which is optimal since each $\lambda_i - \lambda_r$ is nonnegative. Since this bound is attained by $h_1 = \ldots = h_r = 1$ and $h_{r+1} = \ldots = h_n = 0$, which corresponds to $B=(e_1,\ldots,e_r)$, we have found our maximum and with it the desired $B$ and $U$.
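Finally, a randomized sanity check of the whole chain (my own sketch): coefficient vectors $h$ coming from a random feasible $B$ never exceed the intermediate bound, and the bound never exceeds the sum of the $r$ largest eigenvalues.

```python
import numpy as np

rng = np.random.default_rng(4)
n, r = 6, 3
lam = np.sort(rng.standard_normal(n))[::-1]   # lambda_1 >= ... >= lambda_n

for _ in range(1000):
    B, _ = np.linalg.qr(rng.standard_normal((n, r)))
    h = (B**2).sum(axis=1)                    # feasible by the two properties above
    value = h @ lam
    bound = r * lam[r - 1] + h[:r - 1] @ (lam[:r - 1] - lam[r - 1])
    assert value <= bound + 1e-12             # the bound derived above
    assert bound <= lam[:r].sum() + 1e-12     # ... never exceeds the maximum

print("all bounds hold")
```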