While it is true that $V_0 \subset V$, these spaces are still both infinite dimensional, and there are no matrix sizes to speak of since no discretization has happened yet. To form matrices, you must choose finite dimensional subspaces $V_{0,h}$ and $V_h$ which "approximate" $V_0$ and $V$, in some sense. The size of the resulting matrices just depends on the dimension of $V_{0,h}$ and $V_h$. Therefore, if you use the same number of basis functions so that your finite-dimensional approximate spaces $V_{0,h}$ and $V_h$ have the same dimension, your matrices will end up square.
EDIT: Actually, the last sentence above is incorrect. You can get square matrices without having the exact same dimension for your subspaces; what matters is obviously the number of DOF you have, which also includes essential boundary conditions which are enforced in your problem. The following example should hopefully be more helpful than what I wrote above.
We seek a function $u(x)$ such that
\begin{align}
-u'' +u = f, \: x \in \Omega, \\
u'|_{\Gamma_N} = g_N, \: u|_{\Gamma_D} = g_D,
\end{align}
where $\Omega$ is some open interval whose boundary is partitioned $\partial \Omega = \Gamma_N \cup \Gamma_D$.
One weak form for this problem is: find $u \in U$ such that
\begin{align*}
\int_{\Omega} v' u' + \int_{\Omega} vu = \int_{\Omega} vf - v|_{\Gamma_N} g_N , \: \forall v \in V,
\end{align*}
where
\begin{align}
V = \{ w \: | \: w \in H^1(\Omega), w|_{\Gamma_D} = 0 \}, \\
U = \{ w \: | \: w \in H^1(\Omega), w|_{\Gamma_D} = g_D \}.
\end{align}
Note that we set $w|_{\Gamma_D} = 0$ in the test function space $V$ to eliminate the $v|_{\Gamma_D} u''|_{\Gamma_D}$ term since no information about $u''$ is known on that part of the boundary. This does not affect the DOF boundary conditions which we will enforce later, since we can enforce them directly without using our test function space $V$.
Lets proceed by a Petrov-Galerkin method: we make finite-dimensional subspaces by specifying bases
\begin{align*}
U_m = \text{span} \{\phi_j\}_{j = 1}^m, \\
V_n = \text{span} \{\psi_i \}_{i = 1}^n.
\end{align*}
The weak problem reduces to problem of seeking some $u_m \in U_M$ for which the following $n+1$ equations hold:
\begin{align*}
\int_{\Omega} \psi_i' u_m' + \int_{\Omega} \psi_i u_m = \int_{\Omega} \psi_i f - \phi_i |_{\Gamma_N} g_N, \: i = 1, ... ,n \\
u_m|_{\Gamma_D} = g_D \text{ (enforce DOF bc directly without $V$) }.
\end{align*}
Expanding $u_m$ out in a basis, this becomes the linear system
\begin{align*}
\sum_{j = 1}^m \left( \int_{\Omega} \psi_i ' \phi_j ' + \psi_i \phi_j \right) u_j = \int_{\Omega} \psi_i f - \phi_i |_{\Gamma_N} g_N, \: i = 1, ..., n \\
\sum_{j = 1}^m u_j \phi_j = g_D.
\end{align*}
In matrix form (with m = 3, n = 2 for simplicity):
\begin{align*}
\begin{bmatrix}
\int_{\Omega} (\psi_1' \phi_1' + \psi_1 \phi_1) & \int_{\Omega} (\psi_1' \phi_2' + \psi_1 \phi_2) & \int_{\Omega} (\psi_1' \phi_3' + \psi_1 \phi_3) \\
\int_{\Omega} (\psi_2' \phi_1' + \psi_2 \phi_1) & \int_{\Omega} (\psi_2' \phi_2' + \psi_2 \phi_2) & \int_{\Omega} (\psi_2' \phi_3' + \psi_2 \phi_3) \\
\phi_1 |_{\Gamma_D} & \phi_2 |_{\Gamma_D} & \phi_3 |_{\Gamma_D}
\end{bmatrix}
\begin{bmatrix}
u_1 \\ u_2 \\ u_3
\end{bmatrix} =
\begin{bmatrix}
\int_{\Omega} \psi_1 f - \phi_1|_{\Gamma_N} g_N \\
\int_{\Omega} \psi_2 f - \phi_2|_{\Gamma_N} g_N \\
g_D
\end{bmatrix}.
\end{align*}
So, the last row to enforce essential boundary conditions extends the matrix to be square, as you mention. Once again, it was fine that we restricted our space $V$ to be zero on part of the boundary since the essential DOFs were enforced directly using the equation $\sum u_j \phi_j = g_D$, which does not involve any functions from $V$.
I think computer is using $a(u,v)=f(v)$ to generate matrix $A$ and source vector $b$ by $A_{ij} = a(\psi_i,\phi_j)$ and $b_i = f(\phi_j)$ for $\psi_i$ is basis of $V_h\subset V$ and $\phi_j$ is basis of $W_h\subset V$. If test function space $W_h $ is limited to homogeneous BC, its dimension would be lower than $V_h$. Additional Dirichlet BC must be imposed to make the matrix square. If Nitsche's method is used. No extra BC need to be imposed after the matrix assemble.
– CatDog Aug 03 '18 at 03:14