Let's say you've sampled an analog signal $x(t)$ with spectrum $X(\omega)$ at a rate $1/T$ that is high enough to satisfy the sampling theorem. The spectrum (i.e. the discrete-time Fourier transform, DTFT) of the sampled signal $x_{1,k}$ is then a periodic repetition of $X(\omega)$. The repetition period is $1/T$ (in ordinary frequency, i.e. $2\pi/T$ in $\omega$).
Now you sample the same signal at a higher rate $L/T,\, L\in\mathbb N$, yielding $x_{L,k}$. Again the spectrum is a periodic repetition of $X(\omega)$, but this time the repetition period is $L/T$, so the spectral images are spaced further apart than before.
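Written out, the two spectra look as follows. This is only a sketch in my own notation: $f$ is frequency in Hz, $X_a(f)$ is the analog spectrum (called $X(\omega)$ above), and $X_1(f)$, $X_L(f)$ are the DTFTs of $x_{1,k}$ and $x_{L,k}$ expressed on the same analog frequency axis.

$$X_1(f) = \frac{1}{T}\sum_{n=-\infty}^{\infty} X_a\!\left(f - \frac{n}{T}\right), \qquad X_L(f) = \frac{L}{T}\sum_{n=-\infty}^{\infty} X_a\!\left(f - \frac{nL}{T}\right)$$

Both contain the same baseband copy of $X_a(f)$; they differ in how far apart the images sit and in an amplitude factor of $L$.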
The task of upsampling consists of calculating $x_{L,k}$ from $x_{1,k}$. First, $L-1$ zeros are inserted after every sample of $x_{1,k}$. This does not alter the spectrum itself; it only widens the fundamental frequency interval of the DTFT to $-L/(2T)\ldots L/(2T)$, which now contains $L$ copies of the original (analog) spectrum. The unwanted copies are then filtered out with a lowpass filter, so that only the original spectrum in the range $-1/(2T)\ldots 1/(2T)$ remains. With an ideal filter whose passband gain is $L$ (to compensate for the inserted zeros), the result is identical to $x_{L,k}$.
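The two steps can be sketched in a few lines of NumPy. This is a minimal illustration, not a production resampler: the function name `upsample`, the parameter `num_taps`, and the choice of a Hamming-windowed sinc as the lowpass filter are my own assumptions, and a real filter only approximates the ideal one.

```python
import numpy as np

def upsample(x, L, num_taps=None):
    """Upsample x by an integer factor L: zero insertion, then lowpass filtering."""
    x = np.asarray(x, dtype=float)

    # Step 1: insert L-1 zeros after every sample.
    y = np.zeros(len(x) * L)
    y[::L] = x

    # Step 2: lowpass filter with cutoff 1/(2T), i.e. 1/(2L) cycles per
    # output sample, and passband gain L. The ideal impulse response with
    # that gain is sinc(n/L); a Hamming window makes it finite (assumption).
    if num_taps is None:
        num_taps = 16 * L + 1          # odd length, so the filter is centred
    n = np.arange(num_taps) - (num_taps - 1) / 2
    h = np.sinc(n / L) * np.hamming(num_taps)

    # mode="same" removes the filter delay, so output sample k*L lines up
    # with input sample k.
    return np.convolve(y, h, mode="same")


# Compare against sampling the same cosine directly at the higher rate.
L = 4
k = np.arange(200)
x1 = np.cos(2 * np.pi * 0.1 * k)                      # sampled at rate 1/T
xL = np.cos(2 * np.pi * 0.1 * np.arange(200 * L) / L)  # sampled at rate L/T
x_up = upsample(x1, L)
print(np.max(np.abs(x_up[100:-100] - xL[100:-100])))   # small, away from the edges
```

Away from the block edges, `x_up` should agree with `xL` up to the ripple of the chosen filter. If SciPy is available, `scipy.signal.resample_poly` performs the same operation with a polyphase implementation that avoids multiplying by the inserted zeros.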
The above steps are quite well explained in the figure of the Wiki article you quoted (in the same order).