I think that the accepted answer to the question that you linked to is incorrect. The image of a compact manifold under a continuous map into $\mathbb{R}^n$ is compact, regardless of how we cook up the map. However, judging from some of the other questions on the same topic, it might be good to be a little more explicit in answering.
To see what is going on, we can work out an example in detail, following the definitions and language given by Milnor up to that point in the book. For simplicity, take the example that Milnor gives immediately after the theorem: the map $f:\mathbb{R}^2\rightarrow \mathbb{R}$ that sends $(x,y)\mapsto x^2+y^2$. Following Milnor's comments on p. 3, the derivative at the point $(x,y)$ is given by the matrix
$$[\partial f/\partial x, \partial f/\partial y]_{(x,y)} $$
Here the first partial derivative is $2x$ and the second is $2y$. So, at the point $(x,y)$ the derivative is given by the matrix
$$[2x,2y]$$
The above matrix acts on tangent vectors.
In particular, notice that at any non-zero point of $\mathbb{R}^2$ the above matrix is a surjection onto $\mathbb{R}$. Since the origin does not lie in $f^{-1}(v)$ when $v\neq 0$, every $v\neq 0$ in $\mathbb{R}$ is a regular value of $f$.
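To make the surjectivity claim concrete, here is a quick check. The names $a$ and $b$ for the components of a tangent vector are notation introduced here, not Milnor's. Applied to a tangent vector, the matrix gives

$$[2x,2y]\begin{bmatrix} a\\ b\end{bmatrix} = 2xa+2yb.$$

If $(x,y)\neq (0,0)$, then choosing $(a,b)=(x,y)$ gives $2(x^2+y^2)\neq 0$, so the image is all of $\mathbb{R}$. At the origin the matrix is $[0,0]$ and the map is zero. The same formula shows that, away from the origin, the null space is the line spanned by $(-y,x)$, since $2x(-y)+2y(x)=0$.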
Now let us consider $f^{-1}(1)$. As we know, this is just the circle $S^1$. Milnor now asks us to consider the null space of the map $df_{(x,y)}$. At the point $(0,1)$ the derivative is $[0,2]$, so the null space is generated by the tangent vector in the $x$ direction. Following what Milnor says, consider the linear map $L:\mathbb{R}^2\rightarrow \mathbb{R}$ that sends $(x,y)\mapsto x$. Certainly this map is non-singular on the null space. Milnor now asks us to consider the map $F:\mathbb{R}^2\rightarrow \mathbb{R}\times \mathbb{R}$ that sends $(x,y)\mapsto (x^2+y^2,x)$.
Observe now that the image of $f^{-1}(1)$ by $F$ is the set of points $1\times [-1,1]$, which is not $1\times \mathbb{R}$. So, evidently we only have containment. However, we still obtain the result, which is that the map $F$ is a diffeomorphism in a neighborhood of $(0,1)$. This is because its derivative, written using the standard basis of each tangent space, is
$$\begin{bmatrix}
0& 2\\
1 & 0
\end{bmatrix}$$
and so its determinant is non-zero. In particular, by the inverse function theorem there exists an open neighborhood $V$ of $(0,1)$ in $\mathbb{R}^2$ such that $F(V)=U$ is open in $\mathbb{R}\times \mathbb{R}$ and $F\mid_{V}$ is a diffeomorphism between the two. Now observe that, since the first coordinate of $F$ is $f$, a point is mapped into $1\times \mathbb{R}$ by $F$ exactly when it lies in $f^{-1}(1)$. Hence $F(V\cap f^{-1}(1))=(1\times \mathbb{R})\cap U$, which is open when regarded as a subset of $1\times \mathbb{R}$. Hence $F\mid_{V\cap f^{-1}(1)}$ is a system of coordinates for the neighborhood $V\cap f^{-1}(1)$ of $(0,1)$ in $f^{-1}(1)$. The same argument, mutatis mutandis, works at the other points, so we see that $S^1$ is a smooth manifold.
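In this example the neighborhoods can be written down by hand, which may help to see what the inverse function theorem is producing. The particular choice of $V$ and $U$ below, and the coordinate names $(u,v)$ on the target, are illustrative assumptions; the theorem only guarantees that some such pair exists. At a general point the derivative of $F$ is

$$dF_{(x,y)}=\begin{bmatrix}
2x & 2y\\
1 & 0
\end{bmatrix},\qquad \det dF_{(x,y)} = -2y,$$

which is non-zero exactly when $y\neq 0$. Taking $V=\{(x,y) : y>0\}$ and $U=\{(u,v) : u>v^2\}$, the map $F\mid_V$ has the smooth inverse

$$F^{-1}(u,v)=\left(v,\ \sqrt{u-v^2}\right),$$

so $F\mid_V : V\rightarrow U$ is a diffeomorphism. Restricting to $u=1$ gives $(1\times \mathbb{R})\cap U = 1\times (-1,1)$, and the inverse becomes

$$v\mapsto \left(v,\ \sqrt{1-v^2}\right),\qquad -1<v<1,$$

which is the familiar parametrization of the open upper semicircle. The determinant formula also shows why the same $L$ cannot be used at $(\pm 1,0)$: there $y=0$, the null space of $df$ is the $y$ direction, and one takes $L(x,y)=y$ instead.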