$\phi$ is differentiable. In fact it's $\mathcal C^\infty$. The reason: $\phi$ is a multivariate polynomial of degree 2. Indeed, each term in the sum is of the form $$\|x-x_j\|^2=\sum_{i=1}^n (x^{(i)}-x_j^{(i)})^2$$ where the upper index $(i)$ denotes the $i$-th component of a vector.
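To make that concrete, here is a minimal NumPy sketch (with arbitrary made-up points; the names are purely illustrative) checking that one term of the sum is exactly that componentwise sum of squares:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 3                                   # dimension (arbitrary)
x = rng.normal(size=n)                  # an arbitrary point
xj = rng.normal(size=n)                 # one of the points x_j

term_norm = np.sum((x - xj) ** 2)                        # ||x - x_j||^2
term_comp = sum((x[i] - xj[i]) ** 2 for i in range(n))   # componentwise sum of squares
print(np.isclose(term_norm, term_comp))                  # True
```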
With that, we can express the Taylor expansion of $\phi$ at an arbitrary point $x$ as
$$\phi(x+h)=\phi(x)+\langle \nabla_x\phi, h\rangle + \frac12\langle \mathcal H_x h, h\rangle + o(\|h\|^2)\tag{1}$$
where $\nabla_x\phi$ represents the gradient (vector) of $\phi$ at $x$, and $\mathcal H_x$ represents the Hessian (matrix) of $\phi$ at that point.
Now, recall that if $x$ is a local extremum of $\phi$, then $\nabla_x\phi$ is the zero vector ($x$ is a critical point); conversely, a critical point where $\mathcal H_x$ is positive definite is a local minimum (negative definite gives a local maximum).
So we need to put the Taylor expansion of $\phi$ in the form $(1)$ to read off the gradient and Hessian. With this function, it's fairly simple because:
$$\begin{split}
\phi(x+h)&=\sum_{j=1}^\mu \|x+h-x_j\|^2\\
&=\sum_{j=1}^\mu\langle x+h-x_j,x+h-x_j\rangle\\
&= \sum_{j=1}^\mu\left(\langle x-x_j,x-x_j\rangle + 2\langle x-x_j,h\rangle +\langle h,h\rangle\right)\\
&= \phi(x) + \left\langle 2\sum_{j=1}^\mu (x-x_j),h\right\rangle + \mu \|h\|^2
\end{split}$$
Thus, identifying with $(1)$, we see that the gradient is $$\nabla_x\phi = 2\sum_{j=1}^\mu (x-x_j)$$ and that the Hessian is $2\mu$ times the identity matrix (positive definite), since $\frac12\langle \mathcal H_x h, h\rangle$ must equal $\mu\|h\|^2$.
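As a sanity check (not part of the derivation), here is a small NumPy sketch with random points confirming the gradient by central finite differences, and the fact that the quadratic Taylor expansion $(1)$ is exact for this $\phi$:

```python
import numpy as np

rng = np.random.default_rng(1)
n, mu = 3, 5
X = rng.normal(size=(mu, n))       # the points x_1, ..., x_mu, one per row
x = rng.normal(size=n)             # an arbitrary evaluation point

def phi(y):
    """phi(y) = sum_j ||y - x_j||^2."""
    return np.sum((y - X) ** 2)

# Closed-form gradient and Hessian from the identification with (1)
grad = 2 * np.sum(x - X, axis=0)   # 2 * sum_j (x - x_j)
hess = 2 * mu * np.eye(n)          # 2*mu times the identity

# Central finite-difference check of the gradient
eps = 1e-6
grad_fd = np.array([(phi(x + eps * e) - phi(x - eps * e)) / (2 * eps)
                    for e in np.eye(n)])
print(np.allclose(grad, grad_fd))  # True (up to rounding)

# Since phi is quadratic, the expansion (1) holds with no remainder at all:
h = 0.1 * rng.normal(size=n)
lhs = phi(x + h)
rhs = phi(x) + grad @ h + 0.5 * h @ hess @ h
print(np.isclose(lhs, rhs))        # True
```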
Equating the gradient to the zero vector leads to $x$ being the centroid (that is, the average) of the points $x_1, ..., x_\mu$:
$$x=\frac 1 \mu \sum_{j=1}^\mu x_j$$
So $\phi$ has a unique critical point; since the Hessian is positive definite everywhere, $\phi$ is strictly convex, and that point is the global minimum: the centroid of your points.
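If you want to convince yourself numerically, here is a short sketch (assuming NumPy and SciPy are available; the points are random and the names illustrative) showing a generic optimizer landing on the centroid:

```python
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(2)
n, mu = 3, 5
X = rng.normal(size=(mu, n))       # the points x_1, ..., x_mu

def phi(y):
    return np.sum((y - X) ** 2)    # sum of squared distances

centroid = X.mean(axis=0)          # (1/mu) * sum_j x_j
result = minimize(phi, x0=np.zeros(n))   # generic numerical minimization

print(np.allclose(result.x, centroid, atol=1e-5))           # True
print(phi(centroid) <= phi(centroid + rng.normal(size=n)))  # perturbing only increases phi
```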
And there's nothing special about being in finite dimensions (the computation above extends to infinite dimensions).
With this, you have recovered a well-known result. In statistics / machine learning / functional approximation, $\phi$ often represents the estimation / training / approximation error, and the average is the estimate / model / approximation that minimizes that least-squares error. In physics, this also has an interpretation as the center of mass of a cloud of points $x_1, ..., x_\mu$ of equal mass. Note that you can extend the result to points with different masses/weights, in which case the minimizer becomes the weighted average. But I digress.
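For what it's worth, here is the same kind of check for the weighted case (positive weights chosen arbitrarily; this is just an illustrative sketch, not part of the original argument): the gradient $2\sum_j w_j(x-x_j)$ vanishes exactly at the weighted average, which is the global minimizer of $\sum_j w_j\|x-x_j\|^2$.

```python
import numpy as np

rng = np.random.default_rng(3)
n, mu = 3, 5
X = rng.normal(size=(mu, n))           # points x_1, ..., x_mu
w = rng.uniform(0.1, 2.0, size=mu)     # positive weights (masses), arbitrary

def phi_w(y):
    return np.sum(w * np.sum((y - X) ** 2, axis=1))   # sum_j w_j ||y - x_j||^2

weighted_avg = (w[:, None] * X).sum(axis=0) / w.sum()

# The gradient 2 * sum_j w_j (x - x_j) vanishes at the weighted average,
# and phi_w only increases away from it.
grad_at_avg = 2 * np.sum(w[:, None] * (weighted_avg - X), axis=0)
print(np.allclose(grad_at_avg, 0))     # True
print(all(phi_w(weighted_avg) <= phi_w(weighted_avg + rng.normal(size=n))
          for _ in range(100)))        # True
```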