The optimum occurs because of the following mathematical arguments. I am going to give the physical argument with the nozzle flow equations, however, the method of Lagrangian multipliers could be used to mathematically prove the optimum occurs when $(p_e-p_a) = 0$.
Anyways, as you mentioned above, the thrust for a conventional rocket is given as,
$$ F = \dot{m} v_e + (p_e-p_a)A_e $$
From the derivation using control volume analysis, it follows that the first term is the momentum flux across the nozzle exit plane, and the second term is the pressure area force acting on the nozzle exit plane.
Now, we want to look at how each parameter above is dependent on the properties of the rocket thrust chamber and nozzle configuration. Rewriting the mass flow rate ($\dot{m}$) using the mass flow parameter for a nozzle with the choked condition at the nozzle throat we have,
$$ \dot{m} = \frac{A_t p_1}{\sqrt[]{T_1}} \ \sqrt[]{\frac{\gamma}{R}} \left(\frac{\gamma+1}{2}\right)^{-(\gamma+1)/[2(\gamma-1)]} $$
where $p_1$ is the chamber pressure, $T_1$ is the chamber temperature, $A_t$ is the throat area, $R$ is the specific gas constant for the burned products in the thrust chamber, and $\gamma$ is the ratio of specific heats of the burned products in the thrust chamber. Notice, this parameter has no dependence as to what is taking place at the nozzle exit, or the nozzle exit area for that matter. This is simply a result of the mass flow parameter being applied to the throat choking condition, which alleviates the dependency at the nozzle exit plane. Thus, our first parameter of the first term is not influenced by the nozzle exit condition.
Now for the velocity at the nozzle exit plane ($v_e$) we have,
$$ v_e = \sqrt[]{\frac{2\gamma}{\gamma-1} R T_1 \left[1-\left(\frac{p_e}{p_1}\right)^{(\gamma-1)/\gamma}\right]} $$
As observed in this expression, the lower $p_e$ we can achieve the higher the nozzle exit flow velocity we can achieve. So in the context of the momentum flux from the nozzle exit, it is desired to expand the gas to the lowest $p_e$ as possible in order to achieve the highest $v_e$ as possible.
Now for the pressure area force term, it is obvious that if $p_e$ drops below the ambient pressure $p_a$, then we will have a negative pressure area force acting on the nozzle exit plane. Obviously, this is not desirable, so there is a limit to which we wish to expand the chamber pressure $p_1$ through the nozzle to $p_e$ at the nozzle exit plane. Hence, we have two scenarios left to consider, the case where $p_e = p_a$ (perfect expansion and no pressure area force) and the case where $p_e > p_a$ (reduced momentum flux, but pressure area force). The latter condition will result in a reduced momentum flux than the case $p_e = p_a$, but we will have a positive pressure area force, instead of a zero pressure area force. The question then becomes, what is the ratio of the gain in pressure area force vs. the loss in momentum flux. The easiest way to demonstrate the optimum thrust argument is by looking at how each thrust component term grows in relation to $p_e/p_a$. Below is a plot demonstrating this relation. Clearly, the optimum thrust is achieved when $p_e = p_a$, hence, the gains from the pressure area force are not comparable to the loss in momentum flux for the case when $p_a < p_e$.
