You stated the acceptance criterion:
$$e^{-(E_\mathrm{new}-E_\mathrm{old})/kT} \geq \mathrm{random}(0,1)$$
The left-hand side expresses a ratio of likelihoods $\mathcal{L}_\mathrm{new}/\mathcal{L}_\mathrm{old}$ of the new state to the old state (for Boltzmann likelihoods $\mathcal{L} \propto e^{-E/kT}$, so lower energy means higher likelihood). If the new state is more likely than the old state, $\mathcal{L}_\mathrm{new}/\mathcal{L}_\mathrm{old} \gt 1$, and the acceptance criterion is automatically fulfilled.
If the new state is less likely than the old state, $0 \lt \mathcal{L}_\mathrm{new}/\mathcal{L}_\mathrm{old} \lt 1$, and the acceptance criterion is only fulfilled with some probability. The less likely the new state (compared to the old state), the less likely it is to be accepted, but there is always a finite probability of acceptance (well, up to machine precision anyway).
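A minimal sketch of this acceptance test in Python (the function name `metropolis_accept` and the argument names are my own; the source only gives the criterion itself):

```python
import math
import random

def metropolis_accept(e_old, e_new, kT):
    """Metropolis acceptance test for a proposed move.

    Downhill moves (e_new < e_old) give a likelihood ratio > 1 and are
    always accepted; uphill moves are accepted with probability equal
    to the ratio exp(-(E_new - E_old)/kT).
    """
    # Likelihood ratio L_new/L_old for Boltzmann weights exp(-E/kT).
    ratio = math.exp(-(e_new - e_old) / kT)
    # random.random() is uniform on [0, 1), so a ratio >= 1 always
    # passes, and a ratio r < 1 passes with probability r.
    return ratio >= random.random()
```

Note that the always-accept case needs no special handling: since $\mathrm{random}(0,1) < 1$, any ratio $\geq 1$ passes the comparison automatically.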
In practice, this means that as the space of states is explored, the algorithm typically moves toward more likely states, but occasionally moves against the gradient of the likelihood. This is important when the likelihood distribution has local maxima: it lets the algorithm "get out" of a local maximum and keep exploring other regions of the space of states. Without this feature, the algorithm would get stuck at the first little peak in the likelihood distribution that it encountered. It also allows proper exploration of the vicinity of a maximum, with the number of visits to a state becoming proportional to the relative likelihood of that state (as the number of iterations approaches $\infty$).
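That last property can be checked numerically. The following is a toy example I'm adding for illustration (a hypothetical two-state system; the energies and temperature are arbitrary choices): after many iterations, the ratio of visit counts should approach the Boltzmann likelihood ratio $e^{-(E_1-E_0)/kT}$.

```python
import math
import random

random.seed(42)

energies = [0.0, 1.0]   # two states; state 0 has the lower energy
kT = 1.0

state = 0
counts = [0, 0]
for _ in range(200_000):
    proposal = 1 - state  # propose jumping to the other state
    # The acceptance criterion from above: likelihood ratio vs. a
    # uniform random number.
    ratio = math.exp(-(energies[proposal] - energies[state]) / kT)
    if ratio >= random.random():
        state = proposal
    counts[state] += 1

observed = counts[1] / counts[0]
expected = math.exp(-(energies[1] - energies[0]) / kT)  # e^{-1} ≈ 0.37
```

The observed ratio converges toward the expected one, illustrating that states end up occupied in proportion to their relative likelihoods.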