The (differential) cross section for some scattering process is directly proportional to the (square of the) matrix element for that process. Conceptually they are both proportional to the probability that the process occurs. But they have different units and have slightly different meanings. Neither is a pure probability by itself, but both are closely related to the probabilities of a transition.
Let's be explicit about our terms.$%
\newcommand{\bra}[1]{\left< #1 \right|}
\newcommand{\ket}[1]{\left| #1 \right>}
\newcommand{\braket}[1]{\left< #1 \right>}
$
The "matrix element" connecting two states $\ket a$ and $\ket b$ is generally the integral $\mathcal H_{ab} = \braket{a\middle|\hat H\middle| b}$, where $\hat H$ is the Hamiltonian operator. If the states are properly normalized, so that $\braket{a\middle|b} = \delta_{ab}$, then the matrix element has the same units as the Hamiltonian operator: an energy.
To turn a matrix element into a decay probability, you find the decay width, $\Gamma_{ab} \propto \left|\mathcal H_{ab} \right|^2 \rho(b)$, which also has units of energy. (Here $\rho$ is the density of states around the final state, which has units of states (dimensionless) per unit energy.) The dimensionless probability that a state has survived some time interval $\Delta t$ without decaying is $e^{-\Delta t \,\Gamma / \hbar}$, so the probability that it has decayed is $1 - e^{-\Delta t \,\Gamma / \hbar}$. If several pathways compete, $\Gamma$ here is the total width, and the fraction that decayed through one particular pathway is that probability times the branching ratio $\Gamma_{ab}/\Gamma$. This is quite a bit of massaging, and requires some specification about your experimental setup.
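To make the width-to-probability step concrete, here is a minimal Python sketch. The width of $10^{-6}$ eV is a made-up illustrative number, not a property of any real state, and the function names are my own. It assumes a single decay channel, so that $\Gamma$ is both the partial and the total width.

```python
import math

HBAR_EV_S = 6.582119569e-16  # reduced Planck constant in eV*s


def survival_probability(width_ev, dt_s):
    """Probability that the state has NOT decayed after dt_s seconds."""
    return math.exp(-dt_s * width_ev / HBAR_EV_S)


def decay_probability(width_ev, dt_s):
    """Probability that the state HAS decayed after dt_s seconds."""
    # -expm1(-x) equals 1 - exp(-x) but stays accurate when x is tiny.
    return -math.expm1(-dt_s * width_ev / HBAR_EV_S)


# Hypothetical width, chosen only for illustration.
width_ev = 1.0e-6
lifetime_s = HBAR_EV_S / width_ev  # mean lifetime tau = hbar / Gamma

for dt_s in (0.1 * lifetime_s, lifetime_s, 10.0 * lifetime_s):
    print(f"dt = {dt_s:.3e} s  "
          f"survived: {survival_probability(width_ev, dt_s):.4f}  "
          f"decayed: {decay_probability(width_ev, dt_s):.4f}")
```

The width is an energy and the time interval is a time, but the exponent $\Delta t\,\Gamma/\hbar$ is a pure number. That is why the result can be read as a probability.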
To turn a matrix element into a cross section, you have to do even more massaging, because the initial and final momenta and energies and the spin degeneracy and the phase space all come into the calculation. Here's a nice writeup that keeps all of the dimensionful constants, rather than the handwaving practice of setting $\hbar = c = \pi = 2\pi = 1$ which is common in order-of-magnitude estimates. Note that the linked result includes an unphysical "normalization volume," because the plane-wave initial states can't actually be normalized.
An operational definition of a cross section is to imagine a scattering experiment from a thin fixed target, with thickness $\ell$ and number density $n$. In the approximation where you can ignore multiple scattering (thus a "thin" target), you find that the beam is exponentially attenuated: the ratio of the unscattered beam intensity to the incident beam intensity goes like $e^{-n\ell s}$, where the $s$ stands for "something." Dimensional analysis tells you that $s$ must have units of area. For the case of hard-sphere scattering, $s$ actually does turn out to be the cross-sectional area of the spheres, so we retain that name and promote the $s$ to its Greek-alphabet counterpart to make it seem like we're educated.
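As a rough numerical illustration of that operational definition, the sketch below computes the fraction of the beam that emerges unscattered, $e^{-n\ell\sigma}$, and then inverts the relation to recover $\sigma$ from a "measured" fraction. The target density, thickness, and one-barn cross section are invented round numbers, and the function names are hypothetical.

```python
import math

BARN_M2 = 1.0e-28  # one barn expressed in square meters


def unscattered_fraction(n_per_m3, thickness_m, sigma_m2):
    """Fraction of the incident beam that passes through unscattered."""
    return math.exp(-n_per_m3 * thickness_m * sigma_m2)


def cross_section_from_attenuation(fraction, n_per_m3, thickness_m):
    """Invert the attenuation law: sigma = -ln(fraction) / (n * ell)."""
    return -math.log(fraction) / (n_per_m3 * thickness_m)


# Invented target parameters, for illustration only.
n = 5.0e28             # scatterers per cubic meter
ell = 1.0e-3           # target thickness in meters
sigma = 1.0 * BARN_M2  # assumed cross section of one barn

frac = unscattered_fraction(n, ell, sigma)
sigma_back = cross_section_from_attenuation(frac, n, ell)

print(f"n * ell * sigma      = {n * ell * sigma:.3e} (dimensionless)")
print(f"unscattered fraction = {frac:.5f}")
print(f"recovered sigma      = {sigma_back / BARN_M2:.3f} barn")
```

The product $n\ell\sigma$ is dimensionless only because $\sigma$ carries units of area. For these numbers it is much less than one, which is the sense in which the target is "thin."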
Probabilities are dimensionless. Neither the cross section nor the matrix element is dimensionless, so neither is a probability.