This answer is based on the contents of pages 8 and 9 of the document "GSM Channel Equalization, Decoding, and SOVA on the MSC8126 Viterbi Coprocessor (VCOP)", and not upon the OP's interpretation of what is said on those pages or the incorrect naming of various mathematical operations therein.
Consider the cross-correlation function
$$R_{u,v}(t) = \int_{-\infty}^\infty u(\tau+t)v(\tau)\,\mathrm d\tau
\tag{1}$$
of real-valued finite-energy signals $u(t)$ and $v(t)$. Its Fourier transform is
\begin{align}
S_{u,v}(f) &= \int_{-\infty}^\infty R_{u,v}(t)\exp(-j2\pi ft)
\,\mathrm dt\\
&= \int_{-\infty}^\infty \left[\int_{-\infty}^\infty
u(\tau+t)v(\tau)\,\mathrm d\tau\right]\exp(-j2\pi ft)\,\mathrm dt\\
&= \int_{-\infty}^\infty \left[\int_{-\infty}^\infty
u(\tau+t)\exp(-j2\pi ft)\,\mathrm dt\right]v(\tau)\,\mathrm d\tau\\
&= \int_{-\infty}^\infty \left[\int_{-\infty}^\infty
u(\lambda)\exp(-j2\pi f(\lambda-\tau))\,\mathrm d\lambda \right]v(\tau)\,\mathrm d\tau\\
&= \int_{-\infty}^\infty \left[\int_{-\infty}^\infty
u(\lambda)\exp(-j2\pi f\lambda)\,\mathrm d\lambda \right]
v(\tau)\exp(j2\pi f\tau)\,\mathrm d\tau\\
&= \int_{-\infty}^\infty U(f)v(\tau)\exp(j2\pi f\tau)\,\mathrm d\tau\\
&= U(f)V^*(f) \tag{2}.
\end{align}
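
Result $(2)$ can be sanity-checked numerically. The following is a minimal sketch using the discrete, circular analogue of $(1)$ on real-valued sequences; the helper name `circular_xcorr`, the length 16, and the random test data are assumptions made purely for illustration.

```python
import numpy as np

def circular_xcorr(u, v):
    """R[k] = sum_n u[(n + k) mod N] * v[n], a discrete circular analogue of (1)."""
    n = len(u)
    # np.roll(u, -k)[n] equals u[(n + k) mod N]
    return np.array([np.dot(np.roll(u, -k), v) for k in range(n)])

rng = np.random.default_rng(0)
u = rng.standard_normal(16)  # arbitrary real test sequences
v = rng.standard_normal(16)

lhs = np.fft.fft(circular_xcorr(u, v))        # transform of the cross-correlation
rhs = np.fft.fft(u) * np.conj(np.fft.fft(v))  # U times conjugate of V, as in (2)
print(np.allclose(lhs, rhs))
```

The conjugate on the second factor relies on `v` being real-valued, just as the last step of the derivation does.
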
Now, we apply this result to the GSM system under consideration in
which a known training signal $x(t)$ is transmitted across the channel
which is modeled as an LTI system whose impulse response $h(t)$
is unknown. We wish to estimate $h(t)$ from knowledge of the
transmitted training signal $x(t)$ (which we have chosen very carefully
to have various desirable properties) and the corresponding channel output $y(t) = (x \star h)_t$ whose Fourier transform $Y(f)$
equals $X(f)H(f)$.
Since $x(t)$ is known to the receiver, it can generate a replica of
$x(t)$ at the receiver. Now suppose that we compute the
cross-correlation function $R_{y,x}(t)$ of the received signal
$y(t)$ and the local replica $x(t)$. The Fourier transform of
this cross-correlation is
$$\mathcal F\{R_{y,x}(t)\} = Y(f)X^{*}(f) = X(f)H(f)X^{*}(f)
= |X(f)|^2 H(f)\tag{3}$$
which shows that
$$R_{y,x} = R_{x,x}\star h\tag{4}$$ where $R_{x,x}$ is
the autocorrelation function of the signal $x(t)$.
(This is Equation (4) on page 8 of the document cited
in the first sentence of this answer.)
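
As a rough check of $(4)$ in discrete time, the sketch below compares the cross-correlation of the channel output with the training sequence against the autocorrelation of the training sequence convolved with the channel. The random binary sequence and the four channel taps are hypothetical values chosen only for the demonstration.

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.choice([-1.0, 1.0], size=16)   # arbitrary binary training sequence (assumption)
h = np.array([0.9, 0.4, -0.3, 0.1])    # hypothetical channel taps (assumption)

y = np.convolve(x, h)                   # channel output: x convolved with h

r_yx = np.correlate(y, x, mode="full")  # cross-correlation of y with the replica x
r_xx = np.correlate(x, x, mode="full")  # autocorrelation of x

# Discrete counterpart of (4): R_yx equals R_xx convolved with h
print(np.allclose(r_yx, np.convolve(r_xx, h)))
```
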
If the autocorrelation function $R_{x,x}(t)$ is a Dirac delta
or impulse $\delta(t)$,
then $(4)$ shows that the cross-correlation function $R_{y,x}(t)$
that we have just computed is just $h(t)$, the channel impulse
response that we are trying to estimate! Of course, no
deterministic signal can have $\delta(t)$ as its autocorrelation
function, but there do exist signals whose autocorrelation
function resembles the proverbial "inverted thumbtack"
function: a large very narrow spike
at $t=0$ and very small (close to $0$) values for $t \neq 0$. Binary
Barker sequences are one such class, but since the longest known
Barker sequence is of length $13$, lots of people have expended
lots of computer time searching for longer binary sequences
whose autocorrelation functions look like inverted thumbtacks.
(If arbitrary amplitude levels are permissible, then Huffman's
impulse-equivalent sequences can be considered.) Using such a sequence
(actually, the corresponding pulse train) for $x(t)$ leads to
$$h(t) \approx K\cdot R_{y,x}(t) \tag{5}$$ where $K$ is a constant
whose value can be determined once we have chosen $x(t)$.
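
To make $(5)$ concrete, here is a small discrete-time sketch that uses the length-13 Barker sequence as the training sequence. The three-tap channel `h` is a made-up example, and taking $K$ as the reciprocal of the autocorrelation peak is an assumption that suits this discrete setting; it is not taken from the cited document.

```python
import numpy as np

# Length-13 binary Barker sequence
x = np.array([1, 1, 1, 1, 1, -1, -1, 1, 1, -1, 1, -1, 1], dtype=float)
N = len(x)

# Autocorrelation at lags -(N-1)..(N-1); index N-1 corresponds to lag 0
r_xx = np.correlate(x, x, mode="full")
peak = r_xx[N - 1]
max_sidelobe = np.max(np.abs(np.delete(r_xx, N - 1)))

# Hypothetical channel, chosen only for illustration
h = np.array([1.0, 0.5, -0.2])
y = np.convolve(x, h)  # noiseless channel output

# Cross-correlate with the local replica and scale by K
r_yx = np.correlate(y, x, mode="full")
K = 1.0 / peak
h_est = K * r_yx[N - 1 : N - 1 + len(h)]  # lags 0, 1, 2

print("peak:", peak, "largest sidelobe magnitude:", max_sidelobe)
print("true h:     ", h)
print("estimated h:", h_est)
```

The estimate is only approximate because the sidelobes of the autocorrelation are small but not zero, which is exactly why $(5)$ is written with $\approx$ rather than $=$.
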
Finally, we come to the computation of the cross-correlation
function $R_{y,x}(t)$. Set $\hat{x}(t) = x(-t)$ and note that we
can write
\begin{align}
R_{y,x}(t) &= \int_{-\infty}^\infty y(\tau+t)x(\tau)\,\mathrm d\tau\\
&= \int_{-\infty}^\infty y(\lambda)x(\lambda-t)\,\mathrm d\lambda\\
&= \int_{-\infty}^\infty y(\lambda)\hat{x}(t-\lambda)\,\mathrm d\lambda\\
&= \left(y\star \hat{x}\right)_t
\end{align}
that is,
we can compute the desired cross-correlation $R_{y,x}(t)$
by filtering
the received signal $y(t)$ through a filter whose impulse
response is $\hat{x}(t) = x(-t)$.
But a filter whose impulse response is $x(-t)$ is what I have
called in this answer the matched filter for $x(t)$, and so
we can compute the desired cross-correlation $R_{y,x}(t)$
by filtering
the received signal $y(t)$ through *the* matched filter for $x(t)$.
This is what the document cited above says (just above Equation 4):

> The received training sequence in the digital domain $\ldots$
> is fed into a digital matched filter $\ldots$ with an impulse response that is matched to $\ldots$
> (the transmitted sequence)

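
The equivalence between cross-correlating with $x$ and filtering through the matched filter can be checked numerically. This is a minimal sketch for real-valued sequences; the sequence lengths and random data are arbitrary assumptions.

```python
import numpy as np

rng = np.random.default_rng(1)
x = rng.choice([-1.0, 1.0], size=26)  # arbitrary binary training sequence (assumption)
y = rng.standard_normal(40)           # arbitrary received samples (assumption)

# Direct cross-correlation of y with the replica x
r_corr = np.correlate(y, x, mode="full")

# Matched filtering: convolve y with the time-reversed replica
r_mf = np.convolve(y, x[::-1], mode="full")

print(np.allclose(r_corr, r_mf))
```

For complex-valued sequences the matched filter would also need a complex conjugate, which this real-valued sketch leaves out.
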
It is to be hoped that this will clarify the confused discussion
in the comments on the main question as well as on my other answer
between the OP, @DanBoschen and myself.