For maximum-sensitivity preamble detection with time referencing, consider using a Barker code. If you need more choices at longer lengths, use a PRN (pseudo-random noise) sequence, and if you need even more choices because many distinct codes are required (as in CDMA), Gold codes are a possibility. (Kasami codes and Zadoff-Chu codes are not detailed further here but have other advantages.)
All these codes have a strong autocorrelation at time offset = 0 and, importantly, a much smaller autocorrelation at any other time offset, as demonstrated in the graphic below for a PRN sequence generated with an LFSR (Linear Feedback Shift Register):
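The autocorrelation property can be reproduced numerically. The following is a minimal sketch, assuming a 5-stage Fibonacci LFSR with feedback from stages 5 and 3 (an illustrative choice, not the generator behind the original graphic); the function name `lfsr_msequence` is hypothetical.

```python
import numpy as np

def lfsr_msequence(taps, nbits):
    """Return one period of a maximal-length sequence from a Fibonacci LFSR.

    taps: 1-indexed register stages that are XORed to form the feedback bit.
    """
    state = [1] * nbits                 # any non-zero seed works
    out = []
    for _ in range(2**nbits - 1):
        out.append(state[-1])           # output is taken from the last stage
        fb = 0
        for t in taps:
            fb ^= state[t - 1]
        state = [fb] + state[:-1]       # shift, feeding back into stage 1
    return np.array(out)

bits = lfsr_msequence(taps=(5, 3), nbits=5)   # assumed taps, period 31
code = 1 - 2 * bits                           # map {0, 1} -> {+1, -1}

# Circular autocorrelation at every lag
N = len(code)
acorr = np.array([np.dot(code, np.roll(code, k)) for k in range(N)])
print(acorr)
```

For a maximal-length sequence the circular autocorrelation is N at zero lag and -1 at every other lag, which is the "strong peak, small everywhere else" behavior described above.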

The codes provide some noise immunity through processing gain. Under white noise, the processing gain (the improvement in SNR) is $10\log_{10}(N)$, where $N$ is the length of the sequence. This follows because the correlated signal components add coherently, so the signal amplitude grows in proportion to $N$, while the noise (if white, meaning uncorrelated from sample to sample) has a standard deviation that grows as $\sqrt{N}$; the power SNR therefore improves by $N^2/N = N$. We detect the presence of the sequences using correlation, which is a multiply and sum:
$$R_{xy}(\tau) = \sum_t x(t-\tau)\,y(t)$$
The formula above is the cross-correlation of the received signal $x(t)$ with the sequence of interest $y(t)$ at various time offsets $\tau$.
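The processing gain can be checked with a quick Monte Carlo run. This is a sketch under stated assumptions: a random ±1 code of length 127 stands in for a real PRN sequence, the code is already aligned (offset 0), and the noise level and trial count are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(0)
N = 127                                    # code length (illustrative)
code = rng.choice([-1.0, 1.0], size=N)     # stand-in random +/-1 code
sigma = 2.0                                # noise standard deviation per sample
trials = 20000

noise = rng.normal(0.0, sigma, size=(trials, N))
rx = code + noise                          # received code aligned at offset 0
corr = rx @ code                           # multiply and sum, one value per trial

snr_in_db = 10 * np.log10(1.0 / sigma**2)                  # per-sample SNR
snr_out_db = 10 * np.log10(corr.mean()**2 / corr.var())    # SNR after correlation
gain_db = snr_out_db - snr_in_db
print(gain_db, 10 * np.log10(N))
```

The measured gain should land close to $10\log_{10}(N)$, since the correlation mean grows as N while its standard deviation grows as $\sqrt{N}$.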
There are several ways to do this in practice, depending on the implementation. For shorter sequences it can be done with an FIR filter whose coefficients are the code sequence (in time-reversed order), upsampled to match the sample rate of the received signal (best to do this at the lowest rate possible, of course). For longer sequences it is often more practical to create a code generator with a programmable offset (to shift the start position in time) and simply multiply the received sequence by the code and accumulate the output. A threshold detector on the output then indicates the presence of the code.
The plot below shows the result of a sliding correlation on a GPS signal, where the code repeats every 1 ms. (I would never use an FIR filter to correlate a GPS sequence, which is 1023 chips long, but this shows what a sliding correlation looks like.)
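The FIR approach for a short code might look like the sketch below. The Barker-13 code, 4 samples per chip, noise level, preamble position and the 60% threshold are all illustrative assumptions.

```python
import numpy as np

barker13 = np.array([1, 1, 1, 1, 1, -1, -1, 1, 1, -1, 1, -1, 1], dtype=float)
sps = 4                                        # samples per chip (assumed)
template = np.repeat(barker13, sps)            # code upsampled to the sample rate

rng = np.random.default_rng(1)
start = 100                                    # hypothetical preamble position
rx = rng.normal(0.0, 0.3, size=400)            # noise-only background
rx[start:start + len(template)] += template    # embed the preamble

# FIR filter whose coefficients are the time-reversed template, so that
# fir_out[k] = sum_j rx[k + j] * template[j]
fir_out = np.convolve(rx, template[::-1], mode="valid")

threshold = 0.6 * len(template)                # illustrative detection threshold
hits = np.flatnonzero(fir_out > threshold)     # samples that cross the threshold
peak = int(np.argmax(fir_out))                 # estimated start of the preamble
print(peak, hits)
```

Because the template is oversampled, samples adjacent to the true peak can also cross the threshold, so a practical detector would take the local maximum rather than the first crossing.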

Other approaches involve doing the correlation in the frequency domain using this relationship:
$$R_{xy} = \text{IFFT}(\text{FFT}(x)\text{FFT}^*(y))$$
This states that taking the inverse Fourier transform of the product of the Fourier transform of x and the complex conjugate of the Fourier transform of y gives the circular correlation of x and y.
Further, a very important takeaway is the sensitivity of any of these correlations to frequency offset: the correlation magnitude falls off as a sinc function of the offset, with the first null at 1/T, where T is the duration of the correlation.
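The relationship is easy to confirm against a direct circular correlation. This sketch assumes a random ±1 code of length 64 and a circular delay of 10 samples, both arbitrary.

```python
import numpy as np

rng = np.random.default_rng(2)
N = 64
y = rng.choice([-1.0, 1.0], size=N)                  # stand-in code
shift = 10                                           # assumed circular delay
x = np.roll(y, shift) + rng.normal(0.0, 0.2, N)      # delayed code plus noise

# Frequency-domain circular correlation
r_fft = np.fft.ifft(np.fft.fft(x) * np.conj(np.fft.fft(y))).real

# Direct circular correlation for comparison: r[k] = sum_n x[n + k] * y[n]
r_direct = np.array([np.dot(np.roll(x, -k), y) for k in range(N)])

print(np.allclose(r_fft, r_direct), int(np.argmax(r_fft)))
```

With this ordering of the conjugate, the peak index equals the delay of x relative to y. Note that the result is a circular correlation, so for a linear correlation the inputs would need zero padding.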
Please see this post directly where I detail that: GPS CA Signal Acquisition
This sets the frequency capture range for the length of code used: the offset must be well inside this main lobe (I typically use 1/2 the null-to-null frequency range, but if the SNR is poor you may need to be more constrained). Typical acquisition approaches either slowly ramp the frequency offset or step it during correlation (the frequency is offset using an IQ rotator on the received signal prior to correlation). Stepping involves setting the frequency to an initial guess, correlating and, if the acquisition threshold is not exceeded, stepping the frequency by 1/T (if using 1/2 the null-to-null frequency range) and repeating. If ramping, the ramp must be slow enough that the frequency changes by much less than 1/T over one correlation, so that it does not induce an additional frequency offset loss.
The frequency offset is accurately estimated by comparing two successive correlations. Since frequency is the change in phase versus the change in time ($d\phi/dt$), comparing the phase and time between two successive correlations gives the frequency offset, assuming we are within the frequency range of acquisition detailed above. (Importantly, these are complex correlations with two real outputs, I and Q, so that the carrier phase can be determined.)
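The sinc-shaped loss and a stepped search can be sketched as follows. The sample rate, code length, true offset, search span of ±5/T and the 0.6 threshold are assumptions for illustration, and the signal is noise-free; `corr_mag` is a hypothetical helper.

```python
import numpy as np

fs = 1.0e6                       # sample rate in Hz (assumed)
N = 1000                         # samples per correlation
T = N / fs                       # correlation duration: 1 ms, first null at 1 kHz
n = np.arange(N)
rng = np.random.default_rng(3)
code = rng.choice([-1.0, 1.0], size=N)     # stand-in +/-1 code

def corr_mag(f_offset, f_rot=0.0):
    """Normalized correlation magnitude for a signal offset by f_offset,
    after an IQ rotator of f_rot has been applied."""
    rx = code * np.exp(2j * np.pi * f_offset * n / fs)
    rotated = rx * np.exp(-2j * np.pi * f_rot * n / fs)    # IQ rotator
    return abs(np.sum(rotated * code)) / N

# Loss at offsets of 0, 0.5/T and 1/T follows |sinc(f*T)|
loss = {k: corr_mag(k / T) for k in (0.0, 0.5, 1.0)}

# Stepped search: rotate by each trial frequency until the threshold is exceeded
f_true = 2300.0                  # hypothetical true offset in Hz
threshold = 0.6                  # just under |sinc(0.5)|, the worst case for 1/T steps
step = 1.0 / T
found = None
for k in range(-5, 6):
    f_try = k * step
    if corr_mag(f_true, f_try) > threshold:
        found = f_try
        break
print(loss, found)
```

With steps of 1/T the residual offset after the best step is at most 1/(2T), where the loss is about 0.64 (roughly 3.9 dB), which is why a poor SNR may call for finer steps.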
Note that the frequency computation detailed at this link, Obtaining phase and magnitude of 2 spectral components after ADC, using the computation
$$K = I[n]Q[n-1] - I[n-1]Q[n]$$
where K is proportional to frequency offset, applies equally to the successive I and Q correlation outputs, referred to as I[n], Q[n] and I[n-1], Q[n-1].
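Both the phase-difference estimate and the cross-product K can be sketched on two back-to-back correlations. The sample rate, code length and 120 Hz offset are assumed values, the signal is noise-free, and the complex outputs are taken as I + jQ.

```python
import numpy as np

fs = 1.0e6                        # sample rate in Hz (assumed)
N = 1000                          # samples per correlation
T = N / fs                        # time between successive correlations
f_off = 120.0                     # hypothetical residual offset, |f_off| < 1/(2T)
rng = np.random.default_rng(4)
code = rng.choice([-1.0, 1.0], size=N)     # stand-in +/-1 code

n = np.arange(2 * N)
rx = np.tile(code, 2) * np.exp(2j * np.pi * f_off * n / fs)   # two code periods

c_prev = np.sum(rx[:N] * code)    # I[n-1] + j*Q[n-1]
c_curr = np.sum(rx[N:] * code)    # I[n]   + j*Q[n]

# Phase change between the two correlations divided by the elapsed time
dphi = np.angle(c_curr * np.conj(c_prev))
f_est = dphi / (2 * np.pi * T)

# Cross-product form from the text
K = c_curr.real * c_prev.imag - c_prev.real * c_curr.imag
print(f_est, K)
```

With the I + jQ convention assumed here, K equals the negative of the imaginary part of `c_curr * conj(c_prev)`, so it scales with the sine of the phase change and is proportional to the offset only for small phase changes. The phase-difference estimate is unambiguous only while the phase change stays within ±π, that is for offsets below 1/(2T).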
Much more practical with modern processing and shorter preamble sequences would be a one-shot frequency offset and delay estimation, which I have detailed here:
GPS signal acquisition
That said, a robust preamble solution could involve sending multiple copies of one code to aid initial carrier and clock acquisition (and channel estimation!), followed by a final negated copy of the code sequence to mark the beginning of the data. (Just invert the code and the correlation will be shifted 180°.) The trade space is preamble overhead versus acquisition sensitivity. It is typical to require more SNR during acquisition, especially if a fast acquisition is needed, hence the significance of correlation and processing gain.
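A preamble of this shape can be sketched as below, assuming four positive copies of a Barker-13 code, one sample per chip, a short noise-only gap before the preamble, and a 60% threshold; all of these are illustrative choices rather than a recommended design.

```python
import numpy as np

barker13 = np.array([1, 1, 1, 1, 1, -1, -1, 1, 1, -1, 1, -1, 1], dtype=float)
L = len(barker13)
reps = 4                                            # assumed number of positive copies
preamble = np.concatenate([np.tile(barker13, reps), -barker13])

rng = np.random.default_rng(5)
data = rng.choice([-1.0, 1.0], size=50)             # stand-in payload
lead = 20                                           # hypothetical noise-only gap
tx = np.concatenate([np.zeros(lead), preamble, data])
rx = tx + rng.normal(0.0, 0.2, size=len(tx))

# Sliding correlation: corr[k] = sum_j rx[k + j] * barker13[j]
corr = np.correlate(rx, barker13, mode="valid")

threshold = 0.6 * L                                 # illustrative threshold
neg_peak = int(np.flatnonzero(corr < -threshold)[0])    # first inverted-code peak
pos_hits = np.flatnonzero(corr > threshold)
pos_hits = pos_hits[pos_hits < neg_peak]            # positive copies before the marker
data_start = neg_peak + L                           # data begins after the inverted copy
print(pos_hits, neg_peak, data_start)
```

The positive peaks arrive one code length apart and could feed carrier and timing estimates, while the first large negative peak marks the final copy, so the data starts one code length after it.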