I first clear up some possible points of confusion about wrapping, then state the important assumptions, since frequency and phase estimation has many potential pitfalls. Under those assumptions I will provide an optimum maximum-likelihood strategy for estimating the frequency and phase noise.
The OP has specifically stated that the received samples were wrapped at the 0 and $2\pi$ boundaries. This differs from the wrapping assumed by `unwrap` in the common tools (Python's NumPy, MATLAB, Octave), where the default boundary is $-\pi$ to $\pi$ (and can be set to any other value). Changing the boundary is identical to changing the assumed phase offset $\theta_0$. Regardless of where the wrapping is set to occur, the phase offset term $\theta_0$ has no effect on the amount of wrapping done in the measurement, other than trivially for the first sample in the equation given by the OP: wrapping is done on the difference between successive samples, and any such offset is removed by the difference operation. In general, consider arbitrary phase samples as a phase changing versus sample $n$ given as $\phi[n]$, and a phase offset given as $\theta_0$:
$$\theta[n] = \phi[n] + \theta_0$$
$$\theta[n]-\theta[n-1] = \phi[n]-\phi[n-1]$$
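As a quick sanity check of the two equations above, here is a minimal NumPy sketch. The test phase, the offset value, and the helper names `wrap_0_2pi` and `wrap_pm_pi` are made up for illustration and are not from the OP. It wraps the same phase at the two different boundaries, and with an added offset, and confirms that the sample-to-sample differences after `np.unwrap` are identical in every case.

```python
import numpy as np

rng = np.random.default_rng(0)
n = np.arange(500)
# Hypothetical test phase: slow ramp plus small noise (radians).
# Successive differences stay well below pi, so unwrapping is unambiguous.
phi = 0.3 * n + 0.05 * rng.standard_normal(n.size)

def wrap_0_2pi(x):
    """Wrap to [0, 2*pi), the boundary the OP describes."""
    return np.mod(x, 2 * np.pi)

def wrap_pm_pi(x):
    """Wrap to [-pi, pi), the default convention of the common tools."""
    return np.mod(x + np.pi, 2 * np.pi) - np.pi

theta0 = 1.234  # arbitrary phase offset (assumed value)

u_a = np.unwrap(wrap_0_2pi(phi))           # wrapped at 0 / 2*pi
u_b = np.unwrap(wrap_pm_pi(phi))           # wrapped at -pi / pi
u_c = np.unwrap(wrap_0_2pi(phi + theta0))  # offset added before wrapping

d_ref = np.diff(phi)
print(np.allclose(np.diff(u_a), d_ref),
      np.allclose(np.diff(u_b), d_ref),
      np.allclose(np.diff(u_c), d_ref))
```

The unwrapped sequences can differ from each other only by a constant, which is exactly the term that drops out of the difference.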
Given that the wrapping is done on phase differences, we also see that if $\omega$ is known or can easily be approximated, and that frequency was removed in the original phase measurement before the wrapping was done, there will be significantly fewer discontinuities. (Knowing this kind of measurement, I assume that was actually done: the OP's $\omega$ is a small frequency offset from the frequency in the original signal, and the $\delta \omega$ term is a residual static frequency error to be determined.)
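To illustrate the point about removing the known frequency before wrapping, here is a small sketch that counts the $0/2\pi$ discontinuities with and without that removal. All numbers (`omega_hat`, `delta_omega`, `theta0`, the noise level) are assumed values chosen for the demonstration, and the helper `count_wraps` is hypothetical. The noise is plain white Gaussian here, purely as a placeholder.

```python
import numpy as np

rng = np.random.default_rng(1)
N = 2000
n = np.arange(N)

omega_hat = 0.9      # known / approximated frequency, rad/sample (assumed)
delta_omega = 0.002  # residual static frequency error, rad/sample (assumed)
theta0 = 1.0         # phase offset, keeps the residual away from 0 (assumed)
phi = 0.05 * rng.standard_normal(N)  # small-angle noise placeholder

theta = (omega_hat + delta_omega) * n + phi + theta0

def count_wraps(wrapped):
    """Count jumps larger than pi between successive wrapped samples."""
    return int(np.sum(np.abs(np.diff(wrapped)) > np.pi))

# Wrapped directly: the full frequency drives the phase through 2*pi often.
wraps_raw = count_wraps(np.mod(theta, 2 * np.pi))

# Known frequency removed BEFORE wrapping: only the residual ramp remains.
wraps_derot = count_wraps(np.mod(theta - omega_hat * n, 2 * np.pi))

print("wraps without removal:", wraps_raw)
print("wraps with removal:   ", wraps_derot)
```

With these values the residual phase stays inside one $2\pi$ interval for the whole capture, so no discontinuities remain after the removal, while the raw phase wraps a few hundred times.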
I am also making the following simplifying assumptions which, if not valid for the OP, would modify the maximum-likelihood estimation strategy that I will suggest:
The samples are uniformly spaced in time by $T = 1/f_s$ where $f_s$ is the sampling rate.
$A(t)$ has been properly removed prior to phase measurement and has had no effect on the measured results.
$\phi(t)$ is "small angle" phase noise; there is not a deterministic phase modulation that is being determined, and once the linear phase slope (static frequency) is removed the rms phase error is $\phi_{rms} < 1$ radian.
The capture duration is sufficiently short or the signal generation and signal measurement systems are locked to a common reference. The most important point here is that we are not yet affected by frequency drift that will inevitably occur when the reference of time or frequency for the signal generation system is different than the reference for the signal measurement system.
The noise process is assumed to be "White FM": phase noise (the OP's $\phi(t)$) and frequency noise, given as $\omega(t)$, are related by $\omega(t) = d\phi(t)/dt$. Given this relationship, the OP's expression is written entirely in terms of phase noise with a static frequency, but that doesn't mean we do not have both frequency and phase noise (they are the same noise; it is just a matter of deciding which way we want to describe it). That said, the Power Spectral Density due to Frequency Fluctuations (commonly abbreviated $S_f(f)$ for one-sided spectrums, and $\mathscr{L}_f(f)$ for two-sided spectrums) will be constant if the noise process is White FM, and the Power Spectral Density due to Phase Fluctuations (commonly abbreviated $S_\phi(f)$ for one-sided spectrums, and $\mathscr{L}_\phi(f)$ for two-sided spectrums) will fall at -20 dB/decade. $\mathscr{L}_\phi(f)$ is most commonly used to represent spectrums of phase noise, so this whole paragraph reduces to: confirm that the phase noise spectrum falls with a slope of -20 dB/decade (or shallower). For longer captures with an unlocked source and measurement, the slope of the noise will inevitably get steeper (for example from -20 dB/decade to -30 dB/decade to -40 dB/decade as the noise process migrates from white FM noise, to flicker frequency noise, to random walk frequency noise), which in simpler terms means our estimation techniques will diverge. If this turns out to be the case, our estimates would improve if we shortened the capture duration so that the White FM assumption does apply.
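One way to check the -20 dB/decade claim numerically is sketched below, under the assumption that White FM can be modeled in discrete time as phase equal to the running sum of white Gaussian frequency noise. The run count, record length, Hann window, and fitted frequency range are arbitrary choices of mine. The sketch averages windowed periodograms of the simulated phase and fits a straight line on log-log axes.

```python
import numpy as np

rng = np.random.default_rng(2)
n_runs, L = 64, 4096

# White FM model (assumption): frequency noise is white, and the phase
# is its running sum (discrete-time integral).
freq_noise = rng.standard_normal((n_runs, L))
phase = np.cumsum(freq_noise, axis=1)

# Averaged Hann-windowed periodogram of the phase. The scaling is left
# unnormalized because only the slope matters here.
phase = phase - phase.mean(axis=1, keepdims=True)
window = np.hanning(L)
psd = np.mean(np.abs(np.fft.rfft(phase * window, axis=1)) ** 2, axis=0)

# Fit a line on log-log axes over low offset frequencies, skipping the
# first few bins that sit next to DC.
k = np.arange(8, 200)
f = k / L  # normalized frequency, cycles/sample
slope, intercept = np.polyfit(np.log10(f), 10 * np.log10(psd[k]), 1)

print(f"fitted phase-noise slope: {slope:.1f} dB/decade")
```

Running the same fit on a real capture (after removing the linear phase slope) is a reasonable way to see whether the White FM assumption holds. A noticeably steeper slope suggests the capture is long enough for flicker or random-walk frequency noise to dominate.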
Also to avoid confusion and as motivated by this recent post, since I will be using both discrete and continuous time references to frequency, I will use $\Omega$ to refer to continuous time frequency as given in units of radians/sec ($\Omega = 2\pi F$ where $F$ is the frequency in Hz) and $\omega$ as discrete time "normalized" frequency as given in units of radians/sample, with $\omega = \Omega/f_s$. (And $\omega = 2\pi f$ where $f$ is normalized frequency in units of cycles/sample).
Thus we start with the OP's phase measurements as samples of the following phase versus time function:
$$\theta(t) = (\hat{\Omega}+\Delta{\Omega})t + \phi(t) + \theta_0 \tag{1}\label{1}$$
I will remove $\theta_0$ from consideration since it has no effect on phase wrapping as previously explained, as well as no effect on the estimate the OP seeks for static frequency offset $\Delta \omega$ and phase noise $\phi(t)$.
Thus when sampled at $t= nT$, we have the samples given as:
$$\theta(nT) = (\hat{\Omega}+\Delta{\Omega})nT + \phi(nT) \tag{2}\label{2}$$
Since the sampling rate is $f_s = 1/T$, the same samples expressed with normalized frequency (radians/sample) as a function of the sample index $n$ are:
$$\theta[n] = (\hat{\omega}+\Delta{\omega})n + \phi[n] \tag{3}\label{3}$$
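The step from the sampled continuous-time form to the normalized form can be verified numerically. The sketch below uses made-up values for the sampling rate and frequencies, with white Gaussian noise as a placeholder for $\phi[n]$. It builds the phase both ways, using $\omega = \Omega/f_s$, and checks that the results agree.

```python
import numpy as np

fs = 48_000.0  # hypothetical sampling rate, Hz
T = 1.0 / fs

Omega_hat = 2 * np.pi * 1000.0  # assumed known frequency, rad/s
d_Omega = 2 * np.pi * 0.5       # assumed static offset, rad/s

rng = np.random.default_rng(3)
n = np.arange(1024)
phi = 0.02 * rng.standard_normal(n.size)  # placeholder phase noise, rad

# Continuous-time form sampled at t = n*T.
theta_ct = (Omega_hat + d_Omega) * (n * T) + phi

# Normalized form, with omega = Omega / fs in radians/sample.
omega_hat = Omega_hat / fs
d_omega = d_Omega / fs
theta_dt = (omega_hat + d_omega) * n + phi

print(np.allclose(theta_ct, theta_dt))
```

Working in radians/sample keeps the later estimation independent of the actual sampling rate. An estimate of $\Delta\omega$ converts back to Hz as $\Delta\omega f_s/(2\pi)$.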
With that, my recommended algorithm to determine $\Delta \omega$, $\phi[n]$ is:
(OP has mentioned in the comments that there is a deterministic phase modulation involved, so this answer will likely be modified based on the details of that)