There is no allowance for a transition band between two regions: the passband of interest (5.5 kHz to 6.5 kHz, less a notch in the center) and the frequencies in the aliasing zones that must be rejected, given the translation frequencies chosen. This means the required anti-alias filters cannot be implemented. There are three ways out:

- Maintain a complex signal by implementing a full complex multiplier, since the aliasing occurs only in the final conversion to a real output.
- Increase the frequency of the second translation so that realizable filters have room for transition bands.
- Reduce the actual bandwidth of interest.
I suspect this is the primary effect. The signal quality is limited by the actual attenuation of the filters, given finite noise and signal in the image locations. There is also the possibility of amplitude and phase imbalance in the quadrature mixing. Imbalance between the I and Q paths would be more likely if any of the mixing were done in analog, and the OP appears to be doing all quadrature multiplication digitally. Still, if care is not taken, delay, phase or amplitude imbalances can creep in, and these will show up in the testing detailed further below.
The spectrum diagrams below show the processing if the filter rejection and quadrature balance were perfect. This is the operation the OP desires:

Consider the first translation as multiplying a real signal by a complex tone. The real signal is represented by the top spectrum, which is Hermitian symmetric: the same magnitude at positive and negative frequencies but opposite phase. The tone is $e^{-j2\pi 6000 t}$, which becomes $e^{-j2\pi6n/24}$ once sampled (implying a sampling rate of 24 kHz). This shifts the entire spectrum to the left, just as the OP has shown, and the result is then selected by the low pass and high pass filters. The performance of the low pass is of particular interest, as it sets the image rejection for the next translation.
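A minimal numerical sketch of this first translation follows. It assumes the 24 kHz sample rate implied by $e^{-j2\pi 6n/24}$ and a hypothetical real test tone at 6.25 kHz. `tone_amplitude` is an illustrative single-bin DFT helper, not part of the OP's design. The tone's two lines at ±6.25 kHz move to +0.25 kHz (kept by the low pass) and -12.25 kHz (to be rejected). Nothing lands at -0.25 kHz because the LO is an ideal complex exponential:

```python
import cmath
import math

FS = 24_000.0  # assumed sample rate: 6000 Hz / 24 kHz = 6/24 cycles per sample
N = 480        # 20 ms; every multiple of 50 Hz fits an integer number of cycles

def tone_amplitude(x, f, fs=FS):
    """Magnitude of the e^{j 2 pi f n / fs} component of x (single-bin DFT)."""
    acc = sum(xn * cmath.exp(-2j * math.pi * f * n / fs) for n, xn in enumerate(x))
    return abs(acc) / len(x)

# Hypothetical real test tone at 6.25 kHz: lines of amplitude 0.5 at +6.25 and -6.25 kHz.
real_in = [math.cos(2 * math.pi * 6250 * n / FS) for n in range(N)]

# First translation: multiply by e^{-j 2 pi 6n/24}, a -6 kHz complex LO.
mid = [x * cmath.exp(-2j * math.pi * 6 * n / 24) for n, x in enumerate(real_in)]

print(tone_amplitude(mid, 250))     # +6.25 kHz line, now at +0.25 kHz
print(tone_amplitude(mid, -250))    # ~0: no image with a perfect complex LO
print(tone_amplitude(mid, -12250))  # -6.25 kHz line, now at -12.25 kHz (aliases to +11.75 kHz)
```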
The next translation multiplies this complex result by another complex tone, $e^{j 2\pi n/48}$ once sampled (+0.5 kHz at 24 kHz), which shifts the entire spectrum to the right as shown. Here we can refer to the upper path in the OP's functional sketch as "real" (I) and the lower path as "imaginary" (Q), and treat the two path outputs of the filter stage as a complex signal $I + jQ$. We then effectively take the real part of this complex multiplication by not doing two of the four multiplications required for a full complex multiplication:
$$(I_1+jQ_1)(I_2+jQ_2) = I_1 I_2 - Q_1Q_2 + j(I_1 Q_2 + I_2Q_1)$$
The real part of the above ($I_1 I_2 - Q_1Q_2$) represents the processing actually done in the second translation stage. Here $I_1, Q_1$ are the signal path and $I_2, Q_2$ are the cosine and sine used for translation. The output of the full complex multiplication is centered at +0.5 kHz. When you take the real part, it will be centered at ±0.5 kHz.
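As a sanity check of this identity, the sketch below forms only the two products $I_1 I_2 - Q_1 Q_2$ and compares them with the real part of the full complex product. It assumes a 24 kHz rate and a hypothetical middle-stage line at +0.25 kHz; `second_stage_real` is an illustrative name. The output is the real tone $\cos(2\pi\,750\,n/24000)$, with lines at ±0.75 kHz:

```python
import cmath
import math

N = 480

def second_stage_real(i1, q1, n):
    """Only two of the four products: I1*I2 - Q1*Q2, with the LO e^{j 2 pi n/48}."""
    i2 = math.cos(2 * math.pi * n / 48)  # I2: cosine of the second LO
    q2 = math.sin(2 * math.pi * n / 48)  # Q2: sine of the second LO
    return i1 * i2 - q1 * q2

# Hypothetical complex middle-stage signal I1 + jQ1: one line at +0.25 kHz (fs = 24 kHz).
mid = [cmath.exp(2j * math.pi * 250 * n / 24_000) for n in range(N)]

full = [m * cmath.exp(2j * math.pi * n / 48) for n, m in enumerate(mid)]  # full complex product
half = [second_stage_real(m.real, m.imag, n) for n, m in enumerate(mid)]  # what is computed

max_err = max(abs(h - f.real) for h, f in zip(half, full))
print(max_err)  # floating-point round-off only
```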
Any leakage from the filter that produces signals at -1 kHz before the second stage will be centered at -0.5 kHz after the frequency translation. When you take the real part of that, it will also end up at ±0.5 kHz and appear as distortion in the result. The next graphic below depicts this. It shows how spectral content in the alias region from 4.5 kHz to 5.5 kHz will fold onto the desired signal at the output if it is not filtered out, either at the very front end or after the first multipliers:

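The fold-over can be reproduced with a short sketch. It assumes the 24 kHz rate and a hypothetical middle-stage signal: the wanted content at 0 Hz plus filter leakage at -1 kHz. The leakage level `LEAK = 0.01` (-40 dB) is chosen arbitrarily. After the +0.5 kHz translation, the full complex output keeps the two apart at +0.5 kHz and -0.5 kHz. The real output stacks both on the ±0.5 kHz lines:

```python
import cmath
import math

FS = 24_000.0  # assumed sample rate
N = 480

def tone_amplitude(x, f, fs=FS):
    """Magnitude of the e^{j 2 pi f n / fs} component of x (single-bin DFT)."""
    acc = sum(xn * cmath.exp(-2j * math.pi * f * n / fs) for n, xn in enumerate(x))
    return abs(acc) / len(x)

LEAK = 0.01  # hypothetical residual left by the low pass at -1 kHz (-40 dB)

# Middle stage: wanted content at 0 Hz plus leakage at -1 kHz.
mid = [1.0 + LEAK * cmath.exp(-2j * math.pi * 1000 * n / FS) for n in range(N)]

full = [m * cmath.exp(2j * math.pi * n / 48) for n, m in enumerate(mid)]  # +0.5 kHz shift
real_out = [f.real for f in full]                                         # real conversion

print(tone_amplitude(full, 500), tone_amplitude(full, -500))  # separated: ~1.0 and ~0.01
print(tone_amplitude(real_out, 500))  # wanted 0.5 plus leakage 0.005 on the same line
```

Here the leakage happens to land in phase with the wanted line, so the +0.5 kHz line reads 0.505 instead of 0.5. In general it adds with an arbitrary phase, and either way it cannot be separated after the real conversion. Keeping `full` (the complex output) corresponds to the "maintain a complex signal" option.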
The issue here is that the OP wants to maintain the input spectrum from 5.5 kHz to 6.5 kHz (ignoring the center notch, which is of no consequence). The band that will alias in from the second frequency translation (once converted to real) is from 4.5 kHz to 5.5 kHz (at the real input). This leaves no transition between the band to reject and the band to pass, whether the filter is implemented at the front end or after the first frequency translation! Filter complexity and performance are dictated by how much bandwidth we allow for the transition from passband to stopband.
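The band arithmetic fits in a few lines. This sketch assumes the two shifts are exactly -6 kHz and +0.5 kHz, as in the OP's plan; `output_khz`, `image_of` and `transition_khz` are illustrative helpers. A positive input line at $f$ comes out at $|f - 6 + 0.5|$ kHz. The input that collides with it after the real conversion is therefore $f' = 2(6 - 0.5) - f$:

```python
F_LO1 = 6.0   # first translation (kHz), applied as a shift of -F_LO1
F_LO2 = 0.5   # second translation (kHz), applied as a shift of +F_LO2

def output_khz(f_in, f_lo1=F_LO1, f_lo2=F_LO2):
    """Output frequency of a positive input line after both shifts and the real conversion."""
    return abs(f_in - f_lo1 + f_lo2)

def image_of(f_in, f_lo1=F_LO1, f_lo2=F_LO2):
    """Input frequency that lands on the same output frequency as f_in."""
    return 2 * (f_lo1 - f_lo2) - f_in

def transition_khz(passband=(5.5, 6.5), f_lo1=F_LO1, f_lo2=F_LO2):
    """Gap between the top of the alias band and the bottom of the passband."""
    alias_top = max(image_of(f, f_lo1, f_lo2) for f in passband)
    return passband[0] - alias_top

print(sorted(image_of(f) for f in (5.5, 6.5)))  # alias band: 4.5 to 5.5 kHz
print(transition_khz())                          # 0.0: no room for a filter skirt
print(transition_khz(f_lo2=1.0))                 # a larger second shift opens a gap
```

With a hypothetical 1 kHz second shift, the output band moves to 0.5 kHz to 1.5 kHz. That opens a 1 kHz transition band, which is the "increase the second translation" option.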
Any quadrature amplitude and phase imbalance will also lead to such image distortion. This is less likely in an all-digital implementation. To show the effect, consider the spectrum plot below, which shows how quadrature imbalance in the first multiplier stage would appear and how distortion results:

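To put numbers on the imbalance effect, here is a sketch that models the first LO as $\cos\omega n - j g \sin(\omega n + \phi)$, with gain error $g$ and phase error $\phi$. This is an assumed, commonly used imbalance model, applied to a hypothetical 6.25 kHz tone at 24 kHz. The sketch compares the wanted line at +0.25 kHz with the image at -0.25 kHz, which lands at 0.25 kHz at the output. It then checks the result against the closed-form image rejection ratio $\frac{1 + 2g\cos\phi + g^2}{1 - 2g\cos\phi + g^2}$:

```python
import cmath
import math

FS = 24_000.0  # assumed sample rate
N = 480

def tone_amplitude(x, f, fs=FS):
    """Magnitude of the e^{j 2 pi f n / fs} component of x (single-bin DFT)."""
    acc = sum(xn * cmath.exp(-2j * math.pi * f * n / fs) for n, xn in enumerate(x))
    return abs(acc) / len(x)

def imbalanced_lo(n, gain, phase_rad, f_lo=6000.0):
    """cos(wn) - j*gain*sin(wn + phase); gain=1, phase=0 gives the ideal e^{-jwn}.
    This gain/phase error model is an illustrative assumption."""
    w = 2 * math.pi * f_lo * n / FS
    return complex(math.cos(w), -gain * math.sin(w + phase_rad))

def measured_irr_db(gain, phase_rad, f_in=6250.0):
    """Wanted (+0.25 kHz) vs image (-0.25 kHz) in the middle stage, in dB."""
    x = [math.cos(2 * math.pi * f_in * n / FS) for n in range(N)]
    mid = [xn * imbalanced_lo(n, gain, phase_rad) for n, xn in enumerate(x)]
    wanted = tone_amplitude(mid, f_in - 6000.0)
    image = tone_amplitude(mid, 6000.0 - f_in)  # ends up at 0.25 kHz at the output
    return 20 * math.log10(wanted / image)

def predicted_irr_db(gain, phase_rad):
    """Closed form: (1 + 2g cos(phi) + g^2) / (1 - 2g cos(phi) + g^2), in dB."""
    c = 2 * gain * math.cos(phase_rad)
    return 10 * math.log10((1 + c + gain ** 2) / (1 - c + gain ** 2))

g, phi = 1.01, math.radians(1.0)  # example values: 1 % gain error, 1 degree phase error
print(measured_irr_db(g, phi), predicted_irr_db(g, phi))
```

With 1 % gain error and 1 degree of phase error, both values come out at about 40 dB. An ideal digital quadrature LO ($g = 1$, $\phi = 0$) pushes the image down to round-off level.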
To test and validate against these effects (regardless of the final frequency plan and bandwidth used), consider using a single tone at the frequency of interest and measuring the “conversion gain” from input to output. Then put this tone at the image frequency locations and measure the image suppression.

For example, to check quadrature balance around the -6 kHz translation, try a tone at 6.25 kHz. Compare how much appears at 0.75 kHz in the output, as it should, with how much appears at 0.25 kHz. The 0.25 kHz content is due to quadrature amplitude and phase imbalance in either of the two multiplier stages. (I suspect this should be quite good if done digitally, but it is worth confirming.)

Then put a tone at 4.75 kHz to test the filter rejection. This tone ends up at -1.25 kHz in the middle stage, where the low pass filter rejects it to some degree. Whatever is not rejected shifts to -0.75 kHz after the second-stage multiplier, and when you take the real part it also appears at ±0.75 kHz. (From this we also see that, as currently done, the appropriate filtering as we approach the 5.5 kHz edge is not feasible.)
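The single-tone procedure can be simulated end to end before touching hardware. This sketch makes several assumptions: a 24 kHz sample rate, ideal digital LOs, and a hypothetical 257-tap Hamming-windowed-sinc low pass with a 0.5 kHz cutoff standing in for the OP's middle-stage filter. Names such as `receiver` and `lowpass_taps` are illustrative. The sketch measures four things:

- the conversion gain with a 6.25 kHz tone
- the quadrature image at 0.25 kHz
- the rejection of a 4.75 kHz tone at 0.75 kHz
- a 5.45 kHz tone against a 5.55 kHz tone (both come out at 0.05 kHz), to show the edge problem

```python
import cmath
import math

FS = 24_000.0   # assumed sample rate
TAPS = 257      # hypothetical middle-stage low pass length (not the OP's actual filter)
CUTOFF = 500.0  # Hz: edge of the +/-0.5 kHz middle-stage band
M = 960         # measurement length: 40 ms, so multiples of 25 Hz are bin-centered

def lowpass_taps(num_taps=TAPS, cutoff=CUTOFF, fs=FS):
    """Hamming-windowed sinc low pass normalized to unity gain at DC."""
    center = (num_taps - 1) / 2
    h = []
    for k in range(num_taps):
        t = k - center
        ideal = 2 * cutoff / fs if t == 0 else math.sin(2 * math.pi * cutoff * t / fs) / (math.pi * t)
        h.append(ideal * (0.54 - 0.46 * math.cos(2 * math.pi * k / (num_taps - 1))))
    dc = sum(h)
    return [c / dc for c in h]

H = lowpass_taps()

def tone_amplitude(x, f, fs=FS):
    """Magnitude of the e^{j 2 pi f n / fs} component of x (single-bin DFT)."""
    acc = sum(xn * cmath.exp(-2j * math.pi * f * n / fs) for n, xn in enumerate(x))
    return abs(acc) / len(x)

def receiver(f_in):
    """Real tone -> x e^{-j2pi 6n/24} -> low pass -> Re{x e^{j2pi n/48}}, steady state only."""
    total = M + len(H) - 1
    x = [math.cos(2 * math.pi * f_in * n / FS) for n in range(total)]
    mid = [xn * cmath.exp(-2j * math.pi * 6 * n / 24) for n, xn in enumerate(x)]
    out = []
    for n in range(len(H) - 1, total):  # skip the filter start-up transient
        y = sum(H[k] * mid[n - k] for k in range(len(H)))
        out.append((y * cmath.exp(2j * math.pi * n / 48)).real)
    return out

def db(ratio):
    return 20 * math.log10(max(ratio, 1e-300))

wanted = receiver(6250.0)  # should appear at 0.75 kHz
gain_db = db(tone_amplitude(wanted, 750) / 0.5)  # the +6.25 kHz input line has amplitude 0.5
quad_image_db = db(tone_amplitude(wanted, 250) / tone_amplitude(wanted, 750))

alias = receiver(4750.0)   # -1.25 kHz in the middle stage, 0.75 kHz at the output
filter_rej_db = db(tone_amplitude(alias, 750) / tone_amplitude(wanted, 750))

near_edge_db = db(tone_amplitude(receiver(5450.0), 50) / tone_amplitude(receiver(5550.0), 50))

print(f"conversion gain          {gain_db:8.2f} dB")
print(f"quadrature image 0.25k   {quad_image_db:8.2f} dB")
print(f"4.75 kHz alias rejection {filter_rej_db:8.2f} dB")
print(f"5.45 vs 5.55 kHz         {near_edge_db:8.2f} dB")
```

With these unity-amplitude LOs, expect a conversion gain near -6 dB, since the real conversion splits the complex line between ±0.75 kHz. The quadrature image should sit at round-off level and the 4.75 kHz tone should be well rejected. The separation between 5.45 kHz and 5.55 kHz, however, comes to well under 20 dB: with no transition band, a tone just outside 5.5 kHz passes nearly as well as one just inside.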
The best place to do decimation is after the filters and before translating to a higher frequency, although depending on the decimation ratios and approach it may not make any difference. To decimate properly, identify all the aliasing frequency bands for the given sampling rate and decimation ratio, and ensure there is sufficient rejection in those zones specifically. See this post for further details on aliasing bands with decimation.
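A small helper can list those zones. This sketch assumes the complex middle-stage signal occupies -0.5 kHz to +0.5 kHz at 24 kHz and uses a hypothetical decimation ratio of 8; `decimation_alias_bands` is an illustrative name. For a complex signal, every band offset by a nonzero multiple of the output rate $f_s/D$ folds onto the passband:

```python
import math

def decimation_alias_bands(fs, decim, band_lo, band_hi):
    """Bands of a complex signal sampled at fs (Hz, band centers wrapped to [-fs/2, fs/2))
    that fold onto [band_lo, band_hi] when only every `decim`-th sample is kept."""
    fs_out = fs / decim
    bands = []
    for k in range(1, decim):
        lo, hi = band_lo + k * fs_out, band_hi + k * fs_out
        wrap = math.floor((lo + hi) / (2 * fs) + 0.5) * fs  # bring the band center near 0 Hz
        bands.append((lo - wrap, hi - wrap))
    return sorted(bands)

for lo, hi in decimation_alias_bands(24_000.0, 8, -500.0, 500.0):
    print(f"{lo / 1000:+6.1f} kHz to {hi / 1000:+6.1f} kHz")
```

With these assumptions the nearest zones start at ±2.5 kHz, which sets where the decimation filter's stopband must begin.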
If the operations in the above spectrum plots are confusing, this post provides further detailed insight into the frequency translation of complex signals.