For tones, the peak value is simply the DFT result divided by the total number of signal samples used in the DFT. In
the case of a unit-amplitude sinusoid, we would expect each tone's bin to be at -6 dB, absent the effects of leakage and scalloping, given the relationship:
$$\cos(\omega t) = \frac{1}{2}e^{j\omega t} + \frac{1}{2}e^{-j\omega t}$$
Each tone in dB would be given as $20\log_{10}(0.5) = -6$ dB, consistent with the coefficient of each term above. The OP has scaled the sinusoid by $\sqrt{2}$, which would then increase each tone to $20\log_{10}(\sqrt{2}/2) = -3$ dB.
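A minimal numerical check of this (the sampling rate, DFT size, and bin index below are illustrative choices, not necessarily the OP's), placing the tone exactly on a bin so there is no leakage or scalloping:

import numpy as np

fs = 100e3                       # sampling rate in Hz (illustrative)
N = 4096                         # DFT size (illustrative)
k = 100                          # place the tone exactly on bin k
t = np.arange(N) / fs
x = np.cos(2 * np.pi * (k * fs / N) * t)

peak = abs(np.fft.fft(x)).max() / N        # DFT peak divided by N
print(20 * np.log10(peak))                 # -6.02 dB: the 1/2 coefficient
print(20 * np.log10(np.sqrt(2) * peak))    # -3.01 dB with the OP's sqrt(2) scaling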
We would only divide by the sampling rate to get the power spectral density, which is very different from the total power in each tone (it is power per unit Hz of bandwidth, as measured by the resolution bandwidth of the DFT). In this case, without further windowing, the resolution bandwidth of each bin in the DFT is $f_s/N$, where $f_s$ is the sampling rate and $N$ is the total number of signal samples in the DFT. Here the resolution bandwidth is $100\times 10^3/4096 \approx 24.4$ Hz. So if the power in each bin were $-3$ dB, then as a power spectral density this should be close to $-3 - 10\log_{10}(24.4) \approx -16.9$ dB/Hz.
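The same arithmetic in code:

import numpy as np

rbw = 100e3 / 4096               # resolution bandwidth: fs / N
print(rbw)                       # 24.4 Hz
print(-3 - 10 * np.log10(rbw))   # -16.9 dB/Hz expected PSD peak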
When I run the OP's code the result I get is -19.3 dB/Hz. The difference is attributed to scalloping loss, which can be up to 3.92 dB for a single tone in a rectangular-windowed DFT. (Scalloping is due to the signal not spanning an integer number of cycles in the DFT frame; using better windows is the generally preferred solution.)
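The worst case is easy to reproduce by placing a tone exactly halfway between two bins of a rectangular-windowed DFT (sizes here are again illustrative):

import numpy as np

fs = 100e3
N = 4096
t = np.arange(N) / fs
f0 = (100 + 0.5) * fs / N        # half a bin off the grid: worst-case scalloping
x = np.sqrt(2) * np.cos(2 * np.pi * f0 * t)

X_norm = 20 * np.log10(abs(np.fft.fft(x)) / N)
print(X_norm.max())              # about -6.93 dB = -3.01 dB - 3.92 dB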
To estimate the tone's peak, we would simply scale by the actual number of signal samples used to compute the FFT, here:
X_norm = 20*np.log10(abs(X) / fft_size)   # tone peak in dB
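(As an aside, the bin index of that peak maps back to frequency, with X and fft_size as above:)

peak_bin = np.argmax(abs(X[:fft_size // 2]))   # search the positive-frequency half
f_peak = peak_bin * fs / fft_size              # convert bin index to Hz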
With the OP's data this results in -5.4 dB vs. the theoretical -3.01 dB; the difference is again attributed to scalloping loss. To demonstrate: in the OP's case the tone completes an integer number of cycles over the complete sequence, so if the FFT is instead done on the complete sequence we can precisely predict the tone peak:
X = np.fft.fft(x)                       # FFT over the complete sequence
X_norm = 20*np.log10(abs(X) / len(t))   # normalize by the full record length
The max of X_norm is then -3.01 dB, exactly as predicted.
And the power spectral density, which is the normalization the OP used, would be:
X_norm = 10*np.log10(abs(X)**2 / (fs * len(t)))   # note: normalize by the full record length, not fft_size
The max of X_norm is then -13.01 dB/Hz.
This is consistent with the resolution bandwidth given by the 10,000 samples when all are used in the FFT: $f_s/N = 100{,}000/10{,}000 = 10$ Hz/bin, and $-3.01 - 10\log_{10}(10) = -13.01$ dB/Hz.
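Putting both normalizations together on the full record (the 1 kHz tone is an illustrative choice; any multiple of the 10 Hz bin spacing completes an integer number of cycles in the 10,000 samples):

import numpy as np

fs = 100e3
N = 10000                                      # the complete sequence
t = np.arange(N) / fs
x = np.sqrt(2) * np.cos(2 * np.pi * 1e3 * t)   # on-grid tone: 1 kHz = bin 100

peak = abs(np.fft.fft(x)).max()
print(20 * np.log10(peak / N))                 # -3.01 dB tone peak
print(10 * np.log10(peak**2 / (fs * N)))       # -13.01 dB/Hz PSD peak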
Note that for proper estimation of spectral tones and distributed noise in the DFT, please see the further details I have provided in this referenced post.