If you decided to represent a time domain sample as an infinitesimal width sinc (first zero crossings at $\pm\frac{\pi}{T}$, with $T\to\infty$) at a fixed height $A$ centred at $t=0$, instead of a Dirac delta $A\delta(t)$, then the amplitude would have to be normalised (decreased) by a factor of $\frac{1}{T}$ as $T\to\infty$ to keep the height fixed at $A$.
$$\lim_{T \to \infty} \lim_{x \to 0} \frac{A\sin(Tx)}{x} = \infty$$
$$\lim_{T \to \infty} \lim_{x \to 0} \frac{A\sin(Tx)}{Tx} = A$$
In the first case, the frequency domain is a centred rect window of width $2T$, which stretches to infinity as $T\to\infty$, at an amplitude of $\frac{2\pi A}{2} = \pi A$. In the second case the amplitude is $\frac{2\pi A}{2T} = \frac{\pi A}{T}$, which becomes infinitesimally small and is therefore useless.
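As a rough numerical check of the two cases, the sketch below evaluates the peak height of each sinc and its area (which equals the spectrum value at $\omega = 0$) for growing $T$. The value `A = 1.5`, the integration range `L` and the step `dx` are arbitrary choices for illustration, not part of the argument above.

```python
import numpy as np

A = 1.5  # hypothetical sample height, chosen only for illustration

def peak_and_area(T, normalised, L=2000.0, dx=0.01):
    """Return (peak height, area) of A*sin(Tx)/x, or A*sin(Tx)/(Tx) if normalised."""
    peak = A if normalised else A * T          # value of the sinc as x -> 0
    x = np.arange(-L, L + dx, dx)
    # np.sinc(u) = sin(pi*u)/(pi*u), so np.sinc(T*x/pi) = sin(T*x)/(T*x)
    y = peak * np.sinc(T * x / np.pi)
    area = np.sum(y) * dx                      # Riemann sum over a finite range
    return peak, area

for T in (1.0, 10.0, 100.0):
    print(T, peak_and_area(T, False), peak_and_area(T, True))
```

The unnormalised peak grows as $AT$ while its area stays near $\pi A$; the normalised peak stays at $A$ while its area shrinks as $\frac{\pi A}{T}$.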
If you decided to represent the time domain sample as an infinitesimal width rect window centred at $t=0$ of width $2T$ as $T\to0$, the same thing would happen. As you reduce the width of the window, the amplitude in the frequency domain goes to 0, so you'd have to normalise (increase) the height of the rect window with $\frac{1}{T}$, meaning the rect window becomes infinitely tall. The frequency domain sinc would have an amplitude of $2TA$ without normalisation, which is once again infinitesimally small, and $\frac{2TA}{T} = 2A $ with normalisation.
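The same comparison can be sketched for the rect window. The helper `rect_spectrum` is a hypothetical name; it integrates the window numerically with the trapezoidal rule, and `A = 1.5` is again an arbitrary value.

```python
import numpy as np

A = 1.5  # hypothetical sample height

def rect_spectrum(omega, T, height, n=20001):
    """Fourier integral of a centred rect of half-width T via the trapezoidal rule."""
    t = np.linspace(-T, T, n)
    dt = t[1] - t[0]
    vals = height * np.exp(-1j * omega * t)
    return (np.sum(vals) - 0.5 * (vals[0] + vals[-1])) * dt

for T in (1.0, 0.1, 0.01):
    plain = rect_spectrum(0.0, T, A).real           # 2*T*A, shrinks with T
    normalised = rect_spectrum(0.0, T, A / T).real  # stays at 2*A
    print(T, plain, normalised)
```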
The essence is that you need an infinite amplitude impulse in the time domain to get a convenient frequency domain value that doesn't contain $\frac{1}{\infty}$. It is therefore convenient to represent a sample using the Dirac delta function, which is essentially an infinitesimal width rectangular window whose height is the reciprocal of its width (so the area underneath it is 1, as height multiplied by width cancel). Because the area under a Dirac delta is a multiple of 1, rather than the multiple of $\pi$ that sincs have, it is closest to the normalised rect window suggestion, except that it is normalised by the full width $2T$ instead of half the width ($T$). You therefore get an amplitude of $A$ in the frequency domain for $A\delta(t)$, instead of the $2A$ that the normalised centred rect window of height $\frac{A}{T}$ gives, and instead of the $\frac{2\pi A}{2}$ that the unnormalised centred sinc gives. (Of course, you could also represent it as an infinitesimal width sinc, which has infinite height, divided by $\pi$, so that it always has an area of $A$ and gives an amplitude of $A$ in the frequency domain.) The Dirac delta is also used in the frequency domain to represent a frequency component that is of infinite length in the time domain, and here the $\pi$ factor is introduced because a unit sinc has an area of $\pi$.
Sampling is therefore multiplying the time domain by a Dirac comb. If you sample a rectangular window of width $2T$ and amplitude $A$ at $t=0$, and the other samples fall outside of the rectangular window, then the result in the time domain is a single $A\delta(t)$. The frequency domain of this sample, being a flat line at $A$, is indeed the same as the frequency domain of a Dirac comb multiplied by the rectangular window. The Fourier transform of a Dirac comb of sampling period $T_s$ is another Dirac comb, made of Dirac deltas of area $\frac{2\pi}{T_s}$, i.e. $\frac{2\pi}{T_s}\delta(x)$, at a spacing of $\frac{2\pi}{T_s}$. We use the known fact that the Fourier transform of the product of two functions (the rectangular window and the Dirac comb) is $\frac{1}{2\pi}$ times the convolution of their individual Fourier transforms. Therefore, you end up with a sinc of peak amplitude $2AT$ being convolved with Dirac deltas of area $\frac{2\pi}{T_s}$ and multiplied by $\frac{1}{2\pi}$, which gives sincs of height $\frac{2AT}{T_s}$ spaced at $\frac{2\pi}{T_s}$ intervals. When $2T=T_s$, the height is $A$ and they're spaced orthogonally at $\frac{\pi}{T}$ intervals, which produces a straight line at $A$, as there are an infinite number of sincs. A straight line at $A$ should be produced as long as $T_s > T$, otherwise there will be 3 samples instead of one sample. Here $T_s = 2T > T$, so you get a straight line with amplitude $A$ in the frequency domain.
This can be seen by plotting on Desmos:
$$\sum_{n=-450}^{450}\frac{2A}{T_s}\frac{\sin\left(T(x-n\frac{2\pi}{T_s})\right)}{x-n\frac{2\pi}{T_s}}$$
and comparing what happens when $T_s > T$ vs. $T_s \leq T$. The difference is the point where you go from one sample at $t=0$ to 3 samples at $t=-T_s$, $t=0$ and $t=T_s$ (which are $t=\pm T$ and $t=0$ when $T_s=T$), which of course adds a cos wave to the constant in the frequency domain. If the rect pulse you're sampling ran just from $t=0$ to $t=T$, then it would not be an even function and its frequency domain would be a sinc plus an imaginary cosc. When these are convolved at $T_s =T$ you get a sample at $t=0$ and one at $t=T$, which of course produces a complex exponential in the frequency domain added to the constant line produced by the sample at $t=0$.
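The same experiment can be sketched outside Desmos. The function name `sampled_rect_spectrum` and the test values are assumptions for illustration; the sum is truncated at $|n| \le 450$ as in the expression above, so the results are only approximate.

```python
import numpy as np

def sampled_rect_spectrum(x, A, T, Ts, N=450):
    """Truncated sum of shifted sincs, (2A/Ts) * sin(T u)/u with u = x - n*2*pi/Ts."""
    n = np.arange(-N, N + 1)
    u = np.asarray(x, dtype=float)[..., None] - n * 2 * np.pi / Ts
    # sin(T u)/u written as T*np.sinc(T u/pi) to avoid dividing by zero at u = 0
    return (2 * A / Ts) * np.sum(T * np.sinc(T * u / np.pi), axis=-1)

x = np.linspace(-3.0, 3.0, 7)
A, T = 1.0, 1.0

one_sample = sampled_rect_spectrum(x, A, T, Ts=1.5)     # Ts > T
three_samples = sampled_rect_spectrum(x, A, T, Ts=0.8)  # Ts < T

# Deviation from a flat line at A (one sample inside the window)
print(np.max(np.abs(one_sample - A)))
# Deviation from A*(1 + 2*cos(x*Ts)) (samples at -Ts, 0 and Ts inside the window)
print(np.max(np.abs(three_samples - A * (1 + 2 * np.cos(0.8 * x)))))
```

Both printed deviations should be small compared with $A$; they do not reach zero because the sum is truncated.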
A single Dirac delta $A\delta(t+c)$ is indeed an infinite complex sinusoid in the frequency domain, of amplitude $A$ and frequency $c$. To visualise this, take an infinitesimal width ($2T$ as $T\to0$) rect window, with a height that is the reciprocal of its width, centred at $t=-c$:
$$ A \int_{-\infty}^{\infty} \delta(t+c) e^{-i\omega t} dt $$
$$ \equiv A ~ \left(\lim_{T\to 0} \int_{-c-T}^{-c+T} \frac{1}{2T} e^{-i\omega t} dt\right)$$
$$ = \lim_{T\to 0} \frac{A}{2T}\left(\frac{\sin ((-c+T)\omega) - \sin ((-c-T)\omega)}{\omega} + \frac{i(\cos ((-c+T)\omega) - \cos ((-c-T)\omega))}{\omega} \right)$$
$$= Ae^{ic\omega} $$
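As a quick sanity check on the limit, the sketch below evaluates the bracketed sine and cosine expression for a small but finite $T$ and compares it with $Ae^{ic\omega}$. The function name and the values of `A`, `c` and `omega` are arbitrary assumptions.

```python
import numpy as np

def narrow_rect_transform(omega, A, c, T):
    """Closed-form transform of a rect of width 2T and height 1/(2T) centred at t = -c."""
    hi, lo = (-c + T) * omega, (-c - T) * omega
    real = (np.sin(hi) - np.sin(lo)) / omega
    imag = (np.cos(hi) - np.cos(lo)) / omega
    return A / (2 * T) * (real + 1j * imag)

A, c, omega = 2.0, 0.7, 3.0
target = A * np.exp(1j * c * omega)
for T in (1e-1, 1e-2, 1e-3, 1e-4):
    # The error shrinks as the window narrows
    print(T, abs(narrow_rect_transform(omega, A, c, T) - target))
```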
If you were to evaluate the inverse Fourier integral of $Ae^{ic\omega}$ without its $\frac{1}{2\pi}$ factor, you'd get $2\pi A\delta(t+c)$, because integrating a complex sinusoid over $|\omega| < T$ as $T\to\infty$ produces an infinitesimal width sinc of amplitude $2AT$, and therefore area $2\pi A$. I.e. the Dirac delta is now an infinitesimal width sinc instead of a rect window, and the $\frac{1}{2\pi}$ factor is what restores its area to $A$.
$$ A ~ \left(\lim_{T\to 0} \int_{-c-T}^{-c+T} \frac{1}{2T} e^{-i\omega t} dt\right) \equiv A ~ \left(\lim_{T\to \infty} \int_{-c-T}^{-c+T} \frac{1}{\pi} \frac{\sin(T(t+c))}{t+c} e^{-i\omega t} dt\right) $$
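The sinc that appears in the inverse direction can be checked numerically. This sketch integrates $Ae^{ic\omega}e^{i\omega t}$ over $|\omega| < T$ with no $\frac{1}{2\pi}$ factor and compares it with $2A\frac{\sin(T(t+c))}{t+c}$; the function name and the values of `A`, `c` and `T` are hypothetical choices.

```python
import numpy as np

def truncated_inverse(t, A, c, T, n=200001):
    """Integral of A*exp(i c w)*exp(i w t) over |w| < T, without the 1/(2 pi) factor."""
    w = np.linspace(-T, T, n)
    dw = w[1] - w[0]
    vals = A * np.exp(1j * w * (t + c))
    return (np.sum(vals) - 0.5 * (vals[0] + vals[-1])) * dw  # trapezoidal rule

A, c, T = 1.0, 0.5, 50.0
for t in (-0.5, -0.4, 0.0):
    numeric = truncated_inverse(t, A, c, T).real
    closed_form = 2 * A * T * np.sinc(T * (t + c) / np.pi)  # 2A*sin(T(t+c))/(t+c)
    print(t, numeric, closed_form)
```

At $t=-c$ the value is the peak $2AT$, which grows without bound as $T\to\infty$ while the area under the sinc stays at $2\pi A$.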
The $2\pi$ factor always appears because depending on the selected representation in the time domain, in the frequency domain you either have the multiplication of a signal with an infinite width rect window, or the multiplication of a signal with an infinite width sinc. When you inverse transform to the time domain again you either produce an infinitesimal width sinc with amplitude $2AT$ therefore area $2A\pi$, or you produce an infinitesimal width rect window with amplitude $\frac{1}{2T}2\pi A$ (coming from the area under the sinc in the frequency domain), and therefore area $2\pi A$ when you multiply by the width of $2T$.
The frequency domain sinc amplitude of the sampled centred rect window is always the full width of the window $2T$ multiplied by the time domain amplitude $A$ and divided by the sampling period $T_s$, which is $AN$: the amplitude multiplied by the number of samples $N = \frac{2T}{T_s}$. As you sample more frequently, i.e. $N\to\infty$, you get an infinite amplitude sinc in the frequency domain at the origin. To make the time domain continuous at amplitude $A$ and not $A\delta$, you convolve the Dirac deltas with unit height rect windows of width $T_s$ as $T_s\to0$. This results in a frequency domain multiplication with a sinc of height $T_s$ and infinite width, so the sinc in the frequency domain now has amplitude $\frac{2ATT_s}{T_s} = 2AT$, which is the amplitude multiplied by the window width. That's the same thing as not normalising the height of the rect window that represents the Dirac delta, so that it has amplitude $A$ rather than an infinite amplitude, while also making its width $T_s$ instead of infinitesimal. This is how a ZOH DAC would produce the samples in the analogue domain, and what the spectrum would look like.
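The scaling can be illustrated by counting samples. In this sketch the function name and the values of `A`, `T` and `Ts` are assumptions, and samples that land exactly on the window edges are counted, so the count is $\frac{2T}{T_s} + 1$ rather than exactly $\frac{2T}{T_s}$; the difference vanishes as $T_s \to 0$.

```python
import numpy as np

def dc_amplitudes(A, T, Ts):
    """Spectrum value at omega = 0 for the impulse-sampled window, before and after a ZOH."""
    k_max = int(np.floor(T / Ts))
    n_samples = 2 * k_max + 1       # samples with |k*Ts| <= T, edges included
    impulse_dc = A * n_samples      # each A*delta contributes A at omega = 0
    zoh_dc = impulse_dc * Ts        # the ZOH sinc has height Ts at omega = 0
    return n_samples, impulse_dc, zoh_dc

A, T = 1.0, 1.0
for Ts in (2.0 ** -2, 2.0 ** -6, 2.0 ** -10):
    # impulse_dc grows like A*N, while zoh_dc settles towards 2*A*T
    print(Ts, dc_amplitudes(A, T, Ts))
```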