I'm building a vocoder on an FPGA and am now trying to mix $N$ signals. Each one is the product of a carrier signal in one of $N$ bands and the envelope of the modulator in that same band (essentially a digital VCA). However, the bit widths involved are causing clipping. Specifically, both the carrier and envelope signals are 24-bit, $N = 16$, and I want to produce a single 24-bit mixed signal for my DAC.
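Written out, the mix at sample $n$ is given below. The names $c_k$ (carrier in band $k$), $e_k$ (modulator envelope in band $k$) and $y$ are introduced here only for notation, and the ranges assume signed two's-complement 24-bit samples:

$$
y[n] = \sum_{k=1}^{N} c_k[n]\,e_k[n], \qquad c_k[n],\ e_k[n] \in \left[-2^{23},\ 2^{23}-1\right]
$$

so in the worst case $\lvert c_k[n]\,e_k[n]\rvert \le 2^{46}$ and $\lvert y[n]\rvert \le N \cdot 2^{46} = 2^{50}$ for $N = 16$, which then has to be brought back into the 24-bit range for the DAC.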
I've naively multiplied each pair of signals (giving a 48-bit product) and divided each product by $2^{24}$ with a right shift to get $N$ 24-bit samples, which I then add together. This causes a lot of clipping, especially when the modulator has moderately high amplitude, but dividing by anything more than $2^{24}$ makes the output nearly inaudible. I understand that adding $N$ 24-bit samples requires at least $24 + \log_2 N$ bits (28 bits for $N = 16$) to guarantee no overflow, but dividing each product by more didn't really help. Is there something more sophisticated than dividing each channel and then adding that can prevent clipping? Are my carrier/envelope signals too sensitive?
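For reference, here is a rough bit-exact Python model of the naive pipeline described above. It is a sketch, not the actual HDL. It assumes signed 24-bit two's-complement samples and uses Python's arithmetic `>>` in place of the FPGA's signed shift. The function names (`vca`, `mix`, `saturate24`, `signed_width`) are hypothetical.

```python
# Hypothetical model of the naive fixed-point mix: multiply, shift right by 24, sum.
# Assumes signed 24-bit two's-complement samples, range [-2**23, 2**23 - 1].

N = 16
BITS = 24
MIN24 = -(1 << (BITS - 1))
MAX24 = (1 << (BITS - 1)) - 1


def vca(carrier: int, envelope: int) -> int:
    """Multiply two 24-bit samples (product fits in 48 bits), then arithmetic right-shift by 24."""
    return (carrier * envelope) >> BITS


def mix(carriers, envelopes) -> int:
    """Sum the N VCA outputs. Python ints never overflow, so this is the 'ideal' wide accumulator."""
    return sum(vca(c, e) for c, e in zip(carriers, envelopes))


def saturate24(x: int) -> int:
    """Clamp to the signed 24-bit range, i.e. what clipping at the DAC input does."""
    return max(MIN24, min(MAX24, x))


def signed_width(x: int) -> int:
    """Minimum two's-complement width (in bits) needed to represent x."""
    return (x.bit_length() if x >= 0 else (~x).bit_length()) + 1


# Worst case: every band at negative full scale in both carrier and envelope.
worst = mix([MIN24] * N, [MIN24] * N)
print(worst, signed_width(worst), saturate24(worst))  # 67108864 28 8388607
```

In this worst case each shifted product is $2^{22}$ (half of 24-bit full scale), so the sum of 16 of them is $2^{26}$. That needs 28 signed bits, consistent with the $24 + \log_2 N$ estimate, and it saturates at $2^{23} - 1$ when forced into 24 bits.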