Given the links in the question, the OP is interested in more detail on implementing a squaring algorithm for carrier recovery. Squaring is a popular analog option for recovering the carrier of BPSK and QPSK signals because of its simplicity, and it is fully applicable to digital implementations as well, but it is neither the only algorithm nor the most efficient. The following explains how the squaring approach works and where its limitations lie.
If the modulation is BPSK, then each symbol has a carrier phase of either 0 or 180 degrees relative to the unmodulated carrier (which the OP is trying to recover). A squaring loop strips the modulation by using the property:
$$\cos(\alpha)\cos(\beta) = \frac{1}{2}\cos(\alpha - \beta) + \frac{1}{2}\cos(\alpha + \beta)$$
With $\alpha = \beta$, observe what happens when we square a BPSK signal given as
$$f(t) = \cos(2\pi f t+\phi[m])$$
where $\phi[m]$ is either $0$ or $\pi$ for each symbol $m$:
$$f^2(t) = \frac{1}{2}\cos(0) + \frac{1}{2}\cos(4 \pi ft + 2\phi[m])$$
The first term is a DC offset and is ignored or filtered out. The second term is what yields a constant carrier signal: the frequency has doubled (squaring a tone doubles its frequency in general, which is how frequency doublers are implemented) and, importantly, the modulation has been completely removed, since doubling a signal's frequency also doubles its phase. The phase was switching between $0$ and $\pi$ (180 degrees) before being doubled, so afterwards it switches between $0$ and $2\pi$, which is the same phase as $0$. This is a common analog recovery scheme because a frequency doubler is simple to implement (using a mixer, or even an XOR gate), as is a frequency divider (a flip-flop) that divides the modulation-free tone by 2 to recover the coherent BPSK carrier. Pulse shaping will introduce other spectral artifacts, but they will all be sufficiently weaker than the strong recovered carrier that a PLL can be used to clean up the result and provide a clean, coherent recovered carrier for demodulation.
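As a quick numerical check of the squaring step, here is a minimal NumPy sketch. The sample rate, carrier frequency, symbol length, and the rectangular (unshaped), noise-free pulses are all assumptions chosen for illustration; they are not taken from the question.

```python
import numpy as np

# Illustrative parameters (assumptions, not from the question)
fs = 8000.0      # sample rate in Hz
fc = 1000.0      # carrier frequency in Hz
sps = 40         # samples per symbol
n_sym = 200      # number of symbols

# Random BPSK phase per symbol: 0 or pi, held for sps samples
rng = np.random.default_rng(0)
bits = rng.integers(0, 2, n_sym)
phi = np.pi * np.repeat(bits, sps)

t = np.arange(n_sym * sps) / fs
bpsk = np.cos(2 * np.pi * fc * t + phi)

# Squaring gives 1/2 + 1/2*cos(4*pi*fc*t + 2*phi); 2*phi is 0 or 2*pi
squared = bpsk ** 2
squared_ac = squared - squared.mean()   # drop the DC term

# Locate the strongest remaining spectral line
spectrum = np.abs(np.fft.rfft(squared_ac))
freqs = np.fft.rfftfreq(squared_ac.size, d=1.0 / fs)
peak_hz = freqs[np.argmax(spectrum)]
print(f"strongest tone after squaring: {peak_hz:.1f} Hz (2*fc = {2 * fc:.1f} Hz)")
```

With these idealized settings the squared signal is just the DC term plus an unmodulated tone at twice the carrier, regardless of the data. With pulse shaping and noise the line would sit on top of other spectral content rather than stand alone.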
The same approach can be used for QPSK, but in this case there are four phase states, so the doubling needs to occur twice (resulting in $f^4(t)$). Doubling once reduces the QPSK signal to a BPSK signal at double the carrier frequency, and doubling again strips the remaining BPSK modulation, leaving a recovered carrier at four times the original frequency. (Equivalently, raising any set of four quadrature phase states to the fourth power reduces them to a single phase state, so the modulation is stripped: for the phases $(0, \pi/2, \pi, 3\pi/2)$, each phase multiplied by 4 is a multiple of $2\pi$, which is a phase equal to 0.)
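The phase argument can be checked numerically. The sketch below first raises the four phase states (as unit phasors) to the second and fourth power, then verifies on a real QPSK signal that the $4f$ term of $f^4(t)$ carries no modulation, using the identity $\cos^4 x = \frac{3}{8} + \frac{1}{2}\cos 2x + \frac{1}{8}\cos 4x$. All signal parameters are assumptions for illustration, with rectangular pulses and no noise.

```python
import numpy as np

# The four QPSK phase states as unit phasors
phases = np.array([0.0, np.pi / 2, np.pi, 3 * np.pi / 2])
symbols = np.exp(1j * phases)

squared_states = symbols ** 2   # collapses to +1 or -1 (BPSK-like)
fourth_states = symbols ** 4    # collapses to +1 only
print(np.round(squared_states, 6))
print(np.round(fourth_states, 6))

# Real QPSK signal (illustrative parameters, assumptions)
fs, fc, sps, n_sym = 16000.0, 1000.0, 40, 200
rng = np.random.default_rng(1)
phi = np.repeat(rng.choice(phases, n_sym), sps)
t = np.arange(n_sym * sps) / fs
x = 2 * np.pi * fc * t + phi
qpsk = np.cos(x)

fourth_power = qpsk ** 4
# The 4x term has phase 4*phi, a multiple of 2*pi, so it is an
# unmodulated tone at 4*fc with amplitude 1/8
clean_tone = 0.125 * np.cos(8 * np.pi * fc * t)
# The 2x term still carries 2*phi (0 or pi), i.e. BPSK at 2*fc
residual = fourth_power - 0.375 - 0.5 * np.cos(2 * x) - clean_tone
print("max residual:", np.max(np.abs(residual)))
```

Note that the $2f$ term is still modulated and larger in amplitude than the $4f$ tone, so in practice the $4f$ line has to be isolated (by filtering or a loop) rather than assumed to dominate the whole spectrum.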
This approach could be done using an FFT, for example by first raising the QPSK signal to the fourth power and then using the FFT to find the strongest tone at approximately four times the original carrier frequency, but that is highly inefficient and not an approach I would recommend. (This basically breaks fred harris' rule "Don't copy the analog!": DSP gives us far more ability to build direct, optimized, highly efficient solutions that target the core problem, whereas analog implementations are often much more constrained by the physical components themselves.) Further, it immediately requires a sampling rate four times higher to handle the 4x increase in carrier frequency (the same applies to any given carrier offset). Further, once the strongest FFT bin is found, the actual frequency will inevitably lie at a smaller offset between bins, so finer resolution is required to maintain actual coherence. I suggest alternative carrier recovery approaches at the links below, but if it is necessary to pursue a squaring approach, it would typically be done in a loop in which a PLL cleans up the strongest single tone (at double the actual carrier for BPSK or four times the actual carrier for QPSK).
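To illustrate the bin-resolution limitation, here is a minimal sketch that takes an idealized, already-stripped tone at four times a hypothetical carrier frequency and estimates the carrier from the strongest FFT bin. The carrier value, sample rate, and FFT length are assumptions, and the tone is noise-free; this is not a recommended recovery method.

```python
import numpy as np

# Illustrative parameters (assumptions)
fs = 16000.0        # sample rate in Hz; must exceed 2 * (4 * fc)
n = 2000            # FFT length, giving fs / n = 8 Hz bin spacing
fc_true = 1001.3    # hypothetical true carrier in Hz

# Idealized modulation-free tone at 4 * fc (what the 4th power aims for)
t = np.arange(n) / fs
tone = np.cos(2 * np.pi * 4 * fc_true * t)

# Coarse estimate: strongest FFT bin, divided back down by 4
spectrum = np.abs(np.fft.rfft(tone))
freqs = np.fft.rfftfreq(n, d=1.0 / fs)
peak_hz = freqs[np.argmax(spectrum)]
fc_est = peak_hz / 4

bin_spacing = fs / n
error_hz = fc_est - fc_true
print(f"bin spacing: {bin_spacing:.1f} Hz")
print(f"carrier estimate: {fc_est:.2f} Hz, error: {error_hz:.2f} Hz")
```

The residual error is a fraction of a bin, but any nonzero frequency error is a continuously rotating phase, which is why a tracking loop (or some finer estimate) is still needed for coherent demodulation.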
For a suggested carrier recovery approach please also see these other links:
High modulation index PSK - carrier recovery
BPSK and QPSK demodulation computational complexity
Demodulation of 4-QAM