According to Oppenheim and Schafer's "Discrete-Time Signal Processing", the Goertzel algorithm is more efficient than the FFT for computing an N-point DFT if fewer than $\log_2 N$ DFT coefficients are needed. So if you need to compute 15 frequency points, the Goertzel algorithm will be more efficient only when $\log_2 N > 15$, which for a radix-2 FFT length means $N \geq 65536$. Anything below that and the FFT will be more efficient than the Goertzel algorithm, as Marcus has implied.
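For reference, here is a minimal Python sketch of the textbook Goertzel recurrence for a single bin, assuming real-valued input and an integer bin index `k`; the function name `goertzel` is hypothetical. The second-order feedback loop needs one real multiplication per sample, and one final complex step converts the filter state into the DFT value $X[k]$.

```python
import cmath
import math


def goertzel(x, k):
    """Return DFT bin X[k] of the real sequence x via the Goertzel recurrence."""
    n_total = len(x)
    w = 2.0 * math.pi * k / n_total
    coeff = 2.0 * math.cos(w)  # the only multiplier used inside the loop
    s_prev, s_prev2 = 0.0, 0.0
    for sample in x:
        # s[n] = x[n] + 2cos(w) s[n-1] - s[n-2]
        s = sample + coeff * s_prev - s_prev2
        s_prev2, s_prev = s_prev, s
    # X[k] = e^{jw} s[N-1] - s[N-2]
    return cmath.exp(1j * w) * s_prev - s_prev2
```

Calling `goertzel(x, k)` once per needed bin costs on the order of M·N operations for M bins, versus on the order of N log2 N for an FFT that produces all N bins.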
As you probably discovered, each point of the DFT requires N phase rotations and N complex additions (assuming a complex input). The DFT output is computed by multiplying your input signal by successive phase rotations and accumulating the result; in this manner you are correlating the input against each output frequency of interest.
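A minimal Python sketch of that correlation for one output bin, assuming the input is a list of complex samples; the name `dft_bin` is hypothetical. Each sample is rotated once by the phasor for its index (in practice those phasors would come from a precomputed table) and accumulated.

```python
import cmath
import math


def dft_bin(x, k):
    """Compute DFT output X[k] by correlating x with the bin-k phasor."""
    n_total = len(x)
    acc = 0j
    for n, sample in enumerate(x):
        phasor = cmath.exp(-2j * math.pi * k * n / n_total)  # rotation for sample n
        acc += sample * phasor  # one phase rotation plus one complex addition
    return acc
```

For a complex tone sitting exactly on bin 3 of a 16-point block, `dft_bin` returns 16 at `k=3` and essentially zero at every other bin.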
Each phase rotation (a complex multiplication) requires 4 real multiplications and 2 real additions. (If you have time to iterate but few multipliers available, see the CORDIC algorithm as an alternative phase rotator that requires no multipliers.) Rotations by $\pm j$ are simpler: a rotation by $j$ is done by changing the sign of Q and swapping I and Q, with no multiplications at all. For a large transform, however, these trivial rotations are an insignificant fraction of the total computation.
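A minimal Python sketch of the three rotators mentioned above, assuming samples are held as separate floating-point I and Q values; all function names are hypothetical. In the CORDIC version, multiplying by `2**-k` stands in for a hardware right shift, `math.atan` is computed on the fly where hardware would use a small lookup table, and the gain correction is applied at the end (hardware usually folds that constant in elsewhere). It assumes $|\theta| \le \pi$.

```python
import math


def rotate(i, q, c, s):
    """Rotate (i + jq) by the unit phasor (c + js).

    A full complex multiply: 4 real multiplications, 2 real additions.
    """
    return i * c - q * s, i * s + q * c


def rotate_by_j(i, q):
    """Multiply (i + jq) by j with no multiplications: (-q) + j(i)."""
    return -q, i


def cordic_rotate(i, q, theta, iterations=24):
    """Rotate (i + jq) by theta radians (|theta| <= pi) using CORDIC."""
    # CORDIC only converges for |theta| up to about 1.74 rad, so first apply
    # a free +/-j rotation to bring theta into [-pi/2, pi/2].
    if theta > math.pi / 2:
        i, q = rotate_by_j(i, q)
        theta -= math.pi / 2
    elif theta < -math.pi / 2:
        i, q = q, -i  # rotation by -j
        theta += math.pi / 2

    gain = 1.0
    for k in range(iterations):
        d = 1.0 if theta >= 0 else -1.0  # rotate toward the remaining angle
        t = 2.0 ** -k                    # a shift by k bits in hardware
        i, q = i - d * q * t, q + d * i * t
        theta -= d * math.atan(t)        # a lookup table in hardware
        gain *= math.sqrt(1.0 + t * t)   # each micro-rotation grows the magnitude
    # Remove the accumulated CORDIC gain (a precomputed constant in practice).
    return i / gain, q / gain
```

With 24 iterations the residual angle is below about $1.2 \times 10^{-7}$ rad, so the CORDIC result agrees with a full complex multiply to roughly single-precision accuracy.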

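Therefore, each DFT output point requires approximately $4N$ real multiplications and $4N$ real additions ($N$ rotations at 4 multiplications and 2 additions each, plus $N$ complex additions at 2 real additions each). So for a complete N-point DFT, that is about $4N^2$ of each. Wow!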
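In comparison, a radix-2 FFT requires $(N/2)\log_2 N$ complex multiplications and $N\log_2 N$ complex additions, or about $2N\log_2 N$ real multiplications and $3N\log_2 N$ real additions when counted the same way. When computing a complete DFT, the FFT offers a dramatic reduction in computation. (Pause here for some respectful bowing to Cooley and Tukey.)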
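The multiplication counts can be checked with a minimal recursive radix-2 decimation-in-time FFT in Python that tallies its twiddle multiplications (including the trivial ones, which the text notes become insignificant for large transforms). The names `fft_radix2` and `real_mults` are hypothetical; the sketch assumes `len(x)` is a power of 2 and counts multiplications only.

```python
import cmath
import math


def fft_radix2(x, counter):
    """Recursive radix-2 DIT FFT; counter[0] accumulates twiddle multiplies."""
    n = len(x)
    if n == 1:
        return list(x)
    even = fft_radix2(x[0::2], counter)
    odd = fft_radix2(x[1::2], counter)
    out = [0j] * n
    for k in range(n // 2):
        t = cmath.exp(-2j * math.pi * k / n) * odd[k]  # one complex multiply
        counter[0] += 1
        out[k] = even[k] + t           # butterfly: two complex additions
        out[k + n // 2] = even[k] - t
    return out


def real_mults(n):
    """Real multiplications (4 per complex multiply): direct DFT vs radix-2 FFT."""
    log2n = n.bit_length() - 1
    return 4 * n * n, 4 * (n // 2) * log2n


for n in (64, 1024, 65536):
    dft, fft = real_mults(n)
    print(f"N={n:6d}: DFT {dft:>14,d}  FFT {fft:>10,d}  ratio {dft / fft:8.1f}")
```

The ratio of the two multiplication counts is $4N^2 / (2N\log_2 N) = 2N/\log_2 N$, which is already about 205 for $N = 1024$.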
I would be interested in seeing where the Goertzel algorithm lands on the graph below, but I haven't muddled through those details yet.

Regarding your other challenge, that your frequencies do not line up with the bin centers: I recommend first windowing, to reduce the effect of distant frequencies on each tone you are measuring, and then interpolating between the bins to estimate the frequency of each tone more accurately. (A simple linear interpolation may be sufficient.) The window widens the main lobe of each bin, reducing frequency resolution, but it also lowers the sidelobes leaking in from other bins, so you would need to confirm that the frequency separation between your tones of interest is sufficiently larger than the equivalent noise bandwidth of your window+FFT. For a good summary of equivalent noise bandwidth versus window type, refer to fred harris' paper on windowing: https://www.utdallas.edu/~cpb021000/EE%204361/Great%20DSP%20Papers/Harris%20on%20Windows.pdf
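A minimal Python sketch of the window-then-interpolate approach, assuming a Hann window and a single dominant tone inside each search range of bins; all function names are hypothetical. Where the text suggests a simple linear interpolation, this sketch swaps in parabolic interpolation of the log-magnitude across the peak bin and its two neighbours, a common alternative. `enbw_bins` computes the equivalent noise bandwidth in bins, $N\sum w^2 / (\sum w)^2$.

```python
import cmath
import math


def hann(n_total):
    """Periodic Hann window of length n_total."""
    return [0.5 - 0.5 * math.cos(2.0 * math.pi * n / n_total) for n in range(n_total)]


def enbw_bins(w):
    """Equivalent noise bandwidth of window w, in DFT bins."""
    return len(w) * sum(v * v for v in w) / sum(w) ** 2


def estimate_tone(x, fs, lo_bin, hi_bin):
    """Estimate the frequency in Hz of the strongest tone peaking in bins
    [lo_bin, hi_bin]; assumes 1 <= lo_bin <= hi_bin <= len(x) - 2."""
    n_total = len(x)
    xw = [v * w for v, w in zip(x, hann(n_total))]  # apply the window

    def log_mag(k):
        # Direct single-bin DFT of the windowed data.
        acc = sum(v * cmath.exp(-2j * math.pi * k * n / n_total)
                  for n, v in enumerate(xw))
        return math.log(abs(acc) + 1e-300)  # tiny offset avoids log(0)

    mags = {k: log_mag(k) for k in range(lo_bin - 1, hi_bin + 2)}
    k_peak = max(range(lo_bin, hi_bin + 1), key=lambda k: mags[k])
    a, b, c = mags[k_peak - 1], mags[k_peak], mags[k_peak + 1]
    denom = a - 2.0 * b + c
    # Vertex of the parabola through the three points, as a fractional bin offset.
    offset = 0.0 if denom == 0 else 0.5 * (a - c) / denom
    return (k_peak + offset) * fs / n_total
```

For the Hann window, `enbw_bins` returns 1.5 bins, the value harris lists for that window, so tones of interest should be spaced comfortably wider than that.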