The new samples are generated by interpolating between the original ones. Exactly how this is done will vary by implementation, but the most typical way is to use a linear interpolating filter (linear in the sense of a linear time-invariant filter, not straight-line interpolation between points). With this technique, you would interpolate by a factor of 2 by first inserting a zero between each pair of adjacent input samples. Assuming your input signal is $x[n]$, your expanded signal would look like:
$$
x_e[n] = [ x[0], 0, x[1], 0, x[2], 0, \ldots ]
$$
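As a minimal sketch of that expansion step in plain Python (the helper name `zero_stuff` and its `factor` parameter are illustrative, not taken from any library):

```python
def zero_stuff(x, factor=2):
    """Insert factor - 1 zeros after every input sample."""
    expanded = [0.0] * (len(x) * factor)
    expanded[::factor] = x  # original samples land on every factor-th slot
    return expanded


print(zero_stuff([1.0, 2.0, 3.0]))  # [1.0, 0.0, 2.0, 0.0, 3.0, 0.0]
```

Note that the output is exactly twice as long as the input and contains no new information yet; the zeros are only placeholders for the samples the filter will fill in.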
Due to the properties of the discrete-time Fourier transform, this has the effect of compressing the spectrum of the input signal and creating an image of it: all of the spectral content that was present between DC and the Nyquist frequency of the original signal is squeezed into a region half as wide (relative to the new sample rate), and a mirrored copy of it appears in the other half of the band.
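For reference, here is a short derivation of that property, writing $X(e^{j\omega})$ for the DTFT of $x[n]$ (notation assumed here, not defined earlier in this answer). Only the even-indexed samples of $x_e[n]$ are nonzero, so:
$$
X_e(e^{j\omega}) = \sum_{n} x_e[n] \, e^{-j\omega n} = \sum_{m} x[m] \, e^{-j\omega (2m)} = X(e^{j2\omega})
$$
Since $X(e^{j\omega})$ is periodic with period $2\pi$, $X(e^{j2\omega})$ is periodic with period $\pi$. Over $0 \le \omega \le \pi$ you therefore see the compressed original spectrum in $[0, \pi/2]$ and its image in $[\pi/2, \pi]$.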
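Finally, you apply a lowpass filter to remove the image, which sits above $f_s/4$ in the expanded signal (where $f_s$ is the new, doubled sample rate). The filter also needs a passband gain of 2 to make up for the amplitude lost to the inserted zeros. The result is an interpolated-by-2 version of your input signal that is bandlimited to the same region of frequencies as the original.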
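Here is a rough sketch of the whole conceptual chain (zero-stuff, then lowpass) in plain Python. The function names, the 63-tap length, and the Hamming-windowed sinc design are all assumptions made for illustration; a real implementation would pick its own filter. The cutoff is a quarter of the new sample rate and the gain of 2 restores the original amplitude.

```python
import math


def design_lowpass(num_taps=63, cutoff=0.25, gain=2.0):
    """Hamming-windowed sinc lowpass. cutoff is in cycles per sample
    at the new (doubled) rate; num_taps is assumed to be odd."""
    mid = (num_taps - 1) / 2
    taps = []
    for k in range(num_taps):
        t = k - mid
        if t == 0:
            ideal = 2 * cutoff
        else:
            ideal = math.sin(2 * math.pi * cutoff * t) / (math.pi * t)
        window = 0.54 - 0.46 * math.cos(2 * math.pi * k / (num_taps - 1))
        taps.append(gain * ideal * window)
    return taps


def fir_filter(signal, taps):
    """Direct-form FIR convolution; output has the same length as the input."""
    return [
        sum(taps[k] * signal[n - k] for k in range(min(n + 1, len(taps))))
        for n in range(len(signal))
    ]


def interpolate_by_2(x, num_taps=63):
    """Zero-stuff by 2, then lowpass at a quarter of the new sample rate."""
    expanded = [0.0] * (2 * len(x))
    expanded[::2] = x
    return fir_filter(expanded, design_lowpass(num_taps))


# A slow sine at the original rate, interpolated to twice as many samples.
x = [math.sin(2 * math.pi * 0.05 * m) for m in range(200)]
y = interpolate_by_2(x)
```

The output is delayed by `(num_taps - 1) // 2` samples at the new rate (31 here), so `y[2 * m + 31]` lines up with `x[m]` and the samples in between are the interpolated values.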
Doing this for an audio signal isn't likely to do much for you, unless you have some equipment or algorithm that requires a particular input sample rate. The interpolated audio, when played back at the new rate, should sound the same as the original signal; the interpolation process doesn't create any new information.
Edit: I should also note that the above is just a conceptual description of the process. In practice, you wouldn't use the explicit step of zero-stuffing followed by linear filtering. Instead, you can use multirate techniques like a polyphase realization of the interpolation filter that allow you to achieve the same effect with fewer computations.
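To illustrate the polyphase idea, here is a hedged sketch in plain Python that compares the explicit zero-stuff-then-filter approach with a two-phase version. The function names are hypothetical, and the three-tap kernel in the usage line is only a toy filter chosen so the numbers are easy to check by hand. The trick is that every other tap always multiplies a stuffed zero, so you can split the filter into its even-indexed and odd-indexed taps and run each half directly on the original samples.

```python
def interpolate_naive(x, taps):
    """Reference: zero-stuff by 2, then run the full FIR filter."""
    expanded = [0.0] * (2 * len(x))
    expanded[::2] = x
    return [
        sum(taps[k] * expanded[n - k] for k in range(min(n + 1, len(taps))))
        for n in range(len(expanded))
    ]


def interpolate_polyphase(x, taps):
    """Same output, without ever multiplying by a stuffed zero."""
    phases = (taps[0::2], taps[1::2])  # even-indexed taps, odd-indexed taps
    out = []
    for m in range(len(x)):
        # Each input sample yields two outputs, one from each phase.
        for phase in phases:
            out.append(
                sum(h * x[m - i] for i, h in enumerate(phase) if m - i >= 0)
            )
    return out


toy_taps = [0.5, 1.0, 0.5]  # toy kernel for illustration only
print(interpolate_naive([1.0, 3.0], toy_taps))      # [0.5, 1.0, 2.0, 3.0]
print(interpolate_polyphase([1.0, 3.0], toy_taps))  # [0.5, 1.0, 2.0, 3.0]
```

Both functions produce the same samples, but the polyphase version does roughly half the multiplications, since each output only touches the taps that line up with real input samples.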