You're correct that measurement and modeling is the right way to go about this.
For your PID to work properly, you need to be able to make a somewhat linear conversion of error (desired roll vs actual roll) into corrective force (in this case, provided by the control surfaces -- the aileron angle, influenced by air speed and other factors).
The $k_d$ term of your PID should account for the inertia of the plane in rolling from side to side, so don't worry about that in your measurements.
What you should measure in your wind tunnel tests is the torque on the longitudinal axis of the plane, in response to airspeed (in both X and Y axes, if you have on-board sensors for that) and aileron angle. That will provide data on the following relationship:
$$
\tau_{actual} = f(\theta_{\text{aileron}}, \text{airspeed}_x, \text{airspeed}_y, \text{[other measurable factors]})
$$
You are going to approximate the inverse of that function -- finding $\theta_{\text{aileron}}$ given $\tau_{desired}$. Whether you do that with a neural network, wolfram alpha, multivariate regression, or a good knowledge of physics is up to you. In the end, you want to produce a modeling function in this form:
$$
\theta_{\text{aileron}} = f(\tau_{desired}, \text{airspeed}_x, \text{airspeed}_y, \text{[other measurable factors]})
$$
The PID will give you $\tau_{desired}$, and your sensors will give you the other factors to plug into this function.