Long-time reader, first-time poster. I have a few very simple questions that are troubling me and I am hoping that one of you guys can help me out.
Setup & Aim: I have a time series that I want to downsample, and I simply want to run a lowpass filter on it before doing so to avoid aliasing. I am using Python (SciPy), but it looks like MATLAB behaves similarly; neither is really relevant to these questions.
My original time series is sampled at $0.5\textrm{ ms}$ ($2000\textrm{ Hz}, f_{\rm Nyquist}=1000\textrm{ Hz}$) and I want to resample to $2\textrm{ ms}$ ($500\textrm{ Hz}, f_{\rm Nyquist}=250\textrm{ Hz}$), so I must apply an anti-alias filter that removes any frequencies $> 250\textrm{ Hz}$, and then downsample. So far, so good.
In Python, it looks like a Butterworth Filter is the way to go, which requires a normalised frequency $\omega_n$. My understanding is that in my case $\omega_n = 250\textrm{ Hz}/1000\textrm{ Hz} = 0.25$.
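A minimal sketch of the setup described above, assuming SciPy's `butter` and `sosfiltfilt`. The filter order of 8, the zero-phase filtering, and the two-tone test signal are illustrative assumptions, not part of my actual workflow:

```python
import numpy as np
from scipy import signal

fs = 2000.0              # original sampling rate in Hz (0.5 ms)
f_cut = 250.0            # Nyquist frequency after resampling to 2 ms
wn = f_cut / (fs / 2)    # cutoff normalised by the original Nyquist: 0.25

# Order 8 is an arbitrary choice for illustration
sos = signal.butter(8, wn, btype="low", output="sos")

# Synthetic test signal (assumption): 50 Hz to keep, 400 Hz to remove
t = np.arange(2000) / fs
x = np.sin(2 * np.pi * 50 * t) + np.sin(2 * np.pi * 400 * t)

# Zero-phase low-pass, then keep every 4th sample (2000 Hz -> 500 Hz)
y = signal.sosfiltfilt(sos, x)
y_ds = y[::4]
```

Here `y_ds` is the filtered series at $2\textrm{ ms}$ sampling.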
Now, what I don't understand and I cannot find any information on, is as follows:
What if my original time series ($f_s=2000\textrm{ Hz}$) had been upsampled from $1\textrm{ ms}$ ($f_s=1000\textrm{ Hz}, f_{\rm Nyquist}=500\textrm{ Hz}$)? There is no extra information between $500\textrm{ Hz}$ and $1000\textrm{ Hz}$ but I don't necessarily know that and I apply a Butterworth Filter with $\omega_n = 0.25$ (instead of $\omega_n = 0.5$ for $1\textrm{ ms}$ sampling). Is it an issue? Am I misunderstanding how a Butterworth filter works?
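To make the two normalisations concrete, here is a small sketch. If I understand the convention correctly, $\omega_n$ is always taken relative to the Nyquist frequency of the data as currently sampled, so the same $250\textrm{ Hz}$ corner gives different values at the two rates. The order of 4 is arbitrary, and the `fs` keyword assumes SciPy 1.2 or newer:

```python
import numpy as np
from scipy import signal

order = 4        # arbitrary order, chosen only for illustration
f_cut = 250.0    # desired cutoff in Hz

# Data at 0.5 ms sampling (possibly upsampled from 1 ms): Nyquist = 1000 Hz
wn_2000 = f_cut / (2000.0 / 2)   # 0.25
# The same data at 1 ms sampling: Nyquist = 500 Hz
wn_1000 = f_cut / (1000.0 / 2)   # 0.5

# Passing the cutoff in Hz together with fs gives the same coefficients
# as passing the normalised value for that sampling rate
b_norm, a_norm = signal.butter(order, wn_2000)
b_hz, a_hz = signal.butter(order, f_cut, fs=2000.0)
```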
My second question is something like "Why is this the preferred implementation of a low pass filter?" I am sure there are good reasons, but I have used software in the past to just high-cut filter my data knowing my new $f_{\rm Nyquist}$. In my case I would use something like $0-0-200-250\textrm{ Hz}$. Again, what am I missing? I know the maximum frequency that I want to keep.
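For comparison, a four-corner high-cut like this can be approximated in SciPy with a FIR design. This is only a sketch under assumptions: it reads $0-0-200-250\textrm{ Hz}$ as full pass up to $200\textrm{ Hz}$ with a linear taper to zero at $250\textrm{ Hz}$, and the use of `scipy.signal.firwin2` with 101 taps is my own choice, not necessarily what that software does internally:

```python
import numpy as np
from scipy import signal

fs = 2000.0                               # sampling rate in Hz (0.5 ms)

# Corner frequencies in Hz and the desired gain at each corner
freq = [0.0, 200.0, 250.0, fs / 2]
gain = [1.0, 1.0, 0.0, 0.0]

# 101 taps is an arbitrary illustrative length
taps = signal.firwin2(101, freq, gain, fs=fs)

# Frequency response, with w in Hz because fs is given
w, h = signal.freqz(taps, worN=2048, fs=fs)
```

Plotting `np.abs(h)` against `w` shows how closely the realised response follows the requested trapezoid.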
Finally, one of the roots of my problem is that I sometimes have irregularly sampled data. I can run an interpolation to a regular time array, but when doing this I tend to oversample to avoid losing signal (this is where my first question comes in). Am I wasting my time? Should I just resample to the smallest time interval in my data?
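A minimal sketch of the interpolation step, assuming linear interpolation with `numpy.interp` onto a grid spaced at the smallest interval in the data. The synthetic irregular times and the $5\textrm{ Hz}$ test signal are made up purely for illustration:

```python
import numpy as np

# Synthetic irregular sample times (assumption): gaps between 0.5 ms and 2 ms
rng = np.random.default_rng(0)
intervals = rng.uniform(0.5e-3, 2e-3, size=1000)
t_irreg = np.cumsum(intervals)
x_irreg = np.sin(2 * np.pi * 5.0 * t_irreg)

# Regular grid at the smallest interval found in the data
dt = np.min(np.diff(t_irreg))
t_reg = np.arange(t_irreg[0], t_irreg[-1], dt)

# Linear interpolation onto the regular grid
x_reg = np.interp(t_reg, t_irreg, x_irreg)
```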



I'm not set on using a Butterworth filter. I had just assumed that this was the best approach, based on searching for examples of how other people have done low pass filtering in Python.
When you say "shape", what do you mean? The low frequency "background" trend? If so, then yes that is important.
– Stev May 11 '17 at 08:18