1

I would like to analyze data to find the maximum value among 360 points. I used least-squares fitting because the data are signal-strength measurements. I want to remove any outliers, which are quite likely since the signal strength can be unreliable at times. The data set is fairly large (360 points), which seems to require a very high-degree polynomial. I would like help with this, since I tried a very high-degree polynomial in Matlab and it did not work. I also tried dividing the data into chunks, processing them separately, and then taking the highest value among them, but that will take a lot of processing once I convert the algorithm from Matlab to C++.

flawr
  • 16,533
  • 5
  • 41
  • 66

2 Answers

1

It is a bad idea to use polynomials of degree greater than about 7 (just a rule of thumb), because you will get enormous oscillations that do not represent your data at all. You should first make some assumptions about what those data points should look like; otherwise you cannot make any statement about what outliers look like, whether there is a maximum, etc.

Can you tell us a bit more about where your data points come from, and perhaps post a plot of them so we get an impression of what they look like?

flawr
  • 16,533
  • 5
  • 41
  • 66
  • The only thing I can do is to set a threshold for the minimum and maximum values I expect. What can I do instead of least-squares fitting? Any ideas? – user573014 Dec 30 '14 at 10:07
  • If the data is a time series that should in reality be continuous, you could use the Kalman filter (https://de.wikipedia.org/wiki/Kalman-Filter). If you assume the values to be normally distributed, you could use the Mahalanobis distance (https://en.wikipedia.org/wiki/Mahalanobis_distance) and define a threshold beyond which values are considered outliers. If it is not a time series, you can plot the data after sorting it by value. Perhaps you might want to share your data at pastebin.com or something similar? – flawr Dec 30 '14 at 10:17
  • link it is not a time series @flawr – user573014 Dec 31 '14 at 08:57
  • OK, I recommend sorting all y values by size and plotting them; you will see some outliers, which can be ignored. If you plot a histogram of the y values you will see whether they could be normal or gamma (or beta) distributed, and you will again find outliers; I am sure this will help you find the maximum. But as long as you do not have a time series or something similar (where the order of the points matters), it makes absolutely no sense to try to fit a function through the points, since their order is arbitrary. I'd rather try to find the 1D distribution before making a statement. – flawr Dec 31 '14 at 12:50
  • I cannot sort the values because they represent the angles at which I captured the signal. @flawr – user573014 Jan 04 '15 at 05:42
  • That would have been useful to know beforehand. So these are measurements that can be assumed to have normally distributed noise? Do you have expectations about the variance? I assume it was a speaker/microphone/antenna sort of thing? – flawr Jan 04 '15 at 09:50
  • You then have to make fundamentally different assumptions, for example that your function has to be periodic. If it really is an antenna/mic/speaker, what can you assume about the shape of the polar pattern? You could also assume that it is symmetrical, which could give you a hint of the expected noise. – flawr Jan 04 '15 at 10:06
  • I'd determine the maximum number of lobes you expect to get, and then fit a trigonometric series up to that frequency via least squares. – flawr Jan 04 '15 at 10:16
  • Yes, I get the values from an antenna. The antenna I am using is a directional antenna, so I should be getting only one big peak and several small peaks. I am doing this to calibrate my antenna, so I am looking for the angle of the highest received signal. However, because we sometimes get signals from other sources, I want to be careful about that and remove any outliers from the data set. @flawr – user573014 Jan 05 '15 at 11:32
  • Can you elaborate more on the last suggestion you gave? – user573014 Jan 05 '15 at 11:34
  • Let's say you only expect 5 peaks; then you can try to fit e.g. $f(x) = \sum\limits_{k=0}^5 (a_k \cos(kx) + b_k \sin(kx))$. If you want the whole thing to be symmetrical (so that $f(-x) = f(x)$), then just eliminate the $\sin$ terms in the equation. – flawr Jan 05 '15 at 12:17
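The trigonometric fit suggested in the comments above could be sketched roughly as follows. This is only an illustration under assumptions not stated in the thread: the 360 samples are taken at equally spaced angles covering the full circle (one per degree), in which case the least-squares coefficients reduce to simple projections. The function names `fit_trig_series`, `evaluate` and `peak_angle_deg` are made up for this sketch.

```python
import math

def fit_trig_series(y, order):
    """Least-squares truncated trigonometric series for samples y[j] taken at
    equally spaced angles x_j = 2*pi*j/N over the full circle (assumption).
    With that spacing the basis is orthogonal over the samples, so each
    coefficient is a plain projection (valid for order < N/2)."""
    n = len(y)
    xs = [2.0 * math.pi * j / n for j in range(n)]
    a = [sum(y) / n]  # a_0 is the mean of the samples
    b = [0.0]         # sin(0*x) is identically zero, so b_0 is unused
    for k in range(1, order + 1):
        a.append(2.0 / n * sum(v * math.cos(k * x) for v, x in zip(y, xs)))
        b.append(2.0 / n * sum(v * math.sin(k * x) for v, x in zip(y, xs)))
    return a, b

def evaluate(a, b, x):
    """Value of the fitted series at angle x (radians)."""
    return sum(a[k] * math.cos(k * x) + b[k] * math.sin(k * x)
               for k in range(len(a)))

def peak_angle_deg(a, b, steps=3600):
    """Angle in degrees where the fitted curve is largest, on a fine grid."""
    best = max(range(steps),
               key=lambda i: evaluate(a, b, 2.0 * math.pi * i / steps))
    return 360.0 * best / steps
```

Because only a handful of coefficients are fitted to 360 samples, a single stray reading shifts the fitted peak only slightly. If the angles are not equally spaced, or points have been removed, the projections above no longer give the least-squares solution and a general solver is needed instead.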
0

An example of the problems with polynomial fitting is discussed in "Polynomial best fit line for very large values".

The optimal degree of fit can't be quantified without understanding the data and what you want from the fit (interpolation, integration, etc.). In general, you can expect the total error to drop initially as the order of fit $d$ increases. It may then hit a minimum, after which the error increases as $d$ grows, or it may plateau.

The values of the fitted amplitudes will oscillate, and you may wish to look at the amplitudes for an orthogonal polynomial set instead. Remember, though, that polynomials which are orthogonal under an integral are generally not orthogonal when sampled at discrete points.
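A small numerical check of that last remark, as a sketch: the Legendre polynomials $P_1$ and $P_3$ are orthogonal under the integral over $[-1, 1]$, but their discrete inner product over five equally spaced sample points (a choice made only for this illustration) is not zero.

```python
def p1(x):
    """Legendre polynomial P_1."""
    return x

def p3(x):
    """Legendre polynomial P_3."""
    return 0.5 * (5.0 * x**3 - 3.0 * x)

def discrete_inner(f, g, points):
    """Plain sum of f(x)*g(x) over the sample points."""
    return sum(f(x) * g(x) for x in points)

points = [-1.0, -0.5, 0.0, 0.5, 1.0]  # illustrative sample grid
gram_13 = discrete_inner(p1, p3, points)
print(gram_13)  # 1.5625, whereas the integral of P_1 * P_3 over [-1, 1] is 0
```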

Finally, look at the errors in the fit amplitudes. You typically see that, for the higher orders, the amplitude drops below its error range, so you spend a lot of computation time calculating very expensive zeros.

These errors give you the ability to remove data points quantitatively using a standard deviation threshold. For example, exclude all points that are more than five standard deviations away from the prediction.
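One way this thresholding could look in practice is sketched below. It is not the method of this answer verbatim: as an assumption, the prediction comes from a low-order trigonometric series (as suggested in the comments on the other answer) rather than a polynomial, solved through the normal equations, and the fit is repeated until no more points are rejected. All function names and the defaults `order=5`, `max_iter=10` are illustrative; `n_sigma=5.0` mirrors the five-deviation example.

```python
import math

def design_row(x, order):
    """Basis values [1, cos x, sin x, ..., cos(order*x), sin(order*x)]."""
    row = [1.0]
    for k in range(1, order + 1):
        row += [math.cos(k * x), math.sin(k * x)]
    return row

def solve(m, rhs):
    """Solve m * c = rhs by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    aug = [m[i][:] + [rhs[i]] for i in range(n)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            f = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= f * aug[col][c]
    coef = [0.0] * n
    for i in reversed(range(n)):
        s = sum(aug[i][j] * coef[j] for j in range(i + 1, n))
        coef[i] = (aug[i][n] - s) / aug[i][i]
    return coef

def least_squares(xs, ys, order):
    """Least-squares coefficients via the normal equations (A^T A) c = A^T y."""
    rows = [design_row(x, order) for x in xs]
    p = len(rows[0])
    ata = [[sum(r[i] * r[j] for r in rows) for j in range(p)] for i in range(p)]
    aty = [sum(r[i] * y for r, y in zip(rows, ys)) for i in range(p)]
    return solve(ata, aty)

def predict(coef, x, order):
    """Model prediction at angle x (radians)."""
    return sum(c * b for c, b in zip(coef, design_row(x, order)))

def sigma_clip_fit(xs, ys, order=5, n_sigma=5.0, max_iter=10):
    """Drop points whose residual exceeds n_sigma residual standard
    deviations, refit, and repeat. Returns (coefficients, kept indices)."""
    keep = list(range(len(xs)))
    for _ in range(max_iter):
        coef = least_squares([xs[i] for i in keep], [ys[i] for i in keep], order)
        res = [ys[i] - predict(coef, xs[i], order) for i in keep]
        dof = max(len(keep) - len(coef), 1)
        sigma = math.sqrt(sum(r * r for r in res) / dof)
        new_keep = [i for i, r in zip(keep, res) if abs(r) <= n_sigma * sigma]
        if len(new_keep) == len(keep):
            break
        keep = new_keep
    # Final fit on the surviving points so the coefficients match `keep`.
    coef = least_squares([xs[i] for i in keep], [ys[i] for i in keep], order)
    return coef, keep
```

The threshold is a judgment call: a large outlier inflates the residual standard deviation on the first pass, which is why the loop refits after each rejection instead of clipping only once.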

If we can get to your data we can provide more insight.

dantopa
  • 10,342