Find peaks in data set and correlate variables

Question

I have a data set composed of a X variable (col 1) and two Y variables. Y1 shows roughly 40 peaks while Y2 is a linear function. Is it possible to find the approximated 40 Y2 values for which Y1 is a local peak and make a list?

I am voting to close this question as more information is needed to give a good answer. — gwr, Jan 17 '19 at 19:41
I, too, am voting to close because more (appropriate) information is needed. For example, Y2 is a list a values rather than a "linear function". Also, is it a relationship with the location (X) or the value of the peaks (Y1) and Y2 as the relationship you want to eventually explore? Either way it seems you're throwing away data. — JimB, Jan 17 '19 at 20:35

Anton Antonov · Accepted Answer · 2019-01-19T13:53:49.113

This can be done with the algorithms in the MSE discussion "Finding Local Minima / Maxima in Noisy Data" and the WordPress blog post "Finding local extrema in noisy data using Quantile Regression".

Below is code applying the algorithm from the latter using the QRMon package.

data = Import["https://pastebin.com/raw/mqmxDjrC", "RawData"] // RightComposition[Map[StringReplace["," -> "."]], Map[ToExpression]]

Import["https://raw.githubusercontent.com/antononcube/\
MathematicaForPrediction/master/MonadicProgramming/\
MonadicQuantileRegression.m"]

res =
  QRMonUnit[data[[All, 1 ;; 2]]]⟹
   QRMonEchoDataSummary⟹
   QRMonQuantileRegression[120, {0.1, 0.8}, InterpolationOrder -> 2]⟹
   QRMonPlot[ImageSize -> Large]⟹
   QRMonFindLocalExtrema⟹
   QRMonEchoValue⟹
   QRMonTakeValue;

ListPlot[data[[All, 1 ;; 2]], AspectRatio -> 0.25, PlotRange -> All, 
 Epilog -> {PointSize[.01], Red, Point[res["localMaxima"]]}]

Note that the found peaks are more proper than the ones in the previous answer with FindPeaks.

Answer to gwr's "are your peaks better now" question

By comparing my peaks with gwr's peaks from his answer update we can see in the image below that my peaks are still better.

yPeaks = peaks[[All, 2]];

gwrPeaks = Transpose[{xPeaks, yPeaks}];

ListPlot[{data[[All, 1 ;; 2]], gwrPeaks, res["localMaxima"]}, 
 PlotRange -> All, 
 PlotStyle -> {Gray, {Red, PointSize[0.01]}, {Blue, PointSize[0.01]}}, 
 PlotLegends -> SwatchLegend[{"data", "gwr peaks", "my peaks"}], 
 ImageSize -> 1000]

How much "more proper" are yours now?

We can quantify that -- my peaks are 0.038 times better on average:

Block[{myPeaks = res["localMaxima"][[All, 2]], gwrPeaks = gwrPeaks[[All, 2]], dataRange = MinMax[data[[All, 2]]]},
 Through[{Min, Mean, Max}[
   Rescale[myPeaks, dataRange, {0, 1}] - 
    Rescale[gwrPeaks, dataRange, {0, 1}]]]
]

(* {0.00723808, 0.0377582, 0.0844722} *)

... also please see my updated solution the compare the peak-values obtained. How much "more proper" are yours now? — gwr, Jan 18 '19 at 11:58
@gwr Just posted an update — it seems my peaks are still better. :) — Anton Antonov, Jan 19 '19 at 13:55
Ok, the race is on! But having the „data speak for themselves“ smacks a bit of frequentism and will give me old Bayesian the shudders? :) — gwr, Jan 19 '19 at 14:04

gwr · Answer 2 · 2019-01-18T11:32:38.177

Try this:

data = Import["https://pastebin.com/raw/mqmxDjrC", "RawData"] // RightComposition[
    Map[StringReplace["," -> "."]],
    Map[ToExpression]
];

For noisy data we might make use of FindPeaks:

peaks = FindPeaks[ data[[All, 2]], 6, 0, 0.015, InterpolationOrder -> 2];
ListPlot[ data[[All, 2]], Epilog -> {PointSize[.02], Red, Point[peaks]} ]

We can then use the peaks to interpolate the Y2 values:

f = Interpolation @ data[[ All, 3 ]];
f /@ peaks[[ All, 1 ]]

{6.4, 8.8, 12.0618, 15.2, 15.2, 17.6082, 20.3207, 22.9153, 25.6528, \ 27.6321, 30.4, 31.9778, 31.9958, 34.2066, 35.9551, 38.4479, 40., \ 40.2023, 41.5939, 43.1894, 45.6, 47.9816, 48.8, 50.4, 52., 53.6, \ 55.6922, 57.5808, 58.497, 60., 61.6, 63.2, 63.8678, 64., 65.6, 67.2, \ 68.8, 70.0901, 71.486, 72.8, 74.4, 75.1668, 76.8, 78.4}

EDIT

We may try to tweak this solution a bit using a GaussionFilter to smooth the data and use DeleteDuplicates coupled with EuclideanDistance to eliminate peaks that are closer than some threshhold to eliminate "false signals":

Manipulate[
    With[
        { 
            peaks = FindPeaks[ GaussianFilter[data[[All, 2]], r], p1, 0, 0.015
                , InterpolationOrder -> order
            ] // DeleteDuplicates[ #, EuclideanDistance[#1, #2] < dist &] &
        },
        ListPlot[ data[[All, 2]]
            , PlotRange -> {All, {0.005, 0.025}}
            , AspectRatio -> 0.25
            , Epilog -> { PointSize[.01], Red, Point[peaks] }
            , ImageSize -> Large
        ]
    ]
    , { {r, 2}, 0, 10, 1, Appearance -> "Labeled" }
    , { {dist, 20}, 1, 30, 1, Appearance -> "Labeled" }
    , { {order, 2}, 1, 3, 1, Appearance -> "Labeled" }
    , { {p1, 2.75}, 0, 10, 0.25, Appearance -> "Labeled" }
 ]

We have now come up with 40 peaks using the above parameter settings:

peaks = FindPeaks[ GaussianFilter[ data[[All, 2]], 2], 2.75, 0, 0.015
    , InterpolationOrder -> 2
] // DeleteDuplicates[ #, EuclideanDistance[#1, #2] < 20 &] &;
Length @ peaks

40

We can now compare the x-values thus obtained with Antonov's:

fx = Interpolation @ data[[All, 1]];
xPeaks = fx /@ peaks[[All, 1]];
xPeaks // Partition[#, 10] & // Grid[#, Alignment -> "Decimal"] &

Maybe the peaks found in this way are not so much worse than the "full fledged Antonov-Approach" (very elegant of course using monads!)? ;-)

We can now interpolate y with regard to x:

fyx = Interpolation @ data[[All, {1, 3}]];
y2Peaks = fyx /@ xPeaks;
y2Peaks // Partition[#, 10] & // Grid[#, Alignment -> "Decimal"] &

Yeah, but that will pick two or more values for the same peak, since you are sorting the overall values. I was expecting something like smoothing the Y1 data so I can get an approximate value for each X coordinate and correlate that with Y2. — Rodrigo, Jan 17 '19 at 19:06
@Rodrigo Sorry, you will have to be a bit more elaborate on what you want. Have you looked at FindPeaks? — gwr, Jan 17 '19 at 19:13
Yes, but the problem is the data is noisy. I see roughly 40 peaks, but if I run a simple peak sorting algorithm, it might pick two or more points from the same peak. If you smooth the data enough, it should be easier to see the peaks. The idea is to find the approximate X positions for each of the 40 peaks and then find the corresponding Y2 values for those X coordinates. The end result should be a list of 40 Y2 values for which X gives a peak in Y1. — Rodrigo, Jan 17 '19 at 19:26
@Rodrigo I have improved my initial solution a bit and it now gives 40 peaks and a proper interpolation x_peak -> y2_peak. — gwr, Jan 18 '19 at 11:37

score 1 · Answer 3 · answered Jan 17 '19 at 19:47

1

Just an extended comment: such data is better displayed with a non-default AspectRatio:

ListPlot[data[[All, {1, 2}]], AspectRatio -> 0.1, ImageSize -> 1000,
 Joined -> True, PlotRange -> {{3, 22}, {0.007, 0.022}}]

answered Jan 17 '19 at 19:47

JimB

41,653
3
48
106

Find peaks in data set and correlate variables

3 Answers3

Answer to gwr's "are your peaks better now" question