0

I have a data set composed of a X variable (col 1) and two Y variables. Y1 shows roughly 40 peaks while Y2 is a linear function. Is it possible to find the approximated 40 Y2 values for which Y1 is a local peak and make a list?

gwr
  • 13,452
  • 2
  • 47
  • 78
Rodrigo
  • 1,482
  • 9
  • 13
  • 1
    I am voting to close this question as more information is needed to give a good answer. – gwr Jan 17 '19 at 19:41
  • 1
    I, too, am voting to close because more (appropriate) information is needed. For example, Y2 is a list a values rather than a "linear function". Also, is it a relationship with the location (X) or the value of the peaks (Y1) and Y2 as the relationship you want to eventually explore? Either way it seems you're throwing away data. – JimB Jan 17 '19 at 20:35

3 Answers3

6

This can be done with the algorithms in the MSE discussion "Finding Local Minima / Maxima in Noisy Data" and the WordPress blog post "Finding local extrema in noisy data using Quantile Regression".

Below is code applying the algorithm from the latter using the QRMon package.

data = Import["https://pastebin.com/raw/mqmxDjrC", "RawData"] // RightComposition[Map[StringReplace["," -> "."]], Map[ToExpression]]

Import["https://raw.githubusercontent.com/antononcube/\
MathematicaForPrediction/master/MonadicProgramming/\
MonadicQuantileRegression.m"]

res =
  QRMonUnit[data[[All, 1 ;; 2]]]⟹
   QRMonEchoDataSummary⟹
   QRMonQuantileRegression[120, {0.1, 0.8}, InterpolationOrder -> 2]⟹
   QRMonPlot[ImageSize -> Large]⟹
   QRMonFindLocalExtrema⟹
   QRMonEchoValue⟹
   QRMonTakeValue;

enter image description here

ListPlot[data[[All, 1 ;; 2]], AspectRatio -> 0.25, PlotRange -> All, 
 Epilog -> {PointSize[.01], Red, Point[res["localMaxima"]]}]

enter image description here

Note that the found peaks are more proper than the ones in the previous answer with FindPeaks.

Answer to gwr's "are your peaks better now" question

By comparing my peaks with gwr's peaks from his answer update we can see in the image below that my peaks are still better.

yPeaks = peaks[[All, 2]];

gwrPeaks = Transpose[{xPeaks, yPeaks}];

ListPlot[{data[[All, 1 ;; 2]], gwrPeaks, res["localMaxima"]}, 
 PlotRange -> All, 
 PlotStyle -> {Gray, {Red, PointSize[0.01]}, {Blue, PointSize[0.01]}}, 
 PlotLegends -> SwatchLegend[{"data", "gwr peaks", "my peaks"}], 
 ImageSize -> 1000]

enter image description here

How much "more proper" are yours now?

We can quantify that -- my peaks are 0.038 times better on average:

Block[{myPeaks = res["localMaxima"][[All, 2]], gwrPeaks = gwrPeaks[[All, 2]], dataRange = MinMax[data[[All, 2]]]},
 Through[{Min, Mean, Max}[
   Rescale[myPeaks, dataRange, {0, 1}] - 
    Rescale[gwrPeaks, dataRange, {0, 1}]]]
]

(* {0.00723808, 0.0377582, 0.0844722} *)
Anton Antonov
  • 37,787
  • 3
  • 100
  • 178
  • 1
    Very elegant solution! Please define "more proper" ... ;-) – gwr Jan 18 '19 at 11:33
  • ... also please see my updated solution the compare the peak-values obtained. How much "more proper" are yours now? – gwr Jan 18 '19 at 11:58
  • 1
    @gwr Just posted an update — it seems my peaks are still better. :) – Anton Antonov Jan 19 '19 at 13:55
  • Ok, the race is on! But having the „data speak for themselves“ smacks a bit of frequentism and will give me old Bayesian the shudders? :) – gwr Jan 19 '19 at 14:04
2

Try this:

data = Import["https://pastebin.com/raw/mqmxDjrC", "RawData"] // RightComposition[
    Map[StringReplace["," -> "."]],
    Map[ToExpression]
];

For noisy data we might make use of FindPeaks:

peaks = FindPeaks[ data[[All, 2]], 6, 0, 0.015, InterpolationOrder -> 2];
ListPlot[ data[[All, 2]], Epilog -> {PointSize[.02], Red, Point[peaks]} ]

Peaks

We can then use the peaks to interpolate the Y2 values:

f = Interpolation @ data[[ All, 3 ]];
f /@ peaks[[ All, 1 ]]

{6.4, 8.8, 12.0618, 15.2, 15.2, 17.6082, 20.3207, 22.9153, 25.6528, \ 27.6321, 30.4, 31.9778, 31.9958, 34.2066, 35.9551, 38.4479, 40., \ 40.2023, 41.5939, 43.1894, 45.6, 47.9816, 48.8, 50.4, 52., 53.6, \ 55.6922, 57.5808, 58.497, 60., 61.6, 63.2, 63.8678, 64., 65.6, 67.2, \ 68.8, 70.0901, 71.486, 72.8, 74.4, 75.1668, 76.8, 78.4}

EDIT

We may try to tweak this solution a bit using a GaussionFilter to smooth the data and use DeleteDuplicates coupled with EuclideanDistance to eliminate peaks that are closer than some threshhold to eliminate "false signals":

Manipulate[
    With[
        { 
            peaks = FindPeaks[ GaussianFilter[data[[All, 2]], r], p1, 0, 0.015
                , InterpolationOrder -> order
            ] // DeleteDuplicates[ #, EuclideanDistance[#1, #2] < dist &] &
        },
        ListPlot[ data[[All, 2]]
            , PlotRange -> {All, {0.005, 0.025}}
            , AspectRatio -> 0.25
            , Epilog -> { PointSize[.01], Red, Point[peaks] }
            , ImageSize -> Large
        ]
    ]
    , { {r, 2}, 0, 10, 1, Appearance -> "Labeled" }
    , { {dist, 20}, 1, 30, 1, Appearance -> "Labeled" }
    , { {order, 2}, 1, 3, 1, Appearance -> "Labeled" }
    , { {p1, 2.75}, 0, 10, 0.25, Appearance -> "Labeled" }
 ]

Tweaking the parameters

We have now come up with 40 peaks using the above parameter settings:

peaks = FindPeaks[ GaussianFilter[ data[[All, 2]], 2], 2.75, 0, 0.015
    , InterpolationOrder -> 2
] // DeleteDuplicates[ #, EuclideanDistance[#1, #2] < 20 &] &;
Length @ peaks

40

We can now compare the x-values thus obtained with Antonov's:

fx = Interpolation @ data[[All, 1]];
xPeaks = fx /@ peaks[[All, 1]];
xPeaks // Partition[#, 10] & // Grid[#, Alignment -> "Decimal"] &

peak table

Maybe the peaks found in this way are not so much worse than the "full fledged Antonov-Approach" (very elegant of course using monads!)? ;-)

We can now interpolate y with regard to x:

fyx = Interpolation @ data[[All, {1, 3}]];
y2Peaks = fyx /@ xPeaks;
y2Peaks // Partition[#, 10] & // Grid[#, Alignment -> "Decimal"] &

y2Peaks

gwr
  • 13,452
  • 2
  • 47
  • 78
  • Yeah, but that will pick two or more values for the same peak, since you are sorting the overall values. I was expecting something like smoothing the Y1 data so I can get an approximate value for each X coordinate and correlate that with Y2. – Rodrigo Jan 17 '19 at 19:06
  • @Rodrigo Sorry, you will have to be a bit more elaborate on what you want. Have you looked at FindPeaks? – gwr Jan 17 '19 at 19:13
  • Yes, but the problem is the data is noisy. I see roughly 40 peaks, but if I run a simple peak sorting algorithm, it might pick two or more points from the same peak. If you smooth the data enough, it should be easier to see the peaks. The idea is to find the approximate X positions for each of the 40 peaks and then find the corresponding Y2 values for those X coordinates. The end result should be a list of 40 Y2 values for which X gives a peak in Y1. – Rodrigo Jan 17 '19 at 19:26
  • @Rodrigo I have improved my initial solution a bit and it now gives 40 peaks and a proper interpolation x_peak -> y2_peak. – gwr Jan 18 '19 at 11:37
1

Just an extended comment: such data is better displayed with a non-default AspectRatio:

ListPlot[data[[All, {1, 2}]], AspectRatio -> 0.1, ImageSize -> 1000,
 Joined -> True, PlotRange -> {{3, 22}, {0.007, 0.022}}]

Time series figure

JimB
  • 41,653
  • 3
  • 48
  • 106