6

If I have the following data (which is temperature in the x-axis and heat flow in the y-axis):

Import["https://pastebin.com/raw/SMKZUtbQ", "Package"]

which plotted using ListLinePlot[data, PlotRange -> {{50, 100}, {-0.1, 1}}] gives:

Image

Questions:

1) How can I find the onset value of each peak?. The onset value is defined as the intersection of the tangents of the peak with the extrapolated baseline (which can be taken as 0 in this case). An example of the onset value for two peaks are given in the figure below (done in excel), where the onset would be where both red lines intersect:

onset

PS: Here's a very brief description of onset temperature if you have any doubt: http://www.hydrateweb.org/dsc#:~:text=The%20onset%20and%20the%20offset%20temperatures%20are%20defined%20as%20the,heating%20rate%20or%20sample%20preparation)

EDIT: Here's one clarification about the tangent line to use. The tangent should be the line along the most reasonable length of the peak. The criteria for “reasonable” in this scenario would be the longest length of approximately consistent slope. Here's a picture that shows another example using two tangents lines to the "left" and "right" (I only need the one on the "left" to find the "onset"):

Image

For the baseline, please use 0 as the baseline.

John
  • 1,611
  • 4
  • 14
  • 2
    Have you seen FindPeaks? (https://reference.wolfram.com/language/ref/FindPeaks.html) – C. E. Jun 05 '20 at 21:12
  • @ C.E thank you! I think FindPeaks will definitely help me finding question 1. Thank you for the suggestion. Is there any similar function to find the onset value (temperature in this case)?. Also, do you know if I can use FindPeaks only on a given region (let's say from 65 to 94)? – John Jun 05 '20 at 21:14
  • 1
    Just take the slice of data that you’re interested in and give that to FindPeaks. As for the onset, there is no built-in function for that. – C. E. Jun 05 '20 at 21:23
  • "the tangents of the peak" - the tangents at which point on the peak? Different points --> different tangent slope --> different onset value. How exactly were the red lines constructed? – MarcoB Jun 05 '20 at 21:43
  • @C.E. Thank you (+1) – John Jun 05 '20 at 21:43
  • @MarcoB in this case it will be on the left side of the peak. Similar to how it is done here: http://www.hydrateweb.org/dsc#:~:text=The%20onset%20and%20the%20offset%20temperatures%20are%20defined%20as%20the,heating%20rate%20or%20sample%20preparation. The red lines are simply lines that are in the same path when starting from the highest point on the peak and are tangent to it. I hope it helps clarify it – John Jun 05 '20 at 21:45
  • 2
    @John Yes, I get that, but through which point is the tangent drawn? For instance, a tangent line that goes through the tip of the peak will be almost horizontal. THe web site does not define that either. I think they may have implied the tangent through the inflection point in the peak, which is pretty much the only one you can define precisely on the "side" of the peak's shape. – MarcoB Jun 05 '20 at 21:47
  • @MarcoB Yes, correct. It will be the tangent through the inflection point in the peak so that it can be defined as "left" – John Jun 05 '20 at 21:50
  • Since we finally discover that you are dealing with thermograms, all calorimeters come with dedicated software to analyze the relevant thermograms that will do all the things you have been attempting to do in your past questions (peak areas, baseline subtraction, today's problem...) automatically, and typically quickly and reliably. I wonder, why are you not using the bundled software? – MarcoB Jun 05 '20 at 21:50
  • @MarcoB you are correct. All calorimeters usually have that. However, I want to learn to do it in Mathematica as I way to do it faster for a lot of data. For instance, I can have a code that deals with a lot of data at the same time and finds all the peaks and onsets faster than the calorimeter software. Also, notice that with calorimeters you can only usually do this one at a time while if do it in Mathematica you can potentially do it with several thermograms and faster It is also a way for me to become better in Mathematica in calculations that are common in my daily area. – John Jun 05 '20 at 21:52
  • @John Yeah sure, but you will not get much better by having others to do it for you. The steps are pretty clear here: interpolate each peak (see your old question), interpolate the baseline (see your old question), find the inflection point (the zero of the second derivative), calculate the value of the first derivative at that point, calculate the intersection with the baseline. What have you tried towards those goals? Show some code, and tell us where you get stuck! – MarcoB Jun 05 '20 at 22:05
  • @MarcoB one way to get better at something is watching experts do something. I am in a stage where I would say I am a beginner. I learn so much in Mathematica (which is a program I am now using daily as I found how powerful it is) just by asking and seeing new ways of programming (beside my own way). Those extra steps that you mentioned are steps that I understand and sound easy but I simply do not know how to transform them into code and that's the reason I ask for help in this community which has helped me and made me better in Mathematica. – John Jun 05 '20 at 22:14
  • 3
    Multiple subtle choices need to be made in order to estimate 1) the curve's inflection points along with its tangent lines in the presence of noise — its derivatives turned out to be pretty noisy, and 2) the baseline. Too much freedom in the choices, no criteria of correctness but the vague common sense, no Mathematica functions to make all of the choices automagically, and next to no Mathematica users to know the specifics of this particular area. You need to find out — or invent yourself — the precise methodology of how to do it in principle before asking how to do it in Mathematica. – aooiiii Jun 05 '20 at 22:54
  • @aooiiii Thank you for your comment. I edited the question to clarify this matter that you pointed out. I hope that clarifies it. Thanks again for trying to help me. – John Jun 05 '20 at 23:06
  • @John, it's even vaguer than before. The inflection points had a straightforward, rigorous, no-nonsense mathematical definition. The only problem with them was dealing with noise, as noisy curves don't have inflection points. Now it turns out those people rely on something completely different. They said — vaguely — what it looks like, but they didn't give a clue on how exactly it's supposed to work. – aooiiii Jun 06 '20 at 00:00
  • You see, I can, and surely MarcoB can write a program that seemingly does the trick on that one curve. It's easy. But it won't be any good. Without tons of data to debug on, it will miss some peaks or erroneously see them in the noise. Without the expert knowledge, without precise specifications, it will systematically miscalculate their areas and slopes. Do you really need this toy imitation of the real scientific software? – aooiiii Jun 06 '20 at 00:03
  • @aooiiii even the "real scientific software" select the tangent similar to what I mentioned in the edit taking into account "the longest length of constant slope". Even within two different thermograms softwares can give different results within certain error because of how they (or the user) select the tangent lines. I will appreciate if you can post the code that does this for this data and I promise I will built on this I tried to post something more robust if I can based on that. Thanks ! – John Jun 06 '20 at 00:33
  • Oooh, harder than I thought. Here's my idea for the slope estimation (not promising anything and having a feeling I'm going to hell for this). Instead of fitting the triangles to the peaks, let's directly estimate the max and min values of the derivative. It's corrupted by noise, so we'll have to use smoothing. But we can't let the sharp triangular center be blurred by a Gaussian: the opposing slopes would almost cancel each other out. Instead, to estimate the right slope, we convolve the signal with something like Exp-x, x>0, and Expx, x<0 for the left slope. – aooiiii Jun 06 '20 at 02:15
  • @aooiiii I do not completely understand the idea but yeah! go for it. This certainly sounds interesting. If it is helpful and perhaps easier for you, another idea I was thinking was to create a code that it allows the user to select the tangent (out of the many possibles) that the user thinks is correct to get the onset base on the baseline of 0 and once this is done the code will give the onset value. This is similar to what some old thermogram softwares do as well (the new ones automatically give the onset based on the range where the peak is located). – John Jun 06 '20 at 02:28
  • @aooiiii I have updated the question to make it more easy to find the "correct" tangent to use for this problem. I also put a bounty. Could you help me do this? – John Jun 12 '20 at 21:57
  • @John, the function fitting approach from your deleted edit 2 seems viable in general, but extracting geometric features from a Gaussian fit to a clearly non-Gaussian function would result in severe errors (for instance, the estimated peak location would be shifted in the direction of the heavier tail). You'll need a better suited model, e.g. rational functions, or anything else with sharp skewed peaks. – aooiiii Jun 14 '20 at 05:08
  • @aooiiii Thanks for your reply. Take a look at this: https://mathematica.stackexchange.com/questions/223929/find-x-intercept-when-maximum-slope-of-tangent-intersects-a-point. This answer together with the code I posted there almost pretty much answer this question and provides a better way of doing it than with the Gaussian fit. It relies on finding onset temperature when the slope of the tangent is maximum and at the same time intersects the point that corresponds to the peak. This makes the selection of the tangent better defined as well. – John Jun 14 '20 at 05:46
  • @John, originally you had slope lines closely aligned with the curve up to the second derivative, intersecting at so-called extrapolated peak, higher than the actual one. Now the slope lines are merely tangent to the curve, but they intersect at the actual peak. I'm not sure which definition is better - it's up to experts in that particular field to decide. Whichever definition you choose, I strongly advise to estimate the derivative with some filter, as MarcoB did. – aooiiii Jun 14 '20 at 06:53

1 Answers1

9

John, I hope you will believe me when I tell you that you are severely underestimating the complexity of your problem. I think it laudable that you want to learn the tools of your trade better, and I definitely encourage it!

The process you describe, however, is complex and tedious to program up by hand; it also requires a great deal of technique-specific knowledge to make the "right" decisions. For instance, a piece of software to perform peak detection and integration for Nuclear Magnetic Resonance data will make different decisions than one designed for chromatography, and yet others are used in calorimetry. To try an re-create these nuances in a few lines of code is a bit naïve. Done right, your question is far more complex than you give it credit for.

I also wanted to address your point about "getting better by watching experts do something". Although that is certainly true, programming relies on a lot of trial and error. You try something; it does not work; you painfully, slowly fix your mistakes by trawling through this site and others and by reading the docs; and then whatever you learned will be seared into your brain :-)

But enough chit chat. Here is some code to illustrate a few of my points.

First off, you know that this is a question of zeros in derivatives, so first we need to calculate some derivatives. That might steer you towards interpolation:

int = Interpolation[data];

MapThread[
 Plot[
   D[int[x], {x, #1}] /. x -> t, {t, 45, 110},
   PlotRange -> All, Axes -> {True, False}, Frame -> True,
   ImageSize -> Medium, PlotLabel -> #3,
   PlotStyle -> #2] &,
 {{0, 1, 2}, 
  {Black, Red, Blue}, 
  {"interpolated data", "first derivative", "second derivative"} }
]

interpolation results

The data has a wandering baseline, but the first derivative still looks pretty good, although a bit noisy. In fact, it is a good example of why data is often presented in "derivative form" when the baseline matters little and the position of the peaks is more important (of course, the peak positions correspond to the zero-crossings in the first derivative).

The second derivative looks very noisy though. We need to find the zeroes of $f''$ because those are the positions of the inflection points of those peaks. This is too noisy though; it would be challenging to work with this. You would want to smooth it.

In fact, Savitzky - Golay smoothing would be a common choice in this case; convolution with an appropriate Savitzky - Golay kernel can give you smoother data, but also first and second derivatives directly (see Savitzky - Golay filter on Wikipedia, (124928), (37380), (190857), and SavitzkyGolayMatrix.

The thing is, your data is time-stamped and of course you would want to apply smoothing only to the ordinate, not to the times. You would also have to keep track of the points you "lose" by convolution with the filter kernel, etc. etc. Best not done by hand; fortunately, the TimeSeries machinery in Mathematica is perfect for this kind of stuff. All operations will be carried out on the intensities, and the time stamps will be correctly and automatically carried along. Creating a TimeSeries object from your data is simple: TimeSeries[data].

With that in hand, we can apply appropriate Savitzky - Golay filters to smooth the data, and to obtain its smoothed first and second derivatives:

{smoothed, firstderivative, secondderivative} = 
 ListConvolve[SavitzkyGolayMatrix[{10}, 3, #], TimeSeries[data]] & /@ 
  Range[0, 2] 

This applies a smoothing kernel of radius 10 (hand-wavingly, considering runs of ten points in your data), performs polynomial regression of degree 3 (pretty standard choice), and produces the $n^{th}$ derivative. With $n=0$ you get smoothed data, with $n=(1,2)$ you get the smoothed first and second derivatives, respectively:

We can then use DateListPlot to show the results. We can select a particular time window to plot using TimeSeriesWindow, to focus on the region at 50 to 100 seconds (or minutes, or whatever your time unit is, which you did not specify): that is where your peaks are

Here are smoothed data and first derivative:

DateListPlot[
  TimeSeriesWindow[#, {52, 105}] & /@ {smoothed, 5 firstderivative},
  PlotStyle -> {Black, Red}, PlotRange -> All,
  GridLines -> {None, {0}}, GridLinesStyle -> Darker@Gray,
  DateTicksFormat -> {"Minute", ":", "Second"},
  PlotLegends -> {"smoothed data", "first derivative"}
]

data and first derivative

... and here are smoothed data and second derivative:

DateListPlot[
  TimeSeriesWindow[#, {52, 105}] & /@ {smoothed, 30 secondderivative},
  PlotStyle -> {Black, Blue}, PlotRange -> All,
  GridLines -> {None, {0}}, GridLinesStyle -> Darker@Gray,
  DateTicksFormat -> {"Minute", ":", "Second"},
  PlotLegends -> {"smoothed data", "second derivative"}
]

data and second derivative

Much better, no?

Alright, that is a good starting point. We can work with this. So now we "only" have to:

  • find the zeroes of the first derivative (the positions of the peaks, for reference);
  • find the relevant zeroes of the second derivative (the positions of the inflection points), two for each peak ("left" and "right");
  • calculate the values of the first derivative at the inflection point, derive the equation of the tangent line through that point with that slope.
  • estimate a LOCAL baseline for each peak (there obviously is no global baseline here, since the drift is significant); perhaps subtract it from the peak?
  • calculate the intersection between baseline and that tangent.
  • repeat for the other side;
  • repeat for all peaks.

I hope to convey that this is a very complicated task. I am not going to attempt the rest, as it is laborious and time-consuming. But I do strongly encourage you to do so, if you still want to! You will learn A LOT if you do.

MarcoB
  • 67,153
  • 18
  • 91
  • 189
  • Very impressive! May I please ask you to comment on the problem of very sharp peaks, resembling Exp[-Abs[x]], with their two inflection points almost coinciding. I thought that symmetric smoothing filters would blur the peak, resulting in either underestimation of the derivative's absolute values, or too much noise, depending on the filter size. My idea was to convolve the signal with a causal filter strongly skewed to zero, to estimate one side of the peak, and its time reversed version for another, so that the slopes wouldn't cancel each other out. What do you think? – aooiiii Jun 06 '20 at 03:29
  • @aooiiii Smoothing will certainly blur the peak, but should not typically modify the position of the maximum; the magnitude of the derivative is rarely important in peak detection, its zeroes more so. I'd be interested in seeing your approach though, but I was wondering how often one would encounter such sharp peaks as Exp[Abs[x]]in physical processes. – MarcoB Jun 06 '20 at 04:16
  • Could it be that I misunderstood something? According to my approach, the slopes of the tangent lines are defined to be equal to the maximum and minimum of the two derivative estimates. Thus underestimating the derivative's magnitude would have resulted in flatter tangent lines. Regarding Exp[-Abs[x]], I meant that the curve, although smooth, has peaks with tiny rounded corners and inflection points a few sampling periods apart. Precisely this region, smaller than the filter size, defines the slopes of the tangent lines, as the derivative there is the highest in magnitude. I thought. – aooiiii Jun 06 '20 at 05:54
  • MarcoB this is a very impressive code. Yes, you are right. Perhaps I am not aware of how difficult (and interesting) this problem is. I will take your code as basis and if I solve it I will post it here so that other people in the community can benefit as well. Thanks – John Jun 06 '20 at 15:57
  • MarcoB do you think it is better and easier to create a code that allows the user to select the tangent (out of the many possibles) that the user thinks is correct to get the onset base on the baseline of 0 and once this is done the code will give the onset value. This would be similar to what some old thermogram softwares do as well?, Would this be more approachable to do than the complications that arise from doing it similar to how you are doing it? – John Jun 06 '20 at 17:28
  • @john well yes no doubt it would be easier, but interactive manipulation and user input are also kind of hard in Mathematica, and conceptually then two different people could get two different numbers from the same curve that way. Acceptable at least as a first approximation though. Besides, these onset offset temperatures are not that accurate anyway. – MarcoB Jun 06 '20 at 19:09
  • @aooiiii no you’re right in what you said. It’s just that, for the level of accuracy required of this calculation, the issue you raise may end up being of relatively minor Importance. – MarcoB Jun 06 '20 at 19:10
  • @MarcoB Unfortunately, my Mathematica skills are not good enough and I was not able to complete the program to sucesfully find the onset temperature. I added a bounty to this question in case you want to take on the remaining of the challenge. – John Jun 10 '20 at 00:33
  • @MarcoB I gave you the bounty of this question because of all the effor you have put in to help me! This is very much appreciate and keep helping people ! Thanks – John Jun 16 '20 at 18:24
  • @John I appreciate it! Thank you! – MarcoB Jun 16 '20 at 20:26