
Final edit: I found someone who solved my problem in MATLAB, so I won't try the proposed solutions now. I'm leaving the question up in case someone finds the proposed solutions useful in the future. Thanks to everyone who tried to help me.

I have CSV data with measurement points. These points follow an exponential curve in regularly repeating intervals.

edit: had to remove the plot

Now I need to fit an exponential function to, for example, the time interval between 6 and 9 minutes (the second gray bar). I created a new data file for this that contains only the measurement points between 6 and 9 min. But if I try the usual:

data=Import["mydataselection","CSV"];
nlm=NonlinearModelFit[data[[All,{1,2}]],a Exp[-k t+m],{a,k,m},t]

I get completely wrong numbers for a, k and m; in this case $a=0.142679$, $m=0.441977$ and $k=0.122386$. (By the way, I think I need the $+m$ somewhere because I should get the same "a" and "k" for the time intervals 6 to 9, 12 to 15, etc.)

The plot of this exponential function looks like this (image not shown): it shows the data points for 6 to 9 min and, on the right, the exponential fit with a Exp[-k t], which clearly doesn't fit.

In another case (working with another CSV file) I even got a negative "a", although the curve is very similar.

Maybe it's also important to note that I have a lot of data points (from 6 to 9 min it's about 24000 data points).

Can anyone help me out here?

Edit: Maybe I should note that the CSV data looks like this: x1,y1, x2,y2, ...

edit2: I can't post more pictures or the actual data because I need 10 "reputation" for that.

Also, I removed the "+m", but I still can't get proper values. For example, for 9 to 12 min I get a Exp[k t] with both "a" and "k" positive.

Edit3: I switched the second pic for a better one.

  • Welcome to Mathematica.SE! 1) As you receive help, try to give it too, by answering questions in your area of expertise. 2) Take the tour and check the faqs!
    3) When you see good questions and answers, vote them up by clicking the gray triangles, because the credibility of the system is based on the reputation gained by users sharing their knowledge. Also, please remember to accept the answer, if any, that solves your problem, by clicking the checkmark sign!
    –  Feb 20 '16 at 15:22
  • The first part of your question about exponential fitting is discussed in many threads on this site. The second part of your question is essentially about simultaneous fitting of one model to multiple datasets, which was also discussed in detail here. If you still have difficulties, please clarify what causes trouble and also provide your dataset (you can put it on http://pastebin.com/ and publish a link here). – Alexey Popkov Feb 20 '16 at 15:27
  • You need to share your data in order for members to resolve your problem. I have seen people using a site called Pastebin in order to share data files. – Jack LaVigne Feb 20 '16 at 15:28
  • You have an identifiability issue in that you really only have 2 distinct parameters rather than 3. Note that $a e^{-k t + m}=a e^{m} e^{-k t}=c e^{-k t}$. So rewriting your model as $c e^{-k t}$ should get you more consistent results. – JimB Feb 20 '16 at 15:51
  • I tried removing the +m. It didn't help me get better results. – Christopher Bee Feb 20 '16 at 16:09
  • The link to the dataset for 6 to 9 min: link. I can't paste the whole dataset there because it's too big. – Christopher Bee Feb 20 '16 at 16:10
  • To answer Louis' question: as far as I can tell I am following the usual instructions; at least I can't find anything wrong with my code. But I still can't get Mathematica to give me proper values for a and k. Is it because I have too many data points? If yes: how can I get rid of this problem without deleting data points by hand? – Christopher Bee Feb 20 '16 at 16:12
  • If you rewrite the model as a Exp[-k(t-t0)] where you know t0 as the starting point for a particular interval, then you'll get more consistent results among intervals for the estimates of a and k. – JimB Feb 20 '16 at 16:31
  • @JimBaldwin I shifted the data points from 6-9 to 0-3. I still get what is shown in the second picture. – Christopher Bee Feb 20 '16 at 16:33
  • Since the data is claimed to be exponential, the first thing I do is a ListPlot of the Log of the y values and expect to see a line. That plot is not a line. Log Log of that is not a line. So perhaps part of the reason you are having problems fitting to an exponential is that the data isn't really exponential. Plot your data. Do transforms on the y values until you get a (noisy) line. Then use the inverse of that transform as your model to fit. – Bill Feb 20 '16 at 17:06
  • @ChristopherBee Please include this code in your answer (it Imports your first dataset): dataset1 = Most /@ Import["http://pastebin.com/raw/RnMJkaPv", "CSV"];. Please include full code which generates the plots which you show in your question and also please clear out the question: it contains a lot of unnecessary wording and has a deficiency of actual data and no clear explanation of what you have tried and what you have got! – Alexey Popkov Feb 20 '16 at 17:35
  • After following @Bill 's advice, try FindFormula[data, t, 10], which will give you ten potential models (either before or after taking logs); these suggest the fit is more complicated than a simple linear function of t or an exponential of a linear function of t. Also, you should state whether the objective requires fitting to some predetermined functional form or if you just need to obtain a predictive summary. – JimB Feb 20 '16 at 17:44
  • I now wonder if m was misplaced in the original formula. nlm = NonlinearModelFit[data, {a Exp[-k t] + m, m < Min[data[[All, 2]]]}, {a, k, {m, 0.075}}, t] seems to provide a reasonable fit. – JimB Feb 20 '16 at 18:05
  • If the Matlab solution is online, would you post a link to that answer? – JimB Feb 21 '16 at 00:15
  • @Jim Sadly it's not online. A friend came by and solved it. If you like, I can try to give you a short overview of his steps (as well as I can remember). – Christopher Bee Feb 21 '16 at 07:48
  • I think that would be informative. Thanks! – JimB Feb 21 '16 at 08:01
  • @JimBaldwin We isolated the data of one time interval. MATLAB shifted it automatically so that the x values started at 0 (and reached up to 3 minutes for the shown dataset). Then he subtracted the lowest y-value from all y-values and did the exponential fit. After that he added the lowest y-value back. All this resulted in very good fits of the form m+a Exp(-k t), although he had to "teach" MATLAB that "a" can be negative too (for example for the blue segment in my first pic). I hope this helps. I think it's interesting that the fits were that good, since some told me the data was not exponential. – Christopher Bee Feb 21 '16 at 08:49
  • Thanks. That makes sense. I'd argue that you want to keep m as a parameter to be estimated along with all of the others as that way you can get defensible estimates of precision for all of the parameters and if, say, one of the time intervals was incomplete, you'd have comparable estimates of m. Also, the person who pointed out that you really didn't have a standard exponential curve is correct with the original model proposed. It only has a part of its final definition as a standard exponential. That was a good hint that something else needed to be considered. – JimB Feb 21 '16 at 16:40

2 Answers

4

I assume that you need to compare parameters among the different 3-minute time intervals. To do so you'll need to construct a functional form that allows such a comparison. You mention an "exponential" curve but that needs more specificity. One possibility is the following.

First set the beginning of the time period: t0.

t0 = Min[data[[All, 1]]];

Then fit the following model:

nlm = NonlinearModelFit[data, {a Exp[-k (t - t0)] + m, m < Min[data[[All, 2]]]},
 {a, k, {m, 0.075}}, t]
nlm["BestFitParameters"]
(* {a->0.03810162618882989, k->1.0157428075372188, m->0.07725243271210368} *)
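As a side note on why the shift by t0 matters when comparing a across intervals, here is a short derivation sketch. The symbols $a'$ (the amplitude of the unshifted model) and $\Delta$ (the offset between interval start times) are introduced only for this illustration:

$$a\,e^{-k (t - t_0)} + m = \left(a\,e^{k t_0}\right) e^{-k t} + m = a'\,e^{-k t} + m, \qquad a' = a\,e^{k t_0}.$$

The two forms describe the same curve, but $a'$ is the curve extrapolated back to $t = 0$. With the estimates above, and assuming $t_0 \approx 6$ for this interval, $a' \approx 0.0381\,e^{6.09} \approx 17$, far outside the range of the data. For an otherwise identical segment starting $\Delta$ minutes later, $a'$ would be larger by a factor $e^{k \Delta}$, whereas the shifted amplitude $a$ would stay the same.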

Then display the fit and various residual plots:

(* Collect some summary statistics *)
residuals = nlm["FitResiduals"];
predicted = nlm["PredictedResponse"];
residMean = Mean[residuals];
residSD = StandardDeviation[residuals];

(* Data and predictions *)
Show[ListPlot[data, PlotStyle -> LightGray], 
 Plot[nlm[t], {t, 6, 9}, PlotStyle -> Red],
 PlotLabel -> "Data and prediction", Frame -> True]

Data and fit

(* Predicted vs. Residual *)
ListPlot[Transpose[{predicted, residuals}],
 PlotLabel -> "Predicted vs. Residual",
 Frame -> True, PlotStyle -> LightGray]

Predicted vs residual

(* Histogram of residuals *)
Show[{Histogram[residuals, Automatic, "PDF"],
 Plot[PDF[NormalDistribution[residMean, residSD], x], {x, Min[residuals], Max[residuals]}]},
 PlotRange -> All, PlotLabel -> "Histogram of residuals", Frame -> True]

Histogram of residuals

(* Quantile plot *)
QuantilePlot[residuals, PlotLabel -> "QuantilePlot", Frame -> True]

Quantile plot

The histogram and quantile plot suggest some departures from the assumed normality (a slightly heavy right tail and a slightly light left tail - i.e., residuals are skewed to the right a bit). Given the large amount of data, the standard errors for the parameter estimates might be a bit off but probably nothing to worry about.
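To look at the standard errors mentioned above, the fitted model can be asked for its parameter table. The following is only a minimal sketch on synthetic data: the true values, noise level, and starting values are made up for illustration and are not the asker's dataset.

```mathematica
(* Synthetic stand-in data: decaying exponential plus offset with Gaussian noise.
   All numbers here are assumptions for illustration only. *)
SeedRandom[1];
t0 = 6;
synthetic = Table[
   {t, 0.04 Exp[-1.0 (t - t0)] + 0.077 +
     RandomVariate[NormalDistribution[0, 0.001]]},
   {t, 6, 9, 0.01}];

(* Same model form as above, with explicit starting values for all parameters *)
fit = NonlinearModelFit[synthetic, a Exp[-k (t - t0)] + m,
   {{a, 0.05}, {k, 1}, {m, 0.07}}, t];

(* Estimates with standard errors, t-statistics and p-values *)
fit["ParameterTable"]

(* Confidence intervals for the parameters *)
fit["ParameterConfidenceIntervals"]
```

The same two property calls should work on the nlm object above; whether the reported standard errors can be trusted still depends on the residual diagnostics.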

JimB
  • +1 for identifying the crucial factor of adding an offset m. As you pointed out in your earlier comment the time offset, t0 is really not needed. However, I think it is a nice touch and probably useful when describing real physical systems so that the amplitude is significant. – Jack LaVigne Feb 21 '16 at 14:59
  • @JackLaVigne. Thanks. I agree with what you say about the amplitude: t0 is needed to allow for meaningful comparisons of the a parameter across datasets. – JimB Feb 21 '16 at 16:32
0

I think I saw this data once on this site and the plots look suspiciously familiar ... but whatever, a start to discuss:

myData = Import["data.txt", "Data"]


model = -a  x^3 + b  x^2 - c x + d 

myFit = Normal[NonlinearModelFit[myData, model, {a, b, c, d}, x]]

$-0.00160329 x^3+0.0411292 x^2-0.354777 x+1.10907$

lp1 = ListPlot[{myData}]
p2 = Plot[myFit, {x, 6, 9}, PlotStyle -> Red]
Show[lp1, p2]


  • I don't know what you mean by "suspicious". I know that you can't have seen this exact dataset because it was taken two months ago and not posted anywhere. You might have seen similar plots because it's from a "special" way to analyze certain molecules (I don't want to touch on chemistry in this forum ^^). – Christopher Bee Feb 21 '16 at 16:03
  • Now to your "solution": as I stated in my last edit, the dataset could be fitted very nicely with an exponential function. And since in the end I needed the "k" value, your solution doesn't solve the problem. – Christopher Bee Feb 21 '16 at 16:08