15

I have data that I want to fit with a constant plus a combination of Gaussians, with their actual number being one parameter of the fit (besides the average and the variance of every one of them).

How can I carry out and test such procedure in Mathematica?

If fitting discrete values is not possible, how can implement a series of fits for different number of Gaussians and a selection criteria the best fit?

A representative example of the data:

dat = Uncompress@Import["http://pastebin.com/raw.php?i=VHZ7XJAi"];
ListPlot[dat, PlotRange -> All, PlotTheme -> "Scientific"]

Mathematica graphics

rhermans
  • 36,518
  • 4
  • 57
  • 149
psmith
  • 815
  • 2
  • 7
  • 20
  • 2
    Please describe your data. – Sjoerd C. de Vries Sep 07 '15 at 14:57
  • 2
    If the number of Gaussians is a free parameter, wouldn't the search algorithm will most likely make it diverge? – rhermans Sep 07 '15 at 15:04
  • 1
    It may be helpful to share your code attempts AND representative data, hopefully in a well formatted form, so we can quickly see the problem you are facing. Examples of how to share your data here.Please edit your question accordingly. – rhermans Sep 07 '15 at 15:15
  • @rhermans yes, you're right. I know that there are techniques in statistics to avoid such overfitting, I was thinking if there would be something already implemented in Mathematica. – psmith Sep 07 '15 at 15:17
  • 3
    @rhermans: I don't think the number of Gaussians would diverge in the strictest sense; but since each Gaussian has 2 free parameters, a sum of $(n-1)/2$ Gaussians plus a constant will be able to fit a generic set of $n$ data points perfectly. – Michael Seifert Sep 07 '15 at 15:17
  • 5
    I don't think that there's any support for fitting to discrete parameters in Mathematica (though I could be wrong.) You might consider doing a series of fits to 1, 2, 3, ... Gaussians (which Mathematica can do) and then applying the Aikaike information criterion or a similar criterion to pick the best one. – Michael Seifert Sep 07 '15 at 15:21
  • @MichaelSeifert, don't $n+1$ Gaussians always fit better than $n$ Gaussians? You need to apply some kind of constrain. – rhermans Sep 07 '15 at 15:22
  • @MichaelSeifert thanks, I was thinking about something like that, but I didn't know about Akaike information criterion. – psmith Sep 07 '15 at 15:28
  • 4
    @rhermans: Not if you run out of data points to fit. For example, you could fit $n$ data points $(x_i, y_i)$ perfectly with $n$ Gaussians by letting them have infinitesimally small width, centering the $i$th Gaussian on $x_i$, and setting its height to be $y_i$. Adding another Gaussian on top of this wouldn't increase the goodness of the fit. Allowing the widths of the Gaussians to vary would presumably reduce the number required (though I'm not 100% about the estimate I gave above.) – Michael Seifert Sep 07 '15 at 15:31
  • @MichaelSeifert you are correct. Thanks. – rhermans Sep 07 '15 at 15:33
  • thanks all for your insights. I see now that the problem is more hard than I was expecting and I have first to understand the statistics behind it before trying to implement in Mathematica. Should I delete the question? – psmith Sep 07 '15 at 15:36
  • 2
    I also wanted to add that the experimental function FindFormula may well be able to do this in some future iteration, depending on how Wolfram develops it. As of now (v10.2), it appears to be limited to a small number of target functions – Michael Seifert Sep 07 '15 at 15:56
  • @rhemans the fake data you used are representative of my situation and your answer is definitely useful. I don't have the data close at hand at the moment but I'll add them asap. – psmith Sep 08 '15 at 10:57
  • 2
    These answers may be helpful. – KennyColnago Sep 08 '15 at 14:57

1 Answers1

35

Solution

Following @MichaelSeifert's advice, here is a working solution.

First define a compact expression for a Gaussian

g[x_, xo_, σ_, a_] :=  a Exp[-((x - xo)^2/(2 σ^2))] /(σ Sqrt[2 π])

To get

g[x, xo, σ, Ao]

Mathematica graphics

Creates a list of variables for the $k^{\text{th}}$ term.

kvar[k_Integer] := 
 ToExpression@
  Map[StringJoin[#, ToString[k]] &, {"x", "σ", "a"}]

Mathematica graphics

Generate a model equation to fit $n$ Gaussians using Sequence to place the output of kvar as arguments for g

gmodel[n_Integer] := Sum[
  g[x, Sequence @@ kvar[i]]
  , {i, 1, n}
  ]

Generate a list of parameters for the $n$ Gaussian model.

gpars[n_Integer] := Flatten@Array[kvar, n]

When evaluated these functions look like this:

Mathematica graphics

Find fit with minimal Akaike Information Criterion ( AIC ) by calculating a series of fits with 1, 2, ...maxn Gaussian, and selecting the fit with the smallest "AIC" as defined in the NonlinearModelFit documentation.

fitg[data_, maxn_Integer] := MinimalBy[
   Table[
    {#, #["AIC"]}& @ NonlinearModelFit[data, gmodel[n], gpars[n], x]
    , {n, maxn}
    ], Last][[1, 1]]

Usage: fitg[data, maxn] where maxn is the maximum allowed number of Gaussians.

Possibly fitg[data, Length[data]]

Test

Example of data

dat = Uncompress@Import["http://pastebin.com/raw.php?i=VHZ7XJAi"]

Find fit with up to 6 Gaussians and plot

Show[
 ListPlot[dat, PlotStyle -> Red],
 Plot[Evaluate[Normal[fitg[dat, 6]]], {x, -1, 2}]
 ]

Mathematica graphics

fitg[dat, 6]["ParameterTable"]

Mathematica graphics

In this case, only three Gaussians were necessary.

chris
  • 22,860
  • 5
  • 60
  • 149
rhermans
  • 36,518
  • 4
  • 57
  • 149