10

I want to use a distribution I have only aggregate statistics on, namely its CDF sampled at certain points. I would like to keep it "nonparametric" (remain noncommittal on the parametric form), but I need some interpolation over these points. Also, I would like to use a PDF of the distribution. So I would need an interpolation that is monotone and differentiable over a list. What is there to do?

(FWIW: With Interpolation[], only a first-order interpolation is monotone, but then there are non-differentiable breakpoints in the piecewise linear function.)

Here is a list I would interpolate over:

EDIT: This is the US taxable income distribution for 2009 which I manually extended with two endpoints to help it look like a CDF. This latter point might be silly, let me know how you'd do better.

{{0, 0}, {5000., 0.00373625}, {10000., 0.0269361}, {15000., 0.0621601}, {20000., 0.121612},
 {25000., 0.178247}, {30000., 0.234482}, {40000., 0.3516}, {50000., 0.453972},
 {75000., 0.654898}, {100000., 0.78906}, {200000., 0.952381}, {500000., 0.991166},
 {1.*10^6, 0.997134}, {1.5*10^6, 0.998442}, {2.*10^6, 0.998977}, {5.*10^6, 0.999727},
 {1.*10^7, 0.9999}, {10000000000, 1}}
J. M.'s missing motivation
  • 124,525
  • 11
  • 401
  • 574
László
  • 345
  • 2
  • 10

3 Answers3

14

The code in Heike's answer is a bit long, but only because it does not exploit the fact that piecewise Hermite interpolation is already supported by Mathematica as Interpolation[]. Thus, here is a shorter way to implement Fritsch-Carlson monotonic cubic interpolation:

fcint[data_] := Module[{del, slopes, tau},
                       del = #2/#1 & @@@ Differences[data];
                       slopes = Flatten[{del[[1]], MovingAverage[del, 2], del[[-1]]}];
                       tau = MapThread[Min, Through[{Append, Prepend}[Min[#, 1] & /@
                                       (3 del/(Norm /@ Partition[slopes, 2, 1])), 1]]];
                       Interpolation[Transpose[{List /@ data[[All, 1]], data[[All, -1]],
                                slopes tau}], InterpolationOrder -> 3, Method -> "Hermite"]]

Try it out:

data = {{0, 0}, {5000., 0.00373625}, {10000., 0.0269361}, {15000., 0.0621601},
        {20000., 0.121612}, {25000., 0.178247}, {30000., 0.234482}, {40000., 0.3516},
        {50000., 0.453972}, {75000., 0.654898}, {100000., 0.78906}, {200000., 0.952381},
        {500000., 0.991166}, {1.*10^6, 0.997134}, {1.5*10^6, 0.998442}, {2.*10^6, 0.998977},
        {5.*10^6, 0.999727}, {1.*10^7, 0.9999}, {10000000000, 1}};

fun = fcint[data];

Plot[fun[x], {x, 0, 10^6}, Axes -> None,
     Epilog -> {Red, AbsolutePointSize[4], Point[data]}, Frame -> True, PlotRange -> All]

plot of Fritsch-Carlson piecewise cubic interpolant


I actually have implemented various methods for producing piecewise monotonic interpolants in Mathematica, but I don't think this is the right place to list them all. (Maybe for another time and question...) In the meantime, however, I still want to show that the method presented above is not the only (much less the best) method for doing monotonic interpolation. For instance, here is another method, also due to Fritsch and Carlson:

PCHIPEnds[h1_, h2_, d1_, d2_] := With[{d = ((2 h1 + h2) d1 - h1 d2)/(h1 + h2)}, 
          Which[Sign[d] != Sign[d1], 0, 
                Sign[d1] != Sign[d2] && Abs[d] > 3 Abs[d1], 3 d1,
                True, d]]

PCHIPInterpolation[data_] := Module[{dTrans = Transpose[data], del, h},
    h = Differences[First[dTrans]]; del = Differences[Last[dTrans]]/h;
    Interpolation[Transpose[{List /@ First[dTrans], Last[dTrans], Flatten[
                  {PCHIPEnds[h[[1]], h[[2]], del[[1]], del[[2]]], 
                   If[Equal @@ Sign[Last[#]] && And @@ Thread[Last[#] != 0], 
                      3 Total[First[#]]/Total[({{1, 2}, {2, 1}}.First[#])/Last[#]], 0] & /@
                      Transpose[Partition[#, 2, 1] & /@ {h, del}], 
                   PCHIPEnds[h[[-1]], h[[-2]], del[[-1]], del[[-2]]]}]}], 
                  InterpolationOrder -> 3, Method -> "Hermite"]]

This is in fact the same algorithm behind the MATLAB function pchip().

Try it out:

fun = PCHIPInterpolation[data];

Plot[fun[x], {x, 0, 10^6}, Axes -> None,
     Epilog -> {Red, AbsolutePointSize[4], Point[data]}, Frame -> True, PlotRange -> All]

plot of "PCHIP" interpolant


As an addendum: the Fritsch-Carlson paper notes that the interpolant from Akima's (1970) interpolation method (which can also be recast as a piecewise cubic Hermite interpolant) is not guaranteed to preserve the monotonicity of the data (thus, I don't recommend its use for constructing an interpolated CDF). If, even with this caveat, you still want to use the Akima interpolant, here's a short routine for producing it:

AkimaInterpolation[data_] := Module[{dy},
  dy = #2/#1 & @@@ Differences[data];
  Interpolation[Transpose[{List /@ data[[All, 1]], data[[All, -1]], 
                With[{wp = Abs[#4 - #3], wm = Abs[#2 - #1]}, 
                     If[wp + wm == 0, (#2 + #3)/2, (wp #2 + wm #3)/(wp + wm)]] & @@@
                Partition[Join[{{3, -2}, {2, -1}}.Take[dy, 2], dy,
                               {{-1, 2}, {-2, 3}}.Take[dy, -2]], 4, 1]}], 
                InterpolationOrder -> 3, Method -> "Hermite"]]

(There is a newer (1991) cubic interpolant also due to Akima, but the algorithm for producing it is a bit more involved. It, too, does not preserve monotonicity.)

Try it out:

fun = AkimaInterpolation[data];

Plot[fun[x], {x, 0, 10^6}, Axes -> None,
     Epilog -> {Red, AbsolutePointSize[4], Point[data]}, Frame -> True, PlotRange -> All]

plot of Akima interpolant

J. M.'s missing motivation
  • 124,525
  • 11
  • 401
  • 574
10

Following the method on the wikipedia page mentioned in the comments, I came up with this

interp[pts_] := Module[{delta, mlst, zeropos, tau, h00, h01, h10, h11},
   delta = #2/#1 & @@@ Differences[pts];
   mlst = Flatten[{delta[[1]], MovingAverage[delta, 2], delta[[-1]]}];
   tau = Min[#, 1] & /@ (3 delta/Sqrt[(Most[mlst]^2 + Rest[mlst]^2)]);
   tau = MapThread[Min, {ArrayPad[tau, {0, 1}, 1], ArrayPad[tau, {1, 0}, 1]}];
   mlst = mlst tau;
   h01[t_] := -2 t^3 + 3 t^2;
   h00[t_] := h01[1 - t];
   h11[t_] := t^3 - t^2;
   h10[t_] := -h11[1 - t];
   Function[{x},
    Evaluate@Piecewise[MapThread[
       Function[{plow, phigh, mlow, mhigh},
        Module[{diff, t},
         diff = phigh[[1]] - plow[[1]];
         t = (x - plow[[1]])/diff;
         {Expand[plow[[2]] h00[t] + diff mlow h10[t] + phigh[[2]] h01[t] +
           diff mhigh h11[t]], plow[[1]] <= x <= phigh[[1]]}]],
       {Most[pts], Rest[pts], Most[mlst], Rest[mlst]}]]]];

fun = interp[pts];

Plot[fun[x], {x, 0, 10^6}, Exclusions -> None, PlotRange -> All, Epilog -> Point[pts]]

Mathematica graphics

Heike
  • 35,858
  • 3
  • 108
  • 157
  • 1
    On version 7 mlst = Join[{delta[[1]]}, (Most@delta + Rest@delta)/2, {delta[[-1]]}] is faster; can you confirm for v8? Also, consider tau = MapThread[Min, {{##, 1}, {1, ##}}] & @@ tau – Mr.Wizard May 23 '12 at 15:56
  • @Heike, This is impressive, but on a more detailed list (113 elements) I got gaps in the interpolation. What could cause that? Shame I cannot decipher on my own. – László May 23 '12 at 17:50
8

Update: The link in the late Jens-Peer Kuska's MathGroup post is no longer working, and it seems there are no other locations on the web to download the package from. So, here I post the contents of the package with gratitute to Jens-Peer Kuska for his continuing service to the Mathematica community.

Nonparametric Splines package - Jens-Peer Kuska

 BeginPackage["NonParametricSplines`"]

 CubicSplineInterpolation::usage="CubicSplineInterpolation[x,y] compute a cubic spline interpolation function. It return a CubicSpline[] function, the function should have smooth first and second derivatives.."
 CubicSpline::usage="CubicSpline[] represent a interpolation function returned by    CubicSplineInterpolation[]."

 AkimaSplineInterpolation::usage="AkimaSplineInterpolation[x,y] compute a Akima spline interpolation. It return a AkimaSpline[] function with smooth first derivative."
 AkimaSpline::usage="AkimaSpline[] represent a interpolation function returned by AkimaSplineInterpolation[].."

 ExtractSplineData::usage="ExtractSplineData[spline_] return the list of the original {x,y} pairs."

 Begin["`Private`"]

 ExtractSplineData[CubicSpline[x_,y_,__]]:=Transpose[{x,y}]
 ExtractSplineData[AkimaSpline[x_,y_,__]]:=Transpose[{x,y}]

 SplineDataRange[CubicSpline[x_,__]]:={Min[#],Max[#]} &[x]
 SplineDataRange[AkimaSpline[x_,__]]:={Min[#],Max[#]} &[x]

 CubicSplineInterpolation[x_, y_, yp1_, ypn_] :=
    Module[{n, u, d2y, i, sig, p, qn, un},
    n = Length[y];
    u = Table[0, {n}];
    d2y = u;
    If[NumericQ[yp1],
    d2y[[1]] = 0.5;
    u[[1]] = (3.0/(x[[2]] - x[[1]]))*((y[[2]] - y[[1]])/(x[[2]] - x[[1]]) - yp1);
   ];
   Do[
   sig = (x[[i]] - x[[i - 1]])/(x[[i + 1]] - x[[i - 1]]);
   p = sig*d2y[[i - 1]] + 2.0;
   d2y[[i]] = (sig - 1.0)/p;
   u[[i]] = (y[[i + 1]] - y[[i]])/(x[[i + 1]] - x[[i]]) - (y[[i]] -  y[[i - 1]])/(x[[i]] - x[[i - 1]]);
   u[[i]] = (6.0*u[[i]]/(x[[i + 1]] - x[[i - 1]]) - sig*u[[i - 1]])/p, {i, 2, n - 1} ];
   If[NumericQ[ypn],  qn = 0.5;  un = (3.0/(x[[n]] - 
     x[[n - 1]]))*(ypn - (y[[n]] - y[[n - 1]])/(x[[n]] - 
      x[[n - 1]])),
    un = qn = 0.0 ];
   d2y[[n]] = (un - qn*u[[n - 1]])/(qn*d2y[[n - 1]] + 1.0);
   Do[d2y[[i]] = d2y[[i]]*d2y[[i + 1]] + u[[i]],  {i, n - 1, 1, -1}];
   CubicSpline[x, y, d2y]
   ]

 CubicSplineInterpolation[xy : {{_?NumericQ, _?NumericQ} ..}, yp1_, ypn_] :=
    CubicSplineInterpolation[Sequence @@ Transpose[xy], yp1, ypn]

 CubicSplineInterpolation[xy : {{_?NumericQ, _?NumericQ} ..}] := 
  CubicSplineInterpolation[Sequence @@ Transpose[xy], Automatic, Automatic]

 spline = Compile[{{x, _Real, 1}, {y, _Real, 1}, {d2y, _Real, 1}, {t, _Real}},
    Module[{a, b, h, low = 1, hi = Length[x], j},
    While[hi - low > 1,  j = Quotient[hi + low, 2];
     If[x[[j]] > t, hi = j, low = j]   ];
     h = x[[hi]] - x[[low]];
     a = (x[[hi]] - t)/h;
     b = 1 - a;
     a*y[[low]] + b*y[[hi]] + ((a^3 - a)*d2y[[low]] + (b^3 - b)*d2y[[low]])*h^2/6
]];

dspline = Compile[{{x, _Real, 1}, {y, _Real, 1}, {d2y, _Real, 1}, {t, _Real}},
    Module[{a, b, h, low = 1, hi = Length[x], j},
      While[hi - low > 1,
       j = Quotient[hi + low, 2];
        If[x[[j]] > t, hi = j, low = j]];
       h = x[[hi]] - x[[low]];
       a = (x[[hi]] - t)/h;
       b = 1 - a;
      (y[[hi]] - y[[low]])/h +h*((3*b^2 - 1)*d2y[[hi]] - (3*a^2 - 1)*d2y[[low]])/6
]];    

 ddspline =  Compile[{{x, _Real, 1}, {y, _Real, 1}, {d2y, _Real, 1}, {t, _Real}},
    Module[{a, b, h, low = 1, hi = Length[x], j},
      While[hi - low > 1, j = Quotient[hi + low, 2];
        If[x[[j]] > t, hi = j, low = j] ];
        h = x[[hi]] - x[[low]];
        a = (x[[hi]] - t)/h;
        b = 1 - a;
       b*d2y[[hi]] + a*d2y[[low]] ]];

 CubicSpline[x_, y_, d2y_][t_?NumericQ] := spline[x, y, d2y, t]
 CubicSpline[x_, y_, d2y_]'[t_?NumericQ] := dspline[x, y, d2y, t]
 CubicSpline[x_, y_, d2y_]''[t_?NumericQ] := ddspline[x, y, d2y, t]
 Format[CubicSpline[x_, _, _]] := CubicSpline[{First[x], Last[x]}, "<>"]

 AkimaSplineInterpolation[x_List, y_List, yp1_, ypn_] /; Length[x] == Length[y] :=
  Module[{i, n, m, t, tmp1, tmp2},
    n = Length[x];
    m = Table[0, {n + 4}];
    Do[ m[[i + 2]] = (y[[i + 1]] - y[[i]])/(x[[i + 1]] - x[[i]]), {i, 1, n - 1}];
      If[NumericQ[yp1],  m[[1]] = m[[2]] = yp1,   m[[1]] = m[[2]] = m[[3]]  ];
      If[NumericQ[ypn],   m[[n + 3]] = m[[n + 4]] = ypn,
        m[[n + 3]] = m[[n + 4]] = m[[n - 1]]  ];
      m=N[m]; 
      t = Table[0, {n}];
    Do[ tmp1 = Abs[m[[i + 3]] - m[[i + 2]]];
        tmp2 = Abs[m[[i + 1]] - m[[i]]];
        t[[i]] = If[tmp1 + tmp2 > 0, (* Then *)
       (tmp1*m[[i + 1]] + tmp2*m[[i + 2]])/ (tmp1 + tmp2),
      (* Else *)   (m[[i + 1]] + m[[i + 2]])/2 ], {i, 1, n}];
   AkimaSpline[x, y, t]
  ]

 AkimaSplineInterpolation[xy : {{_?NumericQ, _?NumericQ} ..}, yp1_, ypn_] :=
    AkimaSplineInterpolation[Sequence @@ Transpose[xy], yp1, ypn]

 AkimaSplineInterpolation[xy : {{_?NumericQ, _?NumericQ} ..}] :=
    AkimaSplineInterpolation[Sequence @@ Transpose[xy], Automatic, Automatic]

 aspline =  Compile[{{x, _Real, 1}, {y, _Real, 1}, {ta, _Real, 1}, {t, _Real}},
    Module[{a, b, c, d, h, del, tmp, low = 1, hi = Length[x], j},
     While[hi - low > 1,  j = Quotient[hi + low, 2];
      If[x[[j]] > t, hi = j, low = j]  ];
      h = 1.0/(x[[hi]] - x[[low]]);
      a = y[[low]];
      b = ta[[low]];
      tmp = (y[[hi]] - y[[low]])*h;
      c = (3.0*tmp - 2.0*ta[[low]] - ta[[hi]])*h;
      d = (ta[[low]] + ta[[hi]] - 2.0*tmp)*h*h;
      del = t - x[[low]];
      a + del*(b + del*(c + del*d)) ]];

  daspline =   Compile[{{x, _Real, 1}, {y, _Real, 1}, {ta, _Real, 1}, {t, _Real}},
     Module[{b, c, d, h, del, tmp, low = 1, hi = Length[x], j},
      While[hi - low > 1,
        j = Quotient[hi + low, 2];
        If[x[[j]] > t, hi = j, low = j]  ];
        h = 1.0/(x[[hi]] - x[[low]]);
        b = ta[[low]];
        tmp = (y[[hi]] - y[[low]])*h;
        c = (3.0*tmp - 2.0*ta[[low]] - ta[[hi]])*h;
        d = (ta[[low]] + ta[[hi]] - 2.0*tmp)*h*h;
        del = t - x[[low]];
        b + (2.0* c + 3.0* d *del) *del  ]];

  AkimaSpline[x_, y_, ta_][t_?NumericQ] := aspline[x, y, ta, t]
  AkimaSpline[x_, y_, ta_]'[t_?NumericQ] := daspline[x, y, ta, t]

  Format[AkimaSpline[x_, _, _]] := AkimaSpline[{First[x], Last[x]}, "<>"]

  End[]
  EndPackage[]

AkimaSplineInterpolation in the NonParametricSplines package seems to be a good fit for your needs.

EDIT: The package does not come with the Mathematica installation; it was posted in 2008 in MathGroup by the late Jens-Peer Kuska. The link in the MathGroup post for Kuska's zipped files is still active.

doc page for AkimaSplineInterpolation

Needs["NonParametricSplines`"]
dataset = {{0, 0}, {5000., 0.00373625}, {10000., 0.0269361}, {15000.,  0.0621601},
           {20000., 0.121612}, {25000., 0.178247}, {30000., 0.234482}, {40000., 0.3516},
           {50000., 0.453972}, {75000.,  0.654898}, {100000., 0.78906}, {200000., 0.952381},
           {500000., 0.991166}, {1.*10^6, 0.997134}, {1.5*10^6, 0.998442},
           {2.*10^6, 0.998977}, {5.*10^6, 0.999727}, {1.*10^7, 0.9999}, {10000000000, 1}};
asi = AkimaSplineInterpolation[dataset]
(* ==> AkimaSpline[{0,10000000000},"<>"] *)
Manipulate[Show[
ListPlot[dataset[[;; i]], PlotMarkers -> {Automatic, Medium}, Frame -> True], 
Plot[asi[x], {x, 0, 10000000}, Frame -> True, PlotStyle -> {Red, Thick}, PlotRange -> All]],
{{i, 10, "i"}, 10, 19, 1}]

Screenshot:

screenshot

kglr
  • 394,356
  • 18
  • 477
  • 896
  • This package isn't included in my version of Mathematica – Heike May 23 '12 at 08:30
  • Neither in mine. 8.0.4. What version do you use? – Szabolcs May 23 '12 at 08:38
  • @Heike, I am using v8.0.4.0 (for Windows). Just noticed that the link in my post (copied from package docs and used to work half hour ago) is now redirected to Wolfram's Oops page; and searching for NonParametricSplines, AkimaSpline etc does not give any results. – kglr May 23 '12 at 08:56
  • @Szabolcs & Heike, ... I don't remember installing this add-on at all - I am pretty sure it came with the standard installation. BTW, although the package seems to have been around since version 6, my V7 installation does not have this add-on. – kglr May 23 '12 at 09:01
  • @kguler I've found the package via a post on Mathgroup: http://forums.wolfram.com/mathgroup/archive/2011/Jun/msg00011.html – Heike May 23 '12 at 09:18
  • This does not work on my 8.0.4.0 either. I never heard of this spline before. The link is indeed broken. – PlatoManiac May 23 '12 at 09:20
  • Thank you @Heike for identifying the source of the package. Yet another reminder how much/little I should trust my memory:) I will update the post with the link. – kglr May 23 '12 at 09:25
  • @kguler et al.: Thanks, this is great. But before I could accept this, could let me know why I see the drop at the second largest value back to zero here: Join[SOIlist, Saezlist, bounds]={{5000., 0.00373625}, {10000., 0.0269361}, {15000., 0.0621601}, {20000., 0.121612}, {25000., 0.178247}, {30000., 0.234482}, {40000., 0.3516}, {50000., 0.453972}, {75000., 0.654898}, {108024., 0.9}, {150400., 0.95}, {352055., 0.99}, {521246., 0.995}, {1.49218*10^6, 0.999},{7.89031*10^6,0.9999}, {0, 0}, {10000000000, 1}} ecdf = AkimaSplineInterpolation[Join[SOIlist, Saezlist, bounds], 0, 0] – László May 23 '12 at 15:11
  • And there are discontinuities in the derivative too, thus the interpolation is not smooth. Why? – László May 23 '12 at 15:13
  • @Laszlo, I need to check the issues you mention. This is the first time that I use this package. Although the package is quite nice and clean so that you can modify any part to implement your own requirements. My preference would be to go with Heike's solution. – kglr May 23 '12 at 18:39
  • @kguler: Thanks! Actually, I got gaps with Heike's solution too, for some reason. – László May 23 '12 at 18:43
  • 1
    The link seems to be kaput for me. Would you happen to know another place where I can download Kuska's package? – J. M.'s missing motivation Oct 11 '12 at 14:52
  • @J.M. that's the only link I know of and it does not work any longer. Google does not return any leads either. – kglr Oct 11 '12 at 15:03
  • Do you have a copy of the file yourself? – Mr.Wizard Oct 11 '12 at 19:48
  • @Mr.W, yes I do. Is it possible/advisable to make the zip file available on MMA.SE? – kglr Oct 11 '12 at 19:54
  • I cannot advise you on that, but I'm sure some people would be happy if you do. How big is the file? – Mr.Wizard Oct 11 '12 at 19:55
  • Mr.W: Not too big -- zip file is 152KB. – kglr Oct 11 '12 at 20:01
  • @kguler please upload somewhere! – PlatoManiac Nov 16 '12 at 01:33
  • @PlatoManiac, please see the update. – kglr Nov 18 '12 at 22:17
  • @J.M., posted the contents of Kuska's package. – kglr Nov 18 '12 at 22:18
  • Thanks kguler, and thanks Jens (R.I.P)! – J. M.'s missing motivation Nov 18 '12 at 23:47