2

I saw a question about creating contour lines over a smoothdensityhistogram plot. It really help me to create this kind of graphs.

I want to ask if is possible to extract only the points inside a certain contour to obtain, for example, mean values or desviation.

kind regards

Renato QE
  • 61
  • 3
  • What do you mean by "mean values" of the points inside a certain contour. Mean location? Mean height of the density curve? Or are that additional attributes associated with each data point such as percent foliage cover, elevation, tree density, etc., for which you want a mean or standard deviation? – JimB Oct 10 '23 at 17:10
  • Yes mean location, or just extract these points in a array to do other calculations. – Renato QE Oct 12 '23 at 02:30
  • 1
    @ydd 's splitUp list created by using Table and Select shows how you can extract the points either between contours or inside or outside a particular contour. – JimB Oct 12 '23 at 02:45
  • Also note that a "mean location" could still be calculated from the bivariate density even if there were no data points within a particular contour. This would be analogous to the mean of a truncated distribution. I would think that after the trouble estimating a bivariate distribution that summary calculations would be done based on the estimated bivariate density. The data points have done their job and it would be relying on the entity that you've estimated for future calculations. – JimB Oct 12 '23 at 02:51

1 Answers1

2

Using @Ian 's example from your linked question:


Ian's part (except I added explicit SeedRandom)

SeedRandom[123];

two dimensional data) data = RandomVariate[BinormalDistribution[.75], 200];

(nonparametric density estimate) d = SmoothKernelDistribution[data];

(logged density estimate at data) p = PDF[d, data];

(quantile for countour height) q = 1 - {0.68, 0.95, 0.99};

c = Quantile[p, q];


Then create your quantile bounds by padding a 1 and a 0 to either end of q

cPad = PadLeft[c, Length@c + 1, 1];
cPad = PadRight[cPad, Length@cPad + 1, 0];
quantileBounds = Partition[cPad, 2, 1]
(*{{{1, 0.0759093}, {0.0759093, 0.0200241}, {0.0200241, 
  0.00849196}, {0.00849196, 0}}*)

And split up data based on which quantiles their PDF values fall between:

splitUp = 
Table[Select[data, Last[i] < PDF[d, #] <= First[i] &], {i, 
   quantileBounds}];

(If you don't want to calculate the PDF again of all the data points, you could use Position instead but I feel this looks uglier than the Select implementation):

splitUp = 
  Table[Part[data, 
    Flatten@Position[p, n_ /; Last[i] < n <= First[i]]], {i, 
    quantileBounds}];

Comparing mean and variances. As expected, we see the variance grow as you go out to more extreme quantiles:

Mean /@ splitUp

({{0.0116993, -0.147756}, {0.214602, 0.222228}, {-0.361543, -0.202958}, {-0.741956, -1.677}})

Variance /@ splitUp

({{0.382458, 0.346718}, {1.4757, 1.59431}, {2.1283, 3.82937}, {8.91892, 2.63842}})

And plotting different point colors depending on quantiles:

nonEmptyContours = Select[splitUp, Length[#] > 0 &];
lpAll = ListPlot[nonEmptyContours, 
   PlotStyle -> {Black, Green, Orange, Red}];

Show[DensityPlot[PDF[d, {x, y}], ##], ContourPlot[PDF[d, {x, y}], ##, Contours -> c, ContourShading -> None], lpAll] &[{x, -4, 4}, {y, -4, 4}, PlotRange -> All]

Mathematica graphics

Edit: I think you were asking actually how to do this using JimB's answer. The methodology is basically the same:


JimB's part (except adding explicit SeedRandom and set c to contours):

SeedRandom[123];
data = RandomVariate[BinormalDistribution[.75], 100];
(*Calculate a nonparametric density estimate*)
d = SmoothKernelDistribution[data];

(Evaluate the estimated density function over a grid of points and
sort by the density values from high to low
) pdf = Reverse[ Sort[Flatten[ Table[PDF[d, {x, y}], {x, -3, 3, 0.05}, {y, -3, 3, 0.05}]]]];

(Create a table of cumulative pdf values that correspond to the
volumes of interest
) cdf = Accumulate[pdf]/Total[pdf]; contours = pdf[[Flatten[ Table[FirstPosition[cdf, p_ /; p >= alpha], {alpha, {0.68, 0.95, 0.99}}]]]];

c = contours;


Then do everything the same as before. We can color the points based on their contours:

cPad = PadLeft[c, Length@c + 1, 1];
cPad = PadRight[cPad, Length@cPad + 1, 0];
quantileBounds = Partition[cPad, 2, 1];
splitUp = 
  Table[Select[data, Last[i] < PDF[d, #] <= First[i] &], {i, 
    quantileBounds}];

sdh = SmoothDensityHistogram[data, PlotRange -> All]; cp = ContourPlot[PDF[d, {x, y}], {x, -4, 4}, {y, -4, 4}, Contours -> contours, ContourShading -> None, PlotRange -> All]; nonEmptyContours = Select[splitUp, Length[#] > 0 &];

coloredByContourPoints = ListPlot[nonEmptyContours, PlotStyle -> {Black, Green, Orange, Red}]; Show[{sdh, cp, coloredByContourPoints}, PlotRange -> All]

Mathematica graphics

ydd
  • 3,673
  • 1
  • 5
  • 17
  • This a question for the Graphics folks…is there a way to decide colors based on how “extreme” they seem? The green really seems to stand out here and so I’d want it to be the outermost color. Orange should really be the second color because it doesn’t stand out so much – ydd Oct 12 '23 at 04:06
  • Thanks! this works perfectly! – Renato QE Oct 12 '23 at 04:50