20

I was recently reading the following paper on visualization techniques using ellipses to gain statistical insights. Elliptical visualization for looking at correlations has briefly been touched on this board as well.

Elliptical Insights: Understanding Statistical Methods through Elliptical Geometry.

There are a couple of plots in the paper on quickly plotting correlations between complex data that I found quite useful and would like to replicate them. Having a function to visualize complex correlations would be useful to a lot of folks… However, my skills in MMA are lacking a bit...

Scatterplot Matrices showing data, 68% data ellipses and regression lines enter image description here

Pam
  • 1,867
  • 11
  • 24
  • I believe you'll have to construct these manually using overlays. ListPlot will create the scatters, and LinearModelFit will regress the data and give you the error information you need to calculate the ellipses. Can you take a stab at implementing the ellipse method described in the paper, or at least spell out the steps? Here's a nudge as far as splitting the data goes: groups = Map[Tuples[Most@#, {2}] &, GatherBy[irisData, Last], {2}]; ListPlot[groups[[All, All, #]]] & /@ Range@16 // Partition[#, 4] & // TableForm – mfvonh Jun 20 '14 at 00:57
  • Look at this link. You can find fantastic techniques related to Confidence ellipses. –  Jun 20 '14 at 05:26
  • Pam, if you've come up with some code that answers the question, please post it as an answer (it's allowed, in fact positively encouraged, to answer your own question). – Simon Woods Jun 24 '14 at 14:33
  • tnx. just posted it as an answer. – Pam Jun 24 '14 at 14:46

2 Answers2

20

I don't have time to do the full-monty on the question, but perhaps this little-known functionality might be of use:

Needs["MultivariateStatistics`"]

(* fake some data *)
data = RandomVariate[BinormalDistribution[{20, 20}, {5, 5}, .75], 500];

Show[{ListPlot[data, PlotRange -> Automatic, AspectRatio -> 1], 
       Graphics[{Red, EllipsoidQuantile[data, .95], 
                 Green, EllipsoidQuantile[data, .99]}]}, 
       PlotRange -> {{0, 40}, {0, 40}}]

enter image description here

ciao
  • 25,774
  • 2
  • 58
  • 139
14

Updated with working code (tnx @rasher @mfvonh)

Let’s start by importing Fisher’s classic dataset on Iris flower measurements… Fisher’s classic paper can be found here….

Needs["MultivariateStatistics`"]

(*Import Data*)
irisData = Import["http://aima.cs.berkeley.edu/data/iris.csv", "CSV"];
plotLabels = {"Sepal.Length", "Sepal.Width", "Petal.Length", "Petal.Width", "Type"};
T= Transpose;
(*Parse Data and Regress, thanks @mfvonh *)

groups = Map[Tuples[Most@#, {2}] &, GatherBy[irisData, Last], {2}];
pairs = (Dimensions@groups)[[3]];
lm = Table[LinearModelFit[groups[[All, All, i]][[#]], {x}, x] & /@ Range[3], {i, 1, pairs}];
plotLabels =Flatten@
ConstantArray[{"Sepal.Length", "Sepal.Width", "Petal.Length", "Petal.Width”},Sqrt@pairs];

Setting up plotting options:

(*Set up plot options *)

DodgerBlue = RGBColor[0.117603`, 0.564699`, 1.`]; 
CrimsonRed = RGBColor[0.889996`, 0.149998`, 0.209998`];
SeaGreen = RGBColor[0.180395`, 0.545106`, 0.341197`];

SetOptions[{ListPlot, SmoothHistogram}, AspectRatio -> 1, 
Frame -> True, ImageSize -> 150, 
PlotStyle -> {CrimsonRed, SeaGreen, DodgerBlue}, 
FrameTicks -> {Automatic, Automatic, None, None}, 
BaseStyle -> {FontFamily -> "Myriad Pro", FontTracking -> "SemiCondensed", 
FontWeight ->"Thin", FontSize -> 10}];

Let’s create some helper functions for the individual plots:

(*Elliptical Insights*)

Clear[regPlot, data, regressions, MinMax];

MinMax[x_] := Flatten[{Min[x], Max[x]}];

ellipseInsight[data_, regressions_, colors_: {CrimsonRed, SeaGreen, DodgerBlue}, ci_: 0.68] :=

Show[ {

  ListPlot[data[[1]], PlotStyle -> Lighter@colors[[1]]]
, ListPlot[data[[2]], PlotStyle -> Lighter@colors[[2]] ]
, ListPlot[data[[3]], PlotStyle -> Lighter@colors[[3]] ]

, Plot[regressions[[1]][x], {x, Min@(T@data[[1]])[[1]], Max@(T@data[[1]])[[1]]}, PlotStyle -> colors[[1]] ]
, Plot[regressions[[2]][x], {x, Min@(T@data[[2]])[[1]], Max@(T@data[[2]])[[1]]}, PlotStyle -> colors[[2]] ]
, Plot[regressions[[3]][x], {x, Min@(T@data[[3]])[[1]], Max@(T@data[[3]])[[1]]}, PlotStyle -> colors[[3]] ]

, Graphics[{colors[[1]] , Quiet@EllipsoidQuantile[data[[1]], ci]}]
, Graphics[{colors[[2]] , Quiet@EllipsoidQuantile[data[[2]], ci]}]
, Graphics[{colors[[3]] , Quiet@EllipsoidQuantile[data[[3]], ci]}]

}
, PlotRange -> Automatic
, FrameTicks -> {False, True, False, False}
, FrameStyle -> Directive[Thin, Gray]
, Axes -> False
, ImagePadding -> {{pad, pad/4}, {pad, pad/4}}
, AspectRatio -> 1]

Let’s generate the plots:

(* Generate Regression Plots *)

plots = Table[ellipseInsight[groups[[All, All, i]], lm[[i]]], {i, 1, pairs}];

(* Generate Histogram Plots For the Diagonal *)

diags = Table[i (1 + Sqrt@pairs) + 1, {i, 0, Sqrt@pairs - 1} ];

histogramsPlots =Table[
Show[MapThread[
 SmoothHistogram[(T@groups[[All, All, i]][[#1]]), 
   AspectRatio -> 1, PlotStyle -> #2] & 
 , {Range[3], {CrimsonRed, SeaGreen, DodgerBlue}}]
 , PlotRange -> {MinMax @ groups[[All, All, i, 1]], All}
 , ImagePadding -> {{pad, pad/4}, {pad, pad/4}}
 , Frame -> True, FrameStyle -> Directive[Thin, Gray]
 , FrameTicks -> {True, False, False, False}, ImageSize -> 150 
 , FrameLabel -> {"", plotLabels[[i]]}, Axes -> False], {i, diags}];


(*Merge Plots*)

Do[plots[[i (1 + Sqrt@pairs) + 1]] = histogramsPlots[[i + 1]], {i, 0, Sqrt@pairs - 1}];


(*Draw the plots*)
plots // Partition[#, 4] & // Grid

And here’s the sample output: enter image description here

Pam
  • 1,867
  • 11
  • 24