5

What is the best way to produce the actual pairs of points shown by QuantilePlot? Would anyone have an implementation? (I am interested in the simplest possible version: QuantilePlot[l1, l2], where l1, l2 are lists of numbers.

J. M.'s missing motivation
  • 124,525
  • 11
  • 401
  • 574
Igor Rivin
  • 5,094
  • 20
  • 19
  • See https://mathematica.stackexchange.com/questions/19859/plot-extract-data-to-a-file – Vsevolod A. Mar 15 '18 at 19:32
  • And https://stackoverflow.com/questions/18921651/get-points-of-listplot-in-mathematica – Vsevolod A. Mar 15 '18 at 19:34
  • @VsevolodA. Yes, I am aware that foo[[1, 2, 2, 2, 3, 1]] works, but that is not the real question, since I have no idea what parameters mathematica uses (number of points, etc). I can guess, but this is not very satisfying. – Igor Rivin Mar 15 '18 at 19:37
  • 1
    Read your question again: "What is the best way to produce the actual pairs of points shown by QuantilePlot". – Vsevolod A. Mar 15 '18 at 20:04
  • @Vsevolod, that's still not the best way, then; gwr's answer gives a direct method to produce the points from scratch, without the processing overhead of QuantilePlot[]. – J. M.'s missing motivation Mar 16 '18 at 12:17

2 Answers2

4

Reverse-Engineering Mathematica's Q-Q-Plots

You can also use Quantile directly which gives more control about what is happening and precision maybe:

Inspecting points produced by Bob Hanlon's approach and playing around with other sample sizes reveals that Mathematica appears to use the range specified by $\frac{k-0.3}{n+0.4}$ with $k = 1, \ldots, n$ to produce quantiles. Other possibilities are given by Wikipedia (cf. source no. 10 for the option given here).

Thus:

SeedRandom[0];

data1 = RandomVariate[NormalDistribution[2, 3], 100];
data2 = RandomVariate[StudentTDistribution[4, 2, 3], 200];

pts = With[
    {
        n = Min @@ Length /@ { data1, data2 }        
    },
    Curry[Quantile][ Table[(k - 0.3)/(n + 0.4), {k, n}] ] /@ {data2, data1} // Transpose
    (* or Map[ Quantile[#, Table[ ... ]]&] @ {data2,data1} // Transpose *)
];

ListPlot[ pts, Axes-> False, Frame -> True ]

QuantilePlot

We can compare with the QuantilePlot points using Bob Hanlon's approach:

pts2 = Cases[plot, Point[pts_] :> pts, Infinity][[1]];
pts2 - pts

{{0.,0.}, ... , {0.,0.}}

This comparison also holds for samples sizes 200, 300 that I tested so far.

Thanks, Sjoerd C. de Vries, for pointing out the more general range-regime for Q-Q-Plots.

gwr
  • 13,452
  • 2
  • 47
  • 78
  • 1
    Nice use of the new 11.3 function Curry . The Range call, though, is only correct because data 1 happens to be 100 points long. It doesn't generalize to differently sized data sets. – Sjoerd C. de Vries Mar 15 '18 at 22:16
  • 2
    Ah, you beat me to it. To add: the positioning method is due to Bernard and Bos-Levenbach (English version). Spelunking confirms that this method is used internally by the helper function System`QuantilePlotDump`plottingPosition[]. – J. M.'s missing motivation Mar 16 '18 at 12:15
  • 1
    In the case where one is comparing against a distribution instead of another dataset, EstimatedDistribution[] is used on the data (NormalDistribution[] is the default distribution to compare against), and then this is used in the computation of the second set of quantiles. – J. M.'s missing motivation Mar 16 '18 at 12:19
1
SeedRandom[0];

data1 = RandomVariate[NormalDistribution[2, 3], 100];
data2 = RandomVariate[StudentTDistribution[4, 2, 3], 200];

plt = QuantilePlot[data1, data2]

enter image description here

Use Cases to extract the points

pts = Cases[plt, Point[pts_] :> pts, Infinity][[1]];

Dimensions[pts]

{100, 2}

In this case there are 100 points.

ListPlot[pts, Frame -> True, Axes -> False]

enter image description here

Bob Hanlon
  • 157,611
  • 7
  • 77
  • 198