Performing a chi-square goodness of fit test

Question

How would I perform a chi-square goodness of fit test? I have tried the following, where my data consists of the observed values whilst the data pair contains the correct values. Then I use the DistributionFitTest on the data and the data pair.

data = {115, 188, 97};
datapair = {100, 200, 100};
DistributionFitTest[datapair, data]

(* Out= 0.248213 *)

However the result I get is 0.248 and should be 0.216. Am I doing something wrong? And how would I measure the $\chi^2$ value?

I used Maple 15, however I'm opting for learning Mathematica. — Frederik Brinck Jensen, May 15 '12 at 12:44
There is a PearsonChiSquareTest[ ] in Mathematica. If that is what you are looking for, check Menu/Help/ref/PearsonChiSquareTest — Alexei Boulbitch, May 15 '12 at 12:59
@Alexei: that's a specialization of DistributionFitTest[], in fact... (i.e., equivalent to DistributionFitTest[data, dist, "PearsonChiSquare"]) — J. M.'s missing motivation, May 15 '12 at 13:02
(1) What do these data mean? Are they counts? And how are the "correct" values obtained? If they are estimated from the data, that has to be taken into account. (2) Note that the order of the arguments given to DistributionFitTest is incorrect (but putting them in the correct order does not yield a correct test!). — whuber, May 15 '12 at 13:32
They're counts based on an example of Mendle's flora count. The correct values are also example values; and with them I'd like to make a chi-square goodness of fit test. It is done in Maple with the function referred to by J.M. How do I achieve something similar in Mathematica? — Frederik Brinck Jensen, May 15 '12 at 13:36

score 25 · Accepted Answer · edited May 15 '12 at 15:18

25

The name PearsonChiSquareTest has led to a bit of confusion for people wanting to test count/frequency data like these. In short, M just doesn't have this sort of test built in yet.

PearsonChiSquareTest and its equivalent call from DistributionFitTest have been derived according to the methods of D'Agostino and Stephens.

The test computes a maximum of Ceiling[2 n^(2/5)] equiprobable bins (where n is the data length) dropping bins that do not contain any data. These bins are used to compute observed and expected frequency histograms. The chi-square statistic is computed from the computed frequencies in the usual way. The Properties & Relations section of the PearsonChiSquareTest docs give more details on this.

Again, the important distinction is that this is a test for goodness of fit to a distribution with raw data and not a test for count/frequency data. If you want the latter it is easy to put together yourself.

pearsonTest[obs_List, exp_List] /; Length[obs] == Length[exp] :=
 Block[{t},
  t = Total[(obs - exp)^2/exp] // N;
 {Rule["chisqr", t], 
  Rule["p-val", SurvivalFunction[ChiSquareDistribution[Length[exp] - 1], t]]}  
  ]

pearsonTest[{115, 188, 97}, {100, 200, 100}]

==> {"chisqr" -> 3.06, "p-val" -> 0.216536}

edited May 15 '12 at 15:18

J. M.'s missing motivation

124,525
11
401
574

answered May 15 '12 at 15:01

Andy Ross

19,320
2
61
93

Hah. I was scratching my head trying to reconcile the stuff in my textbook with the stuff in Mathematica's docs. Thanks for this! – J. M.'s missing motivation May 15 '12 at 15:12
Thanks for this wonderful reply Andy! Can you give me any reference on where to read about the functions and the syntax of mathematica more deeply? – Frederik Brinck Jensen May 15 '12 at 18:56
@FrederikBrinckJensen you are welcome. I recommend looking at the following post – Andy Ross May 15 '12 at 19:07
Because I want to compare 2-dimensional lists, I adjusted this code to pearsonTest[obs_List, exp_List]/;Dimensions[obs] == Dimensions[exp] :=Block[{t}, t = Total[(Flatten@obs - Flatten@exp)^2/Flatten@exp] // N;{Rule["chisqr", t],Rule[pval,SurvivalFunction[ChiSquareDistribution[Length[Flatten@exp] - 1],t]]}]. However, comparing 2x2 outcomes in Excel and this function gives different values, example observed {{4,6},{6,4}} and expected {{5,5},{5,5}}. Any idea what I am missing? – Nicholas G May 11 '20 at 15:05
My calculation of degrees of freedom was off. The correct formula is pearsonTest[obs_List, exp_List] /;Dimensions[obs] == Dimensions[exp] := Block[{t}, t = Total[(Flatten@obs - Flatten@exp)^2/Flatten@exp] // N;{Rule["chisqr", t], Rule[pval, SurvivalFunction[ ChiSquareDistribution[ Times @@Table[Dimensions[exp][[i]] - 1, {i, Length@Dimensions@exp}]],t]]}]. – Nicholas G May 11 '20 at 15:58

ubpdqn · Answer 2 · 2013-02-12T13:16:06.980

A very good demonstration of contingency table hypothesis testing is here The following is a crude approach and formatting and style issues for presentation are a matter of taste. Apologies for errors and ugliness of code. It could be generalized for hypotheses other than marginal homogeneity ("additive" in the demonstration, This demonstration provides useful template for generating expected values based on particular null hypothesis).

For an r x c contingency table with null hypothesis of marginal homogeneity here is a code:

chi[u_] := 
Module[{dim, dof, rs, cs, n, full, exp, chis, restbl, tbl, exptbl, 
   pv, res},
  dim = Dimensions[u];
  dof = Times @@ (dim - 1);
  rs = Map[Plus @@ # &, u];
  cs = Map[Plus @@ # &, Transpose[u]];
  n = Total[Flatten[u]];
  full = Append[MapThread[Append[#1, #2] &, {u, rs}], Join[cs, {n}]];
  exp = Outer[Times, rs, cs]/n;
  chis = Total[Flatten[(u - exp)^2/exp]];
  restbl = 
   Grid[{{"Degrees of Freedom", dof}, {"Chi Square Statistic", 
      N[chis, 3]}, {"p-value", pv}}, Alignment -> {{Left, "."}}, 
    Frame -> All];
  tbl = Grid[full, 
    Background -> {None, 
      None, {{{dim[[1]] + 1, dim[[1]] + 1}, {1, dim[[2]] + 1}} -> 
        Pink, {{1, dim[[1]] + 1}, {dim[[2]] + 1, dim[[2]] + 1}} -> 
        Pink}}];
  exptbl = Grid[N[exp], Alignment -> {".", "."}];
  pv = N[SurvivalFunction[ChiSquareDistribution[dof], chis]];
  res = {"ChiSquareStatistic" -> N[chis, 2], "p-value" -> pv, 
    "result" -> restbl, "table" -> tbl, "expectedtable" -> exptbl, 
    "fullresults" -> 
     Column[{"Data Table", tbl, "Expected Values Table", exptbl, 
       "Hypothesis Testing", restbl}, Frame -> All, 
      Background -> {Yellow, None, Yellow, None, Yellow, None}], 
    "Properties" -> {"ChiSquareStatistic","p-value", "result", "table", 
      "expectedtable", "fullresults"}
    };
  # /. res &
  ]
chisqt[u_, r_] := chi[u][r]

For example for table

x = {{50, 25, 15}, {25, 25, 35}, {15, 10, 35}, {10, 40, 55}};

chisqt[x, "table"] yields:

enter image description here

chisqt[x, "expectedtable"] yields:

enter image description here

chisqt[x, "result"] yields:

enter image description here

chisqt[x, "fullresults"] yields:

enter image description here

chisqt[x, "Properties"] yields:

{"ChiSquareStatistic","p-value", "result", "table", "expectedtable", "fullresults"}

It seems contingency table is still not available in mma in 2017 ... — SquareOne, Oct 23 '17 at 20:43

Performing a chi-square goodness of fit test

2 Answers2

Linked

Related