PearsonChiSquareTest for count/frequency data?

Question

I read with interest a previous question on the PearsonChiSquareTest from a few years ago, and one answer was very helpful:

Performing a chi-square goodness of fit test

Have there been any updates since 2012 for testing count/frequency data? Although the test is easy to implement manually, I do not understand why PearsonChiSquareTest gives results that differ from the manual computation for the following example data (two 'distributions' of counts?):

real = {331, 155, 337, 302, 86}
expected = {650.797, 118.261, 325.545, 109.397, 6.99883}

With PearsonChiSquareTest[real, expected], the test statistic is only 1.6, with a high p-value. Computed manually (or with MATLAB or SPSS), the statistic is about 1400, with a very small p-value (<0.001). Confusingly, PearsonChiSquareTest also reports the degrees of freedom as 3. Any suggestions or clarification?
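
For reference, the manual calculation mentioned above can be sketched as follows. This is only a minimal sketch: it assumes the expected counts are fixed in advance (no parameters estimated from the data), so the degrees of freedom are taken as the number of categories minus one. The names stat, dof and pValue are illustrative.

```mathematica
real = {331, 155, 337, 302, 86};
expected = {650.797, 118.261, 325.545, 109.397, 6.99883};

(* Pearson statistic: sum over categories of (observed - expected)^2 / expected *)
stat = Total[(real - expected)^2/expected]
(* approximately 1400, in line with the manual result quoted above *)

(* Assumption: no parameters were estimated from the data, so dof = categories - 1 *)
dof = Length[real] - 1;

(* Upper-tail probability under the chi-square reference distribution *)
pValue = SurvivalFunction[ChiSquareDistribution[dof], stat]
```

If the expected counts came from a fitted model, the degrees of freedom would need to be reduced by the number of estimated parameters.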

The documentation for PearsonChiSquareTest states that continuous distributions are being compared. When data is supplied, it must be in the form of actual sample values, not counts. What you have is likely observed counts from a multinomial distribution, and that needs a different test. (I'm more than a bit surprised that the expected counts are exactly integers. Or should the variables be named real1 and real2?) — JimB, May 12 '16 at 00:36
You are right, the expected counts are not integers; I had just rounded them before copying and pasting. Your point about the types of distributions being tested makes sense; I had not thought about it that way. — Jim, May 12 '16 at 09:06
If your model (with the data that results in the expected values) can be cast as a generalized linear model, then GeneralizedLinearModelFit would work for you. But you'd need to give more information about your model, data, and how you obtained the expected values. — JimB, May 12 '16 at 15:14

