9

Take the following:

p= N@{459/10703, 1/973, 95/10703, 635/10703, 179/10703, 565/10703,
      54/973, 474/10703, 794/10703, 548/10703, 52/1529, 1/10703, 
      61/973, 86/1529, 775/10703, 162/10703, 160/10703, 870/10703, 
      157/10703, 816/10703, 691/10703, 471/10703, 192/10703, 
      10/973, 307/10703};

q={5, 6, 8, 20, 5, 14, 15, 12, 14, 5, 16, 14, 9, 8, 14, 9, 17, 18,
   6, 6, 7, 11, 14, 17, 15};

n=200;

CDF[MultinomialDistribution[n, p], q] // AbsoluteTiming

I gave up waiting and aborted.

Any ideas for a faster method in native Mathematica for multinomial distribution CDF?

Edit: I'll be adding a bounty as soon as it's available to stimulate ideas/answers.

Here's a more trivial test case to use for those interested with results/timings from loungebook. Even with this much simpler case, better than five orders of magnitude performance gain.

probs = {.15, .1, .05, .2, .35, .12, .03};
counts = {18, 16, 13, 20, 21, 12, 10};
n = 49;
ClearSystemCache[];
fast = multinomFastCDF2[n, probs, counts] // AbsoluteTiming
ClearSystemCache[];
mma = CDF[MultinomialDistribution[n, probs], counts] // AbsoluteTiming
Last@fast == Last@mma
First@mma/First@fast

{0.00393243, 0.897792}

{2526.37, 0.897792}

True

642446.

ciao
  • 25,774
  • 2
  • 58
  • 139
  • 1
    A brute force approach would mean taking a look at (q + 1 /. List -> Times)/(Max[q] + 1) combinations which turns out to be 15,236,712,344,935,710,720,000,000 combinations. A fast method would have to be awfully efficient. – JimB Feb 16 '17 at 05:09
  • @JimBaldwin: That number is off I think, nonetheless, came up with something reasonably quick. – ciao Feb 16 '17 at 06:44
  • How far off could a number like 15,236,712,344,935,710,720,000,000 be? ;) I simply calculated the total number of possible values for each element of q divided by the maximum + 1 as knowing all but one results in knowing of that remaining value. Choosing the largest element minimizes the number of brute force combinations. – JimB Feb 16 '17 at 06:49
  • And I thought other build-ins were slow. 642446X! 8-O – Mr.Wizard Feb 25 '17 at 00:38

2 Answers2

9

Came up with this, using N[{...},14] on the setting of p in the OP to get sufficient result precision, it finishes in a few tenths on the loungebook:

multinomFastCDF2[n_Integer, p_, q_] := If[n < 0 || n > Tr[q], 0,
   FullSimplify[E n! Last@Fold[Take[ListConvolve[##, {1, -1}, 0], UpTo[n + 1]] &, 
                               MapThread[Divide[Gamma[#2 + 1, #1], Gamma[#2 + 1]] 
                               Normalize[Divide[E^-#1 #1^Range[0, #2], Range[0, #2]!], Total] &, 
                                        {p,q}]]]];

Arguments are the number of trials, category probabilities, and CDF vector to evaluate.

If anyone can come up with better, I'm all eyes...

ciao
  • 25,774
  • 2
  • 58
  • 139
1

You may be interested by the algorithm I developed for that, using Poisson's summation formula. The details are here: https://link.springer.com/article/10.1007/s11222-012-9334-8 This algorithm has been implemented into the C++ library OpenTURNS, which is interfaced with Python, see www.openturns.org

Here is a script solving the two given examples:

import openturns as ot

Case 1

p = [.15, .1, .05, .2, .35, .12, .03] q = [18, 16, 13, 20, 21, 12, 10] N = 49 d = ot.Multinomial(N, p) print("cdf=", d.computeCDF(q))

Case 2

p = [459/10703, 1/973, 95/10703, 635/10703, 179/10703, 565/10703, 54/973, 474/10703, 794/10703, 548/10703, 52/1529, 1/10703, 61/973, 86/1529, 775/10703, 162/10703, 160/10703, 870/10703, 157/10703, 816/10703, 691/10703, 471/10703, 192/10703, 10/973, 307/10703] q = [5, 6, 8, 20, 5, 14, 15, 12, 14, 5, 16, 14, 9, 8, 14, 9, 17, 18, 6, 6, 7, 11, 14, 17, 15]

N = 200

d = ot.Multinomial(N, p) print("cdf=", d.computeCDF(q))

The results are obtained in 0.22s on my laptop:

cdf= 0.8977915047222008
cdf= 1.4869651687992657e-14

The algorithm should be easily translated into the Mathematica language and could be checked against the C++ implementation for an easy debug.

Cheers

Régis

  • I shall pursue your paper with great interest. – ciao Apr 15 '23 at 20:53
  • 2
    To all those flagging this as "not an answer" because it's not in Mathematica — I hear you, but it also seems to me that deleting this answer might not be in the best interest either as it provides a useful algorithm to the problem at hand. Instead of flagging this, someone could take the initiative to either translate the algorithm to native Mathematica or a write a wrapper around the C library. I'll definitely vote for that and even throw in a +500 bounty if @ciao accepts the answer :) – rm -rf Oct 01 '23 at 03:27