
According to references (e.g. Wikipedia), the Dirichlet distribution with parameter $\boldsymbol{\alpha}=(\alpha_1,\ldots,\alpha_K)$ has density $$ D(x_1, \ldots, x_K) = \frac{1}{\mathrm{B}(\boldsymbol\alpha)} \prod_{i=1}^K x_i^{\alpha_i - 1} $$ where $$ \mathrm{B}(\boldsymbol\alpha) = \frac{\prod_{i=1}^K \Gamma(\alpha_i)}{\Gamma\left(\sum_{i=1}^K \alpha_i\right)}. $$ So, if $K = 2$ and $\alpha_1 = \alpha_2 = 1$, this gives $D(x_1, x_2) = 1/\mathrm{B}(\boldsymbol\alpha)$ where $$ \mathrm{B}(\boldsymbol\alpha) = \Gamma(1)^2 / \Gamma(2) = 1, $$ so $D(x_1, x_2) = 1$ for all $x_1, x_2$. However, $D(x_1, x_2)$ is defined on the standard $1$-simplex in $\mathbb{R}^2$, given by $x_i \ge 0$ and $x_1 + x_2 = 1$. This is the convex hull of the two points $(0, 1)$ and $(1, 0)$. Since this is a line segment of length $\sqrt{2}$, the integral of the Dirichlet density over this simplex is $\sqrt{2}$, not $1$ as expected. What am I missing here?

The same problem arises in higher dimensions. For instance, for $K=3$ with $\alpha_1 = \alpha_2 = \alpha_3 = 1$, the simplex is an equilateral triangle with side $\sqrt{2}$, but the normalization constant becomes $\mathrm{B}(\boldsymbol\alpha) = 1/\Gamma(3) = 1/2$, which is not the area of this triangle (that area is $\sqrt{3}/2$).

What is wrong here?

1 Answer


I think there are subtle issues with the way Wikipedia has presented the density. (Those who are more knowledgeable are free to correct me.)

The normalizing constant is for a slightly different form of the density, where the simplex is parameterized as $\{(x_1, x_2, \ldots, x_{K-1}, 1-(x_1 + \cdots + x_{K-1})) : x_i \ge 0,\ x_1 + \cdots + x_{K-1} \le 1\}$ rather than $\{(x_1, \ldots, x_K) : \sum_i x_i = 1, x_i \in [0,1]\}$. This is completely analogous to how the beta distribution is parameterized, which is the special case of the Dirichlet distribution with $K=2$: note that beta densities are univariate functions on $[0,1]$, rather than densities on a line segment in $\mathbb{R}^2$.

Then the density is $$\frac{1}{B(\alpha)} x_1^{\alpha_1 - 1}\cdots x_{K-1}^{\alpha_{K-1} - 1} (1-(x_1 + \cdots + x_{K-1}))^{\alpha_K-1}$$ and integrals are taken over the region $\{(x_1, \ldots, x_{K-1}) : x_i \ge 0,\ x_1 + \cdots + x_{K-1} \le 1\} \subset [0,1]^{K-1}$ with respect to ordinary Lebesgue measure, rather than over the $(K-1)$-dimensional simplex in $K$-dimensional space with respect to its surface measure; this computation yields the normalizing constant. (For instance, in your simple examples, you easily get $1$.) The discrepancy with the original parameterization of the simplex is the factor from the change of variables: the surface measure of the simplex is $\sqrt{K}$ times Lebesgue measure in the first $K-1$ coordinates, which gives the extra $\sqrt{2}$ for $K=2$ and $\sqrt{3}$ for $K=3$.
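As a rough numerical check of the $K=3$ case, here is a minimal Python sketch that integrates the density in the $(x_1, x_2)$ coordinates with a plain midpoint rule. The function names `dirichlet_density` and `integrate_k3` and the grid size `n` are my own choices for illustration, not part of any library; only the standard `math` module is assumed.

```python
import math

def dirichlet_density(x, alpha):
    """Dirichlet density in the (K-1)-coordinate form.

    x holds x_1..x_{K-1}; the last coordinate is 1 - sum(x).
    (Hypothetical helper, written for this illustration.)
    """
    coords = list(x) + [1.0 - sum(x)]
    # log B(alpha) = sum_i log Gamma(alpha_i) - log Gamma(sum_i alpha_i)
    log_b = sum(math.lgamma(a) for a in alpha) - math.lgamma(sum(alpha))
    value = math.exp(-log_b)
    for xi, ai in zip(coords, alpha):
        value *= xi ** (ai - 1.0)
    return value

def integrate_k3(alpha, n=200):
    """Midpoint rule over {x1 >= 0, x2 >= 0, x1 + x2 <= 1} for K = 3."""
    h = 1.0 / n
    total = 0.0
    for i in range(n):
        x1 = (i + 0.5) * h
        for j in range(n):
            x2 = (j + 0.5) * h
            if x1 + x2 < 1.0:  # keep only cells whose midpoint lies inside the region
                total += dirichlet_density((x1, x2), alpha) * h * h
    return total

# Integral with respect to Lebesgue measure in (x1, x2): should be close to 1.
flat = integrate_k3((1.0, 1.0, 1.0))
print(flat)

# Integral with respect to surface measure on the triangle in R^3:
# the same number scaled by sqrt(3), which is what the question computed.
print(math.sqrt(3) * flat)
```

The first printed value should come out close to $1$ (up to grid error along the diagonal edge), and the second close to $\sqrt{3}$, matching the factor described above. Other parameter choices with all $\alpha_i \ge 1$, such as `(2.0, 3.0, 4.0)`, should also integrate to roughly $1$.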

I am not sure why Wikipedia maintains the problematic notation involving the other parameterization. In any case, the Dirichlet distribution is most prominently used in a Bayesian setting as a prior distribution, and the normalizing constant often is not important.

angryavian