
Let $X := (X_1,\ldots,X_n)$ be i.i.d. $\sim P$, where $P$ is a known distribution, and let $T_n(X_1,\ldots,X_n)$ be a test statistic whose distribution $P(T_n(X)\leq t)$ is unknown. I am interested in the probability $P(T_n\geq t_0)$ for some fixed value $t_0$. To obtain this probability I run a simulation:

1) I simulate samples $X^b=(X_1^b,\ldots,X_n^b)$ i.i.d. $\sim P$, where $1\leq b\leq k$ indexes the $b$-th sample.

2) I approximate $P(T_n\geq t_0)$ by the empirical frequency $\hat P(T_n\geq t_0)=\frac{\#\{b:\,T_n(X^b)\geq t_0\}}{k}$ (a small code sketch of this procedure follows).
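For concreteness, here is a minimal sketch of this two-step procedure in Python, assuming purely for illustration that $P$ is the standard normal distribution, $T_n$ is the sample mean, and $t_0 = 0.5$; these concrete choices are not part of the question.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical choices for illustration (not from the question):
# P = standard normal, T_n = sample mean, t_0 = 0.5.
n = 30        # size of each simulated sample
k = 10_000    # number of simulated samples (Monte Carlo replications)
t0 = 0.5

def T(x):
    """Test statistic T_n evaluated on one sample x = (x_1, ..., x_n)."""
    return x.mean()

# Step 1: simulate k samples X^b, each of size n, i.i.d. from P.
X = rng.standard_normal((k, n))

# Step 2: approximate P(T_n >= t_0) by the fraction of samples
# whose statistic is at least t_0.
T_values = np.array([T(X[b]) for b in range(k)])
p_hat = np.mean(T_values >= t0)
print("estimated P(T_n >= t_0):", p_hat)
```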

If I have not made a mistake in my explanations: is the mathematical background for this approximation the Glivenko-Cantelli theorem, or do I need more than this theorem, or something else?

Klaus

1 Answer


This is just an application of the law of large numbers (the Glivenko-Cantelli theorem is even stronger and is not actually needed here).

Let us just forget about the index $n$. We want to estimate probabilities of $T(X)$, where the distribution of $X$ is known and can be simulated from, and $T$ is some known (measurable) function. Suppose we have an infinite sample $X^1,X^2,\ldots$ of the $X$'s. Then we can rewrite the empirical frequency in terms of indicator functions:
$$
\frac{\# \{1\leq b\leq k\mid T(X^b)\geq t_0\}}{k}=\frac1k\sum_{b=1}^k 1_{\{T(X^b)\geq t_0\}}.\tag{1}
$$
Here $1_A$ is the indicator function of the set $A$, i.e. it is $1$ on $A$ and $0$ outside. Now $(1)$ is of the form $\frac1k\sum_{b=1}^k Y_b$, where $Y_b=1_{\{T(X^b)\geq t_0\}}$, $b=1,2,\ldots$, are i.i.d. variables with finite mean, and hence the law of large numbers applies. We obtain
$$
\frac1k\sum_{b=1}^k Y_b\to {\rm E}[Y_1]={\rm E}[1_{\{T(X^1)\geq t_0\}}]=P(T(X^1)\geq t_0),
$$
which is the desired result.
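To illustrate the convergence claimed by the law of large numbers, here is a small Python check under the same hypothetical choices as above ($P$ standard normal, $T$ the sample mean, $t_0 = 0.5$), for which the exact value $P(T\geq t_0)$ is available in closed form because $T\sim N(0,1/n)$. The running averages of the $Y_b$ should stabilize near the exact value as $k$ grows.

```python
import numpy as np
from math import erfc, sqrt

rng = np.random.default_rng(1)

# Hypothetical setup: P standard normal, T the sample mean, t_0 = 0.5.
n, t0 = 30, 0.5

# Here T ~ N(0, 1/n), so P(T >= t_0) is a standard normal tail probability.
exact = 0.5 * erfc(t0 * sqrt(n) / sqrt(2))

# Y_b = 1{T(X^b) >= t_0}; by the LLN their average tends to the exact value.
for k in (100, 1_000, 10_000, 100_000):
    Y = rng.standard_normal((k, n)).mean(axis=1) >= t0
    print(f"k = {k:>6}: average of Y_b = {Y.mean():.5f}   exact = {exact:.5f}")
```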

Stefan Hansen