Central Limit Theorem
Convergence in distribution
The central limit theorem, one of the most important results in applied probability, is a statement about the convergence of a sequence of probability measures. So, we begin this section by exploring what it should mean for a sequence of probability measures to converge to a given probability measure.
Roughly speaking, we will consider two probability measures close if they put approximately the same amount of probability mass in approximately the same places on the number line. For example, a sequence $\nu_1, \nu_2, \ldots$ of continuous probability measures with densities $f_1, f_2, \ldots$ converges to a continuous probability measure $\nu$ with density $f$ if for all $x \in \mathbb{R}$:
$$\lim_{n\to\infty} f_n(x) = f(x).$$
If the limiting probability measure is not continuous, then the situation is slightly more complicated. For example, we would like to say that the probability measure which puts a mass of $\frac{1}{2}$ at $-\frac{1}{n}$ and a mass of $\frac{1}{2}$ at $\frac{1}{n}$ converges to the probability measure which puts all of its mass at the origin. However, the amount of mass placed at the point $0$ itself does not converge: it is $0$ for every $n$, while the limiting measure assigns mass $1$ there.
We can get around this problem by giving ourselves a little space to the left and right of any point where the limiting measure has a positive probability mass. In other words, suppose that $\nu$ is a probability measure on $\mathbb{R}$ with probability mass function $m$, and consider an interval $[a,b]$. Let's call such an interval a continuity interval of $\nu$ if $m(a)$ and $m(b)$ are both zero.
We will say that a sequence $\nu_1, \nu_2, \ldots$ of probability measures converges to $\nu$ if $\nu_n([a,b])$ converges to $\nu([a,b])$ for every continuity interval $[a,b]$ of $\nu$.
We can combine the discrete and continuous definitions into a single definition:
Definition (Convergence of probability measures on $\mathbb{R}$)
A sequence $\nu_1, \nu_2, \ldots$ of probability measures on $\mathbb{R}$ converges to a probability measure $\nu$ on $\mathbb{R}$ if
$$\lim_{n\to\infty} \nu_n(I) = \nu(I)$$
whenever $I$ is an interval satisfying $\nu(\{a\}) = \nu(\{b\}) = 0$, where $a$ and $b$ are the endpoints of $I$.
Exercise
Define $f_n(x)$ to be $n$ when $0 \leq x \leq \frac{1}{n}$ and $0$ otherwise, and let $\nu_n$ be the probability measure with density $f_n$. Show that $\nu_n$ converges to the probability measure $\nu$ which puts all of its mass at the origin.
Solution. Suppose $[a,b]$ is a continuity interval of $\nu$, so that $a$ and $b$ are both nonzero.

If $[a,b]$ contains the origin, then the terms of the sequence $\nu_n([a,b])$ are equal to $1$ for large enough $n$, since all of the probability mass of $\nu_n$ is in the interval $[0, \frac{1}{n}]$ and eventually $[0, \frac{1}{n}] \subset [a,b]$.

If $[a,b]$ does not contain the origin, then the terms of the sequence $\nu_n([a,b])$ are eventually equal to $0$, for the same reason.

In either case, $\nu_n([a,b])$ converges to $\nu([a,b])$. Therefore, $\nu_n$ converges to $\nu$.
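As a quick numerical illustration, we can compute $\nu_n([a,b])$ in code and watch the convergence. This is a minimal sketch (the helper `nu_n` is ours, not part of any library): the mass $\nu_n$ assigns to $[a,b]$ is $n$ times the length of the overlap between $[a,b]$ and $[0, \frac{1}{n}]$.

```python
def nu_n(a, b, n):
    """Mass that ν_n (density n on [0, 1/n]) assigns to [a, b]."""
    # the mass is n times the length of [a, b] ∩ [0, 1/n]
    overlap = max(0.0, min(b, 1/n) - max(a, 0.0))
    return n * overlap

# An interval containing the origin: mass tends to 1
print([nu_n(-0.5, 0.5, n) for n in (1, 2, 10, 100)])
# An interval missing the origin: mass tends to 0
print([nu_n(0.25, 0.5, n) for n in (1, 2, 10, 100)])
```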
The central limit theorem
The law of large numbers tells us that the distribution of the mean of many independent, identically distributed, finite-variance random variables with mean $\mu$ is concentrated around $\mu$. This is a mathematical formalization of the well-known fact that flipping a coin many times results in a heads proportion close to $1/2$ with high probability, and that the average of many die rolls is very close to $3.5$ with high probability.
The central limit theorem gives us precise information about how the probability mass of such a sum is concentrated around its mean. Consider a sequence $X_1, X_2, \ldots$ of independent fair coin flips, and define the sums
$$S_n = X_1 + X_2 + \cdots + X_n$$
for $n \geq 1$. The probability mass functions of the $S_n$'s can be calculated exactly, and they are graphed in the figure below for several values of $n$. We see that the graph is becoming increasingly bell-shaped as $n$ increases.
If we repeat this exercise with other distributions in place of the independent coin flips, we obtain similar results. For example, the Poisson distribution with mean $\lambda$ is a discrete distribution which assigns mass $\frac{\lambda^k e^{-\lambda}}{k!}$ to each nonnegative integer $k$. The probability mass functions for sums of independent Poisson(3) random variables are shown in the figure below. Not only is the shape of the graph stabilizing as $n$ increases, but we are apparently getting the same shape as in the Bernoulli example.
To account for the shifting and spreading of the distribution of $S_n$, we normalize it: we subtract its mean $n\mu$ and then divide by its standard deviation $\sigma\sqrt{n}$ (where $\mu$ and $\sigma$ denote the mean and standard deviation of each $X_i$) to obtain a random variable with mean zero and variance 1. So, we define
$$S_n^* = \frac{S_n - n\mu}{\sigma\sqrt{n}},$$
which has mean 0 and variance 1. Based on the figures above, we conjecture that the distribution of $S_n^*$ converges as $n \to \infty$ to some distribution with a bell-shaped probability density function.
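We can sketch this conjecture in code, using the standard fact that a sum of $n$ independent Poisson(3) variables is Poisson($3n$). The plot below (an illustrative sketch; the values of $n$ and the four-standard-deviation window are arbitrary choices) overlays the normalized, rescaled pmfs for several $n$ and shows a common bell shape emerging.

```python
import matplotlib.pyplot as plt
import numpy as np
from scipy.stats import poisson

lam = 3.0
for n in (1, 5, 30):
    mu, sd = n * lam, np.sqrt(n * lam)       # mean and std of the sum
    ks = np.arange(int(mu - 4 * sd), int(mu + 4 * sd) + 1)
    ks = ks[ks >= 0]                         # Poisson support is nonnegative
    x = (ks - mu) / sd                       # normalized values of the sum
    # rescale the pmf by sd so the points trace out a density-like curve
    plt.plot(x, poisson.pmf(ks, mu) * sd, label=f"n = {n}")
plt.legend()
plt.show()
```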
This conjecture turns out to be correct, with a Gaussian as the limiting distribution. The standard Gaussian distribution is denoted $\mathcal{N}(0,1)$ and has probability density function $\frac{1}{\sqrt{2\pi}} e^{-x^2/2}$.
Theorem (Central Limit Theorem)
Suppose that $X_1, X_2, \ldots$ are independent, identically distributed random variables with mean $\mu$ and finite standard deviation $\sigma$, and define the normalized sums
$$S_n^* = \frac{X_1 + X_2 + \cdots + X_n - n\mu}{\sigma\sqrt{n}}$$
for $n \geq 1$. For all $-\infty \leq a \leq b \leq \infty$, we have
$$\lim_{n\to\infty} \mathbb{P}(a \leq S_n^* \leq b) = \mathbb{P}(a \leq Z \leq b),$$
where $Z \sim \mathcal{N}(0,1)$. In other words, the sequence $S_1^*, S_2^*, \ldots$ converges in distribution to $\mathcal{N}(0,1)$.
The normal approximation is the technique of approximating the distribution of $S_n^*$ as $\mathcal{N}(0,1)$.
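Here is a quick Monte Carlo sanity check of the normal approximation (a sketch; the values of $n$, the sample count, and the interval endpoints are arbitrary choices of ours). We simulate many realizations of $S_n^*$ for fair coin flips and compare the empirical probability of an interval with the Gaussian mass of the same interval.

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(0)
n = 1000
# 10^5 realizations of S_n, the number of heads in n fair coin flips
sums = rng.binomial(n, 0.5, size=100_000)
# normalize: subtract the mean n/2 and divide by the std dev sqrt(n)/2
S_star = (sums - n / 2) / (0.5 * np.sqrt(n))

a, b = -1.0, 2.0
print(np.mean((a < S_star) & (S_star < b)))  # empirical probability
print(norm.cdf(b) - norm.cdf(a))             # Gaussian mass, about 0.8186
```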
Example
Suppose we flip a coin, which has probability 60% of turning up heads, $n$ times. Use the normal approximation to estimate the value of $n$ such that the proportion of heads is between 59% and 61% with probability approximately 99%.
Solution. We calculate the mean $\mu = 0.6$ and the standard deviation $\sigma = \sqrt{0.6 \cdot 0.4} = \sqrt{0.24}$ of each flip, and we use these values to rewrite the desired probability in terms of $S_n^*$. We find
$$\mathbb{P}\left(0.59 < \frac{S_n}{n} < 0.61\right) = \mathbb{P}\left(-\frac{0.01\sqrt{n}}{\sigma} < S_n^* < \frac{0.01\sqrt{n}}{\sigma}\right),$$
where the last step was obtained by subtracting $0.6$ from all three expressions in the compound inequality and multiplying through by $\sqrt{n}/\sigma$. Since $S_n^*$ is distributed approximately like a standard normal random variable $Z$, the normal approximation tells us to look for the least $n$ so that
$$\mathbb{P}\left(-\frac{0.01\sqrt{n}}{\sigma} < Z < \frac{0.01\sqrt{n}}{\sigma}\right) \geq 0.99.$$
By the symmetry of the Gaussian density, we may rewrite this equation as
$$\mathbb{P}\left(Z < \frac{0.01\sqrt{n}}{\sigma}\right) \geq 0.995.$$
Defining the normal CDF $\Phi(x) = \mathbb{P}(Z < x)$, we want to find the least integer $n$ such that $\Phi\left(\frac{0.01\sqrt{n}}{\sigma}\right)$ exceeds $0.995$. The following code tells us that $\Phi^{-1}(0.995) \approx 2.576$.
```python
from scipy.stats import norm
norm.ppf(0.995)
```
```julia
using Distributions
quantile(Normal(0,1), 0.995)
```
Setting $\frac{0.01\sqrt{n}}{\sigma}$ equal to this value and solving for $n$ gives $n = 15{,}924$. The exact value of $n$ for which the probability is closest to 99% is 15,861, so we can see that the normal approximation worked pretty well in this case.
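We can corroborate this with an exact computation, since the number of heads is binomially distributed. A short sketch (the function name `heads_prob` is ours, not part of any library):

```python
import numpy as np
from scipy.stats import binom

def heads_prob(n, lo=0.59, hi=0.61, p=0.6):
    """Exact probability that the heads proportion lies strictly between lo and hi."""
    ks = np.arange(n + 1)
    mask = (ks / n > lo) & (ks / n < hi)
    return binom.pmf(ks[mask], n, p).sum()

print(heads_prob(15_924))   # approximately 0.99
```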
Example
Consider a random variable $S_n$ which is defined to be the sum of $n$ independent fair coin flips. The law of such a random variable is called a binomial distribution. Let $m_n$ be the pmf of the normalized sum $S_n^*$. Use the code block below to observe that $m_n(x)$ appears to converge to $0$ for all $x$, and explain why this does not contradict the central limit theorem.

For simplicity, you may assume that $n$ is even.
```python
import matplotlib.pyplot as plt
import numpy as np
import scipy.stats

def binom_stickplot(n):
    """
    Return a stick plot representing the pmf of
    a sum of n independent coin flips
    """
    ν = scipy.stats.binom(n, 0.5)
    # x contains the possible RV values:
    x = (np.arange(n+1) - n/2) / np.sqrt(n/2)
    # y contains the probabilities:
    y = [ν.pmf(k) for k in range(n+1)]
    plt.ylim(0, 1)
    # draw a vertical stick from 0 up to each probability
    return plt.vlines(x, 0, y)

binom_stickplot(10)
```
```julia
using Plots, Distributions

function binom_stickplot(n)
    ν = Binomial(n, 0.5)
    sticks((-n÷2 : n÷2) / sqrt(n/2),
           [pdf(ν, k) for k in 0:n],
           label = "Binomial($n,1/2)",
           ylims = (0, 1))
end

binom_stickplot(1000)
```
Solution. Executing the cells, we see that the height of the tallest stick indeed goes to zero as the argument to binom_stickplot is increased.

This finding does not contradict the central limit theorem, since convergence in distribution is not based on convergence of the amount of probability mass at individual points but rather on the amount of probability mass assigned to intervals. In any positive-width interval, the distribution of $S_n^*$ has many points with nonzero probability mass. Since there are many of them, they can be small individually while nevertheless totaling up to a non-small mass.
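We can make this quantitative with a short sketch (using the standard normalization $\sigma\sqrt{n} = \frac{\sqrt{n}}{2}$, which differs from the plotting code's scaling by a constant factor): the tallest stick shrinks to zero while the total mass in a fixed interval approaches the Gaussian mass $\Phi(1) - \Phi(0) \approx 0.3413$.

```python
import numpy as np
from scipy.stats import binom

for n in (10, 100, 1000, 10_000):
    ks = np.arange(n + 1)
    pmf = binom.pmf(ks, n, 0.5)
    x = (ks - n / 2) / (0.5 * np.sqrt(n))   # normalized values of the sum
    in_interval = (0 < x) & (x < 1)
    # tallest stick tends to 0; interval mass tends to about 0.3413
    print(n, pmf.max(), pmf[in_interval].sum())
```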
Exercise
Suppose that the percentage of residents in favor of a particular policy is 64%. We sample $n$ individuals uniformly at random from the population.

In terms of $n$, find an interval $I$ centered at $0.64$ such that the proportion of residents polled who are in favor of the policy lies in $I$ with probability about 95%.
How many residents must be polled for the proportion of poll participants who are in favor of the policy to be between 62% and 66% with probability at least 95%?
Solution. Let $X_i$ be the $i$th sample from the population (1 if the resident is in favor of the policy, and 0 otherwise). Then the proportion of polled residents in favor of the policy is $\overline{X} = \frac{X_1 + \cdots + X_n}{n}$. Each $X_i$ is a Bernoulli random variable with mean $\mu = 0.64$ and standard deviation $\sigma = \sqrt{0.64 \cdot 0.36} = 0.48$.

We need to find $\epsilon > 0$ such that
$$\mathbb{P}(0.64 - \epsilon < \overline{X} < 0.64 + \epsilon) \approx 0.95.$$
Rewriting this probability in terms of the normalized sum and applying the normal approximation, we get
$$\mathbb{P}\left(-\frac{\epsilon\sqrt{n}}{0.48} < Z < \frac{\epsilon\sqrt{n}}{0.48}\right) \approx 0.95,$$
where $Z \sim \mathcal{N}(0,1)$. Since $\mathbb{P}(-1.96 < Z < 1.96) \approx 0.95$, we take $\frac{\epsilon\sqrt{n}}{0.48} = 1.96$, that is, $\epsilon = \frac{1.96 \cdot 0.48}{\sqrt{n}} \approx \frac{0.94}{\sqrt{n}}$. Therefore, with probability 95%, the proportion of polled residents in favor of the policy will be in
$$I = \left[0.64 - \frac{0.94}{\sqrt{n}},\; 0.64 + \frac{0.94}{\sqrt{n}}\right].$$

For the second part, we want to find the least $n$ for which $\frac{1.96 \cdot 0.48}{\sqrt{n}} \leq 0.02$. Solving for $n$ gives $n \geq \left(\frac{1.96 \cdot 0.48}{0.02}\right)^2 \approx 2212.8$, so polling $2{,}213$ residents suffices.
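To double-check the arithmetic for the second part, here is a short computation (a sketch; `norm.ppf` gives the Gaussian quantile function):

```python
import numpy as np
from scipy.stats import norm

sigma = np.sqrt(0.64 * 0.36)   # standard deviation of one Bernoulli(0.64) sample
z = norm.ppf(0.975)            # about 1.96, for a central 95% interval
n = (z * sigma / 0.02) ** 2    # require z * sigma / sqrt(n) <= 0.02
print(int(np.ceil(n)))         # 2213
```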
Exercise
Suppose that $X_1, X_2, \ldots$ are independent, identically distributed random variables with mean $7$ and standard deviation $3$. Estimate the limit as $n \to \infty$ of each of the following probabilities:
$$\mathbb{P}(X_1 + \cdots + X_{n} = 7n)$$
$$\mathbb{P}(6.9n < X_1 + \cdots + X_{n} < 7.1n)$$
$$\mathbb{P}(7n < X_1 + \cdots + X_{n} < 7n + 3\sqrt{n})$$
$$\mathbb{P}(6n < X_1 + \cdots + X_{n} < 7n + 3\sqrt{n})$$
Solution. Let $S_n = X_1 + \cdots + X_n$, and let $S_n^* = \frac{S_n - 7n}{3\sqrt{n}}$ be the normalized sum, so that $S_n^*$ is approximately distributed as $Z \sim \mathcal{N}(0,1)$.

For each $\epsilon > 0$, the event $\{S_n = 7n\}$ is contained in the event $\{-\epsilon < S_n^* < \epsilon\}$, whose probability converges to $\mathbb{P}(-\epsilon < Z < \epsilon)$ by the CLT. Since this limit can be made arbitrarily small by shrinking $\epsilon$, we get $\mathbb{P}(S_n = 7n) \to 0$.

We have
$$\mathbb{P}(6.9n < S_n < 7.1n) = \mathbb{P}\left(-\frac{0.1\sqrt{n}}{3} < S_n^* < \frac{0.1\sqrt{n}}{3}\right) \to 1$$
by the CLT, since the endpoints $\pm\frac{0.1\sqrt{n}}{3}$ tend to $\pm\infty$. Since the event $7n < S_n < 7n + 3\sqrt{n}$ is equivalent to $0 < S_n^* < 1$, we have
$$\mathbb{P}(7n < S_n < 7n + 3\sqrt{n}) \to \mathbb{P}(0 < Z < 1) \approx 0.3413$$
by the CLT. Finally, we have
$$\mathbb{P}(6n < S_n < 7n + 3\sqrt{n}) = \mathbb{P}\left(-\frac{\sqrt{n}}{3} < S_n^* < 1\right) \to \mathbb{P}(Z < 1) \approx 0.8413$$
by the CLT.
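Under the values used in this solution (mean 7 and standard deviation 3), the two Gaussian masses can be checked in a couple of lines:

```python
from scipy.stats import norm

print(norm.cdf(1) - norm.cdf(0))  # P(0 < Z < 1), about 0.3413
print(norm.cdf(1))                # P(Z < 1), about 0.8413
```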