Not every random phenomenon is ideally modeled using a discrete probability space. For example, we will see that the study of discrete distributions leads us to the Gaussian distribution, which smooths its probability mass out across the whole real number line, with most of the mass near the origin and less as you move out toward -\infty or \infty.
We won't be able to work with such distributions using probability mass functions, since the function which maps each point to the amount of probability mass at that point is the zero function. However, calculus provides us with a smooth way of specifying where stuff is on the number line and how to total it up: integration. We can define a function f which is larger where there's more probability mass and smaller where there's less, and we can calculate probabilities by integrating f.
The simplest possible choice for f is the function which is 1 on [0,1] and 0 elsewhere. In this case, the probability mass associated with a set E \subset [0,1] is the total length of E. In higher dimensions, \Omega = [0,1]^2 with the probability measure \mathbb{P}(E) = \operatorname{area}(E) gives us a probability space, as does \Omega = [0,1]^3 with the probability measure \mathbb{P}(E) = \operatorname{volume}(E).
Exercise Consider the probability space \Omega = [0,1]^2 with the area probability measure. Show that if X((\omega_1, \omega_2)) = \omega_1 and Y((\omega_1, \omega_2)) = \omega_2, then the events \{X \in I\} and \{Y \in J\} are independent for any intervals I \subset [0,1] and J \subset [0,1].
Solution. We have
\mathbb{P}(\{X \in I\} \cap \{Y \in J\}) = \operatorname{area}(I \times J) = |I| \cdot |J|
by the area formula for a rectangle. Since \mathbb{P}(X \in I) = |I| and \mathbb{P}(Y \in J) = |J|, we conclude that \{X \in I\} and \{Y \in J\} are independent.
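This independence can also be checked numerically. The following sketch (the interval endpoints and variable names are our own illustrative choices, not from the text) draws uniform points from the unit square and compares the estimated joint probability with the product of the marginal probabilities:

```python
import numpy as np

# Monte Carlo check of independence on the unit square with the area measure.
# We estimate P(X in I and Y in J) and compare it with P(X in I) * P(Y in J)
# for the arbitrarily chosen intervals I = [0.2, 0.7] and J = [0.1, 0.4].
rng = np.random.default_rng(0)
pts = rng.random((100_000, 2))           # uniform points in [0,1]^2
in_I = (0.2 <= pts[:, 0]) & (pts[:, 0] <= 0.7)
in_J = (0.1 <= pts[:, 1]) & (pts[:, 1] <= 0.4)
est_joint = np.mean(in_I & in_J)         # should be close to |I|*|J| = 0.15
est_product = np.mean(in_I) * np.mean(in_J)
```

Both estimates land near 0.5 \cdot 0.3 = 0.15, in line with the rectangle-area calculation above.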
The probability density function
Just as a function we integrate to find total mass is called a mass density function, the function f we integrate to find total probability is called a probability density function. We refer to f as a density because its value at a point \omega may be interpreted as the limit as \epsilon \to 0 of the probability mass in the ball of radius \epsilon around \omega divided by the volume (or area/length) of that ball.
Definition Suppose that \Omega \subset \mathbb{R}^n for some n \geq 1, and suppose that f: \Omega \to [0, \infty) has the property that \int_\Omega f = 1. We call f a probability density function, abbreviated PDF, and we define
\mathbb{P}(E) = \int_E f
for events E \subset \Omega. We call (\Omega, \mathbb{P}) a continuous probability space.
Exercise Consider the probability space with and probability measure given by the density for . Find .
Solution. We calculate .
If f is constant on \Omega, then we call \mathbb{P} the uniform measure on \Omega. Note that this requires that \Omega have finite volume.
All of the tools we developed for discrete probability spaces have analogues for continuous probability spaces. The main idea is to replace sums with integrals, and many of the definitions transfer over with no change. Let's briefly summarize and follow up with some exercises.
The distribution of a continuous random variable X is the measure A \mapsto \mathbb{P}(X \in A) on \mathbb{R}.
The cumulative distribution function F_X of a continuous random variable X is defined by F_X(x) = \mathbb{P}(X \leq x) for all x \in \mathbb{R}.
The joint distribution of two continuous random variables X and Y is the measure A \mapsto \mathbb{P}((X,Y) \in A) on \mathbb{R}^2.
If (X,Y) is a continuous pair of random variables with joint density f_{X,Y}, then the conditional distribution of Y given the event \{X = x\} has density f_{Y \mid X = x} defined by f_{Y \mid X = x}(y) = \frac{f_{X,Y}(x,y)}{f_X(x)}, where f_X is the pdf of X.
Two continuous random variables X and Y are independent if \mathbb{P}(\{X \in A\} \cap \{Y \in B\}) = \mathbb{P}(X \in A)\,\mathbb{P}(Y \in B) for all intervals A and B. This is true if and only if (X,Y) has density f_{X,Y}(x,y) = f_X(x)\,f_Y(y), where f_X and f_Y are the densities of X and Y, respectively.
The expectation of a continuous random variable X defined on a probability space (\Omega, \mathbb{P}) is \int_\Omega X(\omega) f(\omega) \,\mathrm{d}\omega, where f is \mathbb{P}'s density. The expectation is also given by \int_{\mathbb{R}} x f_X(x) \,\mathrm{d}x, where f_X is the density of the distribution of X.
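As a concrete sketch of the second formula, we can approximate \int_{\mathbb{R}} x f_X(x)\,\mathrm{d}x with a Riemann sum. Here we assume the density f(x) = 2x on [0,1] (an illustrative choice of ours), for which the exact expectation is 2/3:

```python
# Approximate E[X] = integral of x * f(x) over [0, 1] using a midpoint
# Riemann sum, for the illustrative density f(x) = 2x (exact answer: 2/3).
def f(x):
    return 2 * x

n = 10_000
dx = 1.0 / n
midpoints = [(i + 0.5) * dx for i in range(n)]
expectation = sum(x * f(x) * dx for x in midpoints)
```

The midpoint rule converges quadratically, so even a modest grid reproduces 2/3 to many digits.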
Example Suppose that f is the function which returns 2 for any point in the triangle with vertices (0,0), (1,0), and (0,1) and otherwise returns 0. Suppose that (X,Y) has density f. Find the conditional density of X given \{Y = y\}, where y is a number between 0 and 1.
Solution. The conditional density of X given \{Y = y\} is the uniform distribution on the segment from (0, y) to (1 - y, y), since that segment is the intersection of the triangle and the horizontal line at height y.
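A quick numerical sanity check of this example (assuming, as in our reading above, the triangle with vertices (0,0), (1,0), (0,1) and density value 2 on it): the total mass should be 1.

```python
# Midpoint-grid check that the density which equals 2 on the triangle with
# vertices (0,0), (1,0), (0,1) (an assumed reading of the example) and 0
# elsewhere integrates to 1 over the unit square.
def f(x, y):
    return 2.0 if x >= 0 and y >= 0 and x + y <= 1 else 0.0

n = 500
dx = 1.0 / n
total = sum(f((i + 0.5) * dx, (j + 0.5) * dx) * dx * dx
            for i in range(n) for j in range(n))
```

The density value 2 is forced by this normalization, since the triangle has area 1/2.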
Exercise Find the expectation of a random variable whose density is .
Solution. We calculate
Exercise Show that the cumulative distribution function of a continuous random variable is increasing and continuous.
(Note: if f is a nonnegative-valued function on \mathbb{R} satisfying \int_{\mathbb{R}} f = 1, then \lim_{\epsilon \to 0} \int_x^{x + \epsilon} f = 0 for all x \in \mathbb{R}.)
Solution. The CDF is increasing since \{X \leq a\} \subset \{X \leq b\} whenever a \leq b, which implies F_X(a) \leq F_X(b).
To see that F_X is continuous, we note that the difference between F_X(x + \epsilon) and F_X(x) is the integral of the density over a width-\epsilon interval. Thus we can use the supplied note to conclude that F_X(x + \epsilon) - F_X(x) \to 0 as \epsilon \to 0 for all x.
Exercise Suppose that f is a density function on \mathbb{R} and that F is the cumulative distribution function of the associated probability measure on \mathbb{R}. Show that F is differentiable and that F' = f wherever f is continuous.
Use this result to show that if X is uniformly distributed on [0,1], then \sqrt{X} has density function f(x) = 2x on [0,1].
Solution. The equation F' = f follows immediately from the fundamental theorem of calculus. We have
F'(x) = \frac{\mathrm{d}}{\mathrm{d}x} \int_{-\infty}^x f(t) \,\mathrm{d}t = f(x)
at any point x where f is continuous.
Let F be the CDF of \sqrt{X}. Since \mathbb{P}(\sqrt{X} \leq x) = \mathbb{P}(X \leq x^2) = x^2 for x \in [0,1], we have F(x) = x^2 for x \in [0,1]. Differentiating, we find that the density is f(x) = 2x on [0,1].
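We can sanity-check this conclusion by simulation: if X is uniform on [0,1], the fraction of samples of \sqrt{X} falling at or below t should be close to t^2. (The NumPy usage below is our own sketch, not part of the exercise.)

```python
import numpy as np

# Empirical check that sqrt(X) has CDF F(t) = t^2 when X is uniform on [0,1]:
# the fraction of samples at or below t should approximate t squared.
rng = np.random.default_rng(42)
samples = np.sqrt(rng.random(200_000))
ts = [0.25, 0.5, 0.75]
empirical = [float(np.mean(samples <= t)) for t in ts]
exact = [t ** 2 for t in ts]
```

With 200,000 samples the empirical and exact values agree to within about a hundredth.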
Exercise Given a cumulative distribution function F, let us define the generalized inverse F^{-1}: [0,1] \to \mathbb{R} so that F^{-1}(u) is the left endpoint of the interval of points which are mapped by F to a value which is greater than or equal to u.
The generalized inverse is like the inverse function of F, except that if the graph of F has a vertical jump somewhere, then all of the u-values spanned by the jump get mapped by F^{-1} to the x-value of the jump, and if the graph of F is flat over a stretch of x-values, then the corresponding u-value gets mapped by F^{-1} back to the left endpoint of the interval of x-values.
The remarkably useful inverse CDF trick gives us a way of sampling from any distribution whose CDF we can compute a generalized inverse for: it says that if U is uniformly distributed on [0,1], then the cumulative distribution function of F^{-1}(U) is F.
Confirm that if the graph of F has a jump from (x, u_1) to (x, u_2), then the probability of the event \{F^{-1}(U) = x\} is indeed u_2 - u_1.
Show that the event \{F^{-1}(U) \leq x\} has the same probability as the event \{U \leq F(x)\}. Conclude that F is in fact the CDF of F^{-1}(U). Hint: draw a figure showing the graph of F together with x somewhere on the horizontal axis and F(x) in the corresponding location on the vertical axis.
Write a Python function which samples from the distribution whose density function is f(x) = 2x for x \in [0,1] (and 0 elsewhere).
Solution.
It can be shown that, as a result of monotonicity and additivity,
\mathbb{P}(u_1 < U \leq u_2) = u_2 - u_1
whenever 0 \leq u_1 \leq u_2 \leq 1. Now, because a CDF is monotonic, if F has a jump from (x, u_1) to (x, u_2), it must be the case that F^{-1}(u) = x for every u \in (u_1, u_2] and for no other values of u. Therefore, \mathbb{P}(F^{-1}(U) = x) = \mathbb{P}(u_1 < U \leq u_2) = u_2 - u_1.

Since the density f(x) = 2x on [0,1] has CDF F(x) = x^2, the generalized inverse of F is F^{-1}(u) = \sqrt{u} for all 0 \leq u \leq 1. This leads to the following code for sampling from this distribution, in Python and in Julia.
Python:

import numpy as np

def sample():
    return np.sqrt(np.random.random_sample())
Julia:

using Distributions
sqrt(rand(Uniform(0, 1)))
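To see the inverse CDF trick at work on another distribution, here is a sketch applying it to the exponential distribution (our choice of example, not one from the exercise): the CDF F(x) = 1 - e^{-x} has inverse F^{-1}(u) = -\ln(1 - u), and the resulting samples should have mean 1.

```python
import math
import random

# Inverse CDF trick for the exponential distribution: F(x) = 1 - exp(-x),
# so F^{-1}(u) = -ln(1 - u). Pushing uniform samples through F^{-1} yields
# exponential samples; the sample mean should be close to 1.
random.seed(0)
samples = [-math.log(1.0 - random.random()) for _ in range(100_000)]
sample_mean = sum(samples) / len(samples)
```

Note that random.random() returns values in [0, 1), so 1 - u stays strictly positive and the logarithm is always defined.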
General probability spaces
So far we have discussed probability spaces which are specified with the help of either a probability mass function or a probability density function. These are not the only possibilities. For example, if we produce an infinite sequence of independent bits B_1, B_2, \ldots, then the distribution of B_1/3 + B_2 / 3^2 + B_3 / 3^3 + \cdots has CDF as shown in the figure below. This function doesn't have jumps, so it does not arise from cumulatively summing a mass function. But it does all of its increasing on a set of total length zero (in other words, there is a set of total length 1 on which the derivative of this function is zero), so it also does not arise from cumulatively integrating a density function.
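Although this distribution has neither a mass function nor a density, it is easy to sample from approximately by truncating the series (the function name and truncation depth below are our own choices):

```python
import random

# Approximate sample from the distribution of B_1/3 + B_2/3^2 + ... by
# truncating after 30 independent fair bits; the truncation error is at
# most 3**-30. The full series lies in [0, 1/2], since sum(1/3**k) = 1/2,
# and its mean is sum(0.5 / 3**k) = 1/4.
def sample_singular(n_bits=30):
    return sum(random.getrandbits(1) / 3 ** k for k in range(1, n_bits + 1))

random.seed(0)
xs = [sample_singular() for _ in range(20_000)]
```

A histogram of such samples reveals the characteristic gaps of the Cantor-like support set: no samples land in (1/3, 1/2 - 1/6), or in the corresponding gaps at finer scales.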
In general, a person may propose a probability space by specifying any set \Omega, a collection of subsets of \Omega which supports taking countable unions, intersections, and complements, and a function \mathbb{P} defined on that collection of subsets. We require that certain properties are satisfied:
Definition (Probability space: the general definition) Suppose that \Omega is a set and \mathbb{P} is a function defined on a collection of subsets of \Omega (called events). If
\mathbb{P}(\Omega) = 1,
\mathbb{P}(E) \geq 0 for all events E, and
\mathbb{P}(E_1 \cup E_2 \cup \cdots) = \mathbb{P}(E_1) + \mathbb{P}(E_2) + \cdots for all sequences of pairwise disjoint events E_1, E_2, \ldots,
then we say that \mathbb{P} is a probability measure on \Omega, and that \Omega together with the given collection of events and the measure \mathbb{P} is a probability space.