Expectation and Variance
We often want to distill a random variable's distribution down to a single number. For example, consider the height of an individual selected uniformly at random from a given population. This is a random variable, and communicating its distribution would involve communicating the heights of every person in the population. However, we can summarize the distribution by reporting an average height: we add up the heights of the people in the population and divide by the number of people.
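If the random individual is selected according to some non-uniform probability distribution on the population, then it makes sense to calculate a weighted average instead, in which each person's height is weighted by the probability that the person is selected.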
Definition
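The expectation (or mean) $\mathbb{E}[X]$ of a random variable $X$ is the probability-weighted average of $X$:
$$\mathbb{E}[X] = \sum_{\omega \in \Omega} X(\omega)\, m(\omega),$$
where $\Omega$ is the sample space and $m$ is its probability mass function.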
For example, the expected number of heads in two fair coin flips is
$$\frac{1}{4}(2) + \frac{1}{4}(1) + \frac{1}{4}(1) + \frac{1}{4}(0) = 1.$$
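One way to check the definition is by brute force: list the outcomes in the sample space, multiply each value of the random variable by its probability mass, and add. The following Python sketch does this for the two-coin example, assuming the four outcomes are equally likely; the names omega, m, and X are chosen here for illustration, and exact Fraction arithmetic avoids rounding.

```python
from fractions import Fraction
from itertools import product

# Sample space for two fair coin flips: each of the 4 outcomes has mass 1/4.
omega = list(product("HT", repeat=2))
m = {w: Fraction(1, 4) for w in omega}

def X(w):
    """Number of heads in the outcome w."""
    return w.count("H")

# Probability-weighted average of X over the sample space.
expectation = sum(X(w) * m[w] for w in omega)
print(expectation)  # 1
```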
There are two common ways of interpreting expected value.
- The expectation $\mathbb{E}[X]$ may be thought of as the value of a random game with payout $X$. According to this interpretation, you should be willing to pay anything less than $1 to play the game where you get a dollar for each head in two fair coin flips. For more than $1 you should be unwilling to play the game, and at $1 you should be indifferent.
- The second way of thinking about expected value is as a long-run average. If you play the dollar-per-head two-coin-flip game a very large number of times, then your average payout per play is very likely to be close to $1.
We can test this second interpretation out:
Exercise
Use the expression sum(randint(0,2) + randint(0,2) for _ in range(10**6))/10**6 (Python) or mean(rand(0:1) + rand(0:1) for k=1:10^6) (Julia) to play the dollar-per-head two-coin-flip game a million times and calculate the average payout in those million runs.
How close to 1 is the result typically? Choose the best answer.
from numpy.random import randint
sum(randint(0,2) + randint(0,2) for _ in range(10**6))/10**6
Solution. Running the code several times, we see that the error is seldom as large as 0.01 or as small as 0.0000001. So the correct answer choice is the third one.
We will see that this second interpretation is actually a theorem in probability, called the law of large numbers. In the meantime, however, this interpretation gives us a useful tool for investigation: if a random variable is easy to simulate, then we can sample from it many times and calculate the average of the resulting samples. This will not give us the expected value exactly, but with high probability the average will be as close as desired if we use sufficiently many samples. This is called the Monte Carlo method of approximating the expectation of a random variable.
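The Monte Carlo method is easy to package as a small helper. The sketch below is one possible version in Python, assuming NumPy's Generator API (np.random.default_rng and Generator.integers) instead of the randint loop used in the exercise above; monte_carlo_mean and two_coin_payout are hypothetical names introduced here. It repeats the million-play experiment five times and records how far each average lands from 1.

```python
import numpy as np

def monte_carlo_mean(sample, n, rng):
    """Estimate E[X] by averaging n independent draws produced by sample(rng, n)."""
    return sample(rng, n).mean()

def two_coin_payout(rng, n):
    """n plays of the dollar-per-head game: each row holds two fair 0/1 flips."""
    return rng.integers(0, 2, size=(n, 2)).sum(axis=1)

rng = np.random.default_rng(0)
# Repeat the million-play experiment a few times and record the error from 1.
errors = [abs(monte_carlo_mean(two_coin_payout, 10**6, rng) - 1) for _ in range(5)]
print(errors)
```

Vectorizing the draws keeps a million plays fast; the printed errors are typically well below 0.01, in line with the solution to the exercise.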
Exercise
Use a Monte Carlo simulation to estimate the expectation of $X/Y$, where $X$ and $Y$ are independent die rolls.
Solution (Python). With randint imported from numpy.random, sum(randint(1,7)/randint(1,7) for i in range(10_000_000))/10_000_000 returns approximately 1.43. The actual mean is sum(x/y for x in range(1,7) for y in range(1,7))/36, which is $\frac{343}{240} \approx 1.4292$. So the Monte Carlo result with 10 million trials is quite close to the correct value.
Solution (Julia). mean(rand(1:6)/rand(1:6) for i=1:10^8) returns approximately 1.43. The actual mean is mean(x/y for x=1:6, y=1:6), which is $\frac{343}{240} \approx 1.4292$.
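The exact value can also be computed with rational arithmetic, which confirms the value 343/240. The Python sketch below assumes NumPy's Generator API rather than the numpy.random.randint calls in the solution; it computes the exact mean over the 36 equally likely pairs and a vectorized Monte Carlo estimate from one million draws.

```python
from fractions import Fraction
import numpy as np

# Exact mean of X/Y over the 36 equally likely (x, y) pairs.
exact = sum(Fraction(x, y) for x in range(1, 7) for y in range(1, 7)) / 36
print(exact)  # 343/240
print(float(exact))

# Vectorized Monte Carlo estimate: n independent pairs of die rolls.
rng = np.random.default_rng(1)
n = 1_000_000
x = rng.integers(1, 7, size=n)   # values 1..6 (upper bound is exclusive)
y = rng.integers(1, 7, size=n)
estimate = (x / y).mean()
print(estimate)
```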
The following exercise confirms an intuitive fact about expectation: a random variable which is always larger than another has a larger mean. We will state this idea with "larger" replaced by its weak version "at least as large as".
Exercise
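Explain why $\mathbb{E}[X] \leq \mathbb{E}[Y]$ if $X$ and $Y$ are random variables on the same probability space with $X(\omega) \leq Y(\omega)$ for all $\omega \in \Omega$.
Solution. If $X(\omega) \leq Y(\omega)$ for every $\omega \in \Omega$, then $X(\omega)\, m(\omega) \leq Y(\omega)\, m(\omega)$ for every $\omega$, since $m(\omega) \geq 0$. Summing this inequality over all $\omega \in \Omega$ gives $\mathbb{E}[X] \leq \mathbb{E}[Y]$.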
Expectation and distribution
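Although the definition $\mathbb{E}[X] = \sum_{\omega \in \Omega} X(\omega)\, m(\omega)$ refers to the probability space on which $X$ is defined, the expectation depends only on the distribution of $X$: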
Theorem
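The expectation of a discrete random variable $X$ can be computed from its distribution:
$$\mathbb{E}[X] = \sum_{x} x\, \mathbb{P}(X = x),$$
where the sum runs over the values $x$ taken by $X$.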
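Proof. The idea is that the given formula is just a rearrangement of the terms in the definition of expectation. Let's begin by considering an example. Suppose $\Omega = \{\omega_1, \omega_2, \omega_3\}$ and $X(\omega_1) = X(\omega_2) = x_1$, while $X(\omega_3) = x_2$ with $x_2 \neq x_1$. Then
$$\mathbb{E}[X] = X(\omega_1)\, m(\omega_1) + X(\omega_2)\, m(\omega_2) + X(\omega_3)\, m(\omega_3).$$
We can group the first two terms together to get
$$\mathbb{E}[X] = x_1 \big(m(\omega_1) + m(\omega_2)\big) + x_2\, m(\omega_3).$$
This expression is the one we would get if we wrote out $\sum_x x\, \mathbb{P}(X = x) = x_1\, \mathbb{P}(X = x_1) + x_2\, \mathbb{P}(X = x_2)$, because $\mathbb{P}(X = x_1) = m(\omega_1) + m(\omega_2)$ and $\mathbb{P}(X = x_2) = m(\omega_3)$.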
Therefore, we can see that the two sides are the same.
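Let's write this idea down in general form. We group terms on the right-hand side in the formula $\mathbb{E}[X] = \sum_{\omega \in \Omega} X(\omega)\, m(\omega)$ according to the value of $X(\omega)$:
$$\mathbb{E}[X] = \sum_{x} \sum_{\omega \in \Omega : X(\omega) = x} X(\omega)\, m(\omega).$$
Then we can replace $X(\omega)$ with $x$ in the inner sum and factor it out:
$$\mathbb{E}[X] = \sum_{x} x \sum_{\omega \in \Omega : X(\omega) = x} m(\omega).$$
Since $\sum_{\omega \in \Omega : X(\omega) = x} m(\omega) = \mathbb{P}(X = x)$, we obtain
$$\mathbb{E}[X] = \sum_{x} x\, \mathbb{P}(X = x),$$
as desired.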
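The theorem can be checked on a concrete case. The following Python sketch uses the sum of two fair dice as an illustrative example (not one from the text) and computes its expectation twice: once from the definition, summing over all 36 outcomes, and once from the distribution, after grouping outcomes by their value of X as in the proof.

```python
from collections import defaultdict
from fractions import Fraction
from itertools import product

# Sample space: ordered pairs of die rolls, each with mass 1/36.
omega = list(product(range(1, 7), repeat=2))
m = {w: Fraction(1, 36) for w in omega}

def X(w):
    """Sum of the two rolls in outcome w."""
    return w[0] + w[1]

# Definition: weight X(omega) by m(omega) and add over the sample space.
by_outcomes = sum(X(w) * m[w] for w in omega)

# Theorem: group outcomes by their value of X to get P(X = x) ...
pmf = defaultdict(Fraction)
for w in omega:
    pmf[X(w)] += m[w]
# ... then weight each value x by P(X = x).
by_distribution = sum(x * p for x, p in pmf.items())

print(by_outcomes, by_distribution)  # 7 7
```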
Exercise
The expectation of a random variable need not be finite or even well-defined. Show that the expectation of the random variable which assigns a probability mass of
Consider a random variable
Solution. We multiply the probability mass at each point
For the second distribution, the positive and negative parts of the sum are both infinite for the same reason. Therefore the sum does not make sense, and the mean is not well-defined.
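To see numerically how an expectation can fail to be finite, consider the distribution with $\mathbb{P}(X = n) = \frac{6}{\pi^2 n^2}$ for $n = 1, 2, 3, \ldots$, a standard example assumed here for illustration that may differ from the distribution intended in the exercise. Each term $n\, \mathbb{P}(X = n)$ equals $\frac{6}{\pi^2 n}$, so the partial sums of the expectation are a multiple of the harmonic series. The Python sketch below prints a few of them.

```python
import math

# Assumed example (not necessarily the exercise's distribution):
# P(X = n) = 6 / (pi^2 n^2) for n = 1, 2, 3, ..., which sums to 1.
c = 6 / math.pi**2

def partial_mean(N):
    """Sum of n * P(X = n) for n = 1..N."""
    return sum(n * c / n**2 for n in range(1, N + 1))

for N in (10**2, 10**4, 10**6):
    print(N, partial_mean(N))
```

The partial sums keep growing, roughly like $0.61 \ln N$, rather than settling down to a limit.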
We can also work out the expectation of a function of two random variables using their joint distribution:
Theorem
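If $X$ and $Y$ are discrete random variables defined on the same probability space and $g$ is a function from $\mathbb{R}^2$ to $\mathbb{R}$, then
$$\mathbb{E}[g(X, Y)] = \sum_{(x, y)} g(x, y)\, \mathbb{P}(X = x \text{ and } Y = y).$$
Proof. We use the same idea we used in the proof of the expectation formula: group terms in the definition of expectation according to the value of the pair $(X(\omega), Y(\omega))$:
$$\mathbb{E}[g(X, Y)] = \sum_{(x, y)} \sum_{\omega \in \Omega : (X(\omega), Y(\omega)) = (x, y)} g(x, y)\, m(\omega) = \sum_{(x, y)} g(x, y)\, \mathbb{P}(X = x \text{ and } Y = y).$$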
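The theorem turns a joint pmf into an expectation with a single sum. A minimal Python sketch, assuming two independent fair dice and taking $g(x, y) = \max(x, y)$ as an illustrative choice; expectation_of is a hypothetical helper name.

```python
from fractions import Fraction
from itertools import product

# Joint pmf of two independent fair dice: P(X = x and Y = y) = 1/36.
joint = {(x, y): Fraction(1, 36) for x, y in product(range(1, 7), repeat=2)}

def expectation_of(g, joint):
    """E[g(X, Y)] computed as the sum of g(x, y) * P(X = x and Y = y)."""
    return sum(g(x, y) * p for (x, y), p in joint.items())

print(expectation_of(max, joint))  # 161/36
```

The exact answer agrees with a direct calculation from $\mathbb{P}(\max(X, Y) = k) = \frac{2k - 1}{36}$.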
We can use this theorem to show that expectation distributes across multiplication for independent random variables:
Exercise (independence product formula)
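Show that $\mathbb{E}[XY] = \mathbb{E}[X]\,\mathbb{E}[Y]$ if $X$ and $Y$ are independent discrete random variables.
Solution. Using the previous theorem with $g(x, y) = xy$ and the definition of independence, we have
$$\mathbb{E}[XY] = \sum_{(x, y)} xy\, \mathbb{P}(X = x \text{ and } Y = y) = \sum_{(x, y)} xy\, \mathbb{P}(X = x)\, \mathbb{P}(Y = y) = \Big(\sum_{x} x\, \mathbb{P}(X = x)\Big)\Big(\sum_{y} y\, \mathbb{P}(Y = y)\Big) = \mathbb{E}[X]\,\mathbb{E}[Y],$$
as desired.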
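A quick exact check of the product formula, and of why independence matters, in Python. The two joint distributions below are illustrative assumptions: two independent fair dice, and a die paired with a copy of itself. The helper E is a hypothetical name.

```python
from fractions import Fraction
from itertools import product

def E(g, joint):
    """E[g(X, Y)] from a joint pmf given as {(x, y): probability}."""
    return sum(g(x, y) * p for (x, y), p in joint.items())

faces = range(1, 7)
# Independent dice: every pair has probability 1/36.
indep = {(x, y): Fraction(1, 36) for x, y in product(faces, repeat=2)}
# Fully dependent: Y is a copy of X, so only the diagonal pairs occur.
same = {(x, x): Fraction(1, 6) for x in faces}

for name, joint in [("independent", indep), ("Y = X", same)]:
    lhs = E(lambda x, y: x * y, joint)
    rhs = E(lambda x, y: x, joint) * E(lambda x, y: y, joint)
    print(name, lhs, rhs)
# independent 49/4 49/4
# Y = X 91/6 49/4
```

For the dependent pair, $\mathbb{E}[XY] = \mathbb{E}[X^2] = 91/6$ differs from $\mathbb{E}[X]\,\mathbb{E}[Y] = 49/4$, so the formula genuinely needs the independence hypothesis.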
Variance
The expectation of a random variable gives us some coarse information about where on the number line the random variable's probability mass is located. The variance gives us some information about how widely the probability mass is spread around its mean. A random variable whose distribution is highly concentrated about its mean will have a small variance, and a random variable which is likely to be very far from its mean will have a large variance. We define the variance of a random variable $X$ to be the expected squared distance between $X$ and its mean.
Definition (Variance)
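The variance of a random variable $X$ is
$$\operatorname{Var} X = \mathbb{E}\big[(X - \mathbb{E}[X])^2\big].$$
The standard deviation $\sigma(X)$ of $X$ is the square root of the variance: $\sigma(X) = \sqrt{\operatorname{Var} X}$.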
Exercise
Consider a random variable which is obtained by making a selection from the list [0.245, 0.874, 0.998, 0.567, 0.482]
uniformly at random. Make a rough estimate of the mean and variance of this random variable just from looking at the number line. Then use Python to calculate the mean and variance exactly to see how close your estimates were.
Solution. My estimates of the mean and variance are
Calculating the mean exactly using m = mean([0.245, 0.874, 0.998, 0.567, 0.482]), we get a value of 0.6332. Calculating the variance exactly using mean([(a-m)^2 for a in A]) (where A = [0.245, 0.874, 0.998, 0.567, 0.482]), we get a value of approximately 0.0738.
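For comparison, here is a Python version of the same calculation using NumPy; it assumes nothing beyond the list from the exercise. Note that np.var uses the population formula (ddof=0) by default, which matches the definition of variance given above, since each of the five values has probability 1/5.

```python
import numpy as np

A = np.array([0.245, 0.874, 0.998, 0.567, 0.482])
mu = A.mean()                    # probability-weighted average with weights 1/5
var = np.mean((A - mu) ** 2)     # expected squared deviation from the mean
print(mu, var, np.var(A), np.sqrt(var))
```

It reports a mean of about 0.6332 and a variance of about 0.0738, so the standard deviation is roughly 0.27.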
Exercise
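Consider the following game. We begin by picking a number in $\{0, 0.001, 0.002, \ldots, 0.999, 1\}$ uniformly at random. We keep picking numbers independently from the same distribution and adding them to a running total until the total is at least 1. The outcome of the game is the number of picks. Use a Monte Carlo simulation to estimate the mean and variance of the outcome.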
Tips: rand(0:1000)/1000 (Julia) or np.random.randint(0, 1001)/1000 (Python) returns a sample from the desired distribution. Also, it's a good idea to wrap a single run of the game into a zero-argument function.
Solution. We define a function runs_till_over which plays the game once, and we record the result of the game over a million runs. We estimate the mean as the mean of the resulting list, and we estimate the variance as the average squared deviation from that estimated mean:
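import numpy as np

def runs_till_over():
    # Play the game once and return the number of picks needed for the running
    # total to reach 1. The total is kept in thousandths (an integer) so that
    # floating-point rounding cannot leave it just below 1.
    s = 0
    ctr = 0
    while s < 1000:
        s += np.random.randint(0, 1001)   # one pick of k/1000, with k uniform on 0..1000
        ctr += 1
    return ctr

A = np.array([runs_till_over() for _ in range(1_000_000)])
μ = np.mean(A)
var = np.mean((A - μ) ** 2)   # average squared deviation from the estimated mean
μ, var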
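using Statistics: mean

function runs_till_over()
    # Play the game once; the total is kept in thousandths (an integer)
    # so that floating-point rounding cannot leave it just below 1.
    s = 0
    ctr = 0
    while s < 1000
        s += rand(0:1000)   # one pick of k/1000, with k uniform on 0:1000
        ctr += 1
    end
    ctr
end

A = [runs_till_over() for _ in 1:1_000_000]
μ = mean(A)
var = mean((a-μ)^2 for a in A)
μ,var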
We get a mean of about
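We can use linearity of expectation to rewrite the formula for variance in a simpler form:
$$\operatorname{Var} X = \mathbb{E}\big[X^2 - 2X\,\mathbb{E}[X] + \mathbb{E}[X]^2\big] = \mathbb{E}[X^2] - 2\,\mathbb{E}[X]^2 + \mathbb{E}[X]^2 = \mathbb{E}[X^2] - \mathbb{E}[X]^2.$$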
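Both formulas can be compared on a simple distribution. The Python sketch below takes a fair six-sided die as an illustrative example and evaluates the definition $\mathbb{E}[(X - \mathbb{E}[X])^2]$ and the simplified form $\mathbb{E}[X^2] - \mathbb{E}[X]^2$ with exact fractions.

```python
from fractions import Fraction

# Fair die: each face 1..6 has probability 1/6.
pmf = {x: Fraction(1, 6) for x in range(1, 7)}
mu = sum(x * p for x, p in pmf.items())

var_definition = sum((x - mu) ** 2 * p for x, p in pmf.items())   # E[(X - E[X])^2]
var_shortcut = sum(x**2 * p for x, p in pmf.items()) - mu**2      # E[X^2] - E[X]^2

print(mu, var_definition, var_shortcut)  # 7/2 35/12 35/12
```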
We can use this formula to show how variance interacts with linear operations:
Exercise
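Show that variance satisfies the properties
$$\operatorname{Var}(aX) = a^2 \operatorname{Var} X \quad \text{and} \quad \operatorname{Var}(X + Y) = \operatorname{Var} X + \operatorname{Var} Y$$
for any constant $a$, where the second property holds if $X$ and $Y$ are independent random variables.
Proof. The first part of the statement follows easily from linearity of expectation:
$$\operatorname{Var}(aX) = \mathbb{E}\big[(aX - a\,\mathbb{E}[X])^2\big] = a^2\, \mathbb{E}\big[(X - \mathbb{E}[X])^2\big] = a^2 \operatorname{Var} X.$$
For the second part, since $\mathbb{E}[X + Y] = \mathbb{E}[X] + \mathbb{E}[Y]$, the formula $\operatorname{Var} Z = \mathbb{E}[Z^2] - \mathbb{E}[Z]^2$ with $Z = X + Y$ gives
$$\operatorname{Var}(X + Y) = \mathbb{E}\big[(X + Y)^2\big] - \big(\mathbb{E}[X] + \mathbb{E}[Y]\big)^2.$$
Rearranging and using linearity of expectation, we get
$$\operatorname{Var}(X + Y) = \mathbb{E}[X^2] - \mathbb{E}[X]^2 + \mathbb{E}[Y^2] - \mathbb{E}[Y]^2 + 2\big(\mathbb{E}[XY] - \mathbb{E}[X]\,\mathbb{E}[Y]\big) = \operatorname{Var} X + \operatorname{Var} Y + 2\big(\mathbb{E}[XY] - \mathbb{E}[X]\,\mathbb{E}[Y]\big).$$
The desired result follows because if $X$ and $Y$ are independent, then $\mathbb{E}[XY] = \mathbb{E}[X]\,\mathbb{E}[Y]$.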
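These properties can be checked exactly on small distributions. The Python sketch below assumes a fair die for $X$, $a = 3$, and an independent copy $Y$ of the die; var and pmf_of are hypothetical helper names. It builds the distributions of $aX$ and $X + Y$ directly and compares variances.

```python
from fractions import Fraction
from itertools import product

def var(pmf):
    """Variance from a pmf {value: probability}, via E[X^2] - E[X]^2."""
    mu = sum(x * p for x, p in pmf.items())
    return sum(x**2 * p for x, p in pmf.items()) - mu**2

def pmf_of(values_and_probs):
    """Collect (value, probability) pairs into a pmf, merging repeated values."""
    pmf = {}
    for v, p in values_and_probs:
        pmf[v] = pmf.get(v, 0) + p
    return pmf

die = {x: Fraction(1, 6) for x in range(1, 7)}
a = 3

# Distribution of aX.
scaled = pmf_of((a * x, p) for x, p in die.items())
# Distribution of X + Y for independent X and Y: multiply the marginal masses.
total = pmf_of((x + y, p * q) for (x, p), (y, q) in product(die.items(), repeat=2))

print(var(scaled), a**2 * var(die))   # 105/4 105/4
print(var(total), 2 * var(die))       # 35/6 35/6
```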
Exercise
Consider the distribution which assigns a probability mass of
Show that this distribution has a finite mean but not a finite variance.
Solution. Let
Since the sum on the right converges by the
does not converge because of the harmonic series term. Therefore
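A numerical illustration of the phenomenon, under an assumed distribution that may differ from the one in the exercise: $\mathbb{P}(X = n) = \frac{1}{\zeta(3)\, n^3}$ for $n = 1, 2, 3, \ldots$, where $\zeta(3) \approx 1.2021$ is the normalizing constant. Then $n\, \mathbb{P}(X = n) = \frac{1}{\zeta(3)\, n^2}$, whose sum converges, while $n^2\, \mathbb{P}(X = n) = \frac{1}{\zeta(3)\, n}$ is a multiple of the harmonic series. The Python sketch below prints partial sums of both.

```python
# Assumed example (not necessarily the exercise's distribution):
# P(X = n) = 1 / (zeta(3) * n^3) for n = 1, 2, 3, ...
ZETA3 = 1.2020569031595942  # Apery's constant, the sum of 1/n^3

def partial_moments(N):
    """Partial sums of n * P(X = n) and n^2 * P(X = n) for n = 1..N."""
    first = sum(1 / (ZETA3 * n**2) for n in range(1, N + 1))
    second = sum(1 / (ZETA3 * n) for n in range(1, N + 1))
    return first, second

for N in (10**2, 10**4, 10**6):
    print(N, partial_moments(N))
```

The first component settles near 1.368 while the second keeps growing, consistent with a finite mean and an infinite second moment, and hence an infinite variance.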