# Probabilistic Programming

Using MCMC to do inference can be hard work, as we saw in the previous section. However, if we take the Bayesian approach, where the model parameters are random variables just like the data, then the only things the user really needs to specify are the priors, the structure of the model, and the sampler (e.g., in the example in the previous section, we used a Gibbs sampler with a bit-switching proposal distribution). The rest is calculation, which could in principle be handled automatically by a probability-aware programming framework.

**Probabilistic programming** systems seek to automate Bayesian inference by allowing the user to specify model structure in the form of a *program*, choose samplers, and let the computer handle the rest.

**Example**

Suppose we have a 6-sided die. Letting $X$ represent the outcome of a roll, suppose $\mathbb{P}(X = 1) = p$ and $\mathbb{P}(X = x) = \frac{1-p}{5}$ for $x \in \{2, 3, 4, 5, 6\}$. If we assume that $p$ has a beta prior, then after observing many rolls we can determine analytically that the posterior distribution of $p$ given the data is also a beta distribution. While finding a closed-form solution of this posterior is simple for this problem, this may not be the case for more complicated problems. Use probabilistic programming and MCMC to obtain samples from the posterior.
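The conjugacy claim can be verified directly. With a $\mathrm{Beta}(\alpha, \beta)$ prior on $p$ and $n_1$ ones observed in $n$ rolls, the factors of $\frac{1}{5}$ in the likelihood are constant in $p$ and drop out:

```latex
\pi(p \mid \text{data})
  \propto \underbrace{p^{\alpha-1}(1-p)^{\beta-1}}_{\text{prior}}
    \cdot \underbrace{p^{n_1}\left(\tfrac{1-p}{5}\right)^{n-n_1}}_{\text{likelihood}}
  \propto p^{\alpha + n_1 - 1}(1-p)^{\beta + n - n_1 - 1},
```

which is the density of $\mathrm{Beta}(\alpha + n_1,\; \beta + n - n_1)$.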

*Solution.* We begin by specifying the model from which the data was drawn:

```julia
using Turing, MCMCChains, Distributions, StatsPlots, CSV

@model die_model(observations) = begin
    # Set prior distribution for p (probability of rolling 1)
    p ~ Beta(2,5)
    # Probability of rolling one of 2 - 6
    q = (1-p)/5
    # Define how samples are obtained
    for i in 1:length(observations)
        observations[i] ~ Categorical([p, q, q, q, q, q])
    end
end
```

Some points about how this works:

- The model definition works essentially like a function definition, with the `@model` macro in place of the `function` keyword. The argument (in this case, `observations`) should be the *observed* data that will be provided for purposes of performing inference.
- Hidden random variables appear before a tilde (`~`), which is a common probability syntax for "is distributed according to". These random variables will be tracked by the framework, and you'll get information about their posterior distributions once you provide the observed data.
- Other than these distinctions, the program is normal Julia code. It describes the procedure we would use to sample from the model's prior distribution.

Note that above we have imposed a prior of $\mathrm{Beta}(2, 5)$ for $p$, encoding a belief that the die is biased toward 1.

After defining the model, we need to describe a method for proposing steps in the Markov chain we'll be using to sample from the posterior distribution. In practice, using a straightforward proposal distribution for the Metropolis-Hastings updates can be very inefficient, because the proposals it suggests often go in directions of much smaller probability density. Hamiltonian Monte Carlo instead differentiates the log density (usually via automatic differentiation) and uses that gradient information to propose steps that tend to travel toward regions of higher probability.
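To give a feel for the mechanics, here is a minimal sketch of the leapfrog dynamics underlying an HMC proposal, for a one-dimensional standard normal target. This is illustrative only: `leapfrog` and the step parameters are hypothetical, not Turing's internals, and the gradient is written by hand rather than computed by automatic differentiation.

```python
def leapfrog(grad_logp, q, p, eps, steps):
    # Leapfrog integration of Hamiltonian dynamics: a half step on the
    # momentum p, alternating full steps on position q and momentum p,
    # then a final half step on the momentum.
    p = p + 0.5 * eps * grad_logp(q)
    for _ in range(steps - 1):
        q = q + eps * p
        p = p + eps * grad_logp(q)
    q = q + eps * p
    p = p + 0.5 * eps * grad_logp(q)
    return q, p

# Standard normal target: log density -q^2/2 (up to a constant),
# with gradient -q.
logp = lambda q: -0.5 * q * q
grad = lambda q: -q

q0, p0 = 1.0, 0.5
q1, p1 = leapfrog(grad, q0, p0, eps=0.05, steps=10)

# The Hamiltonian (negative log density plus kinetic energy) is nearly
# conserved by leapfrog integration, which is why the Metropolis-Hastings
# correction accepts HMC proposals with high probability.
H0 = -logp(q0) + 0.5 * p0 ** 2
H1 = -logp(q1) + 0.5 * p1 ** 2
```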

Let's use a Hamiltonian Monte Carlo sampler for this problem:

```julia
# Use HMC sampler to obtain posterior samples
n_posterior_observations = 1000
ϵ = 0.05  # Leapfrog step size
τ = 10    # Number of leapfrog iterations

data = [4,6,4,5,1,4,1,5,5,5,6,5,3,5,2,1,1,1,1,4,
        2,1,1,1,3,6,4,2,1,5,1,3,4,2,1,3,3,3,2,3,
        1,1,1,1,2,1,2,3,6,4,2,1,3,6,2,2,5,6,1,4,
        6,4,4,2,4,4,6,1,2,1,5,3,1,3,6,3,1,3,6,1,
        3,5,6,6,4,4,4,2,2,3,2,5,1,3,1,5,1,5,6,5]

# Obtain posterior samples
chain = sample(die_model(data), HMC(ϵ, τ), n_posterior_observations, progress=false)
```

The variable `data` above is a vector containing the observed data.

We can now obtain summary statistics of the parameter $p$, such as its mean and standard deviation, and plot a histogram of the posterior samples with the following code:

```julia
# Extract summary of p parameter and plot histogram
p_summary = chain[:p]
plot(p_summary, seriestype = :histogram)
```

which produces the following histogram:

The variable `p_summary` above is an `MCMCChains` object containing the samples of $p$ drawn from the posterior. To obtain the samples as a plain array, we can run

```julia
Array(p_summary)
```

We can now use this to construct confidence intervals for $p$. For example, a 95% confidence interval is given by

```julia
quantile(Array(p_summary), [0.025, 0.975])
```
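Since the beta prior is conjugate to this likelihood, the exact posterior is available in closed form, and we can sanity-check the MCMC interval against it. A sketch in Python (assuming SciPy is available; the die data is the same as above):

```python
from scipy.stats import beta

# The 100 die rolls used above
data = [4,6,4,5,1,4,1,5,5,5,6,5,3,5,2,1,1,1,1,4,
        2,1,1,1,3,6,4,2,1,5,1,3,4,2,1,3,3,3,2,3,
        1,1,1,1,2,1,2,3,6,4,2,1,3,6,2,2,5,6,1,4,
        6,4,4,2,4,4,6,1,2,1,5,3,1,3,6,3,1,3,6,1,
        3,5,6,6,4,4,4,2,2,3,2,5,1,3,1,5,1,5,6,5]

# With a Beta(2, 5) prior and n_1 ones out of n rolls, the posterior is
# Beta(2 + n_1, 5 + n - n_1); the factors of 1/5 in the likelihood
# cancel in the normalization.
n, n1 = len(data), data.count(1)
alpha_post, beta_post = 2 + n1, 5 + (n - n1)

posterior_mean = alpha_post / (alpha_post + beta_post)
interval = beta.ppf([0.025, 0.975], alpha_post, beta_post)
print(posterior_mean, interval)
```

The MCMC histogram and quantiles above should agree closely with these exact values.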

Below we consider a slightly more complex HMM problem.

**Example**

Consider an HMM with hidden variables $Z_1, \ldots, Z_n \in \{1, 2\}$ and observations $X_1, \ldots, X_n$, where each $X_i$ is normally distributed with mean $Z_i$ and standard deviation $0.1$, and where the transition matrix of the hidden chain is defined by:

$$P = \begin{bmatrix} p_1 & 1 - p_1 \\ p_2 & 1 - p_2 \end{bmatrix}$$

Suppose we have observed the values of $X_1, \ldots, X_n$ (loaded as `data` below). Estimate $p_1$ and $p_2$ using probabilistic programming.

*Solution.* We will assume a uniform prior on $p_1$ and $p_2$. We can define the model as follows:

```julia
using Turing, MCMCChains, Distributions

@model HMM(x) = begin
    n = length(x)
    z = zeros(Int64, length(x))  # hidden states
    # Define priors
    p₁ ~ Uniform(0,1)  # Transition probability 1→1
    p₂ ~ Uniform(0,1)  # Transition probability 2→1
    # Define transition matrix
    P = [p₁ 1-p₁; p₂ 1-p₂]
    # Initialize samples
    z[1] ~ Categorical([0.5, 0.5])  # Start z at either 1 or 2
    x[1] ~ Normal(z[1], 0.1)
    for i in 2:n
        # Get next hidden state
        z[i] ~ Categorical(P[z[i-1],:])
        x[i] ~ Normal(z[i], 0.1)
    end
    return (p₁, p₂)
end
```
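To make the generative process concrete, here is a forward simulation of the same two-state model in plain Python. The function name, seed, and parameter values are illustrative only ($p_1 = 0.25$ and $p_2 = 0.55$ are the values reported later in this section as having generated the data).

```python
import random

def simulate_hmm(n, p1, p2, sigma=0.1, seed=0):
    """Forward-simulate the two-state HMM described above.

    p1 = P(next state is 1 | current state is 1)
    p2 = P(next state is 1 | current state is 2)
    Emissions are Normal(z_i, sigma).
    """
    rng = random.Random(seed)
    # Start the hidden chain at 1 or 2 with equal probability
    z = [1 if rng.random() < 0.5 else 2]
    for _ in range(n - 1):
        prob_to_one = p1 if z[-1] == 1 else p2
        z.append(1 if rng.random() < prob_to_one else 2)
    # Each observation is the hidden state plus a little Gaussian noise
    x = [rng.gauss(zi, sigma) for zi in z]
    return z, x

z, x = simulate_hmm(15, 0.25, 0.55)
```

Because the emission noise is small, each observation sits close to its hidden state, which is why the observed data clusters near 1 and 2.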

We will now obtain samples from the posterior of `p₁` and `p₂`. HMC is only appropriate for continuous random variables; other samplers are needed for discrete random variables, like the $Z_i$'s in this case. We'll use one called **particle Gibbs**, which keeps track of several values for each random variable at the same time. (Each of these values is conceived of as a particle; hence the name.) We will use a Gibbs sampler to combine HMC and particle Gibbs.

```julia
# Load observed data
data = [0.97, 0.94, 2.02, 1.07, 1.93, 0.93, 2.0, 2.0, 2.07, 0.92,
        2.18, 1.8, 1.94, 1.9, 0.93]

# Use Gibbs with HMC and PG to obtain posterior samples
n_posterior_observations = 1000
ϵ = 0.1  # Leapfrog step size
τ = 10   # Number of leapfrog iterations
hmc = HMC(ϵ, τ, :p₁, :p₂)
particle_gibbs = Turing.PG(20, :z)
G = Gibbs(hmc, particle_gibbs)

# Obtain posterior samples
chain = sample(HMM(data), G, n_posterior_observations)
```

Histograms of the $p_1$ and $p_2$ marginal posteriors are given below.

The actual values of $p_1$ and $p_2$ used to generate the data were 0.25 and 0.55, respectively. So these figures look pretty plausible, at least in the sense that the posterior means are near the correct values.

Finally, let's look at the hidden Markov model from the previous section:

```julia
using Turing, OffsetArrays, StatsPlots

@model HMM(x) = begin
    n = length(x)
    z = tzeros(Int64, n)
    q ~ Uniform(0, 1)
    σ² ~ InverseGamma(1e-3, 1e-3)
    P = OffsetArray([ q    1-q
                      1-q  q   ], 0:1, 0:1)
    z[1] ~ DiscreteNonParametric(0:1, [0.5, 0.5])
    x[1] ~ Normal(z[1], √(σ²))
    for i in 2:n
        z[i] ~ DiscreteNonParametric(0:1, collect(P[z[i-1],:]))
        x[i] ~ Normal(z[i], √(σ²))
    end
end
```

As in the previous example, we'll use a Gibbs sampler to combine HMC and particle Gibbs sampling:

```julia
# choose parameters for samplers
# 0.05 is a step size ϵ, while 10 is n_leapfrog, which we won't get into
hmc = HMC(0.05, 10, :q, :σ²)
# 20 particles
pg = PG(20, :z)
G = Gibbs(hmc, pg)

X = [-0.12, 0.11, -0.58, -1.21, 1.06, -0.09, -0.32, -0.09, -0.39, 2.01,
     0.72, 1.61, -0.08, 0.99, 0.55, 1.28, 0.75, 1.09, 0.91, 0.45,
     0.13, -0.61, 0.18, -0.03, 1.36, 0.12, 0.06, -0.96, 1.38, -1.86,
     0.11, -1.17, -1.94, -0.3, -0.01, 0.82, -0.34, 0.55, 0.53, -1.15,
     -0.67, 0.47, 0.68, 1.84, 2.39, -0.05, -0.14, -0.03, 2.27, 0.36,
     0.31, 0.46, 0.72, 1.34, 1.45, -0.28, -0.06, 0.71, -0.5, -1.01,
     0.31, -0.07, 0.67, 2.55, 1.41, 0.35, 0.89, 1.04, 0.81, 1.08,
     1.45, 0.23, 0.06, -0.05, 1.88, 1.11, 1.25, 0.35, 0.84, 1.96,
     1.52, 1.34, 1.43, 1.9, 0.01, 1.08, 1.44, 1.22, 1.7, 1.36,
     1.62, 1.49, 2.42, 1.11, -0.56, 0.71, -0.35, 0.26, 0.48, 0.42]

chains = sample(HMM(X), G, 50)
plot(chains[:q])
```

**Congratulations!** You've finished the Data Gymnasia Bayesian Inference and Graphical Models course.