## Representing Sampling Distributions Using Markov Chain Samplers

For more complex probability distributions, you might need more advanced methods for generating samples than the methods described in Common Pseudorandom Number Generation Methods. Such distributions arise, for example, in Bayesian data analysis and in the large combinatorial problems of Markov chain Monte Carlo (MCMC) simulations. An alternative is to construct a Markov chain with a stationary distribution equal to the target sampling distribution, using the states of the chain to generate random numbers after an initial burn-in period in which the state distribution converges to the target.

### Using the Metropolis-Hastings Algorithm

The Metropolis-Hastings algorithm draws samples from a distribution that is only known up to a constant. Random numbers are generated from a distribution with a probability density function that is equal to or proportional to a proposal function.

To generate random numbers:

Assume an initial value

*x*(*t*).Draw a sample,

*y*(*t*), from a proposal distribution*q*(*y*|*x*(*t*)).Accept

*y*(*t*) as the next sample*x*(*t*+ 1) with probability*r*(*x*(*t*),*y*(*t*)), and keep*x*(*t*) as the next sample*x*(*t*+ 1) with probability 1 –*r*(*x*(*t*),*y*(*t*)), where:$$r(x,y)\text{\hspace{0.17em}}\text{=}\text{}\text{\hspace{0.17em}}min\left\{\frac{f(y)}{f(x)}\frac{q(x|y)}{q(y|x)},\text{}\text{\hspace{0.17em}}1\right\}$$

Increment

*t*→*t*+ 1, and repeat steps 2 and 3 until you get the desired number of samples.

Generate random numbers using the Metropolis-Hastings method
with the `mhsample`

function. To
produce quality samples efficiently with the Metropolis-Hastings algorithm,
it is crucial to select a good proposal distribution. If it is difficult
to find an efficient proposal distribution, use slice sampling (`slicesample`

) or Hamiltonian Monte Carlo
(`hmcSampler`

) instead.

### Using Slice Sampling

In instances where it is difficult to find an efficient Metropolis-Hastings
proposal distribution, the slice sampling algorithm does not require
an explicit specification. The slice sampling algorithm draws samples
from the region under the density function using a sequence of vertical
and horizontal steps. First, it selects a height at random from 0
to the density function *f *(*x*).
Then, it selects a new *x* value at random by sampling
from the horizontal “slice” of the density above the
selected height. A similar slice sampling algorithm is used for a
multivariate distribution.

If a function *f*(*x*)
proportional to the density function is given, then do the following
to generate random numbers:

Assume an initial value

*x*(*t*) within the domain of*f*(*x*).Draw a real value

*y*uniformly from (0,*f*(*x*(*t*))), thereby defining a horizontal “slice” as*S*= {*x*:*y*<*f*(*x*)}.Find an interval

*I*= (*L*,*R*) around*x*(*t*) that contains all, or much of the “slice”*S*.Draw the new point

*x*(*t*+ 1) within this interval.Increment

*t*→*t*+ 1 and repeat steps 2 through 4 until you get the desired number of samples.

Slice sampling can generate random numbers from a distribution
with an arbitrary form of the density function, provided that an efficient
numerical procedure is available to find the interval *I* =
(*L*,*R*), which is the “slice”
of the density.

Generate random numbers using the slice sampling method with
the `slicesample`

function.

### Using Hamiltonian Monte Carlo

Metropolis-Hastings and slice sampling can produce MCMC chains that mix slowly and take a long time to converge to the stationary distribution, especially in medium-dimensional and high-dimensional problems. Use the gradient-based Hamiltonian Monte Carlo (HMC) sampler to speed up sampling in these situations.

To use HMC sampling, you must specify log *f(x)* (up
to an additive constant) and its gradient. You can use a numerical
gradient, but this leads to slower sampling. All sampling variables
must be unconstrained, meaning that log *f(x)* and
its gradient are well-defined for all real *x*. To
sample constrained variables, transform these variables into unconstrained
ones before using the HMC sampler.

The HMC sampling algorithm introduces a random “momentum
vector” *z* and defines a joint density of *z* and
the “position vector” *x* as *P(x,z)
= f(x)g(z)*. The goal is to sample from this joint distribution
and then to ignore the values of *z* — the
marginal distribution of *x* has the desired density *f(x)*.

The HMC algorithm assigns a Gaussian density with covariance
matrix *M* (the “mass matrix”) to *z*:

$$g(z)\propto \mathrm{exp}\left(-\frac{1}{2}{z}^{T}{M}^{-1}z\right)$$

Then, it defines an “energy function” as

$$E(x,z)=-\mathrm{log}f(x)+\frac{1}{2}{z}^{T}{M}^{-1}z=U(x)+K(z)$$

with *U(x)
= * – log *f(x)* the “potential
energy” and *K(z) = z ^{T}M^{-1}z/2* the
“kinetic energy”. The joint density is given by

*P(x,z) ∝*exp{

*-E(x,z)*}.

To generate random samples, the HMC algorithm:

Assumes an initial value

*x*of the position vector.Generates a sample of the momentum vector:

*z ∼ g(z)*.Evolves the state

*(x, z)*for some amount of fictitious time*τ*to a new state*(x’,z’)*using the “equations of motion”:$$\frac{dz}{d\tau}=-\frac{\partial U}{\partial x}$$

$$\frac{dx}{d\tau}=\frac{\partial K}{\partial z}$$

If the equations of motion could be solved exactly, the energy (and hence the density) would remain constant:

*E(x,z) = E(x’,z’)*. In practice, the equations of motions must be solved numerically (usually using so-called leapfrog integration) and the energy is not conserved.Accepts

*x’*as the next sample with probability*p*_{acc}= min(1, exp{*E(x,z) – E(x’,z’)*}), and keeps*x*as the next sample with probability 1 –*p*_{acc}.Repeats steps 2 through 4 until it has generated the desired number of samples.

To use HMC sampling, create a sampler using the `hmcSampler`

function. After creating a sampler,
you can compute MAP (maximum-a-posteriori) point estimates, tune the
sampler, draw samples, and check convergence diagnostics. For an example
of this workflow, see Bayesian Linear Regression Using Hamiltonian Monte Carlo.