Dual Averaging

Dual averaging is not a sampling method, but is a method of adaptivly tuning the Hamintonian Monte Carlo (HMC) step size and mass matrix for the particular log-posterior being sampled. Pint’s NUTS sampler uses dual averaging, but we have defined the dual averaging method separately so that in the future it can be used in HMC and other HMC-derived samplers.

class pints.DualAveragingAdaption(num_warmup_steps, target_accept_prob, init_epsilon, init_inv_mass_matrix)[source]

Dual Averaging method to adaptively tune the step size and mass matrix of a Hamiltonian Monte Carlo (HMC) routine (as used e.g. in NUTS).

Implements a Dual Averaging scheme to adapt the step size epsilon, as per [1] (section 3.2.1 and algorithm 6), and estimates the inverse mass matrix using the sample covariance of the accepted parameter, as suggested in [2]. The mass matrix can either be given as a fully dense matrix represented as a 2D ndarray, or a diagonal matrix represented as a 1D ndarray.

During iteration m of adaption, the parameter epsilon is updated using the following scheme:

\[\bar{H} = (1 - 1/(m + t_0)) \bar{H} + 1/(m + t_0)(\delta_t - \delta) \text{log} \epsilon = \mu - \sqrt{m}/\gamma \bar{H}\]

where $delta_t$ is the target acceptence probability set by the user and $delta$ is the acceptence probability reported by the algorithm (i.e. that is provided as an argument to the step() method.

The adaption is done using the same windowing method employed by Stan, which is done over three or more windows:

  • initial window: epsilon is adapted using dual averaging (no adaption of the mass matrix).
  • base window: epsilon continues to be adapted using dual averaging; this adaption completes at the end of this window. The inverse mass matrix is adaped at the end of the window by taking the sample covariance of all parameter points within this window.
  • terminal window: epsilon is adapted using dual averaging, holding the mass matrix constant, and completes at the end of the window.

If the number of warmup steps requested by the user is greater than the sum of these three windows, then additional base windows are added, each with a size double that of the previous window.

References

[1]Hoffman, M. D., & Gelman, A. (2014). The No-U-Turn sampler: adaptively setting path lengths in Hamiltonian Monte Carlo. Journal of Machine Learning Research, 15(1), 1593-1623.
[2]Betancourt, M. (2018). A Conceptual Introduction to Hamiltonian Monte Carlo. https://arxiv.org/abs/1701.02434.
Parameters:
  • num_warmup_steps

    ???

  • target_accept_prob

    ???

  • init_epsilon – An initial guess for the step size epsilon
  • init_inv_mass_matrix – An initial guess for the inverse adapted mass matrix
adapt_epsilon(accept_prob)[source]

Perform a single step of the dual averaging scheme.

add_parameter_sample(sample)[source]

Store the parameter samples to calculate a sample covariance matrix later on.

calculate_sample_variance()[source]

Return the sample covariance of all the stored samples.

final_epsilon()[source]

Perform the final step of the dual averaging scheme.

get_epsilon()[source]

return the step size

get_mass_matrix()[source]

Return the mass matrix.

init_adapt_epsilon(epsilon)[source]

Start a new dual averaging adaption for epsilon.

init_sample_covariance(size)[source]

Start a new adaption window for the inverse mass matrix.

set_inv_mass_matrix(inv_mass_matrix)[source]

We calculate the mass matrix whenever the inverse mass matrix is set.

step(x, accept_prob)[source]

Perform a single step of the adaption.

Parameters:
  • x (ndarray) – The next accepted mcmc parameter point.
  • accept_prob (float) – The acceptance probability of the last NUTS/HMC mcmc step.