Dual Averaging¶
Dual averaging is not a sampling method, but is a method of adaptivly tuning the Hamintonian Monte Carlo (HMC) step size and mass matrix for the particular log-posterior being sampled. Pint’s NUTS sampler uses dual averaging, but we have defined the dual averaging method separately so that in the future it can be used in HMC and other HMC-derived samplers.
-
class
pints.
DualAveragingAdaption
(num_warmup_steps, target_accept_prob, init_epsilon, init_inv_mass_matrix)[source]¶ Dual Averaging method to adaptively tune the step size and mass matrix of a Hamiltonian Monte Carlo (HMC) routine (as used e.g. in NUTS).
Implements a Dual Averaging scheme to adapt the step size
epsilon
, as per [1] (section 3.2.1 and algorithm 6), and estimates the inverse mass matrix using the sample covariance of the accepted parameter, as suggested in [2]. The mass matrix can either be given as a fully dense matrix represented as a 2D ndarray, or a diagonal matrix represented as a 1D ndarray.During iteration
m
of adaption, the parameterepsilon
is updated using the following scheme:\[\bar{H} = (1 - 1/(m + t_0)) \bar{H} + 1/(m + t_0)(\delta_t - \delta) \text{log} \epsilon = \mu - \sqrt{m}/\gamma \bar{H}\]where $delta_t$ is the target acceptence probability set by the user and $delta$ is the acceptence probability reported by the algorithm (i.e. that is provided as an argument to the
step()
method.The adaption is done using the same windowing method employed by Stan, which is done over three or more windows:
- initial window: epsilon is adapted using dual averaging (no adaption of the mass matrix).
- base window: epsilon continues to be adapted using dual averaging; this adaption completes at the end of this window. The inverse mass matrix is adaped at the end of the window by taking the sample covariance of all parameter points within this window.
- terminal window: epsilon is adapted using dual averaging, holding the mass matrix constant, and completes at the end of the window.
If the number of warmup steps requested by the user is greater than the sum of these three windows, then additional base windows are added, each with a size double that of the previous window.
References
[1] Hoffman, M. D., & Gelman, A. (2014). The No-U-Turn sampler: adaptively setting path lengths in Hamiltonian Monte Carlo. Journal of Machine Learning Research, 15(1), 1593-1623. [2] Betancourt, M. (2018). A Conceptual Introduction to Hamiltonian Monte Carlo. https://arxiv.org/abs/1701.02434. Parameters: - num_warmup_steps –
???
- target_accept_prob –
???
- init_epsilon – An initial guess for the step size epsilon
- init_inv_mass_matrix – An initial guess for the inverse adapted mass matrix
-
add_parameter_sample
(sample)[source]¶ Store the parameter samples to calculate a sample covariance matrix later on.