Adam (adaptive moment estimation)
- class pints.Adam(x0, sigma0=0.1, boundaries=None)
Adam optimiser (adaptive moment estimation), as described in [1] (see Algorithm 1).
This method is a variation on gradient descent that maintains two “moments” (exponentially decaying averages of the gradient and of the squared gradient), which allow it to overshoot and move against the gradient for a short time. This property can make it more robust against noisy gradients.
Pseudo-code is given below. Here the value of the j-th parameter at iteration i is given as p_j[i] and the corresponding derivative is denoted g_j[i]:

    m_j[i] = beta1 * m_j[i - 1] + (1 - beta1) * g_j[i]
    v_j[i] = beta2 * v_j[i - 1] + (1 - beta2) * g_j[i]**2

    m_j' = m_j[i] / (1 - beta1**(1 + i))
    v_j' = v_j[i] / (1 - beta2**(1 + i))

    p_j[i] = p_j[i - 1] - alpha * m_j' / (sqrt(v_j') + eps)
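The update rule above can be sketched in plain Python as follows. This is a minimal illustration, not the PINTS source: the function name `adam_minimise`, the `grad` callable and the `n_iter` argument are hypothetical, while the defaults follow the constants this page states for the implementation (beta1 = 0.9, beta2 = 0.999, eps = 1e-8, alpha = min(sigma0)). The loop counter `t` counts completed updates, starting from 1.

```python
import math


def adam_minimise(grad, x0, sigma0, n_iter=1000,
                  beta1=0.9, beta2=0.999, eps=1e-8):
    """Minimise a function with the Adam update rule (illustrative sketch).

    grad:   hypothetical callable returning the gradient (a list) at a point
    x0:     starting point (a list)
    sigma0: list of initial step sizes; the step size is alpha = min(sigma0)
    """
    alpha = min(sigma0)
    p = list(x0)
    m = [0.0] * len(p)  # first-moment estimates, start at 0
    v = [0.0] * len(p)  # second-moment estimates, start at 0
    for t in range(1, n_iter + 1):  # t = number of updates so far
        g = grad(p)
        for j in range(len(p)):
            # Exponentially decaying averages of g and g**2
            m[j] = beta1 * m[j] + (1 - beta1) * g[j]
            v[j] = beta2 * v[j] + (1 - beta2) * g[j] ** 2
            # Initialisation bias correction
            m_hat = m[j] / (1 - beta1 ** t)
            v_hat = v[j] / (1 - beta2 ** t)
            # Parameter update; eps guards against division by zero
            p[j] -= alpha * m_hat / (math.sqrt(v_hat) + eps)
    return p


def grad(x):
    # Gradient of f(x) = (x[0] - 3)**2 + 10 * (x[1] + 1)**2
    return [2 * (x[0] - 3), 20 * (x[1] + 1)]


p = adam_minimise(grad, [0.0, 0.0], sigma0=[0.1, 0.5])
print(p)  # close to [3, -1]
```

On the first update the bias-corrected moments reduce to m_j' = g_j and v_j' = g_j**2, so each parameter moves by almost exactly alpha against the sign of its gradient, whatever the gradient's magnitude.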
The initial values of the moments are m_j[0] = v_j[0] = 0, after which they decay with rates beta1 and beta2. In this implementation, beta1 = 0.9 and beta2 = 0.999.

The terms m_j' and v_j' are “initialisation bias corrected” versions of m_j and v_j (see section 3 of [1]).

The parameter alpha is a step size, which is set as min(sigma0) in this implementation.

Finally, eps is a small constant used to avoid division by zero, set to eps = 1e-8 in this implementation.

This is an unbounded method: any boundaries will be ignored.

References
- ask()
See Optimiser.ask().
- f_best()
See Optimiser.f_best().
- classmethod name()
See Optimiser.name().
- running()
See Optimiser.running().
- set_hyper_parameters(x)
Sets the hyper-parameters for the method with the given vector of values (see TunableMethod).
Parameters:
x – An array of length n_hyper_parameters used to set the hyper-parameters.
- stop()
Checks if this method has run into trouble and should terminate. Returns False if everything’s fine, or a short message (e.g. “Ill-conditioned matrix.”) if the method should terminate.
- tell(reply)
See Optimiser.tell().
- x_best()
See Optimiser.x_best().
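The ask(), tell(), x_best(), f_best() and running() methods follow an ask-and-tell pattern: the caller asks the optimiser for points, evaluates them, and tells the results back. The sketch below is a hypothetical, self-contained stand-in that applies the same Adam update inside that pattern; the class `AdamAskTell` and the helper `f_and_grad` are not part of PINTS, and the reply format passed to tell() (a list holding one (f, gradient) pair per asked point) is an assumption of this sketch.

```python
import math


class AdamAskTell:
    """Hypothetical ask-and-tell Adam optimiser (a sketch, not pints.Adam)."""

    def __init__(self, x0, sigma0, beta1=0.9, beta2=0.999, eps=1e-8):
        self._alpha = min(sigma0)  # step size alpha = min(sigma0)
        self._b1, self._b2, self._eps = beta1, beta2, eps
        self._proposed = list(x0)  # next point to hand out via ask()
        self._m = [0.0] * len(x0)
        self._v = [0.0] * len(x0)
        self._t = 0  # number of tell() calls so far
        self._x_best, self._f_best = None, math.inf
        self._running = False

    def ask(self):
        # Return a list of points to evaluate (a single point here)
        self._running = True
        return [list(self._proposed)]

    def tell(self, reply):
        # Assumed reply format: [(f(x), gradient of f at x)] for the asked point
        fx, dfx = reply[0]
        x = self._proposed
        if fx < self._f_best:  # track the best evaluated point
            self._x_best, self._f_best = list(x), fx
        self._t += 1
        b1, b2 = self._b1, self._b2
        new = []
        for j, g in enumerate(dfx):
            self._m[j] = b1 * self._m[j] + (1 - b1) * g
            self._v[j] = b2 * self._v[j] + (1 - b2) * g ** 2
            m_hat = self._m[j] / (1 - b1 ** self._t)
            v_hat = self._v[j] / (1 - b2 ** self._t)
            new.append(x[j] - self._alpha * m_hat / (math.sqrt(v_hat) + self._eps))
        self._proposed = new

    def x_best(self):
        return self._x_best

    def f_best(self):
        return self._f_best

    def running(self):
        return self._running


def f_and_grad(x):
    # f(x) = (x[0] - 3)**2 + 10 * (x[1] + 1)**2, together with its gradient
    f = (x[0] - 3) ** 2 + 10 * (x[1] + 1) ** 2
    return f, [2 * (x[0] - 3), 20 * (x[1] + 1)]


opt = AdamAskTell([0.0, 0.0], sigma0=[0.1, 0.5])
for _ in range(1000):
    xs = opt.ask()                         # points to evaluate
    opt.tell([f_and_grad(x) for x in xs])  # report (f, gradient) for each
print(opt.x_best(), opt.f_best())
```

In this sketch the optimiser never stops on its own: the caller decides how many ask/tell rounds to run, and x_best() and f_best() report the best point evaluated so far.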