Adam (adaptive moment estimation)
- class pints.Adam(x0, sigma0=0.1, boundaries=None)
Adam optimiser (adaptive moment estimation), as described in [1] (see Algorithm 1).
This method is a variation on gradient descent that maintains two “moments”, allowing it to overshoot and go against the gradient for a short time. This property can make it more robust against noisy gradients.
Pseudo-code is given below. Here the value of the j-th parameter at iteration i is given as p_j[i], and the corresponding derivative is denoted g_j[i]:

    m_j[i] = beta1 * m_j[i - 1] + (1 - beta1) * g_j[i]
    v_j[i] = beta2 * v_j[i - 1] + (1 - beta2) * g_j[i]**2
    m_j' = m_j[i] / (1 - beta1**(1 + i))
    v_j' = v_j[i] / (1 - beta2**(1 + i))
    p_j[i] = p_j[i - 1] - alpha * m_j' / (sqrt(v_j') + eps)
The initial values of the moments are m_j[0] = v_j[0] = 0, after which they decay with rates beta1 and beta2. In this implementation, beta1 = 0.9 and beta2 = 0.999.

The terms m_j' and v_j' are “initialisation bias corrected” versions of m_j and v_j (see section 3 of the paper).

The parameter alpha is a step size, which is set as min(sigma0) in this implementation.

Finally, eps is a small constant used to avoid division by zero, set to eps = 1e-8 in this implementation.
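To make the update concrete, here is a small, self-contained NumPy sketch of a single Adam step matching the pseudo-code above. This is an illustration only, not the PINTS source; the function name adam_step and its default arguments are chosen just for this example.

    import numpy as np

    def adam_step(p, g, m, v, i, alpha=0.1, beta1=0.9, beta2=0.999, eps=1e-8):
        """Apply one Adam update to the parameter vector p, given gradient g.

        m and v are the running first and second moments, i is the iteration
        index starting at 0. Returns the updated (p, m, v).
        """
        m = beta1 * m + (1 - beta1) * g            # decaying mean of gradients
        v = beta2 * v + (1 - beta2) * g ** 2       # decaying mean of squared gradients
        m_hat = m / (1 - beta1 ** (1 + i))         # initialisation bias correction
        v_hat = v / (1 - beta2 ** (1 + i))
        p = p - alpha * m_hat / (np.sqrt(v_hat) + eps)
        return p, m, v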
This is an unbounded method: Any boundaries will be ignored. A short usage sketch is given after the method list below.

References

[1] Adam: A method for stochastic optimization. Kingma, D.P. and Ba, J. (2014). arXiv preprint arXiv:1412.6980. https://arxiv.org/abs/1412.6980
- ask()
  See Optimiser.ask().
- f_best()
  See Optimiser.f_best().
- name()
  See Optimiser.name().
- running()
  See Optimiser.running().
- set_hyper_parameters(x)
  Sets the hyper-parameters for the method with the given vector of values (see TunableMethod).
  Parameters:
    x – An array of length n_hyper_parameters used to set the hyper-parameters.
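As a hedged illustration of this generic TunableMethod interface (the values passed below are placeholders, not recommended settings):

    import pints

    opt = pints.Adam(x0=[0.0, 0.0], sigma0=0.1)
    n = opt.n_hyper_parameters()          # number of tunable values this method expects
    opt.set_hyper_parameters([0.1] * n)   # placeholder values, for illustration only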
- stop()
  Checks if this method has run into trouble and should terminate. Returns False if everything’s fine, or a short message (e.g. “Ill-conditioned matrix.”) if the method should terminate.
- tell(reply)
  See Optimiser.tell().
- x_best()
  See Optimiser.x_best().
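Finally, a minimal usage sketch, assuming a toy error measure that provides gradients via evaluateS1(). The Quadratic class, its target values, and the iteration limit below are all invented for this example; see pints.OptimisationController for the full interface.

    import numpy as np
    import pints

    class Quadratic(pints.ErrorMeasure):
        """Toy convex error f(x) = sum((x - c)^2), with an analytic gradient
        exposed via evaluateS1() so that gradient-based optimisers can use it."""

        def __init__(self, c):
            self._c = np.asarray(c, dtype=float)

        def n_parameters(self):
            return len(self._c)

        def __call__(self, x):
            return float(np.sum((np.asarray(x, dtype=float) - self._c) ** 2))

        def evaluateS1(self, x):
            x = np.asarray(x, dtype=float)
            return self(x), 2 * (x - self._c)

    error = Quadratic([2.0, -1.0])
    opt = pints.OptimisationController(
        error, x0=[0.0, 0.0], sigma0=0.1, method=pints.Adam)
    opt.set_max_iterations(500)
    opt.set_log_to_screen(False)
    x_best, f_best = opt.run()
    print(x_best, f_best)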