class nbla::AMSGRAD
-
template<typename T>
class AMSGRAD : public nbla::Solver
AMSGRAD solver defined as
\[\begin{split}
m_t &\leftarrow \beta_1 m_{t-1} + (1 - \beta_1) g_t\\
v_t &\leftarrow \beta_2 v_{t-1} + (1 - \beta_2) g_t^2\\
\hat{v_t} &\leftarrow \max(\hat{v_{t-1}}, v_t)\\
\theta_{t+1} &\leftarrow \theta_t - \alpha \frac{m_t}{\sqrt{\hat{v_t}} + \epsilon}
\end{split}\]
where \(\theta_t\) is a parameter, \(g_t\) is its gradient, and \(m_t\) and \(v_t\) are the moving average and 0-mean variance of the sequence of gradients up to step \(t\).
See also
See the paper linked below for more details. Reddi et al., On the Convergence of Adam and Beyond. https://openreview.net/pdf?id=ryQu7f-RZ
- Param alpha:
\(\alpha\) Learning rate.
- Param beta1:
\(\beta_1\) Decay rate of moving mean.
- Param beta2:
\(\beta_2\) Decay rate of moving 0-mean variance.
- Param eps:
\(\epsilon\) Small value for avoiding division by zero. Note this does not appear in the paper.
- Param bias_correction:
Apply bias correction to moving averages defined in ADAM. Note this does not appear in the paper.
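The update equations above can be sketched as a standalone C++ function. This is a hypothetical illustration of one AMSGRAD step, not the nbla implementation; the struct and function names are invented for this example, and bias correction is omitted for brevity.

```cpp
#include <algorithm>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical per-parameter state: moving mean m, 0-mean variance v,
// and the running maximum v_hat used by AMSGRAD.
struct AmsgradState {
  std::vector<float> m, v, v_hat;
};

// One AMSGRAD update of parameters `theta` given gradients `g`,
// following the equations in the class description.
void amsgrad_update(std::vector<float> &theta, const std::vector<float> &g,
                    AmsgradState &s, float alpha = 1e-3f, float beta1 = 0.9f,
                    float beta2 = 0.999f, float eps = 1e-8f) {
  for (std::size_t i = 0; i < theta.size(); ++i) {
    s.m[i] = beta1 * s.m[i] + (1.0f - beta1) * g[i];           // m_t
    s.v[i] = beta2 * s.v[i] + (1.0f - beta2) * g[i] * g[i];    // v_t
    s.v_hat[i] = std::max(s.v_hat[i], s.v[i]);                 // max(v_hat, v_t)
    theta[i] -= alpha * s.m[i] / (std::sqrt(s.v_hat[i]) + eps);
  }
}
```

Because \(\hat{v_t}\) is a running maximum rather than the raw \(v_t\), the effective step size is non-increasing per coordinate, which is the convergence fix AMSGRAD makes over ADAM.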
Public Functions
-
inline virtual float learning_rate()
Get learning rate.