class nbla::AdaBelief
-
template<typename T>
class AdaBelief : public nbla::Solver
AdaBelief solver, defined as:
\[ \theta_{t+1} \leftarrow \theta_t - \alpha \frac{\sqrt{1 - \beta_2^t}}{1 - \beta_1^t} \frac{m_t}{\sqrt{s_t + \epsilon} + \epsilon} \]
where \(\theta_t\) is a parameter at step \(t\), \(m_t\) is the moving average of the gradients, and \(s_t\) is the moving average of the squared deviation of each gradient from \(m_t\), both accumulated over the gradients seen up to step \(t\).
See also
See the paper linked below for more details. Juntang Zhuang, et al. (2020). AdaBelief Optimizer: Adapting Stepsizes by the Belief in Observed Gradients. https://arxiv.org/pdf/2010.07468.pdf
- Param alpha:
\(\alpha\) Learning rate.
- Param beta1:
\(\beta_1\) Decay rate of moving mean.
- Param beta2:
\(\beta_2\) Decay rate of the moving variance estimate \(s_t\).
- Param eps:
\(\epsilon\) Tiny factor for avoiding 0-division.
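As a rough illustration of how the four parameters enter the update rule above, the following standalone C++ sketch applies one AdaBelief step to a flat parameter vector. It does not use or reproduce the nbla implementation: `AdaBeliefState` and `adabelief_step` are hypothetical names, and the recurrences for \(m_t\) and \(s_t\) are assumed from the cited paper (without the extra \(\epsilon\) term the paper adds to \(s_t\)). Only the final parameter update follows the formula in this section.

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical helper (not an nbla type): optimizer state for one parameter array.
struct AdaBeliefState {
  std::vector<float> m; // m_t: moving average of gradients
  std::vector<float> s; // s_t: moving average of (g_t - m_t)^2
  int t = 0;            // number of steps applied so far
};

// Hypothetical free function (not the nbla API): one in-place AdaBelief step.
// Assumes theta and grad have the same length.
inline void adabelief_step(std::vector<float> &theta,
                           const std::vector<float> &grad,
                           AdaBeliefState &state, float alpha, float beta1,
                           float beta2, float eps) {
  // Lazily zero-initialize the moving statistics on first use.
  if (state.m.size() != theta.size()) {
    state.m.assign(theta.size(), 0.0f);
    state.s.assign(theta.size(), 0.0f);
    state.t = 0;
  }
  state.t += 1;

  // Bias-correction factor sqrt(1 - beta2^t) / (1 - beta1^t).
  const double t = static_cast<double>(state.t);
  const float correction = static_cast<float>(
      std::sqrt(1.0 - std::pow(static_cast<double>(beta2), t)) /
      (1.0 - std::pow(static_cast<double>(beta1), t)));

  for (std::size_t i = 0; i < theta.size(); ++i) {
    const float g = grad[i];
    // Assumed recurrences (taken from the paper, not from this page).
    state.m[i] = beta1 * state.m[i] + (1.0f - beta1) * g;
    const float d = g - state.m[i];
    state.s[i] = beta2 * state.s[i] + (1.0f - beta2) * d * d;
    // Parameter update as written in the formula above.
    theta[i] -= alpha * correction * state.m[i] /
                (std::sqrt(state.s[i] + eps) + eps);
  }
}
```

Calling the function repeatedly with the same `AdaBeliefState` carries \(m_t\), \(s_t\), and the step count across iterations.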
Public Functions
-
inline virtual float learning_rate()
Get the learning rate.