class nbla::AdamW
-
template<typename T>
class AdamW : public nbla::Solver

AdamW solver defined as

\[ \theta_{t+1} \leftarrow \theta_t - \alpha \frac{\sqrt{1 - \beta_2^t}}{1 - \beta_1^t} \frac{m_t}{\sqrt{v_t} + \epsilon} - \eta_t\lambda w_{t-1} \]

where \(\theta_t\) denotes a parameter (with \(w_{t-1}\) the weights subject to decay), \(m_t\) and \(v_t\) are the moving average and 0-mean variance of the sequence of gradients up to step \(t\), \(\lambda\) is the weight decay rate, and \(\eta_t\) is the schedule multiplier of the decoupled weight decay term.

See also
See the papers linked below for more details. Kingma and Ba, Adam: A Method for Stochastic Optimization. https://arxiv.org/abs/1412.6980 Loshchilov and Hutter, Decoupled Weight Decay Regularization. https://arxiv.org/abs/1711.05101
- Param alpha:
\(\alpha\) Learning rate.
- Param beta1:
\(\beta_1\) Decay rate of moving mean.
- Param beta2:
\(\beta_2\) Decay rate of moving 0-mean variance.
- Param eps:
\(\epsilon\) Small constant for avoiding division by zero.
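The update rule above can be sketched as a standalone function. This is an illustrative implementation, not the nbla API: the function name `adamw_step` and its signature are hypothetical, and the weight decay rate `lambda` and schedule multiplier `eta` are shown as plain arguments for clarity.

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical sketch of one AdamW update step (not the nbla API).
// Decoupled weight decay is applied directly to the weights,
// separately from the moment-based Adam step.
void adamw_step(std::vector<float> &theta, const std::vector<float> &grad,
                std::vector<float> &m, std::vector<float> &v, int t,
                float alpha = 1e-3f, float beta1 = 0.9f, float beta2 = 0.999f,
                float eps = 1e-8f, float lambda = 1e-2f, float eta = 1.0f) {
  // Bias-correction factor sqrt(1 - beta2^t) / (1 - beta1^t)
  const float correction =
      std::sqrt(1.0f - std::pow(beta2, t)) / (1.0f - std::pow(beta1, t));
  for (std::size_t i = 0; i < theta.size(); ++i) {
    m[i] = beta1 * m[i] + (1.0f - beta1) * grad[i];           // first moment
    v[i] = beta2 * v[i] + (1.0f - beta2) * grad[i] * grad[i]; // second moment
    theta[i] -= alpha * correction * m[i] / (std::sqrt(v[i]) + eps)
                + eta * lambda * theta[i]; // decoupled weight decay term
  }
}
```

Note that the weight decay term shrinks the weights directly rather than being folded into the gradient, which is what distinguishes AdamW from Adam with L2 regularization.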
Public Functions
inline virtual float learning_rate()
Get the learning rate.