class nbla::Adamax

template<typename T>
class Adamax : public nbla::Solver

AdaMax solver, defined as:

\[ \theta_{t+1} \leftarrow \theta_t - \frac{\alpha}{1 - \beta_1^t} \frac{m_t}{u_t + \epsilon} \]
where \(\theta_t\) is a parameter, \(g_t\) is its gradient, and \(m_t\) and \(u_t\) are the moving average and the exponentially weighted infinity norm of the sequence of gradients up to step \(t\).
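
For reference, the moment estimates follow the AdaMax recursions given in the paper cited below (with \(m_0 = u_0 = 0\)):

\[ m_t \leftarrow \beta_1 m_{t-1} + (1 - \beta_1) g_t \]
\[ u_t \leftarrow \max(\beta_2 u_{t-1}, |g_t|) \]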

See also

See the following paper for details: Kingma and Ba, Adam: A Method for Stochastic Optimization. https://arxiv.org/abs/1412.6980

Param alpha:

\(\alpha\) Learning rate.

Param beta1:

\(\beta_1\) Decay rate of the moving average of gradients.

Param beta2:

\(\beta_2\) Decay rate of the exponentially weighted infinity norm.

Param eps:

\(\epsilon\) Small constant added to the denominator to avoid division by zero.
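
As a concrete illustration, the following is a minimal, self-contained sketch of the update rule in plain C++, using the defaults suggested in the paper (\(\alpha = 0.002\), \(\beta_1 = 0.9\), \(\beta_2 = 0.999\), \(\epsilon = 10^{-8}\)). It is not the nbla implementation; the names adamax_update and AdamaxState are hypothetical.

#include <algorithm>
#include <cmath>
#include <cstdio>
#include <vector>

// Per-parameter solver state: first moment m, infinity norm u, timestep t.
struct AdamaxState {
  std::vector<float> m;
  std::vector<float> u;
  int t = 0;
};

// One AdaMax step over a flat parameter vector.
void adamax_update(std::vector<float> &theta, const std::vector<float> &grad,
                   AdamaxState &state, float alpha = 0.002f,
                   float beta1 = 0.9f, float beta2 = 0.999f,
                   float eps = 1e-8f) {
  ++state.t;
  // Bias correction applies to the first moment only; u needs none.
  const float bias_correction = 1.0f / (1.0f - std::pow(beta1, state.t));
  for (std::size_t i = 0; i < theta.size(); ++i) {
    state.m[i] = beta1 * state.m[i] + (1.0f - beta1) * grad[i];
    state.u[i] = std::max(beta2 * state.u[i], std::abs(grad[i]));
    theta[i] -= alpha * bias_correction * state.m[i] / (state.u[i] + eps);
  }
}

int main() {
  // Minimize f(theta) = theta^2, whose gradient is 2 * theta.
  std::vector<float> theta = {5.0f};
  AdamaxState state{std::vector<float>(1, 0.0f), std::vector<float>(1, 0.0f)};
  for (int step = 0; step < 5000; ++step) {
    std::vector<float> grad = {2.0f * theta[0]};
    adamax_update(theta, grad, state);
  }
  std::printf("theta after 5000 steps: %f\n", theta[0]); // close to 0
  return 0;
}

Note that the per-step update magnitude stays on the order of \(\alpha\), a bound noted in the paper; this is the property that distinguishes AdaMax from Adam.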

Public Functions

inline virtual float learning_rate()

Returns the current learning rate \(\alpha\).