class nbla::Adagrad
template<typename T>
class Adagrad : public nbla::Solver
This is defined as

\[\begin{split} g_t \leftarrow \Delta w_t\\ G_t \leftarrow G_{t-1} + g_t^2\\ w_{t+1} \leftarrow w_t - \frac{\eta}{\sqrt{G_t} + \epsilon} g_t\\ \end{split}\]

See also

See the paper linked below for more details. John Duchi, Elad Hazan and Yoram Singer (2011). Adaptive Subgradient Methods for Online Learning and Stochastic Optimization. http://www.jmlr.org/papers/volume12/duchi11a/duchi11a.pdf
- Param lr:
\(\eta\) Learning rate.
- Param eps:
\(\epsilon\) Small constant added to the denominator to avoid division by zero.
Public Functions
inline virtual float learning_rate()

Get the current learning rate.