Solvers

The nnabla.solvers.Solver class represents a stochastic-gradient-descent-based optimizer that updates the parameters of a computation graph. NNabla provides the solvers listed below.

Solver

class nnabla.solvers.Solver

Solver interface class.

The same API provided in this class can be used to implement various types of solvers.

Example:

# Network building comes above (it is expected to define x, t and loss)
import nnabla as nn
import nnabla.solvers as S
solver = S.Sgd(lr=1e-3)
solver.set_parameters(nn.get_parameters())

for itr in range(num_itr):
    x.d = ... # set data
    t.d = ... # set label
    loss.forward()
    solver.zero_grad()  # Reset all gradient buffers to zero
    loss.backward()
    solver.weight_decay(decay_rate)  # Apply weight decay
    solver.update()  # Update the parameters

Note

Every solver provided by NNabla is a subclass of Solver. The Solver class itself is never instantiated directly.

clear_parameters(self)

Clear all registered parameters and states.

info

info – object

learning_rate(self)

Get the learning rate.

name

Get the name of the solver.

remove_parameters(self, vector[string] keys)

Remove previously registered parameters, specified by a vector of their keys.

set_learning_rate(self, learning_rate)

Set the learning rate.

set_parameters(self, param_dict, bool reset=True, bool retain_state=False)

Set parameters by dictionary of keys and parameter Variables.

Parameters:
  • param_dict (dict) – key:string, value: Variable.
  • reset (bool) – If true, clear all parameters before setting parameters. If false, parameters are overwritten or added (if it’s new).
  • retain_state (bool) – The value is only considered if reset is false. If true and a key already exists (overwriting), a state (such as momentum) associated with the key will be kept if the shape of the parameter and that of the new param match.
setup(self, params)

Deprecated. Call set_parameters with param_dict .

update(self)

Update the parameter values using the gradients accumulated during backpropagation, which are stored in the grad field of each parameter Variable. The update rules are implemented in the C++ core, in derived classes of Solver. The updated values are stored in the data field of each parameter Variable.

weight_decay(self, float decay_rate)

Apply weight decay to gradients. When called, each gradient is incremented by decay_rate times the current value of its parameter.

Parameters: decay_rate (float) – The coefficient of weight decay.
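
As a rough illustration of the rule described above, the following sketch adds decay_rate times the parameter value to each gradient. It is plain Python on lists rather than NNabla code; the helper name apply_weight_decay and the exact form g + decay_rate * w are assumptions made for illustration, not the library's implementation.

```python
def apply_weight_decay(params, grads, decay_rate):
    """Return gradients with an L2 decay term added: g + decay_rate * w.

    Hypothetical helper for illustration only; it does not call NNabla.
    """
    return [g + decay_rate * w for w, g in zip(params, grads)]


params = [1.0, -2.0]   # made-up parameter values
grads = [0.5, 0.5]     # made-up gradients from backpropagation
decayed = apply_weight_decay(params, grads, decay_rate=0.1)
```

Positive weights see their gradient grow and negative weights see it shrink, so a subsequent descent step pulls both toward zero.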
zero_grad(self)

Initialize the gradients of all registered parameters to zero.

List of solvers

nnabla.solvers.Sgd(lr)

Stochastic gradient descent (SGD) optimizer.

\[w_{t+1} \leftarrow w_t - \eta \Delta w_t\]
Parameters: lr (float) – Learning rate (\(\eta\)).
Returns:
An instance of the Solver class.
See Solver API guide for details.
Return type: Solver

Note

You can instantiate a preferred target implementation (ex. CUDA) of a Solver given a Context. A Context can be set by nnabla.set_default_context(ctx) or nnabla.context_scope(ctx). See API docs.
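
The SGD rule above can be sketched in a few lines of plain Python. This is an illustration on a single scalar weight, not NNabla's implementation; the function name sgd_step and the gradient values are made up, and grad stands for \(\Delta w_t\).

```python
def sgd_step(w, grad, lr):
    """One SGD step for a scalar weight: w <- w - lr * grad."""
    return w - lr * grad


w = 1.0
for grad in [0.5, -1.0]:   # made-up gradients for two iterations
    w = sgd_step(w, grad, lr=0.1)
```

In a real solver the same rule is applied element-wise to every registered parameter.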

nnabla.solvers.Momentum(lr, momentum=0.9)

SGD with Momentum.

\[\begin{split}v_t &\leftarrow \gamma v_{t-1} + \eta \Delta w_t\\ w_{t+1} &\leftarrow w_t - v_t\end{split}\]
Parameters:
  • lr (float) – Learning rate (\(\eta\)).
  • momentum (float) – Decay rate of momentum (\(\gamma\)).
Returns:

An instance of the Solver class.

See Solver API guide for details.

Return type:

Solver

Note

You can instantiate a preferred target implementation (ex. CUDA) of a Solver given a Context. A Context can be set by nnabla.set_default_context(ctx) or nnabla.context_scope(ctx). See API docs.
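
A minimal sketch of the momentum rule above, written in plain Python for a single scalar weight. The function name momentum_step and the gradient values are hypothetical; grad stands for \(\Delta w_t\) and the velocity v is assumed to start at zero.

```python
def momentum_step(w, v, grad, lr, momentum=0.9):
    """One momentum step: v <- momentum * v + lr * grad, then w <- w - v."""
    v = momentum * v + lr * grad
    w = w - v
    return w, v


w, v = 1.0, 0.0            # velocity assumed to start at zero
for grad in [1.0, 1.0]:    # made-up constant gradient
    w, v = momentum_step(w, v, grad, lr=0.1)
```

With a constant gradient the velocity grows from step to step, which is why the second update is larger than the first.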

nnabla.solvers.Nesterov(lr, momentum=0.9)

Nesterov Accelerated Gradient optimizer.

\[\begin{split}v_t &\leftarrow \gamma v_{t-1} - \eta \Delta w_t\\ w_{t+1} &\leftarrow w_t - \gamma v_{t-1} + \left(1 + \gamma \right) v_t\end{split}\]
Parameters:
  • lr (float) – Learning rate (\(\eta\)).
  • momentum (float) – Decay rate of momentum (\(\gamma\)).
Returns:

An instance of the Solver class.

See Solver API guide for details.

Return type:

Solver

References

  • Yurii Nesterov. A method for unconstrained convex minimization problem with the rate of convergence \(O(1/k^2)\).

Note

You can instantiate a preferred target implementation (ex. CUDA) of a Solver given a Context. A Context can be set by nnabla.set_default_context(ctx) or nnabla.context_scope(ctx). See API docs.
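
The two-line Nesterov update above can be transcribed directly into plain Python. This is a sketch for a single scalar weight under the assumption that the velocity starts at zero; nesterov_step is a hypothetical name and grad stands for \(\Delta w_t\).

```python
def nesterov_step(w, v_prev, grad, lr, momentum=0.9):
    """One Nesterov step, following the formula in this section.

    v_t     <- momentum * v_{t-1} - lr * grad
    w_{t+1} <- w_t - momentum * v_{t-1} + (1 + momentum) * v_t
    """
    v = momentum * v_prev - lr * grad
    w = w - momentum * v_prev + (1.0 + momentum) * v
    return w, v


w, v = 1.0, 0.0            # velocity assumed to start at zero
for grad in [1.0, 1.0]:    # made-up constant gradient
    w, v = nesterov_step(w, v, grad, lr=0.1)
```

Note that, unlike the Momentum solver's formula, the velocity here carries the negative sign of the gradient step.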

nnabla.solvers.Adadelta(lr=1.0, decay=0.95, eps=1e-06)

AdaDelta optimizer.

\[\begin{split}g_t &\leftarrow \Delta w_t\\ v_t &\leftarrow - \frac{RMS \left[ v \right]_{t-1}} {RMS \left[ g \right]_t}g_t\\ w_{t+1} &\leftarrow w_t + \eta v_t\end{split}\]
Parameters:
  • lr (float) – Learning rate (\(\eta\)).
  • decay (float) – Decay rate (\(\gamma\)).
  • eps (float) – Small value for avoiding division by zero (\(\epsilon\)).
Returns:

An instance of the Solver class.

See Solver API guide for details.

Return type:

Solver

Note

You can instantiate a preferred target implementation (ex. CUDA) of a Solver given a Context. A Context can be set by nnabla.set_default_context(ctx) or nnabla.context_scope(ctx). See API docs.
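
The formula above does not spell out how the RMS terms are computed. The sketch below assumes the usual AdaDelta definition, in which each RMS is the square root of an exponentially decayed average of squares plus eps, with both averages starting at zero. It is plain Python for a single scalar weight; adadelta_step and the state keys are hypothetical names, and this is not NNabla's implementation.

```python
import math


def adadelta_step(w, grad, state, lr=1.0, decay=0.95, eps=1e-6):
    """One AdaDelta step for a scalar weight.

    state holds decayed averages of squared gradients ("e_g2") and of
    squared updates ("e_v2"); both are assumed to start at 0.
    """
    state["e_g2"] = decay * state["e_g2"] + (1.0 - decay) * grad * grad
    rms_g = math.sqrt(state["e_g2"] + eps)        # RMS of gradients at step t
    rms_v_prev = math.sqrt(state["e_v2"] + eps)   # RMS of updates up to step t-1
    v = -(rms_v_prev / rms_g) * grad              # update direction v_t
    state["e_v2"] = decay * state["e_v2"] + (1.0 - decay) * v * v
    return w + lr * v


state = {"e_g2": 0.0, "e_v2": 0.0}
w = adadelta_step(1.0, grad=1.0, state=state)
```

Because the update history is empty on the first call, the first step is tiny: its size is governed almost entirely by eps.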

nnabla.solvers.Adagrad(lr=0.01, eps=1e-08)

AdaGrad optimizer.

\[\begin{split}g_t &\leftarrow \Delta w_t\\ G_t &\leftarrow G_{t-1} + g_t^2\\ w_{t+1} &\leftarrow w_t - \frac{\eta}{\sqrt{G_t} + \epsilon} g_t\end{split}\]
Parameters:
  • lr (float) – Learning rate (\(\eta\)).
  • eps (float) – Small value for avoiding division by zero (\(\epsilon\)).
Returns:

An instance of the Solver class.

See Solver API guide for details.

Return type:

Solver

Note

You can instantiate a preferred target implementation (ex. CUDA) of a Solver given a Context. A Context can be set by nnabla.set_default_context(ctx) or nnabla.context_scope(ctx). See API docs.
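
A short plain-Python sketch of the AdaGrad rule above for a single scalar weight. The name adagrad_step and the gradient values are made up for illustration; grad stands for \(\Delta w_t\) and the accumulator \(G\) is assumed to start at zero.

```python
import math


def adagrad_step(w, grad, g_sum, lr=0.01, eps=1e-8):
    """One AdaGrad step: accumulate grad^2, then scale the step by 1/sqrt(G)."""
    g_sum = g_sum + grad * grad
    w = w - lr / (math.sqrt(g_sum) + eps) * grad
    return w, g_sum


w, g_sum = 1.0, 0.0        # accumulator assumed to start at zero
for grad in [2.0, 2.0]:    # made-up constant gradient
    w, g_sum = adagrad_step(w, grad, g_sum)
```

Since the accumulator only grows, the effective step size shrinks over time even when the gradient stays the same.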

nnabla.solvers.RMSprop(lr=0.001, decay=0.9, eps=1e-08)

RMSprop optimizer (Geoffrey Hinton).

\[\begin{split}g_t &\leftarrow \Delta w_t\\ v_t &\leftarrow \gamma v_{t-1} + \left(1 - \gamma \right) g_t^2\\ w_{t+1} &\leftarrow w_t - \eta \frac{g_t}{\sqrt{v_t} + \epsilon}\end{split}\]
Parameters:
  • lr (float) – Learning rate (\(\eta\)).
  • decay (float) – Decay rate (\(\gamma\)).
  • eps (float) – Small value for avoiding division by zero (\(\epsilon\)).
Returns:

An instance of the Solver class.

See Solver API guide for details.

Return type:

Solver

Note

You can instantiate a preferred target implementation (ex. CUDA) of a Solver given a Context. A Context can be set by nnabla.set_default_context(ctx) or nnabla.context_scope(ctx). See API docs.
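
The RMSprop rule above, sketched in plain Python for a single scalar weight. The function name rmsprop_step is hypothetical, grad stands for \(\Delta w_t\), and the running average v is assumed to start at zero.

```python
import math


def rmsprop_step(w, grad, v, lr=0.001, decay=0.9, eps=1e-8):
    """One RMSprop step: decayed average of grad^2, then a normalized step."""
    v = decay * v + (1.0 - decay) * grad * grad
    w = w - lr * grad / (math.sqrt(v) + eps)
    return w, v


# Single step from w = 1.0 with a made-up gradient of 1.0
w, v = rmsprop_step(1.0, grad=1.0, v=0.0)
```

Unlike AdaGrad's ever-growing sum, the decayed average forgets old gradients, so the step size can recover after a burst of large gradients.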

nnabla.solvers.Adam(alpha=0.001, beta1=0.9, beta2=0.999, eps=1e-08)

Adam optimizer.

\[\begin{split}m_t &\leftarrow \beta_1 m_{t-1} + (1 - \beta_1) g_t\\ v_t &\leftarrow \beta_2 v_{t-1} + (1 - \beta_2) g_t^2\\ w_{t+1} &\leftarrow w_t - \alpha \frac{\sqrt{1 - \beta_2^t}}{1 - \beta_1^t} \frac{m_t}{\sqrt{v_t} + \epsilon}\end{split}\]

where \(g_t\) denotes the gradient, with \(m_0 \leftarrow 0\) and \(v_0 \leftarrow 0\).

Parameters:
  • alpha (float) – Step size (\(\alpha\)).
  • beta1 (float) – Decay rate of the first-order moment (\(\beta_1\)).
  • beta2 (float) – Decay rate of the second-order moment (\(\beta_2\)).
  • eps (float) – Small value for avoiding division by zero (\(\epsilon\)).
Returns:

An instance of the Solver class.

See Solver API guide for details.

Return type:

Solver

Note

You can instantiate a preferred target implementation (ex. CUDA) of a Solver given a Context. A Context can be set by nnabla.set_default_context(ctx) or nnabla.context_scope(ctx). See API docs.
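
To make the bias-corrected step size concrete, here is a plain-Python sketch of the Adam rule above for a single scalar weight. The name adam_step and the gradient values are hypothetical; the step counter t is assumed to start at 1, and this is an illustration rather than NNabla's implementation.

```python
import math


def adam_step(w, grad, m, v, t, alpha=0.001, beta1=0.9, beta2=0.999, eps=1e-8):
    """One Adam step for a scalar weight; t is the 1-based step count."""
    m = beta1 * m + (1.0 - beta1) * grad          # first-moment estimate
    v = beta2 * v + (1.0 - beta2) * grad * grad   # second-moment estimate
    # Bias correction folded into the step size, as in the formula above
    alpha_t = alpha * math.sqrt(1.0 - beta2 ** t) / (1.0 - beta1 ** t)
    w = w - alpha_t * m / (math.sqrt(v) + eps)
    return w, m, v


w, m, v = 1.0, 0.0, 0.0
for t, grad in enumerate([1.0, 1.0], start=1):    # made-up constant gradient
    w, m, v = adam_step(w, grad, m, v, t)
```

With a constant gradient the bias correction cancels the warm-up of m and v, so each step moves the weight by almost exactly alpha.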

nnabla.solvers.Adamax(alpha=0.002, beta1=0.9, beta2=0.999, eps=1e-08)

Adamax optimizer.

\[\begin{split}m_t &\leftarrow \beta_1 m_{t-1} + (1 - \beta_1) g_t\\ v_t &\leftarrow \max\left(\beta_2 v_{t-1}, |g_t|\right)\\ w_{t+1} &\leftarrow w_t - \alpha \frac{\sqrt{1 - \beta_2^t}}{1 - \beta_1^t} \frac{m_t}{v_t + \epsilon}\end{split}\]

where \(g_t\) denotes the gradient, \(m_0 \leftarrow 0\) and \(v_0 \leftarrow 0\), and \(v_t\) is an exponentially weighted infinity norm of the sequence of gradients \(g_1, \dots, g_t\).

Parameters:
  • alpha (float) – Step size (\(\alpha\)).
  • beta1 (float) – Decay rate of the first-order moment (\(\beta_1\)).
  • beta2 (float) – Decay rate of the exponentially weighted infinity norm (\(\beta_2\)).
  • eps (float) – Small value for avoiding division by zero (\(\epsilon\)).
Returns:

An instance of the Solver class.

See Solver API guide for details.

Return type:

Solver

Note

You can instantiate a preferred target implementation (ex. CUDA) of a Solver given a Context. A Context can be set by nnabla.set_default_context(ctx) or nnabla.context_scope(ctx). See API docs.
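
The sketch below transcribes the Adamax formula exactly as it is written in this section, including its step-size factor, into plain Python for a single scalar weight. The name adamax_step is hypothetical, the step counter t is assumed to start at 1, and the code may differ from what the C++ core actually does.

```python
import math


def adamax_step(w, grad, m, v, t, alpha=0.002, beta1=0.9, beta2=0.999, eps=1e-8):
    """One Adamax step for a scalar weight; t is the 1-based step count."""
    m = beta1 * m + (1.0 - beta1) * grad   # first-moment estimate
    v = max(beta2 * v, abs(grad))          # decayed infinity norm of gradients
    # Step-size factor copied from the formula in this section
    alpha_t = alpha * math.sqrt(1.0 - beta2 ** t) / (1.0 - beta1 ** t)
    w = w - alpha_t * m / (v + eps)
    return w, m, v


# Single step from w = 1.0 with a made-up gradient of -2.0
w, m, v = adamax_step(1.0, grad=-2.0, m=0.0, v=0.0, t=1)
```

Because v tracks the largest recent gradient magnitude rather than an average of squares, no square root is taken in the denominator.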