Solvers

The nnabla.solvers.Solver class represents an optimizer, based on stochastic gradient descent, that optimizes the parameters in the computation graph. NNabla provides the solvers listed below.

Solver

class nnabla.solvers.Solver

Solver interface class.

The same API provided in this class can be used to implement various types of solvers.

Example:

# Network building comes above
import nnabla as nn
import nnabla.solvers as S
solver = S.Sgd(lr=1e-3)
solver.set_parameters(nn.get_parameters())

for itr in range(num_itr):
    x.d = ... # set data
    t.d = ... # set label
    loss.forward()
    solver.zero_grad()  # Reset all gradient buffers to zero
    loss.backward()
    solver.weight_decay(decay_rate)  # Apply weight decay
    solver.clip_grad_by_norm(clip_norm)  # Clip gradients by norm
    solver.update()  # Update parameters

Note

All solvers provided by NNabla are subclasses of Solver. The Solver class itself is never instantiated directly.

check_inf_grad(self, pre_hook=None, post_hook=None): Check whether any of the gradients that were set up contains inf.

check_inf_or_nan_grad(self, pre_hook=None, post_hook=None): Check whether any of the gradients that were set up contains inf or nan.

check_nan_grad(self, pre_hook=None, post_hook=None): Check whether any of the gradients that were set up contains nan.
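The kind of test these three methods describe can be sketched in plain Python. This is only an illustration on a flat list of floats; the helper names `has_inf`, `has_nan`, and `has_inf_or_nan` are hypothetical and are not part of NNabla, whose checks run on the gradient buffers of the registered parameters.

```python
import math


def has_inf(grads):
    """Return True if any gradient value is +inf or -inf."""
    return any(math.isinf(g) for g in grads)


def has_nan(grads):
    """Return True if any gradient value is nan."""
    return any(math.isnan(g) for g in grads)


def has_inf_or_nan(grads):
    """Return True if any gradient value is inf or nan."""
    return has_inf(grads) or has_nan(grads)


# Hypothetical gradient values for illustration.
grads = [0.5, float("inf"), -1.0]
print(has_inf(grads), has_nan(grads), has_inf_or_nan(grads))
```

A check like this is typically run after backward() and before update(), so that an update with invalid gradients can be skipped.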

clear_parameters(self): Clear all registered parameters and states.

clip_grad_by_norm(self, float clip_norm, pre_hook=None, post_hook=None)

Clip gradients by norm. When called, the gradient will be clipped by the given norm.

Parameters: clip_norm (float) – The value of clipping norm.
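As a rough illustration of clipping by norm, the sketch below rescales a single gradient vector so that its L2 norm does not exceed clip_norm. It assumes the L2 norm is used and that the gradient is a flat list of floats; the function `clip_by_norm` is a hypothetical helper, not NNabla's implementation.

```python
import math


def clip_by_norm(grad, clip_norm):
    """Rescale grad so that its L2 norm is at most clip_norm (assumed rule)."""
    norm = math.sqrt(sum(g * g for g in grad))
    if norm <= clip_norm:
        # Already within the limit: leave the gradient unchanged.
        return list(grad)
    scale = clip_norm / norm
    return [g * scale for g in grad]


# A gradient of norm 5 is scaled down to norm 1; the direction is preserved.
clipped = clip_by_norm([3.0, 4.0], 1.0)
print(clipped)
```

Clipping in this way bounds the size of an update step without changing its direction.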

get_parameters(self): Get all registered parameters

get_states(self): Get all states

info

object

Type: info

learning_rate(self): Get the learning rate.

load_states(self, path)

Load solver states.

Parameters: path – path to the state file to be loaded.

name: Get the name of the solver.

remove_parameters(self, vector[string] keys): Remove previously registered parameters, specified by a vector of their keys.

save_states(self, path)

Save solver states.

Parameters: path – path or file object

scale_grad(self, scale, pre_hook=None, post_hook=None): Rescale gradient

set_learning_rate(self, learning_rate): Set the learning rate.

set_parameters(self, param_dict, bool reset=True, bool retain_state=False)

Set parameters by dictionary of keys and parameter Variables.

Parameters:

param_dict (dict) – key:string, value: Variable.
reset (bool) – If true, clear all parameters before setting parameters. If false, parameters are overwritten or added (if they are new).
retain_state (bool) – The value is only considered if reset is false. If true and a key already exists (overwriting), a state (such as momentum) associated with the key will be kept if the shape of the parameter and that of the new param match.
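The interaction between reset and retain_state can be sketched with plain dictionaries. This is a simplified model, not NNabla's code: `set_parameters_sketch` is a hypothetical function, shape tuples stand in for parameter Variables, and a `{"momentum": ...}` dict stands in for a solver state. It also assumes that a state is reinitialized whenever it is not retained.

```python
def set_parameters_sketch(params, states, param_dict,
                          reset=True, retain_state=False):
    """Model of the documented semantics.

    params: dict mapping key -> shape tuple (stand-in for a Variable)
    states: dict mapping key -> state dict (stand-in for solver state)
    """
    if reset:
        # Clear everything before setting the new parameters.
        params.clear()
        states.clear()
    for key, shape in param_dict.items():
        # A state is kept only when overwriting an existing key with
        # reset=False, retain_state=True, and a matching shape.
        keep = (not reset and retain_state
                and key in params and params[key] == shape)
        params[key] = shape
        if not keep:
            states[key] = {"momentum": 0.0}  # assumed fresh state


params = {"w": (2, 3)}
states = {"w": {"momentum": 0.9}}
set_parameters_sketch(params, states, {"w": (2, 3), "b": (3,)},
                      reset=False, retain_state=True)
print(states)  # state of "w" is kept, "b" gets a fresh state
```

With reset=True, the value of retain_state has no effect in this model, which mirrors the statement that it is only considered if reset is false.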

set_states(self, states): Set states. Call set_parameters first to initialize the states of a solver; otherwise this method raises a value error.

set_states_from_protobuf(self, optimizer_proto)

Set states to the solver from the protobuf file.

Internally used helper method.

set_states_to_protobuf(self, optimizer)

Set states to the protobuf file from the solver.

Internally used helper method.

setup(self, params): Deprecated. Call set_parameters with param_dict.

update(self, update_pre_hook=None, update_post_hook=None)

When this function is called, parameter values are updated using the gradients accumulated in backpropagation, stored in the grad field of the parameter Variables. Update rules are implemented in the C++ core, in derived classes of Solver. The updated parameter values will be stored into the data field of the parameter Variables.

Parameters:

update_pre_hook (callable) – This callable object is called immediately before each update of parameters. The default is None.
update_post_hook (callable) – This callable object is called immediately after each update of parameters. The default is None.

weight_decay(self, float decay_rate, pre_hook=None, post_hook=None)

Apply weight decay to gradients.

When called, the gradient of each parameter is modified by a decay term proportional to the current parameter value, scaled by decay_rate.

Parameters: decay_rate (float) – The coefficient of weight decay.

Note

In solvers for which weight_decay_is_fused() returns true, the weight decay is not performed immediately when called. Instead, the specified decay_rate is stored in the solver instance and lazily evaluated when the update() method is called. The stored decay rate expires after update() and reverts to 0, or to a default value specified at initialization of the solver class if one exists (e.g. SgdW). The definition of the weight decay operation depends on the solver class. Please refer to the documentation of each solver class.
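The difference between immediate and fused (lazy) weight decay can be sketched as follows. Everything here is hypothetical and for illustration only: `weight_decay_eager` and `LazyDecaySketch` are not NNabla classes, and the update rule used (a plain SGD step with the decay term added to the gradient) is an assumption, since the actual operation depends on the solver class.

```python
def weight_decay_eager(w, grad, decay_rate):
    """Non-fused case (assumed rule): modify the gradient right away."""
    return [gi + decay_rate * wi for wi, gi in zip(w, grad)]


class LazyDecaySketch:
    """Stand-in for a solver whose weight decay is fused into update()."""

    def __init__(self, lr, default_decay=0.0):
        self.lr = lr
        self.default_decay = default_decay
        self.pending_decay = default_decay

    def weight_decay(self, decay_rate):
        # Only record the rate; gradients are not touched here.
        self.pending_decay = decay_rate

    def update(self, w, grad):
        # Assumed rule: SGD step with the stored decay applied lazily.
        new_w = [wi - self.lr * (gi + self.pending_decay * wi)
                 for wi, gi in zip(w, grad)]
        # The stored rate expires and reverts to the default.
        self.pending_decay = self.default_decay
        return new_w


solver = LazyDecaySketch(lr=0.1)
solver.weight_decay(0.5)           # stored, not applied yet
w = solver.update([1.0], [0.0])    # decay applied here
w = solver.update(w, [0.0])        # no decay: the rate has expired
print(w, solver.pending_decay)
```

In this model, weight_decay() must be called before every update() for the decay to apply at each iteration, which matches the order used in the training loop of the Solver example.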

weight_decay_is_fused(self)

Returns a boolean which represents whether weight decay is fused into update(), hence lazily evaluated.

See weight_decay() for more details.

zero_grad(self): Initialize the gradients of all registered parameters to zero.

List of solvers

nnabla.solvers.Sgd(lr=0.001)

Stochastic gradient descent (SGD) optimizer.

\[w_{t+1} \leftarrow w_t - \eta \Delta w_t\]
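The update rule above can be sketched in plain Python, reading \(\Delta w_t\) as the gradient of the loss with respect to \(w_t\) (an interpretation assumed here). The function `sgd_step` and the toy objective \((w - 3)^2\) are hypothetical and only illustrate the formula; NNabla's actual update is implemented in its C++ core.

```python
def sgd_step(w, grad, lr=0.001):
    """One SGD step: w_{t+1} = w_t - lr * grad, applied elementwise."""
    return [wi - lr * gi for wi, gi in zip(w, grad)]


# Toy problem: minimize (w - 3)^2, whose gradient is 2 * (w - 3).
w = [1.0]
for _ in range(100):
    grad = [2.0 * (w[0] - 3.0)]
    w = sgd_step(w, grad, lr=0.1)

print(w)  # close to the minimizer at 3.0
```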

Parameters:

lr (float) – Learning rate (\(\eta\)).

Returns:

An instance of Solver class. See Solver API guide for details.

Return type: