Experimental

class nnabla.experimental.mixed_precision_training.DynamicLossScalingUpdater(solver, loss, data_feeder=<function <lambda>>, scale=8.0, scaling_factor=2.0, N=2000, clear_buffer=True, accum_grad=1, weight_decay=None, comm=None, grads=[])[source]

Dynamic loss scaling updater for mixed precision training.

Parameters:
  • solver (nnabla.solvers.Solver) – Solver object. E.g., Momentum or Adam.
  • loss (nnabla.Variable) – Loss variable on which forward and backward are called.
  • data_feeder (callable object, function, or lambda) – Callable that feeds input data into the network before each forward pass.
  • scale (float) – Initial loss scale. This value changes dynamically during training.
  • scaling_factor (float) – Scaling factor for the dynamic loss scaling.
  • N (int) – Interval, in training iterations: after N iterations without gradient overflow, the loss scale is multiplied by scaling_factor.
  • clear_buffer (bool) – If True, clears variables that are no longer referenced during backpropagation to save memory.
  • accum_grad (int) – Number of gradient accumulation steps. The solver's update method is called after accum_grad forward/backward passes.
  • weight_decay (float) – Weight decay coefficient. Default is None, meaning weight decay is not applied.
  • comm (nnabla.communicators.Communicator) – Communicator used for distributed training. Default is None.
  • grads (list of nnabla._nd_array.NdArray) – List of gradients to be exchanged in distributed training. Default is the empty list (see the construction sketch after this parameter list).
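A hedged construction sketch for the distributed case, assuming solver, loss, and data_feeder already exist and comm is an initialized communicator from nnabla.communicators; the hyperparameter values shown are illustrative only:

    import nnabla as nn
    from nnabla.experimental.mixed_precision_training import DynamicLossScalingUpdater

    # Gradients (NdArray) of all registered parameters, to be all-reduced
    # across workers by the updater when comm is given.
    grads = [p.grad for p in nn.get_parameters().values()]

    updater = DynamicLossScalingUpdater(
        solver, loss,
        data_feeder=data_feeder,
        scale=8.0, scaling_factor=2.0, N=2000,
        accum_grad=1, weight_decay=1e-4,
        comm=comm, grads=grads)
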
solver

nnabla.solvers.Solver – Solver object. E.g., Momentum or Adam.

loss

nnabla.Variable – Loss variable on which forward and backward are called.

data_feeder

callable object, function, lambda – Callable that feeds input data into the network before each forward pass.

scale

float – Initial loss scale. This value changes dynamically during training.

scaling_factor

float – Scaling factor for the dynamic loss scaling.

N

int – Interval, in training iterations: after N iterations without gradient overflow, the loss scale is multiplied by scaling_factor.

clear_buffer

bool – If True, clears variables that are no longer referenced during backpropagation to save memory.

accum_grad

int – Number of gradient accumulation steps. The solver's update method is called after accum_grad forward/backward passes.

weight_decay

float – Weight decay coefficient. Default is None, meaning weight decay is not applied.

comm

nnabla.communicators.Communicator – Communicator used for distributed training.

grads

list of nnabla._nd_array.NdArray – List of gradients to be exchanged in distributed training.

Example

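A minimal usage sketch, assuming a scalar loss Variable, a data_feeder callable, and max_iter are already defined for a half-precision network; the solver choice and learning rate are illustrative:

    import nnabla as nn
    import nnabla.solvers as S
    from nnabla.experimental.mixed_precision_training import DynamicLossScalingUpdater

    # Solver over all registered parameters of the network.
    solver = S.Momentum(lr=0.01)
    solver.set_parameter_dict(nn.get_parameters())

    # Updater wrapping the solver, loss, and data feeder.
    updater = DynamicLossScalingUpdater(solver, loss, data_feeder=data_feeder)

    # Each call performs zero_grad, data feeding, forward, backward,
    # and the solver update with dynamic loss scaling.
    for i in range(max_iter):
        updater.update()
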
Reference:

update()[source]

Monolithic update method.

This method performs the following steps with dynamic loss scaling (a simplified sketch follows the list).

  1. solver.zero_grad
  2. feed data
  3. loss.forward
  4. loss.backward
  5. comm.all_reduce (if it is specified)
  6. solver.update
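
For intuition, a simplified sketch of what one call to update() amounts to. This is not the library's exact implementation; it assumes the Solver methods check_inf_or_nan_grad() and scale_grad() are available, and self._counter is a hypothetical counter of iterations since the last overflow:

    def update(self):
        self.solver.zero_grad()                       # 1. clear gradients
        self.data_feeder()                            # 2. feed data
        for _ in range(self.accum_grad):
            self.loss.forward(clear_no_need_grad=self.clear_buffer)        # 3. forward
            self.loss.backward(self.scale, clear_buffer=self.clear_buffer) # 4. backward on the scaled loss
        if self.comm and len(self.grads) != 0:
            self.comm.all_reduce(self.grads, division=False, inplace=False)  # 5. all-reduce
        if self.weight_decay is not None:
            self.solver.weight_decay(self.weight_decay)
        if self.solver.check_inf_or_nan_grad():
            # Overflow: skip this update and lower the loss scale.
            self.scale /= self.scaling_factor
            self._counter = 0
        else:
            # Unscale gradients, update, and raise the scale after N clean iterations.
            self.solver.scale_grad(1.0 / self.scale)
            self.solver.update()                      # 6. update
            if self._counter > self.N:
                self.scale *= self.scaling_factor
                self._counter = 0
            self._counter += 1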