Mixed Precision Training

DynamicLossScalingUpdater

class nnabla.experimental.mixed_precision_training.DynamicLossScalingUpdater(solver, loss, data_feeder=<function DynamicLossScalingUpdater.<lambda>>, scale=8.0, scaling_factor=2.0, N=2000, clear_buffer=True, accum_grad=1, weight_decay=None, comm=None, grads=[])[source]

Dynamic Loss Scaling Updater for mixed-precision training.

Parameters
  • solver (nnabla.solvers.Solver) – Solver object. E.g., Momentum or Adam.

  • loss (nnabla.Variable) – Loss variable on which forward and backward are called.

  • data_feeder (callable object, function, or lambda) – Callable that feeds a batch of data into the network's input variables before each forward pass (see the sketch after this list).

  • scale (float) – Initial loss scale. It is adjusted dynamically during training.

  • scaling_factor (float) – Factor by which the loss scale is increased or decreased during dynamic loss scaling.

  • N (int) – Interval, in iterations, at which the loss scale is increased by scaling_factor.

  • clear_buffer (bool) – If True, clears no-longer-referenced variables during backpropagation to save memory.

  • accum_grad (int) – Number of gradient-accumulation steps. The solver's update method is called after accum_grad forward/backward passes.

  • weight_decay (float) – Weight decay constant. Default is None, in which case weight decay is not applied.

  • comm (nnabla.communicators.Communicator) – Communicator used for distributed training. Default is None.

  • grads (list of nnabla.NdArray) – List of gradients to be exchanged during distributed training. Default is an empty list.
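
For illustration, a data feeder is any zero-argument callable that loads the next batch into the network's input variables before forward is called. A minimal sketch, assuming hypothetical input variables x and t (random arrays stand in for a real data iterator):

    import numpy as np
    import nnabla as nn

    # Hypothetical input variables; in practice these are the inputs of
    # your network graph, filled from a real data iterator.
    x = nn.Variable((64, 3, 32, 32))
    t = nn.Variable((64, 1))

    def data_feeder():
        # Load the next batch into the input variables' host arrays.
        x.d = np.random.randn(*x.shape)               # stand-in images
        t.d = np.random.randint(0, 10, size=t.shape)  # stand-in labels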

solver

Solver object. E.g., Momentum or Adam.

Type

nnabla.solvers.Solver

loss

Loss variable on which forward and backward are called.

Type

nnabla.Variable

data_feeder

Callable that feeds a batch of data into the network's input variables before each forward pass.

Type

callable object, function, lambda

scale

Current loss scale. It is adjusted dynamically during training.

Type

float

scaling_factor

Factor by which the loss scale is increased or decreased during dynamic loss scaling.

Type

float

N

Interval, in iterations, at which the loss scale is increased by scaling_factor.

Type

int

clear_buffer

If True, clears no-longer-referenced variables during backpropagation to save memory.

Type

bool

accum_grad

Number of gradient-accumulation steps. The solver's update method is called after accum_grad forward/backward passes.

Type

int

weight_decay

Weight decay constant. Default is None, in which case weight decay is not applied.

Type

float

comm

Communicator used for distributed training.

Type

nnabla.communicators.Communicator

grads

List of gradients to be exchanged during distributed training.

Type

list of nnabla.NdArray

Example
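
A minimal end-to-end sketch, assuming a toy classifier; the network, learning rate, and iteration count are placeholders, and the half-precision extension context you would use for actual FP16 training is omitted for brevity:

    import numpy as np
    import nnabla as nn
    import nnabla.functions as F
    import nnabla.parametric_functions as PF
    import nnabla.solvers as S
    from nnabla.experimental.mixed_precision_training import \
        DynamicLossScalingUpdater

    # Toy network (placeholder).
    x = nn.Variable((64, 784))
    t = nn.Variable((64, 1))
    h = F.relu(PF.affine(x, 256, name="fc1"))
    y = PF.affine(h, 10, name="fc2")
    loss = F.mean(F.softmax_cross_entropy(y, t))

    solver = S.Momentum(lr=0.01)
    solver.set_parameters(nn.get_parameters())

    def data_feeder():
        x.d = np.random.randn(*x.shape)               # stand-in batch
        t.d = np.random.randint(0, 10, size=t.shape)  # stand-in labels

    updater = DynamicLossScalingUpdater(
        solver, loss, data_feeder=data_feeder,
        scale=8.0, scaling_factor=2.0, N=2000)

    # Each call performs zero_grad, feed, forward, backward, and update
    # under dynamic loss scaling.
    for itr in range(1000):
        updater.update()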

Reference:

update()[source]

Monolithic update method.

This method calls the following methods under dynamic loss scaling (a conceptual sketch follows the list).

  1. solver.zero_grad

  2. feed data

  3. loss.forward

  4. loss.backward

  5. comm.all_reduce (if comm is specified)

  6. solver.update
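
As a rough illustration of how these steps compose, here is a conceptual sketch of the dynamic loss scaling logic. This is not the library's exact implementation; the counter attribute and the unguarded retry are simplifications:

    # Conceptual sketch of update() (simplified): gradients are computed
    # against a scaled loss, checked for overflow, then unscaled before
    # the solver step.
    def update(self):
        self.solver.zero_grad()                                      # 1.
        for _ in range(self.accum_grad):
            self.data_feeder()                                       # 2.
            self.loss.forward(clear_no_need_grad=self.clear_buffer)  # 3.
            # 4. Seed the output gradient with the current loss scale.
            self.loss.backward(self.scale, clear_buffer=self.clear_buffer)
        if self.comm and self.grads:                                 # 5.
            self.comm.all_reduce(self.grads, division=False, inter=True)
        if self.solver.check_inf_or_nan_grad():
            # Overflow: shrink the scale and retry, skipping this update.
            self.scale /= self.scaling_factor
            self._counter = 0
            return self.update()
        self.solver.scale_grad(1.0 / self.scale)  # unscale gradients
        if self.weight_decay is not None:
            self.solver.weight_decay(self.weight_decay)
        self.solver.update()                                         # 6.
        self._counter += 1
        if self._counter > self.N:
            # N overflow-free iterations in a row: try a larger scale.
            self.scale *= self.scaling_factor
            self._counter = 0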