Mixed Precision Training

DynamicLossScalingUpdater

class nnabla.experimental.mixed_precision_training.DynamicLossScalingUpdater(solver, loss, data_feeder=<function DynamicLossScalingUpdater.<lambda>>, scale=8.0, scaling_factor=2.0, N=2000, clear_buffer=True, accum_grad=1, weight_decay=None, comm=None, grads=[])

Dynamic loss scaling updater for mixed precision training.

Parameters:
  • solver (nnabla.solvers.Solver) – Solver object. E.g., Momentum or Adam.

  • loss (nnabla.Variable) – Loss variable on which forward and backward are called.

  • data_feeder (callable object, function, or lambda) – Callable that feeds data to the input variables before each forward pass.

  • scale (float) – Initial loss scale. This value changes dynamically during training; see the example after this parameter list.

  • scaling_factor (float) – Multiplicative factor by which the loss scale is decreased on overflow and increased after N clean iterations.

  • N (int) – Interval: the number of consecutive overflow-free training iterations after which the loss scale is increased by scaling_factor.

  • clear_buffer (bool) – If True, clears variables that are no longer referenced during backpropagation to save memory.

  • accum_grad (int) – Number of gradient accumulation steps. The solver's update method is called after forward and backward have run accum_grad times.

  • weight_decay (float) – Weight decay constant. Default is None, in which case weight decay is not applied.

  • comm (nnabla.communicators.Communicator) – Communicator used for distributed training. Default is None.

  • grads (list of nnabla.NdArray) – List of gradients to be exchanged in distributed training. Default is the empty list.
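Together, scale, scaling_factor, and N implement a standard dynamic loss scaling policy: the scale is divided by scaling_factor whenever Inf/NaN gradients are detected, and multiplied by scaling_factor after N consecutive clean iterations. For example, with the defaults (scale=8.0, scaling_factor=2.0, N=2000), the scale drops from 8.0 to 4.0 on the first overflow, and grows from 8.0 to 16.0 after 2000 consecutive overflow-free iterations.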

solver

Solver object. E.g., Momentum or Adam.

Type:

nnabla.solvers.Solver

loss

Loss variable on which forward and backward are called.

Type:

nnabla.Variable

data_feeder

Callable that feeds data to the input variables before each forward pass.

Type:

callable object, function, or lambda

scale

Current loss scale. This value changes dynamically during training.

Type:

float

scaling_factor

Multiplicative factor by which the loss scale is decreased on overflow and increased after N clean iterations.

Type:

float

N

Interval: the number of consecutive overflow-free training iterations after which the loss scale is increased by scaling_factor.

Type:

int

clear_buffer

If True, clears variables that are no longer referenced during backpropagation to save memory.

Type:

bool

accum_grad

Number of gradient accumulation steps. The solver's update method is called after forward and backward have run accum_grad times.

Type:

int

weight_decay

Weight decay constant. If None, weight decay is not applied.

Type:

float

comm

Communicator used for distributed training.

Type:

nnabla.communicators.Communicator

grads

List of gradients to be exchanged in distributed training.

Type:

list of nnabla.NdArray

Example
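A minimal usage sketch, assuming a half-precision network whose loss is an nnabla.Variable and a data_feeder callable that fills the input variables with a minibatch; the solver choice and max_iter are illustrative:

    import nnabla as nn
    import nnabla.solvers as S
    from nnabla.experimental.mixed_precision_training import DynamicLossScalingUpdater

    # `loss` and `data_feeder` are assumed to come from your network and data setup.
    solver = S.Adam()
    solver.set_parameter_variables(nn.get_parameters())

    updater = DynamicLossScalingUpdater(solver, loss, data_feeder=data_feeder)

    max_iter = 10000  # illustrative
    for itr in range(max_iter):
        updater.update()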

Reference:

update()

Monolithic update method.

This method calls the following methods with dynamic loss scaling; a conceptual sketch follows the step list.

  1. solver.zero_grad

  2. feed data

  3. loss.forward

  4. loss.backward

  5. comm.all_reduce (if it is specified)

  6. solver.update
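Roughly, one update() call corresponds to the sketch below (a conceptual rendering of the steps above, assuming overflow is detected with solver.check_inf_or_nan_grad(); details of the real implementation may differ). The names refer to the attributes documented above:

    solver.zero_grad()                                         # 1. clear gradients
    for _ in range(accum_grad):
        data_feeder()                                          # 2. feed data
        loss.forward(clear_no_need_grad=clear_buffer)          # 3. forward
        loss.backward(scale, clear_buffer=clear_buffer)        # 4. backward, scaled by `scale`
    if comm is not None and grads:
        comm.all_reduce(grads, division=False, inplace=False)  # 5. exchange gradients
    if solver.check_inf_or_nan_grad():
        scale /= scaling_factor         # overflow: shrink the scale, skip this update
        counter = 0
    else:
        solver.scale_grad(1.0 / scale)  # unscale gradients before the update
        if weight_decay is not None:
            solver.weight_decay(weight_decay)
        solver.update()                                        # 6. update parameters
        if counter > N:
            scale *= scaling_factor     # N clean iterations: increase the scale
            counter = 0
        counter += 1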