Quantization Aware Training

QATConfig

Configuration for quantization aware training.

class nnabla.utils.qnn.QATConfig[source]

Bases: object

class RecorderPosition(value)[source]

Bases: enum.Enum

Position at which to add a recorder for a function.

BEFORE = 0

Add a recorder only before a function

BOTH = 1

Add recorders both before and after a function

class RoundingMethod(value)[source]

Bases: enum.Enum

Rounding method for the scale

CEIL = 'CEIL'

Round up, e.g. ceil(9.4) = 10

FLOOR = 'FLOOR'

Round down, e.g. floor(9.5) = 9

NOTROUND = 'NOTROUND'

Do not round the scale

ROUND = 'ROUND'

Round to nearest, e.g. round(9.4) = 9, round(9.5) = 10
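
For illustration, one plausible way to snap a scale to a power of 2 is to round its base-2 exponent. The sketch below is a minimal illustration of the four methods, not nnabla's internal implementation; round_scale_pow2 is a hypothetical helper.

import numpy as np

def round_scale_pow2(scale, method):
    # Hypothetical helper: snap a positive scale to a power of 2 by
    # rounding its base-2 exponent (assumed semantics, not nnabla internals).
    log2_scale = np.log2(scale)
    if method == 'CEIL':
        return 2.0 ** np.ceil(log2_scale)   # round the exponent up
    if method == 'FLOOR':
        return 2.0 ** np.floor(log2_scale)  # round the exponent down
    if method == 'ROUND':
        return 2.0 ** np.round(log2_scale)  # round the exponent to nearest
    return scale                            # 'NOTROUND': keep the scale as-is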

bn_folding = False

Enable Batch Normalization Folding. Note that this can sometimes make training unstable.

bn_self_folding = False

Enable Batch Normalization Self-Folding. Note that this can sometimes make training unstable.

channel_last = False

Enable channel-last layout (currently only channel-first is supported)

channel_wise = False

Enable channel-wise quantization

dtype

Precision (the data type used for quantized values)

alias of numpy.int8

ext_name = 'cudnn'

Extension context: 'cpu', 'cuda', or 'cudnn'

learning_rate_scale = 0.1

QAT learning rate = non-QNN learning rate * learning_rate_scale. Setting it to 0.1 or 0.01 is recommended.
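
For instance, if the non-QNN training used a learning rate of 1e-3, the QAT phase would use 1e-4 under the default scale. A minimal sketch, where base_lr is a hypothetical name and solver is your nnabla solver:

from nnabla.utils.qnn import QATConfig

config = QATConfig()
base_lr = 1e-3  # learning rate used for the preceding non-QNN training
solver.set_learning_rate(base_lr * config.learning_rate_scale)  # 1e-4 with the default 0.1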

narrow_range = False

Narrow the lower bound (e.g., for int8, -128 becomes -127)

niter_to_recording = 0

Step at which recording starts

niter_to_training = -1

Step at which QAT training starts. The number of steps between recording and training should be greater than the number of steps in one training epoch.
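
For instance, with the values used in the QATScheduler example further below, the schedule is as follows (a minimal sketch of the intended phases):

config.niter_to_recording = 1   # recorders start collecting statistics at step 1
config.niter_to_training = 500  # simulated-quantization training starts at step 500
# The steps in between should cover at least one full training epoch of recording.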

pow2 = 'ROUND'

Member of nnabla.utils.qnn.QATConfig.RoundingMethod. Round the scale to a power of 2. Enable this if you want to deploy the model with TensorRT.

record_layers = []

List of nnabla function names specifying which layers to record. If empty, recorders are added to all layers; otherwise, recorders are added only to the functions listed in record_layers.
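
For example, to add recorders only to convolution and affine functions:

config.record_layers = ['Convolution', 'Affine']  # restrict recorders to these nnabla functions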

recorder_activation

Recorder for activations. One of nnabla.utils.qnn.MinMaxMinMaxRecorderCallback, nnabla.utils.qnn.AbsMaxRecorderCallback, nnabla.utils.qnn.MinMaxMvaRecorderCallback, nnabla.utils.qnn.MaxMaxRecorderCallback, or nnabla.utils.qnn.MaxMvaRecorderCallback.

alias of nnabla.utils.qnn.MaxMvaRecorderCallback
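
For example, to record abs-max statistics for activations instead of the default (a minimal sketch using the callback classes listed above):

from nnabla.utils.qnn import QATConfig, AbsMaxRecorderCallback

config = QATConfig()
config.recorder_activation = AbsMaxRecorderCallback  # use abs-max recording for activations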

recorder_position = 0

Member of nnabla.utils.qnn.QATConfig.RecorderPosition. Recorder position

recorder_weight

Recorder for weights. One of nnabla.utils.qnn.MinMaxMinMaxRecorderCallback, nnabla.utils.qnn.AbsMaxRecorderCallback, nnabla.utils.qnn.MinMaxMvaRecorderCallback, nnabla.utils.qnn.MaxMaxRecorderCallback, or nnabla.utils.qnn.MaxMvaRecorderCallback.

alias of nnabla.utils.qnn.MinMaxMinMaxRecorderCallback

round_mode = 'HALF_TO_EVEN'

Rounding mode of the quantize layer

skip_bias = False

Skip quantizing the bias of Affine and of the Convolution function family

skip_inputs_layers = ['Convolution', 'Deconvolution']

List of nnabla function names. Skip quantizing the input layers of the network.

skip_outputs_layers = ['Affine']

List of nnabla function names. Skip quantizing the output layers of the network.

zero_point = False

Use a zero-point (asymmetric quantization) or not (symmetric quantization)
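
As an illustration of how zero_point and narrow_range interact in int8 quantization, consider the following minimal sketch (quantize is a hypothetical helper, not nnabla's quantize layer):

import numpy as np

def quantize(x, scale, zero_point=0, narrow_range=False):
    # Hypothetical helper: symmetric when zero_point == 0, asymmetric otherwise.
    qmin = -127 if narrow_range else -128  # narrow_range raises the lower bound
    qmax = 127
    q = np.round(x / scale) + zero_point
    return np.clip(q, qmin, qmax).astype(np.int8)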

QATTensorRTConfig

The default quantization aware training configuration that meets the requirements of TensorRT.

class nnabla.utils.qnn.QATTensorRTConfig[source]
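
A minimal usage sketch (QATScheduler's config argument already defaults to a QATTensorRTConfig instance, per the signature below; solver is your nnabla solver):

from nnabla.utils.qnn import QATTensorRTConfig, QATScheduler

config = QATTensorRTConfig()  # TensorRT-friendly defaults
qat_scheduler = QATScheduler(config=config, solver=solver)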

QATScheduler

class nnabla.utils.qnn.QATScheduler(config=<nnabla.utils.qnn.QATTensorRTConfig object>, solver=None)[source]

Bases: object

Scheduler for quantization aware training.

Parameters
  • config (QATConfig) – Configuration for quantization aware training. Defaults to a QATTensorRTConfig instance.

  • solver (nnabla.solvers.Solver) – Solver used for training.

Example

from nnabla.utils.qnn import QATScheduler, QATConfig, PrecisionMode

# Set configuration
config = QATConfig()
config.bn_folding = True
config.bn_self_folding = True
config.channel_last = False
config.precision_mode = PrecisionMode.SIM_QNN
config.niter_to_recording = 1
config.niter_to_training = 500

qat_scheduler = QATScheduler(config=config, solver=solver)

# convert graph to enable quantization aware training.
qat_scheduler(pred) # pred is the output variable of training network
qat_scheduler(vpred, training=False) # vpred is the output variable of evaluation network

# Training loop
for i in range(training_step):
    qat_scheduler.step()

    # Your training code here

# save quantized nnp
qat_scheduler.save('qnn.nnp', vimage, deploy=False) # vimage is the input variable of network

save(fname, inputs, batch_size=1, net_name='net', deploy=False)[source]

By default, save the QAT network model to an NNP file.

Parameters
  • fname (str) – NNP file name.

  • inputs (nnabla.Variable or list of nnabla.Variable) – Network input variables.

  • batch_size (int) – batch size.

  • net_name (str) – network name.

  • deploy (bool) – Whether to apply QNN deployment conversion. deploy=True is not supported yet.

Returns

None
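
For example, reusing vimage from the example above:

qat_scheduler.save('qnn.nnp', vimage, batch_size=1, net_name='net', deploy=False)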

step()[source]

Advance the state of the QNN according to the number of iterations specified in config.