Quantization Aware Training
QATConfig
Configuration for quantization aware training.
- class nnabla.utils.qnn.QATConfig[source]
Bases: object
- class RecorderPosition(value)[source]
Bases: Enum
Position at which to add a recorder for a function.
- BEFORE = 0
Add a recorder only before a function
- BOTH = 1
Add recorders before and after a function
- class RoundingMethod(value)[source]
Bases: Enum
Rounding method for the scale.
- CEIL = 'CEIL'
Round up, e.g. ceil(9.4) = 10
- FLOOR = 'FLOOR'
Round down, e.g. floor(9.5) = 9
- NOTROUND = 'NOTROUND'
Do not round
- ROUND = 'ROUND'
Round to the nearest integer, e.g. round(9.4) = 9, round(9.5) = 10
- bn_folding = False
Enable Batch Normalization folding. Note that this can sometimes cause the training to become unstable.
- bn_self_folding = False
Enable Batch Normalization self-folding. Note that this can sometimes cause the training to become unstable.
- channel_last = False
Enable channel-last layout (currently only channel_first is supported)
- channel_wise = False
Enable channel-wise quantization
- dtype
Precision. Alias of int8
- ext_name = 'cudnn'
Extension context: 'cpu', 'cuda', or 'cudnn'
- learning_rate_scale = 0.1
QAT learning rate = non-QNN learning rate * learning_rate_scale. Setting it to 0.1 or 0.01 is recommended (see the configuration sketch after this attribute list).
- narrow_range = False
Narrow the lower bound (e.g., for int8, -128 -> -127)
- niter_to_recording = 0
Step at which to start recording
- niter_to_training = -1
Step at which to start QAT. The number of steps between recording and training should be greater than the number of steps in one training epoch.
- pow2 = 'ROUND'
Member of nnabla.utils.qnn.QATConfig.RoundingMethod. Round the scale to a power of 2. Enable this if you want to deploy the model with TensorRT.
- record_layers = []
List of nnabla function names to record. If empty, recorders are added to all layers; otherwise, recorders are added only to the functions listed in record_layers.
- recorder_activation
One of nnabla.utils.qnn.MinMaxMinMaxRecorderCallback, nnabla.utils.qnn.AbsMaxRecorderCallback, nnabla.utils.qnn.MinMaxMvaRecorderCallback, nnabla.utils.qnn.MaxMaxRecorderCallback, nnabla.utils.qnn.MaxMvaRecorderCallback. Recorder of activations. Alias of MaxMvaRecorderCallback.
- recorder_position = 0
Member of nnabla.utils.qnn.QATConfig.RecorderPosition. Recorder position.
- recorder_weight
One of nnabla.utils.qnn.MinMaxMinMaxRecorderCallback, nnabla.utils.qnn.AbsMaxRecorderCallback, nnabla.utils.qnn.MinMaxMvaRecorderCallback, nnabla.utils.qnn.MaxMaxRecorderCallback, nnabla.utils.qnn.MaxMvaRecorderCallback. Recorder of weights. Alias of MinMaxMinMaxRecorderCallback.
- round_mode = 'HALF_TO_EVEN'
Rounding mode of the quantize layer
- skip_bias = False
Skip quantizing the bias of Affine and the bias of the Convolution function family
- skip_inputs_layers = ['Convolution', 'Deconvolution']
List of nnabla function names. Skip quantizing the input layers of the network
- skip_outputs_layers = ['Affine']
List of nnabla function names. Skip quantizing the output layers of the network
- zero_point = False
Use a zero-point (asymmetric quantization) or not (symmetric quantization)
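The attributes above are plain defaults that can be overridden on a QATConfig instance before it is passed to QATScheduler. The following is a minimal sketch; the particular recorder choices, step counts, and base learning rate are illustrative assumptions, not recommendations:

```python
import nnabla.solvers as S
from nnabla.utils.qnn import (QATConfig, MinMaxMvaRecorderCallback,
                              AbsMaxRecorderCallback)

config = QATConfig()
config.ext_name = 'cudnn'                       # run on 'cpu', 'cuda' or 'cudnn'
config.bn_folding = True                        # fold BN into the preceding layer
config.pow2 = QATConfig.RoundingMethod.ROUND    # power-of-2 scales (needed for TensorRT)
config.recorder_activation = MinMaxMvaRecorderCallback  # assumed recorder choice
config.recorder_weight = AbsMaxRecorderCallback         # assumed recorder choice
config.recorder_position = QATConfig.RecorderPosition.BOTH
config.niter_to_recording = 0                   # start recording at step 0
config.niter_to_training = 1000                 # assumed to exceed the steps in one epoch

# QAT learning rate = non-QNN learning rate * learning_rate_scale
base_lr = 1e-3                                  # assumed learning rate of the float baseline
solver = S.Adam(alpha=base_lr * config.learning_rate_scale)
```

The last two lines apply the learning_rate_scale rule from above to an assumed float-model learning rate; the resulting solver is the one handed to QATScheduler.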
QATTensorRTConfig
The default quantization-aware-training configuration that meets the requirements of TensorRT; a usage sketch follows the attribute list below.
- class nnabla.utils.qnn.QATTensorRTConfig[source]
- bn_folding = True
Enable Batch Normalization folding. Note that this can sometimes cause the training to become unstable.
- bn_self_folding = True
Enable Batch Normalization self-folding. Note that this can sometimes cause the training to become unstable.
- pow2 = 'ROUND'
Member of nnabla.utils.qnn.QATConfig.RoundingMethod. Round the scale to a power of 2. Enable this if you want to deploy the model with TensorRT.
- record_layers = ['Convolution', 'Deconvolution', 'Affine', 'BatchMatmul', 'ReLU']
List of nnabla function names to record. If empty, recorders are added to all layers; otherwise, recorders are added only to the functions listed in record_layers.
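Since QATTensorRTConfig only overrides the defaults listed above, it can be used as a drop-in configuration. A brief sketch, assuming the remaining attributes (including learning_rate_scale) carry over unchanged from QATConfig and that the base learning rate is illustrative:

```python
import nnabla.solvers as S
from nnabla.utils.qnn import QATScheduler, QATTensorRTConfig

config = QATTensorRTConfig()   # BN folding on, power-of-2 scales, TensorRT record_layers
solver = S.Sgd(lr=1e-3 * config.learning_rate_scale)  # scale the float-model learning rate

qat_scheduler = QATScheduler(config=config, solver=solver)
```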
QATScheduler
- class nnabla.utils.qnn.QATScheduler(config=<nnabla.utils.qnn.QATTensorRTConfig object>, solver=None)[source]
Bases: object
Scheduler for quantization aware training.
- Parameters:
config (QATConfig) – Quantization-Aware-Training configuration
solver (nnabla.solver.Solver) – Neural network solver
Example
```python
from nnabla.utils.qnn import QATScheduler, QATConfig, PrecisionMode

# Set configuration
config = QATConfig()
config.bn_folding = True
config.bn_self_folding = True
config.channel_last = False
config.precision_mode = PrecisionMode.SIM_QNN
config.niter_to_recording = 1
config.niter_to_training = 500

qat_scheduler = QATScheduler(config=config, solver=solver)

# Convert graphs to enable quantization aware training.
qat_scheduler(pred)                   # pred is the output variable of the training network
qat_scheduler(vpred, training=False)  # vpred is the output variable of the evaluation network

# Training loop
for i in range(training_step):
    qat_scheduler.step()
    # Your training code here

# Save the quantized nnp
qat_scheduler.save('qnn.np', vimage, deploy=False)  # vimage is the input variable of the network
```
- save(fname, inputs, batch_size=1, net_name='net', deploy=False)[source]
Save the QAT network model, by default to an NNP file (a usage sketch follows the parameter list).
- Parameters:
fname (str) – NNP file name.
inputs (nnabla.Variable or list of nnabla.Variable) – Network input variables.
batch_size (int) – Batch size.
net_name (str) – Network name.
deploy (bool) – Whether to apply QNN deployment conversion. deploy=True is not supported yet.
- Returns:
None
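A short usage sketch; the variable names are illustrative, with qat_scheduler being the scheduler from the example above and image standing in for the input Variable of the network:

```python
# Save after QAT training; deploy must stay False since deploy=True
# is not supported yet.
qat_scheduler.save('qat_model.nnp', image, batch_size=1,
                   net_name='net', deploy=False)

# Multiple network inputs can be passed as a list, e.g.:
# qat_scheduler.save('qat_model.nnp', [image, label], deploy=False)
```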