Quantization Aware Training
QATConfig
Configuration for quantization aware training.
- class nnabla.utils.qnn.QATConfig[source]
Bases: object
- class RecorderPosition(value)[source]
Bases: Enum
Position at which to add a recorder for a function.
- BEFORE = 0
Add a recorder only before a function
- BOTH = 1
Add recorders before and after a function
- class RoundingMethod(value)[source]
Bases: Enum
Rounding method for the scale.
- CEIL = 'CEIL'
Round up, e.g. ceil(9.4) = 10
- FLOOR = 'FLOOR'
Round down, e.g. floor(9.5) = 9
- NOTROUND = 'NOTROUND'
Do not round
- ROUND = 'ROUND'
Round to the nearest integer, e.g. round(9.4) = 9, round(9.5) = 10
- bn_folding = False
Enable Batch Normalization folding. Note that this can sometimes cause the training to become unstable.
- bn_self_folding = False
Enable Batch Normalization self-folding. Note that this can sometimes cause the training to become unstable.
- channel_last = False
Enable channel-last layout (currently only channel_first is supported)
- channel_wise = False
Enable channel-wise quantization
- dtype
Precision. Alias of int8
- ext_name = 'cudnn'
Extension context: 'cpu', 'cuda', or 'cudnn'
- learning_rate_scale = 0.1
QAT learning rate = non-QNN learning rate * learning_rate_scale. Setting it to 0.1 or 0.01 is recommended (see the configuration sketch after this attribute list).
- narrow_range = False
Narrow the lower bound (e.g., for int8, -128 -> -127)
- niter_to_recording = 0
Step at which to start recording
- niter_to_training = -1
Step at which to start QAT. The number of steps between recording and training should be greater than the number of steps in one training epoch.
- pow2 = 'ROUND'
Member of nnabla.utils.qnn.QATConfig.RoundingMethod. Round the scale to a power of 2. Enable this if you want to deploy the model with TensorRT.
- record_layers = []
List of nnabla function names to record. If empty, recorders are added to all layers; otherwise, recorders are added only to the functions listed in record_layers.
- recorder_activation
One of nnabla.utils.qnn.MinMaxMinMaxRecorderCallback, nnabla.utils.qnn.AbsMaxRecorderCallback, nnabla.utils.qnn.MinMaxMvaRecorderCallback, nnabla.utils.qnn.MaxMaxRecorderCallback, nnabla.utils.qnn.MaxMvaRecorderCallback. Recorder of activations. Alias of MaxMvaRecorderCallback.
- recorder_position = 0
Member of nnabla.utils.qnn.QATConfig.RecorderPosition. Recorder position.
- recorder_weight
One of nnabla.utils.qnn.MinMaxMinMaxRecorderCallback, nnabla.utils.qnn.AbsMaxRecorderCallback, nnabla.utils.qnn.MinMaxMvaRecorderCallback, nnabla.utils.qnn.MaxMaxRecorderCallback, nnabla.utils.qnn.MaxMvaRecorderCallback. Recorder of weights. Alias of MinMaxMinMaxRecorderCallback.
- round_mode = 'HALF_TO_EVEN'
Rounding mode of the quantize layer
- skip_bias = False
Skip quantizing the bias of Affine and the bias of the Convolution function family
- skip_inputs_layers = ['Convolution', 'Deconvolution']
List of nnabla function names. Skip quantizing the input layers of the network
- skip_outputs_layers = ['Affine']
List of nnabla function names. Skip quantizing the output layers of the network
- zero_point = False
Use a zero-point (asymmetric quantization) or not (symmetric quantization)
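The attributes above are plain defaults that can be overridden on a QATConfig instance before it is passed to QATScheduler. The following is a minimal sketch; the particular recorder choices, step counts, and base learning rate are illustrative assumptions, not recommendations:

```python
import nnabla.solvers as S
from nnabla.utils.qnn import (QATConfig, MinMaxMvaRecorderCallback,
                              AbsMaxRecorderCallback)

config = QATConfig()
config.ext_name = 'cudnn'                       # run on 'cpu', 'cuda' or 'cudnn'
config.bn_folding = True                        # fold BN into the preceding layer
config.pow2 = QATConfig.RoundingMethod.ROUND    # power-of-2 scales (needed for TensorRT)
config.recorder_activation = MinMaxMvaRecorderCallback  # assumed recorder choice
config.recorder_weight = AbsMaxRecorderCallback         # assumed recorder choice
config.recorder_position = QATConfig.RecorderPosition.BOTH
config.niter_to_recording = 0                   # start recording at step 0
config.niter_to_training = 1000                 # assumed to exceed the steps in one epoch

# QAT learning rate = non-QNN learning rate * learning_rate_scale
base_lr = 1e-3                                  # assumed learning rate of the float baseline
solver = S.Adam(alpha=base_lr * config.learning_rate_scale)
```

The last two lines apply the learning_rate_scale rule from above to an assumed float-model learning rate; the resulting solver is the one handed to QATScheduler.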
QATTensorRTConfig
The default quantization-aware-training configuration that meets the requirements of TensorRT; a usage sketch follows the attribute list below.
- class nnabla.utils.qnn.QATTensorRTConfig[source]
- bn_folding = True
Enable Batch Normalization folding. Note that this can sometimes cause the training to become unstable.
- bn_self_folding = True
Enable Batch Normalization self-folding. Note that this can sometimes cause the training to become unstable.
- pow2 = 'ROUND'
Member of nnabla.utils.qnn.QATConfig.RoundingMethod. Round the scale to a power of 2. Enable this if you want to deploy the model with TensorRT.
- record_layers = ['Convolution', 'Deconvolution', 'Affine', 'BatchMatmul', 'ReLU']
List of nnabla function names to record. If empty, recorders are added to all layers; otherwise, recorders are added only to the functions listed in record_layers.
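Since QATTensorRTConfig only overrides the defaults listed above, it can be used as a drop-in configuration. A brief sketch, assuming the remaining attributes (including learning_rate_scale) carry over unchanged from QATConfig and that the base learning rate is illustrative:

```python
import nnabla.solvers as S
from nnabla.utils.qnn import QATScheduler, QATTensorRTConfig

config = QATTensorRTConfig()   # BN folding on, power-of-2 scales, TensorRT record_layers
solver = S.Sgd(lr=1e-3 * config.learning_rate_scale)  # scale the float-model learning rate

qat_scheduler = QATScheduler(config=config, solver=solver)
```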
QATScheduler
- class nnabla.utils.qnn.QATScheduler(config=<nnabla.utils.qnn.QATTensorRTConfig object>, solver=None)[source]
Bases: object
Scheduler for quantization aware training.
- Parameters:
config (QATConfig) – Quantization-Aware-Training configuration
solver (nnabla.solver.Solver) – Neural network solver
Example
```python
from nnabla.utils.qnn import QATScheduler, QATConfig, PrecisionMode

# Set configuration
config = QATConfig()
config.bn_folding = True
config.bn_self_folding = True
config.channel_last = False
config.precision_mode = PrecisionMode.SIM_QNN
config.niter_to_recording = 1
config.niter_to_training = 500

qat_scheduler = QATScheduler(config=config, solver=solver)

# Convert graphs to enable quantization aware training.
qat_scheduler(pred)                   # pred is the output variable of the training network
qat_scheduler(vpred, training=False)  # vpred is the output variable of the evaluation network

# Training loop
for i in range(training_step):
    qat_scheduler.step()
    # Your training code here

# Save the quantized nnp
qat_scheduler.save('qnn.np', vimage, deploy=False)  # vimage is the input variable of the network
```
- save(fname, inputs, batch_size=1, net_name='net', deploy=False)[source]
Save the QAT network model, by default to an NNP file (a usage sketch follows the parameter list).
- Parameters:
fname (str) – NNP file name.
inputs (nnabla.Variable or list of nnabla.Variable) – Network input variables.
batch_size (int) – Batch size.
net_name (str) – Network name.
deploy (bool) – Whether to apply QNN deployment conversion. deploy=True is not supported yet.
- Returns:
None
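A short usage sketch; the variable names are illustrative, with qat_scheduler being the scheduler from the example above and image standing in for the input Variable of the network:

```python
# Save after QAT training; deploy must stay False since deploy=True
# is not supported yet.
qat_scheduler.save('qat_model.nnp', image, batch_size=1,
                   net_name='net', deploy=False)

# Multiple network inputs can be passed as a list, e.g.:
# qat_scheduler.save('qat_model.nnp', [image, label], deploy=False)
```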