Graph Converter for Inference¶
In this tutorial, we demonstrate several graph converters that are mainly used for inference. Graph converters operate on a trained graph (a trained neural network), so you can apply them once training has finished.
We show how to use the following graph converters step by step, according to use case.
- BatchNormalizationLinearConverter
- BatchNormalizationFoldedConverter
- FixedPointWeightConverter
- FixedPointActivationConverter
Before starting, import the required Python modules.
# Import
import numpy as np
import nnabla as nn
import nnabla.functions as F
import nnabla.parametric_functions as PF
import nnabla.experimental.viewers as V
import nnabla.experimental.graph_converters as GC
Also, define LeNet as the running example.
# LeNet
def LeNet(image, test=False):
    h = PF.convolution(image, 16, (5, 5), (1, 1), with_bias=False, name='conv1')
    h = PF.batch_normalization(h, batch_stat=not test, name='conv1-bn')
    h = F.max_pooling(h, (2, 2))
    h = F.relu(h)
    h = PF.convolution(h, 16, (5, 5), (1, 1), with_bias=True, name='conv2')
    h = PF.batch_normalization(h, batch_stat=not test, name='conv2-bn')
    h = F.max_pooling(h, (2, 2))
    h = F.relu(h)
    h = PF.affine(h, 10, with_bias=False, name='fc1')
    h = PF.batch_normalization(h, batch_stat=not test, name='fc1-bn')
    h = F.relu(h)
    pred = PF.affine(h, 10, with_bias=True, name='fc2')
    return pred
BatchNormalizationLinearConverter¶
Typical networks contain batch normalization layers. In training, a batch normalization layer normalizes its inputs using the batch stats (the batch mean and variance) as
\[
z = \gamma \frac{x - \mu}{\sqrt{\sigma^2 + \epsilon}} + \beta,
\]
where \(\mu\) and \(\sigma^2\) are the batch mean and variance, \(\gamma\) and \(\beta\) are the scale and bias parameters to be learnt, and \(\epsilon\) is a small constant for numerical stability.
At the same time, it computes the running stats (the exponential moving average \(\mu_r\) and variance \(\sigma_r^2\) of inputs to the batch normalization layer), which are used later for inference.
If nothing changes, at inference time the batch normalization is performed as in the above equation, but using the running stats:
\[
z = \gamma \frac{x - \mu_r}{\sqrt{\sigma_r^2 + \epsilon}} + \beta.
\]
This is the explicit normalization, so as you can see, it involves many redundant computations (subtraction, division, squaring, square root, multiplication, addition) that should be avoided in an inference graph. We could rewrite the graph by hand, but that is troublesome.
BatchNormalizationLinearConverter automatically converts this batch normalization equation to the simple linear form
\[
z = c_0 x + c_1, \quad c_0 = \frac{\gamma}{\sqrt{\sigma_r^2 + \epsilon}}, \quad c_1 = \beta - \frac{\gamma \mu_r}{\sqrt{\sigma_r^2 + \epsilon}}.
\]
After the conversion, only one multiplication and one addition remain, since \(c_0\) and \(c_1\) can be precomputed before inference.
Specifically, suppose that \(x\) is the output of a 2D convolution, so \(x\) is a 3D tensor (e.g., \(N \times H \times W\)). In the batch normalization, there are \(N\) coefficients each for \(c_0\) and \(c_1\), one per feature map. Thus, the multiplication (\(c_0 \times x\)) costs \(N \times H \times W\) operations, and the addition (\(+ c_1\)) costs the same \(N \times H \times W\). This is a large reduction compared to the naive implementation.
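The equivalence between the explicit normalization and the linear form can be checked numerically. The following is a minimal NumPy sketch, not the converter's implementation; the tensor shape, the random stand-ins for trained parameters, and the value of `eps` are assumptions made for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Assumed shape: N = 16 feature maps of size H x W = 12 x 12.
x = rng.standard_normal((16, 12, 12))

# Per-map parameters and running stats (random stand-ins for trained values).
gamma = rng.standard_normal((16, 1, 1))
beta = rng.standard_normal((16, 1, 1))
mean_r = rng.standard_normal((16, 1, 1))
var_r = rng.random((16, 1, 1)) + 0.5
eps = 1e-5  # assumed small constant for numerical stability

# Explicit batch normalization at inference time.
z_explicit = gamma * (x - mean_r) / np.sqrt(var_r + eps) + beta

# Linear form: c0 and c1 are computed once, before inference.
c0 = gamma / np.sqrt(var_r + eps)
c1 = beta - c0 * mean_r
z_linear = c0 * x + c1

print(np.allclose(z_explicit, z_linear))
```

Only `c0 * x + c1` has to be evaluated per input; everything else depends on the parameters alone.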
Example¶
First, create LeNet.
x = nn.Variable.from_numpy_array(np.random.rand(4, 3, 28, 28))
y = LeNet(x, test=True)
Now look at LeNet visually.
viewer = V.SimpleGraph()
viewer.view(y)
Convert it to the one with the batch normalization linearly folded.
converter = GC.BatchNormalizationLinearConverter(name="bn-linear-lenet")
y = converter.convert(y, [x])
Also, show the converted graph.
viewer = V.SimpleGraph()
viewer.view(y)
BatchNormalizationFoldedConverter¶
As seen with the previous converter, BatchNormalizationLinearConverter reduces the batch normalization layer to a linear form in inference. However, if the layer preceding the batch normalization is a convolution, affine, or another layer performing an inner product, the linear form can be further folded into the weights of the preceding layer.
Consider a convolution followed by a batch normalization in inference. It can be written as
\[
z = c_0 (w \ast x + b) + c_1,
\]
where \(\ast\) is the convolution operator, \(w\) is the convolution weights, and \(b\) is the bias of the convolution layer. Since \(\ast\) is linear, we can further fold \(c_0\) into the weights \(w\), and \(c_0\) and \(c_1\) into the bias \(b\), which gives the simpler form
\[
z = w' \ast x + b', \quad w' = c_0 w, \quad b' = c_0 b + c_1.
\]
BatchNormalizationFoldedConverter automatically finds a sequence of a convolution and a batch normalization in a given graph, then folds all parameters related to the batch normalization into the preceding convolution layer. Now we no longer need the multiplication and addition that remained with the previous converter, BatchNormalizationLinearConverter.
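The folding step can be sketched numerically as follows. This is an illustration under assumptions, not the converter's code: it uses a 1x1 convolution written with `np.einsum` to stay short (the same linearity argument applies to larger kernels), and the helper `conv1x1`, the shapes, and the random values are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

def conv1x1(x, w, b):
    """1x1 convolution: x is (C_in, H, W), w is (C_out, C_in), b is (C_out,)."""
    return np.einsum("oi,ihw->ohw", w, x) + b[:, None, None]

x = rng.standard_normal((3, 8, 8))
w = rng.standard_normal((16, 3))
b = rng.standard_normal(16)

# Per-output-map coefficients of the linear batch normalization (random stand-ins).
c0 = rng.standard_normal(16)
c1 = rng.standard_normal(16)

# Convolution followed by the linear batch normalization.
z_unfolded = c0[:, None, None] * conv1x1(x, w, b) + c1[:, None, None]

# Fold c0 into the weights, and c0 and c1 into the bias.
w_folded = c0[:, None] * w
b_folded = c0 * b + c1
z_folded = conv1x1(x, w_folded, b_folded)

print(np.allclose(z_unfolded, z_folded))
```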
Example¶
First, create LeNet.
x = nn.Variable.from_numpy_array(np.random.rand(4, 3, 28, 28))
y = LeNet(x, test=True)
Now look at LeNet visually.
viewer = V.SimpleGraph()
viewer.view(y)
Convert it to the one with the batch normalization folded into the preceding layers.
converter = GC.BatchNormalizationFoldedConverter(name="bn-folded-lenet")
y = converter.convert(y, [x])
Also, show the converted graph.
viewer = V.SimpleGraph()
viewer.view(y)
FixedPointWeightConverter¶
Once training finishes, where will you deploy the model? The deployment target might be the cloud or an embedded device. In either case, the typical data type, 32-bit floating point (FP32), might be redundant for inference, so you may want to use the SIMD operations of your target device with, e.g., 4-bit or 8-bit data. Training is usually performed in FP32, while inference might be performed in fixed point. Hence, you have to change the corresponding layers, e.g., the convolution and affine layers.
FixedPointWeightConverter automatically converts the affine, convolution, and deconvolution layers of a given graph to their fixed-point versions.
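To make "fixed point" concrete, here is a minimal NumPy sketch of uniform fixed-point quantization of weights. The helper `quantize_fixed_point` and its defaults (8 bits, step size 2^-4, symmetric signed range) are assumptions for illustration; the rounding and range conventions used by the converter itself may differ.

```python
import numpy as np

def quantize_fixed_point(x, n_bits=8, step=2.0 ** -4, signed=True):
    """Round x to the nearest multiple of `step` and clip to the n-bit range."""
    if signed:
        q_max = 2 ** (n_bits - 1) - 1
        q_min = -q_max
    else:
        q_max = 2 ** n_bits - 1
        q_min = 0
    codes = np.clip(np.round(x / step), q_min, q_max)  # integer-valued codes
    return codes * step

w = np.array([-10.0, -0.26, 0.0, 0.03, 0.1, 10.0])
w_q = quantize_fixed_point(w)
print(w_q)
```

Values outside the representable range saturate (here at plus or minus 127 steps), and values inside it snap to the nearest multiple of the step size.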
Example¶
First, create LeNet.
x = nn.Variable.from_numpy_array(np.random.rand(4, 3, 28, 28))
y = LeNet(x, test=True)
Now look at LeNet visually.
viewer = V.SimpleGraph()
viewer.view(y)
Convert it to the one with fixed-point weights.
converter = GC.FixedPointWeightConverter(name="fixed-point-weight-lenet")
y = converter.convert(y, [x])
Also, show the converted graph.
viewer = V.SimpleGraph()
viewer.view(y)
FixedPointActivationConverter¶
FixedPointWeightConverter converts layers that have weights, whereas FixedPointActivationConverter automatically converts activation layers, e.g., ReLU. A typical neural network architecture contains a sequence of blocks of the form ReLU -> Convolution -> BatchNormalization; therefore, when you convert both ReLU and Convolution to their fixed-point versions with proper hyperparameters (step size and bit width), you can utilize the SIMD operations of your target device, because both the weights and the inputs of the convolution are fixed-point.
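The benefit of having both operands in fixed point can be sketched with an inner product: it can be accumulated entirely in integers and rescaled once at the end. The helper `to_codes`, the step sizes, and the bit widths below are hypothetical choices for illustration and are not taken from the converters.

```python
import numpy as np

def to_codes(x, n_bits, step, signed):
    """Return integer codes of a uniform fixed-point quantizer (illustrative helper)."""
    q_max = 2 ** (n_bits - 1) - 1 if signed else 2 ** n_bits - 1
    q_min = -q_max if signed else 0
    return np.clip(np.round(x / step), q_min, q_max).astype(np.int32)

rng = np.random.default_rng(0)
step_x, step_w = 2.0 ** -4, 2.0 ** -6  # assumed step sizes

# Activations after ReLU are non-negative, so an unsigned range is enough.
x = np.maximum(rng.standard_normal(32), 0.0)
w = rng.standard_normal(32) * 0.5

x_codes = to_codes(x, n_bits=8, step=step_x, signed=False)
w_codes = to_codes(w, n_bits=8, step=step_w, signed=True)

# Integer-only multiply-accumulate, rescaled once at the end.
acc = int(np.dot(x_codes, w_codes))
y_int = acc * step_x * step_w

# Reference: the same inner product on the dequantized floating-point values.
y_ref = float(np.dot(x_codes * step_x, w_codes * step_w))

print(np.isclose(y_int, y_ref))
```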
Example¶
First, create LeNet.
x = nn.Variable.from_numpy_array(np.random.rand(4, 3, 28, 28))
y = LeNet(x, test=True)
Now look at LeNet visually.
viewer = V.SimpleGraph()
viewer.view(y)
Convert it to the one with fixed-point activations.
converter = GC.FixedPointActivationConverter(name="fixed-point-activation-lenet")
y = converter.convert(y, [x])
Also, show the converted graph.
viewer = V.SimpleGraph()
viewer.view(y)
Typically, FixedPointWeightConverter and FixedPointActivationConverter are used together. For such purposes, you can use GC.SequentialConverter.
converter_w = GC.FixedPointWeightConverter(name="fixed-point-lenet")
converter_a = GC.FixedPointActivationConverter(name="fixed-point-lenet")
converter = GC.SequentialConverter([converter_w, converter_a])
y = converter.convert(y, [x])
Needless to say, GC.SequentialConverter is not limited to this case. Once you create your own converters, you can add them to GC.SequentialConverter if they are used together.
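The chaining idea itself is simple and can be sketched in plain Python. The classes below are hypothetical stand-ins that only illustrate the pattern of applying converters in order; they are not nnabla classes, and a Python list stands in for a graph.

```python
class SequentialSketch:
    """Hypothetical stand-in that applies converters in order; not nnabla's class."""

    def __init__(self, converters):
        self.converters = list(converters)

    def convert(self, graph):
        # Feed the output of each converter into the next one.
        for converter in self.converters:
            graph = converter.convert(graph)
        return graph


class AppendTag:
    """Hypothetical toy converter: appends a tag to a list standing in for a graph."""

    def __init__(self, tag):
        self.tag = tag

    def convert(self, graph):
        return graph + [self.tag]


pipeline = SequentialSketch([AppendTag("weights-quantized"),
                             AppendTag("activations-quantized")])
result = pipeline.convert(["lenet"])
print(result)
```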
Look at the converted graph visually.
viewer = V.SimpleGraph()
viewer.view(y)