Parametric Functions
In NNabla, trainable models are created by composing functions that have optimizable parameters.
These functions are called parametric functions.
Parametric functions are provided by nnabla.parametric_functions.
See also: Python API Tutorial.
Parameter Management API
The parameters registered by the functions in List of Parametric Functions can be managed using the APIs listed in this section.

nnabla.parameter.parameter_scope(*args, **kwds)
Groups parameters registered by the parametric functions listed in nnabla.parametric_functions.
Parameters:
 name (str) – Parameter scope name.
 scope (OrderedDict, optional) – Specify the current parameter scope as a local dictionary. The default value is None, in which case the globally maintained current parameter scope is used.
Example:
    import nnabla as nn
    import nnabla.parametric_functions as PF
    import nnabla.functions as F

    # x is an input Variable; the shape here is only an illustrative assumption.
    x = nn.Variable((8, 3, 32, 32))

    with nn.parameter_scope('conv1'):
        conv_out1 = PF.convolution(x, 32, (5, 5))
        bn_out1 = PF.batch_normalization(conv_out1)
        act_out1 = F.relu(bn_out1)
    with nn.parameter_scope('conv2'):
        conv_out2 = PF.convolution(act_out1, 64, (3, 3))
        bn_out2 = PF.batch_normalization(conv_out2)
        act_out2 = F.relu(bn_out2)
Nesting with blocks lets you nest parameter scopes. The same can be done by using "/" inside the scope names.
Example:
    with nn.parameter_scope('network1'):
        with nn.parameter_scope('conv1'):
            conv_out1 = PF.convolution(x, 32, (5, 5))
            bn_out1 = PF.batch_normalization(conv_out1)
            act_out1 = F.relu(bn_out1)
        with nn.parameter_scope('conv2'):
            conv_out2 = PF.convolution(act_out1, 64, (3, 3))
            bn_out2 = PF.batch_normalization(conv_out2)
            act_out2 = F.relu(bn_out2)
is equivalent to
    with nn.parameter_scope('network1/conv1'):
        conv_out1 = PF.convolution(x, 32, (5, 5))
        bn_out1 = PF.batch_normalization(conv_out1)
        act_out1 = F.relu(bn_out1)
    with nn.parameter_scope('network1/conv2'):
        conv_out2 = PF.convolution(act_out1, 64, (3, 3))
        bn_out2 = PF.batch_normalization(conv_out2)
        act_out2 = F.relu(bn_out2)

nnabla.parameter.get_parameters(params=None, path='', grad_only=True)
Get parameter Variables under the current parameter scope.

nnabla.parameter.save_parameters(path, params=None)
Save all parameters into a file with the specified format.
Currently hdf5 and protobuf formats are supported.

nnabla.parameter.load_parameters(path)
Load parameters from a file with the specified format.
Parameters: path – path or file object

nnabla.parameter.get_parameter_or_create(name, shape=None, initializer=None, need_grad=True, as_need_grad=None)
Returns an existing parameter variable with the provided name. If a variable with the provided name does not exist, a new variable with the provided name is created and returned.
Parameters:
 name (str) – The name under the current scope. If it already exists, the parameter is queried from the parameter manager.
 shape (tuple of int) – Shape of the created parameter. The shape of an existing parameter must match this shape. The default is None, which is only valid if initializer is given as a numpy.ndarray.
 initializer (nnabla.initializer.BaseInitializer or numpy.ndarray) – An initialization function to be applied to the parameter. A numpy.ndarray can also be given to initialize parameters from numpy array data.
 need_grad (bool) – Register the parameter with the specified need_grad flag. The default is True. If the flag differs from the previously specified one, the flag is overwritten, but the values are kept.
 as_need_grad (bool) – Get a parameter variable with the specified need_grad flag. Note that this does not overwrite the flag of the registered parameter variable with the provided name. Instead, if the given flag does not match the previously registered need_grad flag, a new variable is returned that refers to the same array contents but has need_grad=as_need_grad.
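The lookup rules described above (create on first use, check the shape on later use, overwrite need_grad, and return a separate view for as_need_grad) can be sketched without NNabla. The `Param` class, the `_params` dictionary, and the zero-argument initializer are hypothetical simplifications, not the library's implementation.

```python
# Hypothetical registry illustrating the lookup rules; not NNabla's implementation.
class Param:
    def __init__(self, shape, data, need_grad):
        self.shape = tuple(shape)
        self.data = data            # shared storage (a plain list here)
        self.need_grad = need_grad


_params = {}


def get_parameter_or_create(name, shape=None, initializer=None,
                            need_grad=True, as_need_grad=None):
    param = _params.get(name)
    if param is None:
        # Not registered yet: create it from the initializer.
        if shape is None:
            raise ValueError("shape is required in this simplified sketch")
        size = 1
        for dim in shape:
            size *= dim
        data = [initializer() if initializer else 0.0 for _ in range(size)]
        param = Param(shape, data, need_grad)
        _params[name] = param
    else:
        # Already registered: the requested shape must match.
        if shape is not None and tuple(shape) != param.shape:
            raise ValueError("shape mismatch for %r" % name)
        # A differing need_grad overwrites the flag but keeps the values.
        param.need_grad = need_grad
    if as_need_grad is not None and as_need_grad != param.need_grad:
        # Return a new object sharing the data but carrying another flag.
        return Param(param.shape, param.data, as_need_grad)
    return param


w = get_parameter_or_create('W', (2, 3), initializer=lambda: 1.0)
same = get_parameter_or_create('W', (2, 3))
frozen = get_parameter_or_create('W', (2, 3), as_need_grad=False)
print(same is w, frozen is w, frozen.data is w.data)
```

The second call returns the registered object, while the third returns a different object that shares the same storage and leaves the registered flag untouched.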
List of Parametric Functions
Parametric functions are provided by nnabla.parametric_functions, as listed below.
Like the functions listed in Functions, they take Variable(s) as the first argument(s), followed by options specific to the parametric function. In addition, they register parameter Variable(s) into the parameter scope.
The parameter variables are registered with need_grad properties specific to each parametric function. Variables with need_grad=False are not updated by gradient descent, so backward computation is not executed for them. False is usually specified when the parameters are updated during the forward pass and/or backward pass, e.g., batch normalization.
All parametric functions take an optional argument fix_parameters=False. When True is given, the associated parameter variables are connected to the computation graph with need_grad=False regardless of the properties of the registered variables, so backward gradient computation is not executed for those variables. This is useful, for example, when you create a computation graph for evaluation or fix some of the parameters in a graph.
All parametric functions listed below are decorated with the following decorator.

nnabla.parametric_functions.parametric_function_api(scope_name=None, param_desc=None)
Decorator for parametric functions.
The decorated function is always called under a parameter scope scope_name. Also, the decorator adds an additional argument name (str, default is None) at the end. If name is specified, the scope scope_name comes under a scope name. This feature can reduce the vertical space used by the source code. Any parametric function should be decorated with this.
Parameters:
 scope_name (str, optional) – The original function will be called under a parameter scope named by scope_name.
 param_desc (list, optional) – Descriptions of parameters that are automatically included in the docstring. This must be a list of tuples with 4 elements composed of (name (str), description (str), shape info (str), need_grad (bool)).
Returns: A decorated parametric function.
Return type: function
See Parameter Management API to know how to query and manipulate registered variables.
Here is the list of parametric functions.

nnabla.parametric_functions.affine(inp, n_outmaps, base_axis=1, w_init=None, b_init=None, fix_parameters=False, rng=None, with_bias=True, apply_w=None, apply_b=None, name=None)
The affine layer, also known as the fully connected layer. Computes
\[{\mathbf y} = {\mathbf A} {\mathbf x} + {\mathbf b}.\]
where \({\mathbf x}, {\mathbf y}\) are the inputs and outputs respectively, and \({\mathbf A}, {\mathbf b}\) are constants.
Parameters:
 inp (Variable) – Input N-D array with shape (\(M_0 \times \ldots \times M_{B-1} \times D_B \times \ldots \times D_N\)). Dimensions before and after base_axis are flattened as if the input were a matrix.
 n_outmaps (int or tuple of int) – Number of output neurons per data.
 base_axis (int) – Dimensions up to base_axis are treated as the sample dimensions.
 w_init (nnabla.initializer.BaseInitializer or numpy.ndarray) – Initializer for weight. By default, it is initialized with nnabla.initializer.UniformInitializer within the range determined by nnabla.initializer.calc_uniform_lim_glorot.
 b_init (nnabla.initializer.BaseInitializer or numpy.ndarray) – Initializer for bias. By default, it is initialized with zeros if with_bias is True.
 fix_parameters (bool) – When set to True, the weights and biases will not be updated.
 rng (numpy.random.RandomState) – Random generator for Initializer.
 with_bias (bool) – Specify whether to include the bias term.
 apply_w (function) – Lambda, function, or callable object applied to the weights.
 apply_b (function) – Lambda, function, or callable object applied to the bias.
Returns: \((B + 1)\)-D array. (\(M_0 \times \ldots \times M_{B-1} \times L\))
Parameters to be registered
The following variables are registered in a parameter scope "affine":
 W (need_grad=True) : Weight matrix. (shape: (inmaps, outmaps))
 b (need_grad=True) : Bias vector. (shape: (outputs,))
Note
If the name option is passed, the parameters are wrapped inside the parameter scope with the specified name, yielding the same results as the following code. This can be used to simplify the code.
    with parameter_scope(name):
        output = affine(<args>)
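The flattening around base_axis and the registered weight shape (inmaps, outmaps) can be made concrete with a NumPy sketch. The helper `affine_forward` is hypothetical and handles only an integer number of outputs; with the weight stored as (inmaps, outmaps), the product is written as x W rather than A x.

```python
import numpy as np


def affine_forward(x, W, b=None, base_axis=1):
    """Compute y = x W + b with all dimensions from base_axis on flattened."""
    batch_shape = x.shape[:base_axis]
    x2d = x.reshape(int(np.prod(batch_shape)), -1)  # (samples, inmaps)
    y = x2d @ W                                     # W: (inmaps, outmaps)
    if b is not None:
        y = y + b
    return y.reshape(batch_shape + (W.shape[1],))


rng = np.random.RandomState(0)
x = rng.randn(8, 3, 4, 4)        # M_0 = 8, remaining dims 3 x 4 x 4
W = rng.randn(3 * 4 * 4, 10)     # inmaps = 48, outmaps = 10
b = np.zeros(10)
y = affine_forward(x, W, b)
print(y.shape)                   # (8, 10)
```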

nnabla.parametric_functions.convolution(inp, outmaps, kernel, pad=None, stride=None, dilation=None, group=1, w_init=None, b_init=None, base_axis=1, fix_parameters=False, rng=None, with_bias=True, apply_w=None, apply_b=None, name=None)
N-D Convolution with a bias term.
For Dilated Convolution (a.k.a. Atrous Convolution), refer to:
 Chen et al., DeepLab: Semantic Image Segmentation with Deep Convolutional Nets, Atrous Convolution, and Fully Connected CRFs. https://arxiv.org/abs/1606.00915
 Yu et al., Multi-Scale Context Aggregation by Dilated Convolutions. https://arxiv.org/abs/1511.07122
Note
Convolution is a computationally intensive operation that should preferably be run with the cudnn backend. NNabla then uses CuDNN library functions to determine and cache the fastest algorithm for the given set of convolution parameters. This results in additional memory consumption, which may pose a problem for GPUs with insufficient memory. In that case, the NNABLA_CUDNN_WORKSPACE_LIMIT environment variable can be used to restrict the choice of algorithms to those that fit the given workspace memory limit, expressed in bytes. In some cases it may also be desirable to restrict the automatic search to algorithms that produce deterministic (reproducible) results. This can be requested by setting the environment variable NNABLA_CUDNN_DETERMINISTIC to a non-zero value.
Parameters:
 inp (Variable) – N-D array.
 outmaps (int) – Number of convolution kernels (which is equal to the number of output channels). For example, to apply convolution on an input with 16 types of filters, specify 16.
 kernel (tuple of int) – Convolution kernel size. For example, to apply convolution on an image with a 3 (height) by 5 (width) two-dimensional kernel, specify (3,5).
 pad (tuple of int) – Padding sizes for dimensions.
 stride (tuple of int) – Stride sizes for dimensions.
 dilation (tuple of int) – Dilation sizes for dimensions.
 group (int) – Number of groups of channels. This makes connections across channels sparser by grouping connections along the map direction.
 w_init (nnabla.initializer.BaseInitializer or numpy.ndarray) – Initializer for weight. By default, it is initialized with nnabla.initializer.UniformInitializer within the range determined by nnabla.initializer.calc_uniform_lim_glorot.
 b_init (nnabla.initializer.BaseInitializer or numpy.ndarray) – Initializer for bias. By default, it is initialized with zeros if with_bias is True.
 base_axis (int) – Dimensions up to base_axis are treated as the sample dimensions.
 fix_parameters (bool) – When set to True, the weights and biases will not be updated.
 rng (numpy.random.RandomState) – Random generator for Initializer.
 with_bias (bool) – Specify whether to include the bias term.
 apply_w (function) – Lambda, function, or callable object applied to the weights.
 apply_b (function) – Lambda, function, or callable object applied to the bias.
Returns: N-D array. See convolution for the output shape.
Parameters to be registered
The following variables are registered in a parameter scope "conv":
 W (need_grad=True) : Filter weights. (shape: (outmaps, inmaps // group, *kernel))
 b (need_grad=True) : Bias vector. (shape: (outmaps,))
Note
If the name option is passed, the parameters are wrapped inside the parameter scope with the specified name, yielding the same results as the following code. This can be used to simplify the code.
    with parameter_scope(name):
        output = convolution(<args>)
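How kernel, pad, stride, and dilation interact can be seen from the usual convolution size arithmetic. The sketch below uses the standard formula and assumes that a value of None means zero padding, unit stride, and unit dilation; the helper names are hypothetical. The weight shape follows the registered shape listed above.

```python
def conv_output_shape(in_spatial, kernel, pad=None, stride=None, dilation=None):
    """Spatial output size per dimension using the standard convolution formula."""
    n = len(kernel)
    pad = pad or (0,) * n            # assumed default: no padding
    stride = stride or (1,) * n      # assumed default: unit stride
    dilation = dilation or (1,) * n  # assumed default: no dilation
    return tuple(
        (i + 2 * p - d * (k - 1) - 1) // s + 1
        for i, k, p, s, d in zip(in_spatial, kernel, pad, stride, dilation)
    )


def conv_weight_shape(inmaps, outmaps, kernel, group=1):
    """Registered filter shape: (outmaps, inmaps // group, *kernel)."""
    return (outmaps, inmaps // group) + tuple(kernel)


print(conv_output_shape((32, 32), (5, 5)))                             # (28, 28)
print(conv_output_shape((32, 32), (3, 3), pad=(1, 1), stride=(2, 2)))  # (16, 16)
print(conv_weight_shape(16, 32, (3, 5), group=2))                      # (32, 8, 3, 5)
```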

nnabla.parametric_functions.depthwise_convolution(inp, kernel, pad=None, stride=None, dilation=None, multiplier=1, w_init=None, b_init=None, base_axis=1, fix_parameters=False, rng=None, with_bias=True, name=None)
N-D Depthwise Convolution with a bias term.
Reference:
 Chollet, Francois. Xception: Deep Learning with Depthwise Separable Convolutions. https://arxiv.org/abs/1610.02357
Parameters:
 inp (Variable) – N-D array.
 kernel (tuple of int) – Convolution kernel size. For example, to apply convolution on an image with a 3 (height) by 5 (width) two-dimensional kernel, specify (3,5).
 pad (tuple of int) – Padding sizes for dimensions.
 stride (tuple of int) – Stride sizes for dimensions.
 dilation (tuple of int) – Dilation sizes for dimensions.
 multiplier (int) – Number of output feature maps per input feature map.
 w_init (nnabla.initializer.BaseInitializer or numpy.ndarray) – Initializer for weight. By default, it is initialized with nnabla.initializer.UniformInitializer within the range determined by nnabla.initializer.calc_uniform_lim_glorot.
 b_init (nnabla.initializer.BaseInitializer or numpy.ndarray) – Initializer for bias. By default, it is initialized with zeros if with_bias is True.
 base_axis (int) – Dimensions up to base_axis are treated as the sample dimensions.
 fix_parameters (bool) – When set to True, the weights and biases will not be updated.
 rng (numpy.random.RandomState) – Random generator for Initializer.
 with_bias (bool) – Specify whether to include the bias term.
Returns: N-D array. See depthwise_convolution for the output shape.
Parameters to be registered
The following variables are registered in a parameter scope "depthwise_conv":
 W (need_grad=True) : Filter weights. (shape: (inmaps * multiplier, *kernel))
 b (need_grad=True) : Bias vector. (shape: (inmaps * multiplier,))
Note
If the name option is passed, the parameters are wrapped inside the parameter scope with the specified name, yielding the same results as the following code. This can be used to simplify the code.
    with parameter_scope(name):
        output = depthwise_convolution(<args>)

nnabla.parametric_functions.deconvolution(inp, outmaps, kernel, pad=None, stride=None, dilation=None, group=1, w_init=None, b_init=None, base_axis=1, fix_parameters=False, rng=None, with_bias=True, apply_w=None, apply_b=None, name=None)
Deconvolution layer.
Parameters:
 inp (Variable) – N-D array.
 outmaps (int) – Number of deconvolution kernels (which is equal to the number of output channels). For example, to apply deconvolution on an input with 16 types of filters, specify 16.
 kernel (tuple of int) – Convolution kernel size. For example, to apply deconvolution on an image with a 3 (height) by 5 (width) two-dimensional kernel, specify (3,5).
 pad (tuple of int) – Padding sizes for dimensions.
 stride (tuple of int) – Stride sizes for dimensions.
 dilation (tuple of int) – Dilation sizes for dimensions.
 group (int) – Number of groups of channels. This makes connections across channels sparser by grouping connections along the map direction.
 w_init (nnabla.initializer.BaseInitializer or numpy.ndarray) – Initializer for weight. By default, it is initialized with nnabla.initializer.UniformInitializer within the range determined by nnabla.initializer.calc_uniform_lim_glorot.
 b_init (nnabla.initializer.BaseInitializer or numpy.ndarray) – Initializer for bias. By default, it is initialized with zeros if with_bias is True.
 base_axis (int) – Dimensions up to base_axis are treated as the sample dimensions.
 fix_parameters (bool) – When set to True, the weights and biases will not be updated.
 rng (numpy.random.RandomState) – Random generator for Initializer.
 with_bias (bool) – Specify whether to include the bias term.
 apply_w (function) – Lambda, function, or callable object applied to the weights.
 apply_b (function) – Lambda, function, or callable object applied to the bias.
Returns: N-D array. See deconvolution for the output shape.
Parameters to be registered
The following variables are registered in a parameter scope "deconv":
 W (need_grad=True) : Filter weights. (shape: (inmaps, outmaps // group, *kernel))
 b (need_grad=True) : Bias vector. (shape: (outmaps,))
Note
If the name option is passed, the parameters are wrapped inside the parameter scope with the specified name, yielding the same results as the following code. This can be used to simplify the code.
    with parameter_scope(name):
        output = deconvolution(<args>)

nnabla.parametric_functions.depthwise_deconvolution(inp, kernel, pad=None, stride=None, dilation=None, divisor=1, w_init=None, b_init=None, base_axis=1, fix_parameters=False, rng=None, with_bias=True, name=None)
Depthwise deconvolution computes the transposed depthwise convolution for one-dimensional and two-dimensional input data.
Parameters:
 inp (Variable) – N-D array.
 kernel (tuple of int) – Convolution kernel size. For example, to apply convolution on an image with a 3 (height) by 5 (width) two-dimensional kernel, specify (3,5).
 pad (tuple of int) – Padding sizes for dimensions.
 stride (tuple of int) – Stride sizes for dimensions.
 dilation (tuple of int) – Dilation sizes for dimensions.
 divisor (int) – Number of input feature maps per output feature map.
 w_init (nnabla.initializer.BaseInitializer or numpy.ndarray) – Initializer for weight. By default, it is initialized with nnabla.initializer.UniformInitializer within the range determined by nnabla.initializer.calc_uniform_lim_glorot.
 b_init (nnabla.initializer.BaseInitializer or numpy.ndarray) – Initializer for bias. By default, it is initialized with zeros if with_bias is True.
 base_axis (int) – Dimensions up to base_axis are treated as the sample dimensions.
 fix_parameters (bool) – When set to True, the weights and biases will not be updated.
 rng (numpy.random.RandomState) – Random generator for Initializer.
 with_bias (bool) – Specify whether to include the bias term.
Returns: N-D array. See depthwise_deconvolution for the output shape.
Parameters to be registered
The following variables are registered in a parameter scope "depthwise_deconv":
 W (need_grad=True) : Filter weights. (shape: (inmaps,) + kernel)
 b (need_grad=True) : Bias vector. (shape: (inmaps / divisor,))
Note
If the name option is passed, the parameters are wrapped inside the parameter scope with the specified name, yielding the same results as the following code. This can be used to simplify the code.
    with parameter_scope(name):
        output = depthwise_deconvolution(<args>)

nnabla.parametric_functions.batch_normalization(inp, axes=[1], decay_rate=0.9, eps=1e-05, batch_stat=True, output_stat=False, fix_parameters=False, param_init=None, name=None)
Batch normalization layer.
\[\begin{split}\begin{array}{lcl} \mu &=& \frac{1}{M} \sum x_i\\ \sigma^2 &=& \frac{1}{M} \sum \left(x_i - \mu\right)^2\\ \hat{x}_i &=& \frac{x_i - \mu}{\sqrt{\sigma^2 + \epsilon }}\\ y_i &= & \hat{x}_i \gamma + \beta. \end{array}\end{split}\]
where \(x_i, y_i\) are the inputs and outputs respectively. In testing, the mean and variance computed by moving average during training are used.
Parameters:
 inp (Variable) – N-D array of input.
 axes (tuple of int) – Mean and variance for each element in axes are calculated using elements on the rest of the axes. For example, if an input has 4 dimensions and axes is [1], the batch mean is calculated as np.mean(inp.d, axis=(0, 2, 3), keepdims=True) (using a numpy expression as an example).
 decay_rate (float) – Decay rate of running mean and variance.
 eps (float) – Tiny value to avoid zero division by std.
 batch_stat (bool) – Use mini-batch statistics rather than running ones.
 output_stat (bool) – Output batch mean and variance.
 fix_parameters (bool) – When set to True, the beta and gamma will not be updated.
 param_init (dict) – Parameter initializers can be set with a dict. A key of the dict must be 'beta', 'gamma', 'mean' or 'var'. A value of the dict must be an Initializer or a numpy.ndarray. E.g. {'beta': ConstantInitializer(0), 'gamma': np.ones(gamma_shape) * 2}.
Returns: N-D array.
References
 Ioffe and Szegedy, Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift. https://arxiv.org/abs/1502.03167
Each parameter has the same number of dimensions as the input data. Along the axes listed in axes its size equals that of the input, and along all other axes its size is 1. If an input is 4-dim and axes=[1], the parameter shape will be param_shape = np.mean(inp.d, axis=(0, 2, 3), keepdims=True).shape (using a numpy expression as an example).
Parameters to be registered
The following variables are registered in a parameter scope "bn":
 beta (need_grad=True) : Trainable bias \(\beta\). (shape: <see above>)
 gamma (need_grad=True) : Trainable scaling factor \(\gamma\). (shape: <see above>)
 mean (need_grad=False) : Moving average of batch mean. (shape: <see above>)
 var (need_grad=False) : Moving average of batch variance. (shape: <see above>)
Note
If the name option is passed, the parameters are wrapped inside the parameter scope with the specified name, yielding the same results as the following code. This can be used to simplify the code.
    with parameter_scope(name):
        output = batch_normalization(<args>)
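The training-time computation and the parameter shapes can be checked with a short NumPy sketch. The function `batch_norm_train` is hypothetical, and the moving-average update (decay_rate times the old value plus one minus decay_rate times the batch statistic) is an assumption about how decay_rate is applied, not something stated above.

```python
import numpy as np


def batch_norm_train(x, beta, gamma, mean, var, axes=(1,), decay_rate=0.9, eps=1e-5):
    """Normalize with mini-batch statistics and update running statistics in place."""
    reduce_axes = tuple(i for i in range(x.ndim) if i not in axes)
    mu = x.mean(axis=reduce_axes, keepdims=True)
    sigma2 = x.var(axis=reduce_axes, keepdims=True)  # 1/M normalization
    # Assumed form of the moving average controlled by decay_rate.
    mean[...] = decay_rate * mean + (1 - decay_rate) * mu
    var[...] = decay_rate * var + (1 - decay_rate) * sigma2
    x_hat = (x - mu) / np.sqrt(sigma2 + eps)
    return x_hat * gamma + beta


rng = np.random.RandomState(0)
x = rng.randn(16, 3, 8, 8)
param_shape = (1, 3, 1, 1)  # same as np.mean(x, axis=(0, 2, 3), keepdims=True).shape
beta, gamma = np.zeros(param_shape), np.ones(param_shape)
mean, var = np.zeros(param_shape), np.ones(param_shape)
y = batch_norm_train(x, beta, gamma, mean, var)
print(y.shape, mean.shape)
```

With gamma set to one and beta to zero, each channel of the output has approximately zero mean and unit variance.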

nnabla.parametric_functions.mean_subtraction(inp, base_axis=1, update_running_mean=True, fix_parameters=False, name=None)
Mean subtraction layer.
It subtracts the mean of the elements of the input array, and normalizes it to \(0\). Preprocessing arrays with this function has the effect of improving accuracy in various tasks such as image classification.
At training time, this function is defined as
\[\begin{split}\begin{array}{lcl} \mu &=& \frac{1}{M} \sum x_i \\ y_i &=& x_i - \mu \end{array}\end{split}\]
At testing time, the mean values used are those that were computed during training by moving average.
Note
The backward performs an approximated differentiation that takes into account only the latest mini-batch.
Parameters:
 inp (Variable) – N-D array of input.
 base_axis (int) – Base axis of the mean subtraction operation. Dimensions up to base_axis are treated as the sample dimensions.
 update_running_mean (bool) – When set to True, the running mean is updated.
 fix_parameters (bool) – Dummy parameter. This argument does not affect anything.
Returns: N-D array.
Parameters to be registered
The following variables are registered in a parameter scope "mean_subtraction":
 mean (need_grad=False) : Moving average. (shape: inp.shape[base_axis:])
 t (need_grad=False) : Mini-batch counter used in forward pass. (shape: (1,))
Note
If the name option is passed, the parameters are wrapped inside the parameter scope with the specified name, yielding the same results as the following code. This can be used to simplify the code.
    with parameter_scope(name):
        output = mean_subtraction(<args>)

nnabla.parametric_functions.rnn(x, h, w0_init=None, w_init=None, b_init=None, num_layers=1, nonlinearity='tanh', dropout=0.0, bidirectional=False, training=True, rng=None, with_bias=True, fix_parameters=False, name=None)
N-Step RNN (recurrent neural networks).
The N-Step RNN function applies an Elman RNN with nonlinearity to an input sequence. It is defined as follows:
\[h_t = \tanh(w_{ih}x_t+b_{ih}+w_{hh}h_{(t-1)}).\]
We use the following notations to describe the inputs and outputs below. \(T\): sequence length, \(B\): batch size, \(I\): input size, \(L\): number of layers, \(D\): number of directions, can be either 1 or 2, \(H\): hidden size.
References
 Jeffrey L. Elman. "Finding Structure in Time." Cognitive Science. 1990.
Parameters:
 x (Variable) – Input N-D array with shape \((T, B, I)\).
 h (Variable) – Input N-D array with shape \((L, D, B, H)\).
 w0_init (nnabla.initializer.BaseInitializer or numpy.ndarray, optional) – Initializer for weight at the first layer. Shape is \((D, H, I + H)\).
 w_init (nnabla.initializer.BaseInitializer or numpy.ndarray, optional) – Initializer for weights at the second layer and up. Shape is \((L-1, D, H, D*H + H)\).
 b_init (nnabla.initializer.BaseInitializer or numpy.ndarray, optional) – Initializer for bias. Shape is \((L, D, H)\).
 num_layers (int, optional) – Number of layers in the network. If set to 1, only the weights for the first layer will be invoked. Default is 1.
 nonlinearity (str, optional) – Type of nonlinearity applied to the input sequence. Must be either tanh or relu. Default is tanh.
 dropout (float, optional) – Dropout ratio applied to parameters. Default is 0.0.
 bidirectional (bool, optional) – If True, bidirectional computation will be performed in each layer. Default is False.
 training (bool, optional) – Backpropagation will be performed only when it is True. Default is True.
 with_bias (bool, optional) – Specify whether to include the bias term.
Returns:
 nnabla.Variable: Output \(y\) with shape \((T, B, D * H)\)
 nnabla.Variable: Output \(h_n\) with shape \((L, D, B, H)\)
Example:
    x = nn.Variable((seq_len, batch_size, input_size))
    h = nn.Variable((num_layers, num_directions, batch_size, hidden_size))
    y, hn = PF.rnn(x, h)
Note
If the name option is passed, the parameters are wrapped inside the parameter scope with the specified name, yielding the same results as the following code. This can be used to simplify the code.
    with parameter_scope(name):
        output = rnn(<args>)
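The recurrence can be unrolled by hand for the simplest case of one layer and one direction. The NumPy sketch below is only illustrative: the helper `elman_rnn` is hypothetical, and splitting the first-layer weight of shape (H, I + H) into an input part and a hidden part along the last axis is an assumption about the layout.

```python
import numpy as np


def elman_rnn(x, h0, w0, b):
    """x: (T, B, I), h0: (B, H), w0: (H, I + H), b: (H,)."""
    input_size = x.shape[2]
    # Assumed layout: first I columns act on x_t, remaining H columns on h_{t-1}.
    w_ih, w_hh = w0[:, :input_size], w0[:, input_size:]
    h, outputs = h0, []
    for x_t in x:  # iterate over the T time steps
        h = np.tanh(x_t @ w_ih.T + b + h @ w_hh.T)
        outputs.append(h)
    return np.stack(outputs), h


T, B, I, H = 5, 2, 3, 4
rng = np.random.RandomState(0)
x = rng.randn(T, B, I)
h0 = np.zeros((B, H))
w0 = rng.randn(H, I + H) * 0.1
b = np.zeros(H)
y, hn = elman_rnn(x, h0, w0, b)
print(y.shape, hn.shape)  # (5, 2, 4) (2, 4)
```

The output sequence has shape (T, B, H) for a single direction, and the final hidden state equals the last output step.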

nnabla.parametric_functions.lstm(x, h, c, w0_init=None, w_init=None, b_init=None, num_layers=1, dropout=0.0, bidirectional=False, training=True, rng=None, with_bias=True, fix_parameters=False, name=None)
LSTM (long short-term memory).
Long Short-Term Memory, or LSTM, is a building block for recurrent neural network (RNN) layers. An LSTM unit consists of a cell and input, output, and forget gates whose functions are defined as follows:
\[\begin{split}f_t&&=\sigma(W_fx_t+U_fh_{t-1}+b_f) \\ i_t&&=\sigma(W_ix_t+U_ih_{t-1}+b_i) \\ o_t&&=\sigma(W_ox_t+U_oh_{t-1}+b_o) \\ c_t&&=f_t\odot c_{t-1}+i_t\odot\tanh(W_cx_t+U_ch_{t-1}+b_c) \\ h_t&&=o_t\odot\tanh(c_t).\end{split}\]
We use the following notations to describe the inputs and outputs below. \(T\): sequence length, \(B\): batch size, \(I\): input size, \(L\): number of layers, \(D\): number of directions, can be either 1 or 2, \(H\): hidden size.
References
 S. Hochreiter, and J. Schmidhuber. "Long Short-Term Memory." Neural Computation. 1997.
Parameters:
 x (Variable) – Input N-D array with shape \((T, B, I)\).
 h (Variable) – Input N-D array with shape \((L, D, B, H)\).
 c (Variable) – Input N-D array with shape \((L, D, B, H)\).
 w0_init (nnabla.initializer.BaseInitializer or numpy.ndarray, optional) – Initializer for weight at the first layer. Shape is \((D, 4, H, I + H)\).
 w_init (nnabla.initializer.BaseInitializer or numpy.ndarray, optional) – Initializer for weights at the second layer and up. Shape is \((L-1, D, 4, H, D * H + H)\).
 b_init (nnabla.initializer.BaseInitializer or numpy.ndarray, optional) – Initializer for bias. Shape is \((L, D, 4, H)\).
 num_layers (int, optional) – Number of layers in the network. If set to 1, only the weights for the first layer will be invoked. Default is 1.
 dropout (float, optional) – Dropout ratio applied to parameters. Default is 0.0.
 bidirectional (bool, optional) – If True, bidirectional computation will be performed in each layer. Default is False.
 training (bool, optional) – Backpropagation will be performed only when it is True. Default is True.
 with_bias (bool, optional) – Specify whether to include the bias term.
 fix_parameters (bool) – When set to True, the weights and biases will not be updated.
Returns:
 nnabla.Variable: Output \(y\) with shape \((T, B, D * H)\)
 nnabla.Variable: Output \(h_n\) with shape \((L, D, B, H)\)
 nnabla.Variable: Output \(c_n\) with shape \((L, D, B, H)\)
Example:
    x = nn.Variable((seq_len, batch_size, input_size))
    h = nn.Variable((num_layers, num_directions, batch_size, hidden_size))
    c = nn.Variable((num_layers, num_directions, batch_size, hidden_size))
    y, hn, cn = PF.lstm(x, h, c)
Note
If the name option is passed, the parameters are wrapped inside the parameter scope with the specified name, yielding the same results as the following code. This can be used to simplify the code.
    with parameter_scope(name):
        output = lstm(<args>)

nnabla.parametric_functions.gru(x, h, w0_init=None, w_init=None, b_init=None, num_layers=1, dropout=0.0, bidirectional=False, training=True, rng=None, with_bias=True, fix_parameters=False, name=None)
GRU (gated recurrent units).
GRU is defined as follows:
\[\begin{split}r_t&&=\sigma(W_rx_t+U_rh_{t-1}+b_r) \\ z_t&&=\sigma(W_zx_t+U_zh_{t-1}+b_z) \\ n_t&&=\tanh(W_nx_t+b_{in}+r_t\odot(U_nh_{t-1}+b_{hn})) \\ h_t&&=(1-z_t)\odot n_t+z_t\odot h_{t-1}.\end{split}\]
We use the following notations to describe the inputs and outputs below. \(T\): sequence length, \(B\): batch size, \(I\): input size, \(L\): number of layers, \(D\): number of directions, can be either 1 or 2, \(H\): hidden size.
References
 K. Cho et al. "Learning Phrase Representations using RNN Encoder–Decoder for Statistical Machine Translation." Empirical Methods in Natural Language Processing. 2014.
Parameters:
 x (Variable) – Input N-D array with shape \((T, B, I)\).
 h (Variable) – Input N-D array with shape \((L, D, B, H)\).
 w0_init (nnabla.initializer.BaseInitializer or numpy.ndarray, optional) – Initializer for weight at the first layer. Shape is \((D, 3, H, I + H)\).
 w_init (nnabla.initializer.BaseInitializer or numpy.ndarray, optional) – Initializer for weights at the second layer and up. Shape is \((L-1, D, 3, H, D * H + H)\).
 b_init (nnabla.initializer.BaseInitializer or numpy.ndarray, optional) – Initializer for bias. Shape is \((L, D, 4, H)\).
 num_layers (int, optional) – Number of layers in the network. If set to 1, only the weights for the first layer will be invoked. Default is 1.
 dropout (float, optional) – Dropout ratio applied to parameters. Default is 0.0.
 bidirectional (bool, optional) – If True, bidirectional computation will be performed in each layer. Default is False.
 training (bool, optional) – Backpropagation will be performed only when it is True. Default is True.
 with_bias (bool, optional) – Specify whether to include the bias term.
Returns:
 nnabla.Variable: Output \(y\) with shape \((T, B, D * H)\)
 nnabla.Variable: Output \(h_n\) with shape \((L, D, B, H)\)
Example:
    x = nn.Variable((seq_len, batch_size, input_size))
    h = nn.Variable((num_layers, num_directions, batch_size, hidden_size))
    y, hn = PF.gru(x, h)
Note
If the name option is passed, the parameters are wrapped inside the parameter scope with the specified name, yielding the same results as the following code. This can be used to simplify the code.
    with parameter_scope(name):
        output = gru(<args>)

nnabla.parametric_functions.embed(inp, n_inputs, n_features, initializer=None, fix_parameters=False, apply_w=None, name=None)
Embed.
Embed slices a matrix/tensor with an indexing array/tensor. Weights are initialized with nnabla.initializer.UniformInitializer within the range of \(-\sqrt{3}\) and \(\sqrt{3}\).
Parameters:
 inp (Variable) – [Integer] Indices with shape \((I_0, ..., I_N)\)
 n_inputs – Number of possible inputs, words or vocabularies
 n_features – Number of embedding features
 fix_parameters (bool) – When set to True, the embedding weight matrix will not be updated.
 apply_w (function) – Lambda, function, or callable object applied to the weights.
Returns: Output with shape \((I_0, ..., I_N, W_1, ..., W_M)\)
Parameters to be registered
The following variables are registered in a parameter scope "embed":
 W (need_grad=True) : Embedding matrix. (shape: (n_inputs, n_features))
Note
If the name option is passed, the parameters are wrapped inside the parameter scope with the specified name, yielding the same results as the following code. This can be used to simplify the code.
    with parameter_scope(name):
        output = embed(<args>)
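Slicing a weight matrix with an index array corresponds to integer array indexing in NumPy, as the following sketch shows. The sizes and index values are made up for illustration, and the uniform range of minus to plus the square root of 3 is assumed to be the intended initialization range.

```python
import numpy as np

n_inputs, n_features = 10, 4
rng = np.random.RandomState(0)

# Embedding matrix with shape (n_inputs, n_features), uniform in [-sqrt(3), sqrt(3)].
lim = np.sqrt(3)
W = rng.uniform(-lim, lim, size=(n_inputs, n_features))

# Integer indices with shape (I_0, I_1) = (2, 3).
indices = np.array([[1, 5, 9],
                    [0, 0, 2]])

# Each index selects one row of W, so the output shape is (2, 3, n_features).
out = W[indices]
print(out.shape)  # (2, 3, 4)
```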

nnabla.parametric_functions.prelu(inp, base_axis=1, shared=True, fix_parameters=False, name=None)
Parametrized Rectified Linear Unit function defined as
\[y_i = \max(0, x_i) + w_i \min(0, x_i)\]where negative slope \(w\) is learned and can vary across channels (an axis specified with base_axis). Weights are initialized with \(1\).
Parameters: Returns: ND array.
Return type: Variable
Parameters to be registered
The following variables are registered in a parameter scope "prelu":
 slope (need_grad=True): Negative slope. (shape: tuple() if shared else (inp.shape[base_axis],))
Note
If the name option is passed, the parameters are wrapped inside a parameter scope with the specified name, yielding the same results as the following code. This can be used to simplify the code.
with parameter_scope(name):
    output = prelu(<args>)
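As a rough illustration of the PReLU formula and of the difference between a shared slope and one slope per channel, here is a plain-Python sketch. The function name prelu_reference and the sample slopes are assumptions made for this example; it is not the NNabla implementation.

```python
# Illustrative only: element-wise PReLU on plain Python lists.
def prelu_reference(x, slope):
    """Apply y_i = max(0, x_i) + w_i * min(0, x_i) channel by channel.

    `x` is a list of channels, each a list of floats. `slope` is either a
    single float (like shared=True) or one float per channel (shared=False).
    """
    if isinstance(slope, (int, float)):
        slopes = [slope] * len(x)
    else:
        slopes = list(slope)
    return [
        [max(0.0, v) + w * min(0.0, v) for v in channel]
        for channel, w in zip(x, slopes)
    ]


x = [[-2.0, 1.0], [-4.0, 3.0]]                  # two channels
shared = prelu_reference(x, 0.25)               # one slope for all channels
per_channel = prelu_reference(x, [0.5, 0.125])  # one slope per channel
```

Positive inputs pass through unchanged; only negative inputs are scaled by the slope.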

nnabla.parametric_functions.svd_affine(inp, n_outmaps, r, base_axis=1, uv_init=None, b_init=None, fix_parameters=False, rng=None, with_bias=True, name=None)
SVD affine is a low rank approximation of the affine layer. It can be seen as two consecutive affine layers with a bottleneck. It computes:
\[{\mathbf y} = {\mathbf U} {\mathbf V} {\mathbf x} + {\mathbf b},\]
where \({\mathbf x}, {\mathbf y}\) are the inputs and outputs respectively, and \({\mathbf U}, {\mathbf V}, {\mathbf b}\) are constants.
The weights \({\mathbf U}\) and \({\mathbf V}\) are approximated with singular value decomposition (SVD) of the original weight matrix \({\mathbf W}\) and by selecting the \({R}\) dominant singular values and the corresponding singular vectors. Therefore the low rank \({R}\) is the size of the bottleneck.
If uv_init is a numpy array, \({\mathbf U}\) and \({\mathbf V}\) are computed such that uv_init is approximated by \({\mathbf{UV}}\). If uv_init is None or an initializer, the product of \({\mathbf U}\) and \({\mathbf V}\) approximates the random initialization.
If \({\mathbf U}\) and \({\mathbf V}\) exist in the context, they take precedence over uv_init.
Suppose the weight of the affine is of \({I \times O}\) and the compression rate you want to specify is \({CR}\), then you set \({R}\) as
\[R = \left\lfloor \frac{(1 - CR)OI}{O + I} \right\rfloor.\]
Parameters:
 inp (Variable) – Input N-D array with shape (\(M_0 \times \ldots \times M_{B-1} \times D_B \times \ldots \times D_N\)). Dimensions before and after base_axis are flattened as if it is a matrix.
 n_outmaps (int or tuple) – Number of output neurons per data.
 r (int) – rank of the factorized layer (size of the bottleneck)
 base_axis (int) – Dimensions up to base_axis are treated as the sample dimensions.
 uv_init (nnabla.initializer.BaseInitializer or numpy.ndarray) – Initializer for weight. By default, it is initialized with nnabla.initializer.UniformInitializer within the range determined by nnabla.initializer.calc_uniform_lim_glorot.
 b_init (nnabla.initializer.BaseInitializer or numpy.ndarray) – Initializer for bias. By default, it is initialized with zeros if with_bias is True.
 fix_parameters (bool) – When set to True, the weights and biases will not be updated.
 rng (numpy.random.RandomState) – Random generator for Initializer.
 with_bias (bool) – Specify whether to include the bias term.
Returns: \((B + 1)\)-D array. (\(M_0 \times \ldots \times M_{B-1} \times L\))
Return type: Variable
Parameters to be registered
The following variables are registered in a parameter scope "svd_affine":
 U (need_grad=True): \({\mathbf U}\). (shape: (inmaps, r))
 V (need_grad=True): \({\mathbf V}\). (shape: (r, outmaps))
 b (need_grad=True): Bias vector. (shape: (outmaps,))
Note
If the name option is passed, the parameters are wrapped inside a parameter scope with the specified name, yielding the same results as the following code. This can be used to simplify the code.
with parameter_scope(name):
    output = svd_affine(<args>)
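The rank formula for svd_affine can be checked with a few lines of arithmetic. The sketch below is plain Python; the helper names and the 512-by-256 layer are hypothetical, chosen only to show how a target compression rate turns into a rank and a weight count.

```python
import math


def svd_affine_rank(n_in, n_out, compression_rate):
    """R = floor((1 - CR) * O * I / (O + I)), as in the formula for R."""
    return math.floor((1 - compression_rate) * n_out * n_in / (n_out + n_in))


def affine_weight_counts(n_in, n_out, rank):
    """Weight counts (bias excluded) of the dense and the factorized layer."""
    dense = n_in * n_out
    factorized = rank * (n_in + n_out)   # U: (inmaps, r), V: (r, outmaps)
    return dense, factorized


# Hypothetical layer: 512 inputs, 256 outputs, target compression rate 0.75.
r = svd_affine_rank(512, 256, 0.75)
dense, factorized = affine_weight_counts(512, 256, r)
```

Because of the floor, the factorized layer ends up at or slightly below the requested fraction of the dense weight count.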

nnabla.parametric_functions.svd_convolution(inp, outmaps, kernel, r, pad=None, stride=None, dilation=None, uv_init=None, b_init=None, base_axis=1, fix_parameters=False, rng=None, with_bias=True, name=None)
SVD convolution is a low rank approximation of the convolution layer. It can be seen as a depthwise convolution followed by a 1x1 convolution.
The flattened kernels for the i-th input map are expressed by their low rank approximation. The kernels for the i-th input \({\mathbf W_i}\) are approximated with the singular value decomposition (SVD) and by selecting the \({R}\) dominant singular values and the corresponding singular vectors.
\[{\mathbf W_{:,i,:}} \approx {\mathbf U_i} {\mathbf V_i}.\]
\({\mathbf U}\) contains the weights of the depthwise convolution with multiplier \({R}\) and \({\mathbf V}\) contains the weights of the 1x1 convolution.
If uv_init is a numpy array, \({\mathbf U}\) and \({\mathbf V}\) are computed such that uv_init is approximated by \({\mathbf{UV}}\). If uv_init is None or an initializer, the product of \({\mathbf U}\) and \({\mathbf V}\) approximates the random initialization.
If \({\mathbf U}\) and \({\mathbf V}\) exist in the context, they take precedence over uv_init.
Suppose the kernel tensor of the convolution is of \({O \times I \times K \times K}\) and the compression rate you want to specify is \({CR}\), then you set \({R}\) as
\[R = \left\lfloor \frac{(1 - CR)OIK^2}{I(O + K^2)} \right\rfloor.\]
Parameters:
 inp (Variable) – N-D array.
 outmaps (int) – Number of convolution kernels (which is equal to the number of output channels). For example, to apply convolution on an input with 16 types of filters, specify 16.
 kernel (tuple) – Convolution kernel size. For example, to apply convolution on an image with a 3 (height) by 5 (width) two-dimensional kernel, specify (3, 5).
 r (int) – Rank of the factorized layer.
 pad (tuple) – Padding sizes (int) for dimensions.
 stride (tuple) – Stride sizes (int) for dimensions.
 dilation (tuple) – Dilation sizes (int) for dimensions.
 uv_init (nnabla.initializer.BaseInitializer or numpy.ndarray) – Initializer for weight. By default, it is initialized with nnabla.initializer.UniformInitializer within the range determined by nnabla.initializer.calc_uniform_lim_glorot.
 b_init (nnabla.initializer.BaseInitializer or numpy.ndarray) – Initializer for bias. By default, it is initialized with zeros if with_bias is True.
 base_axis (int) – Dimensions up to base_axis are treated as the sample dimensions.
 fix_parameters (bool) – When set to True, the weights and biases will not be updated.
 rng (numpy.random.RandomState) – Random generator for Initializer.
 with_bias (bool) – Specify whether to include the bias term.
Returns: \((B + 1)\)-D array. (\(M_0 \times \ldots \times M_{B-1} \times L\))
Return type: Variable
Parameters to be registered
The following variables are registered in a parameter scope "svd_conv":
 U (need_grad=True): Decomposed filter weights \({\mathbf U}\). (shape: (inmaps * r, *kernel))
 V (need_grad=True): Decomposed filter weights \({\mathbf V}\). (shape: (outmaps, inmaps * r, 1, ...))
 b (need_grad=True): Bias vector. (shape: (outmaps,))
Note
If the name option is passed, the parameters are wrapped inside a parameter scope with the specified name, yielding the same results as the following code. This can be used to simplify the code.
with parameter_scope(name):
    output = svd_convolution(<args>)
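A worked instance of the svd_convolution rank formula, again as plain-Python arithmetic. The helper names and the 32-to-64-channel layer with a 3x3 kernel are assumptions for illustration; the weight counts follow the registered shapes of U and V for a square K x K kernel.

```python
import math


def svd_conv_rank(outmaps, inmaps, k, compression_rate):
    """R = floor((1 - CR) * O * I * K^2 / (I * (O + K^2))) for a K x K kernel."""
    numerator = (1 - compression_rate) * outmaps * inmaps * k * k
    return math.floor(numerator / (inmaps * (outmaps + k * k)))


def svd_conv_weight_counts(outmaps, inmaps, k, rank):
    """Weight counts (bias excluded) before and after the factorization."""
    dense = outmaps * inmaps * k * k
    u = inmaps * rank * k * k      # U: (inmaps * r, *kernel)
    v = outmaps * inmaps * rank    # V: (outmaps, inmaps * r, 1, ...)
    return dense, u + v


# Hypothetical layer: 32 -> 64 channels, 3x3 kernel, compression rate 0.5.
r = svd_conv_rank(64, 32, 3, 0.5)
dense, factorized = svd_conv_weight_counts(64, 32, 3, r)
```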

nnabla.parametric_functions.cpd3_convolution(inp, outmaps, kernel, r, pad=None, stride=None, dilation=None, oik_init=None, b_init=None, base_axis=1, fix_parameters=False, rng=None, with_bias=True, max_iter=500, stopping_criterion=1e-05, lambda_reg=0.0, name=None)
CP convolution is a low rank approximation of a convolution layer. A 3D tensor containing the parameter is built by collapsing the N-D kernels into 1D, then the tensor is decomposed into three matrices. The decomposed layer can be seen as linear combinations of the input feature maps to \({R}\) feature maps followed by a depthwise convolution and followed by linear combinations of the feature maps to compute the output feature maps.
The CP decomposition allows approximating the kernel tensor by \({R}\) rank-1 tensors of the form:
\[\sum_{r=1}^{R} \lambda_r {\mathbf{o}^{(r)} \otimes \mathbf{i}^{(r)} \otimes \mathbf{k}^{(r)}},\]
where \({\lambda}_r\) is the normalization coefficient and \({\otimes}\) is the outer product.
If oik_init is a numpy array, O, I and K are computed so that oik_init is approximated by their CP decomposition. If oik_init is None or an initializer, the decomposition formed by O, I and K approximates the randomly initialized array.
If O, I and K exist in the context, they are used to initialize the layer and oik_init is not used.
Suppose the kernel tensor of the convolution is of \({O \times I \times K \times K}\) and the compression rate you want to specify is \({CR}\), then you set \({R}\) as
\[R = \left\lfloor \frac{(1 - CR)OIK^2}{O + I + K^2} \right\rfloor.\]
References
 Lebedev, Vadim, Yaroslav Ganin, Maksim Rakhuba, Ivan Oseledets, and Victor Lempitsky, “Speeding-up convolutional neural networks using fine-tuned CP-decomposition.”, arXiv preprint arXiv:1412.6553 (2014).
 Marcella Astrid, Seung-Ik Lee, “CP-decomposition with Tensor Power Method for Convolutional Neural Networks Compression”, BigComp 2017.
Parameters:
 inp (Variable) – N-D array.
 outmaps (int) – Number of convolution kernels (which is equal to the number of output channels). For example, to apply convolution on an input with 16 types of filters, specify 16.
 kernel (tuple of int) – Convolution kernel size. For example, to apply convolution on an image with a 3 (height) by 5 (width) two-dimensional kernel, specify (3,5).
 r (int) – rank of the factorized layer
 pad (tuple of int) – Padding sizes for dimensions.
 stride (tuple of int) – Stride sizes for dimensions.
 dilation (tuple of int) – Dilation sizes for dimensions.
 oik_init (numpy array or nnabla.initializer.BaseInitializer) – Initializer for weight. By default, it is initialized with nnabla.initializer.UniformInitializer within the range determined by nnabla.initializer.calc_uniform_lim_glorot.
 b_init (nnabla.initializer.BaseInitializer or numpy.ndarray) – Initializer for bias. It is initialized with zeros if with_bias is True.
 base_axis (int) – Dimensions up to base_axis are treated as the sample dimensions.
 fix_parameters (bool) – When set to True, the weights and biases will not be updated.
 rng (numpy.random.RandomState) – Random generator for Initializer.
 with_bias (bool) – Specify whether to include the bias term.
 max_iter (int) – Max iteration of the ALS.
 stopping_criterion (float) – Threshold for stopping the ALS. If the value is negative, the convergence check is skipped, which may reduce the computation time.
 lambda_reg (float) – regularization parameter for the ALS. Larger lambda_reg means larger regularization.
Returns: \((B + 1)\)-D array. (\(M_0 \times \ldots \times M_{B-1} \times L\))
Return type: Variable
Parameters to be registered
The following variables are registered in a parameter scope "cpd3_conv":
 I (need_grad=True): Decomposed filter weights \({\mathbf I}\). (shape: (r, inmaps, 1, ...))
 K (need_grad=True): Decomposed filter weights \({\mathbf K}\). (shape: (r, *kernel))
 O (need_grad=True): Decomposed filter weights \({\mathbf O}\). (shape: (outmaps, r, 1, ...))
 b (need_grad=True): Bias vector. (shape: (outmaps,))
Note
If the name option is passed, the parameters are wrapped inside a parameter scope with the specified name, yielding the same results as the following code. This can be used to simplify the code.
with parameter_scope(name):
    output = cpd3_convolution(<args>)
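To make the sum of rank-1 tensors concrete, the sketch below rebuilds a small kernel tensor from its CP factors in plain Python. It only illustrates the reconstruction formula; the function name cp_reconstruct and the sample factors are hypothetical, and the ALS fitting that the library performs is not shown.

```python
# Illustrative only: rebuild an (O, I, K) tensor from R rank-1 terms.
def cp_reconstruct(lambdas, o_vecs, i_vecs, k_vecs):
    """Return sum_r lambda_r * outer(o_r, i_r, k_r) as nested lists."""
    n_o, n_i, n_k = len(o_vecs[0]), len(i_vecs[0]), len(k_vecs[0])
    tensor = [[[0.0] * n_k for _ in range(n_i)] for _ in range(n_o)]
    for lam, o, i, k in zip(lambdas, o_vecs, i_vecs, k_vecs):
        for a in range(n_o):
            for b in range(n_i):
                for c in range(n_k):
                    tensor[a][b][c] += lam * o[a] * i[b] * k[c]
    return tensor


# Hypothetical rank-2 example: 2 output maps, 2 input maps, 3 kernel taps.
lambdas = [1.0, 0.5]
o_vecs = [[1.0, 0.0], [0.0, 2.0]]
i_vecs = [[1.0, 1.0], [1.0, 0.0]]
k_vecs = [[1.0, 2.0, 3.0], [4.0, 0.0, 4.0]]
approx = cp_reconstruct(lambdas, o_vecs, i_vecs, k_vecs)
```

Storing the factors takes R(O + I + K) values here, compared with O * I * K for the full tensor, which is where the compression comes from.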

nnabla.parametric_functions.binary_connect_affine(inp, n_outmaps, base_axis=1, quantize_zero_to=1.0, w_init=None, wb_init=None, b_init=None, fix_parameters=False, rng=None, with_bias=True, name=None)
Binary Connect Affine, multiplier-less inner-product.
Binary Connect Affine is an affine function, except the definition of the inner product is modified. The input-output relation of this function is as follows:
\[y_j = \sum_{i} sign(w_{ji}) x_i.\]
Therefore \(sign(w_{ji})\) is either \(-1\) or \(1\) and the inner product simplifies to addition.
This function should be used together with Batch Normalization.
References
M. Courbariaux, Y. Bengio, and J.-P. David. “BinaryConnect: Training Deep Neural Networks with binary weights during propagations.” Advances in Neural Information Processing Systems. 2015.
Note
1) If you would like to share weights between some layers, please make sure to share the standard, floating value weights (weight) and not the binarized weights (binary_weight).
2) The weights and the binary weights become synced only after forward() is called, and not after a call to backward(). To access the parameters of the network, remember to call forward() once before doing so, otherwise the float weights and the binary weights will not be in sync.
3) Quantized values are stored as floating point numbers for binary_weight, since this function is only for simulation purposes.
Parameters:
 inp (Variable) – Input N-D array with shape (\(M_0 \times \ldots \times M_{B-1} \times D_B \times \ldots \times D_N\)). Dimensions before and after base_axis are flattened as if it is a matrix.
 n_outmaps (int or tuple of int) – Number of output neurons per data.
 base_axis (int) – Dimensions up to base_axis are treated as the sample dimensions.
 quantize_zero_to (float) – Input value at zero is quantized to this value.
 w_init (nnabla.initializer.BaseInitializer or numpy.ndarray) – Initializer for weight. By default, it is initialized with nnabla.initializer.UniformInitializer within the range determined by nnabla.initializer.calc_uniform_lim_glorot.
 wb_init (nnabla.initializer.BaseInitializer or numpy.ndarray) – Initializer for binary weight. By default, it is initialized with nnabla.initializer.UniformInitializer within the range determined by nnabla.initializer.calc_uniform_lim_glorot.
 b_init (nnabla.initializer.BaseInitializer or numpy.ndarray) – Initializer for bias. By default, it is initialized with zeros if with_bias is True.
 fix_parameters (bool) – When set to True, the weights and biases will not be updated.
 rng (numpy.random.RandomState) – Random generator for Initializer.
Returns: Variable
Parameters to be registered
The following variables are registered in a parameter scope "bicon_affine":
 W (need_grad=True): Weight matrix in floating type. (shape: (inmaps, outmaps))
 Wb (need_grad=False): Binarized weights. (shape: (inmaps, outmaps))
 b (need_grad=True): Bias vector. (shape: (outmaps,))
Note
If the name option is passed, the parameters are wrapped inside a parameter scope with the specified name, yielding the same results as the following code. This can be used to simplify the code.
with parameter_scope(name):
    output = binary_connect_affine(<args>)
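The claim that the inner product reduces to additions can be seen in a short plain-Python sketch. The names binarize and binary_connect_affine_ref are hypothetical, and the handling of exact zeros via quantize_zero_to is an assumption based on the parameter description; this is not the library's implementation.

```python
# Illustrative only: sign-based affine map using additions and subtractions.
def binarize(w, quantize_zero_to=1.0):
    """Return sign(w); an exact zero is mapped to `quantize_zero_to`."""
    if w > 0:
        return 1.0
    if w < 0:
        return -1.0
    return quantize_zero_to


def binary_connect_affine_ref(x, weights, bias=None):
    """Accumulate +x_i or -x_i per output, so no multiplication is needed.

    `weights` uses the (inmaps, outmaps) layout of W and Wb.
    """
    n_out = len(weights[0])
    y = list(bias) if bias is not None else [0.0] * n_out
    for x_i, row in zip(x, weights):
        for j, w in enumerate(row):
            if binarize(w) > 0:
                y[j] += x_i
            else:
                y[j] -= x_i
    return y


x = [1.0, 2.0, 3.0]
W = [[0.3, -0.7], [-0.2, 0.0], [0.9, 0.4]]   # float weights: 3 inputs, 2 outputs
y = binary_connect_affine_ref(x, W)
y_biased = binary_connect_affine_ref(x, W, bias=[0.5, -0.5])
```

Only the signs of the float weights influence the output, which is why the float weights W and the binarized copy Wb are kept as separate parameters.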

nnabla.parametric_functions.binary_connect_convolution(inp, outmaps, kernel, pad=None, stride=None, dilation=None, group=1, quantize_zero_to=1.0, w_init=None, wb_init=None, b_init=None, base_axis=1, fix_parameters=False, rng=None, with_bias=True, name=None)
Binary Connect Convolution, multiplier-less inner-product.
Binary Connect Convolution is the convolution function, except the definition of the inner product is modified. The input-output relation of this function is as follows:
\[y_{n, a, b} = \sum_{m} \sum_{i} \sum_{j} sign(w_{n, m, i, j}) x_{m, a + i, b + j}.\]
Therefore \(sign(w_{n, m, i, j})\) is either \(-1\) or \(1\) and the inner product simplifies to addition.
This function should be used together with Batch Normalization.
References
M. Courbariaux, Y. Bengio, and J.-P. David. “BinaryConnect: Training Deep Neural Networks with binary weights during propagations.” Advances in Neural Information Processing Systems. 2015.
Note
1) If you would like to share weights between some layers, please make sure to share the standard, floating value weights (weight) and not the binarized weights (binary_weight).
2) The weights and the binary weights become synced only after forward() is called, and not after a call to backward(). To access the parameters of the network, remember to call forward() once before doing so, otherwise the float weights and the binary weights will not be in sync.
3) Quantized values are stored as floating point numbers for binary_weight, since this function is only for simulation purposes.
Parameters:
 inp (Variable) – N-D array.
 outmaps (int) – Number of convolution kernels (which is equal to the number of output channels). For example, to apply convolution on an input with 16 types of filters, specify 16.
 kernel (tuple of int) – Convolution kernel size. For example, to apply convolution on an image with a 3 (height) by 5 (width) two-dimensional kernel, specify (3,5).
 pad (tuple of int) – Padding sizes for dimensions.
 stride (tuple of int) – Stride sizes for dimensions.
 dilation (tuple of int) – Dilation sizes for dimensions.
 group (int) – Number of groups of channels. This makes connections across channels sparser by grouping connections along map direction.
 quantize_zero_to (float) – Input value at zero is quantized to this value.
 w_init (nnabla.initializer.BaseInitializer or numpy.ndarray) – Initializer for weight. By default, it is initialized with nnabla.initializer.UniformInitializer within the range determined by nnabla.initializer.calc_uniform_lim_glorot.
 wb_init (nnabla.initializer.BaseInitializer or numpy.ndarray) – Initializer for binary weight. By default, it is initialized with nnabla.initializer.UniformInitializer within the range determined by nnabla.initializer.calc_uniform_lim_glorot.
 b_init (nnabla.initializer.BaseInitializer or numpy.ndarray) – Initializer for bias. By default, it is initialized with zeros if with_bias is True.
 base_axis (int) – Dimensions up to base_axis are treated as the sample dimensions.
 fix_parameters (bool) – When set to True, the weights and biases will not be updated.
 rng (numpy.random.RandomState) – Random generator for Initializer.
 with_bias (bool) – Specify whether to include the bias term.
Returns: Variable
Parameters to be registered
The following variables are registered in a parameter scope "bicon_conv":
 W (need_grad=True): Filter weights in float. (shape: (outmaps, inmaps, *kernel))
 Wb (need_grad=False): Binarized filter weights. (shape: (outmaps, inmaps, *kernel))
 b (need_grad=True): Bias vector. (shape: (outmaps,))
Note
If the name option is passed, the parameters are wrapped inside a parameter scope with the specified name, yielding the same results as the following code. This can be used to simplify the code.
with parameter_scope(name):
    output = binary_connect_convolution(<args>)
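The triple sum in the convolution formula can be written out directly. The following plain-Python sketch assumes a 2-D input, no padding, stride 1, dilation 1, a single group and no bias; the function names are hypothetical and the code is meant only to mirror the indices of the formula.

```python
# Illustrative only: direct evaluation of the sign-based convolution sum.
def sign(w, quantize_zero_to=1.0):
    """sign(w), with an exact zero mapped to `quantize_zero_to`."""
    if w == 0:
        return quantize_zero_to
    return 1.0 if w > 0 else -1.0


def binary_connect_conv2d_ref(x, w):
    """y[n][a][b] = sum_m sum_i sum_j sign(w[n][m][i][j]) * x[m][a+i][b+j].

    x has shape (inmaps, H, W); w has shape (outmaps, inmaps, kh, kw).
    """
    kh, kw = len(w[0][0]), len(w[0][0][0])
    out_h = len(x[0]) - kh + 1
    out_w = len(x[0][0]) - kw + 1
    y = []
    for kernel in w:
        fmap = [[0.0] * out_w for _ in range(out_h)]
        for m, plane in enumerate(kernel):
            for i, row in enumerate(plane):
                for j, weight in enumerate(row):
                    s = sign(weight)
                    for a in range(out_h):
                        for b in range(out_w):
                            fmap[a][b] += s * x[m][a + i][b + j]
        y.append(fmap)
    return y


x = [[[1.0, 2.0, 3.0], [4.0, 5.0, 6.0], [7.0, 8.0, 9.0]]]   # shape (1, 3, 3)
w = [[[[0.5, 0.5], [-0.1, 0.2]]]]                           # shape (1, 1, 2, 2)
y = binary_connect_conv2d_ref(x, w)
```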

nnabla.parametric_functions.binary_weight_affine(inp, n_outmaps, base_axis=1, quantize_zero_to=1.0, w_init=None, wb_init=None, b_init=None, fix_parameters=False, rng=None, with_bias=True, name=None)
Binary Weight Affine, multiplier-less inner-product with a scale factor.
Binary Weight Affine is the affine function, but the inner product in this function is the following,
\[y_j = \frac{1}{\|\mathbf{w}_j\|_{\ell_1}} \sum_{i} sign(w_{ji}) x_i\]
Therefore \(sign(w_{ji})\) is either \(-1\) or \(1\) and the inner product simplifies to addition followed by scaling factor \(\alpha = \frac{1}{\|\mathbf{w}_j\|_{\ell_1}}\). The number of \(\alpha\) is the number of outmaps of the affine function.
References
Rastegari, Mohammad, et al. “XNOR-Net: ImageNet Classification Using Binary Convolutional Neural Networks.” arXiv preprint arXiv:1603.05279 (2016).
Note
1) If you would like to share weights between some layers, please make sure to share the standard, floating value weights (weight) and not the binarized weights (binary_weight).
2) The weights and the binary weights become synced only after forward() is called, and not after a call to backward(). To access the parameters of the network, remember to call forward() once before doing so, otherwise the float weights and the binary weights will not be in sync.
3) Quantized values are stored as floating point numbers for binary_weight, since this function is only for simulation purposes.
Parameters:
 inp (Variable) – Input N-D array with shape (\(M_0 \times \ldots \times M_{B-1} \times D_B \times \ldots \times D_N\)). Dimensions before and after base_axis are flattened as if it was a matrix.
 n_outmaps (int or tuple of int) – Number of output neurons per data.
 base_axis (int) – Dimensions up to base_axis are treated as the sample dimensions.
 quantize_zero_to (float) – Input value at zero is quantized to this value.
 w_init (nnabla.initializer.BaseInitializer or numpy.ndarray) – Initializer for the weight. By default, it is initialized with nnabla.initializer.UniformInitializer within the range determined by nnabla.initializer.calc_uniform_lim_glorot.
 wb_init (nnabla.initializer.BaseInitializer or numpy.ndarray) – Initializer for the binary weight. By default, it is initialized with nnabla.initializer.UniformInitializer within the range determined by nnabla.initializer.calc_uniform_lim_glorot.
 b_init (nnabla.initializer.BaseInitializer or numpy.ndarray) – Initializer for the bias. By default, it is initialized with zeros if with_bias is True.
 fix_parameters (bool) – When set to True, the weight and bias will not be updated.
 rng (numpy.random.RandomState) – Random generator for Initializer.
 with_bias (bool) – Specify whether to include the bias term.
Returns: Variable
Parameters to be registered
The following variables are registered in a parameter scope "bwn_affine":
 W (need_grad=True): Weight matrix in floating type. (shape: (inmaps, outmaps))
 Wb (need_grad=False): Binarized weights. (shape: (inmaps, outmaps))
 alpha (need_grad=False): Scaling factor \(\alpha\). (shape: (outmaps,))
 b (need_grad=True): Bias vector. (shape: (outmaps,))
Note
If the name option is passed, the parameters are wrapped inside a parameter scope with the specified name, yielding the same results as the following code. This can be used to simplify the code.
with parameter_scope(name):
    output = binary_weight_affine(<args>)
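The sketch below evaluates the scaled sign-based inner product in plain Python, taking the formula in this section at face value (one scaling factor per output, computed from the L1 norm of that output's weights). The function name is hypothetical, and the library may compute the scaling factor differently in practice, so treat this purely as a reading aid for the formula.

```python
# Illustrative only: follows the printed formula, not the library code.
def binary_weight_affine_ref(x, weights):
    """Return (y, alpha) for weights laid out as (inmaps, outmaps).

    alpha_j = 1 / ||w_j||_1 and y_j = alpha_j * sum_i sign(w_ji) * x_i,
    where zeros are treated as +1 (the default quantize_zero_to).
    """
    n_out = len(weights[0])
    columns = [[row[j] for row in weights] for j in range(n_out)]
    alpha = [1.0 / sum(abs(w) for w in col) for col in columns]
    y = [
        a * sum((1.0 if w >= 0 else -1.0) * x_i for w, x_i in zip(col, x))
        for a, col in zip(alpha, columns)
    ]
    return y, alpha


x = [1.0, 2.0, 3.0]
W = [[0.5, -1.0], [-0.25, 2.0], [0.25, 1.0]]   # 3 inputs, 2 outputs
y, alpha = binary_weight_affine_ref(x, W)
```

The list alpha has one entry per output, which corresponds to the registered alpha parameter of shape (outmaps,).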

nnabla.parametric_functions.binary_weight_convolution(inp, outmaps, kernel, pad=None, stride=None, dilation=None, group=1, quantize_zero_to=1.0, w_init=None, wb_init=None, b_init=None, base_axis=1, fix_parameters=False, rng=None, with_bias=True, name=None)
Binary Weight Convolution, multiplier-less inner-product with a scale factor.
Binary Weight Convolution is the convolution function, but the inner product in this function is the following,
\[y_{n, a, b} = \frac{1}{\|\mathbf{w}_n\|_{\ell_1}} \sum_{m} \sum_{i} \sum_{j} sign(w_{n, m, i, j}) x_{m, a + i, b + j}.\]
Therefore \(sign(w_{n, m, i, j})\) is either \(-1\) or \(1\) and the inner product simplifies to addition followed by scaling factor \(\alpha = \frac{1}{\|\mathbf{w}_n\|_{\ell_1}}\). The number of \(n\) is the number of outmaps of the convolution function.
References
Rastegari, Mohammad, et al. “XNOR-Net: ImageNet Classification Using Binary Convolutional Neural Networks.” arXiv preprint arXiv:1603.05279 (2016).
Note
1) If you would like to share weights between some layers, please make sure to share the standard, floating value weights (weight) and not the binarized weights (binary_weight).
2) The weights and the binary weights become synced only after forward() is called, and not after a call to backward(). To access the parameters of the network, remember to call forward() once before doing so, otherwise the float weights and the binary weights will not be in sync.
3) Quantized values are stored as floating point numbers for binary_weight, since this function is only for simulation purposes.
Parameters:
 inp (Variable) – N-D array.
 outmaps (int) – Number of convolution kernels (which is equal to the number of output channels). For example, to apply convolution on an input with 16 types of filters, specify 16.
 kernel (tuple of int) – Convolution kernel size. For example, to apply convolution on an image with a 3 (height) by 5 (width) two-dimensional kernel, specify (3,5).
 pad (tuple of int) – Padding sizes for dimensions.
 stride (tuple of int) – Stride sizes for dimensions.
 dilation (tuple of int) – Dilation sizes for dimensions.
 group (int) – Number of groups of channels. This makes connections across channels sparser by grouping connections along map direction.
 quantize_zero_to (float) – Input value at zero is quantized to this value.
 w_init (nnabla.initializer.BaseInitializer or numpy.ndarray) – Initializer for weight. By default, it is initialized with nnabla.initializer.UniformInitializer within the range determined by nnabla.initializer.calc_uniform_lim_glorot.
 wb_init (nnabla.initializer.BaseInitializer or numpy.ndarray) – Initializer for binary weight. By default, it is initialized with nnabla.initializer.UniformInitializer within the range determined by nnabla.initializer.calc_uniform_lim_glorot.
 b_init (nnabla.initializer.BaseInitializer or numpy.ndarray) – Initializer for bias. By default, it is initialized with zeros if with_bias is True.
 base_axis (int) – Dimensions up to base_axis are treated as the sample dimensions.
 fix_parameters (bool) – When set to True, the weights and biases will not be updated.
 rng (numpy.random.RandomState) – Random generator for Initializer.
 with_bias (bool) – Specify whether to include the bias term.
Returns: Variable
Parameters to be registered
The following variables are registered in a parameter scope "bwn_conv":
 W (need_grad=True): Filter weights in float. (shape: (outmaps, inmaps, *kernel))
 Wb (need_grad=False): Binarized filter weights. (shape: (outmaps, inmaps, *kernel))
 alpha (need_grad=False): Scaling factor \(\alpha\). (shape: (outmaps,))
 b (need_grad=True): Bias vector. (shape: (outmaps,))
Note
If the name option is passed, the parameters are wrapped inside a parameter scope with the specified name, yielding the same results as the following code. This can be used to simplify the code.
with parameter_scope(name):
    output = binary_weight_convolution(<args>)

nnabla.parametric_functions.inq_affine(inp, n_outmaps, base_axis=1, num_bits=4, inq_iterations=(), selection_algorithm='random', seed=-1, w_init=None, i_init=None, b_init=None, fix_parameters=False, rng=None, with_bias=True, name=None)
Incremental Network Quantization Affine Layer
During training, the weights are sequentially quantized to power-of-two values, which allows the training of a multiplier-less network.
Using inq_iterations, one can specify after how many forward passes half of the learnable weights are fixed and quantized to powers-of-two. After reaching the last value in inq_iterations, all weights are fixed.
For more details, please refer to the reference.
Reference: Zhou A, Yao A, Guo Y, Xu L, Chen Y. Incremental network quantization: Towards lossless CNNs with low-precision weights. <https://arxiv.org/abs/1702.03044>
Parameters:
 inp (Variable) – Input N-D array with shape (\(M_0 \times \ldots \times M_{B-1} \times D_B \times \ldots \times D_N\)). Dimensions before and after base_axis are flattened as if it was a matrix.
 n_outmaps (int or tuple of int) – Number of output neurons per data.
 base_axis (int) – Dimensions up to base_axis are treated as the sample dimensions.
 quantize_zero_to (float) – Input value at zero is quantized to this value.
 num_bits (int) – Number of bits per weight. Value has to be larger than 1 as one bit is already used to code the value “0”
 inq_iterations (tuple of int) – Tuple of iteration numbers at which we fix half of the weights.
 selection_algorithm (str) – Chooses algorithm that is used to decide which weights are fixed. (“largest_abs” … fix weights with largest absolute value, “random” … fix weights randomly)
 seed (int) – Random seed for INQ algorithm
 w_init (nnabla.initializer.BaseInitializer or numpy.ndarray) – Initializer for weight. By default, it is initialized with nnabla.initializer.UniformInitializer within the range determined by nnabla.initializer.calc_uniform_lim_glorot.
 i_init (nnabla.initializer.BaseInitializer or numpy.ndarray) – Initializer for indicators (0 … learnable, 1 … fixed). By default, it is initialized with zeros.
 b_init (nnabla.initializer.BaseInitializer or numpy.ndarray) – Initializer for bias. By default, it is initialized with zeros if with_bias is True.
 fix_parameters (bool) – When set to True, the weight and bias will not be updated.
 rng (numpy.random.RandomState) – Random generator for Initializer.
 with_bias (bool) – Specify whether to include the bias term.
Returns: Variable
Parameters to be registered
The following variables are registered in a parameter scope "inq_affine":
 W (need_grad=True): Weight matrix in floating type. (shape: (inmaps, outmaps))
 I (need_grad=False): Binary indicator matrix of fixed weights. (shape: (inmaps, outmaps))
 b (need_grad=True): Bias vector. (shape: (outmaps,))
Note
If the name option is passed, the parameters are wrapped inside a parameter scope with the specified name, yielding the same results as the following code. This can be used to simplify the code.
with parameter_scope(name):
    output = inq_affine(<args>)
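The schedule implied by inq_iterations and the idea of power-of-two quantization can be sketched in plain Python. Both helpers below are hypothetical and simplified: the schedule assumes a milestone takes effect once that many forward passes have run, and the quantizer rounds in the log domain while ignoring the exponent range that num_bits would impose. Neither is the library's exact rule.

```python
import math


def fixed_fraction(forward_passes, inq_iterations):
    """Fraction of weights already fixed after `forward_passes` passes.

    Each milestone fixes half of the remaining learnable weights;
    the last milestone fixes everything that is left.
    """
    learnable = 1.0
    for step, milestone in enumerate(inq_iterations):
        if forward_passes < milestone:
            break
        is_last = step == len(inq_iterations) - 1
        learnable = 0.0 if is_last else learnable / 2
    return 1.0 - learnable


def quantize_to_power_of_two(w):
    """Nearest power of two in the log domain, keeping the sign; 0 stays 0."""
    if w == 0:
        return 0.0
    exponent = round(math.log2(abs(w)))
    return math.copysign(2.0 ** exponent, w)


schedule = (100, 200, 300)   # hypothetical inq_iterations
progress = [fixed_fraction(n, schedule) for n in (50, 150, 250, 300)]
quantized = [quantize_to_power_of_two(w) for w in (0.3, -0.7, 3.0, 0.0)]
```

With three milestones, half of the weights are fixed after the first, three quarters after the second, and all of them after the third.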

nnabla.parametric_functions.inq_convolution(inp, outmaps, kernel, pad=None, stride=None, dilation=None, group=1, num_bits=4, inq_iterations=(), selection_algorithm='random', seed=-1, w_init=None, i_init=None, b_init=None, base_axis=1, fix_parameters=False, rng=None, with_bias=True, name=None)
Incremental Network Quantization Convolution Layer
During training, the weights are sequentially quantized to power-of-two values, which allows the training of a multiplier-less network.
Using inq_iterations, one can specify after how many forward passes half of the learnable weights are fixed and quantized to powers-of-two. After reaching the last value in inq_iterations, all weights are fixed.
For more details, please refer to the reference.
Reference: Zhou A, Yao A, Guo Y, Xu L, Chen Y. Incremental network quantization: Towards lossless CNNs with low-precision weights. <https://arxiv.org/abs/1702.03044>
Parameters:
 inp (Variable) – Input N-D array with shape (\(M_0 \times \ldots \times M_{B-1} \times D_B \times \ldots \times D_N\)). Dimensions before and after base_axis are flattened as if it was a matrix.
 n_outmaps (int or tuple of int) – Number of output neurons per data.
 base_axis (int) – Dimensions up to base_axis are treated as the sample dimensions.
 num_bits (int) – Number of bits per weight. Value has to be larger than 1 as one bit is already used to code the value “0”
 inq_iterations (tuple of int) – Tuple of iteration numbers at which we fix half of the weights.
 selection_algorithm (str) – Chooses algorithm that is used to decide which weights are fixed. (“largest_abs” … fix weights with largest absolute value, “random” … fix weights randomly)
 seed (int) – Random seed for INQ algorithm
 w_init (nnabla.initializer.BaseInitializer or numpy.ndarray) – Initializer for the weight. By default, it is initialized with nnabla.initializer.UniformInitializer within the range determined by nnabla.initializer.calc_uniform_lim_glorot.
 i_init (nnabla.initializer.BaseInitializer or numpy.ndarray) – Initializer for the indicators (0 … learnable, 1 … fixed). By default, it is initialized with zeros.
 b_init (nnabla.initializer.BaseInitializer or numpy.ndarray) – Initializer for the bias. By default, it is initialized with zeros if with_bias is True.
 fix_parameters (bool) – When set to True, the weight and bias will not be updated.
 rng (numpy.random.RandomState) – Random generator for Initializer.
 with_bias (bool) – Specify whether to include the bias term.
Returns:
Parameters to be registered
The following variables are registered in a parameter scope "inq_conv":
 W (need_grad=True) : Filter weights in float. (shape: (outmaps, inmaps, *kernel))
 I (need_grad=False) : Binary indicator matrix of fixed weights. (shape: (outmaps, inmaps, *kernel))
 b (need_grad=True) : Bias vector. (shape: (outmaps,))
Note
If the name option is passed, the parameters become wrapped inside the parameter scope with the specified name, yielding the same results as the following code. This can be used to simplify the code.
with parameter_scope(name):
    output = inq_convolution(<args>)
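The "largest_abs" selection described for selection_algorithm can be illustrated with a small NumPy sketch. It is not NNabla's implementation: the helper name fix_half_largest_abs and the choice to round the half up for odd counts are assumptions made for illustration. The indicator follows the convention above (0 … learnable, 1 … fixed).

```python
import numpy as np

def fix_half_largest_abs(w, indicator):
    """Mark half of the still-learnable weights (indicator == 0) as fixed,
    choosing those with the largest absolute value (hypothetical helper)."""
    flat = indicator.ravel().copy()
    learnable = np.flatnonzero(flat == 0)
    n_fix = (learnable.size + 1) // 2  # assumption: round up for odd counts
    # Order the learnable positions by |w|, largest first.
    order = np.argsort(-np.abs(w.ravel()[learnable]), kind="stable")
    flat[learnable[order[:n_fix]]] = 1
    return flat.reshape(indicator.shape)

w = np.array([0.9, -0.1, 0.4, -0.7, 0.05, 0.3, -0.6, 0.2])
ind0 = np.zeros_like(w)
ind1 = fix_half_largest_abs(w, ind0)  # first step: 4 of 8 weights fixed
ind2 = fix_half_largest_abs(w, ind1)  # second step: 2 of the remaining 4 fixed
```

Each call corresponds to one entry of inq_iterations: the fixed set only grows, and the weights that stay learnable keep being trained.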

nnabla.parametric_functions.fixed_point_quantized_affine(inp, n_outmaps, base_axis=1, w_init=None, b_init=None, fix_parameters=False, rng=None, with_bias=True, quantize_w=True, sign_w=True, n_w=8, delta_w=0.0625, ste_fine_grained_w=True, quantize_b=True, sign_b=True, n_b=8, delta_b=0.0625, ste_fine_grained_b=True, name=None)
Fixed-Point Quantized Affine.
Fixed-Point Quantized Affine is the affine function, except the definition of the inner product is modified. The input-output relation of this function is as follows:
\[y_j = \sum_{i} Q(w_{ji}) x_i,\]
where \(Q(w_{ji})\) is the fixed-point quantization function.
Note
1) If you would like to share weights between some layers, please make sure to share the standard, floating value weights (weight) and not the quantized weights (quantized weight).
2) The weights and the quantized weights become synced only after forward() is called, and not after a call to backward(). To access the parameters of the network, remember to call forward() once before doing so, otherwise the float weights and the quantized weights will not be in sync.
3) CPU and GPU implementations now use float value for quantized weight, since this function is only for simulation purposes.
Parameters:  inp (Variable) – Input N-D array with shape (\(M_0 \times \ldots \times M_{B-1} \times D_B \times \ldots \times D_N\)). Dimensions before and after base_axis are flattened as if it is a matrix.
 n_outmaps (int or tuple of int) – Number of output neurons per data.
 base_axis (int) – Dimensions up to base_axis are treated as the sample dimensions.
 w_init (nnabla.initializer.BaseInitializer or numpy.ndarray) – Initializer for weight. By default, it is initialized with nnabla.initializer.UniformInitializer within the range determined by nnabla.initializer.calc_uniform_lim_glorot.
 b_init (nnabla.initializer.BaseInitializer or numpy.ndarray) – Initializer for bias. By default, it is initialized with zeros if with_bias is True.
 fix_parameters (bool) – When set to True, the weights and biases will not be updated.
 rng (numpy.random.RandomState) – Random generator for Initializer.
 with_bias (bool) – Specify whether to include the bias term.
 quantize_w (bool) – Quantize weights if True.
 sign_w (bool) – Use signed quantization if True.
 n_w (int) – Bit width used for weight.
 delta_w (float) – Step size for weight.
 ste_fine_grained_w (bool) – STE is fine-grained if True.
 quantize_b (bool) – Quantize bias if True.
 sign_b (bool) – Use signed quantization if True.
 n_b (int) – Bit width used for bias.
 delta_b (float) – Step size for bias.
 ste_fine_grained_b (bool) – STE is fine-grained if True.
Returns: \((B + 1)\)-D array. (\(M_0 \times \ldots \times M_{B-1} \times L\))
Return type:
Parameters to be registered
The following variables are registered in a parameter scope "fp_quantized_affine":
 W (need_grad=True) : Weight matrix in float. (shape: (inmaps, outmaps))
 b (need_grad=True) : Bias vector in float. (shape: (outmaps,))
 W_q (need_grad=False) : Quantized weights. (shape: (inmaps, outmaps))
 b_q (need_grad=False) : Quantized biases. (shape: (outmaps,))
Note
If the name option is passed, the parameters become wrapped inside the parameter scope with the specified name, yielding the same results as the following code. This can be used to simplify the code.
with parameter_scope(name):
    output = fixed_point_quantized_affine(<args>)
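To make the role of \(Q\), n_w and delta_w concrete, the following NumPy sketch quantizes a weight matrix and applies it in an affine map. The rounding and clipping rule used here (round to the nearest multiple of delta, clip to \(\pm(2^{n-1}-1)\delta\) when signed) is an assumption about the quantizer, and both function names are hypothetical. It is a simulation sketch, not the library code.

```python
import numpy as np

def fixed_point_quantize(w, sign=True, n=8, delta=0.0625):
    """Round w to a multiple of delta and clip to the assumed representable range."""
    if sign:
        q_max = (2 ** (n - 1) - 1) * delta
        q_min = -q_max
    else:
        q_max = (2 ** n - 1) * delta
        q_min = 0.0
    q = np.sign(w) * np.floor(np.abs(w) / delta + 0.5) * delta
    return np.clip(q, q_min, q_max)

def quantized_affine(x, w, b):
    """y = x Q(W) + Q(b), with W stored as (inmaps, outmaps) as in the list above."""
    return x @ fixed_point_quantize(w) + fixed_point_quantize(b)

w = np.array([[0.30, -0.10], [10.0, 0.03]])
w_q = fixed_point_quantize(w)  # 10.0 saturates at 127 * 0.0625
y = quantized_affine(np.ones((1, 2)), w, np.zeros(2))
```

The quantized values are kept as floats, which matches note 3) above: the function simulates fixed-point arithmetic rather than storing integers.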

nnabla.parametric_functions.fixed_point_quantized_convolution(inp, outmaps, kernel, pad=None, stride=None, dilation=None, group=1, w_init=None, b_init=None, base_axis=1, fix_parameters=False, rng=None, with_bias=True, quantize_w=True, sign_w=True, n_w=8, delta_w=0.0625, ste_fine_grained_w=True, quantize_b=True, sign_b=True, n_b=8, delta_b=0.0625, ste_fine_grained_b=True, name=None)
Fixed-Point Quantized Convolution.
Fixed-Point Quantized Convolution is the convolution function, except the definition of the inner product is modified. The input-output relation of this function is as follows:
\[y_{n, a, b} = \sum_{m} \sum_{i} \sum_{j} Q(w_{n, m, i, j}) x_{m, a + i, b + j},\]
where \(Q(w_{n, m, i, j})\) is the fixed-point quantization function.
Note
1) If you would like to share weights between some layers, please make sure to share the standard, floating value weights (weight) and not the quantized weights (quantized weight).
2) The weights and the quantized weights become synced only after forward() is called, and not after a call to backward(). To access the parameters of the network, remember to call forward() once before doing so, otherwise the float weights and the quantized weights will not be in sync.
3) CPU and GPU implementations now use float value for quantized weight, since this function is only for simulation purposes.
Parameters:  inp (Variable) – N-D array.
 outmaps (int) – Number of convolution kernels (which is equal to the number of output channels). For example, to apply convolution on an input with 16 types of filters, specify 16.
 kernel (tuple of int) – Convolution kernel size. For example, to apply convolution on an image with a 3 (height) by 5 (width) two-dimensional kernel, specify (3,5).
 pad (tuple of int) – Padding sizes for dimensions.
 stride (tuple of int) – Stride sizes for dimensions.
 dilation (tuple of int) – Dilation sizes for dimensions.
 group (int) – Number of groups of channels. This makes connections across channels more sparse by grouping connections along map direction.
 w_init (nnabla.initializer.BaseInitializer or numpy.ndarray) – Initializer for weight. By default, it is initialized with nnabla.initializer.UniformInitializer within the range determined by nnabla.initializer.calc_uniform_lim_glorot.
 b_init (nnabla.initializer.BaseInitializer or numpy.ndarray) – Initializer for bias. By default, it is initialized with zeros if with_bias is True.
 base_axis (int) – Dimensions up to base_axis are treated as the sample dimensions.
 fix_parameters (bool) – When set to True, the weights and biases will not be updated.
 rng (numpy.random.RandomState) – Random generator for Initializer.
 with_bias (bool) – Specify whether to include the bias term.
 quantize_w (bool) – Quantize weights if True.
 sign_w (bool) – Use signed quantization if True.
 n_w (int) – Bit width used for weight.
 delta_w (float) – Step size for weight.
 ste_fine_grained_w (bool) – STE is fine-grained if True.
 quantize_b (bool) – Quantize bias if True.
 sign_b (bool) – Use signed quantization if True.
 n_b (int) – Bit width used for bias.
 delta_b (float) – Step size for bias.
 ste_fine_grained_b (bool) – STE is fine-grained if True.
Returns: N-D array.
Return type:
Parameters to be registered
The following variables are registered in a parameter scope "fp_quantized_conv":
 W (need_grad=True) : Filter weights in float. (shape: (outmaps, inmaps // group, *kernel))
 b (need_grad=True) : Bias vector in float. (shape: (outmaps,))
 W_q (need_grad=False) : Quantized weights. (shape: (outmaps, inmaps // group, *kernel))
 b_q (need_grad=False) : Quantized biases. (shape: (outmaps,))
Note
If the name option is passed, the parameters become wrapped inside the parameter scope with the specified name, yielding the same results as the following code. This can be used to simplify the code.
with parameter_scope(name):
    output = fixed_point_quantized_convolution(<args>)

nnabla.parametric_functions.pow2_quantized_affine(inp, n_outmaps, base_axis=1, w_init=None, b_init=None, fix_parameters=False, rng=None, with_bias=True, quantize_w=True, sign_w=True, with_zero_w=False, n_w=8, m_w=2, ste_fine_grained_w=True, quantize_b=True, sign_b=True, with_zero_b=False, n_b=8, m_b=2, ste_fine_grained_b=True, name=None)
Pow2 Quantized Affine.
Pow2 Quantized Affine is the affine function, except the definition of the inner product is modified. The input-output relation of this function is as follows:
\[y_j = \sum_{i} Q(w_{ji}) x_i,\]
where \(Q(w_{ji})\) is the power-of-2 quantization function.
Note
1) If you would like to share weights between some layers, please make sure to share the standard, floating value weights (weight) and not the quantized weights (quantized weight).
2) The weights and the quantized weights become synced only after forward() is called, and not after a call to backward(). To access the parameters of the network, remember to call forward() once before doing so, otherwise the float weights and the quantized weights will not be in sync.
3) Quantized values are stored as floating point numbers for quantized weight, since this function is only for simulation purposes.
Parameters:  inp (Variable) – Input N-D array with shape (\(M_0 \times \ldots \times M_{B-1} \times D_B \times \ldots \times D_N\)). Dimensions before and after base_axis are flattened as if it is a matrix.
 n_outmaps (int or tuple of int) – Number of output neurons per data.
 base_axis (int) – Dimensions up to base_axis are treated as the sample dimensions.
 w_init (nnabla.initializer.BaseInitializer or numpy.ndarray) – Initializer for weight. By default, it is initialized with nnabla.initializer.UniformInitializer within the range determined by nnabla.initializer.calc_uniform_lim_glorot.
 b_init (nnabla.initializer.BaseInitializer or numpy.ndarray) – Initializer for bias. By default, it is initialized with zeros if with_bias is True.
 fix_parameters (bool) – When set to True, the weights and biases will not be updated.
 rng (numpy.random.RandomState) – Random generator for Initializer.
 with_bias (bool) – Specify whether to include the bias term.
 quantize_w (bool) – Quantize weights if True.
 sign_w (bool) – Use signed quantization if True.
 with_zero_w (bool) – Use zero as a quantized value if True. Default is False.
 n_w (int) – Bit width used for weight.
 m_w (int) – \(2^m\) is upper bound and \(-2^m\) is lower bound for weights. Default is 2.
 ste_fine_grained_w (bool) – STE is fine-grained if True.
 quantize_b (bool) – Quantize bias if True.
 sign_b (bool) – Use signed quantization if True.
 with_zero_b (bool) – Use zero as a quantized value if True. Default is False.
 n_b (int) – Bit width used for bias.
 m_b (int) – \(2^m\) is upper bound and \(-2^m\) is lower bound for bias. Default is 2.
 ste_fine_grained_b (bool) – STE is fine-grained if True.
Returns: \((B + 1)\)-D array. (\(M_0 \times \ldots \times M_{B-1} \times L\))
Return type:
Parameters to be registered
The following variables are registered in a parameter scope "pow2_quantized_affine":
 W (need_grad=True) : Weight matrix in float. (shape: (inmaps, outmaps))
 b (need_grad=True) : Bias vector in float. (shape: (outmaps,))
 W_q (need_grad=False) : Quantized weights. (shape: (inmaps, outmaps))
 b_q (need_grad=False) : Quantized biases. (shape: (outmaps,))
Note
If the name option is passed, the parameters become wrapped inside the parameter scope with the specified name, yielding the same results as the following code. This can be used to simplify the code.
with parameter_scope(name):
    output = pow2_quantized_affine(<args>)
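The effect of n_w and m_w can be sketched in NumPy for the signed case without zero (sign_w=True, with_zero_w=False). The exponent range used below, from \(m - (2^{n-1} - 1)\) up to \(m\), and the rounding in the log domain are assumptions about the quantizer. The function name pow2_quantize_sketch is hypothetical.

```python
import numpy as np

def pow2_quantize_sketch(w, n=8, m=2):
    """Map each weight to sign(w) * 2**k, with the exponent k clipped to an assumed range."""
    k_max = m
    k_min = m - (2 ** (n - 1) - 1)  # assumption: n - 1 bits index the exponent
    # Clamp tiny magnitudes so that log2 never sees zero.
    safe = np.maximum(np.abs(w), 2.0 ** k_min)
    k = np.clip(np.round(np.log2(safe)), k_min, k_max)
    sign = np.where(w < 0, -1.0, 1.0)
    return sign * 2.0 ** k

w = np.array([0.3, -3.0, 100.0, 0.0])
w_q = pow2_quantize_sketch(w, n=4, m=2)
```

With n=4 and m=2 the magnitudes are restricted to \(2^{-5}, \ldots, 2^{2}\), so 100.0 saturates at 4.0 and 0.0 is pushed to the smallest magnitude because zero is not a representable value in this variant.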

nnabla.parametric_functions.pow2_quantized_convolution(inp, outmaps, kernel, pad=None, stride=None, dilation=None, group=1, w_init=None, b_init=None, base_axis=1, fix_parameters=False, rng=None, with_bias=True, quantize_w=True, with_zero_w=False, sign_w=True, n_w=8, m_w=2, ste_fine_grained_w=True, quantize_b=True, with_zero_b=False, sign_b=True, n_b=8, m_b=2, ste_fine_grained_b=True, name=None)
Pow2 Quantized Convolution.
Pow2 Quantized Convolution is the convolution function, except the definition of the inner product is modified. The input-output relation of this function is as follows:
\[y_{n, a, b} = \sum_{m} \sum_{i} \sum_{j} Q(w_{n, m, i, j}) x_{m, a + i, b + j},\]
where \(Q(w_{n, m, i, j})\) is the power-of-2 quantization function.
Note
1) If you would like to share weights between some layers, please make sure to share the standard, floating value weights (weight) and not the quantized weights (quantized weight).
2) The weights and the quantized weights become synced only after forward() is called, and not after a call to backward(). To access the parameters of the network, remember to call forward() once before doing so, otherwise the float weights and the quantized weights will not be in sync.
3) Quantized values are stored as floating point numbers for quantized weight, since this function is only for simulation purposes.
Parameters:  inp (Variable) – N-D array.
 outmaps (int) – Number of convolution kernels (which is equal to the number of output channels). For example, to apply convolution on an input with 16 types of filters, specify 16.
 kernel (tuple of int) – Convolution kernel size. For example, to apply convolution on an image with a 3 (height) by 5 (width) two-dimensional kernel, specify (3,5).
 pad (tuple of int) – Padding sizes for dimensions.
 stride (tuple of int) – Stride sizes for dimensions.
 dilation (tuple of int) – Dilation sizes for dimensions.
 group (int) – Number of groups of channels. This makes connections across channels more sparse by grouping connections along map direction.
 w_init (nnabla.initializer.BaseInitializer or numpy.ndarray) – Initializer for weight. By default, it is initialized with nnabla.initializer.UniformInitializer within the range determined by nnabla.initializer.calc_uniform_lim_glorot.
 b_init (nnabla.initializer.BaseInitializer or numpy.ndarray) – Initializer for bias. By default, it is initialized with zeros if with_bias is True.
 base_axis (int) – Dimensions up to base_axis are treated as the sample dimensions.
 fix_parameters (bool) – When set to True, the weights and biases will not be updated.
 rng (numpy.random.RandomState) – Random generator for Initializer.
 with_bias (bool) – Specify whether to include the bias term.
 quantize_w (bool) – Quantize weights if True.
 sign_w (bool) – Use signed quantization if True.
 with_zero_w (bool) – Use zero as a quantized value if True. Default is False.
 n_w (int) – Bit width used for weight.
 m_w (int) – \(2^m\) is upper bound and \(-2^m\) is lower bound for weights. Default is 2.
 ste_fine_grained_w (bool) – STE is fine-grained if True.
 quantize_b (bool) – Quantize bias if True.
 sign_b (bool) – Use signed quantization if True.
 with_zero_b (bool) – Use zero as a quantized value if True. Default is False.
 n_b (int) – Bit width used for bias.
 m_b (int) – \(2^m\) is upper bound and \(-2^m\) is lower bound for bias. Default is 2.
 ste_fine_grained_b (bool) – STE is fine-grained if True.
Returns: N-D array.
Return type:
Parameters to be registered
The following variables are registered in a parameter scope "pow2_quantized_conv":
 W (need_grad=True) : Filter weights in float. (shape: (outmaps, inmaps // group, *kernel))
 b (need_grad=True) : Bias vector in float. (shape: (outmaps,))
 W_q (need_grad=False) : Quantized weights. (shape: (outmaps, inmaps // group, *kernel))
 b_q (need_grad=False) : Quantized biases. (shape: (outmaps,))
Note
If the name option is passed, the parameters become wrapped inside the parameter scope with the specified name, yielding the same results as the following code. This can be used to simplify the code.
with parameter_scope(name):
    output = pow2_quantized_convolution(<args>)

nnabla.parametric_functions.pruned_affine(inp, n_outmaps, base_axis=1, w_init=None, b_init=None, fix_parameters=False, rng=None, with_bias=True, prune_w=True, rate_w=0.9, prune_b=True, rate_b=0.9, name=None)
Pruned Affine.
Pruned Affine is the affine function, except the definition of the inner product is modified. The input-output relation of this function is as follows:
\[y_j = \sum_{i} Q(w_{ji}) x_i,\]
where \(Q(w_{ji})\) is the pruning function, i.e., F.prune.
Note
1) If you would like to share weights between some layers, please make sure to share the standard, floating value weights (weight) and not the quantized weights (quantized weight).
2) The weights and the quantized weights become synced only after forward() is called, and not after a call to backward(). To access the parameters of the network, remember to call forward() once before doing so, otherwise the float weights and the quantized weights will not be in sync.
3) CPU and GPU implementations now use float value for quantized weight, since this function is only for simulation purposes.
Parameters:  inp (Variable) – Input N-D array with shape (\(M_0 \times \ldots \times M_{B-1} \times D_B \times \ldots \times D_N\)). Dimensions before and after base_axis are flattened as if it is a matrix.
 n_outmaps (int or tuple of int) – Number of output neurons per data.
 base_axis (int) – Dimensions up to base_axis are treated as the sample dimensions.
 w_init (nnabla.initializer.BaseInitializer or numpy.ndarray) – Initializer for weight.
 b_init (nnabla.initializer.BaseInitializer or numpy.ndarray) – Initializer for bias.
 fix_parameters (bool) – When set to True, the weights and biases will not be updated.
 rng (numpy.random.RandomState) – Random generator for Initializer.
 with_bias (bool) – Specify whether to include the bias term.
 prune_w (bool) – Prune weights if True.
 rate_w (float) – Pruning rate for weights.
 prune_b (bool) – Prune bias if True.
 rate_b (float) – Pruning rate for bias.
Returns: \((B + 1)\)-D array. (\(M_0 \times \ldots \times M_{B-1} \times L\))
Return type:
Parameters to be registered
The following variables are registered in a parameter scope "pruned_affine":
 W (need_grad=True) : Weight matrix in float. (shape: (inmaps, outmaps))
 b (need_grad=True) : Bias vector in float. (shape: (outmaps,))
 W_q (need_grad=False) : Quantized weights. (shape: (inmaps, outmaps))
 b_q (need_grad=False) : Quantized biases. (shape: (outmaps,))
Note
If the name option is passed, the parameters become wrapped inside the parameter scope with the specified name, yielding the same results as the following code. This can be used to simplify the code.
with parameter_scope(name):
    output = pruned_affine(<args>)
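As an illustration of what a pruning rate such as rate_w=0.9 means, the sketch below zeroes the smallest-magnitude entries of an array. The threshold rule (the element at position int((size - 1) * rate) of the sorted magnitudes) is an assumption about how F.prune picks its cutoff, and prune_sketch is a hypothetical name.

```python
import numpy as np

def prune_sketch(w, rate=0.9):
    """Zero out entries whose magnitude is below the assumed rate-based threshold."""
    mags = np.sort(np.abs(w).ravel())
    threshold = mags[int((mags.size - 1) * rate)]
    return np.where(np.abs(w) < threshold, 0.0, w)

w = np.array([0.5, -0.05, 0.2, -0.9, 0.01, 0.3, 0.1, -0.4, 0.7, -0.6])
w_p = prune_sketch(w, rate=0.9)  # only the two largest magnitudes survive
```

The float array w plays the role of W in the list above and w_p the role of W_q: training would keep updating w while the forward pass uses the pruned copy.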

nnabla.parametric_functions.pruned_convolution(inp, outmaps, kernel, pad=None, stride=None, dilation=None, group=1, w_init=None, b_init=None, base_axis=1, fix_parameters=False, rng=None, with_bias=True, prune_w=True, rate_w=0.9, prune_b=True, rate_b=0.9, name=None)
Pruned Convolution.
Pruned Convolution is the convolution function, except the definition of the inner product is modified. The input-output relation of this function is as follows:
\[y_{n, a, b} = \sum_{m} \sum_{i} \sum_{j} Q(w_{n, m, i, j}) x_{m, a + i, b + j},\]
where \(Q(w_{n, m, i, j})\) is the pruning function, i.e., F.prune.
Note
1) If you would like to share weights between some layers, please make sure to share the standard, floating value weights (weight) and not the quantized weights (quantized weight).
2) The weights and the quantized weights become synced only after forward() is called, and not after a call to backward(). To access the parameters of the network, remember to call forward() once before doing so, otherwise the float weights and the quantized weights will not be in sync.
3) CPU and GPU implementations now use float value for quantized weight, since this function is only for simulation purposes.
Parameters:  inp (Variable) – N-D array.
 outmaps (int) – Number of convolution kernels (which is equal to the number of output channels). For example, to apply convolution on an input with 16 types of filters, specify 16.
 kernel (tuple of int) – Convolution kernel size. For example, to apply convolution on an image with a 3 (height) by 5 (width) two-dimensional kernel, specify (3,5).
 pad (tuple of int) – Padding sizes for dimensions.
 stride (tuple of int) – Stride sizes for dimensions.
 dilation (tuple of int) – Dilation sizes for dimensions.
 group (int) – Number of groups of channels. This makes connections across channels more sparse by grouping connections along map direction.
 w_init (nnabla.initializer.BaseInitializer or numpy.ndarray) – Initializer for weight.
 b_init (nnabla.initializer.BaseInitializer or numpy.ndarray) – Initializer for bias.
 base_axis (int) – Dimensions up to base_axis are treated as the sample dimensions.
 fix_parameters (bool) – When set to True, the weights and biases will not be updated.
 rng (numpy.random.RandomState) – Random generator for Initializer.
 with_bias (bool) – Specify whether to include the bias term.
 prune_w (bool) – Prune weights if True.
 rate_w (float) – Pruning rate for weights.
 prune_b (bool) – Prune bias if True.
 rate_b (float) – Pruning rate for bias.
Returns: N-D array.
Return type:
Parameters to be registered
The following variables are registered in a parameter scope "pruned_conv":
 W (need_grad=True) : Filter weights in float. (shape: (outmaps, inmaps // group, *kernel))
 b (need_grad=True) : Bias vector in float. (shape: (outmaps,))
 W_q (need_grad=False) : Quantized weights. (shape: (outmaps, inmaps // group, *kernel))
 b_q (need_grad=False) : Quantized biases. (shape: (outmaps,))
Note
If the name option is passed, the parameters become wrapped inside the parameter scope with the specified name, yielding the same results as the following code. This can be used to simplify the code.
with parameter_scope(name):
    output = pruned_convolution(<args>)

nnabla.parametric_functions.lstm_cell(x, h, c, state_size, w_init=None, b_init=None, fix_parameters=False, name=None)
Long Short-Term Memory.
Long Short-Term Memory, or LSTM, is a building block for recurrent neural network (RNN) layers. An LSTM unit consists of a cell and input, output, and forget gates, whose functions are defined as follows:
\[\begin{split}f_t&=\sigma(W_fx_t+U_fh_{t-1}+b_f) \\ i_t&=\sigma(W_ix_t+U_ih_{t-1}+b_i) \\ o_t&=\sigma(W_ox_t+U_oh_{t-1}+b_o) \\ c_t&=f_t\odot c_{t-1}+i_t\odot\tanh(W_cx_t+U_ch_{t-1}+b_c) \\ h_t&=o_t\odot\tanh(c_t).\end{split}\]
References
S. Hochreiter, and J. Schmidhuber. “Long Short-Term Memory.” Neural Computation. 1997.
Parameters:  x (Variable) – Input N-D array with shape (batch_size, input_size).
 h (Variable) – Input N-D array with shape (batch_size, state_size).
 c (Variable) – Input N-D array with shape (batch_size, state_size).
 state_size (int) – Internal state size is set to state_size.
 w_init (nnabla.initializer.BaseInitializer or numpy.ndarray, optional) – Initializer for weight. By default, it is initialized with nnabla.initializer.UniformInitializer within the range determined by nnabla.initializer.calc_uniform_lim_glorot.
 b_init (nnabla.initializer.BaseInitializer or numpy.ndarray, optional) – Initializer for bias. By default, it is initialized with zeros if with_bias is True.
 fix_parameters (bool) – When set to True, the weights and biases will not be updated.
Returns:
Parameters to be registered
The following variables are registered in a parameter scope "lstm":
 affine/W (need_grad=True) : Stacked weight matrices of LSTM block. (shape: (inmaps, 4, state_size))
 affine/b (need_grad=True) : Stacked bias vectors of LSTM block. (shape: (4, state_size,))
Note
If the name option is passed, the parameters become wrapped inside the parameter scope with the specified name, yielding the same results as the following code. This can be used to simplify the code.
with parameter_scope(name):
    output = lstm_cell(<args>)
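The gate equations above can be traced step by step with a small NumPy sketch. It keeps separate W and U matrices per gate, exactly as the equations are written, rather than the single stacked affine/W that the function registers. The name lstm_cell_numpy and the gate ordering (f, i, o, c) along the first axis are choices made for this illustration only.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_cell_numpy(x, h, c, W, U, b):
    """One LSTM step following the equations above.

    W: (4, input_size, state_size), U: (4, state_size, state_size),
    b: (4, state_size); index 0..3 selects the f, i, o and c terms.
    """
    pre = [x @ W[k] + h @ U[k] + b[k] for k in range(4)]
    f, i, o = (sigmoid(p) for p in pre[:3])
    c_new = f * c + i * np.tanh(pre[3])  # c_t
    h_new = o * np.tanh(c_new)           # h_t
    return h_new, c_new

batch_size, input_size, state_size = 2, 5, 3
rng = np.random.RandomState(0)
x = rng.randn(batch_size, input_size)
h = np.zeros((batch_size, state_size))
c = np.zeros((batch_size, state_size))
W = rng.randn(4, input_size, state_size) * 0.1
U = rng.randn(4, state_size, state_size) * 0.1
b = np.zeros((4, state_size))
h, c = lstm_cell_numpy(x, h, c, W, U, b)
```

Both returned arrays have shape (batch_size, state_size), matching the shapes required for h and c in the parameter list.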

class nnabla.parametric_functions.LSTMCell(batch_size, state_size, h=None, c=None, name=None)
__call__(x, w_init, b_init, fix_parameters)
Updates h and c by calling lstm function.
Parameters:  x (Variable) – Input N-D array with shape (batch_size, input_size).
 w_init (nnabla.initializer.BaseInitializer or numpy.ndarray, optional) – Initializer for weight. By default, it is initialized with nnabla.initializer.UniformInitializer within the range determined by nnabla.initializer.calc_uniform_lim_glorot.
 b_init (nnabla.initializer.BaseInitializer or numpy.ndarray, optional) – Initializer for bias. By default, it is initialized with zeros if with_bias is True.
 fix_parameters (bool) – When set to True, the weights and biases will not be updated.


nnabla.parametric_functions.spectral_norm(w, dim=0, itr=1, eps=1e-12, test=False, u_init=None, fix_parameters=True, name=None)
Spectral Normalization.
\[W_{sn} = \frac{W}{\sigma(W)},\]
where \(W\) is the input matrix and \(\sigma(W)\) is the spectral norm of \(W\). The spectral norm is approximately computed by the power iteration.
References
Takeru Miyato, Toshiki Kataoka, Masanori Koyama, Yuichi Yoshida, “Spectral Normalization for Generative Adversarial Networks”, International Conference on Learning Representations. 2018.
Parameters:  w (Variable) – Input N-D array. This is normally a network parameter.
 dim (int) – Output dimension. Default is 0. If the dimension is not 0, then the specified dimension becomes the left-most dimension by transposing.
 itr (int) – Number of iterations. Default is 1.
 eps (float) – Epsilon for the normalization. Default is 1e-12.
 test (bool) – Use test mode. Default is False.
Returns: Spectrally normalized \(W_{sn}\) with the same shape as \(W\).
Return type:
Example
import numpy as np
import nnabla as nn
import nnabla.parametric_functions as PF

b, c, h, w = 4, 64, 32, 32

# Spectrally normalized convolution (c output maps and a 3x3 kernel are illustrative choices)
apply_w = lambda w: PF.spectral_norm(w, dim=0)
h = nn.Variable.from_numpy_array(np.random.randn(b, c, h, w))
h = PF.convolution(h, c, (3, 3), with_bias=False, apply_w=apply_w)

# Spectrally normalized affine (c output neurons is an illustrative choice)
apply_w = lambda w: PF.spectral_norm(w, dim=1)
h = nn.Variable.from_numpy_array(np.random.randn(b, c))
h = PF.affine(h, c, with_bias=False, apply_w=apply_w)

# Spectrally normalized embed (embed expects integer indices; c features is illustrative)
apply_w = lambda w: PF.spectral_norm(w, dim=1)
h = nn.Variable.from_numpy_array(np.random.randint(0, c, size=(b,)))
h = PF.embed(h, c, c, apply_w=apply_w)
Parameters to be registered
The following variables are registered in a parameter scope "spectral-norm":
 W_sn (need_grad=False) : Spectrally normalized weight matrix. (shape: w.shape)
 u (need_grad=False) : Singular vector. (shape: (w.shape[dim], ))
Note
If the name option is passed, the parameters become wrapped inside the parameter scope with the specified name, yielding the same results as the following code. This can be used to simplify the code.
with parameter_scope(name):
    output = spectral_norm(<args>)
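The power iteration mentioned above can be sketched for a plain 2-D matrix as follows. This is a generic NumPy illustration of the technique, not the library code: the function name spectral_normalize is hypothetical, and an N-D weight is assumed to be reshaped to (w.shape[dim], -1) before the iteration. The vector u has one entry per row, matching the registered shape (w.shape[dim], ).

```python
import numpy as np

def spectral_normalize(w, u, itr=1, eps=1e-12):
    """Return (w / sigma(w), updated u) for a 2-D matrix w using power iteration."""
    for _ in range(itr):
        v = w.T @ u
        v = v / (np.linalg.norm(v) + eps)
        u = w @ v
        u = u / (np.linalg.norm(u) + eps)
    sigma = u @ w @ v  # estimate of the largest singular value
    return w / sigma, u

# A matrix whose singular values (3 and 1) are known in advance.
w = np.array([[3.0, 0.0], [0.0, 1.0], [0.0, 0.0]])
u0 = np.ones(3)
w_sn, u = spectral_normalize(w, u0, itr=20)
```

Keeping u between calls is what makes a single iteration per training step (itr=1) workable in practice: each step refines the estimate left by the previous one.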
Parameter Initializer¶
Some of the parametric functions optionally take the parameter initializers listed below.

class nnabla.initializer.BaseInitializer
Base class of the parameter initializer.
__call__(shape)
Generates an array with an initializer.
Parameters: shape (tuple of int) – Shape of the numpy.ndarray to be created.
Returns: Array.
Return type: numpy.ndarray
Note
Subclasses of BaseInitializer must override this method.
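The calling convention can be shown with a stand-alone class that follows the same protocol: it is called with a shape and returns a numpy.ndarray of that shape. ScaledOnesInitializer is a hypothetical name, and the class deliberately does not import nnabla so that the sketch runs on its own. In real code it would subclass nnabla.initializer.BaseInitializer.

```python
import numpy as np

class ScaledOnesInitializer:
    """Hypothetical initializer that fills the array with a single value."""

    def __init__(self, scale=1.0):
        self.scale = scale

    def __call__(self, shape):
        # The contract: take a shape, return a numpy.ndarray of that shape.
        return np.full(shape, self.scale, dtype=np.float32)

init = ScaledOnesInitializer(0.5)
arr = init((2, 3))
```

An object like init is what the w_init and b_init arguments of the parametric functions accept, alongside a ready-made numpy.ndarray.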


class nnabla.initializer.ConstantInitializer(value=0)
Bases: nnabla.initializer.BaseInitializer
Generates a constant valued array.
Parameters: value (float) – A constant value.
Example:
import nnabla as nn
import nnabla.parametric_functions as PF
import nnabla.initializer as I

x = nn.Variable([60,1,28,28])
w = I.ConstantInitializer(0.1)
b = I.ConstantInitializer()  # this generates constant valued array of default value 0
h = PF.convolution(x, 64, [3, 3], w_init=w, b_init=b, pad=[1, 1], name='conv')

class nnabla.initializer.NormalInitializer(sigma=1.0, rng=None)
Bases: nnabla.initializer.BaseInitializer
Generates a random array from a specified normal distribution.
\[\mathbf x \sim {\cal N} (\mathbf 0, \sigma^2 \mathbf I)\]
Parameters:  sigma (float) – \(\sigma\).
 rng (numpy.random.RandomState) – Random number generator.
Example:
import nnabla as nn
import nnabla.parametric_functions as PF
import nnabla.initializer as I

x = nn.Variable([60,1,28,28])
w = I.NormalInitializer(5e-5)
b = I.NormalInitializer(0.0)
h = PF.convolution(x, 64, [3, 3], w_init=w, b_init=b, pad=[1, 1], name='conv')

class nnabla.initializer.UniformInitializer(lim=(-1, 1), rng=None)
Bases: nnabla.initializer.BaseInitializer
Generates a random array from a specified uniform distribution.
\[\mathbf x \sim {\cal U} (a, b)\]
Parameters:  lim (tuple of float) – A tuple of two floats, \((a, b)\).
 rng (numpy.random.RandomState) – Random number generator.
Example:
import nnabla as nn
import nnabla.parametric_functions as PF
import nnabla.initializer as I

x = nn.Variable([60,1,28,28])
w = I.UniformInitializer()  # this generates uniform distribution within the default range of (-1,1)
b = I.UniformInitializer((-0.5,0.5))
h = PF.convolution(x, 64, [3, 3], w_init=w, b_init=b, pad=[1, 1], name='conv')

nnabla.initializer.calc_normal_std_he_forward(inmaps, outmaps, kernel=(1, 1))
Calculates the standard deviation proposed by He et al.
\[\sigma = \sqrt{\frac{2}{NK}}\]
Parameters:
Example:
import nnabla as nn
import nnabla.parametric_functions as PF
import nnabla.initializer as I

x = nn.Variable([60,1,28,28])
s = I.calc_normal_std_he_forward(x.shape[1],64)
w = I.NormalInitializer(s)
b = I.ConstantInitializer(0)
h = PF.convolution(x, 64, [3, 3], w_init=w, b_init=b, pad=[1, 1], name='conv')

nnabla.initializer.calc_normal_std_he_backward(inmaps, outmaps, kernel=(1, 1))
Calculates the standard deviation of He et al. (backward case).
\[\sigma = \sqrt{\frac{2}{MK}}\]
Parameters:
Example:
import nnabla as nn
import nnabla.parametric_functions as PF
import nnabla.initializer as I

x = nn.Variable([60,1,28,28])
s = I.calc_normal_std_he_backward(x.shape[1],64)
w = I.NormalInitializer(s)
b = I.ConstantInitializer(0)
h = PF.convolution(x, 64, [3, 3], w_init=w, b_init=b, pad=[1, 1], name='conv')
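The two He formulas differ only in which map count appears in the denominator. The sketch below evaluates both in plain Python, assuming \(N\) is inmaps, \(M\) is outmaps and \(K\) is the product of the kernel dimensions. These symbol meanings are not spelled out in this section, so treat them as assumptions, and the function names are hypothetical.

```python
import math

def he_std_forward(inmaps, outmaps, kernel=(1, 1)):
    """sigma = sqrt(2 / (N * K)), assuming N = inmaps and K = prod(kernel)."""
    return math.sqrt(2.0 / (inmaps * math.prod(kernel)))

def he_std_backward(inmaps, outmaps, kernel=(1, 1)):
    """sigma = sqrt(2 / (M * K)), assuming M = outmaps and K = prod(kernel)."""
    return math.sqrt(2.0 / (outmaps * math.prod(kernel)))

# A 3x3 convolution from 1 input map to 64 output maps.
s_fwd = he_std_forward(1, 64, kernel=(3, 3))   # sqrt(2 / 9)
s_bwd = he_std_backward(1, 64, kernel=(3, 3))  # sqrt(2 / 576)
```

For a layer that widens the number of maps, as here, the backward variant gives the smaller standard deviation because it divides by the larger map count.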

nnabla.initializer.calc_normal_std_glorot(inmaps, outmaps, kernel=(1, 1))
Calculates the standard deviation proposed by Glorot et al.
\[\sigma = \sqrt{\frac{2}{NK + M}}\]
Parameters:
Example:
import nnabla as nn
import nnabla.parametric_functions as PF
import nnabla.initializer as I

x = nn.Variable([60,1,28,28])
s = I.calc_normal_std_glorot(x.shape[1],64)
w = I.NormalInitializer(s)
b = I.ConstantInitializer(0)
h = PF.convolution(x, 64, [3, 3], w_init=w, b_init=b, pad=[1, 1], name='conv')

nnabla.initializer.calc_uniform_lim_glorot(inmaps, outmaps, kernel=(1, 1))
Calculates the lower bound and the upper bound of the uniform distribution proposed by Glorot et al.
\[\begin{split}b &= \sqrt{\frac{6}{NK + M}}\\ a &= -b\end{split}\]
Parameters:
Example:
import nnabla as nn
import nnabla.parametric_functions as PF
import nnabla.initializer as I

x = nn.Variable([60,1,28,28])
lb,ub= I.calc_uniform_lim_glorot(x.shape[1],64)
w = I.UniformInitializer((lb,ub))
b = I.ConstantInitializer(0)
h = PF.convolution(x, 64, [3, 3], w_init=w, b_init=b, pad=[1, 1], name='conv')
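The normal and uniform Glorot variants describe the same variance. The plain-Python sketch below evaluates both formulas, assuming \(N\) is inmaps, \(M\) is outmaps, \(K\) is the product of the kernel dimensions, and that the range is symmetric (\(a = -b\)). The function names are hypothetical. A uniform distribution on \([-b, b]\) has variance \(b^2 / 3\), which equals \(\sigma^2 = 2 / (NK + M)\).

```python
import math

def glorot_std(inmaps, outmaps, kernel=(1, 1)):
    """sigma = sqrt(2 / (N * K + M)) under the assumptions stated above."""
    return math.sqrt(2.0 / (inmaps * math.prod(kernel) + outmaps))

def glorot_uniform_lim(inmaps, outmaps, kernel=(1, 1)):
    """Return (a, b) with b = sqrt(6 / (N * K + M)) and a = -b."""
    b = math.sqrt(6.0 / (inmaps * math.prod(kernel) + outmaps))
    return -b, b

lb, ub = glorot_uniform_lim(1, 64, kernel=(3, 3))
sigma = glorot_std(1, 64, kernel=(3, 3))
uniform_variance = ub ** 2 / 3.0  # variance of U(-b, b)
```

Either initializer can therefore be swapped for the other without changing the scale of the initial weights, only the shape of their distribution.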