Functions¶
All NNabla functions are derived from the nnabla.function.Function
class.
Function¶
- class nnabla.function.Function¶
Function interface class.
Instances of nnabla.function.Function are not created directly by users. They are created indirectly by the functions available in nnabla.functions. These functions return nnabla.Variable(s) holding the created function instance as the parent property.
- args¶
Experimental
Get args of the function.
- auto_grad_depends_input_data(self, int i, int j)¶
- auto_grad_depends_output_data(self, int i, int o)¶
- backward(self, inputs, outputs, accum=None)¶
- forward(self, inputs, outputs)¶
- grad_depends_input_data(self, int i, int j)¶
- grad_depends_output_data(self, int i, int o)¶
- info¶
object
- Type:
info
- inplace_data(self, int i)¶
- inplace_data_with(self, int i)¶
- min_outputs(self)¶
- need_setup_recompute(self, int o)¶
- recompute(self, inputs, outputs)¶
- set_active_input_mask(self, mask)¶
- setup(self, inputs, outputs)¶
- setup_recompute(self, inputs, outputs)¶
- tags¶
Experimental
Get tags of the function.
- class nnabla.function.PythonFunction(ctx=None)¶
Creates a user-defined custom function in a subclass.
For example, a naive multiplication function of two variables can be implemented with PythonFunction as follows:

    import nnabla as nn
    import nnabla.functions as F
    from nnabla.function import PythonFunction

    class Mul2(PythonFunction):

        def __init__(self, ctx):
            super(Mul2, self).__init__(ctx)

        @property
        def name(self):
            return self.__class__.__name__

        def min_outputs(self):
            return 1

        def setup_impl(self, inputs, outputs):
            i0 = inputs[0]
            i1 = inputs[1]
            assert i0.shape == i1.shape, "Shapes of inputs are different."
            o0 = outputs[0]
            o0.reset_shape(i0.shape, True)

        def forward_impl(self, inputs, outputs):
            x0 = inputs[0].data
            x1 = inputs[1].data
            y = outputs[0].data

            # This could also be written as y.copy_from(x0 * x1)
            y.copy_from(F.mul2(x0, x1))

        def backward_impl(self, inputs, outputs, propagate_down, accum):
            # Data of inputs and outputs
            x0 = inputs[0].data
            x1 = inputs[1].data
            y = outputs[0].data

            # Grads of inputs and outputs
            dx0 = inputs[0].grad
            dx1 = inputs[1].grad
            dy = outputs[0].grad

            # backward w.r.t. x0
            if propagate_down[0]:
                if accum[0]:
                    dx0 += F.mul2(dy, x1)
                else:
                    dx0.copy_from(F.mul2(dy, x1))

            # backward w.r.t. x1
            if propagate_down[1]:
                if accum[1]:
                    dx1 += F.mul2(dy, x0)
                else:
                    dx1.copy_from(F.mul2(dy, x0))

        def grad_depends_output_data(self, i, o):
            return False

        def grad_depends_input_data(self, i, j):
            return True

    def mul2(x, y, ctx=None):
        func = Mul2(ctx)
        return func(x, y)
- __init__(self, ctx=None)¶
- Parameters:
ctx (
nnabla.Context
) – Context used for the forward and backward pass. If not specified, the current context is used.
- backward_impl(self, inputs, outputs, propagate_down, accum)¶
Backward method.
- Parameters:
inputs (list of nnabla.Variable) – Inputs to the function.
outputs (list of nnabla.Variable) – Outputs from the function.
- property ctx¶
Context. Returns the context passed to the constructor if one was set; otherwise returns the global context.
- forward_impl(self, inputs, outputs)¶
Forward method.
- Parameters:
inputs (list of nnabla.Variable) – Inputs to the function.
outputs (list of nnabla.Variable) – Outputs from the function.
- grad_depends_input_data(self, i, j)¶
Checks whether the gradient computation of the i-th input requires the j-th input's data.
- Parameters:
i (int) – Input variable index.
j (int) – Input variable index.
- grad_depends_output_data(self, i, o)¶
Checks whether the gradient computation of the i-th input requires the o-th output's data.
- Parameters:
i (int) – Input variable index.
o (int) – Output variable index.
- min_outputs(self)¶
Minimum number of outputs of the function.
- property name¶
Name of the function.
- setup_impl(self, inputs, outputs)¶
Setup method.
- Parameters:
inputs (list of nnabla.Variable) – Inputs to the function.
outputs (list of nnabla.Variable) – Outputs from the function.
List of Functions¶
The nnabla.functions
module provides various types of functions listed below.
These functions take input nnabla.Variable(s) as their leading argument(s), followed by options specific to each function.
Note
The functions can also take NdArray(s) as inputs instead of Variable(s). In that case the operation is executed immediately, and NdArray(s) holding the output values of the operation are returned. We call this “Imperative Mode” (NdArray + Functions).
Neural Network Layers¶
- nnabla.functions.affine(x, weight, bias=None, base_axis=1, n_outputs=-1, outputs=None)[source]¶
Affine layer, also known as the fully connected layer. It calculates:
\[{\mathbf y} = {\mathbf A} {\mathbf x} + {\mathbf b}.\]
where \({\mathbf x}\) is the input, \({\mathbf A}\) is the weight matrix, \({\mathbf b}\) is the bias vector, and \({\mathbf y}\) is the output.
- Parameters:
x (Variable) – Input N-D array with shape (\(M_0 \times ... \times M_{B-1} \times D_B \times ... \times D_N\)). Dimensions before and after base_axis are flattened as if it is a matrix.
weight (Variable) – Weight matrix with shape (\((D_B \times ... \times D_N) \times L_{0} \times \ldots \times L_{I}\)) [parameter]
bias (Variable) – Bias vector (\(L_{0} \times \ldots \times L_{I}\)) [optional][parameter]
base_axis (int) – Base axis of the affine operation. Dimensions up to base_axis are treated as sample dimensions. [default=1]
- Returns:
\((B + 1)\)-D array. (\(M_0 \times ... \times M_{B-1} \times L_{0} \times \ldots \times L_{I}\))
- Return type:
Note
All nnabla functions in nnabla.functions are decorated with the nnabla.function_bases.function_api decorator, which queries the current context and passes it into the first argument of the original function. The original function always takes a context as the first argument.
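The shape rule described for affine can be sketched in plain Python. This is only an illustration of the shapes stated above, not NNabla code; the helper name `affine_output_shape` is hypothetical.

```python
from math import prod


def affine_output_shape(x_shape, weight_shape, base_axis=1):
    """Output shape of an affine layer, following the shapes in the text."""
    sample_dims = tuple(x_shape[:base_axis])      # M_0 ... M_{B-1}
    feature_size = prod(x_shape[base_axis:])      # D_B * ... * D_N, flattened
    if weight_shape[0] != feature_size:
        raise ValueError("first weight dimension must equal the flattened feature size")
    return sample_dims + tuple(weight_shape[1:])  # M_0 ... M_{B-1}, L_0 ... L_I


# A batch of 32 inputs of shape (3, 8, 8) flattens to 192 features per sample.
print(affine_output_shape((32, 3, 8, 8), (192, 10)))  # (32, 10)
```

With base_axis=2, the first two dimensions are kept as sample dimensions and only the remaining ones are flattened.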
- nnabla.functions.convolution(x, weight, bias=None, base_axis=1, pad=None, stride=None, dilation=None, group=1, channel_last=False, n_outputs=-1, outputs=None)[source]¶
N-D Convolution with bias.
See references for dilated convolution (a.k.a. atrous convolution).
References
Note
Convolution is a computationally intensive operation that should preferably be run with the cudnn backend. NNabla then uses CuDNN library functions to determine and cache the fastest algorithm for the given set of convolution parameters. This results in additional memory consumption, which may pose a problem for GPUs with insufficient memory. In that case, the NNABLA_CUDNN_WORKSPACE_LIMIT environment variable can be used to restrict the choice of algorithms to those that fit the given workspace memory limit, expressed in bytes. In some cases it may also be desirable to restrict the automatic search to algorithms that produce deterministic (reproducible) results. This can be requested by setting the environment variable NNABLA_CUDNN_DETERMINISTIC to a non-zero value.
- Parameters:
x (Variable) – \((B + 1 + N)\)-D array (\(M_1 \times ... \times M_B \times C \times L_1 \times ... \times L_N\)).
weight (Variable) – \((2 + N)\)-D array (\(C' \times C \times K_1 \times ... \times K_N\)). [parameter]
bias (Variable) – Bias vector (\(C'\)). [optional][parameter]
base_axis (int) – base axis \(B\). [default=1]
pad (tuple of int) – Padding sizes for dimensions. [default=(0,) * (len(x.shape) - (base_axis+1))]
stride (tuple of int) – Stride sizes for dimensions. [default=(1,) * (len(x.shape) - (base_axis+1))]
dilation (tuple of int) – Dilation sizes for dimensions. [default=(1,) * (len(x.shape) - (base_axis+1))]
group (int) – Number of groups of channels. This makes the connection across channels sparser, by grouping connections along the mapping direction. [default=1]
channel_last (bool) – If True, the last dimension is considered as the channel dimension, a.k.a. NHWC order. [default=False]
- Returns:
\((B + 1 + N)\)-D array (\(M_1 \times ... \times M_B \times C' \times L'_1 \times ... \times L'_N\)).
A spatial size of the output is calculated as
\[L'_i = \frac{L_i + 2 p_i - d_i (k_i - 1) - 1}{s_i} + 1,\]where \(L_i\) is the spatial size, \(p_i\) is the padding, \(d_i\) is the dilation, \(k_i\) is the kernel size, and \(s_i\) is the stride for \(i\)-th spatial dimension. The same calculation can also be applied to the other spatial dimensions.
- Return type:
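The output-size formula above can be checked with a small pure-Python sketch. The helper names are hypothetical, and the use of floor division is an assumption: the text writes the formula as a plain fraction.

```python
def conv_output_size(size, kernel, pad=0, stride=1, dilation=1):
    """Spatial output size L'_i for one dimension."""
    # Floor division is assumed here; the text shows a plain fraction.
    return (size + 2 * pad - dilation * (kernel - 1) - 1) // stride + 1


def conv_output_shape(x_shape, weight_shape, base_axis=1,
                      pad=None, stride=None, dilation=None):
    """Output shape for channel-first input, using the defaults listed above."""
    n = len(x_shape) - (base_axis + 1)          # number of spatial dimensions
    pad = pad or (0,) * n
    stride = stride or (1,) * n
    dilation = dilation or (1,) * n
    out_channels = weight_shape[0]              # C'
    kernel = weight_shape[2:]                   # K_1 ... K_N
    spatial = tuple(
        conv_output_size(l, k, p, s, d)
        for l, k, p, s, d in zip(x_shape[base_axis + 1:], kernel, pad, stride, dilation)
    )
    return tuple(x_shape[:base_axis]) + (out_channels,) + spatial


# A 3x3 kernel with padding 1 keeps a 32x32 map at 32x32.
print(conv_output_shape((16, 3, 32, 32), (64, 3, 3, 3), pad=(1, 1)))  # (16, 64, 32, 32)
```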
- nnabla.functions.depthwise_convolution(x, weight, bias=None, base_axis=1, pad=None, stride=None, dilation=None, multiplier=1, n_outputs=-1, outputs=None)[source]¶
N-D Depthwise Convolution with bias.
References
- Parameters:
x (Variable) – \((B + 1 + N)\)-D array (\(M_1 \times ... \times M_B \times C \times L_1 \times ... \times L_N\)).
weight (Variable) – \((1 + N)\)-D array (\(C \times K_1 \times ... \times K_N\)). [parameter]
bias (Variable) – Bias vector (\(C'\)). [optional][parameter]
base_axis (int) – base axis \(B\). [default=1]
pad (tuple of int) – Padding sizes for dimensions. [default=(0,) * (len(x.shape) - (base_axis+1))]
stride (tuple of int) – Stride sizes for dimensions. [default=(1,) * (len(x.shape) - (base_axis+1))]
dilation (tuple of int) – Dilation sizes for dimensions. [default=(1,) * (len(x.shape) - (base_axis+1))]
multiplier (int) – Number of output feature maps per input feature map. [default=1]
- Returns:
\((B + 1 + N)\)-D array (\(M_1 \times ... \times M_B \times C' \times L'_1 \times ... \times L'_N\)).
The output map size \(C'\) is \(C\) multiplied by \(m\)
\[C' = m \times C,\]where \(m\) is the multiplier.
A spatial size of the output is calculated as
\[L'_i = \frac{L_i + 2 p_i - d_i (k_i - 1) - 1}{s_i} + 1,\]where \(L_i\) is the spatial size, \(p_i\) is the padding, \(d_i\) is the dilation, \(k_i\) is the kernel size, and \(s_i\) is the stride for \(i\)-th spatial dimension. The same calculation can also be applied to the other spatial dimensions.
- Return type:
- nnabla.functions.deconvolution(x, weight, bias=None, base_axis=1, pad=None, stride=None, dilation=None, group=1, channel_last=False, output_padding=None, n_outputs=-1, outputs=None)[source]¶
N-D deconvolution, also known as transposed convolution, with bias. It computes the backward operation of convolution (the derivative of the output w.r.t. the input) plus a channel-wise learned bias.
The weights are specified in the same manner as convolution(), as if it were an ordinary convolution function. The forward operation of deconvolution() is then operationally equivalent to the backward pass of convolution(). Therefore, the number of input channels (which can be seen as the output channels of the forward convolution) is specified in the first dimension, and the number of output channels divided by the number of groups is specified in the second dimension.
For stride > 1, a parameter-wise identical deconvolution on the output of a convolution may not produce the same output shape as the input to the convolution if, due to striding, the convolution did not fully cover the input spatial dimension. The output_padding parameter can then be used to appropriately increase the calculated output shape. Note that this is used to find the output shape for the deconvolution operation, but not to add zero-padding to the output.
- Parameters:
x (Variable) – \((B + 1 + N)\)-D array (\(M_1 \times ... \times M_B \times C \times L_1 \times ... \times L_N\)).
weight (Variable) – \((2 + N)\)-D array (\(C \times C' \times K_1 \times ... \times K_N\)). [parameter]
bias (Variable) – Bias vector (\(C'\)). [optional][parameter]
base_axis (int) – base axis \(B\). [default=1]
pad (tuple of int) – Padding sizes for dimensions. [default=(0,) * (len(x.shape) - (base_axis+1))]
stride (tuple of int) – Stride sizes for dimensions. [default=(1,) * (len(x.shape) - (base_axis+1))]
dilation (tuple of int) – Dilation sizes for dimensions. [default=(1,) * (len(x.shape) - (base_axis+1))]
group (int) – Number of groups of channels. This makes the connection across channels sparser, by grouping connections along the mapping direction. [default=1]
channel_last (bool) – If True, the last dimension is considered as the channel dimension, a.k.a. NHWC order. [default=False]
output_padding (tuple of int) – Additional size added to the output shape. [default=(0,) * (len(x.shape) - (base_axis+1))]
- Returns:
\((B + 1 + N)\)-D array (\(M_1 \times ... \times M_B \times C' \times L'_1 \times ... \times L'_N\)).
A spatial size of the output is calculated as
\[L'_i =s_i (L_i - 1) - 2 p_i + d_i (k_i - 1) + 1,\]where \(s_i\) is the stride, \(L_i\) is the spatial size, \(p_i\) is the padding, \(d_i\) is the dilation, and \(k_i\) is the kernel size for \(i\)-th spatial dimension. The same calculation can also be applied to the other spatial dimensions.
- Return type:
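The stride > 1 shape mismatch described for deconvolution can be reproduced with the two size formulas. The sketch below is plain Python with hypothetical helper names. It assumes floor division in the convolution formula and treats output_padding as a value simply added to the computed size, following its description as an additional size added to the output shape.

```python
def conv_output_size(size, kernel, pad=0, stride=1, dilation=1):
    # Floor division is assumed; the text shows a plain fraction.
    return (size + 2 * pad - dilation * (kernel - 1) - 1) // stride + 1


def deconv_output_size(size, kernel, pad=0, stride=1, dilation=1, output_padding=0):
    # output_padding is assumed to be added to the size given by the formula.
    return stride * (size - 1) - 2 * pad + dilation * (kernel - 1) + 1 + output_padding


down = conv_output_size(8, kernel=3, stride=2)            # 3: the last input column is not covered
up = deconv_output_size(down, kernel=3, stride=2)         # 7: does not match the original 8
up_fixed = deconv_output_size(down, kernel=3, stride=2, output_padding=1)  # 8
print(down, up, up_fixed)
```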
- nnabla.functions.depthwise_deconvolution(x, weight, bias=None, base_axis=1, pad=None, stride=None, dilation=None, divisor=1, n_outputs=-1, outputs=None)[source]¶
Depthwise deconvolution computes the transposed depthwise convolution with bias for one-dimensional and two-dimensional input data.
- Parameters:
x (Variable) – \((B + 1 + N)\)-D array (\(M_1 \times ... \times M_B \times C \times L_1 \times ... \times L_N\)).
weight (Variable) – \((1 + N)\)-D array (\(C \times K_1 \times ... \times K_N\)). [parameter]
bias (Variable) – Bias vector (\(C'\)). [optional][parameter]
base_axis (int) – base axis \(B\). [default=1]
pad (tuple of int) – Padding sizes for dimensions. [default=(0,) * (len(x.shape) - (base_axis+1))]
stride (tuple of int) – Stride sizes for dimensions. [default=(1,) * (len(x.shape) - (base_axis+1))]
dilation (tuple of int) – Dilation sizes for dimensions. [default=(1,) * (len(x.shape) - (base_axis+1))]
divisor (int) – Number of input feature maps per output feature map. [default=1]
- Returns:
\((B + 1 + N)\)-D array (\(M_1 \times ... \times M_B \times C' \times L'_1 \times ... \times L'_N\)).
The output map size \(C'\) is \(C\) divided by \(d\)
\[C' = \frac{C}{d},\]where \(d\) is the divisor.
A spatial size of the output is calculated as
\[L'_i =s_i (L_i - 1) - 2 p_i + d_i (k_i - 1) + 1,\]where \(s_i\) is the stride, \(L_i\) is the spatial size, \(p_i\) is the padding, \(d_i\) is the dilation, and \(k_i\) is the kernel size for \(i\)-th spatial dimension. The same calculation can also be applied to the other spatial dimensions.
- Return type:
- nnabla.functions.deformable_convolution(x, weight, offset, mask=None, bias=None, base_axis=1, pad=None, stride=None, dilation=None, group=1, deformable_group=1, channel_last=False, n_outputs=-1, outputs=None)[source]¶
2-D Deformable Convolution with bias. Another convolution with fixed output channels must be passed externally to calculate the offsets and mask. Mask should be normalized to \([0,1]\) interval.
\[\begin{eqnarray} y(p) = \sum_{k=1}^{K} w_k \cdot x(p + p_k + \Delta p_k) \cdot \Delta m_k, \end{eqnarray}\]where \(x\) and \(y\) are input and output, \(w_k\) is the weight, \(p\) is the pixel location of interest, \(p_k\) is the fixed displacement e.g., \(p_k \in \{(-1, -1), (-1, 0), \ldots (1, 1)\}\) for the 2D 3x3 receptive field, \(\Delta p_k\) is the learnable displacement, and \(\Delta m_k\) is the learnable scale normalized in \([0, 1]\) by a function like the sigmoid. Note that \(\Delta p_k\) and \(\Delta m_k\) are sample-dependent, location-dependent, and feature-independent.
References
- Parameters:
x (Variable) – \((B + 1 + N)\)-D array (\(M_1 \times ... \times M_B \times C \times L_1 \times ... \times L_N\)).
weight (Variable) – \((2 + N)\)-D array (\(C' \times C \times K_1 \times ... \times K_N\)). [parameter]
offset (Variable) – Offsets for deformable convolutions. Shape is fixed to \((N, deformable{\_}group \times 2 \times Kh \times Kw, H, W)\). Offsets must be calculated externally through a separate convolution layer.
mask (Variable) – Normalized mask for deformable convolutions v2. Shape is fixed to \((N, deformable{\_}group \times Kh \times Kw, H, W)\). Masks must be calculated externally together with the offsets through a separate convolution layer. [optional]
bias (Variable) – Bias vector (\(C'\)). [optional][parameter]
base_axis (int) – base axis \(B\). [default=1]
pad (tuple of int) – Padding sizes for dimensions. [default=(0,) * (len(x.shape) - (base_axis+1))]
stride (tuple of int) – Stride sizes for dimensions. [default=(1,) * (len(x.shape) - (base_axis+1))]
dilation (tuple of int) – Dilation sizes for dimensions. [default=(1,) * (len(x.shape) - (base_axis+1))]
group (int) – Number of groups of channels. This makes the connection across channels sparser, by grouping connections along the mapping direction. [default=1]
deformable_group (int) – Number of deformable groups of channels. [default=1]
channel_last (bool) – If True, the last dimension is considered as the channel dimension, a.k.a. NHWC order. [default=False]
- Returns:
\((B + 1 + N)\)-D array (\(M_1 \times ... \times M_B \times C' \times L'_1 \times ... \times L'_N\)).
A spatial size of the output is calculated as
\[L'_i = \frac{L_i + 2 p_i - d_i (k_i - 1) - 1}{s_i} + 1,\]where \(L_i\) is the spatial size, \(p_i\) is the padding, \(d_i\) is the dilation, \(k_i\) is the kernel size, and \(s_i\) is the stride for \(i\)-th spatial dimension. The same calculation can also be applied to the other spatial dimensions.
- Return type:
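Since the offsets and mask must come from a separate convolution, it can help to compute the channel counts that convolution has to produce. The following plain-Python sketch only restates the two fixed shapes given in the parameter list; the helper name and its arguments are hypothetical, and H and W are taken as given by the caller.

```python
def deformable_aux_shapes(batch, kernel_hw, spatial_hw, deformable_group=1):
    """Return (offset_shape, mask_shape) as stated in the parameter list."""
    kh, kw = kernel_hw
    h, w = spatial_hw
    # Two offset values (vertical and horizontal) per kernel position and deformable group.
    offset_shape = (batch, deformable_group * 2 * kh * kw, h, w)
    # One mask value per kernel position and deformable group.
    mask_shape = (batch, deformable_group * kh * kw, h, w)
    return offset_shape, mask_shape


# A 3x3 kernel needs 18 offset channels and 9 mask channels per deformable group.
print(deformable_aux_shapes(4, (3, 3), (32, 32)))
```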
- nnabla.functions.adaptive_separable_convolution(x, vertical_kernel, horizontal_kernel, n_outputs=-1, outputs=None)[source]¶
2-D Adaptive Separable Convolution for NCHW (channel-first) tensors. The vertical and horizontal kernels are generated dynamically for each sample and pixel, and are used to approximate a feature-independent 2-D kernel. Thus, the kernel used in this function depends on samples and pixels but is independent of features.
If padding is needed, apply the pad function to the input \(x\) before calling this function.
Adaptive separable convolution is formulated as
\[\tilde{I}(c, h, w) = \sum_{j, i} K_v(j, h, w) \times K_h(i, h, w) \times I(c, h + j, w + i),\]
where \(I(c, h, w)\) and \(\tilde{I}(c, h, w)\) are the input and output images at \(c\)-th channel, \(h\)-th height, \(w\)-th width. \(K_v(:, h, w)\) and \(K_h(:, h, w)\) are vertical and horizontal 1-D kernels at \(h\)-th height and \(w\)-th width.
References
- Parameters:
- Returns:
4-D array (\(B \times C \times (H - K_v + 1) \times (W - K_h + 1)\))
- Return type:
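The formula for adaptive separable convolution can be written out directly for a single sample using nested lists. This is an unoptimized sketch meant only to make the indexing concrete; the function name is hypothetical, and the kernel maps are assumed to be indexed at output positions.

```python
def adaptive_separable_conv(image, kv, kh):
    """image: [C][H][W]; kv: [K_v][H'][W']; kh: [K_h][H'][W'] for one sample."""
    channels, height, width = len(image), len(image[0]), len(image[0][0])
    n_v, n_h = len(kv), len(kh)
    out_h, out_w = height - n_v + 1, width - n_h + 1
    out = [[[0.0] * out_w for _ in range(out_h)] for _ in range(channels)]
    for c in range(channels):
        for h in range(out_h):
            for w in range(out_w):
                # The same per-pixel kernels are shared by every channel c.
                out[c][h][w] = sum(
                    kv[j][h][w] * kh[i][h][w] * image[c][h + j][w + i]
                    for j in range(n_v)
                    for i in range(n_h)
                )
    return out


image = [[[1, 2, 3], [4, 5, 6], [7, 8, 9]]]   # one channel, 3x3
ones = [[[1, 1], [1, 1]], [[1, 1], [1, 1]]]   # kernel length 2, all weights 1
print(adaptive_separable_conv(image, ones, ones))  # 2x2 window sums
```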
- nnabla.functions.max_pooling(x, kernel, stride=None, ignore_border=True, pad=None, channel_last=False, n_outputs=-1, outputs=None)[source]¶
Max pooling. It pools the maximum values inside the scanning kernel:
\[y_{i_1, i_2} = \max_{k_1, k_2 \in K} (x_{i_1 + k_1, i_2 + k_2})\]where \(x_{i_1 + k_1, i_2 + k_2}\) is the input and \(y_{i_1, i_2}\) is the output.
- Parameters:
x (Variable) – Input variable.
stride (tuple of int) – Subsampling factors for each spatial axis. [default=kernel]
ignore_border (bool) – If False, kernels covering borders are also considered for the output. [default=True]
pad (tuple of int) – Border padding values for each spatial axis. Padding will be added to both sides of the dimension. [default=(0,) * len(kernel)]
channel_last (bool) – If True, the last dimension is considered as the channel dimension, a.k.a. NHWC order. [default=False]
- Returns:
Maximum values variable
- Return type:
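A minimal pure-Python sketch of 2-D max pooling on a single feature map may make the roles of kernel and stride clearer. It assumes no padding and only fully covered windows (the ignore_border=True case); the function name is hypothetical.

```python
def max_pooling_2d(x, kernel, stride=None):
    """Max pooling over a 2-D list, without padding."""
    kh, kw = kernel
    sh, sw = stride or kernel                 # stride defaults to the kernel size
    height, width = len(x), len(x[0])
    out_h = (height - kh) // sh + 1           # only windows fully inside the input
    out_w = (width - kw) // sw + 1
    return [
        [
            max(x[i * sh + a][j * sw + b] for a in range(kh) for b in range(kw))
            for j in range(out_w)
        ]
        for i in range(out_h)
    ]


x = [[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12], [13, 14, 15, 16]]
print(max_pooling_2d(x, (2, 2)))  # [[6, 8], [14, 16]]
```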
- nnabla.functions.average_pooling(x, kernel, stride=None, ignore_border=True, pad=None, channel_last=False, including_pad=True, n_outputs=-1, outputs=None)[source]¶
Average pooling. It pools the averaged values inside the scanning kernel:
\[y_{i_1, i_2} = \frac{1}{K_1 K_2} \sum_{k_1} \sum_{k_2} x_{i_1 + k_1, i_2 + k_2}\]
where \(x_{i_1 + k_1, i_2 + k_2}\) is the input and \(y_{i_1, i_2}\) is the output.
- Parameters:
x (Variable) – Input variable.
stride (tuple of int) – Subsampling factors for each spatial axis. [default=kernel]
ignore_border (bool) – If False, kernels covering borders are also considered for the output. [default=True]
pad (tuple of int) – Border padding values for each spatial axis. Padding will be added to both sides of the dimension. [default=(0,) * len(kernel)]
channel_last (bool) – If True, the last dimension is considered as the channel dimension, a.k.a. NHWC order. [default=False]
including_pad (bool) – If True, border padding values are considered for the output. [default=True]
- Returns:
Average values variable
- Return type:
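The effect of including_pad is easiest to see in one dimension. The sketch below is plain Python with a hypothetical function name. It assumes that including_pad=True divides by the full kernel size, counting padded positions as zeros, while including_pad=False divides only by the number of real input values in the window.

```python
def average_pooling_1d(x, kernel, stride=None, pad=0, including_pad=True):
    """1-D average pooling; pad must be smaller than kernel."""
    stride = stride or kernel                     # stride defaults to the kernel size
    padded = [None] * pad + list(x) + [None] * pad  # None marks a padded position
    out = []
    for start in range(0, len(padded) - kernel + 1, stride):
        window = padded[start:start + kernel]
        values = [v for v in window if v is not None]
        divisor = kernel if including_pad else len(values)
        out.append(sum(values) / divisor)
    return out


print(average_pooling_1d([2, 4, 6, 8], kernel=2, pad=1))                       # [1.0, 5.0, 4.0]
print(average_pooling_1d([2, 4, 6, 8], kernel=2, pad=1, including_pad=False))  # [2.0, 5.0, 8.0]
```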
- nnabla.functions.global_average_pooling(x, n_outputs=-1, outputs=None)[source]¶
Warning
This function has only experimental support, so please do not actively use it.
Global average pooling. It pools the average value over the whole image.
- nnabla.functions.sum_pooling(x, kernel, stride=None, ignore_border=True, pad=None, channel_last=False, n_outputs=-1, outputs=None)[source]¶
Sum pooling. It pools the summed values inside the scanning kernel:
\[y_{i_1, i_2} = \sum_{k_1} \sum_{k_2} x_{i_1 + k_1, i_2 + k_2}\]
where \(x_{i_1 + k_1, i_2 + k_2}\) is the input and \(y_{i_1, i_2}\) is the output.
- Parameters:
x (Variable) – Input variable.
stride (tuple of int) – Subsampling factors for each spatial axis. [default=kernel]
ignore_border (bool) – If False, kernels covering borders are also considered for the output. [default=True]
pad (tuple of int) – Border padding values for each spatial axis. Padding will be added to both sides of the dimension. [default=(0,) * len(kernel)]
channel_last (bool) – If True, the last dimension is considered as the channel dimension, a.k.a. NHWC order. [default=False]
- Returns:
Summed values variable
- Return type:
- nnabla.functions.unpooling(x, kernel, channel_last=False, n_outputs=-1, outputs=None)[source]¶
Inverse operation of pooling. It spreads the input values:
\[y_{k_1 i_1 + j_1, k_2 i_2 + j_2} = x_{i_1, i_2}\]
where \(x_{i_1, i_2}\) is the input and \(y_{k_1 i_1 + j_1, k_2 i_2 + j_2}\) is the output.
- Parameters:
- Returns:
Spread values variable
- Return type:
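The unpooling formula copies each input value into a block of the output whose size is given by the kernel. A minimal sketch for a single 2-D map, in plain Python with a hypothetical function name:

```python
def unpooling_2d(x, kernel):
    """Spread each x[i1][i2] over a k1-by-k2 block of the output."""
    k1, k2 = kernel
    return [
        # Output position (r, c) reads from input position (r // k1, c // k2).
        [x[r // k1][c // k2] for c in range(len(x[0]) * k2)]
        for r in range(len(x) * k1)
    ]


print(unpooling_2d([[1, 2], [3, 4]], (2, 2)))
# [[1, 1, 2, 2], [1, 1, 2, 2], [3, 3, 4, 4], [3, 3, 4, 4]]
```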
- nnabla.functions.embed(x0, w, n_outputs=-1, outputs=None)[source]¶
Embed slices of a matrix/tensor with indexing array/tensor.
- Parameters:
- Returns:
Output with shape \((I_0, ..., I_N, W_1, ..., W_M)\)
- Return type:
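The output shape \((I_0, ..., I_N, W_1, ..., W_M)\) follows from replacing every index in the indexing array with the corresponding slice of the weight table. A plain-Python sketch with nested lists (the function below is an illustration, not the NNabla implementation):

```python
def embed(indices, table):
    """Replace each integer index with the matching row (slice) of table."""
    if isinstance(indices, int):
        return table[indices]
    return [embed(i, table) for i in indices]


table = [[0.0, 0.1], [1.0, 1.1], [2.0, 2.1]]   # 3 entries, each of shape (2,)
print(embed([[2, 0], [1, 1]], table))          # indices of shape (2, 2) give shape (2, 2, 2)
```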
- nnabla.functions.rnn(x, h, weight_l0, weight=None, bias=None, num_layers=1, nonlinearity='tanh', dropout=None, bidirectional=False, training=True, n_outputs=-1, outputs=None)[source]¶
The RNN function applies an Elman RNN with a nonlinearity to an input sequence. It is defined as follows:
\[{\mathbf h_t} = {\mathbf \tanh}( {\mathbf w_{ih}} *{\mathbf x_t} + {\mathbf b_{ih}} + {\mathbf w_{hh}}* {\mathbf h_{(t-1)}} + {\mathbf b_{hh}}).\]
We use the following notations to describe the inputs and outputs below. \(T\): sequence length, \(B\): batch size, \(I\): input size, \(L\): number of layers, \(D\): number of directions, can be either 1 or 2, \(H\): hidden size.
References
- Parameters:
x (Variable) – Input N-D array with shape \((T, B, I)\).
h (Variable) – Input N-D array with shape \((L, D, B, H)\).
weight_l0 (Variable) – Input N-D array with shape \((D, H, I + H)\). [parameter]
weight (Variable) – Input N-D array with shape \((L-1, D, H, D * H + H)\). [optional][parameter]
bias (Variable) – Input N-D array with shape \((L, D, H)\). [optional][parameter]
num_layers (int) – Number of layers in the network. If set to 1, only the weights for the first layer will be invoked. [default=1]
nonlinearity (string) – Type of nonlinearity applied to the input sequence. Must be either tanh or relu. [default='tanh']
dropout (float) – Dropout ratio applied to parameters. [default=0.0]
bidirectional (bool) – If True, bidirectional computation will be performed in each layer. [default=False]
training (bool) – Backpropagation will be performed only when it is True. [default=True]
- Returns:
Output \(y\) with shape \((T, B, D * H)\).
nnabla.Variable: Output \(h_n\) with shape \((L, D, B, H)\).
- Return type:
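One time step of the Elman recurrence above can be sketched with plain Python lists. The helper names are hypothetical, and the sketch covers a single layer, a single direction, and a single sample only.

```python
import math


def matvec(matrix, vector):
    """Matrix-vector product on nested lists."""
    return [sum(a * b for a, b in zip(row, vector)) for row in matrix]


def elman_step(x_t, h_prev, w_ih, w_hh, b_ih, b_hh, nonlinearity=math.tanh):
    """h_t = nonlinearity(w_ih x_t + b_ih + w_hh h_{t-1} + b_hh)."""
    pre_activation = [
        a + b + c + d
        for a, b, c, d in zip(matvec(w_ih, x_t), b_ih, matvec(w_hh, h_prev), b_hh)
    ]
    return [nonlinearity(p) for p in pre_activation]


# Input size 1, hidden size 1: only the input term contributes here.
h = elman_step([0.5], [0.0], w_ih=[[1.0]], w_hh=[[0.0]], b_ih=[0.0], b_hh=[0.0])
```

Iterating this step over \(t = 1, \ldots, T\) and collecting each \(h_t\) gives the output sequence for one direction.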
- nnabla.functions.lstm(x, h, c, weight_l0, weight=None, bias=None, num_layers=1, dropout=None, bidirectional=False, training=True, n_outputs=-1, outputs=None)[source]¶
N-Step LSTM layer.
\[\begin{split}{\mathbf f_t} &=& {\mathbf \sigma}( {\mathbf W_f} *{\mathbf x_t} + {\mathbf U_f}* {\mathbf h_{(t-1)}} + {\mathbf b_f})\\ {\mathbf i_t} &=& {\mathbf \sigma}( {\mathbf W_i} *{\mathbf x_t} + {\mathbf U_i}* {\mathbf h_{(t-1)}} + {\mathbf b_i})\\ {\mathbf o_t} &=& {\mathbf \sigma}( {\mathbf W_o} *{\mathbf x_t} + {\mathbf U_o}* {\mathbf h_{(t-1)}} + {\mathbf b_o})\\ {\mathbf c_t} &=& {\mathbf f_t}\odot {\mathbf c_{(t-1)}} + {\mathbf i_t}\odot {\mathbf \tanh}({\mathbf W_c}*{\mathbf x_t} + {\mathbf U_c} *{\mathbf h_{(t-1)}} + {\mathbf b_c})\\ {\mathbf h_t} &=& {\mathbf o_t} \odot {\mathbf \tanh}({\mathbf c_t}).\end{split}\]
We use the following notations to describe the inputs and outputs below. \(T\): sequence length, \(B\): batch size, \(I\): input size, \(L\): number of layers, \(D\): number of directions, can be either 1 or 2, \(H\): hidden size.
References
- Parameters:
x (Variable) – Input N-D array with shape \((T, B, I)\).
h (Variable) – Input N-D array with shape \((L, D, B, H)\).
c (Variable) – Input N-D array with shape \((L, D, B, H)\).
weight_l0 (Variable) – weight parameters for the first layer. Shape is \((D, 4, H, I + H)\). [parameter]
weight (Variable) – weight parameters for the second layer and above. Shape is \((L-1, D, 4, H, D * H + H)\). [optional][parameter]
bias (Variable) – Bias vector (\(L\)). Shape is \((L, D, 4, H)\). [optional][parameter]
num_layers (int) – Number of layers in the network. If set to 1, only the weights for the first layer will be invoked. [default=1]
dropout (float) – Dropout ratio applied to parameters. [default=0.0]
bidirectional (bool) – If True, bidirectional computation will be performed in each layer. [default=False]
training (bool) – Backpropagation will be performed only when it is True. [default=True]
- Returns:
Output \(y\) with shape \((T, B, D * H)\). Its memory layout can be reshaped as \((T, B, D, H)\).
nnabla.Variable: Output \(h_n\) with shape \((L, D, B, H)\).
nnabla.Variable: Output \(c_n\) with shape \((L, D, B, H)\).
- Return type:
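The parameter shapes listed for lstm depend on the input size, hidden size, number of layers, and number of directions. The small helper below collects them in one place; it is a hypothetical convenience function that only restates the shapes from the parameter list.

```python
def lstm_parameter_shapes(input_size, hidden_size, num_layers=1, bidirectional=False):
    """Shapes of weight_l0, weight, and bias as stated in the parameter list."""
    d = 2 if bidirectional else 1   # D: number of directions
    shapes = {
        # 4 gate blocks (f, i, o, c), each acting on [x_t, h_{t-1}].
        "weight_l0": (d, 4, hidden_size, input_size + hidden_size),
        "bias": (num_layers, d, 4, hidden_size),
    }
    if num_layers > 1:
        # Upper layers take the D * H outputs of the layer below as input.
        shapes["weight"] = (num_layers - 1, d, 4, hidden_size, d * hidden_size + hidden_size)
    return shapes


print(lstm_parameter_shapes(10, 20, num_layers=2, bidirectional=True))
```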
- nnabla.functions.gru(x, h, weight_l0, weight=None, bias=None, num_layers=1, dropout=None, bidirectional=False, training=True, n_outputs=-1, outputs=None)[source]¶
N-Step GRU layer.
\[\begin{split}{\mathbf r_t} &=& {\mathbf \sigma}( {\mathbf W_r} *{\mathbf x_t} + {\mathbf U_r}* {\mathbf h_{(t-1)}} + {\mathbf b_r})\\ {\mathbf z_t} &=& {\mathbf \sigma}( {\mathbf W_z} *{\mathbf x_t} + {\mathbf U_z}* {\mathbf h_{(t-1)}} + {\mathbf b_z})\\ {\mathbf n_t} &=& {\mathbf \tanh}( {\mathbf W_n}{\mathbf x_t}+ {\mathbf b_{in}}+ {\mathbf r_t}\odot( {\mathbf U_n}{\mathbf h_{t-1}}+ {\mathbf b_{hn}})) \\ {\mathbf h_t} &=& (1- {\mathbf z_t})\odot {\mathbf n_t} + {\mathbf z_t}\odot {\mathbf h_{t-1}}.\end{split}\]
We use the following notations to describe the inputs and outputs below. \(T\): sequence length, \(B\): batch size, \(I\): input size, \(L\): number of layers, \(D\): number of directions, can be either 1 or 2, \(H\): hidden size.
References
- Parameters:
x (Variable) – Input N-D array with shape \((T, B, I)\).
h (Variable) – Input N-D array with shape \((L, D, B, H)\).
weight_l0 (Variable) – weight parameters for the first layer. Shape is \((D, 3, H, I + H)\). [parameter]
weight (Variable) – weight parameters for the second layer and above. Shape is \((L-1, D, 3, H, D * H + H)\). [optional][parameter]
bias (Variable) – Bias vector (\(L\)). Shape is \((L, D, 4, H)\). [optional][parameter]
num_layers (int) – Number of layers in the network. If set to 1, only the weights for the first layer will be invoked. [default=1]
dropout (float) – Dropout ratio applied to parameters. [default=0.0]
bidirectional (bool) – If True, bidirectional computation will be performed in each layer. [default=False]
training (bool) – Backpropagation will be performed only when it is True. [default=True]
- Returns:
Output \(y\) with shape \((T, B, D * H)\). Its memory layout can be reshaped as \((T, B, D, H)\).
nnabla.Variable: Output \(h_n\) with shape \((L, D, B, H)\).
- Return type:
Note
All nnabla functions in nnabla.functions are decorated with the nnabla.function_bases.function_api decorator, which queries the current context and passes it into the first argument of the original function. The original function always takes a context as the first argument.
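The gate equations above can be traced with a minimal scalar sketch (one input feature, one hidden unit, one time step). The dictionary layout for the weights and biases is a hypothetical choice made for illustration only; it is not how NNabla stores its parameters. The four bias entries correspond to \(b_r\), \(b_z\), \(b_{in}\) and \(b_{hn}\), which is consistent with the bias shape \((L, D, 4, H)\).

```python
import math


def sigmoid(v):
    return 1.0 / (1.0 + math.exp(-v))


def gru_step(x, h_prev, W, U, b):
    """One scalar GRU step (I = H = 1).

    W, U: dicts with keys 'r', 'z', 'n' (hypothetical layout).
    b: dict with keys 'r', 'z', 'in', 'hn' (hypothetical layout).
    """
    r = sigmoid(W["r"] * x + U["r"] * h_prev + b["r"])  # reset gate
    z = sigmoid(W["z"] * x + U["z"] * h_prev + b["z"])  # update gate
    # Candidate state: the reset gate scales only the recurrent term.
    n = math.tanh(W["n"] * x + b["in"] + r * (U["n"] * h_prev + b["hn"]))
    # Blend the candidate state with the previous hidden state.
    return (1.0 - z) * n + z * h_prev


zeros3 = {"r": 0.0, "z": 0.0, "n": 0.0}
zeros4 = {"r": 0.0, "z": 0.0, "in": 0.0, "hn": 0.0}
h = gru_step(x=1.0, h_prev=0.8, W=zeros3, U=zeros3, b=zeros4)
print(h)
```

With all parameters set to zero, both gates evaluate to 0.5 and the candidate state is 0, so the new hidden state is half of the previous one.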
- nnabla.functions.multi_head_attention(query, key, value, num_heads, q_weight, k_weight, v_weight, out_weight, q_bias=None, k_bias=None, v_bias=None, out_bias=None, attn_bias_k=None, attn_bias_v=None, dropout=0.0, additive_mask=None, key_padding_mask=None)[source]¶
MultiHeadAttention.
Computes multi-headed attention with query, key, and value. We use the following notations to describe the inputs and outputs below. \(L_T\): target sequence length, \(L_S\): source sequence length, \(B\): batch size, \(D\): input dimension, \(E\): embedding dimension, \(H\): number of attention heads.
References
A. Vaswani et al. “Attention is All You Need.” NIPS. 2017. <https://papers.nips.cc/paper/7181-attention-is-all-you-need.pdf>
- Parameters:
query (Variable) – Input N-D array with shape \((L_T, B, D_q)\).
key (Variable) – Input N-D array with shape \((L_S, B, D_k)\).
value (Variable) – Input N-D array with shape \((L_S, B, D_v)\).
num_heads (int) – Number of attention heads. Note that embedding dimension E must be divisible by the number of heads. Default is 12 which is conventional.
q_weight (Variable) – Input N-D array with shape \((D_q, E)\).
k_weight (Variable) – Input N-D array with shape \((D_k, E)\).
v_weight (Variable) – Input N-D array with shape \((D_v, E_v)\).
out_weight (Variable) – Input N-D array with shape \((D_v, E_{out})\).
q_bias (Variable, optional) – Input N-D array with shape \((E, )\).
k_bias (Variable, optional) – Input N-D array with shape \((E, )\).
v_bias (Variable, optional) – Input N-D array with shape \((E_v, )\).
out_bias (Variable, optional) – Input N-D array with shape \((E_{out}, )\).
attn_bias_k (Variable, optional) – Input N-D array with shape \((E, )\).
attn_bias_v (Variable, optional) – Input N-D array with shape \((E_v, )\).
dropout (float, optional) – Dropout ratio applied to parameters. Default is 0.
additive_mask (Variable, optional) – Input N-D array with shape \((L_T, L_S)\). Values will be added to the attention layer to prevent attention to certain positions.
key_padding_mask (Variable, optional) – Input N-D array with shape \((B, L_S)\). Specified padding elements will be ignored by the attention layer. Values must be either 1 or 0.
- Returns:
Variable: Output \(y\) with shape \((L_T, B, E_{out})\).
Variable: Output \(h_n\) with shape \((B, L_T, L_S)\).
- Return type:
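As a rough illustration of what each head computes, the following sketch implements single-head scaled dot-product attention as described in the cited paper by Vaswani et al., using plain Python lists. It is not NNabla's implementation: the projections, biases, dropout and key padding mask are omitted, and the helper names `attention` and `head_dim` are hypothetical.

```python
import math


def softmax(row):
    m = max(row)
    exps = [math.exp(v - m) for v in row]
    total = sum(exps)
    return [e / total for e in exps]


def head_dim(embed_dim, num_heads):
    # The embedding dimension E must be divisible by the number of heads.
    if embed_dim % num_heads != 0:
        raise ValueError("embed_dim must be divisible by num_heads")
    return embed_dim // num_heads


def attention(q, k, v, additive_mask=None):
    """q: L_T x d, k: L_S x d, v: L_S x d_v, additive_mask: L_T x L_S."""
    d = len(q[0])
    out = []
    for t, q_t in enumerate(q):
        scores = [sum(a * b for a, b in zip(q_t, k_s)) / math.sqrt(d) for k_s in k]
        if additive_mask is not None:
            # Values are added to the scores before the softmax.
            scores = [s + m for s, m in zip(scores, additive_mask[t])]
        w = softmax(scores)
        out.append([sum(w_s * v_s[j] for w_s, v_s in zip(w, v))
                    for j in range(len(v[0]))])
    return out


mask = [[0.0, float("-inf")]]  # block attention to the second source position
result = attention([[1.0, 0.0]], [[1.0, 0.0], [0.0, 1.0]],
                   [[1.0, 2.0], [3.0, 4.0]], additive_mask=mask)
print(result)
```

An additive mask entry of negative infinity drives the corresponding attention weight to zero, so the output here equals the first value row.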
- nnabla.functions.patch_correlation(x1, x2, patch=(1, 1), shift=(0, 0), patch_step=(1, 1), shift_step=(1, 1), padding=(0, 0, 0, 0), channel_last=False)[source]¶
Multiplicative patch-wise comparison between inputs x1 and x2, which must both be 4-dimensional NCHW (with channel_last=False) or NHWC (with channel_last=True) arrays (where N is the number of samples, H and W are the sample height and width and C is the number of channels). The function returns a 5-D array with shape \((N, C_y, C_x, H_o, W_o)\) where \(H_o, W_o\) are determined by the possible patch locations within the, optionally padded, input image size and \(C_y, C_x\) are determined by the optionally shifted patch positions.
Mathematically, the patch correlation is formulated as
\[O(s_y, s_x, h_0, w_0) = \sum_{c} \sum_{k_h} \sum_{k_w} I_1(c, h + k_h, w + k_w) \times I_2(c, h + k_h + s_h, w + k_w + s_w),\]
where \(I_1(c, h, w)\) and \(I_2(c, h, w)\) are the inputs at \(c\)-th channel, \(h\)-th height, and \(w\)-th width, \(k_h, k_w\) indices for the patch size and \(s_h, s_w\) indices for the shifts.
A single correlation value (per sample) is produced if the patch extends to the image dimensions and all other parameters use the default values.
>>> import numpy as np, nnabla as nn, nnabla.functions as F
>>> N, C, H, W = (1, 2, 3, 4)
>>> x = nn.Variable.from_numpy_array(np.ones([N, C, H, W]))
>>> F.patch_correlation(x, x, patch=(H, W)).d
array([[[[[24.]]]]], dtype=float32)
A patch that is smaller than the image size moves horizontally and vertically, producing a value per position. The patch_step argument may be used to control the position increments.
>>> F.patch_correlation(x, x, patch=(H-1, W-1)).d
array([[[[[12., 12.],
          [12., 12.]]]]], dtype=float32)
>>> F.patch_correlation(x, x, patch=(H-1, W-1), patch_step=(2, 1)).d
array([[[[[12., 12.]]]]], dtype=float32)
Multiple correlations may be performed at each position between the patch from x1 and patches from x2 at relative offsets striding the maximum vertical and horizontal distance given by the shift values at increments of shift_step. The shifted correlation values can be obtained from the second and third output dimensions for the vertical and horizontal shifts.
>>> F.patch_correlation(x, x, (H, 1), shift=(0, 1)).shape
(1, 1, 3, 1, 4)
>>> F.patch_correlation(x, x, (H, 1), shift=(0, 1)).d
array([[[[[0., 6., 6., 6.]],
         [[6., 6., 6., 6.]],
         [[6., 6., 6., 0.]]]]], dtype=float32)
>>> F.patch_correlation(x, x, (H, 1), shift=(0, 1), shift_step=(1, 2)).d
array([[[[[0., 6., 6., 6.]],
         [[6., 6., 6., 0.]]]]], dtype=float32)
Padding with zero values may be applied individually to the top, bottom, left and right side of the input image.
>>> F.patch_correlation(x, x, patch=(H, W), padding=(0, 1, W, W)).d
array([[[[[ 0.,  6., 12., 18., 24., 18., 12.,  6.,  0.],
          [ 0.,  4.,  8., 12., 16., 12.,  8.,  4.,  0.]]]]], dtype=float32)
This function may be used to implement the FlowNetC correlation layer.
>>> N, C, H, W = (1, 256, 44, 60)
>>> x1, x2 = nn.Variable((N, C, H, W)), nn.Variable((N, C, H, W))
>>> F.patch_correlation(x1, x2, shift=20, shift_step=2).shape
(1, 21, 21, 44, 60)
- Parameters:
x1 (Variable) – Input N-D array with shape \((N, C, H, W)\) or \((N, H, W, C)\).
x2 (Variable) – Input N-D array with shape \((N, C, H, W)\) or \((N, H, W, C)\).
patch – A tuple with height and width of the correlation patch. A single integer expands to identical height and width.
shift – A tuple of maximum vertical and horizontal displacement of patches from x2 that are correlated with a single patch from x1. A single integer expands to identical vertical and horizontal displacement.
patch_step – A tuple of vertical and horizontal increments for advancing the position of the correlation patch within the input image shape. A single integer expands to identical vertical and horizontal increments.
shift_step – A tuple of vertical and horizontal increments for advancing the relative offset position within the shift range. A single integer expands to identical vertical and horizontal increments.
padding – A tuple of top, bottom, left and right padding extent. A tuple of two values yields identical top/bottom and left/right padding from the first and second tuple value. A single integer expands to identical padding extent for all sides.
channel_last – Last dimension is the channel (NHWC order) if True.
- Returns:
N-D array with shape \((N, C_y, C_x, H_o, W_o)\) or \((N, H, W, C_y, C_x)\) if channel_last=True.
A spatial size of the output is calculated as
\[H_o = \frac{H + (top\_pad + bottom\_pad) - patch_v}{patch\_step_v} + 1.\]
A channel size of the output is calculated as
\[C_y = \frac{2 \times shift_v}{shift\_step_v} + 1.\]
\(W_o\) and \(C_x\) are calculated in the same way with the corresponding horizontal components.
- Return type:
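The output size formulas can be checked against the examples above with a small helper. This is a sketch only: the function name `patch_correlation_shape` is hypothetical, it assumes NCHW layout, and it assumes the divisions round down (integer division), which agrees with the shapes shown in the examples.

```python
def patch_correlation_shape(n, h, w, patch=(1, 1), shift=(0, 0),
                            patch_step=(1, 1), shift_step=(1, 1),
                            padding=(0, 0, 0, 0)):
    """Return (N, C_y, C_x, H_o, W_o) for an NCHW input of size (n, C, h, w)."""
    top, bottom, left, right = padding
    # Number of patch positions along each spatial axis.
    h_o = (h + top + bottom - patch[0]) // patch_step[0] + 1
    w_o = (w + left + right - patch[1]) // patch_step[1] + 1
    # Number of shift offsets along each axis (from -shift to +shift).
    c_y = 2 * shift[0] // shift_step[0] + 1
    c_x = 2 * shift[1] // shift_step[1] + 1
    return (n, c_y, c_x, h_o, w_o)


# FlowNetC-style configuration from the example above.
print(patch_correlation_shape(1, 44, 60, shift=(20, 20), shift_step=(2, 2)))
```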
- nnabla.functions.roi_align(input, boxes, output_size, spatial_scale=(1.0, 1.0), sampling_ratio=None, channel_last=None, n_outputs=-1, outputs=None)[source]¶
Map Regions of Interest (RoI) defined by bounding boxes to features of output_size height and width using bilinear interpolation with sampling_ratio points in the interpolation grid.
>>> import numpy as np, nnabla as nn, nnabla.functions as F
>>> nn.set_auto_forward(True)
>>> input = F.pad(F.constant(1, (1, 1, 2, 2)) * 2, (1, 1, 1, 1), "constant", 1)
>>> print(input.d)
[[[[1. 1. 1. 1.]
   [1. 2. 2. 1.]
   [1. 2. 2. 1.]
   [1. 1. 1. 1.]]]]
>>> boxes = nn.Variable.from_numpy_array([[0, 0, 0, 4, 4], [0, 1, 1, 3, 3]])
>>> output = F.roi_align(input, boxes, (2, 2))
>>> print(output.d[0])
[[[1.25 1.25]
  [1.25 1.25]]]
>>> print(output.d[1])
[[[2. 2.]
  [2. 2.]]]
The spatial_scale argument tuple may be used to appropriately scale the box coordinates, for example, to scale normalized box coordinates to the input height and width dimensions.
>>> input = F.reshape(F.arange(1, 13), (1, 1, 3, 4))
>>> print(input.d)
>>> boxes = nn.Variable.from_numpy_array([[0, 1/4, 1/3, 3/4, 2/30]])
>>> output = F.roi_align(input, boxes, (1, 2), spatial_scale=(3, 4))
>>> print(output.d)
[[[[6. 7.]]]]
- Parameters:
input (Variable) – N-D array with shape \((N, H, W, C)\) or \((N, C, H, W)\).
boxes (Variable) – N-D array with shape \((K, 5)\) containing box coordinates in (b, x1, y1, x2, y2) format where b is the batch index. Note that an invalid (out-of-range) batch index will generate an error only when running on CPU; when using a GPU context the batch index values are clipped to the range of input samples.
output_size (tuple of int) – the height and width of the output feature maps.
spatial_scale (repeated float) – Scaling factor from box to input coordinates, as (x, y). [default=(1.0, 1.0)]
sampling_ratio (int) – The number of sampling points used for interpolation. Computed as ceil((y2 - y1) / output_size[0]) for height and likewise for width if sampling_ratio <= 0. [default=-1]
channel_last (bool) – If True, the last dimension is considered as channel dimension, a.k.a. NHWC order. [default=False]
- Returns:
N-D array with shape \((K, C, output\_size[0], output\_size[1])\) or \((K, output\_size[0], output\_size[1], C)\).
- Return type:
Neural Network Activation¶
- nnabla.functions.sigmoid(x, n_outputs=-1, outputs=None)[source]¶
Element-wise sigmoid function.
\[f(x) = \frac{1}{1 + \exp(-x)},\]
- nnabla.functions.swish(x, n_outputs=-1, outputs=None)[source]¶
Element-wise swish function, by Ramachandran et al. (2017).
\[y_i = \frac{x_i}{1 + \exp(-x_i)},\]
- nnabla.functions.tanh(x, n_outputs=-1, outputs=None)[source]¶
Element-wise hyperbolic tangent (tanh) function.
\[y_i = \tanh (x_i)\]
- Parameters:
x (Variable) – N-D array
- Returns:
N-D array with the same shape as x
- Return type:
- nnabla.functions.relu(x, inplace=False, n_outputs=-1, outputs=None)[source]¶
Element-wise Rectified Linear Unit (ReLU) function.
\[y_i = \max (0, x_i)\]
- Parameters:
- Returns:
N-D array with the same shape as x
- Return type:
- nnabla.functions.softmax(x, axis=None, n_outputs=-1, outputs=None)[source]¶
Softmax normalization. Calculates
\[y_i = \frac{\exp(x_i)}{\sum_j \exp(x_j)}\]
along the dimension specified by axis, where \(x_i\) is the input and \(y_i\) is the output.
- Parameters:
- Returns:
N-D array with the same shape as x
- Return type:
- nnabla.functions.log_softmax(x, axis=None, n_outputs=-1, outputs=None)[source]¶
Fused operation of Softmax normalization followed by log, which is defined as
\[y_i = \log \frac{\exp(x_i)}{\sum_j \exp(x_j)},\]
where \(x_i\) is the input and \(y_i\) is the output at the i-th channel. An advantage of this fusion is reducing the numerical instability due to the log application.
The original definition can be rewritten as
\[y_i = x_i - \max_j(x_j) - \log\left(\sum_j \exp(x_j - \max_k(x_k))\right).\]
It is more stable because the log is always applied to a value \(\ge 1\) (the term with the maximum input contributes \(\exp(0) = 1\) to the sum), while a log can be evaluated for 0 in the non-fused operation.
Also, backward gradient computation is more stable than the original one as it doesn’t perform the division that comes from the gradient of log. The definition is as follows.
\[dx_i = dy_i - \exp(y_i) \sum_j dy_j\]
where \(dx_i\) and \(dy_i\) denote gradients of loss wrt \(x_i\) and \(y_i\) respectively, and \(\exp(y_i)\) is the softmax output.
- Parameters:
- Returns:
N-D array with the same shape as x
- Return type:
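The max-shifted form above can be sketched in plain Python to show why it avoids overflow. This is an illustrative reference for a single vector, not NNabla's implementation; the backward helper uses the standard log-softmax gradient \(dx_i = dy_i - \exp(y_i) \sum_j dy_j\), and both function names are hypothetical.

```python
import math


def log_softmax(xs):
    # Shifting by the maximum keeps every exponent <= 0, so exp cannot
    # overflow, and the sum is at least 1, so log never sees 0.
    m = max(xs)
    log_sum = math.log(sum(math.exp(x - m) for x in xs))
    return [x - m - log_sum for x in xs]


def log_softmax_backward(ys, dys):
    # exp(y_i) recovers the softmax output without any division.
    total = sum(dys)
    return [dy - math.exp(y) * total for y, dy in zip(ys, dys)]


ys = log_softmax([1000.0, 1001.0])  # math.exp(1000.0) alone would overflow
dxs = log_softmax_backward(ys, [0.3, -1.2])
print(ys, dxs)
```

A quick sanity check is that the exponentials of the outputs sum to one and that the input gradients sum to zero.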
- nnabla.functions.elu(x, alpha=1.0, n_outputs=-1, outputs=None)[source]¶
Element-wise Exponential Linear Unit (ELU) function.
\[\begin{split}y_i= \left\{ \begin{array}{ll} x_i & (x_i > 0)\\ \alpha (\exp(x_i) - 1) & (x_i \leq 0) \end{array} \right..\end{split}\]
- Parameters:
- Returns:
N-D array with the same shape as x
- Return type:
- nnabla.functions.selu(x, scale=1.05070098735548, alpha=1.673263242354377, n_outputs=-1, outputs=None)[source]¶
Element-wise Scaled Exponential Linear Unit (SELU) function by Klambauer et al. (2017).
\[\begin{split}y_i= \lambda \left\{ \begin{array}{ll} x_i & (x_i > 0)\\ \alpha (\exp(x_i) - 1) & (x_i \leq 0) \end{array} \right..\end{split}\]
The coefficients \(\lambda\) and \(\alpha\) default to the following values \(\lambda_{01}\) and \(\alpha_{01}\), respectively, provided by Klambauer et al. (2017):
\[\begin{split}\begin{array}{lll} \lambda_{01} &=& \left( 1 - \operatorname{erfc}\left( \frac{1}{\sqrt{2}} \right) \sqrt{e} \right) \sqrt{2 \pi} \\ && \left( 2 \operatorname{erfc} \left( \sqrt{2} \right) e^2 + \pi \operatorname{erfc}\left( \frac{1}{\sqrt{2}} \right)^2 e \right. \\ && \left. - 2(2 + \pi) \operatorname{erfc} \left( \frac{1}{\sqrt{2}} \right) \sqrt{e} + \pi + 2 \right)^{-1/2} \\ &\approx& 1.0507 \\ \alpha_{01} &=& - \frac {\sqrt {\frac {2}{\pi}}} {\operatorname{erfc} \left( \frac{1}{\sqrt{2}} \right) \exp \left(\frac {1} {2} \right) - 1} \\ &\approx& 1.67326 \end{array}\end{split}\]
- Parameters:
- Returns:
N-D array with the same shape as x
- Return type:
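The closed-form expressions for \(\lambda_{01}\) and \(\alpha_{01}\) can be evaluated with the standard library to confirm that they match the default `scale` and `alpha` in the signature. The sketch below is a scalar reference; the helper names are hypothetical and it does not call NNabla.

```python
import math


def selu_constants():
    """Evaluate lambda_01 and alpha_01 from the closed-form expressions."""
    sqrt_e = math.sqrt(math.e)
    c1 = math.erfc(1.0 / math.sqrt(2.0))
    c2 = math.erfc(math.sqrt(2.0))
    alpha = -math.sqrt(2.0 / math.pi) / (c1 * math.exp(0.5) - 1.0)
    lam = ((1.0 - c1 * sqrt_e) * math.sqrt(2.0 * math.pi)
           * (2.0 * c2 * math.e ** 2 + math.pi * c1 ** 2 * math.e
              - 2.0 * (2.0 + math.pi) * c1 * sqrt_e + math.pi + 2.0) ** -0.5)
    return lam, alpha


def selu(x, scale=1.05070098735548, alpha=1.673263242354377):
    # Scalar version of the piecewise definition above.
    return scale * (x if x > 0 else alpha * (math.exp(x) - 1.0))


lam, alpha = selu_constants()
print(lam, alpha)
```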
- nnabla.functions.crelu(x, axis=1, n_outputs=-1, outputs=None)[source]¶
Element-wise Concatenated Rectified Linear Unit (CReLU) function. This function calculates the ReLU of \(x\) and \(-x\) , then concatenates the results together at a specified axis, and returns the resulting array.
- Parameters:
- Returns:
N-D array where axis dimension is doubled by concatenating.
- Return type:
- nnabla.functions.celu(x, alpha=1.0, axis=1, n_outputs=-1, outputs=None)[source]¶
Element-wise Concatenated Exponential Linear Unit (CELU) function. Concatenates ELU outputs of positive and negative inputs together at specified axis.
- Parameters:
- Returns:
N-D array where axis dimension is doubled by concatenating.
- Return type:
- nnabla.functions.gelu(x, n_outputs=-1, outputs=None)[source]¶
Gaussian Error Linear Unit (GELU) function.
\[GELU(x) = xP(X \leq x) = x \Phi (x)\]
which is approximated by
\[GELU(x) = 0.5x (1 + \tanh ( \sqrt{2/\pi}(x + 0.044715x^3) ))\]
- Parameters:
x (Variable) – N-D array
- Returns:
N-D array with the same shape as x
- Return type:
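To see how close the tanh approximation is to the exact definition, both can be written as scalar functions using only the standard library. This is a sketch for comparison; it does not indicate which form NNabla evaluates internally, and the function names are hypothetical.

```python
import math


def gelu_exact(x):
    # x * Phi(x), with Phi the standard normal CDF expressed through erf.
    return x * 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def gelu_tanh(x):
    # The tanh-based approximation given above.
    inner = math.sqrt(2.0 / math.pi) * (x + 0.044715 * x ** 3)
    return 0.5 * x * (1.0 + math.tanh(inner))


# Largest absolute gap on a grid over [-5, 5].
max_gap = max(abs(gelu_exact(i / 10.0) - gelu_tanh(i / 10.0))
              for i in range(-50, 51))
print(max_gap)
```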
- nnabla.functions.mish(x, n_outputs=-1, outputs=None)[source]¶
Mish activation function.
\[Mish(x_i) = x_i \tanh(\log(1+\exp(x_i)))\]
- Parameters:
x (Variable) – N-D array
- Returns:
N-D array with the same shape as x
- Return type:
- nnabla.functions.prelu(x0, x1, base_axis=1, n_outputs=-1, outputs=None)[source]¶
Element-wise Parametrized Rectified Linear Unit function. Calculates:
\[y_i = \max(0, x_i) + w_i \min(0, x_i)\]
where negative slope \(w\) is learned and can vary across channels (an axis specified with base_axis).
- Parameters:
- Returns:
N-D array.
- Return type:
- nnabla.functions.leaky_relu(x, alpha=0.1, inplace=False, n_outputs=-1, outputs=None)[source]¶
Element-wise Leaky Rectified Linear Unit (Leaky ReLU) function.
It is defined as:
\[y_i = \alpha * \min(0, x_i) + \max (0, x_i)\]
- Parameters:
- Returns:
N-D array with the same shape as x
- Return type:
- nnabla.functions.relu6(x, n_outputs=-1, outputs=None)[source]¶
Element-wise ReLU6 function. Capping ReLU activation to 6 is often observed to learn sparse features earlier.
\[ReLU6(x) = \min(\max(0,x),6)\]
- Parameters:
x (Variable) – N-D array
- Returns:
N-D array with the same shape as x
- Return type:
- nnabla.functions.hard_sigmoid(x, n_outputs=-1, outputs=None)[source]¶
Segment-wise linear approximation of sigmoid. Preferable when speed of computation is more important than precision. Returns \(0\) if \(x < -2.5\). Returns \(1\) if \(x> 2.5\). Returns \(0.2x + 0.5\) if \(-2.5 <= x <= 2.5\).
- Parameters:
x (Variable) – N-D array
- Returns:
N-D array with the same shape as x
- Return type:
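The three cases of the piecewise definition collapse into a single clamp of the line \(0.2x + 0.5\) to the range \([0, 1]\). The scalar sketch below shows this and compares it with the exact sigmoid; the function names are hypothetical and NNabla is not called.

```python
import math


def hard_sigmoid(x):
    # 0 for x < -2.5, 1 for x > 2.5, and 0.2 * x + 0.5 in between.
    return min(1.0, max(0.0, 0.2 * x + 0.5))


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


for x in (-3.0, -1.0, 0.0, 1.0, 3.0):
    print(x, hard_sigmoid(x), sigmoid(x))
```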
- nnabla.functions.hard_tanh(x, n_outputs=-1, outputs=None)[source]¶
Element-wise HardTanh function. Computationally cheaper than Tanh function. Returns \(1\) if \(x > 1\). Returns \(-1\) if \(x < -1\). Returns \(x\) otherwise.
- Parameters:
x (Variable) – N-D array
- Returns:
N-D array with the same shape as x
- Return type:
- nnabla.functions.log_sigmoid(x, n_outputs=-1, outputs=None)[source]¶
Element-wise LogSigmoid function.
\[LogSigmoid(x_i) = \log(1/(1+\exp(-x_i)))\]
- Parameters:
x (Variable) – N-D array
- Returns:
N-D array with the same shape as x
- Return type:
- nnabla.functions.softplus(x, beta=1.0, n_outputs=-1, outputs=None)[source]¶
Element-wise SoftPlus function. Unlike Sigmoid and Tanh that have upper and lower bound, SoftPlus is only lower-bounded by 0.
\[SoftPlus(x_i) = \frac{1}{\beta} * \log(1+\exp(\beta * x_i))\]
- Parameters:
- Returns:
N-D array with the same shape as x
- Return type:
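Evaluating the formula literally overflows for large \(\beta x\), so a scalar reference is usually written in an algebraically equivalent form. The sketch below uses the identity \(\log(1 + e^{z}) = \max(z, 0) + \log(1 + e^{-|z|})\); it is an illustration with a hypothetical function name, not a statement about how NNabla computes SoftPlus.

```python
import math


def softplus(x, beta=1.0):
    z = beta * x
    # Equivalent to log(1 + exp(z)) / beta, but exp never sees a large
    # positive argument.
    return (max(z, 0.0) + math.log1p(math.exp(-abs(z)))) / beta


print(softplus(0.0), softplus(1000.0), softplus(1.0, beta=2.0))
```

For large positive inputs the result approaches \(x\), and for large negative inputs it approaches the lower bound 0.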
- nnabla.functions.softsign(x, n_outputs=-1, outputs=None)[source]¶
Element-wise SoftSign. Can be used in place of Tanh function. While Tanh converges exponentially, SoftSign converges polynomially.
\[SoftSign(x) = x/(1+|x|)\]
- Parameters:
x (Variable) – N-D array
- Returns:
N-D array with the same shape as x
- Return type:
- nnabla.functions.tanh_shrink(x, n_outputs=-1, outputs=None)[source]¶
Element-wise TanhShrink function.
\[TanhShrink(x) = x - \tanh(x)\]
- Parameters:
x (Variable) – N-D array
- Returns:
N-D array with the same shape as x
- Return type:
- nnabla.functions.sinc(x, n_outputs=-1, outputs=None)[source]¶
Element-wise Sinc function. Unlike other popular activation functions, it is not monotonic: it rises and falls. Returns \(1\) if \(x = 0\). Returns \(\sin(x)/x\) otherwise.
- Parameters:
x (Variable) – N-D array
- Returns:
N-D array with the same shape as x
- Return type:
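The special case at zero exists because \(\sin(x)/x\) is undefined there, while its limit is 1. A scalar sketch of the definition, with a hypothetical function name and no NNabla calls:

```python
import math


def sinc(x):
    # Unnormalized sinc: sin(x) / x, with the removable singularity at 0
    # filled in by the limit value 1.
    return 1.0 if x == 0 else math.sin(x) / x


for x in (0.0, 1e-8, math.pi / 2, math.pi, 3 * math.pi / 2):
    print(x, sinc(x))
```

The sign change between \(\pi/2\) and \(3\pi/2\) shows the rises and falls mentioned above.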
Normalization¶
- nnabla.functions.batch_normalization(x, beta, gamma, mean, variance, axes=[1], decay_rate=0.9, eps=1e-05, batch_stat=True, output_stat=False, n_outputs=None)[source]¶
Batch normalization.
\[\begin{split}\begin{eqnarray} \mu &=& \frac{1}{M} \sum x_i \\ \sigma^2 &=& \frac{1}{M} \sum \left(x_i - \mu\right)^2 \\ \hat{x}_i &=& \frac{x_i - \mu}{\sqrt{\sigma^2 + \epsilon}} \\ y_i &=& \hat{x}_i \gamma + \beta. \end{eqnarray}\end{split}\]
At testing time, the mean and variance values used are those that were computed during training by moving average.
- Parameters:
x (Variable) – N-D array of input.
beta (Variable or None) – N-D array of beta which is learned. If None, the bias term is omitted.
gamma (Variable or None) – N-D array of gamma which is learned. If None, the scale term is omitted.
mean (Variable or None) – N-D array of running mean (modified during forward execution). If None, dummy variable is created and running mean is not updated. mean=None with batch_stat=False is prohibited.
variance (Variable or None) – N-D array of running variance (modified during forward execution). If None, dummy variable is created and running variance is not updated. variance=None with batch_stat=False is prohibited.
axes (list of int or int) – Mean and variance are calculated along these axes.
decay_rate (float) – Decay rate of running mean and variance.
eps (float) – Tiny value to avoid zero division by std.
batch_stat (bool) – Use mini-batch statistics rather than running ones. If False, mean and variance must be Variable (None is prohibited).
output_stat (bool) – If true, the batch statistics of mean and variance will be returned as Variables. They are also differentiable.
- Returns:
Returns batch normalization output as Variable. If output_stat=True, it also returns the mean and variance of the mini-batch.
See also
nnabla.function_bases.batch_normalization.
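The four training-time equations can be followed step by step on a single channel with plain Python. This is a minimal sketch under stated assumptions: it treats one list of \(M\) values as the elements reduced over, uses the hypothetical name `batch_norm_1d`, and leaves out the running mean and variance update entirely.

```python
import math


def batch_norm_1d(xs, gamma=1.0, beta=0.0, eps=1e-5):
    """Normalize one channel using its mini-batch statistics."""
    m = len(xs)
    mu = sum(xs) / m                           # batch mean
    var = sum((x - mu) ** 2 for x in xs) / m   # biased batch variance
    ys = [(x - mu) / math.sqrt(var + eps) * gamma + beta for x in xs]
    return ys, mu, var


ys, mu, var = batch_norm_1d([1.0, 2.0, 3.0, 4.0])
print(ys, mu, var)
```

With gamma=1 and beta=0 the outputs have zero mean and a variance slightly below one, the gap coming from eps.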
- nnabla.functions.fused_batch_normalization(x, beta, gamma, mean, variance, z=None, axes=[1], decay_rate=0.9, eps=1e-05, batch_stat=True, nonlinearity='relu', output_stat=False, n_outputs=None)[source]¶
Batch normalization fused with an add operation and an activation.
- Parameters:
x (Variable) – N-D array of input.
beta (Variable or None) – N-D array of beta which is learned. If None, the bias term is omitted.
gamma (Variable or None) – N-D array of gamma which is learned. If None, the scale term is omitted.
mean (Variable or None) – N-D array of running mean (modified during forward execution). If None, dummy variable is created and running mean is never updated. mean=None with batch_stat=False is prohibited.
variance (Variable) – N-D array of running variance (modified during forward execution). If None, dummy variable is created and running variance is not updated. variance=None with batch_stat=False is prohibited.
z (Variable, optional) – N-D array
axes (list of int or int) – Mean and variance are calculated along these axes.
decay_rate (float) – Decay rate of running mean and variance.
eps (float) – Tiny value to avoid zero division by std.
batch_stat (bool) – Use mini-batch statistics rather than running ones. If False, mean and variance must be Variable (None is prohibited).
nonlinearity (str) – Nonlinearity chosen from relu. Default is relu.
output_stat (bool) – If true, the batch statistics of mean and variance will be returned as Variables. They are also differentiable.
- Returns:
Returns batch normalization output as Variable. If output_stat=True, it also returns the mean and variance of the mini-batch.
See also
nnabla.function_bases.batch_normalization.
- nnabla.functions.sync_batch_normalization(x, beta, gamma, mean, variance, comm, group='world', axes=[1], decay_rate=0.9, eps=1e-05, batch_stat=True, output_stat=False, n_outputs=None)[source]¶
Synchronized batch normalization.
For some tasks (e.g., semantic segmentation), the batch size per process is too small for the BatchNormalization layer to work well. The SyncBatchNormalization layer addresses this by synchronizing batch statistics (mean and variance) between multiple processes.
\[\begin{split}\begin{eqnarray} \mu &=& \frac{1}{M} \sum x_i \\ \sigma^2 &=& \frac{1}{M} \sum \left(x_i - \mu\right)^2 \\ \hat{x}_i &=& \frac{x_i - \mu}{\sqrt{\sigma^2 + \epsilon}} \\ y_i &=& \hat{x}_i \gamma + \beta. \end{eqnarray}\end{split}\]
References
Implementing Synchronized Multi-GPU Batch Normalization https://hangzhang.org/PyTorch-Encoding/notes/syncbn.html
- Parameters:
x (Variable) – N-D array of input.
beta (Variable or None) – N-D array of beta which is learned. If None, the bias term is omitted.
gamma (Variable or None) – N-D array of gamma which is learned. If None, the scale term is omitted.
mean (Variable or None) – N-D array of running mean (modified during forward execution). If None, dummy variable is created and running mean is never updated. mean=None with batch_stat=False is prohibited.
variance (Variable or None) – N-D array of running variance (modified during forward execution). If None, dummy variable is created and running variance is never updated. variance=None with batch_stat=False is prohibited.
comm (Communicator) – The communicator
group (string) – The name of the communicator group
axes (list of int or int) – Mean and variance are calculated along these axes.
decay_rate (float) – Decay rate of running mean and variance.
eps (float) – Tiny value to avoid zero division by std.
batch_stat (bool) – Use mini-batch statistics rather than running ones. If False, mean and variance must be Variable (None is prohibited).
output_stat (bool) – If true, the batch statistics of mean and variance will be returned as Variables. They are also differentiable.
- Returns:
Returns batch normalization output as Variable. If output_stat=True, it also returns the mean and variance of the mini-batch.
See also
nnabla.function_bases.batch_normalization.
- nnabla.functions.mean_subtraction(x, mean, t, base_axis=1, update_running_mean=True)[source]¶
It subtracts the mean of the elements of the input array, and normalizes it to \(0\). Preprocessing arrays with this function has the effect of improving accuracy in various tasks such as image classification.
At training time, this function is defined as
\[\begin{split}\begin{eqnarray} \mu &=& \frac{1}{M} \sum x_i \\ y_i &=& x_i - \mu \end{eqnarray}\end{split}\]
At testing time, the mean values used are those that were computed during training by moving average.
Note
The backward performs an approximated differentiation that takes into account only the latest mini-batch.
- Parameters:
x (Variable) – N-D array of input.
mean (Variable) – N-D array of running mean (modified during forward execution).
t (Variable) – Scalar of num of iteration of running mean (modified during forward execution).
base_axis (int) – Base axis of Mean Subtraction operation. Dimensions up to base_axis are treated as sample dimensions. [default=1]
update_running_mean (bool) – Update running mean during forward execution. [default=True]
- Returns:
N-D array.
- Return type:
See also
nnabla.function_bases.mean_subtraction.
- nnabla.functions.norm_normalization(x, p=None, axes=None, eps=1e-12)[source]¶
Norm normalization.
\[y_i = \frac{x_i}{\|x\|_p}\]
- Parameters:
x (Variable) – N-D array.
p (float) – Order of the norm. [default=2]
axes (repeated int64) – Axes to be reduced. If empty list is given, all dimensions are reduced. [default=range(x.ndim)]
eps (float) – Epsilon for the normalization. This eps is added before taking the p-th root in the norm computation. [default=1e-12]
- Returns:
N-D array
- Return type:
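The role of eps is easiest to see in a scalar reference for a single vector reduced over all of its elements. The sketch below follows the parameter description (eps is added before the p-th root is taken); the function name is hypothetical and multi-axis reduction is not covered.

```python
def norm_normalize(xs, p=2.0, eps=1e-12):
    # p-norm with eps added inside the root, as described for `eps`.
    norm = (sum(abs(x) ** p for x in xs) + eps) ** (1.0 / p)
    return [x / norm for x in xs]


print(norm_normalize([3.0, 4.0]))          # L2: close to [0.6, 0.8]
print(norm_normalize([3.0, 4.0], p=1.0))   # L1: close to [3/7, 4/7]
print(norm_normalize([0.0, 0.0]))          # eps prevents division by zero
```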
- nnabla.functions.clip_by_value(x, min, max)[source]¶
Clip inputs by values.
\[\begin{split}y = \begin{cases} max & (x > max) \\ x & (otherwise) \\ min & (x < min) \end{cases}.\end{split}\]
- Parameters:
x (Variable) – An input variable.
min (Variable or float) – A min variable or float value by which
x
is clipped. Note that if Variable is given, its shape must be the same asx
’s.max (Variable or float) – A max variable or float value by which
x
is clipped. Note that if Variable is given, its shape must be the same asx
’s
- Returns:
N-D array.
- Return type:
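The piecewise definition of clip_by_value can be sketched in plain Python as follows. This assumes scalar bounds only (the Variable-bound case is not covered), and clip_by_value_ref is a hypothetical name used for illustration.

```python
def clip_by_value_ref(xs, lo, hi):
    """Clip each element into [lo, hi], following the piecewise definition."""
    return [hi if x > hi else lo if x < lo else x for x in xs]


clipped = clip_by_value_ref([-2.0, -0.5, 0.0, 0.7, 3.0], lo=-1.0, hi=1.0)
print(clipped)
```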
- nnabla.functions.clip_grad_by_value(x, min, max, n_outputs=-1, outputs=None)[source]¶
In forward pass, the function behaves as the identity.
In backward pass,
\[\begin{split}g_x = \begin{cases} max & (g_y > max) \\ g_y & (otherwise) \\ min & (g_y < min) \end{cases}.\end{split}\]
A typical use case is to prevent gradient explosion through a whole computational graph. For example, to clip gradient values for each feature map:
    import numpy as np
    import nnabla as nn
    import nnabla.functions as F
    import nnabla.parametric_functions as PF
    x = nn.Variable([16, 3, 32, 32])
    min = F.broadcast(nn.Variable.from_numpy_array(np.asarray([-1.0]).reshape((1, 1, 1, 1))), (16, 3, 32, 32))
    max = F.broadcast(nn.Variable.from_numpy_array(np.asarray([1.0]).reshape((1, 1, 1, 1))), (16, 3, 32, 32))
    c = F.clip_grad_by_value(x, min=min, max=max)
    h = PF.convolution(c, 64, (3, 3), pad=(1, 1))
- Parameters:
x (Variable) – N-D array of input.
min (Variable) – N-D array of minimum input value by which the gradients of y are clipped. Note that the shape of min must be the same as x’s, and backward to min is not performed.
max (Variable) – N-D array of maximum input value by which the gradients of y are clipped. Note that the shape of max must be the same as x’s, and backward to max is not performed.
- Returns:
N-D array.
- Return type:
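To make the forward/backward split of clip_grad_by_value concrete, here is a small pure-Python sketch: the forward pass copies the input, and the backward pass clips the incoming gradient. The helper names are hypothetical and the bounds are assumed to be scalars for brevity.

```python
def clip_grad_forward_ref(xs):
    """Forward pass: identity."""
    return list(xs)


def clip_grad_backward_ref(grad_ys, lo, hi):
    """Backward pass: clip each incoming gradient g_y into [lo, hi]."""
    return [hi if g > hi else lo if g < lo else g for g in grad_ys]


y = clip_grad_forward_ref([0.5, -3.0])
g_x = clip_grad_backward_ref([-5.0, 0.2, 7.0], lo=-1.0, hi=1.0)
print(y, g_x)
```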
Note
All nnabla functions in nnabla.functions are decorated with the nnabla.function_bases.function_api decorator, which queries the current context and passes it into the first argument of the original function. The original function always takes a context as the first argument.
- nnabla.functions.clip_by_norm(x, clip_norm, axis=None)[source]¶
Clips the input by its L2 norm when the L2 norm is larger than the threshold value (defined by clip_norm). If the norm is less than the threshold, the input is not modified. When clipping is applied, the operation is represented as
\[y = N \times \frac{x}{\|x\|_2}.\]
where \(x\) is the input, \(y\) is the output, and \(N\) is clip_norm. This is the case when axis is not set. When axis is set, the norm is computed over axis.
- Parameters:
- Returns:
N-D array.
- Return type:
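The clipping rule for clip_by_norm can be sketched as below for a flat list, assuming the norm is taken over all elements (no axis given). clip_by_norm_ref is a hypothetical helper written only to illustrate the formula.

```python
import math


def clip_by_norm_ref(xs, clip_norm):
    """Rescale xs to have L2 norm clip_norm, but only if its norm exceeds clip_norm."""
    norm = math.sqrt(sum(x * x for x in xs))
    if norm <= clip_norm:
        return list(xs)            # below the threshold: left unmodified
    scale = clip_norm / norm       # N / ||x||_2
    return [scale * x for x in xs]


clipped = clip_by_norm_ref([3.0, 4.0], clip_norm=1.0)    # norm 5 exceeds 1
unchanged = clip_by_norm_ref([0.3, 0.4], clip_norm=1.0)  # norm 0.5 does not
print(clipped, unchanged)
```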
- nnabla.functions.clip_grad_by_norm(x, clip_norm=None, axes=None, n_outputs=-1, outputs=None)[source]¶
In the forward pass, the function behaves like the identity.
In the backward pass,
\[g_x = N \times \frac{g_y}{\|g_y\|_2}.\]
where \(g_x\) is the gradient w.r.t. the input, \(g_y\) is the gradient w.r.t. the output, and \(N\) is clip_norm, the value that the norm of \(g_y\) becomes. This is the case when axes is not set. When axes is set, the norm is computed over axes.
A typical use case is to prevent gradient explosion through a whole computational graph. For example, to normalize gradient values over the feature axis:
    import nnabla as nn
    import nnabla.functions as F
    import nnabla.parametric_functions as PF
    x = nn.Variable([16, 3, 32, 32])
    c = F.clip_grad_by_norm(x, axes=(1, ))
    h = PF.convolution(c, 64, (3, 3), pad=(1, 1))
- Parameters:
- Returns:
N-D array.
- Return type:
- nnabla.functions.layer_normalization(x, beta, gamma, batch_axis=0, eps=1e-05, output_stat=False)[source]¶
Applies Layer Normalization over an input tensor, which is defined as:
\[\begin{split}\begin{eqnarray} \mu^l &=& \frac{1}{H} \sum_{i=1}^{H} x_i^l \\ \sigma^l &=& \sqrt{\frac{1}{H} \sum_{i=1}^{H} \left(x_i^l - \mu^l\right)^2 + \epsilon} \\ y &=& \frac{x - \mu^l}{\sigma^l} \gamma + \beta \end{eqnarray}\end{split}\]
where \(x\) and \(y\) are the input and output variables, \(\mu^l\) and \(\sigma^l\) are the mean and std of each layer, calculated separately for each sample in the batch, and \(\beta\) and \(\gamma\) are adaptive biases and gains.
If the input shape is [B, C, H, W] (with batch_axis=0), the calculated mean and std have shape [B, 1, 1, 1].
References
- Parameters:
x (Variable) – An input variable.
beta (Variable or None) – Adaptive biases. If None, the bias term is omitted.
gamma (Variable or None) – Adaptive gains. If None, the scale term is omitted.
batch_axis (int or repeated int) – Batch axes. The mean and std are calculated separately for each index along these axes.
eps (float) – Tiny value to avoid zero division by std.
output_stat (bool) – If true, the calculated mean and std are also returned.
- Returns:
Output variable with normalized statistics, rescaled by gamma and beta.
* Variable: Mean (if output_stat=True).
* Variable: Std (if output_stat=True).
- Return type:
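The layer normalization equations can be sketched in pure Python for a 2-D input of shape [B, H], where each row is one sample and statistics are computed per row. This is only an illustration under simplifying assumptions: layer_norm_ref is a hypothetical helper, and gamma and beta are scalars here rather than Variables.

```python
import math


def layer_norm_ref(batch, gamma=1.0, beta=0.0, eps=1e-5):
    """Normalize each sample (row) by its own mean and std, then scale and shift."""
    out = []
    for row in batch:
        h = len(row)
        mu = sum(row) / h
        # eps is added inside the square root, as in the sigma equation
        sigma = math.sqrt(sum((v - mu) ** 2 for v in row) / h + eps)
        out.append([(v - mu) / sigma * gamma + beta for v in row])
    return out


y = layer_norm_ref([[1.0, 2.0, 3.0], [10.0, 20.0, 30.0]])
print(y)
```

Both rows come out with (almost exactly) the same values, because each sample is normalized by its own statistics regardless of its original scale.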
- nnabla.functions.instance_normalization(x, beta, gamma, channel_axis=1, batch_axis=0, eps=1e-05, output_stat=False)[source]¶
Applies Instance Normalization over an input tensor, which is defined as:
\[\begin{split}\begin{eqnarray} \mu^i &=& \frac{1}{H} \sum_{i=1}^{H} x_i^i \\ \sigma^i &=& \sqrt{\frac{1}{H} \sum_{i=1}^{H} \left(x_i^i - \mu^i\right)^2 + \epsilon} \\ y &=& \frac{x - \mu^i}{\sigma^i} \gamma + \beta \end{eqnarray}\end{split}\]
where \(x\) and \(y\) are the input and output variables, \(\mu^i\) and \(\sigma^i\) are the mean and std of each instance, calculated separately for each batch element and channel, and \(\gamma\) and \(\beta\) are adaptive gains and biases.
If the input shape is [B, C, H, W] (with channel_axis=1, batch_axis=0), the calculated mean and std have shape [B, C, 1, 1].
References
- Parameters:
x (Variable) – An input variable.
beta (Variable or None) – Adaptive biases. If None, the bias term is omitted.
gamma (Variable or None) – Adaptive gains. If None, the scale term is omitted.
channel_axis (int) – Channel axis.
batch_axis (int or repeated int) – Batch axes.
eps (float) – Tiny value to avoid zero division by std.
output_stat (bool) – If true, the batch statistics (mean and std) are also returned.
- Returns:
Normalized output variable.
* Variable: Mean (if output_stat=True).
* Variable: Std (if output_stat=True).
- Return type:
- nnabla.functions.group_normalization(x, beta, gamma, num_groups, channel_axis=1, batch_axis=0, eps=1e-05, output_stat=False)[source]¶
Applies Group Normalization over an input tensor, which is defined as:
\[\begin{split}\begin{eqnarray} \mu^g &=& \frac{1}{H} \sum_{i=1}^{H} x_i^g \\ \sigma^g &=& \sqrt{\frac{1}{H} \sum_{i=1}^{H} \left(x_i^g - \mu^g\right)^2 + \epsilon} \\ y &=& \frac{x - \mu^g}{\sigma^g} \gamma + \beta \end{eqnarray}\end{split}\]
where \(x\) and \(y\) are the input and output variables, \(\mu^g\) and \(\sigma^g\) are the mean and std of each group, which contains num_channels / num_groups channels, and \(\gamma\) and \(\beta\) are adaptive gains and biases.
The input channels, specified by channel_axis, are separated into num_groups groups, and the mean and std are calculated over each group. For example, if the input shape is [B, C, H, W] (with channel_axis=1, batch_axis=0), the input variable is first reshaped to [B, num_groups, C / num_groups, H, W] and standardized by its mean and std, whose shapes are [B, num_groups, 1, 1, 1]. Finally, the output variable is reshaped back to the original input shape ([B, C, H, W] in the case above).
References
- Parameters:
x (Variable) – An input variable.
beta (Variable or None) – Adaptive biases. If None, the bias term is omitted.
gamma (Variable or None) – Adaptive gains. If None, the scale term is omitted.
num_groups (int) – Number of groups. The channel dimension of x must be an integer multiple of num_groups.
channel_axis (int) – Channel axis.
batch_axis (int or repeated int) – Batch axes.
eps (float) – Tiny value to avoid zero division by std.
output_stat (bool) – If true, the batch statistics (mean and std) are also returned.
- Returns:
Normalized output variable.
* Variable: Mean (if output_stat=True).
* Variable: Std (if output_stat=True).
- Return type:
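The grouping step of group normalization can be sketched for a single sample of shape [C, L], stored as a list of channels. Consecutive channels are collected into num_groups groups, and each group is standardized by its own mean and std. group_norm_ref is a hypothetical helper, and gamma and beta are omitted (treated as 1 and 0) to keep the sketch short.

```python
import math


def group_norm_ref(sample, num_groups, eps=1e-5):
    """Standardize a [C, L] sample group by group (no gamma/beta)."""
    c = len(sample)
    assert c % num_groups == 0, "channel count must be a multiple of num_groups"
    size = c // num_groups                      # channels per group
    out = []
    for g in range(num_groups):
        group = sample[g * size:(g + 1) * size]
        values = [v for channel in group for v in channel]
        mu = sum(values) / len(values)
        sigma = math.sqrt(sum((v - mu) ** 2 for v in values) / len(values) + eps)
        out.extend([[(v - mu) / sigma for v in channel] for channel in group])
    return out


sample = [[1.0, 2.0], [3.0, 4.0], [10.0, 20.0], [30.0, 40.0]]
y = group_norm_ref(sample, num_groups=2)
print(y)
```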
- nnabla.functions.weight_standardization(w, channel_axis=0, eps=1e-05, output_stat=False)[source]¶
Applies Weight Standardization over an input weight, which is defined as:
\[\begin{split}\begin{eqnarray} \mu_{W_i} &=& \frac{1}{I} \sum_{j=1}^{I} W_{ij} \\ \sigma_{W_i} &=& \sqrt{\frac{1}{I} \sum_{j=1}^{I} \left(W_{ij} - \mu_{W_{i}}\right)^2 + \epsilon} \\ \hat{W_{ij}} &=& \frac{W_{ij} - \mu_{W_i}}{\sigma_{W_i}} \\ y &=& \hat{W} \ast x \end{eqnarray}\end{split}\]
Example
    import numpy as np
    import nnabla as nn
    import nnabla.functions as F
    import nnabla.parametric_functions as PF

    rng = np.random.RandomState(313)
    x = nn.Variable.from_numpy_array(rng.randn(*(32, 16, 3, 3)))

    # For convolution:
    def ws_callback_conv(w):
        return F.weight_standardization(w, channel_axis=0)

    y = PF.convolution(x, 10, (2, 2), apply_w=ws_callback_conv)

    # For affine:
    def ws_callback_affine(w):
        return F.weight_standardization(w, channel_axis=1)

    y = PF.affine(x, 10, apply_w=ws_callback_affine)
References
- nnabla.functions.weight_normalization(w, g, dim=0, eps=1e-12, n_outputs=-1, outputs=None)[source]¶
Weight normalization.
\[\mathbf{w}_{WN} = g \dfrac{\mathbf{w}}{\|\mathbf{w}\|}\]
where \(\mathbf{w}\) is the input weights to be normalized, and \(g\) is a set of learnable multiplication factors, each of which is applied to the corresponding slice of data along dim.
References
- Parameters:
w (Variable) – N-D array of learnable weights.
g (Variable) – 1-D array of learnable scales.
dim (int) – Output dimension. For the other dimensions, the norms are computed. [default=0]
eps (float) – Epsilon for the normalization. This eps is added before taking the sqrt in the norm computation. [default=1e-12]
- Returns:
N-D array
- Return type:
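A small sketch of the weight normalization formula for a 2-D weight of shape [out, in] with dim=0: each row is divided by its own L2 norm and multiplied by the matching entry of g. weight_normalization_ref is a hypothetical helper; placing eps inside the square root follows the eps parameter description.

```python
import math


def weight_normalization_ref(w, g, eps=1e-12):
    """Rescale each row of w to have L2 norm g[i] (dim=0 case)."""
    out = []
    for row, scale in zip(w, g):
        norm = math.sqrt(sum(v * v for v in row) + eps)  # eps added before the sqrt
        out.append([scale * v / norm for v in row])
    return out


w_wn = weight_normalization_ref([[3.0, 4.0], [0.0, 2.0]], g=[10.0, 1.0])
print(w_wn)
```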
- nnabla.functions.spectral_norm(w, u, dim=0, itr=1, eps=1e-12, test=False, output_u=False)[source]¶
Spectral Normalization.
\[W_{sn} = \frac{W}{\sigma(W)}\]
where \(W\) is the input matrix, and \(\sigma(W)\) is the spectral norm of \(W\). The spectral norm is approximately computed by power iteration.
References
Takeru Miyato, Toshiki Kataoka, Masanori Koyama, Yuichi Yoshida, “Spectral Normalization for Generative Adversarial Networks”, International Conference on Learning Representations. 2018.
- Parameters:
w (Variable) – N-D array of learnable weights. This is normally network parameter.
u (Variable) – 1-D array of singular vector. When test == False, the data region of u will be updated during forward calculation.
dim (int) – Output dimension. If the dimension is not 0, the specified dimension is moved to the leftmost position by transposing. [default=0]
itr (int) – Number of power iterations. [default=1]
eps (float) – Epsilon for the normalization. This eps is added before taking the sqrt in the norm computation. [default=1e-12]
test (bool) – When True, u will not be updated. [default=False]
output_u (bool) – Whether to output the original u. u is updated when test == False, but you can get the original u as output with this option. [default=False]
- Returns:
Spectrally normalized \(W_{sn}\) with the same shape as \(W\).
- Return type:
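The power iteration mentioned for spectral_norm can be sketched in pure Python for a 2-D matrix with dim=0. This is a generic power-iteration sketch in the spirit of the cited paper, not nnabla's actual implementation; spectral_norm_ref is a hypothetical name, and adding eps inside each square root is an assumption based on the eps parameter description.

```python
import math


def spectral_norm_ref(w, u, itr=1, eps=1e-12):
    """Estimate sigma(W) by power iteration and return (W / sigma, updated u, sigma)."""
    rows, cols = len(w), len(w[0])
    for _ in range(itr):
        # v = W^T u, normalized
        v = [sum(w[i][j] * u[i] for i in range(rows)) for j in range(cols)]
        v_norm = math.sqrt(sum(a * a for a in v) + eps)
        v = [a / v_norm for a in v]
        # u = W v, normalized
        u = [sum(w[i][j] * v[j] for j in range(cols)) for i in range(rows)]
        u_norm = math.sqrt(sum(a * a for a in u) + eps)
        u = [a / u_norm for a in u]
    # sigma is approximated by u^T W v
    sigma = sum(u[i] * w[i][j] * v[j] for i in range(rows) for j in range(cols))
    return [[a / sigma for a in row] for row in w], u, sigma


w = [[3.0, 0.0], [0.0, 1.0]]          # largest singular value is 3
w_sn, u, sigma = spectral_norm_ref(w, u=[1.0, 1.0], itr=50)
print(sigma)
```

With a single iteration per call (the default itr=1), the estimate improves across training steps because u is carried over between calls.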
Reduction¶
- nnabla.functions.sum(x, axis=None, keepdims=False)[source]¶
Reduction along axes with sum operation.
- Parameters:
- Returns:
N-D array.
- Return type:
- nnabla.functions.mean(x, axis=None, keepdims=False)[source]¶
Reduction along axes with mean operation.
- Parameters:
- Returns:
N-D array.
- Return type:
- nnabla.functions.max(x, axis=None, keepdims=False, with_index=False, only_index=False)[source]¶
Reduce the input N-D array x along the given axis using the max operation. The axis argument may be a single integer to reduce over one axis, a tuple of integers to reduce over multiple axes, or None to reduce over all axes. If keepdims is True, the output will keep all reduced dimensions with size 1. If with_index is True, the result is a tuple (max, indices), or only indices if only_index is True. Setting only_index to True implies that with_index is also True.
    import numpy as np
    import nnabla as nn
    import nnabla.functions as F

    nn.set_auto_forward(True)
    x = nn.Variable.from_numpy_array(np.random.rand(2, 3, 4))

    maxval = F.max(x, axis=1)
    assert np.allclose(maxval.d, np.max(x.d, axis=1))

    maxval, indices = F.max(x, axis=1, with_index=True)
    assert np.allclose(maxval.d, np.max(x.d, axis=1))
    assert np.all(indices.d == np.argmax(x.d, axis=1))

    indices = F.max(x, axis=1, only_index=True)
    assert np.all(indices.d == np.argmax(x.d, axis=1))
- Parameters:
x (Variable) – An input variable.
axis (None, int or tuple of ints) – Axis or axes along which max is calculated. The default value None will reduce all dimensions.
keepdims (bool) – Keep reduced axes as dimension with 1 element.
with_index (bool) – Return tuple of max values and index.
only_index (bool) – Return only the index of max values.
- Returns:
N-D array.
- Return type:
- nnabla.functions.min(x, axis=None, keepdims=False, with_index=False, only_index=False)[source]¶
Reduce the input N-D array x along the given axis using the min operation. The axis argument may be a single integer to reduce over one axis, a tuple of integers to reduce over multiple axes, or None to reduce over all axes. If keepdims is True, the output will keep all reduced dimensions with size 1. If with_index is True, the result is a tuple (min, indices), or only indices if only_index is True. Setting only_index to True implies that with_index is also True.
    import numpy as np
    import nnabla as nn
    import nnabla.functions as F

    nn.set_auto_forward(True)
    x = nn.Variable.from_numpy_array(np.random.rand(2, 3, 4))

    minval = F.min(x, axis=1)
    assert np.allclose(minval.d, np.min(x.d, axis=1))

    minval, indices = F.min(x, axis=1, with_index=True)
    assert np.allclose(minval.d, np.min(x.d, axis=1))
    assert np.all(indices.d == np.argmin(x.d, axis=1))

    indices = F.min(x, axis=1, only_index=True)
    assert np.all(indices.d == np.argmin(x.d, axis=1))
- Parameters:
x (Variable) – An input variable.
axis (None, int or tuple of ints) – Axis or axes along which min is calculated. The default value None will reduce all dimensions.
keepdims (bool) – Keep reduced axes as dimension with 1 element.
with_index (bool) – Return tuple of min values and index.
only_index (bool) – Return only the index of min values.
- Returns:
N-D array.
- Return type:
- nnabla.functions.norm(x, p=None, axis=None, keepdims=False)[source]¶
Reduction along axes with norm operation.
\[y = \|x\|_p = \left( \sum_i |x_i|^p \right)^{\frac{1}{p}}\]- Parameters:
- Returns:
N-D array.
- Return type:
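The p-norm reduction formula can be checked by hand with a short pure-Python sketch. It assumes a flat list reduced over all elements, and p_norm_ref is a hypothetical helper name.

```python
def p_norm_ref(xs, p=2.0):
    """Compute (sum_i |x_i|^p)^(1/p) over all elements."""
    return sum(abs(x) ** p for x in xs) ** (1.0 / p)


l1 = p_norm_ref([3.0, -4.0], p=1.0)   # |3| + |-4|
l2 = p_norm_ref([3.0, -4.0], p=2.0)   # sqrt(9 + 16)
print(l1, l2)
```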
- nnabla.functions.prod(x, axis=None, keepdims=False)[source]¶
Reduction along axes with product operation.
- Parameters:
- Returns:
N-D array.
- Return type:
Note
Backward computation is not accurate when the input contains zero values.
- nnabla.functions.reduce_sum(x, n_outputs=-1, outputs=None)[source]¶
Reduction along an axis with sum operation.
Note
This is deprecated. Use
sum
instead.
- nnabla.functions.reduce_mean(x, n_outputs=-1, outputs=None)[source]¶
Reduction by mean along an axis.
Note
This is deprecated. Use
mean
instead.
Arithmetic¶
- nnabla.functions.add2(x0, x1, inplace=False, n_outputs=-1, outputs=None)[source]¶
Element-wise addition.
\[y_i = x^{(0)}_i + x^{(1)}_i\]- Parameters:
- Returns:
N-D array
- Return type:
- nnabla.functions.add_n(*x, **kw)[source]¶
Element-wise addition.
\[y_i = x^{(0)}_i + . . . + x^{(n-1)}_i\]
- nnabla.functions.sub2(x0, x1, inplace=False, n_outputs=-1, outputs=None)[source]¶
Element-wise subtraction.
\[y_i = x^{(0)}_i - x^{(1)}_i\]- Parameters:
- Returns:
N-D array
- Return type:
- nnabla.functions.mul2(x0, x1, inplace=False, n_outputs=-1, outputs=None)[source]¶
Element-wise multiplication.
\[y_i = x^{(0)}_i x^{(1)}_i\]- Parameters:
- Returns:
N-D array
- Return type:
- nnabla.functions.mul_n(*x, **kw)[source]¶
Element-wise multiplication.
\[y_i = x^{(0)}_i . . . x^{(n-1)}_i\]
- nnabla.functions.div2(x0, x1, inplace=False, n_outputs=-1, outputs=None)[source]¶
Element-wise division.
\[y_i = \frac{x^{(0)}_i} {x^{(1)}_i}\]- Parameters:
- Returns:
N-D array
- Return type:
- nnabla.functions.pow2(x0, x1, inplace=False, n_outputs=-1, outputs=None)[source]¶
Element-wise power function.
\[y_i = {(x^{(0)}_i)} ^ {x^{(1)}_i}\]- Parameters:
- Returns:
N-D array
- Return type:
- nnabla.functions.add_scalar(x, val=1, inplace=False, n_outputs=-1, outputs=None)[source]¶
Element-wise scalar addition.
\[y_i = x_i + v\]- Parameters:
- Returns:
N-D array with the same shape as x
- Return type:
- nnabla.functions.mul_scalar(x, val=1, inplace=False, n_outputs=-1, outputs=None)[source]¶
Element-wise scalar multiplication.
\[y_i = v x_i\]- Parameters:
- Returns:
N-D array with the same shape as x
- Return type:
- nnabla.functions.pow_scalar(x, val=1, inplace=False, n_outputs=-1, outputs=None)[source]¶
Element-wise scalar power function.
\[y_i = (x_i) ^ v\]- Parameters:
- Returns:
N-D array with the same shape as x
- Return type:
- nnabla.functions.r_sub_scalar(x, val=1, n_outputs=-1, outputs=None)[source]¶
Element-wise scalar subtraction.
\[y_i = v - x_i\]- Parameters:
- Returns:
N-D array with the same shape as x
- Return type:
- nnabla.functions.r_div_scalar(x, val=1, n_outputs=-1, outputs=None)[source]¶
Element-wise scalar division.
\[y_i = \frac{v}{x_i}\]- Parameters:
- Returns:
N-D array with the same shape as x
- Return type:
- nnabla.functions.r_pow_scalar(x, val=1, n_outputs=-1, outputs=None)[source]¶
Element-wise scalar power function.
\[y_i = v ^ {x_i}\]- Parameters:
- Returns:
N-D array with the same shape as x
- Return type:
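The r_ prefix in r_sub_scalar, r_div_scalar and r_pow_scalar means the scalar v is the left operand, as the three formulas show. A pure-Python sketch with hypothetical helper names makes the operand order explicit:

```python
def r_sub_scalar_ref(xs, val=1.0):
    return [val - x for x in xs]       # y_i = v - x_i


def r_div_scalar_ref(xs, val=1.0):
    return [val / x for x in xs]       # y_i = v / x_i


def r_pow_scalar_ref(xs, val=1.0):
    return [val ** x for x in xs]      # y_i = v ^ x_i


xs = [1.0, 2.0, 4.0]
r_sub = r_sub_scalar_ref(xs, val=8.0)
r_div = r_div_scalar_ref(xs, val=8.0)
r_pow = r_pow_scalar_ref(xs, val=8.0)
print(r_sub, r_div, r_pow)
```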
Logical¶
- nnabla.functions.equal(x0, x1, n_outputs=-1, outputs=None)[source]¶
Element wise ‘equal’
\[\begin{split}f(x^{(0)}_i,x^{(1)}_i) = \begin{cases} 1 & (x^{(0)}_i = x^{(1)}_i) \\ 0 & otherwise \end{cases}.\end{split}\]- Parameters:
- Returns:
No Description
- Return type:
- nnabla.functions.equal_scalar(x0, val=1, n_outputs=-1, outputs=None)[source]¶
Element wise ‘equal’ with a scalar
\[\begin{split}f(x_i,v) = \begin{cases} 1 & (x_i = v) \\ 0 & otherwise \end{cases}.\end{split}\]- Parameters:
- Returns:
N-D array with the same shape as x
- Return type:
- nnabla.functions.greater(x0, x1, n_outputs=-1, outputs=None)[source]¶
Element wise comparison. The \(i^{th}\) element of the output is:
\[\begin{split}f(x^{(0)}_i,x^{(1)}_i) = \begin{cases} 1 & (x^{(0)}_i > x^{(1)}_i) \\ 0 & (x^{(0)}_i \leq x^{(1)}_i) \end{cases}.\end{split}\]- Parameters:
- Returns:
No Description
- Return type:
- nnabla.functions.greater_equal(x0, x1, n_outputs=-1, outputs=None)[source]¶
Element wise comparison. The \(i^{th}\) element of the output is:
\[\begin{split}f(x^{(0)}_i,x^{(1)}_i) = \begin{cases} 1 & (x^{(0)}_i \geq x^{(1)}_i) \\ 0 & (x^{(0)}_i < x^{(1)}_i) \end{cases}.\end{split}\]- Parameters:
- Returns:
No Description
- Return type:
- nnabla.functions.greater_equal_scalar(x0, val=1, n_outputs=-1, outputs=None)[source]¶
Element wise comparison with a scalar. The \(i^{th}\) element of the output is:
\[\begin{split}f(x^{(0)}_i,v) = \begin{cases} 1 & (x^{(0)}_i \geq v) \\ 0 & (x^{(0)}_i < v) \end{cases}.\end{split}\]
- Parameters:
- Returns:
N-D array with the same shape as x
- Return type:
- nnabla.functions.greater_scalar(x0, val=1, n_outputs=-1, outputs=None)[source]¶
Element wise comparison with a scalar. The \(i^{th}\) element of the output is:
\[\begin{split}f(x^{(0)}_i,v) = \begin{cases} 1 & (x^{(0)}_i > v) \\ 0 & (x^{(0)}_i \leq v) \end{cases}.\end{split}\]
- Parameters:
- Returns:
N-D array with the same shape as x
- Return type:
- nnabla.functions.less(x0, x1, n_outputs=-1, outputs=None)[source]¶
Element wise comparison. The \(i^{th}\) element of the output is:
\[\begin{split}f(x^{(0)}_i,x^{(1)}_i) = \begin{cases} 1 & (x^{(0)}_i < x^{(1)}_i) \\ 0 & (x^{(0)}_i \geq x^{(1)}_i) \end{cases}.\end{split}\]- Parameters:
- Returns:
No Description
- Return type:
- nnabla.functions.less_equal(x0, x1, n_outputs=-1, outputs=None)[source]¶
Element wise comparison. The \(i^{th}\) element of the output is:
\[\begin{split}f(x^{(0)}_i,x^{(1)}_i) = \begin{cases} 1 & (x^{(0)}_i \leq x^{(1)}_i) \\ 0 & (x^{(0)}_i > x^{(1)}_i) \end{cases}.\end{split}\]- Parameters:
- Returns:
No Description
- Return type:
- nnabla.functions.less_equal_scalar(x0, val=1, n_outputs=-1, outputs=None)[source]¶
Element wise comparison with a scalar. The \(i^{th}\) element of the output is:
\[\begin{split}f(x^{(0)}_i,v) = \begin{cases} 1 & (x^{(0)}_i \leq v) \\ 0 & (x^{(0)}_i > v) \end{cases}.\end{split}\]- Parameters:
- Returns:
N-D array with the same shape as x
- Return type:
- nnabla.functions.less_scalar(x0, val=1, n_outputs=-1, outputs=None)[source]¶
Element wise comparison with a scalar. The \(i^{th}\) element of the output is:
\[\begin{split}f(x^{(0)}_i,v) = \begin{cases} 1 & (x^{(0)}_i < v) \\ 0 & (x^{(0)}_i \geq v) \end{cases}.\end{split}\]- Parameters:
- Returns:
N-D array with the same shape as x
- Return type:
- nnabla.functions.logical_and(x0, x1, n_outputs=-1, outputs=None)[source]¶
Elementwise logical AND.
\[\begin{split}f(x^{(0)}_i,x^{(1)}_i) = \begin{cases} 1 & (x^{(0)}_i \neq 0 \;\&\; x^{(1)}_i \neq 0) \\ 0 & otherwise \end{cases}.\end{split}\]- Parameters:
- Returns:
No Description
- Return type:
- nnabla.functions.logical_and_scalar(x0, val, n_outputs=-1, outputs=None)[source]¶
Elementwise logical AND with scalar.
\[\begin{split}f(x_i,v) = \begin{cases} 1 & (x_i \neq 0 \;\&\; v \neq 0) \\ 0 & otherwise \end{cases}.\end{split}\]- Parameters:
- Returns:
N-D array with the same shape as x
- Return type:
- nnabla.functions.logical_not(x0, n_outputs=-1, outputs=None)[source]¶
Element-wise logical NOT operation
\[\begin{split}f(x_i) = \begin{cases} 1 & (x_i = 0) \\ 0 & otherwise \end{cases}.\end{split}\]- Parameters:
x0 (Variable) – Input variable
- Returns:
N-D array with the same shape as x
- Return type:
- nnabla.functions.logical_or(x0, x1, n_outputs=-1, outputs=None)[source]¶
Elementwise logical OR.
\[\begin{split}f(x^{(0)}_i,x^{(1)}_i) = \begin{cases} 0 & (x^{(0)}_i = 0 \;\&\; x^{(1)}_i = 0) \\ 1 & otherwise \end{cases}.\end{split}\]- Parameters:
- Returns:
No Description
- Return type:
- nnabla.functions.logical_or_scalar(x0, val, n_outputs=-1, outputs=None)[source]¶
Elementwise logical OR with scalar.
\[\begin{split}f(x_i,v) = \begin{cases} 0 & (x_i = 0 \;\&\; v = 0) \\ 1 & otherwise \end{cases}.\end{split}\]- Parameters:
- Returns:
N-D array with the same shape as x
- Return type:
- nnabla.functions.logical_xor(x0, x1, n_outputs=-1, outputs=None)[source]¶
Elementwise logical XOR.
\[\begin{split}f(x^{(0)}_i,x^{(1)}_i) = \begin{cases} 0 & (x^{(0)}_i = 0 \;\&\; x^{(1)}_i = 0) \\ 0 & (x^{(0)}_i \neq 0 \;\&\; x^{(1)}_i \neq 0) \\ 1 & otherwise \end{cases}.\end{split}\]
- Parameters:
- Returns:
No Description
- Return type:
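The logical functions treat any nonzero value as true and return 0 or 1. The sketch below spells this out for single elements, assuming the usual exclusive-or meaning for XOR (1 when exactly one operand is nonzero). The helper names are hypothetical and not part of nnabla.

```python
def logical_and_ref(a, b):
    return 1 if (a != 0 and b != 0) else 0


def logical_or_ref(a, b):
    return 1 if (a != 0 or b != 0) else 0


def logical_xor_ref(a, b):
    # true when exactly one operand is nonzero
    return 1 if ((a != 0) != (b != 0)) else 0


pairs = [(0, 0), (0, 3), (-2, 0), (-2, 3)]
and_out = [logical_and_ref(a, b) for a, b in pairs]
or_out = [logical_or_ref(a, b) for a, b in pairs]
xor_out = [logical_xor_ref(a, b) for a, b in pairs]
print(and_out, or_out, xor_out)
```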
- nnabla.functions.logical_xor_scalar(x0, val, n_outputs=-1, outputs=None)[source]¶
Elementwise logical XOR with scalar.
\[\begin{split}f(x_i,v) = \begin{cases} 0 & (x_i = 0 \;\&\; v = 0) \\ 0 & (x_i \neq 0 \;\&\; v \neq 0) \\ 1 & otherwise \end{cases}.\end{split}\]
- Parameters:
- Returns:
N-D array with the same shape as x
- Return type:
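The exclusive-OR rule can be checked with a small plain-Python sketch. It assumes the usual convention that an element counts as true when it is non-zero and that the result is 1 when exactly one operand is true. The two helper functions are illustrative stand-ins written for this example, not the NNabla implementation.

```python
def logical_xor(x0, x1):
    """Element-wise XOR over two equally long flat lists, returning 0/1 values."""
    # An element is treated as "true" when it is non-zero.
    return [int((a != 0) != (b != 0)) for a, b in zip(x0, x1)]


def logical_xor_scalar(x, val):
    """Element-wise XOR of a flat list with a single scalar."""
    return [int((a != 0) != (val != 0)) for a in x]


print(logical_xor([0, 0, 2, 3], [0, 5, 0, 7]))  # [0, 1, 1, 0]
print(logical_xor_scalar([0, 1, -4], 1))         # [1, 0, 0]
```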
- nnabla.functions.not_equal(x0, x1, n_outputs=-1, outputs=None)[source]¶
Element-wise ‘not equal’.
\[\begin{split}f(x^{(0)}_i,x^{(1)}_i) = \begin{cases} 0 & (x^{(0)}_i = x^{(1)}_i) \\ 1 & otherwise \end{cases}.\end{split}\]- Parameters:
- Returns:
No Description
- Return type:
- nnabla.functions.not_equal_scalar(x0, val=1, n_outputs=-1, outputs=None)[source]¶
Element-wise ‘not equal’ with a scalar.
\[\begin{split}f(x_i,v) = \begin{cases} 0 & (x_i = v) \\ 1 & otherwise \end{cases}.\end{split}\]- Parameters:
- Returns:
N-D array with the same shape as x
- Return type:
- nnabla.functions.sign(x, alpha=1.0, n_outputs=-1, outputs=None)[source]¶
Element-wise sign function.
In the forward pass, it is defined as
\[\begin{split}f(x) = \begin{cases} 1 & (x > 0) \\ -1 & (x < 0) \\ \alpha & (x = 0) \end{cases}.\end{split}\]
In the backward pass, it is defined as
\[\frac{\partial f(x)}{\partial x} = 1,\]
or in other words, it behaves as the identity function for the gradient in the backward pass.
- Parameters:
- Returns:
N-D array with the same shape as x
- Return type:
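The forward and backward rules for sign can be illustrated with a plain-Python sketch over flat lists. The names `sign_forward` and `sign_backward` are hypothetical helpers for this example only; they mirror the formulas above rather than NNabla's implementation.

```python
def sign_forward(x, alpha=1.0):
    """Return 1 for positive, -1 for negative, and alpha for zero elements."""
    return [1.0 if v > 0 else -1.0 if v < 0 else alpha for v in x]


def sign_backward(grad_output):
    """The gradient passes through unchanged (identity in the backward pass)."""
    return list(grad_output)


print(sign_forward([2.0, -3.0, 0.0], alpha=0.5))  # [1.0, -1.0, 0.5]
print(sign_backward([0.1, 0.2, 0.3]))             # [0.1, 0.2, 0.3]
```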
- nnabla.functions.minimum2(x0, x1, n_outputs=-1, outputs=None)[source]¶
Element-wise minimum.
\[y_i = \min(x^{(0)}_i, x^{(1)}_i)\]- Parameters:
- Returns:
N-D array of min value
- Return type:
- nnabla.functions.maximum2(x0, x1, n_outputs=-1, outputs=None)[source]¶
Element-wise maximum.
\[y_i = \max(x^{(0)}_i, x^{(1)}_i)\]- Parameters:
- Returns:
N-D array of max value
- Return type:
- nnabla.functions.minimum_scalar(x, val=1.0, n_outputs=-1, outputs=None)[source]¶
Element-wise scalar minimum.
\[y_i = \min(x_i, v)\]- Parameters:
- Returns:
N-D array with the same shape as x
- Return type:
- nnabla.functions.maximum_scalar(x, val=1.0, n_outputs=-1, outputs=None)[source]¶
Element-wise scalar maximum.
\[y_i = \max (x_i, v)\]- Parameters:
- Returns:
N-D array with the same shape as x
- Return type:
- nnabla.functions.isnan(x0, n_outputs=-1, outputs=None)[source]¶
Test element-wise for NaN and return a 0/1 array.
- Parameters:
x0 (Variable) – Input variable
- Returns:
N-D array with the same shape as x
- Return type:
- nnabla.functions.isinf(x0, n_outputs=-1, outputs=None)[source]¶
Test element-wise for inf/-inf and return a 0/1 array.
- Parameters:
x0 (Variable) – Input variable
- Returns:
N-D array with the same shape as x
- Return type:
- nnabla.functions.reset_nan(x0, val=0, n_outputs=-1, outputs=None)[source]¶
Replace NaNs with a scalar value specified by val.
- Parameters:
- Returns:
N-D array with the same shape as x
- Return type:
- nnabla.functions.reset_inf(x0, val=0, n_outputs=-1, outputs=None)[source]¶
Replace -inf/inf with a scalar value specified by val.
- Parameters:
- Returns:
N-D array with the same shape as x
- Return type:
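As a rough illustration of the four NaN/inf helpers (isnan, isinf, reset_nan, reset_inf), the following plain-Python sketch applies the same element-wise rules to flat lists. The functions are hypothetical stand-ins built on the standard `math` module, not the NNabla functions themselves.

```python
import math


def isnan(x):
    """Return 1 where the element is NaN, 0 elsewhere."""
    return [1 if math.isnan(v) else 0 for v in x]


def isinf(x):
    """Return 1 where the element is +inf or -inf, 0 elsewhere."""
    return [1 if math.isinf(v) else 0 for v in x]


def reset_nan(x, val=0):
    """Replace NaN elements with val and leave the rest untouched."""
    return [val if math.isnan(v) else v for v in x]


def reset_inf(x, val=0):
    """Replace +inf/-inf elements with val and leave the rest untouched."""
    return [val if math.isinf(v) else v for v in x]


values = [1.5, float("nan"), float("inf"), float("-inf")]
print(isnan(values))                   # [0, 1, 0, 0]
print(isinf(values))                   # [0, 0, 1, 1]
print(reset_inf(values[2:], val=0))    # [0, 0]
```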
- nnabla.functions.where(condition, x_true, x_false, n_outputs=-1, outputs=None)[source]¶
Return elements, either from x_true or x_false, depending on condition.
If the rank of condition is lower than those of x_true and x_false, the first dimensions of x_true and x_false must match the dimensions of condition.
Example:
    import numpy as np
    import nnabla as nn
    import nnabla.functions as F

    a = nn.Variable.from_numpy_array(np.random.rand(2, 3))
    x = nn.Variable.from_numpy_array(np.random.rand(2, 3, 4))
    y = nn.Variable.from_numpy_array(np.random.rand(2, 3, 4))
    z = F.where(F.greater_scalar(a, 0.5), x, y)
    z.forward()

    # NumPy equivalent: add a trailing axis so the (2, 3) condition
    # broadcasts over the (2, 3, 4) inputs
    z_numpy = np.where(a.d[..., np.newaxis] > 0.5, x.d, y.d)
    assert np.allclose(z_numpy, z.d)
- Parameters:
- Returns:
N-D array with the same shape as x_true
- Return type:
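The selection rule for a condition with fewer axes than the inputs (as in the example, where the condition is (2, 3) and the inputs are (2, 3, 4)) can be sketched on nested lists. This recursive helper is a hypothetical illustration: once the condition runs out of axes, a single flag selects the whole remaining sub-array.

```python
def where(condition, x_true, x_false):
    """Pick from x_true or x_false; condition may cover only the leading axes."""
    if not isinstance(condition, list):
        # The condition has no more axes: it decides for this whole sub-array.
        return x_true if condition else x_false
    return [where(c, t, f) for c, t, f in zip(condition, x_true, x_false)]


cond = [[1, 0], [0, 1]]                          # shape (2, 2)
x = [[[1, 2], [3, 4]], [[5, 6], [7, 8]]]         # shape (2, 2, 2)
y = [[[-1, -2], [-3, -4]], [[-5, -6], [-7, -8]]]
print(where(cond, x, y))
# [[[1, 2], [-3, -4]], [[-5, -6], [7, 8]]]
```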
Math¶
- nnabla.functions.constant(val=0, shape=[], n_outputs=-1, outputs=None)[source]¶
Generate a constant-valued array.
- Parameters:
- Returns:
N-D array where all values are the specified constant.
- Return type:
- nnabla.functions.arange(start, stop, step=1, n_outputs=-1, outputs=None)[source]¶
Generate a range of values within the half-open interval [start, stop) (the interval including start but excluding stop) with step increments.
- Parameters:
- Returns:
1-D array with the generated values.
- Return type:
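The half-open interval can be made concrete with a short plain-Python sketch. It is a hypothetical helper that only handles positive steps, which is an assumption of this example and not a statement about what NNabla accepts.

```python
def arange(start, stop, step=1):
    """Values start, start + step, ... that are strictly smaller than stop."""
    if step <= 0:
        raise ValueError("this sketch only handles positive steps")
    values = []
    i = 0
    # Multiply instead of accumulating to avoid compounding rounding errors.
    while start + i * step < stop:
        values.append(start + i * step)
        i += 1
    return values


print(arange(0, 5))         # [0, 1, 2, 3, 4]   (stop itself is excluded)
print(arange(1, 2, 0.25))   # [1.0, 1.25, 1.5, 1.75]
```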
- nnabla.functions.abs(x, n_outputs=-1, outputs=None)[source]¶
Element-wise absolute value function.
\[y_i = |x_i|\]- Parameters:
x (Variable) – Input variable
- Returns:
Element-wise absolute variable
- Return type:
- nnabla.functions.exp(x, n_outputs=-1, outputs=None)[source]¶
Element-wise natural exponential function.
\[y_i = \exp(x_i).\]
- nnabla.functions.log(x, n_outputs=-1, outputs=None)[source]¶
Element-wise natural logarithm function.
\[y_i = \ln(x_i).\]
- nnabla.functions.round(x, n_outputs=-1, outputs=None)[source]¶
Element-wise round function.
In the forward pass, this function simply computes round to the nearest integer value.
\[y_i = \text{round}(x_i).\]
In the backward pass, the simple Straight-Through Estimator (STE) is applied,
\[\frac{\partial y_i}{\partial x_i} = 1.\]
- Parameters:
x (Variable) – Input variable
- Returns:
N-D array with the same shape as x
- Return type:
- nnabla.functions.ceil(x, n_outputs=-1, outputs=None)[source]¶
Element-wise ceil function.
In the forward pass, this function simply returns the smallest integer which is not less than the input.
\[y_i = \text{ceil}(x_i).\]
In the backward pass, the simple Straight-Through Estimator (STE) is applied,
\[\frac{\partial y_i}{\partial x_i} = 1.\]
- Parameters:
x (Variable) – Input variable
- Returns:
N-D array with the same shape as x
- Return type:
- nnabla.functions.floor(x, n_outputs=-1, outputs=None)[source]¶
Element-wise floor function.
In the forward pass, this function simply returns the largest integer which is not greater than the input.
\[y_i = \text{floor}(x_i).\]
In the backward pass, the simple Straight-Through Estimator (STE) is applied,
\[\frac{\partial y_i}{\partial x_i} = 1.\]
- Parameters:
x (Variable) – Input variable
- Returns:
N-D array with the same shape as x
- Return type:
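The Straight-Through Estimator shared by round, ceil and floor can be sketched in plain Python: the forward pass quantizes, while the backward pass copies the incoming gradient as if the function were the identity. `ste_forward` and `ste_backward` are hypothetical names for this illustration. Python's built-in `round` resolves exact ties to the even integer, which may differ from NNabla's tie handling, so the sample values avoid ties.

```python
import math


def ste_forward(x, op):
    """Apply a rounding operation (round, math.ceil or math.floor) element-wise."""
    return [float(op(v)) for v in x]


def ste_backward(grad_output):
    """STE: dy/dx is taken to be 1, so the gradient passes through unchanged."""
    return list(grad_output)


x = [0.4, 1.6, -1.2]
print(ste_forward(x, round))        # [0.0, 2.0, -1.0]
print(ste_forward(x, math.ceil))    # [1.0, 2.0, -1.0]
print(ste_forward(x, math.floor))   # [0.0, 1.0, -2.0]
print(ste_backward([0.5, 0.5, 0.5]))  # [0.5, 0.5, 0.5]
```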
- nnabla.functions.identity(x, n_outputs=-1, outputs=None)[source]¶
Identity function.
\[y = x\]
- nnabla.functions.matrix_diag(x, n_outputs=-1, outputs=None)[source]¶
Returns an array where the last two dimensions consist of the diagonal matrix.
- Parameters:
x (Variable) – N-D array with shape (\(M_0 \times \ldots \times M_N\)).
- Returns:
N-D array with shape (\(M_0 \times \ldots \times M_N \times M_N\)).
- Return type:
- nnabla.functions.matrix_diag_part(x, n_outputs=-1, outputs=None)[source]¶
Returns an array in which the values of the last dimension consist of the diagonal elements of the last two dimensions of an input array.
- Parameters:
x (Variable) – N-D array with shape (\(M_0 \times \ldots \times M_N \times M_N\)).
- Returns:
N-D array with shape (\(M_0 \times \ldots \times M_N\)).
- Return type:
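The shape relationship between matrix_diag and matrix_diag_part can be seen in a small plain-Python sketch for a 2-D input, that is, a batch of vectors of shape (M_0, M_1). Both helpers are hypothetical list-based illustrations, not the NNabla functions.

```python
def matrix_diag(x):
    """(M0, M1) batch of vectors -> (M0, M1, M1) batch of diagonal matrices."""
    return [
        [[v if i == j else 0 for j in range(len(vec))] for i, v in enumerate(vec)]
        for vec in x
    ]


def matrix_diag_part(x):
    """(M0, M1, M1) batch of square matrices -> (M0, M1) batch of diagonals."""
    return [[m[i][i] for i in range(len(m))] for m in x]


vectors = [[1, 2], [3, 4]]
diag = matrix_diag(vectors)
print(diag)                    # [[[1, 0], [0, 2]], [[3, 0], [0, 4]]]
print(matrix_diag_part(diag))  # [[1, 2], [3, 4]]
```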
- nnabla.functions.batch_matmul(a, b, transpose_a=False, transpose_b=False, n_outputs=-1, outputs=None)[source]¶
Batch matrix multiplication.
Two batches of matrices are multiplied for each sample in a batch. A batch of matrices is composed as […, P, Q], where the last two dimensions are the matrix dimensions and all dimensions up to the third-last are treated as batch samples. A batch dimension is internally broadcast when its size is 1.
Example:
    import nnabla as nn
    import nnabla.functions as F
    import numpy as np

    nn.set_auto_forward(True)

    # Same batch size
    a = nn.Variable.from_numpy_array(np.random.rand(2, 2, 3, 4))
    b = nn.Variable.from_numpy_array(np.random.rand(2, 2, 4, 3))
    c = F.batch_matmul(a, b)

    # Different batch size with the broadcast
    a = nn.Variable.from_numpy_array(np.random.rand(2, 1, 3, 4))
    b = nn.Variable.from_numpy_array(np.random.rand(1, 3, 4, 3))
    c = F.batch_matmul(a, b)
Warning
Since version 1.13, the handling of the batch dimensions has changed: a batch dimension of size 1 is now broadcast internally. Accordingly, this function no longer supports inputs whose batch dimensions differ, even if the total number of samples in each input is the same.
- Parameters:
a (Variable) – N-D array with >= 2-dim. The last two dimensions will be treated as a matrix.
b (Variable) – N-D array with >= 2-dim. The last two dimensions will be treated as a matrix. The product of the size of 0-th dimension through the size of the third last dimension must be the same as that of the input a.
transpose_a (bool) – Transpose the last two axes of a in matrix multiplication. [default=False]
transpose_b (bool) – Transpose the last two axes of b in matrix multiplication. [default=False]
- Returns:
Output of sample-wise matrix multiplication in a batch. When a is of shape [N, P, Q], b is of shape [N, Q, R], and the transpose options are all False, the output will be of shape [N, P, R].
- Return type:
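The batch broadcasting rule can be sketched as a shape calculation in plain Python. `batch_matmul_shape` is a hypothetical helper for this illustration. It assumes both inputs have the same number of batch axes and that a batch axis of size 1 is stretched to match the other input, as in the example shapes.

```python
def batch_matmul_shape(a_shape, b_shape, transpose_a=False, transpose_b=False):
    """Predict the output shape of a batched matrix product with size-1 broadcast."""
    a_batch, (p, q) = a_shape[:-2], a_shape[-2:]
    b_batch, (r, s) = b_shape[:-2], b_shape[-2:]
    if transpose_a:
        p, q = q, p
    if transpose_b:
        r, s = s, r
    if q != r:
        raise ValueError("inner matrix dimensions do not match")
    if len(a_batch) != len(b_batch):
        raise ValueError("this sketch assumes the same number of batch axes")
    batch = []
    for da, db in zip(a_batch, b_batch):
        # Sizes must agree unless one of them is 1, which is then broadcast.
        if da != db and 1 not in (da, db):
            raise ValueError("batch sizes must match or be 1")
        batch.append(max(da, db))
    return tuple(batch) + (p, s)


print(batch_matmul_shape((2, 2, 3, 4), (2, 2, 4, 3)))  # (2, 2, 3, 3)
print(batch_matmul_shape((2, 1, 3, 4), (1, 3, 4, 3)))  # (2, 3, 3, 3)
```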
- nnabla.functions.sin(x, n_outputs=-1, outputs=None)[source]¶
Element-wise sine (sin) function.
\[y_i = \sin (x_i)\]- Parameters:
x (Variable) – N-D array
- Returns:
N-D array with the same shape as x
- Return type:
- nnabla.functions.cos(x, n_outputs=-1, outputs=None)[source]¶
Element-wise cosine (cos) function.
\[y_i = \cos (x_i)\]- Parameters:
x (Variable) – N-D array
- Returns:
N-D array with the same shape as x
- Return type:
- nnabla.functions.tan(x, n_outputs=-1, outputs=None)[source]¶
Element-wise tangent (tan) function.
\[y_i = \tan (x_i)\]- Parameters:
x (Variable) – N-D array
- Returns:
N-D array with the same shape as x
- Return type:
- nnabla.functions.sinh(x, n_outputs=-1, outputs=None)[source]¶
Element-wise hyperbolic sine (sinh) function.
\[y_i = \sinh (x_i)\]- Parameters:
x (Variable) – N-D array
- Returns:
N-D array with the same shape as x
- Return type:
- nnabla.functions.cosh(x, n_outputs=-1, outputs=None)[source]¶
Element-wise hyperbolic cosine (cosh) function.
\[y_i = \cosh (x_i)\]- Parameters:
x (Variable) – N-D array
- Returns:
N-D array with the same shape as x
- Return type:
- nnabla.functions.tanh(x, n_outputs=-1, outputs=None)[source]¶
Element-wise hyperbolic tangent (tanh) function.
\[y_i = \tanh (x_i)\]- Parameters:
x (Variable) – N-D array
- Returns:
N-D array with the same shape as x
- Return type:
- nnabla.functions.asin(x, n_outputs=-1, outputs=None)[source]¶
Element-wise arcsine (asin) function.
\[y_i = \arcsin (x_i)\]- Parameters:
x (Variable) – N-D array
- Returns:
N-D array with the same shape as x
- Return type:
- nnabla.functions.acos(x, n_outputs=-1, outputs=None)[source]¶
Element-wise arccosine (acos) function.
\[y_i = \arccos (x_i)\]- Parameters:
x (Variable) – N-D array
- Returns:
N-D array with the same shape as x
- Return type:
- nnabla.functions.atan(x, n_outputs=-1, outputs=None)[source]¶
Element-wise arctangent (atan) function.
\[y_i = \arctan (x_i)\]- Parameters:
x (Variable) – N-D array
- Returns:
N-D array with the same shape as x
- Return type:
- nnabla.functions.atan2(x0, x1, n_outputs=-1, outputs=None)[source]¶
Element-wise arctangent (atan) function with 2 input variables.
\[y_i = \arctan2 (x_{i1}, x_{i2})\]- Parameters:
- Returns:
N-D array with the same shape as input variables
- Return type:
- nnabla.functions.asinh(x, n_outputs=-1, outputs=None)[source]¶
Element-wise hyperbolic arcsine (asinh) function.
\[y_i = \text{arcsinh} (x_i)\]- Parameters:
x (Variable) – N-D array
- Returns:
N-D array with the same shape as x
- Return type:
- nnabla.functions.acosh(x, n_outputs=-1, outputs=None)[source]¶
Element-wise hyperbolic arccosine (acosh) function.
\[y_i = \text{arccosh} (x_i)\]- Parameters:
x (Variable) – N-D array
- Returns:
N-D array with the same shape as x
- Return type:
- nnabla.functions.atanh(x, n_outputs=-1, outputs=None)[source]¶
Element-wise hyperbolic arctangent (atanh) function.
\[y_i = \text{arctanh} (x_i)\]- Parameters:
x (Variable) – N-D array
- Returns:
N-D array with the same shape as x
- Return type:
- nnabla.functions.cumsum(x, axis=None, exclusive=False, reverse=False, n_outputs=-1, outputs=None)[source]¶
Cumulative sum along a given axis.
- Parameters:
- Returns:
N-D array
- Return type:
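The exclusive and reverse options are not described in this entry, so the following plain-Python sketch assumes their conventional meaning: exclusive leaves the current element out of its own running total, and reverse accumulates from the end of the axis. `cumsum` here is a hypothetical 1-D helper, not the NNabla function.

```python
def cumsum(x, exclusive=False, reverse=False):
    """Running sum of a flat list with optional exclusive/reverse behavior."""
    seq = list(reversed(x)) if reverse else list(x)
    out, total = [], 0
    for v in seq:
        if exclusive:
            out.append(total)  # total so far, without the current element
            total += v
        else:
            total += v
            out.append(total)  # total including the current element
    return list(reversed(out)) if reverse else out


data = [1, 2, 3, 4]
print(cumsum(data))                                # [1, 3, 6, 10]
print(cumsum(data, exclusive=True))                # [0, 1, 3, 6]
print(cumsum(data, reverse=True))                  # [10, 9, 7, 4]
print(cumsum(data, exclusive=True, reverse=True))  # [9, 7, 4, 0]
```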
- nnabla.functions.cumprod(x, axis=None, exclusive=False, reverse=False, n_outputs=-1, outputs=None)[source]¶
Cumulative product along a given axis.
Note
Backward computation is not accurate when the input contains zero values.
- Parameters:
- Returns:
N-D array
- Return type:
- nnabla.functions.batch_inv(x, n_outputs=-1, outputs=None)[source]¶
Returns an array of inverted matrices.
- Parameters:
x (Variable) – batched N-D array
- Returns:
batched N-D array of inverted matrix
- Return type:
- nnabla.functions.batch_det(x, n_outputs=-1, outputs=None)[source]¶
Batch-wise determinant function.
\[Y_b = \det(X_b),\]where \(X_b\) and \(Y_b\) are the \(b\)-th input and output, respectively.
- Parameters:
x (Variable) – batched N-D array
- Returns:
batched N-D array of determinant
- Return type:
- nnabla.functions.batch_logdet(x, n_outputs=-1, outputs=None)[source]¶
Batch-wise log absolute determinant function.
\[Y_b = \log(|\det(X_b)|),\]where \(X_b\) and \(Y_b\) are the \(b\)-th input and output, respectively.
- Parameters:
x (Variable) – batched N-D array
- Returns:
batched N-D array of log absolute determinant
- Return type:
- nnabla.functions.batch_cholesky(x, upper=False, n_outputs=-1, outputs=None)[source]¶
Batch-wise Cholesky decomposition of a symmetric positive definite matrix. The gradient of this function will be a symmetric matrix. This function does not check whether the given matrix is symmetric positive definite.
- Parameters:
- Returns:
batched N-D array of lower/upper triangular matrix.
- Return type:
- nnabla.functions.erf(x, n_outputs=-1, outputs=None)[source]¶
Element-wise Error function.
\[y_i = \text{erf} (x_i)\]- Parameters:
x (Variable) – N-D array
- Returns:
N-D array with the same shape as x
- Return type:
Array Manipulation¶
- nnabla.functions.concatenate(*x, **kw)[source]¶
Concatenate a variable number of input arrays along the specified axis.
- Parameters:
- Returns:
Concatenate variable
- Return type:
- nnabla.functions.split(x, axis=0)[source]¶
Split arrays at the specified axis.
It returns as many Variables as the size of the given axis (i.e., x.shape[axis]).
Returns: A tuple of Variables
See also nnabla.function_bases.split().
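To illustrate how the number of returned arrays follows the size of the chosen axis, here is a plain-Python sketch on nested lists. `split` below is a hypothetical recursive helper, not the NNabla function: it returns one piece per index along the given axis, with that axis removed.

```python
def split(x, axis=0):
    """Split a nested list into a tuple with one entry per index along axis."""
    if axis == 0:
        return tuple(x)
    # Split every row along the next-inner axis, then regroup piece by piece.
    parts = [split(row, axis - 1) for row in x]
    return tuple(list(piece) for piece in zip(*parts))


x = [[1, 2, 3], [4, 5, 6]]   # shape (2, 3)
print(split(x, axis=0))      # ([1, 2, 3], [4, 5, 6])    -> 2 arrays of shape (3,)
print(split(x, axis=1))      # ([1, 4], [2, 5], [3, 6])  -> 3 arrays of shape (2,)
```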
- nnabla.functions.stack(*x, **kw)[source]¶
Joins two or more arrays on a new axis.
Note
Unlike nnabla.functions.concatenate(), which joins arrays on an existing axis, Stack joins arrays on a new axis.
- Parameters:
*x (Variable) – N-D arrays. The sizes of all the arrays to be stacked must be the same. [variadic]
axis (int) – The axis on which to concatenate arrays. Axis indices take on values 0, 1, 2, and so on from the left. For example, to stack four (3,28,28) inputs on the second axis, specify 1. In this case, the output size will be (3,4,28,28). [default=0]
- Returns:
Output
- Return type:
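The difference between stacking on a new axis and concatenating on an existing one is easiest to see in the resulting shapes. The two helpers below are hypothetical shape calculators written for this illustration. They assume all inputs share the same shape.

```python
def stack_shape(input_shape, num_inputs, axis=0):
    """Shape after stacking num_inputs arrays of input_shape on a new axis."""
    shape = list(input_shape)
    shape.insert(axis, num_inputs)
    return tuple(shape)


def concatenate_shape(input_shape, num_inputs, axis=0):
    """Shape after concatenating num_inputs equal-shaped arrays on an existing axis."""
    shape = list(input_shape)
    shape[axis] *= num_inputs
    return tuple(shape)


print(stack_shape((3, 28, 28), 4, axis=1))        # (3, 4, 28, 28)
print(concatenate_shape((3, 28, 28), 4, axis=1))  # (3, 112, 28)
```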
- nnabla.functions.slice(x, start=None, stop=None, step=None, n_outputs=-1, outputs=None)[source]¶
Slice arrays along the specified axes. This function complies with Python slice semantics, where slice(None, None, -1) and slice(-1, None, -1) are special cases that flip the input array, so that the output runs from the end to the beginning of the input array along the corresponding dimension.
- Parameters:
x (Variable) – N-D array
start (repeated int64) – Start indices for each axis [default=(0,) * len(x.shape)]
stop (repeated int64) – Stop indices for each axis [default=tuple(x.shape)]
step (repeated int64) – Step indices for each axis [default=(1,) * len(x.shape)]
- Returns:
Sliced N-D array
- Return type:
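Because the function follows Python slice semantics per axis, its behavior can be sketched with the built-in `slice` object applied recursively to nested lists. `slice_nd` is a hypothetical helper for this illustration and takes one start/stop/step entry per axis.

```python
def slice_nd(x, start, stop, step):
    """Apply slice(start[k], stop[k], step[k]) along axis k of a nested list."""
    if not start:
        return x  # no axes left to slice
    rows = x[slice(start[0], stop[0], step[0])]
    return [slice_nd(r, start[1:], stop[1:], step[1:]) for r in rows]


x = [[1, 2, 3], [4, 5, 6]]
# Keep both rows, columns 1..2
print(slice_nd(x, (0, 1), (2, 3), (1, 1)))               # [[2, 3], [5, 6]]
# slice(None, None, -1) on axis 0 flips the rows
print(slice_nd(x, (None, None), (None, None), (-1, 1)))  # [[4, 5, 6], [1, 2, 3]]
# slice(-1, None, -1) flips as well; axis 1 takes every second column
print(slice_nd(x, (-1, 0), (None, 3), (-1, 2)))          # [[4, 6], [1, 3]]
```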
- nnabla.functions.gather(x, Indices, axis=None, batch_dims=None, n_outputs=-1, outputs=None)[source]¶
Gather from the input data according to the index.
Given the input data \(X\) of \((D_{0}, \ldots, D_{N-1})\) shape and the indices \(IDX\) of \((I_{0}, \ldots, I_{M-1})\) shape, in case of batch_dims = 0, the gather outputs
\[\begin{split}&& Y[d_{0}, \ldots, d_{axis - 1}, i_{0}, \ldots, i_{M-1}, d_{axis + 1}, \ldots, d_{N-1}] = \\ && X[d_{0}, \ldots, d_{axis - 1}, IDX[i_{0}, \ldots, i_{M-1}], d_{axis + 1}, \ldots, d_{N-1}].\end{split}\]
Generally, the gather outputs
\[\begin{split}&& Y[d_{0}, \ldots, d_{axis - 1}, i_{B}, \ldots, i_{M-1}, d_{axis + 1}, \ldots, d_{N-1}] = \\ && X[d_{0}, \ldots, d_{axis - 1}, IDX[i_{0}, \ldots, i_{B - 1}, i_{B}, \ldots, i_{M-1}], d_{axis + 1}, \ldots, d_{N-1}],\end{split}\]
where \(B\) = batch_dims.
x.shape[:batch_dims] must be equal to indices.shape[:batch_dims].
Output shape is x.shape[:axis] + indices.shape[batch_dims:] + x.shape[axis + 1:].
- Parameters:
- Returns:
Gathered output.
- Return type:
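The output-shape rule for gather can be written as a short shape calculation. `gather_shape` is a hypothetical helper for this illustration. It assumes the trailing term of the rule is the slice x.shape[axis + 1:], that is, every axis of x after the gathered one.

```python
def gather_shape(x_shape, indices_shape, axis=0, batch_dims=0):
    """Output shape: x.shape[:axis] + indices.shape[batch_dims:] + x.shape[axis + 1:]."""
    if tuple(x_shape[:batch_dims]) != tuple(indices_shape[:batch_dims]):
        raise ValueError("leading batch dimensions of x and indices must match")
    return (
        tuple(x_shape[:axis])
        + tuple(indices_shape[batch_dims:])
        + tuple(x_shape[axis + 1:])
    )


print(gather_shape((2, 3, 4), (5,), axis=1))                   # (2, 5, 4)
print(gather_shape((2, 3, 4), (2, 5), axis=1, batch_dims=1))   # (2, 5, 4)
print(gather_shape((6, 7), (3, 2), axis=0))                    # (3, 2, 7)
```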
- nnabla.functions.gather_nd(data, indices)[source]¶
Gather elements or slices from data according to indices, which must be at least two-dimensional with the first dimension \(M\) being less than or equal to the \(N\) dimensions of data. Given data with shape \((X_0, X_1, ..., X_{N-1})\) and indices with shape \((M, Y_0, ..., Y_{K-1})\), the output has shape \((Y_0, ..., Y_{K-1}, X_M, ..., X_{N-1})\). If \(M == N\), the output shape is simply \((Y_0, ..., Y_{K-1})\).
The forward of gather_nd() is equivalent to:
    def gather_nd(data, index):
        import numpy as np
        # Flatten the index positions: one column per gathered element or slice
        tmp_index = index.reshape(index.shape[0], -1)
        tmp_index = (idx + (Ellipsis,) for idx in zip(*tmp_index))
        out_shape = index.shape[1:] + data.shape[index.shape[0]:]
        return np.vstack([data[idx] for idx in tmp_index]).reshape(*out_shape)
Examples:
    >>> import numpy as np, nnabla as nn, nnabla.functions as F
    >>> nn.set_auto_forward(True)
    >>> data = F.arange(1, 11).reshape([2, 5])
    >>> print(data.d)
    [[ 1.  2.  3.  4.  5.]
     [ 6.  7.  8.  9. 10.]]
    >>> F.gather_nd(data, [[1, 1, 0]]).shape
    (3, 5)
    >>> F.gather_nd(data, [[1, 1, 0], [0, 1, 0]]).shape
    (3,)
    >>> print(F.gather_nd(data, [[1, 1, 0], [0, 1, 0]]).d)
    [6. 7. 1.]
    >>> print(F.gather_nd(data, [[1, 1, 0]]).d)
    [[ 6.  7.  8.  9. 10.]
     [ 6.  7.  8.  9. 10.]
     [ 1.  2.  3.  4.  5.]]
When indices is provided as a Variable, it is possible to change the actual index values after function creation. It is important to note that out-of-bound indices raise errors when running on CPU but are ignored when using an accelerated computation context.
    >>> indices = nn.Variable((2, 1))
    >>> indices.d = [[0], [0]]
    >>> y = F.gather_nd(data, indices)
    >>> print(y.d)
    [1.]
    >>> indices.d = [[1], [4]]
    >>> y.forward()
    >>> print(y.d)
    [10.]
- Parameters:
Returns: ~nnabla.Variable or ~nnabla.NdArray of gathered elements.
- nnabla.functions.scatter_nd(data, indices, shape=None, out=None, add=False)[source]¶
Scatter data according to indices into a new array of given shape or an existing array provided as out. Exactly one of the shape or out argument must be given. Given output shape, or shape of out array, \((X_0,X_1,\ldots,X_{N-1})\) and indices shape \((M,Y_0,\ldots,Y_{K-1})\) the input data shape is \((Y_0,\ldots,Y_{K-1},X_M,\ldots,X_{N-1})\), where \(M<=N\). If \(M==N\) the data shape is simply \((Y_0,\ldots,Y_{K-1})\). Note that indices are treated as integers and potentially converted.
The forward of scatter_nd() is equivalent to:
    def scatter_nd(data, indices, shape=None, out=None):
        import numpy
        # Exactly one of shape and out must be given
        assert (shape is None) != (out is None)
        result = out if out is not None else numpy.zeros(shape)
        # One integer index array per leading output dimension
        result[tuple(numpy.asarray(indices, dtype=int))] = data
        return result
Examples:
>>> import numpy as np, nnabla as nn, nnabla.functions as F
>>> nn.set_auto_forward(True)
>>> data = nn.Variable.from_numpy_array(np.array([9, 10, 11, 12]))
>>> indices = nn.Variable.from_numpy_array(np.array([[4, 3, 1, 7]]))
>>> scattered = F.scatter_nd(data, indices, shape=(8,))
>>> print(scattered.d)
[ 0. 11. 0. 10. 9. 0. 0. 12.]
>>> print(F.gather_nd(scattered, indices).d)
[ 9. 10. 11. 12.]
- Parameters:
Returns: nnabla.Variable or nnabla.NdArray of given shape.
- nnabla.functions.scatter_add(x0, indices, x1, axis=None)[source]¶
Add all values from x1 into x0 at the positions specified by indices. This function adds x1 into a copy of x0 and outputs the copy. The original x0 is not changed. x0, indices and x1 must have the same number of dimensions.
The forward of scatter_add() is equivalent to:
    def scatter_add(x0, indices, x1, axis):
        # Assuming each input is 3 dimensional
        import numpy as np
        output = np.copy(x0)
        for i in range(indices.shape[0]):
            for j in range(indices.shape[1]):
                for k in range(indices.shape[2]):
                    if axis == 0:
                        output[indices[i][j][k]][j][k] += x1[i][j][k]
                    elif axis == 1:
                        output[i][indices[i][j][k]][k] += x1[i][j][k]
                    elif axis == 2:
                        output[i][j][indices[i][j][k]] += x1[i][j][k]
        return output
- Parameters:
x0 (Variable) – N-D array to whose copy the data is added.
indices (Variable) – N-D array of scatter indices. The size of each dimension must be equal to or smaller than that of x0, except for the specified axis, and equal to or smaller than that of x1. Index values must be non-negative and smaller than the size of x0 along the specified axis.
x1 (Variable) – N-D array which is scattered and added to x0.
axis (int) – Axis along which to index. The axis must not exceed the inputs’ dimension. [default=0]
- Returns:
N-D array which contains the result of scatter addition. The shape is same as x0.
- Return type:
- nnabla.functions.pad(x, pad_width, mode='constant', constant_value=0, n_outputs=-1, outputs=None)[source]¶
Pad the input N-D array x over the number of dimensions given by half the length of the pad_width iterable, where every two values in pad_width determine the before and after pad size of an axis. The pad_width iterable must hold an even number of positive values which may cover all or fewer dimensions of the input variable x. If pad_width covers fewer dimensions then it applies to the innermost dimensions of x.
    x = nn.Variable.from_numpy_array(np.ones((2, 3, 4)))
    assert F.pad(x, (1, 1, 2, 2)).shape == (2, 5, 8)
Padding is performed according to the requested mode:
- constant
Pads with a value given by the keyword argument constant_value.
    # np.int was removed from recent NumPy releases; the builtin int is used instead.
    x = nn.Variable.from_numpy_array(np.array([1, 2, 3, 4], dtype=int))
    y = F.pad(x, (3, 3), 'constant', constant_value=-1)
    y.forward()
    assert np.all(y.d == np.array([-1, -1, -1, 1, 2, 3, 4, -1, -1, -1]))
- reflect
Pads with the reflection of the vector mirrored on the first and last values of the vector along each axis.
    x = nn.Variable.from_numpy_array(np.array([1, 2, 3, 4], dtype=int))
    y = F.pad(x, (3, 3), 'reflect')
    y.forward()
    assert np.all(y.d == np.array([4, 3, 2, 1, 2, 3, 4, 3, 2, 1]))
- repeat
Pads with the edge value of the vector along each axis.
    x = nn.Variable.from_numpy_array(np.array([1, 2, 3, 4], dtype=int))
    y = F.pad(x, (3, 3), 'repeat')
    y.forward()
    assert np.all(y.d == np.array([1, 1, 1, 1, 2, 3, 4, 4, 4, 4]))
- Parameters:
- Returns:
Padded N-D array with the same number of dimensions as the input.
    x = nn.Variable((3, 3, 4, 2))  # a shape like (B, C, H, W)
    # 1-D padding: last dim by 1 left and 2 on the right side
    assert F.pad(x, (1, 2)).shape == (3, 3, 4, 5)
    # 2-D padding: last dim by (1, 1) and 2nd to last by (2, 2)
    assert F.pad(x, (2, 2, 1, 1)).shape == (3, 3, 8, 4)
    # 3-D padding: dims C by (0, 1), H by (2, 1), and W by (3, 3)
    assert F.pad(x, (0, 1, 2, 1, 3, 3)).shape == (3, 4, 7, 8)
- Return type:
Note
All nnabla functions in nnabla.functions are decorated with the nnabla.function_bases.function_api decorator, which queries the current context and passes it into the first argument of the original function. The original function always takes a context as the first argument.
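The pad_width convention and the three padding modes of pad() can be sketched in plain Python. The helpers below, pad_output_shape and pad_1d, are hypothetical names used only for illustration; they mirror the documented examples and are not the nnabla implementation.

```python
def pad_output_shape(shape, pad_width):
    """Shape after padding; pad_width pairs apply to the innermost axes."""
    if len(pad_width) % 2 != 0:
        raise ValueError("pad_width must hold an even number of values")
    n_axes = len(pad_width) // 2
    out = list(shape)
    for k in range(n_axes):
        axis = len(shape) - n_axes + k  # k-th of the last n_axes axes
        out[axis] += pad_width[2 * k] + pad_width[2 * k + 1]
    return tuple(out)


def pad_1d(values, before, after, mode="constant", constant_value=0):
    """Pad a flat list according to one of the three documented modes."""
    values = list(values)
    if mode == "constant":
        left, right = [constant_value] * before, [constant_value] * after
    elif mode == "reflect":
        # Mirror around the first and last element without repeating them.
        # This sketch assumes before and after are at most len(values) - 1.
        left = values[1:before + 1][::-1]
        right = values[len(values) - 1 - after:len(values) - 1][::-1]
    elif mode == "repeat":
        left, right = [values[0]] * before, [values[-1]] * after
    else:
        raise ValueError("unknown mode: " + mode)
    return left + values + right


print(pad_output_shape((3, 3, 4, 2), (0, 1, 2, 1, 3, 3)))  # (3, 4, 7, 8)
print(pad_1d([1, 2, 3, 4], 3, 3, "reflect"))  # [4, 3, 2, 1, 2, 3, 4, 3, 2, 1]
```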
- nnabla.functions.transpose(x, axes, n_outputs=-1, outputs=None)[source]¶
Transposes tensor dimensions.
- Parameters:
x (Variable) – N-D array
axes (repeated int64) – Source axis indices for each axis.
- Returns:
Transposed N-D array.
- Return type:
- nnabla.functions.broadcast(x, shape, n_outputs=-1, outputs=None)[source]¶
Broadcasts an N-D array to the specified shape.
- Parameters:
- Returns:
Broadcasted N-D array
- Return type:
- nnabla.functions.broadcast_to(x, y, axis=None, n_outputs=-1, outputs=None)[source]¶
Warning
Support for this function is experimental; avoid relying on it.
Broadcasts an N-D array to the specified buffer.
- Parameters:
- Returns:
Broadcasted N-D array
- Return type:
- nnabla.functions.tile(x, reps)[source]¶
Forward x repeated the number of times given by reps. If reps is a sequence, the output has dimension d = max(len(reps), x.ndim) and either x is promoted to be d-dimensional by prepending new axes or reps is promoted to x.ndim by prepending 1’s.
- Parameters:
- Returns:
N-D array.
- Return type:
>>> import numpy as np, nnabla as nn, nnabla.functions as F
>>> F.tile(nn.Variable([2, 3]), 3).shape  # reps is promoted to [1, 3]
(2, 9)
>>> F.tile(nn.Variable([3]), [2, 3]).shape  # x is promoted to shape (1, 3)
(2, 9)
>>> nn.set_auto_forward(True)
>>> x = nn.Variable.from_numpy_array(np.array([1, 2, 3]))
>>> print(F.tile(x, 3).d)
[1. 2. 3. 1. 2. 3. 1. 2. 3.]
>>> print(F.tile(x, [2, 3]).d)
[[1. 2. 3. 1. 2. 3. 1. 2. 3.]
 [1. 2. 3. 1. 2. 3. 1. 2. 3.]]
>>> x = nn.Variable.from_numpy_array(np.array([[1, 3], [2, 4]]))
>>> print(F.tile(x, 3).d)
[[1. 3. 1. 3. 1. 3.]
 [2. 4. 2. 4. 2. 4.]]
>>> print(F.tile(x, [2, 3]).d)
[[1. 3. 1. 3. 1. 3.]
 [2. 4. 2. 4. 2. 4.]
 [1. 3. 1. 3. 1. 3.]
 [2. 4. 2. 4. 2. 4.]]
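The promotion rule for tile() (prepend new axes to x, or 1's to reps, until both have d entries) can be written down as a small shape calculation. This is a sketch in plain Python; tile_output_shape is a hypothetical helper, not an nnabla function.

```python
def tile_output_shape(x_shape, reps):
    """Output shape of tile: promote both sides to d entries, then multiply."""
    if isinstance(reps, int):
        reps = [reps]
    x_shape, reps = list(x_shape), list(reps)
    d = max(len(reps), len(x_shape))
    x_shape = [1] * (d - len(x_shape)) + x_shape  # prepend new axes to x
    reps = [1] * (d - len(reps)) + reps           # prepend 1's to reps
    return tuple(s * r for s, r in zip(x_shape, reps))


print(tile_output_shape((2, 3), 3))     # (2, 9)
print(tile_output_shape((3,), [2, 3]))  # (2, 9)
```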
- nnabla.functions.flip(x, axes=None, n_outputs=-1, outputs=None)[source]¶
Reverses the order of elements of the specified dimension of an array.
- Parameters:
x (Variable) – N-D array
axes (repeated int64) – The index of the dimension to reverse the order of the elements. Axis indices take on values 0, 1, 2, and so on from the left. For example, to flip 100 RGB images of 32 (W) by 24 (H) pixels, stored with shape (100,3,24,32), vertically and horizontally, specify (2,3). [default=[len(x.shape) - 1]]
- Returns:
N-D array
- Return type:
- nnabla.functions.shift(x, shifts=None, border_mode='nearest', n_outputs=-1, outputs=None)[source]¶
Shifts the array elements by the specified amount.
- Parameters:
x (Variable) – N-D array.
shifts (repeated int64) – The amount to shift elements. For example, to shift image data to the right by 2 pixels and up 3 pixels, specify (-3,2). [default=(0,) * len(x.shape)]
border_mode (string) – Specify how to process the ends of arrays whose values will be undetermined as a result of shifting. nearest: The data at the ends of the original array is copied and used. reflect: Original data reflected at the ends of the original array is used. [default='nearest']
- Returns:
N-D array.
- Return type:
- nnabla.functions.sort(x, axis=-1, reverse=False, with_index=False, only_index=False)[source]¶
Sorts the elements of x along a given axis in ascending order by value. A negative axis counts from the last dimension of x, so the default of -1 sorts along the last dimension. If reverse is True, then the elements are sorted in descending order.
If with_index is True, the result is a tuple (sorted, indices), or only indices if only_index is True. Setting only_index to True implies that with_index is also True.
    import numpy as np
    import nnabla as nn
    import nnabla.functions as F

    nn.set_auto_forward(True)
    x = nn.Variable.from_numpy_array(np.random.rand(2, 3, 4))

    sorted = F.sort(x)
    assert np.allclose(sorted.d, np.sort(x.d))

    sorted, indices = F.sort(x, with_index=True)
    assert np.allclose(sorted.d, np.sort(x.d))
    assert np.all(indices.d == np.argsort(x.d))

    indices = F.sort(x, only_index=True)
    assert np.all(indices.d == np.argsort(x.d))
- Parameters:
Returns: nnabla.Variable sorted or nnabla.Variable indices or (nnabla.Variable sorted, nnabla.Variable indices)
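The relation between the sorted values and the returned indices for sort() can be illustrated on a flat list. The sketch below uses only the Python standard library; sort_with_index is a hypothetical name, and the order of tied elements in nnabla may differ from the stable order used here.

```python
def sort_with_index(values, reverse=False):
    """Return (sorted_values, indices) so that sorted_values[j] == values[indices[j]]."""
    indices = sorted(range(len(values)), key=lambda i: values[i], reverse=reverse)
    return [values[i] for i in indices], indices


values = [0.3, 0.1, 0.2]
print(sort_with_index(values))                # ([0.1, 0.2, 0.3], [1, 2, 0])
print(sort_with_index(values, reverse=True))  # ([0.3, 0.2, 0.1], [0, 2, 1])
```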
- nnabla.functions.shape(x, start=None, end=None, n_outputs=-1, outputs=None)[source]¶
Get the shape of a tensor. Optional attributes start and end can be used to compute a slice of the input tensor’s shape. If start axis is omitted, the slice starts from axis 0.
- Parameters:
- Returns:
1-D array
- Return type:
- nnabla.functions.reshape(x, shape, inplace=True, n_outputs=-1, outputs=None)[source]¶
Reshapes the input variable in-place. It does not create a copy of the variable. The output variable (y) has a new shape but points to the same data as the input variable (x). This means that if the data in the output variable (y) is modified, the data in the input variable (x) also gets modified since the reshape was done in-place.
Note
This function has the same behavior as the nnabla.Variable.reshape() method.
- Parameters:
- Returns:
Reshaped N-D array
- Return type:
- nnabla.functions.one_hot(x, shape, n_outputs=-1, outputs=None)[source]¶
This function creates a one-hot vector based on input indices. Input indices in the range [-shape[i], -1] are regarded as [0, shape[i]-1], and an input index outside [-shape[i], shape[i]-1] generates a vector filled with zeros.
Example:
    import nnabla as nn
    import nnabla.functions as F
    import numpy as np

    labels = nn.Variable.from_numpy_array(np.array([[9], [4], [5], [-9], [10]]))
    print(labels.shape)  # (5, 1)
    num_class = 10
    y_train = F.one_hot(labels, shape=(num_class, ))
    y_train.forward()
    print(y_train.shape)  # (5, 10)
    print(y_train.d)
    # [[0. 0. 0. 0. 0. 0. 0. 0. 0. 1.]
    #  [0. 0. 0. 0. 1. 0. 0. 0. 0. 0.]
    #  [0. 0. 0. 0. 0. 1. 0. 0. 0. 0.]
    #  [0. 1. 0. 0. 0. 0. 0. 0. 0. 0.]
    #  [0. 0. 0. 0. 0. 0. 0. 0. 0. 0.]]

    # Can also be used for ndarray.
    labels = nn.Variable.from_numpy_array(np.array([[1, 7], [4, 7], [8, 6], [5, 0], [2, 6]]))
    print(labels.shape)  # (5, 2)
    num_class_1, num_class_2 = 10, 8
    y_train = F.one_hot(labels, shape=(num_class_1, num_class_2))
    y_train.forward()
    print(y_train.shape)  # (5, 10, 8)
    print(y_train.d)
    # [[[0. 0. 0. 0. 0. 0. 0. 0.]          [[0. 0. 0. 0. 0. 0. 0. 0.]
    #   [0. 0. 0. 0. 0. 0. 0. 1.]           [0. 0. 0. 0. 0. 0. 0. 0.]
    #   [0. 0. 0. 0. 0. 0. 0. 0.]           [0. 0. 0. 0. 0. 0. 1. 0.]
    #   [0. 0. 0. 0. 0. 0. 0. 0.]           [0. 0. 0. 0. 0. 0. 0. 0.]
    #   [0. 0. 0. 0. 0. 0. 0. 0.]           [0. 0. 0. 0. 0. 0. 0. 0.]
    #   [0. 0. 0. 0. 0. 0. 0. 0.]    ...    [0. 0. 0. 0. 0. 0. 0. 0.]
    #   [0. 0. 0. 0. 0. 0. 0. 0.]           [0. 0. 0. 0. 0. 0. 0. 0.]
    #   [0. 0. 0. 0. 0. 0. 0. 0.]           [0. 0. 0. 0. 0. 0. 0. 0.]
    #   [0. 0. 0. 0. 0. 0. 0. 0.]           [0. 0. 0. 0. 0. 0. 0. 0.]
    #   [0. 0. 0. 0. 0. 0. 0. 0.]],         [0. 0. 0. 0. 0. 0. 0. 0.]]]
- Parameters:
- Returns:
N-D array one-hot vector/tensor.
- Return type:
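The index handling of one_hot() (negative indices wrap around, out-of-range indices give an all-zero vector) can be sketched for a single class axis as follows. The helper one_hot_1d is a hypothetical name written in plain Python for illustration.

```python
def one_hot_1d(index, num_classes):
    """One-hot vector for a single index along one class axis."""
    vec = [0.0] * num_classes
    if -num_classes <= index < num_classes:
        # Negative indices in [-num_classes, -1] map to [0, num_classes - 1].
        vec[index % num_classes] = 1.0
    return vec  # stays all-zero for out-of-range indices


for label in (9, 4, 5, -9, 10):
    print(label, one_hot_1d(label, 10))
```

With 10 classes, the label -9 sets position 1 and the label 10 yields only zeros, matching the rows of the first documented example.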
- nnabla.functions.assign(dst, src, n_outputs=-1, outputs=None)[source]¶
Assign source array to destination array just like tf.assign. This is useful to synchronize or manually update parameters.
    dst = nn.Variable((2, 3, 4))
    src = nn.Variable((2, 3, 4))
    assign = F.assign(dst, src)
    assign.forward()
    assert np.allclose(dst.d, src.d)  # dst and src have identical values.
    assert np.allclose(assign.d, dst.d)  # returned Variable is also identical to dst.
Unlike TensorFlow, the returned Variable has a backward path to dst:
\[g_{dst} = g_{y}\]
- Parameters:
- Returns:
An assigned array
- Return type:
- nnabla.functions.top_k_data(x, k, abs=False, reduce=True, base_axis=1, largest=True, with_index=False, n_outputs=-1, outputs=None)[source]¶
Select the k largest values from each sample in x to propagate unmodified and set all other values to 0. If abs is True, the k largest values are selected by magnitude. If reduce is True (the default), all feature dimensions are reduced to a single dimension of size k that propagates only the k largest values. Otherwise, if reduce is False, input and output dimensions are identical. Dimensions before base_axis are treated as number of sample dimensions and k values get selected from all elements of a sample (dimensions from base_axis) regardless of shape.
>>> import nnabla as nn, nnabla.functions as F
>>> x = nn.Variable((4, 5, 6))
>>> F.top_k_data(x, 3, reduce=False).shape
(4, 5, 6)
>>> F.top_k_data(x, 3, reduce=True).shape
(4, 3)
>>> F.top_k_data(x, 3, reduce=True, base_axis=2).shape
(4, 5, 3)
- Parameters:
x (Variable) – N-D array
k (int) – Number of largest data values to propagate.
abs (bool) – Determine largest data values by magnitude. [default=False]
reduce (bool) – Reduce feature size to one dimension of size k. [default=True]
base_axis (int) – First dimension of the sample shape. [default=1]
largest (bool) – Whether to select the k largest or smallest values. [default=True]
with_index (bool) – Return top-k values and indices. [default=False]
- Returns:
N-D array. ~nnabla.Variable: N-D array of top-k indices.
- Return type:
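The effect of reduce and abs in top_k_data() can be sketched on a single flattened sample. Both helpers below are hypothetical plain-Python illustrations; how nnabla breaks ties and orders the reduced output is not covered by this sketch.

```python
def top_k_mask(sample, k, use_abs=False):
    """reduce=False on one flat sample: keep the k largest values, zero the rest."""
    key = abs if use_abs else (lambda v: v)
    order = sorted(range(len(sample)), key=lambda i: key(sample[i]), reverse=True)
    keep = set(order[:k])
    return [v if i in keep else 0 for i, v in enumerate(sample)]


def top_k_output_shape(x_shape, k, reduce=True, base_axis=1):
    """Output shape: feature dimensions collapse to k only when reduce is True."""
    if not reduce:
        return tuple(x_shape)
    return tuple(x_shape[:base_axis]) + (k,)


print(top_k_mask([1, -5, 3, 2], 2))                # [0, 0, 3, 2]
print(top_k_mask([1, -5, 3, 2], 2, use_abs=True))  # [0, -5, 3, 0]
print(top_k_output_shape((4, 5, 6), 3, base_axis=2))  # (4, 5, 3)
```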
- nnabla.functions.top_k_grad(x, k, abs=False, base_axis=1, n_outputs=-1, outputs=None)[source]¶
Select the k largest gradients for each sample in x to back-propagate unmodified and set all other gradients to 0. If abs is True, the k largest gradients are selected by magnitude. Dimensions before base_axis are treated as number of sample dimensions and k gradients get selected from all gradients of a sample (dimensions from base_axis) regardless of shape.
- Parameters:
- Returns:
N-D array with same shape and data as x.
- Return type:
- nnabla.functions.pack_padded_sequence(padded_sequence, lengths, batch_first=False, n_outputs=-1, outputs=None)[source]¶
Pack padded variable-length sequences.
This method packs padded variable-length sequences.
\(T_i\) is the length of the \(i\)-th Variable in the sequences. \(B\) is the batch size equal to the length of the sequences. \(T\) is the max of \(T_i\) for all \(i\). \(*\) is the remaining dimensions including none.
Note
This function assumes that the padded sequence is sorted by length in decreasing order and must be used by pack_padded_sequence() in the dynamic computation mode. See :
- Parameters:
padded_sequence (Variable) – Padded sequence of (\(T \times B \times *\)) or (\(B \times T \times *\)) shape.
lengths (Variable) – Sequence length for each batch and always resides in CPU.
batch_first (bool) – padded_sequence is of (\(T\), \(B\), \(*\)) shape if False, otherwise (\(B\), \(T\), \(*\)). [default=False]
- Returns:
Packed sequence of (\(N\), \(*\)) shape. ~nnabla.Variable: Batch size for each time and always resides in CPU.
- Return type:
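The two outputs of pack_padded_sequence(), the packed sequence of length \(N\) and the batch size for each time step, can be illustrated with nested lists. This is a sketch under an assumption: it uses the time-major packing order that is common for packed sequences, which the text above does not spell out. The name pack_padded is hypothetical.

```python
def pack_padded(padded, lengths):
    """Pack a T x B nested list (time-major) given lengths sorted in decreasing order.

    Assumption: packing is time-major, i.e. all active entries of step 0 come
    first, then those of step 1, and so on.
    """
    if any(a < b for a, b in zip(lengths, lengths[1:])):
        raise ValueError("lengths must be sorted in decreasing order")
    packed, batch_sizes = [], []
    for t in range(max(lengths)):
        active = sum(1 for n in lengths if n > t)  # sequences still running at step t
        batch_sizes.append(active)
        packed.extend(padded[t][:active])
    return packed, batch_sizes


# Three sequences [1, 2, 3], [4, 5], [6], padded with 0 to T = 3.
padded = [[1, 4, 6],
          [2, 5, 0],
          [3, 0, 0]]
print(pack_padded(padded, [3, 2, 1]))  # ([1, 4, 6, 2, 5, 3], [3, 2, 1])
```

Under this assumption the packed length \(N\) equals the sum of all \(T_i\), and the padding values never appear in the packed output.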
- nnabla.functions.pad_packed_sequence(packed_sequence, batch_sizes, batch_first=False, padding_value=None, total_length=None, n_outputs=-1, outputs=None)[source]¶
Pad packed sequence.
This method unpacks the packed sequence and pads it, the inverse operation of pack_padded_sequence().
\(T_i\) is the length of the \(i\)-th Variable in the sequences. \(B\) is the batch size equal to the length of the sequences. \(T\) is the max of \(T_i\) for all \(i\). \(*\) is the remaining dimensions including none.
Note
This function assumes the output of the length-sorted padded sequence in the decreasing order and must be used by pad_packed_sequence() in the dynamic computation mode.
- Parameters:
packed_sequence (Variable) – Packed sequence of (\(N\), \(*\)) shape.
batch_sizes (Variable) – Batch size for each time and always resides in CPU.
batch_first (bool) – padded_sequence is of (\(T\), \(B\), \(*\)) shape if False, otherwise (\(B\), \(T\), \(*\)). [default=False]
padding_value (float) – Padding value. [default=0.0]
total_length (int) – If not None, the outputs are padded up to the total_length. If the total_length is less than the max length in the sequences, an error is raised. [default=-1]
- Returns:
Padded sequence of (\(T \times B \times *\)) or (\(B \times T \times *\)) shape. ~nnabla.Variable: Sequence length for each batch and always resides in CPU.
- Return type:
- nnabla.functions.searchsorted(sorted_sequence, values, right=None, n_outputs=-1, outputs=None)[source]¶
Finds the indices in the innermost dimension of a sorted sequence at which values must be inserted in order to preserve the sort order.
- Parameters:
sorted_sequence (Variable) – N-D array of sorted sequence where the search is to be performed. Note that this must be a sorted array.
values (Variable) – N-D array of search values.
right (bool) – If True, given a value v, the function returns index i such that sorted_sequence[i-1] <= v < sorted_sequence[i] (index of closest upper bound of v). By default, this is False, so the function returns index i such that sorted_sequence[i-1] < v <= sorted_sequence[i] (index of closest lower bound of v). [default=False]
- Returns:
N-D array containing the required indices
- Return type:
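The two conditions given for the right argument of searchsorted() correspond to left and right bisection. The following sketch uses the Python standard library module bisect on a 1-D list; searchsorted_1d is a hypothetical name used for illustration.

```python
from bisect import bisect_left, bisect_right


def searchsorted_1d(sorted_sequence, values, right=False):
    """Insertion indices that keep sorted_sequence sorted.

    right=False: a[i-1] <  v <= a[i]  (bisect_left)
    right=True:  a[i-1] <= v <  a[i]  (bisect_right)
    """
    find = bisect_right if right else bisect_left
    return [find(sorted_sequence, v) for v in values]


seq = [1, 3, 5, 7]
print(searchsorted_1d(seq, [3, 4, 8, 0]))              # [1, 2, 4, 0]
print(searchsorted_1d(seq, [3, 4, 8, 0], right=True))  # [2, 2, 4, 0]
```

The two variants differ only for values that already occur in the sequence, such as 3 in this example.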
- nnabla.functions.bool_gather(input, mask, n_outputs=-1, outputs=None)[source]¶
Gather from the input data according to the mask.
Given an input of \((B_1, \ldots, B_N, D_1, \ldots, D_M)\) shape and mask of \((B_1, \ldots, B_N)\) shape, the function returns an output of \((nnz, D_1, \ldots, D_M)\) shape and \(nnz\) is the number of non-zero elements in mask.
    import numpy as np
    import nnabla as nn
    import nnabla.functions as F

    nn.set_auto_forward(True)
    input = nn.Variable.from_numpy_array([[1, 2], [3, 4], [5, 6]])
    mask = nn.Variable.from_numpy_array([1, 0, 1])
    output = F.bool_gather(input, mask)
    print(output.d)  # [[1, 2], [5, 6]]
Note that this function is normally used with the dynamic graph, since the length of its output varies with the mask. If it is used with the static graph, the network has to be rebuilt in every iteration.
- Parameters:
- Returns:
Gathered output.
- Return type:
- nnabla.functions.bool_scatter(input, mask, output=None, n_outputs=-1, outputs=None)[source]¶
Scatter the input according to the mask.
Given an input of \((nnz, D_1, \ldots, D_M)\) shape and mask of \((B_1, \ldots, B_N)\) shape, the function returns an output of \((B_1, \ldots, B_N, D_1, \ldots, D_M)\) shape, where \(nnz\) is the number of non-zero elements in the mask.
    import numpy as np
    import nnabla as nn
    import nnabla.functions as F

    nn.set_auto_forward(True)
    input0 = nn.Variable.from_numpy_array([[1, 2], [3, 4], [5, 6]])
    mask = nn.Variable.from_numpy_array([1, 0, 1])
    output0 = F.bool_gather(input0, mask)
    input1 = output0 + 10
    output1 = F.bool_scatter(input1, mask)
    print(output1.d)  # [[11, 12], [0, 0], [15, 16]]
Note that the higher-order gradients of this function rely on F.gather, so they are normally used with the dynamic graph.
- Parameters:
- Returns:
Scattered output.
- Return type:
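The round trip between bool_gather() and bool_scatter() can be reproduced with nested lists and a 1-D mask. The helpers bool_gather_rows and bool_scatter_rows are hypothetical plain-Python stand-ins; positions where the mask is zero are filled with zeros, as in the documented output.

```python
def bool_gather_rows(rows, mask):
    """Keep the rows whose mask entry is non-zero (output length is nnz)."""
    return [list(row) for row, m in zip(rows, mask) if m]


def bool_scatter_rows(gathered, mask):
    """Place gathered rows back at the non-zero mask positions, zeros elsewhere."""
    width = len(gathered[0]) if gathered else 0
    rows = iter(gathered)
    return [list(next(rows)) if m else [0] * width for m in mask]


mask = [1, 0, 1]
gathered = bool_gather_rows([[1, 2], [3, 4], [5, 6]], mask)
print(gathered)  # [[1, 2], [5, 6]]
shifted = [[v + 10 for v in row] for row in gathered]
print(bool_scatter_rows(shifted, mask))  # [[11, 12], [0, 0], [15, 16]]
```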
- nnabla.functions.bool_fill(data, mask, value=0, n_outputs=-1, outputs=None)[source]¶
Fill the data with the value according to the mask.
    import numpy as np
    import nnabla as nn
    import nnabla.functions as F

    nn.set_auto_forward(True)
    input = nn.Variable.from_numpy_array([[np.inf, 2], [3, np.nan]])
    mask = nn.Variable.from_numpy_array([[1, 0], [0, 1]])
    output = F.bool_fill(input, mask, -1)
    print(output.d)  # [[-1, 2], [3, -1]]
- Parameters:
- Returns:
Filled output.
- Return type:
- nnabla.functions.dot(a, b, out=None)[source]¶
An operation compatible with numpy.dot.
Note
Any operation between nnabla’s Variable/NdArray and a numpy array is not supported.
If both arguments are 1-D, it is the inner product of vectors. If both arguments are 2-D, it is matrix multiplication. If either a or b is 0-D (scalar), it is equivalent to multiply. If b is a 1-D array, it is a sum product over the last axis of a and b. If b is an M-D array (M>=2), it is a sum product over the last axis of a and the second-to-last axis of b.
- Parameters:
- Returns:
~nnabla.Variable or ~nnabla.NdArray
Examples:
    import numpy as np
    import nnabla as nn
    import nnabla.functions as F

    # 2-D matrix * 2-D matrix
    arr1 = np.arange(5*6).reshape(5, 6)
    arr2 = np.arange(6*8).reshape(6, 8)
    nd1 = nn.NdArray.from_numpy_array(arr1)
    nd2 = nn.NdArray.from_numpy_array(arr2)
    ans1 = F.dot(nd1, nd2)
    print(ans1.shape)  # (5, 8)

    var1 = nn.Variable.from_numpy_array(arr1)
    var2 = nn.Variable.from_numpy_array(arr2)
    ans2 = F.dot(var1, var2)
    ans2.forward()
    print(ans2.shape)  # (5, 8)

    out1 = nn.NdArray((5, 8))
    out1.cast(np.float32)
    F.dot(nd1, nd2, out1)
    print(out1.shape)  # (5, 8)

    out2 = nn.Variable((5, 8))
    out2.data.cast(np.float32)
    F.dot(var1, var2, out2)
    out2.forward()
    print(out2.shape)  # (5, 8)

    # N-D matrix * M-D matrix (M>=2)
    arr1 = np.arange(5*6*7*8).reshape(5, 6, 7, 8)
    arr2 = np.arange(2*3*8*6).reshape(2, 3, 8, 6)
    nd1 = nn.NdArray.from_numpy_array(arr1)
    nd2 = nn.NdArray.from_numpy_array(arr2)
    ans1 = F.dot(nd1, nd2)
    print(ans1.shape)  # (5, 6, 7, 2, 3, 6)

    var1 = nn.Variable.from_numpy_array(arr1)
    var2 = nn.Variable.from_numpy_array(arr2)
    ans2 = F.dot(var1, var2)
    ans2.forward()
    print(ans2.shape)  # (5, 6, 7, 2, 3, 6)

    out1 = nn.NdArray((5, 6, 7, 2, 3, 6))
    out1.cast(np.float32)
    F.dot(nd1, nd2, out1)
    print(out1.shape)  # (5, 6, 7, 2, 3, 6)

    out2 = nn.Variable((5, 6, 7, 2, 3, 6))
    out2.data.cast(np.float32)
    F.dot(var1, var2, out2)
    out2.forward()
    print(out2.shape)  # (5, 6, 7, 2, 3, 6)
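The case distinction for dot() determines the output shape, which is also the shape an out buffer needs to have. The helper below is a plain-Python sketch of those rules as stated for numpy.dot compatibility; dot_output_shape is a hypothetical name, not part of nnabla.

```python
def dot_output_shape(a_shape, b_shape):
    """Output shape of a numpy.dot-compatible product of two arrays."""
    a_shape, b_shape = tuple(a_shape), tuple(b_shape)
    if len(a_shape) == 0 or len(b_shape) == 0:
        return a_shape + b_shape  # scalar operand: elementwise multiply
    if len(b_shape) == 1:
        if a_shape[-1] != b_shape[0]:
            raise ValueError("last axis of a must match the length of b")
        return a_shape[:-1]  # sum product over the last axis of a and b
    if a_shape[-1] != b_shape[-2]:
        raise ValueError("last axis of a must match second-to-last axis of b")
    return a_shape[:-1] + b_shape[:-2] + b_shape[-1:]


print(dot_output_shape((5, 6), (6, 8)))              # (5, 8)
print(dot_output_shape((5, 6, 7, 8), (2, 3, 8, 6)))  # (5, 6, 7, 2, 3, 6)
```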
Stochasticity¶
- nnabla.functions.rand(low=0, high=1, shape=[], seed=-1, n_outputs=-1, outputs=None)[source]¶
Samples numbers from a uniform distribution \(x \sim U(low, high)\) given lowest value \(low\), upper bound \(high\), and shape of the returned Variable.
- Parameters:
- Returns:
Variable with the shape specified in the argument.
- Return type:
- nnabla.functions.randint(low=0, high=1, shape=[], seed=-1, n_outputs=-1, outputs=None)[source]¶
Samples integer numbers from a uniform distribution \(x \sim U(low, high)\) given lowest value \(low\), upper bound \(high\), and the shape of the returned Variable. The lowest value \(low\) is included in the range, while the upper bound \(high\) is excluded, corresponding to the half-open interval \([low, high)\).
- Parameters:
- Returns:
Variable with the shape specified in the argument. The dtype is int32.
- Return type:
- nnabla.functions.randn(mu=0, sigma=1, shape=[], seed=-1, n_outputs=-1, outputs=None)[source]¶
Samples numbers from a normal distribution \(x \sim N(\mu, \sigma)\) given mean \(\mu\), standard deviation \(\sigma\), and shape of the returned Variable.
- Parameters:
- Returns:
Variable with the shape specified in the argument.
- Return type:
- nnabla.functions.rand_binomial(n=1, p=0.5, shape=[], seed=-1, n_outputs=-1, outputs=None)[source]¶
Samples numbers from a binomial distribution \(x \sim B(n, p)\) given the number of trials \(n\), probability \(p\), and shape of the returned Variable. When \(n = 1\), this behaves like the Bernoulli distribution.
- Parameters:
n (int) – \(n\) in definition, the number of trials. [default=1]
p (float) – \(p\) in definition, probability of success. [default=0.5]
shape (tuple of int) – Shape of returned variable. [default=[]]
seed (int) – Random seed. When -1, seed is sampled from global random number generator. [default=-1]
- Returns:
Variable with the shape specified in the argument.
- Return type:
- nnabla.functions.rand_beta(alpha=0.5, beta=0.5, shape=[], seed=-1, n_outputs=-1, outputs=None)[source]¶
Samples numbers from a beta distribution \(x \sim \beta(\alpha, \beta)\).
- Parameters:
- Returns:
Variable with the shape specified in the argument.
- Return type:
- nnabla.functions.rand_gamma(k=0.5, theta=1, shape=[], seed=-1, n_outputs=-1, outputs=None)[source]¶
Samples numbers from a gamma distribution \(x \sim \frac {\gamma(k, \frac {x}{\theta})}{\Gamma(k)}\).
- Parameters:
- Returns:
Variable with the shape specified in the argument.
- Return type:
- nnabla.functions.dropout(x, p=0.5, seed=-1, n_outputs=-1, outputs=None)[source]¶
Dropout. Samples a number \(u\) from a uniform distribution in \([0, 1]\) , and ignores the input if \(u \leq p\).
\[\begin{split}y = \left\{ \begin{array}{ll} \frac{x}{1 - p} & (u > p) \\ 0 & ({\rm otherwise}) \end{array} \right.\end{split}\]
Note
Dropout is usually applied only during training, as below (MC dropout is the exception). To use dropout as MC dropout, remove ‘if train:’.
    h = PF.affine(x, num_hidden)
    if train:
        h = F.dropout(h, 0.5)
- Parameters:
- Returns:
N-D array with the same shape as x
- Return type:
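The dropout formula (scale by \(1/(1-p)\) when \(u > p\), output 0 otherwise) can be sketched for a flat list with the standard library random module. The helper dropout_1d and its seed handling are assumptions made for illustration; nnabla uses its own random number generator, so individual outcomes will differ.

```python
import random


def dropout_1d(x, p=0.5, seed=None):
    """Apply y = x / (1 - p) if u > p else 0, with one draw u per element."""
    rng = random.Random(seed)
    out = []
    for v in x:
        u = rng.random()  # uniform sample from [0, 1)
        out.append(v / (1.0 - p) if u > p else 0.0)
    return out


y = dropout_1d([1.0] * 1000, p=0.5, seed=0)
# Surviving elements are scaled to 2.0, so the mean stays close to the input mean.
print(sorted(set(y)), sum(y) / len(y))
```

The scaling by \(1/(1-p)\) keeps the expected value of each element unchanged, which is why no rescaling is needed when dropout is switched off at inference time.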
- nnabla.functions.random_choice(x, w, shape=[], replace=True, seed=-1, n_outputs=-1, outputs=None)[source]¶
Generate random samples from population x with selection probabilities determined by the relative weights w. The number of samples to draw is given by the product of shape's dimensions, and the samples are returned with the given shape. By default, samples are drawn with replacement, i.e. selection of a specific population member is solely determined by its associated weight. Sampling without replacement, where any population member may be drawn only once, is used if replace is set to False.
For both x and w the innermost dimension corresponds to the individual populations and their weights from which samples are returned with the requested shape following all outermost dimensions of the input.
import nnabla as nn
import nnabla.functions as F
import numpy as np
nn.set_auto_forward(True)
# x holds two populations
x = nn.Variable.from_numpy_array(np.array([[11, 22, 33], [110, 220, 330]]))
# w holds the weights for each population
w = nn.Variable.from_numpy_array(np.array([[10, 20, 70], [70, 20, 10]]))
# draw one sample from each population
y = F.random_choice(x, w)  # y.shape => (2, 1)
# draw 12 samples with shape (3, 4) from each population
y = F.random_choice(x, w, shape=(3, 4))  # y.shape => (2, 3, 4)
Note that weights must not be less than zero and for each population the sum of weights must be greater than zero. Additionally, sampling without replacement requires that the number of non-zero weights is not less than the number of samples to be drawn. These conditions are verified in “cpu” computation context but not when using “cuda” or “cudnn” acceleration (this would require additional device synchronization steps penalizing performance).
Random sampling from an implicit array of index values (like categorical or multinomial) can be realized with input x constructed as indices.
w = nn.Variable.from_numpy_array(np.array([1, 2, 3, 2, 1]))
y = F.random_choice(F.arange(0, 5), w)
- Parameters:
- Returns:
N-D array
- Return type:
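The difference between sampling with and without replacement for a single population can be sketched with the standard library. The helper `random_choice_sketch` is hypothetical and works on plain lists; it mirrors the weight conditions stated above but is not how nnabla implements the function.

```python
import random


def random_choice_sketch(x, w, n_samples, replace=True, seed=0):
    """Draw n_samples members of population x using relative weights w."""
    if any(wi < 0 for wi in w) or sum(w) <= 0:
        raise ValueError("weights must be non-negative with a positive sum")
    rng = random.Random(seed)
    if replace:
        # Each draw depends only on the relative weights.
        return rng.choices(x, weights=w, k=n_samples)
    if n_samples > sum(1 for wi in w if wi > 0):
        raise ValueError("not enough non-zero weights to draw without replacement")
    weights = list(w)
    picked = []
    for _ in range(n_samples):
        idx = rng.choices(range(len(x)), weights=weights, k=1)[0]
        picked.append(x[idx])
        weights[idx] = 0  # a drawn member cannot be selected again
    return picked


with_replacement = random_choice_sketch([11, 22, 33], [10, 20, 70], 5)
without_replacement = random_choice_sketch([11, 22, 33], [10, 20, 70], 3, replace=False)
```

Zeroing the weight of a drawn member is one simple way to realize sampling without replacement; it also makes clear why the number of non-zero weights bounds the number of samples.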
- nnabla.functions.random_crop(x, shape=None, base_axis=1, seed=-1, n_outputs=-1, outputs=None)[source]¶
RandomCrop randomly extracts a portion of an array.
- Parameters:
x (Variable) – N-D array
shape (tuple of int) – The data size to extract. For example, to randomly extract a portion of the image (3,48,48) from a 3,64,64 image, specify (3,48,48). [default=x.shape]
base_axis (int) – No Description [default=1]
seed (int) – Random seed. When -1, seed is sampled from global random number generator. [default=-1]
- Returns:
N-D array
- Return type:
- nnabla.functions.random_erase(x, prob=0.5, area_ratios=(0.02, 0.4), aspect_ratios=(0.3, 3.3333), replacements=(0.0, 255.0), n=None, share=True, inplace=False, base_axis=1, seed=-1, channel_last=False, ste_fine_grained=True, n_outputs=-1, outputs=None)[source]¶
Randomly erase patches of the inputs and replace with random values.
Erasing is applied for each sample and for each n with the given probability, the randomly selected area ratio, and the randomly selected aspect ratio if share is True; otherwise (share=False), it is additionally applied for each feature.
Random patches are selected by random coordinates as follows,
\[\begin{split}S_e &&= Uniform(s_l, s_h) \times S \\ r_e &&= Uniform(r_l, r_h) \\ H_e &&= \sqrt{S_e \times r_e} \\ W_e &&= \sqrt{S_e / r_e} \\ y_e &&= Uniform(0, H - H_e) \\ x_e &&= Uniform(0, W - W_e),\end{split}\]
where \(S\) is the area, \(s_l\) and \(s_h\) are the low and high values of the area ratio range, \(r_l\) and \(r_h\) are the low and high values of the aspect ratio range, \(H_e\) and \(W_e\) are height and width of a patch, and \(y_e\) and \(x_e\) are the start coordinates of a patch. If a pixel of the inputs falls in this patch, the value of that pixel is replaced with a random value in the replacements range.
Backward is implemented as passing gradients if ste_fine_grained is False; otherwise, the backward only occurs in regions not erased.
- Parameters:
x (Variable) – N-D array.
prob (float) – Probability to erase. [default=0.5]
area_ratios (repeated float) – Low and high of the area ratio range. [default=(0.02, 0.4)]
aspect_ratios (repeated float) – Low and high of the aspect ratio range. [default=(0.3, 3.3333)]
replacements (repeated float) – Low and high of the replacement value range. [default=(0.0, 255.0)]
n (int) – Max number of patches to be erased. [default=1]
share (bool) – If True, the same randomly picked bounding box is used over the feature dimension. [default=True]
inplace (bool) – This option is obsolete and ignored. Output is never in-placed with input. [default=False]
base_axis (int) – Dimensions up to base_axis are treated as sample dimensions. [default=1]
seed (int) – Random seed. When -1, seed is sampled from global random number generator. [default=-1]
channel_last (bool) – If True, the last dimension is considered as channel dimension, a.k.a NHWC order. [default=False]
ste_fine_grained (bool) – Whether the Straight Through Estimator is fine-grained or not. [default=True]
- Returns:
N-D array.
- Return type:
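The patch geometry defined by the equations above can be sketched as follows. The helper name `sample_patch` is an assumption for illustration, and the sketch does not handle the case where the sampled patch is larger than the image, since the text above does not say how that case is treated.

```python
import math
import random


def sample_patch(height, width, area_ratios=(0.02, 0.4),
                 aspect_ratios=(0.3, 3.3333), seed=0):
    """Return (y_e, x_e, H_e, W_e) for one randomly selected patch."""
    rng = random.Random(seed)
    area = height * width                    # S
    s_e = rng.uniform(*area_ratios) * area   # S_e = Uniform(s_l, s_h) * S
    r_e = rng.uniform(*aspect_ratios)        # r_e = Uniform(r_l, r_h)
    h_e = math.sqrt(s_e * r_e)               # H_e
    w_e = math.sqrt(s_e / r_e)               # W_e
    y_e = rng.uniform(0, height - h_e)       # start row of the patch
    x_e = rng.uniform(0, width - w_e)        # start column of the patch
    return y_e, x_e, h_e, w_e


y_e, x_e, h_e, w_e = sample_patch(64, 64, seed=0)
```

By construction H_e * W_e equals S_e and H_e / W_e equals r_e, so the two sampled ratios fully determine the patch size.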
- nnabla.functions.random_flip(x, axes=None, base_axis=1, seed=-1, n_outputs=-1, outputs=None)[source]¶
Reverses the order of elements of the specified dimension of an array at 50% probability.
- Parameters:
x (Variable) – N-D array
axes (repeated int64) – The index of the axis to reverse the order of the elements. Axis indices take on values 0, 1, 2, and so on from the left. For example, to flip 100 RGB images of 32 (W) by 24 (H) pixels, i.e. an array of shape (100, 3, 24, 32), vertically and horizontally at random, specify (2,3). [default=[len(x.shape) - 1]]
base_axis (int) – No Description [default=1]
seed (int) – Random seed. When -1, seed is sampled from global random number generator. [default=-1]
- Returns:
N-D array
- Return type:
- nnabla.functions.random_shift(x, shifts=None, border_mode='nearest', constant_value=0, base_axis=1, seed=-1, n_outputs=-1, outputs=None)[source]¶
Randomly shifts the array elements within the specified range.
- Parameters:
x (Variable) – N-D array.
shifts (repeated int64) – Max absolute amount to shift elements. For example, to shift image data horizontally by \(\pm 2\) pixels and vertically by \(\pm 3\) pixels, specify (3,2). [default=(0,) * len(x.shape)]
border_mode (string) – Specify how to process the ends of arrays whose values will be undetermined as a result of shifting. nearest: The data at the ends of the original array is copied and used. reflect: Original data reflected at the ends of the original array is used. constant: Constant value is used. [default='nearest']
constant_value (float) – Value used for outside of the original array if border_mode=’constant’. [default=0]
base_axis (int) – No Description [default=1]
seed (int) – Random seed. When -1, seed is sampled from global random number generator. [default=-1]
- Returns:
N-D array.
- Return type:
- nnabla.functions.image_augmentation(x, shape=None, pad=(0, 0), min_scale=1.0, max_scale=1.0, angle=0.0, aspect_ratio=1.0, distortion=0.0, flip_lr=False, flip_ud=False, brightness=0.0, brightness_each=False, contrast=1.0, contrast_center=0.0, contrast_each=False, noise=0.0, seed=-1, n_outputs=-1, outputs=None)[source]¶
ImageAugmentation randomly alters the input image.
- Parameters:
x (Variable) – N-D array.
shape (tuple of int) – The output image data size. [default=x.shape]
pad (tuple of int) – Border padding values for each spatial axis. Padding will be added to both sides of the dimension. [default=(0, 0)]
min_scale (float) – The minimum scale ratio when randomly scaling the image. For example, to scale down to 0.8 times the size of the original image, specify “0.8”. To not apply random scaling, set both min_scale and max_scale to “1.0”. [default=1.0]
max_scale (float) – The maximum scale ratio when randomly scaling the image. For example, to scale up to 2 times the size of the original image, specify “2.0”. [default=1.0]
angle (float) – The rotation angle range in radians when randomly rotating the image. The image is randomly rotated in the -Angle to +Angle range. For example, to rotate in a +-15 degree range, specify “0.26” (15 degrees/360 degrees * 2PI). To not apply random rotation, specify “0.0”. [default=0.0]
aspect_ratio (float) – The aspect ratio range when randomly deforming the image. For example, to deform aspect ratio of image from 1:1.3 to 1.3:1, specify “1.3”. To not apply random deforming, specify “1.0”. [default=1.0]
distortion (float) – The distortion range when randomly distorting the image. To not apply distortion, specify “0.0”. [default=0.0]
flip_lr (bool) – Whether to randomly flip the image horizontally at 50% probability. [default=False]
flip_ud (bool) – Whether to randomly flip the image vertically at 50% probability. [default=False]
brightness (float) – The absolute range of values to randomly add to the brightness. A random value in the -Brightness to +Brightness range is added to the brightness. For example, to vary the brightness in the -0.05 to +0.05 range, specify “0.05”. To not apply random addition to brightness, specify “0.0”. [default=0.0]
brightness_each (bool) – Whether to apply the random addition to brightness (as specified by brightness) to each color channel. True: brightness is added based on a different random number for each channel. False: brightness is added based on a random number common to all channels. [default=False]
contrast (float) – The range in which to randomly vary the image contrast. The contrast is varied in the 1/Contrast times to Contrast times range. The output brightness is equal to (input - contrast_center) * contrast + contrast_center. For example, to vary the contrast in the 0.91 times to 1.1 times range, specify “1.1”. To not apply random contrast variation, specify “1.0”. [default=1.0]
contrast_center (float) – Intensity center used for applying contrast. [default=0.0]
contrast_each (bool) – Whether to apply the random contrast variation (as specified by contrast) to each color channel. True: contrast is varied based on a different random number for each channel. False: contrast is varied based on a random number common to all channels. [default=False]
noise (float) – Sigma of normal random number to be added. [default=0.0]
seed (int) – Random seed. When -1, seed is sampled from global random number generator. [default=-1]
- Returns:
N-D array.
- Return type:
Loss Functions¶
- nnabla.functions.sigmoid_cross_entropy(x, target, n_outputs=-1, outputs=None)[source]¶
Element-wise cross entropy between x and the target variables, passed to a sigmoid function.
\[y_i = - \left(x^{(1)}_i \ln \left(\sigma \left(x^{(0)}_i \right)\right) + \left(1 - x^{(1)}_i\right) \ln \left(1 - \sigma \left(x^{(0)}_i \right)\right)\right)\]
where \(\sigma(s)=\frac{1}{1+\exp(-s)}\).
Note
SigmoidCrossEntropy is equivalent to Sigmoid+BinaryCrossEntropy, but computing them at once has the effect of reducing computational error.
- Parameters:
- Returns:
N-D array of element-wise losses.
- Return type:
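The remark about reduced computational error can be illustrated in plain Python. The fused form below, max(x, 0) - x * t + log(1 + exp(-|x|)), is a common algebraic rearrangement of the formula above; it is shown as an assumption about how such a fusion can be done, not as nnabla's actual implementation.

```python
import math


def sigmoid(s):
    return 1.0 / (1.0 + math.exp(-s))


def sigmoid_then_bce(x, t):
    """Literal two-step evaluation: sigmoid followed by binary cross entropy."""
    p = sigmoid(x)
    return -(t * math.log(p) + (1.0 - t) * math.log(1.0 - p))


def fused_sigmoid_cross_entropy(x, t):
    """Algebraically equal form that never takes the log of a rounded 0."""
    return max(x, 0.0) - x * t + math.log1p(math.exp(-abs(x)))


moderate = (sigmoid_then_bce(0.3, 1.0), fused_sigmoid_cross_entropy(0.3, 1.0))
large = fused_sigmoid_cross_entropy(40.0, 0.0)
```

For a large score such as x = 40 with target 0, the two-step version rounds the sigmoid to exactly 1 and then fails on log(0), while the fused form returns a loss of about 40.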
- nnabla.functions.binary_cross_entropy(x, target, n_outputs=-1, outputs=None)[source]¶
Element-wise cross entropy between x and the target variables.
\[y_i = - \left(x^{(1)}_i * \ln \left(x^{(0)}_i\right) + \left(1 - x^{(1)}_i\right) * \ln \left(1 - x^{(0)}_i\right)\right).\]
- Parameters:
- Returns:
N-D array of element-wise losses.
- Return type:
- nnabla.functions.softmax_cross_entropy(x, target, axis=None, n_outputs=-1, outputs=None)[source]¶
Element-wise cross entropy between the input scores and the labels given by a category index, with Softmax normalization.
\[y_{j} = -\ln \left(\frac{\exp(x_{j,t_j})}{\sum_{i'} \exp(x_{j,i'})}\right)\]
along the dimension specified by axis (\(i\) is the axis along which normalization is performed).
Note
SoftmaxCrossEntropy is equivalent to Softmax+CategoricalCrossEntropy, but computing them at once has the effect of reducing computational error.
- Parameters:
x (Variable) – N-D array. Typically indicates a score. \((D_1 \times ... \times D_i \times ... \times D_N)\) [parameter]
target (Variable) – N-D array of labels. \((D_1 \times ... \times 1 \times ... \times D_N)\), each label should be the index from 0 to n-class, or -1 if it does not belong to any class. [parameter]
axis (int) – Axis along which normalization is taken. [default=len(x.shape) - 1]
- Returns:
N-D array of element-wise losses. \((D_1 \times ... \times 1 \times ... \times D_N)\)
- Return type:
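For a single score vector, the softmax cross entropy formula can be evaluated with the log-sum-exp trick. This sketch is a standard rearrangement used here for illustration; the function names are hypothetical, and handling of the -1 label is left out because its resulting loss value is not stated above.

```python
import math


def softmax_cross_entropy_sketch(scores, label):
    """-ln(softmax(scores)[label]) computed as logsumexp(scores) - scores[label]."""
    m = max(scores)  # subtracting the maximum avoids overflow in exp
    log_sum_exp = m + math.log(sum(math.exp(s - m) for s in scores))
    return log_sum_exp - scores[label]


def softmax_then_categorical(scores, label):
    """Literal two-step evaluation: softmax followed by -ln of the picked entry."""
    exps = [math.exp(s) for s in scores]
    return -math.log(exps[label] / sum(exps))


fused = softmax_cross_entropy_sketch([1.0, 2.0, 3.0], 2)
two_step = softmax_then_categorical([1.0, 2.0, 3.0], 2)
extreme = softmax_cross_entropy_sketch([1000.0, 0.0], 0)
```

With scores such as [1000, 0], the two-step version overflows in exp, whereas the fused version returns a loss of 0 for label 0.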
- nnabla.functions.categorical_cross_entropy(x, target, axis=None, n_outputs=-1, outputs=None)[source]¶
Element-wise cross entropy between x and the target t where targets are given by a category index.
\[y_{j} = -\ln \left( x_{j, t_j} \right)\]
along the dimension specified by axis (\(i\) is the axis along which normalization is performed).
- Parameters:
x (Variable) – N-D array. Typically indicates a score. \((D_1 \times ... \times D_i \times ... \times D_N)\) [parameter]
target (Variable) – N-D array of labels. \((D_1 \times ... \times 1 \times ... \times D_N)\), each label should be the index from 0 to n-class, or -1 if it does not belong to any class. [parameter]
axis (int) – Axis along which normalization is taken. [default=len(x.shape) - 1]
- Returns:
N-D array of element-wise losses. \((D_1 \times ... \times 1 \times ... \times D_N)\)
- Return type:
- nnabla.functions.squared_error(x0, x1, n_outputs=-1, outputs=None)[source]¶
Element-wise squared error
\[y_i = \left(x^{(0)}_i - x^{(1)}_i\right)^2.\]
- Parameters:
- Returns:
N-D array.
- Return type:
- nnabla.functions.absolute_error(x0, x1, n_outputs=-1, outputs=None)[source]¶
Element-wise absolute error
\[y_i = | x^{(0)}_i - x^{(1)}_i |.\]
- Parameters:
- Returns:
N-D array.
- Return type:
- nnabla.functions.huber_loss(x0, x1, delta=1.0, n_outputs=-1, outputs=None)[source]¶
Element-wise Huber loss
\[\begin{split}y_i= \left\{ \begin{array}{ll} d^2 & (|d| < \delta)\\ \delta (2 |d| - \delta) & ({\rm otherwise}) \end{array} \right.\end{split}\]
where \(d = x^{(0)}_i - x^{(1)}_i\).
- Parameters:
- Returns:
N-D array of element-wise losses.
- Return type:
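The piecewise Huber definition above translates directly into code. The sketch below is a scalar, plain-Python rendering of that formula with a hypothetical helper name; note that, as written, it is twice the Huber loss as it is often defined with a factor of 1/2.

```python
def huber_loss_sketch(x0, x1, delta=1.0):
    """Element-wise Huber loss for two scalars, following the formula above."""
    d = x0 - x1
    if abs(d) < delta:
        return d * d                       # quadratic region
    return delta * (2.0 * abs(d) - delta)  # linear region


small_error = huber_loss_sketch(0.5, 0.0)  # |d| < delta
large_error = huber_loss_sketch(3.0, 1.0)  # |d| >= delta
```

Both branches give delta squared at |d| = delta, so the loss is continuous where the quadratic and linear regions meet.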
- nnabla.functions.epsilon_insensitive_loss(x0, x1, epsilon, n_outputs=-1, outputs=None)[source]¶
Element-wise Epsilon Insensitive Loss
\[\begin{split}y_i= \left\{ \begin{array}{ll} | x^{(0)}_i - x^{(1)}_i | - \epsilon & if \ \ | x^{(0)}_i - x^{(1)}_i | > \epsilon \\ 0 & otherwise \end{array} \right.\end{split}\]
- Parameters:
- Returns:
N-D array of element-wise losses.
- Return type:
- nnabla.functions.kl_multinomial(p, q, base_axis=1, n_outputs=-1, outputs=None)[source]¶
The Kullback Leibler Divergence for multinomial distributions.
\[D = \sum_i p_i \log \left( \frac{p_i}{q_i} \right)\]
- Parameters:
- Returns:
Kullback Leibler divergence \(KL(p \parallel q)\).
- Return type:
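For a single pair of probability vectors, the divergence formula above can be sketched as below. Skipping terms with p_i = 0 (the usual 0 log 0 = 0 convention) is an assumption of this example, and `kl_multinomial_sketch` is a hypothetical helper, not the library function.

```python
import math


def kl_multinomial_sketch(p, q):
    """KL(p || q) = sum_i p_i * log(p_i / q_i) for two probability lists."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


same = kl_multinomial_sketch([0.2, 0.8], [0.2, 0.8])
different = kl_multinomial_sketch([0.5, 0.5], [0.25, 0.75])
```

The divergence is zero when the two distributions coincide and positive otherwise, and it is not symmetric in p and q.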
Signal Processing¶
- nnabla.functions.interpolate(x, scale=None, output_size=None, mode='linear', align_corners=False, half_pixel=False, half_pixel_for_nn=False, channel_last=False)[source]¶
Resize an ND array with interpolation.
Scaling factors for spatial dimensions are determined by either scale or output_size.
nd = len(scale) or nd = len(output_size) determines the number of spatial dimensions, and the last nd dimensions of the input x are considered as the spatial dimensions to be resized.
If scale is given, the output_size is calculated by
output_size[i] = floor(scale[i] * x.shape[i - len(scale)]).
The coordinate transformation is calculated as follows.
The input coordinate i_input is computed from the output coordinate i_output, the input size size_input, and the output size size_output as

align_corners  half_pixel  i_input
True           True        Not supported.
True           False       i_output * (size_input - 1) / (size_output - 1)
False          True        (i_output + 0.5) * size_input / size_output - 0.5
False          False       i_output * size_input / size_output

In the case of the nearest mode with half_pixel_for_nn set to True, the input coordinate i_input is computed from the output coordinate i_output as
i_input = (i_output + 0.5) * size_input / size_output.
Example:
import numpy as np
import nnabla as nn
import nnabla.functions as F
x_data = np.random.rand(64, 3, 224, 224)
x = nn.Variable.from_numpy_array(x_data)
# Resize by scales
y = F.interpolate(x, scale=(2, 2), mode='linear')
print(y.shape)  # (64, 3, 448, 448)
y.forward()
print(y.d)  # Print output
# Resize to a size
y2 = F.interpolate(x, output_size=(320, 257), mode='linear')
print(y2.shape)  # (64, 3, 320, 257)
y2.forward()
print(y2.d)  # Print output
- Parameters:
x (Variable) – N-D array with an arbitrary number of dimensions.
scale (tuple of ints) – Scale factors along axes. The default is None, and if this is omitted, output_size must be specified.
output_size (tuple of ints) – The output sizes for axes. If this is given, the scale factors are determined by the output sizes and the input sizes. The default is None, and if this is omitted, scale must be specified.
mode (str) – Interpolation mode chosen from (‘linear’|’nearest’). The default is ‘linear’.
align_corners (bool) – If true, the corner pixels of input and output arrays are aligned, such that the output corner pixels have the same values as the input corner pixels. Default is False.
half_pixel – If true, in the coordinate transformation, 0.5 is added to the output coordinate and 0.5 is subtracted from the input coordinate after scaling. Default is False.
half_pixel_for_nn – This is a special argument to support the backward-compatibility of the nearest neighbor interpolation. Default is False. When True, the implementation of nearest neighbor interpolation is the old one.
channel_last – Last dimension is the channel (NHWC order) if True.
- Returns:
N-D array.
- Return type:
Warning
Up to version 1.8.0, the default of align_corners was None, which was treated as True if mode is linear and as False otherwise.
Warning
The nearest mode interpolation up to version 1.8.0 corresponds to the nearest mode with half_pixel_for_nn=True in later versions.
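The coordinate transformations listed for interpolate can be written as a small helper for one spatial axis. The function names are hypothetical and the sketch only reproduces the documented formulas; it does not perform the interpolation itself, and it assumes size_output is greater than 1 when align_corners is True.

```python
import math


def output_size_from_scale(size_input, scale):
    """output_size = floor(scale * size_input) for one spatial axis."""
    return int(math.floor(scale * size_input))


def input_coordinate(i_output, size_input, size_output,
                     align_corners=False, half_pixel=False):
    """Map an output coordinate back to a (fractional) input coordinate."""
    if align_corners and half_pixel:
        raise ValueError("align_corners=True with half_pixel=True is not supported")
    if align_corners:
        return i_output * (size_input - 1) / (size_output - 1)
    if half_pixel:
        return (i_output + 0.5) * size_input / size_output - 0.5
    return i_output * size_input / size_output


size_out = output_size_from_scale(224, 2)
last_aligned = input_coordinate(size_out - 1, 224, size_out, align_corners=True)
first_half_pixel = input_coordinate(0, 224, size_out, half_pixel=True)
plain = input_coordinate(3, 224, size_out)
```

With align_corners=True the last output coordinate maps exactly onto the last input coordinate, which is what aligning the corner pixels means in practice.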
- nnabla.functions.fft(x, signal_ndim, normalized=False, n_outputs=-1, outputs=None)[source]¶
Complex-to-complex Discrete Fourier Transform,
\[X_{k_1, \ldots, k_d} = \sum_{n_1=0}^{N_1-1} \dots \sum_{n_d=0}^{N_d-1} x_{n_1, \ldots, n_d} \exp\left(-2 \pi j \left( \sum_{i=1}^{d} \frac{k_i n_i}{N_i} \right) \right),\]
where
\[k_i = 0, \ldots, N_i - 1.\]
This function now supports 1-D, 2-D, and 3-D DFT with or without the leading batch dimension(s).
The input is expected to be complex-valued with at least signal_ndim + 1 dimensions. The last dimension has a shape of two where x[…, 0] is the real part and x[…, 1] the imaginary part.
Example:
import numpy as np
import nnabla as nn
import nnabla.functions as F
from nnabla.ext_utils import get_extension_context
ctx = get_extension_context("cudnn")
nn.set_default_context(ctx)
# Example for a batched 2D-FFT and 2D-IFFT (batch-size: 2, data-size: 4x3)
x_data = np.random.rand(2, 4, 3) + 1j * np.random.rand(2, 4, 3)
# Stack real and imaginary parts along a new last axis of size two
x = nn.Variable.from_numpy_array(np.stack([np.real(x_data), np.imag(x_data)], axis=3))
y = F.fft(x, signal_ndim=2, normalized=True)
z = F.ifft(y, signal_ndim=2, normalized=True)
z.forward()
np.allclose(z.d[..., 0] + 1j*z.d[...,1], x_data)
- Parameters:
- Returns:
FFT transformed signal.
- Return type:
- nnabla.functions.ifft(x, signal_ndim, normalized=False, n_outputs=-1, outputs=None)[source]¶
Complex-to-complex inverse Discrete Fourier Transform,
\[X_{k_1, \ldots, k_d} = \frac{1}{\prod_{i=1}^{d} N_i} \sum_{n_1=0}^{N_1-1} \dots \sum_{n_d=0}^{N_d-1} x_{n_1, \ldots, n_d} \exp\left(2 \pi j \left( \sum_{i=1}^{d} \frac{k_i n_i}{N_i} \right) \right),\]
where
\[k_i = 0, \ldots, N_i - 1.\]
This function now supports 1-D, 2-D, and 3-D DFT with or without the leading batch dimension(s).
The input is expected to be complex-valued with at least signal_ndim + 1 dimensions. The last dimension has a shape of two where x[…, 0] is the real part and x[…, 1] the imaginary part.
- Parameters:
- Returns:
IFFT transformed signal.
- Return type:
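The forward and inverse transform definitions, together with the real/imaginary layout of the last dimension, can be checked with a naive 1-D transform in plain Python. This is a direct O(N^2) evaluation of the sums for illustration only; `dft_sketch` is a hypothetical helper and the normalized option is not modeled.

```python
import cmath


def dft_sketch(x, inverse=False):
    """Naive 1-D DFT on a list of [real, imag] pairs, returning the same layout."""
    n = len(x)
    sign = 1.0 if inverse else -1.0  # the inverse transform flips the exponent sign
    out = []
    for k in range(n):
        acc = 0j
        for idx, (re, im) in enumerate(x):
            acc += complex(re, im) * cmath.exp(sign * 2j * cmath.pi * k * idx / n)
        if inverse:
            acc /= n  # the 1 / N factor appears only in the inverse transform
        out.append([acc.real, acc.imag])
    return out


signal = [[1.0, 0.0], [0.0, 2.0], [-1.5, 0.5], [0.25, -0.75]]
spectrum = dft_sketch(signal)
recovered = dft_sketch(spectrum, inverse=True)
impulse_spectrum = dft_sketch([[1.0, 0.0], [0.0, 0.0], [0.0, 0.0], [0.0, 0.0]])
```

Applying the inverse transform to the forward result recovers the original signal up to rounding, and a unit impulse transforms to a spectrum of all ones.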
- nnabla.functions.stft(x, window_size, stride, fft_size, window_type='hanning', center=True, pad_mode='reflect', as_istft_backward=False)[source]¶
Computes the short-time Fourier transform.
- Parameters:
x (Variable) – Time domain sequence of size batch_size x sample_size.
window_size (int) – Size of STFT analysis window.
stride (int) – Number of samples that we shift the window, also called hop size.
fft_size (int) – Size of the FFT, the output will have fft_size // 2 + 1 frequency bins.
window_type (str) – Analysis window, can be either hanning, hamming or rectangular. For convenience, also window_type=None is supported which is equivalent to window_type='rectangular'.
center (bool) – If True, then the signal x is padded by half the FFT size using reflection padding.
pad_mode (str) – Padding mode, which can be 'constant' or 'reflect'. 'constant' pads with 0.
as_istft_backward – If True, then forward execution behaves as backward execution of ISTFT, treating input x as output gradient of ISTFT and outputs y_r and y_i as inputs gradient of ISTFT. This option is only used in nn.grad operator.
- Returns:
Returns real and imaginary parts of STFT result.
- nnabla.functions.istft(y_r, y_i, window_size, stride, fft_size, window_type='hanning', center=True, pad_mode='reflect', as_stft_backward=False)[source]¶
Computes the inverse short-time Fourier transform.
Note: We use a constant square inverse window for the reconstruction of the time-domain signal, therefore, the first and last window_size - stride samples are not perfectly reconstructed.
- Parameters:
y_r (Variable) – Real part of STFT of size batch_size x fft_size//2 + 1 x frame_size.
y_i (Variable) – Imaginary part of STFT of size batch_size x fft_size//2 + 1 x frame_size.
window_size (int) – Size of STFT analysis window.
stride (int) – Number of samples that we shift the window, also called hop size.
fft_size (int) – Size of the FFT (STFT has fft_size // 2 + 1 frequency bins).
window_type (str) – Analysis window, can be either hanning, hamming or rectangular. For convenience, also window_type=None is supported which is equivalent to window_type='rectangular'.
center (bool) – If True, then it is assumed that the time-domain signal has centered frames.
pad_mode (str) – Padding mode corresponding to STFT pad_mode, which can be 'constant' or 'reflect'. 'constant' pads with 0. This option is ignored for the normal use of ISTFT. You need to set the same pad_mode only when as_stft_backward == True.
as_stft_backward (bool) – If True, then forward execution behaves as backward execution of STFT, treating inputs y_r and y_i as outputs gradient of STFT and output x as input gradient of STFT. This option is only used in nn.grad operator.
- Returns:
Time domain sequence of size batch_size x sample_size.
- Return type:
Geometric Neural Network Layers¶
- nnabla.functions.affine_grid(theta, size, align_corners=False, n_outputs=-1, outputs=None)[source]¶
Generate the source grid based on the normalized target grid with size. The target grid is first normalized in [-1, 1], then transformed by the affine transformation \(\theta\) to generate the source grid. 2D and 3D grids are supported now.
This function is normally used with the warp_by_grid function for constructing the spatial transformer.
- Parameters:
theta (Variable) – N-D array with the shape (\(B \times 2 \times 3\)), the sample-wise affine transformation matrix.
size (repeated int64) – The grid size of (\(H \times W\)) for 2D and (\(D \times H \times W\)) for 3D.
align_corners (bool) – If True, the top-left and bottom-right pixels correspond to (-1, -1) and (1, 1) respectively since a pixel is located on the corner of a grid, and the target grid is normalized in [-1, 1]. If False, the normalized target grid in [-1, 1] is scaled by (size - 1) / size according to the respective spatial size (e.g., \(H\) and \(W\)) before the transformation since a pixel is located on a center of a cell in a grid. [default=False]
- Returns:
N-D array with the shape (\(B \times H \times W \times 2\)) for 2D and (\(B \times D \times H \times W \times 3\)) for 3D. The last dimension of 2 is for (x, y) and of 3 for (x, y, z). The grid is used as the source grid for the warping.
- Return type:
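The effect of align_corners on the normalized target grid can be sketched for a single 2D sample. This sketch assumes that the affine matrix is applied to homogeneous coordinates (x, y, 1) and that the scaling for align_corners=False is (size - 1) / size per axis; the helper `affine_grid_2d` and the handling of a size of 1 are assumptions for illustration.

```python
def affine_grid_2d(theta, size, align_corners=False):
    """theta: 2x3 nested list, size: (H, W). Returns an H x W grid of (x, y)."""
    height, width = size

    def axis_coords(n):
        if n == 1:
            return [0.0]  # assumption: a single cell sits at the center
        base = [-1.0 + 2.0 * i / (n - 1) for i in range(n)]  # spans [-1, 1]
        if align_corners:
            return base
        return [c * (n - 1) / n for c in base]  # move to cell centers

    grid = []
    for y in axis_coords(height):
        row = []
        for x in axis_coords(width):
            src_x = theta[0][0] * x + theta[0][1] * y + theta[0][2]
            src_y = theta[1][0] * x + theta[1][1] * y + theta[1][2]
            row.append((src_x, src_y))
        grid.append(row)
    return grid


identity = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]
corner_grid = affine_grid_2d(identity, (2, 3), align_corners=True)
center_grid = affine_grid_2d(identity, (2, 3), align_corners=False)
```

With the identity transformation, align_corners=True places the first and last grid points at -1 and 1, while align_corners=False pulls them inward to the centers of the outermost cells.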
- nnabla.functions.warp_by_grid(x, grid, mode='linear', padding_mode='zero', align_corners=False, channel_last=False, n_outputs=-1, outputs=None)[source]¶
Warp the input data by the grid. This function is normally used with the normalized grid generated by the affine_grid function for constructing the spatial transformer.
- Parameters:
x (Variable) – Input data to be warped with the shape (\(B \times C \times H_{in} \times W_{in}\)) for 2D and (\(B \times C \times D_{in} \times H_{in} \times W_{in}\)) for 3D.
grid (Variable) – Grid warping the input data with the shape (\(B \times H_{out} \times W_{out} \times 2\)) for 2D and (\(B \times D_{out} \times H_{out} \times W_{out} \times 3\)) for 3D. The last dimension of 2 is for (x, y) or 3 for (x, y, z).
mode (string) – Interpolation mode, linear or nearest. [default='linear']
padding_mode (string) – Padding mode when the grid value is outside [-1, 1]. If this is “zero”, 0 is used for padding. “reflect” uses the values reflected at the ends of the original input data like a mirror. “repeat” uses the values at the ends of the original input data. [default='zero']
align_corners (bool) – The target grid normalized in [-1, 1] is scaled by (size - 1) / size according to the respective spatial size (e.g., \(H\) and \(W\)) before the transformation if this is False. If this is True, the top-left and bottom-right pixels correspond to (-1, -1) and (1, 1) respectively. [default=False]
channel_last (bool) – If True, the last dimension is considered as channel dimension, a.k.a NHWC order. [default=False]
- Returns:
Output data warped by the grid.
- Return type:
- nnabla.functions.warp_by_flow(data, flow, n_outputs=-1, outputs=None)[source]¶
Transform the image(s) data by flow field(s) of offset vectors such that each output pixel corresponds to the input image pixel at the relative offset location given by horizontal and vertical flow values (in other words, the flow field describes the coordinate displacements for each output pixel to the corresponding input pixel). Both data and flow are 4-D variables (in “NCHW” layout) with identical shape except the flow channel dimension (which is always 2).
\[output_{n,c,y,x} = data_{n,c,y',x'},\]
where
\[\begin{split}y' &=& y + flow_{n,1,y,x}, \\ x' &=& x + flow_{n,0,y,x}.\end{split}\]
The output pixel values at \(y'\) and \(x'\) locations are obtained by bilinear interpolating between the 4 closest pixels of the input image. Pixel values outside of the input image are implicitly padded with the value of the closest boundary pixel.
- Parameters:
- Returns:
Transformed image data with shape (N, Channels, Height, Width).
- Return type:
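The flow-based lookup with bilinear interpolation and closest-boundary padding can be sketched for one single-channel image stored as nested lists. The helper `warp_by_flow_sketch` is hypothetical and takes the horizontal and vertical flow as two separate 2-D lists, which correspond to flow channels 0 and 1 in the formula above.

```python
import math


def warp_by_flow_sketch(image, flow_x, flow_y):
    """Warp a 2-D list `image` by per-pixel offsets (flow_x, flow_y)."""
    height, width = len(image), len(image[0])

    def pixel(y, x):
        # Coordinates outside the image use the closest boundary pixel.
        y = min(max(y, 0), height - 1)
        x = min(max(x, 0), width - 1)
        return image[y][x]

    out = [[0.0] * width for _ in range(height)]
    for y in range(height):
        for x in range(width):
            sy = y + flow_y[y][x]  # y' = y + flow[1]
            sx = x + flow_x[y][x]  # x' = x + flow[0]
            y0, x0 = math.floor(sy), math.floor(sx)
            fy, fx = sy - y0, sx - x0
            # Bilinear interpolation between the 4 closest input pixels.
            top = (1 - fx) * pixel(y0, x0) + fx * pixel(y0, x0 + 1)
            bottom = (1 - fx) * pixel(y0 + 1, x0) + fx * pixel(y0 + 1, x0 + 1)
            out[y][x] = (1 - fy) * top + fy * bottom
    return out


image = [[0.0, 1.0, 2.0], [3.0, 4.0, 5.0]]
zeros = [[0.0] * 3 for _ in range(2)]
ones = [[1.0] * 3 for _ in range(2)]
halves = [[0.5] * 3 for _ in range(2)]
unchanged = warp_by_flow_sketch(image, zeros, zeros)
shifted = warp_by_flow_sketch(image, ones, zeros)
blended = warp_by_flow_sketch(image, halves, zeros)
```

A zero flow reproduces the input, a horizontal flow of 1 reads each output pixel from its right-hand neighbor (repeating the last column at the border), and a flow of 0.5 averages two neighboring pixels.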
Quantized Neural Network Layers¶
- nnabla.functions.binary_sigmoid(x, n_outputs=-1, outputs=None)[source]¶
Element-wise binary sigmoid function. In the forward pass, it computes
\[\begin{split}f(x) = \begin{cases} 1 & (x > 0) \\ 0 & ({\rm otherwise})\end{cases},\end{split}\]
but in the backward pass, a straight-through approximation of the gradient is used, i.e.,
\[\begin{split}\frac{\partial f(x)}{\partial x} = \begin{cases} 0 & (|x| \geq 1) \\ \frac{1}{2} & ({\rm otherwise}) \end{cases}.\end{split}\]
- nnabla.functions.binary_tanh(x, n_outputs=-1, outputs=None)[source]¶
Element-wise binary tanh function. In the forward pass, it computes
\[\begin{split}f(x) = \begin{cases} 1 & (x > 0) \\ -1 & ({\rm otherwise}) \end{cases},\end{split}\]
but in the backward pass, a straight-through approximation of the gradient is used, i.e.,
\[\begin{split}\frac{\partial f(x)}{\partial x} = \begin{cases} 0 & (|x| \geq 1) \\ 1 & ({\rm otherwise}) \end{cases}.\end{split}\]
References
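The forward and backward rules given for binary_sigmoid and binary_tanh can be written out as scalar functions. The sketch below only restates the two case formulas in plain Python; the function names are hypothetical and it does not call nnabla.

```python
def binary_sigmoid_forward(x):
    # 1 for positive inputs, 0 otherwise.
    return 1.0 if x > 0 else 0.0


def binary_sigmoid_grad(x):
    # Straight-through approximation: 1/2 inside (-1, 1), 0 elsewhere.
    return 0.0 if abs(x) >= 1 else 0.5


def binary_tanh_forward(x):
    # 1 for positive inputs, -1 otherwise.
    return 1.0 if x > 0 else -1.0


def binary_tanh_grad(x):
    # Straight-through approximation: 1 inside (-1, 1), 0 elsewhere.
    return 0.0 if abs(x) >= 1 else 1.0


samples = [-2.0, -0.5, 0.0, 0.5, 2.0]
sigmoid_values = [binary_sigmoid_forward(v) for v in samples]
tanh_values = [binary_tanh_forward(v) for v in samples]
tanh_grads = [binary_tanh_grad(v) for v in samples]
```

Note that the forward outputs are piecewise constant, so their true derivative is zero almost everywhere; the straight-through approximation is what lets a gradient reach the layers below.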
- nnabla.functions.binary_connect_affine(x, weight, binary_weight, bias=None, base_axis=1, quantize_zero_to=1.0, n_outputs=-1, outputs=None)[source]¶
This function provides a BinaryConnect affine layer. It computes in the forward pass
\[y_j = \sum_{i} sign(w_{j,i}) x_i,\]
i.e., the weights \(w_{j,i}\) are binarized to \(sign(w_{j,i})\) and, hence, each weight is in \(\{-1,\,1\}\). With this weight binarization, the inner product computations no longer require any multiplications, as they turn into additions/subtractions.
This function should be used together with batch_normalization().
Note
1) If you would like to share the binary weights with other layers, please use the standard, floating value weights (weight) and not the binary weights (binary_weight).
2) The weights and the binary weights become in sync only after a call to forward(), and not after a call to backward(). If you wish to store the parameters of the network, remember to call forward() once before doing so; otherwise the weights and the binary weights will not be in sync.
3) CPU and GPU implementations now use floating values for binary_weight, since this function is for simulation purposes.
References
- Parameters:
x (Variable) – Input.
weight (Variable) – Weight. [parameter]
binary_weight (Variable) – Binarized weight. [parameter]
bias (Variable) – Bias. [optional][parameter]
base_axis (int) – Dimensions up to base_axis are treated as sample dimensions. [default=1]
quantize_zero_to (float) – Input value at zero is quantized to this value. [default=1.0]
- Returns:
Output.
- Return type:
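To make the "additions/subtractions only" claim concrete, here is a small pure-Python sketch of the BinaryConnect affine forward pass for one sample. It is an illustration, not nnabla code: the helper names are hypothetical, the weight is indexed as `weight[j][i]` to match the formula, and it assumes that quantize_zero_to is the value a zero weight is mapped to and that it is either 1 or -1.

```python
def binarize(w, quantize_zero_to=1.0):
    """Return sign(w); a weight of exactly zero maps to quantize_zero_to (assumed)."""
    if w > 0:
        return 1.0
    if w < 0:
        return -1.0
    return quantize_zero_to


def binary_connect_affine_1d(x, weight, bias=None, quantize_zero_to=1.0):
    """Compute y_j = sum_i sign(w[j][i]) * x[i] without any multiplication."""
    outputs = []
    for j, row in enumerate(weight):
        acc = 0.0 if bias is None else bias[j]
        for w, x_i in zip(row, x):
            # A binarized weight only decides whether x_i is added or subtracted.
            if binarize(w, quantize_zero_to) > 0:
                acc += x_i
            else:
                acc -= x_i
        outputs.append(acc)
    return outputs


x = [1.0, 2.0, 3.0]
weight = [[0.3, -0.7, 0.0],
          [-1.5, -0.1, 2.0]]
y = binary_connect_affine_1d(x, weight)
```

The first output is 1 - 2 + 3 and the second is -1 - 2 + 3; the magnitudes of the floating-point weights play no role in the forward result.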
- nnabla.functions.binary_connect_convolution(x, weight, binary_weight, bias=None, base_axis=1, pad=None, stride=None, dilation=None, group=1, quantize_zero_to=1.0, n_outputs=-1, outputs=None)[source]¶
This function provides a BinaryConnect convolution layer. It computes in the forward pass
\[y_{n, a, b} = \sum_{m} \sum_{i} \sum_{j} sign(w_{n, m, i, j}) x_{m, a + i, b + j},\]
i.e., the weights \(w_{n, m, i, j}\) are binarized to \(sign(w_{n, m, i, j})\) and, hence, each weight is in \(\{-1,\,1\}\). With this weight binarization, the inner product computations no longer require any multiplications, as they turn into additions/subtractions.
This function should be used together with batch_normalization().
Reference
Note
1) If you would like to share the binary weights with other layers, please use the standard, floating value weights (weight) and not the binary weights (binary_weight).
2) The weights and the binary weights become in sync only after a call to forward(), and not after a call to backward(). If you wish to store the parameters of the network, remember to call forward() once before doing so; otherwise the weights and the binary weights will not be in sync.
3) CPU and GPU implementations now use floating values for binary_weight, since this function is for simulation purposes.
- Parameters:
x (Variable) – Input.
weight (Variable) – Weight. [parameter]
binary_weight (Variable) – Binarized weight. [parameter]
bias (Variable) – Bias. [optional][parameter]
base_axis (int) – Dimensions up to base_axis are treated as sample dimensions. [default=1]
pad (tuple of int) – Padding sizes for dimensions. [default=(0,) * (len(x.shape) - (base_axis+1))]
stride (tuple of int) – Stride sizes for dimensions. [default=(1,) * (len(x.shape) - (base_axis+1))]
dilation (tuple of int) – Dilation sizes for dimensions. [default=(1,) * (len(x.shape) - (base_axis+1))]
group (int) – Number of groups of channels. This makes the connection across channels sparser, by grouping connections along the mapping direction. [default=1]
quantize_zero_to (float) – Input value at zero is quantized to this value. [default=1.0]
- Returns:
Output
- Return type:
- nnabla.functions.binary_weight_affine(x, weight, binary_weight, alpha, bias=None, base_axis=1, quantize_zero_to=1.0, n_outputs=-1, outputs=None)[source]¶
This function provides a Binary Weight Network affine layer. It computes in the forward pass
\[y_j = \frac{1}{\|\mathbf{w}_j\|_{\ell_1}} \sum_{i} sign(w_{j,i}) x_i,\]
i.e., the weights \(w_{j,i}\) are binarized to \(sign(w_{j,i})\) and, hence, each weight is in \(\{-1,\,1\}\). With this weight binarization, the inner product computations turn into additions/subtractions, which are followed by multiplication with the scaling factor \(\alpha_j = \frac{1}{\|\mathbf{w}_j\|_{\ell_1}}\).
Reference
Note
1) If you would like to share the binary weights with other layers, please use the standard, floating value weights (weight) and not the binary weights (binary_weight).
2) The weights and the binary weights become in sync only after a call to forward(), and not after a call to backward(). If you wish to store the parameters of the network, remember to call forward() once before doing so; otherwise the weights and the binary weights will not be in sync.
3) CPU and GPU implementations now use floating values for binary_weight, since this function is for simulation purposes.
- Parameters:
x (Variable) – Input.
weight (Variable) – Weight. [parameter]
binary_weight (Variable) – Binarized weight. [parameter]
alpha (Variable) – Alpha. [parameter]
bias (Variable) – Bias. [optional][parameter]
base_axis (int) – Dimensions up to base_axis are treated as sample dimensions. [default=1]
quantize_zero_to (float) – Input value at zero is quantized to this value. [default=1.0]
- Returns:
Output.
- Return type:
- nnabla.functions.binary_weight_convolution(x, weight, binary_weight, alpha, bias=None, base_axis=1, pad=None, stride=None, dilation=None, group=1, quantize_zero_to=1.0, n_outputs=-1, outputs=None)[source]¶
This function provides a Binary Weight Network convolution layer. It computes in the forward pass
\[y_{n, a, b} = \frac{1}{\|\mathbf{w}_n\|_{\ell_1}} \sum_{m} \sum_{i} \sum_{j} sign(w_{n, m, i, j}) x_{m, a + i, b + j},\]
i.e., the weights \(w_{n, m, i, j}\) are binarized to \(sign(w_{n, m, i, j})\) and, hence, each weight is in \(\{-1,\,1\}\). With this weight binarization, the inner product computations turn into additions/subtractions, which are followed by multiplication with the scaling factor \(\alpha_n = \frac{1}{\|\mathbf{w}_n\|_{\ell_1}}\).
Reference
Note
1) If you would like to share the binary weights with other standard layers, please use the standard, floating value weights (weight) and not the binary weights (binary_weight).
2) The weights and the binary weights become in sync only after a call to forward(), and not after a call to backward(). If you wish to store the parameters of the network, remember to call forward() once before doing so; otherwise the weights and the binary weights will not be in sync.
3) CPU and GPU implementations now use floating values for binary_weight, since this function is for simulation purposes.
- Parameters:
x (Variable) – Input.
weight (Variable) – Weight. [parameter]
binary_weight (Variable) – Binarized weight. [parameter]
alpha (Variable) – Alpha. [parameter]
bias (Variable) – Bias. [optional][parameter]
base_axis (int) – Dimensions up to base_axis are treated as sample dimensions. [default=1]
pad (tuple of int) – Padding sizes for dimensions. [default=(0,) * (len(x.shape) - (base_axis+1))]
stride (tuple of int) – Stride sizes for dimensions. [default=(1,) * (len(x.shape) - (base_axis+1))]
dilation (tuple of int) – Dilation sizes for dimensions. [default=(1,) * (len(x.shape) - (base_axis+1))]
group (int) – Number of groups of channels. This makes the connection across channels sparser, by grouping connections along the mapping direction. [default=1]
quantize_zero_to (float) – Input value at zero is quantized to this value. [default=1.0]
- Returns:
Output
- Return type:
- nnabla.functions.fixed_point_quantize(x, sign=True, n=8, delta=0.0625, quantize=True, ste_fine_grained=True, outputs=None)[source]¶
Fixed Point Quantize.
This function simulates uniform quantization of values in a fixed-point number representation.
- Parameters:
x (Variable) – An input variable.
sign (bool) – Whether to use a signed (true) or unsigned (false) number representation. Default is true.
n (int) – Bit width used. Note that sign consumes one bit. \(n-1\) bits are used for the number representation in the signed case.
delta (float) – Step size.
quantize (bool) – If true, quantize input, otherwise not.
ste_fine_grained (bool) – If true, STE is not 1: the gradient is set to 0 for inputs outside the range \([min, max]\).
- Returns:
N-D array.
- Return type:
See also
nnabla.function_bases.fixed_point_quantize.
In the forward pass,
\[\begin{split}q_i= \left\{ \begin{array}{ll} max & if \ \ \ x_i > max \\ sign(x_i) \times floor(|x_i| \delta^{-1} + 2^{-1}) \times \delta & if \ \ min \le x_i \le max \\ min & if \ \ x_i < min \\ \end{array} \right.,\end{split}\]
where \(\delta\) is the step size, \((min, max) :=(- (2^{n-1} - 1)\delta, (2^{n-1} - 1)\delta)\) if \(sign\) is true, \((min, max) := (0, (2^n - 1) \delta)\) otherwise, and \(n\) is the total bit-width used.
In the backward pass when using ste_fine_grained as false,
\[\frac{\partial q_i}{\partial x_i} = 1.\]
In the backward pass when using ste_fine_grained as true,
\[\begin{split}\frac{\partial q_i}{\partial x_i}= \left\{ \begin{array}{ll} 0 & if \ \ \ x_i > max \\ 1 & if \ \ min \le x_i \le max \\ 0 & if \ \ x_i < min \\ \end{array} \right..\end{split}\]
Note
Quantized values are stored as floating-point numbers, since this function is for simulation purposes.
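As a worked illustration of the fixed-point formulas, the following pure-Python sketch evaluates the forward rule and both backward variants for a scalar. The helper names are hypothetical and the code is a direct transcription of the equations, not the nnabla implementation.

```python
import math


def fixed_point_range(sign=True, n=8, delta=0.0625):
    """Return (min, max) of the representable range for the given bit width."""
    if sign:
        q_max = (2 ** (n - 1) - 1) * delta
        return -q_max, q_max
    return 0.0, (2 ** n - 1) * delta


def fixed_point_quantize(x, sign=True, n=8, delta=0.0625):
    """Forward pass: clip to [min, max], otherwise round |x| to a multiple of delta."""
    q_min, q_max = fixed_point_range(sign, n, delta)
    if x > q_max:
        return q_max
    if x < q_min:
        return q_min
    return math.copysign(1.0, x) * math.floor(abs(x) / delta + 0.5) * delta


def fixed_point_quantize_grad(x, sign=True, n=8, delta=0.0625,
                              ste_fine_grained=True):
    """Backward pass: straight-through estimate of dq/dx."""
    if not ste_fine_grained:
        return 1.0
    q_min, q_max = fixed_point_range(sign, n, delta)
    return 1.0 if q_min <= x <= q_max else 0.0


in_range = fixed_point_quantize(0.1)      # rounds to 2 * delta
saturated = fixed_point_quantize(10.0)    # clipped to (2**7 - 1) * delta
```

With the default n=8 and delta=0.0625, the signed range is [-7.9375, 7.9375], so 0.1 is rounded to 0.125 and 10.0 saturates at 7.9375.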
- nnabla.functions.min_max_quantize(x, qr_min, qr_max, ql_min, ql_max, decay=0.999, x_min_max=False, ema=False, ste_fine_grained=True, eps=0.01, quantize=True, outputs=None)[source]¶
Min-max quantization.
This function simulates uniform quantization of values in a fixed-point number representation.
Min-max quantization is defined by the following equation
\[y = round \left(\frac{\min(\max(x, m), M) - m}{scale} \right) \times scale + m,\]
where the \(scale\) is defined as
\[scale = \frac{M - m}{M_q - m_q},\]
and
\[\begin{split}m_q = ql_{min}, \\ M_q = ql_{max}, \\ m = qr_{min}, \\ M = qr_{max}.\end{split}\]
In the backward pass when using ste_fine_grained as false,
\[\frac{\partial q_i}{\partial x_i} = 1.\]
In the backward pass when using ste_fine_grained as true,
\[\begin{split} \frac{\partial q_i}{\partial x_i}= \left\{ \begin{array}{ll} 0 & if \ \ \ x_i > M \\ 1 & if \ \ m \le x_i \le M \\ 0 & if \ \ x_i < m \\ \end{array} \right..\end{split}\]
\(qr_{min}\) and \(qr_{max}\) are treated as follows.
x_min_max is True and ema is True: Exponential moving averages are computed for each \(min(x)\) and \(max(x)\), then stored in \(qr_{min}\) and \(qr_{max}\).
x_min_max is True and ema is False: \(min(x)\) and \(max(x)\) are computed, then stored in \(qr_{min}\) and \(qr_{max}\).
x_min_max is False and ema is True: Exponential moving averages stored in \(qr_{min}\) and \(qr_{max}\) are used.
x_min_max is False and ema is False: Gradients of \(qr_{min}\) and \(qr_{max}\) are computed in the backward pass.
More precisely, at inference time min-max quantization has to account for the zero-point (zp), the integer quantization level that corresponds to the real value 0. The zero-point is defined as
\[\begin{split}zp_f &= ql_{min} -\frac{qr_{min}}{scale}, \\ zp &= \left\{ \begin{array}{ll} ql_{max} & if \ \ \ zp_f \ge ql_{max} \\ round(zp_f) & otherwise \\ ql_{min} & if \ \ zp_f \le ql_{min} \\ \end{array} \right..\end{split}\]
Accordingly, in order to simulate the quantization effect of the zero-point, \(qr_{min}\) and \(qr_{max}\) are adjusted during both the forward and backward pass as follows,
\[\begin{split}qr_{min}^{adj} = (ql_{min} - zp) \times scale, \\ qr_{max}^{adj} = (ql_{max} - zp) \times scale.\end{split}\]
This adjustment is often called nudging.
Finally, in the formulas of the min-max quantization, \(m\) and \(M\) are replaced by \(qr_{min}^{adj}\) and \(qr_{max}^{adj}\) respectively.
- Parameters:
x (Variable) – Input N-D array.
qr_min (Variable) – Minimum quantization range (modified during forward execution).
qr_max (Variable) – Maximum quantization range (modified during forward execution).
ql_min (Variable) – Minimum quantization level, typically 0.
ql_max (Variable) – Maximum quantization level, typically 255.
decay (float) – The decay rate for the exponential moving average.
x_min_max (bool) – Use the min and max of x to compute quantization ranges. Default is False.
ema (bool) – Use the exponential moving average for the min and max quantization ranges. Default is False.
ste_fine_grained (bool) – If True, STE is not 1; the {0, 1}-mask computed from the min-max is applied to the gradient in the backward pass. Otherwise, STE is 1.
eps (float) – Epsilon, a small value that ensures \(qr_{max} - qr_{min}\) is greater than epsilon.
quantize (bool) – Apply quantization or not.
References
Benoit Jacob, Skirmantas Kligys, Bo Chen, Menglong Zhu, Matthew Tang, Andrew Howard, Hartwig Adam, and Dmitry Kalenichenko, “Quantization and Training of Neural Networks for Efficient Integer-Arithmetic-Only Inference”, https://arxiv.org/abs/1712.05877
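The nudging step and the quantization formula of min_max_quantize can be traced with a small scalar sketch. It is a pure-Python illustration under stated assumptions, not the nnabla implementation: the function names are hypothetical, `round` is taken to be round-half-up, the adjusted range is read as \((ql - zp) \times scale\), and the eps, ema and x_min_max options are not modelled.

```python
import math


def round_half_up(value):
    # Assumed rounding mode; the text above only says "round".
    return math.floor(value + 0.5)


def nudge_range(qr_min, qr_max, ql_min, ql_max):
    """Return (adjusted qr_min, adjusted qr_max, scale, zero-point)."""
    scale = (qr_max - qr_min) / (ql_max - ql_min)
    zp_f = ql_min - qr_min / scale
    if zp_f >= ql_max:
        zp = ql_max
    elif zp_f <= ql_min:
        zp = ql_min
    else:
        zp = round_half_up(zp_f)
    # After nudging, the real value 0 falls exactly on the integer level zp.
    return (ql_min - zp) * scale, (ql_max - zp) * scale, scale, zp


def min_max_quantize(x, qr_min, qr_max, ql_min, ql_max):
    """Quantize a scalar using the nudged range in place of m and M."""
    m, big_m, scale, _ = nudge_range(qr_min, qr_max, ql_min, ql_max)
    clipped = min(max(x, m), big_m)
    return round_half_up((clipped - m) / scale) * scale + m


# Five levels (0..4) over the real range [-0.75, 1.25]: scale is 0.5 and
# zp_f is 1.5, so the zero-point is rounded to 2 and the range becomes [-1, 1].
nudged = nudge_range(-0.75, 1.25, 0, 4)
quantized = min_max_quantize(0.3, -0.75, 1.25, 0, 4)
```

In this example the original range cannot represent 0 exactly (its levels are -0.75, -0.25, 0.25, ...), whereas the nudged range [-1, 1] with step 0.5 can.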
- nnabla.functions.pow2_quantize(x, sign=True, with_zero=True, n=8, m=1, quantize=True, ste_fine_grained=True, outputs=None)[source]¶
Pow2 Quantize.
This function simulates uniform quantization of values in a fixed-point number representation.
- Parameters:
x (Variable) – An input variable.
sign (bool) – Whether to use a signed (true) or unsigned (false) number representation. Default is true.
with_zero (bool) – Whether zero is used as a quantized value. Default is true. Note that zero consumes one bit.
n (int) – Bit width used. Note that sign consumes one bit. \(n-1\) bits are used for the number representation in the signed case. Default is 8.
m (int) – \(2^m\) is the upper bound of the dynamic range and \(-2^m\) is the lower bound, \(m \in \mathcal{Z}\). Default is 1.
quantize (bool) – If true, quantize input, otherwise not.
ste_fine_grained (bool) – If true, STE is not 1.
- Returns:
N-D array.
- Return type:
See also
nnabla.function_bases.pow2_quantize.
In the forward pass of the signed case,
\[\begin{split}q_i= \left\{ \begin{array}{ll} max_{+} & if \ \ \overline{q_i} > max_{+} \\ \overline{q_i} & if \ \ min_{+} \le \overline{q_i} \le max_{+} \\ min_{+} & if \ \ 0 \le \overline{q_i} < min_{+} \\ min_{-} & if \ \ min_{-} < \overline{q_i} < 0 \\ \overline{q_i} & if \ \ max_{-} \le \overline{q_i} \le min_{-}\\ max_{-} & if \ \ \overline{q_i} < max_{-} \\ \end{array} \right.,\end{split}\]
where
\[\begin{split}&& max_{+} = 2^{m}, min_{+} = 2^{m - (2^{n-1} - 1)},\\ && max_{-} = -2^{m}, min_{-} = -2^{m - (2^{n-1} - 1)},\\ &