Neural Network Libraries¶
Neural Network Libraries is a deep learning framework intended for research, development, and production. We aim to make it run everywhere: desktop PCs, HPC clusters, embedded devices, and production servers.
This document describes how to use the Python API and C++ API, the contribution guide for developers, and the license terms of this software. The Python API is more suitable for fast prototyping and experimentation of deep learning systems, while the C++ API is for deploying inference or training algorithms into embedded systems and servers (the C++ API documentation is not available yet; we will make it available soon). The framework is designed with modularity and extensibility in mind. Community contributors can add a new operator or optimizer module for neural networks, or a specialized implementation of neural network modules for a specific target device, as an extension.
Python Package¶
The Python API, built on top of our C++11 core, maximizes the flexibility of neural network design and encourages fast prototyping and experimentation. NNabla works on Python>=3.5 (>=3.6 is recommended).
Python Package Installation¶
There are three ways to install the NNabla Python package.
Install with pip command¶
The NNabla Python packages are hosted on PyPI for many platforms. For people who are familiar with Python and its package management system pip
(and optionally, but recommended, CUDA), the following pip installation guide should be sufficient to install NNabla Python. For a more detailed OS-specific setup guide, go to the next section.
NNabla package installation using PIP¶
Note: please refer to the OS specific workflows for the OS specific dependencies setup.
Install NNabla package via pip:
pip install nnabla
Note: If you want to make sure the latest version will be installed, try to uninstall any previously installed version with pip uninstall -y nnabla beforehand.
Then, check if it works by running:
python -c "import nnabla"
2018-06-26 15:20:16,759 [nnabla][INFO]: Initializing CPU extension...
NNabla CUDA extension package installation¶
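Most features of NNabla can run on CUDA GPUs by additionally installing a CUDA extension package from PyPI. As a minimal sketch (the exact package name depends on your CUDA version; nnabla-ext-cuda100 for CUDA 10.0 is an assumption here, so check the CUDA extension installation documentation for the name matching your setup):
pip install nnabla-ext-cuda100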
Run an Example¶
Get the examples (and unzip them) or clone the NNabla Examples repository, and go to the MNIST folder.
cd nnabla-examples/mnist-collection/
Run MNIST classification.
python classification.py
Run MNIST classification with CUDA/cuDNN.
python classification.py -c cudnn
OS specific workflows¶
Installation on Linux¶
This installation instruction describes how to install NNabla using pip on almost any 64-bit Linux system.
The supported Python versions for the provided binary packages are 3.5 (not recommended), 3.6, and 3.7. It is recommended to use Miniconda as a Python distribution. The following is a simple procedure to install Miniconda Python.
wget https://repo.continuum.io/miniconda/Miniconda3-latest-Linux-x86_64.sh
bash Miniconda3-latest-Linux-x86_64.sh -b -p {installation path e.g. ~/miniconda}
# You have to set an environment variable PATH accordingly
# to enable the installed ``Python`` and the ``conda`` system.
echo 'export PATH=<installation path>/bin:$PATH' >> ~/.bashrc
# Restart your bash or source ~/.bashrc
# Switch the default Python version
conda install -y python={version number e.g. 3.6}
We have also tested other Linux distributions and versions (Ubuntu 14.04; CentOS 6.9 and 7.3; Fedora 23, 25, and 26; and RHEL 7.3) on various environments (bare-metal servers, AWS instances, and Docker machines), so you can install NNabla in almost the same way as described here. The details of how to install on each are coming soon.
Installation on Windows¶
We tested on Windows 8.1 64-bit and Windows 10 64-bit.
The following software is required for installation:
Required software.
Python>=3.6: PIP
Microsoft Visual C++ 2015 Redistributable
Recommended.
CUDA Toolkit and cuDNN (if you have CUDA GPUs).
In this instruction, we use Miniconda.
Get and install the Windows binary from here.
Then install the required packages from the command prompt.
> conda install scipy scikit-image ipython
If your network uses a proxy and the setup fails, configure the proxy server with environment variables and try installing again.
> SET HTTP_PROXY=http://(enter the address of the http proxy server here)
> SET HTTPS_PROXY=https://(enter the address of the https proxy server here)
If you are using an NVIDIA GPU, execution speed will be drastically improved by installing the following software.
CUDA Toolkit: get and install it from here.
To install cuDNN, copy the bin, include and lib directories to C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v{CUDA_VERSION}
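For instance, if the downloaded cuDNN archive has been extracted into a local cuda folder, the copy can be done from a command prompt as below (a sketch assuming CUDA v10.0; adjust the version directory to your installation):
> xcopy cuda\bin "C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v10.0\bin" /e
> xcopy cuda\include "C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v10.0\include" /e
> xcopy cuda\lib "C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v10.0\lib" /e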
See a list of compatible cuDNN versions of CUDA extension packages.
Depending on the environment, installation may take a long time. Please wait.
Please install scipy using “conda install” before “pip install nnabla”.
Installation on macOS¶
NOTE: Our testing coverage in terms of environments and machines on macOS is very limited. Please submit an issue if you face any problems.
We tested the installation on macOS Sierra.
The following software is required for installation:
Python>=3.6 (we recommend setting up Python using Anaconda or Miniconda).
pip (bundled in Conda Python)
wheel (bundled in Conda Python)
setuptools (bundled in Conda Python; you may need to upgrade the version of setuptools with pip install -U --no-deps setuptools).
See NNabla package installation using PIP (note that the binary packages for the CUDA extension are not available for macOS. Please build it from source).
Install NNabla package compatible with Multi-GPU execution¶
To enable multi-GPU execution, such as distributed training on NNabla, you have to install a special edition of the NNabla package. See Installation with Multi-GPU supported for installation instructions.
Install from source¶
Documentation of build from source has been moved to Github repository (build or build_distributed).
Running on Docker¶
Docker images¶
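Docker images let you try NNabla without installing anything on the host except Docker itself. As a sketch (the image name nnabla/nnabla:latest is an assumption here; check Docker Hub or the NNabla repository for the images and tags actually provided):
docker run -it --rm nnabla/nnabla:latest python -c "import nnabla"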
Python API Tutorial¶
The following tutorial documents are automatically generated from the Jupyter notebook files listed in NNabla Tutorial. If you want to run these step-by-step, follow the link and see the instructions found there.
NNabla by Examples¶
This tutorial demonstrates how you can write a script to train a neural network, using a simple handwritten digit classification task.
Note: This tutorial notebook requires scikit-learn and matplotlib installed in your Python environment.
First let us prepare some dependencies.
import nnabla as nn
import nnabla.functions as F
import nnabla.parametric_functions as PF
import nnabla.solvers as S
from nnabla.monitor import tile_images
import numpy as np
import matplotlib.pyplot as plt
import tiny_digits
%matplotlib inline
np.random.seed(0)
imshow_opt = dict(cmap='gray', interpolation='nearest')
2017-06-26 23:09:49,971 [nnabla][INFO]: Initializing CPU extension...
The tiny_digits module is located under this folder. It provides some utilities for loading a handwritten-digit classification dataset (MNIST) available in scikit-learn.
Logistic Regression¶
We will first start by defining a computation graph for logistic regression. (For details on logistic regression, see Appendix A.)
The training will be done by gradient descent, where gradients are calculated using the error backpropagation algorithm (backprop).
Preparing a Toy Dataset¶
This section just prepares a dataset to be used for demonstration of NNabla usage.
digits = tiny_digits.load_digits(n_class=10)
tiny_digits.plot_stats(digits)
Num images: 1797
Image shape: (8, 8)
Labels: [0 1 2 3 4 5 6 7 8 9]

The next block creates a dataset loader, which is a generator providing images and labels as minibatches. Note that this dataset is just for example purposes and is not a part of NNabla.
data = tiny_digits.data_iterator_tiny_digits(digits, batch_size=64, shuffle=True)
2017-06-26 23:09:50,545 [nnabla][INFO]: DataSource with shuffle(True)
2017-06-26 23:09:50,546 [nnabla][INFO]: Using DataSourceWithMemoryCache
2017-06-26 23:09:50,546 [nnabla][INFO]: DataSource with shuffle(True)
2017-06-26 23:09:50,547 [nnabla][INFO]: On-memory
2017-06-26 23:09:50,547 [nnabla][INFO]: Using DataIterator
A minibatch is as follows. img and label are numpy.ndarray.
img, label = data.next()
plt.imshow(tile_images(img), **imshow_opt)
print("labels: {}".format(label.reshape(8, 8)))
print("Label shape: {}".format(label.shape))
labels: [[ 2. 8. 2. 6. 6. 7. 1. 9.]
[ 8. 5. 2. 8. 6. 6. 6. 6.]
[ 1. 0. 5. 8. 8. 7. 8. 4.]
[ 7. 5. 4. 9. 2. 9. 4. 7.]
[ 6. 8. 9. 4. 3. 1. 0. 1.]
[ 8. 6. 7. 7. 1. 0. 7. 6.]
[ 2. 1. 9. 6. 7. 9. 0. 0.]
[ 5. 1. 6. 3. 0. 2. 3. 4.]]
Label shape: (64, 1)

Preparing the Computation Graph¶
NNabla provides two different ways for backprop-based gradient descent optimization. One is with a static graph, and the other is with a dynamic graph. We are going to show the static version first.
# Forward pass
x = nn.Variable(img.shape) # Define an image variable
with nn.parameter_scope("affine1"):
y = PF.affine(x, 10) # Output is 10 class
This code block shows one of the most important features in graph building in NNabla, the parameter scope. The first line defines an input variable x. The second line creates a parameter scope. The third line then applies PF.affine - an affine transform - to x, and creates a variable y holding that result. Here, the PF (parametric_functions) module provides functions that contain learnable parameters, such as affine transforms (which contain weights), convolution (which contains kernels) and batch normalization (which contains transformation factors and coefficients). We call these functions parametric functions. The parameters are created and initialized randomly at function call, and registered under the name "affine1" using the parameter_scope context.
# Building a loss graph
t = nn.Variable(label.shape) # Define a target variable
loss = F.mean(F.softmax_cross_entropy(y, t)) # Softmax Xentropy fits multi-class classification problems
The lines above define a target variable and attach a loss function at the end of the graph. Note that building a static graph doesn't execute any computation, but the shapes of the output variables are inferred. Therefore, we can inspect the shapes of each variable at this point:
print("Printing shapes of variables")
print(x.shape)
print(y.shape)
print(t.shape)
print(loss.shape) # empty tuple means scalar
Printing shapes of variables
(64, 1, 8, 8)
(64, 10)
(64, 1)
()
Executing a static graph¶
You can execute the computation of the graph by calling the forward() method on a sink variable. Inputs can be set via the .d accessor. It will borrow CPU array references as numpy.ndarray.
# Set data
x.d = img
t.d = label
# Execute a forward pass
loss.forward()
# Showing results
print("Prediction score of 0-th image: {}".format(y.d[0]))
print("Loss: {}".format(loss.d))
Prediction score of 0-th image: [ 9.75851917 6.49118519 16.47323608 -1.36296904 -0.78583491
4.08872032 7.84134388 2.42956853 3.31485462 3.61868763]
Loss: 10.6016616821
The output doesn’t make sense since the network is just randomly initialized.
Backward propagation through the graph¶
The parameters registered by the parameter_scope management function can be queried by get_parameters() as a dict.
print(nn.get_parameters())
OrderedDict([('affine1/affine/W', <Variable((64, 10), need_grad=True) at 0x7fa0ba361d50>), ('affine1/affine/b', <Variable((10,), need_grad=True) at 0x7fa0ba361ce8>)])
Before executing backpropagation, we should initialize the gradient buffers of all parameters to zero.
for param in nn.get_parameters().values():
    param.grad.zero()
Then, you can execute backprop by calling the backward() method at the sink variable.
# Compute backward
loss.backward()
# Showing gradients.
for name, param in nn.get_parameters().items():
    print(name, param.shape, param.g.flat[:20]) # Showing first 20.
affine1/affine/W (64, 10) [ 0.00000000e+00 0.00000000e+00 0.00000000e+00 0.00000000e+00
0.00000000e+00 0.00000000e+00 0.00000000e+00 0.00000000e+00
0.00000000e+00 0.00000000e+00 4.98418584e-02 8.72317329e-03
-4.06671129e-02 -4.68742661e-02 2.52632981e-09 7.86017510e-04
9.06870365e-02 -1.56249944e-02 -1.56217301e-02 -3.12499963e-02]
affine1/affine/b (10,) [ 0.42710391 -0.01852455 0.07369987 -0.04687012 -0.07798236 -0.03664626
0.01651323 -0.1249291 -0.11862005 -0.09374455]
The gradient is stored in the grad field of Variable. The .g accessor can be used to access the grad data in numpy.ndarray format.
Optimizing parameters (=Training)¶
To optimize parameters, we provide the solver module (aliased as S here). The solver module contains a number of optimizer implementations such as SGD, SGD with momentum, and Adam. The block below creates an SGD solver and sets the parameters of logistic regression to it.
# Create a solver (gradient-based optimizer)
learning_rate = 1e-3
solver = S.Sgd(learning_rate)
solver.set_parameters(nn.get_parameters()) # Set parameter variables to be updated.
In the next block, we demonstrate a single step of the optimization loop. The solver.zero_grad() line is equivalent to calling .grad.zero() for all parameters, as shown above. After the backward computation, we apply weight decay, then apply gradient descent implemented in the Sgd solver class as follows
\(\theta \leftarrow \theta - \eta \nabla_{\theta} L\)
where \(\eta\) denotes the learning rate.
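Conceptually, the weight decay and the SGD update above correspond to the following imperative sketch (an illustration of the update rule, not NNabla's actual implementation):
wd = 1e-5
for param in nn.get_parameters().values():
    param.g += wd * param.d # weight decay: grad += wd * theta
    param.d -= learning_rate * param.g # vanilla SGD: theta <- theta - eta * grad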
# One step of training
x.d, t.d = data.next()
loss.forward()
solver.zero_grad() # Initialize gradients of all parameters to zero.
loss.backward()
solver.weight_decay(1e-5) # Apply weight decay as regularization
solver.update()
print(loss.d)
12.9438686371
The next block iterates optimization steps, and shows that the loss decreases.
for i in range(1000):
    x.d, t.d = data.next()
    loss.forward()
    solver.zero_grad() # Initialize gradients of all parameters to zero.
    loss.backward()
    solver.weight_decay(1e-5) # Apply weight decay as regularization
    solver.update()
    if i % 100 == 0: # Print every 100 iterations
        print(i, loss.d)
0 12.6905069351
100 3.17041015625
200 1.60036706924
300 0.673069953918
400 0.951370298862
500 0.724424362183
600 0.361597299576
700 0.588107347488
800 0.28792989254
900 0.415006935596
Show prediction¶
The following code displays training results.
x.d, t.d = data.next() # Here we predict images from training set although it's useless.
y.forward() # You can execute a sub graph.
plt.imshow(tile_images(x.d), **imshow_opt)
print("prediction:")
print(y.d.argmax(axis=1).reshape(8, 8)) # Taking a class index based on prediction score.
prediction:
[[5 0 1 9 0 1 3 3]
[2 4 1 7 4 5 6 5]
[7 7 9 7 9 0 7 3]
[5 3 7 6 6 8 0 9]
[0 1 3 5 5 5 4 9]
[1 0 0 8 5 1 8 8]
[7 5 0 7 6 9 0 0]
[0 6 2 6 4 4 2 6]]

Dynamic graph construction support¶
This is another way of running a computation graph in NNabla. This example doesn't show how useful a dynamic graph is, but gives a bit of its flavor.
The next block just defines the graph-building steps as functions for later use.
def logreg_forward(x):
    with nn.parameter_scope("affine1"):
        y = PF.affine(x, 10)
    return y

def logreg_loss(y, t):
    loss = F.mean(F.softmax_cross_entropy(y, t)) # Softmax Xentropy fits multi-class classification problems
    return loss
To run a computation graph dynamically during creation, you use the nnabla.auto_forward() context, as you see in the block below. With this, computation is fired immediately as functions are called. (You can also use nnabla.set_auto_forward(auto) to set the auto-forward state globally.)
x = nn.Variable(img.shape)
t = nn.Variable(label.shape)
x.d, t.d = data.next()
with nn.auto_forward(): # The graph is executed immediately
    y = logreg_forward(x)
    loss = logreg_loss(y, t)
print("Loss: {}".format(loss.d))
plt.imshow(tile_images(x.d), **imshow_opt)
print("prediction:")
print(y.d.argmax(axis=1).reshape(8, 8))
Loss: 0.43071603775
prediction:
[[9 3 5 0 1 9 9 2]
[5 6 6 2 7 5 1 1]
[3 7 7 6 0 8 3 8]
[0 6 4 6 0 6 9 9]
[6 1 2 5 8 3 2 4]
[1 4 4 0 5 7 1 7]
[7 8 9 5 8 3 7 8]
[5 7 5 3 3 0 0 7]]

Backward computation can be done on a dynamically constructed graph.
solver.zero_grad()
loss.backward()
Multi-Layer Perceptron (MLP)¶
In this section, you will see an example of MLP graph building and training.
Before starting, we clear all parameters registered in the logistic regression example.
nn.clear_parameters() # Clear all parameters
Here is the function that builds an MLP with an arbitrary depth and width for 10-class classification.
def mlp(x, hidden=[16, 32, 16]):
    hs = []
    with nn.parameter_scope("mlp"): # Parameter scope can be nested
        h = x
        for hid, hsize in enumerate(hidden):
            with nn.parameter_scope("affine{}".format(hid + 1)):
                h = F.tanh(PF.affine(h, hsize))
            hs.append(h)
        with nn.parameter_scope("classifier"):
            y = PF.affine(h, 10)
    return y, hs
# Construct a MLP graph
y, hs = mlp(x)
print("Printing shapes")
print("x: {}".format(x.shape))
for i, h in enumerate(hs):
    print("h{}:".format(i + 1), h.shape)
print("y: {}".format(y.shape))
Printing shapes
x: (64, 1, 8, 8)
h1: (64, 16)
h2: (64, 32)
h3: (64, 16)
y: (64, 10)
# Training
loss = logreg_loss(y, t) # Reuse logreg loss function.
# Copied from the above logreg example.
def training(steps, learning_rate):
    solver = S.Sgd(learning_rate)
    solver.set_parameters(nn.get_parameters()) # Set parameter variables to be updated.
    for i in range(steps):
        x.d, t.d = data.next()
        loss.forward()
        solver.zero_grad() # Initialize gradients of all parameters to zero.
        loss.backward()
        solver.weight_decay(1e-5) # Apply weight decay as regularization
        solver.update()
        if i % 100 == 0: # Print every 100 iterations
            print(i, loss.d)
# Training
training(1000, 1e-2)
0 2.42193937302
100 1.83251476288
200 1.49943637848
300 1.30751883984
400 1.00974023342
500 0.904026031494
600 0.873289525509
700 0.725554704666
800 0.614291608334
900 0.555113613605
# Showing responses for each layer
num_plot = len(hs) + 2
gid = 1
def scale01(h):
    return (h - h.min()) / (h.max() - h.min())

def imshow(img, title):
    global gid
    plt.subplot(num_plot, 1, gid)
    gid += 1
    plt.title(title)
    plt.imshow(img, **imshow_opt)
    plt.axis('off')
plt.figure(figsize=(2, 5))
imshow(x.d[0, 0], 'x')
for hid, h in enumerate(hs):
    imshow(scale01(h.d[0]).reshape(-1, 8), 'h{}'.format(hid + 1))
imshow(scale01(y.d[0]).reshape(2, 5), 'y')

Convolutional Neural Network with CUDA acceleration¶
Here we demonstrate a CNN with CUDA GPU acceleration.
nn.clear_parameters()
def cnn(x):
    with nn.parameter_scope("cnn"): # Parameter scope can be nested
        with nn.parameter_scope("conv1"):
            c1 = F.tanh(PF.batch_normalization(
                PF.convolution(x, 4, (3, 3), pad=(1, 1), stride=(2, 2))))
        with nn.parameter_scope("conv2"):
            c2 = F.tanh(PF.batch_normalization(
                PF.convolution(c1, 8, (3, 3), pad=(1, 1))))
            c2 = F.average_pooling(c2, (2, 2))
        with nn.parameter_scope("fc3"):
            fc3 = F.tanh(PF.affine(c2, 32))
        with nn.parameter_scope("classifier"):
            y = PF.affine(fc3, 10)
    return y, [c1, c2, fc3]
To enable the CUDA extension in NNabla, you have to install the nnabla-ext-cuda package first. See the install guide.
After installing the CUDA extension, you can easily switch to running on CUDA by specifying a context before building a graph. We strongly recommend using a cuDNN context, which is fast. Although the context class can be instantiated by nn.Context(), specifying a context descriptor might be a bit complicated for users. Therefore, we recommend creating a context using the helper function get_extension_context() found in the nnabla.ext_utils module. NNabla officially supports cpu and cudnn as context specifiers passed to the first argument (extension name). NOTE: By setting the cudnn context as a global default context, Functions and solvers are created and instantiated with cuDNN (preferred) mode. You can also specify a context using with nn.context_scope(). See the API reference for details.
# Run on CUDA
from nnabla.ext_utils import get_extension_context
cuda_device_id = 0
ctx = get_extension_context('cudnn', device_id=cuda_device_id)
print("Context: {}".format(ctx))
nn.set_default_context(ctx) # Set CUDA as a default context.
y, hs = cnn(x)
loss = logreg_loss(y, t)
2017-06-26 23:09:54,555 [nnabla][INFO]: Initializing CUDA extension...
2017-06-26 23:09:54,731 [nnabla][INFO]: Initializing cuDNN extension...
Context: Context(backend='cpu|cuda', array_class='CudaCachedArray', device_id='0', compute_backend='default|cudnn')
training(1000, 1e-1)
0 2.34862923622
100 1.00527024269
200 0.416576713324
300 0.240603536367
400 0.254562884569
500 0.206138283014
600 0.220851421356
700 0.161689639091
800 0.230873346329
900 0.121101222932
# Showing responses for each layer
num_plot = len(hs) + 2
gid = 1
plt.figure(figsize=(2, 8))
imshow(x.d[0, 0], 'x')
imshow(tile_images(hs[0].d[0][:, None]), 'conv1')
imshow(tile_images(hs[1].d[0][:, None]), 'conv2')
imshow(hs[2].d[0].reshape(-1, 8), 'fc3')
imshow(scale01(y.d[0]).reshape(2, 5), 'y')

nn.save_parameters writes the parameters registered in the parameter_scope system in HDF5 format. We use it in a later example.
path_cnn_params = "tmp.params.cnn.h5"
nn.save_parameters(path_cnn_params)
2017-06-26 23:09:56,132 [nnabla][INFO]: Parameter save (hdf5): tmp.params.cnn.h5
Recurrent Neural Network (Elman RNN)¶
This is an example of recurrent neural network training.
nn.clear_parameters()
def rnn(xs, h0, hidden=32):
    hs = []
    with nn.parameter_scope("rnn"):
        h = h0
        # Time step loop
        for x in xs:
            # Note: Parameter scopes are reused over time
            # which means parameters are shared over time.
            with nn.parameter_scope("x2h"):
                x2h = PF.affine(x, hidden, with_bias=False)
            with nn.parameter_scope("h2h"):
                h2h = PF.affine(h, hidden)
            h = F.tanh(x2h + h2h)
            hs.append(h)
        with nn.parameter_scope("classifier"):
            y = PF.affine(h, 10)
    return y, hs
This is not meaningful as a task, but serves demonstration purposes. We split an image into a 2 by 2 grid of patches, and feed them sequentially into the RNN.
def split_grid4(x):
    x0 = x[..., :4, :4]
    x1 = x[..., :4, 4:]
    x2 = x[..., 4:, :4]
    x3 = x[..., 4:, 4:]
    return x0, x1, x2, x3
hidden = 32
seq_img = split_grid4(img)
seq_x = [nn.Variable(subimg.shape) for subimg in seq_img]
h0 = nn.Variable((img.shape[0], hidden)) # Initial hidden state.
y, hs = rnn(seq_x, h0, hidden)
loss = logreg_loss(y, t)
# Copied from the above logreg example.
def training_rnn(steps, learning_rate):
    solver = S.Sgd(learning_rate)
    solver.set_parameters(nn.get_parameters()) # Set parameter variables to be updated.
    for i in range(steps):
        minibatch = data.next()
        img, t.d = minibatch
        seq_img = split_grid4(img)
        h0.d = 0 # Initialize as 0
        for x, subimg in zip(seq_x, seq_img):
            x.d = subimg
        loss.forward()
        solver.zero_grad() # Initialize gradients of all parameters to zero.
        loss.backward()
        solver.weight_decay(1e-5) # Apply weight decay as regularization
        solver.update()
        if i % 100 == 0: # Print every 100 iterations
            print(i, loss.d)
training_rnn(1000, 1e-1)
0 2.62527275085
100 0.780260562897
200 0.486522495747
300 0.289345681667
400 0.249717146158
500 0.538961410522
600 0.276877015829
700 0.159639537334
800 0.249660402536
900 0.0925596579909
# Showing responses for each layer
num_plot = len(hs) + 2
gid = 1
plt.figure(figsize=(2, 8))
imshow(x.d[0, 0], 'x')
for hid, h in enumerate(hs):
    imshow(scale01(h.d[0]).reshape(-1, 8), 'h{}'.format(hid + 1))
imshow(scale01(y.d[0]).reshape(2, 5), 'y')

Siamese Network¶
This example shows how to embed images from a categorical dataset into a 2D space using deep learning. It also demonstrates how to reuse a pretrained network.
First, we load parameters learned in the CNN example.
nn.clear_parameters()
# Loading CNN pretrained parameters.
_ = nn.load_parameters(path_cnn_params)
2017-06-26 23:09:57,838 [nnabla][INFO]: Parameter load (<built-in function format>): tmp.params.cnn.h5
We define the embedding function. Note that the network structure and parameter hierarchy are identical to the previous CNN example. That enables you to reuse the saved parameters and finetune from them.
def cnn_embed(x, test=False):
    # Note: Identical configuration with the CNN example above.
    # Parameters pretrained in the above CNN example are used.
    with nn.parameter_scope("cnn"):
        with nn.parameter_scope("conv1"):
            c1 = F.tanh(PF.batch_normalization(PF.convolution(x, 4, (3, 3), pad=(1, 1), stride=(2, 2)), batch_stat=not test))
        with nn.parameter_scope("conv2"):
            c2 = F.tanh(PF.batch_normalization(PF.convolution(c1, 8, (3, 3), pad=(1, 1)), batch_stat=not test))
            c2 = F.average_pooling(c2, (2, 2))
        with nn.parameter_scope("fc3"):
            fc3 = PF.affine(c2, 32)
    # Additional affine for map into 2D.
    with nn.parameter_scope("embed2d"):
        embed = PF.affine(c2, 2)
    return embed, [c1, c2, fc3]

def siamese_loss(e0, e1, t, margin=1.0, eps=1e-4):
    dist = F.sum(F.squared_error(e0, e1), axis=1) # Squared distance
    # Contrastive loss
    sim_cost = t * dist
    dissim_cost = (1 - t) * \
        (F.maximum_scalar(margin - (dist + eps) ** (0.5), 0) ** 2)
    return F.mean(sim_cost + dissim_cost)
We build two-stream CNNs and compare them with the contrastive loss function defined above. Note that both CNNs have the same parameter hierarchy, which means that their parameters are shared.
x0 = nn.Variable(img.shape)
x1 = nn.Variable(img.shape)
t = nn.Variable((img.shape[0],)) # Same class or not
e0, hs0 = cnn_embed(x0)
e1, hs1 = cnn_embed(x1) # NOTE: parameters are shared
loss = siamese_loss(e0, e1, t)
def training_siamese(steps):
    for i in range(steps):
        minibatchs = []
        for _ in range(2):
            minibatch = data.next()
            minibatchs.append((minibatch[0].copy(), minibatch[1].copy()))
        x0.d, label0 = minibatchs[0]
        x1.d, label1 = minibatchs[1]
        t.d = (label0 == label1).astype(np.int).flat
        loss.forward()
        solver.zero_grad() # Initialize gradients of all parameters to zero.
        loss.backward()
        solver.weight_decay(1e-5) # Apply weight decay as regularization
        solver.update()
        if i % 100 == 0: # Print every 100 iterations
            print(i, loss.d)
learning_rate = 1e-2
solver = S.Sgd(learning_rate)
with nn.parameter_scope("embed2d"):
# Only 2d embedding affine will be updated.
solver.set_parameters(nn.get_parameters())
training_siamese(2000)
# Decay learning rate
solver.set_learning_rate(solver.learning_rate() * 0.1)
training_siamese(2000)
0 0.150528043509
100 0.186870157719
200 0.149316266179
300 0.207163512707
400 0.171384960413
500 0.190256178379
600 0.138507723808
700 0.0918073058128
800 0.159692272544
900 0.0833697617054
1000 0.0839115008712
1100 0.104669973254
1200 0.0776312947273
1300 0.114788673818
1400 0.120309025049
1500 0.107732802629
1600 0.070114441216
1700 0.101728007197
1800 0.114350572228
1900 0.118794307113
0 0.0669310241938
100 0.0553173273802
200 0.0829797014594
300 0.0951051414013
400 0.128303915262
500 0.102963000536
600 0.0910559669137
700 0.0898950695992
800 0.119949311018
900 0.0603067912161
1000 0.105748720467
1100 0.108760476112
1200 0.0820947736502
1300 0.0971114039421
1400 0.0836166366935
1500 0.0899554267526
1600 0.109069615602
1700 0.0921652168036
1800 0.0759357959032
1900 0.100669950247
We visualize the embedded training images as follows. You can see that images from the same class are embedded near each other.
all_image = digits.images[:512, None]
all_label = digits.target[:512]
x_all = nn.Variable(all_image.shape)
x_all.d = all_image
with nn.auto_forward():
    embed, _ = cnn_embed(x_all, test=True)
plt.figure(figsize=(16, 9))
for i in range(10):
    c = plt.cm.Set1(i / 10.) # Maybe it doesn't work in an older version of Matplotlib where color map lies in [0, 256)
    plt.plot(embed.d[all_label == i, 0].flatten(),
             embed.d[all_label == i, 1].flatten(), '.', c=c)
plt.legend(map(str, range(10)))
plt.grid()

Appendix¶
A. Logistic Regression¶
Here we demonstrate how to train the simplest neural network, logistic regression (single layer perceptron). Logistic regression is a linear classifier \(f : {\cal R}^{D\times 1} \rightarrow {\cal R}^{K\times 1}\)
\(f(\mathbf x; \mathbf \Theta) = \mathbf W \mathbf x + \mathbf b\)
where \(\mathbf x \in {\cal R}^{D \times 1}\) is an input image flattened to a vector, \(t \in \{0, 1, \cdots, K\}\) is a target label, \(\mathbf W \in {\cal R}^{K \times D}\) is a weight matrix, \(\mathbf b \in {\cal R}^{K \times 1}\) is a bias vector and \(\mathbf \Theta \equiv \left\{\mathbf W, \mathbf b\right\}\). The loss function is defined as
\(L(\mathbf \Theta, \mathbf X) = \frac{1}{N} \sum_{n=1}^{N} -\log \left[\sigma\left(f(\mathbf x_n; \mathbf \Theta)\right)\right]_{t_n}\)
where \(\mathbf X \equiv \left\{\mathbf x_1, t_1, \cdots, \mathbf x_N, t_N\right\}\) denotes the dataset the network is trained on, \(\sigma(\mathbf z)\) is the softmax operation defined as \(\frac{\exp(-\mathbf z)}{\sum_{z \subset \mathbf z} \exp(-z)}\), and \(\left[\mathbf z\right]_i\) denotes the i-th element of \(\mathbf z\).
NNabla Python API Demonstration Tutorial¶
Let us import nnabla first, and some additional useful tools.
# python2/3 compatibility
from __future__ import print_function
from __future__ import absolute_import
from __future__ import division
import nnabla as nn # Abbreviate as nn for convenience.
import numpy as np
%matplotlib inline
import matplotlib.pyplot as plt
2017-09-27 14:00:30,785 [nnabla][INFO]: Initializing CPU extension...
NdArray¶
NdArray is a data container for a multi-dimensional array. NdArray is device (e.g. CPU, CUDA) and type (e.g. uint8, float32) agnostic; both type and device are implicitly cast or transferred as needed when it is used. Below, you create an NdArray with a shape of (2, 3, 4).
a = nn.NdArray((2, 3, 4))
You can see the values held inside a by the following. The values are not initialized, and are created as float32 by default.
print(a.data)
[[[ 9.42546995e+24 4.56809286e-41 8.47690058e-38 0.00000000e+00]
[ 7.38056336e+34 7.50334969e+28 1.17078231e-32 7.58387310e+31]
[ 7.87001454e-12 9.84394250e-12 6.85712044e+22 1.81785692e+31]]
[[ 1.84681296e+25 1.84933247e+20 4.85656319e+33 2.06176836e-19]
[ 6.80020530e+22 1.69307638e+22 2.11235872e-19 1.94316151e-19]
[ 1.81805047e+31 3.01289097e+29 2.07004908e-19 1.84648795e+25]]]
The accessor .data returns a reference to the values of the NdArray as numpy.ndarray. You can modify these by using the NumPy API as follows.
print('[Substituting random values]')
a.data = np.random.randn(*a.shape)
print(a.data)
print('[Slicing]')
a.data[0, :, ::2] = 0
print(a.data)
[Substituting random values]
[[[ 0.36133638 0.22121875 -1.5912329 -0.33490974]
[ 1.35962474 0.2165522 0.54483992 -0.61813235]
[-0.13718799 -0.44104072 -0.51307833 0.73900551]]
[[-0.59464753 -2.17738533 -0.28626776 -0.45654735]
[ 0.73566747 0.87292582 -0.41605178 0.04792296]
[-0.63856047 0.31966645 -0.63974309 -0.61385244]]]
[Slicing]
[[[ 0. 0.22121875 0. -0.33490974]
[ 0. 0.2165522 0. -0.61813235]
[ 0. -0.44104072 0. 0.73900551]]
[[-0.59464753 -2.17738533 -0.28626776 -0.45654735]
[ 0.73566747 0.87292582 -0.41605178 0.04792296]
[-0.63856047 0.31966645 -0.63974309 -0.61385244]]]
Note that the above operations are all done on the host device (CPU). NdArray provides more efficient functions in case you want to fill all values with a constant: .zero and .fill. They are lazily evaluated when the data is requested (when neural network computation requests the data, or when a NumPy array is requested by Python). The filling operation is executed within a specific device (e.g. CUDA GPU), and is more efficient if you specify the device setting, which we explain later.
a.fill(1) # Filling all values with one.
print(a.data)
[[[ 1. 1. 1. 1.]
[ 1. 1. 1. 1.]
[ 1. 1. 1. 1.]]
[[ 1. 1. 1. 1.]
[ 1. 1. 1. 1.]
[ 1. 1. 1. 1.]]]
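The .zero method described above can be checked in the same way; a minimal sketch (evaluation is lazy and is triggered when the data is requested):
a.zero() # Scheduled lazily; not executed yet.
print(a.data) # Requesting the data triggers the filling with zeros.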
You can create an NdArray instance directly from a NumPy array object.
b = nn.NdArray.from_numpy_array(np.ones(a.shape))
print(b.data)
[[[ 1. 1. 1. 1.]
[ 1. 1. 1. 1.]
[ 1. 1. 1. 1.]]
[[ 1. 1. 1. 1.]
[ 1. 1. 1. 1.]
[ 1. 1. 1. 1.]]]
NdArray is used in the Variable class, as well as in NNabla's imperative computation of neural networks. We describe these in the later sections.
Variable¶
The Variable class is used when you construct a neural network. The neural network can be described as a graph in which an edge represents a function (a.k.a. operator or layer) which defines a minimal unit of computation, and a node represents a variable which holds the input/output values of a function (the Function class is explained later). The graph is called a "Computation Graph".
In NNabla, a Variable, a node of a computation graph, holds two NdArrays: one for storing the input or output values of a function during forward propagation (executing the computation graph in the forward order), and another for storing the backward error signal (gradient) during backward propagation (executing the computation graph in backward order to propagate error signals down to the parameters (weights) of the neural network). The first one is called data, the second grad in NNabla.
The following line creates a Variable instance with a shape of (2, 3, 4). It has data and grad as NdArrays. The flag need_grad is used to omit unnecessary gradient computation during backprop if set to False.
x = nn.Variable([2, 3, 4], need_grad=True)
print('x.data:', x.data)
print('x.grad:', x.grad)
x.data: <NdArray((2, 3, 4)) at 0x7f575caf4ea0>
x.grad: <NdArray((2, 3, 4)) at 0x7f575caf4ea0>
You can get the shape by:
x.shape
(2, 3, 4)
Since both data and grad are NdArrays, you can get a reference to their values as NdArray with the .data accessor, but they can also be referred to by the .d and .g properties for data and grad, respectively.
print('x.data')
print(x.d)
x.d = 1.2345 # To avoid NaN
assert np.all(x.d == x.data.data), 'd: {} != {}'.format(x.d, x.data.data)
print('x.grad')
print(x.g)
x.g = 1.2345 # To avoid NaN
assert np.all(x.g == x.grad.data), 'g: {} != {}'.format(x.g, x.grad.data)
# Zeroing grad values
x.grad.zero()
print('x.grad (after `.zero()`)')
print(x.g)
x.data
[[[ 9.42553452e+24 4.56809286e-41 8.32543479e-38 0.00000000e+00]
[ nan nan 0.00000000e+00 0.00000000e+00]
[ 3.70977305e+25 4.56809286e-41 3.78350585e-44 0.00000000e+00]]
[[ 5.68736600e-38 0.00000000e+00 1.86176378e-13 4.56809286e-41]
[ 4.74367616e+25 4.56809286e-41 5.43829710e+19 4.56809286e-41]
[ 0.00000000e+00 0.00000000e+00 2.93623372e-38 0.00000000e+00]]]
x.grad
[[[ 9.42576510e+24 4.56809286e-41 9.42576510e+24 4.56809286e-41]
[ 9.27127763e-38 0.00000000e+00 9.27127763e-38 0.00000000e+00]
[ 1.69275966e+22 4.80112800e+30 1.21230330e+25 7.22962302e+31]]
[[ 1.10471027e-32 4.63080422e+27 2.44632805e+20 2.87606258e+20]
[ 4.46263300e+30 4.62311881e+30 7.65000750e+28 3.01339003e+29]
[ 2.08627352e-10 1.03961868e+21 7.99576678e+20 1.74441223e+22]]]
x.grad (after .zero())
[[[ 0. 0. 0. 0.]
[ 0. 0. 0. 0.]
[ 0. 0. 0. 0.]]
[[ 0. 0. 0. 0.]
[ 0. 0. 0. 0.]
[ 0. 0. 0. 0.]]]
Like NdArray, a Variable can also be created from NumPy array(s).
x2 = nn.Variable.from_numpy_array(np.ones((3,)), need_grad=True)
print(x2)
print(x2.d)
x3 = nn.Variable.from_numpy_array(np.ones((3,)), np.zeros((3,)), need_grad=True)
print(x3)
print(x3.d)
print(x3.g)
<Variable((3,), need_grad=True) at 0x7f572a5242c8>
[ 1. 1. 1.]
<Variable((3,), need_grad=True) at 0x7f572a5244a8>
[ 1. 1. 1.]
[ 0. 0. 0.]
Besides storing the values of a computation graph, pointing to a parent edge (function) to trace the computation graph is an important role. Here x doesn't have any connection. Therefore, the .parent property returns None.
print(x.parent)
None
Function¶
A function defines an operation block of a computation graph, as we described above. The module nnabla.functions offers various functions (e.g. Convolution, Affine and ReLU). You can see the list of functions available in the API reference guide.
import nnabla.functions as F
As an example, here you will define a computation graph that computes the element-wise Sigmoid function outputs for the input variable and sums up all values into a scalar. (This is simple enough to explain how it behaves, but is a meaningless example in the context of neural network training. We will show you a neural network example later.)
sigmoid_output = F.sigmoid(x)
sum_output = F.reduce_sum(sigmoid_output)
The function API in nnabla.functions takes one (or several) Variable(s) and arguments (if any), and returns one (or several) output Variable(s). The .parent property of an output variable points to the function instance which created it. Note that no computation occurs at this time since we just define the graph. (This is the default behavior of the NNabla computation graph API. You can also fire actual computation during graph definition, which we call "Dynamic mode" (explained later).)
print("sigmoid_output.parent.name:", sigmoid_output.parent.name)
print("x:", x)
print("sigmoid_output.parent.inputs refers to x:", sigmoid_output.parent.inputs)
sigmoid_output.parent.name: Sigmoid
x: <Variable((2, 3, 4), need_grad=True) at 0x7f572a51a778>
sigmoid_output.parent.inputs refers to x: [<Variable((2, 3, 4), need_grad=True) at 0x7f572a273a48>]
print("sum_output.parent.name:", sum_output.parent.name)
print("sigmoid_output:", sigmoid_output)
print("sum_output.parent.inputs refers to sigmoid_output:", sum_output.parent.inputs)
sum_output.parent.name: ReduceSum
sigmoid_output: <Variable((2, 3, 4), need_grad=True) at 0x7f572a524638>
sum_output.parent.inputs refers to sigmoid_output: [<Variable((2, 3, 4), need_grad=True) at 0x7f572a273a48>]
Calling the .forward() method at a sink variable executes the forward pass computation in the computation graph.
sum_output.forward()
print("CG output:", sum_output.d)
print("Reference:", np.sum(1.0 / (1.0 + np.exp(-x.d))))
CG output: 18.59052085876465
Reference: 18.5905
The .backward() method does the backward propagation through the graph. Here we initialize the grad values to zero before backprop, since the NNabla backprop algorithm always accumulates the gradient in the root variables.
x.grad.zero()
sum_output.backward()
print("d sum_o / d sigmoid_o:")
print(sigmoid_output.g)
print("d sum_o / d x:")
print(x.g)
d sum_o / d sigmoid_o:
[[[ 1. 1. 1. 1.]
[ 1. 1. 1. 1.]
[ 1. 1. 1. 1.]]
[[ 1. 1. 1. 1.]
[ 1. 1. 1. 1.]
[ 1. 1. 1. 1.]]]
d sum_o / d x:
[[[ 0.17459197 0.17459197 0.17459197 0.17459197]
[ 0.17459197 0.17459197 0.17459197 0.17459197]
[ 0.17459197 0.17459197 0.17459197 0.17459197]]
[[ 0.17459197 0.17459197 0.17459197 0.17459197]
[ 0.17459197 0.17459197 0.17459197 0.17459197]
[ 0.17459197 0.17459197 0.17459197 0.17459197]]]
NNabla is developed mainly focusing on neural network training and inference. Neural networks have parameters to be learned, associated with computation blocks such as Convolution and Affine (a.k.a. fully connected, dense, etc.). In NNabla, the learnable parameters are also represented as Variable objects. Just like input variables, those parameter variables are also used by passing them into Functions. For example, the Affine function takes input, weights and biases as inputs.
x = nn.Variable([5, 2]) # Input
w = nn.Variable([2, 3], need_grad=True) # Weights
b = nn.Variable([3], need_grad=True) # Biases
affine_out = F.affine(x, w, b) # Create a graph including only affine
The above example takes an input with B=5 (batch size) and D=2 (dimensions) and maps it to D'=3 outputs, i.e. a (B, D') output.
You may also notice that here you set need_grad=True only for the parameter variables (w and b). The variable x is a non-parameter variable and the root of the computation graph. Therefore, it doesn't require gradient computation. In this configuration, the gradient computation for x is not executed in the first affine, which omits unnecessary backpropagation computation.
The next block sets data and initializes grad, then applies forward and backward computation.
# Set random input and parameters
x.d = np.random.randn(*x.shape)
w.d = np.random.randn(*w.shape)
b.d = np.random.randn(*b.shape)
# Initialize grad
x.grad.zero() # Just for showing gradients are not computed when need_grad=False (default).
w.grad.zero()
b.grad.zero()
# Forward and backward
affine_out.forward()
affine_out.backward()
# Note: Calling backward at a non-scalar Variable propagates ones as error signals from all elements of the outputs.
You can see that affine_out holds an output of Affine.
print('F.affine')
print(affine_out.d)
print('Reference')
print(np.dot(x.d, w.d) + b.d)
F.affine
[[-0.17701732 2.86095762 -0.82298267]
[-0.75544345 -1.16702223 -2.44841242]
[-0.36278027 -3.4771595 -0.75681627]
[ 0.32743117 0.24258983 1.30944324]
[-0.87201929 1.94556415 -3.23357344]]
Reference
[[-0.1770173 2.86095762 -0.82298267]
[-0.75544345 -1.16702223 -2.44841242]
[-0.3627803 -3.4771595 -0.75681627]
[ 0.32743117 0.24258983 1.309443 ]
[-0.87201929 1.94556415 -3.23357344]]
The resulting gradients of weights and biases are as follows.
print("dw")
print(w.g)
print("db")
print(b.g)
dw
[[ 3.10820675 3.10820675 3.10820675]
[ 0.37446201 0.37446201 0.37446201]]
db
[ 5. 5. 5.]
The gradient of x is not changed because need_grad is set to False.
print(x.g)
[[ 0. 0.]
[ 0. 0.]
[ 0. 0.]
[ 0. 0.]
[ 0. 0.]]
Parametric Function¶
Considering parameters as inputs of Functions enhances the expressiveness and flexibility of computation graphs. However, defining all parameters for each learnable function is tedious when defining a neural network. In NNabla, trainable models are usually created by composing functions that have optimizable parameters. These functions are called "Parametric Functions". The Parametric Function API provides various parametric functions and an interface for composing trainable models.
To use parametric functions, import:
import nnabla.parametric_functions as PF
A function with optimizable parameters can be created as below.
with nn.parameter_scope("affine1"):
c1 = PF.affine(x, 3)
The first line creates a parameter scope. The second line then applies PF.affine - an affine transform - to x, and creates a variable c1 holding that result. The parameters are created and initialized randomly at function call, and registered under the name "affine1" using the parameter_scope context. The function nnabla.get_parameters() allows you to get the registered parameters.
nn.get_parameters()
OrderedDict([('affine1/affine/W',
<Variable((2, 3), need_grad=True) at 0x7f572822f0e8>),
('affine1/affine/b',
<Variable((3,), need_grad=True) at 0x7f572822f138>)])
The name= argument of any PF function creates an equivalent parameter space to the above definition of the PF.affine transformation, as below. It can save space in your Python code. The parameter_scope context is more useful when you group multiple parametric functions, such as the Convolution-BatchNormalization blocks found in a typical unit of CNNs.
c1 = PF.affine(x, 3, name='affine1')
nn.get_parameters()
OrderedDict([('affine1/affine/W',
<Variable((2, 3), need_grad=True) at 0x7f572822f0e8>),
('affine1/affine/b',
<Variable((3,), need_grad=True) at 0x7f572822f138>)])
It is worth noting that the shapes of both the outputs and the parameter variables (as you can see above) are automatically determined by only providing the output size of the affine transformation (in the example above the output size is 3). This helps to create a graph easily.
c1.shape
(5, 3)
Parameter scopes can be nested as follows (although this is a meaningless example).
with nn.parameter_scope('foo'):
    h = PF.affine(x, 3)
    with nn.parameter_scope('bar'):
        h = PF.affine(h, 4)
This creates the following.
nn.get_parameters()
OrderedDict([('affine1/affine/W',
<Variable((2, 3), need_grad=True) at 0x7f572822f0e8>),
('affine1/affine/b',
<Variable((3,), need_grad=True) at 0x7f572822f138>),
('foo/affine/W',
<Variable((2, 3), need_grad=True) at 0x7f572822fa98>),
('foo/affine/b',
<Variable((3,), need_grad=True) at 0x7f572822fae8>),
('foo/bar/affine/W',
<Variable((3, 4), need_grad=True) at 0x7f572822f728>),
('foo/bar/affine/b',
<Variable((4,), need_grad=True) at 0x7f572822fdb8>)])
Also, get_parameters() can be used within a parameter_scope. For example:
with nn.parameter_scope("foo"):
print(nn.get_parameters())
OrderedDict([('affine/W', <Variable((2, 3), need_grad=True) at 0x7f572822fa98>), ('affine/b', <Variable((3,), need_grad=True) at 0x7f572822fae8>), ('bar/affine/W', <Variable((3, 4), need_grad=True) at 0x7f572822f728>), ('bar/affine/b', <Variable((4,), need_grad=True) at 0x7f572822fdb8>)])
nnabla.clear_parameters() can be used to delete the registered parameters under the scope.
with nn.parameter_scope("foo"):
nn.clear_parameters()
print(nn.get_parameters())
OrderedDict([('affine1/affine/W', <Variable((2, 3), need_grad=True) at 0x7f572822f0e8>), ('affine1/affine/b', <Variable((3,), need_grad=True) at 0x7f572822f138>)])
MLP Example For Explanation¶
The following block creates a computation graph to predict a one-dimensional output from two-dimensional inputs with a two-layer fully connected neural network (multi-layer perceptron).
nn.clear_parameters()
batchsize = 16
x = nn.Variable([batchsize, 2])
with nn.parameter_scope("fc1"):
h = F.tanh(PF.affine(x, 512))
with nn.parameter_scope("fc2"):
y = PF.affine(h, 1)
print("Shapes:", h.shape, y.shape)
Shapes: (16, 512) (16, 1)
This will create the following parameter variables.
nn.get_parameters()
OrderedDict([('fc1/affine/W',
<Variable((2, 512), need_grad=True) at 0x7f572822fef8>),
('fc1/affine/b',
<Variable((512,), need_grad=True) at 0x7f572822f9a8>),
('fc2/affine/W',
<Variable((512, 1), need_grad=True) at 0x7f572822f778>),
('fc2/affine/b',
<Variable((1,), need_grad=True) at 0x7f572822ff98>)])
As described above, you can execute the forward pass by calling the forward method at the terminal variable.
x.d = np.random.randn(*x.shape) # Set random input
y.forward()
print(y.d)
[[-0.05708594]
[ 0.01661986]
[-0.34168088]
[ 0.05822293]
[-0.16566885]
[-0.04867431]
[ 0.2633169 ]
[ 0.10496549]
[-0.01291842]
[-0.09726256]
[-0.05720493]
[-0.09691752]
[-0.07822668]
[-0.17180404]
[ 0.11970415]
[-0.08222144]]
Training a neural network needs a loss value to be minimized by gradient descent with backprop. In NNabla, a loss function is also just a function, and is packaged in the functions module.
# Variable for label
label = nn.Variable([batchsize, 1])
# Set loss
loss = F.reduce_mean(F.squared_error(y, label))
# Execute forward pass.
label.d = np.random.randn(*label.shape) # Randomly generate labels
loss.forward()
print(loss.d)
1.9382084608078003
As you've seen above, NNabla's backward accumulates the gradients at the root variables. You have to initialize the grad of the parameter variables before backprop (we will show you the easiest way, using the Solver API).
# Collect all parameter variables and init grad.
for name, param in nn.get_parameters().items():
    param.grad.zero()
# Gradients are accumulated to grad of params.
loss.backward()
Imperative Mode¶
After performing backprop, gradients are held in parameter variable grads. The next block will update the parameters with vanilla gradient descent.
for name, param in nn.get_parameters().items():
    param.data -= param.grad * 0.001 # 0.001 as learning rate
The above computation is an example of NNabla's "Imperative Mode" for executing neural networks. Normally, NNabla functions (instances of nnabla.functions) take Variables as their input. When at least one NdArray is provided as an input to an NNabla function (instead of Variables), the function computation is fired immediately and returns an NdArray as the output, instead of returning a Variable. In the above example, the NNabla functions F.mul_scalar and F.sub2 are called by the overridden operators * and -=, respectively.
In other words, NNabla’s “Imperative mode” doesn’t create a computation graph, and can be used like NumPy. If device acceleration such as CUDA is enabled, it can be used like NumPy empowered with device acceleration. Parametric functions can also be used with NdArray input(s). The following block demonstrates a simple imperative execution example.
# A simple example of imperative mode.
xi = nn.NdArray.from_numpy_array(np.arange(4).reshape(2, 2))
yi = F.relu(xi - 1)
print(xi.data)
print(yi.data)
[[0 1]
[2 3]]
[[ 0. 0.]
[ 1. 2.]]
Note that in-place substitution from the rhs to the lhs cannot be done by the = operator. For example, when x is an NdArray, writing x = x + 1 will not increment all values of x - instead, the expression on the rhs creates a new NdArray object that is different from the one originally bound by x, and binds the new NdArray object to the Python variable x on the lhs.
For in-place editing of NdArrays, the in-place assignment operators +=, -=, *=, and /= can be used. The copy_from method can also be used to copy the values of an existing NdArray to another. For example, incrementing x, an NdArray, by 1 can be done by x.copy_from(x+1). The copy is performed with device acceleration if a device context is specified by using nnabla.set_default_context or nnabla.context_scope.
# The following doesn't perform substitution but assigns a new NdArray object to `xi`.
# xi = xi + 1
# The following copies the result of `xi + 1` to `xi`.
xi.copy_from(xi + 1)
assert np.all(xi.data == (np.arange(4).reshape(2, 2) + 1))
# Inplace operations like `+=`, `*=` can also be used (more efficient).
xi += 1
assert np.all(xi.data == (np.arange(4).reshape(2, 2) + 2))
Solver¶
NNabla provides stochastic gradient descent algorithms to optimize parameters, listed in the nnabla.solvers module. The parameter updates demonstrated above can be replaced with this Solver API, which is easier and usually faster.
from nnabla import solvers as S
solver = S.Sgd(lr=0.00001)
solver.set_parameters(nn.get_parameters())
# Set random data
x.d = np.random.randn(*x.shape)
label.d = np.random.randn(*label.shape)
# Forward
loss.forward()
Just call the following solver method to zero the grad regions, then run backprop:
solver.zero_grad()
loss.backward()
The following block updates parameters with the Vanilla Sgd rule (equivalent to the imperative example above).
solver.update()
Toy Problem To Demonstrate Training¶
The following function defines a regression problem which computes the norm of a vector.
def vector2length(x):
    # x : [B, 2] where B is number of samples.
    return np.sqrt(np.sum(x ** 2, axis=1, keepdims=True))
We visualize this mapping with a contour plot using matplotlib, as follows.
# Data for plotting contour on a grid data.
xs = np.linspace(-1, 1, 100)
ys = np.linspace(-1, 1, 100)
grid = np.meshgrid(xs, ys)
X = grid[0].flatten()
Y = grid[1].flatten()
def plot_true():
    """Plotting contour of true mapping from a grid data created above."""
    plt.contourf(xs, ys, vector2length(np.hstack([X[:, None], Y[:, None]])).reshape(100, 100))
    plt.axis('equal')
    plt.colorbar()
plot_true()

We define a deep prediction neural network.
def length_mlp(x):
    h = x
    for i, hnum in enumerate([4, 8, 4, 2]):
        h = F.tanh(PF.affine(h, hnum, name="fc{}".format(i)))
    y = PF.affine(h, 1, name='fc')
    return y
nn.clear_parameters()
batchsize = 100
x = nn.Variable([batchsize, 2])
y = length_mlp(x)
label = nn.Variable([batchsize, 1])
loss = F.reduce_mean(F.squared_error(y, label))
We created a 5-layer MLP using a for-loop. Note that only 3 lines of code can potentially create arbitrarily deep neural networks. The next block adds helper functions to visualize the learned function.
def predict(inp):
    ret = []
    for i in range(0, inp.shape[0], x.shape[0]):
        xx = inp[i:i + x.shape[0]]
        # Imperative execution
        xi = nn.NdArray.from_numpy_array(xx)
        yi = length_mlp(xi)
        ret.append(yi.data.copy())
    return np.vstack(ret)

def plot_prediction():
    plt.contourf(xs, ys, predict(np.hstack([X[:, None], Y[:, None]])).reshape(100, 100))
    plt.colorbar()
    plt.axis('equal')
Next we instantiate a solver object as follows. We use the Adam optimizer, which is one of the most popular SGD algorithms in the literature.
from nnabla import solvers as S
solver = S.Adam(alpha=0.01)
solver.set_parameters(nn.get_parameters())
The following function generates data from the true system infinitely.
def random_data_provider(n):
    x = np.random.uniform(-1, 1, size=(n, 2))
    y = vector2length(x)
    return x, y
In the next block, we run 2000 training steps (SGD updates).
num_iter = 2000
for i in range(num_iter):
    # Sample data and set them to input variables of training.
    xx, ll = random_data_provider(batchsize)
    x.d = xx
    label.d = ll
    # Forward propagation given inputs.
    loss.forward(clear_no_need_grad=True)
    # Parameter gradients initialization and gradients computation by backprop.
    solver.zero_grad()
    loss.backward(clear_buffer=True)
    # Apply weight decay and update by Adam rule.
    solver.weight_decay(1e-6)
    solver.update()
    # Just print progress.
    if i % 100 == 0 or i == num_iter - 1:
        print("Loss@{:4d}: {}".format(i, loss.d))
Loss@ 0: 0.6976373195648193
Loss@ 100: 0.08075223118066788
Loss@ 200: 0.005213144235312939
Loss@ 300: 0.001955194864422083
Loss@ 400: 0.0011660841992124915
Loss@ 500: 0.0006421314901672304
Loss@ 600: 0.0009330055327154696
Loss@ 700: 0.0008817618945613503
Loss@ 800: 0.0006205961108207703
Loss@ 900: 0.0009072928223758936
Loss@1000: 0.0008160348515957594
Loss@1100: 0.0011569359339773655
Loss@1200: 0.000837412488181144
Loss@1300: 0.0011542742140591145
Loss@1400: 0.0005833200993947685
Loss@1500: 0.0009848927147686481
Loss@1600: 0.0005141657311469316
Loss@1700: 0.0009339841199107468
Loss@1800: 0.000950580753851682
Loss@1900: 0.0005430278833955526
Loss@1999: 0.0007046313839964569
Memory usage optimization: You may notice that, in the above updates, .forward() is called with the clear_no_need_grad= option, and .backward() is called with the clear_buffer= option. Training a neural network in more realistic scenarios usually consumes huge amounts of memory due to the nature of the backpropagation algorithm, in which all of the forward variable buffers (data) must be kept in order to compute the gradient of a function. In a naive implementation, we keep all the variable data and grad alive until the NdArray objects are no longer referenced (i.e. the graph is deleted). The clear_* options in .forward() and .backward() enable reduced memory consumption by clearing (erasing) the memory of data and grad when they are not referenced by any subsequent computation. (More precisely speaking, it doesn't actually free memory; we use our memory pool engine by default to avoid memory alloc/free overhead.) The unreferenced buffers can be re-used in subsequent computation. See the documentation of Variable for more details. Note that the following loss.forward(clear_buffer=True) clears the data of any intermediate variables. If you are interested in intermediate variables for some purpose (e.g. debugging, logging), you can use the .persistent flag to prevent clearing the buffer of a specific Variable, as below.
loss.forward(clear_buffer=True)
print("The prediction `y` is cleared because it's an intermediate variable.")
print(y.d.flatten()[:4]) # to save space show only 4 values
y.persistent = True
loss.forward(clear_buffer=True)
print("The prediction `y` is kept by the persistent flag.")
print(y.d.flatten()[:4]) # to save space show only 4 values
The prediction `y` is cleared because it's an intermediate variable.
[  2.27279830e-04   6.02164946e-05   5.33679675e-04   2.35557582e-05]
The prediction `y` is kept by the persistent flag.
[ 1.0851264   0.87657517  0.79603785  0.40098712]
We can confirm the prediction performs fairly well by looking at the following visualization of the ground truth and prediction function.
plt.subplot(121)
plt.title("Ground truth")
plot_true()
plt.subplot(122)
plt.title("Prediction")
plot_prediction()

You can save learned parameters with nnabla.save_parameters and load them with nnabla.load_parameters.
path_param = "param-vector2length.h5"
nn.save_parameters(path_param)
# Remove all once
nn.clear_parameters()
nn.get_parameters()
2017-09-27 14:00:40,544 [nnabla][INFO]: Parameter save (.h5): param-vector2length.h5
OrderedDict()
# Load again
nn.load_parameters(path_param)
print('\n'.join(map(str, nn.get_parameters().items())))
2017-09-27 14:00:40,564 [nnabla][INFO]: Parameter load (<built-in function format>): param-vector2length.h5
('fc0/affine/W', <Variable((2, 4), need_grad=True) at 0x7f576328df48>)
('fc0/affine/b', <Variable((4,), need_grad=True) at 0x7f57245f2868>)
('fc1/affine/W', <Variable((4, 8), need_grad=True) at 0x7f576328def8>)
('fc1/affine/b', <Variable((8,), need_grad=True) at 0x7f5727ee5c78>)
('fc2/affine/W', <Variable((8, 4), need_grad=True) at 0x7f5763297318>)
('fc2/affine/b', <Variable((4,), need_grad=True) at 0x7f5727d29908>)
('fc3/affine/W', <Variable((4, 2), need_grad=True) at 0x7f57632973b8>)
('fc3/affine/b', <Variable((2,), need_grad=True) at 0x7f57632974a8>)
('fc/affine/W', <Variable((2, 1), need_grad=True) at 0x7f57632974f8>)
('fc/affine/b', <Variable((1,), need_grad=True) at 0x7f5763297598>)
Both save and load functions can also be used in a parameter scope.
with nn.parameter_scope('foo'):
    nn.load_parameters(path_param)
print('\n'.join(map(str, nn.get_parameters().items())))
2017-09-27 14:00:40,714 [nnabla][INFO]: Parameter load (<built-in function format>): param-vector2length.h5
('fc0/affine/W', <Variable((2, 4), need_grad=True) at 0x7f576328df48>)
('fc0/affine/b', <Variable((4,), need_grad=True) at 0x7f57245f2868>)
('fc1/affine/W', <Variable((4, 8), need_grad=True) at 0x7f576328def8>)
('fc1/affine/b', <Variable((8,), need_grad=True) at 0x7f5727ee5c78>)
('fc2/affine/W', <Variable((8, 4), need_grad=True) at 0x7f5763297318>)
('fc2/affine/b', <Variable((4,), need_grad=True) at 0x7f5727d29908>)
('fc3/affine/W', <Variable((4, 2), need_grad=True) at 0x7f57632973b8>)
('fc3/affine/b', <Variable((2,), need_grad=True) at 0x7f57632974a8>)
('fc/affine/W', <Variable((2, 1), need_grad=True) at 0x7f57632974f8>)
('fc/affine/b', <Variable((1,), need_grad=True) at 0x7f5763297598>)
('foo/fc0/affine/W', <Variable((2, 4), need_grad=True) at 0x7f5763297958>)
('foo/fc0/affine/b', <Variable((4,), need_grad=True) at 0x7f57632978b8>)
('foo/fc1/affine/W', <Variable((4, 8), need_grad=True) at 0x7f572a51ac78>)
('foo/fc1/affine/b', <Variable((8,), need_grad=True) at 0x7f5763297c78>)
('foo/fc2/affine/W', <Variable((8, 4), need_grad=True) at 0x7f5763297a98>)
('foo/fc2/affine/b', <Variable((4,), need_grad=True) at 0x7f5763297d68>)
('foo/fc3/affine/W', <Variable((4, 2), need_grad=True) at 0x7f5763297e08>)
('foo/fc3/affine/b', <Variable((2,), need_grad=True) at 0x7f5763297ea8>)
('foo/fc/affine/W', <Variable((2, 1), need_grad=True) at 0x7f5763297f48>)
('foo/fc/affine/b', <Variable((1,), need_grad=True) at 0x7f5763297cc8>)
!rm {path_param} # Clean up
NNabla Models Finetuning Tutorial¶
Here we demonstrate how to perform finetuning using nnabla’s pre-trained models.
Load the model¶
Loading the model is very simple. All you need is just 2 lines.
from nnabla.models.imagenet import ResNet18
model = ResNet18()
You can choose other ResNet models, such as ResNet34 or ResNet50, by specifying the model's name as an argument. Of course, you can choose other pretrained models as well. See the Docs.
NOTE: If you use ResNet18 for the first time, nnabla will automatically download the weights from https://nnabla.org, which may take up to a few minutes.
Dataset¶
In this tutorial, we use Caltech101 as the dataset for finetuning. Caltech101 consists of more than 9,000 object images in total, and each image belongs to one of 101 distinct categories or to a "clutter" category. We use the images from the 101 categories for simple classification.
We have a script named caltech101_data.py which can automatically download the dataset and store it in nnabla_data. If you have your own dataset and a DataIterator which can load your data, you can use that instead.
run caltech101_data.py
batch_size = 32 # we set batch_size = 32
all_data = data_iterator_caltech101(batch_size)
Since there is no separate data for training and validation in Caltech101, we need to split it up manually. Here, we will split the dataset in the following way: 80% for training and 20% for validation.
num_samples = all_data.size
num_train_samples = int(0.8 * num_samples) # Take 80% for training, and the rest for validation.
num_class = 101
data_iterator_train = all_data.slice(
rng=None, slice_start=0, slice_end=num_train_samples)
data_iterator_valid = all_data.slice(
rng=None, slice_start=num_train_samples, slice_end=num_samples)
Now we have the model and the data!
Optional: Check the image in the dataset¶
Let’s take a look at what kind of images are included in the dataset.
You can get images with the DataIterator's next method.
import matplotlib.pyplot as plt
%matplotlib inline
images, labels = data_iterator_train.next()
sample_image, sample_label = images[0], labels[0]
plt.imshow(sample_image.transpose(1,2,0))
plt.show()
print("image_shape: {}".format(sample_image.shape))
print("label_id: {}".format(sample_label))

image_shape: (3, 128, 128)
label_id: [94]
Preparing Graph Construction¶
Let’s start with importing basic modules.
import nnabla as nn
# Optional: If you want to use GPU
from nnabla.ext_utils import get_extension_context
ctx = get_extension_context("cudnn")
nn.set_default_context(ctx)
ext = nn.ext_utils.import_extension_module("cudnn")
Create input Variables for the Network¶
Now we are going to create the input variables.
channels, image_height, image_width = sample_image.shape # use info from the image we got
# input variables for the validation network
image_valid = nn.Variable((batch_size, channels, image_height, image_width))
label_valid = nn.Variable((batch_size, 1))
input_image_valid = {"image": image_valid, "label": label_valid}
# input variables for the training network
image_train = nn.Variable((batch_size, channels, image_height, image_width))
label_train = nn.Variable((batch_size, 1))
input_image_train = {"image": image_train, "label": label_train}
Create the training graph using the pretrained model¶
If you take a look at the Model's API Reference, you will find the use_up_to option. By specifying one of the pre-defined strings when calling the model, the computation graph will be constructed only up to the layer you specify. For example, in the case of ResNet18, you can choose one of the following as the last layer of the graph.
‘classifier’ (default): The output of the final affine layer for classification.
‘pool’: The output of the final global average pooling.
‘lastconv’: The input of the final global average pooling, without ReLU activation.
‘lastconv+relu’: Network up to ‘lastconv’ followed by ReLU activation.
For finetuning, it is common to replace only the upper layers with new (untrained) ones and to re-use the lower layers with their pretrained weights. Also, the pretrained models were trained on an ImageNet classification task with 1000 categories, so the output of the classifier layer has shape (batch_size, 1000), which doesn't fit our current dataset. For this reason, we construct the graph up to the pool layer, which corresponds to the global average pooling layer in the original graph, and connect it to an additional affine (fully-connected) layer for 101-way classification. It is also common in finetuning to train only the weights of the newly added layers (in this case, the last affine layer), but in this tutorial we will update the weights of all layers in the graph (a sketch of the alternative follows the solver setup below). Also, when creating a training graph, you need to set training=True.
import nnabla.parametric_functions as PF
y_train = model(image_train, force_global_pooling=True, use_up_to="pool", training=True)
with nn.parameter_scope("finetuning_fc"):
pred_train = PF.affine(y_train, 101) # adding the affine layer to the graph.
NOTE: You need to specify force_global_pooling=True when the input shape is different from what the model expects. You can check the model's default input shape by typing model.input_shape.
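For example, a quick check (the exact tuple depends on the model; for the ImageNet models it is channel-first, e.g. something like (3, 224, 224)):
print(model.input_shape)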
Create the validation graph using the model¶
Creating the validation graph is almost the same. You simply need to change the training flag to False.
y_valid = model(image_valid,
force_global_pooling=True, use_up_to="pool", training=False)
with nn.parameter_scope("finetuning_fc"):
pred_valid = PF.affine(y_valid, 101)
pred_valid.persistent = True # to keep the value when `forward(clear_buffer=True)` is called.
Define the functions for computing Loss and Categorical Error¶
import nnabla.functions as F
def loss_function(pred, label):
"""
Compute loss.
"""
loss = F.mean(F.softmax_cross_entropy(pred, label))
return loss
loss_valid = loss_function(pred_valid, label_valid)
top_1_error_valid = F.mean(F.top_n_error(pred_valid, label_valid))
loss_train = loss_function(pred_train, label_train)
top_1_error_train = F.mean(F.top_n_error(pred_train, label_train))
Prepare the solver¶
import nnabla.solvers as S
solver = S.Momentum(0.01) # you can choose others as well
solver.set_parameters(nn.get_parameters())
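As mentioned earlier, the common alternative is to update only the newly added layer. A minimal sketch of that variant (not used in the rest of this tutorial): calling nn.get_parameters() inside the finetuning_fc scope returns only the parameters under that scope.
# Sketch: register only the new affine layer's parameters with the solver.
with nn.parameter_scope("finetuning_fc"):
    finetune_params = nn.get_parameters()
solver.set_parameters(finetune_params)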
Some settings for iteration¶
num_epoch = 10 # arbitrary
one_epoch = data_iterator_train.size // batch_size
max_iter = num_epoch * one_epoch
val_iter = data_iterator_valid.size // batch_size
Performance before finetuning¶
Let's see how well the model works. Note that all the weights are pretrained on ImageNet, except for the last affine layer. First, prepare a function that shows us the model's performance:
def run_validation(pred_valid, loss_valid, top_1_error_valid,
input_image_valid, data_iterator_valid,
with_visualized=False, num_visualized=3):
assert num_visualized < pred_valid.shape[0], "too many images to plot."
val_iter = data_iterator_valid.size // pred_valid.shape[0]
ve = 0.
vloss = 0.
for j in range(val_iter):
v_image, v_label = data_iterator_valid.next()
input_image_valid["image"].d = v_image
input_image_valid["label"].d = v_label
nn.forward_all([loss_valid, top_1_error_valid], clear_no_need_grad=True)
vloss += loss_valid.d.copy()
ve += top_1_error_valid.d.copy()
vloss /= val_iter
ve /= val_iter
if with_visualized:
ind = 1
random_start = np.random.randint(pred_valid.shape[0] - num_visualized)
fig = plt.figure(figsize=(12., 12.))
for n in range(random_start, random_start + num_visualized):
sample_image, sample_label = v_image[n], v_label[n]
ax = fig.add_subplot(1, num_visualized, ind)
ax.imshow(sample_image.transpose(1,2,0))
with nn.auto_forward():
predicted_id = np.argmax(F.softmax(pred_valid)[n].d)
result = "true label_id: {} - predicted as {}".format(str(sample_label[0]), str(predicted_id))
ax.set_title(result)
ind += 1
fig.show()
return ve, vloss
_, _ = run_validation(pred_valid, loss_valid, top_1_error_valid, input_image_valid, data_iterator_valid, with_visualized=True)

As you can see, the model fails to classify images properly. Now, let’s begin the finetuning and see how performance improves.
Start Finetuning¶
Let’s prepare the monitor for training.
from nnabla.monitor import Monitor, MonitorSeries, MonitorTimeElapsed
monitor = Monitor("tmp.monitor")
monitor_loss = MonitorSeries("Training loss", monitor, interval=200)
monitor_err = MonitorSeries("Training error", monitor, interval=200)
monitor_vloss = MonitorSeries("Test loss", monitor, interval=200)
monitor_verr = MonitorSeries("Test error", monitor, interval=200)
# Training-loop
for i in range(max_iter):
image, label = data_iterator_train.next()
input_image_train["image"].d = image
input_image_train["label"].d = label
nn.forward_all([loss_train, top_1_error_train], clear_no_need_grad=True)
monitor_loss.add(i, loss_train.d.copy())
monitor_err.add(i, top_1_error_train.d.copy())
solver.zero_grad()
loss_train.backward(clear_buffer=True)
# update parameters
solver.weight_decay(3e-4)
solver.update()
if i % 200 == 0:
ve, vloss = run_validation(pred_valid, loss_valid, top_1_error_valid,
input_image_valid, data_iterator_valid,
with_visualized=False, num_visualized=3)
monitor_vloss.add(i, vloss)
monitor_verr.add(i, ve)
2019-07-05 14:26:26,885 [nnabla][INFO]: iter=199 {Training loss}=1.5021580457687378
2019-07-05 14:26:26,887 [nnabla][INFO]: iter=199 {Training error}=0.3345312476158142
2019-07-05 14:26:28,756 [nnabla][INFO]: iter=200 {Test loss}=2.975713219355654
2019-07-05 14:26:28,756 [nnabla][INFO]: iter=200 {Test error}=0.5384837962962963
2019-07-05 14:26:50,249 [nnabla][INFO]: iter=399 {Training loss}=0.22022955119609833
2019-07-05 14:26:50,250 [nnabla][INFO]: iter=399 {Training error}=0.053437501192092896
2019-07-05 14:26:52,256 [nnabla][INFO]: iter=400 {Test loss}=0.12045302835327608
2019-07-05 14:26:52,257 [nnabla][INFO]: iter=400 {Test error}=0.029513888888888888
2019-07-05 14:27:14,151 [nnabla][INFO]: iter=599 {Training loss}=0.0659928247332573
2019-07-05 14:27:14,152 [nnabla][INFO]: iter=599 {Training error}=0.012500000186264515
2019-07-05 14:27:16,175 [nnabla][INFO]: iter=600 {Test loss}=0.08744175952893717
2019-07-05 14:27:16,175 [nnabla][INFO]: iter=600 {Test error}=0.02199074074074074
2019-07-05 14:27:38,097 [nnabla][INFO]: iter=799 {Training loss}=0.03324155509471893
2019-07-05 14:27:38,098 [nnabla][INFO]: iter=799 {Training error}=0.0054687499068677425
2019-07-05 14:27:40,120 [nnabla][INFO]: iter=800 {Test loss}=0.07678695395588875
2019-07-05 14:27:40,121 [nnabla][INFO]: iter=800 {Test error}=0.02025462962962963
2019-07-05 14:28:02,041 [nnabla][INFO]: iter=999 {Training loss}=0.019672293215990067
2019-07-05 14:28:02,042 [nnabla][INFO]: iter=999 {Training error}=0.0017187499906867743
2019-07-05 14:28:04,064 [nnabla][INFO]: iter=1000 {Test loss}=0.06333287184437116
2019-07-05 14:28:04,065 [nnabla][INFO]: iter=1000 {Test error}=0.017361111111111112
2019-07-05 14:28:25,984 [nnabla][INFO]: iter=1199 {Training loss}=0.009992362931370735
2019-07-05 14:28:25,985 [nnabla][INFO]: iter=1199 {Training error}=0.0003124999930150807
2019-07-05 14:28:28,008 [nnabla][INFO]: iter=1200 {Test loss}=0.06950318495984431
2019-07-05 14:28:28,008 [nnabla][INFO]: iter=1200 {Test error}=0.015625
2019-07-05 14:28:49,954 [nnabla][INFO]: iter=1399 {Training loss}=0.007941835559904575
2019-07-05 14:28:49,955 [nnabla][INFO]: iter=1399 {Training error}=0.0003124999930150807
2019-07-05 14:28:51,978 [nnabla][INFO]: iter=1400 {Test loss}=0.06711215277512868
2019-07-05 14:28:51,979 [nnabla][INFO]: iter=1400 {Test error}=0.016203703703703703
2019-07-05 14:29:13,898 [nnabla][INFO]: iter=1599 {Training loss}=0.008225565776228905
2019-07-05 14:29:13,899 [nnabla][INFO]: iter=1599 {Training error}=0.0007812500116415322
2019-07-05 14:29:15,923 [nnabla][INFO]: iter=1600 {Test loss}=0.06447940292181792
2019-07-05 14:29:15,923 [nnabla][INFO]: iter=1600 {Test error}=0.016203703703703703
2019-07-05 14:29:37,850 [nnabla][INFO]: iter=1799 {Training loss}=0.005678100511431694
2019-07-05 14:29:37,850 [nnabla][INFO]: iter=1799 {Training error}=0.0
2019-07-05 14:29:39,873 [nnabla][INFO]: iter=1800 {Test loss}=0.06282947226255028
2019-07-05 14:29:39,873 [nnabla][INFO]: iter=1800 {Test error}=0.01678240740740741
2019-07-05 14:30:01,795 [nnabla][INFO]: iter=1999 {Training loss}=0.006834140978753567
2019-07-05 14:30:01,796 [nnabla][INFO]: iter=1999 {Training error}=0.00046874998952262104
2019-07-05 14:30:03,818 [nnabla][INFO]: iter=2000 {Test loss}=0.05948294078310331
2019-07-05 14:30:03,818 [nnabla][INFO]: iter=2000 {Test error}=0.014467592592592593
As you can see, the loss and error rate decrease as the finetuning progresses. Let's see the classification results after finetuning.
_, _ = run_validation(pred_valid, loss_valid, top_1_error_valid, input_image_valid, data_iterator_valid, with_visualized=True)

You can now see that the model is able to classify the images properly.
Finetuning more¶
We have a convenient script named finetuning.py. By using it, you can try finetuning with different models, even on your own dataset.
To do this, you need to prepare your own dataset and do some preprocessing. We will explain how to do this in the following.
Prepare your dataset¶
Suppose you have many images that can be used for image classification. You need to organize your data in a certain manner. Here, we will explain this with another dataset, the Stanford Dogs Dataset. First, visit the official page and download images.tar (here is the direct link). Next, untar the archive and you will see a directory named Images. Inside that directory there are many subdirectories, and each subdirectory stores the images belonging to one category. For example, the directory n02099712-Labrador_retriever contains Labrador retriever images only. So if you want to use your own dataset, you need to organize your images and directories in the same way, like the following:
parent_directory
├── subdirectory_for_category_A
│ ├── image_0.jpg
│ ├── image_1.jpg
│ ├── image_2.jpg
│ ├── ...
│
├── subdirectory_for_category_B
│ ├── image_0.jpg
│ ├── ...
│
├── subdirectory_for_category_C
│ ├── image_0.jpg
│ ├── ...
│
├── subdirectory_for_category_D
│ ├── image_0.jpg
│ ├── ...
│
...
The number of images in each category can vary; they do not have to be exactly the same. Once you have arranged your dataset, you're good to go!
Create image classification dataset using NNabla CLI¶
Now that you have prepared and organized your dataset, the only thing left to do is to create a .csv file which will be used by finetuning.py. To do so, you can use NNabla's Python Command Line Interface.
Just type the following.
nnabla_cli create_image_classification_dataset -i <path to parent directory> -o <output directory which contains "preprocessed" images> -c <number of channels> -w <width> -g <height> -m <padding or trimming> -s <whether apply shuffle or not> -f1 <name of the output csv file for training data> -f2 <name of the output csv file for test data> -r2 <ratio(%) of test data to training data>
If you do that on the Stanford Dogs Dataset:
nnabla_cli create_image_classification_dataset -i Images -o arranged_images -c 3 -w 128 -g 128 -m padding -s true -f1 stanford_dog_train.csv -f2 stanford_dog_test.csv -r2 20
Note that the output .csv files will be stored in the same directory you specified with the -o option. For more information, please check the docs.
After executing the command above, you can start finetuning on your dataset.
Run finetuning¶
All you need to do is type one line.
python finetuning.py --model <model name> --train-csv <.csv file containing training data> --test-csv <.csv file containing test data>
It will execute finetuning on your dataset!
run finetuning.py --model ResNet34 --epoch 10 --train-csv ~/nnabla_data/stanford_dog_arranged/stanford_dog_train.csv --test-csv ~/nnabla_data/stanford_dog_arranged/stanford_dog_test.csv --shuffle True
An example of how to use finetuning’s result for inference¶
Once the finetuning has finished, let's use the result for inference! The script above saves the parameters at every interval you specified. So now call the same model you trained, and this time use the finetuned parameters in the following way.
from nnabla.models.imagenet import ResNet34
import nnabla as nn
import nnabla.parametric_functions as PF  # needed for the affine layer below
param_path = "params_XXX.h5" # specify the path to the saved parameter (.h5)
model = ResNet34()
batch_size = 1 # just for inference
input_shape = (batch_size, ) + model.input_shape
Then define an input Variable and a network for inference. Note that you need to construct the network exactly the same way as in the finetuning script (layer configuration, parameter names, and so on…).
x = nn.Variable(input_shape) # input Variable
pooled = model(x, use_up_to="pool", training=False)
with nn.parameter_scope("finetuning"):
with nn.parameter_scope("last_fc"):
pred = PF.affine(pooled, 120)
Load the parameters which you finetuned above. You can use nn.load_parameters() to do so. Once you call it, the parameters stored in the .h5 file will be loaded into the global scope. You can check that the parameters differ before and after nn.load_parameters() by using nn.get_parameters().
nn.load_parameters(param_path) # load the finetuned parameters.
pred.forward()
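To actually classify an image, feed a preprocessed array into x and read off the prediction. A minimal sketch, where the random array is a stand-in for a real image (a real input must be preprocessed the same way as during finetuning):
import numpy as np
x.d = np.random.rand(*input_shape).astype(np.float32)  # replace with a real preprocessed image
pred.forward(clear_buffer=True)
print("predicted label id:", int(np.argmax(pred.d, axis=1)[0]))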
Debugging¶
Deep neural networks are going deeper and deeper every year, requiring more components in the networks. Such complexity often leads us to misconfigure networks in ways that can turn out to be critical. Even if we configure a neural network correctly, we may still want to find its performance bottleneck, e.g., which layer(s) the computational bottleneck comes from.
In this debugging tutorial, we introduce the following ways to deal with such cases:
the visit method of a variable
pretty-print
a simple graph viewer
profiling utils
a value tracer
We will go over each technique, but first prepare the following reference model.
# If you run this notebook on Google Colab, uncomment and run the following to set up dependencies.
# !pip install nnabla-ext-cuda100
# !git clone https://github.com/sony/nnabla.git
# %cd nnabla/tutorial
# Python2/3 compatibility
from __future__ import print_function
from __future__ import absolute_import
from __future__ import division
import numpy as np
import nnabla as nn
import nnabla.logger as logger
import nnabla.functions as F
import nnabla.parametric_functions as PF
import nnabla.solvers as S
def block(x, maps, test=False, name="block"):
h = x
with nn.parameter_scope(name):
with nn.parameter_scope("in-block-1"):
h = PF.convolution(h, maps, kernel=(3, 3), pad=(1, 1), with_bias=False)
h = PF.batch_normalization(h, batch_stat=not test)
h = F.relu(h)
with nn.parameter_scope("in-block-2"):
h = PF.convolution(h, maps // 2, kernel=(3, 3), pad=(1, 1), with_bias=False)
h = PF.batch_normalization(h, batch_stat=not test)
h = F.relu(h)
with nn.parameter_scope("in-block-3"):
h = PF.convolution(h, maps, kernel=(3, 3), pad=(1, 1), with_bias=False)
h = PF.batch_normalization(h, batch_stat=not test)
if h.shape[1] != x.shape[1]:
with nn.parameter_scope("skip"):
s = PF.convolution(x, maps, kernel=(3, 3), pad=(1, 1), with_bias=False)
s = PF.batch_normalization(s, batch_stat=not test)
return F.relu(h + s)
def network(x, maps=16, test=False):
h = x
h = PF.convolution(h, maps, kernel=(3, 3), pad=(1, 1), name="first-conv", with_bias=False)
h = PF.batch_normalization(h, batch_stat=not test, name="first-bn")
h = F.relu(h)
for l in range(4):
h = block(h, maps * 2 ** (l + 1), name="block-{}".format(l))
h = F.max_pooling(h, (2, 2))
h = F.average_pooling(h, h.shape[2:])
pred = PF.affine(h, 100, name="pred")
return pred
Visit Method¶
The visit method of a variable takes a lambda, function, or callable object as an argument and calls it on all the NNabla functions that the variable can traverse, in forward order. The usage is easier to see than to explain.
First of all, define the callable class.
class PrintFunc(object):
def __call__(self, nnabla_func):
print("==========")
print(nnabla_func.info.type_name)
print(nnabla_func.inputs)
print(nnabla_func.outputs)
print(nnabla_func.info.args)
This callable object takes an NNabla function, e.g., convolution, relu, etc., so a user can get information about that function.
nn.clear_parameters() # clear parameters in case you run the following code again
x = nn.Variable.from_numpy_array(np.random.randn(*[4, 3, 128, 128]))
pred = network(x)
pred.visit(PrintFunc())
This is the low-level API for inspecting graph information by hand, in whatever way you want.
PPrint¶
The pprint method is one instantiation of the visit method. It lets us see the graph structure in topological (forward) order in detail. Here is an example of inspecting the detailed information of a graph.
nn.clear_parameters() # call this in case you want to run the following code again
x = nn.Variable.from_numpy_array(np.random.randn(*[4, 3, 128, 128]))
pred = network(x)
# pprint
from nnabla.utils.inspection import pprint
pprint(pred, summary=True, forward=True, backward=True)
Simple Graph Viewer¶
The visit method is very useful for getting information about each function used in a graph, but it is hard to see the details of the whole network structure, e.g., which variable is connected to which. So we have a graph viewer that visually shows the whole structure of the network, enabling us to debug more efficiently. Using this graph viewer is straightforward, as shown in the following code:
nn.clear_parameters() # call this in case you want to run the following code again
x = nn.Variable([4, 3, 128, 128])
pred = network(x)
import nnabla.experimental.viewers as V
graph = V.SimpleGraph(verbose=False)
graph.view(pred)
If you would like to see more detailed information, as in the visit method case, change the verbose option to True.
graph = V.SimpleGraph(verbose=True)
graph.view(pred)
Now one can see detailed information!
Note that this viewer is mainly for NNabla users who want to write code in Python; for those who would like a more polished network visualization to interact with, please use Neural Network Console and visit https://dl.sony.com/.
Profiling Utils¶
Basically, this feature is for developers who want to know the overall speed statistics and which functions could be bottlenecks. NNabla provides a simple profiling tool. Once a network is prepared, you also need the other components used to train the network, such as a loss function and a solver.
To create the profiler and see the results, run the following code.
nn.clear_parameters() # call this in case you want to run the following code again
# Context
from nnabla.ext_utils import get_extension_context
device = "cudnn"
ctx = get_extension_context(device)
nn.set_default_context(ctx)
# Network
x = nn.Variable.from_numpy_array(np.random.randn(*[4, 3, 128, 128]))
t = nn.Variable([4, 1])
pred = network(x)
loss = F.mean(F.softmax_cross_entropy(pred, t))
# Solver
solver = S.Momentum()
solver.set_parameters(nn.get_parameters())
# Profiler
from nnabla.utils.profiler import GraphProfiler
B = GraphProfiler(loss, solver=solver, device_id=0, ext_name=device, n_run=100)
B.run()
print("Profile finished.")
# Report
from nnabla.utils.profiler import GraphProfilerCsvWriter
with open("./profile.csv", "w") as f:
writer = GraphProfilerCsvWriter(B, file=f)
writer.write()
print("Report is prepared.")
You can also use TimeProfiler for profiling; it is more fine-grained in measuring execution time. With TimeProfiler, you can pass callback functions to the forward and/or backward methods in the training loop.
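If you only need a rough per-function breakdown, you can also hand-roll the timing with the generic function_pre_hook/function_post_hook arguments (the same mechanism NanInfTracer uses below). A minimal sketch, reusing the pred graph from above; note that on a GPU you would additionally need to synchronize the device inside the hooks for accurate numbers:
import time
from collections import defaultdict
elapsed = defaultdict(float)
started = {}
def timing_pre_hook(f):
    started[f] = time.time()
def timing_post_hook(f):
    # accumulate per function type, e.g. Convolution, ReLU, ...
    elapsed[f.info.type_name] += time.time() - started.pop(f)
pred.forward(function_pre_hook=timing_pre_hook, function_post_hook=timing_post_hook)
for name, t in sorted(elapsed.items(), key=lambda kv: -kv[1]):
    print("{:<24s} {:.6f} s".format(name, t))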
Value Tracer¶
We sometimes want to check whether NaN/Inf values occur. NanInfTracer is a convenient way to check whether any layer in a graph produces a NaN/Inf value.
# Create graph again just in case
nn.clear_parameters() # call this in case you want to run the following code again
# Try to switch these two
x = nn.Variable.from_numpy_array(np.random.randn(*[4, 3, 64, 64]))
#x = nn.Variable([4, 3, 64, 64])
pred = network(x)
# NanInfTracer
from nnabla.utils.inspection import NanInfTracer
nit = NanInfTracer(trace_inf=True, trace_nan=True, need_details=True)
with nit.trace():
# Try to comment either of these two or both
pred.forward(function_post_hook=nit.forward_post_hook)
pred.backward(function_post_hook=nit.backward_post_hook)
print(nit.check())
Static vs Dynamic Neural Networks in NNabla¶
NNabla allows you to define static and dynamic neural networks. Static neural networks have a fixed layer architecture, i.e., a static computation graph. In contrast, dynamic neural networks use a dynamic computation graph, e.g., randomly dropping layers for each minibatch.
This tutorial compares both computation graphs.
%matplotlib inline
import nnabla as nn
import nnabla.functions as F
import nnabla.parametric_functions as PF
import nnabla.solvers as S
import numpy as np
np.random.seed(0)
GPU = 0 # ID of GPU that we will use
2017-06-26 23:10:05,832 [nnabla][INFO]: Initializing CPU extension...
Dataset loading¶
We will first setup the digits dataset from scikit-learn:
from tiny_digits import *
digits = load_digits()
data = data_iterator_tiny_digits(digits, batch_size=16, shuffle=True)
2017-06-26 23:10:06,042 [nnabla][INFO]: DataSource with shuffle(True)
2017-06-26 23:10:06,043 [nnabla][INFO]: Using DataSourceWithMemoryCache
2017-06-26 23:10:06,044 [nnabla][INFO]: DataSource with shuffle(True)
2017-06-26 23:10:06,044 [nnabla][INFO]: On-memory
2017-06-26 23:10:06,045 [nnabla][INFO]: Using DataIterator
Each sample in this dataset is a grayscale image of size 8x8 and belongs to one of the ten classes 0, 1, …, 9.
img, label = data.next()
print(img.shape, label.shape)
(16, 1, 8, 8) (16, 1)
Network definition¶
As an example, we define an (unnecessarily) deep CNN:
def cnn(x):
"""Unnecessarily Deep CNN.
Args:
x : Variable, shape (B, 1, 8, 8)
Returns:
y : Variable, shape (B, 10)
"""
with nn.parameter_scope("cnn"): # Parameter scope can be nested
with nn.parameter_scope("conv1"):
h = F.tanh(PF.batch_normalization(
PF.convolution(x, 64, (3, 3), pad=(1, 1))))
for i in range(10): # unnecessarily deep
with nn.parameter_scope("conv{}".format(i + 2)):
h = F.tanh(PF.batch_normalization(
PF.convolution(h, 128, (3, 3), pad=(1, 1))))
with nn.parameter_scope("conv_last"):
h = F.tanh(PF.batch_normalization(
PF.convolution(h, 512, (3, 3), pad=(1, 1))))
h = F.average_pooling(h, (2, 2))
with nn.parameter_scope("fc"):
h = F.tanh(PF.affine(h, 1024))
with nn.parameter_scope("classifier"):
y = PF.affine(h, 10)
return y
Static computation graph¶
First, we will look at the case of a static computation graph where the neural network does not change during training.
from nnabla.ext_utils import get_extension_context
# setup cuda extension
ctx_cuda = get_extension_context('cudnn', device_id=GPU) # replace 'cudnn' by 'cpu' if you want to run the example on the CPU
nn.set_default_context(ctx_cuda)
# create variables for network input and label
x = nn.Variable(img.shape)
t = nn.Variable(label.shape)
# create network
static_y = cnn(x)
static_y.persistent = True
# define loss function for training
static_l = F.mean(F.softmax_cross_entropy(static_y, t))
2017-06-26 23:10:06,350 [nnabla][INFO]: Initializing CUDA extension...
2017-06-26 23:10:06,571 [nnabla][INFO]: Initializing cuDNN extension...
Set up the solver for training
solver = S.Adam(alpha=1e-3)
solver.set_parameters(nn.get_parameters())
Create data iterator
loss = []
def epoch_end_callback(epoch):
global loss
print("[{} {} {}]".format(epoch, np.mean(loss), itr))
loss = []
data = data_iterator_tiny_digits(digits, batch_size=16, shuffle=True)
data.register_epoch_end_callback(epoch_end_callback)
2017-06-26 23:10:07,221 [nnabla][INFO]: DataSource with shuffle(True)
2017-06-26 23:10:07,224 [nnabla][INFO]: Using DataSourceWithMemoryCache
2017-06-26 23:10:07,226 [nnabla][INFO]: DataSource with shuffle(True)
2017-06-26 23:10:07,228 [nnabla][INFO]: On-memory
2017-06-26 23:10:07,230 [nnabla][INFO]: Using DataIterator
Perform training iterations and output training loss:
%%time
for epoch in range(30):
itr = 0
while data.epoch == epoch:
x.d, t.d = data.next()
static_l.forward(clear_no_need_grad=True)
solver.zero_grad()
static_l.backward(clear_buffer=True)
solver.update()
loss.append(static_l.d.copy())
itr += 1
print()
[ 0 0.909297 112 ] [ 1 0.183863 111 ] [ 2 0.0723054 111 ] [ 3 0.0653021 112 ] [ 4 0.0628503 111 ] [ 5 0.0731626 111 ] [ 6 0.0319093 112 ] [ 7 0.0610926 111 ] [ 8 0.0817437 111 ] [ 9 0.0717577 112 ] [ 10 0.0241882 111 ] [ 11 0.0119452 111 ] [ 12 0.00664761 112 ] [ 13 0.00377711 111 ] [ 14 0.000605656 111 ] [ 15 0.000236613 111 ] [ 16 0.000174549 112 ] [ 17 0.000142428 111 ] [ 18 0.000126015 111 ] [ 19 0.000111144 112 ] [ 20 0.000100751 111 ] [ 21 9.03808e-05 111 ] [ 22 8.35904e-05 112 ] [ 23 7.73492e-05 111 ] [ 24 6.91389e-05 111 ] [ 25 6.74929e-05 112 ] [ 26 6.08386e-05 111 ] [ 27 5.62182e-05 111 ] [ 28 5.33428e-05 112 ] [ 29 4.94594e-05 111 ]
CPU times: user 14.3 s, sys: 6.78 s, total: 21.1 s
Wall time: 21.1 s
Dynamic computation graph¶
Now, we will use a dynamic computation graph, where the neural network is set up each time we want to do a forward/backward pass through it. This allows us, e.g., to randomly drop layers out or to have network architectures that depend on the input data. In this example, for simplicity, we use the same neural network structure and only create it dynamically. For example, adding an if np.random.rand() > dropout_probability: guard inside cnn() allows us to drop layers; a minimal sketch of such a modification follows.
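A minimal sketch, with the hypothetical knob dropout_probability (this variant is not used in the timings below):
def cnn_dynamic(x, dropout_probability=0.2):
    # Like cnn(), but each middle conv block is kept or dropped at random per minibatch.
    with nn.parameter_scope("cnn"):
        with nn.parameter_scope("conv1"):
            h = F.tanh(PF.batch_normalization(
                PF.convolution(x, 64, (3, 3), pad=(1, 1))))
        for i in range(10):
            if np.random.rand() > dropout_probability:  # otherwise skip this block
                with nn.parameter_scope("conv{}".format(i + 2)):
                    h = F.tanh(PF.batch_normalization(
                        PF.convolution(h, 128, (3, 3), pad=(1, 1))))
        with nn.parameter_scope("classifier"):
            y = PF.affine(h, 10)
    return y
Note that a block's parameters are created only the first time the block is actually built, which is why the training loop below re-registers parameters with solver.set_parameters(..., reset=False, retain_state=True).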
First, we set up the solver and the data iterator for the training:
nn.clear_parameters()
solver = S.Adam(alpha=1e-3)
solver.set_parameters(nn.get_parameters())
loss = []
def epoch_end_callback(epoch):
global loss
print("[{} {} {}]".format(epoch, np.mean(loss), itr))
loss = []
data = data_iterator_tiny_digits(digits, batch_size=16, shuffle=True)
data.register_epoch_end_callback(epoch_end_callback)
2017-06-26 23:10:28,449 [nnabla][INFO]: DataSource with shuffle(True)
2017-06-26 23:10:28,450 [nnabla][INFO]: Using DataSourceWithMemoryCache
2017-06-26 23:10:28,450 [nnabla][INFO]: DataSource with shuffle(True)
2017-06-26 23:10:28,451 [nnabla][INFO]: On-memory
2017-06-26 23:10:28,451 [nnabla][INFO]: Using DataIterator
%%time
for epoch in range(30):
itr = 0
while data.epoch == epoch:
x.d, t.d = data.next()
with nn.auto_forward():
dynamic_y = cnn(x)
dynamic_l = F.mean(F.softmax_cross_entropy(dynamic_y, t))
solver.set_parameters(nn.get_parameters(), reset=False, retain_state=True) # this can be done dynamically
solver.zero_grad()
dynamic_l.backward(clear_buffer=True)
solver.update()
loss.append(dynamic_l.d.copy())
itr += 1
print()
[ 0 1.04669 112 ] [ 1 0.151949 111 ] [ 2 0.093581 111 ] [ 3 0.129242 112 ] [ 4 0.0452591 111 ] [ 5 0.0343987 111 ] [ 6 0.0315372 112 ] [ 7 0.0336886 111 ] [ 8 0.0194571 111 ] [ 9 0.00923094 112 ] [ 10 0.00536065 111 ] [ 11 0.000669383 111 ] [ 12 0.000294232 112 ] [ 13 0.000245866 111 ] [ 14 0.000201116 111 ] [ 15 0.000164177 111 ] [ 16 0.00014832 112 ] [ 17 0.000131479 111 ] [ 18 0.000115171 111 ] [ 19 0.000101432 112 ] [ 20 9.06228e-05 111 ] [ 21 8.7103e-05 111 ] [ 22 7.79601e-05 112 ] [ 23 7.59678e-05 111 ] [ 24 6.64341e-05 111 ] [ 25 6.22717e-05 112 ] [ 26 5.8643e-05 111 ] [ 27 5.35373e-05 111 ] [ 28 4.96717e-05 112 ] [ 29 4.65124e-05 111 ]
CPU times: user 23.4 s, sys: 5.35 s, total: 28.7 s
Wall time: 28.7 s
Comparing the two processing times, we can observe that the dynamic scheme is somewhat slower than the static one in this run (28.7 s vs. 21.1 s wall time): creating the computation graph at every iteration adds overhead, but training still proceeds correctly and the performance loss is moderate.
Graph Converters¶
As neural networks become more complex and act as one component in a larger system, we sometimes want to convert a network to suit our needs. A typical use case is inference: we want to merge or change some layers in a network as a high-level optimization for inference speed. There are also other use cases: adding new layers to keep track of some statistics, adding quantize/dequantize layers for quantized inference, decomposing a layer into a combination of low-rank ones, changing a network architecture for neural architecture search based on an original architecture, changing the tensor format from channel-first to channel-last and vice versa, and so on.
Let's look at two simple cases: 1. batch normalization folding, and 2. channel-last conversion.
As a reference network, we use the following.
# ResNet-50 for inference
import nnabla as nn
import nnabla.functions as F
import nnabla.parametric_functions as PF
import numpy as np
from nnabla.utils.inspection import pprint
from nnabla.models.imagenet import ResNet50
model = ResNet50()
batch_size = 1
x = nn.Variable((batch_size,) + model.input_shape)
y = model(x, training=False)
Batch Normalization Folding¶
See the resnet architecture.
pprint(y)
Now we can see the batch normalization layers. For inference, we do not need to compute the batch normalization explicitly: its parameters can be folded if there is, e.g., a convolution before the batch normalization.
To fold the batch normalization, use BatchNormalizationFoldingModifier as follows.
import nnabla.experimental.graph_converters as GC
modifiers = [GC.BatchNormalizationFoldingModifier()]
gc = GC.GraphConverter(modifiers)
yy = gc.convert(y)
Again, see the resnet architecture converted.
pprint(yy)
You can see that the converted network does not contain the batch normalization any more!
In some cases we cannot fold the batch normalization into an adjacent layer, but the batch normalization can still be self-folded, i.e., its four parameters (scale, bias, running mean, running variance) can be reduced to just another scale and bias. For doing this, use BatchNormalizationSelfFoldingModifier.
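Its usage presumably mirrors the folding modifier above; a minimal sketch:
import nnabla.experimental.graph_converters as GC
modifiers = [GC.BatchNormalizationSelfFoldingModifier()]
gc = GC.GraphConverter(modifiers)
yy_self_folded = gc.convert(y)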
Channel Last Conversion¶
NVIDIA's recent GPU architectures since Volta support Tensor Cores to accelerate computational performance. To boost performance as much as possible, we need the channel-last tensor format, a.k.a. NHWC. In NNabla, the default tensor format is channel-first, a.k.a. NCHW, so to utilize Tensor Cores we need to change the tensor format to NHWC.
ChannelLastModifier converts a network in the NCHW tensor format to an equivalent network in the NHWC tensor format.
import nnabla.experimental.graph_converters as GC
modifiers = [GC.ChannelLastModifier([x])]
gc = GC.GraphConverter(modifiers)
yy = gc.convert(y)
Let’s see the resnet architecture converted.
pprint(yy)
We can see that the channel dimension has moved to the last axis!
If we want to access the inputs whose tensor format was converted:
x_cl = modifiers[0].inputs_cl[0]
print(x_cl)
Note that ChannelLastModifier supports a limited set of layers: Convolution, Deconvolution, BatchNormalization, MaxPooling, AveragePooling, SumPooling, Unpooling, and Concatenate, and it assumes the input network is in NCHW format.
There is also a ChannelFirstModifier for the opposite conversion.
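Assuming ChannelFirstModifier takes the NHWC inputs just as ChannelLastModifier takes the NCHW ones, a sketch of the reverse conversion would be:
modifiers = [GC.ChannelFirstModifier([x_cl])]
gc = GC.GraphConverter(modifiers)
y_channel_first = gc.convert(yy)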
Mixed Precision Training¶
Introduction¶
Traditionally, we used FP32 for the weights and activations when training a neural network; however, the computational cost of training has increased rapidly over the years with the success of deep learning and the growing size of neural networks. This means we need to spend much more time training a huge neural network, while we would like to run many trials before a product launch. To address this problem, companies (e.g., NVIDIA) have introduced accelerators for speeding up computation. For example, NVIDIA Volta has Tensor Cores to speed up computation.
However, this uses FP16 weights, activations, and gradients, and the range of FP16 is very limited compared to that of FP32, meaning that gradient values sometimes (or often) overflow and/or underflow, which hurts the performance of a neural network or makes it collapse during training.
Mixed precision training is one of the algorithms that circumvent this problem while maintaining the same results that we could obtain with FP32 networks. It is well described in the Training with Mixed Precision User Guide and Mixed Precision Training.
This tutorial explains how to do the mixed precision training in NNabla step-by-step.
Step-by-Step Instruction¶
Basically, mixed precision training is composed of three parts.
Use the accelerator for computation (here we assume Tensor Cores)
Use loss scaling to prevent underflow
Use dynamic loss scaling to prevent overflow/underflow
In NNabla, we can do the correspondences as follows.
1. Use Tensor Cores¶
ctx = get_extension_context("cudnn", type_config="half")
2. Use loss scaling to prevent underflow¶
loss_scale = 8
loss.backward(loss_scale)
solver.scale_grad(1. / loss_scale) # do some gradient clipping, etc. after this
solver.update()
3. Use dynamic loss scaling to prevent overflow/underflow¶
loss_scale = 8
scaling_factor = 2
counter = 0
interval = 2000
...
loss.backward(loss_scale, ...)
...
if solver.check_inf_or_nan_grad():
loss_scale /= scaling_factor
counter = 0
else:
solver.scale_grad(1. / loss_scale) # do some gradient clipping, etc. after this
solver.update()
if counter > interval:
loss_scale *= scaling_factor
counter = 0
counter += 1
Note that the 2nd (loss scaling to prevent underflow) and 3rd (dynamic loss scaling to prevent overflow/underflow) procedures are currently experimental, and since we are now working on speeding up mixed precision training, the API might change in the future, especially for the 3rd.
All-in-one Instruction¶
In the previous step-by-step example, the 3rd step makes the training loop lengthy, so we can write a wrapper class like the following.
class DynamicLossScalingUpdater(object):
'''Dynamic Loss Scaling Updater for the mixed precision training.
Args:
solver (:obj:`nnabla.solvers.Solver`): Solver object. E.g., Momentum or Adam.
loss (:obj:`nnabla.Variable`): Loss variable from which the forward and the backward is called.
data_feeder (callable :obj:`object`, function, or lambda): Data feeder
scale (:obj:`float`): Loss scale constant. This is dynamically changing during training.
scaling_factor (:obj:`float`): Scaling factor for the dynamic loss scaling.
N (:obj:`int`): Interval, the number of iterations in training for increasing `loss scale` by `scaling_factor`.
clear_buffer (:obj:`bool`): Clears the no longer referenced variables during backpropagation to save memory.
accum_grad (:obj:`int`): Number of accumulation of gradients. Update method of the `solver` is called after the `accum_grad` number of the forward and backward is called.
weight_decay (:obj:`float`): Decay constant. Default is `None`, not applying the weight decay.
comm (:obj:`nnabla.communicators.Communicator`): Communicator when to do distributed training. Default is :obj:`None`.
grads (:obj:`list` of :obj:`nnabla._nd_array.NdArray`): The list of gradients to be exchanged when to do distributed training. Default is the empty :obj:`list`.
Attributes:
solver (:obj:`nnabla.solvers.Solver`): Solver object. E.g., Momentum or Adam.
loss (:obj:`nnabla.Variable`): Loss variable from which the forward and the backward is called.
data_feeder (callable :obj:`object`, function, lambda): Data feeder
scale (:obj:`float`): Loss scale constant. This is dynamically changing during training.
scaling_factor (:obj:`float`): Scaling factor for the dynamic loss scaling.
N (:obj:`int`): Interval, the number of iterations in training for increasing `loss scale` by `scaling_factor`.
clear_buffer (:obj:`bool`): Clears the no longer referenced variables during backpropagation to save memory.
accum_grad (:obj:`int`): Number of accumulation of gradients. Update method of the `solver` is called after the `accum_grad` number of the forward and backward is called.
weight_decay (:obj:`float`): Decay constant. Default is `None`, not applying the weight decay.
comm (:obj:`nnabla.communicators.Communicator`): Communicator when to do distributed training.
grads (:obj:`list` of :obj:`nnabla._nd_array.NdArray`): The list of gradients to be exchanged when to do distributed training.
Example:
.. code-block:: python
solver = <Solver>
loss = <Loss Variable of Network>
data_feeder = <DataFeeder>
updater = DynamicLossScalingUpdater(solver, loss, data_feeder)
# Training iteration
for itr in range(max_iter):
# Call solver.zero_grad, data_feeder, loss.forward, loss.backward
# and solver.update with the dynamic loss scaling.
updater.update()
Reference:
https://docs.nvidia.com/deeplearning/sdk/mixed-precision-training/index.html#scalefactor
'''
def __init__(self, solver, loss, data_feeder=lambda x: x,
scale=8.0, scaling_factor=2.0, N=2000, clear_buffer=True,
accum_grad=1, weight_decay=None,
comm=None,
grads=[]):
self.solver = solver
self.loss = loss
self.data_feeder = data_feeder
self.scale = scale
self.scaling_factor = scaling_factor
self.N = N
self.clear_buffer = clear_buffer
self.accum_grad = accum_grad
self.weight_decay = weight_decay
self.comm = comm
self.grads = grads
self._counter = 0
self._recursive_count = 0
self._max_recursive_count = 100
def update(self):
"""Monolithic update method.
This method calls the following methods with the dynamic loss scaling.
1. solver.zero_grad
2. feed data
3. loss.forward
4. loss.backward
5. comm.all_reduce (if it is specified)
6. solver.update
"""
# Initialize gradients.
self.solver.zero_grad()
# Forward and backward
for _ in range(self.accum_grad):
# feed data
self.data_feeder()
# forward
self.loss.forward(clear_no_need_grad=self.clear_buffer)
# backward with scale
self.loss.backward(self.scale, clear_buffer=self.clear_buffer)
# AllReduce
if self.comm and len(self.grads) != 0:
self.comm.all_reduce(self.grads, division=False, inplace=False)
# Check Inf/NaN in grads
if self.solver.check_inf_or_nan_grad():
self.scale /= self.scaling_factor
self._counter = 0
# Recursively call the update function until there is no inf nor nan.
self._recursive_count += 1
if self._recursive_count > self._max_recursive_count:
self._recursive_count = 0
return # skip
return self.update()
self._recursive_count = 0
# Rescale grads
self.solver.scale_grad(1. / self.scale)
# Do some gradient clipping, etc.
if self.weight_decay is not None:
self.solver.weight_decay(self.weight_decay)
# Update
self.solver.update()
if self._counter > self.N:
self.scale *= self.scaling_factor
self._counter = 0
self._counter += 1
Then, call the update method in a training loop:
from nnabla.experimental.mixed_precision_training import DynamicLossScalingUpdater
solver = <Solver>
loss = <Loss Variable of Network>
data_feeder = <DataFeeder>
updater = DynamicLossScalingUpdater(solver, loss, data_feeder)
# Training iteration
for itr in range(max_iter):
# Call solver.zero_grad, data_feeder, loss.forward, loss.backward
# and solver.update with the dynamic loss scaling.
updater.update()
Notice¶
In mixed-precision training, the following are assumed:
The solver contains FP16 weights and an FP32 copy of the weights. Solvers in NNabla hold FP32 weights and weight gradients, and cast them to FP16 weights in the forward pass and to FP16 weight gradients in the backward pass if one sets type_config="half".
Reductions should be left in FP32; for example, the statistics (mean and variance) computed by batch normalization, Mean, Sum, SoftMax, SoftMaxCrossEntropy, etc. (see The Training with Mixed Precision User Guide). In NNabla, these functions automatically fall back to FP32.
Data Parallel Distributed Training¶
DataParallelCommunicator enables you to train your neural network using multiple devices. It is normally used for gradient exchange in data parallel distributed training. Basically, there are two types of distributed training in the neural network literature: Data Parallel and Model Parallel. Here we only focus on the former, Data Parallel training. Data Parallel distributed training is based on the very simple equation used for the optimization of a neural network, called (mini-batch) Stochastic Gradient Descent.
In the optimization process, the objective one tries to minimize is

\[
\min_{\mathbf{w}} \frac{1}{B \times N} \sum_{i=1}^{B \times N} \ell(\mathbf{w}, \mathbf{x}_i),
\]

where \(\ell\) is a loss function of a neural network \(f\) for each data point \(\mathbf{x} \in X\), \(B \times N\) is the batch size, and \(\mathbf{w}\) is the trainable parameter of the neural network.
When taking the derivative of this objective, one gets

\[
\nabla_{\mathbf{w}} \frac{1}{B \times N} \sum_{i=1}^{B \times N} \ell(\mathbf{w}, \mathbf{x}_i)
= \frac{1}{B \times N} \sum_{i=1}^{B \times N} \nabla_{\mathbf{w}} \ell(\mathbf{w}, \mathbf{x}_i).
\]
Since the derivative is linear, one can rewrite this as a sum of \(N\) summations, each of which is the average of derivatives over \(B\) data points:

\[
\frac{1}{N} \left(
\frac{1}{B} \sum_{i=1}^{B} \nabla_{\mathbf{w}} \ell(\mathbf{w}, \mathbf{x}_i)
+ \frac{1}{B} \sum_{i=B+1}^{2B} \nabla_{\mathbf{w}} \ell(\mathbf{w}, \mathbf{x}_i)
+ \dots
+ \frac{1}{B} \sum_{i=(N-1)B+1}^{NB} \nabla_{\mathbf{w}} \ell(\mathbf{w}, \mathbf{x}_i)
\right).
\]
In data parallel distributed training, the following steps are performed according to the above equation:
each term, the summation of derivatives (gradients) divided by the batch size \(B\), is computed on a separate device (typically a GPU),
take the sum over devices,
divide the result by the number of devices, \(N\).
That is the underlying foundation of Data Parallel distributed training; a small numeric sanity check follows.
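Here is a plain numpy sketch (with hypothetical per-sample gradients) verifying that the three steps reproduce the global minibatch average:
import numpy as np
B, N = 4, 2                     # per-device batch size, number of devices
g = np.random.randn(B * N)      # stand-in for per-sample gradients
per_device = [g[i * B:(i + 1) * B].mean() for i in range(N)]  # step 1 on each device
result = sum(per_device) / N                                  # steps 2 and 3
assert np.isclose(result, g.mean())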
This tutorial shows the usage of Multi Process Data Parallel Communicator for data parallel distributed training with a very simple example.
NOTE¶
This tutorial depends on IPython Cluster, so if you want to run the following excerpts of the scripts on Jupyter Notebook, follow this to enable mpiexec/mpirun mode, then launch a corresponding IPython Cluster on the IPython Clusters tab.
Launch client¶
This code is only needed for this tutorial via Jupyter Notebook.
import ipyparallel as ipp
rc = ipp.Client(profile='mpi')
Prepare the dependencies¶
%%px
import os
import time
import nnabla as nn
import nnabla.communicators as C
from nnabla.ext_utils import get_extension_context
import nnabla.functions as F
from nnabla.initializer import (
calc_uniform_lim_glorot,
UniformInitializer)
import nnabla.parametric_functions as PF
import nnabla.solvers as S
import numpy as np
Define the communicator for gradient exchange.¶
%%px
extension_module = "cudnn"
ctx = get_extension_context(extension_module)
comm = C.MultiProcessCommunicator(ctx)
comm.init()
n_devices = comm.size
mpi_rank = comm.rank
device_id = mpi_rank
ctx = get_extension_context(extension_module, device_id=device_id)
Check that different ranks are assigned to different devices
%%px
print("n_devices={}".format(n_devices))
print("mpi_rank={}".format(mpi_rank))
[stdout:0]
n_devices=2
mpi_rank=1
[stdout:1]
n_devices=2
mpi_rank=0
Create data points and a very simple neural network¶
%%px
# Data points setting
n_class = 2
b, c, h, w = 4, 1, 32, 32
# Data points
x_data = np.random.rand(b, c, h, w)
y_data = np.random.choice(n_class, b).reshape((b, 1))
x = nn.Variable(x_data.shape)
y = nn.Variable(y_data.shape)
x.d = x_data
y.d = y_data
# Network setting
C = 1  # number of output channels (note: this shadows the communicators alias C imported above)
kernel = (3, 3)
pad = (1, 1)
stride = (1, 1)
%%px
rng = np.random.RandomState(0)
w_init = UniformInitializer(
calc_uniform_lim_glorot(C, C/2, kernel=(1, 1)),
rng=rng)
%%px
# Network
with nn.context_scope(ctx):
h = PF.convolution(x, C, kernel, pad, stride, w_init=w_init)
pred = PF.affine(h, n_class, w_init=w_init)
loss = F.mean(F.softmax_cross_entropy(pred, y))
An important point here is that w_init is passed to the parametric functions so that the network on each GPU starts from the same values of the trainable parameters in the optimization process.
Create a solver.¶
%%px
# Solver and add parameters
solver = S.Adam()
solver.set_parameters(nn.get_parameters())
Training¶
Recall the basic usage of the nnabla API for training a neural network; it is
loss.forward()
solver.zero_grad()
loss.backward()
solver.update()
When using C.MultiProcessCommunicator, these steps are performed on different GPUs, and the only difference from the steps above is the addition of comm.all_reduce(). Thus, with C.MultiProcessCommunicator the training steps are as follows:
loss.forward()
solver.zero_grad()
loss.backward()
comm.all_reduce([x.grad for x in nn.get_parameters().values()])
solver.update()
First, forward, zero_grad, and backward,
%%px
# Training steps
loss.forward()
solver.zero_grad()
loss.backward()
Check gradients of weights once,
%%px
for n, v in nn.get_parameters().items():
print(n, v.g)
[stdout:0]
('conv/W', array([[[[ 5.0180483, 0.457942 , -2.8701296],
[ 2.0715926, 3.0698593, -1.6650047],
[-2.5591214, 6.4248834, 9.881935 ]]]], dtype=float32))
('conv/b', array([8.658947], dtype=float32))
('affine/W', array([[-0.93160367, 0.9316036 ],
[-1.376812 , 1.376812 ],
[-1.8957546 , 1.8957543 ],
...,
[-0.33000934, 0.33000934],
[-0.7211893 , 0.72118926],
[-0.25237036, 0.25237036]], dtype=float32))
('affine/b', array([-0.48865744, 0.48865741], dtype=float32))
[stdout:1]
('conv/W', array([[[[ -1.2505884 , -0.87151337, -8.685524 ],
[ 10.738419 , 14.676786 , 7.483423 ],
[ 5.612471 , -12.880402 , 19.141157 ]]]], dtype=float32))
('conv/b', array([13.196114], dtype=float32))
('affine/W', array([[-1.6865108 , 1.6865108 ],
[-0.938529 , 0.938529 ],
[-1.028422 , 1.028422 ],
...,
[-0.98217344, 0.98217344],
[-0.97528917, 0.97528917],
[-0.413546 , 0.413546 ]], dtype=float32))
('affine/b', array([-0.7447065, 0.7447065], dtype=float32))
You can see the different values on each device; then call all_reduce:
%%px
comm.all_reduce([x.grad for x in nn.get_parameters().values()], division=True)
Commonly, all_reduce means only the sum; however, comm.all_reduce addresses both cases: summation, and summation followed by division.
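As a plain illustration of the two semantics (hypothetical values, not communicator code):
import numpy as np
g0, g1 = np.array([2.0, 4.0]), np.array([6.0, 8.0])  # gradients on devices 0 and 1
print(g0 + g1)        # division=False: every device ends up holding the sum
print((g0 + g1) / 2)  # division=True: the sum divided by the number of devices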
Again, check gradients of weights,
%%px
for n, v in nn.get_parameters().items():
print(n, v.g)
[stdout:0]
('conv/W', array([[[[ 1.8837299 , -0.20678568, -5.777827 ],
[ 6.4050055 , 8.8733225 , 2.9092093 ],
[ 1.5266749 , -3.2277591 , 14.511546 ]]]], dtype=float32))
('conv/b', array([21.85506], dtype=float32))
('affine/W', array([[-2.6181145, 2.6181145],
[-2.315341 , 2.315341 ],
[-2.9241767, 2.9241762],
...,
[-1.3121828, 1.3121828],
[-1.6964785, 1.6964784],
[-0.6659163, 0.6659163]], dtype=float32))
('affine/b', array([-1.233364 , 1.2333639], dtype=float32))
[stdout:1]
('conv/W', array([[[[ 1.8837299 , -0.20678568, -5.777827 ],
[ 6.4050055 , 8.8733225 , 2.9092093 ],
[ 1.5266749 , -3.2277591 , 14.511546 ]]]], dtype=float32))
('conv/b', array([21.85506], dtype=float32))
('affine/W', array([[-2.6181145, 2.6181145],
[-2.315341 , 2.315341 ],
[-2.9241767, 2.9241762],
...,
[-1.3121828, 1.3121828],
[-1.6964785, 1.6964784],
[-0.6659163, 0.6659163]], dtype=float32))
('affine/b', array([-1.233364 , 1.2333639], dtype=float32))
You can see the same values across the devices because of all_reduce.
Update weights,
%%px
solver.update()
This concludes the usage of C.MultiProcessCommunicator for Data Parallel distributed training.
Now that you have an understanding of how to use C.MultiProcessCommunicator, go to the cifar10 example:
multi_device_multi_process_classification.sh
multi_device_multi_process_classification.py
for more details.
Function list and converter¶
nnabla_cli is the command line interface of nnabla. With this command line interface, users can check the current NNabla support status and find out whether or how to convert an nnabla model (e.g., *.nnp) to another model format (e.g., *.onnx).
The subcommand function_info provides a set of functions that output information about the implemented functions. With this information, you may build a tailored nnabla-c-runtime library for your model, or skip functions that are unsupported for the target model.
Some simple use cases¶
Let us introduce some simple use cases.
First, suppose you want to know how many functions (and which functions) nnabla currently supports:
$ nnabla_cli function_info
You get the following list:
2019-06-14 16:16:13,106 [nnabla][INFO]: Initializing CPU extension...
NNabla command line interface (Version:1.0.18.dev1, Build:190531084842)
LSTM
Sub2
Mul2
GreaterEqual
Sigmoid
NotEqual
Unpooling
Log
CategoricalCrossEntropy
...
That is the list of all functions nnabla currently supports. Only function names are shown, with no further detail; the list is only for looking up a certain function by name. For the details of each function, you have to check the online documentation.
As you know, an nnabla model *.nnp can be converted to a compact version that has the postfix .nnb and can be used for inference by the nnabla-c-runtime library. We simply name this format NNB. To know how many functions are supported in this format, you may use this command:
$ nnabla_cli function_info -f NNB
Similar to the above, a function list is shown.
Can we simply list the functions used in a .nnp model? Yes, of course.
$ nnabla_cli function_info my_model.nnp
Similar to the above, the list of functions used in this model is shown.
Then, we may want to know whether our model can be converted to the nnabla-c-runtime model format. Formally speaking, we want to know the intersection of two function sets: the set of functions used in the .nnp and the set that nnabla-c-runtime supports.
$ nnabla_cli function_info my_model.nnp -f NNB
The output looks like:
2019-06-14 17:01:29,393 [nnabla][INFO]: Initializing CPU extension...
NNabla command line interface (Version:1.0.18.dev1, Build:190531084842)
Importing mnist_nnp/lenet_010000.nnp
Expanding runtime.
nnabla-c-runtime currently support the following functions in model:
Convolution
MulScalar
Affine
MaxPooling
ReLU
...
Unsupported functions are also listed if there are any in the model.
Tailored nnabla-c-runtime library¶
When implementing the nnabla-c-runtime library, we hope to implement all the functions we can. But from the user's perspective, that is sometimes unnecessary. If a user only wants to use nnabla-c-runtime for a fixed set of models, the nnabla-c-runtime library should be tailored to exactly what these models require. How is that done?
It can be implemented with the following steps:
generate function list
config your nnabla-c-runtime library
build nnabla-c-runtime library
1. Generate function list¶
$ nnabla_cli function_info my_model.nnp -f NNB -o functions.txt
This is similar to the above, except for the -o parameter, which specifies the file the list should be written to. (Of course, the format is different from the version output to stdout; it is more compact.)
2. Config your nnabla-c-runtime library¶
You may manually modify functions.txt. This file is then used as the input to generate the nnabla-c-runtime library's config file:
$ nnabla_cli function_info -c functions.txt -o nnabla-c-runtime/build-tools/code-generator/functions.yaml
If there is no -c parameter, the full function set will be used to generate this config file, and the library will then contain all implemented functions. This is the default behavior.
3. Build nnabla-c-runtime library¶
The build process is relatively straightforward, as follows:
#> nnabla-c-runtime>mkdir build
#> nnabla-c-runtime>cd build
#> nnabla-c-runtime>cmake ..
#> nnabla-c-runtime>make
The nnabla-c-runtime library libnnablart_functions.a will then contain exactly the functions you want.
Skip unsupported functions¶
When you want to convert *.nnp to *.onnx or *.nnb, some functions may not be supported in the target function list. For example, suppose you want to convert a network to nnabla-c-runtime, and the network looks like:
Affine
Softmax
Tanh
Convolution
MaxPooling
ReLU
Suppose you do not want to use the nnabla-c-runtime library's Convolution; instead, you want to split the network into two pieces at the point of the Convolution. Two steps are needed to do so:
comment out the function in functions.txt
convert the network with the -c parameter
2. convert the network with the -c parameter¶
$ nnabla_cli convert -c functions.txt a.nnp b.nnb
Thus, the network is split into pieces; the output looks like the following:
...
LeNet_036_0_5.nnb:
input:
- name: Input
shape: (-1, 1, 28, 28)
output:
- name: Tanh_2
shape: (-1, 30, 4, 4)
LeNet_036_7_7.nnb:
input:
- name: Affine
shape: (-1, 150)
output:
- name: ReLU_2
shape: (-1, 150)
LeNet_036_9_9.nnb:
input:
- name: Affine_2
shape: (-1, 10)
output:
- name: Softmax
shape: (-1, 10)
The network is split at the Affine functions. Since there are two Affines in the network, three sub-networks are generated.
Converting to ONNX¶
The following commands do much the same as above, but targeting *.onnx.
List all functions supported:
$ nnabla_cli function_info -f ONNX
List the intersection of function sets: those used in a model and those supported by ONNX:
$ nnabla_cli function_info LeNet_036.nnp -f ONNX
Split the network to skip some functions:
$ nnabla_cli convert -c functions.txt a.nnp a.onnx
Python Command Line Interface¶
NNabla has a command line interface utility which can train, run forward (inference) passes, convert parameters and datasets, measure performance, convert file formats, and so on.
usage: nnabla_cli [-h] [-m]
{train,infer,forward,encode_param,decode_param,profile,conv_dataset,compare_with_cpu,create_image_classification_dataset,upload,create_tar,function_info,optimize,dump,nnb_template,convert,plot_series,plot_timer,draw_graph,version}
...
Command line interface for NNabla(Version 1.0.11.dev1, Build 181226024531)
positional arguments:
{train,infer,forward,encode_param,decode_param,profile,conv_dataset,compare_with_cpu,create_image_classification_dataset,upload,create_tar,function_info,optimize,dump,nnb_template,convert,plot_series,plot_timer,draw_graph,version}
train Training with NNP.
infer Do inference with NNP and binary data file input.
forward Do evaluation with NNP and test dataset.
encode_param Encode plain text to parameter format.
decode_param Decode parameter to plain text.
profile Profiling performance with NNP.
conv_dataset Convert CSV dataset to cache.
compare_with_cpu Compare performance between two nntxt.
create_image_classification_dataset
Create dataset from image files.
upload Upload dataset to Neural Network Console.
create_tar Create tar file for Neural Network Console.
function_info Output function info.
optimize Optimize pb model.
dump Dump network with supported format.
nnb_template Generate NNB config file template.
convert File format converter.
plot_series Plot *.series.txt files.
plot_timer Plot *.timer.txt files.
draw_graph Draw a graph in a NNP or nntxt file with graphviz.
version Print version and build number.
optional arguments:
-h, --help show this help message and exit
-m, --mpi exec with mpi.
Work with NNP¶
Training¶
usage: nnabla_cli train [-h] -c CONFIG [-p PARAM] -o OUTDIR
optional arguments:
-h, --help show this help message and exit
-c CONFIG, --config CONFIG
path to nntxt
-p PARAM, --param PARAM
path to parameter file
-o OUTDIR, --outdir OUTDIR
output directory
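For example (file names here are hypothetical):
$ nnabla_cli train -c net.nntxt -o result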
Profile¶
usage: nnabla_cli profile [-h] -c CONFIG -o OUTDIR
optional arguments:
-h, --help show this help message and exit
-c CONFIG, --config CONFIG
path to nntxt
-o OUTDIR, --outdir OUTDIR
output directory
Forward¶
usage: nnabla_cli forward [-h] -c CONFIG [-p PARAM] [-d DATASET] -o OUTDIR [-b BATCH_SIZE]
optional arguments:
-h, --help show this help message and exit
-c CONFIG, --config CONFIG
path to nntxt
-p PARAM, --param PARAM
path to parameter file
-d DATASET, --dataset DATASET
path to CSV dataset
-o OUTDIR, --outdir OUTDIR
output directory
-b BATCH_SIZE, --batch_size BATCH_SIZE
Batch size. To use the batch size defined in the NNP file, set -1.
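For example (file names here are hypothetical):
$ nnabla_cli forward -c net.nntxt -p parameters.h5 -d test.csv -o eval_result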
Inference¶
usage: nnabla_cli infer [-h] -c CONFIG [-o OUTPUT] [-p PARAM] [-b BATCH_SIZE] inputs [inputs ...]
positional arguments:
inputs
optional arguments:
-h, --help show this help message and exit
-c CONFIG, --config CONFIG
path to nntxt
-o OUTPUT, --output OUTPUT
output file prefix
-p PARAM, --param PARAM
path to parameter file
-b BATCH_SIZE, --batch_size BATCH_SIZE
Batch size. To use the batch size defined in the NNP file, set -1.
Compare with CPU¶
usage: nnabla_cli compare_with_cpu [-h] -c CONFIG -c2 CONFIG2 -o OUTDIR
optional arguments:
-h, --help show this help message and exit
-c CONFIG, --config CONFIG
path to nntxt
-c2 CONFIG2, --config2 CONFIG2
path to cpu nntxt
-o OUTDIR, --outdir OUTDIR
output directory
Dataset manipulation¶
Encode parameter¶
usage: nnabla_cli encode_param [-h] -i INDIR [-p PARAM]
optional arguments:
-h, --help show this help message and exit
-i INDIR, --indir INDIR
input directory
-p PARAM, --param PARAM
path to parameter file
Decode parameter¶
usage: nnabla_cli decode_param [-h] [-p PARAM] -o OUTDIR
optional arguments:
-h, --help show this help message and exit
-p PARAM, --param PARAM
path to parameter file
-o OUTDIR, --outdir OUTDIR
output directory
Convert dataset¶
usage: nnabla_cli conv_dataset [-h] [-F] [-S] [-N] source destination
positional arguments:
source
destination
optional arguments:
-h, --help show this help message and exit
-F, --force force overwrite destination
-S, --shuffle shuffle data
-N, --normalize normalize data range
Create image classification dataset¶
usage: nnabla_cli create_image_classification_dataset [-h] -i SOURCEDIR -o OUTDIR -c CHANNEL -w WIDTH -g HEIGHT -m MODE -s SHUFFLE -f1 FILE1 [-r1 RATIO1] [-f2 FILE2]
[-r2 RATIO2]
optional arguments:
-h, --help show this help message and exit
-i SOURCEDIR, --sourcedir SOURCEDIR
source directory with directories for each class
-o OUTDIR, --outdir OUTDIR
output directory
-c CHANNEL, --channel CHANNEL
number of output color channels
-w WIDTH, --width WIDTH
width of output image
-g HEIGHT, --height HEIGHT
height of output image
-m MODE, --mode MODE shaping mode (trimming or padding)
-s SHUFFLE, --shuffle SHUFFLE
shuffle mode (true or false)
-f1 FILE1, --file1 FILE1
output file name 1
-r1 RATIO1, --ratio1 RATIO1
output file ratio(%) 1
-f2 FILE2, --file2 FILE2
output file name 2
-r2 RATIO2, --ratio2 RATIO2
output file ratio(%) 2
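For example, the following hypothetical invocation creates a 1-channel 28x28 dataset with shuffling, writing 90% of the images to train.csv (the remainder presumably goes to test.csv):
$ nnabla_cli create_image_classification_dataset -i ./images -o ./dataset -c 1 -w 28 -g 28 -m trimming -s true -f1 train.csv -r1 90 -f2 test.csv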
Upload dataset to Neural Network Console¶
usage: nnabla_cli upload [-h] [-e ENDPOINT] token filename
positional arguments:
token token for upload
filename filename to upload
optional arguments:
-h, --help show this help message and exit
-e ENDPOINT, --endpoint ENDPOINT
set endpoint uri
Create dataset archive for Neural Network Console¶
usage: nnabla_cli create_tar [-h] source destination
positional arguments:
source CSV dataset
destination TAR filename
optional arguments:
-h, --help show this help message and exit
File format converter¶
For detailed information please see File format converter.
Dump content of supported format¶
usage: nnabla_cli dump [-h] [-v] [-F] [-V] [--dump-limit DUMP_LIMIT]
[-n DUMP_VARIABLE_NAME] [-I IMPORT_FORMAT]
[-E NNP_IMPORT_EXECUTOR_INDEX]
[--nnp-exclude-preprocess] [--nnp-no-expand-network]
FILE [FILE ...]
positional arguments:
FILE File or directory name(s) to convert.
optional arguments:
-h, --help show this help message and exit
-v, --dump-verbose [dump] verbose output.
-F, --dump-functions [dump] dump function list.
-V, --dump-variables [dump] dump variable list.
--dump-limit DUMP_LIMIT
[dump] limit num of items.
-n DUMP_VARIABLE_NAME, --dump-variable-name DUMP_VARIABLE_NAME
[dump] Specific variable name to display.
-I IMPORT_FORMAT, --import-format IMPORT_FORMAT
[import] import format. (one of [NNP,ONNX])
-E NNP_IMPORT_EXECUTOR_INDEX, --nnp-import-executor-index NNP_IMPORT_EXECUTOR_INDEX
[import][NNP] import only specified executor.
--nnp-exclude-preprocess
[import][NNP] EXPERIMENTAL exclude preprocess
functions when import.
--nnp-no-expand-network
[import][NNP] expand network with repeat or recurrent.
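For example, to dump the function and variable lists of a model (the file name is hypothetical):
$ nnabla_cli dump -F -V my_model.nnp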
Generate NNB config file template¶
usage: nnabla_cli nnb_template [-h] [-I IMPORT_FORMAT]
[--nnp-no-expand-network] [-b BATCH_SIZE]
[-T DEFAULT_VARIABLE_TYPE]
FILE [FILE ...]
positional arguments:
FILE File or directory name(s) to convert.
optional arguments:
-h, --help show this help message and exit
-I IMPORT_FORMAT, --import-format IMPORT_FORMAT
[import] import format. (one of [NNP,ONNX])
--nnp-no-expand-network
[import][NNP] expand network with repeat or recurrent.
-b BATCH_SIZE, --batch-size BATCH_SIZE
[export] overwrite batch size.
-T DEFAULT_VARIABLE_TYPE, --default-variable-type DEFAULT_VARIABLE_TYPE
Default type of variable
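For example (file names are hypothetical; the last FILE is treated as the output template):
$ nnabla_cli nnb_template my_model.nnp settings.yaml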
File format converter¶
usage: nnabla_cli convert [-h] [-I IMPORT_FORMAT] [--nnp-no-expand-network]
[-O EXPORT_FORMAT] [-f] [-b BATCH_SIZE]
[--nnp-parameter-h5] [--nnp-parameter-nntxt]
[--nnp-exclude-parameter] [-T DEFAULT_VARIABLE_TYPE]
[-s SETTINGS] [-c CONFIG] [-d DEFINE_VERSION] [--api API]
[--enable-optimize-pb] [--outputs OUTPUTS]
[--inputs INPUTS] FILE [FILE ...]
positional arguments:
FILE File or directory name(s) to convert.
(When converting a TensorFlow model in ckpt format: if the checkpoint
version is V1, pass the `.ckpt` file; otherwise pass the `.meta` file.)
optional arguments:
-h, --help show this help message and exit
-I IMPORT_FORMAT, --import-format IMPORT_FORMAT
[import] import format. (one of [NNP,ONNX,TF_CKPT_V1,TF_CKPT_V2,TF_PB,SAVED_MODEL,TFLITE])
--nnp-no-expand-network
[import][NNP] expand network with repeat or recurrent.
--outputs OUTPUTS
[import][tensorflow] The name(s) of the output nodes, comma separated.
Only needed when converting the CKPT format.
--inputs INPUTS
[import][tensorflow] The name(s) of the input nodes, comma separated.
Only needed when converting the CKPT format.
-O EXPORT_FORMAT, --export-format EXPORT_FORMAT
[export] export format. (one of [NNP,NNB,CSRC,ONNX,SAVED_MODEL,TFLITE,TF_PB];
when the export file format is 'CSRC' or 'SAVED_MODEL', the
'--export-format' argument must be set.)
-f, --force [export] overwrite output file.
-b BATCH_SIZE, --batch-size BATCH_SIZE
[export] overwrite batch size.
--nnp-parameter-h5 [export][NNP] store parameter with h5 format
--nnp-parameter-nntxt
[export][NNP] store parameter into nntxt
--nnp-exclude-parameter
[export][NNP] output without parameter
-T DEFAULT_VARIABLE_TYPE, --default-variable-type DEFAULT_VARIABLE_TYPE
Default type of variable
-s SETTINGS, --settings SETTINGS
Settings in YAML format file.
-c CONFIG, --config CONFIG
[export] config target function list.
-d DEFINE_VERSION, --define_version
[export][ONNX] define onnx opset version. e.g. opset_6
[export][ONNX] define convert to onnx for SNPE. e.g. opset_snpe
[export][ONNX] define convert to onnx for TensorRT. e.g. opset_tensorrt
[export][NNB] define binary format version. e.g. nnb_3
--api API [export][NNB] Set API Level to convert to, default is highest API Level.
--enable-optimize-pb [export][tensorflow] enable optimization when export to pb.
--channel_last [export][TFLite] Specify the data_format of the NNP network;
the default data_format is channel_first.
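For example, to convert an NNP model to ONNX with a specific opset (file names are hypothetical):
$ nnabla_cli convert -d opset_6 my_model.nnp my_model.onnx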
Optimize pb model¶
usage: nnabla_cli optimize [-h] input_pb_file output_pb_file
positional arguments:
input_pb_file Input pre-optimized pb model.
output_pb_file Output optimized pb model.
Plot Monitor class output files¶
Note:
Plotting subcommands require the matplotlib package.
By default, the following commands show a plot on your display using a matplotlib backend rendering engine that depends on your environment. If you want to save a plot as an image or as vector data, use the -o option to specify a file name where the plot is saved.
MonitorSeries¶
usage: nnabla_cli plot_series [-h] [-l LABEL] [-o OUTFILE] [-x XLABEL]
[-y YLABEL] [-t TITLE] [-T YLIM_MAX]
[-B YLIM_MIN] [-R XLIM_MAX] [-L XLIM_MIN]
infile [infile ...]
Plot *.series.txt files produced by nnabla.monitor.MonitorSeries class.
Example:
nnabla_cli plot_series -x "Epochs" -y "Squared error loss" -T 10 -l "config A" -l "config B" result_a/Training-loss.series.txt result_b/Training-loss.series.txt
positional arguments:
infile Path to input file.
optional arguments:
-h, --help show this help message and exit
-l LABEL, --label LABEL
Label of each plot.
-o OUTFILE, --outfile OUTFILE
Path to output file.
-x XLABEL, --xlabel XLABEL
X-axis label of plot.
-y YLABEL, --ylabel YLABEL
Y-axis label of plot.
-t TITLE, --title TITLE
Title of plot.
-T YLIM_MAX, --ylim-max YLIM_MAX
Y-axis plot range max.
-B YLIM_MIN, --ylim-min YLIM_MIN
Y-axis plot range min.
-R XLIM_MAX, --xlim-max XLIM_MAX
X-axis plot range max.
-L XLIM_MIN, --xlim-min XLIM_MIN
X-axis plot range min.
MonitorTimeElapsed¶
usage: nnabla_cli plot_timer [-h] [-l LABEL] [-o OUTFILE] [-x XLABEL]
[-y YLABEL] [-t TITLE] [-T YLIM_MAX]
[-B YLIM_MIN] [-R XLIM_MAX] [-L XLIM_MIN] [-e]
[-u TIME_UNIT]
infile [infile ...]
Plot *.timer.txt files produced by nnabla.MonitorTimeElapsed class.
Example:
nnabla_cli plot_timer -x "Epochs" -l "config A" -l "config B" result_a/Epoch-time.timer.txt result_b/Epoch-time.timer.txt
positional arguments:
infile Path to input file.
optional arguments:
-h, --help show this help message and exit
-l LABEL, --label LABEL
Label of each plot.
-o OUTFILE, --outfile OUTFILE
Path to output file.
-x XLABEL, --xlabel XLABEL
X-axis label of plot.
-y YLABEL, --ylabel YLABEL
Y-axis label of plot.
-t TITLE, --title TITLE
Title of plot.
-T YLIM_MAX, --ylim-max YLIM_MAX
Y-axis plot range max.
-B YLIM_MIN, --ylim-min YLIM_MIN
Y-axis plot range min.
-R XLIM_MAX, --xlim-max XLIM_MAX
X-axis plot range max.
-L XLIM_MIN, --xlim-min XLIM_MIN
X-axis plot range min.
-e, --elapsed Plot total elapsed time. By default, it plots elapsed time per iteration.
-u TIME_UNIT, --time-unit TIME_UNIT
Time unit chosen from {s|m|h|d}.
Draw a graph from NNP or .nntxt files¶
Note:
This feature requires graphviz installed as a Python package. The graphviz Python package is an interface to the graphviz library, which is not installed by the pip command; you have to install it using apt on Ubuntu, for example.
usage: nnabla_cli draw_graph [-h] [-o OUTPUT_DIR] [-n NETWORK] [-f FORMAT]
input
Draw a graph in a NNP or nntxt file with graphviz.
Example:
nnabla_cli draw_graph -o output-folder path-to-nnp.nnp
positional arguments:
input Path to input nnp or nntxt.
optional arguments:
-h, --help show this help message and exit
-o OUTPUT_DIR, --output-dir OUTPUT_DIR
Output directory.
-n NETWORK, --network NETWORK
Network names to be drawn.
-f FORMAT, --format FORMAT
Graph saving format compatible with graphviz (`pdf`, `png`, ...).
Development¶
Generate function information¶
usage: nnabla_cli function_info [-h] [-o OUTFILE] [-f FUNC_SET] [-c CONFIG]
[-t TARGET] [-q --query] [--nnp-no-expand-network]
[--api API] [FILE] [FILE ...]
positional arguments:
FILE Path to nnp file.
optional arguments:
-h, --help show this help message and exit
-o OUTFILE, --output OUTFILE
output filename, *.txt or *.yaml, the default is stdout.
-f FUNC_SET, --all_support FUNC_SET
select function set: NNB, ONNX, the default is nnabla.
-c CONFIG, --config CONFIG
user config file for target constraint, *.txt file of the
function list or the "opset_" args.
-t, --target
output target function list.
-q, --query
query the detail of a function.
--nnp-no-expand-network
[import][NNP] expand network with repeat or recurrent.
--api API List available API levels.
Display version¶
usage: nnabla_cli version [-h]
optional arguments:
-h, --help show this help message and exit
Python API Examples¶
There are many examples provided in the NNabla examples repository. Please follow [this link](https://github.com/sony/nnabla-examples) to see them.
Python API Reference¶
Common¶
Config¶
Searches for config files and gets configuration information from them.
The config file search order is described in the following table. Each config value is overwritten by configs found later in the order.
Type | Posix | Windows
---|---|---
System wide | /etc/nnabla.conf | c:\ProgramData\NNabla\nnabla.ini
User | ~/.nnabla | c:\Users\[USERNAME]\AppData\Roaming\NNabla\nnabla.ini
Default | (same directory as config.py)/nnabla.conf | (same directory as config.py)/nnabla.conf
Local | [CURRENT DIRECTORY]/nnabla.conf | [CURRENT DIRECTORY]/nnabla.conf
You can get a config value as follows:
from utils.config import nnabla_config
value = nnabla_config.get(CATEGORY, VALUE_NAME)
CATEGORY and VALUE_NAME are not defined in config.py. You can add CATEGORY and VALUE as you like; see the official documentation for more information.
[CATEGORY]
VALUE_NAME = value
The default values are defined in the 'nnabla.conf' file placed in the same directory as config.py.
Logger¶
Wrapper module for logging.
You can use the logger as follows:
from utils.logger import logger
logger.debug('Log message(DEBUG)')
logger.info('Log message(INFO)')
logger.warn('Log message(WARNING)')
logger.error('Log message(ERROR)')
logger.critical('Log message(CRITICAL)')
With the default settings, it should yield the following output:
$ python scripts/logger_test.py
[nnabla][ERROR]: logger_test.py : <module> : 5 : Log message(ERROR)
[nnabla][CRITICAL]: logger_test.py : <module> : 6 : Log message(CRITICAL)
If you want to output the log to a file, you must create an nnabla.conf file and put the following entry in it. See nnabla.config for more information about the config file.
[LOG]
log_file_name = /tmp/nbla.log
After this, you will get the following output.
$ python scripts/logger_test.py
[nnabla][ERROR]: logger_test.py : <module> : 5 : Log message(ERROR)
[nnabla][CRITICAL]: logger_test.py : <module> : 6 : Log message(CRITICAL)
$ cat /tmp/nbla.log
2017-01-19 14:41:35,132 [nnabla][DEBUG]: scripts/logger_test.py : <module> : 3 : Log message(DEBUG)
2017-01-19 14:41:35,132 [nnabla][INFO]: scripts/logger_test.py : <module> : 4 : Log message(INFO)
2017-01-19 14:41:35,132 [nnabla][ERROR]: scripts/logger_test.py : <module> : 5 : Log message(ERROR)
2017-01-19 14:41:35,132 [nnabla][CRITICAL]: scripts/logger_test.py : <module> : 6 : Log message(CRITICAL)
- nnabla.logger.logger¶
Auto-forward mode¶
NNabla provides the dynamic computation graph feature, which enables automatic forward propagation during graph construction. This can be enabled using the set_auto_forward()
function. Backpropagation must be executed manually on the dynamically constructed graph.
- nnabla.auto_forward(auto=True)[source]¶
Context for dynamic graph execution mode.
- Parameters
auto (bool) – Whether forward computation is executed during a computation graph construction.
Returns: bool
- nnabla.set_auto_forward(auto)[source]¶
Set the default mode for automatic forward propagation.
When it is set to True, forward propagation is invoked immediately when the computation graph is updated.
- Parameters
auto (bool) – Whether forward computation is executed when the computation graph is updated.
Returns: bool
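A minimal sketch of the dynamic mode (values are illustrative):
import numpy as np
import nnabla as nn
import nnabla.functions as F

nn.set_auto_forward(True)   # enable automatic forward execution
x = nn.Variable.from_numpy_array(np.array([[-1.0, 2.0]], dtype=np.float32))
y = F.relu(x)
print(y.d)                  # [[0. 2.]] -- already computed; no explicit y.forward() needed
nn.set_auto_forward(False)  # back to the static (define-and-run) mode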
Context¶
- class nnabla.Context(backend=None, array_class='', device_id='0')¶
Context is used to specify the computation engine (cpu, cuda, cudnn, etc.) which the function operator modules and optimizer modules shall be run on. The context can be set for each function, as well as globally with the functions listed in the Context Specifier API.
Context Specifier API¶
- nnabla.context_scope(ctx)[source]¶
Context as Python context.
import nnabla as nn
import nnabla.functions as F
import nnabla.parametric_functions as PF
import nnabla_ext.cuda

x = nn.Variable([2, 3, 4])
ctx = nnabla_ext.cuda.context('0')
with nn.context_scope(ctx):
    # Inside the with scope, the specified context is used.
    with nn.parameter_scope('w1'):
        l1 = F.relu(PF.affine(x, 64))
    with nn.parameter_scope('w2'):
        l2 = F.relu(PF.affine(x, 64))
- nnabla.set_default_context(ctx)[source]¶
Set the default context.
Note
It cannot be called inside any context_scope.
- Parameters
ctx (Context) – A Context.
- nnabla.get_current_context()[source]¶
Get the current context.
It can be set using nnabla.context_scope() or nnabla.set_default_context().
- Returns
a current context.
- Return type
Context
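For example, a minimal sketch of setting a CPU context globally (the backend string follows the cpu:float convention used by NNabla):
import nnabla as nn

ctx = nn.Context(backend=['cpu:float'])  # plain CPU context
nn.set_default_context(ctx)
print(nn.get_current_context())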
NdArray¶
- class nnabla.NdArray(*args, **kwargs)¶
nnabla.NdArray is a device-agnostic data container for multi-dimensional arrays (tensors). nnabla.NdArray can also implicitly handle data transfers across different devices (e.g. CPU to CUDA GPU, CUDA GPU to CPU). See Python API Tutorial for more details.
NdArray overrides some arithmetic operators (+, -, *, /, **). Operands can be either a scalar number, an NdArray or a Variable. An arithmetic operation containing an NdArray returns an NdArray which stores the output of the immediately invoked computation. In-place arithmetic operations (+=, -=, *=, /=, **=) are also implemented. Note that = does not perform in-place substitution but just replaces the object reference. Instead, you can use copy_from() for in-place substitution.
- cast(self, dtype, ctx=None)¶
In-place cast of the data type of the NdArray. It returns the values as a numpy.ndarray reference only if the optional parameter ctx is not given; otherwise it returns None.
- Parameters
dtype (numpy.dtype) – Numpy Data type.
ctx (nnabla.Context, optional) – Context descriptor.
- Returns
numpy.array if ctx is None, otherwise nothing.
- clear(self)¶
Clear memories which this NdArray has and return them to allocator.
- clear_called¶
Checks whether the array has not been modified since it was cleared. This returns False until clear is called for the first time.
- copy_from(self, NdArray arr, use_current_context=True)¶
Copy values from another NdArray object.
It returns the caller object itself.
- Parameters
arr (NdArray) – Values will be copied to the caller object. The shape of arr must be the same as that of the caller object.
use_current_context (bool) – If True, the copy happens with the device and dtype specified in the current context (equivalent to calling F.identity(src, output=[self])). Otherwise, the device and dtype of the source array are used. The default is True.
- Returns
- data¶
Returns the values held by this array as a numpy.ndarray. Note that only the reference is returned, and the values are not copied. Therefore, modifying the returned numpy.ndarray will affect the data contained inside the NNabla array. This method can also be called as a setter, where an array is created with the same type as the rhs. There is an exception: zero() or fill(rhs) is invoked instead if a scalar with a float or an integer <= 2^53 (as the filling value is maintained as float64) is given. Note that this may implicitly invoke a data transfer from device arrays to the CPU.
- Parameters
value (numpy.ndarray) –
- Returns
- data_ptr(self, dtype, ctx=None)¶
Get array’s pointer.
The behavior is similar to the cast method, but it returns the data pointer based on the ctx. If the ctx is not specified, the default context obtained by nn.get_current_context is used.
- Parameters
dtype (numpy.dtype) – Numpy Data type.
ctx (nnabla.Context, optional) – Context descriptor.
- Returns
The data pointer.
- Return type
int
- dtype¶
Get dtype.
- Returns
- fill(self, value)¶
Fill all of the elements with the provided scalar value.
Note
This does not fill the values of the internal array immediately. An array is created with the requested data type when this array is used (in forward or backward computation, for example), and is then filled with the value.
- Parameters
value (float) – The value to fill with.
- static from_numpy_array(nparr)¶
Create a NdArray object from Numpy array data.
The data is initialized with the given Numpy array.
- Parameters
nparr (ndarray) – Numpy multi-dimensional array.
- Returns
nnabla.NdArray
- get_data(self, str mode='rw', dtype=None)¶
Returns the values held by this array as a numpy.ndarray with a specified mode.
- Parameters
mode (str) – Computation becomes more efficient if the right one is chosen.
'r': Read-only access.
'w': Write-only access.
'rw': You can both read and write.
dtype (numpy.dtype, optional) – Force dtype of the returned array.
See nnabla.NdArray.data for more details.
- modification_count¶
Returns how many times modified after memory allocation or clearing buffer.
- ndim¶
Number of dimensions.
- Returns
int
- shape¶
Shape of the N-d array.
- Returns
tuple of int
- size¶
Total size of the N-d array.
- Returns
int
- size_from_axis(self, axis=- 1)¶
Gets the product of dimension sizes from the provided axis onward.
Example
a = nnabla.NdArray([10, 9])
a.size_from_axis()   # ==> 90
a.size_from_axis(0)  # ==> 90
a.size_from_axis(1)  # ==> 9
a.size_from_axis(2)  # ==> 1
- strides¶
Strides.
- Returns
tuple of int
- zero(self)¶
Fill all of the elements with 0.
Note
This does not fill the values of the internal array with 0 immediately. An array is created with the requested data type when this array is used (in forward or backward computation, for example), and is then filled with 0.
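The following is a minimal usage sketch of NdArray (values and shapes are illustrative):
import numpy as np
import nnabla as nn

a = nn.NdArray.from_numpy_array(np.arange(6, dtype=np.float32).reshape(2, 3))
b = a * 2.0 + 1.0       # arithmetic on NdArray is executed immediately
print(b.data)           # [[ 1.  3.  5.] [ 7.  9. 11.]]
print(b.shape, b.size)  # (2, 3) 6
b.zero()                # lazily fills all elements with 0 on next use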
Variable¶
- class nnabla.Variable¶
Bases:
object
nnabla.Variable is used to construct computation graphs (neural networks) together with functions in Functions and List of Parametric Functions. It also provides a method to execute forward and backward propagation of the network. The nnabla.Variable class holds:
A reference to the parent function in a computation graph. This provides traceability of all connections in the computation graph.
Both data and error signal (gradient) containers, as nnabla.NdArrays.
Some additional information about the computation graph.
Variable overrides some arithmetic operators (+, -, *, /, **). Operands can be either a scalar number, an NdArray or a Variable. If an NdArray is given as either the left or right operand, the arithmetic operation returns an NdArray which stores the output of the immediately invoked computation. Otherwise, it returns a Variable holding the graph connection. The computation is invoked immediately when nnabla.auto_forward or nnabla.set_auto_forward(True) is used.
Note
Relational operators == and != of two Variables are defined as an address comparison of the underlying C++ instances (nbla::Variable). Also, the hash() function, which is often used in keys for set and dict, is based on the address.
See also
- Parameters
shape (Iterable of int) – Shape of variable.
need_grad (bool) – Flag for backprop or not.
- apply(self, **kwargs)¶
Helper for setting property, then return self.
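For instance, a short sketch of typical usage (need_grad is a standard Variable property):
import nnabla as nn

x = nn.Variable((2, 3)).apply(need_grad=True)  # set a property and get the variable back
print(x.need_grad)  # True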
- backward(self, grad=1, bool clear_buffer=False, communicator_callbacks=None, function_pre_hook=None, function_post_hook=None)¶
Performs a backward propagation starting from this variable until the root variable(s) is/are reached in the function graph. The propagation will stop at a variable with need_grad=False.
- Parameters
grad (scalar, numpy.ndarray, nnabla.NdArray, or None) – The gradient signal value(s) of this variable. The default value 1 is used in usual neural network training. This option is useful if you have a gradient computation module outside NNabla and want to use its result as a gradient signal. Note that this does not modify the grad values of this variable; instead, the received values are temporarily assigned to its gradient. Also, if the Variable on which you execute nnabla._variable.Variable.backward is unlinked from another Variable, and that Variable holds pre-computed gradient values, you need to set grad=None; otherwise, for that backward pass (propagated from the unlinked Variable), the pre-computed gradient values are ignored.
clear_buffer (bool) – Clears the no longer referenced variables during backpropagation to save memory. Note that all unnecessary intermediate variables will be cleared unless set explicitly as persistent=True.
communicator_callbacks (nnabla.CommunicatorBackwardCallback or list of nnabla.CommunicatorBackwardCallback) – The callback functions invoked when 1) backward computation of each function is finished and 2) all backward computation is finished.
function_pre_hook (callable) – This callable object is called immediately before each function is executed. It must take Function as an input. The default is None.
function_post_hook (callable) – This callable object is called immediately after each function is executed. It must take Function as an input. The default is None.
Example
We first explain simple backward usage.
import nnabla as nn
import nnabla.functions as F
import nnabla.parametric_functions as PF
import numpy as np
import nnabla.initializer as I

rng = np.random.seed(217)
initializer = I.UniformInitializer((-0.1, 0.1), rng=rng)
x = nn.Variable((8, 3, 32, 32))
x.d = np.random.random(x.shape)  # random input, just for example.
y0 = PF.convolution(x, outmaps=64, kernel=(3, 3), pad=(1, 1), stride=(2, 2),
                    w_init=initializer, name="conv1", with_bias=False)
y1 = F.relu(y0)
y2 = PF.convolution(y1, outmaps=128, kernel=(3, 3), pad=(1, 1), stride=(2, 2),
                    w_init=initializer, name="conv2", with_bias=False)
y3 = F.relu(y2)
y4 = F.average_pooling(y3, kernel=y3.shape[2:])
y5 = PF.affine(y4, 1, w_init=initializer)
loss = F.mean(F.abs(y5 - 1.))
loss.forward()  # Execute forward
# We can check the current gradient of parameter.
print(nn.get_parameters()["conv1/conv/W"].g)
Output :
[[[[0. 0. 0.]
   [0. 0. 0.]
   [0. 0. 0.]]
  ...
Initially all the gradient values should be zero. Then let’s see what happens after calling backward.
loss.backward()
print(nn.get_parameters()["conv1/conv/W"].g)
Output :
[[[[ 0.00539637  0.00770839  0.0090611 ]
   [ 0.0078223   0.00978992  0.00720569]
   [ 0.00879023  0.00578172  0.00790895]]
  ...
Now we know that the gradient values are computed and registered by calling backward. Note that calling backward successively accumulates the result. This means that if we execute backward again, we get a doubled result.
loss.backward()  # execute again.
print(nn.get_parameters()["conv1/conv/W"].g)
We can see it’s accumulated.
[[[[ 0.01079273  0.01541678  0.0181222 ]
   [ 0.01564459  0.01957984  0.01441139]
   [ 0.01758046  0.01156345  0.0158179 ]]
  ...
Next is an advanced usage with an unlinked variable (please refer to get_unlinked_variable). We use the same network, but it is separated by the unlinked variable.
import nnabla as nn
import nnabla.functions as F
import nnabla.parametric_functions as PF
import numpy as np
import nnabla.initializer as I

rng = np.random.seed(217)  # use the same random seed.
initializer = I.UniformInitializer((-0.1, 0.1), rng=rng)
x = nn.Variable((8, 3, 32, 32))
x.d = np.random.random(x.shape)  # random input, just for example.
y0 = PF.convolution(x, outmaps=64, kernel=(3, 3), pad=(1, 1), stride=(2, 2),
                    w_init=initializer, name="conv1", with_bias=False)
y1 = F.relu(y0)
y2 = PF.convolution(y1, outmaps=128, kernel=(3, 3), pad=(1, 1), stride=(2, 2),
                    w_init=initializer, name="conv2", with_bias=False)
y3 = F.relu(y2)
y3_unlinked = y3.get_unlinked_variable()  # the computation graph is cut apart here.
y4 = F.average_pooling(y3_unlinked, kernel=y3_unlinked.shape[2:])
y5 = PF.affine(y4, 1, w_init=initializer)
loss = F.mean(F.abs(y5 - 1.))

# Execute forward.
y3.forward()    # you need to execute forward at the unlinked variable first.
loss.forward()  # Then execute forward at the leaf variable.

# Execute backward.
loss.backward()  # works, but backpropagation stops at y3_unlinked.
print(nn.get_parameters()["conv1/conv/W"].g)  # no gradient registered yet.
Output :
[[[[0. 0. 0.]
   [0. 0. 0.]
   [0. 0. 0.]]
  ...
We can confirm that backpropagation stops at y3_unlinked. Then let's see how to execute backpropagation all the way to the root variable (x). Since it's a little bit complicated, let us first give you an example of a common pitfall. Note that this is an incorrect way, intended just to show backward's behavior.
y3.backward()  # this works, but the computed gradient values are not correct.
print(nn.get_parameters()["conv1/conv/W"].g)
Output :
[[[[ 17.795254   23.960905   25.51168  ]
   [ 20.661646   28.484127   19.406212 ]
   [ 26.91042    22.239697   23.395714 ]]
  ...
Note that this is a wrong result. The gradient held by y3_unlinked has been totally ignored. As described above, when just calling backward, the gradient (of the leaf variable where you call backward) is considered to be 1.
To execute backpropagation over two separate graphs correctly, we need to specify grad=None as shown below; then the gradient currently held by that variable is used for computation. (y3.backward(grad=y3_unlinked.g) does the same thing.)
# Reset all the gradient values.
for v in nn.get_parameters().values():
    v.g = 0.
for v in [y0, y1, y2, y3, y4, y5]:
    v.g = 0.  # need to reset all the gradient values.
loss.backward()         # backpropagation starts from the leaf variable again.
y3.backward(grad=None)  # By this, it can take over the gradient held by y3_unlinked.
print(nn.get_parameters()["conv1/conv/W"].g)  # correct result.
This time you should have the same result.
[[[[ 0.00539637  0.00770839  0.0090611 ]
   [ 0.0078223   0.00978992  0.00720569]
   [ 0.00879023  0.00578172  0.00790895]]
  ...
- clear_all_graph_links(self)¶
Clear all intermediate functions and variables.
This method clears all intermediate functions and variables up to this variable in the forward pass, and is useful for truncated backpropagation through time (truncated BPTT) in a dynamic graph.
- d¶
Returns the values held by this variable, as a numpy.ndarray. Note that the values are referenced (not copied). Therefore, modifying the returned ndarray will affect the data of the NNabla array. This method can be called as a setter to set the value held by this variable. Refer to the documentation of the setter nnabla.NdArray.data for detailed behaviors of the setter.
- Parameters
value (numpy.ndarray) (optional) –
- Returns
- forward(self, bool clear_buffer=False, bool clear_no_need_grad=False, function_pre_hook=None, function_post_hook=None)¶
Performs a forward propagation from the root node to this variable. The forward propagation is performed on a subset of variables determined by the dependency of this variable. The subset is recursively constructed by tracking variables that the variables in the subset depend on, starting from this variable, until it reaches the root variable(s) in the function graph. See also forward_all, which performs forward computations for all variables within the input graph.
- Parameters
clear_buffer (bool) – Clear the no longer referenced variables during forward propagation to save memory. This is usually set as True in an inference or a validation phase. Default is False. Note that all unnecessary intermediate variables will be cleared unless set explicitly as persistent=True.
clear_no_need_grad (bool) – Clear the unreferenced variables with need_grad=False during forward propagation. True is usually used when calling this during training. This is ignored when clear_buffer=True.
function_pre_hook (callable) – This callable object is called immediately before each function is executed. It must take Function as an input. The default is None.
function_post_hook (callable) – This callable object is called immediately after each function is executed. It must take Function as an input. The default is None.
- static from_numpy_array(data, grad=None, need_grad=None)¶
Create a Variable object from Numpy array(s).
The data is initialized with the given Numpy array, as well as grad if given.
The shape is also determined by the given array.
- function_references¶
Returns a list of functions which take this variable as an input. This method can be called only as a getter.
- Returns
list of
nnabla.function.Function
- g¶
Returns the gradient values held by this variable, as a numpy.ndarray. Note that the values are referenced (not copied). Therefore, modifying the returned ndarray will affect the gradient of the NNabla array. This method can be called as a setter to set the gradient held by this variable. Refer to the documentation of the setter nnabla.NdArray.data for detailed behaviors of the setter.
- Parameters
value (numpy.ndarray) –
- Returns
- get_unlinked_variable(self, need_grad=None)¶
Gets an unlinked (forgetting parent) variable that shares a Variable buffer instance.
- Parameters
need_grad (bool, optional) – By default, the unlinked variable will have the same need_grad flag with this variable instance. By specifying a boolean value, the new need_grad flags will be set to the unlinked variable. It is recommended to explicitly specify this option to avoid an unintended behavior.
Returns:
Variable
Note
The unlinked Variable behaves equivalently to the original variable in comparison operators and the hash function, regardless of whether or not the need_grad attribute is changed. See the note in the Variable class documentation. Also, for backward execution with unlinked variable(s), please refer to backward and its example.
Example
import numpy as np
import nnabla as nn
import nnabla.parametric_functions as PF

x = nn.Variable.from_numpy_array(np.array([[1, 2], [3, 4]]))
y = PF.affine(x, 4, name="y")

# Create a new variable whose graph connection is unlinked.
# It is recommended to specify the need_grad option explicitly.
z = y.get_unlinked_variable(need_grad=False)

print(y.parent)
# Affine
print(z.parent)  # z is unlinked from the parent x but shares the buffers of y.
# None
- info¶
object
Information of the variable.
- Type
info
- ndim¶
Gets the number of dimensions of this variable.
- Returns
int
- need_grad¶
Gets or sets a boolean indicating whether backpropagation is performed at this variable.
- no_grad(self)¶
No gradients for the whole network.
This method is like nnabla.no_grad but can be used only for static networks, and is useful when the network is loaded from the NNP format.
Example
x = nn.Variable.from_numpy_array([2, 3])
y = <Network>(x).no_grad()
- parent¶
Returns the parent function of this variable. This method can also be called as a setter.
- Parameters
func (nnabla.function.Function) –
- Returns
- persistent¶
Returns the persistent flag of this variable. If True, the variable is not cleared even if clear options in nnabla._variable.Variable.forward() and nnabla._variable.Variable.backward() are enabled. This is useful when you debug or log the variable values. This method can also be called as a setter.
- Parameters
b (bool) –
- Returns
bool
- recompute¶
Gets or sets a boolean indicating whether its data is cleared during forward propagation and recomputation is performed during backward propagation.
- reset_shape(self, shape, force=False)¶
Resizes the shape of the variable to a specified shape.
- Parameters
shape (Iterable of int) – Target shape.
force (bool) – Flag to force reshape.
Note
This method destructively changes the shape of the target variable. For safety, reshape() should be used instead.
- Returns
None
- reshape(self, shape, unlink=False)¶
Returns a new variable, where this variable is reshaped to a specified shape.
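A short sketch (shape values are illustrative):
import nnabla as nn

x = nn.Variable((2, 8))
y = x.reshape((4, 4))  # a new Variable viewing the same data with shape (4, 4)
print(y.shape)         # (4, 4)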
- rewire_on(self, var)¶
Rewire a successor graph of this variable on top of var.
- Parameters
var (nnabla.Variable) – The array elements and the parent function of var are copied to self as references. Note that the parent function of var is removed.
Example
# A. Create a graph A.
xa = nn.Variable((2, 8), need_grad=True)
ya = F.tanh(PF.affine(xa, 10, name='a'))

# B. Create a graph B.
xb = nn.Variable((2, 16), need_grad=True)
yb = F.tanh(PF.affine(
    F.tanh(PF.affine(xb, 8, name='b1')),
    8, name='b2'))

# C. Rewire the graph A on top of B such that
#    `xb->B->(yb->)xa->A->ya`. Note `yb` is gone.
xa.rewire_on(yb)

# D. Execute the rewired graph.
xb.d = 1
ya.forward()
ya.backward()
- size_from_axis(self, axis=- 1)¶
Gets the product of dimension sizes from the provided axis onward.
Example
a = nnabla.Variable([10, 9])
a.size_from_axis()   # ==> 90
a.size_from_axis(0)  # ==> 90
a.size_from_axis(1)  # ==> 9
a.size_from_axis(2)  # ==> 1
- unlinked(self, need_grad=None)¶
This function is deprecated; use get_unlinked_variable instead.
- visit(self, f)¶
Visit functions recursively in forward order.
- Parameters
f (function) – Function object which takes an nnabla._function.Function object as an argument.
- Returns
None
Example
import nnabla as nn
import nnabla.functions as F
import nnabla.parametric_functions as PF

# Define a simple network-graph
def network_graph(x, maps=16, test=False):
    h = x
    h = PF.convolution(h, maps, kernel=(3, 3), pad=(1, 1), name="first-conv", with_bias=False)
    h = F.average_pooling(h, h.shape[2:])
    pred = PF.affine(h, 10, name="pred")
    return pred

# You can modify this PrintFunc to get other information like inputs (nnabla_func.inputs),
# outputs and arguments (nnabla_func.info.args) of nnabla functions.
class PrintFunc(object):
    def __call__(self, nnabla_func):
        print(nnabla_func.info.type_name)

x = nn.Variable([1, 3, 16, 16])
output = network_graph(x)
output.visit(PrintFunc())
Output :
Convolution
AveragePooling
Affine
- visit_check(self, f)¶
Visit functions recursively in forward order.
Note
If any of evaluation of the function object returns True, the visit propagation will stop immediately, and will return True.
- Parameters
f (function) – Function object which takes an nnabla._function.Function object as an argument.
- Returns
bool – True if any of the function object calls returns True.
Example
Define a simple network-graph where AveragePooling function can be added explicitly as below:
def network_graph(x, add_avg_pool=False, maps=16, test=False):
    h = x
    h = PF.convolution(h, maps, kernel=(3, 3), pad=(1, 1), name="first-conv", with_bias=False)
    if add_avg_pool:
        h = F.average_pooling(h, h.shape[2:])
    else:
        h = F.relu(h)
    pred = PF.affine(h, 10, name="pred")
    return pred

# Define 'PrintFunc()' to check whether the "AveragePooling" function exists in the network-graph
class PrintFunc(object):
    def __call__(self, nnabla_func):
        if nnabla_func.info.type_name == "AveragePooling":
            print("{} exists in the graph".format(nnabla_func.info.type_name))
            return True
        else:
            return False
Create a network-graph which has AveragePooling function and call visit_check() method :
x = nn.Variable([1, 3, 16, 16])
output = network_graph(x, add_avg_pool=True)  # Adding AveragePooling function to the graph
print("The return value of visit_check() method is : {}".format(output.visit_check(PrintFunc())))
Output :
AveragePooling exists in the graph
The return value of visit_check() method is : True
Create a network-graph which doesn’t have AveragePooling function and call visit_check() method :
nn.clear_parameters()  # call this in case you want to run the following code again
output = network_graph(x, add_avg_pool=False)  # Exclusion of AveragePooling function in the graph
print("The return value of visit_check() method is : {}".format(output.visit_check(PrintFunc())))
Output :
The return value of visit_check() method is : False
Computation Graph¶
- nnabla.forward_all(variables, bool clear_buffer=False, bool clear_no_need_grad=False, function_pre_hook=None, function_post_hook=None)¶
Performs a forward propagation up to the variables specified as the 1st argument. See also forward.
- Parameters
clear_buffer (bool) –
Clear the no longer referenced variables during forward propagation to save memory. This is usually set as True in an inference or a validation phase. Default is False. Note that the starting and destination variables of the input graph will not be cleared, regardless of their persistent flag. All intermediate variables will be cleared unless set explicitly as persistent=True. For example, forward_all([h_i, y], clear_buffer=True) will clear all intermediate variables between h_i and y unless they are set explicitly as persistent=True, but h_i and y will not be cleared regardless of their persistent flag.
clear_no_need_grad (bool) – Clear the unreferenced variables with need_grad=False during forward propagation. True is usually used when calling this during training. This is ignored when clear_buffer=True.
function_pre_hook (callable) – This callable object is called immediately before each function is executed. It must take Function as an input. The default is None.
function_post_hook (callable) – This callable object is called immediately after each function is executed. It must take Function as an input. The default is None.
Example
import numpy as np
import nnabla as nn
import nnabla.parametric_functions as PF

# Create a graph which has two outputs
x = nn.Variable.from_numpy_array(np.array([[1, 2], [3, 4]]))
y = PF.affine(x, 4, name="y")
z = PF.affine(x, 8, name="z")

# Execute a forward propagation recursively up to y and z
nn.forward_all([y, z], clear_buffer=True)
- nnabla.no_grad(no_grad_=True)[source]¶
No gradients for the whole network.
No gradients are required when creating a network, such that when the forward pass is executed, all intermediate buffers except for the leafs in the network are gone at the same time, resulting in memory optimization.
This is useful for example when an output of a pre-trained network is used for an input to another network, where the first pre-trained network does not need to be fine-tuned, but the other network is optimized.
- Parameters
no_grad (bool) – No gradient flag. Default is True.
Example:
with nn.no_grad():
    output0 = <Network0>(<input0>)
    output1 = <Network1>(<input1>, output0)
    loss = <Loss>(output1, <ground_truth>)
loss.forward(clear_no_need_grad=True)
This context also works in the dynamic mode.
with nn.auto_forward(), nn.no_grad():
    output0 = <Network0>(<input0>)
Note
When working with a static network, the need_grad property of the input (e.g., the input image) must be False, and do not forget to call <root>.forward(clear_no_need_grad=True); otherwise, the intermediate buffers are not freed as expected.
Functions¶
All NNabla functions are derived from the nnabla.function.Function
class.
Function¶
- class nnabla.function.Function¶
Function interface class.
Instances of nnabla.function.Function are not directly created by users. They are indirectly created by the functions available in nnabla.functions. These functions return nnabla.Variable(s) holding the created function instance as the parent property.
- args¶
Experimental
Get args of the function.
- backward(self, inputs, outputs, accum=None)¶
- forward(self, inputs, outputs)¶
- grad_depends_output_data(self, int i, int o)¶
- info¶
object
- Type
info
- inplace_data(self, int i)¶
- inplace_data_with(self, int i)¶
- min_outputs(self)¶
- need_setup_recompute(self, int o)¶
- recompute(self, inputs, outputs)¶
- setup(self, inputs, outputs)¶
- setup_recompute(self, inputs, outputs)¶
- tags¶
Experimental
Get tags of the function.
- class nnabla.function.PythonFunction(ctx=None)¶
Creates a user-defined custom function in a subclass.
To implement a naive multiplication function of two variables using PythonFunction:
import nnabla as nn
import nnabla.functions as F
from nnabla.function import PythonFunction

class Mul2(PythonFunction):

    def __init__(self, ctx):
        super(Mul2, self).__init__(ctx)

    @property
    def name(self):
        return self.__class__.__name__

    def min_outputs(self):
        return 1

    def setup_impl(self, inputs, outputs):
        i0 = inputs[0]
        i1 = inputs[1]
        assert i0.shape == i1.shape, "Shapes of inputs are different."
        o0 = outputs[0]
        o0.reset_shape(i0.shape, True)

    def forward_impl(self, inputs, outputs):
        x0 = inputs[0].data
        x1 = inputs[1].data
        y = outputs[0].data
        # We can also write like, y.copy_from(x0 * x1)
        y.copy_from(F.mul2(x0, x1))

    def backward_impl(self, inputs, outputs, propagate_down, accum):
        # Data of inputs and outputs
        x0 = inputs[0].data
        x1 = inputs[1].data
        y = outputs[0].data
        # Grads of inputs and outputs
        dx0 = inputs[0].grad
        dx1 = inputs[1].grad
        dy = outputs[0].grad
        # backward w.r.t. x0
        if propagate_down[0]:
            if accum[0]:
                dx0 += F.mul2(dy, x1)
            else:
                dx0.copy_from(F.mul2(dy, x1))
        # backward w.r.t. x1
        if propagate_down[1]:
            if accum[1]:
                dx1 += F.mul2(dy, x0)
            else:
                dx1.copy_from(F.mul2(dy, x0))

    def grad_depends_output_data(self, i, o):
        return False

    def grad_depends_input_data(self, i, j):
        return True

def mul2(x, y, ctx=None):
    func = Mul2(ctx)
    return func(x, y)
- __init__(self, ctx=None)¶
- Parameters
ctx (nnabla.Context) – Context used for the forward and backward pass. If not specified, the current context is used.
- backward_impl(self, inputs, outputs, propagate_down, accum)¶
Backward method.
- Parameters
inputs – (list of nnabla.Variable): Inputs to the function.
outputs – (list of nnabla.Variable): Outputs from the function.
- property ctx¶
Returns the context if it was set in the constructor; otherwise returns the global context.
- forward_impl(self, inputs, outputs)¶
Forward method.
- Parameters
inputs – (list of nnabla.Variable): Inputs to the function.
outputs – (list of nnabla.Variable): Outputs from the function.
- grad_depends_input_data(self, i, j)¶
Checks whether the i-th input's gradient computation requires the j-th input's data or not.
- Parameters
i (int) – Input variable index.
j (int) – Input variable index.
- grad_depends_output_data(self, i, o)¶
Checks whether the i-th input's gradient computation requires the o-th output's data or not.
- Parameters
i (int) – Input variable index.
o (int) – Output variable index.
- min_outputs(self)¶
Minimum number of outputs of the function.
- property name¶
Name of the function.
- setup_impl(self, inputs, outputs)¶
Setup method.
- Parameters
inputs – (list of nnabla.Variable): Inputs to the function.
outputs – (list of nnabla.Variable): Outputs from the function.
List of Functions¶
The nnabla.functions
module provides various types of functions listed below.
These functions take input nnabla.Variable(s) as their leading argument(s), followed by options specific to each function.
Note
The functions can also take NdArray(s) as inputs instead of Variable(s). In that case, the function operation is executed immediately, and NdArray(s) are returned as output(s) holding the output values of the operation. We call this "Imperative Mode" (NdArray + Functions).
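A small sketch of Imperative Mode (values are illustrative):
import numpy as np
import nnabla as nn
import nnabla.functions as F

x = nn.NdArray.from_numpy_array(np.array([-1.0, 0.5], dtype=np.float32))
y = F.relu(x)  # NdArray input: the operation runs immediately
print(y.data)  # [0.  0.5] -- y is an NdArray holding the computed values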
Neural Network Layers¶
- nnabla.functions.affine(x, weight, bias=None, base_axis=1, n_outputs=- 1, outputs=None)[source]¶
Affine layer, also called the fully connected layer. It calculates:
\[{\mathbf y} = {\mathbf A} {\mathbf x} + {\mathbf b}.\]where \({\mathbf x}\) is the input and \({\mathbf y}\) is the output.
- Parameters
x (Variable) – Input N-D array with shape (\(M_0 \times ... \times M_{B-1} \times D_B \times ... \times D_N\)). Dimensions before and after base_axis are flattened as if it is a matrix.
weight (Variable) – Weight matrix with shape (\((D_B \times ... \times D_N) \times L_{0} \times \ldots \times L_{I}\)) [parameter]
bias (Variable) – Bias vector (\(L_{0} \times \ldots \times L_{I}\)) [optional][parameter]
base_axis (int) – Base axis of the Affine operation. Dimensions up to base_axis are treated as sample dimensions. [default=1]
- Returns
\((B + 1)\)-D array. (\(M_0 \times ... \times M_{B-1} \times L_{0} \times \ldots \times L_{I}\))
- Return type
Variable
Note
All nnabla functions in nnabla.functions are decorated with the nnabla.function_bases.function_api decorator, which queries the current context and passes it into the first argument of the original function. The original function always takes a context as the first argument.
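As a sketch of the shape behavior (the weights here are created by hand for illustration; in practice, parametric functions usually manage them):
import numpy as np
import nnabla as nn
import nnabla.functions as F

x = nn.Variable((4, 16))  # batch of 4 samples, 16 features each
w = nn.Variable.from_numpy_array(np.random.randn(16, 8).astype(np.float32))
b = nn.Variable.from_numpy_array(np.zeros(8, dtype=np.float32))
y = F.affine(x, w, b)     # fully connected layer
print(y.shape)            # (4, 8)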
- nnabla.functions.convolution(x, weight, bias=None, base_axis=1, pad=None, stride=None, dilation=None, group=1, channel_last=False, n_outputs=- 1, outputs=None)[source]¶
N-D Convolution with bias.
See references for dilated convolution (a.k.a. atrous convolution).
References
Note
Convolution is a computationally intensive operation that should preferably be run with the cudnn backend. NNabla then uses CuDNN library functions to determine and cache the fastest algorithm for the given set of convolution parameters, which results in additional memory consumption and may pose a problem for GPUs with insufficient memory size. In that case, the NNABLA_CUDNN_WORKSPACE_LIMIT environment variable can be used to restrict the choice of algorithms to those that fit the given workspace memory limit, expressed in bytes. In some cases it may also be desired to restrict the automatic search to algorithms that produce deterministic (reproducible) results. This can be requested by setting the environment variable NNABLA_CUDNN_DETERMINISTIC to a non-zero value.
- Parameters
x (Variable) – \((B + 1 + N)\)-D array (\(M_1 \times ... \times M_B \times C \times L_1 \times ... \times L_N\)).
weight (Variable) – \((2 + N)\)-D array (\(C' \times C \times K_1 \times ... \times K_N\)). [parameter]
bias (Variable) – Bias vector (\(C'\)). [optional][parameter]
base_axis (int) – base axis \(B\). [default=1]
pad (tuple of int) – Padding sizes for dimensions. [default=(0,) * (len(x.shape) - (base_axis+1))]
stride (tuple of int) – Stride sizes for dimensions. [default=(1,) * (len(x.shape) - (base_axis+1))]
dilation (tuple of int) – Dilation sizes for dimensions. [default=(1,) * (len(x.shape) - (base_axis+1))]
group (int) – Number of groups of channels. This makes the connection across channels sparser, by grouping connections along the mapping direction. [default=1]
channel_last (bool) – If True, the last dimension is considered as channel dimension, a.k.a NHWC order. [default=False]
- Returns
\((B + 1 + N)\)-D array (\(M_1 \times ... \times M_B \times C' \times L'_1 \times ... \times L'_N\)).
A spatial size of the output is calculated as
\[L'_i = \frac{L_i + 2 p_i - d_i (k_i - 1) - 1}{s_i} + 1,\]where \(L_i\) is the spatial size, \(p_i\) is the padding, \(d_i\) is the dilation, \(k_i\) is the kernel size, and \(s_i\) is the stride for \(i\)-th spatial dimension. The same calculation can also be applied to the other spatial dimensions.
- Return type
Variable
Note
All nnabla functions in nnabla.functions are decorated with the nnabla.function_bases.function_api decorator, which queries the current context and passes it into the first argument of the original function. The original function always takes a context as the first argument.
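A sketch illustrating the output-size formula above (the weight layout is \(C' \times C \times K_1 \times K_2\); values are illustrative):
import numpy as np
import nnabla as nn
import nnabla.functions as F

x = nn.Variable((1, 3, 32, 32))  # NCHW input
w = nn.Variable.from_numpy_array(
    np.random.randn(16, 3, 3, 3).astype(np.float32))
y = F.convolution(x, w, pad=(1, 1), stride=(2, 2))
print(y.shape)  # (1, 16, 16, 16): L' = (32 + 2*1 - 1*(3-1) - 1)//2 + 1 = 16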
- nnabla.functions.depthwise_convolution(x, weight, bias=None, base_axis=1, pad=None, stride=None, dilation=None, multiplier=1, n_outputs=- 1, outputs=None)[source]¶
N-D Depthwise Convolution with bias.
References
- Parameters
x (Variable) – \((B + 1 + N)\)-D array (\(M_1 \times ... \times M_B \times C \times L_1 \times ... \times L_N\)).
weight (Variable) – \((1 + N)\)-D array (\(C \times K_1 \times ... \times K_N\)). [parameter]
bias (Variable) – Bias vector (\(C'\)). [optional][parameter]
base_axis (int) – base axis \(B\). [default=1]
pad (tuple of int) – Padding sizes for dimensions. [default=(0,) * (len(x.shape) - (base_axis+1))]
stride (tuple of int) – Stride sizes for dimensions. [default=(1,) * (len(x.shape) - (base_axis+1))]
dilation (tuple of int) – Dilation sizes for dimensions. [default=(1,) * (len(x.shape) - (base_axis+1))]
multiplier (int) – Number of output feature maps per input feature map. [default=1]
- Returns
\((B + 1 + N)\)-D array (\(M_1 \times ... \times M_B \times C' \times L'_1 \times ... \times L'_N\)).
The output map size \(C'\) is \(C\) multiplied by \(m\)
\[C' = m \times C,\]where \(m\) is the multiplier.
A spatial size of the output is calculated as
\[L'_i = \frac{L_i + 2 p_i - d_i (k_i - 1) - 1}{s_i} + 1,\]where \(L_i\) is the spatial size, \(p_i\) is the padding, \(d_i\) is the dilation, \(k_i\) is the kernel size, and \(s_i\) is the stride for \(i\)-th spatial dimension. The same calculation can also be applied to the other spatial dimensions.
- Return type
Variable
Note
All nnabla functions in nnabla.functions are decorated with the nnabla.function_bases.function_api decorator, which queries the current context and passes it into the first argument of the original function. The original function always takes a context as the first argument.
- nnabla.functions.deconvolution(x, weight, bias=None, base_axis=1, pad=None, stride=None, dilation=None, group=1, channel_last=False, output_padding=None, n_outputs=- 1, outputs=None)[source]¶
N-D deconvolution, also known as transposed convolution, with bias operates backward convolution (derivative of the output w.r.t. the input) plus channel-wise learned bias.
The weights are specified in the same manner as convolution(), as if it was an ordinary convolution function. The forward operation of deconvolution() will then be operationally equivalent to the backward pass of convolution(). Therefore, the number of input channels (which can be seen as output channels of forward convolution) is specified in the first dimension, and the number of the output channels divided by the number of groups is specified in the second dimension.
For stride > 1, a parameter-wise identical deconvolution on the output of a convolution may not produce the same output shape as the input to the convolution if, due to striding, the convolution did not fully cover the input spatial dimension. The output_padding parameter can then be used to appropriately increase the calculated output shape. Note that this is used to find the output shape for the deconvolution operation, but not to add zero-padding to the output.
- Parameters
x (Variable) – \((B + 1 + N)\)-D array (\(M_1 \times ... \times M_B \times C \times L_1 \times ... \times L_N\)).
weight (Variable) – \((2 + N)\)-D array (\(C \times C' \times K_1 \times ... \times K_N\)). [parameter]
bias (Variable) – Bias vector (\(C'\)). [optional][parameter]
base_axis (int) – base axis \(B\). [default=1]
pad (tuple of int) – Padding sizes for dimensions. [default=(0,) * (len(x.shape) - (base_axis+1))]
stride (tuple of int) – Stride sizes for dimensions. [default=(1,) * (len(x.shape) - (base_axis+1))]
dilation (tuple of int) – Dilation sizes for dimensions. [default=(1,) * (len(x.shape) - (base_axis+1))]
group (int) – Number of groups of channels. This makes the connection across channels sparser, by grouping connections along the mapping direction. [default=1]
channel_last (bool) – If True, the last dimension is considered as channel dimension, a.k.a NHWC order. [default=False]
output_padding (tuple of int) – Additional size added to the output shape. [default=(0,) * (len(x.shape) - (base_axis+1))]
- Returns
\((B + 1 + N)\)-D array (\(M_1 \times ... \times M_B \times C' \times L'_1 \times ... \times L'_N\)).
A spatial size of the output is calculated as
\[L'_i =s_i (L_i - 1) - 2 p_i + d_i (k_i - 1) + 1,\]where \(s_i\) is the stride, \(L_i\) is the spatial size, \(p_i\) is the padding, \(d_i\) is the dilation, and \(k_i\) is the kernel size for \(i\)-th spatial dimension. The same calculation can also be applied to the other spatial dimensions.
- Return type
Variable
Note
All nnabla functions in nnabla.functions are decorated with the nnabla.function_bases.function_api decorator, which queries the current context and passes it into the first argument of the original function. The original function always takes a context as the first argument.
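A sketch of the inverse shape relation (note the weight layout is \(C \times C' \times K_1 \times K_2\) for deconvolution; values are illustrative):
import numpy as np
import nnabla as nn
import nnabla.functions as F

x = nn.Variable((1, 16, 16, 16))  # e.g. the output of the convolution sketch above
w = nn.Variable.from_numpy_array(
    np.random.randn(16, 3, 3, 3).astype(np.float32))  # C x C' x K x K
y = F.deconvolution(x, w, pad=(1, 1), stride=(2, 2), output_padding=(1, 1))
print(y.shape)  # (1, 3, 32, 32): L' = 2*(16-1) - 2*1 + 1*(3-1) + 1 + 1 = 32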
- nnabla.functions.depthwise_deconvolution(x, weight, bias=None, base_axis=1, pad=None, stride=None, dilation=None, divisor=1, n_outputs=- 1, outputs=None)[source]¶
Depthwise deconvolution computes the transposed depthwise convolution with bias for one-dimensional and two-dimensional input data.
- Parameters
x (Variable) – \((B + 1 + N)\)-D array (\(M_1 \times ... \times M_B \times C \times L_1 \times ... \times L_N\)).
weight (Variable) – \((1 + N)\)-D array (\(C \times K_1 \times ... \times K_N\)). [parameter]
bias (Variable) – Bias vector (\(C'\)). [optional][parameter]
base_axis (int) – base axis \(B\). [default=1]
pad (tuple of int) – Padding sizes for dimensions. [default=(0,) * (len(x.shape) - (base_axis+1))]
stride (tuple of int) – Stride sizes for dimensions. [default=(1,) * (len(x.shape) - (base_axis+1))]
dilation (tuple of int) – Dilation sizes for dimensions. [default=(1,) * (len(x.shape) - (base_axis+1))]
divisor (int) – Number of input feature maps per output feature map. [default=1]
- Returns
\((B + 1 + N)\)-D array (\(M_1 \times ... \times M_B \times C' \times L'_1 \times ... \times L'_N\)).
The output map size \(C'\) is \(C\) divided by the divisor:
\[C' = \frac{C}{d},\]
where \(d\) is the divisor.
A spatial size of the output is calculated as
\[L'_i = s_i (L_i - 1) - 2 p_i + d_i (k_i - 1) + 1,\]
where \(s_i\) is the stride, \(L_i\) is the spatial size, \(p_i\) is the padding, \(d_i\) is the dilation, and \(k_i\) is the kernel size for the \(i\)-th spatial dimension. The same calculation applies to the other spatial dimensions.
- Return type
Note
All nnabla functions in
nnabla.functions
are decorated with the nnabla.function_bases.function_api
decorator, which queries the current context and passes it into the first argument of the original function. The original function always takes a context as the first argument.
- nnabla.functions.deformable_convolution(x, weight, offset, mask=None, bias=None, base_axis=1, pad=None, stride=None, dilation=None, group=1, deformable_group=1, channel_last=False, n_outputs=- 1, outputs=None)[source]¶
2-D Deformable Convolution with bias. Another convolution with fixed output channels must be passed externally to calculate the offsets and mask. Mask should be normalized to \([0,1]\) interval.
\[\begin{eqnarray} y(p) = \sum_{k=1}^{K} w_k \cdot x(p + p_k + \Delta p_k) \cdot \Delta m_k, \end{eqnarray}\]where \(x\) and \(y\) are input and output, \(w_k\) is the weight, \(p\) is the pixel location of interest, \(p_k\) is the fixed displacement e.g., \(p_k \in \{(-1, -1), (-1, 0), \ldots (1, 1)\}\) for the 2D 3x3 receptive field, \(\Delta p_k\) is the learnable displacement, and \(\Delta m_k\) is the learnable scale normalized in \([0, 1]\) by a function like the sigmoid. Note that \(\Delta p_k\) and \(\Delta m_k\) are sample-dependent, location-dependent, and feature-independent.
References
- Parameters
x (Variable) – \((B + 1 + N)\)-D array (\(M_1 \times ... \times M_B \times C \times L_1 \times ... \times L_N\)).
weight (Variable) – \((2 + N)\)-D array (\(C' \times C \times K_1 \times ... \times K_N\)). [parameter]
offset (Variable) – Offsets for deformable convolutions. Shape is fixed to \((N, deformable_group \times 2 \times Kh \times Kw, H, W)\). Offsets must be calculated externally through a separate convolution layer.
mask (Variable) – Normalized mask for deformable convolutions v2. Shape is fixed to \((N, deformable_group \times Kh \times Kw, H, W)\) (one mask value per kernel position). Masks must be calculated externally together with the offsets through a separate convolution layer. [optional]
bias (Variable) – Bias vector (\(C'\)). [optional][parameter]
base_axis (int) – base axis \(B\). [default=1]
pad (tuple of int) – Padding sizes for dimensions. [default=(0,) * (len(x.shape) - (base_axis+1))]
stride (tuple of int) – Stride sizes for dimensions. [default=(1,) * (len(x.shape) - (base_axis+1))]
dilation (tuple of int) – Dilation sizes for dimensions. [default=(1,) * (len(x.shape) - (base_axis+1))]
group (int) – Number of groups of channels. This makes the connection across channels sparser, by grouping connections along the mapping direction. [default=1]
deformable_group (int) – Number of deformable groups of channels. [default=1]
channel_last (bool) – If True, the last dimension is considered as the channel dimension, a.k.a. NHWC order. [default=False]
- Returns
\((B + 1 + N)\)-D array (\(M_1 \times ... \times M_B \times C' \times L'_1 \times ... \times L'_N\)).
A spatial size of the output is calculated as
\[L'_i = \frac{L_i + 2 p_i - d_i (k_i - 1) - 1}{s_i} + 1,\]where \(L_i\) is the spatial size, \(p_i\) is the padding, \(d_i\) is the dilation, \(k_i\) is the kernel size, and \(s_i\) is the stride for \(i\)-th spatial dimension. The same calculation can also be applied to the other spatial dimensions.
- Return type
Note
All nnabla functions in
nnabla.functions
are decorated with the nnabla.function_bases.function_api
decorator, which queries the current context and passes it into the first argument of the original function. The original function always takes a context as the first argument.
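Example
A minimal sketch with toy shapes; the offsets here are dummy zeros rather than the output of a real offset-prediction convolution, so the result reduces to an ordinary convolution:
import numpy as np
import nnabla as nn
import nnabla.functions as F

nn.set_auto_forward(True)
N, C, H, W = 1, 4, 8, 8
Co, Kh, Kw = 6, 3, 3
x = nn.Variable.from_numpy_array(np.random.randn(N, C, H, W).astype(np.float32))
w = nn.Variable.from_numpy_array(np.random.randn(Co, C, Kh, Kw).astype(np.float32))
# One (dy, dx) pair per kernel position (deformable_group=1); normally these
# are predicted by a separate convolution layer.
offset = nn.Variable.from_numpy_array(np.zeros((N, 2 * Kh * Kw, H, W), dtype=np.float32))
y = F.deformable_convolution(x, w, offset, pad=(1, 1))
print(y.shape)  # (1, 6, 8, 8)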
- nnabla.functions.adaptive_separable_convolution(x, vertical_kernel, horizontal_kernel, n_outputs=- 1, outputs=None)[source]¶
2-D Adaptive Separable Convolution for NCHW (channel-first) tensors. Sample- and pixel-dependent vertical and horizontal kernels are dynamically generated and used to approximate a feature-independent 2-D kernel in this function. Thus, the kernel used in this function depends on samples and pixels but is independent of features.
If padding is needed, apply the pad function to the input \(x\) before this function.
Adaptive separable convolution is formulated as
\[\tilde{I}(c, h, w) = \sum_{j, i} K_v(j, h, w) \times K_h(i, h, w) \times I(c, h + j, w + i),\]
where \(I(c, h, w)\) and \(\tilde{I}(c, h, w)\) are the input and output images at the \(c\)-th channel, \(h\)-th height, and \(w\)-th width. \(K_v(:, h, w)\) and \(K_h(:, h, w)\) are the vertical and horizontal 1-D kernels at the \(h\)-th height and \(w\)-th width.
References
- Parameters
- Returns
\(4-D\) array (\(B \times C \times (H - K_v + 1) \times (W - K_h + 1)\))
- Return type
Note
All nnabla functions in
nnabla.functions
are decorated with the nnabla.function_bases.function_api
decorator, which queries the current context and passes it into the first argument of the original function. The original function always takes a context as the first argument.
- nnabla.functions.max_pooling(x, kernel, stride=None, ignore_border=True, pad=None, channel_last=False, n_outputs=- 1, outputs=None)[source]¶
Max pooling. It pools the maximum values inside the scanning kernel:
\[y_{i_1, i_2} = \max_{k_1, k_2 \in K} (x_{i_1 + k_1, i_2 + k_2})\]where \(x_{i_1 + k_1, i_2 + k_2}\) is the input and \(y_{i_1, i_2}\) is the output.
- Parameters
x (Variable) – Input variable.
kernel (tuple of int) – Kernel sizes for each spatial axis.
stride (tuple of int) – Subsampling factors for each spatial axis. [default=kernel]
ignore_border (bool) – If false, kernels covering borders are also considered for the output. [default=True]
pad (tuple of int) – Border padding values for each spatial axis. Padding will be added to both sides of the dimension. [default=(0,) * len(kernel)]
channel_last (bool) – If True, the last dimension is considered as the channel dimension, a.k.a. NHWC order. [default=False]
- Returns
Maximum values variable
- Return type
Note
All nnabla functions in
nnabla.functions
are decorated with the nnabla.function_bases.function_api
decorator, which queries the current context and passes it into the first argument of the original function. The original function always takes a context as the first argument.
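Example
A minimal sketch (toy input) of 2x2 max pooling; the stride defaults to the kernel size:
import numpy as np
import nnabla as nn
import nnabla.functions as F

nn.set_auto_forward(True)
x = nn.Variable.from_numpy_array(np.arange(16, dtype=np.float32).reshape(1, 1, 4, 4))
y = F.max_pooling(x, kernel=(2, 2))
print(y.d)  # each 2x2 block is reduced to its maximum: 5, 7, 13, 15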
- nnabla.functions.average_pooling(x, kernel, stride=None, ignore_border=True, pad=None, channel_last=False, including_pad=True, n_outputs=- 1, outputs=None)[source]¶
Average pooling. It pools the averaged values inside the scanning kernel:
\[y_{i_1, i_2} = \frac{1}{K_1 K_2} \sum_{k1} \sum_{k2} x_{i_1 + k_1, i_2 + k_2}\]where \(x_{i_1 + k_1, i_2 + k_2}\) is the input and \(y_{i_1, i_2}\) is the output.
- Parameters
x (Variable) – Input variable.
kernel (tuple of int) – Kernel sizes for each spatial axis.
stride (tuple of int) – Subsampling factors for each spatial axis. [default=kernel]
ignore_border (bool) – If false, kernels covering borders are also considered for the output. [default=True]
pad (tuple of int) – Border padding values for each spatial axis. Padding will be added to both sides of the dimension. [default=(0,) * len(kernel)]
channel_last (bool) – If True, the last dimension is considered as the channel dimension, a.k.a. NHWC order. [default=False]
including_pad (bool) – If true, border padding values are considered for the output. [default=True]
- Returns
Average values variable
- Return type
Note
All nnabla functions in
nnabla.functions
are decorated with the nnabla.function_bases.function_api
decorator, which queries the current context and passes it into the first argument of the original function. The original function always takes a context as the first argument.
- nnabla.functions.global_average_pooling(x, n_outputs=- 1, outputs=None)[source]¶
Warning
This function is experimental support, so please do not actively use it.
Global average pooling. It pools an averaged value from the whole image.
Note
All nnabla functions in
nnabla.functions
are decorated with the nnabla.function_bases.function_api
decorator, which queries the current context and passes it into the first argument of the original function. The original function always takes a context as the first argument.
- nnabla.functions.sum_pooling(x, kernel, stride=None, ignore_border=True, pad=None, channel_last=False, n_outputs=- 1, outputs=None)[source]¶
Sum pooling. It pools the summed values inside the scanning kernel:
\[y_{i_1, i_2} = \sum_{k1} \sum_{k2} x_{i_1 + k_1, i_2 + k_2}\]where \(x_{i_1 + k_1, i_2 + k_2}\) is the input and \(y_{i_1, i_2}\) is the output.
- Parameters
x (Variable) – Input variable.
kernel (tuple of int) – Kernel sizes for each spatial axis.
stride (tuple of int) – Subsampling factors for each spatial axis. [default=kernel]
ignore_border (bool) – If false, kernels covering borders are also considered for the output. [default=True]
pad (tuple of int) – Border padding values for each spatial axis. Padding will be added to both sides of the dimension. [default=(0,) * len(kernel)]
channel_last (bool) – If True, the last dimension is considered as the channel dimension, a.k.a. NHWC order. [default=False]
- Returns
Summed values variable
- Return type
Note
All nnabla functions in
nnabla.functions
are decorated with the nnabla.function_bases.function_api
decorator, which queries the current context and passes it into the first argument of the original function. The original function always takes a context as the first argument.
- nnabla.functions.unpooling(x, kernel, channel_last=False, n_outputs=- 1, outputs=None)[source]¶
Inverse operation of pooling. It spreads the input values:
\[y_{k_1 i_1 + j_1, k_2 i_2 + j_2} = x_{i_1, i_2},\]
where \(x_{i_1, i_2}\) is the input and \(y_{k_1 i_1 + j_1, k_2 i_2 + j_2}\) is the output.
- Parameters
- Returns
Spread values variable
- Return type
Note
All nnabla functions in
nnabla.functions
are decorated with the nnabla.function_bases.function_api
decorator, which queries the current context and passes it into the first argument of the original function. The original function always takes a context as the first argument.
- nnabla.functions.embed(x0, w, n_outputs=- 1, outputs=None)[source]¶
Embed slices of a matrix/tensor with indexing array/tensor.
- Parameters
- Returns
Output with shape \((I_0, ..., I_N, W_1, ..., W_M)\)
- Return type
Note
All nnabla functions in
nnabla.functions
are decorated with the nnabla.function_bases.function_api
decorator, which queries the current context and passes it into the first argument of the original function. The original function always takes a context as the first argument.
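Example
A minimal sketch (toy values) of looking up rows of a weight matrix by index:
import numpy as np
import nnabla as nn
import nnabla.functions as F

nn.set_auto_forward(True)
w = nn.Variable.from_numpy_array(np.arange(12, dtype=np.float32).reshape(4, 3))  # 4 embeddings of size 3
idx = nn.Variable.from_numpy_array(np.array([0, 2]))
y = F.embed(idx, w)
print(y.shape)  # (2, 3): rows 0 and 2 of w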
- nnabla.functions.rnn(x, h, weight_l0, weight=None, bias=None, num_layers=1, nonlinearity='tanh', dropout=None, bidirectional=False, training=True, n_outputs=- 1, outputs=None)[source]¶
The RNN function implements an Elman RNN with a nonlinearity applied to the input sequence. It is defined as follows:
\[{\mathbf h_t} = {\mathbf \tanh}( {\mathbf w_{ih}} *{\mathbf x_t} + {\mathbf b_{ih}} + {\mathbf w_{hh}}* {\mathbf h_{(t-1)}} + {\mathbf b_{hh}}).\]
We use the following notations to describe the inputs and outputs below. \(T\): sequence length, \(B\): batch size, \(I\): input size, \(L\): number of layers, \(D\): number of directions (either 1 or 2), \(H\): hidden size.
References
- Parameters
x (Variable) – Input N-D array with shape \((T, B, I)\).
h (Variable) – Input N-D array with shape \((L, D, B, H)\).
weight_l0 (Variable) – Input N-D array with shape \((D, H, I + H)\). [parameter]
weight (Variable) – Input N-D array with shape \((L-1, D, H, D * H + H)\). [optional][parameter]
bias (Variable) – Input N-D array with shape \((L, D, H)\). [optional][parameter]
num_layers (int) – Number of layers in the network. If set to 1, only the weights for the first layer will be invoked. Default is 1. [default=1]
nonlinearity (string) – Type of nonlinearity applied to the input sequence. Must be either tanh or relu. Default is tanh. [default='tanh']
dropout (float) – Dropout ratio applied to parameters. Default is 0.0. [default=0.0]
bidirectional (bool) – If True, bidirectional computation will be performed in each layer. Default is False. [default=False]
training (bool) – Backpropagation will be performed only when it is True. Default is True. [default=True]
- Returns
Output \(y\) with shape \((T, B, D * H)\). Output \(h_n\) with shape \((L, D, B, H)\).
- Return type
Note
All nnabla functions in
nnabla.functions
are decorated with the nnabla.function_bases.function_api
decorator, which queries the current context and passes it into the first argument of the original function. The original function always takes a context as the first argument.
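Example
A minimal sketch (toy shapes; a single unidirectional layer, forward only) illustrating the shape notation above:
import numpy as np
import nnabla as nn
import nnabla.functions as F

nn.set_auto_forward(True)
T, B, I, H, L, D = 5, 2, 3, 4, 1, 1
x = nn.Variable.from_numpy_array(np.random.randn(T, B, I).astype(np.float32))
h0 = nn.Variable.from_numpy_array(np.zeros((L, D, B, H), dtype=np.float32))
w0 = nn.Variable.from_numpy_array(np.random.randn(D, H, I + H).astype(np.float32))
y, hn = F.rnn(x, h0, w0, num_layers=1, training=False)
print(y.shape, hn.shape)  # (5, 2, 4) (1, 1, 2, 4)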
- nnabla.functions.lstm(x, h, c, weight_l0, weight=None, bias=None, num_layers=1, dropout=None, bidirectional=False, training=True, n_outputs=- 1, outputs=None)[source]¶
N-Step LSTM layer.
\[\begin{split}{\mathbf f_t} &=& {\mathbf \sigma}( {\mathbf W_f} *{\mathbf x_t} + {\mathbf U_f}* {\mathbf h_{(t-1)}} + {\mathbf b_f})\\ {\mathbf i_t} &=& {\mathbf \sigma}( {\mathbf W_i} *{\mathbf x_t} + {\mathbf U_i}* {\mathbf h_{(t-1)}} + {\mathbf b_i})\\ {\mathbf o_t} &=& {\mathbf \sigma}( {\mathbf W_o} *{\mathbf x_t} + {\mathbf U_o}* {\mathbf h_{(t-1)}} + {\mathbf b_o})\\ {\mathbf c_t} &=& {\mathbf f_t}\odot {\mathbf c_{(t-1)}} + {\mathbf i_t}\odot {\mathbf \tanh}({\mathbf W_c}*{\mathbf x_t} + {\mathbf U_c} *{\mathbf h_{(t-1)}} + {\mathbf b_c})\\ {\mathbf h_t} &=& {\mathbf o_t} \odot {\mathbf \tanh}({\mathbf c_t}).\end{split}\]
We use the following notations to describe the inputs and outputs below. \(T\): sequence length, \(B\): batch size, \(I\): input size, \(L\): number of layers, \(D\): number of directions (either 1 or 2), \(H\): hidden size.
References
- Parameters
x (Variable) – Input N-D array with shape \((T, B, I)\).
h (Variable) – Input N-D array with shape \((L, D, B, H)\).
c (Variable) – Input N-D array with shape \((L, D, B, H)\).
weight_l0 (Variable) – weight parameters for the first layer. Shape is \((D, 4, H, I + H)\). [parameter]
weight (Variable) – weight parameters for the second layer and above. Shape is \((L-1, D, 4, H, D * H + H)\). [optional][parameter]
bias (Variable) – Bias vector (\(L\)). Shape is \((L, D, 4, H)\). [optional][parameter]
num_layers (int) – Number of layers in the network. If set to 1, only the weights for the first layer will be invoked. Default is 1. [default=1]
dropout (float) – Dropout ratio applied to parameters. Default is 0.0. [default=0.0]
bidirectional (bool) – If True, bidirectional computation will be performed in each layer. Default is False. [default=False]
training (bool) – Backpropagation will be performed only when it is True. Default is True. [default=True]
- Returns
Output \(y\) with shape \((T, B, D * H)\). Its memory layout can be reshaped as \((T, B, D, H)\). Output \(h_n\) with shape \((L, D, B, H)\). Output \(c_n\) with shape \((L, D, B, H)\).
- Return type
Note
All nnabla functions in
nnabla.functions
are decorated with the nnabla.function_bases.function_api
decorator, which queries the current context and passes it into the first argument of the original function. The original function always takes a context as the first argument.
- nnabla.functions.gru(x, h, weight_l0, weight=None, bias=None, num_layers=1, dropout=None, bidirectional=False, training=True, n_outputs=- 1, outputs=None)[source]¶
N-Step GRU layer.
\[\begin{split}{\mathbf r_t} &=& {\mathbf \sigma}( {\mathbf W_r} *{\mathbf x_t} + {\mathbf U_r}* {\mathbf h_{(t-1)}} + {\mathbf b_r})\\ {\mathbf z_t} &=& {\mathbf \sigma}( {\mathbf W_z} *{\mathbf x_t} + {\mathbf U_z}* {\mathbf h_{(t-1)}} + {\mathbf b_z})\\ {\mathbf n_t} &=& {\mathbf \tanh}( {\mathbf W_n}{\mathbf x_t}+ {\mathbf b_{in}}+ {\mathbf r_t}\odot( {\mathbf U_n}{\mathbf h_{t-1}}+ {\mathbf b_{hn}})) \\ {\mathbf h_t} &=& (1- {\mathbf z_t})\odot {\mathbf n_t} + {\mathbf z_t}\odot {\mathbf h_{t-1}}.\end{split}\]
We use the following notations to describe the inputs and outputs below. \(T\): sequence length, \(B\): batch size, \(I\): input size, \(L\): number of layers, \(D\): number of directions (either 1 or 2), \(H\): hidden size.
References
- Parameters
x (Variable) – Input N-D array with shape \((T, B, I)\).
h (Variable) – Input N-D array with shape \((L, D, B, H)\).
weight_l0 (Variable) – weight parameters for the first layer. Shape is \((D, 3, H, I + H)\). [parameter]
weight (Variable) – weight parameters for the second layer and above. Shape is \((L-1, D, 3, H, D * H + H)\). [optional][parameter]
bias (Variable) – Bias vector (\(L\)). Shape is \((L, D, 4, H)\). [optional][parameter]
num_layers (int) – Number of layers in the network. If set to 1, only the weights for the first layer will be invoked. Default is 1. [default=1]
dropout (float) – Dropout ratio applied to parameters. Default is 0.0. [default=0.0]
bidirectional (bool) – If True, bidirectional computation will be performed in each layer. Default is False. [default=False]
training (bool) – Backpropagation will be performed only when it is True. Default is True. [default=True]
- Returns
Output \(y\) with shape \((T, B, D * H)\). Its memory layout can be reshaped as \((T, B, D, H)\). Output \(h_n\) with shape \((L, D, B, H)\).
- Return type
Note
All nnabla functions in
nnabla.functions
are decorated with the nnabla.function_bases.function_api
decorator, which queries the current context and passes it into the first argument of the original function. The original function always takes a context as the first argument.
- nnabla.functions.multi_head_attention(query, key, value, num_heads, q_weight, k_weight, v_weight, out_weight, q_bias=None, k_bias=None, v_bias=None, out_bias=None, attn_bias_k=None, attn_bias_v=None, dropout=0.0, additive_mask=None, key_padding_mask=None)[source]¶
MultiHeadAttention.
Computes multi-headed attention with query, key, and value. We use the following notations to describe the inputs and outputs below. \(L_T\): target sequence length, \(L_S\): source sequence length, \(B\): batch size, \(D\): input dimension, \(E\): embedding dimension, \(H\): number of attention heads.
References
A. Vaswani et al. “Attention is All You Need.” NIPS. 2017. <https://papers.nips.cc/paper/7181-attention-is-all-you-need.pdf>
- Parameters
query (Variable) – Input N-D array with shape \((L_T, B, D_q)\).
key (Variable) – Input N-D array with shape \((L_S, B, D_k)\).
value (Variable) – Input N-D array with shape \((L_S, B, D_v)\).
num_heads (int) – Number of attention heads. Note that the embedding dimension E must be divisible by the number of heads. Default is 12, which is conventional.
q_weight (Variable) – Input N-D array with shape \((D_q, E)\).
k_weight (Variable) – Input N-D array with shape \((D_k, E)\).
v_weight (Variable) – Input N-D array with shape \((D_v, E_v)\).
out_weight (Variable) – Input N-D array with shape \((D_v, E_{out})\).
q_bias (Variable, optional) – Input N-D array with shape \((E, )\).
k_bias (Variable, optional) – Input N-D array with shape \((E, )\).
v_bias (Variable, optional) – Input N-D array with shape \((E_v, )\).
out_bias (Variable, optional) – Input N-D array with shape \((E_{out}, )\).
attn_bias_k (Variable, optional) – Input N-D array with shape \((E, )\).
attn_bias_v (Variable, optional) – Input N-D array with shape \((E_v, )\).
dropout (float, optional) – Dropout ratio applied to parameters. Default is 0.
additive_mask (Variable, optional) – Input N-D array with shape \((L_T, L_S)\). Values will be added to the attention layer to prevent attention to certain positions.
key_padding_mask (Variable, optional) – Input N-D array with shape \((B, L_S)\). Specified padding elements will be ignored by the attention layer. Values must be either 1 or 0.
- Returns
Output \(y\) with shape \((L_T, B, E_{out})\). Output \(h_n\) with shape \((B, L_T, L_S)\).
- Return type
- nnabla.functions.patch_correlation(x1, x2, patch=(1, 1), shift=(0, 0), patch_step=(1, 1), shift_step=(1, 1), padding=(0, 0, 0, 0), channel_last=False)[source]¶
Multiplicative patch-wise comparison between inputs x1 and x2, which must both be 4-dimensional NCHW (with channel_last=False) or NHWC (with channel_last=True) arrays (where N is the number of samples, H and W are the sample height and width, and C is the number of channels). The function returns a 5-D array with shape \((N, C_y, C_x, H_o, W_o)\), where \(H_o, W_o\) are determined by the possible patch locations within the, optionally padded, input image size, and \(C_y, C_x\) are determined by the optionally shifted patch positions.
Mathematically, the patch correlation is formulated as
\[O(s_y, s_x, h_0, w_0) = \sum_{c} \sum_{k_h} \sum_{k_w} I_1(c, h + k_h, w + k_w) \times I_2(c, h + k_h + s_h, w + k_w + s_w),\]where \(I_1(c, h, w)\) and \(I_2(c, h, w)\) are the inputs at \(c\)-th channel, \(h\)-th height, and \(w\)-th width, \(k_h, k_w\) indices for the patch size and \(s_h, s_w\) indices for the shifts.
A single correlation value (per sample) is produced if the patch extends to the image dimensions and all other parameters use the default values.
>>> import numpy as np, nnabla as nn, nnabla.functions as F
>>> N, C, H, W = (1, 2, 3, 4)
>>> x = nn.Variable.from_numpy_array(np.ones([N, C, H, W]))
>>> F.patch_correlation(x, x, patch=(H, W)).d
array([[[[[24.]]]]], dtype=float32)
A patch that is smaller than the image size moves horizontally and vertically, producing a value per position. The patch_step argument may be used to control the position increments.
>>> F.patch_correlation(x, x, patch=(H-1, W-1)).d
array([[[[[12., 12.], [12., 12.]]]]], dtype=float32)
>>> F.patch_correlation(x, x, patch=(H-1, W-1), patch_step=(2, 1)).d
array([[[[[12., 12.]]]]], dtype=float32)
Multiple correlations may be performed at each position between the patch from x1 and patches from x2 at relative offsets, striding the maximum vertical and horizontal distance given by the shift values at increments of shift_step. The shifted correlation values can be obtained from the second and third output dimensions for the vertical and horizontal shifts.
>>> F.patch_correlation(x, x, (H, 1), shift=(0, 1)).shape
(1, 1, 3, 1, 4)
>>> F.patch_correlation(x, x, (H, 1), shift=(0, 1)).d
array([[[[[0., 6., 6., 6.]], [[6., 6., 6., 6.]], [[6., 6., 6., 0.]]]]], dtype=float32)
>>> F.patch_correlation(x, x, (H, 1), shift=(0, 1), shift_step=(1, 2)).d
array([[[[[0., 6., 6., 6.]], [[6., 6., 6., 0.]]]]], dtype=float32)
Padding with zero values may be applied individually to the top, bottom, left and right side of the input image.
>>> F.patch_correlation(x, x, patch=(H, W), padding=(0, 1, W, W)).d
array([[[[[ 0.,  6., 12., 18., 24., 18., 12.,  6.,  0.], [ 0.,  4.,  8., 12., 16., 12.,  8.,  4.,  0.]]]]], dtype=float32)
This function may be used to implement the FlowNetC correlation layer.
>>> N, C, H, W = (1, 256, 44, 60)
>>> x1, x2 = nn.Variable((N, C, H, W)), nn.Variable((N, C, H, W))
>>> F.patch_correlation(x1, x2, shift=20, shift_step=2).shape
(1, 21, 21, 44, 60)
References
- Parameters
x1 (Variable) – Input N-D array with shape \((N, C, H, W)\) or \((N, H, W, C)\).
x2 (Variable) – Input N-D array with shape \((N, C, H, W)\) or \((N, H, W, C)\).
patch – A tuple with height and width of the correlation patch. A single integer expands to identical height and width.
shift – A tuple of maximum vertical and horizontal displacements of patches from x2 that are correlated with a single patch from x1. A single integer expands to identical vertical and horizontal displacement.
patch_step – A tuple of vertical and horizontal increments for advancing the position of the correlation patch within the input image shape. A single integer expands to identical vertical and horizontal increments.
shift_step – A tuple of vertical and horizontal increments for advancing the relative offset position within the shift range. A single integer expands to identical vertical and horizontal increments.
padding – A tuple of top, bottom, left and right padding extents. A tuple of two values yields identical top/bottom and left/right padding from the first and second tuple value. A single integer expands to identical padding extent for all sides.
channel_last – Last dimension is the channel (NHWC order) if True.
- Returns
N-D array with shape \((N, C_y, C_x, H_o, W_o)\), or \((N, H, W, C_y, C_x)\) if channel_last=True.
A spatial size of the output is calculated as
\[H_o = \frac{H + (top\_pad + bottom\_pad) - patch_v}{patch\_step_v} + 1.\]
A channel size of the output is calculated as
\[C_y = \frac{2 \times shift_v}{shift\_step_v} + 1.\]
\(W_o\) and \(C_x\) are computed in the same way with the corresponding components.
- Return type
Neural Network Activation¶
- nnabla.functions.sigmoid(x, n_outputs=- 1, outputs=None)[source]¶
Element-wise sigmoid function.
\[f(x) = \frac{1}{1 + \exp(-x)},\]
Note
All nnabla functions in
nnabla.functions
are decorated with the nnabla.function_bases.function_api
decorator, which queries the current context and passes it into the first argument of the original function. The original function always takes a context as the first argument.
- nnabla.functions.swish(x, n_outputs=- 1, outputs=None)[source]¶
Element-wise swish function, by Ramachandran et al. (2017).
\[y_i = \frac{x_i}{1 + \exp(-x_i)},\]
References
Note
All nnabla functions in
nnabla.functions
are decorated with the nnabla.function_bases.function_api
decorator, which queries the current context and passes it into the first argument of the original function. The original function always takes a context as the first argument.
- nnabla.functions.tanh(x, n_outputs=- 1, outputs=None)[source]¶
Element-wise hyperbolic tangent (tanh) function.
\[y_i = \tanh (x_i)\]
Note
All nnabla functions in
nnabla.functions
are decorated with the nnabla.function_bases.function_api
decorator, which queries the current context and passes it into the first argument of the original function. The original function always takes a context as the first argument.
- nnabla.functions.relu(x, inplace=False, n_outputs=- 1, outputs=None)[source]¶
Element-wise Rectified Linear Unit (ReLU) function.
\[y_i = \max (0, x_i)\]
- Parameters
- Returns
N-D array with the same shape as x
- Return type
Note
All nnabla functions in
nnabla.functions
are decorated with the nnabla.function_bases.function_api
decorator, which queries the current context and passes it into the first argument of the original function. The original function always takes a context as the first argument.
- nnabla.functions.softmax(x, axis=None, n_outputs=- 1, outputs=None)[source]¶
Softmax normalization. Calculates
\[y_i = \frac{\exp(x_i)}{\sum_j \exp(x_j)}\]
along the dimension specified by axis, where \(x_i\) is the input and \(y_i\) is the output.
- Parameters
- Returns
N-D array with the same shape as x
- Return type
Note
All nnabla functions in
nnabla.functions
are decorated with the nnabla.function_bases.function_api
decorator, which queries the current context and passes it into the first argument of the original function. The original function always takes a context as the first argument.
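Example
A minimal sketch showing that the outputs sum to 1 along the softmax axis (the last axis by default):
import numpy as np
import nnabla as nn
import nnabla.functions as F

nn.set_auto_forward(True)
x = nn.Variable.from_numpy_array(np.array([[1.0, 2.0, 3.0]], dtype=np.float32))
y = F.softmax(x)
print(y.d.sum(axis=-1))  # [1.]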
- nnabla.functions.log_softmax(x, axis=None, n_outputs=- 1, outputs=None)[source]¶
Fused operation of Softmax normalization followed by log, which is defined as
\[y_i = \log \frac{\exp(x_i)}{\sum_j \exp(x_j)},\]
where \(x_i\) is the input and \(y_i\) is the output at the i-th channel. An advantage of this fusion is reducing the numerical instability caused by the log application.
The original definition can be rewritten as
\[y_i = x_i - \max_j(x_j) - \log\left(\sum_j \exp(x_j - \max_k(x_k))\right).\]
This is more stable because the log is always applied to a value \(\ge 1\), while in the non-fused operation the log may be evaluated at 0.
Also, the backward gradient computation is more stable than the original one, since it avoids the division that the gradient of the log would introduce. It is defined as follows:
\[dx_i = dy_i - \exp(y_i) \sum_j dy_j,\]
where \(dx_i\) and \(dy_i\) denote the gradients of the loss w.r.t. \(x_i\) and \(y_i\), respectively.
- Parameters
- Returns
N-D array with the same shape as x
- Return type
Note
All nnabla functions in
nnabla.functions
are decorated with the nnabla.function_bases.function_api
decorator, which queries the current context and passes it into the first argument of the original function. The original function always takes a context as the first argument.
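Example
A minimal sketch evaluating log_softmax on large logits; the fused, max-subtracted formulation keeps the result finite:
import numpy as np
import nnabla as nn
import nnabla.functions as F

nn.set_auto_forward(True)
x = nn.Variable.from_numpy_array(np.array([[1000.0, 1001.0, 1002.0]], dtype=np.float32))
y = F.log_softmax(x)
print(y.d)  # finite values, approximately [[-2.4076 -1.4076 -0.4076]]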
- nnabla.functions.elu(x, alpha=1.0, n_outputs=- 1, outputs=None)[source]¶
Element-wise Exponential Linear Unit (ELU) function.
\[\begin{split}y_i= \left\{ \begin{array}{ll} x_i & (x > 0)\\ \alpha (\exp(x_i) - 1) & (x \leq 0) \end{array} \right..\end{split}\]
References
- Parameters
- Returns
N-D array with the same shape as x
- Return type
Note
All nnabla functions in
nnabla.functions
are decorated with the nnabla.function_bases.function_api
decorator, which queries the current context and passes it into the first argument of the original function. The original function always takes a context as the first argument.
- nnabla.functions.selu(x, scale=1.05070098735548, alpha=1.673263242354377, n_outputs=- 1, outputs=None)[source]¶
Element-wise Scaled Exponential Linear Unit (SELU) function by Klambauer et al. (2017).
\[\begin{split}y_i= \lambda \left\{ \begin{array}{ll} x_i & (x > 0)\\ \alpha (\exp(x_i) - 1) & (x \leq 0) \end{array} \right..\end{split}\]
The coefficients \(\lambda\) and \(\alpha\) default to the following values \(\lambda_{01}\) and \(\alpha_{01}\), respectively, provided by Klambauer et al. (2017):
\[\begin{split}\begin{array}{lll} \lambda_{01} &=& \left( 1 - \operatorname{erfc}\left( \frac{1}{\sqrt{2}} \right) \sqrt{e} \right) \sqrt{2 \pi} \\ && \left( 2 \operatorname{erfc} \left( \sqrt{2} \right) e^2 + \pi \operatorname{erfc}\left( \frac{1}{\sqrt{2}} \right)^2 e \right. \\ && \left. - 2(2 + \pi) \operatorname{erfc} \left( \frac{1}{\sqrt{2}} \right) \sqrt{e} + \pi + 2 \right)^{-1/2} \\ &\approx& 1.0507 \\ \alpha_{01} &=& - \frac {\sqrt {\frac {2}{\pi}}} {\operatorname{erfc} \left( \frac{1}{\sqrt{2}} \right) \exp \left(\frac {1} {2} \right) - 1} \\ &\approx& 1.67326 \end{array}\end{split}\]References
- Parameters
- Returns
N-D array with the same shape as x
- Return type
Note
All nnabla functions in
nnabla.functions
are decorated with the nnabla.function_bases.function_api
decorator, which queries the current context and passes it into the first argument of the original function. The original function always takes a context as the first argument.
- nnabla.functions.crelu(x, axis=1, n_outputs=- 1, outputs=None)[source]¶
Element-wise Concatenated Rectified Linear Unit (CReLU) function. This function calculates the ReLU of \(x\) and \(-x\) , then concatenates the results together at a specified axis, and returns the resulting array.
References
- Parameters
- Returns
N-D array where axis dimension is doubled by concatenating.
- Return type
Note
All nnabla functions in
nnabla.functions
are decorated with the nnabla.function_bases.function_api
decorator, which queries the current context and passes it into the first argument of the original function. The original function always takes a context as the first argument.
- nnabla.functions.celu(x, alpha=1.0, axis=1, n_outputs=- 1, outputs=None)[source]¶
Element-wise Concatenated Exponential Linear Unit (CELU) function. Concatenates ELU outputs of positive and negative inputs together at specified axis.
- Parameters
- Returns
N-D array where axis dimension is doubled by concatenating.
- Return type
Note
All nnabla functions in
nnabla.functions
are decorated with the nnabla.function_bases.function_api
decorator, which queries the current context and passes it into the first argument of the original function. The original function always takes a context as the first argument.
- nnabla.functions.gelu(x, n_outputs=- 1, outputs=None)[source]¶
Gaussian Error Unit (GELU) function.
\[GELU(x) = xP(X \leq x) = x \Phi (x),\]
which is approximated by
\[GELU(x) = 0.5x \left(1 + \tanh\left(\sqrt{2/\pi}\,(x + 0.044715x^3)\right)\right).\]
References
Note
All nnabla functions in
nnabla.functions
are decorated with the nnabla.function_bases.function_api
decorator, which queries the current context and passes it into the first argument of the original function. The original function always takes a context as the first argument.
- nnabla.functions.mish(x, n_outputs=- 1, outputs=None)[source]¶
Mish activation function.
\[Mish(x) = x \tanh(\log(1+\exp(x)))\]
References
Note
All nnabla functions in
nnabla.functions
are decorated with the nnabla.function_bases.function_api
decorator, which queries the current context and passes it into the first argument of the original function. The original function always takes a context as the first argument.
- nnabla.functions.prelu(x0, x1, base_axis=1, n_outputs=- 1, outputs=None)[source]¶
Element-wise Parametrized Rectified Linear Unit function. Calculates:
\[y_i = \max(0, x_i) + w_i \min(0, x_i),\]
where the negative slope \(w\) is learned and can vary across channels (an axis specified with base_axis).
- Parameters
- Returns
N-D array.
- Return type
Note
All nnabla functions in
nnabla.functions
are decorated with the nnabla.function_bases.function_api
decorator, which queries the current context and passes it into the first argument of the original function. The original function always takes a context as the first argument.
- nnabla.functions.leaky_relu(x, alpha=0.1, inplace=False, n_outputs=- 1, outputs=None)[source]¶
Element-wise Leaky Rectified Linear Unit (ReLU) function.
It is defined as:
\[y_i = \alpha \min(0, x_i) + \max (0, x_i)\]
- Parameters
- Returns
N-D array with the same shape as x
- Return type
Note
All nnabla functions in
nnabla.functions
are decorated with the nnabla.function_bases.function_api
decorator, which queries the current context and passes it into the first argument of the original function. The original function always takes a context as the first argument.
- nnabla.functions.relu6(x, n_outputs=- 1, outputs=None)[source]¶
Element-wise ReLU6 function. Capping ReLU activation to 6 is often observed to learn sparse features earlier.
\[ReLU6(x) = \min(\max(0, x), 6)\]
Note
All nnabla functions in
nnabla.functions
are decorated with the nnabla.function_bases.function_api
decorator, which queries the current context and passes it into the first argument of the original function. The original function always takes a context as the first argument.
- nnabla.functions.hard_sigmoid(x, n_outputs=- 1, outputs=None)[source]¶
Segment-wise linear approximation of sigmoid. Preferable when speed of computation is more important than precision. Returns \(0\) if \(x < -2.5\). Returns \(1\) if \(x> 2.5\). Returns \(0.2x + 0.5\) if \(-2.5 <= x <= 2.5\).
Note
All nnabla functions in
nnabla.functions
are decorated with the nnabla.function_bases.function_api
decorator, which queries the current context and passes it into the first argument of the original function. The original function always takes a context as the first argument.
- nnabla.functions.hard_tanh(x, n_outputs=- 1, outputs=None)[source]¶
Element-wise HardTanh function. Computationally cheaper than Tanh function. Returns \(1\) if \(x > 1\). Returns \(-1\) if \(x < -1\). Returns \(x\) otherwise.
Note
All nnabla functions in
nnabla.functions
are decorated with the nnabla.function_bases.function_api
decorator, which queries the current context and passes it into the first argument of the original function. The original function always takes a context as the first argument.
- nnabla.functions.log_sigmoid(x, n_outputs=- 1, outputs=None)[source]¶
Element-wise LogSigmoid function.
\[LogSigmoid(x) = \log(1/(1+\exp(-x)))\]
Note
All nnabla functions in
nnabla.functions
are decorated with the nnabla.function_bases.function_api
decorator, which queries the current context and passes it into the first argument of the original function. The original function always takes a context as the first argument.
- nnabla.functions.softplus(x, beta=1.0, n_outputs=- 1, outputs=None)[source]¶
Element-wise SoftPlus function. Unlike Sigmoid and Tanh, which are bounded above and below, SoftPlus is only bounded below by 0.
\[SoftPlus(x) = \frac{1}{\beta} \log(1+\exp(\beta x))\]
- Parameters
- Returns
N-D array with the same shape as x
- Return type
Note
All nnabla functions in
nnabla.functions
are decorated with the nnabla.function_bases.function_api
decorator, which queries the current context and passes it into the first argument of the original function. The original function always takes a context as the first argument.
- nnabla.functions.softsign(x, n_outputs=- 1, outputs=None)[source]¶
Element-wise SoftSign. Can be used in place of Tanh function. While Tanh converges exponentially, SoftSign converges polynomially.
\[SoftSign(x) = x/(1+|x|)\]
Note
All nnabla functions in
nnabla.functions
are decorated with the nnabla.function_bases.function_api
decorator, which queries the current context and passes it into the first argument of the original function. The original function always takes a context as the first argument.
- nnabla.functions.tanh_shrink(x, n_outputs=- 1, outputs=None)[source]¶
Element-wise TanhShrink function.
\[TanhShrink(x) = x - \tanh(x)\]
Note
All nnabla functions in
nnabla.functions
are decorated with the nnabla.function_bases.function_api
decorator, which queries the current context and passes it into the first argument of the original function. The original function always takes a context as the first argument.
- nnabla.functions.sinc(x, n_outputs=- 1, outputs=None)[source]¶
Element-wise Sinc function. Unlike other popular activation functions, it has rises and falls. Returns \(1\) if \(x = 0\). Returns \(\sin(x)/x\) otherwise.
Note
All nnabla functions in
nnabla.functions
are decorated with the nnabla.function_bases.function_api
decorator, which queries the current context and passes it into the first argument of the original function. The original function always takes a context as the first argument.
Normalization¶
- nnabla.functions.batch_normalization(x, beta, gamma, mean, variance, axes=[1], decay_rate=0.9, eps=1e-05, batch_stat=True, output_stat=False, n_outputs=None)[source]¶
Batch normalization.
\[\begin{split}\begin{eqnarray} \mu &=& \frac{1}{M} \sum x_i \\ \sigma^2 &=& \frac{1}{M} \sum \left(x_i - \mu\right)^2 \\ \hat{x}_i &=& \frac{x_i - \mu}{\sqrt{\sigma^2 + \epsilon}} \\ y_i &=& \hat{x}_i \gamma + \beta. \end{eqnarray}\end{split}\]
At testing time, the mean and variance values used are those that were computed during training by moving average.
References
- Parameters
x (Variable) – N-D array of input.
beta (Variable or None) – N-D array of beta which is learned. If None, the bias term is omitted.
gamma (Variable or None) – N-D array of gamma which is learned. If None, the scale term is omitted.
mean (Variable or None) – N-D array of running mean (modified during forward execution). If None, dummy variable is created and running mean is not updated. mean=None with batch_stat=False is prohibited.
variance (Variable or None) – N-D array of running variance (modified during forward execution). If None, dummy variable is created and running variance is not updated. variance=None with batch_stat=False is prohibited.
axes (list of int or int) – Mean and variance are calculated along these axes.
decay_rate (float) – Decay rate of running mean and variance.
eps (float) – Tiny value to avoid zero division by std.
batch_stat (bool) – Use mini-batch statistics rather than running ones. If False, mean and variance must be ~nnabla.Variable (None is prohibited).
output_stat (bool) – If true, the batch statistics of mean and variance will be returned as Variables. They are also differentiable.
- Returns
Returns batch normalization output as Variable. If output_stat=True, it also returns the mean and variance of the mini-batch.
See also
nnabla.function_bases.batch_normalization
.
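Example
A minimal sketch (toy shapes; the parameter shapes assume axes=[1]) checking that the per-channel mean of the output is close to zero in training mode:
import numpy as np
import nnabla as nn
import nnabla.functions as F

nn.set_auto_forward(True)
x = nn.Variable.from_numpy_array(np.random.randn(8, 4, 5, 5).astype(np.float32))
beta = nn.Variable.from_numpy_array(np.zeros((1, 4, 1, 1), dtype=np.float32))
gamma = nn.Variable.from_numpy_array(np.ones((1, 4, 1, 1), dtype=np.float32))
mean = nn.Variable.from_numpy_array(np.zeros((1, 4, 1, 1), dtype=np.float32))
variance = nn.Variable.from_numpy_array(np.ones((1, 4, 1, 1), dtype=np.float32))
y = F.batch_normalization(x, beta, gamma, mean, variance, axes=[1], batch_stat=True)
print(np.allclose(y.d.mean(axis=(0, 2, 3)), 0, atol=1e-5))  # True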
- nnabla.functions.fused_batch_normalization(x, beta, gamma, mean, variance, z=None, axes=[1], decay_rate=0.9, eps=1e-05, batch_stat=True, nonlinearity='relu', output_stat=False, n_outputs=None)[source]¶
Batch normalization fused with an add operation and an activation.
References
- Parameters
x (Variable) – N-D array of input.
beta (Variable or None) – N-D array of beta which is learned. If None, the bias term is omitted.
gamma (Variable or None) – N-D array of gamma which is learned. If None, the scale term is omitted.
mean (Variable or None) – N-D array of running mean (modified during forward execution). If None, dummy variable is created and running mean is never updated. mean=None with batch_stat=False is prohibited.
variance (Variable or None) – N-D array of running variance (modified during forward execution). If None, a dummy variable is created and the running variance is not updated. variance=None with batch_stat=False is prohibited.
z (Variable, optional) – N-D array
axes (list of int or int) – Mean and variance are calculated along these axes.
decay_rate (float) – Decay rate of running mean and variance.
eps (float) – Tiny value to avoid zero division by std.
batch_stat (bool) – Use mini-batch statistics rather than running ones. If False, mean and variance must be ~nnabla.Variable (None is prohibited).
nonlinearity (str) – Nonlinearity chosen from relu. Default is relu.
output_stat (bool) – If true, the batch statistics of mean and variance will be returned as Variables. They are also differentiable.
- Returns
Returns batch normalization output as Variable. If output_stat=True, it also returns the mean and variance of the mini-batch.
See also
nnabla.function_bases.batch_normalization
.
- nnabla.functions.sync_batch_normalization(x, beta, gamma, mean, variance, comm, group='world', axes=[1], decay_rate=0.9, eps=1e-05, batch_stat=True, output_stat=False, n_outputs=None)[source]¶
Synchronized batch normalization.
For some tasks (e.g., semantic segmentation), the batch size may be too small for the BatchNormalization layer to work well. The SyncBatchNormalization layer solves this problem by synchronizing the batch statistics (mean and variance) between multiple processes.
\[\begin{split}\begin{eqnarray} \mu &=& \frac{1}{M} \sum x_i \\ \sigma^2 &=& \frac{1}{M} \sum \left(x_i - \mu\right)^2 \\ \hat{x}_i &=& \frac{x_i - \mu}{\sqrt{\sigma^2 + \epsilon}} \\ y_i &=& \hat{x}_i \gamma + \beta. \end{eqnarray}\end{split}\]
References
Implementing Synchronized Multi-GPU Batch Normalization https://hangzhang.org/PyTorch-Encoding/notes/syncbn.html
- Parameters
x (Variable) – N-D array of input.
beta (Variable or None) – N-D array of beta which is learned. If None, the bias term is omitted.
gamma (Variable or None) – N-D array of gamma which is learned. If None, the scale term is omitted.
mean (Variable or None) – N-D array of running mean (modified during forward execution). If None, dummy variable is created and running mean is never updated. mean=None with batch_stat=False is prohibited.
variance (Variable or None) – N-D array of running variance (modified during forward execution). If None, dummy variable is created and running variance is never updated. variance=None with batch_stat=False is prohibited.
comm (Communicator) – The communicator
group (string) – The name of the communicator group
axes (list of int or int) – Mean and variance are calculated along these axes.
decay_rate (float) – Decay rate of running mean and variance.
eps (float) – Tiny value to avoid zero division by std.
batch_stat (bool) – Use mini-batch statistics rather than running ones. If False, mean and variance must be ~nnabla.Variable (None is prohibited).
output_stat (bool) – If true, the batch statistics of mean and variance will be returned as Variables. They are also differentiable.
- Returns
Returns batch normalization output as Variable. If output_stat=True, it also returns the mean and variance of the mini-batch.
See also
nnabla.function_bases.batch_normalization
.
- nnabla.functions.mean_subtraction(x, mean, t, base_axis=1, update_running_mean=True)[source]¶
It subtracts the mean of the elements of the input array so that the mean becomes \(0\). Preprocessing arrays with this function can improve accuracy in various tasks such as image classification.
At training time, this function is defined as
\[\begin{split}\begin{eqnarray} \mu &=& \frac{1}{M} \sum x_i \\ y_i &=& x_i - \mu \end{eqnarray}\end{split}\]
At testing time, the mean values used are those that were computed during training by moving average.
Note
The backward performs an approximated differentiation that takes into account only the latest mini-batch.
- Parameters
x (Variable) – N-D array of input.
mean (Variable) – N-D array of running mean (modified during forward execution).
t (Variable) – Scalar of num of iteration of running mean (modified during forward execution).
base_axis (int) – Base axis of the Mean Subtraction operation. Dimensions up to base_axis are treated as sample dimensions. [default=1]
update_running_mean (bool) – Update the running mean during forward execution. [default=True]
- Returns
N-D array.
- Return type
See also
nnabla.function_bases.mean_subtraction
.
- nnabla.functions.norm_normalization(x, p=None, axes=None, eps=1e-12)[source]¶
Norm normalization.
\[y_i = \frac{x_i}{\|x\|_p}\]
- Parameters
x (Variable) – N-D array.
p (float) – Order of the norm. [default=2]
axes (repeated int64) – Axes to be reduced. If an empty list is given, all dimensions are reduced. [default=range(x.ndim)]
eps (float) – Epsilon for the normalization. This eps is added before taking the p-th root in the norm computation. [default=1e-12]
- Returns
N-D array
- Return type
- nnabla.functions.clip_by_value(x, min, max)[source]¶
Clip inputs by values.
\[\begin{split}y = \begin{cases} max & (x > max) \\ x & (otherwise) \\ min & (x < min) \end{cases}.\end{split}\]
- Parameters
x (Variable) – An input variable.
min (Variable or float) – A min variable or float value by which x is clipped. Note that if a Variable is given, its shape must be the same as x’s.
max (Variable or float) – A max variable or float value by which x is clipped. Note that if a Variable is given, its shape must be the same as x’s.
- Returns
N-D array.
- Return type
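Example
A minimal sketch using float bounds:
import numpy as np
import nnabla as nn
import nnabla.functions as F

nn.set_auto_forward(True)
x = nn.Variable.from_numpy_array(np.array([-2.0, 0.5, 3.0], dtype=np.float32))
y = F.clip_by_value(x, min=-1.0, max=1.0)
print(y.d)  # [-1.   0.5  1. ]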
- nnabla.functions.clip_grad_by_value(x, min, max, n_outputs=- 1, outputs=None)[source]¶
In forward pass, the function behaves as the identity.
In backward pass,
\[\begin{split}g_x = \begin{cases} max & (g_y > max) \\ g_y & (otherwise) \\ min & (g_y < min) \end{cases}.\end{split}\]
A typical use case is to prevent gradient explosion through a whole computational graph. For example, if you want to clip gradient values for each feature map,
x = nn.Variable([16, 3, 32, 32])
min = F.broadcast(nn.Variable.from_numpy_array(np.asarray([-1.0]).reshape((1, 1, 1, 1))), (16, 3, 32, 32))
max = F.broadcast(nn.Variable.from_numpy_array(np.asarray([1.0]).reshape((1, 1, 1, 1))), (16, 3, 32, 32))
c = F.clip_grad_by_value(x, min=min, max=max)
h = PF.convolution(c, 64, (3, 3), pad=(1, 1))
- Parameters
x (Variable) – N-D array of input.
min (Variable) – N-D array of minimum input value by which the gradients of y are clipped. Note that the shape of min must be the same as x’s, and the backward to min is not performed.
max (Variable) – N-D array of maximum input value by which the gradients of y are clipped. Note that the shape of max must be the same as x’s, and the backward to max is not performed.
- Returns
N-D array.
- Return type
Note
All nnabla functions in
nnabla.functions
are decorated with the nnabla.function_bases.function_api
decorator, which queries the current context and passes it into the first argument of the original function. The original function always takes a context as the first argument.
- nnabla.functions.clip_by_norm(x, clip_norm, axis=None)[source]¶
Clip inputs by their L2 norm when the L2 norm is larger than the threshold value (defined by clip_norm). If it is less than the threshold, inputs are not modified. When clipping is applied, the operation is represented as
\[y = N \times \frac{x}{\|x\|_2},\]
where \(x\) is the input, \(y\) is the output, and \(N\) is clip_norm. This is the case when axes is not set. When axes is set, the norm is computed over axes.
- Parameters
- Returns
N-D array.
- Return type
- nnabla.functions.clip_grad_by_norm(x, clip_norm=None, axes=None, n_outputs=- 1, outputs=None)[source]¶
In the forward pass, the function behaves like the identity.
In the backward pass,
\[g_x = N \times \frac{g_y}{\|g_y\|_2},\]
where \(g_x\) is the gradient w.r.t. the input, \(g_y\) is the gradient w.r.t. the output, and \(N\) is clip_norm, which the norm of \(g_y\) is scaled to. This is the case when axes is not set. When axes is set, the norm is computed over axes.
A typical use case is to prevent gradient explosion through a whole computational graph. For example, if you want to normalize gradient values over the feature axis,
x = nn.Variable([16, 3, 32, 32])
c = F.clip_grad_by_norm(x, axes=(1, ))
h = PF.convolution(c, 64, (3, 3), pad=(1, 1))
- Parameters
- Returns
N-D array.
- Return type
Note
All nnabla functions in
nnabla.functions
are decorated with the nnabla.function_bases.function_api
decorator, which queries the current context and passes it into the first argument of the original function. The original function always takes a context as the first argument.
- nnabla.functions.layer_normalization(x, beta, gamma, batch_axis=0, eps=1e-05, output_stat=False)[source]¶
Applies Layer Normalization over an input tensor, which is defined as:
\[\begin{split}\begin{eqnarray} \mu^l &=& \frac{1}{H} \sum_{i=1}^{H} x_i^l \\ \sigma^l &=& \sqrt{\frac{1}{H} \sum_{i=1}^{H} \left(x_i^l - \mu^l\right)^2 + \epsilon} \\ y &=& \frac{x - \mu^l}{\sigma^l} \gamma + \beta \end{eqnarray}\end{split}\]
where \(x\) and \(y\) are the input and output variables, \(\mu^l\) and \(\sigma^l\) are the mean and std of each layer, calculated separately for each sample in the batch, and \(\beta\) and \(\gamma\) are adaptive biases and gains.
If the input shape is [B, C, H, W] (= batch_axis=0), the shapes of the calculated mean and std are [B, 1, 1, 1].
References
- Parameters
x (Variable) – An input variable.
beta (Variable or None) – Adaptive biases. If None, the bias term is omitted.
gamma (Variable or None) – Adaptive gains. If None, the scale term is omitted.
batch_axis (int or repeated int) – Axes along which the mean and variance are taken.
eps (float) – Tiny value to avoid zero division by std.
output_stat (bool) – If true, calculated mean and variance are also returned.
- Returns
Output variable, normalized by its statistics and rescaled by gamma and beta.
* Variable: Mean (if output_stat=True)
* Variable: Std (if output_stat=True)
- Return type
- nnabla.functions.instance_normalization(x, beta, gamma, channel_axis=1, batch_axis=0, eps=1e-05, output_stat=False)[source]¶
Applies Instance Normalization over an input tensor, which is defined as:
\[\begin{split}\begin{eqnarray} \mu^i &=& \frac{1}{H} \sum_{i=1}^{H} x_i^i \\ \sigma^i &=& \sqrt{\frac{1}{H} \sum_{i=1}^{H} \left(x_i^i - \mu^i\right)^2 + \epsilon} \\ y &=& \frac{x - \mu^i}{\sigma^i} \gamma + \beta \end{eqnarray}\end{split}\]
where \(x\) and \(y\) are the input and output variables, \(\mu^i\) and \(\sigma^i\) are the mean and std of each instance, calculated separately for each batch and channel, and \(\gamma\) and \(\beta\) are adaptive gains and biases.
If the input shape is [B, C, H, W] (= channel_axis=1, batch_axis=0), the shapes of the calculated mean and std are [B, C, 1, 1].
References
- Parameters
x (Variable) – An input variable.
beta (Variable or None) – Adaptive biases. If None, the bias term is omitted.
gamma (Variable or None) – Adaptive gains. If None, the scale term is omitted.
channel_axis (int) – Channel axis.
batch_axis (int or repeated int) – Batch axes.
eps (float) – Tiny value to avoid zero division by std.
output_stat (bool) – If true, the batch statistics of mean and variance are also returned.
- Returns
Normalized output variable.
* Variable: Mean (if output_stat=True)
* Variable: Std (if output_stat=True)
- Return type
- nnabla.functions.group_normalization(x, beta, gamma, num_groups, channel_axis=1, batch_axis=0, eps=1e-05, output_stat=False)[source]¶
Applies Group Normalization over an input tensor, which is defined as:
\[\begin{split}\begin{eqnarray} \mu^g &=& \frac{1}{H} \sum_{i=1}^{H} x_i^g \\ \sigma^g &=& \sqrt{\frac{1}{H} \sum_{i=1}^{H} \left(x_i^g - \mu^g\right)^2 + \epsilon} \\ y &=& \frac{x - \mu^g}{\sigma^g} \gamma + \beta \end{eqnarray}\end{split}\]where \(x\) and \(y\) are input and output variable, \(\mu^g\) and \(\sigma^g\) are the mean and std of each group which contains
num_channels / num_groups
channels, and \(\gamma\) and \(\beta\) are adaptive gains and biases.The input channels, specified by
channel_axis
, are separated intonum_groups
groups, and the mean and std are calculated over each group. For example, if the input shape is [B, C, H, W] (= channel_axis=1, batch_axis=0), the input variable is first reshaped to [B, num_groups, C / num_groups, H, W] and standardized by its mean and std, whose shapes are [B, num_groups, 1, 1, 1]. Finally, the output variable is reshaped back to the original input shape (= [B, C, H, W] in the case above).
References
- Parameters
x (Variable) – An input variable.
beta (Variable or None) – Adaptive biases. If None, the bias term is omitted.
gamma (Variable or None) – Adaptive gains. If None, the scale term is omitted.
num_groups (int) – A number of groups. The channel dim of ‘x’ must be an integer multiple of num_groups.
channel_axis (int) – Channel axis.
batch_axis (int or repeated int) – Batch axes.
eps (float) – Tiny value to avoid zero division by std.
output_stat (bool) – If true, the batch statistics of mean and variance are also returned.
- Returns
Normalized output variable.
* Variable: Mean (if output_stat=True)
* Variable: Std (if output_stat=True)
- Return type
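Example
A minimal sketch (toy shapes; beta and gamma are omitted by passing None) in which 8 channels are normalized in 4 groups of 2 channels each:
import numpy as np
import nnabla as nn
import nnabla.functions as F

nn.set_auto_forward(True)
x = nn.Variable.from_numpy_array(np.random.randn(2, 8, 4, 4).astype(np.float32))
y = F.group_normalization(x, None, None, num_groups=4)
print(y.shape)  # (2, 8, 4, 4)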
- nnabla.functions.weight_standardization(w, channel_axis=0, eps=1e-05, output_stat=False)[source]¶
Applies Weight Standardization over an input weight, which is defined as:
\[\begin{split}\begin{eqnarray} \mu_{W_i} &=& \frac{1}{I} \sum_{j=1}^{I} W_{ij} \\ \sigma_{W_i} &=& \sqrt{\frac{1}{I} \sum_{j=1}^{I} \left(W_{ij} - \mu_{W_{i}}\right)^2 + \epsilon} \\ \hat{W_{ij}} &=& \frac{W_{ij} - \mu_{W_i}}{\sigma_{W_i}} \\ y &=& \hat{W} \ast x \end{eqnarray}\end{split}\]
Example
import numpy as np
import nnabla as nn
import nnabla.functions as F
import nnabla.parametric_functions as PF

rng = np.random.RandomState(313)
x = nn.Variable.from_numpy_array(rng.randn(*(32, 16, 3, 3)))

# For convolution:
def ws_callback_conv(w):
    return F.weight_standardization(w, channel_axis=0)

y = PF.convolution(x, 10, (2, 2), apply_w=ws_callback_conv)

# For affine:
def ws_callback_affine(w):
    return F.weight_standardization(w, channel_axis=1)

y = PF.affine(x, 10, apply_w=ws_callback_affine)
References
- nnabla.functions.weight_normalization(w, g, dim=0, eps=1e-12, n_outputs=- 1, outputs=None)[source]¶
Weight normalization.
\[\mathbf{w}_{WN} = g \dfrac{\mathbf{w}}{\|\mathbf{w}\|},\]
where \(\mathbf{w}\) are the input weights to be normalized and \(g\) is a learnable multiplication factor, each element of which is applied to the weights along dim.
References
- Parameters
w (Variable) – N-D array of learnable weights.
g (Variable) – 1-D array of learnable scales.
dim (int) – Output dimension. For the other dimensions, the norms are computed. [default=0]
eps (float) – Epsilon for the normalization. This eps is added before taking the sqrt in the norm computation. [default=1e-12]
- Returns
N-D array
- Return type
Note
All nnabla functions in
nnabla.functions
are decorated with the nnabla.function_bases.function_api
decorator, which queries the current context and passes it into the first argument of the original function. The original function always takes a context as the first argument.
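Example
A minimal sketch (toy shapes) checking that each output slice along dim has norm \(g\):
import numpy as np
import nnabla as nn
import nnabla.functions as F

nn.set_auto_forward(True)
w = nn.Variable.from_numpy_array(np.random.randn(10, 20).astype(np.float32))
g = nn.Variable.from_numpy_array(np.full((10,), 2.0, dtype=np.float32))
w_wn = F.weight_normalization(w, g, dim=0)
print(np.allclose(np.linalg.norm(w_wn.d, axis=1), 2.0, atol=1e-4))  # True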
- nnabla.functions.spectral_norm(w, u, dim=0, itr=1, eps=1e-12, test=False, n_outputs=- 1, outputs=None)[source]¶
Spectral Normalization.
\[W_{sn} = \frac{W}{\sigma(W)},\]
where \(W\) is the input matrix and \(\sigma(W)\) is the spectral norm of \(W\). The spectral norm is approximately computed by the power iteration.
References
Takeru Miyato, Toshiki Kataoka, Masanori Koyama, Yuichi Yoshida, “Spectral Normalization for Generative Adversarial Networks”, International Conference on Learning Representations. 2018.
- Parameters
w (Variable) – N-D array of learnable weights. This is normally network parameter.
u (Variable) – 1-D array of the singular vector. When test == False, the data region of u will be updated during forward calculation.
dim (int) – Output dimension. Default is 0. If the dimension is not 0, the specified dimension becomes the left-most dimension by transposing. [default=0]
itr (int) – Number of power iterations. Default is 1. [default=1]
eps (float) – Epsilon for the normalization. This eps is added before taking the sqrt in the norm computation. [default=1e-12]
test (bool) – When True, u will not be updated. Default is False. [default=False]
- Returns
Spectrally normalized \(W_{sn}\) with the same shape as \(W\).
- Return type
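A minimal usage sketch, not part of the original reference; it assumes u has length w.shape[dim], matching the singular-vector description above:

import numpy as np
import nnabla as nn
import nnabla.functions as F

rng = np.random.RandomState(313)
w = nn.Variable.from_numpy_array(rng.randn(10, 5))
# u approximates the leading singular vector; its data region is
# updated in-place by the power iteration while test=False.
u = nn.Variable.from_numpy_array(rng.randn(10))
w_sn = F.spectral_norm(w, u, dim=0, itr=1)
w_sn.forward()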
Reduction¶
- nnabla.functions.sum(x, axis=None, keepdims=False)[source]¶
Reduction along axes with sum operation.
- Parameters
x (Variable) – An input variable.
axis (None, int or tuple of ints) – Axis or axes along which the sum is calculated. The default value None will reduce all dimensions.
keepdims (bool) – Keep reduced axes as dimension with 1 element.
- Returns
N-D array.
- Return type
- nnabla.functions.mean(x, axis=None, keepdims=False)[source]¶
Reduction along axes with mean operation.
- Parameters
x (Variable) – An input variable.
axis (None, int or tuple of ints) – Axis or axes along which the mean is calculated. The default value None will reduce all dimensions.
keepdims (bool) – Keep reduced axes as dimension with 1 element.
- Returns
N-D array.
- Return type
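As a quick sketch of the axis and keepdims semantics shared by sum and mean (mirroring the max/min examples below):

import numpy as np
import nnabla as nn
import nnabla.functions as F

nn.set_auto_forward(True)
x = nn.Variable.from_numpy_array(np.random.rand(2, 3, 4))
s = F.sum(x, axis=(1, 2), keepdims=True)   # shape (2, 1, 1)
m = F.mean(x, axis=0)                      # shape (3, 4)
assert np.allclose(s.d, np.sum(x.d, axis=(1, 2), keepdims=True))
assert np.allclose(m.d, np.mean(x.d, axis=0))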
- nnabla.functions.max(x, axis=None, keepdims=False, with_index=False, only_index=False)[source]¶
Reduce the input N-D array x along the given axis using the max operation. The axis argument may be a single integer to reduce over one axis, a tuple of integers to reduce over multiple axes, or None to reduce over all axes. If keepdims is True, the output will keep all reduced dimensions with size 1. If with_index is True, the result is a tuple (values, indices), or only indices if only_index is True. Setting only_index to True implies that with_index is also True.

import numpy as np
import nnabla as nn
import nnabla.functions as F

nn.set_auto_forward(True)
x = nn.Variable.from_numpy_array(np.random.rand(2, 3, 4))

maxval = F.max(x, axis=1)
assert np.allclose(maxval.d, np.max(x.d, axis=1))

maxval, indices = F.max(x, axis=1, with_index=True)
assert np.allclose(maxval.d, np.max(x.d, axis=1))
assert np.all(indices.d == np.argmax(x.d, axis=1))

indices = F.max(x, axis=1, only_index=True)
assert np.all(indices.d == np.argmax(x.d, axis=1))
- Parameters
x (Variable) – An input variable.
axis (None, int or tuple of ints) – Axis or axes along which max is calculated. The default value None will reduce all dimensions.
keepdims (bool) – Keep reduced axes as dimension with 1 element.
with_index (bool) – Return tuple of max values and index.
only_index (bool) – Return only the index of max values.
- Returns
N-D array.
- Return type
- nnabla.functions.min(x, axis=None, keepdims=False, with_index=False, only_index=False)[source]¶
Reduce the input N-D array x along the given axis using the min operation. The axis argument may be a single integer to reduce over one axis, a tuple of integers to reduce over multiple axes, or None to reduce over all axes. If keepdims is True, the output will keep all reduced dimensions with size 1. If with_index is True, the result is a tuple (values, indices), or only indices if only_index is True. Setting only_index to True implies that with_index is also True.

import numpy as np
import nnabla as nn
import nnabla.functions as F

nn.set_auto_forward(True)
x = nn.Variable.from_numpy_array(np.random.rand(2, 3, 4))

minval = F.min(x, axis=1)
assert np.allclose(minval.d, np.min(x.d, axis=1))

minval, indices = F.min(x, axis=1, with_index=True)
assert np.allclose(minval.d, np.min(x.d, axis=1))
assert np.all(indices.d == np.argmin(x.d, axis=1))

indices = F.min(x, axis=1, only_index=True)
assert np.all(indices.d == np.argmin(x.d, axis=1))
- Parameters
x (Variable) – An input variable.
axis (None, int or tuple of ints) – Axis or axes along which min is calculated. The default value None will reduce all dimensions.
keepdims (bool) – Keep reduced axes as dimension with 1 element.
with_index (bool) – Return tuple of min values and index.
only_index (bool) – Return only the index of min values.
- Returns
N-D array.
- Return type
- nnabla.functions.norm(x, p=None, axis=None, keepdims=False)[source]¶
Reduction along axes with norm operation.
\[y = \|x\|_p = \left( \sum_i |x_i|^p \right)^{\frac{1}{p}}\]
- Parameters
x (Variable) – An input variable.
p (float) – Order of the norm.
axis (None, int or tuple of ints) – Axis or axes along which the norm is calculated. The default value None will reduce all dimensions.
keepdims (bool) – Keep reduced axes as dimension with 1 element.
- Returns
N-D array.
- Return type
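A small sketch comparing the result against NumPy (p is passed explicitly here):

import numpy as np
import nnabla as nn
import nnabla.functions as F

nn.set_auto_forward(True)
x = nn.Variable.from_numpy_array(np.random.rand(2, 3) + 0.1)
y = F.norm(x, p=2, axis=1)    # row-wise Euclidean norm
assert np.allclose(y.d, np.linalg.norm(x.d, ord=2, axis=1))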
- nnabla.functions.prod(x, axis=None, keepdims=False)[source]¶
Reduction along axes with product operation.
- Parameters
x (Variable) – An input variable.
axis (None, int or tuple of ints) – Axis or axes along which the product is calculated. The default value None will reduce all dimensions.
keepdims (bool) – Keep reduced axes as dimension with 1 element.
- Returns
N-D array.
- Return type
Note
Backward computation is not accurate when the input contains zero values.
- nnabla.functions.reduce_sum(x, n_outputs=- 1, outputs=None)[source]¶
Reduction along an axis with sum operation.
Note
This is deprecated. Use sum instead.
- nnabla.functions.reduce_mean(x, n_outputs=- 1, outputs=None)[source]¶
Reduction by mean along an axis.
Note
This is deprecated. Use mean instead.
Arithmetic¶
- nnabla.functions.add2(x0, x1, inplace=False, n_outputs=- 1, outputs=None)[source]¶
Element-wise addition.
\[y_i = x^{(0)}_i + x^{(1)}_i\]
- Parameters
- Returns
N-D array
- Return type
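A minimal sketch of the element-wise arithmetic functions in this section; the same call pattern applies to sub2, mul2, div2 and pow2:

import numpy as np
import nnabla as nn
import nnabla.functions as F

nn.set_auto_forward(True)
x0 = nn.Variable.from_numpy_array(np.random.rand(2, 3))
x1 = nn.Variable.from_numpy_array(np.random.rand(2, 3))
y = F.add2(x0, x1)
assert np.allclose(y.d, x0.d + x1.d)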
- nnabla.functions.add_n(*x, **kw)[source]¶
Element-wise addition.
\[y_i = x^{(0)}_i + \cdots + x^{(n-1)}_i\]
- nnabla.functions.sub2(x0, x1, inplace=False, n_outputs=- 1, outputs=None)[source]¶
Element-wise subtraction.
\[y_i = x^{(0)}_i - x^{(1)}_i\]
- Parameters
- Returns
N-D array
- Return type
- nnabla.functions.mul2(x0, x1, inplace=False, n_outputs=- 1, outputs=None)[source]¶
Element-wise multiplication.
\[y_i = x^{(0)}_i x^{(1)}_i\]
- Parameters
- Returns
N-D array
- Return type
- nnabla.functions.mul_n(*x, **kw)[source]¶
Element-wise multiplication.
\[y_i = x^{(0)}_i \cdots x^{(n-1)}_i\]
- nnabla.functions.div2(x0, x1, inplace=False, n_outputs=- 1, outputs=None)[source]¶
Element-wise division.
\[y_i = \frac{x^{(0)}_i} {x^{(1)}_i}\]
- Parameters
- Returns
N-D array
- Return type
- nnabla.functions.pow2(x0, x1, inplace=False, n_outputs=- 1, outputs=None)[source]¶
Element-wise power function.
\[y_i = {(x^{(0)}_i)} ^ {x^{(1)}_i}\]
- Parameters
- Returns
N-D array
- Return type
- nnabla.functions.add_scalar(x, val=1, inplace=False, n_outputs=- 1, outputs=None)[source]¶
Element-wise scalar addition.
\[y_i = x_i + v\]
- Parameters
- Returns
N-D array with the same shape as x
- Return type
- nnabla.functions.mul_scalar(x, val=1, inplace=False, n_outputs=- 1, outputs=None)[source]¶
Element-wise scalar multiplication.
\[y_i = v x_i\]
- Parameters
- Returns
N-D array with the same shape as x
- Return type
- nnabla.functions.pow_scalar(x, val=1, inplace=False, n_outputs=- 1, outputs=None)[source]¶
Element-wise scalar power function.
\[y_i = (x_i) ^ v\]
- Parameters
- Returns
N-D array with the same shape as x
- Return type
- nnabla.functions.r_sub_scalar(x, val=1, n_outputs=- 1, outputs=None)[source]¶
Element-wise scalar subtraction.
\[y_i = v - x_i\]
- Parameters
- Returns
N-D array with the same shape as x
- Return type
- nnabla.functions.r_div_scalar(x, val=1, n_outputs=- 1, outputs=None)[source]¶
Element-wise scalar division.
\[y_i = \frac{v}{x_i}\]
- Parameters
- Returns
N-D array with the same shape as x
- Return type
- nnabla.functions.r_pow_scalar(x, val=1, n_outputs=- 1, outputs=None)[source]¶
Element-wise scalar power function.
\[y_i = v ^ {x_i}\]
- Parameters
- Returns
N-D array with the same shape as x
- Return type
Logical¶
- nnabla.functions.equal(x0, x1, n_outputs=- 1, outputs=None)[source]¶
Element wise ‘equal’
\[\begin{split}f(x^{(0)}_i,x^{(1)}_i) = \begin{cases} 1 & (x^{(0)}_i = x^{(1)}_i) \\ 0 & otherwise \end{cases}.\end{split}\]
- Parameters
- Returns
No Description
- Return type
- nnabla.functions.equal_scalar(x0, val=1, n_outputs=- 1, outputs=None)[source]¶
Element wise ‘equal’ with a scalar
\[\begin{split}f(x_i,v) = \begin{cases} 1 & (x_i = v) \\ 0 & otherwise \end{cases}.\end{split}\]
- Parameters
- Returns
N-D array with the same shape as x
- Return type
- nnabla.functions.greater(x0, x1, n_outputs=- 1, outputs=None)[source]¶
Element wise comparison. The \(i^{th}\) element of the output is:
\[\begin{split}f(x^{(0)}_i,x^{(1)}_i) = \begin{cases} 1 & (x^{(0)}_i > x^{(1)}_i) \\ 0 & (x^{(0)}_i \leq x^{(1)}_i) \end{cases}.\end{split}\]
- Parameters
- Returns
No Description
- Return type
- nnabla.functions.greater_equal(x0, x1, n_outputs=- 1, outputs=None)[source]¶
Element wise comparison. The \(i^{th}\) element of the output is:
\[\begin{split}f(x^{(0)}_i,x^{(1)}_i) = \begin{cases} 1 & (x^{(0)}_i \geq x^{(1)}_i) \\ 0 & (x^{(0)}_i < x^{(1)}_i) \end{cases}.\end{split}\]
- Parameters
- Returns
No Description
- Return type
- nnabla.functions.greater_equal_scalar(x0, val=1, n_outputs=- 1, outputs=None)[source]¶
Element wise comparison with a scalar. The \(i^{th}\) element of the output is:
\[\begin{split}f(x^{(0)}_i,v) = \begin{cases} 1 & (x^{(0)}_i \geq v) \\ 0 & (x^{(0)}_i < v) \end{cases}.\end{split}\]
- Parameters
- Returns
N-D array with the same shape as x
- Return type
- nnabla.functions.greater_scalar(x0, val=1, n_outputs=- 1, outputs=None)[source]¶
Element wise comparison with a scalar. The \(i^{th}\) element of the output is:
\[\begin{split}f(x^{(0)}_i,v) = \begin{cases} 1 & (x^{(0)}_i > v) \\ 0 & (x^{(0)}_i \leq v) \end{cases}.\end{split}\]
- Parameters
- Returns
N-D array with the same shape as x
- Return type
- nnabla.functions.less(x0, x1, n_outputs=- 1, outputs=None)[source]¶
Element wise comparison. The \(i^{th}\) element of the output is:
\[\begin{split}f(x^{(0)}_i,x^{(1)}_i) = \begin{cases} 1 & (x^{(0)}_i < x^{(1)}_i) \\ 0 & (x^{(0)}_i \geq x^{(1)}_i) \end{cases}.\end{split}\]
- Parameters
- Returns
No Description
- Return type
- nnabla.functions.less_equal(x0, x1, n_outputs=- 1, outputs=None)[source]¶
Element wise comparison. The \(i^{th}\) element of the output is:
\[\begin{split}f(x^{(0)}_i,x^{(1)}_i) = \begin{cases} 1 & (x^{(0)}_i \leq x^{(1)}_i) \\ 0 & (x^{(0)}_i > x^{(1)}_i) \end{cases}.\end{split}\]
- Parameters
- Returns
No Description
- Return type
- nnabla.functions.less_equal_scalar(x0, val=1, n_outputs=- 1, outputs=None)[source]¶
Element wise comparison with a scalar. The \(i^{th}\) element of the output is:
\[\begin{split}f(x^{(0)}_i,v) = \begin{cases} 1 & (x^{(0)}_i \leq v) \\ 0 & (x^{(0)}_i > v) \end{cases}.\end{split}\]
- Parameters
- Returns
N-D array with the same shape as x
- Return type
- nnabla.functions.less_scalar(x0, val=1, n_outputs=- 1, outputs=None)[source]¶
Element wise comparison with a scalar. The \(i^{th}\) element of the output is:
\[\begin{split}f(x^{(0)}_i,v) = \begin{cases} 1 & (x^{(0)}_i < v) \\ 0 & (x^{(0)}_i \geq v) \end{cases}.\end{split}\]
- Parameters
- Returns
N-D array with the same shape as x
- Return type
- nnabla.functions.logical_and(x0, x1, n_outputs=- 1, outputs=None)[source]¶
Elementwise logical AND.
\[\begin{split}f(x^{(0)}_i,x^{(1)}_i) = \begin{cases} 1 & (x^{(0)}_i \neq 0 \;\&\; x^{(1)}_i \neq 0) \\ 0 & otherwise \end{cases}.\end{split}\]
- Parameters
- Returns
No Description
- Return type
- nnabla.functions.logical_and_scalar(x0, val, n_outputs=- 1, outputs=None)[source]¶
Elementwise logical AND with scalar.
\[\begin{split}f(x_i,v) = \begin{cases} 1 & (x_i \neq 0 \;\&\; v \neq 0) \\ 0 & otherwise \end{cases}.\end{split}\]
- Parameters
- Returns
N-D array with the same shape as x
- Return type
- nnabla.functions.logical_not(x0, n_outputs=- 1, outputs=None)[source]¶
Element-wise logical NOT operation
\[\begin{split}f(x_i) = \begin{cases} 1 & (x_i = 0) \\ 0 & otherwise \end{cases}.\end{split}\]
- Parameters
x0 (Variable) – Input variable
- Returns
N-D array with the same shape as x
- Return type
- nnabla.functions.logical_or(x0, x1, n_outputs=- 1, outputs=None)[source]¶
Elementwise logical OR.
\[\begin{split}f(x^{(0)}_i,x^{(1)}_i) = \begin{cases} 0 & (x^{(0)}_i = 0 \;\&\; x^{(1)}_i = 0) \\ 1 & otherwise \end{cases}.\end{split}\]
- Parameters
- Returns
No Description
- Return type
- nnabla.functions.logical_or_scalar(x0, val, n_outputs=- 1, outputs=None)[source]¶
Elementwise logical OR with scalar.
\[\begin{split}f(x_i,v) = \begin{cases} 0 & (x_i = 0 \;\&\; v = 0) \\ 1 & otherwise \end{cases}.\end{split}\]
- Parameters
- Returns
N-D array with the same shape as x
- Return type
- nnabla.functions.logical_xor(x0, x1, n_outputs=- 1, outputs=None)[source]¶
Elementwise logical XOR.
\[\begin{split}f(x^{(0)}_i,x^{(1)}_i) = \begin{cases} 1 & (x^{(0)}_i = 0 \;\&\; x^{(1)}_i = 0) \\ 1 & (x^{(0)}_i \neq 0 \;\&\; x^{(1)}_i \neq 0) \\ 0 & otherwise \end{cases}.\end{split}\]
- Parameters
- Returns
No Description
- Return type
- nnabla.functions.logical_xor_scalar(x0, val, n_outputs=- 1, outputs=None)[source]¶
Elementwise logical XOR with scalar.
\[\begin{split}f(x_i,v) = \begin{cases} 1 & (x_i = 0 \;\&\; v = 0) \\ 1 & (x_i \neq 0 \;\&\; v \neq 0) \\ 0 & otherwise \end{cases}.\end{split}\]
- Parameters
- Returns
N-D array with the same shape as x
- Return type
- nnabla.functions.not_equal(x0, x1, n_outputs=- 1, outputs=None)[source]¶
Element wise ‘not equal’
\[\begin{split}f(x^{(0)}_i,x^{(1)}_i) = \begin{cases} 0 & (x^{(0)}_i = x^{(1)}_i) \\ 1 & otherwise \end{cases}.\end{split}\]
- Parameters
- Returns
No Description
- Return type
- nnabla.functions.not_equal_scalar(x0, val=1, n_outputs=- 1, outputs=None)[source]¶
Element wise ‘not equal’ with a scalar
\[\begin{split}f(x_i,v) = \begin{cases} 0 & (x_i = v) \\ 1 & otherwise \end{cases}.\end{split}\]
- Parameters
- Returns
N-D array with the same shape as x
- Return type
- nnabla.functions.sign(x, alpha=1.0, n_outputs=- 1, outputs=None)[source]¶
Element-wise sign function.
In the forward pass, it is defined as
\[\begin{split}f(x) = \begin{cases} 1 & (x > 0) \\ -1 & (x < 0) \\ \alpha & (x = 0) \end{cases}.\end{split}\]
In the backward pass, it is defined as
\[\frac{\partial f(x)}{\partial x} = 1,\]
or, in other words, it behaves as the identity function for the gradient in the backward pass.
- Parameters
x (Variable) – Input variable.
alpha (float) – Value returned at x = 0. [default=1.0]
- Returns
N-D array with the same shape as x
- Return type
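A small sketch of the forward values and the straight-through gradient described above (alpha chosen arbitrarily):

import numpy as np
import nnabla as nn
import nnabla.functions as F

x = nn.Variable.from_numpy_array(np.array([-2.0, 0.0, 3.0]))
x.need_grad = True
y = F.sign(x, alpha=0.5)
y.forward()
print(y.d)    # [-1.   0.5  1. ]
x.grad.zero()
y.backward()  # STE: the gradient passes through unchanged
print(x.g)    # [1. 1. 1.]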
- nnabla.functions.minimum2(x0, x1, n_outputs=- 1, outputs=None)[source]¶
Element-wise minimum.
\[y_i = \min(x^{(0)}_i, x^{(1)}_i)\]
- Parameters
- Returns
N-D array of min value
- Return type
- nnabla.functions.maximum2(x0, x1, n_outputs=- 1, outputs=None)[source]¶
Element-wise maximum.
\[y_i = \max(x^{(0)}_i, x^{(1)}_i)\]
- Parameters
- Returns
N-D array of max value
- Return type
- nnabla.functions.minimum_scalar(x, val=1.0, n_outputs=- 1, outputs=None)[source]¶
Element-wise scalar minimum.
\[y_i = \min(x_i, v)\]
- Parameters
- Returns
N-D array with the same shape as x
- Return type
- nnabla.functions.maximum_scalar(x, val=1.0, n_outputs=- 1, outputs=None)[source]¶
Element-wise scalar maximum.
\[y_i = \max(x_i, v)\]
- Parameters
- Returns
N-D array with the same shape as x
- Return type
- nnabla.functions.isnan(x0, n_outputs=- 1, outputs=None)[source]¶
Test element-wise for NaN and return a 0/1 array.
- Parameters
x0 (Variable) – Input variable
- Returns
N-D array with the same shape as x
- Return type
- nnabla.functions.isinf(x0, n_outputs=- 1, outputs=None)[source]¶
Test element-wise for inf/-inf and return a 0/1 array.
- Parameters
x0 (Variable) – Input variable
- Returns
N-D array with the same shape as x
- Return type
- nnabla.functions.reset_nan(x0, val=0, n_outputs=- 1, outputs=None)[source]¶
Replace NaNs with a scalar value specified by val.
- Parameters
x0 (Variable) – Input variable.
val (float) – Value to replace NaN with. [default=0]
- Returns
N-D array with the same shape as x
- Return type
- nnabla.functions.reset_inf(x0, val=0, n_outputs=- 1, outputs=None)[source]¶
Replace -inf/inf with a scalar value specified by val.
- Parameters
x0 (Variable) – Input variable.
val (float) – Value to replace -inf/inf with. [default=0]
- Returns
N-D array with the same shape as x
- Return type
- nnabla.functions.where(condition, x_true, x_false, n_outputs=- 1, outputs=None)[source]¶
Return elements, either from x_true or x_false, depending on condition.
If the rank of condition is higher than those of x_true and x_false, the first dimensions of x_true and x_false must match the dimensions of condition.
Example:
import numpy as np
import nnabla as nn
import nnabla.functions as F

a = nn.Variable.from_numpy_array(np.random.rand(2, 3))
x = nn.Variable.from_numpy_array(np.random.rand(2, 3, 4))
y = nn.Variable.from_numpy_array(np.random.rand(2, 3, 4))
z = F.where(F.greater_scalar(a, 0.5), x, y)
z.forward()

# NumPy equivalent (the condition is expanded over the trailing dimension)
z_numpy = np.where(a.d[..., None] > 0.5, x.d, y.d)
assert np.allclose(z_numpy, z.d)
- Parameters
condition (Variable) – N-D array of conditions; selects x_true where nonzero, x_false otherwise.
x_true (Variable) – N-D array of values used where condition is nonzero.
x_false (Variable) – N-D array of values used where condition is zero.
- Returns
N-D array with the same shape as condition
- Return type
Math¶
- nnabla.functions.constant(val=0, shape=[], n_outputs=- 1, outputs=None)[source]¶
Generate a constant-valued array.
- Parameters
val (float) – Constant value. [default=0]
shape (tuple of int) – Shape of the output array. [default=[]]
- Returns
N-D array where all values are the specified constant.
- Return type
- nnabla.functions.arange(start, stop, step=1, n_outputs=- 1, outputs=None)[source]¶
Generate a range of values within the half-open interval [start, stop) (the interval including start but excluding stop) with step increments.
- Parameters
start (float) – Start value.
stop (float) – End value.
step (float) – Step value. [default=1]
- Returns
1-D array with the generated values.
- Return type
- nnabla.functions.abs(x, n_outputs=- 1, outputs=None)[source]¶
Element-wise absolute value function.
\[y_i = |x_i|\]
- Parameters
x (Variable) – Input variable
- Returns
Element-wise absolute variable
- Return type
- nnabla.functions.exp(x, n_outputs=- 1, outputs=None)[source]¶
Element-wise natural exponential function.
\[y_i = \exp(x_i).\]
- nnabla.functions.log(x, n_outputs=- 1, outputs=None)[source]¶
Element-wise natural logarithm function.
\[y_i = \ln(x_i).\]
- nnabla.functions.round(x, n_outputs=- 1, outputs=None)[source]¶
Element-wise round function.
In the forward pass, this function simply rounds to the nearest integer value:
\[y_i = round(x_i).\]
In the backward pass, the simple Straight-Through Estimator (STE) is applied:
\[\frac{\partial y_i}{\partial x_i} = 1.\]
- Parameters
x (Variable) – Input variable
- Returns
N-D array with the same shape as x
- Return type
- nnabla.functions.ceil(x, n_outputs=- 1, outputs=None)[source]¶
Element-wise ceil function.
In the forward pass, this function simply returns the smallest integer which is not less than the input.
\[y_i = ceil(x_i).\]
In the backward pass, the simple Straight-Through Estimator (STE) is applied:
\[\frac{\partial y_i}{\partial x_i} = 1.\]
- Parameters
x (Variable) – Input variable
- Returns
N-D array with the same shape as x
- Return type
- nnabla.functions.floor(x, n_outputs=- 1, outputs=None)[source]¶
Element-wise floor function.
In the forward pass, this function simply returns the largest integer which is not greater than the input.
\[y_i = floor(x_i).\]
In the backward pass, the simple Straight-Through Estimator (STE) is applied:
\[\frac{\partial y_i}{\partial x_i} = 1.\]
- Parameters
x (Variable) – Input variable
- Returns
N-D array with the same shape as x
- Return type
- nnabla.functions.identity(x, n_outputs=- 1, outputs=None)[source]¶
Identity function.
\[y = x\]
- nnabla.functions.matrix_diag(x, n_outputs=- 1, outputs=None)[source]¶
Returns an array where the last two dimensions consist of the diagonal matrix.
- Parameters
x (Variable) – N-D array with shape (\(M_0 \times \ldots \times M_N\)).
- Returns
N-D array with shape (\(M_0 \times \ldots \times M_N \times M_N\)).
- Return type
- nnabla.functions.matrix_diag_part(x, n_outputs=- 1, outputs=None)[source]¶
Returns an array in which the values of the last dimension consist of the diagonal elements of the last two dimensions of an input array.
- Parameters
x (Variable) – N-D array with shape (\(M_0 \times \ldots \times M_N \times M_N\)).
- Returns
N-D array with shape (\(M_0 \times \ldots \times M_N\)).
- Return type
- nnabla.functions.batch_matmul(a, b, transpose_a=False, transpose_b=False, n_outputs=- 1, outputs=None)[source]¶
Batch matrix multiplication.
Two batches of matrices are multiplied for each sample in a batch. A batch of matrices is composed as [..., P, Q], where the last two dimensions compose the matrix dimensions, and the first dimensions up to the third-last dimension are considered as batch samples. These batch dimensions are internally broadcast when the size of a dimension is 1.
Example:
import nnabla as nn
import nnabla.functions as F
import numpy as np

nn.set_auto_forward(True)

# Same batch size
a = nn.Variable.from_numpy_array(np.random.rand(2, 2, 3, 4))
b = nn.Variable.from_numpy_array(np.random.rand(2, 2, 4, 3))
c = F.batch_matmul(a, b)

# Different batch size with the broadcast
a = nn.Variable.from_numpy_array(np.random.rand(2, 1, 3, 4))
b = nn.Variable.from_numpy_array(np.random.rand(1, 3, 4, 3))
c = F.batch_matmul(a, b)
Warning
Since version 1.13, the behavior of the batch dimensions has changed: the internal broadcast is supported when the size of a dimension is 1. Accordingly, this function does not support different batch dimensions between the two inputs even if the total sample size of each input is the same.
- Parameters
a (Variable) – N-D array with >= 2-dim. The last two dimensions will be treated as a matrix.
b (Variable) – N-D array with >= 2-dim. The last two dimensions will be treated as a matrix. The product of the sizes from the 0-th dimension through the third-last dimension must be the same as that of the input a.
transpose_a (bool) – Transpose the last two axes of a in matrix multiplication. [default=False]
transpose_b (bool) – Transpose the last two axes of b in matrix multiplication. [default=False]
- Returns
Output of sample-wise matrix multiplication in a batch. When a has a shape of [N, P, Q], b has a shape of [N, Q, R], and the transpose options are all False, the output will have a shape of [N, P, R].
- Return type
- nnabla.functions.sin(x, n_outputs=- 1, outputs=None)[source]¶
Element-wise sine (sin) function.
\[y_i = \sin (x_i)\]
- nnabla.functions.cos(x, n_outputs=- 1, outputs=None)[source]¶
Element-wise cosine (cos) function.
\[y_i = \cos (x_i)\]
- nnabla.functions.tan(x, n_outputs=- 1, outputs=None)[source]¶
Element-wise tangent (tan) function.
\[y_i = \tan (x_i)\]
- nnabla.functions.sinh(x, n_outputs=- 1, outputs=None)[source]¶
Element-wise hyperbolic sine (sinh) function.
\[y_i = \sinh (x_i)\]
- nnabla.functions.cosh(x, n_outputs=- 1, outputs=None)[source]¶
Element-wise hyperbolic cosine (cosh) function.
\[y_i = \cosh (x_i)\]
- nnabla.functions.tanh(x, n_outputs=- 1, outputs=None)[source]¶
Element-wise hyperbolic tangent (tanh) function.
\[y_i = \tanh (x_i)\]
- nnabla.functions.asin(x, n_outputs=- 1, outputs=None)[source]¶
Element-wise arcsine (asin) function.
\[y_i = \arcsin (x_i)\]
- nnabla.functions.acos(x, n_outputs=- 1, outputs=None)[source]¶
Element-wise arccosine (acos) function.
\[y_i = \arccos (x_i)\]
- nnabla.functions.atan(x, n_outputs=- 1, outputs=None)[source]¶
Element-wise arctangent (atan) function.
\[y_i = \arctan (x_i)\]
- nnabla.functions.atan2(x0, x1, n_outputs=- 1, outputs=None)[source]¶
Element-wise arctangent (atan) function with 2 input variables.
\[y_i = \arctan2 (x_{i1}, x_{i2})\]
- Parameters
- Returns
N-D array with the same shape as input variables
- Return type
- nnabla.functions.asinh(x, n_outputs=- 1, outputs=None)[source]¶
Element-wise hyperbolic arcsine (asinh) function.
\[y_i = \text{arcsinh} (x_i)\]
- nnabla.functions.acosh(x, n_outputs=- 1, outputs=None)[source]¶
Element-wise hyperbolic arccosine (acosh) function.
\[y_i = \text{arccosh} (x_i)\]
- nnabla.functions.atanh(x, n_outputs=- 1, outputs=None)[source]¶
Element-wise hyperbolic arctangent (atanh) function.
\[y_i = \text{arctanh} (x_i)\]
- nnabla.functions.cumsum(x, axis=None, exclusive=False, reverse=False, n_outputs=- 1, outputs=None)[source]¶
Cumulative sum along a given axis.
- Parameters
x (Variable) – N-D array.
axis (int) – Axis along which the cumulative sum is computed.
exclusive (bool) – If True, the cumulative sum excludes the value at each position. [default=False]
reverse (bool) – If True, accumulate in the reverse direction along the axis. [default=False]
- Returns
N-D array
- Return type
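A quick sketch against NumPy, with exclusive and reverse left at their defaults:

import numpy as np
import nnabla as nn
import nnabla.functions as F

nn.set_auto_forward(True)
x = nn.Variable.from_numpy_array(np.array([[1., 2., 3.], [4., 5., 6.]]))
y = F.cumsum(x, axis=1)
assert np.allclose(y.d, np.cumsum(x.d, axis=1))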
- nnabla.functions.cumprod(x, axis=None, exclusive=False, reverse=False, n_outputs=- 1, outputs=None)[source]¶
Cumulative product along a given axis.
Note
Backward computation is not accurate when the input contains zero values.
- Parameters
x (Variable) – N-D array.
axis (int) – Axis along which the cumulative product is computed.
exclusive (bool) – If True, the cumulative product excludes the value at each position. [default=False]
reverse (bool) – If True, accumulate in the reverse direction along the axis. [default=False]
- Returns
N-D array
- Return type
Array Manipulation¶
- nnabla.functions.concatenate(*x, **kw)[source]¶
Concatenate a variable number of input arrays along the specified axis.
- Parameters
*x (Variable) – N-D arrays to concatenate. [variadic]
axis (int) – The axis along which to concatenate.
- Returns
Concatenate variable
- Return type
- nnabla.functions.split(x, axis=0)[source]¶
Split arrays at the specified axis.
It returns a number of Variables corresponding to the size of the given axis (i.e. x.shape[axis]).
Returns: A tuple of Variables
See also nnabla.function_bases.split().
- nnabla.functions.stack(*x, **kw)[source]¶
Joins two or more arrays on a new axis.
Note
Unlike nnabla.functions.concatenate(), which joins arrays on an existing axis, stack joins arrays on a new axis.
- Parameters
*x (Variable) – N-D arrays. The sizes of all the arrays to be stacked must be the same. [variadic]
axis (int) – The axis on which to stack arrays. Axis indices take on values 0, 1, 2, and so on from the left. For example, to stack four (3,28,28) inputs on the second axis, specify 1. In this case, the output size will be (3,4,28,28). [default=0]
- Returns
Output
- Return type
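A small sketch showing that stack and split (documented above) invert each other along the stacked axis:

import numpy as np
import nnabla as nn
import nnabla.functions as F

nn.set_auto_forward(True)
a = nn.Variable.from_numpy_array(np.random.rand(3, 4))
b = nn.Variable.from_numpy_array(np.random.rand(3, 4))
s = F.stack(a, b, axis=0)     # shape (2, 3, 4)
assert s.shape == (2, 3, 4)
ys = F.split(s, axis=0)       # tuple of 2 Variables, each of shape (3, 4)
assert len(ys) == 2
assert np.allclose(ys[0].d, a.d)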
- nnabla.functions.slice(x, start=None, stop=None, step=None, n_outputs=- 1, outputs=None)[source]¶
Slice arrays along the specified axis. This function complies with Python slicing, where slice(None, None, -1) and slice(-1, None, -1) are special cases that flip the input array, producing an output that runs from the end to the beginning of the input array along the corresponding dimension.
- Parameters
x (Variable) – N-D array
x (Variable) – N-D array
start (repeated int64) – Start indices for each axis [default=``(0,) * len(x.shape)``]
stop (repeated int64) – Stop indices for each axis [default=``tuple(x.shape)``]
step (repeated int64) – Step indices for each axis [default=``(1,) * len(x.shape)``]
- Returns
Sliced N-D array
- Return type
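A short sketch comparing F.slice with NumPy basic slicing:

import numpy as np
import nnabla as nn
import nnabla.functions as F

nn.set_auto_forward(True)
x = nn.Variable.from_numpy_array(np.arange(12).reshape(3, 4).astype(np.float32))
y = F.slice(x, start=(0, 1), stop=(2, 3), step=(1, 1))
assert np.allclose(y.d, x.d[0:2, 1:3])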
- nnabla.functions.gather(x, Indices, axis=None, batch_dims=None, n_outputs=- 1, outputs=None)[source]¶
Gather from the input data according to the index.
Given the input data \(X\) of \((D_{0}, \ldots, D_{N-1})\) shape and the indices \(IDX\) of \((I_{0}, \ldots, I_{M-1})\) shape, in case of batch_dims = 0, the gather outputs
\[\begin{split}&& Y[d_{0}, \ldots, d_{axis - 1}, i_{0}, \ldots, i_{M-1}, d_{axis + 1}, \ldots, d_{N-1}] = \\ && X[d_{0}, \ldots, d_{axis - 1}, IDX[i_{0}, \ldots, i_{M-1}], d_{axis + 1}, \ldots, d_{N-1}].\end{split}\]
Generally, the gather outputs
\[\begin{split}&& Y[d_{0}, \ldots, d_{axis - 1}, i_{B}, \ldots, i_{M-1}, d_{axis + 1}, \ldots, d_{N-1}] = \\ && X[d_{0}, \ldots, d_{axis - 1}, IDX[i_{0}, \ldots, i_{B - 1}, i_{B}, \ldots, i_{M-1}], d_{axis + 1}, \ldots, d_{N-1}].\end{split}\]
where \(B\) = batch_dims. x.shape[:batch_dims] must be equal to indices.shape[:batch_dims]. The output shape is x.shape[:axis] + indices.shape[batch_dims:] + x.shape[axis + 1:].
- Parameters
x (Variable) – Data from which to gather.
Indices (Variable) – Index array with which to gather.
axis (int) – Axis in x from which to gather. [default=0]
batch_dims (int) – Number of batch dimensions. [default=0]
- Returns
Gathered output.
- Return type
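A small sketch for the batch_dims = 0 case, gathering columns of a matrix:

import numpy as np
import nnabla as nn
import nnabla.functions as F

nn.set_auto_forward(True)
x = nn.Variable.from_numpy_array(np.arange(12).reshape(3, 4).astype(np.float32))
indices = nn.Variable.from_numpy_array(np.array([0, 2]))
y = F.gather(x, indices, axis=1)   # pick columns 0 and 2
assert y.shape == (3, 2)
assert np.allclose(y.d, x.d[:, [0, 2]])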
- nnabla.functions.gather_nd(data, indices)[source]¶
Gather elements or slices from data according to indices, which must be at least two-dimensional with the first dimension \(M\) being less or equal to the \(N\) dimensions of data. Given data with shape \((X_0, X_1, ..., X_{N-1})\) and indices with shape \((M, Y_0, ..., Y_{K-1})\), the output has shape \((Y_0, ..., Y_{K-1}, X_M, ..., X_{N-1})\). If \(M == N\), the output shape is simply \((Y_0, ..., Y_{K-1})\).
The forward of gather_nd() is equivalent to:

def gather_nd(data, index):
    import numpy as np
    tmp_index = index.reshape(index.shape[0], -1)
    tmp_index = (idx + (Ellipsis,) for idx in zip(*tmp_index))
    out_shape = index.shape[1:] + data.shape[index.shape[0]:]
    return np.vstack([data[idx] for idx in tmp_index]).reshape(*out_shape)
Examples:
>>> import numpy as np, nnabla as nn, nnabla.functions as F
>>> nn.set_auto_forward(True)
>>> data = F.arange(1, 11).reshape([2, 5])
>>> print(data.d)
[[ 1.  2.  3.  4.  5.]
 [ 6.  7.  8.  9. 10.]]
>>> F.gather_nd(data, [[1, 1, 0]]).shape
(3, 5)
>>> F.gather_nd(data, [[1, 1, 0], [0, 1, 0]]).shape
(3,)
>>> print(F.gather_nd(data, [[1, 1, 0], [0, 1, 0]]).d)
[6. 7. 1.]
>>> print(F.gather_nd(data, [[1, 1, 0]]).d)
[[ 6.  7.  8.  9. 10.]
 [ 6.  7.  8.  9. 10.]
 [ 1.  2.  3.  4.  5.]]
When indices is provided as a Variable, it is possible to change the actual index values after function creation. It is important to note that out-of-bound indices raise errors when running on CPU but are ignored when using an accelerated computation context.

>>> indices = nn.Variable((2, 1))
>>> indices.d = [[0], [0]]
>>> y = F.gather_nd(data, indices)
>>> print(y.d)
[1.]
>>> indices.d = [[1], [4]]
>>> y.forward()
>>> print(y.d)
[10.]
- Parameters
data (Variable or NdArray) – N-D array from which to gather.
indices (Variable, NdArray or array-like) – Index array; its first dimension indexes the leading dimensions of data.
Returns: ~nnabla.Variable or ~nnabla.NdArray of gathered elements.
- nnabla.functions.scatter_nd(data, indices, shape=None, out=None)[source]¶
Scatter data according to indices into a new array of given shape or an existing array provided as out. Exactly one of the shape or out arguments must be given. Given the output shape, or the shape of the out array, \((X_0, X_1, \ldots, X_{N-1})\), and the indices shape \((M, Y_0, \ldots, Y_{K-1})\), the input data shape is \((Y_0, \ldots, Y_{K-1}, X_M, \ldots, X_{N-1})\), where \(M <= N\). If \(M == N\), the data shape is simply \((Y_0, \ldots, Y_{K-1})\). Note that indices are treated as integers and potentially converted.
The forward of scatter_nd() is equivalent to:

def scatter_nd(data, indices, shape=None, out=None):
    import numpy
    assert (shape and not out) or (out is not None and not shape)
    if isinstance(indices, numpy.ndarray):
        indices = indices.tolist()
    result = out if out is not None else numpy.zeros(shape)
    result[indices] = data
    return result
Examples:
>>> import numpy as np, nnabla as nn, nnabla.functions as F
>>> nn.set_auto_forward(True)
>>> data = nn.Variable.from_numpy_array(np.array([9, 10, 11, 12]))
>>> indices = nn.Variable.from_numpy_array(np.array([[4, 3, 1, 7]]))
>>> scattered = F.scatter_nd(data, indices, shape=(8,))
>>> print(scattered.d)
[ 0. 11.  0. 10.  9.  0.  0. 12.]
>>> print(F.gather_nd(scattered, indices).d)
[ 9. 10. 11. 12.]
- Parameters
data (Variable or NdArray) – N-D array of input data.
indices (Variable, NdArray or array-like) – Index array; its first dimension addresses the leading dimensions of the output.
shape (tuple of int) – Shape of the new output array. Exactly one of shape and out must be given.
out (Variable or NdArray) – Existing array into which data is scattered.
Returns: ~nnabla.Variable or ~nnabla.NdArray of the given shape.
- nnabla.functions.scatter_add(x0, indices, x1, axis=None)[source]¶
Add all values from x1 into x0 according to the index specified by indices. This function adds x1 into a copy of x0 and outputs the copy; the original x0 will not be changed. x0, indices and x1 must have the same number of dimensions.
The forward of scatter_add() is equivalent to:

def scatter_add(x0, indices, x1, axis):
    # Assuming each input is 3 dimensional
    import numpy as np
    output = np.copy(x0)
    for i in range(indices.shape[0]):
        for j in range(indices.shape[1]):
            for k in range(indices.shape[2]):
                if axis == 0:
                    output[indices[i][j][k]][j][k] += x1[i][j][k]
                elif axis == 1:
                    output[i][indices[i][j][k]][k] += x1[i][j][k]
                elif axis == 2:
                    output[i][j][indices[i][j][k]] += x1[i][j][k]
    return output
- Parameters
x0 (Variable) – N-D array which the data is added to its copy.
indices (Variable) – N-D array scatter indices. The size of each dimension must be equal or smaller than that of x0 except for the specified axis. The value of indices must be smaller than the size of specified axis’ dimension of x0. The size of each dimension must be equal or smaller than that of x1. Indices must not be negative.
x1 (Variable) – N-D array which is scattered and added to x0.
axis (int) – Axis along which to index. The axis must not exceed the inputs’ dimension. [default=0]
- Returns
N-D array which contains the result of scatter addition. The shape is same as x0.
- Return type
- nnabla.functions.pad(x, pad_width, mode='constant', constant_value=0, n_outputs=- 1, outputs=None)[source]¶
Pad the input N-D array x over the number of dimensions given by half the length of the pad_width iterable, where every two values in pad_width determine the before and after pad size of an axis. The pad_width iterable must hold an even number of positive values which may cover all or fewer dimensions of the input variable x. If pad_width covers fewer dimensions, it applies to the innermost dimensions of x.

x = nn.Variable.from_numpy_array(np.ones((2, 3, 4)))
assert F.pad(x, (1, 1, 2, 2)).shape == (2, 5, 8)
Padding is performed according to the requested mode:
- constant
Pads with a value given by the keyword argument constant_value.

x = nn.Variable.from_numpy_array(np.array([1, 2, 3, 4], dtype=np.int))
y = F.pad(x, (3, 3), 'constant', constant_value=-1)
y.forward()
assert np.all(y.d == np.array([-1, -1, -1, 1, 2, 3, 4, -1, -1, -1]))
- reflect
Pads with the reflection of the vector mirrored on the first and last values of the vector along each axis.
x = nn.Variable.from_numpy_array(np.array([1, 2, 3, 4], dtype=np.int))
y = F.pad(x, (3, 3), 'reflect')
y.forward()
assert np.all(y.d == np.array([4, 3, 2, 1, 2, 3, 4, 3, 2, 1]))
- repeat
Pads with the edge value of the vector along each axis.
x = nn.Variable.from_numpy_array(np.array([1, 2, 3, 4], dtype=np.int))
y = F.pad(x, (3, 3), 'repeat')
y.forward()
assert np.all(y.d == np.array([1, 1, 1, 1, 2, 3, 4, 4, 4, 4]))
- Parameters
x (Variable) – N-D array.
pad_width (repeated int64) – Iterable of before and after pad sizes; every two values pad one axis, starting from the innermost dimensions.
mode (string) – Padding mode: 'constant', 'reflect' or 'repeat'. [default='constant']
constant_value (float) – Fill value used when mode is 'constant'. [default=0]
- Returns
Padded N-D array with the same number of dimensions as the input.
x = nn.Variable((3, 3, 4, 2))  # a shape like (B, C, H, W)
# 1-D padding: last dim by 1 on the left and 2 on the right side
assert F.pad(x, (1, 2)).shape == (3, 3, 4, 5)
# 2-D padding: last dim by (1, 1) and 2nd to last by (2, 2)
assert F.pad(x, (2, 2, 1, 1)).shape == (3, 3, 8, 4)
# 3-D padding: dims C by (0, 1), H by (2, 1), and W by (3, 3)
assert F.pad(x, (0, 1, 2, 1, 3, 3)).shape == (3, 4, 7, 8)
- Return type
- nnabla.functions.transpose(x, axes, n_outputs=- 1, outputs=None)[source]¶
Transposes tensor dimensions.
- Parameters
x (Variable) – N-D array
axes (repeated int64) – Source axis indices for each axis.
- Returns
Transposed N-D array.
- Return type
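A quick sketch against np.transpose:

import numpy as np
import nnabla as nn
import nnabla.functions as F

nn.set_auto_forward(True)
x = nn.Variable.from_numpy_array(np.random.rand(2, 3, 4))
y = F.transpose(x, (2, 0, 1))
assert y.shape == (4, 2, 3)
assert np.allclose(y.d, np.transpose(x.d, (2, 0, 1)))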
- nnabla.functions.broadcast(x, shape, n_outputs=- 1, outputs=None)[source]¶
Broadcasting ND-array to the specified shape.
- Parameters
x (Variable) – N-D array.
shape (tuple of int) – Shape to broadcast to; dimensions being broadcast in x must have size 1.
- Returns
Broadcasted N-D array
- Return type
- nnabla.functions.broadcast_to(x, y, axis=None, n_outputs=- 1, outputs=None)[source]¶
Warning
This function is experimental support, so please do not actively use it.
Broadcasting ND-array to the specified buffer.
- Parameters
- Returns
Broadcasted N-D array
- Return type
- nnabla.functions.tile(x, reps)[source]¶
Forward x repeated the number of times given by reps. If reps is a sequence, the output has dimension d = max(len(reps), x.ndim) and either x is promoted to be d-dimensional by prepending new axes or reps is promoted to x.ndim by prepending 1's.
- Parameters
x (Variable) – N-D array.
reps (int or sequence of int) – The number of repetitions of x along each axis.
- Returns
N-D array.
- Return type
>>> import numpy as np, nnabla as nn, nnabla.functions as F
>>> F.tile(nn.Variable([2, 3]), 3).shape    # reps is promoted to [1, 3]
(2, 9)
>>> F.tile(nn.Variable([3]), [2, 3]).shape  # x is promoted to shape (1, 3)
(2, 9)
>>> nn.set_auto_forward(True)
>>> x = nn.Variable.from_numpy_array(np.array([1, 2, 3]))
>>> print(F.tile(x, 3).d)
[1. 2. 3. 1. 2. 3. 1. 2. 3.]
>>> print(F.tile(x, [2, 3]).d)
[[1. 2. 3. 1. 2. 3. 1. 2. 3.]
 [1. 2. 3. 1. 2. 3. 1. 2. 3.]]
>>> x = nn.Variable.from_numpy_array(np.array([[1, 3], [2, 4]]))
>>> print(F.tile(x, 3).d)
[[1. 3. 1. 3. 1. 3.]
 [2. 4. 2. 4. 2. 4.]]
>>> print(F.tile(x, [2, 3]).d)
[[1. 3. 1. 3. 1. 3.]
 [2. 4. 2. 4. 2. 4.]
 [1. 3. 1. 3. 1. 3.]
 [2. 4. 2. 4. 2. 4.]]
- nnabla.functions.flip(x, axes=None, n_outputs=- 1, outputs=None)[source]¶
Reverses the order of elements of the specified dimension of an array.
- Parameters
x (Variable) – N-D array
axes (repeated int64) – The index of the dimension to reverse the order of the elements. Axis indices take on values 0, 1, 2, and so on from the left. For example, to flip a batch of 100 RGB images (100,3,24,32) of 32 (W) by 24 (H) vertically and horizontally, specify (2,3). [default=[len(x.shape) - 1]]
- Returns
N-D array
- Return type: Variable
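A short sketch of flipping along the last axis:

import numpy as np, nnabla as nn, nnabla.functions as F
nn.set_auto_forward(True)
x = nn.Variable.from_numpy_array(np.arange(6).reshape(2, 3))
y = F.flip(x, axes=(1,))  # reverse the order along axis 1
assert np.all(y.d == x.d[:, ::-1])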
- nnabla.functions.shift(x, shifts=None, border_mode='nearest', n_outputs=- 1, outputs=None)[source]¶
Shifts the array elements by the specified amount.
- Parameters
x (Variable) – N-D array.
shifts (repeated int64) – The amount to shift elements. For example, to shift image data to the right by 2 pixels and up 3 pixels, specify (-3, 2). [default= (0,) * len(x.shape) ]
border_mode (string) – Specify how to process the ends of arrays whose values will be undetermined as a result of shifting. nearest: the data at the ends of the original array is copied and used. reflect: the original data reflected at the ends of the array is used. [default= 'nearest' ]
- Returns
N-D array.
- Return type: Variable
- nnabla.functions.sort(x, axis=- 1, reverse=False, with_index=False, only_index=False)[source]¶
Sorts the elements of x along a given axis in ascending order by value. A negative axis counts from the last dimension of x, so the default of -1 sorts along the last dimension. If reverse is True, then the elements are sorted in descending order.
If with_index is True, the result is a tuple (sorted, indices), or only indices if only_index is True. Setting only_index to True implies that with_index is also True.

import numpy as np
import nnabla as nn
import nnabla.functions as F

nn.set_auto_forward(True)
x = nn.Variable.from_numpy_array(np.random.rand(2, 3, 4))

sorted = F.sort(x)
assert np.allclose(sorted.d, np.sort(x.d))

sorted, indices = F.sort(x, with_index=True)
assert np.allclose(sorted.d, np.sort(x.d))
assert np.all(indices.d == np.argsort(x.d))

indices = F.sort(x, only_index=True)
assert np.all(indices.d == np.argsort(x.d))
- Parameters
Returns: ~nnabla.Variable sorted, or ~nnabla.Variable indices, or a tuple (~nnabla.Variable sorted, ~nnabla.Variable indices)
- nnabla.functions.reshape(x, shape, inplace=True, n_outputs=- 1, outputs=None)[source]¶
Reshapes the input variable in-place. It does not create a copy of the variable. The output variable (y) has a new shape but points to the same data as the input variable (x). This means that if the data in the output variable (y) is modified, the data in the input variable (x) also gets modified since the reshape was done in-place.
Note
This function has the same behavior as the nnabla.Variable.reshape() method.
- Parameters
- Returns
Reshaped N-D array
- Return type: Variable
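A minimal sketch of the in-place behavior described above (the data-sharing assert relies on that documented behavior):

import numpy as np, nnabla as nn, nnabla.functions as F
nn.set_auto_forward(True)
x = nn.Variable.from_numpy_array(np.arange(6).astype(np.float32))
y = F.reshape(x, (2, 3))
assert y.shape == (2, 3)
x.d[0] = 42.0
assert y.d[0, 0] == 42.0  # y points to the same data as x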
- nnabla.functions.one_hot(x, shape, n_outputs=- 1, outputs=None)[source]¶
This function creates a one-hot vector based on the input label indices.
Example:
import nnabla as nn
import nnabla.functions as F
import numpy as np

labels = nn.Variable.from_numpy_array(np.array([[9], [4], [5], [1], [0]]))
print(labels.shape)  # (5, 1)
num_class = 10
y_train = F.one_hot(labels, shape=(num_class, ))
y_train.forward()
print(y_train.shape)  # (5, 10)
print(y_train.d)
# [[0. 0. 0. 0. 0. 0. 0. 0. 0. 1.]
#  [0. 0. 0. 0. 1. 0. 0. 0. 0. 0.]
#  [0. 0. 0. 0. 0. 1. 0. 0. 0. 0.]
#  [0. 1. 0. 0. 0. 0. 0. 0. 0. 0.]
#  [1. 0. 0. 0. 0. 0. 0. 0. 0. 0.]]

# Can also be used for multi-dimensional labels.
labels = nn.Variable.from_numpy_array(np.array([[1, 7], [4, 7], [8, 6], [5, 0], [2, 6]]))
print(labels.shape)  # (5, 2)
num_class_1, num_class_2 = 10, 8
y_train = F.one_hot(labels, shape=(num_class_1, num_class_2))
y_train.forward()
print(y_train.shape)  # (5, 10, 8)
print(y_train.d)  # one-hot tensors; printed values omitted here
- Parameters
x (Variable) – N-D array representing label indices.
shape (tuple of int) – Number of classes. Note that it must be exactly the same as the number of classes included in the label data. Passing incorrect numbers might cause an unexpected error, and currently this function does not check whether the input is valid. Also, when n-d labels are given, the dimensions must match. See the example above.
- Returns
N-D array one-hot vector/tensor.
- Return type: Variable
- nnabla.functions.batch_inv(x, n_outputs=- 1, outputs=None)[source]¶
Returns a batched array of inverted matrices.
- Parameters
x (Variable) – batched N-D array
- Returns
batched N-D array of inverted matrices
- Return type: Variable
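A quick numerical check, as a sketch; the identity offset keeps the random matrices well conditioned:

import numpy as np, nnabla as nn, nnabla.functions as F
nn.set_auto_forward(True)
x = nn.Variable.from_numpy_array(np.random.rand(2, 3, 3) + 2 * np.eye(3))
y = F.batch_inv(x)
assert np.allclose(np.matmul(x.d, y.d), np.tile(np.eye(3), (2, 1, 1)), atol=1e-4)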
- nnabla.functions.batch_det(x, n_outputs=- 1, outputs=None)[source]¶
Batch-wise determinant function.
\[Y_b = \det(X_b),\]
where \(X_b\) and \(Y_b\) are the \(b\)-th input and output, respectively.
- Parameters
x (Variable) – batched N-D array
- Returns
batched N-D array of determinant
- Return type: Variable
- nnabla.functions.batch_logdet(x, n_outputs=- 1, outputs=None)[source]¶
Batch-wise log absolute determinant function.
\[Y_b = \log(|\det(X_b)|),\]
where \(X_b\) and \(Y_b\) are the \(b\)-th input and output, respectively.
- Parameters
x (Variable) – batched N-D array
- Returns
batched N-D array of log absolute determinant
- Return type: Variable
- nnabla.functions.assign(dst, src, n_outputs=- 1, outputs=None)[source]¶
Assign source array to destination array just like tf.assign. This is useful to synchronize or manually update parameters.

dst = nn.Variable((2, 3, 4))
src = nn.Variable((2, 3, 4))
assign = F.assign(dst, src)
assign.forward()
assert np.allclose(dst.d, src.d)     # dst and src have identical values.
assert np.allclose(assign.d, dst.d)  # returned Variable is also identical to dst.
Unlike TensorFlow, the returned Variable has a backward path to dst:
\[g_{dst} = g_{y}\]
- Parameters
- Returns
An assigned array
- Return type: Variable
- nnabla.functions.top_k_data(x, k, abs=False, reduce=True, base_axis=1, n_outputs=- 1, outputs=None)[source]¶
Select the k largest values from each sample in x to propagate unmodified and set all other values to 0. If abs is True, the k largest values are selected by magnitude. If reduce is True (the default), all feature dimensions are reduced to a single dimension of size k that propagates only the k largest values. Otherwise, if reduce is False, input and output dimensions are identical. Dimensions before base_axis are treated as number of sample dimensions, and k values get selected from all elements of a sample (dimensions from base_axis) regardless of shape.

>>> import nnabla as nn, nnabla.functions as F
>>> x = nn.Variable((4, 5, 6))
>>> F.top_k_data(x, 3, reduce=False).shape
(4, 5, 6)
>>> F.top_k_data(x, 3, reduce=True).shape
(4, 3)
>>> F.top_k_data(x, 3, reduce=True, base_axis=2).shape
(4, 5, 3)
- Parameters
x (Variable) – N-D array
k (int) – Number of largest data values to propagate.
abs (bool) – Determine largest data values by magnitude. [default= False ]
reduce (bool) – Reduce feature size to one dimension of size k. [default= True ]
base_axis (int) – First dimension of the sample shape. [default= 1 ]
- Returns
N-D array.
- Return type: Variable
- nnabla.functions.top_k_grad(x, k, abs=False, base_axis=1, n_outputs=- 1, outputs=None)[source]¶
Select the k largest gradients for each sample in x to back-propagate unmodified and set all other gradients to 0. If abs is True, the k largest gradients are selected by magnitude. Dimensions before base_axis are treated as number of sample dimensions, and k gradients get selected from all gradients of a sample (dimensions from base_axis) regardless of shape.
- Parameters
- Returns
N-D array with the same shape and data as x.
- Return type: Variable
- nnabla.functions.pack_padded_sequence(padded_sequence, lengths, batch_first=False, n_outputs=- 1, outputs=None)[source]¶
Packs padded variable-length sequences.
\(T_i\) is the length of the \(i\)-th Variable in the sequences. \(B\) is the batch size equal to the length of the sequences. \(T\) is the max of \(T_i\) for all \(i\). \(*\) is the remaining dimensions including none.
Note
This function assumes the padded sequence is length-sorted in decreasing order, and must be used in the dynamic computation mode.
- Parameters
padded_sequence (Variable) – Padded sequence of (\(T \times B \times *\)) or (\(B \times T \times *\)) shape.
lengths (Variable) – Sequence length for each batch and always resides in CPU.
batch_first (bool) – padded_sequence is of (\(T\), \(B\), \(*\)) shape if False, otherwise (\(B\), \(T\), \(*\)). [default= False ]
- Returns
Packed sequence of (\(N\), \(*\)) shape. ~nnabla.Variable: Batch size for each time and always resides in CPU.
- Return type: Variable
- nnabla.functions.pad_packed_sequence(packed_sequence, batch_sizes, batch_first=False, padding_value=None, total_length=None, n_outputs=- 1, outputs=None)[source]¶
Pad packed sequence.
This method unpacks the packed sequence and pads it; it is the inverse operation of pack_padded_sequence().
\(T_i\) is the length of the \(i\)-th Variable in the sequences. \(B\) is the batch size equal to the length of the sequences. \(T\) is the max of \(T_i\) for all \(i\). \(*\) is the remaining dimensions including none.
Note
This function assumes the output of a length-sorted padded sequence in decreasing order, and must be used in the dynamic computation mode.
- Parameters
packed_sequence (Variable) – Packed sequence of (\(N\), \(*\)) shape.
batch_sizes (Variable) – Batch size for each time and always resides in CPU.
batch_first (bool) – padded_sequence is of (\(T\), \(B\), \(*\)) shape if False, otherwise (\(B\), \(T\), \(*\)). [default= False ]
padding_value (float) – Padding value. [default= 0.0 ]
total_length (int) – If not None, the outputs are padded up to total_length. If total_length is less than the max length in the sequences, an error is thrown. [default= -1 ]
- Returns
Padded sequence of (\(T \times B \times *\)) or (\(B \times T \times *\)) shape. ~nnabla.Variable: Sequence length for each batch and always resides in CPU.
- Return type: Variable
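A minimal round-trip sketch, assuming the dynamic mode and two sequences of lengths 3 and 2 that are already sorted in decreasing length:

import numpy as np, nnabla as nn, nnabla.functions as F
nn.set_auto_forward(True)
# (T, B) = (3, 2); the second sequence is padded with one zero
padded = nn.Variable.from_numpy_array(np.array([[1, 4], [2, 5], [3, 0]], dtype=np.float32))
lengths = nn.Variable.from_numpy_array(np.array([3, 2]))
packed, batch_sizes = F.pack_padded_sequence(padded, lengths)
unpacked, new_lengths = F.pad_packed_sequence(packed, batch_sizes)
assert np.allclose(unpacked.d, padded.d)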
Stochasticity¶
- nnabla.functions.rand(low=0, high=1, shape=[], seed=- 1, n_outputs=- 1, outputs=None)[source]¶
Samples numbers from a uniform distribution \(x \sim U(low, high)\) given lowest value \(low\), upper bound \(high\), and shape of the returned Variable.
- Parameters
- Returns
Variable with the shape specified in the argument.
- Return type: Variable
- nnabla.functions.randint(low=0, high=1, shape=[], seed=- 1, n_outputs=- 1, outputs=None)[source]¶
Samples integer numbers from a uniform distribution \(x \sim U(low, high)\) given lowest value \(low\), upper bound \(high\), and shape of the returned Variable.
- Parameters
- Returns
Variable with the shape specified in the argument. The dtype is int32.
- Return type: Variable
- nnabla.functions.randn(mu=0, sigma=1, shape=[], seed=- 1, n_outputs=- 1, outputs=None)[source]¶
Samples numbers from a normal distribution \(x \sim N(\mu, \sigma)\) given mean \(\mu\), standard deviation \(\sigma\), and shape of the returned Variable.
- Parameters
- Returns
Variable with the shape specified in the argument.
- Return type: Variable
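A short sketch of drawing fixed-seed samples from the three generators:

import nnabla as nn, nnabla.functions as F
nn.set_auto_forward(True)
u = F.rand(low=0.0, high=1.0, shape=(2, 3), seed=313)   # uniform samples
i = F.randint(low=0, high=10, shape=(2, 3), seed=313)   # int32 samples
g = F.randn(mu=0.0, sigma=1.0, shape=(2, 3), seed=313)  # normal samples
print(u.d, i.d, g.d)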
- nnabla.functions.rand_binomial(n=1, p=0.5, shape=[], seed=- 1, n_outputs=- 1, outputs=None)[source]¶
Samples numbers from a binomial distribution \(x \sim B(n, p)\) given the number of trials \(n\), probability \(p\), and shape of the returned Variable. When \(n = 1\), this behaves like the Bernoulli distribution.
- Parameters
n (int) – \(n\) in definition, the number of trials. [default= 1 ]
p (float) – \(p\) in definition, probability of success. [default= 0.5 ]
shape (tuple of int) – Shape of returned variable. [default= [] ]
seed (int) – Random seed. When -1, seed is sampled from global random number generator. [default= -1 ]
- Returns
Variable with the shape specified in the argument.
- Return type: Variable
- nnabla.functions.rand_beta(alpha=0.5, beta=0.5, shape=[], seed=- 1, n_outputs=- 1, outputs=None)[source]¶
Samples numbers from a beta distribution \(x \sim \beta(\alpha, \beta)\).
- Parameters
- Returns
Variable with the shape specified in the argument.
- Return type: Variable
- nnabla.functions.rand_gamma(k=0.5, theta=1, shape=[], seed=- 1, n_outputs=- 1, outputs=None)[source]¶
Samples numbers from a gamma distribution \(x \sim \frac {\gamma(k, \frac {x}{\theta})}{\Gamma(k)}\).
- Parameters
- Returns
Variable with the shape specified in the argument.
- Return type: Variable
- nnabla.functions.dropout(x, p=0.5, seed=- 1, output_mask=False)[source]¶
- Dropout.
Samples a number \(u\) from a uniform distribution in \([0, 1]\), and ignores the input if \(u \leq p\).
\[\begin{split}y = \left\{ \begin{array}{ll} \frac{x}{1 - p} & (u > p) \\ 0 & ({\rm otherwise}) \end{array} \right.\end{split}\]
- Parameters
x (Variable) – An input variable.
p (float) – \(p\) in definition. [default= 0.5 ]
seed (int) – Random seed. When -1, seed is sampled from global random number generator. [default= -1 ]
output_mask (bool) – Whether or not to output mask. [default= False ]
- Returns
~nnabla.Variable: N-D array.
Note
Usually dropout is applied only during training, as below (except for MC dropout). If you want to use dropout as an MC dropout, remove the 'if train:' condition.

h = PF.affine(x, num_hidden)
if train:
    h = F.dropout(h, 0.5)
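For completeness, a hedged sketch of retrieving the mask, assuming output_mask=True makes the function return a (y, mask) pair as the argument description above suggests:

import numpy as np, nnabla as nn, nnabla.functions as F
nn.set_auto_forward(True)
x = nn.Variable.from_numpy_array(np.ones((4, 8)))
y, mask = F.dropout(x, 0.5, output_mask=True)  # assumed two-output form
print(mask.d)  # kept positions are nonzero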
- nnabla.functions.random_choice(x, w, shape=[], replace=True, seed=- 1, n_outputs=- 1, outputs=None)[source]¶
Generate random samples from population x with selection probabilities determined by the relative weights w. The number of samples to draw is given by the product of shape's dimensions, and the samples are returned with the given shape. By default, samples are drawn with replacement, i.e. selection of a specific population member is solely determined by its associated weight. Sampling without replacement, where any population member may be drawn only once, is used if replace is set to False.
For both x and w the innermost dimension corresponds to the individual populations and their weights, from which samples are returned with the requested shape following all outermost dimensions of the input.

import nnabla as nn
import nnabla.functions as F
import numpy as np

nn.set_auto_forward(True)

# x holds two populations
x = nn.Variable.from_numpy_array(np.array([[11, 22, 33], [110, 220, 330]]))
# w holds the weights for each population
w = nn.Variable.from_numpy_array(np.array([[10, 20, 70], [70, 20, 10]]))

# draw one sample from each population
y = F.random_choice(x, w)  # y.shape => (2, 1)

# draw 12 samples with shape (3, 4) from each population
y = F.random_choice(x, w, shape=(3, 4))  # y.shape => (2, 3, 4)
Note that weights must not be less than zero and for each population the sum of weights must be greater than zero. Additionally, sampling without replacement requires that the number of non-zero weights is not less than the number of samples to be drawn. These conditions are verified in “cpu” computation context but not when using “cuda” or “cudnn” acceleration (this would require additional device synchronization steps penalizing performance).
Random sampling from an implicit array of index values (like categorical or multinomial) can be realized with input x constructed as indices.

w = nn.Variable.from_numpy_array(np.array([1, 2, 3, 2, 1]))
y = F.random_choice(F.arange(0, 5), w)
- Parameters
- Returns
N-D array
- Return type: Variable
- nnabla.functions.random_crop(x, shape=None, base_axis=1, seed=- 1, n_outputs=- 1, outputs=None)[source]¶
RandomCrop randomly extracts a portion of an array.
- Parameters
x (Variable) – N-D array
shape (tuple of int) – The data size to extract. For example, to randomly extract a portion of the image (3, 48, 48) from a (3, 64, 64) image, specify (3, 48, 48). [default= x.shape ]
base_axis (int) – No Description [default= 1 ]
seed (int) – Random seed. When -1, seed is sampled from global random number generator. [default= -1 ]
- Returns
N-D array
- Return type: Variable
- nnabla.functions.random_erase(x, prob=0.5, area_ratios=(0.02, 0.4), aspect_ratios=(0.3, 3.3333), replacements=(0.0, 255.0), n=None, share=True, inplace=False, base_axis=1, seed=- 1, channel_last=False, ste_fine_grained=True, n_outputs=- 1, outputs=None)[source]¶
Randomly erase patches of the inputs and replace with random values.
Erasing is applied for each sample and for each n with the given probability, the randomly selected area ratio and aspect ratio if share is True; otherwise (share = False), for each feature additionally.
Random patches are selected by random coordinates as follows,
\[\begin{split}S_e &&= Uniform(s_l, s_h) \times S \\ r_e &&= Uniform(r_l, r_h) \\ H_e &&= \sqrt{S_e \times r_e} \\ W_e &&= \sqrt{S_e / r_e} \\ y_e &&= Uniform(0, H - H_e) \\ x_e &&= Uniform(0, W - W_e),\end{split}\]
where \(S\) is the area, \(s_l\) and \(s_h\) are the low and high values of the area ratio range, \(r_l\) and \(r_h\) are the low and high values of the aspect ratio range, \(H_e\) and \(W_e\) are the height and width of a patch, and \(y_e\) and \(x_e\) are the start coordinates of a patch. If a pixel of the inputs falls in this patch, the value of that pixel is replaced with a random value in the replacements range.
Backward is implemented as passing gradients if ste_fine_grained is False; otherwise, the backward only occurs in regions not erased.
References
- Parameters
x (Variable) – N-D array.
prob (float) – Probability to erase. [default= 0.5 ]
area_ratios (repeated float) – Low and high of the area ratio range. [default= (0.02, 0.4) ]
aspect_ratios (repeated float) – Low and high of the aspect ratio range. [default= (0.3, 3.3333) ]
replacements (repeated float) – Low and high of the replacement value range. [default= (0.0, 255.0) ]
n (int) – Max number of patches to be erased. [default= 1 ]
share (bool) – Use the same randomly picked bounding box over the feature dimension when True. [default= True ]
inplace (bool) – The output array is shared with the input array if True. [default= False ]
base_axis (int) – Dimensions up to base_axis are treated as sample dimensions. [default= 1 ]
seed (int) – Random seed. When -1, seed is sampled from global random number generator. [default= -1 ]
channel_last (bool) – If True, the last dimension is considered as channel dimension, a.k.a. NHWC order. [default= False ]
ste_fine_grained (bool) – Whether the Straight Through Estimator is fine-grained. [default= True ]
- Returns
N-D array.
- Return type: Variable
- nnabla.functions.random_flip(x, axes=None, base_axis=1, seed=- 1, n_outputs=- 1, outputs=None)[source]¶
Reverses the order of elements of the specified dimension of an array at 50% probability.
- Parameters
x (Variable) – N-D array
axes (repeated int64) – The index of the axis to reverse the order of the elements. Axis indices take on values 0, 1, 2, and so on from the left. For example, to flip 100 RGB images of shape (100, 3, 24, 32) vertically and horizontally at random, specify (2, 3). [default= [len(x.shape) - 1] ]
base_axis (int) – No Description [default= 1 ]
seed (int) – Random seed. When -1, seed is sampled from global random number generator. [default= -1 ]
- Returns
N-D array
- Return type: Variable
- nnabla.functions.random_shift(x, shifts=None, border_mode='nearest', constant_value=0, base_axis=1, seed=- 1, n_outputs=- 1, outputs=None)[source]¶
Randomly shifts the array elements within the specified range.
- Parameters
x (Variable) – N-D array.
shifts (repeated int64) – Max absolute amount to shift elements. For example, to shift image data horizontally by \(\pm 2\) pixels and vertically by \(\pm 3\) pixels, specify (3, 2). [default= (0,) * len(x.shape) ]
border_mode (string) – Specify how to process the ends of arrays whose values will be undetermined as a result of shifting. nearest: the data at the ends of the original array is copied and used. reflect: the original data reflected at the ends of the array is used. constant: a constant value is used. [default= 'nearest' ]
constant_value (float) – Value used outside of the original array if border_mode='constant'. [default= 0 ]
base_axis (int) – No Description [default= 1 ]
seed (int) – Random seed. When -1, seed is sampled from global random number generator. [default= -1 ]
- Returns
N-D array.
- Return type: Variable
- nnabla.functions.image_augmentation(x, shape=None, pad=(0, 0), min_scale=1.0, max_scale=1.0, angle=0.0, aspect_ratio=1.0, distortion=0.0, flip_lr=False, flip_ud=False, brightness=0.0, brightness_each=False, contrast=1.0, contrast_center=0.0, contrast_each=False, noise=0.0, seed=- 1, n_outputs=- 1, outputs=None)[source]¶
ImageAugmentation randomly alters the input image.
- Parameters
x (Variable) – N-D array.
shape (tuple of int) – The output image data size. [default= x.shape ]
pad (tuple of int) – Border padding values for each spatial axis. Padding will be added to both sides of the dimension. [default= (0, 0) ]
min_scale (float) – The minimum scale ratio when randomly scaling the image. For example, to scale down to 0.8 times the size of the original image, specify "0.8". To not apply random scaling, set both min_scale and max_scale to "1.0". [default= 1.0 ]
max_scale (float) – The maximum scale ratio when randomly scaling the image. For example, to scale up to 2 times the size of the original image, specify "2.0". [default= 1.0 ]
angle (float) – The rotation angle range in radians when randomly rotating the image. The image is randomly rotated in the -Angle to +Angle range. For example, to rotate in a +-15 degree range, specify "0.26" (15 degrees / 360 degrees * 2PI). To not apply random rotation, specify "0.0". [default= 0.0 ]
aspect_ratio (float) – The aspect ratio range when randomly deforming the image. For example, to deform the aspect ratio of the image from 1:1.3 to 1.3:1, specify "1.3". To not apply random deforming, specify "1.0". [default= 1.0 ]
distortion (float) – The distortion range when randomly distorting the image. To not apply distortion, specify "0.0". [default= 0.0 ]
flip_lr (bool) – Whether to randomly flip the image horizontally at 50% probability. [default= False ]
flip_ud (bool) – Whether to randomly flip the image vertically at 50% probability. [default= False ]
brightness (float) – The absolute range of values to randomly add to the brightness. A random value in the -Brightness to +Brightness range is added to the brightness. For example, to vary the brightness in the -0.05 to +0.05 range, specify "0.05". To not apply random addition to brightness, specify "0.0". [default= 0.0 ]
brightness_each (bool) – Whether to apply the random addition to brightness (as specified by brightness) to each color channel. True: brightness is added based on a different random number for each channel. False: brightness is added based on a random number common to all channels. [default= False ]
contrast (float) – The range in which to randomly vary the image contrast. The contrast is varied in the 1/Contrast times to Contrast times range. The output brightness is equal to (input - contrast_center) * contrast + contrast_center. For example, to vary the contrast in the 0.91 times to 1.1 times range, specify "1.1". To not apply random contrast variation, specify "1.0". [default= 1.0 ]
contrast_center (float) – Intensity center used for applying contrast. [default= 0.0 ]
contrast_each (bool) – Whether to apply the random contrast variation (as specified by contrast) to each color channel. True: contrast is varied based on a different random number for each channel. False: contrast is varied based on a random number common to all channels. [default= False ]
noise (float) – Sigma of normal random number to be added. [default= 0.0 ]
seed (int) – Random seed. When -1, seed is sampled from global random number generator. [default= -1 ]
- Returns
N-D array.
- Return type: Variable
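A hedged usage sketch; the parameter values here are illustrative, not recommendations:

import numpy as np, nnabla as nn, nnabla.functions as F
nn.set_auto_forward(True)
x = nn.Variable.from_numpy_array(np.random.rand(16, 3, 64, 64))
y = F.image_augmentation(x, shape=x.shape, min_scale=0.9, max_scale=1.1,
                         flip_lr=True, brightness=0.05, noise=0.01, seed=412)
print(y.shape)  # (16, 3, 64, 64)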
Loss Functions¶
- nnabla.functions.sigmoid_cross_entropy(x, target, n_outputs=- 1, outputs=None)[source]¶
Element-wise cross entropy between x and the target variables, passed to a sigmoid function.
\[y_i = - \left(x^{(1)}_i \ln \left(\sigma \left(x^{(0)}_i \right)\right) + \left(1 - x^{(1)}_i\right) \ln \left(1 - \sigma \left(x^{(0)}_i \right)\right)\right)\]
where \(\sigma(s)=\frac{1}{1+\exp(-s)}\).
Note
SigmoidCrossEntropy is equivalent to Sigmoid+BinaryCrossEntropy, but computing them at once has the effect of reducing computational error.
- Parameters
- Returns
N-D array of element-wise losses.
- Return type: Variable
- nnabla.functions.binary_cross_entropy(x, target, n_outputs=- 1, outputs=None)[source]¶
Element-wise cross entropy between x and the target variables.
\[y_i = - \left(x^{(1)}_i * \ln \left(x^{(0)}_i\right) + \left(1 - x^{(1)}_i\right) * \ln \left(1 - x^{(0)}_i\right)\right).\]
- Parameters
- Returns
N-D array of element-wise losses.
- Return type: Variable
- nnabla.functions.softmax_cross_entropy(x, target, axis=None, n_outputs=- 1, outputs=None)[source]¶
Element-wise cross entropy between the variables and the variables of a label given by a category index with Softmax normalization.
\[y_{j} = -\ln \left(\frac{\exp(x_{j,t_j})}{\sum_{i'} \exp(x_{j,i'})}\right)\]
along the dimension specified by axis (\(i'\) runs over the axis on which the normalization is performed).
Note
SoftmaxCrossEntropy is equivalent to Softmax+CategoricalCrossEntropy, but computing them at once has the effect of reducing computational error.
- Parameters
- Returns
N-D array of element-wise losses. \((D_1 \times ... \times 1 \times ... \times D_N)\)
- Return type: Variable
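A minimal sketch, assuming integer category indices of shape (batch, 1):

import numpy as np, nnabla as nn, nnabla.functions as F
nn.set_auto_forward(True)
logits = nn.Variable.from_numpy_array(np.random.randn(4, 10))          # (batch, classes)
labels = nn.Variable.from_numpy_array(np.array([[1], [0], [9], [3]]))  # (batch, 1) indices
loss = F.mean(F.softmax_cross_entropy(logits, labels))
print(loss.d)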
- nnabla.functions.categorical_cross_entropy(x, target, axis=None, n_outputs=- 1, outputs=None)[source]¶
Element-wise cross entropy between x and the target t, where targets are given by a category index.
\[y_{j} = -\ln \left( x_{j, t_j} \right)\]
along the dimension specified by axis.
- Parameters
- Returns
N-D array of element-wise losses. \((D_1 \times ... \times 1 \times ... \times D_N)\)
- Return type: Variable
- nnabla.functions.squared_error(x0, x1, n_outputs=- 1, outputs=None)[source]¶
Element-wise squared error
\[y_i = \left(x^{(0)}_i - x^{(1)}_i\right)^2.\]
- Parameters
- Returns
N-D array.
- Return type: Variable
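A short numeric sketch:

import numpy as np, nnabla as nn, nnabla.functions as F
nn.set_auto_forward(True)
x0 = nn.Variable.from_numpy_array(np.array([1.0, 2.0, 3.0]))
x1 = nn.Variable.from_numpy_array(np.array([1.5, 2.0, 1.0]))
loss = F.mean(F.squared_error(x0, x1))  # mean of (x0 - x1)^2
print(loss.d)  # 4.25 / 3 = 1.4166...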
- nnabla.functions.absolute_error(x0, x1, n_outputs=- 1, outputs=None)[source]¶
Element-wise absolute error
\[y_i = | x^{(0)}_i - x^{(1)}_i |.\]
- Parameters
- Returns
N-D array.
- Return type: Variable
- nnabla.functions.huber_loss(x0, x1, delta=1.0, n_outputs=- 1, outputs=None)[source]¶
Element-wise Huber loss
\[\begin{split}y_i= \left\{ \begin{array}{ll} d^2 & (|d| < \delta) \\ \delta (2 |d| - \delta) & ({\rm otherwise}) \end{array} \right.\end{split}\]
where \(d = x^{(0)}_i - x^{(1)}_i\)
- Parameters
- Returns
N-D array of element-wise losses.
- Return type: Variable
- nnabla.functions.epsilon_insensitive_loss(x0, x1, epsilon, n_outputs=- 1, outputs=None)[source]¶
Element-wise Epsilon Insensitive Loss
\[\begin{split}y_i= \left\{ \begin{array}{ll} | x^{(0)}_i - x^{(1)}_i | - \epsilon & if \ \ | x^{(0)}_i - x^{(1)}_i | > \epsilon \\ 0 & otherwise \end{array} \right.\end{split}\]
- Parameters
- Returns
N-D array of element-wise losses.
- Return type: Variable
- nnabla.functions.kl_multinomial(p, q, base_axis=1, n_outputs=- 1, outputs=None)[source]¶
The Kullback Leibler Divergence for multinomial distributions.
\[D = \sum_i p_i \log \left( \frac{p_i}{q_i} \right)\]
- Parameters
- Returns
Kullback Leibler divergence \(KL(p \parallel q)\).
- Return type: Variable
Signal Processing¶
- nnabla.functions.interpolate(x, scale=None, output_size=None, mode='linear', align_corners=False, half_pixel=False, half_pixel_for_nn=False, channel_last=False)[source]¶
Resize an ND array with interpolation.
Scaling factors for spatial dimensions are determined by either scale or output_size. nd = len(scale) or nd = len(output_size) determines the number of spatial dimensions, and the last nd dimensions of the input x are considered as the spatial dimensions to be resized.
If scale is given, the output_size is calculated by output_size[i] = floor(scale[i] * x.shape[i - len(scale)]).
Calculation of the coordinate transformation is as follows.
The input coordinate i_input is computed from the output coordinate i_output, the input size size_input, and the output size size_output as:

align_corners | half_pixel | i_input
True          | True       | Not supported.
True          | False      | i_output * (size_input - 1) / (size_output - 1)
False         | True       | (i_output + 0.5) * size_input / size_output - 0.5
False         | False      | i_output * size_input / size_output
In the case of the nearest mode with half_pixel_for_nn set to True, the input coordinate i_input is computed from the output coordinate i_output as i_input = (i_output + 0.5) * size_input / size_output.
Example:
import numpy as np
import nnabla as nn
import nnabla.functions as F

x_data = np.random.rand(64, 3, 224, 224)
x = nn.Variable.from_numpy_array(x_data)

# Resize by scales
y = F.interpolate(x, scale=(2, 2), mode='linear')
print(y.shape)  # (64, 3, 448, 448)
y.forward()
print(y.d)  # Print output

# Resize to a size
y2 = F.interpolate(x, output_size=(320, 257), mode='linear')
print(y2.shape)  # (64, 3, 320, 257)
y2.forward()
print(y2.d)  # Print output
- Parameters
x (Variable) – N-D array with an arbitrary number of dimensions.
scale (tuple of ints) – Scale factors along axes. The default is None, and if this is omitted, output_size must be specified.
output_size (tuple of ints) – The output sizes for axes. If this is given, the scale factors are determined by the output sizes and the input sizes. The default is None, and if this is omitted, scale must be specified.
mode (str) – Interpolation mode chosen from ('linear'|'nearest'). The default is 'linear'.
align_corners (bool) – If true, the corner pixels of input and output arrays are aligned, such that the output corner pixels have the same values as the input corner pixels. Default is False.
half_pixel (bool) – If true, in the coordinate transformation, 0.5 is added to the output coordinate and 0.5 is subtracted from the input coordinate after scaling. Default is False.
half_pixel_for_nn (bool) – This is a special argument to support the backward compatibility of the nearest neighbor interpolation. Default is False. When True, the old implementation of nearest neighbor interpolation is used.
channel_last (bool) – Last dimension is the channel (NHWC order) if True.
- Returns
N-D array.
- Return type: Variable
Warning
Up to version 1.8.0, the default of align_corners was None; in that case, it was treated as True if mode is 'linear' and as False otherwise.
Warning
The nearest mode interpolation of versions up to 1.8.0 corresponds to the nearest mode with half_pixel_for_nn = True in versions after 1.8.0.
- nnabla.functions.fft(x, signal_ndim, normalized=False, n_outputs=- 1, outputs=None)[source]¶
Complex-to-complex Discrete Fourier Transform,
\[X_{k_1, \ldots, k_d} = \sum_{n_1=0}^{N_1-1} \dots \sum_{n_d=0}^{N_d-1} x_{n_1, \ldots, n_d} \exp\left(-2 \pi j \left( \sum_{i=0}^{d} \frac{k_i n_i}{N_i} \right) \right),\]
where
\[k_i = 0, \ldots, N_i - 1.\]
This function now supports 1-D, 2-D, and 3-D DFT with or without the leading batch dimension(s).
The input is expected to be complex-valued with at least signal_ndim + 1 dimensions. The last dimension has a shape of two where x[…, 0] is the real part and x[…, 1] the imaginary part.
Example:
import numpy as np
import nnabla as nn
import nnabla.functions as F
from nnabla.ext_utils import get_extension_context

ctx = get_extension_context("cudnn")
nn.set_default_context(ctx)

# Example for a batched 2D-FFT and 2D-IFFT (batch-size: 2, data-size: 4x3)
x_data = np.random.rand(2, 4, 3) + 1j * np.random.rand(2, 4, 3)
x = nn.Variable.from_numpy_array(np.stack([np.real(x_data), np.imag(x_data)], axis=3))
y = F.fft(x, signal_ndim=2, normalized=True)
z = F.ifft(y, signal_ndim=2, normalized=True)
z.forward()
np.allclose(z.d[..., 0] + 1j * z.d[..., 1], x_data)
- Parameters
- Returns
FFT transformed signal.
- Return type: Variable
- nnabla.functions.ifft(x, signal_ndim, normalized=False, n_outputs=- 1, outputs=None)[source]¶
Complex-to-complex inverse Discrete Fourier Transform,
\[X_{k_1, \ldots, k_d} = \frac{1}{\prod_{i=1}^{d} N_i} \sum_{n_1=0}^{N_1-1} \dots \sum_{n_d=0}^{N_d-1} x_{n_1, \ldots, n_d} \exp\left(2 \pi j \left( \sum_{i=0}^{d} \frac{k_i n_i}{N_i} \right) \right),\]
where
\[k_i = 0, \ldots, N_i - 1.\]
This function now supports 1-D, 2-D, and 3-D DFT with or without the leading batch dimension(s).
The input is expected to be complex-valued with at least signal_ndim + 1 dimensions. The last dimension has a shape of two where x[…, 0] is the real part and x[…, 1] the imaginary part.
- Parameters
- Returns
IFFT transformed signal.
- Return type: Variable
- nnabla.functions.stft(x, window_size, stride, fft_size, window_type='hanning', center=True, pad_mode='reflect')[source]¶
Computes the short-time Fourier transform
- Parameters
x (Variable) – Time domain sequence of size batch_size x sample_size.
window_size (int) – Size of STFT analysis window.
stride (int) – Number of samples that we shift the window, also called hop size.
fft_size (int) – Size of the FFT; the output will have fft_size // 2 + 1 frequency bins.
window_type (str) – Analysis window, can be either hanning, hamming or rectangular. For convenience, window_type=None is also supported, which is equivalent to window_type='rectangular'.
center (bool) – If True, then the signal x is padded by half the FFT size using reflection padding.
pad_mode (str) – Padding mode, which can be 'constant' or 'reflect'. 'constant' pads with 0.
- Returns
Returns real and imaginary parts of STFT result.
- nnabla.functions.istft(y_r, y_i, window_size, stride, fft_size, window_type='hanning', center=True)[source]¶
Computes the inverse short-time Fourier transform.
Note: We use a constant square inverse window for the reconstruction of the time-domain signal; therefore, the first and last window_size - stride samples are not perfectly reconstructed.
- Parameters
y_r (Variable) – Real part of STFT of size batch_size x fft_size//2 + 1 x frame_size.
y_i (Variable) – Imaginary part of STFT of size batch_size x fft_size//2 + 1 x frame_size.
window_size (int) – Size of STFT analysis window.
stride (int) – Number of samples that we shift the window, also called hop size.
fft_size (int) – Size of the FFT (STFT has fft_size // 2 + 1 frequency bins).
window_type (str) – Analysis window, can be either hanning, hamming or rectangular. For convenience, window_type=None is also supported, which is equivalent to window_type='rectangular'.
center (bool) – If True, then it is assumed that the time-domain signal has centered frames.
- Returns
Time domain sequence of size batch_size x sample_size.
- Return type: Variable
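A round-trip sketch; the window and stride values are illustrative, and per the note above the signal borders are not perfectly reconstructed:

import numpy as np, nnabla as nn, nnabla.functions as F
x = nn.Variable.from_numpy_array(np.random.randn(1, 16000))
y_r, y_i = F.stft(x, window_size=256, stride=128, fft_size=256)
z = F.istft(y_r, y_i, window_size=256, stride=128, fft_size=256)
z.forward()
print(np.allclose(z.d[:, 256:-256], x.d[:, 256:-256], atol=1e-4))  # interior samples only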
Geometric Neural Network Layers¶
- nnabla.functions.affine_grid(theta, size, align_corners=False, n_outputs=- 1, outputs=None)[source]¶
Generate the source grid based on the normalized target grid with size. The target grid is first normalized in [-1, 1] and then transformed by the affine transformation \(\theta\) to generate the source grid. 2D and 3D grids are currently supported.
This function is normally used with the warp_by_grid function for constructing the spatial transformer.
- Parameters
theta (Variable) – N-D array with the shape (\(B \times 2 \times 3\)), the sample-wise affine transformation matrix.
size (repeated int64) – The grid size of (\(H \times W\)) for 2D and (\(D \times H \times W\)) for 3D.
align_corners (bool) – If True, the top-left and bottom-right pixels correspond to (-1, -1) and (1, 1) respectively, since a pixel is located on the corner of a grid, and the target grid is normalized in [-1, 1]. If False, the normalized target grid in [-1, 1] is scaled by (size - 1) / size according to the respective spatial size (e.g., \(H\) and \(W\)) before the transformation, since a pixel is located at the center of a cell in a grid. [default= False ]
- Returns
N-D array with the shape (\(B \times H \times W \times 2\)) for 2D and (\(B \times D \times H \times W \times 3\)) for 3D. The last dimension of 2 is for (x, y) and of 3 for (x, y, z). The grid is used as the source grid for the warping.
- Return type: Variable
- nnabla.functions.warp_by_grid(x, grid, mode='linear', padding_mode='zero', align_corners=False, channel_last=False, n_outputs=- 1, outputs=None)[source]¶
Warp the input data by the grid. This function is normally used with the normalized grid generated by the affine_grid function, for constructing the spatial transformer.
- Parameters
x (Variable) – Input data to be warped with the shape (\(B \times C \times H_{in} \times W_{in}\)) for 2D and (\(B \times C \times D_{in} \times H_{in} \times W_{in}\)) for 3D.
grid (Variable) – Grid warping the input data with the shape (\(B \times H_{out} \times W_{out} \times 2\)) for 2D and (\(B \times D_{out} \times H_{out} \times W_{out} \times 3\)) for 3D. The last dimension of 2 is for (x, y) or 3 for (x, y, z).
mode (string) – Interpolation mode: linear or nearest. [default= 'linear' ]
padding_mode (string) – Padding mode when the grid value is outside [-1, 1]. If this is "zero", 0 is used for padding. "reflect" uses the values reflected at the ends of the original input data, like a mirror. "repeat" uses the values at the ends of the original input data. [default= 'zero' ]
align_corners (bool) – The target grid normalized in [-1, 1] is scaled by (size - 1) / size according to the respective spatial size (e.g., \(H\) and \(W\)) before the transformation if this is False. If this is True, the top-left and bottom-right pixels correspond to (-1, -1) and (1, 1) respectively. [default= False ]
channel_last (bool) – If True, the last dimension is considered as channel dimension, a.k.a. NHWC order. [default= False ]
- Returns
Output data warped by the grid.
- Return type: Variable
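A minimal spatial-transformer sketch with an identity affine matrix (values are illustrative):

import numpy as np, nnabla as nn, nnabla.functions as F
nn.set_auto_forward(True)
theta = nn.Variable.from_numpy_array(np.array([[[1, 0, 0], [0, 1, 0]]], dtype=np.float32))
grid = F.affine_grid(theta, size=(8, 8))  # (B, 8, 8, 2) source grid
x = nn.Variable.from_numpy_array(np.random.rand(1, 3, 8, 8).astype(np.float32))
y = F.warp_by_grid(x, grid)  # identity warp, up to interpolation
print(np.allclose(y.d, x.d, atol=1e-5))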
- nnabla.functions.warp_by_flow(data, flow, n_outputs=- 1, outputs=None)[source]¶
Transform the image(s) data by flow field(s) of offset vectors such that each output pixel corresponds to the input image pixel at the relative offset location given by horizontal and vertical flow values (in other words, the flow field describes the coordinate displacements for each output pixel to the corresponding input pixel). Both data and flow are 4-D variables (in “NCHW” layout) with identical shape except the flow channel dimension (which is always 2).
\[output_{n,c,y,x} = data_{n,c,y',x'},\]
where
\[\begin{split}y' &=& y + flow_{n,1,y,x}, \\ x' &=& x + flow_{n,0,y,x}.\end{split}\]
The output pixel values at \(y'\) and \(x'\) locations are obtained by bilinearly interpolating between the 4 closest pixels of the input image. Pixel values outside of the input image are implicitly padded with the value of the closest boundary pixel.
- Parameters
- Returns
Transformed image data with shape (N, Channels, Height, Width).
- Return type: Variable
Quantized Neural Network Layers¶
- nnabla.functions.binary_sigmoid(x, n_outputs=- 1, outputs=None)[source]¶
Element-wise binary sigmoid function. In the forward pass, it computes
\[\begin{split}f(x) = \begin{cases} 1 & (x > 0) \\ 0 & ({\rm otherwise})\end{cases},\end{split}\]
but in the backward pass, a straight-through approximation of the gradient is used, i.e.,
\[\begin{split}\frac{\partial f(x)}{\partial x} = \begin{cases} 0 & (|x| \geq 1) \\ \frac{1}{2} & ({\rm otherwise}) \end{cases}.\end{split}\]
References
- nnabla.functions.binary_tanh(x, n_outputs=- 1, outputs=None)[source]¶
Element-wise binary tanh function. In the forward pass, it computes
\[\begin{split}f(x) = \begin{cases} 1 & (x > 0) \\ -1 & ({\rm otherwise}) \end{cases},\end{split}\]
but in the backward pass, a straight-through approximation of the gradient is used, i.e.,
\[\begin{split}\frac{\partial f(x)}{\partial x} = \begin{cases} 0 & (|x| \geq 1) \\ 1 & ({\rm otherwise}) \end{cases}.\end{split}\]
References
- nnabla.functions.binary_connect_affine(x, weight, binary_weight, bias=None, base_axis=1, quantize_zero_to=1.0, n_outputs=- 1, outputs=None)[source]¶
This function provides a BinaryConnect affine layer. It computes in the forward pass
\[y_j = \sum_{i} sign(w_{j,i}) x_i,\]
i.e., the weights \(w_{j,i}\) are binarized to \(sign(w_{j,i})\) and, hence, each weight is in \(\{-1,\,1\}\). By this weight binarization, the inner product computations do not require any multiplications anymore as they turn into additions/subtractions.
This function should be used together with batch_normalization().
Note
1) If you would like to share the binary weights between other layers, please use the standard, floating value weights (weight) and not the binary weights (binary_weight).
2) The weights and the binary weights become in sync only after a call to forward(), and not after a call to backward(). If you wish to store the parameters of the network, remember to call forward() once before doing so; otherwise the weights and the binary weights will not be in sync.
3) CPU and GPU implementations now use floating values for binary_weight, since this function is for simulation purposes.
References
- Parameters
x (Variable) – Input.
weight (Variable) – Weight. [parameter]
binary_weight (Variable) – Binarized weight. [parameter]
bias (Variable) – Bias. [optional][parameter]
base_axis (int) – Dimensions up to base_axis are treated as sample dimensions. [default= 1 ]
quantize_zero_to (float) – Input value at zero is quantized to this value. [default= 1.0 ]
- Returns
Output.
- Return type: Variable
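In practice the float weight and its binarized copy are usually created by the parametric-function wrapper; a hedged sketch, assuming nnabla.parametric_functions.binary_connect_affine, which manages both weight variables internally:

import nnabla as nn
import nnabla.parametric_functions as PF

x = nn.Variable((32, 128))
with nn.parameter_scope("bc1"):
    h = PF.binary_connect_affine(x, 64)  # float and binarized weights are created internally
print(h.shape)  # (32, 64)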
- nnabla.functions.binary_connect_convolution(x, weight, binary_weight, bias=None, base_axis=1, pad=None, stride=None, dilation=None, group=1, quantize_zero_to=1.0, n_outputs=- 1, outputs=None)[source]¶
This function provides a BinaryConnect convolution layer. It computes in the forward pass
\[y_{n, a, b} = \sum_{m} \sum_{i} \sum_{j} sign(w_{n, m, i, j}) x_{m, a + i, b + j},\]
i.e., the weights \(w_{n, m, i, j}\) are binarized to \(sign(w_{n, m, i, j})\) and, hence, each weight is in \(\{-1,\,1\}\). By this weight binarization, the inner product computations do not require any multiplications anymore as they turn into additions/subtractions.
This function should be used together with batch_normalization().
Reference
Note
1) If you would like to share the binary weights between other layers, please use the standard, floating value weights (weight) and not the binary weights (binary_weight).
2) The weights and the binary weights become in sync only after a call to forward(), and not after a call to backward(). If you wish to store the parameters of the network, remember to call forward() once before doing so; otherwise the weights and the binary weights will not be in sync.
3) CPU and GPU implementations now use floating values for binary_weight, since this function is for simulation purposes.
- Parameters
x (Variable) – Input.
weight (Variable) – Weight. [parameter]
binary_weight (Variable) – Binarized weight. [parameter]
bias (Variable) – Bias. [optional][parameter]
base_axis (int) – Dimensions up to base_axis are treated as sample dimensions. [default= 1 ]
pad (tuple of int) – Padding sizes for dimensions. [default= (0,) * (len(x.shape) - (base_axis+1)) ]
stride (tuple of int) – Stride sizes for dimensions. [default= (1,) * (len(x.shape) - (base_axis+1)) ]
dilation (tuple of int) – Dilation sizes for dimensions. [default= (1,) * (len(x.shape) - (base_axis+1)) ]
group (int) – Number of groups of channels. This makes the connection across channels sparser, by grouping connections along the mapping direction. [default= 1 ]
quantize_zero_to (float) – Input value at zero is quantized to this value. [default= 1.0 ]
- Returns
Output
- Return type: Variable
- nnabla.functions.binary_weight_affine(x, weight, binary_weight, alpha, bias=None, base_axis=1, quantize_zero_to=1.0, n_outputs=- 1, outputs=None)[source]¶
This function provides a Binary Weight Network affine layer. It computes in the forward pass
\[y_j = \frac{1}{\|\mathbf{w}_j\|_{\ell_1}} \sum_{i} sign(w_{j,i}) x_i\]
i.e., the weights \(w_{j,i}\) are binarized to \(sign(w_{j,i})\) and, hence, each weight is in \(\{-1,\,1\}\). By this weight binarization, the inner product computations turn into additions/subtractions which are followed by multiplication with the scaling factor \(\alpha_j = \frac{1}{\|\mathbf{w}_j\|_{\ell_1}}\).
Reference
Note
1) If you would like to share the binary weights with other layers, please use the standard, floating value weights (weight) and not the binary weights (binary_weight).
2) The weights and the binary weights become in sync only after a call to forward(), and not after a call to backward(). If you wish to store the parameters of the network, remember to call forward() once before doing so; otherwise the weights and the binary weights will not be in sync.
3) CPU and GPU implementations now use floating values for binary_weight, since this function is for simulation purposes.
- Parameters
x (Variable) – Input.
weight (Variable) – Weight. [parameter]
binary_weight (Variable) – Binarized weight. [parameter]
alpha (Variable) – Alpha. [parameter]
bias (Variable) – Bias. [optional][parameter]
base_axis (int) – Dimensions up to base_axis are treated as sample dimensions. [default= 1 ]
quantize_zero_to (float) – Input value at zero is quantized to this value. [default= 1.0 ]
- Returns
Output.
- Return type: Variable
- nnabla.functions.binary_weight_convolution(x, weight, binary_weight, alpha, bias=None, base_axis=1, pad=None, stride=None, dilation=None, group=1, quantize_zero_to=1.0, n_outputs=- 1, outputs=None)[source]¶
This function provides a Binary Weight Network convolution layer. It computes in the forward pass
\[y_{n, a, b} = \frac{1}{\|\mathbf{w}_n\|_{\ell_1}} \sum_{m} \sum_{i} \sum_{j} sign(w_{n, m, i, j}) x_{m, a + i, b + j}.\]i.e., the weights \(w_{n, m, i, j}\) are binarized to \(sign(w_{n, m, i, j})\) and, hence, each weight is in \(\{-1,\,1\}\). By this weight binarization, the inner product computations turn into additions/subtractions which are followed by multiplication with the scaling factor \(\alpha_n = \frac{1}{\|\mathbf{w}_n\|_{\ell_1}}\).
Reference
Note
1) If you would like to share the binary weights with other layers, please use the standard, floating-point weights (weight) and not the binary weights (binary_weight).
2) The weights and the binary weights become in sync only after a call to forward(), and not after a call to backward(). If you wish to store the parameters of the network, remember to call forward() once before doing so; otherwise the weights and the binary weights will not be in sync.
3) CPU and GPU implementations currently use floating-point values for binary_weight, since this function is for simulation purposes.
- Parameters
x (Variable) – Input.
weight (Variable) – Weight. [parameter]
binary_weight (Variable) – Binarized weight. [parameter]
alpha (Variable) – Alpha. [parameter]
bias (Variable) – Bias. [optional][parameter]
base_axis (int) – Dimensions up to base_axis are treated as the sample dimensions. [default=1]
pad (tuple of int) – Padding sizes for dimensions. [default=(0,) * (len(x.shape) - (base_axis+1))]
stride (tuple of int) – Stride sizes for dimensions. [default=(1,) * (len(x.shape) - (base_axis+1))]
dilation (tuple of int) – Dilation sizes for dimensions. [default=(1,) * (len(x.shape) - (base_axis+1))]
group (int) – Number of groups of channels. This makes the connection across channels sparser, by grouping connections along the mapping direction. [default=1]
quantize_zero_to (float) – Input value at zero is quantized to this value. [default=1.0]
- Returns
Output.
- Return type
Variable
Note
All nnabla functions in nnabla.functions are decorated with the nnabla.function_bases.function_api decorator, which queries the current context and passes it into the first argument of the original function. The original function always takes a context as the first argument.
- nnabla.functions.fixed_point_quantize(x, sign=True, n=8, delta=0.0625, quantize=True, ste_fine_grained=True, outputs=None)[source]¶
Fixed Point Quantize.
This function simulates uniform quantization of values in a fixed-point number representation.
- Parameters
x (Variable) – An input variable.
sign (bool) – Indicates whether a signed or an unsigned number representation is used. Default is True.
n (int) – Bit width used. Note that sign consumes one bit; \(n-1\) bits are used for the number representation in the signed case.
delta (float) – Step size.
quantize (bool) – If true, quantize the input; otherwise pass it through unchanged.
ste_fine_grained (bool) – If true, the straight-through estimator (STE) is not 1 (see the backward formulas below).
- Returns
N-D array.
- Return type
Variable
See also nnabla.function_bases.fixed_point_quantize.
In the forward pass,
\[\begin{split}\begin{equation} q_i= \left\{ \begin{array}{ll} max & if \ \ \ x_i > max \\ sign(x_i) \times floor(|x_i| \delta^{-1} + 2^{-1}) \times \delta & if \ \ min \le x_i \le max \\ min & if \ \ x_i < min \\ \end{array} \right., \end{equation}\end{split}\]where \(\delta\) is the step size, \((min, max) :=(- (2^{n-1} - 1)\delta, (2^{n-1} - 1)\delta)\) if \(sign\) is true, \((min, max) := (0, (2^n - 1) \delta)\) otherwise, and \(n\) is the total bit-width used.
In the backward pass when using ste_fine_grained as false,
\[\begin{equation} \frac{\partial q_i}{\partial x_i} = 1. \end{equation}\]
In the backward pass when using ste_fine_grained as true,
\[\begin{split}\begin{equation} \frac{\partial q_i}{\partial x_i}= \left\{ \begin{array}{ll} 0 & if \ \ \ x_i > max \\ 1 & if \ \ min \le x_i \le max \\ 0 & if \ \ x_i < min \\ \end{array} \right.. \end{equation}\end{split}\]
Note
Quantized values are stored as floating-point numbers, since this function is for simulation purposes.
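Example (a small sketch, not from the original reference): signed 8-bit quantization with the default step size delta=0.0625, so the representable range is +/- (2**7 - 1) * 0.0625 = +/- 7.9375:
import numpy as np
import nnabla as nn
import nnabla.functions as F

x = nn.Variable.from_numpy_array(
    np.array([-10.0, -0.03, 0.1, 7.9, 10.0], dtype=np.float32))
y = F.fixed_point_quantize(x, sign=True, n=8, delta=0.0625)
y.forward()
print(y.d)  # values snapped to multiples of 0.0625 and clipped to [-7.9375, 7.9375]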
- nnabla.functions.min_max_quantize(x, qr_min, qr_max, ql_min, ql_max, decay=0.999, x_min_max=False, ema=False, ste_fine_grained=True, eps=0.01, quantize=True, outputs=None)[source]¶
Min-max quantization.
This function simulates uniform quantization of values in a fixed-point number representation.
Min-max quantization is defined as the following equation
\[y = round \left(\frac{\min(\max(x, m), M) - m}{scale} \right) \times scale + m,\]where the \(scale\) is defined as
\[scale = \frac{M - m}{M_q - m_q},\]and
\[\begin{split}m_q = ql_{min}, \\ M_q = ql_{max}, \\ m = qr_{min}, \\ M = qr_{max}.\end{split}\]
In the backward pass when using ste_fine_grained as false,
\[\frac{\partial q_i}{\partial x_i} = 1.\]
In the backward pass when using ste_fine_grained as true,
\[\begin{split} \frac{\partial q_i}{\partial x_i}= \left\{ \begin{array}{ll} 0 & if \ \ \ x_i > M \\ 1 & if \ \ m \le x_i \le M \\ 0 & if \ \ x_i < m \\ \end{array} \right..\end{split}\]
\(qr_{min}\) and \(qr_{max}\) are treated as follows.
x_min_max is True and ema is True: exponential moving averages of \(min(x)\) and \(max(x)\) are computed and stored in \(qr_{min}\) and \(qr_{max}\).
x_min_max is True and ema is False: \(min(x)\) and \(max(x)\) are computed and stored in \(qr_{min}\) and \(qr_{max}\).
x_min_max is False and ema is True: the exponential moving averages stored in \(qr_{min}\) and \(qr_{max}\) are used.
x_min_max is False and ema is False: gradients of \(qr_{min}\) and \(qr_{max}\) are computed in the backward pass.
More precisely, at inference time for min-max quantization, one has to consider the zero-point (zp), which corresponds to the real value 0 and whose data type is an integer. The zero-point is defined as
\[\begin{split} && zp_f = ql_{min} -\frac{qr_{min}}{scale}, \\ && zp = \left\{ \begin{array}{ll} ql_{max} & if \ \ \ zp_f >= ql_{max} \\ round(zp_f) & if \ \ otherwise \\ ql_{min} & if \ \ zp_f <= ql_{min} \\ \end{array} \right..\end{split}\]Accordingly, in order to simulate quantization effect of zero-point, during both forward and backward pass, \(qr_{min}\) and \(qr_{max}\) are adjusted as follows,
\[\begin{split}qr_{min}^{adj} = ql_{min} - zp * scale, \\ qr_{max}^{adj} = ql_{max} - zp * scale.\end{split}\]These operations are often called nudge.
Finally, in the formulas of the min-max quantization, \(m\) and \(M\) are replaced by \(qr_{min}^{adj}\) and \(qr_{max}^{adj}\) respectively.
- Parameters
x (Variable) – Input N-D array.
qr_min (Variable) – Minimum quantization range (modified during forward execution).
qr_max (Variable) – Maximum quantization range (modified during forward execution).
ql_min (Variable) – Minimum quantization level, typically 0.
ql_max (Variable) – Maximum quantization level, typically 255.
decay (float) – The decay rate for the exponential moving average.
x_min_max (bool) – Use the min and max of x to compute the quantization ranges. Default is False.
ema (bool) – Use the exponential moving average for the min and max quantization ranges. Default is False.
ste_fine_grained (bool) – If True, STE is not 1: the {0, 1}-mask computed from the min-max is applied to the gradient in the backward pass; otherwise, STE is 1.
eps (float) – Epsilon, a small value ensuring that \(qr_{max} - qr_{min}\) is greater than the epsilon.
quantize (bool) – Apply quantization or not.
References
Benoit Jacob, Skirmantas Kligys, Bo Chen, Menglong Zhu, Matthew Tang, Andrew Howard, Hartwig Adam, and Dmitry Kalenichenko, “Quantization and Training of Neural Networks for Efficient Integer-Arithmetic-Only Inference”, https://arxiv.org/abs/1712.05877
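Example (a hedged sketch, not from the original reference): 8-bit min-max quantization driven by the min/max of the input (x_min_max=True). The per-tensor shape (1, 1) for the range and level variables is an assumption for this illustration:
import numpy as np
import nnabla as nn
import nnabla.functions as F

x = nn.Variable.from_numpy_array(np.random.randn(4, 8).astype(np.float32))
qr_min = nn.Variable.from_numpy_array(np.zeros((1, 1), dtype=np.float32))
qr_max = nn.Variable.from_numpy_array(np.ones((1, 1), dtype=np.float32))
ql_min = nn.Variable.from_numpy_array(np.zeros((1, 1), dtype=np.float32))
ql_max = nn.Variable.from_numpy_array(np.full((1, 1), 255.0, dtype=np.float32))
y = F.min_max_quantize(x, qr_min, qr_max, ql_min, ql_max, x_min_max=True)
y.forward()  # qr_min/qr_max are overwritten from min(x)/max(x) during forward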
- nnabla.functions.pow2_quantize(x, sign=True, with_zero=True, n=8, m=1, quantize=True, ste_fine_grained=True, outputs=None)[source]¶
Pow2 Quantize.
This function simulates quantization of values to a power-of-two number representation.
- Parameters
x (Variable) – An input variable.
sign (bool) – Indicates whether a signed or an unsigned number representation is used. Default is True.
with_zero (bool) – Indicates whether zero is used as a quantized value. Default is True. Note that zero consumes one bit.
n (int) – Bit width used. Note that sign consumes one bit; \(n-1\) bits are used for the number representation in the signed case. Default is 8.
m (int) – \(2^m\) is the upper bound of the dynamic range and \(-2^m\) is the lower bound, \(m \in \mathcal{Z}\). Default is 1.
quantize (bool) – If true, quantize the input; otherwise pass it through unchanged.
ste_fine_grained (bool) – If true, the straight-through estimator (STE) is not 1 (see the backward formulas below).
- Returns
N-D array.
- Return type
Variable
See also nnabla.function_bases.pow2_quantize.
In the forward pass of the signed case,
\[\begin{split}q_i= \left\{ \begin{array}{ll} max_{+} & if \ \ \overline{q_i} > max_{+} \\ \overline{q_i} & if \ \ min_{+} \le \overline{q_i} \le max_{+} \\ min_{+} & if \ \ 0 \le \overline{q_i} < min_{+} \\ min_{-} & if \ \ min_{-} < \overline{q_i} < 0 \\ \overline{q_i} & if \ \ max_{-} \le \overline{q_i} \le min_{-}\\ max_{-} & if \ \ \overline{q_i} < max_{-} \\ \end{array} \right.,\end{split}\]
where
\[\begin{split}&& max_{+} = 2^{m}, min_{+} = 2^{m - (2^{n-1} - 1)},\\ && max_{-} = -2^{m}, min_{-} = -2^{m - (2^{n-1} - 1)},\\ && \overline{q_i} = sign(x_i) \times 2^{round(\log_2 |x_i|)}.\end{split}\]This quantization uses the geometric mean between two power-of-two numbers as quantization threshold.
In the forward pass of the unsigned case,
\[\begin{split}q_i= \left\{ \begin{array}{ll} max & if \ \ \overline{q_i} > max \\ \overline{q_i} & if \ \ min \le \overline{q_i} \le max \\ min & if \ \ 0 < \overline{q_i} < min \\ \end{array} \right.,\end{split}\]
where
\[\begin{split}&& max = 2^{m}, min = 2^{m - (2^{n} - 1)},\\ && \overline{q_i} = 2^{int(\log_2 |x_i|)}.\end{split}\]
When using with_zero as true, a pruning threshold is used to round an input to 0 or \(min\). The pruning threshold is defined in this function as the following,
\[pruning\ threshold = min \times 2^{-\frac{1}{2}}.\]
If the absolute value of the input is less than this value, the input is rounded to 0; otherwise to \(min\).
In the backward pass when using ste_fine_grained as false,
\[\frac{\partial q_i}{\partial x_i} = 1.\]In the backward pass when using ste_fine_grained as true,
\[\begin{split}\frac{\partial q_i}{\partial x_i}= \left\{ \begin{array}{ll} 0 & if \ \ \overline{q_i} > max_{+} \\ 1 & if \ \ otherwise \\ 0 & if \ \ \overline{q_i} < max_{-} \\ \end{array} \right..\end{split}\]
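Example (a short sketch, not from the original reference): with n=4 bits, sign and zero each consume a bit, and m=1 bounds the dynamic range at 2**1 = 2:
import numpy as np
import nnabla as nn
import nnabla.functions as F

x = nn.Variable.from_numpy_array(np.array([0.3, -0.7, 1.4, 3.0], dtype=np.float32))
y = F.pow2_quantize(x, sign=True, with_zero=True, n=4, m=1)
y.forward()
print(y.d)  # each value becomes 0 or a signed power of two, clipped to [-2, 2]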
- nnabla.functions.prune(x, rate=0.9, n_outputs=-1, outputs=None)[source]¶
Prune the input as the following equation,
\[\begin{split}q_i = \left \{ \begin{array}{ll} 0 & abs(x_i) < threshold \\ x_i & otherwise \end{array} \right.\end{split}\]
where \(threshold\) is determined by threshold = np.sort(np.abs(x))[int((x.size - 1) * rate)].
- Parameters
- Returns
N-D array with the same shape as x.
- Return type
Variable
Note
All nnabla functions in nnabla.functions are decorated with the nnabla.function_bases.function_api decorator, which queries the current context and passes it into the first argument of the original function. The original function always takes a context as the first argument.
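Example (a quick sketch, not from the original reference): with rate=0.9, the smallest 90% of the entries by magnitude are zeroed:
import numpy as np
import nnabla as nn
import nnabla.functions as F

x = nn.Variable.from_numpy_array(np.random.randn(100).astype(np.float32))
y = F.prune(x, rate=0.9)
y.forward()
print((y.d == 0).mean())  # roughly 0.9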
- nnabla.functions.inq_affine(x, weight, indicator_fixedweights, bias=None, base_axis=1, num_bits=4, inq_iterations=(), selection_algorithm='largest_abs', seed=-1, n_outputs=-1, outputs=None)[source]¶
This function provides an INQ affine layer. It computes in the forward pass
\[y_j = \sum_{i} w_{j,i} x_i,\]where the weights \(w_{j,i}\) are quantized sequentially during training to power-of-two numbers. In the backward pass, only the non-fixed (i.e., learnable) weights are updated.
References
- Parameters
x (Variable) – Input.
weight (Variable) – Weight. [parameter]
indicator_fixedweights (Variable) – Indicates which weights are already fixed (0 = not fixed, 1 = fixed). [parameter]
bias (Variable) – Bias. [optional][parameter]
base_axis (int) – Dimensions up to base_axis are treated as the sample dimensions. [default=1]
num_bits (int) – Number of bits per weight. Needs to be >= 2, as two bits are used to code zero and the sign of the weight. [default=4]
inq_iterations (repeated int64) – List which specifies after how many forward passes we fix 50% of the learnable weights. Once as many iterations as specified in the last element of inq_iterations have been performed, all weights are fixed. [default=()]
selection_algorithm (string) – Chooses the algorithm used for selecting the weights to fix ("largest_abs" ... fix the weights with the largest absolute values, "random" ... fix weights randomly). [default='largest_abs']
seed (int) – Random seed. When -1, the seed is sampled from the global random number generator. [default=-1]
- Returns
Output.
- Return type
Variable
Note
All nnabla functions in nnabla.functions are decorated with the nnabla.function_bases.function_api decorator, which queries the current context and passes it into the first argument of the original function. The original function always takes a context as the first argument.
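Example (a hedged sketch at the function level, not from the original reference; the indicator variable starts at all zeros, i.e. no weight fixed yet, and the iteration schedule below is an illustrative choice):
import numpy as np
import nnabla as nn
import nnabla.functions as F

x = nn.Variable((8, 16))
w = nn.Variable.from_numpy_array(np.random.randn(16, 10).astype(np.float32))
w.need_grad = True
ind = nn.Variable.from_numpy_array(np.zeros((16, 10), dtype=np.float32))
y = F.inq_affine(x, w, ind, num_bits=4, inq_iterations=(1000, 2000, 3000))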
- nnabla.functions.inq_convolution(x, weight, indicator_fixedweights, bias=None, base_axis=1, pad=None, stride=None, dilation=None, group=1, num_bits=4, inq_iterations=(), selection_algorithm='largest_abs', seed=-1, n_outputs=-1, outputs=None)[source]¶
This function provides an INQ convolution layer. It computes in the forward pass
\[y_{n, a, b} = \sum_{m} \sum_{i} \sum_{j} w_{n, m, i, j} x_{m, a + i, b + j},\]where the weights \(w_{n, m, i, j}\) are quantized sequentially during training to power-of-two numbers. In the backward pass, only the non-fixed (i.e., learnable) weights are updated.
Reference
- Parameters
x (Variable) – Input.
weight (Variable) – Weight. [parameter]
indicator_fixedweights (Variable) – Indicates which weights are already fixed (0 = not fixed, 1 = fixed). [parameter]
bias (Variable) – Bias. [optional][parameter]
base_axis (int) – Dimensions up to base_axis are treated as the sample dimensions. [default=1]
pad (tuple of int) – Padding sizes for dimensions. [default=(0,) * (len(x.shape) - (base_axis+1))]
stride (tuple of int) – Stride sizes for dimensions. [default=(1,) * (len(x.shape) - (base_axis+1))]
dilation (tuple of int) – Dilation sizes for dimensions. [default=(1,) * (len(x.shape) - (base_axis+1))]
group (int) – Number of groups of channels. This makes the connection across channels sparser, by grouping connections along the mapping direction. [default=1]
num_bits (int) – Number of bits per weight. Needs to be >= 2, as two bits are used to code zero and the sign of the weight. [default=4]
inq_iterations (repeated int64) – List which specifies after how many forward passes we fix 50% of the learnable weights. Once as many iterations as specified in the last element of inq_iterations have been performed, all weights are fixed. [default=()]
selection_algorithm (string) – Chooses the algorithm used for selecting the weights to fix ("largest_abs" ... fix the weights with the largest absolute values, "random" ... fix weights randomly). [default='largest_abs']
seed (int) – Random seed. When -1, the seed is sampled from the global random number generator. [default=-1]
- Returns
Output.
- Return type
Variable
Note
All nnabla functions in nnabla.functions are decorated with the nnabla.function_bases.function_api decorator, which queries the current context and passes it into the first argument of the original function. The original function always takes a context as the first argument.
Unsupported, Special Use¶
- nnabla.functions.vat_noise(x, w, base_axis=1, eps=1.0, n_outputs=-1, outputs=None)[source]¶
Noise for virtual adversarial training.
This layer is a special layer for GUI network designing, specialized for getting the noise of virtual adversarial training.
In the backward process, the weight parameter will be replaced with the gradient.
Forward
\[y_i = \frac{\epsilon x_i}{\sqrt{\sum_k x_k^2 + c}}\]
Backward
\[\delta x_i = 0\]
\[w_i = \epsilon \delta y_i\]
Note
This layer is a special layer for GUI network designing.
References
- Parameters
x (Variable) – N-D array of noise input. The noise is initially standard Gaussian noise; from the next step on, the gradient variable is fed back.
w (Variable) – N-D array for keeping the gradient values.
base_axis (int) – Dimensions up to base_axis are treated as the sample dimensions. [default=1]
eps (float) – Noise norm (l2) factor. [default=1.0]
- Returns
N-D array.
- Return type
Variable
Note
All nnabla functions in nnabla.functions are decorated with the nnabla.function_bases.function_api decorator, which queries the current context and passes it into the first argument of the original function. The original function always takes a context as the first argument.
- nnabla.functions.unlink(x, n_outputs=-1, outputs=None)[source]¶
This function behaves as an identity function on the forward pass, and deletes the gradient on the backward pass.
This layer is a special layer for GUI network designing, used for obtaining a zero backward operation by adding this layer.
Forward
\[y_i = x_i\]
Backward
\[\delta x_i = 0\]
Note
This layer is a special layer for GUI network designing.
Note
All nnabla functions in nnabla.functions are decorated with the nnabla.function_bases.function_api decorator, which queries the current context and passes it into the first argument of the original function. The original function always takes a context as the first argument.
- nnabla.functions.sink(*x, **kw)[source]¶
Creates a dummy variable used to call forward or backward function of multiple variables at one place.
This takes any number of input variables with any shape, and creates a single 0-shaped output. The forward pass does nothing. The backward pass sets ones to the input gradients if one_input_grad is set to true.
Note
sink can only be called at the very end of the graph, and the grads of the input variables are cleared when y.backward(clear_buffer=True) is called.
- Parameters
- Returns
Dummy variable.
- Return type
Variable
Note
All nnabla functions in nnabla.functions are decorated with the nnabla.function_bases.function_api decorator, which queries the current context and passes it into the first argument of the original function. The original function always takes a context as the first argument.
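Example (a minimal sketch, not from the original reference): driving forward and backward for two outputs through a single dummy variable:
import nnabla as nn
import nnabla.functions as F

x = nn.Variable((2, 3), need_grad=True)
y1 = F.sin(x)
y2 = F.cos(x)
dummy = F.sink(y1, y2, one_input_grad=True)
dummy.forward()
dummy.backward()  # backpropagates through both y1 and y2 at once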
- nnabla.functions.confusion_matrix(x, target, axis=None, n_outputs=-1, outputs=None)[source]¶
Confusion matrix. The return value is already summed over samples.
- Parameters
- Returns
Confusion matrix 2-D array. Col index is estimated class. Row index is label class.
- Return type
Variable
Note
All nnabla functions in nnabla.functions are decorated with the nnabla.function_bases.function_api decorator, which queries the current context and passes it into the first argument of the original function. The original function always takes a context as the first argument.
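Example (a small sketch, not from the original reference; labels are assumed to be integer class indices stored in a (batch, 1) array, matching the other classification metrics in this module):
import numpy as np
import nnabla as nn
import nnabla.functions as F

probs = nn.Variable.from_numpy_array(
    np.array([[0.8, 0.1, 0.1], [0.2, 0.1, 0.7]], dtype=np.float32))
labels = nn.Variable.from_numpy_array(np.array([[0], [2]], dtype=np.float32))
cm = F.confusion_matrix(probs, labels, axis=1)
cm.forward()
print(cm.d)  # 3x3 matrix; rows are label classes, columns are estimated classes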
Image Object Detection¶
- nnabla.functions.nms_detection2d(x, thresh=None, nms=None, nms_per_class=None, n_outputs=-1, outputs=None)[source]¶
Non-Maximum Suppression (NMS) applied to a 2D object detector output. The input is a 3-dimensional tensor with shape (B, N, 5 + C), where B denotes the batch size, N denotes the number of detection box candidates, and C denotes the number of classes of object detection. 5 + C consists of the box coordinates x, y, w, h in normalized coordinates (the size of each of x and y is 1.0), the objectness (learned to predict the IoU value to the ground-truth box), and the class probabilities of C classes. It outputs a tensor with the same dimensions as the input, where all values are copied from the input to the output, except that the class probabilities are multiplied by the objectness, and possibly suppressed to 0 by NMS. During NMS, all combinations of pairs of bounding boxes are compared. For each pair, the bounding box with the lower detection score (described below) is suppressed if the overlap ratio (the IoU) is greater than the value of nms.
There are two suppression modes for NMS.
1. Suppress by class probability (nms_per_class is True): For each bounding box, the detection score is calculated by objectness * probability[class_id] for each class. The suppression is done for each class independently.
2. Suppress by objectness (nms_per_class is False): The suppression is done for each bounding box using objectness as the detection score. All class probabilities become 0 for every suppressed box.
References
- Parameters
- Returns
A 3-dim array with the same dimensions as the input.
- Return type
Variable
Note
All nnabla functions in nnabla.functions are decorated with the nnabla.function_bases.function_api decorator, which queries the current context and passes it into the first argument of the original function. The original function always takes a context as the first argument.
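Example (a hedged sketch, not from the original reference): a dummy detector output with B=1, N=100 boxes, and C=20 classes; the threshold values here are typical but arbitrary:
import numpy as np
import nnabla as nn
import nnabla.functions as F

x = nn.Variable.from_numpy_array(np.random.rand(1, 100, 5 + 20).astype(np.float32))
y = F.nms_detection2d(x, thresh=0.5, nms=0.45, nms_per_class=True)
y.forward()  # class probabilities are multiplied by objectness and suppressed by NMS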
Validation¶
- nnabla.functions.top_n_error(x, target, axis=None, n=1, n_outputs=-1, outputs=None)[source]¶
Top-N error along the dimension specified by axis; each element of the output is
\[\begin{split}y_i = \left \{ \begin{array}{l} 1 \ (x_i \ is \ not \ within \ N-th \ place) \\ 0 \ (x_i \ is \ within \ N-th \ place) \end{array} \right.\end{split}\]
- Parameters
- Returns
Element-wise error N-D array. (\(D_1 \times ... \times 1 \times ... \times D_N\))
- Return type
Variable
Note
All nnabla functions in nnabla.functions are decorated with the nnabla.function_bases.function_api decorator, which queries the current context and passes it into the first argument of the original function. The original function always takes a context as the first argument.
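Example (a small sketch, not from the original reference): top-1 error along the class axis; labels are assumed to be class indices of shape (batch, 1):
import numpy as np
import nnabla as nn
import nnabla.functions as F

pred = nn.Variable.from_numpy_array(np.array([[0.1, 0.9], [0.8, 0.2]], dtype=np.float32))
label = nn.Variable.from_numpy_array(np.array([[1], [1]], dtype=np.float32))
err = F.top_n_error(pred, label, axis=1, n=1)
err.forward()
print(err.d)  # [[0.], [1.]] -- only the second sample is misclassified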
- nnabla.functions.binary_error(x, target, n_outputs=-1, outputs=None)[source]¶
Elementwise binary error.
\[\begin{split}y_i = \left \{ \begin{array}{l} 0 ((x^{(0)} \geq 0.5) = (x^{(1)} \geq 0.5)) \\ 1 ((x^{(0)} \geq 0.5) \neq (x^{(1)} \geq 0.5)) \end{array} \right.\end{split}\]
- Parameters
- Returns
Element-wise errors N-D array.
- Return type
Variable
Note
All nnabla functions in nnabla.functions are decorated with the nnabla.function_bases.function_api decorator, which queries the current context and passes it into the first argument of the original function. The original function always takes a context as the first argument.
Parametric Functions¶
In NNabla, trainable models are created by composing functions that have optimizable parameters.
These functions are called parametric functions.
Parametric functions are provided by nnabla.parametric_functions.
Parameter Management API¶
The parameters registered by List of Parametric Functions can be managed using APIs listed in this section.
- nnabla.parameter.parameter_scope(name, scope=None)[source]¶
Grouping parameters registered by parametric functions listed in nnabla.parametric_functions.
- Parameters
name (str) – Parameter scope name.
scope (OrderedDict, optional) – Specify the current parameter scope as a local dictionary. The default value is None. In this case, the current parameter scope maintained globally is used.
Example:
import nnabla as nn
import nnabla.parametric_functions as PF
import nnabla.functions as F

with nn.parameter_scope('conv1'):
    conv_out1 = PF.convolution(x, 32, (5, 5))
    bn_out1 = PF.batch_normalization(conv_out1)
    act_out1 = F.relu(bn_out1)
with nn.parameter_scope('conv2'):
    conv_out2 = PF.convolution(act_out1, 64, (3, 3))
    bn_out2 = PF.batch_normalization(conv_out2)
    act_out2 = F.relu(bn_out2)
Nesting
The with statement blocks allow you to nest parameter scopes. This can also be done by using "/" inside the parameter names.
Example:
with nn.parameter_scope('network1'):
    with nn.parameter_scope('conv1'):
        conv_out1 = PF.convolution(x, 32, (5, 5))
        bn_out1 = PF.batch_normalization(conv_out1)
        act_out1 = F.relu(bn_out1)
    with nn.parameter_scope('conv2'):
        conv_out2 = PF.convolution(act_out1, 64, (3, 3))
        bn_out2 = PF.batch_normalization(conv_out2)
        act_out2 = F.relu(bn_out2)
is equivalent to
with nn.parameter_scope('network1/conv1'):
    conv_out1 = PF.convolution(x, 32, (5, 5))
    bn_out1 = PF.batch_normalization(conv_out1)
    act_out1 = F.relu(bn_out1)
with nn.parameter_scope('network1/conv2'):
    conv_out2 = PF.convolution(act_out1, 64, (3, 3))
    bn_out2 = PF.batch_normalization(conv_out2)
    act_out2 = F.relu(bn_out2)
- nnabla.parameter.get_parameters(params=None, path='', grad_only=True)[source]¶
Get parameter Variables under the current parameter scope.
- Parameters
- Returns
Dictionary of parameter name (str) to parameter Variable.
- Return type
OrderedDict
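Example (a short sketch, not from the original reference): build a small graph and inspect what was registered. The printed names assume the default scope names of the parametric functions listed below:
import nnabla as nn
import nnabla.parametric_functions as PF

x = nn.Variable((4, 3, 32, 32))
with nn.parameter_scope('net'):
    h = PF.convolution(x, 16, (3, 3), name='conv1')

for name, param in nn.get_parameters().items():
    print(name, param.shape, param.need_grad)
# e.g. net/conv1/conv/W (16, 3, 3, 3) True
#      net/conv1/conv/b (16,) True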
- nnabla.parameter.save_parameters(path, params=None, extension=None)[source]¶
Save all parameters into a file with the specified format.
Currently hdf5 and protobuf formats are supported.
- nnabla.parameter.load_parameters(path, proto=None, needs_proto=False, extension='.nntxt')[source]¶
Load parameters from a file with the specified format.
- Parameters
path – path or file object
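Example (a minimal round-trip sketch, assuming parameters are already registered in the current scope; the file extension selects the format, e.g. .h5 for hdf5):
import nnabla as nn

nn.save_parameters('params.h5')   # writes all parameters in the current scope
nn.clear_parameters()
nn.load_parameters('params.h5')   # restores them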
- nnabla.parameter.get_parameter_or_create(name, shape=None, initializer=None, need_grad=True, as_need_grad=None)[source]¶
Returns an existing parameter variable in current parameter scope with the provided name.
If a variable with the provided name does not exist, a new variable is created and registered to the current parameter scope with the name, then returned.
- Parameters
name (str) – The name under the current scope. If it already exists, the name is queried from the parameter manager.
shape (tuple of int) – Shape of the created parameter. The shape of the specified parameter must match this shape. The default is None, which is only valid if initializer is given as a numpy.ndarray.
initializer (nnabla.initializer.BaseInitializer or numpy.ndarray) – An initialization function to be applied to the parameter. A numpy.ndarray can also be given to initialize the parameter from numpy array data.
need_grad (bool) – Register the parameter with the specified need_grad flag. The default is True. If the flag is different from the previously specified one, the flag will be overwritten, but the values will be kept.
as_need_grad (bool) – Get a parameter variable with the specified need_grad flag. Note that this doesn't overwrite the flag of the registered parameter variable with the provided name. Instead, if the given flag mismatches the previously registered need_grad flag, it returns a new variable referring to the same array contents but with need_grad=as_need_grad.
Note
It returns a Variable which is unlinked from the registered one in the current parameter scope (using nnabla.Variable.get_unlinked_variable()). That means changing the need_grad attribute doesn't affect the variable existing in the current parameter scope.
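Example (a sketch, not from the original reference): register (or fetch, on a second call) a custom scalar parameter in the current scope:
import nnabla as nn
from nnabla.parameter import get_parameter_or_create
from nnabla.initializer import ConstantInitializer

with nn.parameter_scope('my_layer'):
    scale = get_parameter_or_create(
        'scale', shape=(1,), initializer=ConstantInitializer(1.0), need_grad=True)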
List of Parametric Functions¶
Parametric functions are provided by nnabla.parametric_functions, as listed below.
Like the functions listed in Functions, they take Variable(s) as their first argument(s), followed by options specific to each parametric function. In addition, they register parameter Variable(s) into the parameter scope.
The parameter variables are registered with need_grad properties specific to each parametric function. Variables with the need_grad=False flag will not be updated by gradient descent; hence, backward computation is not executed for those variables. False is usually specified when the parameters are updated during the forward pass and/or the backward pass, e.g., in batch normalization.
All parametric functions take an optional argument fix_parameters=False. By giving True, the associated parameter variables are connected to the computation graph with the property need_grad=False regardless of the properties of the registered variables, and backward gradient computation is then not executed for those variables. This is useful when you create a computation graph for evaluation purposes, fix parameters partially in a graph, and so on.
All parametric functions listed below are decorated with the following decorator.
- nnabla.parametric_functions.parametric_function_api(scope_name=None, param_desc=None)[source]¶
Decorator for parametric functions.
The decorated function is always called under a parameter scope scope_name. Also, the decorator adds an additional argument name (str, default is None) at the end. If name is specified, the scope scope_name comes under a scope name. This feature can reduce the vertical space usage of the source code. Any parametric function should be decorated by this.
- Parameters
scope_name (str, optional) – The original function will be called under a parameter scope named by scope_name.
param_desc (list, optional) – Descriptions of the parameters will be automatically included in the docstring. This must be a list of tuples with 4 elements composed of (name (str), description (str), shape info (str), need_grad (bool)).
- Returns
A decorated parametric function.
- Return type
function
See Parameter Management API to know how to query and manipulate registered variables.
Here is the list of parametric functions.
- nnabla.parametric_functions.affine(inp, n_outmaps, base_axis=1, w_init=None, b_init=None, fix_parameters=False, rng=None, with_bias=True, apply_w=None, apply_b=None, name=None)[source]¶
The affine layer, also known as the fully connected layer. Computes
\[{\mathbf y} = {\mathbf A} {\mathbf x} + {\mathbf b}.\]where \({\mathbf x}, {\mathbf y}\) are the inputs and outputs respectively, and \({\mathbf A}, {\mathbf b}\) are constants.
- Parameters
inp (Variable) – Input N-D array with shape (\(M_0 \times \ldots \times M_{B-1} \times D_B \times \ldots \times D_N\)). Dimensions before and after base_axis are flattened as if it is a matrix.
n_outmaps (int or tuple of int) – Number of output neurons per data item.
base_axis (int) – Dimensions up to base_axis are treated as the sample dimensions.
w_init (nnabla.initializer.BaseInitializer or numpy.ndarray) – Initializer for the weight. By default, it is initialized with nnabla.initializer.UniformInitializer within the range determined by nnabla.initializer.calc_uniform_lim_glorot.
b_init (nnabla.initializer.BaseInitializer or numpy.ndarray) – Initializer for the bias. By default, it is initialized with zeros if with_bias is True.
fix_parameters (bool) – When set to True, the weights and biases will not be updated.
rng (numpy.random.RandomState) – Random generator for the initializer.
with_bias (bool) – Specify whether to include the bias term.
apply_w (function) – Lambda, function, or callable object applied to the weights.
apply_b (function) – Lambda, function, or callable object applied to the bias.
- Returns
\((B + 1)\)-D array. (\(M_0 \times \ldots \times M_{B-1} \times L\))
- Return type
Variable
- Parameters to be registered
The following variables are registered in a parameter scope "affine";
W (need_grad=True) : Weight matrix. (shape: (inmaps, outmaps))
b (need_grad=True) : Bias vector. (shape: (outputs,))
Note
If the name option is passed, the parameters become wrapped inside the parameter scope with the specified name, yielding the same results as the following code. This can be used to simplify the code.
with parametric_scope(name):
    output = affine(<args>)
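Example (a usage sketch, not from the original reference): a 10-way classification head on 128-dimensional features:
import nnabla as nn
import nnabla.parametric_functions as PF

x = nn.Variable((32, 128))
y = PF.affine(x, 10, name='fc')   # registers fc/affine/W and fc/affine/b
print(y.shape)                    # (32, 10)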
- nnabla.parametric_functions.convolution(inp, outmaps, kernel, pad=None, stride=None, dilation=None, group=1, channel_last=False, w_init=None, b_init=None, base_axis=1, fix_parameters=False, rng=None, with_bias=True, apply_w=None, apply_b=None, name=None)[source]¶
N-D Convolution with a bias term.
For Dilated Convolution (a.k.a. Atrous Convolution), refer to:
Chen et al., DeepLab: Semantic Image Segmentation with Deep Convolutional Nets, Atrous Convolution, and Fully Connected CRFs. https://arxiv.org/abs/1606.00915
Yu et al., Multi-Scale Context Aggregation by Dilated Convolutions. https://arxiv.org/abs/1511.07122
Note
Convolution is a computationally intensive operation that should preferably be run with the cudnn backend. NNabla then uses CuDNN library functions to determine and cache the fastest algorithm for the given set of convolution parameters, which results in additional memory consumption that may pose a problem for GPUs with insufficient memory. In that case, the NNABLA_CUDNN_WORKSPACE_LIMIT environment variable can be used to restrict the choice of algorithms to those that fit the given workspace memory limit, expressed in bytes. In some cases it may also be desired to restrict the automatic search to algorithms that produce deterministic (reproducible) results. This can be requested by setting the environment variable NNABLA_CUDNN_DETERMINISTIC to a non-zero value.
- Parameters
inp (Variable) – N-D array.
outmaps (int) – Number of convolution kernels (which is equal to the number of output channels). For example, to apply convolution on an input with 16 types of filters, specify 16.
kernel (tuple of int) – Convolution kernel size. For example, to apply convolution on an image with a 3 (height) by 5 (width) two-dimensional kernel, specify (3, 5).
group (int) – Number of groups of channels. This makes connections across channels sparser by grouping connections along the map direction.
channel_last (bool) – If True, the last dimension is considered as the channel dimension, a.k.a. NHWC order.
w_init (nnabla.initializer.BaseInitializer or numpy.ndarray) – Initializer for the weight. By default, it is initialized with nnabla.initializer.UniformInitializer within the range determined by nnabla.initializer.calc_uniform_lim_glorot.
b_init (nnabla.initializer.BaseInitializer or numpy.ndarray) – Initializer for the bias. By default, it is initialized with zeros if with_bias is True.
base_axis (int) – Dimensions up to base_axis are treated as the sample dimensions.
fix_parameters (bool) – When set to True, the weights and biases will not be updated.
rng (numpy.random.RandomState) – Random generator for the initializer.
with_bias (bool) – Specify whether to include the bias term.
apply_w (function) – Lambda, function, or callable object applied to the weights.
apply_b (function) – Lambda, function, or callable object applied to the bias.
- Returns
N-D array. See convolution for the output shape.
- Return type
Variable
- Parameters to be registered
The following variables are registered in a parameter scope "conv";
W (need_grad=True) : Filter weights. (shape: (outmaps, inmaps // group, *kernel))
b (need_grad=True) : Bias vector. (shape: (outmaps,))
Note
If the name option is passed, the parameters become wrapped inside the parameter scope with the specified name, yielding the same results as the following code. This can be used to simplify the code.
with parametric_scope(name):
    output = convolution(<args>)
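Example (a usage sketch, not from the original reference): a 3x3 convolution whose padding keeps the spatial size:
import nnabla as nn
import nnabla.parametric_functions as PF

x = nn.Variable((8, 3, 32, 32))
y = PF.convolution(x, 16, (3, 3), pad=(1, 1), name='conv1')
print(y.shape)  # (8, 16, 32, 32)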
- nnabla.parametric_functions.depthwise_convolution(inp, kernel, pad=None, stride=None, dilation=None, multiplier=1, w_init=None, b_init=None, base_axis=1, fix_parameters=False, rng=None, with_bias=True, name=None)[source]¶
N-D Depthwise Convolution with a bias term.
Reference:
Chollet: Chollet, Francois. “Xception: Deep Learning with Depthwise Separable Convolutions. https://arxiv.org/abs/1610.02357
- Parameters
inp (Variable) – N-D array.
kernel (tuple of int) – Convolution kernel size. For example, to apply convolution on an image with a 3 (height) by 5 (width) two-dimensional kernel, specify (3, 5).
multiplier (int) – Number of output feature maps per input feature map.
w_init (nnabla.initializer.BaseInitializer or numpy.ndarray) – Initializer for the weight. By default, it is initialized with nnabla.initializer.UniformInitializer within the range determined by nnabla.initializer.calc_uniform_lim_glorot.
b_init (nnabla.initializer.BaseInitializer or numpy.ndarray) – Initializer for the bias. By default, it is initialized with zeros if with_bias is True.
base_axis (int) – Dimensions up to base_axis are treated as the sample dimensions.
fix_parameters (bool) – When set to True, the weights and biases will not be updated.
rng (numpy.random.RandomState) – Random generator for the initializer.
with_bias (bool) – Specify whether to include the bias term.
- Returns
N-D array. See depthwise_convolution for the output shape.
- Return type
Variable
- Parameters to be registered
The following variables are registered in a parameter scope "depthwise_conv";
W (need_grad=True) : Filter weights. (shape: (inmaps * multiplier, *kernel))
b (need_grad=True) : Bias vector. (shape: (inmaps * multiplier,))
Note
If the name option is passed, the parameters become wrapped inside the parameter scope with the specified name, yielding the same results as the following code. This can be used to simplify the code.
with parametric_scope(name):
    output = depthwise_convolution(<args>)
- nnabla.parametric_functions.deconvolution(inp, outmaps, kernel, pad=None, stride=None, dilation=None, group=1, channel_last=False, output_padding=None, w_init=None, b_init=None, base_axis=1, fix_parameters=False, rng=None, with_bias=True, apply_w=None, apply_b=None, name=None)[source]¶
Deconvolution layer.
- Parameters
inp (Variable) – N-D array.
outmaps (int) – Number of deconvolution kernels (which is equal to the number of output channels). For example, to apply deconvolution on an input with 16 types of filters, specify 16.
kernel (tuple of int) – Convolution kernel size. For example, to apply deconvolution on an image with a 3 (height) by 5 (width) two-dimensional kernel, specify (3, 5).
group (int) – Number of groups of channels. This makes connections across channels sparser by grouping connections along the map direction.
w_init (nnabla.initializer.BaseInitializer or numpy.ndarray) – Initializer for the weight. By default, it is initialized with nnabla.initializer.UniformInitializer within the range determined by nnabla.initializer.calc_uniform_lim_glorot.
b_init (nnabla.initializer.BaseInitializer or numpy.ndarray) – Initializer for the bias. By default, it is initialized with zeros if with_bias is True.
base_axis (int) – Dimensions up to base_axis are treated as the sample dimensions.
fix_parameters (bool) – When set to True, the weights and biases will not be updated.
rng (numpy.random.RandomState) – Random generator for the initializer.
with_bias (bool) – Specify whether to include the bias term.
apply_w (function) – Lambda, function, or callable object applied to the weights.
apply_b (function) – Lambda, function, or callable object applied to the bias.
- Returns
N-D array. See deconvolution for the output shape.
- Return type
Variable
- Parameters to be registered
The following variables are registered in a parameter scope "deconv";
W (need_grad=True) : Filter weights. (shape: (inmaps, outmaps // group, *kernel))
b (need_grad=True) : Bias vector. (shape: (outmaps,))
Note
If the name option is passed, the parameters become wrapped inside the parameter scope with the specified name, yielding the same results as the following code. This can be used to simplify the code.
with parametric_scope(name):
    output = deconvolution(<args>)
- nnabla.parametric_functions.depthwise_deconvolution(inp, kernel, pad=None, stride=None, dilation=None, divisor=1, w_init=None, b_init=None, base_axis=1, fix_parameters=False, rng=None, with_bias=True, name=None)[source]¶
Depthwise deconvolution computes the transposed depthwise convolution for one-dimensional and two-dimensional input data.
- Parameters
inp (Variable) – N-D array.
kernel (tuple of int) – Convolution kernel size. For example, to apply convolution on an image with a 3 (height) by 5 (width) two-dimensional kernel, specify (3, 5).
divisor (int) – Number of input feature maps per output feature map.
w_init (nnabla.initializer.BaseInitializer or numpy.ndarray) – Initializer for the weight. By default, it is initialized with nnabla.initializer.UniformInitializer within the range determined by nnabla.initializer.calc_uniform_lim_glorot.
b_init (nnabla.initializer.BaseInitializer or numpy.ndarray) – Initializer for the bias. By default, it is initialized with zeros if with_bias is True.
base_axis (int) – Dimensions up to base_axis are treated as the sample dimensions.
fix_parameters (bool) – When set to True, the weights and biases will not be updated.
rng (numpy.random.RandomState) – Random generator for the initializer.
with_bias (bool) – Specify whether to include the bias term.
- Returns
N-D array. See depthwise_deconvolution for the output shape.
- Return type
Variable
- Parameters to be registered
The following variables are registered in a parameter scope "depthwise_deconv";
W (need_grad=True) : Filter weights. (shape: (inmaps,) + kernel)
b (need_grad=True) : Bias vector. (shape: (inmaps / divisor,))
Note
If the name option is passed, the parameters become wrapped inside the parameter scope with the specified name, yielding the same results as the following code. This can be used to simplify the code.
with parametric_scope(name):
    output = depthwise_deconvolution(<args>)
- nnabla.parametric_functions.deformable_convolution(inp, outmaps, kernel, offset, mask=None, pad=None, stride=None, dilation=None, group=1, deformable_group=1, channel_last=False, w_init=None, b_init=None, base_axis=1, fix_parameters=False, rng=None, with_bias=True, apply_w=None, apply_b=None, name=None)[source]¶
2D Deformable Convolution with a bias term. If a mask is used, this function implements Deformable Convolution v2.
Dai et al., Deformable Convolutional Networks. https://arxiv.org/abs/1703.06211
Zhu et al., Deformable ConvNets v2: More Deformable, Better Results. https://arxiv.org/abs/1811.11168
- Parameters
inp (Variable) – N-D array.
outmaps (int) – Number of convolution kernels (which is equal to the number of output channels). For example, to apply convolution on an input with 16 types of filters, specify 16.
kernel (tuple of int) – Convolution kernel size. For example, to apply convolution on an image with a 3 (height) by 5 (width) two-dimensional kernel, specify (3, 5).
offset (Variable) – Offsets for deformable convolution. The shape is fixed to \((N, deformable\_group \times 2 \times Kh \times Kw, H, W)\). Offsets must be calculated externally through a separate convolution layer.
mask (Variable) – Normalized mask for deformable convolution v2. The shape is fixed to \((N, deformable\_group \times 1 \times Kh \times Kw, H, W)\). Masks must be calculated externally together with the offsets through a separate convolution layer.
group (int) – Number of groups of channels. This makes connections across channels sparser by grouping connections along the map direction.
deformable_group (int) – Number of deformable groups of channels. This makes connections across channels sparser by grouping connections along the map direction.
channel_last (bool) – If True, the last dimension is considered as channel dimension, a.k.a. NHWC order.
w_init (nnabla.initializer.BaseInitializer or numpy.ndarray) – Initializer for the weight. By default, it is initialized with nnabla.initializer.UniformInitializer within the range determined by nnabla.initializer.calc_uniform_lim_glorot.
b_init (nnabla.initializer.BaseInitializer or numpy.ndarray) – Initializer for the bias. By default, it is initialized with zeros if with_bias is True.
base_axis (int) – Dimensions up to base_axis are treated as the sample dimensions.
fix_parameters (bool) – When set to True, the weights and biases will not be updated.
rng (numpy.random.RandomState) – Random generator for the initializer.
with_bias (bool) – Specify whether to include the bias term.
apply_w (function) – Lambda, function, or callable object applied to the weights.
apply_b (function) – Lambda, function, or callable object applied to the bias.
- Returns
N-D array. See convolution for the output shape.
- Return type
Variable
- Parameters to be registered
The following variables are registered in a parameter scope "deformable_conv";
W (need_grad=True) : Filter weights. (shape: (outmaps, inmaps // group, *kernel))
b (need_grad=True) : Bias vector. (shape: (outmaps,))
Note
If the name option is passed, the parameters become wrapped inside the parameter scope with the specified name, yielding the same results as the following code. This can be used to simplify the code.
with parametric_scope(name):
    output = deformable_convolution(<args>)
- nnabla.parametric_functions.batch_normalization(inp, axes=[1], decay_rate=0.9, eps=1e-05, batch_stat=True, output_stat=False, fix_parameters=False, param_init=None, no_scale=False, no_bias=False, name=None)[source]¶
Batch normalization layer.
\[\begin{split}\begin{array}{lcl} \mu &=& \frac{1}{M} \sum x_i\\ \sigma^2 &=& \frac{1}{M} \sum \left(x_i - \mu\right)^2\\ \hat{x}_i &=& \frac{x_i - \mu}{\sqrt{\sigma^2 + \epsilon }}\\ y_i &= & \hat{x}_i \gamma + \beta. \end{array}\end{split}\]where \(x_i, y_i\) are the inputs. In testing, the mean and variance computed by moving average calculated during training are used.
- Parameters
inp (Variable) – N-D array of input.
axes (tuple of int) – The mean and variance for each element in axes are calculated using elements on the remaining axes. For example, if an input is 4-dimensional and axes is [1], the batch mean is calculated as np.mean(inp.d, axis=(0, 2, 3), keepdims=True) (using a numpy expression as an example).
decay_rate (float) – Decay rate of the running mean and variance.
eps (float) – Tiny value to avoid zero division by std.
batch_stat (bool) – Use mini-batch statistics rather than running ones.
output_stat (bool) – Output batch mean and variance.
fix_parameters (bool) – When set to True, the beta and gamma will not be updated.
param_init (dict) – Parameter initializers can be set with a dict. A key of the dict must be 'beta', 'gamma', 'mean' or 'var'. A value of the dict must be an Initializer or a numpy.ndarray. E.g. {'beta': ConstantInitializer(0), 'gamma': np.ones(gamma_shape) * 2}.
- Returns
N-D array.
- Return type
Variable
References
Ioffe and Szegedy, Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift. https://arxiv.org/abs/1502.03167
The parameters have the same number of dimensions as the input data; their shapes along axes are the same as the input's, while the rest are 1. If an input is 4-dimensional and axes=[1], the parameter shape will be param_shape = np.mean(inp.d, axis=(0, 2, 3), keepdims=True).shape (using a numpy expression as an example).
- Parameters to be registered
The following variables are registered in a parameter scope "bn";
beta (need_grad=True) : Trainable bias \(\beta\). (shape: <see above>)
gamma (need_grad=True) : Trainable scaling factor \(\gamma\). (shape: <see above>)
mean (need_grad=False) : Moving average of the batch mean. (shape: <see above>)
var (need_grad=False) : Moving average of the batch variance. (shape: <see above>)
Note
If the name option is passed, the parameters become wrapped inside the parameter scope with the specified name, yielding the same results as the following code. This can be used to simplify the code.
with parametric_scope(name):
    output = batch_normalization(<args>)
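Example (a usage sketch, not from the original reference): the usual training/evaluation split; both calls share the same scope and hence the same beta, gamma, mean, and var:
import nnabla as nn
import nnabla.parametric_functions as PF

x = nn.Variable((8, 16, 4, 4))
with nn.parameter_scope('bn1'):
    y_train = PF.batch_normalization(x, batch_stat=True)   # mini-batch statistics
with nn.parameter_scope('bn1'):
    y_eval = PF.batch_normalization(x, batch_stat=False)   # moving-average statistics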
- nnabla.parametric_functions.fused_batch_normalization(inp, z=None, axes=[1], decay_rate=0.9, eps=1e-05, batch_stat=True, nonlinearity='relu', output_stat=False, fix_parameters=False, param_init=None, no_scale=False, no_bias=False, name=None)[source]¶
Batch normalization layer fused with the subsequent add2 operation of a residual input and a nonlinear activation.
- Parameters
inp (Variable) – N-D array of input.
z (Variable, optional) – A residual input. By specifying None, the activation function will follow immediately after BN operation.
axes (tuple of int) – The mean and variance for each element in axes are calculated using elements on the remaining axes. For example, if an input is 4-dimensional and axes is [1], the batch mean is calculated as np.mean(inp.d, axis=(0, 2, 3), keepdims=True) (using a numpy expression as an example).
decay_rate (float) – Decay rate of the running mean and variance.
eps (float) – Tiny value to avoid zero division by std.
batch_stat (bool) – Use mini-batch statistics rather than running ones.
nonlinearity (string) – Activation function. The default is ‘relu’.
output_stat (bool) – Output batch mean and variance.
fix_parameters (bool) – When set to True, the beta and gamma will not be updated.
- Returns
N-D array.
- Return type
Variable
- Parameters to be registered
The following variables are registered in a parameter scope "bn";
beta (need_grad=True) : Trainable bias \(\beta\). (shape: <see above>)
gamma (need_grad=True) : Trainable scaling factor \(\gamma\). (shape: <see above>)
mean (need_grad=False) : Moving average of the batch mean. (shape: <see above>)
var (need_grad=False) : Moving average of the batch variance. (shape: <see above>)
Note
If the name option is passed, the parameters become wrapped inside the parameter scope with the specified name, yielding the same results as the following code. This can be used to simplify the code.
with parametric_scope(name):
    output = fused_batch_normalization(<args>)
- nnabla.parametric_functions.sync_batch_normalization(inp, comm, group='world', axes=[1], decay_rate=0.9, eps=1e-05, batch_stat=True, output_stat=False, fix_parameters=False, param_init=None, no_scale=False, no_bias=False, name=None)[source]¶
Synchronized batch normalization layer.
For some tasks (e.g., semantic segmentation), the batch size may be too small and the BatchNormalization layer might not work well. The SyncBatchNormalization layer solves this problem by synchronizing the batch statistics (mean and var) between multiple processes.
\[\begin{split}\begin{array}{lcl} \mu &=& \frac{1}{M} \sum x_i\\ \sigma^2 &=& \frac{1}{M} \sum \left(x_i - \mu\right)^2\\ \hat{x}_i &=& \frac{x_i - \mu}{\sqrt{\sigma^2 + \epsilon }}\\ y_i &= & \hat{x}_i \gamma + \beta. \end{array}\end{split}\]where \(x_i, y_i\) are the inputs.
- Parameters
inp (Variable) – N-D array of input.
comm (Communicator) – The communicator.
group (string) – The name of the communicator group.
axes (tuple of int) – The mean and variance for each element in axes are calculated using elements on the remaining axes. For example, if an input is 4-dimensional and axes is [1], the batch mean is calculated as np.mean(inp.d, axis=(0, 2, 3), keepdims=True) (using a numpy expression as an example).
decay_rate (float) – Decay rate of the running mean and variance.
eps (float) – Tiny value to avoid zero division by std.
batch_stat (bool) – Use mini-batch statistics rather than running ones.
output_stat (bool) – Output batch mean and variance.
fix_parameters (bool) – When set to True, the beta and gamma will not be updated.
param_init (dict) – Parameter initializers can be set with a dict. A key of the dict must be 'beta', 'gamma', 'mean' or 'var'. A value of the dict must be an Initializer or a numpy.ndarray. E.g. {'beta': ConstantInitializer(0), 'gamma': np.ones(gamma_shape) * 2}.
- Returns
N-D array.
- Return type
Variable
References
Ioffe and Szegedy, Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift, https://arxiv.org/abs/1502.03167
Hang Zhang, Kristin Dana, Jianping Shi, Zhongyue Zhang, Xiaogang Wang, Ambrish Tyagi, Amit Agrawal, Context Encoding for Semantic Segmentation, https://arxiv.org/abs/1803.08904
Implementing Synchronized Multi-GPU Batch Normalization https://hangzhang.org/PyTorch-Encoding/notes/syncbn.html
The parameters have the same number of dimensions as the input data; their shapes along axes are the same as the input's, while the rest are 1. If an input is 4-dimensional and axes=[1], the parameter shape will be param_shape = np.mean(inp.d, axis=(0, 2, 3), keepdims=True).shape (using a numpy expression as an example).
- Parameters to be registered
The following variables are registered in a parameter scope "bn";
beta (need_grad=True) : Trainable bias \(\beta\). (shape: <see above>)
gamma (need_grad=True) : Trainable scaling factor \(\gamma\). (shape: <see above>)
mean (need_grad=False) : Moving average of the batch mean. (shape: <see above>)
var (need_grad=False) : Moving average of the batch variance. (shape: <see above>)
Note
If the name option is passed, the parameters become wrapped inside the parameter scope with the specified name, yielding the same results as the following code. This can be used to simplify the code.
with parametric_scope(name):
    output = sync_batch_normalization(<args>)
- nnabla.parametric_functions.mean_subtraction(inp, base_axis=1, update_running_mean=True, fix_parameters=False, name=None)[source]¶
Mean subtraction layer.
It subtracts the mean of the elements of the input array, and normalizes it to \(0\). Preprocessing arrays with this function has the effect of improving accuracy in various tasks such as image classification.
At training time, this function is defined as
\[\begin{split}\begin{array}{lcl} \mu &=& \frac{1}{M} \sum x_i \\ y_i &=& x_i - \mu \end{array}\end{split}\]At testing time, the mean values used are those that were computed during training by moving average.
Note
The backward performs an approximated differentiation that takes into account only the latest mini-batch.
- Parameters
inp (Variable) – N-D array of input.
base_axis (int) – Base axis of Mean Subtraction operation. Dimensions up to base_axis is treated as sample dimension.
update_running_mean (bool) – When set to True, the running mean is updated during forward execution.
fix_parameters (bool) – Dummy parameter. This argument does not affect anything.
- Returns
N-D array.
- Return type
Variable
- Parameters to be registered
The following variables are registered in a parameter scope "mean_subtraction";
mean (need_grad=False) : Moving average. (shape: inp.shape[base_axis:])
t (need_grad=False) : Minibatch counter used in the forward pass. (shape: (1,))
Note
If the name option is passed, the parameters become wrapped inside the parameter scope with the specified name, yielding the same results as the following code. This can be used to simplify the code.
with parametric_scope(name):
    output = mean_subtraction(<args>)
- nnabla.parametric_functions.layer_normalization(inp, batch_axis=0, eps=1e-05, output_stat=False, fix_parameters=False, param_init=None, no_scale=False, no_bias=False, name=None)[source]¶
Applies Layer Normalization over an input variable, which is defined as:
\[\begin{split}\begin{eqnarray} \mu^l &=& \frac{1}{H} \sum_{i=1}^{H} x_i^l \\ \sigma^l &=& \sqrt{\frac{1}{H} \sum_{i=1}^{H} \left(x_i^l - \mu^l\right)^2} \\ y &=& \frac{x - \mu^l}{\sigma^l + \epsilon} \gamma + \beta \end{eqnarray}\end{split}\]where \(x\) and \(y\) are the input and output variables, \(\mu^l\) and \(\sigma^l\) are the mean and std of each layer along the batch axis, and \(\gamma\) and \(\beta\) are trainable parameters.
Note
Unlike other normalizations, which applies scalar scale and bias for each entire channel/plane, Layer Normalization applies per-element scale and bias.
References
- Parameters
inp (Variable) – An input variable.
batch_axis (int or repeated int) – Axes along which the mean and variance are taken.
eps (float) – Tiny value to avoid zero division by std.
output_stat (bool) – If True, the calculated mean and variance are also returned.
fix_parameters (bool) – When set to True, the beta and gamma will not be updated.
param_init (dict) – Parameter initializers can be set with a dict. A key of the dict must be 'gamma' or 'beta'. A value of the dict must be an Initializer or a numpy.ndarray. E.g. {'gamma': np.ones(...) * 2, 'beta': ConstantInitializer(0)}.
- Returns
Normalized output variable.
Variable: Mean (if output_stat=True).
Variable: Std (if output_stat=True).
- Return type
- Parameters to be registered
The following variables are registered in a parameter scope "layer_normalization";
beta (need_grad=True) : Trainable bias \(\beta\). (shape: <see above>)
gamma (need_grad=True) : Trainable scaling factor \(\gamma\). (shape: <see above>)
Note
If the name option is passed, the parameters become wrapped inside the parameter scope with the specified name, yielding the same results as the following code. This can be used to simplify the code.
with parametric_scope(name):
    output = layer_normalization(<args>)
- nnabla.parametric_functions.instance_normalization(inp, channel_axis=1, batch_axis=0, eps=1e-05, output_stat=False, fix_parameters=False, param_init=None, no_scale=False, no_bias=False, name=None)[source]¶
Applies Instance Normalization over an input variable, which is defined as:
\[\begin{split}\begin{eqnarray} \mu^i &=& \frac{1}{H} \sum_{i=1}^{H} x_i^i \\ \sigma^i &=& \sqrt{\frac{1}{H} \sum_{i=1}^{H} \left(x_i^i - \mu^i\right)^2} \\ y &=& \frac{x - \mu^i}{\sigma^i + \epsilon} \gamma + \beta \end{eqnarray}\end{split}\]where \(x\) and \(y\) are the input and output variables, \(\mu^i\) and \(\sigma^i\) are the mean and std of each instance, calculated separately for each batch and channel, and \(\gamma\) and \(\beta\) are adaptive gains and biases.
If the input shape is [B, C, H, W] (= channel_axis=1, batch_axis=0), the shapes of the calculated mean and std are [B, C, 1, 1].
References
- Parameters
inp (Variable) – An input variable.
channel_axis (int or repeated int) – Channel axes.
batch_axis (int or repeated int) – Batch axes.
eps (float) – Tiny value to avoid zero division by std.
output_stat (bool) – If True, the batch statistics of the mean and variance are also returned.
fix_parameters (bool) – If True, the beta and gamma will not be updated.
param_init (dict) – Parameter initializers can be set with a dict. A key of the dict must be 'gamma' or 'beta'. A value of the dict must be an Initializer or a numpy.ndarray. E.g. {'gamma': np.ones(...) * 2, 'beta': ConstantInitializer(0)}.
- Returns
Normalized output variable.
- Parameters to be registered
The following variables are registered in a parameter scope "instance_normalization";
beta (need_grad=True) : Trainable bias \(\beta\). (shape: <see above>)
gamma (need_grad=True) : Trainable scaling factor \(\gamma\). (shape: <see above>)
Note
If the
name
option is passed, the parameters become wrapped inside the parameter scope with the specified name, yielding the same results as the following code. This can be used to simplify the code.with parametric_scope(name): output = instance_normalization(<args>)
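Example (a minimal usage sketch; the input shape and the scope name "in1" are illustrative assumptions):

import nnabla as nn
import nnabla.parametric_functions as PF

x = nn.Variable((4, 16, 8, 8))  # (B, C, H, W)
# With output_stat=True, the mean and std are also returned; their shape is (4, 16, 1, 1) here
y, mu, sigma = PF.instance_normalization(x, channel_axis=1, batch_axis=0, output_stat=True, name="in1")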
- nnabla.parametric_functions.group_normalization(inp, num_groups, channel_axis=1, batch_axis=0, eps=1e-05, output_stat=False, fix_parameters=False, param_init=None, no_scale=False, no_bias=False, name=None)[source]¶
Applies Group Normalization over an input tensor, which is defined as:
\[\begin{split}\begin{eqnarray} \mu^g &=& \frac{1}{H} \sum_{i=1}^{H} x_i^g \\ \sigma^g &=& \sqrt{\frac{1}{H} \sum_{i=1}^{H} \left(x_i^g - \mu^g\right)^2} \\ y &=& \frac{x - \mu^g}{\sigma^g + \epsilon} \gamma + \beta \end{eqnarray}\end{split}\]where \(x\) and \(y\) are input and output variables, \(\mu^g\) and \(\sigma^g\) are the mean and std of each group, which contains num_channels / num_groups channels, and \(\gamma\) and \(\beta\) are adaptive gains and biases.
The input channels, specified by channel_axis, are separated into num_groups groups, and the mean and std are calculated over each group. For example, if the input shape is [B, C, H, W] (i.e. channel_axis=1, batch_axis=0), the input variable is first reshaped to [B, num_groups, C / num_groups, H, W] and standardized by its mean and std, whose shapes are [B, num_groups, 1, 1, 1]. Before returning, the output variable is reshaped again to the original input shape (i.e. [B, C, H, W] in the case above).
References
- Parameters
inp (Variable) – An input variable.
num_groups (int) – A number of groups. The channel dim of 'x' must be an integer multiple of num_groups.
channel_axis (int) – Channel axis.
batch_axis (int or repeated int) – Axes along which the mean and variance are taken.
eps (float) – Tiny value added to the std to avoid division by zero.
output_stat (bool) – If True, the batch statistics of the mean and variance are also returned.
fix_parameters (bool) – When set to True, beta and gamma will not be updated.
param_init (dict) – Parameter initializers can be set with a dict. A key of the dict must be 'gamma' or 'beta'. A value of the dict must be an Initializer or a numpy.ndarray. E.g. {'gamma': np.ones(...) * 2, 'beta': ConstantInitializer(0)}.
- Returns
Normalized output variable.
Variable: Mean (if output_stat=True).
Variable: Std (if output_stat=True).
- Return type
Variable
- Parameters to be registered
The following variables are registered in a parameter scope "group_normalization":
beta (need_grad=True) : Trainable bias \(\beta\). (shape: <see above>)
gamma (need_grad=True) : Trainable scaling factor \(\gamma\). (shape: <see above>)
Note
If the name option is passed, the parameters become wrapped inside the parameter scope with the specified name, yielding the same results as the following code. This can be used to simplify the code.

with parametric_scope(name):
    output = group_normalization(<args>)
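Example (a minimal usage sketch; the input shape, group count, and scope name are illustrative assumptions):

import nnabla as nn
import nnabla.parametric_functions as PF

x = nn.Variable((4, 16, 8, 8))  # (B, C, H, W)
y = PF.group_normalization(x, num_groups=4, channel_axis=1, name="gn1")  # 16 channels split into 4 groups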
- nnabla.parametric_functions.rnn(x, h, w0_init=None, w_init=None, b_init=None, num_layers=1, nonlinearity='tanh', dropout=0.0, bidirectional=False, training=True, rng=None, with_bias=True, fix_parameters=False, name=None)[source]¶
N-Step RNN (recurrent neural networks).
N-Step RNN function implements Elman RNN with a nonlinearity applied to the input sequence. N-Step RNN function is defined as follows:
\[h_t = \tanh(w_{ih}x_t+b_{ih}+w_{hh}h_{(t-1)}).\]We use the following notations to describe the inputs and outputs below. \(T\): sequence length, \(B\): batch size, \(I\): input size, \(L\): number of layers, \(D\): number of directions (either 1 or 2), \(H\): hidden size.
References
Jeffrey L. Elman. “Finding Structure in Time.” Cognitive Science. 1990.
- Parameters
x (Variable) – Input N-D array with shape \((T, B, I)\).
h (Variable) – Input N-D array with shape \((L, D, B, H)\).
w0_init (nnabla.initializer.BaseInitializer or numpy.ndarray, optional) – Initializer for the weight at the first layer. Shape is \((D, H, I + H)\).
w_init (nnabla.initializer.BaseInitializer or numpy.ndarray, optional) – Initializer for the weights at the second layer and up. Shape is \((L-1, D, H, D*H + H)\).
b_init (nnabla.initializer.BaseInitializer or numpy.ndarray, optional) – Initializer for the bias. Shape is \((L, D, H)\).
num_layers (int, optional) – Number of layers in the network. If set to 1, only the weights for the first layer will be invoked. Default is 1.
nonlinearity (str, optional) – Type of nonlinearity applied to the input sequence. Must be either tanh or relu. Default is tanh.
dropout (float, optional) – Dropout ratio applied to parameters. Default is 0.0.
bidirectional (bool, optional) – If True, bidirectional computation will be performed in each layer. Default is False.
training (bool, optional) – Backpropagation will be performed only when it is True. Default is True.
with_bias (bool, optional) – Specify whether to include the bias term.
- Returns
Output \(y\) with shape \((T, B, D * H)\).
~nnabla.Variable: Output \(h_n\) with shape \((L, D, B, H)\).
- Return type
Variable
Example

x = nn.Variable((seq_len, batch_size, input_size))
h = nn.Variable((num_layers, num_directions, batch_size, hidden_size))
y, hn = PF.rnn(x, h)
- Parameters to be registered
The following variables are registered in a parameter scope "rnn":
weight_l0 (need_grad=True) : Filter weights at the 0-th layer. (shape: (D, H, I + H))
weight (need_grad=True) : Filter weights at the 1st layer and above. (shape: (L-1, D, H, D*H + H))
bias (need_grad=True) : Biases. (shape: (L, D, H))
Note
If the name option is passed, the parameters become wrapped inside the parameter scope with the specified name, yielding the same results as the following code. This can be used to simplify the code.

with parametric_scope(name):
    output = rnn(<args>)
- nnabla.parametric_functions.lstm(x, h, c, w0_init=None, w_init=None, b_init=None, num_layers=1, dropout=0.0, bidirectional=False, training=True, rng=None, with_bias=True, fix_parameters=False, name=None)[source]¶
LSTM (long short-term memory).
Long Short-Term Memory, or LSTM, is a building block for recurrent neural network (RNN) layers. An LSTM unit consists of a cell and input, output, and forget gates, whose functions are defined as follows:
\[\begin{split}f_t&&=\sigma(W_fx_t+U_fh_{t-1}+b_f) \\ i_t&&=\sigma(W_ix_t+U_ih_{t-1}+b_i) \\ o_t&&=\sigma(W_ox_t+U_oh_{t-1}+b_o) \\ c_t&&=f_t\odot c_{t-1}+i_t\odot\tanh(W_cx_t+U_ch_{t-1}+b_c) \\ h_t&&=o_t\odot\tanh(c_t).\end{split}\]We use the following notations to describe the inputs and outputs below. \(T\): sequence length, \(B\): batch size, \(I\): input size, \(L\): number of layers, \(D\): number of directions (either 1 or 2), \(H\): hidden size.
References
S. Hochreiter, and J. Schmidhuber. “Long Short-Term Memory.” Neural Computation. 1997.
- Parameters
x (Variable) – Input N-D array with shape \((T, B, I)\).
h (Variable) – Input N-D array with shape \((L, D, B, H)\).
c (Variable) – Input N-D array with shape \((L, D, B, H)\).
w0_init (nnabla.initializer.BaseInitializer or numpy.ndarray, optional) – Initializer for the weight at the first layer. Shape is \((D, 4, H, I + H)\).
w_init (nnabla.initializer.BaseInitializer or numpy.ndarray, optional) – Initializer for the weights at the second layer and up. Shape is \((L-1, D, 4, H, D * H + H)\).
b_init (nnabla.initializer.BaseInitializer or numpy.ndarray, optional) – Initializer for the bias. Shape is \((L, D, 4, H)\).
num_layers (int, optional) – Number of layers in the network. If set to 1, only the weights for the first layer will be invoked. Default is 1.
dropout (float, optional) – Dropout ratio applied to parameters. Default is 0.0.
bidirectional (bool, optional) – If True, bidirectional computation will be performed in each layer. Default is False.
training (bool, optional) – Backpropagation will be performed only when it is True. Default is True.
with_bias (bool, optional) – Specify whether to include the bias term.
fix_parameters (bool) – When set to True, the weights and biases will not be updated.
- Returns
Output \(y\) with shape \((T, B, D * H)\).
~nnabla.Variable: Output \(h_n\) with shape \((L, D, B, H)\).
~nnabla.Variable: Output \(c_n\) with shape \((L, D, B, H)\).
- Return type
Variable
Example

x = nn.Variable((seq_len, batch_size, input_size))
h = nn.Variable((num_layers, num_directions, batch_size, hidden_size))
c = nn.Variable((num_layers, num_directions, batch_size, hidden_size))
y, hn, cn = PF.lstm(x, h, c)
- Parameters to be registered
The following variables are registered in a parameter scope "lstm":
weight_l0 (need_grad=True) : Filter weights at the 0-th layer. (shape: (D, 4, H, I + H))
weight (need_grad=True) : Filter weights at the 1st layer and above. (shape: (L-1, D, 4, H, D*H + H))
bias (need_grad=True) : Biases. (shape: (L, D, 4, H))
Note
If the name option is passed, the parameters become wrapped inside the parameter scope with the specified name, yielding the same results as the following code. This can be used to simplify the code.

with parametric_scope(name):
    output = lstm(<args>)
- nnabla.parametric_functions.gru(x, h, w0_init=None, w_init=None, b_init=None, num_layers=1, dropout=0.0, bidirectional=False, training=True, rng=None, with_bias=True, fix_parameters=False, name=None)[source]¶
GRU (gated recurrent units).
GRU is defined as follows:
\[\begin{split}r_t&&=\sigma(W_rx_t+U_rh_{t-1}+b_r) \\ z_t&&=\sigma(W_zx_t+U_zh_{t-1}+b_z) \\ n_t&&=\tanh(W_nx_t+b_{in}+r_t \odot (U_nh_{t-1}+b_{hn})) \\ h_t&&=(1-z_t) \odot n_t+z_t \odot h_{t-1}.\end{split}\]We use the following notations to describe the inputs and outputs below. \(T\): sequence length, \(B\): batch size, \(I\): input size, \(L\): number of layers, \(D\): number of directions (either 1 or 2), \(H\): hidden size.
References
K. Cho et al. “Learning Phrase Representations using RNN Encoder–Decoder for Statistical Machine Translation.” Empirical Methods in Natural Language Processing. 2014.
- Parameters
x (Variable) – Input N-D array with shape \((T, B, I)\).
h (Variable) – Input N-D array with shape \((L, D, B, H)\).
w0_init (nnabla.initializer.BaseInitializer or numpy.ndarray, optional) – Initializer for the weight at the first layer. Shape is \((D, 3, H, I + H)\).
w_init (nnabla.initializer.BaseInitializer or numpy.ndarray, optional) – Initializer for the weights at the second layer and up. Shape is \((L-1, D, 3, H, D * H + H)\).
b_init (nnabla.initializer.BaseInitializer or numpy.ndarray, optional) – Initializer for the bias. Shape is \((L, D, 4, H)\).
num_layers (int, optional) – Number of layers in the network. If set to 1, only the weights for the first layer will be invoked. Default is 1.
dropout (float, optional) – Dropout ratio applied to parameters. Default is 0.0.
bidirectional (bool, optional) – If True, bidirectional computation will be performed in each layer. Default is False.
training (bool, optional) – Backpropagation will be performed only when it is True. Default is True.
with_bias (bool, optional) – Specify whether to include the bias term.
- Returns
Output \(y\) with shape \((T, B, D * H)\).
~nnabla.Variable: Output \(h_n\) with shape \((L, D, B, H)\).
- Return type
Variable
Example

x = nn.Variable((seq_len, batch_size, input_size))
h = nn.Variable((num_layers, num_directions, batch_size, hidden_size))
y, hn = PF.gru(x, h)
- Parameters to be registered
The following variables are registered in a parameter scope "gru":
weight_l0 (need_grad=True) : Filter weights at the 0-th layer. (shape: (D, 3, H, I + H))
weight (need_grad=True) : Filter weights at the 1st layer and above. (shape: (L-1, D, 3, H, D*H + H))
bias (need_grad=True) : Biases. (shape: (L, D, 4, H))
Note
If the name option is passed, the parameters become wrapped inside the parameter scope with the specified name, yielding the same results as the following code. This can be used to simplify the code.

with parametric_scope(name):
    output = gru(<args>)
- nnabla.parametric_functions.embed(inp, n_inputs, n_features, initializer=None, fix_parameters=False, apply_w=None, name=None)[source]¶
Embed.
Embed slices a matrix/tensor with an indexing array/tensor. Weights are initialized with nnabla.initializer.UniformInitializer within the range of \(-\sqrt{3}\) and \(\sqrt{3}\).
- Parameters
inp (Variable) – [Integer] Indices with shape \((I_0, ..., I_N)\)
n_inputs – Number of possible inputs, words or vocabularies.
n_features – Number of embedding features.
fix_parameters (bool) – When set to True, the embedding weight matrix will not be updated.
apply_w (function) – Lambda, function, or callable object applied to the weights.
- Returns
Output with shape \((I_0, ..., I_N, W_1, ..., W_M)\)
- Return type
Variable
- Parameters to be registered
The following variables are registered in a parameter scope "embed":
W (need_grad=True) : Embedding matrix. (shape: (n_inputs, n_features))
Note
If the name option is passed, the parameters become wrapped inside the parameter scope with the specified name, yielding the same results as the following code. This can be used to simplify the code.

with parametric_scope(name):
    output = embed(<args>)
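Example (a minimal usage sketch; the vocabulary size, feature size, and scope name are illustrative assumptions):

import numpy as np
import nnabla as nn
import nnabla.parametric_functions as PF

x = nn.Variable((8,))                          # integer word indices
x.d = np.random.randint(0, 100, size=x.shape)
y = PF.embed(x, n_inputs=100, n_features=16, name="emb1")
y.forward()                                    # y has shape (8, 16)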
- nnabla.parametric_functions.prelu(inp, base_axis=1, shared=True, fix_parameters=False, slope_init=None, name=None)[source]¶
Parametrized Rectified Linear Unit function defined as
\[y_i = \max(0, x_i) + w_i \min(0, x_i)\]where the negative slope \(w\) is learned and can vary across channels (an axis specified with base_axis). By default, the weights are initialized with \(0.25\).
- Parameters
inp (Variable) – N-D array as input.
base_axis (int) – Dimensions up to base_axis are treated as the sample dimensions.
shared (bool) – Use a shared weight value or not.
fix_parameters (bool) – When set to True, the negative slope values will not be updated.
slope_init (nnabla.initializer.BaseInitializer or numpy.ndarray) – Initializer for the negative slopes. By default, they are initialized with 0.25.
- Returns
N-D array.
- Return type
Variable
- Parameters to be registered
The following variables are registered in a parameter scope "prelu":
slope (need_grad=True) : Negative slope. (shape: tuple() if shared else (inp.shape[base_axis],))
Note
If the name option is passed, the parameters become wrapped inside the parameter scope with the specified name, yielding the same results as the following code. This can be used to simplify the code.

with parametric_scope(name):
    output = prelu(<args>)
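Example (a minimal usage sketch; the input shape and scope name are illustrative assumptions):

import nnabla as nn
import nnabla.parametric_functions as PF

x = nn.Variable((4, 16, 8, 8))
y = PF.prelu(x, base_axis=1, shared=False, name="act1")  # one learnable slope per channel when shared=False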
- nnabla.parametric_functions.svd_affine(inp, n_outmaps, r, base_axis=1, uv_init=None, b_init=None, fix_parameters=False, rng=None, with_bias=True, name=None)[source]¶
SVD affine is a low rank approximation of the affine layer. It can be seen as two consecutive affine layers with a bottleneck. It computes:
\[{\mathbf y} = {\mathbf U} {\mathbf V} {\mathbf x} + {\mathbf b}.\]where \({\mathbf x}, {\mathbf y}\) are the inputs and outputs respectively, and \({\mathbf U}, {\mathbf V}, {\mathbf b}\) are constants.
The weights \({\mathbf U}\) and \({\mathbf V}\) are approximated with singular value decomposition (SVD) of the original weight matrix \({\mathbf W}\) and by selecting the \({R}\) dominant singular values and the corresponding singular vectors. Therefore the low rank \({R}\) is the size of the bottleneck.
If uv_init is a numpy array, \({\mathbf U}\) and \({\mathbf V}\) are computed such that uv_init is approximated by \({\mathbf{UV}}\). If uv_init is None or an initializer, the product of \({\mathbf U}\) and \({\mathbf V}\) approximates the random initialization.
If \({\mathbf U}\) and \({\mathbf V}\) exist in the context, they take precedence over uv_init.
Suppose the weight of the affine is of shape \({I \times O}\) and the compression rate you want to specify is \({CR}\); then you set \({R}\) as
\[R = \left\lfloor \frac{(1 - CR)OI}{O + I} \right\rfloor.\]
- Parameters
inp (Variable) – Input N-D array with shape (\(M_0 \times \ldots \times M_{B-1} \times D_B \times \ldots \times D_N\)). Dimensions before and after base_axis are flattened as if it were a matrix.
n_outmaps (int or tuple) – Number of output neurons per data.
r (int) – Rank of the factorized layer (size of the bottleneck).
base_axis (int) – Dimensions up to base_axis are treated as the sample dimensions.
uv_init (nnabla.initializer.BaseInitializer or numpy.ndarray) – Initializer for weight. By default, it is initialized with nnabla.initializer.UniformInitializer within the range determined by nnabla.initializer.calc_uniform_lim_glorot.
b_init (nnabla.initializer.BaseInitializer or numpy.ndarray) – Initializer for bias. By default, it is initialized with zeros if with_bias is True.
fix_parameters (bool) – When set to True, the weights and biases will not be updated.
rng (numpy.random.RandomState) – Random generator for Initializer.
with_bias (bool) – Specify whether to include the bias term.
- Returns
\((B + 1)\)-D array. (\(M_0 \times \ldots \times M_{B-1} \times L\))
- Return type
Variable
- Parameters to be registered
The following variables are registered in a parameter scope "svd_affine":
U (need_grad=True) : \({\mathbf U}\). (shape: (inmaps, r))
V (need_grad=True) : \({\mathbf V}\). (shape: (r, outmaps))
b (need_grad=True) : Bias vector. (shape: (outmaps,))
Note
If the name option is passed, the parameters become wrapped inside the parameter scope with the specified name, yielding the same results as the following code. This can be used to simplify the code.

with parametric_scope(name):
    output = svd_affine(<args>)
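Example (a minimal sketch deriving the rank r from a target compression rate CR with the formula above; all sizes are illustrative assumptions):

import nnabla as nn
import nnabla.parametric_functions as PF

I, O, CR = 256, 128, 0.75
r = int((1 - CR) * O * I / (O + I))  # floor of the formula above
x = nn.Variable((8, I))
y = PF.svd_affine(x, O, r, name="svd1")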
- nnabla.parametric_functions.svd_convolution(inp, outmaps, kernel, r, pad=None, stride=None, dilation=None, uv_init=None, b_init=None, base_axis=1, fix_parameters=False, rng=None, with_bias=True, name=None)[source]¶
SVD convolution is a low rank approximation of the convolution layer. It can be seen as a depthwise convolution followed by a 1x1 convolution.
The flattened kernels for the i-th input map are expressed by their low rank approximation. The kernels for the i-th input \({\mathbf W_i}\) are approximated with the singular value decomposition (SVD) and by selecting the \({R}\) dominant singular values and the corresponding singular vectors.
\[{\mathbf W_{:,i,:}} \approx {\mathbf U_i} {\mathbf V_i}.\]\({\mathbf U}\) contains the weights of the depthwise convolution with multiplier \({R}\) and \({\mathbf V}\) contains the weights of the 1x1 convolution.
If uv_init is a numpy array, \({\mathbf U}\) and \({\mathbf V}\) are computed such that uv_init is approximated by \({\mathbf{UV}}\). If uv_init is None or an initializer, the product of \({\mathbf U}\) and \({\mathbf V}\) approximates the random initialization.
If \({\mathbf U}\) and \({\mathbf V}\) exist in the context, they take precedence over uv_init.
Suppose the kernel tensor of the convolution is of shape \({O \times I \times K \times K}\) and the compression rate you want to specify is \({CR}\); then you set \({R}\) as
\[R = \left\lfloor \frac{(1 - CR)OIK^2}{I(O + K^2)} \right\rfloor.\]
- Parameters
inp (Variable) – N-D array.
outmaps (int) – Number of convolution kernels (which is equal to the number of output channels). For example, to apply convolution on an input with 16 types of filters, specify 16.
kernel (tuple) – Convolution kernel size. For example, to apply convolution on an image with a 3 (height) by 5 (width) two-dimensional kernel, specify (3, 5).
r (int) – Rank of the factorized layer.
uv_init (nnabla.initializer.BaseInitializer or numpy.ndarray) – Initializer for weight. By default, it is initialized with nnabla.initializer.UniformInitializer within the range determined by nnabla.initializer.calc_uniform_lim_glorot.
b_init (nnabla.initializer.BaseInitializer or numpy.ndarray) – Initializer for bias. By default, it is initialized with zeros if with_bias is True.
base_axis (int) – Dimensions up to base_axis are treated as the sample dimensions.
fix_parameters (bool) – When set to True, the weights and biases will not be updated.
rng (numpy.random.RandomState) – Random generator for Initializer.
with_bias (bool) – Specify whether to include the bias term.
- Returns
\((B + 1)\)-D array. (\(M_0 \times \ldots \times M_{B-1} \times L\))
- Return type
Variable
- Parameters to be registered
The following variables are registered in a parameter scope "svd_conv":
U (need_grad=True) : Decomposed filter weights \({\mathbf U}\). (shape: (inmaps * r, *kernel))
V (need_grad=True) : Decomposed filter weights \({\mathbf V}\). (shape: (outmaps, inmaps * r, 1, ...))
b (need_grad=True) : Bias vector. (shape: (outmaps,))
Note
If the name option is passed, the parameters become wrapped inside the parameter scope with the specified name, yielding the same results as the following code. This can be used to simplify the code.

with parametric_scope(name):
    output = svd_convolution(<args>)
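Example (a minimal sketch deriving the rank r from a target compression rate CR with the formula above; all sizes are illustrative assumptions):

import nnabla as nn
import nnabla.parametric_functions as PF

O, I, K, CR = 64, 32, 3, 0.5
r = int((1 - CR) * O * I * K * K / (I * (O + K * K)))  # floor of the formula above
x = nn.Variable((4, I, 16, 16))
y = PF.svd_convolution(x, O, (K, K), r, pad=(1, 1), name="svdconv1")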
- nnabla.parametric_functions.cpd3_convolution(inp, outmaps, kernel, r, pad=None, stride=None, dilation=None, oik_init=None, b_init=None, base_axis=1, fix_parameters=False, rng=None, with_bias=True, max_iter=500, stopping_criterion=1e-05, lambda_reg=0.0, name=None)[source]¶
CP convolution is a low rank approximation of a convolution layer. A 3D tensor containing the parameters is built by collapsing the N-D kernels into 1D; this tensor is then decomposed into three matrices. The decomposed layer can be seen as linear combinations of the input feature maps into \({R}\) feature maps, followed by a depthwise convolution, followed by linear combinations of the feature maps to compute the output feature maps.
The CP decomposition allows to approximate the kernel tensor by \({R}\) rank-1 tensors of the form:
\[\sum_{r=1}^{R} \lambda_r {\mathbf{o}^{(r)} \otimes \mathbf{i}^{(r)} \otimes \mathbf{k}^{(r)}},\]where \({\lambda}_r\) is the normalization coefficient and \({\otimes}\) is the outer product.
If oik_init is a numpy array, \({\mathbf O}\), \({\mathbf I}\) and \({\mathbf K}\) are computed such that oik_init is approximated by their product. If oik_init is None or an initializer, the product approximates the random initialization.
If O, I and K exist in the context, they are used to initialize the layer and oik_init is not used.
Suppose the kernel tensor of the convolution is of shape \({O \times I \times K \times K}\) and the compression rate you want to specify is \({CR}\); then you set \({R}\) as
\[R = \left\lfloor \frac{(1 - CR)OIK^2}{O + I + K^2} \right\rfloor.\]
References
Lebedev, Vadim, Yaroslav Ganin, Maksim Rakhuba, Ivan Oseledets, and Victor Lempitsky, “Speeding-up convolutional neural networks using fine-tuned cp-decomposition.”, arXiv preprint arXiv:1412.6553 (2014).
Marcella Astrid, Seung-Ik Lee, “CP-decomposition with Tensor Power Method for Convolutional Neural Networks Compression”, BigComp 2017.
- Parameters
inp (Variable) – N-D array.
outmaps (int) – Number of convolution kernels (which is equal to the number of output channels). For example, to apply convolution on an input with 16 types of filters, specify 16.
kernel (tuple of int) – Convolution kernel size. For example, to apply convolution on an image with a 3 (height) by 5 (width) two-dimensional kernel, specify (3, 5).
r (int) – Rank of the factorized layer.
oik_init (numpy array or nnabla.initializer.BaseInitializer) – Initializer for weight. By default, it is initialized with nnabla.initializer.UniformInitializer within the range determined by nnabla.initializer.calc_uniform_lim_glorot.
b_init (nnabla.initializer.BaseInitializer or numpy.ndarray) – Initializer for bias. It is initialized with zeros if with_bias is True.
base_axis (int) – Dimensions up to base_axis are treated as the sample dimensions.
fix_parameters (bool) – When set to True, the weights and biases will not be updated.
rng (numpy.random.RandomState) – Random generator for Initializer.
with_bias (bool) – Specify whether to include the bias term.
max_iter (int) – Max iterations of the ALS.
stopping_criterion (float) – Threshold for stopping the ALS. If the value is negative, the convergence check is skipped, which may reduce the computation time.
lambda_reg (float) – Regularization parameter for the ALS. A larger lambda_reg means stronger regularization.
- Returns
\((B + 1)\)-D array. (\(M_0 \times \ldots \times M_{B-1} \times L\))
- Return type
Variable
- Parameters to be registered
The following variables are registered in a parameter scope "cpd3_conv":
I (need_grad=True) : Decomposed filter weights \({\mathbf I}\). (shape: (r, inmaps, 1, ...))
K (need_grad=True) : Decomposed filter weights \({\mathbf K}\). (shape: (r, *kernel))
O (need_grad=True) : Decomposed filter weights \({\mathbf O}\). (shape: (outmaps, r, 1, ...))
b (need_grad=True) : Bias vector. (shape: (outmaps,))
Note
If the name option is passed, the parameters become wrapped inside the parameter scope with the specified name, yielding the same results as the following code. This can be used to simplify the code.

with parametric_scope(name):
    output = cpd3_convolution(<args>)
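Example (a minimal sketch deriving the rank r from a target compression rate CR with the formula above; all sizes are illustrative assumptions):

import nnabla as nn
import nnabla.parametric_functions as PF

O, I, K, CR = 64, 32, 3, 0.5
r = int((1 - CR) * O * I * K * K / (O + I + K * K))  # floor of the formula above
x = nn.Variable((4, I, 16, 16))
y = PF.cpd3_convolution(x, O, (K, K), r, pad=(1, 1), name="cpd1")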
- nnabla.parametric_functions.binary_connect_affine(inp, n_outmaps, base_axis=1, quantize_zero_to=1.0, w_init=None, wb_init=None, b_init=None, fix_parameters=False, rng=None, with_bias=True, name=None)[source]¶
Binary Connect Affine, multiplier-less inner-product.
Binary Connect Affine is an affine function, except the definition of the inner product is modified. The input-output relation of this function is as follows:
\[y_j = \sum_{i} sign(w_{ji}) x_i.\]Therefore \(sign(w_{ji})\) is either \(1\) or \(-1\) and the inner product simplifies to addition.
This function should be used together with Batch Normalization.
References
M. Courbariaux, Y. Bengio, and J.-P. David. “BinaryConnect: Training Deep Neural Networks with binary weights during propagations.” Advances in Neural Information Processing Systems. 2015.
Note
1) If you would like to share weights between some layers, please make sure to share the standard floating-point weights (weight) and not the binarized weights (binary_weight).
2) The weights and the binary weights become synced only after forward() is called, and not after a call to backward(). To access the parameters of the network, remember to call forward() once before doing so; otherwise, the float weights and the binary weights will not be in sync.
3) Quantized values are stored as floating point numbers for binary_weight, since this function is only for simulation purposes.
- Parameters
inp (Variable) – Input N-D array with shape (\(M_0 \times \ldots \times M_{B-1} \times D_B \times \ldots \times D_N\)). Dimensions before and after base_axis are flattened as if it were a matrix.
n_outmaps (int or tuple of int) – Number of output neurons per data.
base_axis (int) – Dimensions up to base_axis are treated as the sample dimensions.
quantize_zero_to (float) – Input value at zero is quantized to this value.
w_init (nnabla.initializer.BaseInitializer or numpy.ndarray) – Initializer for weight. By default, it is initialized with nnabla.initializer.UniformInitializer within the range determined by nnabla.initializer.calc_uniform_lim_glorot.
wb_init (nnabla.initializer.BaseInitializer or numpy.ndarray) – Initializer for binary weight. By default, it is initialized with nnabla.initializer.UniformInitializer within the range determined by nnabla.initializer.calc_uniform_lim_glorot.
b_init (nnabla.initializer.BaseInitializer or numpy.ndarray) – Initializer for bias. By default, it is initialized with zeros if with_bias is True.
fix_parameters (bool) – When set to True, the weights and biases will not be updated.
rng (numpy.random.RandomState) – Random generator for Initializer.
- Returns
\((B + 1)\)-D array. (\(M_0 \times \ldots \times M_{B-1} \times L\))
- Return type
Variable
- Parameters to be registered
The following variables are registered in a parameter scope "bicon_affine":
W (need_grad=True) : Weight matrix in floating type. (shape: (inmaps, outmaps))
Wb (need_grad=False) : Binarized weights. (shape: (inmaps, outmaps))
b (need_grad=True) : Bias vector. (shape: (outmaps,))
Note
If the name option is passed, the parameters become wrapped inside the parameter scope with the specified name, yielding the same results as the following code. This can be used to simplify the code.

with parametric_scope(name):
    output = binary_connect_affine(<args>)
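Example (a minimal sketch illustrating the forward() synchronization described in the Note; shapes and the scope name are illustrative assumptions):

import numpy as np
import nnabla as nn
import nnabla.parametric_functions as PF

x = nn.Variable((8, 32))
y = PF.binary_connect_affine(x, 10, name="bc1")
x.d = np.random.randn(*x.shape)
y.forward()                                  # W and Wb are in sync only after forward()
params = nn.get_parameters(grad_only=False)  # safe to inspect W / Wb now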
- nnabla.parametric_functions.binary_connect_convolution(inp, outmaps, kernel, pad=None, stride=None, dilation=None, group=1, quantize_zero_to=1.0, w_init=None, wb_init=None, b_init=None, base_axis=1, fix_parameters=False, rng=None, with_bias=True, name=None)[source]¶
Binary Connect Convolution, multiplier-less inner-product.
Binary Connect Convolution is the convolution function, except the definition of the inner product is modified. The input-output relation of this function is as follows:
\[y_{n, a, b} = \sum_{m} \sum_{i} \sum_{j} sign(w_{n, m, i, j}) x_{m, a + i, b + j}.\]Therefore \(sign(w_{n, m, i, j})\) is either \(1\) or \(-1\) and the inner product simplifies to addition.
This function should be used together with BatchNormalization.
References
M. Courbariaux, Y. Bengio, and J.-P. David. “BinaryConnect: Training Deep Neural Networks with binary weights during propagations.” Advances in Neural Information Processing Systems. 2015.
Note
1) If you would like to share weights between some layers, please make sure to share the standard floating-point weights (weight) and not the binarized weights (binary_weight).
2) The weights and the binary weights become synced only after forward() is called, and not after a call to backward(). To access the parameters of the network, remember to call forward() once before doing so; otherwise, the float weights and the binary weights will not be in sync.
3) Quantized values are stored as floating point numbers for binary_weight, since this function is only for simulation purposes.
- Parameters
inp (Variable) – N-D array.
outmaps (int) – Number of convolution kernels (which is equal to the number of output channels). For example, to apply convolution on an input with 16 types of filters, specify 16.
kernel (tuple of int) – Convolution kernel size. For example, to apply convolution on an image with a 3 (height) by 5 (width) two-dimensional kernel, specify (3, 5).
group (int) – Number of groups of channels. This makes connections across channels sparser by grouping connections along the map direction.
quantize_zero_to (float) – Input value at zero is quantized to this value.
w_init (nnabla.initializer.BaseInitializer or numpy.ndarray) – Initializer for weight. By default, it is initialized with nnabla.initializer.UniformInitializer within the range determined by nnabla.initializer.calc_uniform_lim_glorot.
wb_init (nnabla.initializer.BaseInitializer or numpy.ndarray) – Initializer for binary weight. By default, it is initialized with nnabla.initializer.UniformInitializer within the range determined by nnabla.initializer.calc_uniform_lim_glorot.
b_init (nnabla.initializer.BaseInitializer or numpy.ndarray) – Initializer for bias. By default, it is initialized with zeros if with_bias is True.
base_axis (int) – Dimensions up to base_axis are treated as the sample dimensions.
fix_parameters (bool) – When set to True, the weights and biases will not be updated.
rng (numpy.random.RandomState) – Random generator for Initializer.
with_bias (bool) – Specify whether to include the bias term.
- Returns
N-D array.
- Return type
Variable
- Parameters to be registered
The following variables are registered in a parameter scope "bicon_conv":
W (need_grad=True) : Filter weights in float. (shape: (outmaps, inmaps, *kernel))
Wb (need_grad=False) : Binarized filter weights. (shape: (outmaps, inmaps, *kernel))
b (need_grad=True) : Bias vector. (shape: (outmaps,))
Note
If the name option is passed, the parameters become wrapped inside the parameter scope with the specified name, yielding the same results as the following code. This can be used to simplify the code.

with parametric_scope(name):
    output = binary_connect_convolution(<args>)
- nnabla.parametric_functions.binary_weight_affine(inp, n_outmaps, base_axis=1, quantize_zero_to=1.0, w_init=None, wb_init=None, b_init=None, fix_parameters=False, rng=None, with_bias=True, name=None)[source]¶
Binary Weight Affine, multiplier-less inner-product with a scale factor.
Binary Weight Affine is the affine function, but the inner product in this function is the following,
\[y_j = \frac{1}{\|\mathbf{w}_j\|_{\ell_1}} \sum_{i} sign(w_{ji}) x_i\]Therefore \(sign(w_{ji})\) is either \(1\) or \(-1\) and the inner product simplifies to addition followed by the scaling factor \(\alpha = \frac{1}{\|\mathbf{w}_j\|_{\ell_1}}\). The number of \(\alpha\) values equals the outmaps of the affine function.
References
Rastegari, Mohammad, et al. “XNOR-Net: ImageNet Classification Using Binary Convolutional Neural Networks.” arXiv preprint arXiv:1603.05279 (2016).
Note
1) If you would like to share weights between some layers, please make sure to share the standard floating-point weights (weight) and not the binarized weights (binary_weight).
2) The weights and the binary weights become synced only after forward() is called, and not after a call to backward(). To access the parameters of the network, remember to call forward() once before doing so; otherwise, the float weights and the binary weights will not be in sync.
3) Quantized values are stored as floating point numbers for binary_weight, since this function is only for simulation purposes.
- Parameters
inp (Variable) – Input N-D array with shape (\(M_0 \times \ldots \times M_{B-1} \times D_B \times \ldots \times D_N\)). Dimensions before and after base_axis are flattened as if it were a matrix.
n_outmaps (int or tuple of int) – Number of output neurons per data.
base_axis (int) – Dimensions up to base_axis are treated as the sample dimensions.
quantize_zero_to (float) – Input value at zero is quantized to this value.
w_init (nnabla.initializer.BaseInitializer or numpy.ndarray) – Initializer for the weight. By default, it is initialized with nnabla.initializer.UniformInitializer within the range determined by nnabla.initializer.calc_uniform_lim_glorot.
wb_init (nnabla.initializer.BaseInitializer or numpy.ndarray) – Initializer for the binary weight. By default, it is initialized with nnabla.initializer.UniformInitializer within the range determined by nnabla.initializer.calc_uniform_lim_glorot.
b_init (nnabla.initializer.BaseInitializer or numpy.ndarray) – Initializer for the bias. By default, it is initialized with zeros if with_bias is True.
fix_parameters (bool) – When set to True, the weight and bias will not be updated.
rng (numpy.random.RandomState) – Random generator for Initializer.
with_bias (bool) – Specify whether to include the bias term.
- Returns
\((B + 1)\)-D array. (\(M_0 \times \ldots \times M_{B-1} \times L\))
- Return type
Variable
- Parameters to be registered
The following variables are registered in a parameter scope "bwn_affine":
W (need_grad=True) : Weight matrix in floating type. (shape: (inmaps, outmaps))
Wb (need_grad=False) : Binarized weights. (shape: (inmaps, outmaps))
alpha (need_grad=False) : Scaling factor \(\alpha\). (shape: (outmaps,))
b (need_grad=True) : Bias vector. (shape: (outmaps,))
Note
If the name option is passed, the parameters become wrapped inside the parameter scope with the specified name, yielding the same results as the following code. This can be used to simplify the code.

with parametric_scope(name):
    output = binary_weight_affine(<args>)
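Example (a minimal sketch; shapes and the scope name are illustrative assumptions, and the parameter path below follows from the scope names described above):

import numpy as np
import nnabla as nn
import nnabla.parametric_functions as PF

x = nn.Variable((8, 32))
y = PF.binary_weight_affine(x, 10, name="bw1")
x.d = np.random.randn(*x.shape)
y.forward()  # sync the float and binarized weights first
alpha = nn.get_parameters(grad_only=False)["bw1/bwn_affine/alpha"]  # one scaling factor per output neuron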
- nnabla.parametric_functions.binary_weight_convolution(inp, outmaps, kernel, pad=None, stride=None, dilation=None, group=1, quantize_zero_to=1.0, w_init=None, wb_init=None, b_init=None, base_axis=1, fix_parameters=False, rng=None, with_bias=True, name=None)[source]¶
Binary Weight Convolution, multiplier-less inner-product with a scale factor.
Binary Weight Convolution is the convolution function, but the inner product in this function is the following,
\[y_{n, a, b} = \frac{1}{\|\mathbf{w}_n\|_{\ell_1}} \sum_{m} \sum_{i} \sum_{j} sign(w_{n, m, i, j}) x_{m, a + i, b + j}.\]Therefore \(sign(w_{n, m, i, j})\) is either \(1\) or \(-1\) and the inner product simplifies to addition followed by the scaling factor \(\alpha = \frac{1}{\|\mathbf{w}_n\|_{\ell_1}}\). The number of \(\alpha\) values equals the number of outmaps of the convolution function.
References
Rastegari, Mohammad, et al. “XNOR-Net: ImageNet Classification Using Binary Convolutional Neural Networks.” arXiv preprint arXiv:1603.05279 (2016).
Note
1) If you would like to share weights between some layers, please make sure to share the standard floating-point weights (weight) and not the binarized weights (binary_weight).
2) The weights and the binary weights become synced only after forward() is called, and not after a call to backward(). To access the parameters of the network, remember to call forward() once before doing so; otherwise, the float weights and the binary weights will not be in sync.
3) Quantized values are stored as floating point numbers for binary_weight, since this function is only for simulation purposes.
- Parameters
inp (Variable) – N-D array.
outmaps (int) – Number of convolution kernels (which is equal to the number of output channels). For example, to apply convolution on an input with 16 types of filters, specify 16.
kernel (tuple of int) – Convolution kernel size. For example, to apply convolution on an image with a 3 (height) by 5 (width) two-dimensional kernel, specify (3, 5).
group (int) – Number of groups of channels. This makes connections across channels sparser by grouping connections along the map direction.
quantize_zero_to (float) – Input value at zero is quantized to this value.
w_init (nnabla.initializer.BaseInitializer or numpy.ndarray) – Initializer for weight. By default, it is initialized with nnabla.initializer.UniformInitializer within the range determined by nnabla.initializer.calc_uniform_lim_glorot.
wb_init (nnabla.initializer.BaseInitializer or numpy.ndarray) – Initializer for binary weight. By default, it is initialized with nnabla.initializer.UniformInitializer within the range determined by nnabla.initializer.calc_uniform_lim_glorot.
b_init (nnabla.initializer.BaseInitializer or numpy.ndarray) – Initializer for bias. By default, it is initialized with zeros if with_bias is True.
base_axis (int) – Dimensions up to base_axis are treated as the sample dimensions.
fix_parameters (bool) – When set to True, the weights and biases will not be updated.
rng (numpy.random.RandomState) – Random generator for Initializer.
with_bias (bool) – Specify whether to include the bias term.
- Returns
N-D array.
- Return type
Variable
- Parameters to be registered
The following variables are registered in a parameter scope "bwn_conv":
W (need_grad=True) : Filter weights in float. (shape: (outmaps, inmaps, *kernel))
Wb (need_grad=False) : Binarized filter weights. (shape: (outmaps, inmaps, *kernel))
alpha (need_grad=False) : Scaling factor \(\alpha\). (shape: (outmaps,))
b (need_grad=True) : Bias vector. (shape: (outmaps,))
Note
If the name option is passed, the parameters become wrapped inside the parameter scope with the specified name, yielding the same results as the following code. This can be used to simplify the code.

with parametric_scope(name):
    output = binary_weight_convolution(<args>)
- nnabla.parametric_functions.inq_affine(inp, n_outmaps, base_axis=1, num_bits=4, inq_iterations=(), selection_algorithm='random', seed=-1, w_init=None, i_init=None, b_init=None, fix_parameters=False, rng=None, with_bias=True, name=None)[source]¶
Incremental Network Quantization Affine Layer
During training, the weights are sequentially quantized to power-of-two values, which allows the training of a multiplier-less network.
Using inq_iterations, one can specify after how many forward passes half of the learnable weights are fixed and quantized to powers of two. After reaching the last value in inq_iterations, all weights are fixed.
For more details, please refer to the reference.
Reference: Zhou A, Yao A, Guo Y, Xu L, Chen Y. “Incremental network quantization: Towards lossless CNNs with low-precision weights.” <https://arxiv.org/abs/1702.03044>
- Parameters
inp (Variable) – Input N-D array with shape (\(M_0 \times \ldots \times M_{B-1} \times D_B \times \ldots \times D_N\)). Dimensions before and after base_axis are flattened as if it were a matrix.
n_outmaps (int or tuple of int) – Number of output neurons per data.
base_axis (int) – Dimensions up to base_axis are treated as the sample dimensions.
num_bits (int) – Number of bits per weight. The value has to be larger than 1, as one bit is already used to code the value “0”.
inq_iterations (tuple of int) – Tuple of iteration numbers at which half of the learnable weights are fixed.
selection_algorithm (str) – Chooses the algorithm used to decide which weights are fixed (“largest_abs” … fix weights with the largest absolute value, “random” … fix weights randomly).
seed (int) – Random seed for the INQ algorithm.
w_init (nnabla.initializer.BaseInitializer or numpy.ndarray) – Initializer for weight. By default, it is initialized with nnabla.initializer.UniformInitializer within the range determined by nnabla.initializer.calc_uniform_lim_glorot.
i_init (nnabla.initializer.BaseInitializer or numpy.ndarray) – Initializer for indicators (0 … learnable, 1 … fixed). By default, it is initialized with zeros.
b_init (nnabla.initializer.BaseInitializer or numpy.ndarray) – Initializer for bias. By default, it is initialized with zeros if with_bias is True.
fix_parameters (bool) – When set to True, the weight and bias will not be updated.
rng (numpy.random.RandomState) – Random generator for Initializer.
with_bias (bool) – Specify whether to include the bias term.
- Returns
\((B + 1)\)-D array. (\(M_0 \times \ldots \times M_{B-1} \times L\))
- Return type
Variable
- Parameters to be registered
The following variables are registered in a parameter scope "inq_affine":
W (need_grad=True) : Weight matrix in floating type. (shape: (inmaps, outmaps))
I (need_grad=False) : Binary indicator matrix of fixed weights. (shape: (inmaps, outmaps))
b (need_grad=True) : Bias vector. (shape: (outmaps,))
Note
If the name option is passed, the parameters become wrapped inside the parameter scope with the specified name, yielding the same results as the following code. This can be used to simplify the code.

with parametric_scope(name):
    output = inq_affine(<args>)
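Example (a minimal sketch; the shapes, iteration schedule, and scope name are illustrative assumptions):

import nnabla as nn
import nnabla.parametric_functions as PF

x = nn.Variable((8, 32))
# Fix and quantize half of the remaining learnable weights after 1000, 2000, and 3000 forward passes
y = PF.inq_affine(x, 10, num_bits=4, inq_iterations=(1000, 2000, 3000), name="inq1")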
- nnabla.parametric_functions.inq_convolution(inp, outmaps, kernel, pad=None, stride=None, dilation=None, group=1, num_bits=4, inq_iterations=(), selection_algorithm='random', seed=-1, w_init=None, i_init=None, b_init=None, base_axis=1, fix_parameters=False, rng=None, with_bias=True, name=None)[source]¶
Incremental Network Quantization Convolution Layer
During training, the weights are sequentially quantized to power-of-two values, which allows the training of a multiplier-less network.
Using inq_iterations, one can specify after how many forward passes half of the learnable weights are fixed and quantized to powers of two. After reaching the last value in inq_iterations, all weights are fixed.
For more details, please refer to the reference.
Reference: Zhou A, Yao A, Guo Y, Xu L, Chen Y. “Incremental network quantization: Towards lossless CNNs with low-precision weights.” <https://arxiv.org/abs/1702.03044>
- Parameters
inp (Variable) – N-D array.
outmaps (int) – Number of convolution kernels (which is equal to the number of output channels). For example, to apply convolution on an input with 16 types of filters, specify 16.
kernel (tuple of int) – Convolution kernel size. For example, to apply convolution on an image with a 3 (height) by 5 (width) two-dimensional kernel, specify (3, 5).
base_axis (int) – Dimensions up to base_axis are treated as the sample dimensions.
num_bits (int) – Number of bits per weight. The value has to be larger than 1, as one bit is already used to code the value “0”.
inq_iterations (tuple of int) – Tuple of iteration numbers at which half of the learnable weights are fixed.
selection_algorithm (str) – Chooses the algorithm used to decide which weights are fixed (“largest_abs” … fix weights with the largest absolute value, “random” … fix weights randomly).
seed (int) – Random seed for the INQ algorithm.
w_init (nnabla.initializer.BaseInitializer or numpy.ndarray) – Initializer for the weight. By default, it is initialized with nnabla.initializer.UniformInitializer within the range determined by nnabla.initializer.calc_uniform_lim_glorot.
i_init (nnabla.initializer.BaseInitializer or numpy.ndarray) – Initializer for the indicators (0 … learnable, 1 … fixed). By default, it is initialized with zeros.
b_init (nnabla.initializer.BaseInitializer or numpy.ndarray) – Initializer for the bias. By default, it is initialized with zeros if with_bias is True.
fix_parameters (bool) – When set to True, the weight and bias will not be updated.
rng (numpy.random.RandomState) – Random generator for Initializer.
with_bias (bool) – Specify whether to include the bias term.
- Returns
N-D array.
- Return type
Variable
- Parameters to be registered
The following variables are registered in a parameter scope "inq_conv":
W (need_grad=True) : Filter weights in float. (shape: (outmaps, inmaps, *kernel))
I (need_grad=False) : Binary indicator matrix of fixed weights. (shape: (outmaps, inmaps, *kernel))
b (need_grad=True) : Bias vector. (shape: (outmaps,))
Note
If the name option is passed, the parameters become wrapped inside the parameter scope with the specified name, yielding the same results as the following code. This can be used to simplify the code.

with parametric_scope(name):
    output = inq_convolution(<args>)
- nnabla.parametric_functions.fixed_point_quantized_affine(inp, n_outmaps, base_axis=1, w_init=None, b_init=None, fix_parameters=False, rng=None, with_bias=True, quantize_w=True, sign_w=True, n_w=8, delta_w=0.0625, ste_fine_grained_w=True, quantize_b=True, sign_b=True, n_b=8, delta_b=0.0625, ste_fine_grained_b=True, name=None)[source]¶
Fixed-Point Quantized Affine.
Fixed-Point Quantized Affine is the affine function, except the definition of the inner product is modified. The input-output relation of this function is as follows:
\[y_j = \sum_{i} Q(w_{ji}) x_i,\]where \(Q(w_{ji})\) is the fixed-point quantization function.
Note
1) If you would like to share weights between some layers, please make sure to share the standard floating-point weights (weight) and not the quantized weights (quantized weight).
2) The weights and the quantized weights become synced only after forward() is called, and not after a call to backward(). To access the parameters of the network, remember to call forward() once before doing so; otherwise, the float weights and the quantized weights will not be in sync.
3) CPU and GPU implementations currently use float values for quantized weight, since this function is only for simulation purposes.
- Parameters
inp (Variable) – Input N-D array with shape (\(M_0 \times \ldots \times M_{B-1} \times D_B \times \ldots \times D_N\)). Dimensions before and after base_axis are flattened as if it were a matrix.
n_outmaps (int or tuple of int) – Number of output neurons per data.
base_axis (int) – Dimensions up to base_axis are treated as the sample dimensions.
w_init (nnabla.initializer.BaseInitializer or numpy.ndarray) – Initializer for weight. By default, it is initialized with nnabla.initializer.UniformInitializer within the range determined by nnabla.initializer.calc_uniform_lim_glorot.
b_init (nnabla.initializer.BaseInitializer or numpy.ndarray) – Initializer for bias. By default, it is initialized with zeros if with_bias is True.
fix_parameters (bool) – When set to True, the weights and biases will not be updated.
rng (numpy.random.RandomState) – Random generator for Initializer.
with_bias (bool) – Specify whether to include the bias term.
n_w (int) – Bit width used for weight.
delta_w (float) – Step size for weight.
n_b (int) – Bit width used for bias.
delta_b (float) – Step size for bias.
- Returns
\((B + 1)\)-D array. (\(M_0 \times \ldots \times M_{B-1} \times L\))
- Return type
Variable
- Parameters to be registered
The following variables are registered in a parameter scope "fp_quantized_affine":
W (need_grad=True) : Weight matrix in float. (shape: (inmaps, outmaps))
b (need_grad=True) : Bias vector in float. (shape: (outmaps,))
W_q (need_grad=False) : Quantized weights. (shape: (inmaps, outmaps))
b_q (need_grad=False) : Quantized biases. (shape: (outmaps,))
Note
If the name option is passed, the parameters become wrapped inside the parameter scope with the specified name, yielding the same results as the following code. This can be used to simplify the code.

with parametric_scope(name):
    output = fixed_point_quantized_affine(<args>)
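Example (a minimal sketch; shapes, bit widths, and the scope name are illustrative assumptions):

import numpy as np
import nnabla as nn
import nnabla.parametric_functions as PF

x = nn.Variable((8, 32))
# 8-bit weights with step size 1/16 (the defaults shown in the signature)
y = PF.fixed_point_quantized_affine(x, 10, n_w=8, delta_w=0.0625, name="fpq1")
x.d = np.random.randn(*x.shape)
y.forward()  # W_q is synced with W only after forward()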
- nnabla.parametric_functions.fixed_point_quantized_convolution(inp, outmaps, kernel, pad=None, stride=None, dilation=None, group=1, channel_last=False, w_init=None, b_init=None, base_axis=1, fix_parameters=False, rng=None, with_bias=True, quantize_w=True, sign_w=True, n_w=8, delta_w=0.0625, ste_fine_grained_w=True, quantize_b=True, sign_b=True, n_b=8, delta_b=0.0625, ste_fine_grained_b=True, name=None)[source]¶
Fixed-Point Quantized Convolution.
Fixed-Point Quantized Convolution is the convolution function, except the definition of the inner product is modified. The input-output relation of this function is as follows:
\[y_{n, a, b} = \sum_{m} \sum_{i} \sum_{j} Q(w_{n, m, i, j}) x_{m, a + i, b + j},\]where \(Q(w_{n, m, i, j})\) is the fixed-point quantization function.
Note
1) If you would like to share weights between some layers, please make sure to share the standard floating-point weights (weight) and not the quantized weights (quantized weight).
2) The weights and the quantized weights become synced only after forward() is called, and not after a call to backward(). To access the parameters of the network, remember to call forward() once before doing so; otherwise, the float weights and the quantized weights will not be in sync.
3) CPU and GPU implementations currently use float values for quantized weight, since this function is only for simulation purposes.
- Parameters
inp (Variable) – N-D array.
outmaps (int) – Number of convolution kernels (which is equal to the number of output channels). For example, to apply convolution on an input with 16 types of filters, specify 16.
kernel (tuple of int) – Convolution kernel size. For example, to apply convolution on an image with a 3 (height) by 5 (width) two-dimensional kernel, specify (3, 5).
group (int) – Number of groups of channels. This makes connections across channels sparser by grouping connections along the map direction.
channel_last (bool) – If True, the last dimension is considered the channel dimension, a.k.a. NHWC order.
w_init (nnabla.initializer.BaseInitializer or numpy.ndarray) – Initializer for weight. By default, it is initialized with nnabla.initializer.UniformInitializer within the range determined by nnabla.initializer.calc_uniform_lim_glorot.
b_init (nnabla.initializer.BaseInitializer or numpy.ndarray) – Initializer for bias. By default, it is initialized with zeros if with_bias is True.
base_axis (int) – Dimensions up to base_axis are treated as the sample dimensions.
fix_parameters (bool) – When set to True, the weights and biases will not be updated.
rng (numpy.random.RandomState) – Random generator for Initializer.
with_bias (bool) – Specify whether to include the bias term.
n_w (int) – Bit width used for weight.
delta_w (float) – Step size for weight.
n_b (int) – Bit width used for bias.
delta_b (float) – Step size for bias.
- Returns
N-D array.
- Return type
Variable
- Parameters to be registered
The following variables are registered in a parameter scope "fp_quantized_conv":
W (need_grad=True) : Filter weights in float. (shape: (outmaps, inmaps // group, *kernel))
b (need_grad=True) : Bias vector in float. (shape: (outmaps,))
W_q (need_grad=False) : Quantized weights. (shape: (outmaps, inmaps // group, *kernel))
b_q (need_grad=False) : Quantized biases. (shape: (outmaps,))
Note
If the name option is passed, the parameters become wrapped inside the parameter scope with the specified name, yielding the same results as the following code. This can be used to simplify the code.

with parametric_scope(name):
    output = fixed_point_quantized_convolution(<args>)
- nnabla.parametric_functions.min_max_quantized_affine(inp, n_outmaps, base_axis=1, w_init=None, b_init=None, fix_parameters=False, rng=None, with_bias=True, quantize_w=True, ql_min_w=0, ql_max_w=255, w_min_max=False, qr_min_w_init=None, qr_max_w_init=None, ste_fine_grained_w=True, quantize_b=True, ql_min_b=0, ql_max_b=255, b_min_max=False, qr_min_b_init=None, qr_max_b_init=None, ste_fine_grained_b=True, eps=0.01, name=None)[source]¶
Min-max Quantized Affine.
Min-max Quantized Affine is the affine function, except the definition of the inner product is modified. The input-output relation of this function is as follows:
\[y_j = \sum_{i} Q(w_{ji}) x_i,\]where \(Q(w_{ji})\) is the min-max quantization function.
In min-max quantized affine, the exponential moving average is not used. The min and max quantization ranges are either the min-max of the weights and bias, or trained.
Notice that the min and max values of the inputs are always used instead of the exponential moving average.
Note
1) If you would like to share weights between some layers, please make sure to share the standard floating-point weights (weight) and not the quantized weights (quantized weight).
2) The weights and the quantized weights become synced only after forward() is called, and not after a call to backward(). To access the parameters of the network, remember to call forward() once before doing so; otherwise, the float weights and the quantized weights will not be in sync.
3) CPU and GPU implementations currently use float values for quantized weight, since this function is only for simulation purposes.
- Parameters
inp (Variable) – Input N-D array with shape (\(M_0 \times \ldots \times M_{B-1} \times D_B \times \ldots \times D_N\)). Dimensions before and after base_axis are flattened as if it were a matrix.
n_outmaps (int or tuple of int) – Number of output neurons per data.
base_axis (int) – Dimensions up to base_axis are treated as the sample dimensions.
w_init (nnabla.initializer.BaseInitializer or numpy.ndarray) – Initializer for weight. By default, it is initialized with nnabla.initializer.UniformInitializer within the range determined by nnabla.initializer.calc_uniform_lim_glorot.
b_init (nnabla.initializer.BaseInitializer or numpy.ndarray) – Initializer for bias. By default, it is initialized with zeros if with_bias is True.
fix_parameters (bool) – When set to True, the weights and biases will not be updated.
rng (numpy.random.RandomState) – Random generator for Initializer.
with_bias (bool) – Specify whether to include the bias term.
ql_min_w (int, float, or Variable) – Minimum quantization level for weights. Default is 0.
ql_max_w (int, float, or Variable) – Maximum quantization level for weights. Default is 255.
w_min_max (bool) – Use the min and max of the weights to compute quantization ranges. Default is False.
qr_min_w_init (nnabla.initializer.BaseInitializer or numpy.ndarray) – Initializer for the minimum quantization range, qr_min. Default is nnabla.initializer.ConstantInitializer (-2.0).
qr_max_w_init (nnabla.initializer.BaseInitializer or numpy.ndarray) – Initializer for the maximum quantization range, qr_max. Default is nnabla.initializer.ConstantInitializer (2.0).
ste_fine_grained_w (bool) – If true, the STE is not 1; the {0, 1}-mask computed from the min-max is applied to the gradient in the backward pass. Otherwise, the STE is 1.
ql_min_b (int, float, or Variable) – Minimum quantization level for bias. Default is 0.
ql_max_b (int, float, or Variable) – Maximum quantization level for bias. Default is 255.
b_min_max (bool) – Use the min and max of the bias to compute quantization ranges. Default is False.
qr_min_b_init (nnabla.initializer.BaseInitializer or numpy.ndarray) – Initializer for the minimum quantization range, qr_min. Default is nnabla.initializer.ConstantInitializer (-6.0).
qr_max_b_init (nnabla.initializer.BaseInitializer or numpy.ndarray) – Initializer for the maximum quantization range, qr_max. Default is nnabla.initializer.ConstantInitializer (6.0).
ste_fine_grained_b (bool) – If true, the STE is not 1; the {0, 1}-mask computed from the min-max is applied to the gradient in the backward pass. Otherwise, the STE is 1.
eps (float) – Epsilon; a small value that ensures \(qr_{max} - qr_{min}\) is greater than eps for both weights and bias.
- Returns
\((B + 1)\)-D array. (\(M_0 \times \ldots \times M_{B-1} \times L\))
- Return type
Variable
- Parameters to be registered
The following variables are registered in a parameter scope "min_max_quantized_affine":
W (need_grad=True) : Weight matrix in float. (shape: (inmaps, outmaps))
b (need_grad=True) : Bias vector in float. (shape: (outmaps,))
W_q (need_grad=False) : Quantized weights. (shape: (inmaps, outmaps))
b_q (need_grad=False) : Quantized biases. (shape: (outmaps,))
qr_min (need_grad=False) : Minimum quantization range; minimum values of inputs or trainable range. (shape: ql_min.shape)
qr_max (need_grad=False) : Maximum quantization range; maximum values of inputs or trainable range. (shape: ql_max.shape)
Note
If the name option is passed, the parameters become wrapped inside the parameter scope with the specified name, yielding the same results as the following code. This can be used to simplify the code.

with parametric_scope(name):
    output = min_max_quantized_affine(<args>)
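Example (a minimal sketch; shapes, quantization levels, and the scope name are illustrative assumptions):

import nnabla as nn
import nnabla.parametric_functions as PF

x = nn.Variable((8, 32))
# 256 quantization levels, with the range taken from the min and max of the weights
y = PF.min_max_quantized_affine(x, 10, ql_min_w=0, ql_max_w=255, w_min_max=True, name="mmq1")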
- nnabla.parametric_functions.min_max_quantized_convolution(inp, outmaps, kernel, pad=None, stride=None, dilation=None, group=1, channel_last=False, w_init=None, b_init=None, base_axis=1, fix_parameters=False, rng=None, with_bias=True, quantize_w=True, ql_min_w=0, ql_max_w=255, w_min_max=False, qr_min_w_init=None, qr_max_w_init=None, ste_fine_grained_w=True, quantize_b=True, ql_min_b=0, ql_max_b=255, b_min_max=False, qr_min_b_init=None, qr_max_b_init=None, ste_fine_grained_b=True, eps=0.01, name=None)[source]¶
Min-max Quantized Convolution.
Min-max Quantized Convolution is the convolution function, except the definition of the inner product is modified. The input-output relation of this function is as follows:
\[y_{n, a, b} = \sum_{m} \sum_{i} \sum_{j} Q(w_{n, m, i, j}) x_{m, a + i, b + j},\]where \(Q(w_{n, m, i, j})\) is the min-max quantization function.
In min_max_quantized convolution, the exponential moving average is not used; the min and max quantization ranges are either the min-max of weights and bias, or trained.
Notice that the min and max values of inputs are always used instead of the exponential moving average.
Note
1) If you would like to share weights between some layers, please make sure to share the standard float weights (
weight
) and not the quantized weights (quantized weight
)2) The weights and the quantized weights become synced only after
forward()
is called, and not after a call tobackward()
. To access the parameters of the network, remember to callforward()
once before doing so, otherwise the float weights and the quantized weights will not be in sync.3) CPU and GPU implementations now use float value for
quantized weight
, since this function is only for simulation purposes.- Parameters
inp (Variable) – N-D array.
outmaps (int) – Number of convolution kernels (which is equal to the number of output channels). For example, to apply convolution on an input with 16 types of filters, specify 16.
kernel (
tuple
ofint
) – Convolution kernel size. For example, to apply convolution on an image with a 3 (height) by 5 (width) two-dimensional kernel, specify (3,5).group (int) – Number of groups of channels. This makes connections across channels more sparse by grouping connections along map direction.
channel_last (bool) – If True, the last dimension is considered as channel dimension, a.k.a. NHWC order.
w_init (
nnabla.initializer.BaseInitializer
ornumpy.ndarray
) – Initializer for weight. By default, it is initialized withnnabla.initializer.UniformInitializer
within the range determined bynnabla.initializer.calc_uniform_lim_glorot
.b_init (
nnabla.initializer.BaseInitializer
ornumpy.ndarray
) – Initializer for bias. By default, it is initialized with zeros ifwith_bias
isTrue
.base_axis (int) – Dimensions up to
base_axis
are treated as the sample dimensions.fix_parameters (bool) – When set to
True
, the weights and biases will not be updated.rng (numpy.random.RandomState) – Random generator for Initializer.
with_bias (bool) – Specify whether to include the bias term.
ql_min_w (int, float, or Variable) – Minimum quantization level for weights. Default is 0.
ql_max_w (int, float, or Variable) – Maximum quantization level for weights. Default is 255.
w_min_max (bool) – Use the min and max of weights to compute quantization ranges. Default is
False
.qr_min_w_init (
nnabla.initializer.BaseInitializer
ornumpy.ndarray
) – Initializer for the minimum quantization range, qr_min. Default isnnabla.initializer.ConstantInitializer
(-2.0).qr_max_w_init (
nnabla.initializer.BaseInitializer
ornumpy.ndarray
) – Initializer for the maximum quantization range, qr_max. Default isnnabla.initializer.ConstantInitializer
(2.0).ste_fine_grained_w (bool) – If true, STE is not the identity; the {0, 1}-mask computed from the min-max range is applied to the gradient in the backward pass. Otherwise, STE is 1.
ql_min_b (int, float, or Variable) – Minimum quantization level for bias. Default is 0.
ql_max_b (int, float, or Variable) – Maximum quantization level for bias. Default is 255.
b_min_max (bool) – Use the min and max of bias to compute quantization ranges. Default is
False
.qr_min_b_init (
nnabla.initializer.BaseInitializer
ornumpy.ndarray
) – Initializer for the minimum quantization range, qr_min. Default isnnabla.initializer.ConstantInitializer
(-6.0).qr_max_b_init (
nnabla.initializer.BaseInitializer
ornumpy.ndarray
) – Initializer for the maximum quantization range, qr_max. Default isnnabla.initializer.ConstantInitializer
(6.0).ste_fine_grained_b (bool) – If true, STE is not the identity; the {0, 1}-mask computed from the min-max range is applied to the gradient in the backward pass. Otherwise, STE is 1.
eps (float) – Epsilon, a small value ensuring that \(qr_{max} - qr_{min}\) is greater than epsilon, for both weights and bias.
- Returns
N-D array.
- Return type
- Parameters to be registered
The following variables are registered in a parameter scope
"min_max_quantized_conv"
;W (
need_grad=True
) : Filter weights in float. (shape:(outmaps, inmaps // group, *kernel)
)b (
need_grad=True
) : Bias vector in float. (shape:(outmaps,)
)W_q (
need_grad=False
) : Quantized weights. (shape:(outmaps, inmaps // group, *kernel)
)b_q (
need_grad=False
) : Quantized biases. (shape:(outmaps,)
)qr_min (
need_grad=False
) : Minimum quantization range. Minimum values of inputs or trainable range. (shape:ql_min.shape
)qr_max (
need_grad=False
) : Maximum quantization range. Maximum values of inputs or trainable range. (shape:ql_max.shape
)
Note
If the
name
option is passed, the parameters become wrapped inside the parameter scope with the specified name, yielding the same results as the following code. This can be used to simplify the code.with parametric_scope(name): output = min_max_quantized_convolution(<args>)
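A minimal sketch of the convolution variant (sizes and the name "mmq_conv1" are illustrative assumptions):
import numpy as np
import nnabla as nn
import nnabla.parametric_functions as PF

x = nn.Variable.from_numpy_array(np.random.randn(4, 3, 32, 32).astype(np.float32))
h = PF.min_max_quantized_convolution(x, 16, (3, 3), pad=(1, 1), name="mmq_conv1")
h.forward()  # as noted above, W_q syncs with W on forward(), not on backward()
print(h.shape)  # (4, 16, 32, 32)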
- nnabla.parametric_functions.pow2_quantized_affine(inp, n_outmaps, base_axis=1, w_init=None, b_init=None, fix_parameters=False, rng=None, with_bias=True, quantize_w=True, sign_w=True, with_zero_w=False, n_w=8, m_w=2, ste_fine_grained_w=True, quantize_b=True, sign_b=True, with_zero_b=False, n_b=8, m_b=2, ste_fine_grained_b=True, name=None)[source]¶
Pow2 Quantized Affine.
Pow2 Quantized Affine is the affine function, except the definition of the inner product is modified. The input-output relation of this function is as follows:
\[y_j = \sum_{i} Q(w_{ji}) x_i,\]where \(Q(w_{ji})\) is the power-of-2 quantization function.
Note
1) If you would like to share weights between some layers, please make sure to share the standard float weights (
weight
) and not the quantized weights (quantized weight
)2) The weights and the quantized weights become synced only after
forward()
is called, and not after a call tobackward()
. To access the parameters of the network, remember to callforward()
once before doing so, otherwise the float weights and the quantized weights will not be in sync.3) Quantized values are stored as floating point number for
quantized weight
, since this function is only for simulation purposes.- Parameters
inp (Variable) – Input N-D array with shape (\(M_0 \times \ldots \times M_{B-1} \times D_B \times \ldots \times D_N\)). Dimensions before and after base_axis are flattened as if it is a matrix.
n_outmaps (
int
ortuple
ofint
) – Number of output neurons per data.base_axis (int) – Dimensions up to
base_axis
are treated as the sample dimensions.w_init (
nnabla.initializer.BaseInitializer
ornumpy.ndarray
) – Initializer for weight. By default, it is initialized withnnabla.initializer.UniformInitializer
within the range determined bynnabla.initializer.calc_uniform_lim_glorot
.b_init (
nnabla.initializer.BaseInitializer
ornumpy.ndarray
) – Initializer for bias. By default, it is initialized with zeros ifwith_bias
isTrue
.fix_parameters (bool) – When set to
True
, the weights and biases will not be updated.rng (numpy.random.RandomState) – Random generator for Initializer.
with_bias (bool) – Specify whether to include the bias term.
with_zero_w (bool) – Indicate using zero as a quantized value. Default is false.
n_w (int) – Bit width used for weight.
m_w (int) – \(2^m\) is upper bound and \(-2^m\) is lower bound for weights. Default is 2.
with_zero_b (bool) – Indicate using zero as a quantized value. Default is false.
n_b (int) – Bit width used for bias.
m_b (int) – \(2^m\) is upper bound and \(-2^m\) is lower bound for bias. Default is 2.
- Returns
\((B + 1)\)-D array. (\(M_0 \times \ldots \times M_{B-1} \times L\))
- Return type
- Parameters to be registered
The following variables are registered in a parameter scope
"pow2_quantized_affine"
;W (
need_grad=True
) : Weight matrix in float. (shape:(inmaps, outmaps)
)b (
need_grad=True
) : Bias vector in float. (shape:(outmaps,)
)W_q (
need_grad=False
) : Quantized weights. (shape:(inmaps, outmaps)
)b_q (
need_grad=False
) : Quantized biases. (shape:(outmaps,)
)
Note
If the
name
option is passed, the parameters become wrapped inside the parameter scope with the specified name, yielding the same results as the following code. This can be used to simplify the code.with parametric_scope(name): output = pow2_quantized_affine(<args>)
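A minimal sketch (4-bit signed power-of-two weights; n_w=4, m_w=1, and the name "p2_affine1" are illustrative assumptions):
import numpy as np
import nnabla as nn
import nnabla.parametric_functions as PF

x = nn.Variable.from_numpy_array(np.random.randn(8, 32).astype(np.float32))
y = PF.pow2_quantized_affine(x, 10, n_w=4, m_w=1, name="p2_affine1")
y.forward()  # call forward() before inspecting W_q, per the note above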
- nnabla.parametric_functions.pow2_quantized_convolution(inp, outmaps, kernel, pad=None, stride=None, dilation=None, group=1, w_init=None, b_init=None, base_axis=1, fix_parameters=False, rng=None, with_bias=True, quantize_w=True, with_zero_w=False, sign_w=True, n_w=8, m_w=2, ste_fine_grained_w=True, quantize_b=True, with_zero_b=False, sign_b=True, n_b=8, m_b=2, ste_fine_grained_b=True, name=None)[source]¶
Pow2 Quantized Convolution.
Pow2 Quantized Convolution is the convolution function, except the definition of the inner product is modified. The input-output relation of this function is as follows:
\[y_{n, a, b} = \sum_{m} \sum_{i} \sum_{j} Q(w_{n, m, i, j}) x_{m, a + i, b + j},\]where \(Q(w_{n, m, i, j})\) is the power-of-2 quantization function.
Note
1) If you would like to share weights between some layers, please make sure to share the standard float weights (
weight
) and not the quantized weights (quantized weight
)2) The weights and the quantized weights become synced only after
forward()
is called, and not after a call tobackward()
. To access the parameters of the network, remember to callforward()
once before doing so, otherwise the float weights and the quantized weights will not be in sync.3) Quantized values are stored as floating point number for
quantized weight
, since this function is only for simulation purposes.- Parameters
inp (Variable) – N-D array.
outmaps (int) – Number of convolution kernels (which is equal to the number of output channels). For example, to apply convolution on an input with 16 types of filters, specify 16.
kernel (
tuple
ofint
) – Convolution kernel size. For example, to apply convolution on an image with a 3 (height) by 5 (width) two-dimensional kernel, specify (3,5).group (int) – Number of groups of channels. This makes connections across channels more sparse by grouping connections along map direction.
w_init (
nnabla.initializer.BaseInitializer
ornumpy.ndarray
) – Initializer for weight. By default, it is initialized withnnabla.initializer.UniformInitializer
within the range determined bynnabla.initializer.calc_uniform_lim_glorot
.b_init (
nnabla.initializer.BaseInitializer
ornumpy.ndarray
) – Initializer for bias. By default, it is initialized with zeros ifwith_bias
isTrue
.base_axis (int) – Dimensions up to
base_axis
are treated as the sample dimensions.fix_parameters (bool) – When set to
True
, the weights and biases will not be updated.rng (numpy.random.RandomState) – Random generator for Initializer.
with_bias (bool) – Specify whether to include the bias term.
n_w (int) – Bit width used for weight.
m_w (int) – \(2^m\) is upper bound and \(-2^m\) is lower bound for weights. Default is 2.
n_b (int) – Bit width used for bias.
m_b (int) – \(2^m\) is upper bound and \(-2^m\) is lower bound for bias. Default is 2.
- Returns
N-D array.
- Return type
- Parameters to be registered
The following variables are registered in a parameter scope
"pow2_quantized_conv"
;W (
need_grad=True
) : Filter weights in float. (shape:(outmaps, inmaps // group, *kernel)
)b (
need_grad=True
) : Bias vector in float. (shape:(outmaps,)
)W_q (
need_grad=False
) : Quantized weights. (shape:(outmaps, inmaps // group, *kernel)
)b_q (
need_grad=False
) : Quantized biases. (shape:(outmaps,)
)
Note
If the
name
option is passed, the parameters become wrapped inside the parameter scope with the specified name, yielding the same results as the following code. This can be used to simplify the code.with parametric_scope(name): output = pow2_quantized_convolution(<args>)
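A minimal sketch of the convolution variant under the same illustrative assumptions:
import numpy as np
import nnabla as nn
import nnabla.parametric_functions as PF

x = nn.Variable.from_numpy_array(np.random.randn(4, 3, 32, 32).astype(np.float32))
h = PF.pow2_quantized_convolution(x, 16, (3, 3), pad=(1, 1), n_w=4, name="p2_conv1")
h.forward()  # W_q holds the power-of-two quantized copy of W after forward()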
- nnabla.parametric_functions.pruned_affine(inp, n_outmaps, base_axis=1, w_init=None, b_init=None, fix_parameters=False, rng=None, with_bias=True, prune_w=True, rate_w=0.9, prune_b=True, rate_b=0.9, name=None)[source]¶
Pruned Affine.
Pruned Affine is the affine function, except the definition of the inner product is modified. The input-output relation of this function is as follows:
\[y_j = \sum_{i} Q(w_{ji}) x_i,\]where \(Q(w_{ji})\) is the pruning function, i.e.,
F.prune
.Note
1) If you would like to share weights between some layers, please make sure to share the standard float weights (
weight
) and not the quantized weights (quantized weight
)2) The weights and the quantized weights become synced only after
forward()
is called, and not after a call tobackward()
. To access the parameters of the network, remember to callforward()
once before doing so, otherwise the float weights and the quantized weights will not be in sync.3) CPU and GPU implementations now use float value for
quantized weight
, since this function is only for simulation purposes.- Parameters
inp (Variable) – Input N-D array with shape (\(M_0 \times \ldots \times M_{B-1} \times D_B \times \ldots \times D_N\)). Dimensions before and after base_axis are flattened as if it is a matrix.
n_outmaps (
int
ortuple
ofint
) – Number of output neurons per data.base_axis (int) – Dimensions up to
base_axis
are treated as the sample dimensions.w_init (
nnabla.initializer.BaseInitializer
ornumpy.ndarray
) – Initializer for weight.b_init (
nnabla.initializer.BaseInitializer
ornumpy.ndarray
) – Initializer for bias.fix_parameters (bool) – When set to
True
, the weights and biases will not be updated.rng (numpy.random.RandomState) – Random generator for Initializer.
with_bias (bool) – Specify whether to include the bias term.
rate_w (float) – Pruning rate for weights.
rate_b (float) – Pruning rate for bias.
- Returns
\((B + 1)\)-D array. (\(M_0 \times \ldots \times M_{B-1} \times L\))
- Return type
- Parameters to be registered
The following variables are registered in a parameter scope
"pruned_affine"
;W (
need_grad=True
) : Weight matrix in float. (shape:(inmaps, outmaps)
)b (
need_grad=True
) : Bias vector in float. (shape:(outmaps,)
)W_q (
need_grad=False
) : Quantized weights. (shape:(inmaps, outmaps)
)b_q (
need_grad=False
) : Quantized biases. (shape:(outmaps,)
)
Note
If the
name
option is passed, the parameters become wrapped inside the parameter scope with the specified name, yielding the same results as the following code. This can be used to simplify the code.with parametric_scope(name): output = pruned_affine(<args>)
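A minimal sketch (the 50% pruning rate rate_w=0.5 and the name "pruned_affine1" are illustrative assumptions):
import numpy as np
import nnabla as nn
import nnabla.parametric_functions as PF

x = nn.Variable.from_numpy_array(np.random.randn(8, 32).astype(np.float32))
y = PF.pruned_affine(x, 10, rate_w=0.5, name="pruned_affine1")
y.forward()  # W_q, the pruned copy of W, is updated on forward()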
- nnabla.parametric_functions.pruned_convolution(inp, outmaps, kernel, pad=None, stride=None, dilation=None, group=1, channel_last=False, w_init=None, b_init=None, base_axis=1, fix_parameters=False, rng=None, with_bias=True, prune_w=True, rate_w=0.9, prune_b=True, rate_b=0.9, name=None)[source]¶
Pruned Convolution.
Pruned Convolution is the convolution function, except the definition of the inner product is modified. The input-output relation of this function is as follows:
\[y_{n, a, b} = \sum_{m} \sum_{i} \sum_{j} Q(w_{n, m, i, j}) x_{m, a + i, b + j},\]where \(Q(w_{ji})\) is the pruning function, i.e.,
F.prune
.Note
1) If you would like to share weights between some layers, please make sure to share the standard float weights (
weight
) and not the quantized weights (quantized weight
)2) The weights and the quantized weights become synced only after
forward()
is called, and not after a call tobackward()
. To access the parameters of the network, remember to callforward()
once before doing so, otherwise the float weights and the quantized weights will not be in sync.3) CPU and GPU implementations now use float value for
quantized weight
, since this function is only for simulation purposes.- Parameters
inp (Variable) – N-D array.
outmaps (int) – Number of convolution kernels (which is equal to the number of output channels). For example, to apply convolution on an input with 16 types of filters, specify 16.
kernel (
tuple
ofint
) – Convolution kernel size. For example, to apply convolution on an image with a 3 (height) by 5 (width) two-dimensional kernel, specify (3,5).group (int) – Number of groups of channels. This makes connections across channels more sparse by grouping connections along map direction.
w_init (
nnabla.initializer.BaseInitializer
ornumpy.ndarray
) – Initializer for weight.b_init (
nnabla.initializer.BaseInitializer
ornumpy.ndarray
) – Initializer for bias.base_axis (int) – Dimensions up to
base_axis
are treated as the sample dimensions.fix_parameters (bool) – When set to
True
, the weights and biases will not be updated.rng (numpy.random.RandomState) – Random generator for Initializer.
with_bias (bool) – Specify whether to include the bias term.
rate_w (float) – Pruning rate for weights.
rate_b (float) – Pruning rate for bias.
- Returns
N-D array.
- Return type
- Parameters to be registered
The following variables are registered in a parameter scope
"pruned_conv"
;W (
need_grad=True
) : Filter weights in float. (shape:(outmaps, inmaps // group, *kernel)
)b (
need_grad=True
) : Bias vector in float. (shape:(outmaps,)
)W_q (
need_grad=False
) : Quantized weights. (shape:(outmaps, inmaps // group, *kernel)
)b_q (
need_grad=False
) : Quantized biases. (shape:(outmaps,)
)
Note
If the
name
option is passed, the parameters become wrapped inside the parameter scope with the specified name, yielding the same results as the following code. This can be used to simplify the code.with parametric_scope(name): output = pruned_convolution(<args>)
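A minimal sketch of the convolution variant (sizes, rate_w=0.5, and the name "pruned_conv1" are illustrative assumptions):
import numpy as np
import nnabla as nn
import nnabla.parametric_functions as PF

x = nn.Variable.from_numpy_array(np.random.randn(4, 3, 32, 32).astype(np.float32))
h = PF.pruned_convolution(x, 16, (3, 3), pad=(1, 1), rate_w=0.5, name="pruned_conv1")
h.forward()  # W_q, the pruned copy of W, is updated on forward()
print(h.shape)  # (4, 16, 32, 32)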
- nnabla.parametric_functions.min_max_quantize(x, ql_min=0, ql_max=255, decay=0.999, x_min_max=False, ema=False, ste_fine_grained=True, eps=0.01, qr_min_init=None, qr_max_init=None, fix_parameters=False, outputs=None, name=None)[source]¶
Min-max quantization.
This function uniformly quantizes values in the range of min and max quantization levels.
Min-max quantization is defined as the following equation
\[y = round \left(\frac{\min(\max(x, m), M) - m}{scale} \right) \times scale + m,\]where the \(scale\) is defined as
\[scale = \frac{M - m}{M_q - m_q},\]and
\[\begin{split}m_q = ql_{min}, \\ M_q = ql_{max}, \\ m = qr_{min}, \\ M = qr_{max}.\end{split}\]In the backward pass when using
ste_fine_grained
as false,\[\frac{\partial q_i}{\partial x_i} = 1.\]In the backward pass when using
ste_fine_grained
as true,\[\begin{split} \frac{\partial q_i}{\partial x_i}= \left\{ \begin{array}{ll} 0 & if \ \ \ x_i > M \\ 1 & if \ \ m \le x_i \le M \\ 0 & if \ \ x_i < m \\ \end{array} \right..\end{split}\]\(qr_{min}\) and \(qr_{max}\) are treated as follows.
If x_min_max is True and ema is True: exponential moving averages of \(\min(x)\) and \(\max(x)\) are computed and stored in \(qr_{min}\) and \(qr_{max}\).
If x_min_max is True and ema is False: \(\min(x)\) and \(\max(x)\) are computed and stored in \(qr_{min}\) and \(qr_{max}\).
If x_min_max is False and ema is True: the exponential moving averages stored in \(qr_{min}\) and \(qr_{max}\) are used.
If x_min_max is False and ema is False: gradients of \(qr_{min}\) and \(qr_{max}\) are computed in the backward pass.
More precisely, in inference of the min-max quantization, one has to consider the zero-point (zp), an integer which corresponds to the real value 0. The zero-point is defined as
\[\begin{split} && zp_f = ql_{min} -\frac{qr_{min}}{scale}, \\ && zp = \left\{ \begin{array}{ll} ql_{max} & if \ \ \ zp_f >= ql_{max} \\ round(zp_f) & if \ \ otherwise \\ ql_{min} & if \ \ zp_f <= ql_{min} \\ \end{array} \right..\end{split}\]Accordingly, in order to simulate quantization effect of zero-point, during both forward and backward pass, \(qr_{min}\) and \(qr_{max}\) are adjusted as follows,
\[\begin{split}qr_{min}^{adj} = (ql_{min} - zp) \times scale, \\ qr_{max}^{adj} = (ql_{max} - zp) \times scale.\end{split}\]These operations are often called nudge.
Finally, in the formulas of the min-max quantization, \(m\) and \(M\) are replaced by \(qr_{min}^{adj}\) and \(qr_{max}^{adj}\) respectively.
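The nudge can be sketched in plain NumPy as below (illustrative only; the library performs this internally, and the example range values are assumptions):
import numpy as np

ql_min, ql_max = 0.0, 255.0     # quantization levels
qr_min, qr_max = -0.4, 1.3      # example real-valued range
scale = (qr_max - qr_min) / (ql_max - ql_min)
zp_f = ql_min - qr_min / scale  # floating-point zero-point
zp = np.clip(np.round(zp_f), ql_min, ql_max)
qr_min_adj = (ql_min - zp) * scale  # nudged range endpoints
qr_max_adj = (ql_max - zp) * scale
print(qr_min_adj, qr_max_adj)   # the real value 0 is now exactly representable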
- Parameters
x (Variable) – Input N-D array.
ql_min (int, float, or Variable) – Minimum quantization level. Default is 0.
ql_max (int, float, or Variable) – Maximum quantization level. Default is 255.
decay (float) – The decay rate for the exponential moving average.
x_min_max (bool) – Use the min and max of x to compute quantization ranges. Default is
False
.ema (bool) – Use the exponential moving average for the min and max quantization ranges. Default is
False
.ste_fine_grained (bool) – If true, STE is not the identity; the {0, 1}-mask computed from the min-max range is applied to the gradient in the backward pass. Otherwise, STE is 1.
eps (float) – Epsilon, a small value ensuring that \(qr_{max} - qr_{min}\) is greater than epsilon.
qr_min_init (
nnabla.initializer.BaseInitializer
ornumpy.ndarray
) – Initializer for the minimum quantization range, qr_min. Default isnnabla.initializer.ConstantInitializer
(-6.0).qr_max_init (
nnabla.initializer.BaseInitializer
ornumpy.ndarray
) – Initializer for the maximum quantization range, qr_max. Default isnnabla.initializer.ConstantInitializer
(6.0).fix_parameters (bool) – When set to
True
, the weights and biases will not be updated.
References
Benoit Jacob, Skirmantas Kligys, Bo Chen, Menglong Zhu, Matthew Tang, Andrew Howard, Hartwig Adam, and Dmitry Kalenichenko, “Quantization and Training of Neural Networks for Efficient Integer-Arithmetic-Only Inference”, https://arxiv.org/abs/1712.05877
- Parameters to be registered
The following variables are registered in a parameter scope
"min_max_quantize"
;qr_min (
need_grad=False
) : Minimum quantization range, the exponential moving average of min values of inputs, initialized with -6.0 if ema is True. (shape:ql_min.shape
)qr_max (
need_grad=False
) : Maximum quantization range, the exponential moving average of max values of inputs, initialized with 6.0 if ema is True. (shape:ql_max.shape
)
Note
If the
name
option is passed, the parameters become wrapped inside the parameter scope with the specified name, yielding the same results as the following code. This can be used to simplify the code.with parametric_scope(name): output = min_max_quantize(<args>)
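A minimal sketch: fake-quantize activations to 8 bits using the min and max of the input (the name "mmq0" is an illustrative assumption):
import numpy as np
import nnabla as nn
import nnabla.parametric_functions as PF

x = nn.Variable.from_numpy_array(np.random.randn(4, 16).astype(np.float32))
y = PF.min_max_quantize(x, ql_min=0, ql_max=255, x_min_max=True, name="mmq0")
y.forward()
print(np.unique(y.d).size <= 256)  # True: at most 256 distinct output levels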
- nnabla.parametric_functions.lstm_cell(x, h, c, state_size, w_init=None, b_init=None, fix_parameters=False, name=None)[source]¶
Long Short-Term Memory.
Long Short-Term Memory, or LSTM, is a building block for recurrent neural networks (RNN) layers. LSTM unit consists of a cell and input, output, forget gates whose functions are defined as following:
\[\begin{split}f_t&&=\sigma(W_fx_t+U_fh_{t-1}+b_f) \\ i_t&&=\sigma(W_ix_t+U_ih_{t-1}+b_i) \\ o_t&&=\sigma(W_ox_t+U_oh_{t-1}+b_o) \\ c_t&&=f_t\odot c_{t-1}+i_t\odot\tanh(W_cx_t+U_ch_{t-1}+b_c) \\ h_t&&=o_t\odot\tanh(c_t).\end{split}\]References
S. Hochreiter, and J. Schmidhuber. “Long Short-Term Memory.” Neural Computation. 1997.
- Parameters
x (Variable) – Input N-D array with shape (batch_size, input_size).
h (Variable) – Input N-D array with shape (batch_size, state_size).
c (Variable) – Input N-D array with shape (batch_size, state_size).
state_size (int) – Internal state size is set to
state_size
.w_init (
nnabla.initializer.BaseInitializer
ornumpy.ndarray
, optional) – Initializer for weight. By default, it is initialized withnnabla.initializer.UniformInitializer
within the range determined bynnabla.initializer.calc_uniform_lim_glorot
.b_init (
nnabla.initializer.BaseInitializer
ornumpy.ndarray
, optional) – Initializer for bias. By default, it is initialized with zeros ifwith_bias
isTrue
.fix_parameters (bool) – When set to
True
, the weights and biases will not be updated.
- Returns
- Parameters to be registered
The following variables are registered in a parameter scope
"lstm"
;affine/W (
need_grad=True
) : Stacked weight matrices of LSTM block. (shape:(inmaps, 4, state_size)
)affine/b (
need_grad=True
) : Stacked bias vectors of LSTM block. (shape:(4, state_size,)
)
Note
If the
name
option is passed, the parameters become wrapped inside the parameter scope with the specified name, yielding the same results as the following code. This can be used to simplify the code.with parametric_scope(name): output = lstm_cell(<args>)
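A minimal sketch of one LSTM step (all sizes are illustrative, and the unpacking assumes the function returns the next hidden and cell states):
import numpy as np
import nnabla as nn
import nnabla.parametric_functions as PF

batch_size, input_size, state_size = 4, 16, 32
x = nn.Variable.from_numpy_array(np.random.randn(batch_size, input_size).astype(np.float32))
h = nn.Variable.from_numpy_array(np.zeros((batch_size, state_size), dtype=np.float32))
c = nn.Variable.from_numpy_array(np.zeros((batch_size, state_size), dtype=np.float32))
h_t, c_t = PF.lstm_cell(x, h, c, state_size, name="lstm0")  # assumed (h_t, c_t) return
h_t.forward()
print(h_t.shape)  # (4, 32)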
- class nnabla.parametric_functions.LSTMCell(batch_size, state_size, h=None, c=None, name=None)[source]¶
- __call__(x, w_init, b_init, fix_parameters)[source]¶
Updates h and c by calling lstm function.
- Parameters
x (Variable) – Input N-D array with shape (batch_size, input_size).
w_init (
nnabla.initializer.BaseInitializer
ornumpy.ndarray
, optional) – Initializer for weight. By default, it is initialized withnnabla.initializer.UniformInitializer
within the range determined bynnabla.initializer.calc_uniform_lim_glorot
.b_init (
nnabla.initializer.BaseInitializer
ornumpy.ndarray
, optional) – Initializer for bias. By default, it is initialized with zeros ifwith_bias
isTrue
.fix_parameters (bool) – When set to
True
, the weights and biases will not be updated.
- nnabla.parametric_functions.spectral_norm(w, dim=0, itr=1, eps=1e-12, test=False, u_init=None, fix_parameters=True, name=None)[source]¶
Spectral Normalization.
\[W_{sn} = \frac{W}{\sigma(W)}.\]where \(W\) is the input matrix, and the \(\sigma(W)\) is the spectral norm of \(W\). The spectral norm is approximately computed by the power iteration.
References
Takeru Miyato, Toshiki Kataoka, Masanori Koyama, Yuichi Yoshida, “Spectral Normalization for Generative Adversarial Networks”, International Conference on Learning Representations. 2018.
- Parameters
W (Variable) – Input N-D array. This is normally a network parameter.
dim (
int
) – Output dimension. Default is 0. If the dimension is not 0, then the specified dimension becomes the left-most dimension by transposing.itr (
int
) – Number of iterations. Default is 1.eps (
float
) – Epsilon for the normalization. Default is 1e-12.test (
bool
) – Use test mode. Default is False.
- Returns
Spectrally normalized \(W_{sn}\) with the same shape as \(W\).
- Return type
Example
import numpy as np
import nnabla as nn
import nnabla.parametric_functions as PF

b, c, h, w = 4, 64, 32, 32

# Spectrally normalized convolution (outmaps and kernel are illustrative)
apply_w = lambda w: PF.spectral_norm(w, dim=0)
x = nn.Variable.from_numpy_array(np.random.randn(b, c, h, w))
y = PF.convolution(x, 64, (3, 3), with_bias=False, apply_w=apply_w)

# Spectrally normalized affine (n_outmaps is illustrative)
apply_w = lambda w: PF.spectral_norm(w, dim=1)
x = nn.Variable.from_numpy_array(np.random.randn(b, c))
y = PF.affine(x, 10, with_bias=False, apply_w=apply_w)

# Spectrally normalized embed (inputs are integer indices; n_features is illustrative)
apply_w = lambda w: PF.spectral_norm(w, dim=1)
x = nn.Variable.from_numpy_array(np.random.randint(0, c, (b,)))
y = PF.embed(x, c, 16, apply_w=apply_w)
- Parameters to be registered
The following variables are registered in a parameter scope
"spectral-norm"
;u (
need_grad=False
) : singular vector. (shape:(w.shape[dim], )
)
Note
If the
name
option is passed, the parameters become wrapped inside the parameter scope with the specified name, yielding the same results as the following code. This can be used to simplify the code.with parametric_scope(name): output = spectral_norm(<args>)
- nnabla.parametric_functions.weight_normalization(w, dim=0, eps=1e-12, g_init=None, fix_parameters=False, name=None)[source]¶
Weight Normalization.
\[\mathbf{w}_{WN} = g \dfrac{\mathbf{w}}{\|\mathbf{w}\|}\]where \(\mathbf{w}\) is the input weight to be normalized, and \(g\) is a learnable multiplication factor applied to each input weight at
dim
. This function is in general used as a callback passed to apply_w for PF.convolution, PF.affine, and so on. According to the authors' original implementation, \(\mathbf{w}\) should be initialized by \(N(0, 0.05)\). To meet this condition, the initializer should be passed to the convolution to which Weight Normalization is applied, as in the example below.References
- Parameters
W (Variable) – Input N-D array. This is normally a network parameter.
dim (
int
) – Output dimension. Default is 0. If the dimension is not 0, then the specified dimension becomes the left-most dimension by transposing.eps (
float
) – Epsilon for the normalization. Default is 1e-12.g_init (
nnabla.initializer.BaseInitializer
ornumpy.ndarray
) – Initializer for the scale. By default, L2-norm of weights corresponding todim
are used.
- Returns
\(\mathbf{w}_{WN}\) with the same shape as \(\mathbf{w}\).
- Return type
Example
import nnabla as nn
import nnabla.parametric_functions as PF
import nnabla.initializer as I

# h is nn.Variable.
# Convolution: according to the original implementation,
# w should be initialized by N(0, 0.05).
h = PF.convolution(h, ..., apply_w=PF.weight_normalization,
                   w_init=I.NormalInitializer(0.05))

# Affine
h = PF.affine(h, ..., apply_w=lambda w: PF.weight_normalization(w, dim=1),
              w_init=I.NormalInitializer(0.05))
Warning
Up to version 1.10.0, this had been implemented as a composite of functions.
- Parameters to be registered
The following variables are registered in a parameter scope
"wn"
;g (
need_grad=True
) : Weight Normalization adaptive scale scalar. (shape:w.shape[dim]
)
Note
If the
name
option is passed, the parameters become wrapped inside the parameter scope with the specified name, yielding the same results as the following code. This can be used to simplify the code.with parametric_scope(name): output = weight_normalization(<args>)
- nnabla.parametric_functions.multi_head_attention(query, key, value, num_heads=12, dropout=0.0, k_embed_dim=None, v_embed_dim=None, out_dim=None, rng=None, with_bias=True, add_attn_bias=False, additive_mask=None, key_padding_mask=None, fix_parameters=False, param_init=None, name=None)[source]¶
MultiHeadAttention.
Computes multi-headed attention with query, key, and value. We use the following notations to describe the inputs and outputs below. \(L_T\): target sequence length, \(L_S\): source sequence length, \(B\): batch size, \(D\): input dimension, \(E\): embedding dimension.
References
A. Vaswani et al. “Attention is All You Need.” NIPS. 2017. <https://papers.nips.cc/paper/7181-attention-is-all-you-need.pdf>
Example:
import nnabla as nn
import nnabla.parametric_functions as PF

# Illustrative sizes; the embedding dimension must be divisible by num_heads (12 by default).
tgt_len, src_len, batch_size = 10, 6, 4
q_input_dim = k_input_dim = v_input_dim = 36
q = nn.Variable((tgt_len, batch_size, q_input_dim))
k = nn.Variable((src_len, batch_size, k_input_dim))
v = nn.Variable((src_len, batch_size, v_input_dim))
out, w = PF.multi_head_attention(q, k, v)
out.forward()
- Parameters
query (Variable) – Input N-D array with shape \((L_T, B, D_q)\).
key (Variable) – Input N-D array with shape \((L_S, B, D_k)\).
value (Variable) – Input N-D array with shape \((L_S, B, D_v)\).
num_heads (int, optional) – Number of attention heads. Note that the embedding dimension E must be divisible by the number of heads. Default is 12, which is conventional.
dropout (float, optional) – Dropout ratio applied to parameters. Default is 0.
k_embed_dim (int, optional) – Embedding dimension for key. If specified, embedding dimensions for both query and key are set to that value. Otherwise, k_embed_dim is set to the same value as the embedding dimension for query.
v_embed_dim (int, optional) – Embedding dimension for value. If not specified, it defaults to the same value as the embedding dimension for query.
out_dim (int, optional) – Embedding dimension for the output weight. If not specified, it defaults to the same value as the embedding dimension for value.
rng (numpy.random.RandomState, optional) – Random generator for Initializer. Default is None.
with_bias (bool, optional) – Specify whether to include the bias parameters. Default is True.
add_attn_bias (bool, optional) – Specify whether to add attention bias parameters for key and value. Default is False.
additive_mask (Variable, optional) – Input N-D array with shape \((L_T, L_S)\). Values will be added to the attention layer to prevent attention to certain positions.
key_padding_mask (Variable, optional) – Input N-D array with shape \((B, L_S)\). Specified padding elements will be ignored by the attention layer. Values must be either 1 or 0.
fix_parameters (bool, optional) – When set to
True
, the weights and biases will not be updated. Default is False.param_init (dict, optional) – Parameter initializers can be set with a dict. Possible keys of the dict include q_weight, k_weight, v_weight, q_bias, k_bias, v_bias, out_weight, out_bias, attn_bias_k, attn_bias_v. A value of the dict must be an
Initializer
or anumpy.ndarray
. E.g.{'q_bias': ConstantInitializer(0)}
.
- Returns
Output \(y\) with shape \((L_T, B, E)\) and output \(h_n\) with shape \((B, L_T, L_S)\).
- Return type
- Parameters to be registered
The following variables are registered in a parameter scope
"multi_head_attention"
;q_weight (
need_grad=True
) : weights for query. (shape:(E, E)
)k_weight (
need_grad=True
) : weights for key. (shape:(E_k, E)
)v_weight (
need_grad=True
) : weights for value. (shape:(E_v, E)
)out_weight (
need_grad=True
) : weights for out projection. (shape:(E, E)
)q_bias (
need_grad=True
) : bias for query. (shape:(E, )
)k_bias (
need_grad=True
) : bias for key. (shape:(E, )
)v_bias (
need_grad=True
) : bias for value. (shape:(E, )
)out_bias (
need_grad=True
) : bias for out projection. (shape:(E, )
)attn_bias_k (
need_grad=True
) : attention bias for k. (shape:(E, 1)
)attn_bias_v (
need_grad=True
) : attention bias for v. (shape:(E, 1)
)
Note
If the
name
option is passed, the parameters become wrapped inside the parameter scope with the specified name, yielding the same results as the following code. This can be used to simplify the code.with parametric_scope(name): output = multi_head_attention(<args>)
- nnabla.parametric_functions.transformer(src, tgt, embed_dim=512, num_heads=8, num_encoder_layers=6, num_decoder_layers=6, dim_feedforward=2048, dropout=0.1, activation=None, src_additive_mask=None, tgt_additive_mask=None, memory_additive_mask=None, src_key_padding_mask=None, tgt_key_padding_mask=None, memory_key_padding_mask=None, rng=None, add_attn_bias=False, fix_parameters=False, name=None)[source]¶
Transformer.
We use the following notations to describe the inputs and outputs below. \(L_T\): target sequence length, \(L_S\): source sequence length, \(B\): batch size, \(E\): embedding dimension.
References
A. Vaswani et al. “Attention is All You Need.” NIPS. 2017. <https://papers.nips.cc/paper/7181-attention-is-all-you-need.pdf>
Examples:
import nnabla as nn
import nnabla.parametric_functions as PF

src_len, tgt_len, batch_size, embed_dim = 10, 7, 4, 512  # illustrative sizes
src = nn.Variable((src_len, batch_size, embed_dim), need_grad=True)
tgt = nn.Variable((tgt_len, batch_size, embed_dim), need_grad=True)
out = PF.transformer(src, tgt, num_heads=16, num_encoder_layers=12)
out.forward()
- Parameters
src (Variable) – Input source sequence to the encoder with shape \((L_S, B, E)\).
tgt (Variable) – Input target sequence to the decoder with shape \((L_T, B, E)\).
embed_dim (int, optional) – Embedding dimension to be used. Default is 512.
num_heads (int, optional) – Number of attention heads. Default is 8.
num_encoder_layers (int, optional) – Number of encoder layers to stack. Default is 6.
num_decoder_layers (int, optional) – Number of decoder layers to stack. Default is 6.
dim_feedforward (int, optional) – Dimension of the feedforward network model. Default is 2048.
dropout (float, optional) – Dropout ratio applied. Default is 0.1.
activation (function, optional) – Non-linear activation function to be used. Default is None, which is set as F.relu in the code.
src_additive_mask (Variable, optional) – Additive mask for the src sequence (optional). \((L_S, L_S)\).
tgt_additive_mask (Variable, optional) – Additive mask for the tgt sequence (optional). \((L_T, L_T)\).
memory_additive_mask (Variable, optional) – Additive mask for the encoder output (optional). \((L_T, L_S)\).
src_key_padding_mask (Variable, optional) – Key padding mask for src keys per batch (optional). \((B, L_S)\). Specified padding elements will be ignored by the attention layer. Values must be either 1 or 0.
tgt_key_padding_mask (Variable, optional) – Key padding mask for tgt keys per batch (optional). \((B, L_T)\). Specified padding elements will be ignored by the attention layer. Values must be either 1 or 0.
memory_key_padding_mask (Variable, optional) – Key padding mask for memory keys per batch (optional). \((B, L_S)\). Specified padding elements will be ignored by the attention layer. Values must be either 1 or 0.
rng (numpy.random.RandomState, optional) – Random generator for Initializer. Default is None.
add_attn_bias (bool, optional) – Specify whether to add attention bias parameters for key and value. Default is False.
fix_parameters (bool, optional) – When set to
True
, the weights and biases will not be updated. Default is False.
- Returns
Output \(y\) with shape \((L_T, B, E)\)
- Return type
- Parameters to be registered
The following variables are registered in a parameter scope
"transformer"
;encoder{layer#} (
need_grad=True
) : parameters for the n’th encoder layer. (shape:Refer to transformer_encode for details
)decoder{layer#} (
need_grad=True
) : parameters for the n’th decoder layer. (shape:Refer to transformer_decode for details
)
Note
If the
name
option is passed, the parameters become wrapped inside the parameter scope with the specified name, yielding the same results as the following code. This can be used to simplify the code.with parametric_scope(name): output = transformer(<args>)
- nnabla.parametric_functions.transformer_encode(src, embed_dim, num_heads, dim_feedforward=2048, dropout=0.1, activation=None, src_additive_mask=None, src_key_padding_mask=None, rng=None, add_attn_bias=False, fix_parameters=False, name=None)[source]¶
Transformer Encoder.
- Parameters
src (Variable) – Input sequence to the encoder layer with shape \((L_S, B, E)\).
embed_dim (int) – Number of embedding dimension.
num_heads (int) – Number of attention heads.
dim_feedforward (int, optional) – Dimension of the feedforward network model. Default is 2048.
dropout (float, optional) – Dropout ratio. Default is 0.1.
activation (function, optional) – Non-linear activation function to be used. Default is None, which is set as F.relu in the code.
src_additive_mask (Variable, optional) – Additive mask for the source sequence with shape \((L_S, L_S)\)
src_key_padding_mask (Variable, optional) – Padding mask for the source sequence with shape \((B, L_S)\). Specified padding elements will be ignored by the attention layer. Values must be either 1 or 0.
rng (numpy.random.RandomState, optional) – Random generator for Initializer. Default is None.
add_attn_bias (bool, optional) – Specify whether to add attention bias parameters for key and value. Default is False.
fix_parameters (bool, optional) – When set to
True
, the weights and biases will not be updated. Default is False.
- Returns
Output \(y\) with shape \((L_S, B, E)\)
- Return type
- Parameters to be registered
The following variables are registered in a parameter scope
"transformer_encode"
;src_self_attn (
need_grad=True
) : self-attention parameters for source sequence. (shape:Refer to multi_head_attention for details
)enc_affine1 (
need_grad=True
) : first affine used in encoder. (shape:Refer to affine for details
)enc_affine2 (
need_grad=True
) : second affine used in encoder. (shape:Refer to affine for details
)enc_layer_norm1 (
need_grad=True
) : first layer normalization used in encoder. (shape:Refer to layer_normalization for details
)enc_layer_norm2 (
need_grad=True
) : second layer normalization used in encoder. (shape:Refer to layer_normalization for details
)
Note
If the
name
option is passed, the parameters become wrapped inside the parameter scope with the specified name, yielding the same results as the following code. This can be used to simplify the code.with parametric_scope(name): output = transformer_encode(<args>)
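A minimal sketch of a single encoder layer (all sizes and the name "enc0" are illustrative assumptions):
import numpy as np
import nnabla as nn
import nnabla.parametric_functions as PF

src_len, batch_size, embed_dim = 10, 4, 64
src = nn.Variable.from_numpy_array(
    np.random.randn(src_len, batch_size, embed_dim).astype(np.float32))
out = PF.transformer_encode(src, embed_dim, num_heads=8, name="enc0")
out.forward()
print(out.shape)  # (10, 4, 64)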
- nnabla.parametric_functions.transformer_decode(tgt, memory, embed_dim, num_heads, dim_feedforward=2048, dropout=0.1, activation=None, tgt_additive_mask=None, memory_additive_mask=None, tgt_key_padding_mask=None, memory_key_padding_mask=None, rng=None, add_attn_bias=False, fix_parameters=False, name=None)[source]¶
Transformer Decoder.
- Parameters
tgt (Variable) – Input sequence to the decoder layer with shape \((L_T, B, E)\).
memory (Variable) – Output sequence from the last layer of the encoder with shape \((L_S, B, E)\).
embed_dim (int) – Number of embedding dimension.
num_heads (int) – Number of attention heads.
dim_feedforward (int, optional) – Dimension of the feedforward network model. Default is 2048.
dropout (float, optional) – Dropout ratio. Default is 0.1.
activation (function, optional) – Non-linear activation function to be used. Default is None, which is set as F.relu in the code.
tgt_additive_mask (Variable, optional) – Additive mask for the target sequence with shape \((L_T, L_T)\).
memory_additive_mask (Variable, optional) – Additive mask for the memory sequence with shape \((L_T, L_S)\).
tgt_key_padding_mask (Variable, optional) – Padding mask for the target sequence with shape \((B, L_T)\). Specified padding elements will be ignored by the attention layer. Values must be either 1 or 0.
memory_key_padding_mask (Variable, optional) – Padding mask for the memory sequence with shape \((B, L_S)\). Specified padding elements will be ignored by the attention layer. Values must be either 1 or 0.
rng (numpy.random.RandomState) – Random generator for Initializer. Default is None.
add_attn_bias (bool, optional) – Specify whether to add attention bias parameters for key and value. Default is False.
fix_parameters (bool) – When set to
True
, the weights and biases will not be updated. Default is False.
- Returns
Output \(y\) with shape \((L_T, B, E)\)
- Return type
- Parameters to be registered
The following variables are registered in a parameter scope
"transformer_decode"
;tgt_self_attn (
need_grad=True
) : self-attention parameters for target sequence. (shape:Refer to multi_head_attention for details
)tgt_memory_attn (
need_grad=True
) : attention parameters for target sequence with output from encoder as key. (shape:Refer to multi_head_attention for details
)dec_affine1 (
need_grad=True
) : first affine used in decoder. (shape:Refer to affine for details
)dec_affine2 (
need_grad=True
) : second affine used in decoder. (shape:Refer to affine for details
)dec_layer_norm1 (
need_grad=True
) : first layer normalization used in decoder. (shape:Refer to layer_normalization for details
)dec_layer_norm2 (
need_grad=True
) : second layer normalization used in decoder. (shape:Refer to layer_normalization for details
)dec_layer_norm3 (
need_grad=True
) : third layer normalization used in decoder. (shape:Refer to layer_normalization for details
)
Note
If the
name
option is passed, the parameters become wrapped inside the parameter scope with the specified name, yielding the same results as the following code. This can be used to simplify the code.with parametric_scope(name): output = transformer_decode(<args>)
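A minimal sketch of one decoder layer attending to an encoder output (all sizes and the name "dec0" are illustrative assumptions):
import numpy as np
import nnabla as nn
import nnabla.parametric_functions as PF

src_len, tgt_len, batch_size, embed_dim = 10, 7, 4, 64
memory = nn.Variable.from_numpy_array(
    np.random.randn(src_len, batch_size, embed_dim).astype(np.float32))
tgt = nn.Variable.from_numpy_array(
    np.random.randn(tgt_len, batch_size, embed_dim).astype(np.float32))
out = PF.transformer_decode(tgt, memory, embed_dim, num_heads=8, name="dec0")
out.forward()
print(out.shape)  # (7, 4, 64)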
Parameter Initializer¶
Some of the parametric functions optionally take a parameter initializer, as listed below.
- class nnabla.initializer.BaseInitializer[source]¶
Base class of the parameter initializer.
- __call__(shape)[source]¶
Generates an array with an initializer.
- Parameters
shape (
tuple
ofint
) –numpy.ndarray
with the shape created.- Returns
Array.
- Return type
Note
Subclasses of
BaseInitializer
must override this method.
- class nnabla.initializer.ConstantInitializer(value=0)[source]¶
Bases:
nnabla.initializer.BaseInitializer
Generates a constant valued array.
- Parameters
value (float) – A constant value.
Example:
import nnabla as nn
import nnabla.parametric_functions as PF
import nnabla.initializer as I

x = nn.Variable([60, 1, 28, 28])
w = I.ConstantInitializer(0.1)
b = I.ConstantInitializer()  # generates a constant-valued array of the default value 0
h = PF.convolution(x, 64, [3, 3], w_init=w, b_init=b, pad=[1, 1], name='conv')
- class nnabla.initializer.NormalInitializer(sigma=1.0, rng=None)[source]¶
Bases:
nnabla.initializer.BaseInitializer
Generates a random array from a specified normal distribution.
\[\mathbf x \sim {\cal N} (\mathbf 0 | \sigma^2 \mathbf I)\]- Parameters
sigma (float) – \(\sigma\).
rng (numpy.random.RandomState) – Random number generator.
Example:
import nnabla as nn
import nnabla.parametric_functions as PF
import nnabla.initializer as I

x = nn.Variable([60, 1, 28, 28])
w = I.NormalInitializer(5e-5)
b = I.NormalInitializer(0.0)
h = PF.convolution(x, 64, [3, 3], w_init=w, b_init=b, pad=[1, 1], name='conv')
- class nnabla.initializer.UniformInitializer(lim=(- 1, 1), rng=None)[source]¶
Bases:
nnabla.initializer.BaseInitializer
Generates a random array from a specified uniform distribution.
\[\mathbf x \sim {\cal U} (a, b)\]- Parameters
rng (numpy.random.RandomState) – Random number generator.
Example:
import nnabla as nn
import nnabla.parametric_functions as PF
import nnabla.initializer as I

x = nn.Variable([60, 1, 28, 28])
w = I.UniformInitializer()  # uniform distribution within the default range of (-1, 1)
b = I.UniformInitializer((-0.5, 0.5))
h = PF.convolution(x, 64, [3, 3], w_init=w, b_init=b, pad=[1, 1], name='conv')
- class nnabla.initializer.UniformIntInitializer(lim=(0, 10), rng=None)[source]¶
Bases:
nnabla.initializer.BaseInitializer
Generates a random array from a specified integer uniform distribution.
\[\mathbf x \sim {\cal U} ([a, b))\]- Parameters
rng (numpy.random.RandomState) – Random number generator.
Example:
import nnabla as nn
import nnabla.parametric_functions as PF
import nnabla.initializer as I

x = nn.Variable([60, 1, 28, 28])
w = I.UniformIntInitializer()  # uniform integer distribution within the default range of (0, 10)
b = I.UniformIntInitializer((-1, 1))
h = PF.convolution(x, 64, [3, 3], w_init=w, b_init=b, pad=[1, 1], name='conv')
- class nnabla.initializer.RangeInitializer(start=0, step=1)[source]¶
Bases:
nnabla.initializer.BaseInitializer
Generates an array with a sequence of numbers.
\[\mathbf x[i] = start + step * i\]Example:
import nnabla as nn
import nnabla.initializer as I

x = nn.Variable([100])
x.d = I.RangeInitializer(0, 1)(x.shape)
- class nnabla.initializer.OrthogonalInitializer(gain=1.0, rng=None)[source]¶
Bases:
nnabla.initializer.BaseInitializer
Generates orthogonal-matrix weights as proposed by Saxe et al.
- Parameters
gain (float) – scaling factor which should be decided depending on a type of units.
rng (numpy.random.RandomState) – Random number generator.
Example:
import numpy as np
import nnabla as nn
import nnabla.parametric_functions as PF
import nnabla.initializer as I

x = nn.Variable([60, 1, 28, 28])
w = I.OrthogonalInitializer(np.sqrt(2.0))
b = I.ConstantInitializer(0.0)
h = PF.convolution(x, 64, [3, 3], w_init=w, b_init=b, pad=[1, 1], name='conv')
References
- class nnabla.initializer.WeightNormalizationScaleInitializer(w, dim=0, eps=1e-12)[source]¶
Bases:
nnabla.initializer.BaseInitializer
Compute the L2-norm for each weight kernel.
This initializer is specific to the weight normalization scale: it keeps the magnitude of the originally initialized weights even after the application of weight normalization, at initialization only.
- nnabla.initializer.calc_normal_std_he_forward(inmaps, outmaps, kernel=(1, 1))[source]¶
Calculates the standard deviation proposed by He et al.
\[\sigma = \sqrt{\frac{2}{NK}}\]- Parameters
Example:
import nnabla as nn
import nnabla.parametric_functions as PF
import nnabla.initializer as I

x = nn.Variable([60, 1, 28, 28])
s = I.calc_normal_std_he_forward(x.shape[1], 64)
w = I.NormalInitializer(s)
b = I.ConstantInitializer(0)
h = PF.convolution(x, 64, [3, 3], w_init=w, b_init=b, pad=[1, 1], name='conv')
References
- nnabla.initializer.calc_normal_std_he_backward(inmaps, outmaps, kernel=(1, 1))[source]¶
Calculates the standard deviation of He et al. (backward case).
\[\sigma = \sqrt{\frac{2}{MK}}\]- Parameters
Example:
import nnabla as nn
import nnabla.parametric_functions as PF
import nnabla.initializer as I

x = nn.Variable([60, 1, 28, 28])
s = I.calc_normal_std_he_backward(x.shape[1], 64)
w = I.NormalInitializer(s)
b = I.ConstantInitializer(0)
h = PF.convolution(x, 64, [3, 3], w_init=w, b_init=b, pad=[1, 1], name='conv')
References
- nnabla.initializer.calc_normal_std_glorot(inmaps, outmaps, kernel=(1, 1))[source]¶
Calculates the standard deviation proposed by Glorot et al.
Note
The definition has been updated as follows since v1.2. This may affect the behavior of existing scripts that rely on the default initialization.
\[\sigma = \sqrt{\frac{2}{K(N + M)}}\]- Parameters
Example:
import nnabla as nn
import nnabla.parametric_functions as PF
import nnabla.initializer as I

x = nn.Variable([60, 1, 28, 28])
s = I.calc_normal_std_glorot(x.shape[1], 64)
w = I.NormalInitializer(s)
b = I.ConstantInitializer(0)
h = PF.convolution(x, 64, [3, 3], w_init=w, b_init=b, pad=[1, 1], name='conv')
References
- nnabla.initializer.calc_uniform_lim_glorot(inmaps, outmaps, kernel=(1, 1))[source]¶
Calculates the lower bound and the upper bound of the uniform distribution proposed by Glorot et al.
Note
The definition has been updated as follows since v1.3. This may affect the behavior of existing scripts that rely on the default initialization.
\[\begin{split}b &= \sqrt{\frac{6}{K(N + M)}}\\ a &= -b\end{split}\]- Parameters
Example:
import nnabla as nn
import nnabla.parametric_functions as PF
import nnabla.initializer as I

x = nn.Variable([60, 1, 28, 28])
lb, ub = I.calc_uniform_lim_glorot(x.shape[1], 64)
w = I.UniformInitializer((lb, ub))
b = I.ConstantInitializer(0)
h = PF.convolution(x, 64, [3, 3], w_init=w, b_init=b, pad=[1, 1], name='conv')
References
Grad¶
- nnabla.grad.grad(outputs, inputs, grad_outputs=None, persistent_outputs=[], bind_grad_output=False)[source]¶
Gradient function for the outputs with respect to the inputs.
The grad function computes the sum of gradients of the outputs w.r.t. the inputs.
\[g_i = \sum_{j} {\frac{\partial y_j}{\partial x_i}},\]\(y_j\) is each output, \(x_i\) is each input, and \(g_i\) is the sum of the gradient of \(y_j\) w.r.t. \(x_i\) over all \(j\).
- Parameters
outputs (list of
Variable
orVariable
) – Outputs of the differentiable function.inputs (list of
Variable
orVariable
) – Inputs w.r.t. which the gradients of outputs are computed.grad_outputs (None, scalar,
numpy.ndarray
,nnabla.NdArray
, or list of scalar,numpy.ndarray
, ornnabla.NdArray
,) – Gradient outputs corresponding to outputs. This is same as the grad argument ofbackward()
. Default is None, so 1 is used as the in-coming gradient at the very beginning of the Variable in the gradient graph.persistent_outputs (list of
bool
) – Outputs become persistent accordingly. If not specified, all outputs become persistent.bind_grad_output (
bool
) – Bind data to grad of input variable. This is useful for the case where one wants to use the gradient graph for training a neural network using the first-order gradients only. Default is False.
- Returns
List of
Variable
.If the backpropagation does not reach input(s), the corresponding returned value(s) are
zero
(i.e., the gradients w.r.t. inputs are zero) and not connected as a part of the gradient graph.
Example (Gradient Penalty):
import nnabla as nn
import nnabla.functions as F
import nnabla.parametric_functions as PF
import numpy as np
from nnabla.ext_utils import get_extension_context

# Context
extension_module = "cudnn"
ctx = get_extension_context(extension_module)
nn.set_default_context(ctx)

# Input and label
x = nn.Variable.from_numpy_array(np.random.randn(4, 3, 32, 32))
y = nn.Variable.from_numpy_array(np.random.randint(0, 10, 4).reshape(4, 1))

# Network
h = PF.convolution(x, 8, (3, 3), (1, 1), name="conv1")
h = F.relu(h)
h = F.max_pooling(h, (2, 2))
h = PF.convolution(h, 16, (3, 3), (1, 1), name="conv2")
h = F.relu(h)
h = F.max_pooling(h, (2, 2))
p = PF.affine(h, 10, name="pred")
loss = F.mean(F.softmax_cross_entropy(p, y))

# Grad
outputs = [loss]
inputs = nn.get_parameters().values()
grads = nn.grad(outputs, inputs)  # gradients of the parameters

# Backward of the outputs w.r.t. the parameters by constraining the gradient norms
t = 0  # or 1
gp = sum([(F.sum(g ** 2) ** 0.5 - t) ** 2 for g in grads])
loss += gp
loss.forward()
loss.backward()
Example (Higher-order Gradients):
import nnabla as nn
import nnabla.functions as F
import numpy as np

x = nn.Variable.from_numpy_array(np.random.randn(2, 2)).apply(need_grad=True)
x.grad.zero()
y = F.sin(x)

def grad(y, x, n=1):
    dx = [y]
    for _ in range(n):
        dx = nn.grad([dx[0]], [x])
    return dx[0]

dnx = grad(y, x, n=10)
dnx.forward()
print(np.allclose(-np.sin(x.d), dnx.d))
dnx.backward()
print(np.allclose(-np.cos(x.d), x.g))

# Show the supported status for each function
from nnabla.backward_functions import show_registry
show_registry()
- nnabla.backward_functions.register(func_name, func)[source]¶
Register the backward function to a function.
- Parameters
func_name (str) – The function class name, for example, Affine.
func (function) – The function to be called as the backward function to the function func_name. Arguments of the func must be (ctx: nn.Context, inputs: list of nn.Variable, **kwargs). The inputs are the ones to the function of the func_name. The kwargs are the arguments of the function. For example, if the func_name is Affine and func is affine_backward, the inputs are data, weights, and bias if necessary, and kwargs = dict(base_axis=base_axis).
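A minimal sketch of registering a backward function for a hypothetical function class "MyFunc" (both the class name and the body are illustrative assumptions):
from nnabla.backward_functions import register

def my_func_backward(ctx, inputs, **kwargs):
    # inputs: list of nn.Variable given to "MyFunc"; kwargs: its arguments.
    # A real implementation builds and returns the gradient graph here.
    raise NotImplementedError

register("MyFunc", my_func_backward)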
Solvers¶
The nnabla.solvers.Solver
class represents a stochastic gradient descent based optimizer for optimizing the parameters in the computation graph. NNabla provides various solvers listed below.
Solver¶
- class nnabla.solvers.Solver¶
Solver interface class.
The same API provided in this class can be used to implement various types of solvers.
Example:
# Network building comes above
import nnabla.solvers as S
solver = S.Sgd(lr=1e-3)
solver.set_parameters(nn.get_parameters())

for itr in range(num_itr):
    x.d = ...  # set data
    t.d = ...  # set label
    loss.forward()
    solver.zero_grad()  # set all gradient buffers to 0
    loss.backward()
    solver.weight_decay(decay_rate)  # apply weight decay
    solver.clip_grad_by_norm(clip_norm)  # apply clip grad by norm
    solver.update()  # update parameters
Note
All solvers provided by NNabla belong to an inherited class of
Solver
. A solver is never instantiated from this class directly.- check_inf_grad(self, pre_hook=None, post_hook=None)¶
Check if there is any inf in the gradients that were set up.
- check_inf_or_nan_grad(self, pre_hook=None, post_hook=None)¶
Check if there is any inf or nan in the gradients that were set up.
- check_nan_grad(self, pre_hook=None, post_hook=None)¶
Check if there is any nan in the gradients that were set up.
- clear_parameters(self)¶
Clear all registered parameters and states.
- clip_grad_by_norm(self, float clip_norm, pre_hook=None, post_hook=None)¶
Clip gradients by norm. When called, the gradient will be clipped by the given norm.
- Parameters
clip_norm (float) – The value of clipping norm.
- get_parameters(self)¶
Get all registered parameters
- get_states(self)¶
Get all states
- info¶
object
- Type
info
- learning_rate(self)¶
Get the learning rate.
- load_states(self, path)¶
Load solver states.
- Parameters
path – path to the state file to be loaded.
- name¶
Get the name of the solver.
- remove_parameters(self, vector[string] keys)¶
Remove previously registered parameters, specified by a
vector
of their keys.
- save_states(self, path)¶
Save solver states.
- Parameters
path – path or file object
- scale_grad(self, scale, pre_hook=None, post_hook=None)¶
Rescale gradient
- set_learning_rate(self, learning_rate)¶
Set the learning rate.
- set_parameters(self, param_dict, bool reset=True, bool retain_state=False)¶
Set parameters by dictionary of keys and parameter Variables.
- Parameters
param_dict (dict) – key:string, value: Variable.
reset (bool) – If true, clear all parameters before setting parameters. If false, parameters are overwritten or added (if it’s new).
retain_state (bool) – The value is only considered if reset is false. If true and a key already exists (overwriting), a state (such as momentum) associated with the key will be kept if the shape of the parameter and that of the new param match.
- set_states(self, states)¶
Set states. Call
set_parameters
to initialize states of a solver first; otherwise this method raises a ValueError.
- set_states_from_protobuf(self, optimizer_proto)¶
Set states to the solver from the protobuf file.
Internally used helper method.
- set_states_to_protobuf(self, optimizer)¶
Set states to the protobuf file from the solver.
Internally used helper method.
- setup(self, params)¶
Deprecated. Call
set_parameters
withparam_dict
.
- update(self, update_pre_hook=None, update_post_hook=None)¶
When this function is called, parameter values are updated using the gradients accumulated in backpropagation, stored in the
grad
field of the parameter
Variable
s. Update rules are implemented in the C++ core, in derived classes of Solver. The updated parameter values will be stored into the data field of the parameter
Variable
s.
- Parameters
update_pre_hook (callable) – This callable object is called immediately before each update of parameters. The default is None.
update_post_hook (callable) – This callable object is called immediately after each update of parameters. The default is None.
- weight_decay(self, float decay_rate, pre_hook=None, post_hook=None)¶
Apply weight decay to gradients. When called, the current parameter value multiplied by the decay rate is added to each gradient.
- Parameters
decay_rate (float) – The coefficient of weight decay.
- zero_grad(self)¶
Initialize the gradients of all registered parameters to zero.
List of solvers¶
- nnabla.solvers.Sgd(lr=0.001)¶
Stochastic gradient descent (SGD) optimizer.
\[w_{t+1} \leftarrow w_t - \eta \Delta w_t\]
- nnabla.solvers.Momentum(lr=0.001, momentum=0.9)¶
SGD with Momentum.
\[\begin{split}v_t &\leftarrow \gamma v_{t-1} + \eta \Delta w_t\\ w_{t+1} &\leftarrow w_t - v_t\end{split}\]
- Parameters
- Returns
- An instance of Solver class.
See Solver API guide for details.
- Return type
Note
You can instantiate a preferred target implementation (ex. CUDA) of a Solver given a Context. A Context can be set by
nnabla.set_default_context(ctx)
or
nnabla.context_scope(ctx)
. See API docs.
References
- nnabla.solvers.Lars(lr=0.001, momentum=0.9, coefficient=0.001, eps=1e-06)¶
LARS with Momentum.
\[\begin{split}\lambda &\leftarrow \eta \frac{\| w_t \|}{\| \Delta w_t + \beta w_t \|} \\ v_{t+1} &\leftarrow m v_t + \gamma \lambda (\Delta w_t + \beta w_t) \\ w_{t+1} &\leftarrow w_t - v_{t+1}\end{split}\]
- Parameters
- Returns
- An instance of Solver class.
See Solver API guide for details.
- Return type
Note
You can instantiate a preferred target implementation (ex. CUDA) of a Solver given a Context. A Context can be set by
nnabla.set_default_context(ctx)
or
nnabla.context_scope(ctx)
. See API docs.
References
- nnabla.solvers.Nesterov(lr=0.001, momentum=0.9)¶
Nesterov Accelerated Gradient optimizer.
\[\begin{split}v_t &\leftarrow \gamma v_{t-1} - \eta \Delta w_t\\ w_{t+1} &\leftarrow w_t - \gamma v_{t-1} + \left(1 + \gamma \right) v_t\end{split}\]
- Parameters
- Returns
- An instance of Solver class.
See Solver API guide for details.
- Return type
Note
You can instantiate a preferred target implementation (ex. CUDA) of a Solver given a Context. A Context can be set by
nnabla.set_default_context(ctx)
or
nnabla.context_scope(ctx)
. See API docs.
References
Yurii Nesterov. A method for unconstrained convex minimization problem with the rate of convergence \(O(1/k^2)\).
- nnabla.solvers.Adadelta(lr=1.0, decay=0.95, eps=1e-06)¶
AdaDelta optimizer.
\[\begin{split}g_t &\leftarrow \Delta w_t\\ v_t &\leftarrow - \frac{RMS \left[ v_t \right]_{t-1}} {RMS \left[ g \right]_t}g_t\\ w_{t+1} &\leftarrow w_t + \eta v_t\end{split}\]
- Parameters
- Returns
- An instance of Solver class.
See Solver API guide for details.
- Return type
Note
You can instantiate a preferred target implementation (ex. CUDA) of a Solver given a Context. A Context can be set by
nnabla.set_default_context(ctx)
or
nnabla.context_scope(ctx)
. See API docs.
References
- nnabla.solvers.Adagrad(lr=0.01, eps=1e-08)¶
ADAGrad optimizer.
\[\begin{split}g_t &\leftarrow \Delta w_t\\ G_t &\leftarrow G_{t-1} + g_t^2\\ w_{t+1} &\leftarrow w_t - \frac{\eta}{\sqrt{G_t} + \epsilon} g_t\end{split}\]
- Parameters
- Returns
- An instance of Solver class.
See Solver API guide for details.
- Return type
Note
You can instantiate a preferred target implementation (ex. CUDA) of a Solver given a Context. A Context can be set by
nnabla.set_default_context(ctx)
or
nnabla.context_scope(ctx)
. See API docs.
References
- nnabla.solvers.AdaBelief(alpha=0.001, beta1=0.9, beta2=0.999, eps=1e-08, wd=0.0, amsgrad=False, weight_decouple=False, fixed_decay=False, rectify=False)¶
AdaBelief optimizer.
\[\begin{split}m_t &\leftarrow \beta_1 m_{t-1} + (1 - \beta_1) g_t\\ s_t &\leftarrow \beta_2 s_{t-1} + (1 - \beta_2) (g_t - m_t)^2\\ w_{t+1} &\leftarrow w_t - \alpha \frac{\sqrt{1 - \beta_2^t}}{1 - \beta_1^t} \frac{m_t}{\sqrt{s_t + \epsilon} + \epsilon}\end{split}\]
- Parameters
alpha (float) – Step size (\(\alpha\)).
beta1 (float) – Decay rate of first-order momentum (\(\beta_1\)).
beta2 (float) – Decay rate of second-order momentum (\(\beta_2\)).
eps (float) – Small value for avoiding zero division (\(\epsilon\)).
wd (float) – Weight decay rate. This option only takes effect when weight_decouple option is enabled.
amsgrad (bool) – Perform AMSGrad variant of AdaBelief.
weight_decouple (bool) – Perform decoupled weight decay as in AdamW.
fixed_decay (bool) – If True, the weight decay ratio will be kept fixed. Note that this option only takes effect when weight_decouple option is enabled.
rectify (bool) – Perform RAdam variant of AdaBelief.
- Returns
- An instance of Solver class.
See Solver API guide for details.
- Return type
Note
You can instantiate a preferred target implementation (ex. CUDA) of a Solver given a Context. A Context can be set by
nnabla.set_default_context(ctx)
or
nnabla.context_scope(ctx)
. See API docs.
References
- nnabla.solvers.RMSprop(lr=0.001, decay=0.9, eps=1e-08)¶
RMSprop optimizer (Geoffrey Hinton).
\[\begin{split}g_t &\leftarrow \Delta w_t\\ v_t &\leftarrow \gamma v_{t-1} + \left(1 - \gamma \right) g_t^2\\ w_{t+1} &\leftarrow w_t - \eta \frac{g_t}{\sqrt{v_t} + \epsilon}\end{split}\]
- Parameters
- Returns
- An instance of Solver class.
See Solver API guide for details.
- Return type
Note
You can instantiate a preferred target implementation (ex. CUDA) of a Solver given a Context. A Context can be set by
nnabla.set_default_context(ctx)
or
nnabla.context_scope(ctx)
. See API docs.
References
- nnabla.solvers.RMSpropGraves(lr=0.0001, decay=0.95, momentum=0.9, eps=0.0001)¶
RMSpropGraves optimizer (Alex Graves).
\[\begin{split}n_t &\leftarrow \rho n_{t-1} + \left(1 - \rho \right) {e_t}^2\\ g_t &\leftarrow \rho g_{t-1} + \left(1 - \rho \right) e_t\\ d_t &\leftarrow \beta d_{t-1} - \eta \frac{e_t}{\sqrt{n_t - {g_t}^2 + \epsilon}}\\ w_{t+1} &\leftarrow w_t + d_t\end{split}\]
where \(e_t\) denotes the gradient.
- Parameters
- Returns
- An instance of Solver class.
See Solver API guide for details.
- Return type
Note
You can instantiate a preferred target implementation (ex. CUDA) of a Solver given a Context. A Context can be set by
nnabla.set_default_context(ctx)
or
nnabla.context_scope(ctx)
. See API docs.
References
- nnabla.solvers.Adam(alpha=0.001, beta1=0.9, beta2=0.999, eps=1e-08)¶
ADAM optimizer.
\[\begin{split}m_t &\leftarrow \beta_1 m_{t-1} + (1 - \beta_1) g_t\\ v_t &\leftarrow \beta_2 v_{t-1} + (1 - \beta_2) g_t^2\\ w_{t+1} &\leftarrow w_t - \alpha \frac{\sqrt{1 - \beta_2^t}}{1 - \beta_1^t} \frac{m_t}{\sqrt{v_t} + \epsilon}\end{split}\]
where \(g_t\) denotes a gradient, and let \(m_0 \leftarrow 0\) and \(v_0 \leftarrow 0\).
- Parameters
- Returns
- An instance of Solver class.
See Solver API guide for details.
- Return type
Note
You can instantiate a preferred target implementation (ex. CUDA) of a Solver given a Context. A Context can be set by
nnabla.set_default_context(ctx)
or
nnabla.context_scope(ctx)
. See API docs.
References
- nnabla.solvers.AdaBound(alpha=0.001, beta1=0.9, beta2=0.999, eps=1e-08, final_lr=0.1, gamma=0.001)¶
AdaBound optimizer applies dynamic bounds on learning rates to Adam.
\[\begin{split}w_{t+1} &\leftarrow w_t - \eta_t*m_t\\ \eta_t = clip( \alpha\frac{\sqrt{1 - \beta_2^t}}{(1 - \beta_1^t)(\sqrt{v_t} + \epsilon)}, \eta_l(t), \eta_u(t))\\ \eta_l(t) = (1 - (1/((1-\gamma)t+1)))\alpha^*\\ \eta_u(t) = (1 + (1/((1-\gamma)t)))\alpha^*\end{split}\]
where \(\alpha^*\) (
final_lr
) is scaled by a factor defined as the current value of \(\alpha\) (set by
set_learning_rate(lr)
) over the initial value of \(\alpha\), so that learning rate scheduling is properly applied to both \(\alpha\) and \(\alpha^*\).
- Parameters
alpha (float) – Step size (\(\alpha\)).
beta1 (float) – Decay rate of first-order momentum (\(\beta_1\)).
beta2 (float) – Decay rate of second-order momentum (\(\beta_2\)).
eps (float) – Small value for avoiding zero division (\(\epsilon\)).
final_lr (float) – Final (SGD) learning rate.
gamma (float) – Convergence speed of the bound functions.
- Returns
- An instance of Solver class.
See Solver API guide for details.
- Return type
Note
You can instantiate a preferred target implementation (ex. CUDA) of a Solver given a Context. A Context can be set by
nnabla.set_default_context(ctx)
or
nnabla.context_scope(ctx)
. See API docs.
References
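Because final_lr is rescaled by the ratio of the current to the initial \(\alpha\), a learning rate schedule applied through set_learning_rate moves both bounds consistently. A sketch; the decay schedule and training-loop variables are illustrative:
import nnabla as nn
import nnabla.solvers as S
solver = S.AdaBound(alpha=1e-3, final_lr=0.1)
solver.set_parameters(nn.get_parameters())
base_lr = 1e-3
for itr in range(num_itr):
    # Step decay (illustrative); this rescales both alpha and final_lr.
    solver.set_learning_rate(base_lr * (0.1 ** (itr // 10000)))
    # ... forward / zero_grad / backward ...
    solver.update()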
- nnabla.solvers.Adamax(alpha=0.002, beta1=0.9, beta2=0.999, eps=1e-08)¶
ADAMAX Optimizer.
\[\begin{split}m_t &\leftarrow \beta_1 m_{t-1} + (1 - \beta_1) g_t\\ v_t &\leftarrow \max\left(\beta_2 v_{t-1}, |g_t|\right)\\ w_{t+1} &\leftarrow w_t - \alpha \frac{\sqrt{1 - \beta_2^t}}{1 - \beta_1^t} \frac{m_t}{v_t + \epsilon}\end{split}\]
where \(g_t\) denotes a gradient, and let \(m_0 \leftarrow 0\) and \(v_0 \leftarrow 0\); \(v_t\) is an exponentially weighted infinity norm of the sequence of gradients up to time \(t\).
- Parameters
- Returns
- An instance of Solver class.
See Solver API guide for details.
- Return type
Note
You can instantiate a preferred target implementation (ex. CUDA) of a Solver given a Context. A Context can be set by
nnabla.set_default_context(ctx)
or
nnabla.context_scope(ctx)
. See API docs.
References
- nnabla.solvers.AMSGRAD(alpha=0.001, beta1=0.9, beta2=0.999, eps=1e-08, bias_correction=False)¶
AMSGRAD optimizer.
\[\begin{split}m_t &\leftarrow \beta_1 m_{t-1} + (1 - \beta_1) g_t\\ v_t &\leftarrow \beta_2 v_{t-1} + (1 - \beta_2) g_t^2\\ \hat{v_t} = \max(\hat{v_{t-1}}, v_t)\\ w_{t+1} &\leftarrow w_t - \alpha \frac{m_t}{\sqrt{\hat{v_t}} + \epsilon}\end{split}\]
where \(g_t\) denotes a gradient, and let \(m_0 \leftarrow 0\) and \(v_0 \leftarrow 0\).
- Parameters
alpha (float) – Step size (\(\alpha\)).
beta1 (float) – Decay rate of first-order momentum (\(\beta_1\)).
beta2 (float) – Decay rate of second-order momentum (\(\beta_2\)).
eps (float) – Small value for avoiding zero division (\(\epsilon\)). Note this does not appear in the paper.
bias_correction (bool) – Apply bias correction to moving averages defined in ADAM. Note this does not appear in the paper.
- Returns
- An instance of Solver class.
See Solver API guide for details.
- Return type
Note
You can instantiate a preferred target implementation (ex. CUDA) of a Solver given a Context. A Context can be set by
nnabla.set_default_context(ctx)
or
nnabla.context_scope(ctx)
. See API docs.
References
- nnabla.solvers.AMSBound(alpha=0.001, beta1=0.9, beta2=0.999, eps=1e-08, final_lr=0.1, gamma=0.001, bias_correction=False)¶
AMSBound optimizer applies dynamic bounds on learning rates to AMSGrad.
\[\begin{split}w_{t+1} &\leftarrow w_t - \eta_t*m_t\\ \eta_t = clip( \alpha\frac{\sqrt{1 - \beta_2^t}}{(1 - \beta_1^t)(\sqrt{\hat{v_t}} + \epsilon)}, \eta_l(t), \eta_u(t))\\ \hat{v_t} = \max(\hat{v_{t-1}}, v_t)\\ \eta_l(t) = (1 - (1/((1-\gamma)t+1)))\alpha^*\\ \eta_u(t) = (1 + (1/((1-\gamma)t)))\alpha^*\end{split}\]
where \(\alpha^*\) (
final_lr
) is scaled by a factor defined as the current value of \(\alpha\) (set by
set_learning_rate(lr)
) over the initial value of \(\alpha\), so that learning rate scheduling is properly applied to both \(\alpha\) and \(\alpha^*\).
- Parameters
alpha (float) – Step size (\(\alpha\)).
beta1 (float) – Decay rate of first-order momentum (\(\beta_1\)).
beta2 (float) – Decay rate of second-order momentum (\(\beta_2\)).
eps (float) – Small value for avoiding zero division (\(\epsilon\)). Note this does not appear in the paper.
final_lr (float) – Final (SGD) learning rate.
gamma (float) – Convergence speed of the bound functions.
bias_correction (bool) – Apply bias correction to moving averages defined in ADAM. Note this does not appear in the paper.
- Returns
- An instance of Solver class.
See Solver API guide for details.
- Return type
Note
You can instantiate a preferred target implementation (ex. CUDA) of a Solver given a Context. A Context can be set by
nnabla.set_default_context(ctx)
or
nnabla.context_scope(ctx)
. See API docs.
References
- nnabla.solvers.AdamW(alpha=0.001, beta1=0.9, beta2=0.999, eps=1e-08, wd=0.0001)¶
ADAM optimizer with decoupled weight decay.
\[\begin{split}m_t &\leftarrow \beta_1 m_{t-1} + (1 - \beta_1) g_t\\ v_t &\leftarrow \beta_2 v_{t-1} + (1 - \beta_2) g_t^2\\ w_{t+1} &\leftarrow w_t - \alpha \frac{\sqrt{1 - \beta_2^t}}{1 - \beta_1^t} \frac{m_t}{\sqrt{v_t} + \epsilon} - \eta_t\lambda w_t\end{split}\]
where \(g_t\) denotes a gradient, \(\lambda\) is the decoupled weight decay rate, and \(m_0 \leftarrow 0\) and \(v_0 \leftarrow 0\).
- Parameters
- Returns
- An instance of Solver class.
See Solver API guide for details.
- Return type
Note
You can instantiate a preferred target implementation (ex. CUDA) of a Solver given a Context. A Context can be set by
nnabla.set_default_context(ctx)
or
nnabla.context_scope(ctx)
. See API docs.
References
- nnabla.solvers.SgdW(lr=0.001, momentum=0.9, wd=0.0001)¶
Stochastic gradient descent (SGD) optimizer with decoupled weight decay.
\[\begin{split}v_t \leftarrow \gamma v_{t-1} + \eta g_t - (\eta / \eta_0)\lambda v_{t-1}\\ w_{t+1} \leftarrow w_t - v_t\end{split}\]
where \(\lambda\) is the decoupled weight decay rate.
- Parameters
- Returns
- An instance of Solver class.
See Solver API guide for details.
- Return type
Note
You can instantiate a preferred target implementation (ex. CUDA) of a Solver given a Context. A Context can be set by
nnabla.set_default_context(ctx)
or
nnabla.context_scope(ctx)
. See API docs.
References
Communicator¶
Communicator transfers parameters over the compute graphs.
This is an alias to communicator.py.
Communicator interface¶
- class nnabla.communicators.Communicator¶
Communicator interface class.
Communicator exchanges data (e.g., gradients) using MPI-like collectives. This class is used for distributed training.
- abort(self)¶
Terminates MPI execution environment
- add_context_and_parameters(self, ctx_param_dict)¶
Add context and parameters.
- all_gather(self, ndarray, ndarray_list, string group='world')¶
All-gather over data on different devices.
- Parameters
ndarray (
NdArray
) – Data to be gathered.
ndarray_list (
NdArray
) – Data to be saved.
group (string) – Name of a group. This group is used when the collective is called.
Example:
# Run like `mpirun -n 2 python <code_snippet.py>`
# note: the order of the output to stdout is stochastic because of multiprocesses.

# Communicator and Context
import numpy as np
import nnabla as nn
import nnabla.communicators as C
from nnabla.ext_utils import get_extension_context

extension_module = "cudnn"
ctx = get_extension_context(extension_module)
comm = C.MultiProcessCommunicator(ctx)
comm.init()

# Data
x = nn.Variable([2, 2])
x.d = np.random.rand(*x.shape)
y_list = [nn.Variable([2, 2]), nn.Variable([2, 2])]
print("Before the collective ({}-th)".format(comm.rank))
print(x.d)

# AllGather
comm.all_gather(x.data, [y.data for y in y_list])

# Check
print("After the collective ({}-th)".format(comm.rank))
for y in y_list:
    print(y.d)
- all_reduce(self, data, bool division=False, bool inplace=False, string group='world')¶
All-reduce over data on different devices.
- Parameters
data (
NdArray
or list of
NdArray
) –
division (bool) – Flag to divide the reduced data by the number of
contexts
added, or the number of devices.
inplace (bool) – Flag to use a packed array. Default is false. When true, it is memory-efficient but slow. When false, it is not memory-efficient but fast. In both cases, one can get the result in the same memory region.
group (string) – Name of a group. This group is used when the collective is called.
Example:
# Run like `mpirun -n 2 python <code_snippet.py>`
# note: the order of the output to stdout is stochastic because of multiprocesses.

# Communicator and Context
import numpy as np
import nnabla as nn
import nnabla.communicators as C
from nnabla.ext_utils import get_extension_context

extension_module = "cudnn"
ctx = get_extension_context(extension_module)
comm = C.MultiProcessCommunicator(ctx)
comm.init()

# Data
x_list = [nn.Variable([2, 2]), nn.Variable([2, 2])]
print("Before the collective ({}-th)".format(comm.rank))
for x in x_list:
    x.d = np.random.rand(*x.shape)
    print(x.d)

# AllReduce
comm.all_reduce([x.data for x in x_list], inplace=True)

# Check
print("After the collective ({}-th)".format(comm.rank))
for x in x_list:
    print(x.d)
- all_reduce_callback(self, data, size_t pack_size, bool division=False, string group='world')¶
All-reduce over data on different devices.
Note
This function does not support shared parameters (such as RNNs) currently.
- Parameters
Example:
In case of the multi-process data parallel distributed training,
# Run like `mpirun -n 2 python <code_snippet.py>` # Communicator and Context import numpy as np import nnabla as nn import nnabla.communicators as C from nnabla.ext_utils import get_extension_context extension_module = "cudnn" ctx = get_extension_context(extension_module) comm = C.MultiProcessCommunicator(ctx) comm.init() n_class = 2 b, c, h, w = 4, 1, 32, 32 # Data x = nn.Variable([b, c, h, w]) y = nn.Variable([b, 1]) # Network setting h = PF.convolution(x, 1, (3, 3), (1, 1), (1, 1)) pred = PF.affine(h, 2) loss = F.mean(F.softmax_cross_entropy(pred, y)) loss.forward() # AllReduce during backward loss.backward(communicator_callbacks = comm.all_reduce_callback([v.grad for v in nn.get_parameters().values()], 1024 * 1024 * 2))
- allreduce(self, bool division=False, bool inplace=False)¶
Deprecated. See all_reduce, instead.
Allreduce over parameters added. Currently,
allreduce
is applied to gradient regions.- Parameters
division (bool) – Flag to divide the reduced data by the number of
contexts
added, or the number of devices.
inplace (bool) – Flag to use a packed array. Default is false. When true, it is memory-efficient but slow. When false, it is not memory-efficient but fast. In both cases, one can get the result in the same memory region.
- barrier(self)¶
Blocks until all processes in the communicator have reached this routine.
- bcast(self, data, int src, bool inplace=False, string group='world')¶
Broadcast data to different devices.
- Parameters
data (
NdArray
or list of
NdArray
) –
src (int) – Source rank where the data is broadcasted.
inplace (bool) – Flag to use a packed array. Default is false. When true, it is memory-efficient but slow. When false, it is not memory-efficient but fast. In both cases, one can get the result in the same memory region.
group (string) – Name of a group. This group is used when the collective is called.
Example:
# Run like `mpirun -n 2 python <code_snippet.py>`
# note: the order of the output to stdout is stochastic because of multiprocesses.

# Communicator and Context
import numpy as np
import nnabla as nn
import nnabla.communicators as C
from nnabla.ext_utils import get_extension_context

extension_module = "cudnn"
ctx = get_extension_context(extension_module)
comm = C.MultiProcessCommunicator(ctx)
comm.init()

# Data
x_list = [nn.Variable([2, 2]), nn.Variable([2, 2])]
print("Before the collective ({}-th)".format(comm.rank))
for x in x_list:
    x.d = np.random.rand(*x.shape)
    print(x.d)

# Bcast
comm.bcast([x.data for x in x_list], src=0, inplace=True)

# Check
print("After the collective ({}-th)".format(comm.rank))
for x in x_list:
    print(x.d)
- clear_context_parameters(self)¶
Clear all registered contexts and parameters.
- find_group(self, group)¶
Return the list of ranks in the group. If the group does not exist, an empty list is returned.
- init(self)¶
Initialize a communicator.
Performs init-all or init-rank, depending on whether multi-threads or multi-processes are used. This function MUST be called after all parameters to be communicated are added by
add_context_and_parameters
.
- local_rank¶
Get local rank of communicator.
- name¶
Get communicator name.
- new_group(self, name_ranks)¶
Example:
# Communicator and Context
extension_module = "cudnn"
ctx = get_extension_context(extension_module)
comm = C.MultiProcessCommunicator(ctx)
comm.init()

# New group
group = comm.new_group("node0", [0, 1, 2, 3])
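A created group can then be selected in the collectives through their group argument; a sketch (group name and ranks are illustrative):
comm.new_group("node0", [0, 1, 2, 3])
# Restrict the collective to the ranks in the "node0" group.
comm.all_reduce([v.grad for v in nn.get_parameters().values()], group="node0")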
- rank¶
Get rank of communicator.
- reduce(self, data, int dst, bool division=False, bool inplace=False, string group='world')¶
Reduce over data on different devices.
- Parameters
data (
NdArray
or list of
NdArray
) –
dst (int) – Destination rank where the result is saved.
division (bool) – Flag to divide the reduced data by the number of
contexts
added, or the number of devices.
inplace (bool) – Flag to use a packed array. Default is false. When true, it is memory-efficient but slow. When false, it is not memory-efficient but fast. In both cases, one can get the result in the same memory region.
group (string) – Name of a group. This group is used when the collective is called.
Example:
# Run like `mpirun -n 2 python <code_snippet.py>`
# note: the order of the output to stdout is stochastic because of multiprocesses.

# Communicator and Context
import numpy as np
import nnabla as nn
import nnabla.communicators as C
from nnabla.ext_utils import get_extension_context

extension_module = "cudnn"
ctx = get_extension_context(extension_module)
comm = C.MultiProcessCommunicator(ctx)
comm.init()

# Data
x_list = [nn.Variable([2, 2]), nn.Variable([2, 2])]
print("Before the collective ({}-th)".format(comm.rank))
for x in x_list:
    x.d = np.random.rand(*x.shape)
    print(x.d)

# Reduce
comm.reduce([x.data for x in x_list], dst=0, inplace=True)

# Check
print("After the collective ({}-th)".format(comm.rank))
for x in x_list:
    print(x.d)
- reduce_scatter(self, ndarray_list, ndarray, bool division=False, string group='world')¶
Reduce-scatter over data on different devices.
- Parameters
ndarray_list (
NdArray
) – List of data to be reduced over different devices.
ndarray (
NdArray
) – Data to be saved.
group (string) – Name of a group. This group is used when the collective is called.
Example:
# Run like `mpirun -n 2 python <code_snippet.py>`
# note: the order of the output to stdout is stochastic because of multiprocesses.

# Communicator and Context
import numpy as np
import nnabla as nn
import nnabla.communicators as C
from nnabla.ext_utils import get_extension_context

extension_module = "cudnn"
ctx = get_extension_context(extension_module)
comm = C.MultiProcessCommunicator(ctx)
comm.init()

# Data
x_list = [nn.Variable([2, 2]), nn.Variable([2, 2])]
y = nn.Variable([2, 2])
print("Before the collective ({}-th)".format(comm.rank))
for x in x_list:
    x.d = np.random.rand(*x.shape)
    print(x.d)

# ReduceScatter
comm.reduce_scatter([x.data for x in x_list], y.data)

# Check
print("After the collective ({}-th)".format(comm.rank))
print(y.d)
- size¶
Get size of communicator.
List of communicators¶
- nnabla.communicators.MultiProcessDataParalellCommunicator()¶
MultiProcessDataParallelCommunicator(CContext ctx)
Multi Process Data Parallel Communicator for Distributed Training.
- Parameters
context (
Context
) – context used in this communicator.
Example:
In case of the multi-process data parallel distributed training,
# Communicator and Context
extension_module = "cudnn"
ctx = get_extension_context(extension_module)
comm = C.MultiProcessCommunicator(ctx)
comm.init()
n_devices = comm.size
mpi_rank = comm.rank
device_id = comm.local_rank
ctx.device_id = str(device_id)
nn.set_default_context(ctx)

# Network and Solver created here

...

# Training loop
for itr in range(num_itr):
    # Forward, zerograd, backward
    loss.forward()
    solver.zero_grad()
    loss.backward()

    # Allreduce
    comm.all_reduce([v.grad for v in nn.get_parameters().values()])

    # Update
    solver.update()
Monitors¶
The Monitor API provides helpers for logging the progress of neural network training.
- class nnabla.monitor.Monitor(save_path)[source]¶
This class is created to set up the output directory of the monitoring logs. The created
nnabla.monitor.Monitor
instance is passed to classes in the following Monitors.
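A typical setup combines a Monitor with the series monitors listed below; a sketch (path and interval values are illustrative):
from nnabla.monitor import Monitor, MonitorSeries, MonitorTimeElapsed
monitor = Monitor('tmp.monitor')
monitor_loss = MonitorSeries('Training loss', monitor, interval=10)
monitor_time = MonitorTimeElapsed('Training time', monitor, interval=100)
for itr in range(num_itr):
    # ... forward / backward / update ...
    monitor_loss.add(itr, loss.d.copy())
    monitor_time.add(itr)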
List of Monitors¶
- class nnabla.monitor.MonitorSeries(name, monitor=None, interval=1, verbose=True)[source]¶
Logs a series of values.
The values are displayed and/or output to the file
<name>-series.txt
.
Example:
mons = MonitorSeries('mon', interval=2)
for i in range(10):
    mons.add(i, i * 2)
- Parameters
- class nnabla.monitor.MonitorTimeElapsed(name, monitor=None, interval=100, verbose=True)[source]¶
Logs the elapsed time.
The values are displayed and/or output to the file
<name>-timer.txt
.
Example:
import time
mont = MonitorTimeElapsed("time", interval=2)
for i in range(10):
    time.sleep(1)
    mont.add(i)
- Parameters
- class nnabla.monitor.MonitorImage(name, monitor, interval=1000, verbose=True, num_images=16, normalize_method=None)[source]¶
Saves a series of images.
The
.add()
method takes a
(N,..., C, H, W)
array as an input, and
num_images
of
[H, W, :min(3, C)]
are saved into the monitor folder for each interval. The values are displayed and/or output to the file
<name>/{iter}-{image index}.png
.
Example:
import numpy as np
m = Monitor('tmp.monitor')
mi = MonitorImage('noise', m, interval=2, num_images=2)
x = np.random.randn(10, 3, 8, 8)
for i in range(10):
    mi.add(i, x)
- Parameters
name (str) – Name of the monitor. Used in the log.
monitor (Monitor) – Monitor class instance.
interval (int) – Interval of flushing the outputs.
num_images (int) – Number of images to be saved in each iteration.
normalize_method (function) – A function that takes a NCHW format image minibatch as
numpy.ndarray
. The function should define a normalizer which maps any input to the range [0, 1]. The default normalizer applies min-max normalization to the images.
- class nnabla.monitor.MonitorImageTile(name, monitor, interval=1000, verbose=True, num_images=16, normalize_method=None)[source]¶
Saves a series of tiled images.
The
.add()
method takes a
(N,..., C, H, W)
array as an input, and
num_images
tiled
(H, W, :min(3, C))
images are saved into the monitor folder for each interval. The values are displayed and/or output to the file
<name>/{iter}-{image index}.png
.
Example:
import numpy as np
m = Monitor('tmp.monitor')
mi = MonitorImageTile('noise_noise', m, interval=2, num_images=4)
x = np.random.randn(10, 3, 8, 8)
for i in range(10):
    mi.add(i, x)
- Parameters
name (str) – Name of the monitor. Used in the log.
monitor (Monitor) – Monitor class instance.
interval (int) – Interval of flushing the outputs.
num_images (int) – Number of images tiled to be saved into a single image in each iteration.
normalize_method (function) – A function that takes a NCHW format image minibatch as
numpy.ndarray
. The function should define a normalizer which maps any input to the range [0, 1]. The default normalizer applies min-max normalization to the images.
Utility functions¶
- nnabla.monitor.tile_images(data, padsize=1, padval=0)[source]¶
Convert an array with shape of (B, C, H, W) into a tiled image.
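A small sketch combining tile_images with imsave (shapes and file name are illustrative):
import numpy as np
from nnabla.monitor import tile_images
from nnabla.utils.image_utils import imsave
batch = np.random.rand(16, 3, 8, 8)  # (B, C, H, W) with values in [0, 1]
tiled = tile_images(batch)           # tiled image in (H', W', C) layout
imsave('tiled.png', tiled)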
- nnabla.monitor.plot_series(filename, plot_kwargs=None)[source]¶
Plot series data from MonitorSeries output text file.
- Parameters
filename (str) – Path to *.series.txt file produced by
MonitorSeries
class.
plot_kwargs (dict, optional) – Keyword arguments passed to matplotlib.pyplot.plot().
Note
matplotlib package is required.
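A sketch, assuming a MonitorSeries named 'mon' has written its log under 'tmp.monitor':
import matplotlib.pyplot as plt
from nnabla.monitor import plot_series
plot_series('tmp.monitor/mon-series.txt', plot_kwargs={'label': 'mon'})
plt.legend()
plt.show()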
Utils¶
NNP save and load utilities¶
IMPORTANT NOTICE: To handle NNP file from Neural Network Console, if the network you want to save/load contains LoopControl
functions RepeatStart, RepeatEnd, RecurrentInput, RecurrentOutput or Delay, you must expand the network with the File format converter.
- nnabla.utils.save.save(filename, contents, include_params=False, variable_batch_size=True, extension='.nnp', parameters=None)[source]¶
Save network definition, inference/training execution configurations etc.
- Parameters
filename (str or file object) –
Filename to store information. The file extension is used to determine the saving file format.
.nnp
: (Recommended) Creating a zip archive with nntxt (network definition etc.) and h5 (parameters).
.nntxt
: Protobuf in text format.
.protobuf
: Protobuf in binary format (unsafe in terms of backward compatibility).
contents (dict) – Information to store.
include_params (bool) – Includes parameters into a single file. This is ignored when the extension of filename is nnp.
variable_batch_size (bool) – By
True
, the first dimension of all variables is considered as the batch size, and is left as a placeholder (more specifically
-1
). The placeholder dimension will be filled during/after loading.
extension – if filename is a file object, extension is one of “.nntxt”, “.prototxt”, “.protobuf”, “.h5”, “.nnp”.
Example
The following example creates an MLP with two inputs and two outputs, and saves the network structure and the initialized parameters.
import nnabla as nn
import nnabla.functions as F
import nnabla.parametric_functions as PF
from nnabla.utils.save import save

batch_size = 16
x0 = nn.Variable([batch_size, 100])
x1 = nn.Variable([batch_size, 100])
h1_0 = PF.affine(x0, 100, name='affine1_0')
h1_1 = PF.affine(x1, 100, name='affine1_0')
h1 = F.tanh(h1_0 + h1_1)
h2 = F.tanh(PF.affine(h1, 50, name='affine2'))
y0 = PF.affine(h2, 10, name='affiney_0')
y1 = PF.affine(h2, 10, name='affiney_1')
contents = {
    'networks': [
        {'name': 'net1',
         'batch_size': batch_size,
         'outputs': {'y0': y0, 'y1': y1},
         'names': {'x0': x0, 'x1': x1}}],
    'executors': [
        {'name': 'runtime',
         'network': 'net1',
         'data': ['x0', 'x1'],
         'output': ['y0', 'y1']}]}
save('net.nnp', contents)
To get a trainable model, use the following code instead.
contents = {
    'global_config': {'default_context': ctx},
    'training_config':
        {'max_epoch': args.max_epoch,
         'iter_per_epoch': args_added.iter_per_epoch,
         'save_best': True},
    'networks': [
        {'name': 'training',
         'batch_size': args.batch_size,
         'outputs': {'loss': loss_t},
         'names': {'x': x, 'y': t, 'loss': loss_t}},
        {'name': 'validation',
         'batch_size': args.batch_size,
         'outputs': {'loss': loss_v},
         'names': {'x': x, 'y': t, 'loss': loss_v}}],
    'optimizers': [
        {'name': 'optimizer',
         'solver': solver,
         'network': 'training',
         'dataset': 'mnist_training',
         'weight_decay': 0,
         'lr_decay': 1,
         'lr_decay_interval': 1,
         'update_interval': 1}],
    'datasets': [
        {'name': 'mnist_training',
         'uri': 'MNIST_TRAINING',
         'cache_dir': args.cache_dir + '/mnist_training.cache/',
         'variables': {'x': x, 'y': t},
         'shuffle': True,
         'batch_size': args.batch_size,
         'no_image_normalization': True},
        {'name': 'mnist_validation',
         'uri': 'MNIST_VALIDATION',
         'cache_dir': args.cache_dir + '/mnist_test.cache/',
         'variables': {'x': x, 'y': t},
         'shuffle': False,
         'batch_size': args.batch_size,
         'no_image_normalization': True}],
    'monitors': [
        {'name': 'training_loss',
         'network': 'validation',
         'dataset': 'mnist_training'},
        {'name': 'validation_loss',
         'network': 'validation',
         'dataset': 'mnist_validation'}],
}
- class nnabla.utils.nnp_graph.NnpLoader(filepath, scope=None, extension='.nntxt')[source]¶
An NNP file loader.
- Parameters
filepath – file-like object or filepath.
extension – if filepath is file-like object, extension is one of “.nnp”, “.nntxt”, “.prototxt”.
Example
import numpy as np
from nnabla.utils.nnp_graph import NnpLoader

# Read a .nnp file.
nnp = NnpLoader('/path/to/nnp.nnp')
# Assume a graph `graph_a` is in the nnp file.
net = nnp.get_network(network_name, batch_size=1)
# `x` is an input of the graph.
x = net.inputs['x']
# 'y' is an output of the graph.
y = net.outputs['y']
# Set random data as input and perform forward prop.
x.d = np.random.randn(*x.shape)
y.forward(clear_buffer=True)
print('output:', y.d)
- class nnabla.utils.nnp_graph.NnpNetwork(proto_network, batch_size, callback)[source]¶
A graph object which is read from nnp file.
An instance of NnpNetwork is usually created by an NnpLoader instance. See an example usage described in
NnpLoader
.- variables¶
A dict of all variables in a created graph with a variable name as a key, and a nnabla.Variable as a value.
- Type
Image Utils¶
This module provides read, write and resize functions for images. The backends of these functions are automatically selected, depending on the user's environment. The priority of the backends is as below (upper is higher priority):
OpenCV (cv2)
scikit-image (skimage)
pillow (PIL) (needs to be installed)
At least one of these modules needs to be installed to use this module.
- nnabla.utils.image_utils.imread(path, grayscale=False, size=None, interpolate='bilinear', channel_first=False, as_uint16=False, num_channels=- 1, **kwargs)[source]¶
Read image from
path
. If you specify the
size
, the output array is resized. The default output shape is (height, width, channel) for an RGB image and (height, width) for a gray-scale image.
- Parameters
path (String or File Object) – Input image path.
grayscale (bool) – If True, the img is rescaled to gray-scale. Default is False.
size (tuple of int) – Output shape. The order is (width, height). If None, the image is not resized. Default is None.
interpolate (str) –
Interpolation method. The available methods depend on the backend. If you specify this argument, pay attention to which backend is in use. The selectable methods are listed below:
pil backend: [“nearest”, “box”, “bilinear”, “hamming”, “bicubic”, “lanczos”].
cv2 backend: [“nearest”, “bilinear”, “bicubic”, “lanczos”].
Default is “bilinear” for both backends.
channel_first (bool) – If True, the shape of the output array is (channel, height, width) for RGB image. Default is False.
as_uint16 (bool) – If True, this function tries to read img as np.uint16. Default is False.
num_channels (int) – channel size of output array. Default is -1 which preserves raw image shape.
return_palette_indices (bool) – This argument can be used only by the pil backend. On the pil backend, if this flag is True and the PIL.Image has the mode “P”, then this function returns a 2-D array containing the indices into the palette. Otherwise, a 3-D array of “RGB” or “RGBA” (depending on the image info) will be returned. Default value is False.
- Returns
If as_uint16=True, the output dtype is np.uint16, else np.uint8 (default).
- Return type
- nnabla.utils.image_utils.imsave(path, img, channel_first=False, as_uint16=False, auto_scale=True, **kwargs)[source]¶
Save
img
to the file specified by
path
. By default, the shape of
img
has to be (height, width, channel).
- Parameters
path (str) – Output path.
img (numpy.ndarray) – Input image. All pixel values must be positive and in the range [0, 255] of int for uint8, [0, 65535] of int for uint16 or [0, 1] for float. When you pass float image, you must set
auto_scale
as True (if not, an exception will be raised). If an image with negative values is passed as input, an exception will be raised.
channel_first (bool) – If True, you can input the image whose shape is (channel, height, width). Default is False.
as_uint16 (bool) – If True, cast image to uint16 before save. Default is False.
auto_scale (bool) – Whether the range of pixel values is scaled up or not. The range of upscaled pixel values depends on the output dtype, which is [0, 255] for uint8 and [0, 65535] for uint16.
- nnabla.utils.image_utils.imresize(img, size, interpolate='bilinear', channel_first=False, **kwargs)[source]¶
Resize
img
to
size
. By default, the shape of the input image has to be (height, width, channel).
- Parameters
img (numpy.ndarray) – Input image.
size (tuple of int) – Output shape. The order is (width, height).
interpolate (str) –
Interpolation method. The available methods depend on the backend. If you specify this argument, pay attention to which backend is in use. The selectable methods are listed below:
pil backend: [“nearest”, “box”, “bilinear”, “hamming”, “bicubic”, “lanczos”].
cv2 backend: [“nearest”, “bilinear”, “bicubic”, “lanczos”].
Default is “bilinear” for both backends.
channel_first (bool) – If True, the shape of the output array is (channel, height, width) for RGB image. Default is False.
- Returns
numpy.ndarray
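A round-trip sketch of the three functions (file names are illustrative):
from nnabla.utils.image_utils import imread, imresize, imsave
img = imread('input.png', num_channels=3)  # (H, W, 3) uint8 array
small = imresize(img, (64, 64))            # size is given as (width, height)
imsave('resized.png', small)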
Data Iterators¶
NNabla provides various utilities for using data for training.
DataSource¶
- class nnabla.utils.data_source.DataSource(shuffle=False, rng=None)[source]¶
Bases:
object
This class contains various properties and methods for the data source, which are utilized by DataIterator.
- Parameters
shuffle (bool) – Indicates whether the dataset is shuffled or not.
rng (None or
numpy.random.RandomState
) – Numpy random number generator.
- property shuffle¶
Whether dataset is shuffled or not.
- Returns
whether dataset is shuffled.
- Return type
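A minimal custom DataSource, following the pattern used in the NNabla tutorials, implements _get_data(), sets _size and _variables, and overrides reset(); the synthetic data here is illustrative:
import numpy as np
from nnabla.utils.data_source import DataSource

class TinyDataSource(DataSource):
    def __init__(self, n=100, shuffle=False, rng=None):
        super(TinyDataSource, self).__init__(shuffle=shuffle, rng=rng)
        self._images = np.random.randn(n, 3, 8, 8).astype(np.float32)
        self._labels = np.random.randint(0, 10, size=(n, 1))
        self._size = n
        self._variables = ('x', 'y')
        self.reset()

    def reset(self):
        # Reshuffle the order of examples at the beginning of each epoch.
        self._indexes = self._rng.permutation(self._size) \
            if self._shuffle else np.arange(self._size)
        super(TinyDataSource, self).reset()

    def _get_data(self, position):
        i = self._indexes[position]
        return self._images[i], self._labels[i]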
- class nnabla.utils.data_source.DataSourceWithFileCache(data_source, cache_dir=None, cache_file_name_prefix='cache', shuffle=False, rng=None)[source]¶
Bases:
nnabla.utils.data_source.DataSource
This class contains properties and methods for data source that can be read from cache files, which are utilized by data iterator.
- Parameters
data_source (
DataSource
) – Instance of DataSource class which provides data.
cache_dir (str) – Location of file_cache. If this value is None,
data_source.DataSourceWithFileCache
creates file caches implicitly on temporary directory and erases them all when data_iterator is finished. Otherwise,
data_source.DataSourceWithFileCache
keeps created cache. Default is None.
cache_file_name_prefix (str) – Beginning of the filenames of cache files. Default is ‘cache’.
shuffle (bool) – Indicates whether the dataset is shuffled or not.
rng (None or
numpy.random.RandomState
) – Numpy random number generator.
- property shuffle¶
Whether dataset is shuffled or not.
- Returns
whether dataset is shuffled.
- Return type
- class nnabla.utils.data_source.DataSourceWithMemoryCache(data_source, shuffle=False, rng=None)[source]¶
Bases:
nnabla.utils.data_source.DataSource
This class contains properties and methods for data source that can be read from memory cache, which is utilized by data iterator.
- Parameters
data_source (
DataSource
) – Instance of DataSource class which provides data.
shuffle (bool) – Indicates whether the dataset is shuffled or not.
rng (None or
numpy.random.RandomState
) – Numpy random number generator.
- property shuffle¶
Whether dataset is shuffled or not.
- Returns
whether dataset is shuffled.
- Return type
DataIterator¶
- class nnabla.utils.data_iterator.DataIterator(data_source, batch_size, rng=None, use_thread=True, epoch_begin_callbacks=[], epoch_end_callbacks=[], stop_exhausted=False)[source]¶
Bases:
object
Collect data from
data_source
and yields batches of data.
- Parameters
data_source (
DataSource
) – Instance of DataSource class which provides data for this class.
batch_size (int) – Size of data unit.
rng (None or
numpy.random.RandomState
) – Numpy random number generator.
use_thread (bool) – If
use_thread
is set to True, iterator will use another thread to fetch data. If
use_thread
is set to False, iterator will use current thread to fetch data.
epoch_begin_callbacks (list of functions) – An item is a function which takes an epoch index as an argument. These are called at the beginning of an epoch.
epoch_end_callbacks (list of functions) – An item is a function which takes an epoch index as an argument. These are called at the end of an epoch.
stop_exhausted (bool) – If
stop_exhausted
is set to False, iterator will be reset so that iteration can be continued. If
stop_exhausted
is set to True, iterator will raise StopIteration to stop the loop.
- property batch_size¶
Number of training samples that
next()
returns.
- Returns
Number of training samples.
- Return type
- property epoch¶
The number of times
position()
returns to zero.
- Returns
epoch
- Return type
- next()[source]¶
It generates a tuple of data.
For example, if
self._variables == ('x', 'y')
, this method returns
( [[X] * batch_size], [[Y] * batch_size] )
.
- Returns
tuple of data for mini-batch in numpy.ndarray.
- Return type
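A sketch of consuming minibatches in a training loop (di, x, t and loss are assumed to be created elsewhere):
for itr in range(max_iter):
    x_batch, y_batch = di.next()  # tuple of numpy.ndarray minibatches
    x.d, t.d = x_batch, y_batch
    loss.forward()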
- register_epoch_begin_callback(callback)[source]¶
Register epoch begin callback.
- Parameters
callback (function) – A function takes an epoch index as an argument.
- register_epoch_end_callback(callback)[source]¶
Register epoch end callback.
- Parameters
callback (function) – A function takes an epoch index as an argument.
- property size¶
Data size that DataIterator will generate. This is the largest integer multiple of batch_size not exceeding
self._data_source.size()
.
- Returns
Data size
- Return type
- slice(rng, num_of_slices=None, slice_pos=None, slice_start=None, slice_end=None, cache_dir=None, use_cache=False)[source]¶
Slices the data iterator so that newly generated data iterator has access to limited portion of the original data.
- Parameters
rng (numpy.random.RandomState) – Random generator for Initializer.
num_of_slices (int) – Total number of slices to be made. Must be used together with
slice_pos
.slice_pos (int) – Position of the slice to be assigned to the new data iterator. Must be used together with
num_of_slices
.slice_start (int) – Starting position of the range to be sliced into new data iterator. Must be used together with
slice_end
.slice_end (int) – End position of the range to be sliced into new data iterator. Must be used together with
slice_start
cache_dir (str) – Directory to save cache files. If cache_dir is None and use_cache is True, a memory cache will be used.
use_cache (bool) – Whether to use a cache for data_source.
Example:
from nnabla.utils.data_iterator import data_iterator_simple
import numpy as np

def load_func1(index):
    d = np.ones((2, 2)) * index
    return d

di = data_iterator_simple(load_func1, 1000, batch_size=3)

di_s1 = di.slice(None, num_of_slices=10, slice_pos=0)
di_s2 = di.slice(None, num_of_slices=10, slice_pos=1)

di_s3 = di.slice(None, slice_start=100, slice_end=200)
di_s4 = di.slice(None, slice_start=300, slice_end=400)
Utilities¶
- nnabla.utils.data_iterator.data_iterator(data_source, batch_size, rng=None, use_thread=True, with_memory_cache=True, with_file_cache=False, cache_dir=None, epoch_begin_callbacks=[], epoch_end_callbacks=[], stop_exhausted=False)[source]¶
Helper method to use
DataSource
.You can use
DataIterator
with your own
DataSource
for easy implementation of data sources.
For example,
ds = YourOwnImplementationOfDataSource()
batch = data_iterator(ds, batch_size)
- Parameters
data_source (
DataSource
) – Instance of DataSource class which provides data.
batch_size (int) – Batch size.
rng (None or
numpy.random.RandomState
) – Numpy random number generator.
use_thread (bool) – If
use_thread
is set to True, iterator will use another thread to fetch data. If
use_thread
is set to False, iterator will use current thread to fetch data.
with_memory_cache (bool) – If
True
, use
data_source.DataSourceWithMemoryCache
to wrap
data_source
. It is a good idea to set this as true unless data_source provides on-memory data. Default value is True.
with_file_cache (bool) – If
True
, use
data_source.DataSourceWithFileCache
to wrap
data_source
. If
data_source
is slow, enabling this option is a good idea. Default value is False.
cache_dir (str) – Location of file_cache. If this value is None,
data_source.DataSourceWithFileCache
creates file caches implicitly on temporary directory and erases them all when data_iterator is finished. Otherwise,
data_source.DataSourceWithFileCache
keeps created cache. Default is None.
epoch_begin_callbacks (list of functions) – An item is a function which takes an epoch index as an argument. These are called at the beginning of an epoch.
epoch_end_callbacks (list of functions) – An item is a function which takes an epoch index as an argument. These are called at the end of an epoch.
stop_exhausted (bool) – If
stop_exhausted
is set to False, iterator will be reset so that iteration can be continued. If
stop_exhausted
is set to True, iterator will raise StopIteration to stop the loop.
- Returns
Instance of DataIterator.
- Return type
- nnabla.utils.data_iterator.data_iterator_simple(load_func, num_examples, batch_size, shuffle=False, rng=None, use_thread=True, with_memory_cache=False, with_file_cache=False, cache_dir=None, epoch_begin_callbacks=[], epoch_end_callbacks=[], stop_exhausted=False)[source]¶
A generator that
yield
s minibatch data as a tuple, as defined inload_func
. It can unlimitedly yield minibatches at your request, queried from the provided data.
- Parameters
load_func (function) – Takes a single argument
i
, an index of an example in your dataset to be loaded, and returns a tuple of data. Every call by any index
i
must return a tuple of arrays with the same shape.
num_examples (int) – Number of examples in your dataset. Random sequence of indexes is generated according to this number.
batch_size (int) – Size of data unit.
shuffle (bool) – Indicates whether the dataset is shuffled or not. Default value is False.
rng (None or
numpy.random.RandomState
) – Numpy random number generator.
use_thread (bool) – If
use_thread
is set to True, iterator will use another thread to fetch data. If
use_thread
is set to False, iterator will use current thread to fetch data.
with_memory_cache (bool) – If
True
, use
data_source.DataSourceWithMemoryCache
to wrap
data_source
. It is a good idea to set this as true unless data_source provides on-memory data. Default value is False.
with_file_cache (bool) – If
True
, use
data_source.DataSourceWithFileCache
to wrap
data_source
. If
data_source
is slow, enabling this option is a good idea. Default value is False.
cache_dir (str) – Location of file_cache. If this value is None,
data_source.DataSourceWithFileCache
creates file caches implicitly on temporary directory and erases them all when data_iterator is finished. Otherwise,
data_source.DataSourceWithFileCache
keeps created cache. Default is None.
epoch_begin_callbacks (list of functions) – An item is a function which takes an epoch index as an argument. These are called at the beginning of an epoch.
epoch_end_callbacks (list of functions) – An item is a function which takes an epoch index as an argument. These are called at the end of an epoch.
stop_exhausted (bool) – If
stop_exhausted
is set to False, iterator will be reset so that iteration can be continued. If
stop_exhausted
is set to True, iterator will raise StopIteration to stop the loop.
- Returns
Instance of DataIterator.
- Return type
Here is an example of
load_func
which returns an image and a label of a classification dataset.
import numpy as np
from nnabla.utils.image_utils import imread

image_paths = load_image_paths()
labels = load_labels()

def my_load_func(i):
    '''
    Returns:
        image: c x h x w array
        label: 0-shape array
    '''
    img = imread(image_paths[i]).astype('float32')
    return np.rollaxis(img, 2), np.array(labels[i])
- nnabla.utils.data_iterator.data_iterator_csv_dataset(uri, batch_size, shuffle=False, rng=None, use_thread=True, normalize=True, with_memory_cache=True, with_file_cache=True, cache_dir=None, epoch_begin_callbacks=[], epoch_end_callbacks=[], stop_exhausted=False)[source]¶
Get data directly from a dataset provided as a CSV file.
You can read files located on the local file system, http(s) servers or Amazon AWS S3 storage.
For example,
batch = data_iterator_csv_dataset('CSV_FILE.csv', batch_size, shuffle=True)
- Parameters
uri (str) – Location of dataset CSV file.
batch_size (int) – Size of data unit.
shuffle (bool) – Indicates whether the dataset is shuffled or not. Default value is False.
rng (None or
numpy.random.RandomState
) – Numpy random number generator.
use_thread (bool) – If
use_thread
is set to True, iterator will use another thread to fetch data. If
use_thread
is set to False, iterator will use current thread to fetch data.
normalize (bool) – If True, each sample in the data gets normalized by a factor of 255. Default is True.
with_memory_cache (bool) – If
True
, use
data_source.DataSourceWithMemoryCache
to wrap
data_source
. It is a good idea to set this as true unless data_source provides on-memory data. Default value is True.
with_file_cache (bool) – If
True
, use
data_source.DataSourceWithFileCache
to wrap
data_source
. If
data_source
is slow, enabling this option is a good idea. Default value is True.
cache_dir (str) – Location of file_cache. If this value is None,
data_source.DataSourceWithFileCache
creates file caches implicitly on temporary directory and erases them all when data_iterator is finished. Otherwise,
data_source.DataSourceWithFileCache
keeps created cache. Default is None.
epoch_begin_callbacks (list of functions) – An item is a function which takes an epoch index as an argument. These are called at the beginning of an epoch.
epoch_end_callbacks (list of functions) – An item is a function which takes an epoch index as an argument. These are called at the end of an epoch.
stop_exhausted (bool) – If
stop_exhausted
is set to False, iterator will be reset so that iteration can be continued. If
stop_exhausted
is set to True, iterator will raise StopIteration to stop the loop.
- Returns
Instance of DataIterator
- Return type
- nnabla.utils.data_iterator.data_iterator_cache(uri, batch_size, shuffle=False, rng=None, use_thread=True, normalize=True, with_memory_cache=True, epoch_begin_callbacks=[], epoch_end_callbacks=[], stop_exhausted=False)[source]¶
Get data from the cache directory.
Cache files are read from the local file system.
For example,
batch = data_iterator_cache('CACHE_DIR', batch_size, shuffle=True)
- Parameters
uri (str) – Location of directory with cache files.
batch_size (int) – Size of data unit.
shuffle (bool) – Indicates whether the dataset is shuffled or not. Default value is False.
rng (None or
numpy.random.RandomState
) – Numpy random number generator.
use_thread (bool) – If
use_thread
is set to True, iterator will use another thread to fetch data. If
use_thread
is set to False, iterator will use current thread to fetch data.
normalize (bool) – If True, each sample in the data gets normalized by a factor of 255. Default is True.
with_memory_cache (bool) – If
True
, use
data_source.DataSourceWithMemoryCache
to wrap
data_source
. It is a good idea to set this as true unless data_source provides on-memory data. Default value is True.
epoch_begin_callbacks (list of functions) – An item is a function which takes an epoch index as an argument. These are called at the beginning of an epoch.
epoch_end_callbacks (list of functions) – An item is a function which takes an epoch index as an argument. These are called at the end of an epoch.
stop_exhausted (bool) – If
stop_exhausted
is set to False, iterator will be reset so that iteration can be continued. If
stop_exhausted
is set to True, iterator will raise StopIteration to stop the loop.
- Returns
Instance of DataIterator
- Return type
- nnabla.utils.data_iterator.data_iterator_concat_datasets(data_source_list, batch_size, shuffle=False, rng=None, use_thread=True, with_memory_cache=True, with_file_cache=False, cache_dir=None, epoch_begin_callbacks=[], epoch_end_callbacks=[], stop_exhausted=False)[source]¶
Get data from multiple datasets.
For example,
batch = data_iterator_concat_datasets([DataSource0, DataSource1, ...], batch_size)
- Parameters
data_source_list (list of DataSource) – list of datasets.
batch_size (int) – Size of data unit.
shuffle (bool) – Indicates whether the dataset is shuffled or not. Default value is False.
rng (None or
numpy.random.RandomState
) – Numpy random number generator.
use_thread (bool) – If
use_thread
is set to True, iterator will use another thread to fetch data. If
use_thread
is set to False, iterator will use current thread to fetch data.
with_memory_cache (bool) – If
True
, use
data_source.DataSourceWithMemoryCache
to wrap
data_source
. It is a good idea to set this as true unless data_source provides on-memory data. Default value is True.
with_file_cache (bool) – If
True
, use
data_source.DataSourceWithFileCache
to wrap
data_source
. If
data_source
is slow, enabling this option is a good idea. Default value is False.
cache_dir (str) – Location of file_cache. If this value is None,
data_source.DataSourceWithFileCache
creates file caches implicitly on temporary directory and erases them all when data_iterator is finished. Otherwise,
data_source.DataSourceWithFileCache
keeps created cache. Default is None.
epoch_begin_callbacks (list of functions) – An item is a function which takes an epoch index as an argument. These are called at the beginning of an epoch.
epoch_end_callbacks (list of functions) – An item is a function which takes an epoch index as an argument. These are called at the end of an epoch.
stop_exhausted (bool) – If
stop_exhausted
is set to False, iterator will be reset so that iteration can be continued. If
stop_exhausted
is set to True, iterator will raise StopIteration to stop the loop.
- Returns
Instance of DataIterator
- Return type
Debug Utils¶
Graph Profiler¶
- class nnabla.utils.profiler.GraphProfiler(graph, device_id, ext_name, solver=None, n_run=100, max_measure_execution_time=1, time_scale='m', backward_accum=False)[source]¶
Class for measuring the calculation time of each function which composes an nnabla computation graph.
You can check the performance of your nnabla network. This can measure the calculation times of:
function-wise forward
function-wise backward
whole graph forward
whole graph backward
training (forward + backward + update) (if
solver
is not None)
Example:
import nnabla as nn
import nnabla.functions as F
import nnabla.solvers as S
from nnabla.ext_utils import get_extension_context
from nnabla.utils.profiler import GraphProfiler

# Set up nnabla context
device = "cpu"  # you can also use GPU ("cudnn")
ctx = get_extension_context(device)
nn.set_default_context(ctx)

# Network building
x = nn.Variable(shape=...)
t = nn.Variable(shape=...)
y = CNN(x)  # you can build not only CNN but any networks
loss = F.mean(F.softmax_cross_entropy(y, t))  # any loss functions or variables can be used

# Solver setting
solver = S.Sgd()
solver.set_parameters(nn.get_parameters())

# SOME CODE (data loading or so on)

B = GraphProfiler(loss, solver=solver, device_id=0, ext_name=device, n_run=1000)
B.run()
- Parameters
graph (
nnabla.Variable
) – Instance of
nnabla.Variable
class. GraphProfiler finds all functions which compose the network graph from the root
nnabla.Variable
to this
nnabla.Variable
.
device_id (str) – GPU device id.
ext_name (str) – Extension name. e.g. ‘cpu’, ‘cuda’, ‘cudnn’ etc.
solver (
nnabla.solvers.Solver
) – Instance of
nnabla.solvers.Solver
for optimizing the parameters of the computation graph. If None, the training process is ignored. Default value is None.
n_run (int) – This argument specifies how many times each function's execution time is measured. Default value is 100.
max_measure_execution_time (float) – Maximum time of executing measurement for each function. This argument has higher priority than
n_run
. When the measurement time for a function exceeds this value, this class stops measuring and moves on to the next function, unless the total number of measurements is less than n_run. Default value is 1 [sec].
time_scale (str) – Time scale to display. [‘m’, ‘u’, ‘n’] (which stand for ‘milli’, ‘micro’ and ‘nano’).
backward_accum (bool) – Accumulation flag passed to each backward function. The flag will fill all accumulation flags with the same value of backward_accum. This flag is only valid for the time measurement of each function. For whole-graph computation, the NNabla graph engine sets the appropriate accumulation flags to functions. Pay attention to the inplace flag in your graph because accumulation and inplace flags cannot be set at the same time. If even one inplace flag is true in your graph, backward_accum must be false. Default value is False.
- class nnabla.utils.profiler.GraphProfilerCsvWriter(gb, file=<_io.TextIOWrapper name='<stdout>' mode='w' encoding='UTF-8'>)[source]¶
csv writer for GraphProfiler class.
Example:
from nnabla.utils.profiler import GraphProfiler, GraphProfilerCsvWriter

# Network building comes above

B = GraphProfiler(variable, solver=solver, device_id=0, ext_name=device, n_run=1000)
B.run()

with open("./profile.csv", "w") as f:
    writer = GraphProfilerCsvWriter(B, file=f)
    writer.write()
- Parameters
gb (
GraphProfiler
) – Instance of GraphProfiler class, which is the main executor of profiling.
file (Python file object) – Output file object. Profile results will be written to the file which is specified by this argument.
Time Profiler¶
- class nnabla.utils.inspection.profile.TimeProfiler(ext_name, device_id)[source]¶
A utility API to create function_hook callbacks to profile the execution time of each function. Passing
ext_name
and
device_id
, you can define which device time you want to profile. If
ext_name
= “cuda” or “cudnn”, then cudaEvent will be used to measure the execution time. For more information about cudaEvent, see the CUDA document. If
ext_name
= “cpu”, then wall-clock time on the host will be used.
Example:
ext_name = "cpu"
device_id = "0"

from nnabla.ext_utils import get_extension_context
ctx = get_extension_context(ext_name, device_id=device_id)
nn.set_default_context(ctx)

y = model(...)

from nnabla.utils.inspection import TimeProfiler
tp = TimeProfiler(ext_name=ext_name, device_id=device_id)

for i in range(max_iter):
    # All results of executions under "forward" scope are registered as "forward" execution.
    with tp.scope("forward"):
        y.forward(function_pre_hook=tp.pre_hook, function_post_hook=tp.post_hook)

    # All results of executions under "backward" scope are registered as "backward" execution.
    with tp.scope("backward") as tp:
        y.backward(function_pre_hook=tp.pre_hook, function_post_hook=tp.post_hook)

    # All results are evaluated by passing scopes to .calc_elapsed_time().
    # Be sure to call calc_elapsed_time at each iteration, otherwise nothing is measured.
    tp.calc_elapsed_time(["forward", "backward", "summary"])

# To output results on stdout, call instance as a function.
tp()

# To write out as csv file, call .to_csv().
tp.to_csv(output_file_name)
- calc_elapsed_time(names=None)[source]¶
Evaluate all elapsed times. Note that elapsed time is not recorded until calc_elapsed_time is called.
- Parameters
names (str or list of str) – Scope name(s) to evaluate elapsed time.
- property post_hook¶
Get a callback for function_post_hook. This function can be used like the example below:
tp = TimeProfiler(...)

with tp.scope("forward"):
    v.forward(function_post_hook=tp.post_hook)

with tp.scope("backward"):
    v.backward(function_post_hook=tp.post_hook)
- property pre_hook¶
Get a callback for function_pre_hook. This function can be used like the example below:
tp = TimeProfiler(...)

with tp.scope("forward"):
    v.forward(function_pre_hook=tp.pre_hook)

with tp.scope("backward"):
    v.backward(function_pre_hook=tp.pre_hook)
- scope(scope_name)[source]¶
Change a scope to aggregate results. This function is used as a context manager (the with statement), and all results under the context are labeled by scope_name.
In addition to the execution time of each function, the elapsed time between entering and exiting each context is also recorded, and aggregated in the "summary" scope.
- Parameters
scope_name (str) – Scope name.
Nan/Inf Tracer¶
- class nnabla.utils.inspection.value_trace.NanInfTracer(trace_nan=True, trace_inf=True, need_details=True)[source]¶
A utility API to create function_hook callbacks to check whether the outputs of all layers contain NaN or inf values. During forward and backward execution, passed as function_hook, this API raises ValueError if at least one layer output contains NaN or inf. Otherwise, all tensors are passed to the next layer or function as is.
Example:
pred = model(...)

from nnabla.utils.inspection import NanInfTracer
nit = NanInfTracer(trace_inf=True, trace_nan=True, need_details=True)

with nit.trace():
    pred.forward(function_post_hook=nit.forward_post_hook)
    pred.backward(function_post_hook=nit.backward_post_hook)
- property backward_post_hook¶
Create callback function object which can be used as a function_post_hook argument of backward().
- check()[source]¶
Checks for nan/inf existence at all outputs of all layers and raises ValueError only if any exist.
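If the trace() context manager is not used, a minimal sketch of triggering the check manually might look like the following (pred and nit are assumed to be defined as in the example above):
pred.forward(function_post_hook=nit.forward_post_hook)
pred.backward(function_post_hook=nit.backward_post_hook)
# Without the trace() context manager, the check must be called explicitly.
nit.check()  # Raises ValueError if any output contains nan/inf.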
- property forward_post_hook¶
Create callback function object which can be used as a function_post_hook argument of forward().
- trace()[source]¶
Create a context manager to check nan/inf existence by using the with statement. With this context manager, the nan/inf check is performed automatically just before exiting the with scope. If you do not use this context manager, be sure to call .check() explicitly to check for nan/inf.
Example:
nit = NanInfTracer()
with nit.trace():
    pred.forward(function_post_hook=nit.forward_post_hook)
    pred.backward(function_post_hook=nit.backward_post_hook)
Pretty Printer¶
- class nnabla.utils.inspection.pretty_print.PrettyPrinter(summary=False, hidden=False)[source]¶
Pretty printer to print the graph structure used with the
visit
method of a Variable.
- nnabla.utils.inspection.pretty_print.pprint(v, forward=False, backward=False, summary=False, hidden=False, printer=False)[source]¶
Pretty print information of a graph from a root variable v.
Note that in order to print the summary statistics, this function stores, i.e., does not reuse, the intermediate buffers of a computation graph, increasing memory usage if either forward or backward is True.
- Parameters
v (nnabla.Variable) – Root variable.
forward (bool) – Call the forward method of the variable v.
backward (bool) – Call the backward method of the variable v.
summary (bool) – Print statistics of intermediate variables.
hidden (bool) – Store the intermediate input and output variables if True.
printer (bool) – Return the printer object if True.
Example:
pred = Model(...)

from nnabla.utils.inspection import pprint
pprint(pred, summary=True, forward=True, backward=True)
DLPack¶
Via a DLPack capsule, you can borrow a tensor from external software as an nnabla.NdArray, and can share an NdArray with external software.
- nnabla.utils.dlpack.from_dlpack(dlp, arr=None)¶
Decode a DLPack to an NdArray.
Example:
# Create a tensor in an external tool, and encode it as a DLPack.
import torch
from torch.utils.dlpack import to_dlpack
t = torch.ones((5, 5), dtype=torch.float32, device=torch.device('cuda'))
dlp = to_dlpack(t)

# Borrow the DLPack tensor as nnabla.NdArray
from nnabla.utils.dlpack import from_dlpack
arr = from_dlpack(dlp)
If you want to move the ownership of a DLPack to an existing NdArray:
from nnabla import NdArray

arr = NdArray()
from_dlpack(dlp, arr=arr)
- Parameters
dlp (PyCapsule) – A PyCapsule object of a DLManagedTensor (as "dltensor") whose internal memory is borrowed by a tensor of an external package. The ownership of the DLManagedTensor is moved to an NdArray object, and the PyCapsule object is marked as "used_dltensor" to inform that the ownership has been moved.
arr (NdArray) – If specified, the given DLPack is decoded into it. Otherwise, a new NdArray object is created and the DLPack is decoded into it.
- Returns
An NdArray object borrowing the DLPack tensor.
- Return type
NdArray
- nnabla.utils.dlpack.to_dlpack(a, dtype=None, ctx=None)¶
Returns a DLPack which owns the internal array object borrowed from a specified NdArray.
Example:
# Create a nnabla.NdArray in CUDA.
import numpy as np
import nnabla as nn
from nnabla.ext_utils import get_extension_context

ctx = get_extension_context('cudnn')
nn.set_default_context(ctx)
a = nn.NdArray.from_numpy_array(np.ones((5, 5), dtype=np.float32))
a.cast(np.float32, ctx)

# Expose as a DLPack.
from nnabla.utils.dlpack import to_dlpack
dlp = to_dlpack(a)

# Use the DLPack in PyTorch.
import torch
from torch.utils.dlpack import from_dlpack
t = from_dlpack(dlp)

# Changing the values in Torch is also reflected in nnabla
# because they share memory.
t.add_(1)
print(a.data)  # All values become 2.
- Parameters
a (NdArray) – An NdArray object. The internal array which was most recently modified or created will be encoded into a DLPack.
dtype (numpy.dtype) – If specified, an in-place cast operation may be performed before encoding it into a DLPack.
ctx (Context) – If specified, an in-place device transfer operation may be performed before encoding it into a DLPack.
- Returns
A PyCapsule object of a DLManagedTensor (as "dltensor") whose internal memory is borrowed by the specified NdArray.
- Return type
PyCapsule
RNN Utils¶
- class nnabla.utils.rnn.PackedSequence[source]¶
- Parameters
data (nnabla.Variable) – Packed sequence.
batch_sizes (nnabla.Variable) – Batch size for each time step; always resides in CPU.
sorted_indices (nnabla.Variable) – Sorted indices to reconstruct the original sequences.
unsorted_indices (nnabla.Variable) – Unsorted indices to reconstruct the original sequences.
- nnabla.utils.rnn.pad_sequence(sequences, batch_first=False, padding_value=0.0)[source]¶
Pad a list of variable-length Variables.
This method stacks a list of variable-length nnabla.Variables with the padding_value.
\(T_i\) is the length of the \(i\)-th Variable in the sequences. \(B\) is the batch size equal to the length of the sequences. \(T\) is the max of \(T_i\) for all \(i\). \(*\) is the remaining dimensions including none.
Note
This function must be used in the dynamic computation mode.
Example:
import numpy as np
import nnabla as nn
import nnabla.functions as F
import nnabla.utils.rnn as rnn_utils

nn.set_auto_forward(True)

l2v = lambda ldata: nn.Variable.from_numpy_array(np.asarray(ldata))
a = l2v([1, 1, 1, 1])
b = l2v([2, 2, 2])
c = l2v([2, 2, 2])
d = l2v([3, 3])
e = l2v([3, 3])
sequences = [a, b, c, d, e]

padded_sequence = rnn_utils.pad_sequence(sequences)
print(padded_sequence.d)
- Parameters
sequences (list of nnabla.Variable) – Sequence of variables of (\(T_i\), \(*\)) shape.
batch_first (bool) – If False, output is of (\(T\), \(B\), \(*\)) shape, otherwise (\(B\), \(T\), \(*\)).
padding_value (float) – Padding value.
- Returns
nnabla.Variable of (\(T\), \(B\), \(*\)) or (\(B\), \(T\), \(*\)) shape
- nnabla.utils.rnn.pack_padded_sequence(padded_sequence, lengths, batch_first=False, enforce_sorted=True)[source]¶
Pack padded variable-length sequences.
This method packs padded variable-length sequences.
\(T\) is the max length over the lengths of sequences. \(B\) is the batch size equal to the length of the sequences. \(*\) is the remaining dimensions including none.
Note
This function must be used in the dynamic computation mode.
Example:
import numpy as np
import nnabla as nn
import nnabla.functions as F
import nnabla.utils.rnn as rnn_utils

nn.set_auto_forward(True)

l2v = lambda ldata: nn.Variable.from_numpy_array(np.asarray(ldata))
a = l2v([1, 1, 1, 1])
b = l2v([2, 2, 2])
c = l2v([2, 2, 2])
d = l2v([3, 3])
e = l2v([3, 3])
sequences = [a, b, c, d, e]
lengths = l2v([seq.shape[0] for seq in sequences])

padded_sequence = rnn_utils.pad_sequence(sequences)
print(padded_sequence.d)

packed_sequence = rnn_utils.pack_padded_sequence(padded_sequence, lengths)
print(packed_sequence.data.d)
print(packed_sequence.batch_sizes.d)
- Parameters
padded_sequence (nnabla.Variable) – Padded sequence of (\(T \times B \times *\)) or (\(B \times T \times *\)) shape.
lengths (nnabla.Variable) – Sequence length for each batch; always resides in CPU.
batch_first (bool) – padded_sequence is of (\(T\), \(B\), \(*\)) shape if False, otherwise (\(B\), \(T\), \(*\)).
enforce_sorted (bool) – Sequences are sorted by length in decreasing order if True. Default is True.
- Returns
packed_sequence
- nnabla.utils.rnn.pack_sequence(sequences, batch_first=False, enforce_sorted=True)[source]¶
Pack a list of variable-length Variables.
This method packs a list of variable-length Variables.
\(T_i\) is the length of the \(i\)-th Variable in the sequences. \(T\) is the max of \(T_i\) for all \(i\). \(B\) is the batch size equal to the length of the sequences. \(*\) is the remaining dimensions including none.
Note
This function must be used in the dynamic computation mode.
Example:
import numpy as np
import nnabla as nn
import nnabla.functions as F
import nnabla.utils.rnn as rnn_utils

nn.set_auto_forward(True)

l2v = lambda ldata: nn.Variable.from_numpy_array(np.asarray(ldata))
a = l2v([3, 3])
b = l2v([2, 2, 2])
c = l2v([2, 2, 2])
d = l2v([1, 1, 1, 1])
e = l2v([3, 3])
sequences = [a, b, c, d, e]

packed_sequence = rnn_utils.pack_sequence(sequences, enforce_sorted=False)
print(packed_sequence.data.d)
print(packed_sequence.batch_sizes.d)
- Parameters
sequences (list of nnabla.Variable) – List of nnabla.Variables of (\(T_i\), \(*\)) shape.
enforce_sorted (bool) – Sequences are sorted by length in decreasing order if True. Default is True.
- Returns
packed_sequence
- Return type
PackedSequence
- nnabla.utils.rnn.pad_packed_sequence(sequence, batch_first=False, padding_value=0.0, total_length=None)[source]¶
Pad packed sequence.
This method unpacks the packed sequence and pads it; it is the inverse operation of pack_padded_sequence().
\(T_i\) is the length of the \(i\)-th Variable in the sequences. \(B\) is the batch size equal to the length of the sequences. \(T\) is the max of \(T_i\) for all \(i\). \(*\) is the remaining dimensions including none.
Note
This function must be used in the dynamic computation mode.
Example:
import numpy as np
import nnabla as nn
import nnabla.functions as F
import nnabla.utils.rnn as rnn_utils

nn.set_auto_forward(True)

l2v = lambda ldata: nn.Variable.from_numpy_array(np.asarray(ldata))
a = l2v([3, 3])
b = l2v([2, 2, 2])
c = l2v([2, 2, 2])
d = l2v([1, 1, 1, 1])
e = l2v([3, 3])
sequences = [a, b, c, d, e]

packed_sequence = rnn_utils.pack_sequence(sequences, enforce_sorted=False)
print(packed_sequence.data.d)
print(packed_sequence.batch_sizes.d)

padded_sequence, lengths = rnn_utils.pad_packed_sequence(packed_sequence)
print(padded_sequence.d)
print(lengths.d)
- Parameters
sequence (PackedSequence) – PackedSequence.
batch_first (bool) – If False, output is of (\(T\), \(B\), \(*\)) shape, otherwise (\(B\), \(T\), \(*\)).
padding_value (float) – Padding value.
total_length (int) – If not None, the outputs are padded up to the total_length. If the total_length is less than the max length in the sequences, an error is thrown. This is normally used in distributed training to align with the longest sequence in the distributed system.
- Returns
nnabla.Variable of (\(T\), \(B\), \(*\)) or (\(B\), \(T\), \(*\)) shape
Misc¶
Python function profiler utilities¶
- nnabla.utils.function_profile.profile(fn=None, condition=None, profile_class=<class 'cProfile.Profile'>, print_freq=0, sort_keys=None, print_restrictions=None)[source]¶
Decorate a function to profile it with a Python profiler such as cProfile.Profile.
Note: function here does not refer to nnabla's Function; it means a plain Python function.
- Parameters
fn (function) – A function to be profiled. If None is specified (default), it returns a new decorator function. This is used when you want to specify optional arguments of this decorator function.
condition (function) – A function object which takes the same inputs as the decorated function, and returns a boolean value. The decorated function is profiled only when the condition function returns True. By default, it always returns True, hence profiling is performed every time the decorated function is called.
profile_class (class) – A profiler class such as cProfile.Profile and profile.Profile. The default value is cProfile.Profile.
print_freq (int) – The profiling result is printed at function calls with an interval specified by print_freq. If 0 is specified (default), the profiling result is only printed at the end of the Python process unless decorated_func.profiler.print_stats() is called manually.
sort_keys (iterable) – A list or tuple of strings, which is passed to pstats.Stats.sort_stats() as arguments. The default is ('cumulative', 'time', 'calls').
print_restrictions (iterable) – A list or tuple which is passed to pstats.Stats.print_stats() as arguments. The default value is (40,), which means only 40 functions inside the decorated function are printed in the profiling result.
Returns: function
A decorated function. If fn is None, a new decorator function with the optional arguments specified is returned.
Example
By decorating a function as follows, the profiling result is printed at the end of the Python process.
from nnabla.utils import function_profile

@function_profile.profile
def foo(a, b, c=None, d=None):
    ...
If you want to manually print the profiling result so far, use FunctionProfile.print_stats() of the FunctionProfile object attached to the decorated function as the profiler attribute.
foo.profiler.print_stats()
If you want to profile the function only when a specific argument is passed, use the condition argument as follows.
def profile_only_if_c_is_not_none(a, b, c=None, d=None):
    return c is not None

@function_profile.profile(condition=profile_only_if_c_is_not_none)
def foo(a, b, c=None, d=None):
    ...
- class nnabla.utils.function_profile.FunctionProfile(fn, condition=None, profile_class=<class 'cProfile.Profile'>, print_freq=0, sort_keys=None, print_restrictions=None)[source]¶
Function profiler object.
This is usually not used directly by users. It's created via profile(), and attached to a decorated function object as the attribute profiler. See the profile function for details.
- print_stats(reset=True)[source]¶
Manually print profiling result.
- Parameters
reset (bool) – If False is specified, the profiling statistics so far are maintained. If True (default), reset_stats is called to reset the profiling statistics.
Extensions¶
NNabla offers easy extensibility for developers to add new device extensions.
The NNabla Python package officially supports the cudnn extension, which dramatically accelerates computation by leveraging NVIDIA CUDA GPUs with cuDNN computation primitives.
You can manually import extensions by:
import nnabla_ext.cudnn
See the Python Package Installation section to install the CUDA extension.
Utilities for extension¶
Utilities for NNabla extensions.
- nnabla.ext_utils.list_extensions()[source]¶
List available extensions.
Note
It may not work on some platforms/environments since it depends on the directory structure of the namespace packages.
- Returns: list of str
Names of available extensions.
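A small usage sketch (the printed list depends on the installed packages):
from nnabla.ext_utils import list_extensions

# Typically prints something like ['cpu', 'cuda', 'cudnn'], depending on installation.
print(list_extensions())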
- nnabla.ext_utils.import_extension_module(ext_name)[source]¶
Import an extension module by name.
The extension modules are installed under the
nnabla_ext
package as namespace packages. All extension modules provide a unified set of APIs.- Parameters
ext_name (str) – Extension name. e.g. ‘cpu’, ‘cuda’, ‘cudnn’ etc.
- Returns: module
A Python module of a particular NNabla extension.
Example
ext = import_extension_module('cudnn')
available_devices = ext.get_devices()
print(available_devices)
ext.device_synchronize(available_devices[0])
ext.clear_memory_cache()
- nnabla.ext_utils.get_extension_context(ext_name, **kw)[source]¶
Get the context of the specified extension.
All extension’s module must provide
context(**kw)
function.- Parameters
- Returns
The current extension context.
- Return type
Example
ctx = get_extension_context('cudnn', device_id='0', type_config='half')
nn.set_default_context(ctx)
APIs of extension modules¶
All extension modules must have the following functions.
- nnabla.ext_utils.context(*kw)¶
Returns a default context descriptor of the extension module. This method takes optional arguments depending on the extension. For example, in the cudnn extension, it takes device_id as an int to specify the GPU on which the computation runs.
- nnabla.ext_utils.device_synchronize(*kw)¶
This method is used to synchronize the device execution stream with respect to the host thread. For example, in CUDA, kernel execution is enqueued into a stream and executed asynchronously w.r.t. the host thread. This function is only valid on devices that use such features. In the CPU implementation, this method is implemented as a dummy function, and therefore calls to it are ignored. The function in the cudnn extension takes device_id as an optional argument, which specifies the device you want to synchronize with.
- nnabla.ext_utils.get_device_count()¶
TODO: Write me.
- nnabla.ext_utils.get_devices()¶
TODO: Write me.
- nnabla.ext_utils.clear_memory_cache()¶
TODO: Write me.
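The three functions above are not yet documented. As a rough sketch, under the assumption that they follow the unified extension API shown earlier (get_device_count returning an int and get_devices returning a list of device id strings), they might be used like this:
from nnabla.ext_utils import import_extension_module

ext = import_extension_module('cudnn')

# Assumed behavior: query the number of visible devices and their identifiers.
print(ext.get_device_count())
print(ext.get_devices())

# Release cached device memory held by the allocator.
ext.clear_memory_cache()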
Implementing an extension¶
TODO: Link to a page of how to create a new extension.
Pretrained Models¶
The nnabla.models
package provides APIs that allow users to execute state-of-the-art pre-trained models for inference and training in a few lines of code.
ImageNet Models¶
This subpackage provides a variety of pre-trained state-of-the-art models trained on the ImageNet dataset.
The pre-trained models can be used for both inference and training as follows:
# Create ResNet-50 for inference
import nnabla as nn
import nnabla.functions as F
import nnabla.parametric_functions as PF
import numpy as np
from nnabla.models.imagenet import ResNet50
model = ResNet50()
batch_size = 1
# model.input_shape returns (3, 224, 224) for ResNet-50
x = nn.Variable((batch_size,) + model.input_shape)
y = model(x, training=False)
# Execute inference
# Load input image as uint8 array with shape of (3, 224, 224)
from nnabla.utils.image_utils import imread
img = imread('example.jpg', size=model.input_shape[1:], channel_first=True)
x.d[0] = img
y.forward()
predicted_label = np.argmax(y.d[0])
print('Predicted label:', model.category_names[predicted_label])
# Create ResNet-50 for fine-tuning
batch_size=32
x = nn.Variable((batch_size,) + model.input_shape)
# * By training=True, it sets batch normalization mode for training
# and gives trainable attributes to parameters.
# * By use_up_to='pool', it creates a network up to the output of
# the final global average pooling.
pool = model(x, training=True, use_up_to='pool')
# Add a classification layer for another 10 category dataset
# and loss function
num_classes = 10
y = PF.affine(pool, num_classes, name='classifier10')
t = nn.Variable((batch_size, 1))
loss = F.sum(F.softmax_cross_entropy(y, t))
# Training...
Available models are summarized in the following table. Error rates are calculated using single center crop.
Name | Class | Top-1 error | Top-5 error | Trained by/with
---|---|---|---|---
ResNet18 | | 30.28 | 10.90 | Neural Network Console
ResNet34 | | 26.72 | 8.89 | Neural Network Console
ResNet50 | | 24.59 | 7.48 | Neural Network Console
ResNet101 | | 23.81 | 7.01 | Neural Network Console
ResNet152 | | 23.48 | 7.09 | Neural Network Console
MobileNet | | 29.51 | 10.34 | Neural Network Console
MobileNetV2 | | 29.94 | 10.82 | Neural Network Console
SENet | | 22.04 | 6.29 | Neural Network Console
SqueezeNetV10 | | 42.71 | 20.12 | Neural Network Console
SqueezeNetV11 | | 41.23 | 19.18 | Neural Network Console
VGG11 | | 30.85 | 11.38 | Neural Network Console
VGG13 | | 29.51 | 10.46 | Neural Network Console
VGG16 | | 29.03 | 10.07 | Neural Network Console
NIN | | 42.91 | 20.66 | Neural Network Console
DenseNet | | 23.82 | 7.02 | Neural Network Console
InceptionV3 | | 21.82 | 5.88 | Neural Network Console
Xception | | 23.59 | 6.91 | Neural Network Console
GoogLeNet | | 31.22 | 11.34 | Neural Network Console
ResNeXt50 | | 22.95 | 6.73 | Neural Network Console
ResNeXt101 | | 22.80 | 6.74 | Neural Network Console
ShuffleNet10 | | 34.15 | 13.85 | Neural Network Console
ShuffleNet05 | | 41.99 | 19.64 | Neural Network Console
ShuffleNet20 | | 30.34 | 11.12 | Neural Network Console
Common interfaces¶
- class nnabla.models.imagenet.base.ImageNetBase[source]¶
Most ImageNet pretrained models inherit from this class, which provides some common interfaces.
- __call__(input_var=None, use_from=None, use_up_to='classifier', training=False, force_global_pooling=False, check_global_pooling=True, returns_net=False, verbose=0)[source]¶
Create a network (computation graph) from a loaded model.
- Parameters
input_var (Variable, optional) – If given, the input variable is replaced with the given variable and a network is constructed on top of the variable. Otherwise, a variable with batch size 1 and a default shape from self.input_shape is used.
use_up_to (str) – Network is constructed up to a variable specified by a string. A list of string-variable correspondences in a model is described in the documentation for each model class.
training (bool) – This option enables additional training (fine-tuning, transfer learning etc.) for the constructed network. If True, the batch_stat option in batch normalization is turned True, and the need_grad attribute in trainable variables (conv weights and gamma and beta of bn etc.) is turned True. The default is False.
force_global_pooling (bool) – Regardless of the input image size, the final average pooling before the classification layer is automatically transformed to a global average pooling. The default is False.
check_global_pooling (bool) – If True, and if the stride configuration of the final average pooling is not for global pooling, an exception is raised. The default is True. Use False when you want to do the pooling with the trained stride (7, 7) regardless of the input spatial size.
returns_net (bool) – When True, it returns an NnpNetwork object. Otherwise, it only returns the last variable of the constructed network. The default is False.
verbose (bool, or int) – Verbose level. With 0, it says nothing during network construction.
- property category_names¶
Returns category names of 1000 ImageNet classes.
- property input_shape¶
Should return the default image size (channel, height, width) as a tuple.
List of models¶
- class nnabla.models.imagenet.ResNet(num_layers=18)[source]¶
ResNet architectures for 18, 34, 50, 101, and 152 layers.
- Parameters
num_layers (int) – Number of layers chosen from 18, 34, 50, 101, and 152.
The following is a list of strings that can be specified to the use_up_to option in the __call__ method:
'classifier' (default): The output of the final affine layer for classification.
'pool': The output of the final global average pooling.
'lastconv': The input of the final global average pooling without ReLU activation.
'lastconv+relu': Network up to 'lastconv' followed by ReLU activation.
References
- class nnabla.models.imagenet.MobileNet[source]¶
MobileNet architecture.
The following is a list of strings that can be specified to the use_up_to option in the __call__ method:
'classifier' (default): The output of the final affine layer for classification.
'pool': The output of the final global average pooling.
'lastconv': The input of the final global average pooling without ReLU activation.
'lastconv+relu': Network up to 'lastconv' followed by ReLU activation.
References
- class nnabla.models.imagenet.MobileNetV2[source]¶
MobileNetV2 architecture.
The following is a list of strings that can be specified to the use_up_to option in the __call__ method:
'classifier' (default): The output of the final affine layer for classification.
'pool': The output of the final global average pooling.
'lastconv': The input of the final global average pooling without ReLU activation.
'lastconv+relu': Network up to 'lastconv' followed by ReLU activation.
References
- class nnabla.models.imagenet.SENet[source]¶
SENet-154 model which integrates SE blocks with a modified ResNeXt architecture.
The following is a list of strings that can be specified to the use_up_to option in the __call__ method:
'classifier' (default): The output of the final affine layer for classification.
'pool': The output of the final global average pooling.
'lastconv': The input of the final global average pooling without ReLU activation.
'lastconv+relu': Network up to 'lastconv' followed by ReLU activation.
References
- class nnabla.models.imagenet.SqueezeNetV10[source]¶
SqueezeNetV10: An alias of SqueezeNet('v1.0').
- class nnabla.models.imagenet.SqueezeNetV11[source]¶
SqueezeNetV11: An alias of SqueezeNet('v1.1').
- class nnabla.models.imagenet.SqueezeNet(version='v1.1')[source]¶
SqueezeNet model for architectures v1.0 and v1.1.
- Parameters
version (str) – Version chosen from ‘v1.0’ and ‘v1.1’.
The following is a list of strings that can be specified to the use_up_to option in the __call__ method:
'classifier' (default): The output of the final affine layer for classification.
'pool': The output of the final global average pooling.
'lastconv': The input of the final global average pooling without ReLU activation.
'lastconv+relu': Network up to 'lastconv' followed by ReLU activation.
References
- class nnabla.models.imagenet.VGG(num_layers=11)[source]¶
VGG architectures for 11, 13, 16 layers.
- Parameters
num_layers (int) – Number of layers chosen from 11, 13, 16.
The following is a list of strings that can be specified to the use_up_to option in the __call__ method:
'classifier' (default): The output of the final affine layer for classification.
'pool': The output of the final global average pooling.
'lastconv': The input of the final global average pooling without ReLU activation.
'lastconv+relu': Network up to 'lastconv' followed by ReLU activation.
'lastfeature': Network up to one layer before 'classifier', but without activation.
References
- class nnabla.models.imagenet.NIN[source]¶
NIN (Network In Network) architecture.
The following is a list of strings that can be specified to the use_up_to option in the __call__ method:
'classifier' (default): The output of the final affine layer for classification.
'pool': The output of the final global average pooling.
'lastconv': The input of the final global average pooling without ReLU activation.
'lastconv+relu': Network up to 'lastconv' followed by ReLU activation.
References
- class nnabla.models.imagenet.DenseNet[source]¶
The following is a list of strings that can be specified to the use_up_to option in the __call__ method:
'classifier' (default): The output of the final affine layer for classification.
'pool': The output of the final global average pooling.
'lastconv': The output from the last denseblock.
'lastconv+relu': Network up to 'lastconv' followed by ReLU activation.
References
- class nnabla.models.imagenet.InceptionV3[source]¶
InceptionV3 architecture.
The following is a list of strings that can be specified to the use_up_to option in the __call__ method:
'classifier' (default): The output of the final affine layer for classification.
'pool': The output of the final global average pooling.
'prepool': The input of the final global average pooling, i.e. the output of the final inception block.
References
- class nnabla.models.imagenet.Xception[source]¶
Xception model.
The following is a list of strings that can be specified to the use_up_to option in the __call__ method:
'classifier' (default): The output of the final affine layer for classification.
'pool': The output of the final global average pooling.
'lastconv': The input of the final global average pooling without ReLU activation.
'lastconv+relu': Network up to 'lastconv' followed by ReLU activation.
References
- class nnabla.models.imagenet.GoogLeNet[source]¶
GoogLeNet model.
The following is a list of strings that can be specified to the use_up_to option in the __call__ method:
'classifier' (default): The output of the final affine layer for classification.
'pool': The output of the final global average pooling.
'prepool': The input of the final global average pooling, i.e. the output of the final inception block.
References
- class nnabla.models.imagenet.ResNeXt(num_layers=50)[source]¶
ResNeXt architectures for 50 and 101 layers.
- Parameters
num_layers (int) – Number of layers chosen from 50 and 101.
The following is a list of strings that can be specified to the use_up_to option in the __call__ method:
'classifier' (default): The output of the final affine layer for classification.
'pool': The output of the final global average pooling.
'lastconv': The input of the final global average pooling without ReLU activation.
'lastconv+relu': Network up to 'lastconv' followed by ReLU activation.
References
- class nnabla.models.imagenet.ShuffleNet10[source]¶
An alias of ShuffleNet(10).
- class nnabla.models.imagenet.ShuffleNet05[source]¶
An alias of ShuffleNet(5).
- class nnabla.models.imagenet.ShuffleNet20[source]¶
An alias of ShuffleNet(20).
- class nnabla.models.imagenet.ShuffleNet(scaling_factor=10)[source]¶
Model for the ShuffleNet, ShuffleNet-0.5x and ShuffleNet-2.0x architectures.
- Parameters
scaling_factor (int) – To customize the network to a desired complexity, a scale factor is applied to the number of channels. This can be chosen from 10, 5 and 20.
The following is a list of strings that can be specified to the use_up_to option in the __call__ method:
'classifier' (default): The output of the final affine layer for classification.
'pool': The output of the final global average pooling.
'lastconv': The input of the final global average pooling without ReLU activation.
'lastconv+relu': Network up to 'lastconv' followed by ReLU activation.
References
Object Detection Models¶
This subpackage provides pre-trained state-of-the-art models for object detection, trained on the ImageNet dataset and fine-tuned on the Pascal VOC and MS COCO datasets.
The pre-trained models can be used for both inference and training as follows:
# Import required modules
import nnabla as nn
from nnabla.models.object_detection import YoloV2
from nnabla.models.object_detection.utils import (
LetterBoxTransform,
draw_bounding_boxes)
from nnabla.utils.image_utils import imread, imsave
import numpy as np
# Set device
from nnabla.ext_utils import get_extension_context
nn.set_default_context(get_extension_context('cudnn', device_id='0'))
# Load and create a detection model
h, w = 608, 608
yolov2 = YoloV2('coco')
x = nn.Variable((1, 3, h, w))
y = yolov2(x)
# Load an image and scale it to fit inside the (h, w) frame
img_orig = imread('dog.jpg')
lbt = LetterBoxTransform(img_orig, h, w)
# Execute detection
x.d = lbt.image.transpose(2, 0, 1)[None]
y.forward(clear_buffer=True)
# Draw bounding boxes to the original image
bboxes = lbt.inverse_coordinate_transform(y.d[0])
img_draw = draw_bounding_boxes(
img_orig, bboxes, yolov2.get_category_names())
imsave("detected.jpg", img_draw)
Name | Class | mAP | Training framework | Notes
---|---|---|---|---
YoloV2 | | 44.12 | Darknet | Weights converted from author's model

Name | Class | mAP | Training framework | Notes
---|---|---|---|---
YoloV2 | | 76.00 | Darknet | Weights converted from author's model
Common interfaces¶
- class nnabla.models.object_detection.base.ObjectDetection[source]¶
- __call__(input_var=None, use_from=None, use_up_to='detection', training=False, returns_net=False, verbose=0)[source]¶
Create a network (computation graph) from a loaded model.
- Parameters
input_var (Variable, optional) – If given, the input variable is replaced with the given variable and a network is constructed on top of the variable. Otherwise, a variable with batch size 1 and a default shape from self.input_shape is used.
use_up_to (str) – Network is constructed up to a variable specified by a string. A list of string-variable correspondences in a model is described in the documentation for each model class.
training (bool) – This option enables additional training (fine-tuning, transfer learning etc.) for the constructed network. If True, the batch_stat option in batch normalization is turned True, and the need_grad attribute in trainable variables (conv weights and gamma and beta of bn etc.) is turned True. The default is False.
returns_net (bool) – When True, it returns an NnpNetwork object. Otherwise, it only returns the last variable of the constructed network. The default is False.
verbose (bool, or int) – Verbose level. With 0, it says nothing during network construction.
- property input_shape¶
Should return the default image size (channel, height, width) as a tuple.
- class nnabla.models.object_detection.utils.LetterBoxTransform(image, height, width)[source]¶
Create an object holding a new letterboxed image as the image attribute.
Letterboxing is defined as scaling the input image to fit inside the desired output image frame (letterbox) while preserving the aspect ratio of the original image. The pixels that are not filled with the original image pixels become 127.
The created object also provides a functionality to convert bounding box coordinates back to the original image frame.
- Parameters
image (numpy.ndarray) – An uint8 3-channel image
height (int) – Letterbox height
width (int) – Letterbox width
- inverse_coordinate_transform(coords)[source]¶
Convert the bounding boxes back to the original image frame.
- Parameters
coords (numpy.ndarray) – N x M array, where M >= 4 and the first 4 elements of M are x, y (center coordinates of the bounding box), w and h (bounding box width and height).
- nnabla.models.object_detection.utils.draw_bounding_boxes(img, bboxes, names, colors=None, thresh=0.5)[source]¶
Draw bounding boxes for the detected objects using the transformed coordinates.
- Parameters
img (numpy.ndarray) – Input image
bboxes (numpy.ndarray) – Transformed bounding box coordinates from the model.
names (list of str) – Names of categories in the dataset
colors (list of tuple of 3 ints) – Colors for bounding boxes
thresh (float) – Threshold of bounding boxes.
List of models¶
- class nnabla.models.object_detection.YoloV2(dataset='voc')[source]¶
The following is a list of strings that can be specified to the use_up_to option in the __call__ method:
'detection' (default): The output from the last convolution (detection layer) after post-processing.
'convdetect': The output of the last convolution without post-processing.
'lastconv': Network up to the convolution+relu layer which comes before the detection convolution layer.
References
Semantic Segmentation Models¶
This subpackage provides a pre-trained state-of-the-art model for semantic segmentation (DeepLabv3+ with Xception-65 as the backbone), trained on the ImageNet dataset and fine-tuned on the Pascal VOC and MS COCO datasets.
The pre-trained model can be used for inference as follows:
# Import required modules
import numpy as np
import nnabla as nn
from nnabla.utils.image_utils import imread
from nnabla.models.semantic_segmentation import DeepLabV3plus
from nnabla.models.semantic_segmentation.utils import ProcessImage
target_h = 513
target_w = 513
# Get context
from nnabla.ext_utils import get_extension_context
nn.set_default_context(get_extension_context('cudnn', device_id='0'))
# Build a Deeplab v3+ network
image = imread("./test.jpg")
x = nn.Variable((1, 3, target_h, target_w), need_grad=False)
deeplabv3 = DeepLabV3plus('voc-coco', output_stride=8)
y = deeplabv3(x)
# preprocess image
processed_image = ProcessImage(image, target_h, target_w)
input_array = processed_image.pre_process()
# Compute inference
x.d = input_array
y.forward(clear_buffer=True)
print("done")
output = np.argmax(y.d, axis=1)
# Apply post processing
post_processed = processed_image.post_process(output[0])
# Display predicted class names
predicted_classes = np.unique(post_processed).astype(int)
for i in range(predicted_classes.shape[0]):
print('Classes Segmented: ', deeplabv3.category_names[predicted_classes[i]])
# save inference result
processed_image.save_segmentation_image("./output.png")
Name | Class | Output stride | mIOU | Training framework | Notes
---|---|---|---|---|---
DeepLabv3+ | | 8 | 81.48 | Nnabla | Backbone (Xception-65) weights converted from author's model and used for finetuning
DeepLabv3+ | | 16 | 82.20 | Nnabla | Backbone (Xception-65) weights converted from author's model and used for finetuning

Name | Class | Output stride | mIOU | Training framework | Notes
---|---|---|---|---|---
DeepLabv3+ | | 8 | 82.20 | Tensorflow | Weights converted from author's model
DeepLabv3+ | | 16 | 83.58 | Tensorflow | Weights converted from author's model
Common interfaces¶
- class nnabla.models.semantic_segmentation.base.SemanticSegmentation[source]¶
Semantic segmentation pretrained models inherit from this class, which provides some common interfaces.
- __call__(input_var=None, use_from=None, use_up_to='segmentation', training=False, returns_net=False, verbose=0)[source]¶
Create a network (computation graph) from a loaded model.
- Parameters
input_var (Variable, optional) – If given, the input variable is replaced with the given variable and a network is constructed on top of the variable. Otherwise, a variable with batch size 1 and a default shape from self.input_shape is used.
use_up_to (str) – Network is constructed up to a variable specified by a string. A list of string-variable correspondences in a model is described in the documentation for each model class.
training (bool) – This option enables additional training (fine-tuning, transfer learning etc.) for the constructed network. If True, the batch_stat option in batch normalization is turned True, and the need_grad attribute in trainable variables (conv weights and gamma and beta of bn etc.) is turned True. The default is False.
returns_net (bool) – When True, it returns an NnpNetwork object. Otherwise, it only returns the last variable of the constructed network. The default is False.
verbose (bool, or int) – Verbose level. With 0, it says nothing during network construction.
- property input_shape¶
Should return the default image size (channel, height, width) as a tuple.
List of models¶
- class nnabla.models.semantic_segmentation.DeepLabV3plus(dataset='voc', output_stride=16)[source]¶
DeepLabV3+.
- Parameters
dataset (str) – Specify a training dataset name from ‘voc’ or ‘coco’.
output_stride (int) – DeepLabV3 uses atrous (a.k.a. dilated) convolutions. The atrous rate depends on the output stride, which has to be selected from 8 or 16. If output_stride is 8 the atrous rates will be [12, 24, 36], and if output_stride is 16 the atrous rates will be [6, 12, 18].
The following is a list of strings that can be specified to the use_up_to option in the __call__ method:
'segmentation' (default): The output of the final layer.
'lastconv': The output from the last convolution.
'lastconv+relu': Network up to 'lastconv' followed by ReLU activation.
References
Out-of-core execution¶
The nnabla.lms
package provides APIs that allow users to execute networks larger than the allotted GPU memory by utilizing an out-of-core algorithm.
An out-of-core algorithm, or external memory algorithm, is an algorithm that enables processing data that is too large to fit into main memory at once.
SwapInOutScheduler¶
- class nnabla.lms.SwapInOutScheduler¶
Interface class for out-of-core execution / training.
This API enables training neural networks whose size is larger than the allotted GPU memory. See https://arxiv.org/abs/2010.14109 for more details of the scheduling strategy.
Note
cuda_init.prefer_cuda_virtual_array() used in the following example can be used under cuda >= 10.2 and cudnn >= 8. We utilize the virtual memory management supported from cuda 10.2. Additionally, when we tested virtual memory management with cuda >= 10.2 and cudnn < 8, we found the computation results of some cudnn functions to be inaccurate. So, when your environment has cuda < 10.2 or cudnn < 8, the virtual memory allocator in nnabla will not be built and you can't use it. If you would like to use SwapInOutScheduler to the fullest extent, please install cuda >= 10.2 and cudnn >= 8 and reinstall the corresponding nnabla-ext-cuda package.
Example:
import nnabla as nn
import nnabla.solvers as S
from nnabla.lms import SwapInOutScheduler

# Change the memory allocator to one preferable for SwapInOutScheduler.
import nnabla_ext.cuda.init as cuda_init
# To accelerate memory transfer, using pinned memory for cpu memory is preferable.
cuda_init.prefer_cpu_pinned_array()
# Only for cuda >= 10.2 and cudnn >= 8. This setting is the best for SwapInOutScheduler.
# To reduce memory fragmentation due to cpu-gpu memory transfers,
# using the virtual allocator for gpu memory is preferable.
cuda_init.prefer_cuda_virtual_array()

# Create contexts for both host and device.
from nnabla.ext_utils import get_extension_context
host_ctx = get_extension_context("cpu", device_id="", type_config="float")  # device_id is dummy
device_ctx = get_extension_context("cudnn", device_id="0", type_config="float")

scheduler = SwapInOutScheduler(host_ctx, device_ctx, size=max_gpu_memory_size)

# Make sure to call `nn.set_default_context` after calling prefer_xxx_array()
# to activate a change of memory preference.
nn.set_default_context(device_ctx)

x = nn.Variable(...)
loss = build_network(x)
solver = S.Sgd()
solver.set_parameters(nn.get_parameters())

for i in range(iteration):
    # Schedule memory transfers for all tensors appearing under the context of scheduler.
    with scheduler:
        x.d = next_data()
        loss.forward(clear_no_need_grad=True)
        solver.zero_grad()
        loss.backward(clear_buffer=True)
        solver.update()
When you get an Out-of-Memory (OOM) error under the SwapInOutScheduler, there are possibly 2 options to avoid it:
Set a small budget of GPU memory for scheduling.
Set a small size for the physical memory chunks allocated by the virtual memory allocator.
These are exemplified as follows:
Example:
# 1. Set a small budget of GPU memory for scheduling.
# You can reduce the ratio below until you can execute your network.
memsize_for_scheduler = max_gpu_memory_size * 0.8
scheduler = SwapInOutScheduler(..., size=memsize_for_scheduler)

# 2. Set a small size for a physical memory chunk allocated by the virtual memory allocator.
# By default, the chunk size is set to 20MB (20 << 20).
from nnabla_ext.cuda.init import set_cuda_virtual_memory_chunk_size
set_cuda_virtual_memory_chunk_size(2 << 20)  # Set 2MB, for example.
- end_scheduling(self)¶
An interface to specify the end point for scheduling. The range between start_scheduling() and end_scheduling() is the target of a single scheduling.
Note that, when using the with statement of SwapInOutScheduler, end_scheduling() is called automatically when exiting the with statement. In general, avoid using start_scheduling() and end_scheduling() directly and use the with statement instead (with scheduler:, see the example above).
- function_post_hook(self, func)¶
A callback executed as function_post_hook in forward and backward.
For all forward and backward calls wrapped by the with statement of SwapInOutScheduler, this callback is set automatically. In general, avoid setting this manually and use the with statement of SwapInOutScheduler.
- function_pre_hook(self, func)¶
A callback executed as function_pre_hook in forward and backward.
For all forward and backward calls wrapped by the with statement of SwapInOutScheduler, this callback is set automatically. In general, avoid setting this manually and use the with statement of SwapInOutScheduler.
- start_scheduling(self)¶
An interface to specify the starting point for scheduling. The range between start_scheduling() and end_scheduling() is the target of a single scheduling.
Note that, when using the with statement of SwapInOutScheduler, start_scheduling() is called automatically when entering the with statement. In general, avoid using start_scheduling() and end_scheduling() directly and use the with statement instead (with scheduler:, see the example above).
- update_post_hook(self)¶
A callback executed as post_hook in all solver functions, e.g. solver.update, solver.weight_decay, solver.clip_grad_by_norm, and so on.
For all solver functions wrapped by the with statement of SwapInOutScheduler, this callback is set automatically. In general, avoid setting this manually and use the with statement of SwapInOutScheduler.
- update_pre_hook(self)¶
A callback executed as pre_hook in all solver functions, e.g. solver.update, solver.weight_decay, solver.clip_grad_by_norm, and so on.
For all solver functions wrapped by the with statement of SwapInOutScheduler, this callback is set automatically. In general, avoid setting this manually and use the with statement of SwapInOutScheduler.
Modules¶
The nnabla.core.module.Module
class represents a building block of neural networks.
Module¶
- class nnabla.core.module.Module[source]¶
Module is a building block of a computation model. Modules are normally constructed from lower-level operators or other Modules; thus, nesting them in a tree-like structure can construct a more complex computation model.
Example
Users may construct a model by deriving from this class, like:
import nnabla as nn
import nnabla.parametric_functions as PF
import nnabla.functions as F

class ConvBn(nn.Module):
    def __init__(self, outmaps, kernel=1, stride=1, act=None):
        self.outmaps = outmaps
        self.kernel = kernel
        self.stride = stride
        self.act = act

    def call(self, x, training=True):
        kernel = (self.kernel, self.kernel)
        pad = (self.kernel // 2, self.kernel // 2)
        stride = (self.stride, self.stride)
        h = PF.convolution(x, self.outmaps, kernel, pad, stride, with_bias=False)
        h = PF.batch_normalization(h, batch_stat=training)
        if self.act is None:
            return h
        return self.act(h)

class ResUnit(nn.Module):
    def __init__(self, channels, stride=1, skip_by_conv=True):
        self.conv1 = ConvBn(channels // 4, 1, 1,
                            act=lambda x: F.relu(x, inplace=True))
        self.conv2 = ConvBn(channels // 4, 3, stride,
                            act=lambda x: F.relu(x, inplace=True))
        self.conv3 = ConvBn(channels, 1)
        self.skip_by_conv = skip_by_conv
        self.skip = ConvBn(channels, 1, stride)

    def call(self, x, training=True):
        h = self.conv1(x)
        h = self.conv2(h)
        h = self.conv3(h)
        s = x
        if self.skip_by_conv:
            s = self.skip(s)
        h = F.relu(F.add2(h, s, inplace=True), inplace=True)
        return h
To use this model, users may write code like the following:
res_unit = ResUnit(1024)
x = nn.Variable((64, 3, 32, 32))
x.d = np.random.random(x.shape)
y = res_unit(x)
y.forward(clear_buffer=True)
To work with a dynamic network, users may do the following:
res_unit = ResUnit(1024)
with nn.auto_forward():
    x = nn.Variable.from_numpy_array(np.random.random((1, 3, 32, 32)))
    y = res_unit(x)
    print(y.d)
For training, set the parameters in the module scope to the optimizer. For example:
import nnabla.solvers as S

resnet = ResNet(18)
loss = resnet(x, y_)
solver = S.Sgd(lr=1e-3)
solver.set_parameters(resnet.get_parameters())

for _ in range(max_iter):
    x.d, y_.d = data.next()
    loss.forward()
    solver.zero_grad()
    loss.backward()
    solver.weight_decay(1e-5)
    solver.update()
In this example, we suppose ResNet is a class derived from Module, x and y_ are Variables, and data is an instance of a DataIterator that has already been attached to a dataset.
- Note:
As this example shows, model parameters are owned by the model, here the variable resnet. These parameters are referred to when the network performs forward or backward computation or solver.update(). Hence, it is necessary to keep the module instance from being unexpectedly released, to ensure forward() or backward() can refer to these variables.
- call(*args, **kwargs)[source]¶
Users need to implement this function to construct their neural network. In the implementation, users may instantiate existing predefined Modules as members and then use them. For example:
class AModule(nn.Module):
    def __init__(...):
        ...
        self.cnb = ConvBN(128)  # A submodule is instantiated here.

    def call(...):
        h = self.cnb(x)  # Using the submodule instantiated beforehand.
or directly use parametric functions or functions:
class AModule(nn.Module):
    ...
    def call(...):
        ...
        h = PF.convolution(x, self.outmaps, ...)
        return h
Note
The following usage is currently not supported; it might be supported in a future version:
class AModule(nn.Module):
    def __init__(...):
        ...
        # Using an array to hold module instances.
        self.cnb = [ConvBN(k) for k in [8, 16, 32]]
        # Using a dict to hold module instances.
        self.cnb = {f'name_{k}': ConvBN(k) for k in [8, 16, 32]}
Note
The following method to temporarily instantiate a module is also not allowed:
class AModule(nn.Module):
    def call(...):
        ...
        cnb = ConvBN(k)  # Instantiating a temporary instance of Module is not allowed.
        y = cnb(x)
        return y
This is because, when leaving this scope, the parameters registered to the cnb module will be released, which causes unexpected results.
- get_parameters(recursive=True, grad_only=False, memo=None)[source]¶
Obtain an OrderedDict object of all parameters in the current Module.
For example,
x = nn.Variable.from_numpy_array(np.random.random((8, 32, 256, 256)))
conv_bn = ConvBn(2)
y = conv_bn(x)
params = conv_bn.get_parameters()
for parameter_name, parameter_value in params.items():
    print("{}:{}".format(parameter_name, parameter_value.shape))
The output looks like:
conv/W:(2, 32, 1, 1)
bn/beta:(1, 2, 1, 1)
bn/gamma:(1, 2, 1, 1)
bn/mean:(1, 2, 1, 1)
bn/var:(1, 2, 1, 1)
Notice that the parameter name looks like a filepath, with slash-separated nested scope names. In addition, the module name is used by default with a prefix @.
- Parameters
- Returns
Flattened parameter’s name-value pairs of current Module.
- Return type
OrderedDict
- load_parameters(path, extension='.h5')[source]¶
Load parameters from a file into this module.
- Parameters
path – str or file-like object
- property parameter_scope¶
A module has its own parameter_scope, which avoids polluting the global parameter name space. Users may obtain the parameter_scope of a module through this property.
- Returns
The parameter scope of current Module.
- Return type
OrderedDict
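A minimal sketch, reusing the ConvBn module from the get_parameters example above:
x = nn.Variable((8, 32, 256, 256))
conv_bn = ConvBn(2)
y = conv_bn(x)  # Parameters are created on the first call.
scope = conv_bn.parameter_scope  # The OrderedDict owned by this module.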
- save_parameters(path, extension='.h5')[source]¶
Save parameters of this module to a file.
- Parameters
path – str or file-like object
- property training¶
Return a bool value which indicates whether current Module is in training state or not. A module may be set to training state or not, so that the computation graph created from this module can be changed according to this state. For example,
class ConvBN(Module):
    ...
    def call(self, x):
        h = self.conv1(x)
        if self.training:
            h = self.drop_out(h)
        h = F.relu(h, inplace=True)
        return h

conv_bn = ConvBN()
conv_bn.training = True
train_y = conv_bn(x)

conv_bn.training = False
eval_y = conv_bn(x)
- Returns
A bool which indicates whether the current Module is in the training state.
- Return type
bool
Graph Definition¶
In NNabla, Graph Definition is a representation of a computation graph specially designed for storage optimization and format conversion.
A computation graph can be defined by calls of NNabla functions. Such a computation graph has instantiated the input and output variables of the functions, and the inherent topology has been established for forward or backward computation. But for the persistence of such a graph, another abstract representation, the so-called protobuf graph (or network), abbreviated as proto graph, is normally used. In this graph, only the information necessary for persistence is kept; information used only for computation is dropped.
Graph Definition provides a group of functions and classes that facilitate creating a protobuf network from a computation graph, and saving and restoring a neural network from a persistent protobuf network representation.
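As a rough sketch of such a round trip, assuming the helpers nn.graph_def.create_graph_from_variable and nn.graph_def.save exist with the signatures used below (adjust to the actual API):
import nnabla as nn
import nnabla.parametric_functions as PF

# Build an ordinary computation graph.
x = nn.Variable((1, 3, 32, 32))
y = PF.affine(x, 10, name='fc')

# Convert it to a proto graph and persist it (assumed helper names).
g = nn.graph_def.create_graph_from_variable("net", y)
nn.graph_def.save("my_model.nnp", [g])

# Restore the proto graph and re-create a computation graph from it.
g2 = nn.graph_def.load("my_model.nnp")
x2 = nn.Variable((1, 3, 32, 32))
y2 = g2(x2)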
ProtoGraph¶
- class nnabla.graph_def.ProtoGraph(networks=None, parameter_scope=None)¶
This class represents a group of proto networks. It normally corresponds to a .nnp file. In a .nnp file, there might be one or multiple networks; for example, there might be one network for direct inference, and another network with a similar structure, sharing the same parameters, used for training. This class works as a container of proto networks, providing a group of functions for accessing a proto network by its name. Especially, when there is only one network in it, some shortcut functions are provided for directly operating with this network. For example,
import nnabla as nn

g = nn.graph_def.load("my_model.nnp")  # Suppose there is only one network in this file.

x1 = nn.Variable(input_shape)
x1.d = ...  # load data here.
y1 = g.networks['executor_net'](x1)  # <== (1)
y1.forward()
print(y1.d)

x2 = nn.Variable(input_shape)
x2.d = ...  # load data here.
y2 = g(x2)  # <== (2)
# y2 = g.default_graph()(x2)  # <== (3)
y2.forward()
print(y2.d)
The computation graphs y1 and y2 are exactly the same, and (2) and (3) are equal. If there are multiple networks in a graph, the first network loaded acts as its default network. Please do not use default_graph() when there are multiple networks in the graph, since which network is the default heavily depends on the concrete implementation.
If you know the name of each network, you may access a proto network in this graph by its member name. For example,
g = nn.graph_def.load("my_model.nnp")

x = nn.Variable(input_shape)
x.d = ...  # load data here.
y = g.executor_net(x)  # Here, we know there is a network named "executor_net".
y.forward()
print(y.d)
- as_proto(include_parameter=False, only_parameter=False, networks=None, variable_batch_size=True)¶
This function exports a protobuf data structure, which can be manipulated by google protobuf APIs.
- Parameters
include_parameter (bool, optional, default=False) – Whether to export the parameters to the protobuf data structure.
only_parameter (bool, optional, default=False) – Whether to export only the parameters to the protobuf data structure.
networks (array of proto networks, optional, default=None) – Users may provide their networks to export to a protobuf data structure.
variable_batch_size (bool, optional, default=True) – Replace the batch size of the current network with an abstract placeholder, so that the batch size can be replaced with another value at use time.
- property current_context¶
Current backend context of this proto network.
- default_graph()¶
This function returns the default proto network in this graph. Which network is the default depends on the loading sequence. Hence, it is only safe to use when there is exactly one network.
- expand_loop_control()¶
This function expands loop control statements for all networks in this graph.
- static from_proto(proto, batch_size=None, param_scope=None, rng=None)¶
This function creates a proto graph object from a protobuf data structure.
- Parameters
proto (protobuf object) – A protobuf data structure.
batch_size (int, optional, default=None) – The batch size that will be applied to this graph. If it is None, applying a batch size value is deferred until use.
param_scope (OrderedDict, optional, default=None) – Users may provide their own parameter scope.
rng (np.random.RandomState, optional, default=None) – A random number generator state, used in parameter initialization.
- get_parameters(grad_only=False)¶
Get parameters in the current module name scope.
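A minimal sketch (assuming g was loaded as in the examples above):
params = g.get_parameters()
for name, param in params.items():
    print(name, param.shape)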
ProtoNetwork¶
- class nnabla.graph_def.ProtoNetwork(owner, name=None, batch_size=None)¶
This class represents a protobuf network, which comes from a corresponding computation graph or is restored from a saved protobuf network (e.g. a .nnp file).
This class describes a neural network by the following members:
functions: An OrderedDict of name-value pairs, where the value is a ProtoFunction object.
variables: An OrderedDict of name-value pairs, where the value is a ProtoVariable object.
parameters: An OrderedDict of name-value pairs, where the value is a ProtoVariable object.
inputs: A string list containing the names of the input variables of this network.
outputs: A string list containing the names of the output variables of this network.
variables represents the activations in the network; parameters mainly includes the weights and all learnable parameters. functions represents the functions in the network; the order of functions might not equal the forward sequence. Please use forward_sequence to obtain the exact forward function sequence.
- __call__(*args, **kwargs)¶
Generate a computation graph from this proto network.
- Parameters
args (tuple of nn.Variables or None) –
The inputs of the network, which can be different from the inputs of the original computation graph as long as the network allows it.
For example,
import nnabla as nn
import numpy as np

resnet = nn.graph_def.load("resnet.nnp")
x = nn.Variable(input_shape)  # an input variable matching the network input
x.d = np.random.random(input_shape)
y = resnet(x)
The variable y corresponds to a computation graph; the user may then perform a forward pass:
y.forward()
If the user does not provide inputs to this function, this function creates the corresponding nn.Variable objects as the inputs of this network, since the proto network remembers the network inputs. These input variables are actually placeholders; hence, users need to locate them and fill them with actual values so that the computation graph is ready for forward or backward.
For example,
g = nn.graph_def.load("resnet.nnp") y = g() # Not provide input variables
To feed training or evaluation data to this network, the user needs to locate the input variable, for example:
input = g.networks[network_name].variables[input_name].variable_instance
input.d = np.random.random(input_shape)
- batch_size (int, optional, default=None):
If provided, batch_size will be applied to the newly created computation graph. For example,
g = nn.graph_def.load("my_model.nnp") y = g(batch_size=32)
In this sample, batch_size is used to create a computation graph with the specified batch size. Supposing x is the input of the network and its original shape is (1, 3, 32, 32), the shape in the actual computation graph will be (32, 3, 32, 32).
- as_proto(**kwargs)¶
This function returns a protobuf data structure, which can be directly accessed by the functions in nnabla.utils.nnabla_pb2. Thus, it allows the user to further manipulate this protobuf representation, for example, performing format conversion or network structure optimization.
- Parameters
variable_batch_size (bool, optional) – If true, the batch size of the network will be replaced with an abstract representation, so that it can be replaced with another value when restoring the computation graph.
- Returns
A protobuf object.
- Return type
protobuf
- execute_on_proto(execute)¶
This function performs a virtual forward, following the sequence from inputs to outputs. It does not traverse the graph recursively; instead, a non-recursive algorithm is used. For each function met during the traversal, execute is called and a ProtoFunction object is passed in for further operation on that function.
- Parameters
execute (callable) –
A callback function (or callable object), which is called when each ProtoFunction is met while traversing the graph. execute should look like:

def execute(pf: ProtoFunction):
    # Do what you want to do with pf
    pass
Or:
class MyCallback:
    def __call__(self, pf: ProtoFunction):
        # Do what you want to do with pf
        pass
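For example, a minimal sketch that prints the type and name of every function in traversal order (assuming proto_network is a ProtoNetwork obtained as above):

def print_function(pf):
    # pf is a ProtoFunction; type and name are its documented properties
    print(pf.type, pf.name)

proto_network.execute_on_proto(print_function)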
- expand_loop_control()¶
This function expands loop control statements and generates a new proto network object without loop control statements. Loop control statements cannot be created by Python code; they can only be created by the interactive neural network design tool. The following briefly introduces the specification:
- As for variable,
In nntxt, if the variable includes a field repeat_id, this variable is surrounded by a loop control structure. A renaming rule is applied when expanding this network: a postfix is appended to the variable name, as follows.
For old style, e.g.:
BatchNormalization_6/bn/mean --> BatchNormalization_6/bn/mean_RepeatStart[0] ^ ^ repeat_time repeat_id[index] original_name --> original_name + << _%repeat_id%[%repeat_time%], for each in repeat_id >>
For new style, e.g.:
BatchNormalization_6{RepeatStart}/bn/mean --> BatchNormalization_6[0]/bn/mean_RepeatStart ^ repeat_time original_name --> original_name + << [%repeat_time%], for each in repeat_id >>
- As for RepeatStart, RepeatEnd
The function and variable nodes between these 2 layers will be repeated. Expanding creates the corresponding number of copies of these functions and variables and connects them to each other.
- As for RecurrentInput,
The axis of RecurrentParam points out which axis will be split. Each branch duplicates the functions and variables with this repeat_id. This layer works like a split function.
- As for RecurrentOutput,
RecurrentOutput merges multiple branches into one output; it looks like a stack function.
- As for Delay
The first time, the output is its input[1]; after that, the output is its input[0].
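For example, a minimal usage sketch (the filename is illustrative; the model is assumed to contain loop control statements created by the design tool):

import nnabla as nn

g = nn.graph_def.load("model_with_loop.nnp")
expanded = g.default_graph().expand_loop_control()  # new ProtoNetwork without loop control
x = nn.Variable(input_shape)
y = expanded(x)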
- forward_sequence()¶
This function returns an iterable for iterating over the functions in the network in the actual forward execution order.
For example,
for pf in proto_network.forward_sequence():
    print(pf.name)
- promote(callback)¶
Users may manipulate a proto network with a callback, such as NnpNetworkPass.
- Parameters
callback (NnpNetworkPass) – Currently, only an NnpNetworkPass object is supported as a network promotion callback.
Developers may manipulate a proto network with a modifier, which acts as a callback. nnabla.utils.nnp_graph.NnpNetworkPass is one kind of modifier. The following simple example illustrates this usage:
Example
import nnabla as nn
from nnabla.utils import nnp_graph

verbose = 1
callback = nnp_graph.NnpNetworkPass(verbose)

@callback.on_generate_function_by_name('Convolution')
def change_convolution_param(f):
    print('{}'.format(f.proto.convolution_param.pad.dim[:]))
    f.proto.convolution_param.pad.dim[:] = [1, 1]
    return f

g = nn.graph_def.load("my_model.nnp")
n = g.default_graph().promote(callback)
x = nn.Variable(input_shape)
y = n(x)
y.forward()
In this example, a callback is defined to change the pad of a Convolution function. The target function is located by its function name; here, only the function named 'Convolution' is located and operated on.
- save(filename, include_parameter=False, variable_batch_size=True)¶
This function saves the current proto network to a file specified by filename, normally a .nnp file.
- Parameters
filename (str) – The filename; its extension is used to determine the file format. The extension normally is .nnp.
include_parameter (bool, optional, default=False) – Whether to save the parameters to the protobuf tree.
variable_batch_size (bool, optional, default=True) – Whether to replace the batch size dimension of the current network with an abstract representation. If true, another batch size can be used when this network is reused.
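For example, a minimal sketch that saves a single network together with its parameters (filenames are illustrative):

g = nn.graph_def.load("my_model.nnp")
net = g.default_graph()
net.save("exported.nnp", include_parameter=True)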
ProtoVariable¶
- class nnabla.graph_def.ProtoVariable(shape, name=None, need_grad=False, var_type='Buffer')¶
This class represents a variable, a so-called proto variable. When this variable is passed to a network definition, a proto network is generated in a proto graph scope. If this procedure is done under a with statement as g, the proto network is generated in g. Otherwise, the global graph scope is used and the proto network is generated there.
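For example, a minimal sketch (my_model is assumed to be a function that builds a network from its input):

import nnabla as nn

with nn.graph_def.graph() as g:
    x = nn.ProtoVariable((64, 3, 32, 32))
    y = my_model(x)  # a proto network is generated in g
g.save("my_model.nnp")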
ProtoFunction¶
- class nnabla.graph_def.ProtoFunction(func, f_type, args, name=None, owner=None)¶
This class represents a function that is used to define a proto network.
- There are the following properties to describe a proto function:
name: The name of this function.
type: The type of this function, e.g. ReLU.
inputs: An array of string names representing the input proto variables.
outputs: An array of string names representing the output proto variables.
- graph_call(**kwargs)¶
This function creates a function instance for generating the computation graph.
load¶
- nnabla.graph_def.load(filename, batch_size=None, exclude_parameter=False, parameter_only=False, extension='.nntxt', parameter_scope=None, rng=None)¶
Load a network from files.
- Parameters
filename (str or list or file-like object) – A filename string, a list of filenames, or a file-like object.
batch_size (int) – The batch size expected to be set.
exclude_parameter (bool) – If True, only load the model, not the parameters of this model. Default is False.
parameter_only (bool) – If True, only load the model parameters. Default is False.
extension (str) – This parameter is needed when filename is a file-like object. Default is .nntxt.
parameter_scope (OrderedDict) – Users may provide their own parameter scope. If this parameter is not provided, loaded parameters will be created in the created proto graph's parameter_scope, which is initialized to an empty dictionary by default.
rng (random state) – Users may specify a random state, which causes parameters to be initialized with a deterministic random seed.
- Returns
A ProtoGraph object containing one or multiple ProtoNetwork objects.
- Return type
ProtoGraph
Example
The following example loads a model and generates the output variable through this model:
import nnabla as nn
import nnabla.functions as F
import nnabla.parametric_functions as PF

def fusion_net(x):
    def unit(i, prefix):
        c1 = PF.convolution(i, 4, (3, 3), pad=(1, 1), name=prefix + '-c1')
        c2 = PF.convolution(F.relu(c1), 4, (3, 3), pad=(1, 1), name=prefix + '-c2')
        c = F.add2(c2, c1, inplace=True)
        return c
    c = unit(x, 'c1')
    c2 = unit(c, 'c2')
    y = PF.affine(c2, 5, name='fc')
    return y

x = nn.ProtoVariable((64, 3, 32, 32))
y = fusion_net(x)
g = nn.graph_def.get_default_graph()  # get the generated graph_def
g.save("fusion_net.nnp")
...
g = nn.graph_def.load("fusion_net.nnp")
x = nn.Variable((64, 3, 32, 32))
x.d = ...       # user-provided input data for this graph
y = g(x)        # create a computation graph by passing in nn.Variable()
y.forward()     # calculate the output through this graph
...
# You may use your special context (e.g. CUDA context)
with context_scope(ctx):
    y = g(x)        # create a computation graph with the specified backend context
    y.forward()     # forward using the specified backend
save¶
- nnabla.graph_def.save(filename, content, include_parameters=False, variable_batch_size=True, extension='.nnp')¶
Save network
- Parameters
filename (str or file object) –
Filename to store information. The file extension is used to determine the saving file format.
.nnp: (Recommended) Creates a zip archive with nntxt (network definition etc.) and h5 (parameters).
.nntxt: Protobuf in text format.
.protobuf: Protobuf in binary format (unsafe in terms of backward compatibility).
content (list) – Currently only ProtoGraph or ProtoNetwork objects are supported.
include_parameters (bool) – Include the parameters in a single file. This is ignored when the extension of filename is .nnp.
variable_batch_size (bool) – Whether or not to convert the batch size of the computation graph to a special value, so that any other batch_size value can be used later.
Example
The following example creates an MLP with two inputs and two outputs, and saves the network structure and the initialized parameters.
import nnabla as nn
import nnabla.functions as F
import nnabla.parametric_functions as PF

def mlp_module(x0, x1):
    h1_0 = PF.affine(x0, 100, name='affine1_0')
    h1_1 = PF.affine(x1, 100, name='affine1_0')
    h1 = F.tanh(h1_0 + h1_1)
    h2 = F.tanh(PF.affine(h1, 50, name='affine2'))
    y0 = PF.affine(h2, 10, name='affiney_0')
    y1 = PF.affine(h2, 10, name='affiney_1')
    return y0, y1

with nn.graph_def.graph() as g:
    x0 = nn.ProtoVariable((64, 100))
    x1 = nn.ProtoVariable((64, 100))
    y0, y1 = mlp_module(x0, x1)

nn.graph_def.save("mlp_net.nnp", [g])
Create Protobuf Representation from Computation Graph¶
create_graph_from_variable¶
- nnabla.graph_def.create_graph_from_variable(name, variables, names=None, parameter_scope=None)¶
Create a Proto Graph from one or multiple outputs.
If developers have a computation graph, they have an nn.Variable() object; it might be the loss of a network or an output variable of an executor network. Such a variable inherently corresponds to a computation network. From these variables, this function creates the corresponding proto network.
- Parameters
name (str) – The name of generated proto_network.
variables (nn.Variables) – One or multiple output variables; if multiple variables are given, this function adds a sink function to reduce these multiple outputs to one.
names (dict, optional, default=None) – A name-to-nn.Variable mapping table. By default, this function names all activation variables and parameters with an internal naming rule, but sometimes developers want to give special names to certain variables so that they can be accessed conveniently. When generating the proto network, if a variable occurs in this mapping table, the corresponding name is used for that variable in the proto network.
parameter_scope (OrderedDict, optional, default=None) – Developers may provide a parameter scope; when creating proto networks, a variable's name is replaced if the corresponding variable is found in the specified parameter_scope, which makes the names of weights or other parameters meaningful.
Example
import nnabla as nn

x = nn.Variable((1, 3, 32, 32))
y = my_model(x)
g = nn.graph_def.create_graph_from_variable("proto_network_name", y)
g.save("my_model.nnp")
get_default_graph¶
- nnabla.graph_def.get_default_graph(*args, **kwargs)¶
This function obtains the current default graph_def.
If users do not create their proto network within a with statement scope, the proto network is created in a global scope by default. Users may retrieve this proto graph with this function.
Example
import nnabla as nn
from nnabla.core.modules import ResUnit

resunit = ResUnit(16)
input = nn.ProtoVariable((64, 3, 32, 32))
y = resunit(input)
graph_def = nn.graph_def.get_default_graph()
Note
If users are not sure whether a previously created proto graph remains in the global graph scope, it is better to call reset_default_graph() first. If users use a with statement like with nn.graph_def.graph() as g, there is no need to care about this point.
- Returns
A proto graph is returned.
- Return type
ProtoGraph
get_default_graph_by_variable¶
- nnabla.graph_def.get_default_graph_by_variable(proto_variable)¶
This function obtains a specific network by its outputs.
Users may retrieve one of the networks in the default proto graph scope if that network has the specified outputs. Imagine that there is a global proto graph: when a user passes a ProtoVariable to a model, a proto network is generated in this global proto graph while the output variables are created. With this function, users may retrieve this generated proto network, save it, or perform any other operations.
Note
This proto network will become invalid after reset_default_graph(). For example,
proto_variable_inputs = [nn.ProtoVariable(v.d.shape) for v in inputs]
outputs = module(*proto_variable_inputs)
net = nn.graph_def.get_default_graph_by_variable(outputs[0])
...
nn.graph_def.reset_default_graph()
y = net(x)  # net cannot be accessed anymore; it becomes invalid at this point
graph¶
- nnabla.graph_def.graph(**kwargs)¶
This function is only used in with statement.
- Parameters
name (str, optional, default=None) – Users may specify a name for the generated proto network. This name is useful when saving to .nnp.
parameter_scope (OrderedDict, optional, default=None) – Users may specify a parameter scope, so that the parameters created while building the model will be placed into this parameter scope.
For example, a minimal sketch (my_model is assumed to be defined elsewhere):
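import nnabla as nn

with nn.graph_def.graph(name="my_network") as g:
    x = nn.ProtoVariable((64, 3, 32, 32))
    y = my_model(x)
g.save("my_model.nnp")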
reset_default_graph¶
- nnabla.graph_def.reset_default_graph()¶
This function clears all information in the global graph scope.
Sequential¶
The nnabla.core.sequential.Sequential class represents a construction block of a neural network.
Sequential¶
- class nnabla.core.sequential.Sequential(*args, **kwargs)[source]¶
A sequential block. Users may construct their network with a sequential block. Importantly, each component within a sequential block must be an instance of nn.Module.
For intuitive understanding, some small examples follow:
import nnabla as nn
import nnabla.parametric_functions as PF
import nnabla.functions as F

class ConvLayer(nn.Module):
    def __init__(self, outmaps, kernel, stride=1, pad=0):
        self.outmaps = outmaps
        self.kernel = (kernel, kernel)
        self.pad = (pad, pad)
        self.stride = (stride, stride)

    def call(self, x):
        x = PF.convolution(x, outmaps=self.outmaps, kernel=self.kernel,
                           pad=self.pad, stride=self.stride)
        x = F.relu(x)
        return x

# Example of using Sequential
layer = nn.Sequential(
    ConvLayer(48, kernel=1),
    ConvLayer(64, kernel=3, pad=1)
)

# Example of using Sequential with a specific name for each layer
layer = nn.Sequential(
    ('conv1', ConvLayer(48, kernel=1)),
    ('conv2', ConvLayer(64, kernel=3, pad=1))
)
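Applying the block is then a single call; a minimal sketch (the input shape is illustrative):

x = nn.Variable((8, 3, 32, 32))
y = layer(x)  # runs conv1 then conv2 in order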
Experimental¶
Viewers¶
SimpleGraph¶
- class nnabla.experimental.viewers.SimpleGraph(format='png', verbose=False, fname_color_map=None, vname_color_map=None)[source]¶
Simple Graph with GraphViz.
Example:
import nnabla as nn
import nnabla.functions as F
import nnabla.parametric_functions as PF
import nnabla.experimental.viewers as V

# Model definition
def network(image, test=False):
    h = image
    h /= 255.0
    h = PF.convolution(h, 16, kernel=(3, 3), pad=(1, 1), name="conv")
    h = PF.batch_normalization(h, name="bn", batch_stat=not test)
    h = F.relu(h)
    pred = PF.affine(h, 10, name='fc')
    return pred

# Model
image = nn.Variable([4, 3, 32, 32])
pred = network(image, test=False)

# Graph Viewer
graph = V.SimpleGraph(verbose=False)
graph.view(pred)
graph.save(pred, "sample_graph")
If the parameters are module-scoped, for example when pred comes from a module output, the parameters should be obtained beforehand and passed to view():
Example:
import nnabla as nn
import nnabla.functions as F
from nnabla.core.modules import ConvBn
import nnabla.experimental.viewers as V

class TSTNetNormal(nn.Module):
    def __init__(self):
        self.conv_bn_1 = ConvBn(1)
        self.conv_bn_2 = ConvBn(1)

    def call(self, x1, x2):
        y1 = self.conv_bn_1(x1)
        y2 = self.conv_bn_2(x2)
        y = F.concatenate(y1, y2, axis=1)
        return y

tnd = TSTNetNormal()
v1 = nn.Variable((4, 3, 32, 32))
v2 = nn.Variable((4, 3, 32, 32))
ya = tnd(v1, v2)

graph = V.SimpleGraph(verbose=False)
graph.view(ya, params=tnd.get_parameters(grad_only=False))
- create_graphviz_digraph(vleaf, params, format=None)[source]¶
Create a graphviz.Digraph object given the leaf variable of a computation graph. One of the nice things about getting a Digraph directly is that the drawn graph can be displayed inline in a Jupyter notebook, as described in the Graphviz documentation.
- Parameters
vleaf (nnabla.Variable) – End variable. All variables and functions which can be traversed from this variable are shown in the result.
params (dict) – The parameters dictionary; it can be obtained by nn.get_parameters().
format (str) – Force overwrite of the format ('pdf', 'png', ...) configuration.
Returns: graphviz.Digraph
- save(vleaf, fpath, cleanup=False, format=None)[source]¶
Save the graph to a given file path.
- Parameters
vleaf (nnabla.Variable) – End variable. All variables and functions which can be traversed from this variable are shown in the result.
fpath (str) – The file path used to save.
cleanup (bool) – Clean up the source file after rendering. Default is False.
format (str) – Force overwrite of the format ('pdf', 'png', ...) configuration.
- view(vleaf, fpath=None, cleanup=True, format=None, params=None)[source]¶
View the graph.
- Parameters
vleaf (nnabla.Variable) – End variable. All variables and functions which can be traversed from this variable are shown in the result.
fpath (str) – The file path used to save.
cleanup (bool) – Clean up the source file after rendering. Default is True.
format (str) – Force overwrite of the format ('pdf', 'png', ...) configuration.
params (dict) – Parameter dictionary, which can be obtained by the get_parameters() function. Default is None. If params is None, global parameters are obtained.
Show Graph with TensorBoard¶
TBGraphWriter¶
Graph Converters¶
- class nnabla.experimental.graph_converters.GraphConverter(modifiers=[])[source]¶
Convert a graph with the modifiers by traversing from output variables.
- convert(o)[source]¶
- Parameters
o (list of nnabla.Variable) – Output variables.
- class nnabla.experimental.graph_converters.FunctionModifier[source]¶
Base class of modifiers.
The modify method is called for each function, with its inputs, in graph topological order when you call the GraphConverter(<modifiers>).convert(<root variable>) method.
- finish_up()[source]¶
Finish up the function modification.
Clean up the internal modifier states.
- Parameters
None –
- Returns
None
- get_parameter_scope(v)[source]¶
Get the parameter scope name corresponding to v.
- Parameters
v (nnabla.Variable) – NNabla Variable object.
- Returns
Scope name
- Return type
str
- modify(f, inputs)[source]¶
Modify the function.
Implement this method in a subclass to modify a function.
Examples:
class ReLUToLeakyReLUModifier(FunctionModifier):
    def __init__(self):
        super(ReLUToLeakyReLUModifier, self).__init__()

    def modify(self, f, inputs):
        if f.info.type_name == 'ReLU':
            x = inputs[0]
            return F.leaky_relu(x)
This example is a simple case since the network topological order does not change. In GraphConverter, the modify method is expected to be called along the original network topological order, not the modified order. For such a complex case, see the modify method of BatchNormalizationFoldingModifierInner as a reference.
- Parameters
f (nnabla.function.Function) – NNabla function object.
inputs (list of Variable) – New inputs to f. These may be modified ones or the same as f.inputs.
- Returns
Variable or list of Variable.
Function Modifiers¶
- class nnabla.experimental.graph_converters.BatchNormalizationFoldingModifier(opposite=False, channel_last=False)[source]¶
A single Convolution -> BatchNormalization pass is folded into one Convolution.
If there is a Convolution -> BatchNormalization pass, fold the batch normalization parameters into the kernel and bias (if it exists) of the preceding convolution, then skip the batch normalization following the convolution.
Supported folding functions: Convolution, Deconvolution, Affine.
Examples:
pred = Model(...)

import nnabla.experimental.graph_converters as GC

modifiers = [GC.BatchNormalizationFoldingModifier()]
gc = GC.GraphConverter(modifiers)
pred = gc.convert(pred)
- class nnabla.experimental.graph_converters.BatchNormalizationFoldingModifierInner(channel_last=False)[source]¶
A single Convolution -> BatchNormalization pass is folded into one Convolution.
If there is a Convolution -> BatchNormalization pass, fold the batch normalization parameters into the kernel and bias (if it exists) of the preceding convolution, then skip the batch normalization following the convolution.
Supported folding functions: Convolution, Deconvolution, Affine.
- class nnabla.experimental.graph_converters.AddBiasModifier[source]¶
Add a bias to Convolution in the BatchNormalization folding case if it doesn't have one.
Supported folding functions: Convolution, Deconvolution, Affine.
Examples:
pred = Model(...)

import nnabla.experimental.graph_converters as GC

modifiers = [GC.AddBiasModifier()]
gc = GC.GraphConverter(modifiers)
pred = gc.convert(pred)
- class nnabla.experimental.graph_converters.BatchNormalizationSelfFoldingModifier(name='bn-self-folding')[source]¶
The parameters of the batch normalization are replaced with a simple scale and bias.
- Parameters
name (str) – Prefix of the parameter scope.
Examples:
pred = Model(...)

import nnabla.experimental.graph_converters as GC

modifiers = [GC.BatchNormalizationSelfFoldingModifier()]
gc = GC.GraphConverter(modifiers)
pred = gc.convert(pred)
- class nnabla.experimental.graph_converters.ChannelFirstModifier(inputs, inputs_cf=None)[source]¶
Convert the graph shape from channel-last (NHWC) to channel-first (NCHW) format.
Supported functions: Convolution, Deconvolution, BatchNormalization, MaxPooling, AveragePooling, SumPooling, Unpooling, Concatenate
- Parameters
inputs (list of nn.Variable) – Original channel-last (NHWC) inputs at the very beginning of a network.
inputs_cf (list of nn.Variable) – Channel-first (NCHW) version of the inputs at the very beginning of a network. If this is not given, inputs_cf are generated internally and held.
Examples:
pred = Model(...)

import nnabla.experimental.graph_converters as GC

modifiers = [GC.ChannelFirstModifier(<inputs of pred>)]
gc = GC.GraphConverter(modifiers)
pred = gc.convert(pred)
- class nnabla.experimental.graph_converters.ChannelLastModifier(inputs, inputs_cl=None)[source]¶
Convert the graph shape from channel-first (NCHW) to channel-last (NHWC) format.
Supported functions: Convolution, Deconvolution, BatchNormalization, MaxPooling, AveragePooling, SumPooling, Unpooling, Concatenate
- Parameters
inputs (list of nn.Variable) – Original channel-first (NCHW) inputs at the very beginning of a network.
inputs_cl (list of nn.Variable) – Channel-last (NHWC) version of the inputs at the very beginning of a network. If this is not given, inputs_cl are generated internally and held.
Examples:
pred = Model(...)

import nnabla.experimental.graph_converters as GC

modifiers = [GC.ChannelLastModifier(<inputs of pred>)]
gc = GC.GraphConverter(modifiers)
pred = gc.convert(pred)
- class nnabla.experimental.graph_converters.FusedBatchNormalizationModifier[source]¶
A BatchNormalization -> Add2 -> Non-Linear block is fused into one FusedBatchNormalization.
If there is a BatchNormalization -> Add2 -> Non-Linear pass, remove all the block functions and replace the whole block with FusedBatchNormalization.
Examples:
pred = Model(...)

import nnabla.experimental.graph_converters as GC

modifiers = [GC.FusedBatchNormalizationModifier()]
gc = GC.GraphConverter(modifiers)
pred = gc.convert(pred)
- class nnabla.experimental.graph_converters.UnfusedBatchNormalizationModifier[source]¶
Unfuse FusedBatchNormalization into a BatchNormalization -> Add2 -> Non-Linear block.
If there is a FusedBatchNormalization pass, remove the fused batch normalization and replace it with the block BatchNormalization -> Add2 -> Non-Linear.
Supported Non-Linear functions: relu
Examples:
pred = Model(...)

import nnabla.experimental.graph_converters as GC

modifiers = [GC.UnfusedBatchNormalizationModifier()]
gc = GC.GraphConverter(modifiers)
pred = gc.convert(pred)
- class nnabla.experimental.graph_converters.RemoveFunctionModifier(rm_funcs=[])[source]¶
Remove specified function layer(s) from a graph.
A convenient converter for when one or more functions in an existing graph need to be removed. This converter removes the specified function(s) without recreating a new graph from scratch.
- Parameters
rm_funcs (list of str) – List of function names.
Examples:
pred = Model(...)

import nnabla.experimental.graph_converters as GC

modifiers = [GC.RemoveFunctionModifier(rm_funcs=['BatchNormalization', 'MulScalar'])]
gc = GC.GraphConverter(modifiers)
pred = gc.convert(pred)
Trainers¶
- class nnabla.experimental.trainers.Trainer(updater=None, evaluator=None, model_save_path=None, max_epoch=1, iter_per_epoch=None, callback_on_start=<function Trainer.<lambda>>, callback_on_finish=<function Trainer.<lambda>>, update_callback_on_start=<function Trainer.<lambda>>, update_callback_on_finish=<function Trainer.<lambda>>)[source]¶
Trainer API
The Trainer class is the basic class for training a neural network. You can compose this class into your own trainer class and delegate to its train method.
- Parameters
updater (Updater or list of Updater) – Updater object.
evaluator (Evaluator or list of Evaluator) – Evaluator object.
model_save_path (str) – Model save path.
max_epoch (int) – Max epoch to train.
iter_per_epoch (int, optional) – Iterations per epoch.
callback_on_start (callable object, function, lambda, or list of these, optional) – Callback called before trainer.train.
callback_on_finish (callable object, function, lambda, or list of these, optional) – Callback called after trainer.train.
update_callback_on_start (callable object, function, lambda, or list of these, optional) – Callback called before updater.update.
update_callback_on_finish (callable object, function, lambda, or list of these, optional) – Callback called after updater.update.
The following example is a complete snippet to use this base trainer.
Example
import nnabla as nn
import nnabla.functions as F
import nnabla.parametric_functions as PF
import nnabla.solvers as S
from nnabla.monitor import Monitor, MonitorSeries, MonitorTimeElapsed
import numpy as np

from nnabla.experimental.trainers import Trainer, Updater, Evaluator

# Batch, channel, height, width
b, c, h, w = 32, 1, 128, 128

# Train Input
tinput = nn.Variable([b, c, h, w])
tlabel = nn.Variable([b, c, h, w])

# Train Model and Loss
tpred = <training model>.apply(persistent=True)
tloss = F.mean(F.softmax_cross_entropy(tpred, tlabel))

# Test Input
vinput = nn.Variable([b, c, h, w])
vlabel = nn.Variable([b, c, h, w])

# Test Model and Error
vpred = <evaluation model>.apply(persistent=True)
vloss = F.mean(F.softmax_cross_entropy(vpred, vlabel))
verror = F.mean(F.top_n_error(vpred.get_unlinked_variable(), vlabel))

# Solver
solver = S.Adam()
solver.set_parameters(nn.get_parameters())

# DataIterator
tdata = <training_data_iterator>
vdata = <validation_data_iterator>

# Monitor
monitor = Monitor(<monitor_path>)
monitor_loss = MonitorSeries("Training loss", monitor, interval=10)
monitor_err = MonitorSeries("Training error", monitor, interval=10)
monitor_time = MonitorTimeElapsed("Training time", monitor, interval=100)
monitor_verr = MonitorSeries("Valid error", monitor, interval=10)

# Updater
def tdata_feeder():
    tinput.d, tlabel.d = tdata.next()
def update_callback_on_finish(i):
    monitor_loss.add(i, tloss.d)
    monitor_time.add(i)
updater = Updater(solver, tloss,
                  data_feeder=tdata_feeder,
                  update_callback_on_finish=update_callback_on_finish)

# Evaluator
def vdata_feeder():
    vinput.d, vlabel.d = vdata.next()
def eval_callback_on_finish(i, ve):
    monitor_verr.add(i, ve)
evaluator = Evaluator(verror,
                      data_feeder=vdata_feeder,
                      val_iter=vdata.size // b,
                      callback_on_finish=eval_callback_on_finish)

# Trainer
trainer = Trainer(updater, evaluator, <model_save_path>,
                  max_epoch=<max_epoch>, iter_per_epoch=tdata.size // b)
trainer.train()
- class nnabla.experimental.trainers.NaiveClassificationTrainer(solver, tinput=None, tlabel=None, tpred=None, tdata=None, vinput=None, vlabel=None, vpred=None, vdata=None, monitor_path=None, model_save_path=None, max_epoch=1, iter_per_epoch=None, val_iter=None)[source]¶
Naive Classification Trainer
- Parameters
solver (Solver) – Solver object.
tinput (Variable) – Input variable for the input feature in training.
tlabel (Variable) – Label variable for the label in training.
tpred (Variable) – Root variable for prediction in the training graph.
tdata (nnabla.utils.data_iterator.DataIterator) – DataIterator for training.
vinput (Variable) – Input variable for the input feature in evaluation.
vlabel (Variable) – Label variable for the label in evaluation.
vpred (Variable) – Root variable for prediction in the evaluation graph.
vdata (DataIterator) – DataIterator for evaluation.
monitor_path (str) – Monitor path.
model_save_path (str) – Model save path.
max_epoch (int) – Max epoch to train.
iter_per_epoch (int, optional) – Iterations per epoch. If not set, this value is determined by tdata.size // tdata.batch_size.
val_iter (int, optional) – Iterations for evaluation. If not set, this value is determined by vdata.size // vdata.batch_size.
The following example is a complete snippet to use this base trainer.
Example
import nnabla as nn
import nnabla.functions as F
import nnabla.parametric_functions as PF
import nnabla.solvers as S
import numpy as np

from nnabla.experimental.trainers import NaiveClassificationTrainer

# Batch, channel, height, width
b, c, h, w = 32, 1, 128, 128

# Train Input
tinput = nn.Variable([b, c, h, w])
tlabel = nn.Variable([b, c, h, w])

# Train Model and Loss
tpred = <training model>

# Test Input
vinput = nn.Variable([b, c, h, w])
vlabel = nn.Variable([b, c, h, w])  # label variable for evaluation

# Test Model
vpred = <evaluation model>

# Solver
solver = S.Adam()
solver.set_parameters(nn.get_parameters())

# DataIterator
tdata = <training_data_iterator>
vdata = <validation_data_iterator>

# Trainer
trainer = NaiveClassificationTrainer(solver,
                                     tinput, tlabel, tpred, tdata,
                                     vinput, vlabel, vpred, vdata,
                                     <monitor_path>,
                                     <model_save_path>,
                                     max_epoch=<max_epoch>)
trainer.train()
- class nnabla.experimental.trainers.NaiveRegressionTrainer(solver, tinput=None, tlabel=None, tpred=None, tdata=None, vinput=None, vlabel=None, vpred=None, vdata=None, monitor_path=None, model_save_path=None, max_epoch=1, iter_per_epoch=None, val_iter=None)[source]¶
Naive Regression Trainer
- Parameters
solver (Solver) – Solver object.
tinput (Variable) – Input variable for the input feature in training.
tlabel (Variable) – Label variable for the label in training.
tpred (Variable) – Root variable for prediction in the training graph.
tdata (nnabla.utils.data_iterator.DataIterator) – DataIterator for training.
vinput (Variable) – Input variable for the input feature in evaluation.
vlabel (Variable) – Label variable for the label in evaluation.
vpred (Variable) – Root variable for prediction in the evaluation graph.
vdata (DataIterator) – DataIterator for evaluation.
monitor_path (str) – Monitor path.
model_save_path (str) – Model save path.
max_epoch (int) – Max epoch to train.
iter_per_epoch (int, optional) – Iterations per epoch. If not set, this value is determined by tdata.size // tdata.batch_size.
val_iter (int, optional) – Iterations for evaluation. If not set, this value is determined by vdata.size // vdata.batch_size.
Example
import nnabla as nn
import nnabla.functions as F
import nnabla.parametric_functions as PF
import nnabla.solvers as S
import numpy as np

from nnabla.experimental.trainers import NaiveRegressionTrainer

# Batch, channel, height, width
b, c, h, w = 32, 1, 128, 128

# Train Input
tinput = nn.Variable([b, c, h, w])
tlabel = nn.Variable([b, c, h, w])

# Train Model and Loss
tpred = <training model>

# Test Input
vinput = nn.Variable([b, c, h, w])
vlabel = nn.Variable([b, c, h, w])

# Test Model
vpred = <evaluation model>

# Solver
solver = S.Adam()
solver.set_parameters(nn.get_parameters())

# DataIterator
tdata = <training_data_iterator>
vdata = <validation_data_iterator>

# Trainer
trainer = NaiveRegressionTrainer(solver,
                                 tinput, tlabel, tpred, tdata,
                                 vinput, vlabel, vpred, vdata,
                                 <monitor_path>,
                                 <model_save_path>,
                                 max_epoch=<max_epoch>)
trainer.train()
- class nnabla.experimental.trainers.Updater(solver=None, loss=None, data_feeder=<function Updater.<lambda>>, forward_callback_on_start=<function Updater.<lambda>>, forward_callback_on_finish=<function Updater.<lambda>>, backward_callback_on_start=<function Updater.<lambda>>, backward_callback_on_finish=<function Updater.<lambda>>, comm_callback_on_start=<function Updater.<lambda>>, comm_callback_on_finish=<function Updater.<lambda>>, update_callback_on_start=<function Updater.<lambda>>, update_callback_on_finish=<function Updater.<lambda>>, clear_buffer=True, accum_grad=1, comm=None, grads=[])[source]¶
- Parameters
solver (nnabla.solvers.Solver) – Solver object, e.g. Momentum or Adam.
loss (nnabla.Variable) – Loss variable from which the forward and the backward are called.
data_feeder (callable object, function, or lambda) – Data feeder.
forward_callback_on_start (callable object, function, lambda, or list of these, optional) – Callback called before the forward function.
forward_callback_on_finish (callable object, function, lambda, or list of these, optional) – Callback called after the forward function.
backward_callback_on_start (callable object, function, lambda, or list of these, optional) – Callback called before the backward function.
backward_callback_on_finish (callable object, function, lambda, or list of these, optional) – Callback called after the backward function.
comm_callback_on_start (callable object, function, lambda, or list of these, optional) – Callback called before comm.all_reduce.
comm_callback_on_finish (callable object, function, lambda, or list of these, optional) – Callback called after comm.all_reduce.
update_callback_on_start (callable object, function, lambda, or list of these, optional) – Callback called before the update function.
update_callback_on_finish (callable object, function, lambda, or list of these, optional) – Callback called after the update function.
clear_buffer (bool, optional) – Clears no-longer-referenced variables during backpropagation to save memory.
accum_grad (int, optional) – Number of gradient accumulations. The update method of the Solver is called after the forward and backward have been called accum_grad times. Default is 1.
comm (nnabla.communicators.Communicator, optional) – Communicator for distributed training. Default is None.
grads (list of nnabla.NdArray, optional) – The list of gradients to be exchanged in distributed training. Default is the empty list.
Example
from nnabla.experimental.trainers import Updater

solver = <Solver>
loss = <Loss Variable of Network>

def tdata_feeder():
    ...
def update_callback_on_finish(i):
    ...
updater = Updater(solver, loss,
                  data_feeder=tdata_feeder,
                  update_callback_on_finish=update_callback_on_finish)

# Training iteration
for itr in range(<max_iter>):
    updater.update()
- class nnabla.experimental.trainers.Evaluator(vroot=None, data_feeder=None, val_iter=None, callback_on_start=<function Evaluator.<lambda>>, callback_on_finish=<function Evaluator.<lambda>>, clear_buffer=True, comm=None)[source]¶
- Parameters
vroot (Variable) – Root variable of the evaluation graph.
data_feeder (callable object, function, or lambda) – Data feeder.
val_iter (int, optional) – Iterations for evaluation.
callback_on_start (callable object, function, lambda, or list of these, optional) – Callback called before evaluator.evaluate.
callback_on_finish (callable object, function, lambda, or list of these, optional) – Callback called after evaluator.evaluate.
clear_buffer (bool, optional) – Clears no-longer-referenced variables during backpropagation to save memory.
comm (nnabla.communicators.Communicator, optional) – Communicator for distributed training. Default is None.
Example
from nnabla.experimental.trainers import Evaluator

# Evaluator
def vdata_feeder():
    ...
def eval_callback_on_finish(i, ve):
    ...
evaluator = Evaluator(verror,
                      data_feeder=vdata_feeder,
                      val_iter=<val_iter>,
                      callback_on_finish=eval_callback_on_finish)
Mixed Precision Trainings¶
DynamicLossScalingUpdater¶
- class nnabla.experimental.mixed_precision_training.DynamicLossScalingUpdater(solver, loss, data_feeder=<function DynamicLossScalingUpdater.<lambda>>, scale=8.0, scaling_factor=2.0, N=2000, clear_buffer=True, accum_grad=1, weight_decay=None, comm=None, grads=[])[source]¶
Dynamic Loss Scaling Updater for the mixed precision training.
- Parameters
solver (nnabla.solvers.Solver) – Solver object, e.g. Momentum or Adam.
loss (nnabla.Variable) – Loss variable from which the forward and the backward are called.
data_feeder (callable object, function, or lambda) – Data feeder.
scale (float) – Loss scale constant. This changes dynamically during training.
scaling_factor (float) – Scaling factor for the dynamic loss scaling.
N (int) – Interval, the number of iterations in training, for increasing the loss scale by scaling_factor.
clear_buffer (bool) – Clears no-longer-referenced variables during backpropagation to save memory.
accum_grad (int) – Number of gradient accumulations. The update method of the Solver is called after the forward and backward have been called accum_grad times.
weight_decay (float) – Decay constant. Default is None, i.e., the weight decay is not applied.
comm (nnabla.communicators.Communicator) – Communicator for distributed training. Default is None.
grads (list of nnabla.NdArray) – The list of gradients to be exchanged in distributed training. Default is the empty list.
- solver¶
Solver object, e.g. Momentum or Adam.
- loss¶
Loss variable from which the forward and the backward are called.
- Type
nnabla.Variable
- N¶
Interval, the number of iterations in training, for increasing the loss scale by scaling_factor.
- Type
int
- clear_buffer¶
Clears no-longer-referenced variables during backpropagation to save memory.
- Type
bool
- accum_grad¶
Number of gradient accumulations. The update method of the Solver is called after the forward and backward have been called accum_grad times.
- Type
int
- comm¶
Communicator for distributed training.
- grads¶
The list of gradients to be exchanged in distributed training.
- Type
list of nnabla.NdArray
Example
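A minimal usage sketch (the placeholders follow the Updater example above; the keyword values shown are the documented defaults):

from nnabla.experimental.mixed_precision_training import DynamicLossScalingUpdater

solver = <Solver>
loss = <Loss Variable of Network>

def tdata_feeder():
    ...
updater = DynamicLossScalingUpdater(solver, loss,
                                    data_feeder=tdata_feeder,
                                    scale=8.0, scaling_factor=2.0, N=2000)

# Training iteration
for itr in range(<max_iter>):
    updater.update()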
Reference:
Parametric Function Classes¶
- class nnabla.experimental.parametric_function_class.affine.Affine(n_inmaps, n_outmaps, base_axis=1, w_init=None, b_init=None, fix_parameters=False, rng=None, with_bias=True)[source]¶
The affine layer, also known as the fully connected layer. Computes
\[{\mathbf y} = {\mathbf A} {\mathbf x} + {\mathbf b}.\]where \({\mathbf x}, {\mathbf y}\) are the inputs and outputs respectively, and \({\mathbf A}, {\mathbf b}\) are constants.
- Parameters
inp (Variable) – Input N-D array with shape (\(M_0 \times \ldots \times M_{B-1} \times D_B \times \ldots \times D_N\)). Dimensions before and after base_axis are flattened as if it were a matrix.
n_outmaps (int or tuple of int) – Number of output neurons per data.
base_axis (int) – Dimensions up to base_axis are treated as the sample dimensions.
w_init (nnabla.initializer.BaseInitializer or numpy.ndarray) – Initializer for the weight. By default, it is initialized with nnabla.initializer.UniformInitializer within the range determined by nnabla.initializer.calc_uniform_lim_glorot.
b_init (nnabla.initializer.BaseInitializer or numpy.ndarray) – Initializer for the bias. By default, it is initialized with zeros if with_bias is True.
fix_parameters (bool) – When set to True, the weights and biases will not be updated.
rng (numpy.random.RandomState) – Random generator for the Initializer.
with_bias (bool) – Specify whether to include the bias term.
- Returns
\((B + 1)\)-D array. (\(M_0 \times \ldots \times M_{B-1} \times L\))
- Return type
Variable
- nnabla.experimental.parametric_function_class.affine.Linear¶
alias of
nnabla.experimental.parametric_function_class.affine.Affine
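For example, a minimal sketch of using this class (the shapes are illustrative; the instance is applied by calling it, like the other parametric function classes):

import nnabla as nn
from nnabla.experimental.parametric_function_class.affine import Affine

affine = Affine(64, 10)    # n_inmaps=64, n_outmaps=10
x = nn.Variable((32, 64))  # a batch of 32 input vectors
y = affine(x)              # y.shape == (32, 10)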
- class nnabla.experimental.parametric_function_class.convolution.Convolution(inmaps, outmaps, kernel, pad=None, stride=None, dilation=None, group=1, w_init=None, b_init=None, base_axis=1, fix_parameters=False, rng=None, with_bias=True)[source]¶
N-D Convolution with a bias term.
For Dilated Convolution (a.k.a. Atrous Convolution), refer to:
Chen et al., DeepLab: Semantic Image Segmentation with Deep Convolutional Nets, Atrous Convolution, and Fully Connected CRFs. https://arxiv.org/abs/1606.00915
Yu et al., Multi-Scale Context Aggregation by Dilated Convolutions. https://arxiv.org/abs/1511.07122
Note
Convolution is a computationally intensive operation that should preferably be run with the cudnn backend. NNabla then uses CuDNN library functions to determine and cache the fastest algorithm for the given set of convolution parameters, resulting in additional memory consumption which may pose a problem for GPUs with insufficient memory size. In that case, the NNABLA_CUDNN_WORKSPACE_LIMIT environment variable can be used to restrict the choice of algorithms to those that fit the given workspace memory limit, expressed in bytes. In some cases it may also be desired to restrict the automatic search to algorithms that produce deterministic (reproducible) results. This can be requested by setting the environment variable NNABLA_CUDNN_DETERMINISTIC to a non-zero value.
- Parameters
inp (Variable) – N-D array.
outmaps (int) – Number of convolution kernels (which is equal to the number of output channels). For example, to apply convolution on an input with 16 types of filters, specify 16.
kernel (tuple of int) – Convolution kernel size. For example, to apply convolution on an image with a 3 (height) by 5 (width) two-dimensional kernel, specify (3, 5).
group (int) – Number of groups of channels. This makes connections across channels more sparse by grouping connections along the map direction.
w_init (nnabla.initializer.BaseInitializer or numpy.ndarray) – Initializer for the weight. By default, it is initialized with nnabla.initializer.UniformInitializer within the range determined by nnabla.initializer.calc_uniform_lim_glorot.
b_init (nnabla.initializer.BaseInitializer or numpy.ndarray) – Initializer for the bias. By default, it is initialized with zeros if with_bias is True.
base_axis (int) – Dimensions up to base_axis are treated as the sample dimensions.
fix_parameters (bool) – When set to True, the weights and biases will not be updated.
rng (numpy.random.RandomState) – Random generator for the Initializer.
with_bias (bool) – Specify whether to include the bias term.
- Returns
N-D array. See convolution for the output shape.
- Return type
Variable
- nnabla.experimental.parametric_function_class.convolution.Conv1d¶
alias of
nnabla.experimental.parametric_function_class.convolution.Convolution
- nnabla.experimental.parametric_function_class.convolution.Conv2d¶
alias of
nnabla.experimental.parametric_function_class.convolution.Convolution
- nnabla.experimental.parametric_function_class.convolution.Conv3d¶
alias of
nnabla.experimental.parametric_function_class.convolution.Convolution
- nnabla.experimental.parametric_function_class.convolution.ConvNd¶
alias of
nnabla.experimental.parametric_function_class.convolution.Convolution
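For example, a minimal sketch using the Conv2d alias (the shapes are illustrative):

import nnabla as nn
from nnabla.experimental.parametric_function_class.convolution import Conv2d

conv = Conv2d(3, 16, (3, 3), pad=(1, 1))  # inmaps=3, outmaps=16, 3x3 kernel
x = nn.Variable((8, 3, 32, 32))
y = conv(x)  # with pad=(1, 1), y.shape == (8, 16, 32, 32)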
- class nnabla.experimental.parametric_function_class.deconvolution.Deconvolution(inmaps, outmaps, kernel, pad=None, stride=None, dilation=None, group=1, w_init=None, b_init=None, base_axis=1, fix_parameters=False, rng=None, with_bias=True)[source]¶
Deconvolution layer.
- Parameters
inp (Variable) – N-D array.
outmaps (int) – Number of deconvolution kernels (which is equal to the number of output channels). For example, to apply deconvolution on an input with 16 types of filters, specify 16.
kernel (tuple of int) – Convolution kernel size. For example, to apply deconvolution on an image with a 3 (height) by 5 (width) two-dimensional kernel, specify (3, 5).
group (int) – Number of groups of channels. This makes connections across channels sparser by grouping connections along the map direction.
w_init (nnabla.initializer.BaseInitializer or numpy.ndarray) – Initializer for the weight. By default, it is initialized with nnabla.initializer.UniformInitializer within the range determined by nnabla.initializer.calc_uniform_lim_glorot.
b_init (nnabla.initializer.BaseInitializer or numpy.ndarray) – Initializer for the bias. By default, it is initialized with zeros if with_bias is True.
base_axis (int) – Dimensions up to base_axis are treated as the sample dimensions.
fix_parameters (bool) – When set to True, the weights and biases will not be updated.
rng (numpy.random.RandomState) – Random generator for the Initializer.
with_bias (bool) – Specify whether to include the bias term.
- Returns
N-D array. See deconvolution for the output shape.
- Return type
Variable
- nnabla.experimental.parametric_function_class.deconvolution.Deconv1d¶
alias of
nnabla.experimental.parametric_function_class.deconvolution.Deconvolution
- nnabla.experimental.parametric_function_class.deconvolution.Deconv2d¶
alias of
nnabla.experimental.parametric_function_class.deconvolution.Deconvolution
- nnabla.experimental.parametric_function_class.deconvolution.Deconv3d¶
alias of
nnabla.experimental.parametric_function_class.deconvolution.Deconvolution
- nnabla.experimental.parametric_function_class.deconvolution.DeconvNd¶
alias of
nnabla.experimental.parametric_function_class.deconvolution.Deconvolution
- class nnabla.experimental.parametric_function_class.batch_normalization.BatchNormalization(n_features, n_dims, axes=[1], decay_rate=0.9, eps=1e-05, batch_stat=True, output_stat=False, fix_parameters=False, param_init=None)[source]¶
Batch normalization layer.
\[\begin{split}\begin{array}{lcl} \mu &=& \frac{1}{M} \sum x_i\\ \sigma^2 &=& \frac{1}{M} \sum \left(x_i - \mu\right)^2\\ \hat{x}_i &=& \frac{x_i - \mu}{\sqrt{\sigma^2 + \epsilon}}\\ y_i &=& \hat{x}_i \gamma + \beta. \end{array}\end{split}\]
where \(x_i, y_i\) are the inputs. In testing, the mean and variance computed by the moving average calculated during training are used.
- Parameters
inp (Variable) – N-D array of input.
axes (tuple of int) – Mean and variance for each element in axes are calculated using the elements on the remaining axes. For example, if an input has 4 dimensions and axes is [1], the batch mean is calculated as np.mean(inp.d, axis=(0, 2, 3), keepdims=True) (using a numpy expression as an example).
decay_rate (float) – Decay rate of the running mean and variance.
eps (float) – Tiny value to avoid zero division by std.
batch_stat (bool) – Use mini-batch statistics rather than running ones.
output_stat (bool) – Output batch mean and variance.
fix_parameters (bool) – When set to True, the beta and gamma will not be updated.
param_init (dict) – Parameter initializers can be set with a dict. A key of the dict must be 'beta', 'gamma', 'mean' or 'var'. A value of the dict must be an Initializer or a numpy.ndarray, e.g. {'beta': ConstantInitializer(0), 'gamma': np.ones(gamma_shape) * 2}.
- Returns
N-D array.
- Return type
Variable
References
Ioffe and Szegedy, Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift. https://arxiv.org/abs/1502.03167
The parameter shape has the same number of dimensions as the input data; the dimensions listed in axes have the same size as the input, while the rest are 1. If an input is 4-dim and axes=[1], the parameter shape will be param_shape = np.mean(inp.d, axis=(0, 2, 3), keepdims=True).shape (using a numpy expression as an example).
- class nnabla.experimental.parametric_function_class.batch_normalization.BatchNorm1d(n_features, axes=[1], decay_rate=0.9, eps=1e-05, batch_stat=True, output_stat=False, fix_parameters=False, param_init=None)[source]¶
Batch normalization layer for a 3d array or 3d Variable. This is typically used together with Conv1d.
\[\begin{split}\begin{array}{lcl} \mu &=& \frac{1}{M} \sum x_i\\ \sigma^2 &=& \frac{1}{M} \sum \left(x_i - \mu\right)^2\\ \hat{x}_i &=& \frac{x_i - \mu}{\sqrt{\sigma^2 + \epsilon}}\\ y_i &=& \hat{x}_i \gamma + \beta. \end{array}\end{split}\]
where \(x_i, y_i\) are the inputs. In testing, the mean and variance computed by the moving average calculated during training are used.
- Parameters
inp (Variable) – N-D array of input.
axes (tuple of int) – Mean and variance for each element in axes are calculated using the elements on the remaining axes. For example, if an input has 4 dimensions and axes is [1], the batch mean is calculated as np.mean(inp.d, axis=(0, 2, 3), keepdims=True) (using a numpy expression as an example).
decay_rate (float) – Decay rate of the running mean and variance.
eps (float) – Tiny value to avoid zero division by std.
batch_stat (bool) – Use mini-batch statistics rather than running ones.
output_stat (bool) – Output batch mean and variance.
fix_parameters (bool) – When set to True, the beta and gamma will not be updated.
param_init (dict) – Parameter initializers can be set with a dict. A key of the dict must be 'beta', 'gamma', 'mean' or 'var'. A value of the dict must be an Initializer or a numpy.ndarray, e.g. {'beta': ConstantInitializer(0), 'gamma': np.ones(gamma_shape) * 2}.
- Returns
N-D array.
- Return type
Variable
References
Ioffe and Szegedy, Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift. https://arxiv.org/abs/1502.03167
The parameter shape has the same number of dimensions as the input data; the dimensions listed in axes have the same size as the input, while the rest are 1. If an input is 4-dim and axes=[1], the parameter shape will be param_shape = np.mean(inp.d, axis=(0, 2, 3), keepdims=True).shape (using a numpy expression as an example).
- class nnabla.experimental.parametric_function_class.batch_normalization.BatchNorm2d(n_features, axes=[1], decay_rate=0.9, eps=1e-05, batch_stat=True, output_stat=False, fix_parameters=False, param_init=None)[source]¶
Batch normalization layer for a 4d array or 4d Variable. This is typically used together with Conv2d.
\[\begin{split}\begin{array}{lcl} \mu &=& \frac{1}{M} \sum x_i\\ \sigma^2 &=& \frac{1}{M} \sum \left(x_i - \mu\right)^2\\ \hat{x}_i &=& \frac{x_i - \mu}{\sqrt{\sigma^2 + \epsilon}}\\ y_i &=& \hat{x}_i \gamma + \beta. \end{array}\end{split}\]
where \(x_i, y_i\) are the inputs. In testing, the mean and variance computed by the moving average calculated during training are used.
- Parameters
inp (Variable) – N-D array of input.
axes (tuple of int) – Mean and variance for each element in axes are calculated using the elements on the remaining axes. For example, if an input has 4 dimensions and axes is [1], the batch mean is calculated as np.mean(inp.d, axis=(0, 2, 3), keepdims=True) (using a numpy expression as an example).
decay_rate (float) – Decay rate of the running mean and variance.
eps (float) – Tiny value to avoid zero division by std.
batch_stat (bool) – Use mini-batch statistics rather than running ones.
output_stat (bool) – Output batch mean and variance.
fix_parameters (bool) – When set to True, the beta and gamma will not be updated.
param_init (dict) – Parameter initializers can be set with a dict. A key of the dict must be 'beta', 'gamma', 'mean' or 'var'. A value of the dict must be an Initializer or a numpy.ndarray, e.g. {'beta': ConstantInitializer(0), 'gamma': np.ones(gamma_shape) * 2}.
- Returns
N-D array.
- Return type
Variable
References
Ioffe and Szegedy, Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift. https://arxiv.org/abs/1502.03167
The parameter shape has the same number of dimensions as the input data; the dimensions listed in axes have the same size as the input, while the rest are 1. If an input is 4-dim and axes=[1], the parameter shape will be param_shape = np.mean(inp.d, axis=(0, 2, 3), keepdims=True).shape (using a numpy expression as an example).
- class nnabla.experimental.parametric_function_class.batch_normalization.BatchNorm3d(n_features, axes=[1], decay_rate=0.9, eps=1e-05, batch_stat=True, output_stat=False, fix_parameters=False, param_init=None)[source]¶
Batch normalization layer for a 5d array or 5d Variable. This is typically used together with Conv3d.
\[\begin{split}\begin{array}{lcl} \mu &=& \frac{1}{M} \sum x_i\\ \sigma^2 &=& \frac{1}{M} \sum \left(x_i - \mu\right)^2\\ \hat{x}_i &=& \frac{x_i - \mu}{\sqrt{\sigma^2 + \epsilon}}\\ y_i &=& \hat{x}_i \gamma + \beta. \end{array}\end{split}\]
where \(x_i, y_i\) are the inputs. In testing, the mean and variance computed by the moving average calculated during training are used.
- Parameters
inp (Variable) – N-D array of input.
axes (tuple of int) – Mean and variance for each element in axes are calculated using the elements on the remaining axes. For example, if an input has 4 dimensions and axes is [1], the batch mean is calculated as np.mean(inp.d, axis=(0, 2, 3), keepdims=True) (using a numpy expression as an example).
decay_rate (float) – Decay rate of the running mean and variance.
eps (float) – Tiny value to avoid zero division by std.
batch_stat (bool) – Use mini-batch statistics rather than running ones.
output_stat (bool) – Output batch mean and variance.
fix_parameters (bool) – When set to True, the beta and gamma will not be updated.
param_init (dict) – Parameter initializers can be set with a dict. A key of the dict must be 'beta', 'gamma', 'mean' or 'var'. A value of the dict must be an Initializer or a numpy.ndarray, e.g. {'beta': ConstantInitializer(0), 'gamma': np.ones(gamma_shape) * 2}.
- Returns
N-D array.
- Return type
Variable
References
Ioffe and Szegedy, Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift. https://arxiv.org/abs/1502.03167
The parameter shape has the same number of dimensions as the input data; the dimensions listed in axes have the same size as the input, while the rest are 1. If an input is 4-dim and axes=[1], the parameter shape will be param_shape = np.mean(inp.d, axis=(0, 2, 3), keepdims=True).shape (using a numpy expression as an example).
- class nnabla.experimental.parametric_function_class.embed.Embed(n_inputs, n_features, w_init=None, fix_parameters=False)[source]¶
Embed.
Embed slices a matrix/tensor with an indexing array/tensor. Weights are initialized with nnabla.initializer.UniformInitializer within the range of \(-\sqrt{3}\) and \(\sqrt{3}\).
- Parameters
- Returns
Output with shape \((I_0, ..., I_N, W_1, ..., W_M)\)
- Return type
Variable
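For example, a minimal sketch (interpreting n_inputs as the number of possible indices and n_features as the embedding size, based on the output shape above):

import nnabla as nn
from nnabla.experimental.parametric_function_class.embed import Embed

emb = Embed(10000, 128)    # 10000 possible indices, 128 features
x = nn.Variable((32, 20))  # integer indices
y = emb(x)                 # y.shape == (32, 20, 128)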
C++ API¶
The C++ libraries currently provide:
APIs to execute inference with a trained model created by the Python APIs or Neural Network Console, Sony's GUI neural network IDE.
A command line interface written in C++ which executes an inference.
An example of how to use C++ API with a trained model.
We are still preparing a well-formatted C++ API reference manual; however, you can read through the header files, where most of the classes and functions are documented in Doxygen format. The header files can be found under the include directory.
The example MNIST runtime is a good starting point for understanding how to use the C++ API for neural network inference.
Build C++ libraries¶
Documentation has been moved to Github repository.
C++ Command Line Interface¶
NNabla has a C++ command line interface utility which can perform training and forward (inference). Using this command line interface, developers can train and infer without any Python environment.
usage: nbla (infer|dump|train)
Basic functions¶
Forward¶
usage: nbla infer -e EXECUTOR [-b BATCHSIZE] [-o OUTPUT] input_files ...
arguments:
-e EXECUTOR EXECUTOR is the name of the executor network.
input_files input_file must be one of the following.
*.nnp : Network structure and parameter.
*.nntxt : Network structure in prototxt format.
*.prototxt : Same as nntxt.
*.h5 : Parameters in h5 format.
*.protobuf : Network structure and parameters in binary.
*.bin : Input data.
optional arguments:
-b BATCHSIZE batch size for the input data.
-o OUTPUT the filename pattern of the output file; by default, output goes to stdout.
example:
Infer using LeNet_input.bin as input, LeNet_output_0.bin as output:
nbla infer -e Executor -b 1 LeNet.nnp LeNet_input.bin -o LeNet_output
Infer and output the result to console:
nbla infer -e Executor -b 1 LeNet.nnp LeNet_input.bin
Dump¶
usage: nbla dump input_files ...
arguments:
input_files input_files must be one of *.nnp, *.nntxt, *.prototxt, *.h5, *.protobuf
example:
Show network information by dump command:
nbla dump LeNet.nnp
The output looks like:
This configuration has 1 executors.
Executor No.0 Name [Executor]
Using default batch size 64 .
Inputs
Input No.0 Name [x] Shape ( 64 1 28 28 )
Outputs
Output No.0 Name [y'] Shape ( 64 10 )
Finished
Train¶
usage: nbla train input_file
arguments:
input_file input_file must be *.nnp
C++ API Examples¶
Follow this link to see examples.
Data exchange file format¶
Data exchange format for “Neural Network Libraries”.
The current version of the .nnp file is just a ZIP archive, but with the filename extension ‘.nnp’.
A ‘.nnp’ file contains the following files. If a ‘.nnp’ file contains other files, nnabla just ignores them.
‘nnp_version.txt’
Specifies the version of the nnp file. The version string in this file is obtained from nnp_version().
‘.nntxt’ (or ‘.prototxt’)
Network structure in Protocol buffer text format.
‘*.protobuf’
Trained parameters in Protocol buffer binary format.
‘*.h5’
Trained parameters in HDF5 format. (Will be obsolete soon.)
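Because a ‘.nnp’ file is a plain ZIP archive, its contents can be inspected with standard tools. A small Python sketch (the filename is hypothetical):
import zipfile

with zipfile.ZipFile('model.nnp') as nnp:
    print(nnp.namelist())  # e.g. ['nnp_version.txt', 'network.nntxt', 'parameter.protobuf']
    print(nnp.read('nnp_version.txt').decode())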
Data Format¶
Here is the data format for exchanging network structures and trained parameters.
Network Structure¶
Network structure and parameters are stored internally in Google Protocol Buffers format.
Overview¶
The overview of the network structure is defined as follows.
- NNablaProtoBuf
Root message of the NNabla network structure. This message can store GlobalConfig, TrainingConfig, Network(s), Parameter(s), Dataset(s), Optimizer(s), Monitor(s) and Executor(s).
- Variable
Internal data structure to store tensors for neural network I/O and parameters.
- GlobalConfig
Configuration of the environment suggested for training or inference.
- TrainingConfig
Configuration of training.
- Network
Network structure.
- Parameter
Special variable to store training results (e.g. the weight or bias of an affine layer).
- Dataset
Specify dataset for training.
- Optimizer
Define network, dataset, and input/output variables for training.
- Monitor
Define network, dataset, and input/output variables for monitoring training status.
- Executor
Define network and input/output variables for inference.
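From Python, a saved NNP can be loaded and its Network messages instantiated as computation graphs without touching the protobuf directly. A minimal sketch using nnabla.utils.nnp_graph (the file name is hypothetical):
from nnabla.utils.nnp_graph import NnpLoader

nnp = NnpLoader('model.nnp')
print(nnp.get_network_names())  # names of the Network messages in the file
net = nnp.get_network(nnp.get_network_names()[0], batch_size=1)
x = list(net.inputs.values())[0]   # input Variable
y = list(net.outputs.values())[0]  # output Variable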
Structure for Training¶
TBD
Structure for Inference¶
TBD
Overall structure¶
Parameter¶
From the performance point of view, parameters can be saved in HDF5 format.
File Format and extensions¶
- Protocol buffer text format file
.nntxt or .prototxt
- Protocol buffer serialized binary file
.protobuf
- HDF5
.h5
- NNP (ZIP archive file containing the above formats)
.nnp
File format converter¶
Overview¶
The file format converter realizes the Neural Network Libraries (or Console) workflow with the ONNX file format, and also with the NNabla C Runtime.
The file format converter has the following functions.
Convert NNP variations to valid NNP
Convert ONNX to NNP
Convert NNP to ONNX
Convert NNP to NNB (binary format for NNabla C Runtime)
Convert NNP to Tensorflow saved_model
Convert Tensorflow checkpoint, frozen graph or saved_model to NNP
Convert NNP to Tensorflow Lite
Convert Tensorflow Lite to NNP
Experimental: Convert NNP to C Source code for NNabla C Runtime
IMPORTANT NOTICE: This file format converter still has some known problems.
Supported ONNX operators are limited. See Function-Level Support Status.
Supported Tensorflow operators are limited. See Function-Level Support Status.
Converting NNP to C source code is still experimental. It should work, but has not been tested well.
Architecture¶
This file format converter uses protobuf defined in Neural Network Libraries as the intermediate format.
This is not a generic file format converter but a dedicated converter for Neural Network Libraries.
This converter can specify both inputs and outputs for an ONNX file, but if the ONNX file contains a function unsupported by Neural Network Libraries, the conversion may fail.
This converter also provides some intermediate process functionalities. See Process.
Conversion¶
Supported Formats¶
NNP¶
NNP is the file format of NNabla.
The NNP format is described at Data Format.
However, this file format converter works with several variations of NNP.
Standard NNP format (.nnp)
Contents of NNP files (.nntxt, .prototxt, .h5, .protobuf)
ONNX¶
Training is not supported.
Supports operator sets 7, 9, 10 and 11.
Not all functions are supported. See Function-Level Support Status.
Only limited Neural Network Console projects are supported. See Model Support Status.
Before using this converter, please install nnabla_converter:
pip install nnabla_converter
NNB¶
NNB is a compact binary format for the NNabla C Runtime. The file layout is as follows.
There are several concepts in this file, such as buffer, variable, function, input and output. Each of them is represented as a list. Each list is recorded with 2 members: the number of objects and an index into the memory block index table. The entry in the memory block index table points to the start address of the corresponding memory data block.
It is designed for nnabla-c-runtime.
C Source Code¶
File format converter supports C source code output for nnabla-c-runtime.
Tensorflow¶
Bridged by ONNX, TensorFlow import and export are supported with some limitations.
- As for the importer, 4 formats are supported:
.pb, tensorflow frozen graph format
.ckpt, tensorflow checkpoint format version 1
.ckpt.*, tensorflow checkpoint format version 2
saved_model, tensorflow saved_model format
As for the exporter, some Neural Network Console projects are supported. See Model Support Status. The output of the converter is the tensorflow saved_model format.
Before using this converter, please install nnabla_converter:
pip install nnabla_converter
Tensorflow Lite¶
- For export to tensorflow lite, please install the flatbuffers package:
For Windows platform, download the package from FlatBuffers and extract it.
For Linux platform, use the command snap install flatbuffers to install flatbuffers.
For macOS platform, use the command brew install flatbuffers to install flatbuffers.
Then add the executable file flatc to the system PATH.
After exporting to TFLite, a JSON file with the same name is generated, recording whether the input and output of the TFLite network need to be transposed to channel_last according to base_axis.
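If the generated JSON indicates that a transpose is needed, input data prepared in NNabla's channel-first layout can be permuted with numpy before being fed to the TFLite interpreter. A rough sketch:
import numpy as np

x = np.random.rand(1, 3, 224, 224).astype(np.float32)  # NNabla layout: (N, C, H, W)
x_tflite = np.transpose(x, (0, 2, 3, 1))               # channel_last layout: (N, H, W, C)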
Process¶
Expand Repeat and Recurrent¶
Neural Network Console supports the LoopControl pseudo functions RepeatStart, RepeatEnd, RecurrentInput, RecurrentOutput and Delay.
Currently, these functions are not supported by Neural Network Libraries directly.
The file format converter expands the network and removes these pseudo functions by default.
If you want to preserve them, specify the command line option --nnp-no-expand-network when converting files.
Split network¶
You can split a network with the --split option.
See Splitting network to use this functionality.
Usage¶
NNP Operation¶
Convert NNP to NNP¶
Sometimes we need to convert NNP to NNP.
The most common use case is expanding a repeat or recurrent network that is supported by Neural Network Console but not by the C++ API.
$ nnabla_cli convert input.nnp output.nnp
Convert console output to single NNP file¶
The current version of Neural Network Console outputs .nntxt and .h5 as the training result.
We then need to convert the separate files into a single NNP with parameters stored in protobuf format.
$ nnabla_cli convert net.nntxt parameters.h5 output.nnp
Convert console output to single NNP file without expanding Repeat or Recurrent¶
$ nnabla_cli convert --nnp-no-expand-network net.nntxt parameters.h5 output.nnp
Keep parameter format as hdf5¶
$ nnabla_cli convert --nnp-no-expand-network --nnp-parameter-h5 net.nntxt parameters.h5 output.nnp
Everything into a single nntxt¶
$ nnabla_cli convert --nnp-parameter-nntxt net.nntxt parameters.h5 output.nntxt
ONNX Operation¶
Convert NNP to ONNX¶
$ nnabla_cli convert input.nnp output.onnx
To output ONNX opset 9, use the following (the default is opset 7):
$ nnabla_cli convert input.nnp output.onnx -d opset_9
Convert ONNX to NNP¶
$ nnabla_cli convert input.onnx output.nnp
Currently, opsets 7, 9, 10 and 11 are supported for import.
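Before importing, the opset declared by an ONNX file can be checked with the onnx Python package (the filename is hypothetical):
import onnx

model = onnx.load('input.onnx')
print([(op.domain, op.version) for op in model.opset_import])  # e.g. [('', 9)]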
C Runtime Operation¶
Generally, it is better to set the batch size to 1 when converting a file for the C runtime.
If the batch size is larger than 1, the batch data must be processed collectively.
To make the batch size 1, add -b 1 to the command line options.
Convert NNP to NNB¶
$ nnabla_cli convert -b 1 input.nnp output.nnb
Convert NNP to C source code¶
$ nnabla_cli convert -b 1 -O CSRC input.nnp output-dir
Quantization¶
The C runtime library supports binary (or fixed point) weights, which can dramatically downsize the model (and its footprint). See Compress network by fixed point quantization for how to quantize your model.
Tensorflow Operation¶
Convert NNP to Tensorflow saved_model¶
$ nnabla_cli convert input.nnp output_saved_model --export-format SAVED_MODEL
Convert NNP to Tensorflow frozen graph¶
$ nnabla_cli convert input.nnp output.pb
Convert Tensorflow frozen graph to NNP¶
$ nnabla_cli convert input.pb output.nnp
Convert Tensorflow checkpoint to NNP¶
For checkpoint version 1:
$ nnabla_cli convert input.ckpt output.nnp --inputs x0,x1 --outputs y0,y1
In the same directory as input.ckpt, the related files, such as checkpoint, input.ckpt.meta and so on, are required to exist. The --inputs option takes the input names of the model, separated by commas; the --outputs option is the same for output names. When parsing the checkpoint format, inputs and outputs must be provided.
For checkpoint version 2:
$ nnabla_cli convert input.ckpt.meta output.nnp --inputs x0,x1 --outputs y0,y1
In the same directory as input.ckpt.meta, the related files, such as checkpoint, *.ckpt.index, … and so on, are required to exist.
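If you are unsure of the input and output names, the meta graph can be inspected first. A rough sketch using the TensorFlow 1.x compatibility API (assumed to be available in your TensorFlow installation):
import tensorflow.compat.v1 as tf

tf.train.import_meta_graph('input.ckpt.meta')
graph = tf.get_default_graph()
for op in graph.get_operations():
    if op.type == 'Placeholder':  # candidate input nodes
        print(op.name)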
Convert Tensorflow saved_model to NNP¶
$ nnabla_cli convert input_saved_model output.nnp
Convert NNP to Tensorflow Lite¶
$ nnabla_cli convert input.nnp output.tflite
Convert Tensorflow Lite to NNP¶
$ nnabla_cli convert input.tflite output.nnp
Splitting network¶
Splitting a network is a bit complicated and can be troublesome.
An NNP file can have multiple Executor networks, but Split supports splitting only a single network.
First, you must confirm how many Executors there are in the NNP, and specify which executor to split, using nnabla_cli dump.
$ nnabla_cli dump squeezenet11.files/SqueezeNet-1.1/*.{nntxt,h5}
2018-08-27 15:02:40,006 [nnabla][INFO]: Initializing CPU extension...
Importing squeezenet11.files/SqueezeNet-1.1/net.nntxt
Importing squeezenet11.files/SqueezeNet-1.1/parameters.h5
Expanding Training.
Expanding Top5Error.
Expanding Top1Error.
Expanding Runtime.
Optimizer[0]: Optimizer
Optimizer[0]: (In) Data variable[0]: Name:TrainingInput Shape:[-1, 3, 480, 480]
Optimizer[0]: (In) Data variable[1]: Name:SoftmaxCrossEntropy_T Shape:[-1, 1]
Optimizer[0]: (Out)Loss variable[0]: Name:SoftmaxCrossEntropy Shape:[-1, 1]
Monitor [0]: train_error
Monitor [0]: (In) Data variable[0]: Name:Input Shape:[-1, 3, 320, 320]
Monitor [0]: (In) Data variable[1]: Name:Top5Error_T Shape:[-1, 1]
Monitor [0]: (Out)Monitor variable[0]: Name:Top5Error Shape:[-1, 1]
Monitor [1]: valid_error
Monitor [1]: (In) Data variable[0]: Name:Input Shape:[-1, 3, 320, 320]
Monitor [1]: (In) Data variable[1]: Name:Top1rror_T Shape:[-1, 1]
Monitor [1]: (Out)Monitor variable[0]: Name:Top1rror Shape:[-1, 1]
Executor [0]: Executor
Executor [0]: (In) Data variable[0]: Name:Input Shape:[-1, 3, 320, 320]
Executor [0]: (Out)Output variable[0]: Name:y' Shape:[-1, 1000]
From the above output, you now know there is only 1 executor.
Then you can show the executor information with nnabla_cli dump -E0.
$ nnabla_cli dump -E0 squeezenet11.files/SqueezeNet-1.1/*.{nntxt,h5}
2018-08-27 15:03:26,547 [nnabla][INFO]: Initializing CPU extension...
Importing squeezenet11.files/SqueezeNet-1.1/net.nntxt
Importing squeezenet11.files/SqueezeNet-1.1/parameters.h5
Try to leave only executor[Executor].
Expanding Runtime.
Executor [0]: Executor
Executor [0]: (In) Data variable[0]: Name:Input Shape:[-1, 3, 320, 320]
Executor [0]: (Out)Output variable[0]: Name:y' Shape:[-1, 1000]
You can get the list of functions by adding the -F option.
$ nnabla_cli dump -FE0 squeezenet11.files/SqueezeNet-1.1/*.{nntxt,h5}
2018-08-27 15:04:10,954 [nnabla][INFO]: Initializing CPU extension...
Importing squeezenet11.files/SqueezeNet-1.1/net.nntxt
Importing squeezenet11.files/SqueezeNet-1.1/parameters.h5
Try to leave only executor[Executor].
Expanding Runtime.
Executor [0]: Executor
Executor [0]: (In) Data variable[0]: Name:Input Shape:[-1, 3, 320, 320]
Executor [0]: (Out)Output variable[0]: Name:y' Shape:[-1, 1000]
Executor [0]: Function[ 0 ]: Type: Slice Name: Slice
Executor [0]: Function[ 1 ]: Type: ImageAugmentation Name: ImageAugmentation
Executor [0]: Function[ 2 ]: Type: MulScalar Name: SqueezeNet/MulScalar
Executor [0]: Function[ 3 ]: Type: AddScalar Name: SqueezeNet/AddScalar
Executor [0]: Function[ 4 ]: Type: Convolution Name: SqueezeNet/Convolution
Executor [0]: Function[ 5 ]: Type: ReLU Name: SqueezeNet/ReLU
Executor [0]: Function[ 6 ]: Type: MaxPooling Name: SqueezeNet/MaxPooling
SNIP...
Executor [0]: Function[ 63 ]: Type: ReLU Name: SqueezeNet/FireModule_8/Expand1x1ReLU
Executor [0]: Function[ 64 ]: Type: Concatenate Name: SqueezeNet/FireModule_8/Concatenate
Executor [0]: Function[ 65 ]: Type: Dropout Name: SqueezeNet/Dropout
Executor [0]: Function[ 66 ]: Type: Convolution Name: SqueezeNet/Convolution_2
Executor [0]: Function[ 67 ]: Type: ReLU Name: SqueezeNet/ReLU_2
Executor [0]: Function[ 68 ]: Type: AveragePooling Name: SqueezeNet/AveragePooling
Executor [0]: Function[ 69 ]: Type: Reshape Name: SqueezeNet/Reshape
Executor [0]: Function[ 70 ]: Type: Identity Name: y'
If you want to get the network without ImageAugmentation: according to the above output, ImageAugmentation is placed at index 1.
By splitting from index 3 onward, you can get the network without ImageAugmentation (the earlier preprocessing functions are dropped as well).
Specify the -E0 -S 3- options to nnabla_cli convert.
This command renames the output to XXX_S_E.nnp, where XXX is the original name, S is the start function index, and E is the end function index.
$ nnabla_cli convert -E0 -S 3- squeezenet11.files/SqueezeNet-1.1/*.{nntxt,h5} splitted.nnp
2018-08-27 15:20:21,950 [nnabla][INFO]: Initializing CPU extension...
Importing squeezenet11.files/SqueezeNet-1.1/net.nntxt
Importing squeezenet11.files/SqueezeNet-1.1/parameters.h5
Try to leave only executor[Executor].
Expanding Runtime.
Shrink 3 to 70.
Output to [splitted_3_70.nnp]
Finally, you get splitted_3_70.nnp as the split output.
You can check the split NNP with nnabla_cli dump.
NOTE: The input shape is changed from the original network. The new input shape is the same as the start function's input.
$ nnabla_cli dump splitted_3_70.nnp
2018-08-27 15:20:28,021 [nnabla][INFO]: Initializing CPU extension...
Importing splitted_3_70.nnp
Expanding Runtime.
Executor [0]: Executor
Executor [0]: (In) Data variable[0]: Name:SqueezeNet/MulScalar Shape:[-1, 3, 227, 227]
Executor [0]: (Out)Output variable[0]: Name:y' Shape:[-1, 1000]
Done.
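To confirm that the split NNP actually runs, it can be loaded and executed from Python. A minimal check (random input data, for illustration only):
import numpy as np
from nnabla.utils.nnp_graph import NnpLoader

nnp = NnpLoader('splitted_3_70.nnp')
net = nnp.get_network(nnp.get_network_names()[0], batch_size=1)
x = list(net.inputs.values())[0]
y = list(net.outputs.values())[0]
x.d = np.random.rand(*x.shape)  # new input shape reported by dump: (1, 3, 227, 227)
y.forward()
print(y.d.shape)  # (1, 1000)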
Support Status¶
Function-Level Support Status¶
ONNX Support Status¶
- Note
In this document, the numbers in the header of all tables represent the ONNX opset version.
Import¶
✓: onnx specification defined, and supported.
X: onnx specification defined, but not supported yet.
Empty: Not defined (Support status follows latest).
Total: 93/155
ONNX Operator | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | 11 | NNabla Func | Description
---|---|---|---|---|---|---|---|---|---|---|---|---|---
Abs |
✓ |
✓ |
✓ |
Abs |
|||||||||
Acos |
✓ |
ACos |
|||||||||||
Acosh |
✓ |
ACosh |
|||||||||||
Add |
✓ |
✓ |
✓ |
Add2, Reshape |
|||||||||
And |
✓ |
✓ |
LogicalAnd, Reshape |
||||||||||
ArgMax |
✓ |
✓ |
X |
✓ |
Max |
||||||||
ArgMin |
✓ |
✓ |
X |
✓ |
Min |
||||||||
Asin |
✓ |
ASin |
|||||||||||
Asinh |
✓ |
ASinh |
|||||||||||
Atan |
✓ |
ATan |
|||||||||||
Atanh |
✓ |
ATanh |
|||||||||||
AveragePool |
✓ |
✓ |
X |
X |
AveragePooling, Pad |
Not all features are verified. Those features can be verified by ONNXRuntime when opset > 6. Some features, such as Pad’s edge mode, are not supported by NNabla. If opset >= 10, ceil_mode is not supported. |
|||||||
BatchNormalization |
X |
X |
X |
✓ |
BatchNormalization |
||||||||
BitShift |
X |
Not yet implemented. |
|||||||||||
Cast |
✓ |
✓ |
✓ |
X |
Abs, Log |
||||||||
Ceil |
✓ |
✓ |
✓ |
Ceil |
|||||||||
Clip |
✓ |
✓ |
✓ |
✓ |
Identity, MaximumScalar, MinimumScalar |
||||||||
Compress |
X |
X |
Not yet implemented. |
||||||||||
Concat |
✓ |
✓ |
✓ |
X |
Concatenate |
||||||||
ConcatFromSequence |
X |
Not yet implemented. |
|||||||||||
Constant |
✓ |
✓ |
X |
X |
Identity |
||||||||
ConstantOfShape |
✓ |
Constant |
|||||||||||
Conv |
✓ |
✓ |
X |
Convolution |
|||||||||
ConvInteger |
X |
Not yet implemented. |
|||||||||||
ConvTranspose |
✓ |
✓ |
X |
Deconvolution, Pad |
|||||||||
Cos |
✓ |
Cos |
|||||||||||
Cosh |
✓ |
Cosh |
|||||||||||
CumSum |
X |
Not yet implemented. |
|||||||||||
DepthToSpace |
✓ |
✓ |
✓ |
Reshape, Transpose |
|||||||||
DequantizeLinear |
X |
Not yet implemented. |
|||||||||||
Det |
X |
Not yet implemented. |
|||||||||||
Div |
✓ |
✓ |
✓ |
Div2, Reshape |
|||||||||
Dropout |
✓ |
✓ |
✓ |
X |
Identity |
||||||||
DynamicQuantizeLinear |
X |
Not yet implemented. |
|||||||||||
Elu |
✓ |
✓ |
✓ |
ELU |
|||||||||
Equal |
✓ |
✓ |
X |
Equal, Reshape |
|||||||||
Erf |
X |
Not yet implemented. |
|||||||||||
Exp |
✓ |
✓ |
✓ |
Exp |
|||||||||
Expand |
✓ |
✓ |
Broadcast, Reshape |
||||||||||
EyeLike |
X |
Not yet implemented. |
|||||||||||
Flatten |
✓ |
✓ |
✓ |
✓ |
Reshape |
||||||||
Floor |
✓ |
✓ |
✓ |
Floor |
|||||||||
GRU |
X |
X |
X |
Not yet implemented. |
|||||||||
Gather |
✓ |
✓ |
✓ |
Concatenate, Slice |
|||||||||
GatherElements |
X |
Not yet implemented. |
|||||||||||
GatherND |
X |
Not yet implemented. |
|||||||||||
Gemm |
✓ |
✓ |
✓ |
✓ |
✓ |
Add2, BatchMatmul, MulScalar, Reshape |
|||||||
GlobalAveragePool |
✓ |
✓ |
GlobalAveragePooling |
||||||||||
GlobalLpPool |
X |
X |
Not yet implemented. |
||||||||||
GlobalMaxPool |
X |
Not yet implemented. |
|||||||||||
Greater |
✓ |
✓ |
✓ |
Greater, Reshape |
|||||||||
HardSigmoid |
✓ |
✓ |
✓ |
AddScalar, HardSigmoid, MaximumScalar, MinimumScalar, MulScalar |
|||||||||
Hardmax |
✓ |
✓ |
✓ |
Max, OneHot, Reshape |
|||||||||
Identity |
✓ |
✓ |
Identity |
||||||||||
If |
X |
Not yet implemented. |
|||||||||||
InstanceNormalization |
✓ |
✓ |
✓ |
BatchNormalization, Concatenate, Reshape, Split |
|||||||||
IsInf |
✓ |
IsInf |
|||||||||||
IsNaN |
✓ |
IsNaN |
|||||||||||
LRN |
✓ |
✓ |
AddScalar, Div2, MulScalar, PowScalar, SumPooling, Transpose |
||||||||||
LSTM |
X |
X |
Not yet implemented. |
||||||||||
LeakyRelu |
✓ |
✓ |
✓ |
LeakyReLU |
|||||||||
Less |
✓ |
✓ |
✓ |
Less, Reshape |
|||||||||
Log |
✓ |
✓ |
✓ |
Log |
|||||||||
LogSoftmax |
✓ |
✓ |
✓ |
Add2, Exp, Log, Max, Reshape, Sub2, Sum |
|||||||||
Loop |
X |
X |
Not yet implemented. |
||||||||||
LpNormalization |
X |
Not yet implemented. |
|||||||||||
LpPool |
X |
X |
X |
Not yet implemented. |
|||||||||
MatMul |
✓ |
✓ |
✓ |
BatchMatmul, Reshape |
|||||||||
MatMulInteger |
X |
Not yet implemented. |
|||||||||||
Max |
✓ |
✓ |
✓ |
✓ |
✓ |
Maximum2 |
|||||||
MaxPool |
✓ |
✓ |
X |
X |
X |
MaxPooling, Pad |
Not all features are verified. Those features can be verified by ONNXRuntime. If opset >= 10, ceil_mode is not supported and dilations not equal to 1 are not supported. |
||||||
MaxRoiPool |
X |
Not yet implemented. |
|||||||||||
MaxUnpool |
X |
X |
Not yet implemented. |
||||||||||
Mean |
✓ |
✓ |
✓ |
✓ |
✓ |
Broadcast, Mean, Stack |
|||||||
MeanVarianceNormalization |
X |
Not yet implemented. |
|||||||||||
Min |
✓ |
✓ |
✓ |
✓ |
✓ |
Minimum2 |
|||||||
Mod |
X |
Not yet implemented. |
|||||||||||
Mul |
✓ |
✓ |
✓ |
Mul2, Reshape |
|||||||||
Multinomial |
X |
Not yet implemented. |
|||||||||||
Neg |
✓ |
✓ |
✓ |
MulScalar |
|||||||||
NonMaxSuppression |
X |
X |
Not yet implemented. |
||||||||||
NonZero |
X |
Not yet implemented. |
|||||||||||
Not |
✓ |
✓ |
LogicalNot |
||||||||||
OneHot |
X |
X |
Not yet implemented. |
||||||||||
Or |
✓ |
✓ |
LogicalOr, Reshape |
||||||||||
PRelu |
✓ |
✓ |
✓ |
X |
PReLU |
||||||||
Pad |
✓ |
✓ |
✓ |
✓ |
Pad |
ONNX requires support for the “edge” mode, while nnabla does not support it. |
|||||||
Pow |
✓ |
✓ |
Pow2, Reshape |
||||||||||
QLinearConv |
X |
Not yet implemented. |
|||||||||||
QLinearMatMul |
X |
Not yet implemented. |
|||||||||||
QuantizeLinear |
X |
Not yet implemented. |
|||||||||||
RNN |
X |
X |
Not yet implemented. |
||||||||||
RandomNormal |
X |
Not yet implemented. |
|||||||||||
RandomNormalLike |
X |
Not yet implemented. |
|||||||||||
RandomUniform |
X |
Not yet implemented. |
|||||||||||
RandomUniformLike |
X |
Not yet implemented. |
|||||||||||
Range |
X |
Not yet implemented. |
|||||||||||
Reciprocal |
✓ |
✓ |
✓ |
RDivScalar |
|||||||||
ReduceL1 |
X |
X |
Not yet implemented. |
||||||||||
ReduceL2 |
X |
X |
Not yet implemented. |
||||||||||
ReduceLogSum |
X |
X |
Not yet implemented. |
||||||||||
ReduceLogSumExp |
X |
X |
Not yet implemented. |
||||||||||
ReduceMax |
✓ |
✓ |
✓ |
Max |
|||||||||
ReduceMean |
✓ |
✓ |
✓ |
Mean |
|||||||||
ReduceMin |
✓ |
✓ |
✓ |
Min |
|||||||||
ReduceProd |
✓ |
✓ |
✓ |
Prod |
|||||||||
ReduceSum |
✓ |
✓ |
✓ |
Sum |
|||||||||
ReduceSumSquare |
✓ |
✓ |
✓ |
PowScalar, Sum |
|||||||||
Relu |
✓ |
✓ |
✓ |
ReLU |
|||||||||
Reshape |
✓ |
✓ |
✓ |
Reshape |
|||||||||
Resize |
X |
X |
Not yet implemented. |
||||||||||
ReverseSequence |
X |
Not yet implemented. |
|||||||||||
RoiAlign |
X |
Not yet implemented. |
|||||||||||
Round |
✓ |
Round |
|||||||||||
Scan |
X |
X |
X |
Not yet implemented. |
|||||||||
Scatter |
X |
X |
Not yet implemented. |
||||||||||
ScatterElements |
X |
Not yet implemented. |
|||||||||||
ScatterND |
X |
Not yet implemented. |
|||||||||||
Selu |
✓ |
✓ |
✓ |
SELU |
|||||||||
SequenceAt |
X |
Not yet implemented. |
|||||||||||
SequenceConstruct |
X |
Not yet implemented. |
|||||||||||
SequenceErase |
X |
Not yet implemented. |
|||||||||||
SequenceInsert |
X |
Not yet implemented. |
|||||||||||
SequenceLength |
X |
Not yet implemented. |
|||||||||||
Shape |
X |
Not yet implemented. |
|||||||||||
Shrink |
X |
Not yet implemented. |
|||||||||||
Sigmoid |
✓ |
✓ |
✓ |
Sigmoid |
|||||||||
Sign |
✓ |
Sign |
|||||||||||
Sin |
✓ |
Sin |
|||||||||||
Sinh |
✓ |
Sinh |
|||||||||||
Size |
X |
Not yet implemented. |
|||||||||||
Slice |
✓ |
✓ |
✓ |
X |
Slice |
||||||||
Softmax |
✓ |
✓ |
✓ |
Div2, Exp, Max, Reshape, Sub2, Sum |
|||||||||
Softplus |
✓ |
✓ |
SoftPlus |
||||||||||
Softsign |
✓ |
✓ |
SoftSign |
||||||||||
SpaceToDepth |
✓ |
✓ |
Reshape, Transpose |
||||||||||
Split |
✓ |
✓ |
✓ |
✓ |
Split, Stack |
||||||||
SplitToSequence |
X |
Not yet implemented. |
|||||||||||
Sqrt |
✓ |
✓ |
✓ |
PowScalar |
|||||||||
Squeeze |
✓ |
✓ |
✓ |
Reshape |
|||||||||
StringNormalizer |
X |
Not yet implemented. |
|||||||||||
Sub |
✓ |
✓ |
✓ |
Reshape, Sub2 |
|||||||||
Sum |
✓ |
✓ |
✓ |
X |
X |
AddN |
|||||||
Tan |
✓ |
Tan |
|||||||||||
Tanh |
✓ |
✓ |
✓ |
Tanh |
|||||||||
TfIdfVectorizer |
X |
Not yet implemented. |
|||||||||||
ThresholdedRelu |
✓ |
Constant, GreaterScalar, Where |
|||||||||||
Tile |
✓ |
✓ |
✓ |
Tile |
|||||||||
TopK |
X |
X |
X |
Not yet implemented. |
|||||||||
Transpose |
✓ |
✓ |
Transpose |
||||||||||
Unique |
X |
Not yet implemented. |
|||||||||||
Unsqueeze |
✓ |
✓ |
✓ |
Reshape |
|||||||||
Upsample |
X |
X |
✓ |
X |
Unpooling |
||||||||
Where |
✓ |
Where |
|||||||||||
Xor |
✓ |
✓ |
LogicalXor, Reshape |
Export¶
✓: Supported for export to this opset.
△: Partially supported for export to this opset (e.g. some cases cannot be supported, or are not completely tested).
X: Supported, but test failed.
Empty: The corresponding opset version is not supported.
Total: 120/173
Neural Network Layer¶
Count 11/14
NNabla Function
7
9
10
11
ONNX Op
Description
Affine
✓
✓
✓
✓
Gemm, Reshape
RNN
Not yet implemented.
LSTM
Not yet implemented.
GRU
Not yet implemented.
Convolution
✓
✓
✓
✓
Conv, Reshape
DepthwiseConvolution
✓
✓
✓
✓
Conv, Reshape
Deconvolution
✓
✓
✓
✓
ConvTranspose, Reshape
DepthwiseDeconvolution
✓
✓
✓
✓
ConvTranspose, Reshape
MaxPooling
✓
✓
✓
✓
Constant, MaxPool, Pad, Reshape
AveragePooling
△
△
△
△
AveragePool, Constant, Pad, Reshape
Currently only supports the cases where both ignore_border and including_pad are True.
GlobalAveragePooling
✓
✓
✓
✓
GlobalAveragePool
SumPooling
✓
✓
✓
✓
AveragePool, Constant, Mul, Pad, Reshape
Unpooling
✓
✓
✓
✓
Resize
Embed
✓
✓
✓
✓
Gather
Neural Network Activation Functions¶
Count 21/21
NNabla Function
7
9
10
11
ONNX Op
Description
Sigmoid
✓
✓
✓
✓
Sigmoid
Swish
✓
✓
✓
✓
Mul, Sigmoid
Tanh
✓
✓
✓
✓
Tanh
ReLU
✓
✓
✓
✓
Relu
LeakyReLU
✓
✓
✓
✓
LeakyRelu
Softmax
✓
✓
✓
✓
Div, Exp, ReduceMax, ReduceSum, Sub
LogSoftmax
✓
✓
✓
✓
Exp, Log, ReduceMax, ReduceSum, Sub
ELU
✓
✓
✓
✓
Elu
SELU
✓
✓
✓
✓
Selu
CReLU
✓
✓
✓
✓
Concat, Neg, Relu
CELU
✓
✓
✓
✓
Concat, Elu, Neg
PReLU
✓
✓
✓
✓
PRelu, Reshape
GELU
✓
✓
✓
✓
Add, Constant, Div, Mul, Pow, Sqrt, Tanh
ReLU6
✓
✓
✓
✓
Constant, Min, Relu
HardSigmoid
✓
✓
✓
✓
HardSigmoid
HardTanh
✓
✓
✓
✓
Constant, Max, Min, Neg
LogSigmoid
✓
✓
✓
✓
Log, Sigmoid
SoftPlus
✓
✓
✓
✓
Softplus
SoftSign
✓
✓
✓
✓
Softsign
TanhShrink
✓
✓
✓
✓
Sub, Tanh
Sinc
X
X
X
✓
Constant, Div, Equal, Sin, Where
Normalization¶
Count 2/6
NNabla Function
7
9
10
11
ONNX Op
Description
FusedBatchNormalization
✓
✓
✓
✓
Add, BatchNormalization, Constant, Div, Mul, ReduceMean, ReduceSum, Relu, Reshape, Squeeze, Sub
BatchNormalization
✓
✓
✓
✓
BatchNormalization, Constant, Div, Mul, ReduceMean, ReduceSum, Reshape, Squeeze, Sub
SyncBatchNormalization
Not yet implemented.
MeanSubtraction
Not yet implemented.
ClipGradByValue
Not yet implemented.
ClipGradByNorm
Not yet implemented.
Reduction¶
Count 5/7
NNabla Function
7
9
10
11
ONNX Op
Description
Sum
✓
✓
✓
✓
ReduceSum
Mean
✓
✓
✓
✓
ReduceMean
Max
✓
✓
✓
✓
ReduceMax
Min
✓
✓
✓
✓
ReduceMin
Prod
✓
✓
✓
✓
ReduceProd
ReduceSum
Not yet implemented.
ReduceMean
Not yet implemented.
Arithmetic¶
Count 11/12
NNabla Function
7
9
10
11
ONNX Op
Description
Add2
✓
✓
✓
✓
Add
BcAdd2
Not yet implemented.
Sub2
✓
✓
✓
✓
Sub
Mul2
✓
✓
✓
✓
Mul
Div2
✓
✓
✓
✓
Div
Pow2
✓
✓
✓
✓
Pow
AddScalar
✓
✓
✓
✓
Add, Constant
MulScalar
✓
✓
✓
✓
Constant, Mul
PowScalar
✓
✓
✓
✓
Constant, Pow
RSubScalar
✓
✓
✓
✓
Constant, Sub
RDivScalar
✓
✓
✓
✓
Constant, Div
RPowScalar
✓
✓
✓
✓
Constant, Pow
Logical¶
Count 29/29
NNabla Function
7
9
10
11
ONNX Op
Description
Sign
X
✓
✓
✓
Sign
Minimum2
✓
✓
✓
✓
Add, Constant, Min
Maximum2
✓
✓
✓
✓
Add, Constant, Max
MinimumScalar
✓
✓
✓
✓
Add, Constant, Min
MaximumScalar
✓
✓
✓
✓
Add, Constant, Max
LogicalAnd
✓
✓
✓
✓
And
LogicalOr
✓
✓
✓
✓
Or
LogicalXor
✓
✓
✓
✓
Xor
Equal
X
X
X
✓
Equal
NotEqual
X
X
X
✓
Equal, Not
GreaterEqual
✓
✓
✓
✓
Less, Not
Greater
✓
✓
✓
✓
Greater
LessEqual
✓
✓
✓
✓
Greater, Not
Less
✓
✓
✓
✓
Less
LogicalAndScalar
✓
✓
✓
✓
And, Constant
LogicalOrScalar
✓
✓
✓
✓
Constant, Or
LogicalXorScalar
✓
✓
✓
✓
Constant, Xor
EqualScalar
X
X
X
✓
Constant, Equal
NotEqualScalar
X
X
X
✓
Constant, Equal, Not
GreaterEqualScalar
✓
✓
✓
✓
Constant, Less, Not
GreaterScalar
✓
✓
✓
✓
Constant, Greater
LessEqualScalar
✓
✓
✓
✓
Constant, Greater, Not
LessScalar
✓
✓
✓
✓
Constant, Less
LogicalNot
✓
✓
✓
✓
Not
IsNaN
X
✓
✓
✓
IsNaN
IsInf
X
X
✓
✓
IsInf
ResetNaN
X
✓
✓
✓
Constant, IsNaN, Where
ResetInf
X
X
✓
✓
Constant, IsInf, Where
Where
X
✓
✓
✓
Where
Math¶
Count 22/22
NNabla Function
7
9
10
11
ONNX Op
Description
Constant
✓
✓
✓
✓
Constant, Identity
Arange
✓
✓
✓
✓
Constant, Identity
Abs
✓
✓
✓
✓
Abs
Exp
✓
✓
✓
✓
Exp
Log
✓
✓
✓
✓
Log
Identity
✓
✓
✓
✓
Identity
BatchMatmul
✓
✓
✓
✓
MatMul, Transpose
Round
X
X
X
✓
Round
Ceil
✓
✓
✓
✓
Ceil
Floor
✓
✓
✓
✓
Floor
Sin
✓
✓
✓
✓
Sin
Cos
✓
✓
✓
✓
Cos
Tan
✓
✓
✓
✓
Tan
Sinh
X
✓
✓
✓
Sinh
Cosh
X
✓
✓
✓
Cosh
ASin
✓
✓
✓
✓
Asin
ACos
✓
✓
✓
✓
Acos
ATan
✓
✓
✓
✓
Atan
ATan2
✓
✓
✓
✓
Atan, Div
ASinh
X
✓
✓
✓
Asinh
ACosh
X
✓
✓
✓
Acosh
ATanh
X
✓
✓
✓
Atanh
Array Manipulation¶
Count 12/19
NNabla Function
7
9
10
11
ONNX Op
Description
Concatenate
✓
✓
✓
✓
Concat
Split
✓
✓
✓
✓
Split, Squeeze
Stack
✓
✓
✓
✓
Concat, Unsqueeze
Slice
△
△
✓
✓
Constant, Slice
ONNX slice cannot support step != 1 on opset < 10.
Pad
△
△
△
△
Constant, Pad
When the mode of the pad is reflect, if the size of the pad exceeds the input size, onnxruntime cannot handle it.
Transpose
✓
✓
✓
✓
Transpose
Broadcast
X
✓
✓
✓
BroadcastTo
✓
✓
✓
✓
Tile
✓
✓
✓
✓
Constant, Reshape, Tile
OneHot
X
✓
✓
✓
Flatten, Gather, Reshape
Flip
✓
✓
✓
✓
Gather, Identity, Transpose
Shift
Not yet implemented.
Sort
Not yet implemented.
Reshape
✓
✓
✓
✓
Constant, Reshape
MatrixDiag
Not yet implemented.
MatrixDiagPart
Not yet implemented.
Assign
Not yet implemented.
GatherNd
Not yet implemented.
ScatterNd
Not yet implemented.
Signal Processing¶
Count 1/3
NNabla Function
7
9
10
11
ONNX Op
Description
Interpolate
X
X
△
✓
Resize
FFT
Not yet implemented.
IFFT
Not yet implemented.
Stochasticity¶
Count 0/11
NNabla Function
7
9
10
11
ONNX Op
Description
Dropout
X
X
X
X
Dropout
The Dropout in nnabla has no test mode and contains random parameters, so the test result is not the same as onnx.
TopKData
Not yet implemented.
TopKGrad
Not yet implemented.
Rand
Not yet implemented.
Randint
Not yet implemented.
Randn
Not yet implemented.
RandomChoice
Not yet implemented.
RandomCrop
Not yet implemented.
RandomFlip
Not yet implemented.
RandomShift
Not yet implemented.
ImageAugmentation
Not yet implemented.
Loss Functions¶
Count 0/9
NNabla Function
7
9
10
11
ONNX Op
Description
SigmoidCrossEntropy
Not yet implemented.
BinaryCrossEntropy
Not yet implemented.
SoftmaxCrossEntropy
Not yet implemented.
CategoricalCrossEntropy
Not yet implemented.
SquaredError
Not yet implemented.
AbsoluteError
Not yet implemented.
HuberLoss
Not yet implemented.
EpsilonInsensitiveLoss
Not yet implemented.
KLMultinomial
Not yet implemented.
Quantization Neural Network Layers¶
Count 6/12
NNabla Function
7
9
10
11
ONNX Op
Description
BinarySigmoid
X
✓
✓
✓
Constant, Greater, Where
BinaryTanh
X
✓
✓
✓
Constant, Greater, Where
BinaryConnectAffine
✓
✓
✓
✓
Gemm, Reshape
BinaryConnectConvolution
✓
✓
✓
✓
Conv, Reshape
BinaryWeightAffine
✓
✓
✓
✓
Add, MatMul, Mul, Reshape
BinaryWeightConvolution
✓
✓
✓
✓
Add, Conv, Mul, Reshape
INQAffine
Not yet implemented.
INQConvolution
Not yet implemented.
FixedPointQuantize
Not yet implemented.
MinMaxQuantize
Not yet implemented.
Pow2Quantize
Not yet implemented.
Prune
Not yet implemented.
Validation¶
Count 0/3
NNabla Function
7
9
10
11
ONNX Op
Description
TopNError
Not yet implemented.
BinaryError
Not yet implemented.
ConfusionMatrix
Not yet implemented.
Unsupported, Special Use¶
Count 0/5
NNabla Function
7
9
10
11
ONNX Op
Description
VATNoise
Not yet implemented.
Unlink
Not yet implemented.
Sink
Not yet implemented.
NmsDetection2d
Not yet implemented.
MaxPoolingBackward
Not yet implemented.
Tensorflow Support Status¶
Import¶
✓: Supported
△: Partially supported
X: Supported, but test failed.
Empty: Not supported yet.
Total: 109/122
Tensorflow Function | Status | NNabla Func | Description
---|---|---|---
Abs | ✓ | Abs |
Acos | ✓ | ACos |
Acosh | ✓ | ACosh |
Add | ✓ | Add2 |
AddN | ✓ | AddN |
All | ✓ | Greater, Min, Reshape |
Any | ✓ | Greater, Reshape, Sum |
ArgMax | ✓ | Max |
ArgMin | ✓ | Min |
Asin | ✓ | ASin |
Asinh | ✓ | ASinh |
Atan | ✓ | ATan |
Atan2 | ✓ | ATan, Add2, Div2, Mul2, Reshape, Sign, Sub2 |
Atanh | ✓ | ATanh |
AvgPool | △ | AveragePooling, Pad, Transpose |
AvgPool3D | △ | AveragePooling, Pad, Transpose |
BatchMatMul | ✓ | BatchMatmul, Transpose |
BatchNormalization | ✓ | Add2, Mul2, PowScalar, RDivScalar, Reshape, Sub2 |
BiasAdd | ✓ | Add2, Reshape |
BroadcastTo | ✓ | |
Cast | X | NA | Not yet implemented.
Ceil | ✓ | Ceil |
ClipByValue | ✓ | Maximum2, Minimum2, Reshape |
Concat | ✓ | Concatenate |
ConcatV2 | ✓ | Concatenate |
Const | ✓ | NA |
Conv1D | △ | Convolution, Pad, Reshape, Transpose |
Conv1DTranspose | △ | Deconvolution, Reshape, Transpose |
Conv2D | △ | Convolution, Pad, Transpose |
Conv2DBackpropInput | △ | Deconvolution, Transpose |
Conv3D | △ | Convolution, Pad, Transpose |
Conv3DBackpropInput | △ | Deconvolution, Pad, Transpose |
Cos | ✓ | Cos |
Cosh | ✓ | Cosh |
Crelu | ✓ | Concatenate, MulScalar, ReLU |
Cumsum | X | | Not yet implemented.
DepthToSpace | ✓ | Reshape, Transpose |
DepthwiseConv2d | △ | Convolution, Pad, Reshape, Transpose |
Div | ✓ | Div2 |
Elu | ✓ | ELU |
Equal | ✓ | Equal |
Erf | X | | Not yet implemented.
Erfc | X | | Not yet implemented.
Exp | ✓ | Exp |
ExpandDims | ✓ | Reshape |
Floor | ✓ | Floor |
FloorDiv | ✓ | Div2, Floor |
FloorMod | ✓ | Div2, Floor, Mul2, Sub2 |
GatherNd | X | | Not yet implemented.
GatherV2 | X | Concatenate, Slice | Not yet implemented.
Greater | ✓ | Greater |
GreaterEqual | ✓ | Less, LogicalNot |
Identity | ✓ | Identity |
IsInf | ✓ | IsInf |
IsNan | ✓ | IsNaN |
LeakyRelu | ✓ | LeakyReLU |
Less | ✓ | Less |
LessEqual | ✓ | Greater, LogicalNot |
Log | ✓ | Log |
LogSigmoid | ✓ | MulScalar, SoftPlus |
LogSoftmax | ✓ | Add2, Exp, Log, Max, Reshape, Sub2, Sum, Transpose |
LogicalAnd | ✓ | LogicalAnd |
LogicalNot | ✓ | LogicalNot |
LogicalOr | ✓ | LogicalOr |
LogicalXor | ✓ | LogicalAnd, LogicalNot, LogicalOr |
Max | ✓ | Max |
MaxPool | △ | MaxPooling, Pad, Reshape, Transpose |
MaxPool3D | △ | MaxPooling, Pad, Transpose |
MaxPoolWithArgmax | X | | Not yet implemented.
Maximum | ✓ | Maximum2 |
Mean | ✓ | Mean |
Min | ✓ | Min |
Minimum | ✓ | Minimum2 |
Mul | ✓ | Mul2 |
Neg | ✓ | MulScalar |
NotEqual | ✓ | Equal, LogicalNot |
Pack | ✓ | Concatenate, Reshape |
Pad | △ | Pad |
Pow | ✓ | Pow2 |
Prod | ✓ | Prod |
RealDiv | ✓ | Div2 |
Reciprocal | ✓ | RDivScalar |
Relu | ✓ | ReLU |
Relu6 | ✓ | MaximumScalar, MinimumScalar |
Reshape | ✓ | Reshape |
ReverseSequence | X | | Not yet implemented.
ReverseV2 | X | | Not yet implemented.
Round | ✓ | Round |
Rsqrt | ✓ | PowScalar, RDivScalar |
Selu | ✓ | SELU |
Shape | X | | Not yet implemented.
Sigmoid | ✓ | Sigmoid |
Sign | ✓ | Sign |
Sin | ✓ | Sin |
Sinh | ✓ | Sinh |
Size | X | | Not yet implemented.
Slice | ✓ | Slice |
Softmax | ✓ | Div2, Exp, Max, Reshape, Sub2, Sum, Transpose |
Softplus | ✓ | SoftPlus |
Softsign | ✓ | SoftSign |
SpaceToDepth | ✓ | Reshape, Transpose |
Split | ✓ | Split, Stack |
SplitV | ✓ | Split, Stack |
Sqrt | ✓ | PowScalar |
Square | ✓ | Mul2 |
SquaredDifference | ✓ | Mul2, Sub2 |
Squeeze | ✓ | Reshape |
StopGradient | ✓ | Identity |
StridedSlice | △ | Slice |
Sub | ✓ | Sub2 |
Sum | ✓ | Sum |
Swish | ✓ | Mul2, Sigmoid |
Tan | ✓ | Tan |
Tanh | ✓ | Tanh |
Tile | ✓ | Tile |
TopKV2 | X | | Not yet implemented.
Transpose | ✓ | Transpose |
TruncateDiv | ✓ | Div2 |
TruncateMod | X | | Not yet implemented.
Unpack | ✓ | Reshape, Split, Stack |
Where | △ | Where |
ZerosLike | ✓ | NA |
Export¶
✓: Supported
△: Partially supported
X: Supported, but test failed.
Empty: Not supported yet.
Total: 120/173
Neural Network Layer¶
Count 11/14
NNabla Function
Status
Description
Affine
✓
RNN
Not yet implemented.
LSTM
Not yet implemented.
GRU
Not yet implemented.
Convolution
△
The cases dilations and strides larger than 1 are not supported by tensorflow.
DepthwiseConvolution
△
The cases dilations and strides larger than 1 are not supported by tensorflow.
Deconvolution
△
The cases dilations larger than 1 are not supported by tensorflow.
DepthwiseDeconvolution
△
The cases dilations larger than 1 are not supported by tensorflow.
MaxPooling
✓
AveragePooling
△
Currently only supports the cases where both ignore_border and including_pad are True.
GlobalAveragePooling
✓
SumPooling
✓
Unpooling
△
The kernel only supports 2d.
Embed
✓
Neural Network Activation Functions¶
Count 21/21
NNabla Function
Status
Description
Sigmoid
✓
Swish
✓
Tanh
✓
ReLU
✓
LeakyReLU
✓
Softmax
✓
LogSoftmax
✓
ELU
✓
SELU
△
CReLU
✓
CELU
✓
PReLU
✓
GELU
✓
ReLU6
✓
HardSigmoid
✓
HardTanh
✓
LogSigmoid
✓
SoftPlus
✓
SoftSign
✓
TanhShrink
✓
Sinc
✓
Normalization¶
Count 2/6
NNabla Function
Status
Description
FusedBatchNormalization
✓
BatchNormalization
✓
SyncBatchNormalization
Not yet implemented.
MeanSubtraction
Not yet implemented.
ClipGradByValue
Not yet implemented.
ClipGradByNorm
Not yet implemented.
Reduction¶
Count 5/7
NNabla Function
Status
Description
Sum
✓
Mean
✓
Max
✓
Min
✓
Prod
✓
ReduceSum
Not yet implemented.
ReduceMean
Not yet implemented.
Arithmetic¶
Count 11/12
NNabla Function
Status
Description
Add2
✓
BcAdd2
Not yet implemented.
Sub2
✓
Mul2
✓
Div2
✓
Pow2
✓
AddScalar
✓
MulScalar
✓
PowScalar
✓
RSubScalar
✓
RDivScalar
✓
RPowScalar
✓
Logical¶
Count 29/29
NNabla Function
Status
Description
Sign
✓
Minimum2
✓
Maximum2
✓
MinimumScalar
✓
MaximumScalar
✓
LogicalAnd
✓
LogicalOr
✓
LogicalXor
✓
Equal
✓
NotEqual
✓
GreaterEqual
✓
Greater
✓
LessEqual
✓
Less
✓
LogicalAndScalar
✓
LogicalOrScalar
✓
LogicalXorScalar
✓
EqualScalar
✓
NotEqualScalar
✓
GreaterEqualScalar
✓
GreaterScalar
✓
LessEqualScalar
✓
LessScalar
✓
LogicalNot
✓
IsNaN
✓
IsInf
✓
ResetNaN
✓
ResetInf
✓
Where
✓
Math¶
Count 22/22
NNabla Function
Status
Description
Constant
✓
Arange
✓
Abs
✓
Exp
✓
Log
✓
Identity
✓
BatchMatmul
✓
Round
✓
Ceil
✓
Floor
✓
Sin
✓
Cos
✓
Tan
✓
Sinh
✓
Cosh
✓
ASin
✓
ACos
✓
ATan
✓
ATan2
✓
ASinh
✓
ACosh
✓
ATanh
✓
Array Manipulation¶
Count 12/19
NNabla Function
Status
Description
Concatenate
✓
Split
✓
Stack
✓
Slice
✓
Pad
△
When the mode of the pad is reflect, if the size of the pad exceeds the input size, tensorflow cannot handle it.
Transpose
✓
Broadcast
✓
BroadcastTo
✓
Tile
✓
OneHot
✓
Flip
✓
Shift
Not yet implemented.
Sort
Not yet implemented.
Reshape
✓
MatrixDiag
Not yet implemented.
MatrixDiagPart
Not yet implemented.
Assign
Not yet implemented.
GatherNd
Not yet implemented.
ScatterNd
Not yet implemented.
Signal Processing¶
Count 1/3
NNabla Function
Status
Description
Interpolate
△
FFT
Not yet implemented.
IFFT
Not yet implemented.
Stochasticity¶
Count 0/11
NNabla Function
Status
Description
Dropout
X
The Dropout in nnabla has no test mode and contains random parameters, so the test result is not the same as tensorflow.
TopKData
Not yet implemented.
TopKGrad
Not yet implemented.
Rand
Not yet implemented.
Randint
Not yet implemented.
Randn
Not yet implemented.
RandomChoice
Not yet implemented.
RandomCrop
Not yet implemented.
RandomFlip
Not yet implemented.
RandomShift
Not yet implemented.
ImageAugmentation
Not yet implemented.
Loss Functions¶
Count 0/9
NNabla Function
Status
Description
SigmoidCrossEntropy
Not yet implemented.
BinaryCrossEntropy
Not yet implemented.
SoftmaxCrossEntropy
Not yet implemented.
CategoricalCrossEntropy
Not yet implemented.
SquaredError
Not yet implemented.
AbsoluteError
Not yet implemented.
HuberLoss
Not yet implemented.
EpsilonInsensitiveLoss
Not yet implemented.
KLMultinomial
Not yet implemented.
Quantization Neural Network Layers¶
Count 6/12
NNabla Function
Status
Description
BinarySigmoid
✓
BinaryTanh
✓
BinaryConnectAffine
✓
BinaryConnectConvolution
△
The cases dilations and strides larger than 1 are not supported by tensorflow.
BinaryWeightAffine
✓
BinaryWeightConvolution
△
The cases dilations and strides larger than 1 are not supported by tensorflow.
INQAffine
Not yet implemented.
INQConvolution
Not yet implemented.
FixedPointQuantize
Not yet implemented.
MinMaxQuantize
Not yet implemented.
Pow2Quantize
Not yet implemented.
Prune
Not yet implemented.
Validation¶
Count 0/3
NNabla Function
Status
Description
TopNError
Not yet implemented.
BinaryError
Not yet implemented.
ConfusionMatrix
Not yet implemented.
Unsupported, Special Use¶
Count 0/5
NNabla Function
Status
Description
VATNoise
Not yet implemented.
Unlink
Not yet implemented.
Sink
Not yet implemented.
NmsDetection2d
Not yet implemented.
MaxPoolingBackward
Not yet implemented.
Tensorflow Lite Support Status¶
Export¶
✓: Supported
△: Partially supported
X: Supported, but test failed.
Empty: Not supported yet.
Total: 98/173
Neural Network Layer¶
Count 8/14
NNabla Function
Status
Affine
✓
RNN
LSTM
GRU
Convolution
△
DepthwiseConvolution
△
Deconvolution
△
DepthwiseDeconvolution
△
MaxPooling
X
AveragePooling
X
GlobalAveragePooling
✓
SumPooling
X
Unpooling
△
Embed
✓
Neural Network Activation Functions¶
Count 20/21
NNabla Function
Status
Sigmoid
✓
Swish
✓
Tanh
✓
ReLU
✓
LeakyReLU
✓
Softmax
✓
LogSoftmax
✓
ELU
✓
SELU
△
CReLU
✓
CELU
✓
PReLU
✓
GELU
✓
ReLU6
✓
HardSigmoid
✓
HardTanh
✓
LogSigmoid
✓
SoftPlus
✓
SoftSign
✓
TanhShrink
✓
Sinc
X
Normalization¶
Count 0/6
NNabla Function
Status
FusedBatchNormalization
X
BatchNormalization
X
SyncBatchNormalization
MeanSubtraction
ClipGradByValue
ClipGradByNorm
Reduction¶
Count 5/7
NNabla Function
Status
Sum
✓
Mean
✓
Max
✓
Min
✓
Prod
✓
ReduceSum
ReduceMean
Arithmetic¶
Count 11/12
NNabla Function
Status
Add2
✓
BcAdd2
Sub2
✓
Mul2
✓
Div2
✓
Pow2
✓
AddScalar
✓
MulScalar
✓
PowScalar
✓
RSubScalar
✓
RDivScalar
✓
RPowScalar
✓
Logical¶
Count 25/29
NNabla Function
Status
Sign
✓
Minimum2
✓
Maximum2
✓
MinimumScalar
✓
MaximumScalar
✓
LogicalAnd
✓
LogicalOr
✓
LogicalXor
✓
Equal
✓
NotEqual
✓
GreaterEqual
✓
Greater
✓
LessEqual
✓
Less
✓
LogicalAndScalar
✓
LogicalOrScalar
✓
LogicalXorScalar
✓
EqualScalar
✓
NotEqualScalar
✓
GreaterEqualScalar
✓
GreaterScalar
✓
LessEqualScalar
✓
LessScalar
✓
LogicalNot
✓
IsNaN
✓
IsInf
X
ResetNaN
X
ResetInf
X
Where
X
Math¶
Count 14/22
NNabla Function
Status
Constant
✓
Arange
✓
Abs
✓
Exp
✓
Log
✓
Identity
✓
BatchMatmul
✓
Round
X
Ceil
✓
Floor
✓
Sin
✓
Cos
✓
Tan
✓
Sinh
✓
Cosh
✓
ASin
X
ACos
X
ATan
X
ATan2
X
ASinh
X
ACosh
X
ATanh
X
Array Manipulation¶
Count 11/19
NNabla Function
Status
Concatenate
✓
Split
✓
Stack
✓
Slice
△
Pad
X
Transpose
✓
Broadcast
✓
BroadcastTo
✓
Tile
✓
OneHot
✓
Flip
✓
Shift
Sort
Reshape
✓
MatrixDiag
MatrixDiagPart
Assign
GatherNd
ScatterNd
Signal Processing¶
Count 0/3
NNabla Function
Status
Interpolate
X
FFT
IFFT
Stochasticity¶
Count 0/11
NNabla Function
Status
Dropout
X
TopKData
TopKGrad
Rand
Randint
Randn
RandomChoice
RandomCrop
RandomFlip
RandomShift
ImageAugmentation
Loss Functions¶
Count 0/9
NNabla Function
Status
SigmoidCrossEntropy
BinaryCrossEntropy
SoftmaxCrossEntropy
CategoricalCrossEntropy
SquaredError
AbsoluteError
HuberLoss
EpsilonInsensitiveLoss
KLMultinomial
Quantization Neural Network Layers¶
Count 4/12
NNabla Function
Status
BinarySigmoid
X
BinaryTanh
X
BinaryConnectAffine
✓
BinaryConnectConvolution
△
BinaryWeightAffine
✓
BinaryWeightConvolution
△
INQAffine
INQConvolution
FixedPointQuantize
MinMaxQuantize
Pow2Quantize
Prune
Validation¶
Count 0/3
NNabla Function
Status
TopNError
BinaryError
ConfusionMatrix
Unsupported, Special Use¶
Count 0/5
NNabla Function
Status
VATNoise
Unlink
Sink
NmsDetection2d
MaxPoolingBackward
NNabla C Runtime Support Status¶
NNabla version: None
✓: Supported
△: Partially supported
X: Supported, but test failed or no test data.
Empty: Not supported yet.
Export¶
Total: 56/173
Neural Network Layer¶
Count 8/14
NNabla Function
Status
Description
Affine
✓
RNN
LSTM
GRU
Convolution
✓
DepthwiseConvolution
✓
Deconvolution
✓
DepthwiseDeconvolution
MaxPooling
✓
AveragePooling
✓
GlobalAveragePooling
SumPooling
✓
Unpooling
✓
Embed
Neural Network Activation Functions¶
Count 11/21
NNabla Function
Status
Description
Sigmoid
✓
Swish
✓
Tanh
✓
ReLU
✓
LeakyReLU
✓
Softmax
✓
LogSoftmax
ELU
✓
SELU
✓
CReLU
✓
CELU
✓
PReLU
✓
GELU
ReLU6
HardSigmoid
HardTanh
LogSigmoid
SoftPlus
SoftSign
TanhShrink
Sinc
Normalization¶
Count 1/6
NNabla Function
Status
Description
FusedBatchNormalization
BatchNormalization
✓
SyncBatchNormalization
MeanSubtraction
X
ClipGradByValue
ClipGradByNorm
Reduction¶
Count 1/7
NNabla Function
Status
Description
Sum
✓
Mean
Max
Min
Prod
ReduceSum
ReduceMean
Arithmetic¶
Count 11/12
NNabla Function
Status
Description
Add2
✓
BcAdd2
Sub2
✓
Mul2
✓
Div2
✓
Pow2
✓
AddScalar
✓
MulScalar
✓
PowScalar
✓
RSubScalar
✓
RDivScalar
✓
RPowScalar
✓
Logical¶
Count 5/29
NNabla Function
Status
Description
Sign
✓
Minimum2
✓
Maximum2
✓
MinimumScalar
✓
MaximumScalar
✓
LogicalAnd
LogicalOr
LogicalXor
Equal
NotEqual
GreaterEqual
Greater
LessEqual
Less
LogicalAndScalar
LogicalOrScalar
LogicalXorScalar
EqualScalar
NotEqualScalar
GreaterEqualScalar
GreaterScalar
LessEqualScalar
LessScalar
LogicalNot
IsNaN
IsInf
ResetNaN
ResetInf
Where
Math¶
Count 6/22
NNabla Function
Status
Description
Constant
Arange
Abs
✓
Exp
✓
Log
✓
Identity
✓
BatchMatmul
△
Round
✓
Ceil
Floor
Sin
Cos
Tan
Sinh
Cosh
ASin
ACos
ATan
ATan2
ASinh
ACosh
ATanh
Array Manipulation¶
Count 7/19
NNabla Function
Status
Description
Concatenate
✓
Split
✓
Stack
✓
Slice
✓
Pad
Transpose
✓
Broadcast
BroadcastTo
Tile
OneHot
Flip
✓
Shift
X
Sort
Reshape
✓
MatrixDiag
X
MatrixDiagPart
X
Assign
GatherNd
ScatterNd
Signal Processing¶
Count 0/3
NNabla Function
Status
Description
Interpolate
FFT
IFFT
Stochasticity¶
Count 0/11
NNabla Function
Status
Description
Dropout
X
TopKData
TopKGrad
Rand
Randint
Randn
RandomChoice
RandomCrop
RandomFlip
RandomShift
ImageAugmentation
Loss Functions¶
Count 0/9
NNabla Function
Status
Description
SigmoidCrossEntropy
BinaryCrossEntropy
SoftmaxCrossEntropy
CategoricalCrossEntropy
SquaredError
AbsoluteError
HuberLoss
EpsilonInsensitiveLoss
KLMultinomial
Quantization Neural Network Layers¶
Count 6/12
NNabla Function
Status
Description
BinarySigmoid
✓
BinaryTanh
✓
BinaryConnectAffine
✓
BinaryConnectConvolution
✓
BinaryWeightAffine
✓
BinaryWeightConvolution
✓
INQAffine
INQConvolution
FixedPointQuantize
MinMaxQuantize
Pow2Quantize
Prune
Validation¶
Count 0/3
NNabla Function
Status
Description
TopNError
BinaryError
ConfusionMatrix
Unsupported, Special Use¶
Count 0/5
NNabla Function
Status
Description
VATNoise
Unlink
Sink
NmsDetection2d
MaxPoolingBackward
Model Support Status¶
ONNX Support Status¶
Import¶
✓: Supported to convert
X: Not supported
Total: 11/12
ONNX Import Sample Test (onnx -> nnp)¶
Count 11/12
Name
Support
Memo
✓
✓
✓
✓
✓
X
The edge mode of the pad in nnabla is not implemented.
✓
✓
✓
✓
✓
✓
Export¶
✓: Supported to convert
X: Not supported
Total: 59/65
ONNX Export Sample Test (nnp -> onnx)¶
Count 34/37
Name
Support
Memo
✓
✓
✓
✓
✓
✓
✓
✓
✓
✓
✓
✓
✓
✓
✓
✓
✓
✓
✓
✓
✓
✓
✓
✓
✓
✓
✓
✓
✓
✓
✓
X
NNabla converter error, will be fixed in the future.
X
NNabla converter error, will be fixed in the future.
✓
X
NNP with only a single executor is currently supported.
✓
✓
ONNX Export Pretrained Model Test (nnp -> onnx)¶
Count 17/18
Name
Support
Memo
✓
✓
✓
✓
X
✓
✓
✓
✓
✓
✓
✓
✓
✓
✓
✓
✓
✓
ONNX Export Example Model Test (nnp -> onnx)¶
Count 8/10
Name
Support
Memo
✓
✓
✓
✓
X
✓
✓
X
The onehot dimension != 2 is not supported.
✓
✓
Tensorflow Support Status¶
Import¶
✓: Supported to convert
X: Not supported
Total: 15/16
Tensorflow Import Sample Test (tf -> nnp)¶
Count 15/16
Name
Support
Memo
✓
✓
✓
✓
✓
✓
✓
✓
✓
✓
✓
✓
✓
X
The Shape is currently not supported for conversion by nnabla.
✓
✓
Export¶
✓: Supported to convert
X: Not supported
Total: 59/65
Tensorflow Export Sample Test (nnp -> tf)¶
Count 34/37
Name
Support
Memo
✓
✓
✓
✓
✓
✓
✓
✓
✓
✓
✓
✓
✓
✓
✓
✓
✓
✓
✓
✓
✓
✓
✓
✓
✓
✓
✓
✓
✓
✓
✓
X
NNabla converter error, will be fixed in the future.
X
NNabla converter error, will be fixed in the future.
✓
X
NNP with only a single executor is currently supported.
✓
✓
Tensorflow Export Pretrained Models (nnp -> tf)¶
Count 17/18
Name
Support
Memo
✓
✓
✓
✓
X
✓
✓
✓
✓
✓
✓
✓
✓
✓
✓
✓
✓
✓
Tensorflow Export Example Models (nnp -> tf)¶
Count 8/10
Name
Support
Memo
✓
✓
✓
✓
X
✓
✓
X
The onehot dimension != 2 is not supported.
✓
✓
Tensorflow Lite Support Status¶
Export¶
✓: Supported to convert
X: Not supported
Total: 45/65
Tensorflow Lite Export Sample Test (nnp -> tflite)¶
Count 32/37
Name
Support
Memo
✓
✓
✓
✓
✓
✓
✓
✓
✓
✓
✓
✓
✓
✓
✓
✓
✓
✓
✓
✓
X
X
✓
✓
✓
✓
✓
✓
✓
✓
✓
X
X
✓
X
✓
✓
Tensorflow Lite Export Pretrained Models (nnp -> tflite)¶
Count 6/18
Name
Support
Memo
X
X
X
✓
X
X
X
X
X
X
X
X
✓
✓
✓
✓
✓
X
Tensorflow Lite Export Example Models (nnp -> tflite)¶
Count 7/10
Name
Support
Memo
✓
✓
✓
X
X
✓
✓
X
✓
✓
NNabla C Runtime Support Status¶
Export¶
✓: Supported to convert
X: Not supported
Total: 34/37
NNC Export Sample Test (nnp -> nnb)¶
Count 34/37
Name
Support
Memo
✓
✓
✓
✓
✓
✓
✓
✓
✓
✓
✓
✓
✓
✓
✓
✓
✓
✓
✓
✓
✓
✓
✓
✓
✓
✓
✓
✓
✓
✓
✓
X
Failed to run inference with nnabla.
X
Failed to compare inference results.
✓
X
Failed to compare inference results.
✓
✓
Contributing Guide¶
Moved to Github.
License¶
Copyright (c) 2017 Sony Corporation. All rights reserved.
Apache License
Version 2.0, January 2004
http://www.apache.org/licenses/
TERMS AND CONDITIONS FOR USE, REPRODUCTION, AND DISTRIBUTION
Definitions.
“License” shall mean the terms and conditions for use, reproduction, and distribution as defined by Sections 1 through 9 of this document.
“Licensor” shall mean the copyright owner or entity authorized by the copyright owner that is granting the License.
“Legal Entity” shall mean the union of the acting entity and all other entities that control, are controlled by, or are under common control with that entity. For the purposes of this definition, “control” means (i) the power, direct or indirect, to cause the direction or management of such entity, whether by contract or otherwise, or (ii) ownership of fifty percent (50%) or more of the outstanding shares, or (iii) beneficial ownership of such entity.
“You” (or “Your”) shall mean an individual or Legal Entity exercising permissions granted by this License.
“Source” form shall mean the preferred form for making modifications, including but not limited to software source code, documentation source, and configuration files.
“Object” form shall mean any form resulting from mechanical transformation or translation of a Source form, including but not limited to compiled object code, generated documentation, and conversions to other media types.
“Work” shall mean the work of authorship, whether in Source or Object form, made available under the License, as indicated by a copyright notice that is included in or attached to the work (an example is provided in the Appendix below).
“Derivative Works” shall mean any work, whether in Source or Object form, that is based on (or derived from) the Work and for which the editorial revisions, annotations, elaborations, or other modifications represent, as a whole, an original work of authorship. For the purposes of this License, Derivative Works shall not include works that remain separable from, or merely link (or bind by name) to the interfaces of, the Work and Derivative Works thereof.
“Contribution” shall mean any work of authorship, including the original version of the Work and any modifications or additions to that Work or Derivative Works thereof, that is intentionally submitted to Licensor for inclusion in the Work by the copyright owner or by an individual or Legal Entity authorized to submit on behalf of the copyright owner. For the purposes of this definition, “submitted” means any form of electronic, verbal, or written communication sent to the Licensor or its representatives, including but not limited to communication on electronic mailing lists, source code control systems, and issue tracking systems that are managed by, or on behalf of, the Licensor for the purpose of discussing and improving the Work, but excluding communication that is conspicuously marked or otherwise designated in writing by the copyright owner as “Not a Contribution.”
“Contributor” shall mean Licensor and any individual or Legal Entity on behalf of whom a Contribution has been received by Licensor and subsequently incorporated within the Work.
Grant of Copyright License. Subject to the terms and conditions of this License, each Contributor hereby grants to You a perpetual, worldwide, non-exclusive, no-charge, royalty-free, irrevocable copyright license to reproduce, prepare Derivative Works of, publicly display, publicly perform, sublicense, and distribute the Work and such Derivative Works in Source or Object form.
Grant of Patent License. Subject to the terms and conditions of this License, each Contributor hereby grants to You a perpetual, worldwide, non-exclusive, no-charge, royalty-free, irrevocable (except as stated in this section) patent license to make, have made, use, offer to sell, sell, import, and otherwise transfer the Work, where such license applies only to those patent claims licensable by such Contributor that are necessarily infringed by their Contribution(s) alone or by combination of their Contribution(s) with the Work to which such Contribution(s) was submitted. If You institute patent litigation against any entity (including a cross-claim or counterclaim in a lawsuit) alleging that the Work or a Contribution incorporated within the Work constitutes direct or contributory patent infringement, then any patent licenses granted to You under this License for that Work shall terminate as of the date such litigation is filed.
Redistribution. You may reproduce and distribute copies of the Work or Derivative Works thereof in any medium, with or without modifications, and in Source or Object form, provided that You meet the following conditions:
You must give any other recipients of the Work or Derivative Works a copy of this License; and You must cause any modified files to carry prominent notices stating that You changed the files; and You must retain, in the Source form of any Derivative Works that You distribute, all copyright, patent, trademark, and attribution notices from the Source form of the Work, excluding those notices that do not pertain to any part of the Derivative Works; and If the Work includes a “NOTICE” text file as part of its distribution, then any Derivative Works that You distribute must include a readable copy of the attribution notices contained within such NOTICE file, excluding those notices that do not pertain to any part of the Derivative Works, in at least one of the following places: within a NOTICE text file distributed as part of the Derivative Works; within the Source form or documentation, if provided along with the Derivative Works; or, within a display generated by the Derivative Works, if and wherever such third-party notices normally appear. The contents of the NOTICE file are for informational purposes only and do not modify the License. You may add Your own attribution notices within Derivative Works that You distribute, alongside or as an addendum to the NOTICE text from the Work, provided that such additional attribution notices cannot be construed as modifying the License.
You may add Your own copyright statement to Your modifications and may provide additional or different license terms and conditions for use, reproduction, or distribution of Your modifications, or for any such Derivative Works as a whole, provided Your use, reproduction, and distribution of the Work otherwise complies with the conditions stated in this License.
Submission of Contributions. Unless You explicitly state otherwise, any Contribution intentionally submitted for inclusion in the Work by You to the Licensor shall be under the terms and conditions of this License, without any additional terms or conditions. Notwithstanding the above, nothing herein shall supersede or modify the terms of any separate license agreement you may have executed with Licensor regarding such Contributions.
Trademarks. This License does not grant permission to use the trade names, trademarks, service marks, or product names of the Licensor, except as required for reasonable and customary use in describing the origin of the Work and reproducing the content of the NOTICE file.
Disclaimer of Warranty. Unless required by applicable law or agreed to in writing, Licensor provides the Work (and each Contributor provides its Contributions) on an “AS IS” BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied, including, without limitation, any warranties or conditions of TITLE, NON-INFRINGEMENT, MERCHANTABILITY, or FITNESS FOR A PARTICULAR PURPOSE. You are solely responsible for determining the appropriateness of using or redistributing the Work and assume any risks associated with Your exercise of permissions under this License.
Limitation of Liability. In no event and under no legal theory, whether in tort (including negligence), contract, or otherwise, unless required by applicable law (such as deliberate and grossly negligent acts) or agreed to in writing, shall any Contributor be liable to You for damages, including any direct, indirect, special, incidental, or consequential damages of any character arising as a result of this License or out of the use or inability to use the Work (including but not limited to damages for loss of goodwill, work stoppage, computer failure or malfunction, or any and all other commercial damages or losses), even if such Contributor has been advised of the possibility of such damages.
Accepting Warranty or Additional Liability. While redistributing the Work or Derivative Works thereof, You may choose to offer, and charge a fee for, acceptance of support, warranty, indemnity, or other liability obligations and/or rights consistent with this License. However, in accepting such obligations, You may act only on Your own behalf and on Your sole responsibility, not on behalf of any other Contributor, and only if You agree to indemnify, defend, and hold each Contributor harmless for any liability incurred by, or claims asserted against, such Contributor by reason of your accepting any such warranty or additional liability.
END OF TERMS AND CONDITIONS
APPENDIX: HOW TO APPLY THE APACHE LICENSE TO YOUR WORK
To apply the Apache License to your work, attach the following boilerplate notice, with the fields enclosed by brackets “[]” replaced with your own identifying information. (Don’t include the brackets!) The text should be enclosed in the appropriate comment syntax for the file format. We also recommend that a file or class name and description of purpose be included on the same “printed page” as the copyright notice for easier identification within third-party archives.
Copyright 2017, Sony Corporation
Licensed under the Apache License, Version 2.0 (the “License”); you may not use this file except in compliance with the License. You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an “AS IS” BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and limitations under the License.
1. comment out the function in functions.txt¶