Neural Network Libraries

Neural Network Libraries is a deep learning framework intended for research, development, and production. We aim to make it run everywhere: desktop PCs, HPC clusters, embedded devices, and production servers.

This document describes how to use the Python API and the C++ API, the contribution guide for developers, and the license terms of this software. The Python API is more suitable for fast prototyping and experimentation of deep learning systems, while the C++ API is for deploying inference or training algorithms into embedded systems and servers (its documentation is not available yet; we will make it available soon). The framework is designed with modularity and extensibility in mind. Community contributors can add a new operator or optimizer module of neural networks, or a specialized implementation of neural network modules for a specific target device, as an extension.

Python Package

The Python API built on top of our C++11 core maximizes the flexibility of the design of neural networks, and encourages fast prototyping and experimentation. NNabla works on Python>=3.5 (>=3.6 is recommended).

Python Package Installation

There are three ways to install the NNabla Python package.

For how to set up an environment including CUDA/cuDNN, and how to install NNabla on each OS, please refer to this site.

Install with pip command

The NNabla Python packages are hosted on PyPI for many platforms. For people who are familiar with Python and its package management system pip (and optionally CUDA, which is recommended), the following pip installation guide will be sufficient for installing NNabla Python. For a slightly more detailed OS-specific setup guide, go to the next section.

NNabla package installation using PIP

Note: please refer to the OS specific workflows for the OS specific dependencies setup.

Install NNabla package via pip:

pip install nnabla

Note: If you want to make sure the latest version will be installed, uninstall the previously installed one with pip uninstall -y nnabla beforehand.
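For example, a clean reinstall of the latest version looks like this:

pip uninstall -y nnabla
pip install nnabla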

Then, check if it works by running:

python -c "import nnabla"
2018-06-26 15:20:16,759 [nnabla][INFO]: Initializing CPU extension...
NNabla CUDA extension package installation

See NNabla CUDA extension package installation using PIP.
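If you have installed the CUDA extension, a quick import check analogous to the one above can verify it (a sketch; it assumes the nnabla-ext-cuda package matching your CUDA version was installed successfully):

python -c "import nnabla_ext.cudnn"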

Run an Example

Get the examples (and unzip them) or clone the NNabla Examples repository, and go to the MNIST folder.

cd nnabla-examples/mnist-collection/

Run MNIST classification.

python classification.py

Run MNIST classification with CUDA/cuDNN.

python classification.py -c cudnn
OS specific workflows
Installation on Linux
Prerequisites

This installation instruction describes how to install NNabla using pip on almost any 64-bit Linux system.

The supported Python versions for the provided binary packages are 3.5 (not recommended), 3.6, and 3.7. It is recommended to use Miniconda as a Python distribution. The following is a simple procedure to install Miniconda Python.

wget https://repo.continuum.io/miniconda/Miniconda3-latest-Linux-x86_64.sh
bash Miniconda3-latest-Linux-x86_64.sh -b -p {installation path e.g. ~/miniconda}
# You have to set an environment variable PATH accordingly
# to enable the installed ``Python`` and the ``conda`` system.
echo 'export PATH=<installation path>/bin:$PATH' >> ~/.bashrc
# Restart your bash or source ~/.bashrc

# Switch the default Python version
conda install -y python={version number e.g. 3.6}
Installation

See NNabla package installation using PIP.

FAQ
Q. I want to use another linux distribution.

We have actually tested other Linux distributions and versions: Ubuntu 16.04, 18.04, and 20.04, and CentOS 7 and 8, on various environments: bare-metal servers, AWS instances, and Docker machines. Thus, you can install NNabla in almost the same way as described here. The details of how to install on each are coming soon.

Installation on Windows
Prerequisites

We tested on Windows 8.1 64-bit and Windows 10 64-bit.

The following software is required for installation:

  • Required software.

    • Python>=3.6: PIP

    • Microsoft Visual C++ 2015 Redistributable

  • Recommended.

    • CUDA Toolkit and cuDNN (if you have CUDA GPUs).

Setup environment
Python

In this instruction, we use Miniconda.

Get and install the Windows binary from here.

Then install the required packages from the command prompt.

> conda install scipy scikit-image ipython

If your network uses a proxy and the setup fails, configure the proxy server with environment variables and try installing again.

> SET HTTP_PROXY=http://(enter the address of the http proxy server here)
> SET HTTPS_PROXY=https://(enter the address of the https proxy server here)
Microsoft Visual C++ 2015 Redistributable

Get and install from here

CUDA and cuDNN library

If you are using an NVIDIA GPU, execution speed will be drastically improved by installing the following software.

CUDA Toolkit

cuDNN

To install cuDNN, copy the bin, include, and lib directories to C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v{CUDA_VERSION}

See a list of compatible cuDNN versions of CUDA extension packages.

Install

See NNabla package installation using PIP.

FAQ
Q. Scikit-image installation takes a long time.

Depending on the environment, it may take a long time. Please wait.

Q. Failed to install Scipy during installation.

Please install scipy with conda install before running pip install nnabla.

Installation on macOS

NOTE: Our testing coverage in terms of environments and machines on macOS is very limited. Please submit an issue if you face any problems.

Prerequisites

We tested the installation on macOS Sierra.

The following software is required for installation:

  • Python>=3.6 (we recommend setting up Python using Anaconda or Miniconda).

    • pip (bundled in Conda Python)

    • wheel (bundled in Conda Python)

    • setuptools (bundled in Conda Python. You may need to upgrade the version of setuptools with pip install -U --no-deps setuptools.)

Install

See NNabla package installation using PIP (note that the binary packages for the CUDA extension are not available for macOS. Please build it from source).

Install NNabla package compatible with Multi-GPU execution

To enable multi-GPU execution such as distributed training on NNabla, you have to install a special edition of NNabla package. See Installation with Multi-GPU supported for installation.

Install from source

Documentation of build from source has been moved to Github repository (build or build_distributed).

Running on Docker

Python API Tutorial

The following tutorial documents are automatically generated from Jupyter notebook files listed in NNabla Tutorial. If you want to run these step-by-step, follow the link and see the instructions found there.

NNabla by Examples

This tutorial demonstrates how you can write a script to train a neural network, using a simple handwritten digits classification task.

Note: This tutorial notebook requires scikit-learn and matplotlib installed in your Python environment.

First let us prepare some dependencies.

import nnabla as nn

import nnabla.functions as F
import nnabla.parametric_functions as PF
import nnabla.solvers as S
from nnabla.monitor import tile_images

import numpy as np
import matplotlib.pyplot as plt
import tiny_digits
%matplotlib inline

np.random.seed(0)
imshow_opt = dict(cmap='gray', interpolation='nearest')
2017-06-26 23:09:49,971 [nnabla][INFO]: Initializing CPU extension...

The tiny_digits module is located under this folder. It provides some utilities for loading a handwritten-digit classification dataset (MNIST) available in scikit-learn.

Logistic Regression

We will first start by defining a computation graph for logistic regression. (For details on logistic regression, see Appendix A.)

The training will be done by gradient descent, where gradients are calculated using the error backpropagation algorithm (backprop).

Preparing a Toy Dataset

This section just prepares a dataset to be used for demonstration of NNabla usage.

digits = tiny_digits.load_digits(n_class=10)
tiny_digits.plot_stats(digits)
Num images: 1797
Image shape: (8, 8)
Labels: [0 1 2 3 4 5 6 7 8 9]
[Image: _images/by_examples_8_1.png]

The next block creates a dataset loader, which is a generator providing images and labels as minibatches. Note that this dataset is just for example purposes and is not a part of NNabla.

data = tiny_digits.data_iterator_tiny_digits(digits, batch_size=64, shuffle=True)
2017-06-26 23:09:50,545 [nnabla][INFO]: DataSource with shuffle(True)
2017-06-26 23:09:50,546 [nnabla][INFO]: Using DataSourceWithMemoryCache
2017-06-26 23:09:50,546 [nnabla][INFO]: DataSource with shuffle(True)
2017-06-26 23:09:50,547 [nnabla][INFO]: On-memory
2017-06-26 23:09:50,547 [nnabla][INFO]: Using DataIterator

A minibatch is as follows. img and label are numpy.ndarray objects.

img, label = data.next()
plt.imshow(tile_images(img), **imshow_opt)
print("labels: {}".format(label.reshape(8, 8)))
print("Label shape: {}".format(label.shape))
labels: [[ 2.  8.  2.  6.  6.  7.  1.  9.]
 [ 8.  5.  2.  8.  6.  6.  6.  6.]
 [ 1.  0.  5.  8.  8.  7.  8.  4.]
 [ 7.  5.  4.  9.  2.  9.  4.  7.]
 [ 6.  8.  9.  4.  3.  1.  0.  1.]
 [ 8.  6.  7.  7.  1.  0.  7.  6.]
 [ 2.  1.  9.  6.  7.  9.  0.  0.]
 [ 5.  1.  6.  3.  0.  2.  3.  4.]]
Label shape: (64, 1)
[Image: _images/by_examples_12_1.png]
Preparing the Computation Graph

NNabla provides two different ways for backprop-based gradient descent optimization. One is with a static graph, and the other is with a dynamic graph. We are going to show the static version first.

# Forward pass
x = nn.Variable(img.shape)  # Define an image variable
with nn.parameter_scope("affine1"):
    y = PF.affine(x, 10)  # Output is 10 class

This code block shows one of the most important features in graph building in NNabla, the parameter scope. The first line defines an input variable x. The second line creates a parameter scope. The third line then applies PF.affine - an affine transform - to x, and creates a variable y holding that result. Here, the PF (parametric_functions) module provides functions that contain learnable parameters, such as affine transforms (which contain weights), convolution (which contains kernels) and batch normalization (which contains transformation factors and coefficients). We call these functions parametric functions. The parameters are created and initialized randomly at function call, and registered under the name "affine1" by the parameter_scope context.

# Building a loss graph
t = nn.Variable(label.shape)  # Define a target variable
loss = F.mean(F.softmax_cross_entropy(y, t))  # Softmax Xentropy fits multi-class classification problems

The remaining lines shown above define a target variable and attach functions for the loss at the end of the graph. Note that building a static graph doesn't execute any computation, but the shapes of the output variables are inferred. Therefore, we can inspect the shapes of each variable at this time:

print("Printing shapes of variables")
print(x.shape)
print(y.shape)
print(t.shape)
print(loss.shape)  # empty tuple means scalar
Printing shapes of variables
(64, 1, 8, 8)
(64, 10)
(64, 1)
()
Executing a static graph

You can execute the computation of the graph by calling the forward() method on a sink variable. Inputs can be set via the .d accessor. It borrows CPU array references as numpy.ndarray.

# Set data
x.d = img
t.d = label
# Execute a forward pass
loss.forward()
# Showing results
print("Prediction score of 0-th image: {}".format(y.d[0]))
print("Loss: {}".format(loss.d))
Prediction score of 0-th image: [  9.75851917   6.49118519  16.47323608  -1.36296904  -0.78583491
   4.08872032   7.84134388   2.42956853   3.31485462   3.61868763]
Loss: 10.6016616821

The output doesn’t make sense since the network is just randomly initialized.

Backward propagation through the graph

The parameters registered by the parameter_scope management function can be queried by get_parameters() as a dict.

print(nn.get_parameters())
OrderedDict([('affine1/affine/W', <Variable((64, 10), need_grad=True) at 0x7fa0ba361d50>), ('affine1/affine/b', <Variable((10,), need_grad=True) at 0x7fa0ba361ce8>)])

Before executing backpropagation, we should initialize the gradient buffers of all parameters to zero.

for param in nn.get_parameters().values():
    param.grad.zero()

Then, you can execute backprop by calling the backward() method on the sink variable.

# Compute backward
loss.backward()
# Showing gradients.
for name, param in nn.get_parameters().items():
    print(name, param.shape, param.g.flat[:20])  # Showing first 20.
affine1/affine/W (64, 10) [  0.00000000e+00   0.00000000e+00   0.00000000e+00   0.00000000e+00
   0.00000000e+00   0.00000000e+00   0.00000000e+00   0.00000000e+00
   0.00000000e+00   0.00000000e+00   4.98418584e-02   8.72317329e-03
  -4.06671129e-02  -4.68742661e-02   2.52632981e-09   7.86017510e-04
   9.06870365e-02  -1.56249944e-02  -1.56217301e-02  -3.12499963e-02]
affine1/affine/b (10,) [ 0.42710391 -0.01852455  0.07369987 -0.04687012 -0.07798236 -0.03664626
  0.01651323 -0.1249291  -0.11862005 -0.09374455]

The gradient is stored in the grad field of Variable. The .g accessor can be used to access the grad data in numpy.ndarray format.
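As a quick sanity check (a sketch using the parameters registered above), .g refers to the same values as .grad.data:

w = nn.get_parameters()['affine1/affine/W']
assert np.all(w.g == w.grad.data)  # `.g` is a shortcut for `.grad.data`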

Optimizing parameters (=Training)

To optimize parameters, we provide the solver module (aliased as S here). The solver module contains a bunch of optimizer implementations such as SGD, SGD with momentum, Adam, etc. The block below creates an SGD solver and sets the parameters of logistic regression to it.

# Create a solver (gradient-based optimizer)
learning_rate = 1e-3
solver = S.Sgd(learning_rate)
solver.set_parameters(nn.get_parameters())  # Set parameter variables to be updated.

In the next block, we demonstrate a single step of the optimization loop. The solver.zero_grad() line is equivalent to calling .grad.zero() for all parameters, as shown above. After backward computation, we apply weight decay, then apply gradient descent implemented in the Sgd solver class, as follows

\[\theta \leftarrow \theta - \eta \nabla_{\theta} L(\theta, X_{\mathrm{minibatch}})\]

where \(\eta\) denotes the learning rate.
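For intuition, solver.update() with the plain Sgd solver performs the equivalent of the following imperative sketch (ignoring weight decay; the same manual update appears in the Imperative Mode section later):

for param in nn.get_parameters().values():
    param.data -= param.grad * learning_rate  # theta <- theta - eta * grad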

# One step of training
x.d, t.d = data.next()
loss.forward()
solver.zero_grad()  # Initialize gradients of all parameters to zero.
loss.backward()
solver.weight_decay(1e-5)  # Applying weight decay as a regularization
solver.update()
print(loss.d)
12.9438686371

The next block iterates optimization steps, and shows that the loss decreases.

for i in range(1000):
    x.d, t.d = data.next()
    loss.forward()
    solver.zero_grad()  # Initialize gradients of all parameters to zero.
    loss.backward()
    solver.weight_decay(1e-5)  # Applying weight decay as a regularization
    solver.update()
    if i % 100 == 0:  # Print every 100 iterations
        print(i, loss.d)
0 12.6905069351
100 3.17041015625
200 1.60036706924
300 0.673069953918
400 0.951370298862
500 0.724424362183
600 0.361597299576
700 0.588107347488
800 0.28792989254
900 0.415006935596
Show prediction

The following code displays training results.

x.d, t.d = data.next()  # Here we predict images from training set although it's useless.
y.forward()  # You can execute a sub graph.
plt.imshow(tile_images(x.d), **imshow_opt)
print("prediction:")
print(y.d.argmax(axis=1).reshape(8, 8))  # Taking a class index based on prediction score.
prediction:
[[5 0 1 9 0 1 3 3]
 [2 4 1 7 4 5 6 5]
 [7 7 9 7 9 0 7 3]
 [5 3 7 6 6 8 0 9]
 [0 1 3 5 5 5 4 9]
 [1 0 0 8 5 1 8 8]
 [7 5 0 7 6 9 0 0]
 [0 6 2 6 4 4 2 6]]
[Image: _images/by_examples_36_1.png]
Dynamic graph construction support

This is another way of running a computation graph in NNabla. This example doesn't show how useful the dynamic graph is, but gives a bit of its flavor.

The next block just defines the computation graph building as functions for later use.

def logreg_forward(x):
    with nn.parameter_scope("affine1"):
        y = PF.affine(x, 10)
    return y

def logreg_loss(y, t):
    loss = F.mean(F.softmax_cross_entropy(y, t))  # Softmax Xentropy fits multi-class classification problems
    return loss

To run a computation graph dynamically during creation, you use the nnabla.auto_forward() context, as you see in the block below. With this, computation is fired immediately as functions are called. (You can also use nnabla.set_auto_forward(auto) to set the auto-forward state globally.)

x = nn.Variable(img.shape)
t = nn.Variable(label.shape)
x.d, t.d = data.next()
with nn.auto_forward():  # Graph is executed immediately
    y = logreg_forward(x)
    loss = logreg_loss(y, t)
print("Loss: {}".format(loss.d))
plt.imshow(tile_images(x.d), **imshow_opt)
print("prediction:")
print(y.d.argmax(axis=1).reshape(8, 8))
Loss: 0.43071603775
prediction:
[[9 3 5 0 1 9 9 2]
 [5 6 6 2 7 5 1 1]
 [3 7 7 6 0 8 3 8]
 [0 6 4 6 0 6 9 9]
 [6 1 2 5 8 3 2 4]
 [1 4 4 0 5 7 1 7]
 [7 8 9 5 8 3 7 8]
 [5 7 5 3 3 0 0 7]]
[Image: _images/by_examples_41_1.png]

Backward computation can be done on a dynamically constructed graph.

solver.zero_grad()
loss.backward()
Multi-Layer Perceptron (MLP)

In this section, you will see an example of MLP graph building and training.

Before starting, we clear all parameters registered in the logistic regression example.

nn.clear_parameters()  # Clear all parameters

Here is a function that builds an MLP with an arbitrary depth and width for 10-class classification.

def mlp(x, hidden=[16, 32, 16]):
    hs = []
    with nn.parameter_scope("mlp"):  # Parameter scope can be nested
        h = x
        for hid, hsize in enumerate(hidden):
            with nn.parameter_scope("affine{}".format(hid + 1)):
                h = F.tanh(PF.affine(h, hsize))
                hs.append(h)
        with nn.parameter_scope("classifier"):
            y = PF.affine(h, 10)
    return y, hs
# Construct a MLP graph
y, hs = mlp(x)
print("Printing shapes")
print("x: {}".format(x.shape))
for i, h in enumerate(hs):
    print("h{}:".format(i + 1), h.shape)
print("y: {}".format(y.shape))
Printing shapes
x: (64, 1, 8, 8)
h1: (64, 16)
h2: (64, 32)
h3: (64, 16)
y: (64, 10)
# Training
loss = logreg_loss(y, t)  # Reuse logreg loss function.

# Copied from the above logreg example.
def training(steps, learning_rate):
    solver = S.Sgd(learning_rate)
    solver.set_parameters(nn.get_parameters())  # Set parameter variables to be updated.
    for i in range(steps):
        x.d, t.d = data.next()
        loss.forward()
        solver.zero_grad()  # Initialize gradients of all parameters to zero.
        loss.backward()
        solver.weight_decay(1e-5)  # Applying weight decay as a regularization
        solver.update()
        if i % 100 == 0:  # Print every 100 iterations
            print(i, loss.d)


# Training
training(1000, 1e-2)
0 2.42193937302
100 1.83251476288
200 1.49943637848
300 1.30751883984
400 1.00974023342
500 0.904026031494
600 0.873289525509
700 0.725554704666
800 0.614291608334
900 0.555113613605
# Showing responses for each layer
num_plot = len(hs) + 2
gid = 1

def scale01(h):
    return (h - h.min()) / (h.max() - h.min())

def imshow(img, title):
    global gid
    plt.subplot(num_plot, 1, gid)
    gid += 1
    plt.title(title)
    plt.imshow(img, **imshow_opt)
    plt.axis('off')

plt.figure(figsize=(2, 5))
imshow(x.d[0, 0], 'x')
for hid, h in enumerate(hs):
    imshow(scale01(h.d[0]).reshape(-1, 8), 'h{}'.format(hid + 1))
imshow(scale01(y.d[0]).reshape(2, 5), 'y')
[Image: _images/by_examples_52_0.png]
Convolutional Neural Network with CUDA acceleration

Here we demonstrate a CNN with CUDA GPU acceleration.

nn.clear_parameters()
def cnn(x):
    with nn.parameter_scope("cnn"):  # Parameter scope can be nested
        with nn.parameter_scope("conv1"):
            c1 = F.tanh(PF.batch_normalization(
                PF.convolution(x, 4, (3, 3), pad=(1, 1), stride=(2, 2))))
        with nn.parameter_scope("conv2"):
            c2 = F.tanh(PF.batch_normalization(
                PF.convolution(c1, 8, (3, 3), pad=(1, 1))))
            c2 = F.average_pooling(c2, (2, 2))
        with nn.parameter_scope("fc3"):
            fc3 = F.tanh(PF.affine(c2, 32))
        with nn.parameter_scope("classifier"):
            y = PF.affine(fc3, 10)
    return y, [c1, c2, fc3]

To enable the CUDA extension in NNabla, you have to install the nnabla-ext-cuda package first. See the install guide. After installing the CUDA extension, you can easily switch to running on CUDA by specifying a context before building a graph. We strongly recommend using the cuDNN context, which is fast. Although the context class can be instantiated by nn.Context(), specifying a context descriptor might be a bit complicated for users. Therefore, we recommend creating a context by using the helper function get_extension_context() found in the nnabla.ext_utils module. NNabla officially supports cpu and cudnn as context specifiers passed to the first argument (extension name). NOTE: By setting the cudnn context as a global default context, Functions and solvers created afterwards are instantiated with the cuDNN (preferred) mode. You can also specify a context using with nn.context_scope(). See the API reference for details.

# Run on CUDA
from nnabla.ext_utils import get_extension_context
cuda_device_id = 0
ctx = get_extension_context('cudnn', device_id=cuda_device_id)
print("Context: {}".format(ctx))
nn.set_default_context(ctx)  # Set CUDA as a default context.
y, hs = cnn(x)
loss = logreg_loss(y, t)
2017-06-26 23:09:54,555 [nnabla][INFO]: Initializing CUDA extension...
2017-06-26 23:09:54,731 [nnabla][INFO]: Initializing cuDNN extension...
Context: Context(backend='cpu|cuda', array_class='CudaCachedArray', device_id='0', compute_backend='default|cudnn')
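Alternatively, as noted above, nn.context_scope() limits the context to the graph defined inside the with block. A minimal sketch (y2 and hs2 are illustrative names, unused below):

with nn.context_scope(ctx):
    y2, hs2 = cnn(x)  # This graph is built with the cuDNN context.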
training(1000, 1e-1)
0 2.34862923622
100 1.00527024269
200 0.416576713324
300 0.240603536367
400 0.254562884569
500 0.206138283014
600 0.220851421356
700 0.161689639091
800 0.230873346329
900 0.121101222932
# Showing responses for each layer
num_plot = len(hs) + 2
gid = 1
plt.figure(figsize=(2, 8))
imshow(x.d[0, 0], 'x')
imshow(tile_images(hs[0].d[0][:, None]), 'conv1')
imshow(tile_images(hs[1].d[0][:, None]), 'conv2')
imshow(hs[2].d[0].reshape(-1, 8), 'fc3')
imshow(scale01(y.d[0]).reshape(2, 5), 'y')
[Image: _images/by_examples_59_0.png]

nn.save_parameters writes the parameters registered in the parameter_scope system in HDF5 format. We use it in a later example.

path_cnn_params = "tmp.params.cnn.h5"
nn.save_parameters(path_cnn_params)
2017-06-26 23:09:56,132 [nnabla][INFO]: Parameter save (hdf5): tmp.params.cnn.h5
Recurrent Neural Network (Elman RNN)

This is an example of recurrent neural network training.

nn.clear_parameters()
def rnn(xs, h0, hidden=32):
    hs = []
    with nn.parameter_scope("rnn"):
        h = h0
        # Time step loop
        for x in xs:
            # Note: Parameter scopes are reused over time
            # which means parameters are shared over time.
            with nn.parameter_scope("x2h"):
                x2h = PF.affine(x, hidden, with_bias=False)
            with nn.parameter_scope("h2h"):
                h2h = PF.affine(h, hidden)
            h = F.tanh(x2h + h2h)
            hs.append(h)
        with nn.parameter_scope("classifier"):
            y = PF.affine(h, 10)
    return y, hs

This is not a meaningful task, but serves a demonstration purpose. We split an image into a 2 by 2 grid of patches, and feed them sequentially into the RNN.

def split_grid4(x):
    x0 = x[..., :4, :4]
    x1 = x[..., :4, 4:]
    x2 = x[..., 4:, :4]
    x3 = x[..., 4:, 4:]
    return x0, x1, x2, x3
hidden = 32
seq_img = split_grid4(img)
seq_x = [nn.Variable(subimg.shape) for subimg in seq_img]
h0 = nn.Variable((img.shape[0], hidden))  # Initial hidden state.
y, hs = rnn(seq_x, h0, hidden)
loss = logreg_loss(y, t)
# Copied from the above logreg example.
def training_rnn(steps, learning_rate):
    solver = S.Sgd(learning_rate)
    solver.set_parameters(nn.get_parameters())  # Set parameter variables to be updated.
    for i in range(steps):
        minibatch = data.next()
        img, t.d = minibatch
        seq_img = split_grid4(img)
        h0.d = 0  # Initialize as 0
        for x, subimg in zip(seq_x, seq_img):
            x.d = subimg
        loss.forward()
        solver.zero_grad()  # Initialize gradients of all parameters to zero.
        loss.backward()
        solver.weight_decay(1e-5)  # Applying weight decay as a regularization
        solver.update()
        if i % 100 == 0:  # Print every 100 iterations
            print(i, loss.d)

training_rnn(1000, 1e-1)
0 2.62527275085
100 0.780260562897
200 0.486522495747
300 0.289345681667
400 0.249717146158
500 0.538961410522
600 0.276877015829
700 0.159639537334
800 0.249660402536
900 0.0925596579909
# Showing responses for each layer
num_plot = len(hs) + 2
gid = 1
plt.figure(figsize=(2, 8))
imshow(x.d[0, 0], 'x')
for hid, h in enumerate(hs):
    imshow(scale01(h.d[0]).reshape(-1, 8), 'h{}'.format(hid + 1))
imshow(scale01(y.d[0]).reshape(2, 5), 'y')
[Image: _images/by_examples_69_0.png]
Siamese Network

This example shows how to embed images from a categorical dataset into a 2D space using deep learning. It also demonstrates how to reuse a pretrained network.

First, we load parameters learned in the CNN example.

nn.clear_parameters()
# Loading CNN pretrained parameters.
_ = nn.load_parameters(path_cnn_params)
2017-06-26 23:09:57,838 [nnabla][INFO]: Parameter load (<built-in function format>): tmp.params.cnn.h5

We define the embedding function. Note that the network structure and parameter hierarchy are identical to the previous CNN example. That enables you to reuse the saved parameters and finetune from them.

def cnn_embed(x, test=False):
    # Note: Identical configuration with the CNN example above.
    # Parameters pretrained in the above CNN example are used.
    with nn.parameter_scope("cnn"):
        with nn.parameter_scope("conv1"):
            c1 = F.tanh(PF.batch_normalization(PF.convolution(x, 4, (3, 3), pad=(1, 1), stride=(2, 2)), batch_stat=not test))
        with nn.parameter_scope("conv2"):
            c2 = F.tanh(PF.batch_normalization(PF.convolution(c1, 8, (3, 3), pad=(1, 1)), batch_stat=not test))
            c2 = F.average_pooling(c2, (2, 2))
        with nn.parameter_scope("fc3"):
            fc3 = PF.affine(c2, 32)
    # Additional affine for map into 2D.
    with nn.parameter_scope("embed2d"):
        embed = PF.affine(c2, 2)
    return embed, [c1, c2, fc3]

def siamese_loss(e0, e1, t, margin=1.0, eps=1e-4):
    dist = F.sum(F.squared_error(e0, e1), axis=1)  # Squared distance
    # Contrastive loss
    sim_cost = t * dist
    dissim_cost = (1 - t) * \
        (F.maximum_scalar(margin - (dist + eps) ** (0.5), 0) ** 2)
    return F.mean(sim_cost + dissim_cost)

We build two CNN streams and compare their outputs with the contrastive loss function defined above. Note that both CNNs have the same parameter hierarchy, which means that their parameters are shared.

x0 = nn.Variable(img.shape)
x1 = nn.Variable(img.shape)
t = nn.Variable((img.shape[0],))  # Same class or not
e0, hs0 = cnn_embed(x0)
e1, hs1 = cnn_embed(x1)  # NOTE: parameters are shared
loss = siamese_loss(e0, e1, t)
def training_siamese(steps):
    for i in range(steps):
        minibatchs = []
        for _ in range(2):
            minibatch = data.next()
            minibatchs.append((minibatch[0].copy(), minibatch[1].copy()))
        x0.d, label0 = minibatchs[0]
        x1.d, label1 = minibatchs[1]
        t.d = (label0 == label1).astype(np.int32).flat
        loss.forward()
        solver.zero_grad()  # Initialize gradients of all parameters to zero.
        loss.backward()
        solver.weight_decay(1e-5)  # Applying weight decay as a regularization
        solver.update()
        if i % 100 == 0:  # Print every 100 iterations
            print(i, loss.d)
learning_rate = 1e-2
solver = S.Sgd(learning_rate)
with nn.parameter_scope("embed2d"):
    # Only 2d embedding affine will be updated.
    solver.set_parameters(nn.get_parameters())
training_siamese(2000)
# Decay learning rate
solver.set_learning_rate(solver.learning_rate() * 0.1)
training_siamese(2000)
0 0.150528043509
100 0.186870157719
200 0.149316266179
300 0.207163512707
400 0.171384960413
500 0.190256178379
600 0.138507723808
700 0.0918073058128
800 0.159692272544
900 0.0833697617054
1000 0.0839115008712
1100 0.104669973254
1200 0.0776312947273
1300 0.114788673818
1400 0.120309025049
1500 0.107732802629
1600 0.070114441216
1700 0.101728007197
1800 0.114350572228
1900 0.118794307113
0 0.0669310241938
100 0.0553173273802
200 0.0829797014594
300 0.0951051414013
400 0.128303915262
500 0.102963000536
600 0.0910559669137
700 0.0898950695992
800 0.119949311018
900 0.0603067912161
1000 0.105748720467
1100 0.108760476112
1200 0.0820947736502
1300 0.0971114039421
1400 0.0836166366935
1500 0.0899554267526
1600 0.109069615602
1700 0.0921652168036
1800 0.0759357959032
1900 0.100669950247

We visualize the embedded training images as follows. You can see that images from the same class are embedded near each other.

all_image = digits.images[:512, None]
all_label = digits.target[:512]
x_all = nn.Variable(all_image.shape)
x_all.d = all_image
with nn.auto_forward():
    embed, _ = cnn_embed(x_all, test=True)
plt.figure(figsize=(16, 9))
for i in range(10):
    c = plt.cm.Set1(i / 10.)  # Maybe it doesn't work in an older version of Matplotlib where color map lies in [0, 256)
    plt.plot(embed.d[all_label == i, 0].flatten(), embed.d[
             all_label == i, 1].flatten(), '.', c=c)
plt.legend(map(str, range(10)))
plt.grid()
[Image: _images/by_examples_81_0.png]
Appendix
A. Logistic Regression

Here we demonstrate how to train the simplest neural network, logistic regression (a single-layer perceptron). Logistic regression is a linear classifier \(f : {\cal R}^{D\times 1} \rightarrow {\cal R}^{K\times 1}\)

\[\mathbf f(\mathbf x, \mathbf \Theta) = \mathbf W \mathbf x + \mathbf b\]

where \(\mathbf x \in {\cal R}^{D \times 1}\) is an input image flattened to a vector, \(t \in \{0, 1, \cdots, K\}\) is a target label, \(\mathbf W \in {\cal R}^{K \times D}\) is a weight matrix, \(\mathbf b \in {\cal R}^{K \times 1}\) is a bias vector and \(\mathbf \Theta \equiv \left\{\mathbf W, \mathbf b\right\}\). Loss function is defined as

\[\mathbf L(\mathbf \Theta, \mathbf X) = \frac{1}{N} \sum_{(\mathbf x, t) \in \mathbf X} -\log \left(\left[\sigma\left(f(\mathbf x, \mathbf \Theta)\right)\right]_{t}\right)\]

where \(\mathbf X \equiv \left\{(\mathbf x_1, t_1), \cdots, (\mathbf x_N, t_N)\right\}\) denotes the dataset the network is trained on, \(\sigma(\mathbf z)\) is the softmax operation defined elementwise as \(\left[\sigma(\mathbf z)\right]_j = \frac{\exp(z_j)}{\sum_{k} \exp(z_k)}\), and \(\left[\mathbf z\right]_i\) denotes the i-th element of \(\mathbf z\).
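For reference, this loss can be written as a short NumPy sketch, given row-stacked logits \(z = f(\mathbf x, \mathbf \Theta)\) and integer targets t (the function names here are ours, not part of NNabla):

import numpy as np

def softmax_np(z):
    # Numerically stable softmax along the class axis.
    e = np.exp(z - z.max(axis=1, keepdims=True))
    return e / e.sum(axis=1, keepdims=True)

def logreg_loss_np(z, t):
    # Mean negative log-likelihood of the target class.
    p = softmax_np(z)
    return -np.log(p[np.arange(len(t)), t]).mean()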

NNabla Python API Demonstration Tutorial

Let us import nnabla first, and some additional useful tools.

# python2/3 compatibility
from __future__ import print_function
from __future__ import absolute_import
from __future__ import division
import nnabla as nn  # Abbreviate as nn for convenience.

import numpy as np
%matplotlib inline
import matplotlib.pyplot as plt
2017-09-27 14:00:30,785 [nnabla][INFO]: Initializing CPU extension...
NdArray

NdArray is a data container for a multi-dimensional array. NdArray is device- (e.g. CPU, CUDA) and type- (e.g. uint8, float32) agnostic: both the type and the device are implicitly cast or transferred as needed when it is used. Below, you create an NdArray with a shape of (2, 3, 4).

a = nn.NdArray((2, 3, 4))

You can see the values held inside a by the following. The values are not initialized, and the array is created as float32 by default.

print(a.data)
[[[  9.42546995e+24   4.56809286e-41   8.47690058e-38   0.00000000e+00]
  [  7.38056336e+34   7.50334969e+28   1.17078231e-32   7.58387310e+31]
  [  7.87001454e-12   9.84394250e-12   6.85712044e+22   1.81785692e+31]]

 [[  1.84681296e+25   1.84933247e+20   4.85656319e+33   2.06176836e-19]
  [  6.80020530e+22   1.69307638e+22   2.11235872e-19   1.94316151e-19]
  [  1.81805047e+31   3.01289097e+29   2.07004908e-19   1.84648795e+25]]]

The accessor .data returns a reference to the values of the NdArray as numpy.ndarray. You can modify these using the NumPy API as follows.

print('[Substituting random values]')
a.data = np.random.randn(*a.shape)
print(a.data)
print('[Slicing]')
a.data[0, :, ::2] = 0
print(a.data)
[Substituting random values]
[[[ 0.36133638  0.22121875 -1.5912329  -0.33490974]
  [ 1.35962474  0.2165522   0.54483992 -0.61813235]
  [-0.13718799 -0.44104072 -0.51307833  0.73900551]]

 [[-0.59464753 -2.17738533 -0.28626776 -0.45654735]
  [ 0.73566747  0.87292582 -0.41605178  0.04792296]
  [-0.63856047  0.31966645 -0.63974309 -0.61385244]]]
[Slicing]
[[[ 0.          0.22121875  0.         -0.33490974]
  [ 0.          0.2165522   0.         -0.61813235]
  [ 0.         -0.44104072  0.          0.73900551]]

 [[-0.59464753 -2.17738533 -0.28626776 -0.45654735]
  [ 0.73566747  0.87292582 -0.41605178  0.04792296]
  [-0.63856047  0.31966645 -0.63974309 -0.61385244]]]

Note that the above operations are all done on the host device (CPU). NdArray provides more efficient functions, .zero and .fill, in case you want to fill all values with a constant. They are lazily evaluated when the data is requested (when a neural network computation requests the data, or when a NumPy array is requested by Python). The filling operation is executed within a specific device (e.g. CUDA GPU), and is more efficient if you specify the device setting, which we explain later.

a.fill(1)  # Filling all values with one.
print(a.data)
[[[ 1.  1.  1.  1.]
  [ 1.  1.  1.  1.]
  [ 1.  1.  1.  1.]]

 [[ 1.  1.  1.  1.]
  [ 1.  1.  1.  1.]
  [ 1.  1.  1.  1.]]]
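Similarly, .zero lazily fills all values with zero:

a.zero()
print(a.data)  # Prints an all-zero (2, 3, 4) array.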

You can create an NdArray instance directly from a NumPy array object.

b = nn.NdArray.from_numpy_array(np.ones(a.shape))
print(b.data)
[[[ 1.  1.  1.  1.]
  [ 1.  1.  1.  1.]
  [ 1.  1.  1.  1.]]

 [[ 1.  1.  1.  1.]
  [ 1.  1.  1.  1.]
  [ 1.  1.  1.  1.]]]

NdArray is used in the Variable class, as well as in NNabla's imperative computation of neural networks. We describe these in the later sections.

Variable

The Variable class is used when you construct a neural network. The neural network can be described as a graph in which an edge represents a function (a.k.a. operator or layer) defining a minimum unit of computation, and a node represents a variable holding the input/output values of a function (the Function class is explained later). The graph is called a "computation graph".

In NNabla, a Variable, a node of a computation graph, holds two NdArrays: one for storing the input or output values of a function during forward propagation (executing the computation graph in the forward order), and another for storing the backward error signal (gradient) during backward propagation (executing the computation graph in the backward order to propagate error signals down to the parameters (weights) of the neural network). The first one is called data, the second grad in NNabla.

The following line creates a Variable instance with a shape of (2, 3, 4). It has data and grad as NdArrays. The flag need_grad is used to omit unnecessary gradient computation during backprop if set to False.

x = nn.Variable([2, 3, 4], need_grad=True)
print('x.data:', x.data)
print('x.grad:', x.grad)
x.data: <NdArray((2, 3, 4)) at 0x7f575caf4ea0>
x.grad: <NdArray((2, 3, 4)) at 0x7f575caf4ea0>

You can get the shape by:

x.shape
(2, 3, 4)

Since both data and grad are NdArrays, you can get references to their values as NdArray with the .data accessor, but they can also be referred to by the .d and .g properties for data and grad, respectively.

print('x.data')
print(x.d)
x.d = 1.2345  # To avoid NaN
assert np.all(x.d == x.data.data), 'd: {} != {}'.format(x.d, x.data.data)
print('x.grad')
print(x.g)
x.g = 1.2345  # To avoid NaN
assert np.all(x.g == x.grad.data), 'g: {} != {}'.format(x.g, x.grad.data)

# Zeroing grad values
x.grad.zero()
print('x.grad (after `.zero()`)')
print(x.g)
x.data
[[[  9.42553452e+24   4.56809286e-41   8.32543479e-38   0.00000000e+00]
  [             nan              nan   0.00000000e+00   0.00000000e+00]
  [  3.70977305e+25   4.56809286e-41   3.78350585e-44   0.00000000e+00]]

 [[  5.68736600e-38   0.00000000e+00   1.86176378e-13   4.56809286e-41]
  [  4.74367616e+25   4.56809286e-41   5.43829710e+19   4.56809286e-41]
  [  0.00000000e+00   0.00000000e+00   2.93623372e-38   0.00000000e+00]]]
x.grad
[[[  9.42576510e+24   4.56809286e-41   9.42576510e+24   4.56809286e-41]
  [  9.27127763e-38   0.00000000e+00   9.27127763e-38   0.00000000e+00]
  [  1.69275966e+22   4.80112800e+30   1.21230330e+25   7.22962302e+31]]

 [[  1.10471027e-32   4.63080422e+27   2.44632805e+20   2.87606258e+20]
  [  4.46263300e+30   4.62311881e+30   7.65000750e+28   3.01339003e+29]
  [  2.08627352e-10   1.03961868e+21   7.99576678e+20   1.74441223e+22]]]
x.grad (after .zero())
[[[ 0.  0.  0.  0.]
  [ 0.  0.  0.  0.]
  [ 0.  0.  0.  0.]]

 [[ 0.  0.  0.  0.]
  [ 0.  0.  0.  0.]
  [ 0.  0.  0.  0.]]]

Like NdArray, a Variable can also be created from NumPy array(s).

x2 = nn.Variable.from_numpy_array(np.ones((3,)), need_grad=True)
print(x2)
print(x2.d)
x3 = nn.Variable.from_numpy_array(np.ones((3,)), np.zeros((3,)), need_grad=True)
print(x3)
print(x3.d)
print(x3.g)
<Variable((3,), need_grad=True) at 0x7f572a5242c8>
[ 1.  1.  1.]
<Variable((3,), need_grad=True) at 0x7f572a5244a8>
[ 1.  1.  1.]
[ 0.  0.  0.]

Besides storing the values of a computation graph, an important role of a Variable is pointing to its parent edge (function), which lets you trace the computation graph. Here x doesn't have any connection. Therefore, the .parent property returns None.

print(x.parent)
None
Function

A function defines an operation block of a computation graph, as we described above. The module nnabla.functions offers various functions (e.g. Convolution, Affine and ReLU). You can see the list of functions available in the API reference guide.

import nnabla.functions as F

As an example, here you will define a computation graph that computes the element-wise Sigmoid function outputs for the input variable and sums up all values into a scalar. (This is simple enough to explain how it behaves, but a meaningless example in the context of neural network training. We will show you a neural network example later.)

sigmoid_output = F.sigmoid(x)
sum_output = F.reduce_sum(sigmoid_output)

The function API in nnabla.functions takes one (or several) Variable(s) and arguments (if any), and returns one (or several) output Variable(s). The .parent property points to the function instance which created it. Note that no computation occurs at this time since we have just defined the graph. (This is the default behavior of the NNabla computation graph API. You can also fire actual computation during graph definition, which we call "dynamic mode" (explained later).)

print("sigmoid_output.parent.name:", sigmoid_output.parent.name)
print("x:", x)
print("sigmoid_output.parent.inputs refers to x:", sigmoid_output.parent.inputs)
sigmoid_output.parent.name: Sigmoid
x: <Variable((2, 3, 4), need_grad=True) at 0x7f572a51a778>
sigmoid_output.parent.inputs refers to x: [<Variable((2, 3, 4), need_grad=True) at 0x7f572a273a48>]
print("sum_output.parent.name:", sum_output.parent.name)
print("sigmoid_output:", sigmoid_output)
print("sum_output.parent.inputs refers to sigmoid_output:", sum_output.parent.inputs)
sum_output.parent.name: ReduceSum
sigmoid_output: <Variable((2, 3, 4), need_grad=True) at 0x7f572a524638>
sum_output.parent.inputs refers to sigmoid_output: [<Variable((2, 3, 4), need_grad=True) at 0x7f572a273a48>]

Calling .forward() on a sink Variable executes the forward pass computation in the computation graph.

sum_output.forward()
print("CG output:", sum_output.d)
print("Reference:", np.sum(1.0 / (1.0 + np.exp(-x.d))))
CG output: 18.59052085876465
Reference: 18.5905

The .backward() does the backward propagation through the graph. Here we initialize the grad values to zero before backprop, since the NNabla backprop algorithm always accumulates the gradient in the root variables.

x.grad.zero()
sum_output.backward()
print("d sum_o / d sigmoid_o:")
print(sigmoid_output.g)
print("d sum_o / d x:")
print(x.g)
d sum_o / d sigmoid_o:
[[[ 1.  1.  1.  1.]
  [ 1.  1.  1.  1.]
  [ 1.  1.  1.  1.]]

 [[ 1.  1.  1.  1.]
  [ 1.  1.  1.  1.]
  [ 1.  1.  1.  1.]]]
d sum_o / d x:
[[[ 0.17459197  0.17459197  0.17459197  0.17459197]
  [ 0.17459197  0.17459197  0.17459197  0.17459197]
  [ 0.17459197  0.17459197  0.17459197  0.17459197]]

 [[ 0.17459197  0.17459197  0.17459197  0.17459197]
  [ 0.17459197  0.17459197  0.17459197  0.17459197]
  [ 0.17459197  0.17459197  0.17459197  0.17459197]]]

NNabla is developed with a focus mainly on neural network training and inference. Neural networks have parameters to be learned, associated with computation blocks such as Convolution and Affine (a.k.a. fully connected, dense, etc.). In NNabla, the learnable parameters are also represented as Variable objects. Just like input variables, those parameter variables are also used by passing them into Functions. For example, the Affine function takes input, weights and biases as inputs.

x = nn.Variable([5, 2])  # Input
w = nn.Variable([2, 3], need_grad=True)  # Weights
b = nn.Variable([3], need_grad=True)  # Biases
affine_out = F.affine(x, w, b)  # Create a graph including only affine

The above example takes an input with B=5 (batch size) and D=2 (dimensions) and maps it to D'=3 outputs, i.e. a (B, D') output.
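Since shapes are inferred at graph definition, you can check this before running anything:

print(affine_out.shape)  # (5, 3)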

You may also notice that here you set need_grad=True only for the parameter variables (w and b). The x is a non-parameter variable and the root of the computation graph. Therefore, it doesn't require gradient computation. In this configuration, the gradient computation for x is not executed in the first affine, which omits unnecessary backpropagation computation.

The next block sets data and initializes grad, then applies forward and backward computation.

# Set random input and parameters
x.d = np.random.randn(*x.shape)
w.d = np.random.randn(*w.shape)
b.d = np.random.randn(*b.shape)
# Initialize grad
x.grad.zero()  # Just for showing gradients are not computed when need_grad=False (default).
w.grad.zero()
b.grad.zero()

# Forward and backward
affine_out.forward()
affine_out.backward()
# Note: Calling backward on a non-scalar Variable propagates ones as the error signal from all elements of the output.

You can see that affine_out holds an output of Affine.

print('F.affine')
print(affine_out.d)
print('Reference')
print(np.dot(x.d, w.d) + b.d)
F.affine
[[-0.17701732  2.86095762 -0.82298267]
 [-0.75544345 -1.16702223 -2.44841242]
 [-0.36278027 -3.4771595  -0.75681627]
 [ 0.32743117  0.24258983  1.30944324]
 [-0.87201929  1.94556415 -3.23357344]]
Reference
[[-0.1770173   2.86095762 -0.82298267]
 [-0.75544345 -1.16702223 -2.44841242]
 [-0.3627803  -3.4771595  -0.75681627]
 [ 0.32743117  0.24258983  1.309443  ]
 [-0.87201929  1.94556415 -3.23357344]]

The resulting gradients of weights and biases are as follows.

print("dw")
print(w.g)
print("db")
print(b.g)
dw
[[ 3.10820675  3.10820675  3.10820675]
 [ 0.37446201  0.37446201  0.37446201]]
db
[ 5.  5.  5.]

The gradient of x is not changed because need_grad is set to False.

print(x.g)
[[ 0.  0.]
 [ 0.  0.]
 [ 0.  0.]
 [ 0.  0.]
 [ 0.  0.]]
Parametric Function

Considering parameters as inputs of Function enhances the expressiveness and flexibility of computation graphs. However, defining all parameters for each learnable function is tedious when defining a neural network. In NNabla, trainable models are usually created by composing functions that have optimizable parameters. These functions are called "parametric functions". The Parametric Function API provides various parametric functions and an interface for composing trainable models.

To use parametric functions, import:

import nnabla.parametric_functions as PF

A function with optimizable parameters can be created as below.

with nn.parameter_scope("affine1"):
    c1 = PF.affine(x, 3)

The first line creates a parameter scope. The second line then applies PF.affine - an affine transform - to x, and creates a variable c1 holding that result. The parameters are created and initialized randomly at function call, and registered under the name "affine1" by the parameter_scope context. The function nnabla.get_parameters() allows you to get the registered parameters.

nn.get_parameters()
OrderedDict([('affine1/affine/W',
              <Variable((2, 3), need_grad=True) at 0x7f572822f0e8>),
             ('affine1/affine/b',
              <Variable((3,), need_grad=True) at 0x7f572822f138>)])

The name= argument of any PF function creates a parameter space equivalent to the above definition using the PF.affine transformation, as below. It can make your Python code more concise. The nnabla.parameter_scope is more useful when you group multiple parametric functions, such as the Convolution-BatchNormalization pattern found in a typical unit of CNNs.

c1 = PF.affine(x, 3, name='affine1')
nn.get_parameters()
OrderedDict([('affine1/affine/W',
              <Variable((2, 3), need_grad=True) at 0x7f572822f0e8>),
             ('affine1/affine/b',
              <Variable((3,), need_grad=True) at 0x7f572822f138>)])

It is worth noting that the shapes of both the outputs and the parameter variables (as you can see above) are automatically determined by only providing the output size of the affine transformation (in the example above, the output size is 3). This makes it easy to create a graph.

c1.shape
(5, 3)

Parameter scopes can be nested as follows (although this is a meaningless example).

with nn.parameter_scope('foo'):
    h = PF.affine(x, 3)
    with nn.parameter_scope('bar'):
        h = PF.affine(h, 4)

This creates the following.

nn.get_parameters()
OrderedDict([('affine1/affine/W',
              <Variable((2, 3), need_grad=True) at 0x7f572822f0e8>),
             ('affine1/affine/b',
              <Variable((3,), need_grad=True) at 0x7f572822f138>),
             ('foo/affine/W',
              <Variable((2, 3), need_grad=True) at 0x7f572822fa98>),
             ('foo/affine/b',
              <Variable((3,), need_grad=True) at 0x7f572822fae8>),
             ('foo/bar/affine/W',
              <Variable((3, 4), need_grad=True) at 0x7f572822f728>),
             ('foo/bar/affine/b',
              <Variable((4,), need_grad=True) at 0x7f572822fdb8>)])

Also, get_parameters() can be used within a parameter_scope. For example:

with nn.parameter_scope("foo"):
    print(nn.get_parameters())
OrderedDict([('affine/W', <Variable((2, 3), need_grad=True) at 0x7f572822fa98>), ('affine/b', <Variable((3,), need_grad=True) at 0x7f572822fae8>), ('bar/affine/W', <Variable((3, 4), need_grad=True) at 0x7f572822f728>), ('bar/affine/b', <Variable((4,), need_grad=True) at 0x7f572822fdb8>)])

nnabla.clear_parameters() can be used to delete registered parameters under the scope.

with nn.parameter_scope("foo"):
    nn.clear_parameters()
print(nn.get_parameters())
OrderedDict([('affine1/affine/W', <Variable((2, 3), need_grad=True) at 0x7f572822f0e8>), ('affine1/affine/b', <Variable((3,), need_grad=True) at 0x7f572822f138>)])
MLP Example For Explanation

The following block creates a computation graph to predict a one-dimensional output from two-dimensional inputs using a 2-layer fully connected neural network (multi-layer perceptron).

nn.clear_parameters()
batchsize = 16
x = nn.Variable([batchsize, 2])
with nn.parameter_scope("fc1"):
    h = F.tanh(PF.affine(x, 512))
with nn.parameter_scope("fc2"):
    y = PF.affine(h, 1)
print("Shapes:", h.shape, y.shape)
Shapes: (16, 512) (16, 1)

This will create the following parameter variables.

nn.get_parameters()
OrderedDict([('fc1/affine/W',
              <Variable((2, 512), need_grad=True) at 0x7f572822fef8>),
             ('fc1/affine/b',
              <Variable((512,), need_grad=True) at 0x7f572822f9a8>),
             ('fc2/affine/W',
              <Variable((512, 1), need_grad=True) at 0x7f572822f778>),
             ('fc2/affine/b',
              <Variable((1,), need_grad=True) at 0x7f572822ff98>)])

As described above, you can execute the forward pass by calling the forward method on the terminal variable.

x.d = np.random.randn(*x.shape)  # Set random input
y.forward()
print(y.d)
[[-0.05708594]
 [ 0.01661986]
 [-0.34168088]
 [ 0.05822293]
 [-0.16566885]
 [-0.04867431]
 [ 0.2633169 ]
 [ 0.10496549]
 [-0.01291842]
 [-0.09726256]
 [-0.05720493]
 [-0.09691752]
 [-0.07822668]
 [-0.17180404]
 [ 0.11970415]
 [-0.08222144]]

Training a neural network needs a loss value to be minimized by gradient descent with backprop. In NNabla, a loss function is also just a function, and is packaged in the functions module.

# Variable for label
label = nn.Variable([batchsize, 1])
# Set loss
loss = F.reduce_mean(F.squared_error(y, label))

# Execute forward pass.
label.d = np.random.randn(*label.shape)  # Randomly generate labels
loss.forward()
print(loss.d)
1.9382084608078003

As you've seen above, NNabla's backward accumulates the gradients at the root variables. You have to initialize the grad of the parameter variables before backprop (we will show you the easiest way, using the Solver API).

# Collect all parameter variables and init grad.
for name, param in nn.get_parameters().items():
    param.grad.zero()
# Gradients are accumulated to grad of params.
loss.backward()
Imperative Mode

After performing backprop, the gradients are held in the grad fields of the parameter variables. The next block updates the parameters with vanilla gradient descent.

for name, param in nn.get_parameters().items():
    param.data -= param.grad * 0.001  # 0.001 as learning rate

The above computation is an example of NNabla's "imperative mode" for executing neural networks. Normally, NNabla functions (instances of nnabla.functions) take Variables as their inputs. When at least one NdArray is provided as an input to an NNabla function (instead of Variables), the function computation is fired immediately, and an NdArray is returned as the output, instead of a Variable. In the above example, the NNabla functions F.mul_scalar and F.sub2 are called by the overridden operators * and -=, respectively.

In other words, NNabla's "imperative mode" doesn't create a computation graph, and can be used like NumPy. If device acceleration such as CUDA is enabled, it can be used like NumPy empowered with device acceleration. Parametric functions can also be used with NdArray input(s). The following block demonstrates a simple imperative execution example.

# A simple example of imperative mode.
xi = nn.NdArray.from_numpy_array(np.arange(4).reshape(2, 2))
yi = F.relu(xi - 1)
print(xi.data)
print(yi.data)
[[0 1]
 [2 3]]
[[ 0.  0.]
 [ 1.  2.]]
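A minimal sketch of a parametric function applied to an NdArray input ('imperative_fc' is just an illustrative scope name):

xf = nn.NdArray.from_numpy_array(np.random.randn(2, 2).astype(np.float32))
yf = PF.affine(xf, 3, name='imperative_fc')  # Fired immediately; returns an NdArray.
print(yf.shape)  # (2, 3)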

Note that in-place substitution from the rhs to the lhs cannot be done by the = operator. For example, when x is an NdArray, writing x = x + 1 will not increment all values of x - instead, the expression on the rhs will create a new NdArray object that is different from the one originally bound by x, and binds the new NdArray object to the Python variable x on the lhs.

For in-place editing of NdArrays, the in-place assignment operators +=, -=, *=, and /= can be used. The copy_from method can also be used to copy values of an existing NdArray to another. For example, incrementing 1 to x, an NdArray, can be done by x.copy_from(x+1). The copy is performed with device acceleration if a device context is specified by using nnabla.set_default_context or nnabla.context_scope.

# The following doesn't perform substitution but assigns a new NdArray object to `xi`.
# xi = xi + 1

# The following copies the result of `xi + 1` to `xi`.
xi.copy_from(xi + 1)
assert np.all(xi.data == (np.arange(4).reshape(2, 2) + 1))

# Inplace operations like `+=`, `*=` can also be used (more efficient).
xi += 1
assert np.all(xi.data == (np.arange(4).reshape(2, 2) + 2))
Solver

NNabla provides stochastic gradient descent algorithms to optimize parameters, listed in the nnabla.solvers module. The parameter updates demonstrated above can be replaced with this Solver API, which is easier to use and usually faster.

from nnabla import solvers as S
solver = S.Sgd(lr=0.00001)
solver.set_parameters(nn.get_parameters())
# Set random data
x.d = np.random.randn(*x.shape)
label.d = np.random.randn(*label.shape)

# Forward
loss.forward()

Just call the following solver method to zero-fill the grad regions, then run backprop:

solver.zero_grad()
loss.backward()

The following block updates the parameters with the vanilla SGD rule (equivalent to the imperative example above).

solver.update()
Toy Problem To Demonstrate Training

The following function defines a regression problem which computes the norm of a vector.

def vector2length(x):
    # x : [B, 2] where B is number of samples.
    return np.sqrt(np.sum(x ** 2, axis=1, keepdims=True))

We visualize this mapping with a contour plot using matplotlib, as follows.

# Data for plotting contour on a grid data.
xs = np.linspace(-1, 1, 100)
ys = np.linspace(-1, 1, 100)
grid = np.meshgrid(xs, ys)
X = grid[0].flatten()
Y = grid[1].flatten()

def plot_true():
    """Plotting contour of true mapping from a grid data created above."""
    plt.contourf(xs, ys, vector2length(np.hstack([X[:, None], Y[:, None]])).reshape(100, 100))
    plt.axis('equal')
    plt.colorbar()

plot_true()
[Image: _images/python_api_98_0.png]

We define a deep prediction neural network.

def length_mlp(x):
    h = x
    for i, hnum in enumerate([4, 8, 4, 2]):
        h = F.tanh(PF.affine(h, hnum, name="fc{}".format(i)))
    y = PF.affine(h, 1, name='fc')
    return y
nn.clear_parameters()
batchsize = 100
x = nn.Variable([batchsize, 2])
y = length_mlp(x)
label = nn.Variable([batchsize, 1])
loss = F.reduce_mean(F.squared_error(y, label))

We created a 5-layer-deep MLP using a for-loop. Note that only 3 lines of the code can potentially create arbitrarily deep neural networks. The next block adds helper functions to visualize the learned function.

def predict(inp):
    ret = []
    for i in range(0, inp.shape[0], x.shape[0]):
        xx = inp[i:i + x.shape[0]]
        # Imperative execution
        xi = nn.NdArray.from_numpy_array(xx)
        yi = length_mlp(xi)
        ret.append(yi.data.copy())
    return np.vstack(ret)

def plot_prediction():
    plt.contourf(xs, ys, predict(np.hstack([X[:, None], Y[:, None]])).reshape(100, 100))
    plt.colorbar()
    plt.axis('equal')

Next we instantiate a solver object as follows. We use the Adam optimizer, which is one of the most popular SGD algorithms in the literature.

from nnabla import solvers as S
solver = S.Adam(alpha=0.01)
solver.set_parameters(nn.get_parameters())

The following function generates data from the true system indefinitely.

def random_data_provider(n):
    x = np.random.uniform(-1, 1, size=(n, 2))
    y = vector2length(x)
    return x, y

In the next block, we run 2000 training steps (SGD updates).

num_iter = 2000
for i in range(num_iter):
    # Sample data and set them to input variables of training.
    xx, ll = random_data_provider(batchsize)
    x.d = xx
    label.d = ll
    # Forward propagation given inputs.
    loss.forward(clear_no_need_grad=True)
    # Parameter gradients initialization and gradients computation by backprop.
    solver.zero_grad()
    loss.backward(clear_buffer=True)
    # Apply weight decay and update by Adam rule.
    solver.weight_decay(1e-6)
    solver.update()
    # Just print progress.
    if i % 100 == 0 or i == num_iter - 1:
        print("Loss@{:4d}: {}".format(i, loss.d))
Loss@   0: 0.6976373195648193
Loss@ 100: 0.08075223118066788
Loss@ 200: 0.005213144235312939
Loss@ 300: 0.001955194864422083
Loss@ 400: 0.0011660841992124915
Loss@ 500: 0.0006421314901672304
Loss@ 600: 0.0009330055327154696
Loss@ 700: 0.0008817618945613503
Loss@ 800: 0.0006205961108207703
Loss@ 900: 0.0009072928223758936
Loss@1000: 0.0008160348515957594
Loss@1100: 0.0011569359339773655
Loss@1200: 0.000837412488181144
Loss@1300: 0.0011542742140591145
Loss@1400: 0.0005833200993947685
Loss@1500: 0.0009848927147686481
Loss@1600: 0.0005141657311469316
Loss@1700: 0.0009339841199107468
Loss@1800: 0.000950580753851682
Loss@1900: 0.0005430278833955526
Loss@1999: 0.0007046313839964569

Memory usage optimization: You may notice that, in the above updates, .forward() is called with the clear_no_need_grad= option, and .backward() is called with the clear_buffer= option. Training a neural network in more realistic scenarios usually consumes a huge amount of memory due to the nature of the backpropagation algorithm, in which all of the forward variable buffer data has to be kept in order to compute the gradient of a function. In a naive implementation, we keep all the variable data and grad alive until the NdArray objects are no longer referenced (i.e. the graph is deleted). The clear_* options in .forward() and .backward() enable saving memory by clearing (erasing) the memory of data and grad when they are not referenced by any subsequent computation. (More precisely speaking, it doesn't actually free memory; we use our memory pool engine by default to avoid memory alloc/free overhead.) The unreferenced buffers can be re-used in subsequent computation. See the documentation of Variable for more details. Note that loss.forward(clear_buffer=True) clears the data of any intermediate variables. If you are interested in intermediate variables for some purpose (e.g. debugging, logging), you can use the .persistent flag to prevent clearing the buffer of a specific Variable, like below.

loss.forward(clear_buffer=True)
print("The prediction `y` is cleared because it's an intermediate variable.")
print(y.d.flatten()[:4])  # to save space show only 4 values
y.persistent = True
loss.forward(clear_buffer=True)
print("The prediction `y` is kept by the persistent flag.")
print(y.d.flatten()[:4])  # to save space show only 4 values
The prediction y is cleared because it's an intermediate variable.
[  2.27279830e-04   6.02164946e-05   5.33679675e-04   2.35557582e-05]
The prediction y is kept by the persistent flag.
[ 1.0851264   0.87657517  0.79603785  0.40098712]

We can confirm the prediction performs fairly well by looking at the following visualization of the ground truth and prediction function.

plt.subplot(121)
plt.title("Ground truth")
plot_true()
plt.subplot(122)
plt.title("Prediction")
plot_prediction()
_images/python_api_113_0.png

You can save learned parameters with nnabla.save_parameters and load them with nnabla.load_parameters.

path_param = "param-vector2length.h5"
nn.save_parameters(path_param)
# Remove all once
nn.clear_parameters()
nn.get_parameters()
2017-09-27 14:00:40,544 [nnabla][INFO]: Parameter save (.h5): param-vector2length.h5
OrderedDict()
# Load again
nn.load_parameters(path_param)
print('\n'.join(map(str, nn.get_parameters().items())))
2017-09-27 14:00:40,564 [nnabla][INFO]: Parameter load (<built-in function format>): param-vector2length.h5
('fc0/affine/W', <Variable((2, 4), need_grad=True) at 0x7f576328df48>)
('fc0/affine/b', <Variable((4,), need_grad=True) at 0x7f57245f2868>)
('fc1/affine/W', <Variable((4, 8), need_grad=True) at 0x7f576328def8>)
('fc1/affine/b', <Variable((8,), need_grad=True) at 0x7f5727ee5c78>)
('fc2/affine/W', <Variable((8, 4), need_grad=True) at 0x7f5763297318>)
('fc2/affine/b', <Variable((4,), need_grad=True) at 0x7f5727d29908>)
('fc3/affine/W', <Variable((4, 2), need_grad=True) at 0x7f57632973b8>)
('fc3/affine/b', <Variable((2,), need_grad=True) at 0x7f57632974a8>)
('fc/affine/W', <Variable((2, 1), need_grad=True) at 0x7f57632974f8>)
('fc/affine/b', <Variable((1,), need_grad=True) at 0x7f5763297598>)

Both save and load functions can also be used in a parameter scope.

with nn.parameter_scope('foo'):
    nn.load_parameters(path_param)
print('\n'.join(map(str, nn.get_parameters().items())))
2017-09-27 14:00:40,714 [nnabla][INFO]: Parameter load (<built-in function format>): param-vector2length.h5
('fc0/affine/W', <Variable((2, 4), need_grad=True) at 0x7f576328df48>)
('fc0/affine/b', <Variable((4,), need_grad=True) at 0x7f57245f2868>)
('fc1/affine/W', <Variable((4, 8), need_grad=True) at 0x7f576328def8>)
('fc1/affine/b', <Variable((8,), need_grad=True) at 0x7f5727ee5c78>)
('fc2/affine/W', <Variable((8, 4), need_grad=True) at 0x7f5763297318>)
('fc2/affine/b', <Variable((4,), need_grad=True) at 0x7f5727d29908>)
('fc3/affine/W', <Variable((4, 2), need_grad=True) at 0x7f57632973b8>)
('fc3/affine/b', <Variable((2,), need_grad=True) at 0x7f57632974a8>)
('fc/affine/W', <Variable((2, 1), need_grad=True) at 0x7f57632974f8>)
('fc/affine/b', <Variable((1,), need_grad=True) at 0x7f5763297598>)
('foo/fc0/affine/W', <Variable((2, 4), need_grad=True) at 0x7f5763297958>)
('foo/fc0/affine/b', <Variable((4,), need_grad=True) at 0x7f57632978b8>)
('foo/fc1/affine/W', <Variable((4, 8), need_grad=True) at 0x7f572a51ac78>)
('foo/fc1/affine/b', <Variable((8,), need_grad=True) at 0x7f5763297c78>)
('foo/fc2/affine/W', <Variable((8, 4), need_grad=True) at 0x7f5763297a98>)
('foo/fc2/affine/b', <Variable((4,), need_grad=True) at 0x7f5763297d68>)
('foo/fc3/affine/W', <Variable((4, 2), need_grad=True) at 0x7f5763297e08>)
('foo/fc3/affine/b', <Variable((2,), need_grad=True) at 0x7f5763297ea8>)
('foo/fc/affine/W', <Variable((2, 1), need_grad=True) at 0x7f5763297f48>)
('foo/fc/affine/b', <Variable((1,), need_grad=True) at 0x7f5763297cc8>)
!rm {path_param}  # Clean up

NNabla Models Finetuning Tutorial

Here we demonstrate how to perform finetuning using nnabla’s pre-trained models.

Load the model

Loading the model is very simple; all you need is two lines.

from nnabla.models.imagenet import ResNet18
model = ResNet18()

You can choose other ResNet variants, such as ResNet34 or ResNet50, in the same way. Of course, you can choose other pretrained models as well. See the Docs.
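For example, to use a deeper variant (the class comes from the same module as above):

from nnabla.models.imagenet import ResNet34
model = ResNet34()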

NOTE: If you use the ResNet18 for the first time, nnabla will automatically download the weights from https://nnabla.org and it may take up to a few minutes.

Dataset

In this tutorial, we use Caltech101 as the dataset for finetuning. Caltech101 consists of more than 9,000 object images in total, and each image belongs to one of 101 distinct categories or to an additional “clutter” category. We use the images from the 101 categories for simple classification.

We have a script named caltech101_data.py which can automatically download the dataset and store it in nnabla_data.

If you have your own dataset and DataIterator which can load your data, you can use it instead.

run caltech101_data.py
batch_size = 32  # we set batch_size = 32
all_data = data_iterator_caltech101(batch_size)

Since Caltech101 has no separate sets for training and validation, we need to split it up manually. Here, we split the dataset in the following way: 80% for training and 20% for validation.

num_samples = all_data.size
num_train_samples = int(0.8 * num_samples)  # Take 80% for training, and the rest for validation.
num_class = 101
data_iterator_train = all_data.slice(
        rng=None, slice_start=0, slice_end=num_train_samples)
data_iterator_valid = all_data.slice(
        rng=None, slice_start=num_train_samples, slice_end=num_samples)

Now we have model and data!

Optional: Check the image in the dataset

Let’s take a look at what kind of images are included in the dataset. You can get images with the DataIterator’s method next.

import matplotlib.pyplot as plt
%matplotlib inline
images, labels = data_iterator_train.next()
sample_image, sample_label = images[0], labels[0]
plt.imshow(sample_image.transpose(1,2,0))
plt.show()
print("image_shape: {}".format(sample_image.shape))
print("label_id: {}".format(sample_label))
_images/model_finetuning_11_0.png
image_shape: (3, 128, 128)
label_id: [94]
Preparing Graph Construction

Let’s start with importing basic modules.

import nnabla as nn

# Optional: If you want to use GPU
from nnabla.ext_utils import get_extension_context
ctx = get_extension_context("cudnn")
nn.set_default_context(ctx)
ext = nn.ext_utils.import_extension_module("cudnn")
Create input Variables for the Network

Now we are going to create the input variables.

channels, image_height, image_width = sample_image.shape  # use info from the image we got

# input variables for the validation network
image_valid = nn.Variable((batch_size, channels, image_height, image_width))
label_valid = nn.Variable((batch_size, 1))
input_image_valid = {"image": image_valid, "label": label_valid}

# input variables for the training network
image_train = nn.Variable((batch_size, channels, image_height, image_width))
label_train = nn.Variable((batch_size, 1))
input_image_train = {"image": image_train, "label": label_train}
Create the training graph using the pretrained model

If you take a look at the Model’s API Reference, you will find the use_up_to option. By specifying one of the pre-defined strings when calling the model, the computation graph is constructed up to the layer you specify. For example, in the case of ResNet18, you can choose one of the following as the last layer of the graph.

  • ‘classifier’ (default): The output of the final affine layer for classification.

  • ‘pool’: The output of the final global average pooling.

  • ‘lastconv’: The input of the final global average pooling, without ReLU activation.

  • ‘lastconv+relu’: Network up to ‘lastconv’ followed by ReLU activation.

For finetuning, it is common to replace only the upper layers with new (untrained) ones and to re-use the lower layers with their pretrained weights. Also, the pretrained models were trained on an ImageNet classification task with 1000 categories, so the classifier layer has the output shape (batch_size, 1000), which does not fit our current dataset. For this reason, we construct the graph up to the pool layer, which corresponds to the global average pooling layer in the original graph, and connect it to an additional affine (fully-connected) layer for 101-way classification. It is also common in finetuning to train only the weights of the newly added layers (in this case, the last affine layer; a sketch of that alternative follows the solver setup below), but in this tutorial we will update the weights of all layers in the graph. Also, when creating a training graph, you need to set training=True.

import nnabla.parametric_functions as PF

y_train = model(image_train, force_global_pooling=True, use_up_to="pool", training=True)
with nn.parameter_scope("finetuning_fc"):
    pred_train = PF.affine(y_train, 101)  # adding the affine layer to the graph.

NOTE: You need to specify force_global_pooling=True when the input shape is different from what the model expects. You can check the model’s default input shape by typing model.input_shape.
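For instance:

print(model.input_shape)  # e.g. (3, 224, 224) for the ImageNet models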

Create the validation graph using the model

Creating the validation graph is almost the same. You simply need to change the training flag to False.

y_valid = model(image_valid,
                force_global_pooling=True, use_up_to="pool", training=False)
with nn.parameter_scope("finetuning_fc"):
    pred_valid = PF.affine(y_valid, 101)
pred_valid.persistent = True  # to keep the values when forward(clear_buffer=True) is called.
Define the functions for computing Loss and Categorical Error
import nnabla.functions as F


def loss_function(pred, label):
    """
        Compute loss.
    """
    loss = F.mean(F.softmax_cross_entropy(pred, label))
    return loss

loss_valid = loss_function(pred_valid, label_valid)
top_1_error_valid = F.mean(F.top_n_error(pred_valid, label_valid))
loss_train = loss_function(pred_train, label_train)
top_1_error_train = F.mean(F.top_n_error(pred_train, label_train))
Prepare the solver
import nnabla.solvers as S

solver = S.Momentum(0.01)  # you can choose others as well

solver.set_parameters(nn.get_parameters())
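As mentioned earlier, a common alternative is to update only the newly added affine layer. A minimal sketch of that variant (using the scope defined above) registers only its parameters with the solver:

# Sketch: finetune only the parameters created under the "finetuning_fc" scope.
with nn.parameter_scope("finetuning_fc"):
    params_to_finetune = nn.get_parameters()
solver.set_parameters(params_to_finetune)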
Some setting for iteration
num_epoch = 10  # arbitrary
one_epoch = data_iterator_train.size // batch_size
max_iter = num_epoch * one_epoch
val_iter = data_iterator_valid.size // batch_size
Performance before finetuning

Let’s see how well the model works before finetuning. Note that all the weights are pretrained on ImageNet, except for the last affine layer. First, prepare a function that shows us the model’s performance:

import numpy as np  # run_validation below uses np.random for visualization

def run_validation(pred_valid, loss_valid, top_1_error_valid,
                   input_image_valid, data_iterator_valid,
                   with_visualized=False, num_visualized=3):
    assert num_visualized < pred_valid.shape[0], "too many images to plot."
    val_iter = data_iterator_valid.size // pred_valid.shape[0]
    ve = 0.
    vloss = 0.
    for j in range(val_iter):
        v_image, v_label = data_iterator_valid.next()
        input_image_valid["image"].d = v_image
        input_image_valid["label"].d = v_label
        nn.forward_all([loss_valid, top_1_error_valid], clear_no_need_grad=True)
        vloss += loss_valid.d.copy()
        ve += top_1_error_valid.d.copy()

    vloss /= val_iter
    ve /= val_iter

    if with_visualized:
        ind = 1
        random_start = np.random.randint(pred_valid.shape[0] - num_visualized)
        fig = plt.figure(figsize=(12., 12.))
        for n in range(random_start, random_start + num_visualized):
            sample_image, sample_label = v_image[n], v_label[n]
            ax = fig.add_subplot(1, num_visualized, ind)
            ax.imshow(sample_image.transpose(1,2,0))
            with nn.auto_forward():
                predicted_id = np.argmax(F.softmax(pred_valid)[n].d)
            result = "true label_id: {} - predicted as {}".format(str(sample_label[0]), str(predicted_id))
            ax.set_title(result)
            ind += 1
        fig.show()

    return ve, vloss
_, _ = run_validation(pred_valid, loss_valid, top_1_error_valid, input_image_valid, data_iterator_valid, with_visualized=True)
_images/model_finetuning_29_1.png

As you can see, the model fails to classify images properly. Now, let’s begin the finetuning and see how performance improves.

Start Finetuning

Let’s prepare the monitor for training.

from nnabla.monitor import Monitor, MonitorSeries, MonitorTimeElapsed
monitor = Monitor("tmp.monitor")
monitor_loss = MonitorSeries("Training loss", monitor, interval=200)
monitor_err = MonitorSeries("Training error", monitor, interval=200)
monitor_vloss = MonitorSeries("Test loss", monitor, interval=200)
monitor_verr = MonitorSeries("Test error", monitor, interval=200)
# Training-loop
for i in range(max_iter):
    image, label = data_iterator_train.next()
    input_image_train["image"].d = image
    input_image_train["label"].d = label
    nn.forward_all([loss_train, top_1_error_train], clear_no_need_grad=True)

    monitor_loss.add(i, loss_train.d.copy())
    monitor_err.add(i, top_1_error_train.d.copy())

    solver.zero_grad()
    loss_train.backward(clear_buffer=True)

    # update parameters
    solver.weight_decay(3e-4)
    solver.update()

    if i % 200 == 0:
        ve, vloss = run_validation(pred_valid, loss_valid, top_1_error_valid,
                                   input_image_valid, data_iterator_valid,
                                   with_visualized=False, num_visualized=3)

        monitor_vloss.add(i, vloss)
        monitor_verr.add(i, ve)
2019-07-05 14:26:26,885 [nnabla][INFO]: iter=199 {Training loss}=1.5021580457687378
2019-07-05 14:26:26,887 [nnabla][INFO]: iter=199 {Training error}=0.3345312476158142
2019-07-05 14:26:28,756 [nnabla][INFO]: iter=200 {Test loss}=2.975713219355654
2019-07-05 14:26:28,756 [nnabla][INFO]: iter=200 {Test error}=0.5384837962962963
2019-07-05 14:26:50,249 [nnabla][INFO]: iter=399 {Training loss}=0.22022955119609833
2019-07-05 14:26:50,250 [nnabla][INFO]: iter=399 {Training error}=0.053437501192092896
2019-07-05 14:26:52,256 [nnabla][INFO]: iter=400 {Test loss}=0.12045302835327608
2019-07-05 14:26:52,257 [nnabla][INFO]: iter=400 {Test error}=0.029513888888888888
2019-07-05 14:27:14,151 [nnabla][INFO]: iter=599 {Training loss}=0.0659928247332573
2019-07-05 14:27:14,152 [nnabla][INFO]: iter=599 {Training error}=0.012500000186264515
2019-07-05 14:27:16,175 [nnabla][INFO]: iter=600 {Test loss}=0.08744175952893717
2019-07-05 14:27:16,175 [nnabla][INFO]: iter=600 {Test error}=0.02199074074074074
2019-07-05 14:27:38,097 [nnabla][INFO]: iter=799 {Training loss}=0.03324155509471893
2019-07-05 14:27:38,098 [nnabla][INFO]: iter=799 {Training error}=0.0054687499068677425
2019-07-05 14:27:40,120 [nnabla][INFO]: iter=800 {Test loss}=0.07678695395588875
2019-07-05 14:27:40,121 [nnabla][INFO]: iter=800 {Test error}=0.02025462962962963
2019-07-05 14:28:02,041 [nnabla][INFO]: iter=999 {Training loss}=0.019672293215990067
2019-07-05 14:28:02,042 [nnabla][INFO]: iter=999 {Training error}=0.0017187499906867743
2019-07-05 14:28:04,064 [nnabla][INFO]: iter=1000 {Test loss}=0.06333287184437116
2019-07-05 14:28:04,065 [nnabla][INFO]: iter=1000 {Test error}=0.017361111111111112
2019-07-05 14:28:25,984 [nnabla][INFO]: iter=1199 {Training loss}=0.009992362931370735
2019-07-05 14:28:25,985 [nnabla][INFO]: iter=1199 {Training error}=0.0003124999930150807
2019-07-05 14:28:28,008 [nnabla][INFO]: iter=1200 {Test loss}=0.06950318495984431
2019-07-05 14:28:28,008 [nnabla][INFO]: iter=1200 {Test error}=0.015625
2019-07-05 14:28:49,954 [nnabla][INFO]: iter=1399 {Training loss}=0.007941835559904575
2019-07-05 14:28:49,955 [nnabla][INFO]: iter=1399 {Training error}=0.0003124999930150807
2019-07-05 14:28:51,978 [nnabla][INFO]: iter=1400 {Test loss}=0.06711215277512868
2019-07-05 14:28:51,979 [nnabla][INFO]: iter=1400 {Test error}=0.016203703703703703
2019-07-05 14:29:13,898 [nnabla][INFO]: iter=1599 {Training loss}=0.008225565776228905
2019-07-05 14:29:13,899 [nnabla][INFO]: iter=1599 {Training error}=0.0007812500116415322
2019-07-05 14:29:15,923 [nnabla][INFO]: iter=1600 {Test loss}=0.06447940292181792
2019-07-05 14:29:15,923 [nnabla][INFO]: iter=1600 {Test error}=0.016203703703703703
2019-07-05 14:29:37,850 [nnabla][INFO]: iter=1799 {Training loss}=0.005678100511431694
2019-07-05 14:29:37,850 [nnabla][INFO]: iter=1799 {Training error}=0.0
2019-07-05 14:29:39,873 [nnabla][INFO]: iter=1800 {Test loss}=0.06282947226255028
2019-07-05 14:29:39,873 [nnabla][INFO]: iter=1800 {Test error}=0.01678240740740741
2019-07-05 14:30:01,795 [nnabla][INFO]: iter=1999 {Training loss}=0.006834140978753567
2019-07-05 14:30:01,796 [nnabla][INFO]: iter=1999 {Training error}=0.00046874998952262104
2019-07-05 14:30:03,818 [nnabla][INFO]: iter=2000 {Test loss}=0.05948294078310331
2019-07-05 14:30:03,818 [nnabla][INFO]: iter=2000 {Test error}=0.014467592592592593

As you can see, the loss and error rate are decreasing as the finetuning progresses. Let’s see the classification results after finetuning.

_, _ = run_validation(pred_valid, loss_valid, top_1_error_valid, input_image_valid, data_iterator_valid, with_visualized=True)
_images/model_finetuning_36_0.png

You can see that the model is now able to classify the images properly.

Finetuning more

We have a convenient script named finetuning.py. By using it, you can try finetuning with different models, even on your own dataset.

To do this, you need to prepare your own dataset and do some preprocessing. We will explain how to do this in the following.

Prepare your dataset

Suppose you have a lot of images which can be used for image classification. You need to organize your data in a certain manner. Here, we explain it with another dataset, the Stanford Dogs Dataset. First, visit the official page and download images.tar (here is the direct link). Next, untar the archive and you will see a directory named Images. Inside that directory there are many subdirectories, and each subdirectory stores the images belonging to one category. For example, the directory n02099712-Labrador_retriever contains Labrador retriever images only. So if you want to use your own dataset, you need to organize your images and directories in the same way, like the following;

parent_directory
├── subdirectory_for_category_A
│   ├── image_0.jpg
│   ├── image_1.jpg
│   ├── image_2.jpg
│   ├── ...
│
├── subdirectory_for_category_B
│   ├── image_0.jpg
│   ├── ...
│
├── subdirectory_for_category_C
│   ├── image_0.jpg
│   ├── ...
│
├── subdirectory_for_category_D
│   ├── image_0.jpg
│   ├── ...
│
 ...

The number of images in each category can vary; they do not have to be exactly the same. Once you have arranged your dataset, you’re good to go!

Create image classification dataset using NNabla CLI

Now that you have prepared and organized your dataset, the only thing left to do is to create a .csv file which will be used in finetuning.py. To do so, you can use NNabla’s Python Command Line Interface. Just type the following.

nnabla_cli create_image_classification_dataset -i <path to parent directory> -o <output directory which contains "preprocessed" images> -c <number of channels> -w <width> -g <height> -m <padding or trimming> -s <whether apply shuffle or not> -f1 <name of the output csv file for training data> -f2 <name of the output csv file for test data> -r2 <ratio(%) of test data to training data>

If you do that on the Stanford Dogs Dataset,

nnabla_cli create_image_classification_dataset -i Images -o arranged_images -c 3 -w 128 -g 128 -m padding -s true -f1 stanford_dog_train.csv -f2 stanford_dog_test.csv -r2 20

Note that the output .csv files will be stored in the same directory you specified with the -o option. For more information, please check the docs.

After executing the command above, you can start finetuning on your dataset.

Run finetuning

All you need to do is type one line.

python finetuning.py --model <model name> --train-csv <.csv file containing training data>  --test-csv <.csv file containing test data>

It will execute finetuning on your dataset!

run finetuning.py --model ResNet34 --epoch 10 --train-csv ~/nnabla_data/stanford_dog_arranged/stanford_dog_train.csv --test-csv ~/nnabla_data/stanford_dog_arranged/stanford_dog_test.csv --shuffle True
An example of how to use finetuning’s result for inference

Once the finetuning is finished, let’s use the result for inference! The script above saved the parameters at every interval you specified. So now call the same model you trained, and this time use the finetuned parameters, in the following way.

from nnabla.models.imagenet import ResNet34
import nnabla as nn

param_path = "params_XXX.h5"  # specify the path to the saved parameter (.h5)

model = ResNet34()
batch_size = 1  # just for inference
input_shape = (batch_size, ) + model.input_shape

Then define an input Variable and a network for inference. Note that you need to construct the network exactly the same way as in the finetuning script (layer configuration, parameter names, and so on…).

x = nn.Variable(input_shape)  # input Variable
pooled = model(x, use_up_to="pool", training=False)
with nn.parameter_scope("finetuning"):
    with nn.parameter_scope("last_fc"):
        pred = PF.affine(pooled, 120)

Load the parameters which you finetuned above. You can use nn.load_parameters() to load them. Once you call this, the parameters stored in the .h5 file will be loaded into the global parameter scope. You can check that the parameters are different before and after nn.load_parameters() by using nn.get_parameters().

nn.load_parameters(param_path)  # load the finetuned parameters.
pred.forward()
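If you want to verify the load explicitly, here is a minimal sketch (run it in place of the plain load above; the parameter name follows the scopes defined earlier, and numpy is assumed to be imported):

import numpy as np

w = nn.get_parameters()["finetuning/last_fc/affine/W"]
w_before = w.d.copy()           # randomly initialized values
nn.load_parameters(param_path)  # overwrite with the finetuned values
print("parameters changed:", not np.allclose(w_before, w.d))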

Debugging

Deep neural networks are getting deeper and deeper every year, requiring more components in the networks. Such complexity often misleads us into mis-configuring a network, which can turn out to be critical. Even if we configure a neural network correctly as desired, we may still want to find its performance bottleneck, e.g., from which layer(s) the computational bottleneck comes.

In this debugging tutorial, we introduce the following ways to deal with such cases:

  1. visit method of a variable

  2. pretty-print

  3. simple graph viewer

  4. profiling utils

  5. value tracer

We will go over each technique, but first prepare the following reference model.

# If you run this notebook on Google Colab, uncomment and run the following to set up dependencies.
# !pip install nnabla-ext-cuda100
# !git clone https://github.com/sony/nnabla.git
# %cd nnabla/tutorial
# Python2/3 compatibility
from __future__ import print_function
from __future__ import absolute_import
from __future__ import division
import numpy as np
import nnabla as nn
import nnabla.logger as logger
import nnabla.functions as F
import nnabla.parametric_functions as PF
import nnabla.solvers as S

def block(x, maps, test=False, name="block"):
    h = x
    with nn.parameter_scope(name):
        with nn.parameter_scope("in-block-1"):
            h = PF.convolution(h, maps, kernel=(3, 3), pad=(1, 1), with_bias=False)
            h = PF.batch_normalization(h, batch_stat=not test)
            h = F.relu(h)
        with nn.parameter_scope("in-block-2"):
            h = PF.convolution(h, maps // 2, kernel=(3, 3), pad=(1, 1), with_bias=False)
            h = PF.batch_normalization(h, batch_stat=not test)
            h = F.relu(h)
        with nn.parameter_scope("in-block-3"):
            h = PF.convolution(h, maps, kernel=(3, 3), pad=(1, 1), with_bias=False)
            h = PF.batch_normalization(h, batch_stat=not test)

        if h.shape[1] != x.shape[1]:
            with nn.parameter_scope("skip"):
                s = PF.convolution(x, maps, kernel=(3, 3), pad=(1, 1), with_bias=False)
                s = PF.batch_normalization(s, batch_stat=not test)

    return F.relu(h + s)

def network(x, maps=16, test=False):
    h = x
    h = PF.convolution(h, maps, kernel=(3, 3), pad=(1, 1), name="first-conv", with_bias=False)
    h = PF.batch_normalization(h, batch_stat=not test, name="first-bn")
    h = F.relu(h)
    for l in range(4):
        h = block(h, maps * 2 ** (l + 1), name="block-{}".format(l))
        h = F.max_pooling(h, (2, 2))
    h = F.average_pooling(h, h.shape[2:])
    pred = PF.affine(h, 100, name="pred")
    return pred
Visit Method

The visit method of a variable takes a lambda, function, or callable object as an argument and calls it on all NNabla functions that the variable can traverse, in forward order. Its usage is easier to see than to explain.

First of all, define the callable class.

class PrintFunc(object):
    def __call__(self, nnabla_func):
        print("==========")
        print(nnabla_func.info.type_name)
        print(nnabla_func.inputs)
        print(nnabla_func.outputs)
        print(nnabla_func.info.args)

This callable object takes an NNabla function, e.g., convolution, relu, etc., so the user can get information about that function.

nn.clear_parameters()  # clear the parameters, just in case you run the following code again

x = nn.Variable.from_numpy_array(np.random.randn(*[4, 3, 128, 128]))
pred = network(x)
pred.visit(PrintFunc())

This is the low-level API for inspecting the graph information by hand, in whatever way you want.
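Since visit accepts any callable, a quick one-liner with a lambda also works, e.g. to print only the function names:

pred.visit(lambda f: print(f.info.type_name))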

PPrint

The pprint method is one instantiation of the visit method. It lets us see the graph structure in topological (forward) order, in detail. Here is an example of how to see detailed information about a graph.

nn.clear_parameters()  # call this in case you want to run the following code again

x = nn.Variable.from_numpy_array(np.random.randn(*[4, 3, 128, 128]))
pred = network(x)

# pprint
from nnabla.utils.inspection import pprint
pprint(pred, summary=True, forward=True, backward=True)
Simple Graph Viewer

The visit method is very useful for getting information about each function used in a graph, but it is hard to see the details of the whole network structure, e.g., which variable is connected to which variable. So we have a graph viewer that visually shows the whole structure of the network, enabling us to debug more efficiently. Using the graph viewer is straightforward, as shown in the following code:

nn.clear_parameters()  # call this in case you want to run the following code again

x = nn.Variable([4, 3, 128, 128])
pred = network(x)
import nnabla.experimental.viewers as V

graph = V.SimpleGraph(verbose=False)
graph.view(pred)

If you would like to see more detailed information, as in the visit method case, change the verbose option to True.

graph = V.SimpleGraph(verbose=True)
graph.view(pred)

Now one can see detailed information!

Note that this viewer is mainly for NNabla users who want to write code in Python; for those who would like to see a more polished network visualization and play with it, please use Neural Network Console and visit https://dl.sony.com/.

Profiling Utils

Basically, this feature is for developers who want to know the overall speed statistics and which functions could be bottlenecks. NNabla provides a simple profiling tool. Once a network is prepared, it is better to also have the other components needed to train the network, such as a loss function and a solver.

To create the profiler and see the results, run the following code.

nn.clear_parameters()  # call this in case you want to run the following code again

# Context
from nnabla.ext_utils import get_extension_context
device = "cudnn"
ctx = get_extension_context(device)
nn.set_default_context(ctx)

# Network
x = nn.Variable.from_numpy_array(np.random.randn(*[4, 3, 128, 128]))
t = nn.Variable([4, 1])
pred = network(x)
loss = F.mean(F.softmax_cross_entropy(pred, t))

# Solver
solver = S.Momentum()
solver.set_parameters(nn.get_parameters())

# Profiler
from nnabla.utils.profiler import GraphProfiler
B = GraphProfiler(loss, solver=solver, device_id=0, ext_name=device, n_run=100)
B.run()
print("Profile finished.")

# Report
from nnabla.utils.profiler import GraphProfilerCsvWriter
with open("./profile.csv", "w") as f:
    writer = GraphProfilerCsvWriter(B, file=f)
    writer.write()
print("Report is prepared.")

You can also find TimeProfiler for profiling; it is more fine-grained in measuring execution time.

With TimeProfiler, you can pass callback functions to the forward and/or backward methods in the training loop.
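As a rough illustration of the idea, here is a hand-rolled timing sketch built on the generic function_pre_hook/function_post_hook arguments (which also appear in the value tracer example below), rather than on the TimeProfiler API itself:

import time

# Accumulate wall-clock time per function type. Note: on GPU, execution is
# asynchronous, so hooks like these only give a rough picture.
durations = {}
start_times = {}

def pre_hook(f):
    start_times[f] = time.time()

def post_hook(f):
    elapsed = time.time() - start_times.pop(f)
    durations[f.info.type_name] = durations.get(f.info.type_name, 0.0) + elapsed

loss.forward(function_pre_hook=pre_hook, function_post_hook=post_hook)
for name, sec in sorted(durations.items(), key=lambda kv: -kv[1]):
    print("{:30s} {:.6f} s".format(name, sec))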

Value Tracer

We sometimes want to check whether NaN/Inf values appear. NanInfTracer is a convenient way to check whether any layer in a graph produces a NaN/Inf value.

# Create graph again just in case
nn.clear_parameters()  # call this in case you want to run the following code again

# Try to switch these two
x = nn.Variable.from_numpy_array(np.random.randn(*[4, 3, 64, 64]))
#x = nn.Variable([4, 3, 64, 64])
pred = network(x)

# NanInfTracer
from nnabla.utils.inspection import NanInfTracer
nit = NanInfTracer(trace_inf=True, trace_nan=True, need_details=True)

with nit.trace():
    # Try to comment either of these two or both
    pred.forward(function_post_hook=nit.forward_post_hook)
    pred.backward(function_post_hook=nit.backward_post_hook)

print(nit.check())

Static vs Dynamic Neural Networks in NNabla

NNabla allows you to define static and dynamic neural networks. Static neural networks have a fixed layer architecture, i.e., a static computation graph. In contrast, dynamic neural networks use a dynamic computation graph, e.g., randomly dropping layers for each minibatch.

This tutorial compares both computation graphs.

%matplotlib inline
import nnabla as nn
import nnabla.functions as F
import nnabla.parametric_functions as PF
import nnabla.solvers as S

import numpy as np
np.random.seed(0)

GPU = 0  # ID of GPU that we will use
2017-06-26 23:10:05,832 [nnabla][INFO]: Initializing CPU extension...
Dataset loading

We will first set up the digits dataset from scikit-learn:

from tiny_digits import *

digits = load_digits()
data = data_iterator_tiny_digits(digits, batch_size=16, shuffle=True)
2017-06-26 23:10:06,042 [nnabla][INFO]: DataSource with shuffle(True)
2017-06-26 23:10:06,043 [nnabla][INFO]: Using DataSourceWithMemoryCache
2017-06-26 23:10:06,044 [nnabla][INFO]: DataSource with shuffle(True)
2017-06-26 23:10:06,044 [nnabla][INFO]: On-memory
2017-06-26 23:10:06,045 [nnabla][INFO]: Using DataIterator

Each sample in this dataset is a grayscale image of size 8x8 and belongs to one of the ten classes 0, 1, …, 9.

img, label = data.next()
print(img.shape, label.shape)
(16, 1, 8, 8) (16, 1)
Network definition

As an example, we define an (unnecessarily) deep CNN:

def cnn(x):
    """Unnecessarily Deep CNN.

    Args:
        x : Variable, shape (B, 1, 8, 8)

    Returns:
        y : Variable, shape (B, 10)
    """
    with nn.parameter_scope("cnn"):  # Parameter scope can be nested
        with nn.parameter_scope("conv1"):
            h = F.tanh(PF.batch_normalization(
                PF.convolution(x, 64, (3, 3), pad=(1, 1))))
        for i in range(10):  # unnecessarily deep
            with nn.parameter_scope("conv{}".format(i + 2)):
                h = F.tanh(PF.batch_normalization(
                    PF.convolution(h, 128, (3, 3), pad=(1, 1))))
        with nn.parameter_scope("conv_last"):
            h = F.tanh(PF.batch_normalization(
                PF.convolution(h, 512, (3, 3), pad=(1, 1))))
            h = F.average_pooling(h, (2, 2))
        with nn.parameter_scope("fc"):
            h = F.tanh(PF.affine(h, 1024))
        with nn.parameter_scope("classifier"):
            y = PF.affine(h, 10)
    return y
Static computation graph

First, we will look at the case of a static computation graph where the neural network does not change during training.

from nnabla.ext_utils import get_extension_context

# setup cuda extension
ctx_cuda = get_extension_context('cudnn', device_id=GPU)  # replace 'cudnn' by 'cpu' if you want to run the example on the CPU
nn.set_default_context(ctx_cuda)

# create variables for network input and label
x = nn.Variable(img.shape)
t = nn.Variable(label.shape)

# create network
static_y = cnn(x)
static_y.persistent = True

# define loss function for training
static_l = F.mean(F.softmax_cross_entropy(static_y, t))
2017-06-26 23:10:06,350 [nnabla][INFO]: Initializing CUDA extension...
2017-06-26 23:10:06,571 [nnabla][INFO]: Initializing cuDNN extension...

Setup solver for training

solver = S.Adam(alpha=1e-3)
solver.set_parameters(nn.get_parameters())

Create data iterator

loss = []
def epoch_end_callback(epoch):
    global loss
    print("[{} {} {}]".format(epoch, np.mean(loss), itr))
    loss = []

data = data_iterator_tiny_digits(digits, batch_size=16, shuffle=True)
data.register_epoch_end_callback(epoch_end_callback)
2017-06-26 23:10:07,221 [nnabla][INFO]: DataSource with shuffle(True)
2017-06-26 23:10:07,224 [nnabla][INFO]: Using DataSourceWithMemoryCache
2017-06-26 23:10:07,226 [nnabla][INFO]: DataSource with shuffle(True)
2017-06-26 23:10:07,228 [nnabla][INFO]: On-memory
2017-06-26 23:10:07,230 [nnabla][INFO]: Using DataIterator

Perform training iterations and output training loss:

%%time
for epoch in range(30):
    itr = 0
    while data.epoch == epoch:
        x.d, t.d = data.next()
        static_l.forward(clear_no_need_grad=True)
        solver.zero_grad()
        static_l.backward(clear_buffer=True)
        solver.update()
        loss.append(static_l.d.copy())
        itr += 1
print()
[ 0 0.909297 112 ] [ 1 0.183863 111 ] [ 2 0.0723054 111 ] [ 3 0.0653021 112 ] [ 4 0.0628503 111 ] [ 5 0.0731626 111 ] [ 6 0.0319093 112 ] [ 7 0.0610926 111 ] [ 8 0.0817437 111 ] [ 9 0.0717577 112 ] [ 10 0.0241882 111 ] [ 11 0.0119452 111 ] [ 12 0.00664761 112 ] [ 13 0.00377711 111 ] [ 14 0.000605656 111 ] [ 15 0.000236613 111 ] [ 16 0.000174549 112 ] [ 17 0.000142428 111 ] [ 18 0.000126015 111 ] [ 19 0.000111144 112 ] [ 20 0.000100751 111 ] [ 21 9.03808e-05 111 ] [ 22 8.35904e-05 112 ] [ 23 7.73492e-05 111 ] [ 24 6.91389e-05 111 ] [ 25 6.74929e-05 112 ] [ 26 6.08386e-05 111 ] [ 27 5.62182e-05 111 ] [ 28 5.33428e-05 112 ] [ 29 4.94594e-05 111 ]
CPU times: user 14.3 s, sys: 6.78 s, total: 21.1 s
Wall time: 21.1 s
Dynamic computation graph

Now, we will use a dynamic computation graph, where the neural network is set up each time we want to do a forward/backward pass through it. This allows us to, e.g., randomly drop out layers or to have network architectures that depend on the input data. In this example, for simplicity we will use the same neural network structure and only create it dynamically. For example, adding an if np.random.rand() > dropout_probability: check into cnn() lets us drop out layers, as sketched below.
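For instance, a minimal sketch of such a stochastically skipped variant of cnn() (dropout_probability is a hypothetical name; all inner layers keep the same channel count so that parameter shapes stay consistent across minibatches, whichever layers happen to be skipped):

def stochastic_cnn(x, dropout_probability=0.5):
    with nn.parameter_scope("stochastic_cnn"):
        with nn.parameter_scope("conv1"):
            h = F.tanh(PF.batch_normalization(
                PF.convolution(x, 64, (3, 3), pad=(1, 1))))
        for i in range(10):
            if np.random.rand() > dropout_probability:  # keep this layer?
                with nn.parameter_scope("conv{}".format(i + 2)):
                    h = F.tanh(PF.batch_normalization(
                        PF.convolution(h, 64, (3, 3), pad=(1, 1))))
        with nn.parameter_scope("classifier"):
            y = PF.affine(h, 10)
    return y

Because a layer may appear for the first time in a later minibatch, re-registering the parameters each iteration with solver.set_parameters(nn.get_parameters(), reset=False, retain_state=True), as done in the training loop below, picks up newly created parameters automatically.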

First, we setup the solver and the data iterator for the training:

nn.clear_parameters()
solver = S.Adam(alpha=1e-3)
solver.set_parameters(nn.get_parameters())

loss = []
def epoch_end_callback(epoch):
    global loss
    print("[{} {} {}]".format(epoch, np.mean(loss), itr))
    loss = []
data = data_iterator_tiny_digits(digits, batch_size=16, shuffle=True)
data.register_epoch_end_callback(epoch_end_callback)
2017-06-26 23:10:28,449 [nnabla][INFO]: DataSource with shuffle(True)
2017-06-26 23:10:28,450 [nnabla][INFO]: Using DataSourceWithMemoryCache
2017-06-26 23:10:28,450 [nnabla][INFO]: DataSource with shuffle(True)
2017-06-26 23:10:28,451 [nnabla][INFO]: On-memory
2017-06-26 23:10:28,451 [nnabla][INFO]: Using DataIterator
%%time
for epoch in range(30):
    itr = 0
    while data.epoch == epoch:
        x.d, t.d = data.next()
        with nn.auto_forward():
            dynamic_y = cnn(x)
            dynamic_l = F.mean(F.softmax_cross_entropy(dynamic_y, t))
        solver.set_parameters(nn.get_parameters(), reset=False, retain_state=True) # this can be done dynamically
        solver.zero_grad()
        dynamic_l.backward(clear_buffer=True)
        solver.update()
        loss.append(dynamic_l.d.copy())
        itr += 1
print()
[ 0 1.04669 112 ] [ 1 0.151949 111 ] [ 2 0.093581 111 ] [ 3 0.129242 112 ] [ 4 0.0452591 111 ] [ 5 0.0343987 111 ] [ 6 0.0315372 112 ] [ 7 0.0336886 111 ] [ 8 0.0194571 111 ] [ 9 0.00923094 112 ] [ 10 0.00536065 111 ] [ 11 0.000669383 111 ] [ 12 0.000294232 112 ] [ 13 0.000245866 111 ] [ 14 0.000201116 111 ] [ 15 0.000164177 111 ] [ 16 0.00014832 112 ] [ 17 0.000131479 111 ] [ 18 0.000115171 111 ] [ 19 0.000101432 112 ] [ 20 9.06228e-05 111 ] [ 21 8.7103e-05 111 ] [ 22 7.79601e-05 112 ] [ 23 7.59678e-05 111 ] [ 24 6.64341e-05 111 ] [ 25 6.22717e-05 112 ] [ 26 5.8643e-05 111 ] [ 27 5.35373e-05 111 ] [ 28 4.96717e-05 112 ] [ 29 4.65124e-05 111 ]
CPU times: user 23.4 s, sys: 5.35 s, total: 28.7 s
Wall time: 28.7 s

Comparing the two processing times, we can observe that both schemes (“static” and “dynamic”) take comparable execution times, i.e., although we created the computation graph dynamically, we did not lose much performance.

Graph Converters

As neural networks become more complex and serve as components of larger systems, we sometimes want to convert a network into another form. A typical use case is inference: we want to merge or change some layers in a network as a high-level optimization for inference speed. There are also other use cases: adding new layers to keep track of some statistics, adding quantize/dequantize layers for quantized inference, decomposing a layer into a combination of low-rank ones, changing a network architecture for neural architecture search based on an original architecture, changing the tensor format from channel-first to channel-last and vice versa, and so on.

Let’s look at two simple cases: 1. batch normalization folding, 2. channel-last conversion.

As a reference network, we use the following.

# ResNet-50 for inference
import nnabla as nn
import nnabla.functions as F
import nnabla.parametric_functions as PF
import numpy as np
from nnabla.utils.inspection import pprint
from nnabla.models.imagenet import ResNet50

model = ResNet50()

batch_size = 1
x = nn.Variable((batch_size,) + model.input_shape)
y = model(x, training=False)
Batch Normalization Folding

See the resnet architecture.

pprint(y)

Now we can see the batch normalization layers. For inference, we do not need to compute the batch normalization explicitly: we can fold its parameters into an adjacent layer if there is, e.g., a convolution before the batch normalization.

To fold the batch normalization, use BatchNormalizationFoldingModifier as follows.

import nnabla.experimental.graph_converters as GC

modifiers = [GC.BatchNormalizationFoldingModifier()]
gc = GC.GraphConverter(modifiers)
yy = gc.convert(y)

Again, look at the converted resnet architecture.

pprint(yy)

You can see that the converted network does not contain batch normalization any more!

In some cases the batch normalization cannot be folded, but it can still be self-folded, i.e., its four parameters (scale, bias, running mean, and running variance) can be reduced to another scale and bias pair. For doing this, use BatchNormalizationSelfFoldingModifier.
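By analogy with the folding example above, a sketch of its use:

import nnabla.experimental.graph_converters as GC

modifiers = [GC.BatchNormalizationSelfFoldingModifier()]
gc = GC.GraphConverter(modifiers)
yy = gc.convert(y)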

Channel Last Conversion

NVIDIA’s latest GPU architectures, since Volta, support Tensor Cores to accelerate computational performance. To boost performance as much as possible, we need the channel-last tensor format, aka NHWC. In NNabla, the default tensor format is channel-first, aka NCHW, so in order to utilize Tensor Cores we need to change the tensor format to NHWC.

ChannelLastModifier converts a network in the NCHW tensor format into another network in the NHWC tensor format.

import nnabla.experimental.graph_converters as GC

modifiers = [GC.ChannelLastModifier([x])]
gc = GC.GraphConverter(modifiers)
yy = gc.convert(y)

Let’s look at the converted resnet architecture.

pprint(yy)

We can see that the channel dimension has moved to the last position!

If we want to access the inputs whose tensor format was converted:

x_cl = modifiers[0].inputs_cl[0]
print(x_cl)

Note that ChannelLastModifier supports a limited set of layers: Convolution, Deconvolution, BatchNormalization, MaxPooling, AveragePooling, SumPooling, Unpooling, and Concatenate, and it assumes the NCHW format.

There is also ChannelFirstModifier for the opposite conversion.

Mixed Precision Training

Introduction

Traditionally, we used FP32 for the weights and activations when training a neural network; however, the computational cost of training has increased rapidly over the years with the success of deep learning and the growing size of neural networks. This means we have to spend much more time training a huge neural network, while we would like to run many trials before a product launch. To address this problem, companies (e.g., NVIDIA) have introduced accelerators to speed up computation. For example, NVIDIA Volta has Tensor Cores to speed up computation.

However, Tensor Cores use FP16 weights, activations, and gradients, and the range of FP16 is very limited compared to that of FP32, meaning that gradient values sometimes (or often) overflow and/or underflow, which hurts the performance of a neural network or makes it collapse during training.

Mixed precision training is one of the algorithms to circumvent that problem while maintaining the same results that we could obtain with FP32 networks. It is well-described in The Training with Mixed Precision User Guide and Mixed Precision Training.

This tutorial explains how to do the mixed precision training in NNabla step-by-step.

Step-by-Step Instruction

Basically, mixed precision training is composed of three parts.

  1. Use the accelerator for computation (here we assume Tensor Cores)

  2. Use loss scaling to prevent underflow

  3. Use dynamic loss scaling to prevent overflow/underflow

In NNabla, we can do the correspondences as follows.

1. Use Tensor Cores
ctx = get_extension_context("cudnn", type_config="half")
2. Use loss scaling to prevent underflow
loss_scale = 8
loss.backward(loss_scale)
solver.scale_grad(1. / loss_scale)  # do some gradient clipping, etc. after this
solver.update()
3. Use dynamic loss scaling to prevent overflow/underflow
loss_scale = 8
scaling_factor = 2
counter = 0
interval = 2000
...
loss.backward(loss_scale, ...)
...
if solver.check_inf_or_nan_grad():
    loss_scale /= scaling_factor
    counter = 0
else:
    solver.scale_grad(1. / loss_scale) # do some gradient clipping, etc. after this
    solver.update()
    if counter > interval:
        loss_scale *= scaling_factor
        counter = 0
    counter += 1

Note that the procedures of the 2nd step (loss scaling to prevent underflow) and the 3rd step (dynamic loss scaling to prevent overflow/underflow) are currently experimental, and we are now trying to speed up mixed precision training, so the API might change in the future, especially for the 3rd step.

All-in-one Instruction

In the previous step-by-step example, the 3rd step is lengthy inside a training loop, so we can write a wrapper class like the following.

class DynamicLossScalingUpdater(object):
    '''Dynamic Loss Scaling Updater for the mixed precision training.

    Args:
        solver (:obj:`nnabla.solvers.Solver`): Solver object. E.g., Momentum or Adam.
        loss (:obj:`nnabla.Variable`): Loss variable from which the forward and the backward is called.
        data_feeder (callable :obj:`object`, function, or lambda): Data feeder
        scale (:obj:`float`): Loss scale constant. This is dynamically changing during training.
        scaling_factor (:obj:`float`): Scaling factor for the dynamic loss scaling.
        N (:obj:`int`): Interval, the number of iterations in training for increasing `loss scale` by `scaling_factor`.
        clear_buffer (:obj:`bool`): Clears the no longer referenced variables during backpropagation to save memory.
        accum_grad (:obj:`int`): Number of accumulation of gradients. Update method of the `solver` is called after the `accum_grad` number of the forward and backward is called.
        weight_decay (:obj:`float`): Decay constant. Default is `None`, not applying the weight decay.
        comm (:obj:`nnabla.communicators.Communicator`): Communicator when to do distributed training. Default is :obj:`None`.
        grads (:obj:`list` of :obj:`nnabla._nd_array.NdArray`): The list of gradients to be exchanged when to do distributed training. Default is the empty :obj:`list`.

    Attributes:
        solver (:obj:`nnabla.solvers.Solver`): Solver object. E.g., Momentum or Adam.
        loss (:obj:`nnabla.Variable`): Loss variable from which the forward and the backward is called.
        data_feeder (callable :obj:`object`, function, lambda): Data feeder
        scale (:obj:`float`): Loss scale constant. This is dynamically changing during training.
        scaling_factor (:obj:`float`): Scaling factor for the dynamic loss scaling.
        N (:obj:`int`): Interval, the number of iterations in training for increasing `loss scale` by `scaling_factor`.
        clear_buffer (:obj:`bool`): Clears the no longer referenced variables during backpropagation to save memory.
        accum_grad (:obj:`int`): Number of accumulation of gradients. Update method of the `solver` is called after the `accum_grad` number of the forward and backward is called.
        weight_decay (:obj:`float`): Decay constant. Default is `None`, not applying the weight decay.
        comm (:obj:`nnabla.communicators.Communicator`): Communicator when to do distributed training.
        grads (:obj:`list` of :obj:`nnabla._nd_array.NdArray`): The list of gradients to be exchanged when to do distributed training.

    Example:

        .. code-block:: python
            solver = <Solver>
            loss = <Loss Variable of Network>
            data_feeder = <DataFeeder>

            updater = DynamicLossScalingUpdater(solver, loss, data_feeder)

            # Training iteration
            for itr in range(max_iter):
                # Call solver.zero_grad, data_feeder, loss.forward, loss.backward
                # and solver.update with the dynamic loss scaling.
                updater.update()

    Reference:

        https://docs.nvidia.com/deeplearning/sdk/mixed-precision-training/index.html#scalefactor

    '''

    def __init__(self, solver, loss, data_feeder=lambda x: x,
                  scale=8.0, scaling_factor=2.0, N=2000, clear_buffer=True,
                  accum_grad=1, weight_decay=None,
                  comm=None,
                  grads=[]):
        self.solver = solver
        self.loss = loss
        self.data_feeder = data_feeder
        self.scale = scale
        self.scaling_factor = scaling_factor
        self.N = N
        self.clear_buffer = clear_buffer
        self.accum_grad = accum_grad
        self.weight_decay = weight_decay
        self.comm = comm
        self.grads = grads
        self._counter = 0
        self._recursive_count = 0
        self._max_recursive_count = 100

    def update(self):
        """Monolithic update method.

        This method calls the following methods with the dynamic loss scaling.

        1. solver.zero_grad
        2. feed data
        3. loss.forward
        4. loss.backward
        5. comm.all_reduce (if it is specified)
        6. solver.update

        """

        # Initialize gradients.
        self.solver.zero_grad()

        # Forward and backward
        for _ in range(self.accum_grad):
            # feed data
            self.data_feeder()

            # forward
            self.loss.forward(clear_no_need_grad=self.clear_buffer)

            # backward with scale
            self.loss.backward(self.scale, clear_buffer=self.clear_buffer)

        # AllReduce
        if self.comm and len(self.grads) != 0:
            self.comm.all_reduce(self.grads, division=False, inplace=False)

        # Check Inf/NaN in grads
        if self.solver.check_inf_or_nan_grad():
            self.scale /= self.scaling_factor
            self._counter = 0

            # Recursively call the update method until there is no inf or nan.
            self._recursive_count += 1
            if self._recursive_count > self._max_recursive_count:
                self._recursive_count = 0
                return  # skip
            return self.update()
        self._recursive_count = 0

        # Rescale grads
        self.solver.scale_grad(1. / self.scale)

        # Do some gradient clipping, etc.
        if self.weight_decay is not None:
            self.solver.weight_decay(self.weight_decay)

        # Update
        self.solver.update()
        if self._counter > self.N:
            self.scale *= self.scaling_factor
            self._counter = 0
        self._counter += 1

Then, call the update method in a training loop:

from nnabla.experimental.mixed_precision_training import DynamicLossScalingUpdater

solver = <Solver>
loss = <Loss Variable of Network>
data_feeder = <DataFeeder>

updater = DynamicLossScalingUpdater(solver, loss, data_feeder)

# Training iteration
for itr in range(max_iter):
    # Call solver.zero_grad, data_feeder, loss.forward, loss.backward
    # and solver.update with the dynamic loss scaling.
    updater.update()
Notice

In mixed precision training, the following premises hold:

  1. The Solver holds FP16 weights and an FP32 copy of the weights. Solvers in NNabla hold FP32 weights and weight gradients, and cast them to FP16 weights in the forward pass and to FP16 weight gradients in the backward pass if one sets type_config="half".

  2. Reductions should be left in FP32, for example, the statistics (mean and variance) computed by batch normalization, Mean, Sum, SoftMax, SoftMaxCrossEntropy, etc. (see The Training with Mixed Precision User Guide). In NNabla, these functions automatically fall back to FP32.

Data Parallel Distributed Training

DataParallelCommunicator enables you to train your neural network using multiple devices. It is normally used for exchanging gradients in data parallel distributed training. Basically, there are two types of distributed training in the neural network literature: Data Parallel and Model Parallel. Here we focus only on the former, Data Parallel Training. Data Parallel Distributed Training is based on the very simple equation used for the optimization of a neural network, called (Mini-Batch) Stochastic Gradient Descent.

In the optimization process, the objective one tries to minimize is

\[f(\mathbf{w}; X) = \frac{1}{B \times N} \sum_{i=1}^{B \times N} \ell(\mathbf{w}, \mathbf{x}_i),\]

where \(f\) is a neural network, \(B \times N\) is the batch size, \(\ell\) is a loss function for each data point \(\mathbf{x} \in X\), and \(\mathbf{w}\) is the trainable parameter of the neural network.

When taking the derivative of this objective, one gets,

\[\nabla_{\mathbf{w}} f(\mathbf{w}; X) = \frac{1}{B \times N} \sum_{i=1}^{B \times N} \nabla_{\mathbf{w}} \ell (\mathbf{w}, \mathbf{x}_i).\]

Since the derivative is linear, one can rewrite the gradient of the objective as an average of \(N\) terms, each of which is the average of the derivatives over \(B\) data points.

\[\nabla_{\mathbf{w}} f(\mathbf{w}; X) = \frac{1}{N} \left( \frac{1}{B} \sum_{i=1}^{B} \nabla_{\mathbf{w}} \ell (\mathbf{w}, \mathbf{x}_i) \ + \frac{1}{B} \sum_{i=B+1}^{B \times 2} \nabla_{\mathbf{w}} \ell (\mathbf{w}, \mathbf{x}_i) \ + \ldots \ + \frac{1}{B} \sum_{i=B \times (N-1) + 1}^{B \times N} \nabla_{\mathbf{w}} \ell (\mathbf{w}, \mathbf{x}_i) \right)\]

In data parallel distributed training, the following steps are performed according to the above equation,

  1. each term, the sum of derivatives (gradients) divided by the batch size \(B\), is computed on a separate device (typically a GPU),

  2. take the sum over devices,

  3. divide the result by the number of devices, \(N\).

That is the underlying foundation of Data Parallel Distributed Training.
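The linearity argument above can also be checked numerically with a toy sketch (scalar per-sample gradients stand in for real gradient vectors):

import numpy as np

B, N = 4, 3                        # per-device batch size, number of devices
g = np.random.randn(B * N)         # toy per-sample gradients
per_device = g.reshape(N, B).mean(axis=1)       # step 1: per-device averages
print(np.isclose(g.mean(), per_device.mean()))  # steps 2-3 recover the global mean: True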

This tutorial shows the usage of Multi Process Data Parallel Communicator for data parallel distributed training with a very simple example.

NOTE

This tutorial depends on IPython Cluster, so if you want to run the following script excerpts in a Jupyter Notebook, follow this to enable the mpiexec/mpirun mode, then launch a corresponding IPython Cluster on the IPython Clusters tab.

Launch client

This code is only needed for this tutorial via Jupyter Notebook.

import ipyparallel as ipp
rc = ipp.Client(profile='mpi')
Prepare the dependencies
%%px
import os
import time

import nnabla as nn
import nnabla.communicators as C
from nnabla.ext_utils import get_extension_context
import nnabla.functions as F
from nnabla.initializer import (
    calc_uniform_lim_glorot,
    UniformInitializer)
import nnabla.parametric_functions as PF
import nnabla.solvers as S
import numpy as np
Define the communicator for exchanging gradients.
%%px
extension_module = "cudnn"
ctx = get_extension_context(extension_module)
comm = C.MultiProcessCommunicator(ctx)
comm.init()
n_devices = comm.size
mpi_rank = comm.rank
device_id = mpi_rank
ctx = get_extension_context(extension_module, device_id=device_id)

Check that different ranks are assigned to different devices

%%px
print("n_devices={}".format(n_devices))
print("mpi_rank={}".format(mpi_rank))
[stdout:0]
n_devices=2
mpi_rank=1
[stdout:1]
n_devices=2
mpi_rank=0
Create data points and a very simple neural network
%%px
# Data points setting
n_class = 2
b, c, h, w = 4, 1, 32, 32

# Data points
x_data = np.random.rand(b, c, h, w)
y_data = np.random.choice(n_class, b).reshape((b, 1))
x = nn.Variable(x_data.shape)
y = nn.Variable(y_data.shape)
x.d = x_data
y.d = y_data

# Network setting
C = 1
kernel = (3, 3)
pad = (1, 1)
stride = (1, 1)
%%px
rng = np.random.RandomState(0)
w_init = UniformInitializer(
                    calc_uniform_lim_glorot(C, C/2, kernel=(1, 1)),
                    rng=rng)
%%px
# Network
with nn.context_scope(ctx):
    h = PF.convolution(x, C, kernel, pad, stride, w_init=w_init)
    pred = PF.affine(h, n_class, w_init=w_init)
    loss = F.mean(F.softmax_cross_entropy(pred, y))

The important point here is that w_init is passed to the parametric functions so that the network on each GPU starts from the same values of the trainable parameters in the optimization process.

Create a solver.
%%px
# Solver and add parameters
solver = S.Adam()
solver.set_parameters(nn.get_parameters())
Training

Recall the basic usage of the nnabla API for training a neural network; it is

  1. loss.forward()

  2. solver.zero_grad()

  3. loss.backward()

  4. solver.update()

When using C.MultiProcessCommunicator, these steps are performed on different GPUs, and the only difference from the steps above is comm.all_reduce(). Thus, with C.MultiProcessCommunicator the training steps are as follows,

  1. loss.forward()

  2. solver.zero_grad()

  3. loss.backward()

  4. comm.all_reduce([x.grad for x in nn.get_parameters().values()])

  5. solver.update()

First, forward, zero_grad, and backward,

%%px
# Training steps
loss.forward()
solver.zero_grad()
loss.backward()

Check gradients of weights once,

%%px
for n, v in nn.get_parameters().items():
    print(n, v.g)
[stdout:0]
('conv/W', array([[[[ 5.0180483,  0.457942 , -2.8701296],
         [ 2.0715926,  3.0698593, -1.6650047],
         [-2.5591214,  6.4248834,  9.881935 ]]]], dtype=float32))
('conv/b', array([8.658947], dtype=float32))
('affine/W', array([[-0.93160367,  0.9316036 ],
       [-1.376812  ,  1.376812  ],
       [-1.8957546 ,  1.8957543 ],
       ...,
       [-0.33000934,  0.33000934],
       [-0.7211893 ,  0.72118926],
       [-0.25237036,  0.25237036]], dtype=float32))
('affine/b', array([-0.48865744,  0.48865741], dtype=float32))
[stdout:1]
('conv/W', array([[[[ -1.2505884 ,  -0.87151337,  -8.685524  ],
         [ 10.738419  ,  14.676786  ,   7.483423  ],
         [  5.612471  , -12.880402  ,  19.141157  ]]]], dtype=float32))
('conv/b', array([13.196114], dtype=float32))
('affine/W', array([[-1.6865108 ,  1.6865108 ],
       [-0.938529  ,  0.938529  ],
       [-1.028422  ,  1.028422  ],
       ...,
       [-0.98217344,  0.98217344],
       [-0.97528917,  0.97528917],
       [-0.413546  ,  0.413546  ]], dtype=float32))
('affine/b', array([-0.7447065,  0.7447065], dtype=float32))

You can see different values on each device; now call all_reduce,

%%px
comm.all_reduce([x.grad for x in nn.get_parameters().values()], division=True)

Commonly, all_reduce only means the sum; however, comm.all_reduce addresses both cases: plain summation, and summation followed by division.
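Equivalently (a sketch, not executed in this tutorial run), you could take the plain sum and rescale afterwards, e.g. with solver.scale_grad, which also appears in the mixed precision section above:

%%px
comm.all_reduce([x.grad for x in nn.get_parameters().values()], division=False)
solver.scale_grad(1.0 / n_devices)  # rescale the summed gradients by 1/N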

Again, check gradients of weights,

%%px
for n, v in nn.get_parameters().items():
    print(n, v.g)
[stdout:0]
('conv/W', array([[[[ 1.8837299 , -0.20678568, -5.777827  ],
         [ 6.4050055 ,  8.8733225 ,  2.9092093 ],
         [ 1.5266749 , -3.2277591 , 14.511546  ]]]], dtype=float32))
('conv/b', array([21.85506], dtype=float32))
('affine/W', array([[-2.6181145,  2.6181145],
       [-2.315341 ,  2.315341 ],
       [-2.9241767,  2.9241762],
       ...,
       [-1.3121828,  1.3121828],
       [-1.6964785,  1.6964784],
       [-0.6659163,  0.6659163]], dtype=float32))
('affine/b', array([-1.233364 ,  1.2333639], dtype=float32))
[stdout:1]
('conv/W', array([[[[ 1.8837299 , -0.20678568, -5.777827  ],
         [ 6.4050055 ,  8.8733225 ,  2.9092093 ],
         [ 1.5266749 , -3.2277591 , 14.511546  ]]]], dtype=float32))
('conv/b', array([21.85506], dtype=float32))
('affine/W', array([[-2.6181145,  2.6181145],
       [-2.315341 ,  2.315341 ],
       [-2.9241767,  2.9241762],
       ...,
       [-1.3121828,  1.3121828],
       [-1.6964785,  1.6964784],
       [-0.6659163,  0.6659163]], dtype=float32))
('affine/b', array([-1.233364 ,  1.2333639], dtype=float32))

You can see the same values across the devices because of all_reduce.

Update weights,

%%px
solver.update()

This concludes the usage of C.MultiProcessCommunicator for Data Parallel Distributed Training.
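Putting the pieces together, one training iteration per device looks like the following sketch. Here next_batch() is a hypothetical device-local data feed and num_iters a hypothetical iteration count; comm, solver, loss, x, and y are assumed to be set up as above.

%%px
for i in range(num_iters):
    x.d, y.d = next_batch()  # feed a device-local mini-batch (hypothetical helper)
    loss.forward()
    solver.zero_grad()
    loss.backward()
    # Average the gradients over all devices before the update.
    comm.all_reduce([x.grad for x in nn.get_parameters().values()], division=True)
    solver.update()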

Now that you have an understanding of how to use C.MultiProcessCommunicator, go to the cifar10 example:

  1. multi_device_multi_process_classification.sh

  2. multi_device_multi_process_classification.py

for more details.

Function list and converter

nnabla_cli is the command line interface of nnabla. With this command line interface, users can check the current NNabla support status, and find out whether, and how, an NNabla model (e.g. *.nnp) can be converted to another model format (e.g. *.onnx).

The subcommand function_info outputs information about the implemented functions. With this information, you can build a tailored nnabla-c-runtime library for your model, or skip functions that are unsupported for the target model.

Some simple use cases

Let us introduce some simple use cases.

First, suppose you want to know which functions (and how many) nnabla currently supports:

$ nnabla_cli function_info

You get the following list:

2019-06-14 16:16:13,106 [nnabla][INFO]: Initializing CPU extension...
NNabla command line interface (Version:1.0.18.dev1, Build:190531084842)
LSTM
Sub2
Mul2
GreaterEqual
Sigmoid
NotEqual
Unpooling
Log
CategoricalCrossEntropy
...

This is the list of all functions nnabla currently supports. Only the function names are shown, with no further detail; it is intended for looking up a function by name. For the details of each function, check the online documentation.

As you may know, nnabla's *.nnp model can be converted to a compact version with the postfix .nnb, which can be used for inference by the nnabla-c-runtime library. We simply call this format NNB. To know how many functions are supported in this format, use this command:

$ nnabla_cli function_info -f NNB

As above, a function list is shown.

Can we simply list the functions used in a .nnp model? Yes, of course:

$ nnabla_cli function_info my_model.nnp

As above, the list of functions used in this model is shown.

Then we can find out whether our model can be converted to the nnabla-c-runtime model format; formally speaking, we can compute the intersection of two function sets: the set of functions used in the .nnp file, and the set supported by nnabla-c-runtime.

$ nnabla_cli function_info my_model.nnp -f NNB

The output looks like:

2019-06-14 17:01:29,393 [nnabla][INFO]: Initializing CPU extension...
NNabla command line interface (Version:1.0.18.dev1, Build:190531084842)
Importing mnist_nnp/lenet_010000.nnp
 Expanding runtime.
nnabla-c-runtime currently support the following functions in model:
Convolution
MulScalar
Affine
MaxPooling
ReLU
...

Unsupported functions, if there are any in the model, are also listed.

Tailored nnabla-c-runtime library

When implementing the nnabla-c-runtime library, we would like to implement every function we can. But from a user's perspective, that is sometimes unnecessary. If a user only wants to use nnabla-c-runtime for a fixed set of models, the nnabla-c-runtime library should be tailored to exactly what these models require. How is that done?

It can be implemented with the following steps:

  1. generate function list

  2. config your nnabla-c-runtime library

  3. build nnabla-c-runtime library

1. Generate function list
$ nnabla_cli function_info my_model.nnp -f NNB -o functions.txt

This is similar to the above, except for the -o parameter, which specifies the file to write to. (Of course, the format is different from the version output to stdout; it is more compact.)

2. config your nnabla-c-runtime library

You may manually modify functions.txt. The file is then used as input to generate the nnabla-c-runtime library's config file:

$ nnabla_cli function_info -c functions.txt -o nnabla-c-runtime/build-tools/code-generator/functions.yaml

As implied above, if there is no -c parameter, the full function set is used to generate this config file, and the library will contain all implemented functions. This is the default behavior.

3. build nnabla-c-runtime library

The build process is relatively straightforward:

$ cd nnabla-c-runtime
$ mkdir build
$ cd build
$ cmake ..
$ make

The resulting nnabla-c-runtime library libnnablart_functions.a will contain exactly the functions you want.

Skip functions unsupported

When you convert *.nnp to *.onnx or *.nnb, some functions in the network may be unsupported in the target function list. For example, suppose you want to convert the following network to nnabla-c-runtime:

Affine
Softmax
Tanh
Convolution
MaxPooling
ReLU

Suppose you do not want to use the nnabla-c-runtime library's Affine; instead, you want to split the network into pieces at the Affine functions. Two steps are needed to do so:

  1. comment out the function in functions.txt

  2. convert the network with -c parameter

1. comment out the function in functions.txt
...
;Affine
...
2. convert the network with -c parameter
$ nnabla_cli convert -c functions.txt a.nnp b.nnb

Thus, the network is split into pieces, and the output looks like the following:

...
LeNet_036_0_5.nnb:
  input:
  - name: Input
    shape: (-1, 1, 28, 28)
  output:
  - name: Tanh_2
    shape: (-1, 30, 4, 4)
LeNet_036_7_7.nnb:
  input:
  - name: Affine
    shape: (-1, 150)
  output:
  - name: ReLU_2
    shape: (-1, 150)
LeNet_036_9_9.nnb:
  input:
  - name: Affine_2
    shape: (-1, 10)
  output:
  - name: Softmax
    shape: (-1, 10)

The network is split at the Affine functions. Since there are 2 Affine functions in the network, 3 sub-networks are generated.

Converting to ONNX

The following commands do the same as above, but targeting *.onnx.

List all functions supported:

$ nnabla_cli function_info -f ONNX

List the intersection of function sets, in a model and supported by ONNX:

$ nnabla_cli function_info LeNet_036.nnp -f ONNX

Split the network to skip some functions:

$ nnabla_cli convert -c functions.txt a.nnp a.onnx

Python Command Line Interface

NNabla has a command line interface utility which can train networks, run forward passes (inference), encode/decode parameters, convert datasets, measure performance, convert file formats, and so on.

usage: nnabla_cli [-h] [-m]
                  {train,infer,forward,encode_param,decode_param,profile,conv_dataset,compare_with_cpu,create_image_classification_dataset,upload,create_tar,function_info,optimize,dump,nnb_template,convert,plot_series,plot_timer,draw_graph,version}
                  ...

Command line interface for NNabla(Version 1.0.11.dev1, Build 181226024531)

positional arguments:
  {train,infer,forward,encode_param,decode_param,profile,conv_dataset,compare_with_cpu,create_image_classification_dataset,upload,create_tar,function_info,optimize,dump,nnb_template,convert,plot_series,plot_timer,draw_graph,version}
    train               Training with NNP.
    infer               Do inference with NNP and binary data file input.
    forward             Do evaluation with NNP and test dataset.
    encode_param        Encode plain text to parameter format.
    decode_param        Decode parameter to plain text.
    profile             Profiling performance with NNP.
    conv_dataset        Convert CSV dataset to cache.
    compare_with_cpu    Compare performance between two nntxt.
    create_image_classification_dataset
                        Create dataset from image files.
    upload              Upload dataset to Neural Network Console.
    create_tar          Create tar file for Neural Network Console.
    function_info       Output function info.
    optimize            Optimize pb model.
    dump                Dump network with supported format.
    nnb_template        Generate NNB config file template.
    convert             File format converter.
    plot_series         Plot *.series.txt files.
    plot_timer          Plot *.timer.txt files.
    draw_graph          Draw a graph in a NNP or nntxt file with graphviz.
    version             Print version and build number.

optional arguments:
  -h, --help            show this help message and exit
  -m, --mpi             exec with mpi.

Work with NNP

Training
usage: nnabla_cli train [-h] -c CONFIG [-p PARAM] -o OUTDIR

optional arguments:
  -h, --help            show this help message and exit
  -c CONFIG, --config CONFIG
                        path to nntxt
  -p PARAM, --param PARAM
                        path to parameter file
  -o OUTDIR, --outdir OUTDIR
                        output directory
Profile
usage: nnabla_cli profile [-h] -c CONFIG -o OUTDIR

optional arguments:
  -h, --help            show this help message and exit
  -c CONFIG, --config CONFIG
                        path to nntxt
  -o OUTDIR, --outdir OUTDIR
                        output directory
Forward
usage: nnabla_cli forward [-h] -c CONFIG [-p PARAM] [-d DATASET] -o OUTDIR [-b BATCH_SIZE]

optional arguments:
  -h, --help            show this help message and exit
  -c CONFIG, --config CONFIG
                        path to nntxt
  -p PARAM, --param PARAM
                        path to parameter file
  -d DATASET, --dataset DATASET
                        path to CSV dataset
  -o OUTDIR, --outdir OUTDIR
                        output directory
  -b BATCH_SIZE, --batch_size BATCH_SIZE
                        Batch size. To use the batch size in the nnp file, set -1.
Inference
usage: nnabla_cli infer [-h] -c CONFIG [-o OUTPUT] [-p PARAM] [-b BATCH_SIZE] inputs [inputs ...]

positional arguments:
  inputs

optional arguments:
  -h, --help            show this help message and exit
  -c CONFIG, --config CONFIG
                        path to nntxt
  -o OUTPUT, --output OUTPUT
                        output file prefix
  -p PARAM, --param PARAM
                        path to parameter file
  -b BATCH_SIZE, --batch_size BATCH_SIZE
                        Batch size. To use the batch size in the nnp file, set -1.
Compare with CPU
usage: nnabla_cli compare_with_cpu [-h] -c CONFIG -c2 CONFIG2 -o OUTDIR

optional arguments:
  -h, --help            show this help message and exit
  -c CONFIG, --config CONFIG
                        path to nntxt
  -c2 CONFIG2, --config2 CONFIG2
                        path to cpu nntxt
  -o OUTDIR, --outdir OUTDIR
                        output directory

Dataset manipulation

Encode parameter
usage: nnabla_cli encode_param [-h] -i INDIR [-p PARAM]

optional arguments:
  -h, --help            show this help message and exit
  -i INDIR, --indir INDIR
                        input directory
  -p PARAM, --param PARAM
                        path to parameter file
Decode parameter
usage: nnabla_cli decode_param [-h] [-p PARAM] -o OUTDIR

optional arguments:
  -h, --help            show this help message and exit
  -p PARAM, --param PARAM
                        path to parameter file
  -o OUTDIR, --outdir OUTDIR
                        output directory
Convert dataset
usage: nnabla_cli conv_dataset [-h] [-F] [-S] [-N] source destination

positional arguments:
  source
  destination

optional arguments:
  -h, --help       show this help message and exit
  -F, --force      force overwrite destination
  -S, --shuffle    shuffle data
  -N, --normalize  normalize data range
Create image classification dataset
usage: nnabla_cli create_image_classification_dataset [-h] -i SOURCEDIR -o OUTDIR -c CHANNEL -w WIDTH -g HEIGHT -m MODE -s SHUFFLE -f1 FILE1 [-r1 RATIO1] [-f2 FILE2]
                                                      [-r2 RATIO2]

optional arguments:
  -h, --help            show this help message and exit
  -i SOURCEDIR, --sourcedir SOURCEDIR
                        source directory with directories for each class
  -o OUTDIR, --outdir OUTDIR
                        output directory
  -c CHANNEL, --channel CHANNEL
                        number of output color channels
  -w WIDTH, --width WIDTH
                        width of output image
  -g HEIGHT, --height HEIGHT
                        height of output image
  -m MODE, --mode MODE  shaping mode (trimming or padding)
  -s SHUFFLE, --shuffle SHUFFLE
                        shuffle mode (true or false)
  -f1 FILE1, --file1 FILE1
                        output file name 1
  -r1 RATIO1, --ratio1 RATIO1
                        output file ratio(%) 1
  -f2 FILE2, --file2 FILE2
                        output file name 2
  -r2 RATIO2, --ratio2 RATIO2
                        output file ratio(%) 2
Upload dataset to Neural Network Console
usage: nnabla_cli upload [-h] [-e ENDPOINT] token filename

positional arguments:
  token                 token for upload
  filename              filename to upload

optional arguments:
  -h, --help            show this help message and exit
  -e ENDPOINT, --endpoint ENDPOINT
                        set endpoint uri
Create dataset archive for Neural Network Console
usage: nnabla_cli create_tar [-h] source destination

positional arguments:
  source       CSV dataset
  destination  TAR filename

optional arguments:
  -h, --help   show this help message and exit

File format converter

For detailed information please see File format converter.

Dump content of supported format
usage: nnabla_cli dump [-h] [-v] [-F] [-V] [--dump-limit DUMP_LIMIT]
                       [-n DUMP_VARIABLE_NAME] [-I IMPORT_FORMAT]
                       [-E NNP_IMPORT_EXECUTOR_INDEX]
                       [--nnp-exclude-preprocess] [--nnp-no-expand-network]
                       FILE [FILE ...]

positional arguments:
  FILE                  File or directory name(s) to convert.

optional arguments:
  -h, --help            show this help message and exit
  -v, --dump-verbose    [dump] verbose output.
  -F, --dump-functions  [dump] dump function list.
  -V, --dump-variables  [dump] dump variable list.
  --dump-limit DUMP_LIMIT
                        [dump] limit num of items.
  -n DUMP_VARIABLE_NAME, --dump-variable-name DUMP_VARIABLE_NAME
                        [dump] Specific variable name to display.
  -I IMPORT_FORMAT, --import-format IMPORT_FORMAT
                        [import] import format. (one of [NNP,ONNX])
  -E NNP_IMPORT_EXECUTOR_INDEX, --nnp-import-executor-index NNP_IMPORT_EXECUTOR_INDEX
                        [import][NNP] import only specified executor.
  --nnp-exclude-preprocess
                        [import][NNP] EXPERIMENTAL exclude preprocess
                        functions when import.
  --nnp-no-expand-network
                        [import][NNP] expand network with repeat or recurrent.
Generate NNB config file template
usage: nnabla_cli nnb_template [-h] [-I IMPORT_FORMAT]
                               [--nnp-no-expand-network] [-b BATCH_SIZE]
                               [-T DEFAULT_VARIABLE_TYPE]
                               FILE [FILE ...]

positional arguments:
  FILE                  File or directory name(s) to convert.

optional arguments:
  -h, --help            show this help message and exit
  -I IMPORT_FORMAT, --import-format IMPORT_FORMAT
                        [import] import format. (one of [NNP,ONNX])
  --nnp-no-expand-network
                        [import][NNP] expand network with repeat or recurrent.
  -b BATCH_SIZE, --batch-size BATCH_SIZE
                        [export] overwrite batch size.
  -T DEFAULT_VARIABLE_TYPE, --default-variable-type DEFAULT_VARIABLE_TYPE
                        Default type of variable
File format converter
usage: nnabla_cli convert [-h] [-I IMPORT_FORMAT] [--nnp-no-expand-network]
                          [-O EXPORT_FORMAT] [-f] [-b BATCH_SIZE]
                          [--nnp-parameter-h5] [--nnp-parameter-nntxt]
                          [--nnp-exclude-parameter] [-T DEFAULT_VARIABLE_TYPE]
                          [-s SETTINGS] [-c CONFIG] [-d DEFINE_VERSION] [--api API]
                          [--enable-optimize-pb] [--outputs OUTPUTS]
                          [--inputs INPUTS] FILE [FILE ...]

positional arguments:
  FILE                  File or directory name(s) to convert.
                        (When converting the ckpt format of a tensorflow model:
                        if the checkpoint version is V1, enter the `.ckpt` file,
                        otherwise enter the `.meta` file.)

optional arguments:
  -h, --help            show this help message and exit
  -I IMPORT_FORMAT, --import-format IMPORT_FORMAT
                        [import] import format. (one of [NNP,ONNX,TF_CKPT_V1,TF_CKPT_V2,TF_PB,SAVED_MODEL,TFLITE])
  --nnp-no-expand-network
                        [import][NNP] expand network with repeat or recurrent.
  --outputs OUTPUTS
                        [import][tensorflow] The name(s) of the output nodes, comma separated.
                                             Only needed when converting the CKPT format.
  --inputs INPUTS
                        [import][tensorflow] The name(s) of the input nodes, comma separated.
                                             Only needed when converting the CKPT format.
  -O EXPORT_FORMAT, --export-format EXPORT_FORMAT
                        [export] export format. (one of [NNP,NNB,CSRC,ONNX,SAVED_MODEL,TFLITE,TF_PB];
                                 if the export file format is 'CSRC' or 'SAVED_MODEL',
                                 the argument '--export-format' must be set.)
  -f, --force           [export] overwrite output file.
  -b BATCH_SIZE, --batch-size BATCH_SIZE
                        [export] overwrite batch size.
  --nnp-parameter-h5    [export][NNP] store parameter with h5 format
  --nnp-parameter-nntxt
                        [export][NNP] store parameter into nntxt
  --nnp-exclude-parameter
                        [export][NNP] output without parameter
  -T DEFAULT_VARIABLE_TYPE, --default-variable-type DEFAULT_VARIABLE_TYPE
                        Default type of variable
  -s SETTINGS, --settings SETTINGS
                        Settings in YAML format file.
  -c CONFIG, --config CONFIG
                        [export] config target function list.
  -d DEFINE_VERSION, --define_version
                        [export][ONNX] define onnx opset version. e.g. opset_6
                        [export][ONNX] define convert to onnx for SNPE. e.g. opset_snpe
                        [export][ONNX] define convert to onnx for TensorRT. e.g. opset_tensorrt
                        [export][NNB] define binary format version. e.g. nnb_3
  --api API             [export][NNB] Set API Level to convert to, default is highest API Level.
  --enable-optimize-pb  [export][tensorflow] enable optimization when export to pb.
  --channel_last        [export][TFLite] Specify the data_format of the NNP network,
                                         data_format default is channel_first.
  --quantization        [export][TFLite] export to INT8 quantized tflite model.
  --dataset             [export][TFLite] Specify the path of represent dataset which will be passed to INT8 quantized tflite converter.
Optimize pb model
usage: nnabla_cli optimize [-h] input_pb_file output_pb_file

positional arguments:
  input_pb_file       Input pre-optimized pb model.
  output_pb_file      Output optimized pb model.

Plot Monitor class output files

Note:

  • Plotting subcommands require matplotlib package.

  • By default, the following commands show a plot on your display using a backend rendering engine of matplotlib, depending on your environment. If you want to save a plot as an image or vector data, use the -o option to specify a file name where the plot is saved.

MonitorSeries
usage: nnabla_cli plot_series [-h] [-l LABEL] [-o OUTFILE] [-x XLABEL]
                              [-y YLABEL] [-t TITLE] [-T YLIM_MAX]
                              [-B YLIM_MIN] [-R XLIM_MAX] [-L XLIM_MIN]
                              infile [infile ...]

Plot *.series.txt files produced by nnabla.monitor.MonitorSeries class.

Example:

    nnabla_cli plot_series -x "Epochs" -y "Squared error loss" -T 10 -l "config A" -l "config B" result_a/Training-loss.series.txt result_b/Training-loss.series.txt

positional arguments:
  infile                Path to input file.

optional arguments:
  -h, --help            show this help message and exit
  -l LABEL, --label LABEL
                        Label of each plot.
  -o OUTFILE, --outfile OUTFILE
                        Path to output file.
  -x XLABEL, --xlabel XLABEL
                        X-axis label of plot.
  -y YLABEL, --ylabel YLABEL
                        Y-axis label of plot.
  -t TITLE, --title TITLE
                        Title of plot.
  -T YLIM_MAX, --ylim-max YLIM_MAX
                        Y-axis plot range max.
  -B YLIM_MIN, --ylim-min YLIM_MIN
                        Y-axis plot range min.
  -R XLIM_MAX, --xlim-max XLIM_MAX
                        X-axis plot range max.
  -L XLIM_MIN, --xlim-min XLIM_MIN
                        X-axis plot range min.
MonitorTimeElapsed
usage: nnabla_cli plot_timer [-h] [-l LABEL] [-o OUTFILE] [-x XLABEL]
                             [-y YLABEL] [-t TITLE] [-T YLIM_MAX]
                             [-B YLIM_MIN] [-R XLIM_MAX] [-L XLIM_MIN] [-e]
                             [-u TIME_UNIT]
                             infile [infile ...]

Plot *.timer.txt files produced by nnabla.MonitorTimeElapsed class.

Example:

    nnabla_cli plot_timer -x "Epochs" -l "config A" -l "config B" result_a/Epoch-time.timer.txt result_b/Epoch-time.timer.txt

positional arguments:
  infile                Path to input file.

optional arguments:
  -h, --help            show this help message and exit
  -l LABEL, --label LABEL
                        Label of each plot.
  -o OUTFILE, --outfile OUTFILE
                        Path to output file.
  -x XLABEL, --xlabel XLABEL
                        X-axis label of plot.
  -y YLABEL, --ylabel YLABEL
                        Y-axis label of plot.
  -t TITLE, --title TITLE
                        Title of plot.
  -T YLIM_MAX, --ylim-max YLIM_MAX
                        Y-axis plot range max.
  -B YLIM_MIN, --ylim-min YLIM_MIN
                        Y-axis plot range min.
  -R XLIM_MAX, --xlim-max XLIM_MAX
                        X-axis plot range max.
  -L XLIM_MIN, --xlim-min XLIM_MIN
                        X-axis plot range min.
  -e, --elapsed         Plot total elapsed time. By default, it plots elapsed time per iteration.
  -u TIME_UNIT, --time-unit TIME_UNIT
                        Time unit chosen from {s|m|h|d}.
Draw a graph from NNP or .nntxt files

Note:

  • This feature requires graphviz installed as a Python package. The graphviz Python package is an interface to the graphviz library, which is not installed by the pip command; you have to install the library itself separately, for example using apt on Ubuntu.
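For example, on Ubuntu the setup might look like the following (assuming the standard package names):

$ sudo apt install graphviz   # the graphviz library and tools
$ pip install graphviz        # the Python interface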

usage: nnabla_cli draw_graph [-h] [-o OUTPUT_DIR] [-n NETWORK] [-f FORMAT]
                             input

Draw a graph in a NNP or nntxt file with graphviz.

Example:

    nnabla_cli draw_graph -o output-folder path-to-nnp.nnp

positional arguments:
  input                 Path to input nnp or nntxt.

optional arguments:
  -h, --help            show this help message and exit
  -o OUTPUT_DIR, --output-dir OUTPUT_DIR
                        Output directory.
  -n NETWORK, --network NETWORK
                        Network names to be drawn.
  -f FORMAT, --format FORMAT
                        Graph saving format compatible with graphviz (`pdf`, `png`, ...).

Development

Generate function information
usage: nnabla_cli function_info [-h] [-o OUTFILE] [-f FUNC_SET] [-c CONFIG]
                                [-t TARGET] [-q --query] [--nnp-no-expand-network]
                                [--api API] [FILE] [FILE ...]

positional arguments:
  FILE                  Path to nnp file.

optional arguments:
  -h, --help  show this help message and exit
  -o OUTFILE, --output OUTFILE
                      output filename, *.txt or *.yaml, the default is stdout.
  -f FUNC_SET, --all_support FUNC_SET
                      select function set: NNB, ONNX, the default is nnabla.
  -c CONFIG, --config CONFIG
                      user config file for target constraint, *.txt file of the
                      function list or the "opset_" args.
  -t, --target
                      output target function list.
  -q, --query
                      query the detail of a function.
  --nnp-no-expand-network
                      [import][NNP] expand network with repeat or recurrent.
  --api API           List API levels.
Display version
usage: nnabla_cli version [-h]

optional arguments:
  -h, --help  show this help message and exit

Python API Examples

There are a bunch of examples provided in the NNabla repository. Please see the NNabla Examples repository (https://github.com/sony/nnabla-examples).

Python API Reference

Common

Config

Search config file and get config information from config file.

Config file search order is described in the following table. Each config value is overwritten by the configs that follow it.

Type          Posix                                         Windows
System wide   /etc/nnabla.conf                              c:\ProgramData\NNabla\nnabla.ini
User          ~/.nnabla                                     c:\Users\[USERNAME]\AppData\Roaming\NNabla\nnabla.ini
Default       (same directory as ‘config.py’)/nnabla.conf
Local         [CURRENT DIRECTORY]/nnabla.conf

You can get a config value as follows.

from utils.config import nnabla_config
value = nnabla_config.get(CATEGORY, VALUE_NAME)

CATEGORY and VALUE_NAME are not defined in config.py; you can add any CATEGORY and VALUE_NAME you like, using the following format. See the official document for more information.

[CATEGORY]
VALUE_NAME = value

The default values are defined in the ‘nnabla.conf’ file placed in the same directory as config.py.
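For example, reading the log file name from the [LOG] category described in the Logger section below might look like this (a minimal sketch, using the import path shown above; the category and value must exist in one of the config files):

from utils.config import nnabla_config

# Read a value from the merged configuration.
log_file = nnabla_config.get('LOG', 'log_file_name')
print(log_file)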

Logger

Wrapper module for logging.

You can use the logger as follows:

from nnabla import logger

logger.debug('Log message(DEBUG)')
logger.info('Log message(INFO)')
logger.warning('Log message(WARNING)')
logger.error('Log message(ERROR)')
logger.critical('Log message(CRITICAL)')

With the default settings, it should yield the following output:

$ python scripts/logger_test.py
[nnabla][ERROR]: logger_test.py : <module> : 5 : Log message(ERROR)
[nnabla][CRITICAL]: logger_test.py : <module> : 6 : Log message(CRITICAL)

If you want to output the log to a file, you must create a nnabla.conf file and put the following entry in it.

See nnabla.config for more information about config file.

[LOG]
log_file_name = /tmp/nbla.log

After this, you will get the following output.

$ python scripts/logger_test.py
[nnabla][ERROR]: logger_test.py : <module> : 5 : Log message(ERROR)
[nnabla][CRITICAL]: logger_test.py : <module> : 6 : Log message(CRITICAL)
$ cat /tmp/nbla.log
2017-01-19 14:41:35,132 [nnabla][DEBUG]: scripts/logger_test.py : <module> : 3 : Log message(DEBUG)
2017-01-19 14:41:35,132 [nnabla][INFO]: scripts/logger_test.py : <module> : 4 : Log message(INFO)
2017-01-19 14:41:35,132 [nnabla][ERROR]: scripts/logger_test.py : <module> : 5 : Log message(ERROR)
2017-01-19 14:41:35,132 [nnabla][CRITICAL]: scripts/logger_test.py : <module> : 6 : Log message(CRITICAL)
nnabla.logger.logger

alias of <Logger nnabla (INFO)>

Auto-forward mode

NNabla provides a dynamic computation graph feature, which enables automatic forward propagation during graph construction. This can be enabled using the set_auto_forward() function. Backpropagation must still be executed manually on the dynamically constructed graph.
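For example, under auto-forward mode the data of each variable is available as soon as the graph is built (a minimal sketch):

import numpy as np
import nnabla as nn
import nnabla.functions as F

nn.set_auto_forward(True)

x = nn.Variable.from_numpy_array(np.array([1.0, 2.0, 3.0]))
y = F.add_scalar(x, 1.0)  # forward is executed immediately during construction
print(y.d)                # => [2. 3. 4.], no explicit y.forward() needed
# Backpropagation must still be triggered manually, e.g. y.backward().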

nnabla.auto_forward(auto=True)[source]

Context for dynamic graph execution mode.

Parameters

auto (bool) – Whether forward computation is executed during a computation graph construction.

Returns: bool

nnabla.set_auto_forward(auto)[source]

Set the default mode for automatic forward propagation.

When it is set to True , forward propagation is invoked immediately when the computation graph is updated.

Parameters

auto (bool) – Whether forward computation is executed when the computation graph is updated.

Returns: bool

nnabla.get_auto_forward()[source]

Get the state of automatic forward execution.

When it is true, forward execution is invoked during a computation graph definition.

Note

This is usually called by users.

Context
class nnabla.Context(backend=None, array_class='', device_id='0')

Context is used to specify the computation engine (cpu, cuda, cudnn etc.) which the function operator modules and optimizer modules shall be run on. The context can be set for each function, as well as set globally with the functions listed in the context-specifier API.

Parameters
  • backend (list of str) – ‘cpu’, ‘cuda’, ‘cudnn’ etc.

  • array_class (str) – str, ‘CpuArray’, ‘CpuCachedArray’, ‘CudaArray’, ‘CudaCachedArray’ etc.

  • device_id (str) – str, default ‘0’

Context Specifier API
nnabla.context_scope(ctx)[source]

Context as Python context.

import nnabla as nn
import nnabla.functions as F
import nnabla.parametric_functions as PF
from nnabla.ext_utils import get_extension_context

x = nn.Variable([2, 3, 4])
ctx = get_extension_context('cudnn', device_id='0')
with nn.context_scope(ctx):
    # Inside the with scope, the specified context is used.
    with nn.parameter_scope('w1'):
        l1 = F.relu(PF.affine(x, 64))
    with nn.parameter_scope('w2'):
        l2 = F.relu(PF.affine(x, 64))
nnabla.set_default_context(ctx)[source]

Set the default context.

Note

It cannot be called inside any context_scope.

Parameters

ctx (Context) – A Context.

nnabla.get_current_context()[source]

Get the current context.

It can be set using nnabla.context_scope() or nnabla.set_default_context() .

Returns

a current context.

Return type

Context

NdArray

class nnabla.NdArray(*args, **kwargs)

nnabla.NdArray is a device-agnostic data container for multi-dimensional arrays (tensors). nnabla.NdArray can also implicitly handle data transfers across different devices (e.g. CPU to CUDA GPU, CUDA GPU to CPU). See Python API Tutorial for more details.

NdArray overrides some arithmetic operators (+, -, *, /, **). Operands can be either a scalar number, NdArray or Variable. An arithmetic operation containing NdArray returns NdArray which stores the output of the computation immediately invoked. Also, inplace arithmetic operations (+=, -=, *=, /=, **=) are implemented. Note that = doesn’t perform inplace substitution but just replaces the object reference. Instead, you can use copy_from() for inplace substitution.
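For example (a minimal sketch of the behavior described above):

import numpy as np
import nnabla as nn

a = nn.NdArray.from_numpy_array(np.ones((2, 3)))
b = a * 2.0 + 1.0   # computed immediately; b is a new NdArray holding 3.0
b += 1.0            # in-place arithmetic; b now holds 4.0
print(b.data)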

Parameters

shape (tuple or int) – Shape of the N-d array.

bool_fill(self, mask, value)

Return a new, but in-place, nnabla.NdArray filled with value where mask is non-zero.

Parameters
  • mask (nnabla.NdArray) – Mask with which to fill. Non-zero/zero elements are supposed to be a binary mask as 1/0. No gradients are computed with respect to mask.

  • value (float) – The value to fill.
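A minimal sketch based on the signature above:

import numpy as np
import nnabla as nn

arr = nn.NdArray.from_numpy_array(np.array([1.0, 2.0, 3.0]))
mask = nn.NdArray.from_numpy_array(np.array([0.0, 1.0, 0.0]))
out = arr.bool_fill(mask, -1.0)  # elements where mask is non-zero become -1.0
print(out.data)                  # => [ 1. -1.  3.]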

cast(self, dtype, ctx=None)

In-place cast of the data type of the NdArray. It returns the reference values as a numpy.ndarray only if the optional parameter ctx is not given; otherwise it returns None.

Parameters
Returns

numpy.array if ctx is None, otherwise nothing.

clear(self)

Clear the memory that this NdArray holds and return it to the allocator.

clear_called

Checks whether the array has not been modified since it was cleared. This returns False until clear is called for the first time.

copy_from(self, NdArray arr, use_current_context=True)

Copy values from another NdArray object.

It returns the caller object itself.

Parameters
  • arr (NdArray) – The values will be copied to the caller object. The shape of arr must be the same as that of the caller object.

  • use_current_context (bool) – If True, the copy happens on the device and with the dtype specified in the current context (equivalent to calling F.identity(src, output=[self])). Otherwise, the device and dtype of the source array are used. The default is True.

Returns

nnabla.NdArray

data

Returns the values held by this array as a numpy.ndarray. Note that only the reference is returned, and the values are not copied. Therefore, modifying the returned numpy.ndarray will affect the data contained inside the NNabla array. This method can also be called as a setter, where an array is created with the same type as the rhs. As an exception, zero() or fill(rhs) is invoked instead if a scalar is given (a float, or an integer <= 2^53, as the filling value is maintained as float64).

Note that this may implicitly invoke a data transfer from device arrays to the CPU.

Parameters

value (numpy.ndarray) –

Returns

numpy.ndarray
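A minimal sketch of the getter/setter behavior described above:

import nnabla as nn

a = nn.NdArray((2, 3))
a.data = 1.0         # a scalar invokes fill(1.0) as described above
d = a.data           # a referenced (not copied) numpy.ndarray
d[0, 0] = 5.0        # modifies the NNabla array as well
print(a.data[0, 0])  # => 5.0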

data_ptr(self, dtype, ctx=None)

Get array’s pointer.

The behavior is similar to the cast method, but it returns the data pointer based on ctx. If ctx is not specified, the default context obtained by nn.get_current_context is used.

Parameters
Returns

The data pointer.

Return type

int

dtype

Get dtype.

Returns

numpy.dtype

fill(self, value)

Fill all of the elements with the provided scalar value.

Note

This does not fill the values of the internal array with the given value immediately. An array is created with the requested data type when this array is used (in forward or backward computation, for example), and is filled with the value at that point.

Parameters

value (float) – The value to fill with.

static from_numpy_array(nparr)

Create a NdArray object from Numpy array data.

The data is initialized with the given Numpy array.

Parameters

nparr (ndarray) – Numpy multi-dimensional array.

Returns

nnabla.NdArray

get_data(self, str mode='rw', dtype=None)

Returns the values held by this array as a numpy.ndarray with a specified mode.

Parameters
  • mode (str) – Computation becomes more efficient if the right one is chosen.
    * ‘r’: Read-only access.
    * ‘w’: Write-only access.
    * ‘rw’: You can both read and write.

  • dtype (numpy.dtype, optional) – Force dtype of a returned array.

See nnabla.NdArray.data for more details.

masked_fill()

Alias of NdArray.bool_fill(self, mask, value): return a new, but in-place, nnabla.NdArray filled with value where mask is non-zero.

Parameters
  • mask (nnabla.NdArray) – Mask with which to fill. Non-zero/zero elements are supposed to be a binary mask as 1/0. No gradients are computed with respect to mask.

  • value (float) – The value to fill.

modification_count

Returns how many times the array has been modified since memory allocation or since the buffer was last cleared.

ndim

Number of dimensions.

Returns

int

shape

Shape of the N-d array.

Returns

tuple of int

size

Total size of the N-d array.

Returns

int

size_from_axis(self, axis=-1)

Gets the product of the array dimensions from the provided axis onward (i.e. the size of shape[axis:]). With the default axis=-1, the total size is returned.

Example

a = nnabla.NdArray([10,9])
a.size_from_axis()
# ==> 90
a.size_from_axis(0)
# ==> 90
a.size_from_axis(1)
# ==> 9
a.size_from_axis(2)
# ==> 1
Parameters

axis (int, optional) – -1 as default

Returns

int

strides

Strides.

Returns

tuple of int

zero(self)

Fill all of the elements with 0.

Note

This does not fill the values of the internal array with 0 immediately. An array is created with the requested data type when this array is used (in forward or backward computation, for example), and is filled with 0 at that point.

zeroing

Checks whether the array has not been modified since zero() was called.

Variable

class nnabla.Variable

Bases: object

nnabla.Variable is used to construct computation graphs (neural networks) together with functions in Functions and List of Parametric Functions . It also provides a method to execute forward and backward propagation of the network. The nnabla.Variable class holds:

  • Reference to the parent function in a computation graph. This provides traceability of all connections in the computation graph.

  • Both data and error signal (gradient) containers as nnabla.NdArray objects.

  • Some additional information of the computation graph.

Variable overrides some arithmetic operators (+, -, *, /, **). Operands can be either a scalar number, NdArray or Variable. If NdArray is given as either the left or right operand, the arithmetic operation returns an NdArray which stores the output of the computation, invoked immediately. Otherwise, it returns a Variable that holds the graph connection. In that case the computation is invoked immediately only when nnabla.auto_forward or nnabla.set_auto_forward(True) is used.
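For example (a minimal sketch of the operator behavior described above):

import numpy as np
import nnabla as nn

x = nn.Variable.from_numpy_array(np.ones(2))
y = nn.Variable.from_numpy_array(np.ones(2))

z = x + y    # Variable: the graph connection is kept
z.forward()  # computed on demand
print(z.d)   # => [2. 2.]

a = nn.NdArray.from_numpy_array(np.ones(2))
w = x + a      # NdArray operand: computed immediately
print(w.data)  # => [2. 2.]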

Note

Relational operators == and != of two Variable s are defined as an address comparison of underlying C++ instances (nbla::Variable). Also, hash() function, which is often used in a key for set and dict, is based on the address.

Parameters
  • shape (Iterable of int) – Shape of variable.

  • need_grad (bool) – Flag for backprop or not.

apply(self, **kwargs)

Helper for setting property, then return self.

backward(self, grad=1, bool clear_buffer=False, communicator_callbacks=None, function_pre_hook=None, function_post_hook=None)

Performs a backward propagation starting from this variable until the root variable(s) is/are reached in the function graph. The propagation will stop at a variable with need_grad=False.

Parameters
  • grad (scalar, numpy.ndarray, nnabla.NdArray, or None) – The gradient signal value(s) of this variable. The default value 1 is used in usual neural network training. This option is useful if you have a gradient computation module outside NNabla and want to use its result as a gradient signal. Note that this does not modify the grad values of this variable; the received values are assigned to its gradient temporarily. Also, if the Variable on which you call backward is unlinked from another variable, and the corresponding Variable holds pre-computed gradient values, you need to set grad=None; otherwise, for that backward pass (propagated from the unlinked Variable), the pre-computed gradient values are ignored.

  • clear_buffer (bool) – Clears the no longer referenced variables during backpropagation to save memory. Note that all unnecessary intermediate variables will be cleared unless set explicitly as persistent=True.

  • communicator_callbacks (nnabla.CommunicatorBackwardCallback or list of nnabla.CommunicatorBackwardCallback) – The callback functions invoked when 1) backward computation of each function is finished and 2) all backward computation is finished.

  • function_pre_hook (callable) – This callable object is called immediately before each function is executed. It must take Function as an input. The default is None.

  • function_post_hook (callable) – This callable object is called immediately after each function is executed. It must take Function as an input. The default is None.

Example

We first explain simple backward usage.

import nnabla as nn
import nnabla.functions as F
import nnabla.parametric_functions as PF
import numpy as np
import nnabla.initializer as I

rng = np.random.RandomState(217)
initializer = I.UniformInitializer((-0.1, 0.1), rng=rng)

x = nn.Variable((8, 3, 32, 32))
x.d = np.random.random(x.shape)  # random input, just for example.

y0 = PF.convolution(x, outmaps=64, kernel=(3, 3), pad=(1, 1), stride=(2, 2), w_init=initializer, name="conv1", with_bias=False)
y1 = F.relu(y0)
y2 = PF.convolution(y1, outmaps=128, kernel=(3, 3), pad=(1, 1), stride=(2, 2), w_init=initializer, name="conv2", with_bias=False)
y3 = F.relu(y2)
y4 = F.average_pooling(y3, kernel=y3.shape[2:])
y5 = PF.affine(y4, 1, w_init=initializer)
loss = F.mean(F.abs(y5 - 1.))
loss.forward()  # Execute forward

# We can check the current gradient of parameter.
print(nn.get_parameters()["conv1/conv/W"].g)

Output :

[[[[0. 0. 0.]
   [0. 0. 0.]
   [0. 0. 0.]]
      ...

Initially all the gradient values should be zero. Then let’s see what happens after calling backward.

loss.backward()
print(nn.get_parameters()["conv1/conv/W"].g)

Output :

[[[[ 0.00539637  0.00770839  0.0090611 ]
   [ 0.0078223   0.00978992  0.00720569]
   [ 0.00879023  0.00578172  0.00790895]]
                     ...

Now we know the gradient values are computed and registered by calling backward. Note that calling backward successively accumulates the result. It means if we execute backward again, we get the doubled result.

loss.backward()  # execute again.
print(nn.get_parameters()["conv1/conv/W"].g)

We can see it’s accumulated.

[[[[ 0.01079273  0.01541678  0.0181222 ]
   [ 0.01564459  0.01957984  0.01441139]
   [ 0.01758046  0.01156345  0.0158179 ]]
                     ...

Next is an advanced usage with an unlinked variable (please refer to get_unlinked_variable). We use the same network, but it is separated by the unlinked variable.

import nnabla as nn
import nnabla.functions as F
import nnabla.parametric_functions as PF
import numpy as np
import nnabla.initializer as I

rng = np.random.RandomState(217)  # use the same random seed.
initializer = I.UniformInitializer((-0.1, 0.1), rng=rng)

x = nn.Variable((8, 3, 32, 32))
x.d = np.random.random(x.shape)  # random input, just for example.

y0 = PF.convolution(x, outmaps=64, kernel=(3, 3), pad=(1, 1), stride=(2, 2), w_init=initializer, name="conv1", with_bias=False)
y1 = F.relu(y0)
y2 = PF.convolution(y1, outmaps=128, kernel=(3, 3), pad=(1, 1), stride=(2, 2), w_init=initializer, name="conv2", with_bias=False)
y3 = F.relu(y2)
y3_unlinked = y3.get_unlinked_variable()  # the computation graph is cut apart here.
y4 = F.average_pooling(y3_unlinked, kernel=y3_unlinked.shape[2:])
y5 = PF.affine(y4, 1, w_init=initializer)
loss = F.mean(F.abs(y5 - 1.))

# Execute forward.
y3.forward()  # you need to execute forward at the unlinked variable first.
loss.forward()  # Then execute forward at the leaf variable.

# Execute backward.
loss.backward()  # works, but backpropagation stops at y3_unlinked.
print(nn.get_parameters()["conv1/conv/W"].g)  # no gradient registered yet.

Output :

[[[[0. 0. 0.]
   [0. 0. 0.]
   [0. 0. 0.]]
      ...

We can confirm that backpropagation stops at y3_unlinked. Then let's see how to execute backpropagation all the way to the root variable (x). Since it is a little bit complicated, let us first give an example of a common pitfall. Note that this is an incorrect way, intended just to show backward's behavior.

y3.backward()  # this works, but computed gradient values are not correct.
print(nn.get_parameters()["conv1/conv/W"].g)

Output :

[[[[ 17.795254    23.960905    25.51168   ]
   [ 20.661646    28.484127    19.406212  ]
   [ 26.91042     22.239697    23.395714  ]]
                     ...

Note that this is a wrong result. The gradient held by y3_unlinked has been totally ignored. As described above, when backward is called without an argument, the gradient of the leaf variable where you call backward is considered to be 1.

To execute backpropagation over the 2 separate graphs correctly, we need to specify grad=None as shown below; then the gradient currently held by that variable is used for the computation. (y3.backward(grad=y3_unlinked.g) does the same thing.)

#reset all the gradient values.
for v in nn.get_parameters().values():
    v.g = 0.
for v in [y0, y1, y2, y3, y4, y5]:
    v.g = 0.  # need to reset all the gradient values.

loss.backward()  # backpropagation starts from the leaf variable again.
y3.backward(grad=None)  # By this, it can take over the gradient held by y3_unlinked.
print(nn.get_parameters()["conv1/conv/W"].g)  # correct result.

This time you should have the same result.

[[[[ 0.00539637  0.00770839  0.0090611 ]
   [ 0.0078223   0.00978992  0.00720569]
   [ 0.00879023  0.00578172  0.00790895]]
                     ...
bool_fill_(self, mask, value)

Return a new, but in-place, nnabla.Variable filled with value where mask is non-zero.

Parameters
  • mask (nnabla.NdArray) – Mask with which to fill. Non-zero/zero elements are supposed to be a binary mask as 1/0. No gradients are computed with respect to mask.

  • value (float) – The value to fill.

Returns

nnabla.Variable

Clear all intermediate functions and variables.

This method clears all intermediate functions and variables up to this variable in the forward pass, and is useful for truncated backpropagation through time (truncated BPTT) in a dynamic graph.

d

Returns the values held by this variable, as a numpy.ndarray. Note that the values are referenced (not copied). Therefore, the modification of the returned ndarray will affect the data of the NNabla array. This method can be called as a setter to set the value held by this variable. Refer to the documentation of the setter nnabla.NdArray.data for detailed behaviors of the setter.

Parameters

value (numpy.ndarray) (optional) –

Returns

numpy.ndarray

data

Returns the data held by this variable, as a NdArray. This can also be used as a setter.

Parameters

ndarray (NdArray) – NdArray object. Size must be the same as this Variable.

Returns

NdArray

forward(self, bool clear_buffer=False, bool clear_no_need_grad=False, function_pre_hook=None, function_post_hook=None)

Performs a forward propagation from the root node to this variable. The forward propagation is performed on a subset of variables determined by the dependency of this variable. The subset is recursively constructed by tracking variables that the variables in the subset depend on, starting from this variable, until it reaches the root variable(s) in the function graph. See also forward_all, which performs forward computations for all variables within the input graph.

Parameters
  • clear_buffer (bool) – Clear the no longer referenced variables during forward propagation to save memory. This is usually set as True in an inference or a validation phase. Default is False. Note that all unnecessary intermediate variables will be cleared unless set explicitly as persistent=True.

  • clear_no_need_grad (bool) – Clear the unreferenced variables with need_grad=False during forward propagation. True is usually used when calling this during training. This is ignored when clear_buffer=True.

  • function_pre_hook (callable) – This callable object is called immediately before each function is executed. It must take Function as an input. The default is None.

  • function_post_hook (callable) – This callable object is called immediately after each function is executed. It must take Function as an input. The default is None.

static from_numpy_array(data, grad=None, need_grad=None)

Create a Variable object from Numpy array(s).

The data is initialized with the given Numpy array, as well as grad if given.

The shape is also determined by the given array.

Parameters
  • data (ndarray) – Values copied to the data of the created Variable.

  • grad (ndarray) – Values copied to the grad of the created Variable.

  • need_grad (bool) – Flag for backprop or not.

Returns

Variable

function_references

Returns a list of functions which take this variable as an input. This method can be called only as a getter.

Returns

list of nnabla.function.Function

g

Returns the gradient values held by this variable, as a numpy.ndarray. Note that the values are referenced (not copied). Therefore, modifying the returned ndarray will affect the gradient of the NNabla array. This method can be called as a setter to set the gradient held by this variable. Refer to the documentation of the setter nnabla.NdArray.data for detailed behaviors of the setter.

Parameters

value (numpy.ndarray) –

Returns

numpy.ndarray

get_unlinked_variable(self, need_grad=None)

Gets an unlinked (forgetting parent) variable that shares a Variable buffer instance.

Parameters

need_grad (bool, optional) – By default, the unlinked variable will have the same need_grad flag as this variable instance. By specifying a boolean value, the new need_grad flag will be set on the unlinked variable. It is recommended to specify this option explicitly to avoid unintended behavior.

Returns: Variable

Note

The unlinked Variable behaves equivalently to the original variable in comparison operators and the hash function, regardless of whether the need_grad attribute is changed. See the note in the Variable class documentation. Also, for backward execution with unlinked variable(s), please refer to backward and its example.

Example

import numpy as np
import nnabla as nn
import nnabla.parametric_functions as PF

x = nn.Variable.from_numpy_array(np.array([[1, 2], [3, 4]]))
y = PF.affine(x, 4, name="y")

# Create a new variable of which graph connection is unlinked.
# It is recommended to specify the need_grad option explicitly.
z = y.get_unlinked_variable(need_grad=False)

print(y.parent)
# Affine
print(z.parent)  # z is unlinked from its parent function, but shares the buffers of y.
# None
grad

Returns the gradient held by this variable, as a NdArray. This can also be used as a setter.

Parameters

ndarray (NdArray) – NdArray object. Size must be the same as this Variable.

Returns

NdArray

info

Information of the variable.

Type

object

masked_fill_()

Variable.bool_fill_(self, mask, value)

Return a new, but in-place, nnabla.Variable filled with value where mask is non-zero.

Parameters
  • mask (nnabla.NdArray) – Mask with which to fill. Non-zero/zero elements are supposed to be a binary mask as 1/0. No gradients are computed with respect to mask.

  • value (float) – The value to fill.

Returns

nnabla.Variable

ndim

Gets the number of dimensions of this variable.

Returns

int

need_grad

Gets or sets a boolean indicating whether backpropagation is performed at this variable.

Parameters

b (bool) – Whether backpropagation is performed at this variable.

Returns

Whether this variable requires gradient or not.

Return type

bool

no_grad(self)

No gradients for the whole network.

This method is like nnabla.no_grad but can be used only for static networks; it is useful, for example, when the network is loaded from the NNP format.

Example

x = nn.Variable.from_numpy_array([2, 3])
y = <Network>(x).no_grad()
parent

Returns the parent function of this variable. This method can also be called as a setter.

Parameters

func (nnabla.function.Function) –

Returns

nnabla.function.Function

persistent

Returns the persistent flag of this variable. If True, the variable is not cleared even if clear options in nnabla._variable.Variable.forward() and nnabla._variable.Variable.backward() are enabled. This is useful when you debug the variable values, or log them. This method can also be called as a setter.

Parameters

b (bool) –

Returns

bool
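For example, marking an intermediate variable persistent keeps its values available even when buffers are cleared (a minimal sketch):

import numpy as np
import nnabla as nn
import nnabla.functions as F

x = nn.Variable.from_numpy_array(np.ones((2, 3)))
h = F.relu(x)
h.persistent = True  # keep h even when buffers are cleared
y = F.sum(h)

y.forward(clear_buffer=True)
print(h.d)           # still available for debugging or logging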

recompute

Gets or sets a boolean indicating whether its data is cleared during forward propagation and recomputation is performed during backward propagation.

Parameters

b (bool) – Whether recomputation is performed during backward propagation.

Returns

Whether this variable is recomputed during backward propagation.

Return type

bool

reset_shape(self, shape, force=False)

Resizes the shape of the variable to a specified shape.

Parameters
  • shape (Iterable of int) – Target shape.

  • force (bool) – Flag to force reshape.

Note

This method destructively changes the shape of the target variable. For safety, reshape() should be used instead.

Returns

None

reshape(self, shape, unlink=False)

Returns a new variable, where this variable is reshaped to a specified shape.

Parameters
  • shape (Iterable of int) – Target shape.

  • unlink (bool) – Unlink graph connection. Or, keep graph connection, i.e. the gradient will be backprop-ed to the original variable.

Returns

Variable
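For example (a minimal sketch):

import nnabla as nn

x = nn.Variable((2, 8))
y = x.reshape((4, 4))              # a new Variable viewing the same 16 elements
print(y.shape)                     # => (4, 4)
z = x.reshape((16,), unlink=True)  # no gradient flows back to x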

rewire_on(self, var)

Rewire a successor graph of this variable on top of var.

Parameters

var (nnabla.Variable) – The array elements and the parent function of var are copied to self as references. Note that the parent function of var is removed.

Example

# A. Create a graph A.
xa = nn.Variable((2, 8), need_grad=True)
ya = F.tanh(PF.affine(xa, 10, name='a'))

# B. Create a graph B.
xb = nn.Variable((2, 16), need_grad=True)
yb = F.tanh(PF.affine(
    F.tanh(PF.affine(xb, 8, name='b1')),
    8, name='b2'))

# C. Rewire the graph A on top of B such that
#    `xb->B->(yb->)xa->A->ya`. Note `yb` is gone.
xa.rewire_on(yb)

# D. Execute the rewired graph.
xb.d = 1
ya.forward()
ya.backward()
shape

Gets the shape of the variable.

Returns

tuple of int

size

Gets the size of the variable.

Returns

int

size_from_axis(self, axis=-1)

Gets the product of the variable dimensions from the provided axis onward (i.e. the size of shape[axis:]). With the default axis=-1, the total size is returned.

Example

a = nnabla.Variable([10,9])
a.size_from_axis()
# ==> 90
a.size_from_axis(0)
# ==> 90
a.size_from_axis(1)
# ==> 9
a.size_from_axis(2)
# ==> 1
Parameters

axis (int, optional) – -1 as default

Returns

int

unlinked(self, need_grad=None)

This function is deprecated, use get_unlinked_variable instead.

visit(self, f)

Visit functions recursively in forward order.

Parameters

f (function) – Function object which takes nnabla._function.Function object as an argument.

Returns

None

Example

import nnabla as nn
import nnabla.functions as F
import nnabla.parametric_functions as PF

# Define a simple network-graph
def network_graph(x, maps=16, test=False):
    h = x
    h = PF.convolution(h, maps, kernel=(3, 3), pad=(1, 1), name="first-conv", with_bias=False)
    h = F.average_pooling(h, h.shape[2:])
    pred = PF.affine(h, 10, name="pred")
    return pred

# You can modify this PrintFunc to get the other information like inputs(nnabla_func.inputs), outputs and arguments(nnabla_func.info.args) of nnabla functions.
class PrintFunc(object):
    def __call__(self, nnabla_func):
        print(nnabla_func.info.type_name)

x = nn.Variable([1, 3, 16, 16])
output = network_graph(x)
output.visit(PrintFunc())

Output :

Convolution
AveragePooling
Affine
visit_check(self, f)

Visit functions recursively in forward order.

Note

If any evaluation of the function object returns True, the visit propagation stops immediately and visit_check returns True.

Parameters

f (function) – Function object which takes nnabla._function.Function object as an argument.

Returns

bool – True if any of the function object calls returns True.

Example

Define a simple network-graph where AveragePooling function can be added explicitly as below:

def network_graph(x, add_avg_pool=False, maps=16, test=False):
    h = x
    h = PF.convolution(h, maps, kernel=(3, 3), pad=(1, 1), name="first-conv", with_bias=False)
    if add_avg_pool :
        h = F.average_pooling(h, h.shape[2:])
    else :
        h = F.relu(h)
    pred = PF.affine(h, 10, name="pred")
    return pred

# Define 'PrintFunc()' to check whether "AveragePooling" function exists in the network-graph
class PrintFunc(object):
    def __call__(self, nnabla_func):
        if nnabla_func.info.type_name =="AveragePooling" :
            print("{} exists in the graph".format(nnabla_func.info.type_name))
            return True
        else :
            return False

Create a network-graph which has AveragePooling function and call visit_check() method :

x = nn.Variable([1, 3, 16, 16])
output = network_graph(x, add_avg_pool=True)  #Adding AveragePooling function to the graph
print("The return value of visit_check() method is : {}".format(output.visit_check(PrintFunc())))

Output :

AveragePooling exists in the graph
The return value of visit_check() method is : True

Create a network-graph which doesn’t have AveragePooling function and call visit_check() method :

nn.clear_parameters()                         # call this in case you want to run the following code again
output = network_graph(x, add_avg_pool=False) # Exclusion of AveragePooling function in the graph
print("The return value of visit_check() method is : {}".format(output.visit_check(PrintFunc())))

Output :

The return value of visit_check() method is : False

Computation Graph

nnabla.forward_all(variables, bool clear_buffer=False, bool clear_no_need_grad=False, function_pre_hook=None, function_post_hook=None)

Performs a forward propagation up to variables specified as the 1st argument. See also forward.

Parameters
  • clear_buffer (bool) –

    Clear the no longer referenced variables during forward propagation to save memory. This is usually set as True in an inference or a validation phase. Default is False. Note that starting variable and destination variable of the input graph will not be cleared, regardless of their persistent flag. All intermediate variables will be cleared unless set explicitly as persistent=True. For example,

    forward_all([h_i, y], clear_buffer=True)
    

    will clear all intermediate variables between h_i and y unless set explicitly as persistent=True, but h_i and y will not be cleared regardless of their persistent flag.

  • clear_no_need_grad (bool) – Clear the unreferenced variables with need_grad=False during forward propagation. True is usually used when calling this during training. This is ignored when clear_buffer=True.

  • function_pre_hook (callable) – This callable object is called immediately before each function is executed. It must take Function as an input. The default is None.

  • function_post_hook (callable) – This callable object is called immediately after each function is executed. It must take Function as an input. The default is None.

Example

import numpy as np
import nnabla as nn
import nnabla.parametric_functions as PF

# Create a graph which has two outputs
x = nn.Variable.from_numpy_array(np.array([[1, 2], [3, 4]]))
y = PF.affine(x, 4, name="y")
z = PF.affine(x, 8, name="z")

# Execute a forward propagation recursively up to y and z
nn.forward_all([y, z], clear_buffer=True)
nnabla.no_grad(no_grad_=True)[source]

No gradients for the whole network.

Within this context, no gradients are required when creating the network, so that when the forward pass is executed, all intermediate buffers except for the leaves of the network are freed immediately, which saves memory.

This is useful for example when an output of a pre-trained network is used for an input to another network, where the first pre-trained network does not need to be fine-tuned, but the other network is optimized.

Parameters

no_grad (bool) – No gradient flag. Default is True.

Example:

with nn.no_grad():
    output0 = <Network0>(<input0>)

output1 = <Network1>(<input1>, output0)
loss = <Loss>(output1, <ground_truth>)
loss.forward(clear_no_need_grad=True)

This context also works in the dynamic mode.

with nn.auto_forward(), nn.no_grad():
    output0 = <Network0>(<input0>)

Note

When working with a static network, the need_grad property of the input (e.g., the input image) must be False, and do not forget to call <root>.forward(clear_no_need_grad=True); otherwise, the intermediate buffers are not freed as expected.
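
For concreteness, the following is a minimal runnable sketch of this pattern, in which a frozen affine layer stands in for the pre-trained network (the layer names and shapes are illustrative only, not part of the API):

import numpy as np
import nnabla as nn
import nnabla.parametric_functions as PF

x = nn.Variable.from_numpy_array(np.random.randn(4, 8).astype(np.float32))
x.need_grad = False                          # required in the static-graph case
with nn.no_grad():
    feat = PF.affine(x, 16, name="frozen")   # stands in for the pre-trained network
y = PF.affine(feat, 1, name="head")          # this part remains trainable
y.forward(clear_no_need_grad=True)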

Functions

All NNabla functions are derived from the nnabla.function.Function class.

Function
class nnabla.function.Function

Function interface class.

Instances of nnabla.function.Function are not directly created by users. It is indirectly created by the functions available in nnabla.functions. These functions return nnabla.Variable (s) holding the created function instance as the parent property.

args

Experimental

Get args of the function.

backward(self, inputs, outputs, accum=None)
forward(self, inputs, outputs)
grad_depends_output_data(self, int i, int o)
info

The function information object; for example, info.type_name holds the type name of the function.

Type

info

inplace_data(self, int i)
inplace_data_with(self, int i)
min_outputs(self)
need_setup_recompute(self, int o)
recompute(self, inputs, outputs)
setup(self, inputs, outputs)
setup_recompute(self, inputs, outputs)
tags

Experimental

Get tags of the function.

class nnabla.function.PythonFunction(ctx=None)

Creates a user-defined custom function in the subclass.

To implement a naive multiplication function of two variables using PythonFunction:

import nnabla as nn
import nnabla.functions as F
from nnabla.function import PythonFunction

class Mul2(PythonFunction):

    def __init__(self, ctx):
        super(Mul2, self).__init__(ctx)

    @property
    def name(self):
        return self.__class__.__name__

    def min_outputs(self):
        return 1

    def setup_impl(self, inputs, outputs):
        i0 = inputs[0]
        i1 = inputs[1]
        assert i0.shape == i1.shape, "Shapes of inputs are different."
        o0 = outputs[0]
        o0.reset_shape(i0.shape, True)

    def forward_impl(self, inputs, outputs):
        x0 = inputs[0].data
        x1 = inputs[1].data
        y = outputs[0].data

        # We can also write like, y.copy_from(x0 * x1)
        y.copy_from(F.mul2(x0, x1))

    def backward_impl(self, inputs, outputs, propagate_down, accum):
        # Data of inputs and outputs
        x0 = inputs[0].data
        x1 = inputs[1].data
        y = outputs[0].data
        # Grads of inputs and outputs
        dx0 = inputs[0].grad
        dx1 = inputs[1].grad
        dy = outputs[0].grad

        # backward w.r.t. x0
        if propagate_down[0]:
            if accum[0]:
                dx0 += F.mul2(dy, x1)
            else:
                dx0.copy_from(F.mul2(dy, x1))

        # backward w.r.t. x1
        if propagate_down[1]:
            if accum[1]:
                dx1 += F.mul2(dy, x0)
            else:
                dx1.copy_from(F.mul2(dy, x0))

    def grad_depends_output_data(self, i, o):
        return False

    def grad_depends_input_data(self, i, j):
        return True

def mul2(x, y, ctx=None):
    func = Mul2(ctx)
    return func(x, y)
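
A possible usage sketch of the mul2 wrapper defined above (assuming the definitions from the example are in scope; the input values are arbitrary):

import numpy as np

x0 = nn.Variable.from_numpy_array(np.array([1.0, 2.0], dtype=np.float32))
x1 = nn.Variable.from_numpy_array(np.array([3.0, 4.0], dtype=np.float32))
y = mul2(x0, x1)   # builds the graph using the custom function
y.forward()
print(y.d)         # [3. 8.]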
__init__(self, ctx=None)
Parameters

ctx (nnabla.Context) – Context used for the forward and backward pass. If not specified, the current context is used.

backward_impl(self, inputs, outputs, propagate_down, accum)

Backward method.

Parameters
property ctx

Return the context if it is set in the constructor; otherwise return the global context.

forward_impl(self, inputs, outputs)

Forward method.

Parameters
grad_depends_input_data(self, i, j)

Check whether the i-th input's gradient computation requires the j-th input's data or not.

Parameters
grad_depends_output_data(self, i, o)

Check whether the i-th input's gradient computation requires the o-th output's data or not.

Parameters
min_outputs(self)

Minimum number of outputs of the function.

property name

Name of the function.

setup_impl(self, inputs, outputs)

Setup method.

Parameters
List of Functions

The nnabla.functions module provides various types of functions listed below. These functions take nnabla.Variable (s) as their leading argument(s), followed by options specific to each function.

Note

The functions can also take NdArray (s) as inputs instead of Variable (s). In that case, the function operation is executed immediately, and NdArray (s) holding the output values of the operation are returned. We call this “Imperative Mode” (NdArray + Functions).
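
For illustration, here is a small sketch of the imperative mode using relu (the array values are arbitrary):

>>> import numpy as np, nnabla as nn, nnabla.functions as F
>>> x = nn.NdArray.from_numpy_array(np.array([[1., -2.], [3., -4.]], dtype=np.float32))
>>> y = F.relu(x)   # executes immediately; y is an NdArray
>>> y.data
array([[1., 0.],
       [3., 0.]], dtype=float32)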

Neural Network Layers
nnabla.functions.affine(x, weight, bias=None, base_axis=1, n_outputs=- 1, outputs=None)[source]

Affine layer, also called the fully connected layer. It calculates:

\[{\mathbf y} = {\mathbf A} {\mathbf x} + {\mathbf b}.\]

where \({\mathbf x}\) is the input and \({\mathbf y}\) is the output.

Parameters
  • x (Variable) – Input N-D array with shape (\(M_0 \times ... \times M_{B-1} \times D_B \times ... \times D_N\)). Dimensions before and after base_axis are flattened as if it is a matrix.

  • weight (Variable) – Weight matrix with shape (\((D_B \times ... \times D_N) \times L_{0} \times \ldots \times L_{I}\)) [parameter]

  • bias (Variable) – Bias vector (\(L_{0} \times \ldots \times L_{I}\)) [optional][parameter]

  • base_axis (int) – Base axis of the Affine operation. Dimensions up to base_axis are treated as the sample dimensions. [default= 1 ]

Returns

\((B + 1)\)-D array. (\(M_0 \times ... \times M_{B-1} \times L_{0} \times \ldots \times L_{I}\))

Return type

Variable
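
Example

A minimal shape-checking sketch (the shapes below are illustrative only):

>>> import nnabla as nn, nnabla.functions as F
>>> x = nn.Variable((4, 3))   # batch of 4 samples with 3 features each
>>> w = nn.Variable((3, 5))   # maps 3 input features to 5 outputs
>>> b = nn.Variable((5,))
>>> F.affine(x, w, b).shape
(4, 5)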

Note

All nnabla functions in nnabla.functions are decorated with the nnabla.function_bases.function_api decorator, which queries the current context and passes it into the first argument of the original function. The original function always takes a context as the first argument.

nnabla.functions.convolution(x, weight, bias=None, base_axis=1, pad=None, stride=None, dilation=None, group=1, channel_last=False, n_outputs=- 1, outputs=None)[source]

N-D Convolution with bias.

See references for dilated convolution (a.k.a. atrous convolution).

References

Note

Convolution is a computationally intensive operation that should preferably be run with the cudnn backend. NNabla then uses CuDNN library functions to determine and cache the fastest algorithm for the given set of convolution parameters, which results in additional memory consumption that may pose a problem for GPUs with insufficient memory size. In that case, the NNABLA_CUDNN_WORKSPACE_LIMIT environment variable can be used to restrict the choice of algorithms to those that fit the given workspace memory limit, expressed in bytes. In some cases it may also be desired to restrict the automatic search to algorithms that produce deterministic (reproducible) results. This can be requested by setting the environment variable NNABLA_CUDNN_DETERMINISTIC to a non-zero value.

Parameters
  • x (Variable) – \((B + 1 + N)\)-D array (\(M_1 \times ... \times M_B \times C \times L_1 \times ... \times L_N\)).

  • weight (Variable) – \((2 + N)\)-D array (\(C' \times C \times K_1 \times ... \times K_N\)). [parameter]

  • bias (Variable) – Bias vector (\(C'\)). [optional][parameter]

  • base_axis (int) – base axis \(B\). [default= 1 ]

  • pad (tuple of int) – Padding sizes for dimensions. [default= (0,) * (len(x.shape) - (base_axis+1)) ]

  • stride (tuple of int) – Stride sizes for dimensions. [default= (1,) * (len(x.shape) - (base_axis+1)) ]

  • dilation (tuple of int) – Dilation sizes for dimensions. [default= (1,) * (len(x.shape) - (base_axis+1)) ]

  • group (int) – Number of groups of channels. This makes the connection across channels sparser, by grouping connections along the mapping direction. [default= 1 ]

  • channel_last (bool) – If True, the last dimension is considered as channel dimension, a.k.a NHWC order. [default= False ]

Returns

\((B + 1 + N)\)-D array (\(M_1 \times ... \times M_B \times C' \times L'_1 \times ... \times L'_N\)).

A spatial size of the output is calculated as

\[L'_i = \frac{L_i + 2 p_i - d_i (k_i - 1) - 1}{s_i} + 1,\]

where \(L_i\) is the spatial size, \(p_i\) is the padding, \(d_i\) is the dilation, \(k_i\) is the kernel size, and \(s_i\) is the stride for \(i\)-th spatial dimension. The same calculation can also be applied to the other spatial dimensions.

Return type

Variable
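
Example

A minimal shape-checking sketch of the output size formula above: with \(L = 32\), \(p = 1\), \(d = 1\), \(k = 3\), and \(s = 2\), it gives \(L' = (32 + 2 - 2 - 1)/2 + 1 = 16\) (the shapes are illustrative only):

>>> import nnabla as nn, nnabla.functions as F
>>> x = nn.Variable((1, 3, 32, 32))   # NCHW input
>>> w = nn.Variable((16, 3, 3, 3))    # 16 filters over 3 channels, 3x3 kernel
>>> F.convolution(x, w, pad=(1, 1), stride=(2, 2)).shape
(1, 16, 16, 16)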

Note

All nnabla functions in nnabla.functions are decorated with the nnabla.function_bases.function_api decorator, which queries the current context and passes it into the first argument of the original function. The original function always takes a context as the first argument.

nnabla.functions.depthwise_convolution(x, weight, bias=None, base_axis=1, pad=None, stride=None, dilation=None, multiplier=1, n_outputs=- 1, outputs=None)[source]

N-D Depthwise Convolution with bias.

References

Parameters
  • x (Variable) – \((B + 1 + N)\)-D array (\(M_1 \times ... \times M_B \times C \times L_1 \times ... \times L_N\)).

  • weight (Variable) – \((1 + N)\)-D array (\(C \times K_1 \times ... \times K_N\)). [parameter]

  • bias (Variable) – Bias vector (\(C'\)). [optional][parameter]

  • base_axis (int) – base axis \(B\). [default= 1 ]

  • pad (tuple of int) – Padding sizes for dimensions. [default= (0,) * (len(x.shape) - (base_axis+1)) ]

  • stride (tuple of int) – Stride sizes for dimensions. [default= (1,) * (len(x.shape) - (base_axis+1)) ]

  • dilation (tuple of int) – Dilation sizes for dimensions. [default= (1,) * (len(x.shape) - (base_axis+1)) ]

  • multiplier (int) – Number of output feature maps per input feature map. [default= 1 ]

Returns

\((B + 1 + N)\)-D array (\(M_1 \times ... \times M_B \times C' \times L'_1 \times ... \times L'_N\)).

The output map size \(C'\) is \(C\) multiplied by \(m\)

\[C' = m \times C,\]

where \(m\) is the multiplier.

A spatial size of the output is calculated as

\[L'_i = \frac{L_i + 2 p_i - d_i (k_i - 1) - 1}{s_i} + 1,\]

where \(L_i\) is the spatial size, \(p_i\) is the padding, \(d_i\) is the dilation, \(k_i\) is the kernel size, and \(s_i\) is the stride for \(i\)-th spatial dimension. The same calculation can also be applied to the other spatial dimensions.

Return type

Variable
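
Example

A minimal shape-checking sketch with multiplier=1, so that one 3x3 kernel is applied per input channel (the shapes are illustrative only):

>>> import nnabla as nn, nnabla.functions as F
>>> x = nn.Variable((1, 8, 16, 16))
>>> w = nn.Variable((8, 3, 3))   # (C, K_1, K_2) as described above
>>> F.depthwise_convolution(x, w, pad=(1, 1)).shape
(1, 8, 16, 16)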

Note

All nnabla functions in nnabla.functions are decorated with the nnabla.function_bases.function_api decorator, which queries the current context and passes it into the first argument of the original function. The original function always takes a context as the first argument.

nnabla.functions.deconvolution(x, weight, bias=None, base_axis=1, pad=None, stride=None, dilation=None, group=1, channel_last=False, output_padding=None, n_outputs=- 1, outputs=None)[source]

N-D deconvolution, also known as transposed convolution, operates as the backward pass of convolution (the derivative of the output w.r.t. the input), plus a channel-wise learned bias.

The weights are specified in the same manner as convolution() , as if it was an ordinary convolution function. The forward operation of deconvolution() will then be operationally equivalent to the backward pass of convolution() . Therefore, the number of input channels (can be seen as output channels of forward convolution) is specified in the first dimension, and the number of the output channels divided by the number of groups is specified in the second dimension.

For stride > 1, a parameter-wise identical deconvolution on the output of a convolution may not produce the same output shape as the input to the convolution if, due to striding, the convolution did not fully cover the input spatial dimension. The output_padding parameter can then be used to appropriately increase the calculated output shape. Note that this is used to find the output shape for the deconvolution operation, but not to add zero-padding to the output.

Parameters
  • x (Variable) – \((B + 1 + N)\)-D array (\(M_1 \times ... \times M_B \times C \times L_1 \times ... \times L_N\)).

  • weight (Variable) – \((2 + N)\)-D array (\(C \times C' \times K_1 \times ... \times K_N\)). [parameter]

  • bias (Variable) – Bias vector (\(C'\)). [optional][parameter]

  • base_axis (int) – base axis \(B\). [default= 1 ]

  • pad (tuple of int) – Padding sizes for dimensions. [default= (0,) * (len(x.shape) - (base_axis+1)) ]

  • stride (tuple of int) – Stride sizes for dimensions. [default= (1,) * (len(x.shape) - (base_axis+1)) ]

  • dilation (tuple of int) – Dilation sizes for dimensions. [default= (1,) * (len(x.shape) - (base_axis+1)) ]

  • group (int) – Number of groups of channels. This makes the connection across channels sparser, by grouping connections along the mapping direction. [default= 1 ]

  • channel_last (bool) – If True, the last dimension is considered as channel dimension, a.k.a NHWC order. [default= False ]

  • output_padding (tuple of int) – Additional size added to the output shape. [default= (0,) * (len(x.shape) - (base_axis+1)) ]

Returns

\((B + 1 + N)\)-D array (\(M_1 \times ... \times M_B \times C' \times L'_1 \times ... \times L'_N\)).

A spatial size of the output is calculated as

\[L'_i =s_i (L_i - 1) - 2 p_i + d_i (k_i - 1) + 1,\]

where \(s_i\) is the stride, \(L_i\) is the spatial size, \(p_i\) is the padding, \(d_i\) is the dilation, and \(k_i\) is the kernel size for \(i\)-th spatial dimension. The same calculation can also be applied to the other spatial dimensions.

Return type

Variable
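
Example

A minimal shape-checking sketch of the output size formula above: with \(s = 2\), \(L = 8\), \(p = 0\), \(d = 1\), and \(k = 3\), it gives \(L' = 2 (8 - 1) + (3 - 1) + 1 = 17\) (the shapes are illustrative only):

>>> import nnabla as nn, nnabla.functions as F
>>> x = nn.Variable((1, 16, 8, 8))
>>> w = nn.Variable((16, 8, 3, 3))   # input channels first, as described above
>>> F.deconvolution(x, w, stride=(2, 2)).shape
(1, 8, 17, 17)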

Note

All nnabla functions in nnabla.functions are decorated with the nnabla.function_bases.function_api decorator, which queries the current context and passes it into the first argument of the original function. The original function always takes a context as the first argument.

nnabla.functions.depthwise_deconvolution(x, weight, bias=None, base_axis=1, pad=None, stride=None, dilation=None, divisor=1, n_outputs=- 1, outputs=None)[source]

Depthwise deconvolution computes the transposed depthwise convolution with bias for one-dimensional and two-dimensional input data.

Parameters
  • x (Variable) – \((B + 1 + N)\)-D array (\(M_1 \times ... \times M_B \times C \times L_1 \times ... \times L_N\)).

  • weight (Variable) – \((1 + N)\)-D array (\(C \times K_1 \times ... \times K_N\)). [parameter]

  • bias (Variable) – Bias vector (\(C'\)). [optional][parameter]

  • base_axis (int) – base axis \(B\). [default= 1 ]

  • pad (tuple of int) – Padding sizes for dimensions. [default= (0,) * (len(x.shape) - (base_axis+1)) ]

  • stride (tuple of int) – Stride sizes for dimensions. [default= (1,) * (len(x.shape) - (base_axis+1)) ]

  • dilation (tuple of int) – Dilation sizes for dimensions. [default= (1,) * (len(x.shape) - (base_axis+1)) ]

  • divisor (int) – Number of input feature maps per output feature map. [default= 1 ]

Returns

\((B + 1 + N)\)-D array (\(M_1 \times ... \times M_B \times C' \times L'_1 \times ... \times L'_N\)).

The output map size \(C'\) is \(C\) divided by \(d\)

\[C' = \frac{C}{d},\]

where \(d\) is the divisor.

A spatial size of the output is calculated as

\[L'_i =s_i (L_i - 1) - 2 p_i + d_i (k_i - 1) + 1,\]

where \(s_i\) is the stride, \(L_i\) is the spatial size, \(p_i\) is the padding, \(d_i\) is the dilation, and \(k_i\) is the kernel size for \(i\)-th spatial dimension. The same calculation can also be applied to the other spatial dimensions.

Return type

Variable

Note

All nnabla functions in nnabla.functions are decorated with the nnabla.function_bases.function_api decorator, which queries the current context and passes it into the first argument of the original function. The original function always takes a context as the first argument.

nnabla.functions.deformable_convolution(x, weight, offset, mask=None, bias=None, base_axis=1, pad=None, stride=None, dilation=None, group=1, deformable_group=1, channel_last=False, n_outputs=- 1, outputs=None)[source]

2-D Deformable Convolution with bias. The offsets and mask must be computed externally by a separate convolution with a fixed number of output channels. The mask should be normalized to the \([0,1]\) interval.

\[\begin{eqnarray} y(p) = \sum_{k=1}^{K} w_k \cdot x(p + p_k + \Delta p_k) \cdot \Delta m_k, \end{eqnarray}\]

where \(x\) and \(y\) are input and output, \(w_k\) is the weight, \(p\) is the pixel location of interest, \(p_k\) is the fixed displacement e.g., \(p_k \in \{(-1, -1), (-1, 0), \ldots (1, 1)\}\) for the 2D 3x3 receptive field, \(\Delta p_k\) is the learnable displacement, and \(\Delta m_k\) is the learnable scale normalized in \([0, 1]\) by a function like the sigmoid. Note that \(\Delta p_k\) and \(\Delta m_k\) are sample-dependent, location-dependent, and feature-independent.

References

Parameters
  • x (Variable) – \((B + 1 + N)\)-D array (\(M_1 \times ... \times M_B \times C \times L_1 \times ... \times L_N\)).

  • weight (Variable) – \((2 + N)\)-D array (\(C' \times C \times K_1 \times ... \times K_N\)). [parameter]

  • offset (Variable) – Offsets for deformable convolutions. Shape is fixed to \((N, deformable_group \times 2 \times Kh \times Kw, H, W)\). Offsets must be calculated externally through a separate convolution layer.

  • mask (Variable) – Normalized mask for deformable convolutions v2. Shape is fixed to \((N, deformable_group \times Kh \times Kw, H, W)\). Masks must be calculated externally together with the offsets through a separate convolution layer. [optional]

  • bias (Variable) – Bias vector (\(C'\)). [optional][parameter]

  • base_axis (int) – base axis \(B\). [default= 1 ]

  • pad (tuple of int) – Padding sizes for dimensions. [default= (0,) * (len(x.shape) - (base_axis+1)) ]

  • stride (tuple of int) – Stride sizes for dimensions. [default= (1,) * (len(x.shape) - (base_axis+1)) ]

  • dilation (tuple of int) – Dilation sizes for dimensions. [default= (1,) * (len(x.shape) - (base_axis+1)) ]

  • group (int) – Number of groups of channels. This makes the connection across channels sparser, by grouping connections along the mapping direction. [default= 1 ]

  • deformable_group (int) – Number of deformable groups of channels. [default= 1 ]

  • channel_last (bool) – If True, the last dimension is considered as channel dimension, a.k.a NHWC order. [default= False ]

Returns

\((B + 1 + N)\)-D array (\(M_1 \times ... \times M_B \times C' \times L'_1 \times ... \times L'_N\)).

A spatial size of the output is calculated as

\[L'_i = \frac{L_i + 2 p_i - d_i (k_i - 1) - 1}{s_i} + 1,\]

where \(L_i\) is the spatial size, \(p_i\) is the padding, \(d_i\) is the dilation, \(k_i\) is the kernel size, and \(s_i\) is the stride for \(i\)-th spatial dimension. The same calculation can also be applied to the other spatial dimensions.

Return type

Variable

Note

All nnabla functions in nnabla.functions are decorated with the nnabla.function_bases.function_api decorator, which queries the current context and passes it into the first argument of the original function. The original function always takes a context as the first argument.

nnabla.functions.adaptive_separable_convolution(x, vertical_kernel, horizontal_kernel, n_outputs=- 1, outputs=None)[source]

2-D Adaptive Separable Convolution for NCHW (the channel-first tensor). The sample- and pixel-dependent vertical and horizontal kernels are generated dynamically and are used to approximate a feature-independent 2-D kernel in this function. Thus, the kernel used in this function depends on samples and pixels but is independent of features.

If padding is needed, apply the pad function to the input \(x\) before calling this function.

Adaptive separable convolution is formulated as

\[\tilde{I}(c, h, w) = \sum_{j, i} K_v(j, h, w) \times K_h(i, h, w) \times I(c, h + j, w + i),\]

where \(I(c, h, w)\) and \(\tilde{I}(c, h, w)\) are the input and output images at \(c\)-th channel, \(h\)-th height, \(w\)-th width. \(K_v(:, h, w)\) and \(K_h(:, h, w)\) are the vertical and horizontal 1-D kernels at \(h\)-th height and \(w\)-th width.

References

Parameters
  • x (Variable) – \(4-D\) array (\(B \times C \times H \times W\))

  • vertical_kernel (Variable) – \(4-D\) array (\(B \times K_v \times H \times W\))

  • horizontal_kernel (Variable) – \(4-D\) array (\(B \times K_h \times H \times W\))

Returns

\(4-D\) array (\(B \times C \times (H - K_v + 1) \times (W - K_h + 1)\))

Return type

Variable

Note

All nnabla functions in nnabla.functions are decorated with the nnabla.function_bases.function_api decorator, which queries the current context and passes it into the first argument of the original function. The original function always takes a context as the first argument.

nnabla.functions.max_pooling(x, kernel, stride=None, ignore_border=True, pad=None, channel_last=False, n_outputs=- 1, outputs=None)[source]

Max pooling. It pools the maximum values inside the scanning kernel:

\[y_{i_1, i_2} = \max_{k_1, k_2 \in K} (x_{i_1 + k_1, i_2 + k_2})\]

where \(x_{i_1 + k_1, i_2 + k_2}\) is the input and \(y_{i_1, i_2}\) is the output.

Parameters
  • x (Variable) – Input variable.

  • kernel (tuple of int) – Kernel sizes for each spatial axis.

  • stride (tuple of int) – Subsampling factors for each spatial axis. [default= kernel ]

  • ignore_border (bool) – If false, kernels covering borders are also considered for the output. [default= True ]

  • pad (tuple of int) – Border padding values for each spatial axis. Padding will be added both sides of the dimension. [default= (0,) * len(kernel) ]

  • channel_last (bool) – If True, the last dimension is considered as channel dimension, a.k.a NHWC order. [default= False ]

Returns

Maximum values variable

Return type

Variable
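
Example

A minimal shape-checking sketch (the shapes are illustrative only):

>>> import nnabla as nn, nnabla.functions as F
>>> x = nn.Variable((1, 3, 8, 8))
>>> F.max_pooling(x, kernel=(2, 2)).shape   # stride defaults to the kernel size
(1, 3, 4, 4)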

Note

All nnabla functions in nnabla.functions are decorated with the nnabla.function_bases.function_api decorator, which queries the current context and passes it into the first argument of the original function. The original function always takes a context as the first argument.

nnabla.functions.average_pooling(x, kernel, stride=None, ignore_border=True, pad=None, channel_last=False, including_pad=True, n_outputs=- 1, outputs=None)[source]

Average pooling. It pools the averaged values inside the scanning kernel:

\[y_{i_1, i_2} = \frac{1}{K_1 K_2} \sum_{k_1} \sum_{k_2} x_{i_1 + k_1, i_2 + k_2}\]

where \(x_{i_1 + k_1, i_2 + k_2}\) is the input and \(y_{i_1, i_2}\) is the output.

Parameters
  • x (Variable) – Input variable.

  • kernel (tuple of int) – Kernel sizes for each spatial axis.

  • stride (tuple of int) – Subsampling factors for each spatial axis. [default= kernel ]

  • ignore_border (bool) – If false, kernels covering borders are also considered for the output. [default= True ]

  • pad (tuple of int) – Border padding values for each spatial axis. Padding will be added both sides of the dimension. [default= (0,) * len(kernel) ]

  • channel_last (bool) – If True, the last dimension is considered as channel dimension, a.k.a NHWC order. [default= False ]

  • including_pad (bool) – If true, border padding values are considered for the output. [default= True ]

Returns

Average values variable

Return type

Variable

Note

All nnabla functions in nnabla.functions are decorated with the nnabla.function_bases.function_api decorator, which queries the current context and passes it into the first argument of the original function. The original function always takes a context as the first argument.

nnabla.functions.global_average_pooling(x, n_outputs=- 1, outputs=None)[source]

Warning

This function is experimental support, so please do not actively use it.

Global average pooling. It pools an averaged value from the whole image.

Parameters

x (Variable) – Input variable.

Returns

Average values variable

Return type

Variable

Note

All nnabla functions in nnabla.functions are decorated with the nnabla.function_bases.function_api decorator, which queries the current context and passes it into the first argument of the original function. The original function always takes a context as the first argument.

nnabla.functions.sum_pooling(x, kernel, stride=None, ignore_border=True, pad=None, channel_last=False, n_outputs=- 1, outputs=None)[source]

Sum pooling. It pools the summed values inside the scanning kernel:

\[y_{i_1, i_2} = \sum_{k_1} \sum_{k_2} x_{i_1 + k_1, i_2 + k_2}\]

where \(x_{i_1 + k_1, i_2 + k_2}\) is the input and \(y_{i_1, i_2}\) is the output.

Parameters
  • x (Variable) – Input variable.

  • kernel (tuple of int) – Kernel sizes for each spatial axis.

  • stride (tuple of int) – Subsampling factors for each spatial axis. [default= kernel ]

  • ignore_border (bool) – If false, kernels covering borders are also considered for the output. [default= True ]

  • pad (tuple of int) – Border padding values for each spatial axis. Padding will be added both sides of the dimension. [default= (0,) * len(kernel) ]

  • channel_last (bool) – If True, the last dimension is considered as channel dimension, a.k.a NHWC order. [default= False ]

Returns

Summed values variable

Return type

Variable

Note

All nnabla functions in nnabla.functions are decorated with the nnabla.function_bases.function_api decorator, which queries the current context and passes it into the first argument of the original function. The original function always takes a context as the first argument.

nnabla.functions.unpooling(x, kernel, channel_last=False, n_outputs=- 1, outputs=None)[source]

Inverse operation of pooling. It spreads the input values:

\[y_{k_1 i_1 + j_1, k_2 i_2 + j_2} = x_{i_1, i_2}\]

where \(x_{i_1, i_2}\) is the input and \(y_{k_1 i_1 + j_1, k_2 i_2 + j_2}\) is the output.

Parameters
  • x (Variable) – Input variable.

  • kernel (tuple of int) – Kernel sizes for each spatial axis.

  • channel_last (bool) – If True, the last dimension is considered as channel dimension, a.k.a NHWC order. [default= False ]

Returns

Spread values variable

Return type

Variable
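
Example

A minimal shape-checking sketch (the shapes are illustrative only):

>>> import nnabla as nn, nnabla.functions as F
>>> x = nn.Variable((1, 1, 2, 2))
>>> F.unpooling(x, kernel=(2, 2)).shape   # each input value is spread over a 2x2 block
(1, 1, 4, 4)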

Note

All nnabla functions in nnabla.functions are decorated with the nnabla.function_bases.function_api decorator, which queries the current context and passes it into the first argument of the original function. The original function always takes a context as the first argument.

nnabla.functions.embed(x0, w, n_outputs=- 1, outputs=None)[source]

Embed slices of a matrix/tensor with indexing array/tensor.

Parameters
  • x0 (Variable) – Indices with shape \((I_0, ..., I_N)\)

  • w (Variable) – Weights with shape \((W_0, ..., W_M)\) [parameter]

Returns

Output with shape \((I_0, ..., I_N, W_1, ..., W_M)\)

Return type

Variable
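
Example

A minimal shape-checking sketch: two indices select two rows from a 5x4 weight matrix (the shapes and values are illustrative only):

>>> import numpy as np, nnabla as nn, nnabla.functions as F
>>> x0 = nn.Variable.from_numpy_array(np.array([0, 2]))   # two row indices
>>> w = nn.Variable((5, 4))   # 5 embedding vectors of size 4
>>> F.embed(x0, w).shape
(2, 4)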

Note

All nnabla functions in nnabla.functions are decorated with the nnabla.function_bases.function_api decorator, which queries the current context and passes it into the first argument of the original function. The original function always takes a context as the first argument.

nnabla.functions.rnn(x, h, weight_l0, weight=None, bias=None, num_layers=1, nonlinearity='tanh', dropout=None, bidirectional=False, training=True, n_outputs=- 1, outputs=None)[source]

The RNN function implements an Elman RNN with a nonlinearity applied to the input sequence. It is defined as follows:

\[{\mathbf h_t} = {\mathbf \tanh}( {\mathbf w_{ih}} *{\mathbf x_t} + {\mathbf b_{ih}} + {\mathbf w_{hh}}* {\mathbf h_{(t-1)}} + {\mathbf b_{hh}}).\]

We use the following notations to describe the inputs and outputs below. \(T\): sequence length, \(B\): batch size, \(I\): input size, \(L\): number of layers, \(D\): number of directions, can be either 1 or 2, \(H\): hidden size.

References

Parameters
  • x (Variable) – Input N-D array with shape \((T, B, I)\).

  • h (Variable) – Input N-D array with shape \((L, D, B, H)\).

  • weight_l0 (Variable) – Input N-D array with shape \((D, H, I + H)\). [parameter]

  • weight (Variable) – Input N-D array with shape \((L-1, D, H, D * H + H)\). [optional][parameter]

  • bias (Variable) – Input N-D array with shape \((L, D, H)\). [optional][parameter]

  • num_layers (int) – Number of layers in the network. If set to 1, only the weights for the first layer will be invoked. Default is 1. [default= 1 ]

  • nonlinearity (string) – Type of nonlinearity applied to the input sequence. Must be either tanh or relu. Default is tanh. [default= 'tanh' ]

  • dropout (float) – Dropout ratio applied to parameters. Default is 0.0. [default= 0.0 ]

  • bidirectional (bool) – If True, bidirectional computation will be performed in each layer. Default is False. [default= False ]

  • training (bool) – Backpropagation will be performed only when it is true. Default is True. [default= True ]

Returns

Output \(y\) with shape \((T, B, D * H)\) and output \(h_n\) with shape \((L, D, B, H)\).

Return type

Variable
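
Example

A minimal shape-checking sketch for a single unidirectional layer (\(L = 1\), \(D = 1\)), assuming the two outputs are returned as a tuple; the shapes follow the parameter descriptions above and are illustrative only:

>>> import nnabla as nn, nnabla.functions as F
>>> T, B, I, H = 5, 2, 4, 3
>>> x = nn.Variable((T, B, I))
>>> h = nn.Variable((1, 1, B, H))
>>> w0 = nn.Variable((1, H, I + H))
>>> y, hn = F.rnn(x, h, w0)
>>> y.shape, hn.shape
((5, 2, 3), (1, 1, 2, 3))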

Note

All nnabla functions in nnabla.functions are decorated with the nnabla.function_bases.function_api decorator, which queries the current context and passes it into the first argument of the original function. The original function always takes a context as the first argument.

nnabla.functions.lstm(x, h, c, weight_l0, weight=None, bias=None, num_layers=1, dropout=None, bidirectional=False, training=True, n_outputs=- 1, outputs=None)[source]

N-Step LSTM layer.

\[\begin{split}{\mathbf f_t} &=& {\mathbf \sigma}( {\mathbf W_f} *{\mathbf x_t} + {\mathbf U_f}* {\mathbf h_{(t-1)}} + {\mathbf b_f})\\ {\mathbf i_t} &=& {\mathbf \sigma}( {\mathbf W_i} *{\mathbf x_t} + {\mathbf U_i}* {\mathbf h_{(t-1)}} + {\mathbf b_i})\\ {\mathbf o_t} &=& {\mathbf \sigma}( {\mathbf W_o} *{\mathbf x_t} + {\mathbf U_o}* {\mathbf h_{(t-1)}} + {\mathbf b_o})\\ {\mathbf c_t} &=& {\mathbf f_t}\odot {\mathbf c_{(t-1)}} + {\mathbf i_t}\odot {\mathbf \tanh}({\mathbf W_c}*{\mathbf x_t} + {\mathbf U_c} *{\mathbf h_{(t-1)}} + {\mathbf b_c})\\ {\mathbf h_t} &=& {\mathbf o_t} \odot {\mathbf \tanh}({\mathbf c_t}).\end{split}\]

We use the following notations to describe the inputs and outputs below. \(T\): sequence length, \(B\): batch size, \(I\): input size, \(L\): number of layers, \(D\): number of directions, can be either 1 or 2, \(H\): hidden size.

References

Parameters
  • x (Variable) – Input N-D array with shape \((T, B, I)\).

  • h (Variable) – Input N-D array with shape \((L, D, B, H)\).

  • c (Variable) – Input N-D array with shape \((L, D, B, H)\).

  • weight_l0 (Variable) – weight parameters for the first layer. Shape is \((D, 4, H, I + H)\). [parameter]

  • weight (Variable) – weight parameters for the second layer and above. Shape is \((L-1, D, 4, H, D * H + H)\). [optional][parameter]

  • bias (Variable) – Bias vector (\(L\)). Shape is \((L, D, 4, H)\). [optional][parameter]

  • num_layers (int) – Number of layers in the network. If set to 1, only the weights for the first layer will be invoked. Default is 1. [default= 1 ]

  • dropout (float) – Dropout ratio applied to parameters. Default is 0.0. [default= 0.0 ]

  • bidirectional (bool) – If True, bidirectional computation will be performed in each layer. Default is False. [default= False ]

  • training (bool) – Backpropagation will be performed only when it is True. Default is True. [default= True ]

Returns

Output \(y\) with shape \((T, B, D * H)\), whose memory layout can be reshaped as \((T, B, D, H)\); output \(h_n\) with shape \((L, D, B, H)\); and output \(c_n\) with shape \((L, D, B, H)\).

Return type

Variable

Note

All nnabla functions in nnabla.functions are decorated with the nnabla.function_bases.function_api decorator, which queries the current context and passes it into the first argument of the original function. The original function always takes a context as the first argument.

nnabla.functions.gru(x, h, weight_l0, weight=None, bias=None, num_layers=1, dropout=None, bidirectional=False, training=True, n_outputs=- 1, outputs=None)[source]

N-Step GRU layer.

\[\begin{split}{\mathbf r_t} &=& {\mathbf \sigma}( {\mathbf W_r} *{\mathbf x_t} + {\mathbf U_r}* {\mathbf h_{(t-1)}} + {\mathbf b_r})\\ {\mathbf z_t} &=& {\mathbf \sigma}( {\mathbf W_z} *{\mathbf x_t} + {\mathbf U_z}* {\mathbf h_{(t-1)}} + {\mathbf b_z})\\ {\mathbf n_t} &=& {\mathbf \tanh}( {\mathbf W_n}{\mathbf x_t}+ {\mathbf b_{in}}+ {\mathbf r_n}\odot( {\mathbf U_n}{\mathbf h_{t-1}}+ {\mathbf b_{hn}})) \\ {\mathbf h_t} &=& (1- {\mathbf z_t})\odot {\mathbf n_t} + {\mathbf z_t}\odot {\mathbf h_{t-1}}.\end{split}\]

We use the following notations to describe the inputs and outputs below. \(T\): sequence length, \(B\): batch size, \(I\): input size, \(L\): number of layers, \(D\): number of directions, can be either 1 or 2, \(H\): hidden size.

References

Parameters
  • x (Variable) – Input N-D array with shape \((T, B, I)\).

  • h (Variable) – Input N-D array with shape \((L, D, B, H)\).

  • weight_l0 (Variable) – weight parameters for the first layer. Shape is \((D, 3, H, I + H)\). [parameter]

  • weight (Variable) – weight parameters for the second layer and above. Shape is \((L-1, D, 3, H, D * H + H)\). [optional][parameter]

  • bias (Variable) – Bias vector (\(L\)). Shape is \((L, D, 4, H)\). [optional][parameter]

  • num_layers (int) – Number of layers in the network. If set to 1, only the weights for the first layer will be invoked. Default is 1. [default= 1 ]

  • dropout (float) – Dropout ratio applied to parameters. Default is 0.0. [default= 0.0 ]

  • bidirectional (bool) – If True, bidirectional computation will be performed in each layer. Default is False. [default= False ]

  • training (bool) – Backpropagation will be performed only when it is True. Default is True. [default= True ]

Returns

Output \(y\) with shape \((T, B, D * H)\), whose memory layout can be reshaped as \((T, B, D, H)\), and output \(h_n\) with shape \((L, D, B, H)\).

Return type

Variable

Note

All nnabla functions in nnabla.functions are decorated with the nnabla.function_bases.function_api decorator, which queries the current context and passes it into the first argument of the original function. The original function always takes a context as the first argument.

nnabla.functions.multi_head_attention(query, key, value, num_heads, q_weight, k_weight, v_weight, out_weight, q_bias=None, k_bias=None, v_bias=None, out_bias=None, attn_bias_k=None, attn_bias_v=None, dropout=0.0, additive_mask=None, key_padding_mask=None)[source]

MultiHeadAttention.

Computes multi-headed attention with query, key, and value. We use the following notations to describe the inputs and outputs below. \(L_T\): target sequence length, \(L_S\): source sequence length, \(B\): batch size, \(D\): input dimension, \(E\): embedding dimension, \(H\): number of attention heads.

References

A. Vaswani et al. “Attention is All You Need.” NIPS. 2017. <https://papers.nips.cc/paper/7181-attention-is-all-you-need.pdf>

Parameters
  • query (Variable) – Input N-D array with shape \((L_T, B, D_q)\).

  • key (Variable) – Input N-D array with shape \((L_S, B, D_k)\).

  • value (Variable) – Input N-D array with shape \((L_S, B, D_v)\).

  • num_heads (int) – Number of attention heads. Note that the embedding dimension E must be divisible by the number of heads. Default is 12, which is conventional.

  • q_weight (Variable) – Input N-D array with shape \((D_q, E)\).

  • k_weight (Variable) – Input N-D array with shape \((D_k, E)\).

  • v_weight (Variable) – Input N-D array with shape \((D_v, E_v)\).

  • out_weight (Variable) – Input N-D array with shape \((D_v, E_{out})\).

  • q_bias (Variable, optional) – Input N-D array with shape \((E, )\).

  • k_bias (Variable, optional) – Input N-D array with shape \((E, )\).

  • v_bias (Variable, optional) – Input N-D array with shape \((E_v, )\).

  • out_bias (Variable, optional) – Input N-D array with shape \((E_{out}, )\).

  • attn_bias_k (Variable, optional) – Input N-D array with shape \((E, )\).

  • attn_bias_v (Variable, optional) – Input N-D array with shape \((E_v, )\).

  • dropout (float, optional) – Dropout ratio applied to parameters. Default is 0.

  • additive_mask (Variable, optional) – Input N-D array with shape \((L_T, L_S)\). Values will be added to the attention layer to prevent attention to certain positions.

  • key_padding_mask (Variable, optional) – Input N-D array with shape \((B, L_S)\). Specified padding elements will be ignored by the attention layer. Values must be either 1 or 0.

Returns

Output \(y\) with shape \((L_T, B, E_{out})\) and output \(h_n\) with shape \((B, L_T, L_S)\).

Return type

Variable

nnabla.functions.patch_correlation(x1, x2, patch=(1, 1), shift=(0, 0), patch_step=(1, 1), shift_step=(1, 1), padding=(0, 0, 0, 0), channel_last=False)[source]

Multiplicative patch-wise comparison between inputs x1 and x2, which must both be 4-dimensional NCHW (with channel_last=False) or NHWC (with channel_last=True) arrays (where N is the number of samples, H and W are the sample height and width, and C is the number of channels). The function returns a 5-D array with shape \((N, C_y, C_x, H_o, W_o)\), where \(H_o, W_o\) are determined by the possible patch locations within the, optionally padded, input image size, and \(C_y, C_x\) are determined by the optionally shifted patch positions.

Mathematically, the patch correlation is formulated as

\[O(s_h, s_w, h, w) = \sum_{c} \sum_{k_h} \sum_{k_w} I_1(c, h + k_h, w + k_w) \times I_2(c, h + k_h + s_h, w + k_w + s_w),\]

where \(I_1(c, h, w)\) and \(I_2(c, h, w)\) are the inputs at \(c\)-th channel, \(h\)-th height, and \(w\)-th width, \(k_h, k_w\) indices for the patch size and \(s_h, s_w\) indices for the shifts.

A single correlation value (per sample) is produced if the patch extends to the image dimensions and all other parameters use the default values.

>>> import numpy as np, nnabla as nn, nnabla.functions as F
>>> N, C, H, W = (1, 2, 3, 4)
>>> x = nn.Variable.from_numpy_array(np.ones([N, C, H, W]))
>>> F.patch_correlation(x, x, patch=(H, W)).d
array([[[[[24.]]]]], dtype=float32)

A patch that is smaller than the image size moves horizontally and vertically, producing a value per position. The patch_step argument may be used to control the position increments.

>>> F.patch_correlation(x, x, patch=(H-1, W-1)).d
array([[[[[12., 12.],
          [12., 12.]]]]], dtype=float32)
>>> F.patch_correlation(x, x, patch=(H-1, W-1), patch_step=(2, 1)).d
array([[[[[12., 12.]]]]], dtype=float32)

Multiple correlations may be performed at each position between the patch from x1 and patches from x2 at relative offsets striding the maximum vertical and horizontal distance given by the shift values at increments of shift_step. The shifted correlation values can be obtained from the second and third output dimensions for the vertical and horizontal shifts, respectively.

>>> F.patch_correlation(x, x, (H, 1), shift=(0, 1)).shape
(1, 1, 3, 1, 4)
>>> F.patch_correlation(x, x, (H, 1), shift=(0, 1)).d
array([[[[[0., 6., 6., 6.]],
         [[6., 6., 6., 6.]],
         [[6., 6., 6., 0.]]]]], dtype=float32)
>>> F.patch_correlation(x, x, (H, 1), shift=(0, 1), shift_step=(1, 2)).d
array([[[[[0., 6., 6., 6.]],
         [[6., 6., 6., 0.]]]]], dtype=float32)

Padding with zero values may be applied individually to the top, bottom, left and right side of the input image.

>>> F.patch_correlation(x, x, patch=(H, W), padding=(0, 1, W, W)).d
array([[[[[ 0.,  6., 12., 18., 24., 18., 12.,  6.,  0.],
          [ 0.,  4.,  8., 12., 16., 12.,  8.,  4.,  0.]]]]], dtype=float32)

This function may be used to implement the FlowNetC correlation layer.

>>> N, C, H, W = (1, 256, 44, 60)
>>> x1, x2 = nn.Variable((N, C, H, W)), nn.Variable((N, C, H, W))
>>> F.patch_correlation(x1, x2, shift=20, shift_step=2).shape
(1, 21, 21, 44, 60)

References

Parameters
  • x1 (Variable) – Input N-D array with shape \((N, C, H, W)\) or \((N, H, W, C)\).

  • x2 (Variable) – Input N-D array with shape \((N, C, H, W)\) or \((N, H, W, C)\).

  • patch – A tuple with height and width of the correlation patch. A single integer expands to identical height and width.

  • shift – A tuple of maximum vertical and horizontal displacement of patches from x2 that are correlated with a single patch from x1. A single integer expands to identical vertical and horizontal displacement.

  • patch_step – A tuple of vertical and horizontal increments for advancing the position of the correlation patch within the input image shape. A single integer expands to identical vertical and horizontal increments.

  • shift_step – A tuple of vertical and horizontal increments for advancing the relative offset position within the shift range. A single integer expands to identical vertical and horizontal increments.

  • padding – A tuple of top, bottom, left and right padding extent. A tuple of two values yields identical top/bottom and left/right padding from the first and second tuple value. A single integer expands to identical padding extent for all sides.

  • channel_last – Last dimension is the channel (NHWC order) if True.

Returns

N-D array with shape \((N, C_y, C_x, H_o, W_o)\) or \((N, H, W, C_y, C_x)\) if channel_last=True.

A spatial size of the output is calculated as

\[H_o = \frac{H + (top\_pad + bottom\_pad) - patch_v}{patch\_step_v} + 1.\]

A channel size of the output is calculated as

\[C_y = \frac{2 \times shift_v}{shift\_step_v} + 1.\]

\(W_o\) and \(C_x\) are computed by the same calculation with the corresponding horizontal components.

Return type

Variable

nnabla.functions.roi_align(input, boxes, output_size, spatial_scale=(1.0, 1.0), sampling_ratio=None, channel_last=None, n_outputs=- 1, outputs=None)[source]

Map Regions of Interest (RoI) defined by bounding boxes to features of output_size height and width using bilinear interpolation with sampling_ratio points in the interpolation grid.

>>> import numpy as np, nnabla as nn, nnabla.functions as F
>>> nn.set_auto_forward(True)
>>> input = F.pad(F.constant(1, (1, 1, 2, 2)) * 2, (1, 1, 1, 1), "constant", 1)
>>> print(input.d)
[[[[1. 1. 1. 1.]
   [1. 2. 2. 1.]
   [1. 2. 2. 1.]
   [1. 1. 1. 1.]]]]
>>> boxes = nn.Variable.from_numpy_array([[0, 0, 0, 4, 4], [0, 1, 1, 3, 3]])
>>> output = F.roi_align(input, boxes, (2, 2))
>>> print(output.d[0])
[[[1.25 1.25]
  [1.25 1.25]]]
>>> print(output.d[1])
[[[2. 2.]
  [2. 2.]]]

The spatial_scale argument tuple may be used to appropriately scale the box coordinates, for example, to scale normalized box coordinate to the input height and width dimensions.

>>> input = F.reshape(F.arange(1, 13), (1, 1, 3, 4))
>>> print(input.d)
[[[[ 1.  2.  3.  4.]
   [ 5.  6.  7.  8.]
   [ 9. 10. 11. 12.]]]]
>>> boxes = nn.Variable.from_numpy_array([[0, 1/4, 1/3, 3/4, 2/3]])
>>> output = F.roi_align(input, boxes, (1, 2), spatial_scale=(4, 3))
>>> print(output.d)
[[[[6. 7.]]]]

References

Parameters
  • input (Variable) – N-D array with shape \((N, H, W, C)\) or \((N, C, H, W)\).

  • boxes (Variable) – N-D array with shape \((K, 5)\) containing box coordinates in (b, x1, y1, x2, y2) format where b is the batch index. Note that an invalid (out-of-range) batch index will generate an error only when running on CPU; when using a GPU context the batch index values are clipped to the range of input samples.

  • output_size (tuple of int) – the height and width of the output feature maps.

  • spatial_scale (repeated float) – Scaling factor from box to input coordinates, as (x, y). [default= (1.0, 1.0) ]

  • sampling_ratio (int) – The number of sampling points used for interpolation. Computed as ceil((y2 - y1) / output_size[0]) for height and likewise for width if sampling_ratio <= 0. [default= -1 ]

  • channel_last (bool) – If True, the last dimension is considered as channel dimension, a.k.a NHWC order. [default= False ]

Returns

N-D array with shape \((K, C, output\_size[0], output\_size[1])\) or \((K, output\_size[0], output\_size[1], C)\).

Return type

Variable

Note

All nnabla functions in nnabla.functions are decorated with the nnabla.function_bases.function_api decorator, which queries the current context and passes it into the first argument of the original function. The original function always takes a context as the first argument.

Neural Network Activation
nnabla.functions.sigmoid(x, n_outputs=- 1, outputs=None)[source]

Element-wise sigmoid function.

\[f(x) = \frac{1}{1 + \exp(-x)},\]
Parameters

x (Variable) – Input

Returns

Output

Return type

Variable

Note

All nnabla functions in nnabla.functions are decorated with the nnabla.function_bases.function_api decorator, which queries the current context and passes it into the first argument of the original function. The original function always takes a context as the first argument.

nnabla.functions.swish(x, n_outputs=- 1, outputs=None)[source]

Element-wise swish function, by Ramachandran et al. (2017).

\[y_i = \frac{x_i}{1 + \exp(-x_i)},\]

References

Parameters

x (Variable) – Input

Returns

Output

Return type

Variable

Note

All nnabla functions in nnabla.functions are decorated with the nnabla.function_bases.function_api decorator, which queries the current context and passes it into the first argument of the original function. The original function always takes a context as the first argument.

nnabla.functions.tanh(x, n_outputs=- 1, outputs=None)[source]

Element-wise hyperbolic tangent (tanh) function.

\[y_i = \tanh (x_i)\]
Parameters

x (Variable) – N-D array

Returns

N-D array with the same shape as x

Return type

Variable

Note

All nnabla functions in nnabla.functions are decorated with the nnabla.function_bases.function_api decorator, which queries the current context and passes it into the first argument of the original function. The original function always takes a context as the first argument.

nnabla.functions.relu(x, inplace=False, n_outputs=- 1, outputs=None)[source]

Element-wise Rectified Linear Unit (ReLU) function.

\[y_i = \max (0, x_i)\]
Parameters
  • x (Variable) – N-D array

  • inplace (bool) – This option is obsolete and ignored. Output is never in-placed with input. [default= False ]

Returns

N-D array with the same shape as x

Return type

Variable

Note

All nnabla functions in nnabla.functions are decorated with the nnabla.function_bases.function_api decorator, which queries the current context and passes it into the first argument of the original function. The original function always takes a context as the first argument.

nnabla.functions.softmax(x, axis=None, n_outputs=- 1, outputs=None)[source]

Softmax normalization. Calculates

\[y_i = \frac{\exp(x_i)}{\sum_j \exp(x_j)}\]

along the dimension specified by axis, where \(x_i\) is the input and \(y_i\) is the output.

Parameters
  • x (Variable) – N-D array. Typically indicates a score.

  • axis (int) – Axis normalization is taken. [default= len(x.shape) - 1 ]

Returns

N-D array with the same shape as x

Return type

Variable
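
Example

A small sketch using the imperative mode (NdArray inputs; the values are arbitrary):

>>> import numpy as np, nnabla as nn, nnabla.functions as F
>>> x = nn.NdArray.from_numpy_array(np.array([[1., 2., 3.]], dtype=np.float32))
>>> y = F.softmax(x)             # normalizes along the last axis by default
>>> print(np.round(y.data, 3))   # the values sum to 1 along the last axis
[[0.09  0.245 0.665]]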

Note

All nnabla functions in nnabla.functions are decorated with the nnabla.function_bases.function_api decorator, which queries the current context and passes it into the first argument of the original function. The original function always takes a context as the first argument.

nnabla.functions.log_softmax(x, axis=None, n_outputs=- 1, outputs=None)[source]

Fused operation of Softmax normalization followed by log, which is defined as

\[y_i = \log \frac{\exp(x_i)}{\sum_j \exp(x_j)},\]

where \(x_i\) is the input and \(y_i\) is the output at the i-th channel. An advantage of this fusion is reducing the numerical instability due to the log application.

The original definition can be rewritten as

\[y_i = x_i - \max_j(x_j) - \log\left(\sum_j \exp(x_j - \max_k(x_k))\right).\]

It is more stable since the log is always applied to a value \(\ge 1\), while in the non-fused operation the argument of the log can be arbitrarily close to 0.

Also, the backward gradient computation is more stable than the original one, as it does not perform the division by \(x\) that the gradient of log would require. It is defined as follows:

\[dx_i = dy_i - \exp(y_i) \sum_j dy_j\]

where \(dx_i\) and \(dy_i\) denote gradients of loss wrt \(x_i\) and \(y_i\) respectively.

Parameters
  • x (Variable) – N-D array. Typically indicates a score.

  • axis (int) – Axis normalization is taken. [default= len(x.shape) - 1 ]

Returns

N-D array with the same shape as x

Return type

Variable

Note

All nnabla functions in nnabla.functions are decorated with the nnabla.function_bases.function_api decorator, which queries the current context and passes it into the first argument of the original function. The original function always takes a context as the first argument.

nnabla.functions.elu(x, alpha=1.0, n_outputs=- 1, outputs=None)[source]

Element-wise Exponential Linear Unit (ELU) function.

\[\begin{split}y_i= \left\{ \begin{array}{ll} x_i & (x > 0)\\ \alpha (\exp(x_i) - 1) & (x \leq 0) \end{array} \right..\end{split}\]

References

Parameters
  • x (Variable) – N-D array

  • alpha (float) – Coefficient for negative outputs. \(\alpha\) in definition [default= 1.0 ]

Returns

N-D array with the same shape as x

Return type

Variable

Note

All nnabla functions in nnabla.functions are decorated with the nnabla.function_bases.function_api decorator, which queries the current context and passes it into the first argument of the original function. The original function always takes a context as the first argument.

nnabla.functions.selu(x, scale=1.05070098735548, alpha=1.673263242354377, n_outputs=- 1, outputs=None)[source]

Element-wise Scaled Exponential Linear Unit (SELU) function by Klambauer et al. (2017).

\[\begin{split}y_i= \lambda \left\{ \begin{array}{ll} x_i & (x > 0)\\ \alpha (\exp(x_i) - 1) & (x \leq 0) \end{array} \right..\end{split}\]

The coefficients \(\lambda\) and \(\alpha\) default to the following values \(\lambda_{01}\) and \(\alpha_{01}\), respectively, provided by Klambauer et al. (2017):

\[\begin{split}\begin{array}{lll} \lambda_{01} &=& \left( 1 - \operatorname{erfc}\left( \frac{1}{\sqrt{2}} \right) \sqrt{e} \right) \sqrt{2 \pi} \\ && \left( 2 \operatorname{erfc} \left( \sqrt{2} \right) e^2 + \pi \operatorname{erfc}\left( \frac{1}{\sqrt{2}} \right)^2 e \right. \\ && \left. - 2(2 + \pi) \operatorname{erfc} \left( \frac{1}{\sqrt{2}} \right) \sqrt{e} + \pi + 2 \right)^{-1/2} \\ &\approx& 1.0507 \\ \alpha_{01} &=& - \frac {\sqrt {\frac {2}{\pi}}} {\operatorname{erfc} \left( \frac{1}{\sqrt{2}} \right) \exp \left(\frac {1} {2} \right) - 1} \\ &\approx& 1.67326 \end{array}\end{split}\]

References

Parameters
  • x (Variable) – N-D array

  • scale (float) – The coefficient \(\lambda\) in the definition. [default= 1.05070098735548 ]

  • alpha (float) – The coefficient \(\alpha\) in the definition. [default= 1.673263242354377 ]

Returns

N-D array with the same shape as x

Return type

Variable

Note

All nnabla functions in nnabla.functions are decorated with the nnabla.function_bases.function_api decorator, which queries the current context and passes it into the first argument of the original function. The original function always takes a context as the first argument.

nnabla.functions.crelu(x, axis=1, n_outputs=- 1, outputs=None)[source]

Element-wise Concatenated Rectified Linear Unit (CReLU) function. This function calculates the ReLU of \(x\) and \(-x\) , then concatenates the results together at a specified axis, and returns the resulting array.

References

Parameters
  • x (Variable) – N-D array.

  • axis (int) – The ReLU activations of positive inputs and negative inputs are concatenated at axis. [default= 1 ]

Returns

N-D array where axis dimension is doubled by concatenating.

Return type

Variable

Note

All nnabla functions in nnabla.functions are decorated with the nnabla.function_bases.function_api decorator, which queries the current context and passes it into the first argument of the original function. The original function always takes a context as the first argument.

nnabla.functions.celu(x, alpha=1.0, axis=1, n_outputs=- 1, outputs=None)[source]

Element-wise Concatenated Exponential Linear Unit (CELU) function. Concatenates ELU outputs of positive and negative inputs together at specified axis.

Parameters
  • x (Variable) – N-D array.

  • alpha (float) – Coefficient for negative outputs. \(\alpha\) in definition. [default= 1.0 ]

  • axis (int) – The ELU activations of positive inputs and negative inputs are concatenated at axis. [default= 1 ]

Returns

N-D array where axis dimension is doubled by concatenating.

Return type

Variable

Note

All nnabla functions in nnabla.functions are decorated with the nnabla.function_bases.function_api decorator, which queries the current context and passes it into the first argument of the original function. The original function always takes a context as the first argument.

nnabla.functions.gelu(x, n_outputs=- 1, outputs=None)[source]

Gaussian Error Linear Unit (GELU) function.

\[GELU(x) = xP(X \leq x) = x \Phi (x)\]

which is approximated by

\[GELU(x) = 0.5x (1 + \tanh ( \sqrt{2/\pi}(x + 0.044715x^3) ))\]

References

Parameters

x (Variable) – N-D array

Returns

N-D array with the same shape as x

Return type

Variable

Note

All nnabla functions in nnabla.functions are decorated with the nnabla.function_bases.function_api decorator, which queries the current context and passes it into the first argument of the original function. The original function always takes a context as the first argument.

nnabla.functions.mish(x, n_outputs=- 1, outputs=None)[source]

Mish activation function.

\[Mish(x_i) = x_i \tanh(\log(1+\exp(x_i)))\]

References

Parameters

x (Variable) – N-D array

Returns

N-D array with the same shape as x

Return type

Variable

Note

All nnabla functions in nnabla.functions are decorated with the nnabla.function_bases.function_api decorator, which queries the current context and passes it into the first argument of the original function. The original function always takes a context as the first argument.

nnabla.functions.prelu(x0, x1, base_axis=1, n_outputs=- 1, outputs=None)[source]

Element-wise Parametrized Rectified Linear Unit function. Calculates:

\[y_i = \max(0, x_i) + w_i \min(0, x_i)\]

where negative slope \(w\) is learned and can vary across channels (an axis specified with base_axis).

Parameters
  • x0 (Variable) – (N-D array) Input

  • x1 (Variable) – (N-D array) Weights

  • base_axis (int) – Dimensions up to base_axis are treated as the sample dimensions. [default= 1 ]

Returns

N-D array.

Return type

Variable

Note

All nnabla functions in nnabla.functions are decorated with the nnabla.function_bases.function_api decorator, which queries the current context and passes it into the first argument of the original function. The original function always takes a context as the first argument.

nnabla.functions.leaky_relu(x, alpha=0.1, inplace=False, n_outputs=- 1, outputs=None)[source]

Element-wise Leaky Rectified Linear Unit (ReLU) function.

It is defined as:

\[y_i = \alpha * \min(0, x_i) + \max (0, x_i)\]
Parameters
  • x (Variable) – N-D array

  • alpha (float) – The slope value multiplied to negative numbers. \(\alpha\) in the definition. [default= 0.1 ]

  • inplace (bool) – This option is obsolete and ignored. Output is never in-placed with input. [default= False ]

Returns

N-D array with the same shape as x

Return type

Variable

Note

All nnabla functions in nnabla.functions are decorated with the nnabla.function_bases.function_api decorator, which queries the current context and passes it into the first argument of the original function. The original function always takes a context as the first argument.

nnabla.functions.relu6(x, n_outputs=- 1, outputs=None)[source]

Element-wise ReLU6 function. Capping ReLU activation to 6 is often observed to learn sparse features earlier.

\[ReLU6(x) = \min(\max(0, x), 6)\]
Parameters

x (Variable) – N-D array

Returns

N-D array with the same shape as x

Return type

Variable

Note

All nnabla functions in nnabla.functions are decorated with the nnabla.function_bases.function_api decorator, which queries the current context and passes it into the first argument of the original function. The original function always takes a context as the first argument.

nnabla.functions.hard_sigmoid(x, n_outputs=- 1, outputs=None)[source]

Segment-wise linear approximation of sigmoid. Preferable when speed of computation is more important than precision. Returns \(0\) if \(x < -2.5\). Returns \(1\) if \(x > 2.5\). Returns \(0.2x + 0.5\) if \(-2.5 \leq x \leq 2.5\).

Parameters

x (Variable) – N-D array

Returns

N-D array with the same shape as x

Return type

Variable

Note

All nnabla functions in nnabla.functions are decorated with the nnabla.function_bases.function_api decorator, which queries the current context and passes it into the first argument of the original function. The original function always takes a context as the first argument.

nnabla.functions.hard_tanh(x, n_outputs=- 1, outputs=None)[source]

Element-wise HardTanh function. Computationally cheaper than Tanh function. Returns \(1\) if \(x > 1\). Returns \(-1\) if \(x < -1\). Returns \(x\) otherwise.

Parameters

x (Variable) – N-D array

Returns

N-D array with the same shape as x

Return type

Variable

Note

All nnabla functions in nnabla.functions are decorated with the nnabla.function_bases.function_api decorator, which queries the current context and passes it into the first argument of the original function. The original function always takes a context as the first argument.

nnabla.functions.log_sigmoid(x, n_outputs=- 1, outputs=None)[source]

Element-wise LogSigmoid function.

\[LogSigmoid(x) = \log(1/(1+\exp(-x)))\]
Parameters

x (Variable) – N-D array

Returns

N-D array with the same shape as x

Return type

Variable

Note

All nnabla functions in nnabla.functions are decorated with the nnabla.function_bases.function_api decorator, which queries the current context and passes it into the first argument of the original function. The original function always takes a context as the first argument.

nnabla.functions.softplus(x, beta=1.0, n_outputs=- 1, outputs=None)[source]

Element-wise SoftPlus function. Unlike Sigmoid and Tanh that have upper and lower bound, SoftPlus is only lower-bounded by 0.

\[SoftPlus(x) = \frac{1}{\beta} \log(1+\exp(\beta x))\]
Parameters
  • x (Variable) – N-D array

  • beta (float) – the beta value for SoftPlus formulation [default= 1.0 ]

Returns

N-D array with the same shape as x

Return type

Variable

Note

All nnabla functions in nnabla.functions are decorated with the nnabla.function_bases.function_api decorator, which queries the current context and passes it into the first argument of the original function. The original function always takes a context as the first argument.

nnabla.functions.softsign(x, n_outputs=- 1, outputs=None)[source]

Element-wise SoftSign function. It can be used in place of the Tanh function: while Tanh converges exponentially, SoftSign converges polynomially.

\[SoftSign(x) = x/(1+|x|)\]
Parameters

x (Variable) – N-D array

Returns

N-D array with the same shape as x

Return type

Variable

Note

All nnabla functions in nnabla.functions are decorated with the nnabla.function_bases.function_api decorator, which queries the current context and passes it into the first argument of the original function. The original function always takes a context as the first argument.

nnabla.functions.tanh_shrink(x, n_outputs=- 1, outputs=None)[source]

Element-wise TanhShrink function.

\[TanhShrink(x) = x - \tanh(x)\]
Parameters

x (Variable) – N-D array

Returns

N-D array with the same shape as x

Return type

Variable

Note

All nnabla functions in nnabla.functions are decorated with the nnabla.function_bases.function_api decorator, which queries the current context and passes it into the first argument of the original function. The original function always takes a context as the first argument.

nnabla.functions.sinc(x, n_outputs=- 1, outputs=None)[source]

Element-wise Sinc function. Unlike most popular activation functions, it oscillates, rising and falling. Returns \(1\) if \(x = 0\); returns \(\sin(x)/x\) otherwise.

Parameters

x (Variable) – N-D array

Returns

N-D array with the same shape as x

Return type

Variable

Note

All nnabla functions in nnabla.functions are decorated with the nnabla.function_bases.function_api decorator, which queries the current context and passes it into the first argument of the original function. The original function always takes a context as the first argument.

Normalization
nnabla.functions.batch_normalization(x, beta, gamma, mean, variance, axes=[1], decay_rate=0.9, eps=1e-05, batch_stat=True, output_stat=False, n_outputs=None)[source]

Batch normalization.

\[\begin{split}\begin{eqnarray} \mu &=& \frac{1}{M} \sum x_i \\ \sigma^2 &=& \frac{1}{M} \sum \left(x_i - \mu\right)^2 \\ \hat{x}_i &=& \frac{x_i - \mu}{\sqrt{\sigma^2 + \epsilon}} \\ y_i &=& \hat{x}_i \gamma + \beta. \end{eqnarray}\end{split}\]

At testing time, the mean and variance values used are those that were computed during training by moving average.

Parameters
  • x (Variable) – N-D array of input.

  • beta (Variable or None) – N-D array of beta which is learned. If None, the bias term is omitted.

  • gamma (Variable or None) – N-D array of gamma which is learned. If None, the scale term is omitted.

  • mean (Variable or None) – N-D array of running mean (modified during forward execution). If None, a dummy variable is created and the running mean is not updated. mean=None with batch_stat=False is prohibited.

  • variance (Variable or None) – N-D array of running variance (modified during forward execution). If None, a dummy variable is created and the running variance is not updated. variance=None with batch_stat=False is prohibited.

  • axes (list of int or int) – Mean and variance are calculated along these axes.

  • decay_rate (float) – Decay rate of running mean and variance.

  • eps (float) – Tiny value to avoid zero division by std.

  • batch_stat (bool) – Use mini-batch statistics rather than running ones. If False, mean and variance must be nnabla.Variable instances (None is prohibited).

  • output_stat (bool) – If true, the batch statistics of mean and variance will also be returned as Variables. They are differentiable.

Returns

Returns batch normalization output as Variable. If output_stat=True, it also returns the mean and variance of the mini-batch

See also

nnabla.function_bases.batch_normalization.
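
A minimal usage sketch (shapes are assumptions for illustration): with axes=[1], the parameter variables carry one statistic per channel, so their shape is the input shape with all non-channel dimensions set to 1.

import numpy as np
import nnabla as nn
import nnabla.functions as F

nn.set_auto_forward(True)
x = nn.Variable.from_numpy_array(np.random.randn(8, 3, 16, 16).astype(np.float32))
shape = (1, 3, 1, 1)  # one statistic per channel (axes=[1])
beta = nn.Variable.from_numpy_array(np.zeros(shape, dtype=np.float32))
gamma = nn.Variable.from_numpy_array(np.ones(shape, dtype=np.float32))
mean = nn.Variable.from_numpy_array(np.zeros(shape, dtype=np.float32))
variance = nn.Variable.from_numpy_array(np.ones(shape, dtype=np.float32))
# Training-time call: uses mini-batch statistics and updates mean/variance.
y = F.batch_normalization(x, beta, gamma, mean, variance, axes=[1], batch_stat=True)
assert y.shape == x.shape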

nnabla.functions.fused_batch_normalization(x, beta, gamma, mean, variance, z=None, axes=[1], decay_rate=0.9, eps=1e-05, batch_stat=True, nonlinearity='relu', output_stat=False, n_outputs=None)[source]

Batch normalization fused with an add operation and an activation.

Parameters
  • x (Variable) – N-D array of input.

  • beta (Variable or None) – N-D array of beta which is learned. If None, the bias term is omitted.

  • gamma (Variable or None) – N-D array of gamma which is learned. If None, the scale term is omitted.

  • mean (Variable or None) – N-D array of running mean (modified during forward execution). If None, a dummy variable is created and the running mean is never updated. mean=None with batch_stat=False is prohibited.

  • variance (Variable or None) – N-D array of running variance (modified during forward execution). If None, a dummy variable is created and the running variance is not updated. variance=None with batch_stat=False is prohibited.

  • z (Variable, optional) – N-D array added to the output of the batch normalization before the nonlinearity (the fused add operation).

  • axes (list of int or int) – Mean and variance are calculated along these axes.

  • decay_rate (float) – Decay rate of running mean and variance.

  • eps (float) – Tiny value to avoid zero division by std.

  • batch_stat (bool) – Use mini-batch statistics rather than running ones. If False, mean and variance must be nnabla.Variable instances (None is prohibited).

  • nonlinearity (str) – Nonlinearity chosen from relu. Default is relu.

  • output_stat (bool) – If true, the batch statistics of mean and variance will also be returned as Variables. They are differentiable.

Returns

Returns batch normalization output as Variable. If output_stat=True, it also returns the mean and variance of the mini-batch

See also

nnabla.function_bases.batch_normalization.

nnabla.functions.sync_batch_normalization(x, beta, gamma, mean, variance, comm, group='world', axes=[1], decay_rate=0.9, eps=1e-05, batch_stat=True, output_stat=False, n_outputs=None)[source]

Synchronized batch normalization.

For some tasks (e.g., semantic segmentation), the batch size can be too small for the BatchNormalization layer to work well. The SyncBatchNormalization layer solves this problem by synchronizing the batch statistics (mean and variance) between multiple processes.

\[\begin{split}\begin{eqnarray} \mu &=& \frac{1}{M} \sum x_i \\ \sigma^2 &=& \frac{1}{M} \sum \left(x_i - \mu\right)^2 \\ \hat{x}_i &=& \frac{x_i - \mu}{\sqrt{\sigma^2 + \epsilon}} \\ y_i &=& \hat{x}_i \gamma + \beta. \end{eqnarray}\end{split}\]

Parameters
  • x (Variable) – N-D array of input.

  • beta (Variable or None) – N-D array of beta which is learned. If None, the bias term is omitted.

  • gamma (Variable or None) – N-D array of gamma which is learned. If None, the scale term is omitted.

  • mean (Variable or None) – N-D array of running mean (modified during forward execution). If None, a dummy variable is created and the running mean is never updated. mean=None with batch_stat=False is prohibited.

  • variance (Variable or None) – N-D array of running variance (modified during forward execution). If None, a dummy variable is created and the running variance is never updated. variance=None with batch_stat=False is prohibited.

  • comm (Communicator) – The communicator

  • group (string) – The name of the communicator group

  • axes (list of int or int) – Mean and variance are calculated along these axes.

  • decay_rate (float) – Decay rate of running mean and variance.

  • eps (float) – Tiny value to avoid zero division by std.

  • batch_stat (bool) – Use mini-batch statistics rather than running ones. If False, mean and variance must be nnabla.Variable instances (None is prohibited).

  • output_stat (bool) – If true, the batch statistics of mean and variance will also be returned as Variables. They are differentiable.

Returns

Returns batch normalization output as Variable. If output_stat=True, it also returns the mean and variance of the mini-batch

See also

nnabla.function_bases.batch_normalization.

nnabla.functions.mean_subtraction(x, mean, t, base_axis=1, update_running_mean=True)[source]

Subtracts the mean of the elements of the input array so that the mean of the result becomes \(0\). Preprocessing arrays with this function can improve accuracy in various tasks such as image classification.

At training time, this function is defined as

\[\begin{split}\begin{eqnarray} \mu &=& \frac{1}{M} \sum x_i \\ y_i &=& x_i - \mu \end{eqnarray}\end{split}\]

At testing time, the mean values used are those that were computed during training by moving average.

Note

The backward performs an approximated differentiation that takes into account only the latest mini-batch.

Parameters
  • x (Variable) – N-D array of input.

  • mean (Variable) – N-D array of running mean (modified during forward execution).

  • t (Variable) – Scalar holding the number of iterations of the running mean computation (modified during forward execution).

  • base_axis (int) – Base axis of the mean subtraction operation. Dimensions up to base_axis are treated as sample dimensions. [default= 1 ]

  • update_running_mean (bool) – Update the running mean during forward execution. [default= True ]

Returns

N-D array.

Return type

Variable

See also

nnabla.function_bases.mean_subtraction.

nnabla.functions.norm_normalization(x, p=None, axes=None, eps=1e-12)[source]

Norm normalization.

\[y_i = \frac{x_i}{\|x\|_p}\]
Parameters
  • x (Variable) – N-D array.

  • p (float) – Order of the norm. [default= 2 ]

  • axes (repeated int64) – Axes to be reduced. If empty list is given, all dimensions are reduced. [default= range(x.ndim) ]

  • eps (float) – Epsilon for the normalization. This eps is added before taking the p-th root in the norm computation. [default= 1e-12 ]

Returns

N-D array

Return type

Variable

nnabla.functions.clip_by_value(x, min, max)[source]

Clip inputs by values.

\[\begin{split}y = \begin{cases} max & (x > max) \\ x & (otherwise) \\ min & (x < min) \end{cases}.\end{split}\]
Parameters
  • x (Variable) – An input variable.

  • min (Variable or float) – A minimum variable or float value by which x is clipped. Note that if a Variable is given, its shape must be the same as x’s.

  • max (Variable or float) – A maximum variable or float value by which x is clipped. Note that if a Variable is given, its shape must be the same as x’s.

Returns

N-D array.

Return type

Variable
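
A quick check against the NumPy equivalent (a minimal sketch; min and max are passed as floats here, which the signature above allows):

import numpy as np
import nnabla as nn
import nnabla.functions as F

nn.set_auto_forward(True)
x = nn.Variable.from_numpy_array(np.linspace(-2, 2, 9).astype(np.float32))
y = F.clip_by_value(x, -1.0, 1.0)
assert np.allclose(y.d, np.clip(x.d, -1.0, 1.0))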

nnabla.functions.clip_grad_by_value(x, min, max, n_outputs=- 1, outputs=None)[source]

In forward pass, the function behaves as the identity.

In backward pass,

\[\begin{split}g_x = \begin{cases} max & (g_y > max) \\ g_y & (otherwise) \\ min & (g_y < min) \end{cases}.\end{split}\]

A typical use case is to prevent gradient explosion throughout the whole computational graph. For example, to clip gradient values for each feature map:

import numpy as np
import nnabla as nn
import nnabla.functions as F
import nnabla.parametric_functions as PF

x = nn.Variable([16, 3, 32, 32])
min = F.broadcast(nn.Variable.from_numpy_array(np.asarray([-1.0]).reshape((1, 1, 1, 1))), (16, 3, 32, 32))
max = F.broadcast(nn.Variable.from_numpy_array(np.asarray([1.0]).reshape((1, 1, 1, 1))), (16, 3, 32, 32))
c = F.clip_grad_by_value(x, min=min, max=max)
h = PF.convolution(c, 64, (3, 3), pad=(1, 1))
Parameters
  • x (Variable) – N-D array of input.

  • min (Variable) – N-D array of minimum values by which the gradients of y are clipped. Note that the shape of min must be the same as x’s, and no backward is performed with respect to min.

  • max (Variable) – N-D array of maximum values by which the gradients of y are clipped. Note that the shape of max must be the same as x’s, and no backward is performed with respect to max.

Returns

N-D array.

Return type

Variable

Note

All nnabla functions in nnabla.functions are decorated with the nnabla.function_bases.function_api decorator, which queries the current context and passes it into the first argument of the original function. The original function always takes a context as the first argument.

nnabla.functions.clip_by_norm(x, clip_norm, axis=None)[source]

Clip the input by its L2 norm when the norm is larger than the threshold value (defined by clip_norm). If the norm is less than or equal to the threshold, the input is not modified. When clipping is applied, the operation is represented as

\[y = N \times \frac{x}{\|x\|_2}.\]

where \(x\) is the input, \(y\) is the output, and \(N\) is clip_norm. This is the case when axis is not set. When axis is set, the norm is computed over the specified axes.

Parameters
  • x (Variable) – An input variable.

  • clip_norm (Variable or float) – An input scalar variable or float value. Must be positive.

  • axis (None, int or tuple of ints) – Axis or axes along which the reduction is performed. Passing the default value None will reduce all dimensions.

Returns

N-D array.

Return type

Variable
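
A minimal sketch: rows whose L2 norm exceeds clip_norm are rescaled to have norm clip_norm, while the others are left untouched.

import numpy as np
import nnabla as nn
import nnabla.functions as F

nn.set_auto_forward(True)
x = nn.Variable.from_numpy_array((np.random.rand(4, 8) * 10).astype(np.float32))
y = F.clip_by_norm(x, clip_norm=1.0, axis=1)
# After clipping, no row norm exceeds the threshold (up to numerical error).
assert np.all(np.linalg.norm(y.d, axis=1) <= 1.0 + 1e-5)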

nnabla.functions.clip_grad_by_norm(x, clip_norm=None, axes=None, n_outputs=- 1, outputs=None)[source]

In the forward pass, the function behaves like the identity.

In the backward pass,

\[g_x = N \times \frac{g_y}{\|g_y\|_2}.\]

where \(g_x\) is the gradient w.r.t. the input, \(g_y\) is the gradient w.r.t. the output, and \(N\) is clip_norm, the value to which the norm of \(g_y\) is clipped. This is the case when axes is not set. When axes is set, the norm is computed over the specified axes.

A typical use case is to prevent gradient explosion throughout the whole computational graph. For example, to normalize gradient values over the feature axis:

import nnabla as nn
import nnabla.functions as F
import nnabla.parametric_functions as PF
x = nn.Variable([16, 3, 32, 32])
c = F.clip_grad_by_norm(x, axes=(1,))
h = PF.convolution(c, 64, (3, 3), pad=(1, 1))
Parameters
  • x (Variable) – N-D array of input.

  • clip_norm (float) – Clip to the norm of input to clip_norm in the backward pass. [default= 1.0 ]

  • axes (repeated int64) – Axes to be reduced. If empty list is given, all dimensions are reduced to scalar. This is used in the forward pass. [default= range(x.ndim) ]

Returns

N-D array.

Return type

Variable

Note

All nnabla functions in nnabla.functions are decorated with the nnabla.function_bases.function_api decorator, which queries the current context and passes it into the first argument of the original function. The original function always takes a context as the first argument.

nnabla.functions.layer_normalization(x, beta, gamma, batch_axis=0, eps=1e-05, output_stat=False)[source]

Applies Layer Normalization over an input tensor, which is defined as:

\[\begin{split}\begin{eqnarray} \mu^l &=& \frac{1}{H} \sum_{i=1}^{H} x_i^l \\ \sigma^l &=& \sqrt{\frac{1}{H} \sum_{i=1}^{H} \left(x_i^l - \mu^l\right)^2 + \epsilon} \\ y &=& \frac{x - \mu^l}{\sigma^l} \gamma + \beta \end{eqnarray}\end{split}\]

where \(x\) and \(y\) are the input and output variables, \(\mu^l\) and \(\sigma^l\) are the mean and std which are separately calculated for each sample along the batch axes, and \(\beta\) and \(\gamma\) are adaptive biases and gains.

If the input shape is [B, C, H, W] (= batch_axis=0), the shapes of the calculated mean and std are [B, 1, 1, 1].

Parameters
  • x (Variable) – An input variable.

  • beta (Variable or None) – Adaptive biases. If None, the bias term is omitted.

  • gamma (Variable or None) – Adaptive gains. If None, the scale term is omitted.

  • batch_axis (int or repeated int) – Axes along which the mean and variance are taken.

  • eps (float) – Tiny value to avoid zero division by std.

  • output_stat (bool) – If true, calculated mean and variance are also returned.

Returns

Output variable which is normalized by its statistics and rescaled by gamma and beta.
  • Variable: Mean (if output_stat=True)
  • Variable: Std (if output_stat=True)

Return type

Variable

nnabla.functions.instance_normalization(x, beta, gamma, channel_axis=1, batch_axis=0, eps=1e-05, output_stat=False)[source]

Applies Instance Normalization over an input tensor, which is defined as:

\[\begin{split}\begin{eqnarray} \mu^i &=& \frac{1}{H} \sum_{j=1}^{H} x_j^i \\ \sigma^i &=& \sqrt{\frac{1}{H} \sum_{j=1}^{H} \left(x_j^i - \mu^i\right)^2 + \epsilon} \\ y &=& \frac{x - \mu^i}{\sigma^i} \gamma + \beta \end{eqnarray}\end{split}\]

where \(x\) and \(y\) are the input and output variables, \(\mu^i\) and \(\sigma^i\) are the mean and std which are separately calculated for each batch and channel, and \(\gamma\) and \(\beta\) are adaptive gains and biases.

If the input shape is [B, C, H, W] (= channel_axis=1, batch_axis=0), the shapes of the calculated mean and std are [B, C, 1, 1].

Parameters
  • x (Variable) – An input variable.

  • beta (Variable or None) – Adaptive biases. If None, the bias term is omitted.

  • gamma (Variable or None) – Adaptive gains. If None, the scale term is omitted.

  • channel_axis (int) – Channel axis.

  • batch_axis (int or repeated int) – Batch axes.

  • eps (float) – Tiny value to avoid zero division by std.

  • output_stat (bool) – If true, the calculated mean and std are also returned.

Returns

Normalized output variable.
  • Variable: Mean (if output_stat=True)
  • Variable: Std (if output_stat=True)

Return type

Variable

nnabla.functions.group_normalization(x, beta, gamma, num_groups, channel_axis=1, batch_axis=0, eps=1e-05, output_stat=False)[source]

Applies Group Normalization over an input tensor, which is defined as:

\[\begin{split}\begin{eqnarray} \mu^g &=& \frac{1}{H} \sum_{i=1}^{H} x_i^g \\ \sigma^g &=& \sqrt{\frac{1}{H} \sum_{i=1}^{H} \left(x_i^g - \mu^g\right)^2 + \epsilon} \\ y &=& \frac{x - \mu^g}{\sigma^g} \gamma + \beta \end{eqnarray}\end{split}\]

where \(x\) and \(y\) are input and output variable, \(\mu^g\) and \(\sigma^g\) are the mean and std of each group which contains num_channels / num_groups channels, and \(\gamma\) and \(\beta\) are adaptive gains and biases.

The input channels, specified by channel_axis, are separated into num_groups groups, and the mean and std are calculated over each group. For example, if the input shape is [B, C, H, W] (= channel_axis=1, batch_axis=0), the input variable is first reshaped to [B, num_groups, C / num_groups, H, W] and standardized by its mean and std whose shapes are [B, num_groups, 1, 1, 1]. Finally, the output variable is reshaped again to the original input shape (= [B, C, H, W] in the case above).

Parameters
  • x (Variable) – An input variable.

  • beta (Variable or None) – Adaptive biases. If None, the bias term is omitted.

  • gamma (Variable or None) – Adaptive gains. If None, the scale term is omitted.

  • num_groups (int) – Number of groups. The channel dimension of x must be an integer multiple of num_groups.

  • channel_axis (int) – Channel axis.

  • batch_axis (int or repeated int) – Batch axes.

  • eps (float) – Tiny value to avoid zero division by std.

  • output_stat (bool) – If true, the calculated mean and std are also returned.

Returns

Normalized output variable.
  • Variable: Mean (if output_stat=True)
  • Variable: Std (if output_stat=True)

Return type

Variable

nnabla.functions.weight_standardization(w, channel_axis=0, eps=1e-05, output_stat=False)[source]

Applies Weight Standardization over an input weight, which is defined as:

\[\begin{split}\begin{eqnarray} \mu_{W_i} &=& \frac{1}{I} \sum_{j=1}^{I} W_{ij} \\ \sigma_{W_i} &=& \sqrt{\frac{1}{I} \sum_{j=1}^{I} \left(W_{ij} - \mu_{W_{i}}\right)^2 + \epsilon} \\ \hat{W_{ij}} &=& \frac{W_{ij} - \mu_{W_i}}{\sigma_{W_i}} \\ y &=& \hat{W} \ast x \end{eqnarray}\end{split}\]

Example

import numpy as np
import nnabla as nn
import nnabla.functions as F
import nnabla.parametric_functions as PF

rng = np.random.RandomState(313)
x = nn.Variable.from_numpy_array(rng.randn(*(32, 16, 3, 3)))

# For convolution:

def ws_callback_conv(w):
    return F.weight_standardization(w, channel_axis=0)

y = PF.convolution(x, 10, (2, 2), apply_w=ws_callback_conv)

# For affine:

def ws_callback_affine(w):
    return F.weight_standardization(w, channel_axis=1)

y = PF.affine(x, 10, apply_w=ws_callback_affine)

Parameters
  • w (Variable) – A weight variable.

  • channel_axis (int) – An axis for the output channel. The default value is 0, which assumes convolution weights.

  • eps (float) – Tiny value to avoid zero division by std.

  • output_stat (bool) – If true, the calculated mean and std are also returned.

Returns

Standardized output weight.
  • Variable: Mean (if output_stat=True)
  • Variable: Std (if output_stat=True)

Return type

Variable

nnabla.functions.weight_normalization(w, g, dim=0, eps=1e-12, n_outputs=- 1, outputs=None)[source]

Weight normalization.

\[\mathbf{w}_{WN} = g \dfrac{\mathbf{w}}{\|\mathbf{w}\|}\]

where \(\mathbf{w}\) is the input weight to be normalized and \(g\) is a 1-D array of learnable multiplication factors, each of which is applied to the corresponding slice of the weights along dim.

Parameters
  • w (Variable) – N-D array of learnable weights.

  • g (Variable) – 1-D array of learnable scales.

  • dim (int) – Output dimension. For the other dimensions, the norms are computed. [default= 0 ]

  • eps (float) – Epsilon for the normalization. This eps is added before taking the sqrt in the norm computation. [default= 1e-12 ]

Returns

N-D array

Return type

Variable

Note

All nnabla functions in nnabla.functions are decorated with the nnabla.function_bases.function_api decorator, which queries the current context and passes it into the first argument of the original function. The original function always takes a context as the first argument.
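
A minimal sketch (shapes are assumptions): with dim=0, each output slice w[i] is normalized to unit norm and scaled by g[i].

import numpy as np
import nnabla as nn
import nnabla.functions as F

nn.set_auto_forward(True)
rng = np.random.RandomState(313)
w = nn.Variable.from_numpy_array(rng.randn(10, 5).astype(np.float32))
g = nn.Variable.from_numpy_array(np.full((10,), 2.0, dtype=np.float32))
w_wn = F.weight_normalization(w, g, dim=0)
# Each row now has L2 norm (approximately) equal to the corresponding g.
assert np.allclose(np.linalg.norm(w_wn.d, axis=1), g.d, atol=1e-4)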

nnabla.functions.spectral_norm(w, u, dim=0, itr=1, eps=1e-12, test=False, output_u=False)[source]

Spectral Normalization.

\[W_{sn} = \frac{W}{\sigma(W)}.\]

where \(W\) is the input matrix and \(\sigma(W)\) is the spectral norm of \(W\), which is approximately computed by power iteration.

References

Takeru Miyato, Toshiki Kataoka, Masanori Koyama, Yuichi Yoshida, “Spectral Normalization for Generative Adversarial Networks”, International Conference on Learning Representations. 2018.

Parameters
  • w (Variable) – N-D array of learnable weights. This is normally a network parameter.

  • u (Variable) – 1-D array of singular vector. When test == False, the data region of u will be updated during forward calculation.

  • dim (int) – Output dimension. Default is 0. If the dimension is not 0, then the specified dimension becomes the most-left dimension by transposing. [default= 0 ]

  • itr (int) – Number of power iterations. Default is 1. [default= 1 ]

  • eps (float) – Epsilon for the normalization. This eps is added before taking the sqrt in the norm computation. [default= 1e-12 ]

  • test (bool) – If True, u will not be updated. Default is False. [default= False ]

  • output_u (bool) – Whether to output the original u. u is updated when test == False, but this option lets you get the original u as an output. Default is False. [default= False ]

Returns

Spectrally normalized \(W_{sn}\) with the same shape as \(W\).

Return type

Variable
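
A minimal usage sketch. The shape of u is an assumption here (taken as w.shape[dim], i.e. one entry per row after transposing dim to the front); u is updated in place during forward when test == False.

import numpy as np
import nnabla as nn
import nnabla.functions as F

nn.set_auto_forward(True)
rng = np.random.RandomState(313)
w = nn.Variable.from_numpy_array(rng.randn(8, 4).astype(np.float32))
u = nn.Variable.from_numpy_array(rng.randn(8).astype(np.float32))  # assumed shape: (w.shape[0],)
w_sn = F.spectral_norm(w, u, dim=0, itr=1)
# With enough power iterations, the largest singular value approaches 1.
print(np.linalg.svd(w_sn.d, compute_uv=False)[0])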

Reduction
nnabla.functions.sum(x, axis=None, keepdims=False)[source]

Reduction along axes with sum operation.

Parameters
  • x (Variable) – An input variable.

  • axis (None, int or tuple of ints) – Axis or axes along which the sum is calculated. Passing the default value None will reduce all dimensions.

  • keepdims (bool) – Flag whether the reduced axes are kept as a dimension with 1 element.

Returns

N-D array.

Return type

Variable

nnabla.functions.mean(x, axis=None, keepdims=False)[source]

Reduction along axes with mean operation.

Parameters
  • x (Variable) – An input variable.

  • axis (None, int or tuple of ints) – Axis or axes along which mean is calculated. Passing the default value None will reduce all dimensions.

  • keepdims (bool) – Flag whether the reduced axes are kept as a dimension with 1 element.

Returns

N-D array.

Return type

Variable
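
The reductions above follow NumPy semantics for axis and keepdims, as this minimal sketch illustrates:

import numpy as np
import nnabla as nn
import nnabla.functions as F

nn.set_auto_forward(True)
x = nn.Variable.from_numpy_array(np.arange(6).reshape(2, 3).astype(np.float32))
s = F.sum(x, axis=1, keepdims=True)  # reduced axis kept with size 1
m = F.mean(x)                        # axis=None reduces all dimensions
assert s.shape == (2, 1)
assert np.allclose(s.d, np.sum(x.d, axis=1, keepdims=True))
assert np.allclose(m.d, np.mean(x.d))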

nnabla.functions.max(x, axis=None, keepdims=False, with_index=False, only_index=False)[source]

Reduce the input N-D array x along the given axis using the max operation. The axis argument may be a single integer to reduce over one axis, a tuple of integers to reduce over multiple axes, or None to reduce over all axes. If keepdims is True, the output will keep all reduced dimensions with size 1. If with_index is True, the result is a tuple (values, indices), or only indices if only_index is True. Setting only_index to True implies that with_index is also True.

import numpy as np
import nnabla as nn
import nnabla.functions as F

nn.set_auto_forward(True)
x = nn.Variable.from_numpy_array(np.random.rand(2, 3, 4))

maxval = F.max(x, axis=1)
assert np.allclose(maxval.d, np.max(x.d, axis=1))

maxval, indices = F.max(x, axis=1, with_index=True)
assert np.allclose(maxval.d, np.max(x.d, axis=1))
assert np.all(indices.d == np.argmax(x.d, axis=1))

indices = F.max(x, axis=1, only_index=True)
assert np.all(indices.d == np.argmax(x.d, axis=1))
Parameters
  • x (Variable) – An input variable.

  • axis (None, int or tuple of ints) – Axis or axes along which max is calculated. The default value None will reduce all dimensions.

  • keepdims (bool) – Keep reduced axes as dimension with 1 element.

  • with_index (bool) – Return tuple of max values and index.

  • only_index (bool) – Return only the index of max values.

Returns

N-D array.

Return type

Variable

nnabla.functions.min(x, axis=None, keepdims=False, with_index=False, only_index=False)[source]

Reduce the input N-D array x along the given axis using the min operation. The axis argument may be a single integer to reduce over one axis, a tuple of integers to reduce over multiple axes, or None to reduce over all axes. If keepdims is True, the output will keep all reduced dimensions with size 1. If with_index is True, the result is a tuple (values, indices), or only indices if only_index is True. Setting only_index to True implies that with_index is also True.

import numpy as np
import nnabla as nn
import nnabla.functions as F

nn.set_auto_forward(True)
x = nn.Variable.from_numpy_array(np.random.rand(2, 3, 4))

minval = F.min(x, axis=1)
assert np.allclose(minval.d, np.min(x.d, axis=1))

minval, indices = F.min(x, axis=1, with_index=True)
assert np.allclose(minval.d, np.min(x.d, axis=1))
assert np.all(indices.d == np.argmin(x.d, axis=1))

indices = F.min(x, axis=1, only_index=True)
assert np.all(indices.d == np.argmin(x.d, axis=1))
Parameters
  • x (Variable) – An input variable.

  • axis (None, int or tuple of ints) – Axis or axes along which min is calculated. The default value None will reduce all dimensions.

  • keepdims (bool) – Keep reduced axes as dimension with 1 element.

  • with_index (bool) – Return tuple of min values and index.

  • only_index (bool) – Return only the index of min values.

Returns

N-D array.

Return type

Variable

nnabla.functions.norm(x, p=None, axis=None, keepdims=False)[source]

Reduction along axes with norm operation.

\[y = \|x\|_p = \left( \sum_i |x_i|^p \right)^{\frac{1}{p}}\]
Parameters
  • x (Variable) – An input variable.

  • p (float) – Order of the norm.

  • axis (None, int or tuple of ints) – Axis or axes along which the norm is calculated. Passing the default value None will reduce all dimensions.

  • keepdims (bool) – Flag whether the reduced axes are kept as a dimension with 1 element.

Returns

N-D array.

Return type

Variable
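
A quick check against the NumPy equivalent (a minimal sketch):

import numpy as np
import nnabla as nn
import nnabla.functions as F

nn.set_auto_forward(True)
x = nn.Variable.from_numpy_array(np.random.rand(2, 3).astype(np.float32))
y = F.norm(x, p=2, axis=1)
assert np.allclose(y.d, np.linalg.norm(x.d, ord=2, axis=1), atol=1e-6)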

nnabla.functions.prod(x, axis=None, keepdims=False)[source]

Reduction along axes with product operation.

Parameters
  • x (Variable) – An input variable.

  • axis (None, int or tuple of ints) – Axis or axes along which product is calculated. Passing the default value None will reduce all dimensions.

  • keepdims (bool) – Flag whether the reduced axes are kept as a dimension with 1 element.

Returns

N-D array.

Return type

Variable

Note

Backward computation is not accurate when the input contains zero values.

nnabla.functions.reduce_sum(x, n_outputs=- 1, outputs=None)[source]

Reduction along an axis with sum operation.

Note

This is deprecated. Use sum instead.

Parameters

x (Variable) – N-D array.

Returns

N-D array

Return type

Variable

Note

All nnabla functions in nnabla.functions are decorated with the nnabla.function_bases.function_api decorator, which queries the current context and passes it into the first argument of the original function. The original function always takes a context as the first argument.

nnabla.functions.reduce_mean(x, n_outputs=- 1, outputs=None)[source]

Reduction by mean along an axis.

Note

This is deprecated. Use mean instead.

Parameters

x (Variable) – N-D array

Returns

N-D array

Return type

Variable

Note

All nnabla functions in nnabla.functions are decorated with the nnabla.function_bases.function_api decorator, which queries the current context and passes it into the first argument of the original function. The original function always takes a context as the first argument.

Arithmetic
nnabla.functions.add2(x0, x1, inplace=False, n_outputs=- 1, outputs=None)[source]

Element-wise addition.

\[y_i = x^{(0)}_i + x^{(1)}_i\]
Parameters
  • x0 (Variable) – N-D array

  • x1 (Variable) – N-D array

  • inplace (bool) – This option is obsolete and ignored. Output is never in-placed with input. [default= False ]

Returns

N-D array

Return type

Variable

Note

All nnabla functions in nnabla.functions are decorated with the nnabla.function_bases.function_api decorator, which queries the current context and passes it into the first argument of the original function. The original function always takes a context as the first argument.

nnabla.functions.add_n(*x, **kw)[source]

Element-wise addition.

\[y_i = x^{(0)}_i + \cdots + x^{(n-1)}_i\]
Parameters

*x (Variable) – N-D arrays [variadic]

Returns

N-D array

Return type

Variable

Note

All nnabla functions in nnabla.functions are decorated with the nnabla.function_bases.function_api decorator, which queries the current context and passes it into the first argument of the original function. The original function always takes a context as the first argument.
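
A minimal sketch of the variadic form:

import numpy as np
import nnabla as nn
import nnabla.functions as F

nn.set_auto_forward(True)
a = nn.Variable.from_numpy_array(np.ones((2, 3), dtype=np.float32))
b = nn.Variable.from_numpy_array(np.full((2, 3), 2.0, dtype=np.float32))
c = nn.Variable.from_numpy_array(np.full((2, 3), 3.0, dtype=np.float32))
y = F.add_n(a, b, c)
assert np.allclose(y.d, a.d + b.d + c.d)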

nnabla.functions.sub2(x0, x1, inplace=False, n_outputs=- 1, outputs=None)[source]

Element-wise subtraction.

\[y_i = x^{(0)}_i - x^{(1)}_i\]
Parameters
  • x0 (Variable) – N-D array

  • x1 (Variable) – N-D array

  • inplace (bool) – This option is obsolete and ignored. Output is never in-placed with input. [default= False ]

Returns

N-D array

Return type

Variable

Note

All nnabla functions in nnabla.functions are decorated with the nnabla.function_bases.function_api decorator, which queries the current context and passes it into the first argument of the original function. The original function always takes a context as the first argument.

nnabla.functions.mul2(x0, x1, inplace=False, n_outputs=- 1, outputs=None)[source]

Element-wise multiplication.

\[y_i = x^{(0)}_i x^{(1)}_i\]
Parameters
  • x0 (Variable) – N-D array

  • x1 (Variable) – N-D array

  • inplace (bool) – This option is obsolete and ignored. Output is never in-placed with input. [default= False ]

Returns

N-D array

Return type

Variable

Note

All nnabla functions in nnabla.functions are decorated with the nnabla.function_bases.function_api decorator, which queries the current context and passes it into the first argument of the original function. The original function always takes a context as the first argument.

nnabla.functions.mul_n(*x, **kw)[source]

Element-wise multiplication.

\[y_i = x^{(0)}_i \cdots x^{(n-1)}_i\]
Parameters

*x (Variable) – N-D arrays [variadic]

Returns

N-D array

Return type

Variable

Note

All nnabla functions in nnabla.functions are decorated with the nnabla.function_bases.function_api decorator, which queries the current context and passes it into the first argument of the original function. The original function always takes a context as the first argument.

nnabla.functions.div2(x0, x1, inplace=False, n_outputs=- 1, outputs=None)[source]

Element-wise division.

\[y_i = \frac{x^{(0)}_i} {x^{(1)}_i}\]
Parameters
  • x0 (Variable) – N-D array

  • x1 (Variable) – N-D array

  • inplace (bool) – This option is obsolete and ignored. Output is never in-placed with input. [default= False ]

Returns

N-D array

Return type

Variable

Note

All nnabla functions in nnabla.functions are decorated with the nnabla.function_bases.function_api decorator, which queries the current context and passes it into the first argument of the original function. The original function always takes a context as the first argument.

nnabla.functions.pow2(x0, x1, inplace=False, n_outputs=- 1, outputs=None)[source]

Element-wise power function.

\[y_i = {(x^{(0)}_i)} ^ {x^{(1)}_i}\]
Parameters
  • x0 (Variable) – N-D array

  • x1 (Variable) – N-D array

  • inplace (bool) – This option is obsolete and ignored. Output is never in-placed with input. [default= False ]

Returns

N-D array

Return type

Variable

Note

All nnabla functions in nnabla.functions are decorated with the nnabla.function_bases.function_api decorator, which queries the current context and passes it into the first argument of the original function. The original function always takes a context as the first argument.

nnabla.functions.add_scalar(x, val=1, inplace=False, n_outputs=- 1, outputs=None)[source]

Element-wise scalar addition.

\[y_i = x_i + v\]
Parameters
  • x (Variable) – Input variable

  • val (float) – Value of the scalar [default= 1 ]

  • inplace (bool) – This option is obsolete and ignored. Output is never in-placed with input. [default= False ]

Returns

N-D array with the same shape as x

Return type

Variable

Note

All nnabla functions in nnabla.functions are decorated with the nnabla.function_bases.function_api decorator, which queries the current context and passes it into the first argument of the original function. The original function always takes a context as the first argument.

nnabla.functions.mul_scalar(x, val=1, inplace=False, n_outputs=- 1, outputs=None)[source]

Element-wise scalar multiplication.

\[y_i = v x_i\]
Parameters
  • x (Variable) – Input variable

  • val (float) – Value of the scalar [default= 1 ]

  • inplace (bool) – This option is obsolete and ignored. Output is never in-placed with input. [default= False ]

Returns

N-D array with the same shape as x

Return type

Variable

Note

All nnabla functions in nnabla.functions are decorated with the nnabla.function_bases.function_api decorator, which queries the current context and passes it into the first argument of the original function. The original function always takes a context as the first argument.

nnabla.functions.pow_scalar(x, val=1, inplace=False, n_outputs=- 1, outputs=None)[source]

Element-wise scalar power function.

\[y_i = (x_i) ^ v\]
Parameters
  • x (Variable) – Input variable

  • val (float) – Value of the scalar [default= 1 ]

  • inplace (bool) – This option is obsolete and ignored. Output is never in-placed with input. [default= False ]

Returns

N-D array with the same shape as x

Return type

Variable

Note

All nnabla functions in nnabla.functions are decorated with the nnabla.function_bases.function_api decorator, which queries the current context and passes it into the first argument of the original function. The original function always takes a context as the first argument.

nnabla.functions.r_sub_scalar(x, val=1, n_outputs=- 1, outputs=None)[source]

Element-wise scalar subtraction.

\[y_i = v - x_i\]
Parameters
  • x (Variable) – Input variable

  • val (float) – Value of the scalar [default= 1 ]

Returns

N-D array with the same shape as x

Return type

Variable

Note

All nnabla functions in nnabla.functions are decorated with the nnabla.function_bases.function_api decorator, which queries the current context and passes it into the first argument of the original function. The original function always takes a context as the first argument.

nnabla.functions.r_div_scalar(x, val=1, n_outputs=- 1, outputs=None)[source]

Element-wise scalar division.

\[y_i = \frac{v}{x_i}\]
Parameters
  • x (Variable) – Input variable

  • val (float) – Value of the scalar [default= 1 ]

Returns

N-D array with the same shape as x

Return type

Variable

Note

All nnabla functions in nnabla.functions are decorated with the nnabla.function_bases.function_api decorator, which queries the current context and passes it into the first argument of the original function. The original function always takes a context as the first argument.

nnabla.functions.r_pow_scalar(x, val=1, n_outputs=- 1, outputs=None)[source]

Element-wise scalar power function.

\[y_i = v ^ {x_i}\]
Parameters
  • x (Variable) – Input variable

  • val (float) – Value of the scalar [default= 1 ]

Returns

N-D array with the same shape as x

Return type

Variable

Note

All nnabla functions in nnabla.functions are decorated with the nnabla.function_bases.function_api decorator, which queries the current context and passes it into the first argument of the original function. The original function always takes a context as the first argument.

Logical
nnabla.functions.equal(x0, x1, n_outputs=- 1, outputs=None)[source]

Element wise ‘equal’

\[\begin{split}f(x^{(0)}_i,x^{(1)}_i) = \begin{cases} 1 & (x^{(0)}_i = x^{(1)}_i) \\ 0 & otherwise \end{cases}.\end{split}\]
Parameters
  • x0 (Variable) – N-D array

  • x1 (Variable) – N-D array

Returns

N-D array of element-wise results (0 or 1)

Return type

Variable

Note

All nnabla functions in nnabla.functions are decorated with the nnabla.function_bases.function_api decorator, which queries the current context and passes it into the first argument of the original function. The original function always takes a context as the first argument.

nnabla.functions.equal_scalar(x0, val=1, n_outputs=- 1, outputs=None)[source]

Element wise ‘equal’ with a scalar

\[\begin{split}f(x_i,v) = \begin{cases} 1 & (x_i = v) \\ 0 & otherwise \end{cases}.\end{split}\]
Parameters
  • x0 (Variable) – Input variable

  • val (float) – Value of the scalar [default= 1 ]

Returns

N-D array with the same shape as x

Return type

Variable

Note

All nnabla functions in nnabla.functions are decorated with the nnabla.function_bases.function_api decorator, which queries the current context and passes it into the first argument of the original function. The original function always takes a context as the first argument.
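
The comparison functions in this section return 0/1 arrays, as this minimal sketch illustrates:

import numpy as np
import nnabla as nn
import nnabla.functions as F

nn.set_auto_forward(True)
a = nn.Variable.from_numpy_array(np.array([1.0, 2.0, 3.0], dtype=np.float32))
b = nn.Variable.from_numpy_array(np.array([1.0, 0.0, 3.0], dtype=np.float32))
assert np.all(F.equal(a, b).d == np.array([1, 0, 1]))
assert np.all(F.equal_scalar(a, 2.0).d == np.array([0, 1, 0]))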

nnabla.functions.greater(x0, x1, n_outputs=- 1, outputs=None)[source]

Element wise comparison. The \(i^{th}\) element of the output is:

\[\begin{split}f(x^{(0)}_i,x^{(1)}_i) = \begin{cases} 1 & (x^{(0)}_i > x^{(1)}_i) \\ 0 & (x^{(0)}_i \leq x^{(1)}_i) \end{cases}.\end{split}\]
Parameters
  • x0 (Variable) – N-D array

  • x1 (Variable) – N-D array

Returns

N-D array of element-wise results (0 or 1)

Return type

Variable

Note

All nnabla functions in nnabla.functions are decorated with the nnabla.function_bases.function_api decorator, which queries the current context and passes it into the first argument of the original function. The original function always takes a context as the first argument.

nnabla.functions.greater_equal(x0, x1, n_outputs=- 1, outputs=None)[source]

Element wise comparison. The \(i^{th}\) element of the output is:

\[\begin{split}f(x^{(0)}_i,x^{(1)}_i) = \begin{cases} 1 & (x^{(0)}_i \geq x^{(1)}_i) \\ 0 & (x^{(0)}_i < x^{(1)}_i) \end{cases}.\end{split}\]
Parameters
  • x0 (Variable) – N-D array

  • x1 (Variable) – N-D array

Returns

N-D array of element-wise results (0 or 1)

Return type

Variable

Note

All nnabla functions in nnabla.functions are decorated with the nnabla.function_bases.function_api decorator, which queries the current context and passes it into the first argument of the original function. The original function always takes a context as the first argument.

nnabla.functions.greater_equal_scalar(x0, val=1, n_outputs=- 1, outputs=None)[source]

Element wise comparison with a scalar. The \(i^{th}\) element of the output is:

\[\begin{split}f(x^{(0)}_i,v) = \begin{cases} 1 & (x^{(0)}_i \geq v) \\ 0 & (x^{(0)}_i < v) \end{cases}.\end{split}\]
Parameters
  • x0 (Variable) – Input variable

  • val (float) – Value of the scalar [default= 1 ]

Returns

N-D array with the same shape as x

Return type

Variable

Note

All nnabla functions in nnabla.functions are decorated with the nnabla.function_bases.function_api decorator, which queries the current context and passes it into the first argument of the original function. The original function always takes a context as the first argument.

nnabla.functions.greater_scalar(x0, val=1, n_outputs=- 1, outputs=None)[source]

Element wise comparison with a scalar. The \(i^{th}\) element of the output is:

\[\begin{split}f(x^{(0)}_i,v) = \begin{cases} 1 & (x^{(0)}_i > v) \\ 0 & (x^{(0)}_i \leq v) \end{cases}.\end{split}\]
Parameters
  • x0 (Variable) – Input variable

  • val (float) – Value of the scalar [default= 1 ]

Returns

N-D array with the same shape as x

Return type

Variable

Note

All nnabla functions in nnabla.functions are decorated with the nnabla.function_bases.function_api decorator, which queries the current context and passes it into the first argument of the original function. The original function always takes a context as the first argument.

nnabla.functions.less(x0, x1, n_outputs=- 1, outputs=None)[source]

Element wise comparison. The \(i^{th}\) element of the output is:

\[\begin{split}f(x^{(0)}_i,x^{(1)}_i) = \begin{cases} 1 & (x^{(0)}_i < x^{(1)}_i) \\ 0 & (x^{(0)}_i \geq x^{(1)}_i) \end{cases}.\end{split}\]
Parameters
  • x0 (Variable) – N-D array

  • x1 (Variable) – N-D array

Returns

N-D array of element-wise results (0 or 1)

Return type

Variable

Note

All nnabla functions in nnabla.functions are decorated with the nnabla.function_bases.function_api decorator, which queries the current context and passes it into the first argument of the original function. The original function always takes a context as the first argument.

nnabla.functions.less_equal(x0, x1, n_outputs=- 1, outputs=None)[source]

Element wise comparison. The \(i^{th}\) element of the output is:

\[\begin{split}f(x^{(0)}_i,x^{(1)}_i) = \begin{cases} 1 & (x^{(0)}_i \leq x^{(1)}_i) \\ 0 & (x^{(0)}_i > x^{(1)}_i) \end{cases}.\end{split}\]
Parameters
  • x0 (Variable) – N-D array

  • x1 (Variable) – N-D array

Returns

N-D array of element-wise results (0 or 1)

Return type

Variable

Note

All nnabla functions in nnabla.functions are decorated with the nnabla.function_bases.function_api decorator, which queries the current context and passes it into the first argument of the original function. The original function always takes a context as the first argument.

nnabla.functions.less_equal_scalar(x0, val=1, n_outputs=- 1, outputs=None)[source]

Element wise comparison with a scalar. The \(i^{th}\) element of the output is:

\[\begin{split}f(x^{(0)}_i,v) = \begin{cases} 1 & (x^{(0)}_i \leq v) \\ 0 & (x^{(0)}_i > v) \end{cases}.\end{split}\]
Parameters
  • x0 (Variable) – Input variable

  • val (float) – Value of the scalar [default= 1 ]

Returns

N-D array with the same shape as x

Return type

Variable

Note

All nnabla functions in nnabla.functions are decorated with the nnabla.function_bases.function_api decorator, which queries the current context and passes it into the first argument of the original function. The original function always takes a context as the first argument.

nnabla.functions.less_scalar(x0, val=1, n_outputs=- 1, outputs=None)[source]

Element wise comparison with a scalar. The \(i^{th}\) element of the output is:

\[\begin{split}f(x^{(0)}_i,v) = \begin{cases} 1 & (x^{(0)}_i < v) \\ 0 & (x^{(0)}_i \geq v) \end{cases}.\end{split}\]
Parameters
  • x0 (Variable) – Input variable

  • val (float) – Value of the scalar [default= 1 ]

Returns

N-D array with the same shape as x

Return type

Variable

Note

All nnabla functions in nnabla.functions are decorated with the nnabla.function_bases.function_api decorator, which queries the current context and passes it into the first argument of the original function. The original function always takes a context as the first argument.

nnabla.functions.logical_and(x0, x1, n_outputs=- 1, outputs=None)[source]

Elementwise logical AND.

\[\begin{split}f(x^{(0)}_i,x^{(1)}_i) = \begin{cases} 1 & (x^{(0)}_i \neq 0 \;\&\; x^{(1)}_i \neq 0) \\ 0 & otherwise \end{cases}.\end{split}\]
Parameters
  • x0 (Variable) – N-D array

  • x1 (Variable) – N-D array

Returns

N-D array of element-wise results (0 or 1)

Return type

Variable

Note

All nnabla functions in nnabla.functions are decorated with the nnabla.function_bases.function_api decorator, which queries the current context and passes it into the first argument of the original function. The original function always takes a context as the first argument.

nnabla.functions.logical_and_scalar(x0, val, n_outputs=- 1, outputs=None)[source]

Elementwise logical AND with scalar.

\[\begin{split}f(x_i,v) = \begin{cases} 1 & (x_i \neq 0 \;\&\; v \neq 0) \\ 0 & otherwise \end{cases}.\end{split}\]
Parameters
  • x0 (Variable) – Input variable

  • val (bool) – Scalar value combined with each element.

Returns

N-D array with the same shape as x

Return type

Variable

Note

All nnabla functions in nnabla.functions are decorated with the nnabla.function_bases.function_api decorator, which queries the current context and passes it into the first argument of the original function. The original function always takes a context as the first argument.

nnabla.functions.logical_not(x0, n_outputs=- 1, outputs=None)[source]

Element-wise logical NOT operation

\[\begin{split}f(x_i) = \begin{cases} 1 & (x_i = 0) \\ 0 & otherwise \end{cases}.\end{split}\]
Parameters

x0 (Variable) – Input variable

Returns

N-D array with the same shape as x

Return type

Variable

Note

All nnabla functions in nnabla.functions are decorated with the nnabla.function_bases.function_api decorator, which queries the current context and passes it into the first argument of the original function. The original function always takes a context as the first argument.

nnabla.functions.logical_or(x0, x1, n_outputs=- 1, outputs=None)[source]

Elementwise logical OR.

\[\begin{split}f(x^{(0)}_i,x^{(1)}_i) = \begin{cases} 0 & (x^{(0)}_i = 0 \;\&\; x^{(1)}_i = 0) \\ 1 & otherwise \end{cases}.\end{split}\]
Parameters
  • x0 (Variable) – N-D array

  • x1 (Variable) – N-D array

Returns

N-D array of element-wise results (0 or 1)

Return type

Variable

Note

All nnabla functions in nnabla.functions are decorated with the nnabla.function_bases.function_api decorator, which queries the current context and passes it into the first argument of the original function. The original function always takes a context as the first argument.

nnabla.functions.logical_or_scalar(x0, val, n_outputs=- 1, outputs=None)[source]

Elementwise logical OR with scalar.

\[\begin{split}f(x_i,v) = \begin{cases} 0 & (x_i = 0 \;\&\; v = 0) \\ 1 & otherwise \end{cases}.\end{split}\]
Parameters
  • x0 (Variable) – Input variable

  • val (bool) – Scalar value combined with each element.

Returns

N-D array with the same shape as x

Return type

Variable

Note

All nnabla functions in nnabla.functions are decorated with the nnabla.function_bases.function_api decorator, which queries the current context and passes it into the first argument of the original function. The original function always takes a context as the first argument.

nnabla.functions.logical_xor(x0, x1, n_outputs=- 1, outputs=None)[source]

Elementwise logical XOR.

\[\begin{split}f(x^{(0)}_i,x^{(1)}_i) = \begin{cases} 1 & (x^{(0)}_i = 0 \;\&\; x^{(1)}_i = 0) \\ 1 & (x^{(0)}_i \neq 0 \;\&\; x^{(1)}_i \neq 0) \\ 0 & otherwise \end{cases}.\end{split}\]
Parameters
  • x0 (Variable) – N-D array

  • x1 (Variable) – N-D array

Returns

N-D array of element-wise results (0 or 1)

Return type

Variable

Note

All nnabla functions in nnabla.functions are decorated with the nnabla.function_bases.function_api decorator, which queries the current context and passes it into the first argument of the original function. The original function always takes a context as the first argument.

nnabla.functions.logical_xor_scalar(x0, val, n_outputs=- 1, outputs=None)[source]

Elementwise logical XOR with scalar.

\[\begin{split}f(x_i,v) = \begin{cases} 1 & (x_i = 0 \;\&\; v = 0) \\ 1 & (x_i \neq 0 \;\&\; v \neq 0) \\ 0 & otherwise \end{cases}.\end{split}\]
Parameters
  • x0 (Variable) – Input variable

  • val (bool) – Scalar value combined with each element.

Returns

N-D array with the same shape as x

Return type

Variable

Note

All nnabla functions in nnabla.functions are decorated with the nnabla.function_bases.function_api decorator, which queries the current context and passes it into the first argument of the original function. The original function always takes a context as the first argument.

nnabla.functions.not_equal(x0, x1, n_outputs=- 1, outputs=None)[source]

Element wise ‘not equal’

\[\begin{split}f(x^{(0)}_i,x^{(1)}_i) = \begin{cases} 0 & (x^{(0)}_i = x^{(1)}_i) \\ 1 & otherwise \end{cases}.\end{split}\]
Parameters
  • x0 (Variable) – N-D array

  • x1 (Variable) – N-D array

Returns

N-D array of element-wise results (0 or 1)

Return type

Variable

Note

All nnabla functions in nnabla.functions are decorated with the nnabla.function_bases.function_api decorator, which queries the current context and passes it into the first argument of the original function. The original function always takes a context as the first argument.

nnabla.functions.not_equal_scalar(x0, val=1, n_outputs=- 1, outputs=None)[source]

Element wise ‘not equal’ with a scalar

\[\begin{split}f(x_i,v) = \begin{cases} 0 & (x_i = v) \\ 1 & otherwise \end{cases}.\end{split}\]
Parameters
  • x0 (Variable) – Input variable

  • val (float) – Value of the scalar [default= 1 ]

Returns

N-D array with the same shape as x

Return type

Variable

Note

All nnabla functions in nnabla.functions are decorated with the nnabla.function_bases.function_api decorator, which queries the current context and passes it into the first argument of the original function. The original function always takes a context as the first argument.

nnabla.functions.sign(x, alpha=1.0, n_outputs=- 1, outputs=None)[source]

Element-wise sign function.

In the forward pass, it is defined as

\[\begin{split}f(x) = \begin{cases} 1 & (x > 0) \\ -1 & (x < 0) \\ \alpha & (x = 0) \end{cases}.\end{split}\]

In the backward pass, it is defined as

\[\frac{\partial f(x)}{\partial x} = 1,\]

or in other words, it behaves as the identity function for the gradient in the backward pass.

Parameters
  • x (Variable) – Input

  • alpha (float) – Value in case of \(x = 0\). [default= 1.0 ]

Returns

N-D array with the same shape as x

Return type

Variable

Note

All nnabla functions in nnabla.functions are decorated with the nnabla.function_bases.function_api decorator, which queries the current context and passes it into the first argument of the original function. The original function always takes a context as the first argument.
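
A minimal sketch; note that alpha only controls the output at exactly zero:

import numpy as np
import nnabla as nn
import nnabla.functions as F

nn.set_auto_forward(True)
x = nn.Variable.from_numpy_array(np.array([-2.0, 0.0, 3.0], dtype=np.float32))
y = F.sign(x, alpha=0.0)
assert np.all(y.d == np.array([-1.0, 0.0, 1.0]))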

nnabla.functions.minimum2(x0, x1, n_outputs=- 1, outputs=None)[source]

Element-wise minimum.

\[y_i = \min(x^{(0)}_i, x^{(1)}_i)\]
Parameters
  • x0 (Variable) – N-D array

  • x1 (Variable) – N-D array

Returns

N-D array of min value

Return type

Variable

Note

All nnabla functions in nnabla.functions are decorated with the nnabla.function_bases.function_api decorator, which queries the current context and passes it into the first argument of the original function. The original function always takes a context as the first argument.

nnabla.functions.maximum2(x0, x1, n_outputs=- 1, outputs=None)[source]

Element-wise maximum.

\[y_i = \max(x^{(0)}_i, x^{(1)}_i)\]
Parameters
  • x0 (Variable) – N-D array

  • x1 (Variable) – N-D array

Returns

N-D array of max value

Return type

Variable

Note

All nnabla functions in nnabla.functions are decorated with the nnabla.function_bases.function_api decorator, which queries the current context and passes it into the first argument of the original function. The original function always takes a context as the first argument.

nnabla.functions.minimum_scalar(x, val=1.0, n_outputs=- 1, outputs=None)[source]

Element-wise scalar minimum.

\[y_i = \min(x_i, v)\]
Parameters
  • x (Variable) – Input variable

  • val (float) – Value of the scalar [default= 1.0 ]

Returns

N-D array with the same shape as x

Return type

Variable

Note

All nnabla functions in nnabla.functions are decorated with the nnabla.function_bases.function_api decorator, which queries the current context and passes it into the first argument of the original function. The original function always takes a context as the first argument.

nnabla.functions.maximum_scalar(x, val=1.0, n_outputs=- 1, outputs=None)[source]

Element-wise scalar maximum.

\[y_i = \max (x_i, v)\]
Parameters
  • x (Variable) – Input variable

  • val (float) – Value of the scalar [default= 1.0 ]

Returns

N-D array with the same shape as x

Return type

Variable

Note

All nnabla functions in nnabla.functions are decorated with the nnabla.function_bases.function_api decorator, which queries the current context and passes it into the first argument of the original function. The original function always takes a context as the first argument.

nnabla.functions.isnan(x0, n_outputs=- 1, outputs=None)[source]

Test element-wise for NaN and return a 0/1 array.

Parameters

x0 (Variable) – Input variable

Returns

N-D array with the same shape as x

Return type

Variable

Note

All nnabla functions in nnabla.functions are decorated with the nnabla.function_bases.function_api decorator, which queries the current context and passes it into the first argument of the original function. The original function always takes a context as the first argument.

nnabla.functions.isinf(x0, n_outputs=- 1, outputs=None)[source]

Test element-wise for inf/-inf and return a 0/1 array.

Parameters

x0 (Variable) – Input variable

Returns

N-D array with the same shape as x

Return type

Variable

Note

All nnabla functions in nnabla.functions are decorated with the nnabla.function_bases.function_api decorator, which queries the current context and passes it into the first argument of the original function. The original function always takes a context as the first argument.

nnabla.functions.reset_nan(x0, val=0, n_outputs=- 1, outputs=None)[source]

Replace NaNs with a scalar value specified by val.

Parameters
  • x0 (Variable) – Input variable

  • val (float) – Value of the scalar [default= 0 ]

Returns

N-D array with the same shape as x

Return type

Variable

Note

All nnabla functions in nnabla.functions are decorated with the nnabla.function_bases.function_api decorator, which queries the current context and passes it into the first argument of the original function. The original function always takes a context as the first argument.

nnabla.functions.reset_inf(x0, val=0, n_outputs=- 1, outputs=None)[source]

Replace -inf/inf with a scalar value specified by val.

Parameters
  • x0 (Variable) – Input variable

  • val (float) – Value of the scalar [default= 0 ]

Returns

N-D array with the same shape as x

Return type

Variable

Note

All nnabla functions in nnabla.functions are decorated with the nnabla.function_bases.function_api decorator, which queries the current context and passes it into the first argument of the original function. The original function always takes a context as the first argument.
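
A minimal sketch combining the four functions above to sanitize an array:

import numpy as np
import nnabla as nn
import nnabla.functions as F

nn.set_auto_forward(True)
x = nn.Variable.from_numpy_array(np.array([1.0, np.nan, np.inf], dtype=np.float32))
assert np.all(F.isnan(x).d == np.array([0, 1, 0]))
assert np.all(F.isinf(x).d == np.array([0, 0, 1]))
clean = F.reset_inf(F.reset_nan(x, val=0.0), val=0.0)
assert np.all(clean.d == np.array([1.0, 0.0, 0.0]))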

nnabla.functions.where(condition, x_true, x_false, n_outputs=- 1, outputs=None)[source]

Return elements, either from x_true or x_false, depending on condition.

If rank of condition is higher than those of x_true and x_false, the first dimensions of x_true and x_false must match the dimensions of condition.

Example:

import numpy as np
import nnabla as nn
import nnabla.functions as F

a = nn.Variable.from_numpy_array(np.random.rand(2, 3))
x = nn.Variable.from_numpy_array(np.random.rand(2, 3, 4))
y = nn.Variable.from_numpy_array(np.random.rand(2, 3, 4))
z = F.where(F.greater_scalar(a, 0.5), x, y)
z.forward()

# Numpy equivalent
z_numpy = np.where(a.d > 0.5, x.d, y.d)
assert np.allclose(z_numpy, z.d)
Parameters
  • condition (Variable) – N-d array. For all i, when condition[i] == true, yield x_true[i], otherwise x_false[i].

  • x_true (Variable) – N-d array with higher or equal rank to condition.

  • x_false (Variable) – N-d array with higher or equal rank to condition.

Returns

N-D array with the same shape as condition

Return type

Variable

Note

All nnabla functions in nnabla.functions are decorated with the nnabla.function_bases.function_api decorator, which queries the current context and passes it into the first argument of the original function. The original function always takes a context as the first argument.

Math
nnabla.functions.constant(val=0, shape=[], n_outputs=-1, outputs=None)[source]

Generate a constant-valued array.

Parameters
  • val (float) – Constant value. [default= 0 ]

  • shape (tuple of int) – Shape of the output array. [default= [] ]

Returns

N-D array where all values are the specified constant.

Return type

Variable

Note

All nnabla functions in nnabla.functions are decorated with the nnabla.function_bases.function_api decorator, which queries the current context and passes it into the first argument of the original function. The original function always takes a context as the first argument.

nnabla.functions.arange(start, stop, step=1, n_outputs=-1, outputs=None)[source]

Generate a range of values within the half-open interval [start, stop) (the interval including start but excluding stop) with step increments.

Parameters
  • start (float) – Start value.

  • stop (float) – End value.

  • step (float) – Step value. [default= 1 ]

Returns

1-D array with the generated values.

Return type

Variable

Note

All nnabla functions in nnabla.functions are decorated with the nnabla.function_bases.function_api decorator, which queries the current context and passes it into the first argument of the original function. The original function always takes a context as the first argument.
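
Example (an illustrative sketch of constant() and arange(), added here; auto-forward mode as in other examples):

import nnabla as nn
import nnabla.functions as F

nn.set_auto_forward(True)

c = F.constant(val=3.5, shape=(2, 3))  # 2x3 array filled with 3.5
print(c.d)

r = F.arange(0, 5)           # half-open interval [0, 5)
print(r.d)                   # [0. 1. 2. 3. 4.]
print(F.arange(1, 10, 3).d)  # [1. 4. 7.]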

nnabla.functions.abs(x, n_outputs=-1, outputs=None)[source]

Element-wise absolute value function.

\[y_i = |x_i|\]
Parameters

x (Variable) – Input variable

Returns

Element-wise absolute variable

Return type

Variable

Note

All nnabla functions in nnabla.functions are decorated with the nnabla.function_bases.function_api decorator, which queries the current context and passes it into the first argument of the original function. The original function always takes a context as the first argument.

nnabla.functions.exp(x, n_outputs=-1, outputs=None)[source]

Element-wise natural exponential function.

\[y_i = \exp(x_i).\]
Parameters

x (Variable) – Input variable

Returns

Element-wise exp variable

Return type

Variable

Note

All nnabla functions in nnabla.functions are decorated with the nnabla.function_bases.function_api decorator, which queries the current context and passes it into the first argument of the original function. The original function always takes a context as the first argument.

nnabla.functions.log(x, n_outputs=-1, outputs=None)[source]

Element-wise natural logarithm function.

\[y_i = \ln(x_i).\]
Parameters

x (Variable) – Input variable

Returns

Element-wise log variable

Return type

Variable

Note

All nnabla functions in nnabla.functions are decorated with the nnabla.function_bases.function_api decorator, which queries the current context and passes it into the first argument of the original function. The original function always takes a context as the first argument.

nnabla.functions.round(x, n_outputs=-1, outputs=None)[source]

Element-wise round function.

In the forward pass, this function simply rounds each value to the nearest integer.

\[y_i = round(x_i).\]

In the backward pass, the simple Straight-Through Estimator (STE) is applied,

\[\frac{\partial y_i}{\partial x_i} = 1.\]
Parameters

x (Variable) – Input variable

Returns

N-D array with the same shape as x

Return type

Variable

Note

All nnabla functions in nnabla.functions are decorated with the nnabla.function_bases.function_api decorator, which queries the current context and passes it into the first argument of the original function. The original function always takes a context as the first argument.

nnabla.functions.ceil(x, n_outputs=-1, outputs=None)[source]

Element-wise ceil function.

In the forward pass, this function simply returns the smallest integer which is not less than the input.

\[y_i = ceil(x_i).\]

In the backward pass, the simple Straight-Through Estimator (STE) is applied,

\[\frac{\partial y_i}{\partial x_i} = 1.\]
Parameters

x (Variable) – Input variable

Returns

N-D array with the same shape as x

Return type

Variable

Note

All nnabla functions in nnabla.functions are decorated with the nnabla.function_bases.function_api decorator, which queries the current context and passes it into the first argument of the original function. The original function always takes a context as the first argument.

nnabla.functions.floor(x, n_outputs=-1, outputs=None)[source]

Element-wise floor function.

In the forward pass, this function simply returns the largest integer which is not greater than the input.

\[y_i = floor(x_i).\]

In the backward pass, the simple Straight-Through Estimator (STE) is applied,

\[\frac{\partial y_i}{\partial x_i} = 1.\]
Parameters

x (Variable) – Input variable

Returns

N-D array with the same shape as x

Return type

Variable

Note

All nnabla functions in nnabla.functions are decorated with the nnabla.function_bases.function_api decorator, which queries the current context and passes it into the first argument of the original function. The original function always takes a context as the first argument.
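
Example (an illustrative sketch comparing round(), ceil() and floor() with their NumPy counterparts; the exact tie-breaking rule of round() is left to the backend):

import numpy as np
import nnabla as nn
import nnabla.functions as F

nn.set_auto_forward(True)

x = nn.Variable.from_numpy_array(np.array([-1.6, -0.4, 0.6, 2.4]))
print(F.round(x).d)  # each element rounded to the nearest integer
assert np.allclose(F.ceil(x).d, np.ceil(x.d))
assert np.allclose(F.floor(x).d, np.floor(x.d))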

nnabla.functions.identity(x, n_outputs=-1, outputs=None)[source]

Identity function.

\[y = x\]
Parameters

x (Variable) – N-D array.

Returns

N-D array

Return type

Variable

Note

All nnabla functions in nnabla.functions are decorated with the nnabla.function_bases.function_api decorator, which queries the current context and passes it into the first argument of the original function. The original function always takes a context as the first argument.

nnabla.functions.matrix_diag(x, n_outputs=-1, outputs=None)[source]

Returns an array where the last two dimensions consist of the diagonal matrix.

Parameters

x (Variable) – N-D array with shape (\(M_0 \times \ldots \times M_N\)).

Returns

N-D array with shape (\(M_0 \times \ldots \times M_N \times M_N\)).

Return type

Variable

Note

All nnabla functions in nnabla.functions are decorated with the nnabla.function_bases.function_api decorator, which queries the current context and passes it into the first argument of the original function. The original function always takes a context as the first argument.

nnabla.functions.matrix_diag_part(x, n_outputs=-1, outputs=None)[source]

Returns an array in which the values of the last dimension consist of the diagonal elements of the last two dimensions of an input array.

Parameters

x (Variable) – N-D array with shape (\(M_0 \times \ldots \times M_N \times M_N\)).

Returns

N-D array with shape (\(M_0 \times \ldots \times M_N\)).

Return type

Variable

Note

All nnabla functions in nnabla.functions are decorated with the nnabla.function_bases.function_api decorator, which queries the current context and passes it into the first argument of the original function. The original function always takes a context as the first argument.
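
Example (an illustrative round trip through matrix_diag() and matrix_diag_part(), added here; auto-forward mode as in other examples):

import numpy as np
import nnabla as nn
import nnabla.functions as F

nn.set_auto_forward(True)

v = nn.Variable.from_numpy_array(np.array([1.0, 2.0, 3.0]))
d = F.matrix_diag(v)          # shape (3,) -> (3, 3)
print(d.d)
# [[1. 0. 0.]
#  [0. 2. 0.]
#  [0. 0. 3.]]

back = F.matrix_diag_part(d)  # shape (3, 3) -> (3,)
assert np.allclose(back.d, v.d)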

nnabla.functions.batch_matmul(a, b, transpose_a=False, transpose_b=False, n_outputs=-1, outputs=None)[source]

Batch matrix multiplication.

Two batches of matrices are multiplied sample by sample. A batch of matrices is composed as [..., P, Q], where the last two dimensions compose the matrix dimensions and the dimensions up to the third-last are treated as batch samples. These batch dimensions are internally broadcast when the size of a dimension is 1.

Example:

import nnabla as nn
import nnabla.functions as F
import numpy as np

nn.set_auto_forward(True)

# Same batch size
a = nn.Variable.from_numpy_array(np.random.rand(2, 2, 3, 4))
b = nn.Variable.from_numpy_array(np.random.rand(2, 2, 4, 3))
c = F.batch_matmul(a, b)

# Different batch size with the broadcast
a = nn.Variable.from_numpy_array(np.random.rand(2, 1, 3, 4))
b = nn.Variable.from_numpy_array(np.random.rand(1, 3, 4, 3))
c = F.batch_matmul(a, b)

Warning

Since version 1.13, the behavior of the batch dimensions has changed: internal broadcasting is now supported when the size of a dimension is 1. Accordingly, this function no longer supports different batch dimensions between the two inputs even if the total sample size of each input is the same.

Parameters
  • a (Variable) – N-D array with >= 2-dim. The last two dimensions will be treated as a matrix.

  • b (Variable) – N-D array with >= 2-dim. The last two dimensions will be treated as a matrix. The product of the size of 0-th dimension through the size of the third last dimension must be same as that of the input a.

  • transpose_a (bool) – Transpose the last two axes of a in matrix multiplication. [default= False ]

  • transpose_b (bool) – Transpose the last two axes of b in matrix multiplication. [default= False ]

Returns

Output of sample-wise matrix multiplication in a batch. When a is of a shape of [N, P, Q], b is of a shape of [N, Q, R], and transpose options are all False, the output will be a shape of [N, P, R].

Return type

Variable

Note

All nnabla functions in nnabla.functions are decorated with the nnabla.function_bases.function_api decorator, which queries the current context and passes it into the first argument of the original function. The original function always takes a context as the first argument.

nnabla.functions.sin(x, n_outputs=-1, outputs=None)[source]

Element-wise sine (sin) function.

\[y_i = \sin (x_i)\]
Parameters

x (Variable) – N-D array

Returns

N-D array with the same shape as x

Return type

Variable

Note

All nnabla functions in nnabla.functions are decorated with the nnabla.function_bases.function_api decorator, which queries the current context and passes it into the first argument of the original function. The original function always takes a context as the first argument.

nnabla.functions.cos(x, n_outputs=-1, outputs=None)[source]

Element-wise cosine (cos) function.

\[y_i = \cos (x_i)\]
Parameters

x (Variable) – N-D array

Returns

N-D array with the same shape as x

Return type

Variable

Note

All nnabla functions in nnabla.functions are decorated with the nnabla.function_bases.function_api decorator, which queries the current context and passes it into the first argument of the original function. The original function always takes a context as the first argument.

nnabla.functions.tan(x, n_outputs=-1, outputs=None)[source]

Element-wise tangent (tan) function.

\[y_i = \tan (x_i)\]
Parameters

x (Variable) – N-D array

Returns

N-D array with the same shape as x

Return type

Variable

Note

All nnabla functions in nnabla.functions are decorated with the nnabla.function_bases.function_api decorator, which queries the current context and passes it into the first argument of the original function. The original function always takes a context as the first argument.

nnabla.functions.sinh(x, n_outputs=-1, outputs=None)[source]

Element-wise hyperbolic sine (sinh) function.

\[y_i = \sinh (x_i)\]
Parameters

x (Variable) – N-D array

Returns

N-D array with the same shape as x

Return type

Variable

Note

All nnabla functions in nnabla.functions are decorated with the nnabla.function_bases.function_api decorator, which queries the current context and passes it into the first argument of the original function. The original function always takes a context as the first argument.

nnabla.functions.cosh(x, n_outputs=-1, outputs=None)[source]

Element-wise hyperbolic cosine (cosh) function.

\[y_i = \cosh (x_i)\]
Parameters

x (Variable) – N-D array

Returns

N-D array with the same shape as x

Return type

Variable

Note

All nnabla functions in nnabla.functions are decorated with the nnabla.function_bases.function_api decorator, which queries the current context and passes it into the first argument of the original function. The original function always takes a context as the first argument.

nnabla.functions.tanh(x, n_outputs=-1, outputs=None)[source]

Element-wise hyperbolic tangent (tanh) function.

\[y_i = \tanh (x_i)\]
Parameters

x (Variable) – N-D array

Returns

N-D array with the same shape as x

Return type

Variable

Note

All nnabla functions in nnabla.functions are decorated with the nnabla.function_bases.function_api decorator, which queries the current context and passes it into the first argument of the original function. The original function always takes a context as the first argument.

nnabla.functions.asin(x, n_outputs=-1, outputs=None)[source]

Element-wise arcsine (asin) function.

\[y_i = \arcsin (x_i)\]
Parameters

x (Variable) – N-D array

Returns

N-D array with the same shape as x

Return type

Variable

Note

All nnabla functions in nnabla.functions are decorated with the nnabla.function_bases.function_api decorator, which queries the current context and passes it into the first argument of the original function. The original function always takes a context as the first argument.

nnabla.functions.acos(x, n_outputs=-1, outputs=None)[source]

Element-wise arccosine (acos) function.

\[y_i = \arccos (x_i)\]
Parameters

x (Variable) – N-D array

Returns

N-D array with the same shape as x

Return type

Variable

Note

All nnabla functions in nnabla.functions are decorated with the nnabla.function_bases.function_api decorator, which queries the current context and passes it into the first argument of the original function. The original function always takes a context as the first argument.

nnabla.functions.atan(x, n_outputs=-1, outputs=None)[source]

Element-wise arctangent (atan) function.

\[y_i = \arctan (x_i)\]
Parameters

x (Variable) – N-D array

Returns

N-D array with the same shape as x

Return type

Variable

Note

All nnabla functions in nnabla.functions are decorated with the nnabla.function_bases.function_api decorator, which queries the current context and passes it into the first argument of the original function. The original function always takes a context as the first argument.

nnabla.functions.atan2(x0, x1, n_outputs=-1, outputs=None)[source]

Element-wise arctangent (atan) function with 2 input variables.

\[y_i = \arctan2 (x_{i1}, x_{i2})\]
Parameters
  • x0 (Variable) – N-D array

  • x1 (Variable) – N-D array

Returns

N-D array with the same shape as input variables

Return type

Variable

Note

All nnabla functions in nnabla.functions are decorated with the nnabla.function_bases.function_api decorator, which queries the current context and passes it into the first argument of the original function. The original function always takes a context as the first argument.
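
Example (an illustrative check against numpy.arctan2, added here; auto-forward mode as in other examples):

import numpy as np
import nnabla as nn
import nnabla.functions as F

nn.set_auto_forward(True)

x0 = nn.Variable.from_numpy_array(np.array([1.0, -1.0]))
x1 = nn.Variable.from_numpy_array(np.array([1.0, 1.0]))
y = F.atan2(x0, x1)
assert np.allclose(y.d, np.arctan2(x0.d, x1.d))  # [pi/4, -pi/4]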

nnabla.functions.asinh(x, n_outputs=-1, outputs=None)[source]

Element-wise hyperbolic arcsine (asinh) function.

\[y_i = \text{arcsinh} (x_i)\]
Parameters

x (Variable) – N-D array

Returns

N-D array with the same shape as x

Return type

Variable

Note

All nnabla functions in nnabla.functions are decorated with the nnabla.function_bases.function_api decorator, which queries the current context and passes it into the first argument of the original function. The original function always takes a context as the first argument.

nnabla.functions.acosh(x, n_outputs=-1, outputs=None)[source]

Element-wise hyperbolic arccosine (acosh) function.

\[y_i = \text{arccosh} (x_i)\]
Parameters

x (Variable) – N-D array

Returns

N-D array with the same shape as x

Return type

Variable

Note

All nnabla functions in nnabla.functions are decorated with the nnabla.function_bases.function_api decorator, which queries the current context and passes it into the first argument of the original function. The original function always takes a context as the first argument.

nnabla.functions.atanh(x, n_outputs=-1, outputs=None)[source]

Element-wise hyperbolic arctangent (atanh) function.

\[y_i = \text{arctanh} (x_i)\]
Parameters

x (Variable) – N-D array

Returns

N-D array with the same shape as x

Return type

Variable

Note

All nnabla functions in nnabla.functions are decorated with the nnabla.function_bases.function_api decorator, which queries the current context and passes it into the first argument of the original function. The original function always takes a context as the first argument.

nnabla.functions.cumsum(x, axis=None, exclusive=False, reverse=False, n_outputs=-1, outputs=None)[source]

Cumulative sum along a given axis.

Parameters
  • x (Variable) – N-D array.

  • axis (int) – Axis along which cumulative sum is to be calculated [default= 0 ]

  • exclusive (bool) – If True, perform exclusive cumsum [default= False ]

  • reverse (bool) – If True, perform cumsum in reverse direction [default= False ]

Returns

N-D array

Return type

Variable

Note

All nnabla functions in nnabla.functions are decorated with the nnabla.function_bases.function_api decorator, which queries the current context and passes it into the first argument of the original function. The original function always takes a context as the first argument.
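
Example (an illustrative sketch of cumsum(), including the exclusive variant; auto-forward mode as in other examples):

import numpy as np
import nnabla as nn
import nnabla.functions as F

nn.set_auto_forward(True)

x = nn.Variable.from_numpy_array(np.array([[1., 2., 3.], [4., 5., 6.]]))
y = F.cumsum(x, axis=1)
assert np.allclose(y.d, np.cumsum(x.d, axis=1))

# With exclusive=True each output omits its own element
y_ex = F.cumsum(x, axis=1, exclusive=True)
print(y_ex.d)
# [[0. 1. 3.]
#  [0. 4. 9.]]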

nnabla.functions.cumprod(x, axis=None, exclusive=False, reverse=False, n_outputs=-1, outputs=None)[source]

Cumulative product along a given axis.

Note

Backward computation is not accurate in a zero value input.

Parameters
  • x (Variable) – N-D array.

  • axis (int) – Axis along which cumulative product is to be calculated [default= 0 ]

  • exclusive (bool) – If True, perform exclusive cumprod [default= False ]

  • reverse (bool) – If True, perform cumprod in reverse direction [default= False ]

Returns

N-D array

Return type

Variable

Note

All nnabla functions in nnabla.functions are decorated with the nnabla.function_bases.function_api decorator, which queries the current context and passes it into the first argument of the original function. The original function always takes a context as the first argument.

nnabla.functions.batch_inv(x, n_outputs=-1, outputs=None)[source]

Returns an array of inverted matrix

Parameters

x (Variable) – batched N-D array

Returns

batched N-D array of inverted matrix

Return type

Variable

Note

All nnabla functions in nnabla.functions are decorated with the nnabla.function_bases.function_api decorator, which queries the current context and passes it into the first argument of the original function. The original function always takes a context as the first argument.

nnabla.functions.batch_det(x, n_outputs=-1, outputs=None)[source]

Batch-wise determinant function.

\[Y_b = \det(X_b),\]

where \(X_b\) and \(Y_b\) are the \(b\)-th input and output, respectively.

Parameters

x (Variable) – batched N-D array

Returns

batched N-D array of determinant

Return type

Variable

Note

All nnabla functions in nnabla.functions are decorated with the nnabla.function_bases.function_api decorator, which queries the current context and passes it into the first argument of the original function. The original function always takes a context as the first argument.

nnabla.functions.batch_logdet(x, n_outputs=-1, outputs=None)[source]

Batch-wise log absolute determinant function.

\[Y_b = \log(|\det(X_b)|),\]

where \(X_b\) and \(Y_b\) are the \(b\)-th input and output, respectively.

Parameters

x (Variable) – batched N-D array

Returns

batched N-D array of log absolute determinant

Return type

Variable

Note

All nnabla functions in nnabla.functions are decorated with the nnabla.function_bases.function_api decorator, which queries the current context and passes it into the first argument of the original function. The original function always takes a context as the first argument.
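
Example (an illustrative check of batch_inv(), batch_det() and batch_logdet() against NumPy, added here; auto-forward mode as in other examples):

import numpy as np
import nnabla as nn
import nnabla.functions as F

nn.set_auto_forward(True)

# A batch of two 2x2 matrices
x = nn.Variable.from_numpy_array(np.array([[[2., 0.], [0., 2.]],
                                           [[1., 2.], [3., 4.]]]))
assert np.allclose(F.batch_inv(x).d, np.linalg.inv(x.d))
assert np.allclose(F.batch_det(x).d, np.linalg.det(x.d))  # [ 4. -2.]
assert np.allclose(F.batch_logdet(x).d, np.log(np.abs(np.linalg.det(x.d))))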

Array Manipulation
nnabla.functions.concatenate(*x, **kw)[source]

Concatenate a variable number of input arrays along the specified axis.

Parameters
  • *x (Variable) – N-D arrays. [variadic]

  • axis (int) – Axis [default= len(x[0].shape) - 1 ]

Returns

Concatenate variable

Return type

Variable

Note

All nnabla functions in nnabla.functions are decorated with the nnabla.function_bases.function_api decorator, which queries the current context and passes it into the first argument of the original function. The original function always takes a context as the first argument.
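
Example (an illustrative sketch of concatenate(), added here; auto-forward mode as in other examples):

import numpy as np
import nnabla as nn
import nnabla.functions as F

nn.set_auto_forward(True)

a = nn.Variable.from_numpy_array(np.ones((2, 3)))
b = nn.Variable.from_numpy_array(np.zeros((2, 2)))
c = F.concatenate(a, b, axis=1)  # all axes except `axis` must match
print(c.shape)                   # (2, 5)
assert np.allclose(c.d, np.concatenate([a.d, b.d], axis=1))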

nnabla.functions.split(x, axis=0)[source]

Split arrays at the specified axis.

It returns a number of Variable s corresponding to the size of the given axis (i.e. x.shape[axis]).

Parameters
  • x (Variable) – N-D array.

  • axis (int) – Axis along which to split. [default= 0 ]

Returns: A tuple of Variable s

See also

nnabla.function_bases.split().
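
Example (an illustrative sketch of split(), added here; auto-forward mode as in other examples):

import numpy as np
import nnabla as nn
import nnabla.functions as F

nn.set_auto_forward(True)

x = nn.Variable.from_numpy_array(np.arange(6).reshape(2, 3).astype(np.float32))
ys = F.split(x, axis=1)      # x.shape[1] == 3 Variables, each of shape (2,)
print(len(ys), ys[0].shape)  # 3 (2,)
assert np.allclose(ys[0].d, x.d[:, 0])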

nnabla.functions.stack(*x, **kw)[source]

Joins two or more arrays on a new axis.

Note

Unlike nnabla.functions.concatenate() , which joins arrays on an existing axis, Stack joins arrays on a new axis.

Parameters
  • *x (Variable) – N-D arrays. The sizes of all the arrays to be stacked must be the same. [variadic]

  • axis (int) – The axis on which to concatenate arrays. Axis indices take on values 0, 1, 2, and so on from the left. For example, to stack four (3,28,28) inputs on the second axis, specify 1. In this case, the output size will be (3,4,28,28). [default= 0 ]

Returns

Output

Return type

Variable

Note

All nnabla functions in nnabla.functions are decorated with the nnabla.function_bases.function_api decorator, which queries the current context and passes it into the first argument of the original function. The original function always takes a context as the first argument.

nnabla.functions.slice(x, start=None, stop=None, step=None, n_outputs=-1, outputs=None)[source]

Slice arrays along the specified axes. This function complies with Python slice semantics, where slice(None, None, -1) and slice(-1, None, -1) are special cases that flip the input array, producing an output that runs from the end to the beginning of the input array along the corresponding dimension.

Parameters
  • x (Variable) – N-D array

  • start (repeated int64) – Start indices for each axis [default=``(0,) * len(x.shape)``]

  • stop (repeated int64) – Stop indices for each axis [default=``tuple(x.shape)``]

  • step (repeated int64) – Step indices for each axis [default=``(1,) * len(x.shape)``]

Returns

Sliced N-D array

Return type

Variable

Note

All nnabla functions in nnabla.functions are decorated with the nnabla.function_bases.function_api decorator, which queries the current context and passes it into the first argument of the original function. The original function always takes a context as the first argument.
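
Example (an illustrative sketch of slice(), mirroring NumPy basic slicing; auto-forward mode as in other examples):

import numpy as np
import nnabla as nn
import nnabla.functions as F

nn.set_auto_forward(True)

x = nn.Variable.from_numpy_array(np.arange(12).reshape(3, 4).astype(np.float32))
y = F.slice(x, start=(0, 1), stop=(2, 4), step=(1, 2))
assert y.shape == (2, 2)
assert np.allclose(y.d, x.d[0:2, 1:4:2])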

nnabla.functions.gather(x, Indices, axis=None, batch_dims=None, n_outputs=-1, outputs=None)[source]

Gather from the input data according to the index.

Given the input data \(X\) of \((D_{0}, \ldots, D_{N-1})\) shape and the indices \(IDX\) of \((I_{0}, \ldots, I_{M-1})\) shape, in case of batch_dims = 0, the gather outputs

\[\begin{split}&& Y[d_{0}, \ldots, d_{axis - 1}, i_{0}, \ldots, i_{M-1}, d_{axis + 1}, \ldots, d_{N-1}] = \\ && X[d_{0}, \ldots, d_{axis - 1}, IDX[i_{0}, \ldots, i_{M-1}], d_{axis + 1}, \ldots, d_{N-1}].\end{split}\]

Generally, the gather outputs

\[\begin{split}&& Y[d_{0}, \ldots, d_{axis - 1}, i_{B}, \ldots, i_{M-1}, d_{axis + 1}, \ldots, d_{N-1}] = \\ && X[d_{0}, \ldots, d_{axis - 1}, IDX[i_{0}, \ldots, i_{B - 1}, i_{B} \ldots, i_{M-1}], d_{axis + 1}, \ldots d_{N-1}].\end{split}\]

where \(B\) = batch_dims.

x.shape[:batch_dims] must be equal to indices.shape[:batch_dims].

Output shape is x.shape[:axis] + indices.shape[batch_dims:] + x.shape[axis + 1:].

Parameters
  • x (Variable) – Data from which to gather.

  • Indices (Variable) – Index with which to gather.

  • axis (int) – Axis in x to gather from. axis must be greater than or equal to batch_dims. [default= 0 ]

  • batch_dims (int) – The number of batch dimensions. [default= 0 ]

Returns

Gathered output.

Return type

Variable

Note

All nnabla functions in nnabla.functions are decorated with the nnabla.function_bases.function_api decorator, which queries the current context and passes it into the first argument of the original function. The original function always takes a context as the first argument.
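
Example (an illustrative sketch of gather() with batch_dims = 0, where it matches numpy.take; auto-forward mode as in other examples):

import numpy as np
import nnabla as nn
import nnabla.functions as F

nn.set_auto_forward(True)

x = nn.Variable.from_numpy_array(np.arange(12).reshape(3, 4).astype(np.float32))
idx = nn.Variable.from_numpy_array(np.array([2, 0]))
y = F.gather(x, idx, axis=1)  # pick columns 2 and 0
assert y.shape == (3, 2)
assert np.allclose(y.d, np.take(x.d, [2, 0], axis=1))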

nnabla.functions.gather_nd(data, indices)[source]

Gather elements or slices from data according to indices, which must be at least two-dimensional with the first dimension \(M\) being less than or equal to the \(N\) dimensions of data. Given data with shape \((X_0, X_1, ..., X_{N-1})\) and indices with shape \((M, Y_0, ..., Y_{K-1})\), the output has shape \((Y_0, ..., Y_{K-1}, X_M, ..., X_{N-1})\). If \(M == N\), the output shape is simply \((Y_0, ..., Y_{K-1})\).

The forward of gather_nd() is equivalent to:

def gather_nd(data, index):
    import numpy as np
    tmp_index = index.reshape(index.shape[0], -1)
    tmp_index = [idx + (Ellipsis,) for idx in zip(*tmp_index)]
    out_shape = index.shape[1:] + data.shape[index.shape[0]:]
    return np.vstack([data[idx] for idx in tmp_index]).reshape(*out_shape)

Examples:

>>> import numpy as np, nnabla as nn, nnabla.functions as F
>>> nn.set_auto_forward(True)
>>> data = F.arange(1, 11).reshape([2, 5])
>>> print(data.d)
[[ 1.  2.  3.  4.  5.]
 [ 6.  7.  8.  9. 10.]]
>>> F.gather_nd(data, [[1, 1, 0]]).shape
(3, 5)
>>> F.gather_nd(data, [[1, 1, 0], [0, 1, 0]]).shape
(3,)
>>> print(F.gather_nd(data, [[1, 1, 0], [0, 1, 0]]).d)
[6. 7. 1.]
>>> print(F.gather_nd(data, [[1, 1, 0]]).d)
[[ 6.  7.  8.  9. 10.]
 [ 6.  7.  8.  9. 10.]
 [ 1.  2.  3.  4.  5.]]

When indices is provided as a Variable it will be possible to change the actual index values after function creation. It is important to note that out-of-bound indices raise errors when running on CPU but are ignored when using an accelerated computation context.

>>> indices = nn.Variable((2, 1))
>>> indices.d = [[0], [0]]
>>> y = F.gather_nd(data, indices)
>>> print(y.d)
[1.]
>>> indices.d = [[1], [4]]
>>> y.forward()
>>> print(y.d)
[10.]
Parameters
  • data (Variable) – N-D array of input data.

  • indices (list, NdArray or Variable) – N-D index array whose first dimension addresses the leading dimensions of data.

Returns: Variable or NdArray of gathered elements.

nnabla.functions.scatter_nd(data, indices, shape=None, out=None, add=False)[source]

Scatter data according to indices into a new array of given shape or an existing array provided as out. Exactly one of the shape or out argument must be given. Given output shape, or shape of out array, \((X_0,X_1,\ldots,X_{N-1})\) and indices shape \((M,Y_0,\ldots,Y_{K-1})\) the input data shape is \((Y_0,\ldots,Y_{K-1},X_M,\ldots,X_{N-1})\), where \(M<=N\). If \(M==N\) the data shape is simply \((Y_0,\ldots,Y_{K-1})\). Note that indices are treated as integers and potentially converted.

The forward of scatter_nd() is equivalent to:

def scatter_nd(data, indices, shape=None, out=None):
    import numpy as np
    assert (shape is not None) != (out is not None)  # exactly one of shape/out
    if isinstance(indices, np.ndarray):
        indices = indices.tolist()
    result = out if out is not None else np.zeros(shape)
    result[indices] = data
    return result

Examples:

>>> import numpy as np, nnabla as nn, nnabla.functions as F
>>> nn.set_auto_forward(True)
>>> data = nn.Variable.from_numpy_array(np.array([9, 10, 11, 12]))
>>> indices = nn.Variable.from_numpy_array(np.array([[4, 3, 1, 7]]))
>>> scattered = F.scatter_nd(data, indices, shape=(8,))
>>> print(scattered.d)
[ 0. 11.  0. 10.  9.  0.  0. 12.]
>>> print(F.gather_nd(scattered, indices).d)
[ 9. 10. 11. 12.]
Parameters
  • data (Variable) – N-D array of input data to scatter.

  • indices (list, NdArray or Variable) – N-D index array whose first dimension addresses the leading dimensions of the output.

  • shape (tuple of int) – Shape of the new output array; exactly one of shape and out must be given.

  • out (Variable) – Existing output array to scatter into; exactly one of shape and out must be given.

  • add (bool) – If True, data is added to the existing values rather than replacing them. [default= False ]

Returns: Variable or NdArray of given shape.

nnabla.functions.scatter_add(x0, indices, x1, axis=None)[source]

Add all values from x1 into x0 at the indices specified by indices. This function adds x1 into a copy of x0 and outputs the copy; the original x0 will not be changed. x0, indices and x1 must have the same number of dimensions.

The forward of scatter_add() is equivalent to:

def scatter_add(x0, indices, x1, axis):
    # Assuming each input is 3 dimensional
    import numpy as np
    output = np.copy(x0)
    for i in range(indices.shape[0]):
        for j in range(indices.shape[1]):
            for k in range(indices.shape[2]):
                if axis == 0:
                    output[indices[i][j][k]][j][k] += x1[i][j][k]
                elif axis == 1:
                    output[i][indices[i][j][k]][k] += x1[i][j][k]
                elif axis == 2:
                    output[i][j][indices[i][j][k]] += x1[i][j][k]
    return output
Parameters
  • x0 (Variable) – N-D array which the data is added to its copy.

  • indices (Variable) – N-D array scatter indices. The size of each dimension must be equal or smaller than that of x0 except for the specified axis. The value of indices must be smaller than the size of specified axis’ dimension of x0. The size of each dimension must be equal or smaller than that of x1. Indices must not be negative.

  • x1 (Variable) – N-D array which is scattered and added to x0.

  • axis (int) – Axis along which to index. The axis must not exceed the inputs’ dimension. [default= 0 ]

Returns

N-D array which contains the result of scatter addition. The shape is same as x0.

Return type

Variable

nnabla.functions.pad(x, pad_width, mode='constant', constant_value=0, n_outputs=-1, outputs=None)[source]

Pad the input N-D array x over the number of dimensions given by half the length of the pad_width iterable, where every two values in pad_width determine the before and after pad size of an axis. The pad_width iterable must hold an even number of positive values which may cover all or fewer dimensions of the input variable x. If pad_width covers fewer dimensions then it applies to the innermost dimensions of x.

x = nn.Variable.from_numpy_array(np.ones((2, 3, 4)))
assert F.pad(x, (1, 1, 2, 2)).shape == (2, 5, 8)

Padding is performed according to the requested mode:

constant

Pads with a value given by the keyword argument constant_value.

x = nn.Variable.from_numpy_array(np.array([1, 2, 3, 4], dtype=int))
y = F.pad(x, (3, 3), 'constant', constant_value = -1)
y.forward()
assert np.all(y.d == np.array([-1, -1, -1, 1, 2, 3, 4, -1, -1, -1]))
reflect

Pads with the reflection of the vector mirrored on the first and last values of the vector along each axis.

x = nn.Variable.from_numpy_array(np.array([1, 2, 3, 4], dtype=int))
y = F.pad(x, (3, 3), 'reflect')
y.forward()
assert np.all(y.d == np.array([4, 3, 2, 1, 2, 3, 4, 3, 2, 1]))
repeat

Pads with the edge value of the vector along each axis.

x = nn.Variable.from_numpy_array(np.array([1, 2, 3, 4], dtype=int))
y = F.pad(x, (3, 3), 'repeat')
y.forward()
assert np.all(y.d == np.array([1, 1, 1, 1, 2, 3, 4, 4, 4, 4]))
Parameters
  • x (Variable) – N-D array

  • pad_width (repeated int64) – Iterable of before and after pad values.

  • mode (string) – Padding mode string. [default= 'constant' ]

  • constant_value (float) – Fill value if mode is constant. [default= 0 ]

Returns

Padded N-D array with the same number of dimensions as the input.

x = nn.Variable((3, 3, 4, 2))  # a shape like (B, C, H, W)
# 1-D padding: last dim by 1 left and 2 on the right side
assert F.pad(x, (1, 2)).shape == (3, 3, 4, 5)
# 2-D padding: last dim by (1, 1) and 2nd to last by (2, 2)
assert F.pad(x, (2, 2, 1, 1)).shape == (3, 3, 8, 4)
# 3-D padding: dims C by (0, 1), H by (2, 1), and W by (3, 3)
assert F.pad(x, (0, 1, 2, 1, 3, 3)).shape == (3, 4, 7, 8)

Return type

Variable

Note

All nnabla functions in nnabla.functions are decorated with the nnabla.function_bases.function_api decorator, which queries the current context and passes it into the first argument of the original function. The original function always takes a context as the first argument.

nnabla.functions.transpose(x, axes, n_outputs=-1, outputs=None)[source]

Transposes tensor dimensions.

Parameters
  • x (Variable) – N-D array

  • axes (repeated int64) – Source axis indices for each axis.

Returns

Transposed N-D array.

Return type

Variable

Note

All nnabla functions in nnabla.functions are decorated with the nnabla.function_bases.function_api decorator, which queries the current context and passes it into the first argument of the original function. The original function always takes a context as the first argument.
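
Example (an illustrative sketch added here: axes lists, for each output axis, the source axis in x; auto-forward mode as in other examples):

import numpy as np
import nnabla as nn
import nnabla.functions as F

nn.set_auto_forward(True)

x = nn.Variable.from_numpy_array(np.random.rand(2, 3, 4))
y = F.transpose(x, (2, 0, 1))
assert y.shape == (4, 2, 3)
assert np.allclose(y.d, np.transpose(x.d, (2, 0, 1)))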

nnabla.functions.broadcast(x, shape, n_outputs=-1, outputs=None)[source]

Broadcasting ND-array to the specified shape.

Parameters
  • x (Variable) – N-D array

  • shape (tuple of int) – Shape broadcasted to. The size must be the same in axis where x’s shape is not 1.

Returns

Broadcasted N-D array

Return type

Variable

Note

All nnabla functions in nnabla.functions are decorated with the nnabla.function_bases.function_api decorator, which queries the current context and passes it into the first argument of the original function. The original function always takes a context as the first argument.
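
Example (an illustrative sketch of broadcast(), added here; auto-forward mode as in other examples):

import numpy as np
import nnabla as nn
import nnabla.functions as F

nn.set_auto_forward(True)

x = nn.Variable.from_numpy_array(np.array([[1.], [2.], [3.]]))  # shape (3, 1)
y = F.broadcast(x, shape=(3, 4))
assert y.shape == (3, 4)
assert np.allclose(y.d, np.broadcast_to(x.d, (3, 4)))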

nnabla.functions.broadcast_to(x, y, axis=None, n_outputs=-1, outputs=None)[source]

Warning

This function is experimental support, so please do not actively use it.

Broadcasting ND-array to the specified buffer.

Parameters
  • x (Variable) – N-D array

  • y (Variable) – N-D array

  • axis (int) – Target axis to start broadcasting. If this is not set, broadcast will try to fit y to x starting from the last dimension [default= -1 ]

Returns

Broadcasted N-D array

Return type

Variable

Note

All nnabla functions in nnabla.functions are decorated with the nnabla.function_bases.function_api decorator, which queries the current context and passes it into the first argument of the original function. The original function always takes a context as the first argument.

nnabla.functions.tile(x, reps)[source]

Forward x repeated the number of times given by reps. If reps is a sequence, the output has dimension of d = max(len(reps), x.ndim) and either x is promoted to be d-dimensional by prepending new axes or reps is promoted to x.ndim by prepending 1’s.

Parameters
  • x (Variable) – Input N-D array.

  • reps (int or sequence of int) – Repetitions of x along each axis.

Returns

N-D array.

Return type

Variable

>>> import numpy as np, nnabla as nn, nnabla.functions as F
>>> F.tile(nn.Variable([2, 3]), 3).shape    # reps is promoted to [1, 3]
(2, 9)
>>> F.tile(nn.Variable([3]), [2, 3]).shape  # x is promoted to shape (1, 3)
(2, 9)
>>> nn.set_auto_forward(True)
>>> x = nn.Variable.from_numpy_array(np.array([1, 2, 3]))
>>> print(F.tile(x, 3).d)
[1. 2. 3. 1. 2. 3. 1. 2. 3.]
>>> print(F.tile(x, [2, 3]).d)
[[1. 2. 3. 1. 2. 3. 1. 2. 3.]
 [1. 2. 3. 1. 2. 3. 1. 2. 3.]]
>>> x = nn.Variable.from_numpy_array(np.array([[1, 3], [2, 4]]))
>>> print(F.tile(x, 3).d)
[[1. 3. 1. 3. 1. 3.]
 [2. 4. 2. 4. 2. 4.]]
>>> print(F.tile(x, [2, 3]).d)
[[1. 3. 1. 3. 1. 3.]
 [2. 4. 2. 4. 2. 4.]
 [1. 3. 1. 3. 1. 3.]
 [2. 4. 2. 4. 2. 4.]]
nnabla.functions.meshgrid(*x, ij_indexing=False)[source]

Return coordinate matrices from coordinate vectors. ij_indexing selects matrix ('ij') indexing instead of the default Cartesian ('xy') indexing.

nnabla.functions.flip(x, axes=None, n_outputs=-1, outputs=None)[source]

Reverses the order of elements of the specified dimension of an array.

Parameters
  • x (Variable) – N-D array

  • axes (repeated int64) – The index of the dimension to reverse the order of the elements. Axis indices take on values 0, 1, 2, and so on from the left. For example, to flip 100 RGB images of 32 (W) by 24 (H), shaped (100,3,24,32), vertically and horizontally, specify (2,3). [default= [len(x.shape) - 1] ]

Returns

N-D array

Return type

Variable

Note

All nnabla functions in nnabla.functions are decorated with the nnabla.function_bases.function_api decorator, which queries the current context and passes it into the first argument of the original function. The original function always takes a context as the first argument.

nnabla.functions.shift(x, shifts=None, border_mode='nearest', n_outputs=-1, outputs=None)[source]

Shifts the array elements by the specified amount.

Parameters
  • x (Variable) – N-D array.

  • shifts (repeated int64) – The amount to shift elements. For example, to shift image data to the right by 2 pixels and up 3 pixels, specify (-3,2). [default= (0,) * len(x.shape) ]

  • border_mode (string) – Specify how to process the ends of arrays whose values will be undetermined as a result of shifting. nearest: The data at the ends of the original array is copied and used. reflect: Original data reflected at the ends of the original array is used. [default= 'nearest' ]

Returns

N-D array.

Return type

Variable

Note

All nnabla functions in nnabla.functions are decorated with the nnabla.function_bases.function_api decorator, which queries the current context and passes it into the first argument of the original function. The original function always takes a context as the first argument.
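
Example (an illustrative sketch of flip() and shift(), added here; the expected outputs follow the semantics described above):

import numpy as np
import nnabla as nn
import nnabla.functions as F

nn.set_auto_forward(True)

x = nn.Variable.from_numpy_array(np.arange(6).reshape(2, 3).astype(np.float32))
print(F.flip(x, axes=(1,)).d)  # reverse each row
# [[2. 1. 0.]
#  [5. 4. 3.]]

print(F.shift(x, shifts=(0, 1), border_mode='nearest').d)  # shift right by 1
# [[0. 0. 1.]
#  [3. 3. 4.]]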

nnabla.functions.sort(x, axis=-1, reverse=False, with_index=False, only_index=False)[source]

Sorts the elements of x along a given axis in ascending order by value. A negative axis counts from the last dimension of x, so the default of -1 sorts along the last dimension. If reverse is True, then the elements are sorted in descending order.

If with_index is True, result is a tuple (sorted, indices) or only indices if only_index is True. Setting only_index to True implies that with_index is also True.

import numpy as np
import nnabla as nn
import nnabla.functions as F

nn.set_auto_forward(True)
x = nn.Variable.from_numpy_array(np.random.rand(2, 3, 4))

sorted = F.sort(x)
assert np.allclose(sorted.d, np.sort(x.d))

sorted, indices = F.sort(x, with_index=True)
assert np.allclose(sorted.d, np.sort(x.d))
assert np.all(indices.d == np.argsort(x.d))

indices = F.sort(x, only_index=True)
assert np.all(indices.d == np.argsort(x.d))
Parameters
  • x (Variable) – N-D array

  • axis (int) – Axis along which to sort.

  • reverse (bool) – Sort in descending order.

  • with_index (bool) – Return sorted values and index.

  • only_index (bool) – Return only the sort index.

Returns: Variable sorted, or Variable indices, or a tuple (sorted, indices) of Variable s

nnabla.functions.reshape(x, shape, inplace=True, n_outputs=-1, outputs=None)[source]

Reshapes the input variable in-place. It does not create a copy of the variable. The output variable (y) has a new shape but points to the same data as the input variable (x). This means that if the data in the output variable (y) is modified, the data in the input variable (x) also gets modified since the reshape was done in-place.

Note

This function has the same behavior as the nnabla.Variable.reshape() method.

Parameters
  • x (Variable) – N-D array.

  • shape (tuple of int) – Dimensions for each axis. -1 can be specified only in one shape dimension. The value is calculated from the size of the array and remaining dimensions.

  • inplace (bool) – The output array is shared with the input array if True. [default= True ]

Returns

Reshaped N-D array

Return type

Variable

Note

All nnabla functions in nnabla.functions are decorated with the nnabla.function_bases.function_api decorator, which queries the current context and passes it into the first argument of the original function. The original function always takes a context as the first argument.
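
Example (an illustrative sketch demonstrating the in-place data sharing described above and the -1 shape inference; auto-forward mode as in other examples):

import numpy as np
import nnabla as nn
import nnabla.functions as F

nn.set_auto_forward(True)

x = nn.Variable.from_numpy_array(np.arange(6).astype(np.float32))
y = F.reshape(x, (2, 3))   # shares data with x (inplace=True by default)
y.d[0, 0] = 100.0
print(x.d[0])              # 100.0 -- x sees the modification

z = F.reshape(x, (3, -1))  # -1 infers the remaining dimension
assert z.shape == (3, 2)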

nnabla.functions.one_hot(x, shape, n_outputs=-1, outputs=None)[source]

This function creates a one-hot vector based on the input indices.

Example:

import nnabla as nn
import nnabla.functions as F
import numpy as np

labels = nn.Variable.from_numpy_array(np.array([[9], [4], [5], [1], [0]]))
print(labels.shape)  # (5, 1)

num_class = 10

y_train = F.one_hot(labels, shape=(num_class, ))
y_train.forward()

print(y_train.shape)  # (5, 10)
print(y_train.d)

# [[0. 0. 0. 0. 0. 0. 0. 0. 0. 1.]
#  [0. 0. 0. 0. 1. 0. 0. 0. 0. 0.]
#  [0. 0. 0. 0. 0. 1. 0. 0. 0. 0.]
#  [0. 1. 0. 0. 0. 0. 0. 0. 0. 0.]
#  [1. 0. 0. 0. 0. 0. 0. 0. 0. 0.]]

# Can also be used for ndarray.

labels = nn.Variable.from_numpy_array(np.array([[1, 7], [4, 7], [8, 6], [5, 0], [2, 6]]))
print(labels.shape)  # (5, 2)

num_class_1, num_class_2  = 10, 8

y_train = F.one_hot(labels, shape=(num_class_1, num_class_2))
y_train.forward()

print(y_train.shape)  # (5, 10, 8)
print(y_train.d)

# [[[0. 0. 0. 0. 0. 0. 0. 0.]          [[0. 0. 0. 0. 0. 0. 0. 0.]
#   [0. 0. 0. 0. 0. 0. 0. 1.]           [0. 0. 0. 0. 0. 0. 0. 0.]
#   [0. 0. 0. 0. 0. 0. 0. 0.]           [0. 0. 0. 0. 0. 0. 1. 0.]
#   [0. 0. 0. 0. 0. 0. 0. 0.]           [0. 0. 0. 0. 0. 0. 0. 0.]
#   [0. 0. 0. 0. 0. 0. 0. 0.]           [0. 0. 0. 0. 0. 0. 0. 0.]
#   [0. 0. 0. 0. 0. 0. 0. 0.]    ...    [0. 0. 0. 0. 0. 0. 0. 0.]
#   [0. 0. 0. 0. 0. 0. 0. 0.]           [0. 0. 0. 0. 0. 0. 0. 0.]
#   [0. 0. 0. 0. 0. 0. 0. 0.]           [0. 0. 0. 0. 0. 0. 0. 0.]
#   [0. 0. 0. 0. 0. 0. 0. 0.]           [0. 0. 0. 0. 0. 0. 0. 0.]
#   [0. 0. 0. 0. 0. 0. 0. 0.]],         [0. 0. 0. 0. 0. 0. 0. 0.]]]
Parameters
  • x (Variable) – N-D array representing label indices.

  • shape (tuple of int) – Number of classes. Note that it must be exactly the same as the number of classes included in label data. Passing incorrect numbers might cause an unexpected error and currently this function doesn’t check if the input is valid or not. Also, when nd-labels are given, dimensions must match. See the example above.

Returns

N-D array one-hot vector/tensor.

Return type

Variable

Note

All nnabla functions in nnabla.functions are decorated with the nnabla.function_bases.function_api decorator, which queries the current context and passes it into the first argument of the original function. The original function always takes a context as the first argument.

nnabla.functions.assign(dst, src, n_outputs=-1, outputs=None)[source]

Assign source array to destination array just like tf.assign. This is useful to synchronize or manually update parameters.

dst = nn.Variable((2, 3, 4))
src = nn.Variable((2, 3, 4))
assign = F.assign(dst, src)

assign.forward()
assert np.allclose(dst.d, src.d) # dst and src have identical values.
assert np.allclose(assign.d, dst.d) # returned Variable is also identical to dst.

Unlike TensorFlow, the returned Variable has a backward path to dst:

\[g_{dst} = g_{y}\]
Parameters
  • dst (Variable) – A destination N-D array

  • src (Variable) – A source N-D array

Returns

An assigned array

Return type

Variable

Note

All nnabla functions in nnabla.functions are decorated with the nnabla.function_bases.function_api decorator, which queries the current context and passes it into the first argument of the original function. The original function always takes a context as the first argument.

nnabla.functions.top_k_data(x, k, abs=False, reduce=True, base_axis=1, n_outputs=-1, outputs=None)[source]

Select the k largest values from each sample in x to propagate unmodified and set all other values to 0. If abs is True, the k largest values are selected by magnitude. If reduce is True (the default), all feature dimensions are reduced to a single dimension of size k that propagates only the k largest values. Otherwise, if reduce is False, input and output dimensions are identical. Dimensions before base_axis are treated as number of sample dimensions and k values get selected from all elements of a sample (dimensions from base_axis) regardless of shape.

>>> import nnabla as nn, nnabla.functions as F
>>> x = nn.Variable((4, 5, 6))
>>> F.top_k_data(x, 3, reduce=False).shape
(4, 5, 6)
>>> F.top_k_data(x, 3, reduce=True).shape
(4, 3)
>>> F.top_k_data(x, 3, reduce=True, base_axis=2).shape
(4, 5, 3)
Parameters
  • x (Variable) – N-D array

  • k (int) – Number of largest data values to propagate.

  • abs (bool) – Determine largest data values by magnitude. [default= False ]

  • reduce (bool) – Reduce feature size to one dimension of size k. [default= True ]

  • base_axis (int) – First dimension of the sample shape. [default= 1 ]

Returns

N-D array.

Return type

Variable

Note

All nnabla functions in nnabla.functions are decorated with the nnabla.function_bases.function_api decorator, which queries the current context and passes it into the first argument of the original function. The original function always takes a context as the first argument.

nnabla.functions.top_k_grad(x, k, abs=False, base_axis=1, n_outputs=-1, outputs=None)[source]

Select the k largest gradients for each sample in x to back-propagate unmodified and set all other gradients to 0. If abs is True, the k largest gradients are selected by magnitude. Dimensions before base_axis are treated as number of sample dimensions and k gradients get selected from all gradients of a sample (dimensions from base_axis) regardless of shape.

Parameters
  • x (Variable) – N-D array

  • k (int) – Number of largest gradients to propagate.

  • abs (bool) – Determine largest gradients by magnitude. [default= False ]

  • base_axis (int) – First dimension of the sample shape. [default= 1 ]

Returns

N-D array with same shape and data as x.

Return type

Variable

Note

All nnabla functions in nnabla.functions are decorated with the nnabla.function_bases.function_api decorator, which queries the current context and passes it into the first argument of the original function. The original function always takes a context as the first argument.

nnabla.functions.pack_padded_sequence(padded_sequence, lengths, batch_first=False, n_outputs=-1, outputs=None)[source]

Pack padded variable-length sequences.

This method packs padded variable-length sequences.

\(T_i\) is the length of the \(i\)-th Variable in the sequences. \(B\) is the batch size equal to the length of the sequences. \(T\) is the max of \(T_i\) for all \(i\). \(*\) is the remaining dimensions including none.

Note

This function assumes the padded sequence is sorted by length in decreasing order and, in the dynamic computation mode, must be used together with pad_packed_sequence().

Parameters
  • padded_sequence (Variable) – Padded sequence of (\(T \times B \times *\)) or (\(B \times T \times *\)) shape.

  • lengths (Variable) – Sequence length for each batch element; always resides in CPU.

  • batch_first (bool) –

    padded_sequence is of (\(T\), \(B\), \(*\)) shape if False, otherwise (\(B\), \(T\), \(*\)).

    [default= False ]

Returns

Packed sequence of (\(N\), \(*\)) shape, and a Variable holding the batch size for each time step (always resides in CPU).

Return type

Variable

Note

All nnabla functions in nnabla.functions are decorated with the nnabla.function_bases.function_api decorator, which queries the current context and passes it into the first argument of the original function. The original function always takes a context as the first argument.

nnabla.functions.pad_packed_sequence(packed_sequence, batch_sizes, batch_first=False, padding_value=None, total_length=None, n_outputs=-1, outputs=None)[source]

Pad packed sequence.

This method unpacks the packed sequence and pads it; this is the inverse operation of pack_padded_sequence().

\(T_i\) is the length of the \(i\)-th Variable in the sequences. \(B\) is the batch size equal to the length of the sequences. \(T\) is the max of \(T_i\) for all \(i\). \(*\) is the remaining dimensions including none.

Note

This function assumes its input is the packed output of a padded sequence sorted by length in decreasing order and, in the dynamic computation mode, must be used together with pack_padded_sequence().

Parameters
  • packed_sequence (Variable) – Packed sequence of (\(N\), \(*\)) shape.

  • batch_sizes (Variable) – Batch size for each time step; always resides in CPU.

  • batch_first (bool) –

    padded_sequence is of (\(T\), \(B\), \(*\)) shape if False, otherwise (\(B\), \(T\), \(*\)).

    [default= False ]

  • padding_value (float) – Padding value. [default= 0.0 ]

  • total_length (int) –

If not None, the outputs are padded up to total_length. If total_length is less than the max length in the sequences, an error is thrown.

    [default= -1 ]

Returns

Padded sequence of (\(T \times B \times *\)) or (\(B \times T \times *\)) shape, and a Variable holding the sequence length for each batch element (always resides in CPU).

Return type

Variable

Note

All nnabla functions in nnabla.functions are decorated with the nnabla.function_bases.function_api decorator, which queries the current context and passes it into the first argument of the original function. The original function always takes a context as the first argument.

nnabla.functions.searchsorted(sorted_sequence, values, right=None, n_outputs=-1, outputs=None)[source]

Finds the indices in the innermost dimension of a sorted sequence at which the given values must be inserted in order to keep the sequence sorted.

Parameters
  • sorted_sequence (Variable) – N-D array of sorted sequences in which the search is performed. Note that this must be sorted along the innermost dimension.

  • values (Variable) – N-D array of search values.

  • right (bool) – If True, given a value v, the function returns the index i such that sorted_sequence[i-1] <= v < sorted_sequence[i] (index of the closest upper bound of v). By default this is False, so the function returns the index i such that sorted_sequence[i-1] < v <= sorted_sequence[i] (index of the closest lower bound of v). [default= False ]

Returns

N-D array containing the required indices

Return type

Variable

Note

All nnabla functions in nnabla.functions are decorated with the nnabla.function_bases.function_api decorator, which queries the current context and passes it into the first argument of the original function. The original function always takes a context as the first argument.
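
Example (an illustrative check of searchsorted() against numpy.searchsorted; auto-forward mode as in other examples):

import numpy as np
import nnabla as nn
import nnabla.functions as F

nn.set_auto_forward(True)

seq = nn.Variable.from_numpy_array(np.array([1., 3., 5., 7.]))
vals = nn.Variable.from_numpy_array(np.array([0., 3., 6., 8.]))
left = F.searchsorted(seq, vals)               # lower-bound indices
right = F.searchsorted(seq, vals, right=True)  # upper-bound indices
assert np.all(left.d == np.searchsorted(seq.d, vals.d, side='left'))
assert np.all(right.d == np.searchsorted(seq.d, vals.d, side='right'))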

nnabla.functions.bool_gather(input, mask, n_outputs=-1, outputs=None)[source]

Gather from the input data according to the mask.

Given an input of \((B_1, \ldots, B_N, D_1, \ldots, D_M)\) shape and a mask of \((B_1, \ldots, B_N)\) shape, the function returns an output of \((nnz, D_1, \ldots, D_M)\) shape, where \(nnz\) is the number of non-zero elements in mask.

import numpy as np
import nnabla as nn
import nnabla.functions as F

nn.set_auto_forward(True)

input = nn.Variable.from_numpy_array([[1, 2], [3, 4], [5, 6]])
mask = nn.Variable.from_numpy_array([1, 0, 1])
output = F.bool_gather(input, mask)

print(output.d) # [[1, 2], [5, 6]]

Note that this function is normally used with the dynamic graph, since it produces a variable-length output. If used with a static graph, the network has to be re-constructed at every iteration.

Parameters
  • input (Variable) – Data from which to gather.

  • mask (Variable) – Mask with which to gather. Non-zero/zero elements are supposed to be a binary mask as 1/0. No gradients are computed with respect to mask.

Returns

Gathered output.

Return type

Variable

Note

All nnabla functions in nnabla.functions are decorated with the nnabla.function_bases.function_api decorator, which queries the current context and passes it into the first argument of the original function. The original function always takes a context as the first argument.

nnabla.functions.bool_scatter(input, mask, output=None, n_outputs=-1, outputs=None)[source]

Scatter the input according to the mask.

Given an input of \((nnz, D_1, \ldots, D_M)\) shape and a mask of \((B_1, \ldots, B_N)\) shape, the function returns an output of \((B_1, \ldots, B_N, D_1, \ldots, D_M)\) shape, where \(nnz\) is the number of non-zero elements in the mask.

import numpy as np
import nnabla as nn
import nnabla.functions as F

nn.set_auto_forward(True)

input0 = nn.Variable.from_numpy_array([[1, 2], [3, 4], [5, 6]])
mask = nn.Variable.from_numpy_array([1, 0, 1])
output0 = F.bool_gather(input0, mask)

input1 = output0 + 10
output1 = F.bool_scatter(input1, mask)

print(output1.d)  # [[11, 12], [0, 0], [15, 16]]

Note that the higher-order gradients of this function rely on F.gather; thus they are normally used with the dynamic graph.

Parameters
  • input (Variable) – Data to be scattered.

  • mask (Variable) – Mask with which to scatter. Non-zero/zero elements are supposed to be a binary mask as 1/0. No gradients are computed with respect to mask.

  • output (Variable) – Destination of output. If specified, data are inplaced. [optional]

Returns

Scattered output.

Return type

Variable

Note

All nnabla functions in nnabla.functions are decorated with the nnabla.function_bases.function_api decorator, which queries the current context and passes it into the first argument of the original function. The original function always takes a context as the first argument.

nnabla.functions.bool_fill(data, mask, value=0, n_outputs=-1, outputs=None)[source]

Fill the data with the value according to the mask.

import numpy as np
import nnabla as nn
import nnabla.functions as F

nn.set_auto_forward(True)

input = nn.Variable.from_numpy_array([[np.inf, 2], [3, np.nan]])
mask = nn.Variable.from_numpy_array([[1, 0], [0, 1]])
output = F.bool_fill(input, mask, -1)

print(output.d)  # [[-1, 2], [3, -1]]
Parameters
  • data (Variable) – Data to be filled.

  • mask (Variable) – Mask with which to fill. Non-zero/zero elements are supposed to be a binary mask as 1/0. No gradients are computed with respect to mask.

  • value (float) – Value to fill. [default= 0 ]

Returns

Filled output.

Return type

Variable

Note

All nnabla functions in nnabla.functions are decorated with the nnabla.function_bases.function_api decorator, which queries the current context and passes it into the first argument of the original function. The original function always takes a context as the first argument.

nnabla.functions.dot(a, b, out=None)[source]

An operation compatible with numpy.dot.

Note

Operations between nnabla's Variable/NdArray and numpy arrays are not supported.

If both arguments are 1-D, it is the inner product of vectors. If both arguments are 2-D, it is matrix multiplication. If either a or b is 0-D (scalar), it is equivalent to element-wise multiplication. If b is a 1-D array, it is a sum product over the last axis of a and b. If b is an M-D array (M>=2), it is a sum product over the last axis of a and the second-to-last axis of b.

Parameters
  • a (Variable, NdArray or scalar) – Left input array.

  • b (Variable, NdArray or scalar) – Right input array.

  • out – Output argument. This must have the same shape, dtype, and type as the result that would be returned for F.dot(a,b).

Returns

Variable or NdArray

Examples:

import numpy as np
import nnabla as nn
import nnabla.functions as F

# 2-D matrix * 2-D matrix
arr1 = np.arange(5*6).reshape(5, 6)
arr2 = np.arange(6*8).reshape(6, 8)
nd1 = nn.NdArray.from_numpy_array(arr1)
nd2 = nn.NdArray.from_numpy_array(arr2)
ans1 = F.dot(nd1, nd2)
print(ans1.shape)
#(5, 8)

var1 = nn.Variable.from_numpy_array(arr1)
var2 = nn.Variable.from_numpy_array(arr2)
ans2 = F.dot(var1, var2)
ans2.forward()
print(ans2.shape)
#(5, 8)

out1 = nn.NdArray((5, 8))
out1.cast(np.float32)
F.dot(nd1, nd2, out1)
print(out1.shape)
#(5, 8)

out2 = nn.Variable((5, 8))
out2.data.cast(np.float32)
F.dot(var1, var2, out2)
out2.forward()
print(out2.shape)
#(5, 8)

# N-D matrix * M-D matrix (M>=2)
arr1 = np.arange(5*6*7*8).reshape(5, 6, 7, 8)
arr2 = np.arange(2*3*8*6).reshape(2, 3, 8, 6)
nd1 = nn.NdArray.from_numpy_array(arr1)
nd2 = nn.NdArray.from_numpy_array(arr2)
ans1 = F.dot(nd1, nd2)
print(ans1.shape)
#(5, 6, 7, 2, 3, 6)

var1 = nn.Variable.from_numpy_array(arr1)
var2 = nn.Variable.from_numpy_array(arr2)
ans2 = F.dot(var1, var2)
ans2.forward()
print(ans2.shape)
#(5, 6, 7, 2, 3, 6)

out1 = nn.NdArray((5, 6, 7, 2, 3, 6))
out1.cast(np.float32)
F.dot(nd1, nd2, out1)
print(out1.shape)
#(5, 6, 7, 2, 3, 6)

out2 = nn.Variable((5, 6, 7, 2, 3, 6))
out2.data.cast(np.float32)
F.dot(var1, var2, out2)
out2.forward()
print(out2.shape)
#(5, 6, 7, 2, 3, 6)
Stochasticity
nnabla.functions.rand(low=0, high=1, shape=[], seed=-1, n_outputs=-1, outputs=None)[source]

Samples numbers from a uniform distribution \(x \sim U(low, high)\) given lowest value \(low\), upper bound \(high\), and shape of the returned Variable.

Parameters
  • low (float) – \(low\) in definition. [default= 0 ]

  • high (float) – \(high\) in definition. [default= 1 ]

  • shape (tuple of int) – Shape of returned variable. [default= [] ]

  • seed (int) – Random seed. When -1, seed is sampled from global random number generator. [default= -1 ]

Returns

Variable with the shape specified in the argument.

Return type

Variable

Note

All nnabla functions in nnabla.functions are decorated with the nnabla.function_bases.function_api decorator, which queries the current context and passes it into the first argument of the original function. The original function always takes a context as the first argument.
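
Example (an illustrative sketch of rand(), showing the effect of a fixed seed; auto-forward mode as in other examples):

import numpy as np
import nnabla as nn
import nnabla.functions as F

nn.set_auto_forward(True)

x = F.rand(low=0, high=1, shape=(2, 3), seed=313)
assert x.shape == (2, 3)
assert (x.d >= 0).all() and (x.d <= 1).all()

# The same seed reproduces the same draw
y = F.rand(low=0, high=1, shape=(2, 3), seed=313)
assert np.allclose(x.d, y.d)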

nnabla.functions.randint(low=0, high=1, shape=[], seed=-1, n_outputs=-1, outputs=None)[source]

Samples integer numbers from a uniform distribution \(x \sim U(low, high)\) given lowest value \(low\), upper bound \(high\), and the shape of the returned Variable. The lowest value \(low\) is included in the range, while the upper bound \(high\) is excluded, corresponding to the half-open interval \([low, high)\).

Parameters
  • low (int) – \(low\) in definition. [default= 0 ]

  • high (int) – \(high\) in definition. [default= 1 ]

  • shape (tuple of int) – Shape of returned variable. [default= [] ]

  • seed (int) – Random seed. When -1, seed is sampled from global random number generator. [default= -1 ]

Returns

Variable with the shape specified in the argument. The dtype is int32.

Return type

Variable

Note

All nnabla functions in nnabla.functions are decorated with the nnabla.function_bases.function_api decorator, which queries the current context and passes it into the first argument of the original function. The original function always takes a context as the first argument.
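
A minimal usage sketch (not part of the original reference), e.g. simulating dice rolls; the arguments are arbitrary:

import nnabla.functions as F

# int32 samples in the half-open interval [1, 7), i.e. 1..6
x = F.randint(low=1, high=7, shape=(2, 5), seed=1)
x.forward()
print(x.d)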

nnabla.functions.randn(mu=0, sigma=1, shape=[], seed=-1, n_outputs=-1, outputs=None)[source]

Samples numbers from a normal distribution \(x \sim N(\mu, \sigma)\) given mean \(\mu\), standard deviation \(\sigma\), and shape of the returned Variable.

Parameters
  • mu (float) – \(\mu\) in definition. [default= 0 ]

  • sigma (float) – \(\sigma\) in definition. [default= 1 ]

  • shape (tuple of int) – Shape of returned variable. [default= [] ]

  • seed (int) – Random seed. When -1, seed is sampled from global random number generator. [default= -1 ]

Returns

Variable with the shape specified in the argument.

Return type

Variable

Note

All nnabla functions in nnabla.functions are decorated with the nnabla.function_bases.function_api decorator, which queries the current context and passes it into the first argument of the original function. The original function always takes a context as the first argument.
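
A minimal usage sketch (not part of the original reference; the parameters are arbitrary):

import nnabla.functions as F

# sample a (3, 4) array from N(mu=0, sigma=2)
x = F.randn(mu=0, sigma=2, shape=(3, 4), seed=1)
x.forward()
print(x.d)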

nnabla.functions.rand_binomial(n=1, p=0.5, shape=[], seed=-1, n_outputs=-1, outputs=None)[source]

Samples numbers from a binomial distribution \(x \sim B(n, p)\) given the number of trials \(n\), probability \(p\), and shape of the returned Variable. When \(n = 1\), this behaves like the Bernoulli distribution.

Parameters
  • n (int) – \(n\) in definition, the number of trials. [default= 1 ]

  • p (float) – \(p\) in definition, probability of success. [default= 0.5 ]

  • shape (tuple of int) – Shape of returned variable. [default= [] ]

  • seed (int) – Random seed. When -1, seed is sampled from global random number generator. [default= -1 ]

Returns

Variable with the shape specified in the argument.

Return type

Variable

Note

All nnabla functions in nnabla.functions are decorated with the nnabla.function_bases.function_api decorator, which queries the current context and passes it into the first argument of the original function. The original function always takes a context as the first argument.
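
A minimal usage sketch (not part of the original reference): with n=1 this draws a Bernoulli 0/1 mask; the probability is arbitrary:

import nnabla.functions as F

# each entry is 1 with probability 0.3, otherwise 0
mask = F.rand_binomial(n=1, p=0.3, shape=(2, 5), seed=1)
mask.forward()
print(mask.d)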

nnabla.functions.rand_beta(alpha=0.5, beta=0.5, shape=[], seed=-1, n_outputs=-1, outputs=None)[source]

Samples numbers from a beta distribution \(x \sim \beta(\alpha, \beta)\).

Parameters
  • alpha (float) – \(\alpha\), shape parameter. [default= 0.5 ]

  • beta (float) – \(\beta\), shape parameter. [default= 0.5 ]

  • shape (tuple of int) – Shape of returned variable. [default= [] ]

  • seed (int) – Random seed. When -1, seed is sampled from global random number generator. [default= -1 ]

Returns

Variable with the shape specified in the argument.

Return type

Variable

Note

All nnabla functions in nnabla.functions are decorated with the nnabla.function_bases.function_api decorator, which queries the current context and passes it into the first argument of the original function. The original function always takes a context as the first argument.

nnabla.functions.rand_gamma(k=0.5, theta=1, shape=[], seed=-1, n_outputs=-1, outputs=None)[source]

Samples numbers from a gamma distribution \(x \sim \frac {\gamma(k, \frac {x}{\theta})}{\Gamma(k)}\).

Parameters
  • k (float) – \(k\), shape parameter. [default= 0.5 ]

  • theta (float) – \(\theta\), scale parameter. [default= 1 ]

  • shape (tuple of int) – Shape of returned variable. [default= [] ]

  • seed (int) – Random seed. When -1, seed is sampled from global random number generator. [default= -1 ]

Returns

Variable with the shape specified in the argument.

Return type

Variable

Note

All nnabla functions in nnabla.functions are decorated with the nnabla.function_bases.function_api decorator, which queries the current context and passes it into the first argument of the original function. The original function always takes a context as the first argument.

nnabla.functions.dropout(x, p=0.5, seed=-1, output_mask=False)[source]

Dropout. Samples a number \(u\) from a uniform distribution in \([0, 1]\), and ignores the input if \(u \leq p\).

\[\begin{split}\begin{equation} y = \left\{ \begin{array}{ll} \frac{x}{1 - p} & (u > p) \\ 0 & ({\rm otherwise}) \end{array} \right. \end{equation}\end{split}\]
Parameters
  • x (Variable) – An input variable.

  • p (float) – \(p\) in definition. [default= 0.5 ]

  • seed (int) – Random seed. When -1, seed is sampled from global random number generator. [default= -1 ]

  • output_mask (bool) – Whether or not to output mask. [default= False ]

Returns

N-D array.

Return type

Variable

Note

Usually dropout is applied only during training, as shown below (MC dropout is an exception). If you want to use dropout as MC dropout, remove the if train: condition.

h = PF.affine(x, num_hidden)
if train:  # apply dropout only in the training phase
    h = F.dropout(h, 0.5)

reference: https://arxiv.org/abs/1506.02142

Note

If you apply nn.grad to a graph containing dropout, you must set output_mask=True for all dropouts. Otherwise, the backward function of dropout raises a ValueError when you call nn.grad.

h = PF.affine(x, num_hidden)
h, mask = F.dropout(h, p=0.1, output_mask=True)  # output_mask=True is required for nn.grad
y = PF.affine(h, num_hidden)

grad = nn.grad([y], nn.get_parameters().values())
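
A minimal forward-only sketch (not part of the original reference; the input values are arbitrary) illustrating the scaling behavior:

import numpy as np
import nnabla as nn
import nnabla.functions as F

nn.set_auto_forward(True)
x = nn.Variable.from_numpy_array(np.ones((2, 6), dtype=np.float32))
y = F.dropout(x, p=0.5, seed=1)
# roughly half of the entries are zeroed; the survivors are scaled to 1/(1-p) = 2
print(y.d)
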
nnabla.functions.random_choice(x, w, shape=[], replace=True, seed=-1, n_outputs=-1, outputs=None)[source]

Generate random samples from population x with selection probabilities determined by the relative weights w. The number of samples to draw is given by the product of the dimensions of shape, and the samples are returned with the given shape. By default, samples are drawn with replacement, i.e. selection of a specific population member is solely determined by its associated weight. Sampling without replacement, where any population member may be drawn only once, is used if replace is set to False.

For both x and w the innermost dimension corresponds to the individual populations and their weights from which samples are returned with the requested shape following all outermost dimensions of the input.

import nnabla as nn
import nnabla.functions as F
import numpy as np
nn.set_auto_forward(True)

# x holds two populations
x = nn.Variable.from_numpy_array(np.array([[11, 22, 33], [110, 220, 330]]))
# w holds the weights for each population
w = nn.Variable.from_numpy_array(np.array([[10, 20, 70], [70, 20, 10]]))

# draw one sample from each population
y = F.random_choice(x, w)  # y.shape => (2, 1)

# draw 12 samples with shape (3, 4) from each population
y = F.random_choice(x, w, shape=(3, 4))  # y.shape => (2, 3, 4)

Note that weights must not be less than zero and for each population the sum of weights must be greater than zero. Additionally, sampling without replacement requires that the number of non-zero weights is not less than the number of samples to be drawn. These conditions are verified in “cpu” computation context but not when using “cuda” or “cudnn” acceleration (this would require additional device synchronization steps penalizing performance).

Random sampling from an implicit array of index values (like categorical or multinomial) can be realized with input x constructed as indices.

w = nn.Variable.from_numpy_array(np.array([1, 2, 3, 2, 1]))
y = F.random_choice(F.arange(0, 5), w)
Parameters
  • x (Variable) – N-D array from which a random sample is generated.

  • w (Variable) – N-D array of associated weights of elements in x.

  • shape (tuple of int) – Number and shape of generated samples. [default= [] ]

  • replace (bool) – Whether sampling is with or without replacement. [default= True ]

  • seed (int) – Random seed. [default= -1 ]

Returns

N-D array

Return type

Variable

Note

All nnabla functions in nnabla.functions are decorated with the nnabla.function_bases.function_api decorator, which queries the current context and passes it into the first argument of the original function. The original function always takes a context as the first argument.

nnabla.functions.random_crop(x, shape=None, base_axis=1, seed=-1, n_outputs=-1, outputs=None)[source]

RandomCrop randomly extracts a portion of an array.

Parameters
  • x (Variable) – N-D array

  • shape (tuple of int) – The data size to extract. For example, to randomly extract a (3,48,48) portion from a (3,64,64) image, specify (3,48,48). [default= x.shape ]

  • base_axis (int) – Dimensions up to base_axis are treated as sample dimensions. [default= 1 ]

  • seed (int) – Random seed. When -1, seed is sampled from global random number generator. [default= -1 ]

Returns

N-D array

Return type

Variable

Note

All nnabla functions in nnabla.functions are decorated with the nnabla.function_bases.function_api decorator, which queries the current context and passes it into the first argument of the original function. The original function always takes a context as the first argument.
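
A minimal usage sketch (not part of the original reference), following the (3,64,64) to (3,48,48) example in the parameter description:

import numpy as np
import nnabla as nn
import nnabla.functions as F

x = nn.Variable.from_numpy_array(np.random.rand(3, 64, 64).astype(np.float32))
y = F.random_crop(x, shape=(3, 48, 48), seed=1)
y.forward()
print(y.shape)  # (3, 48, 48)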

nnabla.functions.random_erase(x, prob=0.5, area_ratios=(0.02, 0.4), aspect_ratios=(0.3, 3.3333), replacements=(0.0, 255.0), n=None, share=True, inplace=False, base_axis=1, seed=-1, channel_last=False, ste_fine_grained=True, n_outputs=-1, outputs=None)[source]

Randomly erase patches of the inputs and replace with random values.

Erasing is applied for each sample and for each of the n patches with the given probability, the randomly selected area ratio, and the randomly selected aspect ratio if share is True; otherwise (share=False), additionally for each feature.

Random patches are selected by random coordinates as follows,

\[\begin{split}S_e &&= Uniform(s_l, s_h) \times S \\ r_e &&= Uniform(r_l, r_h) \\ H_e &&= \sqrt{S_e \times r_e} \\ W_e &&= \sqrt{S_e / r_e} \\ y_e &&= Uniform(0, H - H_e) \\ x_e &&= Uniform(0, W - W_e),\end{split}\]

where \(S\) is the area, \(s_l\) and \(s_h\) are the low and high values of the area ratio range, \(r_l\) and \(r_h\) are the low and high values of the aspect ratio range, \(H_e\) and \(W_e\) are height and width of a patch, and \(y_e\) and \(x_e\) are the start coordinates of a patch. If a pixel of the inputs falls in this patch, the value of that pixel is replaced with a random value in replacements range.

Backward is implemented as passing gradients straight through if ste_fine_grained is False; otherwise, gradients flow only in the regions that were not erased.

References

Parameters
  • x (Variable) – N-D array.

  • prob (float) – Probability to erase. [default= 0.5 ]

  • area_ratios (repeated float) – Low and high of the area ratio range. [default= (0.02, 0.4) ]

  • aspect_ratios (repeated float) – Low and high of the aspect ratios range. [default= (0.3, 3.3333) ]

  • replacements (repeated float) – Low and high of the replacement value range. [default= (0.0, 255.0) ]

  • n (int) – Max number of patches to be erased. [default= 1 ]

  • share (bool) – Use the same randomly picked bounding box over the feature dimension when True. [default= True ]

  • inplace (bool) – This option is obsolete and ignored. Output is never in-placed with input. [default= False ]

  • base_axis (int) – Dimensions up to base_axis are treated as sample dimensions. [default= 1 ]

  • seed (int) – Random seed. When -1, seed is sampled from global random number generator. [default= -1 ]

  • channel_last (bool) – If True, the last dimension is considered as channel dimension, a.k.a NHWC order. [default= False ]

  • ste_fine_grained (bool) – Straight Through Estimator is fine-grained or not. Default is True. [default= True ]

Returns

N-D array.

Return type

Variable

Note

All nnabla functions in nnabla.functions are decorated with the nnabla.function_bases.function_api decorator, which queries the current context and passes it into the first argument of the original function. The original function always takes a context as the first argument.
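
A minimal usage sketch (not part of the original reference; the arguments are arbitrary). Since the inputs here lie in [0, 1], replacements is narrowed accordingly:

import numpy as np
import nnabla as nn
import nnabla.functions as F

x = nn.Variable.from_numpy_array(np.random.rand(8, 3, 32, 32).astype(np.float32))
y = F.random_erase(x, prob=0.5, replacements=(0.0, 1.0), seed=1)
y.forward()
print(y.shape)  # (8, 3, 32, 32)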

nnabla.functions.random_flip(x, axes=None, base_axis=1, seed=-1, n_outputs=-1, outputs=None)[source]

Reverses the order of elements along the specified dimensions of an array with 50% probability.

Parameters
  • x (Variable) – N-D array

  • axes (repeated int64) – The index of the axis to reverse the order of the elements. Axis indices take on values 0, 1, 2, and so on from the left. For example, to flip 100 RGB images of 32 (W) by 24 (H) pixels, shaped (100, 3, 24, 32), vertically and horizontally at random, specify (2,3). [default= [len(x.shape) - 1] ]

  • base_axis (int) – Dimensions up to base_axis are treated as sample dimensions. [default= 1 ]

  • seed (int) – Random seed. When -1, seed is sampled from global random number generator. [default= -1 ]

Returns

N-D array

Return type

Variable

Note

All nnabla functions in nnabla.functions are decorated with the nnabla.function_bases.function_api decorator, which queries the current context and passes it into the first argument of the original function. The original function always takes a context as the first argument.
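
A minimal usage sketch (not part of the original reference), flipping a batch of images along H and W as in the axes description above:

import numpy as np
import nnabla as nn
import nnabla.functions as F

x = nn.Variable.from_numpy_array(np.random.rand(100, 3, 24, 32).astype(np.float32))
y = F.random_flip(x, axes=(2, 3), seed=1)
y.forward()
print(y.shape)  # (100, 3, 24, 32)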

nnabla.functions.random_shift(x, shifts=None, border_mode='nearest', constant_value=0, base_axis=1, seed=-1, n_outputs=-1, outputs=None)[source]

Randomly shifts the array elements within the specified range.

Parameters
  • x (Variable) – N-D array.

  • shifts (repeated int64) – Max absolute amount to shift elements. For example, to shift image data horizontally by \(\pm 2\) pixels and vertically by \(\pm 3\) pixels, specify (3,2). [default= (0,) * len(x.shape) ]

  • border_mode (string) – Specify how to process the ends of arrays whose values will be undetermined as a result of shifting. nearest: The data at the ends of the original array is copied and used. reflect: Original data reflected at the ends of the original array is used. constant: Constant value is used. [default= 'nearest' ]

  • constant_value (float) – Value used for outside of the original array if border_mode=’constant’. [default= 0 ]

  • base_axis (int) – Dimensions up to base_axis are treated as sample dimensions. [default= 1 ]

  • seed (int) – Random seed. When -1, seed is sampled from global random number generator. [default= -1 ]

Returns

N-D array.

Return type

Variable

Note

All nnabla functions in nnabla.functions are decorated with the nnabla.function_bases.function_api decorator, which queries the current context and passes it into the first argument of the original function. The original function always takes a context as the first argument.
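
A minimal usage sketch (not part of the original reference), assuming shifts takes one entry per input dimension; a single 2-D image is shifted by up to ±3 rows and ±2 columns:

import numpy as np
import nnabla as nn
import nnabla.functions as F

x = nn.Variable.from_numpy_array(np.arange(24 * 32).reshape(24, 32).astype(np.float32))
y = F.random_shift(x, shifts=(3, 2), border_mode='nearest', seed=1)
y.forward()
print(y.shape)  # (24, 32)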

nnabla.functions.image_augmentation(x, shape=None, pad=(0, 0), min_scale=1.0, max_scale=1.0, angle=0.0, aspect_ratio=1.0, distortion=0.0, flip_lr=False, flip_ud=False, brightness=0.0, brightness_each=False, contrast=1.0, contrast_center=0.0, contrast_each=False, noise=0.0, seed=-1, n_outputs=-1, outputs=None)[source]

ImageAugmentation randomly alters the input image.

Parameters
  • x (Variable) – N-D array.

  • shape (tuple of int) – The output image data size. [default= x.shape ]

  • pad (tuple of int) – Border padding values for each spatial axis. Padding will be added to both sides of the dimension. [default= (0, 0) ]

  • min_scale (float) – The minimum scale ratio when randomly scaling the image. For example, to scale down to 0.8 times the size of the original image, specify “0.8”. To not apply random scaling, set both min_scale and max_scale to “1.0”. [default= 1.0 ]

  • max_scale (float) – The maximum scale ratio when randomly scaling the image. For example, to scale up to 2 times the size of the original image, specify “2.0”. [default= 1.0 ]

  • angle (float) – The rotation angle range in radians when randomly rotating the image. The image is randomly rotated in the -Angle to +Angle range. For example, to rotate in a +-15 degree range, specify “0.26” (15 degrees/360 degrees * 2PI). To not apply random rotation, specify “0.0”. [default= 0.0 ]

  • aspect_ratio (float) – The aspect ratio range when randomly deforming the image. For example, to deform aspect ratio of image from 1:1.3 to 1.3:1, specify “1.3”. To not apply random deforming, specify “1.0”. [default= 1.0 ]

  • distortion (float) – The distortion range when randomly distorting the image. To not apply distortion, specify “0.0”. [default= 0.0 ]

  • flip_lr (bool) – Whether to randomly flip the image horizontally at 50% probability. [default= False ]

  • flip_ud (bool) – Whether to randomly flip the image vertically at 50% probability. [default= False ]

  • brightness (float) – The absolute range of values to randomly add to the brightness. A random value in the -Brightness to +Brightness range is added to the brightness. For example, to vary the brightness in the -0.05 to +0.05 range, specify “0.05”. To not apply random addition to brightness, specify “0.0”. [default= 0.0 ]

  • brightness_each (bool) – Whether to apply the random addition to brightness (as specified by brightness) to each color channel. True: brightness is added based on a different random number for each channel. False: brightness is added based on a random number common to all channels. [default= False ]

  • contrast (float) – The range in which to randomly vary the image contrast. The contrast is varied in the 1/Contrast times to Contrast times range. The output brightness is equal to (input - contrast_center) * contrast + contrast_center. For example, to vary the contrast in the 0.91 times to 1.1 times range, specify “1.1”. To not apply random contrast variation, specify “1.0”. [default= 1.0 ]

  • contrast_center (float) – Intensity center used for applying contrast. [default= 0.0 ]

  • contrast_each (bool) – Whether to apply the random contrast variation (as specified by contrast) to each color channel. True: contrast is varied based on a different random number for each channel. False: contrast is varied based on a random number common to all channels. [default= False ]

  • noise (float) – Sigma of normal random number to be added. [default= 0.0 ]

  • seed (int) – Random seed. When -1, seed is sampled from global random number generator. [default= -1 ]

Returns

N-D array.

Return type

Variable

Note

All nnabla functions in nnabla.functions are decorated with the nnabla.function_bases.function_api decorator, which queries the current context and passes it into the first argument of the original function. The original function always takes a context as the first argument.
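
A minimal usage sketch (not part of the original reference; the arguments are arbitrary, and shape is assumed to be the full output shape, as its default x.shape suggests):

import numpy as np
import nnabla as nn
import nnabla.functions as F

x = nn.Variable.from_numpy_array(np.random.rand(16, 3, 32, 32).astype(np.float32))
y = F.image_augmentation(x, shape=(16, 3, 28, 28), min_scale=0.8, max_scale=1.2,
                         angle=0.26, flip_lr=True, noise=0.05, seed=1)
y.forward()
print(y.shape)  # (16, 3, 28, 28)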

Loss Functions
nnabla.functions.sigmoid_cross_entropy(x, target, n_outputs=-1, outputs=None)[source]

Element-wise cross entropy between the target variables and x passed through a sigmoid function.

\[y_i = - \left(x^{(1)}_i \ln \left(\sigma \left(x^{(0)}_i \right)\right) + \left(1 - x^{(1)}_i\right) \ln \left(1 - \sigma \left(x^{(0)}_i \right)\right)\right)\]

where \(\sigma(s)=\frac{1}{1+\exp(-s)}\).

Note

SigmoidCrossEntropy is equivalent to Sigmoid+BinaryCrossEntropy, but computing them at once has the effect of reducing computational error.

Parameters
  • x (Variable) – N-D array. Typically indicates a score. The value lies in \([-\infty, \infty]\) [parameter]

  • target (Variable) – N-D array of labels. Only 0 or 1 value is allowed. [parameter]

Returns

N-D array of element-wise losses.

Return type

Variable

Note

All nnabla functions in nnabla.functions are decorated with the nnabla.function_bases.function_api decorator, which queries the current context and passes it into the first argument of the original function. The original function always takes a context as the first argument.
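
A minimal usage sketch (not part of the original reference; the logits and labels are arbitrary):

import numpy as np
import nnabla as nn
import nnabla.functions as F

logits = nn.Variable.from_numpy_array(np.array([[2.0, -1.0], [0.5, 0.0]], dtype=np.float32))
labels = nn.Variable.from_numpy_array(np.array([[1, 0], [0, 1]], dtype=np.float32))
loss = F.mean(F.sigmoid_cross_entropy(logits, labels))
loss.forward()
print(loss.d)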

nnabla.functions.binary_cross_entropy(x, target, n_outputs=-1, outputs=None)[source]

Element-wise cross entropy between x and the target variables.

\[y_i = - \left(x^{(1)}_i * \ln \left(x^{(0)}_i\right) + \left(1 - x^{(1)}_i\right) * \ln \left(1 - x^{(0)}_i\right)\right).\]
Parameters
  • x (Variable) – Probabilities N-D array. Each value should lie in \((0, 1)\).

  • target (Variable) – N-D array of labels. Usually set as 0 or 1, but, unlike SigmoidCrossEntropy, it allows probability (0 to 1) as inputs and backpropagation can be done.

Returns

N-D array of element-wise losses.

Return type

Variable

Note

All nnabla functions in nnabla.functions are decorated with the nnabla.function_bases.function_api decorator, which queries the current context and passes it into the first argument of the original function. The original function always takes a context as the first argument.

nnabla.functions.softmax_cross_entropy(x, target, axis=None, n_outputs=-1, outputs=None)[source]

Element-wise cross entropy between x, normalized with Softmax, and target labels given by category indices.

\[y_{j} = -\ln \left(\frac{\exp(x_{j,t_j})}{\sum_{i'} \exp(x_{j,i'})}\right)\]

along the dimension specified by axis (\(i\) is the axis on which the normalization is performed).

Note

SoftmaxCrossEntropy is equivalent to Softmax+CategoricalCrossEntropy, but computing them at once has the effect of reducing computational error.

Parameters
  • x (Variable) – N-D array. Typically indicates a score. \((D_1 \times ... \times D_i \times ... \times D_N)\) [parameter]

  • target (Variable) – N-D array of labels. \((D_1 \times ... \times 1 \times ... \times D_N)\), each label should be an index in [0, n_classes), or -1 if it does not belong to any class. [parameter]

  • axis (int) – Axis normalization is taken. [default= len(x.shape) - 1 ]

Returns

N-D array of element-wise losses. \((D_1 \times ... \times 1 \times ... \times D_N)\)

Return type

Variable

Note

All nnabla functions in nnabla.functions are decorated with the nnabla.function_bases.function_api decorator, which queries the current context and passes it into the first argument of the original function. The original function always takes a context as the first argument.
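
A minimal usage sketch (not part of the original reference): scores of shape (batch, classes) and integer labels of shape (batch, 1):

import numpy as np
import nnabla as nn
import nnabla.functions as F

x = nn.Variable.from_numpy_array(np.random.randn(4, 10).astype(np.float32))
t = nn.Variable.from_numpy_array(np.array([[1], [0], [9], [3]], dtype=np.int32))
loss = F.mean(F.softmax_cross_entropy(x, t))  # normalization over the class axis
loss.forward()
print(loss.d)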

nnabla.functions.categorical_cross_entropy(x, target, axis=None, n_outputs=-1, outputs=None)[source]

Element-wise cross entropy between x and the target t where targets are given by a category index.

\[y_{j} = -\ln \left( x_{j, t_j} \right)\]

along the dimension specified by axis (\(i\) is the axis on which the normalization is performed).

Parameters
  • x (Variable) – N-D array. Typically indicates a score. \((D_1 \times ... \times D_i \times ... \times D_N)\) [parameter]

  • target (Variable) – N-D array of labels. \((D_1 \times ... \times 1 \times ... \times D_N)\), each label should be an index in [0, n_classes), or -1 if it does not belong to any class. [parameter]

  • axis (int) – Axis normalization is taken. [default= len(x.shape) - 1 ]

Returns

N-D array of element-wise losses. \((D_1 \times ... \times 1 \times ... \times D_N)\)

Return type

Variable

Note

All nnabla functions in nnabla.functions are decorated with the nnabla.function_bases.function_api decorator, which queries the current context and passes it into the first argument of the original function. The original function always takes a context as the first argument.

nnabla.functions.squared_error(x0, x1, n_outputs=-1, outputs=None)[source]

Element-wise squared error

\[y_i = \left(x^{(0)}_i - x^{(1)}_i\right)^2.\]
Parameters
  • x0 (Variable) – N-D array.

  • x1 (Variable) – N-D array.

Returns

N-D array.

Return type

Variable

Note

All nnabla functions in nnabla.functions are decorated with the nnabla.function_bases.function_api decorator, which queries the current context and passes it into the first argument of the original function. The original function always takes a context as the first argument.

nnabla.functions.absolute_error(x0, x1, n_outputs=-1, outputs=None)[source]

Element-wise absolute error

\[y_i = | x^{(0)}_i - x^{(1)}_i |.\]
Parameters
  • x0 (Variable) – N-D array.

  • x1 (Variable) – N-D array.

Returns

N-D array.

Return type

Variable

Note

All nnabla functions in nnabla.functions are decorated with the nnabla.function_bases.function_api decorator, which queries the current context and passes it into the first argument of the original function. The original function always takes a context as the first argument.

nnabla.functions.huber_loss(x0, x1, delta=1.0, n_outputs=-1, outputs=None)[source]

Element-wise Huber loss

\[\begin{split}y_i= \left\{ \begin{array}{ll} d^2 & (|d| < \delta)\\ \delta (2 |d| - \delta) & ({\rm otherwise}) \end{array} \right.\end{split}\]

where \(d = x^{(0)}_i - x^{(1)}_i\)

Parameters
  • x0 (Variable) – N-D array.

  • x1 (Variable) – N-D array.

  • delta (float) – \(\delta\) in definition. [default= 1.0 ]

Returns

N-D array of element-wise losses.

Return type

Variable

Note

All nnabla functions in nnabla.functions are decorated with the nnabla.function_bases.function_api decorator, which queries the current context and passes it into the first argument of the original function. The original function always takes a context as the first argument.
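
A minimal numeric check (not part of the original reference; the values are arbitrary):

import numpy as np
import nnabla as nn
import nnabla.functions as F

x0 = nn.Variable.from_numpy_array(np.array([0.0, 0.0, 0.0], dtype=np.float32))
x1 = nn.Variable.from_numpy_array(np.array([0.5, 1.0, 3.0], dtype=np.float32))
y = F.huber_loss(x0, x1, delta=1.0)
y.forward()
print(y.d)  # [0.25 1.   5.  ] -- quadratic below delta, linear above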

nnabla.functions.epsilon_insensitive_loss(x0, x1, epsilon, n_outputs=-1, outputs=None)[source]

Element-wise Epsilon Insensitive Loss

\[\begin{split}y_i= \left\{ \begin{array}{ll} | x^{(0)}_i - x^{(1)}_i | - \epsilon & if \ \ | x^{(0)}_i - x^{(1)}_i | > \epsilon \\ 0 & otherwise \end{array} \right.\end{split}\]
Parameters
  • x0 (Variable) – N-D array.

  • x1 (Variable) – N-D array.

  • epsilon (float) – \(\epsilon\) in definition.

Returns

N-D array of element-wise losses.

Return type

Variable

Note

All nnabla functions in nnabla.functions are decorated with the nnabla.function_bases.function_api decorator, which queries the current context and passes it into the first argument of the original function. The original function always takes a context as the first argument.

nnabla.functions.kl_multinomial(p, q, base_axis=1, n_outputs=-1, outputs=None)[source]

The Kullback-Leibler divergence for multinomial distributions.

\[D = \sum_i p_i \log \left( \frac{p_i}{q_i} \right)\]
Parameters
  • p (Variable) – N-D array of the source categorical probabilities

  • q (Variable) – N-D array of the target categorical probabilities

  • base_axis (int) – Dimensions up to base_axis are treated as sample dimensions. [default= 1 ]

Returns

Kullback Leibler divergence \(KL(p \parallel q)\).

Return type

Variable

Note

All nnabla functions in nnabla.functions are decorated with the nnabla.function_bases.function_api decorator, which queries the current context and passes it into the first argument of the original function. The original function always takes a context as the first argument.

Signal Processing
nnabla.functions.interpolate(x, scale=None, output_size=None, mode='linear', align_corners=False, half_pixel=False, half_pixel_for_nn=False, channel_last=False)[source]

Resize an ND array with interpolation.

Scaling factors for spatial dimensions are determined by either scale or output_size.

nd = len(scale) or nd = len(output_size) determines the number of spatial dimensions, and the last nd dimensions of the input x are considered as the spatial dimensions to be resized.

If scale is given, the output_size is calculated by

output_size[i] = floor(scale[i] * x.shape[i - len(scale)]).

Calculation of the coordinate transformation is as follows.

The input coordinate i_input is computed by the output coordinate i_output, the input size size_input, and the output size size_output as

align_corners | half_pixel | i_input
--------------|------------|--------
True          | True       | Not supported.
True          | False      | i_output * (size_input - 1) / (size_output - 1)
False         | True       | (i_output + 0.5) * size_input / size_output - 0.5
False         | False      | i_output * size_input / size_output

In the case of the nearest mode and half_pixel_for_nn is True, the input coordinate i_input is computed by the output coordinate i_output as

i_input = (i_output + 0.5) * size_input / size_output.

Example:

import numpy as np
import nnabla as nn
import nnabla.functions as F

x_data = np.random.rand(64, 3, 224, 224)
x = nn.Variable.from_numpy_array(x_data)

# Resize by scales
y = F.interpolate(x, scale=(2, 2), mode='linear')
print(y.shape)  # (64, 3, 448, 448)
y.forward()
print(y.d)  # Print output

# Resize to a size
y2 = F.interpolate(x, output_size=(320, 257), mode='linear')
print(y2.shape)  # (64, 3, 320, 257)
y2.forward()
print(y2.d)  # Print output
Parameters
  • x (Variable) – N-D array with an arbitrary number of dimensions.

  • scale (tuple of ints) – Scale factors along axes. The default is None, and if this is omitted, output_size must be specified.

  • output_size (tuple of ints) – The output sizes for axes. If this is given, the scale factors are determined by the output sizes and the input sizes. The default is None, and if this is omitted, scale must be specified.

  • mode (str) – Interpolation mode chosen from (‘linear’|’nearest’). The default is ‘linear’.

  • align_corners (bool) – If true, the corner pixels of input and output arrays are aligned, such that the output corner pixels have the same values with the input corner pixels. Default is False.

  • half_pixel (bool) – If true, in the coordinate transformation, 0.5 is added to the output coordinate and 0.5 is subtracted from the input coordinate after scaling. Default is False.

  • half_pixel_for_nn (bool) – This is a special argument to support backward compatibility of the nearest neighbor interpolation. Default is False. When True, the old implementation of nearest neighbor interpolation is used.

  • channel_last (bool) – Last dimension is the channel (NHWC order) if True.

Returns

N-D array.

Return type

Variable

Warning

Up to version 1.8.0, the default of align_corners was None; it became True if mode was linear, otherwise False.

Warning

The nearest mode interpolation of versions up to 1.8.0 corresponds to the nearest mode with half_pixel_for_nn = True in versions after 1.8.0.

nnabla.functions.fft(x, signal_ndim, normalized=False, n_outputs=-1, outputs=None)[source]

Complex-to-complex Discrete Fourier Transform,

\[X_{k_1, \ldots, k_d} = \sum_{n_1=0}^{N_1-1} \dots \sum_{n_d=0}^{N_d-1} x_{n_1, \ldots, n_d} \exp\left(-2 \pi j \left( \sum_{i=1}^{d} \frac{k_i n_i}{N_i} \right) \right),\]

where

\[k_i = 0, \ldots, N_i - 1.\]

This function now supports 1-D, 2-D, and 3-D DFT with or without the leading batch dimension(s).

The input is expected to be complex-valued with at least signal_ndim + 1 dimensions. The last dimension has a shape of two where x[…, 0] is the real part and x[…, 1] the imaginary part.

Example:

import numpy as np
import nnabla as nn
import nnabla.functions as F
from nnabla.ext_utils import get_extension_context

ctx = get_extension_context("cudnn")
nn.set_default_context(ctx)

# Example for a batched 2D-FFT and 2D-IFFT (batch-size: 2, data-size: 4x3)
x_data = np.random.rand(2, 4, 3) + 1j * np.random.rand(2, 4, 3)
x = nn.Variable.from_numpy_array(np.stack([np.real(x_data), np.imag(x_data)], axis=3))
y = F.fft(x, signal_ndim=2, normalized=True)
z = F.ifft(y, signal_ndim=2, normalized=True)
z.forward()

np.allclose(z.d[..., 0] + 1j*z.d[...,1], x_data)
Parameters
  • x (Variable) – Input.

  • signal_ndim (int) – The number of dimensions for each signal. It must be 1, 2, or 3.

  • normalized (bool) – Use unitary normalization. If True, the normalization constant \(\sqrt{\frac{1}{\prod_{i=1}^{d} N_i}}\) is multiplied. [default= False ]

Returns

FFT transformed signal.

Return type

Variable

Note

All nnabla functions in nnabla.functions are decorated with the nnabla.function_bases.function_api decorator, which queries the current context and passes it into the first argument of the original function. The original function always takes a context as the first argument.

nnabla.functions.ifft(x, signal_ndim, normalized=False, n_outputs=-1, outputs=None)[source]

Complex-to-complex inverse Discrete Fourier Transform,

\[X_{k_1, \ldots, k_d} = \frac{1}{\prod_{i=1}^{d} N_i} \sum_{n_1=0}^{N_1-1} \dots \sum_{n_d=0}^{N_d-1} x_{n_1, \ldots, n_d} \exp\left(2 \pi j \left( \sum_{i=1}^{d} \frac{k_i n_i}{N_i} \right) \right),\]

where

\[k_i = 0, \ldots, N_i - 1.\]

This function now supports 1-D, 2-D, and 3-D DFT with or without the leading batch dimension(s).

The input is expected to be complex-valued with at least signal_ndim + 1 dimensions. The last dimension has a shape of two where x[…, 0] is the real part and x[…, 1] the imaginary part.

Parameters
  • x (Variable) – Input.

  • signal_ndim (int) – The number of dimensions for each signal. It must be 1, 2, or 3.

  • normalized (bool) – Use unitary normalization. If True, the normalization constant \(\frac{1}{\prod_{i=1}^{d} N_i}\) becomes \(\sqrt{\frac{1}{\prod_{i=1}^{d} N_i}}\). [default= False ]

Returns

IFFT transformed signal.

Return type

Variable

Note

All nnabla functions in nnabla.functions are decorated with the nnabla.function_bases.function_api decorator, which queries the current context and passes it into the first argument of the original function. The original function always takes a context as the first argument.

nnabla.functions.stft(x, window_size, stride, fft_size, window_type='hanning', center=True, pad_mode='reflect')[source]

Computes the short-time Fourier transform.

Parameters
  • x (Variable) – Time domain sequence of size batch_size x sample_size.

  • window_size (int) – Size of STFT analysis window.

  • stride (int) – Number of samples that we shift the window, also called hop size.

  • fft_size (int) – Size of the FFT, the output will have fft_size // 2 + 1 frequency bins.

  • window_type (str) – Analysis window, can be either hanning, hamming or rectangular. For convenience, also window_type=None is supported which is equivalent to window_type='rectangular'.

  • center (bool) – If True, then the signal x is padded by half the FFT size using reflection padding.

  • pad_mode (str) – Padding mode, which can be 'constant' or 'reflect'. 'constant' pads with 0.

Returns

Returns real and imaginary parts of STFT result.

  • Variable: Real part of STFT of size batch_size x fft_size//2 + 1 x frame_size.

  • Variable: Imaginary part of STFT of size batch_size x fft_size//2 + 1 x frame_size.

nnabla.functions.istft(y_r, y_i, window_size, stride, fft_size, window_type='hanning', center=True)[source]

Computes the inverse short-time Fourier transform.

Note: We use a constant square inverse window for the reconstruction of the time-domain signal; therefore, the first and last window_size - stride samples are not perfectly reconstructed.

Parameters
  • y_r (Variable) – Real part of STFT of size batch_size x fft_size//2 + 1 x frame_size.

  • y_i (Variable) – Imaginary part of STFT of size batch_size x fft_size//2 + 1 x frame_size.

  • window_size (int) – Size of STFT analysis window.

  • stride (int) – Number of samples that we shift the window, also called hop size.

  • fft_size (int) – Size of the FFT, (STFT has fft_size // 2 + 1 frequency bins).

  • window_type (str) – Analysis window, can be either hanning, hamming or rectangular. For convenience, also window_type=None is supported which is equivalent to window_type='rectangular'.

  • center (bool) – If True, then it is assumed that the time-domain signal has centered frames.

Returns

Time domain sequence of size batch_size x sample_size.

Return type

Variable
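
A minimal round-trip sketch (not part of the original reference; the signal and window settings are arbitrary):

import numpy as np
import nnabla as nn
import nnabla.functions as F

x = nn.Variable.from_numpy_array(np.random.randn(1, 16000).astype(np.float32))
y_r, y_i = F.stft(x, window_size=512, stride=128, fft_size=512)
z = F.istft(y_r, y_i, window_size=512, stride=128, fft_size=512)
z.forward()
# apart from the first and last window_size - stride samples, z approximates x
print(z.shape)  # (1, 16000)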

Geometric Neural Network Layers
nnabla.functions.affine_grid(theta, size, align_corners=False, n_outputs=-1, outputs=None)[source]

Generate the source grid based on the normalized target grid with size. The target grid is first normalized in [-1, 1], then transformed by the affine transformation \(\theta\) to generate the source grid. 2D and 3D grids are currently supported.

This function is normally used with the warp_by_grid function for constructing the spatial transformer.

Parameters
  • theta (Variable) – N-D array with the shape (\(B \times 2 \times 3\)), the sample-wise affine transformation matrix.

  • size (repeated int64) – The grid size of (\(H \times W\)) for 2D and (\(D \times H \times W\)) for 3D.

  • align_corners (bool) – If True, the top-left and bottom-right pixels correspond to (-1, -1) and (1, 1) respectively since a pixel is located on the corner of a grid, and the target grid is normalized in [-1, 1]. If False, the normalized target grid in [-1, 1] is scaled by (size - 1) / size according to the respective spatial size (e.g., \(H\) and \(W\)) before the transformation since a pixel is located on the center of a cell in a grid. [default= False ]

Returns

N-D array with the shape (\(B \times H \times W \times 2\)) for 2D and (\(B \times D \times H \times W \times 3\)) for 3D. The last dimension is 2 for (x, y) in 2D and 3 for (x, y, z) in 3D. The grid is used as the source grid for the warping.

Return type

Variable

Note

All nnabla functions in nnabla.functions are decorated with the nnabla.function_bases.function_api decorator, which queries the current context and passes it into the first argument of the original function. The original function always takes a context as the first argument.

nnabla.functions.warp_by_grid(x, grid, mode='linear', padding_mode='zero', align_corners=False, channel_last=False, n_outputs=-1, outputs=None)[source]

Warp the input data by the grid. This function is normally used with the generated normalized grid by the affine_grid function for constructing the spatial transformer.

Parameters
  • x (Variable) – Input data to be warped with the shape (\(B \times C \times H_{in} \times W_{in}\)) for 2D and (\(B \times C \times D_{in} \times H_{in} \times W_{in}\)) for 3D.

  • grid (Variable) – Grid warping the input data with the shape (\(B \times H_{out} \times W_{out} \times 2\)) for 2D and (\(B \times D_{out} \times H_{out} \times W_{out} \times 3\)) for 3D. The last dimension of 2 is for (x, y) or 3 for (x, y, z).

  • mode (string) – Interpolation mode, linear or nearest. [default= 'linear' ]

  • padding_mode (string) – Padding mode when the grid value is outside [-1, 1]. If this is “zero”, 0 is used for padding. “reflect” uses the values reflected at the ends of the original input data like a mirror. “repeat” uses the values at the ends of the original input data. [default= 'zero' ]

  • align_corners (bool) – The target grid normalized in [-1, 1] is scaled by (size - 1) / size according to the respective spatial size (e.g., \(H\) and \(W\)) before the transformation if this is False. If this is True, the top-left and bottom-right pixels correspond to (-1, -1) and (1, 1) respectively. [default= False ]

  • channel_last (bool) – If True, the last dimension is considered as channel dimension, a.k.a NHWC order. [default= False ]

Returns

Output data warped by the grid.

Return type

Variable

Note

All nnabla functions in nnabla.functions are decorated with the nnabla.function_bases.function_api decorator, which queries the current context and passes it into the first argument of the original function. The original function always takes a context as the first argument.
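
A minimal identity-transform sketch (not part of the original reference) combining affine_grid and warp_by_grid; with the identity \(\theta\) the output reproduces the input:

import numpy as np
import nnabla as nn
import nnabla.functions as F

nn.set_auto_forward(True)
B, C, H, W = 1, 1, 4, 4
x = nn.Variable.from_numpy_array(np.arange(B * C * H * W).reshape(B, C, H, W).astype(np.float32))
# identity affine transformation matrix, shape (B, 2, 3)
theta = nn.Variable.from_numpy_array(np.array([[[1, 0, 0], [0, 1, 0]]], dtype=np.float32))
grid = F.affine_grid(theta, size=(H, W), align_corners=True)   # (B, H, W, 2)
y = F.warp_by_grid(x, grid, mode='linear', align_corners=True)
print(np.allclose(y.d, x.d))  # True for the identity transform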

nnabla.functions.warp_by_flow(data, flow, n_outputs=-1, outputs=None)[source]

Transform the image(s) data by flow field(s) of offset vectors such that each output pixel corresponds to the input image pixel at the relative offset location given by horizontal and vertical flow values (in other words, the flow field describes the coordinate displacements for each output pixel to the corresponding input pixel). Both data and flow are 4-D variables (in “NCHW” layout) with identical shape except the flow channel dimension (which is always 2).

\[output_{n,c,y,x} = data_{n,c,y',x'},\]

where

\[\begin{split}y' &=& y + flow_{n,1,y,x}, \\ x' &=& x + flow_{n,0,y,x}.\end{split}\]

The output pixel values at \(y'\) and \(x'\) locations are obtained by bilinear interpolating between the 4 closest pixels of the input image. Pixel values outside of the input image are implicitly padded with the value of the closest boundary pixel.

Parameters
  • data (Variable) – Input image data with shape (N, Channels, Height, Width).

  • flow (Variable) – Flow field vectors with shape (N, 2, Height, Width).

Returns

Transformed image data with shape (N, Channels, Height, Width).

Return type

Variable

Note

All nnabla functions in nnabla.functions are decorated with the nnabla.function_bases.function_api decorator, which queries the current context and passes it into the first argument of the original function. The original function always takes a context as the first argument.
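
A minimal sketch (not part of the original reference): a zero flow field copies the input unchanged:

import numpy as np
import nnabla as nn
import nnabla.functions as F

data = nn.Variable.from_numpy_array(np.arange(12).reshape(1, 1, 3, 4).astype(np.float32))
flow = nn.Variable.from_numpy_array(np.zeros((1, 2, 3, 4), dtype=np.float32))
out = F.warp_by_flow(data, flow)
out.forward()
print(np.allclose(out.d, data.d))  # True: zero offsets select the same pixels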

Quantized Neural Network Layers
nnabla.functions.binary_sigmoid(x, n_outputs=-1, outputs=None)[source]

Element-wise binary sigmoid function. In the forward pass, it computes

\[\begin{split}f(x) = \begin{cases} 1 & (x > 0) \\ 0 & ({\rm otherwise})\end{cases},\end{split}\]

but in the backward pass, a straight-through approximation of the gradient is used, i.e.,

\[\begin{split}\frac{\partial f(x)}{\partial x} = \begin{cases} 0 & (|x| \geq 1) \\ \frac{1}{2} & ({\rm otherwise}) \end{cases}.\end{split}\]

References

Parameters

x (Variable) – Input.

Returns

Output.

Return type

Variable

Note

All nnabla functions in nnabla.functions are decorated with the nnabla.function_bases.function_api decorator, which queries the current context and passes it into the first argument of the original function. The original function always takes a context as the first argument.

nnabla.functions.binary_tanh(x, n_outputs=-1, outputs=None)[source]

Element-wise binary tanh function. In the forward pass, it computes

\[\begin{split}f(x) = \begin{cases} 1 & (x > 0) \\ -1 & ({\rm otherwise}) \end{cases},\end{split}\]

but in the backward pass, a straight-through approximation of the gradient is used, i.e.,

\[\begin{split}\frac{\partial f(x)}{\partial x} = \begin{cases} 0 & (|x| \geq 1) \\ 1 & ({\rm otherwise}) \end{cases}.\end{split}\]

References

Parameters

x (Variable) – Input.

Returns

Output.

Return type

Variable

Note

All nnabla functions in nnabla.functions are decorated with the nnabla.function_bases.function_api decorator, which queries the current context and passes it into the first argument of the original function. The original function always takes a context as the first argument.
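
A minimal numeric check (not part of the original reference; the inputs are arbitrary):

import numpy as np
import nnabla as nn
import nnabla.functions as F

x = nn.Variable.from_numpy_array(np.array([-1.5, -0.2, 0.3, 2.0], dtype=np.float32))
y = F.binary_sigmoid(x)
z = F.binary_tanh(x)
y.forward()
z.forward()
print(y.d)  # [0. 0. 1. 1.]
print(z.d)  # [-1. -1.  1.  1.]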

nnabla.functions.binary_connect_affine(x, weight, binary_weight, bias=None, base_axis=1, quantize_zero_to=1.0, n_outputs=-1, outputs=None)[source]

This function provides a BinaryConnect affine layer. It computes in the forward pass

\[y_j = \sum_{i} sign(w_{j,i}) x_i,\]

i.e., the weights \(w_{j,i}\) are binarized to \(sign(w_{j,i})\) and, hence, each weight is in \(\{-1,\,1\}\). By this weight binarization, the inner product computations do not require any multiplications anymore as they turn into additions/subtractions.

This function should be used together with batch_normalization().

Note

1) If you would like to share the binary weights with other layers, please use the standard, floating value weights (weight) and not the binary weights (binary_weight).

2) The weights and the binary weights become in sync only after a call to forward(), and not after a call to backward(). If you wish to store the parameters of the network, remember to call forward() once before doing so; otherwise the weights and the binary weights will not be in sync.

3) CPU and GPU implementations now use floating values for binary_weight, since this function is for simulation purposes.

References

Parameters
  • x (Variable) – Input.

  • weight (Variable) – Weight. [parameter]

  • binary_weight (Variable) – Binarized weight. [parameter]

  • bias (Variable) – Bias. [optional][parameter]

  • base_axis (int) – Dimensions up to base_axis are treated as sample dimensions. [default= 1 ]

  • quantize_zero_to (float) – Input value at zero is quantized to this value. [default= 1.0 ]

Returns

Output.

Return type

Variable

Note

All nnabla functions in nnabla.functions are decorated with the nnabla.function_bases.function_api decorator, which queries the current context and passes it into the first argument of the original function. The original function always takes a context as the first argument.

nnabla.functions.binary_connect_convolution(x, weight, binary_weight, bias=None, base_axis=1, pad=None, stride=None, dilation=None, group=1, quantize_zero_to=1.0, n_outputs=-1, outputs=None)[source]

This function provides a BinaryConnect convolution layer. It computes in the forward pass

\[y_{n, a, b} = \sum_{m} \sum_{i} \sum_{j} sign(w_{n, m, i, j}) x_{m, a + i, b + j},\]

i.e., the weights \(w_{n, m, i, j}\) are binarized to \(sign(w_{n, m, i, j})\) and, hence, each weight is in \(\{-1,\,1\}\). By this weight binarization, the inner product computations do not require any multiplications anymore as they turn into additions/subtractions.

This function should be used together with batch_normalization().

Reference

Note

1) If you would like to share the binary weights with other layers, please use the standard, floating value weights (weight) and not the binary weights (binary_weight).

2) The weights and the binary weights become in sync only after a call to forward(), and not after a call to backward(). If you wish to store the parameters of the network, remember to call forward() once before doing so; otherwise the weights and the binary weights will not be in sync.

3) CPU and GPU implementations now use floating values for binary_weight, since this function is for simulation purposes.

Parameters
  • x (Variable) – Input.

  • weight (Variable) – Weight. [parameter]

  • binary_weight (Variable) – Binarized weight. [parameter]

  • bias (Variable) – Bias. [optional][parameter]

  • base_axis (int) – Dimensions up to base_axis are treated as sample dimensions. [default= 1 ]

  • pad (tuple of int) – Padding sizes for dimensions. [default= (0,) * (len(x.shape) - (base_axis+1)) ]

  • stride (tuple of int) – Stride sizes for dimensions. [default= (1,) * (len(x.shape) - (base_axis+1)) ]

  • dilation (tuple of int) – Dilation sizes for dimensions. [default= (1,) * (len(x.shape) - (base_axis+1)) ]

  • group (int) – Number of groups of channels. This makes the connection across channels sparser, by grouping connections along the mapping direction. [default= 1 ]

  • quantize_zero_to (float) – Input value at zero is quantized to this value. [default= 1.0 ]

Returns

Output

Return type

Variable

Note

All nnabla functions in nnabla.functions are decorated with the nnabla.function_bases.function_api decorator, which queries the current context and passes it into the first argument of the original function. The original function always takes a context as the first argument.

nnabla.functions.binary_weight_affine(x, weight, binary_weight, alpha, bias=None, base_axis=1, quantize_zero_to=1.0, n_outputs=-1, outputs=None)[source]

This function provides a Binary Weight Network affine layer. It computes in the forward pass

\[y_j = \frac{1}{\|\mathbf{w}_j\|_{\ell_1}} \sum_{i} sign(w_{j,i}) x_i\]

i.e., the weights \(w_{j,i}\) are binarized to \(sign(w_{j,i})\) and, hence, each weight is in \(\{-1,\,1\}\). By this weight binarization, the inner product computations turn into additions/subtractions which are followed by multiplication with the scaling factor \(\alpha_j = \frac{1}{\|\mathbf{w}_j\|_{\ell_1}}\).

Reference

Note

1) If you would like to share the binary weights with other layers, please use the standard, floating value weights (weight) and not the binary weights (binary_weight).

2) The weights and the binary weights become in sync only after a call to forward(), and not after a call to backward(). If you wish to store the parameters of the network, remember to call forward() once before doing so; otherwise the weights and the binary weights will not be in sync.

3) CPU and GPU implementations now use floating values for binary_weight, since this function is for simulation purposes.

Parameters
  • x (Variable) – Input.

  • weight (Variable) – Weight. [parameter]

  • binary_weight (Variable) – Binarized weight. [parameter]

  • alpha (Variable) – Alpha. [parameter]

  • bias (Variable) – Bias. [optional][parameter]

  • base_axis (int) – Dimensions up to base_axis are treated as sample dimensions. [default= 1 ]

  • quantize_zero_to (float) – Input value at zero is quantized to this value. [default= 1.0 ]

Returns

Output.

Return type

Variable

Note

All nnabla functions in nnabla.functions are decorated with the nnabla.function_bases.function_api decorator, which queries the current context and passes it into the first argument of the original function. The original function always takes a context as the first argument.

nnabla.functions.binary_weight_convolution(x, weight, binary_weight, alpha, bias=None, base_axis=1, pad=None, stride=None, dilation=None, group=1, quantize_zero_to=1.0, n_outputs=-1, outputs=None)[source]

This function provides a Binary Weight Network convolution layer. It computes in the forward pass

\[y_{n, a, b} = \frac{1}{\|\mathbf{w}_n\|_{\ell_1}} \sum_{m} \sum_{i} \sum_{j} sign(w_{n, m, i, j}) x_{m, a + i, b + j}.\]

i.e., the weights \(w_{n, m, i, j}\) are binarized to \(sign(w_{n, m, i, j})\) and, hence, each weight is in \(\{-1,\,1\}\). By this weight binarization, the inner product computations turn into additions/subtractions which are followed by multiplication with the scaling factor \(\alpha_n = \frac{1}{\|\mathbf{w}_n\|_{\ell_1}}\).

Reference

Note

1) If you would like to share the binary weights with other standard layers, please use the standard, floating value weights (weight) and not the binary weights (binary_weight).

2) The weights and the binary weights become in sync only after a call to forward(), and not after a call to backward(). If you wish to store the parameters of the network, remember to call forward() once before doing so; otherwise the weights and the binary weights will not be in sync.

3) CPU and GPU implementations now use floating values for binary_weight, since this function is for simulation purposes.

Parameters
  • x (Variable) – Input.

  • weight (Variable) – Weight. [parameter]

  • binary_weight (Variable) – Binarized weight. [parameter]

  • alpha (Variable) – Alpha. [parameter]

  • bias (Variable) – Bias. [optional][parameter]

  • base_axis (int) – Dimensions up to base_axis are treated as sample dimensions. [default= 1 ]

  • pad (tuple of int) – Padding sizes for dimensions. [default= (0,) * (len(x.shape) - (base_axis+1)) ]

  • stride (tuple of int) – Stride sizes for dimensions. [default= (1,) * (len(x.shape) - (base_axis+1)) ]

  • dilation (tuple of int) – Dilation sizes for dimensions. [default= (1,) * (len(x.shape) - (base_axis+1)) ]

  • group (int) – Number of groups of channels. This makes the connection across channels sparser, by grouping connections along the mapping direction. [default= 1 ]

  • quantize_zero_to (float) – Input value at zero is quantized to this value. [default= 1.0 ]

Returns

Output

Return type

Variable

Note

All nnabla functions in nnabla.functions are decorated with the nnabla.function_bases.function_api decorator, which queries the current context and passes it into the first argument of the original function. The original function always takes a context as the first argument.

nnabla.functions.fixed_point_quantize(x, sign=True, n=8, delta=0.0625, quantize=True, ste_fine_grained=True, outputs=None)[source]

Fixed Point Quantize.

This function simulates to uniformly quantize values in fixed-point number representation.

Parameters
  • x (Variable) – An input variable.

  • sign (bool) – Indicate the signed number or the unsigned number. Default is true.

  • n (int) – Bit width used. Note that sign consumes one bit. \(n-1\) is used for number representation in signed case.

  • delta (float) – Step size.

  • quantize (bool) – If true, quantize input, otherwise not.

  • ste_fine_grained (bool) – If true, the straight-through estimator (STE) is not the constant 1; see the backward pass description below.

Returns

N-D array.

Return type

Variable

See also

nnabla.function_bases.fixed_point_quantize.

In the forward pass,

\[\begin{split}\begin{equation} q_i= \left\{ \begin{array}{ll} max & if \ \ \ x_i > max \\ sign(x_i) \times floor(|x_i| \delta^{-1} + 2^{-1}) \times \delta & if \ \ min \le x_i \le max \\ min & if \ \ x_i < min \\ \end{array} \right., \end{equation}\end{split}\]

where \(\delta\) is the step size, \((min, max) :=(- (2^{n-1} - 1)\delta, (2^{n-1} - 1)\delta)\) if \(sign\) is true, \((min, max) := (0, (2^n - 1) \delta)\) otherwise, and \(n\) is the total bit-width used.

In the backward pass when using ste_fine_grained as false,

\[\begin{equation} \frac{\partial q_i}{\partial x_i} = 1. \end{equation}\]

In the backward pass when using ste_fine_grained as true,

\[\begin{split}\begin{equation} \frac{\partial q_i}{\partial x_i}= \left\{ \begin{array}{ll} 0 & if \ \ \ x_i > max \\ 1 & if \ \ min \le x_i \le max \\ 0 & if \ \ x_i < min \\ \end{array} \right.. \end{equation}\end{split}\]

Note

Quantized values are stored as floating point number, since this function is for simulation purposes.
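
A minimal numeric check (not part of the original reference; the inputs are arbitrary). With n=8 and delta=0.0625, values snap to multiples of 0.0625 and clip at ±7.9375:

import numpy as np
import nnabla as nn
import nnabla.functions as F

x = nn.Variable.from_numpy_array(np.array([-1.0, -0.03, 0.02, 0.5, 10.0], dtype=np.float32))
y = F.fixed_point_quantize(x, sign=True, n=8, delta=0.0625)
y.forward()
print(y.d)  # [-1. -0. 0. 0.5 7.9375]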

nnabla.functions.min_max_quantize(x, qr_min, qr_max, ql_min, ql_max, decay=0.999, x_min_max=False, ema=False, ste_fine_grained=True, eps=0.01, quantize=True, outputs=None)[source]

Min-max quantization.

This function simulates to uniformly quantize values in fixed-point number representation.

Min-max quantization is defined as the following equation

\[y = round \left(\frac{\min(\max(x, m), M) - m}{scale} \right) \times scale + m,\]

where the \(scale\) is defined as

\[scale = \frac{M - m}{M_q - m_q},\]

and

\[\begin{split}m_q = ql_{min}, \\ M_q = ql_{max}, \\ m = qr_{min}, \\ M = qr_{max}.\end{split}\]

In the backward pass when using ste_fine_grained as false,

\[\frac{\partial q_i}{\partial x_i} = 1.\]

In the backward pass when using ste_fine_grained as true,

\[\begin{split} \frac{\partial q_i}{\partial x_i}= \left\{ \begin{array}{ll} 0 & if \ \ \ x_i > M \\ 1 & if \ \ m \le x_i \le M \\ 0 & if \ \ x_i < m \\ \end{array} \right..\end{split}\]

\(qr_{min}\) and \(qr_{max}\) are treated as follows.

  • x_min_max is True and ema is True: Exponential moving average are computed for each \(min(x)\) and \(max(x)\) then stored in \(qr_{min}\) and \(qr_{max}\).

  • x_min_max is True and ema is False: \(min(x)\) and \(max(x)\) are computed then stored in \(qr_{min}\) and \(qr_{max}\).

  • x_min_max is False and ema is True: Exponential moving average stored in \(qr_{min}\) and \(qr_{max}\) are used.

  • x_min_max is False and ema is False: Gradients of \(qr_{min}\) and \(qr_{max}\) are computed in the backward pass.

More precisely, in inference of the min-max quantization, one has to consider the zero-point (zp), an integer value which corresponds to the real value 0. The zero-point is defined as

\[\begin{split} && zp_f = ql_{min} -\frac{qr_{min}}{scale}, \\ && zp = \left\{ \begin{array}{ll} ql_{max} & if \ \ \ zp_f >= ql_{max} \\ round(zp_f) & if \ \ otherwise \\ ql_{min} & if \ \ zp_f <= ql_{min} \\ \end{array} \right..\end{split}\]

Accordingly, in order to simulate quantization effect of zero-point, during both forward and backward pass, \(qr_{min}\) and \(qr_{max}\) are adjusted as follows,

\[\begin{split}qr_{min}^{adj} = ql_{min} - zp * scale, \\ qr_{max}^{adj} = ql_{max} - zp * scale.\end{split}\]

These operations are often called nudge.

Finally, in the formulas of the min-max quantization, \(m\) and \(M\) are replaced by \(qr_{min}^{adj}\) and \(qr_{max}^{adj}\) respectively.

Parameters
  • x (Variable) – Input N-D array.

  • qr_min (Variable) – Minimum quantization range (modified during forward execution).

  • qr_max (Variable) – Maximum quantization range (modified during forward execution).

  • ql_min (Variable) – Minimum quantization level, typically 0.

  • ql_max (Variable) – Maximum quantization level, typically 255.

  • decay (float) – The decay rate for the exponential moving average.

  • x_min_max (bool) – Use the min and max of x to compute quantization ranges. Default is False.

  • ema (bool) – Use the exponential moving average for the min and max quantization ranges. Default is False.

  • ste_fine_grained (bool) – If True, the STE is not the constant 1: the {0, 1}-mask computed from the min-max range is applied to the gradient in the backward pass; otherwise, the STE is 1.

  • eps (float) – Epsilon, a small value; \(qr_{max} - qr_{min}\) must be greater than this epsilon.

  • quantize (bool) – Apply quantization or not.

References

Benoit Jacob, Skirmantas Kligys, Bo Chen, Menglong Zhu, Matthew Tang, Andrew Howard, Hartwig Adam, and Dmitry Kalenichenko, “Quantization and Training of Neural Networks for Efficient Integer-Arithmetic-Only Inference”, https://arxiv.org/abs/1712.05877

nnabla.functions.pow2_quantize(x, sign=True, with_zero=True, n=8, m=1, quantize=True, ste_fine_grained=True, outputs=None)[source]

Pow2 Quantize.

This function simulates to uniformly quantize values in fixed-point number representation.

Parameters
  • x (Variable) – An input variable.

  • sign (bool) – Indicate the signed number or the unsigned number. Default is true.

  • with_zero (bool) – Indicate using zero as a quantized value. Default is true. Note that zero consumes one bit.

  • n (int) – Bit width used. Note that sign consumes one bit. \(n-1\) is used for number representation in signed case. Default is 8.

  • m (int) – \(2^m\) is the upper bound of the dynamic range and \(-2^m\) is the lower bound, \(m \in \mathcal{Z}\). Default is 1.

  • quantize (bool) – If true, quantize input, otherwise not.

  • ste_fine_grained (bool) – If true, STE is not 1; the gradient is masked as shown in the backward formulas below.

Returns

N-D array.

Return type

Variable

See also

nnabla.function_bases.pow2_quantize.

In the forward pass of signed case,

\[\begin{split}q_i= \left\{ \begin{array}{ll} max_{+} & if \ \ \overline{q_i} > max_{+} \\ \overline{q_i} & if \ \ min_{+} \le \overline{q_i} \le max_{+} \\ min_{+} & if \ \ 0 \le \overline{q_i} < min_{+} \\ min_{-} & if \ \ min_{-} < \overline{q_i} < 0 \\ \overline{q_i} & if \ \ max_{-} \le \overline{q_i} \le min_{-}\\ max_{-} & if \ \ \overline{q_i} < max_{-} \\ \end{array} \right.,\end{split}\]

where

\[\begin{split}&& max_{+} = 2^{m}, min_{+} = 2^{m - (2^{n-1} - 1)},\\ && max_{-} = -2^{m}, min_{-} = -2^{m - (2^{n-1} - 1)},\\ && \overline{q_i} = sign(x_i) \times 2^{round(\log_2 |x_i|)}.\end{split}\]

This quantization uses the geometric mean between two power-of-two numbers as quantization threshold.

In the forward pass of unsigned case,

\[\begin{split}q_i= \left\{ \begin{array}{ll} max & if \ \ \overline{q_i} > max \\ \overline{q_i} & if \ \ min \le \overline{q_i} \le max \\ min & if \ \ 0 < \overline{q_i} < min \\ \end{array} \right.,\end{split}\]

where

\[\begin{split}&& max = 2^{m}, min = 2^{m - (2^{n} - 1)},\\ && \overline{q_i} = 2^{int(\log_2 |x_i|)}.\end{split}\]

When with_zero is true, a pruning threshold is used to round an input to 0 or \(min\). The pruning threshold is defined in this function as follows,

\[pruning\ threshold = min \times 2^{-\frac{1}{2}}.\]

If the absolute value of an input is less than this value, the input is rounded to 0; otherwise, to \(min\).
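
A tiny NumPy check of this pruning rule, with made-up values:

import numpy as np

min_q = 2.0 ** -3                     # suppose min = 2^{-3}
threshold = min_q * 2.0 ** -0.5       # pruning threshold = min * 2^{-1/2}
x = np.array([0.05, 0.1, 0.2])
pruned = np.where(np.abs(x) < threshold, 0.0, min_q)
print(pruned)  # [0.    0.125 0.125] -- only 0.05 is below the threshold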

In the backward pass when using ste_fine_grained as false,

\[\frac{\partial q_i}{\partial x_i} = 1.\]

In the backward pass when using ste_fine_grained as true,

\[\begin{split}\frac{\partial q_i}{\partial x_i}= \left\{ \begin{array}{ll} 0 & if \ \ \overline{q_i} > max_{+} \\ 1 & if \ \ otherwise \\ 0 & if \ \ \overline{q_i} < max_{-} \\ \end{array} \right..\end{split}\]
nnabla.functions.prune(x, rate=0.9, n_outputs=-1, outputs=None)[source]

Prune the input according to the following equation,

\[\begin{split}q_i = \left \{ \begin{array}{ll} 0 & abs(x_i) < threshold \\ x_i & otherwise \end{array} \right.\end{split}\]

where \(threshold\) is determined by threshold = np.sort(np.abs(x))[int((x.size - 1) * rate)].
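
For illustration, a minimal NumPy sketch of this thresholding with made-up values:

import numpy as np

x = np.array([0.1, -0.4, 0.05, 0.8])
rate = 0.5
threshold = np.sort(np.abs(x))[int((x.size - 1) * rate)]  # -> 0.1
pruned = np.where(np.abs(x) < threshold, 0.0, x)
print(pruned)  # [ 0.1  -0.4   0.    0.8 ]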

Parameters
  • x (Variable) – N-D array

  • rate (float) – Sparse rate, or pruning rate. [default= 0.9 ]

Returns

N-D array with the same shape as x

Return type

Variable

Note

All nnabla functions in nnabla.functions are decorated with the nnabla.function_bases.function_api decorator, which queries the current context and passes it into the first argument of the original function. The original function always takes a context as the first argument.

nnabla.functions.inq_affine(x, weight, indicator_fixedweights, bias=None, base_axis=1, num_bits=4, inq_iterations=(), selection_algorithm='largest_abs', seed=-1, n_outputs=-1, outputs=None)[source]

This function provides an INQ affine layer. It computes in the forward pass

\[y_j = \sum_{i} w_{j,i} x_i,\]

where the weights \(w_{j,i}\) are quantized sequentially during training to power-of-two numbers. In the backward pass, only the non-fixed (i.e., learnable) weights are updated.

Parameters
  • x (Variable) – Input.

  • weight (Variable) – Weight. [parameter]

  • indicator_fixedweights (Variable) – Indicates which weights are already fixed (0 = not fixed, 1 = fixed). [parameter]

  • bias (Variable) – Bias. [optional][parameter]

  • base_axis (int) – Dimensions up to base_axis are treated as the sample dimensions. [default= 1 ]

  • num_bits (int) – Number of bits per weight. Needs to be >= 2 as two bits are used to code zero and sign of weight. [default= 4 ]

  • inq_iterations (repeated int64) – List which specifies after how many forward passes we fix 50% of the learnable weights. If we have done as many iterations as specified in the last element of inq_iterations, then all weights are fixed. [default= () ]

  • selection_algorithm (string) – Chooses algorithm that we use for selecting the weights to fix (“largest_abs” … fix weights with largest absolute value, “random” … fix weights randomly) [default= 'largest_abs' ]

  • seed (int) – Random seed. When -1, seed is sampled from global random number generator. [default= -1 ]

Returns

Output.

Return type

Variable
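
A minimal usage sketch; the parameter names, shapes, and iteration counts here are made up:

import numpy as np
import nnabla as nn
import nnabla.functions as F
from nnabla.parameter import get_parameter_or_create

x = nn.Variable((8, 16))
w = get_parameter_or_create(
    'w', (16, 10), initializer=np.random.randn(16, 10).astype(np.float32))
# All-zero indicator: no weight has been fixed yet.
ind = get_parameter_or_create(
    'ind', (16, 10), initializer=np.zeros((16, 10), dtype=np.float32),
    need_grad=False)
y = F.inq_affine(x, w, ind, num_bits=4, inq_iterations=(1000, 2000))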

Note

All nnabla functions in nnabla.functions are decorated with the nnabla.function_bases.function_api decorator, which queries the current context and passes it into the first argument of the original function. The original function always takes a context as the first argument.

nnabla.functions.inq_convolution(x, weight, indicator_fixedweights, bias=None, base_axis=1, pad=None, stride=None, dilation=None, group=1, num_bits=4, inq_iterations=(), selection_algorithm='largest_abs', seed=-1, n_outputs=-1, outputs=None)[source]

This function provides a INQ convolution layer. It computes in the forward pass

\[y_{n, a, b} = \sum_{m} \sum_{i} \sum_{j} w_{n, m, i, j} x_{m, a + i, b + j},\]

where the weights \(w_{n, m, i, j}\) are quantized sequentially during training to power-of-two numbers. In the backward pass, only the non-fixed (i.e., learnable) weights are updated.

Parameters
  • x (Variable) – Input.

  • weight (Variable) – Weight. [parameter]

  • indicator_fixedweights (Variable) – Indicates which weights are already fixed (0 = not fixed, 1 = fixed). [parameter]

  • bias (Variable) – Bias. [optional][parameter]

  • base_axis (int) – Dimensions up to base_axis are treated as the sample dimensions. [default= 1 ]

  • pad (tuple of int) – Padding sizes for dimensions. [default= (0,) * (len(x.shape) - (base_axis+1)) ]

  • stride (tuple of int) – Stride sizes for dimensions. [default= (1,) * (len(x.shape) - (base_axis+1)) ]

  • dilation (tuple of int) – Dilation sizes for dimensions. [default= (1,) * (len(x.shape) - (base_axis+1)) ]

  • group (int) – Number of groups of channels. This makes the connection across channels sparser, by grouping connections along the mapping direction. [default= 1 ]

  • num_bits (int) – Number of bits per weight. Needs to be >= 2 as two bits are used to code zero and sign of weight. [default= 4 ]

  • inq_iterations (repeated int64) – List which specifies after how many forward passes we fix 50% of the learnable weights. If we have done as many iterations as specified in the last element of inq_iterations, then all weights are fixed. [default= () ]

  • selection_algorithm (string) – Chooses algorithm that we use for selecting the weights to fix (“largest_abs” … fix weights with largest absolute value, “random” … fix weights randomly) [default= 'largest_abs' ]

  • seed (int) – Random seed. When -1, seed is sampled from global random number generator. [default= -1 ]

Returns

Output

Return type

Variable

Note

All nnabla functions in nnabla.functions are decorated with the nnabla.function_bases.function_api decorator, which queries the current context and passes it into the first argument of the original function. The original function always takes a context as the first argument.

Unsupported, Special Use
nnabla.functions.vat_noise(x, w, base_axis=1, eps=1.0, n_outputs=-1, outputs=None)[source]

Noise for virtual adversarial training.

This layer is a special layer for GUI network designing, specialized for getting the noise of virtual adversarial training.

In the backward process, the weight parameter will be replaced with the gradient.

Forward

\[y_i = \frac{\epsilon x_i}{\sqrt{\sum_k x_k^2 + c}}\]

Backward

\[\delta x_i = 0\]
\[w_i = \epsilon \delta y_i\]

Note

This layer is a special layer for GUI network designing.

Parameters
  • x (Variable) – N-D array of noise input. The noise is initially standard Gaussian noise, but from the next step on, the gradient variable is fed back.

  • w (Variable) – N-D array for keeping gradient values.

  • base_axis (int) – Dimensions up to base_axis are treated as the sample dimensions. [default= 1 ]

  • eps (float) – Noise norm (l2) factor. [default= 1.0 ]

Returns

N-D array

Return type

Variable

Note

All nnabla functions in nnabla.functions are decorated with the nnabla.function_bases.function_api decorator, which queries the current context and passes it into the first argument of the original function. The original function always takes a context as the first argument.

nnabla.functions.unlink(x, n_outputs=-1, outputs=None)[source]

This function behaves as an identity function on the forward pass, and deletes the gradient for the backward pass.

This layer is a special layer for GUI network designing, used for getting zero backward operation by adding this layer.

Forward

\[y_i = x_i\]

Backward

\[\delta x_i = 0\]

Note

This layer is a special layer for GUI network designing.

Parameters

x (Variable) – N-D array.

Returns

N-D array.

Return type

Variable

Note

All nnabla functions in nnabla.functions are decorated with the nnabla.function_bases.function_api decorator, which queries the current context and passes it into the first argument of the original function. The original function always takes a context as the first argument.

nnabla.functions.sink(*x, **kw)[source]

Creates a dummy variable used to call the forward or backward functions of multiple variables in one place.

This takes any number of input variables with any shape, and creates a single 0-shape output. The forward pass does nothing. The backward pass sets ones to the input grads if one_input_grad is set to true.

Note

sink can only be called at the very end of the graph, and the grads of the input variables are cleared when y.backward(clear_buffer=True) is called.

Parameters
  • *x (Variable) – Any number of inputs with any shape. [variadic]

  • one_input_grad (bool) – Set grads of inputs as one during backward. It is useful to set false if you want to set external gradients to the input variables. [default= True ]

Returns

Dummy variable.

Return type

Variable
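
A short sketch showing how a single sink drives forward and backward for two branches at once (the shapes are illustrative):

import nnabla as nn
import nnabla.functions as F

x0 = nn.Variable((2, 3), need_grad=True)
x1 = nn.Variable((2, 3), need_grad=True)
y = F.sink(F.sin(x0), F.cos(x1))  # single 0-shape dummy output
y.forward()
y.backward()  # sets ones to the grads of both branch outputs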

Note

All nnabla functions in nnabla.functions are decorated with the nnabla.function_bases.function_api decorator, which queries the current context and passes it into the first argument of the original function. The original function always takes a context as the first argument.

nnabla.functions.confusion_matrix(x, target, axis=None, n_outputs=-1, outputs=None)[source]

Confusion matrix. The return value is already summed over samples.

Parameters
  • x (Variable) – Probabilities N-D array. (\(D_1 \times ... \times D_i \times ... \times D_N\))

  • target (Variable) – Labels N-D array. (\(D_1 \times ... \times 1 \times ... \times D_N\))

  • axis (int) – Axis on which the confusion matrix is calculated. [default= len(x.shape) - 1 ]

Returns

Confusion matrix 2-D array. The column index is the estimated class and the row index is the label class.

Return type

Variable
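
A minimal sketch with made-up values: three samples, two classes.

import numpy as np
import nnabla as nn
import nnabla.functions as F

pred = nn.Variable.from_numpy_array(
    np.array([[0.8, 0.2], [0.3, 0.7], [0.6, 0.4]]))  # probabilities
label = nn.Variable.from_numpy_array(np.array([[0], [1], [1]]))
cm = F.confusion_matrix(pred, label)
cm.forward()
print(cm.d)  # rows: label class, cols: estimated class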

Note

All nnabla functions in nnabla.functions are decorated with the nnabla.function_bases.function_api decorator, which queries the current context and passes it into the first argument of the original function. The original function always takes a context as the first argument.

Image Object Detection
nnabla.functions.nms_detection2d(x, thresh=None, nms=None, nms_per_class=None, n_outputs=-1, outputs=None)[source]

Non-Maximum Suppression (NMS) applied to 2D object detector output. The input is a 3-dimensional tensor with shape (B, N, 5 + C), where B denotes the batch size, N denotes the number of detection box candidates, and C denotes the number of object detection classes. The 5 + C values consist of the box coordinates x, y, w, h in normalized coordinates (the size of each of x and y is 1.0), the objectness (learned to predict the IoU value to the ground-truth box), and the class probabilities of C classes. It outputs a tensor with the same dimensions as the input, where all values are copied from the input to the output, except that the class probabilities are multiplied by the objectness, and possibly suppressed to 0 by NMS. During NMS, all combinations of pairs of bounding boxes are compared. For each pair, the bounding box with the lower detection score (described below) is suppressed if the overlap ratio (the IoU) is greater than the value of nms.

There are two suppression modes for NMS.

1. Suppress by class probability (nms_per_class is True): For each bounding box, the detection score is calculated by objectness * probability[class_id] for each class. The suppression is done for each class independently.

2. Suppress by objectness (nms_per_class is False): The suppression is done for each bounding box using the objectness as the detection score. All class probabilities become 0 for every suppressed box.
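
For example, output from a detector with a batch of one, 100 candidate boxes, and 20 classes can be post-processed as follows (a sketch; the sizes are arbitrary):

import nnabla as nn
import nnabla.functions as F

# B=1, N=100 candidates, C=20 classes -> last dimension is 5 + C = 25
x = nn.Variable((1, 100, 25))
y = F.nms_detection2d(x, thresh=0.5, nms=0.45, nms_per_class=True)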

Parameters
  • x (Variable) – A 3-dimensional array.

  • thresh (float) – Detection score threshold. [default= 0.5 ]

  • nms (float) – IoU threshold for Non-maximum suppression (NMS). [default= 0.45 ]

  • nms_per_class (bool) – If true, NMS is applied for each class. [default= True ]

Returns

A 3-dimensional array with the same dimensions as the input.

Return type

Variable

Note

All nnabla functions in nnabla.functions are decorated with the nnabla.function_bases.function_api decorator, which queries the current context and passes it into the first argument of the original function. The original function always takes a context as the first argument.

Validation
nnabla.functions.top_n_error(x, target, axis=None, n=1, n_outputs=-1, outputs=None)[source]

Top N error along the dimension specified by the axis; the elements of the output are

\[\begin{split}y_i = \left \{ \begin{array}{l} 1 \ (x_i \ is \ not \ within \ N-th \ place) \\ 0 \ (x_i \ is \ within \ N-th \ place) \end{array} \right.\end{split}\]
Parameters
  • x (Variable) – Probabilities N-D array. \(D_1 \times ... \times D_i \times ... \times D_N\)

  • target (Variable) – N-D array of labels. \(D_1 \times ... \times 1 \times ... \times D_N\)

  • axis (int) – Axis on which the top N error is calculated. [default= len(x.shape) - 1 ]

  • n (int) – top N [default= 1 ]

Returns

Element-wise error N-D array. (\(D_1 \times ... \times 1 \times ... \times D_N\))

Return type

Variable
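
A minimal sketch with made-up values (two samples, three classes):

import numpy as np
import nnabla as nn
import nnabla.functions as F

pred = nn.Variable.from_numpy_array(
    np.array([[0.1, 0.7, 0.2], [0.5, 0.2, 0.3]]))
label = nn.Variable.from_numpy_array(np.array([[1], [2]]))
err = F.top_n_error(pred, label, n=1)
err.forward()
print(err.d)  # the second sample is not within the top-1 -> error 1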

Note

All nnabla functions in nnabla.functions are decorated with the nnabla.function_bases.function_api decorator, which queries the current context and passes it into the first argument of the original function. The original function always takes a context as the first argument.

nnabla.functions.binary_error(x, target, n_outputs=-1, outputs=None)[source]

Elementwise binary error.

\[\begin{split}y_i = \left \{ \begin{array}{l} 0 ((x^{(0)} \geq 0.5) = (x^{(1)} \geq 0.5)) \\ 1 ((x^{(0)} \geq 0.5) \neq (x^{(1)} \geq 0.5)) \end{array} \right.\end{split}\]
Parameters
  • x (Variable) – Probabilities N-D array. \(-\infty\) to \(\infty\).

  • target (Variable) – Labels N-D array. Usually set to 0 or 1, but probabilities (0 to 1) are also allowed as inputs.

Returns

Element-wise errors N-D array.

Return type

Variable

Note

All nnabla functions in nnabla.functions are decorated with the nnabla.function_bases.function_api decorator, which queries the current context and passes it into the first argument of the original function. The original function always takes a context as the first argument.

Parametric Functions

In NNabla, trainable models are created by composing functions that have optimizable parameters. These functions are called parametric functions. Parametric functions are provided by nnabla.parametric_functions.

See also:

Python API Tutorial.

Parameter Management API

The parameters registered by List of Parametric Functions can be managed using APIs listed in this section.

nnabla.parameter.parameter_scope(name, scope=None)[source]

Grouping parameters registered by parametric functions listed in nnabla.parametric_functions.

Parameters
  • name (str) – Parameter scope name.

  • scope (OrderedDict, optional) – Specify the current parameter scope as a local dictionary. The default value is None; in this case, the current parameter scope maintained globally is used.

Example:

import nnabla as nn
import nnabla.parametric_functions as PF
import nnabla.functions as F

x = nn.Variable((16, 3, 32, 32))  # an input, e.g. a batch of images

with nn.parameter_scope('conv1'):
    conv_out1 = PF.convolution(x, 32, (5, 5))
    bn_out1 = PF.batch_normalization(conv_out1)
    act_out1 = F.relu(bn_out1)
with nn.parameter_scope('conv2'):
    conv_out2 = PF.convolution(act_out1, 64, (3, 3))
    bn_out2 = PF.batch_normalization(conv_out2)
    act_out2 = F.relu(bn_out2)

Nesting the with statement blocks allows you to nest parameter scopes. This can also be done by using “/” inside the parameter names.

Example:

with nn.parameter_scope('network1'):
    with nn.parameter_scope('conv1'):
        conv_out1 = PF.convolution(x, 32, (5, 5))
        bn_out1 = PF.batch_normalization(conv_out1)
        act_out1 = F.relu(bn_out1)
    with nn.parameter_scope('conv2'):
        conv_out2 = PF.convolution(act_out1, 64, (3, 3))
        bn_out2 = PF.batch_normalization(conv_out2)
        act_out2 = F.relu(bn_out2)

is equivalent to

with nn.parameter_scope('network1/conv1'):
    conv_out1 = PF.convolution(x, 32, (5, 5))
    bn_out1 = PF.batch_normalization(conv_out1)
    act_out1 = F.relu(bn_out1)
with nn.parameter_scope('network1/conv2'):
    conv_out2 = PF.convolution(act_out1, 64, (3, 3))
    bn_out2 = PF.batch_normalization(conv_out2)
    act_out2 = F.relu(bn_out2)
nnabla.parameter.get_current_parameter_scope()[source]

Returns current parameter scope.

nnabla.parameter.get_parameters(params=None, path='', grad_only=True)[source]

Get parameter Variables under the current parameter scope.

Parameters
  • params (dict) – Internal use. User doesn’t set it manually.

  • path (str) – Internal use. User doesn’t set it manually.

  • grad_only (bool) – Retrieve all parameters under the current scope if False, while only parameters with need_grad=True are retrieved if True.

Returns

{str : Variable}

Return type

dict
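
Example (a sketch; the scope and layer are arbitrary):

import nnabla as nn
import nnabla.parametric_functions as PF

x = nn.Variable((4, 8))
with nn.parameter_scope('fc'):
    y = PF.affine(x, 10)
params = nn.get_parameters()
print(list(params.keys()))  # e.g. ['fc/affine/W', 'fc/affine/b']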

nnabla.parameter.clear_parameters()[source]

Clear all parameters in the current scope.

nnabla.parameter.save_parameters(path, params=None, extension=None)[source]

Save all parameters into a file with the specified format.

Currently hdf5 and protobuf formats are supported.

Parameters
  • path – path or file object

  • params (dict, optional) – Parameters to be saved. A dictionary mapping a parameter name (str) to a Variable.

nnabla.parameter.load_parameters(path, proto=None, needs_proto=False, extension='.nntxt')[source]

Load parameters from a file with the specified format.

Parameters

path – path or file object
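
A round-trip sketch (the file name is arbitrary; the .h5 extension selects the hdf5 format):

import nnabla as nn

nn.save_parameters('params.h5')   # save everything in the current scope
nn.clear_parameters()
nn.load_parameters('params.h5')   # restore the saved parameters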

nnabla.parameter.get_parameter_or_create(name, shape=None, initializer=None, need_grad=True, as_need_grad=None)[source]

Returns an existing parameter variable in the current parameter scope with the provided name.

If a variable with the provided name does not exist, a new variable is created and registered to the current parameter scope with the name, then returned.

Parameters
  • name (str) – The name under the current scope. If it already exists, the name is queried from the parameter manager.

  • shape (tuple of int) – Shape of the created parameter. The shape of the specified parameter must match this shape. The default is None, which is only valid if initializer is given as a numpy.ndarray.

  • initializer (nnabla.initializer.BaseInitializer or numpy.ndarray) – An initialization function to be applied to the parameter. numpy.ndarray can also be given to initialize parameters from numpy array data.

  • need_grad (bool) – Register the parameter with the specified need_grad flag. The default is True. If the flag is different from the previously specified one, the flag will be overwritten, but the values will be kept.

  • as_need_grad (bool) – Get a parameter variable with the specified need_grad flag. Note that this doesn’t overwrite the flag of the registered parameter variable with the provided name. Instead, if the given flag mismatches with the previously registered need_grad flag, it returns a new variable referring to the same array contents but with need_grad=as_need_grad.

Note

It returns a Variable which is unlinked from the registered one in the current parameter scope (using nnabla.Variable.get_unlinked_variable()). That means changing a need_grad attribute doesn’t affect the variable existing in the current parameter scope.
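
Example (a sketch with arbitrary names and shapes):

import nnabla as nn
from nnabla.initializer import ConstantInitializer

w = nn.parameter.get_parameter_or_create(
    'w', shape=(3, 3), initializer=ConstantInitializer(0.1))
# A second call with the same name returns the registered parameter.
w2 = nn.parameter.get_parameter_or_create('w', shape=(3, 3))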

List of Parametric Functions

Parametric functions are provided by nnabla.parametric_functions , as listed below. Like functions listed in Functions, they take Variable (s) as first argument(s) followed by options specific to a parametric function. In addition, they register parameter Variable (s) into the parameter scope.

The parameter variables are registered with need_grad properties specific to a parametric function. The variables with the need_grad=False flag will not be updated by gradient descent. Hence, backward computation is not executed for those variables. False is usually specified when the parameters are updated during the forward pass and/or backward pass, e.g., batch normalization.

All parametric functions take an optional argument fix_parameters=False. By giving True, the associated parameter variables are connected to the computation graph with the property need_grad=False regardless of the properties of the registered variables, so backward gradient computation is not executed for those variables. This is useful when you create a computation graph for evaluation purposes, fix parameters partially in a graph, and so on, as shown in the sketch below.
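
A minimal sketch of fixing parameters in part of a graph:

import nnabla as nn
import nnabla.parametric_functions as PF

x = nn.Variable((4, 8))
# The affine parameters join this graph with need_grad=False, so no
# gradients are computed for them.
y = PF.affine(x, 10, fix_parameters=True)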

All parametric functions listed below are decorated with the following decorator.

nnabla.parametric_functions.parametric_function_api(scope_name=None, param_desc=None)[source]

Decorator for parametric functions.

The decorated function is always called under a parameter scope scope_name. Also, the decorator adds an additional argument name (str, default is None) at the end. If name is specified, the scope scope_name is nested under the scope name. This feature can reduce vertical space usage of the source code. Any parametric function should be decorated by this.

Parameters
  • scope_name (str, optional) – The original function will be called under a parameter scope named by scope_name.

  • param_desc (list, optional) – Descriptions of parameters will be automatically included into docstring. This must be a list of tuples with 4 elements composed of (name (str), description (str), shape info (str), need_grad (bool)).

Returns

A decorated parametric function.

Return type

function

See Parameter Management API to know how to query and manipulate registered variables.

Here is the list of parametric functions.

nnabla.parametric_functions.affine(inp, n_outmaps, base_axis=1, w_init=None, b_init=None, fix_parameters=False, rng=None, with_bias=True, apply_w=None, apply_b=None, name=None)[source]

The affine layer, also known as the fully connected layer. Computes

\[{\mathbf y} = {\mathbf A} {\mathbf x} + {\mathbf b}.\]

where \({\mathbf x}, {\mathbf y}\) are the inputs and outputs respectively, and \({\mathbf A}, {\mathbf b}\) are the trainable weight matrix and bias vector.

Parameters
Returns

\((B + 1)\)-D array. (\(M_0 \times \ldots \times M_{B-1} \times L\))

Return type

Variable

Parameters to be registered

The following variables are registered in a parameter scope "affine";

  • W (need_grad=True) : Weight matrix. (shape: (inmaps, outmaps))

  • b (need_grad=True) : Bias vector. (shape: (outmaps,))

Note

If the name option is passed, the parameters become wrapped inside the parameter scope with the specified name, yielding the same results as the following code. This can be used to simplify the code.

with parameter_scope(name):
    output = affine(<args>)
nnabla.parametric_functions.convolution(inp, outmaps, kernel, pad=None, stride=None, dilation=None, group=1, channel_last=False, w_init=None, b_init=None, base_axis=1, fix_parameters=False, rng=None, with_bias=True, apply_w=None, apply_b=None, name=None)[source]

N-D Convolution with a bias term.

This function also covers Dilated Convolution (a.k.a. Atrous Convolution).

Note

Convolution is a computationally intensive operation that should preferably be run with the cudnn backend. NNabla then uses CuDNN library functions to determine and cache the fastest algorithm for the given set of convolution parameters, which results in additional memory consumption that may pose a problem for GPUs with insufficient memory size. In that case, the NNABLA_CUDNN_WORKSPACE_LIMIT environment variable can be used to restrict the choice of algorithms to those that fit the given workspace memory limit, expressed in bytes. In some cases it may also be desired to restrict the automatic search to algorithms that produce deterministic (reproducible) results. This can be requested by setting the environment variable NNABLA_CUDNN_DETERMINISTIC to a non-zero value.

Parameters
  • inp (Variable) – N-D array.

  • outmaps (int) – Number of convolution kernels (which is equal to the number of output channels). For example, to apply convolution on an input with 16 types of filters, specify 16.

  • kernel (tuple of int) – Convolution kernel size. For example, to apply convolution on an image with a 3 (height) by 5 (width) two-dimensional kernel, specify (3,5).

  • pad (tuple of int) – Padding sizes for dimensions.

  • stride (tuple of int) – Stride sizes for dimensions.

  • dilation (tuple of int) – Dilation sizes for dimensions.

  • group (int) – Number of groups of channels. This makes connections across channels more sparse by grouping connections along map direction.

  • channel_last (bool) – If True, the last dimension is considered as channel dimension, a.k.a. NHWC order.

  • w_init (nnabla.initializer.BaseInitializer or numpy.ndarray) – Initializer for weight. By default, it is initialized with nnabla.initializer.UniformInitializer within the range determined by nnabla.initializer.calc_uniform_lim_glorot.

  • b_init (nnabla.initializer.BaseInitializer or numpy.ndarray) – Initializer for bias. By default, it is initialized with zeros if with_bias is True.

  • base_axis (int) – Dimensions up to base_axis are treated as the sample dimensions.

  • fix_parameters (bool) – When set to True, the weights and biases will not be updated.

  • rng (numpy.random.RandomState) – Random generator for Initializer.

  • with_bias (bool) – Specify whether to include the bias term.

  • apply_w (function) – Lambda, function, or callable object applied to the weights.

  • apply_b (function) – Lambda, function, or callable object applied to the bias.

Returns

N-D array. See convolution for the output shape.

Return type

Variable

Parameters to be registered

The following variables are registered in a parameter scope "conv";

  • W (need_grad=True) : Filter weights. (shape: (outmaps, inmaps // group, *kernel))

  • b (need_grad=True) : Bias vector. (shape: (outmaps,))
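
A typical call (a sketch; the sizes are arbitrary):

import nnabla as nn
import nnabla.parametric_functions as PF

x = nn.Variable((8, 3, 32, 32))  # (batch, channels, height, width)
with nn.parameter_scope('conv1'):
    y = PF.convolution(x, 16, (3, 3), pad=(1, 1))  # 16 maps, 3x3 kernel
print(y.shape)  # (8, 16, 32, 32) -- spatial size kept by pad=(1, 1)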

Note

If the name option is passed, the parameters become wrapped inside the parameter scope with the specified name, yielding the same results as the following code. This can be used to simplify the code.

with parameter_scope(name):
    output = convolution(<args>)
nnabla.parametric_functions.depthwise_convolution(inp, kernel, pad=None, stride=None, dilation=None, multiplier=1, w_init=None, b_init=None, base_axis=1, fix_parameters=False, rng=None, with_bias=True, name=None)[source]

N-D Depthwise Convolution with a bias term.

Reference:

Parameters
Returns

N-D array. See depthwise_convolution for the output shape.

Return type

Variable

Parameters to be registered

The following variables are registered in a parameter scope "depthwise_conv";

  • W (need_grad=True) : Filter weights. (shape: (inmaps * multiplier, *kernel))

  • b (need_grad=True) : Bias vector. (shape: (inmaps * multiplier,))

Note

If the name option is passed, the parameters become wrapped inside the parameter scope with the specified name, yielding the same results as the following code. This can be used to simplify the code.

with parameter_scope(name):
    output = depthwise_convolution(<args>)
nnabla.parametric_functions.deconvolution(inp, outmaps, kernel, pad=None, stride=None, dilation=None, group=1, channel_last=False, output_padding=None, w_init=None, b_init=None, base_axis=1, fix_parameters=False, rng=None, with_bias=True, apply_w=None, apply_b=None, name=None)[source]

Deconvolution layer.

Parameters
  • inp (Variable) – N-D array.

  • outmaps (int) – Number of deconvolution kernels (which is equal to the number of output channels). For example, to apply deconvolution on an input with 16 types of filters, specify 16.

  • kernel (tuple of int) – Convolution kernel size. For example, to apply deconvolution on an image with a 3 (height) by 5 (width) two-dimensional kernel, specify (3,5).

  • pad (tuple of int) – Padding sizes for dimensions.

  • stride (tuple of int) – Stride sizes for dimensions.

  • dilation (tuple of int) – Dilation sizes for dimensions.

  • group (int) – Number of groups of channels. This makes connections across channels sparser by grouping connections along map direction.

  • w_init (nnabla.initializer.BaseInitializer or numpy.ndarray) – Initializer for weight. By default, it is initialized with nnabla.initializer.UniformInitializer within the range determined by nnabla.initializer.calc_uniform_lim_glorot.

  • b_init (nnabla.initializer.BaseInitializer or numpy.ndarray) – Initializer for bias. By default, it is initialized with zeros if with_bias is True.

  • base_axis (int) – Dimensions up to base_axis are treated as the sample dimensions.

  • fix_parameters (bool) – When set to True, the weights and biases will not be updated.

  • rng (numpy.random.RandomState) – Random generator for Initializer.

  • with_bias (bool) – Specify whether to include the bias term.

  • apply_w (function) – Lambda, function, or callable object applied to the weights.

  • apply_b (function) – Lambda, function, or callable object applied to the bias.

Returns

N-D array. See deconvolution for the output shape.

Return type

Variable

Parameters to be registered

The following variables are registered in a parameter scope "deconv";

  • W (need_grad=True) : Filter weights. (shape: (inmaps, outmaps // group, *kernel))

  • b (need_grad=True) : Bias vector. (shape: (outmaps,))

Note

If the name option is passed, the parameters become wrapped inside the parameter scope with the specified name, yielding the same results as the following code. This can be used to simplify the code.

with parameter_scope(name):
    output = deconvolution(<args>)
nnabla.parametric_functions.depthwise_deconvolution(inp, kernel, pad=None, stride=None, dilation=None, divisor=1, w_init=None, b_init=None, base_axis=1, fix_parameters=False, rng=None, with_bias=True, name=None)[source]

Depthwise deconvolution computes the transposed depthwise convolution for one-dimensional and two-dimensional input data.

Parameters
Returns

N-D array. See depthwise_deconvolution for the output shape.

Return type

Variable

Parameters to be registered

The following variables are registered in a parameter scope "depthwise_deconv";

  • W (need_grad=True) : Filter weights. (shape: (inmaps,) + kernel)

  • b (need_grad=True) : Bias vector. (shape: (inmaps / divisor,))

Note

If the name option is passed, the parameters become wrapped inside the parameter scope with the specified name, yielding the same results as the following code. This can be used to simplify the code.

with parameter_scope(name):
    output = depthwise_deconvolution(<args>)
nnabla.parametric_functions.deformable_convolution(inp, outmaps, kernel, offset, mask=None, pad=None, stride=None, dilation=None, group=1, deformable_group=1, channel_last=False, w_init=None, b_init=None, base_axis=1, fix_parameters=False, rng=None, with_bias=True, apply_w=None, apply_b=None, name=None)[source]

2D Deformable Convolution with a bias term. If a mask is given, this function implements Deformable Convolution v2.

Parameters
  • inp (Variable) – N-D array.

  • outmaps (int) – Number of convolution kernels (which is equal to the number of output channels). For example, to apply convolution on an input with 16 types of filters, specify 16.

  • kernel (tuple of int) – Convolution kernel size. For example, to apply convolution on an image with a 3 (height) by 5 (width) two-dimensional kernel, specify (3,5).

  • offset (Variable) – Offsets for deformable convolutions. Shape is fixed to \((N, deformable\_group \times 2 \times Kh \times Kw, H, W)\). Offsets must be calculated externally through a separate convolution layer.

  • mask (Variable) – Normalized mask for deformable convolutions v2. Shape is fixed to \((N, deformable\_group \times 1 \times Kh \times Kw, H, W)\). Masks must be calculated externally together with the offsets through a separate convolution layer.

  • pad (tuple of int) – Padding sizes for dimensions.

  • stride (tuple of int) – Stride sizes for dimensions.

  • dilation (tuple of int) – Dilation sizes for dimensions.

  • group (int) – Number of groups of channels. This makes connections across channels more sparse by grouping connections along map direction.

  • deformable_group (int) – Number of deformable groups of channels. This makes connections across channels more sparse by grouping connections along map direction.

  • channel_last (bool) – If True, the last dimension is considered as channel dimension, a.k.a. NHWC order.

  • w_init (nnabla.initializer.BaseInitializer or numpy.ndarray) – Initializer for weight. By default, it is initialized with nnabla.initializer.UniformInitializer within the range determined by nnabla.initializer.calc_uniform_lim_glorot.

  • b_init (nnabla.initializer.BaseInitializer or numpy.ndarray) – Initializer for bias. By default, it is initialized with zeros if with_bias is True.

  • base_axis (int) – Dimensions up to base_axis are treated as the sample dimensions.

  • fix_parameters (bool) – When set to True, the weights and biases will not be updated.

  • rng (numpy.random.RandomState) – Random generator for Initializer.

  • with_bias (bool) – Specify whether to include the bias term.

  • apply_w (function) – Lambda, function, or callable object applied to the weights.

  • apply_b (function) – Lambda, function, or callable object applied to the bias.

Returns

N-D array. See convolution for the output shape.

Return type

Variable

Parameters to be registered

The following variables are registered in a parameter scope "deformable_conv";

  • W (need_grad=True) : Filter weights. (shape: (outmaps, inmaps // group, *kernel))

  • b (need_grad=True) : Bias vector. (shape: (outmaps,))

Note

If the name option is passed, the parameters become wrapped inside the parameter scope with the specified name, yielding the same results as the following code. This can be used to simplify the code.

with parameter_scope(name):
    output = deformable_convolution(<args>)
nnabla.parametric_functions.batch_normalization(inp, axes=[1], decay_rate=0.9, eps=1e-05, batch_stat=True, output_stat=False, fix_parameters=False, param_init=None, no_scale=False, no_bias=False, name=None)[source]

Batch normalization layer.

\[\begin{split}\begin{array}{lcl} \mu &=& \frac{1}{M} \sum x_i\\ \sigma^2 &=& \frac{1}{M} \sum \left(x_i - \mu\right)^2\\ \hat{x}_i &=& \frac{x_i - \mu}{\sqrt{\sigma^2 + \epsilon }}\\ y_i &= & \hat{x}_i \gamma + \beta. \end{array}\end{split}\]

where \(x_i, y_i\) are the inputs. In testing, the mean and variance computed by moving average calculated during training are used.

Parameters
  • inp (Variable) – N-D array of input.

  • axes (tuple of int) – Mean and variance for each element in axes are calculated using elements on the rest axes. For example, if an input is 4 dimensions, and axes is [1], batch mean is calculated as np.mean(inp.d, axis=(0, 2, 3), keepdims=True) (using numpy expression as an example).

  • decay_rate (float) – Decay rate of running mean and variance.

  • eps (float) – Tiny value to avoid zero division by std.

  • batch_stat (bool) – Use mini-batch statistics rather than running ones.

  • output_stat (bool) – Output batch mean and variance.

  • fix_parameters (bool) – When set to True, the beta and gamma will not be updated.

  • param_init (dict) – Parameter initializers can be set with a dict. A key of the dict must be 'beta', 'gamma', 'mean' or 'var'. A value of the dict must be an Initializer or a numpy.ndarray. E.g. {'beta': ConstantInitializer(0), 'gamma': np.ones(gamma_shape) * 2}.

  • no_scale (bool) – If True, the scale term is omitted.

  • no_bias (bool) – If True, the bias term is omitted.

Returns

N-D array.

Return type

Variable
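
A common pattern is to build a training graph with batch_stat=True and an inference graph with batch_stat=False under the same parameter scope, so both share beta, gamma, and the running statistics (a sketch):

import nnabla as nn
import nnabla.parametric_functions as PF

x = nn.Variable((16, 64, 8, 8))
with nn.parameter_scope('bn1'):
    y_train = PF.batch_normalization(x, batch_stat=True)   # training graph
with nn.parameter_scope('bn1'):
    y_test = PF.batch_normalization(x, batch_stat=False)   # inference graph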

The parameters have the same number of dimensions as the input data; the dimensions listed in axes match those of the input, while the rest are 1. If an input is 4-dim and axes=[1], the parameter shape will be param_shape = np.mean(inp.d, axis=(0, 2, 3), keepdims=True).shape (using numpy expression as an example).

Parameters to be registered

The following variables are registered in a parameter scope "bn";

  • beta (need_grad=True) : Trainable bias \(\beta\). (shape: <see above>)

  • gamma (need_grad=True) : Trainable scaling factor \(\gamma\). (shape: <see above>)

  • mean (need_grad=False) : Moving average of batch mean. (shape: <see above>)

  • var (need_grad=False) : Moving average of batch variance. (shape: <see above>)

Note

If the name option is passed, the parameters become wrapped inside the parameter scope with the specified name, yielding the same results as the following code. This can be used to simplify the code.

with parameter_scope(name):
    output = batch_normalization(<args>)
nnabla.parametric_functions.fused_batch_normalization(inp, z=None, axes=[1], decay_rate=0.9, eps=1e-05, batch_stat=True, nonlinearity='relu', output_stat=False, fix_parameters=False, param_init=None, no_scale=False, no_bias=False, name=None)[source]

Batch normalization layer fused with the following add2 operation of a residual input and a nonlinear activation.

Parameters
  • inp (Variable) – N-D array of input.

  • z (Variable, optional) – A residual input. By specifying None, the activation function will follow immediately after BN operation.

  • axes (tuple of int) – Mean and variance for each element in axes are calculated using elements on the rest axes. For example, if an input is 4 dimensions, and axes is [1], batch mean is calculated as np.mean(inp.d, axis=(0, 2, 3), keepdims=True) (using numpy expression as an example).

  • decay_rate (float) – Decay rate of running mean and variance.

  • eps (float) – Tiny value to avoid zero division by std.

  • batch_stat (bool) – Use mini-batch statistics rather than running ones.

  • nonlinearity (string) – Activation function. The default is ‘relu’.

  • output_stat (bool) – Output batch mean and variance.

  • fix_parameters (bool) – When set to True, the beta and gamma will not be updated.

  • no_scale (bool) – If True, the scale term is omitted.

  • no_bias (bool) – If True, the bias term is omitted.

Returns

N-D array.

Return type

Variable

Parameters to be registered

The following variables are registered in a parameter scope "bn";

  • beta (need_grad=True) : Trainable bias \(\beta\). (shape: <see above>)

  • gamma (need_grad=True) : Trainable scaling factor \(\gamma\). (shape: <see above>)

  • mean (need_grad=False) : Moving average of batch mean. (shape: <see above>)

  • var (need_grad=False) : Moving average of batch variance. (shape: <see above>)

Note

If the name option is passed, the parameters become wrapped inside the parameter scope with the specified name, yielding the same results as the following code. This can be used to simplify the code.

with parameter_scope(name):
    output = fused_batch_normalization(<args>)
nnabla.parametric_functions.sync_batch_normalization(inp, comm, group='world', axes=[1], decay_rate=0.9, eps=1e-05, batch_stat=True, output_stat=False, fix_parameters=False, param_init=None, no_scale=False, no_bias=False, name=None)[source]

Synchronized batch normalization layer.

For some tasks (e.g., semantic segmentation), the batch size will be too small and the BatchNormalization layer might not work well. The SyncBatchNormalization layer solves this problem by synchronizing the batch statistics (mean and variance) between multiple processes.

\[\begin{split}\begin{array}{lcl} \mu &=& \frac{1}{M} \sum x_i\\ \sigma^2 &=& \frac{1}{M} \sum \left(x_i - \mu\right)^2\\ \hat{x}_i &=& \frac{x_i - \mu}{\sqrt{\sigma^2 + \epsilon }}\\ y_i &= & \hat{x}_i \gamma + \beta. \end{array}\end{split}\]

where \(x_i, y_i\) are the inputs.

Parameters
  • inp (Variable) – N-D array of input.

  • comm (Communicator) – The communicator

  • group (string) – The name of the communicator group

  • axes (tuple of int) – Mean and variance for each element in axes are calculated using elements on the rest axes. For example, if an input is 4 dimensions, and axes is [1], batch mean is calculated as np.mean(inp.d, axis=(0, 2, 3), keepdims=True) (using numpy expression as an example).

  • decay_rate (float) – Decay rate of running mean and variance.

  • eps (float) – Tiny value to avoid zero division by std.

  • batch_stat (bool) – Use mini-batch statistics rather than running ones.

  • output_stat (bool) – Output batch mean and variance.

  • fix_parameters (bool) – When set to True, the beta and gamma will not be updated.

  • param_init (dict) – Parameter initializers can be set with a dict. A key of the dict must be 'beta', 'gamma', 'mean' or 'var'. A value of the dict must be an Initializer or a numpy.ndarray. E.g. {'beta': ConstantInitializer(0), 'gamma': np.ones(gamma_shape) * 2}.

  • no_scale (bool) – If True, the scale term is omitted.

  • no_bias (bool) – If True, the bias term is omitted.

Returns

N-D array.

Return type

Variable

The parameters have the same number of dimensions as the input data; the dimensions listed in axes match those of the input, while the rest are 1. If an input is 4-dim and axes=[1], the parameter shape will be param_shape = np.mean(inp.d, axis=(0, 2, 3), keepdims=True).shape (using numpy expression as an example).

Parameters to be registered

The following variables are registered in a parameter scope "bn";

  • beta (need_grad=True) : Trainable bias \(\beta\). (shape: <see above>)

  • gamma (need_grad=True) : Trainable scaling factor \(\gamma\). (shape: <see above>)

  • mean (need_grad=False) : Moving average of batch mean. (shape: <see above>)

  • var (need_grad=False) : Moving average of batch variance. (shape: <see above>)

Note

If the name option is passed, the parameters become wrapped inside the parameter scope with the specified name, yielding the same results as the following code. This can be used to simplify the code.

with parameter_scope(name):
    output = sync_batch_normalization(<args>)
nnabla.parametric_functions.mean_subtraction(inp, base_axis=1, update_running_mean=True, fix_parameters=False, name=None)[source]

Mean subtraction layer.

It subtracts the mean of the elements of the input array so that the mean becomes \(0\). Preprocessing arrays with this function has the effect of improving accuracy in various tasks such as image classification.

At training time, this function is defined as

\[\begin{split}\begin{array}{lcl} \mu &=& \frac{1}{M} \sum x_i \\ y_i &=& x_i - \mu \end{array}\end{split}\]

At testing time, the mean values used are those that were computed during training by moving average.

Note

The backward performs an approximated differentiation that takes into account only the latest mini-batch.

Parameters
  • inp (Variable) – N-D array of input.

  • base_axis (int) – Base axis of the Mean Subtraction operation. Dimensions up to base_axis are treated as the sample dimensions.

  • update_running_mean (bool) – Update the running mean during forward execution.

  • fix_parameters (bool) – Dummy parameter. This argument does not affect anything.

Returns

N-D array.

Return type

Variable

Parameters to be registered

The following variables are registered in a parameter scope "mean_subtraction";

  • mean (need_grad=False) : Moving average. (shape: inp.shape[base_axis:])

  • t (need_grad=False) : Minibatch counter used in forward pass. (shape: (1,))

Note

If the name option is passed, the parameters become wrapped inside the parameter scope with the specified name, yielding the same results as the following code. This can be used to simplify the code.

with parameter_scope(name):
    output = mean_subtraction(<args>)
nnabla.parametric_functions.layer_normalization(inp, batch_axis=0, eps=1e-05, output_stat=False, fix_parameters=False, param_init=None, no_scale=False, no_bias=False, name=None)[source]

Applies Layer Normalization over an input variable, which is defined as:

\[\begin{split}\begin{eqnarray} \mu^l &=& \frac{1}{H} \sum_{i=1}^{H} x_i^l \\ \sigma^l &=& \sqrt{\frac{1}{H} \sum_{i=1}^{H} \left(x_i^l - \mu^l\right)^2} \\ y &=& \frac{x - \mu^l}{\sigma^l + \epsilon} \gamma + \beta \end{eqnarray}\end{split}\]

where \(x\) and \(y\) are input and output variable, \(\mu^l\) and \(\sigma^l\) are the mean and std of each layer along batch axis, and \(\gamma\) and \(\beta\) are trainable parameters.

Note

Unlike other normalizations, which apply a scalar scale and bias to each entire channel/plane, Layer Normalization applies per-element scale and bias.

Parameters
  • inp (Variable) – An input variable.

  • batch_axis (int or repeated int) – Axes along which the mean and variance are computed.

  • eps (float) – Tiny value to avoid zero division by std.

  • output_stat (bool) – If True, the calculated mean and variance are also returned.

  • fix_parameters (bool) – When set to True, the beta and gamma will not be updated.

  • param_init (dict) – Parameter initializers can be set with a dict. A key of the dict must be 'gamma', 'beta'. A value of the dict must be an Initializer or a numpy.ndarray. E.g. {'gamma': np.ones(...) * 2, 'beta': ConstantInitializer(0)}.

  • no_scale (bool) – If True, the scale term is omitted.

  • no_bias (bool) – If True, the bias term is omitted.

Returns

Normalized output variable. If output_stat=True, the mean and std are also returned as Variables.

Return type

Variable

Parameters to be registered

The following variables are registered in a parameter scope "layer_normalization";

  • beta (need_grad=True) : Trainable bias \(\beta\). (shape: <see above>)

  • gamma (need_grad=True) : Trainable scaling factor \(\gamma\). (shape: <see above>)

Note

If the name option is passed, the parameters become wrapped inside the parameter scope with the specified name, yielding the same results as the following code. This can be used to simplify the code.

with parameter_scope(name):
    output = layer_normalization(<args>)
nnabla.parametric_functions.instance_normalization(inp, channel_axis=1, batch_axis=0, eps=1e-05, output_stat=False, fix_parameters=False, param_init=None, no_scale=False, no_bias=False, name=None)[source]

Applies Instance Normalization over an input variable, which is defined as:

\[\begin{split}\begin{eqnarray} \mu^i &=& \frac{1}{H} \sum_{i=1}^{H} x_i^i \\ \sigma^i &=& \sqrt{\frac{1}{H} \sum_{i=1}^{H} \left(x_i^i - \mu^i\right)^2} \\ y &=& \frac{x - \mu^i}{\sigma^i + \epsilon} \gamma + \beta \end{eqnarray}\end{split}\]

where \(x\) and \(y\) are input and output variable, \(\mu^i\) and \(\sigma^i\) are the mean and std of each instance which is separately calculated for each batch and channel, and \(\gamma\) and \(\beta\) are adaptive gains and biases.

If the input shape is [B, C, H, W] (= channel_axis=1, batch_axis=0), the shapes of the calculated mean and std are [B, C, 1, 1].

Parameters
  • inp (Variable) – An input variable.

  • channel_axis (int or repeated int) – Channel axes.

  • batch_axis (int or repeated int) – Batch axes.

  • eps (float) – Tiny value to avoid zero division by std.

  • output_stat (bool) – If True, the calculated mean and std are also returned.

  • fix_parameters (bool) – If True, the beta and gamma will not be updated.

  • param_init (dict) – Parameter initializers can be set with a dict. A key of the dict must be 'gamma', 'beta'. A value of the dict must be an Initializer or a numpy.ndarray. E.g. {'gamma': np.ones(...) * 2, 'beta': ConstantInitializer(0)}.

  • no_scale (bool) – If True, the scale term is omitted.

  • no_bias (bool) – If True, the bias term is omitted.

Returns

Normalized output variable. If output_stat=True, the mean and std are also returned as Variables.

Parameters to be registered

The following variables are registered in a parameter scope "instance_normalization";

  • beta (need_grad=True) : Trainable bias \(\beta\). (shape: <see above>)

  • gamma (need_grad=True) : Trainable scaling factor \(\gamma\). (shape: <see above>)

Note

If the name option is passed, the parameters become wrapped inside the parameter scope with the specified name, yielding the same results as the following code. This can be used to simplify the code.

with parameter_scope(name):
    output = instance_normalization(<args>)
nnabla.parametric_functions.group_normalization(inp, num_groups, channel_axis=1, batch_axis=0, eps=1e-05, output_stat=False, fix_parameters=False, param_init=None, no_scale=False, no_bias=False, name=None)[source]

Applies Group Normalization over an input tensor, which is defined as:

\[\begin{split}\begin{eqnarray} \mu^g &=& \frac{1}{H} \sum_{i=1}^{H} x_i^g \\ \sigma^g &=& \sqrt{\frac{1}{H} \sum_{i=1}^{H} \left(x_i^g - \mu^g\right)^2} \\ y &=& \frac{x - \mu^g}{\sigma^g + \epsilon} \gamma + \beta \end{eqnarray}\end{split}\]

where \(x\) and \(y\) are input and output variable, \(\mu^g\) and \(\sigma^g\) are the mean and std of each group which contains num_channels / num_groups channels, and \(\gamma\) and \(\beta\) are adaptive gains and biases.

The input channels, specified by channel_axis, are separated into num_groups groups, and the mean and std are calculated over each group. For example, if the input shape is [B, C, H, W] (= channel_axis=1, batch_axis=0), the input variable is once reshaped to [B, num_groups, C / num_groups, H, W] and standardized by its mean and std, whose shapes are [B, num_groups, C / num_groups, 1, 1]. Before returning, the output variable is reshaped again to the original input shape (= [B, C, H, W] in the case above).

Parameters
  • inp (Variable) – An input variable.

  • num_groups (int) – The number of groups. The channel dimension of x must be an integer multiple of num_groups.

  • channel_axis (int) – Channel axis.

  • batch_axis (int or repeated int) – Axes along which the mean and variance are computed.

  • eps (float) – Tiny value to avoid zero division by std.

  • output_stat (bool) – If True, the calculated mean and variance are also returned.

  • fix_parameters (bool) – When set to True, the beta and gamma will not be updated.

  • param_init (dict) – Parameter initializers can be set with a dict. A key of the dict must be 'gamma', 'beta'. A value of the dict must be an Initializer or a numpy.ndarray. E.g. {'gamma': np.ones(...) * 2, 'beta': ConstantInitializer(0)}.

  • no_scale (bool) – If True, the scale term is omitted.

  • no_bias (bool) – If True, the bias term is omitted.

Returns

Normalized output variable. If output_stat=True, the mean and std are also returned as Variables.

Return type

Variable

Parameters to be registered

The following variables are registered in a parameter scope "group_normalization";

  • beta (need_grad=True) : Trainable bias \(\beta\). (shape: <see above>)

  • gamma (need_grad=True) : Trainable scaling factor \(\gamma\). (shape: <see above>)

Note

If the name option is passed, the parameters become wrapped inside the parameter scope with the specified name, yielding the same results as the following code. This can be used to simplify the code.

with parameter_scope(name):
    output = group_normalization(<args>)
nnabla.parametric_functions.rnn(x, h, w0_init=None, w_init=None, b_init=None, num_layers=1, nonlinearity='tanh', dropout=0.0, bidirectional=False, training=True, rng=None, with_bias=True, fix_parameters=False, name=None)[source]

N-Step RNN (recurrent neural networks).

The N-Step RNN function implements an Elman RNN with nonlinearity applied to the input sequence. The N-Step RNN function is defined as follows:

\[h_t = \tanh(w_{ih}x_t+b_{ih}+w_{hh}h_{(t-1)}).\]

We use the following notations to describe the inputs and outputs below. \(T\): sequence length, \(B\): batch size, \(I\): input size, \(L\): number of layers, \(D\): number of directions, can be either 1 or 2, \(H\): hidden size.

References

Jeffrey L. Elman. “Finding Structure in Time.” Cognitive Science. 1990.

Parameters
  • x (Variable) – Input N-D array with shape \((T, B, I)\).

  • h (Variable) – Input N-D array with shape \((L, D, B, H)\).

  • w0_init (nnabla.initializer.BaseInitializer or numpy.ndarray, optional) – Initializer for weight at the first layer. Shape is \((D, H, I + H)\).

  • w_init (nnabla.initializer.BaseInitializer or numpy.ndarray, optional) – Initializer for weights at the second layer and up. Shape is \((L-1, D, H, D*H + H)\).

  • b_init (nnabla.initializer.BaseInitializer or numpy.ndarray, optional) – Initializer for bias. Shape is \((L, D, H)\).

  • num_layers (int, optional) – Number of layers in the network. If set to 1, only the weights for the first layer will be invoked. Default is 1.

  • nonlinearity (str, optional) – Type of nonlinearity applied to the input sequence. Must be either tanh or relu. Default is tanh.

  • dropout (float, optional) – Dropout ratio applied to parameters. Default is 0.0.

  • bidirectional (bool, optional) – If True, bidirectional computation will be performed in each layer. Default is False.

  • training (bool, optional) – Backpropagation will be performed only when it is true. Default is True.

  • with_bias (bool, optional) – Specify whether to include the bias term.

Returns

Output \(y\) with shape \((T, B, D * H)\) and output \(h_n\) with shape \((L, D, B, H)\), both of type Variable.

Return type

Variable

Example

import nnabla as nn
import nnabla.parametric_functions as PF
seq_len, batch_size, input_size, num_layers, num_directions, hidden_size = 10, 4, 8, 1, 1, 16
x = nn.Variable((seq_len, batch_size, input_size))
h = nn.Variable((num_layers, num_directions, batch_size, hidden_size))
y, hn = PF.rnn(x, h)
Parameters to be registered

The following variables are registered in a parameter scope "rnn";

  • weight_l0 (need_grad=True) : Filter weights at 0-th layer. (shape: (D, H, I + H))

  • weight (need_grad=True) : Filter weights at 1-st layer and above. (shape: (L-1, D, H, D*H + H))

  • bias (need_grad=True) : Biases. (shape: (L, D, H))

Note

If the name option is passed, the parameters become wrapped inside the parameter scope with the specified name, yielding the same results as the following code. This can be used to simplify the code.

with parameter_scope(name):
    output = rnn(<args>)
nnabla.parametric_functions.lstm(x, h, c, w0_init=None, w_init=None, b_init=None, num_layers=1, dropout=0.0, bidirectional=False, training=True, rng=None, with_bias=True, fix_parameters=False, name=None)[source]

LSTM (long short-term memory).

Long Short-Term Memory, or LSTM, is a building block for recurrent neural network (RNN) layers. An LSTM unit consists of a cell and input, output, and forget gates whose functions are defined as follows:

\[\begin{split}f_t&&=\sigma(W_fx_t+U_fh_{t-1}+b_f) \\ i_t&&=\sigma(W_ix_t+U_ih_{t-1}+b_i) \\ o_t&&=\sigma(W_ox_t+U_oh_{t-1}+b_o) \\ c_t&&=f_t\odot c_{t-1}+i_t\odot\tanh(W_cx_t+U_ch_{t-1}+b_c) \\ h_t&&=o_t\odot\tanh(c_t).\end{split}\]

We use the following notations to describe the inputs and outputs below. \(T\): sequence length, \(B\): batch size, \(I\): input size, \(L\): number of layers, \(D\): number of directions, can be either 1 or 2, \(H\): hidden size.

References

S. Hochreiter, and J. Schmidhuber. “Long Short-Term Memory.” Neural Computation. 1997.

Parameters
  • x (Variable) – Input N-D array with shape \((T, B, I)\).

  • h (Variable) – Input N-D array with shape \((L, D, B, H)\).

  • c (Variable) – Input N-D array with shape \((L, D, B, H)\) .

  • w0_init (nnabla.initializer.BaseInitializer or numpy.ndarray, optional) – Initializer for weight at the first layer. Shape is \((D, 4, H, I + H)\).

  • w_init (nnabla.initializer.BaseInitializer or numpy.ndarray, optional) – Initializer for weights at the second layer and up. Shape is \((L-1, D, 4, H, D * H + H)\).

  • b_init (nnabla.initializer.BaseInitializer or numpy.ndarray, optional) – Initializer for bias. Shape is \((L, D, 4, H)\).

  • num_layers (int, optional) – Number of layers in the network. If set to 1, only the weights for the first layer will be invoked. Default is 1.

  • dropout (float, optional) – Dropout ratio applied to parameters. Default is 0.0.

  • bidirectional (bool, optional) – If True, bidirectional computation will be performed in each layer. Default is False.

  • training (bool, optional) – Backpropagation will be performed only when it is true. Default is True.

  • with_bias (bool, optional) – Specify whether to include the bias term.

  • fix_parameters (bool) – When set to True, the weights and biases will not be updated.

Returns

Output \(y\) with shape \((T, B, D * H)\), output \(h_n\) with shape \((L, D, B, H)\), and output \(c_n\) with shape \((L, D, B, H)\).

Return type

Variable

Example

import nnabla as nn
import nnabla.parametric_functions as PF
seq_len, batch_size, input_size, hidden_size, num_layers, num_directions = 16, 8, 32, 64, 1, 1  # illustrative sizes
x = nn.Variable((seq_len, batch_size, input_size))
h = nn.Variable((num_layers, num_directions, batch_size, hidden_size))
c = nn.Variable((num_layers, num_directions, batch_size, hidden_size))
y, hn, cn = PF.lstm(x, h, c)
Parameters to be registered

The following variables are registered in a parameter scope "lstm";

  • weight_l0 (need_grad=True) : Filter weights at 0-th layer. (shape: (D, 4, H, I + H))

  • weight (need_grad=True) : Filter weights at 1-st layer and above. (shape: (L-1, D, 4, H, D * H + H))

  • bias (need_grad=True) : Biases. (shape: (L, D, 4, H))

Note

If the name option is passed, the parameters become wrapped inside the parameter scope with the specified name, yielding the same results as the following code. This can be used to simplify the code.

with parameter_scope(name):
    output = lstm(<args>)
nnabla.parametric_functions.gru(x, h, w0_init=None, w_init=None, b_init=None, num_layers=1, dropout=0.0, bidirectional=False, training=True, rng=None, with_bias=True, fix_parameters=False, name=None)[source]

GRU (gated recurrent units).

GRU is defined as follows:

\[\begin{split}r_t&&=\sigma(W_rx_t+U_rh_{t-1}+b_r) \\ z_t&&=\sigma(W_zx_t+U_zh_{t-1}+b_z) \\ n_t&&=\tanh(W_nx_t+b_{in}+r_n \odot (U_nh_{t-1}+b_{hn})) \\ h_t&&=(1-z_t) \odot n_t+z_t \odot h_{t-1}.\end{split}\]

We use the following notations to describe the inputs and outputs below. \(T\): sequence length, \(B\): batch size, \(I\): input size, \(L\): number of layers, \(D\): number of directions, can be either 1 or 2, \(H\): hidden size.

References

K. Cho et al. “Learning Phrase Representations using RNN Encoder–Decoder for Statistical Machine Translation.” Empirical Methods in Natural Language Processing. 2014.

Parameters
  • x (Variable) – Input N-D array with shape \((T, B, I)\).

  • h (Variable) – Input N-D array with shape \((L, D, B, H)\).

  • w0_init (nnabla.initializer.BaseInitializer or numpy.ndarray, optional) – Initializer for weight at the first layer. Shape is \((D, 3, H, I + H)\).

  • w_init (nnabla.initializer.BaseInitializer or numpy.ndarray, optional) – Initializer for weights at the second layer and up. Shape is \((L-1, D, 3, H, D * H + H)\).

  • b_init (nnabla.initializer.BaseInitializer or numpy.ndarray, optional) – Initializer for bias. Shape is \((L, D, 4, H)\).

  • num_layers (int, optional) – Number of layers in the network. If set to 1, only the weights of the first layer are used. Default is 1.

  • dropout (float, optional) – Dropout ratio applied to parameters. Default is 0.0.

  • bidirectional (bool, optional) – If True, bidirectional computation will be performed in each layer. Default is False.

  • training (bool, optional) – Backpropagation is performed only when this is True. Default is True.

  • with_bias (bool, optional) – Specify whether to include the bias term.

Returns

Output \(y\) with shape \((T, B, D * H)\) and output \(h_n\) with shape \((L, D, B, H)\).

Return type

Variable

Example

import nnabla as nn
import nnabla.parametric_functions as PF
seq_len, batch_size, input_size, hidden_size, num_layers, num_directions = 16, 8, 32, 64, 1, 1  # illustrative sizes
x = nn.Variable((seq_len, batch_size, input_size))
h = nn.Variable((num_layers, num_directions, batch_size, hidden_size))
y, hn = PF.gru(x, h)
Parameters to be registered

The following variables are registered in a parameter scope "gru";

  • weight_l0 (need_grad=True) : Filter weights at 0-th layer. (shape: (D, 3, H, I + H))

  • weight (need_grad=True) : Filter weights at 1-st layer and above. (shape: (L-1, D, 3, H, D * H + H))

  • bias (need_grad=True) : Biases. (shape: (L, D, 4, H))

Note

If the name option is passed, the parameters become wrapped inside the parameter scope with the specified name, yielding the same results as the following code. This can be used to simplify the code.

with parameter_scope(name):
    output = gru(<args>)
nnabla.parametric_functions.embed(inp, n_inputs, n_features, initializer=None, fix_parameters=False, apply_w=None, name=None)[source]

Embed.

Embed slices a matrix/tensor with an indexing array/tensor. Weights are initialized with nnabla.initializer.UniformInitializer within the range \([-\sqrt{3}, \sqrt{3}]\).

Parameters
  • inp (Variable) – [Integer] Indices with shape \((I_0, ..., I_N)\)

  • n_inputs – Number of possible inputs, e.g., the size of the vocabulary when embedding words.

  • n_features – number of embedding features

  • fix_parameters (bool) – When set to True, the embedding weight matrix will not be updated.

  • apply_w (function) – Lambda, function, or callable object applied to the weights.

Returns

Output with shape \((I_0, ..., I_N, W_1, ..., W_M)\)

Return type

Variable
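
Example

A minimal usage sketch (the vocabulary size 100 and feature size 16 are illustrative assumptions):

import numpy as np
import nnabla as nn
import nnabla.parametric_functions as PF

idx = nn.Variable.from_numpy_array(np.array([[0, 1, 2], [3, 4, 5]]))  # integer indices
y = PF.embed(idx, n_inputs=100, n_features=16)  # y.shape == (2, 3, 16)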

Parameters to be registered

The following variables are registered in a parameter scope "embed";

  • W (need_grad=True) : Embedding matrix. (shape: (n_inputs, n_features))

Note

If the name option is passed, the parameters become wrapped inside the parameter scope with the specified name, yielding the same results as the following code. This can be used to simplify the code.

with parameter_scope(name):
    output = embed(<args>)
nnabla.parametric_functions.prelu(inp, base_axis=1, shared=True, fix_parameters=False, slope_init=None, name=None)[source]

Parametrized Rectified Linear Unit function defined as

\[y_i = \max(0, x_i) + w_i \min(0, x_i)\]

where the negative slope \(w\) is learned and can vary across channels (an axis specified with base_axis). By default, the slopes are initialized with 0.25 (see slope_init).

Parameters
  • inp (Variable) – N-D array as input

  • base_axis (int) – Dimensions up to base_axis are treated as the sample dimensions.

  • shared (bool) – Whether a single slope value is shared across all channels.

  • fix_parameters (bool) – When set to True, the negative slope values will not be updated.

  • slope_init (nnabla.initializer.BaseInitializer or numpy.ndarray) – Initializer of negative slopes. By default, they are initialized with 0.25.

Returns

N-D array.

Return type

Variable
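
Example

A minimal usage sketch (the input shape is an illustrative assumption):

import nnabla as nn
import nnabla.parametric_functions as PF

x = nn.Variable((8, 16, 32, 32))            # (batch, channels, H, W)
y = PF.prelu(x, base_axis=1, shared=False)  # one learnable slope per channel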

Parameters to be registered

The following variables are registered in a parameter scope "prelu";

  • slope (need_grad=True) : Negative slope. (shape: tuple() if shared else (inp.shape[base_axis],))

Note

If the name option is passed, the parameters become wrapped inside the parameter scope with the specified name, yielding the same results as the following code. This can be used to simplify the code.

with parameter_scope(name):
    output = prelu(<args>)
nnabla.parametric_functions.svd_affine(inp, n_outmaps, r, base_axis=1, uv_init=None, b_init=None, fix_parameters=False, rng=None, with_bias=True, name=None)[source]

SVD affine is a low rank approximation of the affine layer. It can be seen as two consecutive affine layers with a bottleneck. It computes:

\[{\mathbf y} = {\mathbf U} {\mathbf V} {\mathbf x} + {\mathbf b}.\]

where \({\mathbf x}, {\mathbf y}\) are the inputs and outputs respectively, and \({\mathbf U}, {\mathbf V}, {\mathbf b}\) are the learned weights and bias.

The weights \({\mathbf U}\) and \({\mathbf V}\) are approximated with singular value decomposition (SVD) of the original weight matrix \({\mathbf W}\) and by selecting the \({R}\) dominant singular values and the corresponding singular vectors. Therefore the low rank \({R}\) is the size of the bottleneck.

If uv_init is a numpy array, \({\mathbf U}\) and \({\mathbf V}\) are computed such that uv_init is approximated by \({\mathbf{UV}}\). If uv_init is None or an initializer, the product of \({\mathbf U}\) and \({\mathbf V}\) approximates the random initialization.

If \({\mathbf U}\) and \({\mathbf V}\) exist in the context, they take precedence over uv_init.

Suppose the weight matrix of the affine layer is of size \({I \times O}\) and the compression rate you want to specify is \({CR}\); then you set \({R}\) as

\[R = \left\lfloor \frac{(1 - CR)OI}{O + I} \right\rfloor.\]
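
As an illustrative check of this formula (the sizes are assumptions): with \(I = 512\), \(O = 256\) and \(CR = 0.75\), \(R = \left\lfloor 0.25 \cdot 256 \cdot 512 / (256 + 512) \right\rfloor = 42\).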
Parameters
Returns

\((B + 1)\)-D array. (\(M_0 \times \ldots \times M_{B-1} \times L\))

Return type

Variable
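
Example

A minimal usage sketch (shapes and rank are illustrative assumptions):

import nnabla as nn
import nnabla.parametric_functions as PF

x = nn.Variable((8, 512))
y = PF.svd_affine(x, n_outmaps=256, r=42)  # bottleneck of rank 42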

Parameters to be registered

The following variables are registered in a parameter scope "svd_affine";

  • U (need_grad=True) : \({\mathbf U}\). (shape: (inmaps, r))

  • V (need_grad=True) : \({\mathbf V}\). (shape: (r, outmaps))

  • b (need_grad=True) : Bias vector. (shape: (outmaps,))

Note

If the name option is passed, the parameters become wrapped inside the parameter scope with the specified name, yielding the same results as the following code. This can be used to simplify the code.

with parameter_scope(name):
    output = svd_affine(<args>)
nnabla.parametric_functions.svd_convolution(inp, outmaps, kernel, r, pad=None, stride=None, dilation=None, uv_init=None, b_init=None, base_axis=1, fix_parameters=False, rng=None, with_bias=True, name=None)[source]

SVD convolution is a low rank approximation of the convolution layer. It can be seen as a depthwise convolution followed by a 1x1 convolution.

The flattened kernels for the i-th input map are expressed by their low rank approximation. The kernels for the i-th input \({\mathbf W_i}\) are approximated with the singular value decomposition (SVD) and by selecting the \({R}\) dominant singular values and the corresponding singular vectors.

\[{\mathbf W_{:,i,:}} \approx {\mathbf U_i} {\mathbf V_i}.\]

\({\mathbf U}\) contains the weights of the depthwise convolution with multiplier \({R}\) and \({\mathbf V}\) contains the weights of the 1x1 convolution.

If uv_init is a numpy array, \({\mathbf U}\) and \({\mathbf V}\) are computed such that uv_init is approximated by \({\mathbf{UV}}\). If uv_init is None or an initializer, the product of \({\mathbf U}\) and \({\mathbf V}\) approximates the random initialization.

If \({\mathbf U}\) and \({\mathbf V}\) exist in the context, they take precedence over uv_init.

Suppose the kernel tensor of the convolution is of \({O \times I \times K \times K}\) and the compression rate you want to specify is \({CR}\), then you set \({R}\) as

\[R = \left\lfloor \frac{(1 - CR)OIK^2}{I(O + K^2)} \right\rfloor.\]
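
As an illustrative check of this formula (the sizes are assumptions): with \(O = 64\), \(I = 32\), \(K = 3\) and \(CR = 0.5\), \(R = \left\lfloor 0.5 \cdot 64 \cdot 32 \cdot 9 / (32 (64 + 9)) \right\rfloor = \left\lfloor 3.95 \right\rfloor = 3\).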
Parameters
Returns

\((B + 1)\)-D array. (\(M_0 \times \ldots \times M_{B-1} \times L\))

Return type

Variable
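
Example

A minimal usage sketch (shapes and rank are illustrative assumptions):

import nnabla as nn
import nnabla.parametric_functions as PF

x = nn.Variable((8, 32, 28, 28))
y = PF.svd_convolution(x, outmaps=64, kernel=(3, 3), r=3, pad=(1, 1))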

Parameters to be registered

The following variables are registered in a parameter scope "svd_conv";

  • U (need_grad=True) : Decomposed filter weights \({\mathbf U}\). (shape: (inmaps * r, *kernel))

  • V (need_grad=True) : Decomposed filter weights \({\mathbf V}\). (shape: (outmaps, inmaps * r, 1, ...))

  • b (need_grad=True) : Bias vector. (shape: (outmaps,))

Note

If the name option is passed, the parameters become wrapped inside the parameter scope with the specified name, yielding the same results as the following code. This can be used to simplify the code.

with parameter_scope(name):
    output = svd_convolution(<args>)
nnabla.parametric_functions.cpd3_convolution(inp, outmaps, kernel, r, pad=None, stride=None, dilation=None, oik_init=None, b_init=None, base_axis=1, fix_parameters=False, rng=None, with_bias=True, max_iter=500, stopping_criterion=1e-05, lambda_reg=0.0, name=None)[source]

CP convolution is a low rank approximation of a convolution layer. A 3D tensor containing the parameters is built by collapsing the N-D kernels into 1D; the tensor is then decomposed into three matrices. The decomposed layer can be seen as a linear combination of the input feature maps into \({R}\) feature maps, followed by a depthwise convolution, followed by a linear combination of the feature maps into the output feature maps.

The CP decomposition approximates the kernel tensor by \({R}\) rank-1 tensors of the form:

\[\sum_{r=1}^{R} \lambda_r {\mathbf{o}^{(r)} \otimes \mathbf{i}^{(r)} \otimes \mathbf{k}^{(r)}},\]

where \({\lambda}_r\) is the normalization coefficient and \({\otimes}\) is the outer product.

If oik_init is a numpy array, \({\mathbf O}\), \({\mathbf I}\) and \({\mathbf K}\) are computed such that their composition approximates oik_init. If oik_init is None or an initializer, the composition approximates the random initialization.

If \({\mathbf O}\), \({\mathbf I}\) and \({\mathbf K}\) exist in the context, they are used to initialize the layer and oik_init is not used.

Suppose the kernel tensor of the convolution is of size \({O \times I \times K \times K}\) and the compression rate you want to specify is \({CR}\); then you set \({R}\) as

\[R = \left\lfloor \frac{(1 - CR)OIK^2}{O + I + K^2} \right\rfloor.\]
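
As an illustrative check of this formula (the sizes are assumptions): with \(O = 64\), \(I = 32\), \(K = 3\) and \(CR = 0.5\), \(R = \left\lfloor 0.5 \cdot 64 \cdot 32 \cdot 9 / (64 + 32 + 9) \right\rfloor = \left\lfloor 87.8 \right\rfloor = 87\).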

References

  • Lebedev, Vadim, Yaroslav Ganin, Maksim Rakhuba, Ivan Oseledets, and Victor Lempitsky, “Speeding-up convolutional neural networks using fine-tuned cp-decomposition.”, arXiv preprint arXiv:1412.6553 (2014).

  • Marcella Astrid, Seung-Ik Lee, “CP-decomposition with Tensor Power Method for Convolutional Neural Networks Compression”, BigComp 2017.

Parameters
  • inp (Variable) – N-D array.

  • outmaps (int) – Number of convolution kernels (which is equal to the number of output channels). For example, to apply convolution on an input with 16 types of filters, specify 16.

  • kernel (tuple of int) – Convolution kernel size. For example, to apply convolution on an image with a 3 (height) by 5 (width) two-dimensional kernel, specify (3,5).

  • r (int) – rank of the factorized layer

  • pad (tuple of int) – Padding sizes for dimensions.

  • stride (tuple of int) – Stride sizes for dimensions.

  • dilation (tuple of int) – Dilation sizes for dimensions.

  • oik_init (numpy array or nnabla.initializer.BaseInitializer) – Initializer for weight. By default, it is initialized with nnabla.initializer.UniformInitializer within the range determined by nnabla.initializer.calc_uniform_lim_glorot.

  • b_init (nnabla.initializer.BaseInitializer or numpy.ndarray) – Initializer for bias. It is initialized with zeros if with_bias is True.

  • base_axis (int) – Dimensions up to base_axis are treated as the sample dimensions.

  • fix_parameters (bool) – When set to True, the weights and biases will not be updated.

  • rng (numpy.random.RandomState) – Random generator for Initializer.

  • with_bias (bool) – Specify whether to include the bias term.

  • max_iter (int) – Max iteration of the ALS.

  • stopping_criterion (float) – Threshold for stopping the ALS. If the value is negative, the convergence check is skipped, which can reduce the computation time.

  • lambda_reg (float) – Regularization parameter for the ALS. A larger lambda_reg means stronger regularization.

Returns

\((B + 1)\)-D array. (\(M_0 \times \ldots \times M_{B-1} \times L\))

Return type

Variable
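
Example

A minimal usage sketch (shapes and rank are illustrative assumptions):

import nnabla as nn
import nnabla.parametric_functions as PF

x = nn.Variable((8, 32, 28, 28))
y = PF.cpd3_convolution(x, outmaps=64, kernel=(3, 3), r=87, pad=(1, 1))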

Parameters to be registered

The following variables are registered in a parameter scope "cpd3_conv";

  • I (need_grad=True) : Decomposed filter weights \({\mathbf I}\). (shape: (r, inmaps, 1, ...))

  • K (need_grad=True) : Decomposed filter weights \({\mathbf K}\). (shape: (r, *kernel))

  • O (need_grad=True) : Decomposed filter weights \({\mathbf O}\). (shape: (outmaps, r, 1, ...))

  • b (need_grad=True) : Bias vector. (shape: (outmaps,))

Note

If the name option is passed, the parameters become wrapped inside the parameter scope with the specified name, yielding the same results as the following code. This can be used to simplify the code.

with parameter_scope(name):
    output = cpd3_convolution(<args>)
nnabla.parametric_functions.binary_connect_affine(inp, n_outmaps, base_axis=1, quantize_zero_to=1.0, w_init=None, wb_init=None, b_init=None, fix_parameters=False, rng=None, with_bias=True, name=None)[source]

Binary Connect Affine, multiplier-less inner-product.

Binary Connect Affine is an affine function, except the definition of the inner product is modified. The input-output relation of this function is as follows:

\[y_j = \sum_{i} sign(w_{ji}) x_i.\]

Therefore \(sign(w_{ji})\) is either \(1\) or \(-1\), and the inner product simplifies to addition.

This function should be used together with Batch Normalization.

References

M. Courbariaux, Y. Bengio, and J.-P. David. “BinaryConnect: Training Deep Neural Networks with binary weights during propagations.” Advances in Neural Information Processing Systems. 2015.

Note

1) If you would like to share weights between some layers, please make sure to share the standard floating-point weights (weight) and not the binarized weights (binary_weight).

2) The weights and the binary weights are synced only after forward() is called, not after a call to backward(). To access the parameters of the network, remember to call forward() once beforehand; otherwise the float weights and the binary weights will not be in sync.

3) Quantized values are stored as floating-point numbers for binary_weight, since this function is only for simulation purposes.

Parameters
Returns

Variable
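
Example

A minimal usage sketch (shapes are illustrative assumptions); as noted above, it is typically followed by batch normalization:

import nnabla as nn
import nnabla.parametric_functions as PF

x = nn.Variable((32, 128))
h = PF.binary_connect_affine(x, n_outmaps=64)
y = PF.batch_normalization(h)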

Parameters to be registered

The following variables are registered in a parameter scope "bicon_affine";

  • W (need_grad=True) : Weight matrix in floating type. (shape: (inmaps, outmaps))

  • Wb (need_grad=False) : Binarized weights. (shape: (inmaps, outmaps))

  • b (need_grad=True) : Bias vector. (shape: (outmaps,))

Note

If the name option is passed, the parameters become wrapped inside the parameter scope with the specified name, yielding the same results as the following code. This can be used to simplify the code.

with parameter_scope(name):
    output = binary_connect_affine(<args>)
nnabla.parametric_functions.binary_connect_convolution(inp, outmaps, kernel, pad=None, stride=None, dilation=None, group=1, quantize_zero_to=1.0, w_init=None, wb_init=None, b_init=None, base_axis=1, fix_parameters=False, rng=None, with_bias=True, name=None)[source]

Binary Connect Convolution, multiplier-less inner-product.

Binary Connect Convolution is the convolution function, except the definition of the inner product is modified. The input-output relation of this function is as follows:

\[y_{n, a, b} = \sum_{m} \sum_{i} \sum_{j} sign(w_{n, m, i, j}) x_{m, a + i, b + j}.\]

Therefore \(sign(w_{n, m, i, j})\) is either \(1\) or \(-1\), and the inner product simplifies to addition.

This function should be used together with BatchNormalization.

References

M. Courbariaux, Y. Bengio, and J.-P. David. “BinaryConnect: Training Deep Neural Networks with binary weights during propagations.” Advances in Neural Information Processing Systems. 2015.

Note

1) If you would like to share weights between some layers, please make sure to share the standard floating-point weights (weight) and not the binarized weights (binary_weight).

2) The weights and the binary weights are synced only after forward() is called, not after a call to backward(). To access the parameters of the network, remember to call forward() once beforehand; otherwise the float weights and the binary weights will not be in sync.

3) Quantized values are stored as floating-point numbers for binary_weight, since this function is only for simulation purposes.

Parameters
Returns

Variable
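
Example

A minimal usage sketch (shapes are illustrative assumptions); as noted above, it is typically followed by batch normalization:

import nnabla as nn
import nnabla.parametric_functions as PF

x = nn.Variable((8, 3, 32, 32))
h = PF.binary_connect_convolution(x, outmaps=16, kernel=(3, 3), pad=(1, 1))
y = PF.batch_normalization(h)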

Parameters to be registered

The following variables are registered in a parameter scope "bicon_conv";

  • W (need_grad=True) : Filter weights in float. (shape: (outmaps, inmaps, *kernel))

  • Wb (need_grad=False) : Binarized filter weights. (shape: (outmaps, inmaps, *kernel))

  • b (need_grad=True) : Bias vector. (shape: (outmaps,))

Note

If the name option is passed, the parameters become wrapped inside the parameter scope with the specified name, yielding the same results as the following code. This can be used to simplify the code.

with parameter_scope(name):
    output = binary_connect_convolution(<args>)
nnabla.parametric_functions.binary_weight_affine(inp, n_outmaps, base_axis=1, quantize_zero_to=1.0, w_init=None, wb_init=None, b_init=None, fix_parameters=False, rng=None, with_bias=True, name=None)[source]

Binary Weight Affine, multiplier-less inner-product with a scale factor.

Binary Weight Affine is the affine function, but the inner product in this function is the following,

\[y_j = \frac{1}{\|\mathbf{w}_j\|_{\ell_1}} \sum_{i} sign(w_{ji}) x_i\]

Therefore \(sign(w_{ji})\) is either \(1\) or \(-1\), and the inner product simplifies to addition followed by the scaling factor \(\alpha = \frac{1}{\|\mathbf{w}_j\|_{\ell_1}}\). The number of \(\alpha\) values equals the number of outmaps of the affine function.

References

Rastegari, Mohammad, et al. “XNOR-Net: ImageNet Classification Using Binary Convolutional Neural Networks.” arXiv preprint arXiv:1603.05279 (2016).

Note

1) If you would like to share weights between some layers, please make sure to share the standard floating-point weights (weight) and not the binarized weights (binary_weight).

2) The weights and the binary weights are synced only after forward() is called, not after a call to backward(). To access the parameters of the network, remember to call forward() once beforehand; otherwise the float weights and the binary weights will not be in sync.

3) Quantized values are stored as floating-point numbers for binary_weight, since this function is only for simulation purposes.

Parameters
Returns

Variable
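
Example

A minimal usage sketch (shapes are illustrative assumptions):

import nnabla as nn
import nnabla.parametric_functions as PF

x = nn.Variable((32, 128))
y = PF.binary_weight_affine(x, n_outmaps=64)  # one scaling factor alpha per output neuron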

Parameters to be registered

The following variables are registered in a parameter scope "bwn_affine";

  • W (need_grad=True) : Weight matrix in floating type. (shape: (inmaps, outmaps))

  • Wb (need_grad=False) : Binarized weights. (shape: (inmaps, outmaps))

  • alpha (need_grad=False) : Scaling factor \(\alpha\). (shape: (outmaps,))

  • b (need_grad=True) : Bias vector. (shape: (outmaps,))

Note

If the name option is passed, the parameters become wrapped inside the parameter scope with the specified name, yielding the same results as the following code. This can be used to simplify the code.

with parameter_scope(name):
    output = binary_weight_affine(<args>)
nnabla.parametric_functions.binary_weight_convolution(inp, outmaps, kernel, pad=None, stride=None, dilation=None, group=1, quantize_zero_to=1.0, w_init=None, wb_init=None, b_init=None, base_axis=1, fix_parameters=False, rng=None, with_bias=True, name=None)[source]

Binary Weight Convolution, multiplier-less inner-product with a scale factor.

Binary Weight Convolution is the convolution function, but the inner product in this function is the following,

\[y_{n, a, b} = \frac{1}{\|\mathbf{w}_n\|_{\ell_1}} \sum_{m} \sum_{i} \sum_{j} sign(w_{n, m, i, j}) x_{m, a + i, b + j}.\]

Therefore \(sign(w_{n, m, i, j})\) is either \(1\) or \(-1\), and the inner product simplifies to addition followed by the scaling factor \(\alpha = \frac{1}{\|\mathbf{w}_n\|_{\ell_1}}\). There is one \(\alpha\) per output map of the convolution function.

References

Rastegari, Mohammad, et al. “XNOR-Net: ImageNet Classification Using Binary Convolutional Neural Networks.” arXiv preprint arXiv:1603.05279 (2016).

Note

1) If you would like to share weights between some layers, please make sure to share the standard floating-point weights (weight) and not the binarized weights (binary_weight).

2) The weights and the binary weights are synced only after forward() is called, not after a call to backward(). To access the parameters of the network, remember to call forward() once beforehand; otherwise the float weights and the binary weights will not be in sync.

3) Quantized values are stored as floating-point numbers for binary_weight, since this function is only for simulation purposes.

Parameters
Returns

Variable
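
Example

A minimal usage sketch (shapes are illustrative assumptions):

import nnabla as nn
import nnabla.parametric_functions as PF

x = nn.Variable((8, 3, 32, 32))
y = PF.binary_weight_convolution(x, outmaps=16, kernel=(3, 3), pad=(1, 1))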

Parameters to be registered

The following variables are registered in a parameter scope "bwn_conv";

  • W (need_grad=True) : Filter weights in float. (shape: (outmaps, inmaps, *kernel))

  • Wb (need_grad=False) : Binarized filter weights. (shape: (outmaps, inmaps, *kernel))

  • alpha (need_grad=False) : Scaling factor \(\alpha\). (shape: (outmaps,))

  • b (need_grad=True) : Bias vector. (shape: (outmaps,))

Note

If the name option is passed, the parameters become wrapped inside the parameter scope with the specified name, yielding the same results as the following code. This can be used to simplify the code.

with parameter_scope(name):
    output = binary_weight_convolution(<args>)
nnabla.parametric_functions.inq_affine(inp, n_outmaps, base_axis=1, num_bits=4, inq_iterations=(), selection_algorithm='random', seed=- 1, w_init=None, i_init=None, b_init=None, fix_parameters=False, rng=None, with_bias=True, name=None)[source]

Incremental Network Quantization Affine Layer

During training, the weights are sequentially quantized to power-of-two values, which allows the training of a multiplierless network.

Using inq_iterations, one can specify after how many forward passes half of the learnable weights are fixed and quantized to powers-of-two. After reaching the last value in inq_iterations, all weights are fixed.

For more details, please refer to the reference.

Reference: Zhou A, Yao A, Guo Y, Xu L, Chen Y. Incremental network quantization: Towards lossless CNNs with low-precision weights. <https://arxiv.org/abs/1702.03044>

Parameters
  • inp (Variable) – Input N-D array with shape (\(M_0 \times \ldots \times M_{B-1} \times D_B \times \ldots \times D_N\)). Dimensions before and after base_axis are flattened as if it was a matrix.

  • n_outmaps (int or tuple of int) – Number of output neurons per data.

  • base_axis (int) – Dimensions up to base_axis are treated as the sample dimensions.

  • quantize_zero_to (float) – Input value at zero is quantized to this value.

  • num_bits (int) – Number of bits per weight. The value has to be larger than 1, as one bit is already used to code the value “0”.

  • inq_iterations (tuple of int) – Tuple of iteration numbers at which we fix half of the weights.

  • selection_algorithm (str) – Chooses algorithm that is used to decide which weights are fixed. (“largest_abs” … fix weights with largest absolute value, “random” … fix weights randomly)

  • seed (int) – Random seed for INQ algorithm

  • w_init (nnabla.initializer.BaseInitializer or numpy.ndarray) – Initializer for weight. By default, it is initialized with nnabla.initializer.UniformInitializer within the range determined by nnabla.initializer.calc_uniform_lim_glorot.

  • i_init (nnabla.initializer.BaseInitializer or numpy.ndarray) – Initializer for indicators (0 … learnable, 1 … fixed). By default, it is initialized with zeros.

  • b_init (nnabla.initializer.BaseInitializer or numpy.ndarray) – Initializer for bias. By default, it is initialized with zeros if with_bias is True.

  • fix_parameters (bool) – When set to True, the weight and bias will not be updated.

  • rng (numpy.random.RandomState) – Random generator for Initializer.

  • with_bias (bool) – Specify whether to include the bias term.

Returns

Variable
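
Example

A minimal usage sketch (the shape and the INQ schedule are illustrative assumptions):

import nnabla as nn
import nnabla.parametric_functions as PF

x = nn.Variable((32, 128))
# Fix and quantize half of the remaining learnable weights at each listed iteration.
y = PF.inq_affine(x, n_outmaps=64, num_bits=4, inq_iterations=(5000, 10000, 15000))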

Parameters to be registered

The following variables are registered in a parameter scope "inq_affine";

  • W (need_grad=True) : Weight matrix in floating type. (shape: (inmaps, outmaps))

  • I (need_grad=False) : Binary indicator matrix of fixed weights. (shape: (inmaps, outmaps))

  • b (need_grad=True) : Bias vector. (shape: (outmaps,))

Note

If the name option is passed, the parameters become wrapped inside the parameter scope with the specified name, yielding the same results as the following code. This can be used to simplify the code.

with parameter_scope(name):
    output = inq_affine(<args>)
nnabla.parametric_functions.inq_convolution(inp, outmaps, kernel, pad=None, stride=None, dilation=None, group=1, num_bits=4, inq_iterations=(), selection_algorithm='random', seed=- 1, w_init=None, i_init=None, b_init=None, base_axis=1, fix_parameters=False, rng=None, with_bias=True, name=None)[source]

Incremental Network Quantization Convolution Layer

During training, the weights are sequentially quantized to power-of-two values, which allows the training of a multiplierless network.

Using inq_iterations, one can specify after how many forward passes half of the learnable weights are fixed and quantized to powers-of-two. After reaching the last value in inq_iterations, all weights are fixed.

For more details, please refer to the reference.

Reference: Zhou A, Yao A, Guo Y, Xu L, Chen Y. Incremental network quantization: Towards lossless CNNs with low-precision weights. <https://arxiv.org/abs/1702.03044>

Parameters
  • inp (Variable) – N-D array.

  • outmaps (int) – Number of convolution kernels (which is equal to the number of output channels). For example, to apply convolution on an input with 16 types of filters, specify 16.

  • kernel (tuple of int) – Convolution kernel size. For example, to apply convolution on an image with a 3 (height) by 5 (width) two-dimensional kernel, specify (3,5).

  • base_axis (int) – Dimensions up to base_axis are treated as the sample dimensions.

  • num_bits (int) – Number of bits per weight. The value has to be larger than 1, as one bit is already used to code the value “0”.

  • inq_iterations (tuple of int) – Tuple of iteration numbers at which we fix half of the weights.

  • selection_algorithm (str) – Chooses algorithm that is used to decide which weights are fixed. (“largest_abs” … fix weights with largest absolute value, “random” … fix weights randomly)

  • seed (int) – Random seed for INQ algorithm

  • w_init (nnabla.initializer.BaseInitializer or numpy.ndarray) – Initializer for the weight. By default, it is initialized with nnabla.initializer.UniformInitializer within the range determined by nnabla.initializer.calc_uniform_lim_glorot.

  • i_init (nnabla.initializer.BaseInitializer or numpy.ndarray) – Initializer for the indicators (0 … learnable, 1 … fixed). By default, it is initialized with zeros.

  • b_init (nnabla.initializer.BaseInitializer or numpy.ndarray) – Initializer for the bias. By default, it is initialized with zeros if with_bias is True.

  • fix_parameters (bool) – When set to True, the weight and bias will not be updated.

  • rng (numpy.random.RandomState) – Random generator for Initializer.

  • with_bias (bool) – Specify whether to include the bias term.

Returns

Variable
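
Example

A minimal usage sketch (the shapes and the INQ schedule are illustrative assumptions):

import nnabla as nn
import nnabla.parametric_functions as PF

x = nn.Variable((8, 3, 32, 32))
y = PF.inq_convolution(x, outmaps=16, kernel=(3, 3), num_bits=4,
                       inq_iterations=(5000, 10000, 15000))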

Parameters to be registered

The following variables are registered in a parameter scope "inq_conv";

  • W (need_grad=True) : Filter weights in float. (shape: (outmaps, inmaps, *kernel))

  • I (need_grad=False) : Binary indicator matrix of fixed weights. (shape: (outmaps, inmaps, *kernel))

  • b (need_grad=True) : Bias vector. (shape: (outmaps,))

Note

If the name option is passed, the parameters become wrapped inside the parameter scope with the specified name, yielding the same results as the following code. This can be used to simplify the code.

with parameter_scope(name):
    output = inq_convolution(<args>)
nnabla.parametric_functions.fixed_point_quantized_affine(inp, n_outmaps, base_axis=1, w_init=None, b_init=None, fix_parameters=False, rng=None, with_bias=True, quantize_w=True, sign_w=True, n_w=8, delta_w=0.0625, ste_fine_grained_w=True, quantize_b=True, sign_b=True, n_b=8, delta_b=0.0625, ste_fine_grained_b=True, name=None)[source]

Fixed-Point Quantized Affine.

Fixed-Point Quantized Affine is the affine function, except the definition of the inner product is modified. The input-output relation of this function is as follows:

\[y_j = \sum_{i} Q(w_{ji}) x_i,\]

where \(Q(w_{ji})\) is the fixed-point quantization function.

Note

1) If you would like to share weights between some layers, please make sure to share the standard floating-point weights (weight) and not the quantized weights (quantized weight).

2) The weights and the quantized weights are synced only after forward() is called, not after a call to backward(). To access the parameters of the network, remember to call forward() once beforehand; otherwise the float weights and the quantized weights will not be in sync.

3) CPU and GPU implementations currently use float values for the quantized weights, since this function is only for simulation purposes.

Parameters
Returns

\((B + 1)\)-D array. (\(M_0 \times \ldots \times M_{B-1} \times L\))

Return type

Variable
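
Example

A minimal usage sketch (shapes and quantization settings are illustrative assumptions):

import nnabla as nn
import nnabla.parametric_functions as PF

x = nn.Variable((32, 128))
y = PF.fixed_point_quantized_affine(x, n_outmaps=64, n_w=8, delta_w=0.0625)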

Parameters to be registered

The following variables are registered in a parameter scope "fp_quantized_affine";

  • W (need_grad=True) : Weight matrix in float. (shape: (inmaps, outmaps))

  • b (need_grad=True) : Bias vector in float. (shape: (outmaps,))

  • W_q (need_grad=False) : Quantized weights. (shape: (inmaps, outmaps))

  • b_q (need_grad=False) : Quantized biases. (shape: (outmaps,))

Note

If the name option is passed, the parameters become wrapped inside the parameter scope with the specified name, yielding the same results as the following code. This can be used to simplify the code.

with parameter_scope(name):
    output = fixed_point_quantized_affine(<args>)
nnabla.parametric_functions.fixed_point_quantized_convolution(inp, outmaps, kernel, pad=None, stride=None, dilation=None, group=1, channel_last=False, w_init=None, b_init=None, base_axis=1, fix_parameters=False, rng=None, with_bias=True, quantize_w=True, sign_w=True, n_w=8, delta_w=0.0625, ste_fine_grained_w=True, quantize_b=True, sign_b=True, n_b=8, delta_b=0.0625, ste_fine_grained_b=True, name=None)[source]

Fixed-Point Quantized Convolution.

Fixed-Point Quantized Convolution is the convolution function, except the definition of the inner product is modified. The input-output relation of this function is as follows:

\[y_{n, a, b} = \sum_{m} \sum_{i} \sum_{j} Q(w_{n, m, i, j}) x_{m, a + i, b + j},\]

where \(Q(w_{n, m, i, j})\) is the fixed-point quantization function.

Note

1) If you would like to share weights between some layers, please make sure to share the standard floating-point weights (weight) and not the quantized weights (quantized weight).

2) The weights and the quantized weights are synced only after forward() is called, not after a call to backward(). To access the parameters of the network, remember to call forward() once beforehand; otherwise the float weights and the quantized weights will not be in sync.

3) CPU and GPU implementations currently use float values for the quantized weights, since this function is only for simulation purposes.

Parameters
  • inp (Variable) – N-D array.

  • outmaps (int) – Number of convolution kernels (which is equal to the number of output channels). For example, to apply convolution on an input with 16 types of filters, specify 16.

  • kernel (tuple of int) – Convolution kernel size. For example, to apply convolution on an image with a 3 (height) by 5 (width) two-dimensional kernel, specify (3,5).

  • pad (tuple of int) – Padding sizes for dimensions.

  • stride (tuple of int) – Stride sizes for dimensions.

  • dilation (tuple of int) – Dilation sizes for dimensions.

  • group (int) – Number of groups of channels. This makes connections across channels more sparse by grouping connections along map direction.

  • w_init (nnabla.initializer.BaseInitializer or numpy.ndarray) – Initializer for weight. By default, it is initialized with nnabla.initializer.UniformInitializer within the range determined by nnabla.initializer.calc_uniform_lim_glorot.

  • b_init (nnabla.initializer.BaseInitializer or numpy.ndarray) – Initializer for bias. By default, it is initialized with zeros if with_bias is True.

  • base_axis (int) – Dimensions up to base_axis are treated as the sample dimensions.

  • fix_parameters (bool) – When set to True, the weights and biases will not be updated.

  • rng (numpy.random.RandomState) – Random generator for Initializer.

  • with_bias (bool) – Specify whether to include the bias term.

  • quantize_w (bool) – Quantize weights if True.

  • sign_w (bool) – Use signed quantization if True.

  • n_w (int) – Bit width used for weight.

  • delta_w (float) – Step size for weight.

  • ste_fine_grained_w (bool) – STE is fine-grained if True.

  • quantize_b (bool) – Quantize bias if True.

  • sign_b (bool) – Use signed quantization if True.

  • n_b (int) – Bit width used for bias.

  • delta_b (float) – Step size for bias.

  • ste_fine_grained_b (bool) – STE is fine-grained if True.

Returns

N-D array.

Return type

Variable
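
Example

A minimal usage sketch (shapes and quantization settings are illustrative assumptions):

import nnabla as nn
import nnabla.parametric_functions as PF

x = nn.Variable((8, 3, 32, 32))
y = PF.fixed_point_quantized_convolution(x, outmaps=16, kernel=(3, 3), pad=(1, 1),
                                         n_w=8, delta_w=0.0625)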

Parameters to be registered

The following variables are registered in a parameter scope "fp_quantized_conv";

  • W (need_grad=True) : Filter weights in float. (shape: (outmaps, inmaps // group, *kernel))

  • b (need_grad=True) : Bias vector in float. (shape: (outmaps,))

  • W_q (need_grad=False) : Quantized weights. (shape: (outmaps, inmaps // group, *kernel))

  • b_q (need_grad=False) : Quantized biases. (shape: (outmaps,))

Note

If the name option is passed, the parameters become wrapped inside the parameter scope with the specified name, yielding the same results as the following code. This can be used to simplify the code.

with parameter_scope(name):
    output = fixed_point_quantized_convolution(<args>)
nnabla.parametric_functions.min_max_quantized_affine(inp, n_outmaps, base_axis=1, w_init=None, b_init=None, fix_parameters=False, rng=None, with_bias=True, quantize_w=True, ql_min_w=0, ql_max_w=255, w_min_max=False, qr_min_w_init=None, qr_max_w_init=None, ste_fine_grained_w=True, quantize_b=True, ql_min_b=0, ql_max_b=255, b_min_max=False, qr_min_b_init=None, qr_max_b_init=None, ste_fine_grained_b=True, eps=0.01, name=None)[source]

Min-max Quantized Affine.

Min-max Quantized Affine is the affine function, except the definition of the inner product is modified. The input-output relation of this function is as follows:

\[y_j = \sum_{i} Q(w_{ji}) x_i,\]

where \(Q(w_{ji})\) is the min-max quantization function.

In min_max_quantized_affine, the exponential moving average is not used. The min and max quantization ranges are either the min-max of the weights and bias, or are trained.

Notice that the min and max values of inputs are always used instead of the exponential moving average.

Note

1) If you would like to share weights between some layers, please make sure to share the standard floating-point weights (weight) and not the quantized weights (quantized weight).

2) The weights and the quantized weights are synced only after forward() is called, not after a call to backward(). To access the parameters of the network, remember to call forward() once beforehand; otherwise the float weights and the quantized weights will not be in sync.

3) CPU and GPU implementations currently use float values for the quantized weights, since this function is only for simulation purposes.

Parameters
Returns

\((B + 1)\)-D array. (\(M_0 \times \ldots \times M_{B-1} \times L\))

Return type

Variable
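
Example

A minimal usage sketch (shapes and quantization levels are illustrative assumptions):

import nnabla as nn
import nnabla.parametric_functions as PF

x = nn.Variable((32, 128))
y = PF.min_max_quantized_affine(x, n_outmaps=64, ql_min_w=0, ql_max_w=255)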

Parameters to be registered

The following variables are registered in a parameter scope "min_max_quantized_affine";

  • W (need_grad=True) : Weight matrix in float. (shape: (inmaps, outmaps))

  • b (need_grad=True) : Bias vector in float. (shape: (outmaps,))

  • W_q (need_grad=False) : Quantized weights. (shape: (inmaps, outmaps))

  • b_q (need_grad=False) : Quantized biases. (shape: (outmaps,))

  • qr_min (need_grad=False) : Minimum quantization range; the minimum values of the inputs, or a trainable range. (shape: ql_min.shape)

  • qr_max (need_grad=False) : Maximum quantization range; the maximum values of the inputs, or a trainable range. (shape: ql_max.shape)

Note

If the name option is passed, the parameters become wrapped inside the parameter scope with the specified name, yielding the same results as the following code. This can be used to simplify the code.

with parameter_scope(name):
    output = min_max_quantized_affine(<args>)
nnabla.parametric_functions.min_max_quantized_convolution(inp, outmaps, kernel, pad=None, stride=None, dilation=None, group=1, channel_last=False, w_init=None, b_init=None, base_axis=1, fix_parameters=False, rng=None, with_bias=True, quantize_w=True, ql_min_w=0, ql_max_w=255, w_min_max=False, qr_min_w_init=None, qr_max_w_init=None, ste_fine_grained_w=True, quantize_b=True, ql_min_b=0, ql_max_b=255, b_min_max=False, qr_min_b_init=None, qr_max_b_init=None, ste_fine_grained_b=True, eps=0.01, name=None)[source]

Min-max Quantized Convolution.

Min-max Quantized Convolution is the convolution function, except the definition of the inner product is modified. The input-output relation of this function is as follows:

\[y_{n, a, b} = \sum_{m} \sum_{i} \sum_{j} Q(w_{n, m, i, j}) x_{m, a + i, b + j},\]

where \(Q(w_{n, m, i, j})\) is the min-max quantization function.

In min_max_quantized_convolution, the exponential moving average is not used. The min and max quantization ranges are either the min-max of the weights and bias, or are trained.

Notice that the min and max values of inputs are always used instead of the exponential moving average.

Note

1) If you would like to share weights between some layers, please make sure to share the standard floating-point weights (weight) and not the quantized weights (quantized weight).

2) The weights and the quantized weights are synced only after forward() is called, not after a call to backward(). To access the parameters of the network, remember to call forward() once beforehand; otherwise the float weights and the quantized weights will not be in sync.

3) CPU and GPU implementations currently use float values for the quantized weights, since this function is only for simulation purposes.

Parameters
Returns

N-D array.

Return type

Variable
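
Example

A minimal usage sketch (shapes and quantization levels are illustrative assumptions):

import nnabla as nn
import nnabla.parametric_functions as PF

x = nn.Variable((8, 3, 32, 32))
y = PF.min_max_quantized_convolution(x, outmaps=16, kernel=(3, 3), pad=(1, 1),
                                     ql_min_w=0, ql_max_w=255)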

Parameters to be registered

The following variables are registered in a parameter scope "min_max_quantized_conv";

  • W (need_grad=True) : Filter weights in float. (shape: (outmaps, inmaps // group, *kernel))

  • b (need_grad=True) : Bias vector in float. (shape: (outmaps,))

  • W_q (need_grad=False) : Quantized weights. (shape: (outmaps, inmaps // group, *kernel))

  • b_q (need_grad=False) : Quantized biases. (shape: (outmaps,))

  • qr_min (need_grad=False) : Minimum quantization range; the minimum values of the inputs, or a trainable range. (shape: ql_min.shape)

  • qr_max (need_grad=False) : Maximum quantization range; the maximum values of the inputs, or a trainable range. (shape: ql_max.shape)

Note

If the name option is passed, the parameters become wrapped inside the parameter scope with the specified name, yielding the same results as the following code. This can be used to simplify the code.

with parameter_scope(name):
    output = min_max_quantized_convolution(<args>)
nnabla.parametric_functions.pow2_quantized_affine(inp, n_outmaps, base_axis=1, w_init=None, b_init=None, fix_parameters=False, rng=None, with_bias=True, quantize_w=True, sign_w=True, with_zero_w=False, n_w=8, m_w=2, ste_fine_grained_w=True, quantize_b=True, sign_b=True, with_zero_b=False, n_b=8, m_b=2, ste_fine_grained_b=True, name=None)[source]

Pow2 Quantized Affine.

Pow2 Quantized Affine is the affine function, except the definition of the inner product is modified. The input-output relation of this function is as follows:

\[y_j = \sum_{i} Q(w_{ji}) x_i,\]

where \(Q(w_{ji})\) is the power-of-2 quantization function.

Note

1) If you would like to share weights between some layers, please make sure to share the standard floating-point weights (weight) and not the quantized weights (quantized weight).

2) The weights and the quantized weights are synced only after forward() is called, not after a call to backward(). To access the parameters of the network, remember to call forward() once beforehand; otherwise the float weights and the quantized weights will not be in sync.

3) Quantized values are stored as floating-point numbers for the quantized weights, since this function is only for simulation purposes.

Parameters
  • inp (Variable) – Input N-D array with shape (\(M_0 \times \ldots \times M_{B-1} \times D_B \times \ldots \times D_N\)). Dimensions before and after base_axis are flattened as if it is a matrix.

  • n_outmaps (int or tuple of int) – Number of output neurons per data.

  • base_axis (int) – Dimensions up to base_axis are treated as the sample dimensions.

  • w_init (nnabla.initializer.BaseInitializer or numpy.ndarray) – Initializer for weight. By default, it is initialized with nnabla.initializer.UniformInitializer within the range determined by nnabla.initializer.calc_uniform_lim_glorot.

  • b_init (nnabla.initializer.BaseInitializer or numpy.ndarray) – Initializer for bias. By default, it is initialized with zeros if with_bias is True.

  • fix_parameters (bool) – When set to True, the weights and biases will not be updated.

  • rng (numpy.random.RandomState) – Random generator for Initializer.

  • with_bias (bool) – Specify whether to include the bias term.

  • quantize_w (bool) – Quantize weights if True.

  • sign_w (bool) – Use signed quantization if True.

  • with_zero_w (bool) – Indicates whether zero is used as a quantized value for weights. Default is False.

  • n_w (int) – Bit width used for weight.

  • m_w (int) – \(2^m\) is upper bound and \(-2^m\) is lower bound for weights. Default is 2.

  • ste_fine_grained_w (bool) – STE is fine-grained if True.

  • quantize_b (bool) – Quantize bias if True.

  • with_zero_b (bool) – Indicates whether zero is used as a quantized value for bias. Default is False.

  • n_b (int) – Bit width used for bias.

  • m_b (int) – \(2^m\) is upper bound and \(-2^m\) is lower bound for bias. Default is 2.

  • ste_fine_grained_b (bool) – STE is fine-grained if True.

Returns

\((B + 1)\)-D array. (\(M_0 \times \ldots \times M_{B-1} \times L\))

Return type

Variable
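
Example

A minimal usage sketch (shapes and quantization settings are illustrative assumptions):

import nnabla as nn
import nnabla.parametric_functions as PF

x = nn.Variable((32, 128))
y = PF.pow2_quantized_affine(x, n_outmaps=64, n_w=8, m_w=2)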

Parameters to be registered

The following variables are registered in a parameter scope "pow2_quantized_affine";

  • W (need_grad=True) : Weight matrix in float. (shape: (inmaps, outmaps))

  • b (need_grad=True) : Bias vector in float. (shape: (outmaps,))

  • W_q (need_grad=False) : Quantized weights. (shape: (inmaps, outmaps))

  • b_q (need_grad=False) : Quantized biases. (shape: (outmaps,))

Note

If the name option is passed, the parameters become wrapped inside the parameter scope with the specified name, yielding the same results as the following code. This can be used to simplify the code.

with parameter_scope(name):
    output = pow2_quantized_affine(<args>)
nnabla.parametric_functions.pow2_quantized_convolution(inp, outmaps, kernel, pad=None, stride=None, dilation=None, group=1, w_init=None, b_init=None, base_axis=1, fix_parameters=False, rng=None, with_bias=True, quantize_w=True, with_zero_w=False, sign_w=True, n_w=8, m_w=2, ste_fine_grained_w=True, quantize_b=True, with_zero_b=False, sign_b=True, n_b=8, m_b=2, ste_fine_grained_b=True, name=None)[source]

Pow2 Quantized Convolution.

Pow2 Quantized Convolution is the convolution function, except the definition of the inner product is modified. The input-output relation of this function is as follows:

\[y_{n, a, b} = \sum_{m} \sum_{i} \sum_{j} Q(w_{n, m, i, j}) x_{m, a + i, b + j},\]

where \(Q(w_{n, m, i, j})\) is the power-of-2 quantization function.

Note

1) If you would like to share weights between some layers, please make sure to share the standard floating-point weights (weight) and not the quantized weights (quantized weight).

2) The weights and the quantized weights are synced only after forward() is called, not after a call to backward(). To access the parameters of the network, remember to call forward() once beforehand; otherwise the float weights and the quantized weights will not be in sync.

3) Quantized values are stored as floating-point numbers for the quantized weights, since this function is only for simulation purposes.

Parameters
  • inp (Variable) – N-D array.

  • outmaps (int) – Number of convolution kernels (which is equal to the number of output channels). For example, to apply convolution on an input with 16 types of filters, specify 16.

  • kernel (tuple of int) – Convolution kernel size. For example, to apply convolution on an image with a 3 (height) by 5 (width) two-dimensional kernel, specify (3,5).

  • pad (tuple of int) – Padding sizes for dimensions.

  • stride (tuple of int) – Stride sizes for dimensions.

  • dilation (tuple of int) – Dilation sizes for dimensions.

  • group (int) – Number of groups of channels. This makes connections across channels more sparse by grouping connections along map direction.

  • w_init (nnabla.initializer.BaseInitializer or numpy.ndarray) – Initializer for weight. By default, it is initialized with nnabla.initializer.UniformInitializer within the range determined by nnabla.initializer.calc_uniform_lim_glorot.

  • b_init (nnabla.initializer.BaseInitializer or numpy.ndarray) – Initializer for bias. By default, it is initialized with zeros if with_bias is True.

  • base_axis (int) – Dimensions up to base_axis are treated as the sample dimensions.

  • fix_parameters (bool) – When set to True, the weights and biases will not be updated.

  • rng (numpy.random.RandomState) – Random generator for Initializer.

  • with_bias (bool) – Specify whether to include the bias term.

  • quantize_w (bool) – Quantize weights if True.

  • sign_w (bool) – Use signed quantization if True.

  • n_w (int) – Bit width used for weight.

  • m_w (int) – \(2^m\) is upper bound and \(-2^m\) is lower bound for weights. Default is 2.

  • ste_fine_grained_w (bool) – STE is fine-grained if True.

  • quantize_b (bool) – Quantize bias if True.

  • sign_b (bool) – Use signed quantization if True.

  • n_b (int) – Bit width used for bias.

  • m_b (int) – \(2^m\) is upper bound and \(-2^m\) is lower bound for bias. Default is 2.

  • ste_fine_grained_b (bool) – STE is fine-grained if True.

Returns

N-D array.

Return type

Variable
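
Example

A minimal usage sketch (shapes and quantization settings are illustrative assumptions):

import nnabla as nn
import nnabla.parametric_functions as PF

x = nn.Variable((8, 3, 32, 32))
y = PF.pow2_quantized_convolution(x, outmaps=16, kernel=(3, 3), pad=(1, 1), n_w=8, m_w=2)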

Parameters to be registered

The following variables are registered in a parameter scope "pow2_quantized_conv";

  • W (need_grad=True) : Filter weights in float. (shape: (outmaps, inmaps // group, *kernel))

  • b (need_grad=True) : Bias vector in float. (shape: (outmaps,))

  • W_q (need_grad=False) : Quantized weights. (shape: (outmaps, inmaps // group, *kernel))

  • b_q (need_grad=False) : Quantized biases. (shape: (outmaps,))

Note

If the name option is passed, the parameters become wrapped inside the parameter scope with the specified name, yielding the same results as the following code. This can be used to simplify the code.

with parameter_scope(name):
    output = pow2_quantized_convolution(<args>)
nnabla.parametric_functions.pruned_affine(inp, n_outmaps, base_axis=1, w_init=None, b_init=None, fix_parameters=False, rng=None, with_bias=True, prune_w=True, rate_w=0.9, prune_b=True, rate_b=0.9, name=None)[source]

Pruned Affine.

Pruned Affine is the affine function, except the definition of the inner product is modified. The input-output relation of this function is as follows:

\[y_j = \sum_{i} Q(w_{ji}) x_i,\]

where \(Q(w_{ji})\) is the pruning function, i.e., F.prune.

Note

1) If you would like to share weights between some layers, please make sure to share the standard floating-point weights (weight) and not the quantized weights (quantized weight).

2) The weights and the quantized weights are synced only after forward() is called, not after a call to backward(). To access the parameters of the network, remember to call forward() once beforehand; otherwise the float weights and the quantized weights will not be in sync.

3) CPU and GPU implementations currently use float values for the quantized weights, since this function is only for simulation purposes.

Parameters
  • inp (Variable) – Input N-D array with shape (\(M_0 \times \ldots \times M_{B-1} \times D_B \times \ldots \times D_N\)). Dimensions before and after base_axis are flattened as if it is a matrix.

  • n_outmaps (int or tuple of int) – Number of output neurons per data.

  • base_axis (int) – Dimensions up to base_axis are treated as the sample dimensions.

  • w_init (nnabla.initializer.BaseInitializer or numpy.ndarray) – Initializer for weight.

  • b_init (nnabla.initializer.BaseInitializer or numpy.ndarray) – Initializer for bias.

  • fix_parameters (bool) – When set to True, the weights and biases will not be updated.

  • rng (numpy.random.RandomState) – Random generator for Initializer.

  • with_bias (bool) – Specify whether to include the bias term.

  • prune_w (bool) – Prune weights if True.

  • rate_w (float) – Pruning rate for weights.

  • prune_b (bool) – Prune bias if True.

  • rate_b (float) – Pruning rate for bias.

Returns

\((B + 1)\)-D array. (\(M_0 \times \ldots \times M_{B-1} \times L\))

Return type

Variable
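
Example

A minimal usage sketch (the shape and pruning rate are illustrative assumptions):

import nnabla as nn
import nnabla.parametric_functions as PF

x = nn.Variable((32, 128))
y = PF.pruned_affine(x, n_outmaps=64, rate_w=0.9)  # prunes 90% of the weights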

Parameters to be registered

The following variables are registered in a parameter scope "pruned_affine";

  • W (need_grad=True) : Weight matrix in float. (shape: (inmaps, outmaps))

  • b (need_grad=True) : Bias vector in float. (shape: (outmaps,))

  • W_q (need_grad=False) : Quantized weights. (shape: (inmaps, outmaps))

  • b_q (need_grad=False) : Quantized biases. (shape: (outmaps,))

Note

If the name option is passed, the parameters become wrapped inside the parameter scope with the specified name, yielding the same results as the following code. This can be used to simplify the code.

with parameter_scope(name):
    output = pruned_affine(<args>)
nnabla.parametric_functions.pruned_convolution(inp, outmaps, kernel, pad=None, stride=None, dilation=None, group=1, channel_last=False, w_init=None, b_init=None, base_axis=1, fix_parameters=False, rng=None, with_bias=True, prune_w=True, rate_w=0.9, prune_b=True, rate_b=0.9, name=None)[source]

Pruned Convolution.

Pruned Convolution is the convolution function, except the definition of the inner product is modified. The input-output relation of this function is as follows:

\[y_{n, a, b} = \sum_{m} \sum_{i} \sum_{j} Q(w_{n, m, i, j}) x_{m, a + i, b + j},\]

where \(Q(w_{n, m, i, j})\) is the pruning function, i.e., F.prune.

Note

1) If you would like to share weights between some layers, please make sure to share the standard floating-point weights (weight) and not the quantized weights (quantized weight).

2) The weights and the quantized weights are synced only after forward() is called, not after a call to backward(). To access the parameters of the network, remember to call forward() once beforehand; otherwise the float weights and the quantized weights will not be in sync.

3) CPU and GPU implementations currently use float values for the quantized weights, since this function is only for simulation purposes.

Parameters
  • inp (Variable) – N-D array.

  • outmaps (int) – Number of convolution kernels (which is equal to the number of output channels). For example, to apply convolution on an input with 16 types of filters, specify 16.

  • kernel (tuple of int) – Convolution kernel size. For example, to apply convolution on an image with a 3 (height) by 5 (width) two-dimensional kernel, specify (3,5).

  • pad (tuple of int) – Padding sizes for dimensions.

  • stride (tuple of int) – Stride sizes for dimensions.

  • dilation (tuple of int) – Dilation sizes for dimensions.

  • group (int) – Number of groups of channels. This makes connections across channels more sparse by grouping connections along map direction.

  • w_init (nnabla.initializer.BaseInitializer or numpy.ndarray) – Initializer for weight.

  • b_init (nnabla.initializer.BaseInitializer or numpy.ndarray) – Initializer for bias.

  • base_axis (int) – Dimensions up to base_axis are treated as the sample dimensions.

  • fix_parameters (bool) – When set to True, the weights and biases will not be updated.

  • rng (numpy.random.RandomState) – Random generator for Initializer.

  • with_bias (bool) – Specify whether to include the bias term.

  • prune_w (bool) – Prune weights if True.

  • rate_w (float) – Pruning rate for weights.

  • prune_b (bool) – Prune bias if True.

  • rate_b (float) – Pruning rate for bias.

Returns

N-D array.

Return type

Variable
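
Example

A minimal usage sketch (the shapes and pruning rate are illustrative assumptions):

import nnabla as nn
import nnabla.parametric_functions as PF

x = nn.Variable((8, 3, 32, 32))
y = PF.pruned_convolution(x, outmaps=16, kernel=(3, 3), pad=(1, 1), rate_w=0.9)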

Parameters to be registered

The following variables are registered in a parameter scope "pruned_conv";

  • W (need_grad=True) : Filter weights in float. (shape: (outmaps, inmaps // group, *kernel))

  • b (need_grad=True) : Bias vector in float. (shape: (outmaps,))

  • W_q (need_grad=False) : Quantized weights. (shape: (outmaps, inmaps // group, *kernel))

  • b_q (need_grad=False) : Quantized biases. (shape: (outmaps,))

Note

If the name option is passed, the parameters become wrapped inside the parameter scope with the specified name, yielding the same results as the following code. This can be used to simplify the code.

with parameter_scope(name):
    output = pruned_convolution(<args>)
nnabla.parametric_functions.min_max_quantize(x, ql_min=0, ql_max=255, decay=0.999, x_min_max=False, ema=False, ste_fine_grained=True, eps=0.01, qr_min_init=None, qr_max_init=None, fix_parameters=False, outputs=None, name=None)[source]

Min-max quantization.

This function uniformly quantizes values into the range between the min and max quantization levels.

Min-max quantization is defined by the following equation,

\[y = round \left(\frac{\min(\max(x, m), M) - m}{scale} \right) \times scale + m,\]

where the \(scale\) is defined as

\[scale = \frac{M - m}{M_q - m_q},\]

and

\[\begin{split}m_q = ql_{min}, \\ M_q = ql_{max}, \\ m = qr_{min}, \\ M = qr_{max}.\end{split}\]

In the backward pass, when ste_fine_grained is False,

\[\frac{\partial q_i}{\partial x_i} = 1.\]

In the backward pass, when ste_fine_grained is True,

\[\begin{split} \frac{\partial q_i}{\partial x_i}= \left\{ \begin{array}{ll} 0 & if \ \ \ x_i > M \\ 1 & if \ \ m \le x_i \le M \\ 0 & if \ \ x_i < m \\ \end{array} \right..\end{split}\]

\(qr_{min}\) and \(qr_{max}\) are treated as follows.

  • x_min_max is True and ema is True: Exponential moving averages are computed for \(\min(x)\) and \(\max(x)\) and then stored in \(qr_{min}\) and \(qr_{max}\).

  • x_min_max is True and ema is False: \(\min(x)\) and \(\max(x)\) are computed and then stored in \(qr_{min}\) and \(qr_{max}\).

  • x_min_max is False and ema is True: The exponential moving averages stored in \(qr_{min}\) and \(qr_{max}\) are used.

  • x_min_max is False and ema is False: Gradients of \(qr_{min}\) and \(qr_{max}\) are computed in the backward pass.

More precisely, in inference with min-max quantization, one has to consider the zero-point (zp), an integer value that corresponds to the real value 0. The zero-point is defined as

\[\begin{split} && zp_f = ql_{min} -\frac{qr_{min}}{scale}, \\ && zp = \left\{ \begin{array}{ll} ql_{max} & if \ \ \ zp_f >= ql_{max} \\ round(zp_f) & if \ \ otherwise \\ ql_{min} & if \ \ zp_f <= ql_{min} \\ \end{array} \right..\end{split}\]

Accordingly, in order to simulate the quantization effect of the zero-point during both the forward and backward passes, \(qr_{min}\) and \(qr_{max}\) are adjusted as follows,

\[\begin{split}qr_{min}^{adj} = (ql_{min} - zp) * scale, \\ qr_{max}^{adj} = (ql_{max} - zp) * scale.\end{split}\]

These operations are often called nudging.

Finally, in the formulas of the min-max quantization, \(m\) and \(M\) are replaced by \(qr_{min}^{adj}\) and \(qr_{max}^{adj}\) respectively.
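
As a concrete illustration of the scale, zero-point, and nudge arithmetic above, here is a small numeric sketch (the numbers are arbitrarily chosen for illustration, not taken from the library):

# 8-bit quantization levels and a hypothetical learned range
ql_min, ql_max = 0, 255
qr_min, qr_max = -1.0, 3.0

scale = (qr_max - qr_min) / (ql_max - ql_min)  # = 4/255, approx. 0.0157
zp_f = ql_min - qr_min / scale                 # approx. 63.75
zp = min(max(round(zp_f), ql_min), ql_max)     # = 64 (rounded and clipped)

# Nudged range used in place of m and M
qr_min_adj = (ql_min - zp) * scale             # approx. -1.004
qr_max_adj = (ql_max - zp) * scale             # approx.  2.996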

Parameters
  • x (Variable) – Input N-D array.

  • ql_min (int, float, or Variable) – Minimum quantization level. Default is 0.

  • ql_max (int, float, or Variable) – Maximum quantization level. Default is 255.

  • decay (float) – Decay rate of the exponential moving average. Default is 0.999.

  • x_min_max (bool) – Use the min and max of x to compute the quantization ranges. Default is False.

  • ema (bool) – Use the exponential moving averages of the quantization ranges. Default is False.

  • ste_fine_grained (bool) – If True, the backward pass uses the fine-grained straight-through estimator described above. Default is True.

  • eps (float) – Epsilon used to keep \(qr_{max} - qr_{min}\) greater than a small positive value. Default is 0.01.

  • qr_min_init (nnabla.initializer.BaseInitializer or numpy.ndarray) – Initializer for \(qr_{min}\).

  • qr_max_init (nnabla.initializer.BaseInitializer or numpy.ndarray) – Initializer for \(qr_{max}\).

  • fix_parameters (bool) – When set to True, the quantization range parameters will not be updated.

References

Benoit Jacob, Skirmantas Kligys, Bo Chen, Menglong Zhu, Matthew Tang, Andrew Howard, Hartwig Adam, and Dmitry Kalenichenko, “Quantization and Training of Neural Networks for Efficient Integer-Arithmetic-Only Inference”, https://arxiv.org/abs/1712.05877

Parameters to be registered

The following variables are registered in a parameter scope "min_max_quantize";

  • qr_min (need_grad=False) : Minimum quantization range, the exponential moving average of the min values of the inputs, initialized with -6.0 if ema is True. (shape: ql_min.shape)

  • qr_max (need_grad=False) : Maximum quantization range, the exponential moving average of the max values of the inputs, initialized with 6.0 if ema is True. (shape: ql_max.shape)

Note

If the name option is passed, the parameters become wrapped inside the parameter scope with the specified name, yielding the same results as the following code. This can be used to simplify the code.

with parameter_scope(name):
    output = min_max_quantize(<args>)
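
A minimal usage sketch (the scope name "qa" and the input values are illustrative):

import numpy as np
import nnabla as nn
import nnabla.parametric_functions as PF

x = nn.Variable.from_numpy_array(np.random.randn(4, 16).astype(np.float32))
# Track the quantization range by an exponential moving average of min(x)/max(x).
y = PF.min_max_quantize(x, ql_min=0, ql_max=255, x_min_max=True, ema=True, name="qa")
y.forward()  # qr_min/qr_max are updated here, since forward() syncs the ranges
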
nnabla.parametric_functions.lstm_cell(x, h, c, state_size, w_init=None, b_init=None, fix_parameters=False, name=None)[source]

Long Short-Term Memory.

Long Short-Term Memory, or LSTM, is a building block for recurrent neural network (RNN) layers. An LSTM unit consists of a cell and input, output, and forget gates, whose functions are defined as follows:

\[\begin{split}f_t&&=\sigma(W_fx_t+U_fh_{t-1}+b_f) \\ i_t&&=\sigma(W_ix_t+U_ih_{t-1}+b_i) \\ o_t&&=\sigma(W_ox_t+U_oh_{t-1}+b_o) \\ c_t&&=f_t\odot c_{t-1}+i_t\odot\tanh(W_cx_t+U_ch_{t-1}+b_c) \\ h_t&&=o_t\odot\tanh(c_t).\end{split}\]

References

S. Hochreiter, and J. Schmidhuber. “Long Short-Term Memory.” Neural Computation. 1997.

Parameters
  • x (Variable) – Input N-D array with shape (batch_size, input_size).

  • h (Variable) – Hidden state with shape (batch_size, state_size).

  • c (Variable) – Cell state with shape (batch_size, state_size).

  • state_size (int) – Internal state size of the LSTM unit.

  • w_init (nnabla.initializer.BaseInitializer or numpy.ndarray) – Initializer for weight.

  • b_init (nnabla.initializer.BaseInitializer or numpy.ndarray) – Initializer for bias.

  • fix_parameters (bool) – When set to True, the weights and biases will not be updated.
Returns

Variable

Parameters to be registered

The following variables are registered in a parameter scope "lstm";

  • affine/W (need_grad=True) : Stacked weight matrices of the LSTM block. (shape: (inmaps, 4, state_size))

  • affine/b (need_grad=True) : Stacked bias vectors of the LSTM block. (shape: (4, state_size))

Note

If the name option is passed, the parameters become wrapped inside the parameter scope with the specified name, yielding the same results as the following code. This can be used to simplify the code.

with parameter_scope(name):
    output = lstm_cell(<args>)
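
A minimal sketch of a single LSTM step (assuming, per the Returns entry above, that the function returns the updated hidden state as a single Variable; the scope name is illustrative):

import nnabla as nn
import nnabla.parametric_functions as PF

batch_size, input_size, state_size = 8, 32, 64
x = nn.Variable((batch_size, input_size))
h = nn.Variable((batch_size, state_size))
c = nn.Variable((batch_size, state_size))
h_t = PF.lstm_cell(x, h, c, state_size, name="lstm")  # updated hidden state
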
class nnabla.parametric_functions.LSTMCell(batch_size, state_size, h=None, c=None, name=None)[source]
__call__(x, w_init=None, b_init=None, fix_parameters=False)[source]

Updates h and c by calling the lstm_cell function.

Parameters
nnabla.parametric_functions.spectral_norm(w, dim=0, itr=1, eps=1e-12, test=False, u_init=None, fix_parameters=True, name=None)[source]

Spectral Normalization.

\[W_{sn} = \frac{W}{\sigma(W)}.\]

where \(W\) is the input matrix and \(\sigma(W)\) is the spectral norm of \(W\). The spectral norm is approximated by power iteration.

References

Takeru Miyato, Toshiki Kataoka, Masanori Koyama, Yuichi Yoshida, “Spectral Normalization for Generative Adversarial Networks”, International Conference on Learning Representations. 2018.

Parameters
  • W (Variable) – Input N-D array. This is normally a network parameter.

  • dim (int) – Output dimension. Default is 0. If the dimension is not 0, the specified dimension is moved to the left-most position by transposition.

  • itr (int) – Number of iterations. Default is 1.

  • eps (float) – Epsilon for the normalization. Default is 1e-12.

  • test (bool) – Use test mode. Default is False.

Returns

Spectrally normalized \(W_{sn}\) with the same shape as \(W\).

Return type

Variable

Example

import numpy as np
import nnabla as nn
import nnabla.parametric_functions as PF

b, c, h, w = 4, 64, 32, 32

# Spectrally normalized convolution
apply_w = lambda w: PF.spectral_norm(w, dim=0)
h = nn.Variable.from_numpy_array(np.random.randn(b, c, h, w))
h = PF.convolution(h, with_bias=False, apply_w=apply_w)

# Spectrally normalized affine
apply_w = lambda w: PF.spectral_norm(w, dim=1)
h = nn.Variable.from_numpy_array(np.random.randn(b, c))
h = PF.affine(h, with_bias=False, apply_w=apply_w)

# Spectrally normalized embed
apply_w = lambda w: PF.spectral_norm(w, dim=1)
h = nn.Variable.from_numpy_array(np.random.randn(b, c))
h = PF.embed(h, c, apply_w=apply_w)
Parameters to be registered

The following variables are registered in a parameter scope "spectral-norm";

  • u (need_grad=False) : singular vector. (shape: (w.shape[dim], ))

Note

If the name option is passed, the parameters become wrapped inside the parameter scope with the specified name, yielding the same results as the following code. This can be used to simplify the code.

with parameter_scope(name):
    output = spectral_norm(<args>)
nnabla.parametric_functions.weight_normalization(w, dim=0, eps=1e-12, g_init=None, fix_parameters=False, name=None)[source]

Weight Normalization.

\[\mathbf{w}_{WN} = g \dfrac{\mathbf{w}}{\|\mathbf{w}\|}\]

where \(\mathbf{w}\) is the input weight to be normalized and \(g\) is a set of learnable multiplication factors, each of which is applied to the input weights at dim. This function is generally used as a callback passed to apply_w for PF.convolution, PF.affine, and so on. According to the authors' original implementation, the weight \(\mathbf{w}\) should be initialized by \(N(0, 0.05)\). To meet this condition, the initializer should be passed to the convolution to which Weight Normalization is applied, as in the example below.

References

Tim Salimans and Diederik P. Kingma, “Weight Normalization: A Simple Reparameterization to Accelerate Training of Deep Neural Networks”, https://arxiv.org/abs/1602.07868

Parameters
  • W (Variable) – Input N-D array with shape. This is normally network parameter.

  • dim (int) – Output dimension. Default is 0. If the dimension is not 0, then the specified dimension becomes the most-left dimension by transposing.

  • eps (float) – Epsilon for the normalization. Default is 1e-12.

  • g_init (nnabla.initializer.BaseInitializer or numpy.ndarray) – Initializer for the scale. By default, L2-norm of weights corresponding to dim are used.

Returns

The normalized weight \(\mathbf{w}_{WN}\) with the same shape as the input \(\mathbf{w}\).

Return type

Variable

Example

import nnabla as nn
import nnabla.parametric_functions as PF
import nnabla.initializer as I

# h is nn.Variable.

# convolution
# according to the original implementation, w should be initialized by N(0, 0.05).
h = PF.convolution(h, ..., apply_w=PF.weight_normalization, w_init=I.NormalInitializer(0.05))

# affine
h = PF.affine(h, ..., apply_w=lambda w: PF.weight_normalization(w, dim=1), w_init=I.NormalInitializer(0.05))

Warning

Up to version 1.10.0, this had been implemented as composite functions.

Parameters to be registered

The following variables are registered in a parameter scope "wn";

  • g (need_grad=True) : Weight Normalization adaptive scale scalar. (shape: w.shape[dim])

Note

If the name option is passed, the parameters become wrapped inside the parameter scope with the specified name, yielding the same results as the following code. This can be used to simplify the code.

with parameter_scope(name):
    output = weight_normalization(<args>)
nnabla.parametric_functions.multi_head_attention(query, key, value, num_heads=12, dropout=0.0, k_embed_dim=None, v_embed_dim=None, out_dim=None, rng=None, with_bias=True, add_attn_bias=False, additive_mask=None, key_padding_mask=None, fix_parameters=False, param_init=None, name=None)[source]

MultiHeadAttention.

Computes multi-headed attention with query, key, and value. We use the following notations to describe the inputs and outputs below. \(L_T\): target sequence length, \(L_S\): source sequence length, \(B\): batch size, \(D\): input dimension, \(E\): embedding dimension.

References

A. Vaswani et al. “Attention is All You Need.” NIPS. 2017. <https://papers.nips.cc/paper/7181-attention-is-all-you-need.pdf>

Example:

q = nn.Variable((tgt_len, batch_size, q_input_dim))
k = nn.Variable((src_len, batch_size, k_input_dim))
v = nn.Variable((src_len, batch_size, v_input_dim))

out, w = PF.multi_head_attention(q, k, v)
out.forward()
Parameters
  • query (Variable) – Input N-D array with shape \((L_T, B, D_q)\).

  • key (Variable) – Input N-D array with shape \((L_S, B, D_k)\).

  • value (Variable) – Input N-D array with shape \((L_S, B, D_v)\).

  • num_heads (int, optional) – Number of attention heads. Note that the embedding dimension E must be divisible by the number of heads. Default is 12, which is conventional.

  • dropout (float, optional) – Dropout ratio applied to parameters. Default is 0.

  • k_embed_dim (int, optional) – Embedding dimension for key. If specified, the embedding dimensions for both query and key are set to that value. Otherwise, k_embed_dim is set to the same value as the embedding dimension for query.

  • v_embed_dim (int, optional) – Embedding dimension for value. If not specified, it defaults to the same value as the embedding dimension for query.

  • out_dim (int, optional) – Embedding dimension for the output weight. If not specified, it defaults to the same value as the embedding dimension for value.

  • rng (numpy.random.RandomState, optional) – Random generator for Initializer. Default is None.

  • with_bias (bool, optional) – Specify whether to include the bias parameters. Default is True.

  • add_attn_bias (bool, optional) – Specify whether to add attention bias parameters for key and value. Default is False.

  • additive_mask (Variable, optional) – Input N-D array with shape \((L_T, L_S)\). Values will be added to the attention layer to prevent attention to certain positions.

  • key_padding_mask (Variable, optional) – Input N-D array with shape \((B, L_S)\). Specified padding elements will be ignored by the attention layer. Values must be either 1 or 0.

  • fix_parameters (bool, optional) – When set to True, the weights and biases will not be updated. Default is False.

  • param_init (dict, optional) – Parameter initializers can be set with a dict. Possible keys of the dict include q_weight, k_weight, v_weight, q_bias, k_bias, v_bias, out_weight, out_bias, attn_bias_k, attn_bias_v. A value of the dict must be an Initializer or a numpy.ndarray. E.g. {'q_bias': ConstantInitializer(0)}.

Returns

Output \(y\) with shape \((L_T, B, E)\), and the attention weights with shape \((B, L_T, L_S)\).

Return type

Tuple of Variable

Parameters to be registered

The following variables are registered in a parameter scope "multi_head_attention";

  • q_weight (need_grad=True) : weights for query. (shape: (E, E))

  • k_weight (need_grad=True) : weights for key. (shape: (E_k, E))

  • v_weight (need_grad=True) : weights for value. (shape: (E_v, E))

  • out_weight (need_grad=True) : weights for the output projection. (shape: (E, E))

  • q_bias (need_grad=True) : bias for query. (shape: (E, ))

  • k_bias (need_grad=True) : bias for key. (shape: (E, ))

  • v_bias (need_grad=True) : bias for value. (shape: (E, ))

  • out_bias (need_grad=True) : bias for out projection. (shape: (E, ))

  • attn_bias_k (need_grad=True) : attention bias for key. (shape: (E, 1))

  • attn_bias_v (need_grad=True) : attention bias for value. (shape: (E, 1))

Note

If the name option is passed, the parameters become wrapped inside the parameter scope with the specified name, yielding the same results as the following code. This can be used to simplify the code.

with parameter_scope(name):
    output = multi_head_attention(<args>)
nnabla.parametric_functions.transformer(src, tgt, embed_dim=512, num_heads=8, num_encoder_layers=6, num_decoder_layers=6, dim_feedforward=2048, dropout=0.1, activation=None, src_additive_mask=None, tgt_additive_mask=None, memory_additive_mask=None, src_key_padding_mask=None, tgt_key_padding_mask=None, memory_key_padding_mask=None, rng=None, add_attn_bias=False, fix_parameters=False, name=None)[source]

Transformer.

We use the following notations to describe the inputs and outputs below. \(L_T\): target sequence length, \(L_S\): source sequence length, \(B\): batch size, \(E\): embedding dimension.

References

A. Vaswani et al. “Attention is All You Need.” NIPS. 2017. <https://papers.nips.cc/paper/7181-attention-is-all-you-need.pdf>

Examples:

src = nn.Variable((src_len, batch_size, embed_dim), need_grad=True)
tgt = nn.Variable((tgt_len, batch_size, embed_dim), need_grad=True)
out = PF.transformer(src, tgt, num_heads=16, num_encoder_layers=12)
out.forward()
Parameters
  • src (Variable) – Input source sequence to the encoder with shape \((L_S, B, E)\).

  • tgt (Variable) – Input target sequence to the decoder with shape \((L_T, B, E)\).

  • embed_dim (int, optional) – Embedding dimension to be used. Default is 512.

  • num_heads (int, optional) – Number of attention heads. Default is 8.

  • num_encoder_layers (int, optional) – Number of encoder layers to stack. Default is 6.

  • num_decoder_layers (int, optional) – Number of decoder layers to stack. Default is 6.

  • dim_feedforward (int, optional) – Dimension of the feedforward network model. Default is 2048.

  • dropout (float, optional) – Dropout ratio applied. Default is 0.1.

  • activation (function, optional) – Non-linear activation function to be used. Default is None, in which case F.relu is used.

  • src_additive_mask (Variable, optional) – Additive mask for the src sequence (optional). \((L_S, L_S)\).

  • tgt_additive_mask (Variable, optional) – Additive mask for the tgt sequence (optional). \((L_T, L_T)\).

  • memory_additive_mask (Variable, optional) – Additive mask for the encoder output (optional). \((L_T, L_S)\).

  • src_key_padding_mask (Variable, optional) – Key padding mask for src keys per batch (optional). \((B, L_S)\). Specified padding elements will be ignored by the attention layer. Values must be either 1 or 0.

  • tgt_key_padding_mask (Variable, optional) – Key padding mask for tgt keys per batch (optional). \((B, L_T)\). Specified padding elements will be ignored by the attention layer. Values must be either 1 or 0.

  • memory_key_padding_mask (Variable, optional) – Key padding mask for memory keys per batch (optional). \((B, L_S)\). Specified padding elements will be ignored by the attention layer. Values must be either 1 or 0.

  • rng (numpy.random.RandomState, optional) – Random generator for Initializer. Default is None.

  • add_attn_bias (bool, optional) – Specify whether to add attention bias parameters for key and value. Default is False.

  • fix_parameters (bool, optional) – When set to True, the weights and biases will not be updated. Default is False.

Returns

Output \(y\) with shape \((L_T, B, E)\)

Return type

Variable

Parameters to be registered

The following variables are registered in a parameter scope "transformer";

  • encoder{layer#} (need_grad=True) : parameters for the n’th encoder layer. (shape: Refer to transformer_encode for details)

  • decoder{layer#} (need_grad=True) : parameters for the n’th decoder layer. (shape: Refer to transformer_decode for details)

Note

If the name option is passed, the parameters become wrapped inside the parameter scope with the specified name, yielding the same results as the following code. This can be used to simplify the code.

with parameter_scope(name):
    output = transformer(<args>)
nnabla.parametric_functions.transformer_encode(src, embed_dim, num_heads, dim_feedforward=2048, dropout=0.1, activation=None, src_additive_mask=None, src_key_padding_mask=None, rng=None, add_attn_bias=False, fix_parameters=False, name=None)[source]

Transformer Encoder.

Parameters
  • src (Variable) – Input sequence to the encoder layer with shape \((L_S, B, E)\).

  • embed_dim (int) – Embedding dimension.

  • num_heads (int) – Number of attention heads.

  • dim_feedforward (int, optional) – Dimension of the feedforward network model. Default is 2048.

  • dropout (float, optional) – Dropout ratio. Default is 0.1.

  • activation (function, optional) – Non-linear activation function to be used. Default is None, in which case F.relu is used.

  • src_additive_mask (Variable, optional) – Additive mask for the source sequence with shape \((L_S, L_S)\)

  • src_key_padding_mask (Variable, optional) – Padding mask for the source sequence with shape \((B, L_S)\). Specified padding elements will be ignored by the attention layer. Values must be either 1 or 0.

  • rng (numpy.random.RandomState, optional) – Random generator for Initializer. Default is None.

  • add_attn_bias (bool, optional) – Specify whether to add attention bias parameters for key and value. Default is False.

  • fix_parameters (bool, optional) – When set to True, the weights and biases will not be updated. Default is False.

Returns

Output \(y\) with shape \((L_S, B, E)\)

Return type

Variable

Parameters to be registered

The following variables are registered in a parameter scope "transformer_encode";

  • src_self_attn (need_grad=True) : self-attention parameters for source sequence. (shape: Refer to multi_head_attention for details)

  • enc_affine1 (need_grad=True) : first affine used in encoder. (shape: Refer to affine for details)

  • enc_affine2 (need_grad=True) : second affine used in encoder. (shape: Refer to affine for details)

  • enc_layer_norm1 (need_grad=True) : first layer normalization used in encoder. (shape: Refer to layer_normalization for details)

  • enc_layer_norm2 (need_grad=True) : second layer normalization used in encoder. (shape: Refer to layer_normalization for details)

Note

If the name option is passed, the parameters become wrapped inside the parameter scope with the specified name, yielding the same results as the following code. This can be used to simplify the code.

with parameter_scope(name):
    output = transformer_encode(<args>)
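
A short usage sketch (shapes follow the notation above; the dimensions and scope name are illustrative):

import nnabla as nn
import nnabla.parametric_functions as PF

src_len, batch_size, embed_dim = 10, 4, 512
src = nn.Variable((src_len, batch_size, embed_dim))
memory = PF.transformer_encode(src, embed_dim, num_heads=8, name="enc")  # (L_S, B, E)
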
nnabla.parametric_functions.transformer_decode(tgt, memory, embed_dim, num_heads, dim_feedforward=2048, dropout=0.1, activation=None, tgt_additive_mask=None, memory_additive_mask=None, tgt_key_padding_mask=None, memory_key_padding_mask=None, rng=None, add_attn_bias=False, fix_parameters=False, name=None)[source]

Transformer Decoder.

Parameters
  • tgt (Variable) – Input sequence to the decoder layer with shape \((L_T, B, E)\).

  • memory (Variable) – Output sequence from the last layer of the encoder with shape \((L_S, B, E)\).

  • embed_dim (int) – Embedding dimension.

  • num_heads (int) – Number of attention heads.

  • dim_feedforward (int, optional) – Dimension of the feedforward network model. Default is 2048.

  • dropout (float, optional) – Dropout ratio. Default is 0.1.

  • activation (function, optional) – Non-linear activation function to be used. Default is None, in which case F.relu is used.

  • tgt_additive_mask (Variable, optional) – Additive mask for the target sequence with shape \((L_T, L_T)\).

  • memory_additive_mask (Variable, optional) – Additive mask for the memory sequence with shape \((L_T, L_S)\).

  • tgt_key_padding_mask (Variable, optional) – Padding mask for the target sequence with shape \((B, L_T)\). Specified padding elements will be ignored by the attention layer. Values must be either 1 or 0.

  • memory_key_padding_mask (Variable, optional) – Padding mask for the memory sequence with shape \((B, L_S)\). Specified padding elements will be ignored by the attention layer. Values must be either 1 or 0.

  • rng (numpy.random.RandomState) – Random generator for Initializer. Default is None.

  • add_attn_bias (bool, optional) – Specify whether to add attention bias parameters for key and value. Default is False.

  • fix_parameters (bool) – When set to True, the weights and biases will not be updated. Default is False.

Returns

Output \(y\) with shape \((L_T, B, E)\)

Return type

Variable

Parameters to be registered

The following variables are registered in a parameter scope "transformer_decode";

  • tgt_self_attn (need_grad=True) : self-attention parameters for target sequence. (shape: Refer to multi_head_attention for details)

  • tgt_memory_attn (need_grad=True) : attention parameters for target sequence with output from encoder as key. (shape: Refer to multi_head_attention for details)

  • dec_affine1 (need_grad=True) : first affine used in decoder. (shape: Refer to affine for details)

  • dec_affine2 (need_grad=True) : second affine used in decoder. (shape: Refer to affine for details)

  • dec_layer_norm1 (need_grad=True) : first layer normalization used in decoder. (shape: Refer to layer_normalization for details)

  • dec_layer_norm2 (need_grad=True) : second layer normalization used in decoder. (shape: Refer to layer_normalization for details)

  • dec_layer_norm3 (need_grad=True) : third layer normalization used in decoder. (shape: Refer to layer_normalization for details)

Note

If the name option is passed, the parameters become wrapped inside the parameter scope with the specified name, yielding the same results as the following code. This can be used to simplify the code.

with parameter_scope(name):
    output = transformer_decode(<args>)
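
A short usage sketch pairing the decoder with an encoder output (shapes follow the notation above; the dimensions and scope names are illustrative):

import nnabla as nn
import nnabla.parametric_functions as PF

src_len, tgt_len, batch_size, embed_dim = 10, 12, 4, 512
src = nn.Variable((src_len, batch_size, embed_dim))
tgt = nn.Variable((tgt_len, batch_size, embed_dim))
memory = PF.transformer_encode(src, embed_dim, num_heads=8, name="enc")
out = PF.transformer_decode(tgt, memory, embed_dim, num_heads=8, name="dec")  # (L_T, B, E)
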
Parameter Initializer

Some of the parametric functions optionally take a parameter initializer from the list below.

class nnabla.initializer.BaseInitializer[source]

Base class of the parameter initializer.

__call__(shape)[source]

Generates an array with an initializer.

Parameters

shape (tuple of int) – Shape of the numpy.ndarray to be created.

Returns

Array.

Return type

numpy.ndarray

Note

Subclasses of BaseInitializer must override this method.
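
For illustration, a minimal sketch of a custom initializer (the class and its clipping behavior are hypothetical, not part of the library):

import numpy as np
import nnabla.initializer as I

class ClippedNormalInitializer(I.BaseInitializer):
    # Hypothetical initializer: normal random values clipped to two sigma.
    def __init__(self, sigma=1.0, rng=None):
        self.sigma = sigma
        self.rng = rng if rng is not None else np.random.RandomState()

    def __call__(self, shape):
        x = self.rng.randn(*shape) * self.sigma
        return np.clip(x, -2 * self.sigma, 2 * self.sigma)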

class nnabla.initializer.ConstantInitializer(value=0)[source]

Bases: nnabla.initializer.BaseInitializer

Generates a constant valued array.

Parameters

value (float) – A constant value.

Example:

import nnabla as nn
import nnabla.parametric_functions as PF
import nnabla.initializer as I

x = nn.Variable([60,1,28,28])
w = I.ConstantInitializer(0.1)
b = I.ConstantInitializer() # this generates constant valued array of default value 0
h = PF.convolution(x, 64, [3, 3], w_init=w, b_init=b, pad=[1, 1], name='conv')
class nnabla.initializer.NormalInitializer(sigma=1.0, rng=None)[source]

Bases: nnabla.initializer.BaseInitializer

Generates a random array from a specified normal distribution.

\[\mathbf x \sim {\cal N} (\mathbf 0 | \sigma^2 \mathbf I)\]
Parameters
  • sigma (float) – Standard deviation (\(\sigma\)) of the normal distribution.

  • rng (numpy.random.RandomState) – Random number generator.

Example:

import nnabla as nn
import nnabla.parametric_functions as PF
import nnabla.initializer as I

x = nn.Variable([60,1,28,28])
w = I.NormalInitializer(5e-5)
b = I.NormalInitializer(0.0)
h = PF.convolution(x, 64, [3, 3], w_init=w, b_init=b, pad=[1, 1], name='conv')
class nnabla.initializer.UniformInitializer(lim=(- 1, 1), rng=None)[source]

Bases: nnabla.initializer.BaseInitializer

Generates a random array from a specified uniform distribution.

\[\mathbf x \sim {\cal U} (a, b)\]
Parameters
  • lim (tuple of float) – Lower and upper bounds \((a, b)\) of the uniform distribution.

  • rng (numpy.random.RandomState) – Random number generator.

Example:

import nnabla as nn
import nnabla.parametric_functions as PF
import nnabla.initializer as I

x = nn.Variable([60,1,28,28])
w = I.UniformInitializer() # this generates uniform distribution within the default range of (-1,1)
b = I.UniformInitializer((-0.5,0.5))
h = PF.convolution(x, 64, [3, 3], w_init=w, b_init=b, pad=[1, 1], name='conv')
class nnabla.initializer.UniformIntInitializer(lim=(0, 10), rng=None)[source]

Bases: nnabla.initializer.BaseInitializer

Generates a random array from a specified integer uniform distribution.

\[\mathbf x \sim {\cal U} ([a, b))\]
Parameters
  • lim (tuple of int) – Lower bound \(a\) (inclusive) and upper bound \(b\) (exclusive) of the integer range.

  • rng (numpy.random.RandomState) – Random number generator.

Example:

import nnabla as nn
import nnabla.parametric_functions as PF
import nnabla.initializer as I

x = nn.Variable([60,1,28,28])
w = I.UniformIntInitializer() # this generates uniform integer distribution within the default range of (0,10)
b = I.UniformIntInitializer((-1,1))
h = PF.convolution(x, 64, [3, 3], w_init=w, b_init=b, pad=[1, 1], name='conv')
class nnabla.initializer.RangeInitializer(start=0, step=1)[source]

Bases: nnabla.initializer.BaseInitializer

Generates an array with sequence of numbers.

\[\mathbf x[i] = start + step * i\]
Parameters
  • start (int) – A start value.

  • step (int) – A step value.

Example:

import nnabla as nn
import nnabla.initializer as I

x = nn.Variable([100])
x.d = I.RangeInitializer(0, 1)(x.shape)
class nnabla.initializer.OrthogonalInitializer(gain=1.0, rng=None)[source]

Bases: nnabla.initializer.BaseInitializer

Generates an orthogonal weight matrix, as proposed by Saxe et al.

Parameters
  • gain (float) – Scaling factor, which should be chosen depending on the type of units.

  • rng (numpy.random.RandomState) – Random number generator.

Example:

import numpy as np
import nnabla as nn
import nnabla.parametric_functions as PF
import nnabla.initializer as I

x = nn.Variable([60,1,28,28])
w = I.OrthogonalInitializer(np.sqrt(2.0))
b = I.ConstantInitializer(0.0)
h = PF.convolution(x, 64, [3, 3], w_init=w, b_init=b, pad=[1, 1], name='conv')

References

Andrew M. Saxe, James L. McClelland, and Surya Ganguli, “Exact solutions to the nonlinear dynamics of learning in deep linear neural networks”, https://arxiv.org/abs/1312.6120

class nnabla.initializer.WeightNormalizationScaleInitializer(w, dim=0, eps=1e-12)[source]

Bases: nnabla.initializer.BaseInitializer

Compute the L2-norm for each weight kernel.

This initializer is specific to the weight normalization scale; it keeps the magnitude of the originally initialized weights even after the application of weight normalization, at initialization time only.

Parameters
  • w (Variable) – Weight to which the weight normalization is applied.

  • dim (int) – Output dimension of the weight normalization.

  • eps (float) – Epsilon of the weight normalization.

nnabla.initializer.calc_normal_std_he_forward(inmaps, outmaps, kernel=(1, 1))[source]

Calculates the standard deviation proposed by He et al.

\[\sigma = \sqrt{\frac{2}{NK}}\]
Parameters
  • inmaps (int) – Map size of an input Variable, \(N\).

  • outmaps (int) – Map size of an output Variable, \(M\).

  • kernel (tuple of int) – Convolution kernel spatial shape. In above definition, \(K\) is the product of shape dimensions. In Affine, the default value should be used.

Example:

import nnabla as nn
import nnabla.parametric_functions as PF
import nnabla.initializer as I

x = nn.Variable([60,1,28,28])
s = I.calc_normal_std_he_forward(x.shape[1],64)
w = I.NormalInitializer(s)
b = I.ConstantInitializer(0)
h = PF.convolution(x, 64, [3, 3], w_init=w, b_init=b, pad=[1, 1], name='conv')

References

Kaiming He, Xiangyu Zhang, Shaoqing Ren, and Jian Sun, “Delving Deep into Rectifiers: Surpassing Human-Level Performance on ImageNet Classification”, https://arxiv.org/abs/1502.01852

nnabla.initializer.calc_normal_std_he_backward(inmaps, outmaps, kernel=(1, 1))[source]

Calculates the standard deviation of He et al. (backward case).

\[\sigma = \sqrt{\frac{2}{MK}}\]
Parameters
  • inmaps (int) – Map size of an input Variable, \(N\).

  • outmaps (int) – Map size of an output Variable, \(M\).

  • kernel (tuple of int) – Convolution kernel spatial shape. In above definition, \(K\) is the product of shape dimensions. In Affine, the default value should be used.

Example:

import nnabla as nn
import nnabla.parametric_functions as PF
import nnabla.initializer as I

x = nn.Variable([60,1,28,28])
s = I.calc_normal_std_he_backward(x.shape[1],64)
w = I.NormalInitializer(s)
b = I.ConstantInitializer(0)
h = PF.convolution(x, 64, [3, 3], w_init=w, b_init=b, pad=[1, 1], name='conv')

References

Kaiming He, Xiangyu Zhang, Shaoqing Ren, and Jian Sun, “Delving Deep into Rectifiers: Surpassing Human-Level Performance on ImageNet Classification”, https://arxiv.org/abs/1502.01852

nnabla.initializer.calc_normal_std_glorot(inmaps, outmaps, kernel=(1, 1))[source]

Calculates the standard deviation proposed by Glorot et al.

Note

We have updated the definition as follows from v1.2. It may affect the behavior of existing scripts that rely on the default initialization.

\[\sigma = \sqrt{\frac{2}{K(N + M)}}\]
Parameters
  • inmaps (int) – Map size of an input Variable, \(N\).

  • outmaps (int) – Map size of an output Variable, \(M\).

  • kernel (tuple of int) – Convolution kernel spatial shape. In above definition, \(K\) is the product of shape dimensions. In Affine, the default value should be used.

Example:

import nnabla as nn
import nnabla.parametric_functions as PF
import nnabla.initializer as I

x = nn.Variable([60,1,28,28])
s = I.calc_normal_std_glorot(x.shape[1],64)
w = I.NormalInitializer(s)
b = I.ConstantInitializer(0)
h = PF.convolution(x, 64, [3, 3], w_init=w, b_init=b, pad=[1, 1], name='conv')

References

Xavier Glorot and Yoshua Bengio, “Understanding the difficulty of training deep feedforward neural networks”, AISTATS 2010.

nnabla.initializer.calc_uniform_lim_glorot(inmaps, outmaps, kernel=(1, 1))[source]

Calculates the lower bound and the upper bound of the uniform distribution proposed by Glorot et al.

Note

We have updated the definition as follows from v1.3. It may affect the behavior of existing scripts that rely on the default initialization.

\[\begin{split}b &= \sqrt{\frac{6}{K(N + M)}}\\ a &= -b\end{split}\]
Parameters
  • inmaps (int) – Map size of an input Variable, \(N\).

  • outmaps (int) – Map size of an output Variable, \(M\).

  • kernel (tuple of int) – Convolution kernel spatial shape. In above definition, \(K\) is the product of shape dimensions. In Affine, the default value should be used.

Example:

import nnabla as nn
import nnabla.parametric_functions as PF
import nnabla.initializer as I

x = nn.Variable([60,1,28,28])
lb,ub= I.calc_uniform_lim_glorot(x.shape[1],64)
w = I.UniformInitializer((lb,ub))
b = I.ConstantInitializer(0)
h = PF.convolution(x, 64, [3, 3], w_init=w, b_init=b, pad=[1, 1], name='conv')

References

Xavier Glorot and Yoshua Bengio, “Understanding the difficulty of training deep feedforward neural networks”, AISTATS 2010.

Grad

nnabla.grad.grad(outputs, inputs, grad_outputs=None, persistent_outputs=[], bind_grad_output=False)[source]

Gradient function for the outputs with respect to the inputs.

The grad function computes the sum of gradients of the outputs w.r.t. the inputs.

\[g_i = \sum_{j} {\frac{\partial y_j}{\partial x_i}},\]

where \(y_j\) is each output, \(x_i\) is each input, and \(g_i\) is the sum of the gradients of \(y_j\) w.r.t. \(x_i\) over all \(j\).
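
A toy sketch of this summation over outputs (operator overloads on Variable are used to build two scalar graphs from one input):

import numpy as np
import nnabla as nn

x = nn.Variable.from_numpy_array(np.array([2.0], dtype=np.float32)).apply(need_grad=True)
y1 = x ** 2       # dy1/dx = 2x = 4
y2 = 3 * x        # dy2/dx = 3
g = nn.grad([y1, y2], [x])[0]
g.forward()
print(g.d)        # [7.] = 4 + 3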

Parameters
  • outputs (list of Variable or Variable) – Outputs of the differentiable function.

  • inputs (list of Variable or Variable) – Inputs w.r.t. which the gradients of outputs are computed.

  • grad_outputs (None, scalar, numpy.ndarray, nnabla.NdArray, or list of scalar, numpy.ndarray, or nnabla.NdArray) – Gradient outputs corresponding to outputs. This is the same as the grad argument of backward(). Default is None, in which case 1 is used as the incoming gradient at the very beginning of the Variable in the gradient graph.

  • persistent_outputs (list of bool) – Outputs become persistent accordingly. If not specified, all outputs become persistent.

  • bind_grad_output (bool) – Bind data to grad of input variable. This is useful for the case where one wants to use the gradient graph for training a neural network using the first-order gradients only. Default is False.

Returns

List of Variable.

If the backpropagation does not reach input(s), the corresponding returned value(s) are zero (i.e., the gradients w.r.t. inputs are zero) and not connected as a part of the gradient graph.

Example (Gradient Penalty):

import nnabla as nn
import nnabla.functions as F
import nnabla.parametric_functions as PF
import numpy as np
from nnabla.ext_utils import get_extension_context

# Context
extension_module = "cudnn"
ctx = get_extension_context(extension_module)
nn.set_default_context(ctx)

# Input and label
x = nn.Variable.from_numpy_array(np.random.randn(4, 3, 32, 32))
y = nn.Variable.from_numpy_array(np.random.randint(0, 10, 4).reshape(4, 1))

# Network
h = PF.convolution(x, 8, (3, 3), (1, 1), name="conv1")
h = F.relu(h)
h = F.max_pooling(h, (2, 2))
h = PF.convolution(h, 16, (3, 3), (1, 1), name="conv2")
h = F.relu(h)
h = F.max_pooling(h, (2, 2))
p = PF.affine(h, 10, name="pred")
loss = F.mean(F.softmax_cross_entropy(p, y))

# Grad
outputs = [loss]
inputs = nn.get_parameters().values()
grads = nn.grad(outputs, inputs)  # gradients of the parameters

# Backward of the outputs w.r.t. the parameters by constraining the gradient norms.
t = 0 # or 1
gp = sum([(F.sum(g ** 2) ** 0.5 - t) ** 2 for g in grads])
loss += gp
loss.forward()
loss.backward()

Example (Higher-order Gradients):

import nnabla as nn
import nnabla.functions as F
import numpy as np

x = nn.Variable.from_numpy_array(np.random.randn(2, 2)).apply(need_grad=True)
x.grad.zero()
y = F.sin(x)
def grad(y, x, n=1):
    dx = [y]
    for _ in range(n):
        dx = nn.grad([dx[0]], [x])
    return dx[0]
dnx = grad(y, x, n=10)
dnx.forward()
print(np.allclose(-np.sin(x.d), dnx.d))
dnx.backward()
print(np.allclose(-np.cos(x.d), x.g))

# Show the supported status for each function
from nnabla.backward_functions import show_registry
show_registry()
nnabla.backward_functions.register(func_name, func)[source]

Register the backward function to a function.

Parameters
  • func_name (str) – The function class name, for example, Affine.

  • func (function) – The function to be called as the backward function of the function func_name. Arguments of the func must be (ctx: nn.Context, inputs: list of nn.Variable, **kwargs). The inputs are the inputs to the function named func_name, and the kwargs are the arguments of that function. For example, if func_name is Affine and func is affine_backward, the inputs are data, weights, and bias if necessary, and kwargs = dict(base_axis=base_axis). See the sketch below.
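
For illustration, a minimal sketch of registering a custom backward function. The function name "MyIdentity" and its backward are hypothetical; only the documented (ctx, inputs, **kwargs) signature and the register/show_registry calls come from the API above:

from nnabla.backward_functions import register, show_registry

# Hypothetical backward for a custom "MyIdentity" function: the incoming
# gradient is passed straight through to the single input.
def my_identity_backward(ctx, inputs, **kwargs):
    dy = inputs[0]   # gradient w.r.t. the output
    return [dy]      # gradients w.r.t. each input (hypothetical return contract)

register("MyIdentity", my_identity_backward)
show_registry()  # "MyIdentity" now appears in the registry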

nnabla.backward_functions.show_registry()[source]

Show the registry of all backward functions.

Solvers

The nnabla.solvers.Solver class represents a stochastic gradient descent based optimizer for optimizing the parameters in the computation graph. NNabla provides various solvers listed below.

Solver
class nnabla.solvers.Solver

Solver interface class.

The same API provided in this class can be used to implement various types of solvers.

Example:

# Network building comes above
import nnabla.solvers as S
solver = S.Sgd(lr=1e-3)
solver.set_parameters(nn.get_parameters())

for itr in range(num_itr):
    x.d = ... # set data
    t.d = ... # set label
    loss.forward()
    solver.zero_grad()  # Initialize all gradient buffers to 0
    loss.backward()
    solver.weight_decay(decay_rate)  # Apply weight decay
    solver.clip_grad_by_norm(clip_norm)  # Apply clip grad by norm
    solver.update()  # Update the parameters

Note

All solvers provided by NNabla belong to classes derived from Solver. The Solver class itself is never instantiated.

check_inf_grad(self, pre_hook=None, post_hook=None)

Check if there is any inf in the gradients that were set up.

check_inf_or_nan_grad(self, pre_hook=None, post_hook=None)

Check if there is any inf or nan in the gradients that were set up.

check_nan_grad(self, pre_hook=None, post_hook=None)

Check if there is any nan in the gradients that were set up.

clear_parameters(self)

Clear all registered parameters and states.

clip_grad_by_norm(self, float clip_norm, pre_hook=None, post_hook=None)

Clip gradients by norm. When called, the gradient will be clipped by the given norm.

Parameters

clip_norm (float) – The value of clipping norm.

get_parameters(self)

Get all registered parameters

get_states(self)

Get all states

info

Solver information object.

Type

object

learning_rate(self)

Get the learning rate.

load_states(self, path)

Load solver states.

Parameters

path – path to the state file to be loaded.

name

Get the name of the solver.

remove_parameters(self, vector[string] keys)

Remove previously registered parameters, specified by a vector of their keys.

save_states(self, path)

Save solver states.

Parameters

path – Path or file object to which the states are saved.
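
A short checkpointing sketch combining parameter and solver-state files (the file names are illustrative):

import nnabla as nn
import nnabla.solvers as S

solver = S.Adam(alpha=1e-3)
solver.set_parameters(nn.get_parameters())

# Save both the parameters and the solver states (e.g., Adam moments).
nn.save_parameters("params.h5")
solver.save_states("solver_states.h5")

# Later: rebuild the graph, re-register the parameters, then restore the states.
nn.load_parameters("params.h5")
solver.set_parameters(nn.get_parameters())
solver.load_states("solver_states.h5")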

scale_grad(self, scale, pre_hook=None, post_hook=None)

Rescale the gradients by the given scale factor.

set_learning_rate(self, learning_rate)

Set the learning rate.

set_parameters(self, param_dict, bool reset=True, bool retain_state=False)

Set parameters by dictionary of keys and parameter Variables.

Parameters
  • param_dict (dict) – key:string, value: Variable.

  • reset (bool) – If true, clear all parameters before setting parameters. If false, parameters are overwritten or added (if it’s new).

  • retain_state (bool) – The value is only considered if reset is false. If true and a key already exists (overwriting), a state (such as momentum) associated with the key will be kept if the shape of the parameter and that of the new param match.

set_states(self, states)

Set states. Call set_parameters to initialize the states of a solver first; otherwise this method raises a ValueError.

set_states_from_protobuf(self, optimizer_proto)

Set states to the solver from the protobuf file.

Internally used helper method.

set_states_to_protobuf(self, optimizer)

Set states to the protobuf file from the solver.

Internally used helper method.

setup(self, params)

Deprecated. Call set_parameters with param_dict instead.

update(self, update_pre_hook=None, update_post_hook=None)

When this function is called, parameter values are updated using the gradients accumulated in backpropagation, stored in the grad field of the parameter Variable s. Update rules are implemented in the C++ core, in derived classes of Solver. The updated parameter values will be stored into the data field of the parameter Variable s.

Parameters
  • update_pre_hook (callable) – This callable object is called immediately before each update of parameters. The default is None.

  • update_post_hook (callable) – This callable object is called immediately after each update of parameters. The default is None.

weight_decay(self, float decay_rate, pre_hook=None, post_hook=None)

Apply weight decay to gradients. When called, the current parameter value multiplied by the decay rate is added to each gradient.

Parameters

decay_rate (float) – The coefficient of weight decay.

zero_grad(self)

Initialize the gradients of all registered parameters to zero.

List of solvers
nnabla.solvers.Sgd(lr=0.001)

Stochastic gradient descent (SGD) optimizer.

\[w_{t+1} \leftarrow w_t - \eta \Delta w_t\]
Parameters

lr (float) – Learning rate (\(\eta\)).

Returns

An instance of Solver class.

See Solver API guide for details.

Return type

Solver

Note

You can instantiate a preferred target implementation (ex. CUDA) of a Solver given a Context. A Context can be set by nnabla.set_default_context(ctx) or nnabla.context_scope(ctx). See API docs.

nnabla.solvers.Momentum(lr=0.001, momentum=0.9)

SGD with Momentum.

\[\begin{split}v_t &\leftarrow \gamma v_{t-1} + \eta \Delta w_t\\ w_{t+1} &\leftarrow w_t - v_t\end{split}\]
Parameters
  • lr (float) – Learning rate (\(\eta\)).

  • momentum (float) – Decay rate of momentum.

Returns

An instance of Solver class.

See Solver API guide for details.

Return type

Solver

Note

You can instantiate a preferred target implementation (ex. CUDA) of a Solver given a Context. A Context can be set by nnabla.set_default_context(ctx) or nnabla.context_scope(ctx). See API docs.

References

Ning Qian, “On the momentum term in gradient descent learning algorithms”, Neural Networks, 1999.

nnabla.solvers.Lars(lr=0.001, momentum=0.9, coefficient=0.001, eps=1e-06)

LARS with Momentum.

\[\begin{split}\lambda &\leftarrow \eta \frac{\| w_t \|}{\| \Delta w_t + \beta w_t \|} \\ v_{t+1} &\leftarrow m v_t + \gamma \lambda (\Delta w_t + \beta w_t) \\ w_{t+1} &\leftarrow w_t - v_{t+1}\end{split}\]
Parameters
  • lr (float) – Learning rate (\(\eta\)).

  • momentum (float) – Decay rate of momentum.

  • coefficient (float) – Trust coefficient.

  • eps (float) – Small value for avoiding zero division (\(\epsilon\)).

Returns

An instance of Solver class.

See Solver API guide for details.

Return type

Solver

Note

You can instantiate a preferred target implementation (ex. CUDA) of a Solver given a Context. A Context can be set by nnabla.set_default_context(ctx) or nnabla.context_scope(ctx). See API docs.

References

Yang You, Igor Gitman, and Boris Ginsburg, “Large Batch Training of Convolutional Networks”, https://arxiv.org/abs/1708.03888

nnabla.solvers.Nesterov(lr=0.001, momentum=0.9)

Nesterov Accelerated Gradient optimizer.

\[\begin{split}v_t &\leftarrow \gamma v_{t-1} - \eta \Delta w_t\\ w_{t+1} &\leftarrow w_t - \gamma v_{t-1} + \left(1 + \gamma \right) v_t\end{split}\]
Parameters
  • lr (float) – Learning rate (\(\eta\)).

  • momentum (float) – Decay rate of momentum.

Returns

An instance of Solver class.

See Solver API guide for details.

Return type

Solver

Note

You can instantiate a preferred target implementation (ex. CUDA) of a Solver given a Context. A Context can be set by nnabla.set_default_context(ctx) or nnabla.context_scope(ctx). See API docs.

References

  • Yurii Nesterov. A method for unconstrained convex minimization problem with the rate of convergence \(O(1/k^2)\).

nnabla.solvers.Adadelta(lr=1.0, decay=0.95, eps=1e-06)

AdaDelta optimizer.

\[\begin{split}g_t &\leftarrow \Delta w_t\\ v_t &\leftarrow - \frac{RMS \left[ v_t \right]_{t-1}} {RMS \left[ g \right]_t}g_t\\ w_{t+1} &\leftarrow w_t + \eta v_t\end{split}\]
Parameters
  • lr (float) – Learning rate (\(\eta\)).

  • decay (float) – Decay rate (\(\gamma\)).

  • eps (float) – Small value for avoiding zero division (\(\epsilon\)).

Returns

An instance of Solver class.

See Solver API guide for details.

Return type

Solver

Note

You can instantiate a preferred target implementation (ex. CUDA) of a Solver given a Context. A Context can be set by nnabla.set_default_context(ctx) or nnabla.context_scope(ctx). See API docs.

References

Matthew D. Zeiler, “ADADELTA: An Adaptive Learning Rate Method”, https://arxiv.org/abs/1212.5701

nnabla.solvers.Adagrad(lr=0.01, eps=1e-08)

ADAGrad optimizer.

\[\begin{split}g_t &\leftarrow \Delta w_t\\ G_t &\leftarrow G_{t-1} + g_t^2\\ w_{t+1} &\leftarrow w_t - \frac{\eta}{\sqrt{G_t} + \epsilon} g_t\end{split}\]
Parameters
  • lr (float) – Learning rate (\(\eta\)).

  • eps (float) – Small value for avoiding zero division (\(\epsilon\)).

Returns

An instance of Solver class.

See Solver API guide for details.

Return type

Solver

Note

You can instantiate a preferred target implementation (ex. CUDA) of a Solver given a Context. A Context can be set by nnabla.set_default_context(ctx) or nnabla.context_scope(ctx). See API docs.

References

John Duchi, Elad Hazan, and Yoram Singer, “Adaptive Subgradient Methods for Online Learning and Stochastic Optimization”, Journal of Machine Learning Research, 2011.

nnabla.solvers.AdaBelief(alpha=0.001, beta1=0.9, beta2=0.999, eps=1e-08, wd=0.0, amsgrad=False, weight_decouple=False, fixed_decay=False, rectify=False)

AdaBelief optimizer.

\[\begin{split}m_t &\leftarrow \beta_1 m_{t-1} + (1 - \beta_1) g_t\\ s_t &\leftarrow \beta_2 s_{t-1} + (1 - \beta_2) (g_t - m_t)^2\\ w_{t+1} &\leftarrow w_t - \alpha \frac{\sqrt{1 - \beta_2^t}}{1 - \beta_1^t} \frac{m_t}{\sqrt{s_t + \epsilon} + \epsilon}\end{split}\]
Parameters
  • alpha (float) – Step size (\(\alpha\)).

  • beta1 (float) – Decay rate of first-order momentum (\(\beta_1\)).

  • beta2 (float) – Decay rate of second-order momentum (\(\beta_2\)).

  • eps (float) – Small value for avoiding zero division (\(\epsilon\)).

  • wd (float) – Weight decay rate. This option only takes effect when weight_decouple option is enabled.

  • amsgrad (bool) – Perform AMSGrad variant of AdaBelief.

  • weight_decouple (bool) – Perform decoupled weight decay as in AdamW.

  • fixed_decay (bool) – If True, the weight decay ratio will be kept fixed. Note that this option only takes effect when weight_decouple option is enabled.

  • rectify (bool) – Perform RAdam variant of AdaBelief.

Returns

An instance of Solver class.

See Solver API guide for details.

Return type

Solver

Note

You can instantiate a preferred target implementation (ex. CUDA) of a Solver given a Context. A Context can be set by nnabla.set_default_context(ctx) or nnabla.context_scope(ctx). See API docs.

References

Juntang Zhuang et al., “AdaBelief Optimizer: Adapting Stepsizes by the Belief in Observed Gradients”, https://arxiv.org/abs/2010.07468

nnabla.solvers.RMSprop(lr=0.001, decay=0.9, eps=1e-08)

RMSprop optimizer (Geoffery Hinton).

\[\begin{split}g_t &\leftarrow \Delta w_t\\ v_t &\leftarrow \gamma v_{t-1} + \left(1 - \gamma \right) g_t^2\\ w_{t+1} &\leftarrow w_t - \eta \frac{g_t}{\sqrt{v_t} + \epsilon}\end{split}\]
Parameters
  • lr (float) – Learning rate (\(\eta\)).

  • decay (float) – Decay rate (\(\gamma\)).

  • eps (float) – Small value for avoiding zero division (\(\epsilon\)).

Returns

An instance of Solver class.

See Solver API guide for details.

Return type

Solver

Note

You can instantiate a preferred target implementation (ex. CUDA) of a Solver given a Context. A Context can be set by nnabla.set_default_context(ctx) or nnabla.context_scope(ctx). See API docs.

References

Geoffrey Hinton, “Lecture 6.5 - rmsprop: Divide the gradient by a running average of its recent magnitude”, Coursera: Neural Networks for Machine Learning, 2012.

nnabla.solvers.RMSpropGraves(lr=0.0001, decay=0.95, momentum=0.9, eps=0.0001)

RMSpropGraves optimizer (Alex Graves).

\[\begin{split}n_t &\leftarrow \rho n_{t-1} + \left(1 - \rho \right) {e_t}^2\\ g_t &\leftarrow \rho g_{t-1} + \left(1 - \rho \right) e_t\\ d_t &\leftarrow \beta d_{t-1} - \eta \frac{e_t}{\sqrt{n_t - {g_t}^2 + \epsilon}}\\ w_{t+1} &\leftarrow w_t + d_t\end{split}\]

where \(e_t\) denotes the gradient.

Parameters
  • lr (float) – Learning rate (\(\eta\)).

  • decay (float) – Decay rate (\(\rho\)).

  • momentum (float) – Momentum (\(\beta\)).

  • eps (float) – Small value for avoiding zero division (\(\epsilon\)).

Returns

An instance of Solver class.

See Solver API guide for details.

Return type

Solver

Note

You can instantiate a preferred target implementation (ex. CUDA) of a Solver given a Context. A Context can be set by nnabla.set_default_context(ctx) or nnabla.context_scope(ctx). See API docs.

References

Alex Graves, “Generating Sequences With Recurrent Neural Networks”, https://arxiv.org/abs/1308.0850

nnabla.solvers.Adam(alpha=0.001, beta1=0.9, beta2=0.999, eps=1e-08)

ADAM optimizer.

\[\begin{split}m_t &\leftarrow \beta_1 m_{t-1} + (1 - \beta_1) g_t\\ v_t &\leftarrow \beta_2 v_{t-1} + (1 - \beta_2) g_t^2\\ w_{t+1} &\leftarrow w_t - \alpha \frac{\sqrt{1 - \beta_2^t}}{1 - \beta_1^t} \frac{m_t}{\sqrt{v_t} + \epsilon}\end{split}\]

where \(g_t\) denotes a gradient, and let \(m_0 \leftarrow 0\) and \(v_0 \leftarrow 0\).

Parameters
  • alpha (float) – Step size (\(\alpha\)).

  • beta1 (float) – Decay rate of first-order momentum (\(\beta_1\)).

  • beta2 (float) – Decay rate of second-order momentum (\(\beta_2\)).

  • eps (float) – Small value for avoiding zero division (\(\epsilon\)).

Returns

An instance of Solver class.

See Solver API guide for details.

Return type

Solver

Note

You can instantiate a preferred target implementation (ex. CUDA) of a Solver given a Context. A Context can be set by nnabla.set_default_context(ctx) or nnabla.context_scope(ctx). See API docs.

References

Diederik P. Kingma and Jimmy Ba, “Adam: A Method for Stochastic Optimization”, https://arxiv.org/abs/1412.6980

nnabla.solvers.AdaBound(alpha=0.001, beta1=0.9, beta2=0.999, eps=1e-08, final_lr=0.1, gamma=0.001)

AdaBound optimizer applies dynamic bounds on learning rates to Adam.

\[\begin{split}w_{t+1} &\leftarrow w_t - \eta_t*m_t\\ \eta_t = clip( \alpha\frac{\sqrt{1 - \beta_2^t}}{(1 - \beta_1^t)(\sqrt{v_t} + \epsilon)}, \eta_l(t), \eta_u(t))\\ \eta_l(t) = (1 - (1/((1-\gamma)t+1)))\alpha^*\\ \eta_u(t) = (1 + (1/((1-\gamma)t)))\alpha^*\end{split}\]

where \(\alpha^*\) (final_lr) is scaled by a factor defined as the current value of \(\alpha\) (set by set_learning_rate(lr)) over the initial value of \(\alpha\), so that learning-rate scheduling is properly applied to both \(\alpha\) and \(\alpha^*\).
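
To see how the lower and upper bounds tighten toward final_lr as \(t\) grows, a small numeric sketch of the bound functions above (the values are illustrative):

final_lr, gamma = 0.1, 0.001
for t in (1, 100, 10000):
    eta_l = (1 - 1 / ((1 - gamma) * t + 1)) * final_lr
    eta_u = (1 + 1 / ((1 - gamma) * t)) * final_lr
    print(t, eta_l, eta_u)  # both bounds approach final_lr = 0.1 as t grows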

Parameters
  • alpha (float) – Step size (\(\alpha\)).

  • beta1 (float) – Decay rate of first-order momentum (\(\beta_1\)).

  • beta2 (float) – Decay rate of second-order momentum (\(\beta_2\)).

  • eps (float) – Small value for avoiding zero division (\(\epsilon\)).

  • final_lr (float) – Final (SGD) learning rate.

  • gamma (float) – Convergence speed of the bound functions.

Returns

An instance of Solver class.

See Solver API guide for details.

Return type

Solver

Note

You can instantiate a preferred target implementation (ex. CUDA) of a Solver given a Context. A Context can be set by nnabla.set_default_context(ctx) or nnabla.context_scope(ctx). See API docs.

References

Liangchen Luo et al., “Adaptive Gradient Methods with Dynamic Bound of Learning Rate”, https://arxiv.org/abs/1902.09843

nnabla.solvers.Adamax(alpha=0.002, beta1=0.9, beta2=0.999, eps=1e-08)

ADAMAX Optimizer.

\[\begin{split}m_t &\leftarrow \beta_1 m_{t-1} + (1 - \beta_1) g_t\\ v_t &\leftarrow \max\left(\beta_2 v_{t-1}, |g_t|\right)\\ w_{t+1} &\leftarrow w_t - \alpha \frac{\sqrt{1 - \beta_2^t}}{1 - \beta_1^t} \frac{m_t}{v_t + \epsilon}\end{split}\]

where \(g_t\) denotes a gradient, \(m_0 \leftarrow 0\) and \(v_0 \leftarrow 0\), and \(v_t\) is an exponentially weighted infinity norm of the sequence of gradients up to step \(t\).

Parameters
  • alpha (float) – Step size (\(\alpha\)).

  • beta1 (float) – Decay rate of first-order momentum (\(\beta_1\)).

  • beta2 (float) – Decay rate of inf-order momentum (\(\beta_2\)).

  • eps (float) – Small value for avoiding zero division (\(\epsilon\)).

Returns

An instance of Solver class.

See Solver API guide for details.

Return type

Solver

Note

You can instantiate a preferred target implementation (ex. CUDA) of a Solver given a Context. A Context can be set by nnabla.set_default_context(ctx) or nnabla.context_scope(ctx). See API docs.

References

Diederik P. Kingma and Jimmy Ba, “Adam: A Method for Stochastic Optimization”, https://arxiv.org/abs/1412.6980

nnabla.solvers.AMSGRAD(alpha=0.001, beta1=0.9, beta2=0.999, eps=1e-08, bias_correction=False)

AMSGRAD optimizer.

\[\begin{split}m_t &\leftarrow \beta_1 m_{t-1} + (1 - \beta_1) g_t\\ v_t &\leftarrow \beta_2 v_{t-1} + (1 - \beta_2) g_t^2\\ \hat{v_t} = \max(\hat{v_{t-1}}, v_t)\\ w_{t+1} &\leftarrow w_t - \alpha \frac{m_t}{\sqrt{\hat{v_t}} + \epsilon}\end{split}\]

where \(g_t\) denotes a gradient, and let \(m_0 \leftarrow 0\) and \(v_0 \leftarrow 0\).

Parameters
  • alpha (float) – Step size (\(\alpha\)).

  • beta1 (float) – Decay rate of first-order momentum (\(\beta_1\)).

  • beta2 (float) – Decay rate of second-order momentum (\(\beta_2\)).

  • eps (float) – Small value for avoiding zero division (\(\epsilon\)). Note this does not appear in the paper.

  • bias_correction (bool) – Apply bias correction to moving averages defined in ADAM. Note this does not appear in the paper.

Returns

An instance of Solver class.

See Solver API guide for details.

Return type

Solver

Note

You can instantiate a preferred target implementation (ex. CUDA) of a Solver given a Context. A Context can be set by nnabla.set_default_context(ctx) or nnabla.context_scope(ctx). See API docs.

References

Sashank J. Reddi, Satyen Kale, and Sanjiv Kumar, “On the Convergence of Adam and Beyond”, https://arxiv.org/abs/1904.09237

nnabla.solvers.AMSBound(alpha=0.001, beta1=0.9, beta2=0.999, eps=1e-08, final_lr=0.1, gamma=0.001, bias_correction=False)

AMSBound optimizer applies dynamic bounds on learning rates to AMSGrad.

\[\begin{split}w_{t+1} &\leftarrow w_t - \eta_t*m_t\\ \eta_t = clip( \alpha\frac{\sqrt{1 - \beta_2^t}}{(1 - \beta_1^t)(\sqrt{\hat{v_t}} + \epsilon)}, \eta_l(t), \eta_u(t))\\ \hat{v_t} = \max(\hat{v_{t-1}}, v_t)\\ \eta_l(t) = (1 - (1/((1-\gamma)t+1)))\alpha^*\\ \eta_u(t) = (1 + (1/((1-\gamma)t)))\alpha^*\end{split}\]

where \(\alpha^*\) (final_lr) is scaled by a factor defined as the current value of \(\alpha\) (set by set_learning_rate(lr)) over the initial value of \(\alpha\), so that learning-rate scheduling is properly applied to both \(\alpha\) and \(\alpha^*\).

Parameters
  • alpha (float) – Step size (\(\alpha\)).

  • beta1 (float) – Decay rate of first-order momentum (\(\beta_1\)).

  • beta2 (float) – Decay rate of second-order momentum (\(\beta_2\)).

  • eps (float) – Small value for avoiding zero division (\(\epsilon\)). Note this does not appear in the paper.

  • final_lr (float) – Final (SGD) learning rate.

  • gamma (float) – Convergence speed of the bound functions.

  • bias_correction (bool) – Apply bias correction to moving averages defined in ADAM. Note this does not appear in the paper.

Returns

An instance of Solver class.

See Solver API guide for details.

Return type

Solver

Note

You can instantiate a preferred target implementation (ex. CUDA) of a Solver given a Context. A Context can be set by nnabla.set_default_context(ctx) or nnabla.context_scope(ctx). See API docs.

References

Liangchen Luo et al., “Adaptive Gradient Methods with Dynamic Bound of Learning Rate”, https://arxiv.org/abs/1902.09843

nnabla.solvers.AdamW(alpha=0.001, beta1=0.9, beta2=0.999, eps=1e-08, wd=0.0001)

ADAM optimizer with decoupled weight decay.

\[\begin{split}m_t &\leftarrow \beta_1 m_{t-1} + (1 - \beta_1) g_t\\ v_t &\leftarrow \beta_2 v_{t-1} + (1 - \beta_2) g_t^2\\ w_{t+1} &\leftarrow w_t - \alpha \frac{\sqrt{1 - \beta_2^t}}{1 - \beta_1^t} \frac{m_t}{\sqrt{v_t} + \epsilon} - \eta_t\lambda w_t\end{split}\]

where \(g_t\) denotes a gradient, \(\lambda\) is the decoupled weight decay rate, and \(m_0 \leftarrow 0\) and \(v_0 \leftarrow 0\).

Parameters
  • alpha (float) – Step size (\(\alpha\)).

  • beta1 (float) – Decay rate of first-order momentum (\(\beta_1\)).

  • beta2 (float) – Decay rate of second-order momentum (\(\beta_2\)).

  • eps (float) – Small value for avoiding zero division (\(\epsilon\)).

  • wd (float) – Weight decay rate.

Returns

An instance of Solver class.

See Solver API guide for details.

Return type

Solver

Note

You can instantiate a preferred target implementation (ex. CUDA) of a Solver given a Context. A Context can be set by nnabla.set_default_context(ctx) or nnabla.context_scope(ctx). See API docs.

References

Ilya Loshchilov and Frank Hutter, “Decoupled Weight Decay Regularization”, https://arxiv.org/abs/1711.05101

nnabla.solvers.SgdW(lr=0.001, momentum=0.9, wd=0.0001)

Stochastic gradient descent (SGD) optimizer with decoupled weight decay.

\[\begin{split}v_t &\leftarrow \gamma v_{t-1} + \eta g_t\\ w_{t+1} &\leftarrow w_t - v_t - (\eta / \eta_0)\lambda w_t\end{split}\]

where \(g_t\) denotes a gradient, \(\lambda\) is the decoupled weight decay rate, and \(\eta_0\) is the initial learning rate.

Parameters
  • lr (float) – Learning rate (\(\eta\)).

  • momentum (float) – Decay rate of momentum.

  • wd (float) – Weight decay rate.

Returns

An instance of Solver class.

See Solver API guide for details.

Return type

Solver

Note

You can instantiate a preferred target implementation (ex. CUDA) of a Solver given a Context. A Context can be set by nnabla.set_default_context(ctx) or nnabla.context_scope(ctx). See API docs.

References

Ilya Loshchilov and Frank Hutter, “Decoupled Weight Decay Regularization”, https://arxiv.org/abs/1711.05101

Communicator

Communicator transfers parameters over the compute graphs.

This is an alias to communicator.py.

Communicator interface
class nnabla.communicators.Communicator

Communicator interface class.

Communicator exchanges data (e.g., gradient) using MPI-like collectives. This class is used for the distributed training.

abort(self)

Terminates the MPI execution environment.

add_context_and_parameters(self, ctx_param_dict)

Add context and parameters.

Parameters

ctx_param_dict (tuple of Context, dict) – A tuple of a Context and a dict; the keys of the dict are strings and the values are Variables.

all_gather(self, ndarray, ndarray_list, string group='world')

All-gather over data on different devices.

Parameters
  • ndarray (NdArray) – Data to be gathered.

  • ndarray_list (list of NdArray) – Destination list where the gathered data are stored.

  • group (string) – Name of a group. This group is used when the collective is called.

Example:

# Run like `mpirun -n 2 python <code_snippet.py>`
# note: the order of the output to stdout is nondeterministic because of the multiple processes.

# Communicator and Context
import numpy as np
import nnabla as nn
import nnabla.communicators as C
from nnabla.ext_utils import get_extension_context

extension_module = "cudnn"
ctx = get_extension_context(extension_module)
comm = C.MultiProcessCommunicator(ctx)
comm.init()

# Data
x = nn.Variable([2, 2])
x.d = np.random.rand(*x.shape)
y_list = [nn.Variable([2, 2]), nn.Variable([2, 2])]
print("Before the collective ({}-th)".format(comm.rank))
print(x.d)

# AllGather
comm.all_gather(x.data, [y.data for y in y_list])

# Check
print("After the collective ({}-th)".format(comm.rank))
for y in y_list:
    print(y.d)
all_reduce(self, data, bool division=False, bool inplace=False, string group='world')

All-reduce over data on different devices.

Parameters
  • data (NdArray or list of NdArray) – Data to be all-reduced.

  • division (bool) – Flag to divide the reduce data by the number of contexts added, or the number of devices.

  • inplace (bool) – Flag to use a packed array. Default is false. When true, it is memory-efficient but slow. When false, it is not memory-efficient but fast. In both cases, one can get the result in the same memory region.

  • group (string) – Name of a group. This group is used when the collective is called.

Example:

# Run like `mpirun -n 2 python <code_snippet.py>`
# note: the order of the output to stdout is nondeterministic because of the multiple processes.

# Communicator and Context
import numpy as np
import nnabla as nn
import nnabla.communicators as C
from nnabla.ext_utils import get_extension_context

extension_module = "cudnn"
ctx = get_extension_context(extension_module)
comm = C.MultiProcessCommunicator(ctx)
comm.init()

# Data
x_list = [nn.Variable([2, 2]), nn.Variable([2, 2])]
print("Before the collective ({}-th)".format(comm.rank))
for x in x_list:
    x.d = np.random.rand(*x.shape)
    print(x.d)

# AllReduce
comm.all_reduce([x.data for x in x_list], inplace=True)

# Check
print("After the collective ({}-th)".format(comm.rank))
for x in x_list:
    print(x.d)
all_reduce_callback(self, data, size_t pack_size, bool division=False, string group='world')

All-reduce over data across different devices. This method returns a callback to be passed to the communicator_callbacks argument of backward (see the example below).

Note

This function does not support shared parameters (such as RNNs) currently.

Parameters
  • data (NdArray or list of NdArray) – Data to be all-reduced.

  • pack_size (int) – The number of values contained in the packed data.

  • division (bool) – Flag to divide the reduced data by the number of contexts added (i.e., the number of devices).

  • group (string) – Name of a group. This group is used when the collective is called.

Example:

In case of the multi-process data parallel distributed training,

# Run like `mpirun -n 2 python <code_snippet.py>`

# Communicator and Context
import numpy as np
import nnabla as nn
import nnabla.communicators as C
import nnabla.functions as F
import nnabla.parametric_functions as PF
from nnabla.ext_utils import get_extension_context

extension_module = "cudnn"
ctx = get_extension_context(extension_module)
comm = C.MultiProcessCommunicator(ctx)
comm.init()

n_class = 2
b, c, h, w = 4, 1, 32, 32

# Data
x = nn.Variable([b, c, h, w])
y = nn.Variable([b, 1])

# Network setting
h = PF.convolution(x, 1, (3, 3), (1, 1), (1, 1))
pred = PF.affine(h, 2)
loss = F.mean(F.softmax_cross_entropy(pred, y))

loss.forward()
# AllReduce during backward
loss.backward(communicator_callbacks=comm.all_reduce_callback(
    [v.grad for v in nn.get_parameters().values()], 1024 * 1024 * 2))
allreduce(self, bool division=False, bool inplace=False)

Deprecated. Use all_reduce instead.

Allreduce over parameters added. Currently, allreduce is applied to gradient regions.

Parameters
  • division (bool) – Flag to divide the reduced data by the number of contexts added (i.e., the number of devices).

  • inplace (bool) – Flag to use a packed array. Default is False. When True, it is memory-efficient but slow. When False, it is not memory-efficient but fast. In both cases, the result is obtained in the same memory region.

barrier(self)

Blocks until all processes in the communicator have reached this routine.

bcast(self, data, int src, bool inplace=False, string group='world')

Broadcast data to different devices.

Parameters
  • data (NdArray or list of NdArray) – Data to be broadcast.

  • src (int) – Source rank from which the data is broadcast.

  • inplace (bool) – Flag to use a packed array. Default is False. When True, it is memory-efficient but slow. When False, it is not memory-efficient but fast. In both cases, the result is obtained in the same memory region.

  • group (string) – Name of a group. This group is used when the collective is called.

Example:

# Run like `mpirun -n 2 python <code_snippet.py>`
# note: outputs to stdout may appear in any order because multiple processes print concurrently.

# Communicator and Context
import numpy as np
import nnabla as nn
import nnabla.communicators as C
from nnabla.ext_utils import get_extension_context

extension_module = "cudnn"
ctx = get_extension_context(extension_module)
comm = C.MultiProcessCommunicator(ctx)
comm.init()

# Data
x_list = [nn.Variable([2, 2]), nn.Variable([2, 2])]
print("Before the collective ({}-th)".format(comm.rank))
for x in x_list:
    x.d = np.random.rand(*x.shape)
    print(x.d)

# Bcast
comm.bcast([x.data for x in x_list], src=0, inplace=True)

# Check
print("After the collective ({}-th)".format(comm.rank))
for x in x_list:
    print(x.d)
clear_context_parameters(self)

Clear all registered contexts and parameters.

find_group(self, group)

Return the list of ranks in the group. If the group does not exist, an empty list is returned.

Parameters

group (str) – Name of the group.

Returns

List of ranks (int).

Return type

ranks (list)

init(self)

Initialize a communicator.

Performs init-all or init-rank initialization, depending on whether multi-thread or multi-process execution is used. This function MUST be called after all parameters to be communicated are added by add_context_and_parameters.

list_groups(self)
Returns

Mapping of group name (str) to ranks (list of int).

Return type

groups (dict)

local_rank

Get local rank of communicator.

name

Get communicator name.

new_group(self, name_ranks)
Parameters

name_ranks (tuple) – Tuple of name (str) and ranks (list).

Returns

group name (str)

Example:

# Communicator and Context
import nnabla.communicators as C
from nnabla.ext_utils import get_extension_context

extension_module = "cudnn"
ctx = get_extension_context(extension_module)
comm = C.MultiProcessCommunicator(ctx)
comm.init()

# New group: pass a tuple of (name, ranks)
group = comm.new_group(("node0", [0, 1, 2, 3]))
rank

Get rank of communicator.

reduce(self, data, int dst, bool division=False, bool inplace=False, string group='world')

Reduce over data across different devices.

Parameters
  • data (NdArray or list of NdArray) – Data to be reduced.

  • dst (int) – Destination rank where the result is saved.

  • division (bool) – Flag to divide the reduced data by the number of contexts added (i.e., the number of devices).

  • inplace (bool) – Flag to use a packed array. Default is False. When True, it is memory-efficient but slow. When False, it is not memory-efficient but fast. In both cases, the result is obtained in the same memory region.

  • group (string) – Name of a group. This group is used when the collective is called.

Example:

# Run like `mpirun -n 2 python <code_snippet.py>`
# note: outputs to stdout may appear in any order because multiple processes print concurrently.

# Communicator and Context
import numpy as np
import nnabla as nn
import nnabla.communicators as C
from nnabla.ext_utils import get_extension_context

extension_module = "cudnn"
ctx = get_extension_context(extension_module)
comm = C.MultiProcessCommunicator(ctx)
comm.init()

# Data
x_list = [nn.Variable([2, 2]), nn.Variable([2, 2])]
print("Before the collective ({}-th)".format(comm.rank))
for x in x_list:
    x.d = np.random.rand(*x.shape)
    print(x.d)

# Reduce
comm.reduce([x.data for x in x_list], dst=0, inplace=True)

# Check
print("After the collective ({}-th)".format(comm.rank))
for x in x_list:
    print(x.d)
reduce_scatter(self, ndarray_list, ndarray, bool division=False, string group='world')

Reduce-scatter over data across different devices.

Parameters
  • ndarray_list (list of NdArray) – List of data to be reduced over different devices.

  • ndarray (NdArray) – Buffer in which the scattered result is stored.

  • group (string) – Name of a group. This group is used when the collective is called.

Example:

# Run like `mpirun -n 2 python <code_snippet.py>`
# note: outputs to stdout may appear in any order because multiple processes print concurrently.

# Communicator and Context
import numpy as np
import nnabla as nn
import nnabla.communicators as C
from nnabla.ext_utils import get_extension_context

extension_module = "cudnn"
ctx = get_extension_context(extension_module)
comm = C.MultiProcessCommunicator(ctx)
comm.init()

# Data
x_list = [nn.Variable([2, 2]), nn.Variable([2, 2])]
y = nn.Variable([2, 2])
print("Before the collective ({}-th)".format(comm.rank))
for x in x_list:
    x.d = np.random.rand(*x.shape)
    print(x.d)

# ReduceScatter
comm.reduce_scatter([x.data for x in x_list], y.data)

# Check
print("After the collective ({}-th)".format(comm.rank))
print(y.d)
size

Get size of communicator.

List of communicators
nnabla.communicators.MultiProcessDataParalellCommunicator()

MultiProcessDataParallelCommunicator(CContext ctx)

Multi Process Data Parallel Communicator for Distributed Training.

Parameters

context (Context) – Context used in this communicator.

Example:

In case of the multi-process data parallel distributed training,

# Communicator and Context
import nnabla as nn
import nnabla.communicators as C
from nnabla.ext_utils import get_extension_context

extension_module = "cudnn"
ctx = get_extension_context(extension_module)
comm = C.MultiProcessCommunicator(ctx)
comm.init()
n_devices = comm.size
mpi_rank = comm.rank
device_id = comm.local_rank
ctx.device_id = str(device_id)
nn.set_default_context(ctx)

# Network and Solver created here

...


# Training loop
for itr in range(num_itr):
    # Forward, zerograd, backward
    loss.forward()
    solver.zero_grad()
    loss.backward()

    # Allreduce
    comm.all_reduce([v.grad for v in nn.get_parameters().values()])

    # Update
    solver.update()

Monitors

The Monitor API provides helpers for logging the progress of neural network training.

class nnabla.monitor.Monitor(save_path)[source]

This class sets up the output directory of the monitoring logs. The created nnabla.monitor.Monitor instance is passed to the monitor classes listed in the following List of Monitors.
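A minimal setup sketch (the directory name tmp.monitor is an arbitrary choice):

from nnabla.monitor import Monitor, MonitorSeries

monitor = Monitor('tmp.monitor')  # logs are written under this directory
monitor_loss = MonitorSeries('Training loss', monitor, interval=10)
monitor_loss.add(0, 2.3)  # (index, value)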

List of Monitors
class nnabla.monitor.MonitorSeries(name, monitor=None, interval=1, verbose=True)[source]

Logs a series of values.

The values are displayed and/or output to the file <name>-series.txt.

Example:

from nnabla.monitor import MonitorSeries

mons = MonitorSeries('mon', interval=2)
for i in range(10):
    mons.add(i, i * 2)
Parameters
  • name (str) – Name of the monitor. Used in the log.

  • monitor (Monitor) – Monitor class instance.

  • interval (int) – Interval at which the outputs are flushed. The values added by .add() are averaged over the interval.

  • verbose (bool) – Output to screen.

add(index, value)[source]

Add a value to the series.

Parameters
  • index (int) – Index.

  • value (float) – Value.

class nnabla.monitor.MonitorTimeElapsed(name, monitor=None, interval=100, verbose=True)[source]

Logs the elapsed time.

The values are displayed and/or output to the file <name>-timer.txt.

Example:

import time
from nnabla.monitor import MonitorTimeElapsed

mont = MonitorTimeElapsed("time", interval=2)
for i in range(10):
    time.sleep(1)
    mont.add(i)
Parameters
  • name (str) – Name of the monitor. Used in the log.

  • monitor (Monitor) – Monitor class instance.

  • interval (int) – Interval at which the outputs are flushed. The elapsed time is calculated over the interval.

  • verbose (bool) – Output to screen.

add(index)[source]

Calculate the time elapsed since the previous call of this method (or since this object was created, for the first call).

Parameters

index (int) – Index to be displayed, and be used to take intervals.

class nnabla.monitor.MonitorImage(name, monitor, interval=1000, verbose=True, num_images=16, normalize_method=None)[source]

Saves a series of images.

The .add() method takes an (N, ..., C, H, W) array as an input, and num_images images of shape (H, W, min(3, C)) are saved into the monitor folder at each interval.

The values are displayed and/or output to the file <name>/{iter}-{image index}.png.

Example:

import numpy as np
from nnabla.monitor import Monitor, MonitorImage

m = Monitor('tmp.monitor')
mi = MonitorImage('noise', m, interval=2, num_images=2)
x = np.random.randn(10, 3, 8, 8)
for i in range(10):
    mi.add(i, x)
Parameters
  • name (str) – Name of the monitor. Used in the log.

  • monitor (Monitor) – Monitor class instance.

  • interval (int) – Interval at which the outputs are flushed.

  • num_images (int) – Number of images to be saved in each iteration.

  • normalize_method (function) – A function that takes an NCHW format image minibatch as numpy.ndarray. The function should define a normalizer which maps any input to the range [0, 1]. The default normalizer performs min-max normalization.

add(index, var)[source]

Add a minibatch of images to the monitor.

Parameters
  • index (int) – Index.

  • var (Variable, NdArray, or ndarray) – A minibatch of images with (N, ..., C, H, W) format. If C == 2, a blue channel filled with ones is appended. If C > 3, only the first three channels are kept.

class nnabla.monitor.MonitorImageTile(name, monitor, interval=1000, verbose=True, num_images=16, normalize_method=None)[source]

Saves a series of images as a single tiled image.

The .add() method takes an (N, ..., C, H, W) array as an input, and num_images images of shape (H, W, min(3, C)) are tiled and saved into the monitor folder at each interval.

The values are displayed and/or output to the file <name>/{iter}-{image index}.png.

Example:

import numpy as np
from nnabla.monitor import Monitor, MonitorImageTile

m = Monitor('tmp.monitor')
mi = MonitorImageTile('noise_noise', m, interval=2, num_images=4)
x = np.random.randn(10, 3, 8, 8)
for i in range(10):
    mi.add(i, x)
Parameters
  • name (str) – Name of the monitor. Used in the log.

  • monitor (Monitor) – Monitor class instance.

  • interval (int) – Interval at which the outputs are flushed.

  • num_images (int) – Number of images tiled to be saved into a single image in each iteration.

  • normalize_method (function) – A function that takes an NCHW format image minibatch as numpy.ndarray. The function should define a normalizer which maps any input to the range [0, 1]. The default normalizer performs min-max normalization.

add(index, var)[source]

Add a minibatch of images to the monitor.

Parameters
  • index (int) – Index.

  • var (Variable, NdArray, or ndarray) – A minibatch of images with (N, ..., C, H, W) format. If C == 2, a blue channel filled with ones is appended. If C > 3, only the first three channels are kept.

Utility functions
nnabla.monitor.tile_images(data, padsize=1, padval=0)[source]

Convert an array with shape of (B, C, H, W) into a tiled image.

Parameters
  • data (ndarray) – An array with shape of (B, C, H, W).

  • padsize (int) – Each tile has padding with this size.

  • padval (float) – Padding pixels are filled with this value.

Returns

A tile image.

Return type

tile_image (ndarray)
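A usage sketch combining tile_images with imsave (the file name is arbitrary):

import numpy as np
from nnabla.monitor import tile_images
from nnabla.utils.image_utils import imsave

data = np.random.rand(16, 3, 8, 8)  # (B, C, H, W), values in [0, 1]
tile = tile_images(data)            # one big tiled image
imsave('tile.png', tile)            # auto_scale maps the [0, 1] floats to uint8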

nnabla.monitor.plot_series(filename, plot_kwargs=None)[source]

Plot series data from MonitorSeries output text file.

Parameters
  • filename (str) – Path to the <name>-series.txt file produced by the MonitorSeries class.

  • plot_kwargs (dict, optional) – Keyword arguments passed to matplotlib.pyplot.plot().
Note

matplotlib package is required.
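A usage sketch (the file path follows the <name>-series.txt convention described above and is hypothetical):

import matplotlib.pyplot as plt
from nnabla.monitor import plot_series

plot_series('tmp.monitor/mon-series.txt', plot_kwargs={'label': 'mon'})
plt.legend()
plt.savefig('mon.png')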

nnabla.monitor.plot_time_elapsed(filename, elapsed=False, unit='s', plot_kwargs=None)[source]

Plot series data from MonitorTimeElapsed output text file.

Parameters
  • filename (str) – Path to the text file produced by the MonitorTimeElapsed class.

  • elapsed (bool) – If True, it plots the total elapsed time.

  • unit (str) – Time unit chosen from 's', 'm', 'h', or 'd'.

  • plot_kwargs (dict, optional) – Keyword arguments passed to matplotlib.pyplot.plot().

Note

matplotlib package is required.

Utils

NNP save and load utilities

IMPORTANT NOTICE: To handle an NNP file from Neural Network Console, if the network you want to save/load contains the LoopControl functions RepeatStart, RepeatEnd, RecurrentInput, RecurrentOutput or Delay, you must expand the network with the File format converter.

nnabla.utils.save.save(filename, contents, include_params=False, variable_batch_size=True, extension='.nnp', parameters=None)[source]

Save network definition, inference/training execution configurations etc.

Parameters
  • filename (str or file object) –

    Filename to store information. The file extension is used to determine the file format:

    • .nnp: (Recommended) Creates a zip archive with nntxt (network definition etc.) and h5 (parameters).

    • .nntxt: Protobuf in text format.

    • .protobuf: Protobuf in binary format (unsafe in terms of backward compatibility).

  • contents (dict) – Information to store.

  • include_params (bool) – Include parameters in a single file. This is ignored when the extension of filename is .nnp.

  • variable_batch_size (bool) – If True, the first dimension of all variables is considered the batch size and is left as a placeholder (more specifically, -1). The placeholder dimension will be filled during/after loading.

  • extension – If filename is a file-like object, extension is one of “.nntxt”, “.prototxt”, “.protobuf”, “.h5”, “.nnp”.

Example

The following example creates an MLP with two inputs and two outputs, and saves the network structure and the initialized parameters.

import nnabla as nn
import nnabla.functions as F
import nnabla.parametric_functions as PF
from nnabla.utils.save import save

batch_size = 16
x0 = nn.Variable([batch_size, 100])
x1 = nn.Variable([batch_size, 100])
h1_0 = PF.affine(x0, 100, name='affine1_0')
h1_1 = PF.affine(x1, 100, name='affine1_1')
h1 = F.tanh(h1_0 + h1_1)
h2 = F.tanh(PF.affine(h1, 50, name='affine2'))
y0 = PF.affine(h2, 10, name='affiney_0')
y1 = PF.affine(h2, 10, name='affiney_1')

contents = {
    'networks': [
        {'name': 'net1',
         'batch_size': batch_size,
         'outputs': {'y0': y0, 'y1': y1},
         'names': {'x0': x0, 'x1': x1}}],
    'executors': [
        {'name': 'runtime',
         'network': 'net1',
         'data': ['x0', 'x1'],
         'output': ['y0', 'y1']}]}
save('net.nnp', contents)

To get a trainable model, use the following code instead.

contents = {
    'global_config': {'default_context': ctx},
    'training_config':
        {'max_epoch': args.max_epoch,
         'iter_per_epoch': args_added.iter_per_epoch,
         'save_best': True},
    'networks': [
        {'name': 'training',
         'batch_size': args.batch_size,
         'outputs': {'loss': loss_t},
         'names': {'x': x, 'y': t, 'loss': loss_t}},
        {'name': 'validation',
         'batch_size': args.batch_size,
         'outputs': {'loss': loss_v},
         'names': {'x': x, 'y': t, 'loss': loss_v}}],
    'optimizers': [
        {'name': 'optimizer',
         'solver': solver,
         'network': 'training',
         'dataset': 'mnist_training',
         'weight_decay': 0,
         'lr_decay': 1,
         'lr_decay_interval': 1,
         'update_interval': 1}],
    'datasets': [
        {'name': 'mnist_training',
         'uri': 'MNIST_TRAINING',
         'cache_dir': args.cache_dir + '/mnist_training.cache/',
         'variables': {'x': x, 'y': t},
         'shuffle': True,
         'batch_size': args.batch_size,
         'no_image_normalization': True},
        {'name': 'mnist_validation',
         'uri': 'MNIST_VALIDATION',
         'cache_dir': args.cache_dir + '/mnist_test.cache/',
         'variables': {'x': x, 'y': t},
         'shuffle': False,
         'batch_size': args.batch_size,
         'no_image_normalization': True}],
    'monitors': [
        {'name': 'training_loss',
         'network': 'validation',
         'dataset': 'mnist_training'},
        {'name': 'validation_loss',
         'network': 'validation',
         'dataset': 'mnist_validation'}],
}
save('train.nnp', contents)
class nnabla.utils.nnp_graph.NnpLoader(filepath, scope=None, extension='.nntxt')[source]

An NNP file loader.

Parameters
  • filepath – file-like object or filepath.

  • extension – if filepath is file-like object, extension is one of “.nnp”, “.nntxt”, “.prototxt”.

Example

import numpy as np
from nnabla.utils.nnp_graph import NnpLoader

# Read a .nnp file.
nnp = NnpLoader('/path/to/nnp.nnp')
# Assume a graph `graph_a` is in the nnp file.
net = nnp.get_network('graph_a', batch_size=1)
# `x` is an input of the graph.
x = net.inputs['x']
# `y` is an output of the graph.
y = net.outputs['y']
# Set random data as input and perform forward prop.
x.d = np.random.randn(*x.shape)
y.forward(clear_buffer=True)
print('output:', y.d)
get_network(name, batch_size=None, callback=None)[source]

Create a computation graph of the network specified by name.

Returns: NnpNetwork

get_network_names()[source]

Returns the available network names.

class nnabla.utils.nnp_graph.NnpNetwork(proto_network, batch_size, callback)[source]

A graph object read from an nnp file.

An instance of NnpNetwork is usually created by an NnpLoader instance. See an example usage described in NnpLoader.

variables

A dict of all variables in a created graph, with a variable name as a key and an nnabla.Variable as a value.

Type

dict

inputs

All input variables.

Type

dict

outputs

All output variables.

Type

dict

Image Utils

This module provides read, write and resize functions for images. The backend of these functions is automatically selected depending on the user's environment. The priority of the backends is as follows (higher first):

  • OpenCV (cv2)

  • scikit-image (skimage)

  • pillow (PIL) (need to be installed)

At least one of these modules needs to be installed to use this module.

nnabla.utils.image_utils.imread(path, grayscale=False, size=None, interpolate='bilinear', channel_first=False, as_uint16=False, num_channels=-1, **kwargs)[source]

Read image from path. If you specify the size, the output array is resized. Default output shape is (height, width, channel) for RGB image and (height, width) for gray-scale image.

Parameters
  • path (String or File Object) – Input image path.

  • grayscale (bool) – If True, the image is converted to gray-scale. Default is False.

  • size (tuple of int) – Output shape. The order is (width, height). If None, the image is not resized. Default is None.

  • interpolate (str) –

    Interpolation method. The available methods depend on the backend. If you specify this argument, pay attention to which backend is currently in use. The selectable methods are:

    • pil backend: [“nearest”, “box”, “bilinear”, “hamming”, “bicubic”, “lanczos”].

    • cv2 backend: [“nearest”, “bilinear”, “bicubic”, “lanczos”].

    Default is “bilinear” for both backends.

  • channel_first (bool) – If True, the shape of the output array is (channel, height, width) for RGB image. Default is False.

  • as_uint16 (bool) – If True, this function tries to read img as np.uint16. Default is False.

  • num_channels (int) – Channel size of the output array. Default is -1, which preserves the raw image shape.

  • return_palette_indices (bool) – This argument can be used only by the pil backend. On the pil backend, if this flag is True and the PIL.Image has the mode “P”, this function returns a 2-D array containing the indices into the palette. Otherwise, a 3-D array of “RGB” or “RGBA” (depending on the image info) is returned. Default value is False.

Returns

If as_uint16=True, the output dtype is np.uint16; otherwise np.uint8 (default).

Return type

numpy.ndarray

nnabla.utils.image_utils.imsave(path, img, channel_first=False, as_uint16=False, auto_scale=True, **kwargs)[source]

Save img to the file specified by path. By default, the shape of img has to be (height, width, channel).

Parameters
  • path (str) – Output path.

  • img (numpy.ndarray) – Input image. All pixel values must be positive and in the range [0, 255] of int for uint8, [0, 65535] of int for uint16, or [0, 1] for float. When you pass a float image, you must set auto_scale to True (otherwise an exception is raised). If an img with negative values is passed, an exception is raised.

  • channel_first (bool) – If True, you can input the image whose shape is (channel, height, width). Default is False.

  • as_uint16 (bool) – If True, cast image to uint16 before save. Default is False.

  • auto_scale (bool) – Whether the range of pixel values is scaled up or not. The range of upscaled pixel values depends on the output dtype, which is [0, 255] as uint8 and [0, 65535] as uint16.

nnabla.utils.image_utils.imresize(img, size, interpolate='bilinear', channel_first=False, **kwargs)[source]

Resize img to size. By default, the shape of the input image has to be (height, width, channel).

Parameters
  • img (numpy.ndarray) – Input image.

  • size (tuple of int) – Output shape. The order is (width, height).

  • interpolate (str) –

    Interpolation method. The available methods depend on the backend. If you specify this argument, pay attention to which backend is currently in use. The selectable methods are:

    • pil backend: [“nearest”, “box”, “bilinear”, “hamming”, “bicubic”, “lanczos”].

    • cv2 backend: [“nearest”, “bilinear”, “bicubic”, “lanczos”].

    Default is “bilinear” for both backends.

  • channel_first (bool) – If True, the shape of the output array is (channel, height, width) for RGB image. Default is False.

Returns

numpy.ndarray
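A quick sketch tying the three functions together (the file names are placeholders):

from nnabla.utils.image_utils import imread, imresize, imsave

img = imread('input.png', num_channels=3)  # (H, W, 3), np.uint8 by default
small = imresize(img, (64, 64))            # note: size is (width, height)
imsave('output.png', small)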

Data Iterators

NNabla provides various utilities for using data for training.

DataSource
class nnabla.utils.data_source.DataSource(shuffle=False, rng=None)[source]

Bases: object

This class contains various properties and methods for the data source, which are utilized by DataIterator.

Parameters
  • shuffle (bool) – Indicates whether the dataset is shuffled or not.

  • rng (None or numpy.random.RandomState) – Numpy random number generator.

property position

Data position in current epoch.

Returns

Data position

Return type

int

property shuffle

Whether dataset is shuffled or not.

Returns

whether dataset is shuffled.

Return type

bool

property variables

Variable names of the data.

Returns

tuple of Variable names

Return type

tuple
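A minimal sketch of a custom DataSource subclass serving in-memory arrays; it assumes the conventional override points (_get_data, reset) and that the base class stores the random generator as self._rng, as in the library's bundled examples:

import numpy as np
from nnabla.utils.data_source import DataSource

class TinyDataSource(DataSource):

    def __init__(self, xs, ys, shuffle=False, rng=None):
        super(TinyDataSource, self).__init__(shuffle=shuffle, rng=rng)
        self._xs, self._ys = xs, ys
        self._size = len(xs)              # number of examples
        self._variables = ('x', 'y')      # names exposed to DataIterator
        self.reset()

    def reset(self):
        # Decide the visiting order at the beginning of every epoch.
        if self._shuffle:
            self._indexes = self._rng.permutation(self._size)
        else:
            self._indexes = np.arange(self._size)
        super(TinyDataSource, self).reset()

    def _get_data(self, position):
        i = self._indexes[position]
        return self._xs[i], self._ys[i]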

class nnabla.utils.data_source.DataSourceWithFileCache(data_source, cache_dir=None, cache_file_name_prefix='cache', shuffle=False, rng=None)[source]

Bases: nnabla.utils.data_source.DataSource

This class contains properties and methods for data source that can be read from cache files, which are utilized by data iterator.

Parameters
  • data_source (DataSource) – Instance of DataSource class which provides data.

  • cache_dir (str) – Location of file_cache. If this value is None, data_source.DataSourceWithFileCache creates file caches implicitly in a temporary directory and erases them all when the data iterator finishes. Otherwise, data_source.DataSourceWithFileCache keeps the created cache. Default is None.

  • cache_file_name_prefix (str) – Beginning of the filenames of cache files. Default is ‘cache’.

  • shuffle (bool) – Indicates whether the dataset is shuffled or not.

  • rng (None or numpy.random.RandomState) – Numpy random number generator.

property position

Data position in current epoch.

Returns

Data position

Return type

int

property shuffle

Whether dataset is shuffled or not.

Returns

whether dataset is shuffled.

Return type

bool

property variables

Variable names of the data.

Returns

tuple of Variable names

Return type

tuple

class nnabla.utils.data_source.DataSourceWithMemoryCache(data_source, shuffle=False, rng=None)[source]

Bases: nnabla.utils.data_source.DataSource

This class contains properties and methods for data source that can be read from memory cache, which is utilized by data iterator.

Parameters
  • data_source (DataSource) – Instance of DataSource class which provides data.

  • shuffle (bool) – Indicates whether the dataset is shuffled or not.

  • rng (None or numpy.random.RandomState) – Numpy random number generator.

property position

Data position in current epoch.

Returns

Data position

Return type

int

property shuffle

Whether dataset is shuffled or not.

Returns

whether dataset is shuffled.

Return type

bool

property variables

Variable names of the data.

Returns

tuple of Variable names

Return type

tuple

DataIterator
class nnabla.utils.data_iterator.DataIterator(data_source, batch_size, rng=None, use_thread=True, epoch_begin_callbacks=[], epoch_end_callbacks=[], stop_exhausted=False)[source]

Bases: object

Collects data from data_source and yields batches of data.

Parameters
  • data_source (DataSource) – Instance of DataSource class which provides data for this class.

  • batch_size (int) – Size of data unit.

  • rng (None or numpy.random.RandomState) – Numpy random number generator.

  • use_thread (bool) – If use_thread is set to True, iterator will use another thread to fetch data. If use_thread is set to False, iterator will use current thread to fetch data.

  • epoch_begin_callbacks (list of functions) – An item is a function which takes an epoch index as an argument. These are called at the beginning of an epoch.

  • epoch_end_callbacks (list of functions) – An item is a function which takes an epoch index as an argument. These are called at the end of an epoch.

  • stop_exhausted (bool) – If stop_exhausted is set to False, iterator will be reset so that iteration can be continued. If stop_exhausted is set to True, iterator will raise StopIteration to stop the loop.

property batch_size

Number of training samples that next() returns.

Returns

Number of training samples.

Return type

int

property epoch

The number of times position() returns to zero.

Returns

epoch

Return type

int

next()[source]

It generates a tuple of data.

For example, if self._variables == ('x', 'y'), this method returns ([[X] * batch_size], [[Y] * batch_size]).

Returns

tuple of data for mini-batch in numpy.ndarray.

Return type

tuple
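A typical consumption loop, sketched with the data_iterator_simple helper described later in this section (data and shapes are illustrative):

import numpy as np
import nnabla as nn
from nnabla.utils.data_iterator import data_iterator_simple

def load_func(i):
    # One synthetic (x, y) example per index.
    return np.full((2, 2), i, dtype=np.float32), np.array([i % 2])

di = data_iterator_simple(load_func, 100, batch_size=8, shuffle=True)

x = nn.Variable([8, 2, 2])
y = nn.Variable([8, 1])
for _ in range(10):
    x.d, y.d = di.next()  # one minibatch per call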

property position

Data position in current epoch.

Returns

Data position

Return type

int

register_epoch_begin_callback(callback)[source]

Register epoch begin callback.

Parameters

callback (function) – A function takes an epoch index as an argument.

register_epoch_end_callback(callback)[source]

Register epoch end callback.

Parameters

callback (function) – A function takes an epoch index as an argument.

property size

Data size that DataIterator will generate. This is the largest integer multiple of batch_size not exceeding self._data_source.size().

Returns

Data size

Return type

int

slice(rng, num_of_slices=None, slice_pos=None, slice_start=None, slice_end=None, cache_dir=None, use_cache=False)[source]

Slices the data iterator so that the newly generated data iterator has access to a limited portion of the original data.

Parameters
  • rng (numpy.random.RandomState) – Random generator for Initializer.

  • num_of_slices (int) – Total number of slices to be made. Must be used together with slice_pos.

  • slice_pos (int) – Position of the slice to be assigned to the new data iterator. Must be used together with num_of_slices.

  • slice_start (int) – Starting position of the range to be sliced into new data iterator. Must be used together with slice_end.

  • slice_end (int) – End position of the range to be sliced into new data iterator. Must be used together with slice_start.

  • cache_dir (str) – Directory to save cache files. If cache_dir is None and use_cache is True, a memory cache is used.

  • use_cache (bool) – Whether to use a cache for data_source.

Example:

from nnabla.utils.data_iterator import data_iterator_simple
import numpy as np

def load_func1(index):
    d = np.ones((2, 2)) * index
    return d

di = data_iterator_simple(load_func1, 1000, batch_size=3)

di_s1 = di.slice(None, num_of_slices=10, slice_pos=0)
di_s2 = di.slice(None, num_of_slices=10, slice_pos=1)

di_s3 = di.slice(None, slice_start=100, slice_end=200)
di_s4 = di.slice(None, slice_start=300, slice_end=400)
property variables

Variable names of the data.

Returns

tuple of Variable names

Return type

tuple

Utilities
nnabla.utils.data_iterator.data_iterator(data_source, batch_size, rng=None, use_thread=True, with_memory_cache=True, with_file_cache=False, cache_dir=None, epoch_begin_callbacks=[], epoch_end_callbacks=[], stop_exhausted=False)[source]

Helper method to use DataSource.

You can use DataIterator with your own DataSource for easy implementation of data sources.

For example,

ds = YourOwnImplementationOfDataSource()
batch = data_iterator(ds, batch_size)
Parameters
  • data_source (DataSource) – Instance of DataSource class which provides data.

  • batch_size (int) – Batch size.

  • rng (None or numpy.random.RandomState) – Numpy random number generator.

  • use_thread (bool) – If use_thread is set to True, iterator will use another thread to fetch data. If use_thread is set to False, iterator will use current thread to fetch data.

  • with_memory_cache (bool) – If True, use data_source.DataSourceWithMemoryCache to wrap data_source. It is a good idea to set this to True unless data_source provides on-memory data. Default value is True.

  • with_file_cache (bool) – If True, use data_source.DataSourceWithFileCache to wrap data_source. If data_source is slow, enabling this option is a good idea. Default value is False.

  • cache_dir (str) – Location of file_cache. If this value is None, data_source.DataSourceWithFileCache creates file caches implicitly in a temporary directory and erases them all when the data iterator finishes. Otherwise, data_source.DataSourceWithFileCache keeps the created cache. Default is None.

  • epoch_begin_callbacks (list of functions) – An item is a function which takes an epoch index as an argument. These are called at the beginning of an epoch.

  • epoch_end_callbacks (list of functions) – An item is a function which takes an epoch index as an argument. These are called at the end of an epoch.

  • stop_exhausted (bool) – If stop_exhausted is set to False, iterator will be reset so that iteration can be continued. If stop_exhausted is set to True, iterator will raise StopIteration to stop the loop.

Returns

Instance of DataIterator.

Return type

DataIterator

nnabla.utils.data_iterator.data_iterator_simple(load_func, num_examples, batch_size, shuffle=False, rng=None, use_thread=True, with_memory_cache=False, with_file_cache=False, cache_dir=None, epoch_begin_callbacks=[], epoch_end_callbacks=[], stop_exhausted=False)[source]

A generator that yields minibatch data as a tuple, as defined in load_func. It can yield minibatches indefinitely on request, queried from the provided data.

Parameters
  • load_func (function) – Takes a single argument i, an index of an example in your dataset to be loaded, and returns a tuple of data. Every call by any index i must return a tuple of arrays with the same shape.

  • num_examples (int) – Number of examples in your dataset. Random sequence of indexes is generated according to this number.

  • batch_size (int) – Size of data unit.

  • shuffle (bool) – Indicates whether the dataset is shuffled or not. Default value is False.

  • rng (None or numpy.random.RandomState) – Numpy random number generator.

  • use_thread (bool) – If use_thread is set to True, iterator will use another thread to fetch data. If use_thread is set to False, iterator will use current thread to fetch data.

  • with_memory_cache (bool) – If True, use data_source.DataSourceWithMemoryCache to wrap data_source. It is a good idea to set this to True unless data_source provides on-memory data. Default value is False.

  • with_file_cache (bool) – If True, use data_source.DataSourceWithFileCache to wrap data_source. If data_source is slow, enabling this option is a good idea. Default value is False.

  • cache_dir (str) – Location of file_cache. If this value is None, data_source.DataSourceWithFileCache creates file caches implicitly in a temporary directory and erases them all when the data iterator finishes. Otherwise, data_source.DataSourceWithFileCache keeps the created cache. Default is None.

  • epoch_begin_callbacks (list of functions) – An item is a function which takes an epoch index as an argument. These are called at the beginning of an epoch.

  • epoch_end_callbacks (list of functions) – An item is a function which takes an epoch index as an argument. These are called at the end of an epoch.

  • stop_exhausted (bool) – If stop_exhausted is set to False, iterator will be reset so that iteration can be continued. If stop_exhausted is set to True, iterator will raise StopIteration to stop the loop.

Returns

Instance of DataIterator.

Return type

DataIterator

Here is an example of load_func which returns an image and a label of a classification dataset.

import numpy as np
from nnabla.utils.image_utils import imread
image_paths = load_image_paths()
labels = load_labels()
def my_load_func(i):
    '''
    Returns:
        image: c x h x w array
        label: 0-shape array
    '''
    img = imread(image_paths[i]).astype('float32')
    return np.rollaxis(img, 2), np.array(labels[i])
nnabla.utils.data_iterator.data_iterator_csv_dataset(uri, batch_size, shuffle=False, rng=None, use_thread=True, normalize=True, with_memory_cache=True, with_file_cache=True, cache_dir=None, epoch_begin_callbacks=[], epoch_end_callbacks=[], stop_exhausted=False)[source]

Get data directly from a dataset provided as a CSV file.

You can read files located on the local file system, http(s) servers or Amazon AWS S3 storage.

For example,

batch = data_iterator_csv_dataset('CSV_FILE.csv', batch_size, shuffle=True)
Parameters
  • uri (str) – Location of dataset CSV file.

  • batch_size (int) – Size of data unit.

  • shuffle (bool) – Indicates whether the dataset is shuffled or not. Default value is False.

  • rng (None or numpy.random.RandomState) – Numpy random number generator.

  • use_thread (bool) – If use_thread is set to True, iterator will use another thread to fetch data. If use_thread is set to False, iterator will use current thread to fetch data.

  • normalize (bool) – If True, each sample in the data gets normalized by a factor of 255. Default is True.

  • with_memory_cache (bool) – If True, use data_source.DataSourceWithMemoryCache to wrap data_source. It is a good idea to set this to True unless data_source provides on-memory data. Default value is True.

  • with_file_cache (bool) – If True, use data_source.DataSourceWithFileCache to wrap data_source. If data_source is slow, enabling this option is a good idea. Default value is True.

  • cache_dir (str) – Location of file_cache. If this value is None, data_source.DataSourceWithFileCache creates file caches implicitly in a temporary directory and erases them all when the data iterator finishes. Otherwise, data_source.DataSourceWithFileCache keeps the created cache. Default is None.

  • epoch_begin_callbacks (list of functions) – An item is a function which takes an epoch index as an argument. These are called at the beginning of an epoch.

  • epoch_end_callbacks (list of functions) – An item is a function which takes an epoch index as an argument. These are called at the end of an epoch.

  • stop_exhausted (bool) – If stop_exhausted is set to False, iterator will be reset so that iteration can be continued. If stop_exhausted is set to True, iterator will raise StopIteration to stop the loop.

Returns

Instance of DataIterator

Return type

DataIterator

nnabla.utils.data_iterator.data_iterator_cache(uri, batch_size, shuffle=False, rng=None, use_thread=True, normalize=True, with_memory_cache=True, epoch_begin_callbacks=[], epoch_end_callbacks=[], stop_exhausted=False)[source]

Get data from the cache directory.

Cache files are read from the local file system.

For example,

batch = data_iterator_cache('CACHE_DIR', batch_size, shuffle=True)
Parameters
  • uri (str) – Location of directory with cache files.

  • batch_size (int) – Size of data unit.

  • shuffle (bool) – Indicates whether the dataset is shuffled or not. Default value is False.

  • rng (None or numpy.random.RandomState) – Numpy random number generator.

  • use_thread (bool) – If use_thread is set to True, iterator will use another thread to fetch data. If use_thread is set to False, iterator will use current thread to fetch data.

  • normalize (bool) – If True, each sample in the data gets normalized by a factor of 255. Default is True.

  • with_memory_cache (bool) – If True, use data_source.DataSourceWithMemoryCache to wrap data_source. It is a good idea to set this to True unless data_source provides on-memory data. Default value is True.

  • epoch_begin_callbacks (list of functions) – An item is a function which takes an epoch index as an argument. These are called at the beginning of an epoch.

  • epoch_end_callbacks (list of functions) – An item is a function which takes an epoch index as an argument. These are called at the end of an epoch.

  • stop_exhausted (bool) – If stop_exhausted is set to False, iterator will be reset so that iteration can be continued. If stop_exhausted is set to True, iterator will raise StopIteration to stop the loop.

Returns

Instance of DataIterator

Return type

DataIterator

nnabla.utils.data_iterator.data_iterator_concat_datasets(data_source_list, batch_size, shuffle=False, rng=None, use_thread=True, with_memory_cache=True, with_file_cache=False, cache_dir=None, epoch_begin_callbacks=[], epoch_end_callbacks=[], stop_exhausted=False)[source]

Get data from multiple datasets.

For example,

batch = data_iterator_concat_datasets([DataSource0, DataSource1, ...], batch_size)
Parameters
  • data_source_list (list of DataSource) – list of datasets.

  • batch_size (int) – Size of data unit.

  • shuffle (bool) – Indicates whether the dataset is shuffled or not. Default value is False.

  • rng (None or numpy.random.RandomState) – Numpy random number generator.

  • use_thread (bool) – If use_thread is set to True, iterator will use another thread to fetch data. If use_thread is set to False, iterator will use current thread to fetch data.

  • with_memory_cache (bool) – If True, use data_source.DataSourceWithMemoryCache to wrap data_source. It is a good idea to set this to True unless data_source provides on-memory data. Default value is True.

  • with_file_cache (bool) – If True, use data_source.DataSourceWithFileCache to wrap data_source. If data_source is slow, enabling this option is a good idea. Default value is False.

  • cache_dir (str) – Location of file_cache. If this value is None, data_source.DataSourceWithFileCache creates file caches implicitly in a temporary directory and erases them all when the data iterator finishes. Otherwise, data_source.DataSourceWithFileCache keeps the created cache. Default is None.

  • epoch_begin_callbacks (list of functions) – An item is a function which takes an epoch index as an argument. These are called at the beginning of an epoch.

  • epoch_end_callbacks (list of functions) – An item is a function which takes an epoch index as an argument. These are called at the end of an epoch.

  • stop_exhausted (bool) – If stop_exhausted is set to False, iterator will be reset so that iteration can be continued. If stop_exhausted is set to True, iterator will raise StopIteration to stop the loop.

Returns

Instance of DataIterator

Return type

DataIterator

Debug Utils
Graph Profiler
class nnabla.utils.profiler.GraphProfiler(graph, device_id, ext_name, solver=None, n_run=100, max_measure_execution_time=1, time_scale='m', backward_accum=False)[source]

Class for measuring the computation time of each function composing an NNabla computation graph.

You can check the performance of your NNabla network. This can measure the computation times of:

  • function-wise forward

  • function-wise backward

  • whole graph forward

  • whole graph backward

  • training (forward + backward + update) (if solver is not None)

Example:

import nnabla as nn
import nnabla.functions as F
import nnabla.solvers as S
from nnabla.ext_utils import get_extension_context
from nnabla.utils.profiler import GraphProfiler

# Set up nnabla context
device = "cpu"  # you can also use GPU ("cudnn")
ctx = get_extension_context(device)
nn.set_default_context(ctx)

# Network building
x = nn.Variable(shape=...)
t = nn.Variable(shape=...)
y = CNN(x) # you can build not only CNN but any networks
loss = F.mean(F.softmax_cross_entropy(y, t)) # any loss functions or variables can be used

# solver setting
solver = S.Sgd()
solver.set_parameters(nn.get_parameters())

# SOME CODE (data loading or so on)

B = GraphProfiler(loss, solver=solver, device_id=0, ext_name=device, n_run=1000)
B.run()
Parameters
  • graph (nnabla.Variable) – Instance of nnabla.Variable class. GraphProfiler find all functions which compose network graph from root nnabla.Variable to this nnabla.Variable.

  • device_id (str) – GPU device ID.

  • ext_name (str) – Extension name. e.g. ‘cpu’, ‘cuda’, ‘cudnn’ etc.

  • solver (nnabla.solvers.Solver) – Instance of nnabla.solvers.Solver for optimizing the parameters of the computation graph. if None, the training process is ignored. Default value is None.

  • n_run (int) – This argument specifies how many times each function's execution time is measured. Default value is 100.

  • max_measure_execution_time (float) – Maximum time spent measuring each function. This argument has higher priority than n_run. When the measurement time for a function exceeds this value, this class stops measuring it and moves on to the next function, even if it has been measured fewer than n_run times. Default value is 1 [sec].

  • time_scale (str) – Time scale to display. [‘m’, ‘u’, ‘n’] (which stand for ‘milli’, ‘micro’ and ‘nano’).

  • backward_accum (bool) – Accumulation flag passed to each backward function. This value is used for all accumulation flags. The flag is only valid for the time measurement of each function; for whole-graph computation, the NNabla graph engine sets the appropriate accumulation flags on functions. Pay attention to the inplace flag of your graph, because accumulation and inplace flags cannot be set at the same time. If even one inplace flag is True in your graph, backward_accum must be False. Default value is False.

run()[source]

Execute profiling.

This executes the 5 types of measurement:

  • function-wise forward

  • function-wise backward

  • whole graph forward

  • whole graph backward

  • training (forward + backward + update) (if solver is not None.)

class nnabla.utils.profiler.GraphProfilerCsvWriter(gb, file=<_io.TextIOWrapper name='<stdout>' mode='w' encoding='UTF-8'>)[source]

CSV writer for the GraphProfiler class.

Example:

from nnabla.utils.profiler import GraphProfiler, GraphProfilerCsvWriter

# Network building comes above

B = GraphProfiler(variable, solver=solver, device_id=0, ext_name=device, n_run=1000)
B.run()

with open("./profile.csv", "w") as f:
    writer = GraphProfilerCsvWriter(B, file=f)
    writer.write()
Parameters
  • gb (GraphProfiler) – Instance of GraphProfiler class which is the main executor of profiling.

  • file (Python file object) – Output file object. Profile results will be written to the file which is specified by this argument.

write()[source]

Write result to the file. The output file is specified by file.

Time Profiler
class nnabla.utils.inspection.profile.TimeProfiler(ext_name, device_id)[source]

A utility API for creating function_hook callbacks to profile the execution time of each function. By passing ext_name and device_id, you can define which device time you want to profile. If ext_name is "cuda" or "cudnn", cudaEvent will be used to measure the execution time. For more information about cudaEvent, see the CUDA documentation. If ext_name is "cpu", wall-clock time on the host will be used.

Example:

ext_name = "cpu"
device_id = "0"

from nnabla.ext_utils import get_extension_context
ctx = get_extension_context(ext_name, device_id=device_id)
nn.set_default_context(ctx)

y = model(...)

from nnabla.utils.inspection import TimeProfiler
tp = TimeProfiler(ext_name=ext_name, device_id=device_id)

for i in range(max_iter):
    # All results of executions under "forward" scope are registered as "forward" execution.
    with tp.scope("forward"):
        y.forward(function_pre_hook=tp.pre_hook, function_post_hook=tp.post_hook)

    # All results of executions under "backward" scope are registered as "backward" execution.
    with tp.scope("backward") as tp:
        y.backward(function_pre_hook=tp.pre_hook, function_post_hook=tp.post_hook)

    # All results are evaluated by passing scopes to .calc_elapsed_time().
    # Be sure to call calc_elapsed_time at each iteration, otherwise nothing is measured.
    tp.calc_elapsed_time(["forward", "backward", "summary"])

# To output results on stdout, call instance as a function.
tp()

# To write out as csv file, call .to_csv().
tp.to_csv(output_file_name)
calc_elapsed_time(names=None)[source]

Evaluate all elapsed times. Note that elapsed time is not recorded until calc_elapsed_time is called.

Parameters

names (str or list of str) – Scope name(s) to evaluate elapsed time.

property post_hook

Get a callback for function_post_hook. This function can be used like the example below:

tp = TimeProfiler(...)
with tp.scope("forward"):
    v.forward(function_post_hook=tp.post_hook)

with tp.scope("backward"):
    v.backward(function_post_hook=tp.post_hook)
property pre_hook

Get a callback for function_pre_hook. This function can be used like the example below:

tp = TimeProfiler(...)
with tp.scope("forward"):
    v.forward(function_pre_hook=tp.pre_hook)

with tp.scope("backward"):
    v.backward(function_pre_hook=tp.pre_hook)
scope(scope_name)[source]

Change a scope to aggregate results. This function is used as a context manager (the with statement), and all results under the context are labeled by scope_name. In addition to the execution time of each function, the elapsed time between entering and exiting each context is also recorded, and these are aggregated in the “summary” scope.

Parameters

scope_name (str) – Scope name.

to_csv(out_dir='./', ignore_init=True)[source]

Writes out to a CSV file. The output directory can be specified by out_dir. By default, the elapsed times of the first iteration are omitted. If you want to include the first iteration as well, pass False to ignore_init.

Parameters
  • out_dir (str) – Output directory.

  • ignore_init (bool) – Ignore the result of the first iteration or not.

Nan/Inf Tracer
class nnabla.utils.inspection.value_trace.NanInfTracer(trace_nan=True, trace_inf=True, need_details=True)[source]

A utility API for creating function_hook callbacks to check whether the outputs of all layers contain NaN or inf values. Passed as function_hook during forward and backward execution, this API raises ValueError if at least one layer output contains NaN or inf; otherwise, all tensors are passed to the next layer or function as they are.

Example:

pred = model(...)

from nnabla.utils.inspection import NanInfTracer
nit = NanInfTracer(trace_inf=True, trace_nan=True, need_details=True)

with nit.trace():
    pred.forward(function_post_hook=nit.forward_post_hook)
    pred.backward(function_post_hook=nit.backward_post_hook)
property backward_post_hook

Create a callback function object which can be used as a function_post_hook argument of backward().

check()[source]

Checks for nan/inf at all outputs of all layers and raises ValueError only if NaN or inf exists.
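Without the trace() context manager described below, the check is invoked explicitly; a minimal sketch (pred is a graph output built beforehand, as in the class example):

from nnabla.utils.inspection import NanInfTracer

nit = NanInfTracer()
pred.forward(function_post_hook=nit.forward_post_hook)
pred.backward(function_post_hook=nit.backward_post_hook)
nit.check()  # raises ValueError only if some output contains NaN or inf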

property forward_post_hook

Create a callback function object which can be used as a function_post_hook argument of forward().

trace()[source]

Create a context manager to check for nan/inf by using the with statement. With this context manager, the nan/inf check is performed automatically just before exiting the with scope. If you do not use this context manager, be sure to call .check() explicitly.

Example:

nit = NanInfTracer()
with nit.trace():
    pred.forward(function_post_hook=nit.forward_post_hook)
    pred.backward(function_post_hook=nit.backward_post_hook)
Pretty Printer
class nnabla.utils.inspection.pretty_print.PrettyPrinter(summary=False, hidden=False)[source]

Pretty printer to print the graph structure; used with the visit method of a Variable.

functions

List of dictionaries, one per function. Each dictionary holds the ('name', function name), ('inputs', list of input variables), and ('outputs', list of output variables) of a function.

Type

list of dict
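A usage sketch, assuming the printer instance is passed to the visit method of a Variable as described above (model is a placeholder):

from nnabla.utils.inspection import PrettyPrinter

pred = model(...)  # some computation graph

pp = PrettyPrinter(summary=False)
pred.visit(pp)  # traverse the graph, printing each function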

nnabla.utils.inspection.pretty_print.pprint(v, forward=False, backward=False, summary=False, hidden=False, printer=False)[source]

Pretty print information of a graph from a root variable v.

Note that in order to print the summary statistics, this function stores (i.e., does not reuse) the intermediate buffers of a computation graph, increasing the memory usage if either forward or backward is True.

Parameters
  • v (nnabla.Variable) – Root variable.

  • forward (bool) – Call the forward method of a variable v.

  • backward (bool) – Call the backward method of a variable v.

  • summary (bool) – Print statistics of intermediate variables.

  • hidden (bool) – Store the intermediate input and output variables if True.

  • printer (bool) – Return the printer object if True.

Example:

pred = Model(...)

from nnabla.utils.inspection import pprint

pprint(pred, summary=True, forward=True, backward=True)
DLPack

Via a DLPack capsule, you can borrow a tensor from external software as an nnabla.NdArray, and you can share an NdArray with external software.

nnabla.utils.dlpack.from_dlpack(dlp, arr=None)

Decode a DLPack to NdArray.

Example:

# Create a tensor with an external tool and encode it as a DLPack.
import torch
from torch.utils.dlpack import to_dlpack
t = torch.ones((5, 5), dtype=torch.float32,
               device=torch.device('cuda'))
dlp = to_dlpack(t)

# Borrow the DLPack tensor as nnabla.NdArray
from nnabla.utils.dlpack import from_dlpack
arr = from_dlpack(dlp)

If you want to move the ownership of a DLPack to an existing NdArray:

from nnabla import NdArray
arr = NdArray()
from_dlpack(dlp, arr=arr)
Parameters
  • dlp (PyCapsule) – A PyCapsule object of a DLManagedTensor (as "dltensor") whose internal memory is borrowed from a tensor of an external package. The ownership of the DLManagedTensor is moved to an NdArray object, and the PyCapsule object is marked as "used_dltensor" to inform that the ownership has been moved.

  • arr (NdArray) – If specified, a given DLPack is decoded to it. Otherwise, it creates a new NdArray object and decodes the DLPack to it.

Returns

an NdArray object borrowing the DLPack tensor.

Return type

NdArray

nnabla.utils.dlpack.to_dlpack(a, dtype=None, ctx=None)

Returns a DLPack which owns an internal array object borrowed from a specified NdArray.

Example:

# Create a nnabla.NdArray in CUDA.
import numpy as np
import nnabla as nn
from nnabla.ext_utils import get_extension_context
ctx = get_extension_context('cudnn')
nn.set_default_context(ctx)

a = nn.NdArray.from_numpy_array(np.ones((5, 5), dtype=np.float32))
a.cast(np.float32, ctx)

# Expose as a DLPack.
from nnabla.utils.dlpack import to_dlpack
dlp = to_dlpack(a)

# Use the DLPack in PyTorch.
import torch
from torch.utils.dlpack import from_dlpack
t = from_dlpack(dlp)

# Changing the values in Torch will also be affected in nnabla
# because they share memory.
t.add_(1)
print(a.data)  # All values become 2.
Parameters
  • a (NdArray) – An NdArray object. An internal array which is recently modified or created will be encoded into a DLPack.

  • dtype (numpy.dtype) – If specified, in-place cast operation may be performed before encoding it to a DLPack.

  • ctx (Context) – If specified, in-place device transfer operation may be performed before encoding it into a DLPack.

Returns

A PyCapsule object of a DLManagedTensor (as "dltensor") whose internal memory is borrowed from the specified NdArray.

Return type

PyCapsule

RNN Utils
class nnabla.utils.rnn.PackedSequence[source]
Parameters
  • data (nnabla.Variable) – Packed sequence.

  • batch_sizes (nnabla.Variable) – Batch size for each time step; it always resides on the CPU.

  • sorted_indices (nnabla.Variable) – Sorted indices to reconstruct the original sequences.

  • unsorted_indices (nnabla.Variable) – Unsorted indices to reconstruct the original sequences.

nnabla.utils.rnn.pad_sequence(sequences, batch_first=False, padding_value=0.0)[source]

Pad a list of variable-length Variables.

This method stacks a list of variable-length nnabla.Variable objects with the padding_value.

\(T_i\) is the length of the \(i\)-th Variable in the sequences. \(B\) is the batch size equal to the length of the sequences. \(T\) is the max of \(T_i\) for all \(i\). \(*\) is the remaining dimensions including none.

Note

This function must be used in the dynamic computation mode.

Example:

import numpy as np
import nnabla as nn
import nnabla.functions as F
import nnabla.utils.rnn as rnn_utils

nn.set_auto_forward(True)

l2v = lambda ldata: nn.Variable.from_numpy_array(np.asarray(ldata))
a = l2v([1, 1, 1, 1])
b = l2v([2, 2, 2])
c = l2v([2, 2, 2])
d = l2v([3, 3])
e = l2v([3, 3])
sequences = [a, b, c, d, e]

padded_sequence = rnn_utils.pad_sequence(sequences)
print(padded_sequence.d)
Parameters
  • sequences (list of nnabla.Variable) – Sequence of the variable of (\(T_i\), \(*\)) shape.

  • batch_first (bool) – If False, output is of (\(T\), \(B\), \(*\)) shape, otherwise (\(B\), \(T\), \(*\)).

  • padding_value (float) – Padding value.

Returns

nnabla.Variable of (\(T\), \(B\), \(*\)) or (\(B\), \(T\), \(*\)) shape

nnabla.utils.rnn.pack_padded_sequence(padded_sequence, lengths, batch_first=False, enforce_sorted=True)[source]

Pack padded variable-length sequences.

This method packs padded variable-length sequences.

\(T\) is the max length over the lengths of sequences. \(B\) is the batch size equal to the length of the sequences. \(*\) is the remaining dimensions including none.

Note

This function must be used in the dynamic computation mode.

Example:

import numpy as np
import nnabla as nn
import nnabla.functions as F
import nnabla.utils.rnn as rnn_utils

nn.set_auto_forward(True)

l2v = lambda ldata: nn.Variable.from_numpy_array(np.asarray(ldata))
a = l2v([1, 1, 1, 1])
b = l2v([2, 2, 2])
c = l2v([2, 2, 2])
d = l2v([3, 3])
e = l2v([3, 3])
sequences = [a, b, c, d, e]
lengths = l2v([seq.shape[0] for seq in sequences])

padded_sequence = rnn_utils.pad_sequence(sequences)
print(padded_sequence.d)

packed_sequence = rnn_utils.pack_padded_sequence(padded_sequence, lengths)
print(packed_sequence.data.d)
print(packed_sequence.batch_sizes.d)
Parameters
  • padded_sequence (nnabla.Variable) – Padded sequence of (\(T \times B \times *\)) or (\(B \times T \times *\)) shape.

  • lengths (nnabla.Variable) – Sequence length for each batch; it always resides on the CPU.

  • batch_first (bool) – padded_sequence is of (\(T\), \(B\), \(*\)) shape if False, otherwise (\(B\), \(T\), \(*\)).

  • enforce_sorted (bool) – Sequences are sorted by the length in a decreasing order if True. Default is True.

Returns

PackedSequence

nnabla.utils.rnn.pack_sequence(sequences, batch_first=False, enforce_sorted=True)[source]

Pack a list of variable-length Variables.

This method packs a list of variable-length Variables.

\(T_i\) is the length of the \(i\)-th Variable in the sequences. \(T\) is the max of \(T_i\) for all \(i\). \(B\) is the batch size equal to the length of the sequences. \(*\) is the remaining dimensions including none.

Note

This function must be used in the dynamic computation mode.

Example:

import numpy as np
import nnabla as nn
import nnabla.functions as F
import nnabla.utils.rnn as rnn_utils

nn.set_auto_forward(True)

l2v = lambda ldata: nn.Variable.from_numpy_array(np.asarray(ldata))
a = l2v([3, 3])
b = l2v([2, 2, 2])
c = l2v([2, 2, 2])
d = l2v([1, 1, 1, 1])
e = l2v([3, 3])
sequences = [a, b, c, d, e]

packed_sequence = rnn_utils.pack_sequence(sequences, enforce_sorted=False)
print(packed_sequence.data.d)
print(packed_sequence.batch_sizes.d)
Parameters
  • sequences (list of nnabla.Variable) – List of nnabla.Variable of (\(T_i\), \(*\)) shape.

  • enforce_sorted (bool) – If True, sequences are expected to be sorted by length in decreasing order. Default is True.

Returns

packed_sequence

Return type

PackedSequence

nnabla.utils.rnn.pad_packed_sequence(sequence, batch_first=False, padding_value=0.0, total_length=None)[source]

Pad packed sequence.

This method unpacks the packed sequence and pads it; it is the inverse operation of pack_padded_sequence().

\(T_i\) is the length of the \(i\)-th Variable in the sequences. \(B\) is the batch size equal to the length of the sequences. \(T\) is the max of \(T_i\) for all \(i\). \(*\) is the remaining dimensions including none.

Note

This function must be used in the dynamic computation mode.

Example:

import numpy as np
import nnabla as nn
import nnabla.functions as F
import nnabla.utils.rnn as rnn_utils

nn.set_auto_forward(True)

l2v = lambda ldata: nn.Variable.from_numpy_array(np.asarray(ldata))
a = l2v([3, 3])
b = l2v([2, 2, 2])
c = l2v([2, 2, 2])
d = l2v([1, 1, 1, 1])
e = l2v([3, 3])
sequences = [a, b, c, d, e]

packed_sequence = rnn_utils.pack_sequence(sequences, enforce_sorted=False)
print(packed_sequence.data.d)
print(packed_sequence.batch_sizes.d)

padded_sequence, lengths = rnn_utils.pad_packed_sequence(packed_sequence)
print(padded_sequence.d)
print(lengths.d)
Parameters
  • sequence (PackedSequence) – PackedSequence.

  • batch_first (bool) – If False, output is of (\(T\), \(B\), \(*\)) shape, otherwise (\(B\), \(T\), \(*\)).

  • padding_value (float) – Padding value.

  • total_length (int) – If not None, the outputs are padded up to the total_length. If the total_length is less than the max length in the sequences, the error is thrown. This is normally used in the distributed training to align with the longest sequence in a distributed system.

Returns

nnabla.Variable of (\(T\), \(B\), \(*\)) or (\(B\), \(T\), \(*\)) shape

Misc
Python function profiler utilities
nnabla.utils.function_profile.profile(fn=None, condition=None, profile_class=<class 'cProfile.Profile'>, print_freq=0, sort_keys=None, print_restrictions=None)[source]

A decorator for profiling a function with a Python profiler such as cProfile.Profile.

Note: function here doesn’t refer to nnabla’s Function, but to a plain Python function.

Parameters
  • fn (function) – A function that is profiled. If None is specified (default), it returns a new decorator function. It is used when you want to specify optional arguments of this decorator function.

  • condition (function) – A function object which takes the same inputs as the decorated function and returns a boolean value. The decorated function is profiled only when the condition function returns True. By default, it always returns True, hence profiling is performed every time the decorated function is called.

  • profile_class (class) – A profiler class such as cProfile.Profile and profile.Profile. The default value is cProfile.Profile.

  • print_freq (int) – The profiling result is printed at function calls with an interval specified by print_freq. If 0 is specified (default), the profiling result is only printed at the end of the Python process unless decorated_func.profiler.print_stats() is called manually.

  • sort_keys (iterable) – A list or tuple of string, which is passed to pstats.Stats.sort_stats() as arguments. The default is ('cumulative', 'time', 'calls').

  • print_restrictions (iterable) – A list or tuple which is passed to pstats.Stats.print_stats() as arguments. The default value is (40,), which results in only the top 40 functions inside the decorated function being printed in the profiling result.

Returns: function

A decorated function. If fn is None, a new decorator function with optional arguments specified.

Example

By decorating a function as follows, the profiling result is printed at the end of the Python process.

from nnabla.utils import function_profile

@function_profile.profile
def foo(a, b, c=None, d=None):
    ...

If you want to manually print the profiling result so far, use FunctionProfile.print_stats() of the FunctionProfile object attached to the decorated function as profiler attribute.

foo.profiler.print_stats()

If you want to profile the function only when a specific argument is passed to it, use the condition argument as follows.

def profile_only_if_c_is_not_none(a, b, c=None, d=None):
    return c is not None

@function_profile.profile(condition=profile_only_if_c_is_not_none)
def foo(a, b, c=None, d=None):
    ...
class nnabla.utils.function_profile.FunctionProfile(fn, condition=None, profile_class=<class 'cProfile.Profile'>, print_freq=0, sort_keys=None, print_restrictions=None)[source]

Function profiler object.

This is usually not directly used by users. It’s created via profile(), and attached to a decorated function object as an attribute profiler. See profile function for details.

print_stats(reset=True)[source]

Manually print profiling result.

Parameters

reset (bool) – If False is specified, the profiling statistics collected so far are maintained. If True (default), reset_stats() is called to reset the profiling statistics.

reset_stats()[source]

Manually reset the profiling statistics collected so far.

Extensions

NNabla offers easy extensibility for developers to add new device extensions. The NNabla Python package officially supports the cpu, cuda, and cudnn extensions; the cuda and cudnn extensions can dramatically accelerate computation by leveraging NVIDIA CUDA GPUs with cuDNN computation primitives.

You can manually import extensions by:

import nnabla_ext.cudnn

See the Python package installation section to install the CUDA extension.
Utilities for extension

Utilities for NNabla extensions.

nnabla.ext_utils.list_extensions()[source]

List available extensions.

Note

It may not work on some platforms/environments since it depends on the directory structure of the namespace packages.

Returns: list of str

Names of available extensions.
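
A quick way to check what is available in your environment (the printed list depends on which extension packages are installed):

from nnabla.ext_utils import list_extensions

# With only the base package this typically prints ['cpu'];
# with the CUDA extension package installed, 'cuda' and 'cudnn' appear as well.
print(list_extensions())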

nnabla.ext_utils.import_extension_module(ext_name)[source]

Import an extension module by name.

The extension modules are installed under the nnabla_ext package as namespace packages. All extension modules provide a unified set of APIs.

Parameters

ext_name (str) – Extension name. e.g. ‘cpu’, ‘cuda’, ‘cudnn’ etc.

Returns: module

A Python module of a particular NNabla extension.

Example

ext = import_extension_module('cudnn')
available_devices = ext.get_devices()
print(available_devices)
ext.device_synchronize(available_devices[0])
ext.clear_memory_cache()
nnabla.ext_utils.get_extension_context(ext_name, **kw)[source]

Get the context of the specified extension.

Every extension module must provide a context(**kw) function.

Parameters
  • ext_name (str) – Module path relative to nnabla_ext.

  • kw (dict) – Additional keyword arguments for the context function in an extension module.

Returns

The current extension context.

Return type

nnabla.Context

Example

ctx = get_extension_context('cudnn', device_id='0', type_config='half')
nn.set_default_context(ctx)
APIs of extension modules

All extension modules must have the following functions.

nnabla.ext_utils.context(*kw)

Returns a default context descriptor of the extension module. This method takes optional arguments depending on the extension. For example, in the cudnn extension, it takes a device_id as an int to specify the GPU on which the computation runs.

nnabla.ext_utils.device_synchronize(*kw)

This method is used to synchronize the device execution stream with respect to the host thread. For example, in CUDA, kernel execution is enqueued into a stream and executed asynchronously w.r.t. the host thread. This function is only meaningful on devices that use such features. In the CPU implementation, this method is a dummy function, and therefore calls to it are ignored. The function in the cudnn extension takes device_id as an optional argument, which specifies the device you want to synchronize with.
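
A common use of device_synchronize is accurate timing of GPU execution. The following is a minimal sketch, assuming the cudnn extension is installed:

import time

import nnabla as nn
import nnabla.functions as F
from nnabla.ext_utils import get_extension_context, import_extension_module

# Build a small graph on the GPU.
nn.set_default_context(get_extension_context('cudnn', device_id='0'))
x = nn.Variable((64, 1000))
y = F.relu(x)

ext = import_extension_module('cudnn')
start = time.time()
y.forward()                   # the kernel is enqueued asynchronously
ext.device_synchronize('0')   # block the host until the GPU stream finishes
print('elapsed [s]:', time.time() - start)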

Pretrained Models

The nnabla.models package provides APIs that allow users to execute state-of-the-art pre-trained models for inference and training in a few lines of code.

ImageNet Models

This subpackage provides a variety of pre-trained state-of-the-art models trained on the ImageNet dataset.

The pre-trained models can be used for both inference and training as follows:

# Create ResNet-50 for inference
import nnabla as nn
import nnabla.functions as F
import nnabla.parametric_functions as PF
import numpy as np
from nnabla.models.imagenet import ResNet50
model = ResNet50()
batch_size = 1
# model.input_shape returns (3, 224, 224) for ResNet-50
x = nn.Variable((batch_size,) + model.input_shape)
y = model(x, training=False)

# Execute inference
# Load input image as uint8 array with shape of (3, 224, 224)
from nnabla.utils.image_utils import imread
img = imread('example.jpg', size=model.input_shape[1:], channel_first=True)
x.d[0] = img
y.forward()
predicted_label = np.argmax(y.d[0])
print('Predicted label:', model.category_names[predicted_label])


# Create ResNet-50 for fine-tuning
batch_size=32
x = nn.Variable((batch_size,) + model.input_shape)
# * By training=True, it sets batch normalization mode for training
#   and gives trainable attributes to parameters.
# * By use_up_to='pool', it creates a network up to the output of
#   the final global average pooling.
pool = model(x, training=True, use_up_to='pool')

# Add a classification layer for another 10 category dataset
# and loss function
num_classes = 10
y = PF.affine(pool, num_classes, name='classifier10')
t = nn.Variable((batch_size, 1))
loss = F.sum(F.softmax_cross_entropy(y, t))

# Training...

Available models are summarized in the following table. Error rates are calculated using single center crop.

Available ImageNet models

Name             Class          Top-1 error  Top-5 error  Trained by/with
ResNet-18        ResNet18       30.28        10.90        Neural Network Console
ResNet-34        ResNet34       26.72        8.89         Neural Network Console
ResNet-50        ResNet50       24.59        7.48         Neural Network Console
ResNet-101       ResNet101      23.81        7.01         Neural Network Console
ResNet-152       ResNet152      23.48        7.09         Neural Network Console
MobileNet        MobileNet      29.51        10.34        Neural Network Console
MobileNetV2      MobileNetV2    29.94        10.82        Neural Network Console
SENet-154        SENet          22.04        6.29         Neural Network Console
SqueezeNet v1.0  SqueezeNetV10  42.71        20.12        Neural Network Console
SqueezeNet v1.1  SqueezeNetV11  41.23        19.18        Neural Network Console
VGG-11           VGG11          30.85        11.38        Neural Network Console
VGG-13           VGG13          29.51        10.46        Neural Network Console
VGG-16           VGG16          29.03        10.07        Neural Network Console
NIN              NIN            42.91        20.66        Neural Network Console
DenseNet-161     DenseNet       23.82        7.02         Neural Network Console
InceptionV3      InceptionV3    21.82        5.88         Neural Network Console
Xception         Xception       23.59        6.91         Neural Network Console
GoogLeNet        GoogLeNet      31.22        11.34        Neural Network Console
ResNeXt-50       ResNeXt50      22.95        6.73         Neural Network Console
ResNeXt-101      ResNeXt101     22.80        6.74         Neural Network Console
ShuffleNet       ShuffleNet10   34.15        13.85        Neural Network Console
ShuffleNet-0.5x  ShuffleNet05   41.99        19.64        Neural Network Console
ShuffleNet-2.0x  ShuffleNet20   30.34        11.12        Neural Network Console

Common interfaces
class nnabla.models.imagenet.base.ImageNetBase[source]

Most ImageNet pretrained models inherit from this class, which provides some common interfaces.

__call__(input_var=None, use_from=None, use_up_to='classifier', training=False, force_global_pooling=False, check_global_pooling=True, returns_net=False, verbose=0)[source]

Create a network (computation graph) from a loaded model.

Parameters
  • input_var (Variable, optional) – If given, the input variable is replaced with the given variable and a network is constructed on top of it. Otherwise, a variable with batch size 1 and the default shape from self.input_shape is used.

  • use_up_to (str) – Network is constructed up to a variable specified by a string. A list of string-variable correspondences in a model is described in documentation for each model class.

  • training (bool) – This option enables additional training (fine-tuning, transfer learning etc.) for the constructed network. If True, the batch_stat option in batch normalization is turned True, and need_grad attribute in trainable variables (conv weights and gamma and beta of bn etc.) is turned True. The default is False.

  • force_global_pooling (bool) – If True, the final average pooling before the classification layer is automatically transformed into a global average pooling regardless of the input image size. The default is False.

  • check_global_pooling (bool) – If True, an exception is raised if the stride configuration of the final average pooling is not that of global pooling. The default is True. Use False when you want to pool with the trained stride (7, 7) regardless of the input spatial size.

  • returns_net (bool) – When True, it returns a NnpNetwork object. Otherwise, it only returns the last variable of the constructed network. The default is False.

  • verbose (bool, or int) – Verbose level. With 0, it says nothing during network construction.

property category_names

Returns category names of 1000 ImageNet classes.

property input_shape

Returns the default image size (channel, height, width) as a tuple.

List of models
class nnabla.models.imagenet.ResNet18[source]

An alias of ResNet (18).

class nnabla.models.imagenet.ResNet34[source]

An alias of ResNet (34).

class nnabla.models.imagenet.ResNet50[source]

An alias of ResNet (50).

class nnabla.models.imagenet.ResNet101[source]

An alias of ResNet (101).

class nnabla.models.imagenet.ResNet152[source]

An alias of ResNet (152).

class nnabla.models.imagenet.ResNet(num_layers=18)[source]

ResNet architectures with 18, 34, 50, 101, or 152 layers.

Parameters

num_layers (int) – Number of layers chosen from 18, 34, 50, 101, and 152.

The following is a list of strings that can be specified for the use_up_to option of the __call__ method (a usage sketch follows this list):

  • 'classifier' (default): The output of the final affine layer for classification.

  • 'pool': The output of the final global average pooling.

  • 'lastconv': The input of the final global average pooling without ReLU activation.

  • 'lastconv+relu': Network up to 'lastconv' followed by ReLU activation.
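
For instance, use_up_to='lastconv' cuts the network before the classifier for feature extraction. A minimal sketch (the printed shape assumes the standard ResNet-50 layout with the default 224x224 input):

import nnabla as nn
from nnabla.models.imagenet import ResNet50

model = ResNet50()
x = nn.Variable((1,) + model.input_shape)
# Build the graph only up to the input of the final global average pooling.
feat = model(x, use_up_to='lastconv')
print(feat.shape)  # expected: (1, 2048, 7, 7)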

References

class nnabla.models.imagenet.MobileNet[source]

MobileNet architecture.

The following is a list of strings that can be specified for the use_up_to option of the __call__ method:

  • 'classifier' (default): The output of the final affine layer for classification.

  • 'pool': The output of the final global average pooling.

  • 'lastconv': The input of the final global average pooling without ReLU activation.

  • 'lastconv+relu': Network up to 'lastconv' followed by ReLU activation.

References

class nnabla.models.imagenet.MobileNetV2[source]

MobileNetV2 architecture.

The following is a list of strings that can be specified for the use_up_to option of the __call__ method:

  • 'classifier' (default): The output of the final affine layer for classification.

  • 'pool': The output of the final global average pooling.

  • 'lastconv': The input of the final global average pooling without ReLU activation.

  • 'lastconv+relu': Network up to 'lastconv' followed by ReLU activation.

References

class nnabla.models.imagenet.SENet[source]

SENet-154 model which integrates SE blocks with a modified ResNeXt architecture.

The following is a list of strings that can be specified for the use_up_to option of the __call__ method:

  • 'classifier' (default): The output of the final affine layer for classification.

  • 'pool': The output of the final global average pooling.

  • 'lastconv': The input of the final global average pooling without ReLU activation.

  • 'lastconv+relu': Network up to 'lastconv' followed by ReLU activation.

References

class nnabla.models.imagenet.SqueezeNetV10[source]

An alias of SqueezeNet ('v1.0').

class nnabla.models.imagenet.SqueezeNetV11[source]

An alias of SqueezeNet ('v1.1').

class nnabla.models.imagenet.SqueezeNet(version='v1.1')[source]

SqueezeNet model for architectures v1.0 and v1.1.

Parameters

version (str) – Version chosen from ‘v1.0’ and ‘v1.1’.

The following is a list of strings that can be specified for the use_up_to option of the __call__ method:

  • 'classifier' (default): The output of the final affine layer for classification.

  • 'pool': The output of the final global average pooling.

  • 'lastconv': The input of the final global average pooling without ReLU activation.

  • 'lastconv+relu': Network up to 'lastconv' followed by ReLU activation.

References

class nnabla.models.imagenet.VGG11[source]

An alias of VGG (11).

class nnabla.models.imagenet.VGG13[source]

An alias of VGG (13).

class nnabla.models.imagenet.VGG16[source]

An alias of VGG (16).

class nnabla.models.imagenet.VGG(num_layers=11)[source]

VGG architectures with 11, 13, or 16 layers.

Parameters

num_layers (int) – Number of layers chosen from 11, 13, 16.

The following is a list of strings that can be specified for the use_up_to option of the __call__ method:

  • 'classifier' (default): The output of the final affine layer for classification.

  • 'pool': The output of the final global average pooling.

  • 'lastconv': The input of the final global average pooling without ReLU activation.

  • 'lastconv+relu': Network up to 'lastconv' followed by ReLU activation.

  • 'lastfeature': Network up to one layer before 'classifier', but without activation.

References

class nnabla.models.imagenet.NIN[source]

NIN (Network In Network) architecture.

The following is a list of strings that can be specified for the use_up_to option of the __call__ method:

  • 'classifier' (default): The output of the final affine layer for classification.

  • 'pool': The output of the final global average pooling.

  • 'lastconv': The input of the final global average pooling without ReLU activation.

  • 'lastconv+relu': Network up to 'lastconv' followed by ReLU activation.

References

class nnabla.models.imagenet.DenseNet[source]

The following is a list of strings that can be specified for the use_up_to option of the __call__ method:

  • 'classifier' (default): The output of the final affine layer for classification.

  • 'pool': The output of the final global average pooling.

  • 'lastconv': The output from last denseblock.

  • 'lastconv+relu': Network up to 'lastconv' followed by ReLU activation.

References

class nnabla.models.imagenet.InceptionV3[source]

InceptionV3 architecture.

The following is a list of strings that can be specified for the use_up_to option of the __call__ method:

  • 'classifier' (default): The output of the final affine layer for classification.

  • 'pool': The output of the final global average pooling.

  • 'prepool': The input of the final global average pooling, i.e. the output of the final inception block.

References

class nnabla.models.imagenet.Xception[source]

Xception model.

The following is a list of strings that can be specified for the use_up_to option of the __call__ method:

  • 'classifier' (default): The output of the final affine layer for classification.

  • 'pool': The output of the final global average pooling.

  • 'lastconv': The input of the final global average pooling without ReLU activation.

  • 'lastconv+relu': Network up to 'lastconv' followed by ReLU activation.

References

class nnabla.models.imagenet.GoogLeNet[source]

GoogLeNet model.

The following is a list of strings that can be specified for the use_up_to option of the __call__ method:

  • 'classifier' (default): The output of the final affine layer for classification.

  • 'pool': The output of the final global average pooling.

  • 'prepool': The input of the final global average pooling, i.e. the output of the final inception block.

References

class nnabla.models.imagenet.ResNeXt50[source]

An alias of ResNeXt (50).

class nnabla.models.imagenet.ResNeXt101[source]

An alias of ResNeXt (101).

class nnabla.models.imagenet.ResNeXt(num_layers=50)[source]

ResNeXt architectures with 50 or 101 layers.

Parameters

num_layers (int) – Number of layers chosen from 50 and 101.

The following is a list of strings that can be specified for the use_up_to option of the __call__ method:

  • 'classifier' (default): The output of the final affine layer for classification.

  • 'pool': The output of the final global average pooling.

  • 'lastconv': The input of the final global average pooling without ReLU activation.

  • 'lastconv+relu': Network up to 'lastconv' followed by ReLU activation.

References

class nnabla.models.imagenet.ShuffleNet10[source]

An alias of ShuffleNet (10).

class nnabla.models.imagenet.ShuffleNet05[source]

An alias of ShuffleNet (5).

class nnabla.models.imagenet.ShuffleNet20[source]

An alias of ShuffleNet (20).

class nnabla.models.imagenet.ShuffleNet(scaling_factor=10)[source]

Model for the architectures ShuffleNet, ShuffleNet-0.5x, and ShuffleNet-2.0x.

Parameters

scaling_factor (int) – To customize the network to a desired complexity, a scale factor is applied to the number of channels. This can be chosen from 10, 5, and 20, corresponding to ShuffleNet, ShuffleNet-0.5x, and ShuffleNet-2.0x respectively.

The following is a list of strings that can be specified for the use_up_to option of the __call__ method:

  • 'classifier' (default): The output of the final affine layer for classification.

  • 'pool': The output of the final global average pooling.

  • 'lastconv': The input of the final global average pooling without ReLU activation.

  • 'lastconv+relu': Network up to 'lastconv' followed by ReLU activation.

References

Object Detection Models

This subpackage provides pre-trained state-of-the-art models for object detection, trained on the ImageNet dataset and fine-tuned on the Pascal VOC and MS COCO datasets.

The pre-trained models can be used for both inference and training as follows:

# Import required modules
import nnabla as nn
from nnabla.models.object_detection import YoloV2
from nnabla.models.object_detection.utils import (
    LetterBoxTransform,
    draw_bounding_boxes)
from nnabla.utils.image_utils import imread, imsave
import numpy as np

# Set device
from nnabla.ext_utils import get_extension_context
nn.set_default_context(get_extension_context('cudnn', device_id='0'))

# Load and create a detection model
h, w = 608, 608
yolov2 = YoloV2('coco')
x = nn.Variable((1, 3, h, w))
y = yolov2(x)

# Load an image and scale it to fit inside the (h, w) frame
img_orig = imread('dog.jpg')
lbt = LetterBoxTransform(img_orig, h, w)

# Execute detection
x.d = lbt.image.transpose(2, 0, 1)[None]
y.forward(clear_buffer=True)

# Draw bounding boxes to the original image
bboxes = lbt.inverse_coordinate_transform(y.d[0])
img_draw = draw_bounding_boxes(
    img_orig, bboxes, yolov2.get_category_names())
imsave("detected.jpg", img_draw)
Available models trained on COCO dataset

Name     Class   mAP    Training framework  Notes
YOLO v2  YoloV2  44.12  Darknet             Weights converted from author’s model

Available models trained on VOC dataset

Name     Class   mAP    Training framework  Notes
YOLO v2  YoloV2  76.00  Darknet             Weights converted from author’s model

Common interfaces
class nnabla.models.object_detection.base.ObjectDetection[source]
__call__(input_var=None, use_from=None, use_up_to='detection', training=False, returns_net=False, verbose=0)[source]

Create a network (computation graph) from a loaded model.

Parameters
  • input_var (Variable, optional) – If given, the input variable is replaced with the given variable and a network is constructed on top of it. Otherwise, a variable with batch size 1 and the default shape from self.input_shape is used.

  • use_up_to (str) – Network is constructed up to a variable specified by a string. A list of string-variable correspondences in a model is described in documentation for each model class.

  • training (bool) – This option enables additional training (fine-tuning, transfer learning etc.) for the constructed network. If True, the batch_stat option in batch normalization is turned True, and need_grad attribute in trainable variables (conv weights and gamma and beta of bn etc.) is turned True. The default is False.

  • returns_net (bool) – When True, it returns a NnpNetwork object. Otherwise, it only returns the last variable of the constructed network. The default is False.

  • verbose (bool, or int) – Verbose level. With 0, it says nothing during network construction.

property input_shape

Returns the default image size (channel, height, width) as a tuple.

class nnabla.models.object_detection.utils.LetterBoxTransform(image, height, width)[source]

Creates an object holding a new letterboxed image as its image attribute.

Letterboxing is defined as scaling the input image to fit inside the desired output image frame (letterbox) while preserving the aspect ratio of the original image. The pixels that are not filled with the original image pixels become 127.

The created object also provides a functionality to convert bounding box coordinates back to the original image frame.

Parameters
  • image (numpy.ndarray) – An uint8 3-channel image

  • height (int) – Letterbox height

  • width (int) – Letterbox width

inverse_coordinate_transform(coords)[source]

Convert the bounding boxes back to the original image frame.

Parameters

coords (numpy.ndarray) – N x M array where M >= 4 and the first 4 elements of M are x, y (center coordinates of the bounding box), w and h (bounding box width and height).

nnabla.models.object_detection.utils.draw_bounding_boxes(img, bboxes, names, colors=None, thresh=0.5)[source]

The transformed coordinates are used to draw bounding boxes for the detected objects.

Parameters
  • img (numpy.ndarray) – Input image

  • bboxes (numpy.ndarray) – Transformed bounding box coordinates from the model.

  • names (list of str) – Name of categories in the dataset

  • colors (list of tuple of 3 ints) – Colors for bounding boxes

  • thresh (float) – Threshold of bounding boxes.

List of models
class nnabla.models.object_detection.YoloV2(dataset='voc')[source]

The following is a list of strings that can be specified for the use_up_to option of the __call__ method:

  • 'detection' (default): The output from the last convolution (detection layer) after post-processing.

  • 'convdetect': The output of last convolution without post-processing.

  • 'lastconv': Network up to the convolution layer + ReLU that comes before the detection convolution layer.

References

Semantic Segmentation Models

This subpackage provides a pre-trained state-of-the-art model for semantic segmentation (DeepLabv3+ with Xception-65 as backbone), trained on the ImageNet dataset and fine-tuned on the Pascal VOC and MS COCO datasets.

The pre-trained models can be used for inference as follows:

# Import required modules
import numpy as np
import nnabla as nn
from nnabla.utils.image_utils import imread
from nnabla.models.semantic_segmentation import DeepLabV3plus
from nnabla.models.semantic_segmentation.utils import ProcessImage

target_h = 513
target_w = 513
# Get context
from nnabla.ext_utils import get_extension_context
nn.set_default_context(get_extension_context('cudnn', device_id='0'))

# Build a Deeplab v3+ network
image = imread("./test.jpg")
x = nn.Variable((1, 3, target_h, target_w), need_grad=False)
deeplabv3 = DeepLabV3plus('voc-coco', output_stride=8)
y = deeplabv3(x)

# preprocess image
processed_image = ProcessImage(image, target_h, target_w)
input_array = processed_image.pre_process()

# Compute inference
x.d = input_array
y.forward(clear_buffer=True)
print ("done")
output = np.argmax(y.d, axis=1)

# Apply post processing
post_processed = processed_image.post_process(output[0])

#Display predicted class names
predicted_classes = np.unique(post_processed).astype(int)
for i in range(predicted_classes.shape[0]):
    print('Classes Segmented: ', deeplabv3.category_names[predicted_classes[i]])

# save inference result
processed_image.save_segmentation_image("./output.png")
Available models trained on VOC dataset

Name        Class       Output stride  mIOU   Training framework  Notes
DeepLabv3+  DeepLabv3+  8              81.48  Nnabla              Backbone (Xception-65) weights converted from author’s model and used for finetuning
DeepLabv3+  DeepLabv3+  16             82.20  Nnabla              Backbone (Xception-65) weights converted from author’s model and used for finetuning

Available models trained on VOC and COCO datasets

Name        Class       Output stride  mIOU   Training framework  Notes
DeepLabv3+  DeepLabv3+  8              82.20  Tensorflow          Weights converted from author’s model
DeepLabv3+  DeepLabv3+  16             83.58  Tensorflow          Weights converted from author’s model

Common interfaces
class nnabla.models.semantic_segmentation.base.SemanticSegmentation[source]

Semantic segmentation pretrained models inherit from this class, which provides some common interfaces.

__call__(input_var=None, use_from=None, use_up_to='segmentation', training=False, returns_net=False, verbose=0)[source]

Create a network (computation graph) from a loaded model.

Parameters
  • input_var (Variable, optional) – If given, the input variable is replaced with the given variable and a network is constructed on top of it. Otherwise, a variable with batch size 1 and the default shape from self.input_shape is used.

  • use_up_to (str) – Network is constructed up to a variable specified by a string. A list of string-variable correspondences in a model is described in documentation for each model class.

  • training (bool) – This option enables additional training (fine-tuning, transfer learning etc.) for the constructed network. If True, the batch_stat option in batch normalization is turned True, and need_grad attribute in trainable variables (conv weights and gamma and beta of bn etc.) is turned True. The default is False.

  • returns_net (bool) – When True, it returns a NnpNetwork object. Otherwise, it only returns the last variable of the constructed network. The default is False.

  • verbose (bool, or int) – Verbose level. With 0, it says nothing during network construction.

property input_shape

Returns the default image size (channel, height, width) as a tuple.

List of models
class nnabla.models.semantic_segmentation.DeepLabV3plus(dataset='voc', output_stride=16)[source]

DeepLabV3+.

Parameters
  • dataset (str) – Specify a training dataset name from ‘voc’ or ‘coco’.

  • output_stride (int) – DeepLabV3 uses atrous (a.k.a. dilated) convolutions. The atrous rate depends on the output stride, which has to be either 8 or 16. If output_stride is 8 the atrous rates will be [12, 24, 36], and if output_stride is 16 the atrous rates will be [6, 12, 18].

The following is a list of strings that can be specified for the use_up_to option of the __call__ method:

  • 'segmentation' (default): The output of the final layer.

  • 'lastconv': The output from last Convolution.

  • 'lastconv+relu': Network up to 'lastconv' followed by ReLU activation.

References

Out-of-core execution

The nnabla.lms package provides APIs that allow users to execute networks larger than the allotted GPU memory by utilizing an out-of-core algorithm. An out-of-core algorithm, also called an external memory algorithm, enables processing data that is too large to fit into main memory at once.

SwapInOutScheduler
class nnabla.lms.SwapInOutScheduler

Interface class for out-of-core execution / training.

This API enables training neural networks whose size is larger than the allotted GPU memory. See https://arxiv.org/abs/2010.14109 for more details of the scheduling strategy.

Note

cuda_init.prefer_cuda_virtual_array() used in the following example requires cuda >= 10.2 and cudnn >= 8, since we utilize the virtual memory management supported from cuda 10.2. Additionally, when we tested virtual memory management with cuda >= 10.2 and cudnn < 8, we found the computation results of some cudnn functions to be inaccurate. So, when your environment has cuda < 10.2 or cudnn < 8, the virtual memory allocator in nnabla will not be built and you cannot use it. If you would like to use SwapInOutScheduler to the fullest extent, please install cuda >= 10.2 and cudnn >= 8 and reinstall the corresponding nnabla-ext-cuda package.

Example:

import nnabla as nn
import nnabla.solvers as S
from nnabla.lms import SwapInOutScheduler

# Change the memory allocator to one preferable for SwapInOutScheduler.
import nnabla_ext.cuda.init as cuda_init
cuda_init.prefer_cpu_pinned_array()  # Pinned CPU memory accelerates host-device memory transfer.

# Only for cuda >= 10.2 and cudnn >= 8. This setting is the best for SwapInOutScheduler.
cuda_init.prefer_cuda_virtual_array()  # A virtual allocator for GPU memory reduces memory fragmentation due to cpu-gpu memory transfers.

# create context for both host and device
from nnabla.ext_utils import get_extension_context
host_ctx = get_extension_context("cpu", device_id="", type_config="float") # device_id is dummy
device_ctx = get_extension_context("cudnn", device_id="0", type_config="float")

scheduler = SwapInOutScheduler(host_ctx, device_ctx, size=max_gpu_memory_size)

# Make sure to call `nn.set_default_context` after calling prefer_xxx_array() to activate a change of memory preference.
nn.set_default_context(device_ctx)

x = nn.Variable(...)
loss = build_network(x)

solver = S.Sgd()
solver.set_parameters(nn.get_parameters())

for i in range(iteration):
    # scheduling memory transfers for all tensors appearing under the context of scheduler.
    with scheduler:
        x.d = next_data()

        loss.forward(clear_no_need_grad=True)

        solver.zero_grad()
        loss.backward(clear_buffer=True)

        solver.update()

When you get an Out-of-Memory (OOM) error under the SwapInOutScheduler, there are two options to avoid it:

  1. Set a smaller budget of GPU memory for scheduling.

  2. Set a smaller size for the physical memory chunks allocated by the virtual memory allocator.

These are exemplified as follows:

Example:

# 1. Set a smaller budget of GPU memory for scheduling.
# You can reduce the ratio below until your network runs.
memsize_for_scheduler = max_gpu_memory_size * 0.8
scheduler = SwapInOutScheduler(..., size=memsize_for_scheduler)

# 2. Set a smaller size for the physical memory chunks allocated by the virtual memory allocator.
# By default, the chunk size is 20MB (20 << 20).
from nnabla_ext.cuda.init import set_cuda_virtual_memory_chunk_size
set_cuda_virtual_memory_chunk_size(2 << 20)  # Set 2MB, for example.
end_scheduling(self)

An interface to specify the end point for scheduling. The range between start_scheduling() and end_scheduling() is the target of a single scheduling.

Note that when using the with statement of SwapInOutScheduler, end_scheduling() is called automatically on exiting the with statement. In general, avoid using start_scheduling() and end_scheduling() directly and use the with statement instead (with scheduler:, see the example above).

function_post_hook(self, func)

A callback executed as function_post_hook in forward and backward.

For all forward and backward calls wrapped by the with statement of SwapInOutScheduler, this callback is set automatically. In general, avoid setting this manually and use the with statement of SwapInOutScheduler.

function_pre_hook(self, func)

A callback executed as function_pre_hook in forward and backward.

For all forward and backward calls wrapped by the with statement of SwapInOutScheduler, this callback is set automatically. In general, avoid setting this manually and use the with statement of SwapInOutScheduler.

start_scheduling(self)

An interface to specify the starting point for scheduling. The range between start_scheduling() and end_scheduling() is the target of a single scheduling.

Note that when using the with statement of SwapInOutScheduler, start_scheduling() is called automatically on entering the with statement. In general, avoid using start_scheduling() and end_scheduling() directly and use the with statement instead (with scheduler:, see the example above).

update_post_hook(self)

A callback executed as post_hook in all solver functions, e.g. solver.update, solver.weight_decay, solver.clip_grad_by_norm, and so on.

For all solver functions wrapped by the with statement of SwapInOutScheduler, this callback is set automatically. In general, avoid setting this manually and use the with statement of SwapInOutScheduler.

update_pre_hook(self)

A callback executed as pre_hook in all solver functions, e.g. solver.update, solver.weight_decay, solver.clip_grad_by_norm, and so on.

For all solver functions wrapped by the with statement of SwapInOutScheduler, this callback is set automatically. In general, avoid setting this manually and use the with statement of SwapInOutScheduler.

Modules

The nnabla.core.module.Module class represents a building block of a neural network.

Module
class nnabla.core.module.Module[source]

Module is a building block of a computation model. Modules are normally composed of lower-level operators or other Modules; nesting them in a tree-like structure can construct a more complex computation model.

Example

You may construct a model by deriving from this class, like:

import numpy as np
import nnabla as nn
import nnabla.parametric_functions as PF
import nnabla.functions as F

class ConvBn(nn.Module):
    def __init__(self, outmaps, kernel=1, stride=1, act=None):
        self.outmaps = outmaps
        self.kernel = kernel
        self.stride = stride
        self.act = act

    def call(self, x, training=True):
        kernel = (self.kernel, self.kernel)
        pad = (self.kernel // 2, self.kernel // 2)
        stride = (self.stride, self.stride)
        h = PF.convolution(x, self.outmaps, kernel,
                           pad, stride, with_bias=False)
        h = PF.batch_normalization(h, batch_stat=training)
        if self.act is None:
            return h
        return self.act(h)


class ResUnit(nn.Module):
    def __init__(self, channels, stride=1, skip_by_conv=True):
        self.conv1 = ConvBn(channels // 4, 1, 1,
                            act=lambda x: F.relu(x, inplace=True))
        self.conv2 = ConvBn(channels // 4, 3, stride,
                            act=lambda x: F.relu(x, inplace=True))
        self.conv3 = ConvBn(channels, 1)
        self.skip_by_conv = skip_by_conv
        self.skip = ConvBn(channels, 1, stride)

    def call(self, x, training=True):
        h = self.conv1(x)
        h = self.conv2(h)
        h = self.conv3(h)

        s = x
        if self.skip_by_conv:
            s = self.skip(s)
        h = F.relu(F.add2(h, s, inplace=True), inplace=True)
        return h

To use this model, you may write code like the following:

res_unit = ResUnit(1024)
x = nn.Variable((64, 3, 32, 32))
x.d = np.random.random(x.shape)
y = res_unit(x)
y.forward(clear_buffer=True)

For working with dynamic networks, you may do the following:

res_unit = ResUnit(1024)
with nn.auto_forward():
    x = nn.Variable.from_numpy_array(np.random.random((1, 3, 32, 32)))
    y = res_unit(x)
    print(y.d)

For training, please pass the parameters in the module scope to the optimizer. For example,

import nnabla.solvers as S

resnet = ResNet(18)
loss = resnet(x, y_)

solver = S.Sgd(lr=1e-3)
solver.set_parameters(resnet.get_parameters())

for _ in range(max_iter):
    x.d, y_.d = data.next()
    loss.forward()
    solver.zero_grad()
    loss.backward()
    solver.weight_decay(1e-5)
    solver.update()

In this example, we assumed that ResNet is a class derived from Module, x and y_ are Variables, and data is an instance of DataIterator that has already been attached to a DataSet.

Note:

As this example shows, the model parameters are owned by the module instance (here, the variable resnet). These parameters are referred to whenever the network performs forward(), backward(), or solver.update(). Hence, it is necessary to keep this module instance from being unexpectedly released, to ensure forward() and backward() can refer to these variables.

call(*args, **kwargs)[source]

Users need to implement this function to construct their neural network. In the implementation, you may instantiate existing predefined Modules as members, then use them. For example:

class AModule(nn.Module):
   def __init__(...):
      ...
      self.cnb = ConvBN(128) # A submodule is instantiated here.

   def call(...):
      h = self.cnb(x) # Using beforehand instantiated submodule.

or directly use parametric functions or functions:

class AModule(nn.Module):
    ...
    def call(...):
        ...
        h = PF.convolution(x, self.outmaps, ...)
        return h

Note

The following usage is currently not supported, it might be supported in future version:

class AModule(nn.Module):
   def __init__(...):
      ...
      self.cnb = [ConvBN(k) for k in [8, 16, 32]] # using an array to hold module instances.
      self.cnb = {f'name_{k}': ConvBN(k) for k in [8, 16, 32]} # using a dict to hold module instances.

Note

The following method to temporarily instantiate a module is also not allowed:

class AModule(nn.Module):
   def call(...):
      ...
cnb = ConvBN(k) # Instantiating a temporary Module instance is not allowed
      y = cnb(x)
      return y

This is because when this scope is left, the parameters registered to the cnb module will be released, which causes unexpected results.

get_parameters(recursive=True, grad_only=False, memo=None)[source]

Obtain an OrderedDict object of all parameters in the current Module.

For example,

x = nn.Variable.from_numpy_array((np.random.random((8, 32, 256, 256))))
conv_bn = ConvBn(2)
y = conv_bn(x)

params = conv_bn.get_parameters()
for parameter_name, parameter_value in params.items():
    print("{}:{}".format(parameter_name, parameter_value.shape))

The output looks like:

conv/W:(2, 32, 1, 1)
bn/beta:(1, 2, 1, 1)
bn/gamma:(1, 2, 1, 1)
bn/mean:(1, 2, 1, 1)
bn/var:(1, 2, 1, 1)

Notice that the parameter name looks like a file path, with slash-separated nested scope names. In addition, the module name is used by default with a prefix @.

Parameters
  • recursive (bool, optional, default=True) – Whether to obtain the parameters of the current module’s submodules. Default is True.

  • grad_only (bool, optional, default=False) – Whether to obtain only the parameters that require gradients (need_grad=True). Default is False.

Returns

Flattened parameter’s name-value pairs of current Module.

Return type

OrderedDict

load_parameters(path, extension='.h5')[source]

Load parameters from a file into this module.

Parameters

path – str or file-like object

property parameter_scope

A module has its owned parameter_scope, which can avoid to pollute global parameter name space. User may obtain the parameter_scope of a module by this property.

Returns

The parameter scope of current Module.

Return type

OrderedDict

save_parameters(path, extension='.h5')[source]

Save parameters of this module to a file.

Parameters

path – str or file-like object
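
A minimal sketch of the save/load round trip, reusing the ConvBn module from the example above (parameters are created on the first call):

import nnabla as nn

conv_bn = ConvBn(2)
x = nn.Variable((8, 32, 256, 256))
y = conv_bn(x)  # parameters are created here

conv_bn.save_parameters('conv_bn.h5')  # the '.h5' extension selects the HDF5 format
conv_bn.load_parameters('conv_bn.h5')  # restore the parameters into the module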

property training

Return a bool value which indicates whether the current Module is in the training state. A module may be set to the training state or not, so that the computation graph created from this module changes according to this state. For example,

class ConvBN(Module):
    ...
    def call(self, x):
        h = self.conv1(x)
        if self.training:
            h = self.drop_out(h)
        h = F.relu(h, inplace=True)
        return h

conv_bn = ConvBN()
conv_bn.training = True
train_y = conv_bn(x)

conv_bn.training = False
eval_y = conv_bn(x)
Returns

A bool which indicates whether the current Module is in the training state.

Return type

bool

zero_grad()[source]

Clear the gradient of the parameters in this module to 0.
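
Conceptually, this is equivalent to clearing the gradient buffer of every parameter owned by the module; a sketch, reusing the ConvBn module from above:

for param in conv_bn.get_parameters().values():
    param.grad.zero()  # same effect as conv_bn.zero_grad()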

Graph Definition

In NNabla, Graph Definition is a representation of a computation graph specially designed for storage optimization and format conversion.

A computation graph can be defined by calling NNabla functions. Such a computation graph instantiates the input and output variables of the functions, and the inherent topology is established for forward or backward computation. For persistence of such a graph, however, another abstract representation is normally used: the so-called protobuf graph (or network), abbreviated as proto graph. In this representation, only the information necessary for persistence is kept; the information used only for computation is dropped.

Graph Definition provides a group of functions and classes that facilitate creating a protobuf network from a computation graph, and saving and restoring a neural network from a persistent protobuf network representation.

ProtoGraph
class nnabla.graph_def.ProtoGraph(networks=None, parameter_scope=None)

This class represents a group of proto networks. It normally corresponds to a .nnp file. In a .nnp file, there might be one or multiple networks; for example, a network used directly for inference, and another network with a similar structure, sharing the same parameters, used for training. This class works as a container of proto networks, providing a group of functions for accessing a proto network by its name. When there is only one network, some shortcut functions are provided for operating directly with this network. For example,

import nnabla as nn

g = nn.graph_def.load("my_model.nnp") # Suppose there is only one network in this file.
x1 = nn.Variable(input_shape)
x1.d = ... # load data here.
y1 = g.networks['executor_net'](x1)  #<== (1)
y1.forward()
print(y1.d)

x2 = nn.Variable(input_shape)
x2.d = ... # load data here.
y2 = g(x2) #<== (2)
# y2 = g.default_graph()(x2) #<== (3)
y2.forward()
print(y2.d)

The computation graphs y1 and y2 are exactly the same, and (2) and (3) are equivalent. If there are multiple networks in a graph, the first network loaded acts as the default network. Please do not use default_graph() when there are multiple networks in a graph, since which network is the default heavily depends on the concrete implementation.

If you know the name of each network, you may access a proto network in this graph by its member name. For example,

g = nn.graph_def.load("my_model.nnp") # Suppose there is only one network in this file.
x = nn.Variable(input_shape)
x.d = ... # load data here.
y = g.executor_net(x) # here, we knew there is a network named "executor_net" existed.
y.forward()
print(y.d)
as_proto(include_parameter=False, only_parameter=False, networks=None, variable_batch_size=True)

This function exports a protobuf data structure, which can be manipulated by google protobuf APIs.

Parameters
  • include_parameter (bool, optional, default=False) – Whether to export the parameters to the protobuf data structure.

  • only_parameter (bool, optional, default=False) – Whether to export only the parameters to the protobuf data structure.

  • networks (array of proto networks, optional, default=None) – Users may provide their own networks to export to a protobuf data structure.

  • variable_batch_size (bool, optional, default=True) – Replace the batch size of the current network with an abstract placeholder, so that the batch size can be replaced with another value at use time.

property current_context

Current backend context of this proto network.

default_graph()

This function returns the default proto network in this graph. Which network is the default depends on the loading sequence; hence, it is only safe to use when there is a single network.

expand_loop_control()

This function expands loop control statements for all networks in this graph.

static from_proto(proto, batch_size=None, param_scope=None, rng=None)

This function creates a proto graph object from a protobuf data structure.

Parameters
  • proto (protobuf object) – A protobuf data structure.

  • batch_size (int, optional, default=None) – The batch size to be applied to this graph. If None, applying a batch size value is deferred.

  • param_scope (OrderedDict, optional, default=None) – Users may provide their own parameter scope.

  • rng (np.random.RandomState, optional, default=None) – A random number generator used for parameter initialization.

get_parameters(grad_only=False)

Get the parameters in the current module name scope.

ProtoNetwork
class nnabla.graph_def.ProtoNetwork(owner, name=None, batch_size=None)

This class represents a protobuf network, which comes from a corresponding computation graph or is restored from a saved protobuf network (e.g. a .nnp file).

This class describes a neural network by the following members:

  • functions: An OrderedDict of name-value pairs, where each value is a ProtoFunction object.

  • variables: An OrderedDict of name-value pairs, where each value is a ProtoVariable object.

  • parameters: An OrderedDict of name-value pairs, where each value is a ProtoVariable object.

  • inputs: A string list, which contains the name of input variables of this network.

  • outputs: A string list, which contains the name of output variables of this network.

variables represents the activations in the network; parameters mainly includes the weights and all learnable parameters. functions represents the functions in the network; the order of functions might not equal the forward execution order. Please use forward_sequence to obtain the exact forward function sequence.

__call__(*args, **kwargs)

Generate a computation graph of this proto network.

Parameters

args (tuple of nn.Variables or None) –

The inputs of the network, which can be different from the inputs of the original computation graph as long as the network allows it.

For example,

import nnabla as nn
import numpy as np

resnet = nn.graph_def.load("resnet.nnp")
x = nn.Variable(input_shape)  # create an input variable with an appropriate shape
x.d = np.random.random(input_shape)
y = resnet(x)

The variable y corresponds to a computation graph; you may then perform forward like:

y.forward()

If the user does not provide inputs to this function, then, because the proto network remembers its network inputs, this function will create corresponding nn.Variable objects as the inputs of this network. These input variables are actually placeholders; hence, users need to find these input variables and fill actual values into them so that the computation graph is ready for forward or backward.

For example,

g = nn.graph_def.load("resnet.nnp")
y = g() # Not provide input variables

To feed training or evaluation data to this network, users need to locate the input variable, for example:

input = g.networks[network_name].variables[input_name].variable_instance
input.d = np.random.random(input_shape)
batch_size (int, optional, default=None):

If provided, batch_size will be applied to the newly created computation graph. For example,

g = nn.graph_def.load("my_model.nnp")
y = g(batch_size=32)

In this sample, batch_size is used to create a computation graph with the specified batch size. Suppose x is the input of the network and its original shape is (1, 3, 32, 32); the shape in the actual computation graph will then be (32, 3, 32, 32).

as_proto(**kwargs)

This function returns a protobuf data structure, which can be directly accessed by the functions in nnabla.utils.nnabla_pb2. Thus, it allows users to further manipulate this protobuf representation, for example, performing format conversion or network structure optimization.

Parameters

variable_batch_size (bool, optional) – If true, the batch size of the network will be replaced with an abstract representation, so that it can be replaced with another value when restoring the computation graph.

Returns

A protobuf object.

Return type

protobuf

execute_on_proto(execute)

This function performs a virtual forward pass, following the sequence from inputs to outputs. It does not traverse the graph with recursive calls; instead, a non-recursive algorithm is used. execute is called whenever a function is met, and a ProtoFunction object is passed in for further operation with this function.

Parameters

execute (callable) –

A callback function (or callable object), which is called each time a ProtoFunction is met while traversing the graph.

execute should look like:

def execute(pf: ProtoFunction):
    # Do what you want to do with pf
    pass

Or:

class MyCallback:
    def __call__(pf: ProtoFunction):
        # Do what you want to do with pf
        pass

expand_loop_control()

This function expands loop control statements and generates a new proto network object without them. Loop control statements cannot be created from Python code; they can only be created by the interactive neural network design tool. The following briefly introduces the specification:

  • As for variable,

    In nntxt, if the variable includes a field repeat_id, this variable is surrounded by a loop control structure. A renaming rule is applied when expanding the network. A postfix is appended to the variable name, like:

    • For old style, e.g.:

    BatchNormalization_6/bn/mean --> BatchNormalization_6/bn/mean_RepeatStart[0]
                                                                     ^        ^  repeat_time
                                                                  repeat_id[index]
    
    original_name --> original_name + << _%repeat_id%[%repeat_time%],  for each in repeat_id >>
    
    • For new style, e.g.:

    BatchNormalization_6{RepeatStart}/bn/mean --> BatchNormalization_6[0]/bn/mean_RepeatStart
                                                                       ^
                                                                  repeat_time
    original_name --> original_name + << [%repeat_time%],  for each in repeat_id >>
    
  • As for RepeatStart, RepeatEnd

    The function or variable nodes between these 2 layers will be repeated. Expanding creates the repeated copies of the functions and variables and connects them to each other.

  • As for RecurrentInput,

    The axis of RecurrentParam indicates which axis will be split. Each branch duplicates the functions and variables with this repeat_id. This layer works like a split function.

  • As for RecurrentOutput,

    RecurrentOutput merges multiple branches into one output, like a stack function.

  • As for Delay

    The first time, the output is its input[1]; after that, the output is its input[0].

forward_sequence()

This function creates an iterable for iterating over the functions in the network in the actual forward execution order.

For example,

for pf in proto_network.forward_sequence():
    print(pf.name)
promote(callback)

User may manipulate a proto network by a callback, like NnpNetworkPass.

Parameters

callback (NnpNetworkPass,) – Currently, only NnpNetworkPass object is supported as a network promotion callback.

Developers may manipulate a proto network by a modifier, which acts as a callback. nnabla.utils.nnp_graph.NnpNetworkPass is one kind of modifier. The following gives a simple example to illustrate this usage:

Example

import nnabla as nn
from nnabla.utils import nnp_graph

verbose = 1
callback = nnp_graph.NnpNetworkPass(verbose)

@callback.on_generate_function_by_name('Convolution')
def change_convolution_param(f):
    print('{}'.format(f.proto.convolution_param.pad.dim[:]))
    f.proto.convolution_param.pad.dim[:] = [1, 1]
    return f

g = nn.graph_def.load("my_model.nnp")
n = g.default_graph().promote(callback)
x = nn.Variable(input_shape)
y = n(x)
y.forward()

In this example, a callback is defined to change the pad of a Convolution function, locating the target function by its function name; here, only functions named 'Convolution' are located and operated on.

save(filename, include_parameter=False, variable_batch_size=True)

This function saves the current proto network to a file specified by filename, normally a .nnp file.

Parameters
  • filename (str) – Filename string; its extension is used to determine the file format, normally .nnp.

  • include_parameter (bool, optional, default=False) – Whether to save the parameters to the protobuf tree.

  • variable_batch_size (bool, optional, default=True) – Whether to replace the current network’s batch size dimension with an abstract representation. If true, another batch size can be used when this network is reused.
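
For example, a sketch that loads a .nnp file and saves its default proto network together with the parameters:

import nnabla as nn

g = nn.graph_def.load("my_model.nnp")
g.default_graph().save("my_model_with_params.nnp", include_parameter=True)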

ProtoVariable
class nnabla.graph_def.ProtoVariable(shape, name=None, need_grad=False, var_type='Buffer')

This class represents a variable, a so-called proto variable. When this variable is passed to a network definition, a proto network is generated within a proto graph scope. If this happens under a with statement as g, the proto network is generated in g; otherwise, it is generated in the global graph scope.
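
A sketch of the two scopes (the layer name and shapes here are arbitrary):

import nnabla as nn
import nnabla.parametric_functions as PF

# Proto network generated inside the graph scope g:
with nn.graph_def.graph() as g:
    x = nn.ProtoVariable((1, 3, 32, 32))
    y = PF.convolution(x, 8, (3, 3), name='conv')

# Proto network generated in the global graph scope:
x = nn.ProtoVariable((1, 3, 32, 32))
y = PF.convolution(x, 8, (3, 3), name='conv')
g_global = nn.graph_def.get_default_graph()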

ProtoFunction
class nnabla.graph_def.ProtoFunction(func, f_type, args, name=None, owner=None)

This class represents a function that is used to define a proto network.

The following properties describe a proto function (see the sketch after this list):
  • name: The name of this function.

  • type: The type of this function, e.g. ReLU.

  • inputs: An array of variable names representing the input proto variables.

  • outputs: An array of variable names representing the output proto variables.
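
A sketch that prints these properties, assuming proto_network was obtained as in forward_sequence() above:

for pf in proto_network.forward_sequence():
    print(pf.name, pf.type, pf.inputs, pf.outputs)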

graph_call(**kwargs)

This function creates a function instance for generating a computation graph.

load
nnabla.graph_def.load(filename, batch_size=None, exclude_parameter=False, parameter_only=False, extension='.nntxt', parameter_scope=None, rng=None)

Load a network from files.

Parameters
  • filename (str or list or file-like object) – A filename string, a list of filenames, or a file-like object.

  • batch_size (int) – The batch size expected to be set.

  • exclude_parameter (bool) – If True, only the model is loaded, not its parameters. Default is False.

  • parameter_only (bool) – If True, only load model parameters. Default is False.

  • extension (str) – This parameter is needed when filename is a file-like object. Default is .nntxt.

  • parameter_scope (OrderedDict) – Users may provide their own parameter scope. If this parameter is not provided, loaded parameters are created in the created proto graph’s parameter_scope, which is initialized to an empty dictionary by default.

  • rng (random state) – Users may specify a random state so that parameters are initialized with a deterministic random seed.

Returns

A ProtoGraph object containing one or multiple ProtoNetwork objects.

Return type

ProtoGraph

Example

The following example loads a model and generates the output variable through this model:

import nnabla as nn
import nnabla.functions as F
import nnabla.parametric_functions as PF

def fusion_net(x):
    def unit(i, prefix):
        c1 = PF.convolution(i, 4, (3, 3), pad=(1, 1), name=prefix + '-c1')
        c2 = PF.convolution(F.relu(c1), 4,
                            (3, 3), pad=(1, 1), name=prefix + '-c2')
        c = F.add2(c2, c1, inplace=True)
        return c
    c = unit(x, 'c1')
    c2 = unit(c, 'c2')
    y = PF.affine(c2, 5, name='fc')
    return y

x = nn.ProtoVariable((64, 3, 32, 32))
y = fusion_net(x)
g = nn.graph_def.get_default_graph()  # Get generated graph_def
g.save("fusion_net.nnp")
...
g = nn.graph_def.load("fusion_net.nnp")
x = nn.Variable((64, 3, 32, 32))
x.d = ... # user provided input data for this graph
y = g(x) # create computation graph by passing in nn.Variable()
y.forward() # calculate output by this graph
...

# You may use your special context(e.g. cuda context)
with context_scope(ctx):
   y = g(x) # create computation graph representation with specified backend context.
   y.forward() # forward using specified backend
save
nnabla.graph_def.save(filename, content, include_parameters=False, variable_batch_size=True, extension='.nnp')

Save network

Parameters
  • filename (str or file object) –

    Filename to store the information. The file extension is used to determine the file format:

    .nnp: (Recommended) A zip archive containing nntxt (network definition etc.) and h5 (parameters).
    .nntxt: Protobuf in text format.
    .protobuf: Protobuf in binary format (unsafe in terms of backward compatibility).

  • content (list) – Currently, only ProtoGraph or ProtoNetwork objects are supported.

  • include_parameters (bool) – Include parameters in a single file. This is ignored when the extension of filename is .nnp.

  • variable_batch_size (bool) – Whether or not to convert the batch size of the computation graph to an abstract value, so that users may use any other batch size when reusing it.

Example

The following example creates an MLP with two inputs and two outputs, and saves the network structure and the initialized parameters.

import nnabla as nn
import nnabla.functions as F
import nnabla.parametric_functions as PF

def mlp_module(x0, x1):
    h1_0 = PF.affine(x0, 100, name='affine1_0')
    h1_1 = PF.affine(x1, 100, name='affine1_1')
    h1 = F.tanh(h1_0 + h1_1)
    h2 = F.tanh(PF.affine(h1, 50, name='affine2'))
    y0 = PF.affine(h2, 10, name='affiney_0')
    y1 = PF.affine(h2, 10, name='affiney_1')
    return y0, y1

with nn.graph_def.graph() as g:
    x0 = nn.ProtoVariable((64, 100))
    x1 = nn.ProtoVariable((64, 100))
    y0, y1 = mlp_module(x0, x1)

nn.graph_def.save("mlp_net.nnp", [g])
Create Protobuf Representation from Computation Graph
create_graph_from_variable
nnabla.graph_def.create_graph_from_variable(name, variables, names=None, parameter_scope=None)

Create a Proto Graph from one or multiple outputs.

If developers have a computation graph, they have an nn.Variable() object; it might be the loss of a network or an output variable of an executor network. Such a variable inherently corresponds to a computation network, and this function creates the corresponding proto network from these variables.

Parameters
  • name (str) – The name of generated proto_network.

  • variables (nn.Variable) – One or multiple variables; if multiple, this function adds a sink function to reduce the multiple outputs to one.

  • names (dict, optional, default=None) – A name-to-nn.Variable mapping table. By default, this function names all activation variables and parameters with an internal naming rule, but developers sometimes want to give special names to some variables so that they can be accessed conveniently. When generating the proto network, if a variable occurs in this mapping table, the corresponding name is used for that variable in the proto network (a sketch follows the example below).

  • parameter_scope (OrderedDict, optional, default=None) – Developers may provide a parameter scope; when creating proto networks, a variable’s name is replaced if the corresponding variable is found in the specified parameter_scope, which makes the names of weights and other parameters meaningful.

Example

import nnabla as nn

x = nn.Variable((1, 3, 32, 32))
y = my_model(x)
g = nn.graph_def.create_graph_from_variable("proto_network_name", y)
g.save("my_model.nnp")
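
A sketch of the names mapping described above (the keys 'input_x' and 'output_y' are arbitrary illustrative names):

names = {'input_x': x, 'output_y': y}
g = nn.graph_def.create_graph_from_variable("proto_network_name", y, names=names)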
get_default_graph
nnabla.graph_def.get_default_graph(*args, **kwargs)

This function obtains the current default graph_def.

If users do not create their proto network within a with statement scope, the proto network is created in the global scope by default. Users may retrieve this proto graph with this function.

Example

import nnabla as nn
from nnabla.core.modules import ResUnit

resunit = ResUnit(16)
input = nn.ProtoVariable((64, 3, 32, 32))
y = resunit(input)
graph_def = nn.graph_def.get_default_graph()

Note

If users are not sure whether a previously created proto graph remains in the global graph scope, it is better to call reset_default_graph() first. When using a with statement like with nn.graph_def.graph() as g, there is no need to care about this point.

Returns

A proto graph is returned

Return type

ProtoGraph

get_default_graph_by_variable
nnabla.graph_def.get_default_graph_by_variable(proto_variable)

This function obtains a specific network by its outputs.

Users may retrieve one of the networks in the default proto graph scope if that network has the specified outputs. Imagine that there is a global proto graph: when a ProtoVariable is passed to a model, a proto network is generated in this global proto graph while the output variables are created. With this function, users may retrieve the generated proto network to save it or perform other operations.

Note

This proto network will become invalid after reset_default_graph(). For example,

proto_variable_inputs = [nn.ProtoVariable(v.d.shape) for v in inputs]
outputs = module(*proto_variable_inputs)
net = nn.graph_def.get_default_graph_by_variable(outputs[0])
...
nn.graph_def.reset_default_graph()
y = net(x) # net cannot be accessed anymore; it became invalid at this point
graph
nnabla.graph_def.graph(**kwargs)

This function is only used in a with statement.

Parameters
  • name (str, optional, default=None) – User may specify a name for the generated proto network. This name is useful for saving to .nnp.

  • parameter_scope (OrderedDict, optional, default=None) – Users may specify a parameter scope; the parameters created while building the model are then placed into this parameter scope.

For example,

import nnabla as nn

proto_variable_inputs = [nn.ProtoVariable(v.d.shape) for v in inputs]
with nn.graph_def.graph() as g:
    outputs = module(*proto_variable_inputs)

g.save("my_model.nnp")

Here, inputs is an array of input nn.Variables, and module is a module object instantiated from a Module definition.

reset_default_graph
nnabla.graph_def.reset_default_graph()

This function clears all information in the global graph scope.
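
For example:

import nnabla as nn

nn.graph_def.reset_default_graph()  # start from a clean global graph scope
x = nn.ProtoVariable((1, 3, 32, 32))
# ... build a new proto network in the now-empty global scope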

Sequential

The nnabla.core.sequential.Sequential class represents a construction block of a neural network.

Sequential
class nnabla.core.sequential.Sequential(*args, **kwargs)[source]

A sequential block. Users may construct their network from a sequential block. Importantly, each component within a sequential block must be an instance of nn.Module.

For intuitive understanding, some small examples follow:

import nnabla as nn
import nnabla.parametric_functions as PF
import nnabla.functions as F

class ConvLayer(nn.Module):
    def __init__(self, outmaps, kernel, stride=1, pad=0):
        self.outmaps = outmaps
        self.kernel = (kernel, kernel)
        self.pad = (pad, pad)
        self.stride = (stride, stride)

    def call(self, x):
        x = PF.convolution(x, outmaps=self.outmaps, kernel=self.kernel, pad=self.pad, stride=self.stride)
        x = F.relu(x)
        return x

# Example of using Sequential
layer = nn.Sequential(
    ConvLayer(48, kernel=1),
    ConvLayer(64, kernel=3, pad=1)
)

# Example of using Sequential with a specified name for each layer
layer = nn.Sequential(
    ('conv1', ConvLayer(48, kernel=1)),
    ('conv2', ConvLayer(64, kernel=3, pad=1))
)
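
The resulting block is itself callable like a module; a usage sketch (the input shape is arbitrary):

x = nn.Variable((4, 3, 32, 32))
y = layer(x)  # applies 'conv1' then 'conv2' in order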

Experimental

Viewers
SimpleGraph
class nnabla.experimental.viewers.SimpleGraph(format='png', verbose=False, fname_color_map=None, vname_color_map=None)[source]

Simple Graph with GraphViz.

Example:

import nnabla as nn
import nnabla.functions as F
import nnabla.parametric_functions as PF

import nnabla.experimental.viewers as V

# Model definition
def network(image, test=False):
    h = image
    h /= 255.0
    h = PF.convolution(h, 16, kernel=(3, 3), pad=(1, 1), name="conv")
    h = PF.batch_normalization(h, name="bn", batch_stat=not test)
    h = F.relu(h)
    pred = PF.affine(h, 10, name='fc')
    return pred

# Model
image = nn.Variable([4, 3, 32, 32])
pred = network(image, test=False)

# Graph Viewer
graph = V.SimpleGraph(verbose=False)
graph.view(pred)
graph.save(pred, "sample_graph")

If the parameters are module-scoped, for example when pred comes from a module output, the parameters should be obtained beforehand and passed to view():

Example:

import nnabla as nn
import nnabla.functions as F
from nnabla.core.modules import ConvBn

import nnabla.experimental.viewers as V

class TSTNetNormal(nn.Module):
    def __init__(self):
        self.conv_bn_1 = ConvBn(1)
        self.conv_bn_2 = ConvBn(1)

    def call(self, x1, x2):
        y1 = self.conv_bn_1(x1)
        y2 = self.conv_bn_2(x2)
        y = F.concatenate(y1, y2, axis=1)
        return y

tnd = TSTNetNormal()

v1 = nn.Variable((4, 3, 32, 32))
v2 = nn.Variable((4, 3, 32, 32))

ya = tnd(v1, v2)

graph = V.SimpleGraph(verbose=False)
graph.view(ya, params=tnd.get_parameters(grad_only=False))
create_graphviz_digraph(vleaf, params=None, format=None)[source]

Create a graphviz.Digraph object given the leaf variable of a computation graph.

One of the nice things about getting the Digraph directly is that the drawn graph can be displayed inline in a Jupyter notebook, as described in the Graphviz documentation.

Parameters
  • vleaf (nnabla.Variable) – End variable. All variables and functions which can be traversed from this variable are shown in the result.

  • params (dict) – The parameters dictionary; it can be obtained by nn.get_parameters().

  • format (str) – Force overwrite of the format ('pdf', 'png', ...) configuration.

Returns: graphviz.Digraph

save(vleaf, fpath, cleanup=False, format=None)[source]

Save the graph to a given file path.

Parameters
  • vleaf (nnabla.Variable) – End variable. All variables and functions which can be traversed from this variable are shown in the result.

  • fpath (str) – The file path used to save.

  • cleanup (bool) – Clean up the source file after rendering. Default is False.

  • format (str) – Force overwrite of the format ('pdf', 'png', ...) configuration.

view(vleaf, fpath=None, cleanup=True, format=None, params=None)[source]

View the graph.

Parameters
  • vleaf (nnabla.Variable) – End variable. All variables and functions which can be traversed from this variable are shown in the result.

  • fpath (str) – The file path used to save.

  • cleanup (bool) – Clean up the source file after rendering. Default is True.

  • format (str) – Force overwrite of the format ('pdf', 'png', ...) configuration.

  • params (dict) – Parameter dictionary, which can be obtained by the get_parameters() function. Default is None. If params is None, global parameters are obtained.

Show Graph with TensorBoard
TBGraphWriter
Graph Converters
class nnabla.experimental.graph_converters.GraphConverter(modifiers=[])[source]

Convert a graph with the modifiers by traversing from output variables.

convert(o)[source]
Parameters

o (list of nnabla.Variable) – Output variables.

class nnabla.experimental.graph_converters.FunctionModifier[source]

Base class of modifiers.

The modify method is called for a function with inputs in a graph topological order when you call the GraphConverter(<modifiers>).convert(<root variable>) method.

finish_up()[source]

Finish up the function modification.

Clean up the internal modifier states.

Parameters

None

Returns

None

get_parameter_scope(v)[source]

Get the parameter name corresponding to v

Parameters

v (nnabla.Variable) – NNabla Variable Object.

Returns

Scope name

Return type

str

modify(f, inputs)[source]

Modify the function.

Implement this method in a sub class to modify a function.

Examples:

import nnabla.functions as F
from nnabla.experimental.graph_converters import FunctionModifier

class ReLUToLeakyReLUModifier(FunctionModifier):

  def __init__(self):
    super(ReLUToLeakyReLUModifier, self).__init__()

  def modify(self, f, inputs):
    if f.info.type_name == 'ReLU':
      x = inputs[0]
      return F.leaky_relu(x)

This example is a simple case since the network topological order does not change. In GraphConverter, the modify method is expected to be called along the original network topological order, not the modified order. For such a complex case, see the modify method of BatchNormalizationFoldingModifierInner as a reference.

Parameters
  • f (nnabla.function.Function) – NNabla function object.

  • inputs (list of Variable) – New inputs to f. These may be modified ones or the same as f.inputs.

Returns

Variable or list of Variable.

Function Modifiers
class nnabla.experimental.graph_converters.BatchNormalizationFoldingModifier(opposite=False, channel_last=False)[source]

Single Convolution -> BatchNormalization pass is folded into one Convolution.

If there is a Convolution -> BatchNormalization pass, fold the batch normalization parameters to the kernel and bias (if it exists) of the preceding convolution, then skip the batch normalization following the convolution.

Supported folding functions: Convolution, Deconvolution, Affine.

Examples:

pred = Model(...)

import nnabla.experimental.graph_converters as GC

modifiers = [GC.BatchNormalizationFoldingModifier()]
gc = GC.GraphConverter(modifiers)
pred = gc.convert(pred)
class nnabla.experimental.graph_converters.AddBiasModifier[source]

Add a bias to a Convolution in the BatchNormalization folding case if it does not have one.

Supported folding functions: Convolution, Deconvolution, Affine.

Examples:

pred = Model(...)

import nnabla.experimental.graph_converters as GC

modifiers = [GC.AddBiasModifier()]
gc = GC.GraphConverter(modifiers)
pred = gc.convert(pred)
class nnabla.experimental.graph_converters.BatchNormalizationFoldingModifierInner(channel_last=False)[source]

Single Convolution -> BatchNormalization pass is folded into one Convolution.

If there is a Convolution -> BatchNormalization pass, fold the batch normalization parameters to the kernel and bias (if it exists) of the preceding convolution, then skip the batch normalization following the convolution.

Supported folding functions: Convolution, Deconvolution, Affine.

class nnabla.experimental.graph_converters.BatchNormalizationFoldingOppositeModifierInner(channel_last=False)[source]

Single BatchNormalization -> Convolution pass is folded into one Convolution.

If there is a BatchNormalization -> Convolution pass, fold the batch normalization parameters into the kernel and bias (if it exists) of the following convolution, then skip the batch normalization preceding the convolution.

Supported folding functions: Convolution, Deconvolution, Affine.

class nnabla.experimental.graph_converters.BatchNormalizationSelfFoldingModifier(name='bn-self-folding')[source]

The parameters of the batch normalization are replaced with a simple scale and bias.

Parameters

name (str) – Prefix of the parameter scope.

Examples:

pred = Model(...)

import nnabla.experimental.graph_converters as GC

modifiers = [GC.BatchNormalizationSelfFoldingModifier()]
gc = GC.GraphConverter(modifiers)
pred = gc.convert(pred)
class nnabla.experimental.graph_converters.FusedBatchNormalizationModifier[source]

Block BatchNormalization -> Add2 -> Non-Linear pass is fused into one FusedBatchNormalization.

If there is a block BatchNormalization -> Add2 -> Non-Linear pass, remove all the block functions and replace the whole block with FusedBatchNormalization.

Examples:

pred = Model(...)

import nnabla.experimental.graph_converters as GC

modifiers = [GC.FusedBatchNormalizationModifier()]
gc = GC.GraphConverter(modifiers)
pred = gc.convert(pred)
class nnabla.experimental.graph_converters.UnfusedBatchNormalizationModifier[source]

Unfuse FusedBatchNormalization to BatchNormalization -> Add2 -> Non-Linear block.

If there is a FusedBatchNormalization pass, remove the fused batch normalization and replace it with the block BatchNormalization -> Add2 -> Non-Linear.

Supported Non-Linear functions: relu

Examples:

pred = Model(...)

import nnabla.experimental.graph_converters as GC

modifiers = [GC.UnfusedBatchNormalizationModifier()]
gc = GC.GraphConverter(modifiers)
pred = gc.convert(pred)
class nnabla.experimental.graph_converters.ChannelLastModifier(inputs, inputs_cl=None)[source]

Convert graph shape from Channel first (NCHW) to Channel last (NHWC) format.

Supported functions: Convolution, Deconvolution, BatchNormalization, MaxPooling, AveragePooling, SumPooling, Unpooling, Concatenate

Parameters
  • inputs (list of nn.Variable) – Original very beginning inputs (NCHW) of a network.

  • inputs_cl (list of nn.Variable) – Channel-last version (NHWC) of the very beginning inputs of a network. If this is not given, inputs_cl are generated and held internally.

Examples:

pred = Model(...)

import nnabla.experimental.graph_converters as GC

modifiers = [GC.ChannelLastModifier(<inputs of pred>)]
gc = GC.GraphConverter(modifiers)
pred = gc.convert(pred)
class nnabla.experimental.graph_converters.ChannelFirstModifier(inputs, inputs_cf=None)[source]

Convert graph shape from Channel last (NHWC) to Channel first (NCHW) format.

Supported functions: Convolution, Deconvolution, BatchNormalization, MaxPooling, AveragePooling, SumPooling, Unpooling, Concatenate

Parameters
  • inputs (list of nn.Variable) – Original channel-last version (NHWC) of the very beginning inputs of a network.

  • inputs_cf (list of nn.Variable) – Channel-first version (NCHW) of the very beginning inputs of a network. If this is not given, inputs_cf are generated and held internally.

Examples:

pred = Model(...)

import nnabla.experimental.graph_converters as GC

modifiers = [GC.ChannelFirstModifier(<inputs of pred>)]
gc = GC.GraphConverter(modifiers)
pred = gc.convert(pred)
class nnabla.experimental.graph_converters.RemoveFunctionModifier(rm_funcs=[])[source]

Remove specified function layer(s) from a graph.

A convenient converter for when one or more functions in an existing graph need to be removed. This converter removes the specified function(s) without recreating a new graph from scratch.

Parameters

rm_funcs (list of str) – list of function name

Examples:

pred = Model(...)

import nnabla.experimental.graph_converters as GC

modifiers = [GC.RemoveFunctionModifier(rm_funcs=['BatchNormalization', 'MulScalar'])]
gc = GC.GraphConverter(modifiers)
pred = gc.convert(pred)
class nnabla.experimental.graph_converters.BatchNormBatchStatModifier[source]

Change batch_stat to False. Supported functions: BatchNormalization, FusedBatchNormalization, SyncBatchNormalization.

Examples:

pred = Model(...)

import nnabla.experimental.graph_converters as GC

modifiers = [GC.BatchNormBatchStatModifier()]
gc = GC.GraphConverter(modifiers)
pred = gc.convert(pred)
class nnabla.experimental.graph_converters.TestModeModifier(rm_funcs=[])[source]

This converter combines BatchNormBatchStatModifier and RemoveFunctionModifier. It changes batch_stat to False. Supported functions: BatchNormalization, FusedBatchNormalization, SyncBatchNormalization.

Functions specified in rm_funcs will be removed from the graph.

Parameters

rm_funcs (list of str) – list of function name

Examples:

pred = Model(...)

import nnabla.experimental.graph_converters as GC

modifiers = [GC.TestModeModifier(rm_funcs=['MulScalar'])]
gc = GC.GraphConverter(modifiers)
pred = gc.convert(pred)
class nnabla.experimental.graph_converters.IdentityModifier(inputs={}, copy_value=False)[source]

All functions are replaced with identical new functions.

Parameters

inputs (dict) – Input variable mapping from the original input to another input. Default is the empty dictionary, so the new graph shares the original inputs.

Examples:

pred = Model(...)
x0 = nn.Variable(...)
x1 = nn.Variable(...)

import nnabla.experimental.graph_converters as GC

modifiers = [GC.IdentityModifier({x0: x1})]
gc = GC.GraphConverter(modifiers)
pred = gc.convert(pred)
class nnabla.experimental.graph_converters.NoGradModifier[source]

All functions are replaced with identical new functions.


Examples:

pred = Model(...)
x = nn.Variable(...)

import nnabla.experimental.graph_converters as GC

modifiers = [GC.NoGradModifier()]
gc = GC.GraphConverter(modifiers)
pred = gc.convert(pred)
Trainers
class nnabla.experimental.trainers.Trainer(updater=None, evaluator=None, model_save_path=None, max_epoch=1, iter_per_epoch=None, callback_on_start=<function Trainer.<lambda>>, callback_on_finish=<function Trainer.<lambda>>, update_callback_on_start=<function Trainer.<lambda>>, update_callback_on_finish=<function Trainer.<lambda>>)[source]

Trainer API

The Trainer class is the basic class for training a neural network. You can compose this class into your own trainer class and delegate its train method to your class.

Parameters
  • updater (Updater or list of Updater) – Updater object.

  • evaluator (Evaluator or list of Evaluator) – Evaluator object.

  • model_save_path (str) – Model save path.

  • max_epoch (int) – Max epoch to train.

  • iter_per_epoch (int, optional) – Iterations per one epoch.

  • callback_on_start (callable object, function, lambda, or list of these, optional) – Callback called before the trainer.train.

  • callback_on_finish (callable object, function, lambda, or list of these, optional) – Callback called after the trainer.train.

  • update_callback_on_start (callable object, function, lambda, or list of these, optional) – Callback called before the updater.update.

  • update_callback_on_finish (callable object, function, lambda, or list of these, optional) – Callback called after the updater.update.

The following example is a complete snippet to use this base trainer.

Example

import nnabla as nn
import nnabla.functions as F
import nnabla.parametric_functions as PF
import nnabla.solvers as S

from nnabla.monitor import Monitor, MonitorSeries, MonitorTimeElapsed

import numpy as np

from nnabla.experimental.trainers import Trainer, Updater, Evaluator

# Batch, channel, height, width
b, c, h, w = 32, 1, 128, 128

# Train Input
tinput = nn.Variable([b, c, h, w])
tlabel = nn.Variable([b, c, h, w])

# Train Model and Loss
tpred = <training model>.apply(persistent=True)
tloss = F.mean(F.softmax_cross_entropy(tpred, tlabel))

# Test Input
vinput = nn.Variable([b, c, h, w])
vlabel = nn.Variable([b, c, h, w])

# Test Model and Error
vpred = <evaluation model>.apply(persistent=True)
vloss = F.mean(F.softmax_cross_entropy(vpred, vlabel))
verror = F.mean(F.top_n_error(vpred.get_unlinked_variable(), vlabel))

# Solver
solver = S.Adam()
solver.set_parameters(nn.get_parameters())

# DataIterator
tdata = <training_data_iterator>
vdata = <validation_data_iterator>

# Monitor
monitor = Monitor(<monitor_path>)
monitor_loss = MonitorSeries("Training loss", monitor, interval=10)
monitor_err = MonitorSeries("Training error", monitor, interval=10)
monitor_time = MonitorTimeElapsed("Training time", monitor, interval=100)
monitor_verr = MonitorSeries("Valid error", monitor, interval=10)

# Updater
def tdata_feeder():
    tinput.d, tlabel.d = tdata.next()
def update_callback_on_finish(i):
    monitor_loss.add(i, tloss.d)
    monitor_time.add(i)
updater = Updater(solver, tloss,
                  data_feeder=tdata_feeder,
                  update_callback_on_finish=update_callback_on_finish)

# Evaluator
def vdata_feeder():
    vinput.d, vlabel.d = vdata.next()
def eval_callback_on_finish(i, ve):
    monitor_verr.add(i, ve)
evaluator = Evaluator(verror,
                      data_feeder=vdata_feeder,
                      val_iter=vdata.size // b,
                      callback_on_finish=eval_callback_on_finish)

# Trainer
trainer = Trainer(updater, evaluator, <model_save_path>,
                  max_epoch=<max_epoch>, iter_per_epoch=tdata.size // b)
trainer.train()
class nnabla.experimental.trainers.NaiveClassificationTrainer(solver, tinput=None, tlabel=None, tpred=None, tdata=None, vinput=None, vlabel=None, vpred=None, vdata=None, monitor_path=None, model_save_path=None, max_epoch=1, iter_per_epoch=None, val_iter=None)[source]

Naive Classification Trainer

Parameters
  • solver (Solver) – Solver object.

  • tinput (Variable) – Input variable for input feature in training.

  • tlabel (Variable) – Label variable for the label in training.

  • tpred (Variable) – Root variable for prediction in the training graph.

  • tdata (nnabla.utils.data_iterator.DataIterator) – DataIterator for training.

  • vinput (Variable) – Input variable for input feature in evaluation.

  • vlabel (Variable) – Label variable for label in evaluation.

  • vpred (Variable) – Root variable for prediction in the evaluation graph.

  • vdata (DataIterator) – DataIterator for evaluation.

  • monitor_path (str) – Monitor path.

  • model_save_path (str) – Model save path.

  • max_epoch (int) – Max epoch to train.

  • iter_per_epoch (int, optional) – Iterations per one epoch. If not set, this value is determined by tdata.size // tdata.batch_size.

  • val_iter (int, optional) – Iterations for evaluation. If not set, this value is determined by vdata.size // vdata.batch_size.

The following example is a complete snippet to use this base trainer.

Example

import nnabla as nn
import nnabla.functions as F
import nnabla.parametric_functions as PF
import nnabla.solvers as S

import numpy as np

from nnabla.experimental.trainers import NaiveClassificationTrainer

# Batch, channel, height, width
b, c, h, w = 32, 1, 128, 128

# Train Input
tinput = nn.Variable([b, c, h, w])
tlabel = nn.Variable([b, c, h, w])

# Train Model and Loss
tpred = <training model>

# Test Input
vinput = nn.Variable([b, c, h, w])
vlabel = nn.Variable([b, c, h, w])

# Test Model
vpred = <evaluation model>

# Solver
solver = S.Adam()
solver.set_parameters(nn.get_parameters())

# DataIterator
tdata = <training_data_iterator>
vdata = <validation_data_iterator>

# Trainer
trainer = NaiveClassificationTrainer(solver,
                                     tinput, tlabel, tpred, tdata,
                                     vinput, vlabel, vpred, vdata,
                                     <monitor_path>,
                                     <model_save_path>,
                                     max_epoch=<max_epoch>)
trainer.train()
class nnabla.experimental.trainers.NaiveRegressionTrainer(solver, tinput=None, tlabel=None, tpred=None, tdata=None, vinput=None, vlabel=None, vpred=None, vdata=None, monitor_path=None, model_save_path=None, max_epoch=1, iter_per_epoch=None, val_iter=None)[source]

Naive Regression Trainer

Parameters
  • solver (Solver) – Solver object.

  • tinput (Variable) – Input variable for input feature in training.

  • tlabel (Variable) – Label variable for the label in training.

  • tpred (Variable) – Root variable for prediction in the training graph.

  • tdata (nnabla.utils.data_iterator.DataIterator) – DataIterator for training.

  • vinput (Variable) – Input variable for input feature in evaluation.

  • vlabel (Variable) – Label variable for label in evaluation.

  • vpred (Variable) – Root variable for prediction in the evaluation graph.

  • vdata (DataIterator) – DataIterator for evaluation.

  • monitor_path (str) – Monitor path.

  • model_save_path (str) – Model save path.

  • max_epoch (int) – Max epoch to train.

  • iter_per_epoch (int, optional) – Iterations per one epoch. If not set, this value is determined by tdata.size // tdata.batch_size.

  • val_iter (int, optional) – Iterations for evaluation. If not set, this value is determined by vdata.size // vdata.batch_size.

Example

import nnabla as nn
import nnabla.functions as F
import nnabla.parametric_functions as PF
import nnabla.solvers as S

import numpy as np

from nnabla.experimental.trainers import NaiveRegressionTrainer

# Batch, channel, height, width
b, c, h, w = 32, 1, 128, 128

# Train Input
tinput = nn.Variable([b, c, h, w])
tlabel = nn.Variable([b, c, h, w])

# Train Model and Loss
tpred = <training model>

# Test Input
vinput = nn.Variable([b, c, h, w])
vlabel = nn.Variable([b, c, h, w])

# Test Model
vpred = <evaluation model>

# Solver
solver = S.Adam()
solver.set_parameters(nn.get_parameters())

# DataIterator
tdata = <training_data_iterator>
vdata = <validation_data_iterator>

# Trainer
trainer = NaiveRegressionTrainer(solver,
                                 tinput, tlabel, tpred, tdata,
                                 vinput, vlabel, vpred, vdata,
                                 <monitor_path>,
                                 <model_save_path>,
                                 max_epoch=<max_epoch>)
trainer.train()
class nnabla.experimental.trainers.Updater(solver=None, loss=None, data_feeder=<function Updater.<lambda>>, forward_callback_on_start=<function Updater.<lambda>>, forward_callback_on_finish=<function Updater.<lambda>>, backward_callback_on_start=<function Updater.<lambda>>, backward_callback_on_finish=<function Updater.<lambda>>, comm_callback_on_start=<function Updater.<lambda>>, comm_callback_on_finish=<function Updater.<lambda>>, update_callback_on_start=<function Updater.<lambda>>, update_callback_on_finish=<function Updater.<lambda>>, clear_buffer=True, accum_grad=1, comm=None, grads=[])[source]
Parameters
  • solver (nnabla.solvers.Solver) – Solver object. E.g., Momentum or Adam.

  • loss (nnabla.Variable) – Loss variable from which the forward and the backward are called.

  • data_feeder (callable object, function, or lambda) – Data feeder.

  • forward_callback_on_start (callable object, function, lambda, or list of these, optional) – Callback called before forward function.

  • forward_callback_on_finish (callable object, function, lambda, or list of these, optional) – Callback called after forward function.

  • backward_callback_on_start (callable object, function, lambda, or list of these, optional) – Callback called before backward function.

  • backward_callback_on_finish (callable object, function, lambda, or list of these, optional) – Callback called after backward function.

  • comm_callback_on_start (callable object, function, lambda, or list of these, optional) – Callback called before comm.all_reduce.

  • comm_callback_on_finish (callable object, function, lambda, or list of these, optional) – Callback called after comm.all_reduce.

  • update_callback_on_start (callable object, function, lambda, or list of these, optional) – Callback called before update function.

  • update_callback_on_finish (callable object, function, lambda, or list of these, optional) – Callback called after update function.

  • clear_buffer (bool, optional) – Clears the no longer referenced variables during backpropagation to save memory.

  • accum_grad (int, optional) – Number of gradient accumulations. The update method of the Solver is called after forward and backward have been called accum_grad times. Default is 1.

  • comm (nnabla.communicators.Communicator, optional) – Communicator used for distributed training. Default is None.

  • grads (list of nnabla.NdArray, optional) – The list of gradients to be exchanged in distributed training. Default is the empty list.

Example

from nnabla.experimental.trainers import Updater

solver = <Solver>
loss = <Loss Variable of Network>

def tdata_feeder():
    ...
def update_callback_on_finish(i):
    ...
updater = Updater(solver, loss,
                  data_feeder=tdata_feeder,
                  update_callback_on_finish=update_callback_on_finish)

# Training iteration
for itr in range(<max_iter>):
    updater.update()
update(i)[source]

Monolithic update method.

This method calls the following methods in order (a sketch follows the list).

  1. solver.zerograd

  2. feed data

  3. loss.forward

  4. loss.backward

  5. comm.all_reduce (if it is specified)

  6. solver.update
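
A rough sketch of what a single update amounts to, assuming the steps above (an illustrative outline, not the actual implementation):

def naive_update(solver, loss, data_feeder, accum_grad=1, comm=None, grads=None):
    solver.zero_grad()                          # 1. zero gradients (solver.zerograd)
    for _ in range(accum_grad):
        data_feeder()                           # 2. feed data
        loss.forward(clear_no_need_grad=True)   # 3. loss.forward
        loss.backward(clear_buffer=True)        # 4. loss.backward (gradients accumulate)
    if comm is not None:
        comm.all_reduce(grads)                  # 5. comm.all_reduce (if specified)
    solver.update()                             # 6. solver.update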

class nnabla.experimental.trainers.Evaluator(vroot=None, data_feeder=None, val_iter=None, callback_on_start=<function Evaluator.<lambda>>, callback_on_finish=<function Evaluator.<lambda>>, clear_buffer=True, comm=None)[source]
Parameters
  • vroot (Variable) – Root variable of the evaluation graph.

  • data_feeder (callable object, function, or lambda) – Data feeder.

  • val_iter (int, optional) – Iterations for evaluation.

  • callback_on_start (callable object, function, lambda, or list of these, optional) – Callback called before evaluator.evaluate.

  • callback_on_finish (callable object, function, lambda, or list of these, optional) – Callback called after evaluator.evaluate.

  • clear_buffer (bool, optional) – Clears the no longer referenced variables during backpropagation to save memory.

  • comm (nnabla.communicators.Communicator, optional) – Communicator when to do distributed training. Default is None.

Example

from nnabla.experimental.trainers import Evaluator

# Evaluator
def vdata_feeder():
    ...
def eval_callback_on_finish(i, ve):
    ...
evaluator = Evaluator(verror,
                      data_feeder=vdata_feeder,
                      val_iter=<val_iter>,
                      callback_on_finish=eval_callback_on_finish)
Mixed Precision Trainings
DynamicLossScalingUpdater
class nnabla.experimental.mixed_precision_training.DynamicLossScalingUpdater(solver, loss, data_feeder=<function DynamicLossScalingUpdater.<lambda>>, scale=8.0, scaling_factor=2.0, N=2000, clear_buffer=True, accum_grad=1, weight_decay=None, comm=None, grads=[])[source]

Dynamic Loss Scaling Updater for the mixed precision training.

Parameters
  • solver (nnabla.solvers.Solver) – Solver object. E.g., Momentum or Adam.

  • loss (nnabla.Variable) – Loss variable from which the forward and the backward are called.

  • data_feeder (callable object, function, or lambda) – Data feeder

  • scale (float) – Loss scale constant. This is dynamically changing during training.

  • scaling_factor (float) – Scaling factor for the dynamic loss scaling.

  • N (int) – Interval, the number of iterations in training for increasing loss scale by scaling_factor.

  • clear_buffer (bool) – Clears the no longer referenced variables during backpropagation to save memory.

  • accum_grad (int) – Number of gradient accumulations. The update method of the Solver is called after forward and backward have been called accum_grad times.

  • weight_decay (float) – Decay constant. Default is None, meaning no weight decay is applied.

  • comm (nnabla.communicators.Communicator) – Communicator used for distributed training. Default is None.

  • grads (list of nnabla.NdArray) – The list of gradients to be exchanged in distributed training. Default is the empty list.

Attributes
  • solver (nnabla.solvers.Solver) – Solver object, e.g., Momentum or Adam.

  • loss (nnabla.Variable) – Loss variable from which the forward and the backward are called.

  • data_feeder (callable object, function, or lambda) – Data feeder.

  • scale (float) – Loss scale constant. This changes dynamically during training.

  • scaling_factor (float) – Scaling factor for the dynamic loss scaling.

  • N (int) – Interval, the number of training iterations after which the loss scale is increased by scaling_factor.

  • clear_buffer (bool) – Clears the no longer referenced variables during backpropagation to save memory.

  • accum_grad (int) – Number of gradient accumulations. The update method of the Solver is called after forward and backward have been called accum_grad times.

  • weight_decay (float) – Decay constant. Default is None, meaning no weight decay is applied.

  • comm (nnabla.communicators.Communicator) – Communicator used for distributed training.

  • grads (list of nnabla.NdArray) – The list of gradients to be exchanged in distributed training.


update()[source]

Monolithic update method.

This method calls the following methods with the dynamic loss scaling (a sketch follows the list).

  1. solver.zerograd

  2. feed data

  3. loss.forward

  4. loss.backward

  5. comm.all_reduce (if it is specified)

  6. solver.update
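
A rough sketch of the dynamic loss scaling logic around these steps, assuming the solver's check_inf_or_nan_grad() and scale_grad() helpers (an illustrative outline, not the actual implementation):

def scaled_update(solver, loss, data_feeder, state):
    # state holds 'scale', 'scaling_factor', 'N', and 'counter'
    solver.zero_grad()
    data_feeder()
    loss.forward(clear_no_need_grad=True)
    loss.backward(grad=state['scale'], clear_buffer=True)  # scale the loss gradient
    if solver.check_inf_or_nan_grad():
        state['scale'] /= state['scaling_factor']          # overflow: shrink the scale
        state['counter'] = 0                               # and skip this update
    else:
        solver.scale_grad(1.0 / state['scale'])            # unscale gradients
        solver.update()
        if state['counter'] > state['N']:
            state['scale'] *= state['scaling_factor']      # grow scale after N good steps
            state['counter'] = 0
        state['counter'] += 1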

Parametric Function Classes
class nnabla.experimental.parametric_function_class.affine.Affine(n_inmaps, n_outmaps, base_axis=1, w_init=None, b_init=None, fix_parameters=False, rng=None, with_bias=True)[source]

The affine layer, also known as the fully connected layer. Computes

\[{\mathbf y} = {\mathbf A} {\mathbf x} + {\mathbf b}.\]

where \({\mathbf x}, {\mathbf y}\) are the inputs and outputs respectively, and \({\mathbf A}, {\mathbf b}\) are constants.

Parameters
  • n_inmaps (int) – Number of input feature maps (dimensionality of the input).

  • n_outmaps (int or tuple of int) – Number of output neurons per data.

  • base_axis (int) – Dimensions up to base_axis are treated as the sample dimensions.

  • w_init (nnabla.initializer.BaseInitializer or numpy.ndarray) – Initializer for weight.

  • b_init (nnabla.initializer.BaseInitializer or numpy.ndarray) – Initializer for bias.

  • fix_parameters (bool) – When set to True, the weights and biases will not be updated.

  • rng (numpy.random.RandomState) – Random generator for Initializer.

  • with_bias (bool) – Specify whether to include the bias term.

Returns

\((B + 1)\)-D array. (\(M_0 \times \ldots \times M_{B-1} \times L\))

Return type

Variable
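
A usage sketch, assuming the class is instantiated with the input/output sizes and then called like a function:

import nnabla as nn
from nnabla.experimental.parametric_function_class.affine import Affine

x = nn.Variable((8, 100))
fc = Affine(100, 10)  # n_inmaps=100, n_outmaps=10
y = fc(x)             # y.shape == (8, 10)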

nnabla.experimental.parametric_function_class.affine.Linear

alias of nnabla.experimental.parametric_function_class.affine.Affine

class nnabla.experimental.parametric_function_class.convolution.Convolution(inmaps, outmaps, kernel, pad=None, stride=None, dilation=None, group=1, w_init=None, b_init=None, base_axis=1, fix_parameters=False, rng=None, with_bias=True)[source]

N-D Convolution with a bias term.

For Dilated Convolution (a.k.a. Atrous Convolution), refer to:

Note

Convolution is a computationally intensive operation that should preferably be run with the cudnn backend. NNabla then uses CuDNN library functions to determine and cache the fastest algorithm for the given set of convolution parameters, which results in additional memory consumption which may pose a problem for GPUs with insufficient memory size. In that case, the NNABLA_CUDNN_WORKSPACE_LIMIT environment variable can be used to restrict the choice of algorithms to those that fit the given workspace memory limit, expressed in bytes. In some cases it may also be desired to restrict the automatic search to algorithms that produce deterministic (reproducible) results. This can be requested by setting the environment variable NNABLA_CUDNN_DETERMINISTIC to a non-zero value.

Parameters
  • inp (Variable) – N-D array.

  • outmaps (int) – Number of convolution kernels (which is equal to the number of output channels). For example, to apply convolution on an input with 16 types of filters, specify 16.

  • kernel (tuple of int) – Convolution kernel size. For example, to apply convolution on an image with a 3 (height) by 5 (width) two-dimensional kernel, specify (3,5).

  • pad (tuple of int) – Padding sizes for dimensions.

  • stride (tuple of int) – Stride sizes for dimensions.

  • dilation (tuple of int) – Dilation sizes for dimensions.

  • group (int) – Number of groups of channels. This makes connections across channels more sparse by grouping connections along map direction.

  • w_init (nnabla.initializer.BaseInitializer or numpy.ndarray) – Initializer for weight. By default, it is initialized with nnabla.initializer.UniformInitializer within the range determined by nnabla.initializer.calc_uniform_lim_glorot.

  • b_init (nnabla.initializer.BaseInitializer or numpy.ndarray) – Initializer for bias. By default, it is initialized with zeros if with_bias is True.

  • base_axis (int) – Dimensions up to base_axis are treated as the sample dimensions.

  • fix_parameters (bool) – When set to True, the weights and biases will not be updated.

  • rng (numpy.random.RandomState) – Random generator for Initializer.

  • with_bias (bool) – Specify whether to include the bias term.

Returns

N-D array. See convolution for the output shape.

Return type

Variable
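
A usage sketch via the Conv2d alias below (the shapes are arbitrary):

import nnabla as nn
from nnabla.experimental.parametric_function_class.convolution import Conv2d

x = nn.Variable((8, 3, 32, 32))
conv = Conv2d(3, 16, (3, 3), pad=(1, 1))  # inmaps=3, outmaps=16
y = conv(x)                               # y.shape == (8, 16, 32, 32)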

nnabla.experimental.parametric_function_class.convolution.Conv1d

alias of nnabla.experimental.parametric_function_class.convolution.Convolution

nnabla.experimental.parametric_function_class.convolution.Conv2d

alias of nnabla.experimental.parametric_function_class.convolution.Convolution

nnabla.experimental.parametric_function_class.convolution.Conv3d

alias of nnabla.experimental.parametric_function_class.convolution.Convolution

nnabla.experimental.parametric_function_class.convolution.ConvNd

alias of nnabla.experimental.parametric_function_class.convolution.Convolution

class nnabla.experimental.parametric_function_class.deconvolution.Deconvolution(inmaps, outmaps, kernel, pad=None, stride=None, dilation=None, group=1, w_init=None, b_init=None, base_axis=1, fix_parameters=False, rng=None, with_bias=True)[source]

Deconvolution layer.

Parameters
  • inp (Variable) – N-D array.

  • outmaps (int) – Number of deconvolution kernels (which is equal to the number of output channels). For example, to apply deconvolution on an input with 16 types of filters, specify 16.

  • kernel (tuple of int) – Convolution kernel size. For example, to apply deconvolution on an image with a 3 (height) by 5 (width) two-dimensional kernel, specify (3,5).

  • pad (tuple of int) – Padding sizes for dimensions.

  • stride (tuple of int) – Stride sizes for dimensions.

  • dilation (tuple of int) – Dilation sizes for dimensions.

  • group (int) – Number of groups of channels. This makes connections across channels sparser by grouping connections along map direction.

  • w_init (nnabla.initializer.BaseInitializer or numpy.ndarray) – Initializer for weight. By default, it is initialized with nnabla.initializer.UniformInitializer within the range determined by nnabla.initializer.calc_uniform_lim_glorot.

  • b_init (nnabla.initializer.BaseInitializer or numpy.ndarray) – Initializer for bias. By default, it is initialized with zeros if with_bias is True.

  • base_axis (int) – Dimensions up to base_axis are treated as the sample dimensions.

  • fix_parameters (bool) – When set to True, the weights and biases will not be updated.

  • rng (numpy.random.RandomState) – Random generator for Initializer.

  • with_bias (bool) – Specify whether to include the bias term.

Returns

N-D array. See deconvolution for the output shape.

Return type

Variable

nnabla.experimental.parametric_function_class.deconvolution.Deconv1d

alias of nnabla.experimental.parametric_function_class.deconvolution.Deconvolution

nnabla.experimental.parametric_function_class.deconvolution.Deconv2d

alias of nnabla.experimental.parametric_function_class.deconvolution.Deconvolution

nnabla.experimental.parametric_function_class.deconvolution.Deconv3d

alias of nnabla.experimental.parametric_function_class.deconvolution.Deconvolution

nnabla.experimental.parametric_function_class.deconvolution.DeconvNd

alias of nnabla.experimental.parametric_function_class.deconvolution.Deconvolution

class nnabla.experimental.parametric_function_class.batch_normalization.BatchNormalization(n_features, n_dims, axes=[1], decay_rate=0.9, eps=1e-05, batch_stat=True, output_stat=False, fix_parameters=False, param_init=None)[source]

Batch normalization layer.

\[\begin{split}\begin{array}{lcl} \mu &=& \frac{1}{M} \sum x_i\\ \sigma^2 &=& \frac{1}{M} \left(\sum x_i - \mu\right)^2\\ \hat{x}_i &=& \frac{x_i - \mu}{\sqrt{\sigma^2 + \epsilon }}\\ y_i &= & \hat{x}_i \gamma + \beta. \end{array}\end{split}\]

where \(x_i, y_i\) are the inputs. In testing, the mean and variance computed by moving average calculated during training are used.

Parameters
  • inp (Variable) – N-D array of input.

  • axes (tuple of int) – Mean and variance for each element in axes are calculated using elements on the rest axes. For example, if an input is 4 dimensions, and axes is [1], batch mean is calculated as np.mean(inp.d, axis=(0, 2, 3), keepdims=True) (using numpy expression as an example).

  • decay_rate (float) – Decay rate of running mean and variance.

  • eps (float) – Tiny value to avoid zero division by std.

  • batch_stat (bool) – Use mini-batch statistics rather than running ones.

  • output_stat (bool) – Output batch mean and variance.

  • fix_parameters (bool) – When set to True, the beta and gamma will not be updated.

  • param_init (dict) – Parameter initializers can be set with a dict. A key of the dict must be 'beta', 'gamma', 'mean' or 'var'. A value of the dict must be an Initializer or a numpy.ndarray. E.g. {'beta': ConstantInitializer(0), 'gamma': np.ones(gamma_shape) * 2}.

Returns

N-D array.

Return type

Variable

References

The parameter shapes have the same number of dimensions as the input data; the dimensions listed in axes match the input, while the rest are 1. If an input is 4-dim and axes=[1], the parameter shape will be param_shape = np.mean(inp.d, axis=(0, 2, 3), keepdims=True).shape (using a numpy expression as an example).

class nnabla.experimental.parametric_function_class.batch_normalization.BatchNorm1d(n_features, axes=[1], decay_rate=0.9, eps=1e-05, batch_stat=True, output_stat=False, fix_parameters=False, param_init=None)[source]

Batch normalization layer for 3d-Array or 3d-Variable. This is typically used together with Conv1d.

\[\begin{split}\begin{array}{lcl} \mu &=& \frac{1}{M} \sum x_i\\ \sigma^2 &=& \frac{1}{M} \left(\sum x_i - \mu\right)^2\\ \hat{x}_i &=& \frac{x_i - \mu}{\sqrt{\sigma^2 + \epsilon }}\\ y_i &= & \hat{x}_i \gamma + \beta. \end{array}\end{split}\]

where \(x_i, y_i\) are the inputs. In testing, the mean and variance computed by moving average calculated during training are used.

Parameters
  • inp (Variable) – N-D array of input.

  • axes (tuple of int) – Mean and variance for each element in axes are calculated using elements on the rest axes. For example, if an input is 4 dimensions, and axes is [1], batch mean is calculated as np.mean(inp.d, axis=(0, 2, 3), keepdims=True) (using numpy expression as an example).

  • decay_rate (float) – Decay rate of running mean and variance.

  • eps (float) – Tiny value to avoid zero division by std.

  • batch_stat (bool) – Use mini-batch statistics rather than running ones.

  • output_stat (bool) – Output batch mean and variance.

  • fix_parameters (bool) – When set to True, the beta and gamma will not be updated.

  • param_init (dict) – Parameter initializers can be set with a dict. A key of the dict must be 'beta', 'gamma', 'mean' or 'var'. A value of the dict must be an Initializer or a numpy.ndarray. E.g. {'beta': ConstantInitializer(0), 'gamma': np.ones(gamma_shape) * 2}.

Returns

N-D array.

Return type

Variable

References

The parameter shapes have the same number of dimensions as the input data; the dimensions listed in axes match the input, while the rest are 1. If an input is 4-dim and axes=[1], the parameter shape will be param_shape = np.mean(inp.d, axis=(0, 2, 3), keepdims=True).shape (using a numpy expression as an example).

class nnabla.experimental.parametric_function_class.batch_normalization.BatchNorm2d(n_features, axes=[1], decay_rate=0.9, eps=1e-05, batch_stat=True, output_stat=False, fix_parameters=False, param_init=None)[source]

Batch normalization layer for 4d-Array or 4d-Variable. This is typically used together with Conv2d.

\[\begin{split}\begin{array}{lcl} \mu &=& \frac{1}{M} \sum x_i\\ \sigma^2 &=& \frac{1}{M} \left(\sum x_i - \mu\right)^2\\ \hat{x}_i &=& \frac{x_i - \mu}{\sqrt{\sigma^2 + \epsilon }}\\ y_i &= & \hat{x}_i \gamma + \beta. \end{array}\end{split}\]

where \(x_i, y_i\) are the inputs. In testing, the mean and variance computed by moving average calculated during training are used.

Parameters
  • inp (Variable) – N-D array of input.

  • axes (tuple of int) – Mean and variance for each element in axes are calculated using elements on the rest axes. For example, if an input is 4 dimensions, and axes is [1], batch mean is calculated as np.mean(inp.d, axis=(0, 2, 3), keepdims=True) (using numpy expression as an example).

  • decay_rate (float) – Decay rate of running mean and variance.

  • eps (float) – Tiny value to avoid zero division by std.

  • batch_stat (bool) – Use mini-batch statistics rather than running ones.

  • output_stat (bool) – Output batch mean and variance.

  • fix_parameters (bool) – When set to True, the beta and gamma will not be updated.

  • param_init (dict) – Parameter initializers can be set with a dict. A key of the dict must be 'beta', 'gamma', 'mean' or 'var'. A value of the dict must be an Initializer or a numpy.ndarray. E.g. {'beta': ConstantInitializer(0), 'gamma': np.ones(gamma_shape) * 2}.

Returns

N-D array.

Return type

Variable

References

The parameter shapes have the same number of dimensions as the input data; the dimensions listed in axes match the input, while the rest are 1. If an input is 4-dim and axes=[1], the parameter shape will be param_shape = np.mean(inp.d, axis=(0, 2, 3), keepdims=True).shape (using a numpy expression as an example).
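
A usage sketch for BatchNorm2d, assuming n_features is the channel count on axes=[1]:

import nnabla as nn
from nnabla.experimental.parametric_function_class.batch_normalization import BatchNorm2d

x = nn.Variable((8, 16, 32, 32))
bn = BatchNorm2d(16)
y = bn(x)  # per-channel normalization over batch and spatial axes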

class nnabla.experimental.parametric_function_class.batch_normalization.BatchNorm3d(n_features, axes=[1], decay_rate=0.9, eps=1e-05, batch_stat=True, output_stat=False, fix_parameters=False, param_init=None)[source]

Batch normalization layer for 5d-Array or 5d-Variable. This is typically used together with Conv3d.

\[\begin{split}\begin{array}{lcl} \mu &=& \frac{1}{M} \sum x_i\\ \sigma^2 &=& \frac{1}{M} \left(\sum x_i - \mu\right)^2\\ \hat{x}_i &=& \frac{x_i - \mu}{\sqrt{\sigma^2 + \epsilon }}\\ y_i &= & \hat{x}_i \gamma + \beta. \end{array}\end{split}\]

where \(x_i, y_i\) are the inputs. In testing, the mean and variance computed by moving average calculated during training are used.

Parameters
  • inp (Variable) – N-D array of input.

  • axes (tuple of int) – Mean and variance for each element in axes are calculated using elements on the rest axes. For example, if an input is 4 dimensions, and axes is [1], batch mean is calculated as np.mean(inp.d, axis=(0, 2, 3), keepdims=True) (using numpy expression as an example).

  • decay_rate (float) – Decay rate of running mean and variance.

  • eps (float) – Tiny value to avoid zero division by std.

  • batch_stat (bool) – Use mini-batch statistics rather than running ones.

  • output_stat (bool) – Output batch mean and variance.

  • fix_parameters (bool) – When set to True, the beta and gamma will not be updated.

  • param_init (dict) – Parameter initializers can be set with a dict. A key of the dict must be 'beta', 'gamma', 'mean' or 'var'. A value of the dict must be an Initializer or a numpy.ndarray. E.g. {'beta': ConstantInitializer(0), 'gamma': np.ones(gamma_shape) * 2}.

Returns

N-D array.

Return type

Variable

References

The parameter shapes have the same number of dimensions as the input data; the dimensions listed in axes match the input, while the rest are 1. If an input is 4-dim and axes=[1], the parameter shape will be param_shape = np.mean(inp.d, axis=(0, 2, 3), keepdims=True).shape (using a numpy expression as an example).

class nnabla.experimental.parametric_function_class.embed.Embed(n_inputs, n_features, w_init=None, fix_parameters=False)[source]

Embed.

Embed slices a matrix/tensor with indexing array/tensor. Weights are initialized with nnabla.initializer.UniformInitializer within the range of \(-\sqrt{3}\) and \(\sqrt{3}\).

Parameters
  • x (Variable) – [Integer] Indices with shape \((I_0, ..., I_N)\)

  • n_inputs – Number of possible inputs, e.g., words or vocabulary size.

  • n_features – number of embedding features

  • fix_parameters (bool) – When set to True, the embedding weight matrix will not be updated.

Returns

Output with shape \((I_0, ..., I_N, W_1, ..., W_M)\)

Return type

Variable
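
A usage sketch (the vocabulary and feature sizes are arbitrary):

import nnabla as nn
from nnabla.experimental.parametric_function_class.embed import Embed

idx = nn.Variable((8, 10))                    # integer indices into the vocabulary
embed = Embed(n_inputs=10000, n_features=64)
y = embed(idx)                                # y.shape == (8, 10, 64)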

C++ API

The C++ libraries currently provide:

  • APIs to execute an inference of a trained model created by Python APIs and Neural Network Console, Sony’s GUI neural network IDE.

  • A command line interface written in C++ which executes an inference.

  • An example of how to use C++ API with a trained model.

We are still preparing a well-formatted C++ API reference manual; however, you can read through the header files, where most of the classes and functions are documented in Doxygen format. The header files can be found under the include directory.

The example MNIST runtime is a good starting point to understand how to use C++ API for neural network inference.

Build C++ libraries

Documentation has been moved to Github repository.

C++ Command Line Interface

NNabla provides a C++ command line interface utility which can perform training and forward (inference). Using this command line interface, developers can run training and inference without any Python environment.

usage: nbla (infer|dump|train)

Basic functions

Forward
usage: nbla infer -e EXECUTOR [-b BATCHSIZE] [-o OUTPUT] input_files ...

arguments:
   -e EXECUTOR         EXECUTOR is the name of executor network.
   input_files         input_file must be one of the following:
                           *.nnp      : Network structure and parameter.
                           *.nntxt    : Network structure in prototxt format.
                           *.prototxt : Same as nntxt.
                           *.h5       : Parameters in h5 format.
                           *.protobuf : Network structure and parameters in binary.
                           *.bin      : Input data.

optional arguments:
   -b BATCHSIZE        batch size for the input data.
   -o OUTPUT           the filename pattern of output file, default output to stdout.

example:
    Infer using LeNet_input.bin as input, LeNet_output_0.bin as output:
       nbla infer -e Executor -b 1 LeNet.nnp LeNet_input.bin -o LeNet_output

    Infer and output the result to console:
       nbla infer -e Executor -b 1 LeNet.nnp LeNet_input.bin
Dump
usage: nbla dump input_files ...

arguments:
   input_files         input_files must be one of *.nnp, *.nntxt, *.prototxt, *.h5, *.protobuf

example:
    Show network information by dump command:
      nbla dump LeNet.nnp

The output looks like:

This configuration has 1 executors.

  Executor No.0 Name [Executor]
    Using default batch size 64 .
     Inputs
      Input No.0 Name [x] Shape ( 64 1 28 28 )
     Outputs
      Output No.0 Name [y'] Shape ( 64 10 )
Finished
Train
usage: nbla train input_file

arguments:
   input_file          input_file must be *.nnp

C++ API Examples

Follow this link to see examples.

Data exchange file format

Data exchange format for “Neural Network Libraries”.

The current version of the .nnp file is just a ZIP-format archive file, but with the filename extension ‘.nnp’.

A ‘.nnp’ file contains the following files. If a ‘.nnp’ file contains other files, nnabla simply ignores them (see the sketch after this list).

  • ‘nnp_version.txt’

    • Specifies the version of the nnp file. The version string in this file is obtained from nnp_version().

  • ‘*.nntxt’ (or ‘*.prototxt’)

    • Network structure in Protocol buffer text format.

  • ‘*.protobuf’

    • Trained parameters in Protocol buffer binary format.

  • ‘*.h5’

    • Trained parameters in HDF5 format. (Will be obsolete soon.)
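
Since an ‘.nnp’ file is a plain ZIP archive, its contents can be listed with Python's standard zipfile module. A minimal sketch (‘net.nnp’ is a hypothetical file name):

import zipfile

with zipfile.ZipFile('net.nnp') as nnp:
    for name in nnp.namelist():
        # e.g. nnp_version.txt, network.nntxt, parameters.protobuf
        print(name)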

nnabla.utils.nnp_format.nnp_version()[source]

Current version is “0.1”

  • Version history

    • Version 0.1

      • First version.

Data Format

This section describes the data format for exchanging network structures and trained parameters.

Network Structure

Network structures and parameters are stored internally in Google Protocol Buffers format.

Overview

The overview of the network structure is defined as follows.

skinparam monochrome true
hide circle
hide methods

class NNablaProtoBuf {
  string version
  GlobalConfig global_config
  TrainingConfig training_config
  Network[] network
  Parameter[] parameter
  Dataset[] dataset
  Optimizer[] optimizer
  Monitor[] monitor
  Executor[] executor
}

package common <<Rectangle>> {
  class GlobalConfig {
    Context default_context
  }

  class Network {
    string name
    int batch_size
    RepeatInfo[] repeat_info
    Variable[] variable
    Function[] function
  }

  class Parameter {
    string variable_name
    Shape shape
    float[] data
    bool need_grad
  }
}

package training <<Rectangle>> {
  class TrainingConfig {
    int max_epoch
    int iter_per_epoch
    bool save_best
  }

  class Dataset {
    string name
    string type

    string uri
    int batch_size
    string cache_dir
    bool overwrite_cache
    bool create_cache_explicitly

    bool shuffle
    bool no_image_normalization

    string[] variable
  }

  class Optimizer {
    string name

    int order

    string network_name
    string dataset_name

    Solver solver
    int update_interval

    DataVariable[] data_variable
    GeneratorVariable[] generator_variable
    LossVariable[] loss_variable
    ParameterVariable[] parameter_variable
  }

  class Monitor {
    string name

    string network_name
    string dataset_name

    DataVariable[] data_variable
    GeneratorVariable[] generator_variable
    MonitorVariable[] monitor_variable
  }
}

package inference <<Rectangle>> {
  class Executor {
    string name

    string network_name

    int num_evaluations
    string repeat_evaluation_type

    bool need_back_propagation

    DataVariable[] data_variable
    GeneratorVariable[] generator_variable
    LossVariable[] loss_variable
    OutputVariable[] output_variable
    ParameterVariable[] parameter_variable
  }
}
common <.. training
common <.. inference

NNablaProtoBuf "1" o-- "0,1" GlobalConfig
NNablaProtoBuf "1" o-- "0,1" Parameter

NNablaProtoBuf "1" o-- "0,1" TrainingConfig
NNablaProtoBuf "1" o-- "0..*" Network
NNablaProtoBuf "1" o-- "0..*" Dataset
NNablaProtoBuf "1" o-- "0..*" Optimizer
NNablaProtoBuf "1" o-- "0..*" Monitor

NNablaProtoBuf "1" o-- "0..*" Executor

NNablaProtoBuf

Root message of the NNabla network structure. This message can store GlobalConfig, TrainingConfig, Network(s), Parameter(s), Dataset(s), Optimizer(s), Monitor(s), and Executor(s).

Variable

Internal data structure for storing tensors used for neural network I/O and parameters.

GlobalConfig

Configuration of the environment suggested for training or inference.

TrainingConfig

Configuration of training.

Network

Network structure.

Parameter

Special variable for storing training results (e.g. the weight or bias of an affine layer).

Dataset

Specifies the dataset for training.

Optimizer

Defines the network, dataset, and input/output variables for training.

Monitor

Defines the network, dataset, and input/output variables for monitoring training status.

Executor

Defines the network and input/output variables for inference.
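
As a concrete illustration of this message structure, the following sketch reads a network definition from an ‘.nntxt’ file using the protobuf classes generated by NNabla. The module path nnabla.utils.nnabla_pb2 and the file name net.nntxt are assumptions for illustration:

from google.protobuf import text_format
from nnabla.utils import nnabla_pb2

# Parse the Protocol Buffer text format into the root NNablaProtoBuf message.
proto = nnabla_pb2.NNablaProtoBuf()
with open('net.nntxt') as f:
    text_format.Merge(f.read(), proto)

# Walk the repeated Network field described above.
for net in proto.network:
    print(net.name, len(net.variable), len(net.function))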

Structure for Training

TBD

Structure for Inference

TBD

Overall structure

skinparam monochrome true
hide circle
hide methods

class Shape {
  int[] dim
}

class Context {
  string backend
  string array_class
  string device_id
  string compute_backend
}

class GlobalConfig {
  Context default_context
}

class NNablaProtoBuf {
  string version
  GlobalConfig global_config
  TrainingConfig training_config
  Network[] network
  Parameter[] parameter
  Dataset[] dataset
  Optimizer[] optimizer
  Monitor[] monitor
  Executor[] executor
}

class TrainingConfig {
  int max_epoch
  int iter_per_epoch
  bool save_best
}

class Network {
  string name
  int batch_size
  RepeatInfo[] repeat_info
  Variable[] variable
  Function[] function
}

class RepeatInfo {
 string id
 int times
}

class RepeatParameter {
  string repeat_id
  int times
}

class RecurrentParameter {
  string repeat_id
  int length
  int axis
}

class Variable {
  string name
  string type
  string[] repeat_id

  Shape shape

  Initializer initializer
}

class Initializer {
  string type
  float multiplier
}

class Parameter {
  string variable_name
  Shape shape
  float[] data
  bool need_grad
}

class Dataset {
  string name
  string type

  string uri
  int batch_size
  string cache_dir
  bool overwrite_cache
  bool create_cache_explicitly

  bool shuffle
  bool no_image_normalization

  string[] variable
}

class Optimizer {
  string name

  int order

  string network_name
  string dataset_name

  Solver solver
  int update_interval

  DataVariable[] data_variable
  GeneratorVariable[] generator_variable
  LossVariable[] loss_variable
  ParameterVariable[] parameter_variable
}

class Solver {
  string type

  Context context

  float weight_decay

  float lr_decay
  int lr_decay_interval

  SolverParameter parameter
}

class DataVariable {
  string variable_name
  string data_name
}

class GeneratorVariable {
  string variable_name
  string type
  float multiplier
}

class LossVariable {
  string variable_name
}

class ParameterVariable {
  string variable_name
  float learning_rate_multiplier
}

class Monitor {
  string name

  string network_name
  string dataset_name

  DataVariable[] data_variable
  GeneratorVariable[] generator_variable
  MonitorVariable[] monitor_variable
}

class MonitorVariable {
  string variable_name
  string type
  string data_name

  float multiplier
}

class Executor {
  string name

  string network_name

  int num_evaluations
  string repeat_evaluation_type

  bool need_back_propagation

  DataVariable[] data_variable
  GeneratorVariable[] generator_variable
  LossVariable[] loss_variable
  OutputVariable[] output_variable
  ParameterVariable[] parameter_variable
}

class OutputVariable {
  string variable_name
  string type
  string data_name
}

class Function {
  string name
  string type
  string[] repeat_id

  Context context
  string[] input
  string[] output

  FunctionParameter parameter

  // Loop Functions
  RepeatParameter repeat_param
  RecurrentParameter recurrent_param
}

abstract class SolverParameter
hide SolverParameter members

abstract class FunctionParameter
hide FunctionParameter members

NNablaProtoBuf "1" o-- "0,1" GlobalConfig
NNablaProtoBuf "1" o-- "0,1" TrainingConfig
NNablaProtoBuf "1" o-- "0..*" Network
NNablaProtoBuf "1" o-- "0..*" Parameter
NNablaProtoBuf "1" o-- "0..*" Dataset

NNablaProtoBuf "1" o-- "0..*" Optimizer
NNablaProtoBuf "1" o-- "0..*" Monitor
NNablaProtoBuf "1" o-- "0..*" Executor

GlobalConfig "1" o-- "1" Context

Network "1" o-- "0..*" RepeatInfo
Network "1" o-- "0..*" Variable
Network "1" o-- "0..*" Function

Parameter "1" ..> "1" Variable
Parameter "1" o-- "1" Shape

Variable "1" o-- "1" Shape
Variable "1" o-- "0,1" Initializer

Optimizer "1" ..> "1" Network
Optimizer "1" ..> "1" Dataset
Optimizer "1" o-- "1" Solver
Optimizer "1" o-- "0..*" DataVariable
Optimizer "1" o-- "0..*" GeneratorVariable
Optimizer "1" o-- "0..*" LossVariable
Optimizer "1" o-- "0..*" ParameterVariable

Monitor "1" ..> "1" Network
Monitor "1" ..> "1" Dataset
Monitor "1" o-- "1" Solver
Monitor "1" o-- "0..*" DataVariable
Monitor "1" o-- "0..*" GeneratorVariable
Monitor "1" o-- "0..*" MonitorVariable

Executor "1" ..> "1" Network
Executor "1" o-- "1" Solver
Executor "1" o-- "0..*" DataVariable
Executor "1" o-- "0..*" GeneratorVariable
Executor "1" o-- "0..*" LossVariable
Executor "1" o-- "0..*" OutputVariable
Executor "1" o-- "0..*" ParameterVariable

DataVariable      "1" ..> "1" Variable
GeneratorVariable "1" ..> "1" Variable
LossVariable      "1" ..> "1" Variable
ParameterVariable "1" ..> "1" Variable
MonitorVariable   "1" ..> "1" Variable
OutputVariable    "1" ..> "1" Variable

Function "1" o-- "0,1" FunctionParameter
Function "1" o-- "0,1" RepeatParameter
Function "1" o-- "0,1" RecurrentParameter

Solver "1" o-- "1" Context
Solver "1" o-- "0,1" SolverParameter

Parameter

From a performance point of view, parameters can also be saved in HDF5 format.

File Format and extensions

Protocol buffer text format file

.nntxt or .prototxt

Protocol buffer serialized binary file

.protobuf

HDF5

.h5

NNP (ZIP archived file with above formats.)

.nnp

File format converter

Overview

[Diagram: File Format Converter overview — NNP (NNabla / use NNabla as runtime) is converted to and from ONNX (other runtimes, e.g. Caffe2), Tensorflow (.pb, ckpt, .tflite, saved_model), NNB, and C source code (NNabla C Runtime, implemented into products).]

The file format converter enables the Neural Network Libraries (or Console) workflow to interoperate with the ONNX file format and the NNabla C Runtime.

The file format converter provides the following functions.

  • Convert NNP variations to valid NNP

  • Convert ONNX to NNP

  • Convert NNP to ONNX

  • Convert NNP to NNB (binary format for NNabla C Runtime)

  • Convert NNP to Tensorflow saved_model

  • Convert Tensorflow checkpoint, frozen graph or saved_model to NNP

  • Convert NNP to Tensorflow Lite

  • Convert NNP to INT8 quantized Tensorflow Lite

  • Convert Tensorflow Lite to NNP

  • Experimental: Convert NNP to C Source code for NNabla C Runtime

IMPORTANT NOTICE: This file format converter still has some known problems.

Architecture

[Diagram: INPUT file → import → proto intermediate representation → process (Split, Expand, etc.) → export → OUTPUT file]

This file format converter uses the protobuf format defined in Neural Network Libraries as its intermediate format.

This is not a generic file format converter; it is specialized for Neural Network Libraries.

This converter can take ONNX files as both inputs and outputs, but if an ONNX file contains a function unsupported by Neural Network Libraries, the conversion may fail with an error.

This converter also provides some intermediate process functionalities. See Process.

Installation

Before using this converter, please install it with the command pip install nnabla_converter.

Note that the flatbuffers package is necessary for TFLite export; please check the Tensorflow Lite section on this page for details.

Conversion

Supported Formats
NNP

NNP is the file format of NNabla.

NNP format is described at Data Format.

This file format converter works with several variations of NNP:

  • Standard NNP format (.nnp)

  • Contents of NNP files (.nntxt, .prototxt, .h5, .protobuf)

ONNX
Limitation
NNB

NNB is a compact binary format for the NNabla C Runtime. The file format is shown in the following diagram:

[Figure: NNB file format layout]

This file contains several concepts, such as buffer, variable, function, input, and output. Each of them is represented as a list. Each list is recorded with 2 members: the number of objects and an index into the memory block table. That index points to a position in the memory block index table, and the index in the memory block index table points to the start address of a memory data block.

It is designed for nnabla-c-runtime.

C Source Code

The file format converter supports C source code output for nnabla-c-runtime.

Tensorflow
Limitation

Bridged by ONNX, Tensorflow import and export are supported with some limitations.

As for the importer, 4 formats are supported:
  • .pb, tensorflow frozen graph format

  • .ckpt, tensorflow check point format version 1

  • .ckpt.*, tensorflow check point format version 2

  • saved_model, tensorflow saved_model format

As for the exporter, some Neural Network Console projects are supported. See Model Support Status. The output of the converter is in Tensorflow saved_model format.

Tensorflow Lite
Limitation
To export to Tensorflow Lite, please install the flatbuffers package:
  • For Windows platform, download package from FlatBuffers and extract.

  • For Linux platform, use command snap install flatbuffers to install flatbuffers.

  • For macOS platform, use command brew install flatbuffers to install flatbuffers.

and add the executable file flatc to the system PATH.

After exporting to TFLite, a JSON file with the same name will be generated, recording whether the input and output of the TFLite network need to be transposed to channel_last according to base_axis.
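
Consuming that JSON file might look like the following sketch; the key name input_transpose and the file names are assumptions for illustration only, since the exact JSON schema is not documented here:

import json
import numpy as np

# Load the JSON metadata generated next to output.tflite.
with open('output.json') as f:
    meta = json.load(f)

x = np.load('input.npy')               # NCHW input prepared for the NNP model
if meta.get('input_transpose'):        # hypothetical flag name
    x = np.transpose(x, (0, 2, 3, 1))  # reorder to channel_last (NHWC)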

INT8 quantized Tensorflow Lite
Limitation

You also need to install the flatbuffers package; please refer to the installation instructions above. You need to provide a representative dataset to the converter if you want to convert NNP to INT8 quantized TFLite. The representative dataset is a subset of the training dataset, about 2% - 10% of the training data. You can collect it in your training loop. It should be saved in numpy's .npy format. Here's an example:

import numpy as np

rdataset = []
# Suppose this is your training loop; `max_step`, `dataset` and the
# network input variable `x` come from your own training setup.
for step in range(max_step):
    image, label = dataset.next()
    x.d = image
    rdataset.append(image)
    # your code
    # ...
rdataset = np.array(rdataset).astype(np.float32)
np.save('represent_dataset.npy', rdataset)

Of course, you can create the representative dataset any way you like, but please ensure that the shape of each item equals the shape of the network's input and that the necessary preprocessing has been applied.

Process

Expand Repeat and Recurrent

Neural Network Console supports the LoopControl pseudo functions RepeatStart, RepeatEnd, RecurrentInput, RecurrentOutput, and Delay.

Currently, these functions are not supported by Neural Network Libraries directly.

The file format converter expands the network and removes these pseudo functions by default.

If you want to preserve these, specify command line option --nnp-no-expand-network when converting files.

Split network

You can split a network with the --split option.

See Splitting network to use this functionality.

Usage

NNP Operation

Convert NNP to NNP

Sometimes we need to convert NNP to NNP.

The most common use case is expanding a repeat or recurrent network that is supported by Neural Network Console but not by the C++ API.

$ nnabla_cli convert input.nnp output.nnp
Convert console output to single NNP file

The current version of Neural Network Console outputs .nntxt and .h5 files as training results.

We therefore need to convert the separate files into a single NNP with parameters stored in protobuf format.

$ nnabla_cli convert net.nntxt parameters.h5 output.nnp
Convert console output to a single NNP file without expanding repeat or recurrent networks.
$ nnabla_cli convert --nnp-no-expand-network net.nntxt parameters.h5 output.nnp
Keep parameter format as hdf5
$ nnabla_cli convert --nnp-no-expand-network --nnp-parameter-h5 net.nntxt parameters.h5 output.nnp
Everything into a single nntxt.
$ nnabla_cli convert --nnp-parameter-nntxt net.nntxt parameters.h5 output.nntxt

ONNX Operation

Convert NNP to ONNX
$ nnabla_cli convert input.nnp output.onnx

To output ONNX opset 9, please use the following (the default is opset 7):

$ nnabla_cli convert input.nnp output.onnx -d opset_9
Convert ONNX to NNP
$ nnabla_cli convert input.onnx output.nnp

Currently, opsets 7, 9, 10, and 11 are supported for import.

C Runtime Operation

Generally, it is better to set the batch size to 1 when converting a file to the C Runtime. If the batch size is larger than 1, the batch data must be processed collectively. To set the batch size to 1, add -b 1 to the command line options.

Convert NNP to NNB
$ nnabla_cli convert -b 1 input.nnp output.nnb
Convert NNP to C source code
$ nnabla_cli convert -b 1 -O CSRC input.nnp output-dir
Quantization

The C-runtime library supports binary (or fixed point) weights, which can dramatically downsize the model (and its footprint). See Compress network by fixed point quantization for how to quantize your model.

Tensorflow Operation

Convert NNP to Tensorflow saved_model
$ nnabla_cli convert input.nnp output_saved_model --export-format SAVED_MODEL
Convert NNP to Tensorflow frozen graph
$ nnabla_cli convert input.nnp output.pb
Convert Tensorflow frozen graph to NNP
$ nnabla_cli convert input.pb output.nnp
Convert Tensorflow checkpoint to NNP

For checkpoint version 1:

$ nnabla_cli convert input.ckpt output.nnp --inputs x0,x1 --outputs y0,y1

The related files, such as checkpoint, input.ckpt.meta, and so on, must exist in the same directory as input.ckpt. The --inputs option takes the input names of the model, separated by commas; --outputs works the same way for the output names. When parsing the checkpoint format, inputs and outputs must be provided.

For checkpoint version 2:

$ nnabla_cli convert input.ckpt.meta output.nnp --inputs x0,x1 --outputs y0,y1

The related files, such as checkpoint, *.ckpt.index, … and so on, must exist in the same directory as input.ckpt.meta.

Convert Tensorflow saved_model to NNP
$ nnabla_cli convert input_saved_model output.nnp
Convert NNP to Tensorflow Lite
$ nnabla_cli convert -b 1 input.nnp output.tflite
Convert NNP to INT8 quantized Tensorflow Lite
$ nnabla_cli convert -b 1 input.nnp output.tflite --quantization --dataset represent_dataset.npy
Convert Tensorflow Lite to NNP
$ nnabla_cli convert input.tflite output.nnp

Splitting network

Splitting a network is a bit complicated and can be troublesome.

An NNP file can have multiple Executor networks, but Split supports splitting only a single network.

First, confirm how many Executors there are in the NNP and identify which executor to split using nnabla_cli dump.

$ nnabla_cli dump squeezenet11.files/SqueezeNet-1.1/*.{nntxt,h5}
2018-08-27 15:02:40,006 [nnabla][INFO]: Initializing CPU extension...
Importing squeezenet11.files/SqueezeNet-1.1/net.nntxt
Importing squeezenet11.files/SqueezeNet-1.1/parameters.h5
 Expanding Training.
 Expanding Top5Error.
 Expanding Top1Error.
 Expanding Runtime.
  Optimizer[0]: Optimizer
  Optimizer[0]:  (In) Data      variable[0]: Name:TrainingInput                  Shape:[-1, 3, 480, 480]
  Optimizer[0]:  (In) Data      variable[1]: Name:SoftmaxCrossEntropy_T          Shape:[-1, 1]
  Optimizer[0]:  (Out)Loss      variable[0]: Name:SoftmaxCrossEntropy            Shape:[-1, 1]
  Monitor  [0]: train_error
  Monitor  [0]:  (In) Data      variable[0]: Name:Input                          Shape:[-1, 3, 320, 320]
  Monitor  [0]:  (In) Data      variable[1]: Name:Top5Error_T                    Shape:[-1, 1]
  Monitor  [0]:  (Out)Monitor   variable[0]: Name:Top5Error                      Shape:[-1, 1]
  Monitor  [1]: valid_error
  Monitor  [1]:  (In) Data      variable[0]: Name:Input                          Shape:[-1, 3, 320, 320]
  Monitor  [1]:  (In) Data      variable[1]: Name:Top1rror_T                     Shape:[-1, 1]
  Monitor  [1]:  (Out)Monitor   variable[0]: Name:Top1rror                       Shape:[-1, 1]
  Executor [0]: Executor
  Executor [0]:  (In) Data      variable[0]: Name:Input                          Shape:[-1, 3, 320, 320]
  Executor [0]:  (Out)Output    variable[0]: Name:y'                             Shape:[-1, 1000]

As the above output shows, there is only one executor.

Then you can show executor information with nnabla_cli dump -E0.

$ nnabla_cli dump -E0 squeezenet11.files/SqueezeNet-1.1/*.{nntxt,h5}
2018-08-27 15:03:26,547 [nnabla][INFO]: Initializing CPU extension...
Importing squeezenet11.files/SqueezeNet-1.1/net.nntxt
Importing squeezenet11.files/SqueezeNet-1.1/parameters.h5
 Try to leave only executor[Executor].
 Expanding Runtime.
  Executor [0]: Executor
  Executor [0]:  (In) Data      variable[0]: Name:Input                          Shape:[-1, 3, 320, 320]
  Executor [0]:  (Out)Output    variable[0]: Name:y'                             Shape:[-1, 1000]

You can get the list of functions by adding the -F option.

$ nnabla_cli dump -FE0 squeezenet11.files/SqueezeNet-1.1/*.{nntxt,h5}
2018-08-27 15:04:10,954 [nnabla][INFO]: Initializing CPU extension...
Importing squeezenet11.files/SqueezeNet-1.1/net.nntxt
Importing squeezenet11.files/SqueezeNet-1.1/parameters.h5
 Try to leave only executor[Executor].
 Expanding Runtime.
  Executor [0]: Executor
  Executor [0]:  (In) Data      variable[0]: Name:Input                          Shape:[-1, 3, 320, 320]
  Executor [0]:  (Out)Output    variable[0]: Name:y'                             Shape:[-1, 1000]
  Executor [0]:   Function[  0  ]: Type: Slice                Name: Slice
  Executor [0]:   Function[  1  ]: Type: ImageAugmentation    Name: ImageAugmentation
  Executor [0]:   Function[  2  ]: Type: MulScalar            Name: SqueezeNet/MulScalar
  Executor [0]:   Function[  3  ]: Type: AddScalar            Name: SqueezeNet/AddScalar
  Executor [0]:   Function[  4  ]: Type: Convolution          Name: SqueezeNet/Convolution
  Executor [0]:   Function[  5  ]: Type: ReLU                 Name: SqueezeNet/ReLU
  Executor [0]:   Function[  6  ]: Type: MaxPooling           Name: SqueezeNet/MaxPooling

    SNIP...

  Executor [0]:   Function[ 63  ]: Type: ReLU                 Name: SqueezeNet/FireModule_8/Expand1x1ReLU
  Executor [0]:   Function[ 64  ]: Type: Concatenate          Name: SqueezeNet/FireModule_8/Concatenate
  Executor [0]:   Function[ 65  ]: Type: Dropout              Name: SqueezeNet/Dropout
  Executor [0]:   Function[ 66  ]: Type: Convolution          Name: SqueezeNet/Convolution_2
  Executor [0]:   Function[ 67  ]: Type: ReLU                 Name: SqueezeNet/ReLU_2
  Executor [0]:   Function[ 68  ]: Type: AveragePooling       Name: SqueezeNet/AveragePooling
  Executor [0]:   Function[ 69  ]: Type: Reshape              Name: SqueezeNet/Reshape
  Executor [0]:   Function[ 70  ]: Type: Identity             Name: y'

If you want to get the network without ImageAugmentation: according to the above output, ImageAugmentation is placed at index 2, so by splitting from index 3 onward you get the network without ImageAugmentation. You must specify the -E0 -S 3- options to nnabla_cli convert. This command renames the output to XXX_S_E.nnp, where XXX is the original name, S is the start function index, and E is the end function index.

$ nnabla_cli convert -E0 -S 3- squeezenet11.files/SqueezeNet-1.1/*.{nntxt,h5} splitted.nnp
2018-08-27 15:20:21,950 [nnabla][INFO]: Initializing CPU extension...
Importing squeezenet11.files/SqueezeNet-1.1/net.nntxt
Importing squeezenet11.files/SqueezeNet-1.1/parameters.h5
 Try to leave only executor[Executor].
 Expanding Runtime.
   Shrink 3 to 70.
    Output to [splitted_3_70.nnp]

Finally, you get splitted_3_70.nnp as the split output. You can check the split NNP with nnabla_cli dump.

NOTE: The input shape is changed from the original network. The new input shape is the same as the start function's input.

$ nnabla_cli dump splitted_3_70.nnp
2018-08-27 15:20:28,021 [nnabla][INFO]: Initializing CPU extension...
Importing splitted_3_70.nnp
 Expanding Runtime.
  Executor [0]: Executor
  Executor [0]:  (In) Data      variable[0]: Name:SqueezeNet/MulScalar           Shape:[-1, 3, 227, 227]
  Executor [0]:  (Out)Output    variable[0]: Name:y'                             Shape:[-1, 1000]

Done.

Support Status

Function-Level Support Status

ONNX Support Status

Note

In this document, the numbers in the header of all tables represent the version of the ONNX opset.

Import
  • ✓: Defined in the ONNX specification and supported.

  • X: Defined in the ONNX specification, but not supported yet.

  • Empty: Not defined (support status follows the latest opset).

Total: 93/155

ONNX Operator

1

2

3

4

5

6

7

8

9

10

11

NNabla Func

Description

Abs

Abs

Acos

ACos

Acosh

ACosh

Add

Add2, Reshape

And

LogicalAnd, Reshape

ArgMax

X

Max

ArgMin

X

Min

Asin

ASin

Asinh

ASinh

Atan

ATan

Atanh

ATanh

AveragePool

X

X

AveragePooling, Pad

Not all features are verified. Those features can be verified by ONNXRuntime when opset > 6. Some features are not supported by NNabla, such as Pad's edge mode. If opset >= 10, ceil_mode is not supported.

BatchNormalization

X

X

X

BatchNormalization

BitShift

X

Not yet implemented.

Cast

X

Abs, Log

Ceil

Ceil

Clip

Identity, MaximumScalar, MinimumScalar

Compress

X

X

Not yet implemented.

Concat

X

Concatenate

ConcatFromSequence

X

Not yet implemented.

Constant

X

X

Identity

ConstantOfShape

Constant

Conv

X

Convolution

ConvInteger

X

Not yet implemented.

ConvTranspose

X

Deconvolution, Pad

Cos

Cos

Cosh

Cosh

CumSum

X

Not yet implemented.

DepthToSpace

Reshape, Transpose

DequantizeLinear

X

Not yet implemented.

Det

X

Not yet implemented.

Div

Div2, Reshape

Dropout

X

Identity

DynamicQuantizeLinear

X

Not yet implemented.

Elu

ELU

Equal

X

Equal, Reshape

Erf

X

Not yet implemented.

Exp

Exp

Expand

Broadcast, Reshape

EyeLike

X

Not yet implemented.

Flatten

Reshape

Floor

Floor

GRU

X

X

X

Not yet implemented.

Gather

Concatenate, Slice

GatherElements

X

Not yet implemented.

GatherND

X

Not yet implemented.

Gemm

Add2, BatchMatmul, MulScalar, Reshape

GlobalAveragePool

GlobalAveragePooling

GlobalLpPool

X

X

Not yet implemented.

GlobalMaxPool

X

Not yet implemented.

Greater

Greater, Reshape

HardSigmoid

AddScalar, HardSigmoid, MaximumScalar, MinimumScalar, MulScalar

Hardmax

Max, OneHot, Reshape

Identity

Identity

If

X

Not yet implemented.

InstanceNormalization

BatchNormalization, Concatenate, Reshape, Split

IsInf

IsInf

IsNaN

IsNaN

LRN

AddScalar, Div2, MulScalar, PowScalar, SumPooling, Transpose

LSTM

X

X

Not yet implemented.

LeakyRelu

LeakyReLU

Less

Less, Reshape

Log

Log

LogSoftmax

Add2, Exp, Log, Max, Reshape, Sub2, Sum

Loop

X

X

Not yet implemented.

LpNormalization

X

Not yet implemented.

LpPool

X

X

X

Not yet implemented.

MatMul

BatchMatmul, Reshape

MatMulInteger

X

Not yet implemented.

Max

Maximum2

MaxPool

X

X

X

MaxPooling, Pad

Not all features are verified. Those features can be verified by ONNXRuntime. If opset >= 10, ceil_mode is not supported, and dilations not equal to 1 are not supported.

MaxRoiPool

X

Not yet implemented.

MaxUnpool

X

X

Not yet implemented.

Mean

Broadcast, Mean, Stack

MeanVarianceNormalization

X

Not yet implemented.

Min

Minimum2

Mod

X

Not yet implemented.

Mul

Mul2, Reshape

Multinomial

X

Not yet implemented.

Neg

MulScalar

NonMaxSuppression

X

X

Not yet implemented.

NonZero

X

Not yet implemented.

Not

LogicalNot

OneHot

X

X

Not yet implemented.

Or

LogicalOr, Reshape

PRelu

X

PReLU

Pad

Pad

ONNX requires the “edge” mode to be supported, while nnabla does not support it.

Pow

Pow2, Reshape

QLinearConv

X

Not yet implemented.

QLinearMatMul

X

Not yet implemented.

QuantizeLinear

X

Not yet implemented.

RNN

X

X

Not yet implemented.

RandomNormal

X

Not yet implemented.

RandomNormalLike

X

Not yet implemented.

RandomUniform

X

Not yet implemented.

RandomUniformLike

X

Not yet implemented.

Range

X

Not yet implemented.

Reciprocal

RDivScalar

ReduceL1

X

X

Not yet implemented.

ReduceL2

X

X

Not yet implemented.

ReduceLogSum

X

X

Not yet implemented.

ReduceLogSumExp

X

X

Not yet implemented.

ReduceMax

Max

ReduceMean

Mean

ReduceMin

Min

ReduceProd

Prod

ReduceSum

Sum

ReduceSumSquare

PowScalar, Sum

Relu

ReLU

Reshape

Reshape

Resize

X

X

Not yet implemented.

ReverseSequence

X

Not yet implemented.

RoiAlign

X

Not yet implemented.

Round

Round

Scan

X

X

X

Not yet implemented.

Scatter

X

X

Not yet implemented.

ScatterElements

X

Not yet implemented.

ScatterND

X

Not yet implemented.

Selu

SELU

SequenceAt

X

Not yet implemented.

SequenceConstruct

X

Not yet implemented.

SequenceErase

X

Not yet implemented.

SequenceInsert

X

Not yet implemented.

SequenceLength

X

Not yet implemented.

Shape

X

Not yet implemented.

Shrink

X

Not yet implemented.

Sigmoid

Sigmoid

Sign

Sign

Sin

Sin

Sinh

Sinh

Size

X

Not yet implemented.

Slice

X

Slice

Softmax

Div2, Exp, Max, Reshape, Sub2, Sum

Softplus

SoftPlus

Softsign

SoftSign

SpaceToDepth

Reshape, Transpose

Split

Split, Stack

SplitToSequence

X

Not yet implemented.

Sqrt

PowScalar

Squeeze

Reshape

StringNormalizer

X

Not yet implemented.

Sub

Reshape, Sub2

Sum

X

X

AddN

Tan

Tan

Tanh

Tanh

TfIdfVectorizer

X

Not yet implemented.

ThresholdedRelu

Constant, GreaterScalar, Where

Tile

Tile

TopK

X

X

X

Not yet implemented.

Transpose

Transpose

Unique

X

Not yet implemented.

Unsqueeze

Reshape

Upsample

X

X

X

Unpooling

Where

Where

Xor

LogicalXor, Reshape

Export
  • ✓: Supported for export at this opset.

  • △: Partially supported for export at this opset (e.g. some cases cannot be supported, or are not completely tested).

  • X: Supported, but tests failed.

  • Empty: The corresponding opset version is not supported.

Total: 120/173

Neural Network Layer

Count 11/14

NNabla Function

7

9

10

11

ONNX Op

Description

Affine

Gemm, Reshape

RNN

Not yet implemented.

LSTM

Not yet implemented.

GRU

Not yet implemented.

Convolution

Conv, Reshape

DepthwiseConvolution

Conv, Reshape

Deconvolution

ConvTranspose, Reshape

DepthwiseDeconvolution

ConvTranspose, Reshape

MaxPooling

Constant, MaxPool, Pad, Reshape

AveragePooling

AveragePool, Constant, Pad, Reshape

Currently only supports the cases where both ignore_border and including_pad are True.

GlobalAveragePooling

GlobalAveragePool

SumPooling

AveragePool, Constant, Mul, Pad, Reshape

Unpooling

Resize

Embed

Gather

Neural Network Activation Functions

Count 21/21

NNabla Function

7

9

10

11

ONNX Op

Description

Sigmoid

Sigmoid

Swish

Mul, Sigmoid

Tanh

Tanh

ReLU

Relu

LeakyReLU

LeakyRelu

Softmax

Div, Exp, ReduceMax, ReduceSum, Sub

LogSoftmax

Exp, Log, ReduceMax, ReduceSum, Sub

ELU

Elu

SELU

Selu

CReLU

Concat, Neg, Relu

CELU

Concat, Elu, Neg

PReLU

PRelu, Reshape

GELU

Add, Constant, Div, Mul, Pow, Sqrt, Tanh

ReLU6

Constant, Min, Relu

HardSigmoid

HardSigmoid

HardTanh

Constant, Max, Min, Neg

LogSigmoid

Log, Sigmoid

SoftPlus

Softplus

SoftSign

Softsign

TanhShrink

Sub, Tanh

Sinc

X

X

X

Constant, Div, Equal, Sin, Where

Normalization

Count 2/6

NNabla Function

7

9

10

11

ONNX Op

Description

FusedBatchNormalization

Add, BatchNormalization, Constant, Div, Mul, ReduceMean, ReduceSum, Relu, Reshape, Squeeze, Sub

BatchNormalization

BatchNormalization, Constant, Div, Mul, ReduceMean, ReduceSum, Reshape, Squeeze, Sub

SyncBatchNormalization

Not yet implemented.

MeanSubtraction

Not yet implemented.

ClipGradByValue

Not yet implemented.

ClipGradByNorm

Not yet implemented.

Reduction

Count 5/7

NNabla Function

7

9

10

11

ONNX Op

Description

Sum

ReduceSum

Mean

ReduceMean

Max

ReduceMax

Min

ReduceMin

Prod

ReduceProd

ReduceSum

Not yet implemented.

ReduceMean

Not yet implemented.

Arithmetic

Count 11/12

NNabla Function

7

9

10

11

ONNX Op

Description

Add2

Add

BcAdd2

Not yet implemented.

Sub2

Sub

Mul2

Mul

Div2

Div

Pow2

Pow

AddScalar

Add, Constant

MulScalar

Constant, Mul

PowScalar

Constant, Pow

RSubScalar

Constant, Sub

RDivScalar

Constant, Div

RPowScalar

Constant, Pow

Logical

Count 29/29

NNabla Function

7

9

10

11

ONNX Op

Description

Sign

X

Sign

Minimum2

Add, Constant, Min

Maximum2

Add, Constant, Max

MinimumScalar

Add, Constant, Min

MaximumScalar

Add, Constant, Max

LogicalAnd

And

LogicalOr

Or

LogicalXor

Xor

Equal

X

X

X

Equal

NotEqual

X

X

X

Equal, Not

GreaterEqual

Less, Not

Greater

Greater

LessEqual

Greater, Not

Less

Less

LogicalAndScalar

And, Constant

LogicalOrScalar

Constant, Or

LogicalXorScalar

Constant, Xor

EqualScalar

X

X

X

Constant, Equal

NotEqualScalar

X

X

X

Constant, Equal, Not

GreaterEqualScalar

Constant, Less, Not

GreaterScalar

Constant, Greater

LessEqualScalar

Constant, Greater, Not

LessScalar

Constant, Less

LogicalNot

Not

IsNaN

X

IsNaN

IsInf

X

X

IsInf

ResetNaN

X

Constant, IsNaN, Where

ResetInf

X

X

Constant, IsInf, Where

Where

X

Where

Math

Count 22/22

NNabla Function

7

9

10

11

ONNX Op

Description

Constant

Constant, Identity

Arange

Constant, Identity

Abs

Abs

Exp

Exp

Log

Log

Identity

Identity

BatchMatmul

MatMul, Transpose

Round

X

X

X

Round

Ceil

Ceil

Floor

Floor

Sin

Sin

Cos

Cos

Tan

Tan

Sinh

X

Sinh

Cosh

X

Cosh

ASin

Asin

ACos

Acos

ATan

Atan

ATan2

Atan, Div

ASinh

X

Asinh

ACosh

X

Acosh

ATanh

X

Atanh

Array Manipulation

Count 12/19

NNabla Function

7

9

10

11

ONNX Op

Description

Concatenate

Concat

Split

Split, Squeeze

Stack

Concat, Unsqueeze

Slice

Constant, Slice

ONNX slice cannot support step != 1 on opset < 10.

Pad

Constant, Pad

When the pad mode is reflect and the pad size exceeds the input size, onnxruntime cannot handle it.

Transpose

Transpose

Broadcast

X

BroadcastTo

Tile

Constant, Reshape, Tile

OneHot

X

Flatten, Gather, Reshape

Flip

Gather, Identity, Transpose

Shift

Not yet implemented.

Sort

Not yet implemented.

Reshape

Constant, Reshape

MatrixDiag

Not yet implemented.

MatrixDiagPart

Not yet implemented.

Assign

Not yet implemented.

GatherNd

Not yet implemented.

ScatterNd

Not yet implemented.

Signal Processing

Count 1/3

NNabla Function

7

9

10

11

ONNX Op

Description

Interpolate

X

X

Resize

FFT

Not yet implemented.

IFFT

Not yet implemented.

Stochasticity

Count 0/11

NNabla Function

7

9

10

11

ONNX Op

Description

Dropout

X

X

X

X

Dropout

Dropout in nnabla has no test mode and contains random parameters, so the test result does not match ONNX.

TopKData

Not yet implemented.

TopKGrad

Not yet implemented.

Rand

Not yet implemented.

Randint

Not yet implemented.

Randn

Not yet implemented.

RandomChoice

Not yet implemented.

RandomCrop

Not yet implemented.

RandomFlip

Not yet implemented.

RandomShift

Not yet implemented.

ImageAugmentation

Not yet implemented.

Loss Functions

Count 0/9

NNabla Function

7

9

10

11

ONNX Op

Description

SigmoidCrossEntropy

Not yet implemented.

BinaryCrossEntropy

Not yet implemented.

SoftmaxCrossEntropy

Not yet implemented.

CategoricalCrossEntropy

Not yet implemented.

SquaredError

Not yet implemented.

AbsoluteError

Not yet implemented.

HuberLoss

Not yet implemented.

EpsilonInsensitiveLoss

Not yet implemented.

KLMultinomial

Not yet implemented.

Quantization Neural Network Layers

Count 6/12

NNabla Function

7

9

10

11

ONNX Op

Description

BinarySigmoid

X

Constant, Greater, Where

BinaryTanh

X

Constant, Greater, Where

BinaryConnectAffine

Gemm, Reshape

BinaryConnectConvolution

Conv, Reshape

BinaryWeightAffine

Add, MatMul, Mul, Reshape

BinaryWeightConvolution

Add, Conv, Mul, Reshape

INQAffine

Not yet implemented.

INQConvolution

Not yet implemented.

FixedPointQuantize

Not yet implemented.

MinMaxQuantize

Not yet implemented.

Pow2Quantize

Not yet implemented.

Prune

Not yet implemented.

Validation

Count 0/3

NNabla Function

7

9

10

11

ONNX Op

Description

TopNError

Not yet implemented.

BinaryError

Not yet implemented.

ConfusionMatrix

Not yet implemented.

Unsupported, Special Use

Count 0/5

NNabla Function

7

9

10

11

ONNX Op

Description

VATNoise

Not yet implemented.

Unlink

Not yet implemented.

Sink

Not yet implemented.

NmsDetection2d

Not yet implemented.

MaxPoolingBackward

Not yet implemented.

Tensorflow Support Status

Import
  • ✓: Supported

  • △: Partially supported

  • X: Supported, but test failed.

  • Empty: Not supported yet.

Total: 109/122

Tensorflow support status

Tensorflow Function

Status

NNabla Func

Description

Abs

Abs

Acos

ACos

Acosh

ACosh

Add

Add2

AddN

AddN

All

Greater, Min, Reshape

Any

Greater, Reshape, Sum

ArgMax

Max

ArgMin

Min

Asin

ASin

Asinh

ASinh

Atan

ATan

Atan2

ATan, Add2, Div2, Mul2, Reshape, Sign, Sub2

Atanh

ATanh

AvgPool

AveragePooling, Pad, Transpose

AvgPool3D

AveragePooling, Pad, Transpose

BatchMatMul

BatchMatmul, Transpose

BatchNormalization

Add2, Mul2, PowScalar, RDivScalar, Reshape, Sub2

BiasAdd

Add2, Reshape

BroadcastTo

Cast

X

NA

Not yet implemented.

Ceil

Ceil

ClipByValue

Maximum2, Minimum2, Reshape

Concat

Concatenate

ConcatV2

Concatenate

Const

NA

Conv1D

Convolution, Pad, Reshape, Transpose

Conv1DTranspose

Deconvolution, Reshape, Transpose

Conv2D

Convolution, Pad, Transpose

Conv2DBackpropInput

Deconvolution, Transpose

Conv3D

Convolution, Pad, Transpose

Conv3DBackpropInput

Deconvolution, Pad, Transpose

Cos

Cos

Cosh

Cosh

Crelu

Concatenate, MulScalar, ReLU

Cumsum

X

Not yet implemented.

DepthToSpace

Reshape, Transpose

DepthwiseConv2d

Convolution, Pad, Reshape, Transpose

Div

Div2

Elu

ELU

Equal

Equal

Erf

X

Not yet implemented.

Erfc

X

Not yet implemented.

Exp

Exp

ExpandDims

Reshape

Floor

Floor

FloorDiv

Div2, Floor

FloorMod

Div2, Floor, Mul2, Sub2

GatherNd

X

Not yet implemented.

GatherV2

X

Concatenate, Slice

Not yet implemented.

Greater

Greater

GreaterEqual

Less, LogicalNot

Identity

Identity

IsInf

IsInf

IsNan

IsNaN

LeakyRelu

LeakyReLU

Less

Less

LessEqual

Greater, LogicalNot

Log

Log

LogSigmoid

MulScalar, SoftPlus

LogSoftmax

Add2, Exp, Log, Max, Reshape, Sub2, Sum, Transpose

LogicalAnd

LogicalAnd

LogicalNot

LogicalNot

LogicalOr

LogicalOr

LogicalXor

LogicalAnd, LogicalNot, LogicalOr

Max

Max

MaxPool

MaxPooling, Pad, Reshape, Transpose

MaxPool3D

MaxPooling, Pad, Transpose

MaxPoolWithArgmax

X

Not yet implemented.

Maximum

Maximum2

Mean

Mean

Min

Min

Minimum

Minimum2

Mul

Mul2

Neg

MulScalar

NotEqual

Equal, LogicalNot

Pack

Concatenate, Reshape

Pad

Pad

Pow

Pow2

Prod

Prod

RealDiv

Div2

Reciprocal

RDivScalar

Relu

ReLU

Relu6

MaximumScalar, MinimumScalar

Reshape

Reshape

ReverseSequence

X

Not yet implemented.

ReverseV2

X

Not yet implemented.

Round

Round

Rsqrt

PowScalar, RDivScalar

Selu

SELU

Shape

X

Not yet implemented.

Sigmoid

Sigmoid

Sign

Sign

Sin

Sin

Sinh

Sinh

Size

X

Not yet implemented.

Slice

Slice

Softmax

Div2, Exp, Max, Reshape, Sub2, Sum, Transpose

Softplus

SoftPlus

Softsign

SoftSign

SpaceToDepth

Reshape, Transpose

Split

Split, Stack

SplitV

Split, Stack

Sqrt

PowScalar

Square

Mul2

SquaredDifference

Mul2, Sub2

Squeeze

Reshape

StopGradient

Identity

StridedSlice

Slice

Sub

Sub2

Sum

Sum

Swish

Mul2, Sigmoid

Tan

Tan

Tanh

Tanh

Tile

Tile

TopKV2

X

Not yet implemented.

Transpose

Transpose

TruncateDiv

Div2

TruncateMod

X

Not yet implemented.

Unpack

Reshape, Split, Stack

Where

Where

ZerosLike

NA

Export
  • ✓: Supported

  • △: Partially supported

  • X: Supported, but test failed.

  • Empty: Not supported yet.

Total: 120/173

Neural Network Layer

Count 11/14

NNabla Function

Status

Description

Affine

RNN

Not yet implemented.

LSTM

Not yet implemented.

GRU

Not yet implemented.

Convolution

Cases where both dilations and strides are larger than 1 are not supported by Tensorflow.

DepthwiseConvolution

Cases where both dilations and strides are larger than 1 are not supported by Tensorflow.

Deconvolution

Cases with dilations larger than 1 are not supported by Tensorflow.

DepthwiseDeconvolution

Cases with dilations larger than 1 are not supported by Tensorflow.

MaxPooling

AveragePooling

Currently only supports the cases where both ignore_border and including_pad are True.

GlobalAveragePooling

SumPooling

Unpooling

Only 2D kernels are supported.

Embed

Neural Network Activation Functions

Count 21/21

NNabla Function

Status

Description

Sigmoid

Swish

Tanh

ReLU

LeakyReLU

Softmax

LogSoftmax

ELU

SELU

CReLU

CELU

PReLU

GELU

ReLU6

HardSigmoid

HardTanh

LogSigmoid

SoftPlus

SoftSign

TanhShrink

Sinc

Normalization

Count 2/6

NNabla Function

Status

Description

FusedBatchNormalization

BatchNormalization

SyncBatchNormalization

Not yet implemented.

MeanSubtraction

Not yet implemented.

ClipGradByValue

Not yet implemented.

ClipGradByNorm

Not yet implemented.

Reduction

Count 5/7

NNabla Function

Status

Description

Sum

Mean

Max

Min

Prod

ReduceSum

Not yet implemented.

ReduceMean

Not yet implemented.

Arithmetic

Count 11/12

NNabla Function

Status

Description

Add2

BcAdd2

Not yet implemented.

Sub2

Mul2

Div2

Pow2

AddScalar

MulScalar

PowScalar

RSubScalar

RDivScalar

RPowScalar

Logical

Count 29/29

NNabla Function

Status

Description

Sign

Minimum2

Maximum2

MinimumScalar

MaximumScalar

LogicalAnd

LogicalOr

LogicalXor

Equal

NotEqual

GreaterEqual

Greater

LessEqual

Less

LogicalAndScalar

LogicalOrScalar

LogicalXorScalar

EqualScalar

NotEqualScalar

GreaterEqualScalar

GreaterScalar

LessEqualScalar

LessScalar

LogicalNot

IsNaN

IsInf

ResetNaN

ResetInf

Where

Math

Count 22/22

NNabla Function

Status

Description

Constant

Arange

Abs

Exp

Log

Identity

BatchMatmul

Round

Ceil

Floor

Sin

Cos

Tan

Sinh

Cosh

ASin

ACos

ATan

ATan2

ASinh

ACosh

ATanh

Array Manipulation

Count 12/19

NNabla Function

Status

Description

Concatenate

Split

Stack

Slice

Pad

When the pad mode is reflect and the pad size exceeds the input size, Tensorflow cannot handle it.

Transpose

Broadcast

BroadcastTo

Tile

OneHot

Flip

Shift

Not yet implemented.

Sort

Not yet implemented.

Reshape

MatrixDiag

Not yet implemented.

MatrixDiagPart

Not yet implemented.

Assign

Not yet implemented.

GatherNd

Not yet implemented.

ScatterNd

Not yet implemented.

Signal Processing

Count 1/3

NNabla Function

Status

Description

Interpolate

FFT

Not yet implemented.

IFFT

Not yet implemented.

Stochasticity

Count 0/11

NNabla Function

Status

Description

Dropout

X

Dropout in nnabla has no test mode and contains random parameters, so the test result does not match Tensorflow.

TopKData

Not yet implemented.

TopKGrad

Not yet implemented.

Rand

Not yet implemented.

Randint

Not yet implemented.

Randn

Not yet implemented.

RandomChoice

Not yet implemented.

RandomCrop

Not yet implemented.

RandomFlip

Not yet implemented.

RandomShift

Not yet implemented.

ImageAugmentation

Not yet implemented.

Loss Functions

Count 0/9

NNabla Function

Status

Description

SigmoidCrossEntropy

Not yet implemented.

BinaryCrossEntropy

Not yet implemented.

SoftmaxCrossEntropy

Not yet implemented.

CategoricalCrossEntropy

Not yet implemented.

SquaredError

Not yet implemented.

AbsoluteError

Not yet implemented.

HuberLoss

Not yet implemented.

EpsilonInsensitiveLoss

Not yet implemented.

KLMultinomial

Not yet implemented.

Quantization Neural Network Layers

Count 6/12

NNabla Function

Status

Description

BinarySigmoid

BinaryTanh

BinaryConnectAffine

BinaryConnectConvolution

Cases where both dilations and strides are larger than 1 are not supported by Tensorflow.

BinaryWeightAffine

BinaryWeightConvolution

Cases where both dilations and strides are larger than 1 are not supported by Tensorflow.

INQAffine

Not yet implemented.

INQConvolution

Not yet implemented.

FixedPointQuantize

Not yet implemented.

MinMaxQuantize

Not yet implemented.

Pow2Quantize

Not yet implemented.

Prune

Not yet implemented.

Validation

Count 0/3

NNabla Function

Status

Description

TopNError

Not yet implemented.

BinaryError

Not yet implemented.

ConfusionMatrix

Not yet implemented.

Unsupported, Special Use

Count 0/5

NNabla Function

Status

Description

VATNoise

Not yet implemented.

Unlink

Not yet implemented.

Sink

Not yet implemented.

NmsDetection2d

Not yet implemented.

MaxPoolingBackward

Not yet implemented.

Tensorflow Lite Support Status

Export
  • ✓: Supported

  • △: Partially supported

  • X: Supported, but test failed.

  • Empty: Not supported yet.

Total: 98/173

Neural Network Layer

Count 8/14

NNabla Function

Status

Affine

RNN

LSTM

GRU

Convolution

DepthwiseConvolution

Deconvolution

DepthwiseDeconvolution

MaxPooling

X

AveragePooling

X

GlobalAveragePooling

SumPooling

X

Unpooling

Embed

Neural Network Activation Functions

Count 20/21

NNabla Function

Status

Sigmoid

Swish

Tanh

ReLU

LeakyReLU

Softmax

LogSoftmax

ELU

SELU

CReLU

CELU

PReLU

GELU

ReLU6

HardSigmoid

HardTanh

LogSigmoid

SoftPlus

SoftSign

TanhShrink

Sinc

X

Normalization

Count 0/6

NNabla Function

Status

FusedBatchNormalization

X

BatchNormalization

X

SyncBatchNormalization

MeanSubtraction

ClipGradByValue

ClipGradByNorm

Reduction

Count 5/7

NNabla Function

Status

Sum

Mean

Max

Min

Prod

ReduceSum

ReduceMean

Arithmetic

Count 11/12

NNabla Function

Status

Add2

BcAdd2

Sub2

Mul2

Div2

Pow2

AddScalar

MulScalar

PowScalar

RSubScalar

RDivScalar

RPowScalar

Logical

Count 25/29

NNabla Function

Status

Sign

Minimum2

Maximum2

MinimumScalar

MaximumScalar

LogicalAnd

LogicalOr

LogicalXor

Equal

NotEqual

GreaterEqual

Greater

LessEqual

Less

LogicalAndScalar

LogicalOrScalar

LogicalXorScalar

EqualScalar

NotEqualScalar

GreaterEqualScalar

GreaterScalar

LessEqualScalar

LessScalar

LogicalNot

IsNaN

IsInf

X

ResetNaN

X

ResetInf

X

Where

X

Math

Count 14/22

NNabla Function

Status

Constant

Arange

Abs

Exp

Log

Identity

BatchMatmul

Round

X

Ceil

Floor

Sin

Cos

Tan

Sinh

Cosh

ASin

X

ACos

X

ATan

X

ATan2

X

ASinh

X

ACosh

X

ATanh

X

Array Manipulation

Count 11/19

NNabla Function

Status

Concatenate

Split

Stack

Slice

Pad

X

Transpose

Broadcast

BroadcastTo

Tile

OneHot

Flip

Shift

Sort

Reshape

MatrixDiag

MatrixDiagPart

Assign

GatherNd

ScatterNd

Signal Processing

Count 0/3

NNabla Function

Status

Interpolate

X

FFT

IFFT

Stochasticity

Count 0/11

NNabla Function

Status

Dropout

X

TopKData

TopKGrad

Rand

Randint

Randn

RandomChoice

RandomCrop

RandomFlip

RandomShift

ImageAugmentation

Loss Functions

Count 0/9

NNabla Function

Status

SigmoidCrossEntropy

BinaryCrossEntropy

SoftmaxCrossEntropy

CategoricalCrossEntropy

SquaredError

AbsoluteError

HuberLoss

EpsilonInsensitiveLoss

KLMultinomial

Quantization Neural Network Layers

Count 4/12

NNabla Function

Status

BinarySigmoid

X

BinaryTanh

X

BinaryConnectAffine

BinaryConnectConvolution

BinaryWeightAffine

BinaryWeightConvolution

INQAffine

INQConvolution

FixedPointQuantize

MinMaxQuantize

Pow2Quantize

Prune

Validation

Count 0/3

NNabla Function

Status

TopNError

BinaryError

ConfusionMatrix

Unsupported, Special Use

Count 0/5

NNabla Function

Status

VATNoise

Unlink

Sink

NmsDetection2d

MaxPoolingBackward

NNabla C Runtime Support Status

NNabla version: None

  • ✓: Supported

  • △: Partially supported

  • X: Supported, but tests failed or no test data exists.

  • Empty: Not supported yet.

Export

Total: 56/173

Neural Network Layer

Count 8/14

NNabla Function

Status

Description

Affine

RNN

LSTM

GRU

Convolution

DepthwiseConvolution

Deconvolution

DepthwiseDeconvolution

MaxPooling

AveragePooling

GlobalAveragePooling

SumPooling

Unpooling

Embed

Neural Network Activation Functions

Count 11/21

NNabla Function

Status

Description

Sigmoid

Swish

Tanh

ReLU

LeakyReLU

Softmax

LogSoftmax

ELU

SELU

CReLU

CELU

PReLU

GELU

ReLU6

HardSigmoid

HardTanh

LogSigmoid

SoftPlus

SoftSign

TanhShrink

Sinc

Normalization

Count 1/6

NNabla Function

Status

Description

FusedBatchNormalization

BatchNormalization

SyncBatchNormalization

MeanSubtraction

X

ClipGradByValue

ClipGradByNorm

Reduction

Count 1/7

NNabla Function

Status

Description

Sum

Mean

Max

Min

Prod

ReduceSum

ReduceMean

Arithmetic

Count 11/12

NNabla Function

Status

Description

Add2

BcAdd2

Sub2

Mul2

Div2

Pow2

AddScalar

MulScalar

PowScalar

RSubScalar

RDivScalar

RPowScalar

Logical

Count 5/29

NNabla Function

Status

Description

Sign

Minimum2

Maximum2

MinimumScalar

MaximumScalar

LogicalAnd

LogicalOr

LogicalXor

Equal

NotEqual

GreaterEqual

Greater

LessEqual

Less

LogicalAndScalar

LogicalOrScalar

LogicalXorScalar

EqualScalar

NotEqualScalar

GreaterEqualScalar

GreaterScalar

LessEqualScalar

LessScalar

LogicalNot

IsNaN

IsInf

ResetNaN

ResetInf

Where

Math

Count 6/22

NNabla Function

Status

Description

Constant

Arange

Abs

Exp

Log

Identity

BatchMatmul

Round

Ceil

Floor

Sin

Cos

Tan

Sinh

Cosh

ASin

ACos

ATan

ATan2

ASinh

ACosh

ATanh

Array Manipulation

Count 7/19

NNabla Function

Status

Description

Concatenate

Split

Stack

Slice

Pad

Transpose

Broadcast

BroadcastTo

Tile

OneHot

Flip

Shift

X

Sort

Reshape

MatrixDiag

X

MatrixDiagPart

X

Assign

GatherNd

ScatterNd

Signal Processing

Count 0/3

NNabla Function

Status

Description

Interpolate

FFT

IFFT

Stochasticity

Count 0/11

NNabla Function

Status

Description

Dropout

X

TopKData

TopKGrad

Rand

Randint

Randn

RandomChoice

RandomCrop

RandomFlip

RandomShift

ImageAugmentation

Loss Functions

Count 0/9

NNabla Function

Status

Description

SigmoidCrossEntropy

BinaryCrossEntropy

SoftmaxCrossEntropy

CategoricalCrossEntropy

SquaredError

AbsoluteError

HuberLoss

EpsilonInsensitiveLoss

KLMultinomial

Quantization Neural Network Layers

Count 6/12

NNabla Function

Status

Description

BinarySigmoid

BinaryTanh

BinaryConnectAffine

BinaryConnectConvolution

BinaryWeightAffine

BinaryWeightConvolution

INQAffine

INQConvolution

FixedPointQuantize

MinMaxQuantize

Pow2Quantize

Prune

Validation

Count 0/3

NNabla Function

Status

Description

TopNError

BinaryError

ConfusionMatrix

Unsupported, Special Use

Count 0/5

NNabla Function

Status

Description

VATNoise

Unlink

Sink

NmsDetection2d

MaxPoolingBackward

Model Support Status

ONNX Support Status

Import
  • ✓: Supported for conversion

  • X: Not supported

Total: 11/12

ONNX Import Sample Test (onnx -> nnp)

Count 11/12

Export
  • ✓: Supported for conversion

  • X: Not supported

Total: 59/65

ONNX Export Sample Test (nnp -> onnx)

Count 34/37

Name

Support

Memo

01_logistic_regression_10

01_logistic_regression_9

02_binary_cnn_15

02_binary_cnn_16

06_auto_encoder_17

06_auto_encoder_18

10_deep_mlp_13

10_deep_mlp_14

11_deconvolution_11

11_deconvolution_12

12_residual_learning_19

12_residual_learning_20

LSTM_auto_encoder_23

LSTM_auto_encoder_24

LeNet_35

LeNet_36

bidirectional_elman_net_25

bidirectional_elman_net_26

binary_connect_mnist_LeNet_5

binary_connect_mnist_MLP_8

binary_net_mnist_LeNet_7

binary_net_mnist_MLP_4

binary_weight_mnist_MLP_6

elman_net_21

elman_net_22

elman_net_with_attention_33

elman_net_with_attention_34

gated_recurrent_unitGRU_31

gated_recurrent_unitGRU_32

long_short_term_memoryLSTM_29

long_short_term_memoryLSTM_30

mnist_dcgan_with_label_1

X

NNabla converter error, will be fixed in the future.

mnist_dcgan_with_label_2

X

NNabla converter error, will be fixed in the future.

mnist_vae_3

semi_supervised_learning_VAT_37

X

Only NNP with a single executor is currently supported.

stacked_GRU_27

stacked_GRU_28

ONNX Export Pretrained Model Test (nnp -> onnx)

Count 17/18

ONNX Export Example Model Test (nnp -> onnx)

Count 8/10

Name

Support

Memo

capsules

classification

cycle_gan

deeplabv3plus

meta_learning

X

pix2pix

siamese_embedding

wavenet

X

OneHot with dimension != 2 is not supported.

word_embedding

yolov2

Tensorflow Support Status

Import
  • ✓: Supported for conversion

  • X: Not supported

Total: 15/16

Tensorflow Import Sample Test (tf -> nnp)

Count 15/16

Export
  • ✓: Supported for conversion

  • X: Not supported

Total: 59/65

Tensorflow Export Sample Test (nnp -> tf)

Count 34/37

Name

Support

Memo

01_logistic_regression_10

01_logistic_regression_9

02_binary_cnn_15

02_binary_cnn_16

06_auto_encoder_17

06_auto_encoder_18

10_deep_mlp_13

10_deep_mlp_14

11_deconvolution_11

11_deconvolution_12

12_residual_learning_19

12_residual_learning_20

LSTM_auto_encoder_23

LSTM_auto_encoder_24

LeNet_35

LeNet_36

bidirectional_elman_net_25

bidirectional_elman_net_26

binary_connect_mnist_LeNet_5

binary_connect_mnist_MLP_8

binary_net_mnist_LeNet_7

binary_net_mnist_MLP_4

binary_weight_mnist_MLP_6

elman_net_21

elman_net_22

elman_net_with_attention_33

elman_net_with_attention_34

gated_recurrent_unitGRU_31

gated_recurrent_unitGRU_32

long_short_term_memoryLSTM_29

long_short_term_memoryLSTM_30

mnist_dcgan_with_label_1

X

NNabla converter error, will be fixed in the future.

mnist_dcgan_with_label_2

X

NNabla converter error, will be fixed in the future.

mnist_vae_3

semi_supervised_learning_VAT_37

X

Only NNP with a single executor is currently supported.

stacked_GRU_27

stacked_GRU_28

Tensorflow Export Pretrained Models (nnp -> tf)

Count 17/18

Tensorflow Export Example Models (nnp -> tf)

Count 8/10

Name

Support

Memo

capsules

classification

cycle_gan

deeplabv3plus

meta_learning

X

pix2pix

siamese_embedding

wavenet

X

OneHot with dimension != 2 is not supported.

word_embedding

yolov2

Tensorflow Lite Support Status

Export
  • ✓: Supported for conversion

  • X: Not supported

Total: 45/65

Tensorflow Lite Export Sample Test (nnp -> tflite)

Count 32/37

Name

Support

Memo

01_logistic_regression_10

01_logistic_regression_9

02_binary_cnn_15

02_binary_cnn_16

06_auto_encoder_17

06_auto_encoder_18

10_deep_mlp_13

10_deep_mlp_14

11_deconvolution_11

11_deconvolution_12

12_residual_learning_19

12_residual_learning_20

LSTM_auto_encoder_23

LSTM_auto_encoder_24

LeNet_35

LeNet_36

bidirectional_elman_net_25

bidirectional_elman_net_26

binary_connect_mnist_LeNet_5

binary_connect_mnist_MLP_8

binary_net_mnist_LeNet_7

X

binary_net_mnist_MLP_4

X

binary_weight_mnist_MLP_6

elman_net_21

elman_net_22

elman_net_with_attention_33

elman_net_with_attention_34

gated_recurrent_unitGRU_31

gated_recurrent_unitGRU_32

long_short_term_memoryLSTM_29

long_short_term_memoryLSTM_30

mnist_dcgan_with_label_1

X

mnist_dcgan_with_label_2

X

mnist_vae_3

semi_supervised_learning_VAT_37

X

stacked_GRU_27

stacked_GRU_28

Tensorflow Lite Export Pretrained Models (nnp -> tflite)

Count 6/18

Tensorflow Lite Export Example Models (nnp -> tflite)

Count 7/10

NNabla C Runtime Support Status

Export
  • ✓: Supported for conversion

  • X: Not supported

Total: 34/37

NNC Export Sample Test (nnp -> nnb)

Count 34/37

Name

Support

Memo

01_logistic_regression_10

01_logistic_regression_9

02_binary_cnn_15

02_binary_cnn_16

06_auto_encoder_17

06_auto_encoder_18

10_deep_mlp_13

10_deep_mlp_14

11_deconvolution_11

11_deconvolution_12

12_residual_learning_19

12_residual_learning_20

LSTM_auto_encoder_23

LSTM_auto_encoder_24

LeNet_35

LeNet_36

bidirectional_elman_net_25

bidirectional_elman_net_26

binary_connect_mnist_LeNet_5

binary_connect_mnist_MLP_8

binary_net_mnist_LeNet_7

binary_net_mnist_MLP_4

binary_weight_mnist_MLP_6

elman_net_21

elman_net_22

elman_net_with_attention_33

elman_net_with_attention_34

gated_recurrent_unitGRU_31

gated_recurrent_unitGRU_32

long_short_term_memoryLSTM_29

long_short_term_memoryLSTM_30

mnist_dcgan_with_label_1

X

Failed to run inference with nnabla.

mnist_dcgan_with_label_2

X

Failed to compare inference results.

mnist_vae_3

semi_supervised_learning_VAT_37

X

Failed to compare inference results.

stacked_GRU_27

stacked_GRU_28

INT8 Quantized TFLite Support Status

Only a subset of TFLite ops supports the INT8 data type. If your model includes an unsupported op, the quantized converter will raise an error.

You can check all INT8 ops below:

  • ADD

  • AVERAGE_POOL_2D

  • CONCATENATION

  • CONV_2D

  • DEPTHWISE_CONV_2D

  • FULLY_CONNECTED

  • L2_NORMALIZATION

  • LOGISTIC

  • MAX_POOL_2D

  • MUL

  • RESHAPE

  • RESIZE_BILINEAR

  • SOFTMAX

  • SPACE_TO_DEPTH

  • TANH

  • PAD

  • GATHER

  • BATCH_TO_SPACE_ND

  • SPACE_TO_BATCH_ND

  • TRANSPOSE

  • MEAN

  • SUB

  • SUM

  • SQUEEZE

  • LOG_SOFTMAX

  • MAXIMUM

  • ARG_MAX

  • MINIMUM

  • LESS

  • PADV2

  • GREATER

  • GREATER_EQUAL

  • LESS_EQUAL

  • SLICE

  • EQUAL

  • NOT_EQUAL

  • SHAPE

  • QUANTIZE

  • RELU

  • LEAKY_RELU

Note that CONCATENATION is in the supported op list, but we recommend that you avoid using it. CONCATENATION introduces additional quantization error and may significantly degrade accuracy.

Contributing Guide

Moved to Github.

License

Copyright (c) 2017 Sony Corporation. All rights reserved.

Apache License

Version 2.0, January 2004

http://www.apache.org/licenses/

TERMS AND CONDITIONS FOR USE, REPRODUCTION, AND DISTRIBUTION

  1. Definitions.

“License” shall mean the terms and conditions for use, reproduction, and distribution as defined by Sections 1 through 9 of this document.

“Licensor” shall mean the copyright owner or entity authorized by the copyright owner that is granting the License.

“Legal Entity” shall mean the union of the acting entity and all other entities that control, are controlled by, or are under common control with that entity. For the purposes of this definition, “control” means (i) the power, direct or indirect, to cause the direction or management of such entity, whether by contract or otherwise, or (ii) ownership of fifty percent (50%) or more of the outstanding shares, or (iii) beneficial ownership of such entity.

“You” (or “Your”) shall mean an individual or Legal Entity exercising permissions granted by this License.

“Source” form shall mean the preferred form for making modifications, including but not limited to software source code, documentation source, and configuration files.

“Object” form shall mean any form resulting from mechanical transformation or translation of a Source form, including but not limited to compiled object code, generated documentation, and conversions to other media types.

“Work” shall mean the work of authorship, whether in Source or Object form, made available under the License, as indicated by a copyright notice that is included in or attached to the work (an example is provided in the Appendix below).

“Derivative Works” shall mean any work, whether in Source or Object form, that is based on (or derived from) the Work and for which the editorial revisions, annotations, elaborations, or other modifications represent, as a whole, an original work of authorship. For the purposes of this License, Derivative Works shall not include works that remain separable from, or merely link (or bind by name) to the interfaces of, the Work and Derivative Works thereof.

“Contribution” shall mean any work of authorship, including the original version of the Work and any modifications or additions to that Work or Derivative Works thereof, that is intentionally submitted to Licensor for inclusion in the Work by the copyright owner or by an individual or Legal Entity authorized to submit on behalf of the copyright owner. For the purposes of this definition, “submitted” means any form of electronic, verbal, or written communication sent to the Licensor or its representatives, including but not limited to communication on electronic mailing lists, source code control systems, and issue tracking systems that are managed by, or on behalf of, the Licensor for the purpose of discussing and improving the Work, but excluding communication that is conspicuously marked or otherwise designated in writing by the copyright owner as “Not a Contribution.”

“Contributor” shall mean Licensor and any individual or Legal Entity on behalf of whom a Contribution has been received by Licensor and subsequently incorporated within the Work.

  2. Grant of Copyright License. Subject to the terms and conditions of this License, each Contributor hereby grants to You a perpetual, worldwide, non-exclusive, no-charge, royalty-free, irrevocable copyright license to reproduce, prepare Derivative Works of, publicly display, publicly perform, sublicense, and distribute the Work and such Derivative Works in Source or Object form.

  3. Grant of Patent License. Subject to the terms and conditions of this License, each Contributor hereby grants to You a perpetual, worldwide, non-exclusive, no-charge, royalty-free, irrevocable (except as stated in this section) patent license to make, have made, use, offer to sell, sell, import, and otherwise transfer the Work, where such license applies only to those patent claims licensable by such Contributor that are necessarily infringed by their Contribution(s) alone or by combination of their Contribution(s) with the Work to which such Contribution(s) was submitted. If You institute patent litigation against any entity (including a cross-claim or counterclaim in a lawsuit) alleging that the Work or a Contribution incorporated within the Work constitutes direct or contributory patent infringement, then any patent licenses granted to You under this License for that Work shall terminate as of the date such litigation is filed.

  4. Redistribution. You may reproduce and distribute copies of the Work or Derivative Works thereof in any medium, with or without modifications, and in Source or Object form, provided that You meet the following conditions:

  (a) You must give any other recipients of the Work or Derivative Works a copy of this License; and

  (b) You must cause any modified files to carry prominent notices stating that You changed the files; and

  (c) You must retain, in the Source form of any Derivative Works that You distribute, all copyright, patent, trademark, and attribution notices from the Source form of the Work, excluding those notices that do not pertain to any part of the Derivative Works; and

  (d) If the Work includes a “NOTICE” text file as part of its distribution, then any Derivative Works that You distribute must include a readable copy of the attribution notices contained within such NOTICE file, excluding those notices that do not pertain to any part of the Derivative Works, in at least one of the following places: within a NOTICE text file distributed as part of the Derivative Works; within the Source form or documentation, if provided along with the Derivative Works; or, within a display generated by the Derivative Works, if and wherever such third-party notices normally appear. The contents of the NOTICE file are for informational purposes only and do not modify the License. You may add Your own attribution notices within Derivative Works that You distribute, alongside or as an addendum to the NOTICE text from the Work, provided that such additional attribution notices cannot be construed as modifying the License.

You may add Your own copyright statement to Your modifications and may provide additional or different license terms and conditions for use, reproduction, or distribution of Your modifications, or for any such Derivative Works as a whole, provided Your use, reproduction, and distribution of the Work otherwise complies with the conditions stated in this License.

  5. Submission of Contributions. Unless You explicitly state otherwise, any Contribution intentionally submitted for inclusion in the Work by You to the Licensor shall be under the terms and conditions of this License, without any additional terms or conditions. Notwithstanding the above, nothing herein shall supersede or modify the terms of any separate license agreement you may have executed with Licensor regarding such Contributions.

  6. Trademarks. This License does not grant permission to use the trade names, trademarks, service marks, or product names of the Licensor, except as required for reasonable and customary use in describing the origin of the Work and reproducing the content of the NOTICE file.

  7. Disclaimer of Warranty. Unless required by applicable law or agreed to in writing, Licensor provides the Work (and each Contributor provides its Contributions) on an “AS IS” BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied, including, without limitation, any warranties or conditions of TITLE, NON-INFRINGEMENT, MERCHANTABILITY, or FITNESS FOR A PARTICULAR PURPOSE. You are solely responsible for determining the appropriateness of using or redistributing the Work and assume any risks associated with Your exercise of permissions under this License.

  8. Limitation of Liability. In no event and under no legal theory, whether in tort (including negligence), contract, or otherwise, unless required by applicable law (such as deliberate and grossly negligent acts) or agreed to in writing, shall any Contributor be liable to You for damages, including any direct, indirect, special, incidental, or consequential damages of any character arising as a result of this License or out of the use or inability to use the Work (including but not limited to damages for loss of goodwill, work stoppage, computer failure or malfunction, or any and all other commercial damages or losses), even if such Contributor has been advised of the possibility of such damages.

  9. Accepting Warranty or Additional Liability. While redistributing the Work or Derivative Works thereof, You may choose to offer, and charge a fee for, acceptance of support, warranty, indemnity, or other liability obligations and/or rights consistent with this License. However, in accepting such obligations, You may act only on Your own behalf and on Your sole responsibility, not on behalf of any other Contributor, and only if You agree to indemnify, defend, and hold each Contributor harmless for any liability incurred by, or claims asserted against, such Contributor by reason of your accepting any such warranty or additional liability.

END OF TERMS AND CONDITIONS

APPENDIX: How to apply the Apache License to your work

To apply the Apache License to your work, attach the following boilerplate notice, with the fields enclosed by brackets “[]” replaced with your own identifying information. (Don’t include the brackets!) The text should be enclosed in the appropriate comment syntax for the file format. We also recommend that a file or class name and description of purpose be included on the same “printed page” as the copyright notice for easier identification within third-party archives.

Copyright 2017, Sony Corporation

Licensed under the Apache License, Version 2.0 (the “License”); you may not use this file except in compliance with the License. You may obtain a copy of the License at

    http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an “AS IS” BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and limitations under the License.
