Variable
- class nnabla.Variable
Bases: object
nnabla.Variable is used to construct computation graphs (neural networks) together with functions in Functions and List of Parametric Functions. It also provides methods to execute forward and backward propagation of the network.
The nnabla.Variable class holds:
- A reference to the parent function in a computation graph. This provides traceability of all connections in the computation graph.
- Both the data and the error signal (gradient) containers, as nnabla.NdArray objects.
- Some additional information about the computation graph.
Variable overrides some arithmetic operators (+, -, *, /, **). Operands can be either a scalar number, an NdArray, or a Variable. If an NdArray is given as either the left or right operand, the arithmetic operation returns an NdArray which stores the output of the computation, invoked immediately. Otherwise, it returns a Variable that holds the graph connection. The computation is invoked immediately when nnabla.auto_forward or nnabla.set_auto_forward(True) is used.
Note
Relational operators == and != of two Variable objects are defined as an address comparison of the underlying C++ instances (nbla::Variable). Also, the hash() function, which is often used for keys of set and dict, is based on the address.
- Parameters:
shape (Iterable of int) – Shape of the variable.
need_grad (bool) – Flag indicating whether backpropagation is performed at this variable.
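The following is a minimal sketch (variable names and values are illustrative) of the operator behavior described above: a mixed Variable/NdArray operation is computed immediately and returns an NdArray, while a Variable-only operation builds a graph unless auto-forward mode is enabled.
import numpy as np
import nnabla as nn

a = nn.Variable.from_numpy_array(np.ones((2, 3), dtype=np.float32))
b = nn.Variable.from_numpy_array(np.full((2, 3), 2.0, dtype=np.float32))

c = a + b        # Variable holding the graph connection; not computed yet.
c.forward()      # Execute the small graph to obtain the result.
print(c.d)       # all 3.0

d = a + b.data   # Right operand is an NdArray; returns an NdArray computed immediately.
print(d.data)    # all 3.0

with nn.auto_forward():
    e = a * 2.0  # Computed immediately in auto-forward mode.
    print(e.d)   # all 2.0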
- apply(self, **kwargs)
Helper for setting properties, then returning self.
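A minimal sketch of chained property setting with apply (the properties used here, need_grad and persistent, are documented below):
import nnabla as nn

x = nn.Variable((2, 3))
y = x.apply(need_grad=False, persistent=True)  # sets both properties, then returns the variable itself.
assert y is x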
- backward(self, grad=1, bool clear_buffer=False, communicator_callbacks=None, function_pre_hook=None, function_post_hook=None)
Performs a backward propagation starting from this variable until the root variable(s) is/are reached in the function graph. The propagation will stop at a variable with need_grad=False.
- Parameters:
grad (scalar, numpy.ndarray, nnabla.NdArray, or None) – The gradient signal value(s) of this variable. The default value 1 is used in usual neural network training. This option is useful if you have a gradient computation module outside NNabla and want to use its result as a gradient signal. Note that this does not modify the grad values of this variable; the received values are only assigned to its gradient temporarily. Also, if the Variable on which you want to execute nnabla._variable.Variable.backward is a variable unlinked from another, and the corresponding Variable holds pre-computed gradient values, you need to set grad=None; otherwise, those pre-computed gradient values (propagated from the unlinked Variable) are ignored for that backward pass.
clear_buffer (bool) – Clears the no longer referenced variables during backpropagation to save memory. Note that all unnecessary intermediate variables will be cleared unless explicitly set as persistent=True.
communicator_callbacks (nnabla.CommunicatorBackwardCallback or list of nnabla.CommunicatorBackwardCallback) – The callback functions invoked when 1) backward computation of each function is finished and 2) all backward computation is finished.
function_pre_hook (callable) – This callable object is called immediately before each function is executed. It must take Function as an input. The default is None.
function_post_hook (callable) – This callable object is called immediately after each function is executed. It must take Function as an input. The default is None.
Example
We first explain a simple usage of backward.
import nnabla as nn
import nnabla.functions as F
import nnabla.parametric_functions as PF
import numpy as np
import nnabla.initializer as I

rng = np.random.seed(217)
initializer = I.UniformInitializer((-0.1, 0.1), rng=rng)

x = nn.Variable((8, 3, 32, 32))
x.d = np.random.random(x.shape)  # random input, just for example.

y0 = PF.convolution(x, outmaps=64, kernel=(3, 3), pad=(1, 1), stride=(2, 2),
                    w_init=initializer, name="conv1", with_bias=False)
y1 = F.relu(y0)
y2 = PF.convolution(y1, outmaps=128, kernel=(3, 3), pad=(1, 1), stride=(2, 2),
                    w_init=initializer, name="conv2", with_bias=False)
y3 = F.relu(y2)
y4 = F.average_pooling(y3, kernel=y3.shape[2:])
y5 = PF.affine(y4, 1, w_init=initializer)
loss = F.mean(F.abs(y5 - 1.))
loss.forward()  # Execute forward.

# We can check the current gradient of a parameter.
print(nn.get_parameters()["conv1/conv/W"].g)
Output :
[[[[0. 0. 0.] [0. 0. 0.] [0. 0. 0.]] ...
Initially all the gradient values should be zero. Then let’s see what happens after calling backward.
loss.backward()
print(nn.get_parameters()["conv1/conv/W"].g)
Output :
[[[[ 0.00539637 0.00770839 0.0090611 ] [ 0.0078223 0.00978992 0.00720569] [ 0.00879023 0.00578172 0.00790895]] ...
Now we know that the gradient values are computed and registered by calling backward. Note that calling backward successively accumulates the result. This means that if we execute backward again, we get a doubled result.
loss.backward()  # execute again.
print(nn.get_parameters()["conv1/conv/W"].g)
We can see it’s accumulated.
[[[[ 0.01079273 0.01541678 0.0181222 ] [ 0.01564459 0.01957984 0.01441139] [ 0.01758046 0.01156345 0.0158179 ]] ...
Next is an advanced usage with an unlinked variable (please refer to get_unlinked_variable). We use the same network, but it is separated by the unlinked variable.
import nnabla as nn
import nnabla.functions as F
import nnabla.parametric_functions as PF
import numpy as np
import nnabla.initializer as I

rng = np.random.seed(217)  # use the same random seed.
initializer = I.UniformInitializer((-0.1, 0.1), rng=rng)

x = nn.Variable((8, 3, 32, 32))
x.d = np.random.random(x.shape)  # random input, just for example.

y0 = PF.convolution(x, outmaps=64, kernel=(3, 3), pad=(1, 1), stride=(2, 2),
                    w_init=initializer, name="conv1", with_bias=False)
y1 = F.relu(y0)
y2 = PF.convolution(y1, outmaps=128, kernel=(3, 3), pad=(1, 1), stride=(2, 2),
                    w_init=initializer, name="conv2", with_bias=False)
y3 = F.relu(y2)
y3_unlinked = y3.get_unlinked_variable()  # the computation graph is cut apart here.
y4 = F.average_pooling(y3_unlinked, kernel=y3_unlinked.shape[2:])
y5 = PF.affine(y4, 1, w_init=initializer)
loss = F.mean(F.abs(y5 - 1.))

# Execute forward.
y3.forward()    # you need to execute forward at the unlinked variable first.
loss.forward()  # Then execute forward at the leaf variable.

# Execute backward.
loss.backward()  # works, but backpropagation stops at y3_unlinked.
print(nn.get_parameters()["conv1/conv/W"].g)  # no gradient registered yet.
Output :
[[[[0. 0. 0.] [0. 0. 0.] [0. 0. 0.]] ...
We can confirm that backpropagation stops at y3_unlinked. Then let's see how to execute backpropagation to the root variable (x). Since it is a little bit complicated, we first give an example of a common pitfall. Note that this is an incorrect way and is intended just to show backward's behavior.
y3.backward()  # this works, but the computed gradient values are not correct.
print(nn.get_parameters()["conv1/conv/W"].g)
Output :
[[[[ 17.795254 23.960905 25.51168 ] [ 20.661646 28.484127 19.406212 ] [ 26.91042 22.239697 23.395714 ]] ...
Note that this is a wrong result. The gradient held by y3_unlinked has been totally ignored. As described above, when just calling backward, the gradient (of the leaf variable on which you call backward) is considered to be 1. To execute backpropagation over the two separate graphs correctly, we need to specify grad=None as shown below; then the gradient currently held by that variable is used for the computation. (y3.backward(grad=y3_unlinked.g) does the same thing.)
# Reset all the gradient values.
for v in nn.get_parameters().values():
    v.g = 0.
for v in [y0, y1, y2, y3, y4, y5]:
    v.g = 0.  # need to reset all the gradient values.

loss.backward()         # backpropagation starts from the leaf variable again.
y3.backward(grad=None)  # By this, it can take over the gradient held by y3_unlinked.
print(nn.get_parameters()["conv1/conv/W"].g)  # correct result.
This time you should have the same result.
[[[[ 0.00539637 0.00770839 0.0090611 ] [ 0.0078223 0.00978992 0.00720569] [ 0.00879023 0.00578172 0.00790895]] ...
- bool_fill_(self, mask, value)
Return a new but in-placed nnabla.Variable filled with value where mask is non-zero.
- Parameters:
mask (nnabla.NdArray) – Mask with which to fill. Non-zero/zero elements are supposed to be a binary mask of 1/0. No gradients are computed with respect to mask.
value (float) – The value to fill.
- Returns:
nnabla.Variable
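A minimal sketch of how this might be used (the values and mask are illustrative):
import numpy as np
import nnabla as nn

x = nn.Variable.from_numpy_array(np.array([[1., 2.], [3., 4.]], dtype=np.float32))
mask = nn.NdArray.from_numpy_array(np.array([[1., 0.], [0., 1.]], dtype=np.float32))

y = x.bool_fill_(mask, 0.0)  # fill with 0.0 where the mask is non-zero.
y.forward()
print(y.d)  # [[0. 2.]
            #  [3. 0.]]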
- clear_all_graph_links(self)
Clear all intermediate functions and variables.
This method clears all intermediate functions and variables up to this variable in the forward pass, and is useful for truncated backpropagation through time (truncated BPTT) in dynamic graphs.
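A rough, illustrative sketch of the truncated BPTT pattern this method is intended for, in dynamic (auto-forward) mode. The network, sizes, loss and hyperparameters are made up for illustration.
import numpy as np
import nnabla as nn
import nnabla.functions as F
import nnabla.parametric_functions as PF
import nnabla.solvers as S

nn.set_auto_forward(True)  # dynamic (define-by-run) graph mode.

solver = S.Sgd(lr=0.01)
h = nn.Variable.from_numpy_array(np.zeros((1, 8), dtype=np.float32))  # recurrent state.

for t in range(10):
    x = nn.Variable.from_numpy_array(np.random.randn(1, 4).astype(np.float32))
    h = F.tanh(PF.affine(F.concatenate(x, h, axis=1), 8, name="rnn"))
    loss = F.mean(h ** 2)  # a dummy loss, just for illustration.
    if t == 0:
        solver.set_parameters(nn.get_parameters())
    solver.zero_grad()
    loss.backward()
    solver.update()
    # Truncate: clear the functions and variables accumulated up to h so that
    # the next step does not backpropagate beyond this point and memory is freed.
    h.clear_all_graph_links()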
- d
Returns the values held by this variable, as a numpy.ndarray. Note that the values are referenced (not copied). Therefore, modifying the returned ndarray will affect the data of the NNabla array. This method can be called as a setter to set the values held by this variable. Refer to the documentation of the setter nnabla.NdArray.data for the detailed behavior of the setter.
- Parameters:
value (numpy.ndarray) (optional) –
- Returns:
numpy.ndarray
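A minimal sketch of getting and setting d (values are illustrative):
import numpy as np
import nnabla as nn

x = nn.Variable((2, 3))
x.d = np.arange(6).reshape(2, 3)  # setter: writes the values into the variable's data.
print(x.d)                        # getter: a numpy.ndarray referencing the same data.
x.d[0, 0] = 100.                  # modifying the returned array affects the variable's data.
print(x.d[0, 0])                  # 100.0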
- forward(self, bool clear_buffer=False, bool clear_no_need_grad=False, function_pre_hook=None, function_post_hook=None)
Performs a forward propagation from the root node to this variable. The forward propagation is performed on a subset of variables determined by the dependencies of this variable. The subset is recursively constructed by tracking the variables that the variables in the subset depend on, starting from this variable, until it reaches the root variable(s) in the function graph. See also forward_all, which performs forward computations for all variables within the input graph.
- Parameters:
clear_buffer (bool) – Clear the no longer referenced variables during forward propagation to save memory. This is usually set to True in an inference or validation phase. Default is False. Note that all unnecessary intermediate variables will be cleared unless explicitly set as persistent=True.
clear_no_need_grad (bool) – Clear the unreferenced variables with need_grad=False during forward propagation. True is usually used when calling this during training. This is ignored when clear_buffer=True.
function_pre_hook (callable) – This callable object is called immediately before each function is executed. It must take Function as an input. The default is None.
function_post_hook (callable) – This callable object is called immediately after each function is executed. It must take Function as an input. The default is None.
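A minimal sketch of forward with clear_buffer and a function_pre_hook that logs each executed function (the network is made up for illustration):
import numpy as np
import nnabla as nn
import nnabla.functions as F
import nnabla.parametric_functions as PF

x = nn.Variable((4, 8))
x.d = np.random.randn(*x.shape)
y = F.relu(PF.affine(x, 16, name="fc1"))

def log_function(f):
    print("executing:", f.info.type_name)

# clear_buffer=True releases intermediate buffers during inference.
y.forward(clear_buffer=True, function_pre_hook=log_function)
print(y.d.shape)  # (4, 16)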
- static from_numpy_array(data, grad=None, need_grad=None)
Create a Variable object from Numpy array(s).
The data is initialized with the given Numpy array, as well as grad if given. The shape is also determined by the given array.
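A minimal sketch:
import numpy as np
import nnabla as nn

data = np.array([[1., 2.], [3., 4.]], dtype=np.float32)
grad = np.zeros_like(data)

v = nn.Variable.from_numpy_array(data, grad=grad, need_grad=True)
print(v.shape)      # (2, 2); determined by the given array.
print(v.d)          # the data values.
print(v.need_grad)  # True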
- function_references
Returns a list of functions which take this variable as an input. This method can be called only as a getter.
- Returns:
list of nnabla.function.Function
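A minimal sketch showing that a variable consumed by two functions reports both of them (the functions are chosen arbitrarily):
import nnabla as nn
import nnabla.functions as F

x = nn.Variable((2, 3))
y1 = F.relu(x)
y2 = F.sigmoid(x)

for func in x.function_references:
    print(func.info.type_name)  # e.g. ReLU, Sigmoid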
- g
Returns the gradient values held by this variable, as a numpy.ndarray. Note that the values are referenced (not copied). Therefore, modifying the returned ndarray will affect the gradient data of the NNabla array. This method can be called as a setter to set the gradient held by this variable. Refer to the documentation of the setter nnabla.NdArray.data for the detailed behavior of the setter.
- Parameters:
value (numpy.ndarray) –
- Returns:
numpy.ndarray
- get_unlinked_variable(self, need_grad=None)
Gets an unlinked (forgetting parent) variable that shares a Variable buffer instance.
- Parameters:
need_grad (bool, optional) – By default, the unlinked variable will have the same need_grad flag as this variable instance. If a boolean value is specified, it is set as the need_grad flag of the unlinked variable. It is recommended to specify this option explicitly to avoid unintended behavior.
Returns:
Variable
Note
The unlinked Variable behaves equivalently to the original variable in comparison operators and the hash function, regardless of whether or not the need_grad attribute is changed. See the note in the Variable class documentation. Also, for backward execution with unlinked variable(s), please refer to backward and its example.
Example
import numpy as np
import nnabla as nn
import nnabla.parametric_functions as PF

x = nn.Variable.from_numpy_array(np.array([[1, 2], [3, 4]]))
y = PF.affine(x, 4, name="y")

# Create a new variable whose graph connection is unlinked.
# It is recommended to specify the need_grad option explicitly.
z = y.get_unlinked_variable(need_grad=False)

print(y.parent)
# Affine
print(z.parent)  # z is unlinked from the parent function but shares the buffers of y.
# None
- info
Information of the variable.
- Type:
info
- masked_fill_(mask, value)
Variable.bool_fill_(self, mask, value)
Return a new but in-placed nnabla.Variable filled with value where mask is non-zero. This is an alias of bool_fill_.
- Parameters:
mask (nnabla.NdArray) – Mask with which to fill. Non-zero/zero elements are supposed to be a binary mask of 1/0. No gradients are computed with respect to mask.
value (float) – The value to fill.
- Returns:
nnabla.Variable
- ndim
Gets the number of dimensions of this variable.
- Returns:
int
- need_grad
Gets or sets a boolean indicating whether backpropagation is performed at this variable.
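A minimal sketch:
import nnabla as nn

x = nn.Variable((2, 3), need_grad=True)
print(x.need_grad)   # True
x.need_grad = False  # backpropagation will not be performed at this variable.
print(x.need_grad)   # False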
- no_grad(self)
No gradients for the whole network.
This method is like nnabla.no_grad but can be used only for static networks, and is useful when the network is loaded from the NNP format.
Example
x = nn.Variable.from_numpy_array([2, 3])
y = <Network>(x).no_grad()
- parent
Returns the parent function of this variable. This method can also be called as a setter.
- Parameters:
func (nnabla.function.Function) –
- Returns:
nnabla.function.Function
- persistent
Returns the persistent flag of this variable. If True, the variable is not cleared even if the clear options in nnabla._variable.Variable.forward() and nnabla._variable.Variable.backward() are enabled. This is useful when you want to debug or log the variable values. This method can also be called as a setter.
- Parameters:
b (bool) –
- Returns:
bool
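A minimal sketch showing how persistent=True keeps an intermediate value available even when forward is called with clear_buffer=True (the network is made up for illustration):
import numpy as np
import nnabla as nn
import nnabla.functions as F
import nnabla.parametric_functions as PF

x = nn.Variable((4, 8))
x.d = np.random.randn(*x.shape)
h = F.relu(PF.affine(x, 16, name="fc1"))
h.persistent = True  # keep h's buffer even when clear options are enabled.
y = F.mean(PF.affine(h, 1, name="fc2"))

y.forward(clear_buffer=True)
print(h.d.shape)  # (4, 16); h is still available for debugging or logging.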
- recompute
Gets or sets a boolean indicating whether its data is cleared during forward propagation and recomputation is performed during backward propagation.
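A rough sketch of how this flag might be used, assuming the usual recomputation workflow of marking intermediate variables and calling forward with clear_no_need_grad=True (the network is made up for illustration):
import numpy as np
import nnabla as nn
import nnabla.functions as F
import nnabla.parametric_functions as PF

x = nn.Variable((4, 8), need_grad=True)
x.d = np.random.randn(*x.shape)

h = F.relu(PF.affine(x, 16, name="fc1"))
h.apply(recompute=True)  # h's data is cleared in forward and recomputed during backward.
y = F.mean(PF.affine(h, 1, name="fc2"))

y.forward(clear_no_need_grad=True)
y.backward()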
- reset_shape(self, shape, force=False)
Resizes the shape of the variable to a specified shape.
Note
This method destructively changes the shape of the target variable. For safety, reshape() should be used instead.
- Returns:
None
- reshape(self, shape, unlink=False)
Returns a new variable, where this variable is reshaped to a specified shape.
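A minimal sketch:
import numpy as np
import nnabla as nn

x = nn.Variable.from_numpy_array(np.arange(6).reshape(2, 3))
y = x.reshape((3, 2))             # a new Variable connected to x through the graph.
z = x.reshape((6,), unlink=True)  # a new Variable without the graph connection.
print(y.shape)  # (3, 2)
print(z.shape)  # (6,)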
- rewire_on(self, var)
Rewire a successor graph of this variable on top of var.
- Parameters:
var (nnabla.Variable) – The array elements and the parent function of var are copied to self as references. Note that the parent function of var is removed.
Example
# A. Create a graph A.
xa = nn.Variable((2, 8), need_grad=True)
ya = F.tanh(PF.affine(xa, 10, name='a'))

# B. Create a graph B.
xb = nn.Variable((2, 16), need_grad=True)
yb = F.tanh(PF.affine(
    F.tanh(PF.affine(xb, 8, name='b1')),
    8, name='b2'))

# C. Rewire the graph A on top of B such that
#    `xb->B->(yb->)xa->A->ya`. Note `yb` is gone.
xa.rewire_on(yb)

# D. Execute the rewired graph.
xb.d = 1
ya.forward()
ya.backward()
- size_from_axis(self, axis=-1)
Gets the size followed by the provided axis.
Example
a = nnabla.Variable([10, 9])
a.size_from_axis()   # ==> 90
a.size_from_axis(0)  # ==> 90
a.size_from_axis(1)  # ==> 9
a.size_from_axis(2)  # ==> 1
- unlinked(self, need_grad=None)
This function is deprecated; use get_unlinked_variable instead.
- visit(self, f)
Visit functions recursively in forward order.
- Parameters:
f (function) – Function object which takes an nnabla._function.Function object as an argument.
- Returns:
None
Example
import nnabla as nn
import nnabla.functions as F
import nnabla.parametric_functions as PF

# Define a simple network graph.
def network_graph(x, maps=16, test=False):
    h = x
    h = PF.convolution(h, maps, kernel=(3, 3), pad=(1, 1), name="first-conv", with_bias=False)
    h = F.average_pooling(h, h.shape[2:])
    pred = PF.affine(h, 10, name="pred")
    return pred

# You can modify this PrintFunc to get other information such as the inputs
# (nnabla_func.inputs), outputs and arguments (nnabla_func.info.args) of nnabla functions.
class PrintFunc(object):
    def __call__(self, nnabla_func):
        print(nnabla_func.info.type_name)

x = nn.Variable([1, 3, 16, 16])
output = network_graph(x)
output.visit(PrintFunc())
Output :
Convolution
AveragePooling
Affine
- visit_check(self, f)
Visit functions recursively in forward order.
Note
If any of evaluation of the function object returns True, the visit propagation will stop immediately, and will return True.
- Parameters:
f (function) – Function object which takes an nnabla._function.Function object as an argument.
- Returns:
bool. Returns True if any call of the function object returns True.
Example
Define a simple network graph where the AveragePooling function can be added explicitly, as below:
def network_graph(x, add_avg_pool=False, maps=16, test=False):
    h = x
    h = PF.convolution(h, maps, kernel=(3, 3), pad=(1, 1), name="first-conv", with_bias=False)
    if add_avg_pool:
        h = F.average_pooling(h, h.shape[2:])
    else:
        h = F.relu(h)
    pred = PF.affine(h, 10, name="pred")
    return pred

# Define 'PrintFunc()' to check whether the "AveragePooling" function exists in the network graph.
class PrintFunc(object):
    def __call__(self, nnabla_func):
        if nnabla_func.info.type_name == "AveragePooling":
            print("{} exists in the graph".format(nnabla_func.info.type_name))
            return True
        else:
            return False
Create a network graph which has the AveragePooling function and call the visit_check() method:
x = nn.Variable([1, 3, 16, 16])
output = network_graph(x, add_avg_pool=True)  # Adding AveragePooling function to the graph.
print("The return value of visit_check() method is : {}".format(output.visit_check(PrintFunc())))
Output :
AveragePooling exists in the graph
The return value of visit_check() method is : True
Create a network graph which doesn't have the AveragePooling function and call the visit_check() method:
nn.clear_parameters()  # call this in case you want to run the following code again.
output = network_graph(x, add_avg_pool=False)  # Exclusion of the AveragePooling function from the graph.
print("The return value of visit_check() method is : {}".format(output.visit_check(PrintFunc())))
Output :
The return value of visit_check() method is : False