Variable
- class nnabla.Variable
Bases: object
nnabla.Variable is used to construct computation graphs (neural networks) together with functions in Functions and List of Parametric Functions. It also provides methods to execute forward and backward propagation of the network.
The nnabla.Variable class holds:
- A reference to the parent function in a computation graph. This provides traceability of all connections in the computation graph.
- Both the data and the error signal (gradient) containers, as nnabla.NdArray objects.
- Some additional information about the computation graph.
Variable overrides some arithmetic operators (+, -, *, /, **). Operands can be either a scalar number, an NdArray, or a Variable. If an NdArray is given as either the left or right operand, the arithmetic operation returns an NdArray which stores the output of the computation, invoked immediately. Otherwise, it returns a Variable that holds the graph connection. The computation is invoked immediately when nnabla.auto_forward or nnabla.set_auto_forward(True) is used.
Note
Relational operators == and != of two Variable objects are defined as an address comparison of the underlying C++ instances (nbla::Variable). Also, the hash() function, which is often used for keys of set and dict, is based on the address.
- Parameters:
shape (Iterable of int) – Shape of the variable.
need_grad (bool) – Flag indicating whether backpropagation is performed at this variable.
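The following is a minimal sketch (variable names and values are illustrative) of the operator behavior described above: a mixed Variable/NdArray operation is computed immediately and returns an NdArray, while a Variable-only operation builds a graph unless auto-forward mode is enabled.
import numpy as np
import nnabla as nn

a = nn.Variable.from_numpy_array(np.ones((2, 3), dtype=np.float32))
b = nn.Variable.from_numpy_array(np.full((2, 3), 2.0, dtype=np.float32))

c = a + b        # Variable holding the graph connection; not computed yet.
c.forward()      # Execute the small graph to obtain the result.
print(c.d)       # all 3.0

d = a + b.data   # Right operand is an NdArray; returns an NdArray computed immediately.
print(d.data)    # all 3.0

with nn.auto_forward():
    e = a * 2.0  # Computed immediately in auto-forward mode.
    print(e.d)   # all 2.0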
- apply(self, **kwargs)
Helper for setting properties, then returning self.
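A minimal sketch of chained property setting with apply (the properties used here, need_grad and persistent, are documented below):
import nnabla as nn

x = nn.Variable((2, 3))
y = x.apply(need_grad=False, persistent=True)  # sets both properties, then returns the variable itself.
assert y is x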
- backward(self, grad=1, bool clear_buffer=False, communicator_callbacks=None, function_pre_hook=None, function_post_hook=None)
Performs a backward propagation starting from this variable until the root variable(s) is/are reached in the function graph. The propagation will stop at a variable with need_grad=False.
- Parameters:
grad (scalar, numpy.ndarray, nnabla.NdArray, or None) – The gradient signal value(s) of this variable. The default value 1 is used in usual neural network training. This option is useful if you have a gradient computation module outside NNabla and want to use its result as a gradient signal. Note that this does not modify the grad values of this variable; the received values are only assigned to its gradient temporarily. Also, if the Variable on which you want to execute nnabla._variable.Variable.backward is a variable unlinked from another, and the corresponding Variable holds pre-computed gradient values, you need to set grad=None; otherwise, those pre-computed gradient values (propagated from the unlinked Variable) are ignored for that backward pass.
clear_buffer (bool) – Clears the no longer referenced variables during backpropagation to save memory. Note that all unnecessary intermediate variables will be cleared unless explicitly set as persistent=True.
communicator_callbacks (nnabla.CommunicatorBackwardCallback or list of nnabla.CommunicatorBackwardCallback) – The callback functions invoked when 1) backward computation of each function is finished and 2) all backward computation is finished.
function_pre_hook (callable) – This callable object is called immediately before each function is executed. It must take Function as an input. The default is None.
function_post_hook (callable) – This callable object is called immediately after each function is executed. It must take Function as an input. The default is None.
Example
We first explain a simple usage of backward.
import nnabla as nn
import nnabla.functions as F
import nnabla.parametric_functions as PF
import numpy as np
import nnabla.initializer as I

rng = np.random.seed(217)
initializer = I.UniformInitializer((-0.1, 0.1), rng=rng)

x = nn.Variable((8, 3, 32, 32))
x.d = np.random.random(x.shape)  # random input, just for example.

y0 = PF.convolution(x, outmaps=64, kernel=(3, 3), pad=(1, 1), stride=(2, 2),
                    w_init=initializer, name="conv1", with_bias=False)
y1 = F.relu(y0)
y2 = PF.convolution(y1, outmaps=128, kernel=(3, 3), pad=(1, 1), stride=(2, 2),
                    w_init=initializer, name="conv2", with_bias=False)
y3 = F.relu(y2)
y4 = F.average_pooling(y3, kernel=y3.shape[2:])
y5 = PF.affine(y4, 1, w_init=initializer)
loss = F.mean(F.abs(y5 - 1.))
loss.forward()  # Execute forward.

# We can check the current gradient of a parameter.
print(nn.get_parameters()["conv1/conv/W"].g)
Output :
[[[[0. 0. 0.] [0. 0. 0.] [0. 0. 0.]] ...
Initially all the gradient values should be zero. Then let’s see what happens after calling backward.
loss.backward()
print(nn.get_parameters()["conv1/conv/W"].g)
Output :
[[[[ 0.00539637 0.00770839 0.0090611 ] [ 0.0078223 0.00978992 0.00720569] [ 0.00879023 0.00578172 0.00790895]] ...
Now we know that the gradient values are computed and registered by calling backward. Note that calling backward successively accumulates the result. This means that if we execute backward again, we get a doubled result.
loss.backward()  # execute again.
print(nn.get_parameters()["conv1/conv/W"].g)
We can see it’s accumulated.
[[[[ 0.01079273 0.01541678 0.0181222 ] [ 0.01564459 0.01957984 0.01441139] [ 0.01758046 0.01156345 0.0158179 ]] ...
Next is an advanced usage with an unlinked variable (please refer to get_unlinked_variable). We use the same network, but it is separated by the unlinked variable.
import nnabla as nn
import nnabla.functions as F
import nnabla.parametric_functions as PF
import numpy as np
import nnabla.initializer as I

rng = np.random.seed(217)  # use the same random seed.
initializer = I.UniformInitializer((-0.1, 0.1), rng=rng)

x = nn.Variable((8, 3, 32, 32))
x.d = np.random.random(x.shape)  # random input, just for example.

y0 = PF.convolution(x, outmaps=64, kernel=(3, 3), pad=(1, 1), stride=(2, 2),
                    w_init=initializer, name="conv1", with_bias=False)
y1 = F.relu(y0)
y2 = PF.convolution(y1, outmaps=128, kernel=(3, 3), pad=(1, 1), stride=(2, 2),
                    w_init=initializer, name="conv2", with_bias=False)
y3 = F.relu(y2)
y3_unlinked = y3.get_unlinked_variable()  # the computation graph is cut apart here.
y4 = F.average_pooling(y3_unlinked, kernel=y3_unlinked.shape[2:])
y5 = PF.affine(y4, 1, w_init=initializer)
loss = F.mean(F.abs(y5 - 1.))

# Execute forward.
y3.forward()    # you need to execute forward at the unlinked variable first.
loss.forward()  # Then execute forward at the leaf variable.

# Execute backward.
loss.backward()  # works, but backpropagation stops at y3_unlinked.
print(nn.get_parameters()["conv1/conv/W"].g)  # no gradient registered yet.
Output :
[[[[0. 0. 0.] [0. 0. 0.] [0. 0. 0.]] ...
We can confirm that backpropagation stops at y3_unlinked. Then let's see how to execute backpropagation to the root variable (x). Since it is a little bit complicated, we first give an example of a common pitfall. Note that this is an incorrect way and is intended just to show backward's behavior.
y3.backward()  # this works, but the computed gradient values are not correct.
print(nn.get_parameters()["conv1/conv/W"].g)
Output :
[[[[ 17.795254 23.960905 25.51168 ] [ 20.661646 28.484127 19.406212 ] [ 26.91042 22.239697 23.395714 ]] ...
Note that this is a wrong result. The gradient held by y3_unlinked has been totally ignored. As described above, when just calling backward, the gradient (of the leaf variable on which you call backward) is considered to be 1. To execute backpropagation over the two separate graphs correctly, we need to specify grad=None as shown below; then the gradient currently held by that variable is used for the computation. (y3.backward(grad=y3_unlinked.g) does the same thing.)
# Reset all the gradient values.
for v in nn.get_parameters().values():
    v.g = 0.
for v in [y0, y1, y2, y3, y4, y5]:
    v.g = 0.  # need to reset all the gradient values.

loss.backward()         # backpropagation starts from the leaf variable again.
y3.backward(grad=None)  # By this, it can take over the gradient held by y3_unlinked.
print(nn.get_parameters()["conv1/conv/W"].g)  # correct result.
This time you should have the same result.
[[[[ 0.00539637 0.00770839 0.0090611 ] [ 0.0078223 0.00978992 0.00720569] [ 0.00879023 0.00578172 0.00790895]] ...
- bool_fill_(self, mask, value)
Return a new but in-placed nnabla.Variable filled with value where mask is non-zero.
- Parameters:
mask (nnabla.NdArray) – Mask with which to fill. Non-zero/zero elements are supposed to be a binary mask of 1/0. No gradients are computed with respect to mask.
value (float) – The value to fill.
- Returns:
nnabla.Variable
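A minimal sketch of how this might be used (the values and mask are illustrative):
import numpy as np
import nnabla as nn

x = nn.Variable.from_numpy_array(np.array([[1., 2.], [3., 4.]], dtype=np.float32))
mask = nn.NdArray.from_numpy_array(np.array([[1., 0.], [0., 1.]], dtype=np.float32))

y = x.bool_fill_(mask, 0.0)  # fill with 0.0 where the mask is non-zero.
y.forward()
print(y.d)  # [[0. 2.]
            #  [3. 0.]]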
- clear_all_graph_links(self)
Clear all intermediate functions and variables.
This method clears all intermediate functions and variables up to this variable in the forward pass, and is useful for truncated backpropagation through time (truncated BPTT) in dynamic graphs.
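A rough, illustrative sketch of the truncated BPTT pattern this method is intended for, in dynamic (auto-forward) mode. The network, sizes, loss and hyperparameters are made up for illustration.
import numpy as np
import nnabla as nn
import nnabla.functions as F
import nnabla.parametric_functions as PF
import nnabla.solvers as S

nn.set_auto_forward(True)  # dynamic (define-by-run) graph mode.

solver = S.Sgd(lr=0.01)
h = nn.Variable.from_numpy_array(np.zeros((1, 8), dtype=np.float32))  # recurrent state.

for t in range(10):
    x = nn.Variable.from_numpy_array(np.random.randn(1, 4).astype(np.float32))
    h = F.tanh(PF.affine(F.concatenate(x, h, axis=1), 8, name="rnn"))
    loss = F.mean(h ** 2)  # a dummy loss, just for illustration.
    if t == 0:
        solver.set_parameters(nn.get_parameters())
    solver.zero_grad()
    loss.backward()
    solver.update()
    # Truncate: clear the functions and variables accumulated up to h so that
    # the next step does not backpropagate beyond this point and memory is freed.
    h.clear_all_graph_links()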
- d
Returns the values held by this variable, as a numpy.ndarray. Note that the values are referenced (not copied). Therefore, modifying the returned ndarray will affect the data of the NNabla array. This method can be called as a setter to set the values held by this variable. Refer to the documentation of the setter nnabla.NdArray.data for the detailed behavior of the setter.
- Parameters:
value (numpy.ndarray) (optional) –
- Returns:
numpy.ndarray
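A minimal sketch of getting and setting d (values are illustrative):
import numpy as np
import nnabla as nn

x = nn.Variable((2, 3))
x.d = np.arange(6).reshape(2, 3)  # setter: writes the values into the variable's data.
print(x.d)                        # getter: a numpy.ndarray referencing the same data.
x.d[0, 0] = 100.                  # modifying the returned array affects the variable's data.
print(x.d[0, 0])                  # 100.0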
- forward(self, bool clear_buffer=False, bool clear_no_need_grad=False, function_pre_hook=None, function_post_hook=None)
Performs a forward propagation from the root node to this variable. The forward propagation is performed on a subset of variables determined by the dependencies of this variable. The subset is recursively constructed by tracking the variables that the variables in the subset depend on, starting from this variable, until it reaches the root variable(s) in the function graph. See also forward_all, which performs forward computations for all variables within the input graph.
- Parameters:
clear_buffer (bool) – Clear the no longer referenced variables during forward propagation to save memory. This is usually set to True in an inference or validation phase. Default is False. Note that all unnecessary intermediate variables will be cleared unless explicitly set as persistent=True.
clear_no_need_grad (bool) – Clear the unreferenced variables with need_grad=False during forward propagation. True is usually used when calling this during training. This is ignored when clear_buffer=True.
function_pre_hook (callable) – This callable object is called immediately before each function is executed. It must take Function as an input. The default is None.
function_post_hook (callable) – This callable object is called immediately after each function is executed. It must take Function as an input. The default is None.
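A minimal sketch of forward with clear_buffer and a function_pre_hook that logs each executed function (the network is made up for illustration):
import numpy as np
import nnabla as nn
import nnabla.functions as F
import nnabla.parametric_functions as PF

x = nn.Variable((4, 8))
x.d = np.random.randn(*x.shape)
y = F.relu(PF.affine(x, 16, name="fc1"))

def log_function(f):
    print("executing:", f.info.type_name)

# clear_buffer=True releases intermediate buffers during inference.
y.forward(clear_buffer=True, function_pre_hook=log_function)
print(y.d.shape)  # (4, 16)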
- static from_numpy_array(data, grad=None, need_grad=None)
Create a Variable object from Numpy array(s).
The data is initialized with the given Numpy array, as well as grad if given. The shape is also determined by the given array.
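A minimal sketch:
import numpy as np
import nnabla as nn

data = np.array([[1., 2.], [3., 4.]], dtype=np.float32)
grad = np.zeros_like(data)

v = nn.Variable.from_numpy_array(data, grad=grad, need_grad=True)
print(v.shape)      # (2, 2); determined by the given array.
print(v.d)          # the data values.
print(v.need_grad)  # True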
- function_references
Returns a list of functions which take this variable as an input. This method can be called only as a getter.
- Returns:
list of nnabla.function.Function
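A minimal sketch showing that a variable consumed by two functions reports both of them (the functions are chosen arbitrarily):
import nnabla as nn
import nnabla.functions as F

x = nn.Variable((2, 3))
y1 = F.relu(x)
y2 = F.sigmoid(x)

for func in x.function_references:
    print(func.info.type_name)  # e.g. ReLU, Sigmoid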
- g
Returns the gradient values held by this variable, as a numpy.ndarray. Note that the values are referenced (not copied). Therefore, modifying the returned ndarray will affect the gradient data of the NNabla array. This method can be called as a setter to set the gradient held by this variable. Refer to the documentation of the setter nnabla.NdArray.data for the detailed behavior of the setter.
- Parameters:
value (numpy.ndarray) –
- Returns:
numpy.ndarray
- get_unlinked_variable(self, need_grad=None)
Gets an unlinked (forgetting parent) variable that shares a Variable buffer instance.
- Parameters:
need_grad (bool, optional) – By default, the unlinked variable will have the same need_grad flag as this variable instance. If a boolean value is specified, it is set as the need_grad flag of the unlinked variable. It is recommended to specify this option explicitly to avoid unintended behavior.
Returns:
Variable
Note
The unlinked Variable behaves equivalently to the original variable in comparison operators and the hash function, regardless of whether or not the need_grad attribute is changed. See the note in the Variable class documentation. Also, for backward execution with unlinked variable(s), please refer to backward and its example.
Example
import numpy as np
import nnabla as nn
import nnabla.parametric_functions as PF

x = nn.Variable.from_numpy_array(np.array([[1, 2], [3, 4]]))
y = PF.affine(x, 4, name="y")

# Create a new variable whose graph connection is unlinked.
# It is recommended to specify the need_grad option explicitly.
z = y.get_unlinked_variable(need_grad=False)

print(y.parent)
# Affine
print(z.parent)  # z is unlinked from the parent function but shares the buffers of y.
# None
- info
Information of the variable.
- Type:
info
- masked_fill_(mask, value)
Variable.bool_fill_(self, mask, value)
Return a new but in-placed nnabla.Variable filled with value where mask is non-zero. This is an alias of bool_fill_.
- Parameters:
mask (nnabla.NdArray) – Mask with which to fill. Non-zero/zero elements are supposed to be a binary mask of 1/0. No gradients are computed with respect to mask.
value (float) – The value to fill.
- Returns:
nnabla.Variable
- ndim
Gets the number of dimensions of this variable.
- Returns:
int
- need_grad
Gets or sets a boolean indicating whether backpropagation is performed at this variable.
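A minimal sketch:
import nnabla as nn

x = nn.Variable((2, 3), need_grad=True)
print(x.need_grad)   # True
x.need_grad = False  # backpropagation will not be performed at this variable.
print(x.need_grad)   # False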
- no_grad(self)
No gradients for the whole network.
This method is like nnabla.no_grad but can be used only for static networks, and is useful when the network is loaded from the NNP format.
Example
x = nn.Variable.from_numpy_array([2, 3])
y = <Network>(x).no_grad()
- parent
Returns the parent function of this variable. This method can also be called as a setter.
- Parameters:
func (nnabla.function.Function) –
- Returns:
nnabla.function.Function
- persistent
Returns the persistent flag of this variable. If True, the variable is not cleared even if the clear options in nnabla._variable.Variable.forward() and nnabla._variable.Variable.backward() are enabled. This is useful when you want to debug or log the variable values. This method can also be called as a setter.
- Parameters:
b (bool) –
- Returns:
bool
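A minimal sketch showing how persistent=True keeps an intermediate value available even when forward is called with clear_buffer=True (the network is made up for illustration):
import numpy as np
import nnabla as nn
import nnabla.functions as F
import nnabla.parametric_functions as PF

x = nn.Variable((4, 8))
x.d = np.random.randn(*x.shape)
h = F.relu(PF.affine(x, 16, name="fc1"))
h.persistent = True  # keep h's buffer even when clear options are enabled.
y = F.mean(PF.affine(h, 1, name="fc2"))

y.forward(clear_buffer=True)
print(h.d.shape)  # (4, 16); h is still available for debugging or logging.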
- recompute
Gets or sets a boolean indicating whether its data is cleared during forward propagation and recomputation is performed during backward propagation.
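A rough sketch of how this flag might be used, assuming the usual recomputation workflow of marking intermediate variables and calling forward with clear_no_need_grad=True (the network is made up for illustration):
import numpy as np
import nnabla as nn
import nnabla.functions as F
import nnabla.parametric_functions as PF

x = nn.Variable((4, 8), need_grad=True)
x.d = np.random.randn(*x.shape)

h = F.relu(PF.affine(x, 16, name="fc1"))
h.apply(recompute=True)  # h's data is cleared in forward and recomputed during backward.
y = F.mean(PF.affine(h, 1, name="fc2"))

y.forward(clear_no_need_grad=True)
y.backward()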
- reset_shape(self, shape, force=False)
Resizes the shape of the variable to a specified shape.
Note
This method destructively changes the shape of the target variable. For safety, reshape() should be used instead.
- Returns:
None
- reshape(self, shape, unlink=False)
Returns a new variable, where this variable is reshaped to a specified shape.
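A minimal sketch:
import numpy as np
import nnabla as nn

x = nn.Variable.from_numpy_array(np.arange(6).reshape(2, 3))
y = x.reshape((3, 2))             # a new Variable connected to x through the graph.
z = x.reshape((6,), unlink=True)  # a new Variable without the graph connection.
print(y.shape)  # (3, 2)
print(z.shape)  # (6,)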
- rewire_on(self, var)
Rewire a successor graph of this variable on top of var.
- Parameters:
var (nnabla.Variable) – The array elements and the parent function of var are copied to self as references. Note that the parent function of var is removed.
Example
# A. Create a graph A.
xa = nn.Variable((2, 8), need_grad=True)
ya = F.tanh(PF.affine(xa, 10, name='a'))

# B. Create a graph B.
xb = nn.Variable((2, 16), need_grad=True)
yb = F.tanh(PF.affine(
    F.tanh(PF.affine(xb, 8, name='b1')),
    8, name='b2'))

# C. Rewire the graph A on top of B such that
#    `xb->B->(yb->)xa->A->ya`. Note `yb` is gone.
xa.rewire_on(yb)

# D. Execute the rewired graph.
xb.d = 1
ya.forward()
ya.backward()
- size_from_axis(self, axis=-1)
Gets the size followed by the provided axis.
Example
a = nnabla.Variable([10, 9])
a.size_from_axis()   # ==> 90
a.size_from_axis(0)  # ==> 90
a.size_from_axis(1)  # ==> 9
a.size_from_axis(2)  # ==> 1
- unlinked(self, need_grad=None)
This function is deprecated; use get_unlinked_variable instead.
- visit(self, f)
Visit functions recursively in forward order.
- Parameters:
f (function) – Function object which takes an nnabla._function.Function object as an argument.
- Returns:
None
Example
import nnabla as nn
import nnabla.functions as F
import nnabla.parametric_functions as PF

# Define a simple network graph.
def network_graph(x, maps=16, test=False):
    h = x
    h = PF.convolution(h, maps, kernel=(3, 3), pad=(1, 1), name="first-conv", with_bias=False)
    h = F.average_pooling(h, h.shape[2:])
    pred = PF.affine(h, 10, name="pred")
    return pred

# You can modify this PrintFunc to get other information such as the inputs
# (nnabla_func.inputs), outputs and arguments (nnabla_func.info.args) of nnabla functions.
class PrintFunc(object):
    def __call__(self, nnabla_func):
        print(nnabla_func.info.type_name)

x = nn.Variable([1, 3, 16, 16])
output = network_graph(x)
output.visit(PrintFunc())
Output :
Convolution
AveragePooling
Affine
- visit_check(self, f)
Visit functions recursively in forward order.
Note
If any of evaluation of the function object returns True, the visit propagation will stop immediately, and will return True.
- Parameters:
f (function) – Function object which takes an nnabla._function.Function object as an argument.
- Returns:
bool. Returns True if any call of the function object returns True.
Example
Define a simple network graph where the AveragePooling function can be added explicitly, as below:
def network_graph(x, add_avg_pool=False, maps=16, test=False):
    h = x
    h = PF.convolution(h, maps, kernel=(3, 3), pad=(1, 1), name="first-conv", with_bias=False)
    if add_avg_pool:
        h = F.average_pooling(h, h.shape[2:])
    else:
        h = F.relu(h)
    pred = PF.affine(h, 10, name="pred")
    return pred

# Define 'PrintFunc()' to check whether the "AveragePooling" function exists in the network graph.
class PrintFunc(object):
    def __call__(self, nnabla_func):
        if nnabla_func.info.type_name == "AveragePooling":
            print("{} exists in the graph".format(nnabla_func.info.type_name))
            return True
        else:
            return False
Create a network graph which has the AveragePooling function and call the visit_check() method:
x = nn.Variable([1, 3, 16, 16])
output = network_graph(x, add_avg_pool=True)  # Adding AveragePooling function to the graph.
print("The return value of visit_check() method is : {}".format(output.visit_check(PrintFunc())))
Output :
AveragePooling exists in the graph
The return value of visit_check() method is : True
Create a network graph which doesn't have the AveragePooling function and call the visit_check() method:
nn.clear_parameters()  # call this in case you want to run the following code again.
output = network_graph(x, add_avg_pool=False)  # Exclusion of the AveragePooling function from the graph.
print("The return value of visit_check() method is : {}".format(output.visit_check(PrintFunc())))
Output :
The return value of visit_check() method is : False