Gradient function for the outputs with respect to the inputs.

The grad function computes the sum of gradients of the outputs w.r.t. the inputs.

$$g_i = \sum_{j} \frac{\partial y_j}{\partial x_i},$$

where $y_j$ is the $j$-th output, $x_i$ is the $i$-th input, and $g_i$ is the sum over all $j$ of the gradients of $y_j$ w.r.t. $x_i$.
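To make the summation concrete, the following is a minimal sketch in plain Python that does not use nnabla at all. It approximates the summed gradient g_i with central finite differences for a toy function; the names `f` and `summed_gradients`, and the toy function itself, are assumptions made up for this illustration and are not part of the library.

```python
# Hypothetical toy function (not part of nnabla) with two inputs and two outputs:
#   y_0 = x_0 * x_1
#   y_1 = x_0 + 2 * x_1
def f(x):
    x0, x1 = x
    return [x0 * x1, x0 + 2.0 * x1]


def summed_gradients(func, x, eps=1e-6):
    """Approximate g_i = sum_j dy_j/dx_i using central differences."""
    grads = []
    for i in range(len(x)):
        # Perturb only the i-th input in both directions.
        x_plus = list(x)
        x_plus[i] += eps
        x_minus = list(x)
        x_minus[i] -= eps
        y_plus = func(x_plus)
        y_minus = func(x_minus)
        # Sum the partial derivatives of every output w.r.t. x_i.
        grads.append(sum((p - m) / (2.0 * eps)
                         for p, m in zip(y_plus, y_minus)))
    return grads


g = summed_gradients(f, [3.0, 5.0])
# Analytically: g_0 = x_1 + 1 = 6 and g_1 = x_0 + 2 = 5.
print(g)
```

Each entry of `g` collects the contributions of both outputs to one input, which is the quantity the formula above describes. The grad function obtains these values through backpropagation rather than finite differences.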

Parameters:

outputs (list of Variable or Variable) – Outputs of the differentiable function.

inputs (list of Variable or Variable) – Inputs w.r.t. which the gradients of the outputs are computed.

grad_outputs (None, scalar, numpy.ndarray, nnabla._nd_array.NdArray, or list of scalar, numpy.ndarray, or nnabla._nd_array.NdArray) – Gradients fed in for the corresponding outputs. This is the same as the grad argument of backward(). Default is None, in which case one is used as the incoming gradient at the very end of the backward graph.

persistent_outputs (list of bool) – Specifies, per output, whether that output becomes persistent. If not specified, all outputs become persistent.

bind_grad_output (bool) – Bind data to the grad of the input Variable. This is useful when one wants to use the backward graph for training a neural network with first-order gradients only. Default is False.

Returns:

List of Variable.

If the backpropagation does not reach input(s), the corresponding returned value(s) are None.

Example:

import nnabla as nn
import nnabla.functions as F
import nnabla.parametric_functions as PF
import numpy as np
from nnabla.ext_utils import get_extension_context

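# Context ("cudnn" requires the CUDA/cuDNN extension; "cpu" can be used otherwise)
extension_module = "cudnn"
ctx = get_extension_context(extension_module)
nn.set_default_context(ctx)  # make the context take effect for the graph below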

# Input and label
x = nn.Variable.from_numpy_array(np.random.randn(4, 3, 32, 32))
y = nn.Variable.from_numpy_array(np.random.randint(0, 10, 4).reshape(4, 1))

# Network
h = PF.convolution(x, 8, (3, 3), (1, 1), name="conv1")
h = F.relu(h)
h = F.max_pooling(h, (2, 2))
h = PF.convolution(h, 16, (3, 3), (1, 1), name="conv2")
h = F.relu(h)
h = F.max_pooling(h, (2, 2))
p = PF.affine(h, 10, name="pred")
loss = F.mean(F.softmax_cross_entropy(p, y))