Grad

nnabla.grad.grad(outputs, inputs, grad_outputs=None, persistent_outputs=[], bind_grad_output=False)[source]

Gradient function for the outputs with respect to the inputs.

The grad function computes the sum of gradients of the outputs w.r.t. the inputs.

\[g_i = \sum_{j} {\frac{\partial y_j}{\partial x_i}},\]

where \(y_j\) is each output, \(x_i\) is each input, and \(g_i\) is the sum of the gradients of \(y_j\) w.r.t. \(x_i\) over all \(j\).
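
For instance, a minimal sketch with two outputs feeding one input (assumes the default context; the constants are chosen only for illustration):

import nnabla as nn
import nnabla.functions as F
import numpy as np

x = nn.Variable.from_numpy_array(np.ones(3)).apply(need_grad=True)
y0 = F.sum(2 * x)  # dy0/dx_i = 2
y1 = F.sum(3 * x)  # dy1/dx_i = 3
gx = nn.grad([y0, y1], [x])[0]
gx.forward()
print(gx.d)  # each element is expected to be 2 + 3 = 5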

Parameters:
  • outputs (list of Variable or Variable) – Outputs of the differentiable function.

  • inputs (list of Variable or Variable) – Inputs w.r.t. which the gradients of outputs are computed.

  • grad_outputs (None, scalar, numpy.ndarray, nnabla.NdArray, or list of scalar, numpy.ndarray, or nnabla.NdArray) – Gradient outputs corresponding to outputs. This is the same as the grad argument of backward(). Default is None, in which case 1 is used as the incoming gradient at the starting Variable(s) of the gradient graph (see the sketch after this parameter list).

  • persistent_outputs (list of bool) – Flags deciding whether the corresponding outputs become persistent. If not specified, all outputs become persistent.

  • bind_grad_output (bool) – Bind the data of each returned gradient Variable to the grad of the corresponding input Variable. This is useful when one wants to use the gradient graph for training a neural network with first-order gradients only. Default is False.
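
A minimal sketch of the grad_outputs argument (the 0.5 weight and the input values are chosen only for illustration; assumes the default context):

import nnabla as nn
import nnabla.functions as F
import numpy as np

x = nn.Variable.from_numpy_array(np.ones((2, 2))).apply(need_grad=True)
y = F.sum(x ** 2)  # dy/dx = 2 * x
# Weight the incoming gradient of y with 0.5 instead of the default 1.
gx = nn.grad([y], [x], grad_outputs=[0.5])[0]
gx.forward()
print(gx.d)  # each element is expected to be 0.5 * 2 * 1 = 1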

Returns:

List of Variable.

If the backpropagation does not reach input(s), the corresponding returned value(s) are zero (i.e., the gradients w.r.t. inputs are zero) and not connected as a part of the gradient graph.
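
A minimal sketch of this behavior (z below is deliberately left out of the graph of y; assumes the default context):

import nnabla as nn
import nnabla.functions as F
import numpy as np

x = nn.Variable.from_numpy_array(np.ones(2)).apply(need_grad=True)
z = nn.Variable.from_numpy_array(np.ones(2)).apply(need_grad=True)  # never used by y
y = F.sum(3 * x)
gx, gz = nn.grad([y], [x, z])
gx.forward()
print(gx.d)  # expected: [3. 3.]
print(gz.d)  # expected: all zeros, since backpropagation never reaches z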

Example (Gradient Penalty):

import nnabla as nn
import nnabla.functions as F
import nnabla.parametric_functions as PF
import numpy as np
from nnabla.ext_utils import get_extension_context

# Context
extension_module = "cudnn"
ctx = get_extension_context(extension_module)
nn.set_default_context(ctx)

# Input and label
x = nn.Variable.from_numpy_array(np.random.randn(4, 3, 32, 32))
y = nn.Variable.from_numpy_array(np.random.randint(0, 10, 4).reshape(4, 1))

# Network
h = PF.convolution(x, 8, (3, 3), (1, 1), name="conv1")
h = F.relu(h)
h = F.max_pooling(h, (2, 2))
h = PF.convolution(h, 16, (3, 3), (1, 1), name="conv2")
h = F.relu(h)
h = F.max_pooling(h, (2, 2))
p = PF.affine(h, 10, name="pred")
loss = F.mean(F.softmax_cross_entropy(p, y))

# Grad
outputs = [loss]
inputs = nn.get_parameters().values()
grads = nn.grad(outputs, inputs)  # gradients of the parameters

# Gradient penalty: constrain the norms of the parameter gradients toward the target t.
t = 0  # target norm (e.g., 0 or 1)
gp = sum([(F.sum(g ** 2) ** 0.5 - t) ** 2 for g in grads])
loss += gp
loss.forward()
loss.backward()

Example (Higher-order Gradients):

import nnabla as nn
import nnabla.functions as F
import numpy as np

x = nn.Variable.from_numpy_array(np.random.randn(2, 2)).apply(need_grad=True)
x.grad.zero()
y = F.sin(x)
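# Repeatedly apply nn.grad to build the graph of the n-th order gradient of y w.r.t. x.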
def grad(y, x, n=1):
    dx = [y]
    for _ in range(n):
        dx = nn.grad([dx[0]], [x])
    return dx[0]
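# The derivative cycle of sin has period 4, so the 10th derivative of sin(x) is -sin(x);
# its derivative, the 11th, is -cos(x), accumulated into x.g by dnx.backward().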
dnx = grad(y, x, n=10)
dnx.forward()
print(np.allclose(-np.sin(x.d), dnx.d))
dnx.backward()
print(np.allclose(-np.cos(x.d), x.g))

# Show the supported status for each function
from nnabla.backward_functions import show_registry
show_registry()
nnabla.backward_functions.register(func_name, func)[source]

Register a backward function for a function.

Parameters:
  • func_name (str) – The function class name, for example, Affine.

  • func (function) – The function to be called as the backward function of the function func_name. Arguments of func must be (ctx: nn.Context, inputs: list of nn.Variable, **kwargs). The inputs are the inputs to the function func_name, and the kwargs are the arguments of that function. For example, if func_name is Affine and func is affine_backward, the inputs are the data, weights, and bias (if present), and kwargs = dict(base_axis=base_axis).
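
A hedged sketch of the registration call, following the (ctx, inputs, **kwargs) convention described above; MyFunc and my_func_backward are hypothetical names used only for illustration:

from nnabla.backward_functions import register, show_registry

def my_func_backward(ctx, inputs, **kwargs):
    # Hypothetical placeholder: a real implementation builds and returns the
    # gradient graph for the inputs of "MyFunc" using nnabla.functions.
    raise NotImplementedError

# "MyFunc" is a hypothetical function class name; real registrations target
# existing function class names such as "Affine".
register("MyFunc", my_func_backward)
show_registry()  # the registry listing should now include the "MyFunc" entry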

nnabla.backward_functions.show_registry()[source]

Show the registry of all backward functions.