Grad
- nnabla.grad.grad(outputs, inputs, grad_outputs=None, persistent_outputs=[], bind_grad_output=False)[source]
Gradient function for the outputs with respect to the inputs.
The grad function computes the sum of gradients of the outputs w.r.t. the inputs.
\[g_i = \sum_{j} {\frac{\partial y_j}{\partial x_i}},\]
where \(y_j\) is each output, \(x_i\) is each input, and \(g_i\) is the sum of the gradients of \(y_j\) w.r.t. \(x_i\) over all \(j\).
- Parameters:
outputs (list of Variable or Variable) – Outputs of the differentiable function.
inputs (list of Variable or Variable) – Inputs w.r.t. which the gradients of the outputs are computed.
grad_outputs (None, scalar, numpy.ndarray, nnabla.NdArray, or list of scalar, numpy.ndarray, or nnabla.NdArray) – Gradient outputs corresponding to the outputs. This is the same as the grad argument of backward(). Default is None, so 1 is used as the incoming gradient at the very beginning of the Variable in the gradient graph.
persistent_outputs (list of bool) – Outputs become persistent accordingly. If not specified, all outputs become persistent.
bind_grad_output (bool) – Bind data to the grad of the input variable. This is useful when one wants to use the gradient graph for training a neural network with first-order gradients only. Default is False. See the sketch after the examples below.
- Returns:
List of Variable. If the backpropagation does not reach input(s), the corresponding returned value(s) are zero (i.e., the gradients w.r.t. inputs are zero) and not connected as a part of the gradient graph.
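A minimal sketch of the summation above and of the grad_outputs argument, assuming that scalar grad_outputs act as the incoming gradients of the corresponding outputs (as with the grad argument of backward()); the toy variables and expected values are illustrative only:

import numpy as np
import nnabla as nn
import nnabla.functions as F

x = nn.Variable.from_numpy_array(np.random.randn(3)).apply(need_grad=True)
y0 = F.sum(2.0 * x)  # dy0/dx_i = 2
y1 = F.sum(3.0 * x)  # dy1/dx_i = 3

# Default grad_outputs (1 for each output): g_i = 2 + 3 = 5
g = nn.grad([y0, y1], [x])[0]
g.forward()
print(np.allclose(g.d, 5.0))

# Scalar grad_outputs weight each output's contribution: g_i = 2 * 2 + 1 * 3 = 7
gw = nn.grad([y0, y1], [x], grad_outputs=[2.0, 1.0])[0]
gw.forward()
print(np.allclose(gw.d, 7.0))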
Example (Gradient Penalty):
import nnabla as nn
import nnabla.functions as F
import nnabla.parametric_functions as PF
import numpy as np
from nnabla.ext_utils import get_extension_context

# Context
extension_module = "cudnn"
ctx = get_extension_context(extension_module)
nn.set_default_context(ctx)

# Input and label
x = nn.Variable.from_numpy_array(np.random.randn(4, 3, 32, 32))
y = nn.Variable.from_numpy_array(np.random.randint(0, 10, 4).reshape(4, 1))

# Network
h = PF.convolution(x, 8, (3, 3), (1, 1), name="conv1")
h = F.relu(h)
h = F.max_pooling(h, (2, 2))
h = PF.convolution(h, 16, (3, 3), (1, 1), name="conv2")
h = F.relu(h)
h = F.max_pooling(h, (2, 2))
p = PF.affine(h, 10, name="pred")
loss = F.mean(F.softmax_cross_entropy(p, y))

# Grad
outputs = [loss]
inputs = nn.get_parameters().values()
grads = nn.grad(outputs, inputs)  # gradients of the parameters

# Backward of the outputs w.r.t. the parameters by constraining the gradient norms
t = 0  # or 1
gp = sum([(F.sum(g ** 2) ** 0.5 - t) ** 2 for g in grads])
loss += gp
loss.forward()
loss.backward()
Example (Higher-order Gradients):
import nnabla as nn
import nnabla.functions as F
import numpy as np

x = nn.Variable.from_numpy_array(np.random.randn(2, 2)).apply(need_grad=True)
x.grad.zero()
y = F.sin(x)

def grad(y, x, n=1):
    dx = [y]
    for _ in range(n):
        dx = nn.grad([dx[0]], [x])
    return dx[0]

dnx = grad(y, x, n=10)
dnx.forward()
print(np.allclose(-np.sin(x.d), dnx.d))
dnx.backward()
print(np.allclose(-np.cos(x.d), x.g))

# Show the supported status for each function
from nnabla.backward_functions import show_registry
show_registry()
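A minimal sketch of the bind_grad_output option, assuming that binding means each returned gradient Variable shares its data with the grad of the corresponding input, so that forwarding the gradient graph fills the parameter gradients used by a solver; the tiny network, the solver, and the variable names are illustrative only:

import numpy as np
import nnabla as nn
import nnabla.functions as F
import nnabla.parametric_functions as PF
import nnabla.solvers as S

# Toy data and a one-layer classifier
x = nn.Variable.from_numpy_array(np.random.randn(4, 8))
t = nn.Variable.from_numpy_array(np.random.randint(0, 3, (4, 1)))
y = PF.affine(x, 3, name="fc")
loss = F.mean(F.softmax_cross_entropy(y, t))

params = nn.get_parameters()
solver = S.Sgd(lr=0.1)
solver.set_parameters(params)

# Assumed semantics of bind_grad_output=True: each returned gradient's data is
# bound to the grad of the corresponding parameter, so forwarding the gradient
# graph fills param.g directly and no loss.backward() call is needed.
grads = nn.grad([loss], params.values(), bind_grad_output=True)
F.sink(*grads).forward()
solver.update()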
- nnabla.backward_functions.register(func_name, func)[source]
Register the backward function for a function.
- Parameters:
func_name (str) – The function class name, for example, Affine.
func (function) – The function to be called as the backward function for the function func_name. The arguments of func must be (ctx: nn.Context, inputs: list of nn.Variable, **kwargs). The inputs are the ones to the function of func_name, and the kwargs are the arguments of that function. For example, if func_name is Affine, func is affine_backward, the inputs are the data, weights, and bias if necessary, and kwargs = dict(base_axis=base_axis).
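A minimal sketch of registering a backward function following the calling convention above; the choice of the Identity function class, the assumption that the incoming gradient arrives as the first element of inputs, and the convention of returning one gradient Variable per data input are illustrative assumptions rather than guarantees of this reference:

from nnabla.backward_functions import register

def identity_backward(ctx, inputs, **kwargs):
    # Assumed layout: inputs[0] is the incoming gradient w.r.t. the output,
    # inputs[1] is the data input of Identity.
    dy = inputs[0]
    # The gradient of Identity(x) w.r.t. x is the incoming gradient itself.
    return [dy]  # assumed: one gradient Variable per data input

register("Identity", identity_backward)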