Computation Graph

nnabla.forward_all(variables, bool clear_buffer=False, bool clear_no_need_grad=False, function_pre_hook=None, function_post_hook=None)

Performs a forward propagation up to variables specified as the 1st argument. See also forward.

  • clear_buffer (bool) –

    Clear variables that are no longer referenced during forward propagation to save memory. This is usually set to True in an inference or a validation phase. Default is False. Note that the starting variable and the destination variable of the input graph will not be cleared, regardless of their persistent flag. All intermediate variables will be cleared unless set explicitly as persistent=True. For example,

    forward_all([h_i, y], clear_buffer=True)

    will clear all intermediate variables between h_i and y unless set explicitly as persistent=True, but h_i and y will not be cleared regardless of their persistent flag.

  • clear_no_need_grad (bool) – Clear the unreferenced variables with need_grad=False during forward propagation. This is usually set to True when calling this during training. This is ignored when clear_buffer=True.

  • function_pre_hook (callable) – This callable object is called immediately before each function is executed. It must take Function as an input. The default is None.

  • function_post_hook (callable) – This callable object is called immediately after each function is executed. It must take Function as an input. The default is None.


import numpy as np
import nnabla as nn
import nnabla.parametric_functions as PF

# Create a graph which has two outputs
x = nn.Variable.from_numpy_array(np.array([[1, 2], [3, 4]]))
y = PF.affine(x, 4, name="y")
z = PF.affine(x, 8, name="z")

# Execute a forward propagation recursively up to y and z
nn.forward_all([y, z], clear_buffer=True)

The nn.no_grad context disables gradients for the whole network created inside it.

Because no gradients are required for a network created in this context, all intermediate buffers except for the leaves of the network are released as soon as the forward pass is executed, which reduces memory usage.

This is useful, for example, when the output of a pre-trained network is used as an input to another network, where the pre-trained network does not need to be fine-tuned but the other network is optimized.


no_grad (bool) – No gradient flag. Default is True.


with nn.no_grad():
    output0 = <Network0>(<input0>)

output1 = <Network1>(<input1>, output0)
loss = <Loss>(output1, <ground_truth>)

This context also works in the dynamic mode.

with nn.auto_forward(), nn.no_grad():
    output0 = <Network0>(<input0>)


When working with a static network, the need_grad property of the input (e.g., an input image) must be False, and <root>.forward(clear_no_need_grad=True) must be called; otherwise, the intermediate buffers are not released as expected.