Solvers¶
The nnabla.solvers.Solver class represents a stochastic gradient descent based optimizer for optimizing the parameters in the computation graph. NNabla provides the various solvers listed below.
Solver¶
- class nnabla.solvers.Solver¶
Solver interface class.
The same API provided in this class can be used to implement various types of solvers.
Example:
# Network building comes above
import nnabla.solvers as S
solver = S.Sgd(lr=1e-3)
solver.set_parameters(nn.get_parameters())

for itr in range(num_itr):
    x.d = ...  # set data
    t.d = ...  # set label
    loss.forward()
    solver.zero_grad()  # Set all gradient buffers to 0
    loss.backward()
    solver.weight_decay(decay_rate)  # Apply weight decay
    solver.clip_grad_by_norm(clip_norm)  # Apply gradient clipping by norm
    solver.update()  # Update parameters
Note
All solvers provided by NNabla are subclasses of Solver. A solver is never instantiated from this class itself.
- check_inf_grad(self, pre_hook=None, post_hook=None)¶
Check whether any of the set-up gradients contains an inf value.
- check_inf_or_nan_grad(self, pre_hook=None, post_hook=None)¶
Check whether any of the set-up gradients contains an inf or nan value.
- check_nan_grad(self, pre_hook=None, post_hook=None)¶
Check whether any of the set-up gradients contains a nan value.
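These checks are useful for skipping unstable updates, e.g. in mixed-precision training with dynamic loss scaling. A minimal sketch, assuming check_inf_or_nan_grad() returns a boolean as its name suggests; the loss-scaling policy below is an illustrative choice, not part of the API:

import nnabla as nn
import nnabla.solvers as S

solver = S.Sgd(lr=1e-3)
solver.set_parameters(nn.get_parameters())
loss_scale = 8.0  # hypothetical dynamic loss scale

loss.forward()
solver.zero_grad()
loss.backward()
if solver.check_inf_or_nan_grad():
    loss_scale /= 2.0  # skip this update and shrink the scale
else:
    solver.scale_grad(1.0 / loss_scale)  # undo loss scaling on gradients
    solver.update()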
- clear_parameters(self)¶
Clear all registered parameters and states.
- clip_grad_by_norm(self, float clip_norm, pre_hook=None, post_hook=None)¶
Clip gradients by norm. When called, the gradient will be clipped by the given norm.
- Parameters
clip_norm (float) – The value of clipping norm.
- get_parameters(self)¶
Get all registered parameters.
- get_states(self)¶
Get all solver states.
- info¶
Solver information.
- Type
object
- learning_rate(self)¶
Get the learning rate.
- load_states(self, path)¶
Load solver states.
- Parameters
path – path to the state file to be loaded.
- name¶
Get the name of the solver.
- remove_parameters(self, vector[string] keys)¶
Remove previously registered parameters specified by a vector of their keys.
- save_states(self, path)¶
Save solver states.
- Parameters
path – path to the file or a file object to save the states to.
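Solver states can be checkpointed together with the network parameters and restored later to resume training. A minimal sketch, with illustrative file names:

import nnabla as nn

# Save a checkpoint (file names are illustrative)
nn.save_parameters("params_000100.h5")
solver.save_states("states_000100.h5")

# ... later, resume: register parameters first, then load states
nn.load_parameters("params_000100.h5")
solver.set_parameters(nn.get_parameters())
solver.load_states("states_000100.h5")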
- scale_grad(self, scale, pre_hook=None, post_hook=None)¶
Rescale all gradients by the given factor scale.
- set_learning_rate(self, learning_rate)¶
Set the learning rate.
- set_parameters(self, param_dict, bool reset=True, bool retain_state=False)¶
Set parameters by a dictionary of keys and parameter Variables.
- Parameters
param_dict (dict) – key: string, value: Variable.
reset (bool) – If true, clear all parameters before setting. If false, parameters are overwritten, or added if new.
retain_state (bool) – Only considered if reset is false. If true and a key already exists (overwriting), a state (such as momentum) associated with the key is kept, provided that the shapes of the existing and the new parameter match.
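For example, when layers are added to a network during training, re-registering with reset=False and retain_state=True keeps the optimizer state (such as momentum) of the parameters that were already registered. A minimal sketch:

import nnabla as nn
import nnabla.solvers as S

solver = S.Adam(alpha=1e-3)
solver.set_parameters(nn.get_parameters())

# ... after extending the graph with new layers ...
# Re-register: new parameters are added, existing ones keep their state.
solver.set_parameters(nn.get_parameters(), reset=False, retain_state=True)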
- set_states(self, states)¶
Set states. Call set_parameters to initialize the states of a solver first; otherwise this method raises a ValueError.
- set_states_from_protobuf(self, optimizer_proto)¶
Set the solver states from the protobuf file.
Internally used helper method.
- set_states_to_protobuf(self, optimizer)¶
Write the solver states to the protobuf file.
Internally used helper method.
- setup(self, params)¶
Deprecated. Call set_parameters with param_dict instead.
- update(self, update_pre_hook=None, update_post_hook=None)¶
When this function is called, parameter values are updated using the gradients accumulated during backpropagation, which are stored in the grad field of each parameter Variable. Update rules are implemented in the C++ core, in classes derived from Solver. The updated values are written back into the data field of each parameter Variable.
- Parameters
update_pre_hook (callable) – This callable object is called immediately before each update of parameters. The default is None.
update_post_hook (callable) – This callable object is called immediately after each update of parameters. The default is None.
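The hooks allow, for instance, lightweight timing around the update step. A minimal sketch, assuming the hooks are called with no arguments:

import time

t0 = []

def pre():
    t0.append(time.time())

def post():
    print("update took {:.6f} s".format(time.time() - t0.pop()))

solver.update(update_pre_hook=pre, update_post_hook=post)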
- weight_decay(self, float decay_rate, pre_hook=None, post_hook=None)¶
Apply weight decay to gradients.
When called, the current parameter value multiplied by decay_rate is added to each gradient.
- Parameters
decay_rate (float) – The coefficient of weight decay.
Note
In solvers for which weight_decay_is_fused() returns true, the weight decay is not performed immediately when this method is called. Instead, the specified decay_rate is stored in the solver instance and lazily evaluated when update() is called. The stored decay rate expires after update() and reverts to 0, or to a default value specified at initialization of the Solver class if one exists (e.g. SgdW). The definition of the weight decay operation depends on the solver class; please refer to the documentation of each solver class.
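The distinction affects how you order the calls. A minimal sketch contrasting the two behaviors, assuming Sgd reports non-fused decay while SgdW reports fused decay:

import nnabla.solvers as S

sgd = S.Sgd(lr=1e-3)
sgd.weight_decay(1e-4)  # applied to gradients immediately
sgd.update()

sgdw = S.SgdW(lr=1e-3, momentum=0.9, wd=1e-4)
sgdw.weight_decay(1e-3)  # stored, applied lazily inside update()
sgdw.update()            # after this, the rate reverts to the default wd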
- weight_decay_is_fused(self)¶
Returns a boolean indicating whether weight decay is fused into update() and hence lazily evaluated. See weight_decay() for more details.
- zero_grad(self)¶
Initialize the gradients of all registered parameters to zero.
List of solvers¶
- nnabla.solvers.Sgd(lr=0.001)¶
Stochastic gradient descent (SGD) optimizer.
\[w_{t+1} \leftarrow w_t - \eta \Delta w_t\]
- Parameters
lr (float) – Learning rate (\(\eta\)).
- Returns
An instance of Solver class. See Solver API guide for details.
- Return type
Solver
- nnabla.solvers.Momentum(lr=0.001, momentum=0.9)¶
SGD with Momentum.
\[\begin{split}v_t &\leftarrow \gamma v_{t-1} + \eta \Delta w_t\\ w_{t+1} &\leftarrow w_t - v_t\end{split}\]
- Parameters
lr (float) – Learning rate (\(\eta\)).
momentum (float) – Decay rate of momentum (\(\gamma\)).
- Returns
An instance of Solver class. See Solver API guide for details.
- Return type
Solver
Note
You can instantiate a preferred target implementation (ex. CUDA) of a Solver given a Context. A Context can be set by nnabla.set_default_context(ctx) or nnabla.context_scope(ctx). See API docs.
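For example, to run solvers on a GPU via the CUDA/cuDNN extension, a minimal sketch assuming the nnabla-ext-cuda package is installed:

import nnabla as nn
import nnabla.solvers as S
from nnabla.ext_utils import get_extension_context

ctx = get_extension_context('cudnn', device_id='0')
nn.set_default_context(ctx)

solver = S.Momentum(lr=0.01, momentum=0.9)  # instantiated for the cuDNN context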
- nnabla.solvers.Lars(lr=0.001, momentum=0.9, coefficient=0.001, eps=1e-06)¶
LARS with Momentum.
\[\begin{split}\lambda &= \eta \frac{\| w_t \|}{\| g_t \| + \| \beta w_t \|} \\ v_{t+1} &\leftarrow m v_t + \gamma_t \lambda (g_t + \beta w_t) \\ w_{t+1} &\leftarrow w_t - v_{t+1}\end{split}\]
where \(g_t\) denotes a gradient, \(\beta\) is the decoupled weight decay rate set by the weight_decay() method (lazy evaluation), \(v_0 \leftarrow 0\), and the rest is described in the argument documentation.
- Parameters
lr (float) – Learning rate (\(\gamma_t\)).
momentum (float) – Momentum coefficient (\(m\)).
coefficient (float) – Trust coefficient (\(\eta\)).
eps (float) – Small value for avoiding zero division (\(\epsilon\)).
- Returns
An instance of Solver class. See Solver API guide for details.
- Return type
Solver
Note
You can instantiate a preferred target implementation (ex. CUDA) of a Solver given a Context. A Context can be set by nnabla.set_default_context(ctx) or nnabla.context_scope(ctx). See API docs.
- nnabla.solvers.Nesterov(lr=0.001, momentum=0.9)¶
Nesterov Accelerated Gradient optimizer.
\[\begin{split}v_t &\leftarrow \gamma v_{t-1} - \eta \Delta w_t\\ w_{t+1} &\leftarrow w_t - \gamma v_{t-1} + \left(1 + \gamma \right) v_t\end{split}\]
- Parameters
lr (float) – Learning rate (\(\eta\)).
momentum (float) – Decay rate of momentum (\(\gamma\)).
- Returns
An instance of Solver class. See Solver API guide for details.
- Return type
Solver
Note
You can instantiate a preferred target implementation (ex. CUDA) of a Solver given a Context. A Context can be set by nnabla.set_default_context(ctx) or nnabla.context_scope(ctx). See API docs.
References
Yurii Nesterov. A method for unconstrained convex minimization problem with the rate of convergence \(O(1/k^2)\).
- nnabla.solvers.Adadelta(lr=1.0, decay=0.95, eps=1e-06)¶
AdaDelta optimizer.
\[\begin{split}g_t &\leftarrow \Delta w_t\\ v_t &\leftarrow - \frac{RMS \left[ v \right]_{t-1}}{RMS \left[ g \right]_t} g_t\\ w_{t+1} &\leftarrow w_t + \eta v_t\end{split}\]
- Parameters
lr (float) – Learning rate (\(\eta\)).
decay (float) – Decay rate of the moving averages used in the RMS terms.
eps (float) – Small value for avoiding zero division (\(\epsilon\)).
- Returns
An instance of Solver class. See Solver API guide for details.
- Return type
Solver
Note
You can instantiate a preferred target implementation (ex. CUDA) of a Solver given a Context. A Context can be set by nnabla.set_default_context(ctx) or nnabla.context_scope(ctx). See API docs.
- nnabla.solvers.Adagrad(lr=0.01, eps=1e-08)¶
ADAGrad optimizer.
\[\begin{split}g_t &\leftarrow \Delta w_t\\ G_t &\leftarrow G_{t-1} + g_t^2\\ w_{t+1} &\leftarrow w_t - \frac{\eta}{\sqrt{G_t} + \epsilon} g_t\end{split}\]
- Parameters
lr (float) – Learning rate (\(\eta\)).
eps (float) – Small value for avoiding zero division (\(\epsilon\)).
- Returns
An instance of Solver class. See Solver API guide for details.
- Return type
Solver
Note
You can instantiate a preferred target implementation (ex. CUDA) of a Solver given a Context. A Context can be set by nnabla.set_default_context(ctx) or nnabla.context_scope(ctx). See API docs.
- nnabla.solvers.AdaBelief(alpha=0.001, beta1=0.9, beta2=0.999, eps=1e-08, wd=0.0, amsgrad=False, weight_decouple=False, fixed_decay=False, rectify=False)¶
AdaBelief optimizer.
\[\begin{split}m_t &\leftarrow \beta_1 m_{t-1} + (1 - \beta_1) g_t\\ s_t &\leftarrow \beta_2 s_{t-1} + (1 - \beta_2) (g_t - m_t)^2\\ w_{t+1} &\leftarrow w_t - \alpha \frac{\sqrt{1 - \beta_2^t}}{1 - \beta_1^t} \frac{m_t}{\sqrt{s_t + \epsilon} + \epsilon}\end{split}\]
- Parameters
alpha (float) – Step size (\(\alpha\)).
beta1 (float) – Decay rate of first-order momentum (\(\beta_1\)).
beta2 (float) – Decay rate of second-order momentum (\(\beta_2\)).
eps (float) – Small value for avoiding zero division (\(\epsilon\)).
wd (float) – The default weight decay rate, enabled only when weight_decouple is true. If enabled, the weight decay operation is decoupled and fused into the update operation. This default decay rate is used unless you overwrite it via weight_decay() before the next call of update().
amsgrad (bool) – Perform the AMSGrad variant of AdaBelief.
weight_decouple (bool) – Whether to perform decoupled weight decay as in AdamW.
fixed_decay (bool) – If True, the weight decay ratio is kept fixed. Note that this option only takes effect when the weight_decouple option is enabled.
rectify (bool) – Perform RAdam variant of AdaBelief.
- Returns
An instance of Solver class. See Solver API guide for details.
- Return type
Solver
Note
You can instantiate a preferred target implementation (ex. CUDA) of a Solver given a Context. A Context can be set by nnabla.set_default_context(ctx) or nnabla.context_scope(ctx). See API docs.
- nnabla.solvers.RMSprop(lr=0.001, decay=0.9, eps=1e-08)¶
RMSprop optimizer (Geoffrey Hinton).
\[\begin{split}g_t &\leftarrow \Delta w_t\\ v_t &\leftarrow \gamma v_{t-1} + \left(1 - \gamma \right) g_t^2\\ w_{t+1} &\leftarrow w_t - \eta \frac{g_t}{\sqrt{v_t} + \epsilon}\end{split}\]
- Parameters
lr (float) – Learning rate (\(\eta\)).
decay (float) – Decay rate (\(\gamma\)).
eps (float) – Small value for avoiding zero division (\(\epsilon\)).
- Returns
An instance of Solver class. See Solver API guide for details.
- Return type
Solver
Note
You can instantiate a preferred target implementation (ex. CUDA) of a Solver given a Context. A Context can be set by nnabla.set_default_context(ctx) or nnabla.context_scope(ctx). See API docs.
- nnabla.solvers.RMSpropGraves(lr=0.0001, decay=0.95, momentum=0.9, eps=0.0001)¶
RMSpropGraves optimizer (Alex Graves).
\[\begin{split}n_t &\leftarrow \rho n_{t-1} + \left(1 - \rho \right) {e_t}^2\\ g_t &\leftarrow \rho g_{t-1} + \left(1 - \rho \right) e_t\\ d_t &\leftarrow \beta d_{t-1} - \eta \frac{e_t}{\sqrt{n_t - {g_t}^2 + \epsilon}}\\ w_{t+1} &\leftarrow w_t + d_t\end{split}\]
where \(e_t\) denotes the gradient.
- Parameters
lr (float) – Learning rate (\(\eta\)).
decay (float) – Decay rate (\(\rho\)).
momentum (float) – Momentum (\(\beta\)).
eps (float) – Small value for avoiding zero division (\(\epsilon\)).
- Returns
An instance of Solver class. See Solver API guide for details.
- Return type
Solver
Note
You can instantiate a preferred target implementation (ex. CUDA) of a Solver given a Context. A Context can be set by nnabla.set_default_context(ctx) or nnabla.context_scope(ctx). See API docs.
- nnabla.solvers.Adam(alpha=0.001, beta1=0.9, beta2=0.999, eps=1e-08)¶
ADAM optimizer.
\[\begin{split}m_t &\leftarrow \beta_1 m_{t-1} + (1 - \beta_1) g_t\\ v_t &\leftarrow \beta_2 v_{t-1} + (1 - \beta_2) g_t^2\\ w_{t+1} &\leftarrow w_t - \alpha \frac{\sqrt{1 - \beta_2^t}}{1 - \beta_1^t} \frac{m_t}{\sqrt{v_t} + \epsilon}\end{split}\]
where \(g_t\) denotes a gradient, and let \(m_0 \leftarrow 0\) and \(v_0 \leftarrow 0\).
- Parameters
alpha (float) – Step size (\(\alpha\)).
beta1 (float) – Decay rate of first-order momentum (\(\beta_1\)).
beta2 (float) – Decay rate of second-order momentum (\(\beta_2\)).
eps (float) – Small value for avoiding zero division (\(\epsilon\)).
- Returns
An instance of Solver class. See Solver API guide for details.
- Return type
Solver
Note
You can instantiate a preferred target implementation (ex. CUDA) of a Solver given a Context. A Context can be set by nnabla.set_default_context(ctx) or nnabla.context_scope(ctx). See API docs.
- nnabla.solvers.AdaBound(alpha=0.001, beta1=0.9, beta2=0.999, eps=1e-08, final_lr=0.1, gamma=0.001)¶
AdaBound optimizer applies dynamic bounds on learning rates to Adam.
\[\begin{split}w_{t+1} &\leftarrow w_t - \eta_t * m_t\\ \eta_t &= clip\left( \alpha\frac{\sqrt{1 - \beta_2^t}}{(1 - \beta_1^t)(\sqrt{v_t} + \epsilon)}, \eta_l(t), \eta_u(t)\right)\\ \eta_l(t) &= \left(1 - \frac{1}{(1-\gamma)t+1}\right)\alpha^*\\ \eta_u(t) &= \left(1 + \frac{1}{(1-\gamma)t}\right)\alpha^*\end{split}\]
where \(\alpha^*\) (final_lr) is scaled by a factor defined as the current value of \(\alpha\) (set by set_learning_rate(lr)) over the initial value of \(\alpha\), so that learning rate scheduling is properly applied to both \(\alpha\) and \(\alpha^*\).
- Parameters
alpha (float) – Step size (\(\alpha\)).
beta1 (float) – Decay rate of first-order momentum (\(\beta_1\)).
beta2 (float) – Decay rate of second-order momentum (\(\beta_2\)).
eps (float) – Small value for avoiding zero division (\(\epsilon\)).
final_lr (float) – Final (SGD) learning rate.
gamma (float) – Convergence speed of the bound functions.
- Returns
An instance of Solver class. See Solver API guide for details.
- Return type
Solver
Note
You can instantiate a preferred target implementation (ex. CUDA) of a Solver given a Context. A Context can be set by nnabla.set_default_context(ctx) or nnabla.context_scope(ctx). See API docs.
- nnabla.solvers.Adamax(alpha=0.002, beta1=0.9, beta2=0.999, eps=1e-08)¶
ADAMAX Optimizer.
\[\begin{split}m_t &\leftarrow \beta_1 m_{t-1} + (1 - \beta_1) g_t\\ v_t &\leftarrow \max\left(\beta_2 v_{t-1}, |g_t|\right)\\ w_{t+1} &\leftarrow w_t - \alpha \frac{\sqrt{1 - \beta_2^t}}{1 - \beta_1^t} \frac{m_t}{v_t + \epsilon}\end{split}\]
where \(g_t\) denotes a gradient, \(m_0 \leftarrow 0\) and \(v_0 \leftarrow 0\), and \(v_t\) is an exponentially weighted infinity norm of the sequence of gradients up to time \(t\).
- Parameters
alpha (float) – Step size (\(\alpha\)).
beta1 (float) – Decay rate of first-order momentum (\(\beta_1\)).
beta2 (float) – Decay rate of the exponentially weighted infinity norm (\(\beta_2\)).
eps (float) – Small value for avoiding zero division (\(\epsilon\)).
- Returns
An instance of Solver class. See Solver API guide for details.
- Return type
Solver
Note
You can instantiate a preferred target implementation (ex. CUDA) of a Solver given a Context. A Context can be set by nnabla.set_default_context(ctx) or nnabla.context_scope(ctx). See API docs.
- nnabla.solvers.AMSGRAD(alpha=0.001, beta1=0.9, beta2=0.999, eps=1e-08, bias_correction=False)¶
AMSGRAD optimizer.
\[\begin{split}m_t &\leftarrow \beta_1 m_{t-1} + (1 - \beta_1) g_t\\ v_t &\leftarrow \beta_2 v_{t-1} + (1 - \beta_2) g_t^2\\ \hat{v}_t &= \max(\hat{v}_{t-1}, v_t)\\ w_{t+1} &\leftarrow w_t - \alpha \frac{m_t}{\sqrt{\hat{v}_t} + \epsilon}\end{split}\]
where \(g_t\) denotes a gradient, and let \(m_0 \leftarrow 0\) and \(v_0 \leftarrow 0\).
- Parameters
alpha (float) – Step size (\(\alpha\)).
beta1 (float) – Decay rate of first-order momentum (\(\beta_1\)).
beta2 (float) – Decay rate of second-order momentum (\(\beta_2\)).
eps (float) – Small value for avoiding zero division (\(\epsilon\)). Note this does not appear in the paper.
bias_correction (bool) – Apply bias correction to moving averages defined in ADAM. Note this does not appear in the paper.
- Returns
An instance of Solver class. See Solver API guide for details.
- Return type
Solver
Note
You can instantiate a preferred target implementation (ex. CUDA) of a Solver given a Context. A Context can be set by nnabla.set_default_context(ctx) or nnabla.context_scope(ctx). See API docs.
- nnabla.solvers.AMSBound(alpha=0.001, beta1=0.9, beta2=0.999, eps=1e-08, final_lr=0.1, gamma=0.001, bias_correction=False)¶
AMSBound optimizer applies dynamic bounds on learning rates to AMSGrad.
\[\begin{split}w_{t+1} &\leftarrow w_t - \eta_t * m_t\\ \eta_t &= clip\left( \alpha\frac{\sqrt{1 - \beta_2^t}}{(1 - \beta_1^t)(\sqrt{\hat{v}_t} + \epsilon)}, \eta_l(t), \eta_u(t)\right)\\ \hat{v}_t &= \max(\hat{v}_{t-1}, v_t)\\ \eta_l(t) &= \left(1 - \frac{1}{(1-\gamma)t+1}\right)\alpha^*\\ \eta_u(t) &= \left(1 + \frac{1}{(1-\gamma)t}\right)\alpha^*\end{split}\]
where \(\alpha^*\) (final_lr) is scaled by a factor defined as the current value of \(\alpha\) (set by set_learning_rate(lr)) over the initial value of \(\alpha\), so that learning rate scheduling is properly applied to both \(\alpha\) and \(\alpha^*\).
- Parameters
alpha (float) – Step size (\(\alpha\)).
beta1 (float) – Decay rate of first-order momentum (\(\beta_1\)).
beta2 (float) – Decay rate of second-order momentum (\(\beta_2\)).
eps (float) – Small value for avoiding zero division (\(\epsilon\)). Note this does not appear in the paper.
final_lr (float) – Final (SGD) learning rate.
gamma (float) – Convergence speed of the bound functions.
bias_correction (bool) – Apply bias correction to moving averages defined in ADAM. Note this does not appear in the paper.
- Returns
An instance of Solver class. See Solver API guide for details.
- Return type
Solver
Note
You can instantiate a preferred target implementation (ex. CUDA) of a Solver given a Context. A Context can be set by nnabla.set_default_context(ctx) or nnabla.context_scope(ctx). See API docs.
- nnabla.solvers.AdamW(alpha=0.001, beta1=0.9, beta2=0.999, eps=1e-08, wd=0.0001)¶
ADAM optimizer with decoupled weight decay.
\[\begin{split}m_t &\leftarrow \beta_1 m_{t-1} + (1 - \beta_1) g_t\\ v_t &\leftarrow \beta_2 v_{t-1} + (1 - \beta_2) g_t^2\\ \hat{m} &= m_t / (1-\beta_1^t)\\ \hat{v} &= v_t / (1-\beta_2^t)\\ w_{t} &\leftarrow w_{t-1} - \eta_t \left( \alpha \frac{\hat{m}}{\sqrt{\hat{v}} + \epsilon} + \lambda w_{t-1} \right)\end{split}\]
where \(g_t\) denotes a gradient, \(m_t\) and \(v_t\) are the 1st- and 2nd-order momentum of the gradient initialized with 0 at \(t=0\), \(\eta_t\) is the scheduled learning rate, \(\lambda\) is the decoupled weight decay rate set by the weight_decay() method (lazy evaluation), and the rest is described in the argument documentation.
- Parameters
alpha (float) – Initial learning rate (\(\alpha\)). Note that you have to manage the scheduled learning rate \(\eta_t\) yourself. Denoting the learning rate set via set_learning_rate() by \(\alpha_t\), we define \(\eta_t = \frac{\alpha_t}{\alpha}\) (see the sketch after this entry).
beta1 (float) – Decay rate of first-order momentum (\(\beta_1\)).
beta2 (float) – Decay rate of second-order momentum (\(\beta_2\)).
eps (float) – Small value for avoiding zero division (\(\epsilon\)).
wd (float) – The default weight decay rate (\(\lambda\)). The weight decay operation is fused into the update operation in this solver. This default decay rate is used unless you overwrite it via weight_decay() before the next call of update().
- Returns
An instance of Solver class. See Solver API guide for details.
- Return type
Solver
Note
You can instantiate a preferred target implementation (ex. CUDA) of a Solver given a Context. A Context can be set by nnabla.set_default_context(ctx) or nnabla.context_scope(ctx). See API docs.
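A minimal sketch of driving \(\eta_t\) through set_learning_rate(); the cosine schedule below is an illustrative assumption, not part of the API:

import math
import nnabla as nn
import nnabla.solvers as S

alpha0 = 1e-3
solver = S.AdamW(alpha=alpha0, wd=1e-4)
solver.set_parameters(nn.get_parameters())

for itr in range(num_itr):
    # alpha_t follows a cosine decay; AdamW derives eta_t = alpha_t / alpha0.
    alpha_t = alpha0 * 0.5 * (1.0 + math.cos(math.pi * itr / num_itr))
    solver.set_learning_rate(alpha_t)
    loss.forward()
    solver.zero_grad()
    loss.backward()
    solver.update()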
- nnabla.solvers.SgdW(lr=0.001, momentum=0.9, wd=0.0001)¶
Momentum stochastic gradient descent (SGD) optimizer with decoupled weight decay.
\[\begin{split}m_{t} &\leftarrow \gamma m_{t-1} + \eta_t \alpha g_t\\ w_{t} &\leftarrow w_{t-1} - m_{t} - \eta_t \lambda w_{t-1}\end{split}\]
where \(g_t\) denotes a gradient, \(m_t\) is the momentum of the gradient initialized with 0 at \(t=0\), \(\eta_t\) is the scheduled learning rate, \(\lambda\) is the decoupled weight decay rate set by the weight_decay() method (lazy evaluation), and the rest is described in the argument documentation.
- Parameters
lr (float) – Initial learning rate (\(\alpha\)). Note that you have to manage the scheduled learning rate \(\eta_t\) yourself. Denoting the learning rate set via set_learning_rate() by \(\alpha_t\), we define \(\eta_t = \frac{\alpha_t}{\alpha}\).
momentum (float) – Decay rate of momentum (\(\gamma\)).
wd (float) – The default weight decay rate (\(\lambda\)). The weight decay operation is fused into the update operation in SgdW. This default decay rate is used unless you overwrite it via weight_decay() before the next call of update().
- Returns
An instance of Solver class. See Solver API guide for details.
- Return type
Solver
Note
You can instantiate a preferred target implementation (ex. CUDA) of a Solver given a Context. A Context can be set by nnabla.set_default_context(ctx) or nnabla.context_scope(ctx). See API docs.
- nnabla.solvers.Lamb(eta=0.001, beta1=0.9, beta2=0.999, gamma_l=0.0, gamma_u=10.0, eps=1e-06, bias_correction=False)¶
LAMB optimizer.
\[\begin{split}m_t &\leftarrow \beta_1 m_{t-1} + (1 - \beta_1) g_t\\ v_t &\leftarrow \beta_2 v_{t-1} + (1 - \beta_2) g_t^2\\ \hat{m} &= m_t / (1-\beta_1^t)\\ \hat{v} &= v_t / (1-\beta_2^t)\\ r &= \frac{\hat{m}}{\sqrt{\hat{v}}+\epsilon}\\ w_t &\leftarrow w_{t-1} - \eta_t \frac{\phi (\|w_{t-1}\|)}{\|r + \lambda w_{t-1} \|} \left(r + \lambda w_{t-1} \right)\end{split}\]
where \(g_t\) denotes a gradient, \(m_t\) and \(v_t\) are the 1st- and 2nd-order momentum of the gradient initialized with 0 at \(t=0\), \(\lambda\) is the decoupled weight decay rate set by the weight_decay() method (lazy evaluation), \(\phi\) is a scaling function defined as \(\phi(z)=\min\{\max\{z, \gamma_l\}, \gamma_u\}\), and the rest is described in the arguments.
- Parameters
eta (float) – Learning rate (\(\eta_t\)).
beta1 (float) – Decay rate of first-order momentum (\(\beta_1\)).
beta2 (float) – Decay rate of second-order momentum (\(\beta_2\)).
gamma_l (float) – Lower bound of the clamp scaling function \(\phi\) (\(\gamma_l\)).
gamma_u (float) – Upper bound of the clamp scaling function \(\phi\) (\(\gamma_u\)).
eps (float) – Small value for avoiding zero division (\(\epsilon\)).
bias_correction (bool) – Whether to apply bias correction in momentum computation \(\hat{m}\) and \(\hat{v}\).
- Returns
An instance of Solver class. See Solver API guide for details.
- Return type
Solver
Note
You can instantiate a preferred target implementation (ex. CUDA) of a Solver given a Context. A Context can be set by nnabla.set_default_context(ctx) or nnabla.context_scope(ctx). See API docs.