NNabla is designed so that developers can easily add new device extensions. The NNabla Python package officially supports the cpu, cuda, and cudnn extensions. The cuda and cudnn extensions can dramatically accelerate computation by leveraging NVIDIA CUDA GPUs with cuDNN computation primitives.

You can manually import extensions by:

import nnabla_ext.cudnn

See the Python package installation guide to install the CUDA extension.

Utilities for extension

Utilities for NNabla extensions.


nnabla.ext_utils.list_extensions()

List available extensions.


Note: This may not work on some platforms or environments, since it depends on the directory structure of the namespace packages.

Returns: list of str

Names of available extensions.
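Because list_extensions() reports only what is installed, a script can pick the fastest available backend at startup. The following is a minimal sketch assuming a preference order of cudnn, then cuda, then cpu; the helpers pick_extension and detect_extension are hypothetical and not part of NNabla. If NNabla cannot be imported, or list_extensions() fails on the current platform, the sketch falls back to 'cpu'.

```python
def pick_extension(available, preferred=('cudnn', 'cuda', 'cpu')):
    """Return the first name in `preferred` that appears in `available`.

    Hypothetical helper, not part of nnabla.ext_utils. Falls back to 'cpu'.
    """
    for name in preferred:
        if name in available:
            return name
    return 'cpu'


def detect_extension():
    """Ask NNabla which extensions are installed, if NNabla is importable."""
    try:
        from nnabla.ext_utils import list_extensions
    except ImportError:
        # NNabla is not installed in this environment.
        return 'cpu'
    try:
        # list_extensions() depends on the namespace-package directory layout
        # and may fail on some platforms, so treat failure as CPU-only.
        return pick_extension(list_extensions())
    except Exception:
        return 'cpu'


print(detect_extension())
```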


nnabla.ext_utils.import_extension_module(ext_name)

Import an extension module by name.

The extension modules are installed under the nnabla_ext package as namespace packages. All extension modules provide a unified set of APIs.


Parameters: ext_name (str) – Extension name, e.g. ‘cpu’, ‘cuda’, or ‘cudnn’.

Returns: module

A Python module of a particular NNabla extension.


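Example:

from nnabla.ext_utils import import_extension_module
# Load the cuDNN extension module and query the GPUs it can see.
ext = import_extension_module('cudnn')
available_devices = ext.get_devices()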
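Since extensions are installed as namespace packages under nnabla_ext, looking one up by name amounts to a relative import under that package. The sketch below illustrates the idea with the standard importlib module; import_submodule is a hypothetical helper, not NNabla's actual implementation. Because nnabla_ext may not be installed, the demonstration resolves a standard-library submodule instead.

```python
import importlib


def import_submodule(package, name):
    """Import `package.name` given the submodule name as a string.

    Hypothetical helper illustrating name-based lookup; for NNabla the
    package would be 'nnabla_ext' and name, e.g., 'cudnn'.
    """
    try:
        # A leading '.' makes the import relative to `package`.
        return importlib.import_module('.' + name, package)
    except ImportError as err:
        # Re-raise with a message naming the missing extension.
        raise ImportError(
            'Extension `{}` does not exist under `{}`.'.format(name, package)
        ) from err


# Demonstration with the standard library: resolves json.decoder.
decoder_module = import_submodule('json', 'decoder')
print(decoder_module.__name__)  # json.decoder
```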

nnabla.ext_utils.get_extension_context(ext_name, **kw)

Get the context of the specified extension.

Every extension module must provide a context(**kw) function.

Parameters:

  • ext_name (str) – Module path relative to nnabla_ext.

  • kw (dict) – Additional keyword arguments for the context function in an extension module.


Returns: The current extension context.

Return type: nnabla.Context


Example:

from nnabla.ext_utils import get_extension_context
ctx = get_extension_context('cudnn', device_id='0', type_config='half')
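Since every extension module must provide context(**kw), get_extension_context can be thought of as "import the module, then forward the keyword arguments to its context function." The following is a minimal sketch of that dispatch using a stand-in module built with types.ModuleType rather than a real extension. The names get_context_from and fake_ext, and the dictionary returned in place of an NNabla context, are hypothetical.

```python
import types


def get_context_from(module, **kw):
    """Forward keyword arguments to the extension module's context() function."""
    if not callable(getattr(module, 'context', None)):
        raise AttributeError('Extension module must provide context(**kw).')
    return module.context(**kw)


# Hypothetical stand-in for an extension module such as nnabla_ext.cudnn.
fake_ext = types.ModuleType('fake_ext')


def _fake_context(device_id='0', type_config='float'):
    # A real extension returns an NNabla context; a dict stands in here.
    return {'device_id': device_id, 'type_config': type_config}


fake_ext.context = _fake_context

ctx = get_context_from(fake_ext, device_id='0', type_config='half')
print(ctx)  # {'device_id': '0', 'type_config': 'half'}
```

With NNabla installed, a real context is typically passed to nnabla.set_default_context(ctx) so that functions created afterwards run on that extension.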

APIs of extension modules

All extension modules must have the following functions.


context(**kw)

Returns a default context descriptor of the extension module. This function takes optional arguments that depend on the extension. For example, the cudnn extension takes device_id to specify the GPU on which computation runs.


synchronize(**kw)

Synchronizes the device execution stream with the host thread. For example, in CUDA, kernel executions are enqueued into a stream and run asynchronously with respect to the host thread. This function is meaningful only on devices that use such features. In the CPU implementation, it is a dummy function, so calls to it have no effect. In the cudnn extension, it takes device_id as an optional argument specifying the device to synchronize with.
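Because device work may be queued asynchronously, host-side timing should be read only after calling the extension's synchronize(); otherwise the measurement may cover only the time needed to enqueue the work. The sketch below shows that pattern with a hypothetical timed helper and a stand-in extension whose synchronize is a no-op, mirroring the CPU behavior described above. With a real GPU extension, ext would come from import_extension_module('cudnn') and device_id would be forwarded to synchronize; both the helper and the stand-in are assumptions for illustration.

```python
import time
import types


def timed(ext, fn, *args, **sync_kw):
    """Run fn(*args) and return (result, seconds), synchronizing around it.

    Hypothetical helper. sync_kw (e.g. device_id='0') is forwarded to
    ext.synchronize().
    """
    ext.synchronize(**sync_kw)  # Drain any previously queued device work.
    start = time.perf_counter()
    result = fn(*args)
    ext.synchronize(**sync_kw)  # Wait until work queued by fn has finished.
    return result, time.perf_counter() - start


# Stand-in extension: synchronize is a dummy, as in the CPU implementation.
cpu_like = types.ModuleType('cpu_like')
cpu_like.synchronize = lambda **kw: None

value, seconds = timed(cpu_like, sum, range(1000))
print(value, seconds >= 0.0)  # 499500 True
```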