Out-of-core execution
The nnabla.lms package provides APIs that allow users to execute networks larger than the allotted GPU memory by utilizing an out-of-core algorithm.
An out-of-core algorithm, also called an external memory algorithm, is an algorithm that can process data too large to fit into main memory at once.
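As a minimal, nnabla-independent illustration of the out-of-core idea, the sketch below processes a file in fixed-size chunks so that only one chunk resides in memory at a time (the file name and chunk size are arbitrary for this sketch):

```python
import os
import tempfile

def out_of_core_sum(path, chunk_size=1 << 20):
    """Sum the bytes of a file without loading it into memory at once."""
    total = 0
    with open(path, "rb") as f:
        while True:
            chunk = f.read(chunk_size)  # Only one chunk is in memory at a time.
            if not chunk:
                break
            total += sum(chunk)
    return total

# Usage: write a small file and sum it chunk by chunk.
fd, path = tempfile.mkstemp()
with os.fdopen(fd, "wb") as f:
    f.write(bytes(range(256)) * 10)
print(out_of_core_sum(path, chunk_size=64))  # 326400, same as loading it whole
os.remove(path)
```

SwapInOutScheduler applies the same principle to network training: tensors are moved between host and GPU memory so that only the working set of the currently executing functions needs to fit on the device.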
SwapInOutScheduler
- class nnabla.lms.SwapInOutScheduler
Interface class for out-of-core execution / training.
This API enables training neural networks whose size is larger than the allotted GPU memory. See https://arxiv.org/abs/2010.14109 for details of the scheduling strategy.
Note
cuda_init.prefer_cuda_virtual_array() used in the following example requires cuda >= 10.2 and cudnn >= 8, since it relies on the virtual memory management introduced in cuda 10.2. Additionally, when we tested virtual memory management with cuda >= 10.2 and cudnn < 8, we found that some cudnn functions computed inaccurate results. Therefore, when your environment has cuda < 10.2 or cudnn < 8, the virtual memory allocator in nnabla is not built and cannot be used. To use SwapInOutScheduler to the fullest extent, please install cuda >= 10.2 and cudnn >= 8 and reinstall the corresponding nnabla-ext-cuda package.

Example:
import nnabla as nn
import nnabla.solvers as S
from nnabla.lms import SwapInOutScheduler

# Change the memory allocators to ones preferable for SwapInOutScheduler.
import nnabla_ext.cuda.init as cuda_init
cuda_init.prefer_cpu_pinned_array()  # Pinned host memory accelerates cpu-gpu memory transfer.

# Only for cuda >= 10.2 and cudnn >= 8. This setting is the best for SwapInOutScheduler.
cuda_init.prefer_cuda_virtual_array()  # The virtual allocator reduces GPU memory fragmentation caused by cpu-gpu memory transfers.

# Create contexts for both host and device.
from nnabla.ext_utils import get_extension_context
host_ctx = get_extension_context("cpu", device_id="", type_config="float")  # device_id is dummy
device_ctx = get_extension_context("cudnn", device_id="0", type_config="float")

scheduler = SwapInOutScheduler(host_ctx, device_ctx, size=max_gpu_memory_size)

# Make sure to call `nn.set_default_context` after calling prefer_xxx_array()
# to activate the change of memory preference.
nn.set_default_context(device_ctx)

x = nn.Variable(...)
loss = build_network(x)
solver = S.Sgd()
solver.set_parameters(nn.get_parameters())

for i in range(iteration):
    # Schedule memory transfers for all tensors appearing under the scheduler context.
    with scheduler:
        x.d = next_data()
        loss.forward(clear_no_need_grad=True)
        solver.zero_grad()
        loss.backward(clear_buffer=True)
        solver.update()
When you get an Out-of-Memory (OOM) error under the SwapInOutScheduler, there are two options to avoid it:
Set a smaller budget of GPU memory for scheduling.
Set a smaller size for the physical memory chunks allocated by the virtual memory allocator.
These are exemplified as follows:
Example:
# 1. Set a smaller budget of GPU memory for scheduling.
# You can reduce the ratio below until your network can be executed.
memsize_for_scheduler = max_gpu_memory_size * 0.8
scheduler = SwapInOutScheduler(..., size=memsize_for_scheduler)

# 2. Set a smaller size for the physical memory chunks allocated by the virtual memory allocator.
# By default, the chunk size is set to 20MB (20 << 20).
from nnabla_ext.cuda.init import set_cuda_virtual_memory_chunk_size
set_cuda_virtual_memory_chunk_size(2 << 20)  # Set 2MB, for example.
- end_scheduling(self)
An interface to specify the end point for scheduling. The range between start_scheduling() and end_scheduling() is the target of a single scheduling. Note that when using the with statement of SwapInOutScheduler, end_scheduling() is called automatically on exiting the with statement. In general, avoid using start_scheduling() and end_scheduling() directly and use the with statement instead (with scheduler:, see the example above).
- function_post_hook(self, func)
A callback executed as function_post_hook in forward and backward. For all forward and backward calls wrapped by the with statement of SwapInOutScheduler, this callback is set automatically. In general, avoid setting this manually and use the with statement of SwapInOutScheduler.
- function_pre_hook(self, func)
A callback executed as function_pre_hook in forward and backward. For all forward and backward calls wrapped by the with statement of SwapInOutScheduler, this callback is set automatically. In general, avoid setting this manually and use the with statement of SwapInOutScheduler.
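How such per-function callbacks interleave with graph execution can be sketched in plain Python (a hypothetical mock, not nnabla's implementation): the pre hook fires before each function and the post hook after it, which is where a scheduler can swap a function's tensors in before it runs and schedule them for swap-out afterwards.

```python
def run_forward(functions, function_pre_hook, function_post_hook):
    """Execute a sequence of graph functions, invoking the hooks
    around each one (a mock of how such callbacks are applied)."""
    for f in functions:
        function_pre_hook(f)   # e.g. swap in the tensors f needs
        f()
        function_post_hook(f)  # e.g. schedule f's tensors for swap-out

trace = []
run_forward([lambda: trace.append("affine"), lambda: trace.append("relu")],
            lambda f: trace.append("pre"),
            lambda f: trace.append("post"))
print(trace)  # ['pre', 'affine', 'post', 'pre', 'relu', 'post']
```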
- start_scheduling(self)
An interface to specify the starting point for scheduling. The range between start_scheduling() and end_scheduling() is the target of a single scheduling. Note that when using the with statement of SwapInOutScheduler, start_scheduling() is called automatically on entering the with statement. In general, avoid using start_scheduling() and end_scheduling() directly and use the with statement instead (with scheduler:, see the example above).
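The relation between the with statement and the explicit calls can be sketched with a plain Python mock (MockScheduler below is hypothetical, showing only that __enter__/__exit__ correspond to start_scheduling()/end_scheduling()):

```python
class MockScheduler:
    """Hypothetical stand-in recording the calls that the with statement
    of SwapInOutScheduler is documented to make automatically."""
    def __init__(self):
        self.calls = []
    def start_scheduling(self):
        self.calls.append("start_scheduling")
    def end_scheduling(self):
        self.calls.append("end_scheduling")
    def __enter__(self):
        self.start_scheduling()  # Called automatically on entering the with block.
        return self
    def __exit__(self, exc_type, exc, tb):
        self.end_scheduling()    # Called automatically on exiting the with block.
        return False

scheduler = MockScheduler()
with scheduler:
    pass  # forward/backward/update would run here
print(scheduler.calls)  # ['start_scheduling', 'end_scheduling']
```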
- update_post_hook(self)
A callback executed as post_hook in all solver functions, e.g. solver.update, solver.weight_decay, solver.clip_grad_by_norm, and so on. For all solver functions wrapped by the with statement of SwapInOutScheduler, this callback is set automatically. In general, avoid setting this manually and use the with statement of SwapInOutScheduler.
- update_pre_hook(self)
A callback executed as pre_hook in all solver functions, e.g. solver.update, solver.weight_decay, solver.clip_grad_by_norm, and so on. For all solver functions wrapped by the with statement of SwapInOutScheduler, this callback is set automatically. In general, avoid setting this manually and use the with statement of SwapInOutScheduler.
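The ordering of these solver hooks can be sketched likewise (a hypothetical mock, not nnabla's API): pre_hook runs just before a solver function and post_hook just after it.

```python
def run_with_hooks(solver_fn, pre_hook, post_hook):
    """Call a solver function wrapped by pre/post hooks, mocking the
    arrangement the scheduler's with statement sets up."""
    pre_hook()    # update_pre_hook: e.g. swap needed tensors into GPU memory
    solver_fn()
    post_hook()   # update_post_hook: e.g. schedule tensors for swap-out

trace = []
run_with_hooks(lambda: trace.append("update"),
               lambda: trace.append("pre"),
               lambda: trace.append("post"))
print(trace)  # ['pre', 'update', 'post']
```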