Semantic Segmentation Models

This subpackage provides a pre-trained state-of-the-art model for semantic segmentation: DeepLabv3+ with an Xception-65 backbone. The backbone is pre-trained on the ImageNet dataset, and the network is fine-tuned on the Pascal VOC dataset or on the MS COCO and Pascal VOC datasets.

The pre-trained models can be used for inference as follows:

# Import required modules
import numpy as np
import nnabla as nn
from nnabla.ext_utils import get_extension_context
from nnabla.utils.image_utils import imread
from nnabla.models.semantic_segmentation import DeepLabV3plus
from nnabla.models.semantic_segmentation.utils import ProcessImage

# Input size expected by the network
target_h = 513
target_w = 513

# Get context: cuDNN on GPU 0 (use get_extension_context('cpu') without a GPU)
nn.set_default_context(get_extension_context('cudnn', device_id='0'))

# Build a DeepLabv3+ network (trained on VOC + COCO, output stride 8)
image = imread("./test.jpg")
x = nn.Variable((1, 3, target_h, target_w), need_grad=False)
deeplabv3 = DeepLabV3plus('voc-coco', output_stride=8)
y = deeplabv3(x)

# Preprocess the image
processed_image = ProcessImage(image, target_h, target_w)
input_array = processed_image.pre_process()

# Compute inference
x.d = input_array
y.forward(clear_buffer=True)
print("done")
# Pick the highest-scoring class for every pixel: (1, C, H, W) -> (1, H, W)
output = np.argmax(y.d, axis=1)

# Apply post-processing
post_processed = processed_image.post_process(output[0])

# Display the names of the predicted classes
predicted_classes = np.unique(post_processed).astype(int)
for c in predicted_classes:
    print('Classes Segmented: ', deeplabv3.category_names[c])

# Save the inference result
processed_image.save_segmentation_image("./output.png")
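The argmax and class-listing steps above can be tried without downloading the pretrained model. The NumPy sketch below uses toy scores and a hypothetical four-entry category_names list (the first four Pascal VOC labels, chosen only for illustration) in place of deeplabv3.category_names. It does not reproduce ProcessImage.post_process, whose internals are not described here.

```python
import numpy as np

# Hypothetical stand-in for deeplabv3.category_names (illustration only).
category_names = ["background", "aeroplane", "bicycle", "bird"]


def logits_to_labels(logits):
    """Convert (N, C, H, W) class scores into an (N, H, W) label map."""
    return np.argmax(logits, axis=1)


def present_classes(label_map, names):
    """Return the names of the classes that occur at least once in a label map."""
    return [names[c] for c in np.unique(label_map).astype(int)]


# Toy scores: batch of 1, 4 classes, 2x3 image.
logits = np.zeros((1, 4, 2, 3), dtype=np.float32)
logits[0, 0] = 1.0         # "background" wins everywhere by default
logits[0, 3, 0, :2] = 2.0  # "bird" wins in the first two pixels of the top row

labels = logits_to_labels(logits)
print(labels[0])
print(present_classes(labels[0], category_names))
```

Because argmax runs over axis 1 (the class axis), each pixel receives the index of its highest-scoring class, and np.unique then lists every class that appears in the image.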

Available models trained on the VOC dataset
Name	Class	Output stride	mIoU (%)	Training framework	Notes
DeepLabv3+	DeepLabV3plus	8	81.48	NNabla	Backbone (Xception-65) weights converted from the authors’ model and used for fine-tuning
DeepLabv3+	DeepLabV3plus	16	82.20	NNabla	Backbone (Xception-65) weights converted from the authors’ model and used for fine-tuning

Available models trained on the VOC and COCO datasets
Name	Class	Output stride	mIoU (%)	Training framework	Notes
DeepLabv3+	DeepLabV3plus	8	82.20	TensorFlow	Weights converted from the authors’ model
DeepLabv3+	DeepLabV3plus	16	83.58	TensorFlow	Weights converted from the authors’ model
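Each row in the two tables corresponds to one combination of the DeepLabV3plus constructor arguments (dataset, output_stride). The sketch below copies the reported mIoU values into a dictionary keyed that way and picks the row with the highest score. REPORTED_MIOU and best_config are hypothetical helpers, not part of nnabla. Accuracy is the only criterion considered here, so speed or memory trade-offs are ignored.

```python
# Reported mIoU values copied from the tables above, keyed by the
# DeepLabV3plus constructor arguments (dataset, output_stride).
REPORTED_MIOU = {
    ("voc", 8): 81.48,
    ("voc", 16): 82.20,
    ("voc-coco", 8): 82.20,
    ("voc-coco", 16): 83.58,
}


def best_config(table):
    """Return the (dataset, output_stride) key with the highest reported mIoU."""
    return max(table, key=table.get)


dataset, output_stride = best_config(REPORTED_MIOU)
print(dataset, output_stride)
# These values could then be passed as DeepLabV3plus(dataset, output_stride=output_stride).
```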

Common interfaces

class nnabla.models.semantic_segmentation.base.SemanticSegmentation

Pretrained semantic segmentation models inherit from this class, which provides their common interfaces.

__call__(input_var=None, use_from=None, use_up_to='segmentation', training=False, returns_net=False, verbose=0)

Create a network (computation graph) from a loaded model.

Parameters:

input_var (Variable, optional) – If given, the input variable is replaced with the given variable and the network is constructed on top of it. Otherwise, a variable with batch size 1 and the default shape from self.input_shape is created.
use_up_to (str) – The network is constructed up to the variable specified by this string. The strings available for each model are listed in the documentation of that model class.
training (bool) – Enables additional training (fine-tuning, transfer learning, etc.) of the constructed network. If True, the batch_stat option of batch normalization is set to True, and the need_grad attribute of trainable variables (convolution weights, batch-normalization gamma and beta, etc.) is set to True. The default is False.
returns_net (bool) – If True, an NnpNetwork object is returned. Otherwise, only the last variable of the constructed network is returned. The default is False.
verbose (bool or int) – Verbosity level. With 0, nothing is printed during network construction.
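The training option switches batch normalization to batch_stat=True. The NumPy sketch below shows what that flag means for the generic batch-normalization formula. With batch_stat=True the mean and variance come from the current mini-batch. With batch_stat=False the stored running statistics are used. This is the textbook formula, not nnabla's implementation, and it omits the running-statistics update that real training also performs. The function name batch_norm is a hypothetical helper.

```python
import numpy as np


def batch_norm(x, gamma, beta, running_mean, running_var, batch_stat, eps=1e-5):
    """Normalize an (N, C, H, W) array per channel.

    batch_stat=True: use the mean/variance of the current mini-batch (training mode).
    batch_stat=False: use the stored running mean/variance (inference mode).
    """
    shape = (1, -1, 1, 1)  # broadcast per-channel values over N, H and W
    if batch_stat:
        mean = x.mean(axis=(0, 2, 3))
        var = x.var(axis=(0, 2, 3))
    else:
        mean, var = running_mean, running_var
    x_hat = (x - mean.reshape(shape)) / np.sqrt(var.reshape(shape) + eps)
    return gamma.reshape(shape) * x_hat + beta.reshape(shape)


rng = np.random.default_rng(0)
x = rng.normal(loc=5.0, scale=2.0, size=(8, 2, 4, 4))
gamma, beta = np.ones(2), np.zeros(2)
running_mean, running_var = np.zeros(2), np.ones(2)

train_out = batch_norm(x, gamma, beta, running_mean, running_var, batch_stat=True)
eval_out = batch_norm(x, gamma, beta, running_mean, running_var, batch_stat=False)
print(train_out.mean(axis=(0, 2, 3)))  # approximately 0 for each channel
print(eval_out.mean(axis=(0, 2, 3)))   # roughly 5, since running stats are 0 and 1
```

The two calls differ only in where the statistics come from, which is why a network built with training=True can produce different outputs from the same input than one built with training=False.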

property input_shape: Should return default image size (channel, height, width) as a tuple.

List of models

class nnabla.models.semantic_segmentation.DeepLabV3plus(dataset='voc', output_stride=16)

DeepLabV3+.

Parameters:

dataset (str) – Name of the training dataset: ‘voc’ or ‘voc-coco’.
output_stride (int) – DeepLabV3 uses atrous (a.k.a. dilated) convolutions, and the atrous rates depend on the output stride, which must be either 8 or 16. The default is 16. With an output_stride of 8 the atrous rates are [12, 24, 36]; with an output_stride of 16 they are [6, 12, 18].
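The relationship between output stride and atrous rates can be written as a small lookup. Halving the output stride doubles the spatial resolution of the backbone features, and the rates are doubled to match. The sketch below also estimates the feature-map size for the 513x513 input used in the example at the top of this page, assuming the (size - 1) // stride + 1 convention for strided convolutions with "same" padding. That convention is an assumption, and both helper names are hypothetical rather than nnabla functions.

```python
def atrous_rates(output_stride):
    """Atrous rates for the given output stride, as listed in the parameter description."""
    rates = {16: (6, 12, 18), 8: (12, 24, 36)}
    if output_stride not in rates:
        raise ValueError("output_stride must be 8 or 16")
    return rates[output_stride]


def feature_map_size(input_size, output_stride):
    """Backbone output size, assuming the (size - 1) // stride + 1 convention."""
    return (input_size - 1) // output_stride + 1


for stride in (16, 8):
    print(stride, atrous_rates(stride), feature_map_size(513, stride))
```

Under these assumptions, an output stride of 8 yields a 65x65 feature map instead of 33x33. That gives the network finer spatial detail at a higher compute cost.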

The following strings can be passed to the use_up_to option of the __call__ method:

'segmentation' (default): The output of the final layer.
'lastconv': The output of the last convolution.
'lastconv+relu': The network up to 'lastconv', followed by a ReLU activation.
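Conceptually, use_up_to is a truncation point. Stages are applied in order, and construction stops at the named variable. The toy sketch below illustrates only that idea. The stage operations are placeholders, not DeepLabV3+ layers, and build_up_to is a hypothetical helper rather than nnabla code. 'segmentation' is left out because the list above does not say where it sits relative to the other two entries. The only documented relationship reproduced here is that 'lastconv+relu' applies ReLU to the 'lastconv' output.

```python
import numpy as np


def build_up_to(stages, x, use_up_to):
    """Apply (name, fn) stages in order and return the output of stage `use_up_to`."""
    names = [name for name, _ in stages]
    if use_up_to not in names:
        raise ValueError(f"unknown use_up_to {use_up_to!r}; expected one of {names}")
    for name, fn in stages:
        x = fn(x)
        if name == use_up_to:
            break
    return x


# Placeholder stages (hypothetical operations, not the real network layers).
stages = [
    ("features", lambda t: t * 2.0),
    ("lastconv", lambda t: t - 3.0),
    ("lastconv+relu", lambda t: np.maximum(t, 0.0)),
]

x = np.array([0.5, 1.0, 2.0, 4.0])
print(build_up_to(stages, x, "lastconv"))       # [-2. -1.  1.  5.]
print(build_up_to(stages, x, "lastconv+relu"))  # [0. 0. 1. 5.]
```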

References

Chen et al., Rethinking Atrous Convolution for Semantic Image Segmentation.