Semantic Segmentation Models
This subpackage provides a pre-trained, state-of-the-art semantic segmentation model (DeepLabv3+ with an Xception-65 backbone), trained on the ImageNet dataset and fine-tuned on the Pascal VOC and MS COCO datasets.
The pre-trained models can be used for inference as follows:
# Import required modules
import numpy as np
import nnabla as nn
from nnabla.utils.image_utils import imread
from nnabla.models.semantic_segmentation import DeepLabV3plus
from nnabla.models.semantic_segmentation.utils import ProcessImage
target_h = 513
target_w = 513
# Get context
from nnabla.ext_utils import get_extension_context
nn.set_default_context(get_extension_context('cudnn', device_id='0'))
# Build a Deeplab v3+ network
image = imread("./test.jpg")
x = nn.Variable((1, 3, target_h, target_w), need_grad=False)
deeplabv3 = DeepLabV3plus('voc-coco', output_stride=8)
y = deeplabv3(x)
# preprocess image
processed_image = ProcessImage(image, target_h, target_w)
input_array = processed_image.pre_process()
# Compute inference
x.d = input_array
y.forward(clear_buffer=True)
print("done")
output = np.argmax(y.d, axis=1)
# Apply post processing
post_processed = processed_image.post_process(output[0])
# Display predicted class names
predicted_classes = np.unique(post_processed).astype(int)
for class_id in predicted_classes:
    print('Classes Segmented:', deeplabv3.category_names[class_id])
# save inference result
processed_image.save_segmentation_image("./output.png")
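Because the graph above is built once for a fixed input shape, it can be reused to segment several images by refilling x.d and calling forward again. Below is a minimal sketch continuing from the variables defined above; the input file names and output paths are placeholders.
# Reuse the already-built graph (x, y, deeplabv3) for a list of images.
image_paths = ["./test1.jpg", "./test2.jpg"]  # placeholder paths
for idx, path in enumerate(image_paths):
    img = imread(path)
    proc = ProcessImage(img, target_h, target_w)
    x.d = proc.pre_process()
    y.forward(clear_buffer=True)
    labels = np.argmax(y.d, axis=1)
    proc.post_process(labels[0])
    proc.save_segmentation_image("./output_{}.png".format(idx))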
Name | Class | Output stride | mIoU | Training framework | Notes
---|---|---|---|---|---
DeepLabv3+ | DeepLabV3plus | 8 | 81.48 | Nnabla | Backbone (Xception-65) weights converted from author’s model and used for finetuning
DeepLabv3+ | DeepLabV3plus | 16 | 82.20 | Nnabla | Backbone (Xception-65) weights converted from author’s model and used for finetuning
DeepLabv3+ | DeepLabV3plus | 8 | 82.20 | Tensorflow | Weights converted from author’s model
DeepLabv3+ | DeepLabV3plus | 16 | 83.58 | Tensorflow | Weights converted from author’s model
Common interfaces
- class nnabla.models.semantic_segmentation.base.SemanticSegmentation[source]
Semantic segmentation pretrained models inherit from this class, which provides some common interfaces.
- __call__(input_var=None, use_from=None, use_up_to='segmentation', training=False, returns_net=False, verbose=0)[source]
Create a network (computation graph) from a loaded model.
- Parameters:
  - input_var (Variable, optional) – If given, the input variable is replaced with the given variable and a network is constructed on top of it. Otherwise, a variable with a batch size of 1 and a default shape from self.input_shape is used.
  - use_up_to (str) – The network is constructed up to a variable specified by a string. A list of string-variable correspondences in a model is described in the documentation for each model class.
  - training (bool) – This option enables additional training (fine-tuning, transfer learning, etc.) for the constructed network. If True, the batch_stat option in batch normalization is turned True, and the need_grad attribute in trainable variables (conv weights and gamma and beta of BN, etc.) is turned True. The default is False.
  - returns_net (bool) – When True, it returns an NnpNetwork object. Otherwise, it only returns the last variable of the constructed network. The default is False.
  - verbose (bool, or int) – Verbose level. With 0, it says nothing during network construction.
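For example, the options above can be combined as in the sketch below. This is not part of the walkthrough above; it only illustrates the common interface, assuming a DeepLabV3plus instance and the 513x513 input shape used earlier.
import nnabla as nn
from nnabla.models.semantic_segmentation import DeepLabV3plus

deeplabv3 = DeepLabV3plus('voc-coco', output_stride=16)

# No input_var: a variable with batch size 1 and the model's default shape is created internally.
y = deeplabv3()

# Custom input variable and a trainable graph for fine-tuning: batch_stat in batch
# normalization and need_grad on trainable parameters are turned on.
x = nn.Variable((1, 3, 513, 513))
y_train = deeplabv3(x, training=True)

# Return the whole NnpNetwork object instead of only the last variable.
net = deeplabv3(x, returns_net=True)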
- property input_shape
Should return default image size (channel, height, width) as a tuple.
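For instance, input_shape can be used to size the input variable without hard-coding it; a short sketch assuming the DeepLabV3plus model from this page:
import nnabla as nn
from nnabla.models.semantic_segmentation import DeepLabV3plus

deeplabv3 = DeepLabV3plus('voc-coco')
c, h, w = deeplabv3.input_shape   # default (channel, height, width)
x = nn.Variable((1, c, h, w))     # prepend a batch dimension
y = deeplabv3(x)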
List of models
- class nnabla.models.semantic_segmentation.DeepLabV3plus(dataset='voc', output_stride=16)[source]
DeepLabV3+.
- Parameters:
  - dataset (str) – Specify a training dataset name from ‘voc’ or ‘voc-coco’.
  - output_stride (int) – DeepLabV3 uses atrous (a.k.a. dilated) convolutions, and the atrous rates depend on the output stride. The output stride has to be either 8 or 16; the default is 16, as in the signature above. If output_stride is 8 the atrous rates will be [12, 24, 36], and if output_stride is 16 the atrous rates will be [6, 12, 18].
The following is a list of strings that can be specified to the use_up_to option in the __call__ method:
- 'segmentation' (default): The output of the final layer.
- 'lastconv': The output from the last convolution.
- 'lastconv+relu': Network up to 'lastconv' followed by ReLU activation.
References