Semantic Segmentation Models

This subpackage provides state-of-the-art pretrained models for semantic segmentation (DeepLabv3+ with an Xception-65 backbone). The models are pretrained on the ImageNet dataset and fine-tuned on the Pascal VOC and MS COCO datasets.

The pretrained models can be used for inference as follows.

# Import required modules
import numpy as np
import nnabla as nn
from nnabla.ext_utils import get_extension_context
from nnabla.utils.image_utils import imread
from nnabla.models.semantic_segmentation import DeepLabV3plus
from nnabla.models.semantic_segmentation.utils import ProcessImage

# Input size fed to the network
target_h = 513
target_w = 513

# Set the cuDNN extension context on GPU 0
nn.set_default_context(get_extension_context('cudnn', device_id='0'))

# Load the input image
image = imread("./test.jpg")

# Build a DeepLabv3+ network
x = nn.Variable((1, 3, target_h, target_w), need_grad=False)
deeplabv3 = DeepLabV3plus('voc-coco', output_stride=8)
y = deeplabv3(x)

# Preprocess the image
processed_image = ProcessImage(image, target_h, target_w)
input_array = processed_image.pre_process()

# Compute inference
x.d = input_array
y.forward(clear_buffer=True)
print("done")
# Pick the highest-scoring class for every pixel (axis 1 is the class axis)
output = np.argmax(y.d, axis=1)

# Apply post processing
post_processed = processed_image.post_process(output[0])

# Display predicted class names
predicted_classes = np.unique(post_processed).astype(int)
for class_id in predicted_classes:
    print('Classes Segmented: ', deeplabv3.category_names[class_id])

# Save the inference result
processed_image.save_segmentation_image("./output.png")
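
The argmax and class-listing steps above can be illustrated on toy data. The following is a minimal sketch in pure Python, written as a stand-in for np.argmax(y.d, axis=1) followed by np.unique so that it runs without NNabla or NumPy. The helper argmax_label_map, the four category names, and the 2x2 scores are all hypothetical and are not taken from the library.

```python
def argmax_label_map(scores):
    """Turn scores[c][y][x] into labels[y][x] holding the best class index."""
    num_classes = len(scores)
    height, width = len(scores[0]), len(scores[0][0])
    return [
        [max(range(num_classes), key=lambda c: scores[c][y][x]) for x in range(width)]
        for y in range(height)
    ]


# Hypothetical class names; the real list comes from deeplabv3.category_names.
category_names = ["background", "cat", "dog", "person"]

# Hypothetical per-class scores for a 2x2 image, indexed as [class][row][column].
scores = [
    [[0.90, 0.20], [0.10, 0.80]],  # background
    [[0.05, 0.70], [0.20, 0.10]],  # cat
    [[0.05, 0.10], [0.70, 0.10]],  # dog
    [[0.00, 0.00], [0.00, 0.00]],  # person (never wins)
]

labels = argmax_label_map(scores)

# Equivalent of np.unique: the sorted set of class indices present in the map.
present = sorted({label for row in labels for label in row})
segmented = [category_names[i] for i in present]

print(labels)
print(segmented)
```

Classes that never win the per-pixel argmax, such as "person" here, do not appear in the final list, which is why the inference example prints only a subset of category_names.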

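Available models trained on the VOC dataset
Name	Class	Output stride	mIOU	Training framework	Notes
DeepLabv3+	DeepLabv3+	8	81.48	NNabla	Backbone (Xception-65) weights converted from the paper authors' model and used for fine-tuning
DeepLabv3+	DeepLabv3+	16	82.20	NNabla	Backbone (Xception-65) weights converted from the paper authors' model and used for fine-tuning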

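Available models trained on the VOC and COCO datasets
Name	Class	Output stride	mIOU	Training framework	Notes
DeepLabv3+	DeepLabv3+	8	82.20	TensorFlow	Weights converted from the paper authors' model
DeepLabv3+	DeepLabv3+	16	83.58	TensorFlow	Weights converted from the paper authors' model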
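
For readers unfamiliar with the mIOU column, the sketch below shows how a mean intersection-over-union score is commonly computed from a confusion matrix. It assumes the column reports this standard metric as a percentage; the evaluation code actually used to produce the table is not shown in this document, and the function name mean_iou and the toy matrix are hypothetical.

```python
def mean_iou(confusion):
    """Mean IoU from confusion[i][j] = pixels of true class i predicted as class j."""
    num_classes = len(confusion)
    ious = []
    for c in range(num_classes):
        true_pos = confusion[c][c]
        false_neg = sum(confusion[c]) - true_pos
        false_pos = sum(confusion[r][c] for r in range(num_classes)) - true_pos
        denom = true_pos + false_pos + false_neg
        if denom > 0:  # skip classes absent from both prediction and ground truth
            ious.append(true_pos / denom)
    return sum(ious) / len(ious)


# Toy 2-class confusion matrix (hypothetical pixel counts).
confusion = [
    [8, 2],  # true class 0: 8 correct, 2 predicted as class 1
    [1, 9],  # true class 1: 1 predicted as class 0, 9 correct
]

miou = mean_iou(confusion)
print("mIoU = %.2f%%" % (100 * miou))
```

Per-class IoU is the number of correctly labelled pixels divided by the union of predicted and ground-truth pixels for that class; the mean is taken over classes, so small classes weigh as much as large ones.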

Common interfaces

class nnabla.models.semantic_segmentation.base.SemanticSegmentation

All pretrained semantic segmentation models inherit from this class, so they share a common interface.

__call__(input_var=None, use_from=None, use_up_to='segmentation', training=False, returns_net=False, verbose=0)

Creates a network (computation graph) from the loaded model.

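Parameters:

input_var (Variable, optional) -- If given, the input variable is replaced with the given variable and the network is constructed on top of it. Otherwise, a variable with batch size 1 and a shape derived from self.input_shape is used.
use_up_to (str) -- The network is constructed up to the variable specified by this string. The list of valid strings and the variables they correspond to is given in the documentation of each model class.
training (bool) -- Enables additional training (fine-tuning, transfer learning, etc.) of the constructed network. If True, the batch_stat option of batch normalization is set to True, and the need_grad attribute of trainable variables (such as convolution weights and the gamma and beta of batch normalization) is set to True. The default is False.
returns_net (bool) -- If True, an NnpNetwork object is returned. Otherwise, only the last variable of the constructed network is returned. The default is False.
verbose (bool, or int) -- Verbosity level. If set to 0, nothing is printed during network construction.

property input_shape: Returns the default image size as a tuple of (channels, height, width).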

List of models

class nnabla.models.semantic_segmentation.DeepLabV3plus(dataset='voc', output_stride=16)

DeepLabV3+.

Parameters:

dataset (str) -- Name of the training dataset, either 'voc' or 'voc-coco'.
output_stride (int) -- DeepLabV3 uses atrous (also known as dilated) convolutions, and the atrous rates depend on the output stride. The output stride must be either 8 or 16. The default is 16. If output_stride is 8, the atrous rates are [12, 24, 36]; if output_stride is 16, the atrous rates are [6, 12, 18].
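
The relationship between output_stride and the atrous rates can be written as a small lookup. The sketch below is illustrative only: atrous_rates and feature_map_size are hypothetical helpers, not part of nnabla.models. The feature-map size formula assumes that the spatial size shrinks by ceiling division, as with the usual "same"-style padding, which is not stated in this document.

```python
def atrous_rates(output_stride):
    """Return the atrous rates documented for a supported output stride."""
    rates = {8: [12, 24, 36], 16: [6, 12, 18]}
    if output_stride not in rates:
        raise ValueError("output_stride must be 8 or 16, got %r" % (output_stride,))
    return rates[output_stride]


def feature_map_size(input_size, output_stride):
    """Spatial size of the encoder output, assuming ceil(input_size / output_stride)."""
    return (input_size - 1) // output_stride + 1


for stride in (8, 16):
    print(stride, atrous_rates(stride), feature_map_size(513, stride))
```

Halving the output stride doubles both the feature-map resolution and the atrous rates, so the filters cover roughly the same area of the input image at either setting, at the cost of more computation for output_stride=8.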

The following strings can be specified for the use_up_to option of the __call__ method:

'segmentation' (default): The output of the final layer.
'lastconv': The output of the last convolution.
'lastconv+relu': The network up to 'lastconv', followed by ReLU.
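
As an illustration of these options, the following sketch builds a graph that stops at 'lastconv' with training enabled, for example as a starting point for fine-tuning. It uses only the constructor and __call__ arguments documented above. The wrapper function build_feature_extractor is hypothetical, the 513x513 input size is borrowed from the inference example, and constructing the model is assumed to fetch the pretrained weights, so NNabla and network access are needed to actually run it.

```python
def build_feature_extractor(dataset="voc", output_stride=16, size=513):
    """Build a DeepLabv3+ graph truncated at 'lastconv' with trainable parameters."""
    # Imported lazily so this module can be loaded without NNabla installed.
    import nnabla as nn
    from nnabla.models.semantic_segmentation import DeepLabV3plus

    model = DeepLabV3plus(dataset, output_stride=output_stride)
    x = nn.Variable((1, 3, size, size))
    # use_up_to selects where the graph ends; training=True marks the
    # parameters as trainable and switches batch normalization to batch_stat.
    features = model(x, use_up_to="lastconv", training=True)
    return x, features


if __name__ == "__main__":
    x, features = build_feature_extractor()
    print(features.shape)
```

A new task-specific head can then be attached to the returned variable in place of the original final layer.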

References

Chen et al., Rethinking Atrous Convolution for Semantic Image Segmentation.