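Object Detection Models

This subpackage provides state-of-the-art pretrained models for object detection. The models are trained on the ImageNet dataset and fine-tuned on the Pascal VOC and MS COCO datasets.

The pretrained models can be used for both inference and training, as follows: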

# Import required modules
import nnabla as nn
from nnabla.models.object_detection import YoloV2
from nnabla.models.object_detection.utils import (
    LetterBoxTransform,
    draw_bounding_boxes)
from nnabla.utils.image_utils import imread, imsave
import numpy as np

# Set device
from nnabla.ext_utils import get_extension_context
nn.set_default_context(get_extension_context('cudnn', device_id='0'))

# Load and create a detection model
h, w = 608, 608
yolov2 = YoloV2('coco')
x = nn.Variable((1, 3, h, w))
y = yolov2(x)

# Load an image and scale it to fit inside the (h, w) frame
img_orig = imread('dog.jpg')
lbt = LetterBoxTransform(img_orig, h, w)

# Execute detection
x.d = lbt.image.transpose(2, 0, 1)[None]
y.forward(clear_buffer=True)

# Draw bounding boxes to the original image
bboxes = lbt.inverse_coordinate_transform(y.d[0])
img_draw = draw_bounding_boxes(
    img_orig, bboxes, yolov2.get_category_names())
imsave("detected.jpg", img_draw)

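Available models trained on the COCO dataset
Name	Class	mAP	Training framework	Notes
YOLO v2	YoloV2	44.12	Darknet	Weights converted from the paper author's model

Available models trained on the VOC dataset
Name	Class	mAP	Training framework	Notes
YOLO v2	YoloV2	76.00	Darknet	Weights converted from the paper author's model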

Common interfaces

class nnabla.models.object_detection.base.ObjectDetection

__call__(input_var=None, use_from=None, use_up_to='detection', training=False, returns_net=False, verbose=0)

Creates a network (computation graph) from the loaded model.

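Parameters:

input_var (Variable, optional) -- If given, the input variable is replaced with the given variable and the network is built on top of it. Otherwise, a variable with batch size 1 and a shape based on self.input_shape is used.
use_up_to (str) -- The network is built up to the variable specified by this string. The list of valid strings for each model is documented in the corresponding model class.
training (bool) -- Enables additional training (fine-tuning, transfer learning, etc.) of the constructed network. If True, the batch_stat option of batch normalization is set to True, and the need_grad attribute of trainable variables (such as convolution weights and the gamma and beta of batch normalization) is set to True. The default is False.
returns_net (bool) -- If True, an NnpNetwork object is returned. Otherwise, only the last variable of the constructed network is returned. The default is False.
verbose (bool, or int) -- Verbosity level. If set to 0, nothing is printed during network construction.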

property input_shape: Returns the default image size as a tuple of (channels, height, width).

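class nnabla.models.object_detection.utils.LetterBoxTransform(image, height, width)

Creates an object that holds a new letterboxed image as its image attribute.

Letterboxing is defined as scaling the input image to fit inside the desired output image frame (the letterbox) while preserving the aspect ratio of the original image. Pixels that are not covered by the original image are set to 127.

The created object also provides a way to transform bounding box coordinates back to the original image frame.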

Parameters:

image (numpy.ndarray) -- A uint8 3-channel image
height (int) -- Height of the letterbox
width (int) -- Width of the letterbox
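
The letterbox geometry described above can be illustrated with a minimal sketch. This is not the library's implementation: the function name letterbox_geometry is hypothetical, and rounding the resized dimensions and centering the image inside the frame are assumptions made for illustration. Only the aspect-ratio-preserving scale and the fill value of 127 come from the description above.

```python
def letterbox_geometry(src_h, src_w, box_h, box_w):
    """Return (new_h, new_w, pad_top, pad_left) for fitting an image into a box.

    Hypothetical helper for illustration; not part of nnabla.
    """
    # A single scale factor for both axes preserves the aspect ratio.
    scale = min(box_h / src_h, box_w / src_w)
    new_h = max(1, round(src_h * scale))
    new_w = max(1, round(src_w * scale))
    # ASSUMPTION: the resized image is centered in the frame. The border
    # left over on either side is the padding, which would be filled with 127.
    pad_top = (box_h - new_h) // 2
    pad_left = (box_w - new_w) // 2
    return new_h, new_w, pad_top, pad_left


# A 576 x 768 (height x width) image placed in a 608 x 608 letterbox.
print(letterbox_geometry(576, 768, 608, 608))  # (456, 608, 76, 0)
```

In this example the width is the limiting dimension, so the image fills the frame horizontally and receives 76 rows of padding above and below.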

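inverse_coordinate_transform(coords)

Transforms bounding boxes back to the original image frame.

Parameters: coords (numpy.ndarray) -- An N x M array with M >= 4, where the first four elements of each row are x, y (center coordinates of the bounding box) and w, h (width and height of the bounding box).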
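
As a rough illustration of what such an inverse transform involves, the sketch below maps a single box from letterbox coordinates back to the original image. It is not the library's code: the function name to_original_frame is hypothetical, and it assumes that coordinates are expressed in letterbox pixels and that the image was centered in the frame. The library may use a different convention (for example, normalized coordinates), so treat the numbers as an example of the idea only.

```python
def to_original_frame(box, src_h, src_w, box_h, box_w):
    """Map one (x, y, w, h, ...) box from letterbox pixels to original pixels.

    Hypothetical helper for illustration; not part of nnabla.
    """
    # Same scale factor that the forward letterbox transform would use.
    scale = min(box_h / src_h, box_w / src_w)
    # ASSUMPTION: the image was centered, so padding is split evenly.
    pad_left = (box_w - src_w * scale) / 2.0
    pad_top = (box_h - src_h * scale) / 2.0
    x, y, w, h = box[:4]
    # Remove the padding offset from the center, then undo the scaling.
    restored = [(x - pad_left) / scale, (y - pad_top) / scale,
                w / scale, h / scale]
    # Columns after the first four (scores etc.) pass through unchanged.
    return restored + list(box[4:])


# Original image: 100 x 200 (height x width); letterbox: 400 x 400.
# The scale is 2.0 with 100 rows of padding above and below.
print(to_original_frame([200, 200, 100, 50, 0.9], 100, 200, 400, 400))
# [100.0, 50.0, 50.0, 25.0, 0.9]
```

The center of the letterbox maps to the center of the original image, and the box size shrinks by the scale factor.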

nnabla.models.object_detection.utils.draw_bounding_boxes(img, bboxes, names, colors=None, thresh=0.5)

Draws the bounding boxes of the detected objects on the image using the transformed coordinates.

Parameters:

img (numpy.ndarray) -- Input image.
bboxes (numpy.ndarray) -- Transformed bounding box coordinates from the model.
names (list of str) -- Category names in the dataset.
colors (list of tuple of 3 ints) -- Colors of the bounding boxes.
thresh (float) -- Threshold for bounding boxes.
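
To make the role of thresh more concrete, here is a small sketch of how boxes might be selected before drawing. It is an assumption-laden illustration, not the library's behavior: the function name boxes_to_draw is hypothetical, and the row layout (x, y, w, h, objectness, then one score per category) and the rule "keep a category if its score exceeds thresh" are assumptions that the documentation above does not confirm.

```python
def boxes_to_draw(bboxes, names, img_h, img_w, thresh=0.5):
    """Return (x0, y0, x1, y1, label) tuples for boxes that pass the threshold.

    Hypothetical helper for illustration; not part of nnabla.
    """
    results = []
    for bb in bboxes:
        x, y, w, h = bb[:4]
        # ASSUMPTION: index 4 is objectness, indices 5+ are per-category scores.
        scores = bb[5:]
        hits = [(names[i], s) for i, s in enumerate(scores) if s > thresh]
        if not hits:
            continue  # no category is confident enough, so skip this box
        # Convert center/size to corner coordinates, clipped to the image.
        x0 = int(min(max(x - w / 2.0, 0), img_w))
        y0 = int(min(max(y - h / 2.0, 0), img_h))
        x1 = int(min(max(x + w / 2.0, 0), img_w))
        y1 = int(min(max(y + h / 2.0, 0), img_h))
        label = ", ".join("{}: {:.2f}".format(n, s) for n, s in hits)
        results.append((x0, y0, x1, y1, label))
    return results


names = ["dog", "bicycle"]
bboxes = [
    [100, 50, 50, 25, 0.9, 0.8, 0.1],  # confident "dog"
    [10, 10, 4, 4, 0.2, 0.3, 0.2],     # nothing above the threshold
]
print(boxes_to_draw(bboxes, names, img_h=100, img_w=200))
# [(75, 37, 125, 62, 'dog: 0.80')]
```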

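List of models

class nnabla.models.object_detection.YoloV2(dataset='voc')

The following strings can be passed to the use_up_to option of the __call__ method:

'detection' (default): The output from the last convolution (detection layer) after post-processing.
'convdetect': The output of the last convolution without post-processing.
'lastconv': The network up to the convolution + relu layer immediately before the detection convolution layer.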

References

Joseph Redmon et al., YOLO9000: Better, Faster, Stronger.