Object Detection Models

This subpackage provides pre-trained, state-of-the-art object detection models that were trained on the ImageNet dataset and fine-tuned on the Pascal VOC and MS COCO datasets.

The pre-trained models can be used for both inference and training as follows:

# Import required modules
import nnabla as nn
from nnabla.models.object_detection import YoloV2
from nnabla.models.object_detection.utils import (
    LetterBoxTransform,
    draw_bounding_boxes)
from nnabla.utils.image_utils import imread, imsave
import numpy as np

# Set device
from nnabla.ext_utils import get_extension_context
nn.set_default_context(get_extension_context('cudnn', device_id='0'))

# Load and create a detection model
h, w = 608, 608
yolov2 = YoloV2('coco')
x = nn.Variable((1, 3, h, w))
y = yolov2(x)

# Load an image and scale it to fit inside the (h, w) frame
img_orig = imread('dog.jpg')
lbt = LetterBoxTransform(img_orig, h, w)

# Execute detection
x.d = lbt.image.transpose(2, 0, 1)[None]
y.forward(clear_buffer=True)

# Draw bounding boxes to the original image
bboxes = lbt.inverse_coordinate_transform(y.d[0])
img_draw = draw_bounding_boxes(
    img_orig, bboxes, yolov2.get_category_names())
imsave("detected.jpg", img_draw)

Available models trained on COCO dataset
Name	Class	mAP	Training framework	Notes
YOLO v2	YoloV2	44.12	Darknet	Weights converted from author’s model

Available models trained on VOC dataset
Name	Class	mAP	Training framework	Notes
YOLO v2	YoloV2	76.00	Darknet	Weights converted from author’s model

Common interfaces

class nnabla.models.object_detection.base.ObjectDetection

__call__(input_var=None, use_from=None, use_up_to='detection', training=False, returns_net=False, verbose=0)

Create a network (computation graph) from a loaded model.

Parameters:

input_var (Variable, optional) – If given, the input variable is replaced with the given variable and the network is constructed on top of it. Otherwise, a variable with batch size 1 and the default shape from self.input_shape is created.
use_up_to (str) – The network is constructed up to the variable specified by this string. The available strings and their corresponding variables are listed in the documentation of each model class.
training (bool) – Enables additional training (fine-tuning, transfer learning, etc.) of the constructed network. If True, the batch_stat option of batch normalization is set to True, and the need_grad attribute of trainable variables (convolution weights, gamma and beta of batch normalization, etc.) is set to True. The default is False.
returns_net (bool) – When True, an NnpNetwork object is returned. Otherwise, only the last variable of the constructed network is returned. The default is False.
verbose (bool, or int) – Verbosity level. With 0, nothing is printed during network construction.

property input_shape: Should returns default image size (channel, height, width) as a tuple.

class nnabla.models.object_detection.utils.LetterBoxTransform(image, height, width)

Create an object that holds a new letterboxed image as its image attribute.

Letterboxing scales the input image to fit inside the desired output frame (the letterbox) while preserving the aspect ratio of the original image. Pixels of the frame that are not covered by the original image are filled with the value 127.

The created object also provides functionality to convert bounding box coordinates back to the original image frame.

Parameters:

image (numpy.ndarray) – A uint8 3-channel image
height (int) – Letterbox height
width (int) – Letterbox width
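The letterbox geometry described above (one scale factor for both axes, padding for the rest) can be sketched in plain Python. This is a minimal illustration, not the nnabla implementation: the helper name letterbox_geometry is hypothetical, and centering the resized image inside the frame is an assumption, since the description above only says the uncovered pixels become 127.

```python
def letterbox_geometry(img_h, img_w, box_h, box_w):
    """Return (scale, new_h, new_w, top, left) for fitting an img_h x img_w
    image inside a box_h x box_w letterbox frame.

    Hypothetical helper. Assumption: the resized image is centered, and the
    remaining pixels are padding (filled with 127 per the docs).
    """
    # A single scale factor for both axes preserves the aspect ratio.
    scale = min(box_h / img_h, box_w / img_w)
    new_h = int(round(img_h * scale))
    new_w = int(round(img_w * scale))
    # Split the leftover space evenly between both sides (assumed centering).
    top = (box_h - new_h) // 2
    left = (box_w - new_w) // 2
    return scale, new_h, new_w, top, left


if __name__ == "__main__":
    # A landscape 576x768 (H x W) photo placed in a 608x608 frame:
    # the width fills the frame and padding appears above and below.
    print(letterbox_geometry(576, 768, 608, 608))
```

Because the smaller of the two ratios wins, at least one side of the resized image always touches the frame edge, and padding only appears along the other axis.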

inverse_coordinate_transform(coords)

Convert the bounding boxes back to the original image frame.

Parameters: coords (numpy.ndarray) – N x M array where M >= 4 and the first 4 elements along M are x, y (center coordinates of the bounding box), w and h (bounding box width and height).
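Undoing the letterbox amounts to removing the padding offset and dividing by the scale factor. The sketch below illustrates that arithmetic on plain Python lists. It rests on assumptions not stated in this documentation: the input x, y, w, h are normalized to [0, 1] relative to the letterbox frame (the Darknet convention), the resized image is centered, and the function name inverse_letterbox is hypothetical. Columns beyond the first four are passed through unchanged, matching the M >= 4 layout above.

```python
def inverse_letterbox(coords, img_h, img_w, box_h, box_w):
    """Map (x, y, w, h, ...) rows from letterbox space to original-image pixels.

    Hypothetical helper. Assumptions: x, y, w, h are normalized to [0, 1]
    relative to the letterbox frame, and the resized image is centered.
    Extra columns (e.g. scores) are copied unchanged.
    """
    scale = min(box_h / img_h, box_w / img_w)
    top = (box_h - round(img_h * scale)) // 2
    left = (box_w - round(img_w * scale)) // 2
    out = []
    for x, y, w, h, *rest in coords:
        # normalized -> letterbox pixels -> remove padding -> undo scaling
        ox = (x * box_w - left) / scale
        oy = (y * box_h - top) / scale
        # Sizes are unaffected by the padding offset; only the scale is undone.
        ow = w * box_w / scale
        oh = h * box_h / scale
        out.append([ox, oy, ow, oh, *rest])
    return out


if __name__ == "__main__":
    # A box covering the whole image area inside a 608x608 letterbox
    # built from a 576x768 original (image occupies rows 76..531).
    print(inverse_letterbox([[0.5, 0.5, 1.0, 0.75, 0.9]], 576, 768, 608, 608))
```

Centers need the offset removed, while widths and heights only need rescaling, which is why the two pairs of coordinates are treated differently.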

nnabla.models.object_detection.utils.draw_bounding_boxes(img, bboxes, names, colors=None, thresh=0.5)

Draw bounding boxes for the detected objects using coordinates that have already been transformed back to the original image frame.

Parameters:

img (numpy.ndarray) – Input image
bboxes (numpy.ndarray) – Bounding box coordinates from the model, transformed back to the original image frame.
names (list of str) – Names of the categories in the dataset
colors (list of tuple of 3 ints) – Colors for the bounding boxes
thresh (float) – Threshold for drawing bounding boxes; detections scoring below it are not drawn.
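Before a box can be drawn, the center-format (x, y, w, h) values have to become pixel corners, and low-scoring detections are discarded. The following is a minimal sketch of that preparation step, not the body of draw_bounding_boxes: the helper name boxes_to_draw and the row layout (x_center, y_center, w, h, score) are hypothetical, and keeping boxes whose score equals the threshold is an assumption.

```python
def boxes_to_draw(bboxes, img_h, img_w, thresh=0.5):
    """Select boxes for drawing and convert them to integer corners.

    Hypothetical helper. Assumed row layout: (x_center, y_center, w, h, score)
    in original-image pixels. Rows with score < thresh are dropped; the rest
    become (x0, y0, x1, y1) clipped to the image bounds.
    """
    rects = []
    for x, y, w, h, score in bboxes:
        if score < thresh:
            continue  # below the threshold: not drawn
        # Center/size -> top-left and bottom-right corners, clipped to the image.
        x0 = max(0, int(x - w / 2))
        y0 = max(0, int(y - h / 2))
        x1 = min(img_w - 1, int(x + w / 2))
        y1 = min(img_h - 1, int(y + h / 2))
        rects.append((x0, y0, x1, y1))
    return rects


if __name__ == "__main__":
    detections = [
        (384, 288, 768, 576, 0.9),  # whole-image box, kept and clipped
        (10, 10, 4, 4, 0.1),        # low score, dropped
    ]
    print(boxes_to_draw(detections, img_h=576, img_w=768))
```

Clipping matters because boxes near the border can extend past the image after the inverse transform, and drawing routines typically expect in-bounds pixel indices.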

List of models

class nnabla.models.object_detection.YoloV2(dataset='voc')

The following strings can be specified for the use_up_to option of the __call__ method:

'detection' (default): The output from the last convolution (detection layer) after post-processing.
'convdetect': The output of the last convolution without post-processing.
'lastconv': The network up to the convolution + ReLU layer that precedes the detection convolution layer.
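The difference between 'convdetect' and 'detection' is the post-processing that turns raw convolution outputs into boxes. The YOLO9000 paper parameterizes each prediction as bx = sigmoid(tx) + cx, by = sigmoid(ty) + cy, bw = pw * exp(tw), bh = ph * exp(th), with objectness sigmoid(to), where (cx, cy) is the grid cell and (pw, ph) the anchor size. The sketch below applies that decoding to a single prediction in plain Python. It is an illustration of the paper's formulas, not nnabla's code: the function name decode_cell is hypothetical, anchors are assumed to be in grid-cell units, outputs are normalized by the grid size, and class scores are left out.

```python
import math


def sigmoid(v):
    return 1.0 / (1.0 + math.exp(-v))


def decode_cell(t, cell_x, cell_y, anchor_w, anchor_h, grid_w, grid_h):
    """Decode one raw YOLOv2 prediction (tx, ty, tw, th, to).

    Hypothetical helper following the YOLO9000 parameterization. Assumptions:
    anchors are given in grid-cell units, and the returned box center and
    size are normalized to [0, 1] by dividing by the grid size.
    """
    tx, ty, tw, th, to = t
    # Sigmoid keeps the predicted center inside its own grid cell.
    bx = (sigmoid(tx) + cell_x) / grid_w
    by = (sigmoid(ty) + cell_y) / grid_h
    # Exponential scales the anchor box; exp(0) == 1 keeps the anchor size.
    bw = anchor_w * math.exp(tw) / grid_w
    bh = anchor_h * math.exp(th) / grid_h
    objectness = sigmoid(to)
    return bx, by, bw, bh, objectness


if __name__ == "__main__":
    # YOLOv2 downsamples by 32, so a 608x608 input yields a 19x19 grid.
    # All-zero raw outputs in the central cell give a box centered in the
    # image with exactly the anchor's size.
    print(decode_cell((0.0, 0.0, 0.0, 0.0, 0.0), 9, 9, 2.0, 3.0, 19, 19))
```

Bounding the center offset with a sigmoid was one of the stability changes introduced in YOLOv2, since it prevents a cell from predicting a center far outside its own region early in training.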

References

Joseph Redmon et al., YOLO9000: Better, Faster, Stronger.