ssds.modeling.ssds

ssds.modeling.ssds.ssdsbase

class ssds.modeling.ssds.ssdsbase.SSDSBase(backbone, num_classes)

Bases: torch.nn.modules.module.Module

Base class for all ssds models.

initialize_extra(layer)

initialize_head(layer)

initialize_prior(layer)

ssds.modeling.ssds.ssd

class ssds.modeling.ssds.SSD(backbone, extras, head, num_classes)

Bases: ssds.modeling.ssds.ssdsbase.SSDSBase

SSD: Single Shot MultiBox Detector. See https://arxiv.org/pdf/1512.02325.pdf for more details.

Parameters

backbone – backbone layers for input
extras – extra layers that feed to multibox loc and conf layers
head – the “multibox head”, consisting of loc and conf conv layers
num_classes – number of classes

static add_extras(feature_layer, mbox, num_classes)

Define and declare the extras, loc and conf modules for the ssd model.

The feature_layer is defined in cfg.MODEL.FEATURE_LAYER. For the ssd model, its entries can be int, list of int, or str:

int
An int in the feature_layer selects an output feature of the backbone.
str
A str in the feature_layer denotes an extra layer appended at the end of the backbone.

Parameters

feature_layer – the feature layers with detection head, defined by cfg.MODEL.FEATURE_LAYER
mbox – the number of boxes for each feature map
num_classes – the number of classes, defined by cfg.MODEL.NUM_CLASSES
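
As a small illustration of the int and str entry types described above, the sketch below separates the two kinds of entries in a feature_layer list. The helper name split_feature_layer and the sample values (22, 34, "S") are hypothetical and chosen only for illustration; they are not taken from the library or from any shipped config.

```python
from typing import List, Tuple, Union


def split_feature_layer(
    entries: List[Union[int, str]]
) -> Tuple[List[int], List[str]]:
    """Separate backbone output entries (int) from extra layer entries (str).

    Hypothetical helper, not part of ssds.modeling.
    """
    backbone_outputs = [e for e in entries if isinstance(e, int)]
    extra_layers = [e for e in entries if isinstance(e, str)]
    return backbone_outputs, extra_layers


# Hypothetical entries: two backbone outputs followed by two extra layers.
backbone_outputs, extra_layers = split_feature_layer([22, 34, "S", "S"])
print(backbone_outputs)
print(extra_layers)
```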

forward(x)

Applies network layers and ops on input image(s) x.

Parameters

x – input image or batch of images.

Returns

When self.training == True: loc and conf for each anchor box.

When self.training == False: loc and conf.sigmoid() for each anchor box.

For each layer, conf has shape [batch, num_anchor*num_classes, height, width].

For each layer, loc has shape [batch, num_anchor*4, height, width].
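
To make the documented shapes concrete, the following sketch computes the expected loc and conf shapes for each detection layer using plain arithmetic. The function name output_shapes and the sample sizes (batch of 2, 6 anchors, 21 classes, 38x38 and 19x19 feature maps) are assumptions for illustration; the sketch also assumes the same num_anchor on every layer, which need not hold for a real config.

```python
from typing import List, Tuple

Shape = Tuple[int, int, int, int]


def output_shapes(
    batch: int,
    num_anchor: int,
    num_classes: int,
    feature_sizes: List[Tuple[int, int]],
) -> Tuple[List[Shape], List[Shape]]:
    """Return per-layer (loc_shapes, conf_shapes) as documented for forward().

    Hypothetical helper: it only reproduces the shape formulas
    [batch, num_anchor*4, height, width] and
    [batch, num_anchor*num_classes, height, width].
    """
    loc = [(batch, num_anchor * 4, h, w) for h, w in feature_sizes]
    conf = [(batch, num_anchor * num_classes, h, w) for h, w in feature_sizes]
    return loc, conf


# Hypothetical sizes for two detection layers.
loc_shapes, conf_shapes = output_shapes(2, 6, 21, [(38, 38), (19, 19)])
print(loc_shapes)
print(conf_shapes)
```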

ssds.modeling.ssds.yolo

class ssds.modeling.ssds.YOLOV3(backbone, extras, head, num_classes)

Bases: ssds.modeling.ssds.ssdsbase.SSDSBase

YOLOv3: An Incremental Improvement. See https://arxiv.org/abs/1804.02767v1 for more details.

Parameters

backbone – backbone layers for input
extras – contains transforms and extra layers that feed to multibox loc and conf layers
head – the “multibox head”, consisting of loc and conf conv layers
num_classes – number of classes

static add_extras(feature_layer, mbox, num_classes)

Define and declare the extras, loc and conf modules for the yolo v3 model.

The feature_layer is defined in cfg.MODEL.FEATURE_LAYER. For the yolo v3 model, its entries can be int, list of int, or str:

int
An int in the feature_layer selects an output feature of the backbone.
list of int
A list of int in the feature_layer also selects an output feature of the backbone: the first int is the backbone output and the second int is the upsampling branch used to fuse features.
str
A str in the feature_layer denotes an extra layer appended at the end of the backbone.

Parameters

feature_layer – the feature layers with detection head, defined by cfg.MODEL.FEATURE_LAYER
mbox – the number of boxes for each feature map
num_classes – the number of classes, defined by cfg.MODEL.NUM_CLASSES
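
The three entry types described above can be told apart with a simple type check. The sketch below does this and returns a human-readable description for each entry; describe_entry is a hypothetical helper and the sample values ([10, 16], 16, "S") are made up for illustration, so treat this as a reading aid rather than the library's parsing logic.

```python
from typing import List, Union

Entry = Union[int, List[int], str]


def describe_entry(entry: Entry) -> str:
    """Describe one feature_layer entry following the documented meanings.

    Hypothetical helper, not part of ssds.modeling.
    """
    if isinstance(entry, str):
        return f"extra layer '{entry}' appended after the backbone"
    if isinstance(entry, int):
        return f"backbone output {entry}"
    # A list of int: [backbone output, upsampling branch to fuse with].
    backbone_output, upsample_branch = entry
    return (
        f"backbone output {backbone_output} "
        f"fused with upsampling branch {upsample_branch}"
    )


# Hypothetical feature_layer entries covering all three types.
descriptions = [describe_entry(e) for e in [[10, 16], 16, "S"]]
for line in descriptions:
    print(line)
```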

forward(x)

Applies network layers and ops on input image(s) x.

Parameters

x – input image or batch of images.

Returns

When self.training == True: loc and conf for each anchor box.

When self.training == False: loc and conf.sigmoid() for each anchor box.

For each layer, conf has shape [batch, num_anchor*num_classes, height, width].

For each layer, loc has shape [batch, num_anchor*4, height, width].
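
The difference between the two modes is only whether a sigmoid is applied to conf. The sketch below mimics that switch on a plain list of numbers using the standard library; postprocess_conf is a hypothetical stand-in for what happens inside forward, and the logit values are invented for illustration.

```python
import math
from typing import List


def sigmoid(x: float) -> float:
    """Logistic function, mapping any real number into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def postprocess_conf(conf_logits: List[float], training: bool) -> List[float]:
    """Return raw conf values in training mode, sigmoid scores otherwise.

    Hypothetical helper that mirrors the documented return behaviour.
    """
    if training:
        return conf_logits
    return [sigmoid(v) for v in conf_logits]


logits = [0.0, 2.0, -2.0]  # made-up conf values
train_out = postprocess_conf(logits, training=True)
eval_out = postprocess_conf(logits, training=False)
print(train_out)
print(eval_out)
```

In training mode the raw values are typically passed to a loss that applies the sigmoid itself, which is one common reason for skipping it there; whether that is the motivation here is an assumption.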

class ssds.modeling.ssds.YOLOV4(backbone, extras, head, num_classes)

Bases: ssds.modeling.ssds.ssdsbase.SSDSBase

YOLO V4 Architecture. See https://arxiv.org/abs/2004.10934v1 for more details.

Parameters

backbone – backbone layers for input
extras – contains transforms, extra and fpn layers that feed to multibox loc and conf layers
head – the “multibox head”, consisting of loc and conf conv layers
num_classes – number of classes

static add_extras(feature_layer, mbox, num_classes)

Define and declare the extras, loc and conf modules for the yolo v4 model.

The feature_layer is defined in cfg.MODEL.FEATURE_LAYER. For the yolo v4 model, its entries can be int, list of int, or str:

int
An int in the feature_layer selects an output feature of the backbone.
list of int
A list of int in the feature_layer also selects an output feature of the backbone: the first int is the backbone output and the second int is the upsampling branch used to fuse features.
str
A str in the feature_layer denotes an extra layer appended at the end of the backbone.

Parameters

feature_layer – the feature layers with detection head, defined by cfg.MODEL.FEATURE_LAYER
mbox – the number of boxes for each feature map
num_classes – the number of classes, defined by cfg.MODEL.NUM_CLASSES
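
One way to read mbox and num_classes together is as the inputs that fix the output channel counts of the loc and conf conv layers. The sketch below derives those counts from the shapes documented for forward (num_anchor*4 and num_anchor*num_classes). It assumes that mbox holds one box count per feature map and that this count equals num_anchor; head_channels and the sample values are hypothetical.

```python
from typing import List, Tuple


def head_channels(mbox: List[int], num_classes: int) -> List[Tuple[int, int]]:
    """Return (loc_channels, conf_channels) for each feature map.

    Assumption: mbox[i] is the number of boxes per location on
    feature map i. Hypothetical helper, not a library API.
    """
    return [(n * 4, n * num_classes) for n in mbox]


# Hypothetical values: three feature maps with 3 boxes each, 80 classes.
channels = head_channels([3, 3, 3], num_classes=80)
print(channels)
```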

forward(x)

Applies network layers and ops on input image(s) x.

Parameters

x – input image or batch of images.

Returns

When self.training == True: loc and conf for each anchor box.

When self.training == False: loc and conf.sigmoid() for each anchor box.

For each layer, conf has shape [batch, num_anchor*num_classes, height, width].

For each layer, loc has shape [batch, num_anchor*4, height, width].
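
Because each layer predicts num_anchor boxes at every spatial location, the total number of anchor boxes per image follows directly from the per-layer shapes. The sketch below adds them up; count_anchor_boxes is a hypothetical helper, the feature map sizes are illustrative, and the same num_anchor is assumed for every layer.

```python
from typing import List, Tuple


def count_anchor_boxes(
    num_anchor: int, feature_sizes: List[Tuple[int, int]]
) -> int:
    """Sum num_anchor * height * width over all detection layers."""
    return sum(num_anchor * h * w for h, w in feature_sizes)


# Illustrative sizes: three detection layers with 3 anchors per location.
total = count_anchor_boxes(3, [(13, 13), (26, 26), (52, 52)])
print(total)
```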

ssds.modeling.ssds.fpn

class ssds.modeling.ssds.SSDFPN(backbone, extras, head, num_classes)

Bases: ssds.modeling.ssds.ssdsbase.SSDSBase

RetinaNet, from "Focal Loss for Dense Object Detection". See https://arxiv.org/abs/1708.02002v2 for more details.

Compared with the original implementation, the conv2d layers in the extras and head are replaced with ConvBNReLU to help the model converge more easily. BN and ReLU are not added to the transforms, because the transforms are followed by interpolation and an element-wise sum.

Parameters

backbone – backbone layers for input
extras – contains transforms and extra layers that feed to multibox loc and conf layers
head – the “multibox head”, consisting of loc and conf conv layers
num_classes – number of classes
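
The "interpolate and element-wise sum" step mentioned in the class description can be pictured on tiny 2D grids. The sketch below upsamples a coarse grid and adds it to a finer one using nested lists. It assumes nearest-neighbour interpolation with a scale factor of 2, which the documentation does not state; the function names are hypothetical and the real model operates on tensors, not lists.

```python
from typing import List

Grid = List[List[float]]


def upsample_nearest_2x(grid: Grid) -> Grid:
    """Double height and width by repeating each value (nearest neighbour)."""
    out: Grid = []
    for row in grid:
        wide = [v for v in row for _ in range(2)]  # repeat along width
        out.append(wide)
        out.append(list(wide))  # repeat along height
    return out


def fuse(lateral: Grid, coarse: Grid) -> Grid:
    """Upsample the coarse grid, then add it element-wise to the lateral grid."""
    up = upsample_nearest_2x(coarse)
    return [[a + b for a, b in zip(r1, r2)] for r1, r2 in zip(lateral, up)]


coarse = [[1.0]]                      # 1x1 coarse feature
lateral = [[0.0, 1.0], [2.0, 3.0]]    # 2x2 transformed backbone feature
fused = fuse(lateral, coarse)
print(fused)
```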

static add_extras(feature_layer, mbox, num_classes)

Define and declare the extras, loc and conf modules for the ssdfpn model.

The feature_layer is defined in cfg.MODEL.FEATURE_LAYER. For the ssdfpn model, its entries can be int, list of int, or str:

int
An int in the feature_layer selects an output feature of the backbone.
list of int
A list of int in the feature_layer also selects an output feature of the backbone: the first int is the backbone output and the second int is the upsampling branch used to fuse features.
str
A str in the feature_layer denotes an extra layer appended at the end of the backbone.

Parameters

feature_layer – the feature layers with detection head, defined by cfg.MODEL.FEATURE_LAYER
mbox – the number of boxes for each feature map
num_classes – the number of classes, defined by cfg.MODEL.NUM_CLASSES

forward(x)

Applies network layers and ops on input image(s) x.

Parameters

x – input image or batch of images.

Returns

When self.training == True: loc and conf for each anchor box.

When self.training == False: loc and conf.sigmoid() for each anchor box.

For each layer, conf has shape [batch, num_anchor*num_classes, height, width].

For each layer, loc has shape [batch, num_anchor*4, height, width].

ssds.modeling.ssds.bifpn

class ssds.modeling.ssds.SSDBiFPN(backbone, extras, head, num_classes)

Bases: ssds.modeling.ssds.ssdsbase.SSDSBase

EfficientDet: Scalable and Efficient Object Detection. See https://arxiv.org/abs/1911.09070v6 for more details.

Compared with the original implementation, the conv2d layers in the extras and head are replaced with ConvBNReLU to help the model converge more easily. BN and ReLU are not added to the transforms, because the transforms are followed by interpolation and an element-wise sum.

Parameters

backbone – backbone layers for input
extras – contains transforms, extra and stack_bifpn layers that feed to multibox loc and conf layers
head – the “multibox head”, consisting of loc and conf conv layers
num_classes – number of classes

static add_extras(feature_layer, mbox, num_classes)

Define and declare the extras, loc and conf modules for the ssdbifpn model.

The feature_layer is defined in cfg.MODEL.FEATURE_LAYER. For the ssdbifpn model, its entries can be int, list of int, or str:

int
An int in the feature_layer selects an output feature of the backbone.
list of int
A list of int in the feature_layer also selects an output feature of the backbone: the first int is the backbone output and the second int is the upsampling branch used to fuse features.
str
A str in the feature_layer denotes an extra layer appended at the end of the backbone.

Parameters

feature_layer – the feature layers with detection head, defined by cfg.MODEL.FEATURE_LAYER
mbox – the number of boxes for each feature map
num_classes – the number of classes, defined by cfg.MODEL.NUM_CLASSES

forward(x)

Applies network layers and ops on input image(s) x.

Parameters

x – input image or batch of images.

Returns

When self.training == True: loc and conf for each anchor box.

When self.training == False: loc and conf.sigmoid() for each anchor box.

For each layer, conf has shape [batch, num_anchor*num_classes, height, width].

For each layer, loc has shape [batch, num_anchor*4, height, width].