ssds.modeling.ssds

ssds.modeling.ssds.ssdsbase

class ssds.modeling.ssds.ssdsbase.SSDSBase(backbone, num_classes)[source]

Bases: torch.nn.modules.module.Module

Base class for all ssds models.

initialize_extra(layer)[source]
initialize_head(layer)[source]
initialize_prior(layer)[source]

ssds.modeling.ssds.ssd

class ssds.modeling.ssds.SSD(backbone, extras, head, num_classes)[source]

Bases: ssds.modeling.ssds.ssdsbase.SSDSBase

SSD: Single Shot MultiBox Detector. See https://arxiv.org/pdf/1512.02325.pdf for more details.

Parameters
  • backbone – backbone layers applied to the input

  • extras – extra layers that feed the multibox loc and conf layers

  • head – the “multibox head”, consisting of loc and conf conv layers

  • num_classes – number of classes

static add_extras(feature_layer, mbox, num_classes)[source]

Define and declare the extras, loc and conf modules for the ssd model.

The feature_layer is defined in cfg.MODEL.FEATURE_LAYER. For the ssd model, its entries can be int, list of int, or str:

  • int

    An int in the feature_layer refers to an output feature of the backbone.

  • str

    A str in the feature_layer refers to an extra layer appended to the end of the backbone.

Parameters
  • feature_layer – the feature layers with detection head, defined by cfg.MODEL.FEATURE_LAYER

  • mbox – the number of boxes for each feature map

  • num_classes – the number of classes, defined by cfg.MODEL.NUM_CLASSES
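
As a rough illustration of the int/str distinction described above, the sketch below splits a feature_layer list into backbone taps and appended extra layers. The helper name split_feature_layer and the sample values are hypothetical and are not taken from the ssds source; the real add_extras also builds the loc and conf modules, which this sketch does not attempt.

```python
from typing import List, Tuple, Union

Entry = Union[int, str]


def split_feature_layer(feature_layer: List[Entry]) -> Tuple[List[int], List[str]]:
    """Separate backbone output indices (int) from extra-layer markers (str).

    Hypothetical helper for illustration only; not part of ssds.
    """
    backbone_taps: List[int] = []
    extra_layers: List[str] = []
    for entry in feature_layer:
        # bool is a subclass of int in Python, so reject it explicitly.
        if isinstance(entry, bool):
            raise TypeError(f"unsupported feature_layer entry: {entry!r}")
        if isinstance(entry, int):
            backbone_taps.append(entry)   # refers to an existing backbone output
        elif isinstance(entry, str):
            extra_layers.append(entry)    # extra layer appended after the backbone
        else:
            raise TypeError(f"unsupported feature_layer entry: {entry!r}")
    return backbone_taps, extra_layers


# Made-up example values: two backbone outputs followed by two extra layers.
taps, extras = split_feature_layer([22, 34, "S", "S"])
print(taps, extras)
```

Keeping the two groups separate mirrors the description: int entries reuse features the backbone already produces, while str entries require new layers to be created.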

forward(x)[source]

Applies network layers and ops on input image(s) x.

Parameters

x – input image or batch of images.

Returns

When self.training==True, loc and conf for each anchor box;

When self.training==False, loc and conf.sigmoid() for each anchor box.

For each layer, conf has shape [batch, num_anchor*num_classes, height, width];

For each layer, loc has shape [batch, num_anchor*4, height, width].
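
The shape rules above reduce to simple arithmetic per detection layer. The following sketch computes the expected loc and conf shapes without running a model; the function name head_output_shapes and the example sizes (batch of 2, 21 classes, 38x38 and 19x19 feature maps) are assumptions chosen for illustration, not values read from an ssds config.

```python
from typing import List, Tuple

Shape = Tuple[int, int, int, int]


def head_output_shapes(
    batch: int,
    num_classes: int,
    num_anchors: List[int],
    feature_sizes: List[Tuple[int, int]],
) -> List[Tuple[Shape, Shape]]:
    """Return (loc_shape, conf_shape) for every detection layer."""
    if len(num_anchors) != len(feature_sizes):
        raise ValueError("need one anchor count per feature map")
    shapes = []
    for anchors, (height, width) in zip(num_anchors, feature_sizes):
        # loc carries 4 box coordinates per anchor.
        loc_shape = (batch, anchors * 4, height, width)
        # conf carries one score per class per anchor.
        conf_shape = (batch, anchors * num_classes, height, width)
        shapes.append((loc_shape, conf_shape))
    return shapes


shapes = head_output_shapes(
    batch=2,
    num_classes=21,
    num_anchors=[4, 6],
    feature_sizes=[(38, 38), (19, 19)],
)
for loc_shape, conf_shape in shapes:
    print("loc", loc_shape, "conf", conf_shape)
```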

ssds.modeling.ssds.yolo

class ssds.modeling.ssds.YOLOV3(backbone, extras, head, num_classes)[source]

Bases: ssds.modeling.ssds.ssdsbase.SSDSBase

YOLOv3: An Incremental Improvement. See https://arxiv.org/abs/1804.02767v1 for more details.

Parameters
  • backbone – backbone layers applied to the input

  • extras – contains the transforms and extra layers that feed the multibox loc and conf layers

  • head – the “multibox head”, consisting of loc and conf conv layers

  • num_classes – number of classes

static add_extras(feature_layer, mbox, num_classes)[source]

Define and declare the extras, loc and conf modules for the yolo v3 model.

The feature_layer is defined in cfg.MODEL.FEATURE_LAYER. For the yolo v3 model, its entries can be int, list of int, or str:

  • int

    An int in the feature_layer refers to an output feature of the backbone.

  • list of int

    A list of int in the feature_layer also refers to an output feature of the backbone: the first int is the backbone output and the second int is the upsampling branch used to fuse features.

  • str

    A str in the feature_layer refers to an extra layer appended to the end of the backbone.

Parameters
  • feature_layer – the feature layers with detection head, defined by cfg.MODEL.FEATURE_LAYER

  • mbox – the number of boxes for each feature map

  • num_classes – the number of classes, defined by cfg.MODEL.NUM_CLASSES
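
One way to picture the three entry kinds is to normalise each entry into a small record before any layers are built. This is only a sketch: LayerSpec, describe_entry and the sample indices are hypothetical names and values, and the reading of the second int as "the branch that is upsampled and fused" is an interpretation of the description above rather than something confirmed from the ssds source.

```python
from dataclasses import dataclass
from typing import List, Optional, Union

Entry = Union[int, List[int], str]


@dataclass
class LayerSpec:
    kind: str                             # "backbone", "fused" or "extra"
    backbone_index: Optional[int] = None  # set for "backbone" and "fused"
    upsample_index: Optional[int] = None  # set for "fused" only
    extra_name: Optional[str] = None      # set for "extra" only


def _is_int(value: object) -> bool:
    # bool is a subclass of int, so exclude it explicitly.
    return isinstance(value, int) and not isinstance(value, bool)


def describe_entry(entry: Entry) -> LayerSpec:
    """Map one feature_layer entry to a LayerSpec (illustrative only)."""
    if isinstance(entry, str):
        return LayerSpec(kind="extra", extra_name=entry)
    if _is_int(entry):
        return LayerSpec(kind="backbone", backbone_index=entry)
    if isinstance(entry, (list, tuple)) and len(entry) == 2:
        first, second = entry
        if _is_int(first) and _is_int(second):
            return LayerSpec(kind="fused", backbone_index=first, upsample_index=second)
    raise TypeError(f"unsupported feature_layer entry: {entry!r}")


# Made-up indices, not taken from any shipped config.
specs = [describe_entry(e) for e in [8, [6, 8], "extra"]]
for spec in specs:
    print(spec)
```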

forward(x)[source]

Applies network layers and ops on input image(s) x.

Parameters

x – input image or batch of images.

Returns

When self.training==True, loc and conf for each anchor box;

When self.training==False, loc and conf.sigmoid() for each anchor box.

For each layer, conf has shape [batch, num_anchor*num_classes, height, width];

For each layer, loc has shape [batch, num_anchor*4, height, width].
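
The difference between the two modes is only whether the confidence values pass through a sigmoid. The sketch below mimics that switch on a plain list of floats instead of a tensor; postprocess_conf is a hypothetical name, and the real forward operates on torch tensors via conf.sigmoid().

```python
import math
from typing import List


def sigmoid(value: float) -> float:
    """Numerically stable logistic function."""
    if value >= 0:
        return 1.0 / (1.0 + math.exp(-value))
    exp_value = math.exp(value)
    return exp_value / (1.0 + exp_value)


def postprocess_conf(conf_logits: List[float], training: bool) -> List[float]:
    """Return raw logits in training mode, sigmoid scores otherwise."""
    if training:
        # Training: the loss function is expected to consume raw logits.
        return list(conf_logits)
    # Evaluation: squash every logit independently into (0, 1).
    return [sigmoid(v) for v in conf_logits]


logits = [-2.0, 0.0, 3.0]
train_out = postprocess_conf(logits, training=True)
eval_out = postprocess_conf(logits, training=False)
print(train_out)
print(eval_out)
```

Because the sigmoid is applied element-wise, each class score is independent of the others, unlike a softmax over classes.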

class ssds.modeling.ssds.YOLOV4(backbone, extras, head, num_classes)[source]

Bases: ssds.modeling.ssds.ssdsbase.SSDSBase

YOLO V4 architecture. See https://arxiv.org/abs/2004.10934v1 for more details.

Parameters
  • backbone – backbone layers applied to the input

  • extras – contains the transforms, extra, and fpn layers that feed the multibox loc and conf layers

  • head – the “multibox head”, consisting of loc and conf conv layers

  • num_classes – number of classes

static add_extras(feature_layer, mbox, num_classes)[source]

Define and declare the extras, loc and conf modules for the yolo v4 model.

The feature_layer is defined in cfg.MODEL.FEATURE_LAYER. For the yolo v4 model, its entries can be int, list of int, or str:

  • int

    An int in the feature_layer refers to an output feature of the backbone.

  • list of int

    A list of int in the feature_layer also refers to an output feature of the backbone: the first int is the backbone output and the second int is the upsampling branch used to fuse features.

  • str

    A str in the feature_layer refers to an extra layer appended to the end of the backbone.

Parameters
  • feature_layer – the feature layers with detection head, defined by cfg.MODEL.FEATURE_LAYER

  • mbox – the number of boxes for each feature map

  • num_classes – the number of classes, defined by cfg.MODEL.NUM_CLASSES

forward(x)[source]

Applies network layers and ops on input image(s) x.

Parameters

x – input image or batch of images.

Returns

When self.training==True, loc and conf for each anchor box;

When self.training==False, loc and conf.sigmoid() for each anchor box.

For each layer, conf has shape [batch, num_anchor*num_classes, height, width];

For each layer, loc has shape [batch, num_anchor*4, height, width].
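
Given the per-layer conf shapes, the total number of anchor boxes predicted for one image follows directly from the channel and spatial dimensions. The sketch below does that bookkeeping; the function names and the example shapes (three layers, 80 classes, 3 anchors per location) are illustrative assumptions, not values read from an ssds config.

```python
from typing import List, Tuple

Shape = Tuple[int, int, int, int]


def anchors_per_location(conf_channels: int, num_classes: int) -> int:
    """Recover num_anchor from the channel dimension of a conf output."""
    if conf_channels % num_classes != 0:
        raise ValueError("conf channels must be a multiple of num_classes")
    return conf_channels // num_classes


def total_anchor_boxes(conf_shapes: List[Shape], num_classes: int) -> int:
    """Sum num_anchor * height * width over all detection layers."""
    total = 0
    for _batch, channels, height, width in conf_shapes:
        total += anchors_per_location(channels, num_classes) * height * width
    return total


# Assumed example: 240 channels = 3 anchors * 80 classes on each layer.
conf_shapes = [(1, 240, 52, 52), (1, 240, 26, 26), (1, 240, 13, 13)]
total = total_anchor_boxes(conf_shapes, num_classes=80)
print(total)
```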

ssds.modeling.ssds.fpn

class ssds.modeling.ssds.SSDFPN(backbone, extras, head, num_classes)[source]

Bases: ssds.modeling.ssds.ssdsbase.SSDSBase

RetinaNet, from Focal Loss for Dense Object Detection. See https://arxiv.org/abs/1708.02002v2 for more details.

Compared with the original implementation, the conv2d layers in the extra and head are changed to ConvBNReLU to help the model converge more easily. BN and ReLU are not added to the transforms because they are followed by interpolation and an element-wise sum.
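
To make the "interpolate and element-wise sum" step concrete, here is a minimal single-channel sketch of a top-down merge on nested lists. It assumes nearest-neighbour upsampling by a factor of 2, which is a common choice but is not confirmed for ssds here; upsample_nearest_2x and top_down_merge are hypothetical names, and the real model works on multi-channel torch tensors.

```python
from typing import List

Grid = List[List[float]]


def upsample_nearest_2x(grid: Grid) -> Grid:
    """Repeat every value twice along both height and width."""
    out: Grid = []
    for row in grid:
        wide = [value for value in row for _ in range(2)]
        out.append(wide)
        out.append(list(wide))  # copy so the two rows are independent
    return out


def top_down_merge(lateral: Grid, coarser: Grid) -> Grid:
    """Upsample the coarser map and add it element-wise to the lateral map."""
    upsampled = upsample_nearest_2x(coarser)
    same_height = len(upsampled) == len(lateral)
    same_width = all(len(a) == len(b) for a, b in zip(upsampled, lateral))
    if not (same_height and same_width):
        raise ValueError("lateral map must be exactly 2x the coarser map")
    return [
        [a + b for a, b in zip(lat_row, up_row)]
        for lat_row, up_row in zip(lateral, upsampled)
    ]


coarser = [[1.0, 2.0]]                      # 1 x 2 feature map
lateral = [[10.0, 10.0, 10.0, 10.0],
           [20.0, 20.0, 20.0, 20.0]]        # 2 x 4 feature map
merged = top_down_merge(lateral, coarser)
print(merged)
```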

Parameters
  • backbone – backbone layers applied to the input

  • extras – contains the transforms and extra layers that feed the multibox loc and conf layers

  • head – the “multibox head”, consisting of loc and conf conv layers

  • num_classes – number of classes

static add_extras(feature_layer, mbox, num_classes)[source]

Define and declare the extras, loc and conf modules for the ssdfpn model.

The feature_layer is defined in cfg.MODEL.FEATURE_LAYER. For the ssdfpn model, its entries can be int, list of int, or str:

  • int

    An int in the feature_layer refers to an output feature of the backbone.

  • list of int

    A list of int in the feature_layer also refers to an output feature of the backbone: the first int is the backbone output and the second int is the upsampling branch used to fuse features.

  • str

    A str in the feature_layer refers to an extra layer appended to the end of the backbone.

Parameters
  • feature_layer – the feature layers with detection head, defined by cfg.MODEL.FEATURE_LAYER

  • mbox – the number of boxes for each feature map

  • num_classes – the number of classes, defined by cfg.MODEL.NUM_CLASSES

forward(x)[source]

Applies network layers and ops on input image(s) x.

Parameters

x – input image or batch of images.

Returns

When self.training==True, loc and conf for each anchor box;

When self.training==False, loc and conf.sigmoid() for each anchor box.

For each layer, conf has shape [batch, num_anchor*num_classes, height, width];

For each layer, loc has shape [batch, num_anchor*4, height, width].

ssds.modeling.ssds.bifpn

class ssds.modeling.ssds.SSDBiFPN(backbone, extras, head, num_classes)[source]

Bases: ssds.modeling.ssds.ssdsbase.SSDSBase

EfficientDet: Scalable and Efficient Object Detection. See https://arxiv.org/abs/1911.09070v6 for more details.

Compared with the original implementation, the conv2d layers in the extra and head are changed to ConvBNReLU to help the model converge more easily. BN and ReLU are not added to the transforms because they are followed by interpolation and an element-wise sum.

Parameters
  • backbone – backbone layers applied to the input

  • extras – contains the transforms, extra, and stack_bifpn layers that feed the multibox loc and conf layers

  • head – the “multibox head”, consisting of loc and conf conv layers

  • num_classes – number of classes

static add_extras(feature_layer, mbox, num_classes)[source]

Define and declare the extras, loc and conf modules for the ssdbifpn model.

The feature_layer is defined in cfg.MODEL.FEATURE_LAYER. For the ssdbifpn model, its entries can be int, list of int, or str:

  • int

    An int in the feature_layer refers to an output feature of the backbone.

  • list of int

    A list of int in the feature_layer also refers to an output feature of the backbone: the first int is the backbone output and the second int is the upsampling branch used to fuse features.

  • str

    A str in the feature_layer refers to an extra layer appended to the end of the backbone.

Parameters
  • feature_layer – the feature layers with detection head, defined by cfg.MODEL.FEATURE_LAYER

  • mbox – the number of boxes for each feature map

  • num_classes – the number of classes, defined by cfg.MODEL.NUM_CLASSES

forward(x)[source]

Applies network layers and ops on input image(s) x.

Parameters

x – input image or batch of images.

Returns

When self.training==True, loc and conf for each anchor box;

When self.training==False, loc and conf.sigmoid() for each anchor box.

For each layer, conf has shape [batch, num_anchor*num_classes, height, width];

For each layer, loc has shape [batch, num_anchor*4, height, width].