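mmdetection (MMDET): How Object Detection Works

The Swin Transformer object detection implementation is built on mmdetection. This article uses the structure of MMDET to explain how Swin Transformer is applied to object detection. It covers only part of how MMDET works; for more on using MMDET, see the following page: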

Prerequisites - MMDetection 2.12.0 documentation

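MMDET Configuration Files (Config)

MMDET's modular and inheritance-based design is implemented by its config system. Config files live in the configs folder of MMDET. Because the project has to be installed and the folder contains many files, the full project structure is not listed here. The configs/_base_ folder contains four basic component types: dataset, model, schedule, and default_runtime.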

│  default_runtime.py

├─datasets
│      cityscapes_detection.py
│      cityscapes_instance.py
│      coco_detection.py
│      coco_instance.py
│      coco_instance_semantic.py
│      deepfashion.py
│      lvis_v0.5_instance.py
│      lvis_v1_instance.py
│      voc0712.py
│      wider_face.py

├─models
│      cascade_mask_rcnn_r50_fpn.py
│      cascade_mask_rcnn_swin_fpn.py
│      cascade_rcnn_r50_fpn.py
│      faster_rcnn_r50_caffe_c4.py
│      faster_rcnn_r50_caffe_dc5.py
│      faster_rcnn_r50_fpn.py
│      fast_rcnn_r50_fpn.py
│      mask_rcnn_r50_caffe_c4.py
│      mask_rcnn_r50_fpn.py
│      mask_rcnn_swin_fpn.py
│      retinanet_r50_fpn.py
│      rpn_r50_caffe_c4.py
│      rpn_r50_fpn.py
│      ssd300.py

└─schedules
        schedule_1x.py
        schedule_20e.py
        schedule_2x.py

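These four components are enough to build basic models such as Faster R-CNN, Mask R-CNN, Cascade R-CNN, RPN, and SSD. A config composed of components from _base_ is called a primitive config. The project also provides other models, all of which inherit from primitive configs. A new method that builds on an existing one can inherit from it; for example, a model developed on top of faster_rcnn_r50_fpn can inherit its config: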

_base_ = '../faster_rcnn/faster_rcnn_r50_fpn_1x_coco.py'

and then change only the necessary fields, following the rules of mmcv. For more on mmcv, see

Basic usage of mmcv

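Config files are named {model}_[model setting]_{backbone}_{neck}_[norm setting]_[misc]_[gpu x batch_per_gpu]_{schedule}_{dataset}, where fields in curly braces are required and fields in square brackets are optional. The authors recommend that new contributors follow the same convention.

The authors give an example config for Mask R-CNN: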
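
The naming rule can be illustrated with a small helper. This is only a sketch: `build_config_name` is a hypothetical function, not part of MMDET, and it assumes the fields are joined with underscores, as in the file name faster_rcnn_r50_fpn_1x_coco used earlier.

```python
def build_config_name(model, backbone, neck, schedule, dataset,
                      model_setting=None, norm_setting=None,
                      misc=None, gpu_batch=None):
    """Join the naming fields in order, skipping optional fields left unset."""
    parts = [model, model_setting, backbone, neck,
             norm_setting, misc, gpu_batch, schedule, dataset]
    return '_'.join(p for p in parts if p)


# Required fields only.
print(build_config_name('faster_rcnn', 'r50', 'fpn', '1x', 'coco'))
# faster_rcnn_r50_fpn_1x_coco

# With one optional field (a norm setting) filled in.
print(build_config_name('mask_rcnn', 'r50', 'fpn', '2x', 'coco',
                        norm_setting='gn'))
# mask_rcnn_r50_fpn_gn_2x_coco
```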

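model = dict(
    type='MaskRCNN',  # Name of the detector
    pretrained=
    'torchvision://resnet50',  # ImageNet-pretrained backbone to load
    backbone=dict(  # Config of the backbone
        type='ResNet',  # Backbone type, see mmdet/models/backbones/resnet.py#304
        depth=50,  # Backbone depth, usually 50 or 101 for ResNet and ResNeXt
        num_stages=4,  # Number of stages of the backbone
        out_indices=(0, 1, 2, 3),  # Indices of the stages whose output feature maps are returned
        frozen_stages=1,  # Weights of the first stage are frozen
        norm_cfg=dict(  # Config of the normalization layers
            type='BN',  # Type of norm layer, usually BN or GN
            requires_grad=True),  # Whether to train the gamma and beta of BN
        norm_eval=True,  # Whether to freeze the BN statistics
        style='pytorch'),  # Backbone style: 'pytorch' puts the stride-2 layer in the 3x3 conv, 'caffe' puts it in the 1x1 conv
    neck=dict(
        type='FPN',  # The neck is FPN; 'NASFPN', 'PAFPN', etc. are also supported
        in_channels=[256, 512, 1024, 2048],  # Input channels, matching the output channels of the backbone
        out_channels=256,  # Output channels of each level of the feature pyramid
        num_outs=5),  # Number of output scales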
    
...

    train_cfg = dict(  # Training hyperparameters for rpn and rcnn
        rpn=dict(  # Training config of rpn
            assigner=dict(  # Config of assigner
                type='MaxIoUAssigner',  # Type of assigner, MaxIoUAssigner is used for many common detectors. Refer to https://github.com/open-mmlab/mmdetection/blob/master/mmdet/core/bbox/assigners/max_iou_assigner.py#L10 for more details.
                pos_iou_thr=0.7,  # IoU >= threshold 0.7 will be taken as positive samples
                neg_iou_thr=0.3,  # IoU < threshold 0.3 will be taken as negative samples
                min_pos_iou=0.3,  # The minimal IoU threshold to take boxes as positive samples
                match_low_quality=True,  # Whether to match the boxes under low quality (see API doc for more details).
                ignore_iof_thr=-1),  # IoF threshold for ignoring bboxes
            sampler=dict(  # Config of positive/negative sampler
                type='RandomSampler',  # Type of sampler, PseudoSampler and other samplers are also supported. Refer to https://github.com/open-mmlab/mmdetection/blob/master/mmdet/core/bbox/samplers/random_sampler.py#L8 for implementation details.
                num=256,  # Number of samples
                pos_fraction=0.5,  # The ratio of positive samples in the total samples.
                neg_pos_ub=-1,  # The upper bound of negative samples based on the number of positive samples.
                add_gt_as_proposals=False),  # Whether add GT as proposals after sampling.
            allowed_border=-1,  # The border allowed after padding for valid anchors.
            pos_weight=-1,  # The weight of positive samples during training.
            debug=False),  # Whether to set the debug mode
... 
       ) 
    test_cfg = dict(  # Testing hyperparameters for rpn and rcnn
        rpn=dict(  # The config to generate proposals during testing
            nms_across_levels=False,  # Whether to do NMS for boxes across levels. Only work in `GARPNHead`, naive rpn does not support do nms cross levels.
            nms_pre=1000,  # The number of boxes before NMS
            nms_post=1000,  # The number of boxes to be kept by NMS, Only work in `GARPNHead`.
            max_per_img=1000,  # The number of boxes to be kept after NMS.
            nms=dict( # Config of nms
                type='nms',  #Type of nms
                iou_threshold=0.7 # NMS threshold
                ),
            min_bbox_size=0),  # The allowed minimal box size
        rcnn=dict(  # The config for the roi heads.
            score_thr=0.05,  # Threshold to filter out boxes
            nms=dict(  # Config of nms in the second stage
                type='nms',  # Type of nms
                iou_thr=0.5),  # NMS threshold
            max_per_img=100,  # Max number of detections of each image
            mask_thr_binary=0.5))  # Threshold of mask prediction
dataset_type = 'CocoDataset'  # Dataset type
data_root = 'data/coco/'  # Root path of the data
img_norm_cfg = dict(  # Image normalization config
    mean=[123.675, 116.28, 103.53],  # Mean values used to pre-train the pre-trained backbone models
    std=[58.395, 57.12, 57.375],  # Standard deviation used to pre-train the pre-trained backbone models
    to_rgb=True
)
train_pipeline = [  # Training pipeline
    dict(type='LoadImageFromFile'),  # Load the image from its file path
    dict(
        type='LoadAnnotations',  # Load the annotations for the current image
        with_bbox=True,  # Whether to use bounding boxes, True for detection
        with_mask=True,  # Whether to use instance masks, True for instance segmentation
        poly2mask=False),  # Whether to convert the polygon mask to instance mask, set False for acceleration and to save memory
  
  ...
]
test_pipeline = [  # Testing pipeline
    dict(type='LoadImageFromFile'),  # First pipeline to load images from file path
    dict(
        type='MultiScaleFlipAug',  # An encapsulation that encapsulates the testing augmentations
        img_scale=(1333, 800),  # Decides the largest scale for testing, used for the Resize pipeline
        flip=False,  # Whether to flip images during testing
...
            
]
data = dict(
    samples_per_gpu=2,  # Batch size of a single GPU
    workers_per_gpu=2,  # Worker processes that pre-fetch data for each GPU
    train=dict(  # Training dataset config
        type='CocoDataset',
        ann_file='data/coco/annotations/instances_train2017.json',  # Path of the annotation file
        img_prefix='data/coco/train2017/',  # Prefix of the image path
        pipeline=[  # pipeline, this is passed by the train_pipeline created before.
  ...
        ]),
    val=dict(  # Validation dataset config
        type='CocoDataset',
        ann_file='data/coco/annotations/instances_val2017.json',
        img_prefix='data/coco/val2017/',
        pipeline=[  # Pipeline is passed by test_pipeline created before
...
      
        ]),
    test=dict(  # Test dataset config, modify the ann_file for test-dev/test submission
        type='CocoDataset',
        ann_file='data/coco/annotations/instances_val2017.json',
        img_prefix='data/coco/val2017/',
        pipeline=[  # Pipeline is passed by test_pipeline created before
...
        ],
        samples_per_gpu=2  # Batch size of a single GPU used in testing
        ))
evaluation = dict(  # Config for building the evaluation hook
    interval=1,  # Evaluation interval
    metric=['bbox', 'segm'])  # Metrics used during evaluation
optimizer = dict(  # Config for building the optimizer
    type='SGD',  # Type of optimizers, refer to https://github.com/open-mmlab/mmdetection/blob/master/mmdet/core/optimizer/default_constructor.py#L13 for more details
    lr=0.02,  # Learning rate of optimizers, see detail usages of the parameters in the documentation of PyTorch
    momentum=0.9,  # Momentum
    weight_decay=0.0001)  # Weight decay of SGD
optimizer_config = dict(  # Config for building the optimizer hook
    grad_clip=None)  # Most of the methods do not use gradient clip
lr_config = dict(  # Learning rate scheduler config used to register the LrUpdater hook
    policy='step',  # The policy of scheduler, also support CosineAnnealing, Cyclic, etc. Refer to details of supported LrUpdater from https://github.com/open-mmlab/mmcv/blob/master/mmcv/runner/hooks/lr_updater.py#L9.
    warmup='linear',  # The warmup policy, also support `exp` and `constant`.
    warmup_iters=500,  # The number of iterations for warmup
    warmup_ratio=
    0.001,  # The ratio of the starting learning rate used for warmup
    step=[8, 11])  # Steps to decay the learning rate
runner = dict(type='EpochBasedRunner', max_epochs=12) # Runner that runs the workflow in total max_epochs
checkpoint_config = dict(  # Config to set the checkpoint hook, Refer to https://github.com/open-mmlab/mmcv/blob/master/mmcv/runner/hooks/checkpoint.py for implementation.
    interval=1)  # The save interval is 1
dist_params = dict(backend='nccl')  # Parameters for distributed training; the port can also be set
load_from = None  # Load a model from the given path as a pre-trained model; this does not resume training
resume_from = None  # Resume training from the checkpoint at the given path
workflow = [('train', 1)]  # Workflow for runner. [('train', 1)] means there is only one workflow and the workflow named 'train' is executed once. The workflow trains the model for 12 epochs according to max_epochs.
work_dir = 'work_dir'  # Directory to save the model checkpoints and logs for the current experiments.

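As the example shows, a single model config file contains the detector configuration, the training and testing hyperparameters, the data loading settings, the optimizer and learning rate settings, and so on.

If an inheriting config needs to discard or replace part of the base config, add _delete_=True to the corresponding dict and then write the new settings. For example, take the ResNet-based config above, whose model section reads: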
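
To make the learning rate settings concrete, the sketch below works through what lr=0.02, policy='step', step=[8, 11], warmup='linear', warmup_iters=500 and warmup_ratio=0.001 would produce. It is an approximation written for this article, not mmcv's code: the helper `lr_at` is hypothetical, the decay factor gamma=0.1 is an assumed default, and epochs and iterations are counted from 0.

```python
def lr_at(epoch, cur_iter, base_lr=0.02, step=(8, 11), gamma=0.1,
          warmup_iters=500, warmup_ratio=0.001):
    """Learning rate for a 0-based epoch and a 0-based global iteration."""
    # Step policy: multiply by gamma once for every milestone already reached.
    num_decays = sum(1 for s in step if epoch >= s)
    regular_lr = base_lr * gamma ** num_decays
    # Linear warmup: ramp from warmup_ratio * lr up to lr over warmup_iters.
    if cur_iter < warmup_iters:
        k = (1 - cur_iter / warmup_iters) * (1 - warmup_ratio)
        return regular_lr * (1 - k)
    return regular_lr


# Start of training, end of warmup, and after each decay milestone.
# The iteration numbers for the later epochs are arbitrary values past warmup.
for epoch, cur_iter in [(0, 0), (0, 250), (0, 500), (8, 60000), (11, 80000)]:
    print(epoch, cur_iter, lr_at(epoch, cur_iter))
```

Under these assumptions the rate starts at roughly 0.001 of the base value, reaches 0.02 once warmup ends, and drops by a factor of ten at epochs 8 and 11.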

model = dict(
    type='MaskRCNN',
    pretrained='torchvision://resnet50',
    backbone=dict(
        type='ResNet',
        depth=50,
        num_stages=4,
        out_indices=(0, 1, 2, 3),
        frozen_stages=1,
        norm_cfg=dict(type='BN', requires_grad=True),
        norm_eval=True,
        style='pytorch'),
    neck=dict(...),
    rpn_head=dict(...),
    roi_head=dict(...))

To switch to a different backbone, the inheriting config becomes:

_base_ = '../mask_rcnn/mask_rcnn_r50_fpn_1x_coco.py'
model = dict(
    pretrained='open-mmlab://msra/hrnetv2_w32',
    backbone=dict(
        _delete_=True,
        type='HRNet',
        extra=dict(
            stage1=dict(
                num_modules=1,
                num_branches=1,
                block='BOTTLENECK',
                num_blocks=(4, ),
                num_channels=(64, )),
            stage2=dict(
                num_modules=1,
                num_branches=2,
                block='BASIC',
                num_blocks=(4, 4),
                num_channels=(32, 64)),
            stage3=dict(
                num_modules=4,
                num_branches=3,
                block='BASIC',
                num_blocks=(4, 4, 4),
                num_channels=(32, 64, 128)),
            stage4=dict(
                num_modules=3,
                num_branches=4,
                block='BASIC',
                num_blocks=(4, 4, 4, 4),
                num_channels=(32, 64, 128, 256)))),
    neck=dict(...))
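
The effect of _delete_ can be sketched with a simplified merge function. This is an illustration of the idea only, not mmcv's implementation: `merge_config` is a hypothetical helper, and the base config is cut down to a few keys.

```python
import copy

DELETE_KEY = '_delete_'


def merge_config(child, base):
    """Return a copy of base updated with child.

    Dicts are merged key by key, unless the child dict carries
    _delete_=True, in which case it replaces the base dict entirely.
    """
    merged = copy.deepcopy(base)
    for key, value in child.items():
        if isinstance(value, dict):
            value = dict(value)  # shallow copy so the caller's dict is untouched
            delete = value.pop(DELETE_KEY, False)
            if not delete and isinstance(merged.get(key), dict):
                merged[key] = merge_config(value, merged[key])
            else:
                merged[key] = merge_config(value, {})
        else:
            merged[key] = value
    return merged


base = dict(model=dict(
    pretrained='torchvision://resnet50',
    backbone=dict(type='ResNet', depth=50, style='pytorch')))

plain = merge_config(
    dict(model=dict(backbone=dict(type='HRNet'))), base)
deleted = merge_config(
    dict(model=dict(backbone=dict(_delete_=True, type='HRNet'))), base)

print(plain['model']['backbone'])    # ResNet-only keys such as depth survive
print(deleted['model']['backbone'])  # only the keys written in the child remain
```

Without _delete_, ResNet-specific keys such as depth would be carried into the HRNet backbone dict, which is why the flag is needed when the backbone type changes.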

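Note also that variables can be passed around inside a config file. In the ResNet example above, the pipeline is written out a second time inside data, but writing train=dict(pipeline=train_pipeline) is enough.

Datasets and Data Pipelines

MMDET natively supports datasets in COCO format and PASCAL VOC format. A new dataset can either be converted to one of these formats offline or converted online during training. Offline conversion is preferable, because then only the annotation paths and the classes in the config need to change.

Data pipelines and datasets are decoupled. Typically, the dataset defines how annotations are processed, while the data pipeline defines all the steps that prepare the data dict. A pipeline consists of a sequence of operations; each operation takes a dict as input and outputs a dict for the next transform. As the figure shows, the operations fall into data loading, pre-processing, formatting, and test-time augmentation.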
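
A minimal sketch of this variable reuse follows. The pipelines are shortened to a couple of steps purely for illustration; a real config would list every transform.

```python
# Pipelines defined once as ordinary Python variables (shortened here).
train_pipeline = [
    dict(type='LoadImageFromFile'),
    dict(type='LoadAnnotations', with_bbox=True, with_mask=True),
]
test_pipeline = [
    dict(type='LoadImageFromFile'),
]

# The data section refers to the variables instead of repeating the lists.
data = dict(
    train=dict(pipeline=train_pipeline),
    val=dict(pipeline=test_pipeline),
    test=dict(pipeline=test_pipeline))
```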
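
As a sketch of what offline conversion produces, the snippet below builds a minimal COCO-style annotation structure. The image name, category, and box coordinates are made up for illustration, and `xyxy_to_coco_bbox` is a hypothetical helper; the point is the three top-level lists and the [x, y, width, height] box layout.

```python
import json


def xyxy_to_coco_bbox(x1, y1, x2, y2):
    """COCO stores a box as [x, y, width, height], (x, y) being the top-left corner."""
    return [x1, y1, x2 - x1, y2 - y1]


coco = dict(
    images=[dict(id=1, file_name='000001.jpg', width=640, height=480)],
    categories=[dict(id=1, name='cat')],
    annotations=[dict(
        id=1, image_id=1, category_id=1,
        bbox=xyxy_to_coco_bbox(100, 120, 300, 360),
        area=200 * 240,  # box area in pixels
        iscrowd=0)],
)

# Serialize to the JSON text that would be saved as the annotation file.
annotation_json = json.dumps(coco, indent=2)
print(coco['annotations'][0]['bbox'])  # [100, 120, 200, 240]
```

A file written this way would then be referenced through ann_file and img_prefix, as in the data section of the config shown earlier.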

[Figure: data pipeline operations]

The corresponding config code is:

img_norm_cfg = dict(
    mean=[123.675, 116.28, 103.53], std=[58.395, 57.12, 57.375], to_rgb=True)
train_pipeline = [
    dict(type='LoadImageFromFile'),
    dict(type='LoadAnnotations', with_bbox=True),
    dict(type='Resize', img_scale=(1333, 800), keep_ratio=True),
    dict(type='RandomFlip', flip_ratio=0.5),
    dict(type='Normalize', **img_norm_cfg),
    dict(type='Pad', size_divisor=32),
    dict(type='DefaultFormatBundle'),
    dict(type='Collect', keys=['img', 'gt_bboxes', 'gt_labels']),
]
test_pipeline = [
    dict(type='LoadImageFromFile'),
    dict(
        type='MultiScaleFlipAug',
        img_scale=(1333, 800),
        flip=False,
        transforms=[
            dict(type='Resize', keep_ratio=True),
            dict(type='RandomFlip'),
            dict(type='Normalize', **img_norm_cfg),
            dict(type='Pad', size_divisor=32),
            dict(type='ImageToTensor', keys=['img']),
            dict(type='Collect', keys=['img']),
        ])
]
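
The dict-in, dict-out contract between pipeline steps can be sketched with toy transforms. These classes are simplified stand-ins written for this article, not MMDET's implementations: they track only image shapes rather than pixel data, and the rounding rule in `Resize` is an assumption about how keep_ratio rescaling behaves.

```python
import math


class Resize:
    """Rescale so the image fits inside img_scale while keeping its aspect ratio."""

    def __init__(self, img_scale):
        self.long_edge, self.short_edge = max(img_scale), min(img_scale)

    def __call__(self, results):
        h, w = results['img_shape']
        scale = min(self.long_edge / max(h, w), self.short_edge / min(h, w))
        results['img_shape'] = (int(h * scale + 0.5), int(w * scale + 0.5))
        results['scale_factor'] = scale
        return results


class Pad:
    """Round height and width up to a multiple of size_divisor."""

    def __init__(self, size_divisor):
        self.size_divisor = size_divisor

    def __call__(self, results):
        h, w = results['img_shape']
        d = self.size_divisor
        results['pad_shape'] = (math.ceil(h / d) * d, math.ceil(w / d) * d)
        return results


class Collect:
    """Keep only the listed keys for the next consumer."""

    def __init__(self, keys):
        self.keys = keys

    def __call__(self, results):
        return {k: results[k] for k in self.keys}


def run_pipeline(transforms, results):
    """Feed the dict through each transform in order."""
    for transform in transforms:
        results = transform(results)
    return results


pipeline = [Resize(img_scale=(1333, 800)), Pad(size_divisor=32),
            Collect(keys=['img_shape', 'pad_shape'])]
out = run_pipeline(pipeline, {'img_shape': (480, 640), 'filename': 'demo.jpg'})
print(out)  # {'img_shape': (800, 1067), 'pad_shape': (800, 1088)}
```

Each step reads the keys it needs and adds or updates keys for later steps, which is why the order of the entries in the pipeline list matters.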

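Models

Model components fall into five types: backbone, neck, head, roi extractor, and loss.

To build a custom model, each of these five parts has to be implemented. All five are implemented in a similar way; take the backbone as an example.

First, create a new backbone file under mmdet/models/backbones/, for example mmdet/models/backbones/mobilenet.py, and write a class in it that implements the backbone.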

import torch.nn as nn

from ..builder import BACKBONES

@BACKBONES.register_module()
class MobileNet(nn.Module):

    def __init__(self, arg1, arg2):
        pass

    def forward(self, x):  # should return a tuple
        pass

Then import the module. There are two ways to do this. One is to add the import directly in mmdet/models/backbones/__init__.py:

from .mobilenet import MobileNet

The other is to add the following to the config file:

custom_imports = dict(
    imports=['mmdet.models.backbones.mobilenet'],
    allow_failed_imports=False)

Finally, use the new backbone in the config file:

model = dict(
    ...
    backbone=dict(
        type='MobileNet',
        arg1=xxx,
        arg2=xxx),
    ...
)
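
Why the import step matters can be seen from a toy registry. The class below is a simplified sketch written for this article, not mmcv's Registry: the decorator records the class under its name when the module is imported, and the type string in the config is later used to look the class up. The `MobileNet` here is a plain class with placeholder arguments.

```python
class Registry:
    """Toy registry mapping a class name to the class itself."""

    def __init__(self, name):
        self.name = name
        self._modules = {}

    def register_module(self):
        def decorator(cls):
            # Runs when the defining module is imported.
            self._modules[cls.__name__] = cls
            return cls
        return decorator

    def build(self, cfg):
        cfg = dict(cfg)  # copy so the caller's config is not modified
        cls = self._modules[cfg.pop('type')]
        return cls(**cfg)


BACKBONES = Registry('backbone')


@BACKBONES.register_module()
class MobileNet:
    def __init__(self, arg1, arg2):
        self.arg1, self.arg2 = arg1, arg2


backbone = BACKBONES.build(dict(type='MobileNet', arg1=1, arg2=2))
print(type(backbone).__name__)  # MobileNet
```

If the module defining the class is never imported, the decorator never runs and the lookup by type fails, which is the problem that both import methods above are meant to prevent.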
