Class 7： MMDetection代码课

Class7： MMDetection代码课

文章目录

Class7： MMDetection代码课
@[toc]
依赖&安装
依赖

数据集准备
RTMDet
Config
模型配置
数据集和评测器配置
训练和测试的配置
优化相关配置

训练
测试以及推理
可视化分析
特征图可视化
Grad-Based CAM 可视化

检测新趋势
总结
总结

课程：MMDetection代码课_哔哩哔哩_bilibili

Repo：open-mmlab/mmdetection: OpenMMLab Detection Toolbox and Benchmark (github.com)

依赖&安装

依赖

基于 PyTorch 官方说明安装 PyTorch。

在 GPU 平台上：

conda install pytorch torchvision -c pytorch

在 CPU 平台上：

conda install pytorch torchvision cpuonly -c pytorch

使用 MIM 安装 MMEngine 和 MMCV。

pip install -U openmim
mim install mmengine
mim install "mmcv>=2.0.0"

安装 MMDetection。

方案 a：如果开发并直接运行 mmdet，从源码安装它：

git clone https://github.com/open-mmlab/mmdetection.git
cd mmdetection
pip install -v -e .
# "-v" 指详细说明，或更多的输出
# "-e" 表示在可编辑模式下安装项目，因此对代码所做的任何本地修改都会生效，从而无需重新安装。

方案 b：如果将 mmdet 作为依赖或第三方 Python 包，使用 MIM 安装：

mim install mmdet

数据集准备

Cat 数据集是一个包括 144 张图片的单类别数据集, 包括了训练所需的标注信息。样例图片如下所示：

cat dataset

RTMDet

拥有一个高效的模型结构是设计实时目标检测器最关键的问题之一。自 YOLOv4 将 Cross Stage Partial Network 的结构引入 DarkNet 之后，CSPDarkNet 因其简洁高效而被广泛应用于 YOLO 系列的各个改进版中。

而 RTMDet 也将 CSPDarkNet 作为基线，并使用同样构建单元组成的 CSPPAFPN 进行多尺度的特征融合，最后将特征输入给不同的检测头，进行目标检测、实例分割和旋转框检测等任务。整体的模型结构如下图所示：

RTMDet 由 tiny/s/m/l/x 一系列不同大小的模型组成，为不同的应用场景提供了不同的选择。
其中，RTMDet-x 在 52.6 mAP 的精度下达到了 300+ FPS 的推理速度。

注：推理速度和精度测试（不包含 NMS）是在 1 块 NVIDIA 3090 GPU 上的 `TensorRT 8.4.3, cuDNN 8.2.0, FP16, batch size=1 条件里测试的。

而最轻量的模型 RTMDet-tiny，在仅有 4M 参数量的情况下也能够达到 40.9 mAP，且推理速度 < 1 ms。

Config

MMDetection 和其他 OpenMMLab 仓库使用 MMEngine 的配置文件系统。配置文件使用了模块化和继承设计，以便于进行各类实验。

模型配置

model = dict(
    type='RTMDet',  # The name of detector
    data_preprocessor=dict(  # The config of data preprocessor, usually includes image normalization and padding
        type='DetDataPreprocessor',  # The type of the data preprocessor. Refer to https://mmdetection.readthedocs.io/en/latest/api.html#mmdet.models.data_preprocessors.DetDataPreprocessor
        mean=[103.53, 116.28, 123.675],  # Mean values used to pre-training the pre-trained backbone models, ordered in R, G, B
        std=[57.375, 57.12, 58.395],  # Standard variance used to pre-training the pre-trained backbone models, ordered in R, G, B
        bgr_to_rgb=False,  # whether to convert image from BGR to RGB
        batch_augments=None),  # Batch-level augmentations
    backbone=dict(  # The config of backbone
        type='CSPNeXt',  # The type of backbone network. Refer to https://mmdetection.readthedocs.io/en/latest/api.html#mmdet.models.backbones.CSPNeXt
        arch='P5',  # Architecture of CSPNeXt, from {P5, P6}. Defaults to P5
        expand_ratio=0.5,  # Ratio to adjust the number of channels of the hidden layer. Defaults to 0.5
        deepen_factor=1,  # Depth multiplier, multiply number of blocks in CSP layer by this amount. Defaults to 1.0
        widen_factor=1,  # Width multiplier, multiply number of channels in each layer by this amount. Defaults to 1.0
        channel_attention=True,  # Whether to add channel attention in each stage. Defaults to True
        norm_cfg=dict(type='SyncBN'),  # Dictionary to construct and config norm layer. Defaults to dict(type=’BN’, requires_grad=True)
        act_cfg=dict(type='SiLU', inplace=True)),  # Config dict for activation layer. Defaults to dict(type=’SiLU’)
    neck=dict(
        type='CSPNeXtPAFPN',  # The type of neck is CSPNeXtPAFPN. Refer to https://mmdetection.readthedocs.io/en/latest/api.html#mmdet.models.necks.CSPNeXtPAFPN
        in_channels=[256, 512, 1024],  # Number of input channels per scale
        out_channels=256,  # Number of output channels (used at each scale)
        num_csp_blocks=3,  # Number of bottlenecks in CSPLayer. Defaults to 3
        expand_ratio=0.5,  # Ratio to adjust the number of channels of the hidden layer. Default: 0.5
        norm_cfg=dict(type='SyncBN'),  # Config dict for normalization layer. Default: dict(type=’BN’)
        act_cfg=dict(type='SiLU', inplace=True)),  # Config dict for activation layer. Default: dict(type=’Swish’)
    bbox_head=dict(
        type='RTMDetSepBNHead',  # The type of bbox_head is RTMDetSepBNHead. RTMDetHead with separated BN layers and shared conv layers. Refer to https://mmdetection.readthedocs.io/en/latest/api.html#mmdet.models.dense_heads.RTMDetSepBNHead
        num_classes=80,  # Number of categories excluding the background category
        in_channels=256,  # Number of channels in the input feature map
        stacked_convs=2,  # Whether to share conv layers between stages. Defaults to True
        feat_channels=256,  # Feature channels of convolutional layers in the head
        anchor_generator=dict(  # The config of anchor generator
            type='MlvlPointGenerator',  # The methods use MlvlPointGenerator. Refer to https://github.com/open-mmlab/mmdetection/blob/main/mmdet/models/task_modules/prior_generators/point_generator.py#L92
            offset=0,  # The offset of points, the value is normalized with corresponding stride. Defaults to 0.5
            strides=[8, 16, 32]),  # Strides of anchors in multiple feature levels in order (w, h)
        bbox_coder=dict(type='DistancePointBBoxCoder'),  # Distance Point BBox coder.This coder encodes gt bboxes (x1, y1, x2, y2) into (top, bottom, left,right) and decode it back to the original. Refer to https://github.com/open-mmlab/mmdetection/blob/main/mmdet/models/task_modules/coders/distance_point_bbox_coder.py#L9
        loss_cls=dict(  # Config of loss function for the classification branch
            type='QualityFocalLoss',  # Type of loss for classification branch. Refer to https://mmdetection.readthedocs.io/en/latest/api.html#mmdet.models.losses.QualityFocalLoss
            use_sigmoid=True,  # Whether sigmoid operation is conducted in QFL. Defaults to True
            beta=2.0,  # The beta parameter for calculating the modulating factor. Defaults to 2.0
            loss_weight=1.0),  #  Loss weight of current loss
        loss_bbox=dict(  # Config of loss function for the regression branch
            type='GIoULoss',  # Type of loss. Refer to https://mmdetection.readthedocs.io/en/latest/api.html#mmdet.models.losses.GIoULoss
            loss_weight=2.0),  # Loss weight of the regression branch
        with_objectness=False,  # Whether to add an objectness branch. Defaults to True
        exp_on_reg=True,  # Whether to use .exp() in regression
        share_conv=True,  # Whether to share conv layers between stages. Defaults to True
        pred_kernel_size=1,  # Kernel size of prediction layer. Defaults to 1
        norm_cfg=dict(type='SyncBN'),  # Config dict for normalization layer. Defaults to dict(type='BN', momentum=0.03, eps=0.001)
        act_cfg=dict(type='SiLU', inplace=True)),  # Config dict for activation layer. Defaults to dict(type='SiLU')
    train_cfg=dict(  # Config of training hyperparameters for ATSS
        assigner=dict(  # Config of assigner
            type='DynamicSoftLabelAssigner',   # Type of assigner. DynamicSoftLabelAssigner computes matching between predictions and ground truth with dynamic soft label assignment. Refer to https://github.com/open-mmlab/mmdetection/blob/main/mmdet/models/task_modules/assigners/dynamic_soft_label_assigner.py#L40
            topk=13),  # Select top-k predictions to calculate dynamic k best matches for each gt. Defaults to 13
        allowed_border=-1,  # The border allowed after padding for valid anchors
        pos_weight=-1,  # The weight of positive samples during training
        debug=False),  # Whether to set the debug mode
    test_cfg=dict(  # Config for testing hyperparameters for ATSS
        nms_pre=30000,  # The number of boxes before NMS
        min_bbox_size=0,  # The allowed minimal box size
        score_thr=0.001,  # Threshold to filter out boxes
        nms=dict(  # Config of NMS in the second stage
            type='nms',  # Type of NMS
            iou_threshold=0.65),  # NMS threshold
        max_per_img=300),  # Max number of detections of each image
)

数据集和评测器配置

数据 dataloader 需要设置数据集（dataset）和数据处理流程（data pipeline）。由于这部分的配置较为复杂，使用中间变量来简化 dataloader 配置的编写。

dataset_type = 'CocoDataset'  # 数据集类型，这将被用来定义数据集。
data_root = 'data/coco/'  # 数据的根路径。

train_pipeline = [  # 训练数据处理流程
    dict(type='LoadImageFromFile'),  # 第 1 个流程，从文件路径里加载图像。
    dict(
        type='LoadAnnotations',  # 第 2 个流程，对于当前图像，加载它的注释信息。
        with_bbox=True,  # 是否使用标注框(bounding box)， 目标检测需要设置为 True。
        with_mask=True,  # 是否使用 instance mask，实例分割需要设置为 True。
        poly2mask=False),  # 是否将 polygon mask 转化为 instance mask, 设置为 False 以加速和节省内存。
    dict(
        type='Resize',  # 变化图像和其标注大小的流程。
        scale=(1333, 800),  # 图像的最大尺寸
        keep_ratio=True  # 是否保持图像的长宽比。
        ),
    dict(
        type='RandomFlip',  # 翻转图像和其标注的数据增广流程。
        prob=0.5),  # 翻转图像的概率。
    dict(type='PackDetInputs')  # 将数据转换为检测器输入格式的流程
]
test_pipeline = [  # 测试数据处理流程
    dict(type='LoadImageFromFile'),  # 第 1 个流程，从文件路径里加载图像。
    dict(type='Resize', scale=(1333, 800), keep_ratio=True),  # 变化图像大小的流程。
    dict(
        type='PackDetInputs',  # 将数据转换为检测器输入格式的流程
        meta_keys=('img_id', 'img_path', 'ori_shape', 'img_shape',
                   'scale_factor'))
]
train_dataloader = dict(  # 训练 dataloader 配置
    batch_size=2,  # 单个 GPU 的 batch size
    num_workers=2,  # 单个 GPU 分配的数据加载线程数
    persistent_workers=True,  # 如果设置为 True，dataloader 在迭代完一轮之后不会关闭数据读取的子进程，可以加速训练
    sampler=dict(  # 训练数据的采样器
        type='DefaultSampler',  # 默认的采样器，同时支持分布式和非分布式训练。请参考 https://mmengine.readthedocs.io/zh_CN/latest/api/generated/mmengine.dataset.DefaultSampler.html#mmengine.dataset.DefaultSampler
        shuffle=True),  # 随机打乱每个轮次训练数据的顺序
    batch_sampler=dict(type='AspectRatioBatchSampler'),  # 批数据采样器，用于确保每一批次内的数据拥有相似的长宽比，可用于节省显存
    dataset=dict(  # 训练数据集的配置
        type=dataset_type,
        data_root=data_root,
        ann_file='annotations/instances_train2017.json',  # 标注文件路径
        data_prefix=dict(img='train2017/'),  # 图片路径前缀
        filter_cfg=dict(filter_empty_gt=True, min_size=32),  # 图片和标注的过滤配置
        pipeline=train_pipeline))  # 这是由之前创建的 train_pipeline 定义的数据处理流程。
val_dataloader = dict(  # 验证 dataloader 配置
    batch_size=1,  # 单个 GPU 的 Batch size。如果 batch-szie > 1，组成 batch 时的额外填充会影响模型推理精度
    num_workers=2,  # 单个 GPU 分配的数据加载线程数
    persistent_workers=True,  # 如果设置为 True，dataloader 在迭代完一轮之后不会关闭数据读取的子进程，可以加速训练
    drop_last=False,  # 是否丢弃最后未能组成一个批次的数据
    sampler=dict(
        type='DefaultSampler',
        shuffle=False),  # 验证和测试时不打乱数据顺序
    dataset=dict(
        type=dataset_type,
        data_root=data_root,
        ann_file='annotations/instances_val2017.json',
        data_prefix=dict(img='val2017/'),
        test_mode=True,  # 开启测试模式，避免数据集过滤图片和标注
        pipeline=test_pipeline))
test_dataloader = val_dataloader  # 测试 dataloader 配置

用于计算训练模型在验证和测试数据集上的指标。评测器的配置由一个或一组评价指标（Metric）配置组成：

val_evaluator = dict(  # 验证过程使用的评测器
    type='CocoMetric',  # 用于评估检测和实例分割的 AR、AP 和 mAP 的 coco 评价指标
    ann_file=data_root + 'annotations/instances_val2017.json',  # 标注文件路径
    metric=['bbox', 'segm'],  # 需要计算的评价指标，`bbox` 用于检测，`segm` 用于实例分割
    format_only=False)
test_evaluator = val_evaluator  # 测试过程使用的评测器

由于测试数据集没有标注文件，因此 MMDetection 中的 test_dataloader 和 test_evaluator 配置通常等于val。如果要保存在测试数据集上的检测结果，则可以像这样编写配置：

# 在测试集上推理，
# 并将检测结果转换格式以用于提交结果
test_dataloader = dict(
    batch_size=1,
    num_workers=2,
    persistent_workers=True,
    drop_last=False,
    sampler=dict(type='DefaultSampler', shuffle=False),
    dataset=dict(
        type=dataset_type,
        data_root=data_root,
        ann_file=data_root + 'annotations/image_info_test-dev2017.json',
        data_prefix=dict(img='test2017/'),
        test_mode=True,
        pipeline=test_pipeline))
test_evaluator = dict(
    type='CocoMetric',
    ann_file=data_root + 'annotations/image_info_test-dev2017.json',
    metric=['bbox', 'segm'],
    format_only=True,  # 只将模型输出转换为 coco 的 JSON 格式并保存
    outfile_prefix='./work_dirs/coco_detection/test')  # 要保存的 JSON 文件的前缀

训练和测试的配置

MMEngine 的 Runner 使用 Loop 来控制训练，验证和测试过程。用户可以使用这些字段设置最大训练轮次和验证间隔。

train_cfg = dict(
    type='EpochBasedTrainLoop',  # 训练循环的类型，请参考 https://github.com/open-mmlab/mmengine/blob/main/mmengine/runner/loops.py
    max_epochs=12,  # 最大训练轮次
    val_interval=1)  # 验证间隔。每个 epoch 验证一次
val_cfg = dict(type='ValLoop')  # 验证循环的类型
test_cfg = dict(type='TestLoop')  # 测试循环的类型

优化相关配置

optim_wrapper 是配置优化相关设置的字段。优化器封装（OptimWrapper）不仅提供了优化器的功能，还支持梯度裁剪、混合精度训练等功能。更多内容请看优化器封装教程。

optim_wrapper = dict(  # 优化器封装的配置
    type='OptimWrapper',  # 优化器封装的类型。可以切换至 AmpOptimWrapper 来启用混合精度训练
    optimizer=dict(  # 优化器配置。支持 PyTorch 的各种优化器。请参考 https://pytorch.org/docs/stable/optim.html#algorithms
        type='SGD',  # 随机梯度下降优化器
        lr=0.02,  # 基础学习率
        momentum=0.9,  # 带动量的随机梯度下降
        weight_decay=0.0001),  # 权重衰减
    clip_grad=None,  # 梯度裁剪的配置，设置为 None 关闭梯度裁剪。使用方法请见 https://mmengine.readthedocs.io/en/latest/tutorials/optimizer.html
    )

param_scheduler 字段用于配置参数调度器（Parameter Scheduler）来调整优化器的超参数（例如学习率和动量）。用户可以组合多个调度器来创建所需的参数调整策略。

param_scheduler = [
    dict(
        type='LinearLR',  # 使用线性学习率预热
        start_factor=0.001, # 学习率预热的系数
        by_epoch=False,  # 按 iteration 更新预热学习率
        begin=0,  # 从第一个 iteration 开始
        end=500),  # 到第 500 个 iteration 结束
    dict(
        type='MultiStepLR',  # 在训练过程中使用 multi step 学习率策略
        by_epoch=True,  # 按 epoch 更新学习率
        begin=0,   # 从第一个 epoch 开始
        end=12,  # 到第 12 个 epoch 结束
        milestones=[8, 11],  # 在哪几个 epoch 进行学习率衰减
        gamma=0.1)  # 学习率衰减系数
]

训练

为了使用新的配置方法来对模型进行训练，你只需要运行如下命令。

python tools/train.py configs/balloon/mask-rcnn_r50-caffe_fpn_ms-poly-1x_balloon.py

测试以及推理

为了测试训练完毕的模型，只需要运行如下命令。

python tools/test.py configs/balloon/mask-rcnn_r50-caffe_fpn_ms-poly-1x_balloon.py work_dirs/mask-rcnn_r50-caffe_fpn_ms-poly-1x_balloon/epoch_12.pth

可视化分析

可视化分析包括特征图可视化以及类似 Grad CAM 等可视化分析手段。不过由于 MMDetection 中还没有实现，我们可以直接采用 MMYOLO 中提供的功能和脚本。MMYOLO 是基于 MMDetection 开发，并且此案有了统一的代码组织形式，因此 MMDetection 配置可以直接在 MMYOLO 中使用，无需更改配置。

特征图可视化

MMYOLO 中，将使用 MMEngine 提供的 Visualizer 可视化器进行特征图可视化，其具备如下功能：

支持基础绘图接口以及特征图可视化。
支持选择模型中的不同层来得到特征图，包含 squeeze_mean ， select_max ， topk 三种显示方式，用户还可以使用 arrangement 自定义特征图显示的布局方式。

你可以调用 demo/featmap_vis_demo.py 来简单快捷地得到可视化结果，为了方便理解，将其主要参数的功能梳理如下：

img：选择要用于特征图可视化的图片，支持单张图片或者图片路径列表。
config：选择算法的配置文件。
checkpoint：选择对应算法的权重文件。
--out-file：将得到的特征图保存到本地，并指定路径和文件名。
--device：指定用于推理图片的硬件，--device cuda：0 表示使用第 1 张 GPU 推理，--device cpu 表示用 CPU 推理。
--score-thr：设置检测框的置信度阈值，只有置信度高于这个值的框才会显示。
--preview-model：可以预览模型，方便用户理解模型的特征层结构。
--target-layers：对指定层获取可视化的特征图。
- 可以单独输出某个层的特征图，例如： --target-layers backbone , --target-layers neck , --target-layers backbone.stage4 等。
- 参数为列表时，也可以同时输出多个层的特征图，例如： --target-layers backbone.stage4 neck 表示同时输出 backbone 的 stage4 层和 neck 的三层一共四层特征图。
--channel-reduction：输入的 Tensor 一般是包括多个通道的，channel_reduction 参数可以将多个通道压缩为单通道，然后和图片进行叠加显示，有以下三个参数可以设置：
- squeeze_mean：将输入的 C 维度采用 mean 函数压缩为一个通道，输出维度变成 (1, H, W)。
- select_max：将输入先在空间维度 sum，维度变成 (C, )，然后选择值最大的通道。
- None：表示不需要压缩，此时可以通过 topk 参数可选择激活度最高的 topk 个特征图显示。
--topk：只有在 channel_reduction 参数为 None 的情况下， topk 参数才会生效，其会按照激活度排序选择 topk 个通道，然后和图片进行叠加显示，并且此时会通过 --arrangement 参数指定显示的布局，该参数表示为一个数组，两个数字需要以空格分开，例如： --topk 5 --arrangement 2 3 表示以 2行 3列 显示激活度排序最高的 5 张特征图， --topk 7 --arrangement 3 3 表示以 3行 3列 显示激活度排序最高的 7 张特征图。
- 如果 topk 不是 -1，则会按照激活度排序选择 topk 个通道显示。
- 如果 topk = -1，此时通道 C 必须是 1 或者 3 表示输入数据是图片，否则报错提示用户应该设置 channel_reduction 来压缩通道。
考虑到输入的特征图通常非常小，函数默认将特征图进行上采样后方便进行可视化。

注意：当图片和特征图尺度不一样时候，draw_featmap 函数会自动进行上采样对齐。如果你的图片在推理过程中前处理存在类似 Pad 的操作此时得到的特征图也是 Pad 过的，那么直接上采样就可能会出现不对齐问题。

Grad-Based CAM 可视化

由于目标检测的特殊性，这里实际上可视化的并不是 CAM 而是 Grad Box AM。使用前需要先安装 grad-cam 库

!pip install “grad-cam”
(a) 查看 neck 输出的最小输出特征图的 Grad CAM
!python demo/boxam_vis_demo.py
resized_image.jpg
…/mmdetection/rtmdet_tiny_1xb12-40e_cat.py
…/mmdetection/work_dirs/rtmdet_tiny_1xb12-40e_cat/best_coco/bbox_mAP_epoch_30.pth
–target-layer neck.out_convs[2]
Image.open(‘output/resized_image.jpg’)
可以看出效果较好

(b) 查看 neck 输出的最大输出特征图的 Grad CAM
!python demo/boxam_vis_demo.py
resized_image.jpg
…/mmdetection/rtmdet_tiny_1xb12-40e_cat.py
…/mmdetection/work_dirs/rtmdet_tiny_1xb12-40e_cat/best_coco/bbox_mAP_epoch_30.pth
–target-layer neck.out_convs[0]
Image.open(‘output/resized_image.jpg’)
可以发现由于大物体不会在该层预测，因此梯度可视化是 0，符合预期。