💡💡💡Exclusive improvements in this article: we first reproduce the integration of EMA into RT-DETR, then combine it with different modules in novel ways: 1) several combinations with RepC3; 2) inserting it directly as an attention module at different positions in the network.
NEU-DET steel surface defect detection results:
original rtdetr-r18: mAP@0.5 = 0.67
rtdetr-r18-EMA_attention: mAP@0.5 = 0.691
rtdetr-r18-EMA_attentionC3: mAP@0.5 = 0.72
rtdetr-r18-EMA_attentionC31: mAP@0.5 = 0.718
Introduction to the RT-DETR Magician column:
https://blog.csdn.net/m0_63774211/category_12497375.html
✨✨✨Creative modifications to RT-DETR
🚀🚀🚀Bringing cutting-edge top-conference innovations into RT-DETR
🍉🍉🍉Built on ultralytics, integrating seamlessly with YOLO
1. Introduction to the Steel Defect Dataset
The NEU-DET steel surface defect dataset contains six classes: 'crazing', 'inclusion', 'patches', 'pitted_surface', 'rolled-in_scale', 'scratches'.
The distribution of each class is:
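Training on NEU-DET with ultralytics requires a dataset YAML. Below is a minimal sketch, assuming the images and labels have been converted to YOLO format; the paths are hypothetical and must be adjusted to your local layout:

```yaml
# NEU-DET.yaml -- hypothetical dataset config; adjust paths to your layout
path: ./datasets/NEU-DET
train: images/train
val: images/val

names:
  0: crazing
  1: inclusion
  2: patches
  3: pitted_surface
  4: rolled-in_scale
  5: scratches
```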
2. The EMA Attention Module
Paper: https://arxiv.org/abs/2305.13563v1
Accepted at: ICASSP 2023
Modeling cross-channel relationships via channel dimensionality reduction may have side effects on extracting deep visual representations. The paper proposes a novel Efficient Multi-scale Attention (EMA) module. To preserve the information in each channel while reducing computational overhead, it reshapes part of the channels into the batch dimension and groups the channel dimension into multiple sub-features, so that spatial semantic features are evenly distributed within each feature group.
EMA requires no dimensionality reduction. Note that only two convolution kernels are placed in the parallel sub-networks: one sub-network uses a 1x1 kernel, processed in the same way as CA (Coordinate Attention), and the other uses a 3x3 kernel. To demonstrate EMA's generality, detailed experiments are reported in Section 4 of the paper, covering the CIFAR-100, ImageNet-1k, COCO, and VisDrone2019 benchmarks; Figure 1 of the paper shows results for image classification and object detection. The paper proposes a cross-spatial learning method and designs a multi-scale parallel sub-network that models both short- and long-range dependencies. The main contributions are:
1) A general approach that reshapes part of the channel dimension into the batch dimension, avoiding any form of dimensionality reduction via generic convolutions.
2) Besides building local cross-channel interactions within each parallel sub-network without channel dimensionality reduction, the output feature maps of the two parallel sub-networks are fused via cross-spatial learning.
3) Compared with CBAM, NAM[16], SA, ECA, and CA, EMA not only achieves better results but is also more parameter-efficient.
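The mechanism described above can be sketched in PyTorch. The following is a minimal re-implementation written from the paper's description (grouped channels folded into the batch dimension, a 1x1 branch handled CA-style, a parallel 3x3 branch, and cross-spatial fusion of the two); the `factor` grouping parameter is a common default and an assumption here, and this is not necessarily the exact module used in this column's source code.

```python
import torch
import torch.nn as nn

class EMA(nn.Module):
    """Efficient Multi-scale Attention, sketched from the EMA paper."""

    def __init__(self, channels, factor=8):
        super().__init__()
        self.groups = factor
        assert channels % self.groups == 0
        c_g = channels // self.groups
        self.softmax = nn.Softmax(dim=-1)
        self.agp = nn.AdaptiveAvgPool2d((1, 1))       # global pooling
        self.pool_h = nn.AdaptiveAvgPool2d((None, 1))  # pool along width
        self.pool_w = nn.AdaptiveAvgPool2d((1, None))  # pool along height
        self.gn = nn.GroupNorm(c_g, c_g)
        self.conv1x1 = nn.Conv2d(c_g, c_g, kernel_size=1)
        self.conv3x3 = nn.Conv2d(c_g, c_g, kernel_size=3, padding=1)

    def forward(self, x):
        b, c, h, w = x.size()
        # Fold groups into the batch dimension: no channel reduction needed.
        group_x = x.reshape(b * self.groups, -1, h, w)
        # 1x1 branch, CA-style: directional pooling, shared 1x1 conv.
        x_h = self.pool_h(group_x)                            # (bg, c/g, h, 1)
        x_w = self.pool_w(group_x).permute(0, 1, 3, 2)        # (bg, c/g, w, 1)
        hw = self.conv1x1(torch.cat([x_h, x_w], dim=2))
        x_h, x_w = torch.split(hw, [h, w], dim=2)
        x1 = self.gn(group_x * x_h.sigmoid() * x_w.permute(0, 1, 3, 2).sigmoid())
        # 3x3 branch captures local multi-scale context.
        x2 = self.conv3x3(group_x)
        # Cross-spatial learning: each branch's global descriptor attends
        # to the other branch's spatial map.
        x11 = self.softmax(self.agp(x1).reshape(b * self.groups, -1, 1).permute(0, 2, 1))
        x12 = x2.reshape(b * self.groups, c // self.groups, -1)
        x21 = self.softmax(self.agp(x2).reshape(b * self.groups, -1, 1).permute(0, 2, 1))
        x22 = x1.reshape(b * self.groups, c // self.groups, -1)
        weights = (torch.matmul(x11, x12) + torch.matmul(x21, x22)).reshape(
            b * self.groups, 1, h, w)
        return (group_x * weights.sigmoid()).reshape(b, c, h, w)
```

The module is channel-preserving (output shape equals input shape), which is what lets it be dropped between any two layers of the RT-DETR head.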
3. Combining EMA_attention with RT-DETR
3.1 Combining with RepC3
How EMA_attentionC3 combines with RepC3:
# Ultralytics YOLO 🚀, AGPL-3.0 license
# RT-DETR-l object detection model with P3-P5 outputs. For details see https://docs.ultralytics.com/models/rtdetr
# Parameters
nc: 80 # number of classes
scales: # model compound scaling constants, i.e. 'model=yolov8n-cls.yaml' will call yolov8-cls.yaml with scale 'n'
# [depth, width, max_channels]
l: [1.00, 1.00, 1024]
backbone:
# [from, repeats, module, args]
- [-1, 1, Resnet_ConvNormLayer, [32, 3, 2, None, False, 'relu']] # 0-P1/2
- [-1, 1, Resnet_ConvNormLayer, [32, 3, 1, None, False, 'relu']] # 1
- [-1, 1, Resnet_ConvNormLayer, [64, 3, 1, None, False, 'relu']] # 2
- [-1, 1, nn.MaxPool2d, [3, 2, 1]] # 3-P2/4
# [ch_out, block_type, block_nums, stage_num, act, variant]
- [-1, 1, Resnet_Blocks, [64, BasicBlock, 2, 2, 'relu']] # 4
- [-1, 1, Resnet_Blocks, [128, BasicBlock, 2, 3, 'relu']] # 5-P3/8
- [-1, 1, Resnet_Blocks, [256, BasicBlock, 2, 4, 'relu']] # 6-P4/16
- [-1, 1, Resnet_Blocks, [512, BasicBlock, 2, 5, 'relu']] # 7-P5/32
head:
- [-1, 1, Conv, [256, 1, 1, None, 1, 1, False]] # 8 input_proj.2
- [-1, 1, AIFI, [1024, 8]] # 9
- [-1, 1, Conv, [256, 1, 1]] # 10, Y5, lateral_convs.0
- [-1, 1, nn.Upsample, [None, 2, 'nearest']] # 11
- [6, 1, Conv, [256, 1, 1, None, 1, 1, False]] # 12 input_proj.1
- [[-2, -1], 1, Concat, [1]] # 13
- [-1, 3, EMA_attentionC3, [256, 0.5]] # 14, fpn_blocks.0
- [-1, 1, Conv, [256, 1, 1]] # 15, Y4, lateral_convs.1
- [-1, 1, nn.Upsample, [None, 2, 'nearest']] # 16
- [5, 1, Conv, [256, 1, 1, None, 1, 1, False]] # 17 input_proj.0
- [[-2, -1], 1, Concat, [1]] # 18 cat backbone P4
- [-1, 3, EMA_attentionC3, [256, 0.5]] # X3 (19), fpn_blocks.1
- [-1, 1, Conv, [256, 3, 2]] # 20, downsample_convs.0
- [[-1, 15], 1, Concat, [1]] # 21 cat Y4
- [-1, 3, EMA_attentionC3, [256, 0.5]] # F4 (22), pan_blocks.0
- [-1, 1, Conv, [256, 3, 2]] # 23, downsample_convs.1
- [[-1, 10], 1, Concat, [1]] # 24 cat Y5
- [-1, 3, RepC3, [256, 0.5]] # F5 (25), pan_blocks.1
- [[19, 22, 25], 1, RTDETRDecoder, [nc, 256, 300, 4, 8, 3]] # Detect(P3, P4, P5)
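The actual EMA_attentionC3 code is only available in the paid column, but its usage in the YAML above (args `[256, 0.5]`, i.e. out-channels 256 and hidden ratio 0.5, substituted for `RepC3` in the FPN/PAN blocks) suggests one plausible structure: a RepC3-style block with a channel-preserving attention applied to the fused output. The sketch below is an assumption, not the author's code; the RepConv stack is simplified to plain Conv-BN-SiLU layers, and the attention is left pluggable so the sketch stays self-contained.

```python
import torch
import torch.nn as nn

class Conv(nn.Module):
    """Conv-BN-SiLU, standing in for ultralytics' Conv block."""
    def __init__(self, c1, c2, k=1, s=1):
        super().__init__()
        self.conv = nn.Conv2d(c1, c2, k, s, k // 2, bias=False)
        self.bn = nn.BatchNorm2d(c2)
        self.act = nn.SiLU()

    def forward(self, x):
        return self.act(self.bn(self.conv(x)))

class EMA_attentionC3(nn.Module):
    """Hypothetical RepC3-style block with attention on the fused output.

    `attn` is any channel-preserving attention module (e.g. an EMA
    instance); it defaults to Identity so this sketch runs standalone.
    """
    def __init__(self, c1, c2, n=3, e=0.5, attn=None):
        super().__init__()
        c_ = int(c2 * e)  # hidden channels, e.g. 128 for [256, 0.5]
        self.cv1 = Conv(c1, c_, 1)
        self.cv2 = Conv(c1, c_, 1)
        # RepConv stack simplified to plain 3x3 convs for this sketch.
        self.m = nn.Sequential(*(Conv(c_, c_, 3) for _ in range(n)))
        self.cv3 = Conv(c_, c2, 1) if c_ != c2 else nn.Identity()
        self.attn = attn if attn is not None else nn.Identity()

    def forward(self, x):
        return self.attn(self.cv3(self.m(self.cv1(x)) + self.cv2(x)))
```

In a real integration, an EMA instance matching the block's output channels would be passed as `attn`, and the module registered in ultralytics' model parser so the YAML can reference it by name; training would then be `RTDETR("rtdetr-r18-EMA_attentionC3.yaml").train(data="NEU-DET.yaml", ...)` (filenames hypothetical).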
3.2 An Alternative Combination with RepC3
EMA_attentionC31, an alternative way of combining with RepC3:
# Ultralytics YOLO 🚀, AGPL-3.0 license
# RT-DETR-l object detection model with P3-P5 outputs. For details see https://docs.ultralytics.com/models/rtdetr
# Parameters
nc: 80 # number of classes
scales: # model compound scaling constants, i.e. 'model=yolov8n-cls.yaml' will call yolov8-cls.yaml with scale 'n'
# [depth, width, max_channels]
l: [1.00, 1.00, 1024]
backbone:
# [from, repeats, module, args]
- [-1, 1, Resnet_ConvNormLayer, [32, 3, 2, None, False, 'relu']] # 0-P1/2
- [-1, 1, Resnet_ConvNormLayer, [32, 3, 1, None, False, 'relu']] # 1
- [-1, 1, Resnet_ConvNormLayer, [64, 3, 1, None, False, 'relu']] # 2
- [-1, 1, nn.MaxPool2d, [3, 2, 1]] # 3-P2/4
# [ch_out, block_type, block_nums, stage_num, act, variant]
- [-1, 1, Resnet_Blocks, [64, BasicBlock, 2, 2, 'relu']] # 4
- [-1, 1, Resnet_Blocks, [128, BasicBlock, 2, 3, 'relu']] # 5-P3/8
- [-1, 1, Resnet_Blocks, [256, BasicBlock, 2, 4, 'relu']] # 6-P4/16
- [-1, 1, Resnet_Blocks, [512, BasicBlock, 2, 5, 'relu']] # 7-P5/32
head:
- [-1, 1, Conv, [256, 1, 1, None, 1, 1, False]] # 8 input_proj.2
- [-1, 1, AIFI, [1024, 8]] # 9
- [-1, 1, Conv, [256, 1, 1]] # 10, Y5, lateral_convs.0
- [-1, 1, nn.Upsample, [None, 2, 'nearest']] # 11
- [6, 1, Conv, [256, 1, 1, None, 1, 1, False]] # 12 input_proj.1
- [[-2, -1], 1, Concat, [1]] # 13
- [-1, 3, EMA_attentionC31, [256, 0.5]] # 14, fpn_blocks.0
- [-1, 1, Conv, [256, 1, 1]] # 15, Y4, lateral_convs.1
- [-1, 1, nn.Upsample, [None, 2, 'nearest']] # 16
- [5, 1, Conv, [256, 1, 1, None, 1, 1, False]] # 17 input_proj.0
- [[-2, -1], 1, Concat, [1]] # 18 cat backbone P4
- [-1, 3, EMA_attentionC31, [256, 0.5]] # X3 (19), fpn_blocks.1
- [-1, 1, Conv, [256, 3, 2]] # 20, downsample_convs.0
- [[-1, 15], 1, Concat, [1]] # 21 cat Y4
- [-1, 3, EMA_attentionC31, [256, 0.5]] # F4 (22), pan_blocks.0
- [-1, 1, Conv, [256, 3, 2]] # 23, downsample_convs.1
- [[-1, 10], 1, Concat, [1]] # 24 cat Y5
- [-1, 3, RepC3, [256, 0.5]] # F5 (25), pan_blocks.1
- [[19, 22, 25], 1, RTDETRDecoder, [nc, 256, 300, 4, 8, 3]] # Detect(P3, P4, P5)
3.3 Using EMA Directly as an Attention Module at Different Positions in the Network
# Ultralytics YOLO 🚀, AGPL-3.0 license
# RT-DETR-l object detection model with P3-P5 outputs. For details see https://docs.ultralytics.com/models/rtdetr
# Parameters
nc: 80 # number of classes
scales: # model compound scaling constants, i.e. 'model=yolov8n-cls.yaml' will call yolov8-cls.yaml with scale 'n'
# [depth, width, max_channels]
l: [1.00, 1.00, 1024]
backbone:
# [from, repeats, module, args]
- [-1, 1, Resnet_ConvNormLayer, [32, 3, 2, None, False, 'relu']] # 0-P1/2
- [-1, 1, Resnet_ConvNormLayer, [32, 3, 1, None, False, 'relu']] # 1
- [-1, 1, Resnet_ConvNormLayer, [64, 3, 1, None, False, 'relu']] # 2
- [-1, 1, nn.MaxPool2d, [3, 2, 1]] # 3-P2/4
# [ch_out, block_type, block_nums, stage_num, act, variant]
- [-1, 1, Resnet_Blocks, [64, BasicBlock, 2, 2, 'relu']] # 4
- [-1, 1, Resnet_Blocks, [128, BasicBlock, 2, 3, 'relu']] # 5-P3/8
- [-1, 1, Resnet_Blocks, [256, BasicBlock, 2, 4, 'relu']] # 6-P4/16
- [-1, 1, Resnet_Blocks, [512, BasicBlock, 2, 5, 'relu']] # 7-P5/32
head:
- [-1, 1, Conv, [256, 1, 1, None, 1, 1, False]] # 8 input_proj.2
- [-1, 1, AIFI, [1024, 8]] # 9
- [-1, 1, Conv, [256, 1, 1]] # 10, Y5, lateral_convs.0
- [-1, 1, EMA_attention, [256]] # 11
- [-1, 1, nn.Upsample, [None, 2, 'nearest']] # 12
- [6, 1, Conv, [256, 1, 1, None, 1, 1, False]] # 13 input_proj.1
- [[-2, -1], 1, Concat, [1]] # 14
- [-1, 3, RepC3, [256, 0.5]] # 15, fpn_blocks.0
- [-1, 1, Conv, [256, 1, 1]] # 16, Y4, lateral_convs.1
- [-1, 1, EMA_attention, [256]] # 17
- [-1, 1, nn.Upsample, [None, 2, 'nearest']] # 18
- [5, 1, Conv, [256, 1, 1, None, 1, 1, False]] # 19 input_proj.0
- [[-2, -1], 1, Concat, [1]] # 20 cat backbone P4
- [-1, 3, RepC3, [256, 0.5]] # X3 (21), fpn_blocks.1
- [-1, 1, Conv, [256, 3, 2]] # 22, downsample_convs.0
- [[-1, 17], 1, Concat, [1]] # 23 cat Y4
- [-1, 3, RepC3, [256, 0.5]] # F4 (24), pan_blocks.0
- [-1, 1, Conv, [256, 3, 2]] # 25, downsample_convs.1
- [[-1, 10], 1, Concat, [1]] # 26 cat Y5
- [-1, 3, RepC3, [256, 0.5]] # F5 (27), pan_blocks.1
- [[21, 24, 27], 1, RTDETRDecoder, [nc, 256, 300, 4, 8, 3]] # Detect(P3, P4, P5)
4. Analysis of Experimental Results
Original rtdetr-r18:
rtdetr-r18 summary: 319 layers, 19884264 parameters, 0 gradients
Class Images Instances Box(P R mAP50 mAP50-95): 100%|██████████| 16/16 [00:12<00:00, 1.24it/s]
all 486 1069 0.658 0.63 0.67 0.379
crazing 486 149 0.593 0.107 0.236 0.0901
inclusion 486 222 0.6 0.784 0.772 0.404
patches 486 243 0.835 0.855 0.908 0.561
pitted_surface 486 130 0.695 0.738 0.766 0.48
rolled-in_scale 486 171 0.592 0.449 0.506 0.241
scratches 486 154 0.631 0.844 0.83 0.495
rtdetr-r18-EMA_attention
rtdetr-r18-EMA_attention summary: 335 layers, 19885608 parameters, 0 gradients
Class Images Instances Box(P R mAP50 mAP50-95): 100%|██████████| 16/16 [00:13<00:00, 1.21it/s]
all 486 1069 0.748 0.638 0.691 0.393
crazing 486 149 0.459 0.248 0.242 0.0776
inclusion 486 222 0.825 0.739 0.797 0.416
patches 486 243 0.883 0.831 0.898 0.571
pitted_surface 486 130 0.904 0.722 0.823 0.514
rolled-in_scale 486 171 0.609 0.456 0.503 0.219
scratches 486 154 0.81 0.832 0.885 0.562
rtdetr-r18-EMA_attentionC3
rtdetr-r18-EMA_attentionC3 summary: 373 layers, 18557592 parameters, 0 gradients
Class Images Instances Box(P R mAP50 mAP50-95): 100%|██████████| 16/16 [00:13<00:00, 1.16it/s]
all 486 1069 0.732 0.657 0.72 0.405
crazing 486 149 0.475 0.231 0.288 0.107
inclusion 486 222 0.784 0.721 0.814 0.437
patches 486 243 0.885 0.819 0.918 0.583
pitted_surface 486 130 0.868 0.708 0.8 0.488
rolled-in_scale 486 171 0.624 0.602 0.604 0.261
scratches 486 154 0.755 0.859 0.899 0.557
rtdetr-r18-EMA_attentionC31
rtdetr-r18-EMA_attentionC31 summary: 343 layers, 19884792 parameters, 0 gradients
Class Images Instances Box(P R mAP50 mAP50-95): 100%|██████████| 16/16 [00:14<00:00, 1.14it/s]
all 486 1069 0.74 0.668 0.718 0.407
crazing 486 149 0.469 0.255 0.261 0.104
inclusion 486 222 0.792 0.775 0.825 0.419
patches 486 243 0.865 0.86 0.91 0.584
pitted_surface 486 130 0.887 0.715 0.801 0.517
rolled-in_scale 486 171 0.617 0.532 0.61 0.274
scratches 486 154 0.809 0.87 0.903 0.542
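Reading the per-class mAP50 values out of the validation logs above makes the pattern easier to see; the dictionary below simply transcribes those numbers:

```python
# Per-class mAP50, copied from the validation logs above.
map50 = {
    "rtdetr-r18": {
        "crazing": 0.236, "inclusion": 0.772, "patches": 0.908,
        "pitted_surface": 0.766, "rolled-in_scale": 0.506, "scratches": 0.830,
    },
    "rtdetr-r18-EMA_attention": {
        "crazing": 0.242, "inclusion": 0.797, "patches": 0.898,
        "pitted_surface": 0.823, "rolled-in_scale": 0.503, "scratches": 0.885,
    },
    "rtdetr-r18-EMA_attentionC3": {
        "crazing": 0.288, "inclusion": 0.814, "patches": 0.918,
        "pitted_surface": 0.800, "rolled-in_scale": 0.604, "scratches": 0.899,
    },
    "rtdetr-r18-EMA_attentionC31": {
        "crazing": 0.261, "inclusion": 0.825, "patches": 0.910,
        "pitted_surface": 0.801, "rolled-in_scale": 0.610, "scratches": 0.903,
    },
}

baseline = map50["rtdetr-r18"]
for cls in baseline:
    # Find the best model for this class and its gain over the baseline.
    best = max(map50, key=lambda m: map50[m][cls])
    gain = map50[best][cls] - baseline[cls]
    print(f"{cls:16s} best={best} ({map50[best][cls]:.3f}, +{gain:.3f})")
```

The RepC3-based variants give the largest gains on crazing and rolled-in_scale, the two classes where the baseline is weakest, while plain EMA_attention is best on pitted_surface.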
Full source code:
RT-DETR step-by-step tutorial: NEU-DET steel surface defect detection | Creative modification by adding EMA attention at different network positions - CSDN blog