一、本文介绍
本文给大家带来的是RT-DETR改进DAT(Vision Transformer with Deformable Attention)的教程,其发布于2022年CVPR2022上同时被评选为Best Paper,由此可以证明其是一种十分有效的改进机制,其主要的核心思想是:引入可变形注意力机制和动态采样点(听着是不是和可变形动态卷积DCN挺相似)。本文的讲解主要包含三方面:DAT的网络结构思想、DAttention的代码复现,如何添加DAttention到你的结构中实现涨点,本文改进是基于ResNet18、ResNet34、ResNet50、ResNet101,文章中均以提供,本专栏的改进内容全网独一份深度改进RT-DETR非那种无效Neck部分改进,同时本文的改进也支持主干上的即插即用,本文内容也支持PP-HGNetV2版本的修改。
专栏目录: RT-DETR改进有效系列目录 | 包含卷积、主干、RepC3、注意力机制、Neck上百种创新机制
专栏链接:RT-DETR剑指论文专栏,持续复现各种顶会内容——论文收割机RT-DETR
二、DAT的网络结构思想
论文地址: DAT论文地址
官方地址:官方代码的地址
2.1 DAT的主要思想和改进
DAT(Vision Transformer with Deformable Attention)是一种引入了可变形注意力机制的视觉Transformer,DAT的核心思想主要包括以下几个方面:
-
可变形注意力(Deformable Attention):传统的Transformer使用标准的自注意力机制,这种机制会处理图像中的所有像素,导致计算量很大。而DAT引入了可变形注意力机制,它只关注图像中的一小部分关键区域。这种方法可以显著减少计算量,同时保持良好的性能。
-
动态采样点:在可变形注意力机制中,DAT动态地选择采样点,而不是固定地处理整个图像。这种动态选择机制使得模型可以更加集中地关注于那些对当前任务最重要的区域。
-
即插即用:DAT的设计允许它适应不同的图像大小和内容,使其在多种视觉任务中都能有效工作,如图像分类、对象检测等。
总结:DAT通过引入可变形注意力机制,改进了视觉Transformer的效率和性能,使其在处理复杂的视觉任务时更加高效和准确。
2.2 DAT的网络结构图
(a) 展示了可变形注意力的信息流。左侧部分,一组参考点均匀地放置在特征图上,这些点的偏移量是由查询通过偏移网络学习得到的。然后,如右侧所示,根据变形点从采样特征中投影出变形的键和值。相对位置偏差也通过变形点计算,增强了输出转换特征的多头注意力。为了清晰展示,图中仅显示了4个参考点,但在实际实现中实际上有更多的点。
(b) 展示了偏移生成网络的详细结构,每层输入和输出特征图的大小都有标注(这个Offset network在网络的代码中需要控制可添加可不添加)。
通过上面的方式产生多种参考点分布在图像上,从而提高检测的效率,最终的效果图如下->
2.3 DAT和其他机制的对比
DAT与其他视觉Transformer模型和CNN模型中的DCN(可变形卷积网络)的对比图如下,突出了它们处理查询的不同方法(图片展示的很直观,不给大家描述过程了):
三、DAT即插即用的代码块
下面的代码是DAT的网络结构代码,官方的代码中存在许多bug而且参数都未定义,这里我替大家都行了修改而且在使用时无需手动添加任何参数(但是本文的方法需要按照有参的注意力机制添加但是只是不需要进行传入参数在yaml文件中,ya'm),我都设置了通过模型进行了自动计算,使用方法看章节四。
import numpy as np
import torch
import torch.nn as nn
import einops
from timm.models.layers import trunc_normal_
__all__ = ['DAttentionBaseline', 'BasicBlock_DAT', 'BottleNeck_DAT']
class LayerNormProxy(nn.Module):
def __init__(self, dim):
super().__init__()
self.norm = nn.LayerNorm(dim)
def forward(self, x):
x = einops.rearrange(x, 'b c h w -> b h w c')
x = self.norm(x)
return einops.rearrange(x, 'b h w c -> b c h w')
class DAttentionBaseline(nn.Module):
def __init__(
self, q_size=(224, 224), kv_size=(224, 224), n_heads=8, n_head_channels=32, n_groups=1,
attn_drop=0.0, proj_drop=0.0, stride=1,
offset_range_factor=-1, use_pe=True, dwc_pe=True,
no_off=False, fixed_pe=False, ksize=9, log_cpb=False
):
super().__init__()
n_head_channels = int(q_size / 8)
q_size = (q_size, q_size)
self.dwc_pe = dwc_pe
self.n_head_channels = n_head_channels
self.scale = self.n_head_channels ** -0.5
self.n_heads = n_heads
self.q_h, self.q_w = q_size
# self.kv_h, self.kv_w = kv_size
self.kv_h, self.kv_w = self.q_h // stride, self.q_w // stride
self.nc = n_head_channels * n_heads
self.n_groups = n_groups
self.n_group_channels = self.nc // self.n_groups
self.n_group_heads = self.n_heads // self.n_groups
self.use_pe = use_pe
self.fixed_pe = fixed_pe
self.no_off = no_off
self.offset_range_factor = offset_range_factor
self.ksize = ksize
self.log_cpb = log_cpb
self.stride = stride
kk = self.ksize
pad_size = kk // 2 if kk != stride else 0
self.conv_offset = nn.Sequential(
nn.Conv2d(self.n_group_channels, self.n_group_channels, kk, stride, pad_size, groups=self.n_group_channels),
LayerNormProxy(self.n_group_channels),
nn.GELU(),
nn.Conv2d(self.n_group_channels, 2, 1, 1, 0, bias=False)
)
if self.no_off:
for m in self.conv_offset.parameters():
m.requires_grad_(False)
self.proj_q = nn.Conv2d(
self.nc, self.nc,
kernel_size=1, stride=1, padding=0
)
self.proj_k = nn.Conv2d(
self.nc, self.nc,
kernel_size=1, stride=1, padding=0)
self.proj_v = nn.Conv2d(
self.nc, self.nc,
kernel_size=1, stride=1, padding=0
)
self.proj_out = nn.Conv2d(
self.nc, self.nc,
kernel_size=1, stride=1, padding=0
)
self.proj_drop = nn.Dropout(proj_drop, inplace=True)
self.attn_drop = nn.Dropout(attn_drop, inplace=True)
if self.use_pe and not self.no_off:
if self.dwc_pe:
self.rpe_table = nn.Conv2d(
self.nc, self.nc, kernel_size=3, stride=1, padding=1, groups=self.nc)
elif self.fixed_pe:
self.rpe_table = nn.Parameter(
torch.zeros(self.n_heads, self.q_h * self.q_w, self.kv_h * self.kv_w)
)
trunc_normal_(self.rpe_table, std=0.01)
elif self.log_cpb:
# Borrowed from Swin-V2
self.rpe_table = nn.Sequential(
nn.Linear(2, 32, bias=True),
nn.ReLU(inplace=True),
nn.Linear(32, self.n_group_heads, bias=False)
)
else:
self.rpe_table = nn.Parameter(
torch.zeros(self.n_heads, self.q_h * 2 - 1, self.q_w * 2 - 1)
)
trunc_normal_(self.rpe_table, std=0.01)
else:
self.rpe_table = None
@torch.no_grad()
def _get_ref_points(self, H_key, W_key, B, dtype, device):
ref_y, ref_x = torch.meshgrid(
torch.linspace(0.5, H_key - 0.5, H_key, dtype=dtype, device=device),
torch.linspace(0.5, W_key - 0.5, W_key, dtype=dtype, device=device),
indexing='ij'
)
ref = torch.stack((ref_y, ref_x), -1)
ref[..., 1].div_(W_key - 1.0).mul_(2.0).sub_(1.0)
ref[..., 0].div_(H_key - 1.0).mul_(2.0).sub_(1.0)
ref = ref[None, ...].expand(B * self.n_groups, -1, -1, -1) # B * g H W 2
return ref
@torch.no_grad()
def _get_q_grid(self, H, W, B, dtype, device):
ref_y, ref_x = torch.meshgrid(
torch.arange(0, H, dtype=dtype, device=device),
torch.arange(0, W, dtype=dtype, device=device),
indexing='ij'
)
ref = torch.stack((ref_y, ref_x), -1)
ref[..., 1].div_(W - 1.0).mul_(2.0).sub_(1.0)
ref[..., 0].div_(H - 1.0).mul_(2.0).sub_(1.0)
ref = ref[None, ...].expand(B * self.n_groups, -1, -1, -1) # B * g H W 2
return ref
def forward(self, x):
x = x
B, C, H, W = x.size()
dtype, device = x.dtype, x.device
q = self.proj_q(x)
q_off = einops.rearrange(q, 'b (g c) h w -> (b g) c h w', g=self.n_groups, c=self.n_group_channels)
offset = self.conv_offset(q_off).contiguous() # B * g 2 Hg Wg
Hk, Wk = offset.size(2), offset.size(3)
n_sample = Hk * Wk
if self.offset_range_factor >= 0 and not self.no_off:
offset_range = torch.tensor([1.0 / (Hk - 1.0), 1.0 / (Wk - 1.0)], device=device).reshape(1, 2, 1, 1)
offset = offset.tanh().mul(offset_range).mul(self.offset_range_factor)
offset = einops.rearrange(offset, 'b p h w -> b h w p')
reference = self._get_ref_points(Hk, Wk, B, dtype, device)
if self.no_off:
offset = offset.fill_(0.0)
if self.offset_range_factor >= 0:
pos = offset + reference
else:
pos = (offset + reference).clamp(-1., +1.)
if self.no_off:
x_sampled = F.avg_pool2d(x, kernel_size=self.stride, stride=self.stride)
assert x_sampled.size(2) == Hk and x_sampled.size(3) == Wk, f"Size is {x_sampled.size()}"
else:
x_sampled = F.grid_sample(
input=x.reshape(B * self.n_groups, self.n_group_channels, H, W),
grid=pos[..., (1, 0)], # y, x -> x, y
mode='bilinear', align_corners=True) # B * g, Cg, Hg, Wg
x_sampled = x_sampled.reshape(B, C, 1, n_sample)
# self.proj_k.weight = torch.nn.Parameter(self.proj_k.weight.float())
# self.proj_k.bias = torch.nn.Parameter(self.proj_k.bias.float())
# self.proj_v.weight = torch.nn.Parameter(self.proj_v.weight.float())
# self.proj_v.bias = torch.nn.Parameter(self.proj_v.bias.float())
# 检查权重的数据类型
q = q.reshape(B * self.n_heads, self.n_head_channels, H * W)
k = self.proj_k(x_sampled).reshape(B * self.n_heads, self.n_head_channels, n_sample)
v = self.proj_v(x_sampled).reshape(B * self.n_heads, self.n_head_channels, n_sample)
attn = torch.einsum('b c m, b c n -> b m n', q, k) # B * h, HW, Ns
attn = attn.mul(self.scale)
if self.use_pe and (not self.no_off):
if self.dwc_pe:
residual_lepe = self.rpe_table(q.reshape(B, C, H, W)).reshape(B * self.n_heads, self.n_head_channels,
H * W)
elif self.fixed_pe:
rpe_table = self.rpe_table
attn_bias = rpe_table[None, ...].expand(B, -1, -1, -1)
attn = attn + attn_bias.reshape(B * self.n_heads, H * W, n_sample)
elif self.log_cpb:
q_grid = self._get_q_grid(H, W, B, dtype, device)
displacement = (
q_grid.reshape(B * self.n_groups, H * W, 2).unsqueeze(2) - pos.reshape(B * self.n_groups,
n_sample,
2).unsqueeze(1)).mul(
4.0) # d_y, d_x [-8, +8]
displacement = torch.sign(displacement) * torch.log2(torch.abs(displacement) + 1.0) / np.log2(8.0)
attn_bias = self.rpe_table(displacement) # B * g, H * W, n_sample, h_g
attn = attn + einops.rearrange(attn_bias, 'b m n h -> (b h) m n', h=self.n_group_heads)
else:
rpe_table = self.rpe_table
rpe_bias = rpe_table[None, ...].expand(B, -1, -1, -1)
q_grid = self._get_q_grid(H, W, B, dtype, device)
displacement = (
q_grid.reshape(B * self.n_groups, H * W, 2).unsqueeze(2) - pos.reshape(B * self.n_groups,
n_sample,
2).unsqueeze(1)).mul(
0.5)
attn_bias = F.grid_sample(
input=einops.rearrange(rpe_bias, 'b (g c) h w -> (b g) c h w', c=self.n_group_heads,
g=self.n_groups),
grid=displacement[..., (1, 0)],
mode='bilinear', align_corners=True) # B * g, h_g, HW, Ns
attn_bias = attn_bias.reshape(B * self.n_heads, H * W, n_sample)
attn = attn + attn_bias
attn = F.softmax(attn, dim=2)
attn = self.attn_drop(attn)
out = torch.einsum('b m n, b c n -> b c m', attn, v)
if self.use_pe and self.dwc_pe:
out = out + residual_lepe
out = out.reshape(B, C, H, W)
y = self.proj_drop(self.proj_out(out))
h, w = pos.reshape(B, self.n_groups, Hk, Wk, 2), reference.reshape(B, self.n_groups, Hk, Wk, 2)
return y
from collections import OrderedDict
import torch.nn.functional as F
class ConvNormLayer(nn.Module):
def __init__(self,
ch_in,
ch_out,
filter_size,
stride,
groups=1,
act=None):
super(ConvNormLayer, self).__init__()
self.act = act
self.conv = nn.Conv2d(
in_channels=ch_in,
out_channels=ch_out,
kernel_size=filter_size,
stride=stride,
padding=(filter_size - 1) // 2,
groups=groups)
self.norm = nn.BatchNorm2d(ch_out)
def forward(self, inputs):
out = self.conv(inputs)
out = self.norm(out)
if self.act:
out = getattr(F, self.act)(out)
return out
class BasicBlock_DAT(nn.Module):
expansion = 1
def __init__(self,
ch_in,
ch_out,
stride,
shortcut,
act='relu',
variant='b',
att=False):
super(BasicBlock_DAT, self).__init__()
self.shortcut = shortcut
if not shortcut:
if variant == 'd' and stride == 2:
self.short = nn.Sequential()
self.short.add_sublayer(
'pool',
nn.AvgPool2d(
kernel_size=2, stride=2, padding=0, ceil_mode=True))
self.short.add_sublayer(
'conv',
ConvNormLayer(
ch_in=ch_in,
ch_out=ch_out,
filter_size=1,
stride=1))
else:
self.short = ConvNormLayer(
ch_in=ch_in,
ch_out=ch_out,
filter_size=1,
stride=stride)
self.branch2a = ConvNormLayer(
ch_in=ch_in,
ch_out=ch_out,
filter_size=3,
stride=stride,
act='relu')
self.branch2b = ConvNormLayer(
ch_in=ch_out,
ch_out=ch_out,
filter_size=3,
stride=1,
act=None)
self.att = att
if self.att:
self.se = DAttentionBaseline(ch_out)
def forward(self, inputs):
out = self.branch2a(inputs)
out = self.branch2b(out)
if self.att:
out = self.se(out)
if self.shortcut:
short = inputs
else:
short = self.short(inputs)
out = out + short
out = F.relu(out)
return out
class BottleNeck_DAT(nn.Module):
expansion = 4
def __init__(self, ch_in, ch_out, stride, shortcut, act='relu', variant='d', att=False):
super().__init__()
if variant == 'a':
stride1, stride2 = stride, 1
else:
stride1, stride2 = 1, stride
width = ch_out
self.branch2a = ConvNormLayer(ch_in, width, 1, stride1, act=act)
self.branch2b = ConvNormLayer(width, width, 3, stride2, act=act)
self.branch2c = ConvNormLayer(width, ch_out * self.expansion, 1, 1)
self.shortcut = shortcut
if not shortcut:
if variant == 'd' and stride == 2:
self.short = nn.Sequential(OrderedDict([
('pool', nn.AvgPool2d(2, 2, 0, ceil_mode=True)),
('conv', ConvNormLayer(ch_in, ch_out * self.expansion, 1, 1))
]))
else:
self.short = ConvNormLayer(ch_in, ch_out * self.expansion, 1, stride)
self.att = att
if self.att:
self.se = DAttentionBaseline(ch_out * 4)
def forward(self, x):
out = self.branch2a(x)
out = self.branch2b(out)
out = self.branch2c(out)
if self.att:
out = self.se(out)
if self.shortcut:
short = x
else:
short = self.short(x)
out = out + short
out = F.relu(out)
return out
四、添加DAT到你的网络中
修改教程分两种,一种是替换修改ResNet中的Basicclock/Bottleneck模块的,一种是在主干上即插即用的修改教程,如果你只需要一种那么修改对应的就行,互相之间并不影响,需要注意的是即插即用的需要修改ResNet改进才行,链接如下:
ResNet文章地址:【RT-DETR改进涨点】ResNet18、34、50、101等多个版本移植到ultralytics仓库(RT-DETR官方一比一移植)
4.1 修改Basicclock/Bottleneck的教程
4.1.1 修改一
第一还是建立文件,我们找到如下ultralytics/nn/modules文件夹下建立一个目录名字呢就是'Addmodules'文件夹(用群内的文件的话已经有了无需新建)!然后在其内部建立一个新的py文件将核心代码复制粘贴进去即可。
4.1.2 修改二
第二步此处需要注意,因为我这里默认大家修改了ResNet系列的模型了,同级目录下应该有一个ResNet.py的文件夹,我们这里需要找到我们'ultralytics/nn/Addmodules/ResNet.py'创建的ResNet的文件夹(默认大家已经创建了!!!)
我们只需要修改上面的两步即可,后面复制yaml文件进行运行即可了,修改方法大家只要仔细看是非常简单的。
4.2 修改主干上即插即用的教程
4.2.1 修改一(如果修改了4.1教程此步无需修改)
第一还是建立文件,我们找到如下ultralytics/nn/modules文件夹下建立一个目录名字呢就是'Addmodules'文件夹(用群内的文件的话已经有了无需新建)!然后在其内部建立一个新的py文件将核心代码复制粘贴进去即可。
4.2.2 修改二
第二步我们在该目录下创建一个新的py文件名字为'__init__.py'(用群内的文件的话已经有了无需新建),然后在其内部导入我们的检测头如下图所示。
4.2.3 修改三
第三步我门中到如下文件'ultralytics/nn/tasks.py'进行导入和注册我们的模块(用群内的文件的话已经有了无需重新导入直接开始第四步即可)!
从今天开始以后的教程就都统一成这个样子了,因为我默认大家用了我群内的文件来进行修改!!
4.2.4 修改四
按照我的添加在parse_model里添加即可。
elif m in {DAttentionBaseline}:
c2 = ch[f]
args = [c2, *args]
到此就修改完成了,大家可以复制下面的yaml文件运行。
五、DAT的yaml文件
5.1 替换ResNet的yaml文件1(ResNet18版本)
需要修改如下的ResNet主干才可以运行本文的改进机制 !
ResNet文章地址:【RT-DETR改进涨点】ResNet18、34、50、101等多个版本移植到ultralytics仓库(RT-DETR官方一比一移植)
# Ultralytics YOLO 🚀, AGPL-3.0 license
# RT-DETR-l object detection model with P3-P5 outputs. For details see https://docs.ultralytics.com/models/rtdetr
# Parameters
nc: 80 # number of classes
scales: # model compound scaling constants, i.e. 'model=yolov8n-cls.yaml' will call yolov8-cls.yaml with scale 'n'
# [depth, width, max_channels]
l: [1.00, 1.00, 1024]
backbone:
# [from, repeats, module, args]
- [-1, 1, ConvNormLayer, [32, 3, 2, 1, 'relu']] # 0-P1
- [-1, 1, ConvNormLayer, [32, 3, 1, 1, 'relu']] # 1
- [-1, 1, ConvNormLayer, [64, 3, 1, 1, 'relu']] # 2
- [-1, 1, nn.MaxPool2d, [3, 2, 1]] # 3-P2
- [-1, 2, Blocks, [64, BasicBlock_DAT, 2, True]] # 4
- [-1, 2, Blocks, [128, BasicBlock_DAT, 3, True]] # 5-P3
- [-1, 2, Blocks, [256, BasicBlock_DAT, 4, True]] # 6-P4
- [-1, 2, Blocks, [512, BasicBlock_DAT, 5, True]] # 7-P5
head:
- [-1, 1, Conv, [256, 1, 1, None, 1, 1, False]] # 8 input_proj.2
- [-1, 1, AIFI, [1024, 8]]
- [-1, 1, Conv, [256, 1, 1]] # 10, Y5, lateral_convs.0
- [-1, 1, nn.Upsample, [None, 2, 'nearest']] # 11
- [6, 1, Conv, [256, 1, 1, None, 1, 1, False]] # 12 input_proj.1
- [[-2, -1], 1, Concat, [1]]
- [-1, 3, RepC3, [256, 0.5]] # 14, fpn_blocks.0
- [-1, 1, Conv, [256, 1, 1]] # 15, Y4, lateral_convs.1
- [-1, 1, nn.Upsample, [None, 2, 'nearest']] # 16
- [5, 1, Conv, [256, 1, 1, None, 1, 1, False]] # 17 input_proj.0
- [[-2, -1], 1, Concat, [1]] # 18 cat backbone P4
- [-1, 3, RepC3, [256, 0.5]] # X3 (19), fpn_blocks.1
- [-1, 1, Conv, [256, 3, 2]] # 20, downsample_convs.0
- [[-1, 15], 1, Concat, [1]] # 21 cat Y4
- [-1, 3, RepC3, [256, 0.5]] # F4 (22), pan_blocks.0
- [-1, 1, Conv, [256, 3, 2]] # 23, downsample_convs.1
- [[-1, 10], 1, Concat, [1]] # 24 cat Y5
- [-1, 3, RepC3, [256, 0.5]] # F5 (25), pan_blocks.1
- [[19, 22, 25], 1, RTDETRDecoder, [nc, 256, 300, 4, 8, 3]] # Detect(P3, P4, P5)
5.2 替换ResNet的yaml文件1(ResNet50版本)
需要修改如下的ResNet主干才可以运行本文的改进机制 !
ResNet文章地址:【RT-DETR改进涨点】ResNet18、34、50、101等多个版本移植到ultralytics仓库(RT-DETR官方一比一移植)
# Ultralytics YOLO 🚀, AGPL-3.0 license
# RT-DETR-l object detection model with P3-P5 outputs. For details see https://docs.ultralytics.com/models/rtdetr
# Parameters
nc: 80 # number of classes
scales: # model compound scaling constants, i.e. 'model=yolov8n-cls.yaml' will call yolov8-cls.yaml with scale 'n'
# [depth, width, max_channels]
l: [1.00, 1.00, 1024]
backbone:
# [from, repeats, module, args]
- [-1, 1, ConvNormLayer, [32, 3, 2, 1, 'relu']] # 0-P1
- [-1, 1, ConvNormLayer, [32, 3, 1, 1, 'relu']] # 1
- [-1, 1, ConvNormLayer, [64, 3, 1, 1, 'relu']] # 2
- [-1, 1, nn.MaxPool2d, [3, 2, 1]] # 3-P2
- [-1, 3, Blocks, [64, BottleNeck_DAT, 2, True]] # 4
- [-1, 4, Blocks, [128, BottleNeck_DAT, 3, True]] # 5-P3
- [-1, 6, Blocks, [256, BottleNeck_DAT, 4, True]] # 6-P4
- [-1, 3, Blocks, [512, BottleNeck_DAT, 5, True]] # 7-P5
head:
- [-1, 1, Conv, [256, 1, 1, None, 1, 1, False]] # 8 input_proj.2
- [-1, 1, AIFI, [1024, 8]] # 9
- [-1, 1, Conv, [256, 1, 1]] # 10, Y5, lateral_convs.0
- [-1, 1, nn.Upsample, [None, 2, 'nearest']] # 11
- [6, 1, Conv, [256, 1, 1, None, 1, 1, False]] # 12 input_proj.1
- [[-2, -1], 1, Concat, [1]] # 13
- [-1, 3, RepC3, [256]] # 14, fpn_blocks.0
- [-1, 1, Conv, [256, 1, 1]] # 15, Y4, lateral_convs.1
- [-1, 1, nn.Upsample, [None, 2, 'nearest']] # 16
- [5, 1, Conv, [256, 1, 1, None, 1, 1, False]] # 17 input_proj.0
- [[-2, -1], 1, Concat, [1]] # 18 cat backbone P4
- [-1, 3, RepC3, [256]] # X3 (19), fpn_blocks.1
- [-1, 1, Conv, [256, 3, 2]] # 20, downsample_convs.0
- [[-1, 15], 1, Concat, [1]] # 21 cat Y4
- [-1, 3, RepC3, [256]] # F4 (22), pan_blocks.0
- [-1, 1, Conv, [256, 3, 2]] # 23, downsample_convs.1
- [[-1, 10], 1, Concat, [1]] # 24 cat Y5
- [-1, 3, RepC3, [256]] # F5 (25), pan_blocks.1
- [[19, 22, 25], 1, RTDETRDecoder, [nc, 256, 300, 4, 8, 6]] # Detect(P3, P4, P5)
5.3 即插即用的yaml文件(HGNetV2版本)
此版本为HGNetV2-l的yaml文件!
# Ultralytics YOLO 🚀, AGPL-3.0 license
# RT-DETR-l object detection model with P3-P5 outputs. For details see https://docs.ultralytics.com/models/rtdetr
# Parameters
nc: 80 # number of classes
scales: # model compound scaling constants, i.e. 'model=yolov8n-cls.yaml' will call yolov8-cls.yaml with scale 'n'
# [depth, width, max_channels]
l: [1.00, 1.00, 1024]
backbone:
# [from, repeats, module, args]
- [-1, 1, HGStem, [32, 48]] # 0-P2/4
- [-1, 6, HGBlock, [48, 128, 3]] # stage 1
- [-1, 1, DWConv, [128, 3, 2, 1, False]] # 2-P3/8
- [-1, 6, HGBlock, [96, 512, 3]] # stage 2
- [-1, 1, DWConv, [512, 3, 2, 1, False]] # 4-P3/16
- [-1, 6, HGBlock, [192, 1024, 5, True, False]] # cm, c2, k, light, shortcut
- [-1, 6, HGBlock, [192, 1024, 5, True, True]]
- [-1, 6, HGBlock, [192, 1024, 5, True, True]] # stage 3
- [-1, 1, DWConv, [1024, 3, 2, 1, False]] # 8-P4/32
- [-1, 6, HGBlock, [384, 2048, 5, True, False]] # stage 4
head:
- [-1, 1, Conv, [256, 1, 1, None, 1, 1, False]] # 10 input_proj.2
- [-1, 1, AIFI, [1024, 8]]
- [-1, 1, Conv, [256, 1, 1]] # 12, Y5, lateral_convs.0
- [-1, 1, nn.Upsample, [None, 2, 'nearest']]
- [7, 1, Conv, [256, 1, 1, None, 1, 1, False]] # 14 input_proj.1
- [[-2, -1], 1, Concat, [1]]
- [-1, 3, RepC3, [256]] # 16, fpn_blocks.0
- [-1, 1, Conv, [256, 1, 1]] # 17, Y4, lateral_convs.1
- [-1, 1, nn.Upsample, [None, 2, 'nearest']]
- [3, 1, Conv, [256, 1, 1, None, 1, 1, False]] # 19 input_proj.0
- [[-2, -1], 1, Concat, [1]] # cat backbone P4
- [-1, 3, RepC3, [256]] # X3 (21), fpn_blocks.1
- [-1, 1, DAttentionBaseline, []] # 22
- [-1, 1, Conv, [384, 3, 2]] # 23, downsample_convs.0
- [[-1, 17], 1, Concat, [1]] # cat Y4
- [-1, 3, RepC3, [256]] # F4 (25), pan_blocks.0
- [-1, 1, DAttentionBaseline, []] # 26
- [-1, 1, Conv, [384, 3, 2]] # 27, downsample_convs.1
- [[-1, 12], 1, Concat, [1]] # cat Y5
- [-1, 3, RepC3, [256]] # F5 (29), pan_blocks.1
- [-1, 1, DAttentionBaseline, []] # 30
- [[22, 26, 30], 1, RTDETRDecoder, [nc]] # Detect(P3, P4, P5)
六、成功运行记录
6.1 ResNet18运行成功记录截图
6.2 ResNet50运行成功记录截图
6.3 HGNetv2运行成功记录截图
七、全文总结
到此本文的正式分享内容就结束了,在这里给大家推荐我的RT-DETR改进有效涨点专栏,本专栏目前为新开的平均质量分98分,后期我会根据各种最新的前沿顶会进行论文复现,也会对一些老的改进机制进行补充,如果大家觉得本文帮助到你了,订阅本专栏,关注后续更多的更新~
专栏链接:RT-DETR剑指论文专栏,持续复现各种顶会内容——论文收割机RT-DETR