Paper
Extended Feature Pyramid Network for Small Object Detection
python3 D:/Project/EFPN-detectron2-master/tools/train_net.py --config-file
configs/InstanceSegmentation/pointrend_rcnn_R_50_FPN_1x_coco.yaml --num-gpus 1
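Training script
Settings in cfg
First obtain the configuration object cfg. Once you have it, you can customize the model and the training process by modifying its attributes. For example, cfg.MODEL.WEIGHTS = "path/to/weights.pth" sets the path of the pretrained weights the model loads, and cfg.SOLVER.BASE_LR = 0.001 sets the base learning rate.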
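cfg.merge_from_file() merges the options in the specified configuration file into the current configuration object cfg, overriding the values cfg already holds. (Options that cfg has not declared typically have to be registered first; otherwise the merge fails.) The purpose is to combine a predefined model's configuration with the current configuration object, so that the model uses the right parameters and settings during training or inference.
By merging a configuration file you start from the predefined model's defaults and then modify or override specific options as needed. This makes it quick to configure and use a pretrained model for training or inference.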
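The override behaviour of a merge can be illustrated with plain nested dictionaries. The sketch below is a simplified stand-in and not detectron2's actual implementation: `merge_config` is a hypothetical helper, the sample values are made up, and rejecting undeclared keys is an assumption about the strict default behaviour.

```python
import copy


def merge_config(base, override):
    """Return a copy of `base` with the values from `override` merged in.

    Nested dicts are merged key by key. Keys that `base` does not declare
    are rejected (assumed strict behaviour for this sketch).
    """
    merged = copy.deepcopy(base)
    for key, value in override.items():
        if key not in merged:
            raise KeyError(f"Non-existent config key: {key}")
        if isinstance(value, dict) and isinstance(merged[key], dict):
            merged[key] = merge_config(merged[key], value)
        else:
            merged[key] = copy.deepcopy(value)
    return merged


# Hypothetical defaults and file contents, for illustration only.
defaults = {
    "MODEL": {"WEIGHTS": "", "RESNETS": {"NUM_GROUPS": 1}},
    "SOLVER": {"BASE_LR": 0.001},
}
from_file = {"MODEL": {"RESNETS": {"NUM_GROUPS": 32}}}

cfg = merge_config(defaults, from_file)
print(cfg["MODEL"]["RESNETS"]["NUM_GROUPS"])  # 32 (overridden by the file)
print(cfg["SOLVER"]["BASE_LR"])               # 0.001 (default kept)
```

Only the keys named in the file change; everything else keeps its default, which is why a short configuration file is enough to describe a full model.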
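Here, setting cfg.MODEL.RESNETS.NUM_GROUPS to 32 selects a ResNeXt model, in which the 3x3 convolution of each bottleneck block splits its input channels into 32 groups. Setting it to 1 selects a standard ResNet without grouped convolution.
Adjusting cfg.MODEL.RESNETS.NUM_GROUPS therefore switches the backbone between the ResNet and ResNeXt architectures to suit different tasks and requirements.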
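To see what grouping changes, it can help to count the weights of a single 3x3 convolution with and without groups. This is a minimal sketch: `conv_weight_count` is a hypothetical helper, and the 256-channel width is taken from the res2 `conv2` layers in the model printout further down.

```python
def conv_weight_count(in_channels, out_channels, kernel_size, groups=1):
    """Number of weights in a 2-D convolution without bias.

    With grouped convolution each output channel only sees
    in_channels / groups input channels.
    """
    if in_channels % groups or out_channels % groups:
        raise ValueError("channel counts must be divisible by groups")
    return (in_channels // groups) * out_channels * kernel_size * kernel_size


plain = conv_weight_count(256, 256, 3, groups=1)     # ResNet-style
grouped = conv_weight_count(256, 256, 3, groups=32)  # ResNeXt-style
print(plain, grouped, plain // grouped)  # 589824 18432 32
```

For the same channel width the grouped layer has 32 times fewer weights, which is what lets ResNeXt use wider bottlenecks at a similar cost.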
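Setting cfg.MODEL.BACKBONE.NAME to "build_resnet_fpn_backbone" tells the model to build its backbone with that function. In the forward pass the input image goes through the ResNet to extract features, which the FPN structure then fuses to obtain a multi-scale feature representation.
By choosing a different backbone name you can use another predefined backbone or a custom one to suit different tasks and datasets.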
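The FPN input-feature option (cfg.MODEL.FPN.IN_FEATURES in detectron2) specifies that the feature maps of ResNet stages 2, 3, 4, 5 and 6 are used as input, i.e. the features of these stages are passed to the FPN for fusion. Choosing different input stages lets you select which levels of feature maps take part in the fusion, according to the task and dataset, to obtain a better multi-scale representation.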
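Each stage passed to the FPN has a different spatial resolution. The sketch below estimates the feature-map size per stage for a given input image. The stride table is an assumption read off the model printout further down (stem and pooling give stride 4 at res2, and every later stage halves the resolution); `feature_map_sizes` is a hypothetical helper.

```python
import math

# Assumed strides relative to the input image; res6 is the extra stage of this model.
STAGE_STRIDES = {"res2": 4, "res3": 8, "res4": 16, "res5": 32, "res6": 64}


def feature_map_sizes(height, width, in_features):
    """Approximate (height, width) of each stage's feature map."""
    return {
        name: (math.ceil(height / STAGE_STRIDES[name]),
               math.ceil(width / STAGE_STRIDES[name]))
        for name in in_features
    }


sizes = feature_map_sizes(512, 768, ["res2", "res3", "res4", "res5", "res6"])
for name, hw in sizes.items():
    print(name, hw)
```

Small objects cover only a few pixels at the coarse stages, which is why the high-resolution stages matter most for small-object detection.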
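cfg.MODEL.RPN.BATCH_SIZE_PER_IMAGE = 128: this line sets how many anchors per image are sampled to train the Region Proposal Network (RPN). It is a sample count, not a positive/negative ratio. On each image the RPN generates a set of candidate regions, some of which are positive samples (containing an object) and some negative samples (containing no object). BATCH_SIZE_PER_IMAGE is the total number sampled per image for the loss; the share of positives is capped by cfg.MODEL.RPN.POSITIVE_FRACTION and the remainder is filled with negatives.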
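cfg.MODEL.ROI_HEADS.BATCH_SIZE_PER_IMAGE = 256: this line sets how many proposals per image are sampled to train the ROI heads (Region of Interest heads). Again it is a count rather than a ratio. The ROI heads are the part of the detector that classifies and regresses the candidate regions. BATCH_SIZE_PER_IMAGE is the total number of candidate regions per image used to train the ROI heads; the share of positives is capped by cfg.MODEL.ROI_HEADS.POSITIVE_FRACTION and the remainder is filled with negatives.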
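How a per-image batch size and a positive fraction interact can be sketched in plain Python. This is an illustrative stand-in for the sampling step, not the library code: `subsample_labels` here is a hypothetical helper, and the label convention (1 = positive, 0 = negative, anything else ignored) is an assumption of the sketch.

```python
import random


def subsample_labels(labels, batch_size_per_image, positive_fraction, seed=0):
    """Pick sample indices for one image.

    At most batch_size_per_image * positive_fraction positives are taken;
    the rest of the batch is filled with negatives if enough are available.
    """
    rng = random.Random(seed)
    positives = [i for i, label in enumerate(labels) if label == 1]
    negatives = [i for i, label in enumerate(labels) if label == 0]
    num_pos = min(len(positives), int(batch_size_per_image * positive_fraction))
    num_neg = min(len(negatives), batch_size_per_image - num_pos)
    return rng.sample(positives, num_pos), rng.sample(negatives, num_neg)


# 10 positive and 1000 negative candidates on one image.
labels = [1] * 10 + [0] * 1000
pos, neg = subsample_labels(labels, batch_size_per_image=256, positive_fraction=0.25)
print(len(pos), len(neg))  # 10 246
```

With few positives on an image the batch is dominated by negatives, so the effective ratio depends on the data as well as on the configured fraction.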
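cfg.SOLVER.IMS_PER_BATCH = 1: this line sets the image batch size used for each gradient update. IMS_PER_BATCH is the number of images per training iteration, counted across all GPUs. In this example each iteration computes its gradient update from 1 image.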
Constructing the trainer
FTT.py
The 1x1 convolution channel_scaler expands the number of input channels from out_channels to out_channels * 4.
import logging
import numpy as np
import fvcore.nn.weight_init as weight_init
import torch
import torch.nn.functional as F
from torch import nn
from detectron2.layers import Conv2d, ShapeSpec, get_norm
import math
from .backbone import Backbone
from .build import BACKBONE_REGISTRY
from .resnet import build_resnet_backbone

# p2, p3 in the paper is p3, p4 for us
# p2 and p3 are both feature-map tensors of shape [bs, channels, height, width]
def FTT_get_p3pr(p2, p3, out_channels, norm):
    # Note: all layers are created inside this function, so freshly
    # initialized weights are built on every call.

    # 1x1 convolution that expands the channels from out_channels to out_channels * 4
    channel_scaler = Conv2d(
        out_channels,
        out_channels * 4,
        kernel_size=1,
        bias=False
        #norm=''
    )

    # Two inner helper functions are defined below.

    # Content extractor
    # tuple of (conv2d, conv2d, iter)
    # Repeatedly applies 1x1 convolutions followed by ReLU to extract content
    # features. (Content features could also be extracted with a transformer.)
    def create_content_extractor(x, num_channels, iterations=3):
        conv1 = Conv2d(
            num_channels,
            num_channels,
            kernel_size=1,
            bias=False,
            #norm=get_norm(norm, num_channels),
        )
        conv2 = Conv2d(
            num_channels,
            num_channels,
            kernel_size=1,
            bias=False,
            #norm=get_norm(norm, num_channels),
        )
        out = x
        # The same two layers are applied `iterations` times in a loop.
        for i in range(iterations):
            out = conv1(out)
            out = F.relu_(out)
            out = conv2(out)
            out = F.relu_(out)
        return out

    # Texture extractor
    # Same loop as above, followed by a final 1x1 convolution whose output has
    # num_channels / 2 channels; this last layer extracts the texture features.
    def create_texture_extractor(x, num_channels, iterations=3):
        conv1 = Conv2d(
            num_channels,
            num_channels,
            kernel_size=1,
            bias=False,
            #norm=get_norm(norm, num_channels),
        )
        conv2 = Conv2d(
            num_channels,
            num_channels,
            kernel_size=1,
            bias=False,
            #norm=get_norm(norm, num_channels),
        )
        conv3 = Conv2d(
            num_channels,
            int(num_channels/2),
            kernel_size=1,
            bias=False,
        )
        out = x
        for i in range(iterations):
            out = conv1(out)
            out = F.relu_(out)
            out = conv2(out)
            out = F.relu_(out)
        out = conv3(out)
        out = F.relu_(out)
        return out

    bottom = p3
    # Scale the channels of p3 from out_channels to out_channels * 4 with channel_scaler.
    bottom = channel_scaler(bottom)
    # Extract content features from the scaled p3 and store the result in bottom.
    bottom = create_content_extractor(bottom, out_channels*4)
    # Sub-pixel convolution
    # nn.PixelShuffle(2) rearranges [bs, 4*C, H, W] into [bs, C, 2*H, 2*W]:
    # the channels are divided by 4 and the height and width are doubled.
    sub_pixel_conv = nn.PixelShuffle(2)
    bottom = sub_pixel_conv(bottom)
    #print("\np3 shape: ",bottom.shape,"\n")
    # We interpreted "wrap" as concatenating bottom and top
    # so the total channels is doubled after (basically place one on top
    # of the other)
    # Concatenate the rearranged bottom and p2 along the channel dimension
    # to form the new tensor top.
    top = p2
    top = torch.cat((bottom, top), dim=1)
    # Extract texture features from top and store the result in top.
    top = create_texture_extractor(top, out_channels*2)
    #top = top[:,256:]
    # Residual connection
    result = bottom + top
    return result
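The shape bookkeeping of the FTT step is easier to follow with a small dependency-free sketch. The code below re-implements the pixel-shuffle rearrangement on nested lists and traces the tensor shapes through the function above. `pixel_shuffle` and `ftt_shapes` are hypothetical helpers written for illustration; the real code uses `nn.PixelShuffle`.

```python
def pixel_shuffle(x, r):
    """Rearrange a (C*r*r, H, W) nested list into (C, H*r, W*r).

    Follows the usual sub-pixel layout:
    out[c][h*r + i][w*r + j] = x[c*r*r + i*r + j][h][w]
    """
    channels_in, height, width = len(x), len(x[0]), len(x[0][0])
    if channels_in % (r * r):
        raise ValueError("channel count must be divisible by r*r")
    channels_out = channels_in // (r * r)
    out = [[[0] * (width * r) for _ in range(height * r)]
           for _ in range(channels_out)]
    for c in range(channels_out):
        for h in range(height * r):
            for w in range(width * r):
                src = c * r * r + (h % r) * r + (w % r)
                out[c][h][w] = x[src][h // r][w // r]
    return out


def ftt_shapes(channels, height, width):
    """Per-sample (C, H, W) shapes through FTT for a p3 of the given size."""
    return {
        "p3_scaled": (channels * 4, height, width),          # after channel_scaler
        "p3_upsampled": (channels, height * 2, width * 2),   # after PixelShuffle(2)
        "concatenated": (channels * 2, height * 2, width * 2),  # cat with p2
        "texture": (channels, height * 2, width * 2),        # after texture extractor
    }


# Four 1x1 channels become one 2x2 channel.
print(pixel_shuffle([[[0]], [[1]], [[2]], [[3]]], 2))  # [[[0, 1], [2, 3]]]
print(ftt_shapes(256, 32, 32))
```

The trace shows why the final addition works: both the upsampled p3 and the texture output have `channels` channels at twice the resolution of p3, so p2 must already have that spatial size.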
GeneralizedRCNN(
(backbone): FPN(
(fpn_lateral2): Conv2d(256, 256, kernel_size=(1, 1), stride=(1, 1))
(fpn_output2): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
(fpn_lateral3): Conv2d(512, 256, kernel_size=(1, 1), stride=(1, 1))
(fpn_output3): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
(fpn_lateral4): Conv2d(1024, 256, kernel_size=(1, 1), stride=(1, 1))
(fpn_output4): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
(fpn_lateral5): Conv2d(2048, 256, kernel_size=(1, 1), stride=(1, 1))
(fpn_output5): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
(fpn_lateral6): Conv2d(4096, 256, kernel_size=(1, 1), stride=(1, 1))
(fpn_output6): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
(top_block): LastLevelMaxPool()
(bottom_up): ResNet(
(stem): BasicStem(
(conv1): Conv2d(
3, 64, kernel_size=(7, 7), stride=(2, 2), padding=(3, 3), bias=False
(norm): FrozenBatchNorm2d(num_features=64, eps=1e-05)
)
)
(res2): Sequential(
(0): BottleneckBlock(
(shortcut): Conv2d(
64, 256, kernel_size=(1, 1), stride=(1, 1), bias=False
(norm): FrozenBatchNorm2d(num_features=256, eps=1e-05)
)
(conv1): Conv2d(
64, 256, kernel_size=(1, 1), stride=(1, 1), bias=False
(norm): FrozenBatchNorm2d(num_features=256, eps=1e-05)
)
(conv2): Conv2d(
256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), groups=32, bias=False
(norm): FrozenBatchNorm2d(num_features=256, eps=1e-05)
)
(conv3): Conv2d(
256, 256, kernel_size=(1, 1), stride=(1, 1), bias=False
(norm): FrozenBatchNorm2d(num_features=256, eps=1e-05)
)
)
(1): BottleneckBlock(
(conv1): Conv2d(
256, 256, kernel_size=(1, 1), stride=(1, 1), bias=False
(norm): FrozenBatchNorm2d(num_features=256, eps=1e-05)
)
(conv2): Conv2d(
256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), groups=32, bias=False
(norm): FrozenBatchNorm2d(num_features=256, eps=1e-05)
)
(conv3): Conv2d(
256, 256, kernel_size=(1, 1), stride=(1, 1), bias=False
(norm): FrozenBatchNorm2d(num_features=256, eps=1e-05)
)
)
(2): BottleneckBlock(
(conv1): Conv2d(
256, 256, kernel_size=(1, 1), stride=(1, 1), bias=False
(norm): FrozenBatchNorm2d(num_features=256, eps=1e-05)
)
(conv2): Conv2d(
256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), groups=32, bias=False
(norm): FrozenBatchNorm2d(num_features=256, eps=1e-05)
)
(conv3): Conv2d(
256, 256, kernel_size=(1, 1), stride=(1, 1), bias=False
(norm): FrozenBatchNorm2d(num_features=256, eps=1e-05)
)
)
)
(res3): Sequential(
(0): SingleDownsampling(
(conv1): Conv2d(
256, 512, kernel_size=(1, 1), stride=(2, 2), bias=False
(norm): FrozenBatchNorm2d(num_features=512, eps=1e-05)
)
)
)
(res4): Sequential(
(0): BottleneckBlock(
(shortcut): Conv2d(
512, 1024, kernel_size=(1, 1), stride=(2, 2), bias=False
(norm): FrozenBatchNorm2d(num_features=1024, eps=1e-05)
)
(conv1): Conv2d(
512, 1024, kernel_size=(1, 1), stride=(1, 1), bias=False
(norm): FrozenBatchNorm2d(num_features=1024, eps=1e-05)
)
(conv2): Conv2d(
1024, 1024, kernel_size=(3, 3), stride=(2, 2), padding=(1, 1), groups=32, bias=False
(norm): FrozenBatchNorm2d(num_features=1024, eps=1e-05)
)
(conv3): Conv2d(
1024, 1024, kernel_size=(1, 1), stride=(1, 1), bias=False
(norm): FrozenBatchNorm2d(num_features=1024, eps=1e-05)
)
)
(1): BottleneckBlock(
(conv1): Conv2d(
1024, 1024, kernel_size=(1, 1), stride=(1, 1), bias=False
(norm): FrozenBatchNorm2d(num_features=1024, eps=1e-05)
)
(conv2): Conv2d(
1024, 1024, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), groups=32, bias=False
(norm): FrozenBatchNorm2d(num_features=1024, eps=1e-05)
)
(conv3): Conv2d(
1024, 1024, kernel_size=(1, 1), stride=(1, 1), bias=False
(norm): FrozenBatchNorm2d(num_features=1024, eps=1e-05)
)
)
(2): BottleneckBlock(
(conv1): Conv2d(
1024, 1024, kernel_size=(1, 1), stride=(1, 1), bias=False
(norm): FrozenBatchNorm2d(num_features=1024, eps=1e-05)
)
(conv2): Conv2d(
1024, 1024, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), groups=32, bias=False
(norm): FrozenBatchNorm2d(num_features=1024, eps=1e-05)
)
(conv3): Conv2d(
1024, 1024, kernel_size=(1, 1), stride=(1, 1), bias=False
(norm): FrozenBatchNorm2d(num_features=1024, eps=1e-05)
)
)
(3): BottleneckBlock(
(conv1): Conv2d(
1024, 1024, kernel_size=(1, 1), stride=(1, 1), bias=False
(norm): FrozenBatchNorm2d(num_features=1024, eps=1e-05)
)
(conv2): Conv2d(
1024, 1024, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), groups=32, bias=False
(norm): FrozenBatchNorm2d(num_features=1024, eps=1e-05)
)
(conv3): Conv2d(
1024, 1024, kernel_size=(1, 1), stride=(1, 1), bias=False
(norm): FrozenBatchNorm2d(num_features=1024, eps=1e-05)
)
)
)
(res5): Sequential(
(0): BottleneckBlock(
(shortcut): Conv2d(
1024, 2048, kernel_size=(1, 1), stride=(2, 2), bias=False
(norm): FrozenBatchNorm2d(num_features=2048, eps=1e-05)
)
(conv1): Conv2d(
1024, 2048, kernel_size=(1, 1), stride=(1, 1), bias=False
(norm): FrozenBatchNorm2d(num_features=2048, eps=1e-05)
)
(conv2): Conv2d(
2048, 2048, kernel_size=(3, 3), stride=(2, 2), padding=(1, 1), groups=32, bias=False
(norm): FrozenBatchNorm2d(num_features=2048, eps=1e-05)
)
(conv3): Conv2d(
2048, 2048, kernel_size=(1, 1), stride=(1, 1), bias=False
(norm): FrozenBatchNorm2d(num_features=2048, eps=1e-05)
)
)
(1): BottleneckBlock(
(conv1): Conv2d(
2048, 2048, kernel_size=(1, 1), stride=(1, 1), bias=False
(norm): FrozenBatchNorm2d(num_features=2048, eps=1e-05)
)
(conv2): Conv2d(
2048, 2048, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), groups=32, bias=False
(norm): FrozenBatchNorm2d(num_features=2048, eps=1e-05)
)
(conv3): Conv2d(
2048, 2048, kernel_size=(1, 1), stride=(1, 1), bias=False
(norm): FrozenBatchNorm2d(num_features=2048, eps=1e-05)
)
)
(2): BottleneckBlock(
(conv1): Conv2d(
2048, 2048, kernel_size=(1, 1), stride=(1, 1), bias=False
(norm): FrozenBatchNorm2d(num_features=2048, eps=1e-05)
)
(conv2): Conv2d(
2048, 2048, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), groups=32, bias=False
(norm): FrozenBatchNorm2d(num_features=2048, eps=1e-05)
)
(conv3): Conv2d(
2048, 2048, kernel_size=(1, 1), stride=(1, 1), bias=False
(norm): FrozenBatchNorm2d(num_features=2048, eps=1e-05)
)
)
(3): BottleneckBlock(
(conv1): Conv2d(
2048, 2048, kernel_size=(1, 1), stride=(1, 1), bias=False
(norm): FrozenBatchNorm2d(num_features=2048, eps=1e-05)
)
(conv2): Conv2d(
2048, 2048, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), groups=32, bias=False
(norm): FrozenBatchNorm2d(num_features=2048, eps=1e-05)
)
(conv3): Conv2d(
2048, 2048, kernel_size=(1, 1), stride=(1, 1), bias=False
(norm): FrozenBatchNorm2d(num_features=2048, eps=1e-05)
)
)
(4): BottleneckBlock(
(conv1): Conv2d(
2048, 2048, kernel_size=(1, 1), stride=(1, 1), bias=False
(norm): FrozenBatchNorm2d(num_features=2048, eps=1e-05)
)
(conv2): Conv2d(
2048, 2048, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), groups=32, bias=False
(norm): FrozenBatchNorm2d(num_features=2048, eps=1e-05)
)
(conv3): Conv2d(
2048, 2048, kernel_size=(1, 1), stride=(1, 1), bias=False
(norm): FrozenBatchNorm2d(num_features=2048, eps=1e-05)
)
)
(5): BottleneckBlock(
(conv1): Conv2d(
2048, 2048, kernel_size=(1, 1), stride=(1, 1), bias=False
(norm): FrozenBatchNorm2d(num_features=2048, eps=1e-05)
)
(conv2): Conv2d(
2048, 2048, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), groups=32, bias=False
(norm): FrozenBatchNorm2d(num_features=2048, eps=1e-05)
)
(conv3): Conv2d(
2048, 2048, kernel_size=(1, 1), stride=(1, 1), bias=False
(norm): FrozenBatchNorm2d(num_features=2048, eps=1e-05)
)
)
(6): BottleneckBlock(
(conv1): Conv2d(
2048, 2048, kernel_size=(1, 1), stride=(1, 1), bias=False
(norm): FrozenBatchNorm2d(num_features=2048, eps=1e-05)
)
(conv2): Conv2d(
2048, 2048, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), groups=32, bias=False
(norm): FrozenBatchNorm2d(num_features=2048, eps=1e-05)
)
(conv3): Conv2d(
2048, 2048, kernel_size=(1, 1), stride=(1, 1), bias=False
(norm): FrozenBatchNorm2d(num_features=2048, eps=1e-05)
)
)
(7): BottleneckBlock(
(conv1): Conv2d(
2048, 2048, kernel_size=(1, 1), stride=(1, 1), bias=False
(norm): FrozenBatchNorm2d(num_features=2048, eps=1e-05)
)
(conv2): Conv2d(
2048, 2048, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), groups=32, bias=False
(norm): FrozenBatchNorm2d(num_features=2048, eps=1e-05)
)
(conv3): Conv2d(
2048, 2048, kernel_size=(1, 1), stride=(1, 1), bias=False
(norm): FrozenBatchNorm2d(num_features=2048, eps=1e-05)
)
)
(8): BottleneckBlock(
(conv1): Conv2d(
2048, 2048, kernel_size=(1, 1), stride=(1, 1), bias=False
(norm): FrozenBatchNorm2d(num_features=2048, eps=1e-05)
)
(conv2): Conv2d(
2048, 2048, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), groups=32, bias=False
(norm): FrozenBatchNorm2d(num_features=2048, eps=1e-05)
)
(conv3): Conv2d(
2048, 2048, kernel_size=(1, 1), stride=(1, 1), bias=False
(norm): FrozenBatchNorm2d(num_features=2048, eps=1e-05)
)
)
(9): BottleneckBlock(
(conv1): Conv2d(
2048, 2048, kernel_size=(1, 1), stride=(1, 1), bias=False
(norm): FrozenBatchNorm2d(num_features=2048, eps=1e-05)
)
(conv2): Conv2d(
2048, 2048, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), groups=32, bias=False
(norm): FrozenBatchNorm2d(num_features=2048, eps=1e-05)
)
(conv3): Conv2d(
2048, 2048, kernel_size=(1, 1), stride=(1, 1), bias=False
(norm): FrozenBatchNorm2d(num_features=2048, eps=1e-05)
)
)
(10): BottleneckBlock(
(conv1): Conv2d(
2048, 2048, kernel_size=(1, 1), stride=(1, 1), bias=False
(norm): FrozenBatchNorm2d(num_features=2048, eps=1e-05)
)
(conv2): Conv2d(
2048, 2048, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), groups=32, bias=False
(norm): FrozenBatchNorm2d(num_features=2048, eps=1e-05)
)
(conv3): Conv2d(
2048, 2048, kernel_size=(1, 1), stride=(1, 1), bias=False
(norm): FrozenBatchNorm2d(num_features=2048, eps=1e-05)
)
)
(11): BottleneckBlock(
(conv1): Conv2d(
2048, 2048, kernel_size=(1, 1), stride=(1, 1), bias=False
(norm): FrozenBatchNorm2d(num_features=2048, eps=1e-05)
)
(conv2): Conv2d(
2048, 2048, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), groups=32, bias=False
(norm): FrozenBatchNorm2d(num_features=2048, eps=1e-05)
)
(conv3): Conv2d(
2048, 2048, kernel_size=(1, 1), stride=(1, 1), bias=False
(norm): FrozenBatchNorm2d(num_features=2048, eps=1e-05)
)
)
(12): BottleneckBlock(
(conv1): Conv2d(
2048, 2048, kernel_size=(1, 1), stride=(1, 1), bias=False
(norm): FrozenBatchNorm2d(num_features=2048, eps=1e-05)
)
(conv2): Conv2d(
2048, 2048, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), groups=32, bias=False
(norm): FrozenBatchNorm2d(num_features=2048, eps=1e-05)
)
(conv3): Conv2d(
2048, 2048, kernel_size=(1, 1), stride=(1, 1), bias=False
(norm): FrozenBatchNorm2d(num_features=2048, eps=1e-05)
)
)
(13): BottleneckBlock(
(conv1): Conv2d(
2048, 2048, kernel_size=(1, 1), stride=(1, 1), bias=False
(norm): FrozenBatchNorm2d(num_features=2048, eps=1e-05)
)
(conv2): Conv2d(
2048, 2048, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), groups=32, bias=False
(norm): FrozenBatchNorm2d(num_features=2048, eps=1e-05)
)
(conv3): Conv2d(
2048, 2048, kernel_size=(1, 1), stride=(1, 1), bias=False
(norm): FrozenBatchNorm2d(num_features=2048, eps=1e-05)
)
)
(14): BottleneckBlock(
(conv1): Conv2d(
2048, 2048, kernel_size=(1, 1), stride=(1, 1), bias=False
(norm): FrozenBatchNorm2d(num_features=2048, eps=1e-05)
)
(conv2): Conv2d(
2048, 2048, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), groups=32, bias=False
(norm): FrozenBatchNorm2d(num_features=2048, eps=1e-05)
)
(conv3): Conv2d(
2048, 2048, kernel_size=(1, 1), stride=(1, 1), bias=False
(norm): FrozenBatchNorm2d(num_features=2048, eps=1e-05)
)
)
(15): BottleneckBlock(
(conv1): Conv2d(
2048, 2048, kernel_size=(1, 1), stride=(1, 1), bias=False
(norm): FrozenBatchNorm2d(num_features=2048, eps=1e-05)
)
(conv2): Conv2d(
2048, 2048, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), groups=32, bias=False
(norm): FrozenBatchNorm2d(num_features=2048, eps=1e-05)
)
(conv3): Conv2d(
2048, 2048, kernel_size=(1, 1), stride=(1, 1), bias=False
(norm): FrozenBatchNorm2d(num_features=2048, eps=1e-05)
)
)
(16): BottleneckBlock(
(conv1): Conv2d(
2048, 2048, kernel_size=(1, 1), stride=(1, 1), bias=False
(norm): FrozenBatchNorm2d(num_features=2048, eps=1e-05)
)
(conv2): Conv2d(
2048, 2048, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), groups=32, bias=False
(norm): FrozenBatchNorm2d(num_features=2048, eps=1e-05)
)
(conv3): Conv2d(
2048, 2048, kernel_size=(1, 1), stride=(1, 1), bias=False
(norm): FrozenBatchNorm2d(num_features=2048, eps=1e-05)
)
)
(17): BottleneckBlock(
(conv1): Conv2d(
2048, 2048, kernel_size=(1, 1), stride=(1, 1), bias=False
(norm): FrozenBatchNorm2d(num_features=2048, eps=1e-05)
)
(conv2): Conv2d(
2048, 2048, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), groups=32, bias=False
(norm): FrozenBatchNorm2d(num_features=2048, eps=1e-05)
)
(conv3): Conv2d(
2048, 2048, kernel_size=(1, 1), stride=(1, 1), bias=False
(norm): FrozenBatchNorm2d(num_features=2048, eps=1e-05)
)
)
(18): BottleneckBlock(
(conv1): Conv2d(
2048, 2048, kernel_size=(1, 1), stride=(1, 1), bias=False
(norm): FrozenBatchNorm2d(num_features=2048, eps=1e-05)
)
(conv2): Conv2d(
2048, 2048, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), groups=32, bias=False
(norm): FrozenBatchNorm2d(num_features=2048, eps=1e-05)
)
(conv3): Conv2d(
2048, 2048, kernel_size=(1, 1), stride=(1, 1), bias=False
(norm): FrozenBatchNorm2d(num_features=2048, eps=1e-05)
)
)
(19): BottleneckBlock(
(conv1): Conv2d(
2048, 2048, kernel_size=(1, 1), stride=(1, 1), bias=False
(norm): FrozenBatchNorm2d(num_features=2048, eps=1e-05)
)
(conv2): Conv2d(
2048, 2048, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), groups=32, bias=False
(norm): FrozenBatchNorm2d(num_features=2048, eps=1e-05)
)
(conv3): Conv2d(
2048, 2048, kernel_size=(1, 1), stride=(1, 1), bias=False
(norm): FrozenBatchNorm2d(num_features=2048, eps=1e-05)
)
)
(20): BottleneckBlock(
(conv1): Conv2d(
2048, 2048, kernel_size=(1, 1), stride=(1, 1), bias=False
(norm): FrozenBatchNorm2d(num_features=2048, eps=1e-05)
)
(conv2): Conv2d(
2048, 2048, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), groups=32, bias=False
(norm): FrozenBatchNorm2d(num_features=2048, eps=1e-05)
)
(conv3): Conv2d(
2048, 2048, kernel_size=(1, 1), stride=(1, 1), bias=False
(norm): FrozenBatchNorm2d(num_features=2048, eps=1e-05)
)
)
(21): BottleneckBlock(
(conv1): Conv2d(
2048, 2048, kernel_size=(1, 1), stride=(1, 1), bias=False
(norm): FrozenBatchNorm2d(num_features=2048, eps=1e-05)
)
(conv2): Conv2d(
2048, 2048, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), groups=32, bias=False
(norm): FrozenBatchNorm2d(num_features=2048, eps=1e-05)
)
(conv3): Conv2d(
2048, 2048, kernel_size=(1, 1), stride=(1, 1), bias=False
(norm): FrozenBatchNorm2d(num_features=2048, eps=1e-05)
)
)
(22): BottleneckBlock(
(conv1): Conv2d(
2048, 2048, kernel_size=(1, 1), stride=(1, 1), bias=False
(norm): FrozenBatchNorm2d(num_features=2048, eps=1e-05)
)
(conv2): Conv2d(
2048, 2048, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), groups=32, bias=False
(norm): FrozenBatchNorm2d(num_features=2048, eps=1e-05)
)
(conv3): Conv2d(
2048, 2048, kernel_size=(1, 1), stride=(1, 1), bias=False
(norm): FrozenBatchNorm2d(num_features=2048, eps=1e-05)
)
)
)
(res6): Sequential(
(0): BottleneckBlock(
(shortcut): Conv2d(
2048, 4096, kernel_size=(1, 1), stride=(2, 2), bias=False
(norm): FrozenBatchNorm2d(num_features=4096, eps=1e-05)
)
(conv1): Conv2d(
2048, 4096, kernel_size=(1, 1), stride=(1, 1), bias=False
(norm): FrozenBatchNorm2d(num_features=4096, eps=1e-05)
)
(conv2): Conv2d(
4096, 4096, kernel_size=(3, 3), stride=(2, 2), padding=(1, 1), groups=32, bias=False
(norm): FrozenBatchNorm2d(num_features=4096, eps=1e-05)
)
(conv3): Conv2d(
4096, 4096, kernel_size=(1, 1), stride=(1, 1), bias=False
(norm): FrozenBatchNorm2d(num_features=4096, eps=1e-05)
)
)
(1): BottleneckBlock(
(conv1): Conv2d(
4096, 4096, kernel_size=(1, 1), stride=(1, 1), bias=False
(norm): FrozenBatchNorm2d(num_features=4096, eps=1e-05)
)
(conv2): Conv2d(
4096, 4096, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), groups=32, bias=False
(norm): FrozenBatchNorm2d(num_features=4096, eps=1e-05)
)
(conv3): Conv2d(
4096, 4096, kernel_size=(1, 1), stride=(1, 1), bias=False
(norm): FrozenBatchNorm2d(num_features=4096, eps=1e-05)
)
)
(2): BottleneckBlock(
(conv1): Conv2d(
4096, 4096, kernel_size=(1, 1), stride=(1, 1), bias=False
(norm): FrozenBatchNorm2d(num_features=4096, eps=1e-05)
)
(conv2): Conv2d(
4096, 4096, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), groups=32, bias=False
(norm): FrozenBatchNorm2d(num_features=4096, eps=1e-05)
)
(conv3): Conv2d(
4096, 4096, kernel_size=(1, 1), stride=(1, 1), bias=False
(norm): FrozenBatchNorm2d(num_features=4096, eps=1e-05)
)
)
)
)
)
(proposal_generator): RPN(
(rpn_head): StandardRPNHead(
(conv): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
(objectness_logits): Conv2d(256, 3, kernel_size=(1, 1), stride=(1, 1))
(anchor_deltas): Conv2d(256, 12, kernel_size=(1, 1), stride=(1, 1))
)
(anchor_generator): DefaultAnchorGenerator(
(cell_anchors): BufferList()
)
)
(roi_heads): StandardROIHeads(
(box_pooler): ROIPooler(
(level_poolers): ModuleList(
(0): ROIAlign(output_size=(7, 7), spatial_scale=0.25, sampling_ratio=0, aligned=True)
(1): ROIAlign(output_size=(7, 7), spatial_scale=0.125, sampling_ratio=0, aligned=True)
(2): ROIAlign(output_size=(7, 7), spatial_scale=0.0625, sampling_ratio=0, aligned=True)
(3): ROIAlign(output_size=(7, 7), spatial_scale=0.03125, sampling_ratio=0, aligned=True)
)
)
(box_head): FastRCNNConvFCHead(
(flatten): Flatten(start_dim=1, end_dim=-1)
(fc1): Linear(in_features=12544, out_features=1024, bias=True)
(fc_relu1): ReLU()
(fc2): Linear(in_features=1024, out_features=1024, bias=True)
(fc_relu2): ReLU()
)
(box_predictor): FastRCNNOutputLayers(
(cls_score): Linear(in_features=1024, out_features=81, bias=True)
(bbox_pred): Linear(in_features=1024, out_features=320, bias=True)
)
)
)
GeneralizedRCNN(
(backbone): FPN(
(fpn_lateral2): Conv2d(256, 256, kernel_size=(1, 1), stride=(1, 1))
(fpn_output2): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
(fpn_lateral3): Conv2d(512, 256, kernel_size=(1, 1), stride=(1, 1))
(fpn_output3): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
(fpn_lateral4): Conv2d(1024, 256, kernel_size=(1, 1), stride=(1, 1))
(fpn_output4): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
(fpn_lateral5): Conv2d(2048, 256, kernel_size=(1, 1), stride=(1, 1))
(fpn_output5): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
(fpn_lateral6): Conv2d(4096, 256, kernel_size=(1, 1), stride=(1, 1))
(fpn_output6): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
(top_block): LastLevelMaxPool()
(bottom_up): ResNet(
(stem): BasicStem(
(conv1): Conv2d(
3, 64, kernel_size=(7, 7), stride=(2, 2), padding=(3, 3), bias=False
(norm): FrozenBatchNorm2d(num_features=64, eps=1e-05)
)
)
(res2): Sequential(
(0): BottleneckBlock(
(shortcut): Conv2d(
64, 256, kernel_size=(1, 1), stride=(1, 1), bias=False
(norm): FrozenBatchNorm2d(num_features=256, eps=1e-05)
)
(conv1): Conv2d(
64, 256, kernel_size=(1, 1), stride=(1, 1), bias=False
(norm): FrozenBatchNorm2d(num_features=256, eps=1e-05)
)
(conv2): Conv2d(
256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), groups=32, bias=False
(norm): FrozenBatchNorm2d(num_features=256, eps=1e-05)
)
(conv3): Conv2d(
256, 256, kernel_size=(1, 1), stride=(1, 1), bias=False
(norm): FrozenBatchNorm2d(num_features=256, eps=1e-05)
)
)
(1): BottleneckBlock(
(conv1): Conv2d(
256, 256, kernel_size=(1, 1), stride=(1, 1), bias=False
(norm): FrozenBatchNorm2d(num_features=256, eps=1e-05)
)
(conv2): Conv2d(
256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), groups=32, bias=False
(norm): FrozenBatchNorm2d(num_features=256, eps=1e-05)
)
(conv3): Conv2d(
256, 256, kernel_size=(1, 1), stride=(1, 1), bias=False
(norm): FrozenBatchNorm2d(num_features=256, eps=1e-05)
)
)
(2): BottleneckBlock(
(conv1): Conv2d(
256, 256, kernel_size=(1, 1), stride=(1, 1), bias=False
(norm): FrozenBatchNorm2d(num_features=256, eps=1e-05)
)
(conv2): Conv2d(
256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), groups=32, bias=False
(norm): FrozenBatchNorm2d(num_features=256, eps=1e-05)
)
(conv3): Conv2d(
256, 256, kernel_size=(1, 1), stride=(1, 1), bias=False
(norm): FrozenBatchNorm2d(num_features=256, eps=1e-05)
)
)
)
(res3): Sequential(
(0): SingleDownsampling(
(conv1): Conv2d(
256, 512, kernel_size=(1, 1), stride=(2, 2), bias=False
(norm): FrozenBatchNorm2d(num_features=512, eps=1e-05)
)
)
)
(res4): Sequential(
(0): BottleneckBlock(
(shortcut): Conv2d(
512, 1024, kernel_size=(1, 1), stride=(2, 2), bias=False
(norm): FrozenBatchNorm2d(num_features=1024, eps=1e-05)
)
(conv1): Conv2d(
512, 1024, kernel_size=(1, 1), stride=(1, 1), bias=False
(norm): FrozenBatchNorm2d(num_features=1024, eps=1e-05)
)
(conv2): Conv2d(
1024, 1024, kernel_size=(3, 3), stride=(2, 2), padding=(1, 1), groups=32, bias=False
(norm): FrozenBatchNorm2d(num_features=1024, eps=1e-05)
)
(conv3): Conv2d(
1024, 1024, kernel_size=(1, 1), stride=(1, 1), bias=False
(norm): FrozenBatchNorm2d(num_features=1024, eps=1e-05)
)
)
(1): BottleneckBlock(
(conv1): Conv2d(
1024, 1024, kernel_size=(1, 1), stride=(1, 1), bias=False
(norm): FrozenBatchNorm2d(num_features=1024, eps=1e-05)
)
(conv2): Conv2d(
1024, 1024, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), groups=32, bias=False
(norm): FrozenBatchNorm2d(num_features=1024, eps=1e-05)
)
(conv3): Conv2d(
1024, 1024, kernel_size=(1, 1), stride=(1, 1), bias=False
(norm): FrozenBatchNorm2d(num_features=1024, eps=1e-05)
)
)
(2): BottleneckBlock(
(conv1): Conv2d(
1024, 1024, kernel_size=(1, 1), stride=(1, 1), bias=False
(norm): FrozenBatchNorm2d(num_features=1024, eps=1e-05)
)
(conv2): Conv2d(
1024, 1024, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), groups=32, bias=False
(norm): FrozenBatchNorm2d(num_features=1024, eps=1e-05)
)
(conv3): Conv2d(
1024, 1024, kernel_size=(1, 1), stride=(1, 1), bias=False
(norm): FrozenBatchNorm2d(num_features=1024, eps=1e-05)
)
)
(3): BottleneckBlock(
(conv1): Conv2d(
1024, 1024, kernel_size=(1, 1), stride=(1, 1), bias=False
(norm): FrozenBatchNorm2d(num_features=1024, eps=1e-05)
)
(conv2): Conv2d(
1024, 1024, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), groups=32, bias=False
(norm): FrozenBatchNorm2d(num_features=1024, eps=1e-05)
)
(conv3): Conv2d(
1024, 1024, kernel_size=(1, 1), stride=(1, 1), bias=False
(norm): FrozenBatchNorm2d(num_features=1024, eps=1e-05)
)
)
)
(res5): Sequential(
(0): BottleneckBlock(
(shortcut): Conv2d(
1024, 2048, kernel_size=(1, 1), stride=(2, 2), bias=False
(norm): FrozenBatchNorm2d(num_features=2048, eps=1e-05)
)
(conv1): Conv2d(
1024, 2048, kernel_size=(1, 1), stride=(1, 1), bias=False
(norm): FrozenBatchNorm2d(num_features=2048, eps=1e-05)
)
(conv2): Conv2d(
2048, 2048, kernel_size=(3, 3), stride=(2, 2), padding=(1, 1), groups=32, bias=False
(norm): FrozenBatchNorm2d(num_features=2048, eps=1e-05)
)
(conv3): Conv2d(
2048, 2048, kernel_size=(1, 1), stride=(1, 1), bias=False
(norm): FrozenBatchNorm2d(num_features=2048, eps=1e-05)
)
)
(1): BottleneckBlock(
(conv1): Conv2d(
2048, 2048, kernel_size=(1, 1), stride=(1, 1), bias=False
(norm): FrozenBatchNorm2d(num_features=2048, eps=1e-05)
)
(conv2): Conv2d(
2048, 2048, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), groups=32, bias=False
(norm): FrozenBatchNorm2d(num_features=2048, eps=1e-05)
)
(conv3): Conv2d(
2048, 2048, kernel_size=(1, 1), stride=(1, 1), bias=False
(norm): FrozenBatchNorm2d(num_features=2048, eps=1e-05)
)
)
(2): BottleneckBlock(
(conv1): Conv2d(
2048, 2048, kernel_size=(1, 1), stride=(1, 1), bias=False
(norm): FrozenBatchNorm2d(num_features=2048, eps=1e-05)
)
(conv2): Conv2d(
2048, 2048, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), groups=32, bias=False
(norm): FrozenBatchNorm2d(num_features=2048, eps=1e-05)
)
(conv3): Conv2d(
2048, 2048, kernel_size=(1, 1), stride=(1, 1), bias=False
(norm): FrozenBatchNorm2d(num_features=2048, eps=1e-05)
)
)
(3): BottleneckBlock(
(conv1): Conv2d(
2048, 2048, kernel_size=(1, 1), stride=(1, 1), bias=False
(norm): FrozenBatchNorm2d(num_features=2048, eps=1e-05)
)
(conv2): Conv2d(
2048, 2048, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), groups=32, bias=False
(norm): FrozenBatchNorm2d(num_features=2048, eps=1e-05)
)
(conv3): Conv2d(
2048, 2048, kernel_size=(1, 1), stride=(1, 1), bias=False
(norm): FrozenBatchNorm2d(num_features=2048, eps=1e-05)
)
)
(4): BottleneckBlock(
(conv1): Conv2d(
2048, 2048, kernel_size=(1, 1), stride=(1, 1), bias=False
(norm): FrozenBatchNorm2d(num_features=2048, eps=1e-05)
)
(conv2): Conv2d(
2048, 2048, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), groups=32, bias=False
(norm): FrozenBatchNorm2d(num_features=2048, eps=1e-05)
)
(conv3): Conv2d(
2048, 2048, kernel_size=(1, 1), stride=(1, 1), bias=False
(norm): FrozenBatchNorm2d(num_features=2048, eps=1e-05)
)
)
(5): BottleneckBlock(
(conv1): Conv2d(
2048, 2048, kernel_size=(1, 1), stride=(1, 1), bias=False
(norm): FrozenBatchNorm2d(num_features=2048, eps=1e-05)
)
(conv2): Conv2d(
2048, 2048, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), groups=32, bias=False
(norm): FrozenBatchNorm2d(num_features=2048, eps=1e-05)
)
(conv3): Conv2d(
2048, 2048, kernel_size=(1, 1), stride=(1, 1), bias=False
(norm): FrozenBatchNorm2d(num_features=2048, eps=1e-05)
)
)
)
(res6): Sequential(
(0): BottleneckBlock(
(shortcut): Conv2d(
2048, 4096, kernel_size=(1, 1), stride=(2, 2), bias=False
(norm): FrozenBatchNorm2d(num_features=4096, eps=1e-05)
)
(conv1): Conv2d(
2048, 4096, kernel_size=(1, 1), stride=(1, 1), bias=False
(norm): FrozenBatchNorm2d(num_features=4096, eps=1e-05)
)
(conv2): Conv2d(
4096, 4096, kernel_size=(3, 3), stride=(2, 2), padding=(1, 1), groups=32, bias=False
(norm): FrozenBatchNorm2d(num_features=4096, eps=1e-05)
)
(conv3): Conv2d(
4096, 4096, kernel_size=(1, 1), stride=(1, 1), bias=False
(norm): FrozenBatchNorm2d(num_features=4096, eps=1e-05)
)
)
(1): BottleneckBlock(
(conv1): Conv2d(
4096, 4096, kernel_size=(1, 1), stride=(1, 1), bias=False
(norm): FrozenBatchNorm2d(num_features=4096, eps=1e-05)
)
(conv2): Conv2d(
4096, 4096, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), groups=32, bias=False
(norm): FrozenBatchNorm2d(num_features=4096, eps=1e-05)
)
(conv3): Conv2d(
4096, 4096, kernel_size=(1, 1), stride=(1, 1), bias=False
(norm): FrozenBatchNorm2d(num_features=4096, eps=1e-05)
)
)
(2): BottleneckBlock(
(conv1): Conv2d(
4096, 4096, kernel_size=(1, 1), stride=(1, 1), bias=False
(norm): FrozenBatchNorm2d(num_features=4096, eps=1e-05)
)
(conv2): Conv2d(
4096, 4096, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), groups=32, bias=False
(norm): FrozenBatchNorm2d(num_features=4096, eps=1e-05)
)
(conv3): Conv2d(
4096, 4096, kernel_size=(1, 1), stride=(1, 1), bias=False
(norm): FrozenBatchNorm2d(num_features=4096, eps=1e-05)
)
)
)
)
)
(proposal_generator): RPN(
(rpn_head): StandardRPNHead(
(conv): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
(objectness_logits): Conv2d(256, 3, kernel_size=(1, 1), stride=(1, 1))
(anchor_deltas): Conv2d(256, 12, kernel_size=(1, 1), stride=(1, 1))
)
(anchor_generator): DefaultAnchorGenerator(
(cell_anchors): BufferList()
)
)
(roi_heads): StandardROIHeads(
(box_pooler): ROIPooler(
(level_poolers): ModuleList(
(0): ROIAlign(output_size=(7, 7), spatial_scale=0.25, sampling_ratio=0, aligned=True)
(1): ROIAlign(output_size=(7, 7), spatial_scale=0.125, sampling_ratio=0, aligned=True)
(2): ROIAlign(output_size=(7, 7), spatial_scale=0.0625, sampling_ratio=0, aligned=True)
(3): ROIAlign(output_size=(7, 7), spatial_scale=0.03125, sampling_ratio=0, aligned=True)
)
)
(box_head): FastRCNNConvFCHead(
(flatten): Flatten(start_dim=1, end_dim=-1)
(fc1): Linear(in_features=12544, out_features=1024, bias=True)
(fc_relu1): ReLU()
(fc2): Linear(in_features=1024, out_features=1024, bias=True)
(fc_relu2): ReLU()
)
(box_predictor): FastRCNNOutputLayers(
(cls_score): Linear(in_features=1024, out_features=81, bias=True)
(bbox_pred): Linear(in_features=1024, out_features=320, bias=True)
)
)
)
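HAT code
CAB
Since Transformer-based structures usually require a large number of channels for token embedding, directly using convolutions of constant width incurs a large computation cost. We therefore compress the channel numbers of the two convolution layers by a constant β. For an input feature with C channels, the channel number of the output feature after the first convolution layer is squeezed to C/β, and the second layer then expands the feature back to C channels. Next, a standard CA (channel attention) module [68] is used to adaptively rescale the channel-wise features.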
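The saving from the compression constant β can be estimated by counting convolution weights. This is a rough sketch: `cab_conv_weights` is a hypothetical helper, and the 3x3 kernel, the width C = 180 and β = 3 are assumptions chosen for illustration; biases and the CA module are ignored.

```python
def cab_conv_weights(channels, beta, kernel_size=3):
    """Weights of the two convolutions in a CAB-style block.

    The first convolution squeezes C -> C / beta, the second expands
    C / beta -> C.
    """
    if channels % beta:
        raise ValueError("channels must be divisible by beta")
    hidden = channels // beta
    squeeze = channels * hidden * kernel_size * kernel_size
    expand = hidden * channels * kernel_size * kernel_size
    return squeeze + expand


constant_width = cab_conv_weights(180, beta=1)  # no compression
compressed = cab_conv_weights(180, beta=3)
print(constant_width, compressed, constant_width // compressed)  # 583200 194400 3
```

Under these assumptions the weight count, and with it the multiply-accumulate cost per pixel, drops by a factor of β.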
HAB
W-MSA
Window partition
Linear
The forward pass
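PatchMerging
In a vision attention model, patch merging brings in context from a larger receptive field to help the model understand the image better. It does this by splitting the input feature into four sub-regions and merging them.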
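A minimal sketch of the split-and-merge step on nested lists, assuming the Swin Transformer convention in which the four sub-regions are the interleaved 2x2 neighbours of each position. `patch_merge` is a hypothetical helper; the linear layer that usually follows to reduce 4C channels to 2C is omitted.

```python
def patch_merge(x):
    """Merge each 2x2 neighbourhood of an H x W x C map into one position.

    Returns an (H/2) x (W/2) x 4C nested list. Channel order follows the
    assumed convention: top-left, bottom-left, top-right, bottom-right.
    """
    height, width = len(x), len(x[0])
    if height % 2 or width % 2:
        raise ValueError("height and width must be even")
    merged = []
    for i in range(0, height, 2):
        row = []
        for j in range(0, width, 2):
            row.append(x[i][j] + x[i + 1][j] + x[i][j + 1] + x[i + 1][j + 1])
        merged.append(row)
    return merged


# A 2x2 map with one channel collapses to a single position with four channels.
feature = [[[1], [2]],
           [[3], [4]]]
print(patch_merge(feature))  # [[[1, 3, 2, 4]]]
```

Each output position now summarizes a 2x2 area of the input, so later attention windows of the same size cover twice the spatial extent.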
OCAB
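Specifically, nn.Linear(dim, dim * 3, bias=qkv_bias) creates a linear layer that takes input features of dimension dim and maps them to an output of dimension dim * 3. The factor of 3 is there because the output contains three parts: the query (q), the key (k) and the value (v).
The weight matrix of this layer has shape (dim * 3, dim): the input feature vector is multiplied by the transposed weight matrix and the bias term is then added. The bias=qkv_bias argument controls whether a bias term is included.
After this projection the input features yield the query (q), key (k) and value (v) representations used in the subsequent attention computation.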
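The projection and the split can be traced by hand for a single token. This is a dependency-free sketch: `linear` is a hypothetical helper mimicking y = W x + b, the weights are made up, and splitting the output into three contiguous chunks of size dim is an assumption about how q, k and v are laid out.

```python
def linear(x, weight, bias=None):
    """y = W x + b for one token; weight has shape (out_features, in_features)."""
    y = [sum(w_ij * x_j for w_ij, x_j in zip(row, x)) for row in weight]
    if bias is not None:
        y = [y_i + b_i for y_i, b_i in zip(y, bias)]
    return y


dim = 2
# Made-up (dim * 3, dim) weight: rows 0-1 give q, rows 2-3 give k, rows 4-5 give v.
weight = [[1, 0], [0, 1],
          [2, 0], [0, 2],
          [3, 0], [0, 3]]
x = [1.0, -1.0]

qkv = linear(x, weight)
q, k, v = qkv[:dim], qkv[dim:2 * dim], qkv[2 * dim:]
print(q, k, v)  # [1.0, -1.0] [2.0, -2.0] [3.0, -3.0]
```

A single matrix multiplication produces all three projections at once, which is cheaper than running three separate dim-to-dim layers.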
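The _no_grad_trunc_normal_ function initializes the given tensor from a truncated normal distribution and guarantees that the generated values lie within the specified range, which helps the initialization and training of the model.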
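The idea of a truncated normal can be sketched with simple rejection sampling. This is not how the library function is implemented (it transforms uniform samples through the inverse CDF instead); `trunc_normal` below is a hypothetical helper, and the std and bounds are example values.

```python
import random


def trunc_normal(n, mean=0.0, std=1.0, a=-2.0, b=2.0, seed=0):
    """Draw n values from N(mean, std^2) restricted to the interval [a, b].

    Samples outside the interval are discarded and redrawn. This is slow
    when [a, b] lies far from the mean, so it is only suitable for a demo.
    """
    if not a < b:
        raise ValueError("a must be smaller than b")
    rng = random.Random(seed)
    values = []
    while len(values) < n:
        sample = rng.gauss(mean, std)
        if a <= sample <= b:
            values.append(sample)
    return values


weights = trunc_normal(1000, std=0.02, a=-0.04, b=0.04)
print(min(weights) >= -0.04 and max(weights) <= 0.04)  # True
```

Cutting off the tails keeps rare, very large initial weights out of the model, which is the property the initializer is meant to guarantee.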
The forward function
Improvements
The original plain FTT was replaced by an FTT module that uses Swin Transformer to extract features.
Loss before the change