EFPN Code Walkthrough

Paper

Extended Feature Pyramid Network for Small Object Detection
Training script

python3 D:/Project/EFPN-detectron2-master/tools/train_net.py --config-file configs/InstanceSegmentation/pointrend_rcnn_R_50_FPN_1x_coco.yaml --num-gpus 1
Configuration in cfg
First obtain the configuration object cfg. Once you have the cfg object, you can customize the model and the training process by modifying its attributes. For example, cfg.MODEL.WEIGHTS = "path/to/weights.pth" sets the path of the pre-trained weights to load, and cfg.SOLVER.BASE_LR = 0.001 sets the base learning rate.
cfg.merge_from_file() merges the options from the given config file into the current cfg object, overriding existing options or adding new ones. The point is to combine a predefined model's configuration with the current cfg object, so that the model uses the correct parameters and settings during training or inference.
By merging config files you start from the predefined model's defaults and then modify or override specific options as needed, which makes it quick to configure a pre-trained model and run training or inference.
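As a reference, a minimal sketch of this pattern (the YAML path and weight path below are placeholders for illustration, not the project's actual files):

from detectron2.config import get_cfg

cfg = get_cfg()                                   # start from detectron2's default configuration
cfg.merge_from_file("path/to/config.yaml")        # merge the project's YAML on top of the defaults
cfg.MODEL.WEIGHTS = "path/to/weights.pth"         # pre-trained weights to load
cfg.SOLVER.BASE_LR = 0.001                        # base learning rate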
Here, setting cfg.MODEL.RESNETS.NUM_GROUPS to 32 means a ResNeXt model is used, where the input feature maps are split into 32 groups for grouped convolution. Setting it to 1 means a plain ResNet is used, without grouped convolution.
By adjusting cfg.MODEL.RESNETS.NUM_GROUPS you control whether the backbone is a ResNet or a ResNeXt, to suit different tasks and requirements.
Setting cfg.MODEL.BACKBONE.NAME to "build_resnet_fpn_backbone" tells the model to build its backbone with that function. During the forward pass, the input image is passed through the ResNet to extract features, which are then fused by the FPN to produce multi-scale feature representations.
By choosing a different backbone name you can use another predefined backbone, or a custom one, to suit different tasks and datasets.
This option takes the feature maps of ResNet stages 2, 3, 4, 5 and 6 as input, meaning the features of these levels are passed to the FPN for fusion. By choosing different input feature levels you can decide, according to the task and dataset, which levels take part in the fusion, so as to obtain a better multi-scale representation.
cfg.MODEL.RPN.BATCH_SIZE_PER_IMAGE = 128: this sets, for the Region Proposal Network (RPN), how many candidate regions are sampled per image when computing the RPN loss. On each image the RPN generates a set of candidates, some of which are positives (containing an object) and some negatives (background). BATCH_SIZE_PER_IMAGE is the total number of sampled candidates per image; the split between positives and negatives within this batch is handled automatically by the sampler.
cfg.MODEL.ROI_HEADS.BATCH_SIZE_PER_IMAGE = 256: this sets, for the ROI heads (Region of Interest heads), how many proposals are sampled per image for training. The ROI heads are the part of the detector that classifies candidate regions and regresses their boxes. BATCH_SIZE_PER_IMAGE is the total number of proposals per image used to train the ROI heads; again, the split between positives and negatives within this batch is handled automatically by the sampler.
cfg.SOLVER.IMS_PER_BATCH = 1: this sets the batch size used for each gradient update. IMS_PER_BATCH is the number of images per training iteration; in this example a single image is used per update.
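Taken together, the settings discussed above look roughly like this (a sketch; the values follow the text, not necessarily the repository's final config):

cfg.MODEL.RESNETS.NUM_GROUPS = 32                      # 32 -> ResNeXt-style grouped conv, 1 -> plain ResNet
cfg.MODEL.BACKBONE.NAME = "build_resnet_fpn_backbone"  # ResNet + FPN backbone builder
cfg.MODEL.RPN.BATCH_SIZE_PER_IMAGE = 128               # candidates sampled per image for the RPN loss
cfg.MODEL.ROI_HEADS.BATCH_SIZE_PER_IMAGE = 256         # proposals sampled per image for the ROI heads
cfg.SOLVER.IMS_PER_BATCH = 1                           # images per gradient update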

Constructing the trainer
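The trainer code itself is not reproduced here; as a reference, the usual detectron2 pattern looks roughly like the sketch below (the repository's train_net.py may subclass DefaultTrainer instead of using it directly):

from detectron2.engine import DefaultTrainer

trainer = DefaultTrainer(cfg)          # builds model, dataloader and optimizer from cfg
trainer.resume_or_load(resume=False)   # optionally load cfg.MODEL.WEIGHTS
trainer.train()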

FTT.py

The 1x1 channel_scaler convolution defined below expands the input from out_channels channels to out_channels * 4.
import logging
import numpy as np
import fvcore.nn.weight_init as weight_init
import torch
import torch.nn.functional as F
from torch import nn
from detectron2.layers import Conv2d, ShapeSpec, get_norm
import math
from .backbone import Backbone
from .build import BACKBONE_REGISTRY
from .resnet import build_resnet_backbone


# p2, p3 in the paper is p3, p4 for us
# p2 and p3 are both tensors, each a feature map of shape [bs, channels, height, width]
def FTT_get_p3pr(p2, p3, out_channels, norm):
    # 1x1 convolution that expands the channels from out_channels to out_channels * 4
    channel_scaler = Conv2d(
        out_channels,
        out_channels * 4,
        kernel_size=1,
        bias=False
        # norm=''
    )

    # Two inner helper functions are defined below.
    # Content-feature extractor
    # tuple of (conv2d, conv2d, iter)
    # Applies 1x1 conv layers + ReLU several times to extract content features
    # (the content features could also be produced with a transformer).
    def create_content_extractor(x, num_channels, iterations=3):
        conv1 = Conv2d(
            num_channels,
            num_channels,
            kernel_size=1,
            bias=False,
            # norm=get_norm(norm, num_channels),
        )
        conv2 = Conv2d(
            num_channels,
            num_channels,
            kernel_size=1,
            bias=False,
            # norm=get_norm(norm, num_channels),
        )
        out = x
        # apply the conv/ReLU pairs repeatedly in a for loop
        for i in range(iterations):
            out = conv1(out)
            out = F.relu_(out)
            out = conv2(out)
            out = F.relu_(out)
        return out

    # Texture-feature extractor
    # Same pattern, but a final 1x1 convolution with num_channels / 2 output
    # channels extracts the texture features.
    def create_texture_extractor(x, num_channels, iterations=3):
        conv1 = Conv2d(
            num_channels,
            num_channels,
            kernel_size=1,
            bias=False,
            # norm=get_norm(norm, num_channels),
        )
        conv2 = Conv2d(
            num_channels,
            num_channels,
            kernel_size=1,
            bias=False,
            # norm=get_norm(norm, num_channels),
        )
        conv3 = Conv2d(
            num_channels,
            int(num_channels / 2),
            kernel_size=1,
            bias=False,
        )
        out = x
        for i in range(iterations):
            out = conv1(out)
            out = F.relu_(out)
            out = conv2(out)
            out = F.relu_(out)
        out = conv3(out)
        out = F.relu_(out)
        return out

    bottom = p3
    # Scale p3's channels: channel_scaler expands them from out_channels to out_channels * 4.
    bottom = channel_scaler(bottom)
    # Extract content features from the scaled p3 and store the result back in bottom.
    bottom = create_content_extractor(bottom, out_channels * 4)
    # Sub-pixel convolution
    # nn.PixelShuffle(2) rearranges channels into space, doubling the spatial size
    # of bottom (and dividing its channel count by 4).
    sub_pixel_conv = nn.PixelShuffle(2)
    bottom = sub_pixel_conv(bottom)
    # print("\np3 shape: ", bottom.shape, "\n")
    # We interpreted "wrap" as concatenating bottom and top
    # so the total channels is doubled after (basically place one on top
    # of the other)
    top = p2
    # Concatenate the shuffled bottom and p2 along the channel dimension into a new tensor top.
    top = torch.cat((bottom, top), axis=1)
    # Extract texture features from the concatenated tensor and store the result back in top.
    top = create_texture_extractor(top, out_channels * 2)
    # top = top[:,256:]
    # Residual connection between the two branches.
    result = bottom + top
    return result
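For intuition, a rough shape check of this function with hypothetical sizes (assuming out_channels = 256, p3 at 32x32 and p2 at 64x64; run on CPU since the convolutions are created inside the function):

# p3: [1, 256, 32, 32] --channel_scaler--> [1, 1024, 32, 32]
# --content extractor--> [1, 1024, 32, 32] --PixelShuffle(2)--> [1, 256, 64, 64]   (= bottom)
# cat(bottom, p2) along channels -> [1, 512, 64, 64]
# --texture extractor (512 -> 512/2)--> [1, 256, 64, 64]                           (= top)
# result = bottom + top -> [1, 256, 64, 64], i.e. a new p3' map at p2's resolution
p2 = torch.randn(1, 256, 64, 64)
p3 = torch.randn(1, 256, 32, 32)
p3_prime = FTT_get_p3pr(p2, p3, out_channels=256, norm="")
print(p3_prime.shape)  # expected: torch.Size([1, 256, 64, 64])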

GeneralizedRCNN(
 (backbone): FPN(
 (fpn_lateral2): Conv2d(256, 256, kernel_size=(1, 1), stride=(1, 1))
 (fpn_output2): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
 (fpn_lateral3): Conv2d(512, 256, kernel_size=(1, 1), stride=(1, 1))
 (fpn_output3): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
 (fpn_lateral4): Conv2d(1024, 256, kernel_size=(1, 1), stride=(1, 1))
 (fpn_output4): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
 (fpn_lateral5): Conv2d(2048, 256, kernel_size=(1, 1), stride=(1, 1))
 (fpn_output5): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
 (fpn_lateral6): Conv2d(4096, 256, kernel_size=(1, 1), stride=(1, 1))
 (fpn_output6): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
 (top_block): LastLevelMaxPool()
 (bottom_up): ResNet(
 (stem): BasicStem(
 (conv1): Conv2d(
 3, 64, kernel_size=(7, 7), stride=(2, 2), padding=(3, 3), bias=False
 (norm): FrozenBatchNorm2d(num_features=64, eps=1e-05)
 )
 )
 (res2): Sequential(
 (0): BottleneckBlock(
 (shortcut): Conv2d(
 64, 256, kernel_size=(1, 1), stride=(1, 1), bias=False
 (norm): FrozenBatchNorm2d(num_features=256, eps=1e-05)
 )
 (conv1): Conv2d(
 64, 256, kernel_size=(1, 1), stride=(1, 1), bias=False
 (norm): FrozenBatchNorm2d(num_features=256, eps=1e-05)
 )
 (conv2): Conv2d(
 256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), groups=32, bias=False
 (norm): FrozenBatchNorm2d(num_features=256, eps=1e-05)
 )
 (conv3): Conv2d(
 256, 256, kernel_size=(1, 1), stride=(1, 1), bias=False
 (norm): FrozenBatchNorm2d(num_features=256, eps=1e-05)
 )
 )
 (1): BottleneckBlock(
 (conv1): Conv2d(
 256, 256, kernel_size=(1, 1), stride=(1, 1), bias=False
 (norm): FrozenBatchNorm2d(num_features=256, eps=1e-05)
 )
 (conv2): Conv2d(
 256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), groups=32, bias=False
 (norm): FrozenBatchNorm2d(num_features=256, eps=1e-05)
 )
 (conv3): Conv2d(
 256, 256, kernel_size=(1, 1), stride=(1, 1), bias=False
 (norm): FrozenBatchNorm2d(num_features=256, eps=1e-05)
 )
 )
 (2): BottleneckBlock(
 (conv1): Conv2d(
 256, 256, kernel_size=(1, 1), stride=(1, 1), bias=False
 (norm): FrozenBatchNorm2d(num_features=256, eps=1e-05)
 )
 (conv2): Conv2d(
 256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), groups=32, bias=False
 (norm): FrozenBatchNorm2d(num_features=256, eps=1e-05)
 )
 (conv3): Conv2d(
 256, 256, kernel_size=(1, 1), stride=(1, 1), bias=False
 (norm): FrozenBatchNorm2d(num_features=256, eps=1e-05)
 )
 )
 )
 (res3): Sequential(
 (0): SingleDownsampling(
 (conv1): Conv2d(
 256, 512, kernel_size=(1, 1), stride=(2, 2), bias=False
 (norm): FrozenBatchNorm2d(num_features=512, eps=1e-05)
 )
 )
 )
 (res4): Sequential(
 (0): BottleneckBlock(
 (shortcut): Conv2d(
 512, 1024, kernel_size=(1, 1), stride=(2, 2), bias=False
 (norm): FrozenBatchNorm2d(num_features=1024, eps=1e-05)
 )
 (conv1): Conv2d(
 512, 1024, kernel_size=(1, 1), stride=(1, 1), bias=False
 (norm): FrozenBatchNorm2d(num_features=1024, eps=1e-05)
 )
 (conv2): Conv2d(
 1024, 1024, kernel_size=(3, 3), stride=(2, 2), padding=(1, 1), groups=32, bias=False
 (norm): FrozenBatchNorm2d(num_features=1024, eps=1e-05)
 )
 (conv3): Conv2d(
 1024, 1024, kernel_size=(1, 1), stride=(1, 1), bias=False
 (norm): FrozenBatchNorm2d(num_features=1024, eps=1e-05)
 )
 )
 (1): BottleneckBlock(
 (conv1): Conv2d(
 1024, 1024, kernel_size=(1, 1), stride=(1, 1), bias=False
 (norm): FrozenBatchNorm2d(num_features=1024, eps=1e-05)
 )
 (conv2): Conv2d(
 1024, 1024, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), groups=32, bias=False
 (norm): FrozenBatchNorm2d(num_features=1024, eps=1e-05)
 )
 (conv3): Conv2d(
 1024, 1024, kernel_size=(1, 1), stride=(1, 1), bias=False
 (norm): FrozenBatchNorm2d(num_features=1024, eps=1e-05)
 )
 )
 (2): BottleneckBlock(
 (conv1): Conv2d(
 1024, 1024, kernel_size=(1, 1), stride=(1, 1), bias=False
 (norm): FrozenBatchNorm2d(num_features=1024, eps=1e-05)
 )
 (conv2): Conv2d(
 1024, 1024, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), groups=32, bias=False
 (norm): FrozenBatchNorm2d(num_features=1024, eps=1e-05)
 )
 (conv3): Conv2d(
 1024, 1024, kernel_size=(1, 1), stride=(1, 1), bias=False
 (norm): FrozenBatchNorm2d(num_features=1024, eps=1e-05)
 )
 )
 (3): BottleneckBlock(
 (conv1): Conv2d(
 1024, 1024, kernel_size=(1, 1), stride=(1, 1), bias=False
 (norm): FrozenBatchNorm2d(num_features=1024, eps=1e-05)
 )
 (conv2): Conv2d(
 1024, 1024, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), groups=32, bias=False
 (norm): FrozenBatchNorm2d(num_features=1024, eps=1e-05)
 )
 (conv3): Conv2d(
 1024, 1024, kernel_size=(1, 1), stride=(1, 1), bias=False
 (norm): FrozenBatchNorm2d(num_features=1024, eps=1e-05)
 )
 )
 )
 (res5): Sequential(
 (0): BottleneckBlock(
 (shortcut): Conv2d(
 1024, 2048, kernel_size=(1, 1), stride=(2, 2), bias=False
 (norm): FrozenBatchNorm2d(num_features=2048, eps=1e-05)
 )
 (conv1): Conv2d(
 1024, 2048, kernel_size=(1, 1), stride=(1, 1), bias=False
 (norm): FrozenBatchNorm2d(num_features=2048, eps=1e-05)
 )
 (conv2): Conv2d(
 2048, 2048, kernel_size=(3, 3), stride=(2, 2), padding=(1, 1), groups=32, bias=False
 (norm): FrozenBatchNorm2d(num_features=2048, eps=1e-05)
 )
 (conv3): Conv2d(
 2048, 2048, kernel_size=(1, 1), stride=(1, 1), bias=False
 (norm): FrozenBatchNorm2d(num_features=2048, eps=1e-05)
 )
 )
 (1): BottleneckBlock(
 (conv1): Conv2d(
 2048, 2048, kernel_size=(1, 1), stride=(1, 1), bias=False
 (norm): FrozenBatchNorm2d(num_features=2048, eps=1e-05)
 )
 (conv2): Conv2d(
 2048, 2048, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), groups=32, bias=False
 (norm): FrozenBatchNorm2d(num_features=2048, eps=1e-05)
 )
 (conv3): Conv2d(
 2048, 2048, kernel_size=(1, 1), stride=(1, 1), bias=False
 (norm): FrozenBatchNorm2d(num_features=2048, eps=1e-05)
 )
 )
 (2): BottleneckBlock(
 (conv1): Conv2d(
 2048, 2048, kernel_size=(1, 1), stride=(1, 1), bias=False
 (norm): FrozenBatchNorm2d(num_features=2048, eps=1e-05)
 )
 (conv2): Conv2d(
 2048, 2048, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), groups=32, bias=False
 (norm): FrozenBatchNorm2d(num_features=2048, eps=1e-05)
 )
 (conv3): Conv2d(
 2048, 2048, kernel_size=(1, 1), stride=(1, 1), bias=False
 (norm): FrozenBatchNorm2d(num_features=2048, eps=1e-05)
 )
 )
 (3): BottleneckBlock(
 (conv1): Conv2d(
 2048, 2048, kernel_size=(1, 1), stride=(1, 1), bias=False
 (norm): FrozenBatchNorm2d(num_features=2048, eps=1e-05)
 )
 (conv2): Conv2d(
 2048, 2048, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), groups=32, bias=False
 (norm): FrozenBatchNorm2d(num_features=2048, eps=1e-05)
 )
 (conv3): Conv2d(
 2048, 2048, kernel_size=(1, 1), stride=(1, 1), bias=False
 (norm): FrozenBatchNorm2d(num_features=2048, eps=1e-05)
 )
 )
 (4): BottleneckBlock(
 (conv1): Conv2d(
 2048, 2048, kernel_size=(1, 1), stride=(1, 1), bias=False
 (norm): FrozenBatchNorm2d(num_features=2048, eps=1e-05)
 )
 (conv2): Conv2d(
 2048, 2048, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), groups=32, bias=False
 (norm): FrozenBatchNorm2d(num_features=2048, eps=1e-05)
 )
 (conv3): Conv2d(
 2048, 2048, kernel_size=(1, 1), stride=(1, 1), bias=False
 (norm): FrozenBatchNorm2d(num_features=2048, eps=1e-05)
 )
 )
 (5): BottleneckBlock(
 (conv1): Conv2d(
 2048, 2048, kernel_size=(1, 1), stride=(1, 1), bias=False
 (norm): FrozenBatchNorm2d(num_features=2048, eps=1e-05)
 )
 (conv2): Conv2d(
 2048, 2048, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), groups=32, bias=False
 (norm): FrozenBatchNorm2d(num_features=2048, eps=1e-05)
 )
 (conv3): Conv2d(
 2048, 2048, kernel_size=(1, 1), stride=(1, 1), bias=False
 (norm): FrozenBatchNorm2d(num_features=2048, eps=1e-05)
 )
 )
 (6): BottleneckBlock(
 (conv1): Conv2d(
 2048, 2048, kernel_size=(1, 1), stride=(1, 1), bias=False
 (norm): FrozenBatchNorm2d(num_features=2048, eps=1e-05)
 )
 (conv2): Conv2d(
 2048, 2048, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), groups=32, bias=False
 (norm): FrozenBatchNorm2d(num_features=2048, eps=1e-05)
 )
 (conv3): Conv2d(
 2048, 2048, kernel_size=(1, 1), stride=(1, 1), bias=False
 (norm): FrozenBatchNorm2d(num_features=2048, eps=1e-05)
 )
 )
 (7): BottleneckBlock(
 (conv1): Conv2d(
 2048, 2048, kernel_size=(1, 1), stride=(1, 1), bias=False
 (norm): FrozenBatchNorm2d(num_features=2048, eps=1e-05)
 )
 (conv2): Conv2d(
 2048, 2048, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), groups=32, bias=False
 (norm): FrozenBatchNorm2d(num_features=2048, eps=1e-05)
 )
 (conv3): Conv2d(
 2048, 2048, kernel_size=(1, 1), stride=(1, 1), bias=False
 (norm): FrozenBatchNorm2d(num_features=2048, eps=1e-05)
 )
 )
 (8): BottleneckBlock(
 (conv1): Conv2d(
 2048, 2048, kernel_size=(1, 1), stride=(1, 1), bias=False
 (norm): FrozenBatchNorm2d(num_features=2048, eps=1e-05)
 )
 (conv2): Conv2d(
 2048, 2048, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), groups=32, bias=False
 (norm): FrozenBatchNorm2d(num_features=2048, eps=1e-05)
 )
 (conv3): Conv2d(
 2048, 2048, kernel_size=(1, 1), stride=(1, 1), bias=False
 (norm): FrozenBatchNorm2d(num_features=2048, eps=1e-05)
 )
 )
 (9): BottleneckBlock(
 (conv1): Conv2d(
 2048, 2048, kernel_size=(1, 1), stride=(1, 1), bias=False
 (norm): FrozenBatchNorm2d(num_features=2048, eps=1e-05)
 )
 (conv2): Conv2d(
 2048, 2048, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), groups=32, bias=False
 (norm): FrozenBatchNorm2d(num_features=2048, eps=1e-05)
 )
 (conv3): Conv2d(
 2048, 2048, kernel_size=(1, 1), stride=(1, 1), bias=False
 (norm): FrozenBatchNorm2d(num_features=2048, eps=1e-05)
 )
 )
 (10): BottleneckBlock(
 (conv1): Conv2d(
 2048, 2048, kernel_size=(1, 1), stride=(1, 1), bias=False
 (norm): FrozenBatchNorm2d(num_features=2048, eps=1e-05)
 )
 (conv2): Conv2d(
 2048, 2048, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), groups=32, bias=False
 (norm): FrozenBatchNorm2d(num_features=2048, eps=1e-05)
 )
 (conv3): Conv2d(
 2048, 2048, kernel_size=(1, 1), stride=(1, 1), bias=False
 (norm): FrozenBatchNorm2d(num_features=2048, eps=1e-05)
 )
 )
 (11): BottleneckBlock(
 (conv1): Conv2d(
 2048, 2048, kernel_size=(1, 1), stride=(1, 1), bias=False
 (norm): FrozenBatchNorm2d(num_features=2048, eps=1e-05)
 )
 (conv2): Conv2d(
 2048, 2048, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), groups=32, bias=False
 (norm): FrozenBatchNorm2d(num_features=2048, eps=1e-05)
 )
 (conv3): Conv2d(
 2048, 2048, kernel_size=(1, 1), stride=(1, 1), bias=False
 (norm): FrozenBatchNorm2d(num_features=2048, eps=1e-05)
 )
 )
 (12): BottleneckBlock(
 (conv1): Conv2d(
 2048, 2048, kernel_size=(1, 1), stride=(1, 1), bias=False
 (norm): FrozenBatchNorm2d(num_features=2048, eps=1e-05)
 )
 (conv2): Conv2d(
 2048, 2048, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), groups=32, bias=False
 (norm): FrozenBatchNorm2d(num_features=2048, eps=1e-05)
 )
 (conv3): Conv2d(
 2048, 2048, kernel_size=(1, 1), stride=(1, 1), bias=False
 (norm): FrozenBatchNorm2d(num_features=2048, eps=1e-05)
 )
 )
 (13): BottleneckBlock(
 (conv1): Conv2d(
 2048, 2048, kernel_size=(1, 1), stride=(1, 1), bias=False
 (norm): FrozenBatchNorm2d(num_features=2048, eps=1e-05)
 )
 (conv2): Conv2d(
 2048, 2048, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), groups=32, bias=False
 (norm): FrozenBatchNorm2d(num_features=2048, eps=1e-05)
 )
 (conv3): Conv2d(
 2048, 2048, kernel_size=(1, 1), stride=(1, 1), bias=False
 (norm): FrozenBatchNorm2d(num_features=2048, eps=1e-05)
 )
 )
 (14): BottleneckBlock(
 (conv1): Conv2d(
 2048, 2048, kernel_size=(1, 1), stride=(1, 1), bias=False
 (norm): FrozenBatchNorm2d(num_features=2048, eps=1e-05)
 )
 (conv2): Conv2d(
 2048, 2048, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), groups=32, bias=False
 (norm): FrozenBatchNorm2d(num_features=2048, eps=1e-05)
 )
 (conv3): Conv2d(
 2048, 2048, kernel_size=(1, 1), stride=(1, 1), bias=False
 (norm): FrozenBatchNorm2d(num_features=2048, eps=1e-05)
 )
 )
 (15): BottleneckBlock(
 (conv1): Conv2d(
 2048, 2048, kernel_size=(1, 1), stride=(1, 1), bias=False
 (norm): FrozenBatchNorm2d(num_features=2048, eps=1e-05)
 )
 (conv2): Conv2d(
 2048, 2048, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), groups=32, bias=False
 (norm): FrozenBatchNorm2d(num_features=2048, eps=1e-05)
 )
 (conv3): Conv2d(
 2048, 2048, kernel_size=(1, 1), stride=(1, 1), bias=False
 (norm): FrozenBatchNorm2d(num_features=2048, eps=1e-05)
 )
 )
 (16): BottleneckBlock(
 (conv1): Conv2d(
 2048, 2048, kernel_size=(1, 1), stride=(1, 1), bias=False
 (norm): FrozenBatchNorm2d(num_features=2048, eps=1e-05)
 )
 (conv2): Conv2d(
 2048, 2048, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), groups=32, bias=False
 (norm): FrozenBatchNorm2d(num_features=2048, eps=1e-05)
 )
 (conv3): Conv2d(
 2048, 2048, kernel_size=(1, 1), stride=(1, 1), bias=False
 (norm): FrozenBatchNorm2d(num_features=2048, eps=1e-05)
 )
 )
 (17): BottleneckBlock(
 (conv1): Conv2d(
 2048, 2048, kernel_size=(1, 1), stride=(1, 1), bias=False
 (norm): FrozenBatchNorm2d(num_features=2048, eps=1e-05)
 )
 (conv2): Conv2d(
 2048, 2048, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), groups=32, bias=False
 (norm): FrozenBatchNorm2d(num_features=2048, eps=1e-05)
 )
 (conv3): Conv2d(
 2048, 2048, kernel_size=(1, 1), stride=(1, 1), bias=False
 (norm): FrozenBatchNorm2d(num_features=2048, eps=1e-05)
 )
 )
 (18): BottleneckBlock(
 (conv1): Conv2d(
 2048, 2048, kernel_size=(1, 1), stride=(1, 1), bias=False
 (norm): FrozenBatchNorm2d(num_features=2048, eps=1e-05)
 )
 (conv2): Conv2d(
 2048, 2048, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), groups=32, bias=False
 (norm): FrozenBatchNorm2d(num_features=2048, eps=1e-05)
 )
 (conv3): Conv2d(
 2048, 2048, kernel_size=(1, 1), stride=(1, 1), bias=False
 (norm): FrozenBatchNorm2d(num_features=2048, eps=1e-05)
 )
 )
 (19): BottleneckBlock(
 (conv1): Conv2d(
 2048, 2048, kernel_size=(1, 1), stride=(1, 1), bias=False
 (norm): FrozenBatchNorm2d(num_features=2048, eps=1e-05)
 )
 (conv2): Conv2d(
 2048, 2048, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), groups=32, bias=False
 (norm): FrozenBatchNorm2d(num_features=2048, eps=1e-05)
 )
 (conv3): Conv2d(
 2048, 2048, kernel_size=(1, 1), stride=(1, 1), bias=False
 (norm): FrozenBatchNorm2d(num_features=2048, eps=1e-05)
 )
 )
 (20): BottleneckBlock(
 (conv1): Conv2d(
 2048, 2048, kernel_size=(1, 1), stride=(1, 1), bias=False
 (norm): FrozenBatchNorm2d(num_features=2048, eps=1e-05)
 )
 (conv2): Conv2d(
 2048, 2048, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), groups=32, bias=False
 (norm): FrozenBatchNorm2d(num_features=2048, eps=1e-05)
 )
 (conv3): Conv2d(
 2048, 2048, kernel_size=(1, 1), stride=(1, 1), bias=False
 (norm): FrozenBatchNorm2d(num_features=2048, eps=1e-05)
 )
 )
 (21): BottleneckBlock(
 (conv1): Conv2d(
 2048, 2048, kernel_size=(1, 1), stride=(1, 1), bias=False
 (norm): FrozenBatchNorm2d(num_features=2048, eps=1e-05)
 )
 (conv2): Conv2d(
 2048, 2048, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), groups=32, bias=False
 (norm): FrozenBatchNorm2d(num_features=2048, eps=1e-05)
 )
 (conv3): Conv2d(
 2048, 2048, kernel_size=(1, 1), stride=(1, 1), bias=False
 (norm): FrozenBatchNorm2d(num_features=2048, eps=1e-05)
 )
 )
 (22): BottleneckBlock(
 (conv1): Conv2d(
 2048, 2048, kernel_size=(1, 1), stride=(1, 1), bias=False
 (norm): FrozenBatchNorm2d(num_features=2048, eps=1e-05)
 )
 (conv2): Conv2d(
 2048, 2048, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), groups=32, bias=False
 (norm): FrozenBatchNorm2d(num_features=2048, eps=1e-05)
 )
 (conv3): Conv2d(
 2048, 2048, kernel_size=(1, 1), stride=(1, 1), bias=False
 (norm): FrozenBatchNorm2d(num_features=2048, eps=1e-05)
 )
 )
 )
 (res6): Sequential(
 (0): BottleneckBlock(
 (shortcut): Conv2d(
 2048, 4096, kernel_size=(1, 1), stride=(2, 2), bias=False
 (norm): FrozenBatchNorm2d(num_features=4096, eps=1e-05)
 )
 (conv1): Conv2d(
 2048, 4096, kernel_size=(1, 1), stride=(1, 1), bias=False
 (norm): FrozenBatchNorm2d(num_features=4096, eps=1e-05)
 )
 (conv2): Conv2d(
 4096, 4096, kernel_size=(3, 3), stride=(2, 2), padding=(1, 1), groups=32, bias=False
 (norm): FrozenBatchNorm2d(num_features=4096, eps=1e-05)
 )
 (conv3): Conv2d(
 4096, 4096, kernel_size=(1, 1), stride=(1, 1), bias=False
 (norm): FrozenBatchNorm2d(num_features=4096, eps=1e-05)
 )
 )
 (1): BottleneckBlock(
 (conv1): Conv2d(
 4096, 4096, kernel_size=(1, 1), stride=(1, 1), bias=False
 (norm): FrozenBatchNorm2d(num_features=4096, eps=1e-05)
 )
 (conv2): Conv2d(
 4096, 4096, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), groups=32, bias=False
 (norm): FrozenBatchNorm2d(num_features=4096, eps=1e-05)
 )
 (conv3): Conv2d(
 4096, 4096, kernel_size=(1, 1), stride=(1, 1), bias=False
 (norm): FrozenBatchNorm2d(num_features=4096, eps=1e-05)
 )
 )
 (2): BottleneckBlock(
 (conv1): Conv2d(
 4096, 4096, kernel_size=(1, 1), stride=(1, 1), bias=False
 (norm): FrozenBatchNorm2d(num_features=4096, eps=1e-05)
 )
 (conv2): Conv2d(
 4096, 4096, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), groups=32, bias=False
 (norm): FrozenBatchNorm2d(num_features=4096, eps=1e-05)
 )
 (conv3): Conv2d(
 4096, 4096, kernel_size=(1, 1), stride=(1, 1), bias=False
 (norm): FrozenBatchNorm2d(num_features=4096, eps=1e-05)
 )
 )
 )
 )
 )
 (proposal_generator): RPN(
 (rpn_head): StandardRPNHead(
 (conv): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
 (objectness_logits): Conv2d(256, 3, kernel_size=(1, 1), stride=(1, 1))
 (anchor_deltas): Conv2d(256, 12, kernel_size=(1, 1), stride=(1, 1))
 )
 (anchor_generator): DefaultAnchorGenerator(
 (cell_anchors): BufferList()
 )
 )
 (roi_heads): StandardROIHeads(
 (box_pooler): ROIPooler(
 (level_poolers): ModuleList(
 (0): ROIAlign(output_size=(7, 7), spatial_scale=0.25, sampling_ratio=0, aligned=True)
 (1): ROIAlign(output_size=(7, 7), spatial_scale=0.125, sampling_ratio=0, aligned=True)
 (2): ROIAlign(output_size=(7, 7), spatial_scale=0.0625, sampling_ratio=0, aligned=True)
 (3): ROIAlign(output_size=(7, 7), spatial_scale=0.03125, sampling_ratio=0, aligned=True)
 )
 )
 (box_head): FastRCNNConvFCHead(
 (flatten): Flatten(start_dim=1, end_dim=-1)
 (fc1): Linear(in_features=12544, out_features=1024, bias=True)
 (fc_relu1): ReLU()
 (fc2): Linear(in_features=1024, out_features=1024, bias=True)
 (fc_relu2): ReLU()
 )
 (box_predictor): FastRCNNOutputLayers(
 (cls_score): Linear(in_features=1024, out_features=81, bias=True)
 (bbox_pred): Linear(in_features=1024, out_features=320, bias=True)
 )
 )
)
GeneralizedRCNN(
 (backbone): FPN(
 (fpn_lateral2): Conv2d(256, 256, kernel_size=(1, 1), stride=(1, 1))
 (fpn_output2): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
 (fpn_lateral3): Conv2d(512, 256, kernel_size=(1, 1), stride=(1, 1))
 (fpn_output3): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
 (fpn_lateral4): Conv2d(1024, 256, kernel_size=(1, 1), stride=(1, 1))
 (fpn_output4): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
 (fpn_lateral5): Conv2d(2048, 256, kernel_size=(1, 1), stride=(1, 1))
 (fpn_output5): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
 (fpn_lateral6): Conv2d(4096, 256, kernel_size=(1, 1), stride=(1, 1))
 (fpn_output6): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
 (top_block): LastLevelMaxPool()
 (bottom_up): ResNet(
 (stem): BasicStem(
 (conv1): Conv2d(
 3, 64, kernel_size=(7, 7), stride=(2, 2), padding=(3, 3), bias=False
 (norm): FrozenBatchNorm2d(num_features=64, eps=1e-05)
 )
 )
 (res2): Sequential(
 (0): BottleneckBlock(
 (shortcut): Conv2d(
 64, 256, kernel_size=(1, 1), stride=(1, 1), bias=False
 (norm): FrozenBatchNorm2d(num_features=256, eps=1e-05)
 )
 (conv1): Conv2d(
 64, 256, kernel_size=(1, 1), stride=(1, 1), bias=False
 (norm): FrozenBatchNorm2d(num_features=256, eps=1e-05)
 )
 (conv2): Conv2d(
 256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), groups=32, bias=False
 (norm): FrozenBatchNorm2d(num_features=256, eps=1e-05)
 )
 (conv3): Conv2d(
 256, 256, kernel_size=(1, 1), stride=(1, 1), bias=False
 (norm): FrozenBatchNorm2d(num_features=256, eps=1e-05)
 )
 )
 (1): BottleneckBlock(
 (conv1): Conv2d(
 256, 256, kernel_size=(1, 1), stride=(1, 1), bias=False
 (norm): FrozenBatchNorm2d(num_features=256, eps=1e-05)
 )
 (conv2): Conv2d(
 256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), groups=32, bias=False
 (norm): FrozenBatchNorm2d(num_features=256, eps=1e-05)
 )
 (conv3): Conv2d(
 256, 256, kernel_size=(1, 1), stride=(1, 1), bias=False
 (norm): FrozenBatchNorm2d(num_features=256, eps=1e-05)
 )
 )
 (2): BottleneckBlock(
 (conv1): Conv2d(
 256, 256, kernel_size=(1, 1), stride=(1, 1), bias=False
 (norm): FrozenBatchNorm2d(num_features=256, eps=1e-05)
 )
 (conv2): Conv2d(
 256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), groups=32, bias=False
 (norm): FrozenBatchNorm2d(num_features=256, eps=1e-05)
 )
 (conv3): Conv2d(
 256, 256, kernel_size=(1, 1), stride=(1, 1), bias=False
 (norm): FrozenBatchNorm2d(num_features=256, eps=1e-05)
 )
 )
 )
 (res3): Sequential(
 (0): SingleDownsampling(
 (conv1): Conv2d(
 256, 512, kernel_size=(1, 1), stride=(2, 2), bias=False
 (norm): FrozenBatchNorm2d(num_features=512, eps=1e-05)
 )
 )
 )
 (res4): Sequential(
 (0): BottleneckBlock(
 (shortcut): Conv2d(
 512, 1024, kernel_size=(1, 1), stride=(2, 2), bias=False
 (norm): FrozenBatchNorm2d(num_features=1024, eps=1e-05)
 )
 (conv1): Conv2d(
 512, 1024, kernel_size=(1, 1), stride=(1, 1), bias=False
 (norm): FrozenBatchNorm2d(num_features=1024, eps=1e-05)
 )
 (conv2): Conv2d(
 1024, 1024, kernel_size=(3, 3), stride=(2, 2), padding=(1, 1), groups=32, bias=False
 (norm): FrozenBatchNorm2d(num_features=1024, eps=1e-05)
 )
 (conv3): Conv2d(
 1024, 1024, kernel_size=(1, 1), stride=(1, 1), bias=False
 (norm): FrozenBatchNorm2d(num_features=1024, eps=1e-05)
 )
 )
 (1): BottleneckBlock(
 (conv1): Conv2d(
 1024, 1024, kernel_size=(1, 1), stride=(1, 1), bias=False
 (norm): FrozenBatchNorm2d(num_features=1024, eps=1e-05)
 )
 (conv2): Conv2d(
 1024, 1024, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), groups=32, bias=False
 (norm): FrozenBatchNorm2d(num_features=1024, eps=1e-05)
 )
 (conv3): Conv2d(
 1024, 1024, kernel_size=(1, 1), stride=(1, 1), bias=False
 (norm): FrozenBatchNorm2d(num_features=1024, eps=1e-05)
 )
 )
 (2): BottleneckBlock(
 (conv1): Conv2d(
 1024, 1024, kernel_size=(1, 1), stride=(1, 1), bias=False
 (norm): FrozenBatchNorm2d(num_features=1024, eps=1e-05)
 )
 (conv2): Conv2d(
 1024, 1024, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), groups=32, bias=False
 (norm): FrozenBatchNorm2d(num_features=1024, eps=1e-05)
 )
 (conv3): Conv2d(
 1024, 1024, kernel_size=(1, 1), stride=(1, 1), bias=False
 (norm): FrozenBatchNorm2d(num_features=1024, eps=1e-05)
 )
 )
 (3): BottleneckBlock(
 (conv1): Conv2d(
 1024, 1024, kernel_size=(1, 1), stride=(1, 1), bias=False
 (norm): FrozenBatchNorm2d(num_features=1024, eps=1e-05)
 )
 (conv2): Conv2d(
 1024, 1024, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), groups=32, bias=False
 (norm): FrozenBatchNorm2d(num_features=1024, eps=1e-05)
 )
 (conv3): Conv2d(
 1024, 1024, kernel_size=(1, 1), stride=(1, 1), bias=False
 (norm): FrozenBatchNorm2d(num_features=1024, eps=1e-05)
 )
 )
 )
 (res5): Sequential(
 (0): BottleneckBlock(
 (shortcut): Conv2d(
 1024, 2048, kernel_size=(1, 1), stride=(2, 2), bias=False
 (norm): FrozenBatchNorm2d(num_features=2048, eps=1e-05)
 )
 (conv1): Conv2d(
 1024, 2048, kernel_size=(1, 1), stride=(1, 1), bias=False
 (norm): FrozenBatchNorm2d(num_features=2048, eps=1e-05)
 )
 (conv2): Conv2d(
 2048, 2048, kernel_size=(3, 3), stride=(2, 2), padding=(1, 1), groups=32, bias=False
 (norm): FrozenBatchNorm2d(num_features=2048, eps=1e-05)
 )
 (conv3): Conv2d(
 2048, 2048, kernel_size=(1, 1), stride=(1, 1), bias=False
 (norm): FrozenBatchNorm2d(num_features=2048, eps=1e-05)
 )
 )
 (1): BottleneckBlock(
 (conv1): Conv2d(
 2048, 2048, kernel_size=(1, 1), stride=(1, 1), bias=False
 (norm): FrozenBatchNorm2d(num_features=2048, eps=1e-05)
 )
 (conv2): Conv2d(
 2048, 2048, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), groups=32, bias=False
 (norm): FrozenBatchNorm2d(num_features=2048, eps=1e-05)
 )
 (conv3): Conv2d(
 2048, 2048, kernel_size=(1, 1), stride=(1, 1), bias=False
 (norm): FrozenBatchNorm2d(num_features=2048, eps=1e-05)
 )
 )
 (2): BottleneckBlock(
 (conv1): Conv2d(
 2048, 2048, kernel_size=(1, 1), stride=(1, 1), bias=False
 (norm): FrozenBatchNorm2d(num_features=2048, eps=1e-05)
 )
 (conv2): Conv2d(
 2048, 2048, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), groups=32, bias=False
 (norm): FrozenBatchNorm2d(num_features=2048, eps=1e-05)
 )
 (conv3): Conv2d(
 2048, 2048, kernel_size=(1, 1), stride=(1, 1), bias=False
 (norm): FrozenBatchNorm2d(num_features=2048, eps=1e-05)
 )
 )
 (3): BottleneckBlock(
 (conv1): Conv2d(
 2048, 2048, kernel_size=(1, 1), stride=(1, 1), bias=False
 (norm): FrozenBatchNorm2d(num_features=2048, eps=1e-05)
 )
 (conv2): Conv2d(
 2048, 2048, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), groups=32, bias=False
 (norm): FrozenBatchNorm2d(num_features=2048, eps=1e-05)
 )
 (conv3): Conv2d(
 2048, 2048, kernel_size=(1, 1), stride=(1, 1), bias=False
 (norm): FrozenBatchNorm2d(num_features=2048, eps=1e-05)
 )
 )
 (4): BottleneckBlock(
 (conv1): Conv2d(
 2048, 2048, kernel_size=(1, 1), stride=(1, 1), bias=False
 (norm): FrozenBatchNorm2d(num_features=2048, eps=1e-05)
 )
 (conv2): Conv2d(
 2048, 2048, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), groups=32, bias=False
 (norm): FrozenBatchNorm2d(num_features=2048, eps=1e-05)
 )
 (conv3): Conv2d(
 2048, 2048, kernel_size=(1, 1), stride=(1, 1), bias=False
 (norm): FrozenBatchNorm2d(num_features=2048, eps=1e-05)
 )
 )
 (5): BottleneckBlock(
 (conv1): Conv2d(
 2048, 2048, kernel_size=(1, 1), stride=(1, 1), bias=False
 (norm): FrozenBatchNorm2d(num_features=2048, eps=1e-05)
 )
 (conv2): Conv2d(
 2048, 2048, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), groups=32, bias=False
 (norm): FrozenBatchNorm2d(num_features=2048, eps=1e-05)
 )
 (conv3): Conv2d(
 2048, 2048, kernel_size=(1, 1), stride=(1, 1), bias=False
 (norm): FrozenBatchNorm2d(num_features=2048, eps=1e-05)
 )
 )
 )
 (res6): Sequential(
 (0): BottleneckBlock(
 (shortcut): Conv2d(
 2048, 4096, kernel_size=(1, 1), stride=(2, 2), bias=False
 (norm): FrozenBatchNorm2d(num_features=4096, eps=1e-05)
 )
 (conv1): Conv2d(
 2048, 4096, kernel_size=(1, 1), stride=(1, 1), bias=False
 (norm): FrozenBatchNorm2d(num_features=4096, eps=1e-05)
 )
 (conv2): Conv2d(
 4096, 4096, kernel_size=(3, 3), stride=(2, 2), padding=(1, 1), groups=32, bias=False
 (norm): FrozenBatchNorm2d(num_features=4096, eps=1e-05)
 )
 (conv3): Conv2d(
 4096, 4096, kernel_size=(1, 1), stride=(1, 1), bias=False
 (norm): FrozenBatchNorm2d(num_features=4096, eps=1e-05)
 )
 )
 (1): BottleneckBlock(
 (conv1): Conv2d(
 4096, 4096, kernel_size=(1, 1), stride=(1, 1), bias=False
 (norm): FrozenBatchNorm2d(num_features=4096, eps=1e-05)
 )
 (conv2): Conv2d(
 4096, 4096, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), groups=32, bias=False
 (norm): FrozenBatchNorm2d(num_features=4096, eps=1e-05)
 )
 (conv3): Conv2d(
 4096, 4096, kernel_size=(1, 1), stride=(1, 1), bias=False
 (norm): FrozenBatchNorm2d(num_features=4096, eps=1e-05)
 )
 )
 (2): BottleneckBlock(
 (conv1): Conv2d(
 4096, 4096, kernel_size=(1, 1), stride=(1, 1), bias=False
 (norm): FrozenBatchNorm2d(num_features=4096, eps=1e-05)
 )
 (conv2): Conv2d(
 4096, 4096, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), groups=32, bias=False
 (norm): FrozenBatchNorm2d(num_features=4096, eps=1e-05)
 )
 (conv3): Conv2d(
 4096, 4096, kernel_size=(1, 1), stride=(1, 1), bias=False
 (norm): FrozenBatchNorm2d(num_features=4096, eps=1e-05)
 )
 )
 )
 )
 )
 (proposal_generator): RPN(
 (rpn_head): StandardRPNHead(
 (conv): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
 (objectness_logits): Conv2d(256, 3, kernel_size=(1, 1), stride=(1, 1))
 (anchor_deltas): Conv2d(256, 12, kernel_size=(1, 1), stride=(1, 1))
 )
 (anchor_generator): DefaultAnchorGenerator(
 (cell_anchors): BufferList()
 )
 )
 (roi_heads): StandardROIHeads(
 (box_pooler): ROIPooler(
 (level_poolers): ModuleList(
 (0): ROIAlign(output_size=(7, 7), spatial_scale=0.25, sampling_ratio=0, aligned=True)
 (1): ROIAlign(output_size=(7, 7), spatial_scale=0.125, sampling_ratio=0, aligned=True)
 (2): ROIAlign(output_size=(7, 7), spatial_scale=0.0625, sampling_ratio=0, aligned=True)
 (3): ROIAlign(output_size=(7, 7), spatial_scale=0.03125, sampling_ratio=0, aligned=True)
 )
 )
 (box_head): FastRCNNConvFCHead(
 (flatten): Flatten(start_dim=1, end_dim=-1)
 (fc1): Linear(in_features=12544, out_features=1024, bias=True)
 (fc_relu1): ReLU()
 (fc2): Linear(in_features=1024, out_features=1024, bias=True)
 (fc_relu2): ReLU()
 )
 (box_predictor): FastRCNNOutputLayers(
 (cls_score): Linear(in_features=1024, out_features=81, bias=True)
 (bbox_pred): Linear(in_features=1024, out_features=320, bias=True)
 )
 )
)

HAT Code

CAB

Since Transformer-based structures usually need a large number of channels to embed tokens, using convolutions of constant width directly would incur a large computational cost. The channel count of the two convolution layers is therefore compressed by a constant β: for an input feature with C channels, the output of the first convolution layer is compressed to C/β channels, and the second layer then expands the feature back to C channels. Next, a standard channel-attention (CA) module [68] adaptively rescales the channel features.
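A minimal sketch of the CAB structure described above (my own illustration; the squeeze factor beta, the 3x3 kernel size, the GELU activation and the reduction ratio are assumptions following the usual HAT implementation, not taken from this text):

import torch
from torch import nn


class ChannelAttention(nn.Module):
    """Squeeze-and-excitation style channel attention (CA)."""
    def __init__(self, channels, reduction=16):
        super().__init__()
        self.attn = nn.Sequential(
            nn.AdaptiveAvgPool2d(1),                       # global spatial pooling
            nn.Conv2d(channels, channels // reduction, 1),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels // reduction, channels, 1),
            nn.Sigmoid(),
        )

    def forward(self, x):
        return x * self.attn(x)                            # rescale each channel


class CAB(nn.Module):
    """Two convs with channel compression by beta, followed by channel attention."""
    def __init__(self, channels, beta=3, reduction=16):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(channels, channels // beta, 3, padding=1),  # C -> C/beta
            nn.GELU(),
            nn.Conv2d(channels // beta, channels, 3, padding=1),  # C/beta -> C
            ChannelAttention(channels, reduction),
        )

    def forward(self, x):
        return self.body(x)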

HAB

W-MSA

Window partitioning
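The original article relies on figures here; as a reference, window partitioning in Swin-style attention is typically implemented roughly as follows (a sketch of the standard pattern, not the repository's exact code):

import torch


def window_partition(x, window_size):
    """Split a (B, H, W, C) feature map into non-overlapping windows."""
    B, H, W, C = x.shape
    x = x.view(B, H // window_size, window_size, W // window_size, window_size, C)
    # -> (num_windows * B, window_size, window_size, C)
    return x.permute(0, 1, 3, 2, 4, 5).contiguous().view(-1, window_size, window_size, C)


def window_reverse(windows, window_size, H, W):
    """Inverse of window_partition: merge windows back into a (B, H, W, C) map."""
    B = int(windows.shape[0] / (H * W / window_size / window_size))
    x = windows.view(B, H // window_size, W // window_size, window_size, window_size, -1)
    return x.permute(0, 1, 3, 2, 4, 5).contiguous().view(B, H, W, -1)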

Linear

The forward pass

PatchMerging
This introduces context from a larger receptive field into the visual attention mechanism, helping the model understand the image better: the input feature map is split into four sub-regions, which are then merged.
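A sketch of the standard Swin-style PatchMerging layer this refers to (my own illustration; the LayerNorm placement and the 4C -> 2C linear reduction follow the usual Swin v1 implementation):

import torch
from torch import nn


class PatchMerging(nn.Module):
    """Merge each 2x2 patch neighborhood: (B, H*W, C) -> (B, H/2*W/2, 2C)."""
    def __init__(self, dim):
        super().__init__()
        self.reduction = nn.Linear(4 * dim, 2 * dim, bias=False)
        self.norm = nn.LayerNorm(4 * dim)

    def forward(self, x, H, W):
        B, L, C = x.shape
        x = x.view(B, H, W, C)
        # the four interleaved sub-regions mentioned in the text
        x0 = x[:, 0::2, 0::2, :]
        x1 = x[:, 1::2, 0::2, :]
        x2 = x[:, 0::2, 1::2, :]
        x3 = x[:, 1::2, 1::2, :]
        x = torch.cat([x0, x1, x2, x3], dim=-1)        # (B, H/2, W/2, 4C)
        x = x.view(B, -1, 4 * C)
        x = self.norm(x)
        return self.reduction(x)                       # (B, H/2*W/2, 2C)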

OCAB

Specifically, nn.Linear(dim, dim * 3, bias=qkv_bias) creates a linear projection that takes an input feature of dimension dim and maps it to an output of dimension dim * 3. The factor of 3 is because the output contains the query (q), key (k) and value (v) parts.
The weight matrix of this linear layer has shape (dim * 3, dim): each input feature is multiplied by the weight matrix and a bias term is added; the bias=qkv_bias argument controls whether the bias is included.
After this projection, the input feature yields the query (q), key (k) and value (v) representations used in the subsequent attention computation.
The _no_grad_trunc_normal_ function initializes a given tensor from a truncated normal distribution, ensuring that the generated values lie within a specified range, which helps with model initialization and training.
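A small sketch of the two pieces just described, the qkv projection and truncated-normal initialization (illustrative shapes; dim, num_heads and the reshape convention are assumptions matching typical Swin/HAT attention code):

import torch
from torch import nn

dim, num_heads, qkv_bias = 96, 6, True
qkv = nn.Linear(dim, dim * 3, bias=qkv_bias)       # weight shape: (dim * 3, dim)
nn.init.trunc_normal_(qkv.weight, std=0.02)        # truncated-normal init, as described above

x = torch.randn(4, 49, dim)                        # (num_windows, tokens, dim)
B_, N, C = x.shape
out = qkv(x).reshape(B_, N, 3, num_heads, C // num_heads).permute(2, 0, 3, 1, 4)
q, k, v = out[0], out[1], out[2]                   # each: (B_, num_heads, N, C // num_heads)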
The forward function

Improvements
The original plain FTT was replaced with an FTT module that uses a Swin Transformer to extract features.
Loss before the modification
