深度学习之目标检测篇——残差网络与FPN结合

  • 特征金字塔
  • 多尺度融合
  • 特征金字塔的网络原理
    在这里插入图片描述
  • 这里是基于resnet网络与Fpn做的结合,主要把resnet中的特征层利用FPN的思想一起结合,实现resnet_fpn。增强目标检测backone的有效性。
  • 代码实现如下:
import torch
from torch import Tensor
from collections import OrderedDict
import torch.nn.functional as F
from torch import nn
from torch.jit.annotations import Tuple, List, Dict


class Bottleneck(nn.Module):
    expansion = 4
    def __init__(self, in_channel, out_channel, stride=1, downsample=None, norm_layer=None):
        super(Bottleneck, self).__init__()
        if norm_layer is None:
            norm_layer = nn.BatchNorm2d
        self.conv1 = nn.Conv2d(in_channels=in_channel, out_channels=out_channel,
                               kernel_size=(1,1), stride=(1,1), bias=False)  # squeeze channels
        self.bn1 = norm_layer(out_channel)
        # -----------------------------------------
        self.conv2 = nn.Conv2d(in_channels=out_channel, out_channels=out_channel,
                               kernel_size=(3,3), stride=(stride,stride), bias=False, padding=(1,1))
        self.bn2 = norm_layer(out_channel)
        # -----------------------------------------
        self.conv3 = nn.Conv2d(in_channels=out_channel, out_channels=out_channel * self.expansion,
                               kernel_size=(1,1), stride=(1,1), bias=False)  # unsqueeze channels
        self.bn3 = norm_layer(out_channel * self.expansion)
        self.relu = nn.ReLU(inplace=True)
        self.downsample = downsample

    def forward(self, x):
        identity = x
        if self.downsample is not None:
            identity = self.downsample(x)
        out = self.conv1(x)
        out = self.bn1(out)
        out = self.relu(out)

        out = self.conv2(out)
        out = self.bn2(out)
        out = self.relu(out)

        out = self.conv3(out)
        out = self.bn3(out)

        out += identity
        out = self.relu(out)

        return out


class ResNet(nn.Module):

    def  __init__(self, block, blocks_num, num_classes=1000, include_top=True, norm_layer=None):
        '''
        :param block:块
        :param blocks_num:块数
        :param num_classes: 分类数
        :param include_top:
        :param norm_layer: BN
        '''
        super(ResNet, self).__init__()
        if norm_layer is None:
            norm_layer = nn.BatchNorm2d
        self._norm_layer = norm_layer

        self.include_top = include_top
        self.in_channel = 64

        self.conv1 = nn.Conv2d(in_channels=3, out_channels=self.in_channel, kernel_size=(7,7), stride=(2,2),
                               padding=(3,3), bias=False)
        self.bn1 = norm_layer(self.in_channel)
        self.relu = nn.ReLU(inplace=True)
        self.maxpool = nn.MaxPool2d(kernel_size=3, stride=2, padding=1)
        self.layer1 = self._make_layer(block, 64, blocks_num[0])
        self.layer2 = self._make_layer(block, 128, blocks_num[1], stride=2)
        self.layer3 = self._make_layer(block, 256, blocks_num[2], stride=2)
        self.layer4 = self._make_layer(block, 512, blocks_num[3], stride=2)

        if self.include_top:
            self.avgpool = nn.AdaptiveAvgPool2d((1, 1))  # output size = (1, 1)
            self.fc = nn.Linear(512 * block.expansion, num_classes)

        '''
        初始化
        '''
        for m in self.modules():
            if isinstance(m, nn.Conv2d):
                nn.init.kaiming_normal_(m.weight, mode='fan_out', nonlinearity='relu')

    def _make_layer(self, block, channel, block_num, stride=1):
        norm_layer = self._norm_layer
        downsample = None
        if stride != 1 or self.in_channel != channel * block.expansion:
            downsample = nn.Sequential(
                nn.Conv2d(self.in_channel, channel * block.expansion, kernel_size=(1,1), stride=(stride,stride), bias=False),
                norm_layer(channel * block.expansion))

        layers = []
        layers.append(block(self.in_channel, channel, downsample=downsample,
                            stride=stride, norm_layer=norm_layer))
        self.in_channel = channel * block.expansion

        for _ in range(1, block_num):
            layers.append(block(self.in_channel, channel, norm_layer=norm_layer))

        return nn.Sequential(*layers)

    def forward(self, x):

        x = self.conv1(x)
        x = self.bn1(x)
        x = self.relu(x)
        x = self.maxpool(x)

        x = self.layer1(x)
        x = self.layer2(x)
        x = self.layer3(x)
        x = self.layer4(x)

        if self.include_top:
            x = self.avgpool(x)
            x = torch.flatten(x, 1)
            x = self.fc(x)

        return x


class IntermediateLayerGetter(nn.ModuleDict):
    """
    Module wrapper that returns intermediate layers from a models
    It has a strong assumption that the modules have been registered
    into the models in the same order as they are used.
    This means that one should **not** reuse the same nn.Module
    twice in the forward if you want this to work.
    Additionally, it is only able to query submodules that are directly
    assigned to the models. So if `models` is passed, `models.feature1` can
    be returned, but not `models.feature1.layer2`.
    Arguments:
        model (nn.Module): models on which we will extract the features
        return_layers (Dict[name, new_name]): a dict containing the names
            of the modules for which the activations will be returned as
            the key of the dict, and the value of the dict is the name
            of the returned activation (which the user can specify).
    """
    __annotations__ = {
        "return_layers": Dict[str, str],
    }

    def __init__(self, model, return_layers):

        if not set(return_layers).issubset([name for name, _ in model.named_children()]):
            raise ValueError("return_layers are not present in models")
        # {'layer1': '0', 'layer2': '1', 'layer3': '2', 'layer4': '3'}
        orig_return_layers = return_layers
        return_layers = {k: v for k, v in return_layers.items()}
        layers = OrderedDict()

        # 遍历模型子模块按顺序存入有序字典
        # 只保存layer4及其之前的结构,舍去之后不用的结构
        for name, module in model.named_children():
            layers[name] = module
            if name in return_layers:
                del return_layers[name]
            if not return_layers:
                break

        super(IntermediateLayerGetter, self).__init__(layers)
        self.return_layers = orig_return_layers

    def forward(self, x):
        out = OrderedDict()
        # 依次遍历模型的所有子模块,并进行正向传播,
        # 收集layer1, layer2, layer3, layer4的输出
        for name, module in self.named_children():
            x = module(x)
            if name in self.return_layers:
                out_name = self.return_layers[name]
                out[out_name] = x
        return out


class FeaturePyramidNetwork(nn.Module):
    """
    Module that adds a FPN from on top of a set of feature maps. This is based on
    `"Feature Pyramid Network for Object Detection" <https://arxiv.org/abs/1612.03144>`_.
    The feature maps are currently supposed to be in increasing depth
    order.
    The input to the models is expected to be an OrderedDict[Tensor], containing
    the feature maps on top of which the FPN will be added.
    Arguments:
        in_channels_list (list[int]): number of channels for each feature map that
            is passed to the module
        out_channels (int): number of channels of the FPN representation
        extra_blocks (ExtraFPNBlock or None): if provided, extra operations will
            be performed. It is expected to take the fpn features, the original
            features and the names of the original features as input, and returns
            a new list of feature maps and their corresponding names
    """

    def __init__(self, in_channels_list, out_channels, extra_blocks=None):
        super(FeaturePyramidNetwork, self).__init__()
        # 用来调整resnet特征矩阵(layer1,2,3,4)的channel(kernel_size=1)
        self.inner_blocks = nn.ModuleList()
        # 对调整后的特征矩阵使用3x3的卷积核来得到对应的预测特征矩阵
        self.layer_blocks = nn.ModuleList()
        for in_channels in in_channels_list:
            if in_channels == 0:
                continue
            inner_block_module = nn.Conv2d(in_channels, out_channels, (1,1))
            layer_block_module = nn.Conv2d(out_channels, out_channels, (3,3), padding=(1,1))
            self.inner_blocks.append(inner_block_module)
            self.layer_blocks.append(layer_block_module)

        # initialize parameters now to avoid modifying the initialization of top_blocks
        for m in self.children():
            if isinstance(m, nn.Conv2d):
                nn.init.kaiming_uniform_(m.weight, a=1)
                nn.init.constant_(m.bias, 0)

        self.extra_blocks = extra_blocks

    def get_result_from_inner_blocks(self, x, idx):
        # type: (Tensor, int) -> Tensor
        """
        This is equivalent to self.inner_blocks[idx](x),
        but torchscript doesn't support this yet
        """
        num_blocks = len(self.inner_blocks)
        if idx < 0:
            idx += num_blocks
        i = 0
        out = x
        for module in self.inner_blocks:
            if i == idx:
                out = module(x)
            i += 1
        return out

    def get_result_from_layer_blocks(self, x, idx):
        # type: (Tensor, int) -> Tensor
        """
        This is equivalent to self.layer_blocks[idx](x),
        but torchscript doesn't support this yet
        """
        num_blocks = len(self.layer_blocks)
        if idx < 0:
            idx += num_blocks
        i = 0
        out = x
        for module in self.layer_blocks:
            if i == idx:
                out = module(x)
            i += 1
        return out

    def forward(self, x):
        # type: (Dict[str, Tensor]) -> Dict[str, Tensor]
        """
        Computes the FPN for a set of feature maps.
        Arguments:
            x (OrderedDict[Tensor]): feature maps for each feature level.
        Returns:
            results (OrderedDict[Tensor]): feature maps after FPN layers.
                They are ordered from highest resolution first.
        """
        # unpack OrderedDict into two lists for easier handling
        names = list(x.keys())
        x = list(x.values())

        # 将resnet layer4的channel调整到指定的out_channels
        # last_inner = self.inner_blocks[-1](x[-1])
        last_inner = self.get_result_from_inner_blocks(x[-1], -1)
        # result中保存着每个预测特征层
        results = []
        # 将layer4调整channel后的特征矩阵,通过3x3卷积后得到对应的预测特征矩阵
        # results.append(self.layer_blocks[-1](last_inner))
        results.append(self.get_result_from_layer_blocks(last_inner, -1))

        # 倒序遍历resenet输出特征层,以及对应inner_block和layer_block
        # layer3 -> layer2 -> layer1 (layer4已经处理过了)
        # for feature, inner_block, layer_block in zip(
        #         x[:-1][::-1], self.inner_blocks[:-1][::-1], self.layer_blocks[:-1][::-1]
        # ):
        #     if not inner_block:
        #         continue
        #     inner_lateral = inner_block(feature)
        #     feat_shape = inner_lateral.shape[-2:]
        #     inner_top_down = F.interpolate(last_inner, size=feat_shape, mode="nearest")
        #     last_inner = inner_lateral + inner_top_down
        #     results.insert(0, layer_block(last_inner))

        for idx in range(len(x) - 2, -1, -1):
            inner_lateral = self.get_result_from_inner_blocks(x[idx], idx)
            feat_shape = inner_lateral.shape[-2:]
            inner_top_down = F.interpolate(last_inner, size=feat_shape, mode="nearest")
            last_inner = inner_lateral + inner_top_down
            results.insert(0, self.get_result_from_layer_blocks(last_inner, idx))

        # 在layer4对应的预测特征层基础上生成预测特征矩阵5
        if self.extra_blocks is not None:
            results, names = self.extra_blocks(results, names)

        # make it back an OrderedDict
        out = OrderedDict([(k, v) for k, v in zip(names, results)])

        return out


class LastLevelMaxPool(torch.nn.Module):
    """
    Applies a max_pool2d on top of the last feature map
    """

    def forward(self, x, names):
        # type: (List[Tensor], List[str]) -> Tuple[List[Tensor], List[str]]
        names.append("pool")
        x.append(F.max_pool2d(x[-1], 1, 2, 0))
        return x, names


class BackboneWithFPN(nn.Module):
    """
    Adds a FPN on top of a models.
    Internally, it uses torchvision.models._utils.IntermediateLayerGetter to
    extract a submodel that returns the feature maps specified in return_layers.
    The same limitations of IntermediatLayerGetter apply here.
    Arguments:
        backbone (nn.Module)
        return_layers (Dict[name, new_name]): a dict containing the names
            of the modules for which the activations will be returned as
            the key of the dict, and the value of the dict is the name
            of the returned activation (which the user can specify).
        in_channels_list (List[int]): number of channels for each feature map
            that is returned, in the order they are present in the OrderedDict
        out_channels (int): number of channels in the FPN.
    Attributes:
        out_channels (int): the number of channels in the FPN
    """

    def __init__(self, backbone, return_layers, in_channels_list, out_channels):
        '''

        :param backbone: 特征层
        :param return_layers: 返回的层数
        :param in_channels_list: 输入通道数
        :param out_channels: 输出通道数
        '''
        super(BackboneWithFPN, self).__init__()
        '返回有序字典模型'
        self.body = IntermediateLayerGetter(backbone, return_layers=return_layers)

        self.fpn = FeaturePyramidNetwork(
            in_channels_list=in_channels_list,
            out_channels=out_channels,
            extra_blocks=LastLevelMaxPool(),
            )
        # super(BackboneWithFPN, self).__init__(OrderedDict(
        #     [("body", body), ("fpn", fpn)]))
        self.out_channels = out_channels

    def forward(self, x):
        x = self.body(x)
        x = self.fpn(x)
        return x


def resnet50_fpn_backbone():
    # FrozenBatchNorm2d的功能与BatchNorm2d类似,但参数无法更新
    # norm_layer=misc.FrozenBatchNorm2d
    resnet_backbone = ResNet(Bottleneck, [3, 4, 6, 3],
                             include_top=False)

    # freeze layers
    # 冻结layer1及其之前的所有底层权重(基础通用特征)
    for name, parameter in resnet_backbone.named_parameters():
        if 'layer2' not in name and 'layer3' not in name and 'layer4' not in name:
            '''
            冻结权重,不参与训练
            '''
            parameter.requires_grad_(False)
    # 字典名字
    return_layers = {'layer1': '0', 'layer2': '1', 'layer3': '2', 'layer4': '3'}

    # in_channel 为layer4的输出特征矩阵channel = 2048
    in_channels_stage2 = resnet_backbone.in_channel // 8

    in_channels_list = [
        in_channels_stage2,  # layer1 out_channel=256
        in_channels_stage2 * 2,  # layer2 out_channel=512
        in_channels_stage2 * 4,  # layer3 out_channel=1024
        in_channels_stage2 * 8,  # layer4 out_channel=2048
    ]
    out_channels = 256
    return BackboneWithFPN(resnet_backbone, return_layers, in_channels_list, out_channels)


if __name__ == '__main__':
    net = resnet50_fpn_backbone()
    x = torch.randn(1,3,224,224)
    for key,value in net(x).items():
        print(key,value.shape)

  • 测试结果
    在这里插入图片描述

本文来自互联网用户投稿,该文观点仅代表作者本人,不代表本站立场。本站仅提供信息存储空间服务,不拥有所有权,不承担相关法律责任。如若转载,请注明出处:/a/939683.html

如若内容造成侵权/违法违规/事实不符,请联系我们进行投诉反馈qq邮箱809451989@qq.com,一经查实,立即删除!

相关文章

Leetcode 面试150题 399.除法求值

系列博客目录 文章目录 系列博客目录题目思路代码 题目 链接 思路 广度优先搜索 我们可以将整个问题建模成一张图&#xff1a;给定图中的一些点&#xff08;点即变量&#xff09;&#xff0c;以及某些边的权值&#xff08;权值即两个变量的比值&#xff09;&#xff0c;试…

python实现Excel转图片

目录 使用spire.xls库 使用excel2img库 使用spire.xls库 安装&#xff1a;pip install spire.xls -i https://pypi.tuna.tsinghua.edu.cn/simple 支持选择行和列截图&#xff0c;不好的一点就是商业库&#xff0c;转出来的图片有水印。 from spire.xls import Workbookdef …

hpe服务器更新阵列卡firmware

背景 操作系统&#xff1a;RHEL7.8 hpe服务器经常出现硬盘断开&#xff0c;阵列卡重启问题&#xff0c;导致系统hang住。只能手动硬重启。 I/O error&#xff0c;dev sda smartpqi 0000:5c:00:0: resettiong scsi 1:1:0:1 smartpqi 0000:5c:00:0: reset of scsi 1:1:0:1:…

excel 使用vlook up找出两列中不同的内容

当使用 VLOOKUP 函数时&#xff0c;您可以将其用于比较两列的内容。假设您要比较 A 列和 B 列的内容&#xff0c;并将结果显示在 C 列&#xff0c;您可以在 C1 单元格中输入以下公式&#xff1a; 这个公式将在 B 列中的每个单元格中查找是否存在于 A 列中。如果在 A 列中找不到…

北邮,成电计算机考研怎么选?

#总结结论&#xff1a; 基于当前提供的24考研复录数据&#xff0c;从报考性价比角度&#xff0c;建议25考研的同学优先选择北邮计算机学硕。主要原因是:相比成电&#xff0c;北邮计算机学硕的目标分数更低&#xff0c;录取率更高&#xff0c;而且北邮的地理位置优势明显。对于…

OpenHarmony和OpenVela的技术创新以及两者对比

两款有名的国内开源操作系统&#xff0c;OpenHarmony&#xff0c;OpenVela都非常的优秀。本文对二者的创新进行一个简要的介绍和对比。 一、OpenHarmony OpenHarmony具有诸多有特点的技术突破和重要贡献&#xff0c;以下是一些主要方面&#xff1a; 架构设计创新 分层架构…

C语言——实现找出最高分

问题描述&#xff1a;分别有6名学生的学号、姓名、性别、年龄和考试分数&#xff0c;找出这些学生当中考试成绩最高的学生姓名。 //找出最高分#include<stdio.h>struct student {char stu_num[10]; //学号 char stu_name[10]; //姓名 char sex; //性别 int age; …

Qt Quick:CheckBox 复选框

复选框不止选中和未选中2种状态哦&#xff0c;它还有1种部分选中的状态。这3种状态都是Qt自带的&#xff0c;如果想让复选框有部分选中这个状态&#xff0c;需要将三态属性&#xff08;tristate&#xff09;设为true。 未选中的状态值为0&#xff0c;部分选中是1&#xff0c;选…

Docker常用命令总结~

1、关于镜像 获取镜像 docker pull [image name] [option:tag]AI助手//获取postgres镜像(没有设置镜像版本号则默认获取最新的&#xff0c;使用latest标记) docker pull postgres or docker pull postgres:11.14 列出本地镜像 docker imagesAI助手 指定镜像启动一个容…

贪心算法在背包问题上的运用(Python)

背包问题 有n个物品,它们有各自的体积和价值,现有给定容量的背包,如何让背包里装入的物品具有最大的价值总和? 这就是典型的背包问题(又称为0-1背包问题),也是具体的、没有经过任何延伸的背包问题模型。 背包问题的传统求解方法较为复杂,现定义有一个可以载重为8kg的背…

大屏开源项目go-view二次开发3----象形柱图控件(C#)

环境搭建参考&#xff1a; 大屏开源项目go-view二次开发1----环境搭建(C#)-CSDN博客 要做的象形柱图控件最终效果如下图&#xff1a; 其实这个控件我前面的文章也介绍过&#xff0c;不过是用wpf做的&#xff0c;链接如下&#xff1a; wpf利用Microsoft.Web.WebView2显示html…

无刷电机的概念

无换向器电机 Brushless Direct Current Motor&#xff0c;BLDC 普通电机的转子就是中间旋转的线圈&#xff0c;定子就是两边的磁铁 和普通有刷相比&#xff0c;转子和定子互换材料。四周是通电的线圈&#xff0c;中间在转的是磁铁 负载工况决定额定电压&#xff0c;没有固定…

SLAAC如何工作?

SLAAC如何工作&#xff1f; IPv6无状态地址自动配置(SLAAC)-常见问题 - 苍然满关中 - 博客园 https://support.huawei.com/enterprise/zh/doc/EDOC1100323788?sectionj00shttps://www.zhihu.com/question/6691553243/answer/57023796400 主机在启动或接口UP后&#xff0c;发…

【机器学习】【集成学习——决策树、随机森林】从零起步:掌握决策树、随机森林与GBDT的机器学习之旅

这里写目录标题 一、引言机器学习中集成学习的重要性 二、决策树 (Decision Tree)2.1 基本概念2.2 组成元素2.3 工作原理分裂准则 2.4 决策树的构建过程2.5 决策树的优缺点&#xff08;1&#xff09;决策树的优点&#xff08;2&#xff09;决策树的缺点&#xff08;3&#xff0…

ubuntu+ros新手笔记(五):初探anaconda+cuda+pytorch

深度学习三件套&#xff1a;初探anacondacudapytorch 系统ubuntu22.04 ros2 humble 1.初探anaconda 1.1 安装 安装过程参照【详细】Ubuntu 下安装 Anaconda 1.2 创建和删除环境 创建新环境 conda create -n your_env_name pythonx.x比如我创建了一个名为“py312“的环境…

Diffusino Policy学习note

Diffusion Policy—基于扩散模型的机器人动作生成策略 - 知乎 建议看看&#xff0c;感觉普通实验室复现不了这种工作。复现了也没有太大扩展的意义。 Diffusion Policy 是监督学习吗 Diffusion Policy 通常被视为一种基于监督学习的方法&#xff0c;但它的实际训练过程可能结…

【Unity功能集】TextureShop纹理工坊(三)图层(下)

项目源码&#xff1a;在终章发布 索引 图层渲染绘画区域图层Shader 编辑器编辑模式新建图层设置当前图层上、下移动图层删除图层图层快照 图层 在PS中&#xff0c;图层的概念贯穿始终&#xff08;了解PS图层&#xff09;&#xff0c;他可以称作PS最基础也是最强大的特性之一。…

1、数据库概念和mysql表的管理

数据库概念 datebase&#xff1a;用来组织&#xff0c;存储&#xff0c;管理数据的仓库。 数据库的管理系统&#xff1a;DBMS&#xff08;用来实现对数据的有效组织、关系和存取的系统软件&#xff09; 关系型和非关系型数据库 关系型数据库&#xff1a;mysql、oracle 非关…

70 mysql 中事务的隔离级别

前言 mysql 隔离级别有四种 未提交读, 已提交读, 可重复度, 序列化执行 然后不同的隔离级别存在不同的问题 未提交读存在 脏读, 不可重复度, 幻觉读 等问题 已提交读存在 不可重复度, 幻觉读 等问题 可重复读存在 幻觉读 等问题 序列化执行 没有以上问题 然后 我们这里…

【FFmpeg】解封装 ① ( 封装与解封装流程 | 解封装函数简介 | 查找码流标号和码流参数信息 | 使用 MediaInfo 分析视频文件 )

文章目录 一、解封装1、封装与解封装流程2、解封装 常用函数 二、解封装函数简介1、avformat_alloc_context 函数2、avformat_free_context 函数3、avformat_open_input 函数4、avformat_close_input 函数5、avformat_find_stream_info 函数6、av_read_frame 函数7、avformat_s…