DB Algorithm: Principles and Model Construction

References:
https://aistudio.baidu.com/projectdetail/4483048

Real-Time Scene Text Detection with Differentiable Binarization

How to Read a Paper, by Mu Li (李沐)

DB (Real-Time Scene Text Detection with Differentiable Binarization)

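Principles

DB is a segmentation-based text detection algorithm. Its key contribution is a differentiable threshold: instead of one fixed value, a dynamic threshold separates text regions from the background.
[Figure: pipeline of a conventional segmentation-based text detector (blue arrows) versus DB (red arrows)]
A conventional segmentation-based text detector follows the blue arrows in the figure above. After obtaining the segmentation result, it applies a fixed threshold to produce a binary segmentation map, then uses a heuristic such as pixel clustering to obtain the text regions. Because standard binarization is not differentiable, this step cannot be trained end to end with the network.

DB follows the red arrows in the figure. The main difference is a threshold map: the network predicts a threshold for every position in the image rather than using one fixed value, which separates text foreground from background more effectively.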

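Advantages:
1. The algorithm structure is simple and needs no elaborate post-processing
2. It achieves good accuracy and runtime performance on public datasets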

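DB introduces differentiable binarization, which approximates the step function of standard binarization with the following function:

$$\hat{B}_{i,j} = \frac{1}{1 + e^{-k\,(P_{i,j} - T_{i,j})}}$$

Here P is the probability map, T is the threshold map, and k is an amplification factor (k = 50 in the DBHead code later in this post).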
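As a minimal sketch of the idea (not the PaddleOCR implementation), the snippet below compares a hard threshold with the smooth approximation 1 / (1 + exp(-k (P - T))) that `step_function` in the DBHead code of this post computes, using k = 50. The function names `hard_binarize`, `db_binarize`, and `db_grad` are hypothetical and chosen only for illustration, and the threshold value 0.3 is an arbitrary example.

```python
import math


def hard_binarize(p, t):
    """Standard binarization: a step function with zero gradient almost everywhere."""
    return 1.0 if p >= t else 0.0


def db_binarize(p, t, k=50.0):
    """Differentiable binarization: a steep sigmoid of (p - t)."""
    return 1.0 / (1.0 + math.exp(-k * (p - t)))


def db_grad(p, t, k=50.0):
    """Derivative of db_binarize with respect to p, which equals k * b * (1 - b)."""
    b = db_binarize(p, t, k)
    return k * b * (1.0 - b)


if __name__ == "__main__":
    t = 0.3  # illustrative threshold, not a value taken from the post
    for p in (0.1, 0.25, 0.3, 0.35, 0.9):
        print(p, hard_binarize(p, t), round(db_binarize(p, t), 4), round(db_grad(p, t), 4))
```

Far from the threshold the two functions agree almost exactly, while near p == t the smooth version has a large, finite gradient (k / 4 at p == t). That gradient is what allows the threshold map to be learned together with the probability map.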
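Overall structure of DB:
[Figure: overall architecture of the DB network]
The input image passes through the backbone and the FPN to extract features. The extracted features are concatenated into a feature map one quarter the size of the original image. Convolutional layers then produce the text-region probability map and the threshold map, and the DB post-processing step turns these into the curves that bound the text.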

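Building the DB Text Detection Model

The DB text detection model consists of three parts:

Backbone: extracts features from the image
FPN: a feature pyramid structure that enhances the features
Head: computes the text-region probability map

Backbone: the paper uses ResNet50. To speed up training, this experiment uses the MobileNetV3 large structure as the backbone.

The DB backbone extracts multi-scale features from the image. As the code below shows, for an input image of size [640, 640] the backbone outputs four feature maps with shapes [1, 16, 160, 160], [1, 24, 80, 80], [1, 56, 40, 40], and [1, 480, 20, 20]. These features are passed to the feature pyramid network (FPN) for further enhancement.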

import paddle
from ppocr.modeling.backbones.det_mobilenet_v3 import MobileNetV3

fake_inputs = paddle.randn([1, 3, 640, 640], dtype="float32")

# 1. Declare the backbone
model_backbone = MobileNetV3()
model_backbone.eval()

# 2. Run a forward pass
outs = model_backbone(fake_inputs)

# 3. Print the network structure
# print(model_backbone)

# 4. Print the shapes of the output features
for idx, out in enumerate(outs):
    print("The index is ", idx, "and the shape of output is ", out.shape)
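To see where the channel counts 16, 24, 56, and 480 come from, the following sketch re-traces the channel and stride bookkeeping of the MobileNetV3 listing given later in this post (model_name='large', scale=0.5) in plain Python, without paddle. `LARGE_CFG` and `backbone_outputs` are hypothetical names introduced for this illustration, and the halving of the spatial size assumes even input sizes.

```python
def make_divisible(v, divisor=8, min_value=None):
    """Round v to a nearby multiple of divisor (same helper as in the full listing)."""
    if min_value is None:
        min_value = divisor
    new_v = max(min_value, int(v + divisor / 2) // divisor * divisor)
    if new_v < 0.9 * v:
        new_v += divisor
    return new_v


# (output channels c, stride s) of each block in the 'large' configuration
LARGE_CFG = [(16, 1), (24, 2), (24, 1), (40, 2), (40, 1), (40, 1),
             (80, 2), (80, 1), (80, 1), (80, 1), (112, 1), (112, 1),
             (160, 2), (160, 1), (160, 1)]


def backbone_outputs(input_size=640, scale=0.5, cls_ch_squeeze=960):
    """Return [(channels, spatial_size)] for the four backbone stages."""
    size = input_size // 2                 # the stem convolution has stride 2
    inplanes = make_divisible(16 * scale)
    outputs = []
    for i, (c, s) in enumerate(LARGE_CFG):
        if s == 2 and i > 2:               # a new stage starts: emit the previous one
            outputs.append((inplanes, size))
        if s == 2:
            size //= 2                     # a stride-2 block halves height and width
        inplanes = make_divisible(scale * c)
    # The last stage ends with a 1x1 convolution to scale * cls_ch_squeeze channels
    outputs.append((make_divisible(scale * cls_ch_squeeze), size))
    return outputs


if __name__ == "__main__":
    print(backbone_outputs())
```

The four stages end at strides 4, 8, 16, and 32, which is why the spatial sizes for a 640 input are 160, 80, 40, and 20.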

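FPN

The feature pyramid network (FPN) is a common convolutional structure for efficiently extracting features at multiple scales from an image.
The FPN takes the backbone outputs as input and produces a feature map whose height and width are one quarter of the original image. For an input image of shape [1, 3, 640, 640], the FPN output has height and width [160, 160].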

import paddle
from paddle import nn
import paddle.nn.functional as F
from paddle import ParamAttr

class DBFPN(nn.Layer):
    def __init__(self, in_channels, out_channels, **kwargs):
        super(DBFPN, self).__init__()
        self.out_channels = out_channels

        # The layer definitions are omitted here. For the full DBFPN implementation see: https://github.com/PaddlePaddle/PaddleOCR/blob/release%2F2.4/ppocr/modeling/necks/db_fpn.py

    def forward(self, x):
        c2, c3, c4, c5 = x

        in5 = self.in5_conv(c5)
        in4 = self.in4_conv(c4)
        in3 = self.in3_conv(c3)
        in2 = self.in2_conv(c2)

        # Upsample features (top-down path)
        out4 = in4 + F.upsample(
            in5, scale_factor=2, mode="nearest", align_mode=1)  # 1/16
        out3 = in3 + F.upsample(
            out4, scale_factor=2, mode="nearest", align_mode=1)  # 1/8
        out2 = in2 + F.upsample(
            out3, scale_factor=2, mode="nearest", align_mode=1)  # 1/4

        p5 = self.p5_conv(in5)
        p4 = self.p4_conv(out4)
        p3 = self.p3_conv(out3)
        p2 = self.p2_conv(out2)

        # Upsample every level to the 1/4 scale
        p5 = F.upsample(p5, scale_factor=8, mode="nearest", align_mode=1)
        p4 = F.upsample(p4, scale_factor=4, mode="nearest", align_mode=1)
        p3 = F.upsample(p3, scale_factor=2, mode="nearest", align_mode=1)

        fuse = paddle.concat([p5, p4, p3, p2], axis=1)
        return fuse
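The shape arithmetic of this forward pass can be checked without paddle. The sketch below only tracks shapes, under the assumption (taken from the full DBFPN listing later in this post) that each level is reduced to out_channels // 4 channels before concatenation. `dbfpn_output_shape` is a hypothetical helper name, not part of PaddleOCR.

```python
def dbfpn_output_shape(feature_shapes, out_channels=256):
    """Trace shapes through a DBFPN-style neck.

    feature_shapes: [(C, H, W)] for c2, c3, c4, c5 (strides 4, 8, 16, 32).
    Returns the (C, H, W) of the fused feature map.
    """
    sizes = [(h, w) for _, h, w in feature_shapes]
    # Top-down path: each coarser map is upsampled x2 and added to the next finer
    # one, so neighbouring levels must differ by exactly a factor of 2.
    for (fine_h, fine_w), (coarse_h, coarse_w) in zip(sizes, sizes[1:]):
        if (coarse_h * 2, coarse_w * 2) != (fine_h, fine_w):
            raise ValueError("adjacent levels must differ by a factor of 2")
    # Each level is reduced to out_channels // 4 channels, upsampled to the size
    # of c2, and the four branches are concatenated along the channel axis.
    branch_channels = out_channels // 4
    base_h, base_w = sizes[0]
    return (4 * branch_channels, base_h, base_w)


if __name__ == "__main__":
    shapes = [(16, 160, 160), (24, 80, 80), (56, 40, 40), (480, 20, 20)]
    print(dbfpn_output_shape(shapes))
```

The factor-of-2 check also shows why the element-wise additions can fail for arbitrary image sizes: input sides that are multiples of 32 always satisfy it.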

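Head

The head computes the text-region probability map, the threshold map, and the binary map.
The DB head upsamples the FPN features, mapping them from one quarter of the original image size back to the full image size.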
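The 4x upsampling comes from two transposed convolutions with kernel size 2 and stride 2, as defined in the Head class of the full listing later in this post. A minimal sketch of the size arithmetic, using the standard output-size formulas for convolution and transposed convolution (the helper names are hypothetical):

```python
def conv_out(size, kernel_size, stride=1, padding=0):
    """Output size of a regular convolution along one spatial dimension."""
    return (size + 2 * padding - kernel_size) // stride + 1


def conv_transpose_out(size, kernel_size, stride, padding=0,
                       dilation=1, output_padding=0):
    """Output size of a transposed convolution along one spatial dimension."""
    return ((size - 1) * stride - 2 * padding
            + dilation * (kernel_size - 1) + output_padding + 1)


def head_output_size(size):
    """Follow one spatial dimension through conv1, conv2, and conv3 of the head."""
    size = conv_out(size, kernel_size=3, stride=1, padding=1)   # conv1 keeps the size
    size = conv_transpose_out(size, kernel_size=2, stride=2)    # conv2 doubles it
    size = conv_transpose_out(size, kernel_size=2, stride=2)    # conv3 doubles it again
    return size


if __name__ == "__main__":
    print(head_output_size(160))
```

With kernel size 2 and stride 2 the transposed convolution exactly doubles the size, so a 160 x 160 FPN feature map becomes 640 x 640 at the head output.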


import math
import paddle
from paddle import nn
import paddle.nn.functional as F
from paddle import ParamAttr

class DBHead(nn.Layer):
    """
    Differentiable Binarization (DB) for text detection:
        see https://arxiv.org/abs/1911.08947
    args:
        params(dict): super parameters for build DB network
    """

    def __init__(self, in_channels, k=50, **kwargs):
        super(DBHead, self).__init__()
        self.k = k

        # The layer definitions are omitted here. For the full DBHead implementation see https://github.com/PaddlePaddle/PaddleOCR/blob/release%2F2.4/ppocr/modeling/heads/det_db_head.py

    def step_function(self, x, y):
        # Differentiable binarization: compute the binary text segmentation map from the probability map and the threshold map
        return paddle.reciprocal(1 + paddle.exp(-self.k * (x - y)))

    def forward(self, x, targets=None):
        shrink_maps = self.binarize(x)
        if not self.training:
            return {'maps': shrink_maps}

        threshold_maps = self.thresh(x)
        binary_maps = self.step_function(shrink_maps, threshold_maps)
        y = paddle.concat([shrink_maps, threshold_maps, binary_maps], axis=1)
        return {'maps': y}
# 1. Import DBHead from PaddleOCR
from ppocr.modeling.heads.det_db_head import DBHead
import paddle

# 2. Compute the DBFPN output
fake_inputs = paddle.randn([1, 3, 640, 640], dtype="float32")
model_backbone = MobileNetV3()
in_channels = model_backbone.out_channels
model_fpn = DBFPN(in_channels=in_channels, out_channels=256)
outs = model_backbone(fake_inputs)
fpn_outs = model_fpn(outs)

# 3. Declare the head
model_db_head = DBHead(in_channels=256)

# 4. Print the DB head structure
print(model_db_head)

# 5. Compute the head output
db_head_outs = model_db_head(fpn_outs)
print(f"The shape of fpn outs {fpn_outs.shape}")
print(f"The shape of DB head outs {db_head_outs['maps'].shape}")

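Running the code above raises an error: the DBFPN and DBHead classes shown here are incomplete skeletons whose layers are never defined. I therefore downloaded the corresponding files from the PaddleOCR repository on GitHub:
db_fpn.py
det_db_head.py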
Complete code:


import math
import paddle
from paddle import nn
import paddle.nn.functional as F
from paddle import ParamAttr



def make_divisible(v, divisor=8, min_value=None):
    if min_value is None:
        min_value = divisor
    new_v = max(min_value, int(v + divisor / 2) // divisor * divisor)
    if new_v < 0.9 * v:
        new_v += divisor
    return new_v


class MobileNetV3(nn.Layer):
    def __init__(self,
                 in_channels=3,
                 model_name='large',
                 scale=0.5,
                 disable_se=False,
                 **kwargs):
        """
        the MobilenetV3 backbone network for detection module.
        Args:
            params(dict): the super parameters for build network
        """
        super(MobileNetV3, self).__init__()

        self.disable_se = disable_se

        if model_name == "large":
            cfg = [
                # k, exp, c,  se,     nl,  s,
                [3, 16, 16, False, 'relu', 1],
                [3, 64, 24, False, 'relu', 2],
                [3, 72, 24, False, 'relu', 1],
                [5, 72, 40, True, 'relu', 2],
                [5, 120, 40, True, 'relu', 1],
                [5, 120, 40, True, 'relu', 1],
                [3, 240, 80, False, 'hardswish', 2],
                [3, 200, 80, False, 'hardswish', 1],
                [3, 184, 80, False, 'hardswish', 1],
                [3, 184, 80, False, 'hardswish', 1],
                [3, 480, 112, True, 'hardswish', 1],
                [3, 672, 112, True, 'hardswish', 1],
                [5, 672, 160, True, 'hardswish', 2],
                [5, 960, 160, True, 'hardswish', 1],
                [5, 960, 160, True, 'hardswish', 1],
            ]
            cls_ch_squeeze = 960
        elif model_name == "small":
            cfg = [
                # k, exp, c,  se,     nl,  s,
                [3, 16, 16, True, 'relu', 2],
                [3, 72, 24, False, 'relu', 2],
                [3, 88, 24, False, 'relu', 1],
                [5, 96, 40, True, 'hardswish', 2],
                [5, 240, 40, True, 'hardswish', 1],
                [5, 240, 40, True, 'hardswish', 1],
                [5, 120, 48, True, 'hardswish', 1],
                [5, 144, 48, True, 'hardswish', 1],
                [5, 288, 96, True, 'hardswish', 2],
                [5, 576, 96, True, 'hardswish', 1],
                [5, 576, 96, True, 'hardswish', 1],
            ]
            cls_ch_squeeze = 576
        else:
            raise NotImplementedError("mode[" + model_name +
                                      "_model] is not implemented!")

        supported_scale = [0.35, 0.5, 0.75, 1.0, 1.25]
        assert scale in supported_scale, \
            "supported scale are {} but input scale is {}".format(supported_scale, scale)
        inplanes = 16
        # conv1
        self.conv = ConvBNLayer(
            in_channels=in_channels,
            out_channels=make_divisible(inplanes * scale),
            kernel_size=3,
            stride=2,
            padding=1,
            groups=1,
            if_act=True,
            act='hardswish')

        self.stages = []
        self.out_channels = []
        block_list = []
        i = 0
        inplanes = make_divisible(inplanes * scale)
        for (k, exp, c, se, nl, s) in cfg:
            se = se and not self.disable_se
            start_idx = 2 if model_name == 'large' else 0
            if s == 2 and i > start_idx:
                self.out_channels.append(inplanes)
                self.stages.append(nn.Sequential(*block_list))
                block_list = []
            block_list.append(
                ResidualUnit(
                    in_channels=inplanes,
                    mid_channels=make_divisible(scale * exp),
                    out_channels=make_divisible(scale * c),
                    kernel_size=k,
                    stride=s,
                    use_se=se,
                    act=nl))
            inplanes = make_divisible(scale * c)
            i += 1
        block_list.append(
            ConvBNLayer(
                in_channels=inplanes,
                out_channels=make_divisible(scale * cls_ch_squeeze),
                kernel_size=1,
                stride=1,
                padding=0,
                groups=1,
                if_act=True,
                act='hardswish'))
        self.stages.append(nn.Sequential(*block_list))
        self.out_channels.append(make_divisible(scale * cls_ch_squeeze))
        for i, stage in enumerate(self.stages):
            self.add_sublayer(sublayer=stage, name="stage{}".format(i))

    def forward(self, x):
        x = self.conv(x)
        out_list = []
        for stage in self.stages:
            x = stage(x)
            out_list.append(x)
        return out_list


class ConvBNLayer(nn.Layer):
    def __init__(self,
                 in_channels,
                 out_channels,
                 kernel_size,
                 stride,
                 padding,
                 groups=1,
                 if_act=True,
                 act=None):
        super(ConvBNLayer, self).__init__()
        self.if_act = if_act
        self.act = act
        self.conv = nn.Conv2D(
            in_channels=in_channels,
            out_channels=out_channels,
            kernel_size=kernel_size,
            stride=stride,
            padding=padding,
            groups=groups,
            bias_attr=False)

        self.bn = nn.BatchNorm(num_channels=out_channels, act=None)

    def forward(self, x):
        x = self.conv(x)
        x = self.bn(x)
        if self.if_act:
            if self.act == "relu":
                x = F.relu(x)
            elif self.act == "hardswish":
                x = F.hardswish(x)
            else:
                print("The activation function({}) is selected incorrectly.".
                      format(self.act))
                exit()
        return x


class ResidualUnit(nn.Layer):
    def __init__(self,
                 in_channels,
                 mid_channels,
                 out_channels,
                 kernel_size,
                 stride,
                 use_se,
                 act=None):
        super(ResidualUnit, self).__init__()
        self.if_shortcut = stride == 1 and in_channels == out_channels
        self.if_se = use_se

        self.expand_conv = ConvBNLayer(
            in_channels=in_channels,
            out_channels=mid_channels,
            kernel_size=1,
            stride=1,
            padding=0,
            if_act=True,
            act=act)
        self.bottleneck_conv = ConvBNLayer(
            in_channels=mid_channels,
            out_channels=mid_channels,
            kernel_size=kernel_size,
            stride=stride,
            padding=int((kernel_size - 1) // 2),
            groups=mid_channels,
            if_act=True,
            act=act)
        if self.if_se:
            self.mid_se = SEModule(mid_channels)
        self.linear_conv = ConvBNLayer(
            in_channels=mid_channels,
            out_channels=out_channels,
            kernel_size=1,
            stride=1,
            padding=0,
            if_act=False,
            act=None)

    def forward(self, inputs):
        x = self.expand_conv(inputs)
        x = self.bottleneck_conv(x)
        if self.if_se:
            x = self.mid_se(x)
        x = self.linear_conv(x)
        if self.if_shortcut:
            x = paddle.add(inputs, x)
        return x


class SEModule(nn.Layer):
    def __init__(self, in_channels, reduction=4):
        super(SEModule, self).__init__()
        self.avg_pool = nn.AdaptiveAvgPool2D(1)
        self.conv1 = nn.Conv2D(
            in_channels=in_channels,
            out_channels=in_channels // reduction,
            kernel_size=1,
            stride=1,
            padding=0)
        self.conv2 = nn.Conv2D(
            in_channels=in_channels // reduction,
            out_channels=in_channels,
            kernel_size=1,
            stride=1,
            padding=0)

    def forward(self, inputs):
        outputs = self.avg_pool(inputs)
        outputs = self.conv1(outputs)
        outputs = F.relu(outputs)
        outputs = self.conv2(outputs)
        outputs = F.hardsigmoid(outputs, slope=0.2, offset=0.5)
        return inputs * outputs


class DBFPN(nn.Layer):
    def __init__(self, in_channels, out_channels, **kwargs):
        super(DBFPN, self).__init__()
        self.out_channels = out_channels
        weight_attr = paddle.nn.initializer.KaimingUniform()

        self.in2_conv = nn.Conv2D(
            in_channels=in_channels[0],
            out_channels=self.out_channels,
            kernel_size=1,
            weight_attr=ParamAttr(initializer=weight_attr),
            bias_attr=False)
        self.in3_conv = nn.Conv2D(
            in_channels=in_channels[1],
            out_channels=self.out_channels,
            kernel_size=1,
            weight_attr=ParamAttr(initializer=weight_attr),
            bias_attr=False)
        self.in4_conv = nn.Conv2D(
            in_channels=in_channels[2],
            out_channels=self.out_channels,
            kernel_size=1,
            weight_attr=ParamAttr(initializer=weight_attr),
            bias_attr=False)
        self.in5_conv = nn.Conv2D(
            in_channels=in_channels[3],
            out_channels=self.out_channels,
            kernel_size=1,
            weight_attr=ParamAttr(initializer=weight_attr),
            bias_attr=False)
        self.p5_conv = nn.Conv2D(
            in_channels=self.out_channels,
            out_channels=self.out_channels // 4,
            kernel_size=3,
            padding=1,
            weight_attr=ParamAttr(initializer=weight_attr),
            bias_attr=False)
        self.p4_conv = nn.Conv2D(
            in_channels=self.out_channels,
            out_channels=self.out_channels // 4,
            kernel_size=3,
            padding=1,
            weight_attr=ParamAttr(initializer=weight_attr),
            bias_attr=False)
        self.p3_conv = nn.Conv2D(
            in_channels=self.out_channels,
            out_channels=self.out_channels // 4,
            kernel_size=3,
            padding=1,
            weight_attr=ParamAttr(initializer=weight_attr),
            bias_attr=False)
        self.p2_conv = nn.Conv2D(
            in_channels=self.out_channels,
            out_channels=self.out_channels // 4,
            kernel_size=3,
            padding=1,
            weight_attr=ParamAttr(initializer=weight_attr),
            bias_attr=False)

    def forward(self, x):
        c2, c3, c4, c5 = x

        in5 = self.in5_conv(c5)
        in4 = self.in4_conv(c4)
        in3 = self.in3_conv(c3)
        in2 = self.in2_conv(c2)

        out4 = in4 + F.upsample(
            in5, scale_factor=2, mode="nearest", align_mode=1)  # 1/16
        out3 = in3 + F.upsample(
            out4, scale_factor=2, mode="nearest", align_mode=1)  # 1/8
        out2 = in2 + F.upsample(
            out3, scale_factor=2, mode="nearest", align_mode=1)  # 1/4

        p5 = self.p5_conv(in5)
        p4 = self.p4_conv(out4)
        p3 = self.p3_conv(out3)
        p2 = self.p2_conv(out2)
        p5 = F.upsample(p5, scale_factor=8, mode="nearest", align_mode=1)
        p4 = F.upsample(p4, scale_factor=4, mode="nearest", align_mode=1)
        p3 = F.upsample(p3, scale_factor=2, mode="nearest", align_mode=1)

        fuse = paddle.concat([p5, p4, p3, p2], axis=1)
        return fuse




def get_bias_attr(k):
    stdv = 1.0 / math.sqrt(k * 1.0)
    initializer = paddle.nn.initializer.Uniform(-stdv, stdv)
    bias_attr = ParamAttr(initializer=initializer)
    return bias_attr


class Head(nn.Layer):
    def __init__(self, in_channels, name_list):
        super(Head, self).__init__()
        self.conv1 = nn.Conv2D(
            in_channels=in_channels,
            out_channels=in_channels // 4,
            kernel_size=3,
            padding=1,
            weight_attr=ParamAttr(),
            bias_attr=False)
        self.conv_bn1 = nn.BatchNorm(
            num_channels=in_channels // 4,
            param_attr=ParamAttr(
                initializer=paddle.nn.initializer.Constant(value=1.0)),
            bias_attr=ParamAttr(
                initializer=paddle.nn.initializer.Constant(value=1e-4)),
            act='relu')
        self.conv2 = nn.Conv2DTranspose(
            in_channels=in_channels // 4,
            out_channels=in_channels // 4,
            kernel_size=2,
            stride=2,
            weight_attr=ParamAttr(
                initializer=paddle.nn.initializer.KaimingUniform()),
            bias_attr=get_bias_attr(in_channels // 4))
        self.conv_bn2 = nn.BatchNorm(
            num_channels=in_channels // 4,
            param_attr=ParamAttr(
                initializer=paddle.nn.initializer.Constant(value=1.0)),
            bias_attr=ParamAttr(
                initializer=paddle.nn.initializer.Constant(value=1e-4)),
            act="relu")
        self.conv3 = nn.Conv2DTranspose(
            in_channels=in_channels // 4,
            out_channels=1,
            kernel_size=2,
            stride=2,
            weight_attr=ParamAttr(
                initializer=paddle.nn.initializer.KaimingUniform()),
            bias_attr=get_bias_attr(in_channels // 4), )

    def forward(self, x):
        x = self.conv1(x)
        x = self.conv_bn1(x)
        x = self.conv2(x)
        x = self.conv_bn2(x)
        x = self.conv3(x)
        x = F.sigmoid(x)
        return x


class DBHead(nn.Layer):
    """
    Differentiable Binarization (DB) for text detection:
        see https://arxiv.org/abs/1911.08947
    args:
        params(dict): super parameters for build DB network
    """

    def __init__(self, in_channels, k=50, **kwargs):
        super(DBHead, self).__init__()
        self.k = k
        binarize_name_list = [
            'conv2d_56', 'batch_norm_47', 'conv2d_transpose_0', 'batch_norm_48',
            'conv2d_transpose_1', 'binarize'
        ]
        thresh_name_list = [
            'conv2d_57', 'batch_norm_49', 'conv2d_transpose_2', 'batch_norm_50',
            'conv2d_transpose_3', 'thresh'
        ]
        self.binarize = Head(in_channels, binarize_name_list)
        self.thresh = Head(in_channels, thresh_name_list)

    def step_function(self, x, y):
        return paddle.reciprocal(1 + paddle.exp(-self.k * (x - y)))

    def forward(self, x, targets=None):
        shrink_maps = self.binarize(x)
        if not self.training:
            return {'maps': shrink_maps}

        threshold_maps = self.thresh(x)
        binary_maps = self.step_function(shrink_maps, threshold_maps)
        y = paddle.concat([shrink_maps, threshold_maps, binary_maps], axis=1)
        return {'maps': y}



if __name__ == '__main__':
    fake_inputs = paddle.randn([1, 3, 640, 640], dtype="float32")

    # Declare the backbone
    model_backbone = MobileNetV3()
    # model_backbone.eval()

    # # Run a forward pass
    # outs = model_backbone(fake_inputs)

    # # Print the network structure
    # # print(model_backbone)
    #
    # # Print the shapes of the output features
    # for idx, out in enumerate(outs):
    #     print("The index is ", idx, "and the shape of output is ", out.shape)
    # The index is  0 and the shape of output is  [1, 16, 160, 160]
    # The index is  1 and the shape of output is  [1, 24, 80, 80]
    # The index is  2 and the shape of output is  [1, 56, 40, 40]
    # The index is  3 and the shape of output is  [1, 480, 20, 20]
    in_channels = model_backbone.out_channels

    # Declare the FPN
    model_fpn = DBFPN(in_channels=in_channels, out_channels=256)

    # Print the FPN structure
    print(model_fpn)
    # DBFPN(
    #   (in2_conv): Conv2D(16, 256, kernel_size=[1, 1], data_format=NCHW)
    #   (in3_conv): Conv2D(24, 256, kernel_size=[1, 1], data_format=NCHW)
    #   (in4_conv): Conv2D(56, 256, kernel_size=[1, 1], data_format=NCHW)
    #   (in5_conv): Conv2D(480, 256, kernel_size=[1, 1], data_format=NCHW)
    #   (p5_conv): Conv2D(256, 64, kernel_size=[3, 3], padding=1, data_format=NCHW)
    #   (p4_conv): Conv2D(256, 64, kernel_size=[3, 3], padding=1, data_format=NCHW)
    #   (p3_conv): Conv2D(256, 64, kernel_size=[3, 3], padding=1, data_format=NCHW)
    #   (p2_conv): Conv2D(256, 64, kernel_size=[3, 3], padding=1, data_format=NCHW)
    # )
    # Compute the FPN output
    outs = model_backbone(fake_inputs)
    fpn_outs = model_fpn(outs)
    # The shape of fpn outs [1, 256, 160, 160]

    # Declare the head
    model_db_head = DBHead(in_channels=256)

    # Print the DB head structure
    print(model_db_head)
    # DBHead(
    #   (binarize): Head(
    #     (conv1): Conv2D(256, 64, kernel_size=[3, 3], padding=1, data_format=NCHW)
    #     (conv_bn1): BatchNorm()
    #     (conv2): Conv2DTranspose(64, 64, kernel_size=[2, 2], stride=[2, 2], data_format=NCHW)
    #     (conv_bn2): BatchNorm()
    #     (conv3): Conv2DTranspose(64, 1, kernel_size=[2, 2], stride=[2, 2], data_format=NCHW)
    #   )
    #   (thresh): Head(
    #     (conv1): Conv2D(256, 64, kernel_size=[3, 3], padding=1, data_format=NCHW)
    #     (conv_bn1): BatchNorm()
    #     (conv2): Conv2DTranspose(64, 64, kernel_size=[2, 2], stride=[2, 2], data_format=NCHW)
    #     (conv_bn2): BatchNorm()
    #     (conv3): Conv2DTranspose(64, 1, kernel_size=[2, 2], stride=[2, 2], data_format=NCHW)
    #   )
    # )
    # Compute the head output
    db_head_outs = model_db_head(fpn_outs)
    print(f"The shape of fpn outs {fpn_outs.shape}")
    # The shape of fpn outs [1, 256, 160, 160]
    print(f"The shape of DB head outs {db_head_outs['maps'].shape}")
    # The shape of DB head outs [1, 3, 640, 640]

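Output (the head returns three channels, the probability, threshold, and binary maps, because the model is still in training mode; after calling eval() it returns only the probability map):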

DBFPN(
  (in2_conv): Conv2D(16, 256, kernel_size=[1, 1], data_format=NCHW)
  (in3_conv): Conv2D(24, 256, kernel_size=[1, 1], data_format=NCHW)
  (in4_conv): Conv2D(56, 256, kernel_size=[1, 1], data_format=NCHW)
  (in5_conv): Conv2D(480, 256, kernel_size=[1, 1], data_format=NCHW)
  (p5_conv): Conv2D(256, 64, kernel_size=[3, 3], padding=1, data_format=NCHW)
  (p4_conv): Conv2D(256, 64, kernel_size=[3, 3], padding=1, data_format=NCHW)
  (p3_conv): Conv2D(256, 64, kernel_size=[3, 3], padding=1, data_format=NCHW)
  (p2_conv): Conv2D(256, 64, kernel_size=[3, 3], padding=1, data_format=NCHW)
)
DBHead(
  (binarize): Head(
    (conv1): Conv2D(256, 64, kernel_size=[3, 3], padding=1, data_format=NCHW)
    (conv_bn1): BatchNorm()
    (conv2): Conv2DTranspose(64, 64, kernel_size=[2, 2], stride=[2, 2], data_format=NCHW)
    (conv_bn2): BatchNorm()
    (conv3): Conv2DTranspose(64, 1, kernel_size=[2, 2], stride=[2, 2], data_format=NCHW)
  )
  (thresh): Head(
    (conv1): Conv2D(256, 64, kernel_size=[3, 3], padding=1, data_format=NCHW)
    (conv_bn1): BatchNorm()
    (conv2): Conv2DTranspose(64, 64, kernel_size=[2, 2], stride=[2, 2], data_format=NCHW)
    (conv_bn2): BatchNorm()
    (conv3): Conv2DTranspose(64, 1, kernel_size=[2, 2], stride=[2, 2], data_format=NCHW)
  )
)
The shape of fpn outs [1, 256, 160, 160]
The shape of DB head outs [1, 3, 640, 640]

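Advantages of DB (supervised; a ResNet50 backbone gives better results):

  • Higher accuracy and fast inference
  • Handles curved text
  • Handles multi-oriented text
  • Handles multilingual text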
