MobileNet Models Explained: From MobileNetV1 to MobileNetV3

Overview

The MobileNet family consists of V1, V2, and V3, all focused on lightweight neural networks. MobileNetV1 adopts depthwise separable convolutions; MobileNetV2 introduces the inverted residual block with a linear bottleneck to improve accuracy; MobileNetV3 adds further design elements such as the hard-swish activation, Squeeze-and-Excitation modules, and network architecture search to balance computational efficiency and accuracy. All three generations have been successful on mobile devices and embedded systems, providing efficient deep-learning solutions for resource-constrained environments.

  1. Original MobileNetV1 paper: https://arxiv.org/pdf/1704.04861.pdf
  2. Original MobileNetV2 paper: https://arxiv.org/pdf/1801.04381.pdf
  3. Original MobileNetV3 paper: https://arxiv.org/abs/1905.02244.pdf

MobileNetv1

Recently there has been considerable interest in building small and efficient neural networks; the approaches roughly fall into compressing pretrained networks and training small networks directly. MobileNet focuses primarily on optimizing latency while also yielding small networks.

Depthwise Separable Convolution

A standard convolution learns a single set of filters that extract features across all input channels at once. A depthwise separable convolution replaces it with two stages: a depthwise convolution and a pointwise convolution.

  1. Depthwise convolution (DwConv): each channel of the input is convolved with its own filter, instead of sharing one filter across all channels as a standard convolution does. Because every channel has its own small set of filter weights, the number of parameters is greatly reduced.
  2. Pointwise convolution (PwConv): after the depthwise step, a pointwise (1x1) convolution linearly combines the depthwise outputs to produce the final feature map. A 1x1 kernel is equivalent to a fully connected operation across channels; its role is to mix and recombine the per-channel outputs of the depthwise step so that information can flow between channels.

import torch.nn as nn

class DepthSepConv(nn.Module):
    """
    Depthwise separable convolution: depthwise (DW) conv + pointwise (PW) conv.
    DW conv: with groups equal to the number of input channels, the output keeps the same number of channels as the input.
    PW conv: an ordinary convolution with a 1x1 kernel.
    """
    def __init__(self, in_channels, out_channels, stride):
        super(DepthSepConv, self).__init__()
        self.depthwise = nn.Conv2d(in_channels, in_channels, kernel_size=3, stride=stride, groups=in_channels, padding=1)
        self.pointwise = nn.Conv2d(in_channels, out_channels, kernel_size=1)
        self.batch_norm1 = nn.BatchNorm2d(in_channels)
        self.batch_norm2 = nn.BatchNorm2d(out_channels)
        self.relu6 = nn.ReLU6(inplace=True)

    def forward(self, x):
        x = self.depthwise(x)
        x = self.batch_norm1(x)
        x = self.relu6(x)
        x = self.pointwise(x)
        x = self.batch_norm2(x)
        x = self.relu6(x)

        return x

Introducing depthwise separable convolutions greatly reduces the number of parameters, which lowers the risk of overfitting and also reduces the computational complexity.

For a standard convolution, the computational cost (multiply-accumulate operations over an H x W output feature map) is:

Calculate_{std}=K\times K\times C_{in}\times H\times W\times C_{out}

while for a depthwise separable convolution the two stages cost:

Calculate_{DW}=K\times K\times C_{in}\times H\times W

Calculate_{PW}=C_{in}\times C_{out}\times H\times W

so: Calculate_{DSC}=Calculate_{DW}+Calculate_{PW}

The ratio to the standard convolution is \frac{Calculate_{DSC}}{Calculate_{std}}=\frac{1}{C_{out}}+\frac{1}{K^{2}}, so with the usual 3x3 kernels a depthwise separable convolution uses roughly 8 to 9 times less computation than a standard convolution, with only a small reduction in accuracy.
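As a quick check of the arithmetic, the short sketch below (with hypothetical layer sizes) counts the multiply-accumulate operations of both variants directly from the formulas above and prints the ratio:

K, H, W, C_in, C_out = 3, 112, 112, 32, 64   # hypothetical layer sizes

standard = K * K * C_in * H * W * C_out      # standard convolution
dw = K * K * C_in * H * W                    # depthwise convolution
pw = C_in * C_out * H * W                    # pointwise (1x1) convolution

print(standard / (dw + pw))                  # ~7.9, matching 1 / (1/C_out + 1/K**2)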

MobileNetV1 model implementation

This part is my own reproduction, following the layer table and the output shapes given in the paper.

import torch
import torch.nn as nn

class MobileNetV1(nn.Module):
    def __init__(self, num_classes=1000, drop_rate=0.2):
        super(MobileNetV1, self).__init__()
        # torch.Size([1, 3, 224, 224])
        self.conv_bn = nn.Sequential(
                nn.Conv2d(in_channels=3, out_channels=32, kernel_size=3, stride=2, padding=1, bias=False),
                nn.BatchNorm2d(32),
                nn.ReLU(inplace=True)
            )                                # torch.Size([1, 32, 112, 112])
        self.dwmodule = nn.Sequential(
            # See MobileNet V1 (https://arxiv.org/pdf/1704.04861.pdf), Table 1
            DepthSepConv(32, 64, 1),            # torch.Size([1, 64, 112, 112])
            DepthSepConv(64, 128, 2),           # torch.Size([1, 128, 56, 56])
            DepthSepConv(128, 128, 1),          # torch.Size([1, 128, 56, 56])
            DepthSepConv(128, 256, 2),          # torch.Size([1, 256, 28, 28])
            DepthSepConv(256, 256, 1),          # torch.Size([1, 256, 28, 28])
            DepthSepConv(256, 512, 2),          # torch.Size([1, 512, 14, 14])
            # 5 x DepthSepConv(512, 512, 1),
            DepthSepConv(512, 512, 1),          # torch.Size([1, 512, 14, 14])
            DepthSepConv(512, 512, 1),
            DepthSepConv(512, 512, 1),
            DepthSepConv(512, 512, 1),
            DepthSepConv(512, 512, 1),
            DepthSepConv(512, 1024, 2),         # torch.Size([1, 1024, 7, 7])
            DepthSepConv(1024, 1024, 1),
            nn.AvgPool2d(7, stride=1),
        )
        self.fc = nn.Linear(in_features=1024, out_features=num_classes)
        self.dropout = nn.Dropout(p=drop_rate)
        self.softmax = nn.Softmax(dim=1)

        for m in self.modules():
            if isinstance(m, nn.Conv2d):
                nn.init.kaiming_normal_(m.weight)
            elif isinstance(m, nn.BatchNorm2d):
                nn.init.constant_(m.weight, 1)
                nn.init.constant_(m.bias, 0)
            elif isinstance(m, nn.Linear):
                nn.init.constant_(m.bias, 0)

    def forward(self, x):
        x = self.conv_bn(x)
        x = self.dwmodule(x)
        x = x.view(x.size(0), -1)
        x = self.fc(x)
        x = self.softmax(self.dropout(x))
        return x

The first layer is a standard convolution; the following 26 layers form the core of the network, built by stacking depthwise separable convolution units. A global average pooling layer with a 7x7 kernel then collapses the spatial dimensions, merging each channel's feature map into a single value, and a fully connected layer followed by softmax produces the output.
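A minimal smoke test (a sketch assuming the DepthSepConv and MobileNetV1 classes above are in the same file) confirms the output shape:

if __name__ == "__main__":
    x = torch.randn(1, 3, 224, 224)
    model = MobileNetV1(num_classes=1000)
    out = model(x)
    print(out.shape)   # expected: torch.Size([1, 1000])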

MobileNetv2

The MobileNetV2 design is based on MobileNetV1: it keeps the simplicity and requires no special operators, while significantly improving accuracy, reaching state-of-the-art results for mobile image classification and detection tasks.

MobileNetV2 is built on an inverted residual structure. An ordinary residual block first compresses the channels of the feature map with a 1x1 convolution, then applies a 3x3 convolution, and finally expands the channels back with another 1x1 convolution, i.e. compress first, expand later; the inverted residual block does the opposite, expanding first and compressing afterwards. In addition, the authors found it important to remove the non-linearity on the narrow, low-channel layers, i.e. to use a linear activation in the bottleneck.

Inverted Residual Block

The inverted residual block goes through the following stages:

  • a 1x1 convolution that expands the channels
  • a 3x3 depthwise (DW) convolution
  • a 1x1 convolution that reduces the channels

Read this together with the code below. The expand_ratio argument decides whether the incoming feature map is expanded first: if no expansion is needed, the block is simply convolution, batch norm, activation, convolution, batch norm; otherwise a 1x1 convolution first raises the channel count, a 3x3 depthwise convolution then extracts spatial features channel by channel, and a final 1x1 convolution lowers the channel count again.

The expansion gives the network better representational capacity, and the reduction keeps the computation low. After this feature extraction, if the shortcut connection is used the result is added directly to the input; otherwise the convolution output is returned as is.

import torch
import torch.nn as nn

def _make_divisible(v, divisor, min_value=None):
    if min_value is None:
        min_value = divisor
    new_v = max(min_value, int(v + divisor / 2) // divisor * divisor)
    # Make sure that round down does not go down by more than 10%.
    if new_v < 0.9 * v:
        new_v += divisor
    return new_v
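# As an illustration (hypothetical values): _make_divisible(37, 8) returns 40 and
# _make_divisible(32, 8) returns 32 -- the channel count is rounded to the nearest
# multiple of divisor and never drops below 90% of the original value.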

class ConvBNReLU6(nn.Module):
    def __init__(self, in_planes, out_planes, kernel_size=3, stride=1, groups=1, dilation=1,):
        super(ConvBNReLU6, self).__init__()
        padding = (kernel_size - 1) // 2 * dilation

        self.convbnrelu6 = nn.Sequential(
            nn.Conv2d(in_planes, out_planes, kernel_size, stride, padding, dilation=dilation,
                      groups=groups, bias=False),
            nn.BatchNorm2d(out_planes),
            nn.ReLU6(inplace=True)
        )

    def forward(self, x):
        return self.convbnrelu6(x)

class InvertedResidual(nn.Module):
    def __init__(self, in_planes, out_planes, stride, expand_ratio):
        super(InvertedResidual, self).__init__()
        self.stride = stride
        assert stride in [1, 2]

        hidden_dim = int(round(in_planes * expand_ratio))
        self.use_res_connect = self.stride == 1 and in_planes == out_planes

        layers = []
        if expand_ratio != 1:
            # pw: 1x1 convolution to raise the channel count
            layers.append(ConvBNReLU6(in_planes, hidden_dim, kernel_size=1))

        layers.extend([
            # dw: 3x3 depthwise convolution for per-channel spatial feature extraction
            ConvBNReLU6(hidden_dim, hidden_dim, kernel_size=3, stride=stride, groups=hidden_dim),
            # pw-linear: 1x1 convolution to lower the channel count (no activation)
            nn.Conv2d(hidden_dim, out_planes, kernel_size=1, stride=1, padding=0),
            nn.BatchNorm2d(out_planes),
        ])
        self.conv = nn.Sequential(*layers)
        self.out_channels = out_planes

    def forward(self, x):
        if self.use_res_connect:
            return x + self.conv(x)
        else:
            return self.conv(x)

if __name__ == "__main__":
    inverted_residual_setting = [
        # t, c, n, s
        [1, 16, 1, 1],
        [6, 24, 2, 2],
        [6, 32, 3, 2],
        [6, 64, 4, 2],
        [6, 96, 3, 1],
        [6, 160, 3, 2],
        [6, 320, 1, 1],
    ]

    class Invertedmodels(nn.Module):
        def __init__(self, input_channel=32, round_nearest=8):
            super(Invertedmodels, self).__init__()
            input_channel = _make_divisible(input_channel, round_nearest)
            self.conv1 = ConvBNReLU6(3, input_channel, stride=2)
            self.inverted_residuals = nn.ModuleList()

            for t, c, n, s in inverted_residual_setting:
                output_channel = _make_divisible(c, round_nearest)
                inverted_residual_list = []
                for i in range(n):
                    stride = s if i == 0 else 1
                    inverted_residual = InvertedResidual(input_channel, output_channel, stride, expand_ratio=t)
                    inverted_residual_list.append(inverted_residual)
                    input_channel = output_channel

                # register this stage of InvertedResidual blocks on the model
                setattr(self, f'inverted_residual_{t}_{c}_{n}', nn.Sequential(*inverted_residual_list))
                self.inverted_residuals.extend(inverted_residual_list)

        def forward(self, x):
            x = self.conv1(x)
            print(x.shape)
            for i, inverted_residual in enumerate(self.inverted_residuals):
                x = inverted_residual(x)
                print(i, x.shape)
            return x

    input_tensor = torch.randn((1, 3, 224, 224))
    model = Invertedmodels()
    output = model(input_tensor)

MobileNetV2 model implementation

This part can be followed alongside the table in the paper.

import torch
import torch.nn as nn
import torch.nn.functional as F

def _make_divisible(v, divisor, min_value=None):
    """
    This function is taken from the original tf repo. It can be seen here:
        https://github.com/tensorflow/models/blob/master/research/slim/nets/mobilenet/mobilenet.py
    Args:
        v: The number of input channels.
        divisor: The number of channels should be a multiple of this value.
        min_value: The minimum value of the number of channels, which defaults to the divisor.

    Returns: It ensures that all layers have a channel number that is divisible by 8
    """
    if min_value is None:
        min_value = divisor
    new_v = max(min_value, int(v + divisor / 2) // divisor * divisor)
    # Make sure that round down does not go down by more than 10%.
    if new_v < 0.9 * v:
        new_v += divisor
    return new_v

class ConvBNReLU6(nn.Module):
    def __init__(self, in_planes, out_planes, kernel_size=3, stride=1, groups=1, dilation=1,):
        super(ConvBNReLU6, self).__init__()
        padding = (kernel_size - 1) // 2 * dilation

        self.convbnrelu6 = nn.Sequential(
            nn.Conv2d(in_planes, out_planes, kernel_size, stride, padding, dilation=dilation,
                      groups=groups, bias=False),
            nn.BatchNorm2d(out_planes),
            nn.ReLU6(inplace=True)
        )

    def forward(self, x):
        return self.convbnrelu6(x)

class InvertedResidual(nn.Module):
    def __init__(self, in_planes, out_planes, stride, expand_ratio):
        super(InvertedResidual, self).__init__()
        self.stride = stride
        assert stride in [1, 2]

        hidden_dim = int(round(in_planes * expand_ratio))
        self.use_res_connect = self.stride == 1 and in_planes == out_planes

        layers = []
        if expand_ratio != 1:
            # pw: 1x1 convolution to raise the channel count
            layers.append(ConvBNReLU6(in_planes, hidden_dim, kernel_size=1))
        layers.extend([
            # dw: 3x3 depthwise convolution for per-channel spatial feature extraction
            ConvBNReLU6(hidden_dim, hidden_dim, stride=stride, groups=hidden_dim),
            # pw-linear: 1x1 convolution to lower the channel count (no activation)
            nn.Conv2d(hidden_dim, out_planes, kernel_size=1, stride=1, padding=0),
            nn.BatchNorm2d(out_planes),
        ])
        self.conv = nn.Sequential(*layers)
        self.out_channels = out_planes

    def forward(self, x):
        if self.use_res_connect:
            return x + self.conv(x)
        else:
            return self.conv(x)


class MobileNetV2(nn.Module):
    def __init__(self, num_classes=1000, drop_rate=0.2, width_mult=1.0, round_nearest=8):
        """
        MobileNet V2 main class

        Args:
            num_classes (int): Number of classes
            drop_rate (float): Dropout layer drop rate
            width_mult (float): Width multiplier - adjusts number of channels in each layer by this amount
            round_nearest (int): Round the number of channels in each layer to be a multiple of this number
            Set to 1 to turn off rounding
        """
        super(MobileNetV2, self).__init__()
        input_channel = 32
        last_channel = 1280
        inverted_residual_setting = [
            # t, c, n, s
            [1, 16, 1, 1],
            [6, 24, 2, 2],
            [6, 32, 3, 2],
            [6, 64, 4, 2],
            [6, 96, 3, 1],
            [6, 160, 3, 2],
            [6, 320, 1, 1],
        ]
        # t is the expansion ratio (t=1 means no 1x1 expansion), c is the output channel count, n is the number of repeated blocks, and s is the stride of the first block (whether height and width are downsampled)
        # building first layer
        input_channel = _make_divisible(input_channel * width_mult, round_nearest)
        self.last_channel = _make_divisible(last_channel * max(1.0, width_mult), round_nearest)
        features = [ConvBNReLU6(3, input_channel, stride=2)]
        # building inverted residual blocks
        for t, c, n, s in inverted_residual_setting:
            output_channel = _make_divisible(c * width_mult, round_nearest)
            for i in range(n):
                stride = s if i == 0 else 1
                features.append(InvertedResidual(input_channel, output_channel, stride, expand_ratio=t))
                input_channel = output_channel
        # building last several layers
        features.append(ConvBNReLU6(input_channel, self.last_channel, kernel_size=1))
        # make it nn.Sequential
        self.features = nn.Sequential(*features)

        self.classifier = nn.Sequential(
            nn.Dropout(drop_rate),
            nn.Linear(self.last_channel, num_classes),
        )

        for m in self.modules():
            if isinstance(m, nn.Conv2d):
                nn.init.kaiming_normal_(m.weight, mode='fan_out')
                if m.bias is not None:
                    nn.init.zeros_(m.bias)
            elif isinstance(m, (nn.BatchNorm2d, nn.GroupNorm)):
                nn.init.ones_(m.weight)
                nn.init.zeros_(m.bias)
            elif isinstance(m, nn.Linear):
                nn.init.normal_(m.weight, 0, 0.01)
                nn.init.zeros_(m.bias)

    def forward(self, x):
        x = self.features(x)
        # Cannot use "squeeze" as batch-size can be 1 => must use reshape with x.shape[0]
        x = F.adaptive_avg_pool2d(x, (1, 1)).reshape(x.shape[0], -1)
        x = self.classifier(x)
        return x

if __name__=="__main__":
    import torchsummary
    device = 'cuda' if torch.cuda.is_available() else 'cpu'
    input = torch.ones(2, 3, 224, 224).to(device)
    net = MobileNetV2(num_classes=4)
    net = net.to(device)
    out = net(input)
    print(out)
    print(out.shape)
    torchsummary.summary(net, input_size=(3, 224, 224))

MobileNetv3

The block in MobileNetV3

Compared with the MobileNetV2 block, MobileNetV3 adds an SE module inside the block and changes the activation functions.

For more on the SE module, see: SE通道注意力机制模块-CSDN博客

Two activation functions are used here: hard-swish and ReLU. ReLU should already be familiar to everyone.

The mathematical expression of HardSwish is:

HardSwish(x)=x\cdot ReLU6(x+3) / 6

I wrote a hand-made version of hard-swish to help with understanding; I have also checked it against the official implementation.

import torch.nn as nn
import torch.nn.functional as F

class Hardswish(nn.Module):
    def __init__(self, inplace=False):
        super(Hardswish, self).__init__()
        self.inplace = inplace

    def _hardswish(self, x):
        inner = F.relu6(x + 3.).div_(6.)
        return x.mul_(inner) if self.inplace else x.mul(inner)

    def forward(self, x):
        return self._hardswish(x)

The advantage of this design is that HardSwish keeps a useful non-linearity while relying on the hard clipping of ReLU6, which is cheap to compute and makes the function behave almost linearly near zero, which helps gradients propagate.
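A quick sanity check (an illustrative snippet, assuming a PyTorch version that already ships nn.Hardswish, i.e. 1.6 or later) shows the hand-written module matches the built-in one:

import torch
import torch.nn as nn

x = torch.randn(4, 8)
print(torch.allclose(Hardswish()(x), nn.Hardswish()(x), atol=1e-6))   # expected: True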

MobileNetV3 model implementation

The paper provides two variants: large and small.

import torch
import torch.nn as nn
from functools import partial

def _make_divisible(v, divisor, min_value=None):
    """
    This function is taken from the original tf repo. It can be seen here:
        https://github.com/tensorflow/models/blob/master/research/slim/nets/mobilenet/mobilenet.py
    Args:
        v: The number of input channels.
        divisor: The number of channels should be a multiple of this value.
        min_value: The minimum value of the number of channels, which defaults to the divisor.

    Returns: It ensures that all layers have a channel number that is divisible by 8
    """
    if min_value is None:
        min_value = divisor
    new_v = max(min_value, int(v + divisor / 2) // divisor * divisor)
    # Make sure that round down does not go down by more than 10%.
    if new_v < 0.9 * v:
        new_v += divisor
    return new_v

class ConvBNActivation(nn.Module):
    def __init__(self, in_planes, out_planes, kernel_size=3, stride=1, groups=1,
                 norm_layer=None, activation_layer=None, dilation=1,):
        super(ConvBNActivation, self).__init__()
        padding = (kernel_size - 1) // 2 * dilation
        if norm_layer is None:
            norm_layer = nn.BatchNorm2d
        if activation_layer is None:
            activation_layer = nn.ReLU6
        self.convbnact=nn.Sequential(
            nn.Conv2d(in_planes, out_planes, kernel_size, stride, padding, dilation=dilation, groups=groups,
                      bias=False),
            norm_layer(out_planes),
            activation_layer(inplace=True)
        )
        self.out_channels = out_planes

    def forward(self, x):
        return self.convbnact(x)


class SeModule(nn.Module):
    def __init__(self, input_channels, reduction=4):
        super(SeModule, self).__init__()
        expand_size = _make_divisible(input_channels // reduction, 8)
        self.se = nn.Sequential(
            nn.AdaptiveAvgPool2d(1),
            nn.Conv2d(input_channels, expand_size, kernel_size=1, bias=False),
            nn.BatchNorm2d(expand_size),
            nn.ReLU(inplace=True),
            nn.Conv2d(expand_size, input_channels, kernel_size=1, bias=False),
            nn.Hardsigmoid()
        )

    def forward(self, x):
        return x * self.se(x)

class MobileNetV3(nn.Module):
    """
    MobileNet V3 main class

    Args:
        num_classes: Number of classes
        mode: "large" or "small"
    """
    def __init__(self, num_classes=1000, mode=None, drop_rate=0.2):
        super().__init__()
        norm_layer = partial(nn.BatchNorm2d, eps=0.001, momentum=0.01)
        layers = []
        inverted_residual_setting, last_channel = _mobilenetv3_cfg[mode]
        # building first layer
        firstconv_output_channels = 16
        layers.append(ConvBNActivation(3, firstconv_output_channels, kernel_size=3, stride=2, norm_layer=norm_layer,
                                       activation_layer=nn.Hardswish))
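        # building inverted residual blocks: the block stack from _mobilenetv3_cfg is itself an nn.Sequential, appended here as a single module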
        layers.append(inverted_residual_setting)
        # building last several layers
        lastconv_input_channels = 96 if mode == "small" else 160
        lastconv_output_channels = 6 * lastconv_input_channels
        layers.append(ConvBNActivation(lastconv_input_channels, lastconv_output_channels, kernel_size=1,
                                       norm_layer=norm_layer, activation_layer=nn.Hardswish))

        self.features = nn.Sequential(*layers)
        self.avgpool = nn.AdaptiveAvgPool2d(1)
        self.classifier = nn.Sequential(
            nn.Linear(lastconv_output_channels, last_channel),
            nn.Hardswish(inplace=True),
            nn.Dropout(p=drop_rate, inplace=True),
            nn.Linear(last_channel, num_classes),
        )

        for m in self.modules():
            if isinstance(m, nn.Conv2d):
                nn.init.kaiming_normal_(m.weight, mode='fan_out')
                if m.bias is not None:
                    nn.init.zeros_(m.bias)
            elif isinstance(m, (nn.BatchNorm2d, nn.GroupNorm)):
                nn.init.ones_(m.weight)
                nn.init.zeros_(m.bias)
            elif isinstance(m, nn.Linear):
                nn.init.normal_(m.weight, 0, 0.01)
                nn.init.zeros_(m.bias)

    def forward(self, x):
        x = self.features(x)
        x = self.avgpool(x)
        x = torch.flatten(x, 1)
        x = self.classifier(x)

        return x

class InvertedResidualv3(nn.Module):
    '''expand + depthwise + pointwise'''
    def __init__(self, kernel_size, input_channels, expanded_channels, out_channels, activation, use_se, stride):
        super(InvertedResidualv3, self).__init__()
        self.stride = stride
        norm_layer = partial(nn.BatchNorm2d, eps=0.001, momentum=0.01)
        self.use_res_connect = stride == 1 and input_channels == out_channels
        activation_layer = nn.ReLU if activation == "RE" else nn.Hardswish
        layers = []
        if expanded_channels != input_channels:
            layers.append(ConvBNActivation(input_channels, expanded_channels, kernel_size=1,
                                           norm_layer=norm_layer, activation_layer=activation_layer))

        # depthwise
        layers.append(ConvBNActivation(expanded_channels, expanded_channels, kernel_size=kernel_size,
                                       stride=stride, groups=expanded_channels,
                                       norm_layer=norm_layer, activation_layer=activation_layer))
        if use_se:
            layers.append(SeModule(expanded_channels))

        layers.append(ConvBNActivation(expanded_channels, out_channels, kernel_size=1, norm_layer=norm_layer,
                                       activation_layer=nn.Identity))

        self.block = nn.Sequential(*layers)
        self.out_channels = out_channels

    def forward(self, x):
        result = self.block(x)
        if self.use_res_connect:
            result += x
        return result

_mobilenetv3_cfg = {
    "large": [nn.Sequential(
        # kernel, in_chs, exp_chs, out_chs, act, use_se, stride
        InvertedResidualv3(3, 16, 16, 16, "RE", False, 1),
        InvertedResidualv3(3, 16, 64, 24, "RE", False, 2),
        InvertedResidualv3(3, 24, 72, 24, "RE", False, 1),
        InvertedResidualv3(5, 24, 72, 40, "RE", True, 2),
        InvertedResidualv3(5, 40, 120, 40, "RE", True, 1),
        InvertedResidualv3(5, 40, 120, 40, "RE", True, 1),
        InvertedResidualv3(3, 40, 240, 80, "HS", False, 2),
        InvertedResidualv3(3, 80, 200, 80, "HS", False, 1),
        InvertedResidualv3(3, 80, 184, 80, "HS", False, 1),
        InvertedResidualv3(3, 80, 184, 80, "HS", False, 1),
        InvertedResidualv3(3, 80, 480, 112, "HS", True, 1),
        InvertedResidualv3(3, 112, 672, 112, "HS", True, 1),
        InvertedResidualv3(5, 112, 672, 160, "HS", True, 2),
        InvertedResidualv3(5, 160, 960, 160, "HS", True, 1),
        InvertedResidualv3(5, 160, 960, 160, "HS", True, 1),
    ),
        _make_divisible(1280, 8)],
    "small": [nn.Sequential(
        # kernel, in_chs, exp_chs, out_chs, act, use_se, stride
        InvertedResidualv3(3, 16, 16, 16, "RE", True, 2),
        InvertedResidualv3(3, 16, 72, 24, "RE", False, 2),
        InvertedResidualv3(3, 24, 88, 24, "RE", False, 1),
        InvertedResidualv3(5, 24, 96, 40, "HS", True, 2),
        InvertedResidualv3(5, 40, 240, 40, "HS", True, 1),
        InvertedResidualv3(5, 40, 240, 40, "HS", True, 1),
        InvertedResidualv3(5, 40, 120, 48, "HS", True, 1),
        InvertedResidualv3(5, 48, 144, 48, "HS", True, 1),
        InvertedResidualv3(5, 48, 288, 96, "HS", True, 2),
        InvertedResidualv3(5, 96, 576, 96, "HS", True, 1),
        InvertedResidualv3(5, 96, 576, 96, "HS", True, 1),
    ),
        _make_divisible(1024, 8)],
}

def MobileNetV3_Large(num_classes):
    """Large version of mobilenet_v3"""
    return MobileNetV3(num_classes=num_classes, mode="large")

def MobileNetV3_Small(num_classes):
    """small version of mobilenet_v3"""
    return MobileNetV3(num_classes=num_classes, mode="small")

if __name__=="__main__":
    import torchsummary
    device = 'cuda' if torch.cuda.is_available() else 'cpu'
    input = torch.ones(2, 3, 224, 224).to(device)
    net = MobileNetV3_Large(num_classes=4)
    net = net.to(device)
    out = net(input)
    print(out)
    print(out.shape)
    torchsummary.summary(net, input_size=(3, 224, 224))

Other notes

As usual, after implementing the models I tested their classification performance. One thing puzzled me: the validation loss of MobileNetV3 kept rising, eventually reaching roughly 10 to 20, which at first made me suspect a bug in my training script (which I had been modifying throughout). But when I re-ran the earlier networks, MobileNetV1 and V2 both behaved normally, and when I experimented with the official torchvision MobileNetV3 it showed the same problem, with validation losses in the teens. So this part still leaves me puzzled.

The issue is unresolved for now; I am leaving it here.

