MobileNet Models Explained: From MobileNetV1 to MobileNetV3

Overview

The MobileNet family consists of V1, V2, and V3, all focused on lightweight neural networks. MobileNetV1 adopts depthwise separable convolutions; MobileNetV2 introduces the inverted residual block with a linear bottleneck to improve accuracy; MobileNetV3 adds further design elements such as the hard-swish activation, Squeeze-and-Excitation modules, and network architecture search to balance computational efficiency and accuracy. All three generations have been successful on mobile devices and embedded systems, providing efficient deep-learning solutions for resource-constrained environments.

  1. Original MobileNetV1 paper: https://arxiv.org/pdf/1704.04861.pdf
  2. Original MobileNetV2 paper: https://arxiv.org/pdf/1801.04381.pdf
  3. Original MobileNetV3 paper: https://arxiv.org/abs/1905.02244.pdf

MobileNetv1

Recently there has been considerable interest in building small and efficient neural networks; the approaches roughly fall into compressing pretrained networks and training small networks directly. MobileNet focuses primarily on optimizing latency while also yielding small networks.

Depthwise Separable Convolution

A standard convolution learns a single set of filters that extract features across all input channels at once. A depthwise separable convolution replaces it with two stages: a depthwise convolution and a pointwise convolution.

  1. Depthwise convolution (DwConv): each channel of the input is convolved with its own filter, instead of sharing one filter across all channels as a standard convolution does. Because every channel has its own small set of filter weights, the number of parameters is greatly reduced.
  2. Pointwise convolution (PwConv): after the depthwise step, a pointwise (1x1) convolution linearly combines the depthwise outputs to produce the final feature map. A 1x1 kernel is equivalent to a fully connected operation across channels; its role is to mix and recombine the per-channel outputs of the depthwise step so that information can flow between channels.

import torch.nn as nn

class DepthSepConv(nn.Module):
    """
    Depthwise separable convolution: depthwise (DW) conv + pointwise (PW) conv.
    DW conv: with groups equal to the number of input channels, the output keeps the same number of channels as the input.
    PW conv: an ordinary convolution with a 1x1 kernel.
    """
    def __init__(self, in_channels, out_channels, stride):
        super(DepthSepConv, self).__init__()
        self.depthwise = nn.Conv2d(in_channels, in_channels, kernel_size=3, stride=stride, groups=in_channels, padding=1)
        self.pointwise = nn.Conv2d(in_channels, out_channels, kernel_size=1)
        self.batch_norm1 = nn.BatchNorm2d(in_channels)
        self.batch_norm2 = nn.BatchNorm2d(out_channels)
        self.relu6 = nn.ReLU6(inplace=True)

    def forward(self, x):
        x = self.depthwise(x)
        x = self.batch_norm1(x)
        x = self.relu6(x)
        x = self.pointwise(x)
        x = self.batch_norm2(x)
        x = self.relu6(x)

        return x

Introducing depthwise separable convolutions greatly reduces the number of parameters, which lowers the risk of overfitting and also reduces the computational complexity.

For a standard convolution, the computational cost (multiply-accumulate operations over an H x W output feature map) is:

Calculate_{std}=K\times K\times C_{in}\times H\times W\times C_{out}

while for a depthwise separable convolution the two stages cost:

Calculate_{DW}=K\times K\times C_{in}\times H\times W

Calculate_{PW}=C_{in}\times C_{out}\times H\times W

so: Calculate_{DSC}=Calculate_{DW}+Calculate_{PW}

The ratio to the standard convolution is \frac{Calculate_{DSC}}{Calculate_{std}}=\frac{1}{C_{out}}+\frac{1}{K^{2}}, so with the usual 3x3 kernels a depthwise separable convolution uses roughly 8 to 9 times less computation than a standard convolution, with only a small reduction in accuracy.
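As a quick check of the arithmetic, the short sketch below (with hypothetical layer sizes) counts the multiply-accumulate operations of both variants directly from the formulas above and prints the ratio:

K, H, W, C_in, C_out = 3, 112, 112, 32, 64   # hypothetical layer sizes

standard = K * K * C_in * H * W * C_out      # standard convolution
dw = K * K * C_in * H * W                    # depthwise convolution
pw = C_in * C_out * H * W                    # pointwise (1x1) convolution

print(standard / (dw + pw))                  # ~7.9, matching 1 / (1/C_out + 1/K**2)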

MobileNetV1 model implementation

This part is my own reproduction, following the layer table and the output shapes given in the paper.

import torch
import torch.nn as nn

class MobileNetV1(nn.Module):
    def __init__(self, num_classes=1000, drop_rate=0.2):
        super(MobileNetV1, self).__init__()
        # torch.Size([1, 3, 224, 224])
        self.conv_bn = nn.Sequential(
                nn.Conv2d(in_channels=3, out_channels=32, kernel_size=3, stride=2, padding=1, bias=False),
                nn.BatchNorm2d(32),
                nn.ReLU(inplace=True)
            )                                # torch.Size([1, 32, 112, 112])
        self.dwmodule = nn.Sequential(
            # See MobileNet V1 (https://arxiv.org/pdf/1704.04861.pdf), Table 1
            DepthSepConv(32, 64, 1),            # torch.Size([1, 64, 112, 112])
            DepthSepConv(64, 128, 2),           # torch.Size([1, 128, 56, 56])
            DepthSepConv(128, 128, 1),          # torch.Size([1, 128, 56, 56])
            DepthSepConv(128, 256, 2),          # torch.Size([1, 256, 28, 28])
            DepthSepConv(256, 256, 1),          # torch.Size([1, 256, 28, 28])
            DepthSepConv(256, 512, 2),          # torch.Size([1, 512, 14, 14])
            # 5 x DepthSepConv(512, 512, 1),
            DepthSepConv(512, 512, 1),          # torch.Size([1, 512, 14, 14])
            DepthSepConv(512, 512, 1),
            DepthSepConv(512, 512, 1),
            DepthSepConv(512, 512, 1),
            DepthSepConv(512, 512, 1),
            DepthSepConv(512, 1024, 2),         # torch.Size([1, 1024, 7, 7])
            DepthSepConv(1024, 1024, 1),
            nn.AvgPool2d(7, stride=1),
        )
        self.fc = nn.Linear(in_features=1024, out_features=num_classes)
        self.dropout = nn.Dropout(p=drop_rate)
        self.softmax = nn.Softmax(dim=1)

        for m in self.modules():
            if isinstance(m, nn.Conv2d):
                nn.init.kaiming_normal_(m.weight)
            elif isinstance(m, nn.BatchNorm2d):
                nn.init.constant_(m.weight, 1)
                nn.init.constant_(m.bias, 0)
            elif isinstance(m, nn.Linear):
                nn.init.constant_(m.bias, 0)

    def forward(self, x):
        x = self.conv_bn(x)
        x = self.dwmodule(x)
        x = x.view(x.size(0), -1)
        x = self.fc(x)
        x = self.softmax(self.dropout(x))
        return x

The first layer is a standard convolution; the following 26 layers form the core of the network, built by stacking depthwise separable convolution units. A global average pooling layer with a 7x7 kernel then collapses the spatial dimensions, merging each channel's feature map into a single value, and a fully connected layer followed by softmax produces the output.
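A minimal smoke test (a sketch assuming the DepthSepConv and MobileNetV1 classes above are in the same file) confirms the output shape:

if __name__ == "__main__":
    x = torch.randn(1, 3, 224, 224)
    model = MobileNetV1(num_classes=1000)
    out = model(x)
    print(out.shape)   # expected: torch.Size([1, 1000])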

MobileNetv2

The MobileNetV2 design is based on MobileNetV1: it keeps the simplicity and requires no special operators, while significantly improving accuracy, reaching state-of-the-art results for mobile image classification and detection tasks.

MobileNetV2 is built on an inverted residual structure. An ordinary residual block first compresses the channels of the feature map with a 1x1 convolution, then applies a 3x3 convolution, and finally expands the channels back with another 1x1 convolution, i.e. compress first, expand later; the inverted residual block does the opposite, expanding first and compressing afterwards. In addition, the authors found it important to remove the non-linearity on the narrow, low-channel layers, i.e. to use a linear activation in the bottleneck.

Inverted Residual Block

The inverted residual block goes through the following stages:

  • a 1x1 convolution that expands the channels
  • a 3x3 depthwise (DW) convolution
  • a 1x1 convolution that reduces the channels

Read this together with the code below. The expand_ratio argument decides whether the incoming feature map is expanded first: if no expansion is needed, the block is simply convolution, batch norm, activation, convolution, batch norm; otherwise a 1x1 convolution first raises the channel count, a 3x3 depthwise convolution then extracts spatial features channel by channel, and a final 1x1 convolution lowers the channel count again.

The expansion gives the network better representational capacity, and the reduction keeps the computation low. After this feature extraction, if the shortcut connection is used the result is added directly to the input; otherwise the convolution output is returned as is.

import torch
import torch.nn as nn

def _make_divisible(v, divisor, min_value=None):
    if min_value is None:
        min_value = divisor
    new_v = max(min_value, int(v + divisor / 2) // divisor * divisor)
    # Make sure that round down does not go down by more than 10%.
    if new_v < 0.9 * v:
        new_v += divisor
    return new_v
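# As an illustration (hypothetical values): _make_divisible(37, 8) returns 40 and
# _make_divisible(32, 8) returns 32 -- the channel count is rounded to the nearest
# multiple of divisor and never drops below 90% of the original value.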

class ConvBNReLU6(nn.Module):
    def __init__(self, in_planes, out_planes, kernel_size=3, stride=1, groups=1, dilation=1,):
        super(ConvBNReLU6, self).__init__()
        padding = (kernel_size - 1) // 2 * dilation

        self.convbnrelu6 = nn.Sequential(
            nn.Conv2d(in_planes, out_planes, kernel_size, stride, padding, dilation=dilation,
                      groups=groups, bias=False),
            nn.BatchNorm2d(out_planes),
            nn.ReLU6(inplace=True)
        )

    def forward(self, x):
        return self.convbnrelu6(x)

class InvertedResidual(nn.Module):
    def __init__(self, in_planes, out_planes, stride, expand_ratio):
        super(InvertedResidual, self).__init__()
        self.stride = stride
        assert stride in [1, 2]

        hidden_dim = int(round(in_planes * expand_ratio))
        self.use_res_connect = self.stride == 1 and in_planes == out_planes

        layers = []
        if expand_ratio != 1:
            # pw: 1x1 convolution to raise the channel count
            layers.append(ConvBNReLU6(in_planes, hidden_dim, kernel_size=1))

        layers.extend([
            # dw: 3x3 depthwise convolution for per-channel spatial feature extraction
            ConvBNReLU6(hidden_dim, hidden_dim, kernel_size=3, stride=stride, groups=hidden_dim),
            # pw-linear: 1x1 convolution to lower the channel count (no activation)
            nn.Conv2d(hidden_dim, out_planes, kernel_size=1, stride=1, padding=0),
            nn.BatchNorm2d(out_planes),
        ])
        self.conv = nn.Sequential(*layers)
        self.out_channels = out_planes

    def forward(self, x):
        if self.use_res_connect:
            return x + self.conv(x)
        else:
            return self.conv(x)

if __name__ == "__main__":
    inverted_residual_setting = [
        # t, c, n, s
        [1, 16, 1, 1],
        [6, 24, 2, 2],
        [6, 32, 3, 2],
        [6, 64, 4, 2],
        [6, 96, 3, 1],
        [6, 160, 3, 2],
        [6, 320, 1, 1],
    ]

    class Invertedmodels(nn.Module):
        def __init__(self, input_channel=32, round_nearest=8):
            super(Invertedmodels, self).__init__()
            input_channel = _make_divisible(input_channel, round_nearest)
            self.conv1 = ConvBNReLU6(3, input_channel, stride=2)
            self.inverted_residuals = nn.ModuleList()

            for t, c, n, s in inverted_residual_setting:
                output_channel = _make_divisible(c, round_nearest)
                inverted_residual_list = []
                for i in range(n):
                    stride = s if i == 0 else 1
                    inverted_residual = InvertedResidual(input_channel, output_channel, stride, expand_ratio=t)
                    inverted_residual_list.append(inverted_residual)
                    input_channel = output_channel

                # register this stage of InvertedResidual blocks on the model
                setattr(self, f'inverted_residual_{t}_{c}_{n}', nn.Sequential(*inverted_residual_list))
                self.inverted_residuals.extend(inverted_residual_list)

        def forward(self, x):
            x = self.conv1(x)
            print(x.shape)
            for i, inverted_residual in enumerate(self.inverted_residuals):
                x = inverted_residual(x)
                print(i, x.shape)
            return x

    input_tensor = torch.randn((1, 3, 224, 224))
    model = Invertedmodels()
    output = model(input_tensor)

MobileNetV2 model implementation

This part can be followed alongside the table in the paper.

import torch
import torch.nn as nn
import torch.nn.functional as F

def _make_divisible(v, divisor, min_value=None):
    """
    This function is taken from the original tf repo. It can be seen here:
        https://github.com/tensorflow/models/blob/master/research/slim/nets/mobilenet/mobilenet.py
    Args:
        v: The number of input channels.
        divisor: The number of channels should be a multiple of this value.
        min_value: The minimum value of the number of channels, which defaults to the divisor.

    Returns: It ensures that all layers have a channel number that is divisible by 8
    """
    if min_value is None:
        min_value = divisor
    new_v = max(min_value, int(v + divisor / 2) // divisor * divisor)
    # Make sure that round down does not go down by more than 10%.
    if new_v < 0.9 * v:
        new_v += divisor
    return new_v

class ConvBNReLU6(nn.Module):
    def __init__(self, in_planes, out_planes, kernel_size=3, stride=1, groups=1, dilation=1,):
        super(ConvBNReLU6, self).__init__()
        padding = (kernel_size - 1) // 2 * dilation

        self.convbnrelu6 = nn.Sequential(
            nn.Conv2d(in_planes, out_planes, kernel_size, stride, padding, dilation=dilation,
                      groups=groups, bias=False),
            nn.BatchNorm2d(out_planes),
            nn.ReLU6(inplace=True)
        )

    def forward(self, x):
        return self.convbnrelu6(x)

class InvertedResidual(nn.Module):
    def __init__(self, in_planes, out_planes, stride, expand_ratio):
        super(InvertedResidual, self).__init__()
        self.stride = stride
        assert stride in [1, 2]

        hidden_dim = int(round(in_planes * expand_ratio))
        self.use_res_connect = self.stride == 1 and in_planes == out_planes

        layers = []
        if expand_ratio != 1:
            # pw: 1x1 convolution to raise the channel count
            layers.append(ConvBNReLU6(in_planes, hidden_dim, kernel_size=1))
        layers.extend([
            # dw: 3x3 depthwise convolution for per-channel spatial feature extraction
            ConvBNReLU6(hidden_dim, hidden_dim, stride=stride, groups=hidden_dim),
            # pw-linear: 1x1 convolution to lower the channel count (no activation)
            nn.Conv2d(hidden_dim, out_planes, kernel_size=1, stride=1, padding=0),
            nn.BatchNorm2d(out_planes),
        ])
        self.conv = nn.Sequential(*layers)
        self.out_channels = out_planes

    def forward(self, x):
        if self.use_res_connect:
            return x + self.conv(x)
        else:
            return self.conv(x)


class MobileNetV2(nn.Module):
    def __init__(self, num_classes=1000, drop_rate=0.2, width_mult=1.0, round_nearest=8):
        """
        MobileNet V2 main class

        Args:
            num_classes (int): Number of classes
            drop_rate (float): Dropout layer drop rate
            width_mult (float): Width multiplier - adjusts number of channels in each layer by this amount
            round_nearest (int): Round the number of channels in each layer to be a multiple of this number
            Set to 1 to turn off rounding
        """
        super(MobileNetV2, self).__init__()
        input_channel = 32
        last_channel = 1280
        inverted_residual_setting = [
            # t, c, n, s
            [1, 16, 1, 1],
            [6, 24, 2, 2],
            [6, 32, 3, 2],
            [6, 64, 4, 2],
            [6, 96, 3, 1],
            [6, 160, 3, 2],
            [6, 320, 1, 1],
        ]
        # t is the expansion ratio (t=1 means no 1x1 expansion), c is the output channel count, n is the number of repeated blocks, and s is the stride of the first block (whether height and width are downsampled)
        # building first layer
        input_channel = _make_divisible(input_channel * width_mult, round_nearest)
        self.last_channel = _make_divisible(last_channel * max(1.0, width_mult), round_nearest)
        features = [ConvBNReLU6(3, input_channel, stride=2)]
        # building inverted residual blocks
        for t, c, n, s in inverted_residual_setting:
            output_channel = _make_divisible(c * width_mult, round_nearest)
            for i in range(n):
                stride = s if i == 0 else 1
                features.append(InvertedResidual(input_channel, output_channel, stride, expand_ratio=t))
                input_channel = output_channel
        # building last several layers
        features.append(ConvBNReLU6(input_channel, self.last_channel, kernel_size=1))
        # make it nn.Sequential
        self.features = nn.Sequential(*features)

        self.classifier = nn.Sequential(
            nn.Dropout(drop_rate),
            nn.Linear(self.last_channel, num_classes),
        )

        for m in self.modules():
            if isinstance(m, nn.Conv2d):
                nn.init.kaiming_normal_(m.weight, mode='fan_out')
                if m.bias is not None:
                    nn.init.zeros_(m.bias)
            elif isinstance(m, (nn.BatchNorm2d, nn.GroupNorm)):
                nn.init.ones_(m.weight)
                nn.init.zeros_(m.bias)
            elif isinstance(m, nn.Linear):
                nn.init.normal_(m.weight, 0, 0.01)
                nn.init.zeros_(m.bias)

    def forward(self, x):
        x = self.features(x)
        # Cannot use "squeeze" as batch-size can be 1 => must use reshape with x.shape[0]
        x = F.adaptive_avg_pool2d(x, (1, 1)).reshape(x.shape[0], -1)
        x = self.classifier(x)
        return x

if __name__=="__main__":
    import torchsummary
    device = 'cuda' if torch.cuda.is_available() else 'cpu'
    input = torch.ones(2, 3, 224, 224).to(device)
    net = MobileNetV2(num_classes=4)
    net = net.to(device)
    out = net(input)
    print(out)
    print(out.shape)
    torchsummary.summary(net, input_size=(3, 224, 224))

MobileNetv3

The block in MobileNetV3

Compared with the MobileNetV2 block, MobileNetV3 adds an SE module inside the block and changes the activation functions.

For more on the SE module, see: SE通道注意力机制模块-CSDN博客

Two activation functions are used here: hard-swish and ReLU. ReLU should already be familiar to everyone.

The mathematical expression of HardSwish is:

HardSwish(x)=x\cdot ReLU6(x+3) / 6

I wrote a hand-made version of hard-swish to help with understanding; I have also checked it against the official implementation.

import torch.nn as nn
import torch.nn.functional as F

class Hardswish(nn.Module):
    def __init__(self, inplace=False):
        super(Hardswish, self).__init__()
        self.inplace = inplace

    def _hardswish(self, x):
        inner = F.relu6(x + 3.).div_(6.)
        return x.mul_(inner) if self.inplace else x.mul(inner)

    def forward(self, x):
        return self._hardswish(x)

The advantage of this design is that HardSwish keeps a useful non-linearity while relying on the hard clipping of ReLU6, which is cheap to compute and makes the function behave almost linearly near zero, which helps gradients propagate.
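A quick sanity check (an illustrative snippet, assuming a PyTorch version that already ships nn.Hardswish, i.e. 1.6 or later) shows the hand-written module matches the built-in one:

import torch
import torch.nn as nn

x = torch.randn(4, 8)
print(torch.allclose(Hardswish()(x), nn.Hardswish()(x), atol=1e-6))   # expected: True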

MobileNetV3 model implementation

The paper provides two variants: large and small.

import torch
import torch.nn as nn
from functools import partial

def _make_divisible(v, divisor, min_value=None):
    """
    This function is taken from the original tf repo. It can be seen here:
        https://github.com/tensorflow/models/blob/master/research/slim/nets/mobilenet/mobilenet.py
    Args:
        v: The number of input channels.
        divisor: The number of channels should be a multiple of this value.
        min_value: The minimum value of the number of channels, which defaults to the divisor.

    Returns: It ensures that all layers have a channel number that is divisible by 8
    """
    if min_value is None:
        min_value = divisor
    new_v = max(min_value, int(v + divisor / 2) // divisor * divisor)
    # Make sure that round down does not go down by more than 10%.
    if new_v < 0.9 * v:
        new_v += divisor
    return new_v

class ConvBNActivation(nn.Module):
    def __init__(self, in_planes, out_planes, kernel_size=3, stride=1, groups=1,
                 norm_layer=None, activation_layer=None, dilation=1,):
        super(ConvBNActivation, self).__init__()
        padding = (kernel_size - 1) // 2 * dilation
        if norm_layer is None:
            norm_layer = nn.BatchNorm2d
        if activation_layer is None:
            activation_layer = nn.ReLU6
        self.convbnact=nn.Sequential(
            nn.Conv2d(in_planes, out_planes, kernel_size, stride, padding, dilation=dilation, groups=groups,
                      bias=False),
            norm_layer(out_planes),
            activation_layer(inplace=True)
        )
        self.out_channels = out_planes

    def forward(self, x):
        return self.convbnact(x)


class SeModule(nn.Module):
    def __init__(self, input_channels, reduction=4):
        super(SeModule, self).__init__()
        expand_size = _make_divisible(input_channels // reduction, 8)
        self.se = nn.Sequential(
            nn.AdaptiveAvgPool2d(1),
            nn.Conv2d(input_channels, expand_size, kernel_size=1, bias=False),
            nn.BatchNorm2d(expand_size),
            nn.ReLU(inplace=True),
            nn.Conv2d(expand_size, input_channels, kernel_size=1, bias=False),
            nn.Hardsigmoid()
        )

    def forward(self, x):
        return x * self.se(x)

class MobileNetV3(nn.Module):
    """
    MobileNet V3 main class

    Args:
        num_classes: Number of classes
        mode: "large" or "small"
    """
    def __init__(self, num_classes=1000, mode=None, drop_rate=0.2):
        super().__init__()
        norm_layer = partial(nn.BatchNorm2d, eps=0.001, momentum=0.01)
        layers = []
        inverted_residual_setting, last_channel = _mobilenetv3_cfg[mode]
        # building first layer
        firstconv_output_channels = 16
        layers.append(ConvBNActivation(3, firstconv_output_channels, kernel_size=3, stride=2, norm_layer=norm_layer,
                                       activation_layer=nn.Hardswish))
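        # building inverted residual blocks: the block stack from _mobilenetv3_cfg is itself an nn.Sequential, appended here as a single module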
        layers.append(inverted_residual_setting)
        # building last several layers
        lastconv_input_channels = 96 if mode == "small" else 160
        lastconv_output_channels = 6 * lastconv_input_channels
        layers.append(ConvBNActivation(lastconv_input_channels, lastconv_output_channels, kernel_size=1,
                                       norm_layer=norm_layer, activation_layer=nn.Hardswish))

        self.features = nn.Sequential(*layers)
        self.avgpool = nn.AdaptiveAvgPool2d(1)
        self.classifier = nn.Sequential(
            nn.Linear(lastconv_output_channels, last_channel),
            nn.Hardswish(inplace=True),
            nn.Dropout(p=drop_rate, inplace=True),
            nn.Linear(last_channel, num_classes),
        )

        for m in self.modules():
            if isinstance(m, nn.Conv2d):
                nn.init.kaiming_normal_(m.weight, mode='fan_out')
                if m.bias is not None:
                    nn.init.zeros_(m.bias)
            elif isinstance(m, (nn.BatchNorm2d, nn.GroupNorm)):
                nn.init.ones_(m.weight)
                nn.init.zeros_(m.bias)
            elif isinstance(m, nn.Linear):
                nn.init.normal_(m.weight, 0, 0.01)
                nn.init.zeros_(m.bias)

    def forward(self, x):
        x = self.features(x)
        x = self.avgpool(x)
        x = torch.flatten(x, 1)
        x = self.classifier(x)

        return x

class InvertedResidualv3(nn.Module):
    '''expand + depthwise + pointwise'''
    def __init__(self, kernel_size, input_channels, expanded_channels, out_channels, activation, use_se, stride):
        super(InvertedResidualv3, self).__init__()
        self.stride = stride
        norm_layer = partial(nn.BatchNorm2d, eps=0.001, momentum=0.01)
        self.use_res_connect = stride == 1 and input_channels == out_channels
        activation_layer = nn.ReLU if activation == "RE" else nn.Hardswish
        layers = []
        if expanded_channels != input_channels:
            layers.append(ConvBNActivation(input_channels, expanded_channels, kernel_size=1,
                                           norm_layer=norm_layer, activation_layer=activation_layer))

        # depthwise
        layers.append(ConvBNActivation(expanded_channels, expanded_channels, kernel_size=kernel_size,
                                       stride=stride, groups=expanded_channels,
                                       norm_layer=norm_layer, activation_layer=activation_layer))
        if use_se:
            layers.append(SeModule(expanded_channels))

        layers.append(ConvBNActivation(expanded_channels, out_channels, kernel_size=1, norm_layer=norm_layer,
                                       activation_layer=nn.Identity))

        self.block = nn.Sequential(*layers)
        self.out_channels = out_channels

    def forward(self, x):
        result = self.block(x)
        if self.use_res_connect:
            result += x
        return result

_mobilenetv3_cfg = {
    "large": [nn.Sequential(
        # kernel, in_chs, exp_chs, out_chs, act, use_se, stride
        InvertedResidualv3(3, 16, 16, 16, "RE", False, 1),
        InvertedResidualv3(3, 16, 64, 24, "RE", False, 2),
        InvertedResidualv3(3, 24, 72, 24, "RE", False, 1),
        InvertedResidualv3(5, 24, 72, 40, "RE", True, 2),
        InvertedResidualv3(5, 40, 120, 40, "RE", True, 1),
        InvertedResidualv3(5, 40, 120, 40, "RE", True, 1),
        InvertedResidualv3(3, 40, 240, 80, "HS", False, 2),
        InvertedResidualv3(3, 80, 200, 80, "HS", False, 1),
        InvertedResidualv3(3, 80, 184, 80, "HS", False, 1),
        InvertedResidualv3(3, 80, 184, 80, "HS", False, 1),
        InvertedResidualv3(3, 80, 480, 112, "HS", True, 1),
        InvertedResidualv3(3, 112, 672, 112, "HS", True, 1),
        InvertedResidualv3(5, 112, 672, 160, "HS", True, 2),
        InvertedResidualv3(5, 160, 960, 160, "HS", True, 1),
        InvertedResidualv3(5, 160, 960, 160, "HS", True, 1),
    ),
        _make_divisible(1280, 8)],
    "small": [nn.Sequential(
        # kernel, in_chs, exp_chs, out_chs, act, use_se, stride
        InvertedResidualv3(3, 16, 16, 16, "RE", True, 2),
        InvertedResidualv3(3, 16, 72, 24, "RE", False, 2),
        InvertedResidualv3(3, 24, 88, 24, "RE", False, 1),
        InvertedResidualv3(5, 24, 96, 40, "HS", True, 2),
        InvertedResidualv3(5, 40, 240, 40, "HS", True, 1),
        InvertedResidualv3(5, 40, 240, 40, "HS", True, 1),
        InvertedResidualv3(5, 40, 120, 48, "HS", True, 1),
        InvertedResidualv3(5, 48, 144, 48, "HS", True, 1),
        InvertedResidualv3(5, 48, 288, 96, "HS", True, 2),
        InvertedResidualv3(5, 96, 576, 96, "HS", True, 1),
        InvertedResidualv3(5, 96, 576, 96, "HS", True, 1),
    ),
        _make_divisible(1024, 8)],
}

def MobileNetV3_Large(num_classes):
    """Large version of mobilenet_v3"""
    return MobileNetV3(num_classes=num_classes, mode="large")

def MobileNetV3_Small(num_classes):
    """small version of mobilenet_v3"""
    return MobileNetV3(num_classes=num_classes, mode="small")

if __name__=="__main__":
    import torchsummary
    device = 'cuda' if torch.cuda.is_available() else 'cpu'
    input = torch.ones(2, 3, 224, 224).to(device)
    net = MobileNetV3_Large(num_classes=4)
    net = net.to(device)
    out = net(input)
    print(out)
    print(out.shape)
    torchsummary.summary(net, input_size=(3, 224, 224))

Other notes

As usual, after implementing the models I tested their classification performance. One thing puzzled me: the validation loss of MobileNetV3 kept rising, eventually reaching roughly 10 to 20, which at first made me suspect a bug in my training script (which I had been modifying throughout). But when I re-ran the earlier networks, MobileNetV1 and V2 both behaved normally, and when I experimented with the official torchvision MobileNetV3 it showed the same problem, with validation losses in the teens. So this part still leaves me puzzled.

The issue is unresolved for now; I am leaving it here.

