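ShuffleNet Network Architecture
ShuffleNet is a highly computation-efficient convolutional neural network (CNN) architecture designed specifically for mobile devices. Its design centers on reducing computational complexity and improving efficiency. It introduces two new operations, pointwise group convolution and channel shuffle, which greatly lower the computational cost while maintaining accuracy.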
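Pointwise Group Convolution:
Pointwise group convolution is how ShuffleNet reduces the cost of 1×1 convolutions. It splits the channels of the input feature map into several groups and applies a 1×1 convolution to each group independently. With g groups, this divides the computation of the 1×1 convolution by g.
However, each output channel then sees only the input channels of its own group. Information cannot flow between groups, which can weaken the representational power and feature-extraction ability of the network.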
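Counting multiply-accumulates (MACs) makes the saving concrete. The sketch below is a plain-Python cost formula and is not part of the original code. The function name `pointwise_conv_macs` and the example sizes are illustrative assumptions.

```python
def pointwise_conv_macs(h, w, c_in, c_out, groups=1):
    """Multiply-accumulate count of a 1x1 convolution on an h x w feature map.

    With `groups` groups, each output channel is connected only to the
    c_in // groups input channels of its own group.
    """
    assert c_in % groups == 0 and c_out % groups == 0
    return h * w * (c_in // groups) * c_out


# Illustrative sizes: a 28x28 map reduced from 240 to 60 channels
dense = pointwise_conv_macs(28, 28, 240, 60)              # ordinary 1x1 conv
grouped = pointwise_conv_macs(28, 28, 240, 60, groups=3)  # 1x1 group conv, g = 3
print(dense, grouped, dense // grouped)  # 11289600 3763200 3
```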
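Channel Shuffle:
Pointwise group convolution leaves the groups unable to exchange information. To address this, ShuffleNet introduces a channel shuffle operation. It evenly redistributes the channels of every group across all groups, so each group in the next layer receives input from every group of the previous layer. This strengthens the feature-extraction ability of the model.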
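The shuffle can be illustrated on channel indices instead of tensors. The plain-Python sketch below follows the usual description: view the channels as a (groups, n) grid, transpose it, and flatten it again. The helper names `channel_shuffle` and `source_groups` are hypothetical and are not the MindSpore method shown later. `source_groups` checks which original groups feed each group of the next layer.

```python
def channel_shuffle(channels, groups):
    """Shuffle a flat list of channel ids: view it as a (groups, n) grid,
    transpose it to (n, groups), and flatten it again."""
    n = len(channels) // groups
    assert n * groups == len(channels), "channel count must be divisible by groups"
    grid = [channels[g * n:(g + 1) * n] for g in range(groups)]  # row g = group g
    return [grid[g][i] for i in range(n) for g in range(groups)]  # read column by column


def source_groups(shuffled, groups):
    """For each group of the next layer, list the original groups its channels came from."""
    n = len(shuffled) // groups
    return [sorted({c // n for c in shuffled[k * n:(k + 1) * n]}) for k in range(groups)]


# 9 channels in 3 groups: [0, 1, 2], [3, 4, 5], [6, 7, 8]
shuffled = channel_shuffle(list(range(9)), 3)
print(shuffled)                    # [0, 3, 6, 1, 4, 7, 2, 5, 8]
print(source_groups(shuffled, 3))  # [[0, 1, 2], [0, 1, 2], [0, 1, 2]]
```

After the shuffle, every group of the next grouped convolution holds exactly one channel from each original group.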
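ShuffleNet modifies the ResNet Bottleneck structure, going from (a) to (b) and (c) in the figure:
- The first and last 1×1 convolutions (channel reduction and expansion) are replaced with pointwise group convolutions;
- To let information flow between channels of different groups, a channel shuffle is applied after the channel-reducing convolution;
- In the downsampling unit, the 3×3 depthwise convolution uses stride 2, which halves the height and width. The shortcut therefore uses a 3×3 average pooling with stride 2, and the element-wise addition is replaced by channel concatenation.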
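The ShuffleNet paper quantifies the saving for one bottleneck unit. It assumes an input of size c × h × w and m bottleneck channels, and counts multiply-adds. The costs are hw(2cm + 9m²) for ResNet, hw(2cm + 9m²/g) for ResNeXt and hw(2cm/g + 9m) for ShuffleNet. The sketch below transcribes those formulas into plain Python with illustrative sizes. BatchNorm, ReLU and the shortcut are ignored, and integer division assumes the terms divide evenly by g.

```python
def resnet_unit_flops(h, w, c, m):
    # dense 1x1 (c -> m), dense 3x3 (m -> m), dense 1x1 (m -> c)
    return h * w * (2 * c * m + 9 * m * m)


def resnext_unit_flops(h, w, c, m, g):
    # dense 1x1 convs, 3x3 group conv with g groups
    return h * w * (2 * c * m + 9 * m * m // g)


def shufflenet_unit_flops(h, w, c, m, g):
    # 1x1 group convs with g groups, 3x3 depthwise conv
    return h * w * (2 * c * m // g + 9 * m)


h = w = 28
c, m, g = 240, 60, 3  # illustrative sizes
print(resnet_unit_flops(h, w, c, m))         # 47980800
print(resnext_unit_flops(h, w, c, m, g))     # 31046400
print(shufflenet_unit_flops(h, w, c, m, g))  # 7949760
```

For the same budget, the lower per-unit cost lets ShuffleNet use wider feature maps.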
ShuffleV1Block
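```python
from mindspore import nn, ops


class GroupConv(nn.Cell):
    """Minimal stand-in for the GroupConv helper used below (assumption: the original
    defines it elsewhere). It relies on nn.Conv2d's built-in `group` argument."""
    def __init__(self, in_channels, out_channels, kernel_size, stride,
                 pad_mode="pad", pad=0, groups=1, has_bias=False):
        super(GroupConv, self).__init__()
        self.conv = nn.Conv2d(in_channels, out_channels, kernel_size=kernel_size, stride=stride,
                              pad_mode=pad_mode, padding=pad, group=groups,
                              weight_init='xavier_uniform', has_bias=has_bias)

    def construct(self, x):
        return self.conv(x)


class ShuffleV1Block(nn.Cell):
    def __init__(self, inp, oup, group, first_group, mid_channels, ksize, stride):
        super(ShuffleV1Block, self).__init__()
        self.stride = stride
        pad = ksize // 2
        self.group = group
        # stride 2: the pooled shortcut (inp channels) is concatenated with the main
        # branch, so the main branch only has to produce oup - inp channels
        if stride == 2:
            outputs = oup - inp
        else:
            outputs = oup
        self.relu = nn.ReLU()
        # 1x1 group conv (channel reduction); the very first unit of Stage2
        # uses an ordinary 1x1 conv because its input has few channels
        branch_main_1 = [
            GroupConv(in_channels=inp, out_channels=mid_channels,
                      kernel_size=1, stride=1, pad_mode="pad", pad=0,
                      groups=1 if first_group else group),
            nn.BatchNorm2d(mid_channels),
            nn.ReLU(),
        ]
        # 3x3 depthwise conv (group = mid_channels), then 1x1 group conv (channel expansion)
        branch_main_2 = [
            nn.Conv2d(mid_channels, mid_channels, kernel_size=ksize, stride=stride,
                      pad_mode='pad', padding=pad, group=mid_channels,
                      weight_init='xavier_uniform', has_bias=False),
            nn.BatchNorm2d(mid_channels),
            GroupConv(in_channels=mid_channels, out_channels=outputs,
                      kernel_size=1, stride=1, pad_mode="pad", pad=0,
                      groups=group),
            nn.BatchNorm2d(outputs),
        ]
        self.branch_main_1 = nn.SequentialCell(branch_main_1)
        self.branch_main_2 = nn.SequentialCell(branch_main_2)
        if stride == 2:
            # shortcut of the downsampling unit: 3x3 average pooling, stride 2
            self.branch_proj = nn.AvgPool2d(kernel_size=3, stride=2, pad_mode='same')

    def construct(self, old_x):
        left = old_x
        right = old_x
        out = old_x
        right = self.branch_main_1(right)
        if self.group > 1:
            right = self.channel_shuffle(right)
        right = self.branch_main_2(right)
        if self.stride == 1:
            out = self.relu(left + right)          # element-wise add
        elif self.stride == 2:
            left = self.branch_proj(left)
            out = ops.cat((left, right), 1)        # concatenate along channels
            out = self.relu(out)
        return out

    def channel_shuffle(self, x):
        batchsize, num_channels, height, width = ops.shape(x)
        group_channels = num_channels // self.group
        # view channels as (group_channels, group), swap the two axes, flatten back
        x = ops.reshape(x, (batchsize, group_channels, self.group, height, width))
        x = ops.transpose(x, (0, 2, 1, 3, 4))
        x = ops.reshape(x, (batchsize, num_channels, height, width))
        return x
```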
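`channel_shuffle` above reshapes the channel axis into (group_channels, group), that is (n, g), before transposing. The usual description reshapes it into (g, n). The two orders produce mutually inverse permutations. Both spread every input group over all output groups whenever n ≥ g, so either works. The list-based helpers below are hypothetical and compare the two on channel indices in plain Python.

```python
def shuffle_paper(ch, g):
    """Usual description: view as (g, n), transpose to (n, g), flatten."""
    n = len(ch) // g
    return [ch[a * n + b] for b in range(n) for a in range(g)]


def shuffle_as_in_block(ch, g):
    """Mirrors ShuffleV1Block.channel_shuffle: view as (n, g), transpose to (g, n), flatten."""
    n = len(ch) // g
    return [ch[i * g + j] for j in range(g) for i in range(n)]


x = list(range(12))  # 12 channels, g = 3, n = 4
paper_order = shuffle_paper(x, 3)
block_order = shuffle_as_in_block(x, 3)
print(paper_order)  # [0, 4, 8, 1, 5, 9, 2, 6, 10, 3, 7, 11]
print(block_order)  # [0, 3, 6, 9, 1, 4, 7, 10, 2, 5, 8, 11]
# each permutation undoes the other
print(shuffle_as_in_block(paper_order, 3) == x, shuffle_paper(block_order, 3) == x)  # True True
```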
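The ShuffleNet unit is a modified residual block. Its structure is:
- 1×1 group convolution: the input feature map first passes through a 1×1 group convolution that reduces the number of channels.
- Channel shuffle: the output of the group convolution is then channel-shuffled so that information flows between groups.
- 3×3 depthwise convolution: a 3×3 depthwise convolution then extracts spatial features. It runs on the reduced (bottleneck) channels, which keeps its cost low.
- 1×1 group convolution: a second 1×1 group convolution restores the channel count. In a stride-1 unit it returns to the input channel count. In a stride-2 unit it produces the target count minus the input channels, so that the concatenated output reaches the target.
- Shortcut: the unit also has a shortcut connection that preserves the original information and helps gradients propagate. In a stride-1 unit the input is added to the output. In the stride-2 (downsampling) unit, the input is average-pooled (3×3, stride 2) and concatenated with the output.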
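The shape bookkeeping of the two unit types can be traced with a small plain-Python helper. The helper `shufflenet_unit_shape` is hypothetical and not from the original code. It assumes the same choices as `ShuffleV1Block`: for stride 2 the main branch emits c_out - c_in channels, and both branches halve the spatial size, rounding up.

```python
def shufflenet_unit_shape(c_in, h, w, c_out, stride):
    """(channels, height, width) after one ShuffleNet unit, plus the
    number of channels the main branch must produce."""
    if stride == 1:
        # identity shortcut + element-wise add: shapes must match
        assert c_out == c_in, "stride-1 units keep the channel count"
        return (c_in, h, w), c_in
    # stride 2: 3x3 depthwise conv (padding 1) and 3x3 'same' avg-pool both give ceil(h/2)
    h2, w2 = (h + 1) // 2, (w + 1) // 2
    main = c_out - c_in                   # main-branch output channels
    return (c_in + main, h2, w2), main    # concat(shortcut, main branch)


print(shufflenet_unit_shape(24, 56, 56, 240, stride=2))   # ((240, 28, 28), 216)
print(shufflenet_unit_shape(240, 28, 28, 240, stride=1))  # ((240, 28, 28), 240)
```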
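The overall ShuffleNet architecture is shown in the figure above. As an example, take a 224×224 input image, 3 groups (g = 3) and the 1.0x width. The image first passes through a convolution layer with 24 filters of size 3×3 and stride 2, which outputs a 112×112 feature map with 24 channels. A 3×3 max-pooling layer with stride 2 then reduces it to 56×56, with the channel count unchanged. Three stages of ShuffleNet units follow (Stage2, Stage3, Stage4), containing 4, 8 and 4 units respectively.

The first unit of each stage is the downsampling unit (figure (c)). It halves the height and width and doubles the channel count. The exception is Stage2, whose downsampling unit raises the channels from 24 to 240. Global average pooling then gives a 1×1×960 output. A fully connected layer and softmax turn this into class probabilities. In the code below the model returns the fully connected layer's logits, so the softmax is left to the loss function or to inference code.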
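The walkthrough above can be reproduced as a plain-Python shape trace. The sketch is hypothetical: it does not use MindSpore, and it assumes that every stride-2 layer halves the spatial size (rounding up). The widths `[-1, 24, 240, 480, 960]` are the g = 3, 1.0x entry of `stage_out_channels` in the code below.

```python
def trace_shufflenet_v1(stage_out_channels, stage_repeats=(4, 8, 4), size=224):
    """Return (layer, channels, spatial size) after each part of the network."""
    trace = []
    c, h = stage_out_channels[1], (size + 1) // 2   # 3x3 conv, stride 2
    trace.append(("conv1", c, h))
    h = (h + 1) // 2                                # 3x3 max pool, stride 2
    trace.append(("maxpool", c, h))
    for idx in range(len(stage_repeats)):
        # only the first (stride-2) unit of a stage changes the shape;
        # the remaining stride-1 repeats keep it unchanged
        h = (h + 1) // 2
        c = stage_out_channels[idx + 2]
        trace.append((f"stage{idx + 2}", c, h))
    trace.append(("globalpool", c, 1))
    return trace


for row in trace_shufflenet_v1([-1, 24, 240, 480, 960]):
    print(row)
```

The final 7×7 map is why the code can implement global pooling as `nn.AvgPool2d(7)` for 224×224 inputs.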
ShuffleNetV1
```python
from mindspore import nn, ops

# ShuffleV1Block is the unit defined above.


class ShuffleNetV1(nn.Cell):
    def __init__(self, n_class=1000, model_size='2.0x', group=3):
        super(ShuffleNetV1, self).__init__()
        print('model size is ', model_size)
        self.stage_repeats = [4, 8, 4]  # number of units in Stage2, Stage3, Stage4
        self.model_size = model_size
        # index 1: stem width; indices 2-4: output channels of Stage2-Stage4
        if group == 3:
            if model_size == '0.5x':
                self.stage_out_channels = [-1, 12, 120, 240, 480]
            elif model_size == '1.0x':
                self.stage_out_channels = [-1, 24, 240, 480, 960]
            elif model_size == '1.5x':
                self.stage_out_channels = [-1, 24, 360, 720, 1440]
            elif model_size == '2.0x':
                self.stage_out_channels = [-1, 48, 480, 960, 1920]
            else:
                raise NotImplementedError
        elif group == 8:
            if model_size == '0.5x':
                self.stage_out_channels = [-1, 16, 192, 384, 768]
            elif model_size == '1.0x':
                self.stage_out_channels = [-1, 24, 384, 768, 1536]
            elif model_size == '1.5x':
                self.stage_out_channels = [-1, 24, 576, 1152, 2304]
            elif model_size == '2.0x':
                self.stage_out_channels = [-1, 48, 768, 1536, 3072]
            else:
                raise NotImplementedError
        else:
            raise NotImplementedError("only group=3 and group=8 are configured")
        input_channel = self.stage_out_channels[1]
        # stem: 3x3 conv, stride 2 (224 -> 112)
        self.first_conv = nn.SequentialCell(
            nn.Conv2d(3, input_channel, 3, 2, 'pad', 1, weight_init='xavier_uniform', has_bias=False),
            nn.BatchNorm2d(input_channel),
            nn.ReLU(),
        )
        # 3x3 max pool, stride 2 (112 -> 56)
        self.maxpool = nn.MaxPool2d(kernel_size=3, stride=2, pad_mode='same')
        features = []
        for idxstage in range(len(self.stage_repeats)):
            numrepeat = self.stage_repeats[idxstage]
            output_channel = self.stage_out_channels[idxstage + 2]
            for i in range(numrepeat):
                stride = 2 if i == 0 else 1                 # first unit of a stage downsamples
                first_group = idxstage == 0 and i == 0      # first unit of Stage2 only
                features.append(ShuffleV1Block(input_channel, output_channel,
                                               group=group, first_group=first_group,
                                               mid_channels=output_channel // 4, ksize=3, stride=stride))
                input_channel = output_channel
        self.features = nn.SequentialCell(features)
        # 7x7 average pooling acts as global pooling for 224x224 inputs
        self.globalpool = nn.AvgPool2d(7)
        self.classifier = nn.Dense(self.stage_out_channels[-1], n_class)

    def construct(self, x):
        x = self.first_conv(x)
        x = self.maxpool(x)
        x = self.features(x)
        x = self.globalpool(x)
        x = ops.reshape(x, (-1, self.stage_out_channels[-1]))
        x = self.classifier(x)  # logits; no softmax here
        return x
```
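Create the network. `model_size="2.0x"` selects the width, and hence the complexity, of the model. `n_class=10` sets the number of output classes.

```python
net = ShuffleNetV1(model_size="2.0x", n_class=10)
```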
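`model_size` scales the stage widths uniformly. For a given group count, the Stage2-Stage4 widths of "0.5x", "1.5x" and "2.0x" are 0.5, 1.5 and 2 times the "1.0x" widths. The stem width does not scale as regularly. Every configuration must also satisfy the divisibility constraints that the grouped 1×1 convolutions and the channel shuffle rely on. The plain-Python check below is a hypothetical helper. It copies the widths from `ShuffleNetV1` without the `-1` placeholder and does not need MindSpore.

```python
STAGE_OUT_CHANNELS = {
    3: {"0.5x": [12, 120, 240, 480], "1.0x": [24, 240, 480, 960],
        "1.5x": [24, 360, 720, 1440], "2.0x": [48, 480, 960, 1920]},
    8: {"0.5x": [16, 192, 384, 768], "1.0x": [24, 384, 768, 1536],
        "1.5x": [24, 576, 1152, 2304], "2.0x": [48, 768, 1536, 3072]},
}


def check_config(group, model_size):
    """Check the divisibility constraints ShuffleV1Block relies on for one configuration."""
    stem, *stages = STAGE_OUT_CHANNELS[group][model_size]
    inp = stem
    for oup in stages:
        mid = oup // 4
        assert mid % group == 0, "bottleneck channels must split evenly into groups"
        assert (oup - inp) % group == 0, "stride-2 main-branch output must split evenly"
        assert oup % group == 0, "stride-1 units feed oup channels to grouped 1x1 convs"
        inp = oup
    return True


for g, sizes in STAGE_OUT_CHANNELS.items():
    base = sizes["1.0x"][1:]
    for size, widths in sizes.items():
        check_config(g, size)
        # width ratio of each stage relative to 1.0x
        print(g, size, [w / b for w, b in zip(widths[1:], base)])
```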