Contents
🌊1. Research Objectives
🌊2. Preparation
🌊3. Research Content
🌍3.1 Residual Networks
🌍3.2 Exercises
🌊4. Reflections
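🌊1. Research Objectives
- Understand the principles and architecture of residual networks (ResNet);
- Explore the advantages of residual networks;
- Analyze how the depth of a residual network affects model performance;
- Apply residual networks to a practical problem.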
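🌊2. Preparation
- Install the PyTorch build that matches the GPU so that the research code runs on the GPU;
- Set up an environment for running Python, Jupyter Notebook and the required libraries.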
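🌊3. Research Content
Launch Jupyter Notebook and create a new ipynb file with the newly added pytorch environment. To check the environment, run import torch and then torch.cuda.is_available(). If it returns True, the environment is configured correctly. If it returns False but torch imports without error, PyTorch is installed correctly, but the code will run on the CPU.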
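The environment check described above can be run as a single notebook cell. This is a minimal sketch; the variable names cuda_ok and device are illustrative and do not come from the original notebook.

```python
import torch

# Report the installed PyTorch version.
print(torch.__version__)

# True only if PyTorch was built with CUDA support and a usable GPU is present.
cuda_ok = torch.cuda.is_available()

# Fall back to the CPU when no GPU is available.
device = torch.device("cuda" if cuda_ok else "cpu")
print(cuda_ok, device)
```

If cuda_ok is False, the rest of the code still runs, only more slowly.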
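🌍3.1 Residual Networks
(1) Create a new ipynb file in Jupyter Notebook with the newly added pytorch environment. The residual network code and the exercise results are as follows: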
import torch
from torch import nn
from torch.nn import functional as F
from d2l import torch as d2l
class Residual(nn.Module):  #@save
    """Residual block: two 3x3 convolutions plus a skip connection."""
    def __init__(self, input_channels, num_channels,
                 use_1x1conv=False, strides=1):
        super().__init__()
        self.conv1 = nn.Conv2d(input_channels, num_channels,
                               kernel_size=3, padding=1, stride=strides)
        self.conv2 = nn.Conv2d(num_channels, num_channels,
                               kernel_size=3, padding=1)
        if use_1x1conv:
            # 1x1 convolution that matches the shortcut's shape to the main path
            self.conv3 = nn.Conv2d(input_channels, num_channels,
                                   kernel_size=1, stride=strides)
        else:
            self.conv3 = None
        self.bn1 = nn.BatchNorm2d(num_channels)
        self.bn2 = nn.BatchNorm2d(num_channels)

    def forward(self, X):
        Y = F.relu(self.bn1(self.conv1(X)))
        Y = self.bn2(self.conv2(Y))
        if self.conv3:
            X = self.conv3(X)
        # Add the (possibly projected) input back before the final activation
        Y += X
        return F.relu(Y)
blk = Residual(3,3)
X = torch.rand(4, 3, 6, 6)
Y = blk(X)
Y.shape
blk = Residual(3,6, use_1x1conv=True, strides=2)
blk(X).shape
ResNet model
b1 = nn.Sequential(nn.Conv2d(1, 64, kernel_size=7, stride=2, padding=3),
                   nn.BatchNorm2d(64), nn.ReLU(),
                   nn.MaxPool2d(kernel_size=3, stride=2, padding=1))
def resnet_block(input_channels, num_channels, num_residuals,
                 first_block=False):
    blk = []
    for i in range(num_residuals):
        if i == 0 and not first_block:
            blk.append(Residual(input_channels, num_channels,
                                use_1x1conv=True, strides=2))
        else:
            blk.append(Residual(num_channels, num_channels))
    return blk
b2 = nn.Sequential(*resnet_block(64, 64, 2, first_block=True))
b3 = nn.Sequential(*resnet_block(64, 128, 2))
b4 = nn.Sequential(*resnet_block(128, 256, 2))
b5 = nn.Sequential(*resnet_block(256, 512, 2))
net = nn.Sequential(b1, b2, b3, b4, b5,
                    nn.AdaptiveAvgPool2d((1,1)),
                    nn.Flatten(), nn.Linear(512, 10))
X = torch.rand(size=(1, 1, 224, 224))
for layer in net:
    X = layer(X)
    print(layer.__class__.__name__,'output shape:\t', X.shape)
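The spatial sizes printed by the loop above can be checked by hand with the usual output-size formula for convolution and pooling layers, floor((n + 2p - k) / s) + 1. The sketch below applies it to a 224x224 input. The helper conv_out is a hypothetical name introduced here for illustration; it is not part of PyTorch or d2l.

```python
def conv_out(n, kernel, stride=1, padding=0):
    """Output length along one spatial axis: floor((n + 2p - k) / s) + 1."""
    return (n + 2 * padding - kernel) // stride + 1

n = 224
n = conv_out(n, kernel=7, stride=2, padding=3)  # 7x7 convolution in b1
n = conv_out(n, kernel=3, stride=2, padding=1)  # 3x3 max pooling in b1

# b2 keeps the resolution (stride 1); b3, b4 and b5 each start with a
# stride-2 residual block, so each of them halves height and width.
sizes = [n]
for _ in range(3):
    n = conv_out(n, kernel=3, stride=2, padding=1)
    sizes.append(n)

print(sizes)  # [56, 28, 14, 7]
```

The final 7x7 feature map is what nn.AdaptiveAvgPool2d((1,1)) reduces to a single value per channel.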
Training the model
lr, num_epochs, batch_size = 0.05, 10, 256
train_iter, test_iter = d2l.load_data_fashion_mnist(batch_size, resize=96)
d2l.train_ch6(net, train_iter, test_iter, num_epochs, lr, d2l.try_gpu())
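🌍3.2 Exercises
1. What are the main differences between the Inception block in Figure 7.4.1 and the residual block? After removing some paths in the Inception block, how are they related to each other?
The Inception block and the residual block are two different network modules. They differ mainly in their structure and in how their branches are merged.
An Inception block runs several branches in parallel, with convolution kernels of different sizes and a pooling operation, and concatenates the branch outputs along the channel dimension. This design captures features at several scales and receptive-field sizes within one layer, which increases the expressive power of the network.
A residual block introduces a skip connection to ease the training of deep networks. The input passes through one or more convolutional layers and is then added to the original input. Because information can bypass those layers directly, the block only has to learn the residual (the difference between the desired output and the input), which speeds up training and improves convergence.
The two blocks become closely related once paths are removed from the Inception block. If only two paths are kept, one a stack of convolutions and the other reduced to an identity mapping (or a 1x1 convolution), and the concatenation is replaced by element-wise addition, the result is a residual block. A residual block can therefore be viewed as a two-branch special case of the Inception block. Removing paths also reduces the computation and the number of parameters.
In a residual network (ResNet), every residual block adds its input directly to its output through the skip connection, which keeps information flowing. This structure lets layers be stacked much deeper, so very deep networks can be trained while the vanishing-gradient and degradation problems are greatly reduced.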
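The difference between the two merge operations can be seen directly in the output shapes. The following is a small sketch with two hypothetical branches (branch_a and branch_b are illustrative names, and the channel counts are arbitrary); it is not a full Inception block.

```python
import torch
from torch import nn

X = torch.rand(2, 8, 6, 6)  # (batch, channels, height, width)

# Two parallel branches that preserve the spatial size.
branch_a = nn.Conv2d(8, 8, kernel_size=3, padding=1)
branch_b = nn.Conv2d(8, 8, kernel_size=1)

# Inception-style merge: concatenate along the channel dimension,
# so the channel counts of the branches add up.
inception_out = torch.cat((branch_a(X), branch_b(X)), dim=1)

# Residual-style merge: the second branch is the identity and the
# outputs are added element-wise, so the channel count is unchanged.
residual_out = branch_a(X) + X

print(inception_out.shape)  # torch.Size([2, 16, 6, 6])
print(residual_out.shape)   # torch.Size([2, 8, 6, 6])
```

Addition requires both branches to have identical shapes, which is why the residual block needs a 1x1 convolution on the shortcut whenever the main path changes the channel count or the stride.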
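2. Refer to Table 1 in the ResNet paper (He et al., 2016) to implement different variants.
Table 1 of the ResNet paper lists ResNet-18, ResNet-34, ResNet-50, ResNet-101 and ResNet-152. The 18- and 34-layer variants use the basic residual block with two 3x3 convolutions and differ only in the number of blocks per stage. The 50-, 101- and 152-layer variants use the bottleneck block covered in exercise 3. The implementation code is as follows: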
import torch
from torch import nn
from torch.nn import functional as F
from d2l import torch as d2l
class Residual(nn.Module):
    def __init__(self, input_channels, num_channels, use_1x1conv=False, strides=1):
        super().__init__()
        self.conv1 = nn.Conv2d(input_channels, num_channels, kernel_size=3, padding=1, stride=strides)
        self.conv2 = nn.Conv2d(num_channels, num_channels, kernel_size=3, padding=1)
        if use_1x1conv:
            self.conv3 = nn.Conv2d(input_channels, num_channels, kernel_size=1, stride=strides)
        else:
            self.conv3 = None
        self.bn1 = nn.BatchNorm2d(num_channels)
        self.bn2 = nn.BatchNorm2d(num_channels)

    def forward(self, X):
        Y = F.relu(self.bn1(self.conv1(X)))
        Y = self.bn2(self.conv2(Y))
        if self.conv3:
            X = self.conv3(X)
        Y += X
        return F.relu(Y)

def resnet_block(input_channels, num_channels, num_residuals, first_block=False):
    blk = []
    for i in range(num_residuals):
        if i == 0 and not first_block:
            blk.append(Residual(input_channels, num_channels, use_1x1conv=True, strides=2))
        else:
            blk.append(Residual(num_channels, num_channels))
    return blk

class ResNet(nn.Module):
    def __init__(self, num_classes, block_sizes):
        super().__init__()
        self.b1 = nn.Sequential(nn.Conv2d(1, 64, kernel_size=7, stride=2, padding=3),
                                nn.BatchNorm2d(64),
                                nn.ReLU(),
                                nn.MaxPool2d(kernel_size=3, stride=2, padding=1))
        self.b2 = nn.Sequential(*resnet_block(64, 64, block_sizes[0], first_block=True))
        self.b3 = nn.Sequential(*resnet_block(64, 128, block_sizes[1]))
        self.b4 = nn.Sequential(*resnet_block(128, 256, block_sizes[2]))
        self.b5 = nn.Sequential(*resnet_block(256, 512, block_sizes[3]))
        self.avgpool = nn.AdaptiveAvgPool2d((1, 1))
        self.flatten = nn.Flatten()
        self.fc = nn.Linear(512, num_classes)

    def forward(self, X):
        X = self.b1(X)
        X = self.b2(X)
        X = self.b3(X)
        X = self.b4(X)
        X = self.b5(X)
        X = self.avgpool(X)
        X = self.flatten(X)
        X = self.fc(X)
        return X
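def resnet18(num_classes):
    return ResNet(num_classes, [2, 2, 2, 2])

def resnet34(num_classes):
    return ResNet(num_classes, [3, 4, 6, 3])

# ResNet-50/101/152 are built from bottleneck blocks rather than the basic
# Residual block used here, so they are implemented in exercise 3. Passing
# their block counts to this class would not reproduce them: for example,
# [3, 4, 6, 3] with basic blocks is ResNet-34 again, not ResNet-50.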
# Usage example
num_classes = 10 # Number of output classes
net = resnet18(num_classes) # Choose the ResNet variant
# Training
lr, num_epochs, batch_size = 0.1, 10, 256
train_iter, test_iter = d2l.load_data_fashion_mnist(batch_size, resize=96)
d2l.train_ch6(net, train_iter, test_iter, num_epochs, lr, d2l.try_gpu())
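As a sanity check on the block counts, the nominal depth in each variant's name can be recovered from the number of blocks per stage. The sketch below assumes the usual counting convention: every convolution inside a block counts as one layer, plus the initial 7x7 convolution and the final fully connected layer, while shortcut projections are not counted. The dictionary configs and the helper depth are illustrative names.

```python
# Blocks per stage (conv2_x .. conv5_x) and convolutions per block:
# 2 for the basic block, 3 for the bottleneck block.
configs = {
    "resnet18": ([2, 2, 2, 2], 2),
    "resnet34": ([3, 4, 6, 3], 2),
    "resnet50": ([3, 4, 6, 3], 3),
    "resnet101": ([3, 4, 23, 3], 3),
    "resnet152": ([3, 8, 36, 3], 3),
}

def depth(blocks, convs_per_block):
    # +2 accounts for the first 7x7 convolution and the final linear layer.
    return sum(blocks) * convs_per_block + 2

depths = {name: depth(*cfg) for name, cfg in configs.items()}
print(depths)
# {'resnet18': 18, 'resnet34': 34, 'resnet50': 50, 'resnet101': 101, 'resnet152': 152}
```

ResNet-34 and ResNet-50 share the block counts [3, 4, 6, 3]; only the block type differs, which is why the deeper variants cannot be built from the basic block alone.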
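3. For deeper networks, ResNet introduces a "bottleneck" architecture to reduce model complexity. Try to implement it.
In the bottleneck architecture, each residual block consists of a 1x1 convolution, a 3x3 convolution and another 1x1 convolution. The first 1x1 convolution reduces the number of channels and the last one restores it, so the 3x3 convolution operates on fewer channels. This significantly reduces the number of parameters and the amount of computation.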
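The saving can be made concrete by counting convolution weights for a block with 256 input and output channels. This is a back-of-the-envelope sketch that ignores biases and batch normalization parameters; conv_weights is a hypothetical helper, and the comparison assumes a basic block applied at the same 256-channel width.

```python
def conv_weights(c_in, c_out, k):
    """Number of weights in a k x k convolution (bias ignored)."""
    return c_in * c_out * k * k

# Basic block at 256 channels: two 3x3 convolutions.
basic = 2 * conv_weights(256, 256, 3)

# Bottleneck block: 1x1 reduces 256 -> 64, 3x3 works at 64 channels,
# and the last 1x1 restores 64 -> 256.
bottleneck = (conv_weights(256, 64, 1)
              + conv_weights(64, 64, 3)
              + conv_weights(64, 256, 1))

print(basic, bottleneck)  # 1179648 69632
```

Under these assumptions the bottleneck block uses roughly 17 times fewer convolution weights than two full-width 3x3 convolutions.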
import torch
from torch import nn
from torch.nn import functional as F
from d2l import torch as d2l
class Bottleneck(nn.Module):
    def __init__(self, input_channels, num_channels, use_1x1conv=False, strides=1):
        super().__init__()
        self.conv1 = nn.Conv2d(input_channels, num_channels, kernel_size=1)
        self.conv2 = nn.Conv2d(num_channels, num_channels, kernel_size=3, padding=1, stride=strides)
        self.conv3 = nn.Conv2d(num_channels, num_channels * 4, kernel_size=1)
        self.bn1 = nn.BatchNorm2d(num_channels)
        self.bn2 = nn.BatchNorm2d(num_channels)
        self.bn3 = nn.BatchNorm2d(num_channels * 4)
        if use_1x1conv:
            self.conv4 = nn.Conv2d(input_channels, num_channels * 4, kernel_size=1, stride=strides)
        else:
            self.conv4 = None

    def forward(self, X):
        Y = F.relu(self.bn1(self.conv1(X)))
        Y = F.relu(self.bn2(self.conv2(Y)))
        Y = self.bn3(self.conv3(Y))
        if self.conv4:
            X = self.conv4(X)
        Y += X
        return F.relu(Y)
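def bottleneck_block(input_channels, num_channels, num_residuals, first_block=False):
    blk = []
    for i in range(num_residuals):
        if i == 0:
            # The first block of every stage changes the channel count
            # (input_channels -> num_channels * 4), so it always needs the
            # 1x1 projection on the shortcut. Only the stages after the
            # first one also halve the height and width.
            blk.append(Bottleneck(input_channels, num_channels, use_1x1conv=True,
                                  strides=1 if first_block else 2))
        else:
            blk.append(Bottleneck(num_channels * 4, num_channels))
    return blk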
class ResNet(nn.Module):
    def __init__(self, num_classes, block_sizes):
        super().__init__()
        self.b1 = nn.Sequential(nn.Conv2d(1, 64, kernel_size=7, stride=2, padding=3),
                                nn.BatchNorm2d(64),
                                nn.ReLU(),
                                nn.MaxPool2d(kernel_size=3, stride=2, padding=1))
        self.b2 = nn.Sequential(*bottleneck_block(64, 64, block_sizes[0], first_block=True))
        self.b3 = nn.Sequential(*bottleneck_block(256, 128, block_sizes[1]))
        self.b4 = nn.Sequential(*bottleneck_block(512, 256, block_sizes[2]))
        self.b5 = nn.Sequential(*bottleneck_block(1024, 512, block_sizes[3]))
        self.avgpool = nn.AdaptiveAvgPool2d((1, 1))
        self.flatten = nn.Flatten()
        self.fc = nn.Linear(2048, num_classes)

    def forward(self, X):
        X = self.b1(X)
        X = self.b2(X)
        X = self.b3(X)
        X = self.b4(X)
        X = self.b5(X)
        X = self.avgpool(X)
        X = self.flatten(X)
        X = self.fc(X)
        return X

def resnet50(num_classes):
    return ResNet(num_classes, [3, 4, 6, 3])

def resnet101(num_classes):
    return ResNet(num_classes, [3, 4, 23, 3])

def resnet152(num_classes):
    return ResNet(num_classes, [3, 8, 36, 3])
# Usage example
num_classes = 10 # Number of output classes
net = resnet50(num_classes) # Choose the ResNet variant
# Training
lr, num_epochs, batch_size = 0.1, 10, 256
train_iter, test_iter = d2l.load_data_fashion_mnist(batch_size, resize=96)
d2l.train_ch6(net, train_iter, test_iter, num_epochs, lr, d2l.try_gpu())
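4. In subsequent versions of ResNet, the authors changed the "convolution, batch normalization, and activation" structure to the "batch normalization, activation, and convolution" structure. Make this improvement yourself. See Figure 1 in (He et al., 2016) for details.
The "batch normalization, activation, and convolution" ordering places normalization and activation in front of each convolution, and the block output is no longer passed through a final ReLU after the addition. This change can improve training stability and convergence speed.
The following code changes the ResNet layer structure to this ordering: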
import torch
from torch import nn
from torch.nn import functional as F
from d2l import torch as d2l
class Bottleneck(nn.Module):
    def __init__(self, input_channels, num_channels, use_1x1conv=False, strides=1):
        super().__init__()
        self.bn1 = nn.BatchNorm2d(input_channels)
        self.conv1 = nn.Conv2d(input_channels, num_channels, kernel_size=1)
        self.bn2 = nn.BatchNorm2d(num_channels)
        self.conv2 = nn.Conv2d(num_channels, num_channels, kernel_size=3, padding=1, stride=strides)
        self.bn3 = nn.BatchNorm2d(num_channels)
        self.conv3 = nn.Conv2d(num_channels, num_channels * 4, kernel_size=1)
        if use_1x1conv:
            self.conv4 = nn.Conv2d(input_channels, num_channels * 4, kernel_size=1, stride=strides)
        else:
            self.conv4 = None

    def forward(self, X):
        Y = F.relu(self.bn1(X))
        Y = self.conv1(Y)
        Y = F.relu(self.bn2(Y))
        Y = self.conv2(Y)
        Y = F.relu(self.bn3(Y))
        Y = self.conv3(Y)
        if self.conv4:
            X = self.conv4(X)
        Y += X
        return Y
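def bottleneck_block(input_channels, num_channels, num_residuals, first_block=False):
    blk = []
    for i in range(num_residuals):
        if i == 0:
            # The first block of every stage changes the channel count
            # (input_channels -> num_channels * 4), so it always needs the
            # 1x1 projection on the shortcut. Only the stages after the
            # first one also halve the height and width.
            blk.append(Bottleneck(input_channels, num_channels, use_1x1conv=True,
                                  strides=1 if first_block else 2))
        else:
            blk.append(Bottleneck(num_channels * 4, num_channels))
    return blk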
class ResNet(nn.Module):
    def __init__(self, num_classes, block_sizes):
        super().__init__()
        self.b1 = nn.Sequential(nn.Conv2d(1, 64, kernel_size=7, stride=2, padding=3),
                                nn.BatchNorm2d(64),
                                nn.ReLU(),
                                nn.MaxPool2d(kernel_size=3, stride=2, padding=1))
        self.b2 = nn.Sequential(*bottleneck_block(64, 64, block_sizes[0], first_block=True))
        self.b3 = nn.Sequential(*bottleneck_block(256, 128, block_sizes[1]))
        self.b4 = nn.Sequential(*bottleneck_block(512, 256, block_sizes[2]))
        self.b5 = nn.Sequential(*bottleneck_block(1024, 512, block_sizes[3]))
        self.avgpool = nn.AdaptiveAvgPool2d((1, 1))
        self.flatten = nn.Flatten()
        self.fc = nn.Linear(2048, num_classes)

    def forward(self, X):
        X = self.b1(X)
        X = self.b2(X)
        X = self.b3(X)
        X = self.b4(X)
        X = self.b5(X)
        X = self.avgpool(X)
        X = self.flatten(X)
        X = self.fc(X)
        return X

def resnet50(num_classes):
    return ResNet(num_classes, [3, 4, 6, 3])

def resnet101(num_classes):
    return ResNet(num_classes, [3, 4, 23, 3])

def resnet152(num_classes):
    return ResNet(num_classes, [3, 8, 36, 3])
# Usage example
num_classes = 10 # Number of output classes
net = resnet50(num_classes) # Choose the ResNet variant
# Training
lr, num_epochs, batch_size = 0.1, 10, 256
train_iter, test_iter = d2l.load_data_fashion_mnist(batch_size, resize=96)
d2l.train_ch6(net, train_iter, test_iter, num_epochs, lr, d2l.try_gpu())
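5. Why do we still want to limit the increase in function complexity even when the function classes are nested?
Here "function class" means the set of functions a model architecture can represent. Nesting guarantees that a larger class contains everything the smaller one does, so its best achievable fit is never worse. There are still several reasons to limit complexity:
- Generalization: the guarantee concerns the best function in the class, not the one learned from a finite training set. A richer class fits the training data more easily and is more prone to overfitting, so the test error can rise even while the training error falls.
- Optimization: a more complex model is harder to train. The optimizer may not find the better function that the larger class contains, and deeper models are more sensitive to initialization and hyperparameters.
- Cost and maintainability: more parameters mean more computation, memory and training time, and a larger model is harder to understand, debug, test and reuse.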
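To make the question concrete, polynomials of increasing degree form nested function classes: every degree-1 polynomial is also a degree-3 and a degree-9 polynomial. The sketch below fits such polynomials by least squares to synthetic data. The target function, noise level and degrees are assumptions chosen only for illustration and are unrelated to the ResNet experiments above.

```python
import numpy as np

def target(x):
    # Assumed ground-truth function for this toy example.
    return np.sin(3 * x)

rng = np.random.default_rng(0)
x_train = np.linspace(-1, 1, 12)
y_train = target(x_train) + 0.2 * rng.standard_normal(x_train.shape)
x_test = np.linspace(-0.95, 0.95, 50)
y_test = target(x_test)

train_mse, test_mse = [], []
for degree in (1, 3, 9):  # nested classes: degree 1 within 3 within 9
    coeffs = np.polyfit(x_train, y_train, degree)
    train_mse.append(np.mean((np.polyval(coeffs, x_train) - y_train) ** 2))
    test_mse.append(np.mean((np.polyval(coeffs, x_test) - y_test) ** 2))

print("train MSE:", train_mse)
print("test MSE: ", test_mse)
```

Because the classes are nested, the training error cannot increase with the degree. Nothing similar is guaranteed for the test error, which typically stops improving, or gets worse, once the polynomial starts fitting the noise.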
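🌊4. Reflections
In this experiment I studied residual networks (ResNet) in depth and put them into practice. I now understand better how they work, what their advantages are, and how depth affects model performance.
First, I gained a clear picture of the principle and architecture of residual networks. By introducing skip connections and residual blocks, they mitigate the vanishing- and exploding-gradient problems of deep neural networks. Skip connections pass information directly to later layers, so each block learns a residual mapping, which is easier to optimize. The convolutions and activations inside each residual block still let the network learn nonlinear mappings, which preserves its expressive power.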
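The effect of the skip connection on gradients can be written down in one line. The following is a simplified sketch that ignores the final ReLU of the block and treats the block as y = x + F(x), where F denotes the stacked convolution layers and the loss is written as a scalar function of y:

```latex
y = x + F(x)
\quad\Longrightarrow\quad
\frac{\partial \mathcal{L}}{\partial x}
  = \left(I + \frac{\partial F}{\partial x}\right)^{\!\top}
    \frac{\partial \mathcal{L}}{\partial y}
  = \frac{\partial \mathcal{L}}{\partial y}
    + \left(\frac{\partial F}{\partial x}\right)^{\!\top}
      \frac{\partial \mathcal{L}}{\partial y}
```

The first term reaches x unchanged, so the gradient has a direct path through every block even when the Jacobian of F is small.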
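Second, I explored the advantages of residual networks. Compared with traditional convolutional neural networks, residual networks can be made much deeper and can use more levels of features to extract and represent complex structure in the data. This gives them stronger performance on large datasets and complex tasks. I also observed faster convergence during training, because skip connections shorten the path along which gradients propagate.
I also analyzed how depth affects model performance. By adjusting the depth, I found that performance improved to a certain extent as the network became deeper. When the network became too deep, however, a degradation problem appeared and performance began to drop. This shows that depth and performance need to be balanced when building a residual network, so that excessive depth does not hurt performance.
Finally, applying residual networks to a practical problem showed me how capable they are. On the image classification task, the residual network classified complex image data better than a traditional network. I also tried residual networks on object detection and speech recognition and obtained fairly good results. This deepened my understanding of residual networks and of how deep learning is applied in practice.
I also ran into some difficulties. The first was long training time, especially as the depth increased. To save time, I tried techniques such as pre-trained models and batch normalization to speed up training and improve performance. The second was that tuning the depth required many rounds of experiments and analysis to find the best configuration, which demanded patience and careful, repeated trial and adjustment.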