文章目录
深度学习Week18——学习残差网络和ResNet-50算法
一、前言
二、我的环境
三、前期工作
1、配置环境
2、导入数据
2.1 加载数据
2.2 配置数据集
2.3 数据可视化
2.4 再次检查数据
四、构建ResNet-50网络模型
五、编译模型
六、训练模型
七、模型评估
八、指定图片预测
九、其他
一、前言
- 🍨 本文为🔗365天深度学习训练营 中的学习记录博客
- 🍖 原作者:K同学啊 | 接辅导、项目定制
本周学习主要了解到何恺明大佬提出的深度残差网络
在之前的17周练习中,我们常常碰见:当前广泛使用的优化器,无论是SGD,还是RMSProp,或是Adam都无法在网络深度变大后达到理论上最优的收敛结果。
准确率下降问题不是网络结构本身的问题,而是现有的训练方式不够理想造成的。
二、我的环境
- 电脑系统:Windows 10
- 语言环境:Python 3.8.0
- 编译器:Pycharm2023.2.3
深度学习环境:TensorFlow
显卡及显存:RTX 3060 8G
三、前期工作
1、配置环境
import tensorflow as tf
gpus = tf.config.list_physical_devices("GPU")
if gpus:
tf.config.experimental.set_memory_growth(gpus[0], True) #设置GPU显存用量按需使用
tf.config.set_visible_devices([gpus[0]],"GPU")
import matplotlib.pyplot as plt
# 支持中文
plt.rcParams['font.sans-serif'] = ['SimHei'] # 用来正常显示中文标签
plt.rcParams['axes.unicode_minus'] = False # 用来正常显示负号
import os,PIL,pathlib
import numpy as np
from tensorflow import keras
from tensorflow.keras import layers,models
2、 导入数据
导入所有数据,依次分别为训练集图片(train_images)、训练集标签(train_labels)、测试集图片(test_images)、测试集标签(test_labels),数据集来源于K同学啊
2.1 加载数据
data_dir = "/home/mw/input/J1datas9827/bird_photos/bird_photos"
data_dir = pathlib.Path(data_dir)
image_count = len(list(data_dir.glob('*/*')))
print("图片总数是:", image_count)
图片总数是: 565
2.2 配置数据集
batch_size = 8
img_height = 224
img_width = 224
train_ds = tf.keras.preprocessing.image_dataset_from_directory(
data_dir,
validation_split=0.2,
subset="training",
seed=123,
image_size=(img_height, img_width),
batch_size=batch_size)
使用
image_dataset_from_directory
方法将磁盘中的数据加载到tf.data.Dataset
中
tf.keras.preprocessing.image_dataset_from_directory()
会将文件夹中的数据加载到tf.data.Dataset中,且加载的同时会打乱数据。
- class_names
- validation_split: 0和1之间的可选浮点数,可保留一部分数据用于验证。
- subset: training或validation之一。仅在设置validation_split时使用。
- seed: 用于shuffle和转换的可选随机种子。
- batch_size: 数据批次的大小。默认值:32
- image_size: 从磁盘读取数据后将其重新调整大小。默认:(256,256)。由于管道处理的图像批次必须具有相同的大小,因此该参数必须提供。
输出:
Found 565 files belonging to 4 classes.
Using 452 files for training.
由于原始的数据集里不包含测试集,所以我们需要自己创建一个
# 测试集
val_ds = tf.keras.preprocessing.image_dataset_from_directory(
data_dir,
validation_split=0.2,
subset="validation",
seed=123,
image_size=(img_height, img_width),
batch_size=batch_size)
Found 565 files belonging to 4 classes.
Using 113 files for validation.
我们可以通过class_names输出数据集的标签。标签将按字母顺序对应于目录名称。
# 用class_names输出数据集的标签
class_names = train_ds.class_names
print(class_names)
[‘Bananaquit’, ‘Black Skimmer’, ‘Black Throated Bushtiti’, ‘Cockatoo’]
2.3 数据可视化
plt.figure(figsize=(10, 5)) # 图形的宽为10高为5
plt.suptitle("源项目:K同学啊;学习:hanon")
for images, labels in train_ds.take(1):
for i in range(8):
ax = plt.subplot(2, 4, i + 1)
plt.imshow(images[i].numpy().astype("uint8"))
plt.title(class_names[labels[i]])
plt.axis("off")
plt.imshow(images[0].numpy().astype("uint8"))
2.4 再次检查数据
for image_batch, labels_batch in train_ds:
print(image_batch.shape)
print(labels_batch.shape)
break
(8, 224, 224, 3)
(8,)
四 、构建ResNet-50网络模型
AUTOTUNE = tf.data.AUTOTUNE
train_ds = train_ds.cache().shuffle(1000).prefetch(buffer_size=AUTOTUNE)
val_ds = val_ds.cache().prefetch(buffer_size=AUTOTUNE)
from keras import layers
from keras.layers import Input,Activation,BatchNormalization,Flatten
from keras.layers import Dense,Conv2D,MaxPooling2D,ZeroPadding2D,AveragePooling2D
from keras.models import Model
def identity_block(input_tensor, kernel_size, filters, stage, block):
filters1, filters2, filters3 = filters
name_base = str(stage) + block + '_identity_block_'
x = Conv2D(filters1, (1, 1), name=name_base + 'conv1')(input_tensor)
x = BatchNormalization(name=name_base + 'bn1')(x)
x = Activation('relu', name=name_base + 'relu1')(x)
x = Conv2D(filters2, kernel_size,padding='same', name=name_base + 'conv2')(x)
x = BatchNormalization(name=name_base + 'bn2')(x)
x = Activation('relu', name=name_base + 'relu2')(x)
x = Conv2D(filters3, (1, 1), name=name_base + 'conv3')(x)
x = BatchNormalization(name=name_base + 'bn3')(x)
x = layers.add([x, input_tensor] ,name=name_base + 'add')
x = Activation('relu', name=name_base + 'relu4')(x)
return x
# 在残差网络中,广泛地使用了BN层;但是没有使用MaxPooling以便减小特征图尺寸,
# 作为替代,在每个模块的第一层,都使用了strides = (2, 2)的方式进行特征图尺寸缩减,
# 与使用MaxPooling相比,毫无疑问是减少了卷积的次数,输入图像分辨率较大时比较适合
# 在残差网络的最后一级,先利用layer.add()实现H(x) = x + F(x)
def conv_block(input_tensor, kernel_size, filters, stage, block, strides=(2, 2)):
filters1, filters2, filters3 = filters
res_name_base = str(stage) + block + '_conv_block_res_'
name_base = str(stage) + block + '_conv_block_'
x = Conv2D(filters1, (1, 1), strides=strides, name=name_base + 'conv1')(input_tensor)
x = BatchNormalization(name=name_base + 'bn1')(x)
x = Activation('relu', name=name_base + 'relu1')(x)
x = Conv2D(filters2, kernel_size, padding='same', name=name_base + 'conv2')(x)
x = BatchNormalization(name=name_base + 'bn2')(x)
x = Activation('relu', name=name_base + 'relu2')(x)
x = Conv2D(filters3, (1, 1), name=name_base + 'conv3')(x)
x = BatchNormalization(name=name_base + 'bn3')(x)
shortcut = Conv2D(filters3, (1, 1), strides=strides, name=res_name_base + 'conv')(input_tensor)
shortcut = BatchNormalization(name=res_name_base + 'bn')(shortcut)
x = layers.add([x, shortcut], name=name_base+'add')
x = Activation('relu', name=name_base+'relu4')(x)
return x
def ResNet50(input_shape=[224,224,3],classes=1000):
img_input = Input(shape=input_shape)
x = ZeroPadding2D((3, 3))(img_input)
x = Conv2D(64, (7, 7), strides=(2, 2), name='conv1')(x)
x = BatchNormalization(name='bn_conv1')(x)
x = Activation('relu')(x)
x = MaxPooling2D((3, 3), strides=(2, 2))(x)
x = conv_block(x, 3, [64, 64, 256], stage=2, block='a', strides=(1, 1))
x = identity_block(x, 3, [64, 64, 256], stage=2, block='b')
x = identity_block(x, 3, [64, 64, 256], stage=2, block='c')
x = conv_block(x, 3, [128, 128, 512], stage=3, block='a')
x = identity_block(x, 3, [128, 128, 512], stage=3, block='b')
x = identity_block(x, 3, [128, 128, 512], stage=3, block='c')
x = identity_block(x, 3, [128, 128, 512], stage=3, block='d')
x = conv_block(x, 3, [256, 256, 1024], stage=4, block='a')
x = identity_block(x, 3, [256, 256, 1024], stage=4, block='b')
x = identity_block(x, 3, [256, 256, 1024], stage=4, block='c')
x = identity_block(x, 3, [256, 256, 1024], stage=4, block='d')
x = identity_block(x, 3, [256, 256, 1024], stage=4, block='e')
x = identity_block(x, 3, [256, 256, 1024], stage=4, block='f')
x = conv_block(x, 3, [512, 512, 2048], stage=5, block='a')
x = identity_block(x, 3, [512, 512, 2048], stage=5, block='b')
x = identity_block(x, 3, [512, 512, 2048], stage=5, block='c')
x = AveragePooling2D((7, 7), name='avg_pool')(x)
x = Flatten()(x)
x = Dense(classes, activation='softmax', name='fc1000')(x)
model = Model(img_input, x, name='resnet50')
# 加载预训练模型,这里输入你的预训练模型存放的地址
model.load_weights("/home/mw/input/J1datas9827/resnet50_weights_tf_dim_ordering_tf_kernels.h5")
return model
model = ResNet50()
model.summary()
______________________________________________________________________________________________
Layer (type) Output Shape Param # Connected to
===========================================================================================
input_2 (InputLayer) [(None, 224, 224, 3) 0
______________________________________________________________________________________________
zero_padding2d_1 (ZeroPadding2D (None, 230, 230, 3) 0 input_2[0][0]
______________________________________________________________________________________________
conv1 (Conv2D) (None, 112, 112, 64) 9472 zero_padding2d_1[0][0]
______________________________________________________________________________________________
bn_conv1 (BatchNormalization) (None, 112, 112, 64) 256 conv1[0][0]
______________________________________________________________________________________________
activation_1 (Activation) (None, 112, 112, 64) 0 bn_conv1[0][0]
______________________________________________________________________________________________
max_pooling2d_1 (MaxPooling2D) (None, 55, 55, 64) 0 activation_1[0][0]
______________________________________________________________________________________________
2a_conv_block_conv1 (Conv2D) (None, 55, 55, 64) 4160 max_pooling2d_1[0][0]
______________________________________________________________________________________________
2a_conv_block_bn1 (BatchNormali (None, 55, 55, 64) 256 2a_conv_block_conv1[0][0]
______________________________________________________________________________________________
省略…………………………
______________________________________________________________________________________________
5a_conv_block_conv3 (Conv2D) (None, 7, 7, 2048) 1050624 5a_conv_block_relu2[0][0]
______________________________________________________________________________________________
5a_conv_block_res_conv (Conv2D) (None, 7, 7, 2048) 2099200 4f_identity_block_relu4[0][0]
______________________________________________________________________________________________
5a_conv_block_bn3 (BatchNormali (None, 7, 7, 2048) 8192 5a_conv_block_conv3[0][0]
______________________________________________________________________________________________
5a_conv_block_res_bn (BatchNorm (None, 7, 7, 2048) 8192 5a_conv_block_res_conv[0][0]
______________________________________________________________________________________________
5a_conv_block_add (Add) (None, 7, 7, 2048) 0 5a_conv_block_bn3[0][0] 5a_conv_block_res_bn[0][0]
______________________________________________________________________________________________5a_conv_block_relu4 (Activation (None, 7, 7, 2048) 0 5a_conv_block_add[0][0]
______________________________________________________________________________________________
5b_identity_block_conv1 (Conv2D (None, 7, 7, 512) 1049088 5a_conv_block_relu4[0][0]
______________________________________________________________________________________________
5b_identity_block_bn1 (BatchNor (None, 7, 7, 512) 2048 5b_identity_block_conv1[0][0]
______________________________________________________________________________________________
5b_identity_block_relu1 (Activa (None, 7, 7, 512) 0 5b_identity_block_bn1[0][0]
______________________________________________________________________________________________
5b_identity_block_conv2 (Conv2D (None, 7, 7, 512) 2359808 5b_identity_block_relu1[0][0]
______________________________________________________________________________________________
5b_identity_block_bn2 (BatchNor (None, 7, 7, 512) 2048 5b_identity_block_conv2[0][0]
______________________________________________________________________________________________
5b_identity_block_relu2 (Activa (None, 7, 7, 512) 0 5b_identity_block_bn2[0][0]
______________________________________________________________________________________________
5b_identity_block_conv3 (Conv2D (None, 7, 7, 2048) 1050624 5b_identity_block_relu2[0][0]
______________________________________________________________________________________________
5b_identity_block_bn3 (BatchNor (None, 7, 7, 2048) 8192 5b_identity_block_conv3[0][0]
______________________________________________________________________________________________
5b_identity_block_add (Add) (None, 7, 7, 2048) 0 5b_identity_block_bn3[0][0] 5a_conv_block_relu4[0][0]
______________________________________________________________________________________________
5b_identity_block_relu4 (Activa (None, 7, 7, 2048) 0 5b_identity_block_add[0][0]
______________________________________________________________________________________________
5c_identity_block_conv1 (Conv2D (None, 7, 7, 512) 1049088 5b_identity_block_relu4[0][0]
______________________________________________________________________________________________
5c_identity_block_bn1 (BatchNor (None, 7, 7, 512) 2048 5c_identity_block_conv1[0][0]
______________________________________________________________________________________________
5c_identity_block_relu1 (Activa (None, 7, 7, 512) 0 5c_identity_block_bn1[0][0]
______________________________________________________________________________________________
5c_identity_block_conv2 (Conv2D (None, 7, 7, 512) 2359808 5c_identity_block_relu1[0][0]
______________________________________________________________________________________________
5c_identity_block_bn2 (BatchNor (None, 7, 7, 512) 2048 5c_identity_block_conv2[0][0]
______________________________________________________________________________________________
5c_identity_block_relu2 (Activa (None, 7, 7, 512) 0 5c_identity_block_bn2[0][0]
______________________________________________________________________________________________
5c_identity_block_conv3 (Conv2D (None, 7, 7, 2048) 1050624 5c_identity_block_relu2[0][0]
______________________________________________________________________________________________
5c_identity_block_bn3 (BatchNor (None, 7, 7, 2048) 8192 5c_identity_block_conv3[0][0]
______________________________________________________________________________________________
5c_identity_block_add (Add) (None, 7, 7, 2048) 0 5c_identity_block_bn3[0][0] 5b_identity_block_relu4[0][0]
______________________________________________________________________________________________
5c_identity_block_relu4 (Activa (None, 7, 7, 2048) 0 5c_identity_block_add[0][0]
______________________________________________________________________________________________
avg_pool (AveragePooling2D) (None, 1, 1, 2048) 0 5c_identity_block_relu4[0][0]
______________________________________________________________________________________________
flatten_1 (Flatten) (None, 2048) 0 avg_pool[0][0]
______________________________________________________________________________________________
fc1000 (Dense) (None, 1000) 2049000 flatten_1[0][0]
=========================================================================================
Total params: 25,636,712
Trainable params: 25,583,592
Non-trainable params: 53,120
______________________________________________________________________________________________
五、编译模型
# 设置学习率,我们使用不同的学习率
opt = tf.keras.optimizers.Adam(learning_rate=1e-5)
model.compile(optimizer="adam",
loss='sparse_categorical_crossentropy',
metrics=['accuracy'])
六、训练模型
epochs = 10
history = model.fit(
train_ds,
validation_data=val_ds,
epochs=epochs
)
Epoch 1/10
57/57 [==============================] - 27s 138ms/step - loss: 2.3737 - accuracy: 0.6071 - val_loss: 4607.0063 - val_accuracy: 0.2655
Epoch 2/10
57/57 [==============================] - 6s 100ms/step - loss: 0.5559 - accuracy: 0.7980 - val_loss: 59.8738 - val_accuracy: 0.2301
Epoch 3/10
57/57 [==============================] - 6s 100ms/step - loss: 0.3012 - accuracy: 0.9086 - val_loss: 28.0947 - val_accuracy: 0.2743
Epoch 4/10
57/57 [==============================] - 6s 101ms/step - loss: 0.1443 - accuracy: 0.9562 - val_loss: 0.3001 - val_accuracy: 0.9381
Epoch 5/10
57/57 [==============================] - 6s 101ms/step - loss: 0.0551 - accuracy: 0.9870 - val_loss: 8.5784 - val_accuracy: 0.6195
Epoch 6/10
57/57 [==============================] - 6s 101ms/step - loss: 0.0492 - accuracy: 0.9879 - val_loss: 0.2026 - val_accuracy: 0.9204
Epoch 7/10
57/57 [==============================] - 6s 101ms/step - loss: 0.0308 - accuracy: 0.9900 - val_loss: 1.6341 - val_accuracy: 0.5929
Epoch 8/10
57/57 [==============================] - 6s 102ms/step - loss: 0.0839 - accuracy: 0.9744 - val_loss: 0.2057 - val_accuracy: 0.9646
Epoch 9/10
57/57 [==============================] - 6s 102ms/step - loss: 0.1811 - accuracy: 0.9524 - val_loss: 2.8686 - val_accuracy: 0.5929
Epoch 10/10
57/57 [==============================] - 6s 102ms/step - loss: 0.1442 - accuracy: 0.9542 - val_loss: 0.9458 - val_accuracy: 0.8142
七、模型评估
acc = history.history['accuracy']
val_acc = history.history['val_accuracy']
loss = history.history['loss']
val_loss = history.history['val_loss']
epochs_range = range(epochs)
plt.figure(figsize=(12, 4))
plt.subplot(1, 2, 1)
plt.suptitle("源项目:K同学啊;学习:hanon;学习率:1e-5")
plt.plot(epochs_range, acc, label='Training Accuracy')
plt.plot(epochs_range, val_acc, label='Validation Accuracy')
plt.legend(loc='lower right')
plt.title('Training and Validation Accuracy')
plt.subplot(1, 2, 2)
plt.plot(epochs_range, loss, label='Training Loss')
plt.plot(epochs_range, val_loss, label='Validation Loss')
plt.legend(loc='upper right')
plt.title('Training and Validation Loss')
plt.show()
八、指定图片预测
# 采用加载的模型(new_model)来看预测结果
plt.figure(figsize=(10, 5)) # 图形的宽为10高为5
plt.suptitle("源项目:K同学啊;学习:hanon")
for images, labels in val_ds.take(1):
for i in range(8):
ax = plt.subplot(2, 4, i + 1)
# 显示图片
plt.imshow(images[i].numpy().astype("uint8"))
# 需要给图片增加一个维度
img_array = tf.expand_dims(images[i], 0)
# 使用模型预测图片中的人物
predictions = model.predict(img_array)
plt.title(class_names[np.argmax(predictions)])
plt.axis("off")
九、其他
何恺明大佬在论文中提到:随着神经网络深度的增加,训练变得更困难,出现了梯度消失或爆炸的问题,以及准确度饱和后迅速下降的退化问题,这也是深度学习领域中的两朵乌云:
拓展: 深度神经网络的“两朵乌云”
- 梯度弥散/爆炸
简单来讲就是网络太深了,会导致模型训练难以收敛。这个问题可以被标准初始化和中间层正规化的方法有效控制。
a. 比如我们在一个山谷里面大喊,如果山谷非常深,回声可能会变得非常非常小,几乎听不见。
在神经网络中,梯度弥散就像这个回声一样。当训练一个很深的网络时,从最后一层传回第一层的梯度(也就是告诉每一层怎么调整的信号)可能会变得非常小,就像声音在传播过程中逐渐减弱,导致前面的层几乎得不到有效的学习信号。
b. 又比如我们在房间里大喊一声,声音在房间里反复反射,变得越来越大。如果这个声音(梯度)没有得到控制,它可能会变得极其响亮,以至于你无法分辨出任何有用的东西。
在神经网络中,梯度爆炸就是梯度在传播过程中变得非常大,导致网络的权重更新过于剧烈,学习过程变得不稳定,甚至导致网络完全失效。- 网络退化
随着网络深度增加,网络的表现先是逐渐增加至饱和,然后迅速下降,这个退化不是由于过拟合引起的
就如传话游戏。在这个游戏中,一组人站成一排,第一个人得到一个消息,然后他小声告诉下一个人,下一个人再告诉下一个,依此类推。理论上,如果有更多的人参与,你可以传递更长的消息。可事实上,信息的内容总是逐渐增加至饱和,然后迅速下降,当消息传到队尾时,消息可能会变得模糊不清,甚至完全变了样。
为了解决这些问题,大佬使用了一种名为残差块(Residual Block)的方式,一种包含快捷连接(Shortcut Connections)的构建模块,这些连接可以跳过一层或多层,简化了网络的输出与输入之间的关系。通过下图(图片来源K同学啊),我们可以很直白的理解残差网络是如何解决上面两个问题的。
就如我们本周打卡的内容,假设有一个五层的神经网络,输入是一张图片,我们希望网络能够识别出图片中的鸟的品类。在残差网络中,除了正常的层级结构,我们在网络的某两层之间增加了一个“快捷连接”如第二层和第四层。这样,第二层学习到的特征可以直接“传递”到第四层,第四层只需要学习第二层特征和最终输出之间的差异。这样,即使网络很深,每一层的学习任务都变得简单了,整个网络的训练也变得更加容易。
而在传统的神经网络中,信息需要逐层传递,如果某一层学习效果不好,就会影响整个网络的性能。
其实和我们人类学习很像,加入我们正在学习某个知识,我们总会经历以下五个阶段:
-
基础技能:一开始,你学习了一些基础和简单的知识,这就像是神经网络的前几层,它们学习了一些基本的特征识别能力。
-
技能提升:随着时间的推移,你开始学习更复杂的知识和技巧,这就像是网络的中间层,它们在基础特征上进一步学习更复杂的模式。
-
瓶颈期:但是,你可能会到达一个点,感觉无论怎么练习,提高都很有限。这就像深层网络训练时遇到的退化问题,网络的深层似乎不再学习新的东西,性能提升停滞。
-
残差学习:这时候,如果你采取一种不同的学习方法,比如回顾并改进你的基础,确保对每个知识点掌握的更熟,对比学习现在遇到的难题和自己已掌握的基础的差别(残差)就能在现有的基础上继续提升。这就像是残差网络中的“快捷连接”,它允许网络层直接学习相对于前面层的“残差”或改进,而传统网络可能就重新开始学习新的复杂学习。
-
持续进步:通过不断地在基础和复杂技巧之间循环,你能够持续提升你的水平,即使在达到高水平后也能继续进步。这与残差网络的工作方式相似,网络能够通过学习残差来持续优化深层的特征表示,从而提高整体的性能。
以上为本周打卡内容。