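This article covers CNN fundamentals from a theoretical angle only and gives code for network architectures such as AlexNet, VGG, and ResNet.
1. Components
A CNN consists of an input layer, convolutional layers, pooling layers, and fully connected layers.
- Input layer: receives the input data
- Convolutional layer: extracts image features
- Pooling layer: compresses the features
- Fully connected layer: prepares the output; it works like an ordinary one-dimensional neural network and is not described separately below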
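2. Neural networks vs. CNNs
The left side shows a plain neural network and the right side a convolutional neural network, both in a fairly simple form. A convolutional neural network extends the basic neural network from one dimension to three-dimensional space, which makes it suitable for operating on images.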
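3. Convolutional layer
Suppose we feed an image of size $32 \times 32 \times 3$ into the CNN. The convolutional layer extracts image features from it: an image goes in and a feature map comes out. We first need to set the following parameters for the convolutional layer:
- Stride: how the kernel moves. A stride of 1 is typical, meaning the kernel shifts one pixel to the right after each convolution operation.
- Kernel size: the region covered by one convolution operation. Odd sizes such as $3 \times 3$ and $11 \times 11$ are common.
- Padding: to control the size of the resulting feature map, pixels with value zero are sometimes added around the original image before convolving.
- Number of kernels: how many kernels the layer uses; each kernel produces one feature map.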
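As a rough illustration of the padding and stride settings above, the following pure-Python sketch pads a small single-channel image with zeros and lists the offsets a kernel visits along one axis. The helper names `zero_pad` and `window_positions` are made up for this example and do not come from any library.

```python
def zero_pad(image, p):
    """Surround a 2-D list with p rows and columns of zeros on every side."""
    width = len(image[0]) + 2 * p
    padded = [[0] * width for _ in range(p)]
    for row in image:
        padded.append([0] * p + list(row) + [0] * p)
    padded.extend([0] * width for _ in range(p))
    return padded


def window_positions(size, kernel, stride):
    """Offsets along one axis at which the kernel fits entirely inside the image."""
    return list(range(0, size - kernel + 1, stride))


image = [[1, 2, 3],
         [4, 5, 6],
         [7, 8, 9]]
padded = zero_pad(image, 1)  # 3 x 3 becomes 5 x 5
offsets = window_positions(len(padded), kernel=3, stride=1)
```

With a padding of 1 and a $3 \times 3$ kernel at stride 1, the kernel visits three offsets per axis, so the output keeps the original $3 \times 3$ size.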
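Convolution operation
A convolution operation multiplies each pixel of the original image inside the current window by the corresponding kernel entry and sums the products. Taking the red box, i.e. the first convolution operation, as an example, the result is: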
$$0 \times 1 + 2 \times 0 + 4 \times 1 + 1 \times 0 + 3 \times 1 + 5 \times 0 + 30 \times 1 + 12 \times 0 + 32 \times 1 = 69$$
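The figure shows a convolution on a single channel. Since our input is an RGB image with three channels, each kernel needs three slices, one per channel. Each slice is convolved with its own channel, and the three results are added together to give the feature map.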
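The multiply-and-sum step, and the per-channel summation for RGB input, can be sketched in plain Python. This is a minimal illustration rather than how a framework implements it. The window and kernel values are read off the element-wise products in the worked sum above and arranged row by row, which is an assumption, and the function names are invented for this example.

```python
def conv2d_single(channel, kernel, stride=1):
    """Convolve one 2-D channel with one 2-D kernel slice, without padding."""
    k = len(kernel)
    out_h = (len(channel) - k) // stride + 1
    out_w = (len(channel[0]) - k) // stride + 1
    out = []
    for i in range(out_h):
        row = []
        for j in range(out_w):
            total = 0
            for di in range(k):
                for dj in range(k):
                    total += channel[i * stride + di][j * stride + dj] * kernel[di][dj]
            row.append(total)
        out.append(row)
    return out


def conv2d_multi(channels, kernel_slices, stride=1):
    """Convolve each channel with its own kernel slice, then add the results."""
    maps = [conv2d_single(c, k, stride) for c, k in zip(channels, kernel_slices)]
    return [[sum(m[i][j] for m in maps) for j in range(len(maps[0][0]))]
            for i in range(len(maps[0]))]


# Window and kernel taken from the products in the worked sum (assumed row-major).
window = [[0, 2, 4],
          [1, 3, 5],
          [30, 12, 32]]
kernel = [[1, 0, 1],
          [0, 1, 0],
          [1, 0, 1]]
first_value = conv2d_single(window, kernel)[0][0]

# Hypothetical 3-channel input: the same window repeated for R, G and B.
rgb_value = conv2d_multi([window] * 3, [kernel] * 3)[0][0]
```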
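Feature map size
The size of the convolution output can be computed with the formulas below, where $H$ and $W$ are the height and width (subscript 1 for the input, 2 for the output), $F_H$ and $F_W$ are the kernel height and width, $P$ is the padding, and $S$ is the stride:
$$H_2 = \frac{H_1 - F_H + 2P}{S} + 1$$
$$W_2 = \frac{W_1 - F_W + 2P}{S} + 1$$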
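A short sketch of the size formula for one axis. The function name `conv_output_size` is made up for this example, and the floor division is an assumption covering the case where the stride does not divide evenly.

```python
def conv_output_size(size, kernel, padding=0, stride=1):
    """Output length along one axis: (size - kernel + 2 * padding) // stride + 1."""
    return (size - kernel + 2 * padding) // stride + 1


# A 32 x 32 input, as in the example at the start of the convolutional layer section.
no_padding = conv_output_size(32, kernel=3, padding=0, stride=1)     # 30
with_padding = conv_output_size(32, kernel=3, padding=1, stride=1)   # 32
large_kernel = conv_output_size(32, kernel=11, padding=0, stride=1)  # 22
```

A $3 \times 3$ kernel with a padding of 1 and a stride of 1 leaves the size unchanged, which is why this combination is so common.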
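4. Pooling layer
The pooling layer downsamples (compresses) the feature map. There are several kinds of pooling, such as max pooling, min pooling, and average pooling. We describe max pooling and average pooling here; the others work analogously:
Max pooling: compresses the values inside the sampling window by keeping only the largest one.
Average pooling: computes the mean of the values inside the window and keeps that value.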
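The pooling variants differ only in how each window is reduced to one number, which the following sketch makes explicit. The `pool2d` and `mean` helpers and the sample feature map are invented for illustration.

```python
def pool2d(feature_map, size=2, stride=2, reducer=max):
    """Slide a size x size window over a 2-D list and reduce each window to one value."""
    out_h = (len(feature_map) - size) // stride + 1
    out_w = (len(feature_map[0]) - size) // stride + 1
    pooled = []
    for i in range(out_h):
        row = []
        for j in range(out_w):
            window = [feature_map[i * stride + di][j * stride + dj]
                      for di in range(size) for dj in range(size)]
            row.append(reducer(window))
        pooled.append(row)
    return pooled


def mean(values):
    """Arithmetic mean of a list of numbers."""
    return sum(values) / len(values)


feature_map = [[1, 3, 2, 4],
               [5, 7, 6, 8],
               [9, 2, 1, 0],
               [3, 4, 5, 6]]
max_pooled = pool2d(feature_map, reducer=max)   # [[7, 8], [9, 6]]
avg_pooled = pool2d(feature_map, reducer=mean)  # [[4.0, 5.0], [4.5, 3.0]]
min_pooled = pool2d(feature_map, reducer=min)   # [[1, 2], [2, 0]]
```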
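5. Building the network
When assembling a convolutional neural network, an activation function is added after each convolutional layer; deep neural networks generally use the ReLU activation. Each convolutional layer (conv) or fully connected layer (fc) counts as one layer of the network. Below we take a four-layer network as an example: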
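A minimal sketch of the counting rule and of ReLU, assuming a hypothetical stack of two convolutional and two fully connected layers. The layer list is invented for illustration and may differ from the network the author had in mind.

```python
def relu(feature_map):
    """ReLU keeps positive values and replaces negative ones with zero."""
    return [[max(0, v) for v in row] for row in feature_map]


# Hypothetical network: each tuple is (layer kind, has trainable weights).
layers = [
    ("conv", True), ("relu", False), ("pool", False),
    ("conv", True), ("relu", False), ("pool", False),
    ("flatten", False),
    ("fc", True), ("relu", False),
    ("fc", True),
]

# Only conv and fc layers count toward the depth of the network.
depth = sum(1 for kind, _ in layers if kind in ("conv", "fc"))

activated = relu([[-2, 3], [0, -1]])  # [[0, 3], [0, 0]]
```

Activation, pooling, and flatten steps carry no weights, so this stack counts as a four-layer network.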
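6. Common convolutional neural networks
We build these networks with the keras library in tensorflow. Only the code is shown here; a follow-up blog post will explain it. The code below defines the network structures only. If you are familiar with the tensorflow training workflow, you can try training with these networks. The author has tried training every network below on datasets that ship with tensorflow.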
AlexNet
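AlexNet is an early landmark among deep convolutional networks. It has eight layers, five convolutional and three fully connected. Its first convolutional layer uses $11 \times 11$ kernels with a padding of 0; the implementation below uses $3 \times 3$ kernels throughout.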
import tensorflow as tf


class AlexNet8(tf.keras.Model):
    def __init__(self):
        super(AlexNet8, self).__init__()
        # Conv block 1: conv -> batch norm -> ReLU -> max pool
        self.conv1 = tf.keras.layers.Conv2D(filters=96, kernel_size=(3, 3),
                                            padding='valid', strides=1)
        self.bn1 = tf.keras.layers.BatchNormalization()
        self.activation1 = tf.keras.layers.Activation('relu')
        self.pool1 = tf.keras.layers.MaxPooling2D(pool_size=(3, 3), strides=2)

        # Conv block 2: conv -> batch norm -> ReLU -> max pool
        self.conv2 = tf.keras.layers.Conv2D(filters=256, kernel_size=(3, 3),
                                            padding='valid', strides=1)
        self.bn2 = tf.keras.layers.BatchNormalization()
        self.activation2 = tf.keras.layers.Activation('relu')
        self.pool2 = tf.keras.layers.MaxPooling2D(pool_size=(3, 3), strides=2)

        # Conv layers 3 to 5 keep the spatial size, followed by one max pool
        self.conv3 = tf.keras.layers.Conv2D(filters=384, kernel_size=(3, 3),
                                            padding='same', activation='relu',
                                            strides=1)
        self.conv4 = tf.keras.layers.Conv2D(filters=384, kernel_size=(3, 3),
                                            padding='same', activation='relu',
                                            strides=1)
        self.conv5 = tf.keras.layers.Conv2D(filters=256, kernel_size=(3, 3),
                                            padding='same', activation='relu',
                                            strides=1)
        self.pool3 = tf.keras.layers.MaxPooling2D(pool_size=(3, 3), strides=2)

        # Three fully connected layers; the last one outputs 10 class probabilities
        self.flatten = tf.keras.layers.Flatten()
        self.dense1 = tf.keras.layers.Dense(2048, activation='relu')
        self.dropout1 = tf.keras.layers.Dropout(0.5)
        self.dense2 = tf.keras.layers.Dense(2048, activation='relu')
        self.dropout2 = tf.keras.layers.Dropout(0.5)
        self.dense3 = tf.keras.layers.Dense(10, activation='softmax')

    def call(self, x):
        x = self.conv1(x)
        x = self.bn1(x)
        x = self.activation1(x)
        x = self.pool1(x)
        x = self.conv2(x)
        x = self.bn2(x)
        x = self.activation2(x)
        x = self.pool2(x)
        x = self.conv3(x)
        x = self.conv4(x)
        x = self.conv5(x)
        x = self.pool3(x)
        x = self.flatten(x)
        x = self.dense1(x)
        x = self.dropout1(x)
        x = self.dense2(x)
        x = self.dropout2(x)
        y = self.dense3(x)
        return y
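To see how the feature map shrinks inside `AlexNet8`, the spatial sizes can be traced by hand. The sketch below does this in plain Python for an assumed $32 \times 32 \times 3$ input. The input size and the `out_size` helper are assumptions made for this example, while the layer settings are copied from the class above.

```python
def out_size(size, kernel, stride=1, padding="valid"):
    """Spatial size after a conv or pooling layer under 'valid' or 'same' padding."""
    if padding == "same":
        return -(-size // stride)  # ceil(size / stride)
    return (size - kernel) // stride + 1


# (name, kernel, stride, padding, output channels or None for pooling)
alexnet8_layers = [
    ("conv1", 3, 1, "valid", 96),
    ("pool1", 3, 2, "valid", None),
    ("conv2", 3, 1, "valid", 256),
    ("pool2", 3, 2, "valid", None),
    ("conv3", 3, 1, "same", 384),
    ("conv4", 3, 1, "same", 384),
    ("conv5", 3, 1, "same", 256),
    ("pool3", 3, 2, "valid", None),
]

size, channels = 32, 3  # assumed 32 x 32 x 3 input
trace = []
for name, kernel, stride, padding, filters in alexnet8_layers:
    size = out_size(size, kernel, stride, padding)
    channels = filters or channels
    trace.append((name, size, channels))

flattened = size * size * channels  # number of features fed to dense1
```

Under that assumption the spatial size goes 30, 14, 12, 5, 5, 5, 5, 2, so `dense1` receives $2 \times 2 \times 256 = 1024$ features.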
VGG
The structure in the figure below is VGG16. It has 16 layers, 13 convolutional and three fully connected, and uses $3 \times 3$ kernels.
import tensorflow as tf


class VGGNet(tf.keras.Model):
    def __init__(self):
        super(VGGNet, self).__init__()
        # Stage 1: two 64-filter conv layers, then pool and dropout
        self.conv1 = tf.keras.layers.Conv2D(filters=64, kernel_size=3, padding='same', strides=1)
        self.bn1 = tf.keras.layers.BatchNormalization()
        self.activation1 = tf.keras.layers.Activation('relu')
        self.conv2 = tf.keras.layers.Conv2D(filters=64, kernel_size=3, padding='same', strides=1)
        self.bn2 = tf.keras.layers.BatchNormalization()
        self.activation2 = tf.keras.layers.Activation('relu')
        self.pool1 = tf.keras.layers.MaxPool2D(pool_size=2, strides=2)
        self.dropout1 = tf.keras.layers.Dropout(0.2)

        # Stage 2: two 128-filter conv layers
        self.conv3 = tf.keras.layers.Conv2D(filters=128, kernel_size=3, padding='same', strides=1)
        self.bn3 = tf.keras.layers.BatchNormalization()
        self.activation3 = tf.keras.layers.Activation('relu')
        self.conv4 = tf.keras.layers.Conv2D(filters=128, kernel_size=3, padding='same', strides=1)
        self.bn4 = tf.keras.layers.BatchNormalization()
        self.activation4 = tf.keras.layers.Activation('relu')
        self.pool2 = tf.keras.layers.MaxPool2D(pool_size=2, strides=2)
        self.dropout2 = tf.keras.layers.Dropout(0.2)

        # Stage 3: three 256-filter conv layers
        self.conv5 = tf.keras.layers.Conv2D(filters=256, kernel_size=3, padding='same', strides=1)
        self.bn5 = tf.keras.layers.BatchNormalization()
        self.activation5 = tf.keras.layers.Activation('relu')
        self.conv6 = tf.keras.layers.Conv2D(filters=256, kernel_size=3, padding='same', strides=1)
        self.bn6 = tf.keras.layers.BatchNormalization()
        self.activation6 = tf.keras.layers.Activation('relu')
        self.conv7 = tf.keras.layers.Conv2D(filters=256, kernel_size=3, padding='same', strides=1)
        self.bn7 = tf.keras.layers.BatchNormalization()
        self.activation7 = tf.keras.layers.Activation('relu')
        self.pool3 = tf.keras.layers.MaxPool2D(pool_size=2, strides=2)
        self.dropout3 = tf.keras.layers.Dropout(0.2)

        # Stage 4: three 512-filter conv layers
        self.conv8 = tf.keras.layers.Conv2D(filters=512, kernel_size=3, padding='same', strides=1)
        self.bn8 = tf.keras.layers.BatchNormalization()
        self.activation8 = tf.keras.layers.Activation('relu')
        self.conv9 = tf.keras.layers.Conv2D(filters=512, kernel_size=3, padding='same')
        self.bn9 = tf.keras.layers.BatchNormalization()
        self.activation9 = tf.keras.layers.Activation('relu')
        self.conv10 = tf.keras.layers.Conv2D(filters=512, kernel_size=3, padding='same', strides=1)
        self.bn10 = tf.keras.layers.BatchNormalization()
        self.activation10 = tf.keras.layers.Activation('relu')
        self.pool4 = tf.keras.layers.MaxPool2D(pool_size=2, strides=2)
        self.dropout4 = tf.keras.layers.Dropout(0.2)

        # Stage 5: three 512-filter conv layers
        self.conv11 = tf.keras.layers.Conv2D(filters=512, kernel_size=3, padding='same', strides=1)
        self.bn11 = tf.keras.layers.BatchNormalization()
        self.activation11 = tf.keras.layers.Activation('relu')
        self.conv12 = tf.keras.layers.Conv2D(filters=512, kernel_size=3, padding='same', strides=1)
        self.bn12 = tf.keras.layers.BatchNormalization()
        self.activation12 = tf.keras.layers.Activation('relu')
        self.conv13 = tf.keras.layers.Conv2D(filters=512, kernel_size=3, padding='same', strides=1)
        self.bn13 = tf.keras.layers.BatchNormalization()
        self.activation13 = tf.keras.layers.Activation('relu')
        self.pool5 = tf.keras.layers.MaxPool2D(pool_size=2, strides=2)
        self.dropout5 = tf.keras.layers.Dropout(0.2)

        # Classifier: three fully connected layers
        self.flatten = tf.keras.layers.Flatten()
        self.dense1 = tf.keras.layers.Dense(512, activation='relu')
        self.dropout6 = tf.keras.layers.Dropout(0.2)
        self.dense2 = tf.keras.layers.Dense(512, activation='relu')
        self.dropout7 = tf.keras.layers.Dropout(0.2)
        self.dense3 = tf.keras.layers.Dense(10, activation='softmax')

    def call(self, x):
        x = self.conv1(x)
        x = self.bn1(x)
        x = self.activation1(x)
        x = self.conv2(x)
        x = self.bn2(x)
        x = self.activation2(x)
        x = self.pool1(x)
        x = self.dropout1(x)
        x = self.conv3(x)
        x = self.bn3(x)
        x = self.activation3(x)
        x = self.conv4(x)
        x = self.bn4(x)
        x = self.activation4(x)
        x = self.pool2(x)
        x = self.dropout2(x)
        x = self.conv5(x)
        x = self.bn5(x)
        x = self.activation5(x)
        x = self.conv6(x)
        x = self.bn6(x)
        x = self.activation6(x)
        x = self.conv7(x)
        x = self.bn7(x)
        x = self.activation7(x)
        x = self.pool3(x)
        x = self.dropout3(x)
        x = self.conv8(x)
        x = self.bn8(x)
        x = self.activation8(x)
        x = self.conv9(x)
        x = self.bn9(x)
        x = self.activation9(x)
        x = self.conv10(x)
        x = self.bn10(x)
        x = self.activation10(x)
        x = self.pool4(x)
        x = self.dropout4(x)
        x = self.conv11(x)
        x = self.bn11(x)
        x = self.activation11(x)
        x = self.conv12(x)
        x = self.bn12(x)
        x = self.activation12(x)
        x = self.conv13(x)
        x = self.bn13(x)
        x = self.activation13(x)
        x = self.pool5(x)
        x = self.dropout5(x)
        x = self.flatten(x)
        x = self.dense1(x)
        x = self.dropout6(x)
        x = self.dense2(x)
        x = self.dropout7(x)
        y = self.dense3(x)
        return y
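The repetitive structure of `VGGNet` can be summarized as a short configuration list, which also makes the layer count easy to check. The list notation (numbers for conv filters, `"M"` for max pooling) and the $32 \times 32$ input size are assumptions made for this sketch; the filter counts are copied from the class above.

```python
# Filters per conv layer; "M" marks a 2 x 2 max-pooling step.
vgg16_config = [64, 64, "M", 128, 128, "M", 256, 256, 256, "M",
                512, 512, 512, "M", 512, 512, 512, "M"]
dense_units = [512, 512, 10]

conv_layers = sum(1 for item in vgg16_config if item != "M")
total_layers = conv_layers + len(dense_units)

# 'same' convolutions keep the size; each pooling step halves it.
size = 32  # assumed 32 x 32 input
for item in vgg16_config:
    if item == "M":
        size //= 2
flattened = size * size * 512  # number of features entering dense1
```

This gives 13 convolutional layers plus 3 fully connected layers, 16 in total. With a $32 \times 32$ input, the five pooling steps reduce the feature map to $1 \times 1$.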
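ResNet
When a network is made deeper (beyond roughly 20 layers), accuracy can actually drop, so plain networks with more than about 20 layers fail to perform better. ResNet addresses this problem: through a shortcut connection, each block adds its input (the previous layer's output) to the output of its own convolutions and passes the sum on to the next layer, so a block can fall back to passing its input through when the extra layers do not help.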
import tensorflow as tf


class ResnetBlock(tf.keras.Model):
    def __init__(self, filters, strides=1, residual_path=False):
        super(ResnetBlock, self).__init__()
        self.filters = filters
        self.strides = strides
        self.residual_path = residual_path
        self.c1 = tf.keras.layers.Conv2D(filters, (3, 3), strides=strides, padding='same', use_bias=False)
        self.b1 = tf.keras.layers.BatchNormalization()
        self.a1 = tf.keras.layers.Activation('relu')
        self.c2 = tf.keras.layers.Conv2D(filters, (3, 3), strides=1, padding='same', use_bias=False)
        self.b2 = tf.keras.layers.BatchNormalization()
        # A 1x1 conv on the shortcut matches the shape when the block downsamples
        if residual_path:
            self.down_c1 = tf.keras.layers.Conv2D(filters, (1, 1), strides=strides, padding='same', use_bias=False)
            self.down_b1 = tf.keras.layers.BatchNormalization()
        self.a2 = tf.keras.layers.Activation('relu')

    def call(self, inputs):
        residual = inputs
        x = self.c1(inputs)
        x = self.b1(x)
        x = self.a1(x)
        x = self.c2(x)
        y = self.b2(x)
        if self.residual_path:
            residual = self.down_c1(inputs)
            residual = self.down_b1(residual)
        # Add the shortcut to the conv output, then apply ReLU
        out = self.a2(y + residual)
        return out


class ResNet18(tf.keras.Model):
    def __init__(self, block_list, initial_filters=64):
        super(ResNet18, self).__init__()
        self.num_blocks = len(block_list)
        self.block_list = block_list
        self.out_filters = initial_filters
        self.c1 = tf.keras.layers.Conv2D(self.out_filters, (3, 3), strides=1, padding='same', use_bias=False)
        self.b1 = tf.keras.layers.BatchNormalization()
        self.a1 = tf.keras.layers.Activation('relu')
        self.blocks = tf.keras.models.Sequential()
        # Build the ResNet block structure
        for block_id in range(len(block_list)):
            for layer_id in range(block_list[block_id]):
                if block_id != 0 and layer_id == 0:
                    # First block of every stage except the first downsamples
                    block = ResnetBlock(self.out_filters, strides=2, residual_path=True)
                else:
                    block = ResnetBlock(self.out_filters, residual_path=False)
                self.blocks.add(block)
            # Double the number of filters for the next stage
            self.out_filters *= 2
        self.p1 = tf.keras.layers.GlobalAveragePooling2D()
        self.f1 = tf.keras.layers.Dense(10, activation='softmax', kernel_regularizer=tf.keras.regularizers.l2())

    def call(self, inputs):
        x = self.c1(inputs)
        x = self.b1(x)
        x = self.a1(x)
        x = self.blocks(x)
        x = self.p1(x)
        y = self.f1(x)
        return y
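The nested loop in `ResNet18.__init__` decides which blocks downsample and how many filters each one uses. The sketch below mirrors that loop in plain Python so the resulting plan can be inspected without building the model. The function name `resnet_block_plan` is invented for this example, and `[2, 2, 2, 2]` is assumed as the `block_list`, since the code above does not state which value is passed in.

```python
def resnet_block_plan(block_list, initial_filters=64):
    """Return (filters, strides, residual_path) for every ResnetBlock, in order."""
    plan = []
    filters = initial_filters
    for block_id, num_layers in enumerate(block_list):
        for layer_id in range(num_layers):
            downsample = block_id != 0 and layer_id == 0
            plan.append((filters, 2 if downsample else 1, downsample))
        filters *= 2  # the next stage uses twice as many filters
    return plan


plan = resnet_block_plan([2, 2, 2, 2])  # assumed ResNet18 configuration

# First conv + two convs per block + final dense layer.
weighted_layers = 1 + 2 * len(plan) + 1
```

Under that assumption the plan contains eight blocks and the count comes to 18 layers; the $1 \times 1$ shortcut convolutions are not included in that count. The matching instantiation would be `ResNet18([2, 2, 2, 2])`.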