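Neural Network Essentials

Compared with the introduction to neural networks, this article focuses more on implementation in code.

Understanding the components of Keras

Keras is a high-level neural network API written in Python that can run on top of backends such as TensorFlow, CNTK, or Theano.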

model.compile()

compile(self, optimizer, loss, metrics=None, ...)

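This function configures the model's learning process.
It takes three main arguments:
- optimizer: the algorithm that minimizes the loss function; each optimizer has its own set of hyperparameters.
  - You can pass the algorithm's name directly when calling compile(), for example "SGD", "adam", or "rmsprop".
  - You can also instantiate the optimizer before calling compile() and then pass that instance to compile().
- loss: the objective function that measures the network's error, i.e. the loss function. Training adjusts the parameters to reduce it.
- metrics: used to evaluate the model. For how metrics and loss differ in meaning, see this blog post: loss与metric的区别以及 optimizer的介绍_metric loss-CSDN博客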
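
As a minimal sketch of passing an optimizer instance to compile(), the snippet below creates the optimizer first so that its hyperparameters can be set explicitly. The tiny 4-input, 3-class model and the values learning_rate=0.01 and momentum=0.9 are illustrative assumptions, not settings used elsewhere in this article.

```python
from keras.models import Sequential
from keras.layers import Dense, Input
from keras.optimizers import SGD

# A deliberately tiny model (4 inputs, 3 classes), used only for illustration
model = Sequential([
    Input(shape=(4,)),
    Dense(3, activation='softmax'),
])

# Instantiate the optimizer first so its hyperparameters can be set explicitly
opt = SGD(learning_rate=0.01, momentum=0.9)

# Pass the instance instead of a name string such as "SGD"
model.compile(optimizer=opt,
              loss='categorical_crossentropy',
              metrics=['accuracy'])
```

Passing a name string uses that optimizer's default hyperparameters, so an instance is the usual choice when those defaults need changing.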

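Commonly used optimizers

SGD: Stochastic Gradient Descent. Supports hyperparameters such as momentum and learning rate decay.
RMSprop: often used for recurrent neural networks.
Adam: Adaptive Moment Estimation. A first-order gradient-based algorithm for optimizing stochastic objective functions.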
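
To make momentum and learning rate decay concrete, here is a minimal pure-Python sketch of the update rule on a one-dimensional toy problem. The function name sgd_momentum, the toy objective, and the time-based decay schedule are assumptions made for illustration; the gradient here is exact rather than estimated from a mini-batch, and Keras's own SGD may differ in detail.

```python
def sgd_momentum(grad_fn, w0, lr=0.1, momentum=0.9, decay=0.0, steps=300):
    """Minimize a 1-D function with gradient descent plus momentum and optional decay."""
    w, velocity = w0, 0.0
    for t in range(steps):
        # Time-based decay: the step size shrinks as t grows (decay=0 disables it)
        lr_t = lr / (1.0 + decay * t)
        # The velocity accumulates an exponentially weighted sum of past gradients
        velocity = momentum * velocity - lr_t * grad_fn(w)
        w += velocity
    return w

# Toy objective f(w) = (w - 3)^2, whose gradient is 2 * (w - 3)
w_final = sgd_momentum(lambda w: 2.0 * (w - 3.0), w0=0.0)
print(w_final)
```

With these settings the iterate oscillates around the minimum at w = 3 and settles there, which is the typical behaviour of a momentum term.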

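Commonly used losses

mean_squared_error: mean squared error. Used for regression problems.
categorical_crossentropy: categorical cross-entropy between the predictions and the targets. Commonly used when the target has more than two classes.
binary_crossentropy: binary cross-entropy between the predictions and the targets. Commonly used when the target has two classes.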
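
The three losses can be written out in a few lines of NumPy. This is a sketch of the standard textbook definitions, not Keras's implementation; the eps clipping value and the sample arrays are assumptions chosen for illustration.

```python
import numpy as np

def mean_squared_error(y_true, y_pred):
    # Average of the squared differences
    return np.mean((y_true - y_pred) ** 2)

def categorical_crossentropy(y_true, y_pred, eps=1e-7):
    # y_true is one-hot, y_pred holds class probabilities; clip to avoid log(0)
    y_pred = np.clip(y_pred, eps, 1.0)
    return -np.mean(np.sum(y_true * np.log(y_pred), axis=1))

def binary_crossentropy(y_true, y_pred, eps=1e-7):
    # y_true is 0 or 1, y_pred is the predicted probability of class 1
    y_pred = np.clip(y_pred, eps, 1.0 - eps)
    return -np.mean(y_true * np.log(y_pred) + (1.0 - y_true) * np.log(1.0 - y_pred))

mse = mean_squared_error(np.array([1.0, 2.0]), np.array([1.0, 4.0]))
cce = categorical_crossentropy(np.array([[0.0, 1.0, 0.0]]),
                               np.array([[0.1, 0.8, 0.1]]))
bce = binary_crossentropy(np.array([1.0, 0.0]), np.array([0.9, 0.1]))
print(mse, cce, bce)
```

For a one-hot target, the categorical cross-entropy reduces to the negative log of the probability assigned to the correct class, so confident correct predictions give a loss close to 0.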

Details on all of these arguments can be found in the Keras documentation: Optimizers, Losses, Metrics

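Implementing a neural network with Keras

Below is a worked neural network example. It uses the classic MNIST dataset of handwritten digits: given an image of a handwritten digit, the task is to recognize which of the 10 digits 0~9 it is, which makes this a multi-class classification problem. The dataset's official site: http://yann.lecun.com/exdb/mnist/. As before, the code runs in Google Colaboratory.

1. Import the Python packages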

from keras.datasets import mnist
from keras.preprocessing.image import load_img, array_to_img
from keras.utils import to_categorical
from keras.models import Sequential
from keras.layers import Dense

import numpy as np
import matplotlib.pyplot as plt

# Render plots inline in the notebook; '%' marks an IPython magic command
%matplotlib inline

2. Load the data

After importing the mnist dataset module from Keras, call its load_data() function directly.

(X_train_raw, y_train_raw), (X_test_raw, y_test_raw) = mnist.load_data()
print(X_train_raw.shape, X_test_raw.shape)
print(y_train_raw.shape, y_test_raw.shape)

Output:

(60000, 28, 28) (10000, 28, 28)
(60000,) (10000,)

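The training set contains 60000 samples. Each sample has 28x28=784 pixels, so each sample is 784-dimensional; in neural network terms, it has 784 features. This is worth stressing because a later topic covers convolutional neural networks, where we discuss the high-dimensionality problem in image recognition. The problem here is also image recognition, but since the pixel count is modest we still use a traditional neural network model, namely a fully connected feedforward network.

The test set contains 10000 samples. Each label y is a single value from 0~9.

3. Understand the image data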

print(X_train_raw[0].shape)
plt.imshow(X_train_raw[0], cmap='gray')
print(y_train_raw[0])

The output shows that this digit is 5, and the plotted image shows how this 5 was handwritten.

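4. Preprocess the input data

Ignoring color, each image sample is an MxN matrix of pixels, whereas each layer of an ordinary neural network actually processes a vector rather than a matrix, so the MxN matrix has to be converted into a 1xMN vector. The reshape() function in the code below does this; it is a standard preprocessing step for image data.

Each pixel value lies in 0~255 (every shade from black through to white). To keep the raw magnitude of the values from affecting training, each value is scaled by dividing by 255.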

image_height, image_width = 28, 28
X_train = X_train_raw.reshape(60000, image_height*image_width)
X_test = X_test_raw.reshape(10000, image_height*image_width)
print(X_train.shape, X_test.shape)

# scale pixel values from 0~255 down to 0~1 by dividing by 255
X_train = X_train.astype('float32') / 255.0
X_test = X_test.astype('float32') / 255.0
print(X_train[0])

After reshape, the shapes of X_train and X_test are:

(60000, 784) (10000, 784)
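
The flattening and scaling steps can be checked on a small synthetic array without downloading MNIST. This is a sketch under assumptions: the random images are stand-ins for real data, and reshape(-1, ...) is used so that NumPy infers the sample count instead of hard-coding 60000 or 10000.

```python
import numpy as np

# Two fake 28x28 grayscale "images" with uint8 pixel values, standing in for MNIST
rng = np.random.default_rng(0)
images = rng.integers(0, 256, size=(2, 28, 28), dtype=np.uint8)

# -1 lets NumPy infer the number of samples
flat = images.reshape(-1, 28 * 28)

# Row-major flattening: the pixel at (row r, column c) lands at index r * 28 + c
assert flat[0, 3 * 28 + 5] == images[0, 3, 5]

# Convert to float32 before dividing so the result is a float in [0, 1]
scaled = flat.astype('float32') / 255.0
print(flat.shape, scaled.dtype, scaled.min(), scaled.max())
```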

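5. Preprocess the labels

For a binary classification problem the labels are usually 0 and 1. For a multi-class problem, each label is expanded into a vector of 0s and 1s (one-hot encoding).

This problem has 10 digits, so each digit is represented by a 1x10 vector. For example, the digit 5 becomes [0,0,0,0,0,1,0,0,0,0]. The function to_categorical() does this.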

# convert each class value to a vector of 0s and 1s (one-hot encoding);
# a vector of class values therefore becomes a matrix of 0s and 1s
# e.g. for 0~9, 10 classes in total, the value 5 is
#      converted to [0,0,0,0,0,1,0,0,0,0]
y_train = to_categorical(y_train_raw, 10)
y_test = to_categorical(y_test_raw, 10)
print(y_train.shape)
print(y_train[0])

Output:

(60000, 10)
[0. 0. 0. 0. 0. 1. 0. 0. 0. 0.]
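
To show what this encoding does, the same result can be reproduced with plain NumPy by indexing into an identity matrix. The label values below are hypothetical, and this is an equivalent sketch rather than how to_categorical() is implemented.

```python
import numpy as np

labels = np.array([5, 0, 9])  # hypothetical digit labels

# Row i of the 10x10 identity matrix is the one-hot vector for class i
one_hot = np.eye(10, dtype='float32')[labels]

# argmax reverses the encoding and recovers the original class values
recovered = one_hot.argmax(axis=1)
print(one_hot[0], recovered)
```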

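6. Build the Keras model

In Keras, Sequential() creates an empty model and add() appends each layer to it. Dense() creates a fully connected layer. The first layer must specify the input_shape argument, whose value is determined by the number of values in each sample (28x28=784 in this example).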

model = Sequential()

# 'Dense' means it's a fully connected layer
# For the 1st layer, the shape must be given; input_shape=(784,) means
#   the input is an N x 784 matrix (X_train.shape is (60000, 784))
model.add(Dense(512, activation='relu', input_shape=(784,)))

# For the following layers, the shape is not necessary
model.add(Dense(512, activation='relu'))

# output layer, 'softmax' is used
model.add(Dense(10, activation='softmax'))
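
What these three Dense layers compute can be sketched as a NumPy forward pass. The random weights, zero biases, and batch of 3 fake inputs are assumptions standing in for trained parameters and real images; the point is only to show the shapes and the effect of relu and softmax.

```python
import numpy as np

def relu(z):
    return np.maximum(z, 0.0)

def softmax(z):
    # Subtract the row-wise max for numerical stability
    e = np.exp(z - z.max(axis=1, keepdims=True))
    return e / e.sum(axis=1, keepdims=True)

rng = np.random.default_rng(0)
sizes = [784, 512, 512, 10]

# Random weights and zero biases stand in for trained parameters
weights = [rng.normal(0.0, 0.05, size=(n_in, n_out))
           for n_in, n_out in zip(sizes[:-1], sizes[1:])]
biases = [np.zeros(n_out) for n_out in sizes[1:]]

x = rng.random((3, 784))  # a batch of 3 fake flattened images

# Hidden layers: affine transform followed by relu
h = x
for W, b in zip(weights[:-1], biases[:-1]):
    h = relu(h @ W + b)

# Output layer: affine transform followed by softmax
probs = softmax(h @ weights[-1] + biases[-1])
print(probs.shape, probs.sum(axis=1))
```

Each row of probs is a probability distribution over the 10 digits, which is why softmax pairs naturally with a cross-entropy loss.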

7. Compile the model

# As it is a multi-class classification problem, loss is chosen as categorical_crossentropy
model.compile(optimizer='adam',
              loss='categorical_crossentropy',
              metrics=['accuracy'])

# show summary to make sure it's what we expected
model.summary()

summary() prints the following:

Model: "sequential"
_________________________________________________________________
 Layer (type)                Output Shape              Param #   
=================================================================
 dense (Dense)               (None, 512)               401920    
                                                                 
 dense_1 (Dense)             (None, 512)               262656    
                                                                 
 dense_2 (Dense)             (None, 10)                5130      
                                                                 
=================================================================
Total params: 669706 (2.55 MB)
Trainable params: 669706 (2.55 MB)
Non-trainable params: 0 (0.00 Byte)
_________________________________________________________________

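The parameter count of each layer is the number of weights and biases used to compute that layer. For a fully connected layer:

number of weights = (number of nodes in the previous layer) x (number of nodes in the current layer)
number of biases = number of nodes in the current layer

So:

layer1 = 784 x 512 + 512 = 401920
layer2 = 512 x 512 + 512 = 262656
layer3 = 512 x 10 + 10 = 5130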
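
The per-layer formula can be checked against the summary() output in a few lines of plain Python. The helper name dense_params is an assumption for illustration. Note that the output layer has 512 x 10 weights plus 10 biases.

```python
def dense_params(n_in, n_out):
    """Weights (n_in x n_out) plus one bias per output node."""
    return n_in * n_out + n_out

layer_sizes = [784, 512, 512, 10]
counts = [dense_params(a, b) for a, b in zip(layer_sizes[:-1], layer_sizes[1:])]
total = sum(counts)
print(counts, total)
```

The three counts and their sum match the Param # column and the Total params line printed by summary().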

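8. Train the model

Keras trains a model with fit() (many other machine learning libraries use the same function name). The returned history records the results of every epoch.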

history = model.fit(X_train, y_train, epochs=20, validation_data=(X_test, y_test))

Running fit() produces the following:

Epoch 1/20
1875/1875 [==============================] - 25s 12ms/step - loss: 0.1821 - accuracy: 0.9446 - val_loss: 0.1131 - val_accuracy: 0.9642
Epoch 2/20
1875/1875 [==============================] - 24s 13ms/step - loss: 0.0796 - accuracy: 0.9753 - val_loss: 0.0731 - val_accuracy: 0.9768
Epoch 3/20
1875/1875 [==============================] - 30s 16ms/step - loss: 0.0558 - accuracy: 0.9826 - val_loss: 0.0712 - val_accuracy: 0.9774
Epoch 4/20
1875/1875 [==============================] - 35s 19ms/step - loss: 0.0419 - accuracy: 0.9871 - val_loss: 0.0793 - val_accuracy: 0.9755
Epoch 5/20
1875/1875 [==============================] - 30s 16ms/step - loss: 0.0353 - accuracy: 0.9886 - val_loss: 0.0808 - val_accuracy: 0.9773
Epoch 6/20
1875/1875 [==============================] - 23s 12ms/step - loss: 0.0281 - accuracy: 0.9906 - val_loss: 0.0690 - val_accuracy: 0.9831
Epoch 7/20
1875/1875 [==============================] - 24s 13ms/step - loss: 0.0234 - accuracy: 0.9926 - val_loss: 0.1078 - val_accuracy: 0.9769
Epoch 8/20
1875/1875 [==============================] - 24s 13ms/step - loss: 0.0218 - accuracy: 0.9929 - val_loss: 0.0940 - val_accuracy: 0.9790
Epoch 9/20
1875/1875 [==============================] - 23s 12ms/step - loss: 0.0212 - accuracy: 0.9937 - val_loss: 0.1177 - val_accuracy: 0.9777
Epoch 10/20
1875/1875 [==============================] - 23s 12ms/step - loss: 0.0193 - accuracy: 0.9943 - val_loss: 0.1055 - val_accuracy: 0.9802
Epoch 11/20
1875/1875 [==============================] - 22s 12ms/step - loss: 0.0186 - accuracy: 0.9941 - val_loss: 0.0923 - val_accuracy: 0.9824
Epoch 12/20
1875/1875 [==============================] - 26s 14ms/step - loss: 0.0126 - accuracy: 0.9961 - val_loss: 0.1040 - val_accuracy: 0.9829
Epoch 13/20
1875/1875 [==============================] - 25s 13ms/step - loss: 0.0162 - accuracy: 0.9958 - val_loss: 0.1177 - val_accuracy: 0.9792
Epoch 14/20
1875/1875 [==============================] - 28s 15ms/step - loss: 0.0156 - accuracy: 0.9958 - val_loss: 0.1153 - val_accuracy: 0.9813
Epoch 15/20
1875/1875 [==============================] - 23s 12ms/step - loss: 0.0159 - accuracy: 0.9959 - val_loss: 0.0955 - val_accuracy: 0.9833
Epoch 16/20
1875/1875 [==============================] - 23s 12ms/step - loss: 0.0128 - accuracy: 0.9964 - val_loss: 0.1216 - val_accuracy: 0.9827
Epoch 17/20
1875/1875 [==============================] - 23s 12ms/step - loss: 0.0147 - accuracy: 0.9962 - val_loss: 0.1250 - val_accuracy: 0.9807
Epoch 18/20
1875/1875 [==============================] - 22s 12ms/step - loss: 0.0096 - accuracy: 0.9975 - val_loss: 0.1409 - val_accuracy: 0.9791
Epoch 19/20
1875/1875 [==============================] - 23s 12ms/step - loss: 0.0177 - accuracy: 0.9961 - val_loss: 0.1395 - val_accuracy: 0.9789
Epoch 20/20
1875/1875 [==============================] - 23s 12ms/step - loss: 0.0122 - accuracy: 0.9967 - val_loss: 0.1324 - val_accuracy: 0.9829

Each epoch produces one output line recording four values: 'loss', 'accuracy', 'val_loss', and 'val_accuracy'. All of them are saved in history. The line also shows how long the epoch took, here roughly 22~35 seconds.
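
The 1875 at the start of each line is the number of batches per epoch. fit() was called without a batch_size argument, and Keras then uses its default of 32, so the count follows from the dataset size. A small sketch of that arithmetic:

```python
import math

batch_size = 32  # Keras default when fit() / evaluate() are called without batch_size

# Number of batches needed to cover each dataset once (the last batch may be partial)
train_steps = math.ceil(60000 / batch_size)
test_steps = math.ceil(10000 / batch_size)
print(train_steps, test_steps)
```

The same arithmetic on the 10000 test samples gives the 313 steps reported by evaluate().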

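9. Review the training accuracy

Inspect the records in history (the return value of fit()) to check how accurate the model was during training.

For example, to view 'accuracy':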

plt.plot(history.history['accuracy'])
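
history.history is a dict that maps each metric name to a list with one value per epoch, so it can also be inspected without plotting. The sketch below finds the epoch at which each validation metric was best. The helper name best_epoch is an assumption, and the values are copied from the training log above so that the snippet runs on its own; in a live session, use history.history instead.

```python
# Values copied from the training log above
logged = {
    'val_loss': [0.1131, 0.0731, 0.0712, 0.0793, 0.0808,
                 0.0690, 0.1078, 0.0940, 0.1177, 0.1055,
                 0.0923, 0.1040, 0.1177, 0.1153, 0.0955,
                 0.1216, 0.1250, 0.1409, 0.1395, 0.1324],
    'val_accuracy': [0.9642, 0.9768, 0.9774, 0.9755, 0.9773,
                     0.9831, 0.9769, 0.9790, 0.9777, 0.9802,
                     0.9824, 0.9829, 0.9792, 0.9813, 0.9833,
                     0.9827, 0.9807, 0.9791, 0.9789, 0.9829],
}

def best_epoch(values, mode):
    """Return the 1-based epoch at which a metric is best ('min' or 'max')."""
    pick = min if mode == 'min' else max
    return values.index(pick(values)) + 1

best_loss_epoch = best_epoch(logged['val_loss'], 'min')
best_acc_epoch = best_epoch(logged['val_accuracy'], 'max')
print(best_loss_epoch, best_acc_epoch)
```

In this log the validation loss is lowest at epoch 6 while the training loss keeps falling afterwards, which suggests that the later epochs mostly fit the training set more closely rather than improving generalization.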

10. Evaluate the model

Use the evaluate() function to evaluate the model.

score = model.evaluate(X_test, y_test)
print(score)

Output:

313/313 [==============================] - 1s 4ms/step - loss: 0.1324 - accuracy: 0.9829
[0.13238213956356049, 0.9829000234603882]

The first number is the loss and the second is the accuracy: 98.29%.
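
The accuracy figure is the fraction of samples whose highest-probability class matches the true label. The NumPy sketch below shows that calculation on three hypothetical softmax outputs; in a live session the probabilities would come from model.predict(X_test) and the labels from y_test_raw.

```python
import numpy as np

# Hypothetical softmax outputs for 3 samples over the 10 digit classes
probs = np.full((3, 10), 0.01)
probs[0, 5] = 0.91                    # confident, correct
probs[1, 0], probs[1, 6] = 0.55, 0.37 # less confident, still correct
probs[2, 4], probs[2, 9] = 0.30, 0.62 # wrong: the true label is 4

true_labels = np.array([5, 0, 4])

# The predicted digit is the index of the largest probability in each row
predicted = probs.argmax(axis=1)

# Accuracy is the fraction of predictions that match the true labels
accuracy = np.mean(predicted == true_labels)
print(predicted, accuracy)
```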