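Continued from Model Optimization and Tuning (1)
Tuning Backpropagation
Vanishing and Exploding Gradients
Vanishing and exploding gradients both come down to the computed "delta". Ideally, delta shrinks gradually as training progresses. If delta stays too small, gradient descent progresses too slowly, or the weights stop changing altogether; this is a vanishing gradient. If delta stays large, learning becomes choppy and the loss never actually goes down; this is an exploding gradient. The figure below illustrates vanishing and exploding gradients.
Possible solutions include:
- Weight initialization. Choose better initial weights.
- Activation functions. The activation function affects gradient descent, so choose an appropriate one.
- Batch normalization. This concept was mentioned in GANs and Diffusion Models (2) and is explained later in this article.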
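To make the two failure modes above concrete, here is a minimal sketch. It assumes, as a simplification, that every layer scales the delta by the same constant factor during backpropagation; the function name and the factors 0.5 and 1.5 are only illustrative and do not come from the experiments in this article.

```python
import numpy as np

def backpropagated_delta(per_layer_factor, n_layers, initial_delta=1.0):
    """Delta magnitude after passing back through n_layers layers,
    assuming each layer scales it by the same factor (a simplification)."""
    factors = np.full(n_layers, per_layer_factor)
    return initial_delta * np.prod(factors)

# A factor below 1 shrinks the delta layer by layer (vanishing gradient)
vanishing = backpropagated_delta(0.5, n_layers=20)
# A factor above 1 grows the delta layer by layer (exploding gradient)
exploding = backpropagated_delta(1.5, n_layers=20)

print(f"factor 0.5 over 20 layers: {vanishing:.2e}")
print(f"factor 1.5 over 20 layers: {exploding:.2e}")
```

In a real network the factor differs per layer and depends on the weights and the activation function, which is why weight initialization and the choice of activation appear in the list of solutions.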
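Batch Normalization
Batch normalization is an important technique for dealing with vanishing and exploding gradients. In detail:
- The inputs are normalized before each hidden layer.
- Normalization here means centering and scaling the layer's inputs, the same operation that StandardScaler performs.
- The mean and standard deviation are computed from the values output by the previous hidden layer, so that the normalized inputs all share the same scale. Even when delta updates and activation functions change the scale of the data, this step keeps the inputs of every hidden layer on the same scale.
- It helps reach higher accuracy in fewer epochs.
- It requires extra computation, so it increases resource usage and execution time.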
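The centering and scaling step described above can be sketched in NumPy as follows. This is only the training-time computation on a single mini-batch; the function name, `gamma`, `beta` and `eps` are my own illustrative choices, and a framework layer such as the Keras BatchNormalization layer does more (for example, it learns the scale and shift and keeps moving statistics for inference).

```python
import numpy as np

def batch_normalize(x, gamma=1.0, beta=0.0, eps=1e-5):
    """Normalize each feature (column) of a mini-batch to zero mean and
    unit variance, then apply a scale (gamma) and a shift (beta)."""
    mean = x.mean(axis=0)                     # per-feature mean over the batch
    var = x.var(axis=0)                       # per-feature variance over the batch
    x_hat = (x - mean) / np.sqrt(var + eps)   # center and scale
    return gamma * x_hat + beta

rng = np.random.default_rng(0)
# Two features on very different scales, as might come out of a previous layer
batch = np.column_stack([
    rng.normal(loc=100.0, scale=20.0, size=64),
    rng.normal(loc=0.0, scale=0.1, size=64),
])
normalized = batch_normalize(batch)

print("means after normalization:", normalized.mean(axis=0))
print("stds after normalization: ", normalized.std(axis=0))
```

After normalization both features have a mean close to 0 and a standard deviation close to 1, regardless of their original scale.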
Experiment
The experiment is again based on the base model from Model Optimization and Tuning (1).
# Initialize the measures
accuracy_measures = {}
normalization_list = ['none', 'batch']
for normalization in normalization_list:
    # Load default configuration
    model_config = base_model_config()
    # Acquire and process input data
    X, Y = get_data()
    model_config["NORMALIZATION"] = normalization
    model_name = "Normalization-" + normalization
    history = create_and_run_model(model_config, X, Y, model_name)
    accuracy_measures[model_name] = history.history["accuracy"]
# Plot
plot_graph(accuracy_measures, "Compare Batch Normalization")
Running the program gives the following result
As the result shows, the model's accuracy improves with batch normalization
Optimizers
Optimizers are the key tools for speeding up gradient descent. Available optimizers include:
- SGD (Stochastic Gradient Descent)
- RMSprop
- Adam
- Adagrad
This article does not go into the mathematics behind each optimizer; interested readers can look up the relevant material.
Experiment
# Initialize the measures
accuracy_measures = {}
optimizer_list = ['sgd', 'rmsprop', 'adam', 'adagrad']
for optimizer in optimizer_list:
    # Load default configuration
    model_config = base_model_config()
    # Acquire and process input data
    X, Y = get_data()
    model_config["OPTIMIZER"] = optimizer
    model_name = "Optimizer-" + optimizer
    history = create_and_run_model(model_config, X, Y, model_name)
    accuracy_measures[model_name] = history.history["accuracy"]
# Plot
plot_graph(accuracy_measures, "Compare Optimizers")
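Learning Rate
Another hyperparameter related to the optimizer is the learning rate. The learning rate:
- Is the rate at which the weights change relative to the estimated error.
- Works together with the optimizer. After the error is estimated, the optimizer adjusts delta according to the learning rate.
- Is a small number less than 1.
Choosing a learning rate
- Larger values
  - Faster learning, fewer epochs needed
  - Higher risk of exploding gradients
- Smaller values
  - Slower but more stable learning
  - Higher risk of vanishing gradients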
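The trade-off in the list above can be seen on a toy problem. The sketch below runs plain gradient descent on f(w) = 5 * w**2, a function I picked only for illustration, using the same learning rates as the experiment in this article. It is not the training loop used by the model in this series.

```python
def gradient_descent(learning_rate, steps=20, w0=1.0):
    """Minimize f(w) = 5 * w**2 with plain gradient descent; return the final w."""
    w = w0
    for _ in range(steps):
        grad = 10.0 * w                  # derivative of 5 * w**2
        w = w - learning_rate * grad     # update scaled by the learning rate
    return w

for lr in [0.001, 0.005, 0.01, 0.1, 0.5]:
    final_w = gradient_descent(lr)
    print(f"learning_rate={lr}: |w| after 20 steps = {abs(final_w):.4g}")
```

The minimum is at w = 0. On this toy function, the smallest rates are still far from the minimum after 20 steps, while the largest rate overshoots on every step and the value of w blows up.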
Experiment
# Initialize the measures
accuracy_measures = {}
learning_rate_list = [0.001, 0.005, 0.01, 0.1, 0.5]
for learning_rate in learning_rate_list:
    # Load default configuration
    model_config = base_model_config()
    # Acquire and process input data
    X, Y = get_data()
    model_config["LEARNING_RATE"] = learning_rate
    model_name = "Learning_Rate-" + str(learning_rate)
    history = create_and_run_model(model_config, X, Y, model_name)
    accuracy_measures[model_name] = history.history["accuracy"]
# Plot
plot_graph(accuracy_measures, "Compare Learning Rates")
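Handling Overfitting
Overfitting means the model fits the training data very closely but is noticeably less accurate on independent data outside the training set. Ways to counter overfitting include:
- Simplify the model
  - Reduce the number of layers and the number of nodes per layer
- Train with fewer epochs and a smaller batch size
- Increase the size and diversity of the training data
- Regularization
- Dropout
Regularization
Regularization:
- Controls overfitting during model training
- Applies an adjustment to the model parameters after they are updated, to keep them from overfitting
- Imposes a penalty as overfitting increases, in order to reduce the model's variance
- Comes in several variants
  - L1, L2, and a combination of L1 and L2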
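As a rough illustration of the three variants listed above, the penalty that gets added to the loss can be computed as below. The function name and the coefficient 0.01 are assumptions made for this sketch; the strings 'l1', 'l2' and 'l1_l2' mirror the values used in the experiment in this article.

```python
import numpy as np

def regularization_penalty(weights, kind, l1=0.01, l2=0.01):
    """Penalty added to the loss for a given weight vector.

    kind: 'l1' (sum of absolute values), 'l2' (sum of squares),
          'l1_l2' (both), or None (no penalty).
    """
    w = np.asarray(weights, dtype=float)
    l1_term = l1 * np.sum(np.abs(w))
    l2_term = l2 * np.sum(np.square(w))
    if kind is None:
        return 0.0
    if kind == "l1":
        return l1_term
    if kind == "l2":
        return l2_term
    if kind == "l1_l2":
        return l1_term + l2_term
    raise ValueError(f"unknown regularizer: {kind}")

weights = [0.5, -2.0, 3.0]
for kind in [None, "l1", "l2", "l1_l2"]:
    print(kind, regularization_penalty(weights, kind))
```

Because large weights increase the penalty, the optimizer is pushed toward smaller weights, which is what limits overfitting.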
Experiment
# Initialize the measures
accuracy_measures = {}
regularizer_list = ['l1', 'l2', 'l1_l2']
for regularizer in regularizer_list:
    # Load default configuration
    model_config = base_model_config()
    # Acquire and process input data
    X, Y = get_data()
    model_config["REGULARIZER"] = regularizer
    model_config["EPOCHS"] = 25
    model_name = "Regularizer-" + regularizer
    history = create_and_run_model(model_config, X, Y, model_name)
    accuracy_measures[model_name] = history.history["accuracy"]
# Plot
plot_graph(accuracy_measures, "Compare Regularization")
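Dropout
Dropout is a very popular way to reduce overfitting. Dropout:
- Randomly drops some nodes during forward propagation
- Takes a percentage and randomly drops that fraction of the nodes
- Should use a rate chosen so that accuracy on the training set and on the test set end up similar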
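The random dropping described above can be sketched in NumPy as follows. This version rescales the surviving activations by 1 / (1 - rate), often called inverted dropout, so that the expected activation stays the same; the function name and the input shape are illustrative assumptions, not code from this series.

```python
import numpy as np

def dropout_forward(activations, rate, rng):
    """Randomly zero out a fraction `rate` of the activations during the
    forward pass and rescale the rest to keep the expected value unchanged."""
    if rate == 0.0:
        return activations
    keep_mask = rng.random(activations.shape) >= rate   # True = node kept
    return activations * keep_mask / (1.0 - rate)

rng = np.random.default_rng(42)
activations = np.ones((1000, 32))        # dummy layer output
dropped = dropout_forward(activations, rate=0.2, rng=rng)

print("fraction of nodes dropped:", np.mean(dropped == 0.0))
print("mean activation after dropout:", dropped.mean())
```

Dropout is applied only during training; at prediction time all nodes are used.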
Experiment
# Initialize the measures
accuracy_measures = {}
dropout_list = [0.0, 0.1, 0.2, 0.5]
for dropout in dropout_list:
    # Load default configuration
    model_config = base_model_config()
    # Acquire and process input data
    X, Y = get_data()
    model_config["DROPOUT_RATE"] = dropout
    model_config["EPOCHS"] = 25
    model_name = "dropout-" + str(dropout)
    history = create_and_run_model(model_config, X, Y, model_name)
    accuracy_measures[model_name] = history.history["accuracy"]
# Plot
plot_graph(accuracy_measures, "Compare Dropouts")
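Model Optimization Exercise
In this exercise, the model is optimized along the following dimensions:
- Model
  - Number of layers
  - Number of nodes per layer (based on the chosen number of layers)
- Backpropagation
  - Optimizer
  - Learning rate (based on the chosen optimizer)
- Overfitting
  - Regularization
  - Dropout rate (based on the chosen regularization method)
- Final model
  - Assemble all the tuned parameters
  - Compare with the default settings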
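The plan above tunes one hyperparameter at a time and freezes each choice before moving on to the next. The sketch below shows that pattern in isolation. `tune_sequentially` and `toy_score` are hypothetical names, and the toy score is invented purely so the example runs without training a model; it says nothing about which values are actually best.

```python
def tune_sequentially(base_config, search_space, score_fn):
    """Tune one hyperparameter at a time, keeping earlier choices fixed."""
    config = dict(base_config)
    for name, candidates in search_space.items():
        # Try each candidate with all previously chosen values frozen
        best_value = max(candidates, key=lambda v: score_fn({**config, name: v}))
        config[name] = best_value
    return config

def toy_score(config):
    # Hypothetical stand-in for "train the model and return its accuracy";
    # the peak at (3, 24, 0.2) is arbitrary.
    return (-abs(config["LAYERS"] - 3)
            - abs(config["NODES"] - 24) / 8
            - abs(config["DROPOUT_RATE"] - 0.2))

base = {"LAYERS": 1, "NODES": 16, "DROPOUT_RATE": 0.0}
space = {
    "LAYERS": [1, 2, 3, 4, 5],
    "NODES": [8, 16, 24, 32],
    "DROPOUT_RATE": [0.0, 0.1, 0.2, 0.5],
}
best = tune_sequentially(base, space, toy_score)
print(best)
```

This approach needs far fewer training runs than trying every combination, at the cost of possibly missing combinations that only work well together.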
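Environment Setup
The exercise uses Google Colab as the development environment, which requires the following preparation:
- Create a working directory in the Google Drive used by Colab: Colab Notebooks/DeepLearning/tuning
- Upload the data file root_cause_analysis.csv to that directory
- Package the "common functions" code from Model Optimization and Tuning (1) into a separate file, CommonFunctions.ipynb, for reuse
Because the program reads files stored in Google Drive, the drive has to be mounted first
# mount my drive in google colab
from google.colab import drive
drive.mount('/content/drive')
# change to my working directory, all sources are in this folder
%cd /content/drive/My Drive/Colab Notebooks/DeepLearning/tuning
Since the common functions are reused, also run the following
%run CommonFunctions.ipynb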
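Getting and Preparing the Data
This step is wrapped in a function, get_rca_data(), for later use.
The program one-hot encodes the classes, a step explained in many of my earlier posts. Specifically, LabelEncoder.fit_transform converts the string classes into integer labels such as "0, 1, 2"; to_categorical() then turns those integer labels into vectors containing only 0s and 1s. For example, 1 becomes [0, 1, 0] and 2 becomes [0, 0, 1].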
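The two steps can be reproduced with plain NumPy to see what each one produces. This is only a sketch of the idea: `label_encode` and `one_hot` are helper names made up for this example, and the class names are placeholders rather than the actual values in root_cause_analysis.csv.

```python
import numpy as np

def label_encode(labels):
    """Map string classes to integer labels 0..n-1 (classes sorted alphabetically)."""
    classes, encoded = np.unique(labels, return_inverse=True)
    return classes, encoded

def one_hot(encoded, num_classes):
    """Turn integer labels into vectors that contain a single 1."""
    return np.eye(num_classes)[encoded]

# Placeholder class names, not the real ROOT_CAUSE values
root_causes = np.array(["cause_b", "cause_a", "cause_c", "cause_b"])
classes, encoded = label_encode(root_causes)
targets = one_hot(encoded, num_classes=3)

print(classes)
print(encoded)
print(targets)
```

Here "cause_b" is encoded as 1 and becomes [0, 1, 0], while "cause_c" is encoded as 2 and becomes [0, 0, 1].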
import pandas as pd
import os
import tensorflow as tf
from sklearn import preprocessing
def get_rca_data():
    # Load the data file into a Pandas DataFrame
    symptom_data = pd.read_csv("root_cause_analysis.csv")
    # Explore the data loaded
    print(symptom_data.dtypes)
    print(symptom_data.head())
    # Encode the string classes in ROOT_CAUSE as integer labels (0, 1, 2)
    label_encoder = preprocessing.LabelEncoder()
    symptom_data['ROOT_CAUSE'] = label_encoder.fit_transform(symptom_data['ROOT_CAUSE'])
    print(symptom_data['ROOT_CAUSE'][:5])
    # Convert the Pandas DataFrame into a numpy array
    np_symptom = symptom_data.to_numpy().astype(float)
    # Extract the features (X): 2nd to 8th columns (columns B~H)
    X_data = np_symptom[:, 1:8]
    # Extract the targets (Y): 9th column (column I), converted to one-hot encoding
    Y_data = np_symptom[:, 8]
    Y_data = tf.keras.utils.to_categorical(Y_data, 3)
    return X_data, Y_data
Tuning the Network Parameters
First tune the number of layers, largely following the program in Model Optimization and Tuning (1)
# Initialize the measures
accuracy_measures = {}
layer_list = []
for layer_count in range(1, 6):
    # 32 nodes in each layer
    layer_list.append(32)
    # Load default configuration
    model_config = base_model_config()
    # Acquire and process input data
    X, Y = get_rca_data()
    # "HIDDEN_NODES" includes all nodes in layers from input layer to the last hidden layer
    model_config["HIDDEN_NODES"] = layer_list
    model_name = "Layer-" + str(layer_count)
    history = create_and_run_model(model_config, X, Y, model_name)
    accuracy_measures[model_name] = history.history["accuracy"]
# Plot
plot_graph(accuracy_measures, "Compare Layers")
The results are as follows:
Two layers perform well, so the number of layers is set to 2
# 2 layers seem to provide the highest accuracy level at lower epoch counts
LAYERS = 2
Then, with the number of layers fixed, tune the number of nodes
This again follows the program in Model Optimization and Tuning (1)
# Initialize the measures
accuracy_measures = {}
for node_count in range(8, 40, 8):
    # have a fixed number of 2 hidden layers
    layer_list = []
    for layer_count in range(LAYERS):
        layer_list.append(node_count)
    # Load default configuration
    model_config = base_model_config()
    # Acquire and process input data
    X, Y = get_rca_data()
    # "HIDDEN_NODES" includes all nodes in layers from input layer to the last hidden layer
    model_config["HIDDEN_NODES"] = layer_list
    model_name = "Nodes-" + str(node_count)
    history = create_and_run_model(model_config, X, Y, model_name)
    accuracy_measures[model_name] = history.history["accuracy"]
# Plot
plot_graph(accuracy_measures, "Compare Nodes")
As the result shows, 32 nodes perform well
# 32 nodes seem to be best
NODES = 32
Tuning Backpropagation
Tuning the optimizer
# Initialize the measures
accuracy_measures = {}
optimizer_list = ['sgd', 'rmsprop', 'adam', 'adagrad']
for optimizer in optimizer_list:
    # Load default configuration
    model_config = base_model_config()
    # Apply the chosen config
    model_config["HIDDEN_NODES"] = []
    for i in range(LAYERS):
        model_config["HIDDEN_NODES"].append(NODES)
    # Acquire and process input data
    X, Y = get_rca_data()
    model_config["OPTIMIZER"] = optimizer
    model_name = "Optimizer-" + optimizer
    history = create_and_run_model(model_config, X, Y, model_name)
    accuracy_measures[model_name] = history.history["accuracy"]
# Plot
plot_graph(accuracy_measures, "Compare Optimizers")
The results are as follows
'rmsprop' should be chosen
# rmsprop seems to be best
OPTIMIZER = 'rmsprop'
Tuning the learning rate
# Initialize the measures
accuracy_measures = {}
learning_rate_list = [0.001, 0.005, 0.01, 0.1, 0.5]
for learning_rate in learning_rate_list:
    # Load default configuration
    model_config = base_model_config()
    # Apply the chosen config
    model_config["HIDDEN_NODES"] = []
    for i in range(LAYERS):
        model_config["HIDDEN_NODES"].append(NODES)
    model_config["OPTIMIZER"] = OPTIMIZER
    # Acquire and process input data
    X, Y = get_rca_data()
    model_config["LEARNING_RATE"] = learning_rate
    model_name = "Learning_Rate-" + str(learning_rate)
    history = create_and_run_model(model_config, X, Y, model_name)
    accuracy_measures[model_name] = history.history["accuracy"]
# Plot
plot_graph(accuracy_measures, "Compare Learning Rates")
The results are as follows:
The results are not very stable across repeated runs because the dataset is too small. In the end I chose 0.001.
# All seem to be OK, choose 0.001
LEARNING_RATE = 0.001
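Since the results vary from run to run, one way to compare settings more reliably is to repeat each run several times and look at the mean and spread of the accuracy curves. The sketch below is not part of the original experiments: `average_accuracy_curves` is a hypothetical helper, and `fake_run` generates synthetic curves as a stand-in for a real training run, so its numbers carry no meaning.

```python
import numpy as np

def average_accuracy_curves(run_fn, n_runs=5):
    """Call run_fn(seed) n_runs times and return the per-epoch mean and
    standard deviation of the accuracy curves it returns."""
    curves = np.array([run_fn(seed) for seed in range(n_runs)])
    return curves.mean(axis=0), curves.std(axis=0)

def fake_run(seed, epochs=10):
    # Hypothetical stand-in for one training run: a rising trend plus noise.
    # In practice this would train the model and return its accuracy history.
    rng = np.random.default_rng(seed)
    trend = np.linspace(0.5, 0.9, epochs)
    return np.clip(trend + rng.normal(0.0, 0.05, size=epochs), 0.0, 1.0)

mean_curve, std_curve = average_accuracy_curves(fake_run, n_runs=5)
print("mean accuracy per epoch:", np.round(mean_curve, 3))
print("std per epoch:          ", np.round(std_curve, 3))
```

A setting whose mean curve is higher by more than the run-to-run spread is a safer choice than one that merely won a single run.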
Avoiding Overfitting
Tuning regularization
# Initialize the measures
accuracy_measures = {}
regularizer_list = [None, 'l1', 'l2', 'l1_l2']
for regularizer in regularizer_list:
    # Load default configuration
    model_config = base_model_config()
    # Apply the chosen config
    model_config["HIDDEN_NODES"] = []
    for i in range(LAYERS):
        model_config["HIDDEN_NODES"].append(NODES)
    model_config["OPTIMIZER"] = OPTIMIZER
    model_config["LEARNING_RATE"] = LEARNING_RATE
    # Acquire and process input data
    X, Y = get_rca_data()
    model_config["REGULARIZER"] = regularizer
    model_config["EPOCHS"] = 25
    model_name = "Regularizer-" + str(regularizer)
    history = create_and_run_model(model_config, X, Y, model_name)
    # Since we are checking for overfitting, use validation accuracy as the metric
    accuracy_measures[model_name] = history.history["val_accuracy"]
# Plot
plot_graph(accuracy_measures, "Compare Regularization")
The results are as follows:
None and 'l2' perform similarly; after several runs, I chose None
# None and l2 have similar performance; after several runs, choose None
REGULARIZER = None
Tuning the dropout rate
# Initialize the measures
accuracy_measures = {}
dropout_list = [0.0, 0.1, 0.2, 0.5]
for dropout in dropout_list:
    # Load default configuration
    model_config = base_model_config()
    # Apply the chosen config
    model_config["HIDDEN_NODES"] = []
    for i in range(LAYERS):
        model_config["HIDDEN_NODES"].append(NODES)
    model_config["OPTIMIZER"] = OPTIMIZER
    model_config["LEARNING_RATE"] = LEARNING_RATE
    model_config["REGULARIZER"] = REGULARIZER
    # Acquire and process input data
    X, Y = get_rca_data()
    model_config["DROPOUT_RATE"] = dropout
    model_name = "dropout-" + str(dropout)
    history = create_and_run_model(model_config, X, Y, model_name)
    # Since we are checking for overfitting, use validation accuracy as the metric
    accuracy_measures[model_name] = history.history["val_accuracy"]
# Plot
plot_graph(accuracy_measures, "Compare Dropouts")
These results are not very stable either; after several runs, I chose 0.1
# 0.1 is the best
DROPOUT = 0.1
Building the Final Model
Run the model with the default configuration and with the optimized configuration, then compare the two
# Initialize the measures
accuracy_measures = {}
# Base model with default configurations
model_config = base_model_config()
model_config["HIDDEN_NODES"] = [16]
model_config["NORMALIZATION"] = None
model_config["OPTIMIZER"] = 'rmsprop'
model_config["LEARNING_RATE"] = 0.001
model_config["REGULARIZER"] = None
model_config["DROPOUT_RATE"] = 0.0
# Acquire and process input data
X, Y = get_rca_data()
model_name = "Base-Model"
history = create_and_run_model(model_config, X, Y, model_name)
accuracy_measures[model_name] = history.history["accuracy"]
# Optimized model
# Apply the chosen config
model_config["HIDDEN_NODES"] = []
for i in range(LAYERS):
    model_config["HIDDEN_NODES"].append(NODES)
model_config["NORMALIZATION"] = 'batch'
model_config["OPTIMIZER"] = OPTIMIZER
model_config["LEARNING_RATE"] = LEARNING_RATE
model_config["REGULARIZER"] = REGULARIZER
model_config["DROPOUT_RATE"] = DROPOUT
# Acquire and process input data
X, Y = get_rca_data()
model_name = "Optimized-Model"
history = create_and_run_model(model_config, X, Y, model_name)
accuracy_measures[model_name] = history.history["accuracy"]
# Plot
plot_graph(accuracy_measures, "Compare Base and Optimized Model")
These results are not very stable either, but over several runs the optimized model performs better overall.