- 🍨 This post is my study-log entry for lesson R4 of the 🔗 365-Day Deep Learning Training Camp; I renamed it R3 for my own note-keeping.
- 🍖 Original author: K同学啊 | tutoring and custom projects available
Contents
- 0. Summary
- 1. Introduction to LSTM
- Basic components of an LSTM
- Understanding and applying LSTM
- 2. Data Import
- 3. Data Preprocessing
- 4. Splitting the Dataset
- 5. Model Construction
- 6. Initializing the Model and Optimizer
- 7. Defining the Training Function
- 8. Defining the Test Function
- 9. Training Process
- 10. Model Evaluation
- 11. Making Predictions with the Model
- 12. R² Evaluation
0. Summary
Data import and processing: in PyTorch we typically convert NumPy arrays to torch.Tensor, wrap them in a TensorDataset (or a custom Dataset), and then load them in batches with a DataLoader, as in the sketch below.
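A minimal sketch of that pipeline (the array shapes and names here are illustrative, not taken from this post's dataset):

```python
import numpy as np
import torch
from torch.utils.data import TensorDataset, DataLoader

# Hypothetical NumPy arrays: 100 samples, 8 time steps, 3 features
X_np = np.random.rand(100, 8, 3).astype(np.float32)
y_np = np.random.rand(100, 1).astype(np.float32)

X_t = torch.from_numpy(X_np)         # NumPy array -> torch.Tensor
y_t = torch.from_numpy(y_np)
ds = TensorDataset(X_t, y_t)         # wrap the tensors in a Dataset
dl = DataLoader(ds, batch_size=32)   # load them batch by batch

for xb, yb in dl:
    print(xb.shape, yb.shape)        # torch.Size([32, 8, 3]) torch.Size([32, 1])
    break
```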
Model construction: LSTM.
Setting hyperparameters: before training, define the loss function and the learning rate (optionally with a schedule for a dynamic learning rate), then build an optimizer from that learning rate (e.g. SGD, stochastic gradient descent) to update the parameters and minimize the loss during training.
Defining the training function: it takes four arguments: the prepared DataLoader, the model, the loss function, and the optimizer. Inside, initialize the running loss to zero, then loop: fetch one batch from the DataLoader, run it through the model to get predictions, and compute the loss with the loss function. Then backpropagate and let the optimizer update the parameters; zeroing the gradients works either before the backward pass or after the optimizer step, but the convention is to put it before the backward pass, as in the sketch below.
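A minimal, self-contained sketch of one such training step (the tiny linear model here is just a stand-in, not this post's network):

```python
import torch
from torch import nn

model = nn.Linear(3, 1)                        # stand-in model
loss_fn = nn.MSELoss()                         # loss function
opt = torch.optim.SGD(model.parameters(), lr=1e-2)

x, y = torch.randn(16, 3), torch.randn(16, 1)  # one illustrative batch

pred = model(x)           # forward pass: predictions for the batch
loss = loss_fn(pred, y)   # loss between prediction and target
opt.zero_grad()           # zero the gradients (conventionally before backward)
loss.backward()           # backpropagate
opt.step()                # optimizer updates the parameters
```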
Defining the test function: it takes one argument fewer than the training function: only the DataLoader, the model, and the loss function. Apart from dropping the gradient zeroing, backpropagation, and optimizer step when processing each batch, it mirrors the training function.
Training process: choose the number of epochs (each epoch is one pass over the whole dataset) and initialize empty lists to record each epoch's training and test results (four lists for a classification task tracking accuracy and loss; two for a regression task like this one, which tracks only loss). Call model.train() to enable training mode and run the training function to get that epoch's loss; then call model.eval() to switch to evaluation mode and run the test function. Append the results to the lists and print them together, giving the loss after each full pass over the data.
Visualizing the results.
Saving, loading, and using the model: in PyTorch, one typically saves just the parameters with torch.save(model.state_dict(), 'model.pth') and restores them with model.load_state_dict(torch.load('model.pth')).
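A short sketch of that save/load round trip, using this post's model_lstm class (defined in section 5):

```python
# Save only the parameters (the state_dict), the usual PyTorch practice
torch.save(model.state_dict(), 'model.pth')

# Rebuild the same architecture, then load the saved parameters into it
model2 = model_lstm()
model2.load_state_dict(torch.load('model.pth'))
model2.eval()  # switch to evaluation mode before inference
```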
Things to improve: keep the model and the data consistent, both on the GPU or both on the CPU; don't leave num_classes at its default of 1000 but set it from the actual dataset, and remember the same num_classes argument when instantiating the model; also note that a test input in PyTorch is shaped channel-first, (3, 224, 224) with 3 the channel count, unlike TensorFlow's channel-last (224, 224, 3), which matters when porting code between the two.
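A minimal sketch of the device-consistency point, mirroring names defined later in this post (model_lstm, train_dl):

```python
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
model = model_lstm().to(device)        # move the model to the device once

for x, y in train_dl:
    x, y = x.to(device), y.to(device)  # move every batch to the same device
    pred = model(x)                    # model and data now live together
```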
1. Introduction to LSTM
LSTM (Long Short-Term Memory) is a special type of recurrent neural network (RNN) used mainly for processing and predicting time-series data. It addresses a key limitation of vanilla RNNs on long-range dependencies: the vanishing-gradient problem.
Basic components of an LSTM
An LSTM uses a "memory cell" to decide which information to keep and which to forget. Its core is a gating mechanism with three gates (their standard update equations are given after this list):
- Forget gate: decides how much of the previous cell state to discard.
- Input gate: decides how much of the current input to store in the memory cell.
- Output gate: decides how much of the memory cell to pass on to the next hidden state.
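For reference, the standard LSTM update equations, with $\sigma$ the sigmoid function, $\odot$ element-wise multiplication, $x_t$ the input, $h_t$ the hidden state, and $C_t$ the cell state:

$$
\begin{aligned}
f_t &= \sigma\big(W_f \cdot [h_{t-1}, x_t] + b_f\big) \\
i_t &= \sigma\big(W_i \cdot [h_{t-1}, x_t] + b_i\big) \\
\tilde{C}_t &= \tanh\big(W_C \cdot [h_{t-1}, x_t] + b_C\big) \\
C_t &= f_t \odot C_{t-1} + i_t \odot \tilde{C}_t \\
o_t &= \sigma\big(W_o \cdot [h_{t-1}, x_t] + b_o\big) \\
h_t &= o_t \odot \tanh(C_t)
\end{aligned}
$$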
Understanding and applying LSTM
- Strengths of LSTM: it captures dependencies across long sequences, making it well suited to long sequential data such as text, speech, and financial series. Compared with a vanilla RNN, it effectively mitigates the vanishing-gradient problem, so the model can learn long-term dependencies.
- Typical applications: LSTM is widely used in natural language processing (text generation, machine translation, sentiment analysis), speech recognition, and time-series forecasting (e.g. stock prediction).
- How to get started as a beginner, step by step:
  - Data preprocessing: convert the data into a format suitable for sequence modeling, e.g. token sequences or regularly sampled time-stamped data.
  - Choosing a framework: implement the LSTM in a deep learning framework such as TensorFlow or PyTorch, starting with a simple network and adding complexity gradually.
  - Debugging and tuning: adjust the LSTM's hyperparameters (hidden size, learning rate, batch size, etc.) to improve performance.
You can start with a simple text-generation or time-series prediction model and work toward the finer details and strengths of LSTM; a minimal nn.LSTM example follows.
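The sketch below shows the basic nn.LSTM call and its output shapes (the sizes are illustrative):

```python
import torch
from torch import nn

# One LSTM layer: 3 input features, 16 hidden units, batch-first tensors
lstm = nn.LSTM(input_size=3, hidden_size=16, num_layers=1, batch_first=True)

x = torch.randn(4, 8, 3)     # (batch, seq_len, features)
out, (h_n, c_n) = lstm(x)    # out holds the hidden state of every time step
print(out.shape)             # torch.Size([4, 8, 16])
print(h_n.shape, c_n.shape)  # torch.Size([1, 4, 16]) each
```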
import copy
import warnings

import numpy as np
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt

import torch
import torch.nn.functional as F
from torch import nn

from sklearn.model_selection import train_test_split
from sklearn.preprocessing import MinMaxScaler
from sklearn.metrics import classification_report, confusion_matrix
from sklearn.metrics import r2_score
from sklearn.metrics import mean_absolute_error, mean_absolute_percentage_error, mean_squared_error

warnings.filterwarnings('ignore')
2. Data Import
data = pd.read_csv("./data/woodpine2.csv")
data
| | Time | Tem1 | CO 1 | Soot 1 |
|---|---|---|---|---|
| 0 | 0.000 | 25.0 | 0.000000 | 0.000000 |
| 1 | 0.228 | 25.0 | 0.000000 | 0.000000 |
| 2 | 0.456 | 25.0 | 0.000000 | 0.000000 |
| 3 | 0.685 | 25.0 | 0.000000 | 0.000000 |
| 4 | 0.913 | 25.0 | 0.000000 | 0.000000 |
| ... | ... | ... | ... | ... |
| 5943 | 366.000 | 295.0 | 0.000077 | 0.000496 |
| 5944 | 366.000 | 294.0 | 0.000077 | 0.000494 |
| 5945 | 367.000 | 292.0 | 0.000077 | 0.000491 |
| 5946 | 367.000 | 291.0 | 0.000076 | 0.000489 |
| 5947 | 367.000 | 290.0 | 0.000076 | 0.000487 |

5948 rows × 4 columns
plt.rcParams['savefig.dpi'] = 500  # resolution of saved figures (dpi)
plt.rcParams['figure.dpi'] = 500   # display resolution (dpi)

fig, ax = plt.subplots(1, 3, constrained_layout=True, figsize=(14, 3))
sns.lineplot(data=data["Tem1"], ax=ax[0])
sns.lineplot(data=data["CO 1"], ax=ax[1])
sns.lineplot(data=data["Soot 1"], ax=ax[2])
plt.show()
dataFrame = data.iloc[:,1:]
dataFrame
| | Tem1 | CO 1 | Soot 1 |
|---|---|---|---|
| 0 | 25.0 | 0.000000 | 0.000000 |
| 1 | 25.0 | 0.000000 | 0.000000 |
| 2 | 25.0 | 0.000000 | 0.000000 |
| 3 | 25.0 | 0.000000 | 0.000000 |
| 4 | 25.0 | 0.000000 | 0.000000 |
| ... | ... | ... | ... |
| 5943 | 295.0 | 0.000077 | 0.000496 |
| 5944 | 294.0 | 0.000077 | 0.000494 |
| 5945 | 292.0 | 0.000077 | 0.000491 |
| 5946 | 291.0 | 0.000076 | 0.000489 |
| 5947 | 290.0 | 0.000076 | 0.000487 |

5948 rows × 3 columns
3. Data Preprocessing
dataFrame = data.iloc[:, 1:].copy()
sc = MinMaxScaler(feature_range=(0, 1))  # normalize each column to the range [0, 1]

for i in ['CO 1', 'Soot 1', 'Tem1']:
    dataFrame[i] = sc.fit_transform(dataFrame[i].values.reshape(-1, 1))
# Note: sc is refitted on each column in turn, so after the loop it holds the
# 'Tem1' scaling; that is what makes the later sc.inverse_transform of the
# temperature predictions valid.
dataFrame.shape
(5948, 3)
# Build X and y
width_X = 8   # input window: 8 time steps
width_y = 1   # prediction horizon: 1 time step

## Take the previous 8 time steps of Tem1, CO 1 and Soot 1 as X,
## and the 9th time step's Tem1 as y.
X = []
y = []

in_start = 0
for _ in range(len(dataFrame)):
    in_end = in_start + width_X
    out_end = in_end + width_y
    if out_end < len(dataFrame):
        X_ = np.array(dataFrame.iloc[in_start:in_end, :])
        y_ = np.array(dataFrame.iloc[in_end:out_end, 0])
        X.append(X_)
        y.append(y_)
    in_start += 1
X = np.array(X)
y = np.array(y).reshape(-1,1,1)
X.shape, y.shape
((5939, 8, 3), (5939, 1, 1))
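A quick sanity check of the windowing above (illustrative, using the arrays just built): the first sample should hold rows 0–7 of all three features, and its target should be row 8's Tem1 value.

```python
# Verify the first sliding window against the source frame
assert np.allclose(X[0], dataFrame.iloc[0:8].values)    # inputs: rows 0..7
assert np.allclose(y[0].ravel(), dataFrame.iloc[8, 0])  # target: row 8, Tem1
```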
# Check the dataset for missing values
print(np.any(np.isnan(X)))
print(np.any(np.isnan(y)))
False
False
4. Splitting the Dataset
X_train = torch.tensor(X[:5000], dtype=torch.float32)
y_train = torch.tensor(y[:5000], dtype=torch.float32)
X_test = torch.tensor(X[5000:], dtype=torch.float32)
y_test = torch.tensor(y[5000:], dtype=torch.float32)
X_train.shape, y_train.shape
(torch.Size([5000, 8, 3]), torch.Size([5000, 1, 1]))
from torch.utils.data import TensorDataset, DataLoader

train_dl = DataLoader(TensorDataset(X_train, y_train),
                      batch_size=64,
                      shuffle=False)  # keep the temporal order of the windows
test_dl = DataLoader(TensorDataset(X_test, y_test),
                     batch_size=64,
                     shuffle=False)
5. Model Construction
class model_lstm(nn.Module):
    def __init__(self):
        super(model_lstm, self).__init__()
        self.lstm0 = nn.LSTM(input_size=3, hidden_size=320,
                             num_layers=1, batch_first=True)
        self.lstm1 = nn.LSTM(input_size=320, hidden_size=320,
                             num_layers=1, batch_first=True)
        self.fc0 = nn.Linear(320, 1)

    def forward(self, x):
        out, hidden1 = self.lstm0(x)
        out, _ = self.lstm1(out, hidden1)
        out = self.fc0(out)
        # Keep only the last time step's prediction; otherwise the LSTM
        # yields one prediction per input step (8 here) instead of 1.
        return out[:, -1:, :]
6. Initializing the Model and Optimizer
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
model = model_lstm().to(device)  # keep the model on the same device as the data
print(model)

epochs = 50
loss_fn = nn.MSELoss()  # loss function
learn_rate = 1e-1       # initial learning rate
opt = torch.optim.SGD(model.parameters(), lr=learn_rate, weight_decay=1e-4)
lr_scheduler = torch.optim.lr_scheduler.CosineAnnealingLR(opt, epochs, last_epoch=-1)  # cosine-annealing LR schedule
model_lstm(
(lstm0): LSTM(3, 320, batch_first=True)
(lstm1): LSTM(320, 320, batch_first=True)
(fc0): Linear(in_features=320, out_features=1, bias=True)
)
# Check the shape of the model's output
model(torch.rand(30, 8, 3).to(device)).shape
torch.Size([30, 1, 1])
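The CosineAnnealingLR schedule configured above decays the learning rate from its initial value toward 0 over $T_{max} = $ epochs steps, following the formula from the PyTorch documentation:

$$
\eta_t = \eta_{\min} + \frac{1}{2}\left(\eta_{\max} - \eta_{\min}\right)\left(1 + \cos\frac{T_{cur}}{T_{max}}\,\pi\right)
$$

With $\eta_{\max} = 0.1$, $\eta_{\min} = 0$ and $T_{max} = 50$ here, this matches the learning rates printed in the training log below (0.09990 after the first epoch, 0.00000 after the last).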
7. Defining the Training Function
def train(train_dl, model, loss_fn, opt, lr_scheduler=None):
    size = len(train_dl.dataset)   # size of the training set
    num_batches = len(train_dl)    # number of batches
    train_loss = 0                 # running training loss

    for x, y in train_dl:
        x, y = x.to(device), y.to(device)

        # Forward pass: compute the prediction error
        pred = model(x)           # network output
        loss = loss_fn(pred, y)   # gap between output and ground truth

        # Backward pass
        opt.zero_grad()   # clear accumulated gradients
        loss.backward()   # backpropagate
        opt.step()        # update the parameters

        # Record the loss
        train_loss += loss.item()

    if lr_scheduler is not None:
        lr_scheduler.step()
        print("learning rate = {:.5f}".format(opt.param_groups[0]['lr']), end=" ")

    train_loss /= num_batches
    return train_loss
8. Defining the Test Function
def test(dataloader, model, loss_fn):
    size = len(dataloader.dataset)  # size of the test set
    num_batches = len(dataloader)   # number of batches
    test_loss = 0

    # No training here, so disable gradient tracking to save memory and compute
    with torch.no_grad():
        for x, y in dataloader:
            x, y = x.to(device), y.to(device)

            # Compute the loss
            y_pred = model(x)
            loss = loss_fn(y_pred, y)
            test_loss += loss.item()

    test_loss /= num_batches
    return test_loss
9. Training Process
train_loss = []
test_loss = []

for epoch in range(epochs):
    model.train()
    epoch_train_loss = train(train_dl, model, loss_fn, opt, lr_scheduler)

    model.eval()
    epoch_test_loss = test(test_dl, model, loss_fn)

    train_loss.append(epoch_train_loss)
    test_loss.append(epoch_test_loss)

    template = 'Epoch:{:2d}, Train_loss:{:.5f}, Test_loss:{:.5f}'
    print(template.format(epoch + 1, epoch_train_loss, epoch_test_loss))

print("=" * 20, 'Done', "=" * 20)
learning rate = 0.09990 Epoch: 1, Train_loss:0.00133, Test_loss:0.01258
learning rate = 0.09961 Epoch: 2, Train_loss:0.01467, Test_loss:0.01221
learning rate = 0.09911 Epoch: 3, Train_loss:0.01437, Test_loss:0.01183
learning rate = 0.09843 Epoch: 4, Train_loss:0.01403, Test_loss:0.01142
learning rate = 0.09755 Epoch: 5, Train_loss:0.01363, Test_loss:0.01097
learning rate = 0.09649 Epoch: 6, Train_loss:0.01317, Test_loss:0.01046
learning rate = 0.09524 Epoch: 7, Train_loss:0.01262, Test_loss:0.00988
learning rate = 0.09382 Epoch: 8, Train_loss:0.01197, Test_loss:0.00924
learning rate = 0.09222 Epoch: 9, Train_loss:0.01120, Test_loss:0.00852
learning rate = 0.09045 Epoch:10, Train_loss:0.01032, Test_loss:0.00774
learning rate = 0.08853 Epoch:11, Train_loss:0.00933, Test_loss:0.00690
learning rate = 0.08645 Epoch:12, Train_loss:0.00826, Test_loss:0.00605
learning rate = 0.08423 Epoch:13, Train_loss:0.00712, Test_loss:0.00519
learning rate = 0.08187 Epoch:14, Train_loss:0.00598, Test_loss:0.00438
learning rate = 0.07939 Epoch:15, Train_loss:0.00488, Test_loss:0.00362
learning rate = 0.07679 Epoch:16, Train_loss:0.00387, Test_loss:0.00296
learning rate = 0.07409 Epoch:17, Train_loss:0.00298, Test_loss:0.00240
learning rate = 0.07129 Epoch:18, Train_loss:0.00224, Test_loss:0.00194
learning rate = 0.06841 Epoch:19, Train_loss:0.00165, Test_loss:0.00158
learning rate = 0.06545 Epoch:20, Train_loss:0.00120, Test_loss:0.00130
learning rate = 0.06243 Epoch:21, Train_loss:0.00087, Test_loss:0.00110
learning rate = 0.05937 Epoch:22, Train_loss:0.00063, Test_loss:0.00095
learning rate = 0.05627 Epoch:23, Train_loss:0.00047, Test_loss:0.00084
learning rate = 0.05314 Epoch:24, Train_loss:0.00035, Test_loss:0.00076
learning rate = 0.05000 Epoch:25, Train_loss:0.00027, Test_loss:0.00070
learning rate = 0.04686 Epoch:26, Train_loss:0.00022, Test_loss:0.00066
learning rate = 0.04373 Epoch:27, Train_loss:0.00018, Test_loss:0.00063
learning rate = 0.04063 Epoch:28, Train_loss:0.00016, Test_loss:0.00060
learning rate = 0.03757 Epoch:29, Train_loss:0.00014, Test_loss:0.00058
learning rate = 0.03455 Epoch:30, Train_loss:0.00013, Test_loss:0.00057
learning rate = 0.03159 Epoch:31, Train_loss:0.00012, Test_loss:0.00056
learning rate = 0.02871 Epoch:32, Train_loss:0.00012, Test_loss:0.00055
learning rate = 0.02591 Epoch:33, Train_loss:0.00011, Test_loss:0.00055
learning rate = 0.02321 Epoch:34, Train_loss:0.00011, Test_loss:0.00054
learning rate = 0.02061 Epoch:35, Train_loss:0.00011, Test_loss:0.00054
learning rate = 0.01813 Epoch:36, Train_loss:0.00011, Test_loss:0.00054
learning rate = 0.01577 Epoch:37, Train_loss:0.00012, Test_loss:0.00054
learning rate = 0.01355 Epoch:38, Train_loss:0.00012, Test_loss:0.00055
learning rate = 0.01147 Epoch:39, Train_loss:0.00012, Test_loss:0.00056
learning rate = 0.00955 Epoch:40, Train_loss:0.00013, Test_loss:0.00056
learning rate = 0.00778 Epoch:41, Train_loss:0.00013, Test_loss:0.00057
learning rate = 0.00618 Epoch:42, Train_loss:0.00013, Test_loss:0.00058
learning rate = 0.00476 Epoch:43, Train_loss:0.00013, Test_loss:0.00058
learning rate = 0.00351 Epoch:44, Train_loss:0.00014, Test_loss:0.00059
learning rate = 0.00245 Epoch:45, Train_loss:0.00014, Test_loss:0.00059
learning rate = 0.00157 Epoch:46, Train_loss:0.00014, Test_loss:0.00059
learning rate = 0.00089 Epoch:47, Train_loss:0.00014, Test_loss:0.00059
learning rate = 0.00039 Epoch:48, Train_loss:0.00014, Test_loss:0.00059
learning rate = 0.00010 Epoch:49, Train_loss:0.00014, Test_loss:0.00059
learning rate = 0.00000 Epoch:50, Train_loss:0.00014, Test_loss:0.00059
==================== Done ====================
10. Model Evaluation
# Loss curves
plt.figure(figsize=(5, 3), dpi=120)
plt.plot(train_loss, label='LSTM Training Loss')
plt.plot(test_loss, label='LSTM Validation Loss')
plt.title('Training and Validation Loss')
plt.legend()
plt.show()
11. Making Predictions with the Model
# Run the test set through the model, then undo the Tem1 scaling
predicted_y_lstm = sc.inverse_transform(model(X_test.to(device)).detach().cpu().numpy().reshape(-1, 1))
y_test_1 = sc.inverse_transform(y_test.reshape(-1, 1))
y_test_one = [i[0] for i in y_test_1]
predicted_y_lstm_one = [i[0] for i in predicted_y_lstm]

plt.figure(figsize=(5, 3), dpi=120)
# Compare the real and predicted curves
plt.plot(y_test_one[:2000], color='red', label='real_temp')
plt.plot(predicted_y_lstm_one[:2000], color='blue', label='prediction')
plt.title('Real vs. Predicted Temperature')
plt.xlabel('Time step')
plt.ylabel('Tem1')
plt.legend()
plt.show()
12. R² Evaluation
from sklearn import metrics

"""
RMSE : root mean squared error  -----> square root of the MSE
R2   : coefficient of determination; loosely, a statistic measuring goodness of fit
"""
# mean_squared_error is symmetric in its arguments, but r2_score is not:
# pass (y_true, y_pred) in that order
RMSE_lstm = metrics.mean_squared_error(y_test_1, predicted_y_lstm_one) ** 0.5
R2_lstm = metrics.r2_score(y_test_1, predicted_y_lstm_one)

print('RMSE: %.5f' % RMSE_lstm)
print('R2: %.5f' % R2_lstm)
RMSE: 6.89830
R2: 0.83397
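For reference, with $y_i$ the true values, $\hat{y}_i$ the predictions, and $\bar{y}$ the mean of the true values, the two metrics printed above are defined as:

$$
\mathrm{RMSE} = \sqrt{\frac{1}{n}\sum_{i=1}^{n}\left(y_i - \hat{y}_i\right)^2},
\qquad
R^2 = 1 - \frac{\sum_{i=1}^{n}\left(y_i - \hat{y}_i\right)^2}{\sum_{i=1}^{n}\left(y_i - \bar{y}\right)^2}
$$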