Deep Learning with LSTM: Predicting Photovoltaic Power Generation

Code 1: Training the LSTM Model

Step-by-Step Code Walkthrough

import numpy as np
import pandas as pd
import tensorflow.keras as tk
from tensorflow.keras import layers

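First, the required libraries are imported: numpy for numerical computation, pandas for data handling, and tensorflow.keras for building and training the neural network model.

Data Preprocessing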

# Part 1 - Data Preprocessing
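# Raw strings (r'...') keep the backslashes in the Windows paths from being read as escape sequences
input_data_file = r'D:\Python\python_code\pythonProject1\实训2RNN\data\JSGF001\附件4-测光数据.xls'
output_data_file = r'D:\Python\python_code\pythonProject1\实训2RNN\data\JSGF001\附件2-场站出力.xls'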
x_data = pd.read_excel(input_data_file, sheet_name='2019').iloc[2:, 1:].reset_index(drop=True).values
y_data = pd.read_excel(output_data_file, sheet_name='2019').iloc[1:, 1:].reset_index(drop=True).values

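Read the input and output data files, using pandas to parse the Excel sheets. x_data holds the irradiance measurement data (测光数据) and y_data holds the station power output data (场站出力).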
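To make the slicing step concrete, here is a minimal sketch of what `.iloc[2:, 1:].reset_index(drop=True).values` does. The small DataFrame is a hypothetical stand-in for a sheet: the real layout of the Excel files is not shown in this article, so the column names and the two leading non-data rows are assumptions made only for illustration.

```python
import pandas as pd

# Hypothetical stand-in for one Excel sheet: two leading non-data rows
# and a first column (time stamps) that should not be fed to the model.
df = pd.DataFrame({
    "time": ["header", "unit", "t0", "t1", "t2"],
    "irradiance": ["name", "W/m2", 100.0, 200.0, 300.0],
    "temperature": ["name", "C", 20.0, 21.0, 22.0],
})

# iloc[2:, 1:] drops the first two rows and the first column;
# reset_index only renumbers the rows; .values returns a NumPy array.
values = df.iloc[2:, 1:].reset_index(drop=True).values
numeric = values.astype("float64")

print(values.shape)
print(numeric)
```

Because the columns mix text and numbers, the array comes back with dtype object here, which is one reason a later `.astype(np.float64)` is useful.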

# Creating a data structure with 10 timesteps and 1 output
x_train = []
y_train = []
for i in range(10, x_data.shape[0]):
    x_train.append(x_data[i - 10:i])
    y_train.append(y_data[i])
x_train, y_train = np.array(x_train).astype(np.float64), np.array(y_train).astype(np.float64)

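Convert the data into a time-series structure: the inputs from the 10 preceding time steps are used to predict the output at the next time step. x_train and y_train hold the training inputs and targets, respectively.

Building and Training the LSTM Model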
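The windowing logic can be checked on synthetic data. The sketch below wraps the same loop in a helper, `make_windows` (a name introduced here for illustration, not part of the original scripts), and prints the resulting shapes. The 15 rows, 3 features, and 1 target column are made-up sizes.

```python
import numpy as np

def make_windows(x, y, timesteps=10):
    """Pair each window x[i-timesteps:i] with the target y[i]."""
    xs, ys = [], []
    for i in range(timesteps, x.shape[0]):
        xs.append(x[i - timesteps:i])
        ys.append(y[i])
    return np.array(xs, dtype=np.float64), np.array(ys, dtype=np.float64)

# Synthetic stand-in data: 15 time steps, 3 input features, 1 target column
x_demo = np.arange(45).reshape(15, 3)
y_demo = np.arange(15).reshape(15, 1)

x_win, y_win = make_windows(x_demo, y_demo)
print(x_win.shape)  # (samples, timesteps, features)
print(y_win.shape)  # (samples, targets)
```

With N rows and a window of 10, the loop yields N - 10 samples, so the first 10 targets are never predicted.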

# Part 2 - Building the RNN
# Initialising the RNN
model = tk.Sequential()
model.add(layers.LSTM(units=100, return_sequences=True))
model.add(layers.LSTM(units=100))
# Adding the output layer
model.add(layers.Dense(units=1))

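Initialize a sequential model and add two LSTM layers with 100 units each. return_sequences=True makes the first LSTM layer return its output at every time step, which a stacked LSTM needs because the next LSTM layer expects a sequence as input. Finally, a fully connected (Dense) layer with a single unit is added as the output layer.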

# Compiling the RNN
model.compile(optimizer=tk.optimizers.Adam(), loss=tk.losses.mse)
# Fitting the RNN to the Training set
model.fit(x_train, y_train, epochs=20, batch_size=128)
model.save("data/powerLSTM.keras")

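Compile the model with the Adam optimizer and the mean squared error (MSE) loss, then train it for 20 epochs with 128 samples per batch. After training, the model is saved to a file.

Code 2: Predicting with the Trained LSTM Model

Step-by-Step Code Walkthrough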
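As a rough illustration of the two quantities named above, the following sketch computes MSE by hand and the number of batches per epoch for a given sample count. The helper names and the sample count of 1000 are assumptions for the example. The real number of training samples depends on the Excel data.

```python
import math
import numpy as np

def mse(y_true, y_pred):
    """Mean squared error: the average of the squared differences."""
    y_true = np.asarray(y_true, dtype=np.float64)
    y_pred = np.asarray(y_pred, dtype=np.float64)
    return float(np.mean((y_true - y_pred) ** 2))

def steps_per_epoch(n_samples, batch_size):
    """Number of batches in one epoch; the last batch may be smaller."""
    return math.ceil(n_samples / batch_size)

loss = mse([1.0, 2.0, 3.0], [1.0, 2.5, 2.0])
steps = steps_per_epoch(1000, 128)  # 1000 is a made-up sample count
print(loss, steps)
```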

import numpy as np
import matplotlib.pyplot as plt
import pandas as pd
import tensorflow.keras as tk

model = tk.models.load_model("data/powerLSTM.keras")

Import the required libraries and load the previously trained LSTM model.

Data Preprocessing

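# Raw strings (r'...') keep the backslashes in the Windows paths from being read as escape sequences
input_data_file = r'D:\Python\python_code\pythonProject1\实训2RNN\data\JSGF001\附件4-测光数据.xls'
output_data_file = r'D:\Python\python_code\pythonProject1\实训2RNN\data\JSGF001\附件2-场站出力.xls'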
x_data = pd.read_excel(input_data_file, sheet_name='2020').iloc[2:202, 1:].reset_index(drop=True).values
y_data = pd.read_excel(output_data_file, sheet_name='2020').iloc[1:201, 1:].reset_index(drop=True).values
# Creating a data structure with 10 timesteps and 1 output
x_test = []
y_test = []
for i in range(10, x_data.shape[0]):
    x_test.append(x_data[i - 10:i])
    y_test.append(y_data[i])
x_test, y_test = np.array(x_test).astype(np.float64), np.array(y_test).astype(np.float64)

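Read the first 200 rows of the 2020 input and output data and apply the same preprocessing, converting the data into the same time-series structure.

Model Prediction and Result Visualization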

y_pred = model.predict(x_test)

Use the trained LSTM model to predict on the test data.

# Visualising the results
plt.plot(y_test, color='red', label='Real Power')
plt.plot(y_pred, color='blue', label='Predicted Power')
plt.title('Photovoltaics Prediction')
plt.xlabel('Time')
plt.ylabel('Power')
plt.legend()
plt.show()

# Mean absolute error (MAE) between the real and predicted power
print(np.abs(y_test - y_pred).sum() / len(y_test))

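Plot the real values against the predicted values, with the real power in red and the predicted power in blue. Then compute and print the mean absolute error (MAE) of the predictions.

Deep Learning Model: LSTM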
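The error metric can be sketched in isolation as follows. One pitfall worth guarding against: if the targets have shape (n,) while the predictions have shape (n, 1), NumPy broadcasting silently produces an (n, n) difference matrix. The helper `mean_absolute_error` below is written for this illustration and assumes a single target column, so both arrays are flattened first.

```python
import numpy as np

def mean_absolute_error(y_true, y_pred):
    """MAE for a single-output model; flattens both arrays before comparing."""
    y_true = np.asarray(y_true, dtype=np.float64).reshape(-1)
    y_pred = np.asarray(y_pred, dtype=np.float64).reshape(-1)
    if y_true.shape != y_pred.shape:
        raise ValueError("y_true and y_pred must contain the same number of values")
    return float(np.mean(np.abs(y_true - y_pred)))

y_true_demo = np.array([1.0, 2.0, 3.0])        # shape (3,)
y_pred_demo = np.array([[1.5], [2.0], [2.0]])  # shape (3, 1), like model.predict output

mae = mean_absolute_error(y_true_demo, y_pred_demo)
broadcast_shape = (y_true_demo - y_pred_demo).shape  # the pitfall: not (3,)
print(mae, broadcast_shape)
```

In the script above, y_test is already two-dimensional because it comes from a 2-D slice of the sheet, so the direct subtraction works as long as there is exactly one output column.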

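The model built in Code 1 is a long short-term memory (LSTM) network, a variant of the recurrent neural network (RNN). By introducing three gating mechanisms (the input gate, the forget gate, and the output gate), LSTM mitigates the difficulty standard RNNs have in learning long-term dependencies.

How LSTM Works

Input gate: controls how much of the new information at the current time step is written into the cell state.
Forget gate: controls how much of the previous cell state is retained.
Output gate: controls how much of the cell state is exposed as the hidden state, which is passed on to the next time step and to the next layer.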
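The three gates can be sketched as a single LSTM time step in NumPy. This is an illustrative sketch of the standard LSTM equations, not the Keras implementation: the weight layout (one stacked matrix split into four blocks), the random weights, and the sizes are all assumptions chosen for the example.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def lstm_step(x_t, h_prev, c_prev, W, U, b):
    """One LSTM step. W: (4*units, features), U: (4*units, units), b: (4*units,)."""
    units = h_prev.shape[0]
    z = W @ x_t + U @ h_prev + b
    i = sigmoid(z[0:units])                # input gate
    f = sigmoid(z[units:2 * units])        # forget gate
    g = np.tanh(z[2 * units:3 * units])    # candidate cell values
    o = sigmoid(z[3 * units:4 * units])    # output gate
    c_t = f * c_prev + i * g               # keep part of the old state, add new info
    h_t = o * np.tanh(c_t)                 # expose part of the cell state
    return h_t, c_t

rng = np.random.default_rng(0)
features, units, timesteps = 3, 4, 10      # illustrative sizes only
W = rng.normal(scale=0.1, size=(4 * units, features))
U = rng.normal(scale=0.1, size=(4 * units, units))
b = np.zeros(4 * units)
x_seq = rng.normal(size=(timesteps, features))

h, c = np.zeros(units), np.zeros(units)
all_h = []
for x_t in x_seq:
    h, c = lstm_step(x_t, h, c, W, U, b)
    all_h.append(h)
all_h = np.stack(all_h)

print(all_h.shape)  # hidden state at every time step
print(h.shape)      # hidden state at the last time step only
```

Conceptually, the stacked array of hidden states corresponds to what a layer with return_sequences=True passes on, while the final hidden state alone corresponds to the default behaviour.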

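Activation Functions

Activation functions are an essential part of deep learning models. They introduce non-linearity, which allows a neural network to learn and represent complex patterns.

Common Activation Functions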
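A small numerical check illustrates why non-linearity matters: without an activation function, two stacked linear layers are equivalent to a single linear layer. The matrices below are random placeholders, and biases are omitted to keep the sketch short.

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(size=(5, 3))    # 5 samples, 3 features
W1 = rng.normal(size=(3, 4))   # first "layer" weights
W2 = rng.normal(size=(4, 2))   # second "layer" weights

# Two linear layers with no activation in between...
two_linear = (x @ W1) @ W2
# ...equal one linear layer whose weight matrix is W1 @ W2.
one_linear = x @ (W1 @ W2)
collapses = np.allclose(two_linear, one_linear)

# Inserting ReLU between the layers breaks this equivalence.
with_relu = np.maximum(0.0, x @ W1) @ W2
differs = not np.allclose(with_relu, one_linear)

print(collapses, differs)
```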

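Sigmoid:

\sigma(x) = \frac{1}{1 + e^{-x}}

Outputs lie between 0 and 1. Commonly used in the output layer for binary classification.

Tanh:

\tanh(x) = \frac{e^x - e^{-x}}{e^x + e^{-x}}

Outputs lie between -1 and 1. Commonly used in hidden layers; because its output is zero-centered, it often works better there than Sigmoid.

ReLU (Rectified Linear Unit):

\text{ReLU}(x) = \max(0, x)

One of the most widely used activation functions, because it is cheap to compute and helps mitigate the vanishing gradient problem.

Leaky ReLU:

\text{Leaky ReLU}(x) = \begin{cases}
x & \text{if } x \ge 0 \\
\alpha x & \text{if } x < 0
\end{cases}

A variant of ReLU in which negative inputs are scaled by a small positive constant \alpha instead of being set to zero, which mitigates the "dying ReLU" problem.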
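The four functions above translate directly into NumPy. This is a minimal sketch. The value alpha=0.01 is a commonly used default assumed here, not something specified in this article.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def tanh(x):
    return np.tanh(x)

def relu(x):
    return np.maximum(0.0, x)

def leaky_relu(x, alpha=0.01):
    # alpha is the slope applied to negative inputs (assumed value)
    return np.where(x >= 0, x, alpha * x)

x = np.array([-2.0, 0.0, 2.0])
print(sigmoid(x))
print(tanh(x))
print(relu(x))
print(leaky_relu(x))
```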

Complete Code

# Code 1: training script
import numpy as np
import pandas as pd
import tensorflow.keras as tk
from tensorflow.keras import layers

# Part 1 - Data Preprocessing
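# Raw strings (r'...') keep the backslashes in the Windows paths from being read as escape sequences
input_data_file = r'D:\Python\python_code\pythonProject1\实训2RNN\data\JSGF001\附件4-测光数据.xls'
output_data_file = r'D:\Python\python_code\pythonProject1\实训2RNN\data\JSGF001\附件2-场站出力.xls'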
x_data = pd.read_excel(input_data_file, sheet_name='2019').iloc[2:, 1:].reset_index(drop=True).values
y_data = pd.read_excel(output_data_file, sheet_name='2019').iloc[1:, 1:].reset_index(drop=True).values
# Creating a data structure with 10 timesteps and 1 output
x_train = []
y_train = []
for i in range(10, x_data.shape[0]):
    x_train.append(x_data[i - 10:i])
    y_train.append(y_data[i])
x_train, y_train = np.array(x_train).astype(np.float64), np.array(y_train).astype(np.float64)

# Part 2 - Building the RNN
# Initialising the RNN
model = tk.Sequential()
model.add(layers.LSTM(units=100, return_sequences=True))
model.add(layers.LSTM(units=100))
# Adding the output layer
model.add(layers.Dense(units=1))
# Compiling the RNN
model.compile(optimizer=tk.optimizers.Adam(),
              loss=tk.losses.mse)
# Fitting the RNN to the Training set
model.fit(x_train, y_train, epochs=20, batch_size=128)

model.save("data/powerLSTM.keras")

# Code 2: prediction script (run separately, after the model has been trained and saved)
import numpy as np
import matplotlib.pyplot as plt
import pandas as pd
import tensorflow.keras as tk

model = tk.models.load_model("data/powerLSTM.keras")

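# Raw strings (r'...') keep the backslashes in the Windows paths from being read as escape sequences
input_data_file = r'D:\Python\python_code\pythonProject1\实训2RNN\data\JSGF001\附件4-测光数据.xls'
output_data_file = r'D:\Python\python_code\pythonProject1\实训2RNN\data\JSGF001\附件2-场站出力.xls'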
x_data = pd.read_excel(input_data_file, sheet_name='2020').iloc[2:202, 1:].reset_index(drop=True).values
y_data = pd.read_excel(output_data_file, sheet_name='2020').iloc[1:201, 1:].reset_index(drop=True).values
# Creating a data structure with 10 timesteps and 1 output
x_test = []
y_test = []
for i in range(10, x_data.shape[0]):
    x_test.append(x_data[i - 10:i])
    y_test.append(y_data[i])
x_test, y_test = np.array(x_test).astype(np.float64), np.array(y_test).astype(np.float64)

y_pred = model.predict(x_test)

# Visualising the results
plt.plot(y_test, color='red', label='Real Power')
plt.plot(y_pred, color='blue', label='Predicted Power')
plt.title('Photovoltaics Prediction')
plt.xlabel('Time')
plt.ylabel('Power')
plt.legend()
plt.show()

# Mean absolute error (MAE) between the real and predicted power
print(np.abs(y_test - y_pred).sum() / len(y_test))