Bidirectional Temporal Convolutional Network
Bidirectional Temporal Convolutional Network (BiTCN) is a forecasting architecture based on two temporal convolutional networks (TCNs). The first network ("forward") encodes future covariates of the time series, whereas the second network ("backward") encodes past observations and covariates. This design preserves the temporal information of sequence data and is computationally more efficient than common RNN methods (LSTM, GRU, ...). Compared with Transformer-based methods, BiTCN has lower space complexity, that is, it requires orders of magnitude fewer parameters.
References
- Olivier Sprangers, Sebastian Schelter, Maarten de Rijke (2023). Parameter-Efficient Deep Probabilistic Forecasting. International Journal of Forecasting, 39(1), 332–345. URL: https://doi.org/10.1016/j.ijforecast.2021.11.011.
- Shaojie Bai, Zico Kolter, Vladlen Koltun (2018). An Empirical Evaluation of Generic Convolutional and Recurrent Networks for Sequence Modeling. Computing Research Repository, abs/1803.01271. URL: https://arxiv.org/abs/1803.01271.
- van den Oord, A., Dieleman, S., Zen, H., Simonyan, K., Vinyals, O., Graves, A., Kalchbrenner, N., Senior, A. W., & Kavukcuoglu, K. (2016). WaveNet: A generative model for raw audio. Computing Research Repository, abs/1609.03499. URL: http://arxiv.org/abs/1609.03499.
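Preface
Series: Deep Learning: Hands-On Algorithm Projects
The series covers fields including healthcare, finance, retail, food and beverage, sports and fitness, transportation, environmental science, social media, and text and image processing. It discusses a range of deep neural network ideas, such as convolutional neural networks, recurrent neural networks, generative adversarial networks, gated recurrent units, long short-term memory, natural language processing, deep reinforcement learning, large language models, and transfer learning.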
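BiTCN (Bidirectional Temporal Convolutional Network) is built around a two-direction convolution design. Where a conventional TCN reads a series in a single temporal direction, BiTCN combines two networks: one looks backward over past observations and covariates, and the other looks forward over future covariates. For time series such as electric load, this lets the model pick up patterns on both sides of the forecast point, for example the gradual rise toward a daytime peak and the gradual decline in overnight consumption. Drawing on both directions widens the information available to the model and reduces the risk of missing features that a single-direction view would overlook.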
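Internally, each network is a stack of temporal convolution layers whose receptive field grows with depth, extracting features from local to global scale. Layers with a narrow receptive field focus on fine-grained fluctuations and short-cycle consumption patterns, while layers with a wide receptive field trace broader trends such as seasonal swings in load. The model summary printed during training in section 5 lists no pooling layers; the reduction from the input window to the forecast horizon appears to be handled by linear layers instead, which keeps the computation light without discarding the features the convolutions have extracted.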
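To make the local-versus-global idea concrete, here is a minimal sketch of how a stack of dilated causal convolutions, the building block described in the TCN paper by Bai et al. listed above, widens the window of past steps that each output can see. The kernel size and the doubling dilation schedule are illustrative assumptions, not the settings NeuralForecast uses internally.

```python
def receptive_field(kernel_size, dilations):
    """Number of input steps that influence one output of a stacked causal conv."""
    return 1 + sum((kernel_size - 1) * d for d in dilations)


def causal_dilated_conv(series, weights, dilation):
    """Causal 1-D convolution: output[t] uses series[t], series[t - dilation], ...

    Positions before the start of the series are treated as zeros (left padding).
    """
    out = []
    for t in range(len(series)):
        total = 0.0
        for k, w in enumerate(weights):
            idx = t - k * dilation
            if idx >= 0:
                total += w * series[idx]
        out.append(total)
    return out


# Hypothetical schedule: kernel size 2, dilation doubling at every layer.
dilations = [1, 2, 4, 8]
for depth in range(1, len(dilations) + 1):
    print(depth, receptive_field(2, dilations[:depth]))

# An impulse at t=0 shows which later outputs a single layer lets it reach.
impulse = [1.0] + [0.0] * 7
print(causal_dilated_conv(impulse, weights=[1.0, 1.0], dilation=4))
```

With these assumed settings the receptive field doubles per layer (2, 4, 8, 16 steps), which is how a shallow stack can cover both short cycles and long trends.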
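Contents
- 1. Dataset overview
- 2. Data preprocessing
- 3. Data visualization
- 4. Building the model
- 5. Cross-validation
- 6. Model predictions
- 7. Regression fit plot
- 8. Model evaluation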
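1. Dataset overview
The dataset used in this article is ETTh1.csv. ETTh1 is part of the Electricity Transformer Dataset (ETDataset), which is intended for research on long-sequence time series forecasting. The dataset collects two years of data from two separate counties in China in order to forecast electricity demand in specific regions.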
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns  # required by sns.regplot in section 7
from neuralforecast.core import NeuralForecast
from neuralforecast.models import BiTCN
from neuralforecast.losses.pytorch import MAE
from neuralforecast.losses.numpy import mae, mse, mape, rmse
from datasetsforecast.long_horizon import LongHorizon
# Change this to your own data to try the model
Y_df, X_df, _ = LongHorizon.load(directory='./', group='ETTh1')
2. Data preprocessing
Y_df['ds'] = pd.to_datetime(Y_df['ds'])
3. Data visualization
plt.style.use('ggplot')
plt.plot(Y_df['y'], color='darkorange', label='Trend')
plt.legend()  # without this call the label is never displayed
plt.show()
n_time = len(Y_df.ds.unique())
val_size = int(.2 * n_time)
test_size = int(.2 * n_time)
Y_df.groupby('unique_id').head(5)
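The split sizes above reserve the last 20% of timestamps for testing and the 20% before that for validation. A minimal sketch of that chronological layout follows; the series length of 1,000 steps and the helper name chronological_split are assumptions for illustration, not the real ETTh1 length or a library function.

```python
def chronological_split(n_time, val_frac=0.2, test_frac=0.2):
    """Return (train, val, test) index ranges; order in time is preserved."""
    val_size = int(val_frac * n_time)
    test_size = int(test_frac * n_time)
    train_end = n_time - val_size - test_size
    val_end = n_time - test_size
    return range(0, train_end), range(train_end, val_end), range(val_end, n_time)


train, val, test = chronological_split(1000)  # 1000 is an illustrative length
print(len(train), len(val), len(test))
print(train[-1], val[0], val[-1], test[0])
```

No shuffling takes place: the validation block always follows the training block, and the test block comes last, so the model is never evaluated on data older than what it was trained on.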
4. Building the model
nf = NeuralForecast(
    models=[
        BiTCN(
            h=1,                           # Forecasting horizon
            input_size=24,                 # Input size
            hidden_size=64,                # Units for the TCN's hidden state size
            dropout=0.5,
            loss=MAE(),
            valid_loss=MAE(),
            max_steps=1000,                # Number of training iterations
            learning_rate=1e-3,
            num_lr_decays=-1,
            early_stop_patience_steps=-1,
            val_check_steps=100,           # Compute validation loss every 100 steps
            batch_size=128,
            random_seed=1234,
        ),
    ],
    freq='H'
)
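With input_size = 24 and h = 1, each training sample pairs 24 consecutive hourly values with the single value that follows them. A minimal sketch of that windowing in plain Python; the helper name make_windows is hypothetical, and this is not how NeuralForecast samples windows internally.

```python
def make_windows(series, input_size, h):
    """Slide over the series and return (inputs, targets) pairs."""
    pairs = []
    for start in range(len(series) - input_size - h + 1):
        inputs = series[start:start + input_size]
        targets = series[start + input_size:start + input_size + h]
        pairs.append((inputs, targets))
    return pairs


hourly = list(range(30))  # stand-in for 30 hourly observations
pairs = make_windows(hourly, input_size=24, h=1)
print(len(pairs))                     # number of (input, target) windows
print(pairs[0][0][-1], pairs[0][1])   # last input value and its target
```

A larger h would simply lengthen each target slice, while a larger input_size gives the model more history per sample at the cost of fewer windows.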
5. Cross-validation
The cross_validation method returns the model's forecasts on the test set.
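Conceptually, the evaluation rolls a cutoff through the test period: at each cutoff the model sees data up to that point and forecasts the next h steps. The sketch below only enumerates those cutoffs by index. The function name rolling_cutoffs and the step size of 1 are assumptions for illustration, so consult the NeuralForecast documentation for the exact windowing rules.

```python
def rolling_cutoffs(n_time, test_size, h, step_size=1):
    """List (cutoff_index, forecast_indices) pairs covering the test period."""
    cutoff = n_time - test_size - 1   # last index before the test period
    windows = []
    while cutoff + h <= n_time - 1:
        windows.append((cutoff, list(range(cutoff + 1, cutoff + h + 1))))
        cutoff += step_size
    return windows


# Tiny example: 10 time steps, the last 4 held out, one-step-ahead forecasts.
for cutoff, targets in rolling_cutoffs(n_time=10, test_size=4, h=1):
    print(cutoff, targets)
```

Each row of the returned forecast frame carries a cutoff column, which is the timestamp counterpart of the cutoff index in this sketch.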
Y_hat_df = nf.cross_validation(
    df=Y_df,
    val_size=val_size,
    test_size=test_size,
    n_windows=None,
)
   | Name          | Type          | Params | Mode
---------------------------------------------------------
0  | loss          | MAE           | 0      | train
1  | valid_loss    | MAE           | 0      | train
2  | padder_train  | ConstantPad1d | 0      | train
3  | scaler        | TemporalNorm  | 0      | train
4  | lin_hist      | Linear        | 128    | train
5  | drop_hist     | Dropout       | 0      | train
6  | net_bwd       | Sequential    | 82.9 K | train
7  | drop_temporal | Dropout       | 0      | train
8  | temporal_lin1 | Linear        | 1.6 K  | train
9  | temporal_lin2 | Linear        | 65     | train
10 | output_lin    | Linear        | 65     | train
---------------------------------------------------------
84.7 K Trainable params
0 Non-trainable params
84.7 K Total params
0.339 Total estimated model params size (MB)
31 Modules in train mode
0 Modules in eval mode
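The parameter counts of the linear layers in this summary can be checked by hand: a fully connected layer with in_features inputs and out_features outputs has in_features * out_features weights plus out_features biases. The layer shapes below are inferred from the configuration in section 4 (one input feature, input_size = 24, hidden_size = 64, h = 1); they are an assumption, not read from the library source.

```python
def linear_params(in_features, out_features):
    """Weights plus biases of a fully connected layer."""
    return in_features * out_features + out_features


# Assumed shapes, chosen to match the configuration in section 4.
assumed_layers = {
    "lin_hist": (1, 64),        # one target feature -> hidden_size
    "temporal_lin1": (24, 64),  # input_size -> hidden_size
    "temporal_lin2": (64, 1),   # hidden_size -> h
    "output_lin": (64, 1),      # hidden_size -> one output for MAE
}
for name, (n_in, n_out) in assumed_layers.items():
    print(name, linear_params(n_in, n_out))
```

Under these assumptions the counts come to 128, 1,600, 65 and 65, in line with the 128, 1.6 K, 65 and 65 reported above; the remaining 82.9 K parameters sit in the convolutional stack net_bwd.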
Y_hat_df.head()
6. Model predictions
Y_plot = Y_hat_df.copy() # OT dataset
cutoffs = Y_hat_df['cutoff'].unique()[::1]
Y_plot = Y_plot[Y_hat_df['cutoff'].isin(cutoffs)]
plt.figure(figsize=(20,5))
plt.plot(Y_plot['ds'], Y_plot['y'], label='True')
plt.plot(Y_plot['ds'], Y_plot['BiTCN'], label='BiTCN')
plt.xlabel('Datestamp')
plt.ylabel('OT')
plt.grid()
plt.legend()
plt.savefig('BiTCN.png')
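7. Regression fit plot
Use the regplot() function to draw a scatter plot of the data with a linear regression line fitted between the predicted and true values.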
plt.figure(figsize=(5, 5), dpi=100)
sns.regplot(x=Y_plot['y'], y=Y_plot['BiTCN'], scatter=True, marker="*", color='orange', line_kws={'color': 'red'})
plt.show()
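The red line in this plot is an ordinary least-squares fit of the predictions on the true values; a slope near 1 and an intercept near 0 indicate good agreement. A minimal sketch of that fit on made-up numbers (not results from this model), with a hypothetical helper name:

```python
def least_squares_line(x, y):
    """Slope and intercept of the ordinary least-squares line y = a*x + b."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


y_true = [1.0, 2.0, 3.0, 4.0]   # illustrative values only
y_pred = [1.5, 2.5, 3.5, 4.5]   # every prediction is 0.5 too high
slope, intercept = least_squares_line(y_true, y_pred)
print(slope, intercept)
```

In this toy case the slope is 1 but the intercept is 0.5, which is how a constant bias in the forecasts would show up in the fitted line.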
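8. Model evaluation
The code below uses several common evaluation metrics to measure forecast performance: mean absolute error (MAE), mean absolute percentage error (MAPE), mean squared error (MSE), and root mean squared error (RMSE). We call the mae, mse, mape, and rmse functions from the neuralforecast.losses.numpy module to evaluate the model's predictions.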
# Store results under new names so the imported metric functions are not overwritten
mae_value = mae(Y_hat_df['y'], Y_hat_df['BiTCN'])
print(f"MAE: {mae_value:.4f}")
mape_value = mape(Y_hat_df['y'], Y_hat_df['BiTCN'])
print(f"MAPE: {mape_value * 100:.4f}%")
mse_value = mse(Y_hat_df['y'], Y_hat_df['BiTCN'])
print(f"MSE: {mse_value:.4f}")
rmse_value = rmse(Y_hat_df['y'], Y_hat_df['BiTCN'])
print(f"RMSE: {rmse_value:.4f}")
MAE: 0.1239
MAPE: 8.9629%
MSE: 0.0209
RMSE: 0.1444
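For reference, the four metrics reduce to a few lines each. The plain-Python versions below follow the textbook definitions, with MAPE returned as a fraction, which is why the code above multiplies it by 100. They are a sketch for intuition with made-up inputs and hypothetical function names, not the library implementations.

```python
import math


def mean_absolute_error(y, y_hat):
    return sum(abs(a - b) for a, b in zip(y, y_hat)) / len(y)


def mean_squared_error(y, y_hat):
    return sum((a - b) ** 2 for a, b in zip(y, y_hat)) / len(y)


def root_mean_squared_error(y, y_hat):
    return math.sqrt(mean_squared_error(y, y_hat))


def mean_absolute_percentage_error(y, y_hat):
    # Undefined when a true value is 0; the sample data avoids that case.
    return sum(abs(a - b) / abs(a) for a, b in zip(y, y_hat)) / len(y)


y_true = [2.0, 4.0, 5.0]   # illustrative values only
y_pred = [1.0, 4.0, 7.0]
print(mean_absolute_error(y_true, y_pred), mean_squared_error(y_true, y_pred))
print(root_mean_squared_error(y_true, y_pred), mean_absolute_percentage_error(y_true, y_pred))
```

As a sanity check on the reported numbers, 0.1444 squared is about 0.0209, consistent with RMSE being the square root of MSE.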