This is my 407th original article.
1. Introduction
The Transformer was originally designed for NLP (natural language processing), but it is now widely used for time series forecasting as well. Bayesian optimization is a global optimization method based on a probabilistic surrogate model, used to find good parameters when each function evaluation is expensive (for example, training a neural network). This article walks through the implementation with a concrete example.
2. Implementation Process
2.1 Data Preparation
Core code:
import pandas as pd
import matplotlib.pyplot as plt

df = pd.read_csv('data.csv')  # read_csv already returns a DataFrame
# Convert the date column to datetime
df['Month'] = pd.to_datetime(df['Month'])
df.columns = ['time', 'value']
df = df[['value', 'time']]
print(df)
# Plot: overall trend and fluctuation of the time series
plt.plot(df["time"], df["value"], color='blue')
plt.title("Generated Time Series Data")
plt.xlabel("Time")
plt.ylabel("Value")
plt.show()
Data visualization:
2.2 Data Preprocessing
Core code:
# Split ratio
time_steps = len(df)
train_ratio = 0.8
train_size = int(time_steps * train_ratio)
train_df = df.iloc[:train_size].copy()  # .copy() avoids SettingWithCopyWarning below
test_df = df.iloc[train_size:].copy()
# Normalize (using training-set statistics only, to avoid leaking test information)
mean = train_df["value"].mean()
std = train_df["value"].std()
train_df["value_norm"] = (train_df["value"] - mean) / std
test_df["value_norm"] = (test_df["value"] - mean) / std
# Plot: train-test split of the series
plt.plot(train_df["time"], train_df["value"], label="Train", color='green')
plt.plot(test_df["time"], test_df["value"], label="Test", color='red')
plt.title("Train-Test Split of Time Series")
plt.xlabel("Time")
plt.ylabel("Value")
plt.legend()
plt.show()
Data visualization:
2.3 Building the Dataset
Core code:
from torch.utils.data import DataLoader

# Set window sizes
input_window = 2
output_window = 1
train_series = train_df["value_norm"].values
test_series = test_df["value_norm"].values
train_dataset = TimeSeriesDataset(train_series, input_window, output_window)
test_dataset = TimeSeriesDataset(test_series, input_window, output_window)
train_loader = DataLoader(train_dataset, batch_size=64, shuffle=True)
test_loader = DataLoader(test_dataset, batch_size=64, shuffle=False)
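The `TimeSeriesDataset` class used above is not listed among the core code. A minimal sliding-window implementation consistent with how it is called (`series, input_window, output_window`) might look like this; a sketch, not necessarily the author's exact version:

```python
import numpy as np
import torch
from torch.utils.data import Dataset

class TimeSeriesDataset(Dataset):
    """Slides a window over a 1-D series to build (input, target) pairs."""
    def __init__(self, series, input_window, output_window):
        self.series = np.asarray(series, dtype=np.float32)
        self.input_window = input_window
        self.output_window = output_window

    def __len__(self):
        # Number of complete (input, output) windows in the series
        return len(self.series) - self.input_window - self.output_window + 1

    def __getitem__(self, idx):
        x = self.series[idx : idx + self.input_window]
        y = self.series[idx + self.input_window : idx + self.input_window + self.output_window]
        # Shape (input_window, 1): each time step is a 1-dimensional feature vector
        return torch.from_numpy(x).unsqueeze(-1), torch.from_numpy(y)
```

Each sample is the previous `input_window` values as input and the next `output_window` values as the target, which matches the one-step-ahead setup used in the rest of the article.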
2.4 Building the Transformer Model
Core code:
import math
import torch
import torch.nn as nn

class TransformerTimeSeries(nn.Module):
    def __init__(self, input_dim=1, d_model=64, nhead=4, num_layers=2,
                 dim_feedforward=128, dropout=0.1, output_dim=1):
        super(TransformerTimeSeries, self).__init__()
        self.d_model = d_model
        self.input_proj = nn.Linear(input_dim, d_model)
        self.pos_encoder = PositionalEncoding(d_model)
        encoder_layer = nn.TransformerEncoderLayer(d_model=d_model, nhead=nhead,
                                                   dim_feedforward=dim_feedforward,
                                                   dropout=dropout)
        self.transformer_encoder = nn.TransformerEncoder(encoder_layer, num_layers=num_layers)
        self.decoder = nn.Linear(d_model, output_dim)

    def forward(self, src):
        # src shape: (batch_size, seq_len, input_dim)
        src = self.input_proj(src) * math.sqrt(self.d_model)  # linear projection and scaling
        src = self.pos_encoder(src)
        src = src.permute(1, 0, 2)  # the encoder expects (seq_len, batch_size, d_model)
        output = self.transformer_encoder(src)
        output = output[-1, :, :]  # take the output at the last time step
        output = self.decoder(output)  # (batch_size, output_dim)
        return output
A lightweight time-series Transformer: the core is an encoder stack plus a linear prediction head.
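The `PositionalEncoding` module referenced in `__init__` is likewise not shown. A standard sinusoidal encoding in the style of the original Transformer paper, written batch-first to match the `(batch_size, seq_len, d_model)` tensor it receives here, could serve; this is an assumed implementation:

```python
import math
import torch
import torch.nn as nn

class PositionalEncoding(nn.Module):
    """Sinusoidal positional encoding, applied to batch-first input."""
    def __init__(self, d_model, dropout=0.1, max_len=5000):
        super().__init__()
        self.dropout = nn.Dropout(p=dropout)
        position = torch.arange(max_len).unsqueeze(1).float()
        div_term = torch.exp(torch.arange(0, d_model, 2).float()
                             * (-math.log(10000.0) / d_model))
        pe = torch.zeros(max_len, d_model)
        pe[:, 0::2] = torch.sin(position * div_term)  # even dims: sine
        pe[:, 1::2] = torch.cos(position * div_term)  # odd dims: cosine
        # Buffer: moves with .to(device) but is not a trainable parameter
        self.register_buffer("pe", pe.unsqueeze(0))  # (1, max_len, d_model)

    def forward(self, x):
        # x: (batch_size, seq_len, d_model)
        x = x + self.pe[:, : x.size(1), :]
        return self.dropout(x)
```

Because the encoding is added before the `permute` in `forward`, the batch-first layout here is what the model above requires.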
2.5 Hyperparameter Tuning with Bayesian Optimization
Core code:
import optuna

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
study = optuna.create_study(direction="minimize")
study.optimize(objective, n_trials=30)
print("Best trial:")
trial = study.best_trial
print(f"Value: {trial.value}")
print("Params: ")
for key, value in trial.params.items():
    print(f"{key}: {value}")
best_params = study.best_params
final_model = TransformerTimeSeries(
    d_model=best_params["d_model"],
    nhead=best_params["nhead"],
    num_layers=best_params["num_layers"],
    dim_feedforward=best_params["dim_feedforward"],
    dropout=best_params["dropout"]
).to(device)
Optimization process:
Final parameters:
2.6 Training the Final Model with the Best Hyperparameters
Core code:
criterion = nn.MSELoss()
optimizer = torch.optim.Adam(final_model.parameters(), lr=best_params["lr"])
batch_size = best_params["batch_size"]
train_loader = DataLoader(train_dataset, batch_size=batch_size, shuffle=True)
test_loader = DataLoader(test_dataset, batch_size=batch_size, shuffle=False)
epochs = 80
train_losses = []
val_losses = []
for epoch in range(epochs):
    train_loss = train_one_epoch(final_model, optimizer, criterion, train_loader, device)
    val_loss, preds, trues = evaluate(final_model, criterion, test_loader, device)
    train_losses.append(train_loss)
    val_losses.append(val_loss)
    print(f"Epoch {epoch+1}: train loss={train_loss:.4f}, val loss={val_loss:.4f}")
Training process:
2.7 Visualization and Analysis
1. Training and validation loss curves
plt.plot(range(1, epochs+1), train_losses, label="Train Loss", color='blue')
plt.plot(range(1, epochs+1), val_losses, label="Validation Loss", color='orange')
plt.xlabel("Epoch")
plt.ylabel("MSE Loss")
plt.title("Training and Validation Loss Over Epochs")
plt.legend()
plt.show()
Shows the training dynamics: check whether the loss converges and whether the model overfits.
2. Predicted vs. true values (test set)
plt.plot(trues, label="True Values", color='green')
plt.plot(preds, label="Predicted Values", color='red', alpha=0.7)
plt.title("True vs Predicted Values on Test Set")
plt.xlabel("Time Step")
plt.ylabel("Normalized Value")
plt.legend()
plt.show()
A direct comparison of predictions against the ground truth, to judge the model's accuracy.
3. Histogram of residuals (prediction errors)
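Note that the plot above is in normalized units. To report errors on the original scale, invert the z-score from section 2.2 using the training-set `mean` and `std`; a small helper (hypothetical, not part of the original code):

```python
import numpy as np

def denormalize(values, mean, std):
    # Inverse of the z-score from section 2.2: x_norm = (x - mean) / std
    return np.asarray(values) * std + mean

# e.g. preds_orig = denormalize(preds, mean, std)
```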
import seaborn as sns

residuals = trues - preds  # assumes evaluate() returned NumPy arrays
sns.histplot(residuals, bins=30, kde=True, color='purple')
plt.title("Residuals Distribution on Test Set")
plt.xlabel("Residual (True - Predicted)")
plt.ylabel("Frequency")
plt.show()
Checks whether the errors are roughly normally distributed; unbiased, tightly concentrated residuals indicate a good fit.
4. Bayesian optimization search trajectory (learning rate vs. loss)
lr_values = [trial.params["lr"] for trial in study.trials]
loss_values = [trial.value for trial in study.trials]
plt.scatter(lr_values, loss_values, c=loss_values, cmap='viridis', s=80, alpha=0.8)
plt.xscale('log')
plt.colorbar(label='Validation Loss')
plt.xlabel("Learning Rate (log scale)")
plt.ylabel("Validation Loss")
plt.title("Bayesian Optimization: Learning Rate vs Validation Loss")
plt.show()
Shows the relationship between learning rate and validation error during tuning, which helps reveal which settings perform well.
About the author:
I published 6 SCI papers on data mining during graduate school and now work on data algorithms at a research institute. Drawing on my own research experience, I share fundamentals and applied case studies in Python, machine learning, deep learning, and artificial intelligence. I am committed to original content and to understanding things in the simplest way possible; follow along and let's grow together. For the dataset and source code, follow the official account at the bottom and add the author on WeChat.