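This article is reposted from the WeChat official account "Coggle数据科学" (Coggle Data Science) for academic sharing only, and will be removed on request in case of infringement.
Original article: iFLYTEK (科大讯飞) AI Competition: Renewable Energy Power Forecasting Challenge Baseline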
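- Competition: Renewable Energy Power Forecasting Challenge (新能源发电功率预测挑战赛)
- Type: data mining
- Task: forecast the power output of renewable energy stations

https://challenge.xfyun.cn/topic/info?type=renewable-power-forecast&ch=dwsf259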
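Background
Wind and solar generation depend on weather and environmental conditions, which makes them intermittent and volatile energy sources. Accurate forecasts help the grid cope with this volatility and uncertainty, keep key indicators such as frequency, voltage, and load balance stable, reduce the risk of outages, and improve the economics and reliability of the power system.
Grid companies assess the power forecasts of renewable plants through their dispatch centers and fine plants whose forecasts miss the required standard, so power forecasting is a hard requirement for renewable stations. As the share of renewables grows and smart-grid technology matures, power forecasting will play an increasingly important role in grid management.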
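Task
Using historical power data and the weather forecasts for the corresponding periods, forecast each renewable station's power output for the next 2 months at 15-minute resolution (96 points per day).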
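The horizon fixes how many rows each station's forecast must contain. A quick pandas check, assuming the forecast period is 2025-01-01 00:00 to 2025-02-28 23:45 in 15-minute steps (the date range given in the submission rules, section 1.4):

```python
import pandas as pd

# 15-minute timestamps covering the two-month forecast horizon (Beijing time).
horizon = pd.date_range("2025-01-01 00:00", "2025-02-28 23:45", freq="15min")

points_per_day = 24 * 60 // 15          # 96 points per day
n_days = horizon.normalize().nunique()  # 31 (January) + 28 (February) = 59 days

print(points_per_day, n_days, len(horizon))  # 96 59 5664
```

So each output file must contain 96 × 59 = 5664 rows.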
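Data
1.1 Weather data
The input data: each wind/PV station has forecasts from three different weather sources, source 1 (NWP_1), source 2 (NWP_2), and source 3 (NWP_3), at 15-minute intervals. The weather variables are: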
Field | Type | Description |
---|---|---|
time | DATETIME | Time |
direct_radiation | DOUBLE | Forecast direct radiation (W/m²) |
wind_direction_80m | DOUBLE | Forecast wind direction at 80 m (°) |
wind_speed_80m | DOUBLE | Forecast wind speed at 80 m (m/s) |
temperature_2m | DOUBLE | Forecast temperature at 2 m (℃) |
relative_humidity_2m | DOUBLE | Forecast relative humidity at 2 m (%) |
precipitation | DOUBLE | Forecast precipitation (mm) |
Note: teams do not have to use every variable from every weather source as input.
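Because every station has three forecasts of the same variables, one common idea (not part of the baseline, offered only as a sketch) is to add a consensus feature per variable. The helper `add_source_mean` below is hypothetical; it assumes weather columns are named `NWP_<k>_<variable>`, the prefix convention the baseline code uses. Wind direction is averaged as a unit vector, since a plain mean of 350° and 10° would give 180° instead of 0°.

```python
import numpy as np
import pandas as pd

VARIABLES = ["direct_radiation", "wind_direction_80m", "wind_speed_80m",
             "temperature_2m", "relative_humidity_2m", "precipitation"]

def add_source_mean(df, sources=(1, 2, 3)):
    """Add NWP_mean_<var>: the per-timestamp mean of <var> over the available sources.

    Assumes columns named NWP_<k>_<var>; variables with no such column are skipped.
    """
    out = df.copy()
    for var in VARIABLES:
        cols = [f"NWP_{k}_{var}" for k in sources if f"NWP_{k}_{var}" in out.columns]
        if not cols:
            continue
        if var == "wind_direction_80m":
            # Circular mean: average the sin/cos components, convert back to degrees.
            rad = np.deg2rad(out[cols].to_numpy(dtype=float))
            mean_dir = np.rad2deg(np.arctan2(np.sin(rad).mean(axis=1), np.cos(rad).mean(axis=1)))
            out[f"NWP_mean_{var}"] = mean_dir % 360
        else:
            out[f"NWP_mean_{var}"] = out[cols].mean(axis=1)
    return out

# Toy frame with made-up values: three wind-speed sources, two wind-direction sources.
demo = add_source_mean(pd.DataFrame({
    "NWP_1_wind_speed_80m": [4.0, 6.0], "NWP_2_wind_speed_80m": [6.0, 8.0],
    "NWP_3_wind_speed_80m": [5.0, 7.0],
    "NWP_1_wind_direction_80m": [350.0, 80.0], "NWP_2_wind_direction_80m": [10.0, 100.0],
}))
```

Whether such averages help is an empirical question; LightGBM can also use the three sources directly, as the baseline does.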
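1.2 Power data
The target data is the measured power output of 2 renewable stations: 1 wind farm and 1 PV plant. Station 1 is the wind farm and station 2 is the PV plant. Times are in Beijing time at 15-minute intervals. Note that the data occasionally contains anomalies such as missing values and dead (stuck) values. The power variables are: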
Field | Type | Description |
---|---|---|
time | DATETIME | Time |
real_power | DOUBLE | Measured power output (MW) |
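Missing values are easy to find with `isna()`; dead values usually show up as the same reading repeated for many consecutive intervals. A minimal sketch of flagging both, assuming a "stuck" reading means a non-zero value repeated for at least `min_run` steps. The helper name and the default threshold are hypothetical, not competition rules; zero runs are left alone because a PV plant legitimately outputs 0 at night.

```python
import pandas as pd

def flag_power_anomalies(power, min_run=8):
    """Flag missing and stuck readings in the real_power column.

    min_run=8 (two hours at 15-minute steps) is an arbitrary illustrative threshold.
    """
    out = power.sort_values("time").reset_index(drop=True)
    s = out["real_power"]
    run_id = (s != s.shift()).cumsum()           # new run whenever the value changes (each NaN is its own run)
    run_len = run_id.map(run_id.value_counts())  # length of the run each row belongs to
    out["is_missing"] = s.isna()
    out["is_stuck"] = s.notna() & (s != 0) & (run_len >= min_run)
    return out

# Toy series: one missing value, a 4-step run of 5.0, and a 4-step run of zeros.
demo = flag_power_anomalies(pd.DataFrame({
    "time": pd.date_range("2024-01-01", periods=12, freq="15min"),
    "real_power": [1.0, 2.0, None, 5.0, 5.0, 5.0, 5.0, 0.0, 0.0, 0.0, 0.0, 3.0],
}), min_run=4)
```

Flagged rows can then be dropped from training or set to NaN; which choice works better has to be checked on validation data.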
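1.3 Dataset
Training and test sets (the weather forecasts extend two months past the end of the power data; those two months, 20250101 ~ 20250228, are the test period):
Dataset | Time range | Content |
---|---|---|
Weather data | 20240101 ~ 20250228 | Historical weather forecasts, 15-min resolution |
Power data | 20240101 ~ 20241231 | Historical measured power, 15-min resolution |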
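1.4 Submission
For both stations, submit CSV files with the predicted power for 20250101 ~ 20250228. Each file has 96×D rows (24 hours per day at 15-minute resolution, over D days; here D = 59). The index is the Beijing-time timestamp with column name 'time', and the predicted power column is named 'predict_power', as in the example below. Files are named after the station (output1.csv, output2.csv), and all outputs are compressed into a single file, output.zip, which is submitted to the website for evaluation.
Example of output1.csv: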
time | predict_power |
---|---|
2025/1/1 0:00 | 28.76940995 |
2025/1/1 0:15 | 28.77233067 |
2025/1/1 0:30 | 28.85974633 |
2025/1/1 0:45 | 29.1289809 |
2025/1/1 1:00 | 29.74164174 |
2025/1/1 1:15 | 30.59554275 |
2025/1/1 1:30 | 31.59146121 |
... | ... |
Note: for the exact submission format, see the provided sample output.zip.
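If you build the files yourself rather than filling in the provided template, a sketch along these lines writes both CSVs and packs them into output.zip. `write_submission` is a hypothetical helper. Pandas writes timestamps as `2025-01-01 00:00:00`, while the example above shows `2025/1/1 0:00`, so check the sample output.zip to see whether the grader accepts that format before relying on it.

```python
import zipfile

import numpy as np
import pandas as pd

def write_submission(pred1, pred2, zip_path="output.zip"):
    """Write output1.csv (station 1, wind) and output2.csv (station 2, PV), then zip them."""
    time_index = pd.date_range("2025-01-01 00:00", "2025-02-28 23:45", freq="15min")
    names = []
    for name, pred in (("output1.csv", pred1), ("output2.csv", pred2)):
        pred = np.asarray(pred, dtype=float)
        if len(pred) != len(time_index):
            raise ValueError(f"{name}: expected {len(time_index)} rows, got {len(pred)}")
        # 'time' column holds the Beijing-time timestamps, 'predict_power' the forecast.
        pd.DataFrame({"time": time_index, "predict_power": pred}).to_csv(name, index=False)
        names.append(name)
    with zipfile.ZipFile(zip_path, "w", compression=zipfile.ZIP_DEFLATED) as zf:
        for name in names:
            zf.write(name, arcname=name)
    return names
```

The length check catches the most common mistake: predicting for the wrong number of test rows.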
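Evaluation metric
Accuracy is computed separately for each station on each day, and a station's daily values are averaged over all evaluation days to give its overall accuracy. The final score is the mean of the wind and PV stations' accuracies.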
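The article does not give the per-day accuracy formula, so the sketch below only mirrors the aggregation just described: one score per station per day, mean over days, then mean over the two stations. The daily formula used here, 1 − RMSE / capacity, and the `capacity` arguments are hypothetical placeholders; substitute the official definition from the competition page.

```python
import numpy as np
import pandas as pd

def daily_accuracy(df, capacity):
    """HYPOTHETICAL per-day metric (1 - RMSE / capacity), one value per calendar day.

    df needs columns time, real_power, predict_power.
    """
    day = pd.to_datetime(df["time"]).dt.normalize()
    sq_err = (df["real_power"] - df["predict_power"]) ** 2
    return 1.0 - np.sqrt(sq_err.groupby(day).mean()) / capacity

def station_accuracy(df, capacity):
    """Station score: mean of the daily accuracies over all evaluation days."""
    return daily_accuracy(df, capacity).mean()

def final_accuracy(wind_df, pv_df, wind_capacity, pv_capacity):
    """Final score: mean of the wind-station and PV-station scores."""
    return (station_accuracy(wind_df, wind_capacity) + station_accuracy(pv_df, pv_capacity)) / 2

# Toy check: two days of 96 points; day 1 is perfect, day 2 is off by a constant 10 MW.
t = pd.date_range("2025-01-01", periods=2 * 96, freq="15min")
toy = pd.DataFrame({"time": t, "real_power": 50.0,
                    "predict_power": [50.0] * 96 + [60.0] * 96})
toy_score = station_accuracy(toy, capacity=100.0)  # (1.0 + 0.9) / 2 under the placeholder metric
```

A useful consequence of the per-day averaging: a single very bad day cannot dominate the score as much as it would under one RMSE over the whole period.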
Competition baseline
https://github.com/datawhalechina/competition-baseline/tree/master/competition/%E7%A7%91%E5%A4%A7%E8%AE%AF%E9%A3%9EAI%E5%BC%80%E5%8F%91%E8%80%85%E5%A4%A7%E8%B5%9B2025
Data loading
```python
import pandas as pd
from lightgbm import LGBMRegressor

# Station 1 = wind farm, station 2 = PV plant. parse_dates makes "time" a datetime
# column, which the .dt accessor in the feature-engineering step requires.
# Bug fix: the original loaded the station-2 ("2_sunny_weather_*") files for the
# wind farm as well; the station-1 weather file names below are assumed to follow
# the same naming pattern as the other files.
windy_power = pd.read_csv("./dataset/1_windy_power_2024-01-01_2024-12-31.csv", parse_dates=["time"])
windy_weather1 = pd.read_csv("./dataset/1_windy_weather_1_2024-01-01_2025-02-28.csv", parse_dates=["time"])
windy_weather2 = pd.read_csv("./dataset/1_windy_weather_2_2024-01-01_2025-02-28.csv", parse_dates=["time"])
windy_weather3 = pd.read_csv("./dataset/1_windy_weather_3_2024-01-01_2025-02-28.csv", parse_dates=["time"])

suny_power = pd.read_csv("./dataset/2_sunny_power_2024-01-01_2024-12-31.csv", parse_dates=["time"])
suny_weather1 = pd.read_csv("./dataset/2_sunny_weather_1_2024-01-01_2025-02-28.csv", parse_dates=["time"])
suny_weather2 = pd.read_csv("./dataset/2_sunny_weather_2_2024-01-01_2025-02-28.csv", parse_dates=["time"])
suny_weather3 = pd.read_csv("./dataset/2_sunny_weather_3_2024-01-01_2025-02-28.csv", parse_dates=["time"])

# Prefix each weather variable with its source (NWP_1_, NWP_2_, NWP_3_) so the
# three sources keep distinct column names after merging ("time" stays first).
windy_weather1.columns = ["time"] + ["NWP_1_" + x for x in windy_weather1.columns[1:]]
windy_weather2.columns = ["time"] + ["NWP_2_" + x for x in windy_weather2.columns[1:]]
windy_weather3.columns = ["time"] + ["NWP_3_" + x for x in windy_weather3.columns[1:]]
suny_weather1.columns = ["time"] + ["NWP_1_" + x for x in suny_weather1.columns[1:]]
suny_weather2.columns = ["time"] + ["NWP_2_" + x for x in suny_weather2.columns[1:]]
suny_weather3.columns = ["time"] + ["NWP_3_" + x for x in suny_weather3.columns[1:]]
```
Feature engineering
```python
# how="right" keeps every weather timestamp; 2025 rows get NaN in real_power.
windy_train_test = (windy_power.merge(windy_weather1, on="time", how="right")
                    .merge(windy_weather2, on="time", how="right")
                    .merge(windy_weather3, on="time", how="right"))
# Bug fix: the original merged suny_weather3 twice and never used suny_weather2.
suny_train_test = (suny_power.merge(suny_weather1, on="time", how="right")
                   .merge(suny_weather2, on="time", how="right")
                   .merge(suny_weather3, on="time", how="right"))

def add_calendar_features(df):
    # .dt needs a datetime column (a no-op if "time" was already parsed).
    df["time"] = pd.to_datetime(df["time"])
    df["year"] = df["time"].dt.year
    df["month"] = df["time"].dt.month
    df["day"] = df["time"].dt.day
    # The original's "dayofweek_name" (dt.weekday) duplicated this column; omitted.
    df["dayofweek_num"] = df["time"].dt.dayofweek
    return df

windy_train_test = add_calendar_features(windy_train_test)
suny_train_test = add_calendar_features(suny_train_test)

# Split by date rather than tail(5664): train on 2024 rows with measured power
# (missing values dropped), test on 2025-01-01 ~ 2025-02-28 (59 days x 96 rows).
test_start = pd.Timestamp("2025-01-01")
windy_train = windy_train_test[(windy_train_test["time"] < test_start) & windy_train_test["real_power"].notna()]
windy_test = windy_train_test[windy_train_test["time"] >= test_start]
suny_train = suny_train_test[(suny_train_test["time"] < test_start) & suny_train_test["real_power"].notna()]
suny_test = suny_train_test[suny_train_test["time"] >= test_start]
```
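The calendar features above describe only the date, with no time-of-day information, even though PV output follows the daily solar cycle. A possible extension, not part of the original baseline: the hypothetical helper below adds the hour, the minute, the 15-minute slot index (0 to 95), and a sin/cos encoding of that slot so that 23:45 and 00:00 end up close together.

```python
import numpy as np
import pandas as pd

def add_intraday_features(df):
    """Add time-of-day features to a frame with a 'time' column."""
    out = df.copy()
    t = pd.to_datetime(out["time"])
    out["hour"] = t.dt.hour
    out["minute"] = t.dt.minute
    out["slot"] = out["hour"] * 4 + out["minute"] // 15  # 15-minute slot within the day
    angle = 2 * np.pi * out["slot"] / 96
    out["slot_sin"] = np.sin(angle)
    out["slot_cos"] = np.cos(angle)
    return out

demo = add_intraday_features(
    pd.DataFrame({"time": pd.date_range("2025-01-01", periods=96, freq="15min")}))
```

Applied to `windy_train_test` and `suny_train_test` before the train/test split, these columns are picked up by the model automatically, because the training code uses every column except `time` and `real_power`.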
Model training and prediction
```python
# Station 1 (wind farm): fit on the training rows, predict the test period.
model = LGBMRegressor()
model.fit(windy_train.drop(["time", "real_power"], axis=1), windy_train["real_power"])
pred = model.predict(windy_test.drop(["time", "real_power"], axis=1))
# output1.csv is the template from the sample output.zip; only its predict_power
# column is overwritten, so its row count must equal len(pred).
output1 = pd.read_csv("output1.csv")
output1["predict_power"] = pred
output1.to_csv("output1.csv", index=False)

# Station 2 (PV plant): same procedure.
model = LGBMRegressor()
model.fit(suny_train.drop(["time", "real_power"], axis=1), suny_train["real_power"])
pred = model.predict(suny_test.drop(["time", "real_power"], axis=1))
output2 = pd.read_csv("output2.csv")
output2["predict_power"] = pred
output2.to_csv("output2.csv", index=False)
```
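LightGBM regression output is unconstrained, so predictions can come out negative, above capacity, or (for the PV plant) non-zero at night. A simple post-processing step, offered as an assumption rather than part of the baseline: clip predictions to [0, capacity] and set output to zero wherever every weather source forecasts zero direct radiation. `postprocess` is a hypothetical helper, and installed capacity is not given in the data description, so it is optional here.

```python
import numpy as np
import pandas as pd

def postprocess(pred, capacity=None, radiation=None):
    """Clip predictions to the physical range and zero them at night.

    capacity  : installed capacity in MW, if known (None = no upper bound).
    radiation : optional 2-D table of forecast direct_radiation, one column per
                source, aligned row-by-row with pred; rows where every source
                forecasts <= 0 W/m^2 are treated as night.
    """
    out = np.clip(np.asarray(pred, dtype=float), 0.0, capacity)
    if radiation is not None:
        night = (np.asarray(radiation, dtype=float) <= 0).all(axis=1)
        out[night] = 0.0
    return out

rad = pd.DataFrame({"NWP_1_direct_radiation": [0.0, 0.0, 300.0],
                    "NWP_2_direct_radiation": [0.0, 10.0, 280.0]})
cleaned = postprocess([-1.0, 5.0, 120.0], capacity=100.0, radiation=rad)
```

For the PV station this could be called as `postprocess(pred, radiation=suny_test.filter(like="direct_radiation"))`; the night rule is not appropriate for the wind farm, where only the clipping applies.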