Improving electric load forecasting using a quantile long short-term memory network with a dual attention mechanism

This study presents a novel composite model, TCN-Self-Attention-BILSTM (TSAB), aimed at improving short-term power load forecasting accuracy by integrating advanced neural network architectures. The model utilizes a Temporal Convolutional Network for feature extraction, a self-attention mechanism for dynamic feature weighting, and Bidirectional Long Short-Term Memory for regression, supported by a new optimization algorithm called Enhanced Triangular Topology Aggregation Optimizer (ETTAO). Experimental results demonstrate that TSAB significantly outperforms traditional forecasting models, enhancing operational decision-making in the electric power sector.


Electric Power Systems Research 241 (2025) 111330

Contents lists available at ScienceDirect

Electric Power Systems Research

journal homepage: www.elsevier.com/locate/epsr

Improved composite model using metaheuristic optimization algorithm for short-term power load forecasting

Xuhui Hu, Huimin Li *, Chen Si
School of Electrical Engineering, Shandong University, Jinan 25000, PR China

ARTICLE INFO

Keywords:
Short-term load forecasting
BILSTM
Self-attention
TCN
Optimization algorithm
TSAB
ETTAO

ABSTRACT

Accurate short-term electric load forecasting contributes to operational efficiency, grid stability, and profitability in power systems and energy markets. With increasing complexity in grid operations due to renewable integration and demand fluctuations, enhancing forecasting precision, particularly at 15-minute intervals, has become essential. In this study, we propose a composite model, TCN-Self-Attention-BILSTM (TSAB), designed to improve load prediction accuracy by integrating multiple advanced neural network architectures. Specifically, the Temporal Convolutional Network (TCN) captures long-term dependencies, while a self-attention mechanism dynamically emphasizes key features, and the Bidirectional Long Short-Term Memory Network (BILSTM) establishes robust temporal relationships. To optimize the hyperparameters of TSAB efficiently, we introduce the Enhanced Triangular Topology Aggregation Optimizer (ETTAO), a novel approach for rapid and effective tuning of composite models such as TSAB. Additionally, to evaluate model prediction accuracy and to guide hyperparameter optimization, we present a new objective function that combines multiple evaluation metrics based on their physical significance, balancing model performance across key aspects of accuracy. Experimental validation on two benchmark datasets demonstrates that TSAB outperforms conventional models, including LSTM, TCN, and CNN-BILSTM-Attention, in both feature extraction and predictive accuracy. Together, TSAB, ETTAO, and the new evaluation function offer a comprehensive approach to improving prediction accuracy, hyperparameter tuning, and model evaluation, contributing to more effective and reliable short-term load forecasting and to better operational decision-making in the electric power sector.

1. Introduction

Accurate power load forecasting is crucial for power systems, particularly in production planning, daily operations, and optimal scheduling to effectively plan the dispatching of the power system [1]. Research results show that for a 10,000 MW power company, even a 1 % reduction in forecast error can result in annual savings of up to $1.6 million [2].

The highly nonlinear and frequently fluctuating nature of power load necessitates predicting its future values based on historical and current data [3]. However, this task is complicated by the influence of complex natural and social factors. Some factors can be anticipated, while others cannot, making accurate prediction challenging due to inherent randomness [4]. Compared to long-term load forecasting, short-term load forecasting (STLF) for the next few hours to days is more volatile, harder to predict, but offers greater forecasting benefits. Therefore, STLF has been one of the most studied topics for researchers in recent decades. Early research predominantly used linear models and classical time series models such as ARIMA [5,6], Kalman filters [7], and Bayesian estimation [8]. However, with the increased complexity of power systems, the non-stationarity of power load sequences has risen significantly, rendering traditional linear methods impractical [9]. Recently, data-centric Artificial Intelligence (AI) methods have been widely utilized for load forecasting [10], including the Random Forest Algorithm [11], Support Vector Machine (SVM) [12], and Extreme Learning Machine (ELM) [13]. Among them, the Long Short-Term Memory network (LSTM) has gained attention due to its unique memory capabilities and gate structure. However, the LSTM model only considers one-way information connections. Bidirectional LSTM (BILSTM) uses its two-way information links to improve prediction accuracy. Zhang et al. [14] have demonstrated the suitability of BILSTM over LSTM, making it a better model for load forecasting.

* Corresponding author.
E-mail address: [email protected] (H. Li).

https://doi.org/10.1016/j.epsr.2024.111330
Received 12 October 2024; Received in revised form 29 November 2024; Accepted 3 December 2024
Available online 26 December 2024
0378-7796/© 2024 Elsevier B.V. All rights are reserved, including those for text and data mining, AI training, and similar technologies.
X. Hu et al. Electric Power Systems Research 241 (2025) 111330

As smart grid technology continues to advance and incorporate diverse new energy sources, such as solar and wind power, alongside emerging loads like electric vehicles (EVs), the complexity and variability of short-term load patterns have significantly increased [15]. These new energy sources, EVs and other emerging loads create unique and dynamic consumption behaviors that deviate from traditional load profiles. This evolving landscape poses challenges for conventional single-model forecasting approaches, which often struggle to capture the multifaceted, nonlinear, and highly variable nature of modern power systems. As a result, there is a growing need for advanced research that leverages state-of-the-art artificial intelligence (AI) technologies to develop more robust and accurate forecasting methods. Composite models, which combine the strengths of multiple techniques, offer a promising solution by addressing these complexities and enhancing predictive performance. Wang et al. [16] proposed a hybrid CNN-BILSTM model for power load forecasting. Compared to standalone LSTM and CNN-LSTM models, the CNN-BILSTM model showed superior predictive accuracy. Ren et al. [17] enhanced this approach by integrating an attention mechanism into the CNN-BILSTM model, resulting in significant improvements in prediction accuracy.

Despite the improved accuracy achieved by the CNN-BILSTM-Attention model, several challenges persist. First, CNN, originally designed for image recognition, exhibits limited feature extraction capabilities for time series data, even when adapted to one-dimensional convolutions. Its fixed window size further hampers the capture of long-term dependent sequence relationships, thereby restricting the extraction of deep temporal features from the original sequence. Additionally, in many combined models, the attention mechanism is often added as a single layer at the end through linear stacking, resulting in incomplete utilization and making it difficult to elucidate its contribution to accuracy enhancement. Moreover, the integration of composite models introduces new hyperparameters that require optimization. The traditional manual trial-and-error approach is time-consuming and labor-intensive, often failing to identify effective hyperparameter combinations, leading to the abandonment of potentially effective models. Despite the availability of potentially more effective meta-heuristic algorithms, many scholars continue to rely on traditional methods like Bayesian optimization for hyperparameter optimization. To address these issues, this paper proposes a novel composite model assisted by an advanced optimization algorithm for hyperparameter tuning. The contributions are:

1. Development of a Composite Model for Enhanced Temporal Feature Extraction in Power Systems: This work presents an advanced composite model, TCN-Self-Attention-BILSTM (TSAB), specifically tailored to improve the accuracy of short-term electric load forecasting at the grid level. By combining the Temporal Convolutional Network (TCN), Bidirectional Long Short-Term Memory (BILSTM), and a self-attention mechanism, the model is highly effective for electric load forecasting. Unlike traditional CNN-based approaches, our model leverages the TCN’s dilated and causal convolution structure to capture long-term dependencies essential in power load forecasting. The use of residual blocks further enhances stability by mitigating gradient issues. The self-attention module within the TCN adaptively weights features, highlighting critical temporal patterns, which improves regression accuracy and computational efficiency. These enhancements allow the model to capture both routine and complex variations in load patterns, aiding in more accurate grid-level planning and load dispatch.
2. Introduction of the Enhanced Triangular Topology Aggregation Optimizer (ETTAO): We propose the Enhanced Triangular Topology Aggregation Optimizer (ETTAO) to address hyperparameter tuning challenges inherent in complex composite models. Compared to conventional optimization techniques like whale optimization and genetic algorithms, ETTAO provides greater efficiency and robustness in optimizing neural network architectures for short-term load forecasting. Tested across multiple CEC benchmark functions, ETTAO demonstrates good performance, making it an effective tool for tuning composite models in practical power forecasting scenarios, particularly where high accuracy is critical for decision-making in grid operations.
3. Development of a Quality Function for Balanced Model Evaluation and Optimization in Power Load Forecasting: We introduce a quality function that combines four widely-used evaluation metrics into a single objective measure, providing a scientifically grounded approach for balanced model evaluation and hyperparameter optimization. This function accounts for the physical significance of each metric, facilitating a holistic view of model performance. By optimizing for this consolidated quality function, our model achieves enhanced predictive accuracy across diverse load patterns. This approach ensures that model optimization is not biased towards a single metric but instead achieves balanced performance across multiple evaluation criteria, crucial for robust power system applications.

The remainder of this paper is structured as follows: Section 2 presents the model architecture of TSAB and its component composition. Section 3 introduces the enhancements of the ETTAO algorithm over the original algorithm and its component composition, evaluated on CEC test functions using the proposed evaluation function. Section 4 introduces the two datasets used in this paper and some evaluation indicators, and subsequently demonstrates the process and effect of the ETTAO algorithm for parameter optimization of the model on the validation set. Section 5 presents the results of three types of experiments conducted using two datasets. Section 6 provides the relevant conclusions.

2. TSAB model architecture

2.1. Model description

The TSAB model is developed as a novel composite structure to address specific challenges in short-term electricity load forecasting. This architecture combines Temporal Convolutional Networks (TCN), self-attention mechanisms, and Bidirectional Long Short-Term Memory Networks (BILSTM) to leverage the strengths of each component in processing and predicting complex temporal patterns. While TCN, self-attention, and BILSTM are established methods in time series analysis, their combined use in a composite model tailored for load forecasting is original to this work and offers specific benefits over conventional structures like CNN-LSTM-Attention.

The composite model structure can generally be categorized into two types: encoder-decoder and feature mining-regression structures. This work adopts the latter, leveraging TCN for feature mining and BILSTM for regression. Unlike prior models, which typically employ CNN as the feature miner, this model replaces CNN with TCN, capitalizing on TCN’s dilated causal convolutions. This approach enables the model to capture long-term dependencies in the data without being restricted by a fixed window size. As a result, TCN allows the mining of longer-term temporal dependencies, which is essential for short-term load forecasting. Additionally, to improve stability during training, batch normalization is applied to the TCN module, and a sequence forgetting layer is introduced to remove information line-by-line instead of point-by-point, effectively mitigating risks of premature overfitting.

Given that not all mined features contribute equally to accurate forecasting, the second improvement is the integration of a self-attention mechanism within the TCN module. This mechanism adaptively assigns weights to the features extracted by TCN, emphasizing those that are most relevant to the forecasting task. The adaptive weighting provided by self-attention optimizes feature relevance before entering the regression stage, thereby enhancing the model’s ability to capture complex temporal relationships critical for load prediction. The feature mining structure is shown in Fig. 1.
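The role of dilation in the feature miner can be illustrated with a small, framework-free sketch. It is not the authors' implementation: the function names `dilated_causal_conv` and `receptive_field` are hypothetical, and treating samples before the start of the series as zero is an assumed padding choice. Each output reads only the current and earlier inputs, spaced `dilation` steps apart, so stacking layers with growing dilation widens the visible history without enlarging the kernel.

```python
def dilated_causal_conv(x, kernel, dilation):
    """Return y where y[t] = sum_i kernel[i] * x[t - dilation * i].

    Samples before the start of the series are treated as zero
    (an assumed padding choice, used here only for illustration).
    """
    y = []
    for t in range(len(x)):
        acc = 0.0
        for i, weight in enumerate(kernel):
            src = t - dilation * i  # only the current and past positions are read
            if src >= 0:
                acc += weight * x[src]
        y.append(acc)
    return y


def receptive_field(kernel_size, dilations):
    """Input steps visible to one output of a stack of dilated layers
    (one convolution per layer, same kernel size in every layer)."""
    return 1 + (kernel_size - 1) * sum(dilations)


if __name__ == "__main__":
    series = [1.0, 2.0, 3.0, 4.0, 5.0]
    # kernel [1, 1] with dilation 2 adds each sample to the one two steps earlier
    print(dilated_causal_conv(series, kernel=[1.0, 1.0], dilation=2))
    # four layers with kernel size 3 and dilations 1, 2, 4, 8
    print(receptive_field(kernel_size=3, dilations=[1, 2, 4, 8]))
```

With kernel size 3 and dilations 1, 2, 4 and 8, one output already depends on 31 input steps, whereas four undilated layers of the same kernel size would see only 9.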


Fig. 1. Structure of feature mining.
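The adaptive weighting performed by the self-attention module can be sketched in plain Python, following the embedding, query/key/value, softmax and weighted-sum steps that Section 2.3 formalizes. This is a minimal illustration under stated assumptions, not the network used in the paper: the helper names are hypothetical, the weight matrices are passed in as nested lists rather than learned, and a single attention head is assumed.

```python
import math


def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(m * v for m, v in zip(row, vector)) for row in matrix]


def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(xs, w_embed, w_q, w_k, w_v):
    """Single-head scaled dot-product self-attention over a sequence xs.

    Returns (outputs, weights); weights[j][i] is how strongly position j
    attends to position i, and each row of weights sums to 1.
    """
    alphas = [matvec(w_embed, x) for x in xs]      # embedding step
    queries = [matvec(w_q, a) for a in alphas]
    keys = [matvec(w_k, a) for a in alphas]
    values = [matvec(w_v, a) for a in alphas]
    d = len(keys[0])                               # dimension of q and k
    outputs, weights = [], []
    for q in queries:
        scores = [sum(qc * kc for qc, kc in zip(q, k)) / math.sqrt(d) for k in keys]
        row = softmax(scores)
        weights.append(row)
        outputs.append([sum(w * v[c] for w, v in zip(row, values))
                        for c in range(len(values[0]))])
    return outputs, weights


if __name__ == "__main__":
    identity = [[1.0, 0.0], [0.0, 1.0]]
    features = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
    outputs, weights = self_attention(features, identity, identity, identity, identity)
    for row in weights:
        print([round(w, 3) for w in row])
```

Each output is a convex combination of the value vectors, so features that score highly against a query dominate what is passed on to the regression stage.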

Fig. 2. The architecture of TSAB model.


For regression, BILSTM is utilized instead of standard LSTM. BILSTM offers a bidirectional architecture that captures both past and future dependencies within a time series, which is particularly advantageous in processing non-linear and non-stationary load data. This bidirectional structure improves the model’s predictive capability by incorporating contextual information from both directions, which is important in capturing the intricate dynamics of load changes.

The overall structure of the TSAB model, as shown in Fig. 2, is as follows: the input time series first enters the TCN module with the self-attention mechanism for feature mining and adaptive weighting. These processed features are fed into BILSTM for regression, and the prediction results are generated through a fully connected layer.

In summary, the proposed TSAB model introduces a unique combination of TCN, self-attention, and BILSTM specifically configured for short-term electricity load forecasting. This composite approach provides a robust feature extraction, adaptive weighting, and regression structure, addressing limitations in traditional models and improving overall forecasting accuracy. A comprehensive overview of each module is provided in Sections 2.2 to 2.4.

2.2. TCN net

TCN, a one-dimensional convolutional neural network, excels in processing and mining time sequence information [18]. Its structure consists of a stack of dilated causal convolutions, with residual connections to prevent gradient vanishing from excessive network depth. Causal convolution ensures that each layer’s state at time T depends only on the state at time T and T − 1, creating a one-way structure that excludes future information. However, simple causal convolution is limited by the size of a single convolution kernel and cannot capture long-term sequence dependencies. Stacking more layers to extend the receptive field can lead to gradient vanishing, poor fitting, and complex training [19].

To address this, dilated convolution was proposed. It skips parts of the input and adjusts the receptive field size by modifying the dilation rate, enabling the convolution kernel to operate on a larger region. This allows the network to flexibly adjust the amount of historical information considered by the output. For input time series x, dilated convolution is calculated by:

$$F(t) = \sum_{i=0}^{k-1} f(i)\, x_{t - d \cdot i} \tag{1}$$

Where: $F(t)$ is the output after one dilated convolution operation; $d$ is the sampling rate, i.e. the dilation factor; $k$ is the size of the convolution kernel; $f(i)$ is the $i$-th element in the convolution kernel; the index $t - d \cdot i$ means that the convolution is performed only on past data [20].

2.3. Self-attention mechanism

The self-attention mechanism, a classical attention method, ensures that each sequence element is related not only to its neighbors but also to other elements globally. By calculating the relative importance between elements, the self-attention mechanism adaptively captures long-term dependencies within the sequence.

The calculation process of the self-attention mechanism is illustrated in Fig. 3. The first step is the embedding operation, which converts the original input sequence $x$ into the vector $\alpha$. The specific formula is:

$$\alpha_i = W x_i \tag{2}$$

Where: $W$ is the parameter matrix of the embedding.

Next, the query, key and value (q, k, v) matrices are obtained according to Eqs. (3) to (5).

$$q_i = W_q \alpha_i \tag{3}$$

$$k_i = W_k \alpha_i \tag{4}$$

$$v_i = W_v \alpha_i \tag{5}$$

The operations q and k are subsequently executed to derive $\alpha_{j,i}$, which is then fed into the softmax layer to yield $\tilde{\alpha}_{j,i}$. The formula to derive $\alpha_{j,i}$ is as follows.

$$\alpha_{j,i} = q_j \cdot \frac{k_i}{\sqrt{d}} \tag{6}$$

Where: $d$ denotes the matrix dimension of q and k.

Finally, the output $\beta$ is obtained through the operation of v.

$$\beta_j = \sum_{i} \tilde{\alpha}_{j,i}\, v_i \tag{7}$$

The formulas discussed in this section are sourced from [21] and are graphically represented in Fig. 3.

Fig. 3. The self-attention structures.

2.4. BILSTM net

BILSTM is an improved and optimized version of the traditional LSTM network. It combines a forward LSTM layer with a backward LSTM layer, both of which impact the output. The bidirectional structure allows for comprehensive learning of sequential information connectivity across the entire sequence, leading to enhanced prediction accuracy [19].

The structure of BILSTM is shown in Fig. 4, where $x_1, x_2, x_3, \cdots, x_t$ represent the input data at each moment; $A_1, A_2, A_3, \cdots, A_t$ and $B_1, B_2, B_3, \cdots, B_t$ represent the hidden states of the forward and backward LSTM, respectively; $Y_1, Y_2, Y_3, \cdots, Y_t$ are the corresponding output values; and $\omega_1, \omega_2, \omega_3, \cdots, \omega_t$ are the corresponding weights of each layer. The update formulas for the hidden layers of forward LSTM and backward LSTM, as well as the final output formula for BILSTM, are presented in Eqs. (8) through (10) [22]. Fig. 4 illustrates the structure of the BILSTM.

$$A_i = f_1(\omega_1 x_i + \omega_2 A_{i-1}) \tag{8}$$

$$B_i = f_2(\omega_3 x_i + \omega_5 B_{i+1}) \tag{9}$$

$$Y_i = f_3(\omega_4 A_i + \omega_6 B_i) \tag{10}$$

Where: $f_1$, $f_2$, $f_3$ are the activation functions between different layers.

Fig. 4. The structure of BILSTM.

3. Parameter optimization algorithm

3.1. ETTAO algorithm

The TTAO algorithm, detailed by Zhao et al. [23], will not be further elaborated here. This paper focuses on enhancements to address its susceptibility to local optima, limited exploration ability, and subpar performance in certain optimization tasks. The following improvements were implemented:

3.1.1. Change the random initialization

Most optimization algorithms suffer from premature convergence to local optima. This issue often arises from using a random initial function (rand) to initialize the population. The randomness of rand results in an uneven distribution of points in the parameter space. For example, when 225 points are randomly generated in a two-dimensional space, they tend to cluster densely in some areas while remaining sparse in others, as shown in Fig. 5(a). This uneven distribution is even more pronounced in higher-dimensional spaces. Consequently, the algorithm’s initial population density can lead to premature convergence, making it difficult to explore the parameter space thoroughly.

Fig. 5. (a) Results generated by the rand function (b) Results of Sobol sequence generation.

To address this problem, this paper introduces the use of a low-discrepancy sequence, specifically the Sobol sequence. Unlike purely random numbers, Sobol sequences generate quasi-random numbers that are more uniformly distributed. For instance, generating 225 points in a two-dimensional space using the Sobol sequence, as shown in Fig. 5(b), results in a significantly more uniform distribution compared to traditional random initialization. This uniformity ensures a more comprehensive exploration of the parameter space during the population initialization phase, reducing the likelihood of premature convergence to local optima and enhancing the overall optimization process.
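The difference between the two initialization schemes can be reproduced with a short, dependency-free sketch. It is an illustration rather than the authors' code: `sobol_2d` is a hypothetical helper that generates only the first two Sobol dimensions in Gray-code order, and the example uses 256 points on a 16 x 16 grid (an assumed power-of-two variant of the paper's 225-point example) because the stratification guarantee of the sequence holds for power-of-two counts.

```python
import random

BITS = 30  # fixed-point resolution of the generated coordinates


def sobol_2d(n):
    """First n points of the two-dimensional Sobol sequence in [0, 1)^2.

    Dimension 1 uses direction integers m_k = 1 (van der Corput, base 2);
    dimension 2 uses m_1 = 1 and m_k = 2 * m_{k-1} XOR m_{k-1}.
    """
    m2 = [1]
    for _ in range(1, BITS):
        m2.append((2 * m2[-1]) ^ m2[-1])
    v1 = [1 << (BITS - (k + 1)) for k in range(BITS)]
    v2 = [m << (BITS - (k + 1)) for k, m in enumerate(m2)]
    x = y = 0
    points = []
    for i in range(n):
        points.append((x / 2 ** BITS, y / 2 ** BITS))
        c = 0
        while (i >> c) & 1:  # position of the lowest zero bit of i
            c += 1
        x ^= v1[c]
        y ^= v2[c]
    return points


def occupied_cells(points, grid):
    """Number of distinct cells of a grid x grid partition that contain a point."""
    return len({(int(px * grid), int(py * grid)) for px, py in points})


if __name__ == "__main__":
    rng = random.Random(42)
    uniform = [(rng.random(), rng.random()) for _ in range(256)]
    print("rand :", occupied_cells(uniform, 16), "of 256 cells occupied")
    print("Sobol:", occupied_cells(sobol_2d(256), 16), "of 256 cells occupied")
```

The Sobol points place exactly one point in every cell of the 16 x 16 grid, while independent uniform draws leave a sizeable share of cells empty, which is the clustering effect described above. Numerical libraries offer tested generators for more dimensions, for example `scipy.stats.qmc.Sobol`.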

3.1.2. An adaptive t-distributed disturbance with attenuation factor is introduced

Since it is common for optimization algorithms to fall into local optima, this paper proposes an adaptive t-distributed perturbation mechanism with an attenuation factor to address this challenge:

$$\vec{X}^{\,t+1}_{i,best} = \vec{X}^{\,t}_{i,best} * \left(1 + t(\mathrm{iter}) * \gamma\right) \tag{11}$$

$$\gamma = \exp\left(1.1 * \cos\left(\frac{\pi}{2} * \frac{t}{T}\right)\right) - \exp\left(\frac{t}{T}\right) \tag{12}$$

Where $t$ represents the current iteration number, $\vec{X}^{\,t}_{i,best}$ represents the optimal hyperparameter combination of generation $t$, and $T$ represents the total number of iterations.
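A single application of the update in Eq. (11) can be sketched as follows. Several points are assumptions made only for illustration: $t(\mathrm{iter})$ is read as one Student-t random variate whose degrees of freedom equal the current iteration index, the same draw is applied to every hyperparameter, and the attenuation factor $\gamma$ is passed in as a plain number instead of being recomputed. The function names are hypothetical.

```python
import math
import random


def student_t(df, rng):
    """Draw one Student-t variate with df degrees of freedom.

    Uses the identity t = Z / sqrt(V / df) with Z standard normal and
    V chi-square(df), where chi-square(df) is Gamma(shape=df/2, scale=2).
    """
    z = rng.gauss(0.0, 1.0)
    v = rng.gammavariate(df / 2.0, 2.0)
    return z / math.sqrt(v / df)


def perturb_best(best, iteration, gamma, rng):
    """Scale every coordinate of the best position by (1 + t * gamma).

    Assumes iteration >= 1 and one shared t draw per call.
    """
    t_draw = student_t(iteration, rng)
    return [value * (1.0 + t_draw * gamma) for value in best]


if __name__ == "__main__":
    rng = random.Random(7)
    best = [32.0, 64.0, 0.001]  # hypothetical hyperparameter vector
    print(perturb_best(best, iteration=1, gamma=1.0, rng=rng))   # early: heavy-tailed, large moves
    print(perturb_best(best, iteration=90, gamma=0.1, rng=rng))  # late: close to normal, small moves
```

With few degrees of freedom the t-distribution has heavy tails, so early draws can move the position far away; as the iteration index grows the draws approach a standard normal, and a small $\gamma$ then confines the search to the neighbourhood of the current best position.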
During the initial iteration stages, the attenuation factor is around 1, causing more intense position disturbances. Towards the final stages, this factor reduces to near 0, allowing for more detailed exploration of nearby positions. For example, with 100 iterations, the decay process of the attenuation factor is illustrated in Fig. 6.

Fig. 6. Decay curve of γ.

3.1.3. Greedy algorithm and current best location update strategy

Analysis of the TTAO algorithm reveals that it updates only to the current generation’s best position, ignoring the best positions from previous generations. This can result in oscillation problems during iterations. To address this, we introduce a global optimal position that is updated only if the current optimal position surpasses any previously found optimal position. Additionally, we employ a greedy algorithm. After applying the perturbation strategy detailed in Section 3.1.2, the new position is evaluated. The perturbation is accepted only if the new position improves upon the current optimal position. This approach mitigates convergence issues caused by excessive perturbations.

3.2. Performance experiments of ETTAO algorithm

To validate the robust optimization performance of the ETTAO algorithm, we compare it with TTAO, Particle Swarm Optimization (PSO), and Genetic Algorithm (GA) using eight commonly used CEC functions [24]. Each algorithm employs a population size of 200 and runs for 1000 iterations. Table 1 shows the comparative results.

The results in Table 1 demonstrate that the enhanced ETTAO algorithm matches or outperforms both the original TTAO algorithm and traditional methods such as Particle Swarm Optimization on all eight test functions. These results highlight its robust optimization efficiency, leading to improved accuracy in hyperparameter tuning.

Table 1
Optimization results of each algorithm.

| Formula | Statistic | TTAO | PSO | GA | ETTAO |
| --- | --- | --- | --- | --- | --- |
| $f_1 = \sum_{i=1}^{n} x_i^2$ | mean | 1.0189e-10 | 0.012224 | 5.1851e-05 | 0 |
| | std | 1.9023e-10 | 0.004786 | 0.00011075 | 0 |
| $f_2 = \sum_{i=1}^{n} \lvert x_i \rvert + \prod_{i=1}^{n} \lvert x_i \rvert$ | mean | 6.8606e-08 | 0.78342 | 0.00093861 | 0 |
| | std | 1.7481e-07 | 0.1932 | 0.00087453 | 0 |
| $f_3 = \sum_{i=1}^{n} \left( \sum_{j=1}^{i} x_j \right)^2$ | mean | 3.6923 | 1.9235 | 1.6345e-05 | 0 |
| | std | 2.0839 | 0.5784 | 2.3732e-05 | 0 |
| $f_4 = \sum_{i=1}^{n} \left[ x_i^2 - 10\cos(2\pi x_i) + 10 \right]$ | mean | 12.3707 | 34.4383 | 5.6649e-05 | 0 |
| | std | 5.3336 | 10.7325 | 0.0001052 | 0 |
| $f_5 = -20\exp\left(-0.2\sqrt{\frac{1}{n}\sum_{i=1}^{n} x_i^2}\right) - \exp\left(\frac{1}{n}\sum_{i=1}^{n}\cos(2\pi x_i)\right) + 20 + e$ | mean | 3.0589 | 1.652 | 0.009756 | 4.4409e-16 |
| | std | 0.84946 | 0.55896 | 0.010206 | 0 |
| $f_6 = \frac{1}{4000}\sum_{i=1}^{n} x_i^2 - \prod_{i=1}^{n}\cos\left(\frac{x_i}{\sqrt{i}}\right) + 1$ | mean | 0.0051714 | 0.015698 | 0.001516 | 0 |
| | std | 0.0068205 | 0.016772 | 0.002899 | 0 |
| $f_7 = \left(\frac{1}{500} + \sum_{j=1}^{25}\frac{1}{j + \sum_{i=1}^{2}(x_i - a_{ij})^6}\right)^{-1}$ | mean | 1 | 1 | 1.257 | 1 |
| | std | 0 | 9.185e-15 | 1.1167 | 0 |
| $f_8 = -\sum_{i=1}^{4} c_i \exp\left[-\sum_{j=1}^{4} a_{ij}(x_j - p_{ij})^2\right]$ | mean | −3.8628 | −3.8628 | −1.8996 | −3.8628 |
| | std | 0.34541 | 2.7505e-08 | 0.00010705 | 2.7101e-15 |

4. Dataset analysis and hyperparameter tuning

4.1. Dataset analysis and preprocessing

4.1.1. Dataset partitioning and presentation

The primary aim of this study is to perform short-term power load forecasting for regional grids, with a focus on day-ahead load predictions. To evaluate the proposed method’s effectiveness, generalizability, and robustness, we utilize two distinct power load datasets collected from different timeframes and geographical locations. The first dataset, obtained from ELIA, a real-world power grid in Belgium, spans 580 days from June 11, 2017, and includes a total of 55,680 data points (96 points per day). During this period, Belgium experienced rapid growth in new energy tram systems and charging infrastructure, resulting in an annual increase in the contribution of electric vehicle charging to the overall system power load. Dataset 2 comprises 440 days of data from region 2 of the Ninth China Electrical Engineering Cup Mathematics Competition, starting from April 14, 2009. This dataset serves to further validate the model’s robustness and applicability across different contexts. Dataset 1 uses the first 420 days as the training set, the middle 70 days as the validation set, and the last 70 days as the test set. Dataset 2 uses the first 360 days as the training set, the middle 30 days as the validation set, and the last 30 days as the test set. The ETTAO algorithm optimizes parameters on the validation set to prevent information leakage and subsequently assesses prediction performance using the test set.

The average of each dataset for each day is taken and plotted to show the overall trend (Fig. 7). A typical day’s presentation for each dataset is shown in Fig. 8.

Fig. 7. Presentation of each dataset.

Fig. 8. Presentation of typical days for each dataset.

Upon analyzing both load curves, it becomes evident that they share a similar overall trend. However, the real dataset exhibits smaller fluctuations compared to the competition dataset, resulting in a more stable overall trend. The competition dataset displays periodic cliff-like drops and rises that are absent in the real dataset. These differences may stem from distinct load behaviour patterns and significant variations in weather conditions between the two locations.

Observing typical days reveals that the higher volatility of the competition dataset persists consistently. In the 96-point dataset, the actual ELIA grid curve appears smooth, contrasting sharply with the competition dataset’s pronounced zigzag shape. This difference may be attributed to the presence of variable and sensitive loads in the competition dataset’s region. The increased nonlinearity also complicates prediction efforts for the competition dataset.

4.1.2. Data normalization

Due to the substantial scale of the load data, normalization is essential for smooth model convergence. This study uses the standard min-max method to normalize the load data within the range [0,1], as shown by the following formula:

$$Y' = \frac{Y - Y_{min}}{Y_{max} - Y_{min}} \tag{13}$$

Where: $Y'$ is the normalized load data, $Y_{min}$ is the minimum value in the load data, $Y_{max}$ is the maximum value in the load data.

After making predictions based on the normalized data, it is necessary to de-normalize the predicted value in order to obtain the final prediction. The formula for inverse normalization is as follows:

$$\hat{Y} = (Y_{max} - Y_{min})\, Y_{pred} + Y_{min} \tag{14}$$

Where: $\hat{Y}$ is the de-normalized final prediction, $Y_{pred}$ is the predicted value without de-normalization.

4.1.3. Model input

After extensive experiments, this paper adopts a 7-day prediction model, using 672 historical data points (7 days at 96 points per day) to predict 96 future points.

4.1.4. Model evaluation

In this study, power load forecasting methods are evaluated using several metrics including Root Mean Square Error (RMSE), Mean Absolute Percentage Error (MAPE), coefficient of determination (R2), and Mean Absolute Error (MAE). Lower values of RMSE, MAPE, and MAE indicate higher prediction accuracy, whereas for R2, higher values indicate higher accuracy. The formulas are as follows [25]:

$$RMSE = \sqrt{\frac{1}{n}\sum_{i=1}^{n}\left(y_i - \hat{y}_i\right)^2} \tag{15}$$

$$MAPE = \frac{1}{n}\sum_{i=1}^{n}\left\lvert \frac{y_i - \hat{y}_i}{y_i} \right\rvert \times 100\% \tag{16}$$

$$R2 = 1 - \frac{\sum_{i}\left(\hat{y}_i - y_i\right)^2}{\sum_{i}\left(y_i - \bar{y}\right)^2} \tag{17}$$

$$MAE = \frac{1}{n}\sum_{i=1}^{n}\left\lvert \hat{y}_i - y_i \right\rvert \tag{18}$$

Where: $y_i$ is the real load value at time $i$, $\hat{y}_i$ is the predicted load value at time $i$, $\bar{y}$ is the average value of the true load, and $n$ stands for the total amount of data.

Using different evaluation metrics in predictive modeling and regression analysis can lead to several potential problems, as each metric emphasizes different aspects of model performance. To address this issue, we propose a hybrid objective function to balance the different metrics as described in 4.2.1.

4.2. Application of ETTAO algorithm in hyperparameter optimization

The ETTAO algorithm exhibits superior optimization and convergence efficiency compared to traditional methods. This subsection describes the use of the ETTAO algorithm for hyperparameter optimization. In this study, all hyperparameter tuning is conducted exclusively on the validation set to avoid information leakage during the prediction process.

4.2.1. Objective function construction

All optimization algorithms, including hyperparameter optimization and model evaluation, require a suitable objective function. Traditionally, a single metric such as RMSE is often used alone. However, this paper introduces a more comprehensive approach by proposing a hybrid objective function called the cuboid mass function. This function treats RMSE, MAPE, and 1 − R2 as the dimensions of a cuboid, with MAE acting as the cuboid’s density, so the mass function formula of the cuboid is as follows:

$$mass = RMSE * MAPE * MAE * (1 - R2) \tag{19}$$

Thus, the objective function optimization aims to minimize the cuboid’s mass, ensuring a comprehensive and simultaneous consideration of all four evaluation metrics. As a result, the optimized composite model achieves more balanced performance across these metrics.

4.2.2. Optimization process

Initially, hyperparameter positions are generated using the ETTAO algorithm, where each position represents a unique hyperparameter configuration. The initial population positions are generated using the Sobol sequence. Subsequently, predictions are made for each position, and their objective function values are calculated and ranked to select the optimal position. The algorithm iteratively updates positions based on its location update mode to explore new configurations and update the optimal position. Upon completing the optimization generation, the global optimal position and network model are saved. The next generation population then begins from this optimal position, initiating another iterative cycle. Over multiple iterations, the objective function value consistently converges toward an optimal value, signifying completion of the parameter optimization process. The overall procedure is depicted in Fig. 9.

Fig. 9. The illustration of the hyperparameter optimization process of ETTAO.

4.2.3. Result of hyperparameter optimization for TSAB

Using TSAB as an example, the hyperparameters to be optimized are listed in Table 2. For demonstration, the ETTAO algorithm is applied to optimize these parameters on dataset 1. The population size is set to 6, and the algorithm runs for 10 iterations. The iteration curve is depicted in Fig. 10. The curve shows a rapid decline in the quality function at the start of the iterations, stabilizing towards the end, which highlights the optimization capability of ETTAO.

Table 2
The hyperparameters to be optimized for TSAB.

| Model | Hyperparameter | Scope |
| --- | --- | --- |
| TSAB | Filter size | [0, 100] |
| | Number of filters | [0, 100] |
| | Number of key channels of attention | [0, 100] |
| | Number of hidden units1 | [2, 500] |
| | Number of hidden units2 | [2, 500] |
| | Learning rate | [0.0001, 0.01] |
| | Regularization coefficient | [0.0001, 0.01] |

Fig. 10. Iteration curve of TSAB model.

5. Experimental case

To validate the rationality, superiority, universality, prediction accuracy and robustness of the TSAB model, this paper designs three experiments.

Experiment 1: Dataset 1 is used to assess the specialized mining module’s ability to enhance prediction accuracy. This demonstrates that the TCN module with a self-attention mechanism post-transformation outperforms TCN modules with linear stacking or without attention mechanisms.
Experiment 2: Also using Dataset 1, model ablation experiments are conducted to validate the rationality and necessity of each module within TSAB. This involves comparing the overall TSAB model against predictions made by each submodule individually.

Experiment 3: For prediction accuracy and robustness, TSAB is compared against various models such as TCN, LSTM, and CNN-LSTM on both Dataset 1 and Dataset 2. This experiment aims to demonstrate the TSAB model's effectiveness across different datasets and its competitive edge over existing single and combined models.

5.1. Special mining module effect verification

To validate the TCN module's effectiveness with a self-attention mechanism, we compare the TSAB model against two variants: the TCN-BILSTM model with a linear stacked self-attention mechanism (configured at the beginning, middle, or end, denoted as models A, B, and C as shown in Fig. 11) and the TCN-BILSTM model without any attention mechanism. This comparison is conducted on dataset 1 to assess performance differences.

Fig. 11. The structure of model A, B and C.

The experimental results for various models are presented in Table 3. An example of test results from a randomly selected day is illustrated in Fig. 12. To ensure fairness, all model results are optimized by ETTAO.

Table 3
Experiment 1: comparison of the prediction results of each model.

Model         RMSE      R2       MAPE    MAE
TCN-BILSTM    483.4473  0.85679  3.3561  346.7098
Model A       429.787   0.88681  3.0316  320.5207
Model B       455.1303  0.87307  3.2845  327.5245
Model C       476.0114  0.86116  3.3540  346.3712
TSAB          380.2151  0.91142  2.669   278.0999

The data analysis shows that the TCN variant with the fusion attention mechanism significantly enhances prediction accuracy compared to other models, including those with linear stacked attention or no attention. It outperforms these models across all four evaluation indicators, with an 11.53 % improvement in RMSE and a 19.20 % improvement in MAPE compared to the next best model.

Fig. 12 presents a comparative analysis of the TSAB model alongside other benchmark models in predicting short-term power load. This randomly selected day highlights the TSAB model's overall ability to track the real demand closely, demonstrating its superior accuracy, especially in capturing both peak and trough values. The observed discrepancies, where TSAB's prediction for peak demand may occasionally be lower than real demand, do not necessarily indicate a persistent trend, as these figures are based on isolated daily samples. As such, these instances of minor underestimation are not conclusive enough to suggest a systematic bias in the TSAB model. If such a pattern were to be confirmed across a broader test set, however, it could have implications for grid operations, especially concerning the accuracy of peak load forecasts, which are critical for managing operational costs, reserve requirements, and system reliability.

5.2. Ablation experiment of TSAB model

Ablation experiments on dataset 1 validate the rationality and necessity of each module in the composite model. Using ETTAO for parameter optimization across all models ensures fairness. The TSAB model is compared against TCN, BILSTM, TCN-BILSTM, and TCN-Attention to analyze each component's impact on prediction accuracy. Results are summarized in Table 4.

Results in the table show that each component of the TSAB model significantly contributes to prediction accuracy, demonstrating the indispensable role of every part in the model.

5.3. TSAB model evaluation

After confirming the rationality of the TSAB model, comparative tests were conducted on dataset 1 and dataset 2 to further validate its superior prediction accuracy. Benchmark models including TCN, LSTM, CNN-LSTM, and CNN-BILSTM-Attention were selected for comparison. ETTAO was used to optimize parameters across all models to ensure fairness. Results are presented in Tables 5 and 6 and Figs. 13 and 14.

Based on the results presented in Tables 5 and 6, the TSAB model demonstrates superior prediction accuracy across both datasets. Specifically, on Dataset 1, the TSAB model achieves significant improvements over the LSTM baseline, with reductions of 21.32 % in RMSE, 22.65 % in MAPE, and 26.98 % in MAE. Similarly, on Dataset 2, the TSAB model continues to outperform the LSTM baseline, showing improvements of 16.36 % in RMSE, 15.20 % in MAPE, and 17.97 % in MAE.
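To make the percentage figures reproducible, the short sketch below recomputes the relative reductions of TSAB over the LSTM baseline from the RMSE, MAPE, and MAE values printed in Tables 5 and 6. It assumes the usual convention of dividing the error difference by the baseline (LSTM) value; the helper names are illustrative and not taken from the paper.

```python
def relative_reduction(baseline, model):
    """Percentage reduction of an error metric, relative to the baseline value."""
    return (baseline - model) / baseline * 100.0


# Metric values copied from Table 5 (Dataset 1) and Table 6 (Dataset 2).
TABLE_5 = {
    "LSTM": {"RMSE": 483.2481, "MAPE": 3.4505, "MAE": 353.1395},
    "TSAB": {"RMSE": 380.2151, "MAPE": 2.669, "MAE": 278.0999},
}
TABLE_6 = {
    "LSTM": {"RMSE": 561.6392, "MAPE": 5.6362, "MAE": 405.9726},
    "TSAB": {"RMSE": 469.7508, "MAPE": 4.7796, "MAE": 344.1041},
}


def reductions(table, baseline="LSTM", model="TSAB"):
    """Return the rounded percentage reduction for every metric in the table."""
    return {
        metric: round(relative_reduction(table[baseline][metric],
                                         table[model][metric]), 2)
        for metric in table[baseline]
    }


for name, table in (("Dataset 1", TABLE_5), ("Dataset 2", TABLE_6)):
    print(name, reductions(table))
```

Under this convention, the RMSE and MAPE reductions agree with the values quoted in the text (about 21.32 % and 22.65 % on Dataset 1, 16.36 % and 15.20 % on Dataset 2). The MAE reductions come out at roughly 21.25 % and 15.24 %. The quoted 26.98 % and 17.97 % are close to what is obtained when the MAE difference is divided by TSAB's MAE instead of the LSTM baseline, so the MAE figures may have been computed with a different denominator.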

These results underscore the enhanced predictive capabilities of the TSAB model and demonstrate its accuracy, versatility, and robustness, offering a promising avenue for enhancing short-term load forecasting accuracy.

Figs. 13 and 14 present a comparison of the TSAB model with other benchmark models, including TCN, LSTM, CNN-LSTM, and CNN-BILSTM-Attention, for short-term load forecasting on randomly selected days. These figures illustrate the TSAB model's ability to capture overall load trends, including peak and trough patterns, while maintaining competitive accuracy compared to the other models.

In Fig. 13, the TSAB model effectively follows the general load trend. However, at certain peak times, the TCN model slightly outperforms TSAB, indicating TCN's potential advantage in capturing rapid high-magnitude fluctuations. It is important to note that these observations are based on randomly selected days. Consequently, the occasional underestimation of peaks by the TSAB model in figures such as Figs. 12 and 13 may not represent a consistent pattern or trend across the entire dataset. Further investigation would be necessary to determine if this observation holds more broadly and to assess any potential implications for power system operations.

Fig. 12. Comparison of prediction curves of each model for a randomly selected day.

Table 4
Experiment 2: comparison of the prediction results of each model.

Model            RMSE      R2       MAPE    MAE
TCN              492.5712  0.85133  3.1705  331.2629
BILSTM           610.5202  0.7716   4.6694  458.1696
TCN-BILSTM       483.4473  0.85679  3.3561  346.7098
TCN-Attention    460.8191  0.86988  3.1705  324.123
TSAB             380.2151  0.91142  2.669   278.0999

Table 5
Comparison of the results of each model in the experiment of Dataset 1.

Model                    RMSE      R2       MAPE    MAE
TCN                      492.5712  0.85133  3.1705  331.2629
LSTM                     483.2481  0.8569   3.4505  353.1395
CNN-LSTM                 454.1531  0.87362  3.1839  326.125
CNN-BILSTM-Attention     423.3559  0.89017  3.3039  315.3816
TSAB                     380.2151  0.91142  2.669   278.0999

Table 6
Comparison of the results of each model in the experiment of Dataset 2.

Model                    RMSE      R2       MAPE    MAE
TCN                      562.9572  0.84768  5.2684  382.0632
LSTM                     561.6392  0.84839  5.6362  405.9726
CNN-LSTM                 549.3354  0.85496  5.0977  366.6476
CNN-BILSTM-Attention     526.4585  0.86679  5.2287  371.4257
TSAB                     469.7508  0.89394  4.7796  344.1041

6. Conclusion

Accurate short-term power load forecasting is essential for optimizing production planning, scheduling, economic operations, and maintenance within power systems. However, many existing models struggle with accuracy, robustness, and adaptability, particularly given the significantly increased complexity and variability of short-term load patterns driven in part by the growing integration of renewable energy and emerging loads such as electric vehicles (EVs). This paper introduces TSAB, a sophisticated composite model that integrates Temporal Convolutional Networks (TCN), Bidirectional Long Short-Term Memory (BILSTM), and a self-attention mechanism to enhance temporal feature extraction. Hyperparameter tuning for TSAB is managed by the Enhanced Triangular Topology Aggregation Optimizer (ETTAO), which proves more efficient than traditional optimization methods. Extensive experiments demonstrate that ETTAO effectively tunes hyperparameters through a balanced quality function that consolidates multiple evaluation metrics, optimizing model performance. This quality function also helps to balance the evaluation metrics in model evaluation. Validation on two independent datasets confirms TSAB's robustness, adaptability, and high forecasting accuracy, effectively addressing core challenges in short-term load forecasting. TSAB's improved forecasting precision enables more accurate generation planning, better alignment of supply and
demand, and reduced reserve capacity and operational costs. Additionally, TSAB enhances smart grid dispatching by improving user-side demand response management and facilitating more accurate predictions of energy market price fluctuations.

Fig. 13. Comparison of prediction curves of each model for a randomly selected day in dataset 1.

Fig. 14. Comparison of prediction curves of each model for a randomly selected day in dataset 2.

CRediT authorship contribution statement

Xuhui Hu: Writing – original draft, Visualization, Validation, Software, Methodology, Investigation, Formal analysis, Data curation. Huimin Li: Writing – review & editing, Validation, Supervision, Funding acquisition, Data curation, Conceptualization. Chen Si: Writing – review & editing, Validation, Software, Investigation, Conceptualization.


Declaration of competing interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Data availability

Data will be made available on request.

References

[1] M. Li, X. Xie, D. Zhang, Improved deep learning model based on self-paced learning for multiscale short-term electricity load forecasting, Sustainability 14 (2022) 188, https://doi.org/10.3390/su14010188.
[2] Xiangyu Kong, Zhengtao Wang, Fan Xiao, Linquan Bai, Power load forecasting method based on demand response deviation correction, Int. J. Electr. Power Energy Syst. 148 (2023) 109013, https://doi.org/10.1016/j.ijepes.2023.109013. ISSN 0142-0615.
[3] Ziyu Sheng, Zeyu An, Huiwei Wang, Guo Chen, Kun Tian, Residual LSTM based short-term load forecasting, Appl. Soft. Comput. 144 (2023) 110461, https://doi.org/10.1016/j.asoc.2023.110461. ISSN 1568-4946.
[4] I. Ullah, S. Muhammad Hasanat, K. Aurangzeb, M. Alhussein, M. Rizwan, M.S. Anwar, Multi-horizon short-term load forecasting using hybrid of LSTM and modified split convolution, PeerJ Comp. Sci. 9 (2023) e1487, https://doi.org/10.7717/peerj-cs.1487.
[5] Agostino Tarsitano, Ilaria L. Amerise, Short-term load forecasting using a two-stage sarimax model, Energy 133 (2017) 108–114, https://doi.org/10.1016/j.energy.2017.05.126. ISSN 0360-5442.
[6] Kianoosh G. Boroojeni, M. Hadi Amini, Shahab Bahrami, S.S. Iyengar, Arif I. Sarwat, Orkun Karabasoglu, A novel multi-time-scale modeling for electric power demand forecasting: from short-term to medium-term horizon, Elect. Power Syst. Res. 142 (2017) 58–73, https://doi.org/10.1016/j.epsr.2016.08.031. ISSN 0378-7796.
[7] Zhuang Zheng, Hainan Chen, Xiaowei Luo, A Kalman filter-based bottom-up approach for household short-term load forecast, Appl. Energy 250 (2019) 882–894, https://doi.org/10.1016/j.apenergy.2019.05.102. ISSN 0306-2619.
[8] A.P. Douglas, A.M. Breipohl, F.N. Lee, R. Adapa, Load forecasting using support vector machines: a study on EUNITE competition 2001, IEEE Trans. Power Syst. 13 (4) (1998) 1507–1513.
[9] Fu Liu, Tian Dong, Qiaoliang Liu, Yun Liu, Shoutao Li, Combining fuzzy clustering and improved long short-term memory neural networks for short-term load forecasting, Electr. Power Syst. Res. 226 (2024) 109967, https://doi.org/10.1016/j.epsr.2023.109967. ISSN 0378-7796.
[10] Shuo Liu, Zhengmin Kong, Tao Huang, Yang Du, Wei Xiang, An ADMM-LSTM framework for short-term load forecasting, Neur. Networks 173 (2024) 106150, https://doi.org/10.1016/j.neunet.2024.106150. ISSN 0893-6080.
[11] LI Yan, JIA Yajun, LI Lei, et al., Short term power load forecasting based on a stochastic forest algorithm[J], Power Syst. Prot. Control 48 (21) (2020) 117–124.
[12] G. Chicco, I.S. Ilie, Support vector clustering of electrical load pattern data, IEEE Trans. Power Syst. 24 (3) (Aug. 2009) 1619–1628, https://doi.org/10.1109/TPWRS.2009.2023009.
[13] S. Li, P. Wang, L. Goel, A novel wavelet-based ensemble method for short-term load forecasting with hybrid neural networks and feature selection, IEEE Trans. Power Syst. 31 (3) (May 2016) 1788–1798, https://doi.org/10.1109/TPWRS.2015.2438322.
[14] Zhang Daohua, Jin Xinxin, Shi Piao, Chew XinYing, Real-time load forecasting model for the smart grid using bayesian optimized CNN-BILSTM, Front. Energy Res. 11 (2023), https://www.frontiersin.org/articles/10.3389/fenrg.2023.1193662. ISSN 2296-598X.
[15] OUYANG Fulian, WANG Jun, ZHOU Hangxia, Short-term power load forecasting method based on improved hierarchical transfer learning and multi-scale CNN-BILSTM-Attention[J], Power Syst. Prot. Control 51 (02) (2023) 132–140, https://doi.org/10.19783/j.cnki.pspc.220422.
[16] Yuxin WANG, Research on the power load forecasting method based on CNN-BILSTM[D], Xi'an: Xi'an University of Technology, 2021.
[17] REN Jianji, WEI Huihui, ZOU Zhuolin, Ultra-short-term power load forecasting based on CNN-BILSTM-Attention[J], Power Syst. Prot. Control 50 (08) (2022) 108–116, https://doi.org/10.19783/j.cnki.pspc.211187.
[18] LIU Jie, JIN Yongjie, Ming TIAN, Multi-Scale short-term load forecasting based on VMD and TCN[J], J. Univ. Electr. Sci. Techn. China 51 (04) (2022) 550–557.
[19] LIANG Lu, LIU Yuanlong, LIU Shaohua, ZHANG Zhisheng, Research on short-term load forecasting of power system based on ECA-TCN[J], Proceed. CSU-EPSA 34 (11) (2022) 52–57, https://doi.org/10.19635/j.cnki.csu-epsa.000989.
[20] W. Sheng, K. Liu, D. Jia, S. Chen, R. Lin, Short-term load forecasting algorithm based on LST-TCN in power distribution network, Energies 15 (2022) 5584, https://doi.org/10.3390/en15155584.
[21] Sen Fang, You-Shuai Tan, Tao Zhang, Yepang Liu, Self-attention networks for code search, Inf. Softw. Technol. 134 (2021) 106542, https://doi.org/10.1016/j.infsof.2021.106542. ISSN 0950-5849.
[22] D. Guo, M. Sun, Q. Wang, J. Zhang, Taxi demand method based on SCSSA-CNN-BiLSTM, Sustainability 16 (2024) 7879, https://doi.org/10.3390/su16187879.
[23] Shijie Zhao, Tianran Zhang, Liang Cai, Ronghua Yang, Triangulation topology aggregation optimizer: a novel mathematics-based meta-heuristic algorithm for continuous optimization and engineering applications, Expert. Syst. Appl. 238 (2024) 121744, https://doi.org/10.1016/j.eswa.2023.121744. Part B. ISSN 0957-4174.
[24] Xin Yao, Yong Liu, Guangming Lin, Evolutionary programming made faster, IEEE Trans. Evolut. Comput. 3 (2) (July 1999) 82–102, https://doi.org/10.1109/4235.771163.
[25] E. Uwimana, Y. Zhou, N.M. Sall, A short-term load demand forecasting: Levenberg–Marquardt (LM), Bayesian regularization (BR), and scaled conjugate gradient (SCG) optimization algorithm analysis, J. Supercomput. 81 (2025) 55, https://doi.org/10.1007/s11227-024-06513-y.