ARTICLE INFO

Keywords: Data analysis; Load forecasting; Learning (artificial intelligence); Machine learning; Power engineering computing; Time series analysis

ABSTRACT

Short-Term Load Forecasting (STLF) is a pre-eminent task for reliable power generation and electrical load dispatching in the power system. Numerous machine-learning and deep-learning forecasting algorithms have been presented in the literature for accurate electrical load forecasting. However, the complicated structure of multi-layer machine-learning and deep-learning architectures with increased filter sizes provokes overfitting, which degrades the performance of STLF engines in the presence of highly diversified weather and temporal variations. This paper proposes a novel two-stage Encoder-Decoder (ED) network with improved generalization capability and forecasting accuracy. The proposed architecture is based on a Short Receptive field based Dilated Causal Convolutional (SRDCC) network in the first stage and a Bi-directional Long Short-Term Memory (BiLSTM) network in the second stage. Using real valued data, the proposed ED architecture is quantitatively and qualitatively analyzed in comparison with state-of-the-art machine-learning and hybrid deep-learning STLF models. The evaluation matrix used for the comparison consists of six evaluation parameters. Extensive experimentation for multi-step ahead STLF validates the accuracy of the proposed technique in comparison with the other employed models. CNN-LSTM showed the best performance among all other implemented parametric and non-parametric forecasting models; however, the proposed ED architecture proves to be 35% more accurate than CNN-LSTM and tends to capture the local trends in an electrical load pattern more accurately. Moreover, a detailed comparative analysis of the computational complexity of the proposed ED architecture is conducted to show its prospects for real implementation.
* Corresponding author at: Department of Electrical and Computer Engineering, COMSATS University Islamabad, Lahore Campus, 54000, Pakistan.
E-mail addresses: [email protected] (K. Ijaz), [email protected] (M. Jawad), [email protected] (I. Khosa), [email protected]
(N. Shabbir).
1 Co-first author: U. Javed and K. Ijaz contributed equally to this paper.
2 ORCID: https://orcid.org/0000-0003-3730-2128.
https://doi.org/10.1016/j.eswa.2022.117689
Received 24 February 2022; Received in revised form 20 April 2022; Accepted 28 May 2022
Available online 4 June 2022
0957-4174/© 2022 Elsevier Ltd. All rights reserved.
U. Javed et al. Expert Systems With Applications 205 (2022) 117689
Table 1
Summary of various state-of-the-art short-term electrical load forecasting models. A dash marks a cell that is empty in the source table.
Ref. | Methodology | Input resolution | Input parameter | Forecast horizon | Electricity Market/Data | Performance Metrics
(Cavallo et al., 2015) | ANN | Hourly | Temperature, Humidity, Day type | Daily load | Irish Commission of Energy Regulation | NRMSE
(Edigera & Akarb, 2007) | ARIMA, SARIMA | - | Historical load | Yearly | Energy resources of Turkey | MSE
(Yildiz et al., 2017) | Multiple linear regression, ANN, SVM, NARX | - | Climate, temporal, historical load | Hourly, daily peak | University of New South Wales, Sydney, Australia | MAPE, RMSE, MBE, R-Square
(Bianchi et al., 2015) | PCA, Autoregressive | 10 min | Lagged loads | 1-step ahead, 144-step ahead | Azienda Comunale Energia e Ambiente, Rome, Italy | NRMSE
(Velasco et al., 2018) | KNN-ANN | Monthly | Metering point name, date time, Kilowatt delivered, Kilowatt per hour delivered and Kilovolt amps reactive hours delivered | Day-ahead base, Intermediate and Peak load | Power Utility Company, Philippines | MAPE
(Jawad et al., 2018) | FARIMA | 30 min | Weekdays, Weekends, Power load | Daily Power load | EirGrid Group, Ireland | RMSE, MAPE, MAE, NRMSE
(Fan et al., 2019) | W-KNN | - | Historical load | 8-Hours | National Electricity Market (Australia) | RMSE, NMSE, MAPE, MAE
(Yildiz et al., 2017) | KNN | 30 min | Minimum Temperature, Maximum Temperature | Daily load | Australian National Electricity Market (NEM) | MAPE, MAE
(Li et al., 2016) | Extreme Learning Machine | - | Past load, Past temperature, forecasted temperature, Day of week, Weekend | Hourly, daily | ISO New England | MAPE, MAE, RMSE
(Buitrago & Asfour, 2017) | ANN with NARX | - | Month, Day, Hour of the day, Day of week, Working Day, Temperature, Dew point | Daily load | ISO-NE grid operator, England | AE%, MAPE
(Jawad et al., 2018) | GA-NARX-NN | 15 min | Day of the week, Dry bulb temperature, Dew point temperature, Hour of the day, Working or off day, Previous week, same day, same hour load, Average's load | 168-hour load | National Estuarine Research Reserve System, Texas | MAPE, RMSE, Error variance
(Kong et al., 2019) | LSTM RNN | - | Time-steps, sequence of Consumed energy, sequence of time day indices, weekday indices, Holidays | 5-days aggregated load | Commercial-scale smart grid project, SGSC, Australia | MAPE
(Somu & Ramamritham, 2021) | CNN-LSTM | 15 min | Day, Month, Year, Hour, Minutes, Seconds and Energy consumed | Daily analysis on the selected days of weekdays (Tuesday, Thursday, Saturday) | IIT-Bombay, India | MSE, RMSE, MAPE, MAE
Our Work | SRDCC-BiLSTM-ED | 15 Minutes | Temperature, Humidity, Hour of the day, Day of the week, working day, Previous 3 h electrical load, previous 3 days same hour electrical load | Step Ahead & Day Ahead | Lahore Electricity Supply Corporation (LESCO), Pakistan | MAPE, RMSE, MAE, R-Square, Std. Dev., PA, PI
Abbreviations: MBE: Mean bias error, ANE: Absolute normalized error, Max. AE: Maximum Absolute error AAE: Average absolute error, APE: Absolute percentage
error, RPE: Relative percentage error, MARPE: Mean absolute relative percentage error, NMSE: Normalized mean square error.
conventional LSTM. The proposed ED architecture is implemented on the real-time electric load profile of Lahore, Pakistan for step-ahead and day-ahead STLF. However, the SRDCC-BiLSTM architecture is equally applicable to other datasets with a small tuning of hyper-parameters, because the small size of the convolution layers and the small number of neurons and hidden layers in the BiLSTM add robustness against over-fitting. To the best of our knowledge, the ED configuration is implemented for the first time for the STLF problem. Moreover, Table 1 gives a detailed overview of various state-of-the-art short-term electrical load forecasting models and a comprehensive comparative analysis in terms of methodology, resolution, parameters of input features, forecasting horizon, details of data availability, and the performance metrics used in the cited research articles. Considering the above discussion, the main contributions of this paper are:

• A detailed Exploratory Data Analysis (EDA) is performed to identify the exogenous electrical and non-electrical multi-variable inputs for designing the STLF model. Correlation analysis is used to identify historical load values that are useful for prediction. Moreover, QQ-plots and boxplots between the proposed inputs and output are analyzed to validate the findings of the correlation analysis. Based on the results of the EDA of electrical and non-electrical parameters, a Predictor Matrix (PM) is developed for the modeling of STLF.
• For the sake of a detailed and valid comparative analysis, and to validate the performance of the proposed ED model, the same PM is used to develop linear parametric forecasting models, such as Auto-Regressive with Exogenous Inputs (ARX), Auto-Regressive Moving Average with Exogenous Inputs (ARMAX), and Output Error (OE), and ML models, such as KNN, SVM, Tree Bagger, ANN-PSO, and ANN-LM.
• The proposed SRDCC-BiLSTM framework is developed for STLF using the same PM. It combines the key features of both CNN and BiLSTM and reveals considerable progress compared to the aforementioned techniques.
• In SRDCC, small dilated causal convolution filters are adopted in the encoder section, which reduces the receptive field of the lower layer to extract the specific patterns of the local trends in time-series data without increasing model parameters. However, the proposed SRDCC block deploys filters of size 2 × 2 in a second one-dimensional convolutional layer to improve the generalization capability. This combination of filters has not previously been presented in an electrical STLF method.
• A suitable framework and an appropriate filter size impose restrictions on the extensive change of input sequence dimensions during the convolution process in the encoder module. Therefore, this framework ensures that all the necessary features and patterns extracted from the convolutional process participate in the next stages of the prediction process.
• In SRDCC, small dilated causal convolution filters are adopted in the encoder section, which reduces the receptive field of the higher layers to extract the specific patterns of the local trends in time-series data without increasing model parameters. A suitable framework imposes restrictions on the extensive change of input sequence dimensions during the convolution process in the encoder module.
• In BiLSTM, the extracted patterns of local trends are transformed into a vector representation to make the data compatible with the BiLSTM block. The proposed ED framework forecasts the step-ahead and day-ahead electrical load values using the BiLSTM in the decoder module, with the priority of extracting the complete features from the electrical load data and avoiding over-fitting, under-fitting, and vanishing gradient problems.
• The SRDCC-BiLSTM architecture is equally applicable to other datasets with a small tuning of hyper-parameters, because the small size of the convolution layers and the small number of neurons and hidden layers in the BiLSTM add robustness against over-fitting.
• A comprehensive qualitative and quantitative comparison among all developed linear and non-linear parametric forecasting models is performed over different seasons of the entire year. The validity of the proposed model is investigated using several performance metrics: Mean Absolute Percentage Error (MAPE), Root Mean Square Error (RMSE), Mean Absolute Error (MAE), R-square, Standard Deviation (Std. Dev.), Prediction Accuracy (PA), and Prediction Interval Bound Percentage (PI %).
• A detailed complexity analysis of the proposed SRDCC-BiLSTM model is conducted to show the computational efficiency and time complexity of the algorithm.

2. Exploratory data analysis

The EDA is a significant statistical approach to examine the key features present in the dataset and to reveal the relationship between inputs and outputs (Javed et al., 2021). The EDA is a preliminary analysis to explore hidden features and patterns before formal hypothesis investigation and modeling (Komorowski & Marshal, 2016).

Fig. 2(a) shows the autocorrelation plot of the electrical load profile for January 2019. One lag corresponds to a shift of 15 min, so 96 lags correspond to one complete day. The plot covers 700 lags, i.e., consecutive time intervals up to approximately 1 week. Fig. 2(a) illustrates that the electrical load has a strong correlation with the near past (the last 3 consecutive hours) and with the load profile of the same hours on the previous day. Therefore, based on this analysis, we selected the corresponding lag values as inputs in the PM used for the modeling of STLF.

Fig. 2. Exploratory Data Analysis: (a) Autocorrelation plot of electrical load consumption; (b) QQ-plot of temperature vs electrical load; (c) QQ-plot of humidity vs electrical load; (d) QQ-plot of present electrical load vs previous hour lagged load values; (e) QQ-plot of present electrical load vs previous 2nd hour lagged load values; (f) QQ-plot of present electrical load vs previous 3rd hour lagged load values; (g) QQ-plot of present electrical load vs previous 3 h average lagged load values; (h) QQ-plot of present electrical load vs previous day same hour lagged load values; (i) QQ-plot of present electrical load vs previous 2nd day same hour lagged load values; (j) QQ-plot of present electrical load vs previous 3rd day same hour lagged load values; (k) Electrical load consumption curve of the entire month.
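The lag analysis described in this section can be illustrated with a minimal sketch. It computes the sample autocorrelation of a 15-minute load series at a few candidate lags (96 lags per day). The synthetic sinusoidal load, the candidate lag set, and all function and variable names are assumptions made for illustration; they are not the LESCO data or the authors' code.

```python
import math


def autocorrelation(series, lag):
    """Sample autocorrelation of `series` at a non-negative integer `lag`."""
    n = len(series)
    if not 0 <= lag < n:
        raise ValueError("lag must satisfy 0 <= lag < len(series)")
    mean = sum(series) / n
    denom = sum((x - mean) ** 2 for x in series)
    num = sum((series[t] - mean) * (series[t - lag] - mean) for t in range(lag, n))
    return num / denom


STEPS_PER_DAY = 96  # 15-minute resolution, as stated in the text

# Hypothetical stand-in for one week of load (MW): a pure daily cycle.
load = [
    2000 + 500 * math.sin(2 * math.pi * t / STEPS_PER_DAY)
    for t in range(7 * STEPS_PER_DAY)
]

# Candidate lags expressed in 15-minute steps (illustrative choice).
candidate_lags = {
    "previous hour": 4,
    "previous 3rd hour": 12,
    "half a day": 48,
    "previous day": STEPS_PER_DAY,
}
acf = {name: autocorrelation(load, lag) for name, lag in candidate_lags.items()}

for name, value in acf.items():
    print(f"{name:>18}: {value:+.3f}")
```

On such a series the near-past and previous-day lags show strongly positive values while the half-day lag is negative, which is the kind of pattern that motivates keeping recent hours and same-hour values of previous days as predictors.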
[Fig. 4 diagram. Inputs: Temperature (m); Humidity (m); Hour of the Day; Day of the Week; Working Day; Previous 1st Hour Load (d, h-1, m); Previous 2nd Hour Load (d, h-2, m); Previous 3rd Hour Load (d, h-3, m); Previous 3 Hours Average Load; Previous 1st Day same Hour Load (d-1, h, m); Previous 2nd Day same Hour Load (d-2, h, m); Previous 3rd Day same Hour Load (d-3, h, m). Output: Forecasted Load (d, h, m+1), (d, h+1, m), (d+1, h, m).]
Fig. 4. Predictor Matrix for STLF Models.

In light of the input parameters analyzed above for the STLF models, the Predictor Matrix (PM) is developed, as represented in Fig. 4. In Fig. 4, all aforementioned parameters, such as temperature, humidity, and temporal, seasonal, and historical electrical load values, are considered as potential inputs in the PM, whereas the present electrical load is considered as the output of the PM. Moreover, in Fig. 4, the variables 'm' and 'd' represent the month of the year and the day of the month, respectively. The variable 'h' represents the electrical load data with 15-minute (quarter-hourly) resolution as the time-step.

3. Methodology

The proposed hybrid ED architecture (SRDCC-BiLSTM) focuses on mitigating the over-fitting problem and avoids an excess of model parameters, such as the size of the convolutional layer, which includes the filter weights, the bias unit, and the dimensions of the input series in the Dilated Causal Convolution Neural Network (DCCN) block. The proposed hybrid ED architecture for STLF is a combination of two neural network architectures: (a) DCCN with different filter sizes and (b) Bidirectional LSTM (BiLSTM). Four modifications are introduced in the hybrid ED architecture: (a) a short receptive field in the DCCN using small 1 × 1 and 2 × 2 dilated causal convolution filters for both step-ahead and multi-step ahead STLF, (b) a restricted feature map and restricted model parameters, (c) avoidance of unnecessary changes of the input sequence vector representation, and (d) a smaller number of BiLSTM neurons to avoid over-fitting. The following subsections briefly describe the components of the DCCN and BiLSTM architectures, followed by the proposed hybrid ED architecture.

3.1. Causal convolution

A one-dimensional convolution layer implements the convolution between an input sequence and a filter. This is a sliding process in which the filter weights (model parameters) are sequentially applied to overlapping regions of the input series (Oord et al., 2016). The causal convolution additionally holds the causality property, which states that the convolution at time step t does not depend upon the future samples $x_{t+1}, x_{t+2}, \ldots, x_T$ in the time series data, ensuring that data never leaks from the future into the past (Rueda, Suárez, Torres, & d., 2021). A stack of causal convolutional layers is visualized in Fig. 5.

The input context window defines the temporal modeling with non-recurrent networks. The number of model parameters increases as the size of the input context window increases, which consumes a lot of memory and saturates the long-range memory capacity. The dilated convolution is introduced to resolve this issue, as shown in Fig. 6, where the convolution filter is applied over an area larger than its length by skipping certain input values (Oord et al., 2016).

[Figure residue, panel labels only: Convolution Filter (Filter Size = 2*2; Filter Size = 3*3); Dilated Convolution Filter (Dilation Factor = 2).]

Fig. 7 depicts the theoretical concept of dilated causal convolution. The dilated causal convolution introduces a new convolution parameter called the dilation factor, which is responsible for creating space between the values in a mask. For instance, a 3 × 3 mask with a dilation factor of two has the same receptive field as a 5 × 5 mask. However, a regular convolution with a 5 × 5 mask has 25 parameters if we ignore the bias value, whereas the dilated convolution has only nine model parameters with the same receptive field. Therefore, the dilated causal convolution delivers a wider field of view at the same computational cost. Eq. (1) represents the dilated convolution between the input signal and the kernel (mask) (Oord et al., 2016):

\[ F(t) = (k *_{l} f)_t = \sum_{\tau=0}^{s-1} k_{\tau} \cdot f_{t - l\tau} \qquad (1) \]

where l and s denote the dilation factor and the filter size, respectively. More precisely, if l = 1, a dilated convolution performs similar to a regular convolution.

[Figure residue, layer labels only, listed top to bottom: Output (Dilation Factor = 8); Hidden (Dilation Factor = 4); Hidden (Dilation Factor = 2); Hidden (Dilation Factor = 1); Input.]
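To make Eq. (1) concrete, the following minimal sketch evaluates a one-dimensional dilated causal convolution directly from the summation. Treating samples before the start of the series as zero (left zero-padding) is an assumption of this sketch, and the function names are hypothetical. The helper `effective_kernel_span` reproduces the 3-versus-5 comparison from the text.

```python
def dilated_causal_conv1d(signal, kernel, dilation=1):
    """Evaluate F(t) = sum_{tau=0}^{s-1} kernel[tau] * signal[t - dilation*tau].

    Samples with a negative index are treated as zero (assumed left padding),
    so the output at time t never uses signal[t+1:].
    """
    s = len(kernel)
    out = []
    for t in range(len(signal)):
        acc = 0.0
        for tau in range(s):
            idx = t - dilation * tau
            if idx >= 0:
                acc += kernel[tau] * signal[idx]
        out.append(acc)
    return out


def effective_kernel_span(filter_size, dilation):
    """Number of input positions covered by one dilated filter along one axis."""
    return (filter_size - 1) * dilation + 1


x = [1, 2, 3, 4, 5, 6]
k = [1, 1]

regular = dilated_causal_conv1d(x, k, dilation=1)  # x[t] + x[t-1]
dilated = dilated_causal_conv1d(x, k, dilation=2)  # x[t] + x[t-2]

# Causality check: changing the last sample must not alter earlier outputs.
x_future_changed = x[:-1] + [100]
assert dilated_causal_conv1d(x_future_changed, k, dilation=2)[:-1] == dilated[:-1]

# A size-3 filter with dilation 2 spans 5 positions per axis,
# with 3*3 = 9 weights instead of 5*5 = 25 for the dense mask.
print(effective_kernel_span(3, 2), 3 * 3, 5 * 5)
```

With `dilation=1` the loop reduces to an ordinary causal convolution, which matches the remark about l = 1 in the text.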
[Fig. 10 diagram, labels only. Feature Extraction Module: Input Layer; Hidden Layer (Dilation Factor = 1); Hidden Layer (Dilation Factor = 2); Flatten Layer. Forecasting Module (Decoder): Bi-LSTM Block of LSTM cells with inputs X(1), X(2), ..., X(n-1), X(n) and forward/backward outputs Y1 (F), Y1 (B), Y2 (F), Y2 (B), ..., Yn-1 (F), Yn-1 (B), Yn (F), Yn (B); Output Layer; Predicted Output Y(n).]
Fig. 10. Architectural sketch of the proposed SRDCC-BiLSTM Model for STLF.

dilated causal convolution filters of size 2x2 are implemented with a dilation rate of two to capture the generalized trends in the electrical load pattern with few model parameters. After the two convolution layers of the SRDCC block, a flatten layer is employed for the vector representation of the learned features. The flatten layer adds compatibility with the next decoder module.

Multi-step ahead forecasting requires a more aggressive extraction of all possible local trends from the electrical load data. Therefore, we use 128 very short receptive field based dilated causal convolution filters of size 1x1 to capture more specific features in both the first and second convolutional layers. Moreover, the dilation rate is set to 1 and 2 in the first and second one-dimensional convolutional layers, respectively.

In the decoder section, a BiLSTM module is used to record the long-term dependencies in the data. After multiple trial runs, we finalized 128 neurons for the BiLSTM module with a linear activation function, as this number avoids both under-fitting and over-fitting problems. The output layer of the proposed model consists of a single neuron for the predicted electrical load value. The architectural overview of the proposed ED model (SRDCC-BiLSTM) is presented in Fig. 10.

3.6. Theoretical sketch of proposed forecasting framework

[Flowchart residue, block labels only: Raw Dataset (Historical Electric Load Data; Climatic Data; Temporal Data, Binary Encoded); STLF Multivariate Time Series Data; Normalization; CONV 1D (d=1); CONV 1D (d=2); Flatten Layer; Bi-LSTM Cells; Pre-Trained Model; Forecasting; Inverse Normalization; Evaluation Metrics.]
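The effect of the encoder settings quoted above (filter sizes 1 and 2, dilation rates 1 and 2 in two stacked layers) can be checked with a short calculation. The sketch below uses the standard receptive-field formula for stacked dilated causal convolutions and the usual weight count of a one-dimensional convolution layer. Reading "1x1" and "2x2" as kernel lengths 1 and 2 along the time axis, the 12 input channels, and the use of 128 filters in both configurations are assumptions for illustration only; all names are hypothetical.

```python
def stack_receptive_field(kernel_size, dilations):
    """Receptive field (in time steps) of stacked dilated causal conv layers."""
    return 1 + sum((kernel_size - 1) * d for d in dilations)


def conv1d_param_count(in_channels, filters, kernel_size, use_bias=True):
    """Trainable parameters of one 1-D convolution layer."""
    return filters * (in_channels * kernel_size + (1 if use_bias else 0))


DILATIONS = (1, 2)   # first and second convolutional layer, as in the text
IN_CHANNELS = 12     # hypothetical number of predictor columns
FILTERS = 128        # stated for the multi-step case; assumed here for both

for kernel_size in (1, 2):
    rf = stack_receptive_field(kernel_size, DILATIONS)
    params = (
        conv1d_param_count(IN_CHANNELS, FILTERS, kernel_size)
        + conv1d_param_count(FILTERS, FILTERS, kernel_size)
    )
    print(f"kernel size {kernel_size}: receptive field {rf}, conv parameters {params}")
```

Under these assumptions a kernel of length 1 keeps the receptive field at a single time step regardless of dilation, while a kernel of length 2 with dilations 1 and 2 covers four time steps, which is consistent with the "short receptive field" wording.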
Table 2
Quantitative comparison of Linear Parametric Models for Step-Ahead STLF.
Period (2019) | OE | ARX | ARMAX | Errors

\[ \mathrm{MAPE} = \frac{1}{n} \sum_{i=1}^{n} \left| \frac{x_i - y_i}{x_i} \right| \qquad (6) \]
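As a worked illustration of Eq. (6), the sketch below computes MAPE together with MAE and RMSE in their standard textbook forms. Interpreting x as the actual load and y as the forecast is an assumption (the symbol definitions are not visible in this part of the text), the sample values are invented, and the function returns MAPE as a fraction, so multiply by 100 for a percentage.

```python
import math


def mape(actual, forecast):
    """Mean absolute percentage error as a fraction, following Eq. (6)."""
    n = len(actual)
    return sum(abs((x - y) / x) for x, y in zip(actual, forecast)) / n


def mae(actual, forecast):
    """Mean absolute error (standard definition)."""
    n = len(actual)
    return sum(abs(x - y) for x, y in zip(actual, forecast)) / n


def rmse(actual, forecast):
    """Root mean square error (standard definition)."""
    n = len(actual)
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(actual, forecast)) / n)


# Invented example values in MW, not taken from the paper.
actual_load = [100.0, 200.0, 400.0]
forecast_load = [110.0, 180.0, 400.0]

print(f"MAPE = {100 * mape(actual_load, forecast_load):.2f} %")
print(f"MAE  = {mae(actual_load, forecast_load):.2f} MW")
print(f"RMSE = {rmse(actual_load, forecast_load):.2f} MW")
```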
[Fig. 12 plots, panels (a) and (b): x-axis Time (Hour) of the Day, 00:00 to 23:45; y-axis Electrical Load (MW).]
Fig. 12. Predicted step-ahead electrical load output using statistical and ML methodologies: (a) 7th January 2019, (b) 7th June 2019.
Table 3
Quantitative comparison of machine learning models for step-ahead electrical load forecasting.
Period (2019) | KNN | Tree Bagger | SVM | ANN-PSO | ANN-LM (Double Layer) | ANN-LM (Single Layer) | Errors

changes accurately in the electrical load pattern, which is eminent for managing hot and cold reserves in power systems.

To develop improved STLF models, we employed five widely used ML algorithms: Tree Bagger, KNN, SVM, ANN-PSO, and ANN-LM. Moreover, we use two ANN-LM models, with single and double hidden layers, and both models are further fine-tuned to obtain an optimal selection of hyper-parameters, such as the number of neurons and the learning rate, during repeated training. The superiority of the ML models is validated by quantitatively and qualitatively analyzing the forecasting performance in the presence of highly diversified non-linear factors, such as temperature and humidity, for every month of the year. Table 3 shows the quantitative analysis of all five algorithms for four different months of the year 2019. The MAPE values suggest that KNN shows the worst performance among all ML models, since the KNN model has only one hyper-parameter, the number of nearest neighbors, which is inadequate to map the input–output relationship in the PM. Moreover, KNN also experiences the over-fitting problem due to its limited hyper-parameters and lack of a regularization effect, which reduces the performance of the STLF. Fig. 12(a) and (b) show that Bagged Trees is significantly better than KNN in terms of forecasting error due to the advantage of ensemble-based learning. However, a sensible selection of hyper-parameters in SVM propagates fewer forecasting errors. Therefore, the SVM learns the closest match between the actual and forecasted load curves, as depicted in Fig. 12(a) and (b). Because the tuning of SVM hyper-parameters is tedious, the SVM cannot sustain any further improvement. The NN-PSO, which belongs to the family of meta-heuristic based NN models, is also implemented to attain a more accurate forecasting engine. However, ANN-PSO model was deprived of finding
[Fig. 13 plots, panels (a) and (b): x-axis Time (Hour) of the Day, 00:00 to 23:45; y-axis Electrical Load (MW).]
Fig. 13. Predicted Step-Ahead electrical load using hybrid DL Methodologies: (a) 7th January 2019 and (b) 7th June 2019.
[Fig. 14 bar charts: x-axis Month of the Year (Jan to Dec); y-axis MAPE. Legends: (a) OE, ARX, ARMAX; (b) KNN, Bagged Trees, SVM, NN-PSO, NN-LM 2-Hidden Layers, NN-LM 1-Hidden Layer; (c) LSTM, CNN-LSTM, SRDCC-BiLSTM.]
Fig. 14. Bar Chart of MAPE values between actual and forecasted Step-Ahead Electrical Load of Year 2019 using: (a) Linear Parametric Models; (b) ML models; (c) Hybrid DL models.
[Fig. 15 plots, panels (a) to (c): y-axis Electrical Load (MW).]
Fig. 15. Comparative Analysis of Predicted day-ahead (multi-step) electrical load using ML and hybrid DL Models for: (a) 26th January 2019, (b) 26th March 2019, (c) 26th November 2019.
results is depicted in Fig. 15. In Fig. 15, we plotted the forecasted results of three randomly selected months. Moreover, to show a fair comparative analysis, we plotted a bar plot of the MAPE of all aforementioned DL models for every month of the year 2019 in Fig. 16. Furthermore, the performance analysis based on the whole set of evaluation metrics is given in Table 5 for January 2019, March 2019, and November 2019. From the presented results, the proposed SRDCC-BiLSTM model also confirms effective forecasting results. The quantitative analysis presented in Table 5 shows that the proposed architecture also performs comparatively better than the other ML and hybrid DL models for multi-step ahead electrical load forecasting. The comparison shows that most of the peak and valley load patterns are captured efficiently by the proposed SRDCC-BiLSTM model. Moreover, the predicted output of the proposed architecture is also comparatively stable and refined than
[Fig. 16 bar chart: x-axis Month of the Year (Jan to Dec); y-axis MAPE (0 to 15). Legend: ANN, LSTM, CNN-LSTM, SRDCC-BiLSTM.]
Fig. 16. Bar Chart of MAPE values between actual and forecasted Step-Ahead Electrical Load of Year 2019 using ML and hybrid DL Models.

Table 5
Quantitative comparison of alternative models for day-ahead short-term electrical load forecasting.
Period (2019) | ANN-LM | LSTM | CNN-LSTM | SRDCC-BiLSTM-ED | Errors
January | 10.97 | 7.27 | 5.34 | 4.03 | MAPE
January | 163.17 | 116.90 | 84.13 | 62.72 | MAE
January | 216.91 | 144.06 | 98.30 | 76.05 | RMSE
January | 0.39 | 0.39 | 0.71 | 0.83 | R2
January | 6.79 | 19.08 | 16.88 | 16.05 | Std. Dev.
January | 0.50 | 0.86 | 0.92 | 0.94 | PA

node to leaf nodes. Therefore, the Bagged Trees algorithm depends heavily on the dimensions of the input data and on the number of ensembled bagged trees used in the algorithm, which restricts the use of Bagged Trees in real-time STLF applications due to overly expensive training and test time. Similarly, the training time complexity of SVM ranges up to $O(n^3)$ (Rutkowski, Korytkowski, Scherer, Tadeusiewicz, Zadeh, & Zurada, 2015), and the kernel SVM prediction complexity depends on the kernel size and dimensions, $O(d^2)$, which is again very costly (Claesen, Smet, Suykens, & Moor, 2014). The ANN training complexity depends on the number of layers and the number of neurons per layer and can be represented asymptotically as $O(n \cdot m)$, where n is the number of neurons and m is the total number of layers in the ANN architecture (Kasper, 2021). Other factors, such as the number of epochs and the backpropagation algorithm, also increase its computational cost. The ANN-PSO training and testing time complexity is the sum of the individual ANN and PSO training and test time complexities, which also increases the training time due to the search for the best neighborhood samples during the different iterations over the d-dimensional search space. However, the ANN-LM algorithm with one layer has the time complexity of a conventional ANN in the training and testing phases, which is $O(n)$. Therefore, the ANN-LM is more efficient and less complex than the above-mentioned ML algorithms for our STLF scenario.

In hybrid DL algorithms, the LSTM is local in space and time (Hochreiter & Schmidhuber, 1997). This means that the memory requirement of the LSTM algorithm is not affected by the length and dimensions of the input sequence. Therefore, the time complexity of the LSTM is $O(w)$, where w is the number of weights of the LSTM network. This reveals that the LSTM is faster than all the above-mentioned ML algorithms. The CNN-LSTM complexity per time step can be derived as the sum of the complexity of the convolutional layers and the LSTM layer:

\[ O\left( \sum_{l=1}^{d} n_{l-1} \cdot s_l^{2} \cdot n_l \cdot m_l^{2} + w \right) \]

and for the whole training process:

\[ O\left( \left( \sum_{l=1}^{d} n_{l-1} \cdot s_l^{2} \cdot n_l \cdot m_l^{2} + w \right) \cdot i \cdot e \right), \]

where i is the input length and e is
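The CNN-LSTM cost expression quoted above can be evaluated numerically with a short sketch. The symbol meanings used here follow the common convention for convolutional cost estimates (n_{l-1} input channels, s_l filter length, n_l filters, m_l output feature-map length) and are an assumption, since they are not defined in the visible text. The layer sizes, the LSTM weight count, the input length, and the epoch count are hypothetical values, and the function names are illustrative.

```python
def conv_cost_per_step(layers):
    """Sum of n_prev * s**2 * n * m**2 over all convolutional layers.

    `layers` is a list of (n_prev, s, n, m) tuples, one per layer.
    """
    return sum(n_prev * s ** 2 * n * m ** 2 for n_prev, s, n, m in layers)


def cnn_lstm_cost(layers, lstm_weights, input_length=1, epochs=1):
    """Evaluate (sum over conv layers + w) * i * e from the quoted expression."""
    return (conv_cost_per_step(layers) + lstm_weights) * input_length * epochs


# Hypothetical two-layer encoder: 12 input channels, 128 filters of length 2,
# feature-map length 1 per time step.
example_layers = [(12, 2, 128, 1), (128, 2, 128, 1)]
example_lstm_weights = 1000  # hypothetical weight count w

per_step = cnn_lstm_cost(example_layers, example_lstm_weights)
training = cnn_lstm_cost(example_layers, example_lstm_weights, input_length=96, epochs=10)
print(per_step, training)
```

The numbers are only operation-count proxies under the stated assumptions; they show how the convolutional term scales with filter length and channel counts relative to the LSTM term.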
Short-Term Load Forecasting (STLF) has gained a considerable sig This work is supported by Estonian research council grants PSG739
nificance for energy management and electrical load scheduling in and European Commission through H2020 project Finest Twins grant
power systems. This paper proposes a novel hybrid deep learning-based No. 856602. Moreover, this research work is conducted as preliminary
Encoder-Decoder (ED) technique to improve the generalization capa research for the research project approved under HEC post-doctoral
bility and forecasting accuracy of the STLF problem. In ED, the Short fellowship funding Phase III (Batch-2).
Receptive field-based Dilated Causal Convolutional Network integrated
with Bidirectional Long-Short Term Memory (SRDCC-BiLSTM) and the References
advantages of the proposed hybrid SRDCC-BiLSTM model are as follows:
Ali, S. M., Jawad, M., Khan, M. U., Bilal, K., Glower, J., Khan, S. U., & Zomaya, A. Y.
(2020). An ancillary services model for data centers and power systems. IEEE
• The encoder section extracts the specific patterns of the local trends Transactions on Cloud Computing, 8(4), 1176–1188.
in the time-series data convincingly without increasing model pa Aslam, S., Herodotou, H., Mohsin, S. M., Javaid, N., Ashraf, N., & Aslam, S. (2021).
rameters using 1 × 1 and 2 × 2 dilated causal convolution filters for A survey on deep learning methods for power load and renewable energy forecasting
in smart microgrids. Renewable and Sustainable Energy Reviews, 144, Article 110992.
one-step ahead and multi-step day-ahead electrical load forecasting, Bianchi, F. M., Santis, E. D., Rizzi, A., & Sadeghian, A. (2015). Short-Term Electric Load
respectively with a dilation rate of 1 and 2. Forecasting Using Echo State Networks and PCA Decomposition. IEEE Access, 3,
• The 1 × 1 and 2 × 2 dilated causal convolution filters also help in 1931–1943.
Bouktif, S., Fiaz, A., Ouni, A., & Serhani, M. A. (2018). Optimal deep learning LSTM
restricting feature maps containing local patterns of climatic, his model for electric load forecasting using feature selection and genetic algorithm:
torical electrical load data, and temporal features as explained comparison with machine learning approaches. Energies, 11(7), 1636.
earlier, which reduces the time complexity of the encoder section and proposed model to O(n).
• The proposed model also avoids unnecessary changes in the input sequence vector representation in the inner section of the encoder by using 1 × 1 dilated causal convolution filters, which helps in capturing the vital features of the electrical load pattern in the dataset.
• The decoder section enhances the prediction accuracy using 128 BiLSTM cells. The proposed hybrid SRDCC-BiLSTM model is evaluated and validated on a real-time, year-long dataset. The extensive experimentation reveals that the proposed architecture improves electrical load prediction accuracy by up to 35 percent for multi-step-ahead forecasting compared with CNN-LSTM, which was the best among all other implemented models. The comparison with other machine learning and deep learning models is conducted based on standard evaluation parameters, namely Mean Absolute Percentage Error (MAPE), Root Mean Square Error (RMSE), Mean Absolute Error (MAE), R-squared (R2), Standard Deviation (Std. Dev.), and Prediction Accuracy (PA).
• An in-depth qualitative analysis is carried out for different weather seasons, which indicates the efficiency of the presented model through the close match between the actual and the predicted electrical load.
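The evaluation parameters named above can be illustrated with a minimal sketch in plain Python. The function name `evaluate_forecast` and the sample values are hypothetical and are not taken from the authors' repository. MAPE, RMSE, MAE, and R2 follow their standard definitions. The standard deviation is assumed here to be that of the forecast errors, and PA is assumed to be 100 minus MAPE; the definitions used in the experiments may differ.

```python
import math
from statistics import pstdev


def evaluate_forecast(actual, predicted):
    """Return common point-forecast error metrics for two equal-length sequences."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    n = len(actual)
    errors = [a - p for a, p in zip(actual, predicted)]

    mae = sum(abs(e) for e in errors) / n
    rmse = math.sqrt(sum(e * e for e in errors) / n)
    # MAPE is undefined when an actual value is zero; load values are assumed positive.
    mape = 100.0 * sum(abs(e / a) for e, a in zip(errors, actual)) / n

    # R-squared: 1 - (residual sum of squares / total sum of squares).
    # Raises ZeroDivisionError if all actual values are identical.
    mean_actual = sum(actual) / n
    ss_res = sum(e * e for e in errors)
    ss_tot = sum((a - mean_actual) ** 2 for a in actual)
    r2 = 1.0 - ss_res / ss_tot

    return {
        "MAPE": mape,
        "RMSE": rmse,
        "MAE": mae,
        "R2": r2,
        "StdDev": pstdev(errors),  # assumption: std. dev. of the forecast errors
        "PA": 100.0 - mape,        # assumption: PA taken as 100 - MAPE
    }


if __name__ == "__main__":
    # Hypothetical load values, for illustration only.
    actual = [100.0, 200.0, 300.0, 400.0]
    predicted = [110.0, 190.0, 310.0, 390.0]
    for name, value in evaluate_forecast(actual, predicted).items():
        print(f"{name}: {value:.4f}")
```

Reporting the scale-dependent errors (RMSE, MAE) alongside the relative ones (MAPE, R2) allows models to be compared both in load units and in percentage terms.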

In the future, we will consider more diversified electrical load data and improve the STLF performance by implementing a novel error correction deep learning technique.

Declaration of Competing Interest
The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Data Availability:
The simulations and related data for this article are available online in an open-source GitHub repository that can be accessed at: https://github.com/khalidijaz/SRDCC-BilSTM-Short-Term-Electrical-Load-Forecasting
U. Javed et al. Expert Systems With Applications 205 (2022) 117689