
Expert Systems With Applications 205 (2022) 117689

Contents lists available at ScienceDirect

Expert Systems With Applications


journal homepage: www.elsevier.com/locate/eswa

A novel short receptive field based dilated causal convolutional network integrated with Bidirectional LSTM for short-term load forecasting
Umar Javed a,1, Khalid Ijaz b,c,1, Muhammad Jawad b,e,*,2, Ikramullah Khosa b, Ejaz Ahmad Ansari b, Khurram Shabih Zaidi b, Muhammad Nadeem Rafiq b, Noman Shabbir d

a National Transmission and Despatch Company Limited (NTDCL), Ministry of Energy, Power Division, Pakistan
b Department of Electrical and Computer Engineering, COMSATS University Islamabad, Lahore Campus, 54000, Pakistan
c Electrical Engineering Department, University of Management and Technology, Lahore 54000, Pakistan
d Department of Electrical Power Engineering & Mechatronics, Tallinn University of Technology, Estonia
e Department of Computer Information Sciences & Engineering, University of Florida, Gainesville, FL 32611-6120, USA

ARTICLE INFO

Keywords: Data analysis; Load forecasting; Learning (artificial intelligence); Machine learning; Power engineering computing; Time series analysis

ABSTRACT

The Short-Term Load Forecasting (STLF) is a pre-eminent task for reliable power generation and electrical load dispatching in the power system. Numerous machine-learning and deep-learning forecasting algorithms have been presented in the literature for performing an accurate electrical load forecast. However, the complicated structure of machine-learning and deep-learning architectures, with multiple layers and increased filter sizes, provokes the overfitting issue, which degrades the performance of STLF engines in the presence of highly diversified weather and temporal variations. This paper proposes a novel two-stage Encoder-Decoder (ED) network with improved generalization capability and forecasting accuracy. The proposed architecture is based on a Short Receptive field based Dilated Causal Convolutional (SRDCC) network in the first stage and a Bi-directional Long Short-Term Memory (BiLSTM) network in the second stage. Using real-valued data, the proposed ED architecture is quantitatively and qualitatively analyzed in comparison with state-of-the-art machine-learning and hybrid deep-learning STLF models. The evaluation matrix used for the comparison consists of six evaluation parameters. The extensive experimentation for multi-step ahead STLF validates the efficiency of the proposed technique in terms of accuracy in comparison with the other employed models. The CNN-LSTM revealed the best performance among all other implemented parametric and non-parametric forecasting models; however, the proposed ED architecture proves to be 35% more accurate compared to CNN-LSTM and has the tendency to capture the local trends in an electrical load pattern more accurately. Moreover, a detailed comparative analysis of the computational complexity of the proposed ED architecture is also conducted to show the real implementation prospect.

1. Introduction

The electrical load forecasting is vital for power systems for economic operations, such as unit commitment, maintenance, and the demand schedule for power generating units at different levels of the energy sector. An accurate forecast not only helps Distribution System Operators (DSOs) to avoid highly disruptive blackouts but also assists them in planning diversified generation with stability. Moreover, it reduces the cost of power generation and dispatching for renewable and non-renewable energy sources to maintain a low carbon and sustainable environment (Cavallo, MarinescuIvana, & Dusparic, 2015; Jawad et al., 2021). An inaccurate load forecast not only impacts generation planning but also hampers the protection and security of electrical power systems (Javed et al., 2021). Furthermore, the Short-Term Load Forecasting (STLF) can play an imperative role in the structuring of economic, secure, and reliable operating strategies for the distribution electrical utilities (Khan et al., 2020). Therefore, the motivation behind this research paper is to empower the electric utilities with a state-of-the-art and improved hybrid Deep-Neural Network (DNN) methodology for the optimal solution of the STLF problem.

* Corresponding author at: Department of Electrical and Computer Engineering, COMSATS University Islamabad, Lahore Campus, 54000, Pakistan.
E-mail addresses: [email protected] (K. Ijaz), [email protected] (M. Jawad), [email protected] (I. Khosa), [email protected]
(N. Shabbir).
1 Co-first author: U. Javed and K. Ijaz contributed equally to this paper.
2 ORCID: https://orcid.org/0000-0003-3730-2128.

https://doi.org/10.1016/j.eswa.2022.117689
Received 24 February 2022; Received in revised form 20 April 2022; Accepted 28 May 2022
Available online 4 June 2022
0957-4174/© 2022 Elsevier Ltd. All rights reserved.

High nonlinearity in electrical load patterns due to weather-sensitive loads poses a challenging problem for constructing an accurate STLF model. Over the past two decades, many scientific research works have been presented on short-term electrical load forecasting with a prediction horizon from one hour to a few weeks. The statistical regression methods include the ARIMA and SARIMA models, which use the lagged moving-average values of the STLF time series data (Edigera & Akarb, 2007; Musbah & El-Hawary, 2019). However, the statistical methods lack the capability to capture temporal variations and non-linear electrical load patterns (Deb, Zhang, Yang, Lee, & Shah, 2017; Tayaba, Zia, Yanga, Lu, & Kashif, 2020). The performance of statistical methods is enhanced using Principal Component Analysis (PCA), which is a dimension reduction technique. However, the PCA process results in the loss of a few key features if the coefficients of the covariance matrix are not adjusted properly (Bianchi, Santis, Rizzi, & Sadeghian, 2015).

For STLF, several Machine Learning (ML) models, such as Artificial Neural Networks (ANNs), K-Nearest Neighbors (KNN), and Support Vector Regression (SVR), are capable of dealing with non-linear temporal variations in electrical load data (Javed et al., 2021; Yildiz, Bilbao, & Sproul, 2017). However, the drawbacks associated with such algorithms are the vanishing gradient, overfitting, and complex hyper-parameter tuning problems (Li, Shi, Sibtain, Li, & Mbanze, 2020; Mamun et al., 2020; Shi, Xu, & Li, 2018). The hybrid machine learning models, such as Genetic Algorithm (GA) based weighted KNN, KNN-ANN, and KNN-SVR, successfully extract prominent features by resolving the above-mentioned problems (Fan, Guo, Zheng, & Hong, 2019; Velasco, Estoperez, Jayson, Sabijon, & Sayles, 2018). However, the complicated architectures and an unknown number of optimal clusters restrain the validity of KNN based hybrid ML models (Fallah, Deo, Shojafar, Conti, & Shamshirband, 2018). The use of the extreme learning machine (ELM) with Levenberg–Marquardt (LM) and Conditional Mutual Information-based Feature Selection (CMIFS) enhances the forecasting performance of Feed Forward Neural Networks (FFNN), but it also increases the time complexity due to the Hessian matrix (Li, Wang, & Goel, 2016; Zhang & Chiang, 2020). Particle Swarm Optimization (PSO) and GA based NNs were also presented to optimize the weights of step-ahead and multi-step ahead STLF models (Buitrago & Asfour, 2017; Jawad et al., 2018). However, the above-mentioned metaheuristic algorithms can easily converge to a local optimum under a diversified feature space (Bouktif, Fiaz, Ouni, & Serhani, 2018).

The Deep Learning (DL) methods have enhanced the accuracy of STLF models using highly diversified input data compared to conventional ML algorithms (Aslam et al., 2021). Recurrent Neural Networks (RNNs) capture the non-linear input–output relationship efficiently in the STLF problem (Brezak et al., 2012). However, the vanishing gradient issue and the inadequacy of capturing long-term dependencies limit the applications of RNNs (Khan, Jawad, & Khan, 2021; Sherstinsky, 2020). The vanishing gradient problem is removed by Long Short-Term Memory (LSTM) networks, which enhance the capability of learning the temporal correlations and long-term dependencies present in electrical-load curve patterns (Kong et al., 2019). However, the large number of parameters in the LSTM model causes over-fitting and reduces STLF performance (Lv, Liu, Yu, Zheng, & Lv, 2020; Shi et al., 2018).

With the increase of high dimensional features and variational parameters, such as weather conditions, timestep, and historical load values, hybrid DL models have been proposed to reduce the prediction error in STLF. The RNN-LSTM has eradicated the vanishing gradient problem by cascading LSTM with RNN (Shi et al., 2018), but it is still not able to capture all the necessary input parameters required for electrical load forecasting, such as the temporal and climatic features and the patterns in historical electrical load data. After the deployment of RNN-LSTM, advanced research studies are currently focusing on hybridizing the Convolutional Neural Network (CNN) with other DL techniques for time-series data. The CNN gives high accuracy in estimation and can extract prominent features and patterns present in high dimensional time-series electrical load data (Li, Ota, & Dong, 2017). Therefore, the advantages of both CNN and LSTM can be merged to develop a new hybrid CNN-LSTM model, which manifests better prediction accuracy and lower error metrics than metaheuristic approaches and conventional ML and DL techniques (Farsi, Amayri, Bouguila, & Eicker, 2021; Liu, Zhang, & Song, 2020; Rafi, Nahid-Al-Masood, & Hossain, 2021; Somu & Ramamritham, 2021). The hybrid CNN models still experience the overfitting issue due to the massive use of hidden layers (Sadaei, Silva, Guimarães, & Leee, 2019). Moreover, the hybrid CNN architectures have a large training time. The hybrid DL models show advancement in diminishing the errors in STLF at the cost of increased time complexity compared to conventional neural networks. The time complexity increases due to the cascade combination of multiple neural networks, since each constituent architecture consumes valuable time to execute hyper-variable functions. With the existence of High-Performance Computing (HPC) servers, the computational complexity of hybrid DL algorithms is not a big issue in modern research. Therefore, the current research studies pay more attention towards improving the accuracy of the predicted electrical load for the stability of real-time power system operations (Ali et al., 2020).

A bibliometric analysis of the STLF problem is conducted using SCOPUS to spotlight the previous contributions of different researchers and the state-of-the-art algorithms developed for STLF models, as demonstrated in Fig. 1, plotted with the VOSviewer software. In the last decade, the focus of the research has been more inclined to adopt DNN and LSTM architectures, while few researchers have used RNN along with the LSTM network to improve the operational efficiency of STLF models. In Fig. 1, the illustration also gives insights into the novelty of hybrid DL models, since very few researchers have utilized DNN hybrid architectures, and the above-mentioned shortcomings suggest that there is still a need for further advancement in STLF models to reduce the prediction error.

Fig. 1. Bibliographic overview of state-of-the-art architectures for electrical load forecasting.

This paper proposes a Short Receptive Field based Dilated Causal Convolutional Network (SRDCC) and Bidirectional LSTM (BiLSTM) in an Encoder-Decoder (ED) configuration to increase the STLF performance and generalization ability. In the encoder module, the SRDCC captures the relevant electrical load patterns of the local trends using smaller size convolution filters. Afterwards, the distinct types of features extracted from the encoder module are fused into the vector representation. In the decoder module, the BiLSTM unit is implemented to convert the vector representation into the predicted value of the consumed electrical load with a comparatively smaller number of neurons compared to


Table 1
Summary of various state-of-the-art short-term electrical load forecasting models.

Ref. | Methodology | Input resolution | Input parameters | Forecast horizon | Electricity Market/Data | Performance Metrics
(Cavallo et al., 2015) | ANN | Hourly | Temperature, Humidity, Day type | Daily load | Irish Commission of Energy Regulation | NRMSE
(Edigera & Akarb, 2007) | ARIMA, SARIMA | — | Historical load | Yearly | Energy resources of Turkey | MSE
(Yildiz et al., 2017) | Multiple linear regression, ANN, SVM, NARX | — | Climate, temporal, historical load | Hourly, daily peak | University of New South Wales, Sydney, Australia | MAPE, RMSE, MBE, R-Square
(Bianchi et al., 2015) | PCA, Autoregressive | 10 min | Lagged loads | 1-step ahead, 144-step ahead | Azienda Comunale Energia e Ambiente, Rome, Italy | NRMSE
(Velasco et al., 2018) | KNN-ANN | Monthly | Metering point name, date time, kilowatts delivered, kilowatts per hour delivered, kilovolt-amps reactive delivered | Day-ahead base, intermediate and peak hours load | Power Utility Company, Philippines | MAPE
(Jawad et al., 2018) | FARIMA | 30 min | Weekdays, weekends, power load | Daily power load | EIRGIRD Group, Ireland | RMSE, MAPE, MAE, NRMSE
(Fan et al., 2019) | W-KNN | — | Historical load | 8 hours | National Electricity Market (Australia) | RMSE, NMSE, MAPE, MAE
(Yildiz et al., 2017) | KNN | 30 min | Minimum temperature, maximum temperature | Daily load | Australian National Electricity Market (NEM) | MAPE, MAE
(Li et al., 2016) | Extreme Learning Machine | — | Past load, past temperature, forecasted temperature, day of week, weekend | Hourly, daily | ISO New England | MAPE, MAE, RMSE
(Buitrago & Asfour, 2017) | ANN with NARX | — | Month, day, hour of the day, day of week, working day, temperature, dew point | Daily load | ISO-NE grid operator, England | AE%, MAPE
(Jawad et al., 2018) | GA-NARX-NN | 15 min | Day of the week, dry bulb temperature, dew point temperature, hour of the day, working or off day, previous week same day same hour load, average load | 168-hour load | National Estuarine Research Reserve System, Texas | MAPE, RMSE, Error variance
(Kong et al., 2019) | LSTM RNN | — | Time-steps, sequence of consumed energy, sequence of time-of-day indices, weekday indices, holidays | 5-days aggregated load | Commercial-scale smart grid project, SGSC, Australia | MAPE
(Somu & Ramamritham, 2021) | CNN-LSTM | 15 min | Day, month, year, hour, minutes, seconds and energy consumed | Daily analysis on selected weekdays (Tuesday, Thursday, Saturday) | IIT-Bombay, India | MSE, RMSE, MAPE, MAE
Our Work | SRDCC-BiLSTM-ED | 15 min | Temperature, humidity, hour of the day, day of the week, working day, previous 3 h electrical load, previous 3 days same hour electrical load | Step ahead & day ahead | Lahore Electricity Supply Corporation (LESCO), Pakistan | MAPE, RMSE, MAE, R-Square, Std. Dev., PA, PI

Abbreviations: MBE: Mean bias error, ANE: Absolute normalized error, Max. AE: Maximum absolute error, AAE: Average absolute error, APE: Absolute percentage error, RPE: Relative percentage error, MARPE: Mean absolute relative percentage error, NMSE: Normalized mean square error.

conventional LSTM. The proposed ED architecture is implemented on the real-time electric load profile of Lahore, Pakistan for step-ahead and day-ahead STLF. However, the SRDCC-BiLSTM architecture is equally applicable to other datasets with small tuning of hyper-parameters because the small size of the convolution layers and the small number of neurons and hidden layers in BiLSTM add robustness against over-fitting. To the best of our knowledge, the ED configuration is implemented for the first time for the STLF problem. Moreover, Table 1 illustrates a detailed overview of various state-of-the-art short-term electrical load forecasting models and gives a comprehensive comparative analysis in terms of methodology, resolution, parameters of input features, forecasting horizon, details of data availability, and the performance metrics used in the cited research articles. Considering the above discussion, the main contributions of this paper are:

• A detailed Exploratory Data Analysis (EDA) is performed to identify the exogenous electrical and non-electrical multi-variable inputs for designing the STLF model. Correlation analysis is used to identify the historical load values that are useful for prediction. Moreover, QQ-plots and boxplots between the proposed inputs and the output are analyzed to validate the findings of the correlation analysis. Based on the results of the EDA of the electrical and non-electrical parameters, a Predictor Matrix (PM) is developed for the modeling of STLF.
• For the sake of a detailed and valid comparative analysis and to validate the performance of the proposed ED model, the same PM is used to develop linear parametric forecasting models, such as Auto-Regressive with Exogeneous Inputs (ARX), Auto-Regressive Moving Average with Exogeneous Inputs (ARMAX), and Output Error (OE), and ML modeling techniques, such as KNN, SVM, Tree Bagger, ANN-PSO, and ANN-LM.
• The proposed SRDCC-BiLSTM framework is developed for STLF using the same PM, which combines the key features of both CNN and BiLSTM and reveals considerable progress compared to the aforementioned techniques.
• In SRDCC, small-size dilated causal convolution filters are adopted in the encoder section, which reduce the receptive field of the lower layer to extract the specific patterns of the local trends in time-series data without increasing the model parameters. However, the proposed SRDCC block deploys filters of size 2 × 2 in a second one-dimensional convolutional layer to improve the generalization capability. This type of filter combination has never been presented before in an electrical STLF method.
• A suitable framework and an appropriate filter size impose restrictions on the extensive change of the input sequence dimensions during the convolution process in the encoder module. Therefore, this framework ensures that all the necessary features and patterns


Fig. 2. Exploratory Data Analysis: (a) Autocorrelation plot of electrical load consumption; (b) QQ-plot of temperature vs electrical load; (c) QQ-plot of humidity vs
electrical load; (d) QQ-plot of present electrical load vs previous hour lagged load values; (e) QQ-plot of present electrical load vs previous 2nd hour lagged load
values; (f) QQ-plot of present electrical load vs previous 3rd hour lagged load values; (g) QQ-plot of present electrical load vs previous 3 h average lagged load values;
(h) QQ-plot of present electrical load vs previous day same hour lagged load values; (i) QQ-plot of present electrical load vs previous 2nd day same hour lagged load
values; (j) QQ-plot of present electrical load vs previous 3rd day same hour lagged load values; (k) Electrical load consumption curve of the entire month.

extracted from the convolutional process must participate in the next stages of the prediction process.
• In SRDCC, small-size dilated causal convolution filters are adopted in the encoder section, which reduce the receptive field of the higher layers to extract the specific patterns of the local trends in time-series data without increasing the model parameters. A suitable framework imposes restrictions on the extensive change of the input sequence dimensions during the convolution process in the encoder module.
• In BiLSTM, the extracted patterns of the local trends are transformed into the vector representation to make the data compatible with the BiLSTM block. The proposed ED framework forecasts the step-ahead and day-ahead electrical load values using BiLSTM in the decoder module with the priority of extracting the complete features from the electrical load data and avoiding over-fitting, under-fitting, and vanishing gradient problems.
• The SRDCC-BiLSTM architecture is equally applicable to other datasets with small tuning of hyper-parameters because the small size of the convolution layers and the small number of neurons and hidden layers in BiLSTM add robustness against over-fitting.
• A comprehensive qualitative and quantitative comparison among all developed linear and non-linear parametric forecasting models is performed over different seasons of the entire year. The validity of the proposed model is investigated using several performance metrics, such as Mean Absolute Percentage Error (MAPE), Root Mean Square Error (RMSE), Mean Absolute Error (MAE), R-square, Standard Deviation (Std. Dev.), Prediction Accuracy (PA), and Prediction Interval Bound Percentage (PI %).


Fig. 3. Box-Plot analysis of Electrical load Profile for January 2019 (Electrical Load (MW) vs. Day of the Month).

• A detailed complexity analysis of the proposed SRDCC-BiLSTM model is conducted to show the computational efficiency and time complexity of the algorithm.

2. Exploratory data analysis

The EDA is a significant statistical approach to examine the key features present in the dataset and reveal the relationship between inputs and outputs (Javed et al., 2021). The EDA is a preliminary analysis to explore hidden features and patterns before formal hypothesis investigation and modeling (Komorowski & Marshal, 2016).

2.1. Dataset description

In this study, we used the electrical load profile and climatic data of Lahore, Pakistan (Javed et al., 2021). The residential electrical load profile is collected from the Lahore Electric Supply Company (LESCO), Pakistan (Javed et al., 2021). The data is recorded in real-time at a 15-minute time interval, constituting 96 samples per day, from Year 2010 to Year 2019 at an aggregated level. For the same years, climatic data, such as the temperature and humidity of the Lahore region, is collected from the online database "Raspisaniye Pogodi Ltd." at a 15-minute time interval (Javed et al., 2021; rp5.ru, n.d.). All the models developed in this research are trained on 8 years of data from Year 2010 to Year 2017, while validated and tested on Year 2018 and Year 2019 data.

2.2. Input parameters description

The electrical load and demand profile at a given time depends on two climatic factors, namely the temperature and humidity of the respective area (Sobhani, Campbell, Sangamwar, Li, & Hong, 2019). In a residential load profile, the temporal factors, such as holidays and weekends, exhibit a different load profile than normal working days (Han, Sha, Grover-Silva, & Michiardi, 2014). The STLF methodologies not only incorporate historical load patterns, but also temporal and meteorological parameters (Javed et al., 2021; Jawad et al., 2018). Other specific climatic features, such as sunny and rainy days, are also vital input parameters to improve the accuracy of electrical load forecasting. However, we did not include these parameters because the temperature data of Lahore already captures the sunny and cloudy behavior inherently. Moreover, no separate access to such data is available in the electrical load dataset. The following EDA is applied on the dataset to identify the exogenous multi-variable input parameters that are necessary to be considered as inputs in the forecasting algorithm.

2.3. Auto-correlation analysis

Autocorrelation represents the degree of resemblance between a given time series and its lagged version over consecutive time intervals. Fig. 2(a) represents the autocorrelation plot of the electrical load profile for January 2019 over consecutive time intervals up to 1 week (96 lags/day). In Fig. 2(a), one lag represents a lagged version of 15 min; therefore, 96 lags represent the lagged version of one complete day, and the autocorrelation plot of 700 lags presents lagged versions over consecutive time intervals up to almost 1 week. Fig. 2(a) illustrates that the electrical load has a strong correlation with the near past (the consecutive last 3 h) and with the previous day's similar-hours load profile. Therefore, based on this analysis, we selected the corresponding lag values as inputs in the PM used for the modeling of STLF.

2.4. Quantile-Quantile analysis

The quantile–quantile plots given in Fig. 2(b) and (c) represent a strong linear relationship between the climatic data (temperature and humidity) and the electrical power consumption. Moreover, Fig. 2(d)–(j) represents the quantile–quantile plots of the current electrical load value plotted against its lagged versions for January 2019. The straight linear correlation verifies the findings of the auto-correlation analysis. Moreover, Fig. 2(k) represents the electrical load profile of January 2019. The electrical load curve of January can be formulated by accumulating the daily electrical load curves of the discussed month. The electrical load curve of the entire month follows a sinusoidal pattern such that the working days of the week, from Monday to Friday, have the highest electrical load consumption, whereas Sundays have the lowest electrical load consumption. Moreover, banks, institutes, and offices are also closed on Saturday; therefore, Saturday also has less electrical load consumption than the working days. The same periodic nature of the electrical load profile can be observed for the rest of the weeks of January. The plotted electrical load curve depicts the existence of daily and weekly periodic load patterns. Therefore, an input named day of the week is considered in the PM to develop the STLF models.

2.5. Box plot analysis

The temporal factors, such as weekends, public holidays, and festivals, are also considered as inputs for the PM because these temporal factors exhibit a different load profile than normal working days. Therefore, a box-plot analysis is conducted in Fig. 3 to depict the difference between the weekday and holiday electrical load patterns in terms of mean and standard deviation. The discussed figure also reveals the periodicity present in the electrical load profile. For instance, the 6th, 13th, 20th, and 27th day of January 2019 were Sundays, which were the off days and have a similar load profile. The analysis shows that the electrical load consumption on Sunday is lower than on other weekdays. Due to this fact, a binary input is included in the PM to differentiate between weekdays (1) and holidays (0).
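The lag structure identified by the EDA translates directly into the lagged-load columns of the Predictor Matrix. The sketch below is a minimal illustration of that step, assuming a pandas workflow; the CSV path and the column names (load_mw, temperature, humidity) are hypothetical placeholders, not part of the paper, and the working-day flag is simplified to weekday vs. weekend.

```python
import pandas as pd

# Minimal sketch (not the authors' code): lag features suggested by the EDA for a
# 15-minute resolution load series. 4 lags = 1 hour, 96 lags = 1 day.
df = pd.read_csv("lesco_load_15min.csv", parse_dates=["timestamp"], index_col="timestamp")
load = df["load_mw"]

# Autocorrelation at the lags discussed in Section 2.3
for lag in (4, 8, 12, 96, 192, 288):
    print(f"lag {lag:4d} ({lag * 15:5d} min): r = {load.autocorr(lag):.3f}")

# Predictor-Matrix style features (Fig. 4): weather, temporal flags, lagged loads
pm = pd.DataFrame(index=load.index)
pm["temperature"] = df["temperature"]
pm["humidity"] = df["humidity"]
pm["hour_of_day"] = load.index.hour
pm["day_of_week"] = load.index.dayofweek
pm["working_day"] = (load.index.dayofweek < 5).astype(int)  # simplified: 1 = weekday, 0 = holiday
pm["prev_1h"] = load.shift(4)
pm["prev_2h"] = load.shift(8)
pm["prev_3h"] = load.shift(12)
pm["prev_3h_avg"] = load.shift(4).rolling(12).mean()        # average of the previous 3 hours
pm["prev_day_same_hour"] = load.shift(96)
pm["prev_2day_same_hour"] = load.shift(192)
pm["prev_3day_same_hour"] = load.shift(288)
pm["target"] = load
pm = pm.dropna()
```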


Fig. 4. Predictor Matrix for STLF Models. Inputs: Temperature (m), Humidity (m), Hour of the Day, Day of the Week, Working Day, Previous 1st Hour Load (d, h-1, m), Previous 2nd Hour Load (d, h-2, m), Previous 3rd Hour Load (d, h-3, m), Previous 3 Hours Average Load, Previous 1st Day same Hour Load (d-1, h, m), Previous 2nd Day same Hour Load (d-2, h, m), Previous 3rd Day same Hour Load (d-3, h, m). Output: Forecasted Load (d, h, m+1), (d, h+1, m), (d+1, h, m).

2.6. Predictor Matrix

In the light of the above-analyzed input parameters for the STLF models, the Predictor Matrix (PM) is developed, which is represented in Fig. 4. In Fig. 4, all the aforementioned parameters, such as temperature, humidity, and the temporal, seasonal, and historical electrical load values, are considered as potential inputs in the PM, whereas the present electrical load is considered as the output of the PM. Moreover, in Fig. 4, the variables 'm' and 'd' represent the month of the year and the day of the month, respectively. The 'h' represents the electrical load data with 15-minute resolution (quarter-hourly) as the time-step.

3. Methodology

The proposed hybrid ED architecture (SRDCC-BiLSTM) focuses on the mitigation of the over-fitting problem and avoids the overwhelming of model parameters, such as the size of the convolutional layer that includes the weights of the filter, the bias unit, and the dimensions of the input series in the Dilated Causal Convolution Neural Network (DCCN) block. The proposed hybrid ED architecture for STLF is a combination of two neural network architectures: (a) DCCN with different filter sizes and (b) Bidirectional LSTM (BiLSTM). However, four modifications are introduced in the hybrid ED architecture: (a) a short receptive field is introduced in DCCN using 1 × 1 and 2 × 2 small-size dilated causal convolution filters for both step-ahead and multi-step ahead STLF, (b) the feature map and model parameters are restricted, (c) an unnecessary change of the input sequence vector representation is avoided, and (d) a smaller number of BiLSTM neurons is used to avoid over-fitting. The following subsections briefly describe the different components of the DCCN and Bi-LSTM architectures, followed by the proposed hybrid ED architecture.

3.1. Causal convolution

A one-dimensional convolution layer implements the convolution between an input sequence and a filter, which is a sliding process of the filter weights across the input series, in which the filter weights (model parameters) are sequentially applied to overlapping regions of the series (Oord et al., 2016). However, the causal convolution holds the causality property, which states that the convolution at time step t does not depend upon the future samples x_{t+1}, x_{t+2}, ..., x_T in the time series data, ensuring the data will never leak from the future into the past (Rueda, Suárez, Torres, & d., 2021). A stack of causal convolutional layers is visualized in Fig. 5.

Fig. 5. Stack of Causal Convolution Layer.

3.2. Dilated convolution

The input context window defines the temporal modeling with non-recurrent networks. The number of model parameters increases as the size of the input context window increases, which consumes a lot of memory and saturates the long-range memory capacity. The dilated convolution is instigated to resolve this issue as shown in Fig. 6, where the convolution filter is applied over an area larger than its length by skipping certain input values (Oord et al., 2016).
Fig. 7 depicts the theoretical concept of dilated causal convolution. The dilated causal convolution introduces a new parameter during convolution called the dilation factor. The dilation factor is responsible for creating space between the values in a mask. For instance, a 3 × 3


Fig. 6. Dilated Convolution Layer.

Fig. 7. Theoretical sketch of Dilated Convolution (first and second layers: a 3 × 3 convolution filter compared with a 2 × 2 dilated convolution filter with dilation factor 2).

Fig. 8. Dilated Causal Convolution (stacked hidden layers with dilation factors 1, 2, 4, and 8 between the input and output, computing p(y|x) as in Eq. (2)).

mask with a dilation factor of two will have the same receptive field as a 5 × 5 mask. However, the regular convolution with a 5 × 5 mask has 25 parameters if we ignore the bias value. Instead, the dilated convolution has only nine model parameters with the same receptive field. Therefore, the dilated causal convolution delivers a wider field of view at the same computational cost. Eq. (1) represents the dilated convolution between the input signal and the kernel (mask) (Oord et al., 2016):

$$F(t) = (k *_{l} f)(t) = \sum_{\tau=0}^{s-1} k_{\tau} \cdot f_{t - l\tau} \tag{1}$$

where $l$ and $s$ denote the dilation factor and the filter size, respectively. More precisely, if $l = 1$, a dilated convolution performs similar to a standard convolution.
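To make Eq. (1) concrete, the following is a minimal NumPy sketch of a one-dimensional dilated causal convolution; it is not the authors' implementation, and the function name and the synthetic signal are illustrative only. The comment also notes how the receptive field grows when such layers are stacked with dilation factors 1, 2, 4, 8, as discussed below.

```python
import numpy as np

# Minimal sketch of Eq. (1): a 1-D dilated causal convolution. The kernel k has s taps,
# l is the dilation factor; the output at time t only uses samples at or before t
# (causality), spaced l steps apart (dilation).
def dilated_causal_conv1d(f: np.ndarray, k: np.ndarray, l: int = 1) -> np.ndarray:
    s = len(k)
    out = np.zeros(len(f))
    for t in range(len(f)):
        acc = 0.0
        for tau in range(s):
            idx = t - l * tau          # f_{t - l*tau}; terms before the series start are dropped
            if idx >= 0:
                acc += k[tau] * f[idx]
        out[t] = acc
    return out

# Receptive field of a stack with dilations 1, 2, 4, 8 and kernel size s:
# R = 1 + (s - 1) * (1 + 2 + 4 + 8), i.e. it grows multiplicatively with depth.
signal = np.sin(np.linspace(0, 8 * np.pi, 96))   # one synthetic "day" of 15-min samples
kernel = np.array([0.5, 0.3, 0.2])               # s = 3 taps
y = dilated_causal_conv1d(signal, kernel, l=2)   # same receptive field as a 5-tap regular kernel
```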


A stack of dilated convolutions enables networks to have exceptionally large receptive fields with just a few layers by increasing the dilation factor multiplicatively at each layer (e.g., 1, 2, 4, 8). With dilated convolution, we extract the patterns of the local trends of a long history of data with only nine dilated convolution layers of this form.

3.3. Dilated causal convolution

The dilated and causal convolutions can be combined to utilize the properties of both causality and dilation at the same time and obtain the advantages of both convolutions simultaneously, as shown in Fig. 8. The dilated causal convolution leads to a new CNN model that can be considered as a statistical model. The new architecture does not saturate the memory capacity of the system (Oord et al., 2016). The conditional distribution of the output sequence $y_1, y_2, \dots, y_T$ given the input sequence $x_1, x_2, \dots, x_T$ is provided as the product of conditional probabilities computed by Eq. (2), where $N$ represents the length of the receptive field (Oord et al., 2016).

$$p(y \mid x) = \prod_{i=1}^{T} P(y_i \mid x_{i-N+1}, x_{i-N+2}, \dots, x_i) \tag{2}$$

The most renowned DL architecture that uses the dilated causal convolution is the Temporal Convolutional Network (TCN) for STLF. TCN uses a kernel size of two and higher in the different blocks of the dilated causal convolutional filter. TCN also implements more than three different blocks of stacked layers of dilated causal filters. The TCN architecture implements zero padding to make the input and output sizes of the dilated causal layer the same. The dilation factor in the TCN network is 1, 3, 3, 12, and 24, respectively, in the different blocks of the dilated causal convolutional filter (Lara-Benítez, Carranza-García, Luna-Romera, & Riquelme, 2020).

3.4. Bidirectional LSTM

The LSTM consists of a backward-propagation path that gathers previous information in the sequence data. However, BiLSTM allows both backward and forward propagation with the addition of a forward LSTM layer. The forward LSTM reverses the data with respect to the backward LSTM. The output layer of BiLSTM considers both the past and future information during the training of the model so that the hidden cells in the network can simultaneously obtain both past and future contexts (Ma, Dai, & Zhou, 2021). The structure of the Bidirectional LSTM is depicted in Fig. 9.

Fig. 9. Bi-directional LSTM Architecture (input, forward LSTM layer, backward LSTM layer, activation layer, and output).

For high-frequency time series data forecasting, the BiLSTM network is more effective compared to the conventional LSTM. In BiLSTM, the computational cost of the reverse and forward layers is the same. However, the direction of propagation of the hidden state data is reversed to obtain the sequential type of time information. The BiLSTM network calculation formulas for the forward and reverse layers are given in Eqs. (3) and (4) (Ma et al., 2021):

$$h_f = f(w_{f1} \cdot x_t + w_{f2} \cdot h_{t-1}) \tag{3}$$

$$h_b = f(w_{b1} \cdot x_t + w_{b2} \cdot h_{t+1}) \tag{4}$$

where $h_f$ and $h_b$ denote the forward and reverse LSTM network outputs, respectively. The final output of the BiLSTM is computed as (Ma et al., 2021):

$$y_i = g(w_{o1} \cdot h_f + w_{o2} \cdot h_b) \tag{5}$$

3.5. Proposed Encoder-Decoder architecture

The inputs and output of the ED model are defined in the PM given in Fig. 4. The proposed framework is a cascade combination of an encoder and a decoder, where a novel SRDCC module is employed in the encoder section. In a conventional CNN, a large filter size can generalize the non-linear electrical load pattern effectively but fails to capture the patterns of electrical load peaks because of the incapability of extracting nonlinear, highly diversified specific features. A good encoder architecture should capture both generalized and specific trends effectively. Therefore, for step-ahead load forecasting, we use a small filter size in the encoder module, unlike TCN, which implements a filter size of 2x2 and higher in the various stages of the dilated causal convolutional network, with the advantage of restricting the input dimensions of the next hidden layer and reducing the model parameters. The intuition behind using the short receptive field induced by the 1 × 1 dilated causal convolution filter size is to maximize the capability of capturing the more specific nonlinear local patterns in the electrical load data. The small filter size has a smaller vicinity to slide over the input data for the convolution, which acts as the simple CNN architecture. The dilation rate is set to 1 in the first one-dimensional convolutional layer. Moreover, 128 dilated causal convolution filters are deployed in the first one-dimensional convolutional layer to extract the maximum possible electrical load patterns. In the second layer of the SRDCC block, 128
8

dilated causal convolution filters of size 2x2 are implemented with a dilation rate of two to capture the generalized trends in the electrical load pattern with few model parameters. After the two convolution layers of the SRDCC block, a flatten layer is employed for the vector representation of the learned features. The flattening layer adds compatibility with the next decoder module.
The multi-step ahead forecasting requires more aggressive extraction of all possible local trends from the electrical load data. Therefore, we use 128 very short receptive field based dilated causal convolution filters of size 1x1 to capture more specific features in both the first and second convolutional layers. Moreover, the dilation rate is set to 1 and 2 in the first and second one-dimensional convolutional layers, respectively.
In the decoder section, a BiLSTM module is used to record the long-term dependencies in the data. After multiple trial runs, we finalized 128 neurons for the BiLSTM module with a linear activation function, as this number avoids both under-fitting and overfitting problems. The output layer of the proposed model consists of a single neuron for the predicted electrical load value. The architectural overview of the proposed ED model (SRDCC-BiLSTM) is presented in Fig. 10.

Fig. 10. Architectural sketch of the proposed SRDCC-BiLSTM Model for STLF.
Model
3.6. Theoretical sketch of proposed forecasting framework

Fig. 11 represents the theoretical sketch of the proposed system


framework. The entire multivariate electrical STLF dataset prepared
Training Forecasting
during the exploratory data analysis is segregated into three subsets for Module Module
training, testing, and validation. The 80% of data is used for training,
while 20% of data each is reserved for testing and validation. The entire Fig. 11. Theoretical sketch of Proposed SRDCC-BiLSTM Framework.
dataset is normalized individually using the standard min–max scaling
technique for better performance and convergence. After data normal­
4. Experimental results and discussions
ization, the training data is converted into PM form (Fig. 4) for the
training of the proposed SRDCC-BiLSTM model. The trained SRDCC-
In this Section, the proposed SRDCC-BiLSTM deep-learning model is
BiLSTM model is validated using a validation dataset for the fine-
thoroughly compared with traditional linear and non-linear parametric
tuning of hyper-parameters and to eliminate over-fitting and under-
models, such as ARX, ARMAX, and OE, KNN, Bagged Trees, SVM, ANN-
fitting problems. The Adam optimizer is used during the training and
PSO, ANN-LM (two hidden layer), ANN-LM, LSTM, and CNN-LSTM.
validation phase of the proposed model. Finally, the model is evaluated
Moreover, a detailed qualitative and quantitative analysis is conduct­
for unseen test set to evaluate the effectiveness of the proposed model.
ed based on the following evaluation matrix that consists of seven
Moreover, the forecasting output of the proposed ED model is compared
evaluation parameters.
with the standard LSTM model, CNN-LSTM, and ANN using same PM.
Furthermore, same learning rate, minibatch, and iterations are used
during training for all the above-mentioned NN algorithms for a fair 4.1. Evaluation metric
comparison.
In Literature, various evaluation parameters are used for perfor­
mance analysis of the prediction/forecasting algorithms. Therefore, we


used seven key performance parameters to define our evaluation metric: MAPE (Eq. (6)), RMSE (Eq. (7)), MAE (Eq. (8)), the R-square value (Eq. (9)), Std. Dev. (Eq. (10)), PA (Eq. (11)), and PI% (Eq. (12)) (Tarkhaneh & Shen, 2019). In Eqs. (6)–(11), $x_i$ represents the target electrical load value, $y_i$ represents the forecasted output electrical load value, and $n$ represents the number of data points evaluated. In Eq. (12), the PI index is the percentage occurrence of predicted values within the prediction interval bounds, whereas PoC is the probability of coverage index that calculates the percentage of predicted values that lie inside the PIs. The prediction interval bounds are also termed the lower and upper bounds, which are calculated from the historical electrical load pattern using the bootstrap algorithm. For example, if the PI index is 100%, then all forecasted values are within the bounds. Moreover, the value of the PI index must be greater than the nominal coverage ($\mu$), which is set to 90% for the discussed electrical load forecasting problem.

$$\text{MAPE} = \frac{1}{n}\sum_{i=1}^{n}\left|\frac{x_i - y_i}{x_i}\right| \tag{6}$$

$$\text{RMSE} = \sqrt{\frac{\sum_{i=1}^{n}(x_i - y_i)^2}{n}} \tag{7}$$

$$\text{MAE} = \frac{\sum_{i=1}^{n}|y_i - x_i|}{n} \tag{8}$$

$$R^2 = 1 - \frac{\text{Sum Squared Regression Error}}{\text{Sum Squared Total Error}} \tag{9}$$

$$\text{Std. Dev.} = \sigma = \sqrt{\frac{\sum_{i=1}^{n}(x_i - \mu)^2}{n}} \tag{10}$$

$$\text{PA} = \frac{\sum_{i=1}^{n}(x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\sum_{i=1}^{n}(x_i - \bar{x})^2 \sum_{i=1}^{n}(y_i - \bar{y})^2}} \tag{11}$$

$$\text{PI\%} = \begin{cases} 0, & \text{PoC} < \mu \\ 1, & \text{PoC} \geq \mu \end{cases} \tag{12}$$

Table 2
Quantitative comparison of Linear Parametric Models for Step-Ahead STLF.

Period (2019) | Error | OE | ARX | ARMAX
January | MAPE | 8.42 | 6.82 | 5.12
January | MAE | 142.87 | 120.08 | 92.05
January | RMSE | 173.14 | 155.08 | 120.25
January | R2 | 0.65 | 0.72 | 0.83
January | Std. Dev. | 4.61 | 4.18 | 4.31
January | PA | 0.90 | 0.85 | 0.92
January | PI % | 39.41 | 73.26 | 74.65
April | MAPE | 4.46 | 2.26 | 3.05
April | MAE | 105.40 | 53.03 | 72.20
April | RMSE | 258.59 | 108.92 | 119.80
April | R2 | 0.50 | 0.91 | 0.89
April | Std. Dev. | 7.82 | 6.48 | 6.54
April | PA | 0.82 | 0.95 | 0.95
April | PI % | 78.39 | 91.06 | 87.90
June | MAPE | 3.71 | 2.86 | 3.82
June | MAE | 120.38 | 90.67 | 122.98
June | RMSE | 186.74 | 143.48 | 171.35
June | R2 | 0.75 | 0.85 | 0.79
June | Std. Dev. | 6.67 | 6.26 | 5.96
June | PA | 0.87 | 0.93 | 0.90
June | PI % | 83.44 | 88.72 | 89.44
October | MAPE | 5.61 | 2.45 | 3.27
October | MAE | 125.00 | 55.42 | 72.83
October | RMSE | 145.49 | 86.89 | 103.54
October | R2 | 0.74 | 0.91 | 0.87
October | Std. Dev. | 5.16 | 5.11 | 4.92
October | PA | 0.96 | 0.95 | 0.93
October | PI % | 74.66 | 82.80 | 80.81

4.2. Experimental analysis for step ahead forecasting

The proposed ED model is evaluated for every month of the Year 2019 data. Moreover, a comprehensive comparative analysis is conducted with widely used linear and nonlinear parametric modelling techniques based on the aforementioned evaluation metric. Using the PM defined in Fig. 4, we developed three widely used linear parametric modeling techniques, ARX, ARMAX, and OE, for STLF. Table 2 presents the comparison between the aforementioned models using the evaluation metric. The ARMAX takes advantage of its two hyper-parameters, i.e., Auto-Regressive (AR) and Moving-Average (MA). The former captures the sequence of lagged values present in the electrical load pattern, whereas the second hyper-parameter proficiently maps the input–output relationship between the high-dimensional features and the electrical load. Therefore, the ARMAX model performs better in representing the strong autocorrelation of the electrical load data and proficiently captures the local trend between the input parameters compared to OE and ARX. In addition, Fig. 12(a) shows the capability of ARMAX to detect variations and peak electrical load accurately. Conversely, the ARMAX is unable to find the closest match between the forecasted and actual load for the month of June, as shown in Fig. 12(b). It is concluded from Table 3 that there is a need for further improvement in the STLF, and the evaluation parameter values can be minimized further using a nonlinear high-performance forecasting algorithm.
Moreover, Fig. 12(a) and (b) also show that the mentioned linear models are incapable of forecasting electrical load peaks and abrupt

Fig. 12. Predicted step-ahead electrical load output using statistical and ML methodologies: (a) 7th January 2019, (b) 7th June 2019.
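Before moving to the model comparisons, the evaluation metric of Eqs. (6)–(12) can be computed with a short helper such as the sketch below. This is not the authors' code: the function name is hypothetical, the MAPE is scaled to percent to match how Tables 2–4 report it, and the prediction-interval bounds are assumed to come from a separate bootstrap of the historical load, as described in Section 4.1.

```python
import numpy as np

# Minimal sketch of the evaluation metric in Eqs. (6)-(12).
# x = target load, y = forecasted load; lower/upper are prediction-interval bounds
# (e.g. bootstrapped from the historical load), mu = nominal coverage (0.90 in the paper).
def evaluation_metric(x, y, lower=None, upper=None, mu=0.90):
    x, y = np.asarray(x, float), np.asarray(y, float)
    out = {
        "MAPE": np.mean(np.abs((x - y) / x)) * 100.0,                     # Eq. (6), in percent
        "RMSE": np.sqrt(np.mean((x - y) ** 2)),                            # Eq. (7)
        "MAE": np.mean(np.abs(y - x)),                                     # Eq. (8)
        "R2": 1.0 - np.sum((x - y) ** 2) / np.sum((x - x.mean()) ** 2),    # Eq. (9)
        "StdDev": np.sqrt(np.mean((x - x.mean()) ** 2)),                   # Eq. (10)
        "PA": np.corrcoef(x, y)[0, 1],                                     # Eq. (11)
    }
    if lower is not None and upper is not None:
        poc = np.mean((y >= lower) & (y <= upper))   # probability of coverage
        out["PI%"] = 100.0 * poc                      # percentage of predictions inside the bounds
        out["PI_ok"] = int(poc >= mu)                 # Eq. (12): 1 if PoC >= nominal coverage
    return out
```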


Table 3
Quantitative comparison of machine learning models for step-ahead electrical load forecasting.

Period (2019) | Error | KNN | Tree Bagger | SVM | ANN-PSO | ANN-LM (Double Layer) | ANN-LM (Single Layer)
January | MAPE | 7.56 | 4.46 | 3.71 | 3.95 | 3.85 | 2.47
January | MAE | 136.15 | 80.51 | 65.32 | 71.14 | 70.12 | 44.41
January | RMSE | 172.14 | 105.12 | 87.08 | 95.42 | 90.94 | 57.96
January | R2 | 0.65 | 0.87 | 0.91 | 0.89 | 0.90 | 0.96
January | Std. Dev. | 4.89 | 4.40 | 4.87 | 4.81 | 4.91 | 4.87
January | PA | 0.85 | 0.95 | 0.96 | 0.95 | 0.95 | 0.98
January | PI % | 79.06 | 83.26 | 82.05 | 77.67 | 81.32 | 83.61
April | MAPE | 6.49 | 3.58 | 2.98 | 2.64 | 3.26 | 2.24
April | MAE | 153.69 | 79.79 | 68.90 | 59.49 | 76.41 | 52.98
April | RMSE | 206.89 | 132.18 | 111.25 | 94.67 | 103.62 | 76.53
April | R2 | 0.68 | 0.87 | 0.91 | 0.93 | 0.92 | 0.96
April | Std. Dev. | 5.36 | 6.06 | 6.33 | 6.33 | 6.55 | 6.33
April | PA | 0.84 | 0.93 | 0.95 | 0.97 | 0.96 | 0.98
April | PI % | 81.59 | 89.11 | 89.62 | 90.93 | 85.85 | 89.48
June | MAPE | 6.36 | 3.05 | 2.43 | 2.44 | 3.00 | 1.81
June | MAE | 209.54 | 99.10 | 75.53 | 79.60 | 98.41 | 58.77
June | RMSE | 269.87 | 152.86 | 130.70 | 113.97 | 126.41 | 82.45
June | R2 | 0.48 | 0.83 | 0.88 | 0.91 | 0.89 | 0.95
June | Std. Dev. | 5.63 | 5.67 | 6.33 | 6.34 | 5.99 | 6.33
June | PA | 0.75 | 0.92 | 0.94 | 0.96 | 0.95 | 0.97
June | PI % | 88.23 | 91.32 | 88.58 | 90.14 | 89.58 | 90.17
October | MAPE | 5.56 | 2.61 | 2.69 | 3.37 | 4.67 | 1.92
October | MAE | 126.28 | 58.47 | 59.91 | 75.67 | 106.07 | 43.30
October | RMSE | 161.38 | 85.10 | 85.89 | 101.08 | 147.06 | 64.47
October | R2 | 0.69 | 0.91 | 0.91 | 0.88 | 0.74 | 0.95
October | Std. Dev. | 4.97 | 4.98 | 5.11 | 5.30 | 5.38 | 5.11
October | PA | 0.85 | 0.96 | 0.95 | 0.93 | 0.88 | 0.97
October | PI % | 68.38 | 84.81 | 83.53 | 80.54 | 68.25 | 87.60

changes accurately in the electrical load pattern, which is eminent for managing hot and cold reserves in power systems.
To develop more improved STLF modeling, we employed five widely used ML algorithms: Tree Baggers, KNN, SVM, ANN-PSO, and ANN-LM. Moreover, we use two ANN-LM models with single and double hidden layers, and both models are further fine-tuned to obtain an optimal hyper-parameter selection, such as the number of neurons and the learning rate, during the repetition of training. The superiority of the ML models is validated by quantitatively and qualitatively analyzing the forecasting performance in the presence of highly diversified non-linear factors, such as temperature and humidity, for every month of the year. Table 3 shows the quantitative analysis of all five algorithms for four different months of Year 2019. The MAPE values suggest that the KNN shows the worst performance among all ML models, since the KNN model has only one hyper-parameter, the number of nearest neighbors, which is inadequate to map the input–output relationship in the PM. Moreover, the KNN also experiences the over-fitting problem due to the limited hyper-parameters and the lack of a regularization effect, which reduces the performance of the STLF. Fig. 12(a) and (b) show that the Bagged Trees is significantly better than KNN in terms of forecasting error due to the advantage of ensemble-based learning. However, the sensible selection of hyper-parameters in SVM propagates fewer forecasting errors. Therefore, the SVM learns the closest match between the actual and forecasted load curves, as depicted in Fig. 12(a) and (b). As the tuning of SVM hyper-parameters is tedious, the SVM cannot sustain any further improvement. The ANN-PSO, which belongs to the family of metaheuristic based NN models, is also implemented to attain a more accurate forecasting engine. However, the ANN-PSO model was deprived of finding

Fig. 13. Predicted Step-Ahead electrical load using hybrid DL Methodologies: (a) 7th January 2019 and (b) 7th June 2019.


the optimal solutions to converge globally for eliminating the shortcomings of the above-mentioned STLF issues. Therefore, the MAPE of ANN-PSO is slightly higher than that of SVM for the month of January, which also degrades the STLF performance of ANN-PSO in detecting peaks in the electrical load curve. The experimental results show that the forecasting accuracy of the ANN-LM models with one and two dense hidden layers is better than the other ML algorithms. With one dense layer, the MAPE value for ANN-LM is observed to be 2.74, which is a significant improvement compared to all the above-mentioned non-linear parametric ML models. The comparative analysis between the ANN-LM models with one and two dense layers reveals that the latter attains a higher MAPE than the former due to the increase in model complexity. Therefore, the overfitting issue is elevated in the two-dense-layer ANN-LM model due to the increase in hidden layers, which results in the reduction of generalization capability (Uzair & Jamil, 2020). However, the single hidden layer ANN-LM model does not suffer from the overfitting issue since it is comparatively simpler. Therefore, ANN-LM with a single dense layer can extract both the linear and non-linear relationships between the seasonal trends and the electrical load curve convincingly. Similarly, for the single-layer ANN-LM, the other evaluation parameters, such as RMSE, MAE, R2, Std. Dev., PA, and PI, also have lower values, as shown in Table 3. Comparable results can also be observed in Fig. 12(a) and (b) in terms of capturing the peaks and valleys in the load profile.
To make a fair comparison and performance evaluation of our proposed hybrid SRDCC-BiLSTM model, we also implemented two widely used hybrid DL algorithms, LSTM and CNN-LSTM. For the comparative analysis, all three step-ahead predicted outputs are plotted in Fig. 13(a) and (b) for two randomly selected months. Moreover, the evaluation matrix based comparative analysis of the aforementioned three hybrid algorithms is presented in Table 4. The CNN-LSTM has a better tendency to capture the local trends and the non-linear relationships between diversified features and electrical load patterns compared to the LSTM model. Therefore, the CNN-LSTM model renders fewer forecasting errors than the LSTM model and achieves a closer match between the actual and forecasted load, as shown in Fig. 13(a) and (b). However, the MAPE values of the proposed ED (SRDCC-BiLSTM) model suggest a better forecasting performance than the CNN-LSTM. The CNN-LSTM and SRDCC-BiLSTM accommodate 128 filters of size 1x1 in the first one-dimensional convolutional layer. The input and output width of the first 1-D convolutional layer remains the same due to the kernel of size 1x1. Moreover, the filter of size 1x1 is also helpful in extracting all input–output relationships between the exogeneous variables and the electrical load pattern. The main architectural difference between the proposed SRDCC-BiLSTM and CNN-LSTM is the utilization of the dilation rate in the first one-dimensional convolutional layer of SRDCC-BiLSTM, while the CNN-LSTM lacks the dilation factor. Although the dilation rate of the proposed ED model is set to one, which compels the proposed model to behave similar to CNN-LSTM in terms of the receptive field, the smaller receptive field takes the advantage of extracting all sequential patterns of the input vector, which conveniently learns the local trends in the electrical load data. The second 1-D convolutional layer of the proposed ED model also consists of 128 filters of size 2x2. The kernel size is increased in the second 1-D convolutional layer so that the proposed ED model enhances the generalization capability by expanding the receptive field. To avoid the excessive use of model parameters in the second 1-D convolutional layer of the proposed ED model, the dilation factor has been adjusted to two rather than one. Therefore, the proposed model supersedes the CNN-LSTM due to its strong capability of capturing the local trends in the electrical load curve and its generalization capability for STLF, which reduces over-fitting in the presence of high dimensional input features. The decoder section of the proposed ED model employs the Bi-directional LSTM instead of a simple LSTM, which encapsulates the short- and long-term dependencies of the electrical load data effectively due to the forward and backward propagation of the data. Therefore, the electrical load forecasting curve of the proposed ED model supersedes all other employed models.
In addition, the proposed ED architecture must be robust to capture large variations in the electrical load curve and during the months of diversified electrical load demand. In Pakistan, June is the most dynamic month of the year in terms of weather variations and electrical load consumption. For the month of June, the proposed SRDCC-BiLSTM architecture also proficiently recognizes the abrupt variations and randomness in the electrical load pattern due to its better extraction capability and learning of short-term dependencies. Therefore, Table 4 recapitulates that the proposed model significantly limits all evaluation parameter values for the month of June.
By comparing the values of all the evaluation parameters presented in Table 2, Table 3, and Table 4, we can easily reveal that the proposed hybrid SRDCC-BiLSTM model performs best among all other linear parametric, ML, and hybrid DL models. Moreover, for a clearer visual comparison, we plotted bar charts of the MAPE values of all linear parametric, ML, and hybrid DL models for all twelve months of the Year 2019 in Fig. 14(a)–(c), respectively. The bar charts also depict the proficiency of the proposed hybrid SRDCC-BiLSTM model over the other linear and non-linear parametric models.

Table 4
Quantitative comparison of hybrid DL models for step-ahead electrical load forecasting.

Period (2019) | Error | LSTM | CNN-LSTM | SRDCC-BiLSTM
January | MAPE | 2.15 | 1.57 | 1.54
January | MAE | 38.86 | 28.52 | 27.81
January | RMSE | 51.93 | 40.71 | 39.47
January | R2 | 0.97 | 0.98 | 0.98
January | Std. Dev. | 4.87 | 5.43 | 5.38
January | PA | 0.99 | 0.99 | 0.99
January | PI % | 83.72 | 81.11 | 85.35
April | MAPE | 2.03 | 2.02 | 1.35
April | MAE | 44.80 | 45.67 | 37.70
April | RMSE | 73.97 | 72.54 | 54.74
April | R2 | 0.96 | 0.96 | 0.97
April | Std. Dev. | 6.41 | 6.13 | 6.51
April | PA | 0.98 | 0.98 | 0.99
April | PI % | 90.48 | 94.92 | 92.07
June | MAPE | 1.61 | 1.33 | 1.09
June | MAE | 49.48 | 41.88 | 36.05
June | RMSE | 79.29 | 68.11 | 58.18
June | R2 | 0.95 | 0.97 | 0.97
June | Std. Dev. | 6.41 | 6.46 | 6.67
June | PA | 0.98 | 0.99 | 0.99
June | PI % | 87.31 | 90.93 | 91.95
October | MAPE | 1.73 | 1.38 | 1.16
October | MAE | 39.03 | 30.98 | 26.96
October | RMSE | 63.88 | 52.87 | 53.25
October | R2 | 0.95 | 0.97 | 0.97
October | Std. Dev. | 5.59 | 5.16 | 5.37
October | PA | 0.98 | 0.98 | 0.98
October | PI % | 90.79 | 87.05 | 90.78

4.3. Experimental analysis for Multi-step ahead forecasting

To further show the effectiveness of our proposed SRDCC-BiLSTM model, we expand our evaluation process to multi-step ahead electrical load forecasting. Multi-step ahead forecasting is the task of predicting a sequence of values in a time series and is quite important for power utility planners and demand controllers to ensure the required generation for the next few hours, or even a few days, in advance. For the multi-step ahead forecasting models, all DL models considered in this paper are trained independently 15 times with a very slow learning rate to remove random discrepancy. The graphical comparison of the ANN, LSTM, CNN-LSTM, and SRDCC-BiLSTM models for day-ahead forecasting
LSTM, CNN-LSTM, and SRDCC-BiLSTM model for day-ahead forecasting
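For reference, the evaluation parameters reported throughout Tables 2–5 follow their standard definitions. The following is a minimal NumPy sketch of MAPE, MAE, RMSE, and R2 only; the paper-specific PA and PI indices are not reproduced here, and y_true and y_pred simply denote the actual and forecasted load vectors.

import numpy as np

def forecast_metrics(y_true, y_pred):
    """Standard error measures between actual and forecasted load.

    Minimal sketch using the conventional definitions of MAPE, MAE,
    RMSE, and R2; the paper-specific PA and PI indices are omitted.
    """
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    err = y_true - y_pred

    mape = 100.0 * np.mean(np.abs(err / y_true))     # Mean Absolute Percentage Error
    mae = np.mean(np.abs(err))                       # Mean Absolute Error
    rmse = np.sqrt(np.mean(err ** 2))                # Root Mean Square Error
    ss_res = np.sum(err ** 2)                        # residual sum of squares
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)   # total sum of squares
    r2 = 1.0 - ss_res / ss_tot                       # coefficient of determination
    return {"MAPE": mape, "MAE": mae, "RMSE": rmse, "R2": r2}

# Example: actual vs. forecasted load (MW) for three time steps
print(forecast_metrics([1500, 1620, 1710], [1480, 1655, 1690]))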

Fig. 14. Bar Chart of MAPE values between actual and forecasted Step-Ahead Electrical Load of Year 2019 using: (a) Linear Parametric Models; (b) ML models; (c) Hybrid DL models. [Figure: three bar-chart panels, x-axis "Month of the Year", y-axis "MAPE"; legends: (a) OE, ARX, ARMAX; (b) KNN, Bagged Trees, SVM, NN-PSO, NN-LM with one and two hidden layers; (c) LSTM, CNN-LSTM, SRDCC-BiLSTM.]

Fig. 15. Comparative Analysis of Predicted day-ahead (multi-step) electrical load using ML and hybrid DL Models for: (a) 26th January 2019, (b) 26th March 2019, (c) 26th November 2019. [Figure: three panels, x-axis "Time (Hour) of the Day" from 00:00 to 23:45, y-axis "Electrical Load (MW)".]

In Fig. 15, we plotted the forecasted results of three randomly selected months. Moreover, to further provide a fair comparative analysis, bar plots of the MAPE error of all the aforementioned DL models for every month of the Year 2019 are given in Fig. 16. Furthermore, the performance analysis based on the whole evaluation-metric set is given in Table 5 for the months of January 2019, March 2019, and November 2019. From the presented results, the proposed SRDCC-BiLSTM model also confirms effective forecasting results. The quantitative analysis presented in Table 5 shows that the proposed architecture also performs comparatively better than the other ML and hybrid DL models for multi-step ahead electrical load forecasting. The comparison shows that most of the peak and valley load patterns are captured efficiently by the proposed SRDCC-BiLSTM model. Moreover, the predicted output of the proposed architecture is also comparatively more stable and refined than those of the other models.
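As noted in Section 4.3, each DL model is trained independently 15 times to suppress run-to-run randomness before the comparison above. The snippet below is only an illustrative sketch of such a repeated-training protocol; build_model() and the in-line MAPE computation are hypothetical placeholders and do not reproduce the authors' released code.

import numpy as np

def repeated_evaluation(build_model, x_train, y_train, x_test, y_test,
                        runs=15, epochs=100):
    """Train a freshly built model `runs` times and aggregate its MAPE.

    Illustrative sketch of the repeated-training protocol of Section 4.3;
    `build_model` is a hypothetical factory returning a compiled Keras
    model (e.g. configured with a small learning rate).
    """
    y_test = np.asarray(y_test, dtype=float)
    scores = []
    for _ in range(runs):
        model = build_model()                                 # fresh random initialisation
        model.fit(x_train, y_train, epochs=epochs, verbose=0)
        y_hat = model.predict(x_test, verbose=0).ravel()
        scores.append(100.0 * np.mean(np.abs((y_test - y_hat) / y_test)))
    return float(np.mean(scores)), float(np.std(scores))     # mean MAPE and its spread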

Fig. 16. Bar Chart of MAPE values between actual and forecasted Step-Ahead Electrical Load of Year 2019 using ML and hybrid DL Models. [Figure: bar chart, x-axis "Month of the Year", y-axis "MAPE"; models compared: ANN, LSTM, CNN-LSTM, SRDCC-BiLSTM.]

Table 5
Quantitative comparison of alternative models for day-ahead short-term electrical load forecasting.

Period (2019)   Error       ANN-LM    LSTM      CNN-LSTM   SRDCC-BiLSTM-ED
January         MAPE         10.97      7.27      5.34        4.03
                MAE         163.17    116.90     84.13       62.72
                RMSE        216.91    144.06     98.30       76.05
                R2            0.39      0.39      0.71        0.83
                Std. Dev.     6.79     19.08     16.88       16.05
                PA            0.50      0.86      0.92        0.94
March           MAPE          6.17      4.88      3.53        3.32
                MAE         124.31    103.16     72.26       67.17
                RMSE        149.35    133.37     87.10       83.17
                R2            0.15      0.32      0.71        0.74
                Std. Dev.     2.72      9.45     15.24       16.92
                PA            0.54      0.79      0.88        0.89
November        MAPE         11.70      7.98      5.19        4.94
                MAE         189.56    138.37     88.25       85.05
                RMSE        227.39    163.73    103.07      103.37
                R2            0.07      0.52      0.81        0.81
                Std. Dev.     9.35     14.77     19.48       21.20
                PA            0.55      0.73      0.94        0.93

4.4. Complexity analysis of the proposed SRDCC-BiLSTM model

This study also provides insight into the computational efficiency of the different ML and DL models used in this study for the STLF problem and compares their timing complexities. The training time complexity of KNN, in terms of the asymptotic analysis of the algorithm, is O(1), which is constant, but its testing time complexity is O(n*k*d), where n is the number of test samples, k represents the number of neighbors casting a vote, and d is the dimension of the input data. The timing complexity of the KNN algorithm suggests that KNN is slow in the testing phase, which makes it impractical for our STLF problem. The Bagged Trees algorithm, which is an ensemble learning method over bagged decision trees, has a training time complexity of O(n*log n*d), where n is the number of input data points and d is the dimension, while its test time complexity is O(m), where m is the depth of the bagged trees from the root node to the leaf nodes. Therefore, the Bagged Trees algorithm depends heavily on the dimensions of the input data and on the number of ensembled bagged trees used in the algorithm, which restricts Bagged Trees from real-time STLF applications due to its overly expensive training and test times. Similarly, the training time complexity of SVM ranges up to O(n^3) (Rutkowski, Korytkowski, Scherer, Tadeusiewicz, Zadeh, & Zurada, 2015), and the kernel-SVM prediction complexity depends on the kernel size and dimensions, O(d^2), which is again very costly (Claesen, Smet, Suykens, & Moor, 2014). The ANN training complexity depends on the number of layers and the number of neurons per layer and can be represented, in the asymptotic notation of algorithm analysis, as O(n*m), where n is the number of neurons and m is the total number of layers in the ANN architecture (Kasper, 2021). Other factors, such as the number of epochs and the backpropagation algorithm, also increase its computational cost. The ANN-PSO training and testing time complexity is the sum of the individual ANN and PSO training and test time complexities, which further increases the training time because the best neighborhood samples must be found during the different iterations over the d-dimensional search space. However, the ANN-LM algorithm with one layer retains the time complexity of the conventional ANN in the training and testing phases, which is O(n). Therefore, ANN-LM is more efficient and less complex than the above-mentioned ML algorithms for our STLF scenario.

Among the hybrid DL algorithms, the LSTM is local in space and time (Hochreiter & Schmidhuber, 1997), meaning that the memory requirement of the LSTM algorithm is not affected by the input sequence length and dimensions. Therefore, the timing complexity of LSTM is O(w), where w is the number of weights of the LSTM network. It reveals that the LSTM is faster than all the above-mentioned ML algorithms. The CNN-LSTM complexity per time step can be derived as the sum of the complexities of the convolutional layers and the LSTM layer, O(Σ_{l=1}^{d} (n_{l-1} * s_l^2 * n_l * m_l^2) + w), and, over the whole training process, O((Σ_{l=1}^{d} (n_{l-1} * s_l^2 * n_l * m_l^2) + w) * i * e), where i is the input length and e is the number of epochs. Therefore, the time complexity of CNN-LSTM comes out to be O(n) in the typical asymptotic notation (Tsironi, Barros, Weber, & Wermter, 2017). However, CNN-LSTM is more accurate than LSTM at the cost of this time complexity, and modern high-speed servers and computational devices permit researchers to focus more on accuracy than on timing complexity.

In our proposed SRDCC-BiLSTM algorithm, the total time complexity is the sum of the complexities of the dilated causal convolutional layers, the BiLSTM layers, and the output layer. We use a 1-dimensional dilated causal convolutional layer with a filter size of 1 × 1 in the first stage of the Encoder block, whose timing complexity can be simplified to the general asymptotic form for a dilated causal convolutional stack, that is, O((Σ_{l=1}^{d} (n_{l-1} * s_l^2 * n_l * m_l^2) + w) * i * e), where d is the number of layers, s_l is the filter size, n_l is the number of filters in the l-th layer, n_{l-1} is the number of input channels, and m_l is the spatial size of the feature map. Therefore, the asymptotic notation of the single 1-dimensional dilated causal convolutional layer required in the first stage of the Encoder is O(n). Similarly, the second dilated causal convolution layer, which uses a filter size of 2 × 2 in the second stage of the encoder section, also generalizes to O(n). The total time complexity of the dilated causal convolution layers in the encoder section is thus O(n) + O(n), which, according to asymptotic behavior, generalizes to O(n). The time complexity of the BiLSTM layer is the sum of the time complexities of the forward and backward paths in the LSTM structure; therefore, the total time complexity of the BiLSTM layer is O(w) + O(w), which also generalizes to O(w). Consequently, the total time complexity of our proposed architecture in the training phase is O(n) + O(w). In terms of simplified asymptotic notation, we can deduce that our proposed model has O(n) time complexity, which is equivalent to that of the conventional CNN-LSTM. Therefore, the proposed architecture decreases the multi-step electrical load forecasting errors without increasing the training and test time complexity.
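Restating the expressions above in display form, with the notation exactly as defined in the text:

\begin{align*}
T_{\text{conv}} &= O\!\left(\textstyle\sum_{l=1}^{d} n_{l-1}\, s_l^{2}\, n_l\, m_l^{2}\right)\\
T_{\text{CNN-LSTM}} &= O\!\left(\left(\textstyle\sum_{l=1}^{d} n_{l-1}\, s_l^{2}\, n_l\, m_l^{2} + w\right) \cdot i \cdot e\right) \;\Rightarrow\; O(n)\\
T_{\text{BiLSTM}} &= O(w) + O(w) = O(w)\\
T_{\text{SRDCC-BiLSTM}} &= O(n) + O(w) \;\Rightarrow\; O(n)
\end{align*}

To make the encoder-decoder layout referred to in this section concrete, the following Keras-style snippet is a minimal, illustrative sketch only: the two dilated causal Conv1D stages use kernel sizes 1 and 2 with dilation rates 1 and 2, and the decoder uses a 128-cell Bidirectional LSTM, as described in the text, but the filter counts, activations, optimizer settings, and input window length are assumptions for illustration and do not reproduce the authors' released implementation.

import tensorflow as tf
from tensorflow.keras import layers, models

def build_srdcc_bilstm(window_len=24, n_features=10, horizon=1):
    """Illustrative sketch of an SRDCC-BiLSTM style encoder-decoder.

    window_len, n_features, filter counts, and activations are assumed
    values for demonstration only.
    """
    inputs = layers.Input(shape=(window_len, n_features))

    # Encoder stage 1: short receptive field, kernel size 1, dilation rate 1
    x = layers.Conv1D(filters=64, kernel_size=1, dilation_rate=1,
                      padding="causal", activation="relu")(inputs)
    # Encoder stage 2: kernel size 2, dilation rate 2
    x = layers.Conv1D(filters=64, kernel_size=2, dilation_rate=2,
                      padding="causal", activation="relu")(x)

    # Decoder: 128-cell Bidirectional LSTM for forward/backward dependencies
    x = layers.Bidirectional(layers.LSTM(128))(x)

    # Output layer: one value for step-ahead, `horizon` values for multi-step
    outputs = layers.Dense(horizon)(x)

    model = models.Model(inputs, outputs)
    model.compile(optimizer=tf.keras.optimizers.Adam(1e-4), loss="mse")
    return model

model = build_srdcc_bilstm()
model.summary()

In this sketch, the causal padding prevents each convolution from looking ahead of the current time step, which keeps the receptive field short, as intended in the proposed design.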

5. Conclusions and future recommendations

Short-Term Load Forecasting (STLF) has gained considerable significance for energy management and electrical load scheduling in power systems. This paper proposes a novel hybrid deep learning-based Encoder-Decoder (ED) technique to improve the generalization capability and forecasting accuracy for the STLF problem. In the ED, a Short Receptive field-based Dilated Causal Convolutional network is integrated with a Bidirectional Long-Short Term Memory (SRDCC-BiLSTM), and the advantages of the proposed hybrid SRDCC-BiLSTM model are as follows:

• The encoder section extracts the specific patterns of the local trends in the time-series data convincingly, without increasing the model parameters, using 1 × 1 and 2 × 2 dilated causal convolution filters for one-step ahead and multi-step day-ahead electrical load forecasting, respectively, with dilation rates of 1 and 2.
• The 1 × 1 and 2 × 2 dilated causal convolution filters also help in restricting the feature maps containing the local patterns of the climatic, historical electrical load, and temporal features, as explained earlier, which reduces the time complexity of the encoder section and of the proposed model to O(n).
• The proposed model also avoids unnecessary changes in the input sequence vector representation in the inner section of the encoder by using 1 × 1 dilated causal convolution filters, which helps in capturing the vital features of the electrical load pattern in the dataset.
• The decoder section enhances the prediction accuracy using 128 BiLSTM cells. The proposed hybrid SRDCC-BiLSTM model is evaluated and validated on a real-time yearlong dataset. The extensive experimentation reveals that the proposed architecture shows up to 35 percent improvement in step-ahead electrical load prediction over CNN-LSTM, which was the best among all the other implemented models. The comparison with the other machine learning and deep learning models is conducted on the basis of standard evaluation parameters, such as the Mean Absolute Percentage Error (MAPE), Root Mean Square Error (RMSE), Mean Absolute Error (MAE), R-squared (R2), Standard Deviation (Std. Dev.), and Prediction Accuracy (PA).
• An in-depth qualitative analysis is carried out across different weather seasons, which indicates the efficiency of the presented model by finding the closest match between the actual and the predicted electrical load.

In the future, we will consider more diversified electrical load data and improve the STLF performance by implementing a novel error-correction deep learning technique.

Declaration of Competing Interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Data Availability

Simulations and related data for this article are available online at GitHub and can be accessed at https://github.com/khalidijaz/SRDCC-BilSTM-Short-Term-Electrical-Load-Forecasting, an open-source online data repository.

Acknowledgements

This work is supported by Estonian research council grant PSG739 and by the European Commission through the H2020 project Finest Twins, grant No. 856602. Moreover, this research work was conducted as preliminary research for the research project approved under HEC post-doctoral fellowship funding Phase III (Batch-2).

References

Ali, S. M., Jawad, M., Khan, M. U., Bilal, K., Glower, J., Khan, S. U., & Zomaya, A. Y. (2020). An ancillary services model for data centers and power systems. IEEE Transactions on Cloud Computing, 8(4), 1176–1188.
Aslam, S., Herodotou, H., Mohsin, S. M., Javaid, N., Ashraf, N., & Aslam, S. (2021). A survey on deep learning methods for power load and renewable energy forecasting in smart microgrids. Renewable and Sustainable Energy Reviews, 144, Article 110992.
Bianchi, F. M., Santis, E. D., Rizzi, A., & Sadeghian, A. (2015). Short-term electric load forecasting using echo state networks and PCA decomposition. IEEE Access, 3, 1931–1943.
Bouktif, S., Fiaz, A., Ouni, A., & Serhani, M. A. (2018). Optimal deep learning LSTM model for electric load forecasting using feature selection and genetic algorithm: Comparison with machine learning approaches. Energies, 11(7), 1636.
Brezak, D., Bacek, T., Majetic, D., Kasac, J., & Novakovic, B. (2012). A comparison of feed-forward and recurrent neural networks in time series forecasting. In 2012 IEEE Conference on Computational Intelligence for Financial Engineering & Economics (CIFEr). New York, NY, USA.
Buitrago, J., & Asfour, S. (2017). Short-term forecasting of electric loads using nonlinear autoregressive artificial neural networks with exogenous vector inputs. Energies, 10(1), 40. https://doi.org/10.3390/en10010040
Cavallo, J., Marinescu, A., Dusparic, I., & Clarke, S. (2015). Evaluation of forecasting methods for very small-scale networks. In Data Analytics for Renewable Energy Integration (pp. 56–75). Springer International. https://doi.org/10.1007/978-3-319-27430-0_5
Claesen, M., Smet, F. D., Suykens, J. A., & Moor, B. D. (2014). Fast prediction with SVM models containing RBF kernels. arXiv preprint, arXiv:1609.03499.
Deb, C., Zhang, F., Yang, J., Lee, S. E., & Shah, K. W. (2017). A review on time series forecasting techniques for building energy consumption. Renewable and Sustainable Energy Reviews, 74, 902–924.
Ediger, V. Ş., & Akar, S. (2007). ARIMA forecasting of primary energy demand by fuel in Turkey. Energy Policy, 35(3), 1701–1708. https://doi.org/10.1016/j.enpol.2006.05.009
Fallah, S. N., Deo, R. C., Shojafar, M., Conti, M., & Shamshirband, S. (2018). Computational intelligence approaches for energy load forecasting in smart energy management grids: State of the art, future challenges, and research directions. Energies, 11(3), 596.
Fan, G.-F., Guo, Y.-H., Zheng, J.-M., & Hong, W.-C. (2019). Application of the weighted k-nearest neighbor algorithm for short-term load forecasting. Energies, 12(5). https://doi.org/10.3390/en12050916
Farsi, B., Amayri, M., Bouguila, N., & Eicker, U. (2021). On short-term load forecasting using machine learning techniques and a novel parallel deep LSTM-CNN approach. IEEE Access, 9, 31191–31212. https://doi.org/10.1109/ACCESS.2021.3060290
Han, Y., Sha, X., Grover-Silva, E., & Michiardi, P. (2014). On the impact of socio-economic factors on power load forecasting. In IEEE International Conference on Big Data (Big Data). Washington, DC, USA. https://doi.org/10.1109/BigData.2014.7004299
Hochreiter, S., & Schmidhuber, J. (1997). Long short-term memory. Neural Computation, 9(8), 1735–1780.
Javed, U., Ijaz, K., Jawad, M., Ansari, E. A., Shabbir, N., Kütt, L., & Husev, O. (2021). Exploratory data analysis based short-term electrical load forecasting: A comprehensive analysis. Energies, 14(17), 5510.
Javed, U., Mughees, N., Jawad, M., Azeem, O., Abbas, G., Ullah, N., … Tahir, U. (2021). A systematic review of key challenges in hybrid HVAC–HVDC grids. Energies, 14(17), 5451.
Jawad, M., Ali, S. M., Khan, B., Mehmood, C. A., Farid, U., Ullah, Z., … Sami, I. (2018). Genetic algorithm-based non-linear auto-regressive with exogenous inputs neural network short-term and medium-term uncertainty modelling and prediction for electrical load and wind speed. The Journal of Engineering, 2018(8), 721–729. https://doi.org/10.1049/joe.2017.0873
Jawad, M., Qureshi, M. B., Khan, M. U., Ali, S. M., Mehmood, A., Khan, B., & Wang, X. (2021). A robust optimization technique for energy cost minimization of cloud data centers. IEEE Transactions on Cloud Computing, 9(2), 447–460.
Kasper. (2021, December 30). Computational complexity of neural networks. Retrieved from https://kasperfred.com/series/introduction-to-neural-networks/computational-complexity-of-neural-networks
Khan, K. S., Ali, S. M., Ullah, Z., Sami, I., Khan, B., & Mehmood, C. A. (2020). Statistical energy information and analysis of Pakistan Economic Corridor based on strengths, availabilities, and future roadmap. IEEE Access, 8, 169701–169739. https://doi.org/10.1109/ACCESS.2020.3023647
Khan, M. U., Jawad, M., & Khan, S. U. (2021). Adadb: Adaptive Diff-Batch optimization technique for gradient descent. IEEE Access, 9, 99581–99588.
Komorowski, M., Marshall, D. C., Salciccioli, J. D., & Crutain, Y. (2016). Exploratory data analysis. In Secondary Analysis of Electronic Health Records (pp. 185–203).
Kong, W., Dong, Z. Y., Jia, Y., Hill, D. J., Xu, Y., & Zhang, Y. (2019). Short-term residential load forecasting based on LSTM recurrent neural network. IEEE Transactions on Smart Grid, 10(1), 841–851. https://doi.org/10.1109/TSG.2017.2753802
Lara-Benítez, P., Carranza-García, M., Luna-Romera, J. M., & Riquelme, J. C. (2020). Temporal convolutional networks applied to energy-related time series forecasting. Applied Sciences, 10(7), 2322.
Li, L., Ota, K., & Dong, M. (2017). Everything is image: CNN-based short-term electrical load forecasting for smart grid. In 2017 14th International Symposium on Pervasive Systems, Algorithms and Networks & 2017 11th International Conference on Frontier of Computer Science and Technology & 2017 Third International Symposium of Creative Computing (ISPAN-FCST-ISCC). Exeter, UK. https://doi.org/10.1109/ISPAN-FCST-ISCC.2017.78
Li, W., Shi, Q., Sibtain, M., Li, D., & Mbanze, D. E. (2020). A hybrid forecasting model for short-term power load based on sample entropy, two-phase decomposition and whale algorithm optimized support vector regression. IEEE Access, 8, 166907–166921.
Li, S., Wang, P., & Goel, L. (2016). A novel wavelet-based ensemble method for short-term load forecasting with hybrid neural networks and feature selection. IEEE Transactions on Power Systems, 31(3), 1788–1798. https://doi.org/10.1109/TPWRS.2015.2438322
Liu, X., Zhang, Z., & Song, Z. (2020). A comparative study of the data-driven day-ahead hourly provincial load forecasting methods: From classical data mining to deep learning. Renewable and Sustainable Energy Reviews, 119, Article 109632.
Lv, P., Liu, S., Yu, W., Zheng, S., & Lv, J. (2020). EGA-STLF: A hybrid short-term load forecasting model. IEEE Access, 8, 31742–31752.
Ma, C., Dai, G., & Zhou, J. (2021). Short-term traffic flow prediction for urban road sections based on time series analysis and LSTM_BILSTM method. IEEE Transactions on Intelligent Transportation Systems (Early Access), 1–10.
Mamun, A. A., Sohel, M., Mohammad, N., Sunny, M. S., Dipta, D. R., & Hos, E. (2020). A comprehensive review of the load forecasting techniques using single and hybrid predictive models. IEEE Access, 8, 134911–134939.
Musbah, H., & El-Hawary, M. (2019). SARIMA model forecasting of short-term electrical load data augmented by fast Fourier transform seasonality detection. In 2019 IEEE Canadian Conference of Electrical and Computer Engineering (CCECE). Edmonton, AB, Canada.
Oord, A. v. d., Dieleman, S., Zen, H., Simonyan, K., Vinyals, O., Graves, A., … Kavukcuoglu, K. (2016). WaveNet: A generative model for raw audio. arXiv preprint, arXiv:1609.03499.
Rafi, S. H., Nahid-Al-Masood, D. S. R., & Hossain, E. (2021). A short-term load forecasting method using integrated CNN and LSTM network. IEEE Access, 9, 32436–32448.
rp5.ru. (n.d.). Raspisaniye Pogodi Ltd. Retrieved from www.rp5.ru/Weather_in_the_world
Rueda, F. D., Suárez, J. D., & Torres, A. d. R. (2021). Short-term load forecasting using Encoder-Decoder WaveNet: Application to the French grid. Energies, 14(9), 2524.
Rutkowski, L., Korytkowski, M., Scherer, R., Tadeusiewicz, R., Zadeh, L. A., & Zurada, J. M. (2015). Reducing time complexity of SVM model by LVQ data compression. In 2015 14th International Conference on Artificial Intelligence and Soft Computing (ICAISC). Zakopane, Poland.
Sadaei, H. J., Silva, P. C., Guimarães, F. G., & Lee, M. H. (2019). Short-term load forecasting by using a combined method of convolutional neural networks and fuzzy time series. Energy, 175, 365–377.
Sherstinsky, A. (2020). Fundamentals of recurrent neural network (RNN) and long short-term memory (LSTM) network. Physica D: Nonlinear Phenomena, 404, Article 132306.
Shi, H., Xu, M., & Li, R. (2018). Deep learning for household load forecasting—A novel pooling deep RNN. IEEE Transactions on Smart Grid, 9(5), 5271–5280.
Sobhani, M., Campbell, A., Sangamwar, S., Li, C., & Hong, T. (2019). Combining weather stations for electric load forecasting. Energies, 12(8), 1550. https://doi.org/10.3390/en12081510
Somu, N., & Ramamritham, K. (2021). A deep learning framework for building energy consumption forecast. Renewable and Sustainable Energy Reviews, 137, Article 110591.
Tarkhaneh, O., & Shen, H. (2019). Training of feedforward neural networks for data classification using hybrid particle swarm optimization, Mantegna Lévy flight and neighborhood search. Heliyon, 5(4). https://doi.org/10.1016/j.heliyon.2019.e01275
Tayab, U. B., Zia, A., Yang, F., Lu, J., & Kashif, M. (2020). Short-term load forecasting for microgrid energy management system using hybrid HHO-FNN model with best-basis stationary wavelet packet transform. Energy, 203, Article 117857.
Tsironi, E., Barros, P., Weber, C., & Wermter, S. (2017). An analysis of convolutional long short-term memory recurrent neural networks for gesture recognition. Neurocomputing, 268, 76–86.
Uzair, M., & Jamil, N. (2020). Effects of hidden layers on the efficiency of neural networks. In 2020 IEEE 23rd International Multitopic Conference (INMIC). Bahawalpur, Pakistan.
Velasco, L. C., Estoperez, N. R., Jayson, R. J., Sabijon, C. J., & Sayles, V. C. (2018). Day-ahead base, intermediate, and peak load forecasting using k-means and artificial neural networks. International Journal of Advanced Computer Science and Applications, 9(2). https://doi.org/10.14569/IJACSA.2018.090210
Yildiz, B., Bilbao, J., & Sproul, A. (2017). A review and analysis of regression and machine learning models on commercial building electricity load forecasting. Renewable and Sustainable Energy Reviews, 73, 1104–1122. https://doi.org/10.1016/j.rser.2017.02.023
Zhang, Y.-F., & Chiang, H.-D. (2020). Enhanced ELITE-Load: A novel CMPSOATT methodology constructing short-term load forecasting model for industrial applications. IEEE Transactions on Industrial Informatics, 16(4), 2325–2334.