Secure data analytics for smart grid systems in a sustainable smart city:
Challenges, solutions, and future directions
Aparna Kumari, Sudeep Tanwar *
Department of Computer Science and Engineering, Institute of Technology, Nirma University, Ahmedabad, 382481 Gujarat, India
ARTICLE INFO

Keywords: Big data; Secure data analytics; Sustainable smart city; Load forecasting; Smart meter; Data security and privacy; Machine learning

ABSTRACT

A smart city requires an intelligent infrastructure to improve the quality of life and provide a sustainable environment for its citizens. The demand for efficient, secure, reliable, and uninterrupted electricity supply is growing exponentially, so there is a need for an intelligent grid that uses Information and Communications Technology (ICT) to optimize the generation, distribution, and consumption of electricity. Thus, the Smart Grid (SG) acts as an intelligent grid and plays an important role in the overall growth of any smart city. Further, the Big Data (BD) generated from the SG provides noteworthy information that could significantly benefit different SG applications, such as demand response and load profiling. However, an insecure technique for decision-making may lead to a breach of SG data in which hackers gain full access to consumer data. In contrast, a secure technique for decision-making can satisfy all stakeholders, including consumers and utility providers. Motivated by these facts, this paper presents a comprehensive literature survey and analysis of state-of-the-art proposals for Secure Data Analytics (SDA) in the SG system. Achieving SDA for SG systems is a critical task, and existing research and development endeavors have not fully exploited SDA in the SG system. In this paper, we discuss the distinctive nature of SDA and its complexity over SG data. A detailed taxonomy is abstracted into a novel process model, which highlights various research challenges such as secure data collection and preprocessing, secure load data processing and storage, load prediction, load management and analysis, data security and privacy issues, and data communication. Finally, a case study is presented to demonstrate the process model.
1. Introduction

The smart city is all about how the different “organisms” of the city work together efficiently and survive extreme conditions. Essential aspects of a smart city, such as energy, clean water, fast communication, economical healthcare services, smart transportation, and safety, must be managed in concert to support the smooth operation of critical infrastructure while providing a clean, economic, and safe environment in which to live, work, and play. Moreover, as the global population increases, so does the demand for electricity, which burdens the traditional grid system. Future energy management is also increasingly data-intensive; for example, Smart Meters (SMs) generate high time-resolution data, termed BD. The finer granularity and the resulting increase in quantity enable an in-depth analysis of the BD generated from SMs [1]. According to a survey conducted by Statista, the global BD market will grow to 103 billion USD by 2027 [2], almost double the existing market size, as shown in Fig. 1a. Therefore, there is a need to develop a mechanism to tackle this BD issue in the SG system [3,4].

Smart cities depend on the SG to ensure pliant delivery of electricity supply to perform their many functions, present opportunities for conservation, improve efficiencies, and, most importantly, enable coordination between the city control center, other infrastructure domain operators, and those responsible for public safety. A smart city ensures that all its “organisms” work efficiently as integrated parts of it and survive life-threatening conditions. The SG is one such “organism” of the smart city and a cornerstone of important Indian Government programs, for instance, the development of 100 smart cities. The SG is a key enabler for resolving India’s electricity woes and handling technical and non-technical energy losses. Energy (part of the SG), water, transportation, communication, public health and safety, and other aspects of a smart city are managed in concert to support the smooth operation of critical infrastructure while providing for a clean, economic and safe
* Corresponding author.
E-mail addresses: [email protected] (A. Kumari), [email protected] (S. Tanwar).
https://doi.org/10.1016/j.suscom.2020.100427
Received 13 January 2020; Received in revised form 27 February 2020; Accepted 14 August 2020
Available online 28 August 2020
2210-5379/© 2020 Elsevier Inc. All rights reserved.
A. Kumari and S. Tanwar Sustainable Computing: Informatics and Systems 28 (2020) 100427
environment in which to live, work and play.

In the energy sector, the revolution brought by BD is reshaping the landscape of the traditional grid industry. The SG sector recognizes BD analytics as an inescapable task but is hesitant to implement it because of factors such as skill shortage, data complexity, security and privacy issues, and lack of management support, which make BD implementation difficult, as shown in Fig. 1c. Presently, only 32% of BD is implemented in the SG system, as shown in Fig. 1b [5]. Decision-making in the SG system faces various challenges, such as electricity price control, operational efficiency, and the reliability and stability of the system [6,7]. Handling these challenges requires BD analytics in the SG system, which opens the door for researchers to manage the demand for energy. BD analytics can provide accurate decision-making analysis based on the “information flow” (which integrates energy flow and data flow). However, insecure BD analytics techniques lead to security breaches of energy consumption data and consumer usage patterns. In contrast, a secure analytics technique can give more accurate and secure decision-making results. SDA has therefore become a foremost research concern for the utility industry and researchers [7]. In terms of utility market adoption, GTM Research has forecast cumulative global spending on BD analytics in the SG to top $20.6 billion, with yearly spending of $3.8 billion globally in 2020, as shown in Fig. 1d [8,9].

In the past few years, the SG has witnessed significant expansion of data analytics in the consumption, generation, and transmission of BD. In the SG, massive data such as electricity consumption, SM status, and consumer interaction data are collected for analytics. Then, many secure analytics techniques are applied, such as authentication and trust management during secure data collection, secure data processing and storage, and real-time analytics. SDA can provide efficient and effective decision support to producers, utility providers, operators, and consumers in the SG system. Hence, it facilitates SG to provide a more
This section provides the highlights of SG, BD, and SDA, and the growth of SDA in the SG system. After reading this section, readers can
effortlessly discover the motivation, contributions, organization, and reading map of this paper.
2. Background and research challenges

2.1. Background

In this section, we give an overview of the SG system, the need for BD, and the suitability of SDA in the SG.

2.1.1. Smart grid
The SG is a complex system that includes several sub-systems, for instance, Advanced Metering Infrastructure (AMI), power generation, distribution, transmission, substations, networking systems, smart appliances, and renewable energy resources. Recent developments in ICT have brought the vision of an efficient SG to reality by reforming the traditional grid system. The SG has six major components: software, hardware, network, data, user, and servers. It operates on a two-way communication architecture; thus, communication security and reliability are crucial to manage the two-way flow of data and electricity. The SG has numerous benefits, for instance, data-driven pricing, bidirectional data and energy flow, integrated renewable energy, and power consumption tracking. It has exceptional capabilities for self-awareness, self-coordination, and self-healing. SG implementation faces several challenges, such as transmission and distribution losses, outdated technology, renewable energy incorporation, power quality, and security vulnerabilities. For instance, an SG system must meet security requirements to prevent any vulnerability in its computation, communication, and control sub-systems. Despite all its benefits, the SG struggles to handle the massive volume of data generated every day by sub-systems and consumers [13]. The emerging technology of BD analytics can play an important role in tackling this data management issue. The background and scope of BD are discussed in the subsequent subsections.

2.1.2. Big data and its analytics
A rapidly growing market for electronic devices (such as smart devices) and the advent of the Internet era have been influencing the way data are computed. Smart devices, collectively known as the “Internet of Things (IoT)”, are increasingly used in remote sensing, software logs, wireless sensor networks, phones, laptops, and cameras [14]. Consequently, zettabytes of BD are being generated, and their size is growing exponentially every year. BD can be characterized as an immense amount of data, but it also has other characteristics: (i) volume, the large amount of data, (ii) variety (structured, semi-structured, or unstructured), (iii) velocity, the data generation rate, (iv) variability, the frequent changing of data, and (v) veracity, which deals with the dependability of the data [9,15–17]. In this paper, we limit ourselves to these “5Vs” characteristics of BD, which are as follows:

• Volume: It implies gigantic volumes of data. BD is generated from different sources such as networks, sensors, and social media. Hence, a massive volume of BD has to be analyzed during analytics.
• Variety: It refers to the formats of BD, for instance, structured, semi-structured, and unstructured. BD can take the form of text, videos, photos, sensor device readings, logs, etc. The variety of BD poses challenges for BD storage, BD mining, and analytics.
• Velocity: It is the rate at which BD is generated and arrives. It also includes the time to process BD and apply the acquired data in decision making.
• Variability: It refers to the inconsistency of the BD, which means the data are frequently changing.
• Veracity: It refers to the trustworthiness or quality of the data. BD veracity involves removing abnormalities and noise from the data and transforming it into reliable insights [18].

BD represents enormous and complex datasets, which require a specialized computational approach to decipher trends, patterns, and relations. Moreover, these datasets are difficult to analyze using conventional data-processing systems in a tolerable elapsed time. Mathematical analysis, inductive statistics, and nonlinear system identification can be used to deduce regressions, relationships, and causal effects from these datasets, which have low information density, and thereby reveal relationships and dependencies. These assessments are very relevant today, as they carry vital information that can be used for the benefit of mankind. Current usage of BD in SG systems pertains to predictive analytics, consumer behavior analytics, etc., which extract significant information from such datasets. More broadly, BD analytics is applicable to business trends, disease prevention, energy usage, combating crime, and so on. However, analyzing this huge data has been challenging for scientists, medical professionals, business executives, and governments. These challenges have drawn significant attention worldwide to devising advanced BD analytics and making its realization possible for the overall benefit of society. One such trend has been seen in energy sectors equipped with AMI, where a real-time assessment of energy usage through BD analytics enables utility companies to track user behavior, grid behavior, and expected load in any given period. The utilities may then integrate such information into their planning and decision-making processes.

2.1.3. Secure data analytics for smart grid
BD is one of the technologies that opens the door for the SG to bring in new assessment models and application areas such as demand response management, load classification, and customer behavioral analysis. SDA is introduced at each stage of SG data analytics to provide secure analytics. It includes phases such as secure BD collection, secure BD integration, BD storage, secure BD analytics, secure BD visualization, and secure communication systems. SDA is involved in different application areas of the SG, for example, on-demand usage monitoring, billing services, and consumption data statistics. The SDA life-cycle can be categorized into the following stages [15]:

(i) Secure BD collection: The collected information falls into particular classes according to the extracted values: (a) Operational data, which is the electrical information of the grid that represents real and reactive power flows, for instance, demand response capacity and voltage, (b) Non-operational data, which is not related to the power grid; for example, information on power quality and reliability, (c) Meter usage data, which is related to electricity usage and demand values, (d) Event message data, which comes mostly from SG devices, and (e) Metadata, which is used to organize and interpret the other data.
(ii) Secure BD integration: Presently, BD and its associated activities are used to improve the quality, reliability, efficiency, and performance of the SG. This has led to numerous technologies and approaches for secure data integration, which are as follows.
(a) Service-Oriented Architecture (SOA): It combines a large number of software applications to provide services to SG systems. The major issue, however, is how to maintain and manage these systems. SOA helps the software components communicate with each other using a single methodology, which makes data integration more flexible and simpler [19]. In the SG, SOA is used on the demand side.
(b) Enterprise Service Bus (ESB): It relies on a large number of mechanisms to manage communication between various types of systems, for example, customer information systems, outage management systems, and geographic information systems. It decreases the cost and saves time concerning monitoring, management, and variance of
integration [20]. In the SG, ESB technologies are closely associated with SOA, since it makes them flexible and robust.
(c) Common Information Models (CIM): These models are used for data design, particularly for data management. They use the Unified Modeling Language (UML), which plays a significant role in energy management systems with respect to cost and time. Moreover, CIM helps to exchange data within the technical grid infrastructure and provides guaranteed data interoperability through data transformation. The integration of ESB and CIM is used for the standardization and normalization of data between various SG systems.
(d) Message server: It exchanges messages between different applications of the SG system [21].
(iii) Secure BD storage: It stores the data collected from distributed sources and delivers them to analytical tools for quick operations. Hence, there is a requirement for a scalable data storage system, such as (a) a Distributed File System (DFS), which is a file system that enables different clients on numerous computers to share storage resources and files. Moreover, it allows each client to have a local copy of the stored BD. Many DFS solutions are available, for example, HDFS, Quantcast File System (FS), Ceph, Google FS (GFS), Lustre, and GlusterFS; and (b) NoSQL databases, which use a different database approach to overcome the limitations of traditional relational databases when handling huge data. They come in three architectures: column-oriented solutions (for example, HBase and Cassandra), key-value solutions (for instance, Voldemort and Dynamo), and document database solutions (for instance, CouchDB and MongoDB).
(iv) Secure BD analytics: It makes the SG system more efficient, intelligent, and profitable. The analytics can be classified as (i) Signal analytics (signal handling and processing), (ii) Event analytics (focuses on events), (iii) State analytics (focuses on the state of the SG), (iv) Engineering operations analytics (focuses on the operating side of the SG), and (v) Customer data analytics (behavioral analysis of customers). There are various classes of analytics, such as descriptive (customers' behaviors in demand response programs), diagnostic (helps to understand specific customers' behaviors), predictive (predicts customers' decisions), and prescriptive models (analytics that affect marketing, business decisions, and engagement strategies). SDA of the generated BD can be done as batch processing or as stream processing (for real-time applications) with low latency.
(v) Secure BD visualization: It plays an important role in improving the assessment of the SG using two-dimensional (2D) or three-dimensional (3D) visualization. However, the SG faces complicated data presentations due to the enormous number of variables, for instance, the 3D power map. Andrews curves, scatter diagrams, and parallel coordinates are used to resolve this issue of high-dimensional data [22].
(vi) Secure communication systems: They are involved in all the stages of SDA to maintain data security and privacy, high bandwidth, speed, and capacity. The SG is based on various communication technologies, such as WiFi and ZigBee, and network technologies, for instance, Machine-to-Machine (M2M), cellular networks, and Ethernet [15,23].

Finally, the security of BD during its collection, integration, processing, analytics, and communication needs great attention to provide SDA. In the subsequent section, we discuss the research challenges pertaining to SDA in the SG.

Fig. 4. Research challenges for SDA in SG system.

2.2. Research challenges

An extensive cost-benefit analysis is needed to assist the effective realization of SDA in SG systems. Various research issues and challenges in SDA for SG systems are shown in Fig. 4 and are listed as follows [11]:

–Big data issues: Significant work has been conducted in BD analytics [24,25], but the integration of multivariate data on a larger scale remains a challenging task. SDA combines two aspects: multivariate data fusion (for instance, economic data, electricity consumption data, meteorological data, and electric vehicle (EV) charging data) and high-performance computing (for instance, cloud computing, fog computing, Graphics Processing Unit (GPU) computing, and distributed computing) [11,26,27].
–Security and privacy: An efficient SDA requires comprehensive and regular energy utilization information about users. These details may reveal the behavior of the household, which could, in turn, lead to severe privacy and security issues [28]. Therefore, to tackle such problems, the communication infrastructure of the SG must be secure and resilient to different attacks, such as security threats on data sources (physical attack, data injection attack, and service manipulation attack [29]), security threats on the communication system (man-in-the-middle attack, Sybil attack, sinkhole attack, eavesdropping attack, jamming channel attack, tampering attack, and forgery attack), and other threats (SQL injection, insider attack, privacy leakage, and distributed denial of service (DDoS) attack) [30,31].
–New machine learning technologies: Recent advancements in Machine Learning (ML) have had a significant influence on SDA. Moreover, clustering methods have been proposed and used in Deep Learning (DL) techniques, which are applicable in SDA to avoid over-fitting.
(i) Deep learning and transfer learning: Various industries, including the SG, have applied deep learning. Developing distinct deep learning models when labeled data are scarce is a topic of research. Transfer learning may then help to fully utilize the data for analytics [32]. Deep learning is used to implement several transfer learning tasks [33].
(ii) Online learning and incremental learning: BD is real-time stream data, which can be handled through online learning and incremental learning [34]. Several incremental learning approaches, such as incremental clustering, and online learning approaches, such as online dictionary learning, have been developed [35]. However, online and incremental learning are barely used in SDA, except for the online anomaly detection approach.
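To illustrate the online anomaly detection mentioned above, the following minimal Python sketch flags unusual smart-meter readings one at a time using incrementally updated statistics. It is not taken from the surveyed works: the class name `OnlineAnomalyDetector`, the z-score rule, the threshold of 3, and the warm-up length are all assumptions chosen for illustration.

```python
import math


class OnlineAnomalyDetector:
    """Hypothetical streaming detector based on a running mean and variance.

    The running statistics are updated with Welford's algorithm, so no
    past readings need to be stored (the incremental-learning property).
    """

    def __init__(self, threshold: float = 3.0, warmup: int = 5):
        self.threshold = threshold  # assumed z-score cut-off
        self.warmup = warmup        # readings to observe before flagging
        self.count = 0
        self.mean = 0.0
        self._m2 = 0.0              # running sum of squared deviations

    def update(self, value: float) -> bool:
        """Score `value` against past readings, then fold it into the model.

        Returns True when the reading is flagged as anomalous.
        """
        is_anomaly = False
        if self.count >= self.warmup:
            std = math.sqrt(self._m2 / (self.count - 1))
            if std > 0 and abs(value - self.mean) / std > self.threshold:
                is_anomaly = True

        # Welford's incremental update of mean and squared deviations.
        self.count += 1
        delta = value - self.mean
        self.mean += delta / self.count
        self._m2 += delta * (value - self.mean)
        return is_anomaly


if __name__ == "__main__":
    detector = OnlineAnomalyDetector()
    stream_kwh = [1.0, 1.1, 0.9, 1.0, 1.2, 1.0, 9.0, 1.1]  # made-up readings
    for reading in stream_kwh:
        print(reading, detector.update(reading))
```

In this sketch a flagged reading is still folded into the running statistics, which keeps the code short but lets a large outlier inflate the variance afterwards; a practical detector would probably exclude or down-weight flagged readings.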
–Transition of energy systems: The incorporation of renewable energy resources and multiple energy systems (MES) is an emerging trend in SG development. An ideal smart home has several loads, including electricity, gas, cooling, and heating. Energy options such as energy storage, rooftop photovoltaic (PV), and EVs also influence the structure of future energy delivery [11].
(i) High penetration of renewable energy: It positively affects the current utilization patterns, which may, in turn, influence the net load profiles. Superior data analytics systems need to be implemented for outage management, fault detection, decision making, and prediction in a high renewable energy system.
(ii) MES: Here, gas, heat, and electricity systems are integrated to enhance the efficiency of the whole energy system. SDA can address all of these energy usages and is not limited to power consumption. For example, combined prediction of electricity, cooling, and heating loads can be accomplished.
–New business models: The deregulation of retail energy markets, the incorporation of distributed renewable energy, and advances in ICT accelerate different business models in the SG system, such as transactive energy (a micro electricity market or consumer-to-consumer business model) and the sharing economy (a distribution scheme with distributed renewable energy and storage integration) [11,36].
–Interoperability: On the customer’s end, numerous stakeholders have realized that many communication tools operate on distinct protocols in order to communicate with each other. This leads to a challenge of interoperability between diverse entities within SG systems. Apart from the aforementioned issues, there are various other research challenges that need to be addressed, such as communications infrastructure-related challenges (to trade between potentially millions of parties in a single market) and advanced power system monitoring, protection, and control using AMI and the active demand side. A few other challenges are the development of decentralized architectures (to enable harmonious operation of small-scale energy supply systems), strengthening the SG to accommodate more renewable energy resources, the integration of erratic energy generation sources (for example, residential micro-generation of energy using solar PV), and the development of advanced technologies to handle dynamic energy resources more effectively [18].

In this section, readers get the background knowledge of SG, BD, and SDA for a better understanding of the subject. The discussion of potential research issues of SDA in the SG system necessitates the development of a process model and taxonomy for SDA. Both are discussed comprehensively in the subsequent section.

3. Survey methodology, process model and solution taxonomy

3.1. Survey methodology

A systematic procedure is used to perform this survey, as shown in Fig. 5, based on the guidelines proposed by Kitchenham et al. [37,38]. The procedure is divided into the following phases: (i) Survey Planning, (ii) Research Questions, (iii) Relevant Literature Collection, (iv) Search Criteria, (v) Inclusion and Exclusion Method, and (vi) Analysis and Quality Evaluation.

3.1.1. Survey planning
This survey starts by bounding the research area: creating research questions (RQs), collecting relevant literature, and developing search criteria with an inclusion and exclusion strategy for quality assessment. This comprehensive survey identified relevant studies, publications, and work. The identified literature is first checked for quality, and only relevant literature is included in this survey to minimize researcher bias.

3.1.2. Research questions
Our fundamental goal in this study is to carry out an in-depth survey of the SDA procedure in different applications of the SG, such as demand response management and load forecasting. To accomplish this, a set of RQs and their objectives are framed for this systematic survey as
Table 1
A list of research questions with their objectives.
RQ no. | Research question | Objective
RQ1 | What are the secure data analytics issues in the SG system? | It aims to explore secure data analytics issues in the SG system.
RQ2 | Which types of issues exist and how can they be addressed? | It is expected to examine each issue and address the identified issues using various algorithms to ensure secure data analytics in the SG system.
RQ3 | What are the different parameters that have been used to measure the effectiveness of the SG system? | It aims to recognize different parameters used to evaluate the performance of the applied techniques in the SG system.
RQ4 | What are the different issues and their solutions in several application areas of the SG system? | It is expected to classify the entire literature and look for a better solution to the identified issues in several application areas, such as demand response and load management, for the SG system.
RQ5 | Which studies emphasize the usage of different secure data analytics algorithms in several areas of the SG system? | This has a very high impact on understanding the usage of secure data analytics algorithms in several areas of the SG system.
RQ6 | What are the different issues and their solutions as categorized using research question 4? What are the pros and cons of the identified solutions? | These techniques are broadly classified according to the standard secure data analytics algorithm, which serves the purpose of providing an understanding of the merits and demerits of the existing solutions in the SG system.
RQ7 | Discuss several taxonomies and a comparative analysis of several existing surveys of secure data analytics in the SG system. | Several taxonomies and a comparative analysis based on existing surveys of secure data analytics in the SG system are presented to understand the relative advantages and disadvantages of the system.
RQ8 | Discuss various research challenges in the different areas of secure data analytics in the SG system. | It aims to provide information on open research issues and challenges in several areas of the SG system.
RQ9 | What are the different datasets and software frameworks/tools that have been used in the SG system for secure data analytics? | The purpose of this question is to familiarize the readers with several available datasets along with various software frameworks/tools being used for secure data analytics in the SG system.
listed in Table 1.

3.1.3. Relevant literature collection
We perform an extensive search to retrieve research papers from standard digital library databases of high repute, which are as follows.

how BD analytics is done in SG and what are the security issues and challenges during data analytics. For this purpose, the search related to “Big data analytics in smart grid” was made, followed by the searches “secure data analytics in smart grid” and “security issues in big data analytics in smart grid”.

Fig. 6. Search string.

Q.1 | Does the research paper refer to secure data analytics in the SG system? The papers add an overview of security and privacy issues in the SG system. | YES
    | Papers in which the term secure data analytics is not used for the SG are excluded from the collected literature. | NO
Q.2 | Do the abstract, title, and full text of the research paper describe the secure data analytics issues in the SG system? | YES
    | Have the abstract, title, and full text of the research paper described secure data analytics issues in a sub-area of the SG system? | NO
Table 3
Existing SDA reviews/surveys: a comparative summary.
Authors | Year | Objectives? | Taxonomy? | Artifacts? | Tools or products? | Research gap? | Dataset? | Applications? | Case study? | Scope of literature
*
heterogeneous sources such as sensors, SMs, actuators, and IoT devices. This data is then preprocessed. The preprocessing includes various techniques for handling missing values (such as the average method or linear interpolation) and removing duplicate values. It reduces faults in the BD, for instance, errors and noise.

The preprocessing converts the raw data into a useful format. It includes three steps: data cleaning, data transformation, and data reduction. Data cleaning comprises the handling of missing data, noisy data, etc. Missing data can be handled by methods such as ignoring the tuples or filling in the missing values. Noisy data can be handled by the binning method (which works on sorted data in order to smooth it), regression, and clustering. The cleaned data are then transformed into an appropriate format using various data transformation techniques, for instance, normalization, attribute selection, discretization, and concept hierarchy generation. Data reduction is a technique used to handle huge amounts of data: analysis of BD becomes harder as volumes grow, and data reduction helps to reduce data storage and analysis costs and increase storage efficiency using methodologies such as numerosity reduction and dimensionality reduction.

The preprocessed secure BD is stored in a distributed storage system such as Hadoop for analytics, and real-time modeling is used for streamed-in BD. Moreover, the processing is carried out to generate routine or real-time events and visualize the results in two-dimensional (2D) or three-dimensional (3D) format. The proposed process model is secure, as it treats secure data collection, secure data processing, and data security and privacy as major concerns for decision-making. To validate the proposed process model, a case study is presented in Section 10.

3.3. Solution taxonomy

In this paper, we studied articles that are primarily based on SDA, SG, or both. A master taxonomy of SDA in the SG system is summarized in Fig. 9, which includes the prevailing approaches for secure BD collection, storage, and processing. The existing surveys have not covered each stage of SDA to its full potential, whereas the proposed taxonomy includes each stage of SDA (discussed in depth in the Background section). In the proposed solution taxonomy, the discussion starts from secure BD collection and ends by focusing on the communication, security, and privacy issues during SDA. A layer-wise color-coding style is used at each level; for example, level-0 (i.e., the root) is presented in green. Similarly, the next layer, level-1, is represented in orange. The taxonomy is divided into the following categories: (i) Secure Data Collection and Preprocessing, (ii) Secure Load Data Processing and Storage, (iii) Load Prediction, (iv) Load Management and Analysis, (v) Data
Communications, and (vi) Data Security and Privacy. Further, each category is divided into sub-categories; for example, Secure Data Collection and Preprocessing is categorized into Authentication Mechanism, Trust Management Mechanism, Anomaly Detection, and Feature Extraction and Selection. In this paper, each sub-category is discussed in detail in subsequent sections with a relative comparison of existing approaches. The next section focuses on the various techniques involved in secure data collection and preprocessing.
This section discussed the survey methodology, proposed process model, and color-coded layer architecture of the solution taxonomy. At the end of the section, readers can understand the need of this survey and the classification of the existing approaches of SDA.
4. Secure data collection and preprocessing
The first step in SDA is to collect the data from different devices such as sensors and SMs. This collected data can affect the accuracy and quality of data analytics. It has been a challenge to perform research on SDA due to data privacy and security issues. Moreover, utility companies are reluctant to give access to their SG data for academic research and further analytics. Thus, traditional methods for secure data collection in the SG system play a crucial role before the real-time analysis [30,41].
4.1. Authentication mechanism
It is a crucial requirement to guarantee the reliability of data sources in the SG system. Authentication validates the user identity and guarantees that the user is legitimate to access the SG data server. Here, we recap various authentication mechanisms as follows.
(a) Password-based authentication (PBA) methods: In this method, each user has his own ID and password maintained in a table in encrypted form. During the authentication process, the legitimacy of the user is verified from the password table. However, the PBA approach suffers from problems such as: (i) passwords can be easily leaked, as most users set passwords to meaningful words such as their own or a family member's name, mobile phone numbers, or birthdays to avoid forgetting them; (ii) passwords can be easily stolen or eavesdropped during data transmission due to their static nature; and (iii) the table storing passwords is highly vulnerable to tampering by attackers [42].
(b) Smart card-based authentication (SCBA) methods: The smart card contains user data related to his/her identity. Here, the user receives a smart card while registering to the system and uses it during authentication to the SG system. Moreover, it does not require any table to maintain a password. Nevertheless, data stored in the smart card are static; the attacker may obtain the user's identity through memory scanning while the card is being used by the user [42].
(c) Dynamic password-based authentication (DPBA) method: It allows users to change the password frequently according to the number of their uses. Here, the password is typically generated by specific hardware at the user's end, and the server uses a specific algorithm to validate the password during authentication. It avoids the risk of password stealing, but when the server and client are not synchronized, the authentication fails.
(d) Biometric-based authentication methods (BAM): It takes advantage of the inherent behavioral or biological characteristics of a person to check his identity [43], for example, fingerprint, face, and voice. This approach is more reliable than any other existing authentication method, as biometric and behavioral features are hard to forge [42][30]. However, BAM may not apply in scenarios where persons are involved in data collection. Moreover, it requires a long duration to execute, and the time complexity controls the security level.
4.2. Trust management mechanism
The authentication mechanism can only control the authenticity of a device, but cannot guarantee the service provider's trust. Trust management methods are used to calculate the trust value of a service provider [44]. Moreover, the trust management mechanism can be categorized as: (i) Reputation-based Trust Management (RTM) and (ii) Policy-based Trust Management (PTM) [30].
(i) RTM: It uses computational and numerical methods to calculate trust values. For example, in a social media network, the trust value of a specific user is calculated by collecting and aggregating the reputation that the user obtained from the opinion of other linked users.
(ii) PTM: In this approach, verifiable attributes and logical rules are encrypted in signed credentials for data access. It has less flexibility as it makes a binary decision to allow the requester to access data [45]. In SG AMI usage, a secure wireless communication network connects to the utility provider/supplier and sends the energy usage data to the supplier automatically. The SM sends the data to an energy supplier at intervals of 15 minutes to 1 hour. Then, this data can be used for further processing.
4.3. Anomaly detection
An anomaly is described as an unusual data pattern or missing data values triggered by the failure of events or unplanned events during data collection, data entry, or communication. Anomaly detection uses ML, probabilistic, and statistical methods [46]. In this section, SG-based time series BD is the major focus for anomaly detection. As per the modeling method, time-series-based methods, time-window-based methods, and low-rank matrix technique-based methods are used for the detection of bad BD or anomalies. A data cleaning method was proposed by Peppanen et al. [47], which used the optimally weighted average (OWA) method for off-line and on-line circumstances. Energy load data is a linear combination of the nearest load data, where a time series-based autoregressive integrated moving average (ARIMA) model can be applied. The training of an optimization model results in an optimal weight. On the other hand, the nonlinear relationship between the energy data at different time intervals has been modeled by combining artificial neural network (ANN) models and autoregressive with exogenous inputs (ARX) models. Here, hypothesis testing was performed for anomaly detection. Likewise, the Q-test method has been proposed to detect outliers when the number of samples is less than 10, and the generalized extreme studentized deviate approach when the number of samples is greater than 10 [48]. Formerly, canonical
variate analysis has been presented to cluster the recovered load profiles, and anomalous power consumption is classified using the linear discriminant analysis method. Instead of detecting bad data, Jian et al. [49] suggested a forecasting method to identify cyber-attacks without bad data detection.
The power consumptions are temporally and spatially correlated. Identifying this spatio-temporal correlation can help to handle outliers. For bad data cleaning and imputation, a low-rank matrix-based method was proposed by the authors of [11]. This method enables data exchange and communication between different consumers to protect privacy. In continuation, a reliable state estimation method has been proposed using off-line and on-line algorithms [50]. However, the state estimation enhancement has not been reported after low-rank denoising. This method works well when bad data is distributed randomly, but it cannot handle data that remain unchanged for a specific time. Here, low-rank matrix-based data recovery and bad data identification methods have been used [11]. Rather than detecting all bad data directly, a clustering-based strategy has been proposed for load profiles with missing data [51,52]. Here, load profiles are segmented for clustering, so that anomalous data can be recovered within the same cluster. Moreover, by combining several anomaly classifiers, a collective contextual anomaly detection method has been presented [53], where the anomaly was detected using the overlapping sliding windows approach. Meanwhile, a lambda architecture has been used for on-line anomaly detection [54]. This method can process the data in parallel, which results in high efficiency while working with huge datasets [11].
4.4. Feature extraction and selection
The aspects that affect accurate data analytics can be divided into traditional aspects and SG aspects [55]. The traditional aspects include the time of the day, weather conditions, seasons, random events, and disturbances. On the other hand, the SG aspects include demand response, electricity prices, storage cells, distributed energy sources, and electric vehicles. The SDA in SG requires refinement to perform correct analytics. Dimensionality reduction (for example, principal component analysis (PCA)) plays a crucial role in handling noisy and redundant data, where redundant or correlated features are reduced or removed from the refined datasets. There are various techniques that can be used to construct an optimal subset of BD for the SDA, for example, minimum-redundancy-maximum-relevance [56], greedy hill-climbing, regularized tree, and random multinomial logit [55,57,58]. Then, emergent technologies like Artificial Intelligence (AI) and ML can be applied for SDA with the selected features. A relative comparison of various secure data collection and preprocessing approaches is shown in Table 4.
Table 4
A relative comparison of various secure data collection and preprocessing approaches.
Methodology | Objectives | Attributes | References
Authentication Mechanism | To validate user identity and guarantee that the user is legitimate to access the smart grid data | Authenticity, Security | [24,30,31,42,43]
Trust Management Mechanism | To achieve the amount of trust value to pick a service provider | Security and Privacy | [24,32,33,43]
Anomaly detection | Describes unusual data patterns or missing data values caused by unplanned events | Time series analysis, Time window | [34,35,40,42]
Feature extraction and selection | Includes identification of features to extract, such as electricity prices, storage cells | Weather conditions, Time of the day, random events, seasons and disturbances, electricity prices | [10,46,59–61]
This section gives a relative comparison of various secure data collection and preprocessing approaches used in SDA. The preprocessing of BD helps to remove duplicate and anomalous data. It makes the processing task easier and benefits decision making. At the end of this section, readers can understand existing approaches for secure data collection and preprocessing.
5. Secure load data processing and storage
With the advent of social sensing systems and IoT, the generation and collection of gigantic real-world BD have surged. Here, users act as a source and provide run-time data. Hence, this BD needs secure processing to protect the critical and confidential data of users.
5.1. Secure data processing
In the SG environment, data analytics are mostly outsourced to third parties, for instance, cloud servers or edge servers. Consumers can depend on storage and perform computing at an edge or cloud using analytics services. In this scenario, the data owner loses control over the data as soon as BD is outsourced, and personal data of consumers can be exposed to malicious parties. Many proposals have been put forward by researchers to handle this issue, which are as follows.
(i) Homomorphic Encryption (HE): It supports computation over ciphertexts and produces an encrypted computational result set [62,63]. Decrypting the result yields the same output as performing the computation on the plaintext inputs. This approach can be classified as fully HE (FHE), which supports mixed data computations, and partially HE (PHE), which supports restricted application scenarios, for example, Paillier encryption. The computational overhead is much higher in FHE as compared to PHE [30]. Consequently, suitable privacy-preserving algorithms need to be identified to fulfill the practical needs of the SG systems.
(ii) Differential Privacy (DP): It provides strong privacy by adding random noise to BD [64] and supports statistical data analytics, for example, sum, minimum, maximum, average, and count over a specific time [65]. Here, one way to protect against a differential attack is to add random noise to the result, which maximizes the accuracy of aggregation [66]. The second way is that every user adds random noise to the data, which prevents privacy leakage, and then this modified data is aggregated by a third party. However, it can produce some statistical error in the case of huge data, but it protects user privacy during the process of data analysis [67,68].
(iii) Pseudonym Technology (PT): It helps users to benefit from third-party services anonymously by using pseudonyms. It is used to
protect the identity privacy of users. Subsequently, private data of users cannot be leaked as the true identity of the user is hidden from the third party. However, various methods of de-anonymization attack exist based on user behavioral analysis. So, PT provides less security than the other methods [30,69].
5.2. Secure data storage methods
The cost of local BD storage is high; therefore, certain data processing results are stored on third-party servers (such as a cloud server or edge server). This method takes encrypted data as an input to prevent results from being stolen or tampered with. Nevertheless, this encryption hinders various operations, for instance, data search, de-duplication of results, and data access [30]. Approaches to protect SG privacy are as follows.
(i) Access control mechanism: In this mechanism, only an authorized user can access the data [70]. There are several common access control mechanisms, which are as follows.
(a) Role-based access control (RBAC): In the RBAC approach, the access control policies are related to authorizing the SG system and user roles (assigning specific roles to every user). Here, mapping is carried out between roles and privileges on data objects. It offers a flexible access control mechanism, where the third party gives only limited storage to users by assigning a role to the users and acts as an admin to manage the data. This can lead to privacy leakage, so RBAC is combined with other security approaches to guarantee secure access [71,72].
(b) Proxy re-encryption (PRE): It permits a proxy to transform a ciphertext encrypted with the public key of 'X' so that it can be decrypted with the private key of 'Y'. It is designed to accomplish secure data sharing in data storage. The random number for re-encryption keys is generated for every user. Then, the re-encryption key of every recipient with an access control list is uploaded to a semi-trusted proxy server (for access).
(c) Attribute-based access control (ABAC): In this approach, the secret key and the ciphertext both are dependent on the receiver's attributes. It grants access to users, where the receiver can decrypt data in case his attributes match the specified access policy. However, it has high computational overhead.
(ii) Searchable encryption: Song et al. [73] proposed a scheme that supports keyword search over ciphertext and data encryption using symmetric searchable encryption (the symmetric key is used by users to encrypt the index and create search trapdoors), and asymmetric searchable encryption (users hold the public key for data encryption, but only the one permitted can create search trapdoors) [74]. These searches are based on specific rules, such as maximum matches of the first n results.
This section focused on secure data processing and storage mechanisms for SG. After preprocessing, secure data is processed using data analytics tools and stored in the distributed storage for future analysis. At the end of this section, readers can understand existing approaches for secure data processing and storage.
6. Load prediction
Load predictions have been extensively used in SG systems. Electricity distribution companies use short-term and long-term prediction at feeder level to support planning processes and operations for electricity distribution [75], whereas retail electricity providers make procurement, pricing, and hedging decisions mostly based on the forecasted demand of the load.
The normalization of different load types such as city load, feeder load, and factory load is a critical task. Here, feeder load is more volatile than the city's load. Typically, the higher the measured load, the smoother the load profile. A highly accurate load prediction is significant only at a lower level. The latest review has focused on probabilistic forecasting [11,76,77]. Here, recent literature on energy load forecasting is reviewed, based on whether the SM data is being used or not.
(i) Load prediction without using SM data: Load profiles aggregated to a group of customers or voltage levels are more volatile and sensitive to customer behaviors. A few of the residential loads can be responsive to the customer work schedule and weather conditions. These load prediction problems share a few common challenges, for instance, modeling the effects of competitive market influences, weather conditions, and leveraging the hierarchy. On the other hand, power consumption is largely driven by the demand of consumers. The volatile count of consumers adds uncertainties to the load profile prediction. A long-term retail load prediction approach has been proposed by Xie et al. [78] by taking consumer attrition into consideration. In this approach, a multiple linear regression method with variable selection has been used to forecast each consumer's load. Then, consumer attrition is predicted based on survival analysis, and the final retail load forecast is produced. One of the major issues in the retail energy market is the response of consumers to demand response programs. Some consumers may respond to the price signals, while others do not. A nonparametric test has been done to detect responsive consumers and forecast them separately [79]. The authors performed an experiment using a cumulative load of the power grid. The impact of humidity data on load prediction has been discussed by Xie et al. [80], a similar investigation has been carried out based on wind speed variables [81], and the impact of weather conditions has been discussed in [11,82].
(ii) Load prediction with smart meter data: The SM has a two-fold value in load prediction. First, SM makes it possible for the power retailer or power distribution companies to clearly understand the load of individual buildings or houses. Second, the high granularity level of load data offers abundant potential to improve the prediction accuracy. The consumption of electricity at a residential building can be much more volatile and random than that at aggregate levels, and traditional techniques may not be suitable for load prediction [82].
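As an illustration of the multiple linear regression idea mentioned for load prediction without SM data, the following minimal Python sketch fits a load model by ordinary least squares. The features (temperature and a weekend flag), the synthetic noise-free data, and the helper names `fit_mlr` and `predict` are assumptions made for illustration only; they are not taken from Xie et al. [78], whose approach additionally involves variable selection and survival analysis.

```python
def fit_mlr(rows, targets):
    """Ordinary least squares via the normal equations (X^T X) b = X^T y."""
    X = [[1.0] + list(r) for r in rows]  # prepend an intercept column
    k = len(X[0])
    # Augmented matrix [X^T X | X^T y].
    A = [[sum(x[i] * x[j] for x in X) for j in range(k)]
         + [sum(x[i] * y for x, y in zip(X, targets))]
         for i in range(k)]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(A[r][col]))
        A[col], A[pivot] = A[pivot], A[col]
        for r in range(k):
            if r != col:
                f = A[r][col] / A[col][col]
                A[r] = [a - f * b for a, b in zip(A[r], A[col])]
    return [A[i][k] / A[i][i] for i in range(k)]


def predict(coef, row):
    """Evaluate intercept + sum(coefficient * feature)."""
    return coef[0] + sum(c * v for c, v in zip(coef[1:], row))


# Hypothetical training data: (temperature in deg C, weekend flag) -> load in kW.
features = [(20, 0), (25, 0), (30, 0), (22, 1), (28, 1), (35, 1)]
loads = [2.0 + 0.1 * t + 0.5 * w for t, w in features]  # synthetic, noise-free

coef = fit_mlr(features, loads)
forecast = predict(coef, (27, 0))  # forecast for a 27 deg C weekday
print(coef, forecast)
```

Because the synthetic loads are generated from a known linear rule, the fitted coefficients should recover that rule up to floating-point error; with real feeder or retail data, the residuals would be non-zero and feature choice would matter.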
subsections.
Table 5
A relative comparison of state-of-the-art load prediction approaches.
Methodology* | Authors | Objectives | Pros | Cons | Applications area
Hybrid approach with unsupervised clustering and supervised classification | Jiang et al. [96] | A hybrid approach for electricity consumer classification using smart grid data | Better results in consumer category identification and classifying accuracy | Incremental learning and association analysis to enhance the model | Customer Behavioral Analysis
K-means Clustering and Principal Component Analysis | Shaker et al. [96] | Solar power generation estimation of invisible solar energy site | This hybrid k-means and PCA data dimension reduction shows stable results | – | Dimensionality Reduction for power generation estimation
Optimal Deep Learning LSTM Model | Bouktif et al. [91] | To perform load prediction using genetic algorithm and feature selection | Performs for medium to long range forecasting | To apply LSTM modeling on different data sets | Load Forecasting
Statistical Modeling and Machine Learning model | Yu et al. [83] | Statistical Modeling using Gaussian distribution with SVM techniques for Energy Usage Forecasting | Forecasting of energy generation, and anomaly detection performed on real-world dataset | – | Load Forecasting, Energy generation, and anomaly detection, optimal demand response
Learning-Based Demand Response System | Zhang et al. [97] | Combines machine learning, optimization, and data structure design to build a demand response and home energy management system | Combines Home energy simulator to provide unbiased energy consumption | Commercial energy consumption needs to be considered | Demand Response system
Additive Quantile Regression | Taieb et al. [98] | Individual Smart Meter Data Estimation | Data Estimation of smart meter | – | Load Forecasting
Statistical and probabilistic model | Chen et al. [88] | Optimal demand response scheme | Provides optimal prices in competitive market | – | Demand Response management system
K-means Clustering and Principal Component Analysis | Shaker et al. [99] | Energy generation aggregation within a specific region | This fuzzy model can also be used for a large number of invisible solar sites | Needs to be tested for other energy generation sources such as wind | Power generation estimation
Decision tree and SVM-based | Jindal et al. [100] | Data analytics for theft detection | This scheme identifies fraudulent consumers with an accuracy of 92.5% | – | Energy Theft detection
Statistical and probabilistic model | Guan et al. [87] | Minimize the electricity and gas prices | Significant energy cost savings by scheduling and control of various energy supply sources | – | Energy usages optimization
SVM-based model | Shi et al. [85] | One day ahead electricity price prediction using weather condition | Tested on real-time dataset | – | Load Forecasting
Note: *, machine learning and deep learning or statistic and probability based model.
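Table 5 lists additive quantile regression among the load forecasting approaches. Quantile forecasts of this kind are commonly scored with the pinball loss, which penalizes under- and over-forecasts asymmetrically according to the target quantile. The following minimal Python sketch shows that computation; the function name `pinball_loss` and the sample numbers are hypothetical and are not taken from any of the cited studies.

```python
def pinball_loss(y_true, y_pred, q):
    """Average pinball (quantile) loss of forecasts for quantile level q in (0, 1)."""
    total = 0.0
    for y, f in zip(y_true, y_pred):
        diff = y - f
        # Under-forecast (diff >= 0) is weighted by q, over-forecast by (1 - q).
        total += q * diff if diff >= 0 else (q - 1) * diff
    return total / len(y_true)


# Hypothetical hourly loads (kW) and a 90th-percentile forecast for the same hours.
actual = [10.0, 12.0, 9.0, 11.0]
forecast_q90 = [12.0, 12.0, 10.0, 10.0]

loss = pinball_loss(actual, forecast_q90, 0.9)
print(loss)
```

A lower value indicates a better quantile forecast; averaging the loss over several quantile levels gives a single score for a full probabilistic forecast.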
6.2. Statistic and probability based model
In comparison to the point forecast system, a probabilistic forecast gives more information about the uncertainties of the future. A common point forecast system comprises three parts: data input, modeling, and output (forecasts), as depicted in Fig. 10. There are three ways to modify this workflow to generate a probabilistic forecast, as reported in [76]: (i) producing multiple input scenarios that can be fed to a point forecasting model; (ii) applying a probabilistic forecasting model, for example, quantile regression; and (iii) augmenting point outputs into a probabilistic forecast by making ensembles of point forecasts or using simulated or modeled residuals. The generation of scenarios is an efficient approach to capture the uncertainties (on the input side) from the dynamic factors of energy demands. Several temperature scenario generation approaches were also reported in the literature, for instance, direct use of the hourly temperatures of preceding years with fixed dates [101], creating additional scenarios by shifting the historical temperatures (by a few days or so), and bootstrapping the historical temperatures [11,102]. On the basis of the pinball loss function, these three methods were compared in [103,104]. It has been observed that when the number of shifting dates is within a range, the shifted-date method dominates over the other two methods. A relative comparison of various state-of-the-art load prediction approaches is shown in Table 5.
Dahua et al. [105] proposed an experimental formula to get the parameters related to temperature scenarios. A quantile regression neural network has been used in the regression model. In this approach, the relationship between load and temperature can be broadly measured and identified as the future uncertainties of temperature. This scenario generation approach also develops a probabilistic view of the power distribution system consistency [106]. On the other hand, point predictions can be converted to probabilistic ones using residual simulation [107]. The idea of combining point load predictions to produce a probabilistic load prediction was initially proposed by Liu et al. [108]. It is still an unresolved problem whether a more accurate point prediction model leads to a more skillful probabilistic prediction.
In a scenario where two underlying models are considerably different, a more accurate point prediction model would produce a more capable probabilistic prediction [109]. In the literature, various probabilistic forecasting models have been proposed by using Gaussian process regression, quantile regression, density estimation, and hybrid quantile regression and generalized additive models [76,110]. Probabilistic load prediction can be carried out using individual load profiles. Moreover, an additive quantile regression method has also been proposed, which combines the quantile regression and gradient boosting method [60]. Then, the kernel density estimation approach has been tested, which modeled the density of electricity data [61].
In this section, the load prediction mechanism has been discussed using ML approaches and statistics and probability-based methods. At the end of this section, readers can understand the existing approaches used for load prediction.
7. Load management and analysis
This section summarizes the way SDA contributes to load management implementation. It first discusses the necessity of having a good understanding of users' sociodemographic information in order to deliver personalized and better service, and then the management of potential consumers for a demand response program using customer classification and customer behavioral analysis. In SG, load management plays a key role in SDA, and it is the active control of energy consumption. It can be classified into two categories: incentive-based and dynamic pricing-based schemes. In the incentive-based mechanism, the utility offers incentives to the customers for the reduction of demand during specific periods. It is broadly categorized according to the customer incentive method, such as direct load control (loads turned off for specific time periods), curtailable load (loads turned off or reduced for specific time periods), and interruptible load (major portions of total load or total load turned off for specific time periods), with load reduction criteria events such as emergency programs or economic programs. In the case of the emergency program, customers reduce load to relieve energy generation, transmission, and distribution capacity restrictions. Direct communication signals such as telephone and internet can be used to notify participants of economic or reliability events of load management for a direct incentive payment. The last section discusses the issues linked to the implementation of a demand response program, which includes dynamic pricing, load profiling, and non-technical loss detection.
7.1. Customer classification
The customers' electricity usage behaviors and their sociodemographic status are closely associated. Therefore, in order to realize personalized services and user classification, linking the load-profiles to their sociodemographic status would be an important approach. Here, the detection of consumer types based on the load-profiles would be a basic problem. Other issues are related to the identification of sociodemographic information using load-profiles and the prediction of load shapes from the sociodemographic information. Simple classification can be used to realize and identify the type of consumers. Using the Fast Fourier Transform (FFT), the temporal load-profiles were converted into the frequency domain [111].
To categorize consumers, the coefficients of distinct frequencies were used as inputs of a classification and regression tree (CART). FFT decomposes SM data based on certain sine and cosine functions. Sparse coding is another transformation method; it learns the basis functions automatically. On the other hand, Wang et al. [112] presented a non-negative sparse coding for the extraction of partial usage patterns from original load-profiles. Subsequently, linear SVM was applied to categorize users into residents and small and medium-sized enterprises on the basis of partial usage patterns. Here, the precision of classification is significantly higher as compared to PCA and Discrete Wavelet Transform (DWT). Consumer classification into distinct energy behavior groups was first carried out using a clustering approach. Later, identification of the indicators larger than a threshold energy behavior was achieved through correlation rate and indicator dominance index [60]. Lastly, mapping was done to reveal the relationship between various energy behavior groups and their sociodemographic status. Further, Vercamer et al. [113] generated typical load-profiles using spectral clustering. Predictors, for instance, stochastic boosting and Random Forests (RF), used these load-profiles as the inputs. It was found that the load-profiles of users can be precisely predicted, particularly with commercial and cartographic data. To examine the factors that can influence residential electricity usage, the stepwise selection approach was proposed. The main factors identified involve floor area, location, the user's age, and the number of devices, whereas home-ownership and income level were found to have little connection with electricity usage. In [114], a multiple linear regression model was adopted to identify the
maximum demand, total electricity consumption, load factor, Time-of-Use (ToU), and customer’s socioeconomic variables. The impact of user’s socioeconomic status on their electricity usage pattern was estimated by Han et al. [115]. A clustering algorithm could be one of the approaches. In [116], Dirichlet Process Mixture Model (DPMM) was used for load profiling of domestic and commercial premises, where the pre-determination of the number of clusters was not required. The result acquired through the DPMM algorithm shows a clear relationship with the metadata of dwellings, for instance, the household size, type of dwelling, and nationality. Then, in [117], on the basis of the clustering results, a multinomial logistic regression was applied to the dwelling and appliance characteristics.

The clusters were individually examined according to the regression model coefficients. As a classifier attribute, feature extraction and selection were also applied. A set of features was established by Beckel et al. [118], comprising the ratios of two consumptions in diverse periods and the average electricity usage over a defined time. Later, classification or regression was applied for the prediction of sociodemographic status based on the selected features. Results obtained from their study revealed that the suggested feature-extraction technique outperforms biased random guess [11].

7.2. Dynamic pricing

The generation, transmission, and distribution of electricity incur a specific cost per kilowatt-hour (kWh) of energy consumed. Earlier, there were mainly two categories of costs: fixed costs and operational costs. A specific amount is charged per kWh of electricity consumed by each consumer, which is known as a tariff. Conventionally, electricity pricing used different tariff schemes such as block rate tariff, simple tariff, two-part tariff, power factor tariff, flat rate tariff, and maximum demand tariff. However, these schemes do not meet the dynamic requirement, so variable (time-based) pricing schemes have been introduced for distributed generation in the SG system. These pricing schemes are ToU, Real-Time Pricing (RTP), and Critical Peak Pricing (CPP), and are discussed as follows.

• ToU: The ToU provides a smart pricing system for peak time and off-peak time. Here, prices are higher during peak time (as per high demand for energy) as compared to off-peak time prices. Due to higher rates during peak hours, consumers are motivated to curtail electricity usage during those hours, which reduces their electricity bills. This pricing approach can be categorized into three classes: peak time, non-peak time, and mid-peak time [7].
• CPP: It is an updated form of ToU, which includes the time of year with very high energy demand compared to the peak-time of ToU. The ToU pricing scheme cannot identify the specific time of year at which demand exceeds the peak demand of ToU. Occasionally it costs the grid high wholesale energy prices that are not in range of ToU pricing. CPP is only declared when the load prediction indicates a critical day (with very high load). Typically, it is declared a day ahead of the actual CPP event and has a pricing ratio of 15:1 for peak time and non-peak time.
• RTP: It reflects the electricity consumption cost incurred by the consumer at the nearest real-time prices. It is mainly of two types: hourly pricing and day-ahead pricing. For hourly pricing, the electricity prices are announced for the next hour only. On the other hand, for day-ahead pricing, the price is announced 24 hours beforehand by predicting the generation cost and the load demand. RTP requires the engagement of consumers to provide electricity at a cheaper cost [119]. The participation of consumers in responding to changing electricity prices helps the SG system to work efficiently and effectively.

7.3. Customer behavioral analysis

The factual value of SDA provides real-time analytics for load management improvement, electricity loss reduction, and improved QoS and reliability of SG. SDA is essential for both utility companies and consumers to benefit. For example, the electricity consumption pattern can be used to reduce the total cost by optimizing the daily electricity consumption of consumers. The analytics on real-time data combine with past energy billing data to provide actionable insights on consumption patterns. With the help of SM, customers can control and monitor the consumption of their electricity. Several electric appliances, for instance, washing machines, TVs, electric cars, fans, and bulbs, consume different levels of electricity. The SG keeps details of electricity consumption data so that in the event of queries (from the customer), SG can reply quickly.

SDA helps customers to monitor and analyze their electricity consumption hourly/daily/weekly/monthly or yearly. The detailed data analytics may expose noteworthy private data of consumers. For example, energy consumption analysis of a television may reveal shreds of evidence as to whether the consumer is at home or not. Similarly, electricity consumption analysis of electric cars can disclose the places visited by consumers. These private consumer data require exceptional security by using private and public keys. Further, the time-series data generated at AMI record the electricity consumption behaviors of each consumer in the SG system. The SDA on efficient consumption behaviors can be effective for flexible demand management and control of electricity usage. Several works exist on consumer classification using load pattern grouping, for example, Self-Organizing Maps (SOM), ANN, fuzzy model, and multi-stage categorization methodology [120].

7.4. Demand response management

Demand response can be categorized into two sets: incentive-based and price-based schemes. In a price-based scheme, designing the price is crucial to attract customers’ attention and raise profit. On the other hand, an incentive-based demand response system quantifies customers’ performance on the basis of baseline estimation. Here, an incentive is offered to the customers participating in peak load reduction (to reduce the burden on SG) [126]. This subsection deals with the application of SM data analytics in the two above-mentioned categories of demand response.

Authors in [125] suggested an upgraded Weighted Fuzzy Average (WFA) for tariff design in order to acquire typical load profiles. An optimization model with a designed profit function was formulated, where a piecewise function was used to model the acceptance of customers over price. Such a strategy of price determination was previously reported by Joseph et al. [127]. For the risk model, Conditional Value at Risk (CVaR) was considered and explored in such a way that the initial optimization model becomes a stochastic one [128]. Besides, several distinct kinds of clustering approaches were adopted for the extraction of load profiles according to the performance index granularity [129]. It was found that the clustering approach based on different numbers of clusters and algorithms results in variations in costs.

In [130], the Gaussian mixture model (GMM) clustering was applied to energy cost as well as load profiles. Subsequently, the ToU tariff was designed using various arrangements of the specific time. The effect of the calculated price on demand response was eventually evaluated [131]. Various scenarios of demand response were considered and demonstrated. Probabilistic baseline estimation based on a Gaussian process was proposed to explain the ambiguity in users’ consumption behaviors [132]. Moreover, efforts were also made to understand how the aggregation level impacts the relative estimation error. In [133], K-means clustering was first applied to the load profiles in non-event days. The prediction of electricity usage level was carried out using a decision tree based on the demographics data, which included electrical devices and household features. Therefore, a new user could be categorized directly into a certain group prior to joining a demand response program, and subsequent basic averaging
and piecewise linear regression can be utilized for the estimation of baseline loads under various weather situations.

The selection of a group serving as a control for baseline estimation was articulated as an optimization problem by Hatton et al. [134]. Further, the minimization of the difference between the load profiles of the demand response group and the respective control group can be a main objective for the demand response system [11,84]. Further, load profiling can help to classify load in a demand response system, which is discussed in detail in the subsequent subsection.

7.5. Load profiling

The classification of load curves according to power utilization behavior is usually referred to as load profiling. It may be categorized into indirect-clustering-based approaches and direct-clustering-based approaches. Several methods of clustering have been directly utilized on SG data, for example, hierarchical clustering, K-means, and SOM [135–137]. On the other hand, works on indirect clustering could be classified into load characteristics, dimensionality reduction, variability, and uncertainty-based methods based on the features extracted before clustering. With direct clustering techniques, there are two basic issues.

The first challenge is imposed by the resolution of data generated via SMs. Granell et al. [138] presented clustering techniques, such as hierarchical algorithms, k-means, and DPMM, on the load profile data. The SM time series data can record the power consumption behaviors of each customer in the SG system. Effective consumption behaviors can be helpful for effective energy control and flexible demand management. The customer classification into different categories is based on similar load patterns. It plays a vital role in SDA as it is useful for both electricity suppliers and consumers [139]. For load classification, various works exist on load pattern grouping, such as ANN [140,11], fuzzy model, and other clustering algorithms [112,141–144]. In the meantime, few researchers focus on multi-stage categorization frameworks [136].

7.6. Non-technical loss detection

Bad data are usually temporary and unintentional; however, the theft of energy (non-technical loss) might change the SM data under specific approaches and may last for a comparatively longer time. In fact, data related to energy-theft collected by SMs belong to bad data. The detection of energy-theft could be done by using SM data together with power system state data, for example, node voltages. The energy-theft detection techniques using only SM data can be concisely viewed from two aspects, that is, unsupervised learning and supervised learning [145].

Supervised learning and classification approaches are effective for detecting energy-theft and basically comprise two stages: feature extraction and classification. In order to train the classifier for theft detection, non-technical losses were first predicted by Jokar et al. [141]. To group the load profiles, K-means clustering was exploited to determine the number of clusters [146]. Several potential malicious samples were created to train the classifier and address the imbalanced data issue. Once a definite number of abnormal detections occurred, an energy-theft alarm was raised. The suggested scheme may also identify the type of energy-theft.

Besides the extraction of features based on clustering, an encoding method was carried out for detection; this method could run in parallel. A top-down scheme on the basis of SVM methods and decision tree was suggested in [100] through the introduction of external variables. The anticipated energy consumption was also estimated using a decision tree on the basis of the number of consumers, devices, and outdoor temperature. The SVM was fed with the output of the decision tree to determine normal versus malicious consumers. Real-time detection could also be applied using this proposed framework.

Acquiring a labeled dataset of load data for the detection of energy-theft is costly in the supervised learning approach [147]. This worked as the classifiers’ inputs, including the rule-engine-based algorithm and SVM, to identify the energy-theft. As compared to supervised learning, unsupervised detection of energy-theft does not require labels for some or all users’ data. In [148], optimum-path forest clustering was suggested, in which every group is modeled as a Gaussian distribution. In case the distance is larger than the defined threshold, the load profile may be recognized as an anomaly. The superiority of the proposed method was assessed through a comparison with commonly used methods, including affinity propagation, k-means, and GMM. Clustering was only performed within an individual user to acquire atypical and typical load profiles in [149], instead of clustering all load profiles. Based on the atypical and typical load profiles, the classifier was trained for the detection of energy-theft. Another method of feature extraction could be based on transforming the time-series SM data into the frequency domain.

The extracted features in the examined interval and reference interval were compared (applying Structure & Detect method) using a discrete Fourier transform. Subsequently, a normal or malicious load profile can be distinguished [150]. The method proposed here could be employed in a parallel and distributed manner that will be helpful for the on-line analysis of huge datasets. An additional method of unsupervised detection of energy-theft is to formulate the problem as a load forecasting problem. A consumer could be categorized as a malicious user if the metered expenditure is significantly lower compared to the forecasted consumption [145]. The consumption data were shown using different colors and visualized on the basis of an anomaly score [151,11]. A comparative study of load management and analysis techniques is shown in Table 6. Further, the secure and strong data communication system is discussed in the next section for load management.

Table 6
A relative comparison of load management and analysis approaches.
Methodology | Objectives | Attributes | References
Customer classification | User classification linking the load-profiles to their sociodemographic status. | Profile Prediction, Status Prediction, Socio-demographic, Energy usages | [87,88,91,121,96]
Dynamic Pricing | To increase the profit at grid | Time of use, Critical Peak Pricing & Real Time Pricing | [9]
Load Profiling | To classify the load for profile identification | Direct Clustering, Local Characteristics, and Uncertainty | [15,13,19,111,113,116]
Customer Behavioral Analysis | Electricity consumption helps to analyze the behavior of the consumer. | Time of use, Pattern of load, Customer Privacy | [35]
Demand Response Management | To manage the load demand of consumers by managing it at demand side of smart grid | Load Prediction, Load Classification, Customer classification, Appliances usages, Appliance priority scheduling | [89,122,123,80,81,124,114]
Non-technical loss detection | To identify the energy theft. | Supervised Learning; Unsupervised Learning | [113,114,125]
This section discussed load management and analysis. Load management is one of the critical tasks, and the section emphasized the techniques available for customer classification, load classification, dynamic pricing, customer behavioral analysis, demand response management, load profiling, and energy theft detection. Readers get a glimpse of the available approaches and methodologies for load management and analysis in the SG system.
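As a minimal illustration of the forecasting-based view of energy-theft detection discussed in Section 7.6, the following Python sketch flags a consumer when the metered consumption falls significantly below the forecasted consumption. The function names, the sample readings, and the 30% threshold are assumptions made purely for illustration; they are not taken from [145] or any other surveyed work, and the forecast values are assumed to come from a separate load forecasting model.

```python
def shortfall_ratio(metered, forecast):
    """Relative shortfall of metered vs. forecasted consumption (0.0 if none)."""
    total_forecast = sum(forecast)
    if total_forecast <= 0:
        raise ValueError("forecast must have positive total consumption")
    return max(0.0, (total_forecast - sum(metered)) / total_forecast)


def flag_suspects(metered_by_user, forecast_by_user, threshold=0.3):
    """Return sorted user ids whose shortfall ratio exceeds the threshold.

    The default threshold of 0.3 is an illustrative assumption, not a
    value recommended by the surveyed literature.
    """
    suspects = []
    for user, metered in metered_by_user.items():
        if shortfall_ratio(metered, forecast_by_user[user]) > threshold:
            suspects.append(user)
    return sorted(suspects)


# Hypothetical daily readings in kWh for two consumers.
forecast = {"house_a": [10.0, 12.0, 11.0], "house_b": [8.0, 9.0, 8.5]}
metered = {"house_a": [9.5, 12.5, 10.5], "house_b": [4.0, 4.5, 4.0]}

suspects = flag_suspects(metered, forecast)
print(suspects)
```

In this toy data, house_b reports roughly half of its forecasted consumption and is therefore the only consumer flagged. In practice, the threshold and the length of the observation window would need to be tuned against the forecasting error of the chosen model.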
8. Data communication

Data communication plays a vital role in the SDA of the SG system. It includes network-connection verification, outage management, and data compression [27,152]. In these aspects, only a few works are available in the literature, and they are summarized as follows.

8.1. Communication verification

The network-connection distribution data may assist utility distributors and utilities to take optimal decisions concerning the distribution-system operation. Unfortunately, at a lower voltage, the whole topology of the system might not be accessible. To detect the connections of various demand nodes, quite a few works have been carried out using SM data. For the correction of connectivity errors, correlation analysis was done on the basis of SM data collected for the hourly voltage and current utilization [153]. This analysis was based on the assumption that the magnitude of voltage declines downstream along the feeder. Nevertheless, in cases where a huge quantity of distributed renewable-energy integration occurs, the assumption may not work correctly. In contrast to consumption data, the current and voltage information were considered for the estimation of topology displayed by the distribution-system [154]. Indeed, this estimate was obtained in a greedy fashion instead of an exhaustive search, to improve the computation efficiency.

The topology identification has been presented as an optimization problem based on the conditional probability [155]. Likewise, it was assumed that the correlation between non-neighbor buses is lower than that between interconnected neighboring buses. Thus, the problem of topology identification was articulated as a Lasso-based sparse estimation problem and a probabilistic graph model [156]. For Lasso regression, a discussion about the selection of the regularization parameter was also presented.

In [157], PCA was used to analyze power consumption data at various levels for topology as well as phase identification, where Gaussian distributions were used to formulate the errors triggered by smart metering, technical losses, and clock synchronization. Instead of utilizing all SM data, an incomplete data-based phase identification problem was proposed by Xu et al. [158] in order to address the hurdles imposed by null data or bad data. Fourier transform was used to obtain the high-frequency load for subsequent phase identification. The differences in high-frequency load between two contiguous time intervals were extracted as the inputs of saliency analysis. Sensitivity analysis performed on SM penetration ratios revealed that more than 95% accuracy can be achieved using merely 10% of SMs [11,159].

8.2. Outage management

A failure in electricity supply that may occur due to a short-circuit, a damaged distribution line, or a station failure is defined as ‘power outage’ [160]. Therefore, the management of power outage has been given the utmost importance in the analytics of SM data after billing. This includes notification of outage, location, and verification of restoration. In [161], the outage management applications were discussed using a two-stage strategy to identify the outage area. The first stage comprised the physical network-distribution, which was simplified through topology analysis. On the other hand, in the second stage, SM data was used to identify the outage area considering the effects of communication. For the rapid detection and recovery of power outages and their location, an SM data-based prediction technique was proposed by Kuroda et al. [162]. Here, the hurdles related to the utilization of SM data and requisite functions were also examined. In addition, a multiple-hypothesis technique was proposed to detect the faulty section on a feeder level [163]. Here, the input of the suggested multiple-hypothesis method consists of outage reports collected by SMs.

To maximize the amount of SM notifications, the problem was articulated as an optimization model. On the other hand, a unique hierarchical framework was developed to detect outage by means of SM event data instead of usage data [164]. It is expected to address the hurdles related to missing data, selection of variables, multivariate count data, reliability indices, and outages [106]. Besides the methods of data analytics used to manage outages, further works on SM data-based management also considered the corresponding communication architectures [165]. Further, to reduce the latency of data communication, data compression is discussed in the subsequent subsection.

8.3. Data compression

Data communication and storage become challenging with huge SM data. To overcome such issues of communication and storage, compression of SM data to a small size without significant loss is required. Two categories of data compression are available, namely lossless compression and lossy compression; various methods to compress electric signal waveforms in SG were provided by Tcheou et al. [166]. In addition, a few papers have specifically discussed the problem associated with SM data compression. Mainly for very high-frequency data, the changes in current utilization between adjacent time periods are very small compared to actual consumption. Therefore, to achieve lossless compression of SM data, a few methods related to variable-length coding, combining normalization, differential coding, and entropy coding, were proposed [167].

Similarly, several methods of lossless compression, such as International Electrotechnical Commission (IEC) 62056-21, differential exponential Golomb, and arithmetic coding, showed enhanced performance on BD with higher granularity [168]. In contrast, to decrease the dimensionality of load-profiles before clustering for low granularity SM data, a classic time-series method of data compression called ‘symbolic aggregate approximation’ was used [143]. Moreover, the generalized extreme value distribution was used to fit the distribution of load-profiles. By defining the stimulus state and base state of load-profiles and through detection of a change in the load status, a feature-based load data compression method was suggested. A comparison between the DWT and piecewise aggregate approximation was also carried out. Further, a non-negative sparse coding has been used for the identification of partial usage patterns of load and for compressing the load in a sparse way [11].
This section discussed the data communication technologies available for SDA. A secure communication system protects against security breaches during analytics. At the end of this section, readers can comprehend existing technologies for secure data communication.
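The lossless compression idea discussed in Section 8.3, where differential coding is followed by a variable-length entropy code, can be sketched in Python as follows. The sketch assumes integer SM readings (for example, in Wh), uses a zigzag mapping and an order-0 exponential-Golomb code as one possible variable-length code, and represents the output as a string of '0' and '1' characters for readability. The function names and sample values are hypothetical, and this is not the exact scheme of [167] or [168].

```python
def zigzag(d):
    """Map a signed difference to a non-negative integer (0, -1, 1, -2 -> 0, 1, 2, 3)."""
    return 2 * d if d >= 0 else -2 * d - 1


def unzigzag(u):
    """Inverse of zigzag."""
    return u // 2 if u % 2 == 0 else -((u + 1) // 2)


def exp_golomb_encode(n):
    """Order-0 exponential-Golomb code of a non-negative integer, as a bit string."""
    bits = bin(n + 1)[2:]
    return "0" * (len(bits) - 1) + bits


def compress(readings):
    """Differential coding followed by exponential-Golomb coding of each difference."""
    out = []
    previous = 0
    for value in readings:
        out.append(exp_golomb_encode(zigzag(value - previous)))
        previous = value
    return "".join(out)


def decompress(bitstring):
    """Recover the original readings; assumes a well-formed string from compress()."""
    readings = []
    previous = 0
    pos = 0
    while pos < len(bitstring):
        zeros = 0
        while bitstring[pos] == "0":  # count the unary length prefix
            zeros += 1
            pos += 1
        n = int(bitstring[pos:pos + zeros + 1], 2) - 1
        pos += zeros + 1
        previous += unzigzag(n)
        readings.append(previous)
    return readings


# Hypothetical high-frequency readings: consecutive values differ only slightly.
readings_wh = [1200, 1202, 1201, 1201, 1205, 1204]
encoded = compress(readings_wh)
restored = decompress(encoded)
print(len(encoded), restored == readings_wh)
```

Because consecutive readings are close to each other, most differences map to very short codewords, so the encoded stream is shorter than a fixed-width representation while remaining exactly recoverable. A production scheme would pack the bits into bytes and handle malformed input.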
9. Data security and privacy

The privacy issue is among the major disagreements and worries for the installation of SMs. Using the fine-grained data gathered by SMs, the sociodemographic data can be inferred. Quite a few reports are available discussing how the consumers’ privacy can be preserved. A distributed aggregation architecture for additive SM data was proposed in [169]. To prevent disclosure of personal data, the gateways placed at the consumer’s premises were equipped with a secure communication protocol [170]. The proposed protocol can be applied in centralized as well as distributed manners. Then, a framework has been presented by Sankar et al. [171] for the trade-off between utility requisite and privacy of users using a Hidden Markov Model (HMM). The distortion between the original and the perturbed data was used to evaluate the utility requirement, whereas, for privacy, the mutual information between the two data sequences was examined.

Subsequently, a trade-off region of utility-privacy was determined from the information theory perspective [172]. Here, privacy was formulated by defining the success probability of an attack as an objective function to be minimized. The introduction of colored noise and the accumulation of individual SM data were employed so that the success probability can be reduced. For identification of appliance status, the granularity of SM data and its influence on edge-detection were reported to evaluate privacy [169,173]. Through this study, it was shown that the detection rate declines dramatically when information collection frequency gets lower than half the on-time of a specific appliance. Further, the issue of privacy preservation was articulated as an optimization problem [174,72]. Here, the target was to reduce the sum of the anticipated cost, the reduced utilization of energy by users due to the late use of machines, and data leakage [175]. Further, data privacy is also managed by a set of standards and interoperability, as discussed in the next subsection.

9.1. Standards and interoperability

SG is a complex and heterogeneous environment that contains different kinds of devices, systems, networks, and data. For example, there are communication networks with low/fast processing, devices with/without energy constraints, continuous/non-continuous BD, and interactive/non-interactive systems. Hence, SG raises different issues in managing BD integration in terms of limited resources, errors, bandwidth constraints, and high scalability. Currently, utilities are using different protocols having different definitions and different communication techniques. Hence, standards are required to make interoperability possible. Various data models have been developed to standardize SG. For example, IEC 61970/61968, which is a CIM, uses IEC 61850 for messaging and information exchanges. In recent times, the integration of smart inverters required more advanced protocols such as IEC 61850-90-7. There are other advanced protocols such as IEEE 2030.5 and IEEE 1815 (DNP3), which are used in current communications infrastructure [176,15].

9.2. Blockchain-based SDA

In general, blockchain can be defined as a chain of several blocks containing information. The main attribute of blockchain technology is keeping track of all the variations in the block that is created such that all the blocks become immutable. This makes it a secure technique to do transactions such as transferring money, properties, contracts, etc. without the need of intermediary agents (for example, governments or banks). Blockchain is basically a software (a set of codes) that operates using internet connectivity and consists of software applications, databases, and a network of computers called ‘lodgers.’ For example, Solidity (a high-level programming language) is largely used by blockchain developers, although the blockchain can be created using other programming languages as well.

The first block of every blockchain is the ‘Genesis block’, which works as a header of the chain to which newly created blocks get connected in a successive manner. Apart from the data contained in each block, a hash also exists, which acts like a fingerprint to uniquely identify each block and its contents. Any modification in the information of the block will change the accompanying hash, making the hash a vital component to ensure security in the blockchain operation [188]. Each block includes its own hash; in case of a malicious attack, changing block information will also change its hash, while the hash in the adjacent block will remain unaffected. This will invalidate all the subsequent blocks in the chain [189,190]. A typical process of blockchain operation is displayed in Fig. 11.

To start, any user can initiate a transaction, which will be broadcast to all the network users for subsequent verification at all nodes via hashes. After successful verification of the transaction, the information will be kept in a newly created block connected to the previous blockchain, making it unchangeable and permanent. On the contrary, hackers may use high computing resources to update the data in a specific block and recalculate all the hashes corresponding to subsequent blocks in that blockchain. To handle this, several algorithms were created, which are known as ‘consensus’ algorithms [191]. Consensus
algorithms verify the transactions prior to their addition to the blockchain, allowing fearless information and transaction relay over the blockchain.

The consensus process occurs at predetermined time intervals; the latter represent the time from transaction initiation to addition on the blockchain [192]. Such algorithms with variable properties are being developed by industry; here, confirmation time usually depends on the block size, the volume of transactions, and the type of algorithm used. Four widely used consensus algorithms are:

(i) Proof of Work (PoW) – a new block is added when miners mine the block and solve PoW problems; Bitcoin and Ethereum utilize the PoW algorithm [192],
(ii) Proof of Stake (PoS) – validators verify the transactions during block creation. PoS is yet to be developed and used in the industry,
(iii) Proof of Authority (PoA) – provides rights to users to add new transactions in the blockchain, and
(iv) Practical Byzantine Fault Tolerance (PBFT) – uses primary or secondary replicas during the consensus process.

Blockchain has the following attributes that make it different from similar technologies: (i) Decentralized, (ii) Resilience, (iii) Time reduction (or quick transactions), (iv) Reliability, (v) Fraud prevention, (vi) Security, and (vii) Transparency [193].

In addition, there are three foremost types of blockchain structure: public, private, and consortium, depending on the number of users allowed to view and verify the transactions and add new transactions. Blockchain technology is used widely in sectors such as financial markets, health, and science [194–196]. Although these applications are immature, they are being adopted by the cryptocurrency industry. A real-life example includes Bitcoin launched by Satoshi Nakamoto in the year 2008. It gained wide popularity as government or banks do not govern cryptocurrencies, and all the transactions are open for the public to view, verify, and participate in. Ripple, Ethereum, and Litecoin are other such examples of cryptocurrencies [197].

On the other hand, the government of Dubai established a roadmap for blockchain in the year 2016. It provides an open platform to share technology worldwide and to build efficient government, global leadership, and industry creation [198]. A unique application of blockchain was in humanitarian aid; transactions registered within blockchain ensure secure and transparent transactions to beneficiaries receiving the aid (food, clothes, and funds) [199,39].

This section discussed the data security and privacy issues during SDA. The standards and interoperability help to protect critical data during its analytics in the SG system. Readers can grasp the standards and knowledge of emergent technology, such as blockchain, to provide security during SDA. A blockchain-based case study is presented in the subsequent section.

10. Future research directions and case study

Several studies have been done based on open energy datasets, as summarized in Table 7. This list provides short details of available open-source load datasets for future research in the field of SDA. Then, a list of tools and technologies is shown in Table 8 for SDA. Further, enhanced security for energy systems is offered by Blockchain technology. Data incorporated in Blockchain becomes immutable (tough to delete or alter any transactions) due to the consensus mechanism applied with cryptographic securitization. Moreover, the decentralized approach of keeping a copy of the ledger with every peer allows the Blockchain to bring out a secure, robust, and resilient electrical energy system.

An added advantage comes through fragmented computational capacity instead of the intense use of one giant computer. Blockchain can be used in SG by distinct blockchain aggregators to provide secure and trustworthy data-storage platforms and implement protection measures. The utilization of the blockchain technology helps to mitigate attacks such as denial-of-service attacks [200]. Moreover, since no direct link exists between the identity of the specific user and the identity within a blockchain environment, vulnerability to malicious attacks is reduced and the resilience of the SG system is enhanced [199,201].

10.1. Decentralized energy management system: a case study

In order to validate the process model for SDA, we present a case study on smart energy management in a demand response program using smart contracts (SC) [202]. To validate the proposed process model, we explore the usage of decentralized blockchain technology for delivering secure, reliable, transparent, and timely energy flexibility under demand response programs [203]. Here, a distributed ledger stores energy consumption and production data collected from IoT devices such as sensors and SMs. Then, self-enforced SCs define the energy flexibility level of each prosumer (who acts as both a consumer and a producer) and execute the set of rules to balance the energy supply and demand at the grid level.

The demand response programs use a consensus-based algorithm for the appropriate financial settlement of a transaction within the microgrid/SG. The prototype is implemented in the Ethereum platform using energy consumption and energy production traces of several houses [75]. This study can match the energy production and demand at the SG level in a distributed fashion. Here, the demand and response signal has high accuracy with the flexibility of energy availability.

Here, consider the fact that every consumer on the demand side can be a prosumer and participate in the supply of energy by using any renewable energy source, such as a PV solar panel (1000 square feet) linked to the houses/buildings. The SC executes and calculates the monthly bill as per the energy consumption data associated with the consumer. As shown in Fig. 12, the SG and the prosumers work as nodes connected through a private blockchain network. SG generates electricity from conventional sources or renewable resources and supplies it to the prosumers based on the demand.

Prosumers can pay the electricity bill in a specific time interval to the SG using an SC established between SG and prosumers. The SC first checks the account balance of a prosumer before supplying energy and verifies the old records using the blockchain network. Energy is supplied only if the prosumer has sufficient balance for a specific period, such as a month. The real-time settlement of the energy transaction helps to make the demand response program efficient. In the case of breaches, the prosumers need to pay penalties as per the agreement. Once the energy transaction is completed, a receipt will be generated and published within the blockchain network. The sample data is shown in the EnergyUsage and ElectricityPrices tables (shown in Fig. 12) to calculate the electricity bill generated by a specific house using SM.

There are three different types of meters with a phase-1 connection only (multiple-phase connections are possible, but for study purposes we consider only phase-1): Type1, Type2, and Type3, whose initial installation is charged at 100/-, 200/-, and 300/- Indian rupees (INR), respectively. For the customer with customer key 76106253044, the total energy consumed in a specific time period is 175 kWh as per the SM reading. Further, energy
Table 7
A list of open source load datasets.

| DataSet name | Time duration | Frequency (h) | Entries | DataSet details | References |
|---|---|---|---|---|---|
| UMass-Trace-Repository | 2011 – 12/2019 | 0.0167 (1 min) | 400 | Residential electricity consumption data | [177] |
| ISO New-England | 01/2003 – 12/2019 | 1 | 9 | System load, temperature, and locational marginal pricing data | [178] |
| Ausgrid-Substation | 01/2005 – 12/2019 | 0.25 | 177 | Substation data | [179] |
| Smart-Energy Informatics Lab | 12/2016 – 01/2018 | 5–8 s | 60 | Residential electricity consumption data | [180] |
| Pecan-Street | 05/2005 – 05/2017 | 0.0167 (1 min) | 500 | Electric vehicle charging and houses electricity consumption data | [181] |
| Genome-Building-Data | 12/2014 – 11/2015 | 1 | 507 | Non-residential building data | [182] |
| Low-Carbon-London | 01/2013 – 12/2013 | 0.5 | 5567 | Electricity price data; appliance and attitude data | [183] |
| Ausgrid-Residents | 07/2010 – 06/2013 | 0.5 | 300 | Controlled load consumption and general consumption data; PV output data | [184] |
| Customer-Behavior-Trials | 09/2009 – 01/2011 | 0.5 | 6445 | Smart meter data based pre and post-trial data | [185] |
| GEFCom-2014 | 01/2005 – 09/2010 | 1 | 1 | Weather and zonal load data | [76] |
| GEFCom-2012 | 01/2003 – 06/2008 | 1 | 20 | Temperature and zonal load data | [186] |
| OpenEI | 01/2011 – 12/2018 | 1 | 8761 | 29 datasets of energy consumption data | [187] |
Table 8
Comparative analysis of open-source tools/technologies used in SDA for SG systems.

| Features | Developed by | Language supported | Purpose to use | Data structure | Schema | External file support | Software requirements | Event driven | Companies using | Application area |
|---|---|---|---|---|---|---|---|---|---|---|
| Apache Pig | Yahoo | PigLatin | Hadoop cluster data processing | Complex, nested | Optional | Yes | Java 1.6 or above | No | Yahoo | For processing of large data set present in Hadoop cluster |
| Apache Hive | Facebook | SQL-like language called HiveQL, or HQL | Analytical purposes | Apache Derby database | Required | Yes | Hive version 1.2, Java 1.7, Hadoop version 2.x preferred | No | Facebook, Netflix | Use for effective data aggregation method, ad hoc querying and analysis of huge volumes of data |
| Apache Sqoop | Cloudera | MySQL, SQL Server, PostgreSQL, IBM DB2 | Import and export from RDBMS to Hadoop | Simple | Optional | Yes | No such requirements | No | Yahoo, Amazon | To transfer data between Hadoop and relational databases |
| Apache HBase | Apache Softwares | Java | Random read/write access | NoSQL | Required | No | JDK version 1.7 | No | EBay, Yahoo, TrendMicro, and Facebook, etc. | To provide quick random access to huge amount of structured data |
| Apache Zookeeper | Yahoo | Java and C | For distributed applications | Kafka data structures | Required | Yes | JDK 6 or greater, 2 GB of RAM, three ZooKeeper servers | No | Rackspace, Yahoo etc. | To provide centralized control for synchronization across the Hadoop cluster |
| Apache Flume | Cloudera | Java | Migration of data to centralized storage | Simple | Required | Yes | Java 1.7 or above | Yes | Yahoo, Google etc. | For moving streaming web log data into HBase |
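The case study in this section computes each bill as the meter's minimum charge plus the consumed units (CMV − LMV) priced per slab, and then deducts the amount from the consumer's E-wallet. The following is a minimal Python sketch of that rule, not the authors' smart contract. It assumes that CMV and LMV are the current and last meter readings; the names `Slab`, `electricity_bill`, and `settle`, as well as the minimum charge and slab rates, are hypothetical placeholders and are not the values of Fig. 12.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Slab:
    """One tariff slab: units up to `upper_kwh` (cumulative) cost `rate` INR/kWh."""
    upper_kwh: Optional[float]  # None marks the last, open-ended slab
    rate: float


# Hypothetical tariff for illustration only; NOT the values shown in Fig. 12.
MINIMUM_CHARGE = 50.0
SLABS = [Slab(100.0, 3.0), Slab(200.0, 4.0), Slab(None, 5.0)]


def electricity_bill(cmv: float, lmv: float,
                     minimum_charge: float = MINIMUM_CHARGE,
                     slabs: List[Slab] = SLABS) -> float:
    """Return minimum charge + (CMV - LMV) priced slab by slab.

    The last slab is expected to be open-ended (upper_kwh=None).
    """
    units = cmv - lmv
    if units < 0:
        raise ValueError("current reading is lower than the last reading")
    bill = minimum_charge
    lower = 0.0  # cumulative units already billed
    for slab in slabs:
        upper = units if slab.upper_kwh is None else min(units, slab.upper_kwh)
        if upper > lower:
            bill += (upper - lower) * slab.rate
        if slab.upper_kwh is None or units <= slab.upper_kwh:
            break
        lower = slab.upper_kwh
    return bill


def settle(wallet_balance: float, bill: float) -> float:
    """Deduct the bill from the E-wallet, refusing if the balance is too low."""
    if wallet_balance < bill:
        raise ValueError("insufficient E-wallet balance")
    return wallet_balance - bill


bill = electricity_bill(cmv=201.0, lmv=0.0)
remaining = settle(wallet_balance=1000.0, bill=bill)
print(bill, remaining)
```

With these placeholder slabs, a 201 kWh month comes to 755 INR. The 854 INR reported in the case study follows from the actual rates of Fig. 12, which are not reproduced here.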
Sustainable Computing: Informatics and Systems 28 (2020) 100427
consumption by the customer with customer key 76106253085 in a month is 201 kWh. Considering that this customer uses a 1-phase electricity supply with a Type2 meter, the electricity bill is computed as ElectricityPrice.Minimumcharges + (CMV-LMV) * ElectricityPrice.Rates for each slab, so the electricity bill for customer 76106253085 is 854 INR. Once the bill is produced, the bill amount is deducted from the consumer's E-wallet in the blockchain network. With the increasing communication capabilities within the SG, this case study offers justifiable services to both consumers and electricity utility providers (part of the SG). The case study can be summarized as an autonomous decentralized energy system with features such as environment protection (by using renewable energy resources), high quality and reliability, consumer data security and privacy, superior electricity utilization, real-time settlement of transactions, electricity cost minimization, and improvement of QoS at the SG.

11. Conclusion

The secure analytics of BD has become a path-breaking technology for different application areas of sustainable smart cities, such as SG and transportation. In this paper, we have presented detailed information on SG, BD, and SDA. Then, the challenges, issues, and discussions are presented, which provide a better understanding of SDA in the SG system. A comparison of several existing surveys is presented along with the proposed process model, which can be used to design and deploy an SDA system. Then, we presented the SDA solution taxonomy, focusing on secure data collection and preprocessing, secure load data processing, load prediction, load management and analysis, data communications, data security, and privacy. A comparative study of existing technologies and approaches is presented for each aspect of the solution taxonomy. Then, we presented a case study on the blockchain-based decentralized energy management system to verify the effectiveness of the proposed process model. Finally, future research directions of SDA in SG systems are presented.

Declaration of Competing Interest

The authors report no declarations of interest.

References

[1] A. Kumari, S. Tanwar, S. Tyagi, N. Kumar, M. Maasberg, K.-K.R. Choo, Multimedia big data computing and internet of things applications: a taxonomy and process model, J. Netw. Comput. Appl. 124 (2018) 169–195.
[2] Statista. https://www.statista.com/statistics/254266/global-big-data-market-forecast/ (Online; Accessed 04 December 2019).
[3] T. Wilcox, N. Jin, P. Flach, J. Thumim, A big data platform for smart meter data analytics, Comput. Ind. 105 (2019) 250–259.
[4] D. Alahakoon, X. Yu, Smart electricity meter data intelligence for future energy systems: a survey, IEEE Trans. Ind. Informatics 12 (1) (2016) 425–436.
[5] J.-P. Dijcks, Oracle: Big data for the enterprise, Oracle White Paper (2012) 16.
[6] K. Zhou, S. Yang, C. Shen, S. Ding, C. Sun, Energy conservation and emission reduction of China’s electric power industry, Renew. Sustain. Energy Rev. 45 (2015) 10–19.
[7] K. Zhou, C. Fu, S. Yang, Big data driven smart energy management: from big data to big insights, Renew. Sustain. Energy Rev. 56 (2016) 215–225.
[8] D.J. Leeds, The Soft Grid 2013–2020: Big Data & Utility Analytics for Smart Grid, GTM Research Report, 2012.
[9] B.P. Bhattarai, S. Paudyal, Y. Luo, M. Mohanpurkar, K. Cheung, R. Tonkoski, R. Hovsapian, K.S. Myers, R. Zhang, P. Zhao, M. Manic, S. Zhang, X. Zhang, Big data analytics in smart grids: state-of-the-art, challenges, opportunities, and future directions, IET Smart Grid 2 (2) (2019) 141–154.
[10] I. Group, et al., Managing Big Data for Smart Grids and Smart Meters, IBM Corporation, Whitepaper, 2012.
[11] Y. Wang, Q. Chen, T. Hong, C. Kang, Review of smart meter data analytics: applications, methodologies, and challenges, IEEE Trans. Smart Grid 10 (3) (2018) 3125–3148.
[12] C. Tu, X. He, Z. Shuai, F. Jiang, Big data issues in smart grid – a review, Renew. Sustain. Energy Rev. 79 (2017) 1099–1107.
[13] M. Ghofrani, A. Steeble, C. Barrett, I. Daneshnia, Survey of big data role in smart grids: definitions, applications, challenges, and solutions, Open Electr. Electron. Eng. J. 12 (1) (2018).
[14] S. Tanwar, S. Tyagi, S. Kumar, The role of internet of things and smart grid for the development of a smart city, in: Y.-C. Hu, S. Tiwari, K.K. Mishra, M.C. Trivedi (Eds.), Intelligent Communication and Computational Technologies (Singapore), Springer Singapore, 2018, pp. 23–33.
[15] H. Daki, A. El Hannani, A. Aqqal, A. Haidine, A. Dahbi, Big data management in smart grid: concepts, requirements and implementation, J. Big Data 4 (1) (2017) 13.
[16] A.A. Munshi, Y.A.-R.I. Mohamed, Big data framework for analytics in smart grids, Electr. Power Syst. Res. 151 (2017) 369–380.
[17] Y. Zhang, T. Huang, E.F. Bompard, Big data analytics in smart grids: a review, Energy Informatics 1 (1) (2018) 1–24.
[18] G. Dileep, A survey on smart grid technologies and applications, Renew. Energy 146 (2020) 2589–2625.
[19] J. Minguez, M. Jakob, U. Heinkel, B. Mitschang, A SOA-based approach for the integration of a data propagation system. 2009 IEEE International Conference on Information Reuse & Integration, IEEE, 2009, pp. 47–52.
[20] A. Vera-Baquero, R. Colomo-Palacios, O. Molloy, Business process analytics using a big data approach, IT Professional 15 (6) (2013) 29–35.
[21] C.L. Stimmel, Big Data Analytics Strategies for the Smart Grid, Auerbach Publications, 2016.
[22] D.V. Nga, O.H. See, C.Y. Xuen, L.L. Chee, et al., Visualization techniques in smart grid, Smart Grid Renew. Energy 3 (03) (2012) 175.
[23] S. Tyagi, S. Tanwar, N. Kumar, J.J. Rodrigues, Cognitive radio-based clustering for opportunistic shared spectrum access to enhance lifetime of wireless sensor network, Pervas. Mobile Comput. 22 (2015) 90–112. Special Issue on Recent Developments in Cognitive Radio Sensor Networks.
[24] T. Hong, C. Chen, J. Huang, N. Lu, L. Xie, H. Zareipour, Guest editorial big data analytics for grid modernization, IEEE Trans. Smart Grid 7 (5) (2016) 2395–2396.
[25] T. Hong, Big data analytics: making the smart grid smarter [guest editorial], IEEE Power Energy Mag. 16 (2018 May) 12–16.
[26] A. Kumari, S. Tanwar, S. Tyagi, N. Kumar, Fog computing for healthcare 4.0 environment: opportunities and challenges, Comput. Electr. Eng. 72 (2018) 1–13.
[27] A. Kumari, S. Tanwar, S. Tyagi, N. Kumar, R.M. Parizi, K.-K.R. Choo, Fog data analytics: a taxonomy and process model, J. Netw. Comput. Appl. 128 (2019) 90–104.
[28] M.Z. Gunduz, R. Das, Cyber-security on smart grid: threats and potential solutions, Comput. Netw. 169 (2020) 107094.
[29] R. Roman, J. Lopez, M. Mambo, Mobile edge computing, fog et al.: a survey and analysis of security threats and challenges, Fut. Gen. Comput. Syst. 78 (2018) 680–698.
[30] D. Liu, Z. Yan, W. Ding, M. Atiquzzaman, A survey on secure data analytics in edge computing, IEEE Internet Things J. 6 (3) (2019) 4946–4967.
[31] J. Hu, A.V. Vasilakos, Energy big data analytics and security: challenges and opportunities, IEEE Trans. Smart Grid 7 (5) (2016) 2423–2436.
[32] S.J. Pan, Q. Yang, A survey on transfer learning, IEEE Trans. Knowl. Data Eng. 22 (10) (2009) 1345–1359.
[33] Y. Bengio, Deep learning of representations for unsupervised and transfer learning, Proceedings of ICML Workshop on Unsupervised and Transfer Learning (2012) 17–36.
[34] T. Diethe, M. Girolami, Online learning with (multiple) kernels: a review, Neural Comput. 25 (3) (2013) 567–625.
[35] Q. Zhang, C. Zhu, L.T. Yang, Z. Chen, L. Zhao, P. Li, An incremental CFS algorithm for clustering large data in industrial internet of things, IEEE Trans. Ind. Informatics 13 (3) (2017) 1193–1201.
[36] Y. Ma, C. Huang, Y. Sun, G. Zhao, Y. Lei, Review of power spatio-temporal big data technologies for mobile computing in smart grid, IEEE Access (2019).
[37] B. Kitchenham, O.P. Brereton, D. Budgen, M. Turner, J. Bailey, S. Linkman, Systematic literature reviews in software engineering – a systematic literature review, Inform. Software Technol. 51 (1) (2009) 7–15.
[38] B. Kitchenham, S. Charters, Guidelines for Performing Systematic Literature Reviews in Software Engineering, 2007.
[39] N. Wang, X. Zhou, X. Lu, Z. Guan, L. Wu, X. Du, M. Guizani, When energy trading meets blockchain in electrical power system: the state of the art, Appl. Sci. 9 (8) (2019) 1561.
[40] E. Hossain, I. Khan, F. Un-Noor, S.S. Sikander, M.S.H. Sunny, Application of big data and machine learning in smart grid, and associated security concerns: a review, IEEE Access 7 (2019) 13960–13988.
[41] U. Bodkhe, S. Tanwar, Secure data dissemination techniques for iot applications: research challenges and opportunities, Software: Pract. Exp. (2020).
[42] M. Alizadeh, S. Abolfazli, M. Zamani, S. Baharun, K. Sakurai, Authentication in mobile cloud computing: a survey, J. Netw. Comput. Appl. 61 (2016) 59–80.
[43] J. Wayman, A. Jain, D. Maltoni, D. Maio, An introduction to biometric authentication systems. Biometric Systems, Springer, 2005, pp. 1–20.
[44] G. Xu, Z. Yan, A survey on trust evaluation in mobile ad hoc networks. Proceedings of the 9th EAI International Conference on Mobile Multimedia Communications, ICST (Institute for Computer Sciences, Social-Informatics), 2016, pp. 140–148.
[45] J. Cho, A. Swami, I. Chen, A survey on trust management for mobile ad hoc networks, IEEE Commun. Surv. Tutorials 13 (2011 Fourth) 562–583.
[46] V. Hodge, J. Austin, A survey of outlier detection methodologies, Artif. Intell. Rev. 22 (2) (2004) 85–126.
[47] J. Peppanen, X. Zhang, S. Grijalva, M.J. Reno, Handling bad or missing smart meter data through advanced data imputation. 2016 IEEE Power & Energy Society Innovative Smart Grid Technologies Conference (ISGT), IEEE, 2016, pp. 1–5.
[48] X. Li, C.P. Bowers, T. Schnier, Classification of energy consumption in buildings with outlier detection, IEEE Trans. Ind. Electron. 57 (11) (2009) 3639–3644.
[49] L. Jian, H. Tao, Y. Meng, Real-time anomaly detection for very short-term load forecasting, J. Mod. Power Syst. Clean Energy 6 (2) (2018) 235–243.
[50] H. Huang, Q. Yan, Y. Zhao, W. Lu, Z. Liu, Z. Li, False data separation for data security in smart grids, Knowl. Inform. Syst. 52 (3) (2017) 815–834.
[51] A. Al-Wakeel, J. Wu, N. Jenkins, K-means based load estimation of domestic smart meter measurements, Appl. Energy 194 (2017) 333–342.
[52] A. Al-Wakeel, J. Wu, N. Jenkins, State estimation of medium voltage distribution networks using smart meter measurements, Appl. Energy 184 (2016) 207–218.
[53] D.B. Araya, K. Grolinger, H.F. ElYamany, M.A. Capretz, G. Bitsuamlak, An ensemble learning framework for anomaly detection in building energy consumption, Energy Build. 144 (2017) 191–206.
[54] X. Liu, N. Iftikhar, P.S. Nielsen, A. Heller, Online anomaly energy consumption detection using lambda architecture. International Conference on Big Data Analytics and Knowledge Discovery, Springer, 2016, pp. 193–209.
[55] P.D. Diamantoulakis, V.M. Kapinas, G.K. Karagiannidis, Big data analytics for dynamic energy management in smart grids, Big Data Res. 2 (3) (2015) 94–101.
[56] H. Peng, F. Long, C. Ding, Feature selection based on mutual information: criteria of max-dependency, max-relevance, and min-redundancy, IEEE Trans. Pattern Anal. Mach. Intell. 27 (8) (2005) 1226–1238.
[57] B.N. Alajmi, K.H. Ahmed, S.J. Finney, B.W. Williams, Fuzzy-logic-control approach of a modified hill-climbing method for maximum power point in microgrid standalone photovoltaic system, IEEE Trans. Power Electron. 26 (4) (2011) 1022–1030.
[58] A. Prinzie, D.V. den Poel, Random forests for multiclass classification: random multinomial logit, Expert Syst. Appl. 34 (3) (2008) 1721–1732.
[59] J. Peppanen, S. Xiaochen Zhang, M.J. Grijalva, Reno, Handling bad or missing smart meter data through advanced data imputation, 2016 IEEE Power Energy Society Innovative Smart Grid Technologies Conference (ISGT) (2016) 1–5.
[60] X. Tong, R. Li, F. Li, C. Kang, Cross-domain feature selection and coding for household energy behavior, Energy 107 (2016) 9–16.
[61] S. Arora, J.W. Taylor, Forecasting electricity smart meter data using conditional kernel density estimation, Omega 59 (2016) 47–59. Business Analytics.
[62] C. Gentry, D. Boneh, A Fully Homomorphic Encryption Scheme, vol. 20, Stanford University Stanford, 2009.
[63] W. Ding, Z. Yan, R.H. Deng, Encrypted data processing with homomorphic re-encryption, Inform. Sci. 409 (2017) 35–55.
[64] C. Dwork, A. Roth, et al., The algorithmic foundations of differential privacy, Found. Trends Theoret. Comput. Sci. 9 (3–4) (2014) 211–407.
[65] L. Chen, R. Lu, Z. Cao, K. AlHarbi, X. Lin, Muda: multifunctional data aggregation in privacy-preserving smart grid communications, Peer-to-Peer Network. Appl. 8 (5) (2015) 777–792.
[66] S. Han, S. Zhao, Q. Li, C.-H. Ju, W. Zhou, Ppm-hda: privacy-preserving and multifunctional health data aggregation with fault tolerance, IEEE Trans. Inform. Forensics Secur. 11 (9) (2015) 1940–1955.
[67] L. Lyu, K. Nandakumar, B. Rubinstein, J. Jin, J. Bedo, M. Palaniswami, Ppfa: privacy preserving fog-enabled aggregation in smart grid, IEEE Trans. Ind. Informatics 14 (8) (2018) 3733–3744.
[68] V. Rastogi, S. Nath, Differentially private aggregation of distributed time-series with transformation and encryption. Proceedings of the 2010 ACM SIGMOD International Conference on Management of data, ACM, 2010, pp. 735–746.
[69] F.Z. Benjelloun, A.A. Lahcen, Big data security: challenges, recommendations and solutions. Handbook of Research on Security Considerations in Cloud Computing, 2015, pp. 301–313.
[70] A.R. Khan, Access control in cloud computing environment, ARPN J. Eng. Appl. Sci. 7 (5) (2012) 613–615.
[71] L. Zhou, V. Varadharajan, M. Hitchens, Achieving secure role-based access control on encrypted data in cloud storage, IEEE Trans. Inform. Forensics Secur. 8 (12) (2013) 1947–1960.
[72] A. Kumari, S. Tanwar, S. Tyagi, N. Kumar, Verification and validation techniques for streaming big data analytics in internet of things environment, IET Netw. 8 (2) (2018) 92–100.
[73] D.X. Song, D. Wagner, A. Perrig, Practical techniques for searches on encrypted data. Proceeding 2000 IEEE Symposium on Security and Privacy, S&P 2000, IEEE, 2000, pp. 44–55.
[74] Y. Wang, J. Wang, X. Chen, Secure searchable encryption: a survey, J. Commun. Inform. Netw. 1 (4) (2016) 52–65.
[75] UGVCL, Uttar Gujarat Vij Company Limited. http://www.ugvcl.com/bill_calc/index.htm (Online; Accessed 12 December 2019).
[76] T. Hong, S. Fan, Probabilistic electric load forecasting: a tutorial review, Int. J. Forecast. 32 (3) (2016) 914–938.
[77] S. Kaneriya, S. Tanwar, A. Nayyar, J.P. Verma, S. Tyagi, N. Kumar, M.S. Obaidat, J.J.P.C. Rodrigues, Data consumption-aware load forecasting scheme for smart grid systems. 2018 IEEE Globecom Workshops (GC Wkshps), 2018, pp. 1–6.
[78] J. Xie, T. Hong, J. Stroud, Long-term retail energy forecasting with consideration of residential customer attrition, IEEE Trans. Smart Grid 6 (2015 Sep) 2245–2252.
[79] W. Hoiles, V. Krishnamurthy, Nonparametric demand forecasting and detection of energy aware consumers, IEEE Trans. Smart Grid 6 (2015 March) 695–704.
[80] J. Xie, Y. Chen, T. Hong, T.D. Laing, Relative humidity for load forecasting models, IEEE Trans. Smart Grid 9 (2018) 191–198.
[81] J. Xie, T. Hong, Wind speed for load forecasting models, Sustainability 9 (5) (2017).
[82] S. Singh, A. Yassine, Iot big data analytics with fog computing for household energy management in smart grids. International Conference on Smart Grid and Internet of Things, Springer, 2018, pp. 13–22.
[83] W. Yu, D. An, D. Griffith, Q. Yang, G. Xu, Towards statistical modeling and machine learning based energy usage forecasting in smart grid, ACM SIGAPP Appl. Comput. Rev. 15 (1) (2015) 6–16.
[84] A. Jindal, N. Kumar, M. Singh, A data analytical approach using support vector machine for demand response management in smart grid. 2016 IEEE Power and Energy Society General Meeting (PESGM), 2016, pp. 1–5.
[85] J. Shi, W.-J. Lee, Y. Liu, Y. Yang, P. Wang, Forecasting power output of photovoltaic systems based on weather classification and support vector machines, IEEE Trans. Ind. Appl. 48 (3) (2012) 1064–1069.
[86] S. Houde, A. Todd, A. Sudarshan, J.A. Flora, K.C. Armel, Real-time feedback and electricity consumption: a field experiment assessing the potential for savings and persistence, Energy J. (2013) 87–102.
[87] X. Guan, Z. Xu, Q.-S. Jia, Energy-efficient buildings facilitated by microgrid, IEEE Trans. Smart Grid 1 (3) (2010) 243–252.
[88] L. Chen, N. Li, S.H. Low, J.C. Doyle, Two market models for demand response in power networks, 2010 First IEEE International Conference on Smart Grid Communications (2010) 397–402.
[89] D. Zhang, S. Li, M. Sun, Z. O'Neill, An optimal and learning-based demand response and home energy management system, IEEE Trans. Smart Grid 7 (4) (2016) 1790–1801.
[90] P.B. Luh, L.D. Michel, P. Friedland, C. Guan, Y. Wang, Load forecasting and demand response. IEEE PES General Meeting, IEEE, 2010, pp. 1–3.
[91] S. Bouktif, A. Fiaz, A. Ouni, M. Serhani, Optimal deep learning lstm model for electric load forecasting using feature selection and genetic algorithm: comparison with machine learning approaches, Energies 11 (7) (2018) 1636.
[92] R. Gupta, S. Tanwar, S. Tyagi, N. Kumar, Machine learning models for secure data analytics: a taxonomy and threat model, Comput. Commun. (2020).
[93] P. Bhattacharya, S. Tanwar, U. Bodke, S. Tyagi, N. Kumar, Bindaas: blockchain-based deep-learning as-a-service in healthcare 4.0 applications, IEEE Trans. Netw. Sci. Eng. (2019).
[94] C.-N. Yu, P. Mirowski, T.K. Ho, A sparse coding approach to household electricity demand forecasting in smart grids, IEEE Trans. Smart Grid 8 (2) (2016) 738–748.
[95] P. Li, B. Zhang, Y. Weng, R. Rajagopal, A sparse linear model and significance test for individual consumption prediction, IEEE Trans. Power Syst. 32 (6) (2017) 4489–4500.
[96] H. Shaker, H. Zareipour, D. Wood, A data-driven approach for estimating the power generation of invisible solar sites, IEEE Trans. Smart Grid 7 (5) (2015) 2466–2476.
[97] T. Zhang, G. Zhang, J. Lu, X. Feng, W. Yang, A new index and classification approach for load pattern analysis of large electricity customers, IEEE Trans. Power Syst. 27 (2012 Feb) 153–160.
[98] S.B. Taieb, R. Huser, R.J. Hyndman, M.G. Genton, Forecasting uncertainty in electricity smart meter data by boosting additive quantile regression, IEEE Trans. Smart Grid 7 (5) (2016) 2448–2455.
[99] H. Shaker, H. Zareipour, D. Wood, Estimating power generation of invisible solar sites using publicly available data, IEEE Trans. Smart Grid 7 (5) (2016) 2456–2465.
[100] A. Jindal, A. Dua, K. Kaur, M. Singh, N. Kumar, S. Mishra, Decision tree and svm-based data analytics for theft detection in smart grid, IEEE Trans. Ind. Informatics 12 (2016) 1005–1016.
[101] T. Hong, J. Wilson, J. Xie, Long term probabilistic load forecasting and normalization with hourly information, IEEE Trans. Smart Grid 5 (1) (2013) 456–462.
[102] T. Hong, P. Pinson, S. Fan, H. Zareipour, A. Troccoli, R.J. Hyndman, Probabilistic Energy Forecasting: Global Energy Forecasting Competition 2014 and Beyond, 2016.
[103] J. Xie, T. Hong, Temperature scenario generation for probabilistic load forecasting, IEEE Trans. Smart Grid 9 (2018) 1680–1687.
[104] S. Kaneriya, S. Tanwar, S. Buddhadev, J.P. Verma, S. Tyagi, N. Kumar, S. Misra, A range-based approach for long-term forecast of weather using probabilistic markov model, 2018 IEEE International Conference on Communications Workshops (ICC Workshops) (2018) 1–6.
[105] G. Dahua, W. Yi, Y. Shuo, K. Chongqing, Embedding based quantile regression neural network for probabilistic load forecasting, J. Mod. Power Syst. Clean Energy 6 (2) (2018) 244–254.
[106] J. Black, A. Hoffman, T. Hong, J. Roberts, P. Wang, Weather data for energy analytics: From modeling outages and reliability indices to simulating distributed photovoltaic fleets, IEEE Power Energy Mag. 16 (3) (2018) 43–53.
[107] J. Xie, T. Hong, T. Laing, C. Kang, On normality assumption in residual simulation for probabilistic load forecasting, IEEE Trans. Smart Grid 8 (3) (2015) 1046–1053.
[108] B. Liu, J. Nowotarski, T. Hong, R. Weron, Probabilistic load forecasting via quantile regression averaging on sister forecasts, IEEE Trans. Smart Grid 8 (2) (2017) 730–737.
[109] J. Xie, T. Hong, Variable selection methods for probabilistic load forecasting: empirical evidence from seven states of the united states, IEEE Trans. Smart Grid 9 (6) (2017) 6039–6046.
[110] P. Gaillard, Y. Goude, R. Nedellec, Additive models and robust aggregation for gefcom2014 probabilistic electric load and electricity price forecasting, Int. J. Forecast. 32 (3) (2016) 1038–1050.
[111] S. Zhong, K. Tam, Hierarchical classification of load profiles based on their characteristic attributes in frequency domain, IEEE Trans. Power Syst. 30 (2015 Sep) 2434–2441.
[112] Y. Wang, Q. Chen, C. Kang, Q. Xia, M. Luo, Sparse and redundant representation-based smart meter data compression and pattern extraction, IEEE Trans. Power Syst. 32 (2017) 2142–2151.
[113] D. Vercamer, B. Steurtewagen, D. Van den Poel, F. Vermeulen, Predicting consumer load profiles using commercial and open data, IEEE Trans. Power Syst. 31 (2016 Sep) 3693–3701.
[114] F. McLoughlin, A. Duffy, M. Conlon, Characterising domestic electricity consumption patterns by dwelling and occupant socio-economic variables: an Irish case study, Energy Build. 48 (2012) 240–248.
[115] Y. Han, X. Sha, E. Grover-Silva, P. Michiardi, On the impact of socio-economic factors on power load forecasting, 2014 IEEE International Conference on Big Data (Big Data) (2014) 742–747.
[116] R. Granell, C.J. Axon, D.C. Wallom, Clustering disaggregated load profiles using a dirichlet process mixture model, Energy Convers. Manag. 92 (2015) 507–516.
[117] F. McLoughlin, A. Duffy, M. Conlon, A clustering approach to domestic electricity load profile characterisation using smart metering data, Appl. Energy 141 (2015) 190–199.
[118] C. Beckel, L. Sadamori, T. Staake, S. Santini, Revealing household characteristics from smart meter data, Energy 78 (2014) 397–410.
[119] A.R. Khan, A. Mahmood, A. Safdar, Z.A. Khan, N.A. Khan, Load forecasting, dynamic pricing and dsm in smart grid: a review, Renew. Sustain. Energy Rev. 54 (2016) 1311–1322.
[120] Z. Jiang, R. Lin, F. Yang, A hybrid machine learning model for electricity consumer categorization using smart meter data, Energies 11 (9) (2018) 2235.
[121] P. Li, B. Zhang, Y. Weng, R. Rajagopal, A sparse linear model and significance test for individual consumption prediction, IEEE Trans. Power Syst. 32 (2017) 4489–4500.
[122] X. Sun, P.B. Luh, K.W. Cheung, W. Guan, L.D. Michel, S.S. Venkata, M.T. Miller, An efficient approach to short-term load forecasting at the distribution level, IEEE Trans. Power Syst. 31 (2016 July) 2526–2537.
[123] C. Yu, P. Mirowski, T.K. Ho, A sparse coding approach to household electricity demand forecasting in smart grids, IEEE Trans. Smart Grid 8 (2017 March) 738–748.
[124] J. Xie, T. Hong, Gefcom2014 probabilistic electric load forecasting: an integrated solution with forecast combination and residual simulation, Int. J. Forecast. 32 (3) (2016) 1012–1016.
[125] N. Mahmoudi-Kohan, M.P. Moghaddam, M. Sheikh-El-Eslami, E. Shayesteh, A three-stage strategy for optimal price offering by a retailer based on clustering techniques, Int. J. Electr. Power Energy Syst. 32 (10) (2010) 1135–1142.
[126] A. Jindal, M. Singh, N. Kumar, Consumption-aware data analytical demand response scheme for peak load reduction in smart grid, IEEE Trans. Ind. Electron. 65 (11) (2018) 8993–9004.
[127] S. Joseph, J. Erakkath Abdu, Real-time retail price determination in smart grid from real-time load profiles, Int. Trans. Electr. Energy Syst. 28 (3) (2018) e2509.
[128] N. Mahmoudi-Kohan, M.P. Moghaddam, M. Sheikh-El-Eslami, An annual framework for clustering-based pricing for an electricity retailer, Electr. Power Syst. Res. 80 (9) (2010) 1042–1048.
[129] M.L. Maigha, Crow, Clustering-based methodology for optimal residential time of use design structure, 2014 North American Power Symposium (NAPS) (2014) 1–6.
[130] R. Li, Z. Wang, C. Gu, F. Li, H. Wu, A novel time-of-use tariff design based on Gaussian mixture model, Appl. Energy 162 (2016) 1530–1536.
[131] T.K. Wijaya, M. Vasirani, K. Aberer, When bias matters: An economic assessment of demand response baselines for residential customers, IEEE Trans. Smart Grid 5 (2014) 1755–1763.
[132] Y. Weng, R. Rajagopal, Probabilistic baseline estimation via gaussian process, 2015 IEEE Power Energy Society General Meeting (2015) 1–5.
[133] Y. Zhang, W. Chen, R. Xu, J. Black, A cluster-based method for calculating baselines for residential loads, IEEE Trans. Smart Grid 7 (2016 Sep) 2368–2377.
[134] L. Hatton, P. Charpentier, E. Matzner-Løber, Statistical estimation of the residential baseline, IEEE Trans. Power Syst. 31 (2016) 1752–1759.
[135] G. Chicco, Overview and performance assessment of the clustering methods for electrical load pattern grouping, Energy 42 (1) (2012) 68–80, 8th World Energy System Conference, WESC 2010.
[136] K. le Zhou, S. lin Yang, C. Shen, A review of electric load classification in smart grid environment, Renew. Sustain. Energy Rev. 24 (2013) 103–110.
[137] Yi Wang, Qixin Chen, Chongqing Kang, Mingming Zhang, Ke Wang, Yun Zhao, Load profiling and its application to demand response: a review, Tsinghua Sci. Technol. 20 (2015) 117–129.
[138] R. Granell, C.J. Axon, D.C.H. Wallom, Impacts of raw data temporal resolution using selected clustering methods on residential electricity load profiles, IEEE Trans. Power Syst. 30 (2015) 3217–3224.
[139] T. Hong, D.W. Gao, T. Laing, D. Kruchten, J. Calzada, Training energy data scientists: universities and industry need to work together to bridge the talent gap, IEEE Power Energy Mag. 16 (2018) 66–73.
[140] D. Alahakoon, X. Yu, Smart electricity meter data intelligence for future energy systems: a survey, IEEE Trans. Ind. Informatics 12 (2016) 425–436.
[141] P. Jokar, N. Arianpoo, V.C.M. Leung, Electricity theft detection in ami using customers’ consumption patterns, IEEE Trans. Smart Grid 7 (2016) 216–226.
[142] H. Shi, M. Xu, R. Li, Deep learning for household load forecasting – a novel pooling deep rnn, IEEE Trans. Smart Grid 9 (2018) 5271–5280.
[143] Y. Wang, Q. Chen, C. Kang, Q. Xia, Clustering of electricity consumption behavior dynamics toward big data applications, IEEE Trans. Smart Grid 7 (2016) 2437–2447.
[144] M. Chaouch, Clustering-based improvement of nonparametric functional time series forecasting: application to intra-day household-level load curves, IEEE Trans. Smart Grid 5 (2014 Jan) 411–419.
[145] A. Jindal, A. Schaeffer-Filho, A. Marnerides, P. Smith, A. Mauthe, L. Granville, Tackling Energy Theft in Smart Grids Through Data-Driven Analysis, 2019.
[146] K. Wang, B. Wang, L. Peng, Cvap: validation for cluster analyses, Data Sci. J. (2009) vol. advpub, p. 0904220071.
[147] S.S.S.R. Depuru, L. Wang, V. Devabhaktuni, R.C. Green, High performance computing for detection of electricity theft, Int. J. Electr. Power Energy Syst. 47 (2013) 21–30.
[148] L.A.P. Júnior, C.C.O. Ramos, D. Rodrigues, D.R. Pereira, A.N. de Souza, K.A.P. da [175] R. Gupta, S. Tanwar, S. Tyagi, N. Kumar, Tactile-internet-based telesurgery
Costa, J.P. Papa, Unsupervised non-technical losses identification through system for healthcare 4.0: an architecture, research challenges, and future
optimum-path forest, Electr. Power Syst. Res. 140 (2016) 413–423. directions, IEEE Network 33 (2019 Nov) 22–29.
[149] A.H. Nizar, Z.Y. Dong, Y. Wang, Power utility nontechnical loss analysis with [176] M. McGranaghan, D. Houseman, L. Schmitt, F. Cleveland, E. Lambert, Enabling
extreme learning machine method, IEEE Trans. Power Syst. 23 (2008) 946–955. the integrated grid: leveraging data to integrate distributed resources and
[150] V. Botev, M. Almgren, V. Gulisano, O. Landsiedel, M. Papatriantafilou, J. van customers, IEEE Power Energy Mag. 14 (1) (2016) 83–93.
Rooij, Detecting non-technical energy losses through structural periodic patterns [177] Umass Smart Data Set. http://traces.cs.umass.edu/index.php/Smart/Smart
in ami data, 2016 IEEE International Conference on Big Data (Big Data) (2016 (Online; Accessed 12 December 2019).
Dec) 3121–3130. [178] I.N. England, ISO New England Zonal Information, 2017.
[151] H. Janetzko, F. Stoffel, S. Mittelstädt, D.A. Keim, Anomaly detection for visual [179] Ausgird, Distribution Zone Substation Information Data to Share. http://www.
analytics of power consumption data, Comput. Graph. 38 (2014) 27–37. ausgrid.com.au/Common/About-us/Corporate-information/Data-to-sh
[152] A. Kumari, S. Tanwar, S. Tyagi, N. Kumar, M.S. Obaidat, J.J. Rodrigues, Fog are/DistZone-subs.aspx#.WYD6KenauUl (Online; Accessed 12 December 2019).
computing for smart grid systems in the 5g environment: challenges and [180] P.M. Mammen, H. Kumar, K. Ramamritham, H. Rashid, Want to reduce energy
solutions, IEEE Wireless Commun. 26 (3) (2019) 47–53. consumption, whom should we call?. Proceedings of the Ninth International
[153] W. Luan, J. Peng, M. Maras, J. Lo, B. Harapnuk, Smart meter data analytics for Conference on Future Energy Systems ACM, 2018, pp. 12–20.
distribution network connectivity verification, IEEE Trans. Smart Grid 6 (2015 [181] P. Street, Real Energy Real Customers in Real Time. http://www.pecanstreet.or
July) 1964–1971. g/energy/ (Online; Accessed 12 December 2019).
[154] J. Peppanen, S. Grijalva, M.J. Reno, R.J. Broderick, Distribution system low- [182] C. Miller, F. Meggers, The building data genome project: an open, public data set
voltage circuit topology estimation using smart metering data, 2016 IEEE/PES from non-residential building electrical meters, Energy Proc. 122 (2017)
Transmission and Distribution Conference and Exposition (TD) (2016) 1–5. 439–444.
[155] Y. Weng, Y. Liao, R. Rajagopal, Distributed energy resources topology [183] J.R. Schofield, R. Carmichael, S. Tindemans, M. Bilton, M. Woolf, G. Strbac, et al.,
identification via graphical modeling, IEEE Trans. Power Syst. 32 (2017) Low Carbon London Project: Data From the Dynamic Time-of-Use Electricity
2682–2694. Pricing Trial, 2013, UK Data Service, 2015 [Data Collection].
[156] Y. Liao, Y. Weng, R. Rajagopal, Urban distribution grid topology reconstruction [184] E.L. Ratnam, S.R. Weller, C.M. Kellett, A.T. Murray, Residential load and rooftop
via lasso, 2016 IEEE Power and Energy Society General Meeting (PESGM) (2016) pv generation: an australian distribution network dataset, Int. J. Sustain. Energy
1–5. 36 (8) (2017) 787–806.
[157] S.J. Pappu, N. Bhatt, R. Pasumarthy, A. Rajeswaran, Identifying topology of low [185] I. S. S. D. Archive, Commission for Energy Regulation (cer) Smart Metering
voltage distribution networks based on smart meter data, IEEE Trans. Smart Grid Project. http://www.ucd.ie/issda/data/commissionforenergyregulationcer/
9 (2018 Sep) 5113–5122. (Online; Accessed 12 December 2019).
[158] M. Xu, R. Li, F. Li, Phase identification with incomplete data, IEEE Trans. Smart [186] T. Hong, P. Pinson, S. Fan, Global Energy Forecasting Competition 2012, 2014.
Grid 9 (2018 July) 2777–2785. [187] OpenEI, Open Energy Information. https://openei.org/datasets/dataset?secto
[159] R. Gupta, S. Tanwar, S. Tyagi, N. Kumar, Tactile internet and its applications in 5g rs=smartgrid (Online; Accessed 12 December 2019).
era: a comprehensive review, Int. J. Commun. Syst. 32 (14) (2019) e3981, e3981 [188] R. Beck, Beyond bitcoin: the rise of blockchain world, Computer 51 (2) (2018)
dac.3981. 54–58.
[160] V.C. Gungor, D. Sahin, T. Kocak, S. Ergut, C. Buccella, C. Cecati, G.P. Hancke, [189] G. Karame, S. Capkun, Blockchain security and privacy, IEEE Secur. Privacy 16
A survey on smart grid potential applications and communication requirements, (4) (2018) 11–12.
IEEE Trans. Ind. Informatics 9 (2013 Feb) 28–42. [190] S. Tanwar, K. Parekh, R. Evans, Blockchain-based electronic healthcare record
[161] H. Tram, Technical and operation considerations in using smart metering for system for healthcare 4.0 applications, J. Inform. Secur. Appl. 50 (2020) 102407.
outage management, 2008 IEEE/PES Transmission and Distribution Conference [191] J. Moubarak, E. Filiol, M. Chamoun, On blockchain security and relevant attacks.
and Exposition (2008 April) 1–3. 2018 IEEE Middle East and North Africa Communications Conference
[162] K. Kuroda, R. Yokoyama, D. Kobayashi, T. Ichimura, An approach to outage (MENACOMM), IEEE, 2018, pp. 1–6.
location prediction utilizing smart metering data, 2014 8th Asia Modelling [192] R. Gupta, S. Tanwar, S. Tyagi, N. Kumar, M.S. Obaidat, B. Sadoun, Habits:
Symposium (2014) 61–66. Blockchain-based telesurgery framework for healthcare 4.0, 2019 International
[163] Y. Jiang, C. Liu, M. Diedesch, E. Lee, A.K. Srivastava, Outage management of Conference on Computer, Information and Telecommunication Systems (CITS)
distribution systems incorporating information from smart meters, IEEE Trans. (2019 Aug) 1–5.
Power Syst. 31 (2016 Sep) 4144–4154. [193] J. Vora, A. Nayyar, S. Tanwar, S. Tyagi, N. Kumar, M.S. Obaidat, J.J.P.
[164] R. Moghaddass, J. Wang, A hierarchical framework for smart grid anomaly C. Rodrigues, Bheem: a blockchain-based framework for securing electronic
detection using large-scale smart meter data, IEEE Trans. Smart Grid 9 (2018) health records, 2018 IEEE Globecom Workshops (GC Wkshps) (2018) 1–6.
5820–5830. [194] S. Tanwar, R. Gupta, A. Kumari, S. Tyagi, N. Kumar, Security and privacy of
[165] J. Zheng, D.W. Gao, L. Lin, Smart meters in smart grid: an overview, 2013 IEEE electronics healthcare records. The IET Book Series on e-Health Technologies,
Green Technologies Conference (GreenTech) (2013) 57–64. Institution of Engineering and Technology, Stevenage, United Kingdom, 2010,
[166] M.P. Tcheou, L. Lovisolo, M.V. Ribeiro, E.A.B. da Silva, M.A.M. Rodrigues, J.M. pp. 1–35.
T. Romano, P.S.R. Diniz, The compression of electric signal waveforms for smart [195] I. Mistry, S. Tanwar, S. Tyagi, N. Kumar, Blockchain for 5g-enabled iot for
grids: state of the art and future trends, IEEE Trans. Smart Grid 5 (2014) 291–302. industrial automation: a systematic review, solutions, and challenges, Mech. Syst.
[167] A. Unterweger, D. Engel, Resumable load data compression in smart grids, IEEE Signal Process. 135 (2020) 106382.
Trans. Smart Grid 6 (2015) 919–929. [196] N. Kabra, P. Bhattacharya, S. Tanwar, S. Tyagi, Mudrachain: blockchain-based
[168] A. Unterweger, D. Engel, M. Ringwelski, The effect of data granularity on load framework for automated cheque clearance in financial institutions, Fut. Gen.
data compression, in: S. Gottwalt, L. König, H. Schmeck (Eds.), Energy Comput. Syst. 102 (2020) 574–587.
Informatics, Springer International Publishing, 2015, pp. 69–80 (Cham). [197] D. Vujicic, D. Jagodic, S. Randic, Blockchain technology, bitcoin, and ethereum: a
[169] G. Eibl, D. Engel, Influence of data granularity on smart meter privacy, IEEE brief overview, 2018 17th International Symposium INFOTEH-JAHORINA
Trans. Smart Grid 6 (2015 March) 930–939. (INFOTEH) (2018) 1–6.
[170] S. Tanwar, N. Kumar, J.-W. Niu, Eemhr: energy-efficient multilevel [198] Dubai Blockchain Tehcnology. https://www.smartdubai.ae/initiatives/blockchai
heterogeneous routing protocol for wireless sensor networks, Int. J. Commun. n (Online; Accessed 04 December 2019).
Syst. 27 (9) (2014) 1289–1318. [199] A.S. Musleh, G. Yao, S. Muyeen, Blockchain applications in smart grid-review and
[171] L. Sankar, S.R. Rajagopalan, S. Mohajer, H.V. Poor, Smart meter privacy: a frameworks, IEEE Access 7 (2019) 86746–86757.
theoretical framework, IEEE Trans. Smart Grid 4 (2013) 837–846. [200] R. Singh, S. Tanwar, T.P. Sharma, Utilization of blockchain for mitigating the
[172] M. Savi, C. Rottondi, G. Verticale, Evaluation of the precision-privacy tradeoff of distributed denial of service attacks, Secur. Privacy (2019) e96.
data perturbation for smart metering, IEEE Trans. Smart Grid 6 (2015 Sep) [201] S. Tanwar, Q. Bhatia, P. Patel, A. Kumari, P.K. Singh, W.-C. Hong, Machine
2409–2416. learning adoption in blockchain-based smart applications: the challenges, and a
[173] J. Vora, P. Italiya, S. Tanwar, S. Tyagi, N. Kumar, M.S. Obaidat, K. Hsiao, way forward, IEEE Access 8 (2019) 474–488.
Ensuring privacy and security in e- health records, 2018 International Conference [202] R. Gupta, S. Tanwar, F. Al-Turjman, P. Italiya, A. Nauman, S.W. Kim, Smart
on Computer, Information and Telecommunication Systems (CITS) (2018) 1–5. contract privacy protection using ai in cyber-physical systems: Tools, techniques
[174] C.E. Kement, H. Gultekin, B. Tavli, T. Girici, S. Uludag, Comparative analysis of and challenges, IEEE Access (2020).
load-shaping-based privacy preservation strategies in a smart grid, IEEE Trans. [203] A. Kumari, A. Shukla, R. Gupta, S. Tanwar, S. Tyagi, N. Kumar, Et-deal: A p2p
Ind. Informatics 13 (6) (2017) 3226–3235. smart contract-based secure energy trading scheme for smart grid systems.
Proceedings of the INFOCOM 2020 WKSHPS BlockSecSDN, IEEE INFOCOM 2020
in Toronto, Canada, 2020, pp. 1–8.