Big Data for Energy Efficiency Insights

Hussnain Ahmed

Master's Thesis
School of Science
Master's Programme in International Design Business Management
Espoo, August 2014
Abbreviations and Acronyms
Contents

1 Introduction
1.1 Problem statement
1.2 Helpful hints
1.3 Structure of the thesis document
2 Background
2.1 Smart grids
2.2 The CIVIS project
2.3 The Green Campus initiative
2.4 Big Data analytics
2.4.1 Parallel batch processing with Hadoop
2.4.2 Real time Big Data processing
2.5 Energy efficiency and eco-efficiency
2.6 Daily consumption patterns, base load and user load
2.7 Energy consumption seasonal patterns
2.8 Classification of buildings based on energy efficiency
2.8.1 K-means clustering
2.9 Forecasting the energy consumption
2.9.1 Main conditions and Steps for Quantitative Forecasting
2.9.2 Time Series Analysis
2.9.3 Autoregression, Moving Averages and ARIMA
3 Methodology
3.1 Kumiega-Van Vliet model
3.2 Adaptation of the Kumiega-Van Vliet model
3.3 Stages, steps and cycles
3.3.1 Stage 1. Conceptualization
3.3.2 Stage 2. Implementation
3.3.3 Stage 3. Data Analysis
3.3.4 Stage 4. Documentation
3.4 Iterations
6 Discussion
6.1 Big Data tools and techniques
6.2 Big Data Analytics
6.3 Using Big Data analytics for energy efficiency
7 Conclusions
C Detailed Results
C.1 K-means clustering
C.2 Base loads
Chapter 1
Introduction
insights. The model is based on open source software components available free
of charge. There are other closed source software alternatives that can fit into the
presented model, but the discussions about these solutions are not included in this
document.
The topic of research is inspired by the European Union's "Cities as drivers of social change" (CIVIS) project under the Seventh Framework Programme. The CIVIS project focuses on the adoption of ICT tools and techniques for integrating the social aspects of city life into the production, distribution and consumption of energy. It aims to treat city life as a functional unit for improving energy efficiency. Pervasive, ubiquitous computing is driving smart energy solutions, and the smart energy devices in this ecosystem generate high volumes of data. This data needs to be instantaneously transferred, stored, analysed and visualised for knowledge discovery and for improving services. The platform that was developed as part of this endeavour has the capability to automate this whole process.
The data from smart energy devices was analysed to detect usage patterns and to classify buildings on the basis of energy efficiency. Evaluation of prediction models for the energy consumption of household appliances was also included in the scope of the research. These use cases provide the basis for designing, planning and implementing schemes that improve energy-related services and achieve higher efficiency in both production and usage. The insights generated from these use cases can also help in educating consumers about the benefits of energy efficiency and in spreading awareness of behavioural changes from which both society and individuals can benefit.
This research is also supported by the Technical Research Centre of Finland (VTT) as part of their Green Campus initiative. This project focuses on the use of ICT-based solutions for management and control systems that optimize energy consumption without compromising the indoor environment of the buildings. VTT is also a supporter and partner of the CIVIS project. VTT has installed specialized smart devices in selected test sites and has contributed to our research by providing the data generated by these devices. VTT has also helped in scoping the use cases for energy efficiency with the experience and knowledge gained from related projects and research.
In a nutshell, this thesis focuses on providing a solution for collecting, storing, analysing and visualising data generated by smart energy devices in order to generate insights about energy consumption patterns and to assess the performance of different building units in terms of energy efficiency. The data analysis part of our research provides the models for knowledge discovery that can be used to improve energy efficiency at both the producer and consumer ends. The Big Data analytics platform developed as part of this project is not limited to energy efficiency; it has the capability of handling other Big Data use cases.
However, within the scope of this document we discuss its use for detecting energy usage patterns and improving efficiency.
Chapter 2
Background
This chapter describes the main motivation and theoretical background behind our
research. In a systematic stepwise approach we list and describe the main topics.
We start with the motivation, inspiration and the partners of this thesis and then
we explain the theoretical concepts with reference to previous work done on the
respective topics. For each topic we also describe how it has contributed to our
research.
on electric power”[17].
Traditionally, power system participants have been strictly producers or consumers of electricity. The demand response and reliability issues of conventional electric power distribution models are driving a major trend on the consumer side: consumers are increasingly motivated to produce electricity at the domestic level, mostly using renewable energy production methods. "Prosumer" is an emerging term for an economically motivated entity that [23]:
• operates or owns a power grid, small or large, and hence transports electricity, and
The current energy grids support unidirectional distribution models and are centralized in nature. They have very limited ability to handle prosumer needs. Line losses and a hierarchical topology make them less reliable, and they usually become a bottleneck when rapid adaptations are required for demand response.
In [20], Farhangi defines smart grids as:
"The next-generation electricity grid, expected to address the major shortcomings of the existing grid. In essence, the smart grid needs to provide the utility companies with full visibility and pervasive control over their assets and services. The smart grid is required to be self-healing and resilient to system anomalies. And last but not least, a smart grid needs to empower its stakeholders to define and realize new ways of engaging with each other and performing energy transactions across the system".
us with device-level energy consumption data, i.e. the electricity used by different home appliances. This was achieved using smart NIALM-3 [25] meters that can distinguish between different electric devices on the basis of their signal thumbprint.
Apart from providing the data, the researchers from VTT's Green Campus initiative have also helped us in formulating the use cases for this thesis research.
Figure 2.1: Gartner 3V’s of data and data magnitude index [32].
and veracity based on the requirements while trying to maximize the valuation for
the use case. In the following subsections we discuss some of the relevant tech-
nological advancements that enable handling of the mentioned challenges of Big
Data analytics. These concepts, tools and techniques are also used in developing
the data analytics platform and performing the analysis for our thesis research.
the viable option is to scale out⁵ using the required number of smaller machines with relatively low computing resources in parallel. From a programming point of view, managing parallel processes running on different machines while ensuring a low failure rate is a tough job. So the desired system should provide programmers with an abstraction from lower-level system details to enable rapid and fault-tolerant development of Big Data applications. MapReduce is a parallel batch processing
framework developed at Google for the purpose of web indexing. The concept of
MapReduce was published by Jeffrey Dean and Sanjay Ghemawat in 2008 within
their research paper “MapReduce: simplified data processing on large clusters”
[18]. This paper describes MapReduce as:
“a programming model that provides a map function that processes a key/value
pair to generate a set of intermediate key/value pairs, and a reduce function that
merges all intermediate values associated with the same intermediate key. Pro-
grams written in this functional programming style are automatically parallelized
and executed on a large cluster of commodity machines. The run-time system takes
care of the details of partitioning the input data, scheduling the program’s execution
across a set of machines, handling machine failures, and managing the required
inter-machine communication”.
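As a toy illustration of this programming model (in plain R, not on a Hadoop cluster), the sketch below walks a few hypothetical meter readings through a map, shuffle and reduce phase; the data and names are made up purely for illustration.

# Toy illustration of the MapReduce programming model in plain R.
# (Hypothetical data; a real job would run on a Hadoop cluster, e.g. via Pig or Hive.)
readings <- data.frame(
  building = c("B6", "B6", "B13", "B13", "B6"),
  hour     = c(0, 1, 0, 1, 0),
  kwh      = c(1.2, 1.1, 3.4, 3.0, 1.3)
)

# Map: emit a (key, value) pair per record, key = building/hour.
map_fn <- function(row) list(key = paste(row$building, row$hour, sep = "_"),
                             value = row$kwh)
pairs <- lapply(seq_len(nrow(readings)), function(i) map_fn(readings[i, ]))

# Shuffle: group intermediate values by key.
grouped <- split(sapply(pairs, `[[`, "value"), sapply(pairs, `[[`, "key"))

# Reduce: merge all values that share a key (here: sum the consumption).
reduced <- sapply(grouped, sum)
print(reduced)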
Hadoop is an open source software framework whose main components are derived from MapReduce. It was developed by Doug Cutting and Mike Cafarella. It was initially created in 2005 to support an open source search engine and was then adapted to the published MapReduce framework [18]. It was released through the Apache Software Foundation, which has also built various supporting tools around the Hadoop framework to support end-to-end Big Data analytics ecosystems, e.g. Apache Flume for data collection, the Hadoop Distributed File System (HDFS) for storage, Apache Pig and Hive for processing, and Apache Mahout for machine learning.
Hadoop is a batch processing framework that empowers the processing of large volumes of data using commodity, low-cost computing infrastructure, so it supports volume and valuation directly. Variety can also be supported through different file formats in HDFS. Veracity depends on supporting tools such as data collection or data mining tools, and support for such tools is available in the Apache Hadoop ecosystem, e.g. Flume and Mahout. Velocity, however, is the one feature that a batch processing framework like Hadoop cannot handle. The next subsection answers the question of velocity.
⁵ When the need for computing power increases, the tasks are divided between a large number of less powerful machines with (relatively) slow CPUs, moderate amounts of memory, and a moderate number of hard disks.
processing power and flexibility of Hadoop. For our research we have used Cloudera Impala for handling near real-time velocity in Big Data processing.
\[
\text{Energy efficiency of a building} = \frac{\text{Energy consumed}}{\text{Built area}} \tag{2.1}
\]

In the case of specific energy consumption (SEC) [22], equation 2.1 can be written as

\[
\mathrm{SEC} = \frac{Q}{A} \tag{2.2}
\]

where Q denotes the consumption of a single energy type, for example electricity, and A is the built area in square metres. In subsequent sections we shall refer to these equations when we identify usage patterns at the building level, discuss the relevance of energy efficiency, and present a model for classifying buildings by energy efficiency.
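As a small worked example of equation 2.2, the following R sketch computes SEC values for two hypothetical buildings; the consumption and area figures are invented for illustration only.

# Specific energy consumption (SEC) per equation 2.2: SEC = Q / A.
# Q: consumption of a single energy type (kWh), A: built area (m^2).
# The values below are illustrative only.
buildings <- data.frame(
  name        = c("Building 6", "Building 13"),
  electricity = c(120000, 350000),   # kWh over the observation period
  area_m2     = c(4500, 12000)       # built (floor) area in square metres
)
buildings$sec_kwh_per_m2 <- buildings$electricity / buildings$area_m2
print(buildings)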
The base load of a building is one important metric that can be detected by observing the daily consumption. The base load is the consumption that takes place regardless of the actual use of the building and of the users' energy consumption [22]. It is the permanent minimum load that a power supply system is required to deliver. The base load is usually caused by continuous consumption for building maintenance such as air conditioning, ventilation, or night-time lighting. Sometimes the base load also includes the energy consumption of functional components inside the building, such as computer servers, lab equipment, and refrigerators. However, VTT differentiates the base load from the user energy load, which is characterized by the direct involvement of the users of a building. For example, an office building has its peak load during the daytime, when users are running various additional appliances such as personal computers, coffee makers and lights, compared to the base load that is generated during the night when the office building is not in use. Figure 2.4 illustrates the concept of base load and user load.
Figure 2.4: Base load, user load and energy efficiency [22].
In Figure 2.4, the base load is labelled as base consumption. The energy efficiency of the base consumption and the energy efficiency of the user load can be calculated using equation 2.1 or 2.2. This provides a weighted metric that can be benchmarked and compared, and it can help to narrow down the scope of investigation by pointing to problematic buildings and their issues.
we are using the Hartigan-Wong method. We shall also use some references to the Forgy method when explaining the K-means algorithm.
K-means groups the data points into clusters around logical centre points. The aim of the K-means algorithm is to divide data points within certain dimensions into K clusters so that the within-cluster sum of squares is minimized [26]. Suppose we want K clusters for the data points D = {x₁, x₂, . . . , xₙ} in d dimensions; then

\[
x_i \in \mathbb{R}^d
\]
The K-means algorithm uses the following steps to cluster data into groups [40].
1. Initialize the centroids randomly for each K i.e. for each group.
2. Data points are assigned to the closest centroid.
3. Move the centroids to the mean of the data points assigned to that centroid
in step 2.
4. Repeat 2 and 3 till convergence. Convergence means that the values have
stopped changing for further iterations.
Mathematically, the randomly initialized centroids are

\[
\mu_1, \mu_2, \ldots, \mu_k \in \mathbb{R}^d
\]

If c_i denotes the index of the centroid currently assigned to data point x_i, then steps 2 and 3, with recursive distance minimization and mean adjustment, can be written as follows.

For every i, set

\[
c_i := \arg\min_j \lVert x_i - \mu_j \rVert^2 \tag{2.3}
\]

The equation above uses the squared Euclidean distance between a centroid and a data point.

For every j, set

\[
\mu_j := \frac{\sum_{i=1}^{n} \mathbf{1}\{c_i = j\}\, x_i}{\sum_{i=1}^{n} \mathbf{1}\{c_i = j\}} \tag{2.4}
\]
The input to K-means is a set of feature vectors along with the required number of clusters. Before feeding data to K-means, the features should be brought to a similar scale, for example by standardizing their variance, to avoid distortions in the results. We were required to classify the pilot site buildings into four groups, with high efficiency, moderate efficiency, low efficiency and poor efficiency classes, so we set the value of K to 4.
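As a sketch of this setup, R's built-in kmeans() (whose default algorithm is Hartigan-Wong) can be applied to a scaled two-feature matrix; the matrix below is a randomly generated stand-in for the energy efficiency matrix described later, not the project's data.

# Illustrative only: cluster buildings into K = 4 efficiency classes.
# eff is a stand-in for the "Energy Efficiency Matrix" (Wh per m^2) built in chapter 5.
set.seed(42)
eff <- data.frame(
  elec_wh_per_m2 = runif(20, 2, 45),
  heat_wh_per_m2 = runif(20, 5, 180)
)

# Put both features on a similar scale before clustering.
eff_scaled <- scale(eff)

# Hartigan-Wong is the default algorithm in R's kmeans().
fit <- kmeans(eff_scaled, centers = 4, nstart = 25)
table(fit$cluster)          # cluster sizes
fit$centers                 # centroids in the scaled feature space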
4. Choosing and fitting the forecasting model. The model depends upon the
relationships between variables. Every model has its own construct. So data
needs to be fitted to that construct before applying that model. We discuss
it more in the data analysis part of this thesis.
\[
X_t = d_t + \varepsilon_t \tag{2.5}
\]

where d_t is the deterministic component and ε_t is the random component. The deterministic component itself can take the form of trends, periods, jumps, etc. Figure 2.5 illustrates examples of different time series; each illustration contains at least one stochastic random component, with or without deterministic components.
In Figure 2.5, illustrations 2, 3 and 4 contain a deterministic component together with a random component. When forecasting in such cases, the deterministic component makes useful prediction possible. However, for stochastic random time series data without any deterministic component, it is very hard to predict anything accurately. A time series with no predictable pattern is generally termed a stationary time series.
[Link] Regression
The concept behind basic regression techniques for forecasting is that we try to
forecast a variable ‘y’ on the basis of another variable ‘x’. For example a linear
[Link] Auto-regression
The auto-regressive model is based on the concept of a variable regressing on itself. For auto-regression we can derive the equation

\[
x_t = \beta_0 + \beta_1 x_{t-1} + e_t \tag{2.7}
\]
⁶ The differences between consecutive observations.
The aim of good estimation is to select values of β₀ and β₁ that minimize the sum of squared errors. The above equation can be used to estimate a value from the previous value. If we want to estimate based on multiple previous values, e.g. 'p' values, we can write

\[
x_t = c + \beta_1 x_{t-1} + \beta_2 x_{t-2} + \ldots + \beta_p x_{t-p} + e_t \tag{2.8}
\]

We have simply replaced β₀ with a constant c. Writing the historic values as a summation, we have

\[
x_t = c + e_t + \sum_{i=1}^{p} \beta_i x_{t-i}
\]
Here we have also separated out the random component, which does not meet the basic conditions for forecasting described in subsection 2.9.1. The model presented in equation 2.8 is referred to as the AR(p) model.
To simplify complex time series equations, back-shift notation is usually used, e.g. y_{t−1} can be denoted by By_t, i.e.

\[
B y_t = y_{t-1}
\]
\[
B(B y_t) = B^2 y_t = y_{t-2}
\]
\[
y_t - y_{t-1} = (1 - B)\, y_t
\]

In general, a d-th order difference is written as

\[
(1 - B)^d y_t
\]
By rearranging equation 2.10 and using back-shift notation, we obtain the following equation with the p, d and q terms of the ARIMA model labelled.
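Following the presentation in [27], this equation takes the form

\[
(1 - \phi_1 B - \cdots - \phi_p B^p)\,(1 - B)^d\, y_t = c + (1 + \theta_1 B + \cdots + \theta_q B^q)\, e_t
\]

where the first factor is the AR(p) part, (1 − B)^d applies the d-th order differencing, and the factor on the right-hand side is the MA(q) part.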
The explanations and equations used in section 2.9.3 were selected from Rob Hyndman's book "Forecasting: Principles and Practice" [27] as references for the theory relevant to our research. For further details please refer to chapter 5 and chapter 8 of that book.
Fitting the ARIMA model and estimating the future time series values needs
intensive computation. We use software e.g. R to solve these equations for our use
cases.
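As a minimal sketch of what this looks like in practice, the following R snippet fits an ARIMA model with the 'forecast' package and produces a 30-day-ahead forecast with an 80% interval; the series is simulated rather than taken from our data set.

# Minimal ARIMA fitting/forecasting sketch with the 'forecast' package.
# The series below is simulated; in the thesis work the input was daily
# appliance consumption prepared with the 'zoo' package.
library(forecast)

set.seed(1)
consumption <- ts(100 + arima.sim(model = list(ar = 0.6), n = 270))

fit <- auto.arima(consumption)            # selects p, d, q automatically
fc  <- forecast(fit, h = 30, level = 80)  # 30-day-ahead forecast, 80% interval
print(fc)
plot(fc)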
Chapter 3
Methodology
In previous chapters we introduced our research problem, listed and explained the
solution options with the theoretical background. In this chapter we explain our
practical approach for carrying out the research along with the software develop-
ment required to support the experimentation and data analysis for our research.
Following is the list of major tasks for the practical part of our research.
• Understanding Data Analytics ecosystem, evaluating the Big Data tools and
solutions.
• Exploratory data analysis and selection of algorithms and data analysis tools
with respect to use cases.
Some of these tasks were required to be performed in a sequential way e.g. require-
ment engineering and evaluation of Big Data tools were required before developing
the Big Data analytics platform or selecting the algorithms. Similarly we needed
results before visualisations could be created. On the other hand some of the
tasks could have been executed in parallel. For example the documentation was
an ongoing process along with all other tasks. Similarly the literature review for
understanding each component of our research was also an ongoing process. Then
the iterations were required for continuous improvement.
To tackle these challenges, we needed a methodology that could provide sequential and parallel task execution with support for iterations. As in most scientific research, failing fast and small in order to move ahead was key for us. Most of the tasks required conceptualization and rapid prototyping. Treating it initially as a software development task, we had some candidate models such as the waterfall model, the agile development model, the spiral model and the incremental model. Here we briefly discuss their advantages and disadvantages in the context of our research project.
• The waterfall model offered the simplest approach to requirements engineering, design, implementation, testing and operation for our research. However, it is inherently sequential and has weak support for iterations [33].
• The agile development model [37] is rapid and iterative and supports quick prototyping, but it requires additional communication and management overhead such as scrum meetings. Managing this together with stakeholders like VTT and the CIVIS project would have been very hard.
• The spiral model is a risk driven process model. It supports prototyping,
provides a good way of avoiding major failure risks, and it is iterative [15].
However, it needs a lot of resources during the planning phase especially
when the spiral keeps growing in size. It is usually very successful for large
projects but it has overheads for small projects like our thesis research. We
shall be discussing more about using parts of the spiral model later in this
chapter.
• The incremental model relies on small incremental steps, with each step consisting of independent design, implementation and testing phases [28].
In the beginning, the incremental model was the best fit among the candidate models. We were able to prototype small functional units of the Big Data analytics platform very quickly while independently working on the use cases. However, during the platform development and data analysis parts it created integration overheads. For example, integrating two different data processing tools for a single use case became difficult when they had been configured in two different incremental steps.
Learning from the problems that we faced while using the incremental model, we altered our approach to an adapted version of another very flexible software research and development methodology known as the "Kumiega-Van Vliet Trading System Development Methodology" [31].
stages. Within each stage we had four steps. These intra-stage steps were different for each stage and corresponded to the main tasks discussed at the beginning of this chapter. A typical intra-stage cycle ended with a set of deliverables. The deliverables were reviewed in a team review session. If required, other stakeholders, such as VTT, were also involved in some of the review meetings. We discuss this in detail when we describe our stage-wise proceedings. At the end of each review session a decision was made either to move to the next stage or to try to improve via an additional cycle. Using all four intra-stage steps for additional cycles was not mandatory; this was another minor adaptation to the (K|V) model. Similarly, iterations were mostly initiated after stage three. There were three major iterations. During the iterations, changes to deliverables were not mandatory; however, in practice each iteration caused some major or minor changes in the stage deliverables. Having a small and informal team structure reduced our management and communication overheads and also helped in rapid processing during iterations. Figure 3.1 illustrates our approach with the adapted version of the (K|V) model. A stage-by-stage description of our methodology is given in the next section.
[Figure 3.1: The adapted (K|V) model. Stage 1, Conceptualization (understanding and research of quantitative methods, evaluation of tools, conceptual model; deliverables: problem statement and platform concept document); Stage 2, Implementation (use case definition, data collection, prototype testing with sample data; deliverables: use cases and a working prototype); Stage 3, Data Analysis (evaluation and selection of algorithms, tight integration of the platform components, applying analytics, result visualisation; deliverables: functional platform, visualisations and insights); Stage 4, Documentation (problem statement vs. results review, document finalization, process review and discussion, document integration; deliverable: final output document). Each stage ends with a review that either accepts the deliverables, sends them back for revision, or triggers an iteration.]
• Discussions and informal interviews with VTT's project lead for the Green Campus initiative.
The literature review was a constant step throughout this stage, its cycles, and the iterations it went through.
3. Calculating the base load of the buildings to identify non-user consumption.
4. Classifying the buildings on the basis of energy efficiency and analysing the seasonal shifts in this classification.
sample data consisted of randomly selected records from the hourly consumption data set. We started testing the platform with smaller samples of the data and kept increasing the data volume until the complete data set had been tested. During this testing, the following functionalities were covered.
1. Data collection.
4. Data pre-processing. Reducing the large data volume without losing insights.
7. Data visualisation.
2. Results and visualisation of the results. Providing required insight for the
use cases.
3.4 Iterations
Iterative processes and work models do not require a full specification right from the beginning. Instead, the implementation can start with a part of the specification; then, in a stepwise approach, the next scope is defined taking into account the lessons learnt and the new directions found in previous iterations. This inherent characteristic of iterative processes had a vital role in our research. We started with a smaller scope, i.e. two simple use cases. In the earlier iterations we were able to focus on Big Data technologies and energy efficiency concepts; the more complex advanced analytics topics followed in later iterations. The findings and practical implementation in the early phases enabled us to expand the scope later. We added more use cases with more focus on the data analysis and the application of Big Data for energy efficiency. The iterations also helped us in improving the quality of the research.
In our approach, we went through three main iteration cycles. As mentioned in section 3.3, each iteration did not involve all four stages and their respective steps. The first three stages were the main contributors to the iterations, with step [Link] of stage 4 as the main source for reviewing our proceedings against our targets. Table 3.1 lists the main activities in each iteration against the respective stages and steps.
Chapter 4
Big Data Analytics Platform
We have referred to the Big Data platform concept and implementation on many occasions in the previous chapters. In chapter 2 we discussed the basic concepts of Big Data along with the various technological advancements and solutions available for handling Big Data. In chapter 3 we mentioned the functional components and their implementation during the different stages, cycles and iterations. In this chapter we start by explaining the concept and a sample application framework for a Big Data platform based on available open source components. After that, we present the part of this concept that we implemented for handling the energy and social media data for our energy efficiency use cases listed in section [Link] of the previous chapter.
Before we move to our conceptual model of the Big Data platform, it is important to mention the basic challenges that drive the design of a Big Data solution and to briefly explain a typical Big Data analytics process.
1. Volume refers to the size of the data. Volume is the feature most commonly associated with Big Data. The Big Data analytics platform in our scope of work is based on the Hadoop Distributed File System (HDFS), which is a highly scalable system that has been tested with up to 4,000 scaled-out serving nodes capable of handling up to 10 petabytes (PB) of data.
2. Velocity refers to the speed of data processing. Velocity is crucial for business use cases that need to process huge volumes of data in real time to produce insights for decision making. Our model is designed for batch processing; however, we have integrated additional components that can process the data with near real-time capability.
5. Valuation compares the benefits of processing Big Data against the effort required. It is an emerging consideration in Big Data analytics design. Just like other IT systems, organizations tend to decide about Big Data investments by looking at the business case. In our concept, we present a model based on open source components, so there should be no cost for acquiring software. There are no specialized hardware requirements for implementing our model, and any commercially available hardware with moderate specifications can be used to deploy the software. Hardware maintenance is required for running a service based on our model. There are Cloud alternatives that can be used within our model; however, we do not discuss such alternatives within the scope of this document or our research.
form can be ingested directly into the Hadoop Distributed File System (HDFS). If required, some filters can be applied while collecting the data for efficient use of storage space. Collected data can be structured, semi-structured or unstructured, and some pre-processing can be done to format it, i.e. to form a schema or structure that can be stored and accessed for data mining in one or more databases. A data mining engine with the flexibility to plug in and use various quantitative and qualitative research tools can then be used to analyse the data as required by the use case. The data mining engine requires a feedback loop to the pre-processing unit to adapt to the requirements of the use cases. Another feedback channel to the processed data storage can be provided for direct data manipulation. The results of the data mining can be stored in the database. A RESTful API or a data driver can be used to extract data from the database for visualisation front ends. Figure 4.1 shows the high-level process flow.
e.g. collecting only geo-tagged tweets or collecting only logs with error notifications. The following two components are recommended for data collection.
[Link] Databases
For storing the pre-processed data, various open source databases can be used in this platform. Document stores like MongoDB seem a natural choice because of their flexibility in handling data of any structure and their easy maintainability, but relational databases can still be integrated and used within this platform. Tools like Cloudera Impala fit very well with SQL-based databases like MySQL for building on-line query engines. In this model, Apache Hive also acts as a projection on top of MySQL.
4.3.5 Presentation
For visualisation of the results, this model suggests some tools for building interactive dashboards. These dashboards should be able to zoom in and out of the data and should provide a flexible way of managing visualisations based on the use cases. They can also be integrated into the user interfaces of web-based or mobile applications. The suggested components are Tableau Public, D3.js, Google Maps and the R project.
Tableau Public provides the easiest solution for visualising data through static and interactive graphs. Tableau Public is a free service; the paid version of Tableau provides additional capabilities such as connecting directly to databases and to Hadoop ecosystem tools like Apache Hive. Tableau software is based on the VizQL (Visual Query Language) paradigm [24]. Once connected to the data source, VizQL provides a very flexible way of interacting with graphs. The idea is to focus on the insights that are required rather than spending effort on programming the queries to generate those insights; Tableau automatically generates the queries and visualises the results.
D3.js provides a comprehensive JavaScript library for making customized information graphics. It is very flexible and powerful; however, it requires JavaScript programming skills, and a reasonable effort is needed to prepare or change the graph formats.
Google Maps is a powerful tool for geo-spatial infographics. In many Big Data use cases, geo-spatial mapping provides a smart way of presenting the information.
The R project has additional packages and libraries, such as "ggplot", for visualising the results of statistical analysis and data analytics. Like D3.js, R infographics also require programming skills.
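As an illustration of this last option, a hypothetical R/ggplot2 snippet for plotting an average daily consumption profile could look like the following; the data frame and column names are assumptions for illustration, not the project's actual schema.

# Sketch: plotting an average daily consumption profile with ggplot2.
# 'profile' is a hypothetical data frame; in our workflow the real data
# arrived as CSV exports from the Hadoop/Hive pre-processing step.
library(ggplot2)

profile <- data.frame(
  hour    = 0:23,
  avg_kwh = c(rep(1.1, 6), seq(1.5, 4.0, length.out = 6),
              rep(4.2, 6), seq(3.8, 1.2, length.out = 6))
)

ggplot(profile, aes(x = hour, y = avg_kwh)) +
  geom_line() +
  geom_point() +
  labs(x = "Hour of day", y = "Average consumption (kWh)",
       title = "Average daily consumption profile")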
4.4 Implementation
In this section we discuss our implementation of the Big Data analytics platform for analysing the energy consumption data of our energy efficiency use cases and for collecting social media data in support of the CIVIS project. The implemented model is a subset of the conceptual model presented in the previous section, with some additional components. Figure 4.3 shows the implemented model. A detailed description of the configurations and the source code is publicly available via the project's GitHub repository.
Cloudera CDH 4.7 was used in the form of a pre-configured quick start virtual machine (VM) capable of running on top of any common operating system, e.g. Microsoft Windows 7/8, Linux RHEL/CentOS, or Ubuntu. The Cloudera CDH 4.7 quick start VM itself was running on the CentOS 6.2 operating system. The following hardware resources were allocated to run the VM for our analyses.
• 4 GB RAM
• 64 GB VMDK storage
During the testing phase, we also tested a multi-node Cloudera CDH 4.7 configuration on the Cloud. However, there was a cost associated with running the setup on the Cloud, and the requirements of our use cases were fulfilled by the single-node quick start virtual machine, so we preferred not to run our analyses on the multi-node Cloud deployment.
In addition to the pre-configured components in Cloudera CDH 4.7, we added some additional components to our platform. The following is the list of those components with their respective versions.
For ease of use, we ran R, Weka and Tableau outside the quick start VM environment, typically on Windows 7 with similar dedicated hardware. Tableau Public is a web service running in the public Cloud.
In this section we explain the data processing work flows for both scenarios. These
work flows are aligned with the process we explained in section 4.2.
In section [Link], we listed the energy efficiency use cases. In chapter 2, we explained the relevance of these use cases to energy efficiency and introduced the basic concepts of suitable statistical and advanced analytics techniques for extracting insights from data for these use cases. In chapter 4, we explained the implementation of a model Big Data platform that provides an environment for performing these analyses. In this chapter, we explain how we performed our data analysis using the knowledge and capabilities developed through our work, as explained in the previous chapters. We also present our results and their prospective applications. Before we go into the details of the use cases, analyses and results, it is important to explain the data sets.
at that instance of time. Devices that are used on demand, for example a TV, stove or coffee maker, record the consumption values right after the end of the respective usage session. For continuously used devices like refrigerators and freezers, the NIALM devices have an inbuilt mechanism for recording the data periodically. For our research we take each data record as it appears in the data set.
subsections describe each step along with an explanation of the analysis and results.
Figure 5.2: Seasonal patterns in usage of electricity and electricity used for heating.
As expected, Figure 5.2 shows that electricity used for heating is more sensitive to temperature than electricity used for other purposes. The general purpose electricity consumption shows a relatively stable trend throughout the 11-month period. For the service providers, improving heating distribution and usage systems can therefore contribute more to energy efficiency.
days and less during night times and weekends, then we can suggest that the building is an office building. Secondly, the base load analysis can point us to buildings where there could be electricity leakages. Rectifying such issues can improve the overall energy efficiency of the buildings.
In our analysis we detected the daily trends of the buildings by averaging the consumption of each building separately for each hour of the day. Normalizing the data for missing values was very important, so instead of averaging over the total number of days for each building, we took averages over the number of days for which data was available.
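A compact R sketch of this averaging step, together with a base load estimate over the common night hours identified later in this section, is given here; the data frame and column names are illustrative stand-ins for the hourly consumption data set.

# Sketch: average hourly profile per building, using only days with data,
# plus a base load estimate from the common base load hours {0, 1, 2, 23}.
# 'hourly' is a stand-in for the hourly consumption data set.
set.seed(7)
hourly <- expand.grid(building = c("B6", "B13"), day = 1:30, hour = 0:23)
hourly$kwh <- runif(nrow(hourly), 0.5, 5)

# Mean over the days actually present (missing days simply do not contribute).
profile <- aggregate(kwh ~ building + hour, data = hourly, FUN = mean)

# Base load: average consumption over the common night hours.
base_hours <- c(0, 1, 2, 23)
base_load <- aggregate(kwh ~ building,
                       data = subset(hourly, hour %in% base_hours),
                       FUN = mean)
print(base_load)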
Figure 5.3: (a) Building 6: daily electricity consumption trend; (b) Building 13: daily electricity consumption trend; (c) Building 6: daily electricity-for-heating consumption trend; (d) Building 13: daily electricity-for-heating consumption trend.
Figures 5.3a and 5.3b show the average hourly electricity consumption during the day for two buildings from our data, Building 6 and Building 13 respectively, while Figures 5.3c and 5.3d show the average hourly electricity consumption
for the heating of the same buildings. Both types of consumption are higher during office hours, suggesting the purpose of the buildings, while each building has a different base load. The base load hours are clearly visible in the daily trend graphs. An interactive dashboard for observing the trends of all the buildings is available on the following website:
[Link]
A quick exploratory analysis of the trends suggests that all the buildings are office buildings. Secondly, the daily base load hours are hours 0, 1, 2, and 23. Some of the buildings may have more base load hours, but we considered only these hours because we wanted a common base load period for all the buildings. The base loads can then be calculated by averaging the consumption for these hours. Appendix C contains the calculated base loads for each building in each month of the year within the available data.
[Figure: Average of elec_m2 and average of heat_m2 for each building; electricity (Wh per m.sq) and electricity for heating (Wh per m.sq) shown per building, with colour indicating the sum of elec_m2 and heat_m2 respectively.]
and month names as separate columns. We called this matrix the "Energy Consumption Matrix".
4. At this stage our Energy Consumption Matrix had a few buildings for which consumption values were missing for either type of energy feature. We removed these buildings to avoid inconsistency in the data for the cluster analysis.
5. We then introduced the real estate data, i.e. the ground floor area of the respective buildings, into our Energy Consumption Matrix. Using equation 2.2, we calculated the energy efficiency values for each energy feature. We termed the resulting matrix the "Energy Efficiency Matrix".
6. Up to this point, we had energy efficiency in units of kilowatt hours per square metre. We then converted the values into watt hours per square metre. This was an optional step, performed simply to avoid handling small decimal values.
7. To prepare the final input matrix we removed the labels, leaving two columns of energy efficiency values for the two target energy features.
[Chart: classification of buildings on the basis of energy efficiency (2013). Electricity (Wh per m.sq) vs. heating (Wh per m.sq) for each building and month, coloured by K-means cluster (1.000-4.000) and sized by floor area (1,019-41,339 m.sq).]
Figure 5.5: K-means clustering, average hourly energy efficiency for each month
per each building.
the highly efficient class and 1 is the most energy-inefficient class of buildings. Figure 5.6 shows a one-month subset of the clustered values; each bubble represents the average energy efficiency in the month of January.
Insight: Figures 5.5 and 5.6 reveal an important insight: some of the bigger buildings are in the high or moderate efficiency clusters, while some of the smaller buildings are in the inefficient clusters. Such extreme cases can be good targets for case studies. The low efficiency buildings may have energy leakages, faults or inefficient usage practices, while the high efficiency buildings may suggest good practices for using energy.
As part of the use cases, we also studied how the behaviour of the buildings changes with external temperature, using the same cluster analysis. Figure 5.7 illustrates the behaviour of different buildings during the 11-month period. Figures 5.7a and 5.7b present the behaviour of Building 29 and Building 7 respectively; both buildings shift among three different clusters. Figures 5.7c and 5.7d represent Buildings 16 and 24, which show a shift between two clusters and no shift, respectively.
[Figure 5.6: Classification of buildings on the basis of energy efficiency, January 2013 subset. Electricity (Wh per m.sq) vs. heating (Wh per m.sq) for each building, coloured by K-means cluster (1.000-4.000) and sized by floor area (1,019-41,339 m.sq).]
Insight: Figure 5.7 shows that buildings are not equally energy efficient or inefficient throughout the year.
An interactive dashboard for viewing the behaviour of all the buildings over the 11 months of collected data is available via the following web address:
[Link]
those devices which are in continuous usage, e.g. the freezer and the refrigerator. The predicted values are daily consumption values for a particular device.
1. We collected NIALM data in two phases. The first data set contained consumption values from 1st May 2013 to 3rd February 2014; this data was used as training data for the prediction models. The second data set was collected with consumption values from 4th February 2014 to 30th April 2014; this set was used as the test data to measure the accuracy of the prediction models.
3. A time window of the previous 30 days was used to predict the values for the next 30 days with an 80% confidence interval.
4. The accuracy of the prediction models was compared on the basis of the Mean Absolute Error (MAE).
1. The data was reshaped to form a continuous time series. Missing daily values were filled with zeros.
2. The data was aggregated to calculate the daily consumption for each device.
3. The freezer data was extracted from the processed data set to apply the prediction models.
4. Three forecasting models were applied to the training data: Linear Regression, ARIMA, and Artificial Neural Networks (ANN). We used R for the ARIMA modelling and Weka for the Linear Regression and ANN models. We discuss the usage of these tools in chapter 6.
5. We calculated the accuracy on the test data using the mean absolute error formula; a small sketch of this comparison is given after this list.
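The sketch below shows the shape of that comparison in R for two of the models, using the 'forecast' package for ARIMA and a simple linear trend as a stand-in for the Weka models; the numbers are simulated, not the project's results.

# Sketch: comparing forecasts by mean absolute error (MAE) on held-out data.
# Simulated daily freezer consumption; the real input was the NIALM data set.
library(forecast)

set.seed(3)
daily <- ts(650 + arima.sim(model = list(ar = 0.5), n = 300, sd = 30))
train <- window(daily, end = 270)
test  <- window(daily, start = 271)   # last 30 days held out

# Model 1: ARIMA (order chosen automatically here, for illustration).
arima_fc <- forecast(auto.arima(train), h = 30, level = 80)$mean

# Model 2: linear trend regression on time (a simple baseline).
t_train <- seq_along(train)
lin_fit <- lm(as.numeric(train) ~ t_train)
lin_fc  <- predict(lin_fit, newdata = data.frame(t_train = 271:300))

mae <- function(pred, actual) mean(abs(pred - actual))
c(ARIMA = mae(arima_fc, as.numeric(test)),
  Linear = mae(lin_fc, as.numeric(test)))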
Table 5.1 lists the mean absolute error (MAE) values for each prediction model using an 80% confidence interval. Forecasting with the ARIMA model shows the lowest MAE values; for this reason, we included ARIMA forecasting as part of our implemented platform. We use R to perform the calculations described in section 2.9.3, using packages like "forecast" for fitting and forecasting with the ARIMA model and "zoo" for handling time series data structures. The p, d, q values for predicting monthly values based on the previous 30 days were 30, 0, 30 respectively.
Figure 5.8: Forecasting monthly energy consumption for the home appliances
based on previous monthly consumptions
Chapter 6
Discussion
In this chapter we summarize our work with a critical analysis of our approach and results, while highlighting some possible future directions related directly or indirectly to our research. Our research has three main constituents: (i) Big Data tools and techniques, (ii) Big Data analytics, and (iii) using Big Data analytics to help improve energy efficiency. We focus on these topics in this chapter.
data recorded over several years. To test our system's readiness for large data sets, we replicated the collected data several times into larger and more numerous files.
In terms of velocity, we analysed our data using distributed parallel batch processing. For fast, near real-time data processing, we also tested Cloudera Impala. In terms of speed, Cloudera Impala performed many times faster than batch processing tools like Apache Pig and Apache Hive; however, it has very limited support for handling complex and composite data structures. There are new emerging tools, such as Apache Spark and Apache Shark, that enable much faster data processing and offer support for complex data structures. Adapting our data platform to these emerging tools could further enhance its data processing capabilities.
Data variety and veracity were managed on the basis of our use case requirements and the collected data sets. We built our data platform on top of Cloudera CDH, which provides the basic Apache Hadoop ecosystem data processing tools such as Hive and Pig. On top of it we also integrated database systems like MongoDB that can ingest schema-less data, giving flexibility for handling unstructured data. Statistical programming capabilities using R were also available to clean and process the data. The whole platform was based on open source software, and the model has inbuilt flexibility for use in any kind of deployment model, e.g. public or private cloud, on-premises general purpose hardware, or a combination of them.
In an ideal data platform, the processes and workflows should be fully automated. This means that, after configuration, the platform should be able to collect, process, mine, and visualise data automatically. Our data platform covered the end-to-end process for data analytics, and all the components were integrated to implement the required workflows. However, some manual interventions were required to execute the workflows; for example, extracted insights were fed to the visualisation tools in the form of off-line CSV files. This could be improved to provide full automation using some paid services such as database connectors for visualisation. Further automation could be added for the operations and maintenance of the platform using tools like Opscode Chef and PuppetLabs AutomateIT. It is worth mentioning that Cloudera CDH provides a management control panel to manage resources, but this control panel is limited to tools within CDH.
diverse range of powerful advanced analytics libraries and packages that can be used to extract information from data. But these tools allow scalability only in terms of scaling up; such scalability is always limited and requires specialized, expensive hardware. Combining the data processing power of Apache Hadoop and MapReduce with these statistical tools enables us to process huge amounts of data and mine the required information with tools like R.
We used this concept for data processing and analysis in our research. We leveraged the power of Hadoop to pre-process our bulk data and transform it into a smaller size while keeping the useful information intact. We then applied advanced analytics using R to generate the required insights. This is a very useful approach, but for building a continuous data processing and mining pipeline it has its limitations, e.g. the manual or off-line transfer of processed data between the data processing and data mining phases.
There is an emerging class of Big Data analytics tools that can execute a variety of advanced analytics techniques directly on top of the Hadoop Distributed File System. Apache Mahout is one example that can be integrated within the Apache Hadoop ecosystem and provides powerful advanced analytics capabilities. Mahout uses parallel batch processing, so it is inherently limited for use in on-line streaming analytics. Cloudera Oryx and MLbase are two new emerging tools in this ecosystem that can integrate with Apache Spark and Hadoop 2.0 to provide on-line analytics on streaming Big Data. Integration and use of these tools could improve the capabilities of the platform discussed within the scope of our thesis research.
the classification. We used cluster analysis with the K-means algorithm to classify the buildings. The K-means algorithm requires a predefined number of clusters; in our case, the basic requirement was to categorise the buildings into four categories. In other possible use cases, where there may not be any predefined categories, other advanced analytics techniques, e.g. Hierarchical Agglomerative Clustering, can be used. Our classification technique is intended to identify possible opportunities to improve energy efficiency and to learn from good practices. As an obvious next step to our research, the buildings with lower energy efficiency should be explored to identify possible energy leakages, faults and bad usage practices. Similarly, good practices can be learnt from the energy efficient buildings. Our energy efficiency classification is limited to general purpose usage of the buildings; this classification model may raise false red flags for buildings with specialized usage, such as data centre buildings, laboratories and factories.
For the NIALM data set with device-level energy consumption information, we compared different prediction models that can help energy service providers plan for demand response, production and distribution. Our comparison was limited to prediction models for continuously used devices like refrigerators and freezers. We predicted the coming month's usage on the basis of the previous month's usage. This comparison could be the first step towards building a prediction model for energy service providers, as well as a recommendation engine for users that learns from user behaviour. Energy tariff data could be included to provide recommendations that improve energy efficiency and save the user money through reduced energy spending. This model can be improved further by adding more historic consumption data, which would further confirm the seasonal and usage-specific trends.
Chapter 7
Conclusions
Global energy needs are continuously growing. The conventional methods of producing more energy to meet the demand pose a great threat to the environment. Among other solutions, energy efficiency has become a major tool for minimizing the need to produce more energy to cater for the growing demand. Improving energy efficiency inherently relies on understanding usage patterns, identifying problematic areas, establishing good energy consumption practices and rectifying faults to reduce energy leakages. Advancements in sensors, ubiquitous computing and communication technologies have provided the basis for effectively collecting the usage data needed to understand energy usage. The collected data needs to be processed to generate leads for improving energy efficiency. The quality of the insights generated from the data improves if we consider the current data in the context of historic data, which means that the data volume for processing will keep increasing. There can be multiple sources of data, so the data formats can also vary. Depending on the use case, data processing requires flexibility for customization and variation in processing speed. All of these data features point to the application of Big Data technologies for energy efficiency.
Distributed parallel computing programming models like MapReduce provide the basic environment for handling Big Data. We leveraged the power of MapReduce using Apache Hadoop ecosystem tools to present an end-to-end Big Data analytics platform. Hadoop supports the scalability needed for large data volumes, while other tools can integrate with Hadoop to handle complexity in the data. We used the model platform to process real-life energy data and generated insights that can be used to improve energy efficiency. The proposed model provides a 'plug and play' environment into which many other analytics tools can be integrated on a need basis. It is based entirely on open source software components and can be deployed using general purpose hardware or any cloud-based model.
We observed a strong sensitivity of specific energy types' consumption to ecological factors such as external temperature. The visualisation of the average
daily consumption of each building suggested the respective use of the buildings. It also provided information about the base loads of the buildings; optimizing the base load of a building improves its energy efficiency. We also calculated the energy efficiency of the buildings in our sample data set and classified them on the basis of the calculated energy efficiency. This classification provided us with a reference point for identifying the buildings to focus on and locating possible faults, energy leakages and bad consumption practices. The classification can be an important tool for organizations working on improving energy efficiency, as it can help to isolate the problematic buildings. Our results also show the dynamic behaviour of buildings' energy efficiency performance during different months of the year. This particular insight can be used as another reference for isolating the causes of energy inefficiency in the target buildings.
We also compared different prediction models to forecast the future energy
usage of different home appliances on the basis of their previous usage. The pre-
diction model itself can be useful for energy service providers to understand and
plan for user specific demand. However, this can also be used as the first step
towards building a decision support system for users to effectively use home ap-
pliances. The decision support system could predict the usage pattern and then
recommend the best options on the basis of configurable parameters e.g. best
tariffs or best time to use green energy etc.
In a nutshell, our research provides a proof of concept of how emerging Big Data technologies can be applied in the energy industry to improve energy efficiency and data-driven decision making. We presented and demonstrated the concept of our Big Data analytics platform and applied it to solve real-life use cases from the energy industry. The output of our research is a working Big Data analytics platform and the results generated from advanced analytics techniques applied specifically to energy efficiency problems.
Bibliography
[2] The Cloudera Impala - open source, interactive SQL for Hadoop.
[Link]
cdh/[Link]. Accessed: 2014-06-09.
[16] Box, G. E., and Jenkins, G. M. Time series analysis: forecasting and
control, revised ed. Holden-Day, 1976.
[20] Farhangi, H. The path of the smart grid. Power and Energy Magazine,
IEEE 8, 1 (2010), 18–28.
[22] Forsström, J., Lahti, P., Pursiheimo, E., Rämä, M., Shemeikka,
J., Sipilä, K., Tuominen, P., and Wahlgren, I. Measuring energy
efficiency.
[28] Jacobson, I., Booch, G., Rumbaugh, J., Rumbaugh, J., and Booch,
G. The unified software development process, vol. 1. Addison-Wesley Reading,
1999.
[30] Khan, I., Capozzoli, A., Corgnati, S. P., and Cerquitelli, T. Fault
detection analysis of building energy consumption using data mining tech-
niques. Energy Procedia 42 (2013), 557–566.
[31] Kumiega, A., and Van Vliet, B. A software development methodology for
research and prototyping in financial markets. arXiv preprint arXiv:0803.0162
(2008).
[33] Laplante, P. A., and Neill, C. J. The demise of the waterfall model is
imminent. Queue 1, 10 (Feb. 2004), 10–15.
[34] Li, X., Bowers, C. P., and Schnier, T. Classification of energy con-
sumption in buildings with outlier detection. Industrial Electronics, IEEE
Transactions on 57, 11 (2010), 3639–3644.
[36] MacQueen, J., et al. Some methods for classification and analysis of
multivariate observations. In Proceedings of the fifth Berkeley symposium on
mathematical statistics and probability (1967), vol. 1, California, USA, p. 14.
[38] Marz, N., and Warren, J. Big Data: Principles and best practices of
scalable realtime data systems. O’Reilly Media, 2013.
[40] Ng, A. Cs229 lecture notes. CS229 Lecture notes 1, 1 (2000), 1–3.
[41] Russom, P., et al. Big data analytics. TDWI Best Practices Report, Fourth
Quarter (2011).
[42] Stonebraker, M. The case for shared nothing. IEEE Database Eng. Bull.
9, 1 (1986), 4–9.
Appendix B
Data Descriptions
7. Hour of the day: a value from 0 to 23. Hour 0 means utility consumed from 00:00 to 00:59 of the day.
4. Each record contains the name of the device used, the time stamp of usage and the amount of electricity consumed in watt hours.
5. Please note that certain devices are logged into the records after their respective usage, whereas devices like the refrigerator and freezer are used continuously and the NIALM device uses an internal mechanism to record the consumption of these devices after a certain period of time.
Appendix C
Detailed Results
Figure C.1: Details of K-means clustering results. Four classes of buildings are
represented as high efficiency (4), moderate efficiency (3), low efficiency (2), and
poor efficiency (1).
Building Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov
Building 12 406.2 410.5 414.5 395.9 399 426.3 422.2 431.2 417.2 419.9 431.2
Building 24 289.3 292.3 289.3 291.8 306.1 313.2 310.5 316.8 321.7 322.1 319.3
Building 2 251.7 254.1 255.3 257.39 255.5 279.7 271.8 275.3 268.1 256.1 245.3
Building 20 227.9 235.8 234.4 239.5 247.7 259.1 255.9 265.6 268.3 258.1 244.9
Building 31 170.3 175.7 163.1 159.6 182.8 195.4 190.4 176 177.9 164.6 164.7
Building 14 78.2 57.1 54.4 50.4 52.2 49.4 45.3 46.9 50.9 52.4 56.1
Building 11 77.8 83.7 83.7 83.4 74 66.5 59.7 55.2 51.7 72.5 75
Building 28 62.3 66.8 68 71.9 73.2 73.90 67.7 67.5 68.5 65.7 69.2
Building 8 60.9 65 63.3 68 72.3 75.7 75.59 73.5 67.5 68 65.5
Building 25 60.3 61.4 61.7 60.6 71.2 69.59 60.6 62.1 51.4 50 45.5
Building 10 59.3 54 53 45.2 50.8 53.9 56.5 55.9 57.6 59 57.5
Building 6 56.2 0 0 183.9 231.7 239.9 232.8 228.6 400.2 223.6 205.3
Building 17 48 48.1 47 44.2 45 46.3 44.8 48.2 48.6 48.6 47.4
Building 26 44.3 44.7 44.7 56.1 59.1 63.1 59.8 59.9 60.9 56.3 49.5
Building 5 43.6 45.8 45.6 44.6 46.8 50.9 50.5 52.7 85.4 47.1 47.6
Building 21 41.9 45.8 42.4 38.5 39.2 47.2 42.7 46.5 40.5 40.2 40.4
Building 16 38.5 43.3 42.2 40.2 40.7 41.2 36.1 41.9 33.9 31.8 35.7
Building 22 31.1 27.8 32 36.1 37 34.9 32.7 33.6 35.5 36.4 39.1
Building 7 30.9 42.1 44.1 41.7 38 37.2 26.1 39.5 35.4 44.4 43.3
Building 1 25.2 26.2 23.4 21.6 22.7 24.7 22.2 24.6 23.1 22.8 29.4
Building 23 19.8 20.3 21.2 23.1 20.6 18.8 18.7 20.5 23.9 23.7 22.7
Building 29 16.6 14.7 13.6 12 10 8.6 10.7 9.6 10.4 10.9 12.7
Building 30 14.7 14.6 13.8 14.3 14.2 14 13.3 13.9 13.5 11.7 11.9
Building 18 12.5 13.2 12.8 13 13.1 11.9 12.7 13.5 13.1 13.1 13.4
Building 15 9.2 10.2 10.4 10 12.6 15.8 16.6 16.2 13.2 10.4 9.8
Building 19 4.5 5.3 5.5 4.8 4.4 3.6 4.0 4.5 4.5 4.8 4.9