Visualisation for Data Science
Predict
Predict Summary
In this predict, we will use Power BI to connect to new data sources related to Eskom’s power generation, change the data source locations, and
create new visuals to derive the insight required to answer several multiple-choice questions (MCQs).
Theme 1: Data Sourcing
Objective: Experiment with different Power BI data connectors
Outcome:
● Changed the source of the Power BI data files to the path on your local machine
● Sourced additional Eskom Excel data files
Theme 2: Data Formatting
Objective: Clean, transform and format sourced data
Outcome:
● Filled the missing values of the Eskom Excel files with the approximate feature averages
● Created feature relations
Theme 3: Data Visualisation
Objective: Fix ‘broken’ dashboard and create new visuals
Outcome:
● Added various visuals including:
○ Slicers,
○ Cards, and
○ Line charts
© Explore Data Science Academy
Agenda
01 Problem Introduction
02 Connecting to the Data
03 Sourcing Additional Data and Setting Relationships
04 Creating Visuals
Problem Introduction: The National Energy Crisis
In this predict, we will use Eskom data to build an informative dashboard using Microsoft Power BI.
The dashboard, with its underlying reports, will be used to derive insight into the current South African energy crisis and answer multiple-choice questions.
You will be supplied with a ‘broken’ dashboard and the underlying data.
Your task is to:
1) Correctly source the data
2) Fix the relationships, and
3) Create the appropriate visuals
The reports you create will be used to answer the multiple-choice
questions at the end of this predict.
Connecting to the Data
In this predict, we will use XLSX and CSV files to source the data.
Step 1
Download and unzip the folder with the XLSX and CSV files: Eskom-Data-Power-BI-Predict.zip
Step 2
Download the Power BI file containing the broken ‘dashboard’: Eskom-Visualisation4DS-student-version.pbix
Step 3
Change the data source directory for each file in the Eskom-Data-Power-BI-Predict folder.
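Before pointing each Power BI query at its new location in Step 3, it can help to confirm where the files actually landed on your machine. The sketch below is a hypothetical Python helper, not part of the predict itself: it unzips an archive and lists the absolute paths of the XLSX and CSV files inside. The function name, the target folder and the assumption that the archive sits in the current working directory are all illustrative.

```python
import zipfile
from pathlib import Path


def extract_and_list(zip_path, target_dir):
    """Unzip zip_path into target_dir and return the sorted absolute paths
    of every .xlsx and .csv file found there (searched recursively)."""
    target = Path(target_dir)
    target.mkdir(parents=True, exist_ok=True)
    with zipfile.ZipFile(zip_path) as archive:
        archive.extractall(target)
    return sorted(
        str(p.resolve())
        for p in target.rglob("*")
        if p.is_file() and p.suffix.lower() in {".xlsx", ".csv"}
    )


if __name__ == "__main__":
    # Assumed location: the downloaded archive is in the working directory.
    archive_name = "Eskom-Data-Power-BI-Predict.zip"
    if Path(archive_name).exists():
        for path in extract_and_list(archive_name, "Eskom-Data-Power-BI-Predict"):
            print(path)
```

Each printed path can then be pasted into the corresponding data source setting in Power BI.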
Sourcing Additional Data and Setting Relationships
When opening the ‘broken’ dashboard, you will note that the ‘Capacity and Efficiency’ visuals are not working. This is because the datasets for
these visuals and their corresponding relationships are not set up. To properly source the data and set up the relationships, we will be following
the steps below:
Steps for connecting to the Eskom Capacities and Efficiencies Datasets:
1. Use the Power BI ‘Get Data’ Excel connector and connect to the Eskom Electric Stations Capacities.xlsx dataset
2. Select ‘Sheet2’
3. Click on ‘Load’
4. Repeat steps 1-3 for the Eskom power stations efficiencies.xlsx dataset
5. Rename the Efficiencies dataset to ‘Eskom-Efficiencies’ and the Capacities dataset to ‘Eskom-Capacities’
6. Create a 1-1 relationship between the ‘Names’ column of the Eskom-Capacities and Eskom-Efficiencies datasets
[Figures: Source Data; Create Relationship]
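A 1-1 relationship requires every value in the key column to be unique on both sides. As a minimal sketch of that check, assuming the two ‘Names’ columns have been exported to plain Python lists, the hypothetical helper below reports duplicated names and names that appear in only one dataset. Unmatched names do not by themselves prevent a 1-1 relationship; they are listed because those stations would have no counterpart in the other table. The station labels are placeholders, not values from the Eskom files.

```python
from collections import Counter


def one_to_one_issues(left_names, right_names):
    """Describe what stands in the way of a clean 1-1 match between two
    key columns: duplicated keys on either side, or keys found on one
    side only. Four empty lists mean the keys line up one-to-one."""
    left_counts = Counter(left_names)
    right_counts = Counter(right_names)
    return {
        "duplicates_left": sorted((k for k, n in left_counts.items() if n > 1), key=str),
        "duplicates_right": sorted((k for k, n in right_counts.items() if n > 1), key=str),
        "only_left": sorted(set(left_counts) - set(right_counts), key=str),
        "only_right": sorted(set(right_counts) - set(left_counts), key=str),
    }


# Placeholder names for illustration only.
capacities = ["Station A", "Station B", "Station C"]
efficiencies = ["Station A", "Station B", "Station B"]
issues = one_to_one_issues(capacities, efficiencies)
print(issues)
```

Here ‘Station B’ is duplicated on the efficiencies side, so that column could not serve as the unique side of a relationship until the duplicate is resolved.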
With the Eskom Efficiencies and Capacities datasets sourced and the relationships created, the next steps are concerned with cleaning the data
to get it into a workable format. To do this, we are going to replace the empty and null values in each column with the column average.
Steps for cleaning the Capacities Dataset
1. Access the Power Query Editor via the ‘Transform Data’ button and navigate to the Eskom-Capacities dataset (you will notice a lot of columns with ‘null’ entries)
2. Remove ‘Steam Capacity’ and ‘Column10’
3. For the ‘max nominal capacity (MW)’ column - fill the missing values with 1352
4. Select the ‘Plant Lifetime’ column and replace ‘null’ values with 50
5. For the ‘Nominal Unit capacity (MW)’ column - fill the missing values with 510
6. For the ‘Number of Units’ column - fill the missing values with 5
[Figures: Access Datasets and Data Functions; Replace ‘null’ values]
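The fill values in the steps above are approximate column averages. As a rough illustration of where such a number comes from, the sketch below takes the mean of the non-missing entries of a column, rounds it, and uses it to fill the gaps. The function name and the sample column are hypothetical and not taken from the Eskom files; in Power BI itself the replacement is done with ‘Replace Values’ in the Power Query Editor.

```python
from statistics import mean


def fill_with_mean(values, digits=0):
    """Replace None entries with the mean of the non-missing entries,
    rounded to `digits` decimal places."""
    present = [v for v in values if v is not None]
    if not present:
        raise ValueError("column has no values to average")
    fill = round(mean(present), digits)
    return [fill if v is None else v for v in values]


# Hypothetical column for illustration only.
plant_lifetime = [40, None, 60, 50, None]
filled = fill_with_mean(plant_lifetime)
print(filled)
```

For fractional columns such as efficiencies, a larger `digits` value keeps the precision of the average.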
The Efficiencies dataset is cleaned in the same way: replace the empty or null values in each column with the column average.
Steps for cleaning the Efficiencies Dataset
1. Remove ‘Column10’
2. For ‘Fixed Operation and Maintenance Cost’ - fill the missing values with 188
3. For ‘Variable Operation and Maintenance Cost’ - fill the missing values with 45
4. For ‘Ramp Rate per hour’ - fill the missing values with 0.3068
5. For ‘Energy Efficiency’ - fill the missing values with 0.326731
6. For ‘heat rate’ - fill the missing values with 10589.22
7. For ‘availability’ - fill the missing values with 0.6707 (approx. average)
8. For ‘design efficiency’ - fill the missing values with 0.350889
9. Set the data types for all columns, i.e. %, whole number, etc.
10. Remove the bottom 67 rows and the top 1 row
[Figures: Replace ‘null’ values; Set the Data Type]
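Steps 9 and 10 above amount to a row slice and a type conversion. The sketch below mirrors that logic in plain Python, assuming the sheet has been read into a list of rows with text cells; both helper names are hypothetical. In Power BI the same result comes from the ‘Remove Rows’ and data type options in the Power Query Editor.

```python
def trim_rows(rows, top=1, bottom=67):
    """Drop the first `top` and the last `bottom` rows, as in step 10."""
    end = len(rows) - bottom
    return rows[top:end] if end > top else []


def to_number(text):
    """Convert a cell to float. A trailing '%' is read as a percentage,
    and missing or blank cells become None."""
    if text is None or str(text).strip() == "":
        return None
    cleaned = str(text).strip()
    if cleaned.endswith("%"):
        return float(cleaned[:-1]) / 100
    return float(cleaned)


# Illustration with 100 dummy rows: rows 1 to 32 remain.
kept = trim_rows(list(range(100)))
print(len(kept))
```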
Infrastructure Dashboard
The infrastructure dashboard gives us insight into attributes of each station. Using Power BI slicers and filters we can draw correlations between
station attributes, and explore and analyse this dataset in-depth.
Using what you’ve learnt up to this point, you are tasked with restoring the supplied ‘broken’ infrastructure dashboard. You may use the guidelines below
to rebuild the dashboard.
Slicers
● Add a location slicer
● Add a station slicer
● Add a station type slicer
Cards
● Add an average electricity generated in MW card
● Add a total number of stations card
● Add a summary of stations card. Hint: Count of station_name feature and the status feature might be helpful
Table
● Add the features ‘station_name’ and ‘count of id’ to the table
● Add the features ‘nominal_capacity_MW’ and ‘province’ to the table
● Add the features ‘operator_name’ and ‘installed_capacity’ to the table
Visuals
● Add the feature ‘Avg. of nominal_capacity_MW’ to the column values field of the line and stacked column chart
● Add the feature ‘commissioned_year_end’ to the line values field of the line and stacked column chart
● Add the feature ‘commission_year_start’ to the tooltips field of the line and stacked column chart
Capacity and Efficiency Dashboard
The Eskom capacity and efficiency dashboard is aimed at providing information on individual station performance.
[Figures: dashboard before data is sourced, cleaned and relationships are created; dashboard after data is sourced, cleaned and relationships are created]
Guidelines for fixing the Eskom Capacity and Efficiency dashboard
Slicers
Add a ‘number of units’ slicer to the dashboard. Set the slicer type to ‘between’
Cards
Add the following cards to the report:
• Avg. of Ramp Rate (%/hr)
• Avg. of variable operations and maintenance costs
Visuals
• For the scatter chart, add the feature ‘Max nominal capacity (MW)’ to the size column
• For the clustered bar chart, add the feature ‘status’ to the visualisations legend field
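To sanity-check what an average card should show once the ‘number of units’ slicer is applied, the calculation can be reproduced by hand. The sketch below is a hypothetical cross-check in Python: it keeps rows whose unit count falls inside a range (assumed inclusive at both ends) and averages one field, skipping missing values. The field names and sample rows are invented for illustration and do not come from the Eskom datasets.

```python
from statistics import mean


def card_average(rows, field, units_min, units_max):
    """Average `field` over rows whose 'number_of_units' lies in the
    inclusive range [units_min, units_max]. Missing values are skipped;
    returns None when no row qualifies."""
    selected = [
        r[field]
        for r in rows
        if units_min <= r["number_of_units"] <= units_max
        and r[field] is not None
    ]
    return mean(selected) if selected else None


# Invented sample rows for illustration only.
stations = [
    {"number_of_units": 2, "ramp_rate": 0.2},
    {"number_of_units": 6, "ramp_rate": 0.4},
    {"number_of_units": 10, "ramp_rate": 0.9},
    {"number_of_units": 4, "ramp_rate": None},
]
print(card_average(stations, "ramp_rate", 1, 6))
```

If the card in the report differs noticeably from a hand calculation like this, the relationship or the null replacement is usually the first thing to recheck.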
Twitter Dashboard
The Twitter dashboard summarises Eskom mentions (@) and tags (#) over a defined period.
1. Import visuals
If you have not yet done so, click on the ‘Get more visuals’ button and download the WordCloud 2.0 visual and the Scroller visual
2. Date slicer
Add a date slicer of type ‘between’ to the dashboard
3. Municipality slicer
Add a municipality slicer of type ‘dropdown’ to the dashboard
4. Hashtags slicer
Add a hashtags slicer of type ‘dropdown’ to the dashboard
5. Multi-row card
Add the feature ‘count of hashtags’ to the multi-row card visual
6. Addition of Tweets feature
To the table add the feature ‘Tweets’. Insert this feature in the table values column
Predict-related FAQs
This page will be updated periodically with common predict-related questions which may arise during the Sprint. Consider consulting this
space before asking your course facilitator a question.