Decision Support &
Business Intelligence
Systems
Lecturer: Assoc. Prof. Phạm Quốc Trung
School of Industrial Management, HCMUT
Main Contents
1 Introduction to DSS, BI & Analytics
2 Descriptive Analytics
3 Predictive Analytics
4 Prescriptive Analytics
5 Future Trends
Descriptive Analytics I:
Nature of Data,
Statistical Modeling,
and Visualization
Outline
•Descriptive Analytics I
•Nature of Data
•Statistical Modeling
•Visualization
Three Types of Analytics
Source: Ramesh Sharda, Dursun Delen, and Efraim Turban (2017), Business Intelligence, Analytics, and Data Science: A Managerial Perspective, 4th Edition, Pearson
A Data to Knowledge Continuum
Source: Ramesh Sharda, Dursun Delen, and Efraim Turban (2017), Business Intelligence, Analytics, and Data Science: A Managerial Perspective, 4th Edition, Pearson
A Simple Taxonomy of Data
Source: Ramesh Sharda, Dursun Delen, and Efraim Turban (2017), Business Intelligence, Analytics, and Data Science: A Managerial Perspective, 4th Edition, Pearson
Data Preprocessing Steps
Source: Ramesh Sharda, Dursun Delen, and Efraim Turban (2017), Business Intelligence, Analytics, and Data Science: A Managerial Perspective, 4th Edition, Pearson
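The figure itself is not reproduced here, so the following is only a minimal sketch of two common preprocessing steps (data cleaning by mean imputation, and data transformation by min-max normalization). The function names and the sample `ages` values are hypothetical and chosen for illustration.

```python
def impute_mean(values):
    """Data cleaning: replace missing entries (None) with the mean of the known ones."""
    known = [v for v in values if v is not None]
    mean = sum(known) / len(known)
    return [mean if v is None else v for v in values]


def min_max_normalize(values):
    """Data transformation: rescale values linearly into the range [0, 1]."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # A constant column carries no spread; map everything to 0.0.
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


ages = [20, None, 40, 30]          # made-up column with one missing value
filled = impute_mean(ages)         # the missing entry becomes the mean of 20, 40, 30
scaled = min_max_normalize(filled)
print(filled)
print(scaled)
```

Mean imputation is only one option; dropping incomplete records or using the median are equally plausible choices depending on the data.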
An Analytics Approach to Predicting Student Attrition
Source: Ramesh Sharda, Dursun Delen, and Efraim Turban (2017), Business Intelligence, Analytics, and Data Science: A Managerial Perspective, 4th Edition, Pearson
A Graphical Depiction of the
Class Imbalance Problem
Source: Ramesh Sharda, Dursun Delen, and Efraim Turban (2017), Business Intelligence, Analytics, and Data Science: A Managerial Perspective, 4th Edition, Pearson
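The slide names the class imbalance problem but does not say which remedy is used. One common remedy is random oversampling of the minority class, sketched below under that assumption. The `random_oversample` helper, the record layout, and the `retained`/`dropped` labels are hypothetical.

```python
import random
from collections import Counter


def random_oversample(rows, label_of, seed=0):
    """Duplicate randomly chosen minority rows until every class matches the largest one."""
    rng = random.Random(seed)
    by_class = {}
    for row in rows:
        by_class.setdefault(label_of(row), []).append(row)
    target = max(len(group) for group in by_class.values())
    balanced = []
    for group in by_class.values():
        balanced.extend(group)
        # Sample with replacement to fill the gap to the majority class size.
        balanced.extend(rng.choice(group) for _ in range(target - len(group)))
    return balanced


# Made-up student records: 8 retained, 2 dropped.
rows = [(i, "retained") for i in range(8)] + [(i, "dropped") for i in range(8, 10)]
balanced = random_oversample(rows, label_of=lambda row: row[1])
counts = Counter(label for _, label in balanced)
print(counts)
```

Oversampling should be applied to the training split only; balancing the test data would distort the reported accuracy.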
Relationship between Statistics and
Descriptive Analytics
Source: Ramesh Sharda, Dursun Delen, and Efraim Turban (2017), Business Intelligence, Analytics, and Data Science: A Managerial Perspective, 4th Edition, Pearson
Understanding the Specifics about
Box-and-Whiskers Plots
Source: Ramesh Sharda, Dursun Delen, and Efraim Turban (2017), Business Intelligence, Analytics, and Data Science: A Managerial Perspective, 4th Edition, Pearson
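As a rough sketch of the quantities behind a box-and-whiskers plot, the code below computes the quartiles, the interquartile range, and the whisker ends. It assumes the common 1.5 x IQR rule for whiskers and outliers; the sample values are made up.

```python
import statistics


def box_plot_summary(values):
    """Return the quartiles, whisker ends, and outliers used to draw a box plot."""
    data = sorted(values)
    q1, median, q3 = statistics.quantiles(data, n=4, method="inclusive")
    iqr = q3 - q1
    lower_fence = q1 - 1.5 * iqr
    upper_fence = q3 + 1.5 * iqr
    inside = [x for x in data if lower_fence <= x <= upper_fence]
    return {
        "q1": q1,
        "median": median,
        "q3": q3,
        "iqr": iqr,
        # Whiskers end at the most extreme points that are still inside the fences.
        "whisker_low": min(inside),
        "whisker_high": max(inside),
        "outliers": [x for x in data if x < lower_fence or x > upper_fence],
    }


summary = box_plot_summary([1, 2, 3, 4, 5, 6, 7, 8, 9, 30])
print(summary)
```

Different tools use slightly different quartile definitions, so the exact box edges can differ a little from what a plotting library draws.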
Relationship between
Dispersion and Shape Properties.
Source: Ramesh Sharda, Dursun Delen, and Efraim Turban (2017), Business Intelligence, Analytics, and Data Science: A Managerial Perspective, 4th Edition, Pearson
A Scatter Plot and
a Linear Regression Line
Source: Ramesh Sharda, Dursun Delen, and Efraim Turban (2017), Business Intelligence, Analytics, and Data Science: A Managerial Perspective, 4th Edition, Pearson
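The regression line drawn through a scatter plot is usually fitted by ordinary least squares. The following minimal sketch computes the slope and intercept for a single predictor; the `fit_line` name and the data points are hypothetical.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = intercept + slope * x with one predictor."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


xs = [1, 2, 3, 4, 5]
ys = [3, 5, 7, 9, 11]  # made-up points lying exactly on y = 2x + 1
slope, intercept = fit_line(xs, ys)
print(slope, intercept)
```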
A Process Flow for Developing Regression
Models.
Source: Ramesh Sharda, Dursun Delen, and Efraim Turban (2017), Business Intelligence, Analytics, and Data Science: A Managerial Perspective, 4th Edition, Pearson
The Logistic Function
Source: Ramesh Sharda, Dursun Delen, and Efraim Turban (2017), Business Intelligence, Analytics, and Data Science: A Managerial Perspective, 4th Edition, Pearson
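For reference, the logistic function maps any real number to a value between 0 and 1, which is why logistic regression can interpret its output as a probability. The sketch below is illustrative only; the coefficients `b0` and `b1` are hypothetical values, not fitted ones.

```python
import math


def logistic(z):
    """Logistic (sigmoid) function: 1 / (1 + e^(-z))."""
    return 1.0 / (1.0 + math.exp(-z))


def predict_probability(x, b0, b1):
    """Probability of the positive class for a single predictor x."""
    return logistic(b0 + b1 * x)


# Hypothetical coefficients: the decision boundary sits where b0 + b1 * x = 0, i.e. x = 2.
p_low = predict_probability(0.0, b0=-4.0, b1=2.0)
p_mid = predict_probability(2.0, b0=-4.0, b1=2.0)
p_high = predict_probability(4.0, b0=-4.0, b1=2.0)
print(p_low, p_mid, p_high)
```

This plain version can overflow for very large negative inputs; numerical libraries use a more careful formulation.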
Predicting NCAA Bowl Game Outcomes
Source: Ramesh Sharda, Dursun Delen, and Efraim Turban (2017), Business Intelligence, Analytics, and Data Science: A Managerial Perspective, 4th Edition, Pearson
A Sample Time Series of Data on
Quarterly Sales Volumes
Source: Ramesh Sharda, Dursun Delen, and Efraim Turban (2017), Business Intelligence, Analytics, and Data Science: A Managerial Perspective, 4th Edition, Pearson
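A simple way to look past seasonal swings in quarterly data is a four-quarter moving average. The sketch below assumes made-up sales figures; the `moving_average` helper is hypothetical and is not taken from the textbook.

```python
def moving_average(series, window):
    """Trailing moving average; the first (window - 1) points have no value."""
    return [
        sum(series[i - window + 1 : i + 1]) / window
        for i in range(window - 1, len(series))
    ]


# Made-up quarterly sales with a peak every fourth quarter.
quarterly_sales = [10, 12, 14, 20, 11, 13, 15, 21]
trend = moving_average(quarterly_sales, window=4)
print(trend)
```

Because each window covers exactly one full year, the seasonal peak is averaged out and the slow upward trend becomes visible.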
The Role of Information Reporting in
Managerial Decision Making
Source: Ramesh Sharda, Dursun Delen, and Efraim Turban (2017), Business Intelligence, Analytics, and Data Science: A Managerial Perspective, 4th Edition, Pearson
A Taxonomy of Charts and Graphs
Source: Ramesh Sharda, Dursun Delen, and Efraim Turban (2017), Business Intelligence, Analytics, and Data Science: A Managerial Perspective, 4th Edition, Pearson
A Gapminder Chart That Shows the Wealth
and Health of Nations
Source: Ramesh Sharda, Dursun Delen, and Efraim Turban (2017), Business Intelligence, Analytics, and Data Science: A Managerial Perspective, 4th Edition, Pearson
Magic Quadrant for Business Intelligence
and Analytics Platforms
Source: https://www.tableau.com/reports/gartner
A Storyline Visualization in Tableau Software
Source: Ramesh Sharda, Dursun Delen, and Efraim Turban (2017), Business Intelligence, Analytics, and Data Science: A Managerial Perspective, 4th Edition, Pearson
An Overview of SAS
Visual Analytics Architecture
Source: Ramesh Sharda, Dursun Delen, and Efraim Turban (2017), Business Intelligence, Analytics, and Data Science: A Managerial Perspective, 4th Edition, Pearson
A Screenshot from SAS Visual Analytics
Source: Ramesh Sharda, Dursun Delen, and Efraim Turban (2017), Business Intelligence, Analytics, and Data Science: A Managerial Perspective, 4th Edition, Pearson
A Sample Executive Dashboard
Source: Ramesh Sharda, Dursun Delen, and Efraim Turban (2017), Business Intelligence, Analytics, and Data Science: A Managerial Perspective, 4th Edition, Pearson
igraph
http://igraph.org/redirect.html
Gephi
https://gephi.org/
Discovering,
Analyzing,
Visualizing and
Presenting Data
with Python
in Google Colab
Google Colab
https://colab.research.google.com/notebooks/welcome.ipynb
The Quant Finance PyData Stack
Source: http://nbviewer.jupyter.org/format/slides/github/quantopian/pyfolio/blob/master/pyfolio/examples/overview_slides.ipynb#/5
Python
matplotlib
Source: https://matplotlib.org/
Python
Pandas
http://pandas.pydata.org/
Iris flower data set
Species: setosa, versicolor, virginica
Source: https://en.wikipedia.org/wiki/Iris_flower_data_set
Source: http://suruchifialoke.com/2016-10-13-machine-learning-tutorial-iris-classification/
Iris Classification
Source: http://suruchifialoke.com/2016-10-13-machine-learning-tutorial-iris-classification/
iris.data
https://archive.ics.uci.edu/ml/machine-learning-databases/iris/iris.data
5.1,3.5,1.4,0.2,Iris-setosa
4.9,3.0,1.4,0.2,Iris-setosa
4.7,3.2,1.3,0.2,Iris-setosa
4.6,3.1,1.5,0.2,Iris-setosa
5.0,3.6,1.4,0.2,Iris-setosa
5.4,3.9,1.7,0.4,Iris-setosa
4.6,3.4,1.4,0.3,Iris-setosa
5.0,3.4,1.5,0.2,Iris-setosa
4.4,2.9,1.4,0.2,Iris-setosa
4.9,3.1,1.5,0.1,Iris-setosa
5.4,3.7,1.5,0.2,Iris-setosa
4.8,3.4,1.6,0.2,Iris-setosa
4.8,3.0,1.4,0.1,Iris-setosa
4.3,3.0,1.1,0.1,Iris-setosa
5.8,4.0,1.2,0.2,Iris-setosa
5.7,4.4,1.5,0.4,Iris-setosa
5.4,3.9,1.3,0.4,Iris-setosa
5.1,3.5,1.4,0.3,Iris-setosa
5.7,3.8,1.7,0.3,Iris-setosa
5.1,3.8,1.5,0.3,Iris-setosa
5.4,3.4,1.7,0.2,Iris-setosa
5.1,3.7,1.5,0.4,Iris-setosa
4.6,3.6,1.0,0.2,Iris-setosa
5.1,3.3,1.7,0.5,Iris-setosa
4.8,3.4,1.9,0.2,Iris-setosa
5.0,3.0,1.6,0.2,Iris-setosa
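Each line of `iris.data` holds four measurements followed by the species label. As a minimal offline sketch, the code below parses a few of the rows shown above with the standard `csv` module. The column names match those used in the pandas code of this deck; the `parse_iris` helper is hypothetical.

```python
import csv
import io

# Column names as used in the pandas examples of this deck.
NAMES = ["sepal-length", "sepal-width", "petal-length", "petal-width", "class"]

# Three rows copied from the sample above, so no download is needed.
SAMPLE = """5.1,3.5,1.4,0.2,Iris-setosa
4.9,3.0,1.4,0.2,Iris-setosa
4.7,3.2,1.3,0.2,Iris-setosa
"""


def parse_iris(text):
    """Turn raw iris.data text into a list of dictionaries."""
    records = []
    for row in csv.reader(io.StringIO(text)):
        if not row:
            continue  # skip blank lines, which the real file may contain
        *measurements, label = row
        record = dict(zip(NAMES[:4], map(float, measurements)))
        record["class"] = label
        records.append(record)
    return records


records = parse_iris(SAMPLE)
mean_sepal_length = sum(r["sepal-length"] for r in records) / len(records)
print(records[0])
print(round(mean_sepal_length, 2))
```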
Iris Data Visualization
Source: https://seaborn.pydata.org/generated/seaborn.pairplot.html
Connect Google Colab to Google Drive
Run Jupyter Notebook in Google Colab (Python 3, GPU)
Google Colab Python Hello World
print('Hello World')
Data Visualization in Google Colab
Source: https://seaborn.pydata.org/generated/seaborn.pairplot.html
import seaborn as sns
sns.set(style="ticks", color_codes=True)
iris = sns.load_dataset("iris")
g = sns.pairplot(iris, hue="species")
Source: https://seaborn.pydata.org/generated/seaborn.pairplot.html
https://colab.research.google.com/drive/1KRqtEUd2Hg4dM2au9bfVQKrxWnWN3O9-
import numpy as np
import pandas as pd
%matplotlib inline
import matplotlib.pyplot as plt
import seaborn as sns
from pandas.plotting import scatter_matrix
# Load dataset
url = "https://archive.ics.uci.edu/ml/machine-learning-databases/iris/iris.data"
names = ['sepal-length', 'sepal-width', 'petal-length', 'petal-width', 'class']
df = pd.read_csv(url, names=names)
print(df.head(10))
print(df.tail(10))
print(df.describe())
df.info()  # info() prints its own report and returns None, so print() is not needed
print(df.shape)
print(df.groupby('class').size())
plt.rcParams["figure.figsize"] = (10,8)
df.plot(kind='box', subplots=True, layout=(2,2), sharex=False, sharey=False)
plt.show()
df.hist()
plt.show()
scatter_matrix(df)
plt.show()
sns.pairplot(df, hue="class", height=2)  # seaborn 0.9+ uses height instead of size
Source: https://machinelearningmastery.com/machine-learning-in-python-step-by-step/
import numpy as np
import pandas as pd
%matplotlib inline
import matplotlib.pyplot as plt
import seaborn as sns
from pandas.plotting import scatter_matrix
url = "https://archive.ics.uci.edu/ml/machine-learning-databases/iris/iris.data"
names = ['sepal-length', 'sepal-width', 'petal-length', 'petal-width', 'class']
df = pd.read_csv(url, names=names)
print(df.head(10))
df.tail(10)
df.describe()
df.info()  # info() prints its own report and returns None, so print() is not needed
print(df.shape)
df.groupby('class').size()
plt.rcParams["figure.figsize"] = (10,8)
df.plot(kind='box', subplots=True, layout=(2,2), sharex=False, sharey=False)
plt.show()
df.hist()
plt.show()
scatter_matrix(df)
plt.show()
sns.pairplot(df, hue="class", height=2)  # seaborn 0.9+ uses height instead of size
Summary
•Descriptive Analytics I
•Nature of Data
•Statistical Modeling
•Visualization
Descriptive Analytics II:
Business Intelligence
and Data Warehousing
Outline
• Descriptive Analytics II
• Business Intelligence
• Data Warehousing
• Data Integration and the Extraction, Transformation, and Load
(ETL) Processes
• Business Performance Management (BPM)
• Performance Measurement
• Balanced Scorecards
• Six Sigma
Relationship between Business Analytics and BI,
and BI and Data Warehousing
Source: Ramesh Sharda, Dursun Delen, and Efraim Turban (2017), Business Intelligence, Analytics, and Data Science: A Managerial Perspective, 4th Edition, Pearson
A List of Events That Led to
Data Warehousing Development
Source: Ramesh Sharda, Dursun Delen, and Efraim Turban (2017), Business Intelligence, Analytics, and Data Science: A Managerial Perspective, 4th Edition, Pearson
Characteristics of Data Warehousing
• Subject oriented
• Data are organized by detailed subject, such as sales,
products, or customers, containing only information
relevant for decision support.
• Integrated
• Integration is closely related to subject orientation.
• Time variant (time series)
• A warehouse maintains historical data.
• Nonvolatile
• After data are entered into a data warehouse, users
cannot change or update the data.
Source: Ramesh Sharda, Dursun Delen, and Efraim Turban (2017), Business Intelligence, Analytics, and Data Science: A Managerial Perspective, 4th Edition, Pearson
Data-Driven Decision Making—Business
Benefits of the
Data Warehouse
Source: Ramesh Sharda, Dursun Delen, and Efraim Turban (2017), Business Intelligence, Analytics, and Data Science: A Managerial Perspective, 4th Edition, Pearson
A Data Warehouse Framework and
Views
Source: Ramesh Sharda, Dursun Delen, and Efraim Turban (2017), Business Intelligence, Analytics, and Data Science: A Managerial Perspective, 4th Edition, Pearson
Architecture of a
Three-Tier Data Warehouse
Source: Ramesh Sharda, Dursun Delen, and Efraim Turban (2017), Business Intelligence, Analytics, and Data Science: A Managerial Perspective, 4th Edition, Pearson
Architecture of a
Two-Tier Data Warehouse
Source: Ramesh Sharda, Dursun Delen, and Efraim Turban (2017), Business Intelligence, Analytics, and Data Science: A Managerial Perspective, 4th Edition, Pearson
Architecture of
Web-Based Data Warehousing
Source: Ramesh Sharda, Dursun Delen, and Efraim Turban (2017), Business Intelligence, Analytics, and Data Science: A Managerial Perspective, 4th Edition, Pearson
5 Alternative
Data Warehouse Architectures
a. Independent data marts.
b. Data mart bus architecture
c. Hub-and-spoke architecture
d. Centralized data warehouse
e. Federated data warehouse
Source: Ramesh Sharda, Dursun Delen, and Efraim Turban (2017), Business Intelligence, Analytics, and Data Science: A Managerial Perspective, 4th Edition, Pearson
Average Assessment Scores for the
Success of the DW Architectures
Source: Ramesh Sharda, Dursun Delen, and Efraim Turban (2017), Business Intelligence, Analytics, and Data Science: A Managerial Perspective, 4th Edition, Pearson
The ETL Process
Source: Ramesh Sharda, Dursun Delen, and Efraim Turban (2017), Business Intelligence, Analytics, and Data Science: A Managerial Perspective, 4th Edition, Pearson
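To make the three ETL stages concrete, here is a small sketch using only the Python standard library. The CSV text stands in for a source system, and an in-memory SQLite database stands in for the warehouse. The table name `sales_fact`, the column names, and the cleaning rules are all assumptions made for illustration.

```python
import csv
import io
import sqlite3

# Made-up export from an operational system, with messy names and a missing amount.
RAW_CSV = """order_id,customer,amount,order_date
1, alice ,120.50,2024-01-05
2,BOB,,2024-01-06
3,Carol,75.00,2024-01-07
"""


def extract(text):
    """Extract: read raw records from the source."""
    return list(csv.DictReader(io.StringIO(text)))


def transform(rows):
    """Transform: drop incomplete rows, tidy names, and convert types."""
    cleaned = []
    for row in rows:
        if not row["amount"].strip():
            continue  # assumed rule: skip orders with no amount
        cleaned.append((
            int(row["order_id"]),
            row["customer"].strip().title(),
            float(row["amount"]),
            row["order_date"],
        ))
    return cleaned


def load(rows, conn):
    """Load: write the cleaned rows into the target table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS sales_fact "
        "(order_id INTEGER PRIMARY KEY, customer TEXT, amount REAL, order_date TEXT)"
    )
    conn.executemany("INSERT INTO sales_fact VALUES (?, ?, ?, ?)", rows)
    conn.commit()


conn = sqlite3.connect(":memory:")
load(transform(extract(RAW_CSV)), conn)
loaded = conn.execute(
    "SELECT customer, amount FROM sales_fact ORDER BY order_id"
).fetchall()
print(loaded)
```

Keeping the three stages as separate functions mirrors the way ETL tools let each stage be scheduled, tested, and monitored on its own.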
Sample List of
Data Warehousing Vendors
Source: Ramesh Sharda, Dursun Delen, and Efraim Turban (2017), Business Intelligence, Analytics, and Data Science: A Managerial Perspective, 4th Edition, Pearson
Contrasts between the DM and EDW
Development Approaches
Source: Ramesh Sharda, Dursun Delen, and Efraim Turban (2017), Business Intelligence, Analytics, and Data Science: A Managerial Perspective, 4th Edition, Pearson
Essential Differences between
Inmon’s and Kimball’s Approaches
Source: Ramesh Sharda, Dursun Delen, and Efraim Turban (2017), Business Intelligence, Analytics, and Data Science: A Managerial Perspective, 4th Edition, Pearson
Representation of Data in
Data Warehouse
(1) Star Schema (2) Snowflake Schema
Source: Ramesh Sharda, Dursun Delen, and Efraim Turban (2017), Business Intelligence, Analytics, and Data Science: A Managerial Perspective, 4th Edition, Pearson
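A star schema places one central fact table among several denormalized dimension tables. The sketch below builds a tiny example in an in-memory SQLite database and runs a typical query that joins the fact table to its dimensions. All table names, column names, and rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Dimension tables describe the "who, what, when" of each fact.
CREATE TABLE dim_product (product_id INTEGER PRIMARY KEY, name TEXT, category TEXT);
CREATE TABLE dim_date (date_id INTEGER PRIMARY KEY, year INTEGER, quarter TEXT);

-- The fact table holds the measures plus one foreign key per dimension.
CREATE TABLE fact_sales (
    product_id INTEGER REFERENCES dim_product(product_id),
    date_id INTEGER REFERENCES dim_date(date_id),
    units INTEGER,
    revenue REAL
);

INSERT INTO dim_product VALUES
    (1, 'Laptop', 'Hardware'), (2, 'Mouse', 'Hardware'), (3, 'Antivirus', 'Software');
INSERT INTO dim_date VALUES (1, 2024, 'Q1'), (2, 2024, 'Q2');
INSERT INTO fact_sales VALUES
    (1, 1, 2, 2000.0), (2, 1, 10, 200.0), (3, 1, 5, 250.0), (1, 2, 1, 1000.0);
""")

# Typical star-schema query: join facts to dimensions, then aggregate.
revenue_by_category = conn.execute("""
    SELECT p.category, d.quarter, SUM(f.revenue)
    FROM fact_sales AS f
    JOIN dim_product AS p ON p.product_id = f.product_id
    JOIN dim_date AS d ON d.date_id = f.date_id
    GROUP BY p.category, d.quarter
    ORDER BY p.category, d.quarter
""").fetchall()
print(revenue_by_category)
```

In a snowflake schema the `category` column would be moved out of `dim_product` into its own table, which adds one more join to the same query.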
A Comparison between
OLTP and OLAP
Source: Ramesh Sharda, Dursun Delen, and Efraim Turban (2017), Business Intelligence, Analytics, and Data Science: A Managerial Perspective, 4th Edition, Pearson
Slicing Operations on a Simple
Three-Dimensional Data Cube
Source: Ramesh Sharda, Dursun Delen, and Efraim Turban (2017), Business Intelligence, Analytics, and Data Science: A Managerial Perspective, 4th Edition, Pearson
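Slicing fixes one dimension of the cube to a single value and returns the remaining two-dimensional view. The sketch below represents a small cube as a dictionary keyed by (product, region, quarter). The dimension names and the numbers are made up for illustration.

```python
# Hypothetical sales cube: (product, region, quarter) -> units sold.
cube = {
    ("Laptop", "North", "Q1"): 10,
    ("Laptop", "South", "Q1"): 7,
    ("Laptop", "North", "Q2"): 12,
    ("Mouse", "North", "Q1"): 40,
    ("Mouse", "South", "Q2"): 35,
}
DIMENSIONS = ("product", "region", "quarter")


def slice_cube(cube, dimension, value):
    """Keep only cells where `dimension` equals `value`, then drop that dimension."""
    axis = DIMENSIONS.index(dimension)
    return {
        key[:axis] + key[axis + 1:]: measure
        for key, measure in cube.items()
        if key[axis] == value
    }


q1_slice = slice_cube(cube, "quarter", "Q1")          # product x region for Q1
laptop_slice = slice_cube(cube, "product", "Laptop")  # region x quarter for laptops
print(q1_slice)
print(laptop_slice)
```

OLAP servers precompute and index such views; the dictionary version only illustrates the idea of the operation.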
Business Performance Management (BPM)
Closed-Loop BPM Cycle
1. Strategize
• Where do we want to go?
2. Plan
• How do we get there?
3. Monitor/Analyze
• How are we doing?
4. Act and Adjust
• What do we need to do differently?
Source: Ramesh Sharda, Dursun Delen, and Efraim Turban (2017), Business Intelligence, Analytics, and Data Science: A Managerial Perspective, 4th Edition, Pearson
Four Perspectives in
Balanced Scorecard Methodology
Source: Ramesh Sharda, Dursun Delen, and Efraim Turban (2017), Business Intelligence, Analytics, and Data Science: A Managerial Perspective, 4th Edition, Pearson
Comparison of the
Balanced Scorecard and Six Sigma
Source: Ramesh Sharda, Dursun Delen, and Efraim Turban (2017), Business Intelligence, Analytics, and Data Science: A Managerial Perspective, 4th Edition, Pearson
Six Sigma
The DMAIC Performance Model
• Define
• Measure
• Analyze
• Improve
• Control
Source: Ramesh Sharda, Dursun Delen, and Efraim Turban (2017), Business Intelligence, Analytics, and Data Science: A Managerial Perspective, 4th Edition, Pearson
The Joy of Stats:
200 Countries, 200 Years, 4 Minutes
https://www.youtube.com/watch?v=jbkSRLYSojo
Python Data Science Handbook
in Google Colab
https://colab.research.google.com/github/jakevdp/PythonDataScienceHandbook/blob/master/notebooks/Index.ipynb
Summary
• Descriptive Analytics II
• Business Intelligence
• Data Warehousing
• Data Integration and the Extraction, Transformation, and Load
(ETL) Processes
• Business Performance Management (BPM)
• Performance Measurement
• Balanced Scorecards
• Six Sigma