MIS Week 2
Saini Das
Vinod Gupta School of Management, IIT Kharagpur
• Poor security
A single human resources database provides many different views of data, depending on the
information requirements of the user. Illustrated here are two possible views, one of interest to a
benefits specialist and one of interest to a member of the company’s payroll department.
Relational DBMS
• Relational DBMS
• Represents data as two-dimensional tables called relations (also referred to as files)
• Each table contains data on an entity and its attributes
• Examples: Microsoft Access, Oracle Database, MySQL, Microsoft SQL Server.
• Table: Grid of columns and rows
• Rows (tuples): Records for different entities
• Fields (columns): Represent attributes of the entity
• Primary key: Field in a table that uniquely identifies each record (the key field)
• Foreign key: Primary key of one table used in a second table as a look-up field to
identify records from the original table
Relational Database Tables
A relational database organizes data in the form of two-dimensional tables. Illustrated here are tables for
the entities SUPPLIER and PART showing how they represent each entity and its attributes.
Supplier_Number is a primary key for the SUPPLIER table and a foreign key for the PART table.
Operations of a Relational DBMS
The select, project, and join operations enable data from two different tables to be
combined and only selected attributes to be displayed.
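The three operations can be sketched in plain Python. This is a minimal illustration, not how a DBMS implements them; the sample rows, the supplier and part names, and the helper functions `select`, `project`, and `join` are all hypothetical and do not reproduce the slide's actual tables.

```python
# Hypothetical sample rows (not the slide's actual SUPPLIER and PART data).
SUPPLIER = [
    {"Supplier_Number": 1, "Supplier_Name": "Supplier A", "Supplier_City": "Kolkata"},
    {"Supplier_Number": 2, "Supplier_Name": "Supplier B", "Supplier_City": "Pune"},
]
PART = [
    {"Part_Number": 137, "Part_Name": "Door latch", "Supplier_Number": 1},
    {"Part_Number": 145, "Part_Name": "Side mirror", "Supplier_Number": 2},
    {"Part_Number": 150, "Part_Name": "Door moulding", "Supplier_Number": 2},
]


def select(rows, predicate):
    """SELECT: keep only the rows (records) that satisfy a condition."""
    return [row for row in rows if predicate(row)]


def join(left, right, key):
    """JOIN: combine rows from two tables that share the same key value."""
    return [{**l, **r} for l in left for r in right if l[key] == r[key]]


def project(rows, columns):
    """PROJECT: keep only the listed columns (attributes)."""
    return [{c: row[c] for c in columns} for row in rows]


wanted_parts = select(PART, lambda r: r["Part_Number"] in (137, 150))
combined = join(wanted_parts, SUPPLIER, "Supplier_Number")
result = project(
    combined,
    ["Part_Number", "Part_Name", "Supplier_Number", "Supplier_Name"],
)

for row in result:
    print(row)
```

Here Supplier_Number plays the role described above: it is the shared field on which the two tables are joined.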
Capabilities of DBMS
SELECT: Lists the desired fields that have to be included in the query
FROM: Lists the tables from where the data has to be drawn
WHERE: Specifies the conditions (for example, required field values) that a
record must meet to be included in the result.
Illustrated here are the SQL statements for a query to select suppliers for parts 137 or 150
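Since the slide image is not reproduced here, the following is a sketch of what such a query could look like, run against an in-memory SQLite database from Python. The table layouts follow the SUPPLIER and PART entities named earlier, but the column list, supplier names, and part names are assumptions made for illustration.

```python
import sqlite3

# In-memory database with hypothetical rows; only the part numbers 137 and 150
# come from the slide text.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE SUPPLIER (
        Supplier_Number INTEGER PRIMARY KEY,
        Supplier_Name   TEXT
    );
    CREATE TABLE PART (
        Part_Number     INTEGER PRIMARY KEY,
        Part_Name       TEXT,
        Supplier_Number INTEGER REFERENCES SUPPLIER (Supplier_Number)
    );
    INSERT INTO SUPPLIER VALUES (1, 'Supplier A'), (2, 'Supplier B');
    INSERT INTO PART VALUES
        (137, 'Door latch', 1),
        (145, 'Side mirror', 2),
        (150, 'Door moulding', 2);
    """
)

# SELECT lists the fields, FROM lists the tables, WHERE gives the conditions.
query = """
SELECT PART.Part_Number, PART.Part_Name,
       SUPPLIER.Supplier_Number, SUPPLIER.Supplier_Name
FROM PART, SUPPLIER
WHERE PART.Supplier_Number = SUPPLIER.Supplier_Number
  AND (PART.Part_Number = 137 OR PART.Part_Number = 150)
ORDER BY PART.Part_Number
"""
sql_rows = conn.execute(query).fetchall()
for row in sql_rows:
    print(row)
```

The parentheses around the OR condition matter: without them, AND binds more tightly than OR and part 150 would be paired with every supplier.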
References
Cardinality of relationship
E-R (Entity-Relationship) Diagrams (contd..)
Normalization
Streamlining complex groupings of data to minimize redundant
data elements.
Students Engg. Mechanics
Roll No.   Name     Dept.        HoD       Dept. Contact no.
101        Sachin   Electrical   Prof. X   1234567
102        Rahul    Mechanical   Prof. Y   4567899
103        Saurav   Electronics  Prof. Z   6789048
104        Virat    Mechanical   Prof. Y   4567899
105        Dhoni    Electrical   Prof. X   1234567
106        Anil     Mechanical   Prof. Y   4567899
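In the table above, the HoD and contact number are repeated for every student of the same department. A minimal sketch of normalizing it is to split it into a student table and a department table linked by the department name. The variable names are illustrative, and using the department name as the key is an assumption made for this example.

```python
# Rows copied from the unnormalized table:
# (roll_no, name, dept, hod, dept_contact_no)
flat = [
    (101, "Sachin", "Electrical", "Prof. X", "1234567"),
    (102, "Rahul", "Mechanical", "Prof. Y", "4567899"),
    (103, "Saurav", "Electronics", "Prof. Z", "6789048"),
    (104, "Virat", "Mechanical", "Prof. Y", "4567899"),
    (105, "Dhoni", "Electrical", "Prof. X", "1234567"),
    (106, "Anil", "Mechanical", "Prof. Y", "4567899"),
]

# Student table: dept acts as a foreign key into the department table.
students = [(roll, name, dept) for roll, name, dept, _, _ in flat]

# Department table: one row per department, so HoD and contact number
# are stored once instead of once per student.
departments = {}
for _, _, dept, hod, contact in flat:
    departments[dept] = (hod, contact)

# Joining the two tables reproduces the original rows, so nothing was lost.
rebuilt = [(roll, name, dept, *departments[dept]) for roll, name, dept in students]

print(len(flat), "flat rows ->", len(students), "students +", len(departments), "departments")
```

After the split, a change of HoD or contact number is made in one place rather than in every student row.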
Normalization (contd..)
• Very large organizations with huge amounts of data felt the need for:
• Consolidating much of the data from various databases into a whole that
could be understood clearly.
• Focusing on the use of data for decision making, as opposed to simply for
running transactions.
Using Databases to Improve Business Performance &
Decision Making (contd..)
BI
The data warehouse extracts current and historical data from multiple operational systems inside the
organization. These data are combined with data from external sources and reorganized into a central
database designed for management reporting and analysis. The information directory provides users
with information about the data available in the warehouse.
Business Intelligence
• Tools for consolidating, analyzing, and providing access to vast
amounts of data to help users make better business decisions
• E.g. Harrah’s Entertainment analyzes customer data to develop gambling profiles and
identify its most profitable customers
• Principal tools to derive BI from data in a warehouse include:
• Software for database querying and reporting
• Online analytical processing (OLAP)
• Data mining
What drives Business Intelligence?
Online analytical processing (OLAP)
• Supports multidimensional data analysis
• Enables viewing data using multiple dimensions
• Each aspect of information (product, pricing, cost, region,
time period) is different dimension
• E.g. How many cycles were sold in Eastern India in June?
• OLAP enables rapid, online answers to ad hoc queries
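A question such as the one about cycle sales can be sketched as a filter over fact rows that carry one value per dimension. This is a toy illustration rather than a real OLAP engine; the sales figures, the dimension names, and the helpers `units_sold` and `roll_up` are hypothetical.

```python
from collections import defaultdict

# Hypothetical fact rows: (product, region, month, units_sold)
sales = [
    ("cycle", "East", "June", 120),
    ("cycle", "East", "July", 90),
    ("cycle", "West", "June", 75),
    ("helmet", "East", "June", 40),
    ("cycle", "East", "June", 30),
]
DIMENSIONS = ("product", "region", "month")


def units_sold(facts, **filters):
    """Total units for the cells matching the given dimension values."""
    total = 0
    for *dims, units in facts:
        row = dict(zip(DIMENSIONS, dims))
        if all(row[d] == v for d, v in filters.items()):
            total += units
    return total


def roll_up(facts, dimension):
    """Aggregate units along a single dimension."""
    idx = DIMENSIONS.index(dimension)
    totals = defaultdict(int)
    for fact in facts:
        totals[fact[idx]] += fact[-1]
    return dict(totals)


# "How many cycles were sold in the East in June?"
june_east_cycles = units_sold(sales, product="cycle", region="East", month="June")
by_region = roll_up(sales, "region")
print(june_east_cycles, by_region)
```

Dropping a filter (for example, leaving out the month) answers a broader question over the same data, which is the sense in which each aspect of the information is a separate dimension.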
Multidimensional Data Model
10 Insights: A first look at the new intelligent enterprise survey on winning with data,
MIT Sloan Management Review, Vol 52, No 1, 2010
“Data Scientist will be the sexiest job of the 21st century”
[Figure: knowledge discovery process. Stages shown: Raw Data, Data Warehouse, Target Data,
Transformed Data, Patterns and Rules, Knowledge. Step labels shown: Integration,
Interpretation & Evaluation, Understanding.]
DM is multidisciplinary
[Figure: Data Mining shown alongside its contributing disciplines; legible label: Database systems]
Data Mining Applications
Typical Applications
• Customer Segmentation: What are my market segments and who are my customers in each segment?
• Propensity to Buy: Which customers are good candidates for our new long distance calling plans?
• Profitability Modeling & Profiling: What is the life time profitability of my customers?
• Customer Attrition: Which customers are at risk of leaving?
• Channel Optimization: What is the best channel to reach my customers in each market segment?
• Fraud Detection: How can I tell which transactions are likely to be fraudulent?
  Detect and prevent fraud to minimize loss.
[Slide graphic: business benefit of each application. Legible phrases: targeting customers based
on their needs; personalize customer relationships; prevent loss of high value customers;
higher satisfaction, greater loyalty, higher retention, profitability.]
Industry wide applications of analytics
Some other applications….
• Marketing
• Which customers are likely to respond to this campaign?
• Which customers are likely to be profitable ?
• Who might want to buy this product ? (‘Cross-selling’)
• Which web-pages are customers visiting before buying products or
before leaving our site ? Which types of customers visit which pages ?
• Telecommunications
• Which customers are vulnerable to attrition (at risk of churning) ?
• Based on these symptoms, where are problems located in the network ?
• Finance and Insurance
• Which customers are credit risks / insurance risks ?
• Which claims or credit transactions are fraudulent ?
• Which stocks are likely to perform well in the next 3 months, and why ?
When should we buy and sell, given the likely performance and
transaction costs ?
Some other applications (Contd..)
• Healthcare
• Which patients may take longer to recover ?
• What is the likely cause of this illness ?
• Which patients are at risk of disease (and might benefit from
medication)? Pfizer pharmaceuticals used data mining to construct a
predictive model that was then embedded in their online cholesterol
health risk assessment, which tells patients their cholesterol risk score.
High risk patients can consult their doctors and request Lipitor, Pfizer’s
cholesterol medication.
• Retail
• Which products do customers buy together (or in sequence) & which do
they not buy together ? (‘Category management’.)
• What characterizes customers at various stores ?
• What items are bought for cash, on credit, or by check ?
• What type of customer buys this item, or this product type ?
Data Mining Applications (Contd..)
• Quality Control
• Which shipments are high-risk and need to be inspected ?
• Customer Support
• Which task schedule (ordering) is optimal (or good enough) ?
• Which customer service representative should I assign to a task ?
• What documents or people are likely to be helpful to the customer in
solving their problem ?
Real-life Applications…
At Flipkart…
Target predicts customer pregnancy
https://www.youtube.com/watch?v=XH1wQEgROg4
Categories of Data Analytics
Descriptive Analytics Applications
Diagnostic Analytics Applications
Predictive Analytics Applications
Prescriptive Analytics Applications
Analytics Tools & Techniques
[Diagram: Data Analytics. Labels shown: Classification, Decision tree, Neural nets, SVM,
Logistic Regression, Association Rule Mining]
Descriptive Analytics Techniques
Data Visualization: Graphical representation of data. By using visual elements
like charts, graphs, and maps, data visualization tools provide an accessible way
to see and understand trends, outliers, and patterns in data.
Software for Data Visualization: MS Excel, Power BI, Tableau, QlikView
Descriptive Analytics Techniques (contd..)
[Figure: clustering. Legible label: intercluster distances are maximized]
Predictive Analytics Techniques
Regression analysis: A statistical technique used to describe relationships
among variables. The simplest case to examine is one in which a variable Y,
referred to as the dependent or target variable, may be related to one
variable X, called an independent or explanatory variable.
Applications:
Predicting events that are yet to occur:
Demand analysis or number of units consumers will purchase in the next
quarter based on past trends
Number of shoppers who will watch a particular advertisement.
Number of policy holders who will be involved in accidents in the next year.
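The simplest case described above, one target Y and one explanatory variable X, can be sketched with an ordinary least-squares line fit. The quarterly demand figures and the function name `fit_line` are hypothetical; least squares is one common way to fit such a line, not necessarily the method used in the lecture.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Hypothetical past demand: units sold in quarters 1..5.
quarters = [1, 2, 3, 4, 5]
units = [108, 123, 129, 142, 148]

slope, intercept = fit_line(quarters, units)

# Predict an event that is yet to occur: demand in quarter 6.
forecast = intercept + slope * 6
print(f"units = {intercept:.1f} + {slope:.1f} * quarter; quarter 6 forecast = {forecast:.1f}")
```

The fitted slope is the estimated change in units per quarter, and the forecast extends the past trend one quarter ahead.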
Predictive Analytics Techniques (contd..)
• Classification: Data defined in terms of attributes, one of which is the class or
target and others are predictor variables.
• Find a model for class/target attribute as a function of the values of
other(predictor) attributes, such that previously unseen records can be
assigned a class as accurately as possible. Given data is usually divided into
training and test sets.
• Training Data: used to build the model
• Test data: used to validate the model (determine accuracy of the model).
• Classification Techniques:
• Decision tree
• Neural Networks
• Support Vector Machines
• Bayesian Classifier
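The train/test workflow above can be sketched with a decision stump, which is a decision tree with a single split and so the simplest member of the decision tree family. The records, the feature meanings, and the function names are hypothetical; this is an illustration of the workflow under those assumptions, not a production classifier.

```python
import random


def train_stump(rows):
    """Pick the (feature, threshold, labels) split with the most correct training rows.

    rows: list of (features, label), numeric features, labels in {0, 1}.
    """
    best = None
    for f in range(len(rows[0][0])):
        for threshold in sorted({feats[f] for feats, _ in rows}):
            for left_label in (0, 1):
                right_label = 1 - left_label
                correct = sum(
                    (left_label if feats[f] <= threshold else right_label) == label
                    for feats, label in rows
                )
                if best is None or correct > best[0]:
                    best = (correct, f, threshold, left_label, right_label)
    return best[1:]


def predict(model, feats):
    f, threshold, left_label, right_label = model
    return left_label if feats[f] <= threshold else right_label


def accuracy(model, rows):
    return sum(predict(model, feats) == label for feats, label in rows) / len(rows)


# Hypothetical records: features = (income in thousands, existing loans);
# label 1 = defaulted, 0 = repaid.
records = [
    ((25, 3), 1), ((30, 2), 1), ((35, 4), 1), ((40, 1), 1), ((45, 2), 1),
    ((60, 1), 0), ((70, 0), 0), ((80, 2), 0), ((90, 1), 0), ((100, 0), 0),
]
random.Random(0).shuffle(records)       # fixed seed so the split is repeatable
split = int(0.7 * len(records))
train, test = records[:split], records[split:]

model = train_stump(train)              # training data: build the model
train_acc = accuracy(model, train)
test_acc = accuracy(model, test)        # test data: validate on unseen records
print("model:", model, "train accuracy:", train_acc, "test accuracy:", test_acc)
```

Accuracy on the test set, not the training set, is the estimate of how well previously unseen records will be classified.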
Predictive Analytics Techniques (contd..)
Application 1: Given historical data about fraudulent customers and their background, predict whether a new
customer will commit fraud or not.
[Figure: training set table with columns Tid, Refund, Marital Status, Taxable Income (predictor
attributes) and Cheat (class/target), used to build a classifier]
Predictive Analytics Techniques (contd..)
• Application 2: Given historical data about customers and payments, predict a new
applicant’s loan eligibility.
Predictive Analytics Techniques (contd..)
• Association Rule Mining: Finding frequent patterns:
• Analysing item sets in customers’ baskets or transactions and identifying
frequently occurring item sets, which can be the basis for recommendations.
• Known as Market Basket Analysis in Retail.
Applications:
Store layout: Place items that are frequently bought together either in
close proximity or as far apart as possible; give combo offers.
Warehousing and inventory management.
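Finding frequent item sets can be sketched by counting how often pairs of items occur together in baskets. The baskets, the 40% support threshold, and the function names are hypothetical; support and confidence are the standard measures used in market basket analysis, and real tools extend the same idea to larger item sets.

```python
from collections import Counter
from itertools import combinations

# Hypothetical customer baskets (one set of items per transaction).
baskets = [
    {"bread", "milk"},
    {"bread", "butter", "milk"},
    {"bread", "butter"},
    {"milk", "eggs"},
    {"bread", "butter", "eggs"},
]


def frequent_pairs(transactions, min_support):
    """Item pairs whose support (share of baskets containing both) meets the threshold."""
    counts = Counter()
    for basket in transactions:
        for pair in combinations(sorted(basket), 2):
            counts[pair] += 1
    n = len(transactions)
    return {pair: c / n for pair, c in counts.items() if c / n >= min_support}


def confidence(transactions, antecedent, consequent):
    """Of the baskets containing the antecedent, the share that also contain the consequent."""
    with_antecedent = [b for b in transactions if antecedent in b]
    if not with_antecedent:
        return 0.0
    return sum(consequent in b for b in with_antecedent) / len(with_antecedent)


pairs = frequent_pairs(baskets, min_support=0.4)
butter_to_bread = confidence(baskets, "butter", "bread")
bread_to_butter = confidence(baskets, "bread", "butter")
print(pairs, butter_to_bread, bread_to_butter)
```

A pair with high support and high confidence, such as butter and bread in this toy data, is the kind of pattern that could drive a combo offer or a shelf-placement decision.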
THANK YOU!