Business Analytics - Study Notes
SYLLABUS
BA4206 - BUSINESS ANALYTICS
COURSE OBJECTIVES:
Learn to
1. Use business analytics for decision making
2. Apply the appropriate analytics and generate solutions
3. Model and analyse business situations using analytics.
REFERENCES
1. Marc J. Schniederjans, Dara G. Schniederjans and Christopher M. Starkey,
"Business Analytics Principles, Concepts, and Applications - What, Why,
and How", Pearson Education, 2014.
2. S. Christian Albright and Wayne L. Winston, "Business Analytics - Data
Analysis and Decision Making", Fifth Edition, Cengage Learning, 2015.
3. James R. Evans, "Business Analytics - Methods, Models and Decisions",
Pearson Education, 2012.
===============================================================
INDEX
What is business analytics?
What are the concepts of business analytics?
What are the components of data analytics?
TERMINOLOGIES IN BUSINESS ANALYTICS
ANALYTICS TERMS
DEFINITIONS FOR THE TECHNOLOGIES AND FEATURES USED IN
ANALYTICS.
DATA CONCEPTS
HIGH-LEVEL PRACTICES AND METHODOLOGIES USED
WITH DATA.
DATA TEAM
ROLES OF THOSE COMMONLY INCLUDED IN A DATA PROJECT’S
TEAM.
RI TERMS
DEFINITIONS FOR TERMS AND CONCEPTS USED
AT RECONINSIGHT.
RI360
FEATURES UNIQUE TO RECONINSIGHT’S BUSINESS
ANALYTICS SOFTWARE, RI360.
What are the three main components of business analytics?
What is business analytics why it is required?
What are the key components of an analytics model?
What are the core areas of modern analytics?
Business analytics (BA)
Examples of Business Analytics Application
BASIC DOMAINS WITHIN ANALYTICS
Mathematical Models
How HR analytics helps Human Resource Management
WHAT IS BIG DATA
UNIT III
DESCRIPTIVE ANALYTICS
What is descriptive analytics?
5 EXAMPLES OF DESCRIPTIVE ANALYTICS
USING DATA TO IDENTIFY RELATIONSHIPS AND TRENDS
SAMPLE SIZE
WHAT IS A VARIABLE?
Variables can be classified in various ways
Quantitative vs qualitative variables
Dependent and independent variables
Descriptive Statistics
DESCRIPTIVE STATISTICS CAN BE BROADLY PUT UNDER
TWO CATEGORIES:
Summary statistics
PROBABILITY DISTRIBUTION FOR DESCRIPTIVE ANALYTICS
What is descriptive probability?
What is the purpose of descriptive statistics?
Probability - terminologies.
Bernoulli Trials
Probability Mass Function
Normal Distribution
UNIT IV
PREDICTIVE ANALYTICS
What is meant by predictive analytics?
How do you develop a predictive model?
THE SIX STEPS OF THE PREDICTIVE ANALYTICS PROCESS
The process of Predictive Analytics includes the following steps:
What can you do with Predictive Analytics?
INDUSTRIES USING PREDICTIVE ANALYTICS MODEL
Advantages of Predictive Analytics
UNIT V
PRESCRIPTIVE ANALYTICS
INTRODUCTION TO PRESCRIPTIVE ANALYTICS
What Is Prescriptive Analytics?
BENEFITS OF PRESCRIPTIVE ANALYTICS
OTHER AREAS OF PRESCRIPTIVE ANALYSIS APPLICATION
Prescriptive Analytics Software
Prescriptive Analytics Packages
Optimization Platforms
Advantages of optimization platforms
Disadvantages of optimization platforms
Problems That Can Be Solved by Prescriptive Analytics
Data optimization.
ANALYTICS TERMS
DEFINITIONS FOR THE TECHNOLOGIES AND FEATURES USED IN ANALYTICS.
Data Visualization is the science of deriving meaning from data sets by using
graphical and other non-tabular presentations. Examples of traditional data
visualizations include line series charts, pie charts, and column charts.
Research over the past decades has moved data visualization forward
rapidly with the advent of new graphical representations. Impactful data
visualization thought leaders include Edward Tufte and Stephen Few.
Database is a computer system (cloud or on-prem) that stores data in a
persistent state, typically to be retrieved and modified by other software. In
analytics, reporting and analysis software read, interpret, and display
database data.
Geospatial Analytics associates data with a location. It is a type of data
visualization that is often used to overlay data onto digital maps. The data
is represented as data points or data summaries of geographic regions.
Pivot Table summarizes information extracted from large, detailed datasets.
Multi-level pivots allow you to create hierarchies within the pivots to derive
meaning. The term ‘pivot’ comes from a numeric value ‘pivoting’ on a
discrete list. For example, the total number of cars pivoted against car
model will summarize how many cars align with each car model. Pivot table
data can be displayed as a table, pie chart, bar chart, or column chart.
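To make the car-model pivot above concrete, here is a minimal sketch using pandas (an assumption; the text names no particular tool), with invented sample data and column names:

import pandas as pd

# Hypothetical sales records: one row per car sold
cars = pd.DataFrame({
    "model": ["Sedan", "SUV", "Sedan", "Hatchback", "SUV", "SUV"],
    "region": ["North", "North", "South", "South", "North", "South"],
    "units": [1, 1, 1, 1, 1, 1],
})

# Pivot the total number of cars against car model (and region)
pivot = pd.pivot_table(cars, values="units", index="model",
                       columns="region", aggfunc="sum", fill_value=0)
print(pivot)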
Time-Series Forecasting applies statistical modeling to a time series and
projects it into future time periods.
Trends typically refer to a time-series data trend. Time-series trends
address the change of a data value through multiple time periods.
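A minimal sketch of both ideas, assuming a simple linear trend fitted with NumPy (the text prescribes no particular model); the monthly figures are invented:

import numpy as np

# Hypothetical monthly sales for twelve past periods
sales = np.array([100, 104, 103, 110, 112, 118, 117, 123, 127, 130, 133, 138])
periods = np.arange(len(sales))

# Fit a linear trend to the series: slope = change per period
slope, intercept = np.polyfit(periods, sales, deg=1)

# Project the trend into the next three future periods
future = np.arange(len(sales), len(sales) + 3)
forecast = slope * future + intercept
print(f"trend: {slope:.2f} units/period; forecast: {forecast.round(1)}")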
User Interface represents any visual interface that a user of a technology
interacts with during use. At ReconInsight we also use this to describe the
last part of a data pipeline, which is often represented by data analytics
software.
DATA CONCEPTS
HIGH-LEVEL PRACTICES AND METHODOLOGIES USED WITH DATA.
Artificial Intelligence (AI) is a broad term for using vast data sets to provide a
high level of understanding and sometimes a higher level of consciousness.
Within the realm of analytics, artificial intelligence can apply to machine
learning.
DATA TEAM
ROLES OF THOSE COMMONLY INCLUDED IN A DATA PROJECT’S TEAM.
RI TERMS
DEFINITIONS FOR TERMS AND CONCEPTS USED AT RECONINSIGHT.
Data Master is used to describe members of the ReconInsight data team. Our
data masters are seasoned information professionals who understand data
from soup to nuts. Data Masters have the unique ability to work with both
business users and technical users. They work directly with clients to
understand business needs, model data, and implement a world class data
pipeline using Ri360 technology.
RI360
FEATURES UNIQUE TO RECONINSIGHT’S BUSINESS ANALYTICS
SOFTWARE, RI360.
Collector is an Ri360-specific technology feature that stores data for a
specific business purpose. For example, an Ri360 collector can store all sales
transactions, a financial system’s general ledger, or all shipments. Ri360
automatically builds the data storage using a meta-data definition. Data
processing is developed for each Ri360 collector to ensure it contains up-
to-date information. Ri360’s user interface provides an automated data
exploration interface. Ri360 allows users to create information dashboards
using collector data. A single Ri360 collector, coupled with the Ri360 user
interface, can provide/replace dozens of traditional reports.
Collector Datasheet is the tabular representation of data stored in an Ri360
collector snapshot.
Collector Snapshot is a single instance of an Ri360 Collector. A single
instance may represent all the data for a time period. Snapshots are
created regularly and automatically over time. This consistency and
repetition provide a steady and reliable decision-making platform.
Data Levels are an Ri360-specific technology feature that represent
cardinality relationships within an Ri360 Collector Datasheet. A single data
level can be represented in a single tabular view. When a datasheet
contains one or more parent-child relationships, multiple data levels are
necessary to represent the business data.
Data levels enable users to easily drill into the details of objects with
parent-child relationships.
=================================================================
What are the three main components of business analytics?
There are three types of analytics that businesses use to drive their
decision making: descriptive analytics, which tells us what has already
happened; predictive analytics, which shows us what could happen; and
prescriptive analytics, which informs us what should happen in the
future.
(1) Descriptive Analytics
is the examination of data or content, usually manually performed, to
answer the question “What happened?” (or What is happening?),
characterized by traditional business intelligence (BI) and visualizations such
as pie charts, bar charts, line graphs, tables, or generated narratives.
(2) Predictive analytics
is a branch of advanced analytics that makes predictions about future
outcomes using historical data combined with statistical modeling, data mining
techniques and machine learning. Companies employ predictive analytics to
find patterns in this data to identify risks and opportunities.
(3) Prescriptive Analytics
is a form of advanced analytics which examines data or content to answer
the question “What should be done?” or “What can we do to make it
happen?”, and is characterized by techniques such as graph analysis,
simulation, complex event processing, neural networks, recommendation
engines.
What are the key components of an analytics model?
1) Data Component,
2) Algorithm Component,
3) Real World Component, and
4) Ethical Component. Knowledge from data science training courses is
necessary for acquiring skills in Components 1 and 2 (Data Component and
Algorithm Component).
Business analytics
makes extensive use of analytical modeling and numerical analysis,
including explanatory and predictive modeling,[2] and fact-based
management to drive decision making. It is therefore closely related to
management science. Analytics may be used as input for human decisions
or may drive fully automated decisions. Business intelligence is querying,
reporting, online analytical processing (OLAP), and "alerts."
In other words, querying, reporting, OLAP, and alert tools can answer
questions such as what happened, how many, how often, where the problem
is, and what actions
are needed. Business analytics can answer questions like why is this
happening, what if these trends continue, what will happen next (predict),
and what is the best outcome that can happen (optimize)
Risk & Credit analytics
Mathematical Models
• Hypothesis Testing
• Correlation, Regression, Forecasting
• Sampling
• Queueing Theory and Simulation
• Linear Programming
• Network Optimization
• Dynamic Programming
• Nonlinear Optimization
• Game Theory
• Decision Trees
• System Dynamics
• Markov Chains and Hidden Markov Models
• Bayesian Statistics
Keywords
Business Intelligence (BI):
Business Intelligence (BI) is a set of theories, methodologies, processes,
architectures, and technologies that transform raw data into meaningful
and useful information for business purposes.
1. Data Extraction:
Data extraction is the act or process of retrieving data out of (usually
unstructured or poorly structured) data sources for further data
processing or data storage (data migration).
2. Data Mining (DM):
CHALLENGES
Business analytics depends on sufficient volumes of high-quality data.
The difficulty in ensuring data quality is integrating and reconciling data across
different systems, and then deciding what subsets of data to make available.[3]
Previously, analytics was considered a type of after-the-fact method
of forecasting consumer behavior by examining the number of units sold in
the last quarter or the last year.
This type of data warehousing required a lot more storage space than it did
speed. Now business analytics is becoming a tool that can influence the
outcome of customer interactions.
When a specific customer type is considering a purchase, an analytics-
enabled enterprise can modify the sales pitch to appeal to that consumer.
This means the storage space for all that data must react extremely fast to
provide the necessary data in real-time.
================================================================
4. Improve Efficiency
Efficiency is not always limited to employees. Businesses can also analyse
other resources to learn more about their performance. For example, a
grocery store chain was able to reduce refrigeration costs by merely
analysing the temperatures of in-store coolers. It was found that the
refrigerators were being kept several degrees lower than necessary, which
increased power usage. So, by increasing the temperature, power costs
went down without affecting safe food storage. Business owners can learn
from such examples and use data to make their resources efficient.
5. Identify Frauds
Finance companies have begun using analytics to reduce fraud. One way
they do this is by using data to identify potentially fraudulent purchases,
based on the analysis of customers' previous transactions. These companies
also use predictive analytics to look at customer profiles and gauge the level
of risk. This helps rate the risk that a particular customer presents, prevent
losses, and build stronger customer relationships.
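As a simplified sketch of this idea (real fraud systems are far more sophisticated than this), an incoming transaction can be compared against a customer's spending history using a z-score; all amounts below are invented:

import numpy as np

# Hypothetical past transaction amounts for one customer
history = np.array([42.0, 55.5, 38.2, 60.0, 47.3, 52.8, 44.1, 58.9])
new_amount = 410.0  # incoming transaction to screen

# How many standard deviations the new amount sits from the mean
mean, std = history.mean(), history.std(ddof=1)
z_score = (new_amount - mean) / std

# Flag transactions far outside the customer's usual pattern
if abs(z_score) > 3:
    print(f"Flag for review: z = {z_score:.1f}")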
6. Cut Manufacturing Costs
One company that has outranked everyone when it comes to using analytics to
reduce manufacturing costs is Intel. Initially, this tech giant would perform 19,000
tests on each
chip being manufactured. With the advent of predictive analysis, Intel was
able to determine which chips needed which tests before their launch. By
using the data collected from all of that testing, it has been able to save
almost $3 million.
Look at the current business scenario. Owing to the lockdown across the
globe, the business environment is as uncertain as it gets. Almost nobody
has an idea when things will get back to normal, and the corporate world will
be allowed to resume its operations. At such times, data analytics can be
used to resolve supply chain issues, introduce crisis management solutions,
optimize costs, and more.
12. Conduct A Competitor Analysis
Today, almost every business has a clear idea of its competitors. An
effective way to get ahead of them is by understanding what they are up
to, their strategies, USPs, etc. By gathering this data and conducting a SWOT
analysis, you can get a preview of how your business is performing as
compared to your competitors.
=================================================================
(1.3) THE PROCESS OF BUSINESS ANALYTICS
portfolio. Business experts define parameters for the analytical process,
which is crucial for assuring a general understanding of the goal.
Step 2: Identify Potential Interest from Data
All sources of data with potential interest need to be identified. The guiding
principle in this step is that the more data there is, the better. All the data is
then accumulated and consolidated in a data warehouse, a data mart, or even
a spreadsheet file. Some exploratory data analysis is performed to impute
missing data, remove outliers, and transform variables.
For example, time-series analysis graphs are plotted to figure out
patterns or outliers, scatter plots are used to find correlation or non-
linearity, and an OLAP system is used for multidimensional analysis.
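A minimal sketch of this exploratory cleanup in pandas, assuming median imputation for missing values and the common 1.5 x IQR rule for outliers (the text does not mandate specific techniques); the data is invented:

import pandas as pd

# Hypothetical consolidated data with a gap and an outlier
df = pd.DataFrame({"revenue": [120.0, None, 135.0, 9000.0, 128.0, 131.0]})

# Impute missing values with the column median
df["revenue"] = df["revenue"].fillna(df["revenue"].median())

# Drop outliers outside 1.5 * IQR of the quartiles
q1, q3 = df["revenue"].quantile([0.25, 0.75])
iqr = q3 - q1
df = df[df["revenue"].between(q1 - 1.5 * iqr, q3 + 1.5 * iqr)]
print(df)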
Step 3: Inspect the data
Once moving to the analytics step, an analytical model is built on
the prepared and transformed data using statistical analysis techniques like
correlation analysis
and hypothesis testing. The analyst figures out all parameters in connection
with the target variable. The business expert also performs regression
analysis to make simple predictions depending upon the business objective.
In this step, data is also often reduced, divided, and compared across
various groups to derive powerful insights from the data.
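To illustrate the correlation-analysis step, here is a minimal sketch with SciPy; the paired ad-spend and sales figures are invented, and the two-sided p-value serves as the hypothesis test:

from scipy import stats

# Hypothetical paired observations: ad spend vs. sales
ad_spend = [10, 12, 15, 18, 20, 25, 28, 30]
sales = [95, 98, 110, 118, 121, 135, 140, 148]

# Pearson correlation with a two-sided significance test
r, p_value = stats.pearsonr(ad_spend, sales)
print(f"r = {r:.3f}, p = {p_value:.4f}")
# A small p-value suggests the association is unlikely to be chance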
finding engaging, non-trivial patterns is challenging, but such patterns can
add value to data and convert into new turnover opportunities.
Step 5: Optimization of Best Possible Solution
Once the analytical model has been validated and approved, the analyst
will apply the predictive model's coefficients and conclusions to drive “what-if”
scenarios, using the defined constraints to optimize the best solution within
the given limitations.
Necessary considerations are how to serve model output in a user-friendly
way, how to integrate it, how to confirm the monitoring of the analytical
model accurately. An optimal solution is chosen based on the lowest error,
management objectives, and identification of model coefficients that are
associated with the company’s goals.
Step 6: Decision Making and Estimate conclusions
Analysts then make decisions and take action based on the
conclusions derived from the model in accordance with the predefined
business problems. A span of time is allowed for estimating the
conclusions; all favorable and adverse consequences are measured in
this duration to satisfy the business needs.
Step 7: Upgrade performance system
At last, the outcome of decision, action and the conclusion conducted from
the model are documented and updated into the database. This helps in
changing and upgrading the performance of the existing system.
Some queries are updated in the database, such as “Were the decision and
action impactful?”, “What was the return on investment?”, and “How did the
analysis group compare with the control group?”. The performance-
based database is continuously updated once new insight or knowledge
is extracted.
overheads, when possible — are some of the typical ways in which business
analytics has been used thus far.
overall management
“what-if” analyses
1. RECOGNIZE A PROBLEM
Problems exist when there is a gap between what is happening and what we
think should be happening.
constraints or restrictions
4. ANALYZE THE PROBLEM
Analytics plays a major role.
=================================================================
3. Price Discrimination
With an understanding of your customers’ willingness to pay, you may find
that different types of customers are willing to pay different amounts for
your products. In such cases, it can be useful to employ price discrimination,
which can be a valuable tool for expanding your company’s reach when
competing with others.
“Price discrimination is one of the most common and powerful price
strategies for companies,” says Harvard Business School Professor
Bharat Anand in the online course Economics for Managers.
In the course, Anand presents several examples of price discrimination,
including reduced prices for students, seniors, and veterans. These “special
case” prices present an opportunity for your company to earn customers
whose willingness to pay may be lower than that of its typical customers.
It’s worth noting that a lower price doesn’t always win consumers over—
selecting a strategic price is crucial, but it’s just one factor they consider
when determining which product to buy.
4. Bundled Pricing
Another pricing strategy that can prove to be advantageous is bundled pricing.
Bundled pricing is the practice of selling two or more products together in a
“bundle,” for which the cost is different than that of purchasing all of the
items separately.
Cable companies often leverage bundling. Purchasing voice, video, and
data services together often grants the customer a lower price than if they
were to purchase the services individually.
“How you think about the logic of pricing should depend on willingness to
pay,” Anand says in Economics for Managers. He presents the example of
bundling childcare and theater tickets.
“Put two products together that, when consumed jointly, increase
consumers’ willingness to pay,” he says. “You might be able to increase the
price for both just because it has so much more value for consumers.”
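A small numeric sketch of that bundling logic; the willingness-to-pay (WTP) figures are invented for illustration, not taken from Anand's course:

# Hypothetical willingness to pay (WTP) for two customer segments
wtp = {
    "parents": {"theater_ticket": 40, "childcare": 30},
    "singles": {"theater_ticket": 50, "childcare": 0},
}
ticket_price, bundle_price = 40, 65

for segment, prefs in wtp.items():
    buys_ticket = prefs["theater_ticket"] >= ticket_price
    buys_bundle = sum(prefs.values()) >= bundle_price
    print(f"{segment}: ticket={buys_ticket}, bundle={buys_bundle}")
# Parents' combined WTP (70) clears the 65 bundle price, so the bundle
# captures value that separate ticket pricing would leave on the table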
The way you price your products should be strategic, purposeful, and give
your business a leg up over its competitors.
5. Human Capital
A company is only as strong as its people. As such, hiring, training, and
retaining a team of skilled employees is a competitive advantage for any
business.
Putting in the time and care to select outstanding candidates for open
positions, train current employees, offer professional development
opportunities, and create a culture wherein people feel supported and
challenged can pay off.
Gallup reports that business units with highly engaged employees see a 21
percent increase in profit over their less-engaged counterparts.
Employee engagement has been especially important during the
coronavirus (COVID-19) pandemic, as many businesses have closed
physical offices and transitioned to remote work. By finding ways to
effectively engage your team in a virtual setting, you can make them feel
supported and empowered from afar.
Excel Spreadsheets
=================================================================
UNIT II
MANAGING RESOURCES FOR BUSINESS ANALYTICS
HR analytics aims to provide insight into how best to manage employees and reach
business goals. Because so much data is available, it is important for HR teams to
first identify which data is most relevant, along with how to use it for maximum
ROI.
What is HR analytics?
HR analytics, also referred to as people analytics, workforce analytics, or
talent analytics, involves gathering together, analyzing, and reporting HR
data. It enables your organization to measure the impact of a range of HR
metrics on overall business performance and make decisions based on data.
In other words, HR analytics is a data-driven approach toward Human
Resources Management.
HR analytics is a fairly novel tool. This means it is still largely unexplored in
scientific literature. The best-known scientific HR analytics definition is by
Heuvel & Bondarouk. According to them, HR analytics is the systematic
identification and quantification of the people drivers of business outcomes (Heuvel &
Bondarouk, 2016).
In the past century, Human Resource Management has changed
dramatically. It has shifted from an operational discipline towards a more
strategic one. The popularity of the term Strategic Human Resource
Management (SHRM) exemplifies this. The data-driven approach that
characterizes HR analytics is in line with this development.
By using people analytics you don’t have to rely on gut feeling anymore.
Analytics enables HR professionals to make data-driven decisions.
Furthermore, analytics helps to test the effectiveness of HR policies and
different interventions.
By the way, HR analytics is similar to people analytics but there are some
subtle differences in how the terms are used.
Being able to use data in decision-making has been growing in importance
throughout the global pandemic. Moving towards a post-pandemic world,
there are many changes happening in employment – whether it is the
growing popularity of hybrid work or the increased use of automation. In this
age of disruption and uncertainty, it is vital to make the correct decisions in
order to navigate our new realities.
Using data in HR
Of all the departments in an organization, the Human Resource (HR)
department may have the least popular reputation.
This has two reasons. First of all, the HR department is like a doctor: you’d
rather never need one.
Picture your role from the other side – when you ask an employee to come by
your
office, it’s likely that something bad is about to happen. You may need to
reprimand, put on notice, or even fire your colleague. Good news, like
getting a promotion, tends to come from an employee’s direct manager. Not
HR.
Secondly, many regard HR as soft. Fuddy-duddy. Old-fashioned. A lot of the
work in HR is based on ‘gut feeling’. We’re doing things a certain way
because we’ve always done it that way. HR doesn’t have a reputation of
bringing in the big bucks or playing a numbers game like sales. HR also
struggles to quantify and measure its success, as marketing and finance do.
HR data analytics changes all of this. A lot of the challenges we just
described can be resolved by becoming more data-driven and savvy about
HR and analytics.
Example questions include:
How high is your annual employee turnover?
How much of your employee turnover consists of regretted loss?
Do you know which employees will be the most likely to leave your
company within a year?
These questions can only be answered using HR data. Most HR
professionals can easily answer the first question. However, answering the
second question is harder.
To answer this second question, you would need to combine two different
data sources: your Human Resources Information System (HRIS) and your
Performance Management System.
To answer the third question, you would need even more HR data and
extensively analyze it as well.
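As a sketch of the first, easiest question, annual turnover is commonly computed as leavers divided by average headcount (one convention among several); the figures below are invented:

# Hypothetical headcount figures from an HRIS export
leavers_this_year = 18
headcount_start, headcount_end = 150, 160
avg_headcount = (headcount_start + headcount_end) / 2

# Annual turnover: leavers divided by average headcount
turnover_rate = leavers_this_year / avg_headcount
print(f"Annual employee turnover: {turnover_rate:.1%}")  # 11.6%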
As an HR professional, you collect vast amounts of data. Unfortunately, this
data often remains unused. Once you start to analyze human resource
challenges by using this data, you are engaged in HR data analytics.
Organizational Analytics is the process or set of processes that links people and
organization to the creation of value through information – information generated
MANAGING INFORMATION POLICY
What is an information management policy?
Things to include
An effective information management policy will usually include the following:
details of organisationally endorsed processes, practices and
procedures for undertaking information management tasks,
including creation and capture
identification of endorsed systems for managing information assets
advice on the disposal and destruction of information assets,
including the provisions of normal administrative practice (NAP)
The purpose of this policy is to guide and direct the creation and
management of information assets (records, information and data) by staff,
and to clarify staff responsibilities. [The agency] is committed to
establishing and maintaining information management practices that meet
its business needs, accountability requirements and stakeholder
expectations.
The benefit of complying with this policy will be trusted information that is
well-described, stored in known endorsed locations and accessible to staff
and clients when needed.
This policy is written within the context of [the agency's] information
governance framework, which is located at XXXX. Complementary policies
and additional guidelines and procedures support this policy and are located
at XXXX.
Scope
The scope should identify both who and what is covered by the policy, to
support the holistic management of all an agency’s information assets.
For example:
This policy applies to all [agency] staff members and contractors and to all
information assets (records, information and data) in any format, created or
received, to support [agency] business activities.
It covers all business applications used to create, manage and store
information assets, including dedicated information management systems,
business information systems, databases, email, voice and instant
messaging, websites, and social media applications. This policy covers
information created and managed in-house and off-site, including in cloud
based platforms.
Policy statement
Provide a brief statement of your agency's commitment to good
information management practices. If it applies, briefly mention factors
that influence information management within the agency.
For example:
Resources
Provide a list of resources that give extra information. This may include
contact details of relevant staff within the agency, as well as useful
reference material.
Senior management endorsement
Provide evidence that the head of your agency or a senior officer with
responsibility for information management has endorsed the policy. This
may be done in a brief paragraph signed by the head of agency or senior
officer, recognising the important place of information management in the
agency and directing staff to comply with the requirements of the policy.
=================================================================
(2.4) WHAT IS DATA QUALITY AND MASTER DATA MANAGEMENT?
External audits
“The effective use of big data has the potential to transform economies, delivering a
new wave of productivity growth and consumer surplus. Using big data will become a
key basis of competition for existing companies, and will create new competitors who
are able to attract employees that have the critical skills for a big data world.” -
McKinsey Global Institute, 2011
TYPES OF DATA
Discrete - derived from counting something.
◦ For example, a delivery is either on time or not; an order is complete or
incomplete; or an invoice can have one, two, three, or any number of
errors. Some discrete metrics would be the proportion of on-time deliveries;
the number of incomplete orders each day, and the number of errors per
invoice.
Continuous - based on a continuous scale of measurement.
◦ Any metrics involving dollars, length, time, volume, or weight, for
example, are continuous.
MEASUREMENT SCALES
Categorical (nominal) data - sorted into categories according to specified
characteristics.
DATA QUALITY
can be defined in many different ways. In the most general sense, good
data quality exists when data is suitable for the use case at hand.
This means that quality always depends on the context in which it is used, leading
to the conclusion that there is no absolute valid quality benchmark.
Nonetheless, several definitions use the following rules for evaluating data quality:
Employees do not work with their BI applications because they do not
trust them (and/or their underlying data)
Incorrect data leads to false facts and bad decisions in
data-driven environments
If data quality guidelines are not defined, multiple data copies are
generated that can be expensive to clean up
A lack of uniform concepts for data quality leads to
inconsistencies and threatens content standards
For data silo reduction, uniform data is necessary to allow systems
to talk to each other
To make Big Data useful, data needs business context; the link to the
business context is master data (e.g., in Internet of Things use cases,
reliable product master data is absolutely necessary)
Shifting the view to data (away from applications) requires a
different view on data, independent from the usage context, so there
are general and specific requirements for the quality of data
Furthermore, there are specific characteristics in today’s business world that push
the organization to think about how to collect reliable data:
Companies must be able to react as flexibly as possible to dynamically
changing market requirements. Otherwise, they risk missing out on
valuable business opportunities. Therefore, a flexible data landscape
that can react quickly to changes and new requirements is essential.
Effective master data management can be the crucial factor when it
comes to minimizing integration costs.
Business users are demanding more and more cross-departmental
analysis from integrated datasets. Data-driven enterprises in
particular depend on quality-ensured master data as a pre-requisite
to optimize their business process and develop new (data-driven)
services and products.
Rapidly growing data volumes, as well as internal and external data
sources, lead to a constantly increasing data basis. Transparent
definitions of data and its relationships are essential for managing
and using them.
Stricter compliance requirements make it more important than ever
to ensure that data quality standards are met.
To make matters worse, the analytical landscapes of organizations are
becoming more complex. Companies collect increasing volumes of
differently structured data from various sources while at the same time
implementing new analytical solutions.
This drastically increases the importance of consistent master data.
Companies can only unlock the full economic potential of their data if the
master data is well managed and provided in high quality.
The cleansing of the (master) data is normally done according to
engineered individual business rules.
Enrichment of data (with, for example, geo-data or socio-
demographic information) can help with systems and
business processes.
To ensure (master) data quality is reached, continuous monitoring and
checking of the data is very important. This can be done automatically
via software by applying defined business rules. So at the end of the
cycle, there is a smooth transition from the original data quality initiative
to the second phase: the ongoing protection of data quality.
The different phases are typically assigned to the aforementioned roles.
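A minimal sketch of the automated rule-based monitoring described above, in pandas; the rules and the customer master data are invented examples, not a prescribed rule set:

import pandas as pd

# Hypothetical customer master data to be monitored
master = pd.DataFrame({
    "customer_id": [1, 2, 2, 4],
    "email": ["a@x.com", None, "b@x.com", "c@x"],
})

# Business rules expressed as named checks over the dataset
rules = {
    "id_unique": ~master["customer_id"].duplicated(keep=False),
    "email_present": master["email"].notna(),
    "email_has_at_dot": master["email"].str.contains(r"@.+\.", na=False),
}

# Report the share of records passing each rule
for name, passed in rules.items():
    print(f"{name}: {passed.mean():.0%} pass")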
DATA QUALITY AND MASTER DATA MANAGEMENT SOFTWARE
Most of the technologies on the market today are aligned with the data quality
cycle and provide in-depth functionality to assist various user roles in their
processes.
The best way of achieving high data quality with technology is to integrate
the different phases of the data quality cycle into operational processes
and match them with individual roles. Software tools assist in different
ways by providing:
Data profiling functions
Data quality functions like cleansing, standardization, parsing, de-
duplication, matching, hierarchy management, identity resolution
User-specific interfaces/workflow support
Integration and synchronization with application models
Data cleansing, enrichment and removal
Data distribution and synchronization with data stores
Definition of metrics, monitoring components
Data Lifecycle Management
Reporting components, dashboarding
Versioning functionality for datasets, issue tracking, collaboration
This list of functions is intended to give an overview of the functional range
offered by current data quality and master data management tools.
Based on its individual requirements, every organization should
define and prioritize which specific functions are relevant to it and which
will have a significant impact on the business.
The market for data quality and master data management tools is
comparatively heterogeneous. Providers can be classified according to
their focus or their history in the following groups:
Business intelligence and data management generalists have a
broad software portfolio which also can be used for data quality
and master data management tasks.
Specialists of service-oriented infrastructures also offer software
platforms, which can be used to manage master data.
Data quality specialists are mainly focused on how data quality can be
ensured and provide tools that can be integrated into existing
infrastructures and systems.
Data integration specialists offer tools that are especially useful for
matching and integrating data from different systems.
For master data management specialists, master data
management is a strategic topic. These providers offer explicit
solutions and services for the management of master data.
Data Preparation (Data Pre-Processing for Data Discovery) is a relatively
new trend in the business intelligence and data management market.
In the context of data quality, data discovery software can be a flexible
(but not really durable) tool for business users to address data quality
problems.
According to the business or company strategy, corresponding initiatives
should be launched (e.g., create centrally available master data;
document data domains, dimensions and KPIs; define contact persons
and data management processes).
With the success of these initiatives (higher data quality), the company will
notice improvements in its internal business processes (e.g., management
and sales departments will make decisions based on correct data instead of
“gut feeling”), as well as new potentials like the chance to make the
transition to becoming a digitalized company that is able to develop new
data-centric products and services for business partners.
Only data-driven companies can compete in the era of digitization. In the
increasingly complex world of data, enterprises need reliable pillars. Reliable
data is a critical factor. Ultimately, sustainable data quality management will
pay off.
=================================================================
(2.5) CHANGES MAY BE REQUIRED IN BUSINESS ANALYTICS -
THE ROLE OF BUSINESS ANALYTICS IN CHANGE MANAGEMENT
WHAT IS A BUSINESS ANALYTICS / BUSINESS ANALYST?
Business analysts provide business improvement recommendations based
on data. They use data analytics to examine processes, determine desired
outcomes, and deliver data-driven suggestions and reports to executives
and stakeholders. Business analysts evaluate the current model of a
business and see how it can be optimized.
The real value of a business analyst is in mitigating risk and providing
strategic direction. That first part really can’t be overstated: risk mitigation
is key to the process of change management. Change introduces the
unknown to an environment and that can either spell success or disaster. A
good business analyst will minimize uncertainty and guide the organization
toward a greater likelihood of success.
What is change management?
Change management is, put simply, the process of guiding the adaptation
of a business. A business that has grown stagnant and too comfortable with
the way it operates will not survive the years to come. The evidence of that
is all around us, from department stores not willing to re-evaluate how
modern customers shop to restaurants that don’t offer delivery.
Change management is the process of re-evaluating your current business
model and implementing the pivots necessary to keep the business healthy
and competitive in an evolving market. Done well, change management
minimizes the fear of change that inevitably arises and gets everyone on
board with successful implementation.
of your company: in other words, how healthy it is from a financial
perspective. A BA can tell you how much money your business has on hand,
what its debts are, and what your cash flow looks like. This is valuable for
being able to make a rational decision based on facts and not conjecture.
Providing current data on the state of the market
A BA will also provide an objective look at where your business stands in
context. It’s easy to feel, for instance, that your business is the top
cheesemaker in the country if you never research other cheesemakers. So,
a business analyst may look at market movement and use data to predict
future trends.
For instance, a business analyst may tell you that your business should
target enterprise companies instead of SMBs because your software is
more appropriate for large-scale solutions and enterprise companies tend
to have greater cashflow than SMBs. With that information, it’s easier to
not only decide which route is best for your company in the present
moment, but which changes will have favorable outcomes for years to
come.
Researching how proposed changes might impact the business
Business analysts aren’t fans of conjecture. Instead, they’ll use data to
simulate likely outcomes based on various fact-based scenarios. For
instance, a business analyst may simulate what kind of impact raising your
product pricing by 10% would have on everything from your bottom line to
your competitive place in the market. A BA will give you an educated opinion
based on all available data so that you can make informed decisions.
Collecting use cases
A BA can also help your business develop use cases. A use case is
essentially how an actor (normally a customer) interacts with a system to
achieve a desired end. If your business wants to understand, for instance,
how customers are interacting with a purchasing process, a business analyst
can look at the use case as well as alternate flows and exception flows to
uncover gaps in the process or missed opportunities. This is useful for
helping others think from the user’s perspective and describe the end result
after the use case has been completed. In short, they give you a visual way
to understand flows so your business can maximize them.
Defining and communicating the desired future state
Sometimes organizations don’t know what they want. They certainly want to make
more money, but that’s not the entire story. It’s up to a BA to help define what an
organization actually wants.
For instance, consider a company that currently makes dashboards for self-
driving cars. Sure, the company wants a healthier bottom line. But what they
might really want is partnerships with prominent car manufacturers,
complete market domination, and to lead a transportation revolution. A BA
can help define and articulate the big, audacious desires of an organization.
That vision of a desired future state will inform the best changes the
business can make to achieve that future state.
Creating a comprehensive plan
It’s not enough for a business to know what needs to change. They also need
to know how to change. A business analyst can look at all the moving parts
in an organization and define their role in the change, how they contribute to
the implementation, and what the expectations are around contributing to
the change. They can create a clear, actionable plan that makes sense to
everyone involved.
Preparing reports and presentations to communicate motivations, changes,
options, and impact
An especially valuable role of BAs in change management is that of
showing why a business should change. They present their data and
rationale in reports and presentations to gain buy-in from stakeholders.
Seeing change options in a visual manner is vital when communicating both
the motivation for the change and its potential impact. It’s also a great idea
to create a
visual flowchart of the comprehensive plan mentioned above so that
there’s no doubt about how the implementation will proceed. These
presentations help eliminate confusion and uncertainty.
As you can see, a business analyst is truly vital to assisting your business in
navigating the world of change management. They can aid in everything
from providing data-driven suggestions for meaningful change to providing
roadmaps for how best to implement
the change. They’re an invaluable investment in your business and can provide
the right strategy to keep the organization healthy throughout any crisis or time of
change.
=================================================================
UNIT III
DESCRIPTIVE ANALYTICS
(Introduction to Descriptive analytics - Visualising and Exploring Data -
Descriptive Statistics - Sampling and Estimation - Probability Distribution for
Descriptive Analytics - Analysis of Descriptive analytics)
Descriptive analytics can be leveraged on its own or act as a foundation for
the other three analytics types. If you’re new to the field of business
analytics, descriptive analytics is an accessible and rewarding place to
start.
instance, highlighting that traffic from paid advertisements increased 20
percent year over year.
The three other analytics types can then be used to determine why traffic
from each source increased or decreased over time, if trends are predicted
to continue, and what your team’s best course of action is moving forward.
2. Financial Statement Analysis
Another example of descriptive analytics that may be familiar to you is
financial statement analysis. Financial statements are periodic reports that
detail financial information about a business and, together, give a holistic
view of a company’s financial health.
There are several types of financial statements, including the balance sheet,
income statement, cash flow statement, and statement of shareholders’ equity.
Each caters to a specific audience and conveys different information about a
company’s finances.
Financial statement analysis can be done in three primary ways: vertical,
horizontal, and ratio.
Vertical analysis involves reading a statement from top to bottom and
comparing each item to those above and below it. This helps determine
relationships between variables. For instance, if each line item is a
percentage of the total, comparing them can provide insight into which are
taking up larger and smaller percentages of the whole.
Horizontal analysis involves reading a statement from left to right and
comparing each item to itself from a previous period. This type of analysis
determines change over time. Finally, ratio analysis involves comparing one
section of a report to another based on their relationships to the whole. This
directly compares items across periods, as well as your company’s ratios to
the industry’s to gauge whether yours is over- or underperforming.
Each of these financial statement analysis methods is an example of
descriptive analytics, providing information about trends and
relationships between variables based on current and historical data.
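A minimal sketch of vertical and horizontal analysis in pandas; the three statement lines and two periods are invented:

import pandas as pd

# Hypothetical income statement lines for two periods
stmt = pd.DataFrame(
    {"2022": [500.0, 300.0, 120.0], "2023": [560.0, 330.0, 150.0]},
    index=["revenue", "cogs", "operating_income"],
)

# Vertical analysis: each line item as a percentage of revenue
vertical = stmt.div(stmt.loc["revenue"]) * 100

# Horizontal analysis: percentage change versus the prior period
horizontal = (stmt["2023"] / stmt["2022"] - 1) * 100

print(vertical.round(1))
print(horizontal.round(1))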
3. Demand Trends
Descriptive analytics can also be used to identify trends in customer
preference and behavior and make assumptions about the demand for
specific products or services. Streaming provider Netflix’s trend
identification provides an excellent use case for descriptive analytics.
Netflix’s team—which has a track record of being heavily data-driven—
gathers data on users’ in-platform behavior. They analyze this data to
determine which TV series and movies are trending at any given time and
list trending titles in a section of the platform’s home screen.
Not only does this data allow Netflix users to see what’s popular—and thus,
what they might enjoy watching—but it allows the Netflix team to know
which types of media,
themes, and actors are especially favored at a certain time. This can drive
decision-making about future original content creation, contracts with
existing production companies, marketing, and retargeting campaigns.
4. Aggregated Survey Results
Descriptive analytics is also useful in market research. When it comes time
to glean insights from survey and focus group data, descriptive analytics
can help identify relationships between variables and trends.
For instance, you may conduct a survey and identify that as respondents’
age increases, so does their likelihood to purchase your product. If you’ve
conducted this survey multiple times over several years, descriptive
analytics can tell you if this age-purchase correlation has always existed or
if it was something that only occurred this year.
Insights like this can pave the way for diagnostic analytics to explain why
certain factors are correlated. You can then leverage predictive and
prescriptive analytics to plan future product improvements or marketing
campaigns based on those trends.
5. Progress to Goals
Finally, descriptive analytics can be applied to track progress to goals.
Reporting on progress toward key performance indicators (KPIs) can help
your team understand if efforts are on track or if adjustments need to be
made.
For example, if your organization aims to reach 500,000 monthly unique
page views, you can use traffic data to communicate how you’re tracking
toward it. Perhaps halfway through the month, you’re at 200,000 unique
page views. This would be underperforming because you’d like to be
halfway to your goal at that point—at 250,000 unique page views. This
descriptive analysis of your team’s progress can allow further analysis to
examine what can be done differently to improve traffic numbers and get
back on track to hit your KPI.
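The pacing arithmetic in that example is straightforward; a minimal sketch, assuming a simple linear pace toward the monthly goal:

# Figures from the example above
goal = 500_000
days_elapsed, days_in_month = 15, 30
actual_views = 200_000

# Expected progress assuming an even pace across the month
expected = goal * days_elapsed / days_in_month
pace = actual_views / expected

print(f"Expected by now: {expected:,.0f}; actual: {actual_views:,}")
print(f"Pacing at {pace:.0%} of target")  # 80% -> behind schedule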
literacy—the ability to analyze, interpret, and even question data—is an
increasingly valuable skill.”
Leveraging descriptive analytics to communicate change based on current
and historical data and as a foundation for diagnostic, predictive, and
prescriptive analytics has the potential to take you and your organization
far.
=================================================================
(3.2) VISUALIZING AND EXPLORING DESCRIPTIVE DATA
than a bunch of numbers.
DESCRIPTIVE STATISTICS AND DATA VISUALIZATION
Turn Your Statistics Into Something More Interesting
Data is quickly becoming a defining force in the business world. It is the
lifeblood of every company decision and thus, it defines what companies do. A
company which doesn’t pay attention to proper statistics can be at a serious
disadvantage compared to companies which do, especially companies that use
descriptive statistics and data visualization.
Data has to be good if a business wants to remain relevant and successful
in the business world.
The first step would be to collect the data, which is quite easy in many ways.
Then the gathered information needs to be analyzed and understood. But
what comes after that? Simple – descriptive statistics and data visualization.
Descriptive Statistics
Descriptive statistics describes data – it summarizes and organizes all of
the collected data into something manageable and simple to understand.
The descriptions can include the entire data set or just a part of the data
set.
One of the most important things to know about descriptive data analysis is
that it focuses on the data itself rather than on implications that can be far-
reaching and go beyond the represented data.
This is the main difference between inferential statistics and descriptive
statistics. Inferential statistics uses complicated calculations to make
predictions while descriptive statistics doesn’t.
This is just the basic information you need to know about descriptive
statistics, but it’s worth understanding the basics before we dive in any
deeper.
of a data point in the middle of a set while mode is the value which is
more frequent and appears most often. A common example of the mean
is GPA: a student’s academic success is
measured by the average of their grades.
The second type is the frequency. This is a measure of how
commonly and frequently something happens. This is often seen in
descriptive statistics when summarizing polls or surveys, and their
responses. For instance, reporting that 46% of respondents answered yes to a question.
The third type would be the measure of position, which includes
quartile and percentile ranks. This type of descriptive analysis and
statistics describes different points of data and how they relate to
each other. This is used when comparing the data points against
each other.
Finally, the fourth type is variation or dispersion, which is
commonly used as a tool to determine the range of values that the
data encompasses; it identifies the maximum and minimum values
as well. The variance of the data can also be obtained, which helps
quantify how spread out the values are.
population or a sample of a population. Descriptive statistics are broken
down into measures of central tendency and measures of variability
(spread). Measures of central tendency include the mean,
median, and mode, while measures of variability include standard
deviation, variance, minimum and maximum values, kurtosis, and
skewness.
Descriptive statistics summarizes or describes the characteristics of a data
set.
Descriptive statistics consists of two basic categories of measures:
measures of central tendency and measures of variability (or spread).
Measures of central tendency describe the center of a data set.
Measures of variability or spread describe the dispersion of data within the
set.
performance. A student's personal GPA reflects their mean academic
performance.
Central Tendency
Measures of central tendency focus on the average or middle values of
data sets, whereas measures of variability focus on the dispersion of data.
These two measures use graphs, tables and general discussions to help
people understand the meaning of the analyzed data.
Measures of central tendency describe the center position of a distribution
for a data set. A person analyzes the frequency of each data point in the
distribution and describes it using the mean, median, or mode, which
measures the most common patterns of the analyzed data set.
Measures of Variability
Measures of variability (or the measures of spread) aid in analyzing how
dispersed the distribution is for a set of data. For example, while the
measures of central tendency may give a person the average of a data set,
it does not describe how the data is distributed within the set.
So while the average of the data may be 65 out of 100, there can still be
data points at both 1 and 100. Measures of variability help communicate
this by describing the shape and spread of the data set. Range, quartiles,
absolute deviation, and variance are all examples of measures of
variability.
Consider the following data set: 5, 19, 24, 62, 91, 100. The range of that
data set is 95, which is calculated by subtracting the lowest number (5) in
the data set from the highest (100).
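These measures can be reproduced with Python's standard statistics module on the data set above; note that the quartile convention used here is an assumption, since several methods exist:

import statistics

data = [5, 19, 24, 62, 91, 100]  # the data set from the example above

data_range = max(data) - min(data)           # 100 - 5 = 95
mean = statistics.mean(data)                 # central tendency
median = statistics.median(data)             # middle value
quartiles = statistics.quantiles(data, n=4)  # Q1, median, Q3
variance = statistics.variance(data)         # sample variance

print(data_range, mean, median, quartiles, round(variance, 1))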
Why do we need statistics that simply describe data?
Descriptive statistics are used to describe or summarize the characteristics
of a sample or data set, such as a variable's mean, standard deviation, or
frequency. Descriptive statistics help us understand the collective
properties of the elements of a data sample. Knowing the sample mean,
variance, and distribution of a variable can help us understand the world
around us.
What are mean and standard deviation?
These are two commonly employed descriptive statistics. The mean is the
average level observed in some piece of data, while the standard deviation
describes how dispersed the data observed in that variable is around its
mean.
Data Visualization
The term itself essentially means that you should take the data you have
and that you should convert it to a visual form which is simpler to digest
and understand. Instead of looking at numbers or spreadsheets, you can
get a picture which shows you the information.
Descriptive statistics turn the data into something more understandable
than raw data but data visualization goes further than that and creates a
visual which quickly tells a story.
For example, a pie graph shows information much better than a bunch of
numbers. And everyone has seen a pie chart many times already.
Pie graphs are very simple but they are effective when used properly. But there
are also different forms of data visualization like:
Bar charts
Line graphs
Scatter plots
Diagrams
Spider charts
And there are many more. This is the ultimate visual aid and it’s a key
ingredient in using the data in a helpful way.
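As a minimal sketch, any of these forms can be produced with a plotting library such as matplotlib (one option among many; the text endorses no specific tool); the quarterly figures are invented:

import matplotlib.pyplot as plt

# Hypothetical quarterly sales for a simple bar chart
quarters = ["Q1", "Q2", "Q3", "Q4"]
sales = [120, 135, 128, 150]

fig, ax = plt.subplots()
ax.bar(quarters, sales)               # bar chart: one bar per quarter
ax.set_title("Sales by Quarter")
ax.set_ylabel("Units sold (thousands)")
plt.show()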
Why is Data Visualization Important?
From a business perspective, data visualization is crucial. Data scientists can look at raw data and see something important, but people who are not data scientists, who make up the majority of decision-making teams in companies, can't use raw data. That's why data visualization is necessary when you need to get a point across. It makes the data clear and understandable, and it eliminates the confusing aspects. With good data visualization, you can get your message across and open the discussion of what to do with the data you provide.
When it comes to data visualization and descriptive statistics, both are extremely useful to companies of all sizes. Both can help companies prepare for all sorts of situations, from bad ones to good ones, and they can make the decision-making process a lot easier.
The Proper Tools
To make data effective and useful, companies need a proper data
visualization tool. There are many different data visualization tools that
companies can use. From simple to more complex, it all becomes a matter
of preference for companies.
Finding the right data visualization tool is extremely important in all cases, as it can make the difference between a good data representation that actually helps and a bad data representation that confuses people.
Predictive model:
Sales = -2.9485(Price) + 3240.9
Total revenue = (Price)(Sales)
Cost = 10(Sales) + 5000
Maximize Profit
s.t. Sales >= 0
Sales is integer
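This small model can be explored numerically. The sketch below searches a grid of candidate integer prices for the profit-maximizing one, using the sales equation given above; rounding enforces the integer constraint on Sales, and the integer price grid is an assumption made purely for illustration:

# Profit maximization for the predictive model above:
# Sales = -2.9485 * Price + 3240.9, Revenue = Price * Sales,
# Cost = 10 * Sales + 5000; maximize Profit subject to Sales >= 0, integer.

def sales(price):
    return max(round(-2.9485 * price + 3240.9), 0)  # integer, non-negative

def profit(price):
    s = sales(price)
    return price * s - (10 * s + 5000)  # Profit = Total revenue - Cost

# Simple grid search over candidate prices (an optimization solver
# could be used instead for larger models).
best_price = max(range(1, 1100), key=profit)
print(best_price, sales(best_price), profit(best_price))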
================================================================
WHAT IS A VARIABLE?
To put it in very simple terms, a variable is an entity whose value varies. A variable is an essential component of any statistical data. It is a feature of a member of a given sample or population, which is unique, and can differ in quantity or quality from another member of the same sample or population. Variables either are the primary quantities of interest or act as practical substitutes for the same. The importance of variables is that they help in the operationalization of concepts for data collection.
Continuous variables, on the other hand, can take any value in between the two given values (e.g., the duration for which the weals last in the same sample of patients with urticaria).
Under the umbrella of qualitative variables, you can have nominal/categorical variables and ordinal variables.
Nominal/categorical variables are, as the name suggests, variables which can
be slotted into different categories (e.g., gender or type of psoriasis).
Ordinal variables or ranked variables are similar to categorical, but can be
put into an order (e.g., a scale for severity of itching).
Descriptive Statistics
Statistics can be broadly divided into descriptive statistics and inferential
statistics.[3,4] Descriptive statistics give a summary about the sample
being studied without drawing any inferences based on probability theory.
Descriptive statistics can be used to describe a single variable (univariate
analysis) or more than one variable (bivariate/multivariate analysis). In the
case of more than one variable, descriptive statistics can help summarize
relationships between variables using tools such as scatter plots.
SAMPLE SIZE
In an ideal study, we should be able to include all units of a particular
population under study, something that is referred to as a census.[5,6] This
would remove the chances of sampling error (difference between the
outcome characteristics in a random sample when compared with the true
population values – something that is virtually unavoidable when you take a
random sample). However, it is obvious that this would not be feasible in
most situations. Hence, we have to study a subset of the population to reach
our conclusions. This representative subset is a sample, and we need to
have sufficient numbers in this sample to make meaningful and accurate
conclusions and reduce the effect of sampling error.
We also need to know that broadly sampling can be divided into two types –
probability sampling and nonprobability sampling.
Examples of probability sampling include:
simple random sampling (each member in a population has an equal chance of being selected),
stratified random sampling (in nonhomogeneous populations, the population is divided into subgroups, followed by random sampling in each subgroup),
systematic sampling (based on a systematic technique – e.g., every third person is selected for a survey), and
cluster sampling (similar to stratified sampling except that the clusters here are preexisting, unlike stratified sampling where the researcher decides on the stratification criteria).
Nonprobability sampling, where every unit in the population does not have an equal chance of inclusion in the sample, includes methods such as convenience sampling (e.g., a sample selected based on ease of access) and purposive sampling (where only people who meet specific criteria are included in the sample).
A sample size that is too small may make the study underpowered, whereas a sample size that is larger than necessary may lead to a wastage of resources.
We will first go through the sample size calculation for a hypothesis-based design (like a randomized controlled trial).
The important factors to consider for sample size calculation include study
design, type of statistical test, level of significance, power and effect size,
variance (standard deviation for quantitative data), and expected
proportions in the case of qualitative data. This is based on previous data, either from previous studies or from the clinicians' experience. In case the study is being conducted for the first time, a pilot study might be conducted to help generate these data for further studies based on a larger sample size. It is also important to know whether the data follow a normal distribution or not.
================================================================
(3.4) PROBABILITY DISTRIBUTION FOR DESCRIPTIVE ANALYTICS
The most common descriptive statistics can also be displayed graphically or pictorially (graphical/pictorial methods).
Probability
Here is a brief introduction to probability. Before going to the actual definition of probability, let's look at some terminologies.
Experiment: An experiment could be something like — whether it
rains in Delhi on a daily basis or not.
Outcome: Outcome is the result of a single trial. If it rains today, the
outcome of today’s trial is “it rained”.
Event: An event is one or more outcomes of an experiment. For the experiment of whether it rains in Delhi every day, the event could be “it rained” or “it did not rain.”
Probability: This is simply the likelihood of an event. So if there's a 60% chance of it raining today, the probability of rain is 0.6.
Bernoulli Trials
An experiment which has exactly two outcomes, like a coin toss, is called a Bernoulli trial.
Probability distribution of the number of successes in n Bernoulli trials is
known as a Binomial distribution.
Formula for Binomial distribution is given below.
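The formula referred to here is the standard binomial probability mass function: P(X = k) = C(n, k) · p^k · (1 − p)^(n − k), the probability of exactly k successes in n Bernoulli trials with success probability p. A minimal Python sketch evaluating it:

# Binomial PMF: probability of exactly k successes in n Bernoulli trials.
from math import comb

def binomial_pmf(k, n, p):
    return comb(n, k) * p**k * (1 - p)**(n - k)

# Example: exactly 3 heads in 5 fair coin tosses -> 10/32 = 0.3125
print(binomial_pmf(3, 5, 0.5))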
Area under a probability density function gives the probability for the
random variable to be in that range.
If I have population data and I take random samples of equal size from it, the sample means are approximately normally distributed.
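This is the central limit theorem at work. A minimal simulation sketch with numpy; the population shape and sample sizes here are arbitrary choices for illustration:

# Sample means from a skewed (non-normal) population are ~normally distributed.
import numpy as np

rng = np.random.default_rng(0)
population = rng.exponential(scale=2.0, size=100_000)  # skewed population

sample_means = [rng.choice(population, size=50).mean() for _ in range(2_000)]
print(np.mean(sample_means), np.std(sample_means))  # clusters near pop. mean 2.0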
Normal Distribution
It basically describes what large samples of data look like when plotted. It is sometimes called the “bell curve” or the “Gaussian curve.”
Inferential statistics and the calculation of probabilities require that a normal distribution is given. This basically means that if your data is not normally distributed, you need to be very careful about which statistical tests you apply, since they could lead to wrong conclusions.
In a perfect normal distribution, each side is an exact mirror of the other, giving the symmetric bell shape.
In a normal distribution, the mean, mode and median are all equal and fall
at the same midline point.
================================================================
(3.5) ANALYSIS OF DESCRIPTIVE ANALYTICS
As soon as the “volume, velocity, and variety” of big data invades the limited business data silos, the game changes. Now, powered by the hidden intelligence of massive amounts of market data, descriptive analytics takes on new meaning. Whenever big data intervenes, vanilla-form descriptive analytics is combined with the extensive capabilities of prescriptive and predictive analytics to deliver highly focused insights into business issues and accurate future predictions based on past data patterns. Descriptive analytics mines and prepares the data for use by predictive or prescriptive analytics. Big data lends a wide context to the “nuggets of information” for telling the whole story.
People and culture can influence the intelligence gathered from business analytics. The “Analytics: Don't Forget the Human Element” study conducted jointly by Forbes Insights and EY interviewed global executives and concluded that:
Every modern business needs to build its data analytics framework,
where the latest data technologies like big data play a crucial role.
Data and technology should be made available at every corner of an
enterprise to develop and nurture a widespread data-driven culture.
If data and analytics are aligned with overall business goals, then
day-to-day business decisions will be more driven by data-driven
insights.
As people drive businesses, the manpower engaged in data
analytics must be competent and adequately trained to support
enterprise goals.
A centrally managed team must lead the analytics production and
consumption efforts in the enterprise to bring behavioral change
towards a data culture.
The concept of data analytics must be spread through both formal data centers and informal social networks for inclusive growth.
Typical uses of descriptive analytics include:
Summarizing past events such as regional sales, customer attrition, or the success of marketing campaigns.
Tabulation of social metrics such as Facebook likes, Tweets, or followers.
Reporting of general trends like hot travel destinations or news trends.
finance specialists to C-suite executives. Stunning visual dashboards always help to disseminate complex business information. A judicious combination of graphs, charts, and other visual elements presented on dashboards may be the best answer to catch the attention of a varied audience.
As data mining and machine learning jointly offer solutions to predict
customer segments and marketing ROIs, the future predictive analytics
techniques will continue to evolve into Prescriptive Analytics, creating a
mash-up of “predictions, simulations, and optimization.”
================================================================
UNIT IV
PREDICTIVE ANALYTICS
(Introduction to Predictive analytics - Logic and Data Driven Models -
Predictive Analysis Modeling and procedure - Data Mining for Predictive
analytics. Analysis of Predictive analytics)
1. Defining the Project
2. Collecting the Data
3. Performing the Analysis
4. Modeling
5. Deploying the Model
6. Continuous Model Monitoring
Step 1: Defining the Project
We define the project in terms of outcomes, deliverables, scope of work, and
even data sets. We ask ourselves: what is it that we want to predict? What
degree of accuracy must we achieve to consider the project a success?
What are the data sets we need to have in order to perform the analysis?
Step 2: Collecting the Data
Remember the first diagram contrasting engineering tasks with analytics
tasks? When we collect the data, we are in the engineering phase of the
project. We may even need to build additional automation to make the
collection, cleaning, and preparation of the data easier and faster. This may
be grunt work, but it's crucial grunt work. Garbage in, garbage out, as the
saying goes. You want to ensure the data you prepare is good enough at
this stage.
Step 3: Performing the Analysis
Now, we perform analysis on the sanitized data. The objective here is to
derive several candidate models we may use for prediction. This step is
usually done in tandem with the next step, which is creating the model
itself. In this step, we frequently employ statistical methods specific for
analysis. Plenty of testing is done here as well. We want to validate the
assumptions and test them using standard statistical techniques.
Step 4: Modeling
In this step, we use various statistical and machine learning approaches to
generate a predictive model. We take several candidate models from the
previous step and iteratively train them using training data. When the model
completes the training set, we then test it on new data to see how well it
performs. We may further refine the models or even discard some of them
altogether depending on the testing results. We may also revert to earlier
stages in order to generate more models, especially if we are not getting the
level of accuracy we are looking for.
Step 5: Deploying the Model
Eventually, we'll arrive at a predictive model we are willing to go live with.
Now, we deploy it in real-life systems. At this stage, we are back to doing
more engineering tasks. We want to deploy the new model in such a way
that we can collect more data on its performance. Perhaps the model needs
to incorporate new data points as they arrive for maximum effectiveness; in
this case, more engineering is required to ensure this feature is properly
implemented.
Step 6: Continuous Model Monitoring
Our predictive model has gone live. We are collecting data on how it's
performing in real time. Since we will need to regularly tweak the model as
time goes by, we need to continuously monitor the model and its
effectiveness. Deciding on how much to tweak the model is more art than
science. It can be hard to perform analysis in real time, so your data
engineers and your data analysts need to work more closely here.
The process of Predictive Analytics includes the following steps:
Defining the project: This is the first step of the Predictive Analytics model. Here you will have a clear-cut definition of the outcome of the project, the business objectives, the scope of the effort, the data sets to be used, and more.
Collecting the data: This is the second step of the process, wherein you mine data from multiple sources and prepare it for the Predictive Analytics model, providing a complete overview of the entire process.
Analyzing the data: This is the process that includes the various steps of inspection, cleaning, and modelling of the data to discover useful information and reach a conclusion.
Deploying the statistics: Here you validate the assumptions and hypotheses and test them using standard statistical models.
Data Modeling: This is the process that provides the ability to create automated predictive models of the future. You can also create a set of models and choose the most optimal one.
Model Deployment: This is the step in which you deploy the analytical results into your everyday business operations, helping to get results, reports and the output of the automated decisions.
Monitoring the Model: The models are reviewed in order to ensure their performance is going in the right direction.
Predictive models capture the relation between the specific performance of a unit in a sample and the other attributes of that unit. The model is designed to assess the likelihood that a unit in a different sample exhibits the same specific performance. Predictive models are used in many domains to answer a whole set of questions in areas like marketing, sales, and customer service, among others.
Predictive Analytics is used for various purposes like business segmentation and decision-making, among other tasks. Huge advancements in computing speeds and in the availability of modeling techniques make it possible to come up with valuable insights.
INDUSTRIES USING PREDICTIVE ANALYTICS MODEL
Aerospace: The amount of data generated by modern-era aircraft is phenomenal. Today, due to the abundance of sensors, newer ways of storing data, and the various ways in which that data can be made useful, Predictive Analytics is taking huge strides in the aerospace industry.
Automotive: Today's automobiles are heavily invested in deploying the most cutting-edge gadgets, technologies and sensors, which support highly valuable analytical methods for ensuring the driving experience is simply phenomenal. In the not-so-distant future, most automobiles will be connected to the Internet of Things, and because of this the role of Predictive Analytics will only grow stronger.
Energy & Utilities: This is another domain wherein the role of Predictive Analytics is very significant. It helps to predict the demand and supply of electrical energy through the power grids. Various sophisticated models are used for monitoring plant availability, the impact of changing weather patterns, learning from historical trends, and forecasting the optimal demand and supply balance, among other things that can help the energy domain save huge amounts of money and resources.
Banking and Financial Services: This is one of the biggest domains currently deploying Predictive Analytics at scale. Due to the large amounts of data being generated and the extremely high stakes involved, banking and financial institutions are increasingly deploying Predictive Analytics to ensure customers get a world-class experience that is customer-friendly, secure and forward-looking. It is possible to tailor-make products and services depending on the profile built around the customer, find opportunities for cross-selling and up-selling, and detect patterns of fraud and malpractice, among a host of other things.
Retail: The retail industry is working with predictive analytical tools and technologies to get inside the mind of the customers. It includes the process of stocking the right products, promoting the right products to the right customers, providing the most optimal discounts to persuade sales, and having the right strategy for marketing and advertising, among a whole host of other aspects.
Oil & Gas: The oil and gas industry is a big user of Predictive Analytics. It helps save millions of dollars through better prediction of equipment failure and the need for future resources, and by ensuring safety and reliability measures are met.
There is a lot of sensor data that needs to be monitored in order to take the right data-driven decisions in the oil and gas industry.
Governments: Since the data in a government department is humongous thanks to the all-encompassing nature of this domain, there is a huge untapped opportunity which can be aptly exploited using the right Predictive Analytics tools and technologies. It could be deployed for providing the right services to the citizens, monitoring that the various welfare schemes are reaching the right audience, checking corruption and malpractice, and so on.
Manufacturing: Even in today's services-oriented economy, manufacturing is still extremely important. The manufacturing industry can use Predictive Analytics to streamline its various processes, improve quality of service, manage the supply chain, and optimize distribution, among other tasks that enhance overall business revenue and help achieve bigger goals.
===============================================================
(4.2) PREDICTIVE ANALYSIS - LOGIC AND DATA DRIVEN MODELS
PREDICTIVE MODELING
What is predictive modeling?
Predictive modeling is a mathematical process used to predict future
events or outcomes by analyzing patterns in a given set of input data. It is
a crucial component
of predictive analytics, a type of data analytics which uses current and historical
data to forecast activity, behavior and trends.
Examples of predictive modeling include estimating the quality of a sales
lead, the likelihood of spam or the probability someone will click a link or
buy a product. These capabilities are often baked into various business
applications, so it is worth understanding the mechanics of predictive
modeling to troubleshoot and improve performance.
Supervised models use newer machine learning techniques such
as neural networks to identify patterns buried in data that has
already been labeled.
The biggest difference between these approaches is that with supervised
models more care must be taken to properly label data sets upfront.
"The application of different types of models tends to be more domain-
specific than industry-specific," said Scott Buchholz, government and
public services CTO and emerging technology research director at
Deloitte Consulting.
In certain cases, for example, standard statistical regression analysis may
provide the best predictive power. In other cases, more sophisticated
models are the right approach. For example, in a hospital, classic statistical
techniques may be enough to identify key constraints for scheduling, but
neural networks, a type of deep learning, may be required to optimize
patient assignment to doctors.
Once data scientists gather this sample data, they must select the right
model. Linear regressions are among the simplest types of predictive
models. Linear models take two variables that are correlated -- one
independent and the other dependent -- and plot one on the x-axis and one
on the y-axis. The model applies a best fit line to the resulting data points.
Data scientists can use this to predict future occurrences of the dependent
variable.
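As a minimal sketch of this idea, the following Python snippet fits a best-fit line to a small set of invented (x, y) points with numpy and uses it to predict a future value of the dependent variable:

# Fit a best-fit line y = b*x + a to correlated data and predict new values.
import numpy as np

x = np.array([1, 2, 3, 4, 5], dtype=float)  # independent variable
y = np.array([2.1, 3.9, 6.2, 8.1, 9.8])     # dependent variable

b, a = np.polyfit(x, y, deg=1)              # slope and intercept of best fit
print("y =", round(b, 2), "* x +", round(a, 2))
print("prediction at x = 6:", b * 6 + a)    # future occurrence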
Neural networks. This technique reviews large volumes of labeled
data in search of correlations between variables in the data. Neural
networks form the basis of many of today's examples of artificial
intelligence (AI), including image recognition, smart assistants and
natural language generation.
The most complex area of predictive modeling is the neural network. This
type of machine learning model independently reviews large volumes of
labeled data in search of correlations between variables in the data. It can
detect even subtle correlations that only emerge after reviewing millions of
data points. The algorithm can then make inferences about unlabeled data
files that are similar in type to the data set it trained on.
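A minimal sketch of such a model, assuming scikit-learn is available; the tiny XOR-style data set is purely illustrative, and a real application would use far more data:

# A small neural network learning a nonlinear pattern in labeled data.
from sklearn.neural_network import MLPClassifier

# Toy labeled data: an XOR-like pattern, repeated to give the net enough rows.
X = [[0, 0], [0, 1], [1, 0], [1, 1]] * 50
y = [0, 1, 1, 0] * 50

model = MLPClassifier(hidden_layer_sizes=(8,), max_iter=2000, random_state=0)
model.fit(X, y)
print(model.predict([[0, 1], [1, 1]]))  # inference on new, unlabeled inputs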
BENEFITS OF PREDICTIVE MODELING
Phil Cooper, group VP of products at Clari, a RevOps software startup, said
some of the top benefits of predictive modeling in business include the
following:
Prioritizing resources. Predictive modeling is used to identify sales lead
conversion and send the best leads to inside sales teams; predict
whether a customer service case will be escalated and triage and
route it appropriately; and predict whether a customer will pay their
invoice on time and optimize accounts receivable workflows.
Improving profit margins. Predictive modeling is used to forecast
inventory, create pricing strategies, predict the number of
customers and configure store layouts to maximize sales.
Optimizing marketing campaigns. Predictive modeling is used to
unearth new customer insights and predict behaviors based on
inputs, allowing organizations to tailor marketing strategies, retain
valuable customers and take advantage of cross-sell opportunities.
Reducing risk. Predictive analytics can detect activities that are out
of the ordinary such as fraudulent transactions, corporate spying or
cyber attacks to reduce reaction time and negative consequences.
The techniques used in predictive modeling are probabilistic as opposed
to deterministic. This means models generate probabilities of an
outcome and include some uncertainty.
"This is a fundamental and inherent difference between data modeling of
historical facts versus predicting future events [based on historical data] and
has implications for how this information is communicated to users," Cooper
said. Understanding this difference is a critical necessity for transparency
and explainability in how a prediction or recommendation was generated.
Different tools have different data literacy requirements, are effective in
different use cases, are best used with similar software and can be
expensive. Once your organization has clarity on these issues, comparing
tools becomes easier.
(4.3) DATA MINING FOR PREDICTIVE ANALYTICS
DATA MINING
Data mining is the process of discovering patterns and anomalies from large sets of data through the application of statistics, machine learning and database systems [17]. The objective of data mining is to extract the useful data from a data set and process it to obtain information that has an understandable structure. It includes provision for automated and semi-
automated analysis of large quantities of data to extract recognizable patterns, dependencies between the data, groups or clusters of records, and the relations between these records, as well as the anomalies occurring within the system. Data mining can be applied to identify multiple groups and clusters of data and forward them for further analysis through machine learning or predictive analytics.
always bought chocolates along with milk. The milk in this case gets associated with chocolates, and thus a pattern emerges that provides information about the sale of milk and chocolates; it suggests that the next time the person buys chocolate, he'll buy milk as well.
2. Classification – Classification, as the name suggests, is the process of identifying an object by relating to it multiple attributes that describe that object thoroughly. Classification is used to build up the reference or idea of an object by describing it using multiple attributes to define its particular class. Let us try to understand the same through an example of the classification of cars. Given a car as an object, we can identify or classify it through multiple attributes such as shape, number of seats, transmission type, etc. Through proper comparison and analysis of the categories or attributes, one can identify the class into which the object should be placed.
3. Clustering – Once the attributes are applied to an object, through proper and considerate examination of the attributes or classes, one can group it with similar kinds of objects. Such a group of similar objects is referred to as a cluster. A cluster uses a single attribute, or a group of attributes or classes defined for an object, as its base for segregation, and combines a set of results having a correlating group of objects. Clustering works bi-directionally: it is useful in identifying a group of objects having a set of similar attributes, and it is also useful in identifying the criteria that define an existing cluster of objects.
4. Prediction – Prediction is a wide area of study that ranges from predicting the failure of machinery, to the identification of frauds and intrusions, to even predicting the future prospects of an organization or business. When combined with the techniques of data mining, prediction involves various tasks such as the analysis and creation of trends, classification, clustering, pattern discovery and relation. Through thorough examination and analysis of past trends or events, one can make effective predictions regarding the future occurrence of an event.
5. Sequential Patterns – Sequential patterns are defined over a long period of time, wherein trends and similar activities or events are identified to be occurring on a regular basis [23]. This is considered a very useful technique for identifying trends and similar events. Consider the example of a customer at a grocery store who buys a collection of items regularly over the year or so. Through analytics over the historical data, patterns of groceries bought over the year can be used to provide a sequential pattern of what is to be added to the grocery list.
6. Decision trees – Decision trees are closely related to the techniques defined above. They can be used either to provide the criteria for selection or to support the selection and use of specific data within the overall structure. A decision tree starts with a question that has more than a single result or option to be selected [14]. Each of the results in turn leads to another question, which can again be categorized into a result set of more than one option. These subsequent questions lead to the categorization of the data set so as to facilitate prediction based on the result set.
7. Combinations – In real-world applications, several techniques are applied in combination; exclusive use of a single technique is rarely sufficient. Clustering and classification are similar techniques and are often used together to achieve the desired result. Clustering can be used to identify the nearest neighbors, which can be useful in refining the classification of the data set in use. In the same way, decision trees can act as a base for the techniques of sequential patterns and can be used to identify and build classifications which, when done over a longer period of time, can be used to identify patterns of similarity between data sets and could help in prediction.
8. Long Term Processing – Data analytics and predictive analytics are based purely on the data and information processed over a period of time. There is a need to record the data for the long term and then process the data for patterns, classification, categorization and prediction. For example, for predictive learning and sequential patterns, there is a need to store and process the historical data and instances of information for building a pattern. As time passes, new data and information are identified and processed along with the historical data and information, and the analytics is applied to the whole set of data so as to cope with the additional data.
Regression techniques are focused on the establishment of mathematical equations so as to model, represent and process the information from the available data set [25]. Some of the regression techniques in use are described as follows.
a. Linear Regression Model – This technique establishes a linear relationship between the dependent variable y and one or more independent variables x. In the simple case it is represented through the linear equation y = a + bx + c, where a is the intercept, b is the slope, and c is the error term.
and methods for prediction of risks and opportunities and is found
applicable in various fields such as banking fraud detection, medical
diagnosis, natural language processing and analysis over the stock market.
Some of the methods commonly used for predictive analytics through
machine learning are defined as follows.
a. Neural Networks – These are nonlinear modelling techniques that learn the relationship between inputs and outputs through training. There are three types of training used by neural networks: supervised, unsupervised and reinforcement training. This technique can be applied for prediction, control and classification in various fields.
b. Multilayer perceptron – This technique consists of an input and an output layer with multiple hidden layers of nonlinear units, and its behavior is determined by adjusting the weights of the network. The adjustment of the weights is done through a process called training the net, which applies the learning rules.
c. Radial basis functions – The radial basis function technique is built on the criterion of the distance of data points with reference to a center. These functions are basically used for interpolation of data as well as smoothing of data.
d. Support vector machines – SVMs are designed and defined to detect and
identify the complex patterns and sequences within the data set through
clustering and classification of the data. They are also referred to as the
learning machines.
e. Naïve Bayes – Naïve Bayes is deployed for the execution of classification
of data through the application of Bayes Conditional Probability [8]. It is
basically implemented and applied when the number of predictors is very
high.
f. k-nearest neighbours – This technique involves pattern recognition methods for statistical prediction. It uses a training set with both positive and negative values.
g. Geospatial Predictive Modelling – This modelling technique models the occurrence of events over a spatial area under the influence of spatial environmental factors. The occurrence of events is defined to be neither uniform nor random, but spatial in nature.
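Several of these methods (e and f above, for example) are available off the shelf. A minimal sketch with scikit-learn on an invented two-cluster data set:

# Naive Bayes and k-nearest neighbours classifiers on toy data.
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier

X = [[1, 1], [2, 1], [1, 2], [8, 9], [9, 8], [9, 9]]  # training features
y = [0, 0, 0, 1, 1, 1]                                # class labels

nb = GaussianNB().fit(X, y)
knn = KNeighborsClassifier(n_neighbors=3).fit(X, y)
print(nb.predict([[2, 2], [8, 8]]))   # expected: [0 1]
print(knn.predict([[2, 2], [8, 8]]))  # expected: [0 1]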
defined for an organization. CRM makes use of these analyses in applications for increasing sales targets, marketing and campaigns [2][5]. This not only impacts business growth, but also makes the business customer-centric by widening the base for customer satisfaction [7].
2. Child Protection – Child abuse is a serious offence, and child protection is a priority in any country [18]. Several child welfare organizations have applied predictive analytics to flag high-risk cases of child abuse [11]. The predictive models help in identifying, from medical records, the cases that could fall under the child abuse criteria. This approach is termed “innovative” by the Commission to Eliminate Child Abuse and Neglect Fatalities (CECANF) [19]. Using predictive analytics, child abuse related felonies have been identified at an earlier stage, preventing much further harm [20].
3. Clinical decision support systems – As defined, clinical decision support (CDS) provides clinicians, staff, patients, or other individuals with knowledge and person-specific information, intelligently filtered or presented at appropriate times, to enhance health and health care [10][21]. It encompasses a variety of tools and interventions such as computerized alerts and reminders, clinical guidelines, order sets, patient data reports and dashboards, documentation templates, diagnostic support, and clinical workflow tools [22]. Experts have used predictive analytics to model the clinical data of patients so as to determine the extent to which a patient might be exposed to a disease and to predict the risk of developing certain conditions such as heart disease, asthma or diabetes. These approaches have been devised so as to predict both the state and level of the disease as well as the diagnosis and disease progression forecasting [12][13].
4. Collection Analytics – Many portfolios these days have a set of customers who do not make their payments within the defined time, and the companies incur a lot of financial expenditure on the collection of those payments [27]. Thus companies have started applying predictive analytics over their customers for effective analysis of the spending, usage and behavior of the customers who are unable to make their payments, and to allocate the most effective legal agencies and strategies for each customer, thus increasing recovery significantly with less financial expenditure.
5. Fraud Detection – Fraud is one of the biggest challenges faced by businesses around the globe and can be of various kinds, such as fraudulent online transactions, invalid credits [15], identity thefts and multiple false insurance claims. Predictive modelling can be applied to this area so as to model the data of the organization and detect such fraudulent activities [24]. These models have the capability to identify and predict the customers engaged in such activities. Many revenue systems also take this approach to identify non-tax-payers and tax fraud [24].
6. Project Risk Management – Each company employs risk management techniques so as to increase its revenue. These risk management techniques involve the use of predictive analytics to predict the cost and benefit of a project within an organization, and also help in organizing work management so as to maximize the profit statement. These approaches can be applied everywhere from projects to markets so as to maximize the return from the investment.
CONCLUSIONS
Predictive analytics is the future of data mining. This survey presented the trends, techniques and applications of predictive analytics through the application of data mining. Data mining leading to predictive analytics is becoming key to every organization, as it can be applied under various circumstances so as to support the growth of the organization. Predictive analytics aids not only in the expansion of the business, but also prevents losses through the analysis of fraudulent activities.
================================================================
(4.4) ANALYSIS OF PREDICTIVE ANALYTICS
What Is Predictive Analytics?
The term predictive analytics refers to the use of statistics and modeling
techniques to make predictions about future outcomes and performance.
Predictive analytics looks at current and historical data patterns to
determine if those patterns are likely to emerge again. This allows
businesses and investors to adjust where they use their resources to take advantage of possible future events. Predictive analytics can also be used to improve operational efficiencies and reduce risk.
PREDICTIVE ANALYTICS IS A DECISION-MAKING TOOL
IN A VARIETY OF INDUSTRIES.
1. Forecasting
Forecasting is essential in manufacturing because it ensures the optimal
utilization of resources in a supply chain. Critical spokes of the supply
chain wheel, whether it is inventory management or the shop floor,
require accurate forecasts for functioning.
Predictive modeling is often used to clean and optimize the quality of data
used for such forecasts. Modeling ensures that more data can be ingested by
the system, including from customer-facing operations, to ensure a more
accurate forecast.
2. Credit
Credit scoring makes extensive use of predictive analytics. When a
consumer or business applies for credit, data on the applicant's credit
history and the credit record of borrowers with similar characteristics are
used to predict the risk that the applicant might fail to perform on any credit
extended.
3. Underwriting
Data and predictive analytics play an important role in underwriting.
Insurance companies examine policy applicants to determine the likelihood
of having to pay out for a future claim based on the current risk pool of
similar policyholders, as well as past events that have resulted in payouts.
Predictive models that consider characteristics in comparison to data about
past policyholders and claims are routinely used by actuaries.
4. Marketing
Individuals who work in this field look at how consumers have reacted to the
overall economy when planning on a new campaign. They can use these
shifts in demographics to determine if the current mix of products will entice
consumers to make a purchase.
Active traders, meanwhile, look at a variety of metrics based on past events
when deciding whether to buy or sell a security. Moving averages, bands,
and breakpoints are based on historical data and are used to forecast future
price movements.
2. Regression
This is the model that is used the most in statistical analysis. Use it when
you want to determine patterns in large sets of data and when there's a
linear relationship between the inputs. This method works by figuring out a
formula, which represents the relationship between all the inputs found in
the dataset. For example, you can use regression to figure out how price
and other key factors can shape the performance of a security.
3. Neural Networks
Neural networks were developed as a form of predictive analytics by
imitating the way the human brain works. This model can deal with complex
data relationships using artificial intelligence and pattern recognition. Use it
if you have several hurdles that you need to overcome like when you have
too much data on hand, when you don't have the formula you need to help
you find a relationship between the inputs and outputs in your dataset, or
when you need to make predictions rather than come up with explanations.
Other application areas include:
Customer service
Investment portfolio development
UNIT V
PRESCRIPTIVE ANALYTICS
(Introduction to Prescriptive analytics - Prescriptive Modeling - Non Linear Optimisation - Demonstrating Business Performance Improvement).
Prescriptive analytics is the third and final phase of business analytics, which
also includes descriptive and predictive analytics.
Prescriptive Analytics entails the application of mathematical and
computational sciences and suggests decision options to take advantage of
the results of descriptive and predictive analytics.
In essence, prescriptive analytics takes the “what we know”
(data), comprehensively understands that data to predict what could
happen, and suggests the best steps forward based on informed
simulations.
Prescriptive analytics is the third and final tier in modern, computerized data
processing. These three tiers include:
Descriptive analytics: Descriptive analytics acts as an initial catalyst
to clear and concise data analysis. It is the “what we know” (current
user data, real-time data, previous engagement data, and big data).
Predictive analytics: Predictive analytics applies mathematical models to the current data to inform (predict) future behavior. It is the “what could happen.”
Prescriptive analytics: Prescriptive analytics utilizes similar modeling
structures to predict outcomes and then utilizes a combination of
machine learning, business rules, artificial intelligence, and algorithms
to simulate various approaches to these numerous outcomes. It then
suggests the best possible actions to optimize business practices. It is
the “what should happen.”
How Prescriptive Analytics Works
Prescriptive analytics relies on artificial intelligence techniques, such as
machine learning—the ability of a computer program, without additional
human input, to understand and advance from the data it acquires, adapting
all the while. Machine learning makes it possible to process a tremendous
amount of data available today. As new or additional data becomes
available, computer programs adjust automatically to make use of it, in a
process that is much faster and more comprehensive than human
capabilities could manage.
Prescriptive analytics works with another type of data analytics, predictive
analytics, which involves the use of statistics and modeling to determine
future performance, based on current and historical data. However, it goes
further: Using the predictive analytics' estimation of what is likely to
happen, it recommends what future course to take.
Numerous types of data-intensive businesses and government
agencies can benefit from using prescriptive analytics, including those in
the financial services and health care sectors, where the cost of human
error is high.
Prescriptive analytics makes use of machine learning to help
businesses decide a course of action based on a computer program’s
predictions.
Prescriptive analytics works with descriptive analytics and
predictive analytics, which uses data to determine near-term
outcomes.
When used effectively, prescriptive analytics can help organizations
make decisions based on facts and probability-weighted projections,
rather than jump to under-informed conclusions based on instinct.
reins of business intelligence to apply simulated actions to a scenario
to produce the steps necessary to avoid failure or achieve success.
2. Inform real-time and long-term business operations. Decision makers can view both real-time and forecasted data simultaneously to make decisions that support sustained growth and success. This streamlines decision making by offering specific recommendations.
3. Spend less time thinking and more time doing. The instant turnaround of data analysis and outcome prediction lets your team spend less time finding problems and more time designing the perfect solutions. Artificial intelligence can curate and process data better than your team of data engineers and in a fraction of the time.
4. Reduce human error or bias. Through more advanced algorithms and machine learning processes, prescriptive analytics provides an even more comprehensive and accurate form of data aggregation and analysis than descriptive analytics, predictive analytics, or individuals.
Pharmaceutical research – identifying the best testing and patient
groups for clinical trials.
==============================================================
(5.2) PRESCRIPTIVE MODELING - PRESCRIPTIVE PROCESS MODELS
Prescriptive analytics models businesses while taking into account all inputs, processes and outputs. Models are calibrated and validated to ensure they accurately reflect business processes. Prescriptive analytics recommends the best way forward with actionable information to maximize overall returns and profitability.
The planning of the design is required before the whole system is broken into small increments.
Customer demands for additional functionality after every increment can cause problems for the system architecture.
3. RAD model
RAD is the Rapid Application Development model.
Using the RAD model, a software product is developed in a short period of time.
The initial activity starts with communication between the customer and the developer.
Planning depends upon the initial requirements, and then the requirements are divided into groups.
Planning is important so that teams can work together on different modules.
The RAD model consists of the following phases:
1. Business Modeling
Business modeling consists of the flow of information between various functions in the project.
For example, what type of information is produced by each function, and which functions handle that information?
A complete business analysis should be performed to obtain the essential business information.
2. Data modeling
The information from the business modeling phase is refined into a set of data objects that are essential for the business.
The attributes of each object are identified, and the relationships between objects are defined.
3. Process modeling
The data objects defined in the data modeling phase are transformed to achieve the information flow needed to implement the business model.
Process descriptions are created for adding, modifying, deleting or retrieving a data object.
4. Application generation
In the application generation phase, the actual system is built.
Automated tools are used to construct the software.
5. Testing and turnover
The prototypes are independently tested after each iteration, so the overall testing time is reduced.
The data flow and the interfaces between all the components are fully tested. Hence, most of the programming components are already tested.
==============================================================
decision-oriented mathematical models, as well as to the growing availability of computer routines capable of solving large-scale nonlinear problems.
What are the three common elements of an optimization problem?
Optimization problems are classified according to the mathematical
characteristics of the objective function, the constraints, and the
controllable decision variables.
Optimization problems are made up of three basic ingredients: an objective function that we want to minimize or maximize, a set of decision variables under our control, and constraints that the variables must satisfy.
OPTIMIZATION
The concept of optimization is now well rooted as a principle underlying the analysis of many complex decision or allocation problems. It offers a certain degree of philosophical elegance that is hard to dispute, and it often offers an indispensable degree of operational simplicity. Using this optimization philosophy, one approaches a complex decision problem, involving the selection of values for a number of interrelated variables, by focusing attention on a single objective designed to quantify performance and measure the quality of the decision. This one objective is maximized (or minimized, depending on the formulation) subject to the constraints that may limit the selection of decision variable values.
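To make the three ingredients concrete, here is a minimal sketch using scipy's linear programming solver; the product-mix profits and resource limits are invented for illustration:

# Maximize profit 3x + 5y subject to resource constraints (a tiny LP).
from scipy.optimize import linprog

# linprog minimizes, so the objective coefficients are negated to maximize.
c = [-3, -5]                 # profit per unit of products x and y
A_ub = [[1, 2], [3, 1]]      # resource usage per unit (the constraints)
b_ub = [14, 18]              # resource availability
res = linprog(c, A_ub=A_ub, b_ub=b_ub, bounds=[(0, None), (0, None)])
print(res.x, -res.fun)       # optimal quantities and maximum profit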
Disadvantages of heuristics
While logical, heuristics can't provide answers which haven't been predetermined. This means that heuristics cannot analyze all possible scenarios, a particular limitation of Excel-based scenario analysis. The answers may be neither optimal nor always feasible. Rules need constant maintenance to reflect changes, and it's difficult to account for constraints.
Optimization modeling can handle numerous constraints and trade-offs to
determine the best combination of feasible business activities to optimize the
objective function.
Advantages of packages
Especially with cloud-based packages, configuration is fast and setup can
be accomplished within a few days. Packages work well with standardized
applications, especially in large organizations with many branches.
Solutions require minimal IT involvement.
Optimization Platforms
These comprise two parts: the modeling platform and an optimization solver. They're sometimes sold as separate solutions, although in other instances vendors supply a complete platform comprising both functions. In both instances, the user must research, design and develop the model.
Third- and fourth-generation optimization modeling platforms are based on high-level programming languages such as Python, Ruby and SQL, and they require a degree of expertise to write the algorithms that define the model and to validate it. Fifth-generation modeling software that uses a drag-and-drop approach to create a constraint-based model is simpler and easier to use.
prepare. They're able to have a greater impact on value without sacrificing the
need for flexibility, agility, and user-friendly interfaces.
Inputs – What does the input data look like? What data should be collected? What methods will be used for the collection of the data?
Business objective – What is the business objective? What is the expected value of the product or service? What are the most important constraints?
Outputs – What is the expected outcome? How will it be measured?
We do this so we know what outcome we are eventually trying to achieve.
Prescriptive analytics is typically used in situations where there is no causal relationship between two or more variables. It is useful when you want to know how to change the condition of the dependent variable in order to maximize the likelihood of the outcome you are trying to achieve.
prescriptive analytics by offering faster processing speeds with more robust
memory capabilities.
Prescriptive analytics is largely dedicated to the question “how to?” rather than
“what if”.
While one could think of predictive analytics as determining the outcome of a forecast, it's still about achieving an outcome. Prescriptive analytics, on the other hand, doesn't really care what the outcome is: all it is concerned with is what should happen and how to achieve it. While predictive analytics and prescriptive analytics might seem like two sides of the same coin, they're really not. Predictive analytics is about providing support for an existing decision, while prescriptive analytics helps you develop a new decision. There's still a lot of overlap between the two, but we'll get to that.
One example that differentiates predictive from prescriptive analytics: you
need a predictive model to find out which customers are likely to buy, but
you need prescriptive models to find out which is the best communication
channel, or how many emails you should send, to increase the chance of
purchase.
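A minimal sketch of that split, with every number assumed for illustration:
the predictive part is reduced to a given baseline purchase probability and
a hypothetical response curve, while the prescriptive step enumerates
candidate actions (how many emails to send) and picks the one with the best
expected value:

    import math

    baseline_p = 0.05   # assumed output of a predictive model for one customer

    def purchase_prob(n_emails, p0=baseline_p, lift=0.4):
        # Assumed response curve: each email lifts the probability of
        # purchase with diminishing returns.
        return 1 - (1 - p0) * math.exp(-lift * n_emails)

    ORDER_VALUE = 50.0   # revenue per purchase (hypothetical)
    EMAIL_COST = 0.80    # cost per email sent, incl. goodwill (hypothetical)

    # Prescriptive step: evaluate each candidate action, choose the best one.
    best_n = max(range(11),
                 key=lambda n: purchase_prob(n) * ORDER_VALUE - n * EMAIL_COST)
    print("send", best_n, "emails")   # a recommended action, not a forecast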
4. Achieving higher agility: By simulating and running different
scenarios of sudden market shifts, you find the best ways to respond
to those shifts quickly and to your advantage. You get to make
near-real-time decisions instead of waiting for weeks.
5. Creating long-term strategies: Prescriptive analytics removes data
silos and brings teams together in a collaborative model for the long
run.
6. Managing risk: By answering complex questions related to demand and
supply, your business can optimise investments and reduce risk.
7. Fighting retail fraud: Prescriptive analytics takes the guesswork out
of fraud management. Especially after the pandemic, retailers are
identifying fraud impact and using data and tools to detect fraudulent
e-commerce returns claims and minimise cashier fraud in stores.
Knowing the right solution to the problem is half the battle won. At Infosys
BPM, we've got more than just the know-how: for organizations on the
digital transformation journey, agility is key to responding to a rapidly
changing technology and business landscape.
By applying machine learning and artificial intelligence, a prescriptive
analytics solution can recommend the optimal action plan likely to drive
specific business outcomes.
The solution eliminates bottlenecks and enables the best possible use of
materials, staff and machine capacity.
Supply chain management
Manufacturers have a finite number of trucks, warehouses and drivers, and an
almost unlimited number of potential delivery addresses. On any given day,
they need to decide how many drivers and trucks to put on the road, while
keeping the costs of equipment, fuel and drivers to a minimum and still
meeting delivery commitments.
FleetPride, North America’s largest distributor of truck and trailer
parts in the independent heavy-duty aftermarket channel, is transforming
its supply chain management with IBM Decision Optimization. The solution
helps FleetPride determine optimal warehouse locations and minimize
delivery time and costs across its network, resulting in reduced labor
costs and higher revenues.
Detailed scheduling
With products becoming more complex and supply chains more extended,
production scheduling is often highly complex. Production facilities may only
have a few days each month to handle specific customer orders.
Optimization considers not only asset use and production objectives in
scheduling decisions, but also their impact on shift configurations, selection
of regular versus overtime labor and other constraints.
Inventory optimization
Stocking the right products in the right quantities at the right locations
requires precision. Decision optimization software helps firms manage their
inventories to meet customer demand while reducing costs. It enables
firms to compare multiple planning scenarios using what-if analysis and
choose the best option, avoiding under- and overstocks and freeing up
capital for reinvestment elsewhere.
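A minimal what-if sketch along these lines, with all costs and the demand
distribution assumed for illustration: each candidate stock level is scored
by simulated expected cost (holding cost for overstock, a penalty for
stockouts), and the cheapest scenario is chosen:

    import random

    random.seed(42)
    UNIT_HOLDING_COST = 2.0    # cost per unsold unit held (assumed)
    UNIT_STOCKOUT_COST = 9.0   # cost per unit of unmet demand (assumed)

    def expected_cost(stock_level, n_trials=10_000):
        # Simulate weekly demand (assumed normal) and average the total cost.
        total = 0.0
        for _ in range(n_trials):
            demand = random.gauss(100, 20)
            total += max(stock_level - demand, 0) * UNIT_HOLDING_COST
            total += max(demand - stock_level, 0) * UNIT_STOCKOUT_COST
        return total / n_trials

    # What-if analysis: score each candidate stock level, keep the cheapest.
    scenarios = {s: expected_cost(s) for s in (90, 100, 110, 120, 130)}
    best = min(scenarios, key=scenarios.get)
    print(best, round(scenarios[best], 2))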
Maintenance scheduling
In manufacturing, when an asset breaks down, every minute lost is costly.
When equipment breaks down, it reduces profitability in a variety of ways,
including production downtime, higher labor costs per unit, and added stress
on employees and machines.
Decision optimization can recommend the best time and sequence for
scheduling maintenance tasks in relation to production targets, downtime,
inventory requirements and other interdependencies. The results can then
be fed into companies’ enterprise resource planning, business intelligence,
logistics and other enterprise systems to
continuously reoptimize decisions as conditions change. This adds up to
major time savings, increased agility and greater ROI.
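As a toy illustration of the trade-off (all figures hypothetical, and not
IBM's actual method): each candidate maintenance slot is scored by the
production value lost in that shift plus the expected breakdown cost of
waiting that long, and the cheapest slot wins:

    # Hypothetical production value per shift (lost if that shift is taken down).
    shift_output_value = [120, 80, 150, 60, 140]
    BREAKDOWN_COST = 400.0        # assumed cost of an unplanned failure
    RISK_PER_SHIFT = 0.03         # assumed failure risk added per shift waited

    def expected_cost(slot):
        # Production lost in the chosen shift plus the expected breakdown
        # cost accumulated while waiting for that slot.
        risk_before = 1 - (1 - RISK_PER_SHIFT) ** slot
        return shift_output_value[slot] + risk_before * BREAKDOWN_COST

    best_slot = min(range(len(shift_output_value)), key=expected_cost)
    print("maintain during shift", best_slot)   # shift 1 is cheapest here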
Start turning predictions into decisions
Manufacturers ready to take the next step toward prescriptive
analytics should focus on complex decisions where they have a lot of data.
These should be in areas where they are already getting good results from
descriptive and predictive analytics but need help turning those predictions
into decisions.
When it comes to solving complex manufacturing challenges, decision
optimization can help companies stay focused on their goals. IBM Decision
Optimization offers all the capabilities manufacturing firms need to
capitalize on the power of prescriptive decision making.
It increases the likelihood that companies will approach and plan for
internal growth properly.
Qualitative research method: know the characteristics that distinguish it.
Production optimization.
Efficient supply chain management.
Improved customer service and experience.
Due to its complexity, there are still few companies that use prescriptive
analysis. However, prescriptive analysis benefits have already become
evident in many fields, including, but not limited to, healthcare, insurance,
financial risk management, and sales and marketing operations.
Among its most significant advantages is that it allows decision making
based on data, giving an end-to-end view of costs, processes, and
performance.
It becomes possible to quantify risks and identify the actions considered
ideal in different circumstances, because the algorithms can use current
data to predict the consequences of each decision. Therefore, it allows you
to follow the path that offers the most satisfactory results.
Prescriptive analysis allows more effective planning of marketing and sales
actions, bringing information that significantly strengthens business
intelligence. Decisions are therefore made according to facts, with
knowledge of the consequences that will arise from them.
Despite its potential, prescriptive analysis only delivers results through
joint work between machine and human. That's because the technology doesn't
make decisions alone: it organizes the information, analyzes the scenario,
and indicates the best thing to do, leaving the professional to act on the
suggestions.
Conclusion
As we saw, Prescriptive Analytics has great potential to support businesses,
optimizing resources and increasing operational efficiency.
===============================================================
LESSON QUESTIONS (2 marks for some questions / 15 marks for some questions)
SL No. QUESTION
1 What is business analytics?
24 What is HR analytics?
29 What is SQL and what are its advantages for data sets and databases?
44 Can descriptive statistics be used to make inferences or predictions?
60 What techniques are used in descriptive analytics?
76 Explain the six tasks and functions of data mining.
UNIT V - PRESCRIPTIVE ANALYTICS
81 What is prescriptive analytics?