Introduction to Data Analysis
FMCG
Ensuring availability of FMCG Brands
in retail stores
How do FMCG companies ensure availability
of their brands in retail stores across
different locations?
Let's Analyze
Why do men buy beer and diapers
together at Walmart?
Will the bank approve the loan application?
Data Analysis
What is data analysis?
• Systematic Process
• Raw data to information
• Insights related to a specific purpose
• Summaries and conclusions
• Use of variety of methods
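The steps above (raw data, summary, conclusion) can be sketched with a toy version of the FMCG availability question. This is only an illustrative sketch: the store names, the visit records, and the 50% threshold are all invented assumptions, not data from any real company.

```python
from collections import defaultdict

# Hypothetical raw data: one record per store visit -> (store, was the brand in stock?)
raw = [
    ("Store A", True), ("Store A", False), ("Store A", True),
    ("Store B", True), ("Store B", True),
    ("Store C", False), ("Store C", False), ("Store C", True),
]

# Summary: share of visits on which the brand was available, per store.
visits = defaultdict(int)
in_stock = defaultdict(int)
for store, available in raw:
    visits[store] += 1
    in_stock[store] += available  # True counts as 1, False as 0

availability = {store: in_stock[store] / visits[store] for store in visits}

# Conclusion: flag stores below an assumed 50% availability threshold.
needs_attention = [s for s, rate in availability.items() if rate < 0.5]

print(availability)
print(needs_attention)
```

The raw records on their own say little; the per-store rates and the flagged list are the information a decision can be based on.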
Basic terms
• Data and Information: Raw Vs Processed
• Variable: Something whose values are not constant
• Types of Variables: Qualitative and Quantitative variables; Examples
• Population: Totality of cases
• Sample: Subset of population
Data?
20
Information?
Name Age
Amit 20
KEY VOCABULARY
Population: Everything (or everyone) being studied.
Examples:
• All newborn babies in UP
• All tech startups in India
• All MAT exam candidates
KEY VOCABULARY
Parameters: Characteristics of the population
Greek letters, such as μ, are used to represent parameters.
We might be interested in learning about μ, the average weight
of all middle-aged Indian women.
The population consists of all middle-aged Indian women, and
the parameter is μ.
KEY VOCABULARY
Sample: Portion of the larger population.
For example, let's say a denim apparel manufacturer wants to check the quality of
the stitching on its blue jeans before shipping them off to retail stores.
It is not cost effective to examine every single pair of blue jeans the manufacturer
produces (the population).
Instead, the manufacturer looks at just 50 pairs (a sample) to draw a conclusion
about whether the entire population is likely to have been stitched correctly.
KEY VOCABULARY
Statistic: Characteristics of the sample
English letters are used to represent statistics.
For instance, suppose we selected a sample of 100 students from a
school with 1000 students. The average height of the sampled students
would be an example of a statistic.
More generally, any measurable characteristic of the sample is an example of a
statistic.
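The school example above can be sketched in a few lines of Python. The heights are simulated, and the mean of 160 cm and spread of 8 cm are assumptions made purely for illustration; the point is only the difference between a parameter (computed from all 1000 students) and a statistic (computed from the 100 sampled students).

```python
import random
import statistics

random.seed(42)  # fixed seed so the sketch is reproducible

# Hypothetical population: heights (cm) of all 1000 students in the school.
population = [random.gauss(160, 8) for _ in range(1000)]

# Parameter: a characteristic of the whole population (here, the mean height).
mu = statistics.mean(population)

# Sample: 100 students drawn at random, without replacement.
sample = random.sample(population, 100)

# Statistic: the same characteristic, computed from the sample only.
x_bar = statistics.mean(sample)

print(f"parameter (population mean): {mu:.1f} cm")
print(f"statistic (sample mean):     {x_bar:.1f} cm")
```

In practice the parameter is usually unknown, because measuring every student is too costly; the statistic is what we can actually compute.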
KEY VOCABULARY
Variable: Characteristics of interest gathered from each item
in the sample (the question)
A variable may also be called a data item. Age, sex, business income
and expenses, country of birth, capital expenditure, class grades, eye
colour and vehicle type are examples of variables. It is called a
variable because the value may vary between data units in
a population, and may change in value over time.
KEY VOCABULARY
Data: Actual Value of the variable.
KEY VOCABULARY
Population: Everything (or everyone) being studied.
Parameters: Characteristics of the population
Greek letters, such as μ, are used to represent parameters.
Sample: Portion of the larger population.
Statistic: Characteristics of the sample
English letters are used to represent statistics.
Variable: Characteristics of interest gathered from each item in the sample
(the question)
Data: Actual value of the variable.
IDENTIFY KEY VOCABULARY
You want to know the average cost of Statistics Textbooks, so you
survey 25 books
Population
Parameters
Sample
Statistic
Variable
Data
IDENTIFY KEY VOCABULARY
You want to know the average cost of Statistics Textbooks, so you
survey 25 books
Population: All statistics textbooks
Parameters: Average cost of all statistics textbooks
Sample: 25 textbooks
Statistic: Average cost of the 25 textbooks
Variable: Cost of a statistics textbook
Data: Actual cost of a textbook (may be Rs 1200/-)
IDENTIFY KEY VOCABULARY
Let's suppose that there are 5 lakh MBA students in India today. We
want to know the average GPA of the students, so we study 100 MBA students
selected at random.
Population
Parameters
Sample
Statistic
Variable
Data
IDENTIFY KEY VOCABULARY
Let's suppose that there are 5 lakh MBA students in India today. We want
to know the average GPA of the students, so we study 100 MBA students
selected at random.
Population: 5 lakh MBA students in India
Parameters: Average GPA of the 5 lakh MBA students in India
Sample: 100 MBA students
Statistic: Average GPA of the 100 students
Variable: GPA of a student
Data: Actual GPA of a student (may be 6.7)
IDENTIFY KEY VOCABULARY
The main campus of ABC University has a population of approximately 42,000
students. A research question is "what proportion of these students smoke
regularly?" A survey was administered to a sample of 987 of these students.
Population
Parameters
Sample
Statistic
Variable
Data
IDENTIFY KEY VOCABULARY
The main campus of ABC University has a population of approximately 42,000
students. A research question is "what proportion of these students smoke regularly?"
A survey was administered to a sample of 987 of these students.
Population: 42,000 students of ABC University
Parameters: Proportion of all students at ABC University who smoke regularly
Sample: 987 students of ABC University
Statistic: Proportion of the 987 sampled students who smoke
regularly
Variable: Smokes regularly or not
Data: Yes or No (e.g., coded as 1 or 0)
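Coding Yes/No as 1/0 is convenient because the sample proportion then becomes the mean of the data. A minimal sketch for the smoking survey above follows; the responses are simulated, and the 15% rate used to generate them is an invented assumption, not a survey result.

```python
import random

random.seed(1)  # fixed seed so the sketch is reproducible

# Hypothetical responses from the 987 sampled students:
# 1 = smokes regularly (Yes), 0 = does not (No).
# The 0.15 rate is invented purely to generate example data.
responses = [1 if random.random() < 0.15 else 0 for _ in range(987)]

# With 0/1 coding, the statistic (sample proportion) is the mean of the data.
p_hat = sum(responses) / len(responses)

print(f"sample size:       {len(responses)}")
print(f"sample proportion: {p_hat:.3f}")
```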
INFERENCE: A GLIMPSE
Suppose the mean income of a sample of 100 subscribers is Rs 42,000.
You conclude that the population mean income μ is likely to be close to Rs 42,000 as well.
This is an example of statistical inference.
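Why is it reasonable to expect the sample mean to be close to μ? A rough simulation can illustrate the idea. Everything here is assumed for illustration: the population of 50,000 subscriber incomes is generated from a lognormal distribution chosen so that the mean is roughly Rs 41,000, and "close" is taken to mean within 10% of μ.

```python
import random
import statistics

random.seed(7)  # fixed seed so the sketch is reproducible

# Hypothetical population of 50,000 subscriber incomes (Rs), invented for illustration.
population = [random.lognormvariate(10.5, 0.5) for _ in range(50_000)]
mu = statistics.mean(population)  # the parameter, normally unknown

# Draw 500 independent samples of 100 subscribers and record each sample mean.
sample_means = [
    statistics.mean(random.sample(population, 100)) for _ in range(500)
]

# Share of sample means that land within 10% of the population mean.
close = sum(abs(m - mu) <= 0.10 * mu for m in sample_means) / len(sample_means)

print(f"population mean (mu): Rs {mu:,.0f}")
print(f"share of sample means within 10% of mu: {close:.0%}")
```

Under these assumptions, most sample means fall near μ, which is what justifies using a single sample mean as an estimate of the population mean.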
Data Analytics and Data Analysis
Business Analytics and Data analysis
• Business decision driven Vs Data driven
• Report on KPIs Vs Trend & Correlation
• Comparative Vs Exploratory
• Specific Vs General
Example
• Estimating Market share of Amul Ice-cream
• Consumer behaviour toward Ice-cream
New fields coming out of Data analysis
BIG DATA
• Volume
• Velocity
• Variety
• Value
• Veracity
New fields coming out of Data analysis
Artificial Intelligence
• Google Maps
• E-payments
Machine Learning
• Amazon’s Alexa
• Google Assistant