
Course: Business Statistics and Analytics for Decision Making

MBA (Outcome Based – CBCS)

MODEL QUESTION BANK

Semester – I

1T6 – Business Statistics and Analytics for Decision Making

Module 1

Measures of Dispersion (Variation) & Symmetry: Significance of measuring Dispersion, Requisites and
classification of measures of Dispersion, Distance measures - Range, Interquartile range. Average Deviation
measures - Mean Absolute Deviation, Variance and Standard deviation, Chebyshev’s Theorem, Coefficient of
variation & its significance. Concept of Skewness & Kurtosis

CO1: For a given dataset, the student should be able to estimate the dispersion / variance & symmetry of
the data using various measures and draw inferences to facilitate decision making

Question 1:


Question 2:

Question 3:


Question 4:


Question 5:


Question 6:


Question 7:

Module II

Measures of Association: Correlation, Types & Methods of Correlation analysis - Karl Pearson’s coefficient of
correlation, Spearman’s Rank correlation, Probable error, Coefficient of Determination, Standard error of
coefficient of correlation. Introduction to regression analysis and its advantages, Types of regression models,
methods to determine regression coefficients (normal equations)

CO2: For a given dataset, the student should be able to assess the level of association between given
variables in the data using various types of correlation analysis techniques. The student should also be
able to predict the values of a variable using regression analysis techniques.

Question 1. Summarize Correlation Analysis by exemplifying its meaning, nature, assumptions and
limitations.(4+4+4+4)
Ans: Bivariate data: Data relating to two variables is called bivariate data. A bivariate data set may reveal
some kind of association between the two variables x and y, and we may be interested in numerically measuring
the degree of strength of this association. Correlation provides such a measure.
Positive or direct correlation
If higher values of the one variable are associated with higher values of the other or when lower values of the
one are accompanied by the lower values of the other (in other words, movements of the two variables are in
the same direction) it is said that there exists positive or direct correlation between the variables.
Example
The greater the sides of a rectangle, the greater will be its area; the higher the dividend declared by a company,
the higher will be market price of its shares.

Negative or inverse correlation


If, on the other hand, the higher values of one variable are associated with the lower values of the other (i.e.,
when the movements of the two variables are in opposite directions), the correlation between those variables is
said to be negative or inverse. For example, investment is likely to be negatively
correlated with the rate of interest.
The presence of correlation between two variables does not necessarily imply the existence of a direct
causation, though causation will always result in correlation. In general, correlation may be due to any one of
the following factors:
1.One variable being the cause of the other variable
In case of the association between quantity of money in circulation and price, quantity
of money in circulation is the cause of price levels.
2.Both variables being result of a common cause
Example
The yield of Wheat and Maize may be correlated positively due to the fact that they are
related with the amount of rainfall.
3. Chance factor

While interpreting the correlation between two variables, it is essential to consider whether a genuine
relationship between them is plausible. It may sometimes happen that a fair degree of correlation is observed
between two variables even though no real relationship exists between them.
Example
Wholesale price index of India and average height of its female population. Between two variables, the degree
of association may range all the way from no relationship at all to a relationship so close that one variable is a
function of the other.
Thus, correlation may be:
1) Perfectly positive
2) Limited positive degree
3) No correlation at all
4) Limited negative degree
5) Perfectly negative
Perfect Positive and Perfect Negative Correlation
When we find a perfect positive relation between two variables, we designate it as +1. In case of perfect
negative we describe it as –1. Thus, correlation between any two variables must vary between –1 and +1.
Linear or Non-linear Correlation
Correlation may be linear or non-linear. If the amount of change in one variable tends to bear a constant ratio to
the amount of change in the other, then the correlation is said to be linear. Here we will study linear correlation
only. This is often called simple correlation.
Partial Correlation Coefficient
Suppose we have multivariate (more than two variable) data. The correlation coefficient between two variables
after eliminating the effect of the other variables from both of the variable gives the partial correlation
coefficient.
Multiple Correlation Coefficient
The product moment correlation coefficient between the observed values of a variable and the estimated
values of that variable is called the multiple correlation coefficient.
Limitations of Simple Correlation
1. Simple correlation analysis deals with two variables only and it explores the extent of linear
relationship between them (if x and y are linearly related, then we can write y = a + bx). But as we have
noted earlier correlation between two variables may be due to the fact that they are affected by a third
variable.
2. Simple correlation analysis may not give the true nature of association between two variables in such
an event. Ideally, one should take out the effect of the 3rd variable on the first two and then go on
measuring the strength of association between them. But this is not possible under simple correlation
analysis.
3. In simple correlation analysis, we assume linear relationship between two variables but there may exist
non-linear relationship between them. In that case, simple correlation measure fails to capture the
association.
4. Strong relationship (linear) between two variables will imply that correlation between them is high
(either stark positive or stark negative) but the converse is not necessarily true.
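The degree of linear association described above is quantified with Karl Pearson's coefficient of correlation. As a minimal sketch (using Python, one of the tools introduced in Module 5, and a small hypothetical data set rather than any data from this question bank), the coefficient can be computed as follows:

import numpy as np

# Hypothetical bivariate data: hours studied (x) and test score (y)
x = np.array([2, 4, 5, 7, 8, 10])
y = np.array([45, 55, 60, 70, 78, 90])

# Karl Pearson's coefficient of correlation, r, read off the correlation matrix
r = np.corrcoef(x, y)[0, 1]
print(f"Pearson's r = {r:.4f}")  # a value near +1 indicates strong positive linear association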

Question 2. Exemplifying the concept of rank correlation coefficient test, produce the rank correlation
coefficient from the following data and interpret the coefficient and its sign.
The following table shows the per capita income (in thousands) and food expenditure of the family in different
cities of Maharashtra. (4+12)
Per capita income:   11   16   18   8   6   15   10   5
Expenditure:          6    9    8   3   4   10    5   2
Sol:

Rank Correlation Coefficient Test


The rank correlation coefficient test is the nonparametric analogue of the linear correlation coefficient. It helps
assess the relationship between two variables when the data do not come from a normally distributed
population. It is denoted by rs for sample data and ρs for population data, and it is the linear correlation
coefficient between the ranks of the data. To calculate rs, first rank the data for each variable x and y
separately, denoting those ranks by u and v respectively. Then take the difference between each pair of ranks,
d = u – v.
Finally we compute the value of the rank correlation coefficient rs.

rs = 1 − (6 Σd²) / (n(n² − 1))

x     Rank of x (u)     y     Rank of y (v)     d = u − v     d²
11          5           6           5               0          0
16          7           9           7               0          0
18          8           8           6               2          4
 8          3           3           2               1          1
 6          2           4           3              -1          1
15          6          10           8              -2          4
10          4           5           4               0          0
 5          1           2           1               0          0
                                                          Σd² = 10

So, rs = 1 − (6 Σd²) / (n(n² − 1))
= 1 – {6 × 10 / 8(64 − 1)} = 1 − 0.119 = +0.881
Interpretation:
Here, rs = 0.881. We conclude that there is a strong positive correlation between the per capita income and
food expenditure.
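The same result can be reproduced programmatically. The following is a minimal Python sketch (assuming NumPy and SciPy are available) that applies the formula rs = 1 − 6Σd²/(n(n² − 1)) to the data in the worked table above and cross-checks it against SciPy's built-in implementation:

import numpy as np
from scipy.stats import spearmanr, rankdata

income      = np.array([11, 16, 18, 8, 6, 15, 10, 5])    # per capita income (x)
expenditure = np.array([6, 9, 8, 3, 4, 10, 5, 2])        # food expenditure (y)

# Manual computation mirroring rs = 1 - 6*sum(d^2) / (n*(n^2 - 1))
u, v = rankdata(income), rankdata(expenditure)
d = u - v
n = len(income)
rs_manual = 1 - 6 * np.sum(d ** 2) / (n * (n ** 2 - 1))

# SciPy's built-in Spearman rank correlation (handles ties via average ranks)
rs_scipy, p_value = spearmanr(income, expenditure)

print(rs_manual, rs_scipy)   # both approximately 0.881 for this data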

Question 3:

Question 4:

Question 5:

Question 6:


Question 7:


Question 8:

Question 9:

Question 10:

Module III

Probability: Basic terminology, types of probability, probability rules, conditional probabilities, Bayes'
Theorem. Random Variables, Probability distributions; Binomial distribution, Poisson distribution, Normal
distribution. Choosing the correct probability distribution

CO3: For given situations, a student should be able to determine the various probabilities arising out of the
situation and make use of probability theory and appropriate probability distributions for the purpose
of decision making.

Question 1: The lifetimes of certain electronic devices have a mean of 300 hours and a standard deviation of 25
hours. Assume that the distribution of these lifetimes, measured to the nearest hour, can be closely
approximated by a normal curve.

a) Find the probability that any one of these electronic devices will have lifetimes more than 350 hours
b) What percentage has lifetimes of 300 hours or less?
c) What percentage will have lifetimes from 220 to 260 hours?

Solution:

a) Given: μ = 300, σ = 25 and x = 350 hours

Z = (x – μ)/σ = (350 – 300)/25 = 2

The area under the normal curve to the left of z = 2 is 0.9772. Thus the required probability is
P(X > 350) = 1 – 0.9772 = 0.0228 = 2.28%

b) Z = (x – μ)/σ = (300 – 300)/25 = 0

Therefore the required percentage is 100 × 0.5000 = 50%

c) Given: x1 = 220, x2 = 260, μ = 300, σ = 25. Thus

Z1 = (x1 – μ)/σ = (220 – 300)/25 = -3.2 and Z2 = (x2 – μ)/σ = (260 – 300)/25 = -1.6

From the normal table, the area between z = 0 and z = 1.6 is 0.4452 and the area between z = 0 and z = 3.2 is 0.4993.

The required probability is

P(-3.2 < Z < -1.6) = 0.4993 – 0.4452 = 0.0541
Hence the required percentage is 0.0541 × 100 = 5.41%
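For students who prefer to verify such calculations in software rather than from printed normal tables, the three probabilities can be reproduced with SciPy's normal distribution functions (a minimal sketch, assuming SciPy is installed):

from scipy.stats import norm

mu, sigma = 300, 25   # mean and standard deviation of the device lifetimes

p_more_350    = 1 - norm.cdf(350, loc=mu, scale=sigma)                                   # part (a): P(X > 350)
p_at_most_300 = norm.cdf(300, loc=mu, scale=sigma)                                       # part (b): P(X <= 300)
p_220_to_260  = norm.cdf(260, loc=mu, scale=sigma) - norm.cdf(220, loc=mu, scale=sigma)  # part (c)

print(round(p_more_350, 4), round(p_at_most_300, 4), round(p_220_to_260, 4))
# approximately 0.0228, 0.5000 and 0.0541, matching the table-based working above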

Question 2: A company X has 1500 employees and every year 300 employees quit the company. Estimate the
probability of attrition in this company per year. Also estimate the probability of retention of employees by
the same company. (8+8 marks)

Solution
1.Calculation of attrition rate - According to frequency estimation equation, the probability of an event is given
by,

P(X)= No. of observations in favour of event X/Total No. of observations = n(X)/N
In the given condition of the problem,
P(Attrition) = No.of employees leaving the job in a year/Total No.of employees
= 300/1500 = 3/15 = 0.2 = 20%
Hence the probability of attrition in this company per year = 0.2 or 20%

2.Calculation of Retention rate = According to frequency estimation equation, the probability of an event is
given by,
P(Y) = No. of observations in favour of event Y/Total No. of observations = n(Y)/N
In the given condition of the problem,
P(Retention) = No.of employees staying in the job in a year/Total No.of employees
= 1200/1500 = 12/15 = 0.8 = 80%
Hence the probability of retention in this company per year = 0.8 or 80%

Question 3: A web site displays a total of 10 advertisements. When a visitor to this web site clicks and sees any
of the advertisements, the web site gets its revenue. Out of a total of 2500 visitors to this site in a day, thirty visitors
clicked on one advertisement, fifteen visitors clicked on two advertisements, while five visitors
clicked on three advertisements. The rest of the visitors did not click on any of the advertisements. Under these
conditions, estimate,
1. Probability that the visitor will click on any of the advertisements.
2. Probability that any visitor will click on atleast two of the advertisements.
3. Probability that any visitor will not click on any of the advertisements. (6+6+4)

Solution
1. Probability that any visitor will click on any of the advertisement.
According to frequency estimation equation, the probability of an event is given by,
P(X) = No. of observations in favour of event X/Total No. of observations = n(X)/N
In the given condition of the problem,
P(Any One advertisement Click) = No. of visitors clicking on at least one advertisement in a day /Total No. of
visitors to the site
= 50/2500 = 1/50 = 0.02 = 2%
Probability that the visitor will click on any of the advertisements is = 0.02 = 2%

2. Probability that any visitor will click on at least two of the advertisements.
According to the frequency estimation equation, the probability of an event is given by,
P(Y) = No. of observations in favour of event Y/Total No. of observations = n(Y)/N
In the given condition of the problem,
P(At least two advertisements clicked) = No. of visitors clicking on at least two advertisements in a day /Total No. of
visitors to the site
= 20/2500 = 1/125 = 0.008 = 0.8%
Probability that the visitor will click on at least two of the advertisements = 0.008 = 0.8%

3. Probability that any visitor will click on none of the advertisements.


According to frequency estimation equation, the probability of an event is given by,
P(Z) = No. of observations in favour of event Z/Total No. of observations = n(Z)/N
In the given condition of the problem,
P(No advertisement Click) = No. of visitors clicking on none of the advertisements in a day /Total No. of
visitors to the site
= 2450/2500 = 0.98 = 98%
Probability that the visitor will not click on any of the advertisements = 0.98 = 98%
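The three relative-frequency estimates can also be obtained with a few lines of Python, which makes the arithmetic explicit (a minimal sketch using only the figures given in the question):

# Relative-frequency probability estimates for the advertisement-click problem
total_visitors = 2500
clicked_one, clicked_two, clicked_three = 30, 15, 5

p_any          = (clicked_one + clicked_two + clicked_three) / total_visitors            # at least one ad
p_at_least_two = (clicked_two + clicked_three) / total_visitors                           # at least two ads
p_none         = (total_visitors - clicked_one - clicked_two - clicked_three) / total_visitors

print(p_any, p_at_least_two, p_none)   # 0.02, 0.008, 0.98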


Question 4:

Question 5:

Question 6:

Question 7:

Module 4

Hypothesis Testing: Introduction, Hypothesis testing procedure, errors in hypothesis testing, Power
of a statistical test. t-test, ANOVA and Chi–Square test, (Students should be able to perform testing on
spreadsheets)

CO4: For a given research problem, the student should be able to construct appropriate hypotheses
and draw conclusions by using a suitable hypothesis testing procedure so as to address the
research problem in question.

Question 1: An automobile company has given you the task of ascertaining the preferences of customers for
particular colours of cars; Red, Silver, White & Black. The company also wants to know if the customers’ gender
may have any impact on this decision. Explain how you would plan to conduct this research in a step by step
manner, keeping in mind the research process and the hypothesis testing procedure. What may be the
possibilities of committing errors, if any?

Solution:

Expected steps in research process, hypothesis testing procedure and choice of suitable hypothesis test. In
addition, possibility of committing Type I and Type II errors to be discussed.

Question 2: Helix Corporation wants to study whether the welfare facilities provided by the organisation are
effective or not. The organisation has conducted a survey of 487 employees asking about the effectiveness of the
policies. The sample comprises all categories of staff: female employees, male employees and contract
employees. The female staff numbers 140, the male staff 260 and the contractual employees 87. The result of the survey
is tabulated as under.

                              Male         Female        Contract
                            Employees     Employees      Employees      Total

Effective                       95            55             42          192

Effective to some extent        89            49             32          170

Not at all effective            76            36             13          125

Total                          260           140             87          487

Formulate the hypothesis and apply chi square test at 5% significance level to test whether policies are
effective or not. Draw the inference. (Use the Tabulated value of Chi square 9.488)

Solution:

H0: The perceived effectiveness of the welfare facilities is independent of the employee category

H1: The perceived effectiveness of the welfare facilities is associated with the employee category

α = 0.05

n = 487

Calculation of Chi Square Value


Row   Column     fo     fe = (RT × CT)/n     fo − fe     (fo − fe)²     (fo − fe)²/fe

 1       1       95         102.5055          -7.5051      56.3270         0.5495

 1       2       55          55.1950          -0.1950       0.0380         0.0006

 1       3       42          34.2997           7.7002      59.2931         1.7286

 2       1       89          90.7597          -1.7597       3.0967         0.0341

 2       2       49          48.8706           0.1293       0.0167         0.0003

 2       3       32          30.3696           1.6303       2.6581         0.0875

 3       1       76          66.7351           9.2648      85.8381         1.2862

 3       2       36          35.9342           0.0657       0.0043         0.0001

 3       3       13          22.3305          -9.3305      87.0600         3.8986

Total                                                                      7.5859

Where, fo – Observed Frequency

fe - Expected Frequency

RT - Row Total

CT – Column Total

Rounded off to 7.586

DOF (Degrees of Freedom) = (R-1) * (C-1)

= (3-1)*(3-1)

=4

Where R – No of Rows in original table (Contingency Table)

C – No of Columns in original table (Contingency Table)

For 4 degrees of freedom and a significance level of 0.05 the tabulated value of chi square is 9.488. The calculated
value is 7.586, which is less than the tabulated value. Hence the null hypothesis that the perceived effectiveness of
the welfare facilities is independent of the employee category is accepted, and the alternate hypothesis is rejected.
It is concluded that the opinion on the effectiveness of the welfare facilities does not differ significantly across
employee categories.
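The whole calculation, including the expected frequencies, can be reproduced with SciPy's chi-square test of independence (a minimal sketch, assuming SciPy is available; chi2_contingency applies no continuity correction to tables larger than 2 × 2):

import numpy as np
from scipy.stats import chi2_contingency

# Observed frequencies: rows = opinion, columns = male, female, contract employees
observed = np.array([
    [95, 55, 42],   # Effective
    [89, 49, 32],   # Effective to some extent
    [76, 36, 13],   # Not at all effective
])

chi2, p_value, dof, expected = chi2_contingency(observed)
print(round(chi2, 3), dof, round(p_value, 4))
# chi2 is approximately 7.586 with 4 degrees of freedom; since 7.586 < 9.488
# (equivalently p > 0.05), the null hypothesis of independence is not rejected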

Question3: Supratech Hospital is conducting the treatment of 250 patients suffering from a certain disease.
The hospital has devised a new treatment and it wants to know whether the new treatment is superior to
conventional treatment. The details of result of treatment are as given in the following table.

                        No. of Patients
Treatment         Favourable     Not Favourable     Total

New                   140              30            170

Conventional           60              20             80

Total                 200              50            250

Formulate the hypothesis and test it using Chi Square Test at 5% level of Significance and draw the inference.
The tabulated value of Chi square at 5% level of Significance and 1 degree of freedom is 3.84.

Solution:

H0: There is no significant difference in the new and conventional treatment

H1: There is significant difference in the new and conventional treatment

α = 0.05

n = 250

Row   Column     fo     fe = (RT × CT)/n     fo − fe     (fo − fe)²     (fo − fe)²/fe

 1       1      140           136               4            16            0.118

 1       2       30            34              -4            16            0.471

 2       1       60            64              -4            16            0.250

 2       2       20            16               4            16            1.000

Total                                                                      1.839

Where, fo – Observed Frequency

fe = Expected Frequency

RT – Row Total

CT – Column Total

DOF (Degrees of Freedom) = (R-1)(C-1)

= (2-1)*(2-1)

=1

Where R – No of Rows in original table (Contingency Table)

C – No of Columns in original table (Contingency Table)

For 1 degree of freedom and a significance level of 0.05 the tabulated value of chi square is 3.84. The calculated
value is 1.839, which is less than the tabulated value. Hence the null hypothesis that there is no significant difference
between the new and conventional treatments is accepted, and the alternate hypothesis is rejected.

Question 4: Sigma Associates wants to assess the effectiveness of training method on the productivity of its
employees. It has a conventional method and a newly devised training method. Employees were asked about
their preference for training method. For this, a sample of 50 employees working in units situated at
different locations was taken. The company has 3 different locations, viz. Mumbai,
Bangalore and Delhi. The responses from employees from all 3 locations are noted and are given in the
following table.

Preference / Area       Mumbai     Bangalore     Delhi     Total

New Method                 20           7           3        30

Conventional Method        14           5           1        20

Total                      34          12           4        50

Formulate the hypothesis and test it at 5% level of significance using Chi square test and draw the inference.
Use 5.99 as the tabulated value of Chi square for 2 degrees of freedom and at 5% level of significance.

Solution:

Null Hypothesis: The proportion of respondents who prefer the new training method over the conventional
training method is the same across the three locations.

Alternate Hypothesis: The proportion of respondents who prefer the new training method over the conventional
training method is not the same across the three locations.

H0 : PM = PB = PD

H1 : PM, PB and PD are not all equal

α = 0.05

PM = Proportion preferring the new method in the Mumbai region

PB = Proportion preferring the new method in the Bangalore region

PD = Proportion preferring the new method in the Delhi region

The hypothesis is tested using chi square test as the data comes from more than one population and also the
distribution is not known.

Row   Column     f0     fe = (RT × CT)/n     f0 − fe     (f0 − fe)²     (f0 − fe)²/fe

 1       1       20          20.4              -0.4          0.16           0.0078

 1       2        7           7.2              -0.2          0.04           0.0056

 1       3        3           2.4               0.6          0.36           0.15

 2       1       14          13.6               0.4          0.16           0.0118

 2       2        5           4.8               0.2          0.04           0.0083

 2       3        1           1.6              -0.6          0.36           0.225

Total                                                                       0.408

Where, fo – Observed Frequency

fe = Expected Frequency

RT – Row Total in Original Table (Contingency Table)

CT – Column Total in Original Table (Contingency Table)

DoF (Degrees of Freedom) = (R-1) (C-1)

= (2-1) (3-1)

=2

Where R – No of Rows in original table (Contingency Table)

C – No of Columns in original table (Contingency Table)

For a 0.05 significance level & 2 degrees of freedom the tabulated value of chi square is 5.99. The calculated value
is 0.408, which is less than the tabulated value. Hence the null hypothesis that the proportion of respondents who
prefer the new training method over the conventional training method is the same across the three locations is
accepted, and the alternate hypothesis is rejected.

Question 5: Triveni Industries Inc operates in Mumbai, Kolkata and Delhi for retail selling of certain
commodity. The company wants to test the significance of variation in the pricing of the commodity in all of its
outlets in the above mentioned cities. For this purpose it has chosen 4 shops in each city randomly. The Data
regarding variation in pricing is given below.

Mumbai 16 8 12 14
Kolkata 14 10 10 6
Delhi 4 10 8 8
Formulate the hypothesis at 5% significance level and draw the inference whether the variation in pricing in
different cities is significant or not. For df1 = 2 and df2 = 9 the F statistic has a critical value of 4.26.
Solution:
H0: There is no significant difference in the prices of the commodity in the three cities
H1: There is significant difference in the prices of the commodity in the three cities
α = 0.05

        Sample 1 (Mumbai)     Sample 2 (Kolkata)     Sample 3 (Delhi)
         x1        x1²          x2        x2²          x3        x3²
         16        256          14        196           4         16
          8         64          10        100          10        100
         12        144          10        100           8         64
         14        196           6         36           8         64
Σ =      50        660          40        432          30        244
There are r = 3 treatments (Samples) with n1 = n2 = n3 = 4 and n = 12
T = Sum of all observations in all the three samples
= Σ x1 +Σ x2+ Σ x3 = 50 + 40+ 30 = 120
Correction Factor (CF) = T2/n = (120)2/12 = 1200
SST = Total Sum of the Squares
= (Σ x12 +Σ x22+ Σ x32) – CF
= (660 + 432 + 244) – 1200
= 136
SSTR = Sum of Squares between the samples
= (Σx1)²/n1 + (Σx2)²/n2 + (Σx3)²/n3 – CF
= (50)²/4 + (40)²/4 + (30)²/4 – 1200
= 50
SSE = Sum of squares within the samples
= SST – SSTR
= 136 – 50
= 86
Degrees of Freedom:
df1 = r-1 = 3-1 = 2
df2 = n-r = 12 – 3 = 9
MSTR = mean sum squares or variance between samples
= SSTR/df1 = 50/2 = 25
MSE = mean sum squares or variance within sample
= SSE/df2 = 86/9 = 9.55

One Way ANOVA Table

Source of Variation     Sum of Squares     Degrees of Freedom     Mean Squares     Test Statistic

Between Samples               50                    2                  25           F = 25/9.55 = 2.617

Within Samples                86                    9                 9.55

Total                        136                   11
Inference – Since the calculated value of F (2.617) is less than its critical value i.e. 4.26 at df1 = 2 and df2 = 9 and
5 percent level of significance, the null hypothesis is accepted. Hence it is concluded that there is no significant
difference in the prices of the commodity in the three cities.
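The same F ratio can be obtained directly with SciPy's one-way ANOVA routine (a minimal sketch, assuming SciPy is available):

from scipy.stats import f_oneway

mumbai  = [16, 8, 12, 14]
kolkata = [14, 10, 10, 6]
delhi   = [4, 10, 8, 8]

F, p_value = f_oneway(mumbai, kolkata, delhi)
print(round(F, 3), round(p_value, 4))
# F is approximately 2.62 with (2, 9) degrees of freedom; since 2.62 < 4.26
# (p > 0.05), the null hypothesis of equal mean prices is not rejected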

Question 6: Pride agency is engaged in selling electronic and electrical goods. The agency has 4 salesmen and
wants to assess their sales performance for selling refrigerators during the months May, June and July. The
following table gives the details of their sales for the mentioned months.

Month        Salesman A      B      C      D
May              50         40     48     39
June             46         48     50     45
July             39         44     40     39

The company wants to know whether there is a significant difference in the sales made by the four salesmen. The
company also wants to know whether there is a significant difference in the sales made during the months specified.
Formulate the hypothesis and test it at 5% level of significance. The critical values of the F statistic at the 5% level
of significance are 4.75 for df1 = 3, df2 = 6 and 5.14 for df1 = 2, df2 = 6.
Solution:
H0: There is no significant difference in the sales made by the four salesmen during the different months.
H1: There is a significant difference in the sales made by the four salesmen during the different months.
α = 0.05
Coding of data: The data values are large, so for the sake of simplicity the data is coded by subtracting 40 from
every value in the problem. The new table is as under.

Month        Salesman A      B      C      D
May              10          0      8     -1
June              6          8     10      5
July             -1          4      0     -1


                              Salesman
Months        A            B            C            D          Row Sum
           x1    x1²    x2    x2²    x3    x3²    x4    x4²
May        10    100     0      0     8     64    -1      1        17
June        6     36     8     64    10    100     5     25        29
July       -1      1     4     16     0      0    -1      1         2
Column
Sum        15    137    12     80    18    164     3     27        48
T = Grand Total = 48
Correction Factor (CF) = T2/n = (48)2/12 = 192
SST = Total Sum of the Squares
= (Σ x12 +Σ x22 + Σ x32 + Σ x42) – CF
= (137 + 80 + 164 + 27) – 192
= 216
SSC = Sum of Squares between the Salesmen (Columns)
= (Σx1)²/n1 + (Σx2)²/n2 + (Σx3)²/n3 + (Σx4)²/n4 – CF
= (15)²/3 + (12)²/3 + (18)²/3 + (3)²/3 – 192
= 42
SSR = Sum of squares Between the Months (Rows)
= (17)2/4 + (29)2/4 + (2)2/4 – 192
= 91.5
SSE = SST – (SSC+SSR) = 216 – (42+91.5) = 82.5
Degrees of Freedom:
Total Degrees of freedom df = 12 – 1 = 11
dfc = c-1 = 4-1 = 3 (Column wise)
dfr = r-1 = 3 – 1 = 2 (Row wise)
dfE = (c-1)(r-1) = 3 X 2 = 6
MSC = mean sum of squares or variance between salesmen (columns)
= SSC/dfc = 42/3 = 14
MSR = mean sum of squares or variance between months (rows)
= SSR/dfr = 91.5/2 = 45.75
MSE = SSE/dfE = 82.5/6 = 13.75

Two Way ANOVA Table

Source of Variation              Sum of Squares   Degrees of Freedom   Mean Squares   Variance Ratio
Between the Salesmen (Columns)        42                  3                 14        Ftreatment = 14/13.75 = 1.018
Between the Months (Rows)             91.5                2                45.75      Fblock = 45.75/13.75 = 3.327
Residual Error                        82.5                6                13.75
Total                                216                 11

1. Since the calculated value of Ftreatment = 1.018 is less than its critical value F = 4.75 at df1 = 3 and df2 = 6
for the 5% significance level, the null hypothesis is accepted. Hence it is concluded that the sales made by the
salesmen do not differ significantly.

2. Since the calculated value of Fblock = 3.327 is less than its critical value F = 5.14 at df1 = 2 and df2 = 6 at
5% significance level, the null hypothesis is accepted. Hence it may be concluded that sales made during
different months do not differ significantly.
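Both variance ratios in the table above can be reproduced from the original (uncoded) sales figures, since subtracting a constant does not change any sum of squares. A minimal NumPy sketch of the two-way ANOVA computation:

import numpy as np

# Sales (uncoded): rows = months (May, June, July), columns = salesmen A, B, C, D
sales = np.array([
    [50, 40, 48, 39],
    [46, 48, 50, 45],
    [39, 44, 40, 39],
])

n = sales.size                                               # 12 observations
cf = sales.sum() ** 2 / n                                    # correction factor
sst = (sales ** 2).sum() - cf                                # total sum of squares
ssc = (sales.sum(axis=0) ** 2 / sales.shape[0]).sum() - cf   # between salesmen (columns)
ssr = (sales.sum(axis=1) ** 2 / sales.shape[1]).sum() - cf   # between months (rows)
sse = sst - ssc - ssr                                        # residual error

dfc, dfr = sales.shape[1] - 1, sales.shape[0] - 1
dfe = dfc * dfr
msc, msr, mse = ssc / dfc, ssr / dfr, sse / dfe

print(round(msc / mse, 3), round(msr / mse, 3))  # approximately 1.018 and 3.327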

Question 7: Falcon Fitness Studio declared a weight reduction program and claimed to reduce more than 7 Kg
a month. An overweight executive of one corporate is interested to join the program but he is not showing
confidence in the claim made by the Company and asked for evidences. The company asked him to choose any
10 previous participants randomly and check their weights before joining the program and after finishing the
program. The details are as given below.

Weight before joining the program (Kg)        Weight after finishing the program (Kg)

94.5 85

101 89.5

110 101.5

103.5 96

97 86

88.5 80.5

96.5 87

101 93.5

104 93

116.5 102

The overweight executive wants to test at the 5% significance level the claimed average weight loss of more than 7
kg. Formulate the hypothesis and test whether the actual average weight loss is significantly greater than the
claimed 7 kg. The critical value of t at 9 degrees of freedom and 5% level of significance is 1.833
(One Tailed).

Solution:

H0: μ = 7

H1: μ > 7

α = 0.05

n = 10

Weight Before     Weight After     Loss (x)     x²     d = x − x̄     d²

94.5 85 9.5 90.25 -0.3 0.09

101 89.5 11.5 132.25 1.7 2.89

110 101.5 8.5 72.25 -1.3 1.69

103.5 96 7.5 56.25 -2.3 5.29

97 86 11 121 1.2 1.44

88.5 80.5 8 64 -1.8 3.24

96.5 87 9.5 90.25 -0.3 0.09

101 93.5 7.5 56.25 -2.3 5.29

104 93 11 121 1.2 1.44

116.5 102 14.5 210.25 4.7 22.09

Total (Σ) 98.5 1013.75 0.5 43.55

Mean 𝑥̅ = Σx/n = 98.5/10 = 9.85

Standard Deviation s = √[Σx²/(n−1) − n x̄²/(n−1)] = √[1013.75/9 − 10(9.85)²/9] = √(112.64 − 107.80) = 2.20

Standard Error of mean σx̄ = s/√n = 2.20/√10 = 0.696

t = (x̄ – μ)/ σx̄

= (9.85 – 7)/0.696

= 2.85/0.696

= 4.10

The calculated value of the t statistic is 4.10. Since the test is one tailed, the critical value of t at 9 degrees of
freedom and the 5% significance level is 1.833. (For a one-tailed test the entire 5% lies in a single tail, so a
two-tailed table has to be read at the 10% level; hence for 9 degrees of freedom the value is 1.833 and not 2.262.)
The calculated value is more than the critical value. Hence the null hypothesis is rejected. The claim of the
company is legitimate.
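The paired data can also be analysed in SciPy by testing the individual weight losses against the claimed mean of 7 kg. This is a minimal sketch; the one-sided 'alternative' keyword assumes a reasonably recent SciPy release:

import numpy as np
from scipy.stats import ttest_1samp

before = np.array([94.5, 101, 110, 103.5, 97, 88.5, 96.5, 101, 104, 116.5])
after  = np.array([85, 89.5, 101.5, 96, 86, 80.5, 87, 93.5, 93, 102])
loss = before - after

# One-tailed one-sample t-test of H0: mean loss = 7 against H1: mean loss > 7
t_stat, p_value = ttest_1samp(loss, popmean=7, alternative='greater')
print(round(t_stat, 3), round(p_value, 4))   # t is approximately 4.10, p < 0.05, so H0 is rejected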

Question 8: Montex Fertilizers is a leading fertilizer company with precision machinery. Its machine is set to
mix 12 Kg of Nitrate in every 100 Kg bag of fertilizer. The company wants to check whether the machine is working
as per the set mixing value of Nitrate. Ten bags of 100 Kg each are examined. The quantity of Nitrate mixed (in Kg)
is found to be 11, 14, 13, 12, 13, 12, 13, 14, 11, 12. Formulate and test the hypothesis that there is no significant
difference between the set value and the average quantity of Nitrate mixed per bag. The level of significance is 5%.

Solution

H0: The machine mixes 12 Kg of Nitrate in every 100 Kg of Fertiliser.

H1: The machine does not mix 12 Kg of Nitrate in every 100 Kg of Fertiliser.

In other words:

H0: μ = 12

H1: μ ≠ 12

α = 0.05

n = 10

Degrees of freedom = n-1 = 10-1 = 9

It is assumed that the weight of nitrate in fertiliser bags is normally distributed and its standard deviation is not
known. The values of 𝑥̅ , and sample standard deviation s are calculated below.

Weight of Nitrate (x)     x²     Deviation d = x − x̄     d²

11 121 -1.5 2.25

14 196 1.5 2.25

13 169 0.5 0.25

12 144 -0.5 0.25

13 169 0.5 0.25

12 144 -0.5 0.25

13 169 0.5 0.25

14 196 1.5 2.25

11 121 -1.5 2.25

12 144 -0.5 0.25

Σ = 125 Σ = 1573 0 Σ = 10.5

𝑥̅ = Σx/n = 125/10 = 12.5

s = √[Σx²/(n−1) − n x̄²/(n−1)]

= √[1573/9 − 10(12.5)²/9]

= √(174.78 − 173.61)

= √1.17

= 1.08

Standard Error of mean σx̄ = s/√n = 1.08/√10 = 0.342

t = (x̄ – μ)/ σx̄ = (12.5 – 12)/0.342 = 0.5/0.342 = 1.46

The critical value of t statistic at 9 degrees of freedom and 5% significance level for two tailed test is 2.262. The
calculated value is less than the critical value. Hence null hypothesis is accepted. It is concluded that the
machine mixes 12 kg of nitrate in every 100 kg of fertilizer.
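The figures above can be verified with a short Python script (a minimal sketch, assuming NumPy and SciPy are available):

import numpy as np
from scipy.stats import ttest_1samp

nitrate = np.array([11, 14, 13, 12, 13, 12, 13, 14, 11, 12])

# Manual computation following the worked solution
x_bar = nitrate.mean()                       # 12.5
s = nitrate.std(ddof=1)                      # approximately 1.08
t_manual = (x_bar - 12) / (s / np.sqrt(len(nitrate)))

# SciPy's two-tailed one-sample t-test of H0: mu = 12
t_stat, p_value = ttest_1samp(nitrate, popmean=12)
print(round(t_manual, 3), round(t_stat, 3), round(p_value, 4))
# t is approximately 1.46; since 1.46 < 2.262 (p > 0.05), H0 is accepted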

Module 5

Business Analytics - Introduction to analytics, Differentiating descriptive, predictive, and prescriptive
analytics, data mining vs data analytics, Industrial problem solving process, Decision needs and analytics,
stakeholders and analytics, SWOT analysis, Business analytics in decision making, Categorization of Analytical
Methods and Models. Introduction & applications of SPSS, R, Python etc.

CO5: The student will be able to differentiate between various forms of analytics and will also be able to
choose suitable analytics for decision making.

Question.1: Examine the role played by analytics in the growth of e-commerce companies.

Ans.

1.Demand forecasting

2. Management of excess inventory

3. Ability to predict cancellation of orders.

4.Predicting the fraudulent transactions.

5. Predicting the delivery time of the shipment.

6. Predicting the future purchases of the customers.

Question 2: A firm wants to develop a new product. List out the various steps involved in data driven
decision making in solving the problems likely to be encountered using analytics.

Ans.

1. Identify the opportunity or problem for value creation.

2. Identify the primary and secondary sources of data.

3. Data processing for missing values.

4.Deriving the variables.

5. Building analytical models and selecting the best model.

6.Implement the solution/decision or develop the product.

Question 3: Examine the role of business analytics in

i) Retail business context

ii) Information Technology context

iii) Data Science context


Ans.

i) Retail business context -

1.Explain the example of small basket from bigbasket.com

2. Demand prediction due to season/festivals

3. Explain the example of price insensitivity of the parents towards their children.

ii) Information Technology context -

1. Capturing the Point of Sales data to determine past purchases.

2. Use of sophisticated data analytical tools such as R, SPSS, Python

iii) Data Science context -

1. Constitutes Statistical & Operations Research Techniques, Machine Learning & Deep Learning Algorithms

2. Classification problems.

Question 4: Compare & Contrast the three types of analytics.

Solution

Ans. The three types of analytics are -

A. Descriptive Analytics - What has happened to the business in the past.

B. Predictive Analytics - What will happen to the business in the future.

C. Prescriptive Analytics - What is the best course of action for the business to optimize its outcomes.

The big data revolution has given birth to different kinds, types and stages of data analysis. Boardrooms across
companies are buzzing with data analytics - offering enterprise wide solutions for business success.
However, what do these really mean to businesses? The key to companies successfully using big data is
gaining the right information, which delivers knowledge and gives businesses the power to gain a competitive
edge. The main goal of big data analytics is to help organizations make smarter decisions for better business
outcomes.

Big data analytics cannot be considered a one-size-fits-all blanket strategy. In fact, what distinguishes the best
data scientists or data analysts from others is their ability to identify the kind of analytics that can be leveraged
to benefit the business at an optimum. The three dominant types of analytics - descriptive, predictive and
prescriptive analytics - are interrelated solutions helping companies make the most out of the big data that they
have. Each of these analytic types offers a different insight. In this section we explore the three different types of
analytics - descriptive analytics, predictive analytics and prescriptive analytics - to understand what each type
of analytics delivers to improve an organization's operational capabilities.

Types of Analytics

Big data analytics helps a business understand the requirements and preferences of a customer, so that
businesses can increase their customer base and retain the existing ones with personalized and relevant
offerings of their products or services. According to IDC, the big data and analytics industry is anticipated to
grow at a CAGR of 26.4% reaching a value of $41.5 billion by end of 2018. The big data industry is growing at a
rapid pace due to various applications like smart power grid management, sentiment analysis, fraud detection,
personalized offerings, traffic management, etc. across myriad industries. After the organizations collect big
data, the next important step is to get started with analytics. Many organizations do not know where to begin,
what kind of analytics can nurture business growth and what these different types of analytics mean.

What is Descriptive Analytics?

90% of organizations today use descriptive analytics, which is the most basic form of analytics. The simplest
way to define descriptive analytics is that it answers the question "What has happened?". This type of analytics
analyses real-time and historical data for insights on how to approach the future. The main
objective of descriptive analytics is to find out the reasons behind previous success or failure in the past. The
'past' here refers to any particular time at which an event occurred, and this could be a month ago or even
just a minute ago. The vast majority of big data analytics used by organizations falls into the category of
descriptive analytics.

A business learns from past behaviours to understand how they will impact future outcomes. Descriptive
analytics is leveraged when a business needs to understand the overall performance of the company at an
aggregate level and describe the various aspects.

Descriptive analytics are based on standard aggregate functions in databases, which just require knowledge of
basic school math. Most of the social analytics are descriptive analytics. They summarize certain groupings
based on simple counts of some events. The number of followers, likes, posts, fans are mere event counters.
These metrics are used for social analytics like average response time, average number of replies per post,
%index, number of page views, etc. that are the outcome of basic arithmetic operations.

The best example to explain descriptive analytics is the results that a business gets from its web server
through Google Analytics tools. The outcomes help understand what actually happened in the past and validate
whether a promotional campaign was successful or not based on basic parameters like page views.

What is Predictive Analytics?

The next step up from descriptive analytics is predictive analytics. Analysing past data patterns and trends can
accurately inform a business about what could happen in the future. This helps in setting realistic goals for the
business, effective planning and restraining expectations. Predictive analytics is used by businesses to study the
data and peer into the crystal ball to find answers to the question "What could happen in the future based on
previous trends and patterns?"

Organizations collect contextual data and relate it with other customer user behaviour datasets and web server
data to get real insights through predictive analytics. Companies can predict business growth in future if they
keep things as they are. Predictive analytics provides better recommendations and more future looking
answers to questions that cannot be answered by BI.

Predictive analytics helps predict the likelihood of a future outcome by using various statistical and machine
learning algorithms but the accuracy of predictions is not 100%, as it is based on probabilities. To make
predictions, algorithms take data and fill in the missing data with best possible guesses. This data is pooled
with historical data present in the CRM systems, POS Systems, ERP and HR systems to look for data patterns
and identify relationships among various variables in the dataset. Organizations should capitalise on hiring a
group of data scientists in 2016 who can develop statistical and machine learning algorithms to leverage
predictive analytics and design an effective business strategy.

Predictive analytics can be further categorized as –

Predictive Modelling –What will happen next, if ?

Root Cause Analysis-Why this actually happened?

Data Mining - Identifying correlated data.

Forecasting- What if the existing trends continue?

Monte-Carlo Simulation – What could happen?

Pattern Identification and Alerts –When should an action be invoked to correct a process.

Sentiment analysis is the most common kind of predictive analytics. The learning model takes input in the form
of plain text and the output of the model is a sentiment score that helps determine whether the sentiment is
positive, negative or neutral.

Organizations like Walmart, Amazon and other retailers leverage predictive analytics to identify trends in sales
based on purchase patterns of customers, forecasting customer behaviour, forecasting inventory levels,
predicting what products customers are likely to purchase together so that they can offer personalized
recommendations, predicting the amount of sales at the end of the quarter or year. The best example where
predictive analytics find great application is in producing the credit score. Credit score helps financial
institutions decide the probability of a customer paying credit bills on time.

What is Prescriptive Analytics?

Big data might not be a reliable crystal ball for predicting the exact winning lottery numbers but it definitely
can highlight the problems and help a business understand why those problems occurred. Businesses can use
the data-backed and data-found factors to create prescriptions for the business problems, that lead to
realizations and observations.

Prescriptive analytics is the next step of predictive analytics that adds the spice of manipulating the future.
Prescriptive analytics advises on possible outcomes and results in actions that are likely to maximise key
business metrics. It basically uses simulation and optimization to ask “What should a business do?”

Prescriptive analytics is an advanced analytics concept based on –

Optimization that helps achieve the best outcomes.

Stochastic optimization that helps understand how to achieve the best outcome and identify data uncertainties
to make better decisions.

Simulating the future, under various set of assumptions, allows scenario analysis - which when combined with
different optimization techniques, allows prescriptive analysis to be performed. Prescriptive analysis explores
several possible actions and suggests actions depending on the results of descriptive and predictive analytics of
a given dataset.

Prescriptive analytics is a combination of data and various business rules. The data for prescriptive analytics
can be both internal (within the organization) and external (like social media data). Business rules are
preferences, best practices, boundaries and other constraints. Mathematical models include natural language
processing, machine learning, statistics, operations research, etc.

Prescriptive analytics is comparatively complex in nature and many companies are not yet using it in day-
to-day business activities, as it becomes difficult to manage. Prescriptive analytics, if implemented properly, can
have a major impact on business growth. Large scale organizations use prescriptive analytics for scheduling
inventory in the supply chain, optimizing production, etc. so as to optimize the customer experience.

Aurora Health Care system saved $6 million annually by using prescriptive analytics to reduce re-admission
rates by 10%. Prescriptive analytics can be used in healthcare to enhance drug development, finding the right
patients for clinical trials, etc.

As an increasing number of organizations realize that big data is a competitive advantage, they should ensure
that they choose the right kind of data analytics solutions to increase ROI, reduce operational costs and
enhance service quality.

Question5: Discuss how regression analysis stands a basis for effective predictive analysis.

Ans: Regression analysis is a statistical tool with the help of which we are in a position to estimate (or predict)
the unknown values of one variable from the known values of another variable.
With the help of regression analysis we are in a position to predict the average probable change in one variable
given a certain amount of change in another. Regression analysis is thus designed to examine the relationship
of a variable y to a variable x. Thus, through regression analysis one can predict the possible new values of a
dependent variable based on changes in the independent variable. Hence regression analysis forms an
effective basis for predictive analysis.
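As a minimal illustration of this point (using Python and a small hypothetical data set, not data from this question bank), a least-squares regression line can be fitted and then used to predict new values of the dependent variable:

import numpy as np

# Hypothetical data: advertising spend (x, in lakhs) and sales (y, in lakhs)
x = np.array([2, 3, 5, 7, 9, 11])
y = np.array([10, 12, 15, 18, 24, 27])

# Least-squares fit of the regression line y = a + b*x (normal equations solved by polyfit)
b, a = np.polyfit(x, y, deg=1)
print(f"y = {a:.2f} + {b:.2f} x")

# Predict the dependent variable for a new value of the independent variable
x_new = 8
print("predicted y:", round(a + b * x_new, 2))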

Question 6: A company engaged in manufacturing & marketing of FMCG products wants to enter into a
new untapped market. Based on the available purchase patterns of the consumers, suggest which
products the company should introduce in that market. What data would you require to make this
decision? Will you be using data analytics or data mining or both the tools? Elaborate.

Ans :
In the above-mentioned situation, both tools are required for the launch decision. The data mining tools will
provide insights for understanding the buying patterns of the retailers & consumers in that market, while the
analytics tools will provide an idea about the projected sales, projected success rate, likely buyers for the
new products etc.

The following data is required by this company while taking a decision to introduce the existing products in a
new market.

1.Data about the new market : Total population of the market, demographic profiling of the population, gender
wise classification of the total population, existing direct & indirect competitors, their monthly sales, Retail
shelf size occupied, buying habits of the consumers, no.of available retailers in that market, their payment
terms, population/market growth, seasonality if any etc.

2. Historical data : Past one year sales of the competitors, new entrants in the market during last six months,
competitors who left the market during last one year.

3. Logistics Data: Available modes of supply, no. of visits of the distributors in that market, lead time etc.
Based on the above three sets of data, the management of the company can take a suitable decision about
introduction into that market.

Question 7: Discuss the role of Predictive Analytics in business decision making. Illustrate using any
real life/hypothetical situation.

Predictive analytics is a basic technique in analytics that draws on data science, artificial intelligence & statistical
tools for measuring or estimating the relationships among variables, relationships that constitute the essence of
economic theory and of business decision making.
Relevant Example : Prediction & classification problems

If we know that two variables, price (x) and demand (y), are closely related, we can predict or classify the most
probable value of x for a given value of y, or the most probable value of y for a given value of x.
Similarly, if we know that the amount of tax and the rise in price of a commodity are closely related, we can find
out the expected price for a certain amount of tax levy.

Question 8: Critically evaluate the differences between Data Analytics, Data Analysis and Data Mining?

Solution

People are generating tons of data every second. Every post on social media, each heartbeat, every link clicked
on the internet is data. The world generated more than 1ZB of data in 2010. The massive data are often stored
in data warehouses. These warehouses collect data from all possible sources. However, these data are often
unstructured and meaningless, therefore, professionals need to make sense of them. Experts in this field use
certain tools to make sense of these data in order to help businesses make an informed decision. Hence, those
tools include data analytics, data analysis and data mining.

The terms data analytics, data analysis and data mining are used interchangeably by people. However, there are
small differences between the three terms. In simplest terms, data mining is a proper subset of data analytics,
data analytics is a proper subset of data analysis, and they are all proper subsets of data science. It is easy to
get confused, so read on to get a better understanding of the three terms.

Data Mining

We are starting with data mining because it is the smallest in the set we’re considering. Every tool, method or
process used in data mining is also used in data analytics. Data analytics is data mining plus more. Wikipedia
defines data mining as “the process of discovering patterns in large data sets involving methods at the
intersection of machine learning, statistics, and database systems”. The Economic Times defines it as “process
used to extract usable data from a larger set of any raw data”. These definitions give an overview of what data
mining is about. Let’s delve deeper.

Data mining was very popular in the 90s and early 2000s. Some sources say data mining is also known as the
Knowledge Discovery in Databases (KDD) while others say it is one of the stages in KDD. However, what’s most
important is data mining brings together data from a larger pool and tries to find a correlation between two
concepts or items. For instance, it can find the correlation between almonds and fungi or beer and diapers.

The more common operations in data mining used to make meaning of data include clustering, predictive or
descriptive model – forecasting, deviations, correlations between data sets, classification, regression and
summarization.

Clustering

This is a common task in data mining and it is used in grouping together data that are similar. Information that
has similar characteristics is grouped together. It brings a set of data together to find how similar they are and
to find facts that were previously unknown. Clustering explains the data and uses the data to predict possible
future trends. Data mining uses the clustering method in predicting the future.
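As a minimal sketch of the idea (using Python's scikit-learn and a small hypothetical customer data set), records with similar characteristics can be grouped automatically:

import numpy as np
from sklearn.cluster import KMeans

# Hypothetical customer data: [annual income (lakhs), number of purchases per year]
customers = np.array([
    [3, 4], [4, 5], [3.5, 3],      # one group of broadly similar customers
    [12, 20], [13, 22], [11, 18],  # another group with quite different behaviour
])

# Group the customers into two clusters of similar records
model = KMeans(n_clusters=2, n_init=10, random_state=0).fit(customers)
print(model.labels_)           # cluster label assigned to each customer
print(model.cluster_centers_)  # the "typical" customer of each cluster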

Deviations

This is also known as anomaly detection or deviation detection. This aims to understand why certain patterns
are different from the rest. It further studies data error and aims to find out why they are different and what
caused the difference.

Summarisation: This makes data more compact, therefore making it easier to understand, visualise and report.

Classification

This task aims to put data into groups. New data are classified into already existing structures or groups. For
instance, carrying out a test to know the blood group of a person would place the person into one of the four
blood groups. Another example is classifying incoming emails as junk or genuine.

Correlations

This is understanding the links between two data sets. It is sometimes known as association rule learning. Its
goal is to find patterns between two seemingly unrelated data sets. For instance, finding the relationship
between 'diapers' and 'beers'.

Regression

This fits a function to a data set so as to minimise the error between the predicted and the observed values.

Data Analytics

Techopedia defined data analytics “as qualitative and quantitative techniques and processes used to enhance
productivity and business gain“. Analytics is the logic and tools behind the analysis. Analytics is the engine that
drives analysis. Businesses make decisions on the outcome of analytics. Margaret Rouse in her article, data
analytics, included the use of “specialised systems and software” in the definition of data analytics. There are
numerous tools used by data analysts, some software are Tableau Public, Open Refine, KNIME, Rapid Miner,
Google Fusion Tables, NodeXL and many more.

Data Analytics is the superset of data mining and a proper subset of data analysis. Data analytics involves using
tools to analyze data in making a business decision. For instance, your business offers massage services to
people using electric massage chairs to help relieve stress and backache. If you’re interested in knowing who
patronizes you, then you can create a table of your customers. You can further group your data by occupation,
age, home address, etc using the data analytics tool.

Quantitative techniques use mathematical and statistical tools and theories to manipulate numbers to obtain a
result or pattern. On the other hand, qualitative analytics is interpretive; it is the use of non-numerical data
such as images, audio, video, points of view, interviews or texts. More advanced data analytics tools include data
mining, machine learning, text mining and big data analytics. Data analytics can also refer to software ranging
from business intelligence (BI) to online analytical processing (OLAP) tools.

Data analytics starts with defining the business objective, collecting data, checking for data quality, building an
analytical model and then a decision based on the outcome.

1. Business objective: Data analytics starts with understanding the final goal. The team needs to know what is
required of them. This is the part where the team plans, selects the possible datasets and establishes project plans
in line with company goals.

2. Collecting data: The team selects the data that is required to carry out the analysis they want. Since data
comes from different sources, the team has to check and collect the data that are most relevant to the information
they are trying to find out.

3. Data quality: This is where the team ensures the raw information is as clean as possible. Dirty data can
influence results negatively and may cause the management to make wrong decisions. This is a very crucial
step in data analytics. The data team must verify the data quality to ensure it is what is required.

4. Building analytical models: Once the team ensures the data is clean, the team gathers the data for analysis
and they build analytical models. This is done with analytics software and programming languages such as
Python, SQL, R and Scala. In most cases, a test run is done on the data to check if the outcome is close to or in
line with the predicted outcome. If this turns out okay, the team then runs full analysis.

5. Outcome and decision: The next stage is the outcome, where the result is evaluated. The team checks the accuracy
of the results and the degree of error generated. The result is then deployed, a report is written and the team
performs a final check on the project as a whole. This is termed the project review. Once this is done, observations
and results are passed to the management to make an informed decision.

Data Analysis

EDUCBA defines data analysis as “extracting, cleaning, transforming, modeling and visualization of data with an
intention to uncover meaningful and useful information that can help in deriving conclusion and take
decisions“. This definition is comprehensive and it covers every aspect of data analysis. However, John Turkey,
a world-renowned statistician, added that data analysis includes making the results more precise or accurate
over time.

Data analysis is often used interchangeably with data analytics; however, there are slight differences between
them. In the definition of analytics, we saw that it involved the use of specialized software and tools. Data
analysis is a broader term and it fully engulfs data analytics; in other words, data analytics is a subcomponent of
data analysis.

Data analysis involves both technical and non-technical tools. There are several stages in data analysis and the
phases can be iterative to improve accuracy and get better results. Data analysis is very wide and teams work
on different aspects. However, we state the most common steps used by data analysis teams: putting a
team together, understanding the business objective, data collection, data cleaning, data manipulation,
communication, and optimise and repeat.

1. Put a team together: In testing any hypothesis, the first step is to put a team together that would carry out the
analysis.

2. Business objective: The problem bugging the business is put across to the team. This serves as the
background of the analysis the team hopes to get a hypothesis on.

3. Data collection: Once the team understands the business objective, it set out to collect data needed.

4. Data cleaning: This is a very important and crucial step. This is identifying inaccurate or incomplete data and
deleting or modifying them. Dirty data can lead to wrong conclusions which can be fatal for a business. The
team has to ensure the data is as clean as possible. This is the stage the data is inspected.

5. Data manipulation: In this stage, the data is subjected to mathematical and statistical methods and algorithmic
modelling. The data is transformed from one structure to another.

6. Optimise, communicate and repeat: Before communicating results and reports to the management, the team
has to optimise the data by checking for and accounting for errors due to the calculation or mathematical method
used. Once the results are ready, the team presents its findings to the management in the form of images, graphs
or video. If the results require a new perspective, the team repeats the process from the beginning.
