
Statistical Methods for Decision Making

The wholesale distributor analyzed annual spending data of 440 large retailers across different regions and channels in Portugal. Key findings include: - 'Other' region and 'Hotel' channel spent the most, while 'Oporto' region and 'Retail' channel spent the least. 'Fresh' category had the highest overall spending. - Recommendations are to expand supply of certain products to hotels and increase sales in underperforming regions. The CMSU survey analyzed responses from 62 students. Contingency tables were made for gender vs major, plans to graduate, etc. Probabilities were calculated and it was found graduate intention and gender were independent. For the asphalt shingle data

Statistical Methods for Decision Making (SMDM)

Project Report
by

Sachin Juneja
Dated- 10th April 2022
Problem 1 : Wholesale Customers Analysis

Problem Statement
A wholesale distributor operating in different regions of Portugal has information on the annual spending on
several items in its stores across different regions and channels. The data consists of 440 large retailers’
annual spending on 6 different varieties of products in 3 different regions (Lisbon, Oporto, Other) and across
2 different sales channels (Hotel, Retail).



1.1 Use methods of descriptive statistics to summarize data.
DataFrame - wcadescribe

EDA Highlights:
• The data has 440 instances of Buyer/Spenders spending across 2 channels & 3 regions
• The data has 9 attributes (7 integers & 2 objects)
• The data set is complete with no null values

• ’Fresh’ attribute has the highest spend.
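A summary of this kind could be produced with pandas along the following lines. This is a minimal sketch, not the report's actual notebook: the toy rows are made up for illustration (they are not the 440-row dataset), and only the column names `Channel`, `Region`, `Fresh` and `Milk` follow the report.

```python
import pandas as pd

# Toy stand-in for the wholesale data (made-up values, NOT the real 440 rows).
wca = pd.DataFrame({
    "Channel": ["Hotel", "Retail", "Hotel", "Retail"],
    "Region": ["Lisbon", "Oporto", "Other", "Other"],
    "Fresh": [1200, 300, 2500, 800],
    "Milk": [400, 900, 350, 1100],
})

# count, mean, std, min, quartiles and max for every numeric column
wcadescribe = wca.describe()

# Completeness check: number of null values per column
null_counts = wca.isnull().sum()

# Total spend per product category; idxmax() names the largest one
total_spend = wca[["Fresh", "Milk"]].sum()

print(wcadescribe)
print("Nulls:", int(null_counts.sum()))
print("Highest-spend category:", total_spend.idxmax())
```

`wca.info()` would additionally list the dtype of each attribute, which is how a split such as "7 integers & 2 objects" is usually read off.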


1.1 Which Region and which Channel spent the most? Which Region and which Channel spent the least?
DataFrame - wca_topchannel & wca_topregion

Channel:
• ‘Hotel’ channel spent the most with approx. 8 million
• ‘Retail’ channel spent the least with approx. 6.6 million

Region:
• ‘Other’ region spent the most with approx. 10 million
• ‘Oporto’ region spent the least with approx. 1.5 million
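The channel and region totals above are the kind of result a group-by aggregation gives. A minimal sketch, assuming a per-buyer `Total` column (a hypothetical helper name) and made-up toy rows rather than the real data:

```python
import pandas as pd

# Toy stand-in for the wholesale data (made-up values).
wca = pd.DataFrame({
    "Channel": ["Hotel", "Retail", "Hotel", "Retail"],
    "Region": ["Lisbon", "Oporto", "Other", "Other"],
    "Fresh": [1200, 300, 2500, 800],
    "Milk": [400, 900, 350, 1100],
})

# Total spend per buyer = sum over all product columns ("Total" is a hypothetical name).
wca["Total"] = wca[["Fresh", "Milk"]].sum(axis=1)

# Sum per group, sorted so the top spender comes first and the lowest last.
wca_topchannel = wca.groupby("Channel")["Total"].sum().sort_values(ascending=False)
wca_topregion = wca.groupby("Region")["Total"].sum().sort_values(ascending=False)

print(wca_topchannel)
print(wca_topregion)
```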
1.2 Describe and comment/explain all the varieties across Region and Channel? Provide a detailed justification for your answer.
DataFrame - wca_totalspend

Highlights:
• Note - Data values are in thousands
• Fresh Food
  • ‘Other’ region has the highest spend in both the ‘Retail’ & ‘Hotel’ channels
• Milk
  • ‘Other’ region & ‘Lisbon’ region have approx. similar spend in the ‘Retail’ channel
• Grocery
  • ‘Other’ region & ‘Oporto’ region have approx. similar spend in the ‘Retail’ channel
• Frozen
  • ‘Oporto’ has the highest spend in the ‘Hotel’ channel
• Detergents_Paper
  • ‘Other’ region & ‘Lisbon’ region have approx. similar spend in the ‘Retail’ channel
  • Hotel has the minimum spend across all regions
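A Region-by-Channel breakdown per product can be sketched with a pivot table. The rows below are made up for illustration, and the choice of `pivot_table` is an assumption; the report does not state how `wca_totalspend` was built.

```python
import pandas as pd

# Toy stand-in for the wholesale data (made-up values).
wca = pd.DataFrame({
    "Channel": ["Hotel", "Retail", "Hotel", "Retail"],
    "Region": ["Lisbon", "Oporto", "Other", "Other"],
    "Fresh": [1200, 300, 2500, 800],
    "Milk": [400, 900, 350, 1100],
})

# Rows = Region, columns = (product, Channel); cells = summed spend.
# fill_value=0 covers Region/Channel pairs with no buyers in the toy data.
wca_totalspend = wca.pivot_table(
    index="Region",
    columns="Channel",
    values=["Fresh", "Milk"],
    aggfunc="sum",
    fill_value=0,
)

print(wca_totalspend)
```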
1.3 On the basis of a descriptive measure of variability, which item shows the most inconsistent behaviour? Which items show the least inconsistent behaviour?
DataFrame - wcadescribe

Coefficient of Variation (Relative Dispersion)

• All the categories have a very high Coefficient of Variation (standard deviation divided by mean), which is an indicator of relative risk
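The coefficient of variation is the standard deviation divided by the mean, so it can be compared across items with different spending levels. A minimal sketch on made-up numbers (the two columns and their values are illustrative only):

```python
import pandas as pd

# Made-up spends for two items with the same mean but different spread.
spend = pd.DataFrame({
    "Fresh": [10, 20, 30],
    "Milk": [10, 10, 40],
})

# CV = sample standard deviation / mean, computed column by column.
cv = spend.std() / spend.mean()

most_inconsistent = cv.idxmax()   # largest relative dispersion
least_inconsistent = cv.idxmin()  # smallest relative dispersion

print(cv)
print(most_inconsistent, least_inconsistent)
```

In this toy data both items average 20, yet `Milk` varies more around that mean and therefore has the larger coefficient.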
1.4 Are there any outliers in the data? Back up your answer with a suitable plot/technique with the help of detailed comments.
DataFrame - alloutliersboxplot

Plot/Technique:
• Used Box Plot and further subplots to indicate the outliers in each food category
• Dropped the 'Channel' & 'Region' categorical columns from the data
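A box plot conventionally marks as outliers the points lying more than 1.5 times the interquartile range beyond the quartiles. The same rule can be applied numerically; the sketch below uses made-up values and assumes the default 1.5 x IQR whiskers, which the report does not state explicitly.

```python
import pandas as pd

# Made-up spends for one category; 100 is an obvious extreme value.
fresh = pd.Series([10, 12, 11, 13, 12, 100], name="Fresh")

q1 = fresh.quantile(0.25)
q3 = fresh.quantile(0.75)
iqr = q3 - q1

# Whisker limits used by the standard box plot rule
lower = q1 - 1.5 * iqr
upper = q3 + 1.5 * iqr

outliers = fresh[(fresh < lower) | (fresh > upper)]
print(outliers.tolist())
```

For the plot itself, `DataFrame.boxplot()` on the numeric columns draws one box per category using the same rule.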
1.5 On the basis of your analysis, what are your recommendations for the business? How can your analysis help
the business to solve its problem? Answer from the business perspective.

Recommendations

• The wholesaler should try to expand its supply of Detergents_Paper, Grocery & Milk to the Hotels by providing
samples of the products in all regions

• The wholesaler should try to increase its presence in Fresh & Frozen in the Retail channel

• The wholesaler should try to increase its overall sales in the ‘Oporto’ region


Problem 2 : CMSU Survey

Problem Statement
The Student News Service at Clear Mountain State University (CMSU) has decided to gather data about the
undergraduate students that attend CMSU. CMSU creates and distributes a survey of 14 questions and
receives responses from 62 undergraduates.



Brief Summary of the CMSU Survey
DataFrame - cmsusurvey

2.1 For this data, construct the following contingency tables (Keep Gender as row variable).
DataFrames - gender_major, gender_grad, gender_employment, gender_computer

2.1.1. Gender and Major

2.1.2. Gender and Grad Intention

2.1.3. Gender and Employment

2.1.4. Gender and Computer
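A contingency table with Gender as the row variable can be built with `pd.crosstab`. This is a minimal sketch on six made-up responses (not the 62 survey rows); the column names `Gender` and `Major` are assumed to match the survey file.

```python
import pandas as pd

# Six made-up survey responses for illustration.
cmsusurvey = pd.DataFrame({
    "Gender": ["Male", "Female", "Female", "Male", "Female", "Male"],
    "Major": ["Accounting", "Management", "Accounting",
              "Management", "Management", "Other"],
})

# Gender as rows, Major as columns; margins=True adds "All" totals.
gender_major = pd.crosstab(cmsusurvey["Gender"], cmsusurvey["Major"], margins=True)

print(gender_major)
```

Swapping the second argument for the Grad Intention, Employment or Computer column would give the other three tables in the same way.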


2.2.1. What is the probability that a randomly selected CMSU student will
be male?

2.2.2. What is the probability that a randomly selected CMSU student will
be female?
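These marginal probabilities are the relative frequencies of each gender in the sample. A minimal sketch with five made-up responses:

```python
import pandas as pd

# Five made-up responses for illustration.
gender = pd.Series(["Male", "Female", "Female", "Male", "Female"], name="Gender")

# normalize=True turns counts into proportions, i.e. P(Male) and P(Female).
p_gender = gender.value_counts(normalize=True)

print(p_gender)
```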
2.3.1. Find the conditional probability of different majors among the male
students in CMSU.

2.3.2 Find the conditional probability of different majors among the female students of CMSU.
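Conditioning on gender amounts to dividing each row of the Gender-by-Major table by its row total, so that P(Major | Gender) = P(Major and Gender) / P(Gender). A sketch on made-up responses, using the `normalize="index"` option of `pd.crosstab`:

```python
import pandas as pd

# Six made-up survey responses for illustration.
df = pd.DataFrame({
    "Gender": ["Male", "Female", "Female", "Male", "Female", "Male"],
    "Major": ["Accounting", "Management", "Accounting",
              "Management", "Management", "Other"],
})

# Each row is divided by its own total, giving P(Major | Gender).
major_given_gender = pd.crosstab(df["Gender"], df["Major"], normalize="index")

print(major_given_gender)
```

Each row of the result sums to 1, which is a quick sanity check on a conditional distribution.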
2.4.1. Find the probability that a randomly chosen student is a male and intends to graduate.
DataFrame - gender_grad

2.4.2 Find the probability that a randomly selected student is a female and does NOT have a laptop.
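A joint probability such as P(Male and intends to graduate) is one cell of the contingency table divided by the grand total. A minimal sketch on eight made-up responses; the column name `Grad Intention` is an assumption about the survey file.

```python
import pandas as pd

# Eight made-up survey responses for illustration.
df = pd.DataFrame({
    "Gender": ["Male", "Male", "Female", "Female",
               "Female", "Male", "Female", "Male"],
    "Grad Intention": ["Yes", "No", "Yes", "Yes",
                       "Undecided", "Yes", "No", "Undecided"],
})

# normalize="all" divides every cell by the total number of students.
joint = pd.crosstab(df["Gender"], df["Grad Intention"], normalize="all")

p_male_and_yes = joint.loc["Male", "Yes"]
print(p_male_and_yes)
```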
2.5.1. Find the probability that a randomly chosen student is a male or
has full-time employment?

2.5.2. Find the conditional probability that given a female student is randomly
chosen, she is majoring in international business or management.
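An "or" probability follows the addition rule P(A or B) = P(A) + P(B) - P(A and B), which avoids counting students who satisfy both conditions twice. A sketch on eight made-up responses (the `Employment` labels are assumed):

```python
import pandas as pd

# Eight made-up survey responses for illustration.
df = pd.DataFrame({
    "Gender": ["Male", "Male", "Female", "Female",
               "Female", "Male", "Female", "Male"],
    "Employment": ["Full-Time", "Part-Time", "Full-Time", "Part-Time",
                   "Unemployed", "Part-Time", "Part-Time", "Full-Time"],
})

is_male = df["Gender"] == "Male"
is_full_time = df["Employment"] == "Full-Time"

# The mean of a boolean column is the proportion of True values.
p_union = is_male.mean() + is_full_time.mean() - (is_male & is_full_time).mean()

# Cross-check by counting students who satisfy at least one condition.
p_union_direct = (is_male | is_full_time).mean()

print(p_union, p_union_direct)
```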
2.6. Construct a contingency table of Gender and Intent to Graduate at 2 levels (Yes/No). The Undecided
students are not considered now and the table is a 2x2 table. Do you think the graduate intention and being
female are independent events?

The graduate intention and being female are independent events
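Two events A and B are independent when P(A and B) = P(A) x P(B). The sketch below checks that condition on a made-up 2x2 table whose counts were chosen to satisfy it exactly; they are not the survey counts, and the report does not say which check its author used.

```python
import pandas as pd

# Made-up 2x2 counts (rows: gender, columns: intent to graduate).
table = pd.DataFrame(
    {"Yes": [10, 20], "No": [5, 10]},
    index=["Female", "Male"],
)

n = table.values.sum()

p_female = table.loc["Female"].sum() / n
p_yes = table["Yes"].sum() / n
p_female_and_yes = table.loc["Female", "Yes"] / n

# Independence holds when the joint probability equals the product of the marginals.
independent = abs(p_female_and_yes - p_female * p_yes) < 1e-9

print(p_female_and_yes, p_female * p_yes, independent)
```

With real sample data the two sides rarely match exactly, so a chi-square test of independence (for example `scipy.stats.chi2_contingency`) is a common way to judge whether the gap is larger than chance would explain.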


2.7.1. If a student is chosen randomly, what is the probability that his/her
GPA is less than 3?

2.7.2. Find the conditional probability that a randomly selected male earns 50 or more. Find the
conditional probability that a randomly selected female earns 50 or more.
2.8. Note that there are four numerical (continuous) variables in the data set, GPA, Salary,
Spending, and Text Messages. For each of them comment whether they follow a normal
distribution. Write a note summarizing your conclusions.

Conclusion
• None of the variables follows a normal distribution
• GPA variable is left skewed
• Salary, Spending & Text Messages are right skewed
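Skewness and a formal normality test can back up a visual judgement of this kind. The sketch below runs on a synthetic right-skewed sample, not the survey data, and the Shapiro-Wilk test is an assumed choice; the report does not name the method behind its conclusion.

```python
import numpy as np
from scipy import stats

# Synthetic right-skewed sample (exponential), seeded for reproducibility.
rng = np.random.default_rng(0)
sample = rng.exponential(scale=1.0, size=200)

# Positive skewness = long right tail; negative = long left tail.
skewness = stats.skew(sample)

# Shapiro-Wilk: H0 is that the data come from a normal distribution.
stat, p_value = stats.shapiro(sample)
looks_normal = p_value > 0.05

print(skewness, p_value, looks_normal)
```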
Problem 3 : ABC asphalt shingles

Problem Statement
An important quality characteristic used by the manufacturers of ABC asphalt shingles is the amount of moisture the shingles
contain when they are packaged. Customers may feel that they have purchased a product lacking in quality if they find
moisture and wet shingles inside the packaging. In some cases, excessive moisture can cause the granules attached to the
shingles for texture and coloring purposes to fall off the shingles resulting in appearance problems. To monitor the amount of
moisture present, the company conducts moisture tests. A shingle is weighed and then dried. The shingle is then reweighed,
and based on the amount of moisture taken out of the product, the pounds of moisture per 100 square feet are calculated.
The company would like to show that the mean moisture content is less than 0.35 pounds per 100 square feet.
Brief Summary of the ABC Asphalt Shingles
3.1 Do you think there is evidence that the mean moisture contents in both types of shingles are
within the permissible limits? State your conclusions clearly showing all steps.

The file (A & B [Link]) includes 36 measurements (in pounds per 100 square feet) for A shingles and 31 for B shingles.

ABC Asphalt Shingles - A
• State Ho and Ha: Ho: μ = 0.35, Ha: μ < 0.35
• Decide α: 0.05
• Identify the Test: One-sample t test (ttest_1samp), t statistic: -1.474
• Compute the p-value: 0.0748
• Conclude: Since p-value > 0.05 (p-value = 0.0748), we do not reject Ho. There is not enough evidence to conclude that
the mean moisture content for Sample A shingles is less than 0.35 pounds per 100 square feet. If the population
mean moisture content is in fact no less than 0.35 pounds per 100 square feet, the probability of observing a sample
of 36 shingles with a sample mean moisture content of 0.32 pounds per 100 square feet or less is 0.0748.

ABC Asphalt Shingles - B
• State Ho and Ha: Ho: μ = 0.35, Ha: μ < 0.35
• Decide α: 0.05
• Identify the Test: One-sample t test (ttest_1samp), t statistic: -3.100
• Compute the p-value: 0.002
• Conclude: Since p-value < 0.05 (p-value = 0.002), we reject Ho. There is enough evidence to conclude that the
mean moisture content for Sample B shingles is less than 0.35 pounds per 100 square feet.
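The steps above can be sketched with `scipy.stats.ttest_1samp`. The eight measurements are made up for illustration (they are not the A or B samples), and the `alternative="less"` argument, available in SciPy 1.6 and later, is an assumption about how the one-sided p-value was obtained.

```python
from scipy import stats

# Made-up moisture measurements in pounds per 100 square feet.
moisture = [0.30, 0.33, 0.36, 0.29, 0.34, 0.31, 0.35, 0.28]

alpha = 0.05

# Ho: mu = 0.35 against Ha: mu < 0.35 (one-sided, lower tail)
t_stat, p_value = stats.ttest_1samp(moisture, popmean=0.35, alternative="less")

reject_h0 = p_value < alpha
print(t_stat, p_value, reject_h0)
```

On older SciPy versions without `alternative`, the same lower-tail p-value is half the default two-sided one whenever the t statistic is negative.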
3.2 Do you think that the population means for shingles A and B are equal? Form the hypothesis and conduct the
test of the hypothesis. What assumption do you need to check before the test for equality of means is
performed?

The file (A & B [Link]) includes 36 measurements (in pounds per 100 square feet) for A shingles and 31 for B shingles.

• State Ho and Ha: Ho: μ(A) = μ(B), Ha: μ(A) ≠ μ(B)
• Decide α: 0.05
• Identify the Test: Two-sample t-test, t_statistic = 1.29
• Compute the p-value: pvalue = 0.202
• Conclude: As the p-value > α, we do not reject Ho. There is not enough evidence to conclude that the population
means for shingles A and B differ.

Test Assumptions
• The distributions of the two populations are normal
• The variances of the two distributions are the same
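A two-sample comparison of this kind, together with a check of the equal-variance assumption, might look as follows. The measurements are made up, and using Levene's test to choose between the pooled and Welch versions of the t-test is an assumed workflow, not one described in the report.

```python
from scipy import stats

# Made-up moisture measurements for two shingle types.
a = [0.30, 0.33, 0.36, 0.29, 0.34, 0.31, 0.35, 0.28]
b = [0.27, 0.31, 0.25, 0.30, 0.29, 0.26, 0.32]

alpha = 0.05

# Levene's test: H0 is that both groups have equal variances.
lev_stat, lev_p = stats.levene(a, b)
equal_var = bool(lev_p > alpha)

# Two-sided test of Ho: mu(A) = mu(B). Pooled variance if the Levene
# check passes, otherwise Welch's unequal-variance version.
t_stat, p_value = stats.ttest_ind(a, b, equal_var=equal_var)

reject_h0 = p_value < alpha
print(equal_var, t_stat, p_value, reject_h0)
```

The normality assumption can be checked separately per sample, for example with `scipy.stats.shapiro`.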
