Business Statistics: Australasian
3rd edition
Ken Black
John Asafu-Adjaye
Paul Burke
Nazim Khan
Gerard King
Nelson Perera
Carl Sherwood
Reetu Verma
Saleh Wasimi
GOING FURTHER WITH KADDSTAT
• Going further with KaddStat is an online guide with stepped instructions to perform the
textbook demonstration problems using enhanced KaddStat Excel 2010 functionality. Going
further with KaddStat can be downloaded for free from the student website,
www.johnwiley.com.au/highered/black3e/.
• KaddStat Excel 2010 Data Analysis Plug-in and the Australasian data sets can be downloaded
for free from the student website, www.johnwiley.com.au/highered/black3e/.
Screenshots reprinted with permission from Microsoft Corporation.
DEMONSTRATION PROBLEM 3.7
One of the factors that universities are judged by in the Good University Guide is the
starting salary of university graduates. To check the statistics reported by the Good
University Guide, a regional university decided to survey recent graduates. The survey asked
graduates their starting salary and degree. A random selection of 50 responses from each
degree can be found on the student website in file DP03-07.xls. Draw boxplots to compare
the starting salaries of graduates with different degrees.
Drawing boxplots
DP03-07
1. Access DP03-07.xls from the student website.
2. From the Add-Ins tab, select KADD and then Boxplots . . .
3. In the Boxplots dialogue box:
a. enter the range of cells containing the data in the Input Range field.
(Note: If the columns have unequal numbers of observations, you will need to
perform a separate boxplot procedure for each column.)
b. check Header Row Included if you have included the column headings in the input
range. Otherwise, leave it unchecked.
c. choose an output option
d. click OK.
Boxplot output           BA         BSC        BCOM       BENG       BLAW
First quartile           33056.00   37609.00   41093.00   35850.00   44569.00
Median                   37069.50   42179.00   45180.00   38620.00   49294.00
Third quartile           38914.00   46063.00   50658.00   42154.00   54681.00
Interquartile range       5858.00    8454.00    9565.00    6304.00   10112.00

Moderate outliers (Δ)    0          0          0          0          0
Extreme outliers ( )     0          0          0          0          0

[Chart: boxplots of starting salary for BA, BSC, BCOM, BENG and BLAW; vertical axis from 0 to 70000]
Based on the medians, the starting salaries of BA graduates appear to be the lowest,
followed by the starting salaries of BEng graduates. Based on the interquartile range, the
starting salaries of BA and BEng graduates display smaller levels of variability. BLaw
graduates have the highest median starting salary of the groups surveyed, but they also show
the greatest variation in starting salaries based on the interquartile range. The first quartile
indicates that 75% of BCom graduates earn about $41 000 or more as a starting salary.
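The comparison above can be checked with a short sketch. It is not part of KaddStat; it simply re-derives the interquartile ranges from the quartiles printed in the output and ranks the degrees. The dictionary layout and variable names are illustrative assumptions.

```python
# Quartiles (Q1, median, Q3) copied from the KaddStat boxplot output.
quartiles = {
    "BA":   (33056.00, 37069.50, 38914.00),
    "BSC":  (37609.00, 42179.00, 46063.00),
    "BCOM": (41093.00, 45180.00, 50658.00),
    "BENG": (35850.00, 38620.00, 42154.00),
    "BLAW": (44569.00, 49294.00, 54681.00),
}

# Interquartile range = third quartile minus first quartile.
iqr = {degree: q3 - q1 for degree, (q1, _median, q3) in quartiles.items()}

# Degrees ordered from lowest to highest median, and from smallest to largest IQR.
by_median = sorted(quartiles, key=lambda degree: quartiles[degree][1])
by_iqr = sorted(iqr, key=iqr.get)

print("Ordered by median:", by_median)
print("Ordered by IQR:   ", by_iqr)
```

Both orderings start with BA and BENG and end with BLAW, which matches the interpretation given in the text.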
DEMONSTRATION PROBLEM 5.5
Lenovo Group Limited, a Hong Kong IT company, has 30% share of the Hong Kong
PC market. Suppose 20 new PC buyers are selected at random from the Hong Kong
population. What is the probability that fewer than four bought their PC from Lenovo?
f. click OK.
Mean = 6
Standard deviation = 2.04939
Prob. of success = 0.3
[Chart: binomial probability distribution; vertical axis ticks at 0.05, 0.10 and 0.15]
Note: Several choices are available under Range of interest in the Probabilities dialogue box. The
most frequently used for the binomial distribution are Left Tail and Single Outcome. Experiment with
these and the other options to discover which probability each one computes.
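As a cross-check on the Left Tail option, the probability in this problem can be computed directly from the binomial formula. The sketch below is not KaddStat code; the function name is an illustrative assumption, and 'fewer than four' is read as X ≤ 3.

```python
from math import comb, sqrt

def binomial_pmf(k, n, p):
    """Probability of exactly k successes in n independent trials."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)

n, p = 20, 0.30

# 'Fewer than four' means 0, 1, 2 or 3 buyers chose Lenovo.
prob_fewer_than_four = sum(binomial_pmf(k, n, p) for k in range(4))

# Mean and standard deviation of the binomial distribution.
mean = n * p
std_dev = sqrt(n * p * (1 - p))

print(f"P(X < 4) = {prob_fewer_than_four:.4f}")
print(f"mean = {mean}, standard deviation = {std_dev:.5f}")
```

The mean and standard deviation agree with the values shown in the KaddStat output (6 and 2.04939).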
A clothing company produces men’s jeans. The jeans are made and sold with either a
regular cut or a boot cut. In an effort to estimate the proportion of their men’s jeans market
in Wellington that prefers boot-cut jeans, an analyst takes a random sample of 212 jeans
sales from the company’s two Wellington retail outlets. Only 34 of the sales were boot-cut
jeans. Construct a 90% confidence interval to estimate the proportion of the population in
Wellington who buy the company’s jeans who prefer the company’s boot-cut jeans.
3. Click OK.
The Excel output using KaddStat follows.
A B
1 Confidence interval for
2 Population Proportion
3 Lower Limit = 0.1189
4 Upper Limit = 0.2018
5 Margin for Error (Half Width) = 0.0415
6
7 Successes = 34
8 Trials = 212
9 % Confidence Interval = 90%
10 p-hat = 0.160377
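The limits in this output can be reproduced with the usual large-sample interval for a proportion. This is a minimal sketch, assuming KaddStat uses p-hat ± z·sqrt(p-hat(1 − p-hat)/n); the function name is hypothetical.

```python
from math import sqrt
from statistics import NormalDist

def proportion_confidence_interval(successes, trials, confidence):
    """Return (lower, upper, margin) for a large-sample proportion interval."""
    p_hat = successes / trials
    # Two-sided interval: put half of (1 - confidence) in each tail.
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    margin = z * sqrt(p_hat * (1 - p_hat) / trials)
    return p_hat - margin, p_hat + margin, margin

lower, upper, margin = proportion_confidence_interval(34, 212, 0.90)
print(f"lower = {lower:.4f}, upper = {upper:.4f}, margin = {margin:.4f}")
```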
Suppose you want to estimate the average age of all Boeing 727 aeroplanes now in active
domestic service. You want to be 95% confident, and you want your estimate to be within
2 years of the actual figure. The 727 was first placed into service about 30 years ago, but
you believe that no active 727s in the domestic fleet are more than 25 years old. How large
a sample should you take?
3. Click OK.
The Excel output using KaddStat follows.
A B
1
2 Minimal Sample Size = 38.0
3
4 Desired C.I. = 95%
5 Estimated Std. Dev. = 6.25
6 Margin for Error (Half Width) = 2
Note that sample size estimates of the population mean using the t distribution where σ
is unknown are not shown here. Since a sample size must be known to determine the table
value of t, which in turn is used to estimate the sample size, this procedure usually involves
an iterative process.
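For the known-σ case shown in the output, the sample size follows from n = (zσ/E)², rounded up. The sketch below assumes the estimated standard deviation of 6.25 comes from dividing the 25-year range by 4, which is consistent with the output but is not stated there; the function name is hypothetical.

```python
from math import ceil
from statistics import NormalDist

def sample_size_for_mean(sigma, margin, confidence):
    """Smallest n so that z * sigma / sqrt(n) is no larger than the margin."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    return ceil((z * sigma / margin) ** 2)

# Assumption: sigma is estimated as range / 4 = 25 / 4 = 6.25.
sigma_estimate = 25 / 4
n_required = sample_size_for_mean(sigma_estimate, margin=2, confidence=0.95)
print("Minimal sample size:", n_required)
```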
Hewitt Associates conducted a national survey to determine the extent to which employers
are promoting health and fitness among their employees. One of the questions asked
was ‘Does your company offer on-site exercise classes?’ Calculate the sample size Hewitt
Associates would need in order to estimate the population proportion to ensure 98%
confidence that the results are within .03 of the true population proportion if:
1. it was estimated before the study that no more than 40% of the companies would answer
yes.
2. there was no previous information available to make an approximation of the value of the
population proportion.
3. Click OK.
The Excel output using KaddStat follows.
A B
1
2 Minimal Sample Size = 1444
3
4 Desired C.I. = 98%
5 Estimated Proportion = 0.4
6 Margin for Error (Half Width) = 0.03
Note that the sample size in the output from KaddStat is 1444 and, from a hand
calculation, the solution is 1448. The reason that these two sample size values are different
is that, in the hand calculation, the z-score for a 98% confidence interval is taken from
table A.5 as approximately 2.33. KaddStat, on the other hand, uses a more precise
z-score when calculating the sample size.
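The difference between the two answers can be seen by running the same formula, n = z²p(1 − p)/E², with each z-score. This is a sketch under the assumption that KaddStat uses the unrounded normal quantile; the function name is hypothetical.

```python
from math import ceil
from statistics import NormalDist

def sample_size_for_proportion(p, margin, z):
    """Smallest n so that z * sqrt(p(1 - p)/n) is no larger than the margin."""
    return ceil(z**2 * p * (1 - p) / margin**2)

# Unrounded z-score for 98% confidence (1% in each tail).
z_exact = NormalDist().inv_cdf(0.99)
z_table = 2.33  # value read from table A.5

n_exact = sample_size_for_proportion(0.40, 0.03, z_exact)
n_table = sample_size_for_proportion(0.40, 0.03, z_table)

# Part 2: with no prior information, p = 0.5 gives the most conservative n.
n_no_information = sample_size_for_proportion(0.50, 0.03, z_exact)

print(n_exact, n_table, n_no_information)
```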
DEMONSTRATION PROBLEM 9.2
Records show that the average farm size in a particular region has increased over the
last 70 years. This trend might be explained, in part, by the inability of small farms to
compete with the prices and costs of large-scale operations and to produce a level of
income necessary to support the farmers’ desired standard of living. An agribusiness
researcher believes the average size of farms has continued to increase since 2007 from a
mean of 471 hectares. To test this, a random sample of 23 farms was selected from official
government sources and their sizes recorded. The data gathered are shown in table 9.3 in
the textbook. Use α = .05 to test the hypothesis.
A B
1 hypothesis testing
2 for Mean using t
3 p-value = 0.00478164
4
5 Null Hypothesis: μ = 471
6 Alternative Hypothesis: Greater Than
7 Sample Size: n = 23
8 Sample Mean: x̄ = 498.7826
9 Sample Standard Deviation: s = 46.9429
10 Standard Error: SE = 9.7883
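The standard error and test statistic behind this output can be recomputed from the summary values. The sketch below stops at the t statistic because the Python standard library has no t distribution; the critical value 1.717 is the usual table value for a one-tailed test with α = .05 and 22 degrees of freedom, entered by hand as an assumption.

```python
from math import sqrt

# Summary values copied from the KaddStat output.
n = 23
sample_mean = 498.7826
sample_std = 46.9429
mu_0 = 471  # hypothesised mean farm size (hectares)

standard_error = sample_std / sqrt(n)
t_statistic = (sample_mean - mu_0) / standard_error

# Assumption: table value of t for alpha = .05 (one tail), df = n - 1 = 22.
t_critical = 1.717
reject_null = t_statistic > t_critical

print(f"SE = {standard_error:.4f}, t = {t_statistic:.4f}, reject H0: {reject_null}")
```

Rejecting the null hypothesis here is consistent with the reported p-value of 0.0048, which is below .05.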
A national survey found that 17% of Australians consume milk with their breakfast.
However, in Victoria, a large milk producer believes that more than 17% of Victorians
consume milk with their breakfast. To test this idea, a marketing organisation randomly
selected 550 Victorians and asked if they consume milk with their breakfast. It was found
that 115 did. Using a .05 level of significance, test the idea that more than 17% of Victorians
consume milk with their breakfast.
3. Click OK.
The Excel output using KaddStat follows.
A B
1 hypothesis testing for
2 Population Proportion
3 p-value = 0.00733178
4 z-statistic = 2.4406
5 Null Hypothesis: π = 0.17
6 Alternative Hypothesis: Greater Than
7 Number of Trials: n = 550
8 Number of Successes = 115
9 p-hat = 0.2091
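The z-statistic and p-value above follow from the standard one-sample test of a proportion. A minimal sketch, assuming the standard error is based on the hypothesised proportion (which matches the output):

```python
from math import sqrt
from statistics import NormalDist

n = 550
successes = 115
pi_0 = 0.17  # hypothesised population proportion

p_hat = successes / n
# The standard error uses the hypothesised proportion, not p-hat.
standard_error = sqrt(pi_0 * (1 - pi_0) / n)
z_statistic = (p_hat - pi_0) / standard_error

# Upper-tail p-value for the 'greater than' alternative.
p_value = 1 - NormalDist().cdf(z_statistic)

print(f"p-hat = {p_hat:.4f}, z = {z_statistic:.4f}, p-value = {p_value:.5f}")
```

Because the p-value is below .05, the null hypothesis is rejected at the stated level of significance.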
A traffic researcher believes that driver behaviour at a set of traffic lights in a suburban
neighbourhood depends on the type of vehicle driven. She sorted types of vehicles into
three categories: sedan, SUV and utility. Driver behaviour was determined by how a driver
responded to the red light signal and had three categories: complete stop, near stop and
did not stop. A random sample of 340 drivers produced the following contingency table
of observed values. At the .01 level of significance, could there be a relationship between
driver behaviour and type of vehicle driven?
                     Driver behaviour
Vehicle type   Complete stop   Near stop   Did not stop   Total
Sedan          102             80          50             232
SUV            30              18          15             63
Utility        13              12          20             45
Total          145             110         85             340
d. click OK.
Chi-square test statistic = 11.455
p-value = 0.022
Number of rows = 3
Number of columns = 3

Actual frequencies
                       Variable B
Variable A   Complete stop   Near stop   Did not stop   Totals
Sedan        102             80          50             232
SUV          30              18          15             63
Utility      13              12          20             45
Totals       145             110         85             340

Expected frequencies
                       Variable B
Variable A   Complete stop   Near stop   Did not stop   Totals
Sedan        98.9412         75.0588     58.0000        232
SUV          26.8676         20.3824     15.7500        63
Utility      19.1912         14.5588     11.2500        45
Totals       145             110         85             340

Chi-square calculations
                       Variable B
Variable A   Complete stop   Near stop   Did not stop
Sedan        0.0946          0.3253      1.1034
SUV          0.3652          0.2785      0.0357
Utility      1.9973          0.4497      6.8056

NOTE: Expected frequencies should not be less than 5.0
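The expected frequencies and the test statistic can be rebuilt from the observed table. The sketch below is not KaddStat's code. Its p-value line uses a closed form for the chi-square upper tail that holds only for 4 degrees of freedom, chosen here to avoid any non-standard library; that shortcut is an assumption of this sketch.

```python
from math import exp

# Observed counts: rows are Sedan, SUV, Utility;
# columns are Complete stop, Near stop, Did not stop.
observed = [
    [102, 80, 50],
    [30, 18, 15],
    [13, 12, 20],
]

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
grand_total = sum(row_totals)

# Expected count = row total * column total / grand total.
expected = [[r * c / grand_total for c in col_totals] for r in row_totals]

chi_square = sum(
    (o - e) ** 2 / e
    for obs_row, exp_row in zip(observed, expected)
    for o, e in zip(obs_row, exp_row)
)

degrees_of_freedom = (len(observed) - 1) * (len(observed[0]) - 1)

# Upper-tail probability; this closed form is valid only when df = 4.
p_value = exp(-chi_square / 2) * (1 + chi_square / 2)

print(f"chi-square = {chi_square:.3f}, df = {degrees_of_freedom}, p-value = {p_value:.3f}")
```

With a p-value of about 0.022, the null hypothesis of independence would not be rejected at the .01 level of significance.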
The Australian housing approval data shown in figure 16.3 are provided in the Excel file
DP16-01.xls. Compute a 5-year moving average for the series.
f. Click OK.
Period   Actual   5-Period MA   %ABS Error
1        109
2        114
3        101
4        88
5        113
6        134      105           21.40%
7        97       110           13.74%
8        91       107           17.14%

[Chart: Actual and 5-Period MA plotted against Period]
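A moving-average forecast of this kind averages the previous five actual values. The sketch below uses only the eight rows visible in the output; the variable names are illustrative. The actual values in the output appear to be displayed rounded, so percentage errors recomputed from them may differ slightly from the %ABS Error column.

```python
# First eight actual values as displayed in the KaddStat output.
actual = [109, 114, 101, 88, 113, 134, 97, 91]
window = 5

# The forecast for period t is the mean of the 'window' periods before it.
moving_average = {
    t + 1: sum(actual[t - window:t]) / window
    for t in range(window, len(actual))
}

for period, forecast in moving_average.items():
    print(f"Period {period}: 5-period MA = {forecast:.1f}")
```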
The table below shows recent data for Australian consumption of unleaded petrol measured
in megalitres. Use exponential smoothing to forecast the values for each time period. Work
out the problem using alpha values of .2 and .8.
Period   Actual   Forecast      %ABS Error
1        11560
2        12426    11560.0000    6.97%
3        13466    11733.2000    12.87%
4        14522    12079.7600    16.82%
5        15214    12568.2080    17.39%
6        16308    13097.3664    19.69%
7        18874    13739.4931    27.20%
8        19962    14766.3945    26.03%
9        19876    15805.5156    20.48%
10       19048    16619.6125    12.75%
11       19251    17105.2900    11.15%
12       19234    17534.4320    8.84%

MAPE = 16.38%
Smoothing Constants
Alpha = 0.2

[Chart: Actual and Forecast plotted against Period]
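The forecast column follows the exponential smoothing recursion F(t+1) = F(t) + α(A(t) − F(t)), started with the first actual value. A minimal sketch with illustrative function names, using the actual values from the output:

```python
actual = [11560, 12426, 13466, 14522, 15214, 16308,
          18874, 19962, 19876, 19048, 19251, 19234]

def exponential_smoothing(series, alpha):
    """Return forecasts for periods 2..len(series)."""
    forecasts = [float(series[0])]  # forecast for period 2 is the first actual
    for value in series[1:-1]:
        # New forecast = old forecast + alpha * (actual - old forecast).
        forecasts.append(forecasts[-1] + alpha * (value - forecasts[-1]))
    return forecasts

def mape(series, forecasts):
    """Mean absolute percentage error over the forecast periods."""
    errors = [abs(a - f) / a for a, f in zip(series[1:], forecasts)]
    return 100 * sum(errors) / len(errors)

forecasts_02 = exponential_smoothing(actual, alpha=0.2)
forecasts_08 = exponential_smoothing(actual, alpha=0.8)

print(f"alpha = 0.2: MAPE = {mape(actual, forecasts_02):.2f}%")
print(f"alpha = 0.8: MAPE = {mape(actual, forecasts_08):.2f}%")
```

Running both alpha values side by side makes it easy to compare how closely each set of forecasts tracks the series.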
DEMONSTRATION PROBLEM 16.3
Although Excel can be used to deseasonalise time-series data, such as the Australian beer
production data, KaddStat is faster and easier to use.
e. click OK.
Period   Actual   Deseasonalized
1        435      437
2        380      414
3        421      439
4        490      435
5        435      437
6        390      425
7        412      429
8        454      403
9        416      418
10       403      439
11       408      425
12       482      428
13       438      440
14       386      421
15       405      422
16       491      436
17       427      429
18       383      417
19       394      411
20       473      420
21       420      422
22       390      425
23       411      428
24       488      433
25       415      417
26       398      434
27       419      436
28       488      433
29       414      416
30       374      408

Seasonal Indexes
1        0.9959
2        0.9177
3        0.9595
4        1.1269

[Chart: Actual and Deseasonalized plotted against Period]
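Deseasonalising divides each actual value by the seasonal index for its quarter. The sketch below applies the four indexes from the output to the first eight periods; it assumes period 1 corresponds to seasonal index 1, which is consistent with the output values.

```python
# Seasonal indexes for quarters 1 to 4, copied from the KaddStat output.
seasonal_indexes = [0.9959, 0.9177, 0.9595, 1.1269]

# First eight actual values from the output.
actual = [435, 380, 421, 490, 435, 390, 412, 454]

# Period t (counting from 0) uses the index for quarter t mod 4.
deseasonalised = [
    value / seasonal_indexes[t % 4] for t, value in enumerate(actual)
]

for period, (value, adjusted) in enumerate(zip(actual, deseasonalised), start=1):
    print(f"Period {period}: actual = {value}, deseasonalised = {adjusted:.0f}")
```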
Calculate MAD and MSE for the linear trend model forecasts for the average adult full-time
weekly earnings time-series data shown in table 16.6.
A B
1
2 Mean absolute deviation
3
4 Σ|e| = 508.7
5 n = 46
6
7 Σ|e| ⁄ n = 11.05869565
To calculate MSE, repeat steps 1 to 3, but select Mean Square Error… instead of Mean
Absolute Deviation…
The Excel output using KaddStat follows.
A B C
1
2 Mean square error
3
4 Σe² = 8183.61
5 n = 46
6
7 Σe² ⁄ n = 177.9045652
8
9
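Both measures are averages of the forecast errors: MAD averages their absolute values and MSE averages their squares. The sketch below defines the two functions and also recomputes the reported results from the totals in the output. The three example errors are hypothetical and are there only to show the functions being called.

```python
def mean_absolute_deviation(errors):
    """Average of the absolute forecast errors."""
    return sum(abs(e) for e in errors) / len(errors)

def mean_square_error(errors):
    """Average of the squared forecast errors."""
    return sum(e**2 for e in errors) / len(errors)

# Totals reported in the KaddStat output (n = 46 forecast errors).
mad_from_totals = 508.7 / 46
mse_from_totals = 8183.61 / 46
print(f"MAD = {mad_from_totals:.4f}, MSE = {mse_from_totals:.4f}")

# Hypothetical errors, used only to exercise the functions.
example_errors = [1.0, -2.0, 3.0]
print(mean_absolute_deviation(example_errors), mean_square_error(example_errors))
```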
According to the random walk hypothesis, stock market prices move randomly over time with
no specific pattern whatsoever. To test this hypothesis at the 5% significance level, a stock
analyst randomly selected stock prices on each day and classified them according to whether
the price was above the average closing price (A) or below the average closing price (B).
A B C
1
2 analysis for runs test
3 (Assumes α = 0.05)
4 Number of observations of first kind: n1 = 16
5 Number of observations of second kind: n2 = 8
6 Number of runs = 11
7 Upper critical value is RU = 17
8 Lower critical value is RL = 6
9 Decision: H0 should not be rejected.
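A run is a maximal block of identical symbols, so counting runs means counting the places where the symbol changes. The sketch below counts runs in a short hypothetical sequence (the analyst's actual sequence is not reproduced here) and then applies the critical values from the output. The decision rule used, keep H0 when the number of runs lies strictly between the two critical values, is an assumption of this sketch.

```python
def count_runs(sequence):
    """Number of maximal blocks of identical consecutive symbols."""
    if not sequence:
        return 0
    changes = sum(1 for prev, cur in zip(sequence, sequence[1:]) if prev != cur)
    return changes + 1

# Hypothetical sequence, for illustration only: AA | BB | A | BBB | AA.
example_runs = count_runs("AABBABBBAA")

# Values copied from the KaddStat output.
observed_runs = 11
lower_critical, upper_critical = 6, 17

# Assumed rule: H0 (randomness) is kept when runs fall strictly between the limits.
keep_null = lower_critical < observed_runs < upper_critical

print(example_runs, keep_null)
```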
                  Sales ($ million)
Location   Before campaign   After campaign
1 95 121
2 130 131
3 25 30
4 35 71
5 50 85
6 78 20
7 15 19
8 80 72
9 60 57
10 30 70
11 41 34
12 56 76
13 45 75
14 59 50
15 28 30
16 45 43
17 39 33
18 60 120
A B
1
2 wilcoxon
3 rank sum test
4 p-value = 0.11134432
5
6 Null Hypothesis = 0
7 Alternative Hypothesis: Less Than
8
9 μT = 85.5
10 σT = 22.96192501
11 T− = 113.5
12 T+ = 57.5
13 T= 57.5000
14 Trials = 18
15 z= −1.219409958
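The T values and the z-score above can be reproduced from the sales table with the large-sample signed-rank calculation. This is a sketch, not KaddStat's implementation. It assumes differences are taken as before minus after, that tied absolute differences share their average rank, and that the p-value is the lower tail of the normal distribution.

```python
from math import sqrt
from statistics import NormalDist

before = [95, 130, 25, 35, 50, 78, 15, 80, 60, 30, 41, 56, 45, 59, 28, 45, 39, 60]
after = [121, 131, 30, 71, 85, 20, 19, 72, 57, 70, 34, 76, 75, 50, 30, 43, 33, 120]

# Differences (before - after); zero differences would be dropped.
diffs = [b - a for b, a in zip(before, after) if b != a]
n = len(diffs)

# Rank the absolute differences; tied values share their average rank.
order = sorted(range(n), key=lambda i: abs(diffs[i]))
ranks = [0.0] * n
i = 0
while i < n:
    j = i
    while j + 1 < n and abs(diffs[order[j + 1]]) == abs(diffs[order[i]]):
        j += 1
    average_rank = (i + j) / 2 + 1  # positions i..j hold ranks i+1..j+1
    for k in range(i, j + 1):
        ranks[order[k]] = average_rank
    i = j + 1

t_plus = sum(r for d, r in zip(diffs, ranks) if d > 0)
t_minus = sum(r for d, r in zip(diffs, ranks) if d < 0)
t_statistic = min(t_plus, t_minus)

mu_t = n * (n + 1) / 4
sigma_t = sqrt(n * (n + 1) * (2 * n + 1) / 24)
z = (t_statistic - mu_t) / sigma_t
p_value = NormalDist().cdf(z)  # lower tail for the 'less than' alternative

print(f"T+ = {t_plus}, T- = {t_minus}, z = {z:.4f}, p-value = {p_value:.4f}")
```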
Using the data in demonstration problem 17.3, apply the sign test to determine whether
sales increased after the national advertising campaign. Let α = .01.
e. click OK.
The Excel output using KaddStat follows.
A B
1 sign test
2 p-value = 0.17288929
3 z-statistic = 0.9428
4 Null Hypothesis: π = 0.5
5 Alternative Hypothesis: Greater Than
6 Number of Trials: n = 18
7 Number of Successes = 11
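The sign test output can be checked with the normal approximation to the binomial with π = 0.5. A minimal sketch, taking the counts from the output, where a 'success' is a location whose sales increased:

```python
from math import sqrt
from statistics import NormalDist

n = 18          # number of locations with a non-zero change
successes = 11  # locations where sales increased
pi_0 = 0.5      # under H0, increases and decreases are equally likely
alpha = 0.01

z_statistic = (successes - n * pi_0) / sqrt(n * pi_0 * (1 - pi_0))
p_value = 1 - NormalDist().cdf(z_statistic)  # upper tail: 'greater than'
reject_null = p_value < alpha

print(f"z = {z_statistic:.4f}, p-value = {p_value:.4f}, reject H0: {reject_null}")
```

Since the p-value is well above .01, the sketch does not reject the null hypothesis.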
An entrepreneur would like to acquire a new client server to handle general accounting
and billing. A critical factor in her decision of which system to buy is reliability in terms of
downtime: the length of time for which the system breaks down. A system is deemed to
be more reliable if it has relatively shorter downtimes. The downtimes (in minutes) of three
systems on her shortlist can be found in the following table. The entrepreneur is not willing
to assume that the downtime populations are normally distributed and has decided to use
the Kruskal–Wallis test rather than the one-way ANOVA. Test the hypothesis using α = .01.
e. click OK.
A B
1
2 kruskal wallis test
3
4 n= 24
5 c= 3
6 Level of Significance = 0.01
7
8 χ² Critical Value = 9.2103
9 Degrees of freedom = 2
10
11 Σ Tj²/nj = 4372.475694
12
13
14 K= 12.4495
15 p-Value = 0.001979805
16 reject the null hypothesis.
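The K statistic can be rebuilt from the summary line in the output using K = 12/(n(n + 1)) · Σ(Tj²/nj) − 3(n + 1). The sketch below also recomputes the critical value and p-value using closed forms that hold only for 2 degrees of freedom; that shortcut is an assumption made here to stay within the standard library.

```python
from math import exp, log

# Values copied from the KaddStat output.
n = 24                               # total number of observations
c = 3                                # number of systems compared
alpha = 0.01
sum_tj2_over_nj = 4372.475694        # sum of (rank total squared / group size)

k_statistic = 12 / (n * (n + 1)) * sum_tj2_over_nj - 3 * (n + 1)
degrees_of_freedom = c - 1

# For df = 2 only, the chi-square upper tail is exp(-x / 2).
critical_value = -2 * log(alpha)
p_value = exp(-k_statistic / 2)
reject_null = k_statistic > critical_value

print(f"K = {k_statistic:.4f}, critical = {critical_value:.4f}, p-value = {p_value:.6f}")
```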
DEMONSTRATION PROBLEM 17.6
c. click OK.
The Excel output using KaddStat follows.
A B
1
2 spearman’s rank correlation
3 Sample Size = 10
4 Spearman’s Rho = 0.29697
A manufacturing facility produces bearings. The diameter specified for the bearings is
5 millimetres. Every 10 minutes, six bearings are sampled and their diameters are measured
and recorded. Twenty of these samples of six bearings are gathered. Use the resulting data
and construct an x̄ chart.
Constructing an x̄ chart
DP18-01
1. Access DP18-01.xls from the student website.
2. From the Add-Ins tab, select KADD.
3. Select Quality Control and then X-Bar Chart . . . from the drop-down menu.
X-Bar Chart
UCL                        5.068
X-Bar-Bar                  5.002
LCL                        4.936
Estimated sigma            0.054
Estimated standard error   0.0219

Special Concerns                        Observation(s)
Below LCL                               11, 19
Above UCL                               5, 16
7 in a row up or down
8 in a row above or below centreline
2 of 3 in a row beyond 2 sigma          1–3, 2–4, 3–5, 9–11, 10–12, 11–13, 12–14
4 of 5 beyond 1 sigma                   1–5, 2–6, 8–12, 9–13, 10–14, 11–15, 12–16,
                                        13–17, 14–18, 15–19, 16–20

[Chart: X-Bar Chart; sample means plotted against Observation (0 to 20), vertical axis from 4.900 to 5.150]
The KaddStat output shows the centreline (green), control limits (red) and the estimated
standard deviation and standard error. In addition, KaddStat provides information called
‘Special Concerns’ that may indicate that the process is not in control.
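The control limits in this output are three estimated standard errors either side of the centreline. The sketch below recomputes them from the reported values and shows how 'Below LCL' and 'Above UCL' observations could be flagged. The sample means passed to the function are hypothetical, since the bearing data are in DP18-01.xls and not reproduced here.

```python
# Values copied from the KaddStat X-bar chart output.
centreline = 5.002       # X-Bar-Bar
standard_error = 0.0219  # estimated standard error of a sample mean

upper_control_limit = centreline + 3 * standard_error
lower_control_limit = centreline - 3 * standard_error

def outside_limits(sample_means):
    """Return 1-based observation numbers falling outside the control limits."""
    return [
        number
        for number, mean in enumerate(sample_means, start=1)
        if mean < lower_control_limit or mean > upper_control_limit
    ]

print(f"UCL = {upper_control_limit:.3f}, LCL = {lower_control_limit:.3f}")

# Hypothetical sample means, for illustration only.
print(outside_limits([5.00, 5.10, 4.90, 5.03]))
```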
Sample  Range
1 0.25
2 0.12
3 0.34
4 0.10
5 0.07
6 0.05
7 0.03
8 0.09
9 0.14
10 0.18
11 0.22
12 0.11
13 0.16
14 0.21
15 0.06
16 0.12
17 0.12
18 0.09
19 0.17
20 0.09
Range Chart
UCL                   0.273
R-Bar                 0.136
LCL                   0.000
Estimated sigma(r)    0.046

Special Concerns                        Observation(s)
Below LCL
Above UCL                               3
7 in a row up or down
8 in a row above or below centerline

[Chart: Range Chart; sample ranges plotted against Observation (0 to 20), vertical axis from 0.000 to 0.400]
The KaddStat output is the range chart, showing the centreline (green), control limits (red)
and the estimated standard deviation and standard error. In addition, KaddStat provides
information called ‘Special Concerns’ that may indicate that the process is not in control.
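The range chart values can be reproduced from the 20 sample ranges listed above. This sketch assumes the standard control chart constants for samples of size 6 (D3 = 0 and D4 = 2.004), which are not shown in the KaddStat output but give the same limits.

```python
# Sample ranges for the 20 samples of six bearings.
ranges = [0.25, 0.12, 0.34, 0.10, 0.07, 0.05, 0.03, 0.09, 0.14, 0.18,
          0.22, 0.11, 0.16, 0.21, 0.06, 0.12, 0.12, 0.09, 0.17, 0.09]

# Assumption: standard range chart constants for sample size n = 6.
D3, D4 = 0.0, 2.004

r_bar = sum(ranges) / len(ranges)      # centreline
upper_control_limit = D4 * r_bar
lower_control_limit = D3 * r_bar

# 1-based sample numbers whose range exceeds the upper control limit.
above_ucl = [
    number for number, r in enumerate(ranges, start=1) if r > upper_control_limit
]

print(f"R-bar = {r_bar:.3f}, UCL = {upper_control_limit:.3f}, "
      f"LCL = {lower_control_limit:.3f}, above UCL: {above_ucl}")
```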
A company produces bond paper and, at regular intervals, samples of 50 sheets of paper
are inspected. Suppose 20 random samples of 50 sheets of paper each are taken during a
certain period of time, with the following numbers of sheets in noncompliance per sample.
Construct a p chart from these data.
Constructing a p chart
DP18-03
1. Access DP18-03.xls from the student website.
2. From the Add-Ins tab, select KADD.
3. Select Quality Control and then p Chart . . . from the drop-down menu.
4. In the p Chart dialogue box:
a. enter the data range (i.e. the sample proportion) in the Input Range field
b. in the Average sample size field, enter 50 (i.e. n)
c. select Include Sigma lines
d. choose an output option
p Chart
UCL                   0.148
p-Bar                 0.053
LCL                   –0.042
Estimated sigma(p)    0.032

Sample size           50

Special Concerns                        Observation(s)
Below LCL
Above UCL
7 in a row up or down
8 in a row above or below centerline

[Chart: p Chart; sample proportions plotted against Observation (0 to 20), vertical axis from 0.000 to 0.200]
The KaddStat output shows the centreline (green), control limits (red) and the estimated
standard deviation. In addition, KaddStat provides information called ‘Special Concerns’
that may indicate that the process is not in control.
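The p chart limits follow from p-bar ± 3·sqrt(p-bar(1 − p-bar)/n). The sketch below recomputes them from the p-bar and sample size shown in the output. A proportion cannot be negative, so the sketch also reports a lower limit floored at zero; KaddStat itself prints the unfloored value.

```python
from math import sqrt

# Values copied from the KaddStat p chart output.
p_bar = 0.053
sample_size = 50

sigma_p = sqrt(p_bar * (1 - p_bar) / sample_size)
upper_control_limit = p_bar + 3 * sigma_p
lower_control_limit = p_bar - 3 * sigma_p

# A proportion cannot fall below zero, so the effective lower limit is 0.
effective_lower_limit = max(0.0, lower_control_limit)

print(f"sigma(p) = {sigma_p:.3f}, UCL = {upper_control_limit:.3f}, "
      f"LCL = {lower_control_limit:.3f}, effective LCL = {effective_lower_limit:.3f}")
```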
DEMONSTRATION PROBLEM 19.2
                                     Population growth
Decision alternative        Low (.50)   Moderate (.30)   High (.20)
Small-scale expansion       $45         $40              $30
Medium-scale expansion      −$10        $90              $150
Large-scale expansion       −$100       $80              $300
f. click OK.
Decision tree output (value at the decision node: 52)

Small-scale expansion (expected value 40.5)
    Low growth        0.50     45
    Moderate growth   0.30     40
    High growth       0.20     30
Medium-scale expansion (expected value 52)
    Low growth        0.50    −10
    Moderate growth   0.30     90
    High growth       0.20    150
Large-scale expansion (expected value 34)
    Low growth        0.50   −100
    Moderate growth   0.30     80
    High growth       0.20    300
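The values at each branch of the tree are expected monetary values: each payoff weighted by the probability of its state of population growth. A minimal sketch that recomputes them from the decision table; the dictionary layout and names are illustrative.

```python
# Prior probabilities of each state of population growth.
probabilities = {"low": 0.50, "moderate": 0.30, "high": 0.20}

# Payoffs in $ millions for each decision alternative.
payoffs = {
    "Small-scale expansion":  {"low": 45,   "moderate": 40, "high": 30},
    "Medium-scale expansion": {"low": -10,  "moderate": 90, "high": 150},
    "Large-scale expansion":  {"low": -100, "moderate": 80, "high": 300},
}

# Expected monetary value = sum of probability * payoff over the states.
expected_values = {
    alternative: sum(probabilities[state] * value for state, value in row.items())
    for alternative, row in payoffs.items()
}

best_alternative = max(expected_values, key=expected_values.get)

for alternative, value in expected_values.items():
    print(f"{alternative}: {value:.1f}")
print("Best alternative:", best_alternative)
```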
DEMONSTRATION PROBLEM 19.4
In demonstration problem 19.1 in the textbook, the decision makers were faced with the
problem of which expansion strategy to adopt. In this demonstration problem, we have
reduced the decision alternatives to two: medium- and large-scale expansion. Use the
following decision table to create a decision tree that displays the decision alternatives,
payoffs, probabilities, states of demand and expected monetary payoffs. Suppose the cost
of obtaining forecasts on population growth is $12 (recall that amounts are in $ millions).
Incorporate this fact into your decision. Calculate the expected value of sampling information
for this problem. Is it worthwhile for the company to buy the information?
The decision alternatives are: medium-scale expansion or large-scale expansion. The
states of population growth and the prior probabilities are: low growth (.50), moderate
growth (.30) and high growth (.20).
The population forecaster has historically not been accurate 100% of the time. For example,
in a period of low population growth, the forecaster correctly predicted it .80 of the time. When
there was moderate population growth, the forecaster correctly predicted it .70 of the time.
Sixty per cent of the time the forecaster correctly forecast high population growth when there
actually was high growth. Shown below are the probabilities that the forecaster will predict a
particular state of population growth under the actual states of population growth.
INPUT: (Please fill in shaded areas)
Sample results                 Cost of sample = 12
    Forecast low growth
    Forecast moderate growth
    Forecast high growth

Likelihoods
States of nature    Priors   Forecast low growth   Forecast moderate growth   Forecast high growth
Low growth          0.50     0.80                  0.15                       0.05
Moderate growth     0.30     0.15                  0.70                       0.15
High growth         0.20     0.05                  0.35                       0.60

Payoffs
Actions                   Low growth   Moderate growth   High growth
Medium-scale expansion    −10          90                150
Large-scale expansion     −100         80                300

TABULAR FORMAT:
Sample result = Forecast low growth
States of nature    Prior   Likelihood   Prior*Likelihood   Posterior
Low growth          0.5     0.80         0.400              0.879
Moderate growth     0.3     0.15         0.045              0.099
High growth         0.2     0.05         0.010              0.022
                                         Marginal = 0.455

Sample result = Forecast moderate growth
States of nature    Prior   Likelihood   Prior*Likelihood   Posterior
Low growth          0.5     0.15         0.075              0.211
Moderate growth     0.3     0.70         0.210              0.592
High growth         0.2     0.35         0.070              0.197
                                         Marginal = 0.355

Sample result = Forecast high growth
States of nature    Prior   Likelihood   Prior*Likelihood   Posterior
Low growth          0.5     0.05         0.025              0.132
Moderate growth     0.3     0.15         0.045              0.237
High growth         0.2     0.60         0.120              0.632
                                         Marginal = 0.19
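The posterior probabilities in the tabular output come from Bayes' rule: each prior is multiplied by its likelihood and divided by the marginal probability of the forecast. The sketch below reproduces that revision and then estimates the expected value of sample information. It assumes the usual definition, expected payoff with the forecast minus the best expected payoff without it; the variable names are illustrative and this is not KaddStat's code.

```python
# States are ordered: low, moderate, high population growth.
priors = [0.50, 0.30, 0.20]

# P(forecast | actual state), one list per forecast, in state order.
likelihoods = {
    "Forecast low growth":      [0.80, 0.15, 0.05],
    "Forecast moderate growth": [0.15, 0.70, 0.35],
    "Forecast high growth":     [0.05, 0.15, 0.60],
}

payoffs = {  # $ millions, in state order
    "Medium-scale expansion": [-10, 90, 150],
    "Large-scale expansion":  [-100, 80, 300],
}
cost_of_sample = 12

def expected_payoff(probabilities, action):
    return sum(p * v for p, v in zip(probabilities, payoffs[action]))

marginals, posteriors = {}, {}
for forecast, likelihood in likelihoods.items():
    joint = [p * l for p, l in zip(priors, likelihood)]  # Prior*Likelihood
    marginals[forecast] = sum(joint)
    posteriors[forecast] = [j / marginals[forecast] for j in joint]

# Best expected payoff using the priors only.
value_without_information = max(expected_payoff(priors, a) for a in payoffs)

# Expected payoff when the best action is chosen after seeing each forecast.
value_with_information = sum(
    marginals[f] * max(expected_payoff(posteriors[f], a) for a in payoffs)
    for f in likelihoods
)

evsi = value_with_information - value_without_information
worthwhile = evsi > cost_of_sample

print(f"EVSI = {evsi:.2f}, worthwhile at cost {cost_of_sample}: {worthwhile}")
```

Under these assumptions the sketch gives an expected value of sample information of about 16.95, which exceeds the stated cost of 12.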