TutorialGuide 2024S2
2
Tutorial Guide
Semester 2 - 2024
Question 1
1.2 Define non-probability sampling and name two types of non-probability sampling
methods.
Question 2
2.1 Sampling method which is used when the population is assumed to be heterogeneous with
respect to the random variable under study.
2.2 A researcher conducts interviews in Gauteng province only, instead of in all provinces.
2.3 Sampling method used when it is difficult to identify members of the target population for
reasons of sensitivity.
2.4 Sampling method which is used when the population is assumed to be homogeneous with
respect to the random variable under study.
2.6 Sampling method where each member in the target population has an equal chance of
being selected.
2.7 Only beautiful women are selected to advertise a particular hair product.
2.8 Only VUT students at Deveyton campus are selected to complete a questionnaire instead of
students from other VUT campuses.
2.9 A study is conducted among the motoring public where their age and gender are assumed
to influence their responses to questions on car type preferred and features sought in a
car.
2.11 Only engineers are selected and interviewed in a study on industrial marketing methods.
Question 3
3.3 The standard deviation of k number of sample statistics is a measure of the sampling
______.
3.4 An important assumption in inferential analysis is that the sampling distribution of the
sample statistic is ______ distributed.
3.5 In a simple random sample, each member in the target population has an ______
chance of being selected.
3.9 The shape of the distribution of all sample means based on a given sample size around its
population mean is ______.
Question 4
4.1 Differentiate between the concepts of probability and non-probability sampling methods.
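To make the contrast in 4.1 concrete, the minimal Python sketch below draws a simple random sample and a proportionally stratified sample, both of which are probability methods. The population of 100 people, the province labels and the function names are illustrative assumptions and do not come from the tutorial.

```python
import random


def simple_random_sample(population, n, seed=0):
    """Probability sampling: every member has the same chance of selection."""
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    return rng.sample(population, n)


def stratified_sample(population, stratum_of, n, seed=0):
    """Split the population into strata, then sample each stratum
    in proportion to its share of the population."""
    rng = random.Random(seed)
    strata = {}
    for member in population:
        strata.setdefault(stratum_of(member), []).append(member)
    sample = []
    for members in strata.values():
        k = round(n * len(members) / len(population))
        sample.extend(rng.sample(members, k))
    return sample


# Hypothetical population: 60 people in one province and 40 in another.
population = [("Gauteng", i) for i in range(60)] + [("Limpopo", i) for i in range(40)]

srs = simple_random_sample(population, 10)
strat = stratified_sample(population, lambda member: member[0], 10)
gauteng_in_strat = sum(1 for province, _ in strat if province == "Gauteng")
print(len(srs), len(strat), gauteng_in_strat)
```

A non-probability method such as convenience or judgment sampling would replace the random draw with the researcher's own choice of respondents, so selection chances are unknown.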
Question 5
Chapter 7: Confidence Interval Estimation
1 A publishing company has just published a new college textbook. Before the company
decides the price at which to sell this textbook, it wants to know the average price of all
such textbooks in the market. The research department at the company took a sample of
50 such textbooks and collected information on their prices. This information produced
a mean price of R484.00 for this sample. It is known that the population standard deviation
of the prices of all such textbooks is R45.00.
1.1 What is the point estimate of the mean price of all such textbooks?
1.2 Calculate a 95% confidence interval for the average price of all such college text-
books. Interpret your answer.
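As a check on hand calculations for question 1.2, here is a minimal Python sketch of the z-interval for a mean when the population standard deviation is known. It uses the figures given in the question; the function name `z_interval` is an assumption made for illustration.

```python
from math import sqrt
from statistics import NormalDist


def z_interval(mean, sigma, n, confidence):
    """Confidence interval for a population mean when sigma is known."""
    # Two-sided critical value, e.g. about 1.96 for 95% confidence.
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    margin = z * sigma / sqrt(n)
    return mean - margin, mean + margin


# Figures from question 1: sample mean R484.00, sigma R45.00, n = 50.
lower, upper = z_interval(mean=484.00, sigma=45.00, n=50, confidence=0.95)
print(f"95% CI: R{lower:.2f} to R{upper:.2f}")
```

The point estimate in 1.1 is the sample mean itself; the interval adds and subtracts the margin of error around it.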
2 The average time taken to solve a computer assignment for a sample of students was found
to be 19 minutes with a sample standard deviation of 3 minutes.
Excel Output 1
2.1 Based on Excel Output 1, what does the value 2.624 (correct to 3 decimal places)
represent?
2.4 If the 98% confidence interval is changed to a 95% confidence interval, what effect
will it have on the precision of the interval estimate?
3 The principal of a large university wishes to estimate the average age of the students presently
enrolled. From past studies, the standard deviation is known to be 2 years. A sample of
50 students is selected, and the mean is found to be 23.2 years.
3.1 What is the point estimate of the average age of all students presently enrolled?
3.3 Find the 98% confidence interval of the population mean.
4 A survey of 30 adults found that the mean age of a person’s primary vehicle is 5.6 years.
Assuming a population standard deviation of 0.8 years, find the 99% confidence interval
of the population mean.
5 Ten randomly selected automobiles were stopped, and the tread depth of the right front tire
was measured. The mean was 0.32 inch, and the sample standard deviation was 0.08
inch. Find the 95% confidence interval of the mean depth. Interpret your answer.
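When only the sample standard deviation is available and the sample is small, as in question 5, the t-distribution replaces the z-distribution. The sketch below assumes the critical value 2.262 is read from a standard t-table (two-sided 95%, df = n − 1 = 9); the function name `t_interval` is illustrative.

```python
from math import sqrt


def t_interval(mean, s, n, t_critical):
    """Confidence interval for a mean when sigma is unknown.

    t_critical must be looked up for df = n - 1 at the chosen confidence level.
    """
    margin = t_critical * s / sqrt(n)
    return mean - margin, mean + margin


# Assumed table value: t for a two-sided 95% interval with df = 9.
T_TABLE_95_DF9 = 2.262

# Figures from question 5: mean 0.32 inch, s = 0.08 inch, n = 10.
lower, upper = t_interval(mean=0.32, s=0.08, n=10, t_critical=T_TABLE_95_DF9)
print(f"95% CI: {lower:.4f} to {upper:.4f} inch")
```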
6 The average hemoglobin reading for a sample of 20 teachers was 16 grams per 100 milliliters,
with a sample standard deviation of 2 grams. Find the 90% confidence interval of the true
mean.
7 A marketing research student at DUT randomly selected 25 families in Durban to find out
how much they spend on electricity in summer. She found that during the month
of December the average amount spent on electricity was R250 with a sample standard
deviation of R50.
Excel Output 2
7.1 Based on Excel Output 2, what does the value 27.9694 represent?
7.2 Construct a 99% confidence interval for the average amount spent on electricity in
the month of December.
7.3 If the sample size is reduced from n=25 to n=20, what will happen to the width and
precision of the interval estimate?
8 A supermarket manager wants to estimate the time spent by customers when doing their
purchases. He observed the entry and departure times of 86 randomly selected customers
from the supermarket. The sample mean time was found to be 27.9 minutes. Assume a
population standard deviation of 10.5 minutes and that shopping time is approximately
normally distributed. Construct a 90% confidence interval for the actual mean time spent by
customers when doing their shopping, and interpret it.
9 A study of 36 members of the Central Park Walkers showed that they could walk at an
average rate of 8 kilometres per hour. The sample standard deviation is 0.8. Find the 95%
confidence interval for the mean for all walkers.
10 A recent study of 28 city residents showed that the mean of the time they had lived at their
present address was 9.3 years. The standard deviation of the sample was 2 years. Find
the 90% confidence interval of the true mean.
11 60 people were asked to measure their pulse rates after completing a 3 km run. The mean
was 105 beats and the population standard deviation was 8 beats. Construct a 95% confi-
dence interval for the mean of the population of people.
12 A random sample of 35 restaurants revealed that clients spend an average of R60 per meal.
Assume σ = 10. If a 99% confidence interval for the population average is constructed:
12.4 If a sample of 50 restaurants is used in the study, explain what the effect on the
width of the interval will be, and hence on the precision of the interval estimate.
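The effect asked about in 12.4 can be seen numerically. The sketch below compares the width of a 99% z-interval for n = 35 and n = 50 with σ = 10, as given in question 12; the helper name `interval_width` is an assumption for illustration.

```python
from math import sqrt
from statistics import NormalDist


def interval_width(sigma, n, confidence):
    """Full width (upper limit minus lower limit) of a z-interval for a mean."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    return 2 * z * sigma / sqrt(n)


width_35 = interval_width(sigma=10, n=35, confidence=0.99)
width_50 = interval_width(sigma=10, n=50, confidence=0.99)
print(f"n = 35: width = {width_35:.2f}")
print(f"n = 50: width = {width_50:.2f}")
```

Because the width is proportional to 1/√n, a larger sample gives a narrower interval and therefore a more precise estimate at the same confidence level.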
13 An enterprise receives a shipment of 40 containers of paint, and the manager wants to
estimate the correct quantity of paint in each container. A random sample of 13 containers is
selected. The average quantity of paint per container is 20 litres and the standard deviation is
1 litre. If the data is normally distributed, calculate 90% confidence limits for the population
average.
14 The number of shirts finished per hour by a particular production line is normally dis-
tributed. A random sample of 25 hours output had a mean of 40 shirts per hour and a
standard deviation of 9 shirts. Calculate the 95% confidence interval for the true mean
number of shirts finished per hour.
15 The telephone company wants to estimate the proportion of households that would pur-
chase an additional telephone line if it were made available at a substantially reduced
installation cost. A random sample of 500 households is selected. The results indicate
that 135 of the households would purchase the additional telephone line at a reduced
installation cost.
15.1 Estimate the proportion of households that would purchase the additional telephone
line.
15.3 Calculate the estimated standard error for this sampling distribution.
15.4 Construct a 90% confidence interval estimate of the population proportion of house-
holds that would purchase the additional telephone line.
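The three parts of question 15 follow one another directly: the sample proportion, its estimated standard error, and the interval built from both. A minimal Python sketch, using the question's figures and an illustrative function name `proportion_interval`:

```python
from math import sqrt
from statistics import NormalDist


def proportion_interval(successes, n, confidence):
    """Point estimate, estimated standard error and z-interval for a proportion."""
    p_hat = successes / n
    se = sqrt(p_hat * (1 - p_hat) / n)
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    return p_hat, se, (p_hat - z * se, p_hat + z * se)


# Figures from question 15: 135 of 500 households, 90% confidence.
p_hat, se, (lower, upper) = proportion_interval(135, 500, 0.90)
print(f"p = {p_hat:.2f}, SE = {se:.4f}, 90% CI: {lower:.4f} to {upper:.4f}")
```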
16 In a survey, 500 people were asked to identify their major source of news information, and
150 stated that their major source was newspapers.
16.1 What is the sample proportion of people who consider newspapers their major source
of news information?
16.2 Construct a 98% confidence interval for the population proportion of people who
consider newspapers their major source of news information.
17 A sample of 500 nursing applications included 60 from men. To find the confidence interval
of the true proportion of men who applied to the program, the following Excel output is
given:
Excel Output 3
17.3 Calculate the confidence interval of the true proportion of men who applied to the
program.
18 A random sample of 100 families contained 65 families that owned one or more colour
television sets. Based on the sample results, construct a 99% confidence interval for the
true proportion of families who own colour television sets.
19 A recent study of 100 people in Soweto found that 27 were obese. Find the 90% confidence
interval of the population proportion of individuals living in Soweto who are obese.
21 The owner of a restaurant that serves continental food wants to study the characteristics of his
customers. He decides to focus on two variables: the amount of money spent by customers
and whether customers order dessert. The results from a sample of 60 customers
are as follows: for the amount spent, X̄ = R38.54 and S = R7.26; and 18 customers
purchased dessert.
21.1 Construct a 95% confidence interval estimate of the population mean amount spent
per customer in the restaurant.
21.2 Construct a 90% confidence interval estimate of the population proportion of cus-
tomers who purchase dessert.
22 The owner of a large shopping centre is besieged with complaints about the shortage of parking
space. He feels that the 1 000 spaces are adequate. In an effort to address the problem, he
obtains a sample of the average number of cars on the parking lot during prime hours.
The sample of 40 has a mean of 952. Assume a population standard deviation of 396.
Construct the 95% confidence interval estimate for prime hour parking.
23 For a selected month, the average number of kilowatt hours used by 49 residential customers is 1160
kilowatt hours and the standard deviation S is 1085 kilowatt hours. Assume that the t-value for the
95% confidence interval is 1.6772. Determine the confidence interval estimate for the true
mean.
24 A stationery store wants to estimate the mean retail value of greeting cards that it has in its
inventory. A random sample of 100 greeting cards indicates a mean value of R2.05 and a
standard deviation of R0.44. Assuming a normal distribution, construct a 95% confidence
interval estimate of the mean value of all greeting cards in the store’s inventory.
25 Your statistics instructor wants you to determine a confidence interval estimate for the mean
test score. Past experience indicated that test scores are normally distributed with a
sample mean of 160 and a population standard deviation of 45. A confidence interval
estimate if your group has 36 students is:
(a) 145.3 ≤ µ ≤ 174.7   (b) 157.55 ≤ µ ≤ 162.45   (c) 152.5 ≤ µ ≤ 167.5
26 The data represent the overall miles per gallon (MPG) of 2008 SUVs priced under R30 000.
23 20 21 22 18 18 17 17 19 19 19
17 21 18 18 18 17 17 16 20 16 22
Construct a 95% confidence interval estimate for the population mean miles per gallon of
2008 SUVs priced under R30 000 assuming a normal distribution.
27 The operations manager of a large production plant would like to estimate the mean amount
of time a worker takes to assemble a new electronic component. Assume that the popula-
tion standard deviation of this assembly time is 3.6 minutes. After observing 120 workers
assembling similar devices, the manager noticed that their average time was 16.2 min-
utes.
27.3 Construct a 95% confidence interval for the mean assembly time.
28 A random sample of married people were asked "Would you remarry your spouse if you
were given the opportunity for a second time?" Of the 150 people surveyed, 127
said that they would do so.
28.1 Estimate the proportion of married people who would remarry their spouse.
28.3 Calculate the estimated standard error for this sampling distribution.
28.4 Construct a 99% confidence interval for the proportion of married people who would
remarry their spouse.
Excel Output 4
29.3 Based on Excel Output 4, what does the value 2.426 (correct to 3 decimal places)
represent?
30 In a sample poll of 100 voters chosen at random from all voters in a given district, 55
were in favour of a particular candidate.
30.1 Estimate the proportion of voters who favour the particular candidate.
30.3 Calculate the estimated standard error for this sampling distribution.
30.4 Construct a 99% confidence interval for the proportion of voters who favour the
particular candidate.
30.5 If the 99% confidence interval is changed to a 95% confidence interval, what effect
will it have on the precision of the interval estimate?
EXCEL FUNCTIONS: 2007 versus 2010
Chapter 8: Hypothesis Testing
Five steps:
1. Define the statistical hypotheses: the null (H0) and alternative (H1) hypotheses.
2. Determine the region of acceptance of the null hypothesis.
3. Calculate the sample test statistic.
4. Compare the sample test statistic to the region of acceptance.
5. Draw statistical and management conclusions.
Z critical limits associated with given levels of significance (both one-sided and two-sided tests).
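The five steps can be followed literally in a few lines of Python. The sketch below is one possible arrangement for a z-test of a mean with known population standard deviation. It borrows the light-bulb figures from question 1 of this chapter (µ0 = 375, σ = 100, n = 64, sample mean 350), and the significance level α = 0.05 is an assumption chosen purely for illustration, as is the function name `z_test_mean`.

```python
from math import sqrt
from statistics import NormalDist


def z_test_mean(sample_mean, mu0, sigma, n, alpha, tail="two-sided"):
    """Steps 2 to 4 of the five-step procedure for a mean with known sigma."""
    std_normal = NormalDist()

    # Step 2: region of acceptance of H0, from the z critical limits.
    if tail == "two-sided":
        z_crit = std_normal.inv_cdf(1 - alpha / 2)
        lower, upper = -z_crit, z_crit
    elif tail == "lower":
        lower, upper = std_normal.inv_cdf(alpha), float("inf")
    elif tail == "upper":
        lower, upper = float("-inf"), std_normal.inv_cdf(1 - alpha)
    else:
        raise ValueError("tail must be 'two-sided', 'lower' or 'upper'")

    # Step 3: sample test statistic.
    z_stat = (sample_mean - mu0) / (sigma / sqrt(n))

    # Step 4: compare the statistic with the region of acceptance.
    reject_h0 = not (lower <= z_stat <= upper)
    return z_stat, (lower, upper), reject_h0


# Step 1: H0: mu = 375 versus H1: mu != 375 (two-sided test).
z_stat, region, reject_h0 = z_test_mean(
    sample_mean=350, mu0=375, sigma=100, n=64, alpha=0.05
)

# Step 5: statistical conclusion; the management conclusion follows from it.
decision = "reject H0" if reject_h0 else "do not reject H0"
print(f"z-stat = {z_stat:.2f}, acceptance region = "
      f"({region[0]:.2f}, {region[1]:.2f}): {decision}")
```

For a test of a proportion, or of a mean with unknown σ, only steps 2 and 3 change: the critical limits and the formula for the test statistic.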
Five steps: Mean: population standard deviation unknown.
1. Formulate the hypotheses: the null (H0) and alternative (H1) hypotheses.
t critical limits associated with given levels of significance (both one-sided and two-sided tests).
Five steps: Proportion
1. Formulate the hypotheses: the null (H0) and alternative (H1) hypotheses.
Z critical limits associated with given levels of significance (both one-sided and two-sided tests).
Five steps: Mean: More than or equal to and less than or equal to
1. Formulate the hypotheses: the null (H0) and alternative (H1) hypotheses.
Z critical limits associated with given levels of significance (both one-sided and two-sided tests).
1 The quality-control manager at a lightbulb factory needs to determine whether the mean life
of a large shipment of lightbulbs is equal to 375 hours. The population standard deviation
is 100 hours. A random sample of 64 light bulbs indicates a sample mean life of 350 hours.
Conduct an appropriate statistical test to determine whether the mean life is equal to 375
hours.
Excel Output 1
1.1 Based on the Excel Output 1, the claim is tested at a significance level equal to
1.2 Based on the Excel Output 1, test the hypothesis and make a statistical conclusion
on the test.
2 The administrative officer of a hospital claims that the mean waiting time for patients to get
treatment in its emergency ward is more than 25 minutes. A random sample of 16 patients
who received treatment in the emergency ward of this hospital produced a mean waiting
time of 27.5 minutes with a sample standard deviation of 4.8 minutes. Assume that the
waiting times for all patients at this emergency ward follow a normal distribution. Using
the 10% significance level, test whether the mean waiting time at the emergency ward is
more than 25 minutes.
2.2 The above claim is tested at a significance level of 10%, find the critical-value.
2.3 The sample statistic = 2.08. This value falls within the area of
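The test statistic quoted in 2.3 can be reproduced as follows. The sketch assumes the critical value 1.341 is read from a standard t-table (one-sided α = 0.10, df = 16 − 1 = 15); the function name `t_statistic` is illustrative.

```python
from math import sqrt


def t_statistic(sample_mean, mu0, s, n):
    """t test statistic for a mean when the population sigma is unknown."""
    return (sample_mean - mu0) / (s / sqrt(n))


# Assumed table value: upper-tail t critical limit for alpha = 0.10, df = 15.
T_CRITICAL = 1.341

# Figures from question 2: H0: mu <= 25 versus H1: mu > 25.
t_stat = t_statistic(sample_mean=27.5, mu0=25, s=4.8, n=16)
reject_h0 = t_stat > T_CRITICAL
print(f"t-stat = {t_stat:.2f}, critical value = {T_CRITICAL}, reject H0: {reject_h0}")
```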
3 It has been reported that the average credit card debt for college seniors is R32000. The
student senate at a large university feels that their seniors have a debt much less than this,
so it conducts a study of 50 randomly selected seniors and finds that the average debt is
R33200, and the population standard deviation is R5200. Test at 5% level of significance
whether the average debt by a senior student is more than 32000.
4 A certain company would like to determine the amount of time employees waste at work
each day. A random sample of 10 of its employees shows a mean time of 121.80 minutes
wasted per day with a standard deviation of 9.45 minutes per day. Does the data provide
evidence that the mean amount of time wasted by employees each day is more than 120
minutes? Test at α = 0.05. Assume the population is at least approximately normally
distributed.
5 The manager of BREAD FOR LIFE bakery wants to check whether the average weight of their loaf
of bread is equal to 700g, as stated on the label of the plastic. A random sample of 80 loaves of
bread was taken and the sample mean was found to be 685g. Assume a population
standard deviation of 60g.
5.1 Define the null and alternative hypothesis for the average weight of the loaf of bread.
5.2 The above claim is tested at a significance level of 1%. Do you agree with the critical
value as calculated in Excel Output 2?
Excel Output 2
5.3 The sample statistic = −2.24. This value falls within the area of
5.5 If the p-value approach to hypothesis testing is used, where Zstat = −2.24
6 A teacher claims that the average score of all grade 6 learners at his school is 75. A random
sample of 100 learners at that school was taken. The mean score of these 100 learners was
71 with a population standard deviation of 8.1.
6.2 Redo question 6 using the p-value method, with Zstat = −4.94.
7 A publisher of university textbooks claims that the average price of all hardbound textbooks
is at most R627.50. A student group believes that the actual mean is higher and wishes
to test their belief. They use a sample of 27 textbooks and find that the mean price of the
textbooks is R694.25 with a standard deviation of R10.50.
7.3 If a significance level of 1% is used, what will the critical value be?
8 The recipe for a bakery item is designed to result in a product that contains 8 grams of fat per
serving. The quality control department samples the product periodically to ensure that
the production process is working as desired. A sample of 45 products revealed a mean
of 10 grams of fat per serving. Test, at a 5% level, that the product contains 8 grams of fat
per serving. Assume σ = 2 grams.
8.3 If the test statistic is 6.71, H0 will be:
9 A retailer believes that less than 20% of grocery purchases are paid by cash (either a credit
card or cheque method of payment is preferred). To test this assertion, he observed a
sample of 160 customers at random and established that only 28 pay for their grocery
purchases by cash. Is his claim correct? Test at the 1% level of significance.
10 An educator estimates that the dropout rate for seniors at Progress High School is 12%. Last
year in a random sample of 300 Progress seniors, 27 withdrew from school. At α = 0.05,
is there enough evidence to reject the educator’s claim?
11 A magazine wants to launch an online version, but only if more than 20% of its subscribers
would subscribe to it. A random survey of 400 subscribers indicated that 90 would be
interested. Test at the 5% level of significance whether more than 20% of subscribers would subscribe.
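For a test of a single proportion, the standard error is computed under the null hypothesis, using the hypothesised proportion rather than the sample proportion. A minimal sketch for question 11, with an illustrative function name `z_test_proportion`:

```python
from math import sqrt
from statistics import NormalDist


def z_test_proportion(successes, n, pi0):
    """z test statistic for a proportion; the standard error uses pi0 from H0."""
    p_hat = successes / n
    se = sqrt(pi0 * (1 - pi0) / n)
    return (p_hat - pi0) / se


# Question 11: H0: pi <= 0.20 versus H1: pi > 0.20, alpha = 0.05 (upper-tail test).
z_stat = z_test_proportion(successes=90, n=400, pi0=0.20)
z_crit = NormalDist().inv_cdf(1 - 0.05)
reject_h0 = z_stat > z_crit
print(f"z-stat = {z_stat:.2f}, critical value = {z_crit:.3f}, reject H0: {reject_h0}")
```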
12 A magazine company called ACE claims to have more than 35% of the readership in Gauteng
province. Their competitor disagrees with the claim and randomly selected 500 active
readers of different magazines, of whom 190 indicated that they read ACE magazine. Test at the 5%
level of significance the validity of the statement made by ACE magazine.
13 A professional claims that more than 40% of all salesmen employed by firms switch jobs
within three years of being hired. At a significance level of 0.01, should the claim be
accepted or rejected if the sample results show that 25 out of 100 salesmen changed jobs
within 3 years?
14 A telephone company representative estimates that 40% of its customers have call-waiting
service. To test this hypothesis, he selected a sample of 100 customers and found that
37 customers had call waiting. Test, at the 5% level of significance,
the hypothesis that fewer than 40% of customers have call-waiting service.
14.2 The above claim is tested at a significance level of 5%. The critical-value from the
table equals:
14.3 The sample statistic = −0.6122. This value falls within the area of
15 A sample of 200 people has a mean age of 21 with a population standard deviation of 5.
Test the hypothesis that the population mean is 18.9 at α = 0.05.
16 Blood glucose levels for obese patients have a mean of 100 with a standard deviation of 15.
A researcher thinks that a diet high in raw cornstarch will have a positive or negative
effect on blood glucose levels. A sample of 30 patients who have tried the raw cornstarch
diet have a mean glucose level of 140. Test, at the 10% level, the hypothesis that the raw
cornstarch had an effect.
17 A weight loss program claims that participants in the program lose, on average, 8 kg or
more in a year. When 40 people used the program for one year, their mean weight loss
was 6.9 kg. Use a 0.01 level of significance to test this claim.
19 A random sample of 200 observations shows that there are 36 successes. We want to test at
the 1% significance level if the true proportion of successes in the population is less than
24%, and made certain calculations.
(c) The critical value of Z (from the table) is Z < −Z0.01 = −2.33
19.2 If, in a random sample of 400 items, 164 are defective, what is the sample proportion
of the defective items?
19.3 Refer to question 19.2. Suppose you are testing the null hypothesis H0: π = 0.40
against H1: π < 0.40 and you choose the level of significance α = 0.05. What is
your statistical decision?
20 The light bulbs in an industrial warehouse have been found to have a mean lifetime of
1030.0 hours, with a standard deviation of 60.0 hours. The warehouse manager has been
approached by a representative of Extend bulb, a company that makes a device intended
to increase bulb life. The manager is concerned that the average lifetime of Extendabulb-
equipped bulbs might not be any greater than the 1030 hours historically experienced. In
a subsequent test, the manager tests 40 bulbs equipped with the device and finds their
mean life to be 1061.6 hours. Does Extend bulb really work? Use α = 0.05
21 My daughter and I have argued about the average length of our preacher’s sermons on Sunday
morning. Despite my arguments, she thinks that the sermons are more than twenty min-
utes and this is not acceptable to her. For one year she randomly selected 12 Sundays and
found the average time of 26.42 minutes with standard deviation of 6.69 minutes. As-
suming that the population is normally distributed and using 0.05 level of significance,
we decided to make a scientific analysis, using hypothesis test. Calculate the test statistic
and make a statistical decision.
Chapter 9.2: Hypothesis Testing for the Difference between Two Means (µ1 − µ2) for Independent Samples: Assume Population Standard Deviations are Known
1 A large organization produces electric light bulbs in each of its two factories, i.e. factory A and
factory B. The organization believes that the average life of bulbs from factory A is the
same as that of factory B. To check this belief, they randomly selected 30 light bulbs
from factory A and 32 light bulbs from factory B. These light bulbs were tested to determine
how long each one works, in hours, before it fails. It was found that the sample
mean of bulbs from factory A was 1135.33 and that of bulbs from factory B was 894.22. Assume
population standard deviations of 229.75 and 248.29 for bulbs from factory A and
factory B respectively.
1.1 Define the null and alternative hypothesis to test whether there is a significant dif-
ference between the average life of bulbs from factory A with that of factory B.
1.2 The above claim is tested at a significance level of 10%. The critical value
from the statistical table equals:
1.3 The sample statistic = 3.98. This value falls within the area of
2 An investigation into the ages at which a person acquires a full driving licence in a certain
country produced the following information:
In Province A, a random sample of 160 people who acquired their driving licence in a
particular month was found to have a mean age of 18.78 years, with a population stan-
dard deviation of 2.9 years.
In Province B, a random sample of 120 people who acquired their driving licence in the
same month was found to have a mean age of 19.82 years, with a population standard
deviation of 2.4 years.
2.1 Define the hypotheses for the difference between the two population means.
2.2 The above claim is tested at a significance level of 1%. The critical value (z-limits)
from the table equals:
2.4 State the statistical and management conclusions for the test.
City    Mean     Std. dev.   Sample size
Apex    $8.95    $0.40       200
Eden    $9.10    $0.60       175
3.1 Define the hypotheses for the difference between the two population means.
3.2 The above claim is tested at a significance level of 5%. The critical value (z-limits)
from the table equals:
4 A company wishes to test whether the sensitivity achieved by a new program is significantly
higher than that achieved under the legacy program. The following information is available
from the test results.
                  Mean    σ     n
New program        92    15    32
Legacy program     84    19    35
Is the mean sensitivity achieved by the new program significantly higher than the mean
sensitivity achieved under the legacy program? Use a 5% significance level.
5 Mary Jo Fitzpatrick is the Vice President for Nursing Services at St. Luke’s Memorial Hospital.
Recently she noticed in the job postings for nurses that positions that are unionised seem
to offer higher wages. She decided to investigate and gathered the following sample
information.
Would it be reasonable for her to conclude that there is significant difference in earning
between union and non-union nurses? Use the 0.01 significance level.
6 A manufacturer claims that the calling range (in metres) of its cordless telephone is greater
than that of its leading competitor. You perform a study using 44 randomly selected
phones from the manufacturer and 46 randomly selected similar phones from its com-
petitor. The results are shown below. At α = 0.05, can you support the manufacturer’s
claim? Assume the populations are normally distributed.
7 The purchasing director for an industrial parts factory is investigating the possibility of pur-
chasing a new type of milling machine. He determines that the new machine will be
bought if there is evidence that the parts produced have a higher mean breaking strength
than those from the old machine. The population standard deviation of the breaking
strength for the old machine is 12 kilograms and for the new machine is 11 kilograms. A
sample of 50 parts taken from the old machine indicates a sample mean of 70 kilograms
and a similar sample of 50 from the new machine indicates a sample mean of 75 kilo-
grams.
7.1 Define the null and alternative hypothesis to test whether there is a significant dif-
ference in the average breaking strength of the old machine and the new machine.
7.2 The above claim is tested at a significance level of 5%. The critical value from the
statistical table equals:
7.3 The sample statistic = −2.17. This value falls within the area of
7.4 Formulate the management conclusion only, based on the results obtained.
7.5 If the p-value approach to hypothesis testing is used, where Zstat = −2.17, what
will the corresponding p-value be?
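The statistic in 7.3 and the p-value asked for in 7.5 can be checked together. The sketch below assumes the difference is taken as old machine minus new machine and that the alternative hypothesis is one-sided (new machine stronger), so the p-value is the lower-tail area; the function name `two_sample_z` is illustrative.

```python
from math import sqrt
from statistics import NormalDist


def two_sample_z(mean1, sigma1, n1, mean2, sigma2, n2):
    """z statistic for the difference between two means,
    independent samples with known population standard deviations."""
    se = sqrt(sigma1 ** 2 / n1 + sigma2 ** 2 / n2)
    return (mean1 - mean2) / se


# Question 7: sample 1 = old machine, sample 2 = new machine.
z_stat = two_sample_z(mean1=70, sigma1=12, n1=50, mean2=75, sigma2=11, n2=50)

# Assumption: H1 is mu_old - mu_new < 0, so the p-value is the lower-tail area.
p_value = NormalDist().cdf(z_stat)
reject_h0 = p_value < 0.05
print(f"z-stat = {z_stat:.2f}, p-value = {p_value:.4f}, reject H0: {reject_h0}")
```

With the p-value approach, H0 is rejected whenever the p-value is smaller than the chosen significance level, which leads to the same decision as comparing the statistic with the critical value.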
Test the hypothesis at the 5% significance level that it takes car commuters to get to work earlier
than train commuters.
EXCEL FUNCTIONS: 2007 versus 2010
Chapter 10.2: The Chi-Square Test for Independence of Association
1 In a study on smoking habits of a population, the following results were obtained:
                        Gender
Smoking habit     Male   Female   Total
Nonsmoker          149      148     297
Past smoker         13       24      37
Current smoker      31       37      68
Total              193      209     402
1.1 State the null hypothesis for the association between gender and smoking habit.
1.3 The hypothesis is to be tested at the 10% level of significance. The critical value
(chi-square limits) from the table equals:
1.4 The expected value for a respondent that is a female and a non-smoker is ______.
1.5 The sample statistic for this test is χ²stat = 3.172. This value falls within the area of
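The expected frequencies and the χ² statistic for question 1 can be reproduced with plain Python. Each expected frequency is (row total × column total) / grand total. The critical value 4.605 is assumed to be read from a chi-square table for α = 0.10 and df = (3 − 1)(2 − 1) = 2; the function name `chi_square_independence` is illustrative.

```python
def chi_square_independence(observed):
    """Expected frequencies, chi-square statistic and degrees of freedom
    for a contingency table given as a list of rows (totals excluded)."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    grand_total = sum(row_totals)

    # Expected frequency = row total * column total / grand total.
    expected = [[r * c / grand_total for c in col_totals] for r in row_totals]

    chi2 = sum(
        (o - e) ** 2 / e
        for obs_row, exp_row in zip(observed, expected)
        for o, e in zip(obs_row, exp_row)
    )
    df = (len(row_totals) - 1) * (len(col_totals) - 1)
    return expected, chi2, df


# Rows: nonsmoker, past smoker, current smoker. Columns: male, female.
observed = [[149, 148], [13, 24], [31, 37]]
expected, chi2_stat, df = chi_square_independence(observed)

# Assumed table value: chi-square critical limit for alpha = 0.10, df = 2.
CHI2_CRITICAL = 4.605
reject_h0 = chi2_stat > CHI2_CRITICAL

print(f"Expected (female, nonsmoker) = {expected[0][1]:.2f}")
print(f"chi-square = {chi2_stat:.3f}, df = {df}, reject H0: {reject_h0}")
```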
2 The following results were obtained in a study on the hand preference of 300 male and female
respondents:
              Hand preference
Gender     Left   Right   Total
Female       12     108     120
Male         24     156     180
Total        36     264     300
Compile the chi-square test to determine whether there is an association between gender
and hand preference. Use a 5% level of significance.
3 A motor vehicle distributor wishes to find out if the size of car bought is in any way related
to the age of a buyer. From sales invoices over the past two years, a sample of 300 buyers
were classified by size of car bought and buyer’s age.
Test, at the 1% level of significance, whether car size bought and buyer’s age are indepen-
dent.
3.1 State the null hypothesis.
3.2 Determine the critical value for this Chi-square test.
4 A geologist collects hand-specimen sized pieces of limestone from a particular area. A
qualitative assessment of both texture and colour is made, and the following contingency
table was constructed:
                   Colour
Texture     Light   Medium   Dark   Total
Fine            4       20      8
Medium          5       23     12
Coarse         21       23      4
Total
Is there evidence of association between colour and texture for these limestones?
4.2 If the chi-square test is done, calculate the number of degrees of freedom.
4.4 Calculate the value for the sample test statistic (χ2 - statistic).
5 Male and female adults completed a questionnaire on their body image. The following re-
sults were obtained:
                        Body image
Gender     About right   Overweight   Underweight   Total
Female             560          163            37     760
Male               295           72            73     440
Total              855          235           110    1200
5.3 If the chi-square test is done at a 1% level, the critical value will be:
                        Body image
Gender     About right   Overweight   Underweight
Female          541.50            A         69.67
Male            313.50        86.17             B
5.5.1 Calculate the missing values for (fo − fe)² / fe, C and D (given below).
5.5.2 Calculate the value for Σ (fo − fe)² / fe, E.
Gender   Body image     (fo − fe)² / fe
Female   About right    0.632
         Overweight     1.349
         Underweight    C
Male     About right    1.092
         Overweight     D
         Underweight    26.465
Total                   E
5.6 State the statistical and management conclusions for the test.
6 A company has to choose among three health insurance plans. Management wishes to know
whether the preference for plans is independent of job classification. The opinions of a
random sample of 500 employees are shown below:
6.1 State the null hypothesis for the association between job classification and preference
for plans.
6.3 The hypothesis is to be tested at the 5% level of significance. The critical value (chi-
square limits) from the table equals:
6.4 The expected value for salaried workers that prefer plan 2 is ______.
6.5 The sample statistic for this test is χ²stat = 49.632. This value falls within the area of
7 Is gender independent of education level? A random sample of 395 people were surveyed
and each person was asked to report the highest education level they obtained. The data
that resulted from the survey is summarized in the following table:
7.1 State the null and alternative hypothesis.
7.2 The hypothesis is tested at the 10% level of significance. The critical value from the table
equals:
7.3 The value of the sample statistic equals 8.006. This value falls within the area of:
8 A trainee risk manager for an investment bank has been told that the level of risk is directly
related to the industry type (manufacturing, retail, financial). For the data presented in
the contingency table below, analyse whether the perceived risk depends on the industry type.
                          Industrial Class
Level of Risk     Manufacturing   Retail   Financial
Low                          81       38          16
Moderate                     46       42          33
High                         22       26          29
Total                       149      106          78
8.4 Calculate the expected frequency for moderate level of risk and financial class.
8.5 Given χ²cal = 28.88, this value falls within the area of
9 A study is conducted amongst adults to determine whether different age groups prefer dif-
ferent drinks.
                        Drinks
Age group     Soda   Coffee   Tea   Water   Total
20-29           10        8     5       2      25
30-39           11        9     2       3      25
40-49            8        9     1       7      25
50-59            9        8     3       5      25
Total           38       34    11      17     100
At the 10% level of significance, conduct a chi-square test to determine whether there is
an association between age group and preferred drinks by adults.
10 A large carpet store wishes to determine if the brand of carpet purchased is related to the
purchaser’s family income. As a sampling frame, they mailed a survey to people who
have a store credit card. Five hundred customers returned the survey and the results
follow:
                 Brand of Carpet
Family Income    Brand A   Brand B   Brand C
High Income      65        32        32
Middle Income    80        68        104
Low Income       25        35        59
The statements below refer to a test conducted on the data above to determine if the
brand of carpet purchased is related to the purchaser’s family income at the 5% level of
significance.
                 Brand of Carpet
Family Income    Brand A   Brand B   Brand C
High Income      43.86     34.83     50.31
Middle Income    85.68     68.04     98.28
Low Income       40.46     32.13     46.41
10.5 We can conclude that the brand of carpet purchased is related to the purchaser’s
family income.
Chapter 12: Simple Linear Regression and Correlation Analysis
Question 1
A gynaecologist recorded the blood pressures of her pregnant patients and collected the following
data:
Age (X) 23 24 25 26 28 29 31 35 40
Lower limit of BP (Y) 65 60 62 70 70 73 75 83 90
Σxy = 19212    Σx² = 7817    r² = 0.9409
1.2 Calculate the regression equation using the method of least squares.
1.3 Estimate the blood pressure if the age of the patient is 38 years.
1.5 Find the value of the correlation coefficient (r) and interpret it.
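The least-squares calculation for Question 1 can be sketched as follows, using b1 = (nΣxy − ΣxΣy) / (nΣx² − (Σx)²) and b0 = ȳ − b1x̄. This is an illustrative Python sketch, not a prescribed solution; the function name predict is an assumption introduced for the example.

```python
import math

# Data from Question 1: age (x) and lower limit of blood pressure (y)
x = [23, 24, 25, 26, 28, 29, 31, 35, 40]
y = [65, 60, 62, 70, 70, 73, 75, 83, 90]
n = len(x)

sum_x = sum(x)
sum_y = sum(y)
sum_xy = sum(xi * yi for xi, yi in zip(x, y))  # matches the given value 19212
sum_x2 = sum(xi ** 2 for xi in x)              # matches the given value 7817

# Least-squares slope and intercept
b1 = (n * sum_xy - sum_x * sum_y) / (n * sum_x2 - sum_x ** 2)
b0 = sum_y / n - b1 * sum_x / n


def predict(age):
    """Estimated lower limit of blood pressure for a given age."""
    return b0 + b1 * age


# The correlation coefficient takes the sign of the slope; r^2 = 0.9409 is given
r = math.copysign(math.sqrt(0.9409), b1)

print(f"y-hat = {b0:.3f} + {b1:.4f}x")
print(f"Estimate at age 38: {predict(38):.2f}")
print(f"r = {r:.2f}")
```

Age 38 lies inside the observed range of 23 to 40, so the estimate is an interpolation.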
Question 2
2.1 The number of fires in a national park and the total rainfall (cm) for that region were
recorded over an 8-year period. The results were as follows:
Number of fires 12 19 8 25 23 20 12 33
Rainfall 20 15 30 16 14 16 25 8
n = 8    Σy = 152    Σx² = 2922    Σy² = 3356    b0 = 38.91
2.1.1 Plot the data for the number of fires (y) versus rainfall (x) on a scatter plot. Comment
on your graph with respect to direction and dispersion.
2.1.2 Find the straight-line regression equation for the data.
2.1.3 Draw the straight line on the graph in question (2.1.1). Show all calculations.
2.1.4 Find the correlation coefficient between rainfall and number of fires. Interpret.
2.2 A social scientist would like to analyse the relationship between educational attainment
and salary. The following Excel output is given for the data obtained, where "Education"
refers to years of higher education and "Salary" is the individual salary:
2.2.1 Which one of the above mentioned variables is the predictor variable?
2.2.2 Interpret the value of the slope.
2.2.3 What percentage of the variation in salary can be explained by education?
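For the fire and rainfall data in 2.1, the correlation coefficient can be checked with r = (nΣxy − ΣxΣy) / √[(nΣx² − (Σx)²)(nΣy² − (Σy)²)]. The sketch below is an illustration in Python with assumed variable names, not a required method.

```python
import math

# Data from Question 2.1
rainfall = [20, 15, 30, 16, 14, 16, 25, 8]   # x
fires = [12, 19, 8, 25, 23, 20, 12, 33]      # y
n = len(rainfall)

sum_x = sum(rainfall)
sum_y = sum(fires)                            # 152, as given
sum_xy = sum(a * b for a, b in zip(rainfall, fires))
sum_x2 = sum(a ** 2 for a in rainfall)        # 2922, as given
sum_y2 = sum(b ** 2 for b in fires)           # 3356, as given

numerator = n * sum_xy - sum_x * sum_y
denominator = math.sqrt(
    (n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2)
)
r = numerator / denominator

print(f"r = {r:.4f}")
print(f"r squared = {r ** 2:.4f}")
```

A negative value of r indicates that the number of fires tends to fall as rainfall rises.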
Question 3
3.1 The table below shows the data a sales manager has collected on annual sales and years of
experience:
Years of experience 1 3 4 4 6 8 10 10 11 13
Annual sales (R1000) 80 97 92 102 103 111 119 123 117 136
n = 10    Σy = 1080    Σx = 70    b0 = 80
3.2 A real estate agency collects data concerning the Home Size (in hundreds of square feet)
and the Sales Price of houses (in thousands of rands) as follows:
Answer the following questions based on the above computer output and the data given:
Question 4
Given the following data on number of beers (x) taken by drivers and the number of accidents
(y):
No. of beers 9 7 6 11 14 15 12 13 15
No. of accidents 6 6 7 8 9 9 8 9 10
Σxy = 850    Σx² = 1246    r = 0.8960
4.1 Draw a scatter plot of the number of beers taken and number of accidents.
4.2 Calculate the regression equation using the least squares method.
4.3 Draw the straight line (line of best fit) on the scatter plot in Question 4.1.
4.4 Predict the number of accidents that are caused by drivers who drink 10 beers.
Question 5
The following table gives the heights (in inches) of nine different randomly selected seedlings
at the end of a certain number of days after planting.
5.2 Compute the regression equation using the method of least squares.
5.3 Predict the height of such a seedling at the end of 25 days after planting.
5.4 Define extrapolation. State whether it is applicable to Question 5.3 or not.
5.5 Find the correlation coefficient.
Question 6
Eight students, randomly selected from a large class, were asked to keep a record of the hours
they spent studying before the midterm examination. The following table gives the number of
hours these eight students studied before the midterm and their scores on the midterm.
Hours studied 15 7 12 8 18 6 9 11
Midterm score 97 78 87 92 89 57 74 69
Use the Summary Output given below to answer the following Questions.
6.2 What is the value of the slope (b1)? Interpret the value.
6.4 Find the value of the coefficient of determination and interpret it.
Question 7
7.1 In a manufacturing process the assembly line speed was thought to affect the number of
defective parts found during the inspection process. To test this theory, managers devised
a situation in which the same batch of parts was inspected visually at a variety of line
speeds. The following table lists the collected data.
Line speed (x) 20 20 40 30 60 40
Number of defective parts found (y) 21 19 15 16 14 17
Given: b1 = −0.15
Question 8
Thirteen students followed a management course at a certain university. The table below represents
their average examination marks at the end of their second year and their corresponding
average matric examination marks.
Student A B C D E F G H I J K L M
Matric mark 52 58 59 69 61 59 52 58 73 50 61 71 57
University mark 58 59 60 69 73 58 53 51 64 50 55 78 52
r² = 0.5451    n = 13    Σx = 780    Σy = 780    b0 = 6.10
8.1 Examine the given scatterplot and comment on it by choosing the correct statement.
8.4 Estimate the university mark of a student whose matric mark was 80.
8.6 Comment on the strength of the relationship between the two variables by using the
correlation coefficient.
8.7 What percentage of the variation in the university mark can be explained by the matric
mark?
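When only the intercept and the sums are given, as in Question 8, the slope can be recovered from the fact that a least-squares line passes through the point (x̄, ȳ). The following Python sketch illustrates that idea; the function name predict_university_mark is an assumption made for the example.

```python
# Given values from Question 8
n = 13
sum_x = 780   # sum of matric marks
sum_y = 780   # sum of university marks
b0 = 6.10     # given intercept

matric = [52, 58, 59, 69, 61, 59, 52, 58, 73, 50, 61, 71, 57]

x_bar = sum_x / n
y_bar = sum_y / n

# The fitted line passes through (x_bar, y_bar): y_bar = b0 + b1 * x_bar
b1 = (y_bar - b0) / x_bar


def predict_university_mark(matric_mark):
    """Estimated university mark for a given matric mark."""
    return b0 + b1 * matric_mark


estimate = predict_university_mark(80)

# A prediction outside the observed x-range is an extrapolation
is_extrapolation = not (min(matric) <= 80 <= max(matric))

print(f"b1 = {b1:.4f}")
print(f"Estimate for a matric mark of 80: {estimate:.2f}")
print(f"Extrapolation: {is_extrapolation}")
```

Because a matric mark of 80 is above the largest observed matric mark, the estimate should be treated with caution.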
Question 9
A sociologist was hired by a large city hospital to investigate the relationship between the number
of unauthorized days that employees are absent per year and the distance between home
and work for the employees. A sample of 10 employees was chosen and the data collected.
Given:
ŷ = 8.1 − 0.34x for 1 ≤ x ≤ 18    r = 0.84    t-stat = 0.0776
9.1 Identify the dependent and independent variables for the study.
9.3 Estimate the number of days an employee who lives 15 km from work, will be absent.
9.4 What is the value of the coefficient of determination? Interpret this value.
9.5 A hypothesis test is conducted at a 5% level to test the regression equation for significance.
Question 10
A study to explore the relationship between income and training, resulted in the following:
Yearly income (R’000)         150  200  250  350  450  550  750  1000  1750  2000  2500
Number of years of training   3    4    5    7    6    9    11   12    16    15    21
10.4 If the years of training increase by one year, the yearly income will increase by
R138 597.97.
10.6 93.60% of the variation in yearly income can be explained by the years of training.
10.8 Based on the correlation coefficient, there exists a moderate linear relationship between
the two variables.
11 A study is conducted to examine the relationship between the amount spent on advertising
a new product and consumer awareness of the product based on the proportion of people
who have heard of it. Suppose a sample shows the following data for four different
products:
11.1 Indicate which of the given variables is the predictor and which is the response variable.
11.2 What is the value of the slope?
11.3 What is the value of the y-intercept?
11.4 What is the value of the correlation coefficient?
11.5 What percentage of variation in consumer awareness can be explained by advertising
expenses?
12 The following table gives information on ages and cholesterol levels for a random sample
of 8 men.
Age 58 69 43 39 63 52 47 31
Cholesterol level 189 235 193 177 154 191 213 175
Use the Summary Output given below to answer the following Questions.
12.1 What is the value of the slope (b1)? Interpret the value.
12.2 Compute the regression equation.
12.3 Find the value of the correlation coefficient and interpret it.
13 Eight tomato plants were selected at random and treated. The yield in kilograms was recorded.
13.1 Calculate the linear regression equation by using the least squares method.
13.2 Construct a scatter plot for the above data set.
13.3 Fit a linear regression line to the scatter plot in question 13.2.
13.4 Estimate the yield when the amount of fertilizer is 6.
13.5 Is the estimate in question 13.4 reliable? Why?
13.6 Interpret the value of the slope.
13.7 Given r² = 0.8920, interpret the correlation coefficient.
14 The following table shows the hours of sunshine (x) during nine days in August and the
number of ice creams sold by a beach shop in Durban.
14.2 Construct a scatterplot for the dataset.
14.3 Fit a regression line to the scatterplot in question 14.2.
14.4 Estimate the number of ice creams sold when there are 12 hours of sunshine.
14.5 Write down the y-intercept.
15 The mathematics (x) and statistics (y) examination marks for a group of 10 students are
shown below.
Mathematics(x) 89 73 57 53 51 49 47 76 66 70
Statistics(y) 70 67 50 44 45 45 38 72 65 71
15.1 Calculate the linear regression equation by using the least squares method.
15.2 Draw the scatter plot for the above data.
15.3 Draw the regression line on the scatter plot.
15.4 Predict the statistics mark of a student who gets 60 in mathematics.
15.5 Do you think it is reliable to estimate the statistics mark of a student who gets 40 in
mathematics? Give a reason for your answer.
15.6 Given r² = 0.85, interpret the correlation coefficient.
16 Given: summary output of the semester mark (x) of statistics students and their final exam (y).
16.1 Interpret the value of the slope.
16.2 Interpret the coefficient of determination.
16.3 Write down the regression equation.
16.4 Write down the sample size.
Chapter 14 (4th Edition) or Chapter 13 (3rd Edition)
Index Numbers: Measuring Business Activity
Question 1
The following table provides the price per kilogram and the quantities purchased of nuts in
2002, 2005 and 2007.
1.1 Find the quantity relative for Pecans for 2007, where 2002=100.
1.2 Find the price relative for Cashews for 2005, where 2002 is the base year. Interpret your
answer.
1.3 Calculate the Laspeyres composite price index for 2007, using the weighted average of
relatives method with 2005 as the base year.
1.4 Calculate the Paasche composite quantity index for 2007, using the method of weighted
aggregates with 2005 as the base year.
1.5 Calculate the composite price index for 2005 using the method of weighted aggregates
with 2002=100, if the quantities are held constant at 2002.
1.6 The following table represents the cell phone prices from 2004 to 2007.
Question 2
Doc company produces and sells three types of electrical appliances. The prices and quantities
in 2005, 2007 and 2010 are shown below.
Type       2005                          2007                          2010
           Price ($)   Quantity (’000)   Price ($)   Quantity (’000)   Price ($)   Quantity (’000)
Radio      80          25                100         20                120         15
Toaster    150         55                200         40                250         25
Clock      120         15                130         30                140         50
2.1 Find the price relative for clock for 2007, where 2005 is the base year. Interpret your
answer.
2.2 Find the quantity relative for toaster for 2010, where 2005 = 100.
2.3 Calculate the Laspeyres composite price index for 2010, using the weighted average of
relatives method with 2007 as the base year.
2.4 Calculate the Paasche composite quantity index for 2010, using the method of weighted
aggregates with 2007 as the base year.
2.5 Calculate the composite price index for 2007, using the method of weighted aggregates
with 2005 = 100, if the quantities are held constant at 2005.
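The relatives and weighted-aggregates indexes asked for in Question 2 can be sketched in Python as below. The function names are assumptions introduced for illustration. The sketch uses the usual definitions: a Laspeyres price index weights prices by base-period quantities, and a Paasche quantity index weights quantities by current-period prices.

```python
# Prices ($) and quantities ('000) from the Question 2 table
prices = {
    2005: {"Radio": 80, "Toaster": 150, "Clock": 120},
    2007: {"Radio": 100, "Toaster": 200, "Clock": 130},
    2010: {"Radio": 120, "Toaster": 250, "Clock": 140},
}
quantities = {
    2005: {"Radio": 25, "Toaster": 55, "Clock": 15},
    2007: {"Radio": 20, "Toaster": 40, "Clock": 30},
    2010: {"Radio": 15, "Toaster": 25, "Clock": 50},
}


def relative(series, item, year, base):
    """Price or quantity relative of one item, with base = 100."""
    return series[year][item] / series[base][item] * 100


def aggregate(price_year, quantity_year):
    """Sum of price x quantity over all items for the chosen years."""
    return sum(
        prices[price_year][item] * quantities[quantity_year][item]
        for item in prices[price_year]
    )


def laspeyres_price_index(year, base):
    """Weighted aggregates, quantities held constant at the base year."""
    return aggregate(year, base) / aggregate(base, base) * 100


def paasche_quantity_index(year, base):
    """Weighted aggregates, prices held constant at the current year."""
    return aggregate(year, year) / aggregate(year, base) * 100


print(f"{relative(prices, 'Clock', 2007, 2005):.2f}")
print(f"{relative(quantities, 'Toaster', 2010, 2005):.2f}")
print(f"{laspeyres_price_index(2007, 2005):.2f}")
print(f"{paasche_quantity_index(2010, 2007):.2f}")
```

An index above 100 signals an increase relative to the base year, and an index below 100 a decrease.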
Question 3
Product    2004                       2006
           Price (Rand)   Quantity    Price (Rand)   Quantity
Bread      94             18          110            25
Milk       86             32          130            44
Cheese     74             300         81             380
3.1 Find the quantity relative of bread for 2006, where 2004 = 100.
3.2 Find the price relative of cheese for 2006, where 2004 = 100.
3.3 Calculate the composite quantity index for 2006, using the method of aggregates with
2004 = 100, if prices are held constant at 2006.
3.4 Calculate the Laspeyres price index for 2006, using the weighted average of relatives method
with 2004 = 100.
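The weighted average of relatives method in 3.4 can be illustrated with the sketch below, assuming the Laspeyres weights are the base-period values p0 × q0. Under that assumption the result coincides with the weighted aggregates method, which the code checks. Variable names are illustrative.

```python
# (price 2004, quantity 2004, price 2006, quantity 2006) from the Question 3 table
data = {
    "Bread": (94, 18, 110, 25),
    "Milk": (86, 32, 130, 44),
    "Cheese": (74, 300, 81, 380),
}

# Price relative of each item, with 2004 = 100
price_relatives = {
    item: p1 / p0 * 100 for item, (p0, q0, p1, q1) in data.items()
}

# Laspeyres weights: base-period value of each item (p0 x q0)
weights = {item: p0 * q0 for item, (p0, q0, p1, q1) in data.items()}

# Weighted average of the price relatives
index_relatives = sum(
    price_relatives[item] * weights[item] for item in data
) / sum(weights.values())

# Weighted aggregates form of the same index, for comparison
index_aggregates = (
    sum(p1 * q0 for p0, q0, p1, q1 in data.values())
    / sum(p0 * q0 for p0, q0, p1, q1 in data.values())
    * 100
)

print(f"Weighted average of relatives: {index_relatives:.2f}")
print(f"Weighted aggregates:           {index_aggregates:.2f}")
```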
Question 4
4.1 With 2007=100, the price relative for coffee for 2008.
4.2 With 2008 =100, the quantity relative for sugar for 2009. Interpret the index.
4.3 A weighted aggregate price index for 2009, using Paasche’s approach, with 2007 as base
year.
4.4 A weighted quantity index for 2009 using 2008 as base year. Use the weighted average of
relatives method, holding prices constant in 2008. Interpret your answer.
4.5 The Laspeyres price index for 2008 (with 2007=100) using the weighted aggregates method.
Question 5
Commodity   Price                 Quantity
            1986   1987   1988    1986   1987   1988
Wheat       10     10     10      20     24     36
Rice        12     15     23      24     25     20
Barley      14     18     60      105    125    60
Sugar       15     15     17      116    128    140
5.1 By using 1986 as base year, calculate the quantity relatives for Barley for 1987 and 1988.
5.2 Calculate a weighted aggregate price index for 1987, using Laspeyres’ approach, with
1986 as base year.
5.3 Calculate a weighted quantity index for 1988 using 1986 as base year. Use the weighted
average of relatives method, holding prices constant in 1986. Interpret your answer.
5.4 Calculate the Paasche price index for 1988 (with 1987=100) using the weighted aggregates
method.
Question 6
Given the following data for commodities A, B, C and D for 2000 and 2004:
Commodity   2000                2004
            Price   Quantity    Price   Quantity
A           2       8           4       6
B           5       14          7       10
C           4       15          5       14
D           3       16          4       22
6.1 Find the price relative of commodity A for 2004, where 2000 = 100.
6.2 Find the quantity relative of commodity C for 2004, where 2000 = 100.
6.3 Calculate the composite price index for 2004, using the method of weighted average, with
2000 = 100, if quantities are held constant at 2000 level.
6.4 Calculate the Paasche price index for 2006, using the weighted aggregate of relative method
with 2004 = 100.
Question 7
A scrapyard specialist has recorded the unit prices and quantities sold of three types of used
tires for the years 2012, 2013 and 2014. See information below.
Commodity   Quantity              Price
            2012   2013   2014    2012   2013   2014
Tire X      20     22     26      R200   R225   R250
Tire Y      40     44     50      R120   R150   R160
Tire Z      25     20     30      R180   R200   R210
7.1 Find the price relative for tire Y for 2014, where 2012 =100.
7.2 Find the volume relative for tire Z for 2014, where 2013=100 and interpret.
7.3 Calculate the Laspeyres composite price index for 2013, using the method of weighted
aggregates with 2012 as the base year. Interpret your answer.
7.4 Calculate the Paasche composite quantity index for 2013, using the weighted average of
relatives method with 2012 as the base year. Interpret your answer.
7.5 Calculate the composite price index for 2014, using the method of weighted aggregates
with 2012 = 100, if quantities are held constant at 2014.
7.6 The following table represents the prices of sugar from 2009 to 2012.
Question 8
2012 = 100
8.1 Find the quantity relative for product B. Interpret your findings.
8.3 Calculate a weighted aggregate price index for 2015, using Paasche’s approach.
8.4 What is the percentage increase/decrease in the price of the three products between 2012
and 2015? Use the weighted average of price relative method if quantities are held constant
at base year. Interpret your findings.
Question 9
The data in the following table are for the per capita retail kilograms of beef consumed in the
United States from 1992 to 1999:
9.1 Calculate the missing index with 1993 = 100, from the above table.
9.3 For the given indexes, shift the base period to 1998.
Question 10
Study the following table with prices and quantities of three types of sweets that G. Lutton buys
weekly:
Item      2003                2004
          Price   Quantity    Price   Quantity
Kraker    0.20    20          0.25    24
Cool-C    0.25    12          0.25    16
Gosh      1.00    3           2.00    2
10.3 The Laspeyres price index using the weighted aggregates method.
10.4 The weighted average of quantity index, holding prices constant in 2004.
Question 11
A coffee shop bakery wants to know how much the price of its essential ingredients has
increased over the period 2000 to 2002. They need this information to adjust their selling price,
and to report back to other stakeholders in their business.
Items             2000                2001                2002
                  Price   Quantity    Price   Quantity    Price   Quantity
Bread (loaf)      3.50    250         3.75    300         4.00    420
Eggs (1 dozen)    8.25    200         9.25    250         9.95    300
Margarine (1kg)   9.30    150         10.45   165         11.00   190
11.1 Find the price relative for Eggs for 2002, where 2000 = 100
11.2 Find the quantity relative for Margarine for 2001, where 2000 is the base year. Interpret
your answer.
11.3 Calculate the composite price index for 2001 using the method of weighted aggregates
with 2000 = 100, if the quantities are held constant at 2000.
11.4 Calculate the Paasche composite price index for 2001, using the method of weighted
aggregates with 2001 as the base year.
11.5 Calculate the Laspeyres composite quantity index for 2002, using the weighted average
of relatives method with 2001 as the base year.
Chapter 15 (4th Edition) or Chapter 14 (3rd Edition)
Time Series Analysis: A Forecasting Tool
Question 1
1.1 The Gross Domestic Product (GDP) in millions of a certain country is shown below for the
years 1980 to 1986.
1.2 The following table shows the annual sales of Baby Store between 1997 and 2003:
Question 2
2.1 Name the two methods that can be used for trend isolation.
2.2 The following data represent the production of steel in millions of kilograms in
Vanderbijlpark during the years 1995-2001.
Question 3
The following table represents the total number of telephones in operation in South Africa over
a period of 5 years:
Question 4
Question 5
The following data show retail sales of canoes from 1990 through 1996, with data in thousands
of boats.
5.3 Write the trend line equation, using the method of least squares.
Question 6
6.2 What will the x-codes be if the sequential numbering method is used?
6.3 Determine the trend equation using the method of least squares.
Question 7
Question 8
The data represented in the accompanying table represents the shipments received by the
Tugela Manufacturing Company:
Year   Quarter   Number of    Centered 4-period   Seasonal
                 shipments    moving average      ratio
2013   1         20
       2         25
       3         30           29.5                B
       4         38           32.63               116.46
2014   1         30           35.13               85.40
       2         40           A                   115.11
       3         35           35.0                100
       4         30           37.5                80
2015   1         40           41.25               96.97
       2         50           47.25               105.82
       3         55
       4         58
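A centered 4-period moving average and the corresponding seasonal ratios for the shipments series can be sketched as follows. This is an illustrative Python sketch with assumed variable names. It works with unrounded averages, so a ratio may differ in the second decimal from one computed from a rounded average.

```python
# Quarterly shipments, 2013 Q1 to 2015 Q4, from the Question 8 table
shipments = [20, 25, 30, 38, 30, 40, 35, 30, 40, 50, 55, 58]
period = 4

# Uncentered 4-period moving averages
moving_averages = [
    sum(shipments[i:i + period]) / period
    for i in range(len(shipments) - period + 1)
]

# Centering: average each pair of adjacent 4-period moving averages
centered = [(a + b) / 2 for a, b in zip(moving_averages, moving_averages[1:])]

# The first centered value lines up with the third observation (index 2)
offset = period // 2

# Seasonal ratio: actual value as a percentage of its centered moving average
seasonal_ratios = [
    shipments[i + offset] / cma * 100 for i, cma in enumerate(centered)
]

for i, (cma, ratio) in enumerate(zip(centered, seasonal_ratios)):
    print(f"observation {i + offset + 1}: CMA = {cma:.2f}, ratio = {ratio:.2f}")
```

No centered value exists for the first two and last two quarters, which is why those rows of the table are blank.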
Question 9
The manager of Dialnet telephones determined the following percentages for sales from 2009
to 2013:
I II III IV
2009 - - 65.53 73.86
2010 116.48 144.30 67.52 73.32
2011 113.67 132.17 82.86 81.94
2012 108.39 126.84 83.30 76.52
2013 115.40 124.70 - -
9.2 If the real figures for 2013 are as follows, deseasonalize the 2013 sales:
        Quarter   Sales (R’million)
2013    I         22.2
        II        24.3
        III       16.5
        IV        15.4
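One way to deseasonalize the 2013 figures is sketched below. It assumes the percentages in the Question 9 table are seasonal ratios, takes the median per quarter, rescales the medians so that they sum to 400, and then divides each actual value by its seasonal index. The median-and-adjust step is an assumption that mirrors the "Median seasonal index" and "Adjusted seasonal index" rows used elsewhere in this guide; variable names are illustrative.

```python
from statistics import median

# Seasonal ratios (percentages) per quarter, from the Question 9 table
ratios = {
    "I": [116.48, 113.67, 108.39, 115.40],
    "II": [144.30, 132.17, 126.84, 124.70],
    "III": [65.53, 67.52, 82.86, 83.30],
    "IV": [73.86, 73.32, 81.94, 76.52],
}

# Actual 2013 sales in R million
sales_2013 = {"I": 22.2, "II": 24.3, "III": 16.5, "IV": 15.4}

# Median seasonal index per quarter
median_index = {quarter: median(values) for quarter, values in ratios.items()}

# Rescale so that the four quarterly indexes sum to 400
adjustment = 400 / sum(median_index.values())
seasonal_index = {q: m * adjustment for q, m in median_index.items()}

# Deseasonalized value = actual value / seasonal index x 100
deseasonalized = {
    q: sales_2013[q] / seasonal_index[q] * 100 for q in sales_2013
}

for q in sales_2013:
    print(f"{q}: index = {seasonal_index[q]:.2f}, "
          f"deseasonalized = {deseasonalized[q]:.2f}")
```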
Question 10
10.2 A time series is a set of (i) ______ data of a random variable that is gathered
over time at (ii) ______ intervals and arranged in (iii) ______ order.
Question 11
Year Q1 Q2 Q3 Q4
2006 - - 104.92 108.56
2007 87.39 96.2 108.47 107.46
2008 90.83 97.3 101.59 101.33
2009 94.42 97.96 104.97 112.03
2010 83.72 99.24 105.86 114.07
2011 84.94 95.5 - -
Median seasonal index
Adjusted seasonal index
Question 12
Number of gas heaters sold (×1000) quarterly by ABC for the period 2007 to 2011.
Year Q1 Q2 Q3 Q4
2007 - - 120.48 100.31
2008 75.84 102.82 130.77 79.01
2009 79.04 125.15 127.9 61.94
2010 90.37 105.13 132.48 61.94
2011 67.1 108.04 - -
12.1 Calculate the adjusted seasonal index.
Question 13
Consider the following demand levels for winter jackets (in 1 000) at JACKET BAGAIN, located in
Limpopo, from 2010 to 2013.
Year Q1 Q2 Q3 Q4
2010 - - 85.1064 153.8462
2011 74.5763 C 106.6667 141.4634
2012 63.6605 85.482 127.7778 177.8976
2013 20.0717 39.2638 - -
Median seasonal Index 63.6605 75.9124 106.6667 D
Adjusted seasonal index 63.65 75.9 E 153.81
Question 14
The number of detentions Mrs Nasty gives to naughty grade 6 children each day over a 3-week
period is shown below:
Week 1
Day Mon Tue Wed Thurs Fri
No of detentions 4 8 12 7 18
Week 2
Day Mon Tue Wed Thurs Fri
No of detentions 3 6 10 7 16
Week 3
Day Mon Tue Wed Thurs Fri
No of detentions 3 6 7 5 13
14.3 Calculate the 5-period moving average for the time series.
14.5 Estimate the number of detentions that Mrs Nasty will give on Wednesday Week 4.
Question 15
T = 146.57 + 58.18x
Question 16
Consider the following quarterly demand levels for electricity (in 1000 megawatts) in Cape
Town from 1988 to 1991.
Question 17
Use the following table to calculate the seasonal index by making use of the ratio-to-moving
average method.
Question 18
The management of an office building is studying a plan to reduce energy costs in the building.
They have assembled quarterly data on electricity costs for the past three years (in R1 000).
Year   Quarter 1   Quarter 2   Quarter 3   Quarter 4
1      2.4         3.8         4.0         3.1
2      2.6         4.1         4.1         3.2
3      2.6         4.5         4.3         3.3
Compute seasonal indexes for the building’s electricity usage by the ratio-to-moving-average method.
EXCEL FUNCTIONS: 2007 versus 2010