Chapter 5 QM (PC)

This document provides an overview of estimation and hypothesis testing. It discusses point and interval estimates of population parameters like the mean, proportion, and variance. Point estimates are single values used to estimate parameters, while confidence intervals are ranges of values that likely contain the true population parameter. The document presents the formulas for confidence intervals of the mean and proportion at various confidence levels. It also defines hypothesis testing as using sample evidence and probability to determine if a hypothesis statement should be rejected or not. The key steps of hypothesis testing are outlined.

Uploaded by: SEOW INN LEE
Copyright: © All Rights Reserved

Chapter 5 Estimation and Hypothesis Testing

Point estimate of population mean, proportion and variance
➢ Estimation of parameters
✓ The statistical technique of estimating unknown
population parameters based on a value of the
corresponding sample statistic.

✓ The estimation procedure involves the following steps:


1. Select a sample
2. Collect the required information from the members of
the sample
3. Calculate the value of the sample statistic
4. Assign value(s) to the corresponding population
parameter
➢ Estimate
✓ The value(s) assigned to a population parameter
based on the value of a sample statistic

➢ Estimator
✓ The sample statistic that is used to estimate a
population parameter.

➢ Two types of estimates


1. Point Estimates:
A point estimate is one value that is used to
estimate a population parameter.

✓ The sample mean, $\bar{x}$, is the best point estimate of the
population mean, $\mu$.

✓ The sample proportion, $\hat{p}$, is the best point estimate of the
population proportion, $p$.

Eg 1: The number of defective items produced by a machine
was recorded for five randomly selected hours during a
40-hour work week. The observed numbers of defectives
were 12, 4, 7, 14 and 10. What is the point estimate
for the population mean?

$$\text{Point estimate, } \bar{x} = \frac{12+4+7+14+10}{5} = \frac{47}{5} = 9.4$$
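As a quick check of Eg 1, the point estimate can be reproduced in a few lines of Python. This is only an illustrative sketch; the variable names are chosen here and are not part of the notes.

```python
from statistics import mean

# Observed defectives in the five sampled hours (data from Eg 1).
defectives = [12, 4, 7, 14, 10]

# The sample mean is used as the point estimate of the population mean.
point_estimate = mean(defectives)
print(point_estimate)  # 9.4
```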

Interval estimates of population mean and proportion

➢ Confidence Interval (CI)
▪ A confidence interval is also called an interval estimate.
▪ A confidence interval is a range or an interval of values
that is likely to contain the population parameter.
▪ A confidence interval is always based on the results of a
sample.
▪ Three confidence levels are used extensively:
90%, 95% and 99%
➢ Confidence interval for population mean
Case 1: (large sample, n > 30)
Confidence interval for population mean $\mu$ is:

$$\bar{x} \pm z_{\alpha/2}\,\frac{\sigma}{\sqrt{n}} \qquad \text{or} \qquad \bar{x} \pm z_{\alpha/2}\,\frac{s}{\sqrt{n}}$$

where $\bar{x}$ = the sample mean
$\sigma$ = the population standard deviation
$s$ = the sample standard deviation
$n$ = the sample size
$z_{\alpha/2}$ = the confidence factor (1.6449 for 90%, 1.9600
for 95% and 2.5758 for 99%)

If n > 30 and $\sigma$ is unknown, we can replace $\sigma$ by the sample
standard deviation, s, in the formula. The quantity $\frac{s}{\sqrt{n}}$
is the standard error of the mean.

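Eg 2: Find the confidence interval for the population mean, $\mu$, if:

a) 99% CI, $\bar{x}$ = 41.8 years, $\sigma$ = 16.7 years, n = 570,
$\alpha$ = 1% = 0.01

99% CI for the population mean, $\mu$, is (1m)

$$\begin{aligned}
\bar{x} \pm z_{\alpha/2}\,\frac{\sigma}{\sqrt{n}}
&= 41.8 \pm z_{0.01/2}\left(\frac{16.7}{\sqrt{570}}\right) && \text{(1m)} \\
&= 41.8 \pm z_{0.005}\,(0.6995) \\
&= 41.8 \pm 2.5758\,(0.6995) && \text{(1m)} \\
&= 41.8 \pm 1.8 = (40.0,\ 43.6) \text{ or } 40.0 \text{ to } 43.6 && \text{(1m)}
\end{aligned}$$

b) 90% CI, $\bar{x}$ = 1.8 minutes, $\sigma$ = 0.6 minutes, n = 46,
$\alpha$ = 10% = 0.1

90% CI for the population mean, $\mu$, is

$$\begin{aligned}
\bar{x} \pm z_{\alpha/2}\,\frac{\sigma}{\sqrt{n}}
&= 1.8 \pm z_{0.10/2}\left(\frac{0.6}{\sqrt{46}}\right) \\
&= 1.8 \pm z_{0.05}\,(0.0885) \\
&= 1.8 \pm 1.6449\,(0.0885) \\
&= 1.8 \pm 0.15 = (1.65,\ 1.95)
\end{aligned}$$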
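The large-sample interval used in Eg 2 can be sketched in Python as below. The function name `mean_confidence_interval` is an assumption made for this illustration, and the confidence factor is obtained from the standard library's `statistics.NormalDist` rather than from a printed table.

```python
from math import sqrt
from statistics import NormalDist


def mean_confidence_interval(x_bar, sd, n, confidence):
    """Large-sample z interval for a population mean.

    sd is sigma when it is known, otherwise the sample standard deviation s.
    """
    alpha = 1 - confidence
    # z_{alpha/2}: the upper alpha/2 point of the standard normal distribution.
    z = NormalDist().inv_cdf(1 - alpha / 2)
    margin = z * sd / sqrt(n)
    return x_bar - margin, x_bar + margin


# Eg 2(a): 99% CI with x_bar = 41.8, sigma = 16.7, n = 570.
low_a, high_a = mean_confidence_interval(41.8, 16.7, 570, 0.99)
print(round(low_a, 1), round(high_a, 1))  # 40.0 43.6

# Eg 2(b): 90% CI with x_bar = 1.8, sigma = 0.6, n = 46.
low_b, high_b = mean_confidence_interval(1.8, 0.6, 46, 0.90)
print(round(low_b, 2), round(high_b, 2))  # 1.65 1.95
```

Passing s in place of sigma gives the second form of the interval, which is how the later examples with a sample standard deviation are worked.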

Eg 3: The Dean of the Business School wants to estimate the
mean number of hours worked per week by students. A
sample of 49 students showed a mean of 24 hours with a
standard deviation of 4 hours. What is the 95%
confidence interval for the average number of hours
worked per week by the students?

n = 49, $\bar{x}$ = 24, s = 4, $\alpha$ = 5% = 0.05
95% CI for the population mean, $\mu$, is

$$\begin{aligned}
\bar{x} \pm z_{\alpha/2}\,\frac{s}{\sqrt{n}}
&= 24 \pm z_{0.05/2}\left(\frac{4}{\sqrt{49}}\right) \\
&= 24 \pm z_{0.025}\,(0.571) \\
&= 24 \pm 1.96\,(0.571) = 24 \pm 1.12 = (22.88,\ 25.12)
\end{aligned}$$
Eg 4: A survey of 64 students revealed that the average time
students spend studying per week was 125 hours with a
standard deviation of 5 hours.
a) What is the point estimate for the population mean?
b) Calculate the standard error.
c) What is the 99% confidence interval for the mean
time students spend studying per week?

n = 64, $\bar{x}$ = 125, s = 5
a) Point estimate, $\bar{x}$ = 125
b) Standard error, $s_{\bar{x}} = \frac{s}{\sqrt{n}} = \frac{5}{\sqrt{64}} = 0.625$
c) 99% CI for the population mean, $\mu$, is

$$\begin{aligned}
\bar{x} \pm z_{\alpha/2}\,\frac{s}{\sqrt{n}}
&= 125 \pm z_{0.01/2}\,(0.625) \\
&= 125 \pm z_{0.005}\,(0.625) \\
&= 125 \pm 2.5758\,(0.625) = 125 \pm 1.6 = (123.4,\ 126.6)
\end{aligned}$$

➢ Confidence interval for population proportion, $p$

Confidence interval for the population proportion, p, is given
by:

$$\hat{p} \pm z_{\alpha/2}\sqrt{\frac{\hat{p}(1-\hat{p})}{n}}$$

where $\hat{p}$ = sample proportion
$n$ = sample size
$z_{\alpha/2}$ = confidence factor
(1.6449 for 90%, 1.9600 for 95% and 2.5758 for 99%)

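Eg 5: Find the confidence interval for the population proportion, p, if:

a) n = 400, x = 100, 90% CI

$\hat{p} = \frac{x}{n} = \frac{100}{400} = 0.25$ (1m), $\alpha$ = 10% = 0.1

90% CI for the population proportion, p, is (1m)

$$\begin{aligned}
\hat{p} \pm z_{\alpha/2}\sqrt{\frac{\hat{p}(1-\hat{p})}{n}}
&= 0.25 \pm z_{0.1/2}\sqrt{\frac{0.25(1-0.25)}{400}} \\
&= 0.25 \pm z_{0.05}\,(0.0217) \\
&= 0.25 \pm 1.6449\,(0.0217) \\
&= 0.25 \pm 0.0357 \\
&= (0.2143,\ 0.2857) \text{ or } (21.43\%,\ 28.57\%)
\end{aligned}$$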

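b) n = 900, x = 360, 95% CI

$\hat{p} = \frac{x}{n} = \frac{360}{900} = 0.4$, $\alpha$ = 5% = 0.05

95% CI for the population proportion, p, is

$$\begin{aligned}
\hat{p} \pm z_{\alpha/2}\sqrt{\frac{\hat{p}(1-\hat{p})}{n}}
&= 0.4 \pm z_{0.05/2}\sqrt{\frac{0.4(1-0.4)}{900}} \\
&= 0.4 \pm z_{0.025}\,(0.0163) \\
&= 0.4 \pm 1.96\,(0.0163) \\
&= 0.4 \pm 0.0320 \\
&= (0.3680,\ 0.4320)
\end{aligned}$$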
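The proportion interval in Eg 5 can be sketched the same way. The function name `proportion_confidence_interval` is hypothetical, and the z value again comes from `statistics.NormalDist`. Because the code does not round intermediate values, its results can differ from a hand calculation in the last decimal place.

```python
from math import sqrt
from statistics import NormalDist


def proportion_confidence_interval(x, n, confidence):
    """Large-sample z interval for a population proportion."""
    p_hat = x / n
    alpha = 1 - confidence
    z = NormalDist().inv_cdf(1 - alpha / 2)
    # Estimated standard error of the sample proportion.
    std_error = sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - z * std_error, p_hat + z * std_error


# Eg 5(a): n = 400, x = 100, 90% CI.
low_a, high_a = proportion_confidence_interval(100, 400, 0.90)
print(round(low_a, 3), round(high_a, 3))  # 0.214 0.286

# Eg 5(b): n = 900, x = 360, 95% CI.
low_b, high_b = proportion_confidence_interval(360, 900, 0.95)
print(round(low_b, 3), round(high_b, 3))  # 0.368 0.432
```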

Eg 6: A sample of 500 executives revealed that 175 of them
planned to sell their homes and retire to Arizona. Develop
a 99% confidence interval for the proportion of all
executives that plan to sell their homes and move to
Arizona.

n = 500, x = 175, $\hat{p} = \frac{175}{500} = 0.35$
99% CI for the population proportion, p, is

$$\begin{aligned}
\hat{p} \pm z_{\alpha/2}\sqrt{\frac{\hat{p}(1-\hat{p})}{n}}
&= 0.35 \pm z_{0.01/2}\sqrt{\frac{0.35(1-0.35)}{500}} \\
&= 0.35 \pm z_{0.005}\,(0.0213) \\
&= 0.35 \pm 2.5758\,(0.0213) \\
&= 0.35 \pm 0.055 \\
&= (0.295,\ 0.405) \text{ or } 29.5\% \text{ to } 40.5\%
\end{aligned}$$
Eg 7: A random sample of 200 students was taken. It was
noticed that 25 of them dislike mathematics subjects.
Obtain a 90% confidence interval for the population
proportion of students who dislike mathematics subjects.

n = 200, $\hat{p} = \frac{25}{200} = 0.125$
90% CI for the population proportion, p, is

$$\begin{aligned}
\hat{p} \pm z_{\alpha/2}\sqrt{\frac{\hat{p}(1-\hat{p})}{n}}
&= 0.125 \pm z_{0.1/2}\sqrt{\frac{0.125(1-0.125)}{200}} \\
&= 0.125 \pm z_{0.05}\,(0.0234) \\
&= 0.125 \pm 1.6449\,(0.0234) \\
&= 0.125 \pm 0.038 \\
&= (0.087,\ 0.163)
\end{aligned}$$
Hypothesis Testing (8 to 10m)
Hypothesis: A statement about the value of a population
parameter developed for the purpose of testing.

Examples of hypotheses made about a population
parameter are:
a) The mean monthly income for system analysts
is RM 2,100. Is this claim true?
b) The average time students spend studying is 2.5
hours per day.
c) 20% of all thieves are caught and sent to
prison.
d) 65% of the students in the May 2014 intake of TAR
College are local people.

What is hypothesis testing?

➢ Hypothesis testing:
A procedure, based on sample evidence and
probability theory, used to determine whether the
hypothesis is a reasonable statement and should not be
rejected, or is unreasonable and should be rejected.
Steps in hypothesis testing:
Most researchers take the following steps when testing a
hypothesis.
Step 1: State the null and alternative hypotheses.
H0 :
H1 :

Step 2: Select a level of significance.
Significance level:
Critical value:
Formulate a decision rule:

Step 3: Identify the test statistic.

$$z = \frac{\bar{x}-\mu_0}{\sigma/\sqrt{n}} \qquad \text{or} \qquad z = \frac{\hat{p}-p}{\sqrt{\frac{p(1-p)}{n}}}$$

State the decision based on the value of the test statistic, z. Then
state
i) Reject H0 or ii) Do not reject H0

Step 4: Conclusion:
Definitions:
❖ Null hypothesis, H0:
A statement about the value of a population
parameter.
❖ Alternative hypothesis, H1:
A statement that is accepted if the sample data
provide evidence that the null hypothesis is false.
❖ Level of significance:
The probability of rejecting the null hypothesis when
it is actually true.
❖ Critical value:
The dividing point between the region where the null
hypothesis is rejected and the region where it is not
rejected.
❖ Test statistic:
A value, determined from sample information, used
to determine whether or not to reject the null
hypothesis.

Type I and Type II errors

▪ Type I Error: Rejecting the null hypothesis when it is
actually true.

▪ Type II Error: Not rejecting (accepting) the null hypothesis
when it is actually false.

                          Test decision
Actual situation    Do not reject (accept) H0    Reject H0
H0 is true          Correct                      Type I error
H0 is false         Type II error                Correct
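The definition of the level of significance as the probability of a Type I error can be illustrated with a small simulation. The sketch below assumes a hypothetical normal population in which H0 is true (mean 16, standard deviation 0.5, samples of size 36) and counts how often a two-tailed z test at the 0.05 level rejects H0 anyway. The population values, the number of trials and the seed are choices made for this illustration only.

```python
import random
from math import sqrt
from statistics import NormalDist

random.seed(0)  # fixed seed so that repeated runs give the same result

# Hypothetical population in which H0: mu = 16 is actually true.
MU_0, SIGMA, N = 16.0, 0.5, 36
ALPHA = 0.05
z_crit = NormalDist().inv_cdf(1 - ALPHA / 2)  # about 1.96

trials = 4000
rejections = 0
for _ in range(trials):
    sample = [random.gauss(MU_0, SIGMA) for _ in range(N)]
    x_bar = sum(sample) / N
    z = (x_bar - MU_0) / (SIGMA / sqrt(N))
    if abs(z) > z_crit:
        # H0 is true here, so every rejection is a Type I error.
        rejections += 1

type_i_rate = rejections / trials
print(type_i_rate)  # expected to be close to ALPHA
```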

Relationship between the sign in H0 and H1 and tails of test

                                       Two-tailed test   Left-tailed test           Right-tailed test
Sign in the null hypothesis H0         =                 = or ≥                     = or ≤
Sign in the alternative hypothesis H1  ≠ (different)     < (decrease / less than)   > (increase / more than)
Rejection region                       In both tails     In the left tail           In the right tail
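The link between the tail of the test and the critical value can be sketched as follows. The function name `critical_values` and the labels "two", "left" and "right" are assumptions made for this illustration. The values themselves come from the standard normal distribution in Python's `statistics` module.

```python
from statistics import NormalDist


def critical_values(alpha, tail):
    """Return the z critical value(s) for a test at significance level alpha.

    tail is "two", "left" or "right" (labels chosen for this sketch).
    """
    std_normal = NormalDist()
    if tail == "two":
        # alpha is split equally between the two tails.
        z = std_normal.inv_cdf(1 - alpha / 2)
        return (-z, z)
    if tail == "left":
        return (std_normal.inv_cdf(alpha),)
    if tail == "right":
        return (std_normal.inv_cdf(1 - alpha),)
    raise ValueError("tail must be 'two', 'left' or 'right'")


print([round(z, 4) for z in critical_values(0.05, "two")])    # [-1.96, 1.96]
print([round(z, 4) for z in critical_values(0.025, "left")])  # [-1.96]
print([round(z, 4) for z in critical_values(0.05, "right")])  # [1.6449]
```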
➢ Testing for the population mean (Large sample):
• In a test of hypothesis about the population mean for large
samples where the population standard deviation $\sigma$ is
known, the test statistic is given by:

$$z = \frac{\bar{x}-\mu_0}{\sigma/\sqrt{n}}$$

• If $\sigma$ is unknown, we estimate it with the sample
standard deviation, s.

• As long as the sample size n ≥ 30, z can be approximated
with:

$$z = \frac{\bar{x}-\mu_0}{s/\sqrt{n}}$$
Steps in hypothesis testing for population mean (large sample)
Step 1:
H0 :
H1 :
Step 2:
Significance level:
Critical value:
Decision rule:
Step 3:
$$z = \frac{\bar{x}-\mu_0}{\sigma/\sqrt{n}} \qquad \text{or} \qquad z = \frac{\bar{x}-\mu_0}{s/\sqrt{n}}$$
Step 4:
Conclusion:
Eg 8: The processors of K&K peanut butter indicate on
the label that the bottle contains 16 ounces of peanut
butter. A sample of 36 bottles is selected hourly. Last
hour, a sample of 36 bottles had a mean weight of
16.12 ounces with a standard deviation of 0.5 ounces.
Test at the 0.05 significance level whether the mean
content of the bottles is different from the label
information.

Soln: n = 36, $\bar{x}$ = 16.12, s = 0.5, $\mu_0$ = 16

H0: $\mu$ = 16 (same) (1m)
H1: $\mu$ ≠ 16 (different) (1m)

Significance level: $\alpha$ = 0.05
Critical value: $\pm z_{0.05/2} = \pm z_{0.025} = \pm 1.96$ (1m)
Decision rule: Reject H0 if Z < -1.96 or Z > 1.96.
Otherwise, do not reject H0. (2m)
Test statistic:

$$Z = \frac{\bar{x}-\mu_0}{s/\sqrt{n}} = \frac{16.12-16}{0.5/\sqrt{36}} = 1.44 \quad \text{(2m)}$$

Since -1.96 < Z = 1.44 < 1.96, we do not reject H0. (1m)

Conclusion: At the 0.05 significance level, there is not
enough evidence to conclude that the mean content of the
bottles is different from the label information. (1m)
Eg 9: According to the national survey, the mean family
size was 3.18 in year 1998. A researcher wanted to
check if the current mean family size is less than
3.18. A sample of 900 families taken this year by
this researcher produced a mean family size of 3.16
with a standard deviation of 0.70. Using the 0.025
significance level, can we conclude that the mean
family size has decreased since 1998?

Soln: n = 900, $\bar{x}$ = 3.16, s = 0.7, $\mu_0$ = 3.18

H0: $\mu$ ≥ 3.18 (not decreased)
H1: $\mu$ < 3.18 (decreased)

Significance level: $\alpha$ = 0.025
Critical value: $-z_{0.025} = -1.96$
Decision rule: Reject H0 if Z < -1.96.
Otherwise, do not reject H0.
Test statistic:

$$Z = \frac{\bar{x}-\mu_0}{s/\sqrt{n}} = \frac{3.16-3.18}{0.7/\sqrt{900}} = -0.857$$

Since Z = -0.857 > -1.96, we do not reject H0.

Conclusion: At the 0.025 significance level, there is not
enough evidence to conclude that the mean family size has
decreased since 1998.
Eg 10: Research conducted by a group of students
indicated that the average student spends RM 320
on food per month. A random sample of 400
students spends an average of RM 360 with a
standard deviation of RM 36 per month. Can we
conclude that there is an increase in the students'
food expenditure? Test the hypothesis at the 5% level
of significance.

Soln: n = 400, $\bar{x}$ = 360, s = 36, $\mu_0$ = 320

H0: $\mu$ ≤ 320 (not increased)
H1: $\mu$ > 320 (increased)

Significance level: $\alpha$ = 0.05
Critical value: $z_{0.05} = 1.6449$
Decision rule: Reject H0 if Z > 1.6449.
Otherwise, do not reject H0. (2m)
Test statistic:

$$Z = \frac{\bar{x}-\mu_0}{s/\sqrt{n}} = \frac{360-320}{36/\sqrt{400}} = 22.222$$

Since Z = 22.222 > 1.6449, we reject H0.

Conclusion: We can conclude that there is an
increase in the students' food
expenditure. (Accept H1)
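The three worked examples above (Eg 8 to Eg 10) follow the same four steps, so they can be checked with one small function. This is a minimal sketch under the large-sample assumption of this section; the function name `z_test_mean` and the tail labels are hypothetical, and the critical values are computed with `statistics.NormalDist` instead of being read from a table.

```python
from math import sqrt
from statistics import NormalDist


def z_test_mean(x_bar, mu_0, s, n, alpha, tail):
    """Large-sample z test for a population mean.

    tail is "two", "left" or "right" (labels chosen for this sketch).
    Returns the test statistic z and whether H0 is rejected.
    """
    z = (x_bar - mu_0) / (s / sqrt(n))
    std_normal = NormalDist()
    if tail == "two":
        reject_h0 = abs(z) > std_normal.inv_cdf(1 - alpha / 2)
    elif tail == "left":
        reject_h0 = z < std_normal.inv_cdf(alpha)
    elif tail == "right":
        reject_h0 = z > std_normal.inv_cdf(1 - alpha)
    else:
        raise ValueError("tail must be 'two', 'left' or 'right'")
    return z, reject_h0


# Eg 8: two-tailed test at alpha = 0.05.
z8, reject8 = z_test_mean(16.12, 16, 0.5, 36, 0.05, "two")
# Eg 9: left-tailed test at alpha = 0.025.
z9, reject9 = z_test_mean(3.16, 3.18, 0.7, 900, 0.025, "left")
# Eg 10: right-tailed test at alpha = 0.05.
z10, reject10 = z_test_mean(360, 320, 36, 400, 0.05, "right")

print(round(z8, 3), reject8)    # 1.44 False
print(round(z9, 3), reject9)    # -0.857 False
print(round(z10, 3), reject10)  # 22.222 True
```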
➢ Testing hypothesis for population proportion (p)
(Limit to large sample only)
✓ Proportion: A fraction or percentage that indicates
the part of the population or sample having a
particular trait of interest.
✓ The sample proportion is denoted by $\hat{p}$.

$$\hat{p} = \frac{x}{n}$$

Test statistic for testing a population proportion

$$z = \frac{\hat{p}-p}{\sqrt{\frac{p(1-p)}{n}}}$$

where $p$ = population proportion, $\hat{p}$ = sample proportion

Steps in hypothesis testing for population proportion
Step 1:
H0:
H1:
Step 2: Significance level:
Critical value:
Decision rule:
Step 3:
$$z = \frac{\hat{p}-p}{\sqrt{\frac{p(1-p)}{n}}}$$
Step 4:
Conclusion:
Eg 11: In a certain city 60% of the families own cars. A survey
was done among the subscribers of a magazine to find
out whether they owned a car. A random sample of 1200
subscribers was taken and 64% of the subscribers
claimed that they owned a car. Test whether the
proportion of subscribers who owned cars is
significantly different from the percentage in the
population of the city at 0.01 level of significance.

Soln: $p$ = 0.60, n = 1200, $\hat{p}$ = 0.64

H0 : p = 0.60 (same)
H1 : p ≠ 0.60 (different)

Significance level: $\alpha$ = 0.01
Critical value: $\pm z_{0.01/2} = \pm z_{0.005} = \pm 2.5758$
Decision rule: Reject H0 if Z < -2.5758 or Z > 2.5758.
Otherwise, do not reject H0.
Test statistic:

$$Z = \frac{\hat{p}-p}{\sqrt{\frac{p(1-p)}{n}}} = \frac{0.64-0.60}{\sqrt{\frac{0.60(1-0.60)}{1200}}} = 2.828$$

Since Z = 2.828 > 2.5758, we reject H0.

Conclusion: We can conclude that the proportion of
subscribers who owned cars is
significantly different from the
percentage in the population of the city.
Eg 12: Director Mailing company sells computers and
computer parts by mail. The company claims that at
least 90% of all orders are mailed within 72 hours
after they are received. The quality control
department at the company often takes samples
to check if this claim is valid. In a recently taken
sample of 150 orders, 129 were mailed within 72 hours.
Do you think the company's claim is true? Use a 2.5%
significance level to test it.

Soln: $p$ = 0.90, n = 150, $\hat{p} = \frac{129}{150} = 0.86$

H0 : p ≥ 0.90
H1 : p < 0.90

Significance level: $\alpha$ = 0.025
Critical value: $-z_{0.025} = -1.96$
Decision rule: Reject H0 if Z < -1.96.
Otherwise, do not reject H0.
Test statistic:

$$Z = \frac{\hat{p}-p}{\sqrt{\frac{p(1-p)}{n}}} = \frac{0.86-0.90}{\sqrt{\frac{0.90(1-0.90)}{150}}} = -1.633$$

Since Z = -1.633 > -1.96, we do not reject H0.

Conclusion: At the 2.5% significance level, there is not
enough evidence to reject the company's claim.
Eg 13: In the past, 15% of the mail order solicitations for a
certain charity resulted in a financial contribution. A
new solicitation letter that has been drafted is sent to a
sample of 200 people and 45 responded with a
contribution. At the 0.05 significance level, can it be
concluded that the new letter is more effective?

Soln: $p$ = 0.15, n = 200, $\hat{p} = \frac{45}{200} = 0.225$

H0 : p ≤ 0.15
H1 : p > 0.15

Significance level: $\alpha$ = 0.05
Critical value: $z_{0.05} = 1.6449$
Decision rule: Reject H0 if Z > 1.6449.
Otherwise, do not reject H0.
Test statistic:

$$Z = \frac{\hat{p}-p}{\sqrt{\frac{p(1-p)}{n}}} = \frac{0.225-0.15}{\sqrt{\frac{0.15(1-0.15)}{200}}} = 2.970$$

Since Z = 2.970 > 1.6449, we reject H0.

Conclusion: We can conclude that the new letter is
more effective.
Eg 14: Chicken Delight claims that 90% of its orders are delivered
within ten minutes of the time the order is placed. A
sample of 100 orders revealed that 82 were delivered
within the promised time. At the 0.10 significance level,
can we conclude that less than 90% of the orders are
delivered within ten minutes?

Soln: $p$ = 0.90, n = 100, $\hat{p} = \frac{82}{100} = 0.82$

H0 : p ≥ 0.90
H1 : p < 0.90

Significance level: $\alpha$ = 0.10
Critical value: $-z_{0.10} = -1.2816$
Decision rule: Reject H0 if Z < -1.2816.
Otherwise, do not reject H0.
Test statistic:

$$Z = \frac{\hat{p}-p}{\sqrt{\frac{p(1-p)}{n}}} = \frac{0.82-0.90}{\sqrt{\frac{0.90(1-0.90)}{100}}} = -2.667$$

Since Z = -2.667 < -1.2816, we reject H0.

Conclusion: We can conclude that less than 90% of
the orders are delivered within ten
minutes.
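The proportion tests in Eg 11 to Eg 14 can be checked in the same way. The sketch below uses a hypothetical function `z_test_proportion`. Note that the standard error is computed from the hypothesised proportion p, as in the test statistic above, and not from the sample proportion.

```python
from math import sqrt
from statistics import NormalDist


def z_test_proportion(p_hat, p_0, n, alpha, tail):
    """Large-sample z test for a population proportion.

    tail is "two", "left" or "right" (labels chosen for this sketch).
    Returns the test statistic z and whether H0 is rejected.
    """
    # The standard error uses the hypothesised proportion p_0.
    z = (p_hat - p_0) / sqrt(p_0 * (1 - p_0) / n)
    std_normal = NormalDist()
    if tail == "two":
        reject_h0 = abs(z) > std_normal.inv_cdf(1 - alpha / 2)
    elif tail == "left":
        reject_h0 = z < std_normal.inv_cdf(alpha)
    elif tail == "right":
        reject_h0 = z > std_normal.inv_cdf(1 - alpha)
    else:
        raise ValueError("tail must be 'two', 'left' or 'right'")
    return z, reject_h0


z11, reject11 = z_test_proportion(0.64, 0.60, 1200, 0.01, "two")        # Eg 11
z12, reject12 = z_test_proportion(129 / 150, 0.90, 150, 0.025, "left")  # Eg 12
z13, reject13 = z_test_proportion(45 / 200, 0.15, 200, 0.05, "right")   # Eg 13
z14, reject14 = z_test_proportion(82 / 100, 0.90, 100, 0.10, "left")    # Eg 14

for z, reject in ((z11, reject11), (z12, reject12), (z13, reject13), (z14, reject14)):
    print(f"{z:.3f}", reject)
```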
Chi-Square Test
➢ ‘Chi’ is the Greek letter $\chi$, pronounced ‘kye’.
➢ The chi-square distribution is a continuous distribution
and it has a positive integer parameter v, which
determines its shape.
➢ As its name implies, $\chi^2$ cannot take a negative value.
➢ The parameter v is known as the degrees of freedom (df)
of the distribution and we refer to a ‘chi-square
distribution with v degrees of freedom’. For simplicity,
we write this as $\chi^2_v$.
➢ There are many $\chi^2$ distributions, one for each number of
degrees of freedom. As the degrees of freedom become fewer,
the distribution becomes more positively skewed.
Conversely, as the number of degrees of freedom is
increased, the distribution becomes approximately
normal.

➢ The $\chi^2$ statistic plays an important role in many business
problems dealing with count data, where information is
obtained by counting rather than by measuring.
Eg 15: a) In market research, we count the number of
people who prefer a particular brand of detergent
powder.
b) In quality control, we count the number of
defectives produced by a machine during a certain
period.

➢ There are many situations of this type where
measurements are made by counting the numbers or
frequency in each category.
➢ The $\chi^2$ test compares such observed frequencies of
occurrence against the expected ones.
➢ The $\chi^2$ test is used broadly for:
✓ Test of goodness-of-fit
• For one-way classification or for one variable only
• Test whether a given set of data actually follows
an assumed distribution or not
✓ Test of independence
• For more than one row or column in the form of a
contingency table concerning several attributes
• Test for dependence between two variables
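The earlier remark that the chi-square distribution is strongly skewed for few degrees of freedom and approaches a normal shape as v grows can be made concrete with its standard moments. These are standard results that the notes do not state: a chi-square distribution with v degrees of freedom has mean v, variance 2v and skewness $\sqrt{8/v}$. The short sketch below, with a hypothetical helper name, tabulates them.

```python
from math import sqrt


def chi_square_summary(v):
    """Mean, variance and skewness of a chi-square distribution with v df."""
    return v, 2 * v, sqrt(8 / v)


for v in (1, 2, 5, 10, 30, 100):
    mean, variance, skewness = chi_square_summary(v)
    # The skewness shrinks towards 0 (the value for a normal curve) as v grows.
    print(f"v = {v:3d}  mean = {mean:3d}  variance = {variance:3d}  skewness = {skewness:.3f}")
```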
Contingency Table Analysis
- The chi-square test can be used with more than one variable
and more than one characteristic.
- Often data are collected on several variables at a time.
For example, a questionnaire will usually contain more
than one question.
- Another important application of the $\chi^2$ distribution is in
testing for the independence of two variables on the basis
of sample data.
- If the distribution of one variable differs across the
categories of the other, the variables are said to be
associated, whereas if there are no such differences the
variables are said to be independent.
Contingency Table
- A table that gives the frequencies for two or more
variables simultaneously.
✓ It is used to determine whether two characteristics of a
given population are independent (no relationship or no
association), as stated in the null hypothesis.

1) The degrees of freedom are

$$v = (r-1)(c-1)$$

where r = number of rows
c = number of columns

2) The expected frequency of each cell is

$$E = \frac{\text{row total} \times \text{column total}}{\text{sample size}}$$
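The two formulas above can be sketched in Python as follows. The function names and the 2 x 2 table margins are hypothetical and were invented only to show the arithmetic.

```python
def degrees_of_freedom(rows, cols):
    """v = (r - 1)(c - 1) for a contingency table with r rows and c columns."""
    return (rows - 1) * (cols - 1)


def expected_frequencies(row_totals, col_totals):
    """E = (row total x column total) / sample size, for every cell."""
    n = sum(row_totals)
    if n != sum(col_totals):
        raise ValueError("row and column totals must add up to the same sample size")
    return [[r * c / n for c in col_totals] for r in row_totals]


# Hypothetical 2 x 2 table margins, invented for illustration only.
row_totals = [40, 60]
col_totals = [30, 70]

expected = expected_frequencies(row_totals, col_totals)
print(expected)                   # [[12.0, 28.0], [18.0, 42.0]]
print(degrees_of_freedom(2, 2))   # 1
```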
Steps in chi-square test
Step 1:
H0:
H1:
Step 2: Critical value:
Critical region:
Step 3:
Test statistic
$$\chi^2 = \sum \frac{(O-E)^2}{E}$$

Eg 16: An accident inspector makes spot checks on working
practices during visits to industrial sites chosen at
random. At one large construction site, the numbers of
accidents occurring per week were counted for a period
of three years, and each week was also classified as to
whether or not the inspector had visited the site during
the previous week. The results are shown as follows.

Rows: whether the inspector visited during the previous week
Columns: number of accidents per week

           Column 1   Column 2   Column 3   Column 4   Total
Row 1         33          8          5          4        50
Row 2         67         42         15          6       130
Total        100         50         20         10       180

Does the number of accidents depend on the visits by
the inspector? Use $\alpha$ = 0.05.
Solution:
H0: The number of accidents is independent of the visits by the inspector
H1: The number of accidents is dependent on the visits by the inspector
At $\alpha$ = 0.05,
Critical value: $\chi^2_{0.05;(2-1)(4-1)} = \chi^2_{0.05;3} = 7.815$ (from table)

Rejection region: Reject H0 if $\chi^2$ > 7.815.

O     E                         O - E    (O - E)²    (O - E)²/E
33    (50 × 100)/180 ≈ 28         5        25         0.8929
67    (130 × 100)/180 ≈ 72       -5        25         0.3472
8     (50 × 50)/180 ≈ 14         -6        36         2.5714
42    (130 × 50)/180 ≈ 36         6        36         1
5     (50 × 20)/180 ≈ 6          -1         1         0.1667
15    (130 × 20)/180 ≈ 14         1         1         0.0714
4     (50 × 10)/180 ≈ 3           1         1         0.3333
6     (130 × 10)/180 ≈ 7         -1         1         0.1429
n = 180                                          χ² = 5.5258

Since $\chi^2$ = 5.5258 < 7.815, we do not reject H0.
There is not enough evidence to conclude that the number of
accidents depends on the visits by the inspector.
Eg 17: A sample of hotels in a particular country was selected.
The following table shows the number of hotels in each
region of the country and in each of four grades.

Rows: grade of hotel
Columns: region

           Column 1   Column 2   Column 3   Total
Row 1         29         22         29        80
Row 2         67         38         55       160
Row 3         53         32         35       120
Row 4         11          8         21        40
Total        160        100        140       400

Show whether there is any evidence of a significant
association between region and grade of hotel in this
country. Use $\alpha$ = 0.05.
Solution:
H0: There is no association between region and grade of hotel
H1: There is an association between region and grade of hotel
At $\alpha$ = 0.05,
Critical value: $\chi^2_{0.05;(4-1)(3-1)} = \chi^2_{0.05;6} = 12.592$

Rejection region: Reject H0 if $\chi^2$ > 12.592.

O     E                         O - E    (O - E)²    (O - E)²/E
29    (80 × 160)/400 = 32        -3         9         0.2813
67    (160 × 160)/400 = 64        3         9         0.1406
53    (120 × 160)/400 = 48        5        25         0.5208
11    (40 × 160)/400 = 16        -5        25         1.5625
22    (80 × 100)/400 = 20         2         4         0.2
38    (160 × 100)/400 = 40       -2         4         0.1
32    (120 × 100)/400 = 30        2         4         0.1333
8     (40 × 100)/400 = 10        -2         4         0.4
29    (80 × 140)/400 = 28         1         1         0.0357
55    (160 × 140)/400 = 56       -1         1         0.0179
35    (120 × 140)/400 = 42       -7        49         1.167
21    (40 × 140)/400 = 14         7        49         3.5
n = 400                                          χ² = 8.0591

Since $\chi^2$ = 8.0591 < 12.592, we do not reject H0.
There is not enough evidence of an association between region
and grade of hotel.
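Both chi-square examples above (Eg 16 and Eg 17) can be recomputed from the observed counts in the O column. The sketch below uses a hypothetical function `chi_square_statistic` and takes the critical values 7.815 and 12.592 from the solutions rather than computing them. It keeps the expected frequencies unrounded, so for Eg 16 it gives about 5.64 instead of the 5.5258 obtained with E rounded to whole numbers. The decision is the same.

```python
def chi_square_statistic(observed):
    """Chi-square statistic and degrees of freedom for a contingency table."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    n = sum(row_totals)
    statistic = 0.0
    for i, row in enumerate(observed):
        for j, o in enumerate(row):
            e = row_totals[i] * col_totals[j] / n  # expected frequency
            statistic += (o - e) ** 2 / e
    df = (len(observed) - 1) * (len(observed[0]) - 1)
    return statistic, df


# Observed counts read from the O column of each solution table.
eg16 = [[33, 8, 5, 4], [67, 42, 15, 6]]
eg17 = [[29, 22, 29], [67, 38, 55], [53, 32, 35], [11, 8, 21]]

# Critical values copied from the solutions (chi-square table, alpha = 0.05).
for name, table, critical in (("Eg 16", eg16, 7.815), ("Eg 17", eg17, 12.592)):
    stat, df = chi_square_statistic(table)
    decision = "reject H0" if stat > critical else "do not reject H0"
    print(name, round(stat, 3), df, decision)
```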
