
Sample Testing

Sample testing details the process of conducting a statistical test on a representative sample of a population.

Lesson 10 Homework

Submit the following homework problems to dropbox for HW 10 by April 7.

Remember to check the assumptions before you decide which procedure to use. If the complete data are not available, only summary statistics, just mention what should be checked.

You may use Minitab whenever appropriate.

The answers to the problems will be posted on the Sunday after the due
date. To view the answers, click on the Homework Solutions link.

In this homework, one important issue is to decide whether to use the two-sample t-procedure or the paired t-procedure; see the sample problems for how to make this distinction. In addition, if the two-sample t is appropriate, you should check whether to use the pooled t or the separate-variance t.
1. Problem 6.13 in 6th edition (which is problem 6.16 in 5th edition). [Note that the output is given in the problem, so you do not need to use Minitab to find the output again.]
a) For the pooled-variance t statistic, the degrees of freedom are
$n_1 + n_2 - 2 = 24 + 36 - 2 = 58$. The corresponding t-value in the table is -4.04.
b) For the separate-variance t statistic, the degrees of freedom are

$df = \dfrac{(n_1 - 1)(n_2 - 1)}{(1 - c)^2 (n_1 - 1) + c^2 (n_2 - 1)}$, where $c = \dfrac{s_1^2/n_1}{s_1^2/n_1 + s_2^2/n_2} = 0.6799$,

giving df = 43.43, which is rounded to 43.

Thus, t = -3.90.
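The c-form of the degrees-of-freedom formula in part (b) is an algebraic rearrangement of the Welch-Satterthwaite approximation. The sketch below is a minimal illustration using only the Python standard library; the helper names are hypothetical. It evaluates the c-form for the values in part (b) and, as a sanity check, compares it with the usual Satterthwaite form on another set of summary statistics.

```python
# Hypothetical helpers illustrating the separate-variance df formula.

def welch_c(s1: float, n1: int, s2: float, n2: int) -> float:
    """c = (s1^2/n1) / (s1^2/n1 + s2^2/n2)."""
    v1, v2 = s1 ** 2 / n1, s2 ** 2 / n2
    return v1 / (v1 + v2)


def welch_df_from_c(c: float, n1: int, n2: int) -> float:
    """c-form: df = (n1-1)(n2-1) / [(1-c)^2 (n1-1) + c^2 (n2-1)]."""
    return (n1 - 1) * (n2 - 1) / ((1 - c) ** 2 * (n1 - 1) + c ** 2 * (n2 - 1))


def satterthwaite_df(s1: float, n1: int, s2: float, n2: int) -> float:
    """Usual form: (v1 + v2)^2 / [v1^2/(n1-1) + v2^2/(n2-1)]."""
    v1, v2 = s1 ** 2 / n1, s2 ** 2 / n2
    return (v1 + v2) ** 2 / (v1 ** 2 / (n1 - 1) + v2 ** 2 / (n2 - 1))


# Problem 1(b): c = 0.6799, n1 = 24, n2 = 36
df = welch_df_from_c(0.6799, 24, 36)
print(f"df = {df:.2f}")  # prints df = 43.43; round down to 43 for the t table
```

Both forms give the same number for any inputs; the c-form is convenient when c has already been computed, as in the text.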
c) $H_0: \mu_F = \mu_M$ vs. $H_a: \mu_F \neq \mu_M$

i) For the pooled-variance statistic, p-value = 2 × 0.0001 = 0.0002 < α = 0.05, so we reject the null hypothesis. The same conclusion is obtained for α = 0.01.
ii) For the separate-variance statistic, p-value = 2 × 0.0002 = 0.0004 < α = 0.05, so we reject the null hypothesis. The same conclusion is obtained for α = 0.01.

We obtain the same conclusion with either statistic; therefore the conclusion does not depend on which statistic is used.
2. Problem 6.29 in 6th edition (which is problem 6.32 in 5th edition). [Again, for this problem, the Minitab output is given in the book.]
a) $H_0: \mu_d = 0$ vs. $H_a: \mu_d \neq 0$

We will use the default significance level α = 0.05 (confidence coefficient 0.95).


For the paired t-test, p-value = 2 × 0.000 ≈ 0 < α = 0.05, so we reject the null hypothesis and conclude that there is a difference in the mean final grade between students in academically and non-academically oriented home environments.
Using the separate-variance statistic would give a different conclusion from the paired t-test. However, the data appear to be paired, so the paired t-test conclusion is more appropriate than the one based on the separate-variance statistic.
b) From the first table, the 95% confidence interval for the difference in the mean final grades of students in academic and non-academic home environments is (2.23, 5.37).
c) The conditions for using the paired t-test are justified for the following reasons:
i) The paired differences appear to be normally distributed. This is supported by the fact that the mean is close to the median (from the box plot). Also, for a normally distributed population, the IQR is about 1.35σ. The box plot shows an IQR ≈ 7 - 1 = 6, and from the first table the standard deviation is 4.205; multiplying this by 1.35 gives 5.68 ≈ 6. The normal probability plot is also close to a straight line. From all these observations, we can conclude that the differences are approximately normally distributed.
ii) The normal probability plot indicates that the differences are a random sample from a normal distribution.
Therefore we can use the paired t-test.
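The 1.35 factor used in (c)(i) comes from the quartiles of the normal distribution. A minimal sketch with Python's `statistics.NormalDist` (standard library, Python 3.8+) reproduces the constant and the comparison made above; the observed IQR of 7 - 1 = 6 is the value read off the box plot in the text.

```python
from statistics import NormalDist

# For a normal distribution, IQR = Q3 - Q1; dividing by sigma gives a constant.
std_normal = NormalDist(mu=0.0, sigma=1.0)
iqr_over_sigma = std_normal.inv_cdf(0.75) - std_normal.inv_cdf(0.25)
print(f"IQR / sigma = {iqr_over_sigma:.3f}")  # prints IQR / sigma = 1.349

# Problem 2(c): sample standard deviation of the paired differences
s_d = 4.205
expected_iqr = iqr_over_sigma * s_d  # IQR we would expect if the differences were normal
observed_iqr = 7 - 1                 # read off the box plot
print(f"expected IQR = {expected_iqr:.2f}, observed IQR = {observed_iqr}")
```

With the unrounded constant the expected IQR is about 5.67 rather than 5.68, which does not change the conclusion that it is close to 6.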
d) The subject-to-subject differences in these data appear to be large. For example, twin set 13 has grades of 49 and 48, while set 9 has grades of 98 and 92. This creates a large disparity between subjects, so the data must be paired in order to see the difference between the two members of each pair.

3. Problem 6.43 in 6th edition (which is problem 6.39 in 5th edition) only work on a, b
(skip c)
a) $H_0: \mu_{NB} = \mu_{WB}$ vs. $H_a: \mu_{NB} \neq \mu_{WB}$

The standard deviations of the two samples (7.87 and 4.71) suggest that we should use the separate-variance t-test.
Two-Sample T-Test and CI: N-B Jet, W-B Jet

Two-sample T for N-B Jet vs W-B Jet

          N    Mean  StDev  SE Mean
N-B Jet  12  118.37   7.87      2.3
W-B Jet  15  110.20   4.71      1.2

Difference = mu (N-B Jet) - mu (W-B Jet)
Estimate for difference: 8.17
95% CI for difference: (2.73, 13.60)
T-Test of difference = 0 (vs not =): T-Value = 3.17  P-Value = 0.006  DF = 17

At the 5% significance level (α = 0.05), p-value = 0.006 < α = 0.05. Therefore we reject the null hypothesis and conclude that the data provide enough evidence of a difference in the average noise level of the two jets.
b) From the Minitab output, the 95% confidence interval for the difference in the mean noise level between the two types of jets is (2.73, 13.60).
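The Minitab result in Problem 3 can be checked by hand from the summary statistics alone. The sketch below is a reconstruction under stated assumptions, not Minitab's own code: `welch_from_summary` is a hypothetical helper, and the critical value t0.025 = 2.110 for df = 17 is taken from a standard t table.

```python
import math


def welch_from_summary(mean1, s1, n1, mean2, s2, n2):
    """Separate-variance t statistic, Satterthwaite df, and standard error (hypothetical helper)."""
    v1, v2 = s1 ** 2 / n1, s2 ** 2 / n2
    se = math.sqrt(v1 + v2)
    t = (mean1 - mean2) / se
    df = (v1 + v2) ** 2 / (v1 ** 2 / (n1 - 1) + v2 ** 2 / (n2 - 1))
    return t, df, se


# Problem 3: N-B Jet (sample 1) vs W-B Jet (sample 2)
t, df, se = welch_from_summary(118.37, 7.87, 12, 110.20, 4.71, 15)
print(f"t = {t:.2f}, df = {df:.2f}")  # Minitab's output shows DF = 17

# 95% CI using t_{0.025} for df = 17 (assumed table value)
t_crit = 2.110
diff = 118.37 - 110.20
ci_low, ci_high = diff - t_crit * se, diff + t_crit * se
print(f"95% CI: ({ci_low:.2f}, {ci_high:.2f})")
```

Because the inputs are the rounded means and standard deviations, the last digit of the interval can differ slightly from Minitab's (2.73, 13.60).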
4. Problem 6.55 in 6th edition (which is problem 6.63 in 5th edition) [hand compute, do not use Minitab]
a)
i) H0: Female candidates' mean expenditure in campaigns for public office is at least equal to male candidates' mean expenditure.
Ha: Female candidates' mean expenditure in campaigns for public office is less than male candidates' mean expenditure.
ii) $H_0: \mu_F \geq \mu_M$ vs. $H_a: \mu_F < \mu_M$

b)
[Figure: Probability Plot of Female and Probability Plot of Male (Normal - 95% CI). Female: Mean = 245.3, StDev = 51.95, N = 20, AD = 0.383, P-Value = 0.364. Male: Mean = 351, StDev = 61.92, N = 20, AD = 0.187, P-Value = 0.892.]

Both sets of data fall within the 95% bands of the normal probability plots; therefore normality of each set of data can be assumed. The standard deviation of the female sample, s1 = 51.95, is close to that of the male sample, s2 = 61.92. Therefore the conditions of normality, equal variance, and independent random samples are assumed, and we can proceed to estimate the confidence interval for μF - μM assuming independent samples and equal variances.
$s_p = \sqrt{\dfrac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2}{n_1 + n_2 - 2}} = \sqrt{\dfrac{19(51.95)^2 + 19(61.92)^2}{20 + 20 - 2}} = 57.15$

At a 95% confidence coefficient, α/2 = 0.025 and df = 20 + 20 - 2 = 38.
The table value t0.025 lies between 2.030 (df = 35) and 2.021 (df = 40); interpolating at df = 38 gives
t0.025 ≈ 2.024.
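The interpolation step above is ordinary linear interpolation between two rows of a t table. A minimal sketch, where `interpolate` is a hypothetical helper and the two table values are the ones quoted in the text:

```python
def interpolate(x, x_lo, y_lo, x_hi, y_hi):
    """Linear interpolation of y at x between (x_lo, y_lo) and (x_hi, y_hi)."""
    frac = (x - x_lo) / (x_hi - x_lo)
    return y_lo + frac * (y_hi - y_lo)


# t_{0.025} is 2.030 at df = 35 and 2.021 at df = 40 (table values from the text)
t_38 = interpolate(38, 35, 2.030, 40, 2.021)
print(f"t_0.025 at df = 38 is about {t_38:.4f}")  # prints 2.0246
```

Linear interpolation gives about 2.0246; the text uses 2.024, which agrees with the tabulated df = 38 value to three decimals.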
$(\bar{y}_F - \bar{y}_M) \pm t_{0.025}\, s_p \sqrt{1/n_1 + 1/n_2} = (245.3 - 351) \pm 2.024(57.15)\sqrt{1/20 + 1/20} = -105.7 \pm 36.57$
The 95% confidence interval for μF - μM is (-142.27, -69.13).
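The pooled standard deviation and interval in part (b) can be reproduced with a short sketch. It assumes equal variances, as the text does; the helper names are hypothetical, and the critical value 2.024 is the interpolated table value used in the text.

```python
import math


def pooled_sd(s1, n1, s2, n2):
    """Pooled standard deviation s_p (hypothetical helper)."""
    return math.sqrt(((n1 - 1) * s1 ** 2 + (n2 - 1) * s2 ** 2) / (n1 + n2 - 2))


def pooled_ci(mean1, s1, n1, mean2, s2, n2, t_crit):
    """CI for mu1 - mu2 assuming equal variances; t_crit is supplied by the caller."""
    sp = pooled_sd(s1, n1, s2, n2)
    diff = mean1 - mean2
    margin = t_crit * sp * math.sqrt(1 / n1 + 1 / n2)
    return diff - margin, diff + margin


# Problem 4(b): female (sample 1) vs male (sample 2)
sp = pooled_sd(51.95, 20, 61.92, 20)
lo, hi = pooled_ci(245.3, 51.95, 20, 351.0, 61.92, 20, t_crit=2.024)
print(f"s_p = {sp:.2f}, 95% CI = ({lo:.2f}, {hi:.2f})")
```

Working with the unrounded s_p shifts the interval endpoints by about 0.01 relative to the hand calculation.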
c)
$t = \dfrac{\bar{y}_F - \bar{y}_M}{s_p\sqrt{1/n_1 + 1/n_2}} = \dfrac{245.3 - 351}{57.15\sqrt{1/10}} = -5.85$
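t = -5.85 < -t0.025 = -2.024 (and therefore also below the one-sided critical value -t0.05 ≈ -1.686 for the alternative in part (a)), so we reject the null hypothesis and conclude that the difference is statistically significant at the 0.05 level.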
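The test statistic in part (c) can be checked the same way. In this sketch `pooled_t` is a hypothetical helper, and the critical values for df = 38 (one-sided t0.05 ≈ 1.686, two-sided t0.025 = 2.024) are assumed table lookups; the decision is the same with either.

```python
import math


def pooled_t(mean1, s1, n1, mean2, s2, n2):
    """Pooled two-sample t statistic and its degrees of freedom (hypothetical helper)."""
    sp = math.sqrt(((n1 - 1) * s1 ** 2 + (n2 - 1) * s2 ** 2) / (n1 + n2 - 2))
    t = (mean1 - mean2) / (sp * math.sqrt(1 / n1 + 1 / n2))
    return t, n1 + n2 - 2


# Problem 4(c): female (sample 1) vs male (sample 2)
t, df = pooled_t(245.3, 51.95, 20, 351.0, 61.92, 20)

# Table values for df = 38 (assumed lookups)
t_05, t_025 = 1.686, 2.024
reject_one_sided = t < -t_05      # Ha: mu_F < mu_M
reject_two_sided = abs(t) > t_025
print(f"t = {t:.2f}, df = {df}, one-sided: {reject_one_sided}, two-sided: {reject_two_sided}")
```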
d) Yes, the difference is of practical significance. The monetary value of this difference ranges from $69,130 to $142,270, which is a substantial amount of money.
5. Problem 6.56 in 6th edition (which is problem 6.64 in 5th edition)
The conditions to be satisfied before using the t procedure to analyze the data are:
- Normal distribution of the samples
- Independent random samples
- Equal variances if the pooled t-test is adopted (as in Problem 4)

The given boxplots show that the two data sets are only slightly skewed to the left and that the means are close to the medians, so we can assume that the data are approximately normally distributed. The standard deviation of the female sample, s1 = 51.95, is close to that of the male sample, s2 = 61.92, so we can assume equal variances. Finally, the problem states that the groups of males and females were randomly selected, so we can treat the two groups as independent.

Therefore all conditions for the t-test procedure used in the previous problem have been met.
6. Current Population Reports presents data on the ages of married people. Ten married
couples are randomly selected and have the ages shown here:
Husband  54  21  32  78  70  33  68  35  54  52
Wife     53  22  33  74  64  35  67  30  45  48
Do the data suggest that the mean age of married men is greater than the mean age of married women? Determine whether you will use a two-sample t-test or a paired t-test. (Hint: are the data paired?) Check conditions and use Minitab to perform the test. Test at the 3% level of significance.
Since the random selection is made on couples, the two samples (husbands and wives) are not independent, so a paired t-test is appropriate. Moreover, there is wide variability among the ten couples (78 and 74 for the oldest couple versus 21 and 22 for the youngest), so pairing will remove the couple-to-couple variability.
[Figure: Probability Plot of Difference (Normal - 97% CI). Mean = 2.6, StDev = 3.565, N = 10, AD = 0.288, P-Value = 0.542.]

The normal probability plot of the differences in age between husbands and wives is close to a straight line, so we can assume that the differences are normally distributed. Also, since the couples were randomly selected, we can assume that the pairs are independent.

$H_0: \mu_M - \mu_F \leq 0$ vs. $H_a: \mu_M - \mu_F > 0$

Paired T-Test and CI: Husband, Wife

Paired T for Husband - Wife

             N   Mean  StDev  SE Mean
Husband     10  49.70  18.92     5.98
Wife        10  47.10  17.36     5.49
Difference  10   2.60   3.57     1.13

97% lower bound for mean difference: 0.18
T-Test of mean difference = 0 (vs > 0): T-Value = 2.31  P-Value = 0.023

At df = 9, t0.03 = 2.15.
t = 2.31 > t0.03 = 2.15 (equivalently, p-value = 0.023 < α = 0.03), so we reject H0 and conclude that the data provide enough evidence that the mean age of married men is greater than the mean age of married women.
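The paired analysis reduces to a one-sample t computation on the within-couple differences. A minimal sketch using only the standard library reproduces the Minitab figures from the raw ages; t0.03 = 2.15 for df = 9 is the table value quoted in the text.

```python
import math
from statistics import mean, stdev

husband = [54, 21, 32, 78, 70, 33, 68, 35, 54, 52]
wife = [53, 22, 33, 74, 64, 35, 67, 30, 45, 48]

# The paired test works on the within-couple differences.
d = [h - w for h, w in zip(husband, wife)]
n = len(d)
d_bar = mean(d)
s_d = stdev(d)             # sample SD with n - 1 in the denominator
se = s_d / math.sqrt(n)
t = d_bar / se             # tests H0: mu_d <= 0 vs Ha: mu_d > 0

t_crit = 2.15              # t_{0.03} with df = 9, as quoted in the text
lower_bound = d_bar - t_crit * se  # one-sided 97% lower bound for mu_d
print(f"d_bar = {d_bar}, s_d = {s_d:.3f}, t = {t:.2f}, lower bound = {lower_bound:.2f}")
```

Running a two-sample test on the same columns would ignore the pairing and fold the large couple-to-couple spread into the standard error.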
7. The costs of major surgery vary substantially from one state to another due to differences in hospital fees, malpractice insurance costs, doctors' fees, and rent. A study of hysterectomy costs was done in California and Montana. Based on a random sample of 200 patient records from each state, the sample statistics shown here were obtained.
State        Sample Size   Sample Mean   Sample Standard Deviation
Montana      200           $6,458        $250
California   200           $12,690       $890

a) Is there significant evidence that California has a higher mean hysterectomy cost
than Montana?
The sample standard deviation for California is more than 3 times that for Montana, suggesting that the equal-variance assumption is not appropriate. The problem states that the 200 patient records in each state were randomly selected, so the samples are independent. Because the samples are large (200 each), the central limit theorem makes the t procedure reliable even if the costs are not exactly normal. We will use the separate-variance t-test to analyze the data.
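$H_0: \mu_C - \mu_M \leq 0$ vs. $H_a: \mu_C - \mu_M > 0$

$df = \dfrac{(n_1 - 1)(n_2 - 1)}{(1 - c)^2 (n_1 - 1) + c^2 (n_2 - 1)}$, where $c = \dfrac{s_1^2/n_1}{s_1^2/n_1 + s_2^2/n_2} = 0.073$ (taking Montana as sample 1 and California as sample 2),

giving df = 230.15, which is rounded to 230.

$t' = \dfrac{\bar{y}_C - \bar{y}_M}{\sqrt{s_C^2/n_C + s_M^2/n_M}} = \dfrac{12690 - 6458}{\sqrt{890^2/200 + 250^2/200}} = 95.34$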

With df = 230, t0.05 = 1.65.
t' = 95.34 ≫ t0.05 = 1.65, so we reject the null hypothesis and conclude that California has a higher mean hysterectomy cost than Montana.
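The separate-variance statistic and its degrees of freedom for part (a) follow directly from the summary table. A minimal standard-library sketch, using the c-form of the df formula with Montana as sample 1, as in the text:

```python
import math

# Summary statistics from the table (Montana = m, California = c)
n_m, mean_m, s_m = 200, 6458.0, 250.0
n_c, mean_c, s_c = 200, 12690.0, 890.0

v_m, v_c = s_m ** 2 / n_m, s_c ** 2 / n_c
se = math.sqrt(v_c + v_m)
t_prime = (mean_c - mean_m) / se   # separate-variance statistic

# c-form of the Satterthwaite df, with Montana as sample 1
c = v_m / (v_m + v_c)
df = (n_m - 1) * (n_c - 1) / ((1 - c) ** 2 * (n_m - 1) + c ** 2 * (n_c - 1))

print(f"se = {se:.2f}, t' = {t_prime:.2f}, c = {c:.3f}, df = {df:.2f}")
```

With the unrounded c the df comes out near 230.2 rather than 230.15 (which uses c rounded to 0.073); both round to 230.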
b) Estimate the difference in the mean costs of the two states using a 95% confidence interval.
With df = 230, t0.025 = 1.97.
$(\bar{y}_C - \bar{y}_M) \pm t_{0.025}\sqrt{s_C^2/n_C + s_M^2/n_M} = (12690 - 6458) \pm 1.97(65.37)$
The 95% confidence interval is (6103.22, 6360.78).
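The interval in part (b) is the point estimate plus or minus the critical value times the separate-variance standard error. A minimal sketch; t0.025 = 1.97 for df = 230 is the table value quoted in the text.

```python
import math

diff = 12690.0 - 6458.0                              # California minus Montana
se = math.sqrt(890.0 ** 2 / 200 + 250.0 ** 2 / 200)  # separate-variance standard error
t_crit = 1.97                                        # t_{0.025}, df = 230, from the text
ci_low, ci_high = diff - t_crit * se, diff + t_crit * se
print(f"95% CI: ({ci_low:.2f}, {ci_high:.2f})")
```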


c) Justify your choice between using the pooled t-test and the separate-variance t-test in part (a).
The sample standard deviation for California ($890) is more than 3 times that for Montana ($250), so the equal-variance assumption is not appropriate and the pooled t-test should not be used. The samples are independent because the 200 patient records in each state were randomly selected, and the large sample sizes make the t procedure reliable even if costs are not exactly normal. Therefore the separate-variance t-test is the appropriate choice.
