Fall 19 Solutions

This document provides solutions to exam problems related to statistics. It covers topics like confidence intervals, hypothesis testing, ANOVA, multiple comparisons, and goodness-of-fit tests. Assumptions for procedures are discussed. Overall, it analyzes statistical concepts through example exam questions and solutions.

Fall 2019 Applied Statistics Comprehensive Examination Solutions

1. (23 points) A recent study in Yellowstone National Park aimed to compare the
proportion of black and brown bear cubs surviving to at least one year in age.
Of the 181 black bear cubs studied, 150 survived to a year, while out of the 42
brown bear cubs studied, 32 survived.

(a) (10 points) Create a 95% confidence interval for the difference in the pro-
portion of each species of bear cub surviving to one year.

Solution:
95% CI: (150/181 − 32/42) ± 1.96·√[(150/181)(1 − 150/181)/181 + (32/42)(1 − 32/42)/42] = (−0.073, 0.207)

We are 95% confident that the difference between the proportions of black
bear cubs and brown bear cubs surviving to at least a year (black−brown)
is between -0.073 and 0.207.
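As a quick numerical check (not part of the exam answer), the interval can be reproduced with Python's standard library using the counts from the problem:

```python
import math

# Counts from the study
n1, x1 = 181, 150   # black bear cubs studied / survived
n2, x2 = 42, 32     # brown bear cubs studied / survived
p1, p2 = x1 / n1, x2 / n2

# Wald (unpooled) interval for the difference in proportions, z = 1.96 for 95%
se = math.sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)
lo = (p1 - p2) - 1.96 * se
hi = (p1 - p2) + 1.96 * se
print(round(lo, 3), round(hi, 3))  # -0.073 0.207
```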

(b) (3 points) For inference for two proportions, we must assume that the
sample sizes are “large enough.” Briefly explain the rationale behind one
common check of this assumption: n1p̂1 ≥ 5, n1(1 − p̂1) ≥ 5, n2p̂2 ≥ 5, and
n2(1 − p̂2) ≥ 5.

Solution:
We need large enough samples so that the sample proportions are ap-
proximately normally distributed (implying that the difference in sample
proportions is also approximately normal) so that the z-stat will be ap-
proximately N(0,1) under H0 . The sample size checks for this assumption
of “large enough samples” depend not only on the number of observations
but also on the estimated proportions. This is due to the fact that when
the true proportions for each population are close to 0 or 1, larger sample
sizes n1 and n2 are needed so that the distributions of the sample pro-
portions are not skewed right or left. Simulations have established that
requiring n1p̂1 ≥ 5, n1(1 − p̂1) ≥ 5, n2p̂2 ≥ 5, and n2(1 − p̂2) ≥ 5 will
tend to lead to correct coverage probabilities and Type I error rates for
the confidence interval and hypothesis test procedures, respectively.

(c) (10 points) Suppose that the main goal of the study was actually to test
at the 0.05 level whether the one-year survival rate for black bear cubs
is different than the one-year survival rate for brown bear cubs. Use the
confidence interval to draw a conclusion for the test of interest. Make sure
to do each of the following:
i. write the hypotheses of interest symbolically, defining symbols as is
necessary.

ii. briefly explain what the conclusion is and how the confidence interval
helped you reach that conclusion.
iii. interpret the conclusion in terms of the problem.

Solution:
We want to test H0: πblack = πbrown vs. Ha: πblack ≠ πbrown, where πblack is
the true proportion of black bear cubs surviving to a year and πbrown is the
true proportion of brown bear cubs surviving to a year (answer to part i).

Since 0 (the hypothesized difference in proportions under H0 ) is included


in the confidence interval, we would fail to reject H0 (answer to part ii).

That is, we do not have enough evidence at the 0.05 level to conclude
that there is a difference in the proportion of black and brown bear cubs
surviving to a year (answer to part iii).

2. (30 points) Four local amateur golfers participated in a long drive contest where
each golfer hit four drives. The drive lengths (in yards) and the sample means
and sample standard deviations are given in the table below. You are interested
in comparing the mean drive lengths for the four golfers.

Golfer   Drive Lengths (Yards)   x̄     s
A        248 125 193 229         199   54
B        271 251 247 225         249   19
C        163 149 177 189         170   17
D        253 252 284 301         273   24

(a) (10 points) Using level 0.05, test for evidence that the mean drive length
differs from one golfer to another.

Solution:
Let µA , µB , µC , and µD be the mean drive lengths of golfers A, B, C, and
D, respectively. Then, we want to test

H0 : µA = µB = µC = µD vs.
Ha : µA , µB , µC , and µD are not all the same.

SSE = (4 − 1)(sA² + sB² + sC² + sD²)
    = 3(54² + 19² + 17² + 24²)
    = 3(4142) = 12,426

SST = 4 Σi (x̄i − x̄··)²
    = 4[(199 − 222.8)² + (249 − 222.8)² + (170 − 222.8)² + (273 − 222.8)²]
    ≈ 26,243

Note: Using the raw data, you should get SST ≈ 26,196 and SSE ≈ 12,524.
These numbers are slightly different than the SST and SSE based on the
mean and standard deviation summary statistics due to rounding (in the
drive lengths and/or the summary statistics).

F-stat = [26,243/(4 − 1)] / [12,426/(16 − 4)] ≈ 8.4

Then, since F(3, 12, 0.05) = 3.49 < 8.4, we have enough evidence at the 0.05
level to conclude that the mean drive lengths differ across the golfers.
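The ANOVA computation from the summary statistics can be sketched in a few lines of Python (values taken from the table above):

```python
means = {"A": 199, "B": 249, "C": 170, "D": 273}
sds = {"A": 54, "B": 19, "C": 17, "D": 24}
n = 4  # drives per golfer

# Within-group sum of squares from the sample standard deviations
sse = sum((n - 1) * s ** 2 for s in sds.values())

# Between-group sum of squares around the grand mean
grand = sum(means.values()) / len(means)
sst = sum(n * (m - grand) ** 2 for m in means.values())

k = len(means)
f_stat = (sst / (k - 1)) / (sse / (k * n - k))
print(sse, round(sst), round(f_stat, 1))  # 12426 26243 8.4
```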

(b) (10 points) Apply Tukey’s method at level 0.05 to compare the mean drive
lengths for the four golfers. Present your conclusions either in words or
with an appropriate diagram.

Solution:
From the table, q(4, 12, 0.05) = 4.20, and thus the least significant difference is
4.20·√[(12,426/12)/2 · (1/4 + 1/4)] ≈ 67.6. This leads to the following underline diagram:

C (170)   A (199)   B (249)   D (273)
-------------------
          -------------------
                    -------------------
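The pairwise comparisons behind the diagram can be checked numerically; q(4, 12, 0.05) = 4.20 is taken from the table:

```python
import math
from itertools import combinations

means = {"A": 199, "B": 249, "C": 170, "D": 273}
mse, n, q = 12426 / 12, 4, 4.20

# Tukey least significant difference for equal group sizes
lsd = q * math.sqrt(mse / 2 * (1 / n + 1 / n))

# Pairs whose mean difference exceeds the LSD are declared different
sig = {pair for pair in combinations(sorted(means), 2)
       if abs(means[pair[0]] - means[pair[1]]) > lsd}
print(round(lsd, 1))  # 67.6; significant pairs are A-D, B-C, C-D
```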

(c) (10 points) List all the assumptions required for the inference procedures
that you applied in parts (a) and (b). For each assumption that you list,
provide one way to check the assumption. Please be specific. It is not
necessary that you actually do the assumption-checking.

Solution:
i. Independent errors. We could check this by plotting the residuals
against the order in which the drives were hit. A pattern in the plot
might indicate a lack of independence.
ii. Normal errors. We could make a histogram or normal probability plot
of the residuals either overall or by treatment.
iii. Equal variances. We could compare the sample variances. If the ratio
between the biggest and smallest exceeds 3 or so, then the population
variances may be sufficiently different to lead to an inflated α level.

3. (20 points) In the 1980-1981 and 1981-1982 NBA basketball seasons, Larry
Bird shot 338 pairs of free throws from the foul line (from Wardrop 1995, The
American Statistician). The observed distribution of the number of free throws
(out of two) that he made each time at the foul line is given by the following
table:
Number of Free Throws Made 0 1 2
Number of Occasions 5 82 251

(a) (15 points) If all of the free throws Larry Bird took over his career were
independent with the same probability of success, we would expect the
number of free throws made out of two to follow a binomial distribution
with success probability 0.89 (Larry Bird made 89% of his career free
throws). Conduct a 0.05 level goodness-of-fit test to decide whether the
binomial distribution with π = 0.89 is reasonable for these two seasons.

Solution:
First, note that if Y = the number of free throws made out of two is binomial
with π = 0.89, then
P(Y = 0) = 0.11² = 0.0121
P(Y = 1) = 2(0.89)(0.11) = 0.1958
P(Y = 2) = 0.89² = 0.7921

Then, we want to test

H0: π0 = 0.0121, π1 = 0.1958, π2 = 0.7921 vs.
Ha: πi ≠ πi0 for some i ∈ {0, 1, 2},

where πi is the true probability of making i out of the 2 free throws.

The expected counts for the three cells are 338(0.0121) = 4.090,
338(0.1958) = 66.180, and 338(0.7921) = 267.730.

Then, χ²-stat = (5 − 4.090)²/4.090 + (82 − 66.180)²/66.180 + (251 − 267.730)²/267.730 = 5.030

Rejection/Critical Region: χ²-stat > χ²(2, 0.05) = 5.991

Since the test statistic is not in the rejection region, we do not have enough
evidence at the 0.05 level to conclude that the distribution of the number
of free throws made by Larry Bird out of 2 is different than a binomial
with success probability 0.89.
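The goodness-of-fit statistic can be verified with a short Python sketch:

```python
observed = [5, 82, 251]   # made 0, 1, 2 free throws on 338 occasions
n, p = 338, 0.89

# Binomial(2, 0.89) cell probabilities and expected counts
probs = [(1 - p) ** 2, 2 * p * (1 - p), p ** 2]
expected = [n * pi for pi in probs]

chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
print(round(chi2, 2))  # 5.03, below the chi-square(2, 0.05) cutoff of 5.991
```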

(b) (5 points) What assumptions are required for the test in part a? Comment
on whether you have any concerns about the validity of these assumptions.

Solution:
We need to assume that our sample sizes are large enough. I would have
some concern about this assumption since the expected count for the 0
case is less than 5. The rule of thumb is to have all expected counts above
1 and no more than 20% below 5. Since the expected value 4.090 is only
just below five, the validity of the test may still be reasonable (note: sim-
ulations show this to be the case). Alternatively, one may combine the 0
and 1 cells before conducting the goodness-of-fit test.

We also need to assume that we have collected 338 pairs of free throws
that are independent from one another. This assumption may or may not
be reasonable (injuries, hot hand, etc.) but is fine if we are willing to make
the assumption as stated in the problem that each and every individual
free throw is independent. Of course, if we are willing to assume this, then
it may make more sense to do a one sample proportion test to see if the
overall proportion of free throws made in the 1980-1981 and 1981-1982
seasons differs from 0.89.

4. (25 points) A no-intercept simple linear regression model (or a regression model
through the origin), that is, yi = β·xi + εi, εi ~ iid N(0, σ²), i = 1, 2, ..., n, is often
appropriate in analyzing data from chemical and other manufacturing processes.

a. (10 points) Show that the least-squares estimator for the slope β is
β̂ = (Σi xi·yi) / (Σi xi²), with the sums running over i = 1, ..., n.

Solution:

S(β) = Σi (yi − βxi)²

dS(β)/dβ = 2 Σi (yi − βxi)(−xi) = −2(Σi xi·yi − β Σi xi²), which we set equal to 0, giving

β̂ = (Σi xi·yi) / (Σi xi²).

b. (7 points) Derive the mean and variance of β̂.

Solution:

E(β̂) = E(Σi xi·yi / Σi xi²) = Σi xi·E(yi) / Σi xi² = Σi xi(xiβ) / Σi xi² = β

V(β̂) = V(Σi xi·yi / Σi xi²) = Σi xi²·V(yi) / (Σi xi²)² = σ² Σi xi² / (Σi xi²)² = σ² / Σi xi²

c. (3 points) Since two points determine a straight line, a student proposes
another estimator for β that uses only the first observation in the dataset,
i.e., (x1, y1), and the origin, i.e., (0, 0): β̃ = (y1 − 0)/(x1 − 0) = y1/x1. Is β̃ an
unbiased estimator for β? Show your work.

Solution: β̃ is an unbiased estimator for β, since

E(β̃) = E(y1/x1) = βx1/x1 = β.

d. (5 points) Which estimator is better, β̂ or β̃? Explain.

Solution: I conclude that β̂ is better because both estimators are unbiased
for estimating β, but β̂ is more efficient, with the smaller variance:

V(β̃) = V(y1/x1) = σ²/x1² ≥ σ² / Σi xi²,

where equality only holds when x2, x3, ..., xn are all 0, which is not a
realistic scenario. Comments such as “β̂ is more efficient because it uses
all data while β̃ only uses the first observation” would be worth partial
credit.
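A small Monte Carlo sketch illustrates parts (b) and (d); the design points x = (1, 2, 3, 4), slope β = 2, σ = 1, and simulation size are hypothetical choices for illustration, not values from the problem:

```python
import random

random.seed(0)
xs, beta, sigma = [1.0, 2.0, 3.0, 4.0], 2.0, 1.0
sxx = sum(x * x for x in xs)

hats, tildes = [], []
for _ in range(20000):
    ys = [beta * x + random.gauss(0, sigma) for x in xs]
    hats.append(sum(x * y for x, y in zip(xs, ys)) / sxx)  # least-squares slope
    tildes.append(ys[0] / xs[0])                           # first-point slope

def mean(v): return sum(v) / len(v)
def var(v):
    m = mean(v)
    return sum((u - m) ** 2 for u in v) / len(v)

# Both estimators center near beta, but beta-hat has much smaller variance
print(round(mean(hats), 2), round(mean(tildes), 2), var(hats) < var(tildes))
```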

5. (37 points) From a random sample of all domestic flights departing from Newark
Liberty International Airport (EWR) and John F. Kennedy International Air-
port (JFK) in the year 2013, the following summary statistics were observed for
the departure delay (in minutes):

       Mean   Standard Deviation   n
EWR    18.5   41.7                 178
JFK    20.3   44.6                 153

(a) (10 points) Conduct an appropriate statistical test to determine if there is


evidence that the variances in departure delay time from each airport are
unequal, at α = 0.05. Note: since the provided distribution table does not
actually contain the proper degrees of freedom based on the data provided,
please use the closest that you can.

Solution:
H0: σ²EWR = σ²JFK vs. Ha: σ²EWR ≠ σ²JFK,
where σ²EWR is the variance in flight delays at EWR airport and σ²JFK is
the variance in flight delays at JFK airport.

F-stat = 44.6²/41.7² = 1.144

While the rejection region is two-sided, since I chose to put the larger sample
variance in the numerator, we would reject H0 if F-stat > F(152, 177, 0.025) ≈
F(120, 120, 0.025) = 1.433. Alternatively, if one were to choose to put the
smaller sample variance in the numerator, we would then reject if
F-stat < F(152, 177, 0.975) = 1/F(177, 152, 0.025) ≈ 1/F(120, 120, 0.025) = 1/1.433 = 0.698.

Since our F-stat is not in the rejection region, we do not have enough ev-
idence at the 0.05 level to conclude that the variance in departure delays
is different at the EWR and JFK airports.

(b) (8 points) Since the desired degrees of freedom for the test in part (a)
are not in the table, the Type I and Type II Error rates will be different
than the ones resulting from the decisions based on the proper degrees of
freedom. Answer each of the following:
i. Using the provided distribution table, the Type I Error rate will be:

lower than / the same as / higher than

that of the decisions made with the proper degrees of freedom (choose
one, and explain).

ii. Using the provided distribution table, the Type II Error rate will be:

lower than / the same as / higher than

that of the decisions made with the proper degrees of freedom (choose
one, and explain).

Solution:
The Type I Error rate is lower than if we had used the correct degrees of
freedom. This is because the F upper percentiles decrease as df increases
in both numerator and denominator. Thus, when H0 is true, we would
not reject H0 in certain cases when the F-stat is actually above the true
rejection region cut-off if we are using the incorrect df.

The Type II Error rate would be higher than if we had used the correct
degrees of freedom. Again, since the F upper percentiles decrease as df
increases, using artificially low df makes the cut-off to reject H0 too high.
Thus, when Ha is true, we would not reject H0 in certain cases when the F-
stat is above the true rejection region cut-off. This means that the chances
of making a Type II Error are inflated (and power is decreased).

(c) (10 points) Based on your answer to part (a), conduct an appropriate
statistical test to determine if there is evidence that the mean departure
delays from each airport are different, at α = 0.05. If a required degrees
of freedom is not exactly available, please use the closest that you can.

Solution:
Using the equal variance assumption, we want to test H0: µEWR = µJFK
vs. Ha: µEWR ≠ µJFK, where µEWR (µJFK) is the mean flight delay at
EWR (JFK) airport.

sp = √[(177(41.7²) + 152(44.6²))/329] = 43.064, so that
t-stat = (20.3 − 18.5) / [43.064·√(1/178 + 1/153)] = 0.379

RR: |t-stat| > t(329, 0.025) ≈ t(120, 0.025) = 1.98

Since the t-stat is not in the rejection region, we do not have enough evi-
dence at the 0.05 level to conclude that there is a difference in the mean
departure delays at the EWR and JFK airports.
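Both test statistics from parts (a) and (c) can be reproduced from the summary table:

```python
import math

mean_ewr, sd_ewr, n_ewr = 18.5, 41.7, 178
mean_jfk, sd_jfk, n_jfk = 20.3, 44.6, 153

# F test for equal variances, larger sample variance in the numerator
f_stat = sd_jfk ** 2 / sd_ewr ** 2

# Pooled two-sample t test for equal means
sp2 = ((n_ewr - 1) * sd_ewr ** 2 + (n_jfk - 1) * sd_jfk ** 2) / (n_ewr + n_jfk - 2)
t_stat = (mean_jfk - mean_ewr) / math.sqrt(sp2 * (1 / n_ewr + 1 / n_jfk))
print(round(f_stat, 3), round(t_stat, 3))  # 1.144 0.379
```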

(d) (6 points) Based on the information given, do either of the tests performed
in parts (a) or (c) require an assumption that departure delays follow a
normal distribution? Why or why not?

Solution:

The test for unequal variances requires normally distributed departure


times. This is because the F-statistic is derived under this assumption (as
sample sizes get larger, the F-stat will not necessarily become close to a
true F-distribution if the departure times are non-normal). The test for
unequal means does not require a normality assumption to work well, just
a large enough sample such that the distributions of the sample mean will
be normally distributed (so that the t-stat is approximately t_df under H0).

Note: since departure delays are non-negative and the sample standard de-
viations given in the table are well greater than the means, the normality
assumption is not reasonable for the test of unequal variances. It is also
possible that the data are so skewed that the two-sample t-test may have
inflated Type I Error rates as well, but this is less likely given the large
sample sizes.

(e) (3 points) List any other required assumptions for each of the tests in parts
(a) and (c), and comment briefly on whether they appear to be reasonable
here.

Solution:
For both tests in part (a) and (c), we need the samples to be independently
collected and randomly sampled from the populations of flight delays. We
are told that the samples are random which should imply in this context
independence between the samples. For the test in (c), we additionally as-
sumed that the variances are equal. The test for unequal variances proba-
bly doesn’t help much here as the samples of flight delays are very skewed.
However, with the sample sizes being relatively similar, there is likely no
issue with using the equal variance test.

6. (30 points) A student ran an experiment to study how the average time required
for her to blow up a balloon varied with the color of the balloon (Red, White,
or Blue) and the brand (A, B, or C). She blew up three balloons for each
combination of a color and a brand, recording the time required (in seconds).
The treatment sample means are given in the table below.

Brand
Color A B C
Red 22 20 26
White 20 17 23
Blue 25 20 18

(a) (10 points) Create appropriate interaction plots, and comment on both the
presence/absence of interaction and the type of interaction.
Solution:

[Interaction plots: Time (sec) vs. Color with one line per Brand (A, B, C), and Time (sec) vs. Brand with one line per Color (Red, White, Blue); the lines cross in both plots.]

Since there is crossing in the plots, it appears like there may be disorderly
interaction.

(b) (10 points) Write down a complete set of orthogonal interaction contrasts.

Solution: Answers may vary. The following is one potential strategy.


Start with an interaction contrast only involving the upper left 2 by 2
portion of the table. The coefficients for an interaction contrast can be
easily found by multiplying the coefficients on the marginal simple effects
of A vs. B and Red vs. White (in diagram below, coefficients to the left
and above the lines represent the marginal effects while the coefficients
inside the lines represent the coefficients for the interaction contrast).

            A    B    C
            1   −1    0
Red     1 | 1   −1    0
White  −1 |−1    1    0
Blue    0 | 0    0    0

The associated interaction contrast is µRed,A − µRed,B − µWhite,A + µWhite,B.

Next, keep one of the marginal simple effects the same while changing
the other to a new marginal comparison of means that on its own will
be orthogonal to the first, such as the average of A and B vs. C. Then
cross-multiply the coefficients to get a new interaction contrast:

            A    B    C
            1    1   −2
Red     1 | 1    1   −2
White  −1 |−1   −1    2
Blue    0 | 0    0    0

The associated interaction contrast is µRed,A + µRed,B − 2µRed,C − µWhite,A −
µWhite,B + 2µWhite,C.

Similarly, going back to the original A vs. B but now considering the
average of Red and White vs. Blue:

            A    B    C
            1   −1    0
Red     1 | 1   −1    0
White   1 | 1   −1    0
Blue   −2 |−2    2    0

The associated interaction contrast is µRed,A − µRed,B + µWhite,A − µWhite,B −
2µBlue,A + 2µBlue,B.

Finally, use both of the more complicated marginal comparisons and cross-
multiply to get the coefficients for the final interaction contrast:

            A    B    C
            1    1   −2
Red     1 | 1    1   −2
White   1 | 1    1   −2
Blue   −2 |−2   −2    4

The associated interaction contrast is µRed,A + µRed,B − 2µRed,C + µWhite,A +
µWhite,B − 2µWhite,C − 2µBlue,A − 2µBlue,B + 4µBlue,C.

Putting everything together, this gives a set of orthogonal contrasts on

(µRed,A, µRed,B, µRed,C, µWhite,A, µWhite,B, µWhite,C, µBlue,A, µBlue,B, µBlue,C)

with coefficient matrix

[ 1  −1   0  −1   1   0   0   0   0 ]
[ 1   1  −2  −1  −1   2   0   0   0 ]
[ 1  −1   0   1  −1   0  −2   2   0 ]
[ 1   1  −2   1   1  −2  −2  −2   4 ]
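As a quick sanity check (not required for the exam answer), orthogonality of these four contrasts can be verified numerically by confirming that every pairwise dot product of the coefficient rows is zero (equal cell sizes assumed):

```python
from itertools import combinations

contrasts = [
    [1, -1,  0, -1,  1,  0,  0,  0,  0],
    [1,  1, -2, -1, -1,  2,  0,  0,  0],
    [1, -1,  0,  1, -1,  0, -2,  2,  0],
    [1,  1, -2,  1,  1, -2, -2, -2,  4],
]

# Every pair of contrast vectors must have dot product zero
dots = [sum(a * b for a, b in zip(c1, c2))
        for c1, c2 in combinations(contrasts, 2)]
print(dots)  # [0, 0, 0, 0, 0, 0]
```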

(c) (10 points) Suppose that the SSE for an effects model with interaction is
72. Using level 0.05, test for an interaction involving colors White and
Blue and brands B and C.

Solution:
H0: µWhite,B − µWhite,C − µBlue,B + µBlue,C = 0 vs.
Ha: µWhite,B − µWhite,C − µBlue,B + µBlue,C ≠ 0

MSE = 72/[9(3 − 1)] = 4 (18 df for error)

t = (17 − 23 − 20 + 18) / √[4(1²/3 + (−1)²/3 + (−1)²/3 + 1²/3)] = −8/2.309 ≈ −3.5

Then, since |−3.5| > t(18, 0.025) = 2.10, we reject H0 and conclude at the
0.05 level that there is interaction in the effect on time between the colors
white and blue and the brands B and C.
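The contrast test statistic can be reproduced as follows:

```python
import math

# Treatment means from the table (seconds)
white_b, white_c, blue_b, blue_c = 17, 23, 20, 18
mse, n_per_cell = 72 / 18, 3   # MSE = 4 with 18 error df

estimate = white_b - white_c - blue_b + blue_c       # -8
se = math.sqrt(mse * 4 / n_per_cell)                 # four coefficients of +/-1
t_stat = estimate / se
print(round(t_stat, 2))  # -3.46
```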

7. (35 points) Researchers from the Netherlands were interested in whether smok-
ing by the mother in pregnancy is related to higher childhood blood pressure in
their offspring. In a recent study, they recruited 200 mothers and their newborns
(one newborn per mother). They considered the following regression model
y = β0 + β1x1 + β2x2 + β3x3 + β4x4 + ε,   ε ~ iid N(0, σ²),

to relate the infant’s systolic blood pressure in mm Hg (y) to the infant’s age in
weeks (x1 ) and weight in kg (x2 ), and the mother’s smoking status in pregnancy
(no smoking, passive exposure to smoking, or smoking). Mothers who did not
smoke in pregnancy but were exposed to smoke by others were considered as
having “passive exposure to smoking”. Two indicator variables were defined to
account for mother’s smoking status:
x3 = 1 if passive exposure to smoking, 0 otherwise;
x4 = 1 if smoking, 0 otherwise.

The R output of the regression analysis for this full model is presented in Table
1. Table 2 presents the ANOVA tables for the full model, the reduced model
1 (only including the independent variables x1 and x2 ), and the reduced model
2 (only including the independent variables x3 and x4 ). Figure 1 shows the
residual plots for the full model.
Table 1: R output of regression analysis for the full model.

Call:
lm(formula = y ~ x1 + x2 + x3 + x4, data = sdata)

Residuals:
Min 1Q Median 3Q Max
-13.0408 -2.6090 -0.2574 1.6563 20.9125

Coefficients:
Estimate Std.Error t value Pr(>|t|)
(Intercept) 72.0164 2.5112 28.678 < 2e-16 ***
x1 1.0006 0.2393 4.181 4.38e-05 ***
x2 4.0882 0.5007 8.166 3.92e-14 ***
x3 0.2803 0.9379 0.299 0.765
x4 5.6025 0.8522 6.574 4.36e-10 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Residual standard error: 4.286 on 195 degrees of freedom
F-statistic: 30.08 on 4 and 195 DF, p-value: < 2.2e-16

Table 2: ANOVA tables for the full model (including x1 , x2 , x3 , and x4 ), the reduced
model 1 (including x1 and x2 ), and the reduced model 2 (including x3 and x4 ).

Models                           Degrees of Freedom   Sum of Squares   Mean Squares
Full Model        Regression     4                    2210.1           552.53
                  Error          195                  3581.4           18.37
Reduced Model 1   Regression     2                    1403.5           701.75
                  Error          197                  4388.0           22.27
Reduced Model 2   Regression     2                    699.3            349.65
                  Error          197                  5092.2           25.85

Figure 1: Residual plots for the full model.

[Four residual diagnostic panels: Residuals vs Fitted, Normal Q-Q, Scale-Location, and Residuals vs Leverage (with Cook's distance contours); observations 4, 30, 81, and 174 are flagged.]

a. (5 points) Interpret the estimated coefficient associated with the variable


age (x1 ) in the full model.

Solution: Each one-week increase in the infant’s age is associated with
an estimated mean systolic blood pressure increase of 1.0006 mm Hg, given
that all other variables are held constant.

b. (5 points) Calculate and interpret the coefficient of determination, R2 , for


the full model.

Solution:
R² = SSR/SST = 2210.1/(2210.1 + 3581.4) = 0.382
This means that 38.2% of the variability in the infant’s systolic blood pres-
sure can be explained by a linear relationship with the infant’s age, weight,
and the mother’s smoking status.

c. (10 points) Use appropriate sums of squares or mean squares in Table 2


to test whether the infant’s systolic blood pressure is associated with the
mother’s smoking status given that the age and weight of the infant are in
the model at the significance level of 0.05.

Solution:
H0 : β3 = β4 = 0
Ha : at least one of the regression coefficients is not 0.

F = [(SSRfull − SSRreduced)/2] / MSEfull = [(2210.1 − 1403.5)/2] / 18.37 = 21.95,
which follows an F(2, 195) distribution under H0.

Since 21.95 > F0.05,2,195 ≈ 3.07, reject H0 . There is sufficient evidence


at the 0.05 level to claim that the infant’s systolic blood pressure is
associated with the mother’s smoking status after adjusting for the
infant’s age and weight.
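Both the R² from part (b) and the partial F statistic from part (c) follow directly from Table 2:

```python
ssr_full, sse_full = 2210.1, 3581.4
ssr_reduced1 = 1403.5     # model with x1 and x2 only
mse_full = 18.37          # full-model mean squared error from Table 2

# Coefficient of determination for the full model
r_squared = ssr_full / (ssr_full + sse_full)

# Partial F for H0: beta3 = beta4 = 0 (2 extra parameters in the full model)
f_stat = (ssr_full - ssr_reduced1) / 2 / mse_full
print(round(r_squared, 3), round(f_stat, 2))  # 0.382 21.95
```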

d. (9 points) List all model assumptions. Based on the residual plots for the
full model in Figure 1, comment on whether the model assumptions appear
to be satisfied. Provide a way to address each assumption violation if there
are any.
Solution:

Linearity: No obvious violation, as the residuals seem to be scattered randomly about 0 for all predicted values of the response.

Constant variance: This assumption may not quite be met, though since most of the data is between the predicted values of 90 and 100, we would expect to see observations further from 0 in this range. That said, there appear to exist outliers, such as observations 4, 30, and 174, and one could argue that the variances at the center of the fitted values tend to be slightly greater than at the two ends. Possible remedy: weighted least squares (a transformation on y may work in some cases of changing variance).

Normality: There is a clear violation of normality; the tails of the distribution tend to be heavier than those of normal distributions. Possible remedy: robust regression (a transformation on y may work in some cases).

Independence: Whether or not independence is reasonable cannot be diagnosed from Figure 1. No clear violation is found from the study description.

Additionally, note that while influential observations are not strictly part of the model assumptions, and thus discussion of influential points is not required as part of a complete answer here, it is still good practice to consider their effect. Based on the “Residuals vs Leverage” plot, observation 30 may be identified as an influential observation. However, according to Cook’s distance, none of the observations are influential.

e. (6 points) Later on, one researcher suspected that the association between
the infant’s systolic blood pressure and the age might be different across
different smoking statuses. Propose a model that allows the slope of age
to differ among the three smoking statuses and write down the null and
alternative hypotheses that would be used to test this researcher’s idea.

Solution: The proposed model is

Y = β0 + β1x1 + β2x2 + β3x3 + β4x4 + β5x1x3 + β6x1x4 + ε,   ε ~ iid N(0, σ²)

The tested hypotheses are H0: β5 = β6 = 0 vs. Ha: β5 ≠ 0 or β6 ≠ 0.
