UNIVERSITY OF TORONTO

Faculty of Arts & Science


April 2024 EXAMINATIONS
ECO220Y1Y, L0101 & L0201 (Professors J. Murdock / D. O’Brien)
Introduction to Data Analysis and Applied Econometrics
Duration: 3 hours
Aids Allowed: A non-programmable calculator

Exam Reminders:
• Leave these papers face up on your desk until the announcements end and the Exam Facilitator starts the exam.
• As a student, you help create a fair and inclusive writing environment. If you possess an unauthorized aid during
an exam, you may be charged with an academic offence.
• Turn off and place all cell phones, smart watches, electronic devices, and study materials in your bag under your
desk. If it is left in your pocket, it may be an academic offence.
• When you are done your exam, raise your hand for someone to come and collect your exam.
• If you are feeling ill and unable to finish your exam, please bring it to the attention of an Exam Facilitator so it
can be recorded before leaving the exam hall.
• In the event of a fire alarm, do not check your cell phone when escorted outside.

Exam Format and Special Instructions: This exam includes these 10 Crowdmark pages plus the Supplement. There are 7
questions with varying parts and point values worth a total of 120 points. Your entire answer must fit in the designated
space immediately after each question on the Crowdmark pages. Write in PENCIL and use an ERASER as needed. Follow
the answer guides that end each question. The Supplement is 10 pages and contains graphs, tables, and other materials
required for some exam questions and the aid sheets. After the exam begins, carefully DETACH the Supplement.

Students must hand in all examination materials at the end.


(1) [6 pts] For the relationship between employee salary and years of experience, consider three types of data: time
series, cross-sectional, and panel. How would each type be structured in this context? Which variables – including any
relevant identifier variables – would be in each? What would be the unit of observation? Fill in the blanks to answer.
There would be ____ [#] variables in the time series data and these variables would be ________________________
_________________________________________. Next, there would be ____ [#] variables in the cross-sectional data
and these variables would be ________________________________________________________________________.
Finally, there would be ____ [#] variables in the panel data and these variables would be ______________________
_________________________________________________________. The unit of observation in the time series data
is __________________________________, in the cross-sectional data is __________________________________,
and in the panel data is __________________________________.

(2) See the Supplement for Question (2): Karlan and List (2007): Does Price Matter in Charitable Giving?
(a) [7 pts] Consider Columns (1) and (2). First, 0.018 minus 0.022 is -0.004: interpret -0.004. Second, 0.813 minus 0.967 is
-0.154: interpret -0.154. Do not be repetitive across the two sentences. Answer with 2 sentences.

(b) [4 pts] In the char_give.xlsx data the variable named amount is labelled “Dollar amount person donated in response
to the letter (0 for no donation).” Among those receiving a letter with a match and choosing to donate, what is the
numeric value of the standard deviation of the variable named amount? Answer with a quantitative analysis.
(3) See the Supplement for Question (3): Mental Models and Learning: The Case of Base-Rate Neglect.

(a) [7 pts] Show how to compute “41 percent.” Define relevant events (each with one letter), draw the probability tree,
and show the analysis. Answer with defined events, a probability tree & a quantitative analysis using formal notation.

(b) [4 pts] If a doctor sees a representative subset of the population, about ______ [#] percent of test results should
come back negative. For an elderly subset of the population where two-thirds have the disease, the size of the bias
caused by base-rate neglect would be ______________________ [smaller / larger / the same]. For the general
population, the test leads to an overestimate of 39 ______________________ [percent / percentage points]. If a
new test with an accuracy of 99 percent is adopted, the size of the bias caused by base-rate neglect would be
______________________ [smaller / larger / the same]. Fill in the blanks to answer.
(4) See the Supplement for Question (4): The Big Three and Financial Literacy.

(a) [14 pts] Given Table 1, compute the 99.9% confidence interval estimate of the difference from 2019 to 2022 in the
fraction with all Big Three correct. Next, interpret the interval. Answer with a quantitative analysis & 3 – 4 sentences.
(b) [7 pts] Interpret the Constant terms for Columns (1) to (5) in Table 2. Ignore the values in parentheses and do not be
repetitive. Answer with 2 sentences.

(c) [10 pts] Explain the meaning of the results in the Asian row in Column (3) and Column (5) and compare those results
with each other. Be sure to fully address any notable differences. Answer with 3 – 4 sentences.
(d) [5 pts] Interpret the R² in Column (6), including assessing its size (large or small). Answer with 1 – 2 sentences.

(e) [3 pts] Use Column (6) to predict the score of a person who is 50 years old, black, male, and has a Master’s degree.
Answer with a quantitative analysis.

(f) [5 pts] Suppose you wanted to redo Table 2 combining those with some college (no degree) with those with an
Associate’s degree (2-year). What is the simplest way to create the necessary new variable in Excel? Which columns of
Table 2 would be affected and what would be their new value of 𝑘? Answer with 2 sentences.
(5) See the Supplement for Question (5): Parents’ Beliefs about Their Children’s Academic Ability.

[4 pts] Locate the number 0.11 in Table 1. Calculate it using the relevant information from the table. Answer with a
quantitative analysis.

(6) See the Supplement for Question (6): Adolescents’ Gender Attitudes.

(a) [10 pts] Refer to Table 6, Column (1) Panel C. Girls = Boys p-value. Reproduce the test of whether the attitudes
about female employment differ between girls and boys in the treated schools. Calculate the p-value and assess the
significance. Answer with hypotheses in formal notation, a quantitative analysis, and 2-4 sentences.
(b) [2 pts] Refer to Table 6, Column (2) Panel B. Boys. With regard to the treatment effect, what are the Type I and
Type II errors in this context? Answer in 2 sentences using plain English to explain.

(c) [9 pts] Refer to Table 6, Column (5) Panel B. Boys. At the 5% significance level, what is the power of the test of the
hypothesis that the proportion of boys who think that the community agrees with female rights to
education is higher in the treated schools than in the control schools? Answer with a quantitative analysis.

(d) [1 pt] If the sample size were lower, how would that affect the standard deviation and power? Answer with 1-2
sentences.
(7) See the Supplement for Question (7): Coordination and Organization Design.

(a) [4 pts] Refer to Figure 3. Write down the regression model that represents the relationship depicted in the figure.
You can assume that the variable “Demand uncertainty” is measured on a 1-7 scale and the variable “Need for
coordination” is a dummy for either the low or high levels. Clearly define the variables. Answer using formal
notation with one equation and 3 phrases.

(b) [6 pts] Refer to the Stata regression output, Column (1). Interpret the coefficients on the variables “Demand
uncertainty” and “High need for coordination.” Answer with 2-4 sentences.

(c) [4 pts] Refer to the Stata regression output, values r2 (R²) and r2_a (Adjusted R²). Explain why R² increases from
Column (1) to Column (2). Next, explain what additional information the Adjusted R² provides. Answer with 2-3
sentences.
(d) [8 pts] Refer to the Stata regression output, Column (2). Describe any changes in the coefficients on the variables
“demand uncertainty” and “high need for coordination,” their significance, and interpretation changes if any. Next,
interpret the coefficient on the interaction term. Answer with 2-3 sentences.
This Supplement will NOT be graded. UNSTAPLE this Supplement from Crowdmark Supplement: Page 1 of 13

APRIL 2024 ECO220Y1Y EXAM SUPPLEMENT: DETACH FROM CROWDMARK NOW

Supplement for Question (1): Not applicable because all information is given with the question.

Supplement for Question (2): Recall Table 2A from Karlan and List (2007) “Does Price Matter in Charitable Giving?
Evidence from a Large-Scale Natural Field Experiment.” At random, each potential donor either received a standard
solicitation letter (control group) or a letter where each dollar donated would be matched to a randomly varying degree.

Table 2A – Mean Responses
(Means, with standard errors in parentheses)

                                                             Match ratio
                                        Control   Treatment  1:1       1:2       1:3
Implied price of $1 of public good:     1.00      0.36       0.50      0.33      0.25
                                        (1)       (2)        (3)       (4)       (5)
Panel A
Response rate                           0.018     0.022      0.021     0.023     0.023
                                        (0.001)   (0.001)    (0.001)   (0.001)   (0.001)
Dollars given, unconditional            0.813     0.967      0.937     1.026     0.938
                                        (0.063)   (0.049)    (0.089)   (0.089)   (0.077)
Dollars given, conditional on giving    45.540    43.872     45.143    45.337    41.252
                                        (2.397)   (1.549)    (3.099)   (2.725)   (2.222)
Observations                            16,687    33,396     11,133    11,134    11,129

Supplement for Question (3): Consider excerpts from a 2024 academic article in the American Economic Review “Mental
Models and Learning: The Case of Base-Rate Neglect.” For easy reference, “41 percent” is in boldface below.

Excerpt, Abstract: We experimentally document persistence of suboptimal behavior despite ample
opportunities to learn from feedback in a canonical updating problem where people suffer from base-rate
neglect. Our results suggest mistakes are more likely to be persistent when they are driven by
incorrect mental models that miss or misrepresent important aspects of the environment. Such models
induce confidence in initial answers, limiting engagement with and learning from feedback.

Excerpt, p. 753: One of the most well-documented biases in the literature is base-rate neglect. As a
motivating example, consider a person who is tested for a disease. The disease has a prevalence of 15
percent in the general population, and the test has an accuracy of 80 percent.¹ With these primitives, the
chance that the person is sick conditional on a positive test result is 41 percent, but the literature has
repeatedly documented that many subjects (and doctors!) incorrectly consider this chance to be 80
percent. Because such beliefs completely fail to take into account the unconditional probability of the
disease, we refer to this bias as perfect base-rate neglect.

In the example, the “base-rate” that people (mistakenly) neglect is the rate of disease in the general population, and the
size of the bias caused by base-rate neglect is an overestimate of 39 (=80-41).

¹ The probability of a positive test result conditional on the person being sick is 80 percent. The probability of a positive test result
conditional on the person not being sick is 20 percent.
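The “41 percent” can be reproduced from the primitives in the excerpt and its footnote. The calculation uses three aid-sheet tools: the multiplication rule, the law of total probability, and the definition of conditional probability. The Python sketch below shows one way to lay out that arithmetic. The event labels S (sick) and T (positive test) and the function name are illustrative choices, not notation from the article.

```python
def prob_sick_given_positive(p_sick, p_pos_given_sick, p_pos_given_not_sick):
    """Return P(S | T), where S = person is sick and T = test is positive."""
    p_sick_and_pos = p_pos_given_sick * p_sick                 # P(S and T) = P(T|S) P(S)
    p_not_sick_and_pos = p_pos_given_not_sick * (1 - p_sick)  # P(S' and T) = P(T|S') P(S')
    p_pos = p_sick_and_pos + p_not_sick_and_pos                # P(T): sum over both branches of the tree
    return p_sick_and_pos / p_pos                              # P(S|T) = P(S and T) / P(T)


# Prevalence 15%; P(positive | sick) = 0.80; P(positive | not sick) = 0.20
posterior = prob_sick_given_positive(0.15, 0.80, 0.20)
print(f"{posterior:.2%}")
```

In these terms, perfect base-rate neglect amounts to reporting P(T | S) = 0.80 in place of P(S | T).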

Supplement for Question (4): Consider excerpts from a 2023 academic article in the Journal of Economic Perspectives
“The Importance of Financial Literacy: Opening a New Field,” abbreviated Lusardi and Mitchell (2023).

Excerpt, p. 138: The “Big Three” is a short set of survey questions that over the years has proven to be an
extremely good measure of peoples’ understanding of basic financial concepts:

Question 1: Suppose you had $100 in a savings account and the interest rate was 2% per year. After
5 years, how much do you think you would have in the account if you left the money to grow?
a. More than $102
b. Exactly $102
c. Less than $102
d. Do not know or refuse to answer [DK/Refuse]

Question 2: Imagine that the interest rate on your savings account was 1% per year and inflation was
2% per year. After 1 year, how much would you be able to buy with the money in this account?
a. More than today
b. Exactly the same
c. Less than today
d. Do not know or refuse to answer [DK/Refuse]

Question 3: Please tell me whether this statement is true or false. “Buying a single company’s stock
usually provides a safer return than a stock mutual fund.”
a. True
b. False
c. Do not know or refuse to answer [DK/Refuse]

Question 1 is abbreviated “interest,” and the correct answer is a. Question 2 is abbreviated “inflation,” and the correct
answer is c. Question 3 is abbreviated “risk,” and the correct answer is b. Together, these are the Big Three.

Every three years, the US Federal Reserve runs the Survey of Consumer Finances. It has many questions, including the Big
Three. Lusardi and Mitchell (2023) use the 2019 survey data; the 2022 survey data have since been released.

Table 1 shows some results as reported in Lusardi and Mitchell (2023) and adds in the 2022 survey results.

Table 1. Financial Literacy in the US Adult Population: Results from the Big Three Survey Questions

                        2019 Survey (n = 5,783)           2022 Survey (n = 4,595)
                        Correct  Incorrect  DK/Refuse     Correct  Incorrect  DK/Refuse
Interest                80.6%    16.4%      3.0%          79.7%    17.0%      3.3%
Inflation               75.5%    20.7%      3.8%          82.8%    14.1%      3.1%
Risk                    60.7%    17.1%      22.2%         67.8%    15.1%      17.1%
All Big Three correct   43.3%                             50.9%

For financial literacy, a variable named score records the percent correct on the Big Three survey questions. For
example, for a person who answers b, c, and b for the three questions respectively, their score is 66.67 percent.
(Refusing to answer or saying don’t know are incorrect answers worth no marks.)
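The definition of score can be written as a short function. In this sketch the answer key follows the Supplement (interest: a, inflation: c, risk: b). The names `ANSWER_KEY` and `big_three_score` are hypothetical. Any answer other than the correct letter, including DK/Refuse, earns no marks.

```python
# Correct answers from the Supplement, in question order: interest, inflation, risk
ANSWER_KEY = ("a", "c", "b")


def big_three_score(answers):
    """Percent correct on the Big Three; wrong letters and DK/Refuse both count as incorrect."""
    if len(answers) != len(ANSWER_KEY):
        raise ValueError("expected exactly one answer per Big Three question")
    n_correct = sum(given == correct for given, correct in zip(answers, ANSWER_KEY))
    return 100 * n_correct / len(ANSWER_KEY)


# The Supplement's example: answers b, c, b -> only inflation is correct
print(round(big_three_score(["b", "c", "b"]), 2))  # 66.67
```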


Supplement for Question (4), continued: Consider these variables:

• For the respondent’s gender, female is 1 if they identify as female, and zero otherwise.
• For the respondent’s highest level of education, hs_deg is 1 if it’s a high school degree, some_col is 1 if it’s some
college (no degree), assoc_deg is 1 if it’s an Associate’s degree (2 year), bach_deg is 1 if it’s a Bachelor’s degree
(4 year), ma_deg_plus is 1 if it’s a Master’s, professional, or higher graduate degree. These dummy variables are
mutually exclusive and are 0 whenever they are not 1.
• For the respondent’s race, black is 1 for black, hisp is 1 for Hispanic, asian is 1 for Asian, oth_race is 1 for other
non-white races. As above, these are dummy variables and are mutually exclusive with respect to each other.

Table 2: Explaining Variation in Percent Correct on the Big Three Questions that Measure Financial Literacy
using the 2022 Survey of Consumer Finances (n = 4,595)

Dependent variable: Percent score on Big Three    (1)       (2)       (3)       (4)       (5)       (6)
Female                                            -15.01    -         -         -         -         -9.41
                                                  (0.93)                                          (0.91)
High school degree                                -         7.93      -         -         6.68      6.69
                                                            (1.51)                        (1.52)    (1.50)
Some college (no degree)                          -         12.68     -         -         11.09     12.07
                                                            (1.64)                        (1.68)    (1.65)
Associate’s degree (2-year)                       -         11.52     -         -         9.81      10.53
                                                            (1.77)                        (1.80)    (1.77)
Bachelor’s degree (4-year)                        -         24.76     -         -         20.73     20.17
                                                            (1.45)                        (1.53)    (1.51)
Master’s, professional, or higher graduate degree -         29.60     -         -         25.29     23.96
                                                            (1.46)                        (1.56)    (1.54)
Black                                             -         -         -18.03    -         -12.22    -9.42
                                                                      (1.12)              (1.11)    (1.13)
Hispanic                                          -         -         -14.29    -         -6.04     -5.14
                                                                      (1.18)              (1.21)    (1.22)
Asian                                             -         -         3.38      -         0.09      -0.03
                                                                      (1.51)              (1.46)    (1.44)
Other non-white race                              -         -         -11.76    -         -7.19     -5.85
                                                                      (3.07)              (2.95)    (2.90)
Age (in years)                                    -         -         -         1.3009    -         0.9313
                                                                                (0.1526)            (0.1405)
Age (in years) squared                            -         -         -         -0.0106   -         -0.0079
                                                                                (0.0014)            (0.0013)
Constant                                          80.43     59.29     81.51     40.17     64.77     41.56
                                                  (0.46)    (1.24)    (0.50)    (3.95)    (1.40)    (3.94)
R²                                                0.0535    0.1324    0.0775    0.0209    0.1566    0.1865

Notes: Each column reports a separate regression where the number of observations is 4,595 survey respondents. Standard
errors are in parentheses. For highest level of education, the omitted (reference) category is less than a high school degree.
For race, the omitted (reference) category is white. Some parts are in boldface for easy reference.

Supplement for Question (5): Parents’ Beliefs about Their Children’s Academic Ability.

Recall Dizon-Ross (2019) “Parents’ Beliefs about Their Children’s Academic Ability: Implications for Educational
Investments,” AER, Vol. 109, No. 8.

Excerpt, p.2744: Table 1 presents summary statistics and tests for balance across the treatment and control groups…
The differences between the treatment and control groups are never large, with a joint test of equality failing to reject
the null that all are 0 (p-value 0.67).

Table 1—Baseline Summary Statistics

                                  Full sample                         Treat – Control
                                  Mean     St. Dev. Control  Treat    Mean     St. Dev. P value
                                                    mean     mean                       T=C
Academic performance (average achievement scores)
  Overall score                   46.8     17.5     47.1     46.4     −0.74    0.46     0.11
  Math score                      44.9     20.2     45.4     44.4     −1.08    0.54     0.04
  English score                   44.2     20.1     44.5     43.9     −0.56    0.53     0.29
  (Math − English) score          0.71     19.5     0.93     0.5      −0.53    0.51     0.3
Sample sizes
  Sample size: Households         2,634             1,327    1,307
  Sample size: Kids               5,268             2,654    2,614

Supplement for Question (6): Adolescents’ Gender Attitudes.

Recall Dhar, Jain and Jayachandran (2022): “Reshaping Adolescents’ Gender Attitudes: Evidence from a School-Based
Experiment in India.”

Abstract: This paper evaluates an intervention in India that engaged adolescent girls and boys in classroom discussions
about gender equality for two years, aiming to reduce their support for societal norms that restrict women’s and girls’
opportunities. Using a randomized controlled trial, we find that the program made attitudes more supportive of gender
equality by 0.18 standard deviations, or, equivalently, converted 16 percent of regressive attitudes. When we resurveyed
study participants two years after the intervention had ended, the effects had persisted. The program also led to more
gender-equal self-reported behavior, and we find weak evidence that it affected two revealed-preference measures.

Excerpt, p. 917: Among girls, the intervention made personal attitudes about female employment more progressive by 8
percentage points (Table 6, column 1) but did not significantly increase their perception that others in the community
hold that gender-progressive view (column 2). In contrast, among boys, not only is there a treatment effect on their
personal attitude, but there is also a significant increase in how progressive they view the community to be.

Table 6 is reproduced below. The dependent variable for each column is 1 if the student agrees with the statement
written at the top of each column and 0 otherwise. The treatment effect is represented by the variable “Treated.”

Supplement for Question (7): Coordination and Organization Design.

Consider Dessein, Lo and Minami (2022): “Coordination and Organization Design: Theory and Micro-Evidence.”

Abstract: We explore the relationship between the volatility of a firm’s local environment and its organizational
structure. Using micro-level data on managers working for a large retailer [in Japan in 2017], we empirically test and
provide support for our theory that a more volatile local environment results in more decentralization only when the need
for coordination among subunits is low. In contrast, more local volatility is associated with more centralization when
coordination needs are high. Our evidence supports the argument that centralized organizations are better at adapting
to local shocks when coordination is important.

Excerpt, p. 820: We begin by briefly describing the variables we used in our empirical analysis. While some of our
measures are cardinal (e.g., task delegation, sales deviations, age), other variables are ordinal and are reported by
managers on a 1–7 scale (e.g., demand uncertainty, need for coordination). See Table 2 for detailed descriptions and
their summary statistics.

Table 2 is truncated on purpose to display only the variables of interest, which are highlighted.

Excerpt, p. 826: The upward-sloping gray line [in Figure 3] depicts the case when the Need for coordination is low (tenth
percentile): an increase in Demand uncertainty from low (tenth percentile = 1) to high (ninetieth percentile = 5)
increases Task delegation by about 28 percent. On the other hand, the downward-sloping black line shows that when
Need for coordination is high (ninetieth percentile), an increase in Demand uncertainty from low to high decreases Task
delegation by approximately 10 percent.

Figure 3

Stata regression output

In the regression output below, the dependent variable is “task delegation” as described in Table 2. However, the
variable “Need for coordination” described in Table 2 is replaced with a dummy, “High need for coordination,” which equals 1 if
“Need for coordination” = 7 and 0 otherwise (levels 1 to 6, which you can refer to as “low”).
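Sample mean: $\bar{X} = \frac{\sum_{i=1}^{n} x_i}{n}$
Sample variance: $s^2 = \frac{\sum_{i=1}^{n} (x_i - \bar{X})^2}{n-1} = \frac{\sum_{i=1}^{n} x_i^2}{n-1} - \frac{\left(\sum_{i=1}^{n} x_i\right)^2}{n(n-1)}$
Sample s.d.: $s = \sqrt{s^2}$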
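As a check on these formulas, the Python sketch below computes the sample variance in both the definitional form and the shortcut form and confirms that they agree. The data values and the function name are made up for illustration.

```python
import math


def sample_summary(xs):
    """Return the sample mean, variance (both aid-sheet forms), and standard deviation."""
    n = len(xs)
    mean = sum(xs) / n
    # Definitional form: squared deviations from the mean, divided by n - 1
    var_definitional = sum((x - mean) ** 2 for x in xs) / (n - 1)
    # Shortcut form: sum of squares / (n - 1) minus (sum)^2 / (n(n - 1))
    var_shortcut = sum(x * x for x in xs) / (n - 1) - sum(xs) ** 2 / (n * (n - 1))
    return mean, var_definitional, var_shortcut, math.sqrt(var_definitional)


mean, var_def, var_short, sd = sample_summary([2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0])
print(mean, var_def, var_short, sd)
```

The two variance forms are algebraically identical. In floating point, however, the shortcut form can lose precision when the values are large relative to their spread, which is one reason software tends to prefer the definitional form.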

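Sample coefficient of variation: $CV = \frac{s}{\bar{X}}$
Sample covariance: $s_{xy} = \frac{\sum_{i=1}^{n} (x_i - \bar{X})(y_i - \bar{Y})}{n-1} = \frac{\sum_{i=1}^{n} x_i y_i}{n-1} - \frac{\left(\sum_{i=1}^{n} x_i\right)\left(\sum_{i=1}^{n} y_i\right)}{n(n-1)}$

Sample interquartile range: $IQR = Q_3 - Q_1$
Sample coefficient of correlation: $r = \frac{s_{xy}}{s_x s_y} = \frac{\sum_{i=1}^{n} z_{x_i} z_{y_i}}{n-1}$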
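The sketch below applies these formulas to a small made-up data set. It computes the sample covariance and the coefficient of variation of x. It also computes r twice, once as $s_{xy}/(s_x s_y)$ and once as the sum of products of z-scores divided by n − 1. It uses `statistics.stdev` from the Python standard library, which, like the aid sheet, divides by n − 1. The function name is hypothetical.

```python
import statistics


def describe_pair(xs, ys):
    """Sample covariance, correlation (two equivalent forms), and CV of xs."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    s_x, s_y = statistics.stdev(xs), statistics.stdev(ys)  # both use the n - 1 divisor
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / (n - 1)
    r = s_xy / (s_x * s_y)
    # Equivalent form: sum of products of z-scores, divided by n - 1
    r_z = sum(((x - x_bar) / s_x) * ((y - y_bar) / s_y) for x, y in zip(xs, ys)) / (n - 1)
    cv_x = s_x / x_bar
    return s_xy, r, r_z, cv_x


xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.0, 4.0, 5.0, 4.0, 5.0]
s_xy, r, r_z, cv_x = describe_pair(xs, ys)
print(s_xy, r, r_z, cv_x)
```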

Addition rule: $P(A \text{ or } B) = P(A) + P(B) - P(A \text{ and } B)$
Conditional probability: $P(A|B) = \frac{P(A \text{ and } B)}{P(B)}$

Complement rules: $P(A^C) = P(A') = 1 - P(A)$ and $P(A^C|B) = P(A'|B) = 1 - P(A|B)$

Multiplication rule: $P(A \text{ and } B) = P(A|B)P(B) = P(B|A)P(A)$

Expected value: $E[X] = \mu = \sum_{\text{all } x} x\,p(x)$
Variance: $V[X] = E[(X - \mu)^2] = \sigma^2 = \sum_{\text{all } x} (x - \mu)^2 p(x)$

Covariance: $COV[X, Y] = E[(X - \mu_X)(Y - \mu_Y)] = \sigma_{XY} = \sum_{\text{all } x} \sum_{\text{all } y} (x - \mu_X)(y - \mu_Y)\,p(x, y)$

Laws of expected value:
$E[c] = c$
$E[X + c] = E[X] + c$
$E[cX] = cE[X]$
$E[a + bX + cY] = a + bE[X] + cE[Y]$

Laws of variance:
$V[c] = 0$
$V[X + c] = V[X]$
$V[cX] = c^2 V[X]$
$V[a + bX + cY] = b^2 V[X] + c^2 V[Y] + 2bc \cdot COV[X, Y]$
$V[a + bX + cY] = b^2 V[X] + c^2 V[Y] + 2bc \cdot SD(X) \cdot SD(Y) \cdot \rho$, where $\rho = CORRELATION[X, Y]$

Laws of covariance:
$COV[X, c] = 0$
$COV[a + bX, c + dY] = bd \cdot COV[X, Y]$
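The definitions of expected value and variance can be checked numerically, along with the laws $E[a + bX] = a + bE[X]$ and $V[a + bX] = b^2 V[X]$. The sketch below uses a fair six-sided die as an illustrative discrete distribution. It uses exact fractions so that the equalities hold without rounding error. The helper names are hypothetical.

```python
from fractions import Fraction


def expected_value(pmf):
    """E[X] = sum over all x of x * p(x)."""
    return sum(x * p for x, p in pmf.items())


def variance(pmf):
    """V[X] = sum over all x of (x - mu)^2 * p(x)."""
    mu = expected_value(pmf)
    return sum((x - mu) ** 2 * p for x, p in pmf.items())


# Fair die: each face 1..6 has probability 1/6 (exact fractions)
die = {x: Fraction(1, 6) for x in range(1, 7)}

# Linear transformation Y = a + bX with a = 3, b = 2 (the map is one-to-one, so probabilities carry over)
a, b = 3, 2
transformed = {a + b * x: p for x, p in die.items()}

print(expected_value(die), variance(die))  # 7/2 35/12
print(expected_value(transformed) == a + b * expected_value(die))
print(variance(transformed) == b ** 2 * variance(die))
```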

Combinatorial formula: $C_x^n = \frac{n!}{x!(n-x)!}$
Binomial probability: $p(x) = \frac{n!}{x!(n-x)!} p^x (1-p)^{n-x}$ for $x = 0, 1, 2, \ldots, n$

If $X$ is Binomial ($X \sim B(n, p)$) then $E[X] = np$ and $V[X] = np(1-p)$
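The binomial formulas can be checked with `math.comb` (Python 3.8+). The probabilities should sum to 1, and the mean and variance computed directly from the pmf should match np and np(1 − p). The parameters n = 10 and p = 0.3 are arbitrary illustrative values.

```python
from math import comb


def binomial_pmf(x, n, p):
    """p(x) = n! / (x!(n - x)!) * p^x * (1 - p)^(n - x)."""
    return comb(n, x) * p ** x * (1 - p) ** (n - x)


n, p = 10, 0.3
pmf = [binomial_pmf(x, n, p) for x in range(n + 1)]

# Mean and variance from the definitions E[X] = sum x p(x), V[X] = sum (x - mu)^2 p(x)
mean = sum(x * px for x, px in enumerate(pmf))
var = sum((x - mean) ** 2 * px for x, px in enumerate(pmf))
print(sum(pmf), mean, var)
```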

If $X$ is Uniform ($X \sim U[a, b]$) then $f(x) = \frac{1}{b-a}$ and $E[X] = \frac{a+b}{2}$ and $V[X] = \frac{(b-a)^2}{12}$

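Sampling distribution of $\bar{X}$:
$\mu_{\bar{X}} = E[\bar{X}] = \mu$
$\sigma_{\bar{X}}^2 = V[\bar{X}] = \frac{\sigma^2}{n}$
$\sigma_{\bar{X}} = SD[\bar{X}] = \frac{\sigma}{\sqrt{n}}$

Sampling distribution of $\hat{P}$:
$\mu_{\hat{P}} = E[\hat{P}] = p$
$\sigma_{\hat{P}}^2 = V[\hat{P}] = \frac{p(1-p)}{n}$
$\sigma_{\hat{P}} = SD[\hat{P}] = \sqrt{\frac{p(1-p)}{n}}$

Sampling distribution of $(\hat{P}_2 - \hat{P}_1)$:
$\mu_{\hat{P}_2 - \hat{P}_1} = E[\hat{P}_2 - \hat{P}_1] = p_2 - p_1$
$\sigma_{\hat{P}_2 - \hat{P}_1}^2 = V[\hat{P}_2 - \hat{P}_1] = \frac{p_2(1-p_2)}{n_2} + \frac{p_1(1-p_1)}{n_1}$
$\sigma_{\hat{P}_2 - \hat{P}_1} = SD[\hat{P}_2 - \hat{P}_1] = \sqrt{\frac{p_2(1-p_2)}{n_2} + \frac{p_1(1-p_1)}{n_1}}$

Sampling distribution of $(\bar{X}_1 - \bar{X}_2)$, independent samples:
$\mu_{\bar{X}_1 - \bar{X}_2} = E[\bar{X}_1 - \bar{X}_2] = \mu_1 - \mu_2$
$\sigma_{\bar{X}_1 - \bar{X}_2}^2 = V[\bar{X}_1 - \bar{X}_2] = \frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}$
$\sigma_{\bar{X}_1 - \bar{X}_2} = SD[\bar{X}_1 - \bar{X}_2] = \sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}}$

Sampling distribution of $\bar{X}_d$, paired ($d = X_1 - X_2$):
$\mu_{\bar{X}_d} = E[\bar{X}_d] = \mu_1 - \mu_2$
$\sigma_{\bar{X}_d}^2 = V[\bar{X}_d] = \frac{\sigma_d^2}{n} = \frac{\sigma_1^2 + \sigma_2^2 - 2\rho\sigma_1\sigma_2}{n}$
$\sigma_{\bar{X}_d} = SD[\bar{X}_d] = \frac{\sigma_d}{\sqrt{n}} = \sqrt{\frac{\sigma_1^2 + \sigma_2^2 - 2\rho\sigma_1\sigma_2}{n}}$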

Inference about a population proportion:

$z$ test statistic: $z = \frac{\hat{P} - p_0}{\sqrt{\frac{p_0(1-p_0)}{n}}}$
CI estimator: $\hat{P} \pm z_{\alpha/2}\sqrt{\frac{\hat{P}(1-\hat{P})}{n}}$
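The sketch below implements the one-proportion z test and confidence interval using only the Python standard library. `statistics.NormalDist` supplies the standard normal CDF, used for a two-sided p-value, and its inverse, used for $z_{\alpha/2}$. The function names and example inputs are hypothetical. Note the different standard errors: the test uses $p_0$, while the interval uses $\hat{P}$.

```python
import math
from statistics import NormalDist


def one_prop_z(p_hat, p0, n):
    """z statistic; the standard error uses the hypothesized p0."""
    return (p_hat - p0) / math.sqrt(p0 * (1 - p0) / n)


def two_sided_p_value(z):
    """P(|Z| >= |z|) for a standard normal Z."""
    return 2 * (1 - NormalDist().cdf(abs(z)))


def one_prop_ci(p_hat, n, conf=0.95):
    """CI; the standard error uses p_hat because no null value is assumed."""
    z_crit = NormalDist().inv_cdf(1 - (1 - conf) / 2)  # z_{alpha/2}
    half_width = z_crit * math.sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - half_width, p_hat + half_width


# Illustrative inputs: a sample proportion of 0.55 with n = 400, testing H0: p = 0.5
z = one_prop_z(0.55, 0.5, 400)
print(z, two_sided_p_value(z), one_prop_ci(0.55, 400))
```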

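Inference about comparing two population proportions:

$z$ test statistic under the null hypothesis of no difference: $z = \frac{\hat{P}_2 - \hat{P}_1}{\sqrt{\frac{\bar{P}(1-\bar{P})}{n_1} + \frac{\bar{P}(1-\bar{P})}{n_2}}}$
Pooled proportion: $\bar{P} = \frac{X_1 + X_2}{n_1 + n_2}$

CI estimator: $(\hat{P}_2 - \hat{P}_1) \pm z_{\alpha/2}\sqrt{\frac{\hat{P}_2(1-\hat{P}_2)}{n_2} + \frac{\hat{P}_1(1-\hat{P}_1)}{n_1}}$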
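A standard-library sketch of the two-proportion formulas follows. The z statistic uses the pooled proportion because it is computed under the null of no difference. The confidence interval uses the unpooled standard error. The function names and the counts in the example are hypothetical.

```python
import math
from statistics import NormalDist


def two_prop_z(x1, n1, x2, n2):
    """z for H0: p2 - p1 = 0, with the pooled proportion in the standard error."""
    p1_hat, p2_hat = x1 / n1, x2 / n2
    p_bar = (x1 + x2) / (n1 + n2)
    se0 = math.sqrt(p_bar * (1 - p_bar) / n1 + p_bar * (1 - p_bar) / n2)
    return (p2_hat - p1_hat) / se0


def two_prop_ci(p1_hat, n1, p2_hat, n2, conf=0.95):
    """CI for p2 - p1, with the unpooled standard error."""
    z_crit = NormalDist().inv_cdf(1 - (1 - conf) / 2)  # z_{alpha/2}
    se = math.sqrt(p2_hat * (1 - p2_hat) / n2 + p1_hat * (1 - p1_hat) / n1)
    diff = p2_hat - p1_hat
    return diff - z_crit * se, diff + z_crit * se


# Illustrative counts: 40 of 100 in group 1 and 60 of 100 in group 2
print(two_prop_z(40, 100, 60, 100))
print(two_prop_ci(0.40, 100, 0.60, 100, conf=0.99))
```

The `conf` argument sets the confidence level. For example, `conf=0.999` uses $z_{\alpha/2} \approx 3.29$.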

Inference about the population mean:

$t$ test statistic: $t = \frac{\bar{X} - \mu_0}{s/\sqrt{n}}$
CI estimator: $\bar{X} \pm t_{\alpha/2}\frac{s}{\sqrt{n}}$
Degrees of freedom: $\nu = n - 1$

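Inference about comparing two population means, independent samples, unequal variances:

$t$ test statistic: $t = \frac{(\bar{X}_1 - \bar{X}_2) - \Delta_0}{\sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}}$
CI estimator: $(\bar{X}_1 - \bar{X}_2) \pm t_{\alpha/2}\sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}$

Degrees of freedom: $\nu = \frac{\left(\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}\right)^2}{\frac{1}{n_1-1}\left(\frac{s_1^2}{n_1}\right)^2 + \frac{1}{n_2-1}\left(\frac{s_2^2}{n_2}\right)^2}$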
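The unequal-variance degrees-of-freedom formula is the easiest to get wrong by hand, so here is a sketch that computes it together with the t statistic. The function name and inputs are hypothetical. With equal sample sizes and equal standard deviations, the formula reduces to $n_1 + n_2 - 2$. In general it falls between $\min(n_1, n_2) - 1$ and $n_1 + n_2 - 2$.

```python
import math


def welch_t(x1_bar, s1, n1, x2_bar, s2, n2, delta0=0.0):
    """t statistic and degrees of freedom for two means, unequal variances."""
    v1, v2 = s1 ** 2 / n1, s2 ** 2 / n2
    t = ((x1_bar - x2_bar) - delta0) / math.sqrt(v1 + v2)
    df = (v1 + v2) ** 2 / (v1 ** 2 / (n1 - 1) + v2 ** 2 / (n2 - 1))
    return t, df


t_equal, df_equal = welch_t(5.0, 2.0, 10, 4.0, 2.0, 10)      # df should be 18
t_unequal, df_unequal = welch_t(10.0, 3.0, 15, 8.0, 6.0, 20)  # df is not an integer in general
print(t_equal, df_equal, t_unequal, df_unequal)
```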

Inference about comparing two population means, independent samples, assuming equal variances:

$t$ test statistic: $t = \frac{(\bar{X}_1 - \bar{X}_2) - \Delta_0}{\sqrt{\frac{s_p^2}{n_1} + \frac{s_p^2}{n_2}}}$
CI estimator: $(\bar{X}_1 - \bar{X}_2) \pm t_{\alpha/2}\sqrt{\frac{s_p^2}{n_1} + \frac{s_p^2}{n_2}}$
Degrees of freedom: $\nu = n_1 + n_2 - 2$

Pooled variance: $s_p^2 = \frac{(n_1-1)s_1^2 + (n_2-1)s_2^2}{n_1 + n_2 - 2}$
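For the equal-variances case, the pooled variance is a degrees-of-freedom-weighted average of the two sample variances, so it always lies between them. A minimal sketch follows; the function name and inputs are hypothetical.

```python
import math


def pooled_t(x1_bar, s1, n1, x2_bar, s2, n2, delta0=0.0):
    """Equal-variance t test: pooled variance, t statistic, and degrees of freedom."""
    sp2 = ((n1 - 1) * s1 ** 2 + (n2 - 1) * s2 ** 2) / (n1 + n2 - 2)
    t = ((x1_bar - x2_bar) - delta0) / math.sqrt(sp2 / n1 + sp2 / n2)
    return sp2, t, n1 + n2 - 2


sp2, t, df = pooled_t(5.0, 2.0, 10, 4.0, 4.0, 6)
print(sp2, t, df)
```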

Inference about comparing two population means, paired data ($n$ is the number of pairs and $d = X_1 - X_2$):
$t$ test statistic: $t = \frac{\bar{d} - \Delta_0}{s_d/\sqrt{n}}$
CI estimator: $\bar{X}_d \pm t_{\alpha/2}\frac{s_d}{\sqrt{n}}$
Degrees of freedom: $\nu = n - 1$

SIMPLE REGRESSION:

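Model: $y_i = \beta_0 + \beta_1 x_i + \varepsilon_i$
OLS line: $\hat{y}_i = b_0 + b_1 x_i$, where $b_1 = \frac{s_{xy}}{s_x^2} = r\frac{s_y}{s_x}$ and $b_0 = \bar{Y} - b_1\bar{X}$

Coefficient of determination: $R^2 = (r)^2$
Residuals: $e_i = y_i - \hat{y}_i$

Standard deviation of residuals: $s_e = \sqrt{\frac{SSE}{n-2}} = \sqrt{\frac{\sum_{i=1}^{n}(e_i - 0)^2}{n-2}}$
Standard error of slope: $s.e.(b_1) = s_{b_1} = \frac{s_e}{\sqrt{(n-1)s_x^2}}$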
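The simple-regression formulas can be chained into one small function, sketched below. The data set and function name are made up for illustration. The last line forms the t statistic for $H_0: \beta_1 = 0$, whose degrees of freedom are n − 2. Looking up the critical value still requires a t table, because the Python standard library has no t distribution.

```python
import math
import statistics


def simple_ols(xs, ys):
    """OLS intercept and slope, plus R^2, s_e, and s.e.(b1), following the aid-sheet formulas."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    s_x2 = statistics.variance(xs)  # sample variance of x (n - 1 divisor)
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / (n - 1)
    b1 = s_xy / s_x2
    b0 = y_bar - b1 * x_bar
    residuals = [y - (b0 + b1 * x) for x, y in zip(xs, ys)]
    sse = sum(e ** 2 for e in residuals)
    s_e = math.sqrt(sse / (n - 2))
    se_b1 = s_e / math.sqrt((n - 1) * s_x2)
    r = s_xy / math.sqrt(s_x2 * statistics.variance(ys))
    return b0, b1, r ** 2, s_e, se_b1


b0, b1, r2, s_e, se_b1 = simple_ols([1.0, 2.0, 3.0, 4.0, 5.0], [2.0, 4.0, 5.0, 4.0, 5.0])
t_stat = (b1 - 0) / se_b1  # t for H0: beta_1 = 0, nu = n - 2
print(b0, b1, r2, s_e, se_b1, t_stat)
```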

Inference about the population slope:


$t$ test statistic: $t = \frac{b_1 - \beta_{10}}{s.e.(b_1)}$
CI estimator: $b_1 \pm t_{\alpha/2}\, s.e.(b_1)$
Degrees of freedom: $\nu = n - 2$
Standard error of slope: $s.e.(b_1) = s_{b_1} = \frac{s_e}{\sqrt{(n-1)s_x^2}}$

Prediction interval for 𝒚 at given value of 𝒙 (𝒙𝒈 ):

$\hat{y}_{x_g} \pm t_{\alpha/2}\, s_e\sqrt{1 + \frac{1}{n} + \frac{(x_g - \bar{X})^2}{(n-1)s_x^2}}$ or $\hat{y}_{x_g} \pm t_{\alpha/2}\sqrt{(s.e.(b_1))^2 (x_g - \bar{X})^2 + \frac{s_e^2}{n} + s_e^2}$

Degrees of freedom: $\nu = n - 2$

Confidence interval for predicted mean at given value of 𝒙 (𝒙𝒈 ):

$\hat{y}_{x_g} \pm t_{\alpha/2}\, s_e\sqrt{\frac{1}{n} + \frac{(x_g - \bar{X})^2}{(n-1)s_x^2}}$ or $\hat{y}_{x_g} \pm t_{\alpha/2}\sqrt{(s.e.(b_1))^2 (x_g - \bar{X})^2 + \frac{s_e^2}{n}}$
Degrees of freedom: $\nu = n - 2$
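Each interval above is written in two forms, which are equivalent because $(s.e.(b_1))^2 = s_e^2/((n-1)s_x^2)$. The sketch below computes the standard-error part of each interval in both forms and checks that they agree. It returns only the quantity that multiplies $t_{\alpha/2}$. The critical value must come from a t table with $\nu = n - 2$, or from `scipy.stats.t.ppf` if SciPy is available. The function name and inputs are hypothetical.

```python
import math


def interval_standard_errors(s_e, n, x_bar, s_x2, x_g):
    """Return (se_mean, se_pred): the terms multiplied by t_{alpha/2} in each interval."""
    se_b1 = s_e / math.sqrt((n - 1) * s_x2)
    # Confidence interval for the predicted mean at x_g, in both aid-sheet forms
    se_mean = s_e * math.sqrt(1 / n + (x_g - x_bar) ** 2 / ((n - 1) * s_x2))
    se_mean_alt = math.sqrt(se_b1 ** 2 * (x_g - x_bar) ** 2 + s_e ** 2 / n)
    # Prediction interval for an individual y at x_g: adds s_e^2 under the root
    se_pred = s_e * math.sqrt(1 + 1 / n + (x_g - x_bar) ** 2 / ((n - 1) * s_x2))
    se_pred_alt = math.sqrt(se_b1 ** 2 * (x_g - x_bar) ** 2 + s_e ** 2 / n + s_e ** 2)
    assert math.isclose(se_mean, se_mean_alt) and math.isclose(se_pred, se_pred_alt)
    return se_mean, se_pred


se_mean, se_pred = interval_standard_errors(math.sqrt(0.8), 5, 3.0, 2.5, 4.0)
print(se_mean, se_pred)
```

The prediction interval is always wider, because it adds the scatter of an individual observation ($s_e^2$) to the uncertainty about the mean.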

SIMPLE & MULTIPLE REGRESSION:

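Model: $y_i = \beta_0 + \beta_1 x_{1i} + \beta_2 x_{2i} + \cdots + \beta_k x_{ki} + \varepsilon_i$

$SST = \sum_{i=1}^{n}(y_i - \bar{Y})^2 = SSR + SSE$    $SSR = \sum_{i=1}^{n}(\hat{y}_i - \bar{Y})^2$    $SSE = \sum_{i=1}^{n} e_i^2 = \sum_{i=1}^{n}(y_i - \hat{y}_i)^2$

$s_y^2 = \frac{SST}{n-1}$    $MSE = \frac{SSE}{n-k-1}$    $\text{Root } MSE = \sqrt{\frac{SSE}{n-k-1}}$    $MSR = \frac{SSR}{k}$

$R^2 = \frac{SSR}{SST} = 1 - \frac{SSE}{SST}$    $\text{Adj. } R^2 = 1 - \frac{SSE/(n-k-1)}{SST/(n-1)} = \left(R^2 - \frac{k}{n-1}\right)\left(\frac{n-1}{n-k-1}\right)$

Residuals: $e_i = y_i - \hat{y}_i$    Standard deviation of residuals: $s_e = \sqrt{\frac{SSE}{n-k-1}} = \sqrt{\frac{\sum_{i=1}^{n}(e_i - 0)^2}{n-k-1}}$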
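The two expressions for adjusted R² are algebraically identical. Both equal $1 - (1 - R^2)\frac{n-1}{n-k-1}$, so adding regressors (raising k) is penalized unless R² rises enough. The sketch below starts from illustrative sums of squares, computes the goodness-of-fit quantities, and checks that the two forms agree. The function name is hypothetical.

```python
import math


def fit_summary(sst, sse, n, k):
    """R^2, adjusted R^2 (both aid-sheet forms), MSE, and Root MSE from sums of squares."""
    ssr = sst - sse
    r2 = ssr / sst
    adj_r2 = 1 - (sse / (n - k - 1)) / (sst / (n - 1))
    adj_r2_alt = (r2 - k / (n - 1)) * ((n - 1) / (n - k - 1))
    mse = sse / (n - k - 1)
    return r2, adj_r2, adj_r2_alt, mse, math.sqrt(mse)


r2, adj_r2, adj_r2_alt, mse, root_mse = fit_summary(sst=100.0, sse=40.0, n=50, k=3)
print(r2, adj_r2, adj_r2_alt, mse, root_mse)
```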

Inference about the overall statistical significance of the regression model:


$F = \frac{R^2/k}{(1-R^2)/(n-k-1)} = \frac{(SST-SSE)/k}{SSE/(n-k-1)} = \frac{SSR/k}{SSE/(n-k-1)} = \frac{MSR}{MSE}$

Numerator degrees of freedom: $\nu_1 = k$    Denominator degrees of freedom: $\nu_2 = n - k - 1$
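The F statistic can be computed either from R² or from the sums of squares. The sketch below does both and checks that they match. It uses the same illustrative numbers as a regression with SST = 100, SSE = 40, n = 50, and k = 3, for which R² = 0.6. The function names are hypothetical.

```python
def overall_f(r2, n, k):
    """F statistic for H0: all k slopes are zero, with (numerator, denominator) df."""
    f = (r2 / k) / ((1 - r2) / (n - k - 1))
    return f, (k, n - k - 1)


def overall_f_from_ss(sst, sse, n, k):
    """Same F statistic written as MSR / MSE."""
    msr = (sst - sse) / k
    mse = sse / (n - k - 1)
    return msr / mse


f, dfs = overall_f(0.6, 50, 3)
print(f, dfs, overall_f_from_ss(100.0, 40.0, 50, 3))
```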

Inference about the population slope for explanatory variable j:


$t$ test statistic: $t = \frac{b_j - \beta_{j0}}{s_{b_j}}$
CI estimator: $b_j \pm t_{\alpha/2}\, s_{b_j}$
Degrees of freedom: $\nu = n - k - 1$

Standard error of slope: $s.e.(b_j) = s_{b_j}$ (for multiple regression, must be obtained from technology)