L23 Chi Square
Are the frequencies in groups
significantly different?
CHI Square
• used when we have nominal/ordinal
data for both variables
– numbers represent a frequency in a
particular category.
• How many people are from urban vs. rural
• How many listened to Fox vs. not Fox
• Basically, when you cannot calculate means
because they wouldn’t make sense
– If half the people live in rural areas, and half live in
urban, the mean doesn’t = suburban
Example question
• Is there a relationship between voting
behavior (Republican, Democratic, Didn’t vote)
and source of news?
Republican Democratic Didn’t vote
NPR
NBC
Fox
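A table like this is filled in by tallying raw responses into cells. The following is a minimal sketch, assuming each respondent is stored as a (news source, vote) pair; the five records are made up purely for illustration and are not the lecture's data.

```python
from collections import Counter

# Hypothetical raw records, one (news source, vote) pair per respondent.
records = [
    ("NPR", "Democratic"),
    ("Fox", "Republican"),
    ("NBC", "Didn't vote"),
    ("NPR", "Democratic"),
    ("Fox", "Republican"),
]

sources = ["NPR", "NBC", "Fox"]
votes = ["Republican", "Democratic", "Didn't vote"]

# Count how many respondents fall in each (source, vote) cell.
cell_counts = Counter(records)

# Arrange the counts as rows = news source, columns = vote.
table = [[cell_counts[(s, v)] for v in votes] for s in sources]

for source, row in zip(sources, table):
    print(source, row)
```

Each number in the resulting table is a frequency, which is exactly the kind of data chi square works on.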
• DV: interval/ratio & normal distribution? YES
– IV: categories, 1 IV, 2 levels: t test
– IV: categories, 1 IV, 3+ levels: One-way ANOVA
– IV: categories, 2+ IVs: Two-way ANOVA
– IV: categories, IV(s) & covariate(s): ANCOVA
– IV: continuous: Correlation
• DV: interval/ratio & normal distribution? NO (DV: categories)
– IV: categories: Chi Square
The logic of the test
• What would we expect if there were no
difference between the groups?
• What would we expect if there were no
relationship between the variables?
• How far off are the data from what we’d
expect?
– Is this difference big enough for us to conclude
that it’s unlikely to be random error?
• What would be the null hypothesis?
• What would be the alternative?
       Republican  Democratic  Didn’t vote
NPR
NBC
Fox
Same procedure as before
• Literature review to formulate hypothesis.
• Gather data
• Choose statistic & calculate critical value
based on degrees of freedom
• Calculate statistic for sample & compare to
critical value
• If bigger, reject the null hypothesis
• χ² = ∑ [ (Observed - Expected)² / Expected ]
• Take observed - expected for each cell
• Square that difference
• Divide by what was expected for that cell.
• Repeat for each cell
• Add them all up!
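The steps above can be sketched as a short function. This is only an illustration, assuming the observed and expected frequencies are given as two same-shaped lists of rows; the function name `chi_square` and the two-cell example are made up for the sketch.

```python
def chi_square(observed, expected):
    """Sum (O - E)^2 / E over every cell of two same-shaped tables."""
    total = 0.0
    for obs_row, exp_row in zip(observed, expected):
        for o, e in zip(obs_row, exp_row):
            difference = o - e             # observed - expected
            total += difference ** 2 / e   # square, divide by expected, add
    return total


# Tiny made-up example: one row with two cells.
# (20 - 25)^2 / 25 + (30 - 25)^2 / 25 = 1 + 1
print(chi_square([[20, 30]], [[25, 25]]))  # 2.0
```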
How do we get the critical value?
• What are the degrees of freedom?
• (A-1) x (B-1)
– A = # of levels of factor A
– B = # of levels of factor B
• α = .05
• For a 3 x 3 table: df = (3-1) x (3-1) = 4, so the critical value is 9.49

             Republican  Democratic  Didn’t vote
Daily Show
CNN
Broadcast
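The degrees-of-freedom rule and the table lookup can be sketched as follows. The dictionary holds the standard chi-square critical values for α = .05 as they appear in a printed table (rounded to two decimals); the names `CRITICAL_05` and `degrees_of_freedom` are assumptions made for this sketch.

```python
# Standard chi-square critical values for alpha = .05, keyed by df
# (rounded to two decimals, as in a printed table).
CRITICAL_05 = {1: 3.84, 2: 5.99, 3: 7.81, 4: 9.49, 5: 11.07, 6: 12.59}


def degrees_of_freedom(levels_a, levels_b):
    """(A - 1) x (B - 1), where A and B are the numbers of levels."""
    return (levels_a - 1) * (levels_b - 1)


# Three news sources by three voting categories.
df = degrees_of_freedom(3, 3)
critical_value = CRITICAL_05[df]
print(df, critical_value)
```

Note that the sample size never enters this calculation; only the number of rows and columns does.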
What would we expect if there’s
no relationship?
        Republican  Democratic  Didn’t vote  Total
NPR                                            245
NBC                                            260
Fox                                            125
Total      220         300         110         630
What would we expect if there’s
no relationship? (don’t round too early!)
        Republican         Democrat           Didn’t vote  Total
NPR     (245 x 220) / 630  (245 x 300) / 630                 245
NBC                                                          260
Fox                                                          125
Total   220                300                110            630
What would we expect if there’s
no relationship?
(don’t round too early!)
        Republican                   Democrat                      Didn’t vote                  Total
NPR     (245 x 220) / 630 = 85.556   (245 x 300) / 630 = 116.667   (245 x 110) / 630 = 42.778     245
NBC     (260 x 220) / 630 = 90.794   (260 x 300) / 630 = 123.810   (260 x 110) / 630 = 45.397     260
Fox     (125 x 220) / 630 = 43.651   (125 x 300) / 630 = 59.524    (125 x 110) / 630 = 21.825     125
Total   220                          300                           110                            630
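The expected frequencies come from the margins alone: row total times column total, divided by N. A minimal sketch using the totals from the table; the variable names are chosen only for this illustration.

```python
# Row and column totals taken from the table.
row_totals = {"NPR": 245, "NBC": 260, "Fox": 125}
col_totals = {"Republican": 220, "Democrat": 300, "Didn't vote": 110}
n = sum(row_totals.values())  # 630

# Expected count for a cell = (row total x column total) / N.
# Values are kept at full precision; rounding happens only when printing.
expected = {
    source: {vote: r * c / n for vote, c in col_totals.items()}
    for source, r in row_totals.items()
}

for source, row in expected.items():
    print(source, {vote: round(e, 3) for vote, e in row.items()})
```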
What did we find/observe?
        Republican  Democrat  Didn’t vote  Total
NPR         60         150        35         245
NBC         60         140        60         260
Fox        100          10        15         125
Total      220         300       110         630
So calculate your chi square!
• Follow the formula
• χ² = ∑ [ (Observed - Expected)² / Expected ]
• For each cell
– Observed - expected
– Square the difference
– Divide by expected for that cell
– Add them up!
(Observed - expected)² / expected
       Republican                Democrat                     Didn’t vote
NPR    (60 - 85.556)² / 85.556   (150 - 116.667)² / 116.667
NBC
Fox
(Observed - expected)² / expected
(computed from the unrounded expected frequencies)
       Republican                  Democrat                     Didn’t vote
NPR    653.09 / 85.556 = 7.633     1111.11 / 116.667 = 9.524    60.49 / 42.778 = 1.414
NBC    948.25 / 90.794 = 10.444    262.13 / 123.810 = 2.117     213.25 / 45.397 = 4.698
Fox    3175.23 / 43.651 = 72.742   2452.61 / 59.524 = 41.204    46.59 / 21.825 = 2.134
• Add the cells together
• χ² = 151.91
• This is WAY bigger than critical value of 9.49
• so we reject the null.
• Conclude there are differences in patterns of viewing
among Republicans, Dems, and nonvoters.
• Can often tell where differences are by looking at the table
– Need post-hoc tests for statistical assessment
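The whole calculation can be checked in a few lines. This is a sketch rather than a substitute for a statistics package: it uses the observed counts from the table and the critical value of 9.49 (df = 4, α = .05) quoted above, and it carries the expected frequencies at full precision, which gives a total of about 151.91.

```python
# Observed counts; columns are Republican, Democrat, Didn't vote.
observed = [
    [60, 150, 35],   # NPR
    [60, 140, 60],   # NBC
    [100, 10, 15],   # Fox
]

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
n = sum(row_totals)

chi2 = 0.0
for i, row in enumerate(observed):
    for j, o in enumerate(row):
        e = row_totals[i] * col_totals[j] / n   # expected, unrounded
        chi2 += (o - e) ** 2 / e

critical_value = 9.49  # from the slide: df = 4, alpha = .05
reject_null = chi2 > critical_value
print(round(chi2, 2), reject_null)
```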
The results of the chi-square test indicated a significant relationship
between voting patterns and news source, χ²(4) = 151.91, N = 630, p
< .001. Those voting Republican were most likely to get their news
from Fox (45.5%), those voting for Democratic candidates were most
likely to get their news from…
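Percentages like the 45.5% in the write-up are column percentages: each cell divided by its voting-group total. A minimal sketch using the observed counts; the dictionary layout and variable names are assumptions for this illustration.

```python
# Observed counts from the lecture table.
observed = {
    "NPR": {"Republican": 60, "Democrat": 150, "Didn't vote": 35},
    "NBC": {"Republican": 60, "Democrat": 140, "Didn't vote": 60},
    "Fox": {"Republican": 100, "Democrat": 10, "Didn't vote": 15},
}
votes = ["Republican", "Democrat", "Didn't vote"]

# Total number of people in each voting group (column totals).
col_totals = {v: sum(observed[s][v] for s in observed) for v in votes}

# Percentage of each voting group using each news source.
percent = {
    v: {s: 100 * observed[s][v] / col_totals[v] for s in observed}
    for v in votes
}

for v in votes:
    top_source = max(percent[v], key=percent[v].get)
    print(v, top_source, round(percent[v][top_source], 1))
```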
What about moderation?
• Run chi square test within levels of third variable
to see if the relationship varies.
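One way to picture this: compute a separate chi square for each level of the third variable and compare them. The sketch below assumes the data have already been split into one table per level; the level names and all counts are made up for illustration and do not come from the lecture.

```python
def chi_square_test(observed):
    """Return the chi-square statistic for a table of observed counts."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    n = sum(row_totals)
    total = 0.0
    for i, row in enumerate(observed):
        for j, o in enumerate(row):
            e = row_totals[i] * col_totals[j] / n
            total += (o - e) ** 2 / e
    return total


# Hypothetical 2 x 2 tables, one per level of a made-up third variable.
tables_by_level = {
    "level 1": [[10, 20], [20, 40]],   # rows are proportional: no relationship
    "level 2": [[30, 10], [10, 30]],   # rows differ: a relationship
}

results = {level: chi_square_test(t) for level, t in tables_by_level.items()}
for level, chi2 in results.items():
    print(level, round(chi2, 2))
```

If the statistic is large at one level and near zero at another, the relationship appears to depend on the third variable.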
Issues with chi square
- Critical value ONLY depends on number of cells (df),
NOT number of people
- Need to consider how to keep # of cells manageable
- May need to collapse categories to avoid empty or small cells
- Need at least 5 cases per cell: Not a good test when
expected frequencies are less than 5
- Use Chi Square on raw frequencies, not
proportions or percentages. Must convert proportions to
frequencies using sample size
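Two of these checks are easy to automate. The sketch below flags cells whose expected frequency falls under 5 and converts proportions back to frequencies; both function names are made up for this illustration, and the proportions in the example are invented.

```python
def cells_below_minimum(expected, minimum=5):
    """List (row, column, value) for expected frequencies under the minimum."""
    return [
        (i, j, e)
        for i, row in enumerate(expected)
        for j, e in enumerate(row)
        if e < minimum
    ]


def proportions_to_counts(proportions, n):
    """Convert proportions to whole-number frequencies using sample size n."""
    return [round(p * n) for p in proportions]


# Expected frequencies from the news-source example (rounded for display).
expected = [
    [85.556, 116.667, 42.778],
    [90.794, 123.810, 45.397],
    [43.651, 59.524, 21.825],
]
print(cells_below_minimum(expected))  # no cell is under 5

# Made-up example: 25% and 75% of a sample of 200 people.
print(proportions_to_counts([0.25, 0.75], 200))
```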