SCSI2143: PROBABILITY & STATISTICAL DATA ANALYSIS
CHAPTER 6
Chi-Square Test & Contingency Analysis
(Chi-Square Test for k Proportions,
Chi-Square Test of Independence Contingency Table)
1
Chi-Square Test & One-Way Contingency Table
Categories with Equal Frequencies/Probabilities
Categories with Unequal Frequencies/Probabilities
2
Multinomial Experiment
An experiment that meets the following conditions:
1. The number of trials is fixed.
2. The trials are independent.
3. All outcomes of each trial must be classified into
exactly one of several different categories.
4. The probabilities for the different categories remain
constant for each trial.
3
Multinomial Experiment (cont.)
• n identical trials
• k outcomes to each trial
• Constant outcome probability, p_k
• Independent trials
• Random variable is the count, o_k
• Example: Ask 100 People (n) which of 3
candidates (k) they will vote for.
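The voting example can be simulated as a multinomial experiment. The sketch below is only an illustration: the candidate labels, the preference probabilities, and the fixed seed are assumptions that do not come from the slide.

```python
import random
from collections import Counter

def simulate_multinomial(n, categories, probabilities, seed=0):
    """Run n independent trials; each trial lands in exactly one category."""
    rng = random.Random(seed)  # fixed seed so the run is reproducible
    outcomes = rng.choices(categories, weights=probabilities, k=n)
    tally = Counter(outcomes)
    # Report every category, including any that received zero votes.
    return {c: tally.get(c, 0) for c in categories}

# Hypothetical candidates and constant outcome probabilities (assumed values).
candidates = ["A", "B", "C"]
p = [0.3, 0.3, 0.4]

counts = simulate_multinomial(100, candidates, p)
print(counts)                 # observed count for each of the k = 3 outcomes
print(sum(counts.values()))   # always equals n
```

The counts vary with the seed, but they always sum to n, which is what makes the count in each category the random variable of interest.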
4
Goodness-of-fit Test
Goodness-of-fit test is used to test the
hypothesis that an observed frequency
distribution fits (or conforms to) some
claimed distribution.
5
Goodness-of-fit Test (cont.)
Notation:
O represents the observed frequency of an outcome
E represents the expected frequency of an outcome
k represents the number of different categories or
outcomes
n represents the total number of trials
6
Expected Frequencies
If all expected frequencies are equal:
E = n / k
Each expected frequency is the sum of all observed frequencies (n) divided by the
number of categories (k).
7
Expected Frequencies (cont.)
If the expected frequencies are not all equal:
E = n*p
each expected frequency is found by multiplying the sum
of all observed frequencies (n) by the probability for the
category (p).
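Both rules for expected frequencies can be sketched in a few lines. The sample size and the category probabilities used here are assumed values chosen for illustration, and the helper names are hypothetical.

```python
def expected_equal(n, k):
    """Expected frequency when all k categories are equally likely: E = n / k."""
    return [n / k] * k

def expected_unequal(n, probabilities):
    """Expected frequency for each category: E = n * p."""
    if abs(sum(probabilities) - 1.0) > 1e-9:
        raise ValueError("category probabilities must sum to 1")
    return [n * p for p in probabilities]

n = 100  # assumed total number of trials

equal_e = expected_equal(n, 4)                    # four equally likely categories
unequal_e = expected_unequal(n, [0.2, 0.3, 0.5])  # assumed claimed probabilities

print(equal_e)
print(unequal_e)
```

In both cases the expected frequencies add up to n, which is a quick sanity check before computing a test statistic.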
8
Expected Frequencies (cont.)
Key Question :
Are the differences between the observed values (O)
and the theoretically expected values (E) statistically
significant?
Answer:
We need to measure the discrepancy between O and E;
the test statistic will involve their difference: O - E
9
Chi-Square Test
Test statistic (the calculated χ² value):
$$\chi^2 = \sum \frac{(O - E)^2}{E}$$
Critical Values (Chi-square value from table):
1. Found in the χ² table using k - 1 degrees of freedom,
where k = number of categories.
2. Goodness-of-fit hypothesis tests are always right-tailed.
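As a rough sketch, the test statistic sums (O - E)² / E over all categories. The observed counts below are illustrative values for three categories, and the expected counts assume all categories are equally likely.

```python
def chi_square_statistic(observed, expected):
    """Sum of (O - E)^2 / E over all categories."""
    if len(observed) != len(expected):
        raise ValueError("observed and expected must have the same length")
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))

observed = [35, 20, 45]      # illustrative observed counts, k = 3
n = sum(observed)            # total number of trials
k = len(observed)            # number of categories
expected = [n / k] * k       # assumption: equal expected frequencies

chi2 = chi_square_statistic(observed, expected)
df = k - 1                   # degrees of freedom used to look up the critical value

print(round(chi2, 3), df)
```

The calculated value is then compared with the table value for k - 1 degrees of freedom; because the test is right-tailed, only a large statistic leads to rejection.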
10
Test Hypothesis
H0: No difference between observed
and expected probabilities.
H1: At least one of the probabilities is
different from the others.
11
• A close agreement between observed and expected
values will lead to a small value of χ² and a large p-value.
• A large disagreement between observed and expected
values will lead to a large value of χ² and a small p-value.
• A significantly large value of χ² will cause a rejection of
the null hypothesis of no difference between the
observed and the expected.
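The inverse relationship between the statistic and the p-value can be seen numerically. This sketch is deliberately limited: it assumes exactly 2 degrees of freedom (k = 3 categories), where the right-tail area has the closed form exp(-χ²/2). For other degrees of freedom a table or a statistics library is needed. The statistic values in the loop are arbitrary examples.

```python
import math

def p_value_df2(chi2):
    """Right-tail p-value of a chi-square statistic with exactly 2 degrees of freedom."""
    if chi2 < 0:
        raise ValueError("a chi-square statistic cannot be negative")
    # For df = 2 the chi-square distribution is exponential with mean 2.
    return math.exp(-chi2 / 2)

# Arbitrary example statistics, from perfect agreement to strong disagreement.
for chi2 in (0.0, 0.5, 5.991, 9.5):
    print(f"chi2 = {chi2:6.3f}   p-value = {p_value_df2(chi2):.4f}")
```

A statistic of 0 (perfect agreement) gives a p-value of 1, and the p-value shrinks steadily as the statistic grows.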
12
Relationships Among Components in Goodness-of-Fit Hypothesis Test
13
Chi-Square (χ²) Test for k Proportions
• Tests Equality (=) of Proportions Only
Example: p1 = 0.2, p2= 0.3, p3 = 0.5
• One variable with several levels.
• Assumptions:
Multinomial Experiment
Large Sample Size
• All expected counts ≥ 5
• Uses One-Way Contingency Table
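The large-sample assumption can be checked before running the test. A minimal sketch, assuming hypothesized proportions of 0.2, 0.3 and 0.5 and two made-up sample sizes; the function name is hypothetical.

```python
def expected_counts_ok(n, probabilities, minimum=5):
    """Return (all expected counts >= minimum, list of expected counts)."""
    expected = [n * p for p in probabilities]
    return all(e >= minimum for e in expected), expected

proportions = [0.2, 0.3, 0.5]   # hypothesized p1, p2, p3

ok_large, exp_large = expected_counts_ok(100, proportions)  # assumed n = 100
ok_small, exp_small = expected_counts_ok(20, proportions)   # assumed n = 20

print(ok_large, exp_large)
print(ok_small, exp_small)
```

With the smaller sample the first category has an expected count below 5, so the chi-square approximation would not be trusted there.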
14
One-Way Contingency Table
• Shows number of observations in k Independent
Groups (Outcomes or Variable Levels)
Outcomes (k = 3)
Candidate              Tom   Bill   Mary   Total
Number of responses     35     20     45     100
15
Finding Critical Value
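Example: What is the critical χ² value if k = 3 and α = 0.05?
df = k - 1 = 2
If o_i = e_i for every category, χ² = 0.
[Figure: χ² distribution starting at 0, with the "Do not reject H0" region below the critical value and the "Reject H0" region in the upper tail, area α = 0.05]
χ² Table (Portion), Upper Tail Area
DF    .995    …    .95      …    .05
1     ...     …    0.004    …    3.841
2     0.010   …    0.103    …    5.991
Critical value: χ² = 5.991 (row df = 2, column .05)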
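The table entries for 1 and 2 degrees of freedom can be reproduced without a table, because closed forms exist for those two cases only. This sketch relies on that fact and deliberately refuses other degrees of freedom; the function name is hypothetical.

```python
import math
from statistics import NormalDist

def chi2_critical(upper_tail_area, df):
    """Chi-square value with the given upper-tail area, for df = 1 or df = 2 only."""
    if not 0 < upper_tail_area < 1:
        raise ValueError("upper_tail_area must be strictly between 0 and 1")
    if df == 1:
        # A chi-square variable with df = 1 is the square of a standard normal.
        z = NormalDist().inv_cdf(1 - upper_tail_area / 2)
        return z * z
    if df == 2:
        # For df = 2 the upper-tail area is exp(-x / 2), so x = -2 ln(area).
        return -2 * math.log(upper_tail_area)
    raise NotImplementedError("use a chi-square table for df > 2")

k = 3
alpha = 0.05
critical = chi2_critical(alpha, k - 1)

print(round(critical, 3))                  # compare with the df = 2, .05 entry
print(round(chi2_critical(0.05, 1), 3))    # compare with the df = 1, .05 entry
```

H0 is rejected when the calculated statistic exceeds this critical value.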
16