
How To Perform A Chi-Square Test

A lesson of mine probably the I reviewed after exam...

Uploaded by ceajohn177
Copyright © All Rights Reserved

How to Perform a Chi-Square Test?

Let's say you want to know whether gender has anything to do with political party preference.
You poll 420 voters in a simple random sample to find out which political party they
prefer. The results of the survey are shown in the table below:

Observed Value   Republican   Democrat   Independent   Total
Male                    100         70            30     200
Female                  140         60            20     220
Total                   240        130            50     420

To see if gender is linked to political party preference, perform a Chi-Square test of
independence using the steps below.
Step 1: Define the Hypotheses

H0: There is no link between gender and political party preference.
H1: There is a link between gender and political party preference.

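Step 2: Calculate the Expected Values

Now you will calculate the expected frequency of each cell by multiplying its row total by
its column total and dividing by the grand total.

For example, the expected value for Male Republicans is:

$$E = \frac{(240)(200)}{420} \approx 114.29$$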

Similarly, you can calculate the expected value for each of the cells:

Expected Value   Republican   Democrat   Independent   Total
Male                 114.29      61.90         23.81     200
Female               125.71      68.10         26.19     220
Total                   240        130            50     420
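As a sketch of Step 2, the Python snippet below computes every expected value at once with row total × column total / grand total. It assumes the observed counts are stored as a list of rows without the Total row and column; `expected_counts` is a hypothetical helper name, not part of any library.

```python
def expected_counts(observed):
    """Expected counts E = (row total * column total) / grand total.

    `observed` is a list of rows of counts, without a total row or column.
    """
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    grand_total = sum(row_totals)
    return [[r * c / grand_total for c in col_totals] for r in row_totals]


# Observed counts (rows: Male, Female; columns: Republican, Democrat, Independent)
observed = [[100, 70, 30],
            [140, 60, 20]]

for label, row in zip(["Male", "Female"], expected_counts(observed)):
    print(label, [round(e, 2) for e in row])
# Male [114.29, 61.9, 23.81]
# Female [125.71, 68.1, 26.19]
```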

Step 3: Calculate (O − E)²/E for Each Cell in the Table

Now you will calculate

$$\frac{(O - E)^2}{E}$$

for each cell in the table, where
O = Observed Value
E = Expected Value

(O − E)²/E   Republican   Democrat   Independent
Male             1.7867     1.0599        1.6092
Female           1.6244     0.9634        1.4630

Step 4: Calculate the Test Statistic χ²

χ² is the sum of all the values in the last table:

χ² = 1.7867 + 1.0599 + 1.6092 + 1.6244 + 0.9634 + 1.4630
   = 8.5066
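A minimal sketch of Steps 3 and 4 in Python, using a hypothetical `chi_square_statistic` helper that sums (O − E)²/E over all cells. With the two-decimal expected values from the Step 2 table it gives about 8.507 rather than 8.5066, because the table above also rounds each cell to four decimals before adding; with unrounded expected values the statistic is about 8.503. None of these differences changes the conclusion.

```python
def chi_square_statistic(observed, expected):
    """Sum (O - E)**2 / E over every cell of two tables with the same shape."""
    return sum(
        (o - e) ** 2 / e
        for obs_row, exp_row in zip(observed, expected)
        for o, e in zip(obs_row, exp_row)
    )


observed = [[100, 70, 30],     # Male: Republican, Democrat, Independent
            [140, 60, 20]]     # Female

# Expected values rounded to two decimals, as in the Step 2 table
rounded_expected = [[114.29, 61.90, 23.81],
                    [125.71, 68.10, 26.19]]

# Unrounded expected values: row total * column total / grand total (420)
exact_expected = [[r * c / 420 for c in (240, 130, 50)] for r in (200, 220)]

print(round(chi_square_statistic(observed, rounded_expected), 3))  # 8.507
print(round(chi_square_statistic(observed, exact_expected), 3))    # 8.503
```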

Before you can conclude, you must first determine the critical value, which requires
determining the degrees of freedom.
The degrees of freedom for a test of independence equal the table's number of rows minus
one multiplied by the table's number of columns minus one, or (r-1)(c-1).
With 2 rows (Male, Female) and 3 columns (Republican, Democrat, Independent), we have
(2-1)(3-1) = 2.

Finally, compare the obtained statistic to the critical value found in the chi-square
table. For an alpha level of 0.05 and two degrees of freedom, the critical value is 5.991,
which is less than the obtained statistic of 8.5066:

8.5066 > 5.991, so reject the null hypothesis.

You can reject the null hypothesis because the obtained statistic is higher than the
critical value. This means you have sufficient evidence to say that there is an association
between gender and political party preference.
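The comparison can be sketched in Python as below. It relies on one fact specific to 2 degrees of freedom: the chi-square CDF is then 1 − e^(−x/2), so the α = 0.05 critical value is −2 ln 0.05 ≈ 5.991. The helper names are hypothetical, and `critical_value_df2` is only valid for df = 2; other degrees of freedom need a table or a statistics library.

```python
import math


def degrees_of_freedom(n_rows, n_cols):
    """(r - 1)(c - 1) for a contingency table with n_rows rows and n_cols columns."""
    return (n_rows - 1) * (n_cols - 1)


def critical_value_df2(alpha):
    """Upper-tail critical value for df = 2 only.

    With 2 degrees of freedom, P(X >= x) = exp(-x / 2), so solving
    exp(-x / 2) = alpha gives x = -2 ln(alpha).
    """
    return -2 * math.log(alpha)


chi2 = 8.5066                      # obtained statistic from Step 4
df = degrees_of_freedom(2, 3)      # 2 rows (Male, Female), 3 columns (parties)
assert df == 2, "critical_value_df2 only applies to 2 degrees of freedom"
critical = critical_value_df2(0.05)

print(df, round(critical, 3))      # 2 5.991
print("reject H0" if chi2 > critical else "fail to reject H0")  # reject H0
```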

Chi-Square Practice Problems


1. Voting Patterns
Problem
A researcher wants to know if voting preferences (party A, party B, or party C) and
gender (male, female) are related. Apply a chi-square test to the following set of data:
• Male: Party A - 30, Party B - 20, Party C - 50
• Female: Party A - 40, Party B - 30, Party C - 30

Solution
To determine if gender influences voting preferences, run a chi-square test of
independence.
Observed Values
Gender   Party A   Party B   Party C   Total
Male          30        20        50     100
Female        40        30        30     100
Total         70        50        80     200
Expected Values
Gender   Party A   Party B   Party C   Total
Male          35        25        40     100
Female        35        25        40     100
Total         70        50        80     200
Solve for (O − E)²/E:
Gender   Party A    Party B   Party C
Male     0.714286   1         2.5
Female   0.714286   1         2.5
χ² = 0.714286 + 1 + 2.5 + 0.714286 + 1 + 2.5 ≈ 8.4286
df = (r-1)(c-1) = (2-1)(3-1) = 2
Since 8.4286 > 5.991 (the critical value for α = 0.05 and 2 degrees of freedom), reject the
null hypothesis: there is sufficient evidence of an association between gender and voting
preference.
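To double-check the corrected numbers in Problem 1, the short script below recomputes the expected values and χ² directly from the observed counts. It uses plain Python with no libraries; the variable names are illustrative.

```python
# Problem 1 observed counts: rows are Male / Female, columns are Party A, B, C
observed = [[30, 20, 50],
            [40, 30, 30]]

row_totals = [sum(row) for row in observed]          # [100, 100]
col_totals = [sum(col) for col in zip(*observed)]    # [70, 50, 80]
n = sum(row_totals)                                  # 200

# Expected count for each cell: row total * column total / grand total
expected = [[r * c / n for c in col_totals] for r in row_totals]

# Chi-square statistic: sum of (O - E)^2 / E over all six cells
chi2 = sum((o - e) ** 2 / e
           for o_row, e_row in zip(observed, expected)
           for o, e in zip(o_row, e_row))

print(expected)        # [[35.0, 25.0, 40.0], [35.0, 25.0, 40.0]]
print(round(chi2, 4))  # 8.4286
```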
2. Consumer Preferences

Problem
Customers are surveyed by a company to determine whether their age group (under 20,
20-40, over 40) and their preferred product category (electronics, clothing, or food) are
related. The information gathered is:
• Under 20: Electronics - 50, Clothing - 30, Food - 20
• 20-40: Electronics - 60, Clothing - 70, Food - 50
• Over 40: Electronics - 30, Clothing - 40, Food - 80

Solution
Use a chi-square test to investigate the connection between product preference and
age group.
Observed Values
Age Group   Electronics   Clothing   Food   Total
Under 20             50         30     20     100
20-40                60         70     50     180
Over 40              30         40     80     150
Total               140        140    150     430

3. State of Health
Problem
In a sample population, a medical study examines the association between smoking
status (smoker, non-smoker) and the occurrence of lung disease (yes, no). The
information is as follows:
• Smoker: Yes - 90, No - 60
• Non-smoker: Yes - 30, No - 120
Solution
To find out if smoking status is related to the incidence of lung disease, do a chi-square
test.

4. Academic Performance
Problem
An educational researcher looks at the relationship between students' success on
standardized tests (pass, fail) and whether or not they participate in after-school
programs. The information is as follows:
• Yes (participates): Pass - 80, Fail - 20
• No (does not participate): Pass - 50, Fail - 50

Solution
Use a chi-square test to determine if involvement in after-school programs and test
scores are connected.

5. Genetic Inheritance
Problem

A geneticist investigates how a particular trait is inherited in plants and seeks to
ascertain whether the expression of a trait (trait present, trait absent) and the existence
of a genetic marker (marker present, marker absent) are significantly associated. The
information gathered is:
• Marker Present: Trait Present - 70, Trait Absent - 30
• Marker Absent: Trait Present - 40, Trait Absent - 60

Solution
Do a chi-square test to determine if there is an association between the trait's expression
and the genetic marker.

How to Solve Chi-Square Problems?


1. State the Hypotheses
• Null hypothesis (H0): There is no association between the variables.
• Alternative hypothesis (H1): There is an association between the variables.
2. Calculate the Expected Frequencies
• Use the formula: $E = \frac{\text{Row Total} \times \text{Column Total}}{\text{Grand Total}}$
3. Compute the Chi-Square Statistic
• Use the formula: $\chi^2 = \sum \frac{(O - E)^2}{E}$, where O is the observed frequency
and E is the expected frequency.
4. Determine the Degrees of Freedom (df)
• Use the formula: $df = (\text{number of rows} - 1) \times (\text{number of columns} - 1)$
5. Find the Critical Value and Compare
• Use the chi-square distribution table to find the critical value for the given df and
significance level (usually 0.05).
• Compare the chi-square statistic to the critical value: if the statistic is greater than the
critical value, reject the null hypothesis.
These practice problems help you understand how chi-square analysis tests
hypotheses and explores relationships between categorical variables in various fields.
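Steps 2 to 5 can be collected into one Python function, sketched below under a few assumptions: the table is passed without totals, only the α = 0.05 critical values for 1 to 4 degrees of freedom are included (3.841, 5.991, 7.815 and 9.488, as listed in standard chi-square tables), and no continuity correction is applied. Some software, such as SciPy's `scipy.stats.chi2_contingency`, applies Yates' correction to 2 × 2 tables by default, so its statistic for Problems 3 to 5 will differ. `chi_square_test` and `CRITICAL_005` are hypothetical names.

```python
# Upper-tail chi-square critical values at alpha = 0.05, from a standard table
CRITICAL_005 = {1: 3.841, 2: 5.991, 3: 7.815, 4: 9.488}


def chi_square_test(observed, critical_values=CRITICAL_005):
    """Apply steps 2-5 to a table of observed counts given without totals.

    Returns (statistic, degrees of freedom, critical value, reject H0?).
    Raises KeyError if the table's df is not in `critical_values`.
    """
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    grand_total = sum(row_totals)

    # Step 2: expected frequencies, E = row total * column total / grand total
    expected = [[r * c / grand_total for c in col_totals] for r in row_totals]

    # Step 3: chi-square statistic, sum of (O - E)^2 / E (no continuity correction)
    statistic = sum((o - e) ** 2 / e
                    for o_row, e_row in zip(observed, expected)
                    for o, e in zip(o_row, e_row))

    # Step 4: degrees of freedom, (rows - 1) * (columns - 1)
    df = (len(observed) - 1) * (len(observed[0]) - 1)

    # Step 5: reject H0 when the statistic exceeds the critical value
    critical = critical_values[df]
    return statistic, df, critical, statistic > critical


# Problem 3: rows are smoker / non-smoker, columns are lung disease yes / no
statistic, df, critical, reject = chi_square_test([[90, 60], [30, 120]])
print(round(statistic, 2), df, critical, reject)  # 50.0 1 3.841 True
```

The same call works for the other practice problems; for example, Problem 2 is a 3 × 3 table with 4 degrees of freedom.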

When to Use a Chi-Square Test?


A Chi-Square Test is used to examine whether the observed results are consistent with
the expected values. When the data to be analyzed come from a random sample and the
variable in question is categorical, a Chi-Square test is an appropriate test. A categorical
variable consists of selections such as breeds of dogs, types of cars, genres of movies,
educational attainment, male vs. female, etc. Survey responses and questionnaires are the
primary sources of these types of data, and the Chi-square test is most commonly used for
analysing them. This type of analysis is helpful for researchers who are studying survey
response data, in fields ranging from customer and marketing research to political
science and economics.

Chi-Square Distribution
Chi-square distributions (χ²) are a type of continuous probability distribution. They're
commonly utilized in hypothesis testing, such as the chi-square goodness of fit and
independence tests. The parameter k, which represents the degrees of freedom,
determines the shape of a chi-square distribution.
Very few real-world observations follow a chi-square distribution. The objective of
chi-square distributions is to test hypotheses, not to describe real-world distributions.
In contrast, most other commonly used distributions, such as the normal and Poisson
distributions, can describe important things like baby birth weights or illness cases per
year.
Because of their close relationship to the standard normal distribution, chi-square
distributions are excellent for hypothesis testing. Many essential statistical tests rely on
the standard normal distribution.
In statistical analysis, the Chi-Square distribution is used in many hypothesis tests and
is determined by the parameter k, its degrees of freedom. It belongs to the family of
continuous probability distributions. The sum of the squares of k independent standard
normal random variables follows a Chi-Square distribution with k degrees of freedom.
Pearson's Chi-Square Test formula is:

$$\chi^2 = \sum \frac{(O - E)^2}{E}$$

Where χ² is the Chi-Square test statistic
Σ denotes the sum over all cells (observations)
O is the observed results
E is the expected results
The shape of the distribution graph changes as the value of k, i.e. the degrees of
freedom, increases.
When k is 1 or 2, the Chi-square distribution curve is shaped like a backwards ‘J’. This
means there is a high chance that χ² is close to zero.

Courtesy: Scribbr
When k is greater than 2, the distribution curve is hump-shaped, with a low probability
that χ² is very near 0 or very far from 0. The distribution has a long tail on the
right-hand side and a short one on the left-hand side. The most probable value (the mode)
of χ² is k − 2.
Courtesy: Scribbr
When k is greater than ninety, a normal distribution closely approximates the Chi-square
distribution.
What is the P-Value in a Chi-Square Test?
The P-Value in a Chi-Square test is a statistical measure that helps to assess the
significance of your test results.
Here P denotes probability: the p-value is the probability, assuming the null hypothesis is
true, of obtaining a test statistic at least as extreme as the one observed. Different
p-values lead to different interpretations of the hypothesis:
1. P <= 0.05: the null hypothesis is rejected.
2. P > 0.05: the null hypothesis is not rejected (this does not prove it is true).
The concepts of probability and statistics are intertwined with the Chi-Square Test.
Probability is a measure of how likely an event or outcome is to happen. It can
represent bulky or complicated data in an understandable way. Statistics involves
collecting, organising, analysing, interpreting and presenting data.
Finding P-Value
When you run any of the Chi-square tests, you'll get a test statistic called χ². You have
two options for determining whether this test statistic is statistically significant at some
alpha level:
1. Compare the test statistic χ² to a critical value from the Chi-square distribution table.
2. Compare the p-value of the test statistic χ² to a chosen alpha level.
P-values are calculated by taking into account the sampling distribution of the test
statistic under the null hypothesis, the sample data, and the type of test being performed.
The p-value is computed as follows, depending on the type of test:
• A lower-tailed test is specified by: p-value = P(TS ≤ ts | H0 is true) = cdf(ts)
• An upper-tailed test is specified by: p-value = P(TS ≥ ts | H0 is true) = 1 - cdf(ts)
• A two-sided test is defined as follows, if we assume that the distribution of the test
statistic under H0 is symmetric about 0: p-value = 2 * P(TS ≥ |ts| | H0 is true) = 2 * (1 - cdf(|ts|))
Where:
P: probability of an event
TS: the test statistic, treated as a random variable
ts: the observed value of the test statistic computed from your sample
cdf(): cumulative distribution function of the test statistic's distribution under H0
The chi-square test uses the upper-tailed form, because only large values of χ² count as
evidence against H0.
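The upper-tailed p-value, 1 − cdf(χ²), can be computed with only the Python standard library, as sketched below. This sketch evaluates the chi-square CDF through the power series of the regularized lower incomplete gamma function P(k/2, x/2); the function names are hypothetical. In practice a statistics library (for example SciPy's `scipy.stats.chi2.sf`) computes the same upper-tail probability more robustly, especially for very large statistics.

```python
import math


def chi2_cdf(x, k, tol=1e-15, max_terms=10_000):
    """CDF of the chi-square distribution with k degrees of freedom.

    Evaluates the regularized lower incomplete gamma function P(a, z) with
    a = k / 2 and z = x / 2 using its power series:
        P(a, z) = z**a * exp(-z) / Gamma(a) * sum_{n>=0} z**n / (a (a+1) ... (a+n))
    """
    if x <= 0:
        return 0.0
    a, z = k / 2, x / 2
    term = 1.0 / a                  # n = 0 term of the series
    total = term
    for n in range(1, max_terms):
        term *= z / (a + n)         # next term: multiply by z / (a + n)
        total += term
        if term < tol * total:      # remaining terms are negligible
            break
    # Combine the prefactor in log space; clamp tiny rounding overshoot above 1
    return min(1.0, math.exp(a * math.log(z) - z - math.lgamma(a)) * total)


def chi2_p_value(statistic, k):
    """Upper-tailed p-value: P(TS >= ts | H0 is true) = 1 - cdf(ts)."""
    return 1.0 - chi2_cdf(statistic, k)


print(round(chi2_p_value(8.5066, 2), 4))  # 0.0142
```

For the voting example, a statistic of 8.5066 with 2 degrees of freedom gives p ≈ 0.0142, which is below 0.05 and agrees with the earlier decision to reject the null hypothesis.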
Properties of the Chi-Square Distribution
1. The variance is twice the number of degrees of freedom (2k).
2. The mean of the distribution is equal to the number of degrees of freedom (k).
3. As the degrees of freedom increase, the Chi-Square distribution curve approaches a
normal distribution.
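These properties, and the definition of the chi-square distribution as a sum of squared standard normal variables, can be checked by simulation. The sketch below uses only the Python standard library; `sample_chi_square` is a hypothetical helper, and the seed and sample size are arbitrary choices. The printed sample mean and variance for each k should come out close to k and 2k, with exact values depending on the random draws.

```python
import random
import statistics


def sample_chi_square(k, rng):
    """One draw from a chi-square distribution with k degrees of freedom,
    built as the sum of squares of k independent standard normal variables."""
    return sum(rng.gauss(0.0, 1.0) ** 2 for _ in range(k))


rng = random.Random(42)  # fixed seed so the run is reproducible
for k in (1, 2, 5, 10):
    draws = [sample_chi_square(k, rng) for _ in range(50_000)]
    # Sample mean should be close to k and sample variance close to 2k
    print(k, round(statistics.mean(draws), 2), round(statistics.variance(draws), 2))
```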
Limitations of Chi-Square Test
There are two limitations to using the chi-square test that you should be aware of.
• First, the chi-square test is extremely sensitive to sample size. Even insignificant
relationships can appear statistically significant when a large enough sample is used.
Keep in mind that "statistically significant" does not always imply "meaningful" when
using the chi-square test.
• Second, the chi-square test can only determine whether two variables are related. It
does not follow that one variable has a causal relationship with the other; establishing
causality requires a more detailed analysis.
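The sensitivity to sample size follows from the formula: multiplying every count by a factor c multiplies each E by c and each (O − E)² by c², so χ² itself grows by a factor of c while the proportions stay the same. The sketch below uses hypothetical counts (not from this document) for a 2 × 2 table with a 52% versus 48% split, then the same proportions with 20 times the sample; `chi_square_statistic` is a hypothetical helper.

```python
def chi_square_statistic(observed):
    """Pearson chi-square statistic for a table of counts (no totals)."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    n = sum(row_totals)
    total = 0.0
    for row, r in zip(observed, row_totals):
        for o, c in zip(row, col_totals):
            e = r * c / n               # expected count for this cell
            total += (o - e) ** 2 / e
    return total


# Hypothetical counts: two groups answering yes/no with a 52% vs 48% split
small = [[52, 48],
         [48, 52]]
# Same proportions, 20 times as many respondents
large = [[20 * x for x in row] for row in small]

print(round(chi_square_statistic(small), 2))  # 0.32
print(round(chi_square_statistic(large), 2))  # 6.4
```

With 1 degree of freedom the 0.05 critical value is 3.841, so the small table is not significant while the larger one is, even though the 52/48 split is identical.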
