
Chi-Square Test in R

Last Updated : 11 Jul, 2025

In R, the chi-square test is used to check whether the distribution of a categorical variable differs between groups. It is applied to the counts of each category observed in two or more independent groups.

The chi-square test of independence checks whether there is an association between two categorical variables. Of the two main types of data, numerical (numbers) and categorical (categories), the test applies to categorical data summarized as counts.
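To make the idea concrete, the statistic can be worked out by hand. The sketch below uses a small made-up table (the counts and the names observed, expected and chi_sq are assumptions for illustration, not data from this article) and compares each observed count with the count expected if the two variables were independent.

```r
# Hypothetical 2 x 3 table of counts (made-up numbers, for illustration only)
observed <- matrix(c(20, 30, 25,
                     30, 20, 25),
                   nrow = 2, byrow = TRUE,
                   dimnames = list(group = c("A", "B"),
                                   answer = c("low", "mid", "high")))

# Expected count for each cell under independence:
# (row total * column total) / grand total
expected <- outer(rowSums(observed), colSums(observed)) / sum(observed)

# Chi-square statistic: sum of (observed - expected)^2 / expected
chi_sq <- sum((observed - expected)^2 / expected)

# Degrees of freedom: (rows - 1) * (columns - 1)
df <- (nrow(observed) - 1) * (ncol(observed) - 1)

# Upper-tail probability of the chi-square distribution
p_value <- pchisq(chi_sq, df = df, lower.tail = FALSE)

print(c(statistic = chi_sq, df = df, p.value = p_value))
```

For a table larger than 2 x 2, these values should agree with what chisq.test(observed) reports.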

Syntax:

chisq.test(data)

Parameters:

  • data: a table or matrix containing the observed counts for each combination of categories.
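As a minimal sketch of the call, the example below passes a small made-up matrix of counts to chisq.test() and reads individual parts of the result. The matrix and its labels are hypothetical; only chisq.test() and the components of its result come from R itself.

```r
# Hypothetical 2 x 2 table of counts (made-up numbers, for illustration only)
counts <- matrix(c(30, 10,
                   20, 40),
                 nrow = 2, byrow = TRUE,
                 dimnames = list(treatment = c("drug", "placebo"),
                                 outcome = c("improved", "not improved")))

result <- chisq.test(counts)

# The result is a list of class "htest"; its parts can be read individually
result$statistic   # chi-square statistic
result$parameter   # degrees of freedom
result$p.value     # p-value
result$expected    # expected counts under independence

# For 2 x 2 tables a continuity correction is applied by default;
# correct = FALSE turns it off
chisq.test(counts, correct = FALSE)
```

Storing the result in a variable is useful when the p-value or the expected counts are needed later in a script rather than only printed.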

Implementation of Chi-Square test

We can run the chi-square test in R with the built-in chisq.test() function. The MASS package supplies the survey dataset used in this example.

1. Installing the libraries

We can install the MASS package using the install.packages() function and load it using the library() function once installed. The MASS package contains various datasets and functions for statistical analysis. We will use the str() function to display the structure of the survey dataset.

R
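# Install MASS only if it is missing (it ships with most R distributions)
if (!requireNamespace("MASS", quietly = TRUE)) install.packages("MASS")
library(MASS)

# str() prints the structure itself, so no print() wrapper is needed
str(survey)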

Output:

Structure of the survey dataset
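Before building a table, it can help to look at the two columns used in the test. The following sketch lists their categories and counts missing values; it assumes only that the MASS package is installed.

```r
library(MASS)  # provides the survey data frame

# Both variables are factors; levels() lists their categories
levels(survey$Smoke)
levels(survey$Exer)

# Count missing values in each column; table() drops them by default
colSums(is.na(survey[, c("Smoke", "Exer")]))
```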

2. Creating a Contingency Table from Survey Data

We use the Smoke and Exer variables from the survey dataset. The Smoke column records the students' smoking habits, while the Exer column records their exercise level. The table() function cross-tabulates the two columns into a contingency table, stu_data, which counts the students in each combination of smoking habit and exercise level. Our aim is to test, at the 0.05 significance level, whether the students' smoking habit is independent of their exercise level. The null hypothesis is that the two variables are independent.

R
# Cross-tabulate smoking habit (rows) against exercise level (columns).
# A separate data frame is not needed: table() accepts the columns directly.
stu_data <- table(survey$Smoke, survey$Exer)

print(stu_data)

Output:

Contingency table of the Smoke and Exer variables
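Raw counts can be hard to compare when the groups differ in size. As an optional sketch, the base R functions addmargins() and prop.table() add totals and convert the counts to proportions; the name row_props is chosen here for illustration.

```r
library(MASS)

stu_data <- table(survey$Smoke, survey$Exer)

# Add row and column totals to the table of counts
addmargins(stu_data)

# Share of each exercise level within each smoking category (rows sum to 1)
row_props <- prop.table(stu_data, margin = 1)
round(row_props, 2)
```

If the two variables were unrelated, the rows of the proportion table would look roughly alike, apart from sampling noise.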

3. Applying Chi-Square Test

We are applying the chisq.test() function to the stu_data contingency table to perform a chi-square test. This test evaluates whether there is a significant association between the Smoke and Exer variables in the dataset.

R
print(chisq.test(stu_data))

Output:

Chi-square test result

As the p-value of 0.4828 is greater than the 0.05 significance level, we fail to reject the null hypothesis that smoking habit is independent of exercise level. The data therefore provide no evidence of an association between the two variables, which is not the same as proving that they are independent.
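For this table R typically warns that the chi-squared approximation may be incorrect, because some cells have small expected counts. The sketch below shows one way to check the expected counts and to obtain a simulated p-value with the simulate.p.value argument of chisq.test(); the seed and the number of replicates B are arbitrary choices made here.

```r
library(MASS)

stu_data <- table(survey$Smoke, survey$Exer)

# suppressWarnings() hides the approximation warning so only the result is kept
result <- suppressWarnings(chisq.test(stu_data))

# Expected counts under independence; a common rule of thumb asks for >= 5
round(result$expected, 2)
sum(result$expected < 5)   # number of cells below that threshold

# Monte Carlo p-value: does not rely on the chi-square approximation
set.seed(1)                # arbitrary seed, only to make the run repeatable
chisq.test(stu_data, simulate.p.value = TRUE, B = 10000)
```

If the simulated p-value leads to the same decision as the standard test, the conclusion does not hinge on the approximation.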

4. Visualize the Chi-Square Test data

We are creating a bar plot using the barplot() function to visualize the relationship between smoking habits and exercise levels from the stu_data contingency table. The bars are grouped by exercise level (beside = TRUE), with one color for each of the four smoking categories (red, lightgreen, lightblue and blue, in the row order of the table). The legend() function adds a key that maps each color to its category.

R
barplot(stu_data, beside = TRUE, col = c("red", "lightgreen","lightblue","blue"),
        main = "Smoking Habits vs Exercise Levels",
        xlab = "Exercise Level", ylab = "Number of Students")

legend("center", legend = rownames(stu_data), fill = c("red", "lightgreen","lightblue","blue"))

Output:

Grouped bar plot of smoking habits by exercise level
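Because the exercise groups differ in size, a plot of proportions can make the comparison easier to read than a plot of counts. The following is an optional sketch using prop.table() and a stacked barplot(); the name col_props, the title text and the xlim value that leaves room for the legend are choices made here.

```r
library(MASS)

stu_data <- table(survey$Smoke, survey$Exer)

# Share of each smoking category within each exercise level (columns sum to 1)
col_props <- prop.table(stu_data, margin = 2)

# Stacked bars: one bar per exercise level, segments show smoking categories
barplot(col_props, col = c("red", "lightgreen", "lightblue", "blue"),
        main = "Smoking Habits within Each Exercise Level",
        xlab = "Exercise Level", ylab = "Proportion of Students",
        xlim = c(0, 5.5),                      # extra space for the legend
        legend.text = rownames(col_props),
        args.legend = list(x = "right"))
```

Bars with similar segment heights across exercise levels are what independence would look like, which is in line with the test result.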

In this article, we explored how to create a contingency table, apply the chi-square test and visualize the relationship between two variables using a bar plot in R.

