Kruskal-Wallis test in R Programming
Last Updated: 16 May, 2022
The Kruskal–Wallis test in R Programming Language is a rank-based test that is similar to the Mann–Whitney U test but can be applied to one-way data with more than two groups. It is a non-parametric alternative to the one-way ANOVA test and a generalization of the two-sample Wilcoxon rank-sum test. The test requires independent samples: a group of data samples is independent if the samples come from unrelated populations and do not affect each other. Using the Kruskal-Wallis test, one can decide whether the population distributions are similar without assuming that they follow a normal distribution. Performing a Kruskal-Wallis test in R is straightforward.
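The rank-based idea can be written out explicitly. Pool the N observations from all k groups and rank them from 1 to N, giving tied values the average of their ranks. With n_i the size of group i and R_i the sum of its ranks, the Kruskal-Wallis statistic H and its usual correction for ties are:

```latex
H = \frac{12}{N(N+1)} \sum_{i=1}^{k} \frac{R_i^2}{n_i} - 3(N+1),
\qquad
H_{\text{ties}} = \frac{H}{1 - \dfrac{\sum_{j} \left(t_j^3 - t_j\right)}{N^3 - N}}
```

Here t_j is the number of observations sharing the j-th distinct value (untied values contribute zero). Under the null hypothesis that all groups come from the same distribution, the statistic approximately follows a chi-squared distribution with k - 1 degrees of freedom. R reports the tie-corrected value as "Kruskal-Wallis chi-squared" together with this "df".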
Note: The outcome of the Kruskal–Wallis test tells whether there are differences among the groups, but not which groups differ from which.
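A common follow-up, when the Kruskal-Wallis test is significant, is a set of pairwise Wilcoxon rank-sum (Mann-Whitney) tests with a correction for multiple comparisons. Base R provides this as pairwise.wilcox.test() in the stats package. The sketch below applies it to the built-in PlantGrowth data set; the Benjamini-Hochberg ("BH") adjustment is an assumption made for illustration, and other methods accepted by p.adjust(), such as "holm" or "bonferroni", can be passed the same way.

```r
# Pairwise Wilcoxon rank-sum tests as a post hoc step after Kruskal-Wallis.
# Uses only base R: the PlantGrowth data set and stats::pairwise.wilcox.test().
data(PlantGrowth)

posthoc <- pairwise.wilcox.test(
  x = PlantGrowth$weight,
  g = PlantGrowth$group,
  p.adjust.method = "BH",  # assumption: Benjamini-Hochberg adjustment
  exact = FALSE            # passed on to wilcox.test(); avoids the exact-p warning caused by tied values
)
print(posthoc)
```

Each cell of the printed matrix is the adjusted p-value for one pair of groups; a small value suggests that this particular pair differs.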
Examples:
- Suppose one wants to find out how socioeconomic status influences attitudes towards sales tax hikes. Here the independent variable is “socioeconomic status” with three levels: working class, middle class, and wealthy. The dependent variable is measured on a 5-point Likert scale from strongly agree to strongly disagree.
- Suppose one wants to find out how test anxiety influences actual test scores. The independent variable “test anxiety” has three levels: no anxiety, low-medium anxiety, and high anxiety. The dependent variable is the exam score, rated from 0 to 100%.
Assumptions for the Kruskal-Wallis test in R
The data should satisfy the following:
- One independent variable with two or more levels. The test is more commonly used when there are three or more levels. For two levels, consider using the Mann-Whitney U test instead of the Kruskal-Wallis test.
- The dependent variable should be measured on an ordinal, interval, or ratio scale.
- The observations should be independent. In other words, there should be no relationship between the observations within each group or between the groups.
- The distributions of all groups should have the same shape (this is required for interpreting the test as a comparison of medians).
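With exactly two groups, the two tests are closely related: the Kruskal-Wallis statistic is the square of the normal-approximation z statistic of the Wilcoxon rank-sum (Mann-Whitney) test, so the p-values coincide when wilcox.test() uses the normal approximation without continuity correction. A minimal sketch checking this on two groups of the built-in PlantGrowth data (the choice of "ctrl" and "trt1" is arbitrary):

```r
# Compare Kruskal-Wallis and Wilcoxon rank-sum (Mann-Whitney) tests on two groups.
data(PlantGrowth)
two <- droplevels(subset(PlantGrowth, group %in% c("ctrl", "trt1")))

kw <- kruskal.test(weight ~ group, data = two)

# Normal approximation without continuity correction, matching the chi-squared form of Kruskal-Wallis
mw <- wilcox.test(weight ~ group, data = two, exact = FALSE, correct = FALSE)

print(c(kruskal_wallis = kw$p.value, mann_whitney = mw$p.value))
```

In practice the default wilcox.test() (exact p-values for small samples without ties, continuity correction otherwise) is usually preferred for two groups, which is why the Mann-Whitney U test is recommended in that case.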
Implementation in R
R provides the function kruskal.test(), available in the stats package, to perform a Kruskal-Wallis rank-sum test.
Syntax: kruskal.test(x, g, …) for data given as vectors or a list, or kruskal.test(formula, data, subset, na.action, …) for data given as a formula
Parameters:
- x: a numeric vector of data values, or a list of numeric data vectors.
- g: a vector or factor object giving the group for the corresponding elements of x (ignored with a warning if x is a list).
- formula: a formula of the form response ~ group, where response gives the data values and group a vector or factor of the corresponding groups.
- data: an optional matrix or data frame containing the variables in the formula.
- subset: an optional vector specifying a subset of observations to be used.
- na.action: a function which indicates what should happen when the data contain NAs.
- ...: further arguments to be passed to or from methods.
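As the syntax shows, kruskal.test() accepts the data in several forms. A minimal sketch on the built-in PlantGrowth data calling it three ways: with a formula, with separate x and g vectors, and with a list of per-group vectors (built here with split()). All three should produce the same statistic and p-value.

```r
data(PlantGrowth)

# 1. Formula interface: response ~ group, with the variables looked up in `data`
res_formula <- kruskal.test(weight ~ group, data = PlantGrowth)

# 2. Default interface: numeric vector x plus a grouping vector or factor g
res_xg <- kruskal.test(x = PlantGrowth$weight, g = PlantGrowth$group)

# 3. List interface: one numeric vector per group, so g is not needed
groups <- split(PlantGrowth$weight, PlantGrowth$group)
res_list <- kruskal.test(groups)

# Collect the chi-squared statistics; only the recorded data description differs
results <- list(formula = res_formula, xg = res_xg, list = res_list)
print(sapply(results, function(r) unname(r$statistic)))
```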
Example:
Let's use the built-in R data set PlantGrowth. It contains the weights of plants obtained under a control condition and two different treatment conditions.
R
# Preparing the data set
# to perform Kruskal-Wallis Test
# Taking the PlantGrowth data set
myData <- PlantGrowth
print(myData)
# Show the group levels
print(levels(myData$group))
Output:
weight group
1 4.17 ctrl
2 5.58 ctrl
3 5.18 ctrl
4 6.11 ctrl
5 4.50 ctrl
6 4.61 ctrl
7 5.17 ctrl
8 4.53 ctrl
9 5.33 ctrl
10 5.14 ctrl
11 4.81 trt1
12 4.17 trt1
13 4.41 trt1
14 3.59 trt1
15 5.87 trt1
16 3.83 trt1
17 6.03 trt1
18 4.89 trt1
19 4.32 trt1
20 4.69 trt1
21 6.31 trt2
22 5.12 trt2
23 5.54 trt2
24 5.50 trt2
25 5.37 trt2
26 5.29 trt2
27 4.92 trt2
28 6.15 trt2
29 5.80 trt2
30 5.26 trt2
[1] "ctrl" "trt1" "trt2"
Here the column “group” is a factor, and its categories (“ctrl”, “trt1”, “trt2”) are called factor levels. The levels are ordered alphabetically. The question is whether plant weight differs significantly among the three experimental conditions. The test can be performed with kruskal.test() as shown below.
R
# R program to illustrate
# Kruskal-Wallis Test
# Taking the PlantGrowth data set
myData <- PlantGrowth
# Performing Kruskal-Wallis test
result <- kruskal.test(weight ~ group,
                       data = myData)
print(result)
Output:
Kruskal-Wallis rank sum test
data: weight by group
Kruskal-Wallis chi-squared = 7.9882, df = 2, p-value = 0.01842
Explanation:
As the p-value (0.01842) is less than the significance level of 0.05, it can be concluded that there are significant differences among the three groups.
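To see where the reported chi-squared value comes from, the sketch below recomputes the Kruskal-Wallis statistic by hand for PlantGrowth: rank all 30 weights together, sum the ranks in each group, apply the H formula with the usual correction for tied values (the weight 4.17 occurs twice), and take the p-value from a chi-squared distribution with k - 1 = 2 degrees of freedom. The variable names are illustrative; the result should agree with kruskal.test().

```r
# Recompute the Kruskal-Wallis H statistic by hand for PlantGrowth and
# compare it with kruskal.test(). Only base R (datasets, stats) is used.
data(PlantGrowth)
x <- PlantGrowth$weight
g <- PlantGrowth$group

N   <- length(x)               # total number of observations (30)
k   <- nlevels(g)              # number of groups (3)
r   <- rank(x)                 # joint ranks; tied values get the average rank
R_i <- tapply(r, g, sum)       # rank sum per group
n_i <- tapply(r, g, length)    # size of each group

# Uncorrected statistic
H <- 12 / (N * (N + 1)) * sum(R_i^2 / n_i) - 3 * (N + 1)

# Correction for ties: t_j counts how often each distinct value occurs
t_j <- table(x)
H <- H / (1 - sum(t_j^3 - t_j) / (N^3 - N))

# p-value from the chi-squared approximation with k - 1 degrees of freedom
p_value <- pchisq(H, df = k - 1, lower.tail = FALSE)

builtin <- kruskal.test(weight ~ group, data = PlantGrowth)
print(c(H = H, df = k - 1, p.value = p_value))
print(c(H = unname(builtin$statistic), p.value = builtin$p.value))
```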