Kolmogorov-Smirnov Test in R Programming

Last Updated : 16 Apr, 2025

The Kolmogorov-Smirnov (K-S) test is a non-parametric test used to check whether a sample follows a reference probability distribution, or whether two samples come from the same distribution. It is based on the cumulative distribution function (CDF): the test statistic is the largest absolute difference between the empirical distribution function (EDF) of the sample and either the reference CDF or the EDF of the second sample.

The Kolmogorov-Smirnov test is mostly used for two purposes:

  1. One-sample K-S test: To compare the distribution of a sample with a known reference distribution.
  2. Two-sample K-S test: To compare the distributions of two independent samples.

The K-S statistic is the maximum difference between the observed and expected cumulative distribution functions (CDFs). The test is non-parametric in that it does not assume the data come from any particular parametric family, which makes it especially useful for goodness-of-fit testing with continuous distributions.

Kolmogorov-Smirnov Test Formula

The formula of the Kolmogorov-Smirnov test can be expressed as:

D_n = \sup_x \left| F_n(x) - F(x) \right|

where,

  • sup_x : the supremum, over all x, of the set of distances
  • F_n(x) : the empirical distribution function of n i.i.d. observations X_i
  • F(x) : the cumulative distribution function of the reference distribution

The empirical distribution function is the distribution function associated with the empirical measure of the sample. It is a step function that jumps up by 1/n at each of the n data points.
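As a rough illustration of the formula, the statistic can be computed by hand and compared with the value reported by ks.test(). This is only a sketch: it assumes a standard normal reference distribution, calls base R's stats::ks.test(), and uses an arbitrary seed and variable names (F0, d_plus, d_minus) chosen for the example.

```r
set.seed(42)                      # arbitrary seed, only for reproducibility
x  <- sort(rnorm(100))            # sorted sample
n  <- length(x)
F0 <- pnorm(x)                    # reference CDF F(x) at each data point

# F_n is a step function, so the largest gap occurs at a data point:
# either just after the jump (i/n) or just before it ((i - 1)/n).
d_plus  <- max(seq_len(n) / n - F0)
d_minus <- max(F0 - (seq_len(n) - 1) / n)
D_manual <- max(d_plus, d_minus)

# Statistic reported by the built-in test
D_builtin <- unname(stats::ks.test(x, "pnorm")$statistic)

all.equal(D_manual, D_builtin)
```

Checking both sides of each jump matters because the supremum of the gap can be approached just before a data point rather than at it.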

One Sample Kolmogorov-Smirnov Test in R

The K-S test can be performed using the ks.test() function in R. 

Syntax:

ks.test(x, y, ..., alternative = c("two.sided", "less", "greater"), exact = NULL, tol = 1e-8,
simulate.p.value = FALSE, B = 2000)

Parameters:

  • x: numeric vector of data values
  • y: a numeric vector of data values, or a character string naming a cumulative distribution function (for example "pnorm").
  • alternative: the alternative hypothesis; one of "two.sided" (default), "less" or "greater".
  • exact: NULL, or a logical value indicating whether an exact p-value should be computed.
  • tol: an upper bound for rounding error when data values are compared for equality.
  • simulate.p.value: a logical value indicating whether the p-value should be computed by Monte Carlo simulation.
  • B: an integer giving the number of replicates to use in the Monte Carlo simulation.
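The sketch below shows how some of these arguments are typically used. It relies on base R's stats::ks.test(), where additional named arguments such as mean and sd are forwarded to the reference CDF; the sample, seed, and distribution parameters are assumptions made up for the example.

```r
set.seed(1)                               # arbitrary seed
x <- rnorm(200, mean = 5, sd = 2)         # hypothetical sample from N(5, 2^2)

# Against the default N(0, 1) reference the fit is clearly poor
res_default <- stats::ks.test(x, "pnorm")

# Extra named arguments are passed on to pnorm(), so this compares
# the sample with N(5, 2^2) instead
res_matched <- stats::ks.test(x, "pnorm", mean = 5, sd = 2)

# One-sided variant of the same comparison
res_less <- stats::ks.test(x, "pnorm", mean = 5, sd = 2,
                           alternative = "less")

res_default$statistic
res_matched$statistic
res_less$alternative
```

Note that estimating mean and sd from the same sample and then testing against them makes the reported p-value unreliable; here the parameters are fixed in advance.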

Example: One-Sample K-S Test

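The ks.test() function ships with base R in the stats package. This example instead loads the "dgof" package, which provides an extended ks.test() that also supports discrete reference distributions; it can be installed from the R console with install.packages().
The rnorm() function generates random normal variates, so the exact output will differ from run to run.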

R
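# Install dgof only if it is not already available, then load it
if (!requireNamespace("dgof", quietly = TRUE)) install.packages("dgof")
library("dgof")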

# Generate a random sample
x1 <- rnorm(100)

# Perform the K-S test to check if the sample follows a normal distribution
ks.test(x1, "pnorm")

Output

One-sample Kolmogorov-Smirnov test

data: x1
D = 0.08831, p-value = 0.4165
alternative hypothesis: two-sided
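The printed result is an object of class "htest", so the statistic and p-value can also be extracted programmatically. The following is a minimal sketch using base R's stats::ks.test(); the seed and the 0.05 significance level are assumptions for illustration, not values taken from the run above.

```r
set.seed(123)                         # arbitrary seed
x1  <- rnorm(100)
res <- stats::ks.test(x1, "pnorm")

alpha <- 0.05                         # assumed significance level
D <- unname(res$statistic)            # the K-S statistic D
p <- res$p.value                      # the p-value

# A small p-value is evidence against the reference distribution;
# a large one means the data are compatible with it.
if (p < alpha) {
  message("Reject H0: the sample does not appear to follow N(0, 1)")
} else {
  message("Fail to reject H0: no evidence against N(0, 1)")
}
```

In the output shown above, the p-value of 0.4165 is well above 0.05, so the null hypothesis that the sample follows a standard normal distribution is not rejected.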

Two Sample Kolmogorov-Smirnov Test in R

The two-sample K-S test compares two samples to check whether they were drawn from the same distribution. The same ks.test() function is used: both arguments are numeric vectors.

Example: Two-Sample K-S Test

We will generate two random samples using rnorm() and runif(), then perform the two-sample K-S test with ks.test().

R
# Generate two random samples
x <- rnorm(50)  # Normal distribution
y <- runif(30)  # Uniform distribution

ks.test(x, y)

Output

Two-sample Kolmogorov-Smirnov test

data: x and y
D = 0.56, p-value = 6.303e-06
alternative hypothesis: two-sided
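In the two-sample case, D is the largest absolute gap between the two empirical CDFs. A small sketch of that computation, assuming base R's ecdf() and stats::ks.test() and an arbitrary seed:

```r
set.seed(7)                           # arbitrary seed
x <- rnorm(50)                        # normal sample
y <- runif(30)                        # uniform sample

Fx <- ecdf(x)                         # empirical CDF of x
Fy <- ecdf(y)                         # empirical CDF of y

# Both ECDFs only change at observed values, so it is enough to
# evaluate the gap at the pooled data points.
grid <- sort(c(x, y))
D_manual <- max(abs(Fx(grid) - Fy(grid)))

D_builtin <- unname(stats::ks.test(x, y)$statistic)

all.equal(D_manual, D_builtin)
```

A very small p-value, such as the 6.303e-06 in the output above, indicates that the two samples are unlikely to come from the same distribution.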

Visualization of the Kolmogorov-Smirnov Test

Visualization is a useful complement to the K-S test because it shows where the cumulative distribution functions of the two samples differ. Here, we plot the empirical CDFs of two samples to check the difference visually.

Example: Visualizing the Two-Sample K-S Test

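Here we generate both samples using the rnorm() function and plot their empirical CDFs. We then run a one-sided test whose alternative hypothesis is that the CDF of x lies below that of x2.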

R
# Generate two random samples
x <- rnorm(50)
x2 <- rnorm(50, mean = -1)  # Normal distribution shifted to mean -1

# Plot the empirical CDFs of both samples
plot(ecdf(x), xlim = range(c(x, x2)), col = "blue")
plot(ecdf(x2), add = TRUE, lty = "dashed", col = "red")

# Perform the one-sided two-sample K-S test
ks.test(x, x2, alternative = "less")

Output: 

Two-sample Kolmogorov-Smirnov test

data: x and x2
D^- = 0.4, p-value = 0.0003355
alternative hypothesis: the CDF of x lies below that of y
Figure: Visualization of the Kolmogorov-Smirnov Test in R
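The plot can be made more informative by marking the point where the two empirical CDFs are furthest apart, which is where the two-sided statistic D is attained. The sketch below assumes base R graphics and an arbitrary seed; the names gap and x_star are made up for the example.

```r
set.seed(2025)                        # arbitrary seed
x  <- rnorm(50)
x2 <- rnorm(50, mean = -1)

Fx  <- ecdf(x)
Fx2 <- ecdf(x2)

# Evaluate the gap between the ECDFs at every pooled data point
grid <- sort(c(x, x2))
gap  <- Fx2(grid) - Fx(grid)          # positive where x2's ECDF is above x's
i      <- which.max(abs(gap))
x_star <- grid[i]                     # location of the largest gap

plot(Fx, xlim = range(grid), col = "blue",
     main = "ECDFs with largest gap marked")
plot(Fx2, add = TRUE, lty = "dashed", col = "red")

# Vertical segment whose length equals the two-sided statistic D
segments(x_star, Fx(x_star), x_star, Fx2(x_star), lwd = 2)
legend("bottomright", legend = c("x", "x2", "max gap"),
       col = c("blue", "red", "black"),
       lty = c("solid", "dashed", "solid"))
```

Because x2 is shifted to the left, its ECDF rises earlier and lies above that of x, which matches the one-sided alternative reported in the output.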
