Kolmogorov-Smirnov Test in R Programming

Last Updated : 16 Apr, 2025

The Kolmogorov-Smirnov (K-S) test is a non-parametric test used to check whether a sample follows a given reference distribution, or whether two samples come from the same distribution. It is based on the cumulative distribution function (CDF): the test statistic is the greatest difference between the empirical distribution function (EDF) of the sample and either the theoretical CDF of the reference distribution or the EDF of the second sample.

The Kolmogorov-Smirnov test is mainly used in two forms:

One-sample K-S test: compares the distribution of a sample to a known reference distribution.
Two-sample K-S test: compares the distributions of two independent samples.

The test statistic is the maximum difference between the observed and expected cumulative distribution functions (CDFs). The test is non-parametric: it makes no assumption about the form of the distribution that generated the sample. This makes it especially useful for testing goodness of fit against continuous distributions.

Kolmogorov-Smirnov Test Formula

The formula of the Kolmogorov-Smirnov test can be expressed as:

D_n = \sup_x \left| F_n(x) - F(x) \right|

where,

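sup_x : the supremum, over all x, of the set of distances |F_n(x) - F(x)|
F_n(x) : the empirical distribution function for n i.i.d. (independent and identically distributed) observations X_i
F(x) : the cumulative distribution function of the reference distribution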

The empirical distribution function is the distribution function associated with the empirical measure of the sample. It is a step function that jumps up by 1/n at each of the n data points.
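The formula can be checked by hand. The following is a minimal sketch, assuming a standard normal reference distribution and using the ks.test() that ships with base R's stats package; the seed, the sample size, and the variable names are illustrative choices, not part of the original article. Because the EDF only changes at the data points, it is enough to compare it with F(x) just before and at each jump.

```r
# Minimal sketch: compute the one-sample K-S statistic D_n by hand.
set.seed(42)              # assumed seed, only so the run is repeatable
x <- sort(rnorm(20))      # illustrative sample, sorted in ascending order
n <- length(x)

F_ref <- pnorm(x)         # reference CDF F(x) at each sorted data point

# The EDF jumps from (i - 1)/n to i/n at the i-th sorted point, so the
# supremum is reached either at a jump or just before one.
d_above <- max(seq_len(n) / n - F_ref)         # EDF above the reference CDF
d_below <- max(F_ref - (seq_len(n) - 1) / n)   # EDF below the reference CDF
D_manual <- max(d_above, d_below)

# Compare with the statistic reported by ks.test()
D_builtin <- unname(ks.test(x, "pnorm")$statistic)
print(c(manual = D_manual, builtin = D_builtin))
```

The two printed values should agree up to floating-point error.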

One Sample Kolmogorov-Smirnov Test in R

The K-S test can be performed using the ks.test() function in R.

Syntax:

ks.test(x, y, ..., alternative = c("two.sided", "less", "greater"), exact = NULL, tol = 1e-8,
        simulate.p.value = FALSE, B = 2000)

Parameters:

x: a numeric vector of data values.
y: either a numeric vector of data values (two-sample test) or a character string naming a cumulative distribution function, such as "pnorm" (one-sample test).
alternative: the alternative hypothesis; one of "two.sided" (the default), "less", or "greater".
exact: NULL or a logical value indicating whether an exact p-value should be computed.
tol: an upper bound on rounding error, used when data values are compared for equality.
simulate.p.value: a logical value indicating whether to compute the p-value by Monte Carlo simulation.
B: an integer giving the number of replicates used in the Monte Carlo simulation.
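As a hedged illustration of how these arguments fit together, the sketch below uses the base R stats version of ks.test() and passes the parameters of the reference distribution as extra arguments after y; they are forwarded to the named CDF. The seed, the sample, and the names res_matched and res_default are assumptions made for this example.

```r
# Minimal sketch: testing against a normal distribution with non-default parameters.
set.seed(1)                           # assumed seed, only so the run is repeatable
x <- rnorm(200, mean = 5, sd = 2)     # illustrative sample from N(5, 2^2)

# mean and sd are passed on to pnorm(), so the reference is N(5, 2^2)
res_matched <- ks.test(x, "pnorm", mean = 5, sd = 2)

# Without them the reference is the standard normal N(0, 1),
# which this sample clearly does not follow
res_default <- ks.test(x, "pnorm")

print(res_matched$p.value)
print(res_default$p.value)
```

Omitting the distribution parameters is an easy mistake to make: the second call silently tests against the standard normal distribution and reports a very small p-value.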

Example: One-Sample K-S Test

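The example below loads the "dgof" package, installed with install.packages() from the R console. Its ks.test() extends the version in base R's stats package to handle discrete reference distributions; for continuous data such as this, the base R ks.test() works the same way, so installing dgof is optional.
The rnorm() function generates random normal variates. Because no random seed is set, the numbers in the output will differ from run to run.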

# Install dgof once if it is not already available, then load it
install.packages("dgof")
library("dgof")

# Generate a random sample of 100 standard normal values
x1 <- rnorm(100)

# Perform the K-S test against the standard normal CDF (mean 0, sd 1)
ks.test(x1, "pnorm")

Output

One-sample Kolmogorov-Smirnov test

data:  x1
D = 0.08831, p-value = 0.4165
alternative hypothesis: two-sided
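To act on this output in a script, the returned object can be inspected directly. The sketch below assumes a significance level of 0.05 and uses the base R ks.test(); the seed and the names result and alpha are illustrative.

```r
# Minimal sketch: reading the test result programmatically.
set.seed(123)                     # assumed seed, only so the run is repeatable
x1 <- rnorm(100)

result <- ks.test(x1, "pnorm")    # returns an object of class "htest"
alpha <- 0.05                     # assumed significance level

cat("D =", unname(result$statistic), "\n")
cat("p-value =", result$p.value, "\n")

if (result$p.value < alpha) {
  cat("Reject H0: the sample does not appear to follow the reference distribution.\n")
} else {
  cat("Fail to reject H0: no evidence against the reference distribution.\n")
}
```

A large p-value, as in the output above, means the test found no evidence against normality; it does not prove that the sample is normally distributed.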

Two Sample Kolmogorov-Smirnov Test in R

The two-sample K-S test compares two samples to check whether they come from the same distribution. The same ks.test() function is used for this in R, with the second sample passed as y.

Example: Two-Sample K-S Test

We generate two random samples, one with rnorm() and one with runif(), and then run the two-sample K-S test on them with ks.test().

# Generate two random samples
x <- rnorm(50)  # Normal distribution
y <- runif(30)  # Uniform distribution

ks.test(x, y)

Output

Two-sample Kolmogorov-Smirnov test

data:  x and y
D = 0.56, p-value = 6.303e-06
alternative hypothesis: two-sided
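The two-sample statistic D is the largest vertical gap between the two empirical CDFs. The following minimal sketch reproduces it with ecdf() and compares it with the base R ks.test(); the seed and the names Fx, Fy, and grid are assumptions made for this example.

```r
# Minimal sketch: compute the two-sample K-S statistic by hand.
set.seed(7)               # assumed seed, only so the run is repeatable
x <- rnorm(50)            # normal sample
y <- runif(30)            # uniform sample

Fx <- ecdf(x)             # empirical CDF of x, returned as a function
Fy <- ecdf(y)             # empirical CDF of y

# Both EDFs only change at observed values, so the largest gap
# must occur at one of the pooled data points.
grid <- sort(c(x, y))
D_manual <- max(abs(Fx(grid) - Fy(grid)))

D_builtin <- unname(ks.test(x, y)$statistic)
print(c(manual = D_manual, builtin = D_builtin))
```

Unequal sample sizes, as here, are not a problem: each EDF is scaled by its own sample size.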

Visualization of the Kolmogorov-Smirnov Test

Visualization is a useful companion to the K-S test because it shows where and by how much the cumulative distribution functions of the two samples differ. Here, we plot the empirical CDFs of two samples to check the difference visually.

Example: Visualizing the Two-Sample K-S Test

Here both samples are generated with rnorm(), the second with its mean shifted to -1, and their empirical CDFs are plotted on the same axes.

# Generate two random samples
x <- rnorm(50)
x2 <- rnorm(50, mean = -1)  # Normal distribution shifted to mean -1

# Plot the empirical CDFs of both samples
plot(ecdf(x), xlim = range(c(x, x2)), col = "blue")
plot(ecdf(x2), add = TRUE, lty = "dashed", col = "red")

# Perform the one-sided two-sample K-S test
# (alternative = "less": the CDF of x lies below that of x2)
ks.test(x, x2, alternative = "less")

Output

Two-sample Kolmogorov-Smirnov test

data:  x and x2
D^- = 0.4, p-value = 0.0003355
alternative hypothesis: the CDF of x lies below that of y

Figure: Visualization of the Kolmogorov-Smirnov Test in R
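The one-sided statistic D^- can also be located on the plot. The sketch below assumes the same setup as the example (two normal samples, the second shifted to mean -1) and the base R ks.test(); the seed and the names gap, D_minus, and gap_at are illustrative. It finds the point where the CDF of x2 exceeds the CDF of x by the largest amount.

```r
# Minimal sketch: find where the one-sided gap D^- occurs.
set.seed(2025)                  # assumed seed, only so the run is repeatable
x  <- rnorm(50)
x2 <- rnorm(50, mean = -1)

Fx  <- ecdf(x)
Fx2 <- ecdf(x2)

grid <- sort(c(x, x2))          # pooled data points, where the EDFs can change
gap  <- Fx2(grid) - Fx(grid)    # positive where the CDF of x lies below that of x2

D_minus <- max(gap)             # one-sided statistic for alternative = "less"
gap_at  <- grid[which.max(gap)] # location of the largest gap

res <- ks.test(x, x2, alternative = "less")
print(c(manual = D_minus, builtin = unname(res$statistic)))
print(gap_at)
```

Calling abline(v = gap_at, lty = "dotted") after the two plot() calls would mark this location on the figure.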

shaonim8
