KMP (Knuth-Morris-Pratt) Algorithm for Pattern Searching in C
Last Updated :
13 Aug, 2024
The KMP (Knuth-Morris-Pratt) algorithm is an efficient string searching algorithm used to find occurrences of a pattern within a text. Unlike simpler algorithms, KMP preprocesses the pattern to create a partial match table, known as the "lps" (Longest Prefix Suffix) array, which helps in skipping unnecessary comparisons. This makes it faster and more efficient, especially for large texts.
In this article, we will explore the KMP algorithm and its implementation in the C programming language.
Example
Input:
char txt[] = "AABAACAADAABAABA";
char pat[] = "AABA";
Output:
Pattern found at index 0
Pattern found at index 9
Pattern found at index 12
The KMP algorithm is a pattern matching algorithm that preprocesses the pattern to create an array (lps[]) that stores the length of the longest proper prefix which is also a suffix for each prefix of the pattern. This preprocessing step allows the algorithm to skip rechecking parts of the text that have already been matched, thus improving efficiency.
Steps to Implement the KMP Algorithm in C
The KMP algorithm works by comparing the pattern with the text from left to right. When a mismatch occurs after matching a few characters, the algorithm uses the lps[] array to avoid unnecessary comparisons by shifting the pattern to the right place.
- Preprocess the pattern to create the lps[] array that stores the length of the longest proper prefix which is also a suffix for the pattern. The lps[] array is used to skip characters in the text when a mismatch occurs.
- Initialize two pointers, one for the text and one for the pattern, to compare characters.
- Start comparing the first character of the pattern with the first character of the text.
- If the characters match,
- move both pointers forward.
- If they mismatch,
- use the lps[] array to determine how far the pattern pointer should move.
- Continue this process until the pattern is found or the text is fully searched.
Working of KMP Algorithm in C
Consider the following example to understand how the KMP algorithm works:
KMP AlgorithmText: AABAACAADAABAABA
Pattern: AABA
1. Initial Comparison:
Compare the first window of text with the pattern.
Characters match for the first four positions.
Match found at index 0.
2. Shift Pattern Using lps[]
Use the lps[] array to shift the pattern, skipping unnecessary comparisons.
Continue comparing the pattern with the next segments of the text.
3. Next Matches:
Match found at index 9
Match found at index 12.
C Program to Implement KMP Algorithm
Below is a C program that implements the KMP algorithm for pattern searching.
C
//C program to implement KMP Algorithm
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
// Fills lps[] for given pattern pat
void computeLPSArray(const char* pat, int M, int* lps){
// Length of the previous longest prefix suffix
int len = 0;
// lps[0] is always 0
lps[0] = 0;
// Loop calculates lps[i] for i = 1 to M-1
int i = 1;
while (i < M) {
if (pat[i] == pat[len]) {
len++;
lps[i] = len;
i++;
}
else {
if (len != 0) {
len = lps[len - 1];
}
else {
lps[i] = 0;
i++;
}
}
}
}
// Prints occurrences of pat in txt and returns an array of
// occurrences
int* KMPSearch(const char* pat, const char* txt, int* count){
int M = strlen(pat);
int N = strlen(txt);
// Create lps[] that will hold the longest prefix suffix
// values for pattern
int* lps = (int*)malloc(M * sizeof(int));
// Preprocess the pattern (calculate lps[] array)
computeLPSArray(pat, M, lps);
int* result = (int*)malloc(N * sizeof(int));
// Number of occurrences found
*count = 0;
int i = 0; // index for txt
int j = 0; // index for pat
while ((N - i) >= (M - j)) {
if (pat[j] == txt[i]) {
j++;
i++;
}
if (j == M) {
// Record the occurrence (1-based index)
result[*count] = i - j + 1;
(*count)++;
j = lps[j - 1];
}
else if (i < N && pat[j] != txt[i]) {
if (j != 0) {
j = lps[j - 1];
}
else {
i = i + 1;
}
}
}
free(lps);
return result;
}
int main(){
const char txt[] = "geeksforgeeks";
const char pat[] = "geeks";
int count;
// Call KMPSearch and get the array of occurrences
int* result = KMPSearch(pat, txt, &count);
// Print all the occurrences (1-based indices)
for (int i = 0; i < count; i++) {
printf("Pattern found at index: %d ", result[i]);
printf("\n");
}
printf("\n");
// Free the allocated memory
free(result);
return 0;
}
OutputPattern found at index: 1
Pattern found at index: 9
Time Complexity: O(N + M), where N is the length of the text and M is the length of the pattern. This is because the algorithm processes each character in the text and pattern at most once.
Auxiliary Space: O(M) for the lps[] array, where M is the length of the pattern.
Similar Reads
Boyer-Moore Algorithm for Pattern Searching in C
In this article, we will learn the Boyer-Moore Algorithm, a powerful technique for pattern searching in strings using C programming language. What is the Boyer-Moore Algorithm?The Boyer-Moore Algorithm is a pattern searching algorithm that efficiently finds occurrences of a pattern within a text. It
7 min read
Boyer-Moore Algorithm for Pattern Searching in C++
The Boyer-Moore algorithm is an efficient string searching algorithm that is used to find occurrences of a pattern within a text. This algorithm preprocesses the pattern and uses this information to skip sections of the text, making it much faster than simpler algorithms like the naive approach. In
6 min read
Searching Algorithms in Python
Searching algorithms are fundamental techniques used to find an element or a value within a collection of data. In this tutorial, we'll explore some of the most commonly used searching algorithms in Python. These algorithms include Linear Search, Binary Search, Interpolation Search, and Jump Search.
6 min read
Knuth-Morris-Pratt in C++
The Knuth-Morris-Pratt (KMP) is an efficient string-matching algorithm developed by Donald Knuth, Vaughan Pratt, and James H. Morris in 1977. It is used for finding a specific pattern in the given string. In this article, we will learn how to implement KMP algorithm in a C++ program. Table of Conten
5 min read
Boyer Moore Algorithm in Python
Boyer-Moore algorithm is an efficient string search algorithm that is particularly useful for large-scale searches. Unlike some other string search algorithms, the Boyer-Moore does not require preprocessing, making it ideal where the sample is relatively large relative to the data being searched. Wh
3 min read
Kosarajuâs Algorithm in C
Kosarajuâs Algorithm is a method by which we can use to find all strongly connected components (SCCs) in a directed graph. This algorithm has application in various applications such as finding cycles in a graph, understanding the structure of the web, and analyzing networks. In this article, we wil
5 min read
Implementation of Rabin Karp Algorithm in C++
The Rabin-Karp Algorithm is a string-searching algorithm that efficiently finds a pattern within a text using hashing. It is particularly useful for finding multiple patterns in the same text or for searching in streams of data. In this article, we will learn how to implement the Rabin-Karp Algorith
5 min read
Kahnâs Algorithm in C Language
In this post, we will see the implementation of Kahn's Algorithm in C language. What is Kahn's Algorithm?The Kahn's Algorithm is a classic algorithm used to the perform the topological sorting on the Directed Acyclic Graph (DAG). It produces a linear ordering of the vertices such that for every dire
5 min read
Java Program to Implement Commentz-Walter Algorithm
The Commentz-Walter algorithm is a string-matching algorithm that is used to search for a given pattern in a text string. It is a variant of the Knuth-Morris-Pratt (KMP) algorithm, which is a well-known algorithm for string matching. This algorithm works by preprocessing the pattern string to create
6 min read
Aho-Corasick Algorithm in Python
Given an input text and an array of k words, arr[], find all occurrences of all words in the input text. Let n be the length of text and m be the total number of characters in all words, i.e. m = length(arr[0]) + length(arr[1]) + ⦠+ length(arr[k-1]). Here k is the total number of input words. Examp
5 min read