
KMP (Knuth-Morris-Pratt) Algorithm for Pattern Searching in C

Last Updated : 13 Aug, 2024

The KMP (Knuth-Morris-Pratt) algorithm is an efficient string searching algorithm used to find occurrences of a pattern within a text. The naive approach restarts the comparison from scratch after every mismatch and can take O(N * M) time in the worst case. KMP instead preprocesses the pattern to create a partial match table, known as the "lps" (Longest Prefix Suffix) array, which lets it skip comparisons whose outcome is already known. The search then runs in O(N + M) time, which matters most for large texts.

In this article, we will explore the KMP algorithm and its implementation in the C programming language.

Example

Input:
char txt[] = "AABAACAADAABAABA";
char pat[] = "AABA";

Output:
Pattern found at index 0
Pattern found at index 9
Pattern found at index 12

What is the KMP Algorithm?

The KMP algorithm is a pattern matching algorithm that preprocesses the pattern to create an array (lps[]) in which lps[i] stores the length of the longest proper prefix of pat[0..i] that is also a suffix of pat[0..i]. A proper prefix is any prefix other than the whole string. For example, the pattern "AABA" gives lps[] = {0, 1, 0, 1}. This preprocessing step allows the algorithm to skip rechecking parts of the text that have already been matched, thus improving efficiency.

Steps to Implement the KMP Algorithm in C

The KMP algorithm compares the pattern with the text from left to right. When a mismatch occurs after some characters have matched, the algorithm uses the lps[] array to shift the pattern to the next position where a match is still possible, without re-reading text characters that have already been matched.

  1. Preprocess the pattern to create the lps[] array, where lps[i] stores the length of the longest proper prefix of pat[0..i] that is also a suffix of pat[0..i].
  2. Initialize two pointers, i for the text and j for the pattern, both to 0.
  3. Compare pat[j] with txt[i].
  4. If the characters match,
    • move both pointers forward.
  5. If they mismatch,
    • if j > 0, set j = lps[j - 1] and keep i unchanged; otherwise, move i forward by one.
  6. When j reaches the pattern length, record a match starting at index i - j, set j = lps[j - 1], and continue. Stop when the remaining text is shorter than the unmatched part of the pattern.

Working of KMP Algorithm in C

Consider the following example to understand how the KMP algorithm works:

Figure: KMP Algorithm

Text: AABAACAADAABAABA
Pattern: AABA

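1. Initial Comparison:
The lps[] array for the pattern "AABA" is {0, 1, 0, 1}.
Compare the text with the pattern starting at text index 0.
All four characters match.
Match found at index 0.

2. Shift Pattern Using lps[]:
After the match, the pattern index is set to lps[3] = 1 instead of 0, because the final 'A' of the match is also the first character of the pattern.
Comparison resumes at text index 4 against pat[1]; the text index never moves backward.
On each later mismatch, the pattern index falls back through lps[] in the same way.

3. Next Matches:
Match found at index 9.
Match found at index 12. This match overlaps the previous one: it reuses the 'A' at text index 12.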

C Program to Implement KMP Algorithm

Below is a C program that implements the KMP algorithm for pattern searching.

C
//C program to implement KMP Algorithm
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

// Fills lps[] for given pattern pat
void computeLPSArray(const char* pat, int M, int* lps){
    // Length of the previous longest prefix suffix
    int len = 0;

    // lps[0] is always 0
    lps[0] = 0;

    // Loop calculates lps[i] for i = 1 to M-1
    int i = 1;
    while (i < M) {
        if (pat[i] == pat[len]) {
            len++;
            lps[i] = len;
            i++;
        }
        else {
            if (len != 0) {
                len = lps[len - 1];
            }
            else {
                lps[i] = 0;
                i++;
            }
        }
    }
}

// Returns a malloc'd array holding the 0-based starting index of
// every occurrence of pat in txt and stores their number in *count.
// Returns NULL if pat is empty or an allocation fails.
// The caller must free the returned array.
int* KMPSearch(const char* pat, const char* txt, int* count){
    int M = (int)strlen(pat);
    int N = (int)strlen(txt);

    // Number of occurrences found
    *count = 0;

    // An empty pattern has no lps[] array to build
    if (M == 0)
        return NULL;

    // Create lps[] that will hold the longest prefix suffix
    // values for pattern
    int* lps = (int*)malloc(M * sizeof(int));

    // At most N occurrences can be recorded; keep at least one slot
    int* result = (int*)malloc((N > 0 ? N : 1) * sizeof(int));
    if (lps == NULL || result == NULL) {
        free(lps);
        free(result);
        return NULL;
    }

    // Preprocess the pattern (calculate lps[] array)
    computeLPSArray(pat, M, lps);

    int i = 0; // index for txt
    int j = 0; // index for pat

    // Stop once the remaining text is shorter than the
    // unmatched part of the pattern
    while ((N - i) >= (M - j)) {
        if (pat[j] == txt[i]) {
            j++;
            i++;
        }

        if (j == M) {
            // Full match: record its 0-based starting index
            result[*count] = i - j;
            (*count)++;
            j = lps[j - 1];
        }
        else if (i < N && pat[j] != txt[i]) {
            if (j != 0) {
                j = lps[j - 1];
            }
            else {
                i = i + 1;
            }
        }
    }
    free(lps);
    return result;
}

int main(){
    const char txt[] = "geeksforgeeks";
    const char pat[] = "geeks";
    int count;

    // Call KMPSearch and get the array of occurrences
    int* result = KMPSearch(pat, txt, &count);

    // Print all the occurrences (0-based indices)
    for (int i = 0; i < count; i++) {
        printf("Pattern found at index: %d\n", result[i]);
    }

    // Free the allocated memory (free(NULL) is a no-op)
    free(result);

    return 0;
}

Output
Pattern found at index: 0
Pattern found at index: 8

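Time Complexity: O(N + M), where N is the length of the text and M is the length of the pattern. Building lps[] takes O(M). During the search, the text index never moves backward, and the pattern index can fall back only as many times as it has advanced, so the search makes at most 2N character comparisons.
Auxiliary Space: O(M) for the lps[] array, where M is the length of the pattern. The program above also allocates an N-element array to hold the results, so its total extra memory is O(N + M).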

