The Z algorithm is a powerful string-matching algorithm used to find all occurrences of a pattern within a text. It operates efficiently, with a linear time complexity of O(n+m), where n is the length of the text and m is the length of the pattern. This makes it particularly useful for problems involving large texts. In this article, we'll explore the Z algorithm, understand its underlying concepts, and learn how to implement it in Python.
The Z algorithm computes an array, known as the Z-array, for a given string. The Z-array at position i stores the length of the longest substring starting from i that is also a prefix of the string. This information can then be used to efficiently search for a pattern within a text.
Z-array Definition:
Given a string S of length n, the Z-array Z is defined as follows: Z[i] is the length of the longest substring starting from S[i] which is also a prefix of S.
Example:
Consider the string S = "aabcaabxaaaz" (n = 12). The Z-array for S is calculated as follows:
- Z[0] = 12 (by convention, since the entire string is a prefix of itself)
- Z[1] = 1 (the suffix starting at index 1 is "abcaabxaaaz"; only its first character "a" matches the prefix)
- Z[2] = 0 (the suffix starting at index 2 begins with "b", but S begins with "a")
- Z[3] = 0 (the suffix starting at index 3 begins with "c", but S begins with "a")
- Z[4] = 3 (the suffix starting at index 4 begins with "aab", which matches the prefix "aab"; the next characters, "x" and "c", differ)
- and so on.
- The Z-array for S is [12, 1, 0, 0, 3, 1, 0, 0, 2, 2, 1, 0].
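The definition translates almost directly into a brute-force check. The sketch below is only an illustration of the definition, not the Z algorithm itself: it runs in O(n^2) time, and the function name naive_z is a hypothetical helper introduced here. Running it on the example string S is a quick way to verify a hand-computed Z-array.

```python
def naive_z(s):
    """Z-array computed straight from the definition, in O(n^2) time."""
    n = len(s)
    z = [0] * n
    for i in range(n):
        # Count how many characters starting at i agree with the prefix of s.
        length = 0
        while i + length < n and s[i + length] == s[length]:
            length += 1
        z[i] = length
    return z


print(naive_z("aaaaa"))  # [5, 4, 3, 2, 1]
print(naive_z("abab"))   # [4, 0, 2, 0]
```

Note that this helper follows the convention Z[0] = n, because the comparison at i = 0 matches the whole string against itself.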
 
The Z Algorithm: Step-by-Step
Here's a detailed breakdown of how the Z algorithm works:
- Initialization:
  - Start with the entire string S, and initialize the Z-array Z with zeroes.
  - Set the variables L and R to 0. These variables define a window (the Z-box) in S where S[L:R+1] matches a prefix of S.

- Iterate through the string: For each position i ≥ 1 in the string S:
  - Case 1: If i > R, then no Z-box (a substring matching a prefix of S that starts before i and ends at or after i) covers i.
    - Set L = R = i and extend R to the right as long as R < n and S[R] == S[R-L].
    - Set Z[i] = R - L and decrement R.

  - Case 2: If i ≤ R, then i falls within a Z-box. Use the previously computed Z-values to determine the value of Z[i]:
    - Sub-case 2a: If Z[i-L] < R - i + 1, then Z[i] = Z[i-L].
    - Sub-case 2b: If Z[i-L] ≥ R - i + 1, then set L = i and extend R as long as R < n and S[R] == S[R-L]. Set Z[i] = R - L and decrement R.

- Output the Z-array: After processing all positions in the string, the Z-array contains the lengths of the longest substrings starting from each position that match the prefix of S.
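To make the case analysis concrete, the following sketch records which case handles each index. It assumes the same inclusive window [L, R] described in the steps; the function name trace_z and the tuple layout are hypothetical choices made for this illustration.

```python
def trace_z(s):
    """Return a list of (i, case, Z[i]) for every i >= 1."""
    n = len(s)
    z = [0] * n
    steps = []
    l = r = 0
    for i in range(1, n):
        if i > r:
            # Case 1: i lies outside the current Z-box, so compare from scratch.
            case = "1"
            l = r = i
            while r < n and s[r] == s[r - l]:
                r += 1
            z[i] = r - l
            r -= 1
        elif z[i - l] < r - i + 1:
            # Sub-case 2a: the earlier value fits strictly inside the Z-box.
            case = "2a"
            z[i] = z[i - l]
        else:
            # Sub-case 2b: the match may continue past R, so keep extending.
            case = "2b"
            l = i
            while r < n and s[r] == s[r - l]:
                r += 1
            z[i] = r - l
            r -= 1
        steps.append((i, case, z[i]))
    return steps


for i, case, value in trace_z("aabcaabxaaaz"):
    print(f"i={i:2d}  case {case:2s}  Z[i]={value}")
```

For the example string, indices 5 and 6 are resolved by sub-case 2a without any character comparisons, while indices 9 and 10 fall into sub-case 2b and extend the window.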
 
Implementing the Z Algorithm in Python:
To understand the Z algorithm better, let's break down the implementation step by step.
- calculate_z(s):
  - This function computes the Z-array for a given string s.
  - The Z-array is an array where the value at each position i indicates the length of the longest substring starting from s[i] which is also a prefix of s.
- z_algorithm(pattern, text):
  - This function uses the Z algorithm to search for all occurrences of pattern in text.
  - It concatenates the pattern, a unique delimiter ($), and the text to create a combined string.
  - It then computes the Z-array for the combined string and checks for positions in the Z-array where the Z-value equals the length of the pattern, indicating a match.
 
 
Below is the implementation of the above approach:
            
            Python
def calculate_z(s):
    n = len(s)  # Length of the input string
    # Initialize Z-array with zeros. z[0] is left at 0 here (by definition it
    # would be n), because the search below never uses it.
    z = [0] * n
    l, r = 0, 0  # Left and right (inclusive) boundaries of the current Z-box
    for i in range(1, n):
        # Case 1: i is outside the current Z-box
        if i > r:
            l, r = i, i
            while r < n and s[r] == s[r - l]:
                r += 1
            z[i] = r - l
            r -= 1
        # Case 2: i is inside the current Z-box
        else:
            k = i - l  # Position in the prefix that corresponds to i
            # Case 2a: Value does not stretch outside the Z-box
            if z[k] < r - i + 1:
                z[i] = z[k]
            # Case 2b: Value reaches the edge of the Z-box, so keep extending
            else:
                l = i
                while r < n and s[r] == s[r - l]:
                    r += 1
                z[i] = r - l
                r -= 1
    return z


def z_algorithm(pattern, text):
    # Concatenate pattern, delimiter, and text.
    # The delimiter "$" must not occur in pattern or text.
    combined = pattern + "$" + text
    # Calculate Z-array for the combined string
    z = calculate_z(combined)
    # Length of the pattern
    pattern_length = len(pattern)
    # List to store the result indices
    result = []
    # Only positions after the delimiter belong to the text
    for i in range(pattern_length + 1, len(z)):
        # If Z-value equals pattern length, pattern is found
        if z[i] == pattern_length:
            # Convert the position in the combined string to an index in text
            result.append(i - pattern_length - 1)
    return result


# Example usage:
pattern = "abc"
text = "ababcabc"
result = z_algorithm(pattern, text)
print("Pattern found at indices:", result)  # Pattern found at indices: [2, 5]
Output
Pattern found at indices: [2, 5]
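To see why comparing Z-values against the pattern length finds the matches, it can help to print the Z-values of the combined string for this example. The sketch below is self-contained and computes the values straight from the definition (quadratic time, for illustration only); prefix_match_lengths is a hypothetical helper name.

```python
def prefix_match_lengths(s):
    """Z-values computed directly from the definition (illustration only)."""
    z = []
    for i in range(len(s)):
        k = 0
        while i + k < len(s) and s[i + k] == s[k]:
            k += 1
        z.append(k)
    return z


pattern, text = "abc", "ababcabc"
combined = pattern + "$" + text      # "abc$ababcabc"
z = prefix_match_lengths(combined)
offset = len(pattern) + 1            # First position of the text part

for i in range(offset, len(combined)):
    marker = "  <- match" if z[i] == len(pattern) else ""
    print(f"text[{i - offset}] = {combined[i]!r}  Z = {z[i]}{marker}")
```

Because the delimiter occurs nowhere else, no Z-value inside the text part can exceed the pattern length, so a value equal to len(pattern) means the whole pattern matched at that position. Here that happens at text indices 2 and 5.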
Time Complexity: O(n + m), where n is the length of the text and m is the length of the pattern. The Z-array is computed in a single left-to-right pass over the combined string of length n + m + 1, and one more linear scan of the Z-array reports every occurrence of the pattern.
Auxiliary Space: O(n + m), for the combined string and its Z-array, each of length n + m + 1.
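One informal way to check the linear-time claim is to count character comparisons on an input that is a worst case for the brute-force approach, such as a long run of the same letter. The sketch below instruments the same case logic with a counter; count_comparisons is a hypothetical name, and the experiment is only an illustration, not a proof.

```python
def count_comparisons(s):
    """Run the Z algorithm on s; return (z, number of character comparisons)."""
    n = len(s)
    z = [0] * n
    comparisons = 0
    l = r = 0
    for i in range(1, n):
        if i <= r and z[i - l] < r - i + 1:
            z[i] = z[i - l]  # Sub-case 2a: value is copied, nothing compared
            continue
        if i > r:
            r = i            # Case 1: restart the window at i
        l = i                # Case 1 and sub-case 2b both move L to i
        while r < n:
            comparisons += 1
            if s[r] != s[r - l]:
                break
            r += 1
        z[i] = r - l
        r -= 1
    return z, comparisons


for n in (1_000, 2_000, 4_000):
    _, c = count_comparisons("a" * n)
    print(n, c)
```

On this input the count roughly doubles whenever n doubles, whereas comparing every suffix against the prefix from scratch would need about n^2 / 2 comparisons.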