The Z algorithm is a powerful string-matching algorithm used to find all occurrences of a pattern within a text. It operates efficiently, with a linear time complexity of O(n+m), where n is the length of the text and m is the length of the pattern. This makes it particularly useful for problems involving large texts. In this article, we'll explore the Z algorithm, understand its underlying concepts, and learn how to implement it in Python.
The Z algorithm computes an array, known as the Z-array, for a given string. The Z-array at position i stores the length of the longest substring starting from i that is also a prefix of the string. This information can then be used to efficiently search for a pattern within a text.
Z-array Definition:
Given a string S of length n, the Z-array Z is defined as follows: Z[i] is the length of the longest substring starting from S[i] which is also a prefix of S.
Example:
Consider the string S = "aabcaabxaaaz". The Z-array for S is calculated as follows:
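- Z[0] = 12 (by convention, since the entire string is a prefix of itself; some implementations, including the one below, leave Z[0] as 0 because it is never used)
- Z[1] = 1 (the suffix starting at index 1 is "abcaabxaaaz", which shares only "a" with the prefix)
- Z[2] = 0 (the suffix starting at index 2 begins with "b", but the prefix begins with "a")
- Z[3] = 0 (the suffix starting at index 3 begins with "c", which does not match "a")
- Z[4] = 3 (the suffix starting at index 4 begins with "aab", which matches the prefix "aab"; the next character "x" differs from "c")
- and so on.
The Z-array for S is [12, 1, 0, 0, 3, 1, 0, 0, 2, 2, 1, 0].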
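The definition can be turned into code almost word for word. The following is a minimal brute-force sketch, not the linear-time algorithm described later; the function name naive_z is an assumption made for illustration. It runs in O(n^2) time in the worst case, but it is handy for checking each entry of the example by hand.

```python
def naive_z(s):
    """Z-array computed straight from the definition (O(n^2) worst case)."""
    n = len(s)
    z = [0] * n
    if n:
        z[0] = n  # convention: the whole string is a prefix of itself
    for i in range(1, n):
        length = 0
        # Compare the suffix starting at i with the prefix, character by character.
        while i + length < n and s[length] == s[i + length]:
            length += 1
        z[i] = length
    return z


print(naive_z("aabcaabxaaaz"))
# [12, 1, 0, 0, 3, 1, 0, 0, 2, 2, 1, 0]
```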
The Z Algorithm: Step-by-Step
Here's a detailed breakdown of how the Z algorithm works:
- Initialization:
  - Start with the entire string S, and initialize the Z-array Z with zeroes.
  - Set the variables L and R to 0. These variables define a window (the Z-box) such that S[L:R+1] matches a prefix of S.
- Iterate through the string: For each position i from 1 to n - 1:
  - Case 1: If i > R, then no Z-box covers i (a Z-box is a substring matching a prefix of S that starts before i and ends at or after i).
    - Set L = R = i and extend R to the right as long as R < n and S[R] == S[R-L].
    - Set Z[i] = R - L and decrement R.
  - Case 2: If i ≤ R, then i falls within a Z-box. Use the previously computed Z-values to determine the value of Z[i]:
    - Sub-case 2a: If Z[i-L] < R - i + 1, then Z[i] = Z[i-L].
    - Sub-case 2b: If Z[i-L] ≥ R - i + 1, then set L = i and extend R as long as R < n and S[R] == S[R-L]. Set Z[i] = R - L and decrement R.
- Output the Z-array: After processing all positions in the string, the Z-array contains the lengths of the longest substrings starting from each position that match the prefix of S.
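The reuse of earlier values in Sub-case 2a can be checked by hand on the example string. The sketch below is only an illustration: the helper prefix_match_len is a hypothetical name, and the values L = 4, R = 6 are the Z-box produced by Z[4] = 3 in "aabcaabxaaaz". It shows that position i = 5 mirrors position k = i - L = 1 in the prefix, so Z[5] can be copied from Z[1] without comparing any characters.

```python
def prefix_match_len(s, i):
    """Length of the longest common prefix of s and s[i:], by direct comparison."""
    n = 0
    while i + n < len(s) and s[n] == s[i + n]:
        n += 1
    return n


s = "aabcaabxaaaz"
L, R = 4, 6  # Z-box: s[4:7] == "aab" matches the prefix s[0:3]
assert s[L:R + 1] == s[:R - L + 1]

i = 5
k = i - L              # mirror position inside the prefix
remaining = R - i + 1  # characters of the Z-box from i onward
assert s[i:R + 1] == s[k:k + remaining]  # "ab" == "ab"

z_k = prefix_match_len(s, k)
# Sub-case 2a: the match at k ends strictly inside the box, so Z[i] = Z[k].
assert z_k < remaining
assert prefix_match_len(s, i) == z_k

print(k, remaining, z_k)  # 1 2 1
```

When Z[k] reaches or passes the end of the box (Sub-case 2b), the copied value is only a lower bound, which is why the algorithm resumes comparing from R.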
Implementing the Z Algorithm in Python:
To understand the Z algorithm better, let's break down the implementation step by step.
- calculate_z(s):
  - This function computes the Z-array for a given string s.
  - The Z-array is an array where the value at each position i indicates the length of the longest substring starting from s[i] which is also a prefix of s.
- z_algorithm(pattern, text):
  - This function uses the Z algorithm to search for all occurrences of pattern in text.
  - It concatenates the pattern, a unique delimiter ($), and the text to create a combined string. The delimiter must not occur in either the pattern or the text; otherwise matches can be missed or reported at invalid positions.
  - It then computes the Z-array for the combined string and checks for positions in the Z-array where the Z-value equals the length of the pattern, indicating a match.
Below is the implementation of the above approach:
```python
def calculate_z(s):
    n = len(s)  # Length of the input string
    z = [0] * n  # Initialize Z-array with zeros
    l, r = 0, 0  # Left and right boundaries of the current Z-box
    for i in range(1, n):
        # Case 1: i is outside the current Z-box
        if i > r:
            l, r = i, i
            while r < n and s[r] == s[r - l]:
                r += 1
            z[i] = r - l
            r -= 1
        # Case 2: i is inside the current Z-box
        else:
            k = i - l
            # Case 2a: Value does not stretch outside the Z-box
            if z[k] < r - i + 1:
                z[i] = z[k]
            # Case 2b: Value stretches outside the Z-box
            else:
                l = i
                while r < n and s[r] == s[r - l]:
                    r += 1
                z[i] = r - l
                r -= 1
    return z


def z_algorithm(pattern, text):
    # Concatenate pattern, delimiter, and text
    combined = pattern + "$" + text
    # Calculate Z-array for the combined string
    z = calculate_z(combined)
    # Length of the pattern
    pattern_length = len(pattern)
    # List to store the result indices
    result = []
    for i in range(len(z)):
        # If Z-value equals pattern length, pattern is found
        if z[i] == pattern_length:
            # Append starting index (relative to text) to result
            result.append(i - pattern_length - 1)
    return result


# Example usage:
pattern = "abc"
text = "ababcabc"
result = z_algorithm(pattern, text)
print("Pattern found at indices:", result)  # Output should be [2, 5]
```
Output:
Pattern found at indices: [2, 5]
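To see why the index arithmetic i - pattern_length - 1 works, it can help to print the combined string next to its Z-values. The sketch below is an illustration only: it computes the Z-values by brute force with the standard library function os.path.commonprefix (quadratic time, and it reports Z[0] as the full length), rather than with calculate_z. The variable names m and matches are assumptions made for this example.

```python
import os.path

pattern, text = "abc", "ababcabc"
combined = pattern + "$" + text
m = len(pattern)

# Brute-force Z-values: common prefix length of the whole string and each suffix.
z = [len(os.path.commonprefix([combined, combined[i:]]))
     for i in range(len(combined))]

for i, (ch, value) in enumerate(zip(combined, z)):
    marker = ""
    # Positions 0..m hold the pattern and the delimiter; the text starts at m + 1.
    if i > m and value == m:
        marker = f"  <- match, text index {i - m - 1}"
    print(f"{i:2d} {ch} {value:2d}{marker}")

matches = [i - m - 1 for i in range(m + 1, len(combined)) if z[i] == m]
print(matches)  # [2, 5]
```

Because the delimiter occurs nowhere else, no Z-value at a text position can exceed m, so a value equal to m means the whole pattern matched starting at that position.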
Time Complexity: O(n + m), where n is the length of the text and m is the length of the pattern. The Z-array is computed over the combined string of length n + m + 1, and because R only ever moves to the right, each character takes part in at most one successful comparison. A single pass over the Z-array then reports all occurrences of the pattern.
Auxiliary Space: O(n + m), for the combined string and its Z-array, each of length n + m + 1.
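The linear bound can also be observed empirically by counting character comparisons. The sketch below uses a compact variant of the Z-array computation that merges the cases with min(); it is not the implementation shown above, and the function name z_with_comparison_count is an assumption made for illustration. The count should stay below 2 * len(s) for every input, since each position is matched successfully at most once and each i causes at most one mismatch.

```python
def z_with_comparison_count(s):
    """Return (Z-array, number of character comparisons) for s."""
    n = len(s)
    z = [0] * n
    l = r = 0  # current Z-box is s[l..r], inclusive
    comparisons = 0
    for i in range(1, n):
        if i <= r:
            # Reuse the mirrored value, capped at the end of the Z-box.
            z[i] = min(r - i + 1, z[i - l])
        # Extend the match by explicit comparison.
        while i + z[i] < n:
            comparisons += 1
            if s[z[i]] != s[i + z[i]]:
                break
            z[i] += 1
        if i + z[i] - 1 > r:
            l, r = i, i + z[i] - 1
    return z, comparisons


for s in ("a" * 1000, "ab" * 500, "aabcaabxaaaz"):
    z, comparisons = z_with_comparison_count(s)
    assert comparisons <= 2 * len(s)
    print(len(s), comparisons)
```

A brute-force computation on "a" * 1000 would need roughly half a million comparisons, so the difference is easy to see even on small inputs.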