
Data Structure
Networking
RDBMS
Operating System
Java
MS Excel
iOS
HTML
CSS
Android
Python
C Programming
C++
C#
MongoDB
MySQL
Javascript
PHP
- Selected Reading
- UPSC IAS Exams Notes
- Developer's Best Practices
- Questions and Answers
- Effective Resume Writing
- HR Interview Questions
- Computer Glossary
- Who is Who
Implement Zhu-Takaoka String Matching Algorithm in Java
The Zhu-Takaoka algorithm is one of the best algorithms for pattern matching. It is developed using the combination of the Boyer-Moore and KMP string-matching algorithms.
The Zhu-Takaoka algorithm utilizes the good character shift and bad character shift techniques to solve the problem.
Problem statement ? We have given two strings. We need to implement the Zhu-Takaoka algorithm for pattern matching.
Sample Examples
Input
str = "PQRDPQRSSE"; patt = "PQRS";
Output
5
Explanation
The ?PQRS' pattern exists at position 5. So, it prints 5.
Input
str = "PQRDPQRSSE"; patt = "PRQS";
Output
-1
Explanation
The pattern doesn't exist in the given string.
Input
str = "WELWELWELCOME"; patt = "WEL";
Output
1, 4, 7
Explanation
The pattern exists at multiple positions in the given string.
Approach
In the Zhu-Takaoka algorithm, we will prepare a ZTBC table after pre-processing the pattern string. So, we can know by how many indexes we should move the pattern to get the next match.
Let's understand the working of the Zhu-Takaoka algorithm step-by-step.
First, we create a ZTBC table of dimensions 26 x 26. Each row and column of the table is represented using the uppercase alphabetical characters.
In the first stage, all table elements are equal to the pattern length, assuming we need to pass the whole pattern when any mismatch occurs.
So, table values are as shown below according to the ?PQRS' pattern.
A B C D E F ? A 4 4 4 4 4 4 B 4 4 4 4 4 4 C 4 4 4 4 4 4 D 4 4 4 4 4 4 ..
In this algorithm, we need to match the two characters of the pattern with the string from right to left. If we find two consecutive matching elements in the pattern, we need to move the pattern such that it pair of characters match both in the string and pattern.
So, update the ZTBC table accordingly.
table [pattern[p-1]][pattern[p]] = len - p - 1 ;
After pre-processing the table, we need to start comparing the string and pattern.
Algorithm
Step 1 ? Define the ZTBCTable[] array of dimension 26 x 26 globally to store the pre-processed moves of the pattern.
Step 2 ? Execute the prepareZTBCTable() to fill the ZTBCTable() array after pre-processing the pattern.
Step 2.1 ? In the prepareZTBCTable() function, initialize all array elements with pattern length.
Step 2.2 ? Initialize the ZTBCTable[p][patt.charAt(0) - 'A'] with the pattern length - 1, representing the move when single character matches.
Step 2.3 ? Update the ZTBCTable[patt.charAt(p - 1) - 'A'][patt.charAt(p) - 'A'] with pattern length - 1 - index.
Step 3 ? Next, initialize the q with 0 and isPatPresent with a false value.
Step 4 ? Make iterations until q is smaller than the string and pattern length difference.
Step 5 ? Make nested loo's iteration until string and pattern character matches from the last.
Step 6 ? We found the match if p is less than 0. So, print the q as starting index, and update the isPatPresent with true.
Step 7 ? Otherwise, add ZTBCTable[str.charAt(q + patt_len - 2) - 'A'][str.charAt(q + patt_len - 1) - 'A'] to the ?q' variable to move the pattern.
Step 8 ? At last, if the value of isPatPresent is false, print ?pattern doesn't exists' in the string.
Example
import java.io.*; import java.lang.*; import java.util.*; public class Main { // Declaring custom strings as inputs and patterns public static String str = "PQRDPQRSSE"; public static String patt = "PQRS"; // And their lengths public static int str_len = str.length(); public static int patt_len = patt.length(); public static int[][] ZTBCTable = new int[26][26]; public static void prepareZTBCTable() { int p, q; // Initialize the table for (p = 0; p < 26; ++p) for (q = 0; q < 26; ++q) ZTBCTable[p][q] = patt_len; for (p = 0; p < 26; ++p) ZTBCTable[p][patt.charAt(0) - 'A'] = patt_len - 1; for (p = 1; p < patt_len - 1; ++p) ZTBCTable[patt.charAt(p - 1) - 'A'][patt.charAt(p) - 'A'] = patt_len - 1 - p; } public static void main(String args[]) { int p, q; // Preparing a ZTBC table prepareZTBCTable(); q = 0; boolean isPatPresent = false; while (q <= str_len - patt_len) { p = patt_len - 1; while (p >= 0 && patt.charAt(p) == str.charAt(p + q)) --p; if (p < 0) { // When we get the pattern System.out.println("Pattern exists at index " + (q + 1)); q += patt_len; isPatPresent = true; } else { // Move in the string q += ZTBCTable[str.charAt(q + patt_len - 2) - 'A'][str.charAt(q + patt_len - 1) - 'A']; } } if(!isPatPresent){ System.out.println("Pattern doesn't exists in the given string"); } } }
Output
Pattern exists at index 5
Time complexity - O(N*M), where N is the string length, and M is the pattern length.
Space complexity - O(26*26) to store the pattern moves.
The Zhu-Takaoka algorithm is more efficient in terms of memory and time. Also, it compares the two characters of the pattern with the string, improving the algorithm's performance by decreasing the comparisons.