String Matching Algorithms Overview

The document discusses the string matching problem, which involves finding occurrences of a pattern within a text, and highlights its applications in areas like DNA sequencing and internet search engines. It introduces various algorithms for string matching, focusing on the Naive and Rabin-Karp algorithms, detailing their procedures and complexities. The document also explains the concepts of valid and invalid shifts in string matching, along with definitions and notations relevant to the algorithms.

Uploaded by

dayalrattanani


CT-363

Design & Analysis of Algorithms

String Matching
String Matching Problem
• Text-editing programs frequently need to find all
occurrences of a pattern in the text.
• Typically, the text is a document being edited, and
the pattern searched for is a particular word
supplied by the user.
• Efficient algorithms for this problem, called
“string matching”, can greatly aid the
responsiveness of the text-editing program.
String Matching Problem
• In computer science, string-searching algorithms,
sometimes called string-matching algorithms, try to
find the places where one or several strings
(called patterns) occur within a larger string or
text.
Applications: String Matching Algorithms
• Finding particular patterns in DNA sequences
• Internet search engines
String Matching Problem
• We assume that the text is an array T[1 .. n] of
length n and that the pattern is an array P[1 .. m]
of length m ≤ n.
• We further assume that the elements of P and T
are characters drawn from a finite alphabet Σ.
– For example, we may have Σ = {0, 1} or
Σ = {a, b, . . . , z}.
• The character arrays P and T are often called
strings of characters.
String Matching Problem
• We say that pattern P occurs with shift s in text T (or,
equivalently, that pattern P occurs beginning at position
s + 1 in text T) if
0 ≤ s ≤ n - m and T[s + 1 .. s + m] = P[1 .. m]
(that is, if T[s + j] = P[j] for 1 ≤ j ≤ m).
– If P occurs with shift s in T, we call s a valid shift;
– otherwise, we call s an invalid shift.
String Matching Problem
• The string-matching problem is “finding all valid shifts
with which a given pattern P occurs in a given text T”.
Example: String Matching Problem

Text T:     a b c a b a a b c a b a c
Pattern P:        a b a a              (s = 3)

With s = 3 the pattern lines up with T[4 .. 7] = a b a a, so s = 3 is a valid shift.
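
The definition of a valid shift can be checked directly in code. The following is a minimal Python sketch, not part of the original slides; the function name `occurs_with_shift` is an assumption made for illustration. Python strings are indexed from 0, so the slide's T[s + 1 .. s + m] becomes the slice `text[s:s + m]`.

```python
def occurs_with_shift(text: str, pattern: str, s: int) -> bool:
    """Return True if `pattern` occurs in `text` with shift `s`.

    The slides index arrays from 1, so T[s + 1 .. s + m] there
    corresponds to the 0-based Python slice text[s:s + m].
    """
    n, m = len(text), len(pattern)
    # A shift only makes sense when the pattern fits inside the text.
    if not 0 <= s <= n - m:
        return False
    return text[s:s + m] == pattern


# The example from the slide above: T = abcabaabcabac, P = abaa.
text = "abcabaabcabac"
pattern = "abaa"
valid_shifts = [s for s in range(len(text) - len(pattern) + 1)
                if occurs_with_shift(text, pattern, s)]
print(valid_shifts)
```

For this text and pattern the only valid shift is s = 3, which agrees with the example.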
Definitions and Notations

Notation   Terminology
Σ*         The set of all finite-length strings formed using characters from the alphabet Σ.
ε          The zero-length empty string, which also belongs to Σ*.
|x|        The length of a string x.
xy         The concatenation of two strings x and y; it has length |x| + |y| and consists of the characters from x followed by the characters from y.
w ⊏ x      A string w is a prefix of a string x if x = wy for some string y ∈ Σ*. If w ⊏ x, then |w| ≤ |x|.
w ⊐ x      A string w is a suffix of a string x if x = yw for some string y ∈ Σ*. If w ⊐ x, then |w| ≤ |x|.
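
As an illustration, the prefix and suffix relations map onto Python's built-in `str.startswith` and `str.endswith`. This is only a sketch; the helper names `is_prefix` and `is_suffix` are hypothetical and do not appear in the slides.

```python
def is_prefix(w: str, x: str) -> bool:
    """w is a prefix of x if x = w + y for some string y."""
    return x.startswith(w)


def is_suffix(w: str, x: str) -> bool:
    """w is a suffix of x if x = y + w for some string y."""
    return x.endswith(w)


print(is_prefix("ab", "abcca"))   # x = "ab" + "cca"
print(is_suffix("cca", "abcca"))  # x = "ab" + "cca"
print(is_prefix("", "abcca"))     # the empty string is a prefix of every string
```

In each case where the relation holds, the length condition |w| ≤ |x| holds as well, since x contains w as a contiguous piece.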
1. Naive Approach
• The idea is based on the brute-force approach.
• The naive algorithm finds all valid shifts using a loop that
checks the condition P[1 .. m] = T[s + 1 .. s + m] for each
of the n - m + 1 possible values of s.
• It can be interpreted graphically as sliding a
“template” containing the pattern over the text, noting for
which shifts all of the characters on the template equal
the corresponding characters in the text.
String Matching Algorithms
• There are many types of string matching
algorithms:
– The naive string-matching algorithm
– The Rabin-Karp algorithm
– String matching with finite automata
– The Knuth-Morris-Pratt algorithm
• But we will discuss only two of them, namely the naive and
Rabin-Karp algorithms.
Naive String Matching Algorithm
• The naive algorithm finds all valid shifts using a loop
that checks the condition P[1 .. m] = T[s + 1 .. s + m] for each
of the n - m + 1 possible values of s.

NAIVE-STRING-MATCHER(T, P)
n ← length[T]
m ← length[P]
for s ← 0 to n - m
    do if P[1 .. m] = T[s + 1 .. s + m]
        then print "Pattern occurs with shift" s
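
The pseudocode above can be sketched in Python as follows. This is one possible rendering under stated assumptions rather than a definitive implementation: it uses 0-based indexing, returns the valid shifts in a list instead of printing them, and spells out the character-by-character comparison that the pseudocode hides inside the test P[1 .. m] = T[s + 1 .. s + m]. The function name is chosen for illustration.

```python
def naive_string_matcher(text: str, pattern: str) -> list[int]:
    """Return every valid shift s with text[s:s + m] == pattern."""
    n, m = len(text), len(pattern)
    shifts = []
    # Try each of the n - m + 1 possible shifts in turn.
    for s in range(n - m + 1):
        j = 0
        # Compare left to right and stop at the first mismatch.
        while j < m and text[s + j] == pattern[j]:
            j += 1
        if j == m:  # all m characters matched
            shifts.append(s)
    return shifts


print(naive_string_matcher("acaabc", "aab"))
```

With T = acaabc and P = aab the function reports the single valid shift s = 2.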
Example: Naive String Matching Algorithm
• Suppose
P = aab
T = acaabc
Find all valid shifts
Example: Naive String Matching Algorithm

n ← length[T] = 6
m ← length[P] = 3
for s ← 0 to n – m (6 - 3 = 3)

s = 0
T: a c a a b c
P: a a b

P[1] = T[s + 1]: P[1] = T[1] (as a = a)
P[2] = T[s + 2]: but P[2] ≠ T[2] (as a ≠ c), so s = 0 is an invalid shift
Example: Naive String Matching Algorithm

for s ← 1

s = 1
T: a c a a b c
P:   a a b

P[1] = T[s + 1]: but P[1] ≠ T[2] (as a ≠ c), so s = 1 is an invalid shift
Example: Naive String Matching Algorithm

for s ← 2

s = 2
T: a c a a b c
P:     a a b

P[1] = T[s + 1]: P[1] = T[3] (as a = a)
P[2] = T[s + 2]: P[2] = T[4] (as a = a)
P[3] = T[s + 3]: P[3] = T[5] (as b = b)
All three characters match, so s = 2 is a valid shift.
Example: Naive String Matching Algorithm

for s ← 3

s = 3
T: a c a a b c
P:       a a b

P[1] = T[s + 1]: P[1] = T[4] (as a = a)
P[2] = T[s + 2]: but P[2] ≠ T[5] (as a ≠ b), so s = 3 is an invalid shift
Naive String Matching Algorithm
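• Worst-case running time
– Outer loop: n – m + 1 iterations
– Inner loop: up to m character comparisons per shift
– Total: Θ((n - m + 1)m)
• Best case: Θ(n - m + 1), when the first character mismatches at every shift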
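
To make these bounds concrete, the sketch below counts character comparisons made by the naive matcher. It is an illustrative Python example, not taken from the slides; the function name `count_comparisons` and the sample inputs are assumptions.

```python
def count_comparisons(text: str, pattern: str) -> int:
    """Count the character comparisons the naive matcher performs."""
    n, m = len(text), len(pattern)
    comparisons = 0
    for s in range(n - m + 1):
        for j in range(m):
            comparisons += 1
            if text[s + j] != pattern[j]:
                break  # first mismatch ends the work for this shift
    return comparisons


n, m = 10, 3
# Worst case: every shift is valid, so each shift costs m comparisons.
worst = count_comparisons("a" * n, "a" * m)
# Best case: the first character mismatches at every shift.
best = count_comparisons("b" * n, "a" * m)
print(worst, best)
```

For n = 10 and m = 3 this gives (n - m + 1)m = 24 comparisons in the worst case and n - m + 1 = 8 in the best case.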
Note
• It is not an optimal procedure for the string matching problem.
• It has a high worst-case running time.
• The naive string-matcher is inefficient because
information gained about the text for one value of s is
entirely ignored in considering other values of s.
2. The Rabin-Karp Algorithm
• It compares the strings’ hash values rather than the strings
themselves.
• It performs well in practice and generalizes to other
algorithms for related problems, such as two-dimensional
pattern matching.
2. The Rabin-Karp Algorithm
Special Case
• Given a text T[1 .. n] of length n and a pattern P[1 .. m] of
length m ≤ n, both as arrays.
• Assume that the elements of P and T are characters drawn
from a finite alphabet Σ, where Σ = {0, 1, 2, . . . , 9}, so that
each character is a decimal digit.
• Now our objective is “finding all valid shifts with which
a given pattern P occurs in a text T”.
Notations: The Rabin-Karp Algorithm
Let us suppose that
• p denotes the decimal value of the given pattern P[1 .. m]
• t_s denotes the decimal value of the length-m substring T[s + 1 .. s + m]
of the given text T[1 .. n], for s = 0, 1, ..., n - m.
• Clearly, t_s = p if and only if
T[s + 1 .. s + m] = P[1 .. m];
thus, s is a valid shift if and only if t_s = p.
• Now the question is how to compute p and t_s efficiently.
• The answer is Horner’s rule.
Horner’s Rule
Example: Horner’s rule
[3, 4, 5] = 5 + 10(4 + 10(3)) = 5 + 10(4 + 30) = 5 + 340 = 345
p = P[3] + 10(P[3 - 1] + 10(P[1])).

Formula
• We can compute p in time Θ(m) using this rule as
p = P[m] + 10(P[m-1] + 10(P[m-2] + … + 10(P[2] + 10 P[1]) … ))
• Similarly, t_0 can be computed from T[1 .. m] in time Θ(m).
• To compute t_1, t_2, . . . , t_{n-m} in time Θ(n - m), it suffices to
observe that t_{s+1} can be computed from t_s in constant time.
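
Horner's rule amounts to a single left-to-right loop. The Python sketch below is illustrative only; the name `horner_value` and the `base` parameter are assumptions, with `base=10` matching the decimal-digit special case used here.

```python
def horner_value(digits: list[int], base: int = 10) -> int:
    """Evaluate a digit sequence, most significant digit first, by Horner's rule."""
    value = 0
    for digit in digits:
        # One multiplication and one addition per digit: Θ(m) in total.
        value = base * value + digit
    return value


print(horner_value([3, 4, 5]))
print(horner_value([3, 1, 4, 1, 5]))
```

The first call reproduces the worked example, 345; the second gives 31415, the value used for t_0 in the slides that follow.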
Computing t_{s+1} from t_s in constant time
• Text = [3, 1, 4, 1, 5, 2]; t_0 = 31415
• m = 5; Shift = 0
3 1 4 1 5 2
• Old high-order digit = 3
• New low-order digit = 2
• t_1 = 10 · (31415 – 10^4 · T[1]) + T[5 + 1]
= 10 · (31415 – 10^4 · 3) + 2
= 10(1415) + 2 = 14152
• t_{s+1} = 10(t_s – T[s + 1] · 10^{m-1}) + T[s + m + 1]
• t_1 = 10(t_0 – T[1] · 10^4) + T[0 + 5 + 1]
• Now t_1, t_2, . . . , t_{n-m} can be computed in Θ(n - m)

Procedure: Computing t_{s+1} from t_s
1. Subtract T[s + 1] · 10^{m-1} from t_s; this removes the high-order digit.
2. Multiply the result by 10; this shifts the number left one position.
3. Add T[s + m + 1]; this brings in the appropriate low-order digit.
t_{s+1} = 10(t_s – T[s + 1] · 10^{m-1}) + T[s + m + 1]
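
The three-step procedure above can be sketched as a small Python function. The name `roll` and its parameters are hypothetical, chosen for this illustration; the text is held as a 0-based list of digits, so the slide's T[1] is `text[0]` and T[s + m + 1] is `text[s + m]`.

```python
def roll(t_s: int, old_digit: int, new_digit: int, m: int, base: int = 10) -> int:
    """Compute t_{s+1} from t_s without rescanning the window."""
    high = base ** (m - 1)              # weight of the high-order digit
    without_high = t_s - old_digit * high   # step 1: drop the high-order digit
    shifted = base * without_high           # step 2: shift left one position
    return shifted + new_digit              # step 3: bring in the new low-order digit


text = [3, 1, 4, 1, 5, 2]
m = 5
t0 = 31415
t1 = roll(t0, old_digit=text[0], new_digit=text[m], m=m)
print(t1)
```

The result is 14152, matching the hand computation of t_1. In a real matcher the power `base ** (m - 1)` would be computed once, outside the loop, so that each update takes constant time.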

Another issue and its treatment

• The only difficulty with the above procedure is that p and
t_s may be too large to work with conveniently.
• Fortunately, there is a simple cure for this problem:
compute p and the t_s modulo a suitable modulus q.
Computing t_{s+1} from t_s Modulo q = 13

Text: 2 3 5 9 0 2 3 1 4 1 5 2 6 7 3 9 9 2 1

The window of length 5 beginning at position 7 (digits 3 1 4 1 5) is shaded in the original figure.

The numerical value of the window = 31415

31415 mod 13 = 7
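
As a worked illustration of the update modulo q, the Python sketch below moves the window 31415 one position to the right, to 14152, using only residues modulo 13. The variable names are chosen for this example, and h stands for 10^{m-1} mod q. Python's `%` operator returns a non-negative result for a positive modulus, so the negative intermediate value needs no special handling; in languages where the remainder can be negative, q would have to be added back.

```python
q = 13
m = 5
h = pow(10, m - 1, q)        # 10^4 mod 13, the weight of the high-order digit
t_s = 31415 % q              # residue of the current window
old_digit, new_digit = 3, 2  # digit leaving the window, digit entering it

# Same three steps as before, but every quantity is kept modulo q.
t_next = (10 * (t_s - old_digit * h) + new_digit) % q
print(h, t_s, t_next)
```

Here h = 3 and t_s = 7, and the update yields 8, which equals 14152 mod 13.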
Spurious Hits and their Elimination
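• m = 5
• p = 31415
• Now, 31415 ≡ 7 (mod 13)
• Now, 67399 ≡ 7 (mod 13)
• Window beginning at position 7 = valid match; s = 6
• Window beginning at position 13 = spurious hit; s = 12
• When the values modulo 13 agree, an explicit text comparison is
still needed to tell a valid match from a spurious hit

Position:       1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19
Text digit:     2  3  5  9  0  2  3  1  4  1  5  2  6  7  3  9  9  2  1
Window mod 13:  8  9  3 11  0  1  7  8  4  5 10 11  7  9 11

Each value in the last row belongs to the length-5 window beginning at that position.
Position 7 is the valid match; position 13 is the spurious hit.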
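
The distinction between valid matches and spurious hits can be reproduced with a short Python sketch. This is an illustration rather than the algorithm itself: for clarity it recomputes each window's value directly instead of rolling it, and the function name `classify_hits` is an assumption.

```python
def classify_hits(text: str, pattern: str, q: int) -> tuple[list[int], list[int]]:
    """Split the shifts whose window agrees with the pattern modulo q
    into valid matches and spurious hits (decimal-digit strings only)."""
    m = len(pattern)
    p = int(pattern) % q
    valid, spurious = [], []
    for s in range(len(text) - m + 1):
        window = text[s:s + m]
        if int(window) % q == p:
            # Residues agree; only an explicit comparison settles it.
            if window == pattern:
                valid.append(s)
            else:
                spurious.append(s)
    return valid, spurious


valid, spurious = classify_hits("2359023141526739921", "31415", 13)
print(valid, spurious)
```

For the text in the figure, the valid match is at shift s = 6 and the spurious hit at shift s = 12.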
2. The Rabin-Karp Algorithm
Generalization
• Given a text T [1 .. n] of length n, a pattern P[1 .. m] of
length m ≤ n, both as arrays.
• Assume that the elements of P and T are characters drawn
from a finite alphabet Σ = {0, 1, 2, . . . , d-1}.
• Now our objective is “finding all valid shifts with which
a given pattern P occurs in a text T”.

Note
• t_{s+1} = (d(t_s – T[s + 1]h) + T[s + m + 1]) mod q
where h ≡ d^{m-1} (mod q) is the value of the digit “1” in the
high-order position of an m-digit text window.
Sequence of Steps for Designing the Algorithm
1. Compute the lengths of pattern P and text T.
2. Compute p and t_0 modulo q using Horner’s rule.
3. Any shift s for which t_s ≡ p (mod q) must be tested
further to see whether s is really a valid shift or a spurious hit.
4. This testing can be done by checking the condition
P[1 .. m] = T[s + 1 .. s + m]. If these strings are equal, s
is a valid shift; otherwise it is a spurious hit.
5. Whether or not t_s ≡ p (mod q) holds, if s < n - m, compute
t_{s+1} from t_s and repeat step 3 for the next shift.

Note
• Since t_s ≡ p (mod q) does not imply that t_s = p, a text
comparison is required to confirm a valid shift.
2. The Rabin-Karp Algorithm
RABIN-KARP-MATCHER(T, P, d, q)
n ← length[T]
m ← length[P]
h ← d^{m-1} mod q
p ← 0
t_0 ← 0
for i ← 1 to m                      ▹ Preprocessing.
    do p ← (dp + P[i]) mod q
       t_0 ← (d t_0 + T[i]) mod q
for s ← 0 to n - m                  ▹ Matching.
    do if p = t_s
          then if P[1 .. m] = T[s + 1 .. s + m]
                  then print "Pattern occurs with shift" s
       if s < n - m
          then t_{s+1} ← (d(t_s - T[s + 1]h) + T[s + m + 1]) mod q
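
A possible Python rendering of the pseudocode above is sketched here. Several choices are assumptions that the slides do not specify: characters are mapped to numbers with `ord`, the defaults d = 256 and q = 101 are arbitrary illustrative values, shifts are returned in a list rather than printed, and an empty pattern is treated as having no matches.

```python
def rabin_karp_matcher(text: str, pattern: str, d: int = 256, q: int = 101) -> list[int]:
    """Return all valid shifts of `pattern` in `text` using a rolling hash."""
    n, m = len(text), len(pattern)
    if m == 0 or m > n:
        return []  # assumption: nothing to report in these cases
    h = pow(d, m - 1, q)  # d^(m-1) mod q
    p = t = 0
    # Preprocessing: hash the pattern and the first window by Horner's rule.
    for i in range(m):
        p = (d * p + ord(pattern[i])) % q
        t = (d * t + ord(text[i])) % q
    shifts = []
    # Matching: compare hashes first, characters only when the hashes agree.
    for s in range(n - m + 1):
        if p == t and text[s:s + m] == pattern:
            shifts.append(s)
        if s < n - m:
            # Drop text[s], shift, and bring in text[s + m]; Python's %
            # keeps the result non-negative.
            t = (d * (t - ord(text[s]) * h) + ord(text[s + m])) % q
    return shifts


print(rabin_karp_matcher("2359023141526739921", "31415", d=10, q=13))
print(rabin_karp_matcher("acaabc", "aab"))
```

Because this sketch hashes character codes rather than digit values, its intermediate hash values differ from those in the slides, but the reported shifts are the same: s = 6 for the digit example and s = 2 for T = acaabc, P = aab.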
Analysis: The Rabin-Karp Algorithm
• Worst-case running time
– Preprocessing time: Θ(m)
– Matching time: Θ((n – m + 1)m)
• If P = a^m and T = a^n, verifications take time Θ((n - m + 1)m),
since each of the n - m + 1 possible shifts is valid.
• In applications with few valid shifts, say a constant number c,
the matching time of the
algorithm is only O((n - m + 1) + cm) = O(n + m), plus
the time required to process spurious hits.
Summary
Algorithm            Preprocessing time   Matching time
Naive                0                    O((n-m+1)m)
Rabin-Karp           Θ(m)                 O((n-m+1)m)
Finite automaton     O(m|Σ|)              Θ(n)
Knuth-Morris-Pratt   Θ(m)                 Θ(n)
