String Searching Algorithms Slides
Robert Horvick
SOFTWARE ENGINEER
@bubbafat www.roberthorvick.com
String searching overview
Boyer-Moore-Horspool algorithm
- Algorithm
- Performance
Search Algorithm
The IStringSearchAlgorithm interface defines the Search method, which is called to
find all matches of the search string (pattern) within the input string (toSearch).
public interface ISearchMatch
Search Matches
The ISearchMatch interface defines the index and length of a search match in
the input string.
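The two interfaces described above might look roughly like the following. This is a minimal sketch, not the course's actual declarations: the member names Index and Length, the IEnumerable<ISearchMatch> return type, and the parameter order of Search are assumptions inferred from the slide text.

```csharp
using System.Collections.Generic;

// The position and size of one match within the input string.
public interface ISearchMatch
{
    int Index { get; }   // assumed name: zero-based start of the match in toSearch
    int Length { get; }  // assumed name: number of characters matched
}

// A string searching algorithm that reports every match of pattern in toSearch.
public interface IStringSearchAlgorithm
{
    // Assumed signature: the slides name the two arguments but not their order.
    IEnumerable<ISearchMatch> Search(string toSearch, string pattern);
}
```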
string pattern = "fox";
string toSearch = "The quick brown fox jumps over the lazy dog";
...
Example Usage
The Search method of the IStringSearchAlgorithm instance (algorithm) returns
each of the matches as an enumeration of ISearchMatch objects.
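To make the usage pattern concrete, here is a hedged, self-contained sketch of enumerating the matches. The interface shapes, the SearchMatch class, and the IndexOfSearch stand-in (which simply wraps string.IndexOf) are hypothetical helpers introduced only so the example can run; they are not the algorithms covered later in the deck.

```csharp
using System;
using System.Collections.Generic;

// Assumed interface shapes (member names and parameter order are guesses).
public interface ISearchMatch { int Index { get; } int Length { get; } }

public interface IStringSearchAlgorithm
{
    IEnumerable<ISearchMatch> Search(string toSearch, string pattern);
}

// Hypothetical match type used only for this sketch.
public sealed class SearchMatch : ISearchMatch
{
    public SearchMatch(int index, int length) { Index = index; Length = length; }
    public int Index { get; }
    public int Length { get; }
}

// Stand-in algorithm backed by string.IndexOf so the example is runnable.
public sealed class IndexOfSearch : IStringSearchAlgorithm
{
    public IEnumerable<ISearchMatch> Search(string toSearch, string pattern)
    {
        if (string.IsNullOrEmpty(pattern)) yield break;
        int index = toSearch.IndexOf(pattern, StringComparison.Ordinal);
        while (index >= 0)
        {
            yield return new SearchMatch(index, pattern.Length);
            // Resume one character later so overlapping matches are reported.
            index = toSearch.IndexOf(pattern, index + 1, StringComparison.Ordinal);
        }
    }
}

public static class UsageExample
{
    public static void Run()
    {
        string pattern = "fox";
        string toSearch = "The quick brown fox jumps over the lazy dog";
        IStringSearchAlgorithm algorithm = new IndexOfSearch();

        // Each match carries its position and length in toSearch.
        foreach (ISearchMatch match in algorithm.Search(toSearch, pattern))
        {
            Console.WriteLine($"Match at index {match.Index}, length {match.Length}");
        }
    }
}
```

With this input the loop body runs once, for the match that starts at index 16 with length 3.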
Search Example
Naive Search Algorithm
for (startIndex = 0; startIndex < toSearch.Length; startIndex++) {
matchCount = 0
A T
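The loop fragment above can be completed into a runnable naive search along the following lines. This is a sketch under assumptions: the NaiveSearch class and FindAll method are hypothetical names, the method returns plain start indexes rather than ISearchMatch objects to stay short, and the outer loop stops at the last position where the pattern still fits instead of running to toSearch.Length as on the slide.

```csharp
using System.Collections.Generic;

public static class NaiveSearch
{
    // Returns the start index of every occurrence of pattern in toSearch.
    public static List<int> FindAll(string toSearch, string pattern)
    {
        var matches = new List<int>();
        if (string.IsNullOrEmpty(pattern)) return matches;

        // Try every alignment of the pattern against the input, one step at a time.
        for (int startIndex = 0; startIndex <= toSearch.Length - pattern.Length; startIndex++)
        {
            int matchCount = 0;

            // Compare left-to-right until a mismatch or the whole pattern matches.
            while (matchCount < pattern.Length &&
                   toSearch[startIndex + matchCount] == pattern[matchCount])
            {
                matchCount++;
            }

            if (matchCount == pattern.Length) matches.Add(startIndex);
        }
        return matches;
    }
}
```

Because every alignment is tried and each can cost up to pattern.Length comparisons, the worst case grows with the product of the two string lengths.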
Naive Search Performance
Boyer-Moore-Horspool Algorithm
Stage 1: Pre-process the string to find to build a bad match table.
Stage 2: The string to find is compared right-to-left, using the bad match table to skip ahead at a mismatch.
A B C D E F G H I J K L M N O P Q R S T
J K L M
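To see how far the pattern jumps in the A to T example, the sketch below records each alignment a Horspool-style search would try. It is an illustration under assumptions: HorspoolTrace and Alignments are hypothetical names, and the shift rule used (look up the input character under the pattern's last position) is the standard Horspool rule, which may differ in detail from the course's implementation.

```csharp
using System.Collections.Generic;

public static class HorspoolTrace
{
    // Returns every alignment (start index in toSearch) that the search tries.
    public static List<int> Alignments(string toSearch, string pattern)
    {
        var alignments = new List<int>();
        if (string.IsNullOrEmpty(pattern)) return alignments;

        // Bad match table: for each pattern character except the last,
        // the distance from its last occurrence to the end of the pattern.
        var shifts = new Dictionary<char, int>();
        for (int i = 0; i < pattern.Length - 1; i++)
        {
            shifts[pattern[i]] = pattern.Length - i - 1;
        }

        int start = 0;
        while (start <= toSearch.Length - pattern.Length)
        {
            alignments.Add(start);

            // The shift depends only on the input character under the
            // pattern's last position; unknown characters skip a full pattern length.
            char last = toSearch[start + pattern.Length - 1];
            start += shifts.TryGetValue(last, out int shift) ? shift : pattern.Length;
        }
        return alignments;
    }
}
```

For the input ABCDEFGHIJKLMNOPQRST and the pattern JKLM, this visits alignments 0, 4, 8, 9 and 13 (the match is at 9), compared with the 17 alignments a naive search would try.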
Bad Match Table
class BadMatchTable {
    readonly int missing;                 // Default value for items not in table
    readonly Dictionary<int, int> table;  // The table of offsets to shift

    public BadMatchTable(string pattern) {
        missing = pattern.Length;
        table = new Dictionary<int, int>();
        for (int i = 0; i < pattern.Length - 1; i++) {   // For each character in the pattern (except the last)
            table[pattern[i]] = pattern.Length - i - 1;  // Set the offset when that character is seen
        }
    }
}
T R U T H
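As a quick check of the constructor logic, the sketch below builds the table for the pattern TRUTH and reads the shifts back. The indexer is an assumed lookup member that does not appear on the slides; it is added here only so the stored values can be inspected.

```csharp
using System.Collections.Generic;

public class BadMatchTable
{
    readonly int missing;                 // shift for characters not in the table
    readonly Dictionary<int, int> table;  // character code -> shift

    public BadMatchTable(string pattern)
    {
        missing = pattern.Length;
        table = new Dictionary<int, int>();

        // Later occurrences overwrite earlier ones, so each character keeps
        // the distance from its last occurrence (ignoring the final character).
        for (int i = 0; i < pattern.Length - 1; i++)
        {
            table[pattern[i]] = pattern.Length - i - 1;
        }
    }

    // Assumed lookup member (hypothetical, not shown on the slides).
    public int this[int character]
    {
        get { return table.TryGetValue(character, out int shift) ? shift : missing; }
    }
}
```

For new BadMatchTable("TRUTH"), the lookups give T = 1, R = 3 and U = 2, while H (which appears only as the final character) and any other character fall back to the default of 5.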
Boyer-Moore-Horspool Algorithm
T H E T R U T H I S O U T T H E R E
T R U T H
Index    Value
?        5    (default for any character not in the table)
T        1
R        3
U        2
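Putting the two stages together, a search over this example could be sketched as follows. The HorspoolSearch class and FindAll method are hypothetical names, the method returns start indexes rather than ISearchMatch objects for brevity, and the test input assumes the words on the slide are separated by single spaces ("THE TRUTH IS OUT THERE").

```csharp
using System.Collections.Generic;

public static class HorspoolSearch
{
    // Returns the start index of every occurrence of pattern in toSearch.
    public static List<int> FindAll(string toSearch, string pattern)
    {
        var matches = new List<int>();
        if (string.IsNullOrEmpty(pattern)) return matches;

        // Stage 1: build the bad match table from the pattern.
        var table = new Dictionary<char, int>();
        for (int i = 0; i < pattern.Length - 1; i++)
        {
            table[pattern[i]] = pattern.Length - i - 1;
        }

        // Stage 2: compare right-to-left at each alignment, then skip ahead.
        int start = 0;
        while (start <= toSearch.Length - pattern.Length)
        {
            int offset = pattern.Length - 1;
            while (offset >= 0 && toSearch[start + offset] == pattern[offset])
            {
                offset--;
            }
            if (offset < 0) matches.Add(start);

            // Shift by the table entry for the input character under the
            // pattern's last position, or by the full pattern length if absent.
            char last = toSearch[start + pattern.Length - 1];
            start += table.TryGetValue(last, out int shift) ? shift : pattern.Length;
        }
        return matches;
    }
}
```

Under the spacing assumption, the search tries alignments 0, 1, 4, 9 and 14 and reports a single match at index 4.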
Boyer-Moore-Horspool Performance
- The larger the shift values in the bad match table (in general, the longer the pattern), the better the performance
Demo
Review Boyer-Moore-Horspool Algorithm
Demo: Text search and replace tool