Adsa Unit3
2-3 Trees:
A 2-3 tree is a type of self-balancing search tree where every node can have two or three
children, and all leaves are at the same depth. It ensures that the tree remains balanced,
leading to efficient operations.
Characteristics of 2-3 Trees
1. Each internal node can be:
o A 2-node: Contains one key and has two children.
o A 3-node: Contains two keys and has three children.
2. The keys in a node are in sorted order.
3. All leaves are at the same level.
4. It maintains balance by splitting or merging nodes during insertions and deletions.
Search(x, key):
    if x is a leaf:
        if key is in x:
            return True
        else:
            return False
    else:
        if key matches a key in x:
            return True
        else:
            determine the appropriate child of x to traverse
            return Search(child, key)
Example: Search for 25 in the following 2-3 tree:
    [20, 40]
   /    |    \
[10]  [30]  [50, 60]
1. Compare 25 with 20 and 40; since 20 < 25 < 40, traverse the middle child.
2. Compare with 30 in the leaf [30]. No match and no children left, so return False.
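To make this concrete, here is a minimal runnable Python sketch of 2-3 tree search (an illustration added to these notes, not part of the original; the dict-based node representation is an assumption chosen for brevity):
# A 2-3 tree node is assumed to be a dict with sorted 'keys' (1 or 2 of them)
# and 'children' (empty for leaves).
def search_23(node, key):
    if key in node["keys"]:
        return True
    if not node["children"]:          # leaf reached without a match
        return False
    i = 0                             # pick the child whose range can hold key
    while i < len(node["keys"]) and key > node["keys"][i]:
        i += 1
    return search_23(node["children"][i], key)

tree = {"keys": [20, 40], "children": [
    {"keys": [10], "children": []},
    {"keys": [30], "children": []},
    {"keys": [50, 60], "children": []},
]}
print(search_23(tree, 25))  # False (traverses the middle child [30])
print(search_23(tree, 50))  # True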
2. Insertion Operation
Insertions in 2-3 trees maintain balance by splitting nodes when necessary:
1. Locate the appropriate leaf node for the new key.
2. Insert the key into the node in sorted order.
3. If the node becomes overfilled (contains three keys), split it:
o Promote the middle key to the parent.
o Split the remaining keys into two new nodes.
Algorithm:
Insert(x, key):
    if x is a leaf:
        insert key into x in sorted order
        if x becomes overfilled:
            split x and promote the middle key to the parent
    else:
        determine the appropriate child of x
        Insert(child, key)
        if child was split:
            adjust x to include the promoted key
            if x becomes overfilled:
                split x and promote the middle key
Example: Insert 35 into the tree:
    [20, 40]
   /    |    \
[10]  [30]  [50, 60]
1. Traverse to the middle child [30].
2. Insert 35 into [30], resulting in [30, 35].
3. No split is needed, since [30, 35] holds only two keys.
3. Deletion Operation
Deletions involve:
1. Locate the key to be deleted.
2. If the key is in an internal node, replace it with its in-order predecessor or successor.
3. Adjust the tree to maintain balance:
o Borrow a key from a sibling if possible.
o Merge nodes if necessary.
Example: Delete 30 from the tree:
    [20, 40]
   /    |    \
[10]  [30]  [50, 60]
1. 30 sits in a leaf; removing it leaves that leaf empty (an underflow).
2. Redistribute through the parent: move 40 down into the empty leaf and move 50 up into the root, giving [20, 50] with children [10], [40], [60].
Analysis of Operations
For a 2-3 tree with n keys:
1. Height: O(log n), since the tree remains balanced.
2. Search: O(log n), as in a binary search.
3. Insertion: O(log n), due to localized splits.
4. Deletion: O(log n), due to merges and adjustments.
Example in Action
Insert the following sequence into an empty 2-3 tree: 10, 20, 30, 40, 50.
1. Insert 10: [10].
2. Insert 20: [10, 20].
3. Insert 30: [10, 20, 30] overflows. Split and promote 20. Tree becomes:
      [20]
     /    \
  [10]    [30]
4. Insert 40: Add to [30], resulting in [30, 40].
5. Insert 50: [30, 40, 50] overflows. Split and promote 40. Tree becomes:
     [20, 40]
    /    |    \
 [10]  [30]  [50]
The tree remains balanced throughout.
Summary
2-3 trees offer efficient and predictable performance due to their balanced structure, making them a reliable choice for applications requiring consistent O(log n) operations. By balancing nodes through localized splits and merges, they avoid the pitfalls of unbalanced binary search trees.
B-Trees:
A B-Tree is a self-balancing tree data structure used primarily for database systems and file
systems where efficient disk access is required. It generalizes the 2-3 tree by allowing nodes
to have more than two or three children.
Characteristics of B-Trees
1. Node Properties:
o A node can contain multiple keys (up to m − 1, where m is the order of the tree).
o A node with k keys has k + 1 children.
2. Balance:
o All leaves are at the same level.
o The tree grows or shrinks from the root, keeping it balanced.
3. Key Ordering:
o Keys within a node are in sorted order.
o Subtrees are arranged so that all keys in a subtree to the left of a key are
smaller, and all keys in a subtree to the right are larger.
Advantages of B-Trees Over Binary Search Trees (BSTs)
1. Better Disk Utilization:
o B-Trees are designed to minimize disk I/O by storing multiple keys in a single
node. This is particularly useful in databases and file systems.
2. Guaranteed Balance:
o Like 2-3 trees, B-Trees remain balanced, ensuring O(log n) operations. BSTs can degenerate into a linked list in the worst case, leading to O(n) operations.
3. Scalability:
o B-Trees are better suited for large datasets since they reduce the number of
levels (and disk accesses) compared to BSTs.
4. Efficient Range Queries:
o Traversing keys in a B-Tree node and its children enables efficient range
queries compared to binary search trees.
Height of a B-Tree
The height h of a B-Tree depends on its order m and the number of keys n. The height can be estimated as:
h ≤ log⌈m/2⌉((n + 1) / 2)
Key Observations:
A higher order m results in a shorter tree, as each node can hold more keys.
The height grows logarithmically with the number of keys n, ensuring efficient operations.
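As an illustrative calculation (added here, not from the original notes): for a B-Tree of order m = 202 holding n = 10^6 keys, the bound gives h ≤ log_101(500000.5) ≈ 2.84, so the tree has height at most 2 and any search touches at most about three levels of nodes, and hence at most about three disk reads.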
Operations on B-Trees
1. Search Operation
The search process in a B-Tree is similar to a binary search on the keys within a node:
1. Search the current node for the target key.
2. If the key is not found, traverse the appropriate child node.
3. Repeat until the key is found or a leaf node is reached.
Algorithm:
Search(x, key):
    for i = 1 to number of keys in x:
        if key == x.keys[i]:
            return True
        else if key < x.keys[i]:
            if x is a leaf: return False
            return Search(x.children[i], key)
    if x is a leaf: return False
    return Search(x.children[number of keys + 1], key)
Example: Search for 50 in the B-Tree:
    [20, 40]
   /    |    \
[10]  [30]  [50, 60]
1. Compare with 20 and 40. Since 50 > 40, traverse the rightmost child.
2. Compare with 50. Found the key.
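A short runnable Python sketch of this search (an illustration; the dict node layout mirrors the 2-3 tree sketch above and is an assumption of this example):
from bisect import bisect_left

# Each node is assumed to hold sorted 'keys' and a 'children' list
# (empty at the leaves). bisect_left gives the binary search within a node.
def btree_search(node, key):
    i = bisect_left(node["keys"], key)
    if i < len(node["keys"]) and node["keys"][i] == key:
        return True
    if not node["children"]:          # leaf: key is absent
        return False
    return btree_search(node["children"][i], key)

tree = {"keys": [20, 40], "children": [
    {"keys": [10], "children": []},
    {"keys": [30], "children": []},
    {"keys": [50, 60], "children": []},
]}
print(btree_search(tree, 50))   # True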
2. Insertion Operation
Insertion involves:
1. Locate the appropriate leaf node.
2. Insert the key into the node in sorted order.
3. If the node overflows (reaches m keys):
o Split the node into two nodes.
o Promote the middle key to the parent.
Algorithm:
Insert(x, key):
    if x is a leaf:
        insert key into x in sorted order
        if x becomes overfilled:
            split x and promote the middle key to the parent
    else:
        determine the appropriate child of x
        Insert(child, key)
        if child was split:
            adjust x to include the promoted key
            if x becomes overfilled:
                split x and promote the middle key
Example: Insert 70 into the B-Tree:
    [20, 40]
   /    |    \
[10]  [30]  [50, 60]
1. Traverse to [50, 60].
2. Insert 70, resulting in [50, 60, 70].
3. Split [50, 60, 70]. Promote 60. Tree becomes:
   [20, 40, 60]
   /   |   |   \
[10] [30] [50] [70]
3. Deletion Operation
Deletion involves:
1. Locate the key to be deleted.
2. If the key is in an internal node:
o Replace it with its in-order predecessor or successor.
3. If a node underflows (has fewer than ⌈m/2⌉ − 1 keys):
o Borrow a key from a sibling if possible.
o Otherwise, merge with a sibling.
Example: Delete 40 from the tree:
    [20, 40]
   /    |    \
[10]  [30]  [50, 60]
1. 40 is in an internal node; replace it with its in-order successor 50.
2. The root becomes [20, 50] and the rightmost leaf becomes [60]; no underflow occurs.
Analysis of Operations
For a B-Tree of order m with n keys:
1. Height: O(log n), as the tree is balanced.
2. Search: O(log n), due to binary search within nodes and traversal.
3. Insertion: O(log n) for locating the position and performing splits.
4. Deletion: O(log n) for locating the key and restructuring.
Example in Action
Insert the sequence 10, 20, 30, 40, 50, 60, 70, 80 into a B-Tree of order 3 (at most two keys per node).
1. Insert 10, 20: [10, 20].
2. Insert 30: [10, 20, 30] overflows. Split and promote 20. Tree becomes:
      [20]
     /    \
  [10]    [30]
3. Insert 40, 50: [30, 40, 50] overflows. Split and promote 40. Tree becomes:
     [20, 40]
    /    |    \
 [10]  [30]  [50]
4. Insert 60, 70: [50, 60, 70] overflows. Split and promote 60; the root [20, 40, 60] now overflows too, so split it and promote 40. Tree becomes:
         [40]
        /    \
    [20]      [60]
   /    \    /    \
[10]  [30] [50]  [70]
5. Insert 80: Add to [70], giving [70, 80]. The final tree stays balanced.

Splay Trees:
A splay tree is a self-adjusting binary search tree that moves each accessed node to the root through rotations, so recently accessed keys are fast to reach again.
Key Operations
1. Splaying
2. Search
3. Insertion
4. Deletion
Splaying
Splaying is the process of bringing a target node to the root of the tree using a series of
rotations. It is performed using three main cases:
1. Zig (Single Rotation)
When the node x is the left or right child of the root.
o Left Zig: Rotate x right.
o Right Zig: Rotate x left.
2. Zig-Zig (Double Rotation)
When x and its parent p are both left or right children.
o Left Zig-Zig: Rotate p right, then x right.
o Right Zig-Zig: Rotate p left, then x left.
3. Zig-Zag (Double Rotation)
When x is a left child and p is a right child (or vice versa).
o Left Zig-Zag: Rotate x left, then x right.
o Right Zig-Zag: Rotate x right, then x left.
Algorithm for Splaying
function Splay(tree, x):
    while x is not the root:
        if x.parent is root:
            Rotate(x)                        // Zig
        else if (x is a left child and x.parent is a left child) or
                (x is a right child and x.parent is a right child):
            Rotate(x.parent)                 // Zig-Zig
            Rotate(x)
        else:
            Rotate(x)                        // Zig-Zag
            Rotate(x)
    return x
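The rotation bookkeeping is the fiddly part, so here is a self-contained Python sketch of the three cases (illustrative only; the Node class and parent-pointer representation are assumptions of this example, not from the original notes):
import sys

class Node:
    def __init__(self, key):
        self.key = key
        self.left = self.right = self.parent = None

def rotate(x):
    # Rotate x above its parent (a single left or right rotation).
    p, g = x.parent, x.parent.parent
    if p.left is x:                        # right rotation
        p.left, x.right = x.right, p
        if p.left: p.left.parent = p
    else:                                  # left rotation
        p.right, x.left = x.left, p
        if p.right: p.right.parent = p
    x.parent, p.parent = g, x
    if g:
        if g.left is p: g.left = x
        else: g.right = x

def splay(x):
    while x.parent:
        p, g = x.parent, x.parent.parent
        if g is None:
            rotate(x)                      # Zig
        elif (g.left is p) == (p.left is x):
            rotate(p); rotate(x)           # Zig-Zig
        else:
            rotate(x); rotate(x)           # Zig-Zag
    return x

# Hypothetical usage: splay the deepest node of the chain 3 -> 2 -> 1.
a, b, c = Node(3), Node(2), Node(1)
a.left, b.parent = b, a
b.left, c.parent = c, b
print(splay(c).key)   # 1 becomes the new root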
Update Operations
1. Insertion
o Insert the new element as in a regular BST.
o Splay the newly inserted node to the root.
function Insert(tree, key):
    node = BST_Insert(tree, key)
    Splay(tree, node)
2. Deletion
o Splay the node to be deleted to the root.
o Remove the root and merge the remaining subtrees.
function Delete(tree, key):
    node = Search(tree, key)
    if node exists:
        Splay(tree, node)
        if node.left:
            leftSubtree = node.left
            leftSubtree.parent = null
        if node.right:
            rightSubtree = node.right
            rightSubtree.parent = null
        tree.root = Join(leftSubtree, rightSubtree)
UNIT-IV
3. String Length
Calculate the number of characters in a string.
# Example in Python
text = "Hello"
length = len(text)
print(length) # Output: 5
Computing the LPS (longest prefix-suffix) array used by the KMP algorithm (KMP is covered in detail below):
function ComputeLPS(pattern):
    lps = array of size len(pattern) initialized to 0
    length = 0
    i = 1
    while i < len(pattern):
        if pattern[i] == pattern[length]:
            length += 1
            lps[i] = length
            i += 1
        else:
            if length != 0:
                length = lps[length - 1]
            else:
                lps[i] = 0
                i += 1
    return lps
Example: For text = "ababcababcabc" and pattern = "abc", KMP finds all occurrences efficiently.
Practical Applications
1. Search Engines: Pattern matching and keyword searching.
2. Spell Checkers: Longest common substring and edit distance.
3. Bioinformatics: DNA sequence analysis using substring operations.
4. Data Parsing: Tokenizing and processing structured text like JSON or XML.
Brute-Force Pattern Matching
Overview
The brute-force (or naive) pattern matching algorithm compares the pattern with every
substring of the text. It is simple to implement but inefficient for long texts and patterns.
Algorithm
1. Start at the beginning of the text.
2. Compare each character of the pattern with the corresponding characters in the
text.
3. If all characters match, record the match position.
4. Shift the pattern one position to the right and repeat until the end of the text is
reached.
Pseudocode
function BruteForceMatch(text, pattern):
    n = len(text)
    m = len(pattern)
    for i = 0 to n - m:
        match = true
        for j = 0 to m - 1:
            if text[i + j] != pattern[j]:
                match = false
                break
        if match:
            return i    // Match found at index i
    return -1           // No match found
Time Complexity
Best Case: O(n) (when the pattern's first character mismatches at most positions of the text)
Worst Case: O(n · m) (when long partial matches occur at many positions, e.g. text "aaa...a" and pattern "aaab")
Example
For text = "abcabcabcd" and pattern = "abcd", the algorithm will compare each
position until it finds a match at index 6.
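A direct Python translation of the pseudocode above (added for illustration):
def brute_force_match(text, pattern):
    n, m = len(text), len(pattern)
    for i in range(n - m + 1):
        # The slice comparison plays the role of the inner character loop.
        if text[i:i + m] == pattern:
            return i
    return -1

print(brute_force_match("abcabcabcd", "abcd"))  # 6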
The Boyer-Moore Algorithm
Overview
The Boyer-Moore algorithm is an efficient string-searching algorithm that skips sections
of the text by using information gathered during preprocessing. It works by aligning the
pattern with the text and using two key heuristics:
1. Bad Character Heuristic
2. Good Suffix Heuristic
1. Bad Character Heuristic
If a mismatch occurs, align the pattern such that the mismatched character in the text
aligns with its last occurrence in the pattern. If the character does not exist in the
pattern, shift the pattern past the mismatched character.
Preprocessing:
1. Create a table bad_char where bad_char[c] stores the last occurrence of
character c in the pattern.
2. If c is not in the pattern, set bad_char[c] = -1.
Shift Calculation:
shift = max(1, j − bad_char[text[i + j]])
where j is the mismatch position within the pattern and i is the current alignment of the pattern in the text.
Algorithm
function BoyerMooreMatch(text, pattern):
    n = len(text)
    m = len(pattern)
    bad_char = PreprocessBadCharacter(pattern)
    suffix = PreprocessGoodSuffix(pattern)    // optional: can give larger shifts
    shift = 0
    while shift <= n - m:
        j = m - 1
        while j >= 0 and pattern[j] == text[shift + j]:
            j -= 1
        if j < 0:
            return shift                      // Match found
        else:
            shift += max(1, j - bad_char[text[shift + j]])
    return -1                                 // No match found
function PreprocessBadCharacter(pattern):
    bad_char = array of size 256 initialized to -1
    for i = 0 to len(pattern) - 1:
        bad_char[pattern[i]] = i
    return bad_char
function PreprocessGoodSuffix(pattern):
    m = len(pattern)
    suffix = array of size m initialized to 0
    shift = array of size m initialized to 0
    // Compute suffix and shift arrays
    // (Details omitted for brevity)
    return shift
Time Complexity
Best Case: O(n / m) (skips many comparisons)
Worst Case: O(n + m)
Example
For text = "ABAAABCD" and pattern = "ABC", the Boyer-Moore algorithm
preprocesses the pattern and skips unnecessary comparisons, leading to faster
matching.
1. Preprocess pattern = "ABC":
o Bad character table: {A: 0, B: 1, C: 2}
2. Search:
o Compare the pattern with the text starting at index 0; mismatch at pattern index 2 (C vs A), so shift by 2 (bad character heuristic).
o Mismatch again at the new alignment (C vs A); shift by 2 once more.
o Compare at index 4 and find a match.
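Here is a runnable Python sketch of Boyer-Moore using only the bad-character heuristic (a common teaching simplification; the good-suffix table is deliberately omitted):
def boyer_moore_match(text, pattern):
    n, m = len(text), len(pattern)
    bad_char = {c: i for i, c in enumerate(pattern)}  # last occurrence of each char
    shift = 0
    while shift <= n - m:
        j = m - 1
        while j >= 0 and pattern[j] == text[shift + j]:
            j -= 1                       # compare right to left
        if j < 0:
            return shift                 # match found
        # Absent characters get -1, forcing a shift past the mismatch.
        shift += max(1, j - bad_char.get(text[shift + j], -1))
    return -1

print(boyer_moore_match("ABAAABCD", "ABC"))   # 4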
Efficiency comparison: the brute-force approach is slow for large n and m, whereas preprocessing-based algorithms such as KMP and Boyer-Moore remain very fast for large n and m.
The Knuth-Morris-Pratt (KMP) Algorithm
Key Concepts
1. Avoid Redundant Comparisons
When a mismatch occurs, the algorithm shifts the pattern in a way that skips
unnecessary rechecks of characters that were already matched.
2. Longest Prefix-Suffix (LPS) Array
The LPS array stores the lengths of the proper prefixes of the pattern that are
also suffixes. It helps determine how much to shift the pattern when a mismatch
occurs.
Algorithm
1. Preprocessing: Compute the LPS Array
The LPS array is computed for the pattern. For each position i, LPS[i] represents the length of the longest proper prefix of the substring pattern[0:i+1] that is also a suffix.
Pseudocode
function ComputeLPS(pattern):
    m = len(pattern)
    LPS = array of size m initialized to 0
    length = 0    // Length of the previous longest prefix suffix
    i = 1         // Start from the second character
    while i < m:
        if pattern[i] == pattern[length]:
            length += 1
            LPS[i] = length
            i += 1
        else:
            if length != 0:
                length = LPS[length - 1]
            else:
                LPS[i] = 0
                i += 1
    return LPS
2. Pattern Matching
Compare the pattern against the text, using the LPS array to skip characters on a
mismatch.
Pseudocode
function KMPSearch(text, pattern):
    n = len(text)
    m = len(pattern)
    LPS = ComputeLPS(pattern)
    i = 0    // Index for text
    j = 0    // Index for pattern
    while i < n:
        if text[i] == pattern[j]:
            i += 1
            j += 1
            if j == m:
                return i - j          // Match found at index (i - j)
        else:
            if j != 0:
                j = LPS[j - 1]        // Use LPS to shift the pattern
            else:
                i += 1
    return -1                          // No match found
Example
Input:
Text: "ababcabcabababd"
Pattern: "ababd"
Step 1: Compute LPS Array
For pattern = "ababd", the LPS array is computed as follows:
Index i   Prefix pattern[0:i+1]   LPS[i]
0         a                       0
1         ab                      0
2         aba                     1
3         abab                    2
4         ababd                   0
LPS Array: [0, 0, 1, 2, 0]
Step 2: Match Against the Text
Scanning left to right, the first mismatch occurs when the text character 'c' is compared against the pattern; the LPS array shifts the pattern without re-examining the characters already matched. Matching resumes, the final 'd' matches, and the pattern is found at index i − j = 10.
Output
Pattern found at index 10.
Time Complexity
1. Preprocessing (LPS Array): O(m)
2. Pattern Matching: O(n)
Overall Time Complexity: O(n + m)
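A runnable Python version of ComputeLPS and KMPSearch above (an illustrative sketch; it assumes a non-empty pattern):
def compute_lps(pattern):
    lps = [0] * len(pattern)
    length, i = 0, 1
    while i < len(pattern):
        if pattern[i] == pattern[length]:
            length += 1
            lps[i] = length
            i += 1
        elif length:
            length = lps[length - 1]   # fall back to a shorter prefix-suffix
        else:
            lps[i] = 0
            i += 1
    return lps

def kmp_search(text, pattern):
    lps, j = compute_lps(pattern), 0
    for i, ch in enumerate(text):
        while j > 0 and ch != pattern[j]:
            j = lps[j - 1]             # shift the pattern via the LPS table
        if ch == pattern[j]:
            j += 1
        if j == len(pattern):
            return i - j + 1           # first match
    return -1

print(compute_lps("ababd"))                     # [0, 0, 1, 2, 0]
print(kmp_search("ababcabcabababd", "ababd"))   # 10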
Advantages
Efficient for large texts and patterns.
Linear time complexity ensures scalability.
Works well with repetitive patterns in the text.
Applications
Text editors for find-and-replace functionality.
DNA sequence matching in bioinformatics.
Spam filtering and malware detection.
Tries (Prefix Trees)
A Trie is a tree-like data structure that stores strings (or sequences) efficiently. It is
primarily used for searching, insertion, and prefix-matching operations. Variants of
Tries, such as Standard Tries, Compressed Tries, and Suffix Tries, optimize storage and
operations for specific use cases.
1. Standard Tries
Definition
A Standard Trie is a tree where:
Each node represents a character of a string.
Paths from the root to a leaf represent complete strings.
Common prefixes are shared among strings.
Key Operations
1. Insertion
o Start at the root.
o Traverse or create child nodes for each character in the string.
o Mark the final node as a terminal node.
2. Search
o Traverse the nodes corresponding to each character of the string.
o If all characters match and the last node is terminal, the string is found.
Algorithm
Insertion
function Insert(root, word):
    current = root
    for char in word:
        if char not in current.children:
            current.children[char] = new TrieNode()
        current = current.children[char]
    current.isTerminal = true
Search
function Search(root, word):
    current = root
    for char in word:
        if char not in current.children:
            return false
        current = current.children[char]
    return current.isTerminal
Example
Insert: ["cat", "car", "cart", "dog"]
Structure:
        root
       /    \
      c      d
      |      |
      a      o
     / \     |
    t   r    g
        |
        t
Search:
"cat" → Found.
"doge" → Not Found.
Time Complexity
Insertion/Search: O(L), where L is the length of the string.
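The pseudocode above translates almost directly to Python; here is a compact runnable sketch (illustrative, using a dict of children per node):
class TrieNode:
    def __init__(self):
        self.children = {}        # char -> TrieNode
        self.is_terminal = False

def insert(root, word):
    node = root
    for ch in word:
        node = node.children.setdefault(ch, TrieNode())
    node.is_terminal = True

def search(root, word):
    node = root
    for ch in word:
        if ch not in node.children:
            return False
        node = node.children[ch]
    return node.is_terminal

root = TrieNode()
for w in ["cat", "car", "cart", "dog"]:
    insert(root, w)
print(search(root, "cat"), search(root, "doge"))   # True False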
2. Compressed Tries
Definition
A Compressed Trie optimizes the Standard Trie by:
Combining chains of single-child nodes into a single edge labeled with the
concatenated string.
Reducing space usage and traversal time.
Key Operations
1. Insertion
o Similar to Standard Trie but merge single-child chains.
2. Search
o Traverse nodes by matching entire edge labels instead of single
characters.
Algorithm
Insertion
function InsertCompressed(root, word):
    current = root
    i = 0
    while i < len(word):
        found = false
        for child in current.children:
            commonPrefix = FindCommonPrefix(child.label, word[i:])
            if commonPrefix:
                SplitNode(current, child, commonPrefix)
                current = child
                i += len(commonPrefix)
                found = true
                break
        if not found:
            newNode = new TrieNode(word[i:])
            current.children.append(newNode)
            break
Search
function SearchCompressed(root, word):
    current = root
    i = 0
    while i < len(word):
        found = false
        for child in current.children:
            if word[i:].startswith(child.label):
                current = child
                i += len(child.label)
                found = true
                break
        if not found:
            return false
    return current.isTerminal
Example
Insert: ["cat", "car", "cart"]
Structure (single-child chains merged into labeled edges):
      root
       |
      "ca"
      /  \
    "t"  "r"
          |
         "t"
Search:
"cat" → Found.
"cart" → Found.
Time Complexity
Insertion/Search: O(L), where L is the length of the string.
3. Suffix Tries
Definition
A Suffix Trie stores all suffixes of a given string. It is used for pattern matching,
substring searching, and finding the longest repeated substring.
Key Features
Each path from the root to a leaf represents a suffix of the string.
Useful for solving string problems efficiently.
Construction
To construct a Suffix Trie for a string S:
1. Add all suffixes of S into the Trie.
Algorithm
Construction
function BuildSuffixTrie(string):
    root = new TrieNode()
    for i = 0 to len(string) - 1:
        suffix = string[i:]
        Insert(root, suffix)
    return root
Search
function SearchSuffixTrie(root, pattern):
    current = root
    for char in pattern:
        if char not in current.children:
            return false
        current = current.children[char]
    return true
Example
Build Suffix Trie for S = "banana":
Suffixes: "banana", "anana", "nana", "ana", "na", "a"
Structure (paths from the root; common prefixes are shared):
root - b -> "anana"                (suffix "banana")
root - a -> "nana", "na", end      (suffixes "anana", "ana", "a")
root - n -> "ana", "a"             (suffixes "nana", "na")
Search:
"ana" → Found.
"nan" → Found.
"abc" → Not Found.
Time Complexity
Construction: O(n²) (without optimization), where n is the length of the string.
Search: O(m), where m is the length of the pattern.
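A self-contained Python sketch of a suffix trie using nested dicts (an illustration; terminal flags are unnecessary here because every substring of S is a prefix of some suffix):
def build_suffix_trie(s):
    root = {}
    for i in range(len(s)):
        node = root
        for ch in s[i:]:                 # insert suffix s[i:]
            node = node.setdefault(ch, {})
    return root

def has_substring(root, pattern):
    node = root
    for ch in pattern:
        if ch not in node:
            return False
        node = node[ch]
    return True

trie = build_suffix_trie("banana")
print(has_substring(trie, "ana"))   # True
print(has_substring(trie, "nan"))   # True
print(has_substring(trie, "abc"))   # False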
Comparison
Feature        Standard Trie        Compressed Trie            Suffix Trie
Stores         a set of strings     a set of strings           all suffixes of one string
                                    (chains merged)
Search time    O(L)                 O(L)                       O(m)
Typical use    dictionary lookups   space-efficient matching   substring / pattern matching
Huffman Coding
Huffman Coding is a lossless data compression algorithm. It is used to encode data by
assigning variable-length binary codes to characters, such that more frequent
characters have shorter codes and less frequent characters have longer codes. This
minimizes the total number of bits required to represent the data.
Key Concepts
1. Prefix Codes
o Huffman codes are prefix codes, meaning no code is a prefix of another.
This ensures unambiguous decoding.
2. Frequency-Based Tree Construction
o A binary tree is constructed based on the frequency of characters in the
input. Characters with lower frequency are placed deeper in the tree.
Algorithm
Build Huffman Tree
function BuildHuffmanTree(frequencies):
    priorityQueue = MinHeap()
    for char, freq in frequencies:
        node = new TreeNode(char, freq)
        priorityQueue.insert(node)
    while priorityQueue.size() > 1:
        left = priorityQueue.extractMin()
        right = priorityQueue.extractMin()
        merged = new TreeNode(null, left.freq + right.freq)
        merged.left = left
        merged.right = right
        priorityQueue.insert(merged)
    return priorityQueue.extractMin()    // Root of the Huffman tree
Example
Input Data
data = "AAABBCDA"
Step 1: Count Frequencies
Character Frequency
A 4
B 2
C 1
D 1
Step 2: Build Huffman Tree
Insert all characters into the priority queue.
Merge nodes:
1. Merge C and D → Frequency: 2.
2. Merge B and the merged C-D → Frequency: 4.
3. Merge A and the merged B-C-D → Frequency: 8 (root).
Huffman Tree:
        (*,8)
       /     \
   (A,4)    (*,4)
           /     \
       (B,2)    (*,2)
               /     \
           (C,1)   (D,1)
Step 3: Generate Codes
Character Huffman Code
A 0
B 10
C 110
D 111
Step 4: Encode Data
Original: AAABBCDA
Encoded: 00010101101110 (4·1 + 2·2 + 1·3 + 1·3 = 14 bits instead of 8 bytes)
Step 5: Decode Data
Encoded: 00010101101110
Decoded: AAABBCDA
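A compact runnable Python sketch of the tree construction and code generation (an illustration; the (frequency, tiebreaker, subtree) heap tuples are an implementation choice of this example, and tie-breaking between equal frequencies may permute the codes, though code lengths stay optimal):
import heapq
from collections import Counter

def build_codes(data):
    # Leaves are single-character strings; internal nodes are (left, right) pairs.
    heap = [(f, i, ch) for i, (ch, f) in enumerate(Counter(data).items())]
    heapq.heapify(heap)
    count = len(heap)
    while len(heap) > 1:
        f1, _, left = heapq.heappop(heap)    # two lowest-frequency trees
        f2, _, right = heapq.heappop(heap)
        heapq.heappush(heap, (f1 + f2, count, (left, right)))
        count += 1
    codes = {}
    def walk(tree, prefix):
        if isinstance(tree, str):
            codes[tree] = prefix or "0"      # lone-symbol edge case
        else:
            walk(tree[0], prefix + "0")
            walk(tree[1], prefix + "1")
    walk(heap[0][2], "")
    return codes

codes = build_codes("AAABBCDA")
print(codes)                                   # {'A': '0', 'B': '10', 'C': '110', 'D': '111'}
print("".join(codes[c] for c in "AAABBCDA"))   # 00010101101110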
Advantages
1. Optimality
o Produces the most efficient prefix code for given character frequencies.
2. Space-Efficient
o Reduces the number of bits required to store data.
Applications
1. Compression formats (e.g., JPEG, MP3, ZIP).
2. Efficient transmission of data in telecommunication.
Time Complexity
1. Build Huffman Tree: O(n log n), where n is the number of distinct characters.
2. Generate Codes: O(n).
3. Encode/Decode: O(L), where L is the total length of the encoded data.
The Longest Common Subsequence (LCS) Problem
The Longest Common Subsequence (LCS) problem involves finding the longest
sequence that can appear in both given strings without altering the order of characters,
though they don’t need to be contiguous.
Definition
Given two sequences (strings) X = x1, x2, ..., xm and Y = y1, y2, ..., yn, the LCS is a
subsequence S such that:
S is a subsequence of both X and Y.
S is as long as possible.
Properties of LCS
Subsequence: A subsequence is derived from a sequence by deleting some or no
elements without changing the order of the remaining elements.
Longest: Out of all common subsequences, we are interested in the longest one.
Algorithm
1. Dynamic Programming for LCS Length
function LCS(X, Y):
    m = length of X
    n = length of Y
    dp = 2D array of size (m+1) x (n+1)
    for i from 0 to m:
        dp[i][0] = 0    // Base case: LCS of X with empty Y
    for j from 0 to n:
        dp[0][j] = 0    // Base case: LCS of Y with empty X
    for i from 1 to m:
        for j from 1 to n:
            if X[i-1] == Y[j-1]:
                dp[i][j] = dp[i-1][j-1] + 1    // Characters match, extend the LCS
            else:
                dp[i][j] = max(dp[i-1][j], dp[i][j-1])    // Exclude one character
    return dp[m][n]    // The length of the LCS is in dp[m][n]
2. Reconstructing the LCS
function ReconstructLCS(X, Y, dp):
    i = length of X
    j = length of Y
    LCS_string = ""
    while i > 0 and j > 0:
        if X[i-1] == Y[j-1]:
            LCS_string = X[i-1] + LCS_string
            i -= 1; j -= 1
        else if dp[i-1][j] >= dp[i][j-1]:
            i -= 1
        else:
            j -= 1
    return LCS_string
Example
Let's solve the LCS problem for two strings:
X = "AGGTAB"
Y = "GXTXAYB"
Step 1: Initialize the DP Table
We initialize a DP table dp with dimensions (7 x 8), since X has length 6 and Y has length 7, where dp[i][j] stores the length of the LCS of the first i characters of X and the first j characters of Y.
Step 2: Fill the DP Table
We fill the DP table according to the recurrence relation:
      G  X  T  X  A  Y  B
  A   0  0  0  0  1  1  1
  G   1  1  1  1  1  1  1
  G   1  1  1  1  1  1  1
  T   1  1  2  2  2  2  2
  A   1  1  2  2  3  3  3
  B   1  1  2  2  3  3  4
Step 3: Reconstruct the LCS
Starting from dp[6][7], we trace back to find the common subsequence:
B matches.
A matches.
T matches.
G matches.
Thus, the LCS is "GTAB".
Step 4: Result
The length of the LCS is 4 and the LCS itself is "GTAB".
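A runnable Python version of the DP and the trace-back above (an illustrative sketch):
def lcs(X, Y):
    m, n = len(X), len(Y)
    dp = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            if X[i - 1] == Y[j - 1]:
                dp[i][j] = dp[i - 1][j - 1] + 1
            else:
                dp[i][j] = max(dp[i - 1][j], dp[i][j - 1])
    # Trace back from dp[m][n] to rebuild the subsequence.
    out, i, j = [], m, n
    while i > 0 and j > 0:
        if X[i - 1] == Y[j - 1]:
            out.append(X[i - 1]); i -= 1; j -= 1
        elif dp[i - 1][j] >= dp[i][j - 1]:
            i -= 1
        else:
            j -= 1
    return dp[m][n], "".join(reversed(out))

print(lcs("AGGTAB", "GXTXAYB"))   # (4, 'GTAB')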
Applications
Text comparison: For example, comparing documents or version control.
Bioinformatics: Finding similarities between DNA, RNA, or protein sequences.
File comparison: Comparing files for differences.
Computational Geometry: One-Dimensional Range Searching
In Computational Geometry, Range Searching refers to the problem of preprocessing a
set of points or objects in such a way that allows for efficient querying of all points that
lie within a given range. In one-dimensional range searching, we are concerned with
finding points that lie within a specified interval on the number line.
Problem Definition
Given a set of n points, P = {p1, p2, ..., pn}, on the real line, a range query involves finding all the points from P that lie within a given interval [a, b], where a ≤ b.
Example:
Let P = {1, 3, 5, 7, 9}.
For the query [4, 8], the points in the set P that lie within the range are 5 and 7.
Naive Approach
The simplest method for solving the 1D range searching problem is to check each point
individually to see if it lies within the range [a,b][a, b].
Algorithm (Naive Approach):
function RangeQueryNaive(P, a, b):
    result = []
    for each point p in P:
        if a <= p <= b:
            result.append(p)
    return result
Time Complexity:
O(n), where n is the number of points in the set P. This approach checks each point individually, so it requires linear time.
Example:
P = {1, 3, 5, 7, 9}, Query [4, 8].
Points inside the range are 5 and 7.
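A common improvement over the naive scan (a standard technique, added here for illustration): sort the points once, then answer each query by binary-searching both endpoints, giving O(log n + k) per query where k is the output size.
from bisect import bisect_left, bisect_right

def range_query_sorted(P_sorted, a, b):
    lo = bisect_left(P_sorted, a)     # first index with value >= a
    hi = bisect_right(P_sorted, b)    # first index with value > b
    return P_sorted[lo:hi]

P = sorted([1, 3, 5, 7, 9])           # O(n log n) preprocessing, done once
print(range_query_sorted(P, 4, 8))    # [5, 7]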
Two-Dimensional Range Searching
In two dimensions, a range query asks for all points inside an axis-aligned rectangle [x1, x2] × [y1, y2].
Naive Approach
The simplest approach to solve this problem is to check each point individually to see if
it lies within the query rectangle.
Algorithm (Naive):
function RangeQueryNaive(P, x1, y1, x2, y2):
    result = []
    for each point p in P:
        if x1 <= p.x <= x2 and y1 <= p.y <= y2:
            result.append(p)
    return result
Time Complexity:
O(n), where n is the number of points in the set P. This approach checks each point individually, making it inefficient for large datasets.
Example:
Given P = {(1, 3), (3, 4), (5, 7), (7, 8), (2, 6)} and the query rectangle [2, 4] to [6, 8]:
Points inside the rectangle are (3, 4), (5, 7), and (2, 6).
Efficient Approaches
For large sets of points, the naive approach becomes inefficient. More efficient methods
for two-dimensional range searching include:
1. Sorting + Binary Search
2. Range Tree
3. k-d Tree
Let’s explore each method.
1. Sorting + Binary Search
Sort the points once by x-coordinate; for each query, binary-search the boundaries of [x1, x2] and scan the candidate points, keeping those whose y-coordinate lies in [y1, y2].
Time Complexity:
Sorting: O(n log n) for sorting points by x-coordinates.
Querying: O(n) in the worst case, since every candidate's y-coordinate may still need to be checked.
Total Preprocessing Time: O(n log n).
2. Range Tree
A Range Tree is a balanced binary search tree built for multidimensional data. In the
case of two-dimensional range searching, we can build a tree based on the x-coordinate
and store a secondary balanced tree for each node, which is sorted by the y-coordinate.
Steps:
1. Construct a 2D Range Tree:
o Build a balanced binary search tree (BST) on the x-coordinates.
o At each node, store another BST of the y-coordinates of the points in its subtree.
2. Query the Range Tree:
o Perform a range query on the x-coordinates using the main tree (BST).
o For each node in the query range, perform a range query on its secondary tree (y-coordinates).
Algorithm:
function RangeQueryRangeTree(RangeTree, x1, y1, x2, y2):
    result = []
    perform a range query on the x-coordinate tree from x1 to x2
    for each subtree reported, query its y-coordinate tree from y1 to y2
    add every reported point to result
    return result
3. k-d Tree
A k-d Tree (k-dimensional tree) is a binary tree in which every non-leaf node splits the
space into two parts by using a hyperplane perpendicular to one of the axes. In the case
of 2D range searching, the tree alternates between splitting by the x-axis and the y-axis
at each level.
Steps:
1. Build a k-d Tree:
o Construct a balanced tree where each node stores a point and alternates
between x and y coordinates for splitting.
2. Query the k-d Tree:
o To query a range, traverse the tree by checking whether the query
rectangle intersects the splitting planes at each level.
Algorithm:
function RangeQueryKDTree(kdTree, x1, y1, x2, y2):
    result = []
    traverse the tree recursively:
        if a node's region overlaps the query range, explore its children
        if a node's point lies within the range, add it to result
    return result
Time Complexity:
Building the k-d Tree: O(n log n) for constructing the tree.
Querying: O(log n + k), where k is the number of points in the range, found by recursively traversing the tree.
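Here is a compact runnable Python sketch of a 2D k-d tree with range querying (an illustration; the (point, axis, left, right) tuple representation is an assumption of this example):
def build_kdtree(points, depth=0):
    if not points:
        return None
    axis = depth % 2                       # alternate x (0) and y (1)
    points = sorted(points, key=lambda p: p[axis])
    mid = len(points) // 2                 # median split keeps the tree balanced
    return (points[mid], axis,
            build_kdtree(points[:mid], depth + 1),
            build_kdtree(points[mid + 1:], depth + 1))

def range_query(node, x1, y1, x2, y2, out=None):
    if out is None:
        out = []
    if node is None:
        return out
    (px, py), axis, left, right = node
    if x1 <= px <= x2 and y1 <= py <= y2:
        out.append((px, py))
    lo, hi = (x1, x2) if axis == 0 else (y1, y2)
    v = px if axis == 0 else py
    if lo <= v:                            # query range reaches the left half
        range_query(left, x1, y1, x2, y2, out)
    if v <= hi:                            # query range reaches the right half
        range_query(right, x1, y1, x2, y2, out)
    return out

tree = build_kdtree([(1, 3), (3, 4), (5, 7), (7, 8), (2, 6)])
print(range_query(tree, 2, 4, 6, 8))       # (3,4), (5,7), (2,6) in some order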
Comparison of Approaches
Approach                  Time Complexity (Preprocessing)   Time Complexity (Query)
Naive                     none                              O(n)
Sorting + Binary Search   O(n log n)                        O(n)
Range Tree                O(n log n)                        O(log² n + k)
k-d Tree                  O(n log n)                        O(log n + k)
Quadtrees
A Quadtree is a data structure used to partition 2D space into smaller regions, making it
ideal for spatial indexing and handling 2D range queries. It is particularly useful for
applications like image processing and geographical databases.
Steps for Constructing a Quadtree:
1. Divide the 2D Space:
o Start by dividing the 2D space into four quadrants. Each quadrant can be
subdivided further into four quadrants recursively until a stopping
condition (e.g., a maximum number of points per quadrant or a minimum
quadrant size) is met.
2. Insert Points:
o Points are inserted into the appropriate quadrant based on their spatial
location.
3. Query:
o A range query is performed by checking the quadrants that overlap with
the query region.
Algorithm for Building a Quadtree:
function BuildQuadtree(points, region):
    if region is sufficiently small or points are few:
        create a leaf node with all points in the region
    else:
        divide the region into four quadrants
        recursively build the tree for each quadrant
    return quadtree
Algorithm for Searching in a Quadtree:
function RangeQueryQuadtree(root, queryRegion):
    result = []
    if root is a leaf node:
        for each point in root:
            if point is inside queryRegion:
                result.append(point)
    else:
        for each child quadrant of root:
            if queryRegion overlaps with quadrant:
                result += RangeQueryQuadtree(child, queryRegion)
    return result
Time Complexity:
Building the Tree: O(n), where n is the number of points.
Querying:
o Best Case: O(log n), when the query region covers a small part of the space.
o Worst Case: O(n), when the query region covers the entire space.
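A minimal runnable Python sketch of a point quadtree (illustrative only; the capacity CAP, the square-region layout, and the assumption that all points lie inside the root region are choices of this example):
CAP = 2    # split a quadrant when it holds more than CAP points

class Quad:
    def __init__(self, x, y, size):       # covers [x, x+size) x [y, y+size)
        self.x, self.y, self.size = x, y, size
        self.points, self.children = [], []

    def insert(self, p):                   # assumes p lies in this region
        if self.children:
            self._child_for(p).insert(p)
        else:
            self.points.append(p)
            if len(self.points) > CAP and self.size > 1:
                half = self.size / 2       # subdivide into four quadrants
                self.children = [Quad(self.x + dx * half, self.y + dy * half, half)
                                 for dx in (0, 1) for dy in (0, 1)]
                for q in self.points:
                    self._child_for(q).insert(q)
                self.points = []

    def _child_for(self, p):
        half = self.size / 2
        dx = 1 if p[0] >= self.x + half else 0
        dy = 1 if p[1] >= self.y + half else 0
        return self.children[dx * 2 + dy]

    def query(self, x1, y1, x2, y2, out=None):
        if out is None: out = []
        # Prune quadrants that do not overlap the query rectangle.
        if x2 < self.x or x1 >= self.x + self.size or \
           y2 < self.y or y1 >= self.y + self.size:
            return out
        for (px, py) in self.points:
            if x1 <= px <= x2 and y1 <= py <= y2:
                out.append((px, py))
        for c in self.children:
            c.query(x1, y1, x2, y2, out)
        return out

root = Quad(0, 0, 16)
for p in [(1, 3), (3, 4), (5, 7), (7, 8), (2, 6)]:
    root.insert(p)
print(root.query(2, 4, 6, 8))   # (3,4), (2,6), (5,7) in some order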
Searching in k-d Trees
1. Range Search:
Recursively visit only the subtrees whose regions overlap the query range, collecting the points that fall inside it.
Time Complexity: In the worst case, the algorithm may need to explore all points in the tree, so the time complexity can be O(n). In the best case, if the query range is small and well-contained, the complexity is reduced to O(log n + k), where k is the number of points in the range.
2. Nearest Neighbor Search:
The nearest neighbor search finds the point closest to a given query point.
The search starts by finding the closest point along the tree’s splitting dimension.
It then recursively explores the nearest subtree, and if necessary, checks the
other subtree for potential closer points.
Algorithm for Nearest Neighbor Search in k-D Tree:
function NearestNeighbor(root, query_point):
    best = root.point
    best_distance = distance(best, query_point)
    descend recursively toward the subtree containing query_point,
        updating best whenever a closer point is found
    on the way back, search the other subtree only if the splitting
        plane is closer to query_point than best_distance
    return best
Time Complexity: The complexity of nearest neighbor search is O(log n) on average, but can degrade to O(n) in the worst case (if the tree is unbalanced).
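A runnable Python sketch of nearest neighbor search (an illustration; it reuses the tuple-based build_kdtree and the tree variable from the k-d tree sketch earlier in this unit, which is an assumption of this example):
import math

def nearest(node, q, best=None):
    if node is None:
        return best
    point, axis, left, right = node
    if best is None or math.dist(point, q) < math.dist(best, q):
        best = point
    # Descend first into the half that contains the query point.
    near, far = (left, right) if q[axis] <= point[axis] else (right, left)
    best = nearest(near, q, best)
    # Cross the splitting plane only if it could hide a closer point.
    if abs(q[axis] - point[axis]) < math.dist(best, q):
        best = nearest(far, q, best)
    return best

print(nearest(tree, (4, 5)))   # (3, 4), the closest stored point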