
Unit-III

2-3 Trees:
A 2-3 tree is a type of self-balancing search tree where every node can have two or three
children, and all leaves are at the same depth. It ensures that the tree remains balanced,
leading to efficient operations.
Characteristics of 2-3 Trees
1. Each internal node can be:
o A 2-node: Contains one key and has two children.
o A 3-node: Contains two keys and has three children.
2. The keys in a node are in sorted order.
3. All leaves are at the same level.
4. It maintains balance by splitting or merging nodes during insertions and deletions.

Advantages of 2-3 Trees Over Binary Search Trees (BSTs)


1. Guaranteed Balance:
o A binary search tree can degenerate into a linked list in the worst case, leading to O(n) operations.
o A 2-3 tree remains balanced, ensuring O(log n) operations for search, insertion, and deletion.
2. Efficient Updates:
o The restructuring operations in 2-3 trees are localized, maintaining balance
without requiring extensive rotations as in AVL or Red-Black Trees.
3. Predictable Performance:
o In 2-3 trees, the height of the tree is always minimized for the number of
elements, leading to consistent performance.

Operations on 2-3 Trees


1. Search Operation
The search operation is similar to a binary search tree. At each node:
 Compare the target key with the keys in the node.
 Traverse the appropriate child based on the comparison.
Algorithm:

Search(x, key):
    if x is a leaf:
        if key is in x:
            return True
        else:
            return False
    else:
        if key matches a key in x:
            return True
        else:
            determine the appropriate child of x to traverse
            return Search(child, key)
Example: Search for 25 in the following 2-3 tree:

        [20, 40]
       /    |    \
   [10]   [30]   [50, 60]

1. Compare with 20 and 40. Since 20 < 25 < 40, traverse the middle child.
2. Compare with 30. No match, and the node is a leaf. Return False.
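
To make the search concrete, here is a small runnable sketch in Python. The node layout (dicts with "keys" and "children") is an assumption made for illustration; a full implementation would also need the insert/split machinery described below.

# Example in Python (illustrative sketch)
def search_23(node, key):
    # Return True if key is stored in the 2-3 (sub)tree rooted at node.
    if key in node["keys"]:
        return True
    if not node["children"]:                 # leaf reached without a match
        return False
    # Pick the child whose key range contains key: child 0 for key < keys[0],
    # the middle child for a key between the two keys, and so on.
    i = 0
    while i < len(node["keys"]) and key > node["keys"][i]:
        i += 1
    return search_23(node["children"][i], key)

def leaf(*keys):
    return {"keys": list(keys), "children": []}

# The tree from the example above: [20, 40] over [10], [30], [50, 60].
root = {"keys": [20, 40], "children": [leaf(10), leaf(30), leaf(50, 60)]}
print(search_23(root, 25))   # False (descends into the middle child [30])
print(search_23(root, 50))   # True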

2. Insertion Operation
Insertions in 2-3 trees maintain balance by splitting nodes when necessary:
1. Locate the appropriate leaf node for the new key.
2. Insert the key into the node in sorted order.
3. If the node becomes overfilled (contains three keys), split it:
o Promote the middle key to the parent.
o Split the remaining keys into two new nodes.
Algorithm:

Insert(x, key):
    if x is a leaf:
        insert key into x in sorted order
        if x becomes overfilled:
            split x and promote the middle key to the parent
    else:
        determine the appropriate child of x
        Insert(child, key)
        if child was split:
            adjust x to include the promoted key
            if x becomes overfilled:
                split x and promote the middle key
Example: Insert 35 into the tree:

        [20, 40]
       /    |    \
   [10]   [30]   [50, 60]

1. Traverse to the middle child [30].
2. Insert 35 into [30], resulting in [30, 35].
3. No split needed.

3. Deletion Operation
Deletions involve:
1. Locate the key to be deleted.
2. If the key is in an internal node, replace it with its in-order predecessor or successor.
3. Adjust the tree to maintain balance:
o Borrow a key from a sibling if possible.
o Merge nodes if necessary.
Example: Delete 30 from the tree:

        [20, 40]
       /    |    \
   [10]   [30]   [50, 60]

1. 30 is in a leaf; removing it leaves that node empty (an underflow).
2. Borrow through the parent: move the separator 40 down into the empty node and move 50 up into the parent. The tree becomes [20, 50] with children [10], [40], [60].

Analysis of Operations
For a 2-3 tree with n keys:
1. Height: O(log n), since the tree remains balanced.
2. Search: O(log n), as in a binary search.
3. Insertion: O(log n) due to localized splits.
4. Deletion: O(log n) due to merges and adjustments.

Example in Action
Insert the following sequence into an empty 2-3 tree: 10, 20, 30, 40, 50.
1. Insert 10: [10].
2. Insert 20: [10, 20].
3. Insert 30: Split node. Promote 20. Tree becomes:

        [20]
       /    \
   [10]    [30]

4. Insert 40: Add to [30], resulting in [30, 40].
5. Insert 50: Split [30, 40, 50]. Promote 40. Tree becomes:

        [20, 40]
       /    |    \
   [10]   [30]   [50]
The tree remains balanced throughout.

Summary
2-3 trees offer efficient and predictable performance due to their balanced structure, making them a reliable choice for applications requiring consistent O(log n) operations. By balancing nodes through localized splits and merges, they avoid the pitfalls of unbalanced binary search trees.

B-Trees:
A B-Tree is a self-balancing tree data structure used primarily for database systems and file
systems where efficient disk access is required. It generalizes the 2-3 tree by allowing nodes
to have more than two or three children.

Characteristics of B-Trees
1. Node Properties:
o A node can contain multiple keys (up to m − 1, where m is the order of the tree).
o A node with k keys has k + 1 children.
2. Balance:
o All leaves are at the same level.
o The tree grows or shrinks from the root, keeping it balanced.
3. Key Ordering:
o Keys within a node are in sorted order.
o Subtrees are arranged so that all keys in a subtree to the left of a key are
smaller, and all keys in a subtree to the right are larger.
Advantages of B-Trees Over Binary Search Trees (BSTs)
1. Better Disk Utilization:
o B-Trees are designed to minimize disk I/O by storing multiple keys in a single
node. This is particularly useful in databases and file systems.
2. Guaranteed Balance:
o Like 2-3 trees, B-Trees remain balanced, ensuring O(log n) operations. BSTs can degenerate into a linked list in the worst case, leading to O(n) operations.
3. Scalability:
o B-Trees are better suited for large datasets since they reduce the number of
levels (and disk accesses) compared to BSTs.
4. Efficient Range Queries:
o Traversing keys in a B-Tree node and its children enables efficient range
queries compared to binary search trees.

Height of a B-Tree
The height h of a B-Tree depends on its order m and the number of keys n. The height can be bounded as:

    h ≤ log_{⌈m/2⌉}((n + 1) / 2)

Key Observations:
 A higher order m results in a shorter tree, as each node can hold more keys.
 The height grows logarithmically with the number of keys n, ensuring efficient operations.
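To make the bound concrete (taking order m = 101 purely for illustration): ⌈m/2⌉ = 51, so with n = 1,000,000 keys the bound gives h ≤ log_{51}(500,000.5) ≈ 3.3. A million keys fit in a tree only three or four levels deep, which is exactly why B-Trees keep the number of disk accesses per operation so low.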

Operations on B-Trees
1. Search Operation
The search process in a B-Tree is similar to a binary search on the keys within a node:
1. Search the current node for the target key.
2. If the key is not found, traverse the appropriate child node.
3. Repeat until the key is found or a leaf node is reached.
Algorithm:

Search(x, key):
    i = 1
    while i <= number of keys in x and key > x.keys[i]:
        i = i + 1
    if i <= number of keys in x and key == x.keys[i]:
        return True
    if x is a leaf:
        return False               // key is not present
    return Search(x.children[i], key)
Example: Search for 50 in the B-Tree:

        [20, 40]
       /    |    \
   [10]   [30]   [50, 60]

1. Compare with 20 and 40. Since 50 > 40, traverse the rightmost child.
2. Compare with 50. Found the key.

2. Insertion Operation
Insertion involves:
1. Locate the appropriate leaf node.
2. Insert the key into the node in sorted order.
3. If the node overflows (has m keys):
o Split the node into two nodes.
o Promote the middle key to the parent.
Algorithm:

Insert(x, key):
    if x is a leaf:
        insert key into x in sorted order
        if x becomes overfilled:
            split x and promote the middle key to the parent
    else:
        determine the appropriate child of x
        Insert(child, key)
        if child was split:
            adjust x to include the promoted key
            if x becomes overfilled:
                split x and promote the middle key
Example: Insert 70 into the B-Tree:

        [20, 40]
       /    |    \
   [10]   [30]   [50, 60]

1. Traverse to [50, 60].
2. Insert 70, resulting in [50, 60, 70].
3. Split [50, 60, 70]. Promote 60. Tree becomes:

       [20, 40, 60]
      /    |    |    \
  [10]  [30]  [50]  [70]

3. Deletion Operation
Deletion involves:
1. Locate the key to be deleted.
2. If the key is in an internal node:
o Replace it with its in-order predecessor or successor.

3. If a node underflows (has fewer than ⌈m/2⌉ − 1 keys):
o Borrow a key from a sibling if possible.
o Otherwise, merge with a sibling.
Example: Delete 40 from the tree:

       [20, 40, 60]
      /    |    |    \
  [10]  [30]  [50]  [70]

1. Replace 40 with its in-order successor 50.
2. The leaf that held 50 is now empty; fix the underflow by borrowing from a sibling or merging nodes.

Analysis of Operations
For a B-Tree of order m with n keys:
1. Height: O(log n), as the tree is balanced.
2. Search: O(log n) due to binary search within nodes and traversal.
3. Insertion: O(log n) for locating the position and performing splits.
4. Deletion: O(log n) for locating the key and restructuring.

Example in Action
Insert the sequence 10, 20, 30, 40, 50, 60, 70, 80 into a B-Tree of order 3 (each node holds at most 2 keys).
1. Insert 10, 20: [10, 20].
2. Insert 30: The node overflows. Split and promote 20. Tree becomes:

        [20]
       /    \
   [10]    [30]

3. Insert 40, 50: 40 joins [30]; inserting 50 overflows [30, 40, 50]. Split and promote 40. Tree becomes:

        [20, 40]
       /    |    \
   [10]   [30]   [50]

4. Insert 60, 70: 60 joins [50]; inserting 70 overflows [50, 60, 70]. Split and promote 60. Tree becomes:

       [20, 40, 60]
      /    |    |    \
  [10]  [30]  [50]  [70]

5. Insert 80: Joins [70], giving [70, 80]. Final tree:

       [20, 40, 60]
      /    |    |    \
  [10]  [30]  [50]  [70, 80]
This structure remains balanced, ensuring efficient operations.
Splay Trees
A splay tree is a self-adjusting binary search tree (BST) where the most recently accessed
element is brought to the root through an operation called splaying. This improves access
time for frequently accessed elements over a sequence of operations.
Splay trees do not maintain strict balance but offer good amortized performance. Each operation (search, insert, delete) has an amortized time complexity of O(log n).

Key Operations
1. Splaying
2. Search
3. Insertion
4. Deletion

Splaying
Splaying is the process of bringing a target node to the root of the tree using a series of
rotations. It is performed using three main cases:
1. Zig (Single Rotation)
When the node x is the left or right child of the root.
o Left Zig: Rotate x right.
o Right Zig: Rotate x left.
2. Zig-Zig (Double Rotation)
When x and its parent p are both left or right children.
o Left Zig-Zig: Rotate p right, then x right.
o Right Zig-Zig: Rotate p left, then x left.
3. Zig-Zag (Double Rotation)
When x is a left child and p is a right child (or vice versa).
o Left Zig-Zag: Rotate x left, then x right.
o Right Zig-Zag: Rotate x right, then x left.
Algorithm for Splaying

function Splay(tree, x):
    while x is not the root:
        if x.parent is root:
            Rotate(x)                  // Zig
        else if (x is left child of parent and parent is left child of grandparent) or
                (x is right child of parent and parent is right child of grandparent):
            Rotate(parent)             // Zig-Zig
            Rotate(x)
        else:
            Rotate(x)                  // Zig-Zag
            Rotate(x)
    return x
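
For concreteness, here is a compact recursive sketch in Python. Splaying by key with a bare Node class and no parent pointers is one common formulation (an assumption made here for brevity); implementations that follow the pseudocode above instead track parent links.

# Example in Python (illustrative sketch)
class Node:
    def __init__(self, key):
        self.key, self.left, self.right = key, None, None

def rotate_right(x):
    y = x.left
    x.left, y.right = y.right, x
    return y

def rotate_left(x):
    y = x.right
    x.right, y.left = y.left, x
    return y

def splay(root, key):
    # Bring the node holding key (or the last node on its search path) to the root.
    if root is None or root.key == key:
        return root
    if key < root.key:
        if root.left is None:
            return root
        if key < root.left.key:                  # zig-zig: recurse, then rotate twice
            root.left.left = splay(root.left.left, key)
            root = rotate_right(root)
        elif key > root.left.key:                # zig-zag: recurse, rotate the child first
            root.left.right = splay(root.left.right, key)
            if root.left.right is not None:
                root.left = rotate_left(root.left)
        return root if root.left is None else rotate_right(root)    # zig
    else:                                        # mirror image on the right side
        if root.right is None:
            return root
        if key > root.right.key:                 # zig-zig
            root.right.right = splay(root.right.right, key)
            root = rotate_left(root)
        elif key < root.right.key:               # zig-zag
            root.right.left = splay(root.right.left, key)
            if root.right.left is not None:
                root.right = rotate_right(root.right)
        return root if root.right is None else rotate_left(root)    # zig

# Build the chain 10 -> 20 -> 30, then splay 30 to the root.
root = Node(10); root.right = Node(20); root.right.right = Node(30)
root = splay(root, 30)
print(root.key)   # 30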

Search in Splay Trees


Searching in a splay tree involves:
1. Performing a standard BST search.
2. Splaying the accessed node to the root.
Algorithm

function Search(tree, key):
    node = BST_Search(tree, key)
    if node exists:
        Splay(tree, node)
    return node
Example: For a splay tree containing {10, 20, 30, 40, 50}, searching for 30 will bring it to
the root.

Update Operations
1. Insertion
o Insert the new element as in a regular BST.
o Splay the newly inserted node to the root.

function Insert(tree, key):
    node = BST_Insert(tree, key)
    Splay(tree, node)

2. Deletion
o Splay the node to be deleted to the root.
o Remove the root and merge the remaining subtrees.

function Delete(tree, key):
    node = Search(tree, key)
    if node exists:
        Splay(tree, node)
        if node.left:
            leftSubtree = node.left
            leftSubtree.parent = null
        if node.right:
            rightSubtree = node.right
            rightSubtree.parent = null
        tree.root = Join(leftSubtree, rightSubtree)

Amortized Analysis of Splaying


The amortized cost of a splay operation is analyzed using the potential method, with the rank of a node defined as:

    rank(x) = log(subtree size of x)

Steps
1. Define the potential Φ of the tree as: Φ = Σ rank(x), summed over all nodes x in the tree.
2. The amortized cost of a splay operation is: amortized cost = actual cost + ΔΦ.
3. The analysis shows the amortized cost of each operation is O(log n).
Example
In a splay tree with n = 8, accessing the deepest node may take O(log n) rotations to bring it to the root. Over a sequence of operations, the amortized cost remains O(log n).

Advantages of Splay Trees


 Fast access to frequently used elements due to locality of reference.
 Simple implementation compared to other self-balancing trees like AVL or Red-Black
trees.
 No additional storage for balancing factors or colors.
Disadvantages
 Poor performance for uniformly random access patterns.
 No guaranteed O(log n) worst-case time for individual operations.

UNIT-IV

Text Processing: String Operations


String processing involves manipulating and analyzing text to perform various
operations such as searching, pattern matching, transformations, and more. These
operations are essential in fields like natural language processing, data analysis, and
programming.

Key String Operations


1. Basic String Manipulations
o Concatenation
o Substring extraction
o Length calculation
o Comparison
o Reversal
2. Search Operations
o Exact match search
o Pattern matching (using algorithms like KMP, Rabin-Karp)
3. Transformations
o Case conversion
o Trimming
o Replacement
4. Splitting and Joining
o Tokenizing strings into substrings
o Joining substrings into a single string
5. Advanced Operations
o Longest Common Substring
o Longest Palindromic Substring
o String Matching with Wildcards or Regular Expressions

Examples and Algorithms


1. Concatenation
Combining two or more strings.
# Example in Python
str1 = "Hello"
str2 = "World"
result = str1 + " " + str2
print(result) # Output: "Hello World"
2. Substring Extraction
Extract a portion of a string.
# Example in Python
text = "Hello World"
substring = text[0:5] # Extracts "Hello"

3. String Length
Calculate the number of characters in a string.
# Example in Python
text = "Hello"
length = len(text)
print(length) # Output: 5

4. Searching for a Substring


Find whether a substring exists within a string.
# Example in Python
text = "This is an example"
search = "example"
print(search in text) # Output: True
Algorithm for Naive String Matching

function NaiveSearch(text, pattern):
    for i = 0 to (len(text) - len(pattern)):
        match = true
        for j = 0 to len(pattern) - 1:
            if text[i + j] != pattern[j]:
                match = false
                break
        if match:
            return i        # Return index of match
    return -1               # No match found
5. Pattern Matching (KMP Algorithm)
Efficient pattern matching using the Knuth-Morris-Pratt (KMP) algorithm.
Steps:
1. Precompute the longest prefix-suffix (LPS) array.
2. Use the LPS array to skip unnecessary comparisons.
Algorithm

function KMP(text, pattern):
    lps = ComputeLPS(pattern)
    i = 0                       // Index for text
    j = 0                       // Index for pattern
    while i < len(text):
        if text[i] == pattern[j]:
            i += 1
            j += 1
        if j == len(pattern):
            return i - j        // Match found
        elif i < len(text) and text[i] != pattern[j]:
            if j != 0:
                j = lps[j - 1]
            else:
                i += 1
    return -1                   // No match found

function ComputeLPS(pattern):
    lps = array of size len(pattern) initialized to 0
    length = 0
    i = 1
    while i < len(pattern):
        if pattern[i] == pattern[length]:
            length += 1
            lps[i] = length
            i += 1
        else:
            if length != 0:
                length = lps[length - 1]
            else:
                lps[i] = 0
                i += 1
    return lps
Example For text = "ababcababcabc" and pattern = "abc", KMP will find all
occurrences efficiently.

6. Longest Common Substring


Find the longest substring common to two strings.
Dynamic Programming Algorithm

function LongestCommonSubstring(s1, s2):
    m = len(s1)
    n = len(s2)
    dp = array of size (m+1) x (n+1) initialized to 0
    maxLength = 0
    endIndex = 0
    for i = 1 to m:
        for j = 1 to n:
            if s1[i-1] == s2[j-1]:
                dp[i][j] = dp[i-1][j-1] + 1
                if dp[i][j] > maxLength:
                    maxLength = dp[i][j]
                    endIndex = i
    return s1[endIndex - maxLength:endIndex]
Example For s1 = "abcdef" and s2 = "zcdemf", the longest common substring is "cde".
7. Longest Palindromic Substring
Find the longest substring that is a palindrome.
Expand Around Center Algorithm

function LongestPalindromicSubstring(s):
    start = 0
    maxLength = 1
    for i = 0 to len(s) - 1:
        length1 = ExpandAroundCenter(s, i, i)      // Odd-length palindrome
        length2 = ExpandAroundCenter(s, i, i + 1)  // Even-length palindrome
        maxLen = max(length1, length2)
        if maxLen > maxLength:
            maxLength = maxLen
            start = i - (maxLen - 1) // 2
    return s[start:start + maxLength]

function ExpandAroundCenter(s, left, right):
    while left >= 0 and right < len(s) and s[left] == s[right]:
        left -= 1
        right += 1
    return right - left - 1
Example For s = "babad", the longest palindromic substring is "bab" or "aba".

Practical Applications
1. Search Engines: Pattern matching and keyword searching.
2. Spell Checkers: Longest common substring and edit distance.
3. Bioinformatics: DNA sequence analysis using substring operations.
4. Data Parsing: Tokenizing and processing structured text like JSON or XML.
Brute-Force Pattern Matching
Overview
The brute-force (or naive) pattern matching algorithm compares the pattern with every
substring of the text. It is simple to implement but inefficient for long texts and patterns.
Algorithm
1. Start at the beginning of the text.
2. Compare each character of the pattern with the corresponding characters in the
text.
3. If all characters match, record the match position.
4. Shift the pattern one position to the right and repeat until the end of the text is
reached.
Pseudocode

function BruteForceMatch(text, pattern):
    n = len(text)
    m = len(pattern)
    for i = 0 to n - m:
        match = true
        for j = 0 to m - 1:
            if text[i + j] != pattern[j]:
                match = false
                break
        if match:
            return i        // Match found at index i
    return -1               // No match found
Time Complexity
 Best Case: O(n) (when the first character of the pattern fails to match at most positions)
 Worst Case: O(n · m) (when the pattern and text are similar)
Example
For text = "abcabcabcd" and pattern = "abcd", the algorithm will compare each
position until it finds a match at index 6.
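
The same algorithm runs directly in Python; slice comparison stands in for the inner character loop:

# Example in Python (illustrative sketch)
def brute_force_match(text, pattern):
    n, m = len(text), len(pattern)
    for i in range(n - m + 1):
        if text[i:i + m] == pattern:   # compare the pattern with the window at i
            return i                   # match found at index i
    return -1                          # no match found

print(brute_force_match("abcabcabcd", "abcd"))   # 6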
The Boyer-Moore Algorithm
Overview
The Boyer-Moore algorithm is an efficient string-searching algorithm that skips sections
of the text by using information gathered during preprocessing. It works by aligning the
pattern with the text and using two key heuristics:
1. Bad Character Heuristic
2. Good Suffix Heuristic
1. Bad Character Heuristic
If a mismatch occurs, align the pattern such that the mismatched character in the text
aligns with its last occurrence in the pattern. If the character does not exist in the
pattern, shift the pattern past the mismatched character.
Preprocessing:
1. Create a table bad_char where bad_char[c] stores the last occurrence of
character c in the pattern.
2. If c is not in the pattern, set bad_char[c] = -1.
Shift Calculation:

    shift = max(1, j − bad_char[text[i + j]])

where j is the mismatch position.

2. Good Suffix Heuristic


If a mismatch occurs, align the pattern such that the last matched suffix aligns with its
next occurrence in the pattern.
Preprocessing:
1. Create a table suffix that stores the length of the longest suffix of the pattern
ending at each position.
2. Create a table shift that determines how far to shift the pattern based on the
suffix.
Shift Calculation:

    shift = value from the shift table

Algorithm

function BoyerMooreMatch(text, pattern):
    n = len(text)
    m = len(pattern)
    bad_char = PreprocessBadCharacter(pattern)
    shift = 0
    while shift <= n - m:
        j = m - 1
        while j >= 0 and pattern[j] == text[shift + j]:
            j -= 1
        if j < 0:
            return shift                               // Match found
        else:
            shift += max(1, j - bad_char[text[shift + j]])
    return -1                                          // No match found

function PreprocessBadCharacter(pattern):
    bad_char = array of size 256 initialized to -1
    for i = 0 to len(pattern) - 1:
        bad_char[pattern[i]] = i
    return bad_char

function PreprocessGoodSuffix(pattern):
    m = len(pattern)
    suffix = array of size m initialized to 0
    shift = array of size m initialized to 0
    // Compute suffix and shift arrays
    // (Details omitted for brevity; BoyerMooreMatch above uses only the bad character heuristic)
    return shift

Time Complexity
 Best Case: O(n / m) (skips many comparisons)
 Worst Case: O(n + m)
Example
For text = "ABAAABCD" and pattern = "ABC", the Boyer-Moore algorithm
preprocesses the pattern and skips unnecessary comparisons, leading to faster
matching.
1. Preprocess pattern = "ABC":
o Bad character table: {A: 0, B: 1, C: 2}
2. Search:
o Compare the pattern with the text starting at index 0. Mismatch at index 2 (C vs A); shift by 2 − bad_char[A] = 2.
o At shift 2, another mismatch (C vs the A at text index 4) shifts the pattern by 2 again.
o Compare at shift 4 and find a match.
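
A runnable Python sketch of the search using only the bad character heuristic (the good suffix table is omitted for brevity, so this variant's worst case is weaker than the full algorithm's):

# Example in Python (illustrative sketch)
def boyer_moore(text, pattern):
    n, m = len(text), len(pattern)
    if m == 0:
        return 0
    last = {ch: i for i, ch in enumerate(pattern)}   # bad character table
    s = 0                                            # current alignment of the pattern
    while s <= n - m:
        j = m - 1
        while j >= 0 and pattern[j] == text[s + j]:  # scan right to left
            j -= 1
        if j < 0:
            return s                                 # full match at shift s
        # last.get(..., -1) handles characters absent from the pattern
        s += max(1, j - last.get(text[s + j], -1))
    return -1

print(boyer_moore("ABAAABCD", "ABC"))   # 4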

Comparison: Brute Force vs. Boyer-Moore


Feature          Brute-Force             Boyer-Moore
Complexity       O(n · m)                O(n + m)
Preprocessing    None                    Bad character + good suffix
Efficiency       Slow for large n, m     Very fast for large n, m
Usage            Simplicity              Optimized search



Knuth-Morris-Pratt (KMP) Algorithm


The Knuth-Morris-Pratt (KMP) algorithm efficiently finds occurrences of a pattern in a
text by preprocessing the pattern to create a "partial match table" (LPS array). This
allows the algorithm to skip unnecessary comparisons, resulting in linear time
complexity.

Key Concepts
1. Avoid Redundant Comparisons
When a mismatch occurs, the algorithm shifts the pattern in a way that skips
unnecessary rechecks of characters that were already matched.
2. Longest Prefix-Suffix (LPS) Array
The LPS array stores the lengths of the proper prefixes of the pattern that are
also suffixes. It helps determine how much to shift the pattern when a mismatch
occurs.

Steps of the KMP Algorithm


1. Preprocess the Pattern (Compute LPS Array)
The LPS array helps the algorithm determine the next character of the pattern
to match after a mismatch.
2. Pattern Matching
Use the LPS array to slide the pattern over the text without revisiting already
matched characters.

Algorithm
1. Preprocessing: Compute the LPS Array
The LPS array is computed for the pattern. For each position i, LPS[i] represents the length of the longest proper prefix of the substring pattern[0:i+1] that is also a suffix of it.
Pseudocode
function ComputeLPS(pattern):
    m = len(pattern)
    LPS = array of size m initialized to 0
    length = 0              // Length of the previous longest prefix suffix
    i = 1                   // Start from the second character
    while i < m:
        if pattern[i] == pattern[length]:
            length += 1
            LPS[i] = length
            i += 1
        else:
            if length != 0:
                length = LPS[length - 1]
            else:
                LPS[i] = 0
                i += 1
    return LPS

2. Pattern Matching
Compare the pattern against the text, using the LPS array to skip characters on a
mismatch.
Pseudocode
function KMPSearch(text, pattern):
    n = len(text)
    m = len(pattern)
    LPS = ComputeLPS(pattern)
    i = 0                       // Index for text
    j = 0                       // Index for pattern
    while i < n:
        if text[i] == pattern[j]:
            i += 1
            j += 1
        if j == m:
            return i - j        // Match found at index (i - j)
        elif i < n and text[i] != pattern[j]:
            if j != 0:
                j = LPS[j - 1]  // Use LPS to shift the pattern
            else:
                i += 1
    return -1                   // No match found

Example
Input:
 Text: "ababcabcabababd"
 Pattern: "ababd"
Step 1: Compute LPS Array
For pattern = "ababd", the LPS array is computed as follows:

Index i    pattern[0:i+1]    LPS[i]
0          a                 0
1          ab                0
2          aba               1
3          abab              2
4          ababd             0

LPS Array: [0, 0, 1, 2, 0]

Step 2: Pattern Matching

i (text)   text[i]          j (pattern)   Result      Action
0–3        a, b, a, b       0–3           Match       Increment i and j (j reaches 4)
4          c                4 (d)         Mismatch    Shift using LPS: j = LPS[3] = 2
4          c                2 (a)         Mismatch    j = LPS[1] = 0
4          c                0 (a)         Mismatch    j = 0, so increment i
5–9        ...              ...           Partial     "ab" matches twice, with LPS shifts on each 'c'
10–14      a, b, a, b, d    0–4           Match       j reaches 5 = len(pattern); match at i − j = 10

Output
 Pattern found at index 10.

Time Complexity
1. Preprocessing (LPS Array): O(m)
2. Pattern Matching: O(n)
Overall Time Complexity: O(n + m)
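
Both phases translate directly into runnable Python, reproducing the trace above:

# Example in Python (illustrative sketch)
def compute_lps(pattern):
    lps = [0] * len(pattern)
    length, i = 0, 1
    while i < len(pattern):
        if pattern[i] == pattern[length]:
            length += 1
            lps[i] = length
            i += 1
        elif length != 0:
            length = lps[length - 1]     # fall back to the next shorter border
        else:
            lps[i] = 0
            i += 1
    return lps

def kmp_search(text, pattern):
    lps = compute_lps(pattern)
    i = j = 0
    while i < len(text):
        if text[i] == pattern[j]:
            i += 1
            j += 1
            if j == len(pattern):
                return i - j             # first match
        elif j != 0:
            j = lps[j - 1]               # reuse the already-matched prefix
        else:
            i += 1
    return -1

print(compute_lps("ababd"))                     # [0, 0, 1, 2, 0]
print(kmp_search("ababcabcabababd", "ababd"))   # 10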

Advantages
 Efficient for large texts and patterns.
 Linear time complexity ensures scalability.
 Works well with repetitive patterns in the text.

Applications
 Text editors for find-and-replace functionality.
 DNA sequence matching in bioinformatics.
 Spam filtering and malware detection.
Tries (Prefix Trees)
A Trie is a tree-like data structure that stores strings (or sequences) efficiently. It is
primarily used for searching, insertion, and prefix-matching operations. Variants of
Tries, such as Standard Tries, Compressed Tries, and Suffix Tries, optimize storage and
operations for specific use cases.

1. Standard Tries
Definition
A Standard Trie is a tree where:
 Each node represents a character of a string.
 Paths from the root to a leaf represent complete strings.
 Common prefixes are shared among strings.
Key Operations
1. Insertion
o Start at the root.
o Traverse or create child nodes for each character in the string.
o Mark the final node as a terminal node.
2. Search
o Traverse the nodes corresponding to each character of the string.
o If all characters match and the last node is terminal, the string is found.
Algorithm
Insertion

function Insert(root, word):
    current = root
    for char in word:
        if char not in current.children:
            current.children[char] = new TrieNode()
        current = current.children[char]
    current.isTerminal = true

Search

function Search(root, word):
    current = root
    for char in word:
        if char not in current.children:
            return false
        current = current.children[char]
    return current.isTerminal
Example
Insert: ["cat", "car", "cart", "dog"]
Structure:

        root
       /    \
      c      d
      |      |
      a      o
     / \     |
    t   r    g
        |
        t

Search:
 "cat" → Found.
 "doge" → Not Found.
Time Complexity
 Insertion/Search: O(L), where L is the length of the string.
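
A runnable Python version of the standard trie, with the explicit TrieNode class the pseudocode above assumes:

# Example in Python (illustrative sketch)
class TrieNode:
    def __init__(self):
        self.children = {}         # maps a character to a child TrieNode
        self.is_terminal = False   # True if a stored word ends here

def insert(root, word):
    node = root
    for ch in word:
        node = node.children.setdefault(ch, TrieNode())
    node.is_terminal = True

def search(root, word):
    node = root
    for ch in word:
        if ch not in node.children:
            return False
        node = node.children[ch]
    return node.is_terminal

root = TrieNode()
for w in ["cat", "car", "cart", "dog"]:
    insert(root, w)
print(search(root, "cat"))    # True
print(search(root, "doge"))   # False
print(search(root, "ca"))     # False: a prefix only, not a stored word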

2. Compressed Tries
Definition
A Compressed Trie optimizes the Standard Trie by:
 Combining chains of single-child nodes into a single edge labeled with the
concatenated string.
 Reducing space usage and traversal time.
Key Operations
1. Insertion
o Similar to Standard Trie but merge single-child chains.
2. Search
o Traverse nodes by matching entire edge labels instead of single
characters.
Algorithm
Insertion

function InsertCompressed(root, word):
    current = root
    i = 0
    while i < len(word):
        found = false
        for child in current.children:
            commonPrefix = FindCommonPrefix(child.label, word[i:])
            if commonPrefix:
                SplitNode(current, child, commonPrefix)
                current = child
                i += len(commonPrefix)
                found = true
                break
        if not found:
            newNode = new TrieNode(word[i:])
            current.children.append(newNode)
            break
Search

function SearchCompressed(root, word):
    current = root
    i = 0
    while i < len(word):
        found = false
        for child in current.children:
            if word[i:].startswith(child.label):
                current = child
                i += len(child.label)
                found = true
                break
        if not found:
            return false
    return current.isTerminal
Example
Insert: ["cat", "car", "cart"]
Structure (single-child chains collapsed into labeled edges):

      root
       |
      "ca"
      /  \
    "t"  "r"
           \
           "t"

Search:
 "cat" → Found.
 "cart" → Found.
Time Complexity
 Insertion/Search: O(L), where L is the length of the string.

3. Suffix Tries
Definition
A Suffix Trie stores all suffixes of a given string. It is used for pattern matching,
substring searching, and finding the longest repeated substring.
Key Features
 Each path from the root to a leaf represents a suffix of the string.
 Useful for solving string problems efficiently.
Construction
To construct a Suffix Trie for a string S:
1. Add all suffixes of S into the Trie.
Algorithm
Construction

function BuildSuffixTrie(string):
    root = new TrieNode()
    for i = 0 to len(string) - 1:
        suffix = string[i:]
        Insert(root, suffix)
    return root

Search

function SearchSuffixTrie(root, pattern):
    current = root
    for char in pattern:
        if char not in current.children:
            return false
        current = current.children[char]
    return true
Example
Build Suffix Trie for S = "banana":
Suffixes: "banana", "anana", "nana", "ana", "na", "a"
Structure (suffix ends marked with *):

        root
       /  |  \
      b   a*  n
      |   |   |
      a   n   a*
      |   |   |
      n   a*  n
      |   |   |
      a   n   a*
      |   |
      n   a*
      |
      a*

Search:
 "ana" → Found.
 "nan" → Found.
 "abc" → Not Found.
Time Complexity
 Construction: O(n²) (without optimization), where n is the length of the string.
 Search: O(m), where m is the length of the pattern.
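
A minimal runnable sketch in Python, using plain dictionaries as trie nodes. Terminal markers are deliberately omitted here (an illustrative simplification): substring search only needs to follow edges, since every substring is a prefix of some suffix.

# Example in Python (illustrative sketch)
def build_suffix_trie(s):
    root = {}
    for i in range(len(s)):
        node = root
        for ch in s[i:]:               # insert the suffix s[i:]
            node = node.setdefault(ch, {})
    return root

def contains_substring(root, pattern):
    node = root
    for ch in pattern:
        if ch not in node:
            return False
        node = node[ch]
    return True                        # pattern is a prefix of some suffix

trie = build_suffix_trie("banana")
print(contains_substring(trie, "ana"))   # True
print(contains_substring(trie, "nan"))   # True
print(contains_substring(trie, "abc"))   # False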

Comparison
Feature         Standard Trie                  Compressed Trie              Suffix Trie
Storage         Large                          Optimized                    Larger than Trie
Efficiency      Good for search                Faster due to compression    Good for suffix-related tasks
Applications    Word lookup, prefix matching   Compact storage              Substring search, DNA matching
Huffman Coding
Huffman Coding is a lossless data compression algorithm. It is used to encode data by
assigning variable-length binary codes to characters, such that more frequent
characters have shorter codes and less frequent characters have longer codes. This
minimizes the total number of bits required to represent the data.

Key Concepts
1. Prefix Codes
o Huffman codes are prefix codes, meaning no code is a prefix of another.
This ensures unambiguous decoding.
2. Frequency-Based Tree Construction
o A binary tree is constructed based on the frequency of characters in the
input. Characters with lower frequency are placed deeper in the tree.

Steps in Huffman Coding


1. Count Frequencies
Determine the frequency of each character in the input data.
2. Build a Priority Queue (Min-Heap)
Insert all characters as individual nodes in a priority queue, with their
frequencies as keys.
3. Construct the Huffman Tree
o While the priority queue contains more than one node:
1. Remove the two nodes with the smallest frequencies.
2. Create a new internal node with these two nodes as children. The
new node's frequency is the sum of the two.
3. Insert the new node back into the priority queue.
o The remaining node in the queue is the root of the Huffman Tree.
4. Generate Codes
Assign binary codes to characters by traversing the Huffman Tree:
o Assign 0 for a left edge and 1 for a right edge.
o The code for a character is the concatenation of the binary values
encountered along the path from the root to the character.
5. Encode Data
Replace each character in the input with its corresponding Huffman code.
6. Decode Data
Use the Huffman Tree to decode the binary data back into characters.

Algorithm
Build Huffman Tree
function BuildHuffmanTree(frequencies):
    priorityQueue = MinHeap()
    for char, freq in frequencies:
        node = new TreeNode(char, freq)
        priorityQueue.insert(node)

    while priorityQueue.size > 1:
        left = priorityQueue.extractMin()
        right = priorityQueue.extractMin()
        merged = new TreeNode(None, left.freq + right.freq)
        merged.left = left
        merged.right = right
        priorityQueue.insert(merged)

    return priorityQueue.extractMin()   // Root of the Huffman Tree


Generate Codes
function GenerateCodes(root, code, codes):
    if root is None:
        return
    if root.char is not None:           // Leaf node
        codes[root.char] = code
    GenerateCodes(root.left, code + "0", codes)
    GenerateCodes(root.right, code + "1", codes)
Encode Data
function EncodeData(data, codes):
    encoded = ""
    for char in data:
        encoded += codes[char]
    return encoded

Decode Data

function DecodeData(encodedData, root):
    decoded = ""
    current = root
    for bit in encodedData:
        if bit == "0":
            current = current.left
        else:
            current = current.right
        if current.char is not None:    // Leaf node
            decoded += current.char
            current = root
    return decoded

Example
Input Data
data = "AAABBCDA"
Step 1: Count Frequencies
Character    Frequency
A            4
B            2
C            1
D            1
Step 2: Build Huffman Tree
 Insert all characters into the priority queue.
 Merge nodes:
1. Merge C and D → Frequency: 2.
2. Merge B and the merged C-D → Frequency: 4.
3. Merge A and the merged B-C-D → Frequency: 8 (root).
Huffman Tree:

          (*, 8)
         /      \
     (A, 4)   (*, 4)
              /     \
          (B, 2)  (*, 2)
                  /     \
              (C, 1)  (D, 1)
Step 3: Generate Codes

Character    Huffman Code
A            0
B            10
C            110
D            111

Step 4: Encode Data
Original: AAABBCDA
Encoded: 00010101101110
Step 5: Decode Data
Encoded: 00010101101110
Decoded: AAABBCDA

Advantages
1. Optimality
o Produces the most efficient prefix code for given character frequencies.
2. Space-Efficient
o Reduces the number of bits required to store data.
Applications
1. Compression formats (e.g., JPEG, MP3, ZIP).
2. Efficient transmission of data in telecommunication.

Time Complexity
1. Build Huffman Tree: O(n log n), where n is the number of distinct characters.
2. Generate Codes: O(n).
3. Encode/Decode: O(L), where L is the total length of the encoded data.
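
A compact runnable Python version using heapq as the min-heap. The tie-breaking counter in the heap entries is an implementation assumption that keeps heap comparisons well-defined; with different tie-breaking the individual codes may differ while remaining equally optimal.

# Example in Python (illustrative sketch)
import heapq
from collections import Counter

def huffman_codes(data):
    freq = Counter(data)
    # Heap entries are (frequency, tie_breaker, node); a node is either
    # ("leaf", char) or ("internal", left, right).
    heap = [(f, i, ("leaf", ch)) for i, (ch, f) in enumerate(freq.items())]
    heapq.heapify(heap)
    counter = len(heap)
    while len(heap) > 1:
        f1, _, left = heapq.heappop(heap)    # the two smallest frequencies
        f2, _, right = heapq.heappop(heap)
        heapq.heappush(heap, (f1 + f2, counter, ("internal", left, right)))
        counter += 1
    codes = {}
    def assign(node, code):
        if node[0] == "leaf":
            codes[node[1]] = code or "0"     # single-symbol edge case
        else:
            assign(node[1], code + "0")
            assign(node[2], code + "1")
    assign(heap[0][2], "")
    return codes

codes = huffman_codes("AAABBCDA")
print(codes)                                     # e.g. {'A': '0', 'B': '10', 'C': '110', 'D': '111'}
print("".join(codes[ch] for ch in "AAABBCDA"))   # 00010101101110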
The Longest Common Subsequence (LCS) Problem
The Longest Common Subsequence (LCS) problem involves finding the longest
sequence that can appear in both given strings without altering the order of characters,
though they don’t need to be contiguous.
Definition
Given two sequences (strings) X = x1, x2, ..., xm and Y = y1, y2, ..., yn, the LCS is a
subsequence S such that:
 S is a subsequence of both X and Y.
 S is as long as possible.
Properties of LCS
 Subsequence: A subsequence is derived from a sequence by deleting some or no
elements without changing the order of the remaining elements.
 Longest: Out of all common subsequences, we are interested in the longest one.

Dynamic Programming Approach to LCS


The LCS problem can be solved efficiently using Dynamic Programming (DP) by
breaking it into smaller overlapping subproblems. The idea is to build a 2D DP table,
where each cell in the table represents the length of the LCS for a substring of X and Y.
Steps to Solve Using Dynamic Programming
1. Define the State
Let dp[i][j] represent the length of the LCS of the first i characters of X and the
first j characters of Y.
2. State Transition
The recurrence relation for the DP table is:
o If X[i-1] == Y[j-1], then dp[i][j] = dp[i-1][j-1] + 1 (add 1 to the LCS found
up to i-1 and j-1).
o Otherwise, dp[i][j] = max(dp[i-1][j], dp[i][j-1]) (take the maximum LCS
possible by excluding either X[i-1] or Y[j-1]).
3. Base Case
o dp[0][j] = 0 for all j (LCS with an empty X is 0).
o dp[i][0] = 0 for all i (LCS with an empty Y is 0).
4. Reconstruct the LCS
After filling the DP table, the length of the LCS is dp[m][n], where m is the
length of X and n is the length of Y. To find the actual subsequence, we trace
back through the table.

Algorithm
1. Dynamic Programming for LCS Length
function LCS(X, Y):
    m = length of X
    n = length of Y
    dp = 2D array of size (m+1) x (n+1)

    for i from 0 to m:
        dp[i][0] = 0        // Base case: LCS of X with empty Y

    for j from 0 to n:
        dp[0][j] = 0        // Base case: LCS of Y with empty X

    for i from 1 to m:
        for j from 1 to n:
            if X[i-1] == Y[j-1]:
                dp[i][j] = dp[i-1][j-1] + 1                // Characters match, extend the LCS
            else:
                dp[i][j] = max(dp[i-1][j], dp[i][j-1])     // Max of excluding either character
    return dp[m][n]         // The length of the LCS is in dp[m][n]
2. Reconstructing the LCS
function ReconstructLCS(X, Y, dp):
    i = length of X
    j = length of Y
    LCS_string = ""

    while i > 0 and j > 0:
        if X[i-1] == Y[j-1]:
            LCS_string = X[i-1] + LCS_string
            i -= 1
            j -= 1
        else if dp[i-1][j] >= dp[i][j-1]:
            i -= 1
        else:
            j -= 1

    return LCS_string

Example
Let's solve the LCS problem for two strings:
 X = "AGGTAB"
 Y = "GXTXAYB"
Step 1: Initialize the DP Table
We initialize a DP table dp of dimensions 7 x 8 (since X has length 6 and Y has length 7), where dp[i][j] stores the length of the LCS of the first i characters of X and the first j characters of Y.
Step 2: Fill the DP Table
We fill the DP table according to the recurrence relation (row 0 and column 0 are all zeros and are omitted):

       G  X  T  X  A  Y  B
    A  0  0  0  0  1  1  1
    G  1  1  1  1  1  1  1
    G  1  1  1  1  1  1  1
    T  1  1  2  2  2  2  2
    A  1  1  2  2  3  3  3
    B  1  1  2  2  3  3  4

Step 3: Reconstruct the LCS
Starting from dp[6][7], we trace back to find the common subsequence:
 B matches.
 A matches.
 T matches.
 G matches.
Thus, the LCS is "GTAB".
Step 4: Result
The length of the LCS is 4 and the LCS itself is "GTAB".

Time and Space Complexity


 Time Complexity: O(m × n), where m and n are the lengths of the two input strings, since we fill a 2D table of size (m+1) x (n+1).
 Space Complexity: O(m × n) for storing the DP table. If we only need the length of the LCS, the space can be reduced to O(min(m, n)) by keeping two rows.
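
Both the table-filling and the trace-back combine into a short runnable Python function:

# Example in Python (illustrative sketch)
def lcs(X, Y):
    m, n = len(X), len(Y)
    dp = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            if X[i - 1] == Y[j - 1]:
                dp[i][j] = dp[i - 1][j - 1] + 1
            else:
                dp[i][j] = max(dp[i - 1][j], dp[i][j - 1])
    # Trace back from dp[m][n] to reconstruct the subsequence.
    out, i, j = [], m, n
    while i > 0 and j > 0:
        if X[i - 1] == Y[j - 1]:
            out.append(X[i - 1])
            i -= 1
            j -= 1
        elif dp[i - 1][j] >= dp[i][j - 1]:
            i -= 1
        else:
            j -= 1
    return dp[m][n], "".join(reversed(out))

print(lcs("AGGTAB", "GXTXAYB"))   # (4, 'GTAB')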

Applications
 Text comparison: For example, comparing documents or version control.
 Bioinformatics: Finding similarities between DNA, RNA, or protein sequences.
 File comparison: Comparing files for differences.
Computational Geometry: One-Dimensional Range Searching
In Computational Geometry, Range Searching refers to the problem of preprocessing a
set of points or objects in such a way that allows for efficient querying of all points that
lie within a given range. In one-dimensional range searching, we are concerned with
finding points that lie within a specified interval on the number line.
Problem Definition
Given a set of n points, P = {p1, p2, ..., pn}, on the real line, a range query asks for all points of P that lie within a given interval [a, b], where a ≤ b.
Example:
Let P = {1, 3, 5, 7, 9}.
For the query [4, 8], the points of P that lie within the range are 5 and 7.

Naive Approach
The simplest method for solving the 1D range searching problem is to check each point individually to see if it lies within the range [a, b].
Algorithm (Naive Approach):

function RangeQueryNaive(P, a, b):
    result = []
    for each point p in P:
        if a <= p <= b:
            result.append(p)
    return result

Time Complexity:
 O(n), where n is the number of points in the set P, since each point is checked individually.
Example:
 P = {1, 3, 5, 7, 9}, query [4, 8]: the points inside the range are 5 and 7.

Efficient Approaches for Range Searching


While the naive approach works, it is not efficient when the set of points P is large. More efficient methods involve sorting the points and using binary search, or data structures like segment trees or interval trees.

1. Binary Search Approach


If the set P is sorted, we can use binary search to efficiently find the range of points that lie within [a, b].
Steps for Binary Search Approach:
1. Sort the points: Sort the points in ascending order (if not already sorted).
2. Binary search for the left bound: Find the index of the smallest point ≥ a.
3. Binary search for the right bound: Find the index of the largest point ≤ b.
4. Return the points: The points between these two indices are the answer.
Algorithm (Binary Search Approach):

function RangeQueryBinarySearch(P, a, b):
    Sort(P)                          // Ensure the points are sorted
    left = BinarySearchLeft(P, a)    // Index of the first point >= a
    right = BinarySearchRight(P, b)  // Index of the last point <= b
    return P[left...right]           // The points within [a, b]

Binary Search Functions:

function BinarySearchLeft(P, a):
    low = 0
    high = length(P) - 1
    while low <= high:
        mid = (low + high) // 2
        if P[mid] < a:
            low = mid + 1
        else:
            high = mid - 1
    return low                       // Index of first point >= a

function BinarySearchRight(P, b):
    low = 0
    high = length(P) - 1
    while low <= high:
        mid = (low + high) // 2
        if P[mid] <= b:
            low = mid + 1
        else:
            high = mid - 1
    return high                      // Index of last point <= b
Time Complexity:
 Sorting: O(n log n) for sorting the points.
 Binary Search: O(log n) for each bound (left and right).
 Overall Time Complexity: O(n log n), dominated by the one-time sorting step; each query then costs O(log n + k).
Example:
 P = {1, 3, 5, 7, 9}, query [4, 8].
 Left bound: 5 (the first point ≥ 4).
 Right bound: 7 (the last point ≤ 8).
 Points inside the range are 5 and 7.
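
In Python, the two binary searches are available directly as bisect_left and bisect_right, so the whole query is a few lines (assuming the points are already sorted):

# Example in Python (illustrative sketch)
import bisect

def range_query(points, a, b):
    lo = bisect.bisect_left(points, a)    # first index with value >= a
    hi = bisect.bisect_right(points, b)   # first index with value > b
    return points[lo:hi]

print(range_query([1, 3, 5, 7, 9], 4, 8))   # [5, 7]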

2. Segment Tree Approach


A Segment Tree is a data structure that allows for efficient range queries on intervals. In
1D range searching, we can build a segment tree to store intervals, enabling efficient
querying.
Segment Tree Construction
The tree stores intervals and supports queries to find all elements in a range.
Steps for Segment Tree:
1. Build the Segment Tree: The tree is built such that each node stores information
about a range of points.
2. Query the Range: Traverse the tree to find all points that lie within the range [a, b].
Time Complexity:
 Building the Segment Tree: O(n).
 Range Query: O(log n + k), where k is the number of points in the range.
Example:
 P = {1, 3, 5, 7, 9}, query [4, 8]: the segment tree quickly returns the points 5 and 7.

3. Interval Tree Approach


An Interval Tree is a balanced binary search tree where each node stores an interval.
This structure is optimized for searching all intervals that overlap with a given query
interval.
Steps for Interval Tree:
1. Build the Tree: Each node stores an interval and the maximum value in its
subtree.
2. Range Query: For a given query, the tree is traversed to find all intervals that
overlap with the query range.
Time Complexity:
 Building the Interval Tree: O(n log n).
 Query: O(log n + k), where k is the number of intervals that overlap the query range.

Comparison of Approaches
Approach         Time Complexity (Preprocessing)    Time Complexity (Query)
Naive            O(1) (no preprocessing)            O(n)
Binary Search    O(n log n)                         O(log n + k)
Segment Tree     O(n)                               O(log n + k)
Interval Tree    O(n log n)                         O(log n + k)

Applications of One-Dimensional Range Searching


 Searching for values in sorted data: Finding all points within a specified range in
financial or sensor data.
 Database indexing: Efficiently querying ranges of values in a database.
 Geospatial data analysis: For 1D coordinates or projections of 2D data onto a
line.

Two-Dimensional Range Searching


In Two-Dimensional Range Searching, the goal is to efficiently find all points lying within a rectangular region in a 2D plane. Given a set of n points in the plane, we answer queries of the form:
"Which points from the set lie inside the rectangle defined by two opposite corners?"
Problem Definition
Given:
 A set of points P = {p1, p2, ..., pn}, where each point pi has coordinates (xi, yi).
 A query rectangle defined by two opposite corners (x1, y1) and (x2, y2), where x1 ≤ x2 and y1 ≤ y2.
Task: Find all points pi = (xi, yi) such that:
 x1 ≤ xi ≤ x2
 y1 ≤ yi ≤ y2
Example
Given the set of points P = {(1, 3), (3, 4), (5, 7), (7, 8), (2, 6)} and a query rectangle defined by (x1, y1) = (2, 4) and (x2, y2) = (6, 8), we need to find all the points inside the rectangle.
 The points that satisfy the condition are:
o (3, 4)
o (5, 7)
o (2, 6)
Thus, the result is {(3, 4), (5, 7), (2, 6)}.

Naive Approach
The simplest approach to solve this problem is to check each point individually to see if
it lies within the query rectangle.
Algorithm (Naive):

function RangeQueryNaive(P, x1, y1, x2, y2):
    result = []
    for each point p in P:
        if x1 <= p.x <= x2 and y1 <= p.y <= y2:
            result.append(p)
    return result

Time Complexity:
 O(n), where n is the number of points in P. Each point is checked individually, which is inefficient for large datasets.
Example:
Given P = {(1, 3), (3, 4), (5, 7), (7, 8), (2, 6)} and the query rectangle (2, 4) to (6, 8), the points inside the rectangle are (3, 4), (5, 7), (2, 6).

Efficient Approaches
For large sets of points, the naive approach becomes inefficient. More efficient methods
for two-dimensional range searching include:
1. Sorting + Binary Search
2. Range Tree
3. k-d Tree
Let’s explore each method.

1. Sorting + Binary Search Approach


In the sorting approach, we first sort the points by their x-coordinates, then scan the points whose x-coordinate falls in the query's x-range and check each one's y-coordinate.
Steps:
1. Sort the points by their x-coordinates.
2. For each point pi in the x-range, check whether its y-coordinate lies within the query's y-range.
Algorithm:

function RangeQuerySortBinary(P, x1, y1, x2, y2):
    Sort P by x-coordinate
    result = []
    for each point p in P with x1 <= p.x <= x2:
        if y1 <= p.y <= y2:
            result.append(p)
    return result

Time Complexity:
 Sorting: O(n log n) for sorting the points by x-coordinate.
 Querying: O(n) in the worst case, since each candidate point's y-coordinate is still checked; binary search only locates the boundaries of the x-range faster.
 Total Time Complexity: O(n log n).

2. Range Tree
A Range Tree is a balanced binary search tree built for multidimensional data. For two-dimensional range searching, we build a tree on the x-coordinates and store, at each node, a secondary balanced tree sorted by the y-coordinates.
Steps:
1. Construct a 2D Range Tree:
o Build a balanced binary search tree (BST) on the x-coordinates.
o At each node, store another BST of the y-coordinates of the points in its subtree.
2. Query the Range Tree:
o Perform a range query on the x-coordinates using the main tree (BST).
o For each node in the query range, perform a range query on its secondary tree (y-coordinates).
Algorithm:

function RangeQueryRangeTree(RangeTree, x1, y1, x2, y2):
    result = []
    Perform a range query on the x-coordinate tree from x1 to x2
    for each subtree in the range:
        Perform a range query on its y-coordinate tree from y1 to y2
        Add the points found to result
    return result

Time Complexity:
 Building the Range Tree: O(n log n).
 Querying: O(log n + k), where k is the number of reported points. (The basic structure answers in O(log² n + k), since O(log n) secondary trees each cost O(log n + k); fractional cascading brings this down to O(log n + k).)
Example:
For the points P = {(1, 3), (3, 4), (5, 7), (7, 8), (2, 6)} and the query (2, 4) to (6, 8), the range tree efficiently returns the points inside the rectangle.

3. k-d Tree
A k-d Tree (k-dimensional tree) is a binary tree in which every non-leaf node splits the space into two parts using a hyperplane perpendicular to one of the axes. For 2D range searching, the tree alternates between splitting by the x-axis and the y-axis at each level.
Steps:
1. Build a k-d Tree:
o Construct a balanced tree where each node stores a point, alternating between x- and y-coordinates for splitting.
2. Query the k-d Tree:
o To answer a range query, traverse the tree, checking at each level whether the query rectangle intersects the splitting plane.
Algorithm:

function RangeQueryKDTree(kdTree, x1, y1, x2, y2):
    result = []
    Traverse the tree recursively:
        if the node's region overlaps the query range, explore its children
        if the node's point lies within the range, add it to the result
    return result

Time Complexity:
 Building the k-d Tree: O(n log n).
 Querying: O(log n + k), where k is the number of points in the range, traversing the tree recursively.

Comparison of Approaches
Approach                  Time Complexity (Preprocessing)    Time Complexity (Query)
Naive                     O(1) (no preprocessing)            O(n)
Sorting + Binary Search   O(n log n)                         O(n)
Range Tree                O(n log n)                         O(log n + k)
k-d Tree                  O(n log n)                         O(log n + k)

Applications of Two-Dimensional Range Searching


 Geographical data analysis: Querying locations within a specific rectangular
region on a map.
 Database indexing: For spatial databases that store multidimensional data.
 Computer graphics: Performing range queries on 2D point clouds for
intersection tests, collision detection, etc.
Constructing a Priority Search Tree (PST)
A Priority Search Tree (PST) is a data structure used to efficiently handle range queries
in two-dimensional space. It supports searching for points within a range defined by two
perpendicular intervals, often used for applications where we need to search for
rectangles or intervals in 2D space.
Problem Definition
We are given a set of points P = {(x1, y1), (x2, y2), ..., (xn, yn)} and a query range defined by two intervals:
 x-range: [x1, x2]
 y-range: [y1, y2]
We need to find all points (x, y) such that:
 x1 ≤ x ≤ x2
 y1 ≤ y ≤ y2
Steps to Construct a Priority Search Tree
The Priority Search Tree is constructed by combining a binary search tree (BST) and a
secondary data structure. The tree helps efficiently locate the points within the given
range.
Steps for Constructing a Priority Search Tree (PST):
1. Sort the Points:
o Sort the points primarily by their x-coordinates (this will form the structure of the binary search tree).
o Secondary sorting is done by their y-coordinates.
2. Building the Tree:
o Construct a binary search tree (BST) with points sorted by their x-coordinate.
o Each node stores the point (x, y); the left and right subtrees contain points with smaller and larger x-coordinates, respectively.
o For each node, store a secondary sorted list of the y-coordinates of all points in its subtree.
Algorithm for Building PST:

function BuildPST(points):
    Sort points by x-coordinate
    root = BuildBST(points)   // Build a Binary Search Tree keyed on x-coordinates
    for each node in the tree:
        Sort the y-coordinates of the points in the subtree rooted at that node
    return root

Time Complexity:
 Sorting the points by their x-coordinates: O(n log n).
 Constructing the BST: O(n log n) in the worst case.
 Sorting the y-coordinates for each subtree: O(n log n).
 Overall Complexity: O(n log n).
Searching a Priority Search Tree (PST)
Once the PST is built, we can search for points within a given rectangular query range.
Steps for Searching in a PST:
1. Query Range: Given a query rectangle defined by [x1, x2] and [y1, y2], we perform a range query.
2. BST Search: Traverse the tree to find all nodes whose x-coordinate lies within the range [x1, x2].
3. Range Query on y-coordinates: For each node found in the range, use the secondary sorted list of y-coordinates to check whether each y-coordinate lies within [y1, y2].
4. Output the Results: Collect the points that satisfy both conditions.
Algorithm for Searching in PST:

function SearchPST(root, x1, x2, y1, y2):
    result = []
    if root is null:
        return result
    if x1 <= root.x <= x2:
        // Report the y-coordinates within range
        for each point (x, y) in root's y-coordinate list:
            if y1 <= y <= y2:
                result.append((x, y))
    if root.x > x1:
        result += SearchPST(root.left, x1, x2, y1, y2)    // Search left subtree
    if root.x < x2:
        result += SearchPST(root.right, x1, x2, y1, y2)   // Search right subtree
    return result

Time Complexity:
 Query Complexity:
o Searching through the BST: O(log n) in the average case.
o For each node in the range, scanning the secondary list of y-coordinates: O(log n + k), where k is the number of points found.
 Overall Complexity: O(log n + k).

Priority Range Trees


A Priority Range Tree is an extension of the Priority Search Tree, augmented to answer two-dimensional range queries efficiently. It finds all points in a range that are inside a given rectangle or interval, but in an optimized way.
Steps to Construct a Priority Range Tree:
1. Construct a Range Tree:
o Build a range tree on the x-coordinates (as in a 1D range tree).
o At each node, store the points sorted by their y-values.
2. Querying:
o Perform a range query on the x-coordinates first.
o Then use the corresponding y-ranges from the secondary trees to answer the query efficiently for the y-coordinates.
Time Complexity:
 Building the Tree: O(n log n) for the range tree construction.
 Querying: O(log n + k), where k is the number of points in the range.

Quadtrees
A Quadtree is a data structure used to partition 2D space into smaller regions, making it
ideal for spatial indexing and handling 2D range queries. It is particularly useful for
applications like image processing and geographical databases.
Steps for Constructing a Quadtree:
1. Divide the 2D Space:
o Start by dividing the 2D space into four quadrants. Each quadrant can be
subdivided further into four quadrants recursively until a stopping
condition (e.g., a maximum number of points per quadrant or a minimum
quadrant size) is met.
2. Insert Points:
o Points are inserted into the appropriate quadrant based on their spatial
location.
3. Query:
o A range query is performed by checking the quadrants that overlap with
the query region.
Algorithm for Building a Quadtree:

function BuildQuadtree(points, region):
    if region is sufficiently small or points are few:
        create a leaf node with all points in the region
    else:
        divide the region into four quadrants
        recursively build the tree for each quadrant
    return quadtree

Algorithm for Searching in a Quadtree:

function RangeQueryQuadtree(root, queryRegion):
    result = []
    if root is a leaf node:
        for each point in root:
            if point is inside queryRegion:
                result.append(point)
    else:
        for each child quadrant of root:
            if queryRegion overlaps with quadrant:
                result += RangeQueryQuadtree(child, queryRegion)
    return result
Time Complexity:
 Building the Tree: O(n), where n is the number of points.
 Querying:
o Best Case: O(log n), when the query region covers a small part of the space.
o Worst Case: O(n), when the query region covers the entire space.
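
A runnable Python sketch of a point quadtree; the leaf capacity of 4 and the class layout are illustrative choices:

# Example in Python (illustrative sketch)
class Quadtree:
    def __init__(self, x0, y0, x1, y1, capacity=4):
        self.bounds = (x0, y0, x1, y1)
        self.capacity = capacity     # max points in a leaf before it subdivides
        self.points = []
        self.children = None

    def insert(self, p):
        x0, y0, x1, y1 = self.bounds
        if not (x0 <= p[0] <= x1 and y0 <= p[1] <= y1):
            return False             # point lies outside this quadrant
        if self.children is None:
            if len(self.points) < self.capacity:
                self.points.append(p)
                return True
            self._subdivide()
        return any(c.insert(p) for c in self.children)

    def _subdivide(self):
        x0, y0, x1, y1 = self.bounds
        mx, my = (x0 + x1) / 2, (y0 + y1) / 2
        self.children = [Quadtree(x0, y0, mx, my, self.capacity),
                         Quadtree(mx, y0, x1, my, self.capacity),
                         Quadtree(x0, my, mx, y1, self.capacity),
                         Quadtree(mx, my, x1, y1, self.capacity)]
        for q in self.points:        # push the stored points down one level
            any(c.insert(q) for c in self.children)
        self.points = []

    def query(self, qx0, qy0, qx1, qy1, out=None):
        if out is None:
            out = []
        x0, y0, x1, y1 = self.bounds
        if qx1 < x0 or x1 < qx0 or qy1 < y0 or y1 < qy0:
            return out               # query region misses this quadrant entirely
        if self.children is None:
            out.extend(p for p in self.points
                       if qx0 <= p[0] <= qx1 and qy0 <= p[1] <= qy1)
        else:
            for c in self.children:
                c.query(qx0, qy0, qx1, qy1, out)
        return out

qt = Quadtree(0, 0, 10, 10)
for p in [(1, 3), (3, 4), (5, 7), (7, 8), (2, 6)]:
    qt.insert(p)
print(qt.query(2, 4, 6, 8))   # [(3, 4), (5, 7), (2, 6)]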

Comparison of Data Structures


Data Structure          Construction Time    Query Time           Best Use Case
Priority Search Tree    O(n log n)           O(log n + k)         Efficient range queries in 2D space
Priority Range Tree     O(n log n)           O(log n + k)         Extended version of PST with more efficient range queries
Quadtree                O(n)                 O(log n) to O(n)     Spatial indexing and geographical databases

Applications of These Structures:


 Priority Search Tree: Efficient for searching points within rectangular ranges in
2D space (e.g., in geographical data).
 Priority Range Tree: Enhanced version of PST for more complex range queries
in 2D.
 Quadtree: Used in image processing, geographical mapping, and spatial
indexing.

k-D Trees: Overview, Construction, and Algorithms


A k-D Tree (short for k-dimensional tree) is a binary search tree that partitions a k-dimensional space into two parts. It generalizes binary search trees to higher dimensions: while binary search trees are designed for one-dimensional data, k-D trees allow efficient searching, insertion, and deletion for multidimensional data (where k is the number of dimensions).
The k-D tree is especially useful in applications involving range searches, nearest neighbor searches, and other geometric operations in multi-dimensional spaces. It is a popular data structure in computational geometry and machine learning, for example in spatial indexing and k-nearest neighbor algorithms.
k-D Tree Construction
A k-D tree is built by recursively dividing the space along the dimensions in a cyclic
manner. In the simplest case, the first dimension is used to split the points into two
halves, then the second dimension is used for the next split, and so on. After each split,
the data is divided into two regions (left and right subtrees), and the tree continues
recursively.
Steps for Constructing a k-D Tree
1. Choose the splitting dimension:
o Start by choosing the dimension used to split the space. If the space has k dimensions, the first split uses the first dimension, the second split uses the second dimension, and so on. After k dimensions, the cycle repeats.
2. Sort the points:
o Sort the points based on the value of the current splitting dimension (the
dimension selected for the current level of the tree).
3. Split the points:
o Choose the median point in the sorted list as the root node. The median
ensures a balanced tree and minimizes the depth of the tree. The points to
the left of the median form the left subtree, and the points to the right
form the right subtree.
4. Recursively construct the left and right subtrees:
o Recursively repeat the process for each subtree, alternating the splitting
dimension at each level of the tree.
Example: Constructing a 2D Tree (k = 2)
Consider the set of 2D points:
P = {(3, 6), (17, 15), (13, 15), (6, 12), (9, 1), (2, 7), (10, 19)}
Step-by-Step Construction:
1. First Split (Dimension 1: x-coordinate):
o Sort the points by x-coordinate: (2, 7), (3, 6), (6, 12), (9, 1), (10, 19), (13, 15), (17, 15).
o Choose the median (the 4th of 7 points) as the root: (9, 1).
o The left subtree contains the points with smaller x: (2, 7), (3, 6), (6, 12).
o The right subtree contains the points with larger x: (10, 19), (13, 15), (17, 15).
2. Second Split (Dimension 2: y-coordinate):
o For the left subtree, sort by y-coordinate: (3, 6), (2, 7), (6, 12). The median (2, 7) becomes the left child of the root; (3, 6) and (6, 12) become its left and right children.
o For the right subtree, sort by y-coordinate: (13, 15), (17, 15), (10, 19). The median (17, 15) becomes the right child of the root; (13, 15) and (10, 19) become its left and right children.
Final Tree Structure:

              (9, 1)
             /      \
        (2, 7)      (17, 15)
        /    \       /     \
    (3, 6) (6, 12) (13, 15) (10, 19)
Time Complexity for Construction
 With pre-sorting or linear-time median selection, each level processes all n points in O(n) time.
 There are O(log n) levels (since the tree is balanced, and we split the data in half at each level).
 Thus, the overall time complexity to build the tree is:
o Time Complexity: O(n log n). (Re-sorting the points at every level instead would cost O(n log² n).)

Searching in a k-D Tree


1. Range Search:
A range search query finds all the points within a given rectangular or cuboidal region
defined by lower and upper bounds in each dimension.
 The algorithm traverses the tree and for each node:
o If the current node’s region is entirely within the query range, all points
in that subtree are added to the result.
o If the current node’s region partially overlaps with the query range, the
algorithm recursively searches both left and right subtrees.
o If the current node’s region is outside the query range, the subtree is not
explored.
Algorithm for Range Search in k-D Tree:

function RangeSearch(root, range):
    result = []
    if root is null:
        return result
    if root's region is completely inside the query range:
        report all points in the subtree rooted at root
    else if root's region overlaps with the query range:
        if root.point lies within the range:
            result.append(root.point)
        if the left subtree's region intersects the range:
            result += RangeSearch(root.left, range)
        if the right subtree's region intersects the range:
            result += RangeSearch(root.right, range)
    return result
 Time Complexity: In the worst case, the algorithm may need to explore all points in the tree, so the cost can be O(n). If the query range is small and well-contained, the cost drops to O(log n + k), where k is the number of points in the range.
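
A runnable Python sketch of construction plus range search for k = 2. The dict-based node layout is an illustrative choice; the pruning tests rely on the median-split invariant that left-subtree coordinates are ≤ the node's coordinate and right-subtree coordinates are ≥ it.

# Example in Python (illustrative sketch)
def build_kdtree(points, depth=0):
    # points: a list of (x, y) tuples; the splitting axis alternates with depth
    if not points:
        return None
    axis = depth % 2
    points = sorted(points, key=lambda p: p[axis])
    mid = len(points) // 2                       # the median keeps the tree balanced
    return {"point": points[mid],
            "left": build_kdtree(points[:mid], depth + 1),
            "right": build_kdtree(points[mid + 1:], depth + 1)}

def range_search(node, lo, hi, depth=0, out=None):
    # lo, hi: the (x1, y1) and (x2, y2) corners of the query rectangle
    if out is None:
        out = []
    if node is None:
        return out
    x, y = node["point"]
    if lo[0] <= x <= hi[0] and lo[1] <= y <= hi[1]:
        out.append(node["point"])
    axis = depth % 2
    if lo[axis] <= node["point"][axis]:          # left side may still contain hits
        range_search(node["left"], lo, hi, depth + 1, out)
    if node["point"][axis] <= hi[axis]:          # right side may still contain hits
        range_search(node["right"], lo, hi, depth + 1, out)
    return out

tree = build_kdtree([(1, 3), (3, 4), (5, 7), (7, 8), (2, 6)])
print(range_search(tree, (2, 4), (6, 8)))   # [(3, 4), (2, 6), (5, 7)]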
2. Nearest Neighbor Search:
The nearest neighbor search finds the point closest to a given query point.
 The search starts by finding the closest point along the tree’s splitting dimension.
It then recursively explores the nearest subtree, and if necessary, checks the
other subtree for potential closer points.
Algorithm for Nearest Neighbor Search in k-D Tree:

function NearestNeighbor(root, query_point):
    best = root.point
    best_distance = distance(best, query_point)

    // Visit a subtree only if it could contain a closer point, i.e. the distance
    // from query_point to the splitting plane is less than best_distance.
    if root.left is not null and root.left could contain a closer point:
        candidate = NearestNeighbor(root.left, query_point)
        if distance(candidate, query_point) < best_distance:
            best = candidate
            best_distance = distance(best, query_point)

    if root.right is not null and root.right could contain a closer point:
        candidate = NearestNeighbor(root.right, query_point)
        if distance(candidate, query_point) < best_distance:
            best = candidate
            best_distance = distance(best, query_point)

    return best
 Time Complexity: Nearest neighbor search takes O(log n) on average, but can degrade to O(n) in the worst case (e.g., if the tree is unbalanced).

Applications of k-D Trees


1. Geometric and Spatial Queries: k-D trees are ideal for 2D, 3D, and higher-
dimensional spatial queries like finding the nearest neighbor or performing
range queries.
2. Computer Graphics: Used in applications such as ray tracing, collision detection,
and visibility determination.
3. Machine Learning: Used for k-nearest neighbor (k-NN) algorithms and
classification tasks.
4. Robotics: k-D trees are used in pathfinding and motion planning.
Summary of k-D Tree Characteristics
Operation           Time Complexity (Best Case)    Time Complexity (Worst Case)
Construction        O(n log n)                     O(n log n)
Range Search        O(log n + k)                   O(n)
Nearest Neighbor    O(log n)                       O(n)
