Data Structures and Algorithms

Uploaded by

masabaian332
Copyright
© © All Rights Reserved
Data structures and Algorithms

Topics Covered

● Basic Data Structures
   ○ Arrays
   ○ Stacks
   ○ Queues

● Core Algorithms
   ○ Two Pointer Technique
   ○ Searching (Linear & Binary)
   ○ Sorting (Bubble, Selection)

● Problem-Solving Patterns
   ○ Sliding window (for max sum of subarray)
   ○ Hashing (for counting or checking duplicates)
   ○ Prefix sums (for range queries)

Learning Objectives
●​ Develop algorithmic thinking and problem-solving skills.

●​ Understand time and space complexity to write efficient code.

●​ Apply data structures and algorithms to real-world problems.

●​ Improve competitive programming skills by recognizing common patterns.


1.​Data Structures
1.1.​What Are Data Structures?
A data structure is a way of organizing and storing data efficiently in a program so that
operations like searching, inserting, and deleting are optimized.

There are two main types of data structures:

1.​ Linear Data Structures: Data elements are arranged in a sequence (e.g., Arrays,
Linked Lists, Stacks, Queues). Think of a row of books on a shelf: The books are placed
one after another in a straight line. You can easily go to the next or previous book.
This is an example of a linear structure.

2.​ Non-Linear Data Structures: Data is stored in hierarchical or interconnected ways


(e.g., Trees, Graphs). Think of a family tree: A family has parents, children, and
grandparents. It doesn’t follow a straight line like a list; instead, it branches out. This is
an example of a non-linear structure.
1.2.​Key Data Structures for Beginners
1.2.1.​Arrays

An array is a collection of elements stored contiguously in memory. Contiguous means that
all elements sit in a single, unbroken block of memory, so reaching the next element simply
requires moving to the next memory address. It's like having numbered seats in a bus: you
can find a person's seat by its number.

They support fast access of elements using an index.

Common Uses:

●​ Storing fixed-size collections of numbers or characters.


●​ Used in sorting and searching algorithms.
What Is Traversal?

Traversal means going through each item in a list, array, or string one-by-one to check
something, count values, or perform operations. It’s like checking each student's name on
a class register, moving from the first to the last. This is something we’re going to do very
often in challenges.

This skill shows up in almost every beginner coding problem. For example:

Example Problem (Finding Maximum Element in an Array):

arr = [3, 7, 2, 9, 1]
print(max(arr)) # Output: 9 # Traverses the array internally to find the biggest number
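
To make the traversal visible, here is a minimal sketch of what max() does conceptually: visit every element once and remember the largest one seen so far. The function name find_max is only illustrative (it is not part of the original notes), and the sketch assumes the list is not empty.

```python
def find_max(arr):
    # Assumption: arr is non-empty, so the first element is a valid starting point.
    largest = arr[0]
    for value in arr[1:]:  # Traverse the remaining elements one by one
        if value > largest:
            largest = value  # Remember the biggest value seen so far
    return largest

print(find_max([3, 7, 2, 9, 1]))  # Output: 9
```

Every element is looked at exactly once, so the traversal takes O(n) time.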

How to Recognize the Need for Arrays

Problem Clue | Use Array?
"You are given n numbers..." | ✅ Yes
"Find max in a list of items" | ✅ Yes
"Access elements by position/index" | ✅ Yes
"Perform sorting or searching" | ✅ Yes
"Fixed size collection of values" | ✅ Yes
1.2.2.​Strings

A string is a sequence of characters stored like an array. These are the same strings we
looked at earlier.

They are used in pattern matching, hashing, and text processing.

Example Problem (Checking if a Word Is a Palindrome):

def is_palindrome(string):
    return string == string[::-1]  # Reverse and compare

print(is_palindrome("racecar"))  # Output: True

How to Recognize the Need for Strings

Problem Clue | Use String?
"Given a word or sentence..." | ✅ Yes
"Check if a word is a palindrome" | ✅ Yes
"Count vowels or consonants in text" | ✅ Yes
"Extract substrings or characters" | ✅ Yes
"Match or find patterns in text" | ✅ Yes


1.2.3.​Stacks

A stack is a data structure where the last item added to the stack is the first one to be
removed. It follows the LIFO (Last In, First Out) principle. Think of it like picking Pringles out
of a Pringles tin: the last ones put in are on top, so they are the first to be eaten.

A good example is the Undo/Redo operations in text editors like Word or Google Docs.
They always act on the most recent change first.

●​ Two Main Operations:


○​ Push: Add an item to the top of the stack.
○​ Pop: Remove the top item from the stack.
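
As a small sketch of these two operations, a plain Python list can play the role of a stack: append() pushes onto the top and pop() removes from the top. The values "a", "b" and "c" are made up for illustration.

```python
stack = []            # An empty Python list used as a stack

stack.append("a")     # Push "a"
stack.append("b")     # Push "b"
stack.append("c")     # Push "c" (now on top)

top = stack.pop()     # Pop removes and returns the top item
print(top)            # Output: c
print(stack)          # Output: ['a', 'b']
```

The last item pushed ("c") is the first one to come out, which is exactly the LIFO behavior.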

Common Uses:

●​ Balancing parentheses in expressions.


●​ Implementing recursive function calls.

Example Problem: Check if Parentheses are Valid


Problem Statement: Given a string of brackets "(())()", determine if they are properly
matched.

"""
Solution:
- Use a stack to store open brackets.
- Pop from the stack when a matching close bracket is found.
- If the stack is empty at the end, it's valid.
Code Implementation:
"""
def is_valid_parentheses(s):
    stack = []
    for char in s:
        if char == '(':
            stack.append(char)
        elif char == ')':
            if not stack:
                return False
            stack.pop()
    return len(stack) == 0

print(is_valid_parentheses("(())()"))  # True
print(is_valid_parentheses("(()"))  # False

Let's break down this code step by step:

Understanding the Problem

The goal is to determine whether parentheses in a string are properly matched. That
means:

●​ Every opening "(" must have a corresponding closing ")".


● The order should be correct: ")(" is invalid because the closing bracket comes before
its opening bracket, and "(()" is invalid because an open bracket remains unmatched.

Solution Approach

We use a stack, which is a data structure that follows the Last-In-First-Out (LIFO) principle.
Think of it like a stack of plates—you can only remove the top plate first.

Code Breakdown

i. Define the function is_valid_parentheses(s), where:

●​ s is the input string containing parentheses.


●​ stack = [] initializes an empty list to store unmatched "(" brackets.

ii. Loop through each character in the string s:

for char in s:

This iterates through every character one by one.


iii. If char is "(", push it onto the stack:

stack.append(char)

Example:

● Input: "((())"
● Stack after processing the first three characters "(((": ["(", "(", "("]

iv. If char is ")", check for a match:

if not stack:
    return False

● If the stack is empty (meaning there's a closing ")" without a previous "("), return
False.

stack.pop()

● Otherwise, remove the last "(" from the stack because it has been matched.

v. Final check: If the stack is empty, return True, meaning all "(" have been matched with
")":

return len(stack) == 0

●​ Example "(()())" → Stack is empty → ✅ Valid


●​ Example "(()" → Stack has "(" left → ❌ Invalid

Example Runs

print(is_valid_parentheses("(())()"))  # True (properly matched)
print(is_valid_parentheses("(()"))  # False (one "(" remains unmatched)

This implementation efficiently checks validity in linear time O(n)!
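
The same stack idea can be stretched to several bracket types. The sketch below is an illustrative variation rather than part of the original problem: the name is_balanced and the PAIRS lookup table are assumptions introduced here.

```python
# Hypothetical lookup table: each closing bracket maps to its opening partner.
PAIRS = {")": "(", "]": "[", "}": "{"}

def is_balanced(s):
    stack = []
    for char in s:
        if char in PAIRS.values():      # Opening bracket: push it
            stack.append(char)
        elif char in PAIRS:             # Closing bracket: must match the top
            if not stack or stack.pop() != PAIRS[char]:
                return False
    return not stack                    # Valid only if nothing is left open

print(is_balanced("{[()]}"))  # Output: True
print(is_balanced("([)]"))    # Output: False
```

The second call fails because ")" arrives while "[" is on top of the stack, so the brackets are closed in the wrong order.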


How to Recognize the Need for Stacks

Problem Clue | Use Stack?
"Check matching open/close brackets" | ✅ Yes
"Undo or redo last operation" | ✅ Yes
"Reverse order of items" | ✅ Yes
"Backtracking or recursive function calls" | ✅ Yes
"Process last entered item first" | ✅ Yes


1.2.4.​Queue

A queue is a fundamental data structure that follows the First In, First Out (FIFO) principle.
Think of a queue like a line at a supermarket checkout—the first person to arrive is the first
to be served and leave, while new arrivals join the end of the line.

How It Works in Programming

Queues are used when maintaining the order of elements is important. Each item gets
processed in the same order it was added.

Key Operations in a Queue:

●​ Enqueue: Adding an item to the back of the queue.


●​ Dequeue: Removing an item from the front of the queue.
●​ Peek: Viewing the item at the front without removing it.
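
Here is a short sketch of the three operations using collections.deque from Python's standard library. The task names are made up for illustration.

```python
from collections import deque

queue = deque()

queue.append("Task1")    # Enqueue: add to the back
queue.append("Task2")
queue.append("Task3")

print(queue[0])          # Peek: look at the front without removing it -> Task1
first = queue.popleft()  # Dequeue: remove from the front
print(first)             # Output: Task1
print(list(queue))       # Output: ['Task2', 'Task3']
```

"Task1" went in first and came out first, which is the FIFO behavior described above.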

How Queues Fit as a Data Structure

Queues are particularly useful when handling tasks that require order preservation, such as:

●​ Task Scheduling: Like managing background jobs in an operating system.


●​ Handling Events: Like processing print jobs in a printer queue.

Example Problem (Simulating a Task Queue)

from collections import deque

queue = deque(["Task1", "Task2", "Task3"])


queue.popleft() # Removes the first task
print(queue) # Output: deque(['Task2', 'Task3'])
Understanding the Code

i. Importing deque

from collections import deque

This allows us to use double-ended queues, which support fast append and pop operations
from both ends.

ii. Creating a Queue

queue = deque(["Task1", "Task2", "Task3"])

●​ This initializes a FIFO (First In, First Out) queue with "Task1", "Task2", and
"Task3".

iii. Removing the First Element

queue.popleft()

●​ .popleft() removes "Task1" from the front of the queue.


●​ Unlike .pop(), which removes from the end, .popleft() maintains queue behavior.

iv. Printing the Updated Queue

print(queue) # Output: deque(['Task2', 'Task3'])

●​ After removal, "Task2" is now at the front.

Why Use deque Instead of Lists?

●​ Efficient Removals: .popleft() is O(1), whereas .pop(0) on a list is O(n).


●​ Fast Insertions: .appendleft() allows inserting at the beginning efficiently.
How to Recognize the Need for Queues

Problem Clue | Use Queue?
"Process items in the order they arrive" | ✅ Yes
"Simulate a line or waiting list" | ✅ Yes
"Handle task scheduling or print jobs" | ✅ Yes
"Stream data processing" | ✅ Yes


1.3.​SUMMARY

The core strategy behind competitive programming is that you choose the right tool for the
job depending on how the input is shaped and what the problem wants you to do.

Here’s how input affects your data structure choice:


A.​ Input is a fixed-size list of numbers
Use: Array
You can access by index, sort it, search through it
Good for traversal, sliding window, or basic math problems

B.​ Input comes as a stream or sequence (like tasks or steps)


Use: Queue
Helps when you need to process elements in order received
Common in simulations (e.g., print queue, task scheduler)

C.​ Input requires “undo” or “reverse order”


Use: Stack
Last item added is the first removed — perfect for bracket checking, backtracking, or undo
operations

D.​ Input includes key–value pairs or duplicate checking


Use: Dictionary / Hash Map
Fast lookup and counting
Great for problems that ask: “How many times does X appear?”
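
As a minimal sketch of case D, a dictionary can count how many times each item appears. The function name count_occurrences and the sample letters are assumptions made for this illustration.

```python
def count_occurrences(items):
    counts = {}
    for item in items:
        # get() returns 0 the first time an item is seen
        counts[item] = counts.get(item, 0) + 1
    return counts

counts = count_occurrences(["a", "b", "a", "c", "a"])
print(counts)        # Output: {'a': 3, 'b': 1, 'c': 1}
print(counts["a"])   # Output: 3
```

Each lookup and update takes constant time on average, so counting the whole list is O(n).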
2.​Algorithms
2.1.​What is an Algorithm?
In computing, an algorithm is a finite sequence of well-defined steps that a machine follows
to achieve a particular goal.

Since there are many types of algorithms, we usually attach another word to “algorithm” to
indicate exactly what goal the algorithm accomplishes.
For example:
●​ If the algorithm is helping us search for particular information in a big set of data,
then we call it a “search algorithm”.
●​ If it is helping us hide sensitive information then it is an “encryption algorithm”

2.2.​Types of Algorithms and Sample Problems


2.2.1.​Two Pointer Technique
This technique involves two pointers that move toward or away from each other to solve
problems efficiently.

A pointer in this case refers to a variable that represents a position within a data structure
like an array.
You can visualise it like this:

Imagine two fingers pointing at positions in a list, moving inward or outward to check values.

-It is commonly used for sorting, searching, and optimization.


Example Problem: Find Two Numbers That Add to a Target
In challenges, you will usually be given a short story, a sample input, and the expected
output for a problem. Let's start with a simple one.

Problem Statement: Given a sorted array of numbers, find two numbers that add up to
10.

"""
Example: [2, 3, 4, 5, 7, 9]
Solution:
- Place one pointer at the start (2) and one at the end (9).
- If the sum is too small, move the left pointer to the right.
- If the sum is too big, move the right pointer to the left.
- Check: 2 + 9 = 11 → too big, so move the right pointer left to 7.
- Check: 2 + 7 = 9 → too small, so move the left pointer right to 3.
- Check: 3 + 7 = 10 → Found the answer!
Code Implementation:
"""
def two_sum(arr, target):
    left, right = 0, len(arr) - 1
    while left < right:
        if arr[left] + arr[right] == target:
            return [arr[left], arr[right]]
        elif arr[left] + arr[right] < target:
            left += 1
        else:
            right -= 1
    return "No solution"

print(two_sum([2, 3, 4, 5, 7, 9], 10))  # Output: [3, 7]

Be very careful with the indentations so that you do not get errors.
What’s happening here?

The function two_sum(arr, target) searches for two numbers in a sorted array that
add up to a given target (in this case, 10). Let's break it down step by step!

Code Explanation

Code Line | Explanation
left, right = 0, len(arr) - 1 | Initialize two pointers at the start and end of the array.
while left < right: | Keep checking as long as the left pointer hasn't crossed right.
if arr[left] + arr[right] == target: | If the sum matches the target, return the pair.
elif arr[left] + arr[right] < target: | If the sum is too small, move the left pointer right to increase the sum.
else: | If the sum is too big, move the right pointer left to decrease the sum.
return "No solution" | If no pair is found, return "No solution".

With a different target, the same loop applies: the pointers keep moving inward until they
find a matching pair or meet in the middle.


When do we use it?

We use it in problems where we need to efficiently process sorted arrays, linked lists, or
specific conditions in strings.

It is typically applied in:

●​ Finding Pairs or Triplets in Sorted Arrays​


Example: Finding two numbers that add up to a given target in a sorted list.​
Why it works: We move pointers towards each other without checking every pair →
time complexity O(N) instead of O(N²).

●​ Checking Palindromes or Reverse Comparisons


A palindrome is a word, phrase, number, or sequence that reads the same forward
and backward. Some common examples include:
- Words: racecar, level, civic, noon, radar
- Numbers: 121, 1331, 12321
- Dates: 02/02/2020 (a rare palindromic date)

Example: Checking whether a string is a palindrome.


Works because: We only need to compare characters from both ends toward the
center.
Exercise 1: Given a string s, check whether it is a palindrome. Output 1 if it is and 0 if it is not.

Example:

Input: s = "abba"
Output: 1
Explanation: s is a palindrome

Input: s = "abc"
Output: 0
Explanation: s is not a palindrome
2.3.​Searching Algorithms
Searching algorithms help locate a specific item in a dataset. We are going to focus on two
algorithms: Linear search and binary search.

2.3.1.​Linear Search (Sequential Search)


Linear search is the simplest searching method. It checks each element one by one until
the target is found or the list ends.

●​ Best for small, unsorted lists but inefficient for large datasets.

How Linear Search Works

Step | Description
1. Start at the first element. | Begin checking from the first item.
2. Compare with target. | Check if current element equals target.
3. If a match is found, return index. | Return position of the target in the list.
4. If there is no match, continue. | Move to the next element and repeat.
5. If the list ends, return "Not Found". | Target not in the list.
Example Code

def linear_search(arr, target):
    for i in range(len(arr)):
        if arr[i] == target:
            return i  # Found at index i
    return "Not Found"

print(linear_search([5, 2, 8, 1, 9], 8))  # Output: 2

What is happening here?


Step-by-step with input arr = [5, 2, 8, 1, 9] and target = 8:

Index | arr[i] (element at index i) | Match? | Action
0 | 5 | No | Continue
1 | 2 | No | Continue
2 | 8 | Yes | Return index 2


2.3.2.​Binary Search
Binary search is an efficient algorithm for finding a target element in a sorted array by
repeatedly dividing the search range in half. It works like looking up a word in a printed
dictionary: to find the word “mango”, you open the dictionary in the middle and then flip
left or right depending on where “mango” falls alphabetically.

Steps of this dictionary Binary Search

1.​ Open the dictionary to the middle page.


2.​ Compare the target word ("mango") with the first word on the page:
○​ If "mango" comes before, flip left.
○​ If "mango" comes after, flip right.
3.​ Repeat the second step until you find the word or eliminate all possibilities

Binary search halves the search range at every step, so it runs in O(log N) time. On a large
sorted array this is much faster than linear search, which takes O(N) time.

Example Problem: Find a Number in a Sorted Array


Problem Statement: Given [1, 2, 4, 7, 9, 12], search for 7.

Step | Middle Element | Action
1 | 4 (index 2) | 7 > 4, ignore left half
2 | 9 (index 4) | 7 < 9, ignore right half
3 | 7 (index 3) | Found 7, return 3


"""Solution:
- Check the middle number (4). Since 7 is bigger, ignore everything before 4.
- New middle is 9. Since 7 is smaller, ignore everything after 9.
- Left with 7 → Found it!
Code Implementation:
"""
def binary_search(arr, target):
    left, right = 0, len(arr) - 1
    while left <= right:
        mid = left + (right - left) // 2
        if arr[mid] == target:
            return mid
        elif arr[mid] < target:
            left = mid + 1
        else:
            right = mid - 1
    return "Not found"

print(binary_search([1, 2, 4, 7, 9, 12], 7))  # Output: 3

What happened here?

Code Line | Explanation
left, right = 0, len(arr) - 1 | Set search range boundaries (start and end).
mid = left + (right - left) // 2 | Find the middle index. This form avoids integer overflow in languages with fixed-size integers; Python integers do not overflow, so (left + right) // 2 gives the same result here.
if arr[mid] == target: | Return index if found.
elif arr[mid] < target: | Search the right half if the target is bigger.
else: | Search the left half if the target is smaller.
return "Not found" | Loop ends if no match found → return "Not found".
When to Use Binary Search

Use When | Reason
Sorted Data | Binary search only works correctly on sorted data.
Large Data Sets | Search space halves at every step (O(log n)).
Direct Access Data | Works well with arrays supporting constant-time access.

When NOT to Use Binary Search

Avoid When | Reason
Unsorted Data | The algorithm won't work correctly.
Small Data Sets | Linear search might be faster.
Dynamic Data | Maintaining sorted order is costly.
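
In everyday Python you rarely need to write the loop by hand: the standard library's bisect module performs the same halving search on a sorted list. The wrapper name binary_search_bisect is an assumption introduced for this sketch; bisect_left itself is a real standard-library function.

```python
from bisect import bisect_left

def binary_search_bisect(arr, target):
    # bisect_left returns the leftmost position where target could be
    # inserted while keeping arr sorted.
    i = bisect_left(arr, target)
    if i < len(arr) and arr[i] == target:
        return i
    return "Not found"

print(binary_search_bisect([1, 2, 4, 7, 9, 12], 7))  # Output: 3
print(binary_search_bisect([1, 2, 4, 7, 9, 12], 5))  # Output: Not found
```

Like the hand-written version, this only gives correct answers when the list is already sorted.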
2.4.​ Sorting Algorithms

Sorting algorithms arrange data in a defined order. The two covered here, Bubble Sort and
Selection Sort, are simple but serve as a foundation for understanding more complex ones.

2.4.1.​Bubble Sort

Bubble Sort works by repeatedly swapping adjacent elements if they are in the wrong
order. It continues this process until the entire list is sorted.

Algorithm Steps:

1.​ Start at the beginning of the list.


2.​ Compare each pair of adjacent elements.
3.​ Swap them if they are in the wrong order.
4.​ Repeat for each element in the list until no swaps are needed.

Example: Sorting [5, 3, 8, 4, 2] from smallest to largest

In bubble sort, a pass is one full round of comparisons and swaps that goes through the
list from left to right. Here is how each pass changes the list:

Pass | Comparisons & Swaps | Resulting List
1st | 5 & 3 → Swap, 5 & 8 → No Swap, 8 & 4 → Swap, 8 & 2 → Swap | [3, 5, 4, 2, 8]
2nd | 3 & 5 → No Swap, 5 & 4 → Swap, 5 & 2 → Swap | [3, 4, 2, 5, 8]
3rd | 3 & 4 → No Swap, 4 & 2 → Swap | [3, 2, 4, 5, 8]
4th | 3 & 2 → Swap | [2, 3, 4, 5, 8]

Now everything is sorted correctly! Bubble sort works by only swapping adjacent elements,
meaning each pass moves the largest remaining element to its final position.

Bubble Sort Python Code:

def bubble_sort(list_a):
    n = len(list_a)
    for i in range(n):
        swapped = False
        for j in range(n - 1 - i):
            if list_a[j] > list_a[j + 1]:  # Swap if elements are in wrong order
                list_a[j], list_a[j + 1] = list_a[j + 1], list_a[j]
                swapped = True
        if not swapped:  # If no swaps, the list is sorted
            break
    return list_a

# Example usage
numbers = [5, 3, 8, 4, 2]
print(bubble_sort(numbers))  # Output: [2, 3, 4, 5, 8]

What is happening here?

Code Section | Explanation
for i in range(n): | Outer loop for passes through the list.
swapped = False | Flag to detect if swaps happened this pass.
for j in range(n - 1 - i): | Inner loop to compare adjacent elements.
if list_a[j] > list_a[j + 1]: | Swap adjacent elements if out of order.
if not swapped: | Stop early if no swaps, meaning the list is sorted.
return list_a | Return the sorted list.

2.4.2.​Selection Sort

Selection Sort works by finding the smallest element and placing it at the beginning of the
list, then repeating the process for the remaining elements.

Algorithm Steps:

1.​ Find the smallest element in the list.


2.​ Swap it with the first element.
3.​ Repeat for the next smallest element until the list is sorted.

Example: Sorting [5, 3, 8, 4, 2]

Selection Sort Python Code:


def selection_sort(arr):
    n = len(arr)
    for i in range(n):
        min_index = i
        for j in range(i + 1, n):
            if arr[j] < arr[min_index]:  # Find smallest element
                min_index = j
        arr[i], arr[min_index] = arr[min_index], arr[i]  # Swap elements
    return arr  # Return sorted list

# Example usage
numbers = [5, 3, 8, 4, 2]
print(selection_sort(numbers))  # Output: [2, 3, 4, 5, 8]

Let’s break down the Selection Sort algorithm step by step.

Code Section | Explanation
for i in range(n): | Outer loop to select the minimum element position.
min_index = i | Assume current position as minimum.
for j in range(i + 1, n): | Inner loop to find actual minimum element.
if arr[j] < arr[min_index]: | Update minimum if smaller element found.
arr[i], arr[min_index] = arr[min_index], arr[i] | Swap smallest element to correct position.
return arr | Return sorted list.
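
To watch selection sort work on [5, 3, 8, 4, 2], the sketch below records the list after each pass. The function name selection_sort_trace is an assumption made for this illustration. It stops one pass early (range(n - 1)) because the final pass of the code above never changes anything.

```python
def selection_sort_trace(arr):
    arr = list(arr)  # Work on a copy so the input is left unchanged
    snapshots = []
    n = len(arr)
    for i in range(n - 1):
        min_index = i
        for j in range(i + 1, n):
            if arr[j] < arr[min_index]:
                min_index = j
        arr[i], arr[min_index] = arr[min_index], arr[i]
        snapshots.append(list(arr))  # Record the list after this pass
    return snapshots

for step, state in enumerate(selection_sort_trace([5, 3, 8, 4, 2]), start=1):
    print(step, state)
# 1 [2, 3, 8, 4, 5]
# 2 [2, 3, 8, 4, 5]
# 3 [2, 3, 4, 8, 5]
# 4 [2, 3, 4, 5, 8]
```

Pass 2 changes nothing because 3 is already the smallest remaining element, yet the inner loop still scans the rest of the list.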

Time Complexity

Algorithm | Best Case | Worst Case | Notes
Bubble Sort | O(n) | O(n²) | Early stop if already sorted.
Selection Sort | O(n²) | O(n²) | Not adaptive, always runs full passes.
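
One way to see the best-case difference is to count comparisons on a list that is already sorted. The sketch below copies the two sorting routines from above and adds a counter. The function names and the counter are assumptions introduced for this illustration.

```python
def bubble_sort_comparisons(arr):
    arr = list(arr)  # Copy so the caller's list is untouched
    n = len(arr)
    comparisons = 0
    for i in range(n):
        swapped = False
        for j in range(n - 1 - i):
            comparisons += 1
            if arr[j] > arr[j + 1]:
                arr[j], arr[j + 1] = arr[j + 1], arr[j]
                swapped = True
        if not swapped:  # Early stop: no swaps means sorted
            break
    return comparisons

def selection_sort_comparisons(arr):
    arr = list(arr)
    n = len(arr)
    comparisons = 0
    for i in range(n):
        min_index = i
        for j in range(i + 1, n):
            comparisons += 1
            if arr[j] < arr[min_index]:
                min_index = j
        arr[i], arr[min_index] = arr[min_index], arr[i]
    return comparisons

already_sorted = [1, 2, 3, 4, 5]
print(bubble_sort_comparisons(already_sorted))     # Output: 4
print(selection_sort_comparisons(already_sorted))  # Output: 10
```

Bubble sort finishes after a single pass (n - 1 comparisons), while selection sort always makes n(n - 1)/2 comparisons no matter how the input is ordered.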

2.5.​Exercises

Try the following exercises to solidify your understanding:

1.​ Modify Bubble Sort: Adjust the code so it sorts in descending order instead of
ascending.
2.​ Modify Selection Sort: Instead of finding the smallest element first, modify the code
to find the largest element first.
3.​ Compare Performance: Run both sorting algorithms on a large list (e.g., 100 random
numbers) and measure their speed.
4.​ Analyze Complexity: What is the worst-case time complexity of Bubble Sort and
Selection Sort? Research and summarize your findings.

3. Problem-Solving Techniques Using Data Structures and Algorithms
3.1.​Sliding Window Technique

What is it?

Imagine you have a long list of numbers and want to find the biggest sum of three numbers
in a row. One way is to add every possible set of three numbers again and again—but that
takes too much time.

Instead, Sliding Window helps by using a moving "window." Think of this window like
looking through three numbers at a time. First, you add up the first three numbers. Then,
instead of starting over, you slide the window forward. Remove the first number and add the
next one. This way, you don’t have to redo everything!

It saves time because you're only updating small parts instead of recalculating the whole
thing.

When to Use It?

● Finding the maximum sum of a subarray of size k.
● Finding the longest substring with unique characters.
● Problems involving continuous sequences in an array.
Definitions:

●​ Subarray: A subarray is a smaller part of an array that includes some or all of the
elements in order. It must come from the original array without changing the order.
●​ Example: If you have an array [3, 7, 1, 9], some subarrays are [3, 7],
[1, 9], or [7, 1, 9].
●​ Substring: A substring is a smaller part of a string (text) that keeps the original order
of letters.
●​ Example: If the string is "hello", some substrings are "hel", "lo", or "ell".

Both must be continuous, meaning you can’t skip elements.
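
The window does not have to be a fixed size. As a hedged sketch of the "longest substring with unique characters" use case, the window below grows to the right and shrinks from the left whenever a character repeats. The function name longest_unique_substring and the last_seen dictionary are assumptions made for this illustration, and the function returns only the length.

```python
def longest_unique_substring(s):
    last_seen = {}  # Most recent index of each character
    start = 0       # Left edge of the current window
    best = 0
    for i, ch in enumerate(s):
        # If ch already appears inside the window, move the left edge past it
        if ch in last_seen and last_seen[ch] >= start:
            start = last_seen[ch] + 1
        last_seen[ch] = i
        best = max(best, i - start + 1)  # Window is s[start..i]
    return best

print(longest_unique_substring("abcabcbb"))  # Output: 3 ("abc")
print(longest_unique_substring("bbbbb"))     # Output: 1 ("b")
```

Each character is visited once, so this runs in O(n) time.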

Example : Find Maximum Sum of a Subarray of Size k

Problem Statement

Given an array:

arr = [2, 1, 5, 1, 3, 2]

k = 3 # Window size

Find the maximum sum of any 3 consecutive numbers.

Naive Approach (Slow): This checks every possible 3-number subarray:

def max_subarray_sum_naive(arr, k):
    max_sum = float('-inf')
    for i in range(len(arr) - k + 1):
        current_sum = sum(arr[i:i+k])  # Sum of subarray
        max_sum = max(max_sum, current_sum)
    return max_sum

print(max_subarray_sum_naive([2, 1, 5, 1, 3, 2], 3))  # Output: 9

Problem: This approach recalculates the sum from scratch for every subarray, which takes
O(n*k) time. We can make it more efficient.
Optimized Sliding Window Approach:

Instead of recalculating every sum, we do this:

1.​ Find the first window sum (2 + 1 + 5 = 8).


2.​ Slide the window:
●​ Remove the first number (2)
●​ Add the next number (1) → New sum: 8 - 2 + 1 = 7.
3.​ Repeat until we find the max sum

def max_subarray_sum(arr, k):
    max_sum = window_sum = sum(arr[:k])  # First window sum
    for i in range(k, len(arr)):
        window_sum += arr[i] - arr[i-k]  # Slide the window
        max_sum = max(max_sum, window_sum)
    return max_sum

print(max_subarray_sum([2, 1, 5, 1, 3, 2], 3))  # Output: 9

Let’s break this code down step by step to fully understand how it works.

Step | Explanation | Example with arr = [2, 1, 5, 1, 3, 2], k = 3
1. Initial window sum | Sum of first k elements | [2, 1, 5] sum = 8
2. Slide window forward | Subtract the element leaving the window, add the new element | 8 - 2 + 1 = 7 for window [1, 5, 1]
3. Update max_sum | Keep track of highest window sum so far | New max is 9 when window is [5, 1, 3]
4. Continue until array end | Process all windows of size k | Final max sum found is 9

Why is Sliding Window Efficient?

✅ Instead of recalculating the sum for every possible subarray, it reuses previous
calculations, making it much faster (O(n) time complexity).​
❌ A brute-force approach would require checking all possible subarrays, making it slow
(O(n*k) time complexity).

Quick Quiz:

●​ Why does subtracting the element that leaves the window and adding the new
element speed up the calculation?​

●​ What would happen if k is larger than the length of the array?

3.2.​ Hashing (Counting or Checking Duplicates)

What is it?

Hashing is a technique that uses a hash table (dictionary or set) to store and retrieve data
efficiently.

When to Use It?

i. Checking for duplicate elements in an array.​


ii. Counting frequency of elements in a list.​
iii. Fast lookups (checking if an element exists).
Example Problem: Find First Duplicate in an Array

Problem Statement: Given arr = [1, 2, 3, 2, 4], find the first repeating element.

Naive Approach (O(n²))

# Function to find the first duplicate in an array (Naive Approach)
def find_duplicate_naive(arr):
    # Iterate through each element in the array
    for i in range(len(arr)):
        # Compare the current element with all elements that come after it
        for j in range(i+1, len(arr)):
            # If a duplicate is found, return it immediately
            if arr[i] == arr[j]:
                return arr[i]
    # If no duplicate is found, return None
    return None

# Test the function
print(find_duplicate_naive([1, 2, 3, 2, 4]))  # Output: 2

Problem: This approach checks every pair, making it slow.

Optimized Hashing Approach (O(n))

# Function to find the first duplicate in an array (Efficient Approach)
def find_duplicate(arr):
    seen = set()  # Create an empty set to store seen numbers
    for num in arr:
        if num in seen:  # If number is already in the set, it's a duplicate
            return num  # Return the first duplicate found
        seen.add(num)  # Add the number to the set
    return None  # If no duplicate is found, return None

# Test the function
print(find_duplicate([1, 2, 3, 2, 4]))  # Output: 2
Why is this better? Instead of checking every pair, we store seen elements in a set for fast
lookup.

Let's break down the optimized duplicate-finding code step by step:

Step-by-Step Breakdown

Step | Explanation
1. Create set | An empty set seen to store elements we've checked
2. Loop through arr | Check each element one by one
3. Check presence | If element is already in seen, return it (first duplicate)
4. Add to set | Otherwise, add element to seen
5. Return None | If loop finishes with no duplicate, return None


Let’s walk through the example

Iteration | Current Num | Set Contents | Action
1 | 1 | {} | Add 1
2 | 2 | {1} | Add 2
3 | 3 | {1, 2} | Add 3
4 | 2 | {1, 2, 3} | Duplicate found! Return 2

Why This Technique is Useful

●​ Efficient → O(n) time complexity instead of O(n²).​

●​ Early Exit → Stops as soon as a duplicate is found.​

●​ Practical for databases, input validation, and more.​

Reflection Questions:

●​ Why is a set used instead of a list for checking duplicates?​

●​ Can hashing be used for counting frequencies? How?​


3.3.​ Prefix Sums (Efficient Range Queries)

What is it?

Imagine you have a list of numbers: [3, 7, 2, 5, 1]​


You want to find the sum of numbers from index 1 to index 3 (which means 7 + 2 + 5).

A normal way would be to add those numbers every time you ask—but that takes time!​
Prefix sums precompute these sums so you don't have to recalculate every time.

How it Works

1. First, make a new list where each number keeps a running total:

●​ Start with 0 → [0, 3, 10, 12, 17, 18]


●​ Each number here adds the previous total

2. Now, to find the sum between two places, just subtract:

●​ Sum from index 1 to 3 → prefix[4] - prefix[1] → 17 - 3 = 14


●​ Instead of adding 7 + 2 + 5 every time, you get instant answers

When to Use It?

i. Finding the sum of elements between two indices efficiently.​


ii. Problems involving cumulative sums.​
iii. Optimizing range queries in large datasets.

Example Problem: Compute Sum Between Indexes 1 and 3

Statement: Given arr = [3, 7, 2, 5, 1], find the sum of elements between index 1 and
3.

Naive Approach (O(n))

# Function to compute the sum of elements between two indices (Naive Approach)
def range_sum_naive(arr, left, right):
    return sum(arr[left:right + 1])  # Sum elements from index 'left' to 'right'

# Example usage
numbers = [3, 7, 2, 5, 1]
print(range_sum_naive(numbers, 1, 3))  # Output: 14

Problem: This recalculates the sum every time, making it slow for multiple queries.

Optimized Prefix Sum Approach (O(1) per query)

# Function to compute prefix sums for fast range queries
def prefix_sum(arr):
    prefix = [0] * (len(arr) + 1)  # Create a prefix sum list initialized with zeros
    for i in range(len(arr)):
        prefix[i + 1] = prefix[i] + arr[i]  # Compute cumulative sum at each step
    return prefix  # Return the completed prefix sum array

# Example usage
arr = [3, 7, 2, 5, 1]
prefix = prefix_sum(arr)  # Generate the prefix sum array

# Query: Compute sum of elements between index 1 and 3 (7 + 2 + 5)
print(prefix[4] - prefix[1])  # Output: 14

Why is this better? Instead of recalculating, we precompute sums and use subtraction for
fast queries.

Let's break down the Prefix Sum code step by step so you fully understand how it works.

Step-by-Step Breakdown

Step | Explanation
1. Create prefix list | A list prefix with length len(arr) + 1, initialized to 0
2. Compute running sum | prefix[i+1] = prefix[i] + arr[i] for each index i
3. Query sum | Sum from left to right = prefix[right+1] - prefix[left]
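
The query formula in step 3 can be wrapped in a small helper so that many queries reuse one prefix list. The names build_prefix and range_sum are assumptions made for this sketch; both ends of the range are inclusive.

```python
def build_prefix(arr):
    prefix = [0] * (len(arr) + 1)
    for i in range(len(arr)):
        prefix[i + 1] = prefix[i] + arr[i]  # Running total up to index i
    return prefix

def range_sum(prefix, left, right):
    # Sum of arr[left..right], both ends inclusive
    return prefix[right + 1] - prefix[left]

prefix = build_prefix([3, 7, 2, 5, 1])  # Built once: O(n)
print(range_sum(prefix, 1, 3))  # Output: 14  (7 + 2 + 5)
print(range_sum(prefix, 0, 4))  # Output: 18  (whole array)
print(range_sum(prefix, 2, 2))  # Output: 2   (single element)
```

The prefix list is built once, and every query afterwards is a single subtraction.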

Why is This Efficient?

●​ Instant queries → Just a subtraction operation (O(1)).​

●​ Fast preprocessing → O(n) time to build prefix sums.​

●​ Useful for huge datasets with many range queries.

Reflection Questions:

●​ How does the extra zero at the start of prefix sums help?​

●​ Can you think of other uses of prefix sums besides summing?​

Quick Decision Table for Choosing Techniques

Technique | When to Use | Time Complexity | Notes
Sliding Window | Continuous subarrays, fixed window size | O(n) | Works with sums, counts, max/min
Hashing | Detect duplicates, frequency counts, membership | O(n) | Uses extra space (set/dict)
Prefix Sums | Multiple range sum queries | O(n) preprocessing, O(1) query | Precompute sums for fast queries
