Unit II- Linear Data Structures, searching and sorting
Overview of Array and Array as an ADT (Abstract Data Type)
1. Overview of Array
An array is a collection of items stored at contiguous memory locations. It can store multiple values of the
same type (e.g., all integers, or all strings) in a single variable.
In simple terms:
An array is like a row of school lockers – each locker has a number (called index) and holds a value (like books or
pens).
Why Use an Array?
Imagine you want to store marks of 5 students. Without an array, you would need 5 variables:
mark1 = 78
mark2 = 85
mark3 = 90
mark4 = 67
mark5 = 88
But with an array:
marks = [78, 85, 90, 67, 88]
You can now:
• Access any mark easily: marks[2] → 90
• Loop through all marks
• Add, update, or delete marks efficiently
Python Implementation:
Python doesn’t have built-in arrays like C/C++; instead, it uses lists which are dynamic arrays.
arr = [10, 20, 30, 40, 50]
print(arr[2]) # Output: 30
Python Code Example:
# Create an array of student marks
marks = [78, 85, 90, 67, 88]
# Accessing elements
print("Mark of 1st student:", marks[0])
print("Mark of 3rd student:", marks[2])
# Traversing (visiting each element)
print("All marks:")
for mark in marks:
    print(mark)
Key Features of Arrays:
Feature | Description
Fixed size | You must know in advance how many items you want to store
Same type of data | All elements are usually of the same data type
Indexing | Elements are accessed using an index (starting from 0)
Efficient access | Any element can be accessed in O(1) time
2. Array as an Abstract Data Type (ADT)
An Abstract Data Type (ADT) is a logical description of how data is organized and what operations can be
performed on it – without focusing on how it is implemented.
Think of ADT as a black box: You know what it does, but not how it works inside.
For example, a vending machine is an ADT – you press a button (operation), and you get your drink (result), but
you don’t see the internal mechanics.
Abstractly, an array can be defined as: Array A of size n, where A[i] is accessible if 0 <= i < n
Array as an ADT
An array as an ADT means:
• It is a linear collection of elements.
• It supports basic operations like:
o Traversal – visit each element
o Insertion – add a new element
o Deletion – remove an element
o Searching – find an element
o Updating – change the value at a position
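As a rough sketch, these operations can be wrapped in a small Python class (the class name ArrayADT and its method names are our own illustration, not a standard type):

```python
class ArrayADT:
    """A minimal sketch of the array ADT, backed by a Python list."""

    def __init__(self, elements=None):
        self.data = list(elements) if elements else []

    def traverse(self):
        return [item for item in self.data]   # visit each element

    def insert(self, index, value):
        self.data.insert(index, value)        # add at a position

    def delete(self, index):
        del self.data[index]                  # remove by position

    def search(self, value):
        return self.data.index(value) if value in self.data else -1

    def update(self, index, value):
        self.data[index] = value              # change value at index

arr = ArrayADT([10, 20, 30])
arr.insert(1, 15)      # [10, 15, 20, 30]
arr.update(0, 5)       # [5, 15, 20, 30]
arr.delete(3)          # [5, 15, 20]
print(arr.traverse())  # [5, 15, 20]
print(arr.search(20))  # 2
```

Notice that the caller only sees *what* the operations do; the list inside is an implementation detail, which is exactly the ADT idea.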
Why Arrays Are Important in ADTs
Arrays are:
• Simple to use
• Efficient for accessing data
• Used as a base for complex data structures (like matrices, heaps, trees, etc.)
Summary Table
Concept | Description
Array | Linear collection of same-type elements, stored in contiguous memory
ADT (Abstract Data Type) | Logical model of a data structure and its operations, hiding implementation details
Array as ADT | Defines what we can do with arrays: insert, delete, traverse, search, update
Real-Life Analogy
Real Life | Data Structure
School lockers | Array
Locker number | Index
Items inside a locker | Element (value)
3. Operations on Array
An array supports several key operations, just like a real-world container. In Data Structures, we perform the
following:
🔹 1. Traversal (Accessing All Elements)
Definition: Visiting each element in the array one by one.
Use Case: Printing all student marks or processing each item.
Python Example:
arr = [10, 20, 30, 40, 50]
print("Traversing the array:")
for item in arr:
    print(item)
🔹 2. Insertion
Definition: Adding a new element at a specific position or at the end.
Python Example:
arr = [10, 20, 30, 50]
# Insert 40 at index 3
arr.insert(3, 40)
print("After insertion:", arr)
# Append 60 at the end
arr.append(60)
print("After appending:", arr)
🔹 3. Deletion
Definition: Removing an element from the array by value or index.
Python Example:
arr = [10, 20, 30, 40, 50]
# Delete by value
arr.remove(30)
print("After removing 30:", arr)
# Delete by index
del arr[2]
print("After deleting index 2:", arr)
🔹 4. Searching
Definition: Finding the index or presence of a value.
Python Example:
arr = [10, 20, 30, 40, 50]
# Check if 30 is present
if 30 in arr:
    print("30 is found at index:", arr.index(30))
else:
    print("30 not found")
🔹 5. Updating
Definition: Changing the value at a particular index.
Python Example:
arr = [10, 20, 30, 40, 50]
# Update value at index 2
arr[2] = 35
print("After update:", arr)
Summary Table for Array Operations
Operation | Method Used in Python | Time Complexity
Traversal | for item in arr: | O(n)
Insertion | insert(), append() | O(n) / O(1)
Deletion | remove(), del | O(n)
Searching | in, index() | O(n)
Updating | arr[i] = value | O(1)
4. Storage Representation
• Storage representation refers to how data structures like arrays are stored in memory (RAM). Arrays are
stored in contiguous memory locations, meaning elements are stored one after the other.
• In computer memory, even multidimensional arrays are stored linearly (1D). There are two main ways to
map these multidimensional structures into linear memory:
1. Row-Major Order (used by languages like C, Python NumPy):
o Rows are stored one after the other.
2. Column-Major Order (used by languages like Fortran, MATLAB):
o Columns are stored one after the other.
Python (via NumPy) uses row-major by default.
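The two orders boil down to a simple offset formula: in row-major order, element (i, j) of an array with n_cols columns lives at offset i * n_cols + j; in column-major order, with n_rows rows, it lives at j * n_rows + i. A small illustrative sketch (the helper names are our own):

```python
def row_major_offset(i, j, n_cols):
    # Rows stored one after another: skip i full rows, then j elements
    return i * n_cols + j

def col_major_offset(i, j, n_rows):
    # Columns stored one after another: skip j full columns, then i elements
    return j * n_rows + i

# For a 2 x 3 array, element (0, 1):
print(row_major_offset(0, 1, n_cols=3))  # 1 -> second slot in row-major order
print(col_major_offset(0, 1, n_rows=2))  # 2 -> third slot in column-major order
```

The same element lands in different linear slots depending on the order, which is why the two flattenings below disagree.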
Python Example:
import numpy as np
arr = np.array([[1, 2, 3],
                [4, 5, 6]])
print("Original Array:")
print(arr)
print("\nFlattened (row-major):", arr.flatten(order='C')) # Row-wise
print("Flattened (column-major):", arr.flatten(order='F')) # Column-wise
Output:
Original Array:
[[1 2 3]
[4 5 6]]
Flattened (row-major): [1 2 3 4 5 6]
Flattened (column-major): [1 4 2 5 3 6]
2. Multidimensional Arrays [2D, nD]
Multidimensional arrays are arrays with more than one axis or dimension:
• 2D Array: Matrix (rows × columns)
• nD Array: Tensor-like structures with more than 2 dimensions.
Detailed Explanation:
• A 2D array is like a table: it has rows and columns.
• A 3D array is like a stack of 2D arrays.
• An nD array can go beyond 3D (e.g., 4D, 5D, etc.) and is useful in scientific computing (e.g., image
processing, deep learning).
Python Examples (with NumPy):
➤ 2D Array (Matrix):
import numpy as np
matrix_2d = np.array([[10, 20, 30],
                      [40, 50, 60]])
print("2D Array:")
print(matrix_2d)
print("Shape:", matrix_2d.shape)
print("Access element at (1,2):", matrix_2d[1, 2])
➤ 3D Array:
array_3d = np.array([[[1, 2], [3, 4]],
                     [[5, 6], [7, 8]]])
print("\n3D Array:")
print(array_3d)
print("Shape:", array_3d.shape)
print("Access element at (1,1,1):", array_3d[1,1,1]) # Output: 8
➤ nD Array (e.g., 4D):
array_4d = np.random.randint(1, 10, size=(2, 2, 2, 2)) # Random 4D array
print("\n4D Array Shape:", array_4d.shape)
print("Sample Element at (0,1,1,0):", array_4d[0,1,1,0])
Summary Table:
Concept | Description
Storage Representation | How arrays are stored in memory (usually row-major in Python)
2D Array | A table with rows and columns
nD Array | Higher-dimensional structures (e.g., tensors) useful for advanced computations
5. Sparse Matrix Representation
A sparse matrix is a 2D matrix with mostly 0s and only a few non-zero elements.
Why use a special representation?
Storing a huge matrix that is mostly 0s wastes memory. So, instead of storing all elements, we store only the
non-zero elements along with their row and column positions.
Common Representations
➤ Triplet Format (3-tuple format):
We store only the non-zero elements as a list of [row, column, value].
Example:
Matrix:
[0 0 3]
[0 0 0]
[5 0 0]
Triplet Format:
[
[0, 2, 3],
[2, 0, 5]
]
❖ Python Example:
def sparse_representation(matrix):
    triplets = []
    for i in range(len(matrix)):
        for j in range(len(matrix[0])):
            if matrix[i][j] != 0:
                triplets.append([i, j, matrix[i][j]])
    return triplets
matrix = [
[0, 0, 3],
[0, 0, 0],
[5, 0, 0]
]
print("Sparse Representation (Triplet):")
print(sparse_representation(matrix))
Searching Techniques in 2D Arrays / Sparse Data
Let's assume we want to search for a value in the sparse matrix. We can use different techniques depending on the
structure.
✅ 1 Sequential Search / Linear Search
Idea:
Look at each item one by one until you find the value.
Analogy:
Looking for a name in a random list of people — you check each person.
Advantages:
• Simple and easy to implement.
• Works on unsorted data.
• No extra memory needed.
Disadvantages:
• Slow for large datasets.
• Time-consuming as it checks every item.
Applications:
• Small lists like contact names on your phone.
• When the list is unsorted or very short.
⏱ Time Complexity:
Case | Time Complexity
Best | O(1) → found at start
Worst | O(n) → found at end or not found
Average | O(n)
❖ Python Example:
def linear_search(triplets, target):
    for row, col, val in triplets:
        if val == target:
            return (row, col)
    return "Not Found"
triplets = [[0, 2, 3], [2, 0, 5]]
print("Linear Search Result:", linear_search(triplets, 5))
✅ 2. Binary Search
Idea:
Works only if the data is sorted. Start in the middle, and eliminate half of the elements each time.
Analogy:
Finding a word in a dictionary – you go to the middle and narrow down.
Advantages:
• Much faster than linear search for large data.
• Very efficient on sorted data.
Disadvantages:
• Only works on sorted lists.
• Needs more logic to implement.
Applications:
• Dictionary word lookup
• Searching in databases or ordered lists
• Any sorted data (e.g., ID numbers, prices)
⏱ Time Complexity:
Case | Time Complexity
Best | O(1)
Worst | O(log n)
Average | O(log n)
❖ Python Example:
def binary_search(triplets, target):
    triplets = sorted(triplets, key=lambda x: x[2])  # Sort by value
    low = 0
    high = len(triplets) - 1
    while low <= high:
        mid = (low + high) // 2
        if triplets[mid][2] == target:
            return (triplets[mid][0], triplets[mid][1])
        elif triplets[mid][2] < target:
            low = mid + 1
        else:
            high = mid - 1
    return "Not Found"
triplets = [[0, 2, 3], [2, 0, 5], [1, 1, 8]]
print("Binary Search Result:", binary_search(triplets, 5))
✅ 3. Fibonacci Search
Idea:
Similar to binary search, but uses Fibonacci numbers to decide the split points.
Analogy:
Like the dictionary lookup of binary search, but the split point follows Fibonacci numbers instead of the exact middle. It is less common, yet it performs well in some cases (especially for large sorted arrays).
Advantages:
• Similar performance to binary search.
• Better in systems with slow access time (e.g., tape memory).
Disadvantages:
• Slightly more complex than binary search.
• Still needs the data to be sorted.
Applications:
• Large sorted lists in low-level systems.
• Environments where division is expensive.
⏱ Time Complexity:
Case | Time Complexity
Best | O(1)
Worst | O(log n)
Average | O(log n)
❖ Python Example:
def fibonacci_search(triplets, target):
    triplets = sorted(triplets, key=lambda x: x[2])
    n = len(triplets)
    # Initialize Fibonacci numbers
    fibMMm2 = 0               # (m-2)'th Fibonacci number
    fibMMm1 = 1               # (m-1)'th Fibonacci number
    fibM = fibMMm2 + fibMMm1  # m'th Fibonacci number
    # Find the smallest Fibonacci number >= n
    while fibM < n:
        fibMMm2 = fibMMm1
        fibMMm1 = fibM
        fibM = fibMMm2 + fibMMm1
    offset = -1
    while fibM > 1:
        i = min(offset + fibMMm2, n - 1)
        if triplets[i][2] < target:
            fibM = fibMMm1
            fibMMm1 = fibMMm2
            fibMMm2 = fibM - fibMMm1
            offset = i
        elif triplets[i][2] > target:
            fibM = fibMMm2
            fibMMm1 = fibMMm1 - fibMMm2
            fibMMm2 = fibM - fibMMm1
        else:
            return (triplets[i][0], triplets[i][1])
    # One element may remain to be checked
    if fibMMm1 and offset + 1 < n and triplets[offset + 1][2] == target:
        return (triplets[offset + 1][0], triplets[offset + 1][1])
    return "Not Found"
triplets = [[0, 2, 3], [2, 0, 5], [1, 1, 8], [2, 2, 12]]
print("Fibonacci Search Result:", fibonacci_search(triplets, 12))
✅ 4. Indexed Sequential Search
Idea:
• Divide data into blocks
• Create an index for the blocks
• Use the index to jump to the right block, then use linear search
Analogy:
Using chapter titles (index) to find a topic in a textbook, then flipping pages inside that chapter.
Advantages:
• Faster than full linear search.
• Uses index to skip unnecessary elements.
Disadvantages:
• Needs extra indexing space.
• Only works on sorted data.
Applications:
• Database indexing (like finding students in sorted roll numbers).
• Large sorted data (e.g., postal codes, bank records).
⏱ Time Complexity:
Case | Time Complexity
Best | O(1)
Worst | O(√n)
Average | O(√n)
❖ Python Example:
def indexed_sequential_search(triplets, target):
    triplets = sorted(triplets, key=lambda x: x[2])
    n = len(triplets)
    block_size = 2
    index = []
    # Create index: (start position, first value) of each block
    for i in range(0, n, block_size):
        index.append((i, triplets[i][2]))
    # Find the right block, then search it linearly
    for i in range(len(index)):
        if i == len(index) - 1 or index[i + 1][1] > target:
            start = index[i][0]
            end = min(start + block_size, n)
            for j in range(start, end):
                if triplets[j][2] == target:
                    return (triplets[j][0], triplets[j][1])
            break
    return "Not Found"
triplets = [[0, 2, 3], [2, 0, 5], [1, 1, 8], [2, 2, 12]]
print("Indexed Sequential Search Result:", indexed_sequential_search(triplets, 8))
▪ Comparison Table
Search Method | Works on Unsorted? | Extra Memory? | Best Case | Worst Case | Average Case | Applications
Linear Search | Yes | No | O(1) | O(n) | O(n) | Small or unsorted data
Binary Search | No | No | O(1) | O(log n) | O(log n) | Sorted arrays/lists
Fibonacci Search | No | No | O(1) | O(log n) | O(log n) | Large sorted files
Indexed Sequential | No (needs sorted index) | Yes | O(1) | O(√n) | O(√n) | Databases, file systems
Sorting
Sorting is all about arranging data in a particular order – like ascending (A to Z) or descending (Z to A).
Let’s understand 3 key concepts that every good sorting algorithm is judged on:
1. Stability
What does “Stable” mean in sorting?
A stable sort keeps equal elements in the same relative order they were in before sorting.
Simple Example:
Imagine you're sorting a list of students by age, but some students have the same age:
Name | Age
Alice | 14
Bob | 13
Clara | 14
Dave | 13
Now you sort them by age:
• If Bob still comes before Dave, and Alice still before Clara, even though their ages are equal → the sorting
is stable.
Why is Stability Important?
• Useful when you sort based on multiple criteria.
• Example: First sort by class, then by roll number.
Stable Sorts:
• Bubble Sort
• Merge Sort
• Insertion Sort
Unstable Sorts:
• Selection Sort
• Quick Sort (unless modified)
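Python's built-in sorted() is a stable sort (it uses Timsort), so the student example above is easy to check in code:

```python
students = [("Alice", 14), ("Bob", 13), ("Clara", 14), ("Dave", 13)]

# Sort by age only; sorted() is stable, so ties keep their original order
by_age = sorted(students, key=lambda s: s[1])
print(by_age)
# [('Bob', 13), ('Dave', 13), ('Alice', 14), ('Clara', 14)]
```

Bob still comes before Dave, and Alice still before Clara, even though their ages tie; that is exactly the stability guarantee.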
2. Efficiency (Time & Space)
What does “Efficient” mean?
An efficient sorting algorithm:
• Takes less time
• Uses less memory
Think of it like this:
If sorting a list of 10 names takes 1 second, and sorting 1,000 names takes 5 seconds — that’s efficient!
Time Complexity (How fast it is):
Algorithm | Best Case | Average Case | Worst Case
Bubble Sort | O(n) | O(n²) | O(n²)
Insertion Sort | O(n) | O(n²) | O(n²)
Merge Sort | O(n log n) | O(n log n) | O(n log n)
Quick Sort | O(n log n) | O(n log n) | O(n²)
➡ n = number of items
Space Complexity (How much memory it uses):
• In-place algorithms use less space (like Bubble Sort, Insertion Sort).
• Merge Sort uses extra memory to store sublists.
3. Number of Passes
A pass is one complete sweep over the list.
Example: In Bubble Sort, in each pass, you go through the whole list and swap elements if needed.
For a list of 5 items, you may need up to 4 passes to fully sort it.
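The pass count can be observed directly with a small variant of Bubble Sort that stops early once a pass makes no swaps (an illustrative sketch; the function name is our own):

```python
def bubble_sort_count_passes(arr):
    """Bubble sort that also reports how many passes it made."""
    arr = list(arr)
    passes = 0
    for i in range(len(arr) - 1):
        swapped = False
        for j in range(len(arr) - 1 - i):
            if arr[j] > arr[j + 1]:
                arr[j], arr[j + 1] = arr[j + 1], arr[j]
                swapped = True
        passes += 1
        if not swapped:       # list already sorted -> stop early
            break
    return arr, passes

print(bubble_sort_count_passes([5, 1, 4, 2, 8]))
# ([1, 2, 4, 5, 8], 3)
```

An already-sorted list needs only one pass (to confirm no swaps are needed), which is where Bubble Sort's O(n) best case comes from.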
Real-Life Analogy (Classroom Style)
Imagine you and your classmates are lining up:
• You're sorted by height (shortest to tallest).
• Some people have the same height.
A stable sort would keep the same order of students with the same height as they were before.
If sorting takes a lot of time and shuffling, it’s inefficient.
If your teacher has to check you all multiple times (passes) to fix the line, that's about the number of passes.
4. Internal and External Sorting
Sorting means arranging data in a specific order — usually ascending or descending (like arranging numbers or
names alphabetically).
1. Internal Sorting
Definition: When all the data that needs to be sorted fits into the main memory (RAM), it is called Internal
Sorting.
Example: Suppose you have a list of 1000 numbers, and your computer's RAM can easily handle this — then you
can sort it using an algorithm like Bubble Sort, Insertion Sort, or Quick Sort.
Features:
• Data is sorted in RAM.
• Fast, because RAM is faster than external storage.
• Suitable for small to medium-sized data sets.
Common Algorithms:
• Bubble Sort
• Insertion Sort
• Selection Sort
• Merge Sort (internal version)
• Quick Sort
2. External Sorting
Definition: When the data does NOT fit into RAM and has to be stored on external memory (like hard drives or
SSDs), it is called External Sorting.
Example: Imagine you have a file with 500 million records — too large for RAM. So, we use External Sorting
techniques.
Features:
• Data is sorted using disk storage.
• Slower than internal sorting because accessing disk is slower.
• Designed for very large data sets (big data).
Common Algorithms:
• External Merge Sort (most popular)
• Replacement Selection
• Polyphase Merge Sort
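The core idea of External Merge Sort can be sketched in a few lines: sort runs that fit in "memory", then k-way merge the sorted runs. In this simplified sketch the runs stay in RAM; a real implementation would write each sorted run to disk and stream the merge:

```python
import heapq

def external_merge_sort(data, chunk_size):
    """Sketch of external merge sort: sort small runs, then k-way merge them."""
    # Phase 1: split into chunks that "fit in memory" and sort each run
    runs = [sorted(data[i:i + chunk_size])
            for i in range(0, len(data), chunk_size)]
    # Phase 2: k-way merge of the sorted runs (heapq.merge consumes lazily,
    # which is what lets a real version stream runs from disk)
    return list(heapq.merge(*runs))

print(external_merge_sort([9, 4, 7, 1, 8, 2, 6, 3], chunk_size=3))
# [1, 2, 3, 4, 6, 7, 8, 9]
```

Here chunk_size plays the role of available RAM: only one run ever needs to be fully sorted in memory at a time.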
Comparison Table:
Feature | Internal Sorting | External Sorting
Storage used | Main memory (RAM) | External memory (disk)
Speed | Faster | Slower (due to disk access)
Data size | Small to medium | Very large
Examples | Quick Sort, Bubble Sort | External Merge Sort
Complexity to manage | Easy | More complex
5. Sorting Algorithms in Python
1️⃣ Bubble Sort
Definition:
Bubble Sort repeatedly compares adjacent elements and swaps them if they are in the wrong order — largest
values "bubble up" to the end.
Steps:
1. Compare first and second elements.
2. Swap if out of order.
3. Repeat for all adjacent pairs.
4. Repeat the process for all elements.
Python Code:
def bubble_sort(arr):
    n = len(arr)
    for i in range(n):
        for j in range(0, n - i - 1):  # Last i elements are already sorted
            if arr[j] > arr[j + 1]:
                arr[j], arr[j + 1] = arr[j + 1], arr[j]
    return arr
Applications:
• Teaching algorithm basics
• Small datasets
Advantages:
• Simple and easy to understand
Disadvantages:
• Very slow for large datasets
Time Complexity:
• Best: O(n) (already sorted; this needs an early-exit check that stops when a pass makes no swaps)
• Average/Worst: O(n²)
2️⃣ Insertion Sort
Definition:
Insertion Sort builds the final sorted array one item at a time, inserting elements in their correct position.
Steps:
1. Start with the second element.
2. Compare it with the previous elements.
3. Shift bigger elements to the right.
4. Insert the current element in its correct place.
Python Code:
def insertion_sort(arr):
    for i in range(1, len(arr)):
        key = arr[i]
        j = i - 1
        while j >= 0 and key < arr[j]:
            arr[j + 1] = arr[j]
            j -= 1
        arr[j + 1] = key
    return arr
Applications:
• Partially sorted data
• Small datasets like cards
Advantages:
• Efficient for small or nearly sorted data
• Simple
Disadvantages:
• Slow for large datasets
Time Complexity:
• Best: O(n)
• Average/Worst: O(n²)
3️⃣ Selection Sort
Definition:
Selection Sort finds the smallest (or largest) element from the unsorted part and swaps it with the beginning of the
unsorted part.
Steps:
1. Find the smallest element.
2. Swap it with the first unsorted element.
3. Repeat for the rest.
Python Code:
def selection_sort(arr):
    n = len(arr)
    for i in range(n):
        min_index = i
        for j in range(i + 1, n):
            if arr[j] < arr[min_index]:
                min_index = j
        arr[i], arr[min_index] = arr[min_index], arr[i]
    return arr
Applications:
• When memory writes are costly (less swaps)
Advantages:
• Simple
• Minimizes number of swaps
Disadvantages:
• Still slow
• Not adaptive
Time Complexity:
• Best/Worst/Average: O(n²)
4️⃣ Quick Sort
Definition:
Quick Sort is a divide and conquer algorithm. It picks a pivot, partitions the array into two (less than and greater
than pivot), and sorts them recursively.
Steps:
1. Pick a pivot.
2. Partition the array into two subarrays.
3. Recursively apply Quick Sort to subarrays.
Python Code:
def quick_sort(arr):
    if len(arr) <= 1:
        return arr
    pivot = arr[0]  # Other pivot choices are possible
    left = [x for x in arr[1:] if x <= pivot]
    right = [x for x in arr[1:] if x > pivot]
    return quick_sort(left) + [pivot] + quick_sort(right)
Applications:
• General-purpose sorting
• Quick and efficient in practice
Advantages:
• Very fast on average
• In-place version exists
Disadvantages:
• Worst case is very slow (if pivot is poorly chosen)
Time Complexity:
• Best/Average: O(n log n)
• Worst: O(n²)
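The in-place version mentioned above can be sketched with the Lomuto partition scheme (last element as pivot, one of several common choices; the function name is our own):

```python
def quick_sort_inplace(arr, low=0, high=None):
    """In-place quick sort using the Lomuto partition scheme."""
    if high is None:
        high = len(arr) - 1
    if low < high:
        # Partition around the last element as pivot
        pivot = arr[high]
        i = low - 1
        for j in range(low, high):
            if arr[j] <= pivot:
                i += 1
                arr[i], arr[j] = arr[j], arr[i]
        arr[i + 1], arr[high] = arr[high], arr[i + 1]
        p = i + 1                              # pivot's final position
        quick_sort_inplace(arr, low, p - 1)    # sort left part
        quick_sort_inplace(arr, p + 1, high)   # sort right part
    return arr

print(quick_sort_inplace([33, 10, 55, 26, 71, 29]))
# [10, 26, 29, 33, 55, 71]
```

Unlike the list-comprehension version above, this one rearranges the array it is given instead of building new lists, so it uses only O(log n) extra space for the recursion.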
5️⃣ Merge Sort
Definition:
Merge Sort is another divide and conquer algorithm. It splits the list into halves, sorts each half, then merges the
sorted halves.
Steps:
1. Divide the array into two halves.
2. Recursively sort both halves.
3. Merge the two sorted halves.
Python Code:
def merge_sort(arr):
    if len(arr) <= 1:
        return arr
    mid = len(arr) // 2
    left_half = merge_sort(arr[:mid])
    right_half = merge_sort(arr[mid:])
    return merge(left_half, right_half)

def merge(left, right):
    result = []
    i = j = 0
    while i < len(left) and j < len(right):
        if left[i] < right[j]:
            result.append(left[i])
            i += 1
        else:
            result.append(right[j])
            j += 1
    result.extend(left[i:])
    result.extend(right[j:])
    return result
Applications:
• External Sorting
• Stable sorting where order matters (e.g. by name, then by date)
Advantages:
• Always O(n log n)
• Stable
Disadvantages:
• Uses extra memory (not in-place)
Time Complexity:
• Best/Average/Worst: O(n log n)
Summary Table:
Algorithm | Best Case | Average Case | Worst Case | Space | Stable? | Use When...
Bubble Sort | O(n) | O(n²) | O(n²) | O(1) | Yes | Very small datasets
Insertion Sort | O(n) | O(n²) | O(n²) | O(1) | Yes | Nearly sorted data
Selection Sort | O(n²) | O(n²) | O(n²) | O(1) | No | Minimizing swaps
Quick Sort | O(n log n) | O(n log n) | O(n²) | O(log n) | No | Fastest in practice
Merge Sort | O(n log n) | O(n log n) | O(n log n) | O(n) | Yes | Large/stable/external sorting