Unit 1
Data Structures
Definition
● A data structure is a way of organizing and storing data to perform operations efficiently.
● It defines a set of rules for how data is arranged in memory and how operations can be
performed on that data.
● The choice of data structure depends on the specific requirements and constraints of a
given problem.
● Efficiency: Well-designed data structures allow for efficient storage and retrieval of
information. They can significantly impact the performance of algorithms and overall
system efficiency.
● Organization: Data structures help in organizing and managing data in a structured
manner. This organization is crucial for maintaining the integrity and coherence of the
data.
● Abstraction: Data structures provide a level of abstraction, allowing programmers to
work with high-level concepts without worrying about low-level implementation details.
Data Representation
Data representation is a fundamental concept in computer science that deals with how
data is organized and stored in a computer system. Data structures are the building blocks of data
representation, providing efficient ways to organize and manage data for various purposes.
Data can be represented in various forms depending on its type and intended use. Some common
types of data representation include:
1. Primitive Data Types: These are basic data types that are directly supported by the
programming language, such as integers, floating-point numbers, strings, and booleans.
2. Structured Data: This type of data is organized into a hierarchical or relational structure,
such as arrays, structs, and records.
3. Unstructured Data: This type of data does not have a predefined structure, such as text,
images, and audio.
4. Data in Memory: Data is stored in memory using binary representation, where each data
element is represented as a sequence of bits.
5. Data on Storage Devices: Data is stored on storage devices using various formats, such as
binary file formats, databases, and cloud storage.
Primitive data types are the basic building blocks of data representation. They include integers, floating-point numbers, characters, strings, and booleans.
Structured Data
Structured data is organized into a hierarchical or relational structure, which makes it easier to
store, manage, and access. Common examples of structured data include:
● Arrays: Arrays are collections of elements of the same data type that are stored in
contiguous memory locations and accessed by index.
● Structs: Structs are collections of named data elements, or fields, grouped under one
type; for example, a struct can store a person's name and age together.
● Records: Records are similar to structs, but they are immutable, meaning that their field
values cannot be changed after they are created.
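The three kinds of structured data above can be sketched in C++ (the document's other examples use C/C++, so the sketch is given in that language; the `Person` and `PersonRecord` type names and the `haveBirthday` helper are illustrative, not part of any standard library):

```cpp
#include <string>

// An array: elements of one type stored in contiguous memory.
int numbers[5] = {10, 20, 30, 40, 50};

// A struct: a collection of named fields that can be modified.
struct Person {
    std::string name;
    int age;
};

// A record-like type: the same fields, made immutable with const
// members, so the values cannot change after construction.
struct PersonRecord {
    const std::string name;
    const int age;
};

// Increments and returns the person's age; struct fields are mutable.
int haveBirthday(Person& p) {
    p.age += 1;
    return p.age;
}
```

Trying to assign to a `PersonRecord` field (e.g. `r.age = 26;`) would not compile, which is how this sketch models the immutability of a record.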
Unstructured Data
Unstructured data does not have a predefined format or organization. This makes it more
difficult to store, manage, and access than structured data. However, unstructured data is often
more expressive and informative than structured data. Common examples of unstructured data
include:
● Text: Text is a sequence of characters that can be used to represent human language, such
as the text of this document.
● Images: Images are representations of visual information, such as photographs, drawings,
or paintings. They are typically stored as arrays of pixels, where each pixel is represented
by a color value.
● Audio: Audio is a representation of sound waves, such as music, speech, or
environmental sounds. It is typically stored as an array of samples, where each sample is
a representation of the sound pressure at a particular point in time.
Abstract Data Types
An abstract data type (ADT) is a mathematical model for a type of data structure that
specifies its behavior without specifying its implementation details. ADTs provide a high-level
abstraction for data structures, allowing programmers to focus on the logic of their programs
without worrying about the underlying implementation. Common ADTs include:
1. Lists: Lists are a collection of elements that can be added, removed, accessed, and
searched. Common operations include insert, delete, search, and retrieve.
2. Stacks: Stacks are LIFO (Last In, First Out) data structures, where elements are added
and removed from the top. Common operations include push, pop, and peek.
3. Queues: Queues are FIFO (First In, First Out) data structures, where elements are added
to the rear and removed from the front. Common operations include enqueue, dequeue,
and front.
4. Trees: Trees are hierarchical data structures, where nodes are connected in a parent-child
relationship. Common operations include insert, delete, search, and traversal.
5. Graphs: Graphs are a collection of nodes connected by edges, representing relationships
between entities. Common operations include add node, add edge, remove node, remove
edge, and search.
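To make the ADT idea concrete, here is a minimal sketch of a stack (the class name and fixed capacity are ours, chosen for illustration, not a production design): callers use only push, pop, and peek, and never see the underlying array.

```cpp
#include <stdexcept>

// A fixed-capacity stack ADT: the array and counter are hidden
// implementation details; users interact only through the interface.
class IntStack {
    int data[100];   // hidden storage
    int top_ = 0;    // number of elements currently stored
public:
    void push(int x) {                      // add to the top
        if (top_ == 100) throw std::overflow_error("stack full");
        data[top_++] = x;
    }
    int pop() {                             // remove from the top
        if (top_ == 0) throw std::underflow_error("stack empty");
        return data[--top_];
    }
    int peek() const {                      // access the top element
        if (top_ == 0) throw std::underflow_error("stack empty");
        return data[top_ - 1];
    }
    bool empty() const { return top_ == 0; }
};
```

The same interface could be reimplemented with a linked list without changing any calling code, which is exactly the separation of behavior from implementation that the ADT concept describes.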
Algorithm:
● The word Algorithm means "a set of finite rules or instructions to be followed in calculations or
other problem-solving operations" or "a procedure for solving a mathematical problem in a
finite number of steps that frequently involves recursive operations".
Just as one would not follow arbitrary written instructions to cook a recipe, but only a standard one,
not all written instructions for programming are algorithms. For a set of instructions to be an
algorithm, it must have the following characteristics:
Clear and Unambiguous: The algorithm should be clear and unambiguous. Each of its steps should be
clear in all aspects and must lead to only one meaning.
Well-Defined Inputs: If an algorithm takes inputs, they should be well-defined. An algorithm may
take zero or more inputs.
Well-Defined Outputs: The algorithm must clearly define what output will be yielded and it should be
well-defined as well. It should produce at least 1 output.
Finiteness: The algorithm must be finite, i.e., it should terminate after a finite number of steps.
Feasible: The algorithm must be simple, generic, and practical, so that it can be executed with the
available resources. It must not rely on future technology or unavailable resources.
Language Independent: The Algorithm designed must be language-independent, i.e. it must be just
plain instructions that can be implemented in any language, and yet the output will be the same, as
expected.
Properties of Algorithm:
· It should terminate after a finite time.
· It should produce at least one output.
· It should take zero or more inputs.
· It should be deterministic, meaning giving the same output for the same input case.
· Every step in the algorithm must be effective i.e. every step should do some work.
Asymptotic Notations
Asymptotic notations describe how an algorithm's running time (or space) grows as the input
size n grows.
● Big O (O): an upper bound. If an algorithm has a time complexity of O(n), its running
time grows at most linearly with the size of the input.
● Big Omega (Ω): a lower bound. If an algorithm has a time complexity of Ω(n), its
running time grows at least as fast as a linear function of the input.
● Big Theta (Θ): a tight bound. If an algorithm has a time complexity of Θ(n), its running
time grows exactly in proportion to the size of the input.
Algorithm analysis is the process of studying the performance of algorithms. This involves
determining the time and space complexity of an algorithm.
Time complexity is a measure of how long an algorithm takes to run, while space complexity is
a measure of how much memory an algorithm uses.
● Big O notation: Big O notation is a mathematical notation used to describe the upper
bound of an algorithm's growth rate. Writing O(g(n)) means that the algorithm's running
time grows no faster than a constant multiple of g(n) as n approaches infinity.
● Worst-case analysis: Worst-case analysis is the process of determining the maximum
time or space complexity of an algorithm. This is typically done by identifying the input
that causes the algorithm to take the longest time or use the most memory.
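As an illustration of worst-case analysis, consider linear search (a sketch; the function name is ours). The worst-case input is one where the key is absent or sits in the last position, forcing all n comparisons:

```cpp
// Linear search: returns the index of key in arr[0..n-1], or -1.
// Best case: key at index 0 (1 comparison).
// Worst case: key absent or at the last index (n comparisons),
// so the worst-case time complexity is O(n).
int linearSearch(const int arr[], int n, int key) {
    for (int i = 0; i < n; i++) {
        if (arr[i] == key) return i;
    }
    return -1;
}
```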
Recursion
Recursion can be a powerful tool for solving problems, but it can also be difficult to understand
and implement. It is important to carefully design recursive algorithms to avoid problems such as
stack overflow.
Example: Factorial
The factorial of a non-negative integer n is the product of all positive integers less than or equal
to n. For example, the factorial of 5 is 120, because 120 = 1 × 2 × 3 × 4 × 5.
Here is a recursive algorithm to calculate the factorial of a non-negative integer n in C:
C
int factorial(int n) {
    if (n == 0) {
        return 1;
    } else {
        return n * factorial(n - 1);
    }
}
This algorithm works by recursively calling itself to calculate the factorial of n - 1, and then
multiplying the result by n. This algorithm has a time complexity of O(n).
The same algorithm in pseudocode:
factorial(n):
    if n is 0
        return 1
    return n * factorial(n-1)
Time complexity
If we look at the pseudocode again (repeated below for convenience), we notice that:
● factorial(0) performs only one comparison (1 unit of time)
● factorial(n) performs 1 comparison, 1 multiplication, and 1 subtraction, plus the time for factorial(n-1)
factorial(n):
    if n is 0
        return 1
    return n * factorial(n-1)
From the above analysis we can write:
T(n) = T(n - 1) + 3
T(0) = 1
T(n) = T(n-1) + 3
= T(n-2) + 6
= T(n-3) + 9
= T(n-4) + 12
= ...
= T(n-k) + 3k
as we know T(0) = 1
we need to find the value of k for which n - k = 0, k = n
T(n) = T(0) + 3n , k = n
= 1 + 3n
This gives us a time complexity of O(n).
Space complexity
For every call to the recursive function, its state is saved on the call stack until the value is
computed and returned to the calling function.
We do not allocate an explicit stack here, but an implicit call stack is maintained:
f(6) → f(5) → f(4) → f(3) → f(2) → f(1) → f(0)
f(6) → f(5) → f(4) → f(3) → f(2) → f(1)
f(6) → f(5) → f(4) → f(3) → f(2)
f(6) → f(5) → f(4) → f(3)
f(6) → f(5) → f(4)
f(6) → f(5)
f(6)
The rightmost call on each line is the one currently being executed. As you can see, for f(6) a
stack depth of 6 is required until the call reaches f(0) and a value is finally computed.
Hence, for the factorial of N, a stack of size N is implicitly allocated for storing the state of the
function calls.
The space complexity of the recursive factorial implementation is therefore O(n).
● A data structure is not only used for organizing the data. It is also used for processing,
retrieving, and storing data.
● There are different basic and advanced types of data structures that are used in almost
every program or software system that has been developed.
The following are some fundamental terminologies used whenever the data structures are
involved:
1. Data: We can define data as an elementary value or a collection of values. For example,
the Employee's name and ID are the data related to the Employee.
2. Data Items: A single unit of value is known as a Data Item.
3. Group Items: Data Items that have subordinate data items are known as Group Items.
For example, an employee's name can have a first, middle, and last name.
4. Elementary Items: Data Items that cannot be divided into sub-items are known as
Elementary Items. For example, the ID of an Employee.
5. Entity and Attribute: A class of certain objects is represented by an Entity. It consists of
different Attributes, and each Attribute symbolizes a specific property of that Entity. For
example, an Employee entity may have attributes such as Name, ID, and Salary.
➔ Entities with similar attributes form an Entity Set. Each attribute of an entity set has a
range of values, the set of all possible values that could be assigned to the specific
attribute.
➔ The term "information" is sometimes used for meaningful or processed data, i.e., data
with its attributes given.
6. Field: A single elementary unit of information symbolizing an Attribute of an Entity is
known as a Field.
7. Record: A collection of related data items is known as a Record. For example, for the
employee entity, the name, ID, address, and job title can be grouped to form the record
for that employee.
8. File: A collection of different Records of one entity type is known as a File. For example,
if there are 100 employees, the related file will contain 100 records, one with data about
each employee.
Classification of Data Structures
A Data Structure delivers a structured set of variables related to each other in various ways. It forms the
basis of a programming tool that signifies the relationship between the data elements and allows
programmers to process the data efficiently.
General Operations:
● Traversing: Visiting each element in the data structure exactly once. Can be done in
different ways like preorder, inorder, postorder for trees, or iteratively for arrays and lists.
● Searching: Finding a specific element within the data structure based on a given
criterion. Different algorithms like linear search, binary search, or hash tables are used
depending on the structure.
● Insertion: Adding a new element to the data structure at a specific position or according
to some rule. Different insertion methods are used for each structure, like appending to a
list, pushing onto a stack, or inserting into a specific node in a tree.
● Deletion: Removing an element from the data structure. Similar to insertion, different
methods are used based on the structure and desired behaviour.
● Sorting: Arranging the elements in the data structure in a specific order (ascending,
descending, custom). Various sorting algorithms like bubble sort, merge sort, quick sort,
etc., are used with different time and space complexities.
● Merging: Combining two or more sorted data structures into a single sorted structure.
Used for efficiently handling large datasets, often employing merge sort algorithm.
Specific Operations:
● Stacks: Push (add to top), Pop (remove from top), Peek (access top element).
● Queues: Enqueue (add to back), Dequeue (remove from front), Peek (access front
element).
● Linked Lists: Insert new node at specific position, Delete node, Traversal based on
pointers.
● Trees: Inorder, preorder, postorder traversal, Finding specific node, Insertion with
balancing (AVL, Red-Black trees), Deletion with rebalancing.
● Graphs: Breadth-First Search (BFS), Depth-First Search (DFS), Finding shortest path,
Topological sorting.
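The stack and queue operations listed above map directly onto the C++ standard container adaptors. A small sketch (the demo function names are ours):

```cpp
#include <stack>
#include <queue>

// Pushes 1, 2, 3 onto a stack and returns the element on top.
// LIFO order: the last value pushed (3) is the first one out.
int stackDemo() {
    std::stack<int> s;
    s.push(1); s.push(2); s.push(3);   // push: add to the top
    int top = s.top();                 // peek: access the top element
    s.pop();                           // pop: remove from the top
    return top;
}

// Enqueues 1, 2, 3 and returns the element at the front.
// FIFO order: the first value enqueued (1) is the first one out.
int queueDemo() {
    std::queue<int> q;
    q.push(1); q.push(2); q.push(3);   // enqueue: add to the rear
    int front = q.front();             // front: access the front element
    q.pop();                           // dequeue: remove from the front
    return front;
}
```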
What is an Array?
An array is a collection of items of the same variable type that are stored at contiguous
memory locations. It’s one of the most popular and simple data structures and is often
used to implement other data structures. Each item in an array is indexed starting with 0.
Representation of Array
The representation of an array can be defined by its declaration. A declaration means
allocating memory for an array of a given size.
Example:
int arr[5]; // This array will store integer type elements
char arr[10]; // This array will store char type elements
float arr[20]; // This array will store float type elements
Types of arrays:
There are majorly two types of arrays:
● One-dimensional array (1-D arrays): You can imagine a 1d array as a row, where
elements are stored one after another.
● Two-dimensional array: 2-D Multidimensional arrays can be considered as an array of
arrays or as a matrix consisting of rows and columns.
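A 2-D array can be pictured as rows and columns. The sketch below (a minimal illustration; the matrix values and the `sumMatrix` helper are ours) stores a 2×3 matrix and visits every element row by row:

```cpp
// A 2-D array: 2 rows and 3 columns, stored row by row in memory.
int matrix[2][3] = {
    {1, 2, 3},   // row 0
    {4, 5, 6}    // row 1
};

// Sums every element by visiting each row, then each column in it.
int sumMatrix() {
    int total = 0;
    for (int row = 0; row < 2; row++) {
        for (int col = 0; col < 3; col++) {
            total += matrix[row][col];
        }
    }
    return total;
}
```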
Traversing (Iterating) an Array:
● Example:
C++
for (int i = 0; i < 5; i++) {
    cout << numbers[i] << " "; // Prints each element separated by a space
}
● Output: 10 20 30 40 50
A Linked List is a linear data structure which looks like a chain of nodes, where each node is a
different element. Unlike Arrays, Linked List elements are not stored at a contiguous location.
It is basically chains of nodes, each node contains information such as data and a pointer to the
next node in the chain. In the linked list there is a head pointer, which points to the first element
of the linked list, and if the list is empty then it simply points to null or nothing.
Listed below are a few advantages of a linked list; they help explain why it is worth knowing.
● Dynamic Data structure: The size of memory can be allocated or de-allocated at run time
based on the operation insertion or deletion.
● Ease of Insertion/Deletion: The insertion and deletion of elements are simpler than in
arrays, since no elements need to be shifted after insertion or deletion; only the pointers
need to be updated.
● Efficient Memory Utilization: As we know Linked List is a dynamic data structure the
size increases or decreases as per the requirement so this avoids the wastage of memory.
● Implementation: Various advanced data structures can be implemented using a linked list
like a stack, queue, graph, hash maps, etc.
In a singly linked list, traversal can be done in the forward direction only, because each node
links only to its next node.
● Node creation and insertion at the beginning:
// Node structure
struct Node {
    int data;
    Node* next;
};

// Inserts a new node at the beginning of the list
void insertAtBeginning(Node** head, int value) {
    Node* newNode = new Node;
    newNode->data = value;
    newNode->next = *head;
    *head = newNode;
}

int main() {
    Node* head = nullptr; // Initially empty list
    insertAtBeginning(&head, 10);
    insertAtBeginning(&head, 20);
    insertAtBeginning(&head, 30);
    return 0;
}
Complexity Analysis:
● Insertion at the beginning: O(1) time complexity, as it only involves creating a new node
and updating the head pointer.
● Printing the list: O(n) time complexity, as it requires iterating through each node in the
list.
● Other operations:
○ Insertion at the end: O(n) time complexity in general, since the list must be
traversed to reach the last node (O(1) if a tail pointer is maintained).
○ Deletion at the beginning: O(1) time complexity; deletion at the end: O(n), since
the node before the last one must be found.
○ Deletion at a specific position: O(n) time complexity (requires finding the node to
delete).
○ Searching for a value: O(n) time complexity, since nodes must be visited
sequentially; binary search is not practical on a linked list because it lacks
random access.
Space Complexity:
● Linked lists have a space complexity of O(n), as each node requires additional memory
for the data and pointer fields.
Algorithm Efficiency:
● Refers to how well an algorithm uses computational resources (time and memory) to solve a
problem.
● A more efficient algorithm typically uses fewer resources for a given input size.
Time Complexity:
● Measures how the execution time of an algorithm grows as the input size increases.
● Expressed using Big O notation (e.g., O(1), O(n), O(log n), O(n^2)).
● Common time complexities:
○ O(1): Constant time, independent of input size.
○ O(log n): Logarithmic time, grows slowly with input size.
○ O(n): Linear time, grows directly with input size.
○ O(n^2): Quadratic time, grows as the square of input size.
Space Complexity:
● Measures how much memory an algorithm uses as the input size increases.
● Also expressed using Big O notation.
● Common space complexities:
○ O(1): Constant space, independent of input size.
○ O(n): Linear space, grows directly with input size.
Key Considerations:
● Algorithm Choice: Different algorithms for the same problem often have different time and space
complexities. Choosing the most efficient algorithm for a given situation is crucial.
● Input Size: The importance of efficiency becomes more evident as input sizes grow larger.
● Trade-offs: Sometimes algorithms with better time complexity have worse space complexity, or
vice versa. Determining the best balance depends on the specific problem and resource
constraints.
Example:
● Linear Search: O(n) time complexity (scans all elements in the worst case), O(1) space
complexity (only uses a few variables).
● Binary Search: O(log n) time complexity (divides the search space in half repeatedly), O(1) space
complexity.
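The binary search mentioned above can be sketched as follows (the function name is ours; it requires a sorted array, and each iteration halves the search range, giving O(log n) time):

```cpp
// Binary search on a sorted array: returns the index of key, or -1.
// Each iteration halves the range [lo, hi], so at most O(log n)
// iterations run; only a few variables are used, so space is O(1).
int binarySearch(const int arr[], int n, int key) {
    int lo = 0, hi = n - 1;
    while (lo <= hi) {
        int mid = lo + (hi - lo) / 2;   // avoids overflow of lo + hi
        if (arr[mid] == key) return mid;
        if (arr[mid] < key) lo = mid + 1;   // key is in the right half
        else hi = mid - 1;                  // key is in the left half
    }
    return -1;
}
```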
Understanding time and space complexity is essential for designing efficient algorithms and choosing
appropriate data structures for different tasks.