Data Structures and Design Notes
The figure above shows the ADT model. There are two types of functions in the ADT model, i.e., public
functions and private functions. The ADT model also contains the data structures that we are using in a
program. In this model, encapsulation is performed first, i.e., all the data is wrapped in a single unit, the
ADT. Then abstraction is performed, i.e., the model shows only the operations that can be performed on the
data structure, while hiding the data structures being used in the program.
1. Class:
A class is a user-defined data type. It consists of data members and member functions, which can be
accessed and used by creating an instance of that class. It represents the set of properties or methods that are
common to all objects of one type. A class is like a blueprint for an object.
For example, consider the class of cars. There may be many cars with different names and brands, but all
of them share some common properties: all of them have 4 wheels, a speed limit, a mileage range, etc.
So here, Car is the class, and wheels, speed limit, and mileage are its properties.
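As a minimal Python sketch of this example (the attribute names are illustrative, not from the text):

class Car:
    # blueprint shared by every car object
    wheels = 4                      # property common to all cars

    def __init__(self, name, speed_limit, mileage):
        self.name = name            # differs from car to car
        self.speed_limit = speed_limit
        self.mileage = mileage

my_car = Car("Ferrari", 350, 6)
print(my_car.wheels, my_car.speed_limit)   # 4 350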
2. Object:
It is a basic unit of Object-Oriented Programming and represents real-life entities. An Object is an
instance of a Class. When a class is defined, no memory is allocated, but when it is instantiated (i.e., an
object is created), memory is allocated.
3. Data Abstraction:
Data abstraction is one of the most essential and important features of object-oriented programming. Data
abstraction refers to providing only essential information about the data to the outside world, hiding the
background details or implementation. Consider a real-life example of a man driving a car. The man only
knows that pressing the accelerator will increase the speed of the car and that applying the brakes will stop
it, but he does not know how the speed increases when the accelerator is pressed, nor does he know the
inner mechanism of the car or the implementation of the accelerator, brakes, etc. This is what
abstraction is.
4. Encapsulation:
Encapsulation is defined as the wrapping up of data under a single unit. It is the mechanism that binds
together code and the data it manipulates. In Encapsulation, the variables or data of a class are hidden from
any other class and can be accessed only through any member function of their class in which they are
declared. As in encapsulation, the data in a class is hidden from other classes, so it is also known as data-
hiding.
Consider a real-life example of encapsulation, in a company, there are different sections like the accounts
section, finance section, sales section, etc. The finance section handles all the financial transactions and
keeps records of all the data related to finance. Similarly, the sales section handles all the sales-related
activities and keeps records of all the sales. Now there may arise a situation when for some reason an official
from the finance section needs all the data about sales in a particular month. In this case, he is not allowed to
directly access the data of the sales section. He will first have to contact some other officer in the sales
section and then request him to give the particular data. This is what encapsulation is. Here the data of the
sales section and the employees that can manipulate them are wrapped under a single name “sales section”.
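A minimal Python sketch of this idea (class and attribute names are illustrative): the data of SalesSection is hidden and can be reached only through the section's own member functions.

class SalesSection:
    def __init__(self):
        self.__records = []          # hidden data: not visible outside the class

    def add_sale(self, record):      # member functions are the only way in
        self.__records.append(record)

    def get_monthly_data(self, month):
        return [r for r in self.__records if r["month"] == month]

sales = SalesSection()
sales.add_sale({"month": "May", "amount": 1200})
print(sales.get_monthly_data("May"))   # data obtained via a request, not directly
# print(sales.__records)               # AttributeError: the data is hidden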
5. Inheritance:
Inheritance is an important pillar of OOP(Object-Oriented Programming). The capability of a class to derive
properties and characteristics from another class is called Inheritance. When we write a new class, we can
inherit properties from an existing class. So when we create a class, we do not need to write all the
properties and functions again and again, as these can be inherited from another class that possesses them.
Inheritance allows the user to reuse code whenever possible and reduces redundancy.
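For instance (an illustrative sketch, not from the text), a Car class can inherit common properties from a Vehicle class instead of redefining them:

class Vehicle:
    def __init__(self, wheels):
        self.wheels = wheels

    def describe(self):
        return "vehicle with " + str(self.wheels) + " wheels"

class Car(Vehicle):              # Car derives properties from Vehicle
    def __init__(self):
        super().__init__(4)      # reuse the base-class code instead of rewriting it

print(Car().describe())          # vehicle with 4 wheels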
6. Polymorphism:
The word polymorphism means having many forms. In simple words, we can define polymorphism as the
ability of a message to be displayed in more than one form. For example, a person can have different
characteristics at the same time: a man is simultaneously a father, a husband, and an employee. So the same
person possesses different behavior in different situations. This is called polymorphism.
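A small Python sketch of the same idea (names are illustrative): the same message, describe(), is displayed in a different form depending on the role that receives it.

class Father:
    def describe(self):
        return "takes care of the family"

class Employee:
    def describe(self):
        return "works for the company"

for role in (Father(), Employee()):
    print(role.describe())       # one message, many forms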
7. Dynamic Binding:
In dynamic binding, the code to be executed in response to a function call is decided at runtime; the code
associated with a given procedure call is not known until the call is made at run time. Dynamic Method
Binding: One of the main advantages of inheritance is that a derived class D has all the members of its base
class B. As long as D does not hide any of the public members of B, an object of D can represent B in any
context where a B could be used. This feature is known as subtype polymorphism.
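A hedged Python sketch of dynamic binding (names are illustrative): the call shape.area() is bound at run time to the override provided by the actual object, so a derived-class object can stand in for a base-class object.

class Shape:                         # base class B
    def area(self):
        return 0.0

class Circle(Shape):                 # derived class D, hiding none of B's members
    def __init__(self, r):
        self.r = r

    def area(self):                  # override selected at run time
        return 3.14159 * self.r ** 2

def report(shape):                   # written against the base class
    print(shape.area())

report(Circle(2.0))                  # dynamic binding picks Circle.area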
8. Message Passing:
It is a form of communication used in object-oriented programming as well as in parallel programming.
Objects communicate with one another by sending and receiving information. A message for an object is a
request for the execution of a procedure, and therefore it will invoke a function in the receiving object that
generates the desired results. Message passing involves specifying the name of the object, the name of the
function, and the information to be sent.
Why do we need object-oriented programming?
To make the development and maintenance of projects more effortless.
To provide the feature of data hiding, which is good for security concerns.
To model and solve real-world problems naturally.
It ensures code reusability.
It lets us write generic code that works with a range of data, so we don't have to write the basic stuff
over and over again.
1.3 Python Classes and Objects
Python is an object oriented programming language.
Almost everything in Python is an object, with its properties and methods.
Create a Class
To create a class, use the keyword class:
Example
class MyClass:
    x = 5
Create Object
Now we can use the class named MyClass to create objects:
Example
Create an object named p1, and print the value of x:
p1=MyClass()
print(p1.x)
The __init__() Function
The examples above are classes and objects in their simplest form, and are not really useful in real life
applications.
To understand the meaning of classes we have to understand the built-in __init__() function.
All classes have a function called __init__(), which is always executed when the class is being initiated.
Use the __init__() function to assign values to object properties, or other operations that are necessary to do
when the object is being created:
class Person:
    def __init__(self, name, age):
        self.name = name
        self.age = age

p1 = Person("John", 36)
print(p1.name)
print(p1.age)
Object Methods
Objects can also contain methods. Methods in objects are functions that belong to the object.
Let us create a method in the Person class:
class Person:
    def __init__(self, name, age):
        self.name = name
        self.age = age

    def myfunc(self):
        print("Hello my name is " + self.name)

p1 = Person("John", 36)
p1.myfunc()
The self Parameter
The self parameter is a reference to the current instance of the class, and is used to access variables that
belong to the class.
It does not have to be named self, you can call it whatever you like, but it has to be the first parameter of any
function in the class:
class Person:
    def __init__(mysillyobject, name, age):
        mysillyobject.name = name
        mysillyobject.age = age
1. Big O – O: The notation O(𝒏) is the formal way of representing the upper bound of an algorithm's
complexity.
O(g(n)) = { f(n): there exist positive constants c and n0 such that 0 <= f(n) <= c*g(n) for all n >= n0 }
2. Big Omega – Ω: The notation Ω(𝒏) is the formal way of representing the lower bound of an algorithm's
complexity.
Ω(g(n)) = { f(n): there exist positive constants c and n0 such that 0 <= c*g(n) <= f(n) for all n >= n0 }
3. Big Theta – Ө: The notation Ө(𝒏) is the formal way of representing both the lower bound and the upper
bound of an algorithm's complexity.
Θ(g(n)) = { f(n): there exist positive constants c1, c2 and n0 such that 0 <= c1*g(n) <= f(n) <= c2*g(n) for all
n >= n0 }
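As a small worked example (not from the text), the Big O definition can be applied to f(n) = 3n + 2 with g(n) = n:

\[
0 \le 3n + 2 \le 3n + n = 4n \quad \text{for all } n \ge 2,
\]

so the constants c = 4 and n0 = 2 satisfy the definition, and hence 3n + 2 = O(n).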
1.7 Divide and conquer
Divide and conquer is a problem-solving technique in computer science that breaks down large problems
into smaller, more manageable subproblems. The solutions to the subproblems are then combined to solve
the original problem.
The steps of a divide and conquer algorithm are:
Divide: Break the problem into smaller subproblems
Conquer: Solve the subproblems recursively
Combine: Merge the solutions of the subproblems to solve the original problem
Divide and conquer is more efficient than brute force approaches. It's used in many algorithms, including:
Merge sort: Divides an array into two halves, calls itself for each half, and then merges the two sorted
halves
Quick sort: Another algorithm based on the divide and conquer method
Designing efficient divide and conquer algorithms can be difficult. The correctness of a divide and conquer
algorithm is usually proved by mathematical induction, and its computational cost is often determined by
solving recurrence relations.
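A minimal Python sketch of the divide-and-conquer pattern, using merge sort as the example (illustrative code, not from the text):

def merge_sort(a):
    if len(a) <= 1:                   # base case
        return a
    mid = len(a) // 2
    left = merge_sort(a[:mid])        # divide + conquer the first half
    right = merge_sort(a[mid:])       # divide + conquer the second half
    merged, i, j = [], 0, 0           # combine: merge the sorted halves
    while i < len(left) and j < len(right):
        if left[i] <= right[j]:
            merged.append(left[i]); i += 1
        else:
            merged.append(right[j]); j += 1
    merged.extend(left[i:])
    merged.extend(right[j:])
    return merged

print(merge_sort([38, 27, 43, 3, 9, 82, 10]))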
1.8 Recursion
The term Recursion can be defined as the process of defining something in terms of itself. In simple words,
it is a process in which a function calls itself directly or indirectly.
Advantages of using recursion
A complicated function can be broken down into smaller sub-problems using recursion.
Sequence creation is simpler through recursion than through nested iteration.
Recursive functions make the code look simple and effective.
Disadvantages of using recursion
Recursive calls take a lot of memory and time, which makes recursion expensive to use.
Recursive functions are challenging to debug.
The reasoning behind recursion can sometimes be tough to think through.
Example: Factorial
n! = 1•2•3...n and 0! = 1 (called the initial case)
So the recursive definition is n! = n•(n-1)!
Algorithm F(n)
if n = 0 then return 1 // base case
else return F(n-1)•n // recursive call
Basic operation? The multiplication in the recursive call.
The formula for the number of multiplications,
M(n) = M(n-1) + 1,
is a recursive formula too. This is typical.
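The algorithm above translates directly into Python (a straightforward sketch):

def factorial(n):
    if n == 0:                       # base case: 0! = 1
        return 1
    return factorial(n - 1) * n      # one multiplication per call

print(factorial(5))                  # 120, using M(5) = 5 multiplications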
In general, the solution to the inhomogeneous problem is equal to the sum of the solution to the homogeneous
problem plus a solution to the inhomogeneous part alone. The undetermined coefficients of the solution for the
homogeneous problem are used to satisfy the initial conditions (IC).
We guess at I(n) and then determine the new IC for the homogeneous problem for B(n).
There is the Master Theorem, which gives the asymptotic limit for many common problems.
• In ADT the implementation details are hidden. Hence the ADT will be-
AbstractDataType List
{
Instances: List is a collection of elements which are arranged in a linear manner.
Operations: Various operations that can be carried out on list are -
1. Insertion: This operation is for insertion of element in the list.
2. Deletion: This operation removes an element from the list.
3. Searching: Based on the value of the key element the desired element can be searched.
4. Modification: The value of the specific element can be changed without changing its location.
5. Display: The list can be displayed in forward or in backward manner.
}
• The List can be implemented in two ways:
1. Array based implementation.
2. Linked List based implementation.
2.2 Array based Implementation
• A linked list that is represented using arrays is called a static linked list.
• In this section we will discuss in detail how exactly the list can be represented using arrays.
• Basically, a list is a collection of elements.
• To represent the list using arrays we will have 'data' and 'next' fields in the array.
• The array can be created as array of structure as
struct node
{
int data;
int next;
} a[10];
• For instance: Consider a list of (10,20,30,40,50). We can store it in arrays as :
While creating the list we have to first enter the location in an array where the first node is placed and then input the
pair: Data and next.
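A hedged Python sketch of the same idea (the dictionary-based slots and the fixed creation order are illustrative simplifications): each array slot plays the role of struct node, holding a data value and the index of the next node, with -1 marking the end of the list.

a = [{"data": None, "next": -1} for _ in range(10)]   # array of 'nodes'

def create(values, start=0):
    # place the nodes in consecutive slots beginning at 'start'
    for i, v in enumerate(values):
        a[start + i]["data"] = v
        a[start + i]["next"] = start + i + 1 if i < len(values) - 1 else -1
    return start                      # location of the first node

def display(head):
    while head != -1:                 # follow the 'next' indices
        print(a[head]["data"], end=" ")
        head = a[head]["next"]
    print()

head = create([10, 20, 30, 40, 50])
display(head)                         # 10 20 30 40 50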
2. Display
After creation we can display the list. Normally when the list is displayed simply the data fields are to be displayed.
In Insert function, consider that we have already created a list (10, 20, 30, 40) as
Now using for loop we will search 'temp' in array 'a'. When we get the value 'temp' we will come out of loop by using
break statement and check for whether the next location is empty or not.
3. Deletion of a node
While deleting a node from the list we simply manipulate the next pointer of the previous node in the list, and
the data field of the node to be deleted is initialized to -1.
Consider that the list is
Limitations
1. There is a limitation on the number of nodes in the list because of the fixed size of the array. Hence memory
may get wasted when there are few elements in the list, or there may be so many nodes that some elements
cannot be stored in the array.
2. Insertion and deletion of elements in the array (list) are complicated. (Just refer to the functions Insert() and
Delete() and see how complicated they are!)
if (flag == TRUE)
{
    Head = New;
    temp = Head;
}
Step 4:
If the user wants to enter more elements, then, say for value 30, the scenario will be:
Step 5:
As the value of temp now becomes NULL, we come out of the while loop. As a result of the display routine,
10 20 30 40 50 → NULL
will be printed on the console.
3. Insertion of any element at anywhere in the linked list
There are three possible cases when we want to insert an element in the linked list -
a) Insertion of a node as a head node.
b) Insertion of a node as a last node.
c) Insertion of a node after some node.
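A minimal Python sketch covering these three cases (node and field names are illustrative):

class Node:
    def __init__(self, data):
        self.data = data
        self.next = None

def insert_head(head, data):          # case a) new head node
    node = Node(data)
    node.next = head
    return node

def insert_last(head, data):          # case b) new last node
    node = Node(data)
    if head is None:
        return node
    temp = head
    while temp.next is not None:
        temp = temp.next
    temp.next = node
    return head

def insert_after(target, data):       # case c) after some node
    node = Node(data)
    node.next = target.next
    target.next = node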
Suppose we want to delete node 30. Then we will search the node containing 30, using search (*head, key) routine.
Mark the node to be deleted as temp. Then we will obtain previous node of temp using get_prev () function. Mark
previous node as prev
Suppose key = 30, i.e., we want the node containing value 30; then compare temp → data with the key value. If
there is no match, we mark the next node as temp.
Thus we can create a circular linked list by inserting one more element 50. It is as shown below
• Representation
The linked representation of a doubly linked list is
Thus the doubly linked list can traverse in both the directions, forward as well as backwards.
Now, there may be one question in your mind: how does an empty doubly circular linked list look? Here
is the representation.
That means the prev and next pointers are pointing to the self node.
Logic explanation of DLL program
1. Creation of doubly linked list
Step 1: Initially one variable flag is taken whose value is initialized to TRUE. The purpose of this flag is for making a
check on creation of first node. After creating first node reset flag (i.e. assign FALSE to flag)
Step 2: If head node is created, we can further create a linked list by attaching the subsequent nodes. Suppose, we
want to insert a node with value 20 then
As temp now becomes NULL, we come out of the while loop. Hence, as a result, the forward display will be
10 → 20 → 30 → 40
Reverse Display
First of all we must reach at the last node by using while statement.
Case 2: Insertion of a node at the end. Suppose we want to insert a node with value 40; then first move the temp
pointer to the last node.
4. Deletion of node
Suppose we want to delete a node with value 20; then first search for this node and call it temp.
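A short Python sketch of a doubly linked list with forward and reverse display (illustrative, using the prev and next fields described above):

class DNode:
    def __init__(self, data):
        self.data = data
        self.prev = None
        self.next = None

def display_forward(head):
    temp = head
    while temp is not None:
        print(temp.data, end=" ")
        temp = temp.next
    print()

def display_reverse(head):
    temp = head
    while temp.next is not None:      # first reach the last node
        temp = temp.next
    while temp is not None:           # then walk back via prev
        print(temp.data, end=" ")
        temp = temp.prev
    print()

head = DNode(10); second = DNode(20)
head.next = second; second.prev = head
display_forward(head)                 # 10 20
display_reverse(head)                 # 20 10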
Unit III
Sorting And Searching
Bubble sort – selection sort – insertion sort – merge sort – quick sort – analysis of sorting algorithms – linear
search – binary search – hashing – hash functions – collision handling – load factors, rehashing, and
efficiency
3.1 Bubble Sort
This is the simplest kind of sorting method. In this method, we carry out the bubble sort procedure in several
iterations, which are called passes.
Sort the following list of numbers using bubble sort technique 52, 1, 27, 85, 66, 23, 13, 57.
Sol.
Pass 1 to Pass 6: (the array contents after each pass of the worked sorting examples were shown in figures that are not reproduced here)
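A minimal Python bubble sort sketch for the list above (illustrative code):

def bubble_sort(a):
    n = len(a)
    for p in range(n - 1):                 # passes
        for i in range(n - 1 - p):         # compare adjacent elements
            if a[i] > a[i + 1]:
                a[i], a[i + 1] = a[i + 1], a[i]   # swap
    return a

print(bubble_sort([52, 1, 27, 85, 66, 23, 13, 57]))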
Algorithm
MergeSort(int A[0...n-1],low,high)
//Problem Description: This algorithm is for
//sorting the elements using merge sort
//Input: Array A of unsorted elements, low as //beginning
//pointer of array A and high as end pointer of array A
//Output: Sorted array A[0...n-1]
if(low < high)then
{
    mid←(low+high)/2 //split the list at mid
    MergeSort(A,low,mid) //first sublist
    MergeSort(A,mid+1,high) //second sublist
    Merge(A,low,mid,high) //merge the two sorted sublists
}
Choice of Pivot
There are many different choices for picking pivots.
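A hedged Python quick sort sketch; it uses the last element as the pivot, which is only one of the common choices mentioned above:

def quick_sort(a, low, high):
    if low < high:
        p = partition(a, low, high)    # pivot lands in its final place
        quick_sort(a, low, p - 1)
        quick_sort(a, p + 1, high)

def partition(a, low, high):
    pivot = a[high]                    # pivot choice: last element
    i = low - 1
    for j in range(low, high):
        if a[j] <= pivot:
            i += 1
            a[i], a[j] = a[j], a[i]
    a[i + 1], a[high] = a[high], a[i + 1]
    return i + 1

nums = [52, 1, 27, 85, 66, 23, 13, 57]
quick_sort(nums, 0, len(nums) - 1)
print(nums)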
From Fig. 6.1.1, an array is maintained to store the students' records. The records are not sorted at all. If we
want to search for the student record whose roll number is 12, then using the key (roll number) we examine
every record to check whether it is the one with roll number 12. We obtain such a record at the Array[4]
location.
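A minimal linear search sketch in Python (the record layout is simplified to a list of roll numbers):

def linear_search(records, key):
    for i, roll_number in enumerate(records):   # examine every record
        if roll_number == key:
            return i                            # position where it is found
    return -1                                   # key not present

rolls = [7, 3, 25, 9, 12, 18]
print(linear_search(rolls, 12))                 # 4, i.e. Array[4]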
Advantages of linear searching
Step 1: The key element to be searched is 99, i.e., key = 99.
Step 2: Find the middle element of the array and compare it with the key:
is middle < key?
i.e., is 42 < 99?
Since 42 < 99, search sublist 2.
Now handle only sublist 2. Again divide it and find the mid of sublist 2:
is middle = key?
i.e., is 99 = 99?
So a match is found at the 7th position of the array, i.e., at array[6].
Thus by the binary search method we can find that the element 99 present in the given list is at the array[6]
location.
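A runnable Python version of these steps (a standard iterative sketch; the sample list is illustrative):

def binary_search(a, key):
    low, high = 0, len(a) - 1
    while low <= high:
        mid = (low + high) // 2
        if a[mid] == key:
            return mid                 # match found
        elif a[mid] < key:
            low = mid + 1              # search the right sublist
        else:
            high = mid - 1             # search the left sublist
    return -1

a = [1, 13, 23, 42, 57, 66, 99]        # sorted list; 42 is the first middle
print(binary_search(a, 99))            # 6, i.e. array[6]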
3.7 Hashing
Hashing is an effective way to reduce the number of comparisons. Hashing deals with the idea of providing
the direct address of the record where the record is likely to be stored. To understand the idea clearly let us
take an example -
Suppose a manufacturing company has an inventory file that consists of fewer than 1000 parts. Each part has
a unique 7-digit number. The number is called the 'key', and the record for that key contains the part name.
If there are fewer than 1000 parts then a 1000-element array can be used to store the complete file. Such an
array will be indexed from 0 to 999. Since the key number is 7 digits, it is converted to 3 digits by taking
only the last three digits of the key. This is shown in Fig. 7.1.1.
Observe in Fig. 7.1.1 that the first key is 4967000 and it is stored at the 0th position. The second key is
8421002; the last three digits indicate position 2 in the array. Let us search for the element 4957397.
Naturally it will be obtained at position 397. This method of searching is called hashing. The function that
converts the key (7 digits) into an array position is called the hash function. Here the hash function is
h(key) = key % 1000
where key % 1000 gives the array position (the hash address) at which the keyed record is stored.
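A tiny Python sketch of this hash function (the keys follow the example above):

def h(key):
    return key % 1000                 # last three digits give the position

table = [None] * 1000                 # array indexed 0 to 999
for key in (4967000, 8421002, 4957397):
    table[h(key)] = key

print(h(4957397))                     # 397: the record is found directly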
3) Bucket: The hash function H(key) is used to map several dictionary entries into the hash table. Each
position of the hash table is called a bucket.
4) Collision: A collision is a situation in which the hash function returns the same address for more than one
record.
For example
5) Probe: Each calculation of an address and test for success is known as a probe.
6) Synonym: The set of keys that hash to the same location are called synonyms. For example, in the hash
table computation given above, 25 and 55 are synonyms.
7) Overflow: When the hash table becomes full and a new record needs to be inserted, it is called
overflow.
For example -
3.9.1 Chaining
1. Chaining without replacement
In collision handling, chaining is a concept which introduces an additional field with the data, i.e., the
chain. A separate chain table is maintained for colliding data. When a collision occurs, we store the
second colliding data by the linear probing method. The address of this colliding data is stored with
the first colliding element in the chain table, without replacement.
For example consider elements,
131, 3, 4, 21, 61, 6, 71, 8, 9
From the example, you can see that a chain is maintained for the numbers that demand location 1.
First, number 131 comes and we place it at index 1. Next comes 21, but a collision occurs (see Fig. 7.4.2,
Chaining without replacement), so by linear probing we place 21 at index 2, and the chain is maintained
by writing 2 in the chain table at index 1. Similarly, next comes 61; by linear probing we can place 61 at
index 5, and the chain will be maintained at index 2. Thus any element whose hash key is 1 will be
stored by linear probing at an empty location, but a chain is maintained so that traversing the hash table
will be efficient.
The drawback of this method lies in finding the next empty location: we ignore the fact that the element
which actually belongs to that empty location can then no longer obtain its proper location. This means
the logic of the hash function gets disturbed.
3.9.2 Chaining with replacement
As the previous method has the drawback of losing the meaning of the hash function, to overcome this
drawback the method known as chaining with replacement is introduced. Let us discuss an example
to understand the method. Suppose we have to store the following elements:
131, 21, 31, 4, 5
Now the next element is 2. The hash function indicates hash key 2, but at index 2 we have already
stored element 21. We also know that index 2 is not the proper position of 21 (its hash key is 1).
Hence we will replace 21 by 2, and the chain table will be updated accordingly. See the table:
The value -1 in the hash table and chain table indicates an empty location.
The advantage of this method is that the meaning of the hash function is preserved, but each time
some logic is needed to test whether the element is at its proper position.
3.9.3 Open Addressing
Open addressing is a collision handling technique in which the entire hash table is searched in a
systematic way for an empty cell to insert the new item when a collision occurs.
Various techniques used in open addressing are
1. Linear probing 2. Quadratic probing 3. Double hashing
1. Linear probing
In the hash table given in Fig. 7.4.3 the hash function used is number % 10. The first number is 131, and
131 % 10 = 1, i.e., the remainder is 1, so the hash key = 1. That means we are supposed to place the record
at index 1. The next number is 21, which also gives hash key 1, as 21 % 10 = 1. But 131 is already placed
at index 1; that means a collision has occurred. We will now apply linear probing. In this method, we
search for a place for number 21 starting from the location of 131. In this case we can place 21 at index 2,
then 31 at index 3. Similarly, 61 can be stored at index 6 because numbers 4 and 5 are stored before 61.
Because of this technique the searching becomes efficient, as we have to search only a limited list to
obtain the desired number.
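A short Python sketch of insertion with linear probing, using the number % 10 hash function from the example:

TABLE_SIZE = 10
table = [None] * TABLE_SIZE

def insert_linear(key):
    index = key % TABLE_SIZE               # hash key
    while table[index] is not None:        # collision: probe the next cell
        index = (index + 1) % TABLE_SIZE
    table[index] = key

for key in (131, 21, 31, 4, 5, 61):
    insert_linear(key)
print(table)    # [None, 131, 21, 31, 4, 5, 61, None, None, None]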
Problem with linear probing
One major problem with linear probing is primary clustering. Primary clustering is a process in
which a block of data is formed in the hash table when collision is resolved.
For example:
3.9.4 Quadratic probing
Now if we want to place 17, a collision will occur, as 17 % 10 = 7 and bucket 7 already has the element
37. Hence we will apply quadratic probing to insert this record in the hash table:
Hi(key) = (Hash(key) + i^2) % m
(17 + 1^2) % 10 = 8, when i = 1
It is observed that if we want to place all the necessary elements in the hash table, the size of the divisor
(m) should be twice as large as the total number of elements.
3.9.5 Double hashing
Double hashing is a technique in which a second hash function is applied to the key when a collision
occurs. By applying the second hash function we get the number of positions from the point of
collision at which to insert.
There are two important rules to be followed for the second function :
• It must never evaluate to zero.
• Must make sure that all cells can be probed.
The formulas to be used for double hashing are
H1(key) = key mod tablesize
H2(key) = M - (key % M)
Here M is a prime number smaller than the size of the table. The prime number smaller than table size 10
is 7.
Hence M = 7
H2(17) = 7 - (17 % 7) = 7 - 3 = 4
That means we have to insert the element 17 at 4 places from 37; in short, we have to take 4 jumps.
Therefore 17 will be placed at index 1.
Now to insert the number 55:
H1(55) = 55 % 10 = 5 ... collision
H2(55) = 7 - (55 % 7) = 7 - 6 = 1
That means we have to take one jump from index 5 to place 55. Finally the hash table will be -
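A compact Python sketch of this double-hashing scheme (table size 10 and M = 7, as in the example):

TABLE_SIZE, M = 10, 7
table = [None] * TABLE_SIZE

def h1(key):
    return key % TABLE_SIZE

def h2(key):
    return M - (key % M)                   # never zero, so probing always advances

def insert_double(key):
    index = h1(key)
    while table[index] is not None:        # collision: jump h2(key) cells
        index = (index + h2(key)) % TABLE_SIZE
    table[index] = key

for key in (37, 17):
    insert_double(key)
print(table.index(17))    # 1: 17 collided at index 7 and took h2(17) = 4 jumps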
3.10 Rehashing:
Rehashing is the process of increasing the size of a hashmap and redistributing the elements to new
buckets based on their new hash values. It is done to improve the performance of the hashmap and to
prevent collisions caused by a high load factor.
When a hashmap becomes full, the load factor (i.e., the ratio of the number of elements to the number
of buckets) increases. As the load factor increases, the number of collisions also increases, which can
lead to poor performance. To avoid this, the hashmap can be resized and the elements can be
rehashed to new buckets, which decreases the load factor and reduces the number of collisions.
During rehashing, all elements of the hashmap are iterated and their new bucket positions are
calculated using the new hash function that corresponds to the new size of the hashmap. This process
can be time-consuming but it is necessary to maintain the efficiency of the hashmap.
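A simplified Python sketch of rehashing (the threshold and growth factor are illustrative choices):

class HashMap:
    def __init__(self, size=8, max_load=0.75):
        self.buckets = [[] for _ in range(size)]
        self.count = 0
        self.max_load = max_load

    def put(self, key, value):
        self.buckets[hash(key) % len(self.buckets)].append((key, value))
        self.count += 1
        if self.count / len(self.buckets) > self.max_load:
            self._rehash()                         # load factor too high

    def _rehash(self):
        old = self.buckets
        self.buckets = [[] for _ in range(2 * len(old))]   # double the size
        for bucket in old:                         # iterate all elements and
            for key, value in bucket:              # recompute their positions
                self.buckets[hash(key) % len(self.buckets)].append((key, value))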
Unit IV
Tree Structures
Tree ADT – Binary Tree ADT – tree traversals – binary search trees – AVL trees – heaps – multiway search
trees
4.1 Trees ADT
A tree is a finite set of one or more nodes such that -
i) There is a specially designated node called root.
ii) The remaining nodes are partitioned into n >= 0 disjoint sets T1, T2, T3, ..., Tn, where
T1, T2, T3, ..., Tn are called the sub-trees of the root.
The concept of tree is represented by following Fig. 4.1.1.
Various operations that can be performed on the tree data structure are -
1. Creation of a tree.
2. Insertion of a node in the tree as a child of desired node.
3. Deletion of any node(except root node) from the tree.
4. Modification of the node value of the tree.
5. Searching particular node from the tree.
4.2 Basic Terminologies
Let us get introduced with some of the definitions or terms which are normally used.
1. Root
Root is a unique node in the tree to which further subtrees are attached. For above given tree, node 10 is a
root node.
2. Parent node
The node having further sub-branches is called a parent node. In Fig. 4.2.2, node 20 is the parent node of 40, 50
and 60.
3. Child nodes
The child nodes in above given tree are marked as shown below
4. Leaves
These are the terminal nodes of the tree.
For example -
6. Degree of tree
The maximum degree in the tree is degree of tree.
12. Sibling
The nodes with common parent are called siblings or brothers.
For example
From the above tree, if we want to delete the node having value 6, then we will set the left pointer of its parent
node to NULL. That is, the left pointer of the node having value 8 is set to NULL.
The idea of balancing a tree is obtained by calculating the balance factor of a tree.
• Definition of Balance Factor
The balance factor BF(T) of a node in a binary tree is defined as hL - hR, where hL and hR are the heights of the
left and right subtrees of T respectively.
After an insertion of a new node, if the balance condition gets destroyed, then the nodes on that path (from the
new node's insertion point to the root) need to be readjusted. That means only the affected subtree is to be
rebalanced.
• The rebalancing should be such that entire tree should satisfy AVL property.
4.6.2 Insertion
There are four different cases when rebalancing is required after insertion of new node.
1. An insertion of new node into left subtree of left child (LL).
2. An insertion of new node into right subtree of left child (LR).
3. An insertion of new node into left subtree of right child (RL).
4. An insertion of new node into right subtree of right child (RR).
There is a symmetry between case 1 and 4. Similarly symmetry exists between case 2 and 3.
Some modifications done on AVL tree in order to rebalance it is called rotations of AVL tree.
There are two types of rotations.
Single rotation
Double rotation
Insertion algorithm
1. Insert a new node as new leaf just as in ordinary binary search tree.
1. LL rotation
When node '1' gets inserted as a left child of node 'C', the AVL property gets destroyed, i.e., node 'A' has
balance factor +2.
The LL rotation has to be applied to rebalance the nodes.
2. RR rotation
When node '4' gets attached as right child of node 'C' then node 'A' gets unbalanced. The rotation which
needs to be applied is RR rotation
3. LR rotation
When node '3' is attached as a right child of node 'C', unbalancing occurs because of LR. Hence the LR
rotation needs to be applied.
4. RL rotation
When node '2' is attached as a left child of node 'C' then node 'A' gets unbalanced as its balance factor
becomes 2. Then RL rotation needs to be applied to rebalance the AVL tree.
4.6.4 Deletion
Even after deletion of any particular node from AVL tree, the tree has to be restructured in order to preserve
AVL property. And thereby various rotations need to be applied.
Algorithm for deletion
The deletion algorithm is more complex than insertion algorithm.
1.Search the node which is to be deleted.
2. a) If the node to be deleted is a leaf node then simply make it NULL to remove.
b) If the node to be deleted is not a leaf node i.e. node may have one or two children, then the node must be
swapped with its inorder successor. Once the node is swapped, we can remove this node.
3. Now we have to traverse back up the path towards root, checking the balance factor of every node along
the path. If we encounter unbalancing in some subtree then balance that subtree using appropriate single or
double rotations.
The deletion algorithm takes O(log n) time to delete any node.
Step 1: Compare 13 with root node 10. As 13 > 10, search right subbranch.
Step 2: Compare 13 with 18. As 13 < 18, search left subbranch.
Step 3: Compare 13 with 12. As 13 > 12, search right subbranch.
Step 4: Compare 13 with 13. Declare now "node is found".
The searching operation takes O(log n) time.
The main objective here is to keep the tree balanced at all times. Such balancing causes the depth of the tree
to remain at its minimum, and therefore the overall cost of search is reduced.
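The search steps above correspond to the following Python sketch (the tree is reconstructed from the steps, so treat it as illustrative):

class TreeNode:
    def __init__(self, key):
        self.key = key
        self.left = None
        self.right = None

def bst_search(node, key):
    while node is not None:
        if key == node.key:
            return node                # "node is found"
        elif key > node.key:
            node = node.right          # search the right sub-branch
        else:
            node = node.left           # search the left sub-branch
    return None

root = TreeNode(10)
root.right = TreeNode(18)
root.right.left = TreeNode(12)
root.right.left.right = TreeNode(13)
print(bst_search(root, 13).key)        # 13, found in four comparisons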
4.7 Binary Heap
In this section we will learn "What is heap?" and "How to construct heap?"
Definition: A heap is a complete binary tree or an almost complete binary tree in which every parent node is
either greater than or less than its child nodes.
A Max heap is a tree in which the value of each node is greater than or equal to the value of its children.
For example
A Min heap is a tree in which the value of each node is less than or equal to the value of its children.
The parent being consistently greater or lesser in a heap is called the parental property. Thus a heap has two
important properties: it is a complete (or almost complete) binary tree, and it satisfies the parental property.
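Python's standard heapq module maintains a min heap inside a plain list; a small sketch:

import heapq

data = [52, 1, 27, 85, 66]
heapq.heapify(data)            # rearrange the list into a min heap
print(data[0])                 # 1: the root is always the smallest element

heapq.heappush(data, 0)        # insertion preserves the parental property
print(heapq.heappop(data))     # 0: pop always removes the minimum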
Heap:
Before understanding the construction of heap let us learn/revise few basics that are required while
constructing heap.
• Level of binary tree: The root of the tree is always at level 0. Any node is always at a level one more than
its parent node's level.
For example:
• Height of the tree: The maximum level is the height of the tree. The height of the tree is also called depth
of the tree.
For example:
• Complete binary tree: A complete binary tree is a binary tree in which all the levels of the tree are filled
completely, except possibly the lowest level, where nodes are filled from the left as required.
• Almost complete binary tree: The almost complete binary tree is a tree in which
i) Each node has a left child whenever it has a right child. That means there is always a left child, but for a
left child there may not be a right child.
ii) The leaf in a tree must be present at height h or h-1. That means all the leaves are on two adjacent levels.
For example:
In Fig. 4.14.5 (a) the given binary tree is satisfying both the properties. That is, all the leaves are at either
level 2 or level 1 and when right child is present left child is always present.
In Fig. 4.14.5 (b), the given binary tree is not satisfying the first property; that is the right child is present for
node 20 but it has no left child. Hence it is not almost complete binary tree.
In Fig. 4.14.5 (c), the given binary tree is not satisfying the second property; that is the leaves are at level 3
and level 1 and not at adjacent levels. Hence it is not almost complete binary tree.
Node Structure: Each node in a multiway tree has numerous pointers pointing to the nodes that are
its children. From node to node, the number of pointers may differ.
Root Node: The root node, from which all other nodes can be accessed, is the node that acts as the
tree's origin.
Leaf Nodes: Nodes with no children are known as leaf nodes. Leaf nodes in a multiway tree can
have any number of children, even zero.
Height and Depth: Multiway trees have height (the maximum depth of the tree) and depth (the
distance from the root to a node). The depth and height of the tree might vary substantially due to the
different number of children.
4.8.2 Types of Multiway Trees:
Multiway trees come in a variety of forms, each with particular properties and uses. The most typical
varieties include:
General Multiway Trees: Each node in these trees is allowed to have an unlimited number of
offspring. They are utilized in a broad variety of applications, such as company organizational
hierarchies and directory structures in file systems.
B-trees: In file systems and databases, B-trees are a sort of self-balancing multiway tree. They are
effective for insertions, deletions, and searches because they are made to preserve a balanced
structure.
Ternary Trees: A particular variety of multiway trees in which each node has precisely three
children is known as a ternary tree. They are frequently incorporated into optimization and language
processing methods.
Quad Trees and Oct Trees: These are two examples of specialized multiway trees used in
geographic information systems and computer graphics for spatial segmentation and indexing. Oct
trees have eight children per node, compared to quad trees' four.
4.8.3 Operations on Multiway Trees:
Multiple operations are supported by multiway trees, allowing for effective data modification and retrieval.
The following are a few examples of basic operations:
Insertion: Adding a new node to the tree while making sure it preserves the structure and
characteristics of the tree.
Deletion: A node is deleted from the tree while still preserving its integrity.
Search: Finding a particular node or value within the tree using search.
Traversal: Traversal is the process of going through every node in the tree in a particular order, such
as pre-order, in-order, or post-order.
Versatility: Multiway trees are useful for a wide range of applications because they offer a versatile
way to describe hierarchical relationships.
Efficiency: For managing and organizing massive datasets in databases and file systems, certain
varieties of multiway trees, such as B-trees, are built for efficiency.
Parsing and Syntax Trees: Multiway trees are crucial for compilers and interpreters because they
are utilized in the parsing and syntax analysis of programming languages.
Spatial Data Structures: Multiway trees, such as quadtrees and oct trees, allow for effective spatial
indexing and partitioning in geographic information systems and computer graphics.
5.1.3 Representation of a Directed Graph as an Adjacency Matrix: The figure below shows a directed graph.
Initially, the entire matrix is initialized to 0. If there is an edge from source to destination, we insert 1 at that
particular adjMat[source][destination].
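A small Python sketch of building such an adjacency matrix (the edge list is illustrative):

V = 4
edges = [(0, 1), (0, 2), (1, 2), (2, 3)]   # (source, destination) pairs

adjMat = [[0] * V for _ in range(V)]       # entire matrix initialized to 0
for source, destination in edges:
    adjMat[source][destination] = 1        # directed: one entry per edge

for row in adjMat:
    print(row)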
∙ Weighted Graph
A graph G = (V, E) is called a labeled or weighted graph when each edge has a value or weight representing the
cost of traversing that edge.
∙ Directed Graph
A directed graph also referred to as a digraph, is a set of nodes connected by edges, each with a direction
∙ Undirected Graph
An undirected graph comprises a set of nodes and links connecting them. The order of the two connected vertices
is irrelevant and has no direction. You can form an undirected graph with a finite number of vertices and edges.
∙ Connected Graph
If there is a path between one vertex of a graph data structure and any other vertex, the graph is connected.
∙ Disconnected Graph
When there is no edge linking some of the vertices, you refer to the graph as a disconnected graph.
∙ Cyclic Graph
If a graph contains at least one graph cycle, it is considered to be cyclic.
∙ Acyclic Graph
When there are no cycles in a graph, it is called an acyclic graph.
The BFS algorithm starts at a chosen node and then runs a loop that continues until all nodes have been visited.
Step 1: We pick A as the starting point and add A to the Queue. To prevent cycles, we also mark A as visited (by
adding it to the visited set).
Step 2: We remove the head of the Queue (i.e., A now). This node was first in (inserted first) in the Queue. We
process A and pick all its neighbours that have not been visited yet (i.e., not in the visited set).
Step 3: Next, we pull the head of the Queue, i.e., D. We process D and consider all neighbours of D, which are
A and E; but since both A and E are in the visited set, we ignore them and move forward.
Step 4: Next, we pull the head of the Queue, i.e., E. We process E. Then we need to consider all neighbours of
E, which are A and D; but since both A and D are in the visited set, we ignore them and move forward.
Next, we pull the head of the Queue, i.e., C. We process C. Then we consider all neighbours of C, which are A
and B. Since A is in the visited set, we ignore it. But as B has not yet been visited, we visit B and add it to the
Queue.
Step 5: Finally, we pick B from the Queue; its neighbour is C, which has already been visited. We have nothing
else in the Queue to process, so we are done traversing the graph.
So the order in which we processed/explored the elements is A, D, E, C, B, which is the Breadth-First Search
of the above Graph.
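A Python sketch of this BFS traversal; the adjacency lists below are reconstructed from the steps above, so treat them as illustrative:

from collections import deque

graph = {
    "A": ["D", "E", "C"],
    "D": ["A", "E"],
    "E": ["A", "D"],
    "C": ["A", "B"],
    "B": ["C"],
}

def bfs(start):
    visited = {start}              # prevents cycles
    queue = deque([start])
    order = []
    while queue:                   # loop until all nodes are visited
        node = queue.popleft()     # remove the head of the Queue
        order.append(node)
        for neighbour in graph[node]:
            if neighbour not in visited:
                visited.add(neighbour)
                queue.append(neighbour)
    return order

print(bfs("A"))                    # ['A', 'D', 'E', 'C', 'B']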
5.3.2 DFS Graph Traversal in Data Structure
When traversing a graph, the DFS method goes as far as it can before turning around. This algorithm explores the
graph in depth-first order, starting with a given source node and then recursively visiting all of its surrounding
vertices before backtracking. DFS will analyze the deepest vertices in a branch of the graph before moving on to
other branches. To implement DFS, either recursion or an explicit stack might be utilized.
Graph Traversal: DFS Algorithm
Pseudo Code :
def dfs(graph, start_node, visited=None):
    if visited is None:
        visited = set()                    # avoid a shared mutable default
    visited.add(start_node)
    print(start_node)
    for neighbour in graph[start_node]:    # recursively visit unvisited neighbours
        if neighbour not in visited:
            dfs(graph, neighbour, visited)
Example
Let us see how the DFS algorithm works with an example. Here, we will use an undirected graph with
5 vertices.
Reachability Relation: In DAG, we can determine if there is a reachability relation between two nodes. Node A
is said to be reachable from node B if there exists a directed path that starts at node B and ends at node A. This
implies that you can follow the direction of edges in the graph to get from B to A.
Transitive Closure: The transitive closure of a directed graph is a new graph that represents all the direct and
indirect relationships or connections between nodes in the original graph. In other words, it tells you which nodes
can be reached from other nodes by following one or more directed edges.
Topological Sorting
Topological sorting for Directed Acyclic Graph (DAG) is a linear ordering of vertices such that for every directed
edge u-v, vertex u comes before v in the ordering.
topological_sort(N, adj[N][N])
    T = []
    visited = []
    in_degree = []
    for i = 0 to N
        in_degree[i] = visited[i] = 0
    for i = 0 to N
        for j = 0 to N
            if adj[i][j] is TRUE
                in_degree[j] = in_degree[j] + 1
    for i = 0 to N
        if in_degree[i] is 0
            enqueue(Queue, i)
            visited[i] = TRUE
    while Queue is not Empty
        vertex = get_front(Queue)
        dequeue(Queue)
        T.append(vertex)
        for j = 0 to N
            if adj[vertex][j] is TRUE and visited[j] is FALSE
                in_degree[j] = in_degree[j] - 1
                if in_degree[j] is 0
                    enqueue(Queue, j)
                    visited[j] = TRUE
    return T
Dijkstra’s Algorithm
Step 2: Check for adjacent nodes. Now we have two choices (either choose Node 1 with distance 2 or choose
Node 2 with distance 6), and we choose the node with the minimum distance. In this step Node 1 is the
minimum-distance adjacent node, so we mark it as visited and add up the distance.
Distance: Node 0 -> Node 1 = 2
Step 3: Then move forward and check for the adjacent node, which is Node 3; mark it as visited and add up the
distance. Now the distance will be:
Distance: Node 0 -> Node 1 -> Node 3 = 2 + 5 = 7
Step 5: Again, move forward and check for the adjacent node, which is Node 6; mark it as visited and add up
the distance. Now the distance will be:
Distance: Node 0 -> Node 1 -> Node 3 -> Node 4 -> Node 6 = 2 + 5 + 10 + 2 = 19
So, the shortest distance from the source vertex is 19, which is the optimal one.
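A compact Python sketch of Dijkstra's algorithm using a priority queue; the graph below is reconstructed from the distances in the steps above (the original figure is not reproduced), so treat it as illustrative:

import heapq

def dijkstra(graph, source):
    # graph: {node: [(neighbour, edge_weight), ...]}
    dist = {node: float("inf") for node in graph}
    dist[source] = 0
    pq = [(0, source)]                     # (distance so far, node)
    while pq:
        d, u = heapq.heappop(pq)           # expand the minimum-distance node
        if d > dist[u]:
            continue                       # stale queue entry
        for v, w in graph[u]:
            if d + w < dist[v]:            # relax the edge
                dist[v] = d + w
                heapq.heappush(pq, (dist[v], v))
    return dist

graph = {0: [(1, 2), (2, 6)], 1: [(3, 5)], 2: [], 3: [(4, 10)], 4: [(6, 2)], 6: []}
print(dijkstra(graph, 0)[6])               # 19, matching the example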