Data Structures and Design Notes

The document provides an overview of Abstract Data Types (ADTs) and Object-Oriented Programming (OOP) concepts, including classes, objects, inheritance, polymorphism, and encapsulation, particularly in the context of Python. It also discusses the importance of algorithm analysis, focusing on time and space complexity, and introduces asymptotic notations for measuring algorithm efficiency. Additionally, it covers shallow and deep copying, namespaces, and the significance of OOP in simplifying project development and maintenance.


Department of Artificial Intelligence and Data Science

(An Autonomous Institution)


Unit I
Abstract Data Types
Abstract Data Types (ADTs) – ADTs and classes – introduction to OOP – classes in Python – inheritance –
namespaces – shallow and deep copying – Introduction to analysis of algorithms – asymptotic notations –
divide & conquer – recursion – analyzing recursive algorithms
1.1 Abstract data type model
Before knowing about the abstract data type model, we should know about abstraction and encapsulation.
Abstraction: The technique of hiding internal details from the user and showing only the necessary details.
Encapsulation: The technique of combining data and the member functions that operate on it into a single unit.

In the ADT model there are two kinds of functions: public functions and private functions. The ADT model also
contains the data structures that we are using in a program. In this model, encapsulation is performed first, i.e., all
the data is wrapped into a single unit, the ADT. Then abstraction is performed, i.e., only the operations that can be
performed on the data structure are exposed, while the data structures used in the program stay hidden.
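The separation above can be sketched in Python. This is a minimal illustration, not from the source: the class name `StackADT` and its methods are hypothetical; the public methods are the abstract interface, while the underlying list is the encapsulated implementation detail.

```python
class StackADT:
    """A stack ADT: public methods form the abstract interface;
    the underlying list is an encapsulated implementation detail."""

    def __init__(self):
        self._items = []          # private data structure, hidden from users

    # --- public functions (the abstract interface) ---
    def push(self, item):
        self._items.append(item)

    def pop(self):
        if self.is_empty():
            raise IndexError("pop from empty stack")
        return self._items.pop()

    def is_empty(self):
        return len(self._items) == 0

s = StackADT()
s.push(10)
s.push(20)
print(s.pop())        # 20
```

A user of `StackADT` only needs to know push/pop/is_empty; the list could later be swapped for a linked structure without changing any caller.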

Introduction to Object-Oriented Programming


1.2 OOPs Concepts:
 Class
 Objects
 Data Abstraction
 Encapsulation
 Inheritance
 Polymorphism
 Dynamic Binding
 Message Passing

1. Class:
A class is a user-defined data type. It consists of data members and member functions, which can be
accessed and used by creating an instance of that class. It represents the set of properties or methods that are
common to all objects of one type. A class is like a blueprint for an object.
For Example: Consider the class of cars. There may be many cars with different names and brands, but all of
them will share some common properties: all of them will have 4 wheels, a speed limit, a mileage range,
etc. Here, Car is the class, and wheels, speed limit, and mileage are its properties.
2. Object:
It is a basic unit of Object-Oriented Programming and represents the real-life entities. An Object is an
instance of a Class. When a class is defined, no memory is allocated, but when it is instantiated (i.e. an object is
created) memory is allocated. An object has an identity, state, and behavior. Each object contains data and code to
manipulate that data. Objects can interact without knowing the details of each other's data or code; it is sufficient
to know the type of message accepted and the type of response returned by the objects.
For example, "Dog" is a real-life object with characteristics such as color and breed, and behaviors such as
barking, sleeping, and eating.

SubCode:AD3251 Subject Name: Data Structures Design


3. Data Abstraction:
Data abstraction is one of the most essential and important features of object-oriented programming. Data
abstraction refers to providing only essential information about the data to the outside world, hiding the
background details or implementation. Consider the real-life example of a man driving a car. The man only
knows that pressing the accelerator will increase the speed of the car and that applying the brakes will stop it,
but he does not know how the speed increases on pressing the accelerator, nor does he know about the inner
mechanism of the car or the implementation of the accelerator, brakes, etc. This is what
abstraction is.
4. Encapsulation:
Encapsulation is defined as the wrapping up of data under a single unit. It is the mechanism that binds
together code and the data it manipulates. In Encapsulation, the variables or data of a class are hidden from
any other class and can be accessed only through any member function of their class in which they are
declared. As in encapsulation, the data in a class is hidden from other classes, so it is also known as data-
hiding.

Consider a real-life example of encapsulation, in a company, there are different sections like the accounts
section, finance section, sales section, etc. The finance section handles all the financial transactions and
keeps records of all the data related to finance. Similarly, the sales section handles all the sales-related
activities and keeps records of all the sales. Now there may arise a situation when for some reason an official
from the finance section needs all the data about sales in a particular month. In this case, he is not allowed to
directly access the data of the sales section. He will first have to contact some other officer in the sales
section and then request him to give the particular data. This is what encapsulation is. Here the data of the
sales section and the employees that can manipulate them are wrapped under a single name “sales section”.
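The company analogy can be sketched in Python using name mangling for the hidden data. The class `SalesSection` and its methods are hypothetical names chosen for illustration; the double-underscore prefix keeps the records from being accessed directly from outside the class.

```python
class SalesSection:
    """Encapsulation sketch: sales records are kept private and can only
    be reached through the class's own member functions."""

    def __init__(self):
        self.__records = []               # hidden from other classes (name-mangled)

    def add_sale(self, amount):
        self.__records.append(amount)

    def monthly_total(self):              # the sanctioned way to read the data
        return sum(self.__records)

sales = SalesSection()
sales.add_sale(1200)
sales.add_sale(800)
print(sales.monthly_total())   # 2000 — obtained via a member function, not directly
```

Trying `sales.__records` from outside the class raises an AttributeError, which is the "ask an officer in the sales section" step of the analogy.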
5. Inheritance:
Inheritance is an important pillar of OOP(Object-Oriented Programming). The capability of a class to derive
properties and characteristics from another class is called Inheritance. When we write a class, we inherit
properties from other classes. So when we create a class, we do not need to write all the properties and
functions again and again, as these can be inherited from another class that possesses it. Inheritance allows
the user to reuse the code whenever possible and reduce its redundancy.


6. Polymorphism:
The word polymorphism means having many forms. In simple words, we can define polymorphism as the
ability of a message to be displayed in more than one form. For example, a person can have different
characteristics at the same time: a man may simultaneously be a father, a husband, and an employee. So the same
person possesses different behavior in different situations. This is called polymorphism.
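The "same message, many forms" idea can be sketched in Python: the class names below (`Dog`, `Cat`) and their method are illustrative, not from the source. Each class answers the same `speak()` message in its own form.

```python
class Dog:
    def speak(self):
        return "Woof"

class Cat:
    def speak(self):
        return "Meow"

# The same message 'speak()' takes a different form for each object.
for animal in (Dog(), Cat()):
    print(animal.speak())
```

This style, where any object that understands the message can be used, is often called duck typing in Python.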

7. Dynamic Binding:
In dynamic binding, the code to be executed in response to the function call is decided at runtime. Dynamic
binding means that the code associated with a given procedure call is not known until the time of the call at
run time. Dynamic method binding relies on one of the main advantages of inheritance: a derived class D
has all the members of its base class B. As long as D does not hide any of the public members of B, an object
of D can represent a B in any context where a B could be used. This feature is known as subtype
polymorphism.
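A minimal sketch of dynamic binding in Python, under the same B/D naming the text uses (the method name `describe` and the helper `show` are illustrative assumptions):

```python
class B:
    def describe(self):
        return "base behaviour"

class D(B):
    def describe(self):                 # overrides B.describe
        return "derived behaviour"

def show(obj):
    # Which describe() runs is decided at run time from obj's actual type,
    # not from the type the caller expected.
    return obj.describe()

print(show(B()))   # base behaviour
print(show(D()))   # derived behaviour — a D stands in wherever a B is expected
```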
8. Message Passing:
It is a form of communication used in object-oriented programming as well as parallel programming.
Objects communicate with one another by sending and receiving information to each other. A message for
an object is a request for execution of a procedure and therefore will invoke a function in the receiving
object that generates the desired results. Message passing involves specifying the name of the object, the
name of the function, and the information to be sent.
Why do we need object-oriented programming?
 To make the development and maintenance of projects more effortless.
 To provide data hiding, which is good for security.
 To model and solve real-world problems naturally.
 To ensure code reusability.
 To let us write generic code that works with a range of data, so we don't have to write the basic stuff over
and over again.
1.3 Python Classes and Objects
Python is an object oriented programming language.
Almost everything in Python is an object, with its properties and methods.



A Class is like an object constructor, or a "blueprint" for creating objects.

Create a Class
To create a class, use the keyword class:
Example

Create a class named MyClass, with a property named x:

class MyClass:
    x = 5
Create Object
Now we can use the class named MyClass to create objects:
Example
Create an object named p1, and print the value of x:
p1 = MyClass()
print(p1.x)
The __init__() Function
The examples above are classes and objects in their simplest form, and are not really useful in real life
applications.
To understand the meaning of classes we have to understand the built-in __init__() function.
All classes have a function called __init__(), which is always executed when an object of the class is being created.
Use the __init__() function to assign values to object properties, or other operations that are necessary to do
when the object is being created:
class Person:
    def __init__(self, name, age):
        self.name = name
        self.age = age

p1 = Person("John", 36)
print(p1.name)
print(p1.age)
Object Methods
Objects can also contain methods. Methods in objects are functions that belong to the object.
Let us create a method in the Person class:
class Person:
    def __init__(self, name, age):
        self.name = name
        self.age = age

    def myfunc(self):
        print("Hello my name is " + self.name)

p1 = Person("John", 36)
p1.myfunc()
The self Parameter
The self parameter is a reference to the current instance of the class, and is used to access variables that
belong to the class.
It does not have to be named self, you can call it whatever you like, but it has to be the first parameter of any
function in the class:
class Person:
    def __init__(mysillyobject, name, age):
        mysillyobject.name = name
        mysillyobject.age = age

    def myfunc(abc):
        print("Hello my name is " + abc.name)

p1 = Person("John", 36)
p1.myfunc()
Inheritance in Python
One of the core concepts in object-oriented programming (OOP) languages is inheritance. It is a mechanism
that allows you to create a hierarchy of classes that share a set of properties and methods by deriving a class
from another class. Inheritance is the capability of one class to derive or inherit the properties from another
class.
Benefits of inheritance are:
Inheritance allows one class (the derived class) to inherit the properties of another class (the base class). The
benefits of inheritance in Python are as follows:
 It represents real-world relationships well.
 It provides the reusability of a code. We don’t have to write the same code again and again. Also, it
allows us to add more features to a class without modifying it.
 It is transitive in nature, which means that if class B inherits from another class A, then all the subclasses
of B would automatically inherit from class A.
 Inheritance offers a simple, understandable model structure.
 Inheritance results in lower development and maintenance expenses.
Python Inheritance Syntax
The syntax of simple inheritance in Python is as follows:
class BaseClass:
    # body of base class

class DerivedClass(BaseClass):
    # body of derived class
Creating a Parent Class
A parent class is a class whose properties are inherited by the child class. Let’s create a parent class called
Person which has a Display method to display the person’s information.
class Person(object):
    def __init__(self, name, id):
        self.name = name
        self.id = id

    def Display(self):
        print(self.name, self.id)

emp = Person("Satyam", 102)  # An Object of Person
emp.Display()
Creating a Child Class
A child class is a class that derives its properties from its parent class. Here Emp is another class that is
going to inherit the properties of the Person class (base class).
class Emp(Person):
    def Print(self):
        print("Emp class called")

Emp_details = Emp("Mayank", 103)
Emp_details.Display()
Emp_details.Print()
Example of Inheritance in Python
Let us see an example of simple Python inheritance in which a child class is inheriting the properties of its
parent class. In this example, ‘Person’ is the parent class, and ‘Employee’ is its child class.



class Person(object):
    def __init__(self, name):
        self.name = name

    def getName(self):
        return self.name

    def isEmployee(self):
        return False

class Employee(Person):
    def isEmployee(self):
        return True

emp = Person("Geek1")    # An Object of Person
print(emp.getName(), emp.isEmployee())
emp = Employee("Geek2")  # An Object of Employee
print(emp.getName(), emp.isEmployee())
1.4 Namespace
A namespace in computing is a collection of identifiers that are used to uniquely identify
objects. Namespaces are used to organize code into logical groups and to prevent name collisions. They can
be used in a variety of contexts, including:
 File systems: Assign names to files
 Programming languages: Organize variables and subroutines
 Computer networks: Assign names to resources like computers, printers, and websites
 Databases: Define groups of tables, where all table names must be unique
Namespaces are often structured as hierarchies, which allows names to be reused in different contexts. For
example, in a family, each person has a unique combination of first and last name. Within the family
namespace, "Jane" can refer to a specific person, but within the global namespace, the full name is required.
In Python, namespaces are collections of objects with unique names. The lifespan of a namespace depends
on the scope of a variable. There are three levels of scope: built-in, global, and local.
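The three scope levels can be demonstrated with a short sketch (the function name `outer` is an illustrative choice):

```python
x = "global"           # lives in the module's global namespace

def outer():
    x = "local"        # a different 'x' in the function's local namespace
    return x

print(len("abc"))      # 'len' is resolved in the built-in namespace
print(outer())         # local
print(x)               # global — the module-level x was never touched
```

Python resolves a name by searching the local, then enclosing, then global, then built-in namespaces, which is why the two `x` variables never collide.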

1.5 Shallow copying and deep copying


Shallow copying and deep copying are two different ways of duplicating an object or data structure:
 Shallow copy
Creates a new object, but copies only references to the objects contained in the original. The copied object and
the original therefore share their nested objects, so a change made to a shared nested object through one is
visible through the other. Shallow copies are faster than deep copies.
 Deep copy
Creates a new, independent copy of the original object and all of its data. The copied object and the original
object do not share any data, so changes to one will not affect the other. Deep copies are slower than shallow
copies and can consume more memory, especially for large or complex data structures.
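The difference is easy to see with Python's standard `copy` module on a nested list:

```python
import copy

original = [[1, 2], [3, 4]]

shallow = copy.copy(original)      # new outer list, shared inner lists
deep = copy.deepcopy(original)     # fully independent duplicate

original[0][0] = 99

print(shallow[0][0])   # 99 — the nested list is shared with the original
print(deep[0][0])      # 1  — the deep copy is unaffected
```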



1.6 Introduction to analysis of algorithms
COMPLEXITY OF ALGORITHMS
Generally algorithms are measured in terms of time complexity and space complexity.
1. Time Complexity
Time complexity of an algorithm is a measure of how much time is required to execute the algorithm for a given
number of inputs, and it is measured by its rate of growth relative to a standard function. It can be estimated by
counting the number of elementary steps the algorithm performs as a function of the input size.
2. Space Complexity
Space complexity of an algorithm is a measure of how much storage is required by the algorithm. Thus space
complexity is the amount of computer memory required during program execution, as a function of the number
of input elements. The space requirement of an algorithm has a fixed part determined at compile time and a
variable part that depends on the input at run time.
1.6.1 ASYMPTOTIC ANALYSIS
Analysis of an algorithm requires calculating its complexity in terms of resources such as time and space. The
complexity so calculated, however, does not give the exact amount of resources required. Therefore, instead of
the exact amount, the complexity of an algorithm is represented in a general mathematical form that captures the
basic nature of the algorithm.
Thus, analyzing an algorithm means deriving a mathematical formula for the amount of time and space required
for its execution. Asymptotic analysis evaluates the performance of an algorithm in terms of input size; it
describes how the time taken by the algorithm grows as the input size grows. It focuses on:
i) Analyzing the problem for large input sizes.
ii) Considering only the leading term of the formula, since the lower-order terms contribute less to the overall
complexity as the input grows larger.
iii) Ignoring the coefficient of the leading term.
1.6.2 ASYMPTOTIC NOTATION
The commonly used Asymptotic Notations to calculate the complexity of algorithms are:
1. Big Oh – O: The notation O(g(n)) is the formal way to express an upper bound on an algorithm's complexity.

O(g(n)) = { f(n): there exist positive constants c and n0 such that 0 <= f(n) <= c*g(n) for all n >= n0 }

2. Big Omega – Ω: The notation Ω(g(n)) is the formal way of representing a lower bound on an algorithm's
complexity.

Ω(g(n)) = { f(n): there exist positive constants c and n0 such that 0 <= c*g(n) <= f(n) for all n >= n0 }

3. Big Theta – Θ: The notation Θ(g(n)) is the formal way of representing both a lower bound and an upper bound
(a tight bound) on an algorithm's complexity.

SubCode:AD3251 Subject Name: Data Structures Design


Department of Artificial Intelligence and Data Science

(An Autonomous Institution)

Θ(g(n)) = {f(n): there exist positive constants c1, c2 and n0 such that 0 <= c1*g(n) <= f(n) <= c2*g(n) for all
n >= n0}
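The Big-O definition can be checked numerically for a concrete function. The example f(n) = 3n + 2 and the witnesses c = 4, n0 = 2 are illustrative choices, not from the source:

```python
# Claim: f(n) = 3n + 2 is O(n), i.e. f(n) <= c*g(n) with g(n) = n, c = 4, n0 = 2.
def f(n):
    return 3 * n + 2

c, n0 = 4, 2
# Verify the defining inequality over a range of n >= n0.
assert all(f(n) <= c * n for n in range(n0, 1000))
print("f(n) = 3n + 2 is O(n) with c = 4, n0 = 2")
```

Note the inequality fails for n < n0 (f(1) = 5 > 4), which is exactly why the definition allows a threshold n0.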
1.7 Divide and conquer
Divide and conquer is a problem-solving technique in computer science that breaks down large problems
into smaller, more manageable subproblems. The solutions to the subproblems are then combined to solve
the original problem.
The steps of a divide and conquer algorithm are:
 Divide: Break the problem into smaller subproblems
 Conquer: Solve the subproblems recursively
 Combine: Merge the solutions of the subproblems to solve the original problem
Divide and conquer is often more efficient than brute-force approaches. It's used in many algorithms, including:
 Merge sort: Divides an array into two halves, calls itself for each half, and then merges the two sorted
halves
 Quick sort: Another algorithm based on the divide and conquer method
Designing efficient divide and conquer algorithms can be difficult. The correctness of a divide and conquer
algorithm is usually proved by mathematical induction, and its computational cost is often determined by
solving recurrence relations.
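The three steps above can be seen directly in merge sort, sketched here in Python:

```python
def merge_sort(a):
    """Divide-and-conquer sort: divide, conquer recursively, combine."""
    if len(a) <= 1:                      # base case: already sorted
        return a
    mid = len(a) // 2
    left = merge_sort(a[:mid])           # divide + conquer each half
    right = merge_sort(a[mid:])
    # combine: merge the two sorted halves
    merged, i, j = [], 0, 0
    while i < len(left) and j < len(right):
        if left[i] <= right[j]:
            merged.append(left[i])
            i += 1
        else:
            merged.append(right[j])
            j += 1
    merged.extend(left[i:])
    merged.extend(right[j:])
    return merged

print(merge_sort([38, 27, 43, 3, 9, 82, 10]))   # [3, 9, 10, 27, 38, 43, 82]
```

Its cost follows the recurrence T(n) = 2T(n/2) + O(n), which solves to O(n log n).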

1.8 Recursion
The term Recursion can be defined as the process of defining something in terms of itself. In simple words,
it is a process in which a function calls itself directly or indirectly.
Advantages of using recursion
 A complicated function can be broken down into smaller sub-problems using recursion.
 Sequence creation is simpler through recursion than through nested iteration.
 Recursive functions make the code look simple and elegant.
Disadvantages of using recursion
 Recursive calls consume a lot of memory and time, which makes recursion expensive to use.
 Recursive functions are challenging to debug.
 The reasoning behind recursion can sometimes be tough to think through.

SubCode:AD3251 Subject Name: Data Structures Design


Department of Artificial Intelligence and Data Science

(An Autonomous Institution)


Example
def recursive_fibonacci(n):
    if n <= 1:
        return n
    else:
        return recursive_fibonacci(n - 1) + recursive_fibonacci(n - 2)

n_terms = 10
if n_terms <= 0:
    print("Invalid input! Please input a positive value")
else:
    print("Fibonacci series:")
    for i in range(n_terms):
        print(recursive_fibonacci(i))

1.9 Analysis of Recursive Algorithms

Example: Factorial
n! = 1·2·3·...·n and 0! = 1 (called the initial case)
So the recursive definition is n! = n·(n-1)!

Algorithm F(n)
    if n = 0 then return 1      // base case
    else return F(n-1)·n        // recursive call

Basic operation? The multiplication in the recursive call.
The formula for the number of multiplications,
M(n) = M(n-1) + 1,
is a recursive formula too. This is typical.

We need the initial condition that corresponds to the base case:

M(0) = 0

since the base case performs no multiplications.

Solve by the method of backward substitutions:

M(n) = M(n-1) + 1
     = [M(n-2) + 1] + 1 = M(n-2) + 2    (substituted M(n-2) + 1 for M(n-1))
     = [M(n-3) + 1] + 2 = M(n-3) + 3    (substituted M(n-3) + 1 for M(n-2))
     ...                                 (a pattern emerges)
     = M(0) + n
     = n

Therefore M(n) ∈ Θ(n).
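The result M(n) = n can be checked empirically by instrumenting the factorial algorithm. The function name `factorial_counted` is an illustrative choice:

```python
def factorial_counted(n):
    """Return (n!, number of multiplications performed)."""
    if n == 0:
        return 1, 0                       # base case: no multiplications
    value, mults = factorial_counted(n - 1)
    return value * n, mults + 1           # one multiplication per recursive call

for n in (0, 1, 5, 10):
    _, m = factorial_counted(n)
    assert m == n                         # matches M(n) = n
print("M(n) = n confirmed for the sampled values")
```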

Procedure for Recursive Algorithm


1. Specify the problem size.
2. Identify the basic operation.
3. Distinguish worst, best, and average cases.
4. Write a recurrence relation for the number of basic operations. Don't forget the initial conditions (IC).
5. Solve the recurrence relation to obtain the order of growth.

Recursive Algorithm for Fibonacci Numbers


Algorithm F(n)
    if n ≤ 1 then return n
    else return F(n-1) + F(n-2)

1. Problem size is n, the sequence number for the Fibonacci number


2. Basic operation is the sum in recursive call
3. No difference between worst and best case
4. Recurrence relation
A(n) = A(n-1) + A(n-2) + 1
IC: A(0) = A(1) = 0
or
A(n) - A(n-1) - A(n-2) = 1, an inhomogeneous recurrence because of the constant 1

In general the solution to the inhomogeneous problem is equal to the sum of the solution to the homogeneous
problem plus a particular solution to the inhomogeneous part. The undetermined coefficients of the solution to
the homogeneous problem are used to satisfy the IC.

In this case A(n) = B(n) + I(n) where


A(n) is solution to complete inhomogeneous problem
B(n) is solution to homogeneous problem



I(n) is a particular solution to only the inhomogeneous part of the problem

We guess a particular solution I(n) and then determine the new IC for the homogeneous problem for B(n).

For this problem the correct guess is the constant I(n) = -1 (check: -1 - (-1) - (-1) = 1).

Substitute A(n) = B(n) - 1 into the recurrence and get

B(n) - B(n-1) - B(n-2) = 0 with IC B(0) = B(1) = 1

This is the same relation as for F(n), but with different IC.

We do not really need the exact solution; we can conclude
A(n) = B(n) - 1 = F(n+1) - 1 ∈ Θ(φ^n), i.e., exponential.
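The identity A(n) = F(n+1) - 1 for the number of additions can be verified by instrumenting the recursive algorithm. The helper names `fib_counted` and `fib` are illustrative choices:

```python
def fib_counted(n):
    """Return (F(n), number of additions A(n) performed by the recursion)."""
    if n <= 1:
        return n, 0                       # base cases: no additions
    f1, a1 = fib_counted(n - 1)
    f2, a2 = fib_counted(n - 2)
    return f1 + f2, a1 + a2 + 1           # one addition per internal call

def fib(n):
    """Iterative Fibonacci, used only as an independent reference."""
    a, b = 0, 1
    for _ in range(n):
        a, b = b, a + b
    return a

for n in range(2, 15):
    _, additions = fib_counted(n)
    assert additions == fib(n + 1) - 1    # A(n) = F(n+1) - 1
print("A(n) = F(n+1) - 1 confirmed for n = 2..14")
```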

There is also the Master Theorem, which gives the asymptotic order of growth for many common divide-and-conquer recurrences.



Unit II
Linear Structures
List ADT – array-based implementations – linked list implementations – singly linked lists – circularly linked lists –
doubly linked lists – Stack ADT – Queue ADT – double-ended queues – applications
2.1 List ADT
• A list is a collection of elements in sequential order.
• In memory we can store a list in two ways. One way is to store the elements in sequential memory locations;
this is known as an array. The other way is to use pointers or links to associate the elements sequentially; this is
known as a linked list.

• In ADT the implementation details are hidden. Hence the ADT will be-
AbstractDataType List
{
Instances: List is a collection of elements which are arranged in a linear manner.
Operations: Various operations that can be carried out on a list are -
1. Insertion: Inserts an element into the list.
2. Deletion: Removes an element from the list.
3. Searching: Searches for the desired element based on the value of a key.
4. Modification: Changes the value of a specific element without changing its location.
5. Display: Displays the list in forward or backward order.
}
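The five operations of the specification above can be sketched as a Python class. The class name `ListADT` and its method names are illustrative, not part of the formal specification:

```python
class ListADT:
    """Sketch of the List ADT operations listed above."""

    def __init__(self):
        self._items = []

    def insert(self, pos, element):        # 1. Insertion
        self._items.insert(pos, element)

    def delete(self, element):             # 2. Deletion
        self._items.remove(element)

    def search(self, key):                 # 3. Searching by key value
        return key in self._items

    def modify(self, pos, element):        # 4. Modification (position unchanged)
        self._items[pos] = element

    def display(self, backward=False):     # 5. Display, forward or backward
        return self._items[::-1] if backward else list(self._items)

lst = ListADT()
for i, v in enumerate((10, 20, 30)):
    lst.insert(i, v)
lst.modify(1, 25)
print(lst.display())                # [10, 25, 30]
print(lst.display(backward=True))   # [30, 25, 10]
```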
• The List can be implemented by two ways
1. Array based implementation.
2. Linked List based implementation.
2.2 Array based Implementation
• A linked list that is represented using arrays is called a static linked list.
• In this section we will discuss in detail how exactly the list can be represented using arrays.
• Basically a list is a collection of elements.
• To represent the list using arrays we will have 'data' and 'link' fields in the array.
• The array can be created as an array of structures as
struct node
{
    int data;
    int next;
} a[10];
• For instance: Consider a list of (10,20,30,40,50). We can store it in arrays as :
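Since the original figure is not reproduced here, a Python sketch of the same idea follows: parallel `data` and `next` arrays form the static linked list, with -1 marking the end of the chain (the variable names are illustrative).

```python
# data[i] and nxt[i] together form node i; nxt[i] == -1 marks the end of the list.
# Storing the list (10, 20, 30, 40, 50) contiguously, with the head at index 0:
data = [10, 20, 30, 40, 50]
nxt  = [1, 2, 3, 4, -1]
head = 0

def traverse(head):
    """Follow the next indices from head and collect the data fields."""
    out, i = [], head
    while i != -1:
        out.append(data[i])
        i = nxt[i]
    return out

print(traverse(head))   # [10, 20, 30, 40, 50]
```

Because the links are indices rather than memory addresses, insertion only requires rewiring `nxt` values, not shifting elements.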


• Various operations that can be performed on static linked list are -


1. Creation of list.
2. Display of list.
3. Insertion of any element in the list.
4. Deletion of any element from the list.
5. Searching of desired element in the list.
Let us understand each operation one by one.
1. Creation of list
The list can be created by placing the node of list in an array. Each node consists of two fields 'data' and 'next' pointer.
We need to give the address of starting node which is to be placed in the array.
Thus the list can be created as follows

While creating the list we have to first enter the location in an array where the first node is placed and then input the
pair: Data and next.
2. Display
After creation we can display the list. Normally when the list is displayed simply the data fields are to be displayed.
In Insert function, consider that we have already created a list (10, 20, 30, 40) as

Suppose we want to insert value 21 after 20 then


Now using for loop we will search 'temp' in array 'a'. When we get the value 'temp' we will come out of loop by using
break statement and check for whether the next location is empty or not.

The list after insertion of 21 will look like this

3. Deletion of a node
While deleting a node from the list we simply manipulate the next pointer of the previous node in the list, and
the data field of the node to be deleted is initialized to -1.
Consider that the list is


Limitations
1. The number of nodes in the list is limited by the fixed size of the array. Memory may be wasted when the list
has few elements, or the array may be too small to hold all the nodes we need.
2. Insertion and deletion of elements in the array-based list are complicated (just refer to the functions Insert()
and Delete() and feel how complicated they are!).

2.3 Linked List Implementation


• Various operations of linked list are
1. Creation of linked list.
2. Display of linked list.
3. Insertion of any element in the linked list.
4. Deletion of any element from the linked list.
5. Searching of desired element in the linked list.
We will discuss each operation one by one -
Creation of linked list (logic explanation part) :
Initially one variable flag is taken, whose value is initialized to TRUE (i.e. 1). The purpose of flag is to check
whether the first node has been created: if flag is TRUE we have to create the head node (first node) of the
linked list. Naturally, after creation of the first node we reset the flag (i.e. assign FALSE to it). Consider that we
have entered the element value 10 initially; then,
Step 1:

New = get_node();      /* memory gets allocated for the New node */
New->data = Val;       /* value 10 is put in the data field of New */

Step 2:

if (flag == TRUE)
{
    head = New;
    temp = head;



    /* We also call this node temp: head's address is preserved in 'head', so we can move 'temp' as required */
    flag = FALSE;      /* after creation of the first node the flag is reset */
}
Step 3:
If head node of linked list is created we can further create a linked list by attaching the subsequent nodes. Suppose we
want to insert a node with value 20 then,

Step 4:
If user wants to enter more elements then let say for value 30 the scenario will be,

Step 5:

2. Display of linked list


We pass the address of the head node to the display routine, calling head the 'temp' node. If the linked list has
not been created then head == NULL; therefore the message "The list is empty" will be displayed.


When the value of temp becomes NULL we come out of the while loop. As a result of the display routine,
10 → 20 → 30 → 40 → 50 → NULL
is printed on the console.
3. Insertion of any element at anywhere in the linked list
There are three possible cases when we want to insert an element in the linked list -
a) Insertion of a node as a head node.
b) Insertion of a node as a last node.
c) Insertion of a node after some node.

Now we will insert a node at the end -


void insert_last(node *head)
{
    node *New, *temp;
    New = get_node();
    printf("\nEnter the element which you want to insert: ");
    scanf("%d", &New->data);
    New->next = NULL;
    if (head == NULL)
        head = New;
    else
    {
        temp = head;
        while (temp->next != NULL)   /* walk to the last node */
            temp = temp->next;
        temp->next = New;
    }
}
To attach a node at the end of linked list assume that we have already created a linked list like this –

Thus desired node gets inserted at desired position.


4. Deletion of any element from the linked list

Suppose we want to delete node 30. Then we will search the node containing 30, using search (*head, key) routine.
Mark the node to be deleted as temp. Then we will obtain previous node of temp using get_prev () function. Mark
previous node as prev



This can be done using the following statements:
*head = temp->next;
free(temp);
5. Searching of desired element in the linked list
The search function is for searching the node containing desired value. We pass the head of the linked list to this
routine so that the complete linked list can be searched from the beginning.

Consider that we have created a linked list as

Suppose key = 30, i.e. we want the node containing the value 30; then compare temp->data with the key value.
If there is no match, we mark the next node as temp.

When a match is found, print the message "The Element is present in the list".


Thus in the search operation the entire list can be scanned in search of the desired node. If the required node is
still not found, we have to print the message
"The Element is not present in the list".
Let us see the complete program for it -

2.4 Circularly Linked List


Definition: The Circular Linked List (CLL) is similar to singly linked list except that the last node's next pointer
points to first node.
The circular linked list is as shown below

• Various operations that can be performed on circular linked list are,


1. Creation of circular linked list.
2. Insertion of a node in circular linked list.
3. Deletion of any node from linked list.
4. Display of circular linked list.
We will see each operation along with some example.
1. Creation of circular linked list


Initially we will allocate memory for the New node using the function get_node(). There is one variable flag whose purpose is to check whether the first node has been created or not: while flag is 1 (set), the first node has not yet been created. Therefore, after the creation of the first node we have to reset the flag (making flag = 0).
Initially,

Here variable head indicates starting node.


Now as flag = 0, we can further create the nodes and attach them as follows.
When we have taken element '20'

Thus we can create a circular linked list by inserting one more element 50. It is as shown below
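The creation-and-append logic described above can be sketched in C. This is a minimal illustration rather than the notes' own program; the type and function names (cnode, cll_insert_last, cll_length) are chosen for the example.

```c
#include <stdlib.h>

/* A minimal circular singly linked list node. */
typedef struct cnode {
    int data;
    struct cnode *next;
} cnode;

/* Insert at the end; returns the (possibly new) head.
   The last node's next pointer always points back to the head. */
cnode *cll_insert_last(cnode *head, int value) {
    cnode *New = malloc(sizeof(cnode));
    New->data = value;
    if (head == NULL) {            /* first node: it points to itself */
        New->next = New;
        return New;
    }
    cnode *temp = head;
    while (temp->next != head)     /* walk to the last node */
        temp = temp->next;
    temp->next = New;              /* attach after the old last node */
    New->next = head;              /* close the circle */
    return head;
}

/* Count the nodes by walking until we are back at the head. */
int cll_length(const cnode *head) {
    if (head == NULL) return 0;
    int n = 1;
    for (const cnode *t = head->next; t != head; t = t->next)
        n++;
    return n;
}
```

Because the last node links back to the head, the traversal must stop when it returns to the starting node rather than on NULL.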

2. Insertion into circular linked list


While inserting a node in the linked list, there are 3 cases -
a) Inserting a node as a head node
b) Inserting a node as a last node
c) Inserting a node at an intermediate position
The functions for these cases are as given below


4. Deletion of any node


Suppose we have created a linked list as,

5. Searching a node from circular linked list


While searching for a node in a circular linked list, we go on comparing the data field of each node starting from the head node. If the node containing the desired data is found, we return the address of that node to the calling function.
Advantages of Circular Linked List over Singly Linked List
In a circular linked list the next pointer of the last node points to the head node. Hence we can move from the last node to the head node of the list very efficiently, and a traversal can continue from any node without first returning to the head, which makes such movement faster than in a singly linked list.
Applications of circular list:
1) The circular list is used to create linked lists in which the last node points back to the first node.
2) In the round robin scheduling method the circular linked list is used.
3) The circular linked list is used in solving the Josephus problem.
4) The circular linked list is used for representing polynomials.
2.5 Doubly Linked List
• Definition: The doubly linked list has two link fields: one link field is the previous pointer and the other is the next pointer.
• The typical structure of each node in a doubly linked list is like this.

• Representation
The linked representation of a doubly linked list is

Thus the doubly linked list can traverse in both the directions, forward as well as backwards.

Now, there may be one question in your mind: how does an empty doubly circular linked list look? Here is the representation.
That means the prev and next pointers point to the node itself.
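A minimal C sketch of the node structure and of an insert-at-the-beginning routine may make the two link fields concrete (the names dnode, dll_insert_front and dll_length are illustrative, not from the notes):

```c
#include <stdlib.h>

/* Doubly linked list node: prev and next link fields. */
typedef struct dnode {
    int data;
    struct dnode *prev, *next;
} dnode;

/* Insert a new node at the beginning; returns the new head. */
dnode *dll_insert_front(dnode *head, int value) {
    dnode *New = malloc(sizeof(dnode));
    New->data = value;
    New->prev = NULL;       /* the new node becomes the first node */
    New->next = head;
    if (head != NULL)
        head->prev = New;   /* old head points back to the new node */
    return New;
}

/* Forward display direction: follow next pointers and count nodes. */
int dll_length(const dnode *head) {
    int n = 0;
    for (const dnode *t = head; t != NULL; t = t->next)
        n++;
    return n;
}
```

Note that every insertion must maintain both link fields, which is what allows traversal in either direction.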
Logic explanation of DLL program
1. Creation of doubly linked list
Step 1: Initially one variable flag is taken, whose value is initialized to TRUE. The purpose of this flag is to make a check on the creation of the first node. After creating the first node, reset the flag (i.e. assign FALSE to flag).

Step 2: If head node is created, we can further create a linked list by attaching the subsequent nodes. Suppose, we
want to insert a node with value 20 then

Step 3: If the user wants to insert a node with a value, say 30, then


Continuing in this fashion, the doubly linked list can be created.


2. Display of DLL
As each node of a DLL contains two link fields, the previous link and the next link, we can display a DLL in the forward as well as the reverse direction.
Suppose we have created a DLL as
Forward Display

As now temp becomes NULL, we will come out of the while loop. Hence as a result forward display will be
10 → 20 → 30 → 40
Reverse Display
First of all we must reach at the last node by using while statement.



Now if we set temp = temp → prev, then temp becomes NULL, hence we will come out of while loop. Hence we get
the display as
40 → 30 → 20 → 10
3. Insertion of a node in DLL
There are three cases for insertion of a node
1) Insertion of a node at the beginning.
2) Insertion of a node at the end.
3) Insertion of a node at intermediate position.
Case 1: Inserting a node at the beginning
Suppose we want to insert a node with value 9 then

Case 2: Insertion of a node at the end. Suppose we want to insert a node with value 40; then first move the temp pointer to the last node.



Case 3 : Inserting a node at the intermediate position.
Suppose, we want to insert a node with value 22 after node with value 20, then first search the node with value 20

4. Deletion of node
Suppose we want to delete a node with a value 20 then, first search this node, call it as temp.

2.6 Stack ADT


• Stack is a data structure which possesses the LIFO (Last In First Out) property.
• The abstract data type for stack can be as given below.
AbstractDataType stack
{
Instances: Stack is a collection of elements in which insertion and deletion of elements is done by one end called top.
Preconditions:
1. Stfull (): This condition indicates whether the stack is full or not. If the stack is full then we cannot insert the
elements in the stack.
2. Stempty(): This condition indicates whether the stack is empty or not. If the stack is empty then we cannot pop or
remove any element from the stack.



Operations:
1. Push: By this operation one can push elements onto the stack. Before performing a push we must check the stfull() condition.
2. Pop: By this operation one can remove elements from the stack. Before popping elements from the stack we should check the stempty() condition.
}
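A minimal array-based sketch of this ADT in C, with the stfull()/stempty() checks guarding push and pop (the fixed capacity MAX and all names are assumptions of the example):

```c
#define MAX 10

/* Array-based stack; top == -1 means empty. */
typedef struct {
    int data[MAX];
    int top;
} stack;

void st_init(stack *s)       { s->top = -1; }
int  stfull(const stack *s)  { return s->top == MAX - 1; }
int  stempty(const stack *s) { return s->top == -1; }

/* Push returns 1 on success, 0 if the stack is full. */
int push(stack *s, int x) {
    if (stfull(s)) return 0;       /* check stfull() before push */
    s->data[++s->top] = x;
    return 1;
}

/* Pop stores the removed element in *x; returns 0 if empty. */
int pop(stack *s, int *x) {
    if (stempty(s)) return 0;      /* check stempty() before pop */
    *x = s->data[s->top--];
    return 1;
}
```

Returning a status code instead of aborting keeps the precondition checks visible at every call site.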

2.8 Queue ADT


AbstractDataType Queue
{
Instances: The queue is a collection of elements in which the element can be inserted by one end called rear and
elements can be deleted by other end called front.
Operations:
1. Insert: The insertion of an element into the queue is done at the end called rear. Before the insertion of an element into the queue, it is checked whether or not the queue is full.
2. Delete: The deletion of an element from the queue is done at the end called front. Before performing the delete operation, it is checked whether the queue is empty or not.
}
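The queue ADT above can be sketched with a simple array, inserting at rear and deleting at front (a linear, non-circular version for brevity; capacity QMAX and all names are illustrative):

```c
#define QMAX 5

/* Simple linear queue: rear inserts, front deletes. */
typedef struct {
    int data[QMAX];
    int front, rear;   /* front: next to delete; rear: last inserted */
} queue;

void q_init(queue *q)        { q->front = 0; q->rear = -1; }
int  q_full(const queue *q)  { return q->rear == QMAX - 1; }
int  q_empty(const queue *q) { return q->front > q->rear; }

int q_insert(queue *q, int x) {
    if (q_full(q)) return 0;       /* check for overflow first */
    q->data[++q->rear] = x;
    return 1;
}

int q_delete(queue *q, int *x) {
    if (q_empty(q)) return 0;      /* check for underflow first */
    *x = q->data[q->front++];
    return 1;
}
```

A production version would reuse freed slots with circular (modulo) indexing, as the deque section below does.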

2.9 Double Ended Queue:


A double-ended queue (deque) is a data structure that allows elements to be added or removed from either the front or
back of the queue. It's a generalized version of a linear queue, which has restrictions on where elements can be added
and removed.
Here are some characteristics of a deque:
 Ordered collection: Items remain in a specific position in the collection.
 Hybrid linear structure: Provides the capabilities of both stacks and queues.
 No LIFO or FIFO order: Unlike stacks and queues, deques don't enforce LIFO or FIFO orderings.
 Circular array: A circular array can be used to implement a deque; the index position after the last array slot wraps around to the first slot.
A deque is often called a head-tail linked list, but this term more specifically refers to a specific data structure
implementation of a deque.
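The circular-array idea can be sketched in C; the modulo arithmetic makes the indices wrap around (the capacity DMAX and the function names are illustrative assumptions):

```c
#define DMAX 8   /* capacity of the circular array */

/* Deque on a circular array: indices wrap modulo DMAX. */
typedef struct {
    int data[DMAX];
    int front;   /* index of the current front element */
    int size;    /* number of stored elements */
} deque;

void dq_init(deque *d) { d->front = 0; d->size = 0; }

int dq_push_front(deque *d, int x) {
    if (d->size == DMAX) return 0;
    d->front = (d->front + DMAX - 1) % DMAX;  /* step back, wrapping */
    d->data[d->front] = x;
    d->size++;
    return 1;
}

int dq_push_back(deque *d, int x) {
    if (d->size == DMAX) return 0;
    d->data[(d->front + d->size) % DMAX] = x; /* slot after the last */
    d->size++;
    return 1;
}

int dq_pop_front(deque *d, int *x) {
    if (d->size == 0) return 0;
    *x = d->data[d->front];
    d->front = (d->front + 1) % DMAX;
    d->size--;
    return 1;
}

int dq_pop_back(deque *d, int *x) {
    if (d->size == 0) return 0;
    *x = d->data[(d->front + d->size - 1) % DMAX];
    d->size--;
    return 1;
}
```

All four operations run in O(1) time, which is the point of using a circular array rather than shifting elements.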

2.10 Infix to Postfix Conversion


Algorithm
Read the expression from left to right, one character at a time.
1. If an operand is encountered, then add it to the postfix array.
2. If '(' is read, then simply push it onto the stack, because '(' has the highest priority when read as input.
3. If ')' is read, then pop all the operators until '(' is popped. Discard '('. Store the popped characters in the postfix array.
4. If an operator is read, then
i) If the instack operator has precedence greater than (or equal to) the incoming operator, then pop the operator and add it to the postfix expression. Repeat this step until the instack operator has lower priority than the current incoming operator (or the stack becomes empty). Finally push the incoming operator onto the stack.
ii) Else push the operator.
5. The postfix expression is now in the array.
The priorities of the different operators when they are in the stack are called instack priorities, and the priorities of the operators when they are read from the input are called incoming priorities. These priorities are as given below -
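The algorithm above can be sketched in C for single-character operands. Here prec() encodes assumed priorities, with '$' treated as exponentiation as in the worked examples, and equal-priority operators are popped (i.e. operators are treated as left-associative):

```c
#include <ctype.h>
#include <string.h>

/* Priority of an operator ('$' denotes exponentiation here). */
static int prec(char op) {
    switch (op) {
        case '$': return 3;
        case '*': case '/': return 2;
        case '+': case '-': return 1;
        default:  return 0;   /* '(' has the lowest instack priority */
    }
}

/* Convert a single-letter-operand infix expression to postfix. */
void infix_to_postfix(const char *infix, char *postfix) {
    char stack[100];
    int top = -1, j = 0;
    for (int i = 0; infix[i] != '\0'; i++) {
        char c = infix[i];
        if (isalnum((unsigned char)c)) {
            postfix[j++] = c;                 /* operand: add to output */
        } else if (c == '(') {
            stack[++top] = c;                 /* always push '(' */
        } else if (c == ')') {
            while (top >= 0 && stack[top] != '(')
                postfix[j++] = stack[top--];  /* pop until '(' */
            top--;                            /* discard the '(' */
        } else {
            while (top >= 0 && prec(stack[top]) >= prec(c))
                postfix[j++] = stack[top--];  /* pop >= priority ops */
            stack[++top] = c;                 /* push the new operator */
        }
    }
    while (top >= 0)
        postfix[j++] = stack[top--];          /* flush remaining operators */
    postfix[j] = '\0';
}
```

For a right-associative '$' the comparison for that operator would be strict (>), but the left-associative form suffices to illustrate the stack discipline.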


Ex. 2.7.1: A * B + C $. Obtain postfix expression.


Sol. :

Ex. 2.7.2 Obtain the postfix expression for (A+B)*(C-D)$.


Sol. :


Unit III
Sorting And Searching
Bubble sort – selection sort – insertion sort – merge sort – quick sort – analysis of sorting algorithms – linear
search – binary search – hashing – hash functions – collision handling – load factors, rehashing, and
efficiency
3.1 Bubble Sort
This is the simplest kind of sorting method. In this method, adjacent elements are compared and exchanged if they are out of order. The bubble sort procedure is carried out in several iterations, which are called passes.
Sort the following list of numbers using bubble sort technique 52, 1, 27, 85, 66, 23, 13, 57.
Sol.
Pass 1:


Pass 2:

Pass 5:

Pass 6:

No interchange takes place. Hence the sorted list is
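The whole procedure can be written compactly in C; a pass with no interchange means the list is already sorted, so we can stop early:

```c
/* Bubble sort: on each pass, adjacent elements are compared and
   swapped if out of order; stop early if a pass makes no swap. */
void bubble_sort(int a[], int n) {
    for (int pass = 0; pass < n - 1; pass++) {
        int swapped = 0;
        for (int i = 0; i < n - 1 - pass; i++) {
            if (a[i] > a[i + 1]) {
                int t = a[i]; a[i] = a[i + 1]; a[i + 1] = t;
                swapped = 1;
            }
        }
        if (!swapped) break;   /* no interchange: list is sorted */
    }
}
```

After each pass the largest remaining element has "bubbled" to the end, which is why the inner loop shrinks by one each time.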


3.2 Selection Sort


Scan the array to find its smallest element and swap it with the first element. Then, starting with the second
element scan the entire list to find the smallest element and swap it with the second element. Then starting
from the third element the entire list is scanned in order to find the next smallest element. Continuing in this
fashion we can sort the entire list.
Generally, on pass i (0 ≤ i ≤ n − 2), the smallest element is searched for among the last n − i elements and is swapped with A[i].

The list gets sorted after n - 1 passes.
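This pass structure translates directly into C:

```c
/* Selection sort: on pass i, find the smallest element among
   a[i..n-1] and swap it with a[i]. */
void selection_sort(int a[], int n) {
    for (int i = 0; i <= n - 2; i++) {
        int min = i;
        for (int j = i + 1; j < n; j++)
            if (a[j] < a[min])
                min = j;               /* remember smallest so far */
        if (min != i) {                /* swap it into position i */
            int t = a[i]; a[i] = a[min]; a[min] = t;
        }
    }
}
```

Note that at most one swap is performed per pass, which is the distinguishing feature of selection sort.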


Example 1:
Consider the elements
70, 30, 20, 50, 60, 10, 40
We can store these elements in array A as:

1st pass:


2nd pass:

3rd pass:

As there is no element smaller than 30, we will simply increment the i pointer.


4th pass:


6th pass:

This is a sorted array.

3.3 Insertion Sort


In this method the elements are inserted at their appropriate place; hence the name insertion sort. Let us understand this method with the help of an example -
For Example
Consider a list of elements as,

The process starts with first element


Advantages of insertion sort


1. Simple to implement.
2. This method is efficient when we want to sort a small number of elements, and it has excellent performance on an almost sorted list of elements.
3. More efficient than most other simple O(n²) algorithms such as selection sort or bubble sort.
4. It is a stable sort (does not change the relative order of equal elements).
5. It is called an in-place sorting algorithm (it only requires a constant amount O(1) of extra memory space). An in-place sorting algorithm is an algorithm in which the input is overwritten by the output, and executing the sorting method does not require any additional space.
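Since the notes' code for this method is not reproduced here, a short C sketch of the shifting-and-inserting idea:

```c
/* Insertion sort: element a[i] is inserted at its appropriate
   place among the already-sorted prefix a[0..i-1]. */
void insertion_sort(int a[], int n) {
    for (int i = 1; i < n; i++) {
        int key = a[i], j = i - 1;
        while (j >= 0 && a[j] > key) {
            a[j + 1] = a[j];   /* shift larger elements one slot right */
            j--;
        }
        a[j + 1] = key;        /* insert at the proper place */
    }
}
```

On an almost sorted input the inner while loop rarely iterates, which explains the excellent performance noted above. (The test values are illustrative.)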
3.4 Merge Sort
Merge sort is a sorting algorithm in which array is divided repeatedly. The sub arrays are sorted
independently and then these subarrays are combined together to form a final sorted list.



Merge sort on an input array with n elements consists of three steps:
Divide: partition array into two sub lists s1 and s2 with n/2 elements each.
Conquer: Then sort sub list s1 and sub list s2.
Combine: merge s1 and s2 into a single sorted list.
Ex. 6.8.1: Consider the following elements for sorting using merge sort
70, 20, 30, 40, 10, 50, 60
Sol. Now we will split this list into two sublists.

Algorithm
MergeSort(int A[0...n-1],low,high)
//Problem Description: This algorithm is for
//sorting the elements using merge sort
//Input: Array A of unsorted elements, low as beginning
//pointer of array A and high as end pointer of array A
//Output: Sorted array A[0...n-1]
if(low < high)then
{
mid←(low+high)/2 //split the list at mid
MergeSort(A,low,mid) //first sublist



MergeSort(A,mid+1,high) //second sublist
Combine(A,low,mid,high) //merging of two sublists
}
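A C version of the Combine step, together with the MergeSort driver above, might look like this (the fixed-size scratch array is a simplification of the sketch, limiting it to 100 elements):

```c
#include <string.h>

/* Combine: merge the sorted runs A[low..mid] and A[mid+1..high]. */
void combine(int A[], int low, int mid, int high) {
    int temp[100];                  /* scratch array for the merge */
    int i = low, j = mid + 1, k = 0;
    while (i <= mid && j <= high)   /* take the smaller head each time */
        temp[k++] = (A[i] <= A[j]) ? A[i++] : A[j++];
    while (i <= mid)  temp[k++] = A[i++];    /* copy leftovers */
    while (j <= high) temp[k++] = A[j++];
    memcpy(&A[low], temp, k * sizeof(int));  /* copy back in place */
}

void merge_sort(int A[], int low, int high) {
    if (low < high) {
        int mid = (low + high) / 2;     /* split the list at mid */
        merge_sort(A, low, mid);        /* first sublist */
        merge_sort(A, mid + 1, high);   /* second sublist */
        combine(A, low, mid, high);     /* merge the two sublists */
    }
}
```

Using `<=` in the comparison keeps equal elements in their original order, making the sort stable.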
3.5 QuickSort
QuickSort is a sorting algorithm based on the Divide and Conquer that picks an element as a pivot and
partitions the given array around the picked pivot by placing the pivot in its correct position in the sorted
array.
How does QuickSort Algorithm work?
QuickSort works on the principle of divide and conquer, breaking down the problem into smaller sub-
problems.
There are mainly three steps in the algorithm:
1. Choose a Pivot: Select an element from the array as the pivot. The choice of pivot can vary (e.g.,
first element, last element, random element, or median).
2. Partition the Array: Rearrange the array around the pivot. After partitioning, all elements smaller
than the pivot will be on its left, and all elements greater than the pivot will be on its right. The pivot
is then in its correct position, and we obtain the index of the pivot.
3. Recursively Call: Recursively apply the same process to the two partitioned sub-arrays (left and
right of the pivot).
4. Base Case: The recursion stops when there is only one element left in the sub-array, as a single
element is already sorted.
Here’s a basic overview of how the QuickSort algorithm works.

Choice of Pivot
There are many different choices for picking pivots.



 Always pick the first (or last) element as a pivot. The implementation sketched above picks the last element as the pivot. The problem with this approach is that it ends up in the worst case when the array is already sorted.
 Pick a random element as a pivot. This is a preferred approach because it does not have a pattern for which the worst case happens.
 Pick the median element as the pivot. This is an ideal approach in terms of time complexity, as we can find the median in linear time and the partition function will then always divide the input array into two halves. But it is slow in practice, as median finding has high constant factors.
3.6 Searching
 When we want to find out particular record efficiently from the given list of elements then there are
various methods of searching that element. These methods are called searching methods. Various
algorithms based on these searching methods are known as searching algorithms.
 The basic characteristics of any searching algorithm are:
 i. It should be efficient.
 ii. Fewer computations should be involved in it.
 iii. The space occupied by the searching algorithm must be small.
 •The most commonly used searching algorithms are -
 i.Sequential or linear search
 ii. Indexed sequential search
 iii. Binary search
 • The element to be searched from the given list is called the key element. Let us discuss various
searching algorithms.
3.6.1 Linear Search
 •Linear search or sequential search is technique in which the given list of elements is scanned from
the beginning. The key element is compared with every element of the list. If the match is found the
searching is stopped otherwise it will be continued to the end of the list.
 •Although this is a simple method, there are some unnecessary comparisons involved in this method.
 •The time complexity of this algorithm is O(n). The time complexity will increase linearly with the
value of n.
 • For higher value of n sequential search is not satisfactory solution.
 Example


 From the above Fig. 6.1.1, an array is maintained to store the student records. The records are not sorted at all. If we want to search for the student record whose roll number is 12, then with the roll number as the key we examine every record to check whether it has that roll number. Such a record is obtained at the Array[4] location.
 Advantages of linear searching



 1. It is simple to implement.
 2. It does not require specific ordering before applying the method.
 Disadvantage of linear searching
 1. It is less efficient.
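The scan described above is a few lines of C (the array values in the test are illustrative, not the roll-number figure from the notes):

```c
/* Linear search: scan from the beginning; return the index of the
   first element equal to key, or -1 when the key is absent. */
int linear_search(const int a[], int n, int key) {
    for (int i = 0; i < n; i++)
        if (a[i] == key)
            return i;    /* match found: stop scanning */
    return -1;           /* key not present in the list */
}
```

In the worst case all n elements are compared, which is the O(n) behaviour mentioned above.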
3.6.2 Binary Search
 Definition: Binary search is a searching technique in which the elements are arranged in a sorted
order and each time mid element is compared with the key element recursively.
 Example:
 The necessity of this method is that all the elements should be sorted. So let us take an array of sorted
elements.


 Step 1: The key element to be searched is 99, i.e. key = 99.
 Step 2: Find the middle element of the array. Compare it with the key


 Compare middle with key:
 i.e. compare 42 with 99;
 since 42 < 99, search sublist 2.
 Now handle only sublist 2. Again divide it and find the mid of sublist 2.


 Compare middle with key:
 i.e. 99 = 99, so a match is found.


 So the match is found at the 7th position of the array, i.e. at array[6].

 Thus by the binary search method we find that the element 99 is present in the given list at the array[6] location.
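The halving procedure can be sketched in C. The sorted values in the test below are assumed for illustration; the notes only show that the middle element is 42 and that 99 is found at array[6]:

```c
/* Binary search on a sorted array: compare the key with the middle
   element and keep only the half that can still contain it. */
int binary_search(const int a[], int n, int key) {
    int low = 0, high = n - 1;
    while (low <= high) {
        int mid = (low + high) / 2;
        if (a[mid] == key)
            return mid;          /* match at mid */
        else if (a[mid] < key)
            low = mid + 1;       /* search the right sublist */
        else
            high = mid - 1;      /* search the left sublist */
    }
    return -1;                   /* key not present */
}
```

Each comparison halves the remaining range, giving the O(log n) running time that makes sorting the prerequisite worthwhile.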


3.7 Hashing
Hashing is an effective way to reduce the number of comparisons. Hashing deals with the
idea of providing the direct address of the record where the record is likely to be stored. To
understand the idea clearly, let us take an example -
Suppose a manufacturing company has an inventory file that consists of fewer than 1000 parts. Each part has a unique 7-digit number. This number is called the 'key', and the record for a particular key contains that part's name. If there are fewer than 1000 parts, then a 1000-element array can be used to store the complete file. Such an array is indexed from 0 to 999. Since the key number has 7 digits, it is converted to 3 digits by taking only the last three digits of the key. This is shown in Fig. 7.1.1.
Observe in Fig. 7.1.1 that the first key is 4967000 and it is stored at the 0th position. The second key is 8421002; its last three digits indicate position 2 in the array. Let us search for the element 4957397. Naturally it will be obtained at position 397. This method of searching is called hashing. The function that converts the key (7 digits) into the array position is called the hash function. Here the hash function is

h(key)=key % 1000
Where key % 1000 will be the hash function and the key obtained by hash



function is called hash key.
3.7.1 Basic Concepts in Hashing
1) Hash Table
Hash table is a data structure used for storing and retrieving data quickly. Every entry in the hash
table is made using Hash function.
2) Hash Function:
•Hash function is a function used to place data in hash table.
• Similarly hash function is used to retrieve data from hash table.
• Thus the use of hash function is to implement hash table.
For example: consider the hash function key mod 5 and a hash table of size 5.

3) Bucket: The hash function H(key) is used to map several dictionary entries in the hash table. Each
position of the hash table is called bucket.
4) Collision: Collision is a situation in which the hash function returns the same address for more than one record.
For example

5) Probe: Each calculation of an address and test for success is known as a probe.
6) Synonym: The set of keys that hash to the same location are called synonyms. For example, in the hash table computation given above, 25 and 55 are synonyms.
7) Overflow: When hash table becomes full and new record needs to be inserted then it is called
overflow.
For example -


3.8 Hash Functions


There are various types of hash functions or hash methods which are used to place the elements in
hash table.

3.8.1 Division Method


The hash function depends upon the remainder of division.
Typically the divisor is the table length. For example:
If the records 54, 72, 89, 37 are to be placed in the hash table and the table size is 10, then h(key) = key % 10 places them at positions 4, 2, 9 and 7 respectively.



3.8.2 Multiplicative Hash Function
The multiplicative hash function works in following steps
1) Multiply the key k by a constant A, where A is in the range 0 < A < 1. Then extract the fractional part of kA.
2) Multiply this fractional part by m and take the floor.
The above steps can be formulated as
h(k) = ⌊m · {kA}⌋
where {kA} denotes the fractional part of kA.
Donald Knuth suggested using A = 0.61803398987 (≈ (√5 − 1)/2, the reciprocal of the golden ratio).
Example:
Let key k = 107 and assume m = 50, with A = 0.61803398987.
Then kA = 107 × 0.61803398987 = 66.1296…, whose fractional part is 0.1296…, and ⌊50 × 0.1296…⌋ = 6.
That means 107 will be placed at index 6 in the hash table.


Advantage: The choice of m is not critical
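A small C sketch of the computation (truncation of a positive value is used here to take the floor, so keys are assumed non-negative):

```c
/* Multiplicative hashing: h(k) = floor(m * {kA}), where {kA} is the
   fractional part of k*A and A = 0.61803398987 (Knuth's suggestion). */
int mult_hash(int k, int m) {
    const double A = 0.61803398987;
    double kA = k * A;
    double frac = kA - (long)kA;   /* extract the fractional part */
    return (int)(m * frac);        /* multiply by m and take the floor */
}
```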
3.8.3 Extraction
In this method some digits are extracted from the key to form the address location in hash table.
For example: Suppose first, third and fourth digit from left is selected for hash key.

3.8.4 Mid Square


This method works in following steps
1) Square the key



2) Extract middle part of the result. This will indicate the location of the key element in the hash
table.
Note that if the key element is a string then it has to be preprocessed to produce a number.
Let key = 3111. Then key² = 3111² = 9678321.

For the hash table of size 1000, extract the middle three digits of 9678321:

H(3111) = 783
3.9 Collision Resolution Techniques
Definition: If collisions occur then it should be handled by applying some techniques, such
techniques are called collision handling techniques.

3.9.1 Chaining
1. Chaining without replacement
In this collision handling method, chaining introduces an additional field with the data, i.e. a chain field. A separate chain table is maintained for colliding data. When a collision occurs, we store the second colliding data item by the linear probing method. The address of this colliding data is stored with the first colliding element in the chain table, without replacement.
For example consider elements,
131, 3, 4, 21, 61, 6, 71, 8, 9


From the example (Fig. 7.4.2, chaining without replacement), you can see that the chain is maintained for the numbers that demand location 1. First the number 131 comes; we place it at index 1. Next comes 21, but a collision occurs, so by linear probing we place 21 at index 2, and the chain is maintained by writing 2 in the chain table at index 1. Similarly, next comes 61; by linear probing we can place 61 at index 5, and the chain entry will be maintained at index 2. Thus any element whose hash key is 1 is stored by linear probing at an empty location, but a chain is maintained so that traversing the hash table will be efficient.
The drawback of this method lies in finding the next empty location: we ignore the fact that the element which actually belongs to that empty location can then no longer obtain its position. This means the logic of the hash function gets disturbed.
3.9.2 Chaining with replacement
As the previous method has the drawback of losing the meaning of the hash function, to overcome this drawback the method known as chaining with replacement is introduced. Let us discuss an example to understand the method. Suppose we have to store the following elements:
131, 21, 31, 4, 5


Now the next element is 2. The hash function indicates hash key 2, but at index 2 we have already stored the element 21. We also know that 21 is not at the position it actually belongs to. Hence we will replace 21 by 2, and accordingly the chain table will be updated. See the table:

The value -1 in the hash table and chain table indicate the empty location
The advantage of this method is that the meaning of the hash function is preserved. But each time, some logic is needed to test whether an element is at its proper position.
3.9.3 Open Addressing
Open addressing is a collision handling technique in which the entire hash table is searched in
systematic way for empty cell to insert new item if collision occurs.
Various techniques used in open addressing are
1. Linear probing 2. Quadratic probing 3. Double hashing
1. Linear probing



When a collision occurs, i.e. when two records demand the same location in the hash table, the collision can be resolved by placing the second record linearly down at the next empty location found.
For example

In the hash table given in Fig. 7.4.3 the hash function used is number % 10. For the first number, 131, we have 131 % 10 = 1, i.e. the remainder is 1, so the hash key = 1. That means we place the record at index 1. The next number is 21, which also gives hash key 1, as 21 % 10 = 1. But 131 is already placed at index 1, which means a collision has occurred. We now apply linear probing: we search for a place for the number 21 starting from the location of 131. In this case we can place 21 at index 2, and then 31 at index 3. Similarly, 61 can be stored at index 6, because the numbers 4 and 5 are stored before 61. Because of this technique the searching becomes efficient, as we have to search only a limited list to obtain the desired number.
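The probing loop can be sketched in C; EMPTY marks a free bucket and the probe wraps around the table (a sketch, with table size 10 as in the example):

```c
#define TSIZE 10
#define EMPTY -1

/* Insert key into table[] by linear probing: start at key % TSIZE and
   step linearly (wrapping around) until an empty slot is found.
   Returns the index used, or -1 when the table is full. */
int lp_insert(int table[], int key) {
    int start = key % TSIZE;
    for (int i = 0; i < TSIZE; i++) {
        int idx = (start + i) % TSIZE;
        if (table[idx] == EMPTY) {   /* first empty slot wins */
            table[idx] = key;
            return idx;
        }
    }
    return -1;                       /* overflow: no empty slot */
}
```

Running this on the example sequence reproduces the placements described above.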
Problem with linear probing
One major problem with linear probing is primary clustering. Primary clustering is a process in
which a block of data is formed in the hash table when collision is resolved.
For example:

This clustering problem can be solved by quadratic probing.



3.9.4 Quadratic probing
Quadratic probing operates by taking the original hash value and adding successive values of a quadratic polynomial to the starting value. This method uses the following formula:

Hi(key) = (Hash(key) + i²) % m

where m can be the table size or any prime number.


For example: If we have to insert following elements in the hash table with table size 10: 37, 90, 55,
22, 11, 17, 49, 87.
We will fill the hash table step by step

Now if we want to place 17, a collision will occur, as 17 % 10 = 7 and bucket 7 already has the element 37. Hence we apply quadratic probing to insert this record into the hash table:
Hi(key) = (Hash(key) + i²) % m
We choose i = 0, 1, 2, ... whichever is applicable.

Consider i = 0:
(17 + 0²) % 10 = 7 ... already occupied
(17 + 1²) % 10 = 8, when i = 1

Bucket 8 is empty, hence we place the element at index 8.


Then comes 49, which will be placed at index 9:
49 % 10 = 9
Now to place 87 we use quadratic probing:
(87 + 0²) % 10 = 7 ... already occupied
(87 + 1²) % 10 = 8 ... already occupied



(87 + 2²) % 10 = 1 ... already occupied
(87 + 3²) % 10 = 6 ... this slot is free
We place 87 at the 6th index.

It is observed that if we want to place all the necessary elements in the hash table, the size of the divisor (m) should be about twice as large as the total number of elements.
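A sketch of the quadratic probe sequence in C. Note that with m = 10 the sequence i² % 10 does not visit every bucket, which is one reason the observation about the divisor size matters:

```c
#define QTSIZE 10
#define QEMPTY -1

/* Quadratic probing: try (key + i*i) % QTSIZE for i = 0, 1, 2, ...
   Returns the index used, or -1 if no free bucket was found. */
int qp_insert(int table[], int key) {
    for (int i = 0; i < QTSIZE; i++) {
        int idx = (key + i * i) % QTSIZE;
        if (table[idx] == QEMPTY) {
            table[idx] = key;
            return idx;
        }
    }
    return -1;   /* probe sequence exhausted without a free bucket */
}
```

The test below reproduces the worked example: 17 lands at index 8 and 87 at index 6.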
3.9.5 Double hashing
Double hashing is a technique in which a second hash function is applied to the key when a collision occurs. The second hash function gives the number of positions (the jump size) from the point of collision at which to insert.
There are two important rules to be followed for the second function :
• It must never evaluate to zero.
• Must make sure that all cells can be probed.
The formulas to be used for double hashing are

H1(key) = key mod tablesize
H2(key) = M - (key mod M)

where M is a prime number smaller than the size of the table.


Consider the following elements to be placed in the hash table of size 10
37, 90, 45, 22, 17, 49, 55
Initially insert the elements using the formula for H1(key).
Insert 37, 90, 45, 22.

Now if 17 is to be inserted then



H1(17) = 17 % 10 = 7 ... collision
H2(key) = M - (key % M)

Here M is a prime number smaller than the size of the table. The prime number smaller than the table size 10 is 7, hence M = 7.
H2(17) = 7 - (17 % 7) = 7 - 3 = 4

That means we have to insert the element 17 at 4 positions from 37; in short, we take 4 jumps from index 7. Therefore 17 will be placed at index (7 + 4) % 10 = 1.
Now to insert the number 55:
H1(55) = 55 % 10 = 5 ... collision
H2(55) = 7 - (55 % 7) = 7 - 6 = 1

That means we have to take one jump from index 5 to place 55, so 55 goes to index 6. Finally the hash table will be -
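Both hash functions can be combined into one insert routine; the sketch below hard-codes table size 10 and M = 7 as in the worked example:

```c
#define DTSIZE 10
#define DEMPTY -1
#define DH_M 7       /* prime smaller than the table size */

/* Double hashing: start at H1 = key % DTSIZE; on collision take
   jumps of H2 = DH_M - (key % DH_M) positions until a free bucket. */
int dh_insert(int table[], int key) {
    int idx  = key % DTSIZE;           /* H1(key) */
    int step = DH_M - (key % DH_M);    /* H2(key): never zero */
    for (int i = 0; i < DTSIZE; i++) {
        if (table[idx] == DEMPTY) {
            table[idx] = key;
            return idx;
        }
        idx = (idx + step) % DTSIZE;   /* take the next jump */
    }
    return -1;                         /* no free bucket reached */
}
```

Because H2 never evaluates to zero, the probe always advances; the test reproduces the example placements (17 at index 1, 55 at index 6).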

3.10 Rehashing:
Rehashing is the process of increasing the size of a hashmap and redistributing the elements to new
buckets based on their new hash values. It is done to improve the performance of the hashmap and to
prevent collisions caused by a high load factor.
When a hashmap becomes full, the load factor (i.e., the ratio of the number of elements to the number
of buckets) increases. As the load factor increases, the number of collisions also increases, which can
lead to poor performance. To avoid this, the hashmap can be resized and the elements can be
rehashed to new buckets, which decreases the load factor and reduces the number of collisions.
During rehashing, all elements of the hashmap are iterated and their new bucket positions are
calculated using the new hash function that corresponds to the new size of the hashmap. This process
can be time-consuming but it is necessary to maintain the efficiency of the hashmap.



Why rehashing?
Rehashing is needed in a hashmap to prevent collision and to maintain the efficiency of the data
structure.
As elements are inserted into a hashmap, the load factor (i.e., the ratio of the number of elements to
the number of buckets) increases. If the load factor exceeds a certain threshold (often set to 0.75), the
hashmap becomes inefficient as the number of collisions increases. To avoid this, the hashmap can
be resized and the elements can be rehashed to new buckets, which decreases the load factor and
reduces the number of collisions. This process is known as rehashing.
Rehashing can be costly in terms of time and space, but it is necessary to maintain the efficiency of
the hashmap.
How Rehashing is done?
Rehashing can be done as follows:
 For each addition of a new entry to the map, check the load factor.
 If it is greater than its pre-defined value (or the default value of 0.75 if not given), then rehash.
 For rehashing, make a new array of double the previous size and make it the new bucket array.
 Then traverse each element in the old bucket array and call insert() for each, so as to insert it into the new, larger bucket array.
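The steps above can be sketched as a minimal chained hash map; the class and method names here are illustrative, not a specific library API:

```python
class HashMap:
    """Minimal chained hash map that rehashes when the load factor exceeds 0.75."""

    def __init__(self, capacity=4):
        self.buckets = [[] for _ in range(capacity)]
        self.count = 0

    def insert(self, key, value):
        index = hash(key) % len(self.buckets)
        for pair in self.buckets[index]:
            if pair[0] == key:            # key already present: update in place
                pair[1] = value
                return
        self.buckets[index].append([key, value])
        self.count += 1
        if self.count / len(self.buckets) > 0.75:   # load factor check
            self._rehash()

    def _rehash(self):
        old = self.buckets
        # double the bucket array and re-insert every entry under the new size
        self.buckets = [[] for _ in range(2 * len(old))]
        for bucket in old:
            for key, value in bucket:
                self.buckets[hash(key) % len(self.buckets)].append([key, value])

    def get(self, key):
        for k, v in self.buckets[hash(key) % len(self.buckets)]:
            if k == key:
                return v
        return None

m = HashMap()
for k in range(10):
    m.insert(k, k * k)       # triggers rehashing as the map fills up
```

Note that bucket positions are recomputed against the new table size during `_rehash()`, which is why every entry must be re-inserted rather than copied.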
3.10.1 Complexity and Load Factor
1. The time taken in the initial step is contingent upon both the value of K and the chosen hash
function. For instance, if the key is represented by the string “abcd,” the hash function employed
might be influenced by the length of the string. However, as n becomes significantly large, where n is
the number of entries in the map, and the length of keys becomes negligible compared to n, the hash
computation can be considered to occur in constant time, denoted as O(1).
2. Moving on to the second step, it involves traversing the list of key-value pairs located at a specific
index. In the worst-case scenario, all n entries might be assigned to the same index, resulting in a
time complexity of O(n). Nevertheless, extensive research has been conducted to design hash
functions that uniformly distribute keys in the array, mitigating the likelihood of this scenario.
3. On average, with n entries and an array size denoted by b, there would be an average of n/b entries
assigned to each index. This n/b ratio is referred to as the load factor, representing the burden placed
on our map.
4. It is crucial to maintain a low load factor to ensure that the number of entries at a given index
remains minimal, thereby keeping the time complexity nearly constant at O(1).
 Complexity of Hashing
 Time complexity – O(1) on average
 Space complexity – O(n)
 Complexity of Rehashing
 Time complexity – O(n)
 Space complexity – O(n)


Unit IV
Tree Structures
Tree ADT – Binary Tree ADT – tree traversals – binary search trees – AVL trees – heaps – multiway search
trees
4.1 Trees ADT
A tree is a finite set of one or more nodes such that -
i) There is a specially designated node called root.
ii) The remaining nodes are partitioned into n >= 0 disjoint sets T1, T2, T3, ..., Tn, where T1, T2, T3, ..., Tn are called the sub-trees of the root.
The concept of tree is represented by following Fig. 4.1.1.

Various operations that can be performed on the tree data structure are -
1. Creation of a tree.
2. Insertion of a node in the tree as a child of a desired node.
3. Deletion of any node (except the root node) from the tree.
4. Modification of a node value of the tree.
5. Searching for a particular node in the tree.
4.2 Basic Terminologies
Let us get introduced with some of the definitions or terms which are normally used.

1. Root
Root is a unique node in the tree to which further subtrees are attached. For above given tree, node 10 is a
root node.
2. Parent node
The node having further sub-branches is called parent node. In Fig. 4.2.2 the 20 is parent node of 40, 50 and
60.


3. Child nodes
The child nodes in above given tree are marked as shown below

4. Leaves
These are the terminal nodes of the tree.
For example -

5. Degree of the node


The total number of subtrees attached to that node is called the degree of a node.
For example.

6. Degree of tree
The maximum degree in the tree is degree of tree.

7. Level of the tree


The root node is always considered at level zero.
The adjacent nodes to root are supposed to be at level 1 and so on.


8. Height of the tree


The maximum level is the height of the tree. In Fig. 4.2.7 the height of tree is 3. Sometimes height of the tree
is also called depth of tree.
9. Predecessor
While displaying the tree, if some particular node occurs previous to some other node then that node is
called predecessor of the other node.
For example: While displaying the tree in Fig. 4.2.7 if we read node 20 first and then if we read node 40,
then 20 is a predecessor of 40.
10. Successor
Successor is a node which occurs next to some node.
For example: While displaying tree in Fig. 4.2.7 if we read node 60 after-reading node 20 then 60 is called
successor of 20.
11. Internal and external nodes
Leaf node means a node having no child node. As leaf nodes have no further links, they are called external nodes, and non-leaf nodes are called internal nodes.

12. Sibling
The nodes with common parent are called siblings or brothers.
For example

4.3 Binary Tree ADT


• Definition of a binary tree: A binary tree is a finite set of nodes which is either empty or consists of a root
and two disjoint binary trees called the left subtree and right subtree.
Abstract DataType BinT (node * root)
{
Instances



A binary tree is a nonlinear data structure in which every node has at most two child nodes.
Operations:
1. Insertion :
This operation is used to insert the nodes in the binary tree. By inserting desired number of nodes, the binary
tree gets created.
2. Deletion :
This operation is used to remove any node from the tree. Note that if root node is removed the tree becomes
empty.
}
4.4 Tree Traversal
• Definition: Tree traversal means visiting each node exactly once.
•Basically there are six ways to traverse a tree. For these traversals we will use following notations :
L for moving to left child
R for moving to right child
D for parent node
• Thus with L, R, D we will have six combinations such as LDR, LRD, DLR, DRL, RLD, RDL.
• From computation point of view, we will consider only three combinations as LDR, DLR and LRD i.e.
inorder, preorder and postorder respectively.
1) Inorder Traversal :
In this type of traversal, the left node is visited, then parent node and then right node is visited.
Algorithm:
1. If tree is not empty then
a. traverse the left subtree in inorder
b. visit the root node
c. traverse the right subtree in inorder
The recursive routine for inorder traversal is as given below.
void inorder(node *temp)
{
if(temp!= NULL)
{
inorder(temp->left);
printf("%d",temp->data);
inorder(temp->right);
}
}
2) Preorder Traversal :
In this type of traversal, the parent node or root node is visited first; then left node and finally right node will
be visited.
Algorithm:
1. If tree is not empty then
a. visit the root node
b. traverse the left subtree in preorder
c. traverse the right subtree in preorder
The recursive routine for preorder traversal is as given below.
void preorder(node *temp)
{
if(temp!= NULL)
{
printf("%d",temp->data);



preorder(temp->left);
preorder(temp->right);
}
}
3) Postorder Traversal :
In this type of traversal, the left node is visited first, then right node and finally parent node is visited.
Algorithm:
1. If tree is not empty then
a. traverse the left subtree in postorder
b. traverse the right subtree in postorder
c. visit the root node
The recursive routine for postorder traversal is as given below.
void postorder(node *temp)
{
if(temp!= NULL)
{
postorder(temp->left);
postorder(temp->right);
printf("%d",temp->data);
}
}
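The same three traversals can be sketched in Python; the `Node` class is illustrative, and results are collected into a list instead of printed so they can be compared:

```python
class Node:
    def __init__(self, data, left=None, right=None):
        self.data = data
        self.left = left
        self.right = right

def inorder(node, out):
    if node is not None:
        inorder(node.left, out)      # left
        out.append(node.data)        # parent
        inorder(node.right, out)     # right

def preorder(node, out):
    if node is not None:
        out.append(node.data)        # parent
        preorder(node.left, out)     # left
        preorder(node.right, out)    # right

def postorder(node, out):
    if node is not None:
        postorder(node.left, out)    # left
        postorder(node.right, out)   # right
        out.append(node.data)        # parent

# tree:   1
#        / \
#       2   3
root = Node(1, Node(2), Node(3))
```

On this tree, inorder yields 2, 1, 3; preorder yields 1, 2, 3; postorder yields 2, 3, 1, matching the LDR, DLR and LRD orders defined above.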
4.5 Binary Search Tree ADT
A binary search tree is a binary tree in which the nodes are arranged in a specific order: the values in the left subtree are less than the root node's value, and the values in the right subtree are greater than the root node's value. Fig. 4.9.1 represents the binary search tree.
The binary search tree is based on binary search algorithm.

4.5.1 Operations on Binary Search Tree


1. Insertion of a node in a binary tree
Algorithm:
1. Read the value for the node which is to be created and store it in a node called New.
2. Initially, if (root == NULL) then root = New.
3. Again read the next value and create the next node New.
4. If (New->value < root->value) then attach the New node as a left child of root, otherwise attach the New node as a right child of root.
5. Repeat steps 3 and 4 to construct the required binary search tree completely.
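The insertion algorithm above can be sketched recursively in Python; sending duplicates to the right subtree is an assumption the algorithm leaves open:

```python
class Node:
    def __init__(self, value):
        self.value = value
        self.left = None
        self.right = None

def insert(root, value):
    """Insert value into the BST rooted at root; returns the (possibly new) root."""
    if root is None:
        return Node(value)           # empty spot found: create the node here
    if value < root.value:
        root.left = insert(root.left, value)
    else:                            # duplicates go right, by assumption
        root.right = insert(root.right, value)
    return root

root = None
for v in (8, 3, 10, 6):
    root = insert(root, v)
```

After these insertions, 8 is the root, 3 and 10 are its children, and 6 is the right child of 3, exactly as steps 2 to 5 describe.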
2. Deletion of an element from the binary tree
For deletion of any node from binary search tree there are three cases which are possible.
i. Deletion of leaf node.
ii. Deletion of a node having one child.
iii. Deletion of a node having two children.
i. Deletion of leaf node
This is the simplest deletion, in which we set the left or right pointer of parent node as NULL.


From the above tree, we want to delete the node having value 6 then we will set left pointer of its parent
node as NULL. That is left pointer of node having value 8 is set to NULL.

Searching a node from binary search tree


In searching, the node which we want to search is called a key node. The key node will be compared with
each node starting from root node if value of key node is greater than current node then we search for it on
right subbranch otherwise on left subbranch. If we reach to leaf node and still we do not get the value of key
node then we declare "node is not present in the tree".
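The search just described can be sketched iteratively over a small hand-built tree (the `Node` class here is illustrative):

```python
class Node:
    def __init__(self, value, left=None, right=None):
        self.value = value
        self.left = left
        self.right = right

def search(node, key):
    """Compare the key with each node starting from the root."""
    while node is not None:
        if key == node.value:
            return True
        # greater keys are searched on the right sub-branch, smaller on the left
        node = node.left if key < node.value else node.right
    return False

# the tree: 8 with left child 3 (whose right child is 6) and right child 10
root = Node(8, Node(3, right=Node(6)), Node(10))
```

If the walk falls off a leaf without matching, the key is not present in the tree.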

4.6 AVL Trees


Adelson-Velskii and Landis in 1962 introduced a binary tree structure that is balanced with respect to the heights of its subtrees. The tree can be made balanced, and because of this, retrieval of any node can be done in O(log n) time, where n is the total number of nodes. From the names of these scientists the tree is called an AVL tree.
• Definition
An empty tree is height balanced. If T is a non-empty binary tree with T_L and T_R as its left and right subtrees, then T is height balanced if and only if,

i) T_L and T_R are height balanced, and

ii) |h_L - h_R| <= 1, where h_L and h_R are the heights of T_L and T_R.

The idea of balancing a tree is obtained by calculating the balance factor of a tree.
• Definition of Balance Factor
The balance factor BF(T) of a node in a binary tree is defined to be h_L - h_R, where h_L and h_R are the heights of the left and right subtrees of T.


For any node in AVL tree the balance factor i.e. BF(T) is -1, 0 or +1
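The balance factor can be computed directly from subtree heights. This sketch uses the convention that an empty tree has height -1, which is an assumption (some texts use 0):

```python
class Node:
    def __init__(self, data, left=None, right=None):
        self.data = data
        self.left = left
        self.right = right

def height(node):
    """Height of a subtree; an empty tree has height -1 by this convention."""
    if node is None:
        return -1
    return 1 + max(height(node.left), height(node.right))

def balance_factor(node):
    """BF(T) = h_L - h_R, the height difference of the left and right subtrees."""
    return height(node.left) - height(node.right)

# a height-balanced tree: 20 with children 10 and 30, where 10 has left child 5
root = Node(20, Node(10, Node(5)), Node(30))
```

Here BF(20) = 1 - 0 = 1 and BF(30) = 0, so every node is within the allowed range of -1, 0 or +1.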

4.6.1 Representation of AVL Tree



•The AVL tree follows the property of binary search tree. In fact AVL trees are basically binary search trees
with balance factor as -1, 0 or + 1.
•After insertion of any node in an AVL tree if the balance factor of any node becomes other than -1, 0, or + 1
then it is said that AVL property is violated. Then we have to restore the destroyed balance condition. The
balance factor is denoted at right top corner inside the node.

After an insertion of a new node if balance condition gets destroyed, then the nodes on that path (new node
insertion point to root) needs to be readjusted. That means only the affected subtree is to be rebalanced.
• The rebalancing should be such that entire tree should satisfy AVL property.

4.6.2 Insertion
There are four different cases when rebalancing is required after insertion of new node.
1. An insertion of new node into left subtree of left child (LL).
2. An insertion of new node into right subtree of left child (LR).
3. An insertion of new node into left subtree of right child (RL).
4. An insertion of new node into right subtree of right child (RR).
There is a symmetry between case 1 and 4. Similarly symmetry exists between case 2 and 3.
Some modifications done on AVL tree in order to rebalance it is called rotations of AVL tree.
There are two types of rotations.
Single rotation
Double rotation
Insertion algorithm
1. Insert a new node as new leaf just as in ordinary binary search tree.



2. Now trace the path from the insertion point (the new node inserted as a leaf) towards the root. For each node 'n' encountered, check if the heights of left(n) and right(n) differ by at most 1.
a) If yes, move towards parent(n).
b) Otherwise, restructure by doing either a single rotation or a double rotation.
Thus, once we perform a rotation at node 'n', we do not need to perform any rotation at any ancestor of 'n'.
4.6.3 Different rotations in AVL tree
1. LL rotation

When node '1' gets inserted as a left child of node 'C' then AVL property gets destroyed i.e. node A has
balance factor + 2.
The LL rotation has to be applied to rebalance the nodes.
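The LL case is repaired by a single right rotation at the unbalanced node. A sketch, with illustrative node labels A, B, C (A unbalanced, B its left child, C the new left-left grandchild):

```python
class Node:
    def __init__(self, data, left=None, right=None):
        self.data = data
        self.left = left
        self.right = right

def rotate_ll(z):
    """Single right rotation at the unbalanced node z; returns the new subtree root."""
    y = z.left
    z.left = y.right    # y's right subtree moves under z
    y.right = z         # z becomes y's right child
    return y

# A is unbalanced after inserting C below B:  A -> B -> C (all left children)
unbalanced = Node('A', Node('B', Node('C')))
root = rotate_ll(unbalanced)   # B becomes the new root, with children C and A
```

The RR case is the mirror image (a single left rotation), and the LR and RL cases are double rotations built from these two single rotations.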
2. RR rotation
When node '4' gets attached as right child of node 'C' then node 'A' gets unbalanced. The rotation which
needs to be applied is RR rotation

3. LR rotation
When node '3' is attached as a right child of node 'C', unbalancing occurs in the LR pattern. Hence LR rotation needs to be applied.
4. RL rotation
When node '2' is attached as a left child of node 'C' then node 'A' gets unbalanced as its balance factor
becomes 2. Then RL rotation needs to be applied to rebalance the AVL tree.


4.6.4 Deletion
Even after deletion of any particular node from AVL tree, the tree has to be restructured in order to preserve
AVL property. And thereby various rotations need to be applied.
Algorithm for deletion
The deletion algorithm is more complex than insertion algorithm.
1.Search the node which is to be deleted.
2. a) If the node to be deleted is a leaf node then simply make it NULL to remove.
b) If the node to be deleted is not a leaf node i.e. node may have one or two children, then the node must be
swapped with its inorder successor. Once the node is swapped, we can remove this node.
3. Now we have to traverse back up the path towards root, checking the balance factor of every node along
the path. If we encounter unbalancing in some subtree then balance that subtree using appropriate single or
double rotations.
The deletion algorithm takes O(log n) time to delete any node

Thus the node 14 gets deleted from AVL tree.


4.6.5 Searching
The searching of a node in an AVL tree is very simple. As an AVL tree is basically a binary search tree, the algorithm used for searching a node in a binary search tree is the same one used for an AVL tree.
For example: Consider an AVL tree, as given below
Now if we want to search node with value 13 then, we will start searching process from root.


Step 1: Compare 13 with root node 10. As 13 > 10, search right subbranch.
Step 2: Compare 13 with 18. As 13 < 18, search left subbranch.
Step 3: Compare 13 with 12. As 13 > 12, search right subbranch.
Step 4: Compare 13 with 13. Declare now "node is found".
The searching operation takes O (log n) time.
The main objective here is to keep the tree balanced at all times. Such balancing causes the depth of the tree to remain at its minimum, and therefore the overall cost of search is reduced.
4.7 Binary Heap

In this section we will learn "What is heap?" and "How to construct heap?"

Definition: A heap is a complete binary tree or an almost complete binary tree in which every parent node is either greater than or equal to, or less than or equal to, its child nodes.

Heap can be min heap or max heap.

4.7.1 Types of heap

i)Min heap ii) Max heap

A Max heap is a tree in which value of each node is greater than or equal to the value of its children nodes.

For example

A Min heap is a tree in which value of each node is less than or equal to value of its children nodes.



For example:

The parent being consistently greater or lesser than its children is called the parental property. Thus a heap has two important properties.

Heap:

i)It should be a complete binary tree

ii)It should satisfy parental property
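Python's standard heapq module maintains exactly this parental property for a min heap; a max heap can be sketched by negating the keys (a common workaround, since heapq only provides a min heap):

```python
import heapq

values = [35, 33, 42, 10, 14, 19, 27, 44, 26, 31]

# Min heap: heapq maintains the parental property "parent <= children"
min_heap = list(values)
heapq.heapify(min_heap)
smallest = heapq.heappop(min_heap)   # the root of the min heap

# Max heap: store negated keys and negate again when popping
max_heap = [-v for v in values]
heapq.heapify(max_heap)
largest = -heapq.heappop(max_heap)   # the root of the max heap
```

Because heapq stores the heap in a flat list, the complete-binary-tree shape property is automatic: the children of index i live at indices 2i + 1 and 2i + 2.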

Before understanding the construction of heap let us learn/revise few basics that are required while
constructing heap.

• Level of binary tree: The root of the tree is always at level 0. Any node is always at a level one more than its parent node's level.

For example:

• Height of the tree: The maximum level is the height of the tree. The height of the tree is also called depth
of the tree.

For example:

• Complete binary tree: The complete binary tree is a binary tree in which all the levels of the tree are filled
completely except the lowest level nodes which are filled from left if required.



For example:

•. Almost complete binary tree: The almost complete binary tree is a tree in which

i) Each node has a left child whenever it has a right child. That means there is always a left child, but for a
left child there may not be a right child.

ii) The leaf in a tree must be present at height h or h-1. That means all the leaves are on two adjacent levels.

For example:

In Fig. 4.16.5 (a) the given binary tree is satisfying both the properties. That is all the leaves are at either
level 2 or level 1 and when right child is present left child is always present.

In Fig. 4.14.5 (b), the given binary tree is not satisfying the first property; that is the right child is present for
node 20 but it has no left child. Hence it is not almost complete binary tree.

In Fig. 4.14.5 (c), the given binary tree is not satisfying the second property; that is the leaves are at level 3
and level 1 and not at adjacent levels. Hence it is not almost complete binary tree.

4.8 Multiway Trees in Data Structures


In computer science and information technology, multiway trees, also referred to as multi-ary trees or generic trees, are a fundamental type of data structure. They offer a flexible approach to describing hierarchical structures and are used in a variety of contexts, including file systems, databases, and parse trees. We shall examine the idea of multiway trees, their attributes, types, operations, and importance in data structures in this section.


4.8.1 Characteristics of Multiway Trees:


A multiway tree is a tree data structure that allows each node to have several children. Multiway trees can have any number of child nodes, as opposed to binary trees, where each node can have a maximum of two children. This flexibility makes them ideal for modeling intricate hierarchical relationships.
Key characteristics of multiway trees include:

 Node Structure: Each node in a multiway tree has numerous pointers pointing to the nodes that are
its children. From node to node, the number of pointers may differ.
 Root Node: The root node, from which all other nodes can be accessed, is the node that acts as the
tree's origin.
 Leaf Nodes: Nodes with no children are known as leaf nodes. Internal nodes in a multiway tree, by contrast, can have any number of children.
 Height and Depth: Multiway trees have height (the maximum depth of the tree) and depth (the
distance from the root to a node). The depth and height of the tree might vary substantially due to the
different number of children.
4.8.2 Types of Multiway Trees:
Multiway trees come in a variety of forms, each with particular properties and uses. The most typical
varieties include:

 General Multiway Trees: Each node in these trees is allowed to have an unlimited number of
offspring. They are utilized in a broad variety of applications, such as company organizational
hierarchies and directory structures in file systems.
 B-trees: In file systems and databases, B-trees are a sort of self-balancing multiway tree. They are
effective for insertions, deletions, and searches because they are made to preserve a balanced
structure.
 Ternary Trees: A ternary tree is a variety of multiway tree in which each node has at most three children. They are frequently incorporated into optimization and language-processing methods.
 Quad Trees and Oct Trees: These are two examples of specialized multiway trees used in
geographic information systems and computer graphics for spatial segmentation and indexing. Oct
trees have eight children per node, compared to quad trees' four.
4.8.3 Operations on Multiway Trees:
Multiple operations are supported by multiway trees, allowing for effective data modification and retrieval.
The following are a few examples of basic operations:

 Insertion: Adding a new node to the tree while making sure it preserves the structure and
characteristics of the tree.
 Deletion: A node is deleted from the tree while still preserving its integrity.
 Search: Finding a particular node or value within the tree using search.
 Traversal: Traversal is the process of going through every node in the tree in a particular order, such
as pre-order, in-order, or post-order.



 Balancing (for B-trees): To maintain quick search and retrieval times, balancing (for B-trees)
involves making sure the tree maintains its balance after insertions and deletions.

4.8.4 Significance of Multiway Trees:


For the following reasons, multiway trees are very important in data structures and computer science.

 Versatility: Multiway trees are useful for a wide range of applications because they offer a versatile
way to describe hierarchical relationships.
 Efficiency: For managing and organizing massive datasets in databases and file systems, certain
varieties of multiway trees, such as B-trees, are built for efficiency.
 Parsing and Syntax Trees: Multiway trees are crucial for compilers and interpreters because they
are utilized in the parsing and syntax analysis of programming languages.
 Spatial Data Structures: Multiway trees, such as quadtrees and oct trees, allow for effective spatial
indexing and partitioning in geographic information systems and computer graphics.

UNIT V GRAPH STRUCTURES


Graph ADT – representations of graph – graph traversals – DAG – topological ordering – shortest paths –
minimum spanning trees.

5.1 Graph Data Structure and Algorithms


Graph Data Structure is a collection of nodes connected by edges. It’s used to represent relationships between
different entities. Graph algorithms are methods used to manipulate and analyze graphs, solving various problems
like finding the shortest path or detecting cycles.
What is Graph Data Structure?
Graph is a non-linear data structure consisting of vertices and edges. The vertices are sometimes also referred to
as nodes and the edges are lines or arcs that connect any two nodes in the graph. More formally a Graph is
composed of a set of vertices( V ) and a set of edges( E ). The graph is denoted by G(V, E).
Components of a Graph:
Vertices: Vertices are the fundamental units of the graph. Sometimes, vertices are also known as vertex or nodes.
Every node/vertex can be labeled or unlabeled.
Edges: Edges are drawn or used to connect two nodes of the graph. It can be ordered pair of nodes in a directed
graph. Edges can connect any two nodes in any possible way. There are no rules. Sometimes, edges are also
known as arcs. Every edge can be labelled/unlabelled.
5.1.1 Representations of Graph
Here are the two most common ways to represent a graph :
 Adjacency Matrix
 Adjacency List

5.1.2 Adjacency Matrix


An adjacency matrix is a way of representing a graph as a matrix of boolean (0’s and 1’s).



Let’s assume there are n vertices in the graph So, create a 2D matrix adjMat[n][n] having dimension n x n.
If there is an edge from vertex i to j, mark adjMat[i][j] as 1.
If there is no edge from vertex i to j, mark adjMat[i][j] as 0.
Representation of Undirected Graph to Adjacency Matrix:
The below figure shows an undirected graph. Initially, the entire matrix is initialized to 0. If there is an edge from source to destination, we insert 1 in both cells (adjMat[source][destination] and adjMat[destination][source]) because we can go either way.
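Building such a matrix can be sketched as follows (the helper name is illustrative):

```python
def build_adj_matrix(n, edges, directed=False):
    """Build an n x n adjacency matrix from a list of (source, destination) edges."""
    adj_mat = [[0] * n for _ in range(n)]
    for src, dst in edges:
        adj_mat[src][dst] = 1
        if not directed:
            adj_mat[dst][src] = 1   # undirected: mark both directions
    return adj_mat

# undirected graph on 3 vertices with edges 0-1, 1-2 and 0-2
m = build_adj_matrix(3, [(0, 1), (1, 2), (0, 2)])
```

For an undirected graph the resulting matrix is symmetric; passing directed=True marks only adjMat[source][destination].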

5.1.3 Representation of Directed Graph to Adjacency Matrix: The below figure shows a directed graph.
Initially, the entire matrix is initialized to 0. If there is an edge from source to destination, we insert 1 only at adjMat[source][destination].

5.1.4 Adjacency List


An array of Lists is used to store edges between two vertices. The size of array is equal to the number of vertices
(i.e, n). Each index in this array represents a specific vertex in the graph. The entry at the index i of the array
contains a linked list containing the vertices that are adjacent to vertex i.
Let’s assume there are n vertices in the graph So, create an array of list of size n as adjList[n].
adjList[0] will have all the nodes which are connected (neighbour) to vertex 0.
adjList[1] will have all the nodes which are connected (neighbour) to vertex 1 and so on.
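The array-of-lists structure can be sketched as follows (the helper name is illustrative):

```python
def build_adj_list(n, edges, directed=False):
    """Build an adjacency list (array of lists) from (source, destination) edges."""
    adj_list = [[] for _ in range(n)]
    for src, dst in edges:
        adj_list[src].append(dst)
        if not directed:
            adj_list[dst].append(src)   # undirected: record both directions
    return adj_list

# undirected graph on 3 vertices with edges 0-1, 0-2 and 1-2
adj = build_adj_list(3, [(0, 1), (0, 2), (1, 2)])
```

Each entry adj[i] then holds exactly the neighbours of vertex i, which is the representation the traversal algorithms below consume.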



5.1.5 Representation of Undirected Graph to Adjacency list:
The below undirected graph has 3 vertices. So, an array of lists will be created of size 3, where each index represents a vertex. Now, vertex 0 has two neighbours (i.e., 1 and 2), so insert vertices 1 and 2 at index 0 of the array. Similarly, vertex 1 has two neighbours (i.e., 2 and 0), so insert vertices 2 and 0 at index 1 of the array. Similarly, for vertex 2, insert its neighbours into the array of lists.

5.1.6 Representation of Directed Graph to Adjacency list:


The below directed graph has 3 vertices. So, an array of lists will be created of size 3, where each index represents a vertex. Now, vertex 0 has no neighbours. Vertex 1 has two neighbours (i.e., 0 and 2), so insert vertices 0 and 2 at index 1 of the array. Similarly, for vertex 2, insert its neighbours into the array of lists.

5.2 Types of Graph in Data Structure


There are different types of graphs in data structures, each of which is detailed below.
 Complete Graph
A simple graph G = (V, E) with n vertices is complete if every pair of vertices is connected by an edge. It is also known as a full graph, because each vertex's degree must be n - 1.

∙ Weighted Graph
A graph G= (V, E) is called a labeled or weighted graph because each edge has a value or weight representing the
cost of traversing that edge.


∙ Directed Graph
A directed graph also referred to as a digraph, is a set of nodes connected by edges, each with a direction

∙ Undirected Graph
An undirected graph comprises a set of nodes and links connecting them. The order of the two connected vertices
is irrelevant and has no direction. You can form an undirected graph with a finite number of vertices and edges.

∙ Connected Graph
If there is a path between one vertex of a graph data structure and any other vertex, the graph is connected.

∙ Disconnected Graph
A graph is called a disconnected graph when at least one pair of its vertices has no path linking them.

∙ Cyclic Graph
If a graph contains at least one graph cycle, it is considered to be cyclic.

∙ Acyclic Graph
When there are no cycles in a graph, it is called an acyclic graph.


∙ Directed Acyclic Graph


It's also known as a directed acyclic graph (DAG), and it's a graph with directed edges but no cycle. It represents
the edges using an ordered pair of vertices since it directs the vertices and stores some data.

5.3 Graph Traversal in Data Structure


We can traverse a graph in two ways :
1. BFS ( Breadth First Search )
2. DFS ( Depth First Search )

5.3.1 BFS Graph Traversal in Data Structure


Breadth-first search (BFS) traversal is a technique for visiting all nodes in a given graph. This traversal algorithm selects a node and visits all adjacent nodes in order. After checking all adjacent vertices, it examines another set of vertices and then rechecks their adjacent vertices. This algorithm uses a queue as an additional data structure to store nodes for further processing. At most, the queue holds the total number of vertices in the graph.
Graph Traversal: BFS Algorithm
Pseudo Code :
def bfs(graph, start_node):
    queue = [start_node]
    visited = set()
    while queue:
        node = queue.pop(0)
        if node not in visited:
            visited.add(node)
            print(node)
            for neighbor in graph[node]:
                queue.append(neighbor)
Explanation of the above Pseudocode
 The technique starts by creating a queue with the start node and an empty set to keep track of visited nodes.

 It then starts a loop that continues until all nodes have been visited.



 During each loop iteration, the algorithm dequeues the first node from the queue, checks if it has been visited and, if not, marks it as visited, prints it (or performs any other desired action), and adds all its adjacent nodes to the queue.
 The operation is repeated until the queue is empty, signifying the completion of the traversal.

 Step 1: We pick A as the starting point and add A to the Queue. To prevent cycles, we also mark A as visited (by adding it to the visited set).
 Step 2: We remove the head of the Queue (i.e. A), the node that was inserted First In. We process A and pick all its neighbours that have not been visited yet (i.e., not in the visited set), adding each of them to the Queue and to the visited set.
 Step 3: Next, we pull the head of the Queue, i.e. D. We process D and consider all neighbours of D, which are A and E; since both are already in the visited set, we ignore them and move forward.
 Step 4: Next, we pull the head of the Queue, i.e. E. We process E and consider all its neighbours, which are A and D; since both are already in the visited set, we ignore them and move forward.
 Next, we pull the head of the Queue, i.e. C. We process C and consider all its neighbours, which are A and B. Since A is in the visited set, we ignore it, but as B has not yet been visited, we mark B as visited and add it to the Queue.
 Step 5: Finally, we pick B from the Queue; its only neighbour is C, which has already been visited. We have nothing else in the Queue to process, so we are done traversing the graph.
 So the order in which we processed/explored the elements is A, D, E, C, B, which is the Breadth-First Search of the above graph.
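The walkthrough above can be sketched as a short Python routine. Note that the adjacency list below is a reconstruction guessed from the step-by-step trace (the original figure is not reproduced here); A's neighbours are listed in the order D, E, C so that the visit order matches the trace:

```python
from collections import deque

def bfs(graph, start):
    """Breadth-First Search; returns the order in which nodes are processed."""
    visited = {start}          # mark the start node visited to prevent cycles
    queue = deque([start])
    order = []
    while queue:
        node = queue.popleft()  # remove the head of the Queue (First In)
        order.append(node)
        for neighbour in graph[node]:
            if neighbour not in visited:
                visited.add(neighbour)   # mark when enqueued, as in the trace
                queue.append(neighbour)
    return order

# Adjacency list assumed from the walkthrough above (figure not shown)
graph = {
    'A': ['D', 'E', 'C'],
    'D': ['A', 'E'],
    'E': ['A', 'D'],
    'C': ['A', 'B'],
    'B': ['C'],
}
print(bfs(graph, 'A'))  # ['A', 'D', 'E', 'C', 'B']
```

Running this reproduces the processing order A, D, E, C, B derived in the trace.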
5.3.2 DFS Graph Traversal in Data Structure
When traversing a graph, the DFS method goes as far as it can along one branch before backtracking. Starting from a given source node, the algorithm recursively visits an unvisited neighbour, exploring the deepest vertices of one branch of the graph before moving on to other branches. DFS can be implemented with either recursion or an explicit stack.
Graph Traversal: DFS Algorithm
Pseudocode:
def dfs(graph, start_node, visited=None):
    if visited is None:   # avoid a shared mutable default argument
        visited = set()
    visited.add(start_node)
    print(start_node)


    for neighbor in graph[start_node]:
        if neighbor not in visited:
            dfs(graph, neighbor, visited)
Explanation of the above Pseudocode
 The method starts by marking the start node as visited and printing it (or performing whatever other action is needed).
 It then recursively visits every adjacent node that has not yet been visited. This procedure repeats until all reachable nodes have been visited.
 During each recursive call, the algorithm marks the current node as visited, prints it, and then invokes itself on each neighbouring node that has not yet been visited.

 Example

 Let us see how the DFS algorithm works with an example. Here, we will use an undirected graph with
5 vertices.
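Since the figure for the 5-vertex example is not reproduced here, the sketch below runs the recursive DFS above on a hypothetical 5-vertex undirected graph. It creates a fresh visited set per top-level call, so repeated calls do not share state, and it collects the visit order instead of printing:

```python
def dfs(graph, node, visited=None):
    """Recursive DFS; returns the list of nodes in visit order."""
    if visited is None:
        visited = set()          # fresh set for each top-level call
    visited.add(node)
    order = [node]
    for neighbour in graph[node]:
        if neighbour not in visited:
            order.extend(dfs(graph, neighbour, visited))
    return order

# Hypothetical undirected graph with 5 vertices (adjacency list, both directions)
graph = {
    0: [1, 2],
    1: [0, 3, 4],
    2: [0],
    3: [1],
    4: [1],
}
print(dfs(graph, 0))  # [0, 1, 3, 4, 2]
```

Note how DFS finishes the whole branch under vertex 1 (visiting 3 and 4) before backtracking to explore vertex 2.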

What is a Directed Acyclic Graph?


A Directed Acyclic Graph (DAG) is a directed graph that does not contain any cycles.
The graph below represents a Directed Acyclic Graph (DAG):

Meaning of Directed Acyclic Graph:


A Directed Acyclic Graph has two important features:
 Directed Edges: In a Directed Acyclic Graph, each edge has a direction, meaning it goes from one vertex (node) to another. This direction signifies a one-way relationship or dependency between nodes.
 Acyclic: The term “acyclic” indicates that there are no cycles or closed loops within the graph. In other words, you cannot follow a sequence of directed edges and return to the same node. This prohibition on cycles is the defining characteristic of a DAG.
Properties of Directed Acyclic Graph DAG:
Directed Acyclic Graph (DAG) has different properties that make them usable in graph problems.
There are following properties of Directed Acyclic Graph (DAG):

 Reachability Relation: In a DAG, we can determine whether there is a reachability relation between two nodes. Node A is said to be reachable from node B if there exists a directed path that starts at node B and ends at node A, i.e., you can follow the direction of the edges in the graph to get from B to A.
 Transitive Closure: The transitive closure of a directed graph is a new graph that represents all the direct and indirect relationships or connections between nodes in the original graph. In other words, it tells you which nodes can be reached from other nodes by following one or more directed edges.

Transitive Closure of Directed Acyclic Graph
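The reachability relation and transitive closure described above can be computed with a depth-first search from every node. A minimal sketch (the `dag` dictionary is a made-up example, not the graph from the figure):

```python
def transitive_closure(graph):
    """Return a dict mapping each node to the set of nodes reachable
    from it via one or more directed edges."""
    def reach(u, seen):
        # DFS: every neighbour, and everything reachable from it, is reachable
        for v in graph[u]:
            if v not in seen:
                seen.add(v)
                reach(v, seen)
        return seen

    return {u: reach(u, set()) for u in graph}

# Hypothetical DAG: 0 -> 1 -> 2; the closure adds the indirect edge 0 -> 2
dag = {0: [1], 1: [2], 2: []}
print(transitive_closure(dag))  # {0: {1, 2}, 1: {2}, 2: set()}
```

Here node 2 is reachable from node 0 even though there is no direct edge, which is exactly the indirect relationship the transitive closure makes explicit.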


 Transitive Reduction: The transitive reduction of a directed graph is a new graph that retains only the
essential, direct relationships between nodes, while removing any unnecessary indirect edges. In essence, it
simplifies the graph by eliminating edges that can be inferred from the remaining edges.

Transitive Reduction of Directed Acyclic Graph


5.4 Topological Ordering: A DAG can be topologically sorted, which means its nodes can be linearly ordered in such a way that for every edge, the start node of the edge occurs earlier in the sequence than its end node. This property is useful for tasks like scheduling and dependency resolution.

Topological Ordering of Directed Acyclic Graph


Practical Applications of DAG:
 Data flow Analysis: In compiler design and optimization, DAGs are used to represent data flow within a
program. This aids in optimizing code by identifying redundant calculations and dead code. DAGs are also used
to represent the structure of basic blocks in Compiler Design.


 Task Scheduling: DAGs are used in project management and job scheduling. Each task or job is represented as a node in the DAG, with directed edges indicating dependencies. The acyclic nature of the DAG ensures tasks are scheduled in a logical order, preventing circular dependencies.
A weighted directed acyclic graph can be used to represent a scheduling problem. Take the example of a task-scheduling problem: a vertex can represent a task, and its weight can represent the size of the task's computation. Similarly, an edge can represent the communication between two tasks, and its weight can represent the cost of that communication:

Topological Sorting
Topological sorting for a Directed Acyclic Graph (DAG) is a linear ordering of vertices such that for every directed edge u → v, vertex u comes before v in the ordering.
topological_sort(N, adj[N][N])
    T = []
    visited = []
    in_degree = []
    for i = 0 to N
        in_degree[i] = visited[i] = 0
    for i = 0 to N
        for j = 0 to N
            if adj[i][j] is TRUE
                in_degree[j] = in_degree[j] + 1
    for i = 0 to N
        if in_degree[i] is 0
            enqueue(Queue, i)
            visited[i] = TRUE
    while Queue is not Empty
        vertex = get_front(Queue)
        dequeue(Queue)
        T.append(vertex)
        for j = 0 to N
            if adj[vertex][j] is TRUE and visited[j] is FALSE
                in_degree[j] = in_degree[j] - 1
                if in_degree[j] is 0
                    enqueue(Queue, j)
                    visited[j] = TRUE
    return T


Let's take a graph and see the algorithm in action. Consider the graph given below:
Initially in_degree[0] = 0, so 0 is placed in the Queue, and T is empty.
So, we delete 0 from the Queue and append it to T. The vertices directly connected to 0 are 1 and 2, so we decrease their in_degree[] by 1. Now in_degree[1] = 0, so 1 is pushed into the Queue.
Next we delete 1 from the Queue and append it to T. Doing this, we decrease in_degree[2] by 1; it now becomes 0, and 2 is pushed into the Queue.
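The pseudocode above can be restated as runnable Python using an adjacency list instead of an adjacency matrix; the three-vertex `dag` below mirrors the trace just described (0 points to 1 and 2, and 1 points to 2):

```python
from collections import deque

def topological_sort(graph):
    """Kahn's algorithm on an adjacency-list DAG; returns one valid ordering."""
    # Compute in-degrees of every vertex
    in_degree = {u: 0 for u in graph}
    for u in graph:
        for v in graph[u]:
            in_degree[v] += 1

    # Start with all vertices of in-degree 0
    queue = deque(u for u in graph if in_degree[u] == 0)
    order = []
    while queue:
        u = queue.popleft()
        order.append(u)
        for v in graph[u]:          # "remove" u's outgoing edges
            in_degree[v] -= 1
            if in_degree[v] == 0:
                queue.append(v)

    if len(order) != len(graph):    # leftover vertices mean a cycle
        raise ValueError("graph contains a cycle; not a DAG")
    return order

# DAG from the trace: 0 -> 1, 0 -> 2, 1 -> 2
dag = {0: [1, 2], 1: [2], 2: []}
print(topological_sort(dag))  # [0, 1, 2]
```

The cycle check at the end is a small addition: if any vertex never reaches in-degree 0, the input was not acyclic and no topological order exists.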

5.5 SHORTEST PATHS


Dijkstra’s algorithm is a popular algorithm for solving single-source shortest-path problems on graphs with non-negative edge weights, i.e., for finding the shortest distance between two vertices of a graph.
The algorithm maintains a set of visited vertices and a set of unvisited vertices. It starts at the source vertex and
iteratively selects the unvisited vertex with the smallest tentative distance from the source. It then visits the
neighbors of this vertex and updates their tentative distances if a shorter path is found. This process continues
until the destination vertex is reached, or all reachable vertices have been visited.
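A minimal implementation of this idea, using Python's heapq module as the priority queue of tentative distances, is sketched below. The weighted graph is an assumed reconstruction of the figure used in the walkthrough that follows (edge weights chosen to match the stated distances), not the exact graph from the notes:

```python
import heapq

def dijkstra(graph, source):
    """Single-source shortest paths; assumes non-negative edge weights.
    graph maps each node to a list of (neighbour, weight) pairs."""
    dist = {source: 0}
    heap = [(0, source)]
    while heap:
        d, u = heapq.heappop(heap)      # unvisited node with smallest distance
        if d > dist.get(u, float('inf')):
            continue                     # stale heap entry; u already settled
        for v, w in graph[u]:
            nd = d + w
            if nd < dist.get(v, float('inf')):
                dist[v] = nd             # found a shorter path to v
                heapq.heappush(heap, (nd, v))
    return dist

# Hypothetical weighted graph matching the step distances in the walkthrough
graph = {
    0: [(1, 2), (2, 6)],
    1: [(0, 2), (3, 5)],
    2: [(0, 6), (3, 8)],
    3: [(1, 5), (2, 8), (4, 10), (5, 15)],
    4: [(3, 10), (6, 2)],
    5: [(3, 15), (6, 6)],
    6: [(4, 2), (5, 6)],
}
print(dijkstra(graph, 0))  # distance to node 6 is 19, as in the walkthrough
```

Instead of explicitly maintaining a visited set, this variant skips stale heap entries: once a node is popped with its final distance, any later, larger entry for it is ignored.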
Step 1: Start from Node 0 and mark it as visited; as you can see in the image below, the visited node is marked red.


Dijkstra’s Algorithm
Step 2: Check the adjacent nodes. Now we have two choices (either choose Node 1 with distance 2, or choose Node 2 with distance 6), and we choose the node with the minimum distance. In this step Node 1 is the minimum-distance adjacent node, so we mark it as visited and add up the distance.
Distance: Node 0 -> Node 1 = 2

Dijkstra’s Algorithm
Step 3: Then move forward and check the adjacent node, which is Node 3; mark it as visited and add up the distance. Now the distance will be:
Distance: Node 0 -> Node 1 -> Node 3 = 2 + 5 = 7



Dijkstra’s Algorithm
Step 4: Again we have two choices of adjacent nodes (either choose Node 4 with distance 10, or choose Node 5 with distance 15), so we choose the node with the minimum distance. In this step Node 4 is the minimum-distance adjacent node, so we mark it as visited and add up the distance.
Distance: Node 0 -> Node 1 -> Node 3 -> Node 4 = 2 + 5 + 10 = 17

Dijkstra’s Algorithm
Step 5: Again, move forward and check the adjacent node, which is Node 6; mark it as visited and add up the distance. Now the distance will be:
Distance: Node 0 -> Node 1 -> Node 3 -> Node 4 -> Node 6 = 2 + 5 + 10 + 2 = 19
Dijkstra’s Algorithm
So, the shortest distance from the source vertex is 19, which is the optimal one.
