Advanced Data Structures
Advanced data structures are one of the most important disciplines of data science, since
they store, organize, and manage data and information in ways that make it more
efficient to access and modify. They are the foundation for designing and
developing efficient and effective software and algorithms.
Definition of Algorithm
The word Algorithm means "a set of finite rules or instructions to be followed in
calculations or other problem-solving operations",
or
"a procedure for solving a mathematical problem in a finite number of steps that
frequently involves recursive operations".
Therefore, an algorithm refers to a sequence of finite steps that solve a particular problem.
Use of the Algorithms:
Algorithms play a crucial role in various fields and have many applications. Some of the key
areas where algorithms are used include:
1. Computer Science: Algorithms form the basis of computer programming and are used
to solve problems ranging from simple sorting and searching to complex tasks such as
artificial intelligence and machine learning.
2. Mathematics: Algorithms are used to solve mathematical problems, such as finding the
optimal solution to a system of linear equations or finding the shortest path in a graph.
3. Operations Research: Algorithms are used to optimize and make decisions in fields
such as transportation, logistics, and resource allocation.
4. Artificial Intelligence: Algorithms are the foundation of artificial intelligence and
machine learning, and are used to develop intelligent systems that can perform tasks
such as image recognition, natural language processing, and decision-making.
5. Data Science: Algorithms are used to analyze, process, and extract insights from large
amounts of data in fields such as marketing, finance, and healthcare.
Need for algorithms
1. Algorithms are necessary for solving complex problems efficiently and effectively.
2. They help to automate processes and make them more reliable, faster, and easier to
perform.
3. Algorithms also enable computers to perform tasks that would be difficult or impossible
for humans to do manually.
4. They are used in various fields such as mathematics, computer science, engineering,
finance, and many others to optimize processes, analyze data, make predictions, and
provide solutions to problems.
Characteristics of an Algorithm
● Clear and Unambiguous: The algorithm should be unambiguous. Each of its steps
should be clear in all aspects and must lead to only one meaning.
● Well-Defined Inputs: If an algorithm takes inputs, those inputs should be well
defined. An algorithm may or may not take input.
● Well-Defined Outputs: The algorithm must clearly define what output will be yielded
and it should be well-defined as well. It should produce at least 1 output.
● Finite-ness: The algorithm must be finite, i.e. it should terminate after a finite time.
● Feasible: The algorithm must be simple, generic, and practical, such that it can be
executed with the available resources. It must not depend on technology that does
not yet exist.
● Language Independent: The Algorithm designed must be language-independent, i.e.
it must be just plain instructions that can be implemented in any language, and yet the
output will be the same, as expected.
● Input: An algorithm has zero or more inputs. Every instruction that contains a
fundamental operator must accept zero or more inputs.
● Output: An algorithm produces at least one output. Every instruction that contains
a fundamental operator must produce one or more outputs.
● Definiteness: All instructions in an algorithm must be unambiguous, precise, and easy
to interpret. By referring to any of the instructions in an algorithm one can clearly
understand what is to be done. Every fundamental operator in instruction must be
defined without any ambiguity.
● Finiteness: An algorithm must terminate after a finite number of steps in all cases.
Every instruction which contains a fundamental operator must terminate within a
finite amount of time. Infinite loops or recursive functions without base conditions do
not possess finiteness.
● Effectiveness: An algorithm must be developed by using very basic, simple, and
feasible operations so that one can trace it out by using just paper and pencil.
Properties of Algorithm:
● It should terminate after a finite time.
● It should produce at least one output.
● It should take zero or more input.
● It should be deterministic, meaning it gives the same output for the same input.
● Every step in the algorithm must be effective, i.e. every step should do some work.
Algorithm analysis
Algorithm analysis is a crucial aspect of computer science and software engineering.
It involves evaluating the efficiency and performance of an algorithm.
Algorithms Design Techniques
An Algorithm is a procedure to solve a particular problem in a finite number of steps for a
finite-sized input.
The algorithms can be classified in various ways. They are:
1. Implementation Method
2. Design Method
3. Design Approaches
4. Other Classifications
Classification by Implementation Method: There are primarily three main
categories into which algorithms can be grouped in this type of classification. They are:
1. Recursion or Iteration: A recursive algorithm is an algorithm that calls itself
repeatedly until a base condition is reached, whereas an iterative algorithm uses loops
and/or data structures like stacks and queues to solve the problem. Every recursive
solution can be implemented as an iterative solution and vice versa.
Example: The Tower of Hanoi is typically implemented recursively, while the Stock Span
problem is implemented iteratively.
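As a minimal illustration of this equivalence (a hypothetical example, not one of the problems named above), the factorial function can be written both ways:

```cpp
// Recursive version: the function calls itself until the base
// condition (n <= 1) is reached.
long long factorialRecursive(int n) {
    if (n <= 1) return 1;                 // base condition
    return n * factorialRecursive(n - 1); // self-call
}

// Iterative version: a loop replaces the chain of recursive calls.
long long factorialIterative(int n) {
    long long result = 1;
    for (int i = 2; i <= n; i++)
        result *= i;
    return result;
}
```

Both functions return the same value for every input, demonstrating that the recursive and iterative formulations are interchangeable.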
2. Exact or Approximate: Algorithms that are guaranteed to find an optimal solution
for a problem are known as exact algorithms. For problems where it is not practical
to find the most optimized solution, an approximation algorithm is used: it finds a
solution that is close to, but not necessarily, the optimal one.
Example: For NP-Hard problems, approximation algorithms are used. Sorting
algorithms are exact algorithms.
3. Serial or Parallel or Distributed Algorithms: In serial algorithms, one
instruction is executed at a time while parallel algorithms are those in which we divide
the problem into subproblems and execute them on different processors. If parallel
algorithms are distributed on different machines, then they are known as distributed
algorithms.
Classification by Design Method: There are several main categories into which
algorithms can be grouped in this type of classification. They are:
1. Greedy Method: In the greedy method, at each step, a decision is made to choose the
local optimum, without thinking about the future consequences.
Example: Fractional Knapsack, Activity Selection.
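A minimal sketch of the greedy method for the fractional knapsack (function name is illustrative; it assumes items may be taken fractionally):

```cpp
#include <algorithm>
#include <vector>

// Fractional knapsack: at each step, greedily take the item with the
// highest value/weight ratio, without considering future consequences.
double fractionalKnapsack(std::vector<double> values,
                          std::vector<double> weights,
                          double capacity) {
    std::vector<size_t> order(values.size());
    for (size_t i = 0; i < order.size(); i++) order[i] = i;
    // Sort item indices by value-per-weight, best first (the greedy choice).
    std::sort(order.begin(), order.end(), [&](size_t a, size_t b) {
        return values[a] / weights[a] > values[b] / weights[b];
    });
    double total = 0.0;
    for (size_t i : order) {
        if (capacity <= 0) break;
        double take = std::min(weights[i], capacity); // take a fraction if needed
        total += values[i] * (take / weights[i]);
        capacity -= take;
    }
    return total;
}
```

The locally optimal choice (best ratio first) is provably globally optimal for the fractional variant, which is why the greedy method applies here but not to 0-1 knapsack.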
2. Divide and Conquer: The Divide and Conquer strategy involves dividing the problem
into sub-problems, recursively solving them, and then recombining their solutions for
the final answer.
Example: Merge sort, Quicksort.
3. Dynamic Programming: The approach of Dynamic programming is similar to divide
and conquer. The difference is that whenever we have recursive function calls with the
same result, instead of calling them again we try to store the result in a data structure in
the form of a table and retrieve the results from the table. Thus, the overall time
complexity is reduced. “Dynamic” means we dynamically decide, whether to call a
function or retrieve values from the table.
Example: 0-1 Knapsack, subset-sum problem.
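The table-based idea can be sketched with a memoized Fibonacci function (a standard illustration; the 0-1 knapsack uses the same store-and-retrieve pattern with a two-dimensional table):

```cpp
#include <vector>

// Memoized Fibonacci: identical recursive calls are computed once and
// stored in a table; later calls retrieve the stored result instead of
// recomputing it, reducing the time complexity from exponential to O(n).
long long fib(int n, std::vector<long long>& table) {
    if (n <= 1) return n;
    if (table[n] != -1) return table[n];              // retrieve from the table
    table[n] = fib(n - 1, table) + fib(n - 2, table); // store once
    return table[n];
}

long long fib(int n) {
    std::vector<long long> table(n + 1, -1); // -1 marks "not yet computed"
    return fib(n, table);
}
```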
4. Linear Programming: In Linear Programming, there are inequalities in terms of
inputs and maximizing or minimizing some linear functions of inputs.
Example: Maximum flow of Directed Graph
5. Reduction(Transform and Conquer): In this method, we solve a difficult problem
by transforming it into a known problem for which we have an optimal solution.
Basically, the goal is to find a reducing algorithm whose complexity is not dominated by
the resulting reduced algorithms.
Example: A selection algorithm for finding the median in a list can first sort the
list and then pick the middle element of the sorted list. This technique is also
called transform and conquer.
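A minimal sketch of this reduction (assuming an odd-length list for simplicity):

```cpp
#include <algorithm>
#include <vector>

// Transform and conquer: the median-finding problem is reduced to
// sorting, a problem we already know how to solve; the middle element
// of the sorted list is then the answer.
int median(std::vector<int> v) {
    std::sort(v.begin(), v.end()); // reduce to the known sorting problem
    return v[v.size() / 2];        // middle element of an odd-length list
}
```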
6. Backtracking: This technique is very useful in solving combinatorial problems
where we have to find the correct combination of steps that leads to fulfillment of
the task. Such problems have multiple stages, with multiple options at each stage.
This approach explores the available options at every stage one by one. While
exploring an option, if a point is reached that does not seem to lead to the solution,
the program control backtracks one step and starts exploring the next option. In
this way, the program explores all possible courses of action and finds the route
that leads to the solution.
Example: N-queens problem, maze problem.
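A compact backtracking sketch for the N-queens problem, counting the number of valid placements (function names are illustrative):

```cpp
#include <vector>

// cols[r] holds the column of the queen placed in row r.
// safe() checks whether a new queen conflicts with earlier rows.
static bool safe(const std::vector<int>& cols, int row, int col) {
    for (int r = 0; r < row; r++)
        if (cols[r] == col ||                // same column
            row - r == col - cols[r] ||      // same "/" diagonal
            row - r == cols[r] - col)        // same "\" diagonal
            return false;
    return true;
}

static int place(std::vector<int>& cols, int row, int n) {
    if (row == n) return 1;                  // all queens placed: one solution
    int count = 0;
    for (int col = 0; col < n; col++) {      // explore every option in this stage
        if (safe(cols, row, col)) {
            cols[row] = col;
            count += place(cols, row + 1, n); // go one stage deeper
            // Returning here abandons the choice: this is the backtrack.
        }
    }
    return count;
}

int countQueenSolutions(int n) {
    std::vector<int> cols(n, 0);
    return place(cols, 0, n);
}
```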
7. Branch and Bound: This technique is very useful in solving combinatorial
optimization problems that have multiple solutions, where we are interested in
finding the optimal one. In this approach, the entire solution space is represented in
the form of a state space tree. As the program progresses, each state combination is
explored, and the best solution found so far is replaced by a new one whenever the
new one is better.
Example: Job sequencing, Travelling Salesman Problem.
Classification by Design Approaches: There are two approaches for designing
an algorithm:
1. Top-Down Approach
2. Bottom-Up Approach
● Top-Down Approach: In the top-down approach, a large problem is divided into
small sub-problems, and the process of decomposing problems is repeated until
the complex problem is solved.
● Bottom-Up Approach: The bottom-up approach is the reverse of the top-down
approach. In this approach, the different parts of a complex program are solved
first and then combined into a complete program.
Top-Down Approach:
Breaking down a complex problem into smaller, more manageable sub-problems and
solving each sub-problem individually.
Designing a system starting from the highest level of abstraction and moving towards the
lower levels.
Bottom-Up Approach:
Building a system by starting with the individual components and gradually integrating
them to form a larger system.
Solving sub-problems first and then using the solutions to build up to a solution of a
larger problem.
Complexity analysis
Complexity analysis is defined as a technique to characterise the time
taken by an algorithm with respect to input size, independent of the
machine, language, and compiler. It is used for evaluating how execution
time varies across different algorithms.
Need for Complexity Analysis
● Complexity Analysis determines the amount of time and space resources
required to execute an algorithm.
● It is used for comparing different algorithms on different input sizes.
● Complexity helps to determine the difficulty of a problem, which is often
measured by how much time and space (memory) it takes to solve.
Asymptotic Notations in Complexity Analysis:
1. Big O Notation: Big-O notation represents the upper bound of the running
time of an algorithm; therefore, it gives the worst-case complexity of an
algorithm. Using Big-O notation, we can asymptotically bound the growth of a
running time to within a constant factor for sufficiently large inputs. It is a
model for quantifying algorithm performance.
Mathematical Representation of Big-O Notation:
O(g(n)) = { f(n): there exist positive constants c and n0 such that 0 ≤ f(n) ≤
cg(n) for all n ≥ n0 }
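The definition can be checked numerically. Taking f(n) = 2n + 3 and g(n) = n as a hypothetical example, the constants c = 5 and n0 = 1 witness f(n) = O(n):

```cpp
// Numerically checking the Big-O definition for f(n) = 2n + 3 and
// g(n) = n, with the witness constants c = 5 and n0 = 1:
// the condition 0 <= f(n) <= c*g(n) must hold for every n >= n0.
bool withinBound(long long n) {
    long long f = 2 * n + 3;  // f(n)
    long long cg = 5 * n;     // c * g(n) with c = 5
    return 0 <= f && f <= cg; // the Big-O condition
}
```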
2. Omega Notation: Omega notation represents the lower bound of the
running time of an algorithm. Thus, it provides the best-case complexity of an
algorithm.
The execution time serves as a lower bound on the algorithm’s time complexity. It
is defined as the condition that allows an algorithm to complete statement execution
in the shortest amount of time.
Mathematical Representation of Omega notation :
Ω(g(n)) = { f(n): there exist positive constants c and n0 such that 0 ≤ cg(n) ≤
f(n) for all n ≥ n0 }
Note: Ω (g) is a set
3. Theta Notation: Theta notation bounds the function from above and below.
Since it represents both the upper and the lower bound of the running time of an
algorithm, it is used for analyzing the average-case complexity of an algorithm.
The execution time serves as both a lower and an upper bound on the
algorithm's time complexity.
Mathematical Representation:
Θ (g(n)) = {f(n): there exist positive constants c1, c2 and n0 such that 0 ≤ c1 *
g(n) ≤ f(n) ≤ c2 * g(n) for all n ≥ n0}
4. Little o asymptotic notation: Big-O is used as a tight upper bound on
the growth of an algorithm's effort (this effort is described by the function f(n)),
even though, as written, it can also be a loose upper bound. "Little-o" (o())
notation is used to describe an upper bound that cannot be tight.
Mathematical Representation:
f(n) = o(g(n)) means lim (n→∞) f(n)/g(n) = 0
5. Little ω asymptotic notation: Let f(n) and g(n) be functions that map
positive integers to positive real numbers. We say that f(n) is ω(g(n)) (or f(n) ∈
ω(g(n))) if for any real constant c > 0, there exists an integer constant n0 ≥ 1
such that f(n) > c * g(n) ≥ 0 for every integer n ≥ n0.
Mathematical Representation:
if f(n) ∈ ω(g(n)) then lim (n→∞) f(n)/g(n) = ∞
Note: In most algorithm analysis, we use Big-O notation, as it gives the
worst-case complexity.
How to measure complexity?
The complexity of an algorithm can be measured in three ways:
1. Time Complexity
The time complexity of an algorithm is defined as the amount of time taken by an
algorithm to run, as a function of the length of the input. Note that the time is
measured as a function of the input length, not as the actual execution time on
the machine the algorithm is running on.
How is Time complexity computed?
To estimate the time complexity, we need to consider the cost of each fundamental
instruction and the number of times the instruction is executed.
If we have statements with basic operations like comparisons, return
statements, assignments, and reading a variable, we can assume each
takes constant time, O(1).
Statement 1: int a = 5;                // reading a variable
Statement 2: if (a == 5) return true;  // return statement
Statement 3: int x = 4 > 5 ? 1 : 0;    // comparison
Statement 4: bool flag = true;         // assignment
The overall time complexity is then the sum of these:
total time = time(statement1) + time(statement2) + ... time (statementN)
Assuming that n is the size of the input, let’s use T(n) to represent the overall
time and t to represent the amount of time that a statement or collection of
statements takes to execute.
T(n) = t(statement1) + t(statement2) + ... + t(statementN);
Overall, T(n)= O(1), which means constant complexity.
For any loop, we find out the runtime of the block inside them and multiply it
by the number of times the program will repeat the loop.
for (int i = 0; i < n; i++) {
    cout << "CSE" << endl;
}
For the above example, the loop executes n times, printing "CSE" n
times, so the time taken to run this program is:
T(n) = n * (t(cout statement))
     = n * O(1)
     = O(n), linear complexity.
For 2D arrays, we would have nested loop concepts, which means a loop
inside a loop.
for (int i = 0; i < n; i++) {
    for (int j = 0; j < m; j++) {
        cout << "CSE" << endl;
    }
}
For the above example, the cout statement executes n*m times, printing
"CSE" n*m times, so the time taken to run this program is:
T(n) = n * m * (t(cout statement))
     = n * m * O(1)
     = O(n*m), quadratic complexity.
2. Space Complexity :
The amount of memory required by the algorithm to solve a given problem is
called the space complexity of the algorithm. Problem-solving using a computer
requires memory to hold temporary data or final result while the program is in
execution.
How is space complexity computed?
The space Complexity of an algorithm is the total space taken by the algorithm with
respect to the input size. Space complexity includes both Auxiliary space and space
used by input.
Space complexity is a parallel concept to time complexity. If we need to create an
array of size n, this will require O(n) space. If we create a two-dimensional
array of size n*n, this will require O(n^2) space.
In recursive calls stack space also counts.
Example:
int add(int n) {
    if (n <= 0) {
        return 0;
    }
    return n + add(n - 1);
}
Here each call adds a level to the stack:
1. add(4)
2. -> add(3)
3. -> add(2)
4. -> add(1)
5. -> add(0)
Each of these calls is added to the call stack and takes up actual memory.
So it takes O(n) space.
However, just because you have n calls total doesn’t mean it takes O(n) space.
Look at the below function :
int pairSum(int x, int y);  // forward declaration so addSequence can call it

int addSequence(int n) {
    int sum = 0;
    for (int i = 0; i < n; i++) {
        sum += pairSum(i, i + 1);
    }
    return sum;
}

int pairSum(int x, int y) {
    return x + y;
}
There will be roughly O(n) calls to pairSum. However, those calls do not
exist simultaneously on the call stack, so you only need O(1) space.
3. Auxiliary Space :
The temporary space needed for the use of an algorithm is referred to as auxiliary
space, such as temporary arrays, pointers, etc.
It is preferable to use auxiliary space when comparing things like sorting
algorithms.
For example, sorting algorithms take O(n) space, as there is an input array to
sort, but the auxiliary space can be O(1) in that case (for an in-place sort).
How does Complexity affect any algorithm?
Time complexity of an algorithm quantifies the amount of time taken by an
algorithm to run as a function of length of the input. While, the space
complexity of an algorithm quantifies the amount of space or memory taken
by an algorithm to run as a function of the length of the input.
How to optimize the time and space complexity of an Algorithm?
Optimization means improving a brute-force approach to a problem. It is done to
derive the best possible solution, so that solving the problem takes less time and
space. We can optimize a program by either limiting the search space at each step
or occupying less search space from the start.
We can optimize a solution for both time and space. To optimize a program:
● We can reduce the time taken to run the program at the cost of more space;
● We can reduce the memory usage of the program at the cost of a longer total
run time; or
● We can reduce both time and space complexity by deploying relevant
algorithms.