COL 106: Data-structures
Course coordinator:
Amit Kumar ([email protected])
Course web-page:
www.cse.iitd.ac.in/~amitk/
Course Information
Fundamental topics that will be needed in almost all
subsequent CSE courses.
You will need to write large programs in the JAVA
programming language.
Textbook: Data Structures and Algorithms in Java by
Goodrich & Tamassia
We will not teach JAVA; there are many excellent sources
on the web. Also read the JAVA module on the course
web-page and do all exercises. The TAs will grade these
exercises.
We will have lab sessions where the TAs will explain/help
with JAVA.
Evaluation components
Quiz : 5% (unannounced)
Minor Exams : 20% each
Assignments : 20% (5-6 assignments)
Major exam : 35%
Assignments
You will be expected to program in JAVA and use
object-oriented programming.
One programming assignment every 2 weeks
NO late submission (strictly enforced, reasons
like illness will not be accepted)
NO COPYING FROM ANY SOURCE
(if caught copying, expect an “F” grade and
other disciplinary actions)
Topics
Arrays
Lists
Abstract Data Types, object-oriented concepts
Stacks, Queues
Trees : Binary trees, Balanced trees, B-trees
Strings : Tries, Matching algorithms
Sorting
Hashing
Graphs
Data Structures and Algorithms
Algorithm: Outline, the essence of a
computational procedure, step-by-step
instructions
Program: an implementation of an
algorithm in some programming language
Data structure: Organization of data
needed to solve the problem
Algorithmic problem
Specification of input  →  ?  →  Specification of output as a function of input

An infinite number of input instances satisfy the
specification. E.g.: a sorted, non-decreasing sequence
of natural numbers of non-zero, finite length:
  1, 20, 908, 909, 100000, 1000000000.
  3.  (a one-element sequence)
Algorithmic Solution
Input instance (adhering to the specification)  →  Algorithm  →  Output (related to the input as required)

The algorithm describes actions on the input instance.
There are infinitely many correct algorithms for the same
algorithmic problem.
What is a Good Algorithm?
Efficient:
Running time
Space used
Efficiency as a function of input size:
The number of bits in an input number
Number of data elements (numbers, points)
Measuring the Running Time
How should we measure the running time of an algorithm?

[Plot: running time t (ms), from 0 to 60, versus input size n, from 0 to 100]

Experimental Study
Write a program that implements the algorithm.
Run the program with data sets of varying size
and composition.
Use a system call to get an accurate measure of
the actual running time.
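For instance, a minimal timing harness in Java might look like the sketch below (the class name and the array-summing stand-in algorithm are illustrative choices, not part of the slides; serious measurements would also need JVM warm-up and repetition):

public class TimingExperiment {
    // Stand-in for the algorithm under study.
    static long sum(int[] a) {
        long s = 0;
        for (int x : a) s += x;
        return s;
    }

    public static void main(String[] args) {
        // Data sets of varying size: n doubles each round.
        for (int n = 100000; n <= 1600000; n *= 2) {
            int[] data = new int[n];
            for (int i = 0; i < n; i++) data[i] = i;

            long start = System.nanoTime();   // system call before the run
            long result = sum(data);
            long elapsed = System.nanoTime() - start;

            System.out.println("n = " + n + ": " + elapsed / 1000
                    + " us (sum = " + result + ")");
        }
    }
}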
Limitations of Experimental Studies
It is necessary to implement and test the
algorithm in order to determine its running time.
Experiments can be done only on a limited set of
inputs, and may not be indicative of the running
time on other inputs not included in the
experiment.
In order to compare two algorithms, the same
hardware and software environments should be
used.
Beyond Experimental Studies
We will develop a general methodology for
analyzing running time of algorithms. This
approach
Uses a high-level description of the algorithm
instead of testing one of its implementations.
Takes into account all possible inputs.
Allows one to evaluate the efficiency of any
algorithm in a way that is independent of the
hardware and software environment.
Pseudo-Code
A mixture of natural language and high-level
programming concepts that describes the main
ideas behind a generic implementation of a data
structure or algorithm.
E.g.: Algorithm arrayMax(A, n):
Input: An array A storing n integers.
Output: The maximum element in A.

  currentMax ← A[0]
  for i ← 1 to n-1 do
      if currentMax < A[i] then currentMax ← A[i]
  return currentMax
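A direct Java rendering of this pseudo-code might look as follows (a sketch; the wrapper class and the demo in main are illustrative):

public class ArrayMax {
    // Returns the maximum element of A[0..n-1]; assumes n >= 1.
    static int arrayMax(int[] A, int n) {
        int currentMax = A[0];
        for (int i = 1; i <= n - 1; i++) {
            if (currentMax < A[i]) {
                currentMax = A[i];
            }
        }
        return currentMax;
    }

    public static void main(String[] args) {
        int[] A = {3, 1, 4, 1, 5, 9, 2, 6};
        System.out.println(arrayMax(A, A.length));  // prints 9
    }
}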
Pseudo-Code
It is more structured than usual prose but
less formal than a programming language
Expressions:
use standard mathematical symbols to
describe numeric and boolean expressions
use ← for assignment (“=” in Java)
use = for the equality relationship (“==” in
Java)
Method Declarations:
Algorithm name(param1, param2)
Pseudo-Code
Programming Constructs:
decision structures: if ... then ... [else ... ]
while-loops: while ... do
repeat-loops: repeat ... until ...
for-loop: for ... do
array indexing: A[i], A[i,j]
Methods:
calls: object.method(args)
returns: return value
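As a rough correspondence (a sketch, not part of the slides), these constructs map to Java like this:

public class PseudoCodeToJava {
    public static void main(String[] args) {
        int i = 0;                          // assignment: i ← 0
        if (i == 0) i = 1;                  // decision: if ... then ... (pseudo-code "=" is Java "==")
        while (i < 10) i++;                 // while-loop: while ... do
        do { i--; } while (i > 0);          // repeat-loop: repeat ... until (Java tests the negated condition)
        for (int j = 0; j < 5; j++) { }     // for-loop: for ... do

        int[][] A = new int[3][3];
        A[1][2] = 7;                        // array indexing: A[i], A[i,j]
        System.out.println(A[1][2]);        // method call: object.method(args); return via "return value"
    }
}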
Analysis of Algorithms
Primitive operation: a low-level operation
independent of the programming language.
It can be identified in pseudo-code, e.g.:
  Data movement (assign)
  Control (branch, subroutine call, return)
  Arithmetic and logical operations (e.g. addition,
  comparison)
By inspecting the pseudo-code, we can
count the number of primitive operations
executed by an algorithm.
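As an illustration, here is the arrayMax pseudo-code from earlier annotated with one possible accounting of primitive operations (the exact constants depend on what one counts as a single operation):

currentMax ← A[0]                 2 ops (index, assign), done once
for i ← 1 to n-1 do               1 initialization, n comparisons, n-1 increments
    if currentMax < A[i] then     2 ops (index, compare) per iteration
        currentMax ← A[i]         2 ops per iteration, executed at most n-1 times
return currentMax                 1 op

Total: roughly between 4n and 6n primitive operations, i.e., a linear function of n.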
Example: Sorting
INPUT: a sequence of n numbers a1, a2, a3, ..., an
OUTPUT: a permutation b1, b2, b3, ..., bn of the input sequence

    Sort:  2 5 4 10 7  →  2 4 5 7 10

Correctness (requirements for the output):
For any given input the algorithm halts with an output such that:
    b1 ≤ b2 ≤ b3 ≤ ... ≤ bn
    b1, b2, b3, ..., bn is a permutation of a1, a2, a3, ..., an

Running time depends on:
    the number of elements (n)
    how (partially) sorted they are
    the algorithm
Insertion Sort
[Figure: array A = 3 4 6 8 9 | 7 2 5 1, indices 1..n, with the sorted prefix A[1..j-1] to the left of position j and i = j-1]

Strategy:
    Start “empty handed”.
    Insert a card into the right position of the already sorted hand.
    Continue until all cards are inserted/sorted.

INPUT: A[1..n] – an array of integers
OUTPUT: a permutation of A such that A[1] ≤ A[2] ≤ ... ≤ A[n]

for j ← 2 to n do
    key ← A[j]
    Insert A[j] into the sorted sequence A[1..j-1]:
    i ← j-1
    while i > 0 and A[i] > key
        do A[i+1] ← A[i]
           i--
    A[i+1] ← key
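A Java version of this pseudo-code might look as follows (a sketch; note the 0-based indexing, which shifts the loops by one relative to the A[1..n] pseudo-code):

public class InsertionSort {
    // Sorts A in place into non-decreasing order.
    static void insertionSort(int[] A) {
        for (int j = 1; j < A.length; j++) {      // pseudo-code: for j ← 2 to n
            int key = A[j];
            int i = j - 1;
            // Shift elements of the sorted prefix that are greater than key.
            while (i >= 0 && A[i] > key) {
                A[i + 1] = A[i];
                i--;
            }
            A[i + 1] = key;                        // insert key at its position
        }
    }

    public static void main(String[] args) {
        int[] A = {2, 5, 4, 10, 7};
        insertionSort(A);
        System.out.println(java.util.Arrays.toString(A));  // [2, 4, 5, 7, 10]
    }
}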
Analysis of Insertion Sort
                                        cost    times
for j ← 2 to n do                       c1      n
    key ← A[j]                          c2      n-1
    Insert A[j] into the sorted
        sequence A[1..j-1]              0       n-1
    i ← j-1                             c3      n-1
    while i > 0 and A[i] > key          c4      Σ_{j=2..n} t_j
        do A[i+1] ← A[i]                c5      Σ_{j=2..n} (t_j − 1)
           i--                          c6      Σ_{j=2..n} (t_j − 1)
    A[i+1] ← key                        c7      n-1

(t_j = the number of times the while-loop test is executed for a given j)

Total time = n(c1+c2+c3+c7) + Σ_{j=2..n} t_j (c4+c5+c6) − (c2+c3+c5+c6+c7)
Best/Worst/Average Case
Total time = n(c1+c2+c3+c7) + Σ_{j=2..n} t_j (c4+c5+c6) − (c2+c3+c5+c6+c7)

Best case: elements already sorted; t_j = 1, and the
running time is a linear function of n, i.e., linear time.
Worst case: elements sorted in inverse order; t_j = j,
and since Σ_{j=2..n} j = n(n+1)/2 − 1, the running time is a
quadratic function of n, i.e., quadratic time.
Average case: t_j ≈ j/2; the running time is still a
quadratic function of n, i.e., quadratic time.
Best/Worst/Average Case (2)
For a specific size of input n, investigate
running times for different input instances:
Best/Worst/Average Case (3)
For inputs of all sizes:

[Plot: running time (1n through 6n) versus input instance size (1 through 12), showing the best-case, average-case, and worst-case curves]
Best/Worst/Average Case (4)
Worst case is usually used: it is an upper bound,
and in certain application domains (e.g., air traffic
control, surgery) knowing the worst-case time
complexity is of crucial importance
For some algorithms worst case occurs fairly
often
Average case is often as bad as the worst
case
Finding average case can be very difficult
Asymptotic Analysis
Goal: to simplify analysis of running time by
getting rid of ”details”, which may be affected by
specific implementation and hardware
like “rounding”: 1,000,001 ≈ 1,000,000
and 3n^2 ≈ n^2
Capturing the essence: how the running time of
an algorithm increases with the size of the input
in the limit.
Asymptotically more efficient algorithms are best for
all but small inputs
Asymptotic Notation
The “big-Oh” O-Notation
asymptotic upper bound
f(n) is O(g(n)) if there exist constants c and n0,
s.t. f(n) ≤ c·g(n) for n ≥ n0
f(n) and g(n) are functions over non-negative integers

[Plot: running time versus input size; c·g(n) lies above f(n) for all n ≥ n0]

Used for worst-case analysis
Example
For functions f(n) and g(n) we need positive
constants c and n0 such that f(n) ≤ c·g(n) for n ≥ n0.
For f(n) = 2n + 6, take c = 4 and n0 = 3:
then 2n + 6 ≤ 4n for all n ≥ 3.
Conclusion:
2n + 6 is O(n).
Another Example
On the other hand…
n^2 is not O(n), because there are
no c and n0 such that:
n^2 ≤ cn for n ≥ n0

[Plot: n^2 overtaking cn for increasingly large values of c]

The plot illustrates that no matter how
large a c is chosen, there is an n
big enough that n^2 > cn (take any n > c).
Asymptotic Notation
Simple Rule: Drop lower-order terms and
constant factors.
50 n log n is O(n log n)
7n − 3 is O(n)
8n^2 log n + 5n^2 + n is O(n^2 log n)
Note: Even though 50 n log n is O(n^5), it
is expected that such an approximation be
of as small an order as possible
Asymptotic Analysis of Running Time
Use O-notation to express the number of primitive
operations executed as a function of the input size.
Comparing asymptotic running times
an algorithm that runs in O(n) time is better
than one that runs in O(n^2) time
similarly, O(log n) is better than O(n)
hierarchy of functions: log n < n < n^2 < n^3 < 2^n
Caution! Beware of very large constant factors.
An algorithm running in time 1,000,000·n is still
O(n) but might be less efficient for realistic input
sizes than one running in time 2n^2, which is O(n^2)
(the two are equal at n = 500,000)
Example of Asymptotic Analysis
Algorithm prefixAverages1(X):
Input: An n-element array X of numbers.
Output: An n-element array A of numbers such that
A[i] is the average of elements X[0], ..., X[i].

  for i ← 0 to n-1 do          (n iterations)
      a ← 0
      for j ← 0 to i do        (i+1 iterations, each a constant number of steps)
          a ← a + X[j]
      A[i] ← a/(i+1)
  return array A

Analysis: running time is O(n^2), since 1 + 2 + ... + n = n(n+1)/2
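A hypothetical Java translation of this quadratic version (the class and method names are mine; double is an assumed element type, since the pseudo-code only says “numbers”):

class PrefixAverages1 {
    static double[] prefixAverages1(double[] X) {
        int n = X.length;
        double[] A = new double[n];
        for (int i = 0; i < n; i++) {        // n iterations
            double a = 0;
            for (int j = 0; j <= i; j++) {   // i+1 iterations: O(n^2) overall
                a = a + X[j];
            }
            A[i] = a / (i + 1);
        }
        return A;
    }
}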
A Better Algorithm
Algorithm prefixAverages2(X):
Input: An n-element array X of numbers.
Output: An n-element array A of numbers such
that A[i] is the average of elements X[0], ..., X[i].

  s ← 0
  for i ← 0 to n-1 do
      s ← s + X[i]
      A[i] ← s/(i+1)
  return array A

Analysis: running time is O(n)
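The linear-time version in Java might read as follows (same assumptions as above; the running sum s makes the inner loop unnecessary):

public class PrefixAverages2 {
    static double[] prefixAverages2(double[] X) {
        int n = X.length;
        double[] A = new double[n];
        double s = 0;
        for (int i = 0; i < n; i++) {   // single pass; s holds X[0] + ... + X[i]
            s = s + X[i];
            A[i] = s / (i + 1);
        }
        return A;
    }

    public static void main(String[] args) {
        double[] X = {2, 5, 4, 10, 7};
        System.out.println(java.util.Arrays.toString(prefixAverages2(X)));
        // prints [2.0, 3.5, 3.6666666666666665, 5.25, 5.6]
    }
}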
Asymptotic Notation (terminology)
Special classes of algorithms:
Logarithmic: O(log n)
Linear: O(n)
Quadratic: O(n^2)
Polynomial: O(n^k), k ≥ 1
Exponential: O(a^n), a > 1
“Relatives” of the Big-Oh
Ω(f(n)): Big Omega – asymptotic lower bound
Θ(f(n)): Big Theta – asymptotic tight bound
Asymptotic Notation
The “big-Omega” Ω-Notation
asymptotic lower bound
f(n) is Ω(g(n)) if there exist constants c and n0,
s.t. c·g(n) ≤ f(n) for n ≥ n0

[Plot: running time versus input size; f(n) lies above c·g(n) for all n ≥ n0]

Used to describe best-case running times or
lower bounds for algorithmic problems
E.g., a lower bound for searching in an unsorted
array is Ω(n).
Asymptotic Notation
The “big-Theta” Θ-Notation
asymptotically tight bound
f(n) = Θ(g(n)) if there exist constants c1, c2, and n0,
s.t. c1·g(n) ≤ f(n) ≤ c2·g(n) for n ≥ n0
f(n) is Θ(g(n)) if and only if
f(n) is O(g(n)) and f(n) is Ω(g(n))
O(f(n)) is often misused instead of Θ(f(n))

[Plot: running time versus input size; f(n) lies between c1·g(n) and c2·g(n) for all n ≥ n0]
Asymptotic Notation
Two more asymptotic notations:
"Little-oh" notation: f(n) is o(g(n))
    non-tight analogue of Big-Oh
    for every c > 0, there exists n0 s.t. f(n) ≤ c·g(n) for n ≥ n0
    used for comparisons of running times;
    if f(n) = o(g(n)), it is said that g(n) dominates f(n)
"Little-omega" notation: f(n) is ω(g(n))
    non-tight analogue of Big-Omega
Asymptotic Notation
Analogy with real numbers:
f(n) = O(g(n))  ≈  f ≤ g
f(n) = Ω(g(n))  ≈  f ≥ g
f(n) = Θ(g(n))  ≈  f = g
f(n) = o(g(n))  ≈  f < g
f(n) = ω(g(n))  ≈  f > g

Abuse of notation: f(n) = O(g(n)) actually
means f(n) ∈ O(g(n))
Comparison of Running Times
Running time     Maximum problem size (n)
                 1 second    1 minute    1 hour
400n             2500        150000      9000000
20n log n        4096        166666      7826087
2n^2             707         5477        42426
n^4              31          88          244
2^n              19          25          31