Data structures

Abstract Data Structures and Algorithms
Overview of standard data structures and useful algorithms

Why different data types?
One criterion: complexity of manipulation
The choice of data structure affects how difficult a task is

Vector of size n
Find ith element of vector
[Figure: vector with elements 0 to 12]
The elements are stored one after another, so element i is at Pos(v[0]) + i (for example, element 5 is at Pos(v[0]) + 5)
A single arithmetic calculation
Complexity does not increase as the vector gets bigger
O(1)
This is exactly what a vector is designed for

Vector of size n
Insert before ith element in vector
1. Allocate vector of size n+1 (1 operation)
2. Copy elements 0 to i-1 to places 0 to i-1, and copy elements i to n-1 to places i+1 to n (n operations)
3. Set the new element in place i (1 operation)
Vectors are not designed to be used with insertion operations
As the vector gets larger, the insertion takes more time/operations
O(n)
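The three steps above can be sketched in Python. This is a minimal illustration of the copying cost, not a description of how Python lists insert internally; the function name insert_before is an assumption made for this example.

```python
def insert_before(v, i, x):
    """Return a new list equal to v with x inserted before position i."""
    n = len(v)
    new = [None] * (n + 1)      # step 1: allocate n+1 places
    for k in range(i):          # step 2: copy elements 0..i-1 unchanged
        new[k] = v[k]
    for k in range(i, n):       # step 2: copy elements i..n-1 one place up
        new[k + 1] = v[k]
    new[i] = x                  # step 3: set the new element
    return new


v = list(range(13))
print(v[5])                     # indexed access: one step, whatever len(v) is
print(insert_before(v, 5, 13))  # insertion: every element is copied once
```

The two copy loops together touch all n elements, which is where the O(n) comes from.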

Complexity
O(1) Time/operation complexity does not increase with the size of the problem
Example: Find ith element of vector
O(n) Time/operation complexity increases linearly with the size of the problem
Example: Insert before ith element in vector

Linked List
Element: a structure pair
The value, and a pointer to the next element pair
[Figure: linked list with 6 elements, 0 to 5]

Linked List
Find ith element of linked list
Have to traverse the structure, pointer by pointer, to the ith element
Linked lists are not designed to find the ith element
As the list increases in size, the number of steps increases
O(n)

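Linked List
Insert an element
[Figure: a new element is linked in between two neighbouring elements of the list]
Change pointers… one operation, once the insertion point is known
Linked lists are designed exactly for inserting elements
Regardless of the size of the list, the insertion is still one operation
O(1)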
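A minimal Python sketch of the linked list slides, assuming a singly linked list. The names Node, build, find, insert_after and to_list are chosen for this illustration only.

```python
class Node:
    """One element pair: a value and a pointer to the next pair."""

    def __init__(self, value, next=None):
        self.value = value
        self.next = next


def build(values):
    """Build a linked list from a Python list and return its head."""
    head = None
    for value in reversed(values):
        head = Node(value, head)
    return head


def find(head, i):
    """Walk i pointers from the head: O(n) steps."""
    node = head
    for _ in range(i):
        node = node.next
    return node


def insert_after(node, value):
    """Link a new element in behind node: O(1), only pointers change."""
    node.next = Node(value, node.next)


def to_list(head):
    """Collect the values in order (for printing and checking)."""
    values = []
    while head is not None:
        values.append(head.value)
        head = head.next
    return values


head = build([0, 1, 2, 3, 4, 5])
insert_after(find(head, 1), 99)
print(to_list(head))
```

The insertion itself is constant time, but reaching the insertion point with find is still a traversal.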

Linear Search
I am thinking of a number between 1 and 10
If you just guess numbers (for example sequentially):
Best case: correct on the 1st guess
Worst case: correct after 10 guesses
On average it will take you about 5 guesses
In general:
for a number between 1 and n it will take you about n/2 guesses on average ((n+1)/2 exactly)
Complexity: n/2 guesses, O(n)
Don’t worry about the constant ½…
The complexity increases linearly with the size of the problem

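Binary Search
Extra information:
For every guess I will say whether it is correct, higher or lower
[Figure: search tree for the numbers 1 to 7: first guess 4; second guess 2 or 6; third guess 1, 3, 5 or 7]
Best case: 1 guess
Worst case: 3 guesses
At most log2 8 = 3 guesses are needed
In general: about log2 n guesses, O(log n)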
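The guessing game can be played out in Python to count guesses for both strategies. This is a sketch; the function names linear_guesses and binary_guesses are assumptions for the example.

```python
def linear_guesses(secret, n):
    """Guess 1, 2, 3, ... and return how many guesses were needed."""
    for guesses, guess in enumerate(range(1, n + 1), start=1):
        if guess == secret:
            return guesses
    raise ValueError("secret must be between 1 and n")


def binary_guesses(secret, n):
    """Use the higher/lower answer to halve the range after each guess."""
    low, high = 1, n
    guesses = 0
    while low <= high:
        guess = (low + high) // 2
        guesses += 1
        if guess == secret:
            return guesses
        if guess < secret:
            low = guess + 1      # answer was "higher"
        else:
            high = guess - 1     # answer was "lower"
    raise ValueError("secret must be between 1 and n")


print(max(linear_guesses(s, 10) for s in range(1, 11)))  # worst case: 10
print(max(binary_guesses(s, 7) for s in range(1, 8)))    # worst case: 3
```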

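Complexity of operations
The proper data structure can increase the efficiency of an algorithm
For structures of size n:
Find ith element: vector O(1), does not depend on size of structure; linked list O(n), increases linearly with size of structure
Insert an element: vector O(n), increases linearly with size of structure; linked list O(1), does not depend on size of structure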

Complexity of an Algorithm
As the problem grows in size, how much more difficult (in terms of computation time/operations) does the problem become?
O(1) Complexity does not increase with the size of the problem
Example: Find ith element in a vector
O(n) Complexity increases linearly with the size of the problem
Example: Find ith element in a linked list
O(log n) Complexity increases with the logarithm of the size of the problem
Example: Binary search

Coupling
Relationship between data structures and algorithms
Choose the wrong data structure and the algorithm becomes more complex

Why different data types?
Another criterion:
A specific object implies a data structure

Graph Data Structure
A set of nodes
A set of connections between the nodes
Both nodes and connections can have properties associated with them

Graph Data Structure
A graph can be a natural representation for many data objects and processes

Social Network
Node: The person (Facebook page)
Connection: connects two people who know each other (Facebook friends)
Each node has a list of connections (the friends of the Facebook page)
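One common way to store such a graph is an adjacency list: each node maps to the set of nodes it is connected to. The sketch below assumes friendships are mutual; the function name and the people are made up for the example.

```python
def add_friendship(graph, a, b):
    """Connect two people in both directions (friendship is mutual)."""
    graph.setdefault(a, set()).add(b)
    graph.setdefault(b, set()).add(a)


network = {}
add_friendship(network, "Ann", "Bob")
add_friendship(network, "Ann", "Cid")

print(sorted(network["Ann"]))   # Ann's list of connections
print(sorted(network["Bob"]))   # Bob's list of connections
```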

Inheritance
Object oriented classes
Directed graph
Nodes: Data classes
Directed connection: a one-way connection
One class inherits the properties of the other

Ontology
(labeled connectors)
Nodes: Objects
Connections:
Relationship between objects

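Arithmetic Expression
(functional programming)
x + y + z * ( a + b * c)
As a tree, written in prefix form: (+ x y (* z (+ a (* b c))))
Each inner node is a function (+ or *); its children are the arguments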
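A small Python sketch of such an expression tree, using nested tuples whose first item is the function and whose remaining items are the arguments. The representation and the name evaluate are assumptions for this illustration.

```python
import math


def evaluate(expr, env):
    """Evaluate an expression tree given variable values in env."""
    if isinstance(expr, str):            # a leaf: variable name
        return env[expr]
    if isinstance(expr, (int, float)):   # a leaf: plain number
        return expr
    function, *arguments = expr          # an inner node
    values = [evaluate(argument, env) for argument in arguments]
    if function == "+":
        return sum(values)
    if function == "*":
        return math.prod(values)
    raise ValueError(f"unknown function: {function}")


# x + y + z * (a + b * c)
tree = ("+", "x", "y", ("*", "z", ("+", "a", ("*", "b", "c"))))
values = {"x": 1, "y": 2, "z": 3, "a": 4, "b": 5, "c": 6}
print(evaluate(tree, values))   # 1 + 2 + 3 * (4 + 5 * 6) = 105
```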

Molecular Graph
Nodes:
The atoms
Connections:
The bonds between the atoms

Graph as a Linked list
[Figure: elements 1 to 13 arranged as linked lists on several levels]
Foundation of LISP:
List programming
Functional programming
(graph as a functional expression)

Stacks
Characteristics:
Top: the last thing added
To get to something in the middle, you have to remove what is on top first
LIFO: Last In, First Out

Last in first out (LIFO)
Two main operations: Push and Pop
Stack contents after each operation (bottom on the left, top on the right):
Start: A
Push B: A B
Push C: A B C
Push D: A B C D
Push E: A B C D E
Pop E: A B C D
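The same push and pop sequence in Python, as a minimal sketch that uses a plain list as the stack, with the end of the list as the top.

```python
stack = ["A"]

for item in ["B", "C", "D", "E"]:
    stack.append(item)      # push: add on top

top = stack.pop()           # pop: remove the last thing added

print(top)                  # E
print(stack)                # ['A', 'B', 'C', 'D']
```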

The Towers of Hanoi
A Stack-based Application
o GIVEN: three poles
o a set of discs of different sizes on the first pole, the smallest disc at the top
o GOAL: move all the discs from the left pole to the right one.
o CONDITIONS: only one disc may be moved at a time.
o A disc can be placed either on an empty pole or on top of a larger disc.

Complexity: Towers of Hanoi
Complexity: 2^n
Why?
To get to the bottom, you have to move all of the top objects: about 2^(n-1) moves
Then you move the bottom object: 1 move
Then you have to move all the other objects back on top again: about 2^(n-1) moves
2^(n-1) + 2^(n-1) = 2 * 2^(n-1) = 2^n
(The exact count is 2^n - 1 moves)
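A minimal recursive sketch of the puzzle, with each pole as a stack (a Python list whose end is the top). The function name hanoi and its argument order are assumptions for this illustration.

```python
def hanoi(n, source, target, spare):
    """Move the top n discs from source to target; return the move count."""
    if n == 0:
        return 0
    # Move the n-1 smaller discs out of the way, onto the spare pole.
    moves = hanoi(n - 1, source, spare, target)
    # Move the bottom disc; it may only land on an empty pole or a larger disc.
    disc = source.pop()
    assert not target or target[-1] > disc
    target.append(disc)
    moves += 1
    # Move the n-1 smaller discs back on top of it.
    moves += hanoi(n - 1, spare, target, source)
    return moves


left, middle, right = [3, 2, 1], [], []
print(hanoi(3, left, right, middle))   # 7, which is 2**3 - 1
print(right)                           # [3, 2, 1]
```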

A Legend
The Towers of Hanoi
In the great temple of Brahma in Benares,
on a brass plate under the dome that marks the center of the world
there are 64 disks of pure gold that
the priests carry one at a time between three diamond needles
According to Brahma's immutable law:
No disk may be placed on a smaller disk.
In the beginning of the world all 64 disks formed the Tower of Brahma on
one needle.
Now, however, the process of transfer of the tower from one needle to
another is in mid course.
When the last disk is finally in place, once again forming the Tower of
Brahma but on a different needle,
then will come the end of the world and all will turn to dust.

Is the End of the World Approaching?
• Problem complexity 2^n
• 64 gold discs: 2^64 - 1 moves, about 1.8 * 10^19
• Given 1 move a second, and about 3.2 * 10^7 seconds in a year
→ about 600,000,000,000 years until the end of the world

Queues
FIFO: First In, First Out
Objects are inserted at the back and removed from the front

Queues
Computer systems must often provide a
“holding area” for messages
between two processes, two programs, or even two systems.
Real time systems

Queue: Buffering
Computer sends data faster than the printer can print
The buffer (a queue) holds the data until the printer is ready for it
[Figure: computer, buffer, printer]
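A minimal sketch of such a buffer in Python, using collections.deque from the standard library as the queue. The page names are made up for the example.

```python
from collections import deque

buffer = deque()

# The computer sends pages faster than they can be printed:
for page in ["page 1", "page 2", "page 3"]:
    buffer.append(page)                 # insert at the back

# The printer takes pages in the order they arrived:
printed = []
while buffer:
    printed.append(buffer.popleft())    # remove from the front

print(printed)   # ['page 1', 'page 2', 'page 3']
```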

Priority Queue
Like a regular queue or stack data structure, but where additionally each element has a "priority" associated with it.
An element with high priority is served before an element with low priority.
If two elements have the same priority, they are served according to their order in the queue.
There is an ordering associated with the queue
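One possible Python sketch of this behaviour, built on the standard heapq module. The class name, the method names and the use of a counter to keep equal priorities in arrival order are choices made for this illustration.

```python
import heapq
import itertools


class PriorityQueue:
    """Serve the highest priority first; ties are served in arrival order."""

    def __init__(self):
        self._heap = []
        self._arrival = itertools.count()

    def push(self, item, priority):
        # heapq pops the smallest entry first, so the priority is negated.
        # The arrival number breaks ties between equal priorities.
        heapq.heappush(self._heap, (-priority, next(self._arrival), item))

    def pop(self):
        return heapq.heappop(self._heap)[2]

    def __len__(self):
        return len(self._heap)


queue = PriorityQueue()
queue.push("routine report", priority=1)
queue.push("first alarm", priority=5)
queue.push("second alarm", priority=5)

while queue:
    print(queue.pop())   # first alarm, second alarm, routine report
```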

Programming Paradigms
• Goto (like assembler and primitive/older languages)
• Iteration and Loops (while and for-next)
• Functional languages and Recursion
• Declarative
• Non-deterministic programming
Example: Factorial
$n! \equiv \prod_{i=1}^{n} i$ (implies a loop)
$n! \equiv \begin{cases} 1 & n = 1 \\ n \cdot (n-1)! & n > 1 \end{cases}$ (recursive mathematical definition)

Goto statement
Loops with a GOTO (or similar) statement
The GOTO jumps to a specified location (label or address)
$n! \equiv \prod_{i=1}^{n} i$
There is an index involved
The index is incremented until the end is reached
i = 1;
factorial = 1;
loop:
factorial = factorial * i;
if (i == n) goto exit;
i = i + 1;
goto loop;
exit:

Iteration
Repetition of a block of code
$n! \equiv \prod_{i=1}^{n} i$
There is an index involved
The index is incremented until the end is reached
i = 1;
factorial = 1;
while (i <= n) {
factorial = factorial * i;
i = i + 1;
}
The for loop once again involves an iteration counter
factorial = 1;
for i = 1 to n {
factorial = factorial * i;
}

Recursion
Numerische Mathematik 2, 312--318 (1960)

Content of Recursion
• Base case(s).
o Values of the input variables for which we perform no
recursive calls are called base cases (there should be at
least one base case).
o Every possible chain of recursive calls must eventually
reach a base case.
• Recursive calls.
o Calls to the current method.
o Each recursive call should be defined so that it makes
progress towards a base case.
factorial(n) {
if (n == 1) return 1;
return factorial(n-1) * n;
}
$n! \equiv \begin{cases} 1 & n = 1 \\ n \cdot (n-1)! & n > 1 \end{cases}$

How do I write a
recursive function?
• Determine the size factor
o The number: smaller number, smaller size
• Determine the base case(s)
o The case for n=1, the answer is 1
• Determine the general case(s)
o The recursive call: factorial(n)=factorial(n-1)*n
• Verify the algorithm
(use the "Three-Question-Method")
factorial(n) {
if (n == 1) return 1;
return factorial(n-1) * n;
}
$n! \equiv \begin{cases} 1 & n = 1 \\ n \cdot (n-1)! & n > 1 \end{cases}$

Three-Question Verification Method
1. The Base-Case Question:
Is there a nonrecursive way out of the function,
and does the routine work correctly for this
"base" case?
2. The Smaller-Caller Question:
Does each recursive call to the function involve
a smaller case of the original problem, leading
inescapably to the base case?
3. The General-Case Question:
Assuming that the recursive call(s) work
correctly, does the whole function work
correctly?

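Stacks in recursion
factorial(n)
if (n == 1)
return 1
else
return factorial(n-1) * n
n! = n*(n-1)*(n-2)*(n-3)*…*1
5! = 5*4*3*2*1
Each call is pushed on the stack:
factorial(5)
factorial(4)
factorial(3)
factorial(2)
factorial(1) returns 1
The results come back as the calls are popped:
factorial(2) returns 2
factorial(3) returns 6
factorial(4) returns 24
factorial(5) returns 120, so 5! = 120
Deep recursion can result in running out of memory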
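The stack behaviour can be made visible in Python by recording each call and return together with the current depth. The depth and trace parameters are additions for this illustration, not part of the factorial definition.

```python
def factorial(n, depth=1, trace=None):
    """Recursive factorial that can record how the call stack grows."""
    if trace is not None:
        trace.append(("call", n, depth))
    if n == 1:
        result = 1
    else:
        result = n * factorial(n - 1, depth + 1, trace)
    if trace is not None:
        trace.append(("return", result, depth))
    return result


trace = []
print(factorial(5, trace=trace))   # 120
for event, value, depth in trace:
    print("  " * depth + f"{event} {value}")
```

The deepest call sits n levels down, so the stack grows linearly with n. In Python a very deep recursion ends with a RecursionError once the interpreter's recursion limit is reached.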

Tail recursion
Tail recursion is iteration
$n! \equiv \begin{cases} 1 & n = 1 \\ n \cdot (n-1)! & n > 1 \end{cases}$
factorial(n) {
return factorial_help(n, 1);
}
factorial_help(n, acc) {
if (n == 1) return acc;
return factorial_help(n-1, acc*n);
}
Tail recursion is a pattern of use that can be compiled or interpreted as iteration, avoiding the inefficiency of a growing call stack
A tail recursive function is one where every recursive call is the last thing
done by the function before returning and thus produces the function’s value
Declarative programming
Expresses the logic of a computation
without describing its control flow.
$n! \equiv \begin{cases} 1 & n = 1 \\ n \cdot (n-1)! & n > 1 \end{cases}$
factorial(1, 1).
factorial(N, F) :-
N > 1,
N1 is N-1,
factorial(N1, F1),
F is N*F1.

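Constraint Logic Programming
factorial(5,F) returns F=120
factorial(N,120) creates an instantiation error
A goal such as N1 is N-1 can only be evaluated when N is already bound to a number
Plain PROLOG has no built-in knowledge for reasoning about Real or Integer numbers: it evaluates arithmetic expressions but cannot solve them for an unknown
Mathematical manipulations cannot be made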

Constraint Logic Programming
Logic Programming is combined with a Constraint Logic Programming (CLP) solver
Arithmetic formulas (like the N1 is N-1 step) are passed to CLP
CLP has mathematical knowledge about the numbers used
Reduced or solved formulas are returned
Probabilistic Algorithms
Non-deterministic
No exact control of program flow
Leaves some of its decisions to chance
Outcome of the program in different runs is not necessarily the same
Monte Carlo Methods
Always give an answer
But not necessarily a correct one
The probability of correctness goes up with time
Las Vegas Methods
Never return an incorrect answer
But sometimes do not give an answer
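The difference can be shown with a toy problem: find a position holding a given value by probing positions at random. Both functions are sketches written for this comparison; their names and parameters are assumptions.

```python
import random


def las_vegas_find(data, target, max_tries, rng):
    """Return an index of target, or None. Never returns a wrong index."""
    for _ in range(max_tries):
        i = rng.randrange(len(data))
        if data[i] == target:
            return i
    return None          # gave up: no answer rather than a wrong one


def monte_carlo_find(data, target, tries, rng):
    """Always return an index; it is probably, not certainly, a hit."""
    i = rng.randrange(len(data))
    for _ in range(tries - 1):
        if data[i] == target:
            break
        i = rng.randrange(len(data))
    return i             # more tries make a correct answer more likely


rng = random.Random(0)
data = [0, 1] * 50       # half of the positions hold the target value 1
print(las_vegas_find(data, 1, max_tries=3, rng=rng))
print(monte_carlo_find(data, 1, tries=3, rng=rng))
```

With half of the positions holding the target, the Monte Carlo version is wrong with probability (1/2)^tries, which shrinks as it is given more time.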

Probabilistic Algorithms in optimization:
Closer to human reasoning and problem solving
(for hard problems we don’t follow strict deterministic algorithms)
Finding local and global minima

Classic gradient optimization finds a local minimum
The search path is always downhill toward the minimum
Probabilistic algorithms allow the search to go uphill sometimes
Randomness in the search for the next step
Examples: Genetic Algorithms, Simulated Annealing (to search for the global minimum)
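A minimal sketch of simulated annealing on a one-dimensional function with several minima. The cooling schedule, step size and test function are arbitrary choices for this illustration, and the result is not guaranteed to be the global minimum.

```python
import math
import random


def simulated_annealing(f, x0, steps=5000, step_size=0.5, t0=2.0, seed=0):
    """Random search that sometimes accepts uphill steps; returns the best point seen."""
    rng = random.Random(seed)
    x, fx = x0, f(x0)
    best_x, best_fx = x, fx
    for k in range(steps):
        # The temperature falls towards zero as the search proceeds.
        temperature = t0 * (1 - k / steps) + 1e-9
        candidate = x + rng.uniform(-step_size, step_size)
        fc = f(candidate)
        # Downhill steps are always accepted; uphill steps are accepted
        # with a probability that shrinks as the temperature falls.
        if fc <= fx or rng.random() < math.exp((fx - fc) / temperature):
            x, fx = candidate, fc
            if fx < best_fx:
                best_x, best_fx = x, fx
    return best_x, best_fx


def bumpy(x):
    """A test function with more than one local minimum."""
    return x * x + 10 * math.sin(x)


print(simulated_annealing(bumpy, x0=4.0))
```

A purely downhill search started at x0 = 4.0 would settle in the nearest valley; accepting occasional uphill steps is what gives this search a chance to leave it.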

Calculate pi with a dart board
[Figure: circle of diameter d inscribed in a square of side d]
Area of square: $d^2$
Area of circle: $\pi \left(\frac{d}{2}\right)^2$
Probability dart will be in circle:
$\text{prob} = \frac{\text{circle}}{\text{square}} = \frac{\pi \left(\frac{d}{2}\right)^2}{d^2} = \frac{\pi}{4}$
The number of darts in the circle, divided by the total number of darts, times 4, is approximately π
Monte Carlo Method
Always gives an answer
But not necessarily a correct one
The probability of correctness goes up with time
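The dart board experiment as a Python sketch, assuming darts land uniformly at random on a square of side d = 1. The function name estimate_pi and the fixed seed are choices for this example.

```python
import random


def estimate_pi(darts, seed=0):
    """Throw darts at the unit square and count hits inside the circle."""
    rng = random.Random(seed)
    in_circle = 0
    for _ in range(darts):
        x = rng.uniform(-0.5, 0.5)
        y = rng.uniform(-0.5, 0.5)
        if x * x + y * y <= 0.25:     # inside the circle of radius d/2
            in_circle += 1
    return 4 * in_circle / darts      # prob = pi/4, so pi = 4 * prob


for darts in (100, 10_000, 1_000_000):
    print(darts, estimate_pi(darts))
```

More darts take more time but tend to give an estimate closer to π, which is the Monte Carlo trade-off described above.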

More from Edward Blurock

Recently uploaded

Data structures