Abstract Data Structures and Algorithms
Overview of standard data structures and useful algorithms
Why different data types?
Complexity of Manipulation
One criterion:
The data structure can affect how difficult the task is
Vector of size n
Find ith element of vector
Efficient access to the ith element
A single arithmetic calculation: Pos(v[i]) = Pos(v[0]) + i
Complexity does not increase as the vector gets bigger
O(1)
[Figure: a vector with elements 0 to 12; element 5 is found at Pos(v[0]) + 5]
This is exactly what a vector is designed for
Vector of size n
Insert before ith element in vector
[Figure: a vector with elements 0 to 12]
1. Allocate a vector of size n+1 (1 operation)
2. Copy elements 0 to i-1 to places 0 to i-1, and copy elements i to n-1 to places i+1 to n (n operations)
3. Set the new element in place i (1 operation)
[Figure: the new vector, one place longer, holding the new element 13]
Vectors are not designed to be used with insertion operations
As the vector gets larger, the insertion takes more time/operations
O(n)
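The three insertion steps can be sketched in Python as follows. This is only an illustration: a Python list stands in for the fixed-size vector, and the name `insert_before` is an assumption, not something from the slides.

```python
def insert_before(v, i, x):
    """Return a new vector with x inserted before position i of v."""
    n = len(v)
    new_v = [None] * (n + 1)      # step 1: allocate a vector of size n+1
    for k in range(i):            # step 2a: copy elements 0..i-1 unchanged
        new_v[k] = v[k]
    for k in range(i, n):         # step 2b: copy elements i..n-1 to i+1..n
        new_v[k + 1] = v[k]
    new_v[i] = x                  # step 3: set the new element
    return new_v


print(insert_before([0, 1, 2, 3], 2, 99))
```

The two copy loops together touch all n existing elements, which is where the O(n) cost comes from.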
Complexity
O(1) Time/operation complexity does not increase with the size of the problem
Example: Find ith element of vector
O(n) Time/operation complexity increases linearly with the size of the problem
Example: Insert before ith element in vector
Linked List
Structure pair: an element and a pointer to the next element pair
[Figure: linked list with 6 elements, 0 to 5]
Linked List
Find ith element of linked list
[Figure: linked list with elements 0 to 5, followed pointer by pointer from the first element]
Have to traverse the structure to reach the ith element
Linked lists are not designed to find the ith element
As the list increases in size, the number of steps increases
O(n)
Linked List
Insert an element
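[Figure: a new element is linked into the list 0 to 5 by redirecting pointers]
Change pointers… one operation
Linked lists are designed exactly for inserting an element
Regardless of the size of the list, the insertion is still one operation, given a pointer to the insertion point
O(1)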
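A minimal sketch of both linked-list operations, assuming a singly linked list. The names `Node`, `find_ith`, `insert_after` and `to_list` are hypothetical helpers chosen for this illustration.

```python
class Node:
    """One structure pair: a value and a pointer to the next pair."""

    def __init__(self, value, next=None):
        self.value = value
        self.next = next


def find_ith(head, i):
    """Follow i pointers from the head: the number of steps grows with i."""
    node = head
    for _ in range(i):
        node = node.next
    return node


def insert_after(node, value):
    """Link a new pair in after node: a fixed number of pointer changes."""
    node.next = Node(value, node.next)


def to_list(head):
    """Collect the values in order, for printing."""
    out = []
    while head is not None:
        out.append(head.value)
        head = head.next
    return out


# Build the list 0 -> 1 -> 2 -> 3 -> 4 -> 5, then insert 99 after element 2.
head = None
for v in reversed(range(6)):
    head = Node(v, head)
insert_after(find_ith(head, 2), 99)
print(to_list(head))
```

Note that the insertion itself is constant time only once the insertion point is known; reaching that point by traversal is still O(n).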
Linear Search
I am thinking of a number between 1 and 10
If you just guess numbers (for example sequentially)
Best case: correct on the first guess
Worst case: correct after 10 guesses
On average it will take you about 5 guesses
In general:
for a number between 1 and n it will take you about n/2 guesses
Complexity: n/2 guesses, O(n)
Don’t worry about the constant ½…
The complexity increases linearly with the size of the problem
Binary Search
For every guess I will say whether it is correct, higher or lower
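[Figure: decision tree for the numbers 1 to 7: first guess 4, then 2 or 6, then 1, 3, 5 or 7]
Best case: 1 guess
Worst case: 3 guesses
At most log2 8 = 3 guesses are needed
In general: log2 n guesses, O(log n)
Extra information: each answer (higher or lower) halves the remaining range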
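The two guessing strategies can be compared by counting guesses. This is a sketch under the assumption that the secret is an integer between 1 and n; the function names are made up for the example.

```python
def linear_guesses(secret, n):
    """Guess 1, 2, 3, ... and count the guesses until the secret is hit."""
    guesses = 0
    for guess in range(1, n + 1):
        guesses += 1
        if guess == secret:
            return guesses


def binary_guesses(secret, n):
    """Guess the middle of the remaining range, using higher/lower answers."""
    lo, hi = 1, n
    guesses = 0
    while lo <= hi:
        guess = (lo + hi) // 2
        guesses += 1
        if guess == secret:
            return guesses
        if guess < secret:
            lo = guess + 1   # answer was "higher"
        else:
            hi = guess - 1   # answer was "lower"


worst_linear = max(linear_guesses(s, 7) for s in range(1, 8))
worst_binary = max(binary_guesses(s, 7) for s in range(1, 8))
print(worst_linear, worst_binary)
```

For the numbers 1 to 7 the worst cases are 7 and 3 guesses respectively, matching the decision tree with first guess 4.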
Complexity of operations
The proper data structure can increase the efficiency of an algorithm
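For structures of size n:
Find ith element: vector O(1), linked list O(n)
Insert an element: vector O(n), linked list O(1)
O(n): increases linearly with size of structure
O(1): does not depend on size of structure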
Complexity of an Algorithm
O(c) Complexity does not increase with the size of the problem
Example: Find ith element in a vector
O(n) Complexity increases linearly with the size of the problem
Example: Find ith element in a linked list
O(log n) Complexity increases with the logarithm of the size of the problem
Example: Binary search
As the problem grows in size, how much more difficult
(in terms of computation time/operations)
does the problem become?
Coupling
Relationship between
data structures and algorithms
Choose the wrong data structure
and the algorithm becomes more complex
Why different data types?
Another criterion:
A specific object implies a data structure
Graph Data Structure
A set of nodes
A set of connections between the nodes
Both nodes and connections can have
properties associated with them
Graph Data Structure
A graph can be
a natural representation
for many
data objects and processes
Social Network
Node: the person
(Facebook page)
Connection: connects two people who know each other
(the friends of the Facebook page)
Each node has a list of connections
(the friends of the Facebook page)
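One common way to store such a graph is an adjacency list: each node keeps the set of nodes it is connected to. The sketch below assumes undirected friendships, and the people's names are invented for the example.

```python
from collections import defaultdict

# Each node (person) maps to the set of its connections (friends).
friends = defaultdict(set)


def connect(a, b):
    """Add an undirected connection: a and b know each other."""
    friends[a].add(b)
    friends[b].add(a)


connect("Ana", "Ben")
connect("Ana", "Cy")
connect("Ben", "Dee")

# Friends that Ben and Cy have in common.
mutual = friends["Ben"] & friends["Cy"]
print(mutual)
```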
Inheritance
Directed graph
Nodes: data classes (object-oriented classes)
Directed connection: a one-way connection
One class inherits the properties of the other
Ontology
(labeled connectors)
Nodes: Objects
Connections:
Relationship between objects
Arithmetic Expression
(functional programming)
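x + y + z * (a + b * c)
[Figure: the expression as a tree; each inner node is a function (+ or *) and its children are its arguments: (+ x y (* z (+ a (* b c))))]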
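The expression tree can be written down and evaluated directly. In this sketch a tuple `(function, argument, ...)` is an inner node and a string is a variable; this representation and the variable values are assumptions made for the example.

```python
import math

# Each function node applies its operator to all of its arguments.
OPS = {"+": sum, "*": math.prod}


def evaluate(expr, env):
    """Evaluate a tree of the form (function, arg, arg, ...)."""
    if isinstance(expr, str):          # a leaf: look up the variable
        return env[expr]
    op, *args = expr                   # an inner node: function and arguments
    return OPS[op](evaluate(a, env) for a in args)


# x + y + z * (a + b * c)
tree = ("+", "x", "y", ("*", "z", ("+", "a", ("*", "b", "c"))))
env = {"x": 1, "y": 2, "z": 3, "a": 4, "b": 5, "c": 6}
print(evaluate(tree, env))
```

With these values the innermost product is 30, the inner sum 34, the outer product 102, and the total 105.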
Molecular Graph
Nodes:
The atoms
Connections:
The bonds between the atoms
Graph as a
Linked list
[Figure: nodes 1 to 13 arranged as linked lists whose elements can point to further lists]
Foundation of LISP:
List programming
Functional programming
(graph as a functional expression)
Stacks
Queues
Priority Queues
Stacks
Characteristics:
Top: was the last thing added
To get to something in the middle
You have to remove what is on top first
LIFO:
Last In, First Out
Last in first out (LIFO)
[Figure: a stack that starts with A on top, shown after Push B, Push C, Push D, Push E and Pop E; the top is always the last element added]
Two main operations:
Push and Pop
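The push and pop sequence can be reproduced with a Python list used as a stack, where the end of the list is the top. This is a sketch of the idea, not a dedicated stack class.

```python
stack = ["A"]            # A is on top at the start

for item in "BCDE":
    stack.append(item)   # push: the new item becomes the top

popped = stack.pop()     # pop: removes the last item pushed
print(popped, stack)
```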
The Towers of Hanoi
A Stack-based Application
o GIVEN: three poles
o a set of discs on the first pole, discs of different sizes, the
smallest discs at the top
o GOAL: move all the discs from the left pole to the right
one.
o CONDITIONS: only one disc may be moved at a time.
o A disc can be placed either on an empty pole or on
top of a larger disc.
Towers of Hanoi
Complexity: $2^n$
Why?
To get to the bottom, you have to move all of the top objects: $2^{n-1}$
Then you move the bottom object: 1
Then you have to move all the other objects back on top again: $2^{n-1}$
$2^{n-1} + 2^{n-1} = 2 \cdot 2^{n-1} = 2^n$
(counted exactly, the total is $2^n - 1$ moves)
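The "move the top, move the bottom, move the top back" argument translates directly into a recursive procedure. In this sketch each pole is a Python list used as a stack, with larger numbers standing for larger discs; the function name and pole names are assumptions.

```python
def hanoi(n, source, target, spare):
    """Move the top n discs from source to target; return the number of moves."""
    if n == 0:
        return 0
    moves = hanoi(n - 1, source, spare, target)    # clear the top n-1 discs
    disc = source.pop()                            # move the bottom disc
    assert not target or target[-1] > disc         # never onto a smaller disc
    target.append(disc)
    moves += 1
    moves += hanoi(n - 1, spare, target, source)   # put the n-1 discs back on top
    return moves


left, middle, right = [4, 3, 2, 1], [], []
count = hanoi(4, left, right, middle)
print(count, right)
```

For 4 discs this makes 15 moves, which is $2^4 - 1$.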
A Legend
The Towers of Hanoi
In the great temple of Brahma in Benares,
on a brass plate under the dome that marks the center of the world
there are three diamond needles and 64 disks of pure gold that
the priests carry one at a time between these diamond needles
According to Brahma's immutable law:
No disk may be placed on a smaller disk.
In the beginning of the world all 64 disks formed the Tower of Brahma on
one needle.
Now, however, the process of transfer of the tower from one needle to
another is in mid course.
When the last disk is finally in place, once again forming the Tower of
Brahma but on a different needle,
then will come the end of the world and all will turn to dust.
Is the End of the World
Approaching?
• Problem complexity $2^n$
• 64 gold discs
• Given 1 move a second
→ about 600,000,000,000 years until the end of the world
Queues
FIFO: First In, First Out
Objects are inserted at the back
and removed from the front
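A short sketch of queue behaviour using `collections.deque` from the Python standard library. The job names are invented for the example.

```python
from collections import deque

queue = deque()
for job in ["page 1", "page 2", "page 3"]:
    queue.append(job)        # insert at the back

first = queue.popleft()      # remove from the front
print(first, list(queue))
```

A deque is used rather than a plain list because removing from the front of a list shifts every remaining element, while `popleft` does not.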
Queues
Computer systems must often provide a
“holding area” for messages
between two processes, two programs, or even two systems.
Real time systems
Queue: Buffering
Computer sends data faster than the printer can print
[Figure: the computer writes into a buffer (a queue) and the printer reads from it]
Priority Queue
Like a regular queue or stack data structure, but where
additionally each element has a "priority" associated with it.
An element with high priority
is served before
an element with low priority.
If two elements have the same priority,
they are served according to their order in the queue.
There is an ordering associated with the
queue
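One possible way to get this behaviour is a binary heap from the standard-library `heapq` module, with an insertion counter to break ties in arrival order. The class name `PriorityQueue` and its methods are assumptions made for this sketch.

```python
import heapq
from itertools import count


class PriorityQueue:
    """Serve high priority first; equal priorities in order of arrival."""

    def __init__(self):
        self._heap = []
        self._order = count()   # arrival counter, used to break ties

    def push(self, item, priority):
        # heapq is a min-heap, so the priority is negated to serve high first.
        heapq.heappush(self._heap, (-priority, next(self._order), item))

    def pop(self):
        return heapq.heappop(self._heap)[2]


pq = PriorityQueue()
pq.push("low", 1)
pq.push("high-a", 5)
pq.push("high-b", 5)
served = [pq.pop(), pq.pop(), pq.pop()]
print(served)
```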
Programming Paradigms
• Goto (like assembler and primitive/older languages)
• Iteration and Loops (while and for-next)
• Functional languages and Recursion
• Declarative
• Non-deterministic programming
Example: Factorial
$n! \equiv \prod_{i=1}^{n} i$
Implies a loop
$n! \equiv \begin{cases} 1 & n = 1 \\ n \cdot (n-1)! & n > 1 \end{cases}$
Recursive mathematical definition
Goto statement
Loops are built with a GOTO (or similar) statement
The GOTO jumps to a specified location (label or address)
$n! \equiv \prod_{i=1}^{n} i$
An index is involved
The index is incremented until the end is reached
i = 1;
factorial = 1;
loop:
    factorial = factorial * i;
    if (i == n) goto exit;
    i = i + 1;
    goto loop;
exit:
Iteration
Repetition of a block of code
$n! \equiv \prod_{i=1}^{n} i$
An index is involved
The index is incremented until the end is reached
i = 1;
factorial = 1;
while (i <= n) {
    factorial = factorial * i;
    i = i + 1;
}
Once again involves an iteration counter
factorial = 1;
for i = 1 to n {
    factorial = factorial * i;
}
Recursion
Numerische Mathematik 2, 312--318 (1960)
Content of Recursion
• Base case(s).
o Values of the input variables for which we perform no
recursive calls are called base cases (there should be at
least one base case).
o Every possible chain of recursive calls must eventually
reach a base case.
• Recursive calls.
o Calls to the current method.
o Each recursive call should be defined so that it makes
progress towards a base case.
factorial(n) {
    if (n == 1) return 1;
    return factorial(n-1) * n;
}
$n! \equiv \begin{cases} 1 & n = 1 \\ n \cdot (n-1)! & n > 1 \end{cases}$
How do I write a
recursive function?
• Determine the size factor
o The number: smaller number, smaller size
• Determine the base case(s)
o The case for n=1, the answer is 1
• Determine the general case(s)
o The recursive call: factorial(n)=factorial(n-1)*n
• Verify the algorithm
(use the "Three-Question-Method")
factorial(n) {
    if (n == 1) return 1;
    return factorial(n-1) * n;
}
$n! \equiv \begin{cases} 1 & n = 1 \\ n \cdot (n-1)! & n > 1 \end{cases}$
Three-Question Verification Method
1. The Base-Case Question:
Is there a nonrecursive way out of the function,
and does the routine work correctly for this
"base" case?
2. The Smaller-Caller Question:
Does each recursive call to the function involve
a smaller case of the original problem, leading
inescapably to the base case?
3. The General-Case Question:
Assuming that the recursive call(s) work
correctly, does the whole function work
correctly?
Stacks
in recursion
factorial(n)
    if (n == 1)
        return 1
    else
        return n * factorial(n-1)
n! = n*(n-1)*(n-2)*(n-3)*…*1
5! = 5*4*3*2*1
factorial(5)
    factorial(4)
        factorial(3)
            factorial(2)
                factorial(1) returns 1
            returns 2
        returns 6
    returns 24
returns 120, so 5! = 120
Deep recursion can result in
running out of memory
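The stack growth can be observed in Python, which guards its call stack with a recursion limit. The sketch below assumes CPython's behaviour of raising `RecursionError` once that limit is exceeded; the helper `too_deep` is a made-up name.

```python
import sys


def factorial(n):
    # Each call waits on the stack until the call below it returns.
    if n == 1:
        return 1
    return n * factorial(n - 1)


def too_deep():
    """Return True if a recursion deeper than the limit exhausts the stack."""
    try:
        factorial(sys.getrecursionlimit() + 100)
    except RecursionError:
        return True
    return False


print(factorial(5), too_deep())
```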
tail recursion
Tail recursion is iteration
$n! \equiv \begin{cases} 1 & n = 1 \\ n \cdot (n-1)! & n > 1 \end{cases}$
factorial(n) {
    return factorial_help(n, 1);
}
factorial_help(n, acc) {
    if (n == 1) return acc;
    return factorial_help(n-1, acc*n);
}
Tail recursion is a pattern of use that can
be compiled or interpreted as iteration,
avoiding the inefficiencies of a growing call stack
A tail-recursive function is one where every recursive call is the last thing
done by the function before returning, and thus produces the function’s value
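The correspondence between the accumulator version and a loop can be shown side by side. Python is used here only for illustration; CPython does not perform this rewrite itself, so the loop below is written by hand. The name `factorial_loop` is an assumption, and both functions assume n is at least 1.

```python
def factorial_help(n, acc):
    """Tail-recursive form: the recursive call is the last thing done."""
    if n == 1:
        return acc
    return factorial_help(n - 1, acc * n)


def factorial_loop(n):
    """The same computation with the tail call replaced by a jump back."""
    acc = 1
    while n != 1:
        n, acc = n - 1, acc * n   # same update as the recursive call's arguments
    return acc


print(factorial_help(5, 1), factorial_loop(5))
```

Because the loop keeps only `n` and `acc`, it needs no extra stack space however large n is.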
Declarative programming
Expresses the logic of a computation
without describing its control flow.
$n! \equiv \begin{cases} 1 & n = 1 \\ n \cdot (n-1)! & n > 1 \end{cases}$
factorial(1,1).
factorial(N,F) :-
    N > 1,
    N1 is N-1,
    factorial(N1,F1),
    F is N*F1.
Constraint Logic
Programming
factorial(1,1).
factorial(N,F) :-
    N > 1,
    N1 is N-1,
    factorial(N1,F1),
    F is N*F1.
factorial(5,F) returns F = 120
factorial(N,120) creates an instantiation error
PROLOG arithmetic only evaluates expressions whose variables are already bound
It has no knowledge of real or integer numbers, so mathematical manipulations (such as solving for N) cannot be made
Constraint Logic
Programming
factorial(1,1).
factorial(N,F) :-
    N > 1,
    N1 is N-1,
    factorial(N1,F1),
    F is N*F1.
[Figure: Logic Programming passes formulas to CLP; CLP returns reduced or solved formulas]
CLP: mathematical knowledge about the numbers used
Probabilistic Algorithms
Non-deterministic
No exact control of program flow
Leaves some of its decisions to chance
Outcome of the program in different runs is not necessarily the same
Monte Carlo Methods
Always gives an answer
But not necessarily correct
The probability of correctness goes up with time
Las Vegas Methods
Never returns an incorrect answer
But sometimes it doesn’t give an answer
Probabilistic Algorithms
Probabilistic Algorithms in optimization:
Closer to human reasoning and problem solving
(for hard problems we don’t follow strict deterministic algorithms)
Finding local and global minima
Probabilistic Algorithms
Classic gradient optimization finds a local minimum
The search path is always downhill toward the minimum
Probabilistic algorithms allow the search to go uphill sometimes
Randomness in the search for the next step
Examples: Genetic Algorithms, Simulated Annealing (to find the global minimum)
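The difference between always-downhill search and a search that sometimes goes uphill can be sketched on a tiny one-dimensional landscape. Everything here is an assumption made for illustration: the landscape values, the linear cooling schedule, the step count and the seed. It is a toy version of simulated annealing, not a tuned implementation.

```python
import math
import random

# Heights of a toy landscape: a local minimum (3) and a global minimum (1).
landscape = [5, 3, 4, 6, 8, 7, 4, 1, 2, 6]


def neighbours(i):
    return [j for j in (i - 1, i + 1) if 0 <= j < len(landscape)]


def greedy_descent(i):
    """Always step downhill; stops at the first local minimum."""
    while True:
        best = min(neighbours(i), key=lambda j: landscape[j])
        if landscape[best] >= landscape[i]:
            return i
        i = best


def simulated_annealing(i, steps=2000, t0=5.0, seed=0):
    """Accept uphill steps with a probability that shrinks as t cools."""
    rng = random.Random(seed)
    best = i
    for step in range(steps):
        t = t0 * (1 - step / steps) + 1e-9      # linear cooling schedule
        j = rng.choice(neighbours(i))           # random candidate step
        delta = landscape[j] - landscape[i]
        if delta <= 0 or rng.random() < math.exp(-delta / t):
            i = j                               # downhill, or a lucky uphill step
        if landscape[i] < landscape[best]:
            best = i                            # remember the best point seen
    return best


print(landscape[greedy_descent(0)], landscape[simulated_annealing(0)])
```

Starting from the left edge, the greedy search stops in the first dip it reaches, while the annealing search is able to climb over the ridge in the middle and can reach the deeper dip on the other side.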
Probabilistic Algorithms
Calculate pi with a dart board
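[Figure: a circle of diameter d inscribed in a square of side d]
Area of square: $d^2$
Area of circle: $\pi \left(\frac{d}{2}\right)^2$
Probability a dart will be in the circle:
$\text{prob} = \frac{\text{circle}}{\text{square}} = \frac{\pi \left(\frac{d}{2}\right)^2}{d^2} = \frac{\pi}{4}$
The number of darts in the circle, divided by the total number of darts, times 4, is approximately π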
Monte Carlo Method
Always Gives an answer
But not necessarily Correct
The probability of correctness goes up with time
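The dart-board estimate can be simulated in a few lines. This sketch assumes a square of side d = 1 centred on the origin and darts that land uniformly at random; the function name and the fixed seed are choices made for the example.

```python
import random


def estimate_pi(darts, seed=0):
    """Throw darts at a unit square and count those inside the inscribed circle."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(darts):
        x = rng.uniform(-0.5, 0.5)
        y = rng.uniform(-0.5, 0.5)
        if x * x + y * y <= 0.25:      # inside the circle of radius d/2 = 0.5
            hits += 1
    return 4 * hits / darts            # prob = pi/4, so pi is about 4 * prob


print(estimate_pi(100_000))
```

More darts give a better estimate on average, but any single run is only approximately right, which is the Monte Carlo trade-off described above.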
