Unit 3
A Tree is a data structure in which each element is attached to one or more elements directly
beneath it.
                 A                 Level 0
           /     |     \
          B      C      D          Level 1
         / \    / \    / \
        E   F  G   H  I   J        Level 2
              / \
             K   L                 Level 3
Terminology
The connections between elements are called branches.
A tree has a single root, called the root node, which is shown at the top of the tree,
i.e. the root is always at the topmost level, level 0.
Each node (except the root) has exactly one node above it, called its parent. Eg: A is the
parent of B, C and D.
The nodes just below a node are called its children, i.e. child nodes are one
level lower than the parent node.
A node which does not have any child is called a leaf or terminal node. Eg: E, F, K, L, H, I
and J are leaves.
Nodes with at least one child are called non-terminal or internal nodes.
The child nodes of same parent are said to be siblings.
A path in a tree is a list of distinct nodes in which successive nodes are
connected by branches in the tree. The length of a particular path is the number of
branches in that path.
The degree of a node of a tree is the number of children of that node. The
maximum number of children a node can have is often referred to as the order of a tree.
The height or depth of a tree is the length of the longest path from the root to any leaf.
1. Root: This is the unique node in the tree to which further sub-trees are attached. Eg: A
2. Degree of the node: The total number of sub-trees attached to the node is called
the degree of the node. Eg: For node A the degree is 3; for node K the degree is 0.
3. Leaves: These are the terminal nodes of the tree. The nodes with degree 0 are always
the leaf nodes. Eg: E, F, K, L, H, I, J
4. Internal nodes: The nodes other than the root node and the leaves are called the
internal nodes. Eg: B, C, D, G
5. Parent nodes: The node which is having further sub-trees(branches) is called the
parent node of those sub-trees. Eg: B is the parent node of E and F.
6. Predecessor: While traversing the tree, if some particular node occurs before
some other node, then that node is called the predecessor of the other node. Eg: E is
the predecessor of the node B.
7. Successor: The node which occurs next after some other node is its successor
node. Eg: B is the successor of E and F.
8. Level of the tree: The root node is always considered at level 0, then its adjacent
children are supposed to be at level 1 and so on. Eg: A is at level 0, B,C,D are at level
1, E,F,G,H,I,J are at level 2, K,L are at level 3.
9. Height of the tree: The maximum level is the height of the tree. Here the height of
the tree is 3. The height of the tree is also called the depth of the tree.
10. Degree of tree: The maximum degree of the node is called the degree of the tree.
BINARY TREES
Binary tree is a tree in which each node has at most two children, a left child and a
right child. Thus the order of binary tree is 2.
A binary tree is a finite set of nodes which is either empty or consists of a root
and two disjoint trees called left sub-tree and right sub-tree.
In binary tree each node will have one data field and two pointer fields for
representing the sub-branches. The degree of each node in the binary tree will be at the
most two.
1. Left skewed binary tree: If the right sub-tree is missing in every node of a
tree, we call it a left skewed tree.
(Figure: a left skewed tree rooted at A.)
2. Right skewed binary tree: If the left sub-tree is missing in every node of a
tree, we call it a right skewed tree.
          A
         / \
        B   C
       / \ / \
      D  E F  G
Note:
1. A binary tree of depth n will have at most 2^n − 1 nodes.
2. A complete binary tree will have at most 2^l nodes at level l, where l starts from 0.
3. Any binary tree with n nodes will have at most n+1 null branches.
4. The total number of edges in a complete binary tree with n terminal nodes is 2(n−1).
A binary tree can be represented in two ways:
a) Sequential Representation
b) Linked Representation
a) Sequential Representation
The simplest way to represent binary trees in memory is the sequential
representation, which uses a one-dimensional array.
1) The root of the binary tree is stored in the first location (index 0) of the array.
2) If a node is at index j of the array, then its left child is at index 2j+1 and its
right child at index 2j+2.
The maximum size that is required for an array to store a tree is 2^(d+1) − 1, where d is the
depth of the tree.
Advantages of sequential representation:
1. Direct access to any node is possible, and finding the parent or children of
any particular node is fast because of the random access.

Advantages of linked representation:
1. Insertions and deletions, which are the most common operations, can
be done without moving the nodes.

TREE TRAVERSALS
Traversing a tree means processing it so that each node is visited exactly
once. A binary tree can be traversed in a number of ways. The most common tree
traversals are:
1. In-order
2. Pre-order
3. Post-order
              A
           /     \
          B       C
         / \     / \
        D   E   F   G
           /       / \
          H       I   J
                 /
                K
The pre-order traversal is: A B D E H C F G I K J
The in-order traversal is: D B H E A F C K I G J
The post-order traversal is: D H E B F K I J G C A
Inorder Traversal:
          A        <- printed 3rd
         / \
        B   D      <- B printed 2nd, D printed 4th
       /     \
      C       E    <- C printed 1st, E printed last
C-B-A-D-E is the inorder traversal, i.e. first we go towards the leftmost node, i.e. C,
and print that node. Then we go back to the node B and print B, then the root node A,
then move towards the right sub-tree and print D and finally E. Thus we are following
the tracing sequence Left|Root|Right. This type of traversal is called inorder
traversal. The basic principle is to traverse the left sub-tree, then the root, and
then the right sub-tree.
Pseudo Code:
The preorder traversal of the above fig follows the Root|Left|Right path:
the data at the root node is printed first, then we move to the left sub-tree
and go on printing the data till we reach the leftmost node. We print the data at
that node and then move to the right sub-tree. The same principle is followed at each
sub-tree, and the data is printed accordingly.
void preorder(node *temp)
{
    if (temp != NULL)
    {
        cout << temp->data << " ";
        preorder(temp->left);
        preorder(temp->right);
    }
}
From the figure the postorder traversal is C-D-B-E-A. In the postorder traversal we
follow the Left|Right|Root principle: move to the leftmost node and check whether a
right sub-tree exists; if not, print the leftmost node, and if it does, move towards
the rightmost node of that sub-tree first. The key idea is that at each sub-tree we
follow the Left|Right|Root principle and print the data accordingly.
Pseudo Code:
Operations On Binary Search Tree:
The basic operations which can be performed on a binary search tree are:
1. Insertion of a node in binary search tree.
2. Deletion of a node from binary search tree.
3. Searching for a particular node in binary search tree.
Insertion of a node in binary search tree.
While inserting any node in a binary search tree, we look for its appropriate position
in the tree. We start comparing the new node with each node of the tree. If the value
of the node to be inserted is greater than the value of the current node, we move to
the right sub-branch; otherwise we move to the left sub-branch. As soon as the
appropriate position is found, we attach the new node as a left or right child
appropriately.
Before Insertion
In the above fig, if we want to insert 23, we start comparing 23 with the value of the
root node, i.e. 10. As 23 is greater than 10, we move to the right sub-tree. Now we
compare 23 with 20 and move right, compare 23 with 22 and move right. Now we compare
23 with 24, but it is less than 24, so we move to the left branch of 24. As there is
no node as the left child of 24, we can attach 23 as the left child of 24.
Deletion of a node from binary search tree.
For deletion of any node from a binary search tree, there are three cases which are possible:
i. Deletion of leaf node.
ii. Deletion of a node having one child.
iii. Deletion of a node having two children.
Deletion of leaf node.
This is the simplest deletion, in which we set the left or right pointer of parent node as NULL.
            10
           /  \
          7    15
         / \   / \
        5   9 12  18

Before deletion
From the above fig, if we want to delete the node having value 5, then we will set the
left pointer of its parent node to NULL. That is, the left pointer of the node having
value 7 is set to NULL.
Deletion of a node having one child.
To explain this kind of deletion, consider a tree as given below.
Let us consider that we want to delete the node having value 7. We will then find out
the inorder successor of node 7. The inorder successor will simply be copied at the
location of node 7.
That means we copy 8 at the position where the value of the node is 7, and set the
left pointer of 9 to NULL. This completes the deletion procedure.
Searching for a particular node in binary search tree.
In the above tree, if we want to search for the value 9, we first compare 9 with the
root node 10. As 9 is less than 10, we search the left sub-branch. Now we compare 9
with 5; since 9 is greater than 5, we move to the right sub-tree. Now we compare 9
with 8; as 9 is greater than 8, we move to the right sub-branch. The node we reach
holds the value 9. Thus the desired node can be searched.
AVL TREES
Adelson-Velskii and Landis in 1962 introduced a binary tree structure that is
balanced with respect to the heights of its sub-trees. Because the tree is kept
balanced, retrieval of any node can be done in O(log n) time, where n is the total
number of nodes. The tree is named AVL tree after these scientists.
Definition:
For any node in an AVL tree the balance factor, i.e. BF(T) = height(left sub-tree) − height(right sub-tree), is -1, 0 or +1.
Height of AVL Tree:
Theorem: The height of AVL tree with n elements (nodes) is O(log n).
Proof: Consider an AVL tree with n nodes in it, and let Nh be the minimum number of
nodes in an AVL tree of height h.
In the worst case, one sub-tree has height h-1 and the other sub-tree has height
h-2, and both these sub-trees are AVL trees, since for every node in an AVL tree the
heights of the left and right sub-trees differ by at most 1.
Hence
Nh = Nh-1+Nh-2+1
Where Nh denotes the minimum number of nodes in an AVL
We can also write it as
N ≥ Nh = Nh-1 + Nh-2 + 1
       > 2·Nh-2
       > 4·Nh-4
       ...
       > 2^i · Nh-2i
Taking i = h/2 − 1, this becomes
N > 2^(h/2-1) · N2
and solving for h gives h = O(log N).
This proves that height of AVL tree is always O(log N). Hence search, insertion
and deletion can be carried out in logarithmic time.
The AVL tree follows the property of binary search tree. In fact AVL
trees are basically binary search trees with balance factors as -1, 0, or
+1.
After insertion of any node in an AVL tree if the balance factor of
any node becomes other than -1, 0, or +1 then it is said that AVL
property is violated. Then we have to restore the destroyed balance
condition. The balance factor is denoted at right top corner inside the
node.
After insertion of a new node if balance condition gets destroyed, then the
nodes on that path(new node insertion point to root) needs to be readjusted.
That means only the affected sub tree is to be rebalanced.
The rebalancing should be such that entire tree should satisfy AVL property.
Insertion of a node.
There are four different cases when rebalancing is required after insertion of new node.
1. An insertion of new node into left sub tree of left child. (LL).
2. An insertion of new node into right sub tree of left child. (LR).
3. An insertion of new node into left sub tree of right child. (RL).
4. An insertion of new node into right sub tree of right child.(RR).
The modifications done on an AVL tree in order to rebalance it are called rotations
of the AVL tree.
Insertion Algorithm:
1. Insert a new node as new leaf just as an ordinary binary search tree.
2. Now trace the path from the insertion point (the new node inserted as a leaf)
towards the root. For each node ‘n’ encountered, check if the heights of left(n) and
right(n) differ by at most 1.
a) If yes, move towards parent (n).
b) Otherwise restructure by doing either a single rotation or a double rotation.
Thus once we perform a rotation at node ‘n’, we do not need to perform any
rotation at any ancestor of ‘n’.
1. LL rotation:
When node ‘1’ gets inserted as a left child of node ‘C’, the AVL property gets
destroyed, i.e. node A has balance factor +2.
The LL rotation has to be applied to rebalance the nodes.
2. RR rotation:
When node ‘4’ gets attached as right child of node ‘C’ then node ‘A’ gets
unbalanced. The rotation which needs to be applied is RR rotation as shown in
fig.
3. LR rotation:
When node ‘3’ is attached as a right child of node ‘C’, unbalancing occurs
because of the LR case. Hence LR rotation needs to be applied.
4. RL rotation:
When node ‘2’ is attached as a left child of node ‘C’, node ‘A’ gets
unbalanced as its balance factor becomes -2. Then RL rotation needs to be
applied to rebalance the AVL tree.
Example:
Insert 1
To insert node ‘1’ we have to attach it as a left child of ‘2’. This will unbalance
the tree, so we apply an LL rotation to preserve the AVL property.
Insert 25
Insert 12
Deletion:
After deletion of any particular node from an AVL tree, the tree may have to be
restructured in order to preserve the AVL property, and thereby various rotations
need to be applied.
The tree becomes
Searching:
Searching for a node in an AVL tree is very simple. As an AVL tree is basically a
binary search tree, the algorithm used for searching a node in a binary search tree
is the same as the one used for an AVL tree.
B TREES
Multi-way trees are tree data structures with more than two branches at a
node. The data structures of m-way search trees, B trees and Tries belong to
this category of tree structures.
AVL search trees, being height balanced versions of binary search trees, provide
efficient retrieval and storage operations. The complexity of insert, delete and
search operations on AVL search trees is O(log n).
For applications such as file indexing, where the entries in an index may be very
large, maintaining the index as an m-way search tree provides a better option
than AVL search trees, which are only balanced binary search trees.
While binary search trees are two-way search trees, m-way search trees are
extended binary search trees and hence provide efficient retrievals.
B trees are height balanced versions of m-way search trees, but they are not well
suited to keys of varying sizes.
Tries are tree based data structures that support keys with varying sizes.
RED-BLACK TREE:
A Red-Black Tree is a self-balancing Binary Search Tree (BST) in which every node
follows these rules:
1. Every node is either red or black.
2. The root is always black.
3. There are no two adjacent red nodes (a red node cannot have a red parent or a red child).
4. Every path from a node to any of its descendant NULL nodes has the same number of black nodes.
Most of the Binary Search Tree operations take O(h) time (where ‘h’ is the
height of the tree) for example: Search , Max , Min , Insert , Delete etc.
If the Binary Search Tree becomes skewed, then the height of the tree becomes
equal to the total number of nodes, i.e. ‘n’, and the complexity increases to O(n).
So to keep the complexity low we should balance the Binary Search Tree after each
insertion and deletion operation. This ensures that the height h of the tree stays
log n and the complexity stays O(log n).
So the height of a Red-Black Tree is always O(log n).
If frequent insertions and deletions are required, then a Red-Black Tree gives
better performance than an AVL Tree. If insertions and deletions are less frequent,
then an AVL tree gives good performance, because AVL Trees are more balanced than
Red-Black Trees; however, maintaining this stricter balance can cause more rotations
and increase the time taken by updates.
Properties:
In a red-black tree, every node x has black height bh(x) >= h(x)/2, where h(x) is
the height of x.
A red-black tree with ‘n’ nodes has height h <= 2 log(n+1).
Insertion steps:
1. Perform standard BST insertion and color the newly inserted node x RED.
2. If x is the root, change its color to BLACK.
3. Do the following if the color of x’s parent is not BLACK and x is not the root:
Left right case (See g, p and x)
Right Left Case (See g, p and x)
Example of Insertion
The main property that is violated after insertion is two consecutive reds. In
deletion, the main violated property is the change of black height in sub-trees,
as deletion of a black node may cause reduced black height along one root-to-leaf
path.
Deletion is a fairly complex process. To understand deletion, the notion of
double black is used. When a black node is deleted and replaced by a black
child, the child is marked as double black. The main task then becomes to
convert this double black to a single black.
Deletion Steps
Following are detailed steps for deletion.
1. Perform standard BST delete. When we perform standard delete operation in
BST, we always end up deleting a node which is either leaf or has only
one child (For an internal node, we copy the successor and then recursively
call delete for successor, successor is always a leaf node or a node with one
child). So we only need to handle cases where a node is leaf or has one
child. Let v be the node to be deleted and u be the child that replaces v (Note
that u is NULL when v is a leaf and color of NULL is considered as Black).
2. Simple Case: If either u or v is red, we mark the replaced child as black
(No change in black height). Note that both u and v cannot be red as v is
parent of u and two consecutive reds are not allowed in red-black tree.
3. Do the following while the current node u is double black and it is not the root.
Let the sibling of the node be s.
(a) If sibling s is black and at least one of s’s children is red, perform
rotation(s). Let the red child of s be r. This case can be divided into four
subcases depending upon the positions of s and r:
   i. Left Left Case (s is the left child of its parent and r is the left child of
      s, or both children of s are red). This is the mirror of the Right Right Case.
   ii. Left Right Case (s is the left child of its parent and r is the right child of s).
   iii. Right Right Case (s is the right child of its parent and r is the right child
      of s, or both children of s are red).
   iv. Right Left Case (s is the right child of its parent and r is the left child of s).
(b) If sibling s is black and both its children are black, recolor s red and recur
for the parent. In this case, if the parent was red, then we do not need to recur
for the parent; we can simply make it black (red + double black = single black).
(c) If sibling s is red, perform a rotation to move the old sibling up, and recolor
the old sibling and parent. The new sibling is always black. This mainly converts
the tree to the black-sibling case (by rotation) and leads to case (a) or (b).
This case can be divided into two subcases:
   i. Left Case (s is the left child of its parent). This is the mirror of the
      Right Case. We right rotate the parent p.
   ii. Right Case (s is the right child of its parent). We left rotate the parent p.
4. If u is the root, make it single black and return (the black height of the
complete tree reduces by 1).
Time complexity for insertion in a red-black tree is O(log n), and deletion is also O(log n).
Balanced BSTs
● We've explored balanced BSTs this quarter because they guarantee worst-case O(log n)
operations.
● Claim: Depending on the access sequence, balanced BSTs may not be optimal BSTs.
(Figures: two BSTs on the keys 1–7, one perfectly balanced and one unbalanced.)
Static Optimality
● Let S = { x₁, x₂, …, xₙ } be a set with access probabilities p₁, p₂, …, pₙ.
● Goal: Construct a binary search tree T* that minimizes the total expected access time.
● There is an O(n²)-time dynamic programming algorithm for constructing statically optimal
binary search trees.
● Knuth, 1971
● There is an O(n log n)-time greedy algorithm for constructing binary search trees whose
cost is within a factor of 1.5 of optimal.
● Mehlhorn, 1975
● These algorithms assume that the access probabilities are known in advance.
Challenge: Can we construct an optimal BST without knowing the access probabilities in
advance?
The Intuition
● If we don't know the access probabilities in advance, we can't build a fixed BST and then “hope”
it works correctly.
● For now, let's focus on lookups; we'll handle insertions and deletions later on.
Refresher: Tree Rotations

          B      -- Rotate Right -->        A
         / \                               / \
        A   >B                           <A   B
       / \       <-- Rotate Left --          / \
     <A  (A..B)                         (A..B)  >B
An Initial Idea
● After looking up an element, repeatedly rotate that element with its parent until it becomes the
root.
● Intuition:
● Recently-accessed elements will be up near the root of the tree, lowering access time.
The Problem
● The “rotate to root” method might result in n accesses taking time Θ(n²).
● Why?
● Rotating an element x to the root significantly “helps” x, but “hurts” the rest of the tree.
● Most of the nodes on the access path to x have depth that increases or is unchanged.
A More Balanced Approach
● In 1983, Daniel Sleator and Robert Tarjan invented an operation called splaying.
● Rotates an element to the root of the tree, but does so in a way that's more “fair” to other
nodes in the tree.
Case 1: Zig-Zig
(Diagram: x is the left child of p, and p is the left child of g. First, rotate p
with g; then, rotate x with p. Continue moving x up the tree.)
Case 2: Zig-Zag
(Diagram: x is the right child of p, and p is the left child of g. First, rotate x
with p; then, rotate x with g. Continue moving x up the tree.)
Case 3: Zig
(Assume r is the
tree root)
(Diagram: x is a child of the root r. Rotate x with r; x is now the root.)
(Figures: splaying successively deeper nodes of a degenerate tree on the keys A–G.)
Splaying, Empirically
● After a few splays, we went from a totally degenerate tree to a reasonably-balanced tree.
● Splaying nodes that are deep in the tree tends to correct the tree shape.
● Why is this?
● Is this a coincidence?
Why Splaying Works
● Claim: After doing a splay at x, the average depth of any nodes on the access path to x is halved.
(Diagrams: the zig-zig, zig-zag, and zig cases with subtrees labeled. In the zig-zig
and zig-zag cases, the subtrees along the access path have their height decreased by
one. In the zig case, the nodes in one subtree have their height decreased by one,
and there is no net change in the height of x or r.)
An Intuition for Splaying
● Each rotation done only slightly penalizes each other part of the tree (say, adding +1 or +2 depth).
● Each splay rapidly cuts down the height of each node on the access path.
● Slow growth in height, combined with rapid drop in height, is a hallmark of amortized
efficiency.
Some Claims
● Claim 1: The amortized cost of splaying a node up to the root is O(log n).
● Claim 2: The amortized cost of splaying a node up to the root can be o(log n) if the access
pattern is non-uniform.
● Splay trees make it extremely easy to perform the following operations:
● lookup
● insert
● delete
● predecessor / successor
● join
● split
Join
● To join two trees T₁ and T₂, where all keys in T₁ are less than the keys in T₂:
splay the largest key in T₁ to the root, then attach T₂ as the right child of the
new root.
Split
● To split T at a key k: splay k (or its successor) to the root, then detach the
left and right subtrees T₁ and T₂.
Delete
● To delete k: splay k to the root, delete it, then join the two remaining subtrees
T₁ and T₂.
The Runtime
● Rationale: Each operation's runtime is bounded by the cost of O(1) splays, which
take total amortized time O(log n).
● Historically, it has proven hard to analyze splay trees for several reasons:
● Deliberate lack of structure makes it hard to find invariants useful in the analysis.
● 30 years after splay trees were invented, we still don't know the answers to the
following questions:
● What is the cost of performing n splits on a splay tree with no intervening operations?
● What is the cost of performing n deque operations on a splay tree?
● Sleator and Tarjan analyze splaying using the potential method; we can use that
analysis to derive powerful results about splay tree costs.
● We will not do the full analysis here; check the original paper for more details.
Weighted Path Lengths
● For any node xᵢ in a BST, define the path length of xᵢ, which we'll denote lᵢ, to be
the length of the path from the root of the tree to xᵢ.
● If the weights wᵢ are access probabilities, the expected cost of a lookup in the BST is
  O(1 + Σᵢ₌₁ⁿ wᵢ lᵢ)
Intuiting Weighted Path Lengths
● Summing wᵢ lᵢ over the nodes of an example tree and adding the weights gives the
total of the node sizes.
(Figure: an example tree annotated with weights; the sizes total 37, which is the
weighted path length plus the sum of the weights.)
Node Sizes
● For each node x, define the size of x, denoted s(x), to be the sum of the weights of x
and all its descendants.
● The sum of the sizes of all the nodes gives the weighted path length of the tree
plus the sum of the weights.
● By trying to keep the sum of the node sizes low, we can try to keep the access costs in the
tree low.