Data Structure and Algorithm
AVL Trees
handout 2
Apurba Sarkar
November 2, 2018
1 Motivation
In this handout we are going to talk about AVL trees. Earlier we saw the binary search tree data structure. One problem with binary search trees, if you recall, is that insertion, deletion and search take time proportional to the height of the tree, and the height of the tree can be very bad. We saw an example where the height of the tree could be as bad as order n (n − 1, to be precise). We want to somehow build trees whose height is not too bad, and that is what we are going to discuss in this handout. We are going to look at the data structure called AVL trees, named after their inventors, Adelson-Velskii and Landis. AVL trees are also called height balanced trees.
2 AVL Trees
So what is an AVL tree? Consider the tree shown in Fig. 1. It is a binary search tree, and the keys are written inside the nodes. Everything less than the root is to the left of the root, and everything greater than the root is to its right. The number written next to each node is the height of that node. So what is the height of a node? The height of a node is the height of the subtree rooted at that node. What is the subtree rooted at a node? It is the node together with all of its descendants. For instance, the node with key 78 and all the nodes below it form the subtree rooted at 78. The height of this subtree is 3: it has 3 levels, if we number the levels starting with 1 at its top, just as level numbers begin with 1 at the root of a tree. The height of a node can also be defined recursively: the height of a leaf node is 1, and if $h_l$ and $h_r$ are the heights of the left and right subtrees of a node v, then the height of v is $\max(h_l, h_r) + 1$. For example, the subtree rooted at 50 has height 2, the subtree rooted at 78 has height 3, and the entire tree has height 4. This is what we will mean by the height of a tree when discussing AVL trees. With every node in Fig. 1 we have noted down its height: all the leaves have height 1, a node whose children are all leaves has height 2, and so on. Such a tree is called an AVL tree if it is height balanced. Now, what does height balanced mean? The following is the definition.
Figure 1: Binary search tree
Definition 1. An internal node v of a binary tree T is height balanced if and only if the heights of the children of v differ by at most 1.
Definition 2. A binary search tree T is an AVL tree if and only if every internal node v of T is height balanced, i.e. the heights of the children of v differ by at most 1.
So, if we look at any node, the heights of its children differ by at most one. There might be no difference at all, as with the node with key 50 (Fig. 1): its 2 children have the same height. At the node 78 the heights differ by 1: the left subtree has height 2 and the right subtree has height 1. Similarly, at the node with key 44 the difference is also one: the right subtree has height 3 and the left subtree has height 2. If we check all the nodes in the tree, no difference exceeds 1, so the tree in Fig. 1 is an example of an AVL tree. Basically, a tree must satisfy two properties to be an AVL tree. First, it has to be a binary search tree; second, at every internal node of the tree, the heights of the children must differ by at most one. Note that we have used the term “internal node”. This is because a leaf node has no children, so the balance condition holds trivially for it. When a child is missing, as with the left subtree of the node with key 17, we treat it as a subtree of height zero: an absent subtree has height 0 and a single node has height 1.
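The two properties can be checked mechanically. The following is a minimal Python sketch, not part of the original handout: the `Node` class and the function names are hypothetical, and heights follow the convention just described (absent subtree 0, leaf 1). The example tree is built to match the heights described for Fig. 1; since the figure is not reproduced here, the keys 32, 48 and 62 are assumptions.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Node:
    """A hypothetical BST node; children default to absent (None)."""
    key: int
    left: Optional["Node"] = None
    right: Optional["Node"] = None


def height(v: Optional[Node]) -> int:
    """Height as in this handout: an absent subtree has height 0, a leaf 1."""
    if v is None:
        return 0
    return max(height(v.left), height(v.right)) + 1


def is_bst(v: Optional[Node], lo: float = float("-inf"), hi: float = float("inf")) -> bool:
    """Property 1: every key lies strictly between the bounds set by its ancestors."""
    if v is None:
        return True
    return lo < v.key < hi and is_bst(v.left, lo, v.key) and is_bst(v.right, v.key, hi)


def is_height_balanced(v: Optional[Node]) -> bool:
    """Property 2: at every node the children's heights differ by at most 1.
    Leaves pass trivially, since both of their children have height 0."""
    if v is None:
        return True
    return (abs(height(v.left) - height(v.right)) <= 1
            and is_height_balanced(v.left)
            and is_height_balanced(v.right))


def is_avl(root: Optional[Node]) -> bool:
    return is_bst(root) and is_height_balanced(root)


# A tree with the heights described for Fig. 1 (keys 32, 48 and 62 are assumed).
root = Node(44,
            Node(17, right=Node(32)),
            Node(78, Node(50, Node(48), Node(62)), Node(88)))
print(height(root), height(root.right), is_avl(root))  # 4 3 True
```

This sketch recomputes heights on every call, which is fine for small examples; a practical implementation would store the height in each node, as Fig. 1 does.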
Let us now see what is not an AVL tree. Recall that one of our binary search trees which was very bad, with a huge height, looked like the following (Fig. 2). With the convention used in this handout (a leaf has height 1), such a tree with n nodes has height n; counting edges instead, it is n − 1.
Now the question arises: is this tree (Fig. 2) an AVL tree? To answer this we need to check whether the two properties above are satisfied. The first property, that it has to be a binary search tree, is satisfied (convince yourself by looking at the key values). For the second property, we need to check whether every node in the tree is height balanced. Is the last node height balanced? Yes: since it is a leaf node, it is height balanced. Is the next node up height balanced? Yes, it is also height balanced. Is the node above that, the third from the bottom, height balanced? No, because its right subtree has height 2 and its left subtree has height zero. Thus the height balance property is violated here, and it is also violated at every node above it. So such a tree can never be an AVL tree.

Figure 2: Bad binary search tree (a chain with keys 17, 32, 44, 48, ..., 88)

Since we now know that such trees cannot be AVL trees, let us try to figure out how bad the height of an AVL tree can be. Suppose we have an AVL tree of n nodes. If its height could still be as bad as order n, we would have gained nothing. Ideally we would like to say that its height is not much more than log₂ n. We will prove that the height of an AVL tree T with n nodes is only O(log₂ n).
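To see the contrast concretely, here is a small Python sketch (not from the handout) that inserts keys in increasing order into a plain, unbalanced binary search tree, which produces a chain like Fig. 2, and then lists the nodes that violate the height balance condition. The key list is hypothetical, since Fig. 2 elides the middle keys, and `bst_insert` and `unbalanced_keys` are illustrative names.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Node:
    key: int
    left: Optional["Node"] = None
    right: Optional["Node"] = None


def bst_insert(root: Optional[Node], key: int) -> Node:
    """Plain (unbalanced) BST insertion, written iteratively."""
    new = Node(key)
    if root is None:
        return new
    v = root
    while True:
        if key < v.key:
            if v.left is None:
                v.left = new
                return root
            v = v.left
        else:
            if v.right is None:
                v.right = new
                return root
            v = v.right


def height(v: Optional[Node]) -> int:
    """Absent subtree = 0, leaf = 1, as in this handout."""
    return 0 if v is None else 1 + max(height(v.left), height(v.right))


def unbalanced_keys(v: Optional[Node]) -> List[int]:
    """Keys (in preorder) of nodes whose children differ in height by more than 1."""
    if v is None:
        return []
    here = [v.key] if abs(height(v.left) - height(v.right)) > 1 else []
    return here + unbalanced_keys(v.left) + unbalanced_keys(v.right)


keys = [17, 32, 44, 48, 62, 88]  # hypothetical sorted keys in the spirit of Fig. 2
root = None
for k in keys:
    root = bst_insert(root, k)
print(height(root))           # 6
print(unbalanced_keys(root))  # [17, 32, 44, 48]
```

Only the last two nodes of the chain pass the balance test, which matches the node-by-node check above.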
Proposition 1. The height of an AVL tree T storing n keys is O(log₂ n).
Let us see why this is true. We are not going to prove this claim directly; instead, we make a slightly different argument, as follows. Consider all possible AVL trees of height h, and among all such trees take one with the smallest number of nodes. Let us define n(h) as the minimum number of nodes in an AVL tree of height h. We will first figure out this quantity and then see how it implies the proposition. Given an AVL tree of height h, what is the smallest number of nodes it can have? If it could have only h nodes, we would be in trouble: recall that a binary search tree of height h can have only h nodes, like the tree in Fig. 2. On the other hand, a good tree, such as a complete binary tree of height h, has on the order of $2^h$ nodes (up to $2^h - 1$). What we would really like is for an AVL tree of height h to have a large number of nodes: not just h, but something exponential in h, like $2^h$.
Let us understand the quantity n(h), the minimum number of nodes in an AVL tree of height h. What is an AVL tree of height 1? It is just a single node and nothing else, so n(1) = 1. An AVL tree of height 2 has a root and at least 1 child, so n(2) = 2. It could also be a root with 2 children, but we take n(2) = 2 and not 3 because we are counting the minimum number of nodes in a tree of the given height. Now consider an AVL tree of height h with the minimum number of nodes. It consists of 1 root node, an AVL tree of height h − 1 on one side, and an AVL tree of height h − 2 on the other side. Why h − 1 and h − 2? Because the tree has height h, one of the root's subtrees must have height exactly h − 1, and neither can have more. Since the heights of the two subtrees differ by at most one, the other subtree has height h − 1 or h − 2. Which one do we pick? We pick h − 2, because we are interested in a tree with the minimum number of nodes, and the minimum number of nodes does not increase when the height decreases. Moreover, each subtree should itself be a smallest AVL tree of its height. So n(h) is the smallest possible number of nodes in a tree of height h − 1, plus the smallest possible number of nodes in a tree of height h − 2, plus 1 (for the root node). So
we have

\[
n(h) =
\begin{cases}
1 & \text{if } h = 1,\\
2 & \text{if } h = 2,\\
1 + n(h-1) + n(h-2) & \text{otherwise.}
\end{cases}
\tag{1}
\]
This is the recurrence we have to solve to get a closed form expression for n(h).
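Recurrence (1) is easy to evaluate, and the argument behind it also tells us how to build a smallest AVL tree of each height: a root whose subtrees are smallest trees of heights h − 1 and h − 2. The Python sketch below (illustrative; the function names are hypothetical) does both. It represents a tree shape as nested `(left, right)` tuples with `None` for an absent subtree, since keys do not matter when counting nodes.

```python
from functools import lru_cache


@lru_cache(maxsize=None)
def n_min(h: int) -> int:
    """Minimum number of nodes in an AVL tree of height h, from recurrence (1)."""
    if h == 1:
        return 1
    if h == 2:
        return 2
    return 1 + n_min(h - 1) + n_min(h - 2)


def minimal_avl_shape(h: int):
    """A smallest AVL shape of height h: a root over smallest shapes of heights
    h - 1 and h - 2 (so the children differ in height by exactly 1).
    Shapes are (left, right) tuples; None is an absent subtree."""
    if h <= 0:
        return None
    return (minimal_avl_shape(h - 1), minimal_avl_shape(h - 2))


def count_nodes(t) -> int:
    return 0 if t is None else 1 + count_nodes(t[0]) + count_nodes(t[1])


def shape_height(t) -> int:
    return 0 if t is None else 1 + max(shape_height(t[0]), shape_height(t[1]))


print([n_min(h) for h in range(1, 9)])  # [1, 2, 4, 7, 12, 20, 33, 54]
for h in range(1, 15):
    t = minimal_avl_shape(h)
    assert shape_height(t) == h and count_nodes(t) == n_min(h)
```

Each printed value is one less than a Fibonacci number, n(h) = F(h + 2) − 1 with F(1) = F(2) = 1, which is one way to see where the golden ratio below comes from.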
Let us first find an approximate solution of this recurrence. We use the fact that n(h − 1) ≥ n(h − 2): as the height of the tree grows, the minimum number of nodes cannot decrease. Replacing n(h − 1) by n(h − 2) in the recurrence n(h) = n(h − 1) + n(h − 2) + 1, and dropping the 1, gives

\[
n(h) = n(h-1) + n(h-2) + 1 > 2\,n(h-2).
\]

The inequality is strict because we have dropped the 1. This inequality is now simple to unroll: n(h) is more than 2n(h − 2), and n(h − 2) is more than 2n(h − 4), so n(h) is more than 4n(h − 4), which in turn is more than 8n(h − 6):
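\[
n(h) > 2\,n(h-2) > 4\,n(h-4) > 8\,n(h-6) > \cdots > 2^{i}\,n(h-2i).
\]

Now, we know n(2) = 2, so (taking h even) let h − 2i = 2, which gives i = h/2 − 1. We then have

\[
n(h) \ge 2^{\frac{h}{2}-1}\, n(2) = 2^{\frac{h}{2}},
\]

with equality only when h = 2. If an AVL tree of height h has m nodes, then m ≥ n(h), and therefore

\[
m \ge 2^{\frac{h}{2}} \qquad\text{or}\qquad h \le 2\log_2 m.
\]

(For odd h, stopping at n(1) = 1 instead gives $n(h) \ge 2^{\frac{h-1}{2}}$, and hence $h \le 2\log_2 m + 1$.)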
Here h is the height and m is the number of nodes: the height of an AVL tree with m nodes is at most twice log₂ m. The best possible tree, a complete binary tree, which is as dense as possible, would have height only about log₂ m. An AVL tree can be taller, but not by much: at most a factor of two, which is much better than a height of order m.
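The crude bound can be checked numerically against recurrence (1). This short Python sketch (an illustration, not part of the derivation; `n_min_table` is a hypothetical name) tabulates n(h) bottom-up and confirms, for even h up to 40, that $n(h) \ge 2^{h/2}$ and hence $h \le 2\log_2 n(h)$.

```python
import math
from typing import List


def n_min_table(h_max: int) -> List[int]:
    """n[h] = minimum number of nodes in an AVL tree of height h, for 1 <= h <= h_max.
    Index 0 is a placeholder so that n[h] lines up with the handout's n(h)."""
    n = [0, 1, 2]
    for h in range(3, h_max + 1):
        n.append(1 + n[h - 1] + n[h - 2])
    return n


n = n_min_table(40)
for h in range(2, 41, 2):            # the derivation above takes h even
    assert n[h] >= 2 ** (h // 2)     # n(h) >= 2^(h/2)
    assert h <= 2 * math.log2(n[h])  # hence h <= 2 log2 of the node count
print(n[10], 2 ** 5)                 # 143 32
```

Since any AVL tree of height h has at least n(h) nodes, the same inequality holds for every AVL tree, not only for the smallest ones.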
Let us solve this recurrence in a slightly better way. This is also an exercise to show how recurrences are solved. Our analysis above was fairly crude: we replaced n(h − 1) with n(h − 2) and then unrolled the inequality. Now we will show how to get a sharper bound on the height of an AVL tree. The bound we obtained earlier is 2 log₂ n; let us see if we can get something better. We are going to use induction and do a tighter analysis of the same recurrence. We prove by induction that the minimum number of nodes in an AVL tree of height h satisfies $n(h) \ge c^{h-1}$, where c is some number greater than 1. Earlier we showed that $n(h) \ge 2^{h/2} = (\sqrt{2})^h$, so the value of c in that case was $\sqrt{2}$. Let us see whether we can get a larger c, that is, $c > \sqrt{2}$. We will figure out what c is later. So we are proving a statement without knowing exactly what the statement is, because we have not said what c is. But you will see what c has to be for the statement to be true.
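Base Case: For h = 1, n(1) = 1 = $c^0$, so the claim holds for any c > 1. Since the recurrence refers to the two previous values, we also check h = 2: n(2) = 2 ≥ $c^1$ holds for any c ≤ 2, which will be true of the c we choose.
Induction Hypothesis: Suppose the claim is true for all h < k, i.e. $n(h) \ge c^{h-1}$ for h < k.
Induction Step: For k ≥ 3, we have to show that $n(k) \ge c^{k-1}$.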
Let us recall our recurrence relation, n(k) = n(k − 1) + n(k − 2) + 1. The induction hypothesis says that n(k − 1) is at least $c^{k-2}$ and n(k − 2) is at least $c^{k-3}$.
So we have $n(k) \ge c^{k-2} + c^{k-3} + 1$. If we ignore the +1, we have

\[
n(k) \ge c^{k-2} + c^{k-3}.
\]

Now we will be able to show that $n(k) \ge c^{k-1}$ if we can show $c^{k-2} + c^{k-3} \ge c^{k-1}$. Dividing both sides by $c^{k-3}$, we get $c + 1 \ge c^2$, that is,

\[
c^2 - c - 1 \le 0.
\]
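So c must satisfy the above quadratic inequality. The quadratic equation $c^2 - c - 1 = 0$ has the roots $\frac{1-\sqrt{5}}{2}$ and $\frac{1+\sqrt{5}}{2}$, and $c^2 - c - 1 \le 0$ for every c between these two roots. But we want as large a c as possible, so we take $c = \frac{1+\sqrt{5}}{2}$, which is roughly 1.618. This quantity is also known as the golden ratio. Hence $n(h) \ge \left(\frac{1+\sqrt{5}}{2}\right)^{h-1}$, so an AVL tree with n nodes has height at most $1 + \log_{(1+\sqrt{5})/2} n \approx 1 + 1.44\log_2 n$, a sharper bound than the earlier 2 log₂ n.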
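The induction claim can also be checked numerically. The Python sketch below (illustrative only; `n_min_table` and `PHI` are hypothetical names) evaluates recurrence (1) up to height 60, confirms $n(h) \ge c^{h-1}$ for $c = \frac{1+\sqrt{5}}{2}$, shows that a slightly larger constant such as 1.7 eventually fails, and computes the factor 1.44 in the resulting height bound.

```python
import math
from typing import List

PHI = (1 + math.sqrt(5)) / 2  # golden ratio: the larger root of c^2 - c - 1 = 0


def n_min_table(h_max: int) -> List[int]:
    """n[h] from recurrence (1) for 1 <= h <= h_max (index 0 unused)."""
    n = [0, 1, 2]
    for h in range(3, h_max + 1):
        n.append(1 + n[h - 1] + n[h - 2])
    return n


n = n_min_table(60)

# The induction claim with c = PHI holds at every height tested.
assert all(n[h] >= PHI ** (h - 1) for h in range(1, 61))

# A larger constant, e.g. 1.7, is eventually too large.
print(any(n[h] < 1.7 ** (h - 1) for h in range(1, 61)))  # True

# Resulting bound: h <= 1 + log_phi(n) = 1 + log2(n) / log2(PHI).
print(round(1 / math.log2(PHI), 2))  # 1.44
```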
Let us now look at the structure of an AVL tree in more detail. Once again
we have an AVL tree of n nodes. Let us take the leaf of this tree which is closest
to the root, which means the level number of this leaf node is the smallest among
all the leaves. Suppose this leaf is at level k. We can show that the height of
the tree is at most 2k − 1. This is rephrased in the following proposition.
Proposition 2. If the level of the closest leaf in an AVL tree is k (assuming the level of the root to be 1), then the height of the AVL tree is at most 2k − 1.
Let us now prove the above proposition. We have an AVL tree with n nodes in it, although the number of nodes is not going to be particularly important. Consider a tree as shown in Fig. 3. Suppose the encircled node in the figure is the leaf closest to the root, and that it is at level k. There could be other leaves at this level, or below it. We are going to prove that the height of this tree is at most 2k − 1. Let us draw the path from the root to the closest leaf $n_k$, as shown in Fig. 4. The height of the node $n_k$ is 1. Some subtrees hang off the nodes $n_{k-1}$, $n_{k-2}$ and so on, as shown. Now, what is the height of the node $n_{k-1}$ at level k − 1? The node $n_{k-1}$ has a left subtree of height one, as the left subtree is just the single node $n_k$. The right subtree of $n_{k-1}$ can have height 0, 1 or 2 while maintaining the height balance property at $n_{k-1}$. Since we want the largest possible height, we take the height of the right subtree of $n_{k-1}$ to be 2, so the height of $n_{k-1}$ becomes 3. Next, what is the height of the node $n_{k-2}$? Its left subtree has height 3, so its right subtree could have height 2, 3 or 4; again, since we want the largest possible height, we take it to be 4. As a result, the height of $n_{k-2}$ becomes 5. Similarly, the height of the subtree rooted at $n_{k-3}$ can be at most 7, as shown in the figure. In general, each step up the path adds at most 2 to the height, because the subtree hanging off a path node can be at most one taller than the part of the path below it. Proceeding this way towards the root, the heights of the nodes are at most 9, 11 and so on. What is the height of the root? The root is k − 1 steps above $n_k$, so given that the closest leaf is at level k, the height is at most 1 + 2(k − 1) = 2k − 1. If k = 2 the height is at most 3, if k = 3 it is at most 5, if k = 4 it is at most 7, and so on. This very simple argument shows that the entire tree can be no taller than 2k − 1, given that the closest leaf is at level k. Note that this is a property of AVL trees and not of arbitrary binary trees. In an arbitrary binary tree we can have leaves at any level, and the height of the tree can be as bad as we want. But in an AVL tree, if the closest leaf is at level k, the height of the tree cannot be more than 2k − 1.
Figure 3: Some tree (root at level 1, closest leaf at level k, height ≤ 2k − 1)
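The construction used in this argument can be made concrete. The Python sketch below (not from the handout; all names are hypothetical) builds, for each k, a tree shape in which the subtrees hanging off the path to the closest leaf have heights 2, 4, 6, ..., as in Fig. 4. It checks that the shape is height balanced, that its closest leaf is at level k, and that its height is exactly 2k − 1. Shapes are nested `(left, right)` tuples with `None` for an absent subtree; keys are omitted because they do not affect heights.

```python
LEAF = (None, None)


def height(t) -> int:
    """Absent subtree = 0, leaf = 1, as in this handout."""
    return 0 if t is None else 1 + max(height(t[0]), height(t[1]))


def is_balanced(t) -> bool:
    """Children of every node differ in height by at most 1."""
    if t is None:
        return True
    left, right = t
    return abs(height(left) - height(right)) <= 1 and is_balanced(left) and is_balanced(right)


def closest_leaf_level(t, level: int = 1) -> int:
    """Level of the leaf nearest the root (root is level 1); t must be non-empty."""
    if t == LEAF:
        return level
    return min(closest_leaf_level(c, level + 1) for c in t if c is not None)


def tallest(k: int):
    """The left spine ends at the closest leaf (level k). Each spine node's right
    subtree is one level taller than its left (spine) subtree, giving hanging
    subtrees of heights 2, 4, 6, ... as in Fig. 4."""
    if k == 1:
        return LEAF
    below = tallest(k - 1)          # height 2k - 3, closest leaf k - 2 levels below it
    return (below, (below, below))  # right subtree: height 2k - 2, all leaves below level k


for k in range(1, 7):
    t = tallest(k)
    assert is_balanced(t)
    assert closest_leaf_level(t) == k
    assert height(t) == 2 * k - 1
print([height(tallest(k)) for k in range(1, 7)])  # [1, 3, 5, 7, 9, 11]
```

Since these shapes reach height exactly 2k − 1, the bound in Proposition 2 cannot be improved.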
We have argued that if the closest leaf is at the level k then the height of
the tree is no more than 2k − 1. That is the largest possible height the tree can
have. Let us make another claim.
Proposition 3. If the closest leaf is at level k then all nodes at level 1 through
k − 2 have 2 children.
Every node on these first k − 2 levels must have 2 children. Let us prove this by contradiction. Take some node u at level k − 2 which has only 1 child, v, as in Fig. 5.
So the node v is at level k − 1, and it cannot be a leaf because our closest leaf is at level k. So v has to have a child. Fig. 5 shows only 1 child of v, but it could also have 2 children. Either way, the subtree rooted at v has height at least 2, because v has at least 1 child (it cannot be a leaf). Now, since u has only one child, one of its subtrees (the one rooted at v) has height at least 2 while the other has height 0. So u is not height balanced, which is not possible in an AVL tree. We have argued this for a node at level k − 2, but the same argument applies to any node at levels 1 through k − 2. Note, however, that a node at level k − 1 can have just 1 child, and that child must be a leaf; otherwise, once again, the height balance property would be violated.

Figure 4: Some tree (the path from the closest leaf $n_k$ at level k up through $n_{k-1}$, $n_{k-2}$, $n_{k-3}$; the path nodes have heights 1, 3, 5, 7, then 9, and the subtrees hanging off the path have heights 2, 4, 6)
Figure 5: Tree levels (node u at level k − 2 with its single child v at level k − 1)

This proof gives an important structural property of AVL trees: levels 1 through k − 1 are completely full, i.e. they have as many nodes as possible. These levels hold $1 + 2 + \cdots + 2^{k-2} = 2^{k-1} - 1$ nodes, and level k holds at least one more node (the closest leaf), so the tree has at least $2^{k-1}$ nodes. Recall also that the height of the tree is at most 2k − 1, so it has fewer than $2^{2k-1}$ nodes. So if we denote the number of nodes in such an AVL tree by n, we have

\[
2^{k-1} \le n < 2^{2k-1}.
\]
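Since we have been using h for the height, let us express this in terms of h. Because h ≤ 2k − 1, we have $k - 1 \ge \frac{h-1}{2}$, and a binary tree of height h has fewer than $2^h$ nodes, so the inequality becomes

\[
2^{\frac{h-1}{2}} \le n < 2^{h}.
\]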
So we have proved that an AVL tree of height h has at least $2^{\frac{h-1}{2}}$ nodes. The important point here is that this number is exponential in h: it is at least a constant times $c^h$, with $c = \sqrt{2}$. That is what gives us the logarithmic height property: taking $\log_2$ of both sides, an AVL tree with n nodes has height $h \le 2\log_2 n + 1$, i.e. its height is O(log₂ n).
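Both propositions and the node-count bounds can be spot-checked by brute force for small heights. The Python sketch below (an illustration, not from the handout; all names are hypothetical) enumerates every height-balanced tree shape of height 1 to 4, ignoring keys since they do not affect the shape. For each shape it asserts Proposition 2, the fullness of levels 1 through k − 1, and $2^{(h-1)/2} \le n < 2^h$. This checks small cases only; it is not a proof.

```python
from functools import lru_cache
from itertools import product

LEAF = (None, None)


@lru_cache(maxsize=None)
def avl_shapes(h: int) -> tuple:
    """Every height-balanced shape of height exactly h (empty = 0, leaf = 1).
    A shape is a (left, right) tuple; None is an absent subtree."""
    if h == 0:
        return (None,)
    if h == 1:
        return (LEAF,)
    shapes = []
    for hl, hr in ((h - 1, h - 1), (h - 1, h - 2), (h - 2, h - 1)):
        shapes.extend(product(avl_shapes(hl), avl_shapes(hr)))
    return tuple(shapes)


def level_sizes(t) -> list:
    """Number of nodes on each level, starting with the root's level 1."""
    sizes, frontier = [], [t]
    while frontier:
        sizes.append(len(frontier))
        frontier = [c for v in frontier for c in v if c is not None]
    return sizes


def closest_leaf_level(t, level: int = 1) -> int:
    """Level of the leaf nearest the root; t must be non-empty."""
    if t == LEAF:
        return level
    return min(closest_leaf_level(c, level + 1) for c in t if c is not None)


for h in range(1, 5):
    for t in avl_shapes(h):
        k = closest_leaf_level(t)
        sizes = level_sizes(t)
        n = sum(sizes)
        assert h <= 2 * k - 1                                 # Proposition 2
        assert all(sizes[i] == 2 ** i for i in range(k - 1))  # levels 1..k-1 are full
        assert 2 ** ((h - 1) / 2) <= n < 2 ** h               # node-count bounds
print([len(avl_shapes(h)) for h in range(1, 5)])  # [1, 3, 15, 315]
```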