CS202 Lecture Notes - AVL Trees

AVL Trees are binary search trees that are balanced. This is a very nice property because it guarantees that finds, insertions and deletions are all executed in logarithmic time. That means that if there are n items in the tree, the operations take roughly log2(n) time. That's as good as you can do with a structure that works by comparing keys, by the way. To help motivate the need for balancing, think about what happens if you insert into a regular binary search tree, and the input is already sorted. The tree becomes a gigantic line, and if the size of the tree is n elements, then each insertion takes up to n operations. Performing all n insertions takes roughly n^2 operations. For example, go into the lecture note directory for binary search trees, and take a look at the file txt/input.txt:

UNIX> wc txt/input.txt       # This is in the Trees lecture note directory
   50000  200000 2397902 txt/input.txt
UNIX> head txt/input.txt
INSERT Brooke-Footwork 443-90-4990 898-934-4865
INSERT Sophia-Allison-Bromfield 510-30-7699 873-553-7759
INSERT Grace-Barnabas 948-49-5092 562-672-8825
INSERT Cole-Illogic 225-22-1798 976-177-7104
INSERT Elizabeth-Green 451-59-3245 106-637-5581
INSERT Dylan-Bambi 183-22-7881 033-896-1807
INSERT Anna-Hitch 284-19-1258 072-144-3834
INSERT Michael-Bilayer 741-13-7226 327-981-7902
INSERT Gavin-Harriman 831-80-7194 488-419-0189
INSERT Charlie-Iii 998-93-7448 930-447-4165
UNIX> time cat txt/input.txt | bin/bstree_tester -

real	0m0.435s
user	0m0.424s
sys	0m0.007s
UNIX> time sort txt/input.txt | head -n 1000 | bin/bstree_tester -

real	0m0.512s
user	0m0.506s
sys	0m0.010s
UNIX> time sort txt/input.txt | head -n 2000 | bin/bstree_tester -

real	0m0.633s
user	0m0.629s
sys	0m0.010s
UNIX> time sort txt/input.txt | head -n 4000 | bin/bstree_tester -

real	0m1.273s
user	0m1.269s
sys	0m0.010s
UNIX> time sort txt/input.txt | head -n 8000 | bin/bstree_tester -

real	0m3.747s
user	0m3.741s
sys	0m0.015s
UNIX> time sort txt/input.txt | bin/bstree_tester -

real	2m7.399s
user	2m7.181s
sys	0m0.140s
UNIX> 
That's a big problem with binary search trees. AVL trees (and other balanced trees like Splay trees, Red-Black trees, B-trees, 2-3 trees, etc.) make sure that their trees are balanced so that the various operations are much faster. For example, the program avltree_tester is my solution to the AVL Tree lab (which some semesters will not have the pleasure of implementing):
UNIX> time sort txt/input.txt > /dev/null

real	0m0.436s
user	0m0.428s
sys	0m0.007s
UNIX> time cat txt/input.txt | bin/avltree_tester -

real	0m0.296s
user	0m0.291s
sys	0m0.007s
UNIX> time sort txt/input.txt | bin/avltree_tester -

real	0m0.698s
user	0m0.721s
sys	0m0.012s
UNIX> 
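As you can see, sorting alone takes about 0.44 seconds, and inserting the unsorted input into the AVL tree takes about 0.30 seconds. Sorting and then inserting takes about 0.70 seconds, which is roughly the sum of the two. In other words, with the AVL tree, insertion takes the same time whether or not the input is sorted.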

Rotations

A central operation with AVL Trees is a rotation. It is a way of changing a binary search tree so that it remains a binary search tree, but changes how it is balanced. The concept is illustrated below:

B and D are nodes in a binary search tree. They can occur anywhere in a tree, but we don't worry about what's above them, just what's below them. A, C and E are subtrees that are rooted as the children of B and D. They may be empty. If they are not empty, then since the tree is a binary search tree, we know that every key in A is less than B, every key in C is greater than B and less than D, and every key in E is greater than D.

When we perform a rotation, we perform it about a node. For example, the rotation pictured above rotates about node D to turn the tree on the left to the tree on the right. It also shows that you can turn the tree on the right to the tree on the left by rotating about node B.

When you rotate about a node, you are going to change the tree so that the node's parent is now the node's child. The middle subtree (subtree C) will change from being one node's child to being the other node's child. The rotation does not violate any of the properties of binary search trees. However, it changes the shape of the tree, and several kinds of trees, such as AVL, Splay and Red-Black trees, use rotations to keep themselves balanced.
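The pointer manipulation in a rotation can be sketched in C++. This is a minimal sketch, not the lab's interface: it assumes a node struct with integer keys and parent pointers, and the names Node and rotate are illustrative. Rotating about n makes n's parent its child, and the middle subtree moves across from n to the old parent.

```cpp
#include <cassert>

// Minimal BST node with a parent pointer (illustrative, not the lab's class).
struct Node {
  int key;
  Node *left = nullptr, *right = nullptr, *parent = nullptr;
  explicit Node(int k) : key(k) {}
};

// Rotate about n: n's parent p becomes n's child, and n takes p's place.
// The middle subtree (C in the picture) moves from n to p.
// Returns n, the new root of this part of the tree.
Node *rotate(Node *n) {
  Node *p = n->parent;
  assert(p != nullptr);                 // you cannot rotate about the root
  Node *g = p->parent;                  // null if p was the root

  if (p->left == n) {                   // n is a left child: middle subtree is n->right
    p->left = n->right;
    if (n->right) n->right->parent = p;
    n->right = p;
  } else {                              // n is a right child: middle subtree is n->left
    p->right = n->left;
    if (n->left) n->left->parent = p;
    n->left = p;
  }
  p->parent = n;

  n->parent = g;                        // hook n in where p used to be
  if (g) {
    if (g->left == p) g->left = n;
    else g->right = n;
  }
  return n;
}
```

With keys 1 through 5 standing in for A through E, calling rotate on D in the left tree produces the right tree, and calling rotate on B in the result takes you back.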

Below are some example rotations. Make sure you understand all of them:


AVL Trees

An AVL Tree is a binary search tree that has conditions on the height of each node. The height is defined to be the number of nodes in the longest path from that node to a leaf node. Thus, in the left tree in the last diagram, Eunice's height is four and Binky's height is three. In the right tree, Eunice's height is still four, but Binky's height is now two. Leaf nodes have a height of one. Since each node roots a subtree, we say that the height of a subtree is the height of its root. An empty tree has a height of 0.

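The definition of an AVL tree is as follows: an AVL tree is a binary search tree in which, for every node, the heights of the node's two children differ by at most one. An empty child counts as having a height of 0.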

That's a pretty simple definition. However, the constraint on the heights of each node's children is what makes the trees balanced, and makes insertion, finding, and deletion O(log n).
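Checking the definition is a good way to make it concrete. The C++ sketch below assumes integer keys with no duplicates and a bare node struct; avl_height and is_avl are hypothetical helper names. It computes heights bottom-up with the conventions above (empty tree = 0, leaf = 1) and checks both the binary search tree ordering and the balance condition at every node.

```cpp
#include <cassert>
#include <climits>
#include <cstdlib>

struct Node {
  int key;
  Node *left = nullptr, *right = nullptr;
  explicit Node(int k) : key(k) {}
};

// Returns the height of the subtree rooted at n (empty = 0, leaf = 1), or -1
// if that subtree is not an AVL tree whose keys all lie strictly inside (lo, hi).
int avl_height(const Node *n, long long lo, long long hi) {
  if (n == nullptr) return 0;
  if (n->key <= lo || n->key >= hi) return -1;     // binary search tree order violated
  int hl = avl_height(n->left, lo, n->key);
  int hr = avl_height(n->right, n->key, hi);
  if (hl < 0 || hr < 0) return -1;                 // a subtree already failed
  if (std::abs(hl - hr) > 1) return -1;            // children's heights differ by more than one
  return 1 + (hl > hr ? hl : hr);
}

bool is_avl(const Node *root) {
  return avl_height(root, LLONG_MIN, LLONG_MAX) >= 0;
}
```

A right-leaning chain of three nodes fails the check at its root, whose left child has height 0 and whose right child has height 2.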

Below are some AVL trees:

And below are two trees that are binary search trees, but are not AVL trees.


Binky violates the definition

Fred violates the definition


Insertion into AVL Trees

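To implement AVL trees, you need to maintain the height of each node. You insert into an AVL tree by performing a standard binary search tree insertion. When you're done, you check each node on the path from the new node to the root. The checking goes as follows: if the node is imbalanced, you rebalance it (described below) and stop. Otherwise, you recalculate the node's height from its children's heights. If the height hasn't changed, you stop, because none of its ancestors' heights will change either. If it has changed, you move on to the node's parent.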
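Here is a C++ sketch of just the insertion and the walk toward the root, with rebalancing deliberately left out so that the checking is easy to see. It assumes integer keys, a height field in each node, and parent pointers; insert_and_check is a hypothetical name, and sending duplicate keys to the right is an assumption. It returns the first imbalanced node it finds, which is the node you would rebalance.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdlib>

struct Node {
  int key, height = 1;                        // a new node is a leaf: height 1
  Node *left = nullptr, *right = nullptr, *parent = nullptr;
  explicit Node(int k) : key(k) {}
};

int h(const Node *n) { return n ? n->height : 0; }   // empty tree: height 0

// Standard BST insertion followed by the walk toward the root.
// Returns the first imbalanced node on the path, or nullptr if there is none.
Node *insert_and_check(Node *&root, int key) {
  Node *n = new Node(key);
  if (!root) { root = n; return nullptr; }
  Node *p = root;
  for (;;) {
    Node *&next = key < p->key ? p->left : p->right;  // duplicates go right (assumption)
    if (!next) { next = n; n->parent = p; break; }
    p = next;
  }
  for (Node *a = p; a; a = a->parent) {
    if (std::abs(h(a->left) - h(a->right)) > 1) return a;  // this node must be rebalanced
    int updated = 1 + std::max(h(a->left), h(a->right));
    if (updated == a->height) break;                        // ancestors unchanged: stop
    a->height = updated;
  }
  return nullptr;
}
```

Inserting 2, 1, 3 and 4 leaves the tree balanced, but inserting 5 next makes node 3 imbalanced (left height 0, right height 2), just as inserting Waluigi makes Fred imbalanced in the example that follows.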

Let's try some examples. Suppose I have the following AVL tree -- I now annotate the nodes with their heights:

If I insert Ahmad, take a look at the resulting tree:

The new node Ahmad has a height of one, and when I travel the path up to the root, I change Baby Daisy's height to two. However, her node is not imbalanced, since the heights of her subtrees are 1 and 0. Moving on, Binky's height is unchanged, so we can stop: the resulting tree is indeed an AVL tree.

However, suppose I now try to insert Waluigi. I get the following tree:

Traveling from the new node to the root, I see that Fred violates the balance condition. Its left child is an empty tree, and as such has a height of 0. Its right child has a height of 2. I have to rebalance the tree.


Rebalancing

When you recognize that you have an imbalanced node, it will look like one of the two pictures below. This is without exception:

Up to two of the three subtrees A, C and E may be empty in this picture, but they cannot all be empty. You'll note that the two pictures are mirror images of one another.

Zig-Zig

Now, each of these may be further broken up into two cases, which we call "Zig-Zig" and "Zig-Zag". Let's concentrate on Zig-Zig, because it is simpler. Here is what it looks like in its two mirror images:

You'll note that the defining feature is that the imbalance either goes from the root to its right child and then to that child's right child (in the left picture), or from the root to its left child and then to that child's left child (in the right picture). That's why it's called "Zig-Zig".

To fix the Zig-Zig imbalance, you rotate about the child. That fixes the imbalance in each case, and it also decreases the height of the imbalanced subtree. In the pictures below, double-check all of the nodes to make sure that they meet their balance conditions:


This is the left tree above, rotated about node D.

This is the right tree above, rotated about node B.
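Once nodes store their heights, a rotation has to recompute the heights of the two nodes whose subtrees changed. The C++ sketch below assumes integer keys, a height field and parent pointers; rotate, fix_height and fix_zig_zig are hypothetical names, and fix_zig_zig assumes that it is only called on a node with a Zig-Zig imbalance. The fix is one rotation about the child on the taller side.

```cpp
#include <algorithm>
#include <cassert>

struct Node {
  int key, height = 1;
  Node *left = nullptr, *right = nullptr, *parent = nullptr;
  explicit Node(int k) : key(k) {}
};

int h(const Node *n) { return n ? n->height : 0; }
void fix_height(Node *n) { n->height = 1 + std::max(h(n->left), h(n->right)); }

// Rotate about n (its parent becomes its child) and recompute both heights.
Node *rotate(Node *n) {
  Node *p = n->parent, *g = p->parent;
  if (p->left == n) {
    p->left = n->right;
    if (n->right) n->right->parent = p;
    n->right = p;
  } else {
    p->right = n->left;
    if (n->left) n->left->parent = p;
    n->left = p;
  }
  p->parent = n;
  n->parent = g;
  if (g) { if (g->left == p) g->left = n; else g->right = n; }
  fix_height(p);                 // p is now n's child, so fix it first
  fix_height(n);
  return n;
}

// Zig-Zig fix: x is imbalanced, and the imbalance runs right-right or left-left.
// One rotation about the child on the taller side repairs it.
Node *fix_zig_zig(Node *x) {
  Node *child = h(x->left) > h(x->right) ? x->left : x->right;
  return rotate(child);
}
```

On the right-leaning chain 1, 2, 3, the root has height 3 and is imbalanced; after rotating about 2, node 2 is the root with height 2 and both children are leaves.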

It's now a good time to do some examples where we insert a node into an AVL tree, it becomes imbalanced due to a Zig-Zig imbalance, and we then fix it with a rotation. What I'm going to do in each picture below is show the tree and state what node we're inserting. Then I'll draw the resulting tree, which is imbalanced, and I'll shade the A, C and E subtrees. I'll then show the balanced tree that results when you perform the rotation:

We insert "Ralph"

"Khloe" is imbalanced.
We rotate about "Luther"

It's an AVL Tree again!

We insert "Becca"

"Eunice" is imbalanced.
We rotate about "Cal"

It's an AVL Tree again!

We insert "Zelda"
Sometimes this example is confusing.

"Henry" is imbalanced.
We rotate about "Mike"

It's an AVL Tree again!


One important thing to note is that after the rebalancing, the entire AVL tree is balanced: the height of the rebalanced subtree is the same after rebalancing as it was before the node was inserted, so no node above it changes height.

Zig-Zag

The "Zig-Zag" imbalance happens when the imbalance goes right, then left, or left, then right. Here's what it looks like in its two mirror images:

To explain how to fix this, we need to blow up the C tree in the picture above, relabeling the nodes and subtrees so that they make sense:

To rebalance the Zig-Zag case, we need to rotate twice about the grandchild. In each of these pictures, the grandchild is D. In the pictures below, we rotate once about D, but the tree is not balanced yet:

One rotation about D:
The tree is still imbalanced.

One rotation about D:
The tree is still imbalanced.

We perform one more rotation about D, and now the tree is balanced. In fact, in both cases, the resulting trees are identical!

After the second rotation about D:
The tree is balanced!

After the second rotation about D:
The tree is balanced!
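The Zig-Zag fix is two rotations about the same grandchild. This C++ sketch makes the same assumptions as before (integer keys, a height field, parent pointers); rotate, fix_height and fix_zig_zag are hypothetical names, and fix_zig_zag assumes it is only called on a node with a Zig-Zag imbalance. The first rotation moves the grandchild into the child's place; the second moves it into the imbalanced node's place.

```cpp
#include <algorithm>
#include <cassert>

struct Node {
  int key, height = 1;
  Node *left = nullptr, *right = nullptr, *parent = nullptr;
  explicit Node(int k) : key(k) {}
};

int h(const Node *n) { return n ? n->height : 0; }
void fix_height(Node *n) { n->height = 1 + std::max(h(n->left), h(n->right)); }

// Rotate about n (its parent becomes its child) and recompute both heights.
Node *rotate(Node *n) {
  Node *p = n->parent, *g = p->parent;
  if (p->left == n) {
    p->left = n->right;
    if (n->right) n->right->parent = p;
    n->right = p;
  } else {
    p->right = n->left;
    if (n->left) n->left->parent = p;
    n->left = p;
  }
  p->parent = n;
  n->parent = g;
  if (g) { if (g->left == p) g->left = n; else g->right = n; }
  fix_height(p);
  fix_height(n);
  return n;
}

// Zig-Zag fix: the imbalance goes right-then-left or left-then-right.
// Rotate twice about the grandchild, which ends up as the subtree's root.
Node *fix_zig_zag(Node *x) {
  Node *child = h(x->left) > h(x->right) ? x->left : x->right;
  Node *grand = (child == x->right) ? child->left : child->right;
  rotate(grand);                 // grandchild replaces the child
  return rotate(grand);          // grandchild replaces x
}
```

For the tree with 3 at the root, 1 as its left child and 2 as the right child of 1, both rotations about 2 leave 2 at the root with children 1 and 3.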

Let's do some examples:

We insert "Don".

"Eve" is imbalanced.
We rotate twice about "Cal".

It's an AVL Tree again!

We insert "Eve".

"Kim" is imbalanced.
We rotate twice about "Ian".

It's an AVL Tree again!

We insert "Ginger".

"Brad" is imbalanced.
We rotate twice about "Ginger".

It's an AVL Tree again!

As with Zig-Zig, after rebalancing, the height of the rebalanced subtree is the same as it was before the node was inserted.

Summary of Insertion
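Putting the pieces together, here is a C++ sketch of the whole insertion, assuming integer keys, a height field and parent pointers. All of the names (Node, h, fix_height, rotate, rebalance, insert) are hypothetical, and ignoring duplicate keys is an assumption; this is a sketch of the procedure in these notes, not the lab's solution. It does a standard binary search tree insertion, walks from the new node's parent toward the root, and stops either after the first rebalance or at the first node whose height does not change.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdlib>

struct Node {
  int key, height = 1;
  Node *left = nullptr, *right = nullptr, *parent = nullptr;
  explicit Node(int k) : key(k) {}
};

int h(const Node *n) { return n ? n->height : 0; }
void fix_height(Node *n) { n->height = 1 + std::max(h(n->left), h(n->right)); }

Node *rotate(Node *n) {                          // n's parent becomes n's child
  Node *p = n->parent, *g = p->parent;
  if (p->left == n) {
    p->left = n->right;
    if (n->right) n->right->parent = p;
    n->right = p;
  } else {
    p->right = n->left;
    if (n->left) n->left->parent = p;
    n->left = p;
  }
  p->parent = n;
  n->parent = g;
  if (g) { if (g->left == p) g->left = n; else g->right = n; }
  fix_height(p);
  fix_height(n);
  return n;
}

// Fixes an imbalanced node x; returns the new root of x's subtree.
Node *rebalance(Node *x) {
  bool left_heavy = h(x->left) > h(x->right);
  Node *c = left_heavy ? x->left : x->right;
  Node *outer = left_heavy ? c->left : c->right;
  Node *inner = left_heavy ? c->right : c->left;
  if (h(outer) >= h(inner)) return rotate(c);    // Zig-Zig: rotate about the child
  rotate(inner);                                 // Zig-Zag: rotate twice about the grandchild
  return rotate(inner);
}

// Inserts key into the tree rooted at root; returns the (possibly new) root.
Node *insert(Node *root, int key) {
  if (!root) return new Node(key);
  Node *p = root;
  for (;;) {                                     // standard BST insertion
    if (key == p->key) return root;              // assumption: ignore duplicate keys
    Node *&next = key < p->key ? p->left : p->right;
    if (!next) { next = new Node(key); next->parent = p; break; }
    p = next;
  }
  for (Node *a = p; a; a = a->parent) {          // walk from the parent to the root
    if (std::abs(h(a->left) - h(a->right)) > 1) {
      Node *top = rebalance(a);                  // subtree height is restored: stop
      return top->parent ? root : top;
    }
    int old = a->height;
    fix_height(a);
    if (a->height == old) break;                 // heights above won't change
  }
  return root;
}
```

Inserting the keys 1 through 1000 in sorted order, which turns a plain binary search tree into a line of 1000 nodes, gives a tree of height at most 14 here, since an AVL tree with 1000 nodes cannot be taller than that.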


Deletion

When you delete a node, you need to start at the parent of the node that was removed from the tree, and travel up to the root. For each node that you encounter, you check whether it is imbalanced, rebalance it if it is, and update its height.

With deletion, you can get the following imbalanced trees (and of course their mirror images). The first and third ones are the same as with insertion. The middle one can only occur with deletion.

Zig-Zig -- same as above.

This case only occurs with deletion. We treat it as a Zig-Zig.

Zig-Zag -- same as above.
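Which fix to use can be decided from heights alone: look at the taller child of the imbalanced node, and compare that child's "outer" subtree (same direction as the child) with its "inner" subtree (the other direction). The C++ sketch below assumes nodes that store their heights; Imbalance and classify are hypothetical names. When the two subtrees have equal heights, which only happens with deletion, it reports a Zig-Zig, as described above.

```cpp
#include <cassert>

struct Node {
  int key, height = 1;
  Node *left = nullptr, *right = nullptr;
  explicit Node(int k) : key(k) {}
};

int h(const Node *n) { return n ? n->height : 0; }

enum class Imbalance { ZigZig, ZigZag };

// x must be imbalanced. Compare the taller child's outer and inner subtrees.
Imbalance classify(const Node *x) {
  bool left_heavy = h(x->left) > h(x->right);
  const Node *c = left_heavy ? x->left : x->right;
  int outer = h(left_heavy ? c->left : c->right);
  int inner = h(left_heavy ? c->right : c->left);
  // outer == inner is the deletion-only case; it is handled as a Zig-Zig.
  return outer >= inner ? Imbalance::ZigZig : Imbalance::ZigZag;
}
```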

Unlike insertion, you can't stop after rebalancing. Instead, you need to keep traveling toward the root. You only stop when you reach the root, or when a node's height doesn't change (because then the heights of its ancestors won't change either).
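Here is a C++ sketch of deletion, assuming integer keys, a height field and parent pointers; erase and the helpers are hypothetical names. When the node has two children, it finds the largest node in the left subtree, as in the examples below. As a simplification, it copies that node's key up instead of relinking nodes, then physically removes the node it copied from. The walk toward the root keeps going after a rebalance and stops only at the root or at a subtree whose height didn't change.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdlib>

struct Node {
  int key, height = 1;
  Node *left = nullptr, *right = nullptr, *parent = nullptr;
  explicit Node(int k) : key(k) {}
};

int h(const Node *n) { return n ? n->height : 0; }
void fix_height(Node *n) { n->height = 1 + std::max(h(n->left), h(n->right)); }

Node *rotate(Node *n) {                          // n's parent becomes n's child
  Node *p = n->parent, *g = p->parent;
  if (p->left == n) {
    p->left = n->right;
    if (n->right) n->right->parent = p;
    n->right = p;
  } else {
    p->right = n->left;
    if (n->left) n->left->parent = p;
    n->left = p;
  }
  p->parent = n;
  n->parent = g;
  if (g) { if (g->left == p) g->left = n; else g->right = n; }
  fix_height(p);
  fix_height(n);
  return n;
}

Node *rebalance(Node *x) {                       // returns the new subtree root
  bool left_heavy = h(x->left) > h(x->right);
  Node *c = left_heavy ? x->left : x->right;
  Node *outer = left_heavy ? c->left : c->right;
  Node *inner = left_heavy ? c->right : c->left;
  if (h(outer) >= h(inner)) return rotate(c);    // Zig-Zig, including the deletion-only tie
  rotate(inner);                                 // Zig-Zag
  return rotate(inner);
}

// Deletes key from the tree rooted at root; returns the (possibly new) root.
Node *erase(Node *root, int key) {
  Node *n = root;
  while (n && n->key != key) n = key < n->key ? n->left : n->right;
  if (!n) return root;                           // key not present
  if (n->left && n->right) {
    Node *m = n->left;                           // largest node in the left subtree
    while (m->right) m = m->right;
    n->key = m->key;                             // simplification: copy the key up
    n = m;                                       // m has at most one child
  }
  Node *child = n->left ? n->left : n->right;
  Node *p = n->parent;
  if (child) child->parent = p;
  if (!p) root = child;
  else if (p->left == n) p->left = child;
  else p->right = child;
  delete n;
  for (Node *a = p; a; a = a->parent) {          // keep going after a rebalance
    int old = a->height;
    if (std::abs(h(a->left) - h(a->right)) > 1) a = rebalance(a);
    else fix_height(a);
    if (!a->parent) root = a;
    if (a->height == old) break;                 // ancestors' heights are unchanged
  }
  return root;
}
```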

Let's look at some examples. As with insertion, I'll show an original tree before deletion, the tree after deletion, but before rebalancing, and the tree after rebalancing.

We delete "Hal".

We check Hal's parent, Ian, and it's balanced and its height is unchanged. We're done.

We delete "Ian".

As we move up to the root, we see that "Cal" is imbalanced. It's a Zig-Zig, so we rotate about "Bob".

It's an AVL Tree again. You'll note we still had to travel up from Bob to the root, and change Kim's height from 5 to 4.

We delete "Nell". To do that, we find the largest node in Nell's left subtree -- Anne. We delete Anne and replace Nell with Anne.

Anne is imbalanced. It's a Zig-Zag, so we rotate twice about "Omar".

It's an AVL Tree again.

We delete "Naomi". To do that, we find the largest node in Naomi's left subtree -- Jet. We delete Jet and replace Naomi with Jet.

Jet is imbalanced. It's a Zig-Zig, so we rotate about "Samson".

That subtree is balanced, but we have to continue going to the root. We find that "Henry" is also imbalanced, and that it's a Zig-Zag. We rotate twice about "Dub".

Now, we're done and the tree is balanced.


Running Times

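The balance condition means that AVL trees' heights are O(log n). Each operation walks a single path between the root and a leaf, doing a constant amount of work (including at most two rotations) at each node on that path. Therefore:

Finding a node is O(log n).
Insertion is O(log n).
Deletion is O(log n).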
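One way to see why the height is O(log n) is to count the fewest nodes an AVL tree of height h can have. With the height conventions from these notes (empty = 0, leaf = 1), the sparsest tree of height h is a root whose children are the sparsest trees of heights h-1 and h-2, so N(0) = 0, N(1) = 1 and N(h) = 1 + N(h-1) + N(h-2). This works out to N(h) = F(h+2) - 1, where F is the Fibonacci sequence, so N(h) grows like 1.618^h and the height is at most about 1.44 log2(n). The C++ sketch below just tabulates N(h); min_avl_nodes is a hypothetical name.

```cpp
#include <cassert>
#include <vector>

// Fewest nodes an AVL tree of height h can have, for h = 0 .. max_h.
// N(0) = 0, N(1) = 1, N(h) = 1 + N(h-1) + N(h-2).
std::vector<long long> min_avl_nodes(int max_h) {
  std::vector<long long> N(max_h + 1, 0);
  if (max_h >= 1) N[1] = 1;
  for (int h = 2; h <= max_h; h++) N[h] = 1 + N[h - 1] + N[h - 2];
  return N;
}
```

The table gives N(22) = 46,367 and N(23) = 75,024, so any AVL tree with at most 50,000 nodes (the number of lines in txt/input.txt) has a height of at most 22. A plain binary search tree built from the same lines in sorted order can be as tall as the number of keys inserted.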