CS140 Lecture Notes - AVL Trees

AVL Trees are binary search trees that are balanced. This is a very nice property because it guarantees that finds, insertions and deletions are all executed in logarithmic time. That means that if there are n items in the tree, the operations take roughly log₂(n) time. That's as good as you can do with a comparison-based search tree, by the way. To help motivate the need for balancing, think about what happens if you insert into a regular binary search tree, and the input is already sorted. The tree becomes a gigantic line. If the tree holds n elements, each insertion has to walk the whole line, which takes n operations. Performing all n insertions therefore takes on the order of n² operations (about n²/2). For example (do this in the binary search tree directory):

UNIX> wc input.txt
   10000   51456  770000 input.txt
UNIX> head input.txt
INSERT Ellie Warlike                              944-867-2246   165-79-8849
INSERT David Bobble                               026-631-5520   826-96-9094
INSERT Isaac Giuliano                             462-055-3150   827-30-6292
INSERT Madison Fiend                              193-149-4333   106-62-2934
INSERT Chloe Skew                                 257-554-8530   481-12-6340
INSERT Julia Postdoctoral                         018-992-9715   512-23-5507
INSERT Connor Teledyne                            808-602-6582   702-11-9340
INSERT Caleb Disciple                             457-440-4397   076-91-9105
INSERT Avery Chloe Panther                        243-649-0973   727-68-6107
INSERT Anna Placenta                              193-082-7570   836-85-9844
UNIX> time sh -c "bstree_test - < input.txt"
0.099u 0.006s 0:00.11 81.8%	0+0k 0+1io 0pf+0w
UNIX> time bstree_test - < input.txt
0.105u 0.004s 0:00.11 90.9%	0+0k 0+0io 0pf+0w
UNIX> time sh -c "sort input.txt | bstree_test -"
1.721u 0.014s 0:01.73 100.0%	0+0k 0+1io 0pf+0w
UNIX> 
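To see where the extra time goes, here is a minimal Python sketch that counts how many existing nodes each plain binary search tree insertion visits. It is an illustrative model, not the bstree_test program, and `bst_insert` and `total_work` are hypothetical names. With sorted keys, the i-th insertion walks past all i nodes already in the line, so the total is 0 + 1 + ... + (n-1) = n(n-1)/2.

```python
import random

class Node:
    def __init__(self, key):
        self.key = key
        self.left = None
        self.right = None

def bst_insert(root, key):
    """Plain (unbalanced) BST insertion.
    Returns (root, number of existing nodes visited)."""
    new = Node(key)
    if root is None:
        return new, 0
    cur, visited = root, 0
    while True:
        visited += 1
        if key < cur.key:
            if cur.left is None:
                cur.left = new
                return root, visited
            cur = cur.left
        else:
            if cur.right is None:
                cur.right = new
                return root, visited
            cur = cur.right

def total_work(keys):
    """Insert keys in the given order; return the total number of node visits."""
    root, total = None, 0
    for k in keys:
        root, visited = bst_insert(root, k)
        total += visited
    return total

n = 1000
sorted_keys = list(range(n))
shuffled_keys = sorted_keys[:]
random.shuffle(shuffled_keys)
print(total_work(sorted_keys))     # 499500, which is n(n-1)/2
print(total_work(shuffled_keys))   # much smaller for a random order
```

Reverse-sorted input is just as bad, because the line simply grows to the left instead of the right.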
That's a big problem with binary search trees: just sorting the input turned a 0.11 second run into a 1.73 second run. AVL trees (and other balanced trees like Splay trees, Red-Black trees, B-trees, 2-3 trees, etc.) make sure that their trees stay balanced, so that the various operations remain fast. For example, the program avltree_test is my solution to the AVL Tree lab (which some semesters will not have the pleasure of implementing):
UNIX> time avltree_test - < input.txt
0.099u 0.003s 0:00.10 90.0%	0+0k 0+0io 0pf+0w
UNIX> time sh -c "sort input.txt | avltree_test -"
0.336u 0.011s 0:00.34 100.0%	0+0k 0+0io 0pf+0w
UNIX> time sh -c "sort input.txt > /dev/null"
0.245u 0.007s 0:00.25 96.0%	0+0k 0+0io 0pf+0w
UNIX> 
As you can see, sorting alone takes about 0.25 seconds. That means the AVL tree spends roughly 0.1 seconds on the insertions, whether or not the input is sorted.

Rotations

A central operation with AVL Trees is a rotation. It is a way of changing a binary search tree so that it remains a binary search tree, but changes how it is balanced. The concept is illustrated below:

B and D are nodes in a binary search tree. They can occur anywhere in a tree, but we don't worry about what's above them -- just what's below them. A, C and E are subtrees rooted as the children of B and D. They may be empty. If they are not empty, then since the tree is a binary search tree, we know that every key in A is less than B, B is less than every key in C, every key in C is less than D, and D is less than every key in E.

When we perform a rotation, we perform it about a node. For example, the rotation pictured above rotates about node D to turn the tree on the left into the tree on the right. It also shows that you can turn the tree on the right into the tree on the left by rotating about node B.

When you rotate about a node, you change the tree so that the node's parent becomes the node's child. The middle subtree (subtree C) changes from being one node's child to being the other node's child. The rotation does not violate any of the properties of binary search trees. However, it changes the shape of the tree. Several kinds of trees, such as AVL, Splay and Red-Black trees, use rotations to keep themselves balanced.
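A rotation is only a few pointer changes. The following Python sketch assumes nodes with left, right and parent pointers, plus a tree object holding the root. The `Node`, `Tree` and `rotate` names are hypothetical, not the lab's interface. `rotate(tree, x)` rotates about x: x moves up into its parent's place, the parent becomes x's child, and the middle subtree switches parents.

The demo builds the left-hand tree of the picture, assuming one-node subtrees for A, C and E. The in-order key sequence is the same before and after each rotation, which is why the result is still a binary search tree.

```python
class Node:
    def __init__(self, key, parent=None):
        self.key = key
        self.parent = parent
        self.left = None
        self.right = None

class Tree:
    def __init__(self):
        self.root = None

def rotate(tree, x):
    """Rotate about node x: x takes its parent's place, the parent becomes
    x's child, and x's middle subtree (C in the picture) switches parents."""
    p = x.parent
    g = p.parent
    if p.left is x:                 # x is a left child
        middle = x.right
        p.left = middle
        x.right = p
    else:                           # x is a right child
        middle = x.left
        p.right = middle
        x.left = p
    if middle is not None:
        middle.parent = p
    p.parent = x
    x.parent = g
    if g is None:                   # p was the root
        tree.root = x
    elif g.left is p:
        g.left = x
    else:
        g.right = x

def inorder(n):
    return [] if n is None else inorder(n.left) + [n.key] + inorder(n.right)

# The left-hand tree: B at the top, D as its right child, and
# one-node subtrees A, C and E.
t = Tree()
B = t.root = Node("B")
A = B.left = Node("A", B)
D = B.right = Node("D", B)
C = D.left = Node("C", D)
E = D.right = Node("E", D)

rotate(t, D)
print(t.root.key, B.right.key, inorder(t.root))   # D C ['A', 'B', 'C', 'D', 'E']
rotate(t, B)
print(t.root.key, inorder(t.root))                # B ['A', 'B', 'C', 'D', 'E']
```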

Below are some example rotations. Make sure you understand all of them:


AVL Trees

An AVL Tree is a binary search tree that has conditions on the height of each node. The height of a node is defined to be the length of the longest path from that node down to a leaf node. For example, in the left tree in the last diagram, Eunice's height is three and Binky's height is two. In the right tree, Eunice's height is still three, but Binky's height is now one. Leaf nodes have a height of zero. Since each node roots a subtree, we say that the height of a subtree is the height of its root. An empty tree has a height of -1.
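The height definition translates directly into a recursive function. This Python sketch (hypothetical names) recomputes heights from scratch, which costs time proportional to the size of the subtree. That is why an AVL implementation stores each node's height in the node and updates it as the tree changes, instead of calling something like this.

```python
class Node:
    def __init__(self, key, left=None, right=None):
        self.key = key
        self.left = left
        self.right = right

def height(node):
    """Length of the longest path from node down to a leaf.
    An empty tree (None) has height -1, so a leaf has height 0."""
    if node is None:
        return -1
    return 1 + max(height(node.left), height(node.right))

leaf = Node("A")
print(height(None), height(leaf))   # -1 0
root = Node("B", Node("A"), Node("D", Node("C"), Node("E")))
print(height(root))                 # 2
```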

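The definition of an AVL tree is as follows: it is a binary search tree in which, for every node, the heights of the node's left and right subtrees differ by at most one.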
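A checker for the AVL condition (for every node, the heights of its two subtrees differ by at most one) can compute heights bottom-up and fail as soon as it sees a violation. This Python sketch has a few stated limits:

- It uses hypothetical names.
- It checks only the height condition.
- It assumes the tree is already a valid binary search tree.

```python
class Node:
    def __init__(self, key, left=None, right=None):
        self.key = key
        self.left = left
        self.right = right

def avl_height(node):
    """Height of node's subtree if every node in it satisfies the AVL
    condition, or None if some node violates it. Empty trees have height -1."""
    if node is None:
        return -1
    hl = avl_height(node.left)
    hr = avl_height(node.right)
    if hl is None or hr is None or abs(hl - hr) > 1:
        return None
    return 1 + max(hl, hr)

def is_avl(root):
    return avl_height(root) is not None

balanced = Node(2, Node(1), Node(3))
# The root's left subtree is empty (height -1) but its right subtree
# has height 1, so the root violates the condition.
lopsided = Node(1, None, Node(2, None, Node(3)))
print(is_avl(balanced), is_avl(lopsided))   # True False
```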

Below are some AVL trees:

And below are two trees that are binary search trees, but are not AVL trees.


Binky violates the definition

Fred violates the definition


Insertion into AVL Trees

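To implement AVL trees, you need to maintain the height of each node. You insert into an AVL tree by performing a standard binary search tree insertion. When you're done, you check each node on the path from the new node to the root. The checking goes as follows:

  1. If the node's height hasn't changed, you can stop checking: the tree is an AVL tree.
  2. If the node's height has changed, but the node is not imbalanced, continue checking with its parent.
  3. If the node is imbalanced, rebalance it as described below. After an insertion, one rebalancing is enough, so you can then stop.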
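Here is one way the insertion procedure could look in code. It is a Python sketch that assumes nodes storing key, height, left, right and parent; all names are hypothetical, not the lab's interface.

- After a standard BST insertion, it walks from the new node's parent toward the root, recomputing heights.
- It stops when a height does not change, or after one rebalancing.
- Rebalancing uses the rotations described next: a single rotation about the child for Zig-Zig, and a double rotation about the grandchild for Zig-Zag.
- `valid` and `inorder` are just sanity checks.

```python
class Node:
    def __init__(self, key, parent=None):
        self.key = key
        self.parent = parent
        self.left = None
        self.right = None
        self.height = 0                      # a new node is a leaf

def h(n):
    return -1 if n is None else n.height     # an empty subtree has height -1

def fix_height(n):
    n.height = 1 + max(h(n.left), h(n.right))

class AVLTree:
    def __init__(self):
        self.root = None

    def rotate(self, x):
        """Rotate about x: x moves up into its parent's place."""
        p, g = x.parent, x.parent.parent
        if p.left is x:
            mid = x.right
            p.left, x.right = mid, p
        else:
            mid = x.left
            p.right, x.left = mid, p
        if mid is not None:
            mid.parent = p
        p.parent, x.parent = x, g
        if g is None:
            self.root = x
        elif g.left is p:
            g.left = x
        else:
            g.right = x
        fix_height(p)                        # p is now below x: fix it first
        fix_height(x)

    def rebalance(self, n):
        """n is imbalanced: apply a Zig-Zig or Zig-Zag rotation."""
        if h(n.left) > h(n.right):
            child, outer, inner = n.left, n.left.left, n.left.right
        else:
            child, outer, inner = n.right, n.right.right, n.right.left
        if h(outer) >= h(inner):
            self.rotate(child)               # Zig-Zig: rotate about the child
        else:
            self.rotate(inner)               # Zig-Zag: double rotation about
            self.rotate(inner)               # the grandchild

    def insert(self, key):
        parent, cur = None, self.root        # standard BST insertion
        while cur is not None:
            parent = cur
            cur = cur.left if key < cur.key else cur.right
        node = Node(key, parent)
        if parent is None:
            self.root = node
            return
        if key < parent.key:
            parent.left = node
        else:
            parent.right = node
        n = parent                           # check the path up to the root
        while n is not None:
            old = n.height
            fix_height(n)
            if abs(h(n.left) - h(n.right)) > 1:
                self.rebalance(n)            # height back to `old`: done
                return
            if n.height == old:              # unchanged: still an AVL tree
                return
            n = n.parent

def valid(n, parent=None):
    """Sanity check of parent links, stored heights and the AVL property."""
    if n is None:
        return True
    return (n.parent is parent
            and n.height == 1 + max(h(n.left), h(n.right))
            and abs(h(n.left) - h(n.right)) <= 1
            and valid(n.left, n) and valid(n.right, n))

def inorder(n):
    return [] if n is None else inorder(n.left) + [n.key] + inorder(n.right)

t = AVLTree()
for k in range(1000):                        # sorted input: bad for a plain BST
    t.insert(k)
print(t.root.height, valid(t.root))
```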

Let's try some examples. Suppose I have the following AVL tree -- I now annotate the nodes with their heights:

If I insert Ahmad, take a look at the resulting tree:

The new node Ahmad has a height of zero, and when I travel the path up to the root, I change Baby Daisy's height to one. However, her node is not imbalanced, since the heights of her subtrees are 0 and -1. Moving on, Binky's height is unchanged, so we can stop -- the resulting tree is indeed an AVL tree.

However, suppose I now try to insert Waluigi. I get the following tree:

Traveling from the new node to the root, I see that Fred violates the balance condition: its left subtree is empty (height -1) and its right subtree has a height of 1. I have to rebalance the tree.


Rebalancing: Zig-Zig

Now, suppose I insert a node into a tree, and I discover that it is imbalanced at a node. There are two ways that this can happen, and they are called (for reasons that will become obvious later) zig-zig and zig-zag. We'll start with the Zig-Zig case. This happens when you have a tree of the following general form:

And you insert a node into tree F, which changes F's height from h-3 to h-2. Let's assume that F is still an AVL tree, so we have this tree, where A, C and F are all AVL trees:

You'll note that the tree is imbalanced, and it is imbalanced at node B. Let's give some concrete examples:

You insert F into the tree:

Make sure you can match this example up with its general form above. In this case, h=2, both before and after.

You insert "Jasper" into the tree:

Make sure you can match this example up with its general form above. In this case, h=4, both before and after.

You insert an element into either E or G, changing its height, but not violating the AVL property:

Make sure you can match this example up with its general form above.

Let's take that last example as the general form of a Zig-Zig insertion:

In this case, the path from the imbalanced node (B) to the newly added node starts with two right children. That's what makes it "Zig-Zig". To rebalance the tree, you perform a rotation about node D:

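Node D is the new root of the subtree. After the rotation, B's subtrees A and C both have height h-3, so B has height h-2. D's subtrees B and F both have height h-2, so D has height h-1. Before the insertion, node B's height was h-1, so the subtree has the same height it had before the insertion. Thus, after performing the rotation, you may return from your insertion: you are left with a valid AVL tree.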

If the path from the imbalanced node to the newly added node starts with two left children, then you have another Zig-Zig case (the mirror image). You treat it in the same way: rotate about the imbalanced node's left child.
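The height argument can be checked mechanically. This small Python sketch (`zig_zig` is a hypothetical helper, not part of any lab code) plugs a value of h into the general Zig-Zig form. There, B is the imbalanced node with subtrees A and D, and D has subtrees C and F. The sketch computes each height before and after the single rotation about D. It confirms three things:

- B is off by two after the insertion.
- The rotated subtree has height h-1, just as B did before the insertion.
- B and D are both balanced afterwards.

```python
def zig_zig(h):
    """Heights in the general Zig-Zig form for a given h (h >= 2):
    B is imbalanced with subtrees A and D; D has subtrees C and F."""
    A = C = h - 3                       # untouched subtrees
    F = h - 2                           # the insertion grew F from h-3 to h-2
    before = 1 + max(A, 1 + max(C, h - 3))   # B's height before the insertion
    D = 1 + max(C, F)                   # h-1
    B = 1 + max(A, D)                   # h
    # Single rotation about D: B keeps A and adopts C; D adopts B and keeps F.
    new_B = 1 + max(A, C)               # h-2
    new_D = 1 + max(new_B, F)           # h-1
    return {"before": before, "after_insert": B, "imbalance": D - A,
            "after_rotation": new_D, "B_balance": C - A, "D_balance": F - new_B}

print(zig_zig(4))
# {'before': 3, 'after_insert': 4, 'imbalance': 2, 'after_rotation': 3, 'B_balance': 0, 'D_balance': 0}
```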

Before going on, take a look at our example above where we inserted Jasper, which led to the tree being imbalanced around Daisy. We can identify this as a Zig-Zig case, so we can fix it by rotating about Garth:

Before
After

Here's another example, where we inserted Waluigi and that made the tree imbalanced at Fred:

Before
After


Rebalancing: Zig-Zag

The other rebalancing case is the Zig-Zag case, pictured below. I'm starting with the general form, so try to envision an example of how this happens:

To fix this, you perform two rotations. You rotate about node D, and then you rotate about node D again. This is called a double rotation about node D. Here are the two rotations:

Once again, the height of the subtree before the insertion was h-1, and after the double rotation it is h-1 again, so you are done -- your tree is balanced. Again, the mirror image case is treated in exactly the same way.
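The same kind of bookkeeping works for the Zig-Zag case. The picture isn't reproduced here, so this Python sketch assumes its own labels. They are hypothetical, chosen to match the text's double rotation about D:

- B is the imbalanced node, and A is its left subtree.
- F is B's right child, and G is F's right subtree.
- D is F's left child, with subtrees C and E.

The insertion went into C or E, so the taller of the two has height h-3. After the double rotation about D, D is on top with children B and F, and the subtree height is h-1 again.

```python
def zig_zag(h, c, e):
    """Heights in a Zig-Zag for a given h (h >= 2), with hypothetical labels:
    B is imbalanced, A is its left subtree, F is its right child, G is F's
    right subtree, and D is F's left child whose subtrees C and E have
    heights c and e after the insertion."""
    assert max(c, e) == h - 3 and abs(c - e) <= 1
    A = G = h - 3
    D = 1 + max(c, e)                   # h-2: D grew because of the insertion
    F = 1 + max(D, G)                   # h-1
    B = 1 + max(A, F)                   # h
    # Double rotation about D: D ends up on top, with B (keeping A and
    # adopting C) as its left child and F (adopting E, keeping G) as its right.
    new_B = 1 + max(A, c)               # h-2
    new_F = 1 + max(e, G)               # h-2
    new_D = 1 + max(new_B, new_F)       # h-1
    return {"after_insert": B, "imbalance": F - A, "after_rotation": new_D,
            "B_balance": c - A, "F_balance": G - e, "D_balance": new_F - new_B}

print(zig_zag(4, 1, 0))
# {'after_insert': 4, 'imbalance': 2, 'after_rotation': 3, 'B_balance': 0, 'F_balance': 1, 'D_balance': 0}
```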

Here's an example. Suppose our tree is the following, rather large tree:

And suppose we insert Boo into the tree:

Checking for balancing, we have to increment every height up to the root, and the root node Eunice is imbalanced. Since the path to the new node starts with a left child and a right child, this is a Zig-Zag case, and we need to perform a double rotation about the grandchild of the imbalanced node -- Daisy. Below is the result. We have a nicely balanced AVL tree!


Deletion

When you delete a node from an AVL tree, you follow the same deletion procedure as you did for BSTrees. To review, there are three cases:

  1. The deleted node is a leaf node: Delete the node from its parent.
  2. The deleted node has a single child: Delete the node and promote the single child to replace it. The child's new parent will be this node's former parent.
  3. The deleted node has two children: Find the node with the largest key in the left subtree of the node to be deleted. First delete this largest node from the tree (it has no right child, so this deletion falls under case 1 or case 2). Then promote this largest node so that it replaces the node to be deleted. The largest node's new parent will be the deleted node's former parent.
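The three cases fit in a short recursive function. This Python sketch uses hypothetical names and assumes distinct keys. It returns the new root of each subtree instead of keeping parent pointers, and it promotes the largest node of the left subtree in case 3. It is plain binary search tree deletion, with no AVL rebalancing yet.

```python
class Node:
    def __init__(self, key):
        self.key = key
        self.left = None
        self.right = None

def insert(root, key):
    """Plain BST insertion, used here only to build a test tree."""
    if root is None:
        return Node(key)
    if key < root.key:
        root.left = insert(root.left, key)
    else:
        root.right = insert(root.right, key)
    return root

def delete(root, key):
    """Delete key from the BST rooted at root; return the subtree's new root."""
    if root is None:
        return None                                  # key not found
    if key < root.key:
        root.left = delete(root.left, key)
    elif key > root.key:
        root.right = delete(root.right, key)
    elif root.left is None:                          # case 1 (leaf) or case 2
        return root.right
    elif root.right is None:                         # case 2: only a left child
        return root.left
    else:                                            # case 3: two children
        largest = root.left
        while largest.right is not None:             # largest key on the left
            largest = largest.right
        root.left = delete(root.left, largest.key)   # no right child: case 1 or 2
        largest.left, largest.right = root.left, root.right   # promote it
        return largest
    return root

def inorder(n):
    return [] if n is None else inorder(n.left) + [n.key] + inorder(n.right)

root = None
for k in [50, 30, 70, 20, 40, 60, 80, 35]:
    root = insert(root, k)
root = delete(root, 20)    # case 1: a leaf
root = delete(root, 30)    # case 2: 30 now has only the right child 40
root = delete(root, 50)    # case 3: 40 is the largest key left of 50
print(root.key, inorder(root))   # 40 [35, 40, 60, 70, 80]
```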

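When you delete a node, you must also ensure that the tree remains an AVL tree, which means that some rebalancing may be required. You start checking at the deleted node's old parent. There are three things that can happen to the parent:

  1. Its height is unchanged and it is balanced. Then you can stop: the tree is an AVL tree.
  2. Its height has decreased by one and it is balanced. Then you continue checking with its parent.
  3. It is imbalanced. Then you must rebalance it. If the new root of the rebalanced subtree has the same height that the parent had before the deletion, you can stop; otherwise, you continue checking with the new root's parent.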

When the parent is imbalanced, you need to identify whether you are in a Zig-Zig situation or a Zig-Zag situation and rebalance accordingly. With insertion, you could tell by following the path down to the new node. With deletion there is no new node, so instead you compare subtree heights, as in the following picture:


Imbalanced Identification Picture

The imbalanced node is B. If the height of subtree C is h-3, then the height of E will be h-2 and the tree is a Zig-Zig -- you can rebalance by rotating about node D. If the height of subtree E is h-3, then the height of C is h-2 and the tree is a Zig-Zag -- you rebalance by doing a double rotation about the root of C. If both C and E have heights of h-2, then you treat it as either a Zig-Zig or a Zig-Zag. Both work. For the purposes of your lab, treat this case like a Zig-Zig.

The mirror image works the same way.
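The identification rule only compares heights. This Python sketch uses hypothetical names, assumes nodes store their heights, and assumes the node passed in is imbalanced. It returns which case applies and the node to rotate about: the taller child D for Zig-Zig, or the root of C (the grandchild) for Zig-Zag. It handles the picture as drawn and its mirror image, and it treats the tie as a Zig-Zig, as the lab requires.

```python
class Node:
    def __init__(self, key, left=None, right=None):
        self.key = key
        self.left = left
        self.right = right
        self.height = 1 + max(ht(left), ht(right))   # children are built first

def ht(n):
    return -1 if n is None else n.height

def identify(b):
    """b is imbalanced. Return ("zig-zig", D) or ("zig-zag", root of C)."""
    if ht(b.right) > ht(b.left):          # as drawn: D is b's right child
        d = b.right
        inner, outer = d.left, d.right    # inner is C, outer is E
    else:                                 # mirror image
        d = b.left
        inner, outer = d.right, d.left
    if ht(outer) >= ht(inner):            # E at least as tall as C (tie -> zig-zig)
        return "zig-zig", d
    return "zig-zag", inner               # double rotation about the root of C

zig_zig = Node("B", None, Node("D", None, Node("E")))
zig_zag = Node("B", None, Node("D", Node("C"), None))
tie = Node("B", None, Node("D", Node("C"), Node("E")))
mirror = Node("B", Node("D", None, Node("C")), None)
for b in (zig_zig, zig_zag, tie, mirror):
    kind, about = identify(b)
    print(kind, about.key)
# zig-zig D
# zig-zag C
# zig-zig D
# zig-zag C
```

The tie case cannot come up after an insertion, but it can after a deletion shortens the other side of the tree.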

Let's look at some examples. First, suppose we delete Calista from the following tree:

You're left with:

You check Calista's old parent, Binky. Although Binky's height hasn't changed, the node is imbalanced. It's clearly a Zig-Zig case, so you rotate about Baby Daisy to yield the following tree:

Since Baby Daisy is the root, we're done.

Let's try a more complex example -- deleting Eunice from the following tree:

First, Eunice has two children. So, we find the node with the greatest key in Eunice's left subtree, which is Dontonio, delete it, and replace Eunice with Dontonio. We start by deleting Dontonio:

And we start checking at Eunice. It is imbalanced. Looking at it, we see that it's a Zig-Zag, so we have to double-rotate about Fred:

Now, the subtree rooted at Fred is balanced, but its height is one less than it used to be, so we need to move to its parent and check it. The parent's height is unchanged, and it is balanced, so we're done -- as a last step, we replace Eunice with Dontonio:
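Putting the pieces together, here is a compact recursive Python sketch of AVL insertion and deletion. It uses hypothetical names and assumes distinct keys. Some design choices differ from the procedure in the text:

- Instead of parent pointers, each call returns the new root of its subtree.
- `rebalance` re-checks every node on the way back up to the root rather than stopping early. A node whose height did not change and that is still balanced is left untouched, so this should only add redundant checks, not change the resulting tree.
- Deletion with two children promotes the largest node of the left subtree, as in the Eunice example.
- Ties are treated as Zig-Zig.

```python
class Node:
    def __init__(self, key):
        self.key = key
        self.left = None
        self.right = None
        self.height = 0

def h(n):
    return -1 if n is None else n.height

def fix(n):
    n.height = 1 + max(h(n.left), h(n.right))

def rotate_left_child_up(p):
    """Rotate about p's left child; return it (the new subtree root)."""
    x = p.left
    p.left, x.right = x.right, p
    fix(p)
    fix(x)
    return x

def rotate_right_child_up(p):
    """Rotate about p's right child; return it (the new subtree root)."""
    x = p.right
    p.right, x.left = x.left, p
    fix(p)
    fix(x)
    return x

def rebalance(n):
    """Update n's height; if n is imbalanced, fix it. Return the subtree root."""
    fix(n)
    if h(n.left) - h(n.right) > 1:
        if h(n.left.right) > h(n.left.left):         # Zig-Zag: first half of
            n.left = rotate_right_child_up(n.left)    # the double rotation
        return rotate_left_child_up(n)                # Zig-Zig (ties included)
    if h(n.right) - h(n.left) > 1:
        if h(n.right.left) > h(n.right.right):
            n.right = rotate_left_child_up(n.right)
        return rotate_right_child_up(n)
    return n

def insert(n, key):
    if n is None:
        return Node(key)
    if key < n.key:
        n.left = insert(n.left, key)
    else:
        n.right = insert(n.right, key)
    return rebalance(n)

def delete(n, key):
    if n is None:
        return None                                   # key not found
    if key < n.key:
        n.left = delete(n.left, key)
    elif key > n.key:
        n.right = delete(n.right, key)
    elif n.left is None or n.right is None:           # cases 1 and 2
        return n.left if n.left is not None else n.right
    else:                                             # case 3
        largest = n.left
        while largest.right is not None:
            largest = largest.right
        n.left = delete(n.left, largest.key)          # checked on the way up
        largest.left, largest.right = n.left, n.right
        n = largest                                   # promote it into n's place
    return rebalance(n)

def valid(n):
    """Sanity check: stored heights are right and every node is balanced."""
    return n is None or (valid(n.left) and valid(n.right)
                         and n.height == 1 + max(h(n.left), h(n.right))
                         and abs(h(n.left) - h(n.right)) <= 1)

def inorder(n):
    return [] if n is None else inorder(n.left) + [n.key] + inorder(n.right)

root = None
for k in range(1, 16):
    root = insert(root, k)
for k in [8, 1, 2, 3, 15]:
    root = delete(root, k)
print(inorder(root), valid(root))
# [4, 5, 6, 7, 9, 10, 11, 12, 13, 14] True
```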


Here's a larger example. We're going to delete Daisy from the following AVL tree:

As in the previous example, Daisy has two children, so we find the node with the largest key in Daisy's left subtree: Calista. We are going to delete Calista and then replace Daisy with Calista. So, we first delete Calista:

We start our checking with Calista's old parent: Binky. The node is imbalanced, so we have to determine whether it is Zig-Zig or Zig-Zag. I'll redraw it so it fits the Imbalanced Identification Picture:

In this example, h is three, and we're looking at the mirror image of the Imbalanced Identification Picture. You can see from the picture that the tree is Zig-Zig, so we do a single rotation about Baby Daisy:

Since the tree rooted at Baby Daisy has a smaller height, we need to check its parent (Daisy). Daisy's height is too high, so it needs to be decremented, and since it's the root of the tree, we're done. We replace Daisy with Calista, and return.


One final example. Let's delete Frenchy from the following tree:

We're left with the following tree, and when we start checking Frenchy's old parent, Galois, it is imbalanced:

I've drawn the subtrees for the Imbalanced Identification picture. h is two, and the C and E subtrees are both of height h-2. Thus, we can treat it either as a Zig-Zig or a Zig-Zag. We will treat it as a Zig-Zig (which you should do in your lab) and do a single rotation about Xavier.

Since the subtree's height is the same as it was before, we can return.


Practice

What is the tree that results when you delete Xerxes from the above tree? I'm not going to work through this in steps for you. Convince yourself that it will be the following tree:

To get that, you had to do a Zig-Zag balancing, and a Zig-Zig balancing.