CS140 Lecture notes -- Binary Search Trees

Binary Search Trees

The book has a very nice description of binary search trees. These lecture notes simply concern the implementation. The implementation of binary search trees that I've made is in the files bstree.h and bstree.c.

Our search trees will use character strings as keys and use strcmp() as our comparison function. This is not totally general, but it will serve as a nice introduction to search trees.

First, look at bstree.c. This defines two typedefs (remember that we are using information hiding so we need to put the typedefs in the .c file and not the .h file). The first is a node of the binary search tree. This has a search key, a value and pointers to left and right children:

```
typedef struct bstreenode {
    char *key;
    void *value;
    struct bstreenode *left_child;
    struct bstreenode *right_child;
} BstreeNode;
```
The second typedef is for the tree container structure. It is simply a pointer to the root of the tree:
```
typedef struct {
    BstreeNode *root;
} Bstree;
```

Now the invariant in a binary search tree is that given node n, all nodes reachable from n->left_child will have keys less than n->key, and all nodes reachable from n->right_child will have keys greater than n->key. In this implementation, we will not allow two nodes to have the same key.
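
The invariant can be checked mechanically. The following is only a sketch and is not part of bstree.c: DemoNode, demo_in_range() and demo_is_bst() are made-up names, and DemoNode is a stand-in for BstreeNode so that the example compiles on its own. It passes a lower and an upper bound down the tree and checks that every key lies strictly between them.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for BstreeNode (the value field is left out). */
typedef struct demonode {
    char *key;
    struct demonode *left_child;
    struct demonode *right_child;
} DemoNode;

/* Returns 1 if every key in the subtree rooted at n is strictly greater
   than lo and strictly less than hi. A NULL bound means "no bound". */
static int demo_in_range(const DemoNode *n, const char *lo, const char *hi)
{
    if (n == NULL) return 1;                        /* empty subtree is fine */
    if (lo != NULL && strcmp(n->key, lo) <= 0) return 0;
    if (hi != NULL && strcmp(n->key, hi) >= 0) return 0;
    /* keys on the left must stay below n->key, keys on the right above it */
    return demo_in_range(n->left_child, lo, n->key) &&
           demo_in_range(n->right_child, n->key, hi);
}

/* Returns 1 if the whole tree satisfies the search tree invariant. */
int demo_is_bst(const DemoNode *root)
{
    return demo_in_range(root, NULL, NULL);
}
```

Because the comparisons are strict, a tree holding the same key twice is rejected, which matches the rule that no two nodes may share a key.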

Bstree.h defines the following procedures:

• void *new_bstree(): create a new empty tree.
• void free_bstree(void *tree): delete all nodes in the tree and delete the tree.
• void bstree_insert(void *tree, char *key, void *val): create a new node with key key and value val, and insert it into the correct place in the tree. If key already exists, then it replaces the key and val of the node that was there.
• void *bstree_find(void *tree, char *key): find the node in the tree with the given key and return a pointer to its value. If there is no such node, it returns NULL.
• void *bstree_delete(void *tree, char *key): delete the given key from the tree and return its value. It returns NULL if the key was not in the tree.
• void *bstree_find_max(void *): return the node in the tree with the maximum key.
• void *bstree_find_min(void *): return the node in the tree with the minimum key.
• void *bstree_root(void *tree): return the root node of the tree.
• void *bstree_left(void *node): return the left child of the current node.
• void *bstree_right(void *node): return the right child of the current node.
• char *bstree_key(void *node): return the key of the current node.
• void *bstree_value(void *node): return the value of the current node.

I am not going to go over the implementation in detail but I will discuss it in class. Look at the code yourself. New_bstree(), bstree_insert(), bstree_find(), bstree_find_max() and bstree_find_min() are all straightforward code that you should be able to look over and understand rather quickly. bstree_insert and bstree_find use recursion, which means that they call themselves.
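
Since the insert and find code is not reproduced in these notes, here is a rough sketch of how recursive versions might look. It is not the code from bstree.c: DemoNode, demo_insert() and demo_find() are hypothetical names, and the sketch assumes the tree stores the caller's key pointer rather than a copy of the string.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for BstreeNode. */
typedef struct demonode {
    char *key;
    void *value;
    struct demonode *left_child;
    struct demonode *right_child;
} DemoNode;

/* Insert key/val into the subtree rooted at n and return the subtree's
   (possibly new) root. If the key is already there, replace key and value. */
DemoNode *demo_insert(DemoNode *n, char *key, void *val)
{
    int cmp;

    if (n == NULL) {                      /* empty spot: the new node goes here */
        n = malloc(sizeof(DemoNode));
        if (n == NULL) return NULL;       /* out of memory: tree is unchanged */
        n->key = key;
        n->value = val;
        n->left_child = NULL;
        n->right_child = NULL;
        return n;
    }
    cmp = strcmp(key, n->key);
    if (cmp < 0)
        n->left_child = demo_insert(n->left_child, key, val);
    else if (cmp > 0)
        n->right_child = demo_insert(n->right_child, key, val);
    else {                                /* key already present */
        n->key = key;
        n->value = val;
    }
    return n;
}

/* Return the value stored with key, or NULL if the key is not in the tree. */
void *demo_find(const DemoNode *n, const char *key)
{
    int cmp;

    if (n == NULL) return NULL;
    cmp = strcmp(key, n->key);
    if (cmp < 0) return demo_find(n->left_child, key);
    if (cmp > 0) return demo_find(n->right_child, key);
    return n->value;
}
```

A caller would start with an empty tree (`DemoNode *root = NULL;`) and write `root = demo_insert(root, "6", val);` for each key. Returning the subtree root from each call is what lets the recursion hook a new node onto its parent without keeping a separate parent pointer.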

The only two tricky ones are free_bstree() and bstree_delete().

Free_bstree() does its work with a recursive helper, recursive_free_bstree(), which frees a node's left and right subtrees and then frees the node itself. Note, recursive_free_bstree() is defined to be static -- this means that it may only be used by procedures in bstree.c. This is a convenient thing to do when you need a procedure like recursive_free_bstree() in an implementation, but you don't want anyone else to call it. Here is the code for free_bstree():

```
static void recursive_free_bstree(BstreeNode *bn)
{
    if (bn == NULL) return;
    /* free both subtrees first, so we never follow a pointer out of a freed node */
    if (bn->left_child != NULL) recursive_free_bstree(bn->left_child);
    if (bn->right_child != NULL) recursive_free_bstree(bn->right_child);
    free(bn);
    return;
}

void free_bstree(void *tree)
{
    Bstree *b = (Bstree *)tree;

    recursive_free_bstree(b->root);
    free(b);
    return;
}
```
Node deletion is pretty complex. One way to do it is lazy deletion, which marks a node as deleted but does not remove it from the tree. If the number of deletes is roughly equal to the number of nodes in the tree, then the running time of the algorithms is not affected, as long as the tree stays balanced. The reason is that the number of nodes in the tree will be roughly double what it would otherwise be (since the number of deleted nodes equals the number of live nodes), and doubling the size of a balanced tree increases its depth by at most 1, because log 2n = 1 + log n when logs are taken base 2. Hence instead of our operations requiring log n time, they will require 1 + log n time, which is still O(log n).
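
To make lazy deletion concrete, here is a minimal sketch. It is not taken from bstree.c, which removes nodes for real; LazyNode, its deleted field, and the lazy_ functions are all hypothetical names invented for this example.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical node type: a search tree node plus a "deleted" mark. */
typedef struct lazynode {
    char *key;
    void *value;
    int deleted;                    /* nonzero once the key has been deleted */
    struct lazynode *left_child;
    struct lazynode *right_child;
} LazyNode;

/* Ordinary search that ignores the deleted mark. */
static LazyNode *lazy_locate(LazyNode *n, const char *key)
{
    while (n != NULL) {
        int cmp = strcmp(key, n->key);
        if (cmp == 0) return n;
        n = (cmp < 0) ? n->left_child : n->right_child;
    }
    return NULL;
}

/* Mark the key as deleted. Returns 1 if a live node was marked, else 0. */
int lazy_delete(LazyNode *root, const char *key)
{
    LazyNode *n = lazy_locate(root, key);

    if (n == NULL || n->deleted) return 0;
    n->deleted = 1;                 /* the node stays in the tree */
    return 1;
}

/* A marked node counts as absent, but searches still pass through it. */
void *lazy_find(LazyNode *root, const char *key)
{
    LazyNode *n = lazy_locate(root, key);

    return (n == NULL || n->deleted) ? NULL : n->value;
}
```

The marked node keeps its key, so it still routes searches to the correct subtree. That is why the shape of the tree, and therefore its depth, never shrinks under lazy deletion.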

If the number of deletes is much greater than the number of nodes left in the tree, then the running time could be affected and it is better to actually remove the node from the tree. There are three cases to consider:

1. The deleted node is a leaf: In this case we simply remove the node from the tree. For example, assume we have the tree:
```
                6
              /  \
             2    8
            /  \
           1    4
               /
              3
```
If we delete 1, then we have the tree:
```
                6
              /  \
             2    8
               \
                4
               /
              3
```

2. The deleted node has only one child: In this case we promote the child to replace the node. For example, if we delete 4 from the above tree, 3 will be promoted to replace it:
```
                6
              /  \
             2    8
               \
                3
```

3. The deleted node has two children: In this case we promote the smallest node in the right subtree to replace the node (this decision is arbitrary; we could also choose to promote the largest node in the left subtree), and then recursively delete the promoted node from the right subtree. Note that since we selected the smallest key in the right subtree, all the remaining keys in the right subtree are greater than this key, and all the keys in the left subtree are less than this key. Thus we maintain our invariant that all keys less than a node's key appear in its left subtree and all keys greater than a node's key appear in its right subtree. In practice we promote the node by copying its key and value into the "deleted" node and then removing the promoted node from its old position in the right subtree. For example, let's say we have the following tree:
```
                6
              /  \
             2    8
            /  \
           1    4
               / \
              3   5
```
If we delete 2 from this tree, then we will promote 3 to replace 2, since 3 is the smallest key in 2's right subtree:
```
                6
              /  \
             3    8
            /  \
           1    4
                 \
                  5
```
Here is my code for deleting a node from a tree:
```
// delete a node with the indicated key from the tree and return the
// (possibly new) root of the subtree that was rooted at node.
static BstreeNode *bstree_delete_helper(BstreeNode *node, char *key) {
    if (node == NULL)
        return NULL; // key is not in the tree, so there is nothing to delete
    if (strcmp(key, node->key) < 0)
        node->left_child = bstree_delete_helper(node->left_child, key);
    else if (strcmp(key, node->key) > 0)
        node->right_child = bstree_delete_helper(node->right_child, key);
    // else key found so do deletion
    else if ((node->left_child != NULL) && (node->right_child != NULL)) {
        // two child case--find smallest element in right subtree and promote
        // it to this node. Then recursively remove the smallest element in the
        // right subtree
        BstreeNode *smallest = bstree_find_min_helper(node->right_child);
        node->key = smallest->key;
        node->value = smallest->value;
        node->right_child = bstree_delete_helper(node->right_child, smallest->key);
    }
    else { // either no children or one child
        BstreeNode *saveNode = node;
        if (node->left_child != NULL)
            node = node->left_child;
        else
            node = node->right_child; // works even if right_child is NULL
        free(saveNode);
    }
    return node;
}

/* delete the given key from the tree */
void bstree_delete(void *tree, char *key) {
    Bstree *b = (Bstree *)tree;
    b->root = bstree_delete_helper(b->root, key);
}
```
Note that I used a recursive helper function to perform the deletion, and I declared it static because I want to make it private to this file.
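
The deletion code calls bstree_find_min_helper(), which these notes do not show. As a hedged sketch of what such a helper might look like (DemoNode, demo_find_min() and demo_find_max() are hypothetical names, not the ones in bstree.c): the minimum key lives in the leftmost node, so we keep following left children until there are none left.

```c
#include <stddef.h>

/* Hypothetical stand-in for BstreeNode. */
typedef struct demonode {
    char *key;
    void *value;
    struct demonode *left_child;
    struct demonode *right_child;
} DemoNode;

/* Return the node with the smallest key in the subtree rooted at n,
   or NULL if the subtree is empty. */
DemoNode *demo_find_min(DemoNode *n)
{
    if (n == NULL) return NULL;
    while (n->left_child != NULL)    /* smaller keys are always to the left */
        n = n->left_child;
    return n;
}

/* Mirror image: the largest key is in the rightmost node. */
DemoNode *demo_find_max(DemoNode *n)
{
    if (n == NULL) return NULL;
    while (n->right_child != NULL)
        n = n->right_child;
    return n;
}
```

Note that the node returned by demo_find_min() has no left child, so removing it always falls into the leaf or one-child case. That is why the recursive delete in the two-child case never recurses into another two-child case for the promoted key.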

A simple application

Bstree_test.c contains a simple tree editor which lets you manage a tree of words as keys and doubles as values. You can insert, delete, print the max and min, and do one of three traversals, preorder, postorder or inorder. Note the inorder traversal will print the tree in sorted order. The pre and post-order traversals use indentation to show you what the tree looks like.
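
To show how the traversals differ, here is a small sketch. It is not the code from bstree_test.c: DemoNode and the demo_ functions are hypothetical, the two-spaces-per-level indentation and the output format are assumptions, and the sketch appends its output to a string buffer instead of printing so that it is easy to check.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical stand-in for a tree node whose values are doubles. */
typedef struct demonode {
    char *key;
    double value;
    struct demonode *left_child;
    struct demonode *right_child;
} DemoNode;

/* Preorder: the node first, then its left subtree, then its right subtree.
   Appends one line per node to buf, indented two spaces per level.
   buf must hold a NUL-terminated string (possibly empty) of capacity cap. */
void demo_preorder(const DemoNode *n, int depth, char *buf, size_t cap)
{
    size_t used;

    if (n == NULL) return;
    used = strlen(buf);
    snprintf(buf + used, cap - used, "%*s%s %.2f\n",
             2 * depth, "", n->key, n->value);
    demo_preorder(n->left_child, depth + 1, buf, cap);
    demo_preorder(n->right_child, depth + 1, buf, cap);
}

/* Inorder: left subtree, then the node, then the right subtree.
   In a search tree this visits the keys in sorted order. */
void demo_inorder(const DemoNode *n, char *buf, size_t cap)
{
    size_t used;

    if (n == NULL) return;
    demo_inorder(n->left_child, buf, cap);
    used = strlen(buf);
    snprintf(buf + used, cap - used, "%s %.2f\n", n->key, n->value);
    demo_inorder(n->right_child, buf, cap);
}
```

A postorder traversal would be the same as demo_preorder() with the snprintf() call moved after the two recursive calls.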

Here are some examples. First, we'll create a tree that looks as follows:

```
                6
              /  \
             2    8
            /  \
           1    4
               /
              3
```
(note, even though our keys are character strings, we can use them to sort single digit numbers. Below, I will use values of zero for everything):
```UNIX> bstree_test
BSTREE> INSERT 6 0
BSTREE> INSERT 2 0
BSTREE> INSERT 1 0
BSTREE> INSERT 4 0
BSTREE> INSERT 3 0
BSTREE> INSERT 8 0
BSTREE> INORDER
1                                    0.00
2                                    0.00
3                                    0.00
4                                    0.00
6                                    0.00
8                                    0.00
BSTREE> PREORDER
6 0.00
  2 0.00
    1 0.00
    4 0.00
      3 0.00
  8 0.00
BSTREE>
```
Note, the preorder traversal shows that the tree is just as depicted in the above figure. We could also do the same with a post-order traversal. Frankly, I find the preorder traversal easier to understand:
```BSTREE> POSTORDER
    1 0.00
      3 0.00
    4 0.00
  2 0.00
  8 0.00
6 0.00
BSTREE>
```
Now we insert 5 into the tree. Here's our updated figure, followed by a preorder traversal:
```
                6
              /  \
             2    8
            /  \
           1    4
               / \
              3   5

BSTREE> INSERT 5 0
BSTREE> PREORDER
6 0.00
  2 0.00
    1 0.00
    4 0.00
      3 0.00
      5 0.00
  8 0.00
```
If we delete node 5, then we again have the original figure.
```BSTREE> DELETE 5
BSTREE> PREORDER
6 0.00
  2 0.00
    1 0.00
    4 0.00
      3 0.00
  8 0.00
```
When we delete node 4, it will replace the right child of node 2 with node 3, as depicted in the picture:
```
                6
              /  \
             2    8
            /  \
           1    3

BSTREE> DELETE 4
BSTREE> PREORDER
6 0.00
  2 0.00
    1 0.00
    3 0.00
  8 0.00
```
Now, to show deletion of a node with two children, I'll delete node 3 and then add nodes 5, 3 and 4. This will give us the following tree:
```
                6
              /  \
             2    8
            /  \
           1    5
               /
              3
               \
                4

BSTREE> DELETE 3
BSTREE> INSERT 5 0
BSTREE> INSERT 3 0
BSTREE> INSERT 4 0
BSTREE> PREORDER
6 0.00
  2 0.00
    1 0.00
    5 0.00
      3 0.00
        4 0.00
  8 0.00
```
Now, we delete node 2. This will replace node 2 with node 3, and delete the old node 3. We're left with the tree depicted on the right side of figure 4.24:
```
                6
              /  \
             3    8
            /  \
           1    5
               /
              4

BSTREE> DELETE 2
BSTREE> PREORDER
6 0.00
  3 0.00
    1 0.00
    5 0.00
      4 0.00
  8 0.00
BSTREE>
```
Finally, note that binary search trees can be bad if they are created with the keys already sorted. For example, look at the following tree:
```UNIX> bstree_test
BSTREE> INSERT Cindy 1955
BSTREE> INSERT Dave 1923
BSTREE> INSERT Jim 1966
BSTREE> INSERT Peg 1929
BSTREE> INSERT Terry 1963
BSTREE> INORDER
Cindy                             1955.00
Dave                              1923.00
Jim                               1966.00
Peg                               1929.00
Terry                             1963.00
BSTREE> PREORDER
Cindy 1955.00
  Dave 1923.00
    Jim 1966.00
      Peg 1929.00
        Terry 1963.00
BSTREE>
```
As you see, the tree is unbalanced, and finding keys in this tree is as inefficient as finding keys in a linked list: O(n). We'll talk more about this later.
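
One way to see the effect of insertion order is to measure the height of the resulting tree. The sketch below is only an illustration, not code from bstree.c: DemoNode, demo_insert() and demo_height() are hypothetical names, and values are left out to keep it short.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for BstreeNode, keys only. */
typedef struct demonode {
    char *key;
    struct demonode *left_child;
    struct demonode *right_child;
} DemoNode;

/* Plain recursive insert; duplicate keys are ignored. */
DemoNode *demo_insert(DemoNode *n, char *key)
{
    int cmp;

    if (n == NULL) {
        n = malloc(sizeof(DemoNode));
        if (n == NULL) return NULL;       /* out of memory: tree is unchanged */
        n->key = key;
        n->left_child = NULL;
        n->right_child = NULL;
        return n;
    }
    cmp = strcmp(key, n->key);
    if (cmp < 0)
        n->left_child = demo_insert(n->left_child, key);
    else if (cmp > 0)
        n->right_child = demo_insert(n->right_child, key);
    return n;
}

/* Number of nodes on the longest path from n down to a leaf (0 if empty). */
int demo_height(const DemoNode *n)
{
    int l, r;

    if (n == NULL) return 0;
    l = demo_height(n->left_child);
    r = demo_height(n->right_child);
    return 1 + (l > r ? l : r);
}
```

Inserting Cindy, Dave, Jim, Peg, Terry in that order gives a tree of height 5, one node per level, because each new key goes to the right of the previous one. Inserting the same names in the order Jim, Dave, Peg, Cindy, Terry gives a tree of height 3.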