Our search trees will use character strings as keys and use strcmp() as our comparison function. This is not totally general, but it will serve as a nice introduction to search trees.
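To make the "not totally general" caveat concrete, here is a minimal sketch of the comparison the trees rely on. The name key_less() is hypothetical and is not part of bstree.h; it only wraps strcmp(). Note that strcmp() orders keys character by character, so strings of digits sort like numbers only when they all have the same length.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical helper (not in bstree.h): returns 1 if key a sorts strictly
   before key b under strcmp(), and 0 otherwise. */
int key_less(const char *a, const char *b)
{
  assert(a != NULL && b != NULL);   /* strcmp() requires valid strings */
  return strcmp(a, b) < 0;
}
```

With this helper, key_less("2", "6") is true, as you would expect, but key_less("10", "9") is also true, because '1' comes before '9'.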
First, look at bstree.c. This defines two typedefs (remember that we are using information hiding so we need to put the typedefs in the .c file and not the .h file). The first is a node of the binary search tree. This has a search key, a value and pointers to left and right children:
    typedef struct bstreenode {
      char *key;
      void *value;
      struct bstreenode *left_child;
      struct bstreenode *right_child;
    } BstreeNode;

The second typedef is for the tree container structure. It is simply a pointer to the root of the tree:
    typedef struct {
      BstreeNode *root;
    } Bstree;
Now the invariant in a binary search tree is that, given a node n, all nodes reachable from n->left_child will have keys less than n->key, and all nodes reachable from n->right_child will have keys greater than n->key. In this implementation, we will not allow two nodes to have the same key.
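The invariant can be checked mechanically. The following is a sketch only: is_bst() and its helper are hypothetical names that do not appear in bstree.c, and the node typedef is repeated so that the example stands alone. The helper carries an open interval (lo, hi) down the tree, which is what enforces the invariant for every node reachable from n rather than just for n's immediate children.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

typedef struct bstreenode {
  char *key;
  void *value;
  struct bstreenode *left_child;
  struct bstreenode *right_child;
} BstreeNode;

/* Returns 1 if every key in the subtree rooted at n lies strictly between
   lo and hi, and 0 otherwise. A NULL bound means "no bound on that side". */
static int is_bst_helper(const BstreeNode *n, const char *lo, const char *hi)
{
  if (n == NULL) return 1;                  /* an empty subtree is fine */
  assert(n->key != NULL);                   /* every node must carry a key */
  if (lo != NULL && strcmp(n->key, lo) <= 0) return 0;
  if (hi != NULL && strcmp(n->key, hi) >= 0) return 0;
  /* Keys to the left must stay below n->key; keys to the right above it. */
  return is_bst_helper(n->left_child, lo, n->key) &&
         is_bst_helper(n->right_child, n->key, hi);
}

/* Hypothetical checker: 1 if the tree rooted at root satisfies the binary
   search tree invariant with no duplicate keys, 0 otherwise. */
int is_bst(const BstreeNode *root)
{
  return is_bst_helper(root, NULL, NULL);
}
```

The strict comparisons (<= and >= cause a failure) reflect the rule that no two nodes may have the same key.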
Bstree.h defines the following procedures:
I am not going to go over the implementation in detail here, but I will discuss it in class. Look at the code yourself. The procedures new_bstree(), bstree_insert(), bstree_find(), bstree_find_max() and bstree_find_min() are all straightforward code that you should be able to look over and understand rather quickly. bstree_insert() and bstree_find() use recursion, which means that they call themselves.
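To show what "they call themselves" looks like, here is a minimal sketch of a recursive lookup and a recursive insert on nodes. It is not the code in bstree.c: the names find_sketch() and insert_sketch() are hypothetical, and I am assuming that the tree stores the caller's key pointer without copying it and that inserting a duplicate key leaves the tree unchanged.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

typedef struct bstreenode {
  char *key;
  void *value;
  struct bstreenode *left_child;
  struct bstreenode *right_child;
} BstreeNode;

/* Recursive lookup: returns the node holding key, or NULL if it is absent. */
BstreeNode *find_sketch(BstreeNode *n, const char *key)
{
  int cmp;

  assert(key != NULL);
  if (n == NULL) return NULL;               /* fell off the tree: not found */
  cmp = strcmp(key, n->key);
  if (cmp < 0) return find_sketch(n->left_child, key);
  if (cmp > 0) return find_sketch(n->right_child, key);
  return n;                                 /* keys are equal */
}

/* Recursive insert: returns the root of the subtree after insertion.
   Assumptions: the key pointer is stored as-is (no copy), and a duplicate
   key is ignored. If malloc() fails, the subtree is returned unchanged. */
BstreeNode *insert_sketch(BstreeNode *n, char *key, void *value)
{
  int cmp;

  assert(key != NULL);
  if (n == NULL) {                          /* empty spot: put the node here */
    BstreeNode *fresh = malloc(sizeof(BstreeNode));
    if (fresh == NULL) return NULL;
    fresh->key = key;
    fresh->value = value;
    fresh->left_child = NULL;
    fresh->right_child = NULL;
    return fresh;
  }
  cmp = strcmp(key, n->key);
  if (cmp < 0) n->left_child = insert_sketch(n->left_child, key, value);
  else if (cmp > 0) n->right_child = insert_sketch(n->right_child, key, value);
  return n;
}
```

A caller would write root = insert_sketch(root, "6", NULL); starting from root = NULL. Each recursive call works on a strictly smaller subtree, so the recursion ends either at a matching key or at an empty spot.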
The only two tricky ones are free_bstree() and bstree_delete().
free_bstree() relies on a recursive helper, recursive_free_bstree(), which frees a node's left and right subtrees and then frees the node itself. Note that recursive_free_bstree() is defined to be static, which means that it may only be used by procedures in bstree.c. This is a convenient thing to do when you need a procedure like recursive_free_bstree() in an implementation, but you don't want anyone else to call it. Here is the code for free_bstree():
    // Frees every node in the subtree rooted at bn. Only the nodes are
    // freed; the key and value pointers are not freed here.
    static void recursive_free_bstree(BstreeNode *bn)
    {
      if (bn == NULL) return;
      recursive_free_bstree(bn->left_child);
      recursive_free_bstree(bn->right_child);
      free(bn);
    }

    void free_bstree(void *tree)
    {
      Bstree *b = (Bstree *)tree;
      recursive_free_bstree(b->root);
      free(b);
    }

Node deletion is pretty complex. The book glosses over node deletion by using lazy deletion, where it marks a node as deleted but does not remove it from the tree. If the number of deletes is roughly equal to the number of nodes in the tree, then the running time of the algorithms is not affected, as long as the tree stays balanced. The reason is that the number of nodes in the tree will be roughly double what it would otherwise be (since the number of deleted nodes equals the number of nodes in the tree), which increases the depth of the tree by at most 1. Hence instead of our operations requiring log n time, they will require 1 + log n time, which is still O(log n).
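Here is a minimal sketch of what lazy deletion might look like. It is not the book's code and it is not part of bstree.c: the LazyNode type, its deleted flag, and the names lazy_find() and lazy_delete() are all assumptions made for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical node type: a tree node plus a "deleted" mark. */
typedef struct lazynode {
  char *key;
  void *value;
  int deleted;                    /* 1 if the key was lazily deleted */
  struct lazynode *left_child;
  struct lazynode *right_child;
} LazyNode;

/* Returns the live node holding key, or NULL if the key is absent or has
   been marked as deleted. Marked nodes still guide the search. */
LazyNode *lazy_find(LazyNode *n, const char *key)
{
  assert(key != NULL);
  while (n != NULL) {
    int cmp = strcmp(key, n->key);
    if (cmp == 0) return n->deleted ? NULL : n;
    n = (cmp < 0) ? n->left_child : n->right_child;
  }
  return NULL;
}

/* Marks the node holding key as deleted without touching the shape of the
   tree. Returns 1 if a live node was marked, 0 if there was none. */
int lazy_delete(LazyNode *root, const char *key)
{
  LazyNode *n = lazy_find(root, key);
  if (n == NULL) return 0;
  n->deleted = 1;
  return 1;
}
```

Since nothing is ever unlinked, the tree keeps its shape, which is why the cost only shows up as extra nodes to walk past.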
If the number of deletes is much larger than the number of nodes that remain in the tree, then the running time could be affected and it is better to actually remove the node from the tree. There are three cases to consider:
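1. The node is a leaf. Simply remove it. For example, take the tree:

            6
           / \
          2   8
         / \
        1   4
           /
          3

   If we delete 1, then we have the tree:

            6
           / \
          2   8
           \
            4
           /
          3

2. The node has one child. Replace the node with its child. If we delete 4 from the tree above, then we have the tree:

            6
           / \
          2   8
           \
            3

3. The node has two children. Promote the smallest key in the node's right subtree to the node, and then remove that smallest key from the right subtree. For example, take the tree:

            6
           / \
          2   8
         / \
        1   4
           / \
          3   5

   If we delete 2 from this tree, then we will promote 3 to replace 2, since 3 is the smallest key in 2's right subtree:

            6
           / \
          3   8
         / \
        1   4
             \
              5

Here is my code for deleting a node from a tree: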
    // delete a node with the indicated key from the tree. Note that we pass
    // a pointer to the node's position in the tree, rather than the node
    // itself. This allows us to do an in-place modification of the node if
    // necessary.
    static void *bstree_delete_helper(BstreeNode **bn, char *key)
    {
      void *return_value;
      BstreeNode *node = *bn;

      if (node == NULL) return NULL;   // Item not found; do nothing
      if (strcmp(key, node->key) < 0)
        return bstree_delete_helper(&node->left_child, key);
      else if (strcmp(key, node->key) > 0)
        return bstree_delete_helper(&node->right_child, key);
      // else key found so do deletion
      else if ((node->left_child != NULL) && (node->right_child != NULL)) {
        // two child case--find smallest element in right subtree and promote
        // it to this node. Then recursively remove the smallest element in the
        // right subtree
        BstreeNode *smallest = bstree_find_min_helper(node->right_child);
        return_value = node->value;
        node->key = smallest->key;
        node->value = smallest->value;
        bstree_delete_helper(&node->right_child, smallest->key);
        return return_value;
      }
      else {
        return_value = node->value;
        if (node->left_child != NULL)
          *bn = node->left_child;
        else
          *bn = node->right_child;   // works even if right_child is NULL
        free(node);
        return return_value;
      }
    }

    /* delete the given key from the tree and return its value. */
    void *bstree_delete(void *tree, char *key)
    {
      Bstree *b = (Bstree *)tree;
      return bstree_delete_helper(&b->root, key);
    }

There are a few things to note about this code:
Here are some examples. First, we'll create a tree that looks as follows:
        6
       / \
      2   8
     / \
    1   4
       /
      3

(Note: even though our keys are character strings, we can use them to sort single-digit numbers. Below, I will use values of zero for everything.)
    UNIX> bstree_test
    BSTREE> INSERT 6 0
    BSTREE> INSERT 2 0
    BSTREE> INSERT 1 0
    BSTREE> INSERT 4 0
    BSTREE> INSERT 3 0
    BSTREE> INSERT 8 0
    BSTREE> INORDER
    1 0.00
    2 0.00
    3 0.00
    4 0.00
    6 0.00
    8 0.00
    BSTREE> PREORDER
    6 0.00
    2 0.00
    1 0.00
    4 0.00
    3 0.00
    8 0.00
    BSTREE>

Note that the preorder traversal shows that the tree is just as depicted in the above figure. We could also do the same with a postorder traversal. Frankly, I find the preorder traversal easier to understand:
    BSTREE> POSTORDER
    1 0.00
    3 0.00
    4 0.00
    2 0.00
    8 0.00
    6 0.00
    BSTREE>

Now we insert 5 into the tree. Here's our updated figure, followed by a preorder traversal:
        6
       / \
      2   8
     / \
    1   4
       / \
      3   5

    BSTREE> INSERT 5 0
    BSTREE> PREORDER
    6 0.00
    2 0.00
    1 0.00
    4 0.00
    3 0.00
    5 0.00
    8 0.00

If we delete node 5, then we again have the original figure.
    BSTREE> DELETE 5
    BSTREE> PREORDER
    6 0.00
    2 0.00
    1 0.00
    4 0.00
    3 0.00
    8 0.00

Deleting node 4 replaces the right child of node 2 with node 3, as depicted in the picture:
        6
       / \
      2   8
     / \
    1   3

    BSTREE> DELETE 4
    BSTREE> PREORDER
    6 0.00
    2 0.00
    1 0.00
    3 0.00
    8 0.00

Now, to show deletion of a node with two children, I'll delete node 3 and then add nodes 5, 3 and 4. This will give us the following tree:
        6
       / \
      2   8
     / \
    1   5
       /
      3
       \
        4

    BSTREE> DELETE 3
    BSTREE> INSERT 5 0
    BSTREE> INSERT 3 0
    BSTREE> INSERT 4 0
    BSTREE> PREORDER
    6 0.00
    2 0.00
    1 0.00
    5 0.00
    3 0.00
    4 0.00
    8 0.00

Now, we delete node 2. This will replace node 2 with node 3, and then remove the old node 3 from the right subtree. We're left with the tree depicted on the right side of figure 4.24:
        6
       / \
      3   8
     / \
    1   5
       /
      4

    BSTREE> DELETE 2
    BSTREE> PREORDER
    6 0.00
    3 0.00
    1 0.00
    5 0.00
    4 0.00
    8 0.00
    BSTREE>

Finally, note that binary search trees can be bad if they are created with the keys already sorted. For example, look at the following tree:
    UNIX> bstree_test
    BSTREE> INSERT Cindy 1955
    BSTREE> INSERT Dave 1923
    BSTREE> INSERT Jim 1966
    BSTREE> INSERT Peg 1929
    BSTREE> INSERT Terry 1963
    BSTREE> INORDER
    Cindy 1955.00
    Dave 1923.00
    Jim 1966.00
    Peg 1929.00
    Terry 1963.00
    BSTREE> PREORDER
    Cindy 1955.00
    Dave 1923.00
    Jim 1966.00
    Peg 1929.00
    Terry 1963.00
    BSTREE>

As you see, the tree is unbalanced, and finding keys in this tree is as inefficient as finding keys in a linked list: O(n). We'll talk more about this later.
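One way to see the problem for yourself is to measure the height of the tree. The sketch below is not part of bstree.c: insert_key(), tree_height() and collect_preorder() are hypothetical names, and insert_key() is a stripped-down insert that stores the caller's key pointer without copying it and ignores values. When keys arrive in sorted order, every node becomes the right child of the previous one, so the height equals the number of keys, and the preorder traversal lists the keys in the same order as an inorder traversal would.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

typedef struct bstreenode {
  char *key;
  void *value;
  struct bstreenode *left_child;
  struct bstreenode *right_child;
} BstreeNode;

/* Hypothetical minimal insert. slot points at the pointer where the tree
   (or subtree) is rooted. Returns 1 if the key was inserted, and 0 on a
   duplicate key or if malloc() fails. */
int insert_key(BstreeNode **slot, char *key)
{
  BstreeNode *fresh;

  assert(slot != NULL && key != NULL);
  while (*slot != NULL) {                   /* walk down to an empty spot */
    int cmp = strcmp(key, (*slot)->key);
    if (cmp == 0) return 0;
    slot = (cmp < 0) ? &(*slot)->left_child : &(*slot)->right_child;
  }
  fresh = malloc(sizeof(BstreeNode));
  if (fresh == NULL) return 0;
  fresh->key = key;
  fresh->value = NULL;
  fresh->left_child = NULL;
  fresh->right_child = NULL;
  *slot = fresh;
  return 1;
}

/* Number of nodes on the longest path from n down to a leaf (0 if empty). */
int tree_height(const BstreeNode *n)
{
  int left, right;

  if (n == NULL) return 0;
  left = tree_height(n->left_child);
  right = tree_height(n->right_child);
  return 1 + (left > right ? left : right);
}

/* Appends the keys of the subtree to out[] in preorder (node, then left
   subtree, then right subtree), starting at index count. Returns the new
   count. The caller must make sure out[] is large enough. */
size_t collect_preorder(const BstreeNode *n, const char **out, size_t count)
{
  if (n == NULL) return count;
  out[count++] = n->key;
  count = collect_preorder(n->left_child, out, count);
  return collect_preorder(n->right_child, out, count);
}
```

Inserting Cindy, Dave, Jim, Peg and Terry in that order gives a tree of height 5. Inserting the same five keys in the order Jim, Dave, Peg, Cindy, Terry gives a tree of height 3.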