Our search trees will use character strings as keys and use strcmp() as our comparison function. This is not totally general, but it will serve as a nice introduction to search trees.
First, look at bstree.h. This defines two typedefs. The first is a node of the binary search tree. This has a search key, a value and pointers to left and right children:

    typedef struct bstreenode {
      char *key;
      Jval val;
      struct bstreenode *left;
      struct bstreenode *right;
    } BstreeNode;

The second typedef is for the tree header structure. All it is is a pointer to the root of the tree:

    typedef struct {
      BstreeNode *root;
    } Bstree;

Now the invariant in a binary search tree is that given node n, all nodes reachable from n->left will have keys less than n->key, and all nodes reachable from n->right will have keys greater than n->key. In this implementation, we will not allow two nodes to have the same key.
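
The invariant can be checked mechanically, which is handy when debugging. The following is only a sketch and is not part of bstree.h: the Node type is a simplified stand-in for BstreeNode (the val field is dropped), and is_bst() and is_bst_in_range() are hypothetical names. Each recursive call narrows the range of keys that the subtree is allowed to contain.

```c
#include <stddef.h>
#include <string.h>

/* Simplified stand-in for BstreeNode (assumption: val omitted). */
typedef struct node {
  char *key;
  struct node *left;
  struct node *right;
} Node;

/* Returns 1 if every key in the subtree rooted at n is strictly greater
   than lo and strictly less than hi, and the subtree obeys the invariant.
   A NULL bound means "no bound on that side". */
static int is_bst_in_range(const Node *n, const char *lo, const char *hi)
{
  if (n == NULL) return 1;
  if (lo != NULL && strcmp(n->key, lo) <= 0) return 0;
  if (hi != NULL && strcmp(n->key, hi) >= 0) return 0;
  return is_bst_in_range(n->left, lo, n->key) &&
         is_bst_in_range(n->right, n->key, hi);
}

/* Returns 1 if the whole tree satisfies the invariant, 0 otherwise. */
int is_bst(const Node *root)
{
  return is_bst_in_range(root, NULL, NULL);
}
```

Because the comparisons are strict, a tree containing two equal keys is rejected as well, which matches the no-duplicates rule above.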
Bstree.h also declares the procedures that operate on these two types.
I am not going to go over the implementation in detail. Look at the code yourself. The procedures new_bstree(), bstree_insert(), bstree_find(), bstree_find_max() and bstree_find_min() are all straightforward code that you should be able to look over and understand rather quickly. The only two tricky ones are free_bstree() and bstree_delete_node().
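
To give a feel for the straightforward procedures, here is a minimal sketch of lookup and insertion. It is not the code in bstree.c: the signatures of the real bstree_find() and bstree_insert() are not shown in these notes, so the names sk_find() and sk_insert(), the SkNode and SkTree types, the int value (standing in for Jval) and the choice to return NULL on a duplicate key are all assumptions made for illustration.

```c
#include <stdlib.h>
#include <string.h>

/* Stand-ins for BstreeNode and Bstree (assumption: int replaces Jval). */
typedef struct sknode {
  char *key;
  int val;
  struct sknode *left;
  struct sknode *right;
} SkNode;

typedef struct {
  SkNode *root;
} SkTree;

/* Returns the node holding key, or NULL if the key is not in the tree. */
SkNode *sk_find(SkTree *t, const char *key)
{
  SkNode *n = t->root;

  while (n != NULL) {
    int cmp = strcmp(key, n->key);
    if (cmp == 0) return n;
    n = (cmp < 0) ? n->left : n->right;
  }
  return NULL;
}

/* Inserts key/val and returns the new node. Returns NULL if the key is
   already present or if malloc() fails. The tree stores the key pointer
   itself, not a copy of the string. */
SkNode *sk_insert(SkTree *t, char *key, int val)
{
  SkNode **link = &t->root;   /* the pointer that will point at the new node */
  SkNode *n;

  while (*link != NULL) {
    int cmp = strcmp(key, (*link)->key);
    if (cmp == 0) return NULL;
    link = (cmp < 0) ? &(*link)->left : &(*link)->right;
  }

  n = malloc(sizeof(SkNode));
  if (n == NULL) return NULL;
  n->key = key;
  n->val = val;
  n->left = NULL;
  n->right = NULL;
  *link = n;
  return n;
}
```

Walking a pointer to the link, rather than a pointer to the node, means the empty-tree case needs no special handling: the first insertion simply fills in t->root.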
free_bstree() does its work with a recursive helper, recursive_free_bstree(), which frees a node's left and right subtrees and then frees the node itself. Note, recursive_free_bstree() is defined to be static -- this means that it may only be used by procedures in bstree.c. This is a convenient thing to do when you need a procedure like recursive_free_bstree() in an implementation, but you don't want anyone else to call it. Here is the code for free_bstree():

    static void recursive_free_bstree(BstreeNode *bn)
    {
      if (bn == NULL) return;
      if (bn->left != NULL) recursive_free_bstree(bn->left);
      if (bn->right != NULL) recursive_free_bstree(bn->right);
      free(bn);      /* frees the node only, not its key or val */
      return;
    }

    void free_bstree(Bstree *b)
    {
      recursive_free_bstree(b->root);
      free(b);
      return;
    }

Node deletion is pretty complex. Read over the book's description -- this is exactly how I have implemented it. I find the node's parent with the find_parent() routine, figure out how I will delete the node, and then delete it. If both children of the node are non-NULL, then I find the smallest node in the subtree rooted at the right child and use it to replace the node that is to be deleted. I do this by saving its key and val, deleting it recursively, and then replacing the key and val of the specified node.
Note, I also need special code for when the node to be deleted is the root of the tree. Here is the code:

    /* Returns the parent of bn, or NULL if bn is the root of the tree. */
    static BstreeNode *find_parent(Bstree *t, BstreeNode *bn)
    {
      int cmp;
      BstreeNode *tmp;

      tmp = t->root;
      if (tmp == bn) return NULL;

      while (1) {
        cmp = strcmp(tmp->key, bn->key);
        if (cmp == 0) {
          fprintf(stderr, "Internal Error: two nodes with the same key (%s)\n", tmp->key);
          exit(1);
        }
        if (cmp > 0) {                 /* bn belongs in tmp's left subtree */
          if (tmp->left == NULL) {
            fprintf(stderr, "Internal Error finding parent -- left child empty\n");
            exit(1);
          } else if (tmp->left == bn) {
            return tmp;
          } else {
            tmp = tmp->left;
          }
        } else {                       /* bn belongs in tmp's right subtree */
          if (tmp->right == NULL) {
            fprintf(stderr, "Internal Error finding parent -- right child empty\n");
            exit(1);
          } else if (tmp->right == bn) {
            return tmp;
          } else {
            tmp = tmp->right;
          }
        }
      }
    }

    void bstree_delete_node(Bstree *b, BstreeNode *bn)
    {
      BstreeNode *parent, *replacement;
      char *key;
      Jval val;

      parent = find_parent(b, bn);

      if (bn->left != NULL && bn->right != NULL) {
        /* Two children: find the smallest node in the right subtree. */
        replacement = bn->right;
        while (replacement->left != NULL) {
          replacement = replacement->left;
        }
        /* Save its key and val, delete it, then overwrite bn with them. */
        key = replacement->key;
        val = replacement->val;
        bstree_delete_node(b, replacement);
        bn->key = key;
        bn->val = val;
        return;
      } else {
        /* Zero or one child: the child (or NULL) takes bn's place. */
        if (bn->left == NULL && bn->right == NULL) {
          replacement = NULL;
        } else if (bn->left == NULL) {
          replacement = bn->right;
        } else {
          replacement = bn->left;
        }
        if (parent == NULL) {          /* bn is the root */
          b->root = replacement;
        } else if (parent->left == bn) {
          parent->left = replacement;
        } else {
          parent->right = replacement;
        }
        free(bn);
        return;
      }
    }

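
The examples in these notes inspect the tree with in-order, pre-order and post-order traversals, whose code is not shown here. The following sketch illustrates the three orders under some assumptions: TNode is a simplified stand-in for BstreeNode, and instead of printing, each hypothetical function appends the keys it visits to a caller-supplied string so that the result is easy to compare.

```c
#include <stddef.h>
#include <string.h>

/* Simplified stand-in for BstreeNode (assumption: val omitted). */
typedef struct tnode {
  char *key;
  struct tnode *left;
  struct tnode *right;
} TNode;

/* Each function appends the keys of the subtree rooted at n to out, each
   key followed by one space. out must already hold a NUL-terminated
   string and be large enough for the result. */

/* Pre-order: the node, then its left subtree, then its right subtree. */
void preorder(const TNode *n, char *out)
{
  if (n == NULL) return;
  strcat(out, n->key);
  strcat(out, " ");
  preorder(n->left, out);
  preorder(n->right, out);
}

/* In-order: left subtree, the node, right subtree. In a binary search
   tree this visits the keys in sorted order. */
void inorder(const TNode *n, char *out)
{
  if (n == NULL) return;
  inorder(n->left, out);
  strcat(out, n->key);
  strcat(out, " ");
  inorder(n->right, out);
}

/* Post-order: left subtree, right subtree, then the node. */
void postorder(const TNode *n, char *out)
{
  if (n == NULL) return;
  postorder(n->left, out);
  postorder(n->right, out);
  strcat(out, n->key);
  strcat(out, " ");
}
```

The pre-order listing is the one that reveals the shape of the tree most directly: the root comes first, followed by everything in its left subtree and then everything in its right subtree.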
Here are some examples. First, we'll create a tree that looks just like the left tree in figure 4.21. (Note, even though the keys of our search trees are character strings, we can use them to sort single-digit numbers. Below, I will use values of zero for everything.)

    UNIX> bstree_test
    BSTREE> INSERT 6 0
    BSTREE> INSERT 2 0
    BSTREE> INSERT 1 0
    BSTREE> INSERT 4 0
    BSTREE> INSERT 3 0
    BSTREE> INSERT 8 0
    BSTREE> INORDER
    1 0.00
    2 0.00
    3 0.00
    4 0.00
    6 0.00
    8 0.00
    BSTREE> PREORDER
    6 0.00
    2 0.00
    1 0.00
    4 0.00
    3 0.00
    8 0.00
    BSTREE>

Note, the preorder traversal shows that the tree is just as depicted in the left side of figure 4.21. We could also do the same with a post-order traversal. Frankly, I find the preorder traversal easier to understand:

    BSTREE> POSTORDER
    1 0.00
    3 0.00
    4 0.00
    2 0.00
    8 0.00
    6 0.00
    BSTREE>

Now, as in figure 4.21, we insert 5 into the tree. Note again that it looks like figure 4.21:

    BSTREE> INSERT 5 0
    BSTREE> PREORDER
    6 0.00
    2 0.00
    1 0.00
    4 0.00
    3 0.00
    5 0.00
    8 0.00

If we delete node 5, then again we have the left side of figure 4.21. This is also the same as the left side of figure 4.23:

    BSTREE> DELETE 5
    BSTREE> PREORDER
    6 0.00
    2 0.00
    1 0.00
    4 0.00
    3 0.00
    8 0.00

When we delete node 4, it will replace the right child of node 2 with node 3, as depicted in the picture:

    BSTREE> DELETE 4
    BSTREE> PREORDER
    6 0.00
    2 0.00
    1 0.00
    3 0.00
    8 0.00

Now, to show deletion of a node with two children, I'll delete node 3 and then add nodes 5, 3 and 4. This will give us the tree in the left side of figure 4.24:

    BSTREE> DELETE 3
    BSTREE> INSERT 5 0
    BSTREE> INSERT 3 0
    BSTREE> INSERT 4 0
    BSTREE> PREORDER
    6 0.00
    2 0.00
    1 0.00
    5 0.00
    3 0.00
    4 0.00
    8 0.00

Now, we delete node 2. Node 3 is the smallest node in node 2's right subtree, so this will overwrite node 2 with the key and val of node 3, and then delete the original node 3. We're left with the tree depicted on the right side of figure 4.24:

    BSTREE> DELETE 2
    BSTREE> PREORDER
    6 0.00
    3 0.00
    1 0.00
    5 0.00
    4 0.00
    8 0.00
    BSTREE>

Finally, note that binary search trees can perform badly if they are created with the keys already sorted. For example, look at the following tree:

    UNIX> bstree_test
    BSTREE> INSERT Cindy 1955
    BSTREE> INSERT Dave 1923
    BSTREE> INSERT Jim 1966
    BSTREE> INSERT Peg 1929
    BSTREE> INSERT Terry 1963
    BSTREE> INORDER
    Cindy 1955.00
    Dave 1923.00
    Jim 1966.00
    Peg 1929.00
    Terry 1963.00
    BSTREE> PREORDER
    Cindy 1955.00
    Dave 1923.00
    Jim 1966.00
    Peg 1929.00
    Terry 1963.00
    BSTREE>

As you see, the in-order and pre-order traversals are identical: every node has only a right child, so the tree is unbalanced, and finding keys in this tree is as inefficient as finding keys in a linked list: O(n). We'll talk more about this later.
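
One way to see the problem numerically is to measure the height of the tree. The sketch below is an illustration only and does not use bstree.h: HNode, h_insert() and h_height() are hypothetical names, and the insertion routine is a plain unbalanced insert that ignores duplicate keys.

```c
#include <stdlib.h>
#include <string.h>

/* Simplified stand-in for BstreeNode (assumption: val omitted). */
typedef struct hnode {
  char *key;
  struct hnode *left;
  struct hnode *right;
} HNode;

/* Plain unbalanced insertion. Returns the root of the subtree after the
   insertion. A key that is already present is ignored. */
HNode *h_insert(HNode *root, char *key)
{
  int cmp;

  if (root == NULL) {
    HNode *n = malloc(sizeof(HNode));
    if (n == NULL) exit(1);
    n->key = key;
    n->left = NULL;
    n->right = NULL;
    return n;
  }
  cmp = strcmp(key, root->key);
  if (cmp < 0) {
    root->left = h_insert(root->left, key);
  } else if (cmp > 0) {
    root->right = h_insert(root->right, key);
  }
  return root;
}

/* Number of nodes on the longest path from n down to a leaf
   (0 for an empty tree). */
int h_height(const HNode *n)
{
  int l, r;

  if (n == NULL) return 0;
  l = h_height(n->left);
  r = h_height(n->right);
  return 1 + (l > r ? l : r);
}
```

Inserting Cindy, Dave, Jim, Peg and Terry in that order gives a height of 5, one level per key. Inserting the same five keys in the order Jim, Dave, Peg, Cindy, Terry gives a height of 3, so a search never has to look at more than three nodes.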