Our search trees will use character strings as keys and use strcmp() as our comparison function. This is not totally general, but it will serve as a nice introduction to search trees.
First, look at bstree.h. This defines two typedefs. The first is a node of the binary search tree. This has a search key, a value and pointers to left and right children:
typedef struct bstreenode {
  char *key;
  Jval val;
  struct bstreenode *left;
  struct bstreenode *right;
} BstreeNode;
The second typedef is for the tree header structure, which is
simply a pointer to the root of the tree:
typedef struct {
  BstreeNode *root;
} Bstree;
Now the invariant in a binary search tree is that given node n, all nodes reachable from n->left will have keys less than n->key, and all nodes reachable from n->right will have keys greater than n->key. In this implementation, we will not allow two nodes to have the same key.
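The invariant can be checked mechanically. Here is a minimal sketch, not part of bstree.c: SketchNode, is_bst_range() and is_bst() are names I made up for illustration, and the val field is left out because the invariant depends only on the keys. The idea is to carry a lower and an upper bound down the tree, so that every node is compared against all of its ancestors and not just its parent.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for BstreeNode; val is omitted because
   the invariant depends only on the keys. */
typedef struct sketchnode {
  char *key;
  struct sketchnode *left;
  struct sketchnode *right;
} SketchNode;

/* Returns 1 if every key in the subtree rooted at n lies strictly
   between lo and hi (a NULL bound means "no bound on that side"),
   and 0 otherwise.  The strict comparisons also reject duplicates. */
static int is_bst_range(const SketchNode *n, const char *lo, const char *hi)
{
  if (n == NULL) return 1;
  assert(n->key != NULL);
  if (lo != NULL && strcmp(n->key, lo) <= 0) return 0;
  if (hi != NULL && strcmp(n->key, hi) >= 0) return 0;
  return is_bst_range(n->left, lo, n->key) &&
         is_bst_range(n->right, n->key, hi);
}

/* Returns 1 if the tree rooted at root satisfies the invariant. */
int is_bst(const SketchNode *root)
{
  return is_bst_range(root, NULL, NULL);
}
```

Checking only each node against its immediate children is not enough: a key of 7 placed as the right child of 2, inside the left subtree of 6, passes that local test but violates the invariant.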
Bstree.h defines the following procedures:
I am not going to go over the implementation in detail. Look at the code yourself. New_bstree(), bstree_insert(), bstree_find(), bstree_find_max() and bstree_find_min() are all straightforward code that you should be able to look over and understand rather quickly. The only two tricky ones are free_bstree() and bstree_delete_node().
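To give a feel for what "straightforward" means here, the following is a rough sketch of how find and insert might look. It is not the code in bstree.c: the names SkNode, SkTree, sk_find() and sk_insert() are hypothetical, a double stands in for Jval, and the return conventions are my own assumptions.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

typedef struct sknode {
  char *key;
  double val;                 /* stand-in for Jval (an assumption) */
  struct sknode *left;
  struct sknode *right;
} SkNode;

typedef struct {
  SkNode *root;
} SkTree;

/* Walk down from the root, going left or right according to
   strcmp(), until the key is found or we fall off the tree. */
SkNode *sk_find(SkTree *t, const char *key)
{
  SkNode *n = t->root;
  int cmp;

  while (n != NULL) {
    cmp = strcmp(key, n->key);
    if (cmp == 0) return n;
    n = (cmp < 0) ? n->left : n->right;
  }
  return NULL;
}

/* Insert key/val.  Returns the new node, or NULL if the key is
   already in the tree or malloc() fails.  The key pointer is
   stored as given, not copied. */
SkNode *sk_insert(SkTree *t, char *key, double val)
{
  SkNode **link = &t->root;   /* the pointer that will receive the new node */
  SkNode *n;
  int cmp;

  assert(key != NULL);
  while (*link != NULL) {
    cmp = strcmp(key, (*link)->key);
    if (cmp == 0) return NULL;
    link = (cmp < 0) ? &(*link)->left : &(*link)->right;
  }
  n = malloc(sizeof(SkNode));
  if (n == NULL) return NULL;
  n->key = key;
  n->val = val;
  n->left = NULL;
  n->right = NULL;
  *link = n;
  return n;
}
```

Tracking a pointer to the link (rather than to the node) lets the same code handle insertion at the root and insertion below an existing node, with no special case.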
Free_bstree() does its work with a recursive helper, recursive_free_bstree(), which frees a node's left and right subtrees and then frees the node itself. Note, recursive_free_bstree() is defined to be static -- this means that it may only be used by procedures in bstree.c. This is a convenient thing to do when you need a procedure like recursive_free_bstree() in an implementation, but you don't want anyone else to call it. Here is the code for free_bstree():
static void recursive_free_bstree(BstreeNode *bn)
{
  if (bn == NULL) return;
  recursive_free_bstree(bn->left);    /* a NULL child returns immediately */
  recursive_free_bstree(bn->right);
  free(bn);
}
void free_bstree(Bstree *b)
{
  recursive_free_bstree(b->root);     /* free every node */
  free(b);                            /* then free the header */
}
Node deletion is pretty complex. Read over the book's description --
this is exactly how I have implemented it. I find the node's parent
with the find_parent() routine, figure out how I will delete
the node, and then delete it. If both children of the node are
non-NULL, then I find the smallest node in the subtree rooted
at the right child and use it to replace the node that is to be
deleted. I do this by saving the smallest node's key and val,
deleting that node recursively, and then storing the saved key
and val in the specified node.
Note, I also need special code for when the node to be deleted is the root of the tree. Here is the code:
/* Returns the parent of bn, or NULL if bn is the root. */
static BstreeNode *find_parent(Bstree *t, BstreeNode *bn)
{
  int cmp;
  BstreeNode *tmp;

  tmp = t->root;
  if (tmp == bn) return NULL;
  while (1) {
    cmp = strcmp(tmp->key, bn->key);
    if (cmp == 0) {
      fprintf(stderr, "Internal Error: two nodes with the same key (%s)\n",
              tmp->key);
      exit(1);
    }
    if (cmp > 0) {                  /* bn's key is smaller: go left */
      if (tmp->left == NULL) {
        fprintf(stderr, "Internal Error finding parent -- left child empty\n");
        exit(1);
      } else if (tmp->left == bn) {
        return tmp;
      } else {
        tmp = tmp->left;
      }
    } else {                        /* bn's key is larger: go right */
      if (tmp->right == NULL) {
        fprintf(stderr, "Internal Error finding parent -- right child empty\n");
        exit(1);
      } else if (tmp->right == bn) {
        return tmp;
      } else {
        tmp = tmp->right;
      }
    }
  }
}
void bstree_delete_node(Bstree *b, BstreeNode *bn)
{
  BstreeNode *parent, *replacement;
  char *key;
  Jval val;

  parent = find_parent(b, bn);
  if (bn->left != NULL && bn->right != NULL) {
    /* Two children: find the smallest node in the right subtree. */
    replacement = bn->right;
    while (replacement->left != NULL) {
      replacement = replacement->left;
    }
    /* Save its key and val, delete it, then overwrite bn. */
    key = replacement->key;
    val = replacement->val;
    bstree_delete_node(b, replacement);
    bn->key = key;
    bn->val = val;
    return;
  } else {
    /* Zero or one child: splice the child (or NULL) into bn's place. */
    if (bn->left == NULL && bn->right == NULL) {
      replacement = NULL;
    } else if (bn->left == NULL) {
      replacement = bn->right;
    } else {
      replacement = bn->left;
    }
    if (parent == NULL) {           /* bn is the root */
      b->root = replacement;
    } else if (parent->left == bn) {
      parent->left = replacement;
    } else {
      parent->right = replacement;
    }
    free(bn);
    return;
  }
}
Here are some examples. First, we'll create a tree that looks just like the left tree in figure 4.21. (Note, even though our keys are character strings, we can use them to sort single-digit numbers, because strcmp() orders the strings "1" through "9" the same way as the numbers. Below, I will use values of zero for everything.):
UNIX> bstree_test
BSTREE> INSERT 6 0
BSTREE> INSERT 2 0
BSTREE> INSERT 1 0
BSTREE> INSERT 4 0
BSTREE> INSERT 3 0
BSTREE> INSERT 8 0
BSTREE> INORDER
1 0.00
2 0.00
3 0.00
4 0.00
6 0.00
8 0.00
BSTREE> PREORDER
6 0.00
2 0.00
1 0.00
4 0.00
3 0.00
8 0.00
BSTREE>
Note, the preorder traversal shows that the tree is just as
depicted in the left side of figure 4.21. We could also do
the same with a post-order traversal. Frankly, I find the
preorder traversal easier to understand:
BSTREE> POSTORDER
1 0.00
3 0.00
4 0.00
2 0.00
8 0.00
6 0.00
BSTREE>
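The three traversals differ only in where the node itself is visited relative to its two subtrees. The sketch below is an illustration under my own assumptions, not the bstree_test code: TNode and the three function names are hypothetical, and instead of printing, each routine appends keys to a caller-supplied array so the visiting order can be inspected.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for BstreeNode; val is omitted. */
typedef struct tnode {
  char *key;
  struct tnode *left;
  struct tnode *right;
} TNode;

/* Each routine appends the keys of the subtree rooted at t to out[],
   starting at index n, and returns the new count.  The caller must
   make out[] large enough to hold every key. */

/* Preorder: node, then left subtree, then right subtree. */
size_t preorder(const TNode *t, const char **out, size_t n)
{
  assert(out != NULL);
  if (t == NULL) return n;
  out[n++] = t->key;
  n = preorder(t->left, out, n);
  return preorder(t->right, out, n);
}

/* Inorder: left subtree, then node, then right subtree.
   On a binary search tree this visits the keys in sorted order. */
size_t inorder(const TNode *t, const char **out, size_t n)
{
  assert(out != NULL);
  if (t == NULL) return n;
  n = inorder(t->left, out, n);
  out[n++] = t->key;
  return inorder(t->right, out, n);
}

/* Postorder: left subtree, then right subtree, then node. */
size_t postorder(const TNode *t, const char **out, size_t n)
{
  assert(out != NULL);
  if (t == NULL) return n;
  n = postorder(t->left, out, n);
  n = postorder(t->right, out, n);
  out[n++] = t->key;
  return n;
}
```

The preorder listing is easy to map back to the picture because the root always comes first, followed by everything in its left subtree and then everything in its right subtree.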
Now, as in figure 4.21, we insert 5 into the tree. Note
again that it looks like figure 4.21:
BSTREE> INSERT 5 0
BSTREE> PREORDER
6 0.00
2 0.00
1 0.00
4 0.00
3 0.00
5 0.00
8 0.00
If we delete node 5, then again we have the left side of figure
4.21. This is also the same as the left side of figure 4.23:
BSTREE> DELETE 5
BSTREE> PREORDER
6 0.00
2 0.00
1 0.00
4 0.00
3 0.00
8 0.00
When we delete node 4, it will replace the right child of node
2 with node 3, as depicted in the picture:
BSTREE> DELETE 4
BSTREE> PREORDER
6 0.00
2 0.00
1 0.00
3 0.00
8 0.00
Now, to show deletion of a node with two children, I'll delete
node 3 and then add nodes 5, 3 and 4. This will give us
the tree in the left side of figure 4.24:
BSTREE> DELETE 3
BSTREE> INSERT 5 0
BSTREE> INSERT 3 0
BSTREE> INSERT 4 0
BSTREE> PREORDER
6 0.00
2 0.00
1 0.00
5 0.00
3 0.00
4 0.00
8 0.00
Now, we delete node 2. This will copy node 3's key and val into
node 2, and delete the original node 3. We're left with the tree
depicted on the right side of figure 4.24:
BSTREE> DELETE 2
BSTREE> PREORDER
6 0.00
3 0.00
1 0.00
5 0.00
4 0.00
8 0.00
BSTREE>
Finally, note that binary search trees can perform badly if the
keys are inserted in already-sorted order. For example, look at the
following tree:
UNIX> bstree_test
BSTREE> INSERT Cindy 1955
BSTREE> INSERT Dave 1923
BSTREE> INSERT Jim 1966
BSTREE> INSERT Peg 1929
BSTREE> INSERT Terry 1963
BSTREE> INORDER
Cindy 1955.00
Dave 1923.00
Jim 1966.00
Peg 1929.00
Terry 1963.00
BSTREE> PREORDER
Cindy 1955.00
Dave 1923.00
Jim 1966.00
Peg 1929.00
Terry 1963.00
BSTREE>
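As you see, the tree is unbalanced: every node has only a right
child, which is why the preorder traversal matches the inorder
traversal. Finding keys in this tree is as inefficient as finding
keys in a linked list: O(n).
We'll talk more about this later.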
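One way to see the problem is to measure the height of the tree. The sketch below uses hypothetical names (HNode, h_insert(), h_height()) and a simplified insert of my own, not the bstree.c code. Inserting the five names above in sorted order gives a tree of height 5, one level per key, while inserting the same names in the order Jim, Dave, Peg, Cindy, Terry gives a tree of height 3.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for BstreeNode; val is omitted. */
typedef struct hnode {
  char *key;
  struct hnode *left;
  struct hnode *right;
} HNode;

/* Insert key into the tree whose root pointer is *link.
   Returns 1 on success, 0 on a duplicate key or if malloc() fails.
   The key pointer is stored as given, not copied. */
int h_insert(HNode **link, char *key)
{
  int cmp;

  assert(key != NULL);
  while (*link != NULL) {
    cmp = strcmp(key, (*link)->key);
    if (cmp == 0) return 0;
    link = (cmp < 0) ? &(*link)->left : &(*link)->right;
  }
  *link = malloc(sizeof(HNode));
  if (*link == NULL) return 0;
  (*link)->key = key;
  (*link)->left = NULL;
  (*link)->right = NULL;
  return 1;
}

/* Number of nodes on the longest path from n down to a leaf
   (0 for an empty tree). */
int h_height(const HNode *n)
{
  int l, r;

  if (n == NULL) return 0;
  l = h_height(n->left);
  r = h_height(n->right);
  return 1 + (l > r ? l : r);
}
```

A find has to walk at most one node per level, so the height is the worst-case cost of a lookup. With sorted insertions the height equals the number of keys.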