Our search trees will use character strings as keys and use strcmp() as our comparison function. This is not totally general, but it will serve as a nice introduction to search trees.
First, look at bstree.c. This defines two typedefs (remember that we are using information hiding so we need to put the typedefs in the .c file and not the .h file). The first is a node of the binary search tree. This has a search key, a value and pointers to left and right children:
typedef struct bstreenode {
  char *key;
  void *value;
  struct bstreenode *left_child;
  struct bstreenode *right_child;
} BstreeNode;
The second typedef is for the tree container structure. All it
holds is a pointer to the root of the tree:
typedef struct {
  BstreeNode *root;
} Bstree;
Now the invariant in a binary search tree is that given node n, all nodes reachable from n->left_child will have keys less than n->key, and all nodes reachable from n->right_child will have keys greater than n->key. In this implementation, we will not allow two nodes to have the same key.
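To make the invariant concrete, here is a minimal sketch of a checker. It is not part of bstree.c: the names check_range() and is_valid_bstree() are made up for illustration, and the node typedef is repeated only so that the fragment stands on its own.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

typedef struct bstreenode {
  char *key;
  void *value;
  struct bstreenode *left_child;
  struct bstreenode *right_child;
} BstreeNode;

/* Hypothetical helper: returns 1 if every key in the subtree rooted at n
   lies strictly between lo and hi. A NULL bound means "no bound". */
static int check_range(const BstreeNode *n, const char *lo, const char *hi)
{
  if (n == NULL) return 1;                         /* empty subtree is fine */
  if (lo != NULL && strcmp(n->key, lo) <= 0) return 0;
  if (hi != NULL && strcmp(n->key, hi) >= 0) return 0;
  /* Keys on the left must stay below n->key, keys on the right above it. */
  return check_range(n->left_child, lo, n->key) &&
         check_range(n->right_child, n->key, hi);
}

/* Returns 1 if the tree rooted at root satisfies the invariant, 0 if not. */
int is_valid_bstree(const BstreeNode *root)
{
  return check_range(root, NULL, NULL);
}
```

Because the comparisons are strict, a tree holding the same key twice fails the check, which matches the rule that duplicate keys are not allowed.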
Bstree.h defines the following procedures:
I am not going to go over the implementation in detail here, but I will discuss it in class. Look at the code yourself. new_bstree(), bstree_insert(), bstree_find(), bstree_find_max() and bstree_find_min() are all straightforward, and you should be able to look them over and understand them rather quickly. bstree_insert() and bstree_find() use recursion, which means that they call themselves.
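To give a feel for the recursion, here is a rough sketch of how a recursive find and insert could look. This is not the code in bstree.c: find_sketch() and insert_sketch() are names made up for illustration, and I am assuming that the tree stores the caller's key pointer without copying it and that inserting an existing key just overwrites its value.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

typedef struct bstreenode {
  char *key;
  void *value;
  struct bstreenode *left_child;
  struct bstreenode *right_child;
} BstreeNode;

/* Recursive lookup: returns the node holding key, or NULL if it is absent. */
BstreeNode *find_sketch(BstreeNode *node, const char *key)
{
  int cmp;
  if (node == NULL) return NULL;
  cmp = strcmp(key, node->key);
  if (cmp < 0) return find_sketch(node->left_child, key);
  if (cmp > 0) return find_sketch(node->right_child, key);
  return node;
}

/* Recursive insert: returns the (possibly new) root of the subtree.
   Assumption: the key pointer is stored as-is, and a duplicate key
   overwrites the old value instead of creating a second node. */
BstreeNode *insert_sketch(BstreeNode *node, char *key, void *value)
{
  int cmp;
  if (node == NULL) {
    node = malloc(sizeof(BstreeNode));
    assert(node != NULL);
    node->key = key;
    node->value = value;
    node->left_child = NULL;
    node->right_child = NULL;
    return node;
  }
  cmp = strcmp(key, node->key);
  if (cmp < 0)
    node->left_child = insert_sketch(node->left_child, key, value);
  else if (cmp > 0)
    node->right_child = insert_sketch(node->right_child, key, value);
  else
    node->value = value;
  return node;
}
```

A caller would start with an empty tree (a NULL root) and write root = insert_sketch(root, key, value) for each key, much as bstree_delete() reassigns b->root later in these notes.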
The only two tricky ones are free_bstree() and bstree_delete().
free_bstree() does its work with a recursive helper, recursive_free_bstree(), which frees a node's left and right subtrees and then frees the node itself. Note that recursive_free_bstree() is declared static, which means that it may only be used by procedures in bstree.c. This is a convenient thing to do when you need a procedure like recursive_free_bstree() in an implementation, but you don't want anyone else to call it. Here is the code for free_bstree():
static void recursive_free_bstree(BstreeNode *bn)
{
  if (bn == NULL) return;
  /* free both subtrees before freeing the node that points to them */
  if (bn->left_child != NULL) recursive_free_bstree(bn->left_child);
  if (bn->right_child != NULL) recursive_free_bstree(bn->right_child);
  free(bn);
  return;
}
void free_bstree(void *tree)
{
  Bstree *b = (Bstree *)tree;
  recursive_free_bstree(b->root);
  free(b);
  return;
}
Node deletion is pretty complex. One way to do it is
lazy deletion, which marks a node as deleted but does not
remove it from the tree. If the number of deleted nodes is roughly equal to
the number of live nodes in the tree, then the running time of the algorithms
is not affected, as long as the tree stays balanced. The reason is that
the tree will hold roughly twice as many nodes as it otherwise
would, and doubling the number of nodes in a balanced tree
increases its depth by at most 1. Hence instead
of our operations requiring log n time, they will require
1 + log n time, which is still O(log n).
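Lazy deletion is not what bstree.c does, but a minimal sketch may make the idea clearer. Everything here is hypothetical: the LazyNode type is the node from above plus a deleted flag, and lazy_locate(), lazy_find() and lazy_delete() are names I made up.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical node layout: the usual fields plus a deleted flag. */
typedef struct lazynode {
  char *key;
  void *value;
  int deleted;                   /* nonzero once the key is lazily deleted */
  struct lazynode *left_child;
  struct lazynode *right_child;
} LazyNode;

/* Locate the node with the given key, ignoring the deleted flag.
   Deleted nodes still steer the search left or right. */
static LazyNode *lazy_locate(LazyNode *node, const char *key)
{
  while (node != NULL) {
    int cmp = strcmp(key, node->key);
    if (cmp == 0) return node;
    node = (cmp < 0) ? node->left_child : node->right_child;
  }
  return NULL;
}

/* Lookup that treats deleted nodes as absent. */
LazyNode *lazy_find(LazyNode *root, const char *key)
{
  LazyNode *n = lazy_locate(root, key);
  return (n != NULL && !n->deleted) ? n : NULL;
}

/* Mark the key as deleted. Returns 1 if a live node was marked, else 0. */
int lazy_delete(LazyNode *root, const char *key)
{
  LazyNode *n = lazy_find(root, key);
  if (n == NULL) return 0;
  n->deleted = 1;
  return 1;
}
```

Note that the shape of the tree never changes, which is why a deleted node costs nothing more than the extra depth discussed above. A fuller version would also let an insert of the same key clear the flag again.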
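If the number of deletes is much larger than the number of nodes left in the tree, then the running time could be affected and it is better to actually remove the node from the tree. There are three cases to consider: the node is a leaf, the node has one child, or the node has two children. Start with the following tree: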
      6
     / \
    2   8
   / \
  1   4
     /
    3
If we delete 1, then we have the tree:
      6
     / \
    2   8
     \
      4
     /
    3
If we then delete 4, which has only one child, that child (3) takes its place:
      6
     / \
    2   8
     \
      3
The two-child case is the hard one. Consider the following tree:
      6
     / \
    2   8
   / \
  1   4
     / \
    3   5
If we delete 2 from this tree, then we will promote 3 to replace 2, since
3 is the smallest key in 2's right subtree:
      6
     / \
    3   8
   / \
  1   4
       \
        5
Here is my code for deleting a node from a tree:
// delete a node with the indicated key from the tree.
// Returns the (possibly new) root of the subtree that was rooted at node.
static BstreeNode *bstree_delete_helper(BstreeNode *node, char *key) {
  if (node == NULL)
    return NULL;   // Item not found; do nothing
  if (strcmp(key, node->key) < 0)
    node->left_child = bstree_delete_helper(node->left_child, key);
  else if (strcmp(key, node->key) > 0)
    node->right_child = bstree_delete_helper(node->right_child, key);
  // else key found so do deletion
  else if ((node->left_child != NULL) && (node->right_child != NULL)) {
    // two child case--find smallest element in right subtree and promote
    // it to this node. Then recursively remove the smallest element in the
    // right subtree
    BstreeNode *smallest = bstree_find_min_helper(node->right_child);
    node->key = smallest->key;
    node->value = smallest->value;
    node->right_child = bstree_delete_helper(node->right_child, smallest->key);
  }
  else {   // either no children or one child
    BstreeNode *saveNode = node;
    if (node->left_child != NULL)
      node = node->left_child;
    else
      node = node->right_child;   // works even if right_child is NULL
    free(saveNode);
  }
  return node;
}
/* delete the given key from the tree */
void bstree_delete(void *tree, char *key) {
  Bstree *b = (Bstree *)tree;
  b->root = bstree_delete_helper(b->root, key);
}
Note that I used a recursive helper function to perform the deletion, and I
declared it static because I want to make it private to this file.
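The delete code calls bstree_find_min_helper(), which is not shown above. Here is a minimal sketch of what such a helper could look like, assuming it simply returns the leftmost node of a subtree. The name find_min_sketch() is mine, and the real helper in bstree.c may be written differently (recursively, for instance).

```c
#include <assert.h>
#include <stddef.h>

typedef struct bstreenode {
  char *key;
  void *value;
  struct bstreenode *left_child;
  struct bstreenode *right_child;
} BstreeNode;

/* Returns the node with the smallest key in the subtree rooted at node,
   or NULL if the subtree is empty. By the search tree invariant, the
   smallest key is found by following left_child pointers until they end. */
BstreeNode *find_min_sketch(BstreeNode *node)
{
  if (node == NULL) return NULL;
  while (node->left_child != NULL)
    node = node->left_child;
  return node;
}
```

The node returned never has a left child, so when the delete code removes it from the right subtree it always lands in the easy "no children or one child" case.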
Here are some examples. First, we'll create a tree that looks as follows:
      6
     / \
    2   8
   / \
  1   4
     /
    3
(note, even though
the keys in our search trees are character strings, we can use them
to sort single-digit numbers. Below, I will use values of
zero for everything):
UNIX> bstree_test
BSTREE> INSERT 6 0
BSTREE> INSERT 2 0
BSTREE> INSERT 1 0
BSTREE> INSERT 4 0
BSTREE> INSERT 3 0
BSTREE> INSERT 8 0
BSTREE> INORDER
1 0.00
2 0.00
3 0.00
4 0.00
6 0.00
8 0.00
BSTREE> PREORDER
6 0.00
2 0.00
1 0.00
4 0.00
3 0.00
8 0.00
BSTREE>
Note, the preorder traversal shows that the tree is just as
depicted in the above figure. We could also do
the same with a post-order traversal. Frankly, I find the
preorder traversal easier to understand:
BSTREE> POSTORDER
1 0.00
3 0.00
4 0.00
2 0.00
8 0.00
6 0.00
BSTREE>
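For reference, here is a minimal sketch of the three traversals. I have not shown how bstree_test implements them, so treat this as one possible way to do it: the VisitFn callback type and the append_key() visitor are made up for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

typedef struct bstreenode {
  char *key;
  void *value;
  struct bstreenode *left_child;
  struct bstreenode *right_child;
} BstreeNode;

/* Hypothetical callback type: called once for each node visited. */
typedef void (*VisitFn)(const BstreeNode *node, void *arg);

/* Preorder: the node first, then its left subtree, then its right subtree. */
void preorder(const BstreeNode *n, VisitFn visit, void *arg)
{
  if (n == NULL) return;
  visit(n, arg);
  preorder(n->left_child, visit, arg);
  preorder(n->right_child, visit, arg);
}

/* Inorder: left subtree, node, right subtree. Visits keys in sorted order. */
void inorder(const BstreeNode *n, VisitFn visit, void *arg)
{
  if (n == NULL) return;
  inorder(n->left_child, visit, arg);
  visit(n, arg);
  inorder(n->right_child, visit, arg);
}

/* Postorder: both subtrees first, then the node. */
void postorder(const BstreeNode *n, VisitFn visit, void *arg)
{
  if (n == NULL) return;
  postorder(n->left_child, visit, arg);
  postorder(n->right_child, visit, arg);
  visit(n, arg);
}

/* Example visitor: appends "key " to the string buffer passed in arg.
   The caller must supply a buffer big enough to hold all of the keys. */
void append_key(const BstreeNode *node, void *arg)
{
  char *buf = arg;
  strcat(buf, node->key);
  strcat(buf, " ");
}
```

The only difference among the three is where the visit happens relative to the two recursive calls. Postorder is also the order that recursive_free_bstree() uses, since a node must not be freed before its children.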
Now we insert 5 into the tree. Here's our updated figure, followed by
a preorder traversal:
      6
     / \
    2   8
   / \
  1   4
     / \
    3   5
BSTREE> INSERT 5 0
BSTREE> PREORDER
6 0.00
2 0.00
1 0.00
4 0.00
3 0.00
5 0.00
8 0.00
If we delete node 5, then we again have the original figure.
BSTREE> DELETE 5
BSTREE> PREORDER
6 0.00
2 0.00
1 0.00
4 0.00
3 0.00
8 0.00
When we delete node 4, node 3 replaces it as the right child of node
2, as depicted in the picture:
      6
     / \
    2   8
   / \
  1   3
BSTREE> DELETE 4
BSTREE> PREORDER
6 0.00
2 0.00
1 0.00
3 0.00
8 0.00
Now, to show deletion of a node with two children, I'll delete
node 3 and then add nodes 5, 3 and 4. This will give us
the following tree:
      6
     / \
    2   8
   / \
  1   5
     /
    3
     \
      4
BSTREE> DELETE 3
BSTREE> INSERT 5 0
BSTREE> INSERT 3 0
BSTREE> INSERT 4 0
BSTREE> PREORDER
6 0.00
2 0.00
1 0.00
5 0.00
3 0.00
4 0.00
8 0.00
Now, we delete node 2. This will replace node 2 with node
3, and delete the old node 3. We're left with the tree
depicted on the right side of figure 4.24:
      6
     / \
    3   8
   / \
  1   5
     /
    4
BSTREE> DELETE 2
BSTREE> PREORDER
6 0.00
3 0.00
1 0.00
5 0.00
4 0.00
8 0.00
BSTREE>
Finally, note that binary search trees can perform badly if the keys
are inserted in sorted order. For example, look at the
following tree:
UNIX> bstree_test
BSTREE> INSERT Cindy 1955
BSTREE> INSERT Dave 1923
BSTREE> INSERT Jim 1966
BSTREE> INSERT Peg 1929
BSTREE> INSERT Terry 1963
BSTREE> INORDER
Cindy 1955.00
Dave 1923.00
Jim 1966.00
Peg 1929.00
Terry 1963.00
BSTREE> PREORDER
Cindy 1955.00
Dave 1923.00
Jim 1966.00
Peg 1929.00
Terry 1963.00
BSTREE>
As you see, the tree is unbalanced, and finding keys in this tree
is as inefficient as finding keys in a linked list: O(n).
We'll talk more about this later.
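To see the effect in numbers, here is a small sketch that measures the height of a tree. It is not part of bstree.c: insert_key_sketch() and height() are names made up for illustration, the insert ignores values and duplicate keys, and height counts nodes on the longest path from the root (so an empty tree has height 0).

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

typedef struct bstreenode {
  char *key;
  void *value;
  struct bstreenode *left_child;
  struct bstreenode *right_child;
} BstreeNode;

/* Minimal insert sketch. Assumptions: the key pointer is stored without
   copying, values are left NULL, and duplicate keys are ignored. */
BstreeNode *insert_key_sketch(BstreeNode *node, char *key)
{
  int cmp;
  if (node == NULL) {
    node = malloc(sizeof(BstreeNode));
    assert(node != NULL);
    node->key = key;
    node->value = NULL;
    node->left_child = NULL;
    node->right_child = NULL;
    return node;
  }
  cmp = strcmp(key, node->key);
  if (cmp < 0)
    node->left_child = insert_key_sketch(node->left_child, key);
  else if (cmp > 0)
    node->right_child = insert_key_sketch(node->right_child, key);
  return node;
}

/* Number of nodes on the longest path from node down to a leaf. */
int height(const BstreeNode *node)
{
  int l, r;
  if (node == NULL) return 0;
  l = height(node->left_child);
  r = height(node->right_child);
  return 1 + (l > r ? l : r);
}
```

Inserting Cindy, Dave, Jim, Peg, Terry in that order gives a height of 5, one level per key, because every new key goes to the right of the previous one. Inserting the same keys in the order Jim, Dave, Peg, Cindy, Terry gives a height of 3.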