CS140 Lecture notes -- Binary Search Trees

  • Jim Plank
  • Directory: ~cs140/www-home/notes/Bstrees
  • Lecture notes: http://www.cs.utk.edu/~cs140/notes/Bstrees
  • Tue Nov 17 09:51:38 EST 1998

    Binary Search Trees

    The book has a very nice description of binary search trees in section 4.3. Please read it. These lecture notes simply concern the implementation. The implementation of binary search trees that I've made is in the files /lymon/homes/cs140/include/bstree.h, /lymon/homes/cs140/src/bstrees/bstree.c, and /lymon/homes/cs140/objs/bstree.o.

    Our search trees will use character strings as keys and use strcmp() as our comparison function. This is not totally general, but it will serve as a nice introduction to search trees.

    First, look at bstree.h. This defines two typedefs. The first is a node of the binary search tree. This has a search key, a value and pointers to left and right children:

    typedef struct bstreenode {
      char *key;
      Jval val;
      struct bstreenode *left;
      struct bstreenode *right;
    } BstreeNode;
    
    The second typedef is for the tree header structure. It contains only a pointer to the root of the tree:
    typedef struct {
      BstreeNode *root;
    } Bstree;
    

    Now the invariant in a binary search tree is that given node n, all nodes reachable from n->left will have keys less than n->key, and all nodes reachable from n->right will have keys greater than n->key. In this implementation, we will not allow two nodes to have the same key.
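    Note that the invariant is about every node reachable from n->left or n->right, not just the immediate children. A tree where 5 has left child 2, and 2 has right child 7, passes a children-only check but is not a search tree, because 7 sits in 5's left subtree. Here is a minimal sketch of a checker that handles this by passing down the range of keys each subtree is allowed to hold. It assumes a hypothetical Node type that mirrors BstreeNode with a double in place of Jval; is_bst() is an illustrative name and is not part of bstree.h.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for BstreeNode: a double replaces Jval. */
typedef struct node {
  const char *key;
  double val;
  struct node *left;
  struct node *right;
} Node;

/* Returns 1 if every key in the subtree rooted at n lies strictly
   between lo and hi (a NULL bound means "no bound") and both subtrees
   recursively obey the same rule; returns 0 otherwise.  Strict
   comparisons also reject duplicate keys. */
static int check_range(const Node *n, const char *lo, const char *hi)
{
  if (n == NULL) return 1;
  if (lo != NULL && strcmp(n->key, lo) <= 0) return 0;
  if (hi != NULL && strcmp(n->key, hi) >= 0) return 0;
  /* Keys on the left must stay below n->key; keys on the right above it. */
  return check_range(n->left, lo, n->key) &&
         check_range(n->right, n->key, hi);
}

int is_bst(const Node *root)
{
  return check_range(root, NULL, NULL);
}
```

    On the 5 / 2 / 7 tree, the recursion reaches 7 with the range (2, 5), so is_bst() returns 0 even though each parent/child pair looks fine on its own.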

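    Bstree.h defines the following procedures: new_bstree(), free_bstree(), bstree_insert(), bstree_find(), bstree_find_min(), bstree_find_max(), and bstree_delete_node().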

    I am not going to go over the implementation in detail. Look at the code yourself. New_bstree(), bstree_insert(), bstree_find(), bstree_find_max() and bstree_find_min() are all straightforward code that you should be able to look over and understand rather quickly. The only two tricky ones are free_bstree() and bstree_delete_node().
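    To give a picture of what the straightforward operations involve, here is a minimal sketch. It is not the code in bstree.c: Node and Tree are hypothetical stand-ins for BstreeNode and Bstree (a double replaces Jval), the function names are illustrative, and returning NULL when the key already exists is an assumption of this sketch about how to enforce the no-duplicates rule.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-ins for BstreeNode and Bstree; a double replaces Jval. */
typedef struct node {
  const char *key;          /* not copied: the caller keeps the string alive */
  double val;
  struct node *left;
  struct node *right;
} Node;

typedef struct {
  Node *root;
} Tree;

Tree *tree_new(void)
{
  Tree *t = malloc(sizeof *t);
  if (t == NULL) { perror("malloc"); exit(1); }
  t->root = NULL;
  return t;
}

/* Go left when key is smaller than the current node's key, right when
   it is larger.  Returns the matching node, or NULL if key is absent. */
Node *tree_find(const Tree *t, const char *key)
{
  Node *n = t->root;
  while (n != NULL) {
    int cmp = strcmp(key, n->key);
    if (cmp == 0) return n;
    n = (cmp < 0) ? n->left : n->right;
  }
  return NULL;
}

/* Follows the same path as tree_find(); the new leaf goes where the
   search falls off the tree.  Returns NULL (inserting nothing) if the
   key is already present. */
Node *tree_insert(Tree *t, const char *key, double val)
{
  Node **link = &t->root;   /* address of the pointer we may overwrite */
  while (*link != NULL) {
    int cmp = strcmp(key, (*link)->key);
    if (cmp == 0) return NULL;
    link = (cmp < 0) ? &(*link)->left : &(*link)->right;
  }
  Node *n = malloc(sizeof *n);
  if (n == NULL) { perror("malloc"); exit(1); }
  n->key = key;
  n->val = val;
  n->left = n->right = NULL;
  *link = n;
  return n;
}

/* The minimum is the leftmost node; the maximum is the rightmost. */
Node *tree_find_min(const Tree *t)
{
  Node *n = t->root;
  if (n == NULL) return NULL;
  while (n->left != NULL) n = n->left;
  return n;
}

Node *tree_find_max(const Tree *t)
{
  Node *n = t->root;
  if (n == NULL) return NULL;
  while (n->right != NULL) n = n->right;
  return n;
}
```

    Each operation walks a single root-to-leaf path, so its cost is proportional to the height of the tree.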

    Free_bstree() does its work through the recursive helper recursive_free_bstree(), which frees a node's left and right subtrees and then frees the node itself. Note that recursive_free_bstree() is declared static: this means that it may only be called by procedures in bstree.c. This is a convenient thing to do when an implementation needs a helper like recursive_free_bstree(), but you don't want anyone else to call it. Here is the code for free_bstree():

    /* Postorder: free both subtrees before freeing the node itself. */
    static void recursive_free_bstree(BstreeNode *bn)
    {
      if (bn == NULL) return;
      recursive_free_bstree(bn->left);
      recursive_free_bstree(bn->right);
      free(bn);
    }
    
    /* Free every node in the tree, then the header. */
    void free_bstree(Bstree *b)
    {
      recursive_free_bstree(b->root);
      free(b);
    }
    
    Node deletion is more complex. Read over the book's description; this is exactly how I have implemented it. I find the node's parent with the find_parent() routine, figure out how I will delete the node, and then delete it. If both children of the node are non-NULL, then I find the smallest node in the subtree rooted at the right child and use it to replace the node that is to be deleted. I do this by saving its key and val, deleting it recursively, and then copying the saved key and val into the specified node.

    I also need special code for when the node to be deleted is the root of the tree: in that case find_parent() returns NULL, and I update b->root instead of a parent's child pointer. Here is the code:

    static BstreeNode *find_parent(Bstree *t, BstreeNode *bn)
    {
      int cmp;
      BstreeNode *tmp;
    
      tmp = t->root;
    
      if (tmp == bn) return NULL;
    
      while(1) {
        cmp = strcmp(tmp->key, bn->key);
        if (cmp == 0) {
          fprintf(stderr, "Internal Error: two nodes with the same key (%s)\n",
                           tmp->key);
          exit(1);
        }
        if (cmp > 0) {
          if (tmp->left == NULL) {
            fprintf(stderr, "Internal Error finding parent -- left child empty\n");
            exit(1);
          } else if (tmp->left == bn) {
            return tmp;
          } else {
            tmp = tmp->left;
          }
        } else {
          if (tmp->right == NULL) {
            fprintf(stderr, "Internal Error finding parent -- right child empty\n");
            exit(1);
          } else if (tmp->right == bn) {
            return tmp;
          } else {
            tmp = tmp->right;
          }
        }
      }
    }
    
    void bstree_delete_node(Bstree *b, BstreeNode *bn)
    {
      BstreeNode *parent, *replacement;
      char *key;
      Jval val;
    
      parent = find_parent(b, bn);   /* NULL if bn is the root */
    
      if (bn->left != NULL && bn->right != NULL) {
        /* Two children: use the smallest node in the right subtree. */
        replacement = bn->right;
        while (replacement->left != NULL) {
          replacement = replacement->left;
        }
        /* Save its key and val, then delete it.  It has no left child,
           so the recursive call takes the zero/one-child case below. */
        key = replacement->key;
        val = replacement->val;
        bstree_delete_node(b, replacement);
        bn->key = key;
        bn->val = val;
        return;
      } else {
        /* Zero or one child: splice that child (or NULL) into bn's place. */
        if (bn->left == NULL && bn->right == NULL) {
          replacement = NULL;
        } else if (bn->left == NULL) {
          replacement = bn->right;
        } else {
          replacement = bn->left;
        }
        if (parent == NULL) {
          b->root = replacement;
        } else if (parent->left == bn) {
          parent->left = replacement;
        } else {
          parent->right = replacement;
        }
        free(bn);
        return;
      }
    }
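    For comparison, here is a minimal sketch of the same deletion strategy written a different way. Instead of calling find_parent(), it keeps the address of the pointer that leads to the current node (a pointer to a pointer: either the root pointer or a parent's left/right field), so splicing a node out is a single assignment and the root needs no special case. Like bstree_delete_node(), a node with two children takes the key and val of the smallest node in its right subtree, but this sketch unlinks that node directly instead of making a recursive call. Node, new_node(), and tree_delete() are hypothetical names (a double replaces Jval); this is not the code in bstree.c.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical node type; a double stands in for Jval. */
typedef struct node {
  const char *key;
  double val;
  struct node *left;
  struct node *right;
} Node;

/* Small helper for building trees by hand. */
Node *new_node(const char *key, double val, Node *left, Node *right)
{
  Node *n = malloc(sizeof *n);
  if (n == NULL) { perror("malloc"); exit(1); }
  n->key = key;
  n->val = val;
  n->left = left;
  n->right = right;
  return n;
}

/* Deletes the node holding key, if any; returns 1 if a node was removed.
   'link' always holds the address of the pointer that points at the
   current node, so no separate parent search is needed. */
int tree_delete(Node **rootp, const char *key)
{
  Node **link = rootp;
  while (*link != NULL) {
    int cmp = strcmp(key, (*link)->key);
    if (cmp == 0) break;
    link = (cmp < 0) ? &(*link)->left : &(*link)->right;
  }
  if (*link == NULL) return 0;           /* key not found */

  Node *bn = *link;
  if (bn->left != NULL && bn->right != NULL) {
    /* Two children: find the smallest node in the right subtree, copy
       its key and val into bn, then unlink that node instead. */
    Node **succ = &bn->right;
    while ((*succ)->left != NULL) succ = &(*succ)->left;
    Node *s = *succ;
    bn->key = s->key;
    bn->val = s->val;
    *succ = s->right;                    /* s has no left child */
    free(s);
  } else {
    /* Zero or one child: the (possibly NULL) child takes bn's place. */
    *link = (bn->left != NULL) ? bn->left : bn->right;
    free(bn);
  }
  return 1;
}
```

    Applied to the tree 6(2(1, 5(3(-, 4))), 8), deleting "2" copies 3 into the node that held 2 and makes 4 the left child of 5, the same result as the DELETE 2 transcript in these notes.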
    

    A simple application

    Bstree_test.c contains a simple tree editor that lets you manage a tree with words as keys and doubles as values. You can insert, delete, print the max and min, and do one of three traversals: preorder, postorder, or inorder. Note that the inorder traversal prints the tree in sorted order. The preorder and postorder traversals use indentation to show you what the tree looks like.
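    Here is a minimal sketch of the three traversals. It assumes a hypothetical Node type (a double in place of Jval) and copies the preorder/postorder format from the transcripts in these notes: two spaces of indentation per level. The inorder line format is simplified here; bstree_test pads the key into a wider column. These functions are illustrative and are not taken from bstree_test.c.

```c
#include <stdio.h>

/* Hypothetical node type; a double stands in for Jval. */
typedef struct node {
  const char *key;
  double val;
  struct node *left;
  struct node *right;
} Node;

/* Inorder: left subtree, node, right subtree, so keys come out sorted. */
void print_inorder(FILE *out, const Node *n)
{
  if (n == NULL) return;
  print_inorder(out, n->left);
  fprintf(out, "%s %.2f\n", n->key, n->val);
  print_inorder(out, n->right);
}

/* Preorder: the node first, then its subtrees, indented 2 spaces per level. */
void print_preorder(FILE *out, const Node *n, int depth)
{
  if (n == NULL) return;
  fprintf(out, "%*s%s %.2f\n", 2 * depth, "", n->key, n->val);
  print_preorder(out, n->left, depth + 1);
  print_preorder(out, n->right, depth + 1);
}

/* Postorder: both subtrees first, then the node, same indentation. */
void print_postorder(FILE *out, const Node *n, int depth)
{
  if (n == NULL) return;
  print_postorder(out, n->left, depth + 1);
  print_postorder(out, n->right, depth + 1);
  fprintf(out, "%*s%s %.2f\n", 2 * depth, "", n->key, n->val);
}
```

    The "%*s" conversion prints an empty string padded to 2 * depth spaces, which is what produces the indentation.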

    Here are some examples. First, we'll create a tree that looks just like the left tree in figure 4.21. Even though our keys are character strings, we can use them to sort single-digit numbers, because strcmp() orders the strings "0" through "9" the same way as the numbers. Below, I will use values of zero for everything:

    UNIX> bstree_test
    BSTREE> INSERT 6 0
    BSTREE> INSERT 2 0
    BSTREE> INSERT 1 0
    BSTREE> INSERT 4 0
    BSTREE> INSERT 3 0
    BSTREE> INSERT 8 0
    BSTREE> INORDER
    1                                    0.00
    2                                    0.00
    3                                    0.00
    4                                    0.00
    6                                    0.00
    8                                    0.00
    BSTREE> PREORDER
    6 0.00
      2 0.00
        1 0.00
        4 0.00
          3 0.00
      8 0.00
    BSTREE> 
    
    Note that the preorder traversal shows that the tree is exactly as depicted on the left side of figure 4.21. A postorder traversal shows the same thing, but frankly, I find the preorder traversal easier to understand:
    BSTREE> POSTORDER
        1 0.00
          3 0.00
        4 0.00
      2 0.00
      8 0.00
    6 0.00
    BSTREE> 
    
    Now, as in figure 4.21, we insert 5 into the tree. Again, the preorder traversal shows that the result matches the figure, this time its right side:
    BSTREE> INSERT 5 0
    BSTREE> PREORDER
    6 0.00
      2 0.00
        1 0.00
        4 0.00
          3 0.00
          5 0.00
      8 0.00
    
    If we delete node 5, then again we have the left side of figure 4.21. This is also the same as the left side of figure 4.23:
    BSTREE> DELETE 5
    BSTREE> PREORDER
    6 0.00
      2 0.00
        1 0.00
        4 0.00
          3 0.00
      8 0.00
    
    Node 4 has only one child, so when we delete it, the code replaces the right child of node 2 with node 3, as depicted on the right side of figure 4.23:
    BSTREE> DELETE 4
    BSTREE> PREORDER
    6 0.00
      2 0.00
        1 0.00
        3 0.00
      8 0.00
    
    Now, to show deletion of a node with two children, I'll delete node 3 and then add nodes 5, 3 and 4. This will give us the tree in the left side of figure 4.24:
    BSTREE> DELETE 3
    BSTREE> INSERT 5 0
    BSTREE> INSERT 3 0
    BSTREE> INSERT 4 0
    BSTREE> PREORDER
    6 0.00
      2 0.00
        1 0.00
        5 0.00
          3 0.00
            4 0.00
      8 0.00
    
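    Now, we delete node 2. Since node 2 has two children, the code finds the smallest node in its right subtree (node 3), deletes node 3, and copies key 3 and its value into the node that held 2. We're left with the tree depicted on the right side of figure 4.24: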
    BSTREE> DELETE 2
    BSTREE> PREORDER
    6 0.00
      3 0.00
        1 0.00
        5 0.00
          4 0.00
      8 0.00
    BSTREE> 
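    All of the examples above get away with strcmp() because every key is a single digit. With multi-digit keys, strcmp() compares character by character, so "10" sorts before "9". Here is a minimal sketch of one way around that, assuming the keys are non-negative decimal integers with no leading zeros. numeric_key_cmp() is a hypothetical name; the bstree code in these notes always uses strcmp().

```c
#include <string.h>

/* Hypothetical comparator for keys that are non-negative decimal integers
   written without leading zeros.  A shorter digit string is a smaller
   number; strings of equal length compare correctly with strcmp().
   Returns <0, 0, or >0, like strcmp(). */
int numeric_key_cmp(const char *a, const char *b)
{
  size_t la = strlen(a), lb = strlen(b);
  if (la != lb) return (la < lb) ? -1 : 1;
  return strcmp(a, b);
}
```

    Swapping a comparator like this into the tree code would only require replacing each strcmp() call that compares keys.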
    
    Finally, note that binary search trees perform badly if the keys are inserted in sorted order. For example, look at the following tree:
    UNIX> bstree_test
    BSTREE> INSERT Cindy 1955
    BSTREE> INSERT Dave 1923
    BSTREE> INSERT Jim 1966
    BSTREE> INSERT Peg 1929    
    BSTREE> INSERT Terry 1963
    BSTREE> INORDER
    Cindy                             1955.00
    Dave                              1923.00
    Jim                               1966.00
    Peg                               1929.00
    Terry                             1963.00
    BSTREE> PREORDER
    Cindy 1955.00
      Dave 1923.00
        Jim 1966.00
          Peg 1929.00
            Terry 1963.00
    BSTREE> 
    
    As you can see, the tree is unbalanced: every node has only a right child, so finding a key in this tree is as inefficient as finding a key in a linked list: O(n). We'll talk more about this later.
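    To put a number on "unbalanced", here is a small sketch that inserts the same five names in two different orders and measures the height of each tree: the number of nodes on the longest root-to-leaf path, which bounds how many nodes a search can visit. It uses a hypothetical Node type with no value field, plus hypothetical helpers insert_key(), height(), and free_tree(); it is not code from bstree.c.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical node type; values are omitted since only shape matters. */
typedef struct node {
  const char *key;
  struct node *left;
  struct node *right;
} Node;

/* Plain, unbalanced insertion; a duplicate key is ignored. */
void insert_key(Node **link, const char *key)
{
  while (*link != NULL) {
    int cmp = strcmp(key, (*link)->key);
    if (cmp == 0) return;
    link = (cmp < 0) ? &(*link)->left : &(*link)->right;
  }
  Node *n = malloc(sizeof *n);
  if (n == NULL) { perror("malloc"); exit(1); }
  n->key = key;
  n->left = n->right = NULL;
  *link = n;
}

/* Nodes on the longest root-to-leaf path; 0 for an empty tree. */
int height(const Node *n)
{
  if (n == NULL) return 0;
  int hl = height(n->left);
  int hr = height(n->right);
  return 1 + (hl > hr ? hl : hr);
}

/* Postorder free, as in recursive_free_bstree(). */
void free_tree(Node *n)
{
  if (n == NULL) return;
  free_tree(n->left);
  free_tree(n->right);
  free(n);
}
```

    Inserting Cindy, Dave, Jim, Peg, Terry in that sorted order gives height 5, one level per key. Inserting the same names as Jim, Dave, Peg, Cindy, Terry gives height 3, because Jim ends up at the root with Dave and Peg below it.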