CS140 -- Lab 10


Overview

This lab is a two part lab. The first part is an experiment that will explore how successful binary search trees, AVL trees, and splay trees are in creating balanced trees. The second part will give you an opportunity to revisit the broker portion of lab 5 and redo it using a balanced tree scheme. You will use a new type of balanced binary tree, a red-black tree, that is even more efficient than AVL or splay trees. You will be using Dr. Plank's red-black tree library, which you will see much more of in CS302 and CS360. The implementation of a red-black tree is quite complicated and we will not explore its implementation in this class.


Binary Tree Experiment

In tree.h you will find the definition of a binary tree node and routines for creating binary search trees, AVL trees, and splay trees. The source file, tree.c, can be found in ~cs140/www-home/spring-2005/src and the object file tree.o can be found in ~cs140/www-home/spring-2005/objs. The object file is linked automatically to your executable by the makefile provided for this lab. You do not need to read or understand the source code in tree.c in order to complete this lab. The comments you find in tree.h should be sufficient for you to complete the lab. One important thing to note about the make_tree functions is that they create and return a sentinel node. Hence your program will have pointers to sentinel nodes rather than to the roots of the trees. If you want to get a pointer to the root of a tree you should call the function tree_root and pass it a pointer to the sentinel node. The insert and find functions expect pointers to the sentinel nodes as well.

Your task is to read a file that is given on the command line and insert each word in the file into a binary search tree, an AVL tree, and a splay tree. You should only insert a word into the trees if it has not yet occurred in the file. You can check whether it has already been seen by calling the splay_tree_find function on the splay tree. It will return 0 if the word is not in the tree and otherwise will return a pointer to the node containing the word. You should also save each unique word on a dllist so that you can do a series of finds on the splay tree once all the words in the file have been read. You should append words to the dllist.

Once all the words in the file have been read and inserted into the respective trees, you should traverse the dllist of words that you created and for each word you should call the splay_tree_find function on your splay tree. Before each such call you should compute the height of the current tree and the depth of the word for which you will be searching. You will want to keep a cumulative total of the height and the depth so that you can compute an average depth and height when you have completed all the finds. The depth of the word for which you are searching can be found by writing a simple find function for a binary search tree and computing the depth as you search for the node. Note that the node is always guaranteed to be in the tree so you do not have to worry about reaching NULL nodes at the bottom of the tree. You can use this fact to simplify your find function.

The reason for computing the depth and height of the splay tree before each find is that the find command will dynamically adjust the tree and hence the tree's height and the depth of the nodes will change after each find. By calculating the depth of a word before a find is performed on it you can accurately count the number of nodes that must be traversed to find it. If you calculate the depth of a word after a find is performed the depth will always be 0 because the find command rotates the word to the root.

When you have completed traversing the dllist of words you can compute the average depth and height of the splay tree by dividing your cumulative depth and height by the number of words on the list. At this point you can also compute the height and the average depth of the binary search tree and the AVL tree. The average depth is obtained by summing the depth of each node and then dividing this sum by the total number of nodes in the tree. Assuming that all finds are done on random words, the average depth represents the expected number of nodes that must be examined to retrieve a word's node. Since the find function does not change either a binary search tree or an AVL tree, it is ok to compute the average depth and height without doing individual finds on these two trees.

You should be able to obtain both the height of these two trees and compute their cumulative depth, which is the sum of the depths of all nodes, using a recursive tree traversal. You can then divide the cumulative depth by the number of nodes in the tree to get the average depth. At the end of your program you should print the number of words that were read by your program, which constitutes N, and log2N. C does not provide a function for computing log2N but you can compute log2N using the following formula:

log2N = log(N) / log(2)
You will have to include math.h in your program to use C's log function. The log function returns the natural log, ln, of a number. When you use the math.h library you must remember to link C's math library using the -lm flag. The makefile for this lab does this task for you.

Once your program has printed the above information it should print the 1) height, 2) average depth, and 3) ratio of the average depth to log2N for each tree. For example, your output might look as follows:

%UNIX btree_experiment myfile
Number of words = 233  log 233 = 7.86

		height	depth	depth/log n
Binary tree     22.00	15.85   2.02
AVL tree 	10.00	9.56	1.22
Splay tree  	15.21	11.23	1.43
You should name your program btree_experiment. If you have any questions about what how your output should appear, consult the executable, btree_experiment, which can be found in the lab10 directory. You can correctly position the "height", "depth", and "depth/log n" labels by using the tab character, \t, in your printf statement. The "height" label should appear two tab positions from the beginning of the line and the "depth" and "depth/log n" labels should each be one tab position over:
printf("\t\theight\tdepth\tdepth/log n\n\n");


Broker

Redo the broker program from lab 5 so that you use Dr. Plank's red-black tree library rather than a dllist to first sort customers by name and later to sort customers by average transaction cost. You will still use a dllist to store a customer's transactions and you will still name your executable broker.

You will need to include jrb.h in your program. The library's basic data type is a pointer to a tree node, which it calls JRB (for Jim's Red-Black tree). The only two fields you need concern yourself with are the key and val fields. The other fields are work fields that are used by the library's functions to balance the tree and link the nodes. The libfdr.a library contains the red-black tree functions and the makefile automatically links this library with your executable. The functions that you will need to use from Dr. Plank's library are:

  1. JRB make_jrb(): creates a new red-black tree.

  2. JRB jrb_insert_str(JRB tree, char *key, Jval val): insert a new node into the tree using a standard character string as the key. Strcmp() is used as the comparison function. jrb_insert_str returns a pointer to the new jrb tree node. Even though the key is a string, it will be converted into a Jval in the new node. Thus, if you want to get at the key of node s, you should either use jval_s(s->key) or s->key.s.

  3. JRB jrb_insert_dbl(JRB tree, double dkey, Jval val): insert a new node into the jrb tree based on its dkey value. Like a string key, the dkey will be converted to a Jval so if you want to get at the key of a node s, you should either use jval_d(s->key) or s->key.d.

  4. JRB jrb_find_str(JRB root, char *key): returns a JRB node whose value is equal to key or NULL if there is no such node in the tree. root should be a pointer to the root of the red-black tree (i.e, the node returned by make_jrb).

  5. jrb_rtraverse(ptr, jrb_tree): Traverses a jrb tree in reverse lexicographic order and assigns ptr to each node in turn. Both ptr and jrb_tree should be declared as JRB nodes.

  6. Jval jrb_val(JRB node): returns the value stored at a node.