CS140 -- Lab 9

Brad Vander Zanden


This lab is designed to:

  1. give you practice with function pointers,
  2. give you practice with information hiding,
  3. introduce you to makefiles, and
  4. give you practice with using and implementing binary search trees for range queries. Range queries can take two forms:

    1. A request for all data in a certain range. Sample queries might be all students with grades between 80 and 89, or all political figures whose photo appeared in a major news publication between October 10 and November 1.

    2. A request for the first or last n items in a data set. Sample queries might be the 10 runners with the fastest times in a race or the 20 politicians whose photos appeared most frequently in major news publications on November 1.

Binary search trees are ideally suited to handle range queries because they keep data in sorted order. However, the way that binary search trees are often implemented, it is difficult to get from the current data element to the next data element without doing an inorder traversal. Inorder traversals are fine when we want to process all the elements in a tree but they are inefficient when we only want to access a limited number of elements in the tree.

To help facilitate the handling of range queries, you are going to modify my binary search tree library so that you can traverse a binary search tree as though it were a linked list.

You are then going to write a program that reads race results and performs range queries on these results.


Lab Materials


Linked-List Style Traversal of Binary Search Trees

In order to extend a binary search tree to handle linked-list style traversals, you are going to have to make a number of modifications to my existing code:

  1. add next and prev pointers to each node in a tree. The next pointer should point to the next element in ascending order in the tree and the prev pointer should point to the next element in descending order in the tree. For example, if we have the data set {3, 8, 13, 23, 28} then the next pointer for 13 should be to the node for 23 and the prev pointer for 13 should be to the node for 8.

  2. add a sentinel binary tree node pointer to the tree container struct (Bstree). This sentinel node will point to the minimum and maximum elements of the tree via its next and prev pointers. In the above data set, the sentinel node's next pointer would point to 3 and its prev pointer would point to 28. Additionally, 3's prev pointer and 28's next pointer will both point to the sentinel node. You can set the remaining fields in the sentinel node (key, value, left_child, and right_child) to NULL since they will not be used.

  3. when you insert a new value into the tree, you will need to make the new value point to the appropriate successor and predecessor values in the tree, as well as adjust the prev and next pointers of these values so that they point to the new value. If the child is added as a left child, then its successor will always be its parent. If the child is added as a right child, then its predecessor will always be its parent. From these two relationships you should be able to figure out how to insert a child into the linked list. It is unacceptable to traverse the linked list from the start to the back in order to find where to insert the new node, because that would require an O(n) search and would destroy the O(log n) performance of the insert.

  4. you are going to need to modify your find procedure so that it returns a pointer to a node containing the target key, rather than the value associated with that key.

  5. you are going to add a bstree_next function that returns the node with the next highest value in the tree, a bstree_prev function that returns the node with the next lowest value in the tree, and a bstree_end function that returns 1 if the node passed to it is the sentinel node and 0 otherwise.

  6. you will make a few other minor changes to my code, which are noted in the "New Binary Tree Library Interface" section below.
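
The constant-time splice described in step 3 can be sketched as follows. This is a minimal illustration, not the library's code: the BNode type name, the helper names init_sentinel, link_before, and link_after, and the choice to make an empty list's sentinel point to itself are all assumptions made for the example.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical node layout; the type and helper names are assumptions. */
typedef struct bnode {
    void *key;
    void *value;
    struct bnode *left_child;
    struct bnode *right_child;
    struct bnode *next;   /* successor in ascending key order */
    struct bnode *prev;   /* predecessor in ascending key order */
} BNode;

/* Assumption: an empty list is a sentinel that points to itself. */
void init_sentinel(BNode *sentinel) {
    sentinel->key = NULL;
    sentinel->value = NULL;
    sentinel->left_child = NULL;
    sentinel->right_child = NULL;
    sentinel->next = sentinel;
    sentinel->prev = sentinel;
}

/* New left child: its successor is its parent, so splice it in
 * immediately before the parent. No list traversal is needed. */
void link_before(BNode *n, BNode *succ) {
    assert(n != NULL && succ != NULL);
    n->next = succ;
    n->prev = succ->prev;
    succ->prev->next = n;
    succ->prev = n;
}

/* New right child: its predecessor is its parent, so splice it in
 * immediately after the parent. Also usable for the first node,
 * with the sentinel as pred. */
void link_after(BNode *n, BNode *pred) {
    assert(n != NULL && pred != NULL);
    n->prev = pred;
    n->next = pred->next;
    pred->next->prev = n;
    pred->next = n;
}
```

Each helper touches only the new node and its two neighbors, so the linking step is O(1) and the insert as a whole stays O(log n).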


A More Generic Binary Tree Library

In my notes I assumed that the key was a char *. Since my library knows the type of the key, key comparisons are simple: I just use strcmp. However, forcing the key to be a char * also limits the flexibility of the library. In this lab you are going to make the key be a void *, just like the value. Now your library will not know how to compare keys unless it gets some assistance from the library's user. In particular, the user will need to pass to new_bstree a pointer to a function that compares two keys and returns an indication of their relative order. You will need to store a pointer to this function in your Bstree container struct so that the insert and find routines can access it.
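
As a rough sketch of what user-supplied comparison functions might look like, the example below defines one for keys that point to ints and one for string keys. The CompareFn typedef, the TreeStub struct, and keys_equal are hypothetical names used only to show a stored function pointer being called; they are not part of the library.

```c
#include <string.h>

/* The signature that new_bstree expects: returns a negative number, 0,
 * or a positive number. The typedef name is an assumption. */
typedef int (*CompareFn)(void *key1, void *key2);

/* Comparison for keys that point to ints. */
int compare_ints(void *key1, void *key2) {
    int a = *(int *)key1;
    int b = *(int *)key2;
    return (a > b) - (a < b);   /* avoids the overflow risk of a - b */
}

/* Comparison for string keys; this recovers the old strcmp behavior. */
int compare_strings(void *key1, void *key2) {
    return strcmp((char *)key1, (char *)key2);
}

/* Hypothetical stand-in for the container struct: it only holds the
 * stored comparison function. */
typedef struct {
    CompareFn compare;
} TreeStub;

/* Library-side code calls through the stored pointer and never needs
 * to know the real key type. */
int keys_equal(TreeStub *tree, void *key1, void *key2) {
    return tree->compare(key1, key2) == 0;
}
```

The library only ever sees void * keys; all knowledge of the key type lives in the function the user passes in.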


New Binary Tree Library Interface

You should create a new binary tree library with the following interface. You are not allowed to change any part of the interface:

  1. void bstree_insert(void *tree, void *key, void *value): Insert the (key,value) pair into the tree in sorted order. Hint: You will need to modify my insertion helper routine so that it has access to the parent node. You will probably want to add an additional parent parameter to the insertion helper routine. When you get to the bottom of the tree and create the new node, you can check to see whether it should be the left or right child of the parent node and set the next and prev links for both the new node and the parent node accordingly.
  2. void *bstree_find(void *tree, void *key, bool *found): Find the node associated with key in the binary tree and return either 1) a pointer to that node or 2) a pointer to the first node whose key is greater than the target key. If the key is found, then set the found parameter to true; otherwise set it to false. Note that my code returns NULL if the key is not found. However, when you are dealing with range queries you do not want to force the user to ensure that the first key in the range is in the tree. Instead you want to get a starting spot. For example, suppose I want all students whose grades are between 80 and 89. If the grades of these students are 82, 85, and 88, I would like my find function to return a pointer to 82, not return NULL. This is what your find function will now be doing. If the tree is empty or the search key is greater than any key in the tree, then return a pointer to the sentinel node.

    The found flag is a convenience for the user. The user could determine whether or not the key was found by retrieving the key from the returned node and determining whether or not it is equal to the search key. However, it is much easier if the user can simply check the flag. Passing a parameter as a pointer and then setting it is a common way of allowing a function to return more than one value.

  3. void *bstree_next(void *node): returns the node associated with the next higher key in the tree.
  4. void *bstree_prev(void *node): returns the node associated with the next lower key in the tree.
  5. int bstree_end(void *tree, void *node): returns 1 (true) if the node is the sentinel node of the tree and 0 (false) otherwise. You can use bstree_end when iterating through the tree to make sure that you stop after the last node when traversing in ascending order, or after the first node when traversing in descending order.
  6. void *new_bstree(int (*compare)(void *key1, void *key2)): create a container struct for a binary search tree and return it as a void *. The compare function will take pointers to two keys and return a negative number, 0, or a positive number depending on whether key1 is less than, equal to, or greater than key2. Your library code should store a pointer to the function in the container struct for the binary search tree. It will need to use this function when it tries to find or insert values in the tree.
  7. void *bstree_key(void *node): This function only needs to be modified so that it returns a void * rather than a char *. The function body can remain unchanged.
  8. void *bstree_find_max(void *tree): With a linked list, you should be able to rewrite this function so that it returns the maximum node in O(1) rather than O(log n) time.
  9. void *bstree_find_min(void *tree): With a linked list, you should be able to rewrite this function so that it returns the minimum node in O(1) rather than O(log n) time.
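
The behavior asked of bstree_find (return a starting spot and report an exact match through the found parameter) can be illustrated without a tree at all. The sketch below uses a sorted array as a stand-in: find_start and count_in_range are hypothetical names, the index n plays the role of the sentinel node, and stepping the index forward plays the role of bstree_next.

```c
#include <stdbool.h>
#include <stddef.h>

/* Stand-in for bstree_find on a sorted array: returns the index of the
 * first element >= target, or n (the "sentinel") if there is none.
 * *found is set to true only when that element equals target. */
size_t find_start(const int *keys, size_t n, int target, bool *found) {
    size_t lo = 0, hi = n;
    while (lo < hi) {
        size_t mid = lo + (hi - lo) / 2;
        if (keys[mid] < target)
            lo = mid + 1;
        else
            hi = mid;
    }
    *found = (lo < n && keys[lo] == target);
    return lo;
}

/* Range query: count the keys in [low, high] by starting at the spot
 * find_start returns and walking forward until the range or the data
 * runs out. */
size_t count_in_range(const int *keys, size_t n, int low, int high) {
    bool found;
    size_t i = find_start(keys, n, low, &found);
    size_t count = 0;
    while (i < n && keys[i] <= high) {
        count++;
        i++;
    }
    return count;
}
```

With grades 70, 82, 85, 88, and 93, a search for 80 lands on 82 with found set to false, so a range query for 80 to 89 can start there even though 80 itself is not in the data.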

The following functions should also be in your bstree library, but they do not need to be modified (i.e., you can use my bstree code without modifying it):

Finally you can delete the following two functions from my library:


Race Results

In this part of the lab you are going to read the results of a race and then allow the user to perform several types of range queries.

Input

The input to your program will consist of lines of the form:

    FirstName LastName mm:ss

where FirstName and LastName are single word fields and mm:ss is the runner's time in minutes and seconds. A sample file might be:

    Nels VanderZanden 18:03
    Mickey Mouse 20:05
    Minnie Mouse 17:50
    Brad VanderZanden 16:57
    Daffy Duck 17:08
    Joe Tortoise 29:08
    Sally Hare 16:59

Naturally I won :)

Queries

Once you have read the input the user will be able to enter queries on stdin that request information. Your program should support the following queries:

Program Output

Print the runners who meet the query criteria one per line with single spacing between each field. Note that you cannot assume that the fields in the input will be separated by single spaces and hence it will be necessary to store the fields individually in a runner's struct.
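
One way to cope with irregular spacing is to let sscanf split the line on whitespace and store each field separately. The sketch below is only an illustration: the Runner struct, the FIELD_MAX limit of 64 characters, and the parse_line name are assumptions, and the fourth conversion exists solely to detect extra fields.

```c
#include <stdio.h>

#define FIELD_MAX 64   /* assumed maximum field size, including the '\0' */

/* Hypothetical struct holding the name fields separately. */
typedef struct {
    char first[FIELD_MAX];
    char last[FIELD_MAX];
} Runner;

/* Splits a line into first name, last name, and time text.
 * time_text must have room for FIELD_MAX characters.
 * Returns 1 if the line has exactly three whitespace-separated fields
 * and 0 otherwise. */
int parse_line(const char *line, Runner *r, char *time_text) {
    char extra[2];
    /* %s skips any run of spaces or tabs before each field. */
    int n = sscanf(line, "%63s %63s %63s %1s",
                   r->first, r->last, time_text, extra);
    return n == 3;
}
```

Because the fields are stored individually, they can later be printed with exactly one space between them regardless of how the input was spaced.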

Program Design

You will need to read your input into a binary search tree using the time as the key and the name as the value. You can either store the time in a struct that has two fields, minutes and seconds, or you can convert the time to seconds using the equation 60*minutes + seconds. If you store the time as a struct then your comparison function will need to compare both the minutes and the seconds. You can use strchr to find the : delimiter and separate the time into separate minutes and seconds fields.
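
Both key representations can be sketched briefly. In the example below, the RaceTime struct and the split_time, to_seconds, and compare_times names are hypothetical, and split_time deliberately does no format validation; it only shows strchr locating the colon and the 60*minutes + seconds conversion.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical minute/second key. */
typedef struct {
    int minutes;
    int seconds;
} RaceTime;

/* Splits "mm:ss" at the colon. Returns 0 on success and -1 if there is
 * no colon. This sketch assumes the text has already been validated. */
int split_time(const char *text, RaceTime *out) {
    const char *colon = strchr(text, ':');
    if (colon == NULL)
        return -1;
    out->minutes = (int)strtol(text, NULL, 10);       /* stops at ':' */
    out->seconds = (int)strtol(colon + 1, NULL, 10);
    return 0;
}

/* Alternative representation: a single integer key. */
int to_seconds(const RaceTime *t) {
    return 60 * t->minutes + t->seconds;
}

/* Comparison for struct keys: minutes decide first, seconds break ties. */
int compare_times(void *key1, void *key2) {
    RaceTime *a = key1;
    RaceTime *b = key2;
    if (a->minutes != b->minutes)
        return (a->minutes > b->minutes) - (a->minutes < b->minutes);
    return (a->seconds > b->seconds) - (a->seconds < b->seconds);
}
```

The integer form makes comparison a one-liner, while the struct form keeps the original fields available for printing; either works as long as the comparison function matches the representation.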

Once you have read the input into your binary search tree you will need to enter a loop that reads stdin and performs the indicated query.

Error Checking

You need to perform the following error checks:

  1. Check that the number of command line arguments is correct.
  2. Check that the input file can be opened.
  3. Check that each line of input has exactly three fields.
  4. Check that the format of the time is correct and that the minutes and seconds are both numeric. A time must have one or more digits for the minutes and exactly two digits for the seconds.
  5. Check that a query is correctly formatted and print an appropriate error message if it is not correctly formatted. Your error messages do not need to precisely imitate mine but they should be easily understood by a user.

You may assume that there are no duplicate times in the input (i.e., you do not have to error check this condition) and you do not have to catch time errors of the form 2a:40 or 23:4a, since sscanf and atoi can both convert the strings to numbers. In contrast, you must catch errors of the form a2:40 and 23:a4 because sscanf and atoi cannot convert these strings to numbers. Note that in this lab you cannot use atoi or else you will miss some errors. Can you see why?
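
One possible way to check the time format is to inspect the characters directly rather than rely on a conversion routine. The sketch below uses a hypothetical valid_time_format function; note that it is stricter than the lab requires, since it also rejects forms such as 2a:40 that you are not obliged to catch.

```c
#include <ctype.h>
#include <string.h>

/* Returns 1 if text is one or more digits, a colon, and exactly two
 * digits; returns 0 otherwise. */
int valid_time_format(const char *text) {
    const char *colon = strchr(text, ':');
    const char *p;

    /* There must be a colon with at least one character before it. */
    if (colon == NULL || colon == text)
        return 0;

    /* Every character of the minutes field must be a digit. */
    for (p = text; p < colon; p++) {
        if (!isdigit((unsigned char)*p))
            return 0;
    }

    /* Exactly two digits must follow the colon. The || short-circuits,
     * so colon[2] is never read when colon[1] is the terminator. */
    if (!isdigit((unsigned char)colon[1]) || !isdigit((unsigned char)colon[2]))
        return 0;
    return colon[3] == '\0';
}
```

Once a string passes this check, converting the two fields to numbers cannot silently fail.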


Design Document

To help you think through how you might want to design your program for this lab, you should answer the following questions and hand them in when told to do so by the TA:

  1. Show the binary search tree that would result from processing the sample race results file shown earlier. When you draw the tree, also draw the next and prev links between the nodes. To make your drawing simpler, it is okay to:

  2. As each node from the sample race results file gets inserted, show which nodes immediately precede and succeed it in the current tree (not in the final tree, but in the tree that results immediately after the node is inserted). For example:

    Inserted Node              Previous Node              Next Node
    Nels VanderZanden 18:03    Sentinel Node              Sentinel Node
    Mickey Mouse 20:05         Nels VanderZanden 18:03    Sentinel Node
    Minnie Mouse 17:50         Sentinel Node              Nels VanderZanden 18:03

    Now answer the following questions:

    1. Based on the above insertion pattern, if a node is inserted as a left child, what is its successor node (i.e., what is the relationship of the successor node to the left child: parent, grandparent, left sibling, right sibling, left child, or right child)?
    2. Based on the above insertion pattern, if a node is inserted as a right child, what is its predecessor node?

  3. Suppose you decide to represent the key as a minute/second pair. Show the struct you would declare.

  4. Now show the comparison function you would write to compare two keys in your minute/second pair representation.

  5. Suppose you decide to represent the key as an integer. Show the comparison function you would write to compare two keys.

  6. Show the call you would make to insert a key with value 1078 into a tree named my_tree. Assume the value field is a null pointer for this problem.

  7. Show the struct you plan to use to store the name fields.

  8. Suppose you have created a tree named my_tree and inserted integer keys into it. Complete the following problems:

    1. Write a code fragment that prints the value of the minimum key in the tree.
    2. Write a code fragment that prints the first value that is greater than or equal to 100 in the tree.
    3. Write a code fragment that traverses the tree's linked list in ascending order and prints the values of all the keys in the tree.

  9. Given the sample race results file presented earlier, write down the output that should be produced by your program for each of the following queries:

    1. first 3
    2. last 2
    3. range 14:00 17:30
    4. range * 18:00
    5. range 19:00 *

What To Hand In

You should submit your design document when the TA asks for it during the lab. You will submit the following files to the TAs via the submit script:

  1. bstree.h
  2. bstree.c
  3. runner.c