CS140 -- Lab 10

Brad Vander Zanden


This lab is designed to:

  1. give you practice with function pointers,
  2. give you practice with information hiding,
  3. introduce you to makefiles, and
  4. give you practice with using and implementing binary search trees for range queries. Range queries can take two forms:

    1. A request for all data in a certain range. Sample queries might be all students with grades between 80 and 89, or all political figures whose photo appeared in a major news publication between October 10 and November 1.

    2. A request for the first or last n items in a data set. Sample queries might be the 10 runners with the fastest times in a race or the 20 politicians whose photos appeared most frequently in major news publications on November 1.

Binary search trees are ideally suited to handle range queries because they keep data in sorted order. However, the way that binary search trees are often implemented, it is difficult to get from the current data element to the next data element without doing an inorder traversal. Inorder traversals are fine when we want to process all the elements in a tree but they are inefficient when we only want to access a limited number of elements in the tree.

To help facilitate the handling of range queries, you are going to modify my binary search tree library so that you can traverse a binary search tree as though it were a linked list.
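
As a rough sketch of what linked-list style traversal buys you, the fragment below collects the n smallest keys by following next pointers from a sentinel. The ListNode struct, its integer key, and the first_n function are hypothetical names chosen for illustration and are not part of the bstree library. The sketch assumes a circular list in which the sentinel's next pointer leads to the smallest element.

```c
#include <stddef.h>

/* Hypothetical node: only the fields needed for list-style traversal. */
typedef struct list_node {
    int key;                      /* assumption: integer keys for brevity */
    struct list_node *next;       /* next element in ascending order */
    struct list_node *prev;       /* next element in descending order */
} ListNode;

/* Copy up to n of the smallest keys into out and return how many were
   copied. The loop visits at most n nodes, no matter how many nodes
   the tree holds. */
size_t first_n(const ListNode *sentinel, size_t n, int *out) {
    size_t count = 0;
    const ListNode *cur = sentinel->next;
    while (cur != sentinel && count < n) {
        out[count++] = cur->key;
        cur = cur->next;
    }
    return count;
}
```

A "last n" query would look the same except that it would start at the sentinel's prev pointer and follow prev links.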

You are then going to write a program that reads race results and performs range queries on these results.


Lab Materials


Part 1--Linked-List Style Traversal of Binary Search Trees

In order to extend a binary search tree to handle linked-list style traversals, you are going to have to make a number of modifications to my existing code:

  1. add next and prev pointers to each node in a tree. The next pointer should point to the next element in ascending order in the tree and the prev pointer should point to the next element in descending order in the tree. For example, if we have the data set {3, 8, 13, 23, 28} then the next pointer for 13 should be to the node for 23 and the prev pointer for 13 should be to the node for 8.

  2. add a sentinel binary tree node pointer to the tree container struct (Bstree). This sentinel node will point to the minimum and maximum elements of the tree via its next and prev pointers. In the above data set, the sentinel node's next pointer would point to 3 and its prev pointer would point to 28. Additionally, 3's prev pointer and 28's next pointer will both point to the sentinel node. You can set the remaining fields in the sentinel node (key, value, left_child, and right_child) to NULL since they will not be used.

  3. when you insert a new value into the tree, you will need to make the new value point to the appropriate successor and predecessor values in the tree, as well as adjust the prev and next pointers of these values so that they point to the new value. If the child is added as a left child, then its successor will always be its parent. If the child is added as a right child, then its predecessor will always be its parent. From these two relationships you should be able to figure out how to insert a child into the linked list. It is unacceptable to traverse the linked list from the start to the back in order to find where to insert the new node, because that would require an O(n) search and would destroy the O(log n) performance of the insert.

  4. modify the bstree_find and tree_find_helper procedures so that they return a pointer to a node containing the target key, rather than the value associated with that key.

  5. add the following function declarations to bstree.h and their corresponding implementations to bstree.c:

  6. modify bstree.h so that any function which takes a key as a parameter has the type of the key changed from a char * to the type that you use to represent a race time, which will be your key for this lab.

  7. modify bstree.c so that the comparison code in tree_insert_helper and tree_find_helper works with your racetime key, rather than a char *.

  8. Delete the following functions from bstree.c:
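
Items 1 through 3 of the list above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the layout of the existing bstree code: the TNode and Tree structs, the integer key, and the tree_new, splice_between, and tree_insert names are all hypothetical, and the sketch assumes that an empty tree's sentinel points at itself. It also descends iteratively where the library uses a recursive helper.

```c
#include <stdlib.h>

/* Hypothetical layout; int keys stand in for the lab's race-time key. */
typedef struct tnode {
    int key;
    struct tnode *left_child, *right_child;
    struct tnode *next, *prev;
} TNode;

typedef struct {
    TNode *root;
    TNode *sentinel;   /* next leads to the minimum, prev to the maximum */
} Tree;

Tree *tree_new(void) {
    Tree *t = malloc(sizeof *t);
    TNode *s = malloc(sizeof *s);
    if (t == NULL || s == NULL) { free(t); free(s); return NULL; }
    s->key = 0;
    s->left_child = s->right_child = NULL;
    s->next = s->prev = s;   /* assumption: empty list points at itself */
    t->root = NULL;
    t->sentinel = s;
    return t;
}

/* Link n into the list between two nodes that are currently adjacent. */
static void splice_between(TNode *n, TNode *before, TNode *after) {
    n->prev = before;
    n->next = after;
    before->next = n;
    after->prev = n;
}

/* Returns 1 on success, 0 if memory could not be allocated. */
int tree_insert(Tree *t, int key) {
    TNode *n = malloc(sizeof *n);
    if (n == NULL) return 0;
    n->key = key;
    n->left_child = n->right_child = NULL;
    if (t->root == NULL) {
        t->root = n;
        splice_between(n, t->sentinel, t->sentinel);
        return 1;
    }
    TNode *parent = t->root;
    for (;;) {
        if (key < parent->key) {
            if (parent->left_child == NULL) {
                parent->left_child = n;
                /* left child: the parent is the successor */
                splice_between(n, parent->prev, parent);
                return 1;
            }
            parent = parent->left_child;
        } else {
            if (parent->right_child == NULL) {
                parent->right_child = n;
                /* right child: the parent is the predecessor */
                splice_between(n, parent, parent->next);
                return 1;
            }
            parent = parent->right_child;
        }
    }
}
```

Each insertion touches only the new node, its parent, and one neighbor of the parent in the list, so the list is kept up to date in constant time on top of the usual descent.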


Part 2--Race Results

In this part of the lab you are going to write a program named race.c that reads the results of a race and then allows the user to perform several types of range queries.

Input

The input to your program will consist of lines of the form:

FirstName LastName mm:ss
where FirstName and LastName are single word fields and mm:ss is the runner's time in minutes and seconds. A sample file might be:
Nels VanderZanden 18:03
Mickey Mouse 20:05
Minnie Mouse 17:50
Brad VanderZanden 16:57
Daffy Duck 17:08
Joe Tortoise 29:08
Sally Hare 16:59
Naturally I won :)

Queries

Once you have read the input the user will be able to enter queries on stdin that request information. Your program should support the following queries:

Program Output

Print the runners who meet the query criteria one per line with single spacing between each field. Note that you cannot assume that the fields in the input will be separated by single spaces and hence it will be necessary to store the fields individually in a runner's struct.
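
One possible way to store the fields individually and print them single-spaced is sketched below. The Runner struct, its field sizes, and the parse_runner and format_runner functions are hypothetical names used only for illustration. The sketch assumes that no field is longer than the buffers shown.

```c
#include <stdio.h>

/* Hypothetical record: each input field is kept in its own buffer. */
typedef struct {
    char first[64];
    char last[64];
    char time[16];
} Runner;

/* Returns 1 if line holds exactly three whitespace-separated fields,
   0 otherwise. The fourth conversion exists only to detect extras. */
int parse_runner(const char *line, Runner *r) {
    char extra[2];
    int n = sscanf(line, "%63s %63s %15s %1s",
                   r->first, r->last, r->time, extra);
    return n == 3;
}

/* Writes "First Last mm:ss" into buf with single spaces between fields
   and returns the value reported by snprintf. */
int format_runner(const Runner *r, char *buf, size_t size) {
    return snprintf(buf, size, "%s %s %s", r->first, r->last, r->time);
}
```

Because sscanf skips any run of blanks or tabs between %s conversions, the spacing used in the input file does not affect the output.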

Program Design

You will need to read your input into a binary search tree using the time as the key and the name as the value. You can either store the time in a struct that has two fields--minutes and seconds--or you can convert the time to seconds using the equation 60*minutes + seconds. If you store the time as a struct then your comparison function will need to compare both the minutes and the seconds. You can use strchr to find the : delimiter and split the time into separate minutes and seconds fields.
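
The strchr approach and the 60*minutes + seconds conversion might look roughly like the sketch below. The function name time_to_seconds is hypothetical, and the sketch deliberately checks only for the presence of the : delimiter; it does not perform the format checks that the lab requires.

```c
#include <stdlib.h>
#include <string.h>

/* Returns 1 and stores the total number of seconds in *total if text
   contains a ':' delimiter, 0 otherwise. No digit validation is done
   here; that is a separate step. */
int time_to_seconds(const char *text, int *total) {
    const char *colon = strchr(text, ':');
    if (colon == NULL) return 0;
    int minutes = (int)strtol(text, NULL, 10);       /* stops at ':' */
    int seconds = (int)strtol(colon + 1, NULL, 10);
    *total = 60 * minutes + seconds;
    return 1;
}
```

For example, 18:03 becomes 60*18 + 3 = 1083 seconds.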

Once you have read the input into your binary search tree you will need to enter a loop that reads stdin and performs the indicated query.

Error Checking

You need to perform the following error checks:

  1. Check that the number of command line arguments is correct.
  2. Check that the input file can be opened.
  3. Check that each line of input has exactly three fields.
  4. Check that the format of the time is correct and that the minutes and seconds are both numeric. A time must have one or more digits for the minutes and exactly two digits for the seconds.
  5. Check that a query is correctly formatted and print an appropriate error message if it is not correctly formatted. Your error messages do not need to precisely imitate mine but they should be easily understood by a user.

You may assume that there are no duplicate times in the input (i.e., you do not have to error check this condition) and you do not have to catch time errors of the form 2a:40 or 23:4a, since sscanf and atoi can both convert the strings to numbers. In contrast, you must catch errors of the form a2:40 and 23:a4 because sscanf and atoi cannot convert these strings to numbers. Note that in this lab you cannot use atoi or else you will miss some errors. Can you see why?
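
If you would rather check the characters yourself than rely on a conversion routine, a character-by-character test is one option. The sketch below is only an illustration and the name valid_time is hypothetical. Note that it is stricter than this lab requires, since it also rejects forms such as 2a:40 and 23:4a.

```c
#include <ctype.h>
#include <string.h>

/* Returns 1 if text is one or more digits, a ':', and exactly two
   digits, with nothing after them; returns 0 otherwise. */
int valid_time(const char *text) {
    const char *colon = strchr(text, ':');
    if (colon == NULL || colon == text) return 0;   /* no ':' or no minutes */
    for (const char *p = text; p < colon; p++)
        if (!isdigit((unsigned char)*p)) return 0;
    /* && short-circuits, so colon[2] is read only if colon[1] is a digit */
    return isdigit((unsigned char)colon[1])
        && isdigit((unsigned char)colon[2])
        && colon[3] == '\0';
}
```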


Part 3--Creating A More Generic Binary Tree Library Using Function Pointers

In my original bstree library I assumed that the key was a char * and in part 1 of this lab you changed the key to be a key related to time. You should see that the library is not portable, since each problem seems to require changing the type of the key, and then changing the comparison code for the key. In this part of the lab you are going to make the key be a void *, just like the value. Now your library will not know how to compare keys unless it gets some assistance from the library's user. In particular, the user will need to pass to new_bstree a pointer to a function that compares two keys and returns an indication of their relative order. You will need to store a pointer to this function in your bstree container struct so that the insert and find routines can access it.
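
The mechanics of storing a comparison function in a container struct and calling through it can be sketched as follows. Everything named here (CompareFn, Container, new_container, compare_strings, keys_in_order) is hypothetical and chosen for illustration; the sketch uses string keys rather than race times, and the root field is only a placeholder.

```c
#include <stdlib.h>
#include <string.h>

/* Pointer to a function that orders two keys. */
typedef int (*CompareFn)(void *key1, void *key2);

/* Hypothetical container: the library never looks inside a key; it only
   calls compare. */
typedef struct {
    void *root;          /* placeholder for the root node pointer */
    CompareFn compare;   /* supplied by the library's user */
} Container;

void *new_container(CompareFn compare) {
    Container *c = malloc(sizeof *c);
    if (c == NULL) return NULL;
    c->root = NULL;
    c->compare = compare;
    return c;
}

/* A user-supplied comparison for string keys. */
int compare_strings(void *key1, void *key2) {
    return strcmp((char *)key1, (char *)key2);
}

/* Library-side use: returns 1 if key a sorts strictly before key b. */
int keys_in_order(void *tree, void *a, void *b) {
    Container *c = tree;
    return c->compare(a, b) < 0;
}
```

A caller would write new_container(compare_strings), passing the function's name without parentheses so that its address is what gets stored.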


New Binary Tree Library Interface

You should create a new binary tree library with the following interface. You are not allowed to change any part of the interface:

  1. void bstree_insert(void *tree, void *key, void *value): Insert the (key,value) pair into the tree in sorted order. Hint: You will need to modify my insertion helper routine so that it has access to the parent node. You will probably want to add an additional parent parameter to the insertion helper routine. When you get to the bottom of the tree and create the new node, you can check to see whether it should be the left or right child of the parent node and set the next and prev links for both the new node and the parent node accordingly.
  2. void *bstree_find(void *tree, void *key, bool *found): Find the node associated with key in the binary tree and return either 1) a pointer to that node or 2) a pointer to the first node whose key is greater than the target key. If the key is found, then set the found parameter to true; otherwise set it to false. Note that my code returns NULL if the key is not found. However, when you are dealing with range queries you do not want to force the user to ensure that the first key in the range is in the tree. Instead you want to get a starting spot. For example, suppose I want all students whose grades are between 80 and 89. If the grades of these students are 82, 85, and 88, I would like my find function to return a pointer to 82, not return NULL. This is what your find function will now be doing. If the tree is empty or the search key is greater than any key in the tree, then return a pointer to the sentinel node.

    The found flag is a convenience for the user. The user could determine whether or not the key was found by retrieving the key from the returned node and determining whether or not it is equal to the search key. However, it is much easier if the user can simply check the flag. Passing a parameter as a pointer and then setting it is a common way of allowing a function to return more than one value.

  3. void *bstree_next(void *node): returns the node associated with the next higher key in the tree.
  4. void *bstree_prev(void *node): returns the node associated with the next lower key in the tree.
  5. int bstree_end(void *tree, void *node): returns true if the node is the sentinel node of the tree and false otherwise. You can use bstree_end for iterating through the tree and making sure that you stop when you reach the last node in the tree if traversing the tree in ascending order or the first node if traversing the tree in descending order.
  6. void *new_bstree(int (*compare)(void *key1, void *key2)): create a container struct for a binary search tree and return it as a void *. The compare function will take pointers to two keys and return a negative number, 0, or a positive number depending on whether key1 is less than, equal to, or greater than key2. Your library code should store a pointer to the function in the container struct for the binary search tree. It will need to use this function when it tries to find or insert values in the tree.
  7. void *bstree_key(void *node): This function only needs to be modified so that it returns a void * rather than a char *. The function body can remain unchanged.
  8. void *bstree_find_max(void *tree): With a linked list, you should be able to rewrite this function so that it returns the maximum node in O(1) rather than O(log n) time.
  9. void *bstree_find_min(void *tree): With a linked list, you should be able to rewrite this function so that it returns the minimum node in O(1) rather than O(log n) time.
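
The "starting spot" behavior described for bstree_find, together with a range walk that uses it, can be sketched as follows. This is an illustration under assumptions rather than the required implementation: the FNode struct, its integer key, and the find_at_or_after and count_in_range functions are hypothetical, and direct integer comparison stands in for the stored compare function.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical node with only the fields this sketch needs. */
typedef struct fnode {
    int key;
    struct fnode *left_child, *right_child;
    struct fnode *next;            /* next element in ascending order */
} FNode;

/* Returns the node holding target, or else the first node whose key is
   greater than target, or else the sentinel. */
FNode *find_at_or_after(FNode *root, FNode *sentinel, int target, bool *found) {
    FNode *candidate = sentinel;   /* smallest key seen so far that exceeds target */
    FNode *cur = root;
    *found = false;
    while (cur != NULL) {
        if (target == cur->key) {
            *found = true;
            return cur;
        }
        if (target < cur->key) {
            candidate = cur;       /* cur is larger; a closer one may lie to its left */
            cur = cur->left_child;
        } else {
            cur = cur->right_child;
        }
    }
    return candidate;
}

/* Counts the keys in [low, high] by walking next links from the start spot. */
size_t count_in_range(FNode *root, FNode *sentinel, int low, int high) {
    bool found;
    size_t count = 0;
    for (FNode *n = find_at_or_after(root, sentinel, low, &found);
         n != sentinel && n->key <= high; n = n->next)
        count++;
    return count;
}
```

With grades 82, 85, and 88 in the tree, a search for 80 returns the node for 82 with found set to false, and the walk from there reaches all three grades in the range 80 to 89.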

The following functions do not need to be modified (i.e., you can use my bstree code without modifying it):

Finally you can delete the following functions from my library:


Testing your Generic Library

In the lab10 directory, the files bst_find.c, bst_print.c, and bst_traverse.c provide several simple programs that you can use to test your binary search tree library:

  1. bst_find: you provide several words on the command line and it 1) inserts the words into a bstree, and 2) tries to find each command line argument and prints out its status (found or not found). The program searches all of your words plus argv[0], which is the command name. Only the command name should not be found. A sample invocation is:
    bst_find fox hound brown jumped the gate
    
  2. bst_print: you provide several words on the command line and it inserts them into a bstree and then visits the tree inorder and prints the words out in sorted order. A sample invocation is:
    bst_print fox hound brown jumped the gate
    
  3. bst_traverse: takes a file as input and stores each word and the line number of the first line on which it appears in a bstree. Then it:

    A sample invocation is:
    bst_traverse wordfile
    


Part 4--Modify Race.c

Modify your race.c file so that it uses your generic bstree library. You will need to write a comparison function to compare your keys, and then pass a pointer to this comparison function to new_bstree. The comparison function you wrote for the "In Lab Questions" should be helpful in serving as a guide for writing your comparison function. You may even be able to use the comparison function that you wrote without having to modify it.


In Lab Questions

To help you think through how you might want to design your program for this lab, you should answer the following questions during lab under the guidance of the TA:

  1. Show the binary search tree that would result from processing the sample race results file shown earlier. When you draw the tree, also draw the next and prev links between the nodes. To make your drawing simpler, it is okay to:

  2. As each node from the sample race results file gets inserted, show which nodes immediately precede and succeed it in the current tree (not in the final tree, but in the tree that results immediately after the node is inserted). For example:

    Inserted Node              Previous Node              Next Node
    Nels VanderZanden 18:03    Sentinel Node              Sentinel Node
    Mickey Mouse 20:05         Nels VanderZanden 18:03    Sentinel Node
    Minnie Mouse 17:50         Sentinel Node              Nels VanderZanden 18:03

    Now answer the following questions:

    1. Based on the above insertion pattern, if a node is inserted as a left child, what is its successor node (i.e., what is the relationship of the successor node to the left child--parent, grandparent, left sibling, right sibling, left child, right child)?
    2. Based on the above insertion pattern, if a node is inserted as a right child, what is its predecessor node?

  3. Suppose you decide to represent the key as a minute/second pair. Show the struct you would declare to store this pair.

  4. Now show the comparison function you would write to compare two keys in your minute/second pair representation. Assume that the function prototype for the comparison function is:
    int compare(void *key1, void *key2);
    
    The void *'s are actually pointers to your minute/second struct, so you will downcast the pointers to be pointers to your struct, then compare the values in the struct. Note that the return value is an int, not the bool I showed in class. Your return value should be computed as follows:
    key1 < key2: return any negative number
    key1 = key2: return 0
    key1 > key2: return any positive number
    

  5. Suppose you decide to represent the time key as an integer, using my conversion equation to convert the minute:second time to an integer. Show the comparison function you would write to compare two keys, again using the compare function prototype given in the previous problem. Now your void *'s will be pointers to ints.

  6. Show the call you would make to insert a key with value 1078 into a tree named my_tree (use my bstree API). Assume the value field is a null pointer for this problem. Note that you will first need to malloc an integer in which to store 1078, and then pass a pointer to that malloc'ed memory to the insert function.

  7. Given the sample race results file presented earlier, write down the output that should be produced by your program for each of the following queries:

    1. first 3
    2. last 2
    3. range 14:00 17:30
    4. range * 18:00
    5. range 19:00 *

What To Hand In

Note that there are two separate deadlines for this lab. For both deadlines you will submit the following files to the TAs via the submit script:

  1. bstree.h
  2. bstree.c
  3. race.c
For the first deadline, answer "10a" when prompted for a lab number. For the second deadline, answer "10b" when prompted for a lab number. For the second deadline, your bstree.h and bstree.c files should have been modified to handle void * keys, and race.c should have been modified to work with this new, generic bstree library.