CS140 -- Lab 9
This lab is designed to:
- give you practice with function pointers,
- give you practice with information hiding,
- introduce you to makefiles, and
- give you practice with
using and implementing binary search trees for range queries. Range
queries can take two forms:
- A request for all data in a certain range.
Sample queries might be
all students with grades between 80 and 89, or all political
figures whose photo appeared in a major news publication between October 10
and November 1.
- A request for the first or last n items in a data set. Sample
queries might be the 10 runners with the fastest times in a race or the
20 politicians whose photos appeared most frequently in major news
publications on November 1.
Binary search trees are ideally suited to handle range queries because they
keep data in sorted order. However, as binary search trees are usually
implemented, it is difficult to get from the current data element to
the next data element without doing an inorder traversal. Inorder traversals
are fine when we want to process all the elements in a tree but they are
inefficient when we only want to access a limited number of elements in the
tree.
To help facilitate the handling of range queries, you are going to modify
my binary search tree library so that
you can traverse a binary search tree as though it were a linked list.
You are then going to write a program that reads race results and performs
range queries on these results.
Lab Materials
- Executables for the test files are in the directory
/home/bvz/cs140/labs/lab9. As usual, if you have
questions about how these programs should work, try these.
- race1 is an error-free test file and race2 is an error-filled
test file. You should develop your own test files
as well. You should not assume that every possible error type is in
race2. In fact, race2 does not contain every possible
error type.
- bst_find.c, bst_print.c, and bst_traverse.c provide several
simple test programs for your binary search tree library:
- bst_find: you provide several words on the command line and it
1) inserts the words into a bstree, and 2) tries to find each
command line argument and prints out its status (found or not
found). The program searches all of your words plus argv[0], which
is the command name. Only the command name should not be found.
- bst_print: you provide several words on the command line and
it inserts them into a bstree and then visits the tree inorder and
prints the words out in sorted order.
- bst_traverse: takes a file as input and stores each word and the
line number of the first line on which it appears in a bstree. Then
it:
- traverses the binary tree in the forward direction and prints
the words and their line numbers in ascending alphabetical
order.
- traverses the binary tree in the reverse direction and prints
the words and their line numbers in descending alphabetical
order.
- prints out all words that begin with the letters m-o.
wordfile is a simple test file for
bst_traverse.
- There is a file called makefile that you can use to automatically
create executables, so long as you name your files the way I told you to.
To use the makefile, you type:
make executable-name
For example:
make bst_find
If you simply type make, without any parameters, then it will try to
compile all the executables that are listed in the makefile. The TAs
will go over makefiles with you at the beginning of lab.
Linked-List Style Traversal of Binary Search Trees
In order to extend a binary search tree to handle linked-list style
traversals, you are going to have to make a number of modifications to
my existing code:
- add next and prev pointers to each node in a tree.
The next pointer should point to the next element in ascending
order in the tree and the prev pointer should point to the
next element in descending order in the tree. For example, if we have
the data set {3, 8, 13, 23, 28} then the next pointer
for 13 should point to the node for 23 and the prev
pointer for 13 should point to the node for 8.
- add a sentinel binary tree node pointer to the tree container struct (Bstree).
This sentinel binary node
will point to the minimum and maximum elements of the tree, via its
next and prev pointers. In the above data set, the
sentinel node's next pointer would point to 3 and its
prev pointer would point to 28. Additionally,
3's prev pointer and 28's next pointer will both
point to the sentinel node. You can set the remaining fields in
the sentinel node (key, value, left_child,
and right_child) to NULL since they will not be used.
- when you insert a new value into the tree, you will need to make the
new value point to the appropriate successor and predecessor values
in the tree, as well as adjust the
prev and next pointers of these values so that they
point to the new value. If the child is added as a left child, then
its successor will always be its parent. If the child is added as a right
child, then its predecessor will always be its parent. From these two
relationships you should be able to figure out how to insert a child into
the linked list. It is unacceptable to traverse the linked list from
the front in order to find where to insert the new node,
because that would require an O(n) search and would destroy the O(log n)
performance of the insert.
- you are going to need to modify your find procedure so that
it returns a pointer to a node containing the target key, rather than
the value associated with that key.
- you are going to add
a bstree_next
function that returns the node with the next higher key in the tree,
a bstree_prev
function that returns the node with the next lower key in the tree, and
a bstree_end function that returns 1 if the node passed to it is
the sentinel node and 0 otherwise.
- you will make a few other minor changes to my code which are noted in
the "New Binary Tree Library" section below.
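The following is a minimal sketch of the O(1) splice described above. It assumes a node struct named Bstnode: the key, value, left_child, and right_child fields match those named for the sentinel, but the struct layout and the helper name link_new_leaf are hypothetical. The helper relies on the two relationships stated above. A new left child goes immediately before its parent in the list, and a new right child goes immediately after it.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical node layout; the lab only requires that next and prev exist. */
typedef struct bstnode {
    void *key;
    void *value;
    struct bstnode *left_child;
    struct bstnode *right_child;
    struct bstnode *next;   /* successor: next node in ascending order */
    struct bstnode *prev;   /* predecessor: next node in descending order */
} Bstnode;

/* Hypothetical helper: splice the new leaf `child` into the list next to
   `parent`. A left child's successor is its parent, so it goes just before
   the parent; a right child's predecessor is its parent, so it goes just
   after. No list walk is needed, so this is O(1). */
void link_new_leaf(Bstnode *parent, Bstnode *child, int is_left) {
    assert(parent != NULL && child != NULL);
    if (is_left) {
        child->next = parent;
        child->prev = parent->prev;
        parent->prev->next = child;   /* old predecessor now points to child */
        parent->prev = child;
    } else {
        child->prev = parent;
        child->next = parent->next;
        parent->next->prev = child;   /* old successor now points back to child */
        parent->next = child;
    }
}
```

One design choice worth considering is to initialize the sentinel's next and prev pointers to the sentinel itself. The very first node can then be linked as if it were a right child of the sentinel (link_new_leaf(sentinel, root, 0)), so the empty-tree case needs no special code.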
A More Generic Binary Tree Library
In my notes I assumed that
the key was a char *. Since my library knows the type of the key,
it makes key comparisons simple--I simply use strcmp. However,
forcing the key to be a char * also limits the flexibility of the
library. In this lab you are going to make the key be a void *, just
like the value. Now your library will not know how to compare keys unless
it gets some assistance from the library's user. In particular, the user
will need to pass to new_bstree a pointer to a function that
compares two keys and returns an indication of their relative order. You
will need to store a pointer to this function in your bstree container struct
so that the insert and find routines can access it.
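Any comparator passed to new_bstree must match the signature int (*)(void *, void *). The sketch below shows two hypothetical comparators. The first reproduces the old strcmp ordering for char * keys, and the second assumes keys are stored as pointers to int. It also includes an illustrative container struct, BstreeSketch, that keeps the function pointer. BstreeSketch is not part of the required interface.

```c
#include <assert.h>
#include <string.h>

/* The comparator signature new_bstree takes. */
typedef int (*CompareFn)(void *key1, void *key2);

/* Hypothetical comparator for char * keys: same ordering as the old library. */
int compare_strings(void *key1, void *key2) {
    assert(key1 != NULL && key2 != NULL);
    return strcmp((char *)key1, (char *)key2);
}

/* Hypothetical comparator for keys stored as pointers to int. It returns
   -1, 0, or 1 via comparisons rather than subtraction, avoiding overflow. */
int compare_ints(void *key1, void *key2) {
    int a = *(int *)key1;
    int b = *(int *)key2;
    return (a > b) - (a < b);
}

/* Illustrative container: insert and find would call t->compare(k1, k2). */
typedef struct {
    void *root;          /* placeholder for the real root and sentinel fields */
    CompareFn compare;   /* supplied by the user through new_bstree */
} BstreeSketch;
```

Because the library only ever calls compare through the stored pointer, the same insert and find code works for string keys, integer keys, or a minutes/seconds struct.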
New Binary Tree Library Interface
You should create a new binary tree library with the following interface. You
are not allowed to change any part of the interface:
- void bstree_insert(void *tree, void *key, void *value):
Insert the (key,value) pair into the tree in sorted order. Hint:
You will need to modify my insertion helper routine so that it has
access to the parent node. You will probably want to add an additional
parent parameter to the insertion helper routine. When you get to
the bottom of the tree and create the new node, you can check to see
whether it should be the left or right child of the parent node and
set the next and prev links for both the new node and
the parent node accordingly.
- void *bstree_find(void *tree, void *key, bool *found):
Find the node associated with key in the binary tree and return either
1) a pointer to that node or 2) a pointer to the first node whose key
is greater than the target key. If the key is found, then set the found
parameter to true; otherwise set it to false.
Note that my code returns NULL
if the key is not found. However, when
you are dealing with range queries you do not want to force the user
to ensure that the first key in the range is in the tree. Instead you
want to get a starting spot. For example,
suppose I want all students whose grades are between 80 and 89. If the
grades of these students are 82, 85, and 88, I would like my find function
to return a pointer to 82, not return NULL. This is what your find function
will now be doing. If the tree is empty or the search key is greater than
every key in the tree, then return a pointer to the sentinel node.
The found flag is a convenience for the user. The user could determine
whether or not the key was found by retrieving the key from the returned
node and determining whether or not it is equal to the search key.
However, it is much easier if the user can simply check the flag. Passing
a parameter as a pointer and then setting it is a common way of
allowing a function to return more than one value.
- void *bstree_next(void *node): returns the node associated with the next
higher key in the tree.
- void *bstree_prev(void *node): returns the node associated with the
next lower key in the tree.
- int bstree_end(void *tree, void *node): returns true if the node is the
sentinel node of the tree
and false otherwise. You can use bstree_end when iterating through
the tree to detect that you have stepped past the last node (when
traversing in ascending order) or past the first node (when traversing
in descending order), since in either case you land on the sentinel.
- void *new_bstree(int (*compare)(void *key1, void *key2)): create a
container struct for a binary search tree and return it as
a void *.
The compare function will
take pointers to two keys
and return a negative number, 0, or a positive number depending on whether key1 is
less than, equal to, or greater than key2.
Your library code should store a pointer to the function in the
container struct
for the binary search tree. It will need to use this function when
it tries to find or insert values in the tree.
- void *bstree_key(void *node): This function only needs to be modified
so that it returns a void * rather than a char *. The
function body can remain unchanged.
- void *bstree_find_max(void *tree): With a linked list, you should
be able to rewrite this function so that it returns the maximum node
in O(1) rather than O(log n) time.
- void *bstree_find_min(void *tree): With a linked list, you should
be able to rewrite this function so that it returns the minimum node
in O(1) rather than O(log n) time.
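The following is a minimal, self-contained sketch of several of the interface functions above. It assumes a node with key, value, left_child, right_child, next, and prev fields, and a container holding the root, a sentinel, and the comparator. The struct layouts, the helpers splice_before, splice_after, and insert_helper, and the sample comparator compare_ints are all hypothetical. Two choices here are assumptions rather than requirements: the sentinel of an empty tree points to itself, and bstree_find is written iteratively, remembering the smallest key seen so far that is larger than the target.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical layouts; only the public function signatures come from the lab. */
typedef struct bstnode {
    void *key, *value;
    struct bstnode *left_child, *right_child;
    struct bstnode *next, *prev;   /* ascending / descending neighbors */
} Bstnode;

typedef struct {
    Bstnode *root;
    Bstnode *sentinel;   /* sentinel->next is the min, sentinel->prev the max */
    int (*compare)(void *key1, void *key2);
} Bstree;

void *new_bstree(int (*compare)(void *key1, void *key2)) {
    Bstree *t = malloc(sizeof(Bstree));
    Bstnode *s = calloc(1, sizeof(Bstnode));   /* key, value, children NULL */
    assert(t != NULL && s != NULL);
    s->next = s->prev = s;                     /* empty list: self-loop */
    t->root = NULL;
    t->sentinel = s;
    t->compare = compare;
    return t;
}

static void splice_after(Bstnode *p, Bstnode *n) {
    n->prev = p;
    n->next = p->next;
    p->next->prev = n;
    p->next = n;
}

static void splice_before(Bstnode *p, Bstnode *n) {
    n->next = p;
    n->prev = p->prev;
    p->prev->next = n;
    p->prev = n;
}

/* Recursive helper with the extra parent parameter suggested in the hint. */
static Bstnode *insert_helper(Bstree *t, Bstnode *node, Bstnode *parent,
                              void *key, void *value) {
    if (node == NULL) {
        Bstnode *n = calloc(1, sizeof(Bstnode));
        assert(n != NULL);
        n->key = key;
        n->value = value;
        if (parent == NULL)
            splice_after(t->sentinel, n);   /* first node in the tree */
        else if (t->compare(key, parent->key) < 0)
            splice_before(parent, n);       /* left child: successor is parent */
        else
            splice_after(parent, n);        /* right child: predecessor is parent */
        return n;
    }
    if (t->compare(key, node->key) < 0)
        node->left_child = insert_helper(t, node->left_child, node, key, value);
    else
        node->right_child = insert_helper(t, node->right_child, node, key, value);
    return node;
}

void bstree_insert(void *tree, void *key, void *value) {
    Bstree *t = tree;
    t->root = insert_helper(t, t->root, NULL, key, value);
}

/* Returns the node holding key, or else the first node with a larger key
   (the sentinel if there is none). */
void *bstree_find(void *tree, void *key, bool *found) {
    Bstree *t = tree;
    Bstnode *node = t->root, *candidate = t->sentinel;
    *found = false;
    while (node != NULL) {
        int cmp = t->compare(key, node->key);
        if (cmp == 0) { *found = true; return node; }
        if (cmp < 0) {
            candidate = node;   /* larger than key; look left for a closer one */
            node = node->left_child;
        } else {
            node = node->right_child;
        }
    }
    return candidate;
}

void *bstree_next(void *node) { return ((Bstnode *)node)->next; }
void *bstree_prev(void *node) { return ((Bstnode *)node)->prev; }
void *bstree_key(void *node) { return ((Bstnode *)node)->key; }
int bstree_end(void *tree, void *node) { return node == ((Bstree *)tree)->sentinel; }
void *bstree_find_min(void *tree) { return ((Bstree *)tree)->sentinel->next; }
void *bstree_find_max(void *tree) { return ((Bstree *)tree)->sentinel->prev; }

/* Hypothetical sample comparator for keys stored as pointers to int. */
int compare_ints(void *key1, void *key2) {
    int a = *(int *)key1, b = *(int *)key2;
    return (a > b) - (a < b);
}
```

To list all grades from 80 through 89, a caller could call bstree_find with 80 and then follow bstree_next until bstree_end returns true or a key exceeds 89. That walk visits only the matching nodes plus one. With a self-pointing sentinel, bstree_find_min and bstree_find_max return the sentinel for an empty tree, so callers should check bstree_end before reading a key.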
The following functions should also be in your bstree library, but they do
not need to be modified (i.e., you can use my bstree code without modifying
it):
- void *bstree_root(void *tree)
- void *bstree_left(void *node)
- void *bstree_right(void *node)
- void *bstree_value(void *node)
Finally, you can delete the following two functions from my library:
- void free_bstree(void *tree)
- void *bstree_delete(void *tree, void *key)
Race Results
In this part of the lab you are going to read the results of a race and then
allow the user to perform several types of range queries.
Input
The input to your program will consist of lines of the form:
FirstName LastName mm:ss
where FirstName and LastName are single word fields and
mm:ss is the runner's time in minutes and seconds. A sample file
might be:
Nels VanderZanden 18:03
Mickey Mouse 20:05
Minnie Mouse 17:50
Brad VanderZanden 16:57
Daffy Duck 17:08
Joe Tortoise 29:08
Sally Hare 16:59
Naturally I won :)
Queries
Once you have read the input, the user will be able to enter queries on stdin
that request information. Your program should support the following queries:
Program Output
Print the runners who meet the query criteria one per line, with a single
space between fields. Note that you cannot assume that the fields in
the input will be separated by single spaces, so you will need
to store the fields individually in a runner struct.
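Fields may be separated by any amount of whitespace, so one way to split a line is sscanf with %s conversions, which skip leading whitespace. This sketch asks for four fields, so any return value other than 3 flags a line with too few or too many fields. The RunnerFields struct, its 64-byte field size, and the function names are assumptions for illustration.

```c
#include <assert.h>
#include <stdio.h>

#define FIELD_MAX 64   /* hypothetical per-field buffer size */

typedef struct {
    char first[FIELD_MAX];
    char last[FIELD_MAX];
    char time[FIELD_MAX];
} RunnerFields;

/* Splits a line on arbitrary whitespace. Returns the number of fields read
   (at most 4, or EOF for a blank line); anything other than 3 is an error.
   Note: %63s stops after 63 characters, so a longer word would be counted
   as two fields. */
int split_runner_line(const char *line, RunnerFields *r) {
    char extra[FIELD_MAX];   /* a fourth field here means too many fields */
    assert(line != NULL && r != NULL);
    return sscanf(line, "%63s %63s %63s %63s",
                  r->first, r->last, r->time, extra);
}

/* Prints the fields separated by single spaces, one runner per line. */
void print_runner(const RunnerFields *r) {
    printf("%s %s %s\n", r->first, r->last, r->time);
}
```

Storing each field separately is what lets the output use single spaces even when the input did not.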
Program Design
You will need to read your input into a binary search tree using the time
as the key and the name as the value. You can either store the time in a
struct that has two fields, minutes and seconds, or you can convert the time
to seconds using the equation 60*minutes + seconds. If you store the
time as a struct, then your comparison function will need to compare both the
minutes and the seconds. You can use strchr to find the
: delimiter and
split the time into its minute and second fields.
Once you have read the input into your binary search tree you will need to
enter a loop that reads stdin and performs the indicated query.
Error Checking
You need to perform the following error checks:
- Check that the number of command line arguments is correct.
- Check that the input file can be opened.
- Check that each line of input has exactly three fields.
- Check that the format of the time is correct and that the minutes and
seconds are both numeric. A time must have one
or more digits for the minutes and exactly two digits for the seconds.
- Check that a query is correctly formatted and print an appropriate
error message if it is not correctly formatted. Your error messages
do not need to precisely imitate mine but they should be easily
understood by a user.
You may assume that there are no duplicate times in the input (i.e., you
do not have to error check this condition) and you do not have to catch
time errors of the form 2a:40 or 23:4a, since sscanf and atoi can both
convert the leading digits of these strings to numbers. In contrast, you must catch errors of
the form a2:40 and 23:a4 because sscanf and atoi cannot convert these
strings to numbers. Note that in this lab you cannot use atoi or else you
will miss some errors. Can you see why?
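One way to implement the time-format check is to locate the colon with strchr and test every character with isdigit before converting anything. The function name parse_race_time is hypothetical. This sketch is deliberately stricter than the lab requires, since it also rejects forms such as 2a:40. It does not check that the seconds are below 60, because the lab does not state that rule.

```c
#include <assert.h>
#include <ctype.h>
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

/* Returns true and stores the total seconds in *total if s is one or more
   digits, a colon, and exactly two digits; returns false otherwise. */
bool parse_race_time(const char *s, int *total) {
    assert(s != NULL && total != NULL);
    const char *colon = strchr(s, ':');
    if (colon == NULL || colon == s)
        return false;                       /* no ':' or no minute digits */
    for (const char *p = s; p < colon; p++)
        if (!isdigit((unsigned char)*p))
            return false;                   /* e.g. "a2:40" */
    if (strlen(colon + 1) != 2 ||
        !isdigit((unsigned char)colon[1]) || !isdigit((unsigned char)colon[2]))
        return false;                       /* e.g. "23:a4" or "23:4" */
    /* Only digits remain, so strtol cannot silently fail here. */
    *total = 60 * (int)strtol(s, NULL, 10) + (int)strtol(colon + 1, NULL, 10);
    return true;
}
```

Validating the characters first means the conversion step never has to report an error.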
Design Document
To help you think through how you might want to design your program for
this lab, you should answer the following questions and hand them in when
told to do so by the TA:
- Show the binary search tree that would result from processing the
sample race results file shown earlier. When you draw the tree, also
draw the next and prev links between the nodes. To make your drawing
simpler, it is okay to:
- As each node from the sample race results file
gets inserted, show which nodes immediately precede
and succeed it in the current tree (not in the final tree, but in the
tree that results immediately after the node is inserted). For example:
Inserted Node           | Previous Node           | Next Node               |
Nels VanderZanden 18:03 | Sentinel Node           | Sentinel Node           |
Mickey Mouse 20:05      | Nels VanderZanden 18:03 | Sentinel Node           |
Minnie Mouse 17:50      | Sentinel Node           | Nels VanderZanden 18:03 |
Now answer the following questions:
- Based on the above insertion pattern, if a node is inserted as
a left child, what is its successor node (i.e., what is the
relationship of the successor node to the left child: parent,
grandparent, left sibling, right sibling, left child, or right child)?
- Based on the above insertion pattern, if a node is inserted as
a right child, what is its predecessor node?
- Suppose you decide to represent the key as a minute/second pair. Show the
struct you would declare.
- Now show the comparison function you would write to compare two keys
in your minute/second pair representation.
- Suppose you decide to represent the key as an integer. Show the
comparison function you would write to compare two keys.
- Show the call you would make to insert a key with value 1078 into
a tree named my_tree.
Assume the value field is a null pointer for this problem.
- Show the struct you plan to use to store the name fields.
- Suppose you have created a tree named my_tree and inserted
integer keys into it. Complete the following problems:
- Write a code fragment that prints the value of the minimum key in the
tree.
- Write a code fragment that prints the first value that is greater than
or equal to 100 in the tree.
- Write a code fragment that traverses the tree's linked list in
ascending order and
prints the values of all the keys in the tree.
- Given the sample race results file presented earlier, write down the
output that should be produced by your program for each of the following
queries:
- first 3
- last 2
- range 14:00 17:30
- range * 18:00
- range 19:00 *
What To Hand In
You should submit your design document when the TA asks for it during the lab.
You will submit the following files to the TAs via the submit script:
- bstree.h
- bstree.c
- runner.c