CS140 -- Lab 7
Lab Objective
This lab is designed to give you practice with:
- using and implementing binary search trees,
- implementing information hiding using void * pointers,
- writing and using software libraries, and
- implementing and traversing general trees.
The lab is divided into three parts. The first part has you write a binary search
tree library. The second part has you count the frequency of words in a file and
print out the words and their frequency in alphabetical order. This part is designed
to give you some warm-up practice with binary search trees.
The third part has you lay out a family tree. This part is designed to give you
practice with creating and traversing general purpose trees, and with
using binary search trees.
Lab Materials
- Executables for the test files are in the directory
/home/bvz/cs140/labs/lab7. As usual, if you have
questions about how these programs should work, try running these executables.
- word1 is a test file for word_count and
family1, family2, and family3 are test files
for family_tree. You should develop your own test files
as well.
- bst_test_option1.c and bst_test_option2.c provide
simple test programs that test your binary search tree library.
Binary Search Tree Library
In this part of the lab you are going to create a binary search tree library that
supports several operations. There are two different interfaces that your library may choose
to support. Choose only one of the two interfaces.
Option 1: Separate Keys and Values
The first option minimizes the number of function pointers you must use by having a separate key
and value for each node in the binary tree:
- void tree_insert(char *key, void *value, void *binary_tree):
Insert the (key, value) pair into the tree in sorted order.
- void *tree_find(char *key, void *binary_tree):
Find the node associated with key in the binary tree and return either
a pointer to that node's value or 0 if the key is not in the tree.
- void *create_tree(): create a record for a binary search tree and return it as
a void *.
- void print_tree(void *tree, void (*print_fct)(char *key, void *value)): print the tree in
sorted order based on the values of the keys. print_tree should
perform an in-order traversal. print_fct should print an appropriate line
for the current node. It can cast the value
argument to the appropriate type and then print the value. For example:
// This function declaration goes in your application program
void print_name(char *key, void *value) {
// this function just prints the key and ignores the value field. You can
// use this function for the bst_test_option1 program but you will need to modify it
// to work with the word_count program
printf("%s\n", key);
}
...
int main() {
void *tree;
...
print_tree(tree, print_name);
}
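To make the Option 1 interface more concrete, here is a minimal sketch of how the four functions might fit together. It is one possible approach, not the required implementation. The Node and Tree structs and the print_nodes helper are hypothetical names, and the sketch assumes that duplicate keys go into the right subtree and that the tree stores the caller's key pointer rather than a copy. In your library these structs would appear only in bintree.c.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical private types; callers only ever see a void *. */
typedef struct node {
    char *key;
    void *value;
    struct node *left, *right;
} Node;

typedef struct {
    Node *root;
} Tree;

/* Allocate an empty tree record and hand it back as an opaque pointer. */
void *create_tree(void) {
    Tree *t = malloc(sizeof *t);
    if (t != NULL) t->root = NULL;
    return t;
}

/* Walk down from the root to the empty link where the key belongs. */
void tree_insert(char *key, void *value, void *binary_tree) {
    Tree *t = binary_tree;
    Node **link = &t->root;
    while (*link != NULL) {
        int c = strcmp(key, (*link)->key);
        link = (c < 0) ? &(*link)->left : &(*link)->right;
    }
    Node *n = malloc(sizeof *n);
    if (n == NULL) return;
    n->key = key;            /* assumption: pointer is stored, not copied */
    n->value = value;
    n->left = n->right = NULL;
    *link = n;
}

/* Return the value stored with key, or NULL (0) if key is absent. */
void *tree_find(char *key, void *binary_tree) {
    Tree *t = binary_tree;
    Node *n = t->root;
    while (n != NULL) {
        int c = strcmp(key, n->key);
        if (c == 0) return n->value;
        n = (c < 0) ? n->left : n->right;
    }
    return NULL;
}

/* In-order traversal: left subtree, current node, right subtree. */
static void print_nodes(Node *n, void (*print_fct)(char *key, void *value)) {
    if (n == NULL) return;
    print_nodes(n->left, print_fct);
    print_fct(n->key, n->value);
    print_nodes(n->right, print_fct);
}

void print_tree(void *tree, void (*print_fct)(char *key, void *value)) {
    Tree *t = tree;
    print_nodes(t->root, print_fct);
}
```

Because the application only holds a void *, it cannot reach into the tree's fields, which is the information hiding this lab asks you to practice.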
Option 2: Single Value Field
The second option gives you more practice with function pointers. If you choose this option, then
a binary tree node will have only a value field. When you insert a (key,value) pair into the
tree, you will
need to bundle your key and your value into a struct and pass the struct to your insert routine
as the value field. Since your insert and find functions will not know how to compare two structs,
you will need to pass your create_tree function a comparison function that takes two value structs,
v1 and v2, and returns a negative number, 0, or a positive number based on whether v1 is less than
v2, equal to v2, or greater than v2:
- void tree_insert(void *value, void *binary_tree):
Insert the value into the tree in sorted order.
- void *tree_find(void *key, void *binary_tree):
Find the node associated with key in the binary tree and return either
a pointer to that node's value or 0 if the key is not in the tree.
- void *create_tree(int (*compare)(void *val1, void *val2)): create a record for a binary search tree and return it as
a void *.
The compare function will
take pointers to two values
and return a negative number, 0, or a positive number depending on whether val1 is
less than, equal to, or greater than val2. The types of val1 and
val2 should be the same--they are the values that you are storing in each
binary tree node.
Your library code should store a pointer to the function in the
record for the binary search tree. It will need to use this function when
it tries to find or insert values in the tree.
- void print_tree(void *tree, void (*print_fct)(void *value)): print the tree in
sorted order based on the values of the keys. print_tree should
perform an in-order traversal. print_fct should cast the value
argument to the appropriate type and then print the value on its own
line. For example:
// This function declaration goes in your application program
void print_name(void *value) {
// in bst_test_option2.c you do not have a (key,value) pair, only a key. Hence you can treat
// the key as your value field. When the value is passed to print_name, you simply cast
// the value to a char *. For word_count, you will need to use a struct to store your
// (key, value) pair, so you will need to cast your value to the struct and then print
// the key and value parts of the struct.
printf("%s\n", (char *)value);
}
...
int main() {
void *tree;
...
print_tree(tree, print_name);
}
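For Option 2, the comparison function you pass to create_tree might look like the following sketch. The WordEntry struct and the compare_entries name are hypothetical, chosen here only to illustrate bundling a key and a value into one struct. The sketch assumes the key is a string and that ordering is decided by strcmp on that key alone.

```c
#include <string.h>

/* Hypothetical struct that bundles a key with its value. */
typedef struct {
    char *word;   /* the key */
    int count;    /* the value */
} WordEntry;

/* Matches the compare parameter of create_tree: returns a negative number,
   0, or a positive number depending on how val1 orders against val2. */
int compare_entries(void *val1, void *val2) {
    WordEntry *e1 = val1;
    WordEntry *e2 = val2;
    return strcmp(e1->word, e2->word);
}
```

Since only the key takes part in the comparison, one way to call tree_find is to fill in just the key field of a temporary struct and pass its address as the key argument.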
Problem Specifications
- Place the function declarations in a file called bintree.h and the
function implementations in a file called bintree.c.
- Place the
structs that you use to implement a binary search tree in bintree.c, so that their
declaration, and hence the binary search tree's implementation, is hidden from the user.
- There are programs in /home/bvz/cs140/labs/lab7 called bst_test_option1.c
and bst_test_option2.c
that you can use to test your binary search tree library. They take any number of strings
on the command line and print them out in alphabetical order.
Word Frequency
Write a program named word_count
that reads words from a file given on the command line and counts the
number of times each word occurs in the file. Once all the words
have been read, print them in alphabetical order, along with the
number of times they occurred in the file. Beyond the checks listed under
Error Checking below, do not worry about potential errors.
If your input file is:
the quick brown fox
jumped over the brown
fence
then your output would look like:
brown 2
fence 1
fox 1
jumped 1
over 1
quick 1
the 2
Notice that there is a single space between a word and its count.
Case matters. For example, if the first "the" were capitalized in
the above input file, then "The" and "the" would appear as two
separate words. This makes your life easier, not harder, because
you do not have to convert words to all lower or upper case.
Your program should use your binary search tree library to store the words
and their frequencies.
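If you chose Option 1, the print function you pass to print_tree for word_count might be sketched as follows. This assumes that each node's value points to an int holding the word's count. That is one possible design, not a requirement. The names format_entry and print_word_count are hypothetical.

```c
#include <stdio.h>

/* Hypothetical helper: writes "word count" into buf with a single space
   between the word and its count. */
int format_entry(char *buf, size_t size, const char *key, const int *count) {
    return snprintf(buf, size, "%s %d", key, *count);
}

/* A print_fct for Option 1: casts the value to the type the application
   stored (assumed here to be int *) and prints one line per word. */
void print_word_count(char *key, void *value) {
    char line[256];
    format_entry(line, sizeof line, key, (int *)value);
    printf("%s\n", line);
}
```

With Option 2 the same idea applies, except that the function receives a single pointer to your struct and reads both the word and the count from it.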
Error Checking
Your program should perform the following error checks:
- Check that word_count
has the correct number of command line arguments.
- Check that the input file can be successfully opened.
You should also correctly handle special situations, such as blank lines
or empty files.
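The two error checks above can be sketched in a small helper. The name open_input and the wording of the usage message are assumptions made for illustration. The lab does not prescribe them.

```c
#include <stdio.h>

/* Hypothetical helper: returns the opened input file, or NULL after
   printing a message to stderr if either check fails. */
FILE *open_input(int argc, char **argv) {
    /* word_count expects exactly one argument: the input file name. */
    if (argc != 2) {
        fprintf(stderr, "usage: %s filename\n",
                argc > 0 ? argv[0] : "word_count");
        return NULL;
    }
    FILE *fp = fopen(argv[1], "r");
    if (fp == NULL) {
        perror(argv[1]);   /* reports why the file could not be opened */
    }
    return fp;
}
```

If you then read words with fscanf and the %s conversion, leading whitespace, including blank lines, is skipped for you, and an empty file simply yields no words.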
Drawing A Family Tree
Write a program called family_tree that reads an input file that contains
a family tree, calculates the graphical coordinates for each node in the tree, and
outputs the graphical coordinates to a user-specified file.
You may then pass the output file as a command line argument to
/home/bvz/cs140/labs/lab7/display_tree. display_tree
will read the coordinates that you produce and display the
family tree.
Format of the Input
The input to your program will consist of lines of the form:
parent-name child1-name child2-name ... childk-name
The first line of the input file represents the root of the family tree.
For example:
Mary Jane Jill Emily Howard
Jill Joe William Eddie
Howard Tom James Ellen Katie
Jane Tommy Jennifer Susan
Tom Hank Nancy
Mary is the root of this tree.
To simplify the problem, you may assume that the children's names are unique. Note that
a child may appear as a parent later in the input. You should not assume that children
always appear after their parents. They may appear before their parents as well. Children
that do not have an entry are assumed to be leaf nodes in the family tree.
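One way to break an input line into the parent's name and the children's names is sketched below. It assumes that names are separated by spaces or tabs and contain no whitespace themselves. The function name split_fields is hypothetical. The array grows with realloc, so the sketch places no limit on the number of children.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: splits line in place on whitespace.
   Returns a malloc'd array of pointers into line (the caller frees the
   array) and stores the number of fields in *count. Returns NULL with
   *count set to 0 if memory cannot be allocated. */
char **split_fields(char *line, int *count) {
    const char *delims = " \t\r\n";
    int capacity = 4;
    int n = 0;
    char **fields = malloc(capacity * sizeof *fields);
    if (fields == NULL) {
        *count = 0;
        return NULL;
    }
    for (char *tok = strtok(line, delims); tok != NULL;
         tok = strtok(NULL, delims)) {
        if (n == capacity) {
            /* Double the array so any number of children fits. */
            capacity *= 2;
            char **bigger = realloc(fields, capacity * sizeof *fields);
            if (bigger == NULL) {
                free(fields);
                *count = 0;
                return NULL;
            }
            fields = bigger;
        }
        fields[n++] = tok;
    }
    *count = n;
    return fields;
}
```

After the call, fields[0] is the parent and fields[1] through fields[count-1] are the children. The pointers refer into the line buffer, so copy any name you need to keep before reading the next line into the same buffer.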
Program Design
Your program should read the input lines and construct a family tree. You cannot
make any assumptions about how many children a person might have. Once you have constructed
the family tree, you can compute a layout for the family tree using a two step
procedure as follows:
- Calculate the space required by each subtree: The space for a subtree should be
calculated as follows:
- Space(Leaf Node) = The number of characters in the person's name
- Space(Interior Node) = max(number of characters in the person's name,
sum of the children's space + (k-1)*2), where k
is the number of children. The (k-1)*2 term puts 2 characters of
spacing between adjacent children. The max function takes care of unusual cases,
such as a parent with a single child whose name is shorter than
the parent's name.
- Calculate the position of each node. The y coordinate will be the depth of the
node. The x coordinate will represent the center of the node and
can be calculated as follows:
- Position(Root) = 0
- Position(Child_1) = Position(Parent) - Space(Parent) / 2 + Space(Child_1) / 2
- Position(Child_i) = Position(Child_(i-1)) + Space(Child_(i-1)) / 2 + 2 + Space(Child_i) / 2, for i > 1
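The space and position formulas above can be written as two small functions. This is a sketch only. The names interior_space and place_children are hypothetical, the children's spaces are passed in as a plain array for illustration, and the divisions are assumed to be C integer divisions, which the formulas do not state explicitly.

```c
/* Space of an interior node: the larger of the name length and the
   children's total space plus 2 characters between adjacent children. */
int interior_space(int name_len, const int *child_space, int k) {
    int total = (k - 1) * 2;
    for (int i = 0; i < k; i++) {
        total += child_space[i];
    }
    return name_len > total ? name_len : total;
}

/* Fills child_pos[0..k-1] with the x coordinate of each child's center,
   given the parent's position and space. */
void place_children(int parent_pos, int parent_space,
                    const int *child_space, int *child_pos, int k) {
    if (k <= 0) {
        return;
    }
    /* The first child starts at the left edge of the parent's space. */
    child_pos[0] = parent_pos - parent_space / 2 + child_space[0] / 2;
    /* Each later child sits 2 characters to the right of the previous one. */
    for (int i = 1; i < k; i++) {
        child_pos[i] = child_pos[i - 1] + child_space[i - 1] / 2
                       + 2 + child_space[i] / 2;
    }
}
```

As a worked example, take a parent named Tom with leaf children Hank (space 4) and Nancy (space 5). The children need 4 + 5 + (2-1)*2 = 11 characters, so Space(Tom) = max(3, 11) = 11. If Tom were placed at position 0, integer division would give Position(Hank) = 0 - 5 + 2 = -3 and Position(Nancy) = -3 + 2 + 2 + 2 = 3.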
Format of the Output
For each person in your family tree you should output the person's name, the person's
x and y coordinates, and the names of the person's children.
Each person should have their own separate
line of output. You should use a pre-order traversal to print your tree. A sample
line of output might be:
Mary 0 0 Jane Jill Emily Howard
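A line in this format might be assembled as in the sketch below. It assumes that the coordinates are integers and that fields are separated by single spaces, as in the sample line. The name format_person and its parameters are hypothetical.

```c
#include <stdio.h>

/* Hypothetical helper: writes "name x y child1 ... childk" into buf.
   Returns a negative value on a formatting error. The output is
   truncated if buf is too small. */
int format_person(char *buf, size_t size, const char *name, int x, int y,
                  char *const *children, int k) {
    int len = snprintf(buf, size, "%s %d %d", name, x, y);
    for (int i = 0; i < k && len >= 0 && (size_t)len < size; i++) {
        /* Append each child's name, preceded by a single space. */
        int n = snprintf(buf + len, size - (size_t)len, " %s", children[i]);
        if (n < 0) {
            return -1;
        }
        len += n;
    }
    return len;
}
```

A leaf node has k equal to 0, so its line contains only the name and the two coordinates.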
Error Checking
You should perform the following error checks:
- Check that the number of command line arguments is correct.
- Check that the input and output files can be successfully opened.
- Every parent must have at least one child, so ensure
that each line in the input has at least two fields: one for
the parent and at least one for a child.
- Check that the input file is not empty (i.e., there must be at least
one line of input).
You may assume that names are unique in the sense that no name appears more
than once in a family tree (note that a name may appear as both a parent and
a child but a name will not appear more than once as a child).
Make sure that you check for special situations, such as a parent having only
a single child or a parent's name requiring more space than the space required
by its children.
Design Document
- Draw the family tree that would result from the sample input given earlier. Do not
worry about space or layout coordinates. Just draw the tree.
- For each person in the sample input, calculate the space required by that person's
node in the family tree.
- For each person in the sample input, calculate the position of that person's
node in the family tree.
- Write down the struct you will use to represent a node in the family tree. This
struct will need to include fields to represent the space required by the node
and its x and y coordinates. Can
you use an array to represent a node's children? Why or why not?
- As you read each input line, you should try to determine whether or not a node
already exists for the parent. If a node already exists, then you should retrieve
the node and establish child links to each child. If a node does not yet exist,
then you should create a node for the parent. For each of the parent's children
you should also determine whether or not a node already exists for the child. If
a node already exists, then you should retrieve the node and establish a child
link to it. If a node does not yet exist, then you should create a node for the
child and establish a child link to it.
- What data structure can you use to efficiently determine whether or not
a parent or child node exists and, if it does exist, to efficiently retrieve it?
- What information will you need to keep in each node of this data structure?
Write down the struct that will keep this information.
- What type of traversal (pre-order, in-order, or post-order) will you need to use
to calculate the space required by each node in the family tree? Justify your answer.
- What type of traversal (pre-order, in-order, or post-order) will you need to use
to calculate the position of each node in the family tree? Justify your answer.
What To Hand In
You should submit your design document when the TA asks for it during the lab.
You will submit the following files to the TAs via the submit script:
- bintree.h
- bintree.c
- word_count.c
- family_tree.c