CS140 -- Lab 7


Lab Objective

This lab is designed to give you practice with:

  1. using and implementing binary search trees,
  2. implementing information hiding using void *'s,
  3. writing and using software libraries, and
  4. implementing and traversing general trees.

The lab is divided into three parts. The first part has you write a binary search tree library. The second part has you count the frequency of words in a file and print out the words and their frequency in alphabetical order. This part is designed to give you some warm-up practice with binary search trees. The third part has you lay out a family tree. This part is designed to give you practice with creating and traversing general purpose trees, and with using binary search trees.

Lab Materials


Binary Search Tree Library

In this part of the lab you are going to create a binary search tree library that supports several operations. There are two different interfaces that your library may choose to support. Choose only one of the two interfaces.

Option 1: Separate Keys and Values

The first option minimizes the number of function pointers you must use by having a separate key and value for each node in the binary tree:

  1. void tree_insert(char *key, void *value, void *binary_tree): Insert the (key, value) pair into the tree in sorted order.
  2. void *tree_find(char *key, void *binary_tree): Find the node associated with key in the binary tree and return either a pointer to that node's value or 0 if the key is not in the tree.
  3. void *create_tree(): create a record for a binary search tree and return it as a void *.
  4. void print_tree(void *tree, void (*print_fct)(char *key, void *value)): print the tree in sorted order based on the values of the keys. print_tree should perform an in-order traversal. print_fct should print an appropriate line for the current node. It can cast the value argument to the appropriate type and then print the value. For example:

	// This function declaration goes in your application program
	void print_name(char *key, void *value) {
	  // this function just prints the key and ignores the value field. You can
	  // use this function for the bst_test_option1 program but you will need to modify it
	  // to work with the word_count program
	  printf("%s\n", key);
	}
	...
	int main() {
	  void *tree;
	  ...
	  print_tree(tree, print_name);
	}

Option 2: Single Value Field

The second option gives you more practice with function pointers. If you choose this option, then a binary tree node will have only a value field. When you insert a (key, value) pair into the tree, you will need to bundle your key and your value into a struct and pass the struct to your insert routine as the value field. Since your insert and find functions will not know how to compare two structs, you will need to pass your create_tree function a comparison function that takes two value structs, v1 and v2, and returns a negative number, 0, or a positive number based on whether v1 is less than v2, equal to v2, or greater than v2:

  1. void tree_insert(void *value, void *binary_tree): Insert the value into the tree in sorted order.
  2. void *tree_find(void *key, void *binary_tree): Find the node associated with key in the binary tree and return either a pointer to that node's value or 0 if the key is not in the tree.
  3. void *create_tree(int (*compare)(void *val1, void *val2)): create a record for a binary search tree and return it as a void *. The compare function will take pointers to two values and return a negative number, 0, or a positive number depending on whether val1 is less than, equal to, or greater than val2. The types of val1 and val2 should be the same--they are the values that you are storing in each binary tree node. Your library code should store a pointer to the function in the record for the binary search tree. It will need to use this function when it tries to find or insert values in the tree.
  4. void print_tree(void *tree, void (*print_fct)(void *value)): print the tree in sorted order based on the values of the keys. print_tree should perform an in-order traversal. print_fct should cast the value argument to the appropriate type and then print the value on its own line. For example:

	// This function declaration goes in your application program
	void print_name(void *value) {
	  // in bst_test_option2.c you do not have a (key,value) pair, only a key. Hence you can treat
	  // the key as your value field. When the value is passed to print_name, you simply cast
	  // the value to a char *. For word_count, you will need to use a struct to store your
	  // (key, value) pair, so you will need to cast your value to the struct and then print
	  // the key and value parts of the struct.
	  printf("%s\n", (char *)value);
	}
	...
	int main() {
	  void *tree;
	  ...
	  print_tree(tree, print_name);
	}
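
To make the comparison function for Option 2 more concrete, here is a minimal sketch of one that could be passed to create_tree. The struct name word_entry, its fields, and the function name compare_entries are illustrative assumptions; the lab does not prescribe them.

```c
#include <string.h>

/* Hypothetical struct bundling a key and a value for Option 2. */
struct word_entry {
    char *word;   /* key: the string the tree is ordered by */
    int count;    /* value: data carried along with the key */
};

/* Returns a negative number, 0, or a positive number depending on
 * whether val1's key sorts before, equal to, or after val2's key. */
int compare_entries(void *val1, void *val2)
{
    struct word_entry *e1 = (struct word_entry *)val1;
    struct word_entry *e2 = (struct word_entry *)val2;
    return strcmp(e1->word, e2->word);
}
```

Because tree_find takes a void * key under this option, one way to look up a word is to fill in only the word field of a temporary word_entry and pass its address as the key.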

Problem Specifications

  1. Place the function declarations in a file called bintree.h and the function implementations in a file called bintree.c.
  2. Place the structs that you use to implement a binary search tree in bintree.c, so that their declaration, and hence the binary search tree's implementation, is hidden from the user.
  3. There are programs in /home/bvz/cs140/lab7 called bst_test_option1.c and bst_test_option2.c that you can use to test your binary search tree library. They take any number of strings on the command line and print them out in alphabetical order.
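
The split between bintree.h and bintree.c described above can be sketched as follows for Option 1. This is only one possible layout: the struct names tree and tree_node and their fields are assumptions, and only create_tree is shown. The prototype is written as create_tree(void), which is the explicit C spelling of the create_tree() declaration given earlier.

```c
#include <stdlib.h>

/* --- would go in bintree.h: prototypes only, no struct definitions --- */
void *create_tree(void);

/* --- would go in bintree.c: the hidden representation --- */
struct tree_node {              /* hypothetical field names */
    char *key;
    void *value;
    struct tree_node *left;
    struct tree_node *right;
};

struct tree {
    struct tree_node *root;     /* NULL while the tree is empty */
};

void *create_tree(void)
{
    struct tree *t = malloc(sizeof *t);
    if (t != NULL)
        t->root = NULL;
    return t;                   /* the caller sees only an opaque void * */
}
```

Since the application includes only bintree.h, it never sees the fields of struct tree and cannot depend on them. Inside bintree.c, each library function casts its void * argument back to struct tree * before using it.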


Word Frequency

Write a program named word_count that reads words from a file given on the command line and counts the number of times each word occurs in the file. Once all the words have been read, print them in alphabetical order, along with the number of times they occurred in the file. Do not worry about potential errors other than those listed under Error Checking below.

If your input file is:

	the quick brown fox
	jumped over the brown
	fence
	
then your output would look like:
	brown 2
	fence 1
	fox 1
	jumped 1
	over 1
	quick 1
	the 2
	
Notice that there is a single space between a word and its count. Case does matter. For example, if the first "the" was capitalized in the above input file, then "The" and "the" would appear as two separate words. This assumption makes your life easier, not more difficult, because you do not have to worry about converting words to all lower or upper case.

Your program should use your binary search tree library to store the words and their frequencies.
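
One way to read the words is sketched below. It assumes that a "word" is any run of non-whitespace characters and that no word is longer than 255 characters; neither assumption is stated in the lab. The function name for_each_word and its callback are hypothetical.

```c
#include <stdio.h>

#define MAX_WORD 256  /* assumed upper bound on word length, including '\0' */

/* Calls visit once per whitespace-delimited word and returns the number
 * of words read. visit may be NULL if only the count is wanted. */
int for_each_word(FILE *in, void (*visit)(const char *word))
{
    char word[MAX_WORD];
    int n = 0;

    /* %255s skips leading whitespace, including blank lines, and stops
     * at the next whitespace character or after 255 characters. */
    while (fscanf(in, "%255s", word) == 1) {
        if (visit != NULL)
            visit(word);
        n++;
    }
    return n;
}
```

The buffer is reused on every iteration, so a callback that stores a word in the tree would need to copy it first. An empty file simply produces zero calls to the callback.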

Error Checking

Your program should perform the following error checks:

  1. Check that word_count has the correct number of command line arguments.
  2. Check that the input file can be successfully opened.

You do not need to check for other errors. However, your program should handle special situations, such as blank lines or empty files.
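
The two error checks might be sketched as follows. The helper name open_input and the wording of the messages are assumptions; the lab does not specify what to print.

```c
#include <stdio.h>

/* Returns the opened input file, or NULL after printing a message to
 * stderr. Expects exactly one argument after the program name. */
FILE *open_input(int argc, char **argv)
{
    FILE *in;

    if (argc != 2) {
        fprintf(stderr, "usage: %s filename\n",
                argc > 0 ? argv[0] : "word_count");
        return NULL;
    }
    in = fopen(argv[1], "r");
    if (in == NULL)
        perror(argv[1]);   /* reports why the file could not be opened */
    return in;
}
```

main could call open_input(argc, argv) first and exit with a nonzero status if it returns NULL.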


Drawing A Family Tree

Write a program called family_tree that reads an input file that contains a family tree, calculates the graphical coordinates for each node in the tree, and outputs the graphical coordinates to a user-specified file. You may then pass the output file as a command line argument to /home/bvz/cs140/labs/lab7/display_tree. display_tree will read the coordinates that you produce and display the family tree.

Format of the Input

The input to your program will consist of lines of the form:

	parent-name child1-name child2-name ... childk-name

The first line of the input file represents the root of the family tree. For example:

	Mary Jane Jill Emily Howard
	Jill Joe William Eddie
	Howard Tom James Ellen Katie
	Jane Tommy Jennifer Susan
	Tom Hank Nancy

Mary is the root of this tree. To simplify the problem, you may assume that the children's names are unique. Note that a child may appear as a parent later in the input. You should not assume that children always appear after their parents. They may appear before their parents as well. Children that do not have an entry are assumed to be leaf nodes in the family tree.

Program Design

Your program should read the input lines and construct a family tree. You cannot make any assumptions about how many children a person might have. Once you have constructed the family tree, you can compute a layout for the family tree using a two step procedure as follows:

  1. Calculate the space required by each subtree: The space for a subtree should be calculated as follows:

  2. Calculate the position of each node. The y coordinate will be the depth of the node. The x coordinate will represent the center of the node and can be calculated as follows:

Format of the Output

For each person in your family tree you should output the person's name, the person's x and y coordinates, and the names of the person's children. Each person should have their own separate line of output. You should use a pre-order traversal to print your tree. A sample line of output might be:

Mary 0 0 Jane Jill Emily Howard
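
A pre-order traversal that produces lines in this format could look like the sketch below. The struct person layout (children kept in a linked list through first_child and next_sibling) and the use of int coordinates are assumptions made for illustration; they are one possible representation, not the required one.

```c
#include <stdio.h>

/* Hypothetical node layout: children form a singly linked list. */
struct person {
    const char *name;
    int x, y;                       /* assumed to be integers */
    struct person *first_child;
    struct person *next_sibling;
};

/* Pre-order: print this person's line first, then visit each child. */
void print_person(const struct person *p, FILE *out)
{
    const struct person *c;

    if (p == NULL)
        return;
    fprintf(out, "%s %d %d", p->name, p->x, p->y);
    for (c = p->first_child; c != NULL; c = c->next_sibling)
        fprintf(out, " %s", c->name);
    fputc('\n', out);
    for (c = p->first_child; c != NULL; c = c->next_sibling)
        print_person(c, out);
}
```

A leaf node prints only its name and coordinates, because its child list is empty.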

Error Checking

You should perform the following error checks:

  1. Check that the number of command line arguments is correct.
  2. Check that the input and output files can be successfully opened.
  3. Every parent must have at least one child, so ensure that each line in the input has at least two fields: one for the parent and at least one for a child.
  4. Check that the input file is not empty (i.e., there must be at least one line of input).
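
The "at least two fields" check can be done without fixing a maximum number of children. The sketch below assumes fields are separated by whitespace; the helper name count_fields is hypothetical.

```c
#include <ctype.h>

/* Counts the whitespace-separated fields in a line without modifying it. */
int count_fields(const char *line)
{
    int n = 0;
    int in_field = 0;   /* nonzero while scanning inside a field */

    for (; *line != '\0'; line++) {
        if (isspace((unsigned char)*line)) {
            in_field = 0;
        } else if (!in_field) {
            in_field = 1;   /* first character of a new field */
            n++;
        }
    }
    return n;
}
```

A line whose count is less than 2 has a parent with no children and would be reported as an error under check 3.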

You may assume that names are unique in the sense that no name appears more than once in a family tree (note that a name may appear as both a parent and a child but a name will not appear more than once as a child).

Make sure that you check for special situations, such as a parent having only a single child or a parent's name requiring more space than the space required by its children.

Design Document

  1. Draw the family tree that would result from the sample input given earlier. Do not worry about space or layout coordinates. Just draw the tree.

  2. For each person in the sample input, calculate the space required by that person's node in the family tree.

  3. For each person in the sample input, calculate the position of that person's node in the family tree.

  4. Write down the struct you will use to represent a node in the family tree. This struct will need to include fields to represent the space required by the node and its x and y coordinates. Can you use an array to represent a node's children? Why or why not?

  5. As you read each input line, you should try to determine whether or not a node already exists for the parent. If a node already exists, then you should retrieve the node and establish child links to each child. If a node does not yet exist, then you should create a node for the parent. For each of the parent's children you should also determine whether or not a node already exists for the child. If a node already exists, then you should retrieve the node and establish a child link to it. If a node does not yet exist, then you should create a node for the child and establish a child link to it.

  6. What type of traversal (pre-order, in-order, or post-order) will you need to use to calculate the space required by each node in the family tree? Justify your answer.

  7. What type of traversal (pre-order, in-order, or post-order) will you need to use to calculate the position of each node in the family tree? Justify your answer.


What To Hand In

You should submit your design document when the TA asks for it during the lab. You will submit the following files to the TAs via the submit script:

  1. bintree.h
  2. bintree.c
  3. word_count.c
  4. family_tree.c