CS140 -- Lab 8

CS140 -- Data Structures
James S. Plank (with modifications by Brad Vander Zanden)

This is a lab for you to get practice with hashing.

The makefile for scoreproc is /home/cs140/www-home/spring-2005/labs/lab8/makefile.
The scoreproc executable is in: /home/cs140/www-home/spring-2005/labs/lab8. As usual, if you have questions about how scoreproc should work, try this executable.
There are two simple example score files for scoreproc in scfile1 and scfile2.
Scorefiles for the 1998 PGA golf season are in /home/cs140/www-home/spring-2005/labs/lab8/golf/*.
You may find that you want to use the sqrt function from the C math library. To use it you will need to include <math.h> in your file. The sqrt function takes a double as an argument and returns a double. You may want to use this function when computing a prime number.
You should look at the makefile and see how the -lm flag is used to link the C math library to your executable. You will find the -lm flag placed early in the makefile, in the definition of the library symbols. When linking in C libraries it is important to place them last in your list of files because the linker only loads those functions that it needs. It uses only the files it has already seen to determine these functions. Thus if you place the -lm flag before your scoreproc.o file, the linker will not have seen the sqrt function when it processes the -lm flag and it will not link in the sqrt function. You will then get a linker error that claims that sqrt cannot be found and you will tear out your hair because you "know" that you included the math library. By the way, the -l indicates a library is to follow and the "m" indicates the math library.
I have added a new file to the objs directory called string_conversion.o and a new file to the include directory called string_conversion.h. If you want to see the source code, you can find it in ~cs140/www-home/spring-2005/src/string_conversion.c. I have also included the .o file in the list of library files in your makefile, so it will be linked automatically into your executable. This new file provides four functions that convert strings to integers, longs, floats, and doubles respectively. They are more reliable than sscanf because 1) they will not treat any variation of "nan" as meaning infinity, and 2) they will not treat a word that starts with numbers but has trailing letters, like 123xy, as a number. As an example of the difference between sscanf and these functions, sscanf will treat the string "Nancy" as if it is a number representing infinity, whereas these functions will treat it as a string that is not a number. The functions take two arguments, a character string and an address where they will place the converted results. They return an integer denoting either success, 1, or failure, 0. I would prefer that you start using these functions instead of sscanf, atoi, or atof.

Scoreproc

This is a program that helps you process score files. A score file is a file where each line is either blank (in which case it should be ignored) or it has a name and a score on it. The name can be multiple words with any amount of white space between them. You should convert all names to strings with just one space between the words. The last word on each line is a non-negative score, which is a floating point number (as always, use a double and error check if the last word is not a number).

Input

scoreproc takes a number indicating the approximate number of unique names and then a list of score files on the command line. It then reads each score in every file, and for each name, it computes the average score for that name. In other words, a name can have multiple entries in a score file, and different score files can have different scores with the same name.

For example, the files scfile1 and scfile2 are two simple score files:

UNIX> cat scfile1
Phil Fulmer 9
Pat Summitt  10
Jerry Green  8
UNIX> cat scfile2
Rod Delmonico 7
Pat Summitt 11
Jerry Green 6

If we call scoreproc with both files as command line arguments, the program will keep track of four names:

Phil Fulmer, with an average score of 9.
Pat Summitt, with an average score of 10.5.
Jerry Green, with an average score of 7.
Rod Delmonico, with an average score of 7.

(Note those files are really meaningless -- I just assigned random numbers to non-random names....)

Program Actions

scoreproc must first read all the name/score pairs and place them into a hash table. When reading a name your program should first check whether the name is already in the hash table. If it is not then you should create a record for the name and add it to the hash table. Regardless of whether or not you create a record you will then need to update the score information so that you can calculate an average score once all the name/score pairs have been read.

Once all the name/pair scores have been read your program should ask the user to enter a name. Your program should print the number of scores plus the average score for that name. If the name wasn't specified in the score files, then your program should say that the name isn't found:

UNIX> scoreproc 4 scfile1 scfile2
Enter a name: Pat Summitt
  Pat Summitt: Avg score: 10.50   #scores: 2
Enter a name: Phil Fulmer
  Phil Fulmer: Avg score: 9.00   #scores: 1
Enter a name: Jim Plank
  Jim Plank is not in the score files
Enter a name:  < CNTL-D >

Once the user terminates their queries by entering CNTL-D your program should print the size of the hash table and the records associated with each entry. Continuing with the previous example, you would get the following output once the user hits CNTL-D:


table size = 5
0: Empty
1: Empty
2:      Jerry Green: Avg score: 7.00   #scores: 2
3:      Rod Delmonico: Avg score: 7.00   #scores: 1
        Phil Fulmer: Avg score: 9.00   #scores: 1
4:      Pat Summitt: Avg score: 10.50   #scores: 2
UNIX>

Printing Notes

All scores and averages should be printed to two decimal places.
You can get the proper indenting for the entries in each hash table bucket by using the \t character in your printf formatting string. The \t character indicates that printf should skip to the next tab stop and then start printing the next character. For example,
```
	printf("\t%s\n", "brad");
	
```
will indent "brad" by the number of positions defined for the tab stop on your computer.
Places three blank spaces between the average score and the character string "#scores".

Implementation Details

You should use a hash table with separate chaining to keep track of the names/scores. Your table size must be the smallest prime number greater than or equal to the estimated number of names given on the command line.

For the hashing function, you should use the method described in figure 5.5 on page 152 in the book. You should use bit shifting as outlined in the example.

There you have it. Try the example above. Also, a great set of input files to try are the files in /home/cs140/www-home/spring-2005/labs/lab8/golf/* (these are all the PGA tournament results from 1998, minus the Masters, with scores normalized over a 100-point scale):

UNIX> scoreproc 400 /home/cs140/www-home/spring-2005/labs/lab8/golf/*
Enter a name: Tiger Woods
  Tiger Woods: Avg score: 30.32   #scores: 14
Enter a name: Davis Love III
  Davis Love III: Avg score: 38.20   #scores: 13
Enter a name: Glen Day
  Glen Day: Avg score: 49.37   #scores: 20
Enter a name: Jim Plank
  Jim Plank is not in the score files
Enter a name: < CNTL-D >

table size = 401
0:      Peter Hedblom: Avg score: 100.00   #scores: 1
1:      Gabriel Hjerstedt: Avg score: 100.00   #scores: 1
2:      Walt Chapman: Avg score: 100.00   #scores: 1
        David Young: Avg score: 100.00   #scores: 1
        Donnie Hammond: Avg score: 62.83   #scores: 18
...
UNIX>

What To Submit

Submit a program named scoreproc.c and the makefile.