CS140 -- Lab 8


This is a lab for you to get practice with hashing.

Scoreproc

This is a program that helps you process score files. A score file is a file where each line is either blank (in which case it should be ignored) or it has a name and a score on it. The name can be multiple words with any amount of white space between them. You should convert all names to strings with just one space between the words. The last word on each line is a non-negative score, which is a floating point number (as always, use a double and error check if the last word is not a number).

Input

scoreproc takes a number indicating the approximate number of unique names and then a list of score files on the command line. It then reads each score in every file, and for each name, it computes the average score for that name. In other words, a name can have multiple entries in a score file, and different score files can have different scores with the same name.

For example, the files scfile1 and scfile2 are two simple score files:

UNIX> cat scfile1
Phil Fulmer 9
Pat Summitt  10
Jerry Green  8
UNIX> cat scfile2
Rod Delmonico 7
Pat Summitt 11
Jerry Green 6
If we call scoreproc with both files as command line arguments, the program will keep track of four names: (Note those files are really meaningless -- I just assigned random numbers to non-random names....)

Program Actions

scoreproc must first read all the name/score pairs and place them into a hash table. When reading a name your program should first check whether the name is already in the hash table. If it is not then you should create a record for the name and add it to the hash table. Regardless of whether or not you create a record you will then need to update the score information so that you can calculate an average score once all the name/score pairs have been read.

Once all the name/pair scores have been read your program should ask the user to enter a name. Your program should print the number of scores plus the average score for that name. If the name wasn't specified in the score files, then your program should say that the name isn't found:

UNIX> scoreproc 4 scfile1 scfile2
Enter a name: Pat Summitt
  Pat Summitt: Avg score: 10.50   #scores: 2
Enter a name: Phil Fulmer
  Phil Fulmer: Avg score: 9.00   #scores: 1
Enter a name: Jim Plank
  Jim Plank is not in the score files
Enter a name:  < CNTL-D >

Once the user terminates their queries by entering CNTL-D your program should print the size of the hash table and the records associated with each entry. Continuing with the previous example, you would get the following output once the user hits CNTL-D:


table size = 5
0: Empty
1: Empty
2:      Jerry Green: Avg score: 7.00   #scores: 2
3:      Rod Delmonico: Avg score: 7.00   #scores: 1
        Phil Fulmer: Avg score: 9.00   #scores: 1
4:      Pat Summitt: Avg score: 10.50   #scores: 2
UNIX> 

Printing Notes

  1. All scores and averages should be printed to two decimal places.
  2. You can get the proper indenting for the entries in each hash table bucket by using the \t character in your printf formatting string. The \t character indicates that printf should skip to the next tab stop and then start printing the next character. For example,
    	printf("\t%s\n", "brad");
    	
    will indent "brad" by the number of positions defined for the tab stop on your computer.
  3. Places three blank spaces between the average score and the character string "#scores".

Implementation Details

You should use a hash table with separate chaining to keep track of the names/scores. Your table size must be the smallest prime number greater than or equal to the estimated number of names given on the command line.

For the hashing function, you should use the method described in figure 5.5 on page 152 in the book. You should use bit shifting as outlined in the example.


There you have it. Try the example above. Also, a great set of input files to try are the files in /home/cs140/www-home/spring-2005/labs/lab8/golf/* (these are all the PGA tournament results from 1998, minus the Masters, with scores normalized over a 100-point scale):
UNIX> scoreproc 400 /home/cs140/www-home/spring-2005/labs/lab8/golf/*
Enter a name: Tiger Woods
  Tiger Woods: Avg score: 30.32   #scores: 14
Enter a name: Davis Love III
  Davis Love III: Avg score: 38.20   #scores: 13
Enter a name: Glen Day
  Glen Day: Avg score: 49.37   #scores: 20
Enter a name: Jim Plank
  Jim Plank is not in the score files
Enter a name: < CNTL-D >

table size = 401
0:      Peter Hedblom: Avg score: 100.00   #scores: 1
1:      Gabriel Hjerstedt: Avg score: 100.00   #scores: 1
2:      Walt Chapman: Avg score: 100.00   #scores: 1
        David Young: Avg score: 100.00   #scores: 1
        Donnie Hammond: Avg score: 62.83   #scores: 18
...
UNIX> 


What To Submit

Submit a program named scoreproc.c and the makefile.