CS140 -- Lab 8
This is a lab for you to get practice with hashing.
- The makefile for scoreproc is
/home/cs140/www-home/spring-2005/labs/lab8/makefile.
- The scoreproc executable is in:
/home/cs140/www-home/spring-2005/labs/lab8. As usual, if you have
questions about how scoreproc should work, try this executable.
- There are two simple example score files for scoreproc in
scfile1 and scfile2.
- Scorefiles for the 1998 PGA golf season are in
/home/cs140/www-home/spring-2005/labs/lab8/golf/*.
- You may find that you want to use the sqrt function from the
C math library. To use it you will need to include <math.h>
in your file. The sqrt function takes a double as an argument and
returns a double. You may want to use this function when computing
a prime number.
- You should look at the makefile and see how the -lm flag is used to
link the C math library to your executable. You will find the -lm
flag placed early in the makefile, in the definition of the library
symbols. When linking in C libraries it is important to place them
last in your list of files because the linker only loads those functions
that it needs. It uses only the files it has already seen to determine
these functions. Thus if you place the -lm flag before your
scoreproc.o file, the linker will not have seen the sqrt function
when it processes the -lm flag and it will not link in the sqrt
function. You will then get a linker error that claims that sqrt
cannot be found and you will tear out your hair because you "know"
that you included the math library. By the way, the -l indicates a
library is to follow and the "m" indicates the math library.
- I have added a new file to the objs directory called
string_conversion.o and a new file to the include directory
called string_conversion.h. If you want to see the source code, you can find it in
~cs140/www-home/spring-2005/src/string_conversion.c.
I have also included the .o file
in the list of library files in your makefile, so it will be
linked automatically into your executable. This new file provides
four functions that convert strings to integers, longs, floats, and
doubles respectively. They are more reliable than sscanf because
1) they will not treat any variation of "nan" as meaning infinity, and
2) they will not treat a word that starts with numbers but has trailing
letters, like 123xy, as a number. As an example of the difference
between sscanf and these functions, sscanf will treat the string
"Nancy" as if it is a number representing infinity, whereas these
functions will treat it as a string that is not a number. The functions
take two arguments, a character string and an address where they will
place the converted results. They return an integer denoting either
success, 1, or failure, 0. I would prefer that you start using these
functions instead of sscanf, atoi, or atof.
Scoreproc
This is a program that helps you process score files. A score file
is a file where each line is either blank (in which case it should
be ignored) or it has a name and a score on it. The name can
be multiple words with any amount of white space between them. You should
convert all names to strings with just one space between the words.
The last word on each line is a non-negative score, which is a floating
point number (as always, use a double and error check if the
last word is not a number).
Input
scoreproc takes a number indicating the approximate number
of unique names and then a list of score files on the command line.
It then reads each score in every file, and for each name, it
computes the average score for that name. In other words, a name
can have multiple entries in a score file, and different score
files can have different scores with the same name.
For example, the files
scfile1
and
scfile2
are two simple score files:
UNIX> cat scfile1
Phil Fulmer 9
Pat Summitt 10
Jerry Green 8
UNIX> cat scfile2
Rod Delmonico 7
Pat Summitt 11
Jerry Green 6
If we call scoreproc with both files as command line arguments,
the program will keep track of four names:
- Phil Fulmer, with an average score of 9.
- Pat Summitt, with an average score of 10.5.
- Jerry Green, with an average score of 7.
- Rod Delmonico, with an average score of 7.
(Note those files are really meaningless -- I just assigned random
numbers to non-random names....)
Program Actions
scoreproc must first read all the name/score pairs and place
them into a hash table. When reading a name your program should first
check whether the name is already in the hash table. If it is not then
you should create a record for the name and add it to the hash table.
Regardless of whether or not you create a record you will then need to
update the score information so that you can calculate an average
score once all the name/score pairs have been read.
Once all the name/pair scores have been read your program should
ask the user to enter a name. Your program should print the number of
scores plus the average score for that name. If the name wasn't
specified in the score files, then your program should say that the name isn't
found:
UNIX> scoreproc 4 scfile1 scfile2
Enter a name: Pat Summitt
Pat Summitt: Avg score: 10.50 #scores: 2
Enter a name: Phil Fulmer
Phil Fulmer: Avg score: 9.00 #scores: 1
Enter a name: Jim Plank
Jim Plank is not in the score files
Enter a name: < CNTL-D >
Once the user terminates their queries by entering CNTL-D your program
should print the size of the hash table and the records associated
with each entry. Continuing with the previous example, you would get
the following output once the user hits CNTL-D:
table size = 5
0: Empty
1: Empty
2: Jerry Green: Avg score: 7.00 #scores: 2
3: Rod Delmonico: Avg score: 7.00 #scores: 1
Phil Fulmer: Avg score: 9.00 #scores: 1
4: Pat Summitt: Avg score: 10.50 #scores: 2
UNIX>
Printing Notes
- All scores and averages should be printed to two decimal
places.
- You can get the proper indenting for the entries in each hash table
bucket by using the \t character in your printf formatting
string. The \t character indicates that printf should skip to the
next tab stop and then start printing the next character. For example,
printf("\t%s\n", "brad");
will indent "brad" by the number of positions defined for the
tab stop on your computer.
- Places three blank spaces between the average score and the character
string "#scores".
Implementation Details
You should use a hash table with separate chaining to keep track
of the names/scores. Your table size must be the smallest prime number
greater than or equal to the estimated number of names given on the
command line.
For the hashing function, you should use the method described in
figure 5.5 on page 152 in the book. You should use bit shifting
as outlined in the example.
There you have it. Try the example above. Also, a great set of input
files to try are the files in
/home/cs140/www-home/spring-2005/labs/lab8/golf/* (these are all the
PGA tournament results from 1998, minus the Masters, with scores
normalized over a 100-point scale):
UNIX> scoreproc 400 /home/cs140/www-home/spring-2005/labs/lab8/golf/*
Enter a name: Tiger Woods
Tiger Woods: Avg score: 30.32 #scores: 14
Enter a name: Davis Love III
Davis Love III: Avg score: 38.20 #scores: 13
Enter a name: Glen Day
Glen Day: Avg score: 49.37 #scores: 20
Enter a name: Jim Plank
Jim Plank is not in the score files
Enter a name: < CNTL-D >
table size = 401
0: Peter Hedblom: Avg score: 100.00 #scores: 1
1: Gabriel Hjerstedt: Avg score: 100.00 #scores: 1
2: Walt Chapman: Avg score: 100.00 #scores: 1
David Young: Avg score: 100.00 #scores: 1
Donnie Hammond: Avg score: 62.83 #scores: 18
...
UNIX>
What To Submit
Submit a program named scoreproc.c and the makefile.