CS140 -- Lab 2

This lab comes in four parts and is designed to give you practice with strings, command-line arguments, and C I/O.

Part 1: Maxmin

Your job here is to read an input file of names and scores and print the maximum and minimum scores in it. The specific format of the input file is as follows. Each line is of the form:

name score

The name may contain any number of words with any amount of white space between them. No word in a name may be a number. The score is a floating point number (use a double). Example input files are input1, input2 and input3 in the lab2 directory.

Maxmin should read the input file from standard input and print the maximum and minimum scores, rounded to two decimal places. If standard input is not in the proper form, maxmin can do anything. You should try out the example executable in /home/bvz/cs140/labs/lab2/:

UNIX> maxmin < input1
Max: 0.71
Min: 0.38
UNIX> maxmin < input2
Max: 68.43
Min: 35.90
UNIX> maxmin < input3
Max: 74.58
Min: 69.21

You should also try this on other input files that you make up. Does your executable work if the input file has no lines?

(Hint: use scanf("%s", ...) to read in words, and then use atof(...) to see if it is a score.)
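Here is a minimal sketch of the scanning loop. It assumes that each whitespace-separated word is either part of a name or a score. It swaps in strtod for the atof suggested in the hint. The reason is that atof returns 0.0 both for "0" and for a non-numeric word, while strtod's end pointer shows whether the whole word was consumed. The helper names is_score and scan_scores and the 999-character word limit are illustrative choices, not part of the assignment.

```c
#include <ctype.h>
#include <stdio.h>
#include <stdlib.h>

/* Returns 1 and stores the value if the whole word is a number, else 0.
   strtod (not atof) is used so that a score of "0" can be told apart from
   a non-numeric word; the first-character test rejects words such as
   "Nan" or "Inf" that strtod would otherwise accept. */
int is_score(const char *word, double *value)
{
    char *end;
    double v;

    if (!isdigit((unsigned char)word[0]) && word[0] != '-' &&
        word[0] != '+' && word[0] != '.')
        return 0;
    v = strtod(word, &end);
    if (end == word || *end != '\0')   /* nothing parsed, or trailing text */
        return 0;
    *value = v;
    return 1;
}

/* Reads "name score" lines from in, skipping name words, and records the
   largest and smallest scores. Returns how many scores were seen, so a
   caller can detect empty input (0). */
int scan_scores(FILE *in, double *max, double *min)
{
    char word[1000];   /* assumes no word is longer than 999 characters */
    double value;
    int count = 0;

    while (fscanf(in, "%999s", word) == 1) {
        if (!is_score(word, &value))
            continue;                  /* part of a name */
        if (count == 0 || value > *max) *max = value;
        if (count == 0 || value < *min) *min = value;
        count++;
    }
    return count;
}
```

A main function could call scan_scores(stdin, &max, &min). When it returns a positive count, main could print the results with printf("Max: %.2f\nMin: %.2f\n", max, min). This sketch leaves open what to print for empty input; compare against the example executable.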

Part 2: Maxminname

Maxminname works just like maxmin, but it also prints the names associated with the maximum and minimum scores. Print the words of each name separated by single spaces. If more than one line contains the maximum or minimum score, print "DUPLICATE" as the name. Again, if you have further questions about what maxminname should do, test the executable in /home/bvz/cs140/labs/lab2/:

UNIX> maxminname < input1
Max: 0.71 New York Yankees
Min: 0.38 Detroit
UNIX> maxminname < input2
Max: 68.43 Ted McLellan & Brad Vander Zanden
Min: 35.90 DUPLICATE
UNIX> maxminname < input3
Max: 74.58 Chip Beck
Min: 69.21 David Duval

Hint: you should have four character arrays in your program, each 1000 characters long: word, name, maxname, and minname.

You read words into word and append them to name. When you get a score, see whether you should update max (and therefore maxname) and/or min (and therefore minname), then empty name so the next line starts fresh. When you reach the end of standard input, use max, maxname, min, and minname to make your final printout.
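The bookkeeping described in the hint might look like the sketch below. The names append_word and update_extreme are hypothetical helpers. The sketch assumes two scores are tied only when their parsed double values are exactly equal, which the handout does not spell out.

```c
#include <string.h>

#define SIZE 1000   /* matches the 1000-character arrays in the hint */

/* Appends word to name with exactly one space between words.
   Assumes the result still fits in SIZE characters. */
void append_word(char name[SIZE], const char *word)
{
    if (name[0] != '\0')
        strcat(name, " ");
    strcat(name, word);
}

/* Folds one (score, name) pair into a running maximum (want_max != 0) or
   minimum (want_max == 0). 'first' is nonzero for the first score read.
   A later line with an exactly equal score turns the stored name into
   "DUPLICATE"; a strictly better score replaces it again. */
void update_extreme(int want_max, int first, double score, const char *name,
                    double *best, char bestname[SIZE])
{
    if (first || (want_max ? score > *best : score < *best)) {
        *best = score;
        strcpy(bestname, name);
    } else if (score == *best) {
        strcpy(bestname, "DUPLICATE");
    }
}
```

In the reading loop, pass words that are not scores to append_word. When a score arrives, call update_extreme once with want_max set to 1 and once with 0, then reset name[0] = '\0' before reading the next line.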

Part 3: Word Frequency

A very rough way to categorize documents using "artificial intelligence" is to count the frequency of certain words that appear in the documents. If the combined sum of the word frequencies exceeds a certain threshold for a certain category, then the document will be placed in that category (e.g., entertainment document, marketing document, social science document). Actual programs that categorize documents use much more sophisticated techniques than this one, but this simple technique will serve for our purposes.

You are to write a program named frequency.c that reads a document from stdin and takes a category, a threshold number, and a list of words from the command line. It then determines whether the combined frequency of the words reaches the threshold (in the example below, a total of 5 with a threshold of 5 counts as a match). If it does, your program prints a line indicating that the document fits in the given category; otherwise it prints a line indicating that the document does not fit in the given category. Your program also prints the frequency of each word on the command line and the sum of those frequencies.

For example, suppose you have a file named fox.txt with the contents:

The quick brown fox jumped over the fence, slipped through
the hedges, and disappeared into the woods. The 
hounds that were following behind followed on the heels 
of the fox a short time later. The hounds stopped when they reached
a stream because they had lost the scent of the fox. The hunters
stopped at the stream's edge as well.

Then the command:

bvz> frequency hunting 5 fox hounds hunters < fox.txt

should produce the output:

fox             2
hounds          2
hunters         1
total           5
This document is about hunting
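Below is a sketch of the counting and reporting for frequency.c. It rests on several assumptions the handout does not state:

- The threshold is an integer.
- A document matches when the total reaches the threshold, since the example classifies a total of 5 with a threshold of 5 as a match.
- The 16-character word column is read off the sample output.
- The wording of the negative message is a placeholder.

The names count_words and report are hypothetical. Matching is exact and case-sensitive, which is consistent with the sample: "fox." in the last sentence is not counted, so fox totals 2.

```c
#include <stdio.h>
#include <string.h>

/* Counts exact, case-sensitive matches of each command-line word among the
   whitespace-separated tokens of in. Punctuation is not stripped, so
   "fox." does not match "fox". Returns the combined count. */
int count_words(FILE *in, int nwords, char *words[], int counts[])
{
    char token[1000];   /* assumes no token is longer than 999 characters */
    int total = 0;

    for (int i = 0; i < nwords; i++)
        counts[i] = 0;
    while (fscanf(in, "%999s", token) == 1) {
        for (int i = 0; i < nwords; i++) {
            if (strcmp(token, words[i]) == 0) {
                counts[i]++;
                total++;
            }
        }
    }
    return total;
}

/* Prints one line per word, the total, and the verdict. The 16-character
   column and the "not about" wording are assumptions, not from the spec. */
void report(FILE *out, const char *category, int threshold, int nwords,
            char *words[], const int counts[], int total)
{
    for (int i = 0; i < nwords; i++)
        fprintf(out, "%-16s%d\n", words[i], counts[i]);
    fprintf(out, "%-16s%d\n", "total", total);
    if (total >= threshold)   /* the example treats 5 vs. threshold 5 as a match */
        fprintf(out, "This document is about %s\n", category);
    else
        fprintf(out, "This document is not about %s\n", category);
}
```

After checking that argc is at least 4, main would pass argv[1] as the category, atoi(argv[2]) as the threshold, and argv + 3 with argc - 3 as the word list.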

Format of the Command and the Input

Format of the Output

Part 4: Line Breaking

You may have had the experience of saving a document using a word processing package, loading it into an editor such as vi, and discovering that the file is either an enormous single line or that each paragraph is a single line. Editors such as vi do not deal very well with such files, and oftentimes either do not display the entire line or make editing the files extremely awkward.

You are going to write a program, named linebreak.c, that reads a text file from stdin and formats its words into lines of a pre-specified size. The user specifies the size on the command line, and it may not exceed 256 (if it does, your program should print an error message). Your program does not have to worry about the line breaks in the input file. It simply reads words one at a time and adds them to the current line until a word would make the line overflow its size limit. At that point it prints the line to stdout and starts a new line with that word. Words should be separated by a single space. Every line will have at least one word, even if that word exceeds the pre-specified line width. At the end, your program prints statistics for the file: the number of characters in the file (not counting the spaces between words), the number of words in the file, and the number of lines your program creates.
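The width check could be done with strtol, as in this sketch. The function name parse_width and the error text are hypothetical. Rejecting non-numeric arguments and widths below 1 is also an assumption, since the handout only sets the upper limit of 256.

```c
#include <stdio.h>
#include <stdlib.h>

#define MAX_WIDTH 256   /* upper limit given in the handout */

/* Converts the line-size argument. Returns the width, or -1 after printing
   an error to stderr. Rejecting non-numbers and values below 1 is an
   assumption; the handout only states the upper limit. */
int parse_width(const char *arg)
{
    char *end;
    long w = strtol(arg, &end, 10);

    if (end == arg || *end != '\0' || w < 1 || w > MAX_WIDTH) {
        fprintf(stderr, "linebreak: line size must be 1 to %d\n", MAX_WIDTH);
        return -1;
    }
    return (int)w;
}
```

main would exit with a nonzero status when parse_width(argv[1]) returns -1.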

For example, suppose your program is given the input:

The quick brown fox jumped over the fence, slipped through
the hedges, and disappeared into the woods. The 
hounds that were following behind followed on the heels 
of the fox a short time later. 

Further suppose that the input is in a file named fox_linebreak.txt and that the line limit is 20. Then your program should produce the following output:

bvz> ./linebreak 20 < fox_linebreak.txt
The quick brown fox
jumped over the
fence, slipped
through the hedges,
and disappeared into
the woods. The
hounds that were
following behind
followed on the
heels of the fox a
short time later.

-------- statistics --------
character count:  160
word count:       34
line count:       11

You should try your program on other text files that you have, as well as with other line sizes. What happens when the line size is smaller than most words, such as 1 or 2? What happens if a word is longer than 256 characters?

Format of the Statistics

The three count labels should be printed in left-justified fields that are 18 characters wide. The counts should be right-justified.
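Here is a sketch of the statistics printout, where print_stats is a hypothetical name. The labels use an 18-character left-justified field ("%-18s"). The handout asks for right-justified counts but gives no field width for them. Printing each count with plain %ld immediately after the label reproduces the sample above, so this sketch assumes that.

```c
#include <stdio.h>

/* Prints the statistics block. Labels are left-justified in 18-character
   fields; counts use plain %ld (no minimum width), which matches the sample. */
void print_stats(FILE *out, long chars, long words, long lines)
{
    fprintf(out, "\n-------- statistics --------\n");
    fprintf(out, "%-18s%ld\n", "character count:", chars);
    fprintf(out, "%-18s%ld\n", "word count:", words);
    fprintf(out, "%-18s%ld\n", "line count:", lines);
}
```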

Strategy for Writing the Program

A rough strategy for writing this program works as follows:

  1. Declare a character array of size 257 to hold the line. You need the extra character to hold the null character that terminates the string if the user specifies a line size of 256.
  2. Use strcpy to copy the first word into the line.
  3. Use strlen to determine the size of the next word. The new line length is the existing length plus one (for the separating blank) plus the word's length. If the new length is less than or equal to the line limit, use strcat to concatenate a blank and then the word to the line. If it would exceed the line limit, print the existing line and go back to step 2 with the word that did not fit.
  4. Throughout the program, keep track of the number of characters, words, and lines seen so far. Use strlen on each word to count characters; strlen of a whole line would also count the blanks between words, which should not be included.
  5. When standard input runs out, print the last line before printing the statistics.
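The strategy above can be sketched as follows, with one deliberate departure. Instead of assembling each line in a 257-character array with strcpy and strcat, this version writes every word straight to the output and only tracks how long the current line is. It keeps the same line-breaking rule while avoiding a buffer overflow when a single word is longer than 256 characters.

A word fits when the current length plus one blank plus the word's length is at most the width. The rule must be "at most" rather than "less than", because the fifth line of the sample output, "and disappeared into", is exactly 20 characters. The names break_lines and struct stats are hypothetical. The %999s conversion would split any word longer than 999 characters.

```c
#include <stdio.h>
#include <string.h>

struct stats {
    long chars;   /* characters in words; blanks between words not counted */
    long words;
    long lines;
};

/* Reads words from in and writes them to out in lines of at most width
   characters, one blank between words. A word wider than width still gets
   a line of its own. Returns the counts needed for the statistics. */
struct stats break_lines(FILE *in, FILE *out, int width)
{
    char word[1000];        /* assumes no word is longer than 999 characters */
    struct stats s = {0, 0, 0};
    size_t linelen = 0;     /* length of the current output line; 0 = none yet */

    while (fscanf(in, "%999s", word) == 1) {
        size_t len = strlen(word);

        if (linelen > 0 && linelen + 1 + len <= (size_t)width) {
            fputc(' ', out);            /* word fits after one blank */
            linelen += 1 + len;
        } else {
            if (linelen > 0)
                fputc('\n', out);       /* finish the previous line */
            s.lines++;                  /* this word starts a new line */
            linelen = len;
        }
        fputs(word, out);
        s.chars += (long)len;
        s.words++;
    }
    if (linelen > 0)
        fputc('\n', out);               /* the last line is printed too */
    return s;
}
```

With the fox_linebreak.txt input and a width of 20, this reproduces the eleven lines shown above along with counts of 160 characters, 34 words, and 11 lines.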