The name may contain any number of words with any amount of white space between them. No word in a name may be a number. The score is a floating point number (use a double). Example input files are input1, input2 and input3 in the lab2 directory.
Maxmin should take an input file on standard input and print out the maximum and minimum scores, rounded to 2 decimal digits. If standard input is not in the proper form, maxmin can do anything. You should try out the example executable in /home/bvz/cs140/labs/lab2/:
UNIX> maxmin < input1
Max: 0.71
Min: 0.38
UNIX> maxmin < input2
Max: 68.43
Min: 35.90
UNIX> maxmin < input3
Max: 74.58
Min: 69.21
UNIX>
You should also try this on other input files that you make up. Does your executable work if the input file has no lines?
(Hint: use scanf("%s", ...) to read in words, and then use either atof(...) or sscanf to see if it is a score.)
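The hint above can be sketched roughly as follows. This is only one way to do the check, not the required one, and the helper name parse_score is made up for illustration. It uses sscanf with an extra %c conversion so that a word such as 12abc is not mistaken for a score. Note that sscanf also accepts forms such as 1e3 as numbers, which may or may not matter for your input files.

```c
#include <stdio.h>

/* Hypothetical helper: returns 1 and stores the value in *score when the
   whole word is a number, and returns 0 otherwise. */
int parse_score(const char *word, double *score)
{
    double value;
    char extra;

    /* %lf reads a double. %c matches only if characters remain after the
       number, so a return value of exactly 1 means the entire word was
       consumed by %lf. */
    if (sscanf(word, "%lf%c", &value, &extra) == 1) {
        *score = value;
        return 1;
    }
    return 0;
}
```

In the main loop, each word would be read with scanf("%999s", word) into a 1000-character array and passed to parse_score, and the running maximum and minimum would be updated whenever it returns 1. Printing with a format such as "%.2f" rounds to two decimal digits.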
UNIX> maxminname < input1
Max: 0.71 New York Yankees
Min: 0.38 Detroit
UNIX> maxminname < input2
Max: 68.43 Ted McLellan & Brad Vander Zanden
Min: 35.90 DUPLICATE
UNIX> maxminname < input3
Max: 74.58 Chip Beck
Min: 69.21 David Duval
UNIX>
Hint: you should have four character arrays in your program; all of them should be 1000 characters:
A very rough way to categorize documents using "artificial intelligence" is to count the frequency of certain words that appear in the documents. If the combined sum of the word frequencies exceeds a certain threshold for a certain category, then the document will be placed in that category (e.g., entertainment document, marketing document, social science document). Actual programs that categorize documents use much more sophisticated techniques than this one, but this simple technique will serve for our purposes.
You are to write a program named frequency.c that reads input from stdin and reads a category, a threshold number, and a list of words from the command line. It will then determine whether the combined frequency of the words equals or exceeds the threshold number. If it does, your program will print a line indicating that the document fits in the given category; otherwise it will print a line indicating that the document does not fit in the given category. Your program will also print the frequency of each word on the command line and the sum of those frequencies.
For example, suppose you have a file named fox.txt with the contents:
The quick brown fox jumped over the fence, slipped through the hedges, and disappeared into the woods. The hounds that were following behind followed on the heels of the fox a short time later. The hounds stopped when they reached a stream because they had lost the scent of the fox. The hunters stopped at the stream's edge as well.
Then the command:
bvz> frequency hunting 5 fox hounds hunters < fox.txt
should produce the output:
fox 2
hounds 2
hunters 1
total 5
This document is about hunting
frequency category threshold word1 word2 ... word_n < input_file
Note that you will be reading the input from stdin.
This document is about "category"
if the total word count equals or exceeds the threshold. Otherwise there should be a statement of the form:
This document is not about "category"
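The counting step might be sketched as below. The function names tally_word and fits_category are invented for this illustration, and the sketch assumes exact, case-sensitive matching with strcmp on whitespace-delimited words. That assumption appears consistent with the fox.txt example, where "fox." (with the trailing period) is evidently not counted as "fox".

```c
#include <string.h>

/* Hypothetical helper: adds 1 to counts[i] for every target word that is
   exactly equal to `word`. Returns the number of matches, so the caller
   can keep a running total. */
int tally_word(const char *word, int nwords, char *const words[], int counts[])
{
    int i;
    int matched = 0;

    for (i = 0; i < nwords; i++) {
        if (strcmp(word, words[i]) == 0) {
            counts[i]++;
            matched++;
        }
    }
    return matched;
}

/* The document fits the category when the total equals or exceeds the
   threshold. */
int fits_category(int total, int threshold)
{
    return total >= threshold;
}
```

With the command-line layout shown earlier, the target words would start at argv[3] and there would be argc - 3 of them. Each word read from stdin with scanf("%999s", ...) would be passed to tally_word, and the threshold could be obtained from argv[2] with atoi or sscanf.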
You may have had the experience of saving a document using a word processing package, loading it into an editor such as vi, and discovering that the file is either an enormous single line or that each paragraph is a single line. Editors such as vi do not deal very well with such files, and oftentimes either do not display the entire line or make editing the files extremely awkward.
You are going to write a program, named linebreak.c, that reads a text file from stdin and formats the words in the file into lines of a pre-specified size. The size will be specified by the user on the command line and may not exceed 256 (if it does then your program should print an error message). Your program does not have to worry about the line breaks used in the text file. It will simply read words one at a time, and add them to the current line until a word would cause the line to overflow its size limit. Your program will then print the line to stdout and start a new line. Words should be separated by a single space. Every line will have at least one word, even if that word exceeds the pre-specified line width. At the end of the program you will print statistics for the file that include the number of characters in the file (do not count the spaces between the words), the number of words in the file, and the number of lines your program creates.
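One possible shape for the line-filling step is sketched below. It is a streaming variation rather than the buffer-based approach described above: instead of building the line in an array and printing it when full, it writes each word immediately and tracks only the current line length. The name emit_word is made up, and the sketch assumes that a line may be exactly as long as the limit but no longer.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: writes one word to `out`, starting a new line first
   if the word (plus one separating space) would push the current line past
   `limit`. *line_len is the number of characters already on the current
   line and *lines is the number of lines started so far. Returns the length
   of the word so the caller can accumulate a character count. */
int emit_word(FILE *out, const char *word, int limit, int *line_len, int *lines)
{
    int len = (int)strlen(word);

    /* Break only when the line already holds a word, so every line gets at
       least one word even if that word is wider than the limit. */
    if (*line_len > 0 && *line_len + 1 + len > limit) {
        fputc('\n', out);
        *line_len = 0;
    }

    if (*line_len == 0) {
        (*lines)++;              /* first word on a fresh line */
    } else {
        fputc(' ', out);         /* single space between words */
        (*line_len)++;
    }

    fputs(word, out);
    *line_len += len;
    return len;
}
```

The caller would start with *line_len and *lines set to 0, call emit_word for every word read from stdin, and print a final newline after the last word if *line_len is greater than 0.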
For example, suppose your program is given the input:
The quick brown fox jumped over the fence, slipped through the hedges, and disappeared into the woods. The hounds that were following behind followed on the heels of the fox a short time later.
Further suppose that the input is in a file named fox_linebreak.txt and that the line limit is 20. Then your program should produce the following output:
bvz> ./linebreak 20 < fox_linebreak.txt
The quick brown fox
jumped over the
fence, slipped
through the hedges,
and disappeared into
the woods. The
hounds that were
following behind
followed on the
heels of the fox a
short time later.
-------- statistics --------
character count: 160
word count: 34
line count: 11
Note that the character count does not include the blank spaces between words. You should try out your program on other text files that you might have, as well as other line sizes. What happens when the line size is less than the size of most words, such as 1 or 2? What happens if there is a word that is larger than 256 characters (remember that a word can be up to 1000 characters in length)?
The three count labels should be printed in left-justified fields that are 18 characters wide. The counts should be right-justified.
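As a rough illustration of that formatting, printf-style conversions can do both kinds of justification: "%-18s" left-justifies a string in an 18-character field and "%6d" right-justifies an integer. The width of 6 for the counts is an assumption made for this sketch, since the text above does not give one, and format_stat is a made-up helper name.

```c
#include <stdio.h>

#define COUNT_WIDTH 6   /* assumed width for the counts; not specified above */

/* Hypothetical helper: formats one statistics line into buf, with the label
   left-justified in 18 characters and the count right-justified in
   COUNT_WIDTH characters. Returns what snprintf returns. */
int format_stat(char *buf, size_t cap, const char *label, int count)
{
    return snprintf(buf, cap, "%-18s%*d", label, COUNT_WIDTH, count);
}
```

The same format string can be passed directly to printf, for example with the label "word count:" and the word total.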
A rough strategy for writing this program works as follows: