Homework Assignment 7


For this assignment there are several requirements:

  1. You must write every program in this assignment in Python, and you must use version 3.3. On the hydra/tesla machines, type the following command on the command line to get Python 3.3:
    scl enable python33 zsh
    
  2. You do not have to do any error checking. If an exception occurs, your program does not have to handle it. Instead just let your program terminate.
  3. You must use {}-style formatting rather than %-style formatting for printing your strings. See my Python IO notes for a discussion of how to use {}-style formatting. The %-style is considered old-style and you should not get into the habit of using it.
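
As a minimal sketch of {}-style formatting (the variable names and values here are hypothetical and only illustrate str.format):

```python
# Hypothetical values, used only to illustrate str.format().
word = "fox"
count = 2

# {:<20} left-justifies in a 20-character field;
# {:>5} right-justifies in a 5-character field.
line = "{:<20}{:>5}".format(word, count)
print(line)

# {:5.2f} pads a float to at least 5 characters with two decimal places.
value = "{:5.2f}".format(3.7)
print(value)
```

The same field specifications work for any of the programs below; only the widths and the values change.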

  1. Write a program named tail.py that prints the last n lines of a file. The program should take two command line arguments: the name of the file and the number of lines to print. Remember that you can use negative subscripts and list slices to simplify your task.
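
The following sketch shows how negative subscripts and slices behave on a list. It is only an illustration: the helper name last_lines and the sample list are hypothetical, and reading the file and the command line arguments is left out.

```python
def last_lines(lines, n):
    """Return the last n items of a list (all of them if n exceeds the length)."""
    # lines[-0:] is the same as lines[0:], i.e. the whole list,
    # so n == 0 is handled as a special case.
    return lines[-n:] if n > 0 else []


sample = ["a\n", "b\n", "c\n", "d\n"]
print(repr(sample[-1]))        # a negative subscript counts from the end
print(last_lines(sample, 2))   # a slice with a negative start keeps the tail
```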

  2. Write a program named frequency.py that prints each word that occurs in a file and the number of times that word occurs. Your program should take a single command line argument, which is the name of the file, and it should print the words by increasing frequency, using the word itself as a secondary sort key. Each word should be printed in a left-justified field of 20 characters and each word count should be printed in a right-justified field of 5 characters. For example, if the contents of the file are:
    The quick brown fox jumped
    over the fence and then
    the fox jumped back over
    the fence.
    
    then your output should be:
    The                      1
    and                      1
    back                     1
    brown                    1
    fence                    1
    fence.                   1
    quick                    1
    then                     1
    fox                      2
    jumped                   2
    over                     2
    the                      3
    
    Note that uppercase is significant (The and the are different) and so is punctuation (fence and fence. are different). You must use a dictionary to keep track of word frequencies and then dump the dictionary to a list to sort it by word frequencies.
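
One way the dictionary-then-list approach might look is sketched below. The function names and the sample text are hypothetical. The sketch works on a string rather than a file, and you should check against the sample output above whether a separating space belongs between the two fields.

```python
def word_counts(text):
    """Count whitespace-separated words using a dictionary."""
    counts = {}
    for word in text.split():
        counts[word] = counts.get(word, 0) + 1
    return counts


def sorted_by_frequency(counts):
    """Dump the dictionary to a list sorted by count, then by word."""
    return sorted(counts.items(), key=lambda pair: (pair[1], pair[0]))


text = "the fox jumped over the fence and the fox"
ordered = sorted_by_frequency(word_counts(text))
for word, count in ordered:
    print("{:<20}{:>5}".format(word, count))
```

Sorting on the tuple (count, word) gives the secondary sort key for free, because Python compares tuples element by element.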

  3. You may thank the redoubtable Professor Plank for this problem, as he is the one who wrote it.

    In the game of golf, players can rank themselves with a handicap. This is a number that basically tells you how good you are at the game. The lower your handicap, the better you are.

    The US Golf Association has a very lengthy description of how you calculate a handicap. It may be found at http://www.usga.org/Handicapping.html if you're interested. I'm going to simplify it a bit. Here is how you calculate a handicap for a golfer:

    Your Strategy

    Your strategy for solving this problem should be as follows:

    1. Use a class to store the golfer's last name, their scores, and their computed handicap.
    2. Use a dictionary to store the golfer objects indexed by last names so that you can read lines from a score file and find the appropriate golfer to associate it with.
    3. For each golfer, use a list to store that golfer's scores.
    4. Use Python's datetime module to conveniently handle dates (you can google for its documentation).
    5. Use a dictionary to store the courses. You will need a class to store the information associated with each course.
    6. Your golfer class should have a method that, when called, sets the golfer's computed handicap by iterating through the golfer's 20 most recent scores. You can use Python's sorted function to sort the list of the golfer's scores in descending order by date. You can then use a list slice to extract the first 20 scores.
    7. Once you've read the score file and associated all the scores with the appropriate golfer, compute each golfer's handicap by calling your computeHandicap method. You can print the golfers by handicap in increasing order by adding the golfers to a list, and then sorting the list using the golfers' handicaps. You can then traverse this list to print the golfers by handicap in ascending order.
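
The strategy above can be sketched roughly as follows. The class layout, the method names add_score and most_recent, and the sample dates and scores are all hypothetical. The handicap computation itself is deliberately left out, so the sketch only shows the dictionary of golfers, the list of scores, and the sort-then-slice step.

```python
from datetime import date


class Golfer:
    """Hypothetical golfer record: last name, scores, and computed handicap."""

    def __init__(self, name):
        self.name = name
        self.scores = []        # list of (date, score) tuples
        self.handicap = None    # to be set by a handicap computation

    def add_score(self, when, score):
        self.scores.append((when, score))

    def most_recent(self, n=20):
        # Sort newest first by date, then slice off the first n entries.
        ordered = sorted(self.scores, key=lambda pair: pair[0], reverse=True)
        return ordered[:n]


golfers = {}                    # golfer objects indexed by last name
name = "Jim"
if name not in golfers:
    golfers[name] = Golfer(name)

# Made-up dates and scores, used only to exercise the sketch.
golfers[name].add_score(date(1999, 1, 1), 90)
golfers[name].add_score(date(1999, 2, 1), 84)
golfers[name].add_score(date(1999, 1, 2), 90)
recent = golfers[name].most_recent(2)
print(recent)
```

Because date objects compare chronologically, they can be used directly as a sort key.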

    Your Job

    Your job is to write the program handicap.py. It should take two command line arguments:

    python handicap.py score-file course-file
    
    The score-file is a file that contains scores. Each line of this file is in the following format:
    Month Day Year Name Score Course
    
    Month is a number between 1 and 12. Day is a number between 1 and 31. You do not have to error check for legal month/day combinations (e.g., don't worry about 2/30). Year is the year (e.g., 1999, 2000). Name is a one-word name of a golfer, and Score is an integer score. Course is the name of a course, and may contain any number of words separated by white space. Although Course is the last part of the line, you cannot assume that there will be only one space between the words in the course name. Hence you will need to iterate through the words and concatenate them together, each separated by a single space. The scores can be in any order, and there can be any number of golfers in the score file.
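
A score line in this format might be taken apart as in the sketch below. The function name parse_score_line and the sample line are hypothetical, and the sketch uses a string's join method in place of an explicit loop to rebuild the course name.

```python
from datetime import date


def parse_score_line(line):
    """Split one score line into (date, name, score, course)."""
    fields = line.split()            # splits on any run of whitespace
    month, day, year = (int(f) for f in fields[:3])
    name = fields[3]
    score = int(fields[4])
    course = " ".join(fields[5:])    # rejoin the course words with single spaces
    return date(year, month, day), name, score, course


# Made-up line with irregular spacing in the course name.
parsed = parse_score_line("1 2 1999 Jim 88   Three   Ridges -- White Tees")
print(parsed)
```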

    There are example score files in /home/bvanderz/courses/302/labs/lab3.

    The course-file is a file that contains golf courses plus their ratings and slopes. This file has a looser structure than the score file. It contains three kinds of lines:
    1. Course Lines: These start with the word "Course" and then contain the name of the golf course. Again, the name is words separated by white space.
    2. Rating/Slope Lines: These have the word "Rating" then the rating (which is a floating point number), then the word "Slope" and then the slope (which you should also treat as a floating point number). When you encounter this line, it is the rating and slope for the course named on the most recently encountered "Course" line.
    3. All Other Lines: All other lines should be ignored.
    My suggestion for reading a course file is to use Python's strip and split methods to remove excess whitespace on each line and then to split each line into fields. You can then examine the first word of the line. There will be three cases:

    1. The line starts with Course: Concatenate the remaining fields of the line to obtain the course name. You can use a string's join method to quickly piece together a course name without having to write a for loop that concatenates the pieces of the course name together.
    2. The line starts with Rating: Convert the fields corresponding to rating and slope to floats.
    3. The line starts with anything else. Discard the line and move on.
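
The three cases above could be handled along the lines of this sketch. The Course class, the read_courses function, and the use of a list of strings in place of an open file are all hypothetical choices made for illustration.

```python
class Course:
    """Hypothetical course record holding a name, rating, and slope."""

    def __init__(self, name):
        self.name = name
        self.rating = None
        self.slope = None


def read_courses(lines):
    """Build a dictionary of Course objects indexed by course name."""
    courses = {}
    current = None
    for line in lines:
        fields = line.strip().split()
        if not fields:
            continue                          # blank line: ignore
        if fields[0] == "Course":
            current = Course(" ".join(fields[1:]))
            courses[current.name] = current
        elif fields[0] == "Rating":
            # Layout: Rating <rating> Slope <slope>
            current.rating = float(fields[1])
            current.slope = float(fields[3])
        # any other line is ignored
    return courses


sample = [
    "Course      Three          Ridges          -- White            Tees",
    "Rating 69.3 Slope 119",
    "Par     72",
    "",
]
courses = read_courses(sample)
c = courses["Three Ridges -- White Tees"]
print(c.name, c.rating, c.slope)
```

Since an open file can be iterated line by line, a function of this shape can be handed a file object as well as a list.
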
    An example (good to use for testing) is in courses. Look at the first four lines:

    Course Three Ridges -- White Tees
    Rating 69.3 Slope 119
    Par     72
    

    This says that there is a course that's called "Three Ridges -- White Tees" with a rating of 69.3 and a slope of 119. You ignore the "Par" line, and the blank line after the "Par" line.

    In both files (scores and courses) you should create a string for a course that is composed of each word separated by a space. For example, the following course specifications should be equivalent:

    Course Three Ridges -- White Tees
    
    Course      Three          Ridges          -- White            Tees
    

    Now, your program must read in both of these files. You may assume that they contain no errors and that they contain at least 20 scores for each golfer. Then print out the golfers and their handicaps, ordered by handicap (lowest first). Print out the handicap first (padded to 5 characters and two decimal places), and then the golfer's name.

    For example:

    UNIX> python handicap.py score1 courses
    14.31 Jim
    UNIX> python handicap.py score2 courses
     3.70 Phil
    14.31 Jim
    UNIX> python handicap.py score3 courses
     3.70 Phil
    14.31 Jim
    UNIX> python handicap.py bigscore courses
     2.58 Tiger
     8.26 Phil
     8.31 Sergio
     9.63 David
     9.77 Anika
    10.12 Jose
    18.12 Ernie
    18.21 Colin
    18.50 John
    18.88 Se-Ri
    39.55 Karrie
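
The output format shown above (handicap padded to 5 characters with two decimal places, then the name, lowest handicap first) could be produced along these lines. The list of (name, handicap) pairs is a hypothetical stand-in for your golfer objects, with values taken from the score2 example.

```python
# (name, handicap) pairs; values taken from the score2 example output.
results = [("Jim", 14.31), ("Phil", 3.70)]

# Sort by handicap, lowest first, then format each line.
ranked = sorted(results, key=lambda pair: pair[1])
lines = ["{:5.2f} {}".format(handicap, name) for name, handicap in ranked]
for line in lines:
    print(line)
```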
    
    A working executable is available at ~bvanderz/cs302/labs/lab3/handicap (it's a C++ implementation, not a Python implementation). If you set the environment variable PRINTDIFFS to "yes", then the program will also print out each golfer's differential and date number (defined in step 4 below) for each score. You can use this to test yourself in case your computations do not seem to match those here. Note that you do not have to implement this feature. It is included only so that you can test your own code.
    UNIX> setenv PRINTDIFFS yes
    UNIX> handicap score1 courses
    Jim
      Dnum: 743660   Differential: 17.76
      Dnum: 743661   Differential: 17.76
      Dnum: 743691   Differential: 14.91
      Dnum: 743693   Differential: 14.91
      Dnum: 743722   Differential: 17.76
      Dnum: 743725   Differential: 17.76
      Dnum: 743753   Differential: 14.91
      Dnum: 743757   Differential: 14.91
      Dnum: 743784   Differential: 17.76
      Dnum: 743789   Differential: 17.76
      Dnum: 743815   Differential: 14.91
      Dnum: 743821   Differential: 14.91
      Dnum: 743846   Differential: 17.76
      Dnum: 743853   Differential: 17.76
      Dnum: 743877   Differential: 14.91
      Dnum: 743885   Differential: 14.91
      Dnum: 743908   Differential: 17.76
      Dnum: 743939   Differential: 14.91
      Dnum: 743970   Differential: 17.76
      Dnum: 744001   Differential: 14.91
    
    14.31 Jim
    UNIX> setenv PRINTDIFFS no
    UNIX> handicap score1 courses
    14.31 Jim
    UNIX> 
    

    Naming Your Files

    1. Name your main file handicap.py
    2. Put any class definitions in a file named golfer.py.
    These are the only two files you should use.

    Python Versus Java Comparison. In the past I also assigned this problem earlier in the semester as a Java problem. My Python implementation took roughly 100 lines and my Java implementation took roughly 250 lines, for a 60% reduction in code.


What to Submit

Submit the following files:

  1. tail.py
  2. frequency.py
  3. handicap.py, golfer.py