CS365 Project Assignment 2


In this assignment you are going to implement a parser for the formula grammar that you were given in the assignment 1.

If your parser detects either a syntactic or semantic error (e.g., a non-integer used as a row reference), then your parser should print an appropriate error message and the line number on which the error occurred. If the error is a syntax number than you should also print out the token that caused the error. For both syntactic and semantic errors the parser should also continue parsing its input.


Output of Your Parser

Your parser should print the productions as it recognizes them and any syntax/semantic errors. If you have a question about the output your parser should generate you can test your input with my test parser, which can be invoked by typing:

java -cp .:..:/home/cs365/www-home/project/:/usr/local/lib/jar formula.parser

Your files should be placed in a package named formula.


Hints

  1. JCup does not provide an obvious way to communicate the line number for a token from the lexer to the parser. There are however two constructors for the Symbol class that I found useful in communicating this information. The constructors are:
         Symbol(int token_name, int left, int right)
         Symbol(int token_name, int left, int right, Object lexeme_value)
         
    I suggest that you pass the line number as the left argument to this constructor.

    Perhaps an easier way to tackle this problem is add the following directive to your JLex file:

    	 %{
    	   int getLine() { return yyline+1; }
    	 %}
    	 
    This directive will make it possible for your parser to access the line number via the following piece of code:
    	 int line = ((Yylex)getScanner()).getLine();
    	 

  2. You will need to override the parser's syntax_error method to report syntax errors. You can print the token by simply treating the Symbol object passed to syntax_error as a string. It will automatically print the integer constant assigned to the token by the sym class.

  3. There is only one semantic error you need to recognize:

    1. A decimal number being used as a row reference: You need to print a message specifying that an integer must be used as a row reference. You cannot treat this problem as a syntax error. On the surface it appears you can by specifying that the index must be an integer and then detecting a syntax error if a decimal number is used instead. However, you cannot differentiate between a decimal number being used as an index and a decimal number simply appearing in another inappropriate place (e.g., a[10] 6 3 should also generate a syntax error).

  4. Your action code that appears between the {: and :} is not placed in the parser class but instead in a separate class called CUP$parser$actions. As a result you will get a compiler error if you try to call one of your custom parser methods that you have defined in the "parser code {: ... :}" block. Fortunately CUP$parser$actions maintains a pointer to an instance of the parser class in a variable named parser. Hence, if you have defined a custom method named foo, you can access it in your action code by calling parser.foo().

  5. Add 1 to your line number when you pass it from the lexer to the parser. The lexer starts numbering lines at 0 rather than 1 and users will expect to see lines numbered beginning at 1.

Formula Grammar

I've repeated and slightly updated the spreadsheet grammar from the lexer assignment. The main change is the addition of the newline character (\n) as the delimiter for a formula. The reason for adding a newline character as a delimiter is so that we can recover from syntax errors. You will note that I have added an error production that allows a formula to be recognized even if there is an error. In order to know where it can resume parsing when it encounters a syntactic error, the parser needs to know where a formula ends, since it resumes parsing at the start of the next formula. In many languages a statement is delimited by a semi-colon but since you will eventually be typing individual formulas into a spreadsheet I've decided that a newline character will delimit the end of a formula. You can consult the JCup manual to see a further discussion of how to report syntactic errors and how to recover from syntactic errors.

The following grammar uses several conventions:

  1. Nonterminals start with a capital letter (except for the error non-terminal)
  2. Named terminals, such as id, are bold-faced

Pgm -> Formula
    
Formula -> id[number|id=RowList] = Exp \n
	|  \n  /* allows empty lines */
	|  error \n

RowList -> RowRef [, RowRef]*
RowRef -> number | number-number
  1. An id starts with an upper/lowercase letter and is followed by any number of letters, numbers, or _'s.
  2. A number is any string of 0 or more digits, followed by an optional decimal point and 1 or more digits. A number must have at least one digit.
  3. A sample formula involving a cell list might be: grade[i=3-5, 7, 8-9] = midterm[i] * .4 + final[i] * .6 The ids on the left side of the equals sign represent cell references, with the ids denoting the column labels and the numbers denoting the rows. A simple expression such as grade[1] references a single cell whereas a row list allows a formula to be assigned to multiple cells in the same column. In the above example the formula would be assigned to rows 3-5, 7, and 8-9 in the grade column.
Exp -> Exp + Exp | Exp - Exp | Exp * Exp | Exp / Exp | ( Exp ) | -Exp
    |  Exp < Exp | Exp > Exp | Exp <= Exp | Exp >= Exp 
	|  Exp == Exp | Exp != Exp
    |  number | id[number|id]
    |  Exp ? Exp : Exp
    |  sum|min|max(CellList)

CellList -> CellRef (, CellRef)*

CellRef -> id[RowList] | id[id]
  1. ? and : have the highest precedence
  2. * and / have precedence over + and -.
  3. The unary - has precedence over * and /.
  4. * and / have equal precedence and so do + and -.
  5. Operators are left associative.
  6. sum returns the sum of its operands
  7. min returns the minimum value of its operands
  8. max returns the maximum value of its operands
  9. Cell references are designed to allow the user to reference either a number of rows in a single column or a generic row in a single column
  10. The meaning of Exp1 ? Exp2 : Exp3 is the same as: if (Exp1) then Exp2 else Exp3 Exp1 is considered false if it evaluates to 0 and true otherwise. As an example, the expression income < 10000 ? income * .1 : income * .2 will return 500 if income is 5000 and 20000 if income is 100000.


Sample Formulas

Here is a list of sample formulas:

a[2] = 3 b[3] = a[2] c[1] = b[2] * a[1] grade[1] = .3 * (midterm1[1] + midterm2[1]) + .4 * final[1] grade[i=1,3-6,8] = .3 * (midterm1[i] + midterm2[i]) + .4 * final[i] grade[i=1,3-6,8] = weight[1] * (midterm1[i] + midterm2[i]) + weight[2] * final[i] total_hw[i=1-6] = sum(hw1[i], hw2[i], hw3[i]) total_purchases[2] = sum(total[2-6, 8]) tax[3] = income[3] < 20000 ? income[3] * .1 : income[3] * .2