CS365 Project Assignment 1


In this assignment you are going to implement an antlr scanner and parser for the formula expression language. If a formula is valid, your parser need not print anything (we will however validate your parser by executing your grammar in antlrworks and checking the parse tree that gets generated).

You may use antlr's error handling mechanism for printing lexical and syntactic errors. There are two semantic errors you need to recognize and for which you must print error messages:

  1. A decimal number being used as a row reference (e.g., a[6.3] = 10). You need to print a message specifying that an integer must be used as a row reference. For example:
        line x:y - 6.3 must be an integer row reference.
      
    Replace x and y with the appropriate line number and beginning character position of the decimal number.

    You may be tempted to treat this problem as a syntax error, but that will not work. It may seem that you can specify that the index must be an integer and then detect a syntax error if a decimal number is used instead. However, you then cannot differentiate between a decimal number being used as an index and a decimal number simply appearing in another inappropriate place (e.g., a[10] 6.3 should also generate a syntax error).

  2. Using an identifier as an index on the right hand side of a formula when that identifier was not defined as an index on the left hand side of the same formula. For example, the formula:
    	  total_hw[i=1-6] = sum(hw1[j], hw2[i], hw3[k])
    	
    should generate the error messages:

    	line x:y - j not defined 
    	line x:y - k not defined 
    	
    Replace x and y with the appropriate line number and beginning character position of the undefined variable.
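
The two semantic messages above share one shape, so a small helper can build them and decide whether a NUMBER is an integer. This is only a minimal sketch: the class and method names (SemanticMessages, isInteger, rowRefError, undefinedIndexError) are hypothetical and do not come from the assignment, and the line and position arguments are assumed to be supplied by the token in your action code.

```java
// Hypothetical helper; none of these names are required by the assignment.
final class SemanticMessages {
    private SemanticMessages() {}

    // True when the NUMBER text has no decimal point, e.g. "6" but not "6.3" or ".4".
    static boolean isInteger(String numberText) {
        return !numberText.isEmpty() && numberText.indexOf('.') < 0;
    }

    // Message for a decimal number used as a row reference.
    static String rowRefError(int line, int pos, String numberText) {
        return "line " + line + ":" + pos + " - " + numberText
                + " must be an integer row reference.";
    }

    // Message for a right hand side index that the left hand side never defined.
    static String undefinedIndexError(int line, int pos, String id) {
        return "line " + line + ":" + pos + " - " + id + " not defined";
    }
}
```

A call such as SemanticMessages.rowRefError(1, 2, "6.3") builds the message for a[6.3] = 10 on line 1, assuming character positions are counted from 0.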

For both syntactic and semantic errors the parser should also continue parsing its input.

If you have a question about the output your parser should generate you can test your input with my test parser, which can be invoked by typing:

java -cp .:..:/home/bvz/cs365/project:/usr/share/java/antlr3.jar formula.parser


What to Submit

You will use the 365_submit script to submit your projects. When prompted for a lab number, enter 'p1'. Your submission should contain the following files:

  1. README: Minimally this file should contain the names of your teammates, or just yourself if you are doing it alone. If you cannot finish the project, also indicate what you have accomplished.
  2. Formula.g: Your antlr grammar. You should generate the lexer and parser files in a package called formula. The hints tell you how you can do this.
  3. parser.java: Your driver program that invokes your parser and accepts input from stdin. It should be in a package named formula.

Hints

  1. You can obtain the line number for a token and its beginning character position with the methods getLine and getCharPositionInLine. For example, if you have defined ID as a token, then $ID.getLine() will return the line on which the id token occurs and $ID.getCharPositionInLine() returns the id token's beginning character position.

  2. You can use the @members block to define variables or methods that you want to call from the action code in your productions.

  3. You can use the @header block to put statements at the top of your parser file and the @lexer::header block to put statements at the top of your lexer file. For example you will use these two statements to include your lexer and parser files in the formula package:
    @lexer::header {
      package formula;
    }   
    
    @header {
      package formula;
    }
         
  4. You may need to save the name of a left hand side index so that you can compare it with a right hand side index. You will need to declare a variable in @members to hold the name and you will want to reset the variable to null before each formula is processed. You can do this initialization by placing an @init statement immediately after the start of your formula production:
    formula
      @init { lefthand_id = null; }
      : your productions
    
    Anything in an @init block is executed before any of your action code. You can also place an @after block before your productions, and the statements in the block will be executed after a production is completely recognized and after any action code associated with that production.
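
The bookkeeping described in hint 4 can be tried out in plain Java before it is moved into @members. The following is a sketch under assumptions: IndexTracker and its method names are hypothetical, only the idea of a lefthand_id variable comes from the hints, and errors are collected in a list here so they can be inspected, whereas your action code would print them.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for state kept in @members.
final class IndexTracker {
    private String lefthandId;                        // null when no index is defined
    private final List<String> errors = new ArrayList<>();

    // Mirrors the @init action: forget the previous formula's index.
    void startFormula() { lefthandId = null; }

    // Call when the left hand side has the form ID[ID=RowList].
    void defineIndex(String id) { lefthandId = id; }

    // Call for each ID used as an index on the right hand side.
    void useIndex(String id, int line, int pos) {
        if (!id.equals(lefthandId)) {
            errors.add("line " + line + ":" + pos + " - " + id + " not defined");
        }
    }

    List<String> errors() { return errors; }
}
```

Because startFormula clears the saved name, an index defined by one formula is not visible to the next one.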

Formula Grammar

The formula grammar you will need to recognize is shown below. The newline character (\n) is used as the delimiter for a formula, so you will place formulas on separate lines. The newline character will also help the parser recover from syntax errors, by allowing it to skip to the next line if it gets too confused to parse the current line. In many languages a statement is delimited by a semi-colon but since you will eventually be typing individual formulas into a spreadsheet I've decided that a newline character will delimit the end of a formula.

The following grammar uses several conventions:

  1. Nonterminals start with an uppercase letter.
  2. Named terminals that may have more than one value, such as ID, are shown in uppercase and are boldfaced.
  3. Terminals that are keywords, such as sum and or, are shown in lowercase and are boldfaced.
You will need to convert nonterminals to lowercase for your antlr specification.

Pgm -> Formula
    
Formula -> ID[NUMBER|ID=RowList] = Exp NEWLINE
	|  NEWLINE  /* allows empty lines */

RowList -> RowRef (, RowRef)*
RowRef -> NUMBER | NUMBER-NUMBER
  1. An ID starts with an upper/lowercase letter and is followed by any number of letters, numbers, or _'s.
  2. A NUMBER is any string of 0 or more digits, followed by an optional decimal point and 1 or more digits. A number must have at least one digit.
  3. A sample formula involving a cell list might be:
        grade[i=3-5, 7, 8-9] = midterm[i] * .4 + final[i] * .6
     The ids on the left side of the equals sign represent cell references, with the ids denoting the column labels and the numbers denoting the rows. A simple expression such as grade[1] references a single cell whereas a row list allows a formula to be assigned to multiple cells in the same column. In the above example the formula would be assigned to rows 3-5, 7, and 8-9 in the grade column.
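
The ID and NUMBER descriptions and the row list example above can be checked quickly with ordinary Java regular expressions. This is a sketch for experimenting with the token shapes, not the antlr lexer rules themselves: the class LexicalSketch and its methods are hypothetical, and expandRows assumes that a range such as 3-5 includes both end points and that its input is already a valid row list of integers.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical helper for trying out the token descriptions.
final class LexicalSketch {
    // ID: a letter followed by any number of letters, digits, or underscores.
    static final Pattern ID = Pattern.compile("[A-Za-z][A-Za-z0-9_]*");
    // NUMBER: digits with an optional fraction, or a bare fraction such as .4
    static final Pattern NUMBER = Pattern.compile("[0-9]+(\\.[0-9]+)?|\\.[0-9]+");

    static boolean isId(String s) { return ID.matcher(s).matches(); }
    static boolean isNumber(String s) { return NUMBER.matcher(s).matches(); }

    // Expands a row list such as "3-5, 7, 8-9" into the rows it covers.
    // Assumption: ranges are inclusive and the text is a valid integer RowList.
    static List<Integer> expandRows(String rowList) {
        List<Integer> rows = new ArrayList<>();
        for (String ref : rowList.split(",")) {
            String[] ends = ref.trim().split("-");
            int first = Integer.parseInt(ends[0].trim());
            int last = ends.length == 2 ? Integer.parseInt(ends[1].trim()) : first;
            for (int r = first; r <= last; r++) {
                rows.add(r);
            }
        }
        return rows;
    }
}
```

Under these assumptions the row list 3-5, 7, 8-9 expands to rows 3, 4, 5, 7, 8, and 9.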
Exp -> Exp + Exp | Exp - Exp | Exp * Exp | Exp / Exp | ( Exp ) | -Exp
    |  Exp < Exp | Exp > Exp | Exp <= Exp | Exp >= Exp 
	|  Exp == Exp | Exp != Exp | Exp and Exp | Exp or Exp
    |  NUMBER | ID[NUMBER|ID]
    |  Exp ? Exp : Exp
    |  sum|min|max(CellList)

CellList -> CellRef (, CellRef)*

CellRef -> ID[RowList] | ID[ID]
  1. The order of precedence, from lowest to highest, is as follows:
    1. ? / :
    2. boolean operators: and/or
    3. relational operators: <, >, <=, >=, ==, !=
    4. +/-
    5. * and /
    6. unary -
    You will need to use the arithmetic grammar design pattern discussed in class to organize the Exp productions so that they obey the precedence shown above. Remember that the lower the operator precedence, the closer its productions should be to the start symbol.
  2. Operators are left associative.
  3. sum returns the sum of its operands.
  4. min returns the minimum value of its operands.
  5. max returns the maximum value of its operands.
  6. Cell references are designed to allow the user to reference either a number of rows in a single column or a generic row in a single column.
  7. The meaning of Exp1 ? Exp2 : Exp3 is the same as: if (Exp1) then Exp2 else Exp3. Exp1 is considered false if it evaluates to 0 and true otherwise. As an example, the expression
        income < 10000 ? income * .1 : income * .2
     will return 500 if income is 5000 and 20000 if income is 100000.
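
The layering behind the precedence list can be seen in a small hand-written evaluator in which each precedence level is one method that calls the next higher level. This is an illustration of the idea in plain Java, not an antlr grammar and not the required solution. The class PrecedenceSketch is hypothetical, it handles only numbers (no cell references or sum/min/max), it expects tokens to be separated by spaces, and it treats ? : as left associative by reading note 2 literally, which is an assumption.

```java
// Hypothetical sketch: one method per precedence level, lowest level first.
final class PrecedenceSketch {
    private final String[] toks;
    private int pos = 0;

    private PrecedenceSketch(String src) { toks = src.trim().split("\\s+"); }

    // Evaluates a space-separated expression such as "1 + 2 * 3".
    static double eval(String src) {
        PrecedenceSketch p = new PrecedenceSketch(src);
        double v = p.ternary();
        if (p.pos != p.toks.length) throw new IllegalArgumentException("unexpected " + p.peek());
        return v;
    }

    private String peek() { return pos < toks.length ? toks[pos] : ""; }
    private boolean accept(String t) {
        if (!peek().equals(t)) return false;
        pos++;
        return true;
    }
    private static double truth(boolean b) { return b ? 1 : 0; }

    private double ternary() {                       // level 1: ? :
        double v = logical();
        while (accept("?")) {
            double a = logical();
            if (!accept(":")) throw new IllegalArgumentException("expected :");
            double b = logical();
            v = (v != 0) ? a : b;                    // 0 is false, anything else is true
        }
        return v;
    }
    private double logical() {                       // level 2: and, or
        double v = relational();
        while (true) {
            if (accept("and")) { double r = relational(); v = truth(v != 0 && r != 0); }
            else if (accept("or")) { double r = relational(); v = truth(v != 0 || r != 0); }
            else return v;
        }
    }
    private double relational() {                    // level 3: < > <= >= == !=
        double v = additive();
        while (peek().matches("<|>|<=|>=|==|!=")) {
            String op = toks[pos++];
            double r = additive();
            switch (op) {
                case "<":  v = truth(v < r);  break;
                case ">":  v = truth(v > r);  break;
                case "<=": v = truth(v <= r); break;
                case ">=": v = truth(v >= r); break;
                case "==": v = truth(v == r); break;
                default:   v = truth(v != r);
            }
        }
        return v;
    }
    private double additive() {                      // level 4: + -
        double v = multiplicative();
        while (true) {
            if (accept("+")) v += multiplicative();
            else if (accept("-")) v -= multiplicative();
            else return v;
        }
    }
    private double multiplicative() {                // level 5: * /
        double v = unary();
        while (true) {
            if (accept("*")) v *= unary();
            else if (accept("/")) v /= unary();
            else return v;
        }
    }
    private double unary() {                         // level 6: unary -
        return accept("-") ? -unary() : primary();
    }
    private double primary() {                       // numbers and parentheses
        if (accept("(")) {
            double v = ternary();
            if (!accept(")")) throw new IllegalArgumentException("expected )");
            return v;
        }
        if (pos >= toks.length) throw new IllegalArgumentException("unexpected end");
        return Double.parseDouble(toks[pos++]);
    }
}
```

The loops in the binary levels are what make those operators left associative, and the call chain from ternary down to primary mirrors the rule that lower precedence productions sit closer to the start symbol.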


Sample Formulas

Here is a list of sample formulas:

a[2] = 3
b[3] = a[2]
c[1] = b[2] * a[1]
grade[1] = .3 * (midterm1[1] + midterm2[1]) + .4 * final[1]
grade[i=1,3-6,8] = .3 * (midterm1[i] + midterm2[i]) + .4 * final[i]
grade[i=1,3-6,8] = weight[1] * (midterm1[i] + midterm2[i]) + weight[2] * final[i]
total_hw[i=1-6] = sum(hw1[i], hw2[i], hw3[i])
total_purchases[2] = sum(total[2-6, 8])
tax[3] = income[3] < 20000 ? income[3] * .1 : income[3] * .2
grade[2] = midterm[2] > 90 and final[2] > 85 ? 100 : 90