CS365 Project Assignment 1
In this assignment you are going to implement an antlr scanner and
parser for the formula expression language. If a formula is valid, your
parser need not print anything (we will however validate your parser by
executing your grammar in antlrworks and checking the parse tree that
gets generated).
You may use antlr's error handling mechanism for printing lexical and
syntactic errors.
There are two semantic errors you need to recognize and for which you
must print error messages:
- A decimal number being used as a row reference (e.g., a[6.3] = 10). You need to print
a message specifying that an integer must be used as a row
reference. For example:
line x:y - 6.3 must be an integer row reference.
Replace x and y with the appropriate line number and
beginning character position of the decimal number.
You may be tempted to treat this problem as a syntax error, but that
will not work.
It might seem that you can specify in the grammar that the index must
be an integer and then detect a syntax error if a decimal number
is used instead. However, you then cannot differentiate between a
decimal number being used as an index and a decimal number simply
appearing in another inappropriate place (e.g., a[10] 6.3 should
also generate a syntax error).
- Using an identifier as an index on the right hand side of an expression that
was not defined as an index on the left hand side of the same expression. For
example, the formula:
total_hw[i=1-6] = sum(hw1[j], hw2[i], hw3[k])
should generate the error messages:
line x:y - j not defined
line x:y - k not defined
Replace x and y with the appropriate line number and
beginning character position of the undefined variable.
After reporting either a syntactic or a semantic error,
the parser should continue parsing its input.
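The two semantic checks above could be factored into plain Java helpers that your action code calls. The following is only a minimal sketch under stated assumptions: the class name SemanticChecks, the field lefthandId, and both method names are hypothetical, and the messages are collected in a list here (rather than printed) only so the behavior is easy to inspect. The line and position arguments stand in for whatever values you obtain from the offending token.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper; none of these names come from the assignment.
class SemanticChecks {
    // Collected messages; real action code might print them directly instead.
    final List<String> errors = new ArrayList<>();
    // Index name declared on the left hand side of the current formula, or null.
    String lefthandId = null;

    // Report a NUMBER used as a row reference when its text has a decimal point.
    void checkRowRef(String numberText, int line, int pos) {
        if (numberText.contains(".")) {
            errors.add("line " + line + ":" + pos + " - " + numberText
                    + " must be an integer row reference.");
        }
    }

    // Report a right hand side index that was not declared on the left hand side.
    void checkIndex(String id, int line, int pos) {
        if (!id.equals(lefthandId)) {
            errors.add("line " + line + ":" + pos + " - " + id + " not defined");
        }
    }

    public static void main(String[] args) {
        SemanticChecks checks = new SemanticChecks();
        checks.lefthandId = "i";          // as in total_hw[i=1-6] = ...
        checks.checkIndex("j", 1, 26);    // reported
        checks.checkIndex("i", 1, 34);    // accepted, nothing recorded
        checks.checkRowRef("6.3", 2, 2);  // reported
        for (String e : checks.errors) System.out.println(e);
    }
}
```

Because neither check throws an exception, the parser simply carries on with the rest of the input after recording a message.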
If you have a question about the output your parser
should generate you can test your input with my test parser, which can
be invoked by typing:
java -cp .:..:/home/bvz/cs365/project:/usr/share/java/antlr3.jar formula.parser
What to Submit
You will use the 365_submit script to submit your projects. When
prompted for a lab number, enter 'p1'. Your submission should contain the
following files:
- README: At a minimum, this file should contain the names of your teammates, or
just your own name if you are working alone. If you cannot finish the
project, also indicate what you have accomplished.
- Formula.g: Your antlr grammar. You should generate the lexer and parser
files in a package called formula. The hints tell you how you
can do this.
- parser.java: Your driver program that invokes your parser and accepts
input from stdin. It should be in a package named formula.
Hints
- You can obtain the line number for a token and its beginning character
position with the methods getLine and getCharPositionInLine.
For example, if you have defined ID as a token, then
$ID.getLine() will return the line on which the id token occurs
and $ID.getCharPositionInLine() returns the id token's beginning
character position.
- You can use the @members block
to define variables or methods that you want to call from
the action code in your productions.
- You can use the @header block to put statements at the top
of your parser file and the @lexer::header block to put
statements at the top of your lexer file. For example you will use
these two statements to include your lexer and parser files in the
formula package:
@lexer::header {
package formula;
}
@header {
package formula;
}
- You may need to save the name of a left hand side index so that you
can compare it with a right hand side index. You will need to declare
a variable in @members to hold the name and you will want to
re-set the variable to null before each formula is processed.
You can do this initialization by placing an @init statement
immediately after the start of your formula production:
formula
@init { lefthand_id = null; }
: your productions
Anything in an @init block is executed before any
of your action code. You can also place an @after block
before your productions, and the statements in the block will be
executed after a production is completely recognized and after any
action code associated with that production.
Formula Grammar
The formula grammar you will need to recognize is shown below. The
newline character (\n) is used
as the delimiter for a formula, so you will place formulas on separate
lines. The newline character will also help the parser recover from
syntax errors, by allowing it to skip to the next line if it gets too confused
to parse the current line. In many languages, a statement is delimited by
a semicolon, but since you will eventually be typing individual formulas into
a spreadsheet, I've decided that a newline character will delimit the end
of a formula.
The following grammar uses several conventions:
- Nonterminals start with an uppercase letter
- Named terminals that may have more than one value, such as ID,
are shown in uppercase and are boldfaced.
- Terminals that are keywords, such as sum and or, are
shown in lowercase and are boldfaced.
You will need to convert nonterminals to lowercase
for your antlr specification.
Pgm -> Formula
Formula -> ID[NUMBER|ID=RowList] = Exp NEWLINE
| NEWLINE /* allows empty lines */
RowList -> RowRef (, RowRef)*
RowRef -> NUMBER | NUMBER-NUMBER
- An ID starts with an upper/lowercase letter and is followed
by any number of letters, numbers, or _'s.
- A NUMBER is any string of 0 or more digits, followed by
an optional decimal point and 1 or more digits. A number must have
at least one digit.
- A sample formula involving a row list might be:
grade[i=3-5, 7, 8-9] = midterm[i] * .4 + final[i] * .6
The ids on the left side of the equals sign represent cell references,
with the ids denoting the column labels and the numbers denoting the
rows. A simple expression such as grade[1] references a single cell
whereas a row list allows a formula to be assigned to multiple cells
in the same column. In the above example the formula would be assigned
to rows 3-5, 7, and 8-9 in the grade column.
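The ID and NUMBER definitions above can be sanity-checked with ordinary Java regular expressions before you write the corresponding lexer rules. This is only an illustrative sketch: the class TokenSketch and its method names are hypothetical, and in your submission these tokens would be defined in the grammar rather than with java.util.regex.

```java
import java.util.regex.Pattern;

// Hypothetical helper used only to illustrate the token definitions.
class TokenSketch {
    // ID: a letter followed by any number of letters, digits, or underscores.
    static final Pattern ID = Pattern.compile("[A-Za-z][A-Za-z0-9_]*");

    // NUMBER: zero or more digits, then an optional '.' plus one or more digits.
    // The alternation enforces the "at least one digit" requirement.
    static final Pattern NUMBER = Pattern.compile("[0-9]+(\\.[0-9]+)?|\\.[0-9]+");

    static boolean isId(String s) { return ID.matcher(s).matches(); }

    static boolean isNumber(String s) { return NUMBER.matcher(s).matches(); }

    public static void main(String[] args) {
        String[] samples = {"total_hw", "hw1", ".4", "6.3", "10", "6.", "_x"};
        for (String s : samples) {
            System.out.println(s + " id=" + isId(s) + " number=" + isNumber(s));
        }
    }
}
```

Under this reading, .4, 6.3, and 10 are all numbers, while 6. is not, because a decimal point must be followed by at least one digit.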
Exp -> Exp + Exp | Exp - Exp | Exp * Exp | Exp / Exp | ( Exp ) | -Exp
| Exp < Exp | Exp > Exp | Exp <= Exp | Exp >= Exp
| Exp == Exp | Exp != Exp | Exp and Exp | Exp or Exp
| NUMBER | ID[NUMBER|ID]
| Exp ? Exp : Exp
| sum(CellList) | min(CellList) | max(CellList)
CellList -> CellRef (, CellRef)*
CellRef -> ID[RowList] | ID[ID]
- The order of precedence, from lowest to highest, is as follows:
- ? / :
- boolean operators: and/or
- relational operators: <, >, <=, >=, ==, !=
- +/-
- * and /
- unary -
You will need to use the arithmetic grammar design pattern discussed
in class to organize the Exp productions so that they obey
the precedence shown above. Remember that the lower the operator
precedence, the closer its productions should be to the start symbol.
- Operators are left associative.
- sum returns the sum of its operands
- min returns the minimum value of its operands
- max returns the maximum value of its operands
- Cell references are designed to allow the user to reference either
a number of rows in a single column or a generic row in a single column.
- The meaning of Exp1 ? Exp2 : Exp3 is the same as:
if (Exp1) then Exp2 else Exp3
Exp1 is considered false if it evaluates to 0 and true otherwise.
As an example, the expression income < 10000 ? income * .1 : income * .2
will return 500 if income is 5000 and 20000 if income is 100000.
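To see how the precedence levels and the ? : semantics above fit together, here is a small hand-written evaluator with one method per precedence level, where the lowest level is the entry point. It is only a sketch under several assumptions: it handles numbers but not cell references or sum/min/max, it does no error reporting, it treats ? : as left associative with both branches taken from the next higher level, and every name in it (PrecedenceSketch, eval, and so on) is hypothetical. Your ANTLR productions would follow the same layering but would not look like this code.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical illustration of precedence layering; not an ANTLR parser.
class PrecedenceSketch {
    private final List<String> toks = new ArrayList<>();
    private int pos = 0;

    private PrecedenceSketch(String src) {
        // Two-character operators come first so "<=" is not read as "<".
        Matcher m = Pattern.compile(
            "<=|>=|==|!=|and|or|[0-9]*\\.?[0-9]+|[-+*/()<>?:]").matcher(src);
        while (m.find()) toks.add(m.group()); // unmatched characters are skipped
    }

    static double eval(String src) { return new PrecedenceSketch(src).ternary(); }

    private boolean at(String... ops) {
        for (String op : ops)
            if (pos < toks.size() && toks.get(pos).equals(op)) return true;
        return false;
    }
    private String next() { return toks.get(pos++); }
    private static double truth(boolean b) { return b ? 1 : 0; }

    // Lowest precedence: Exp ? Exp : Exp, where nonzero counts as true.
    private double ternary() {
        double v = bool();
        while (at("?")) {
            next();
            double thenVal = bool();
            next();                       // consume ':' without checking it
            double elseVal = bool();
            v = (v != 0) ? thenVal : elseVal;
        }
        return v;
    }
    // Boolean operators: and / or.
    private double bool() {
        double v = relational();
        while (at("and", "or")) {
            String op = next();
            double r = relational();
            v = op.equals("and") ? truth(v != 0 && r != 0) : truth(v != 0 || r != 0);
        }
        return v;
    }
    // Relational operators.
    private double relational() {
        double v = additive();
        while (at("<", ">", "<=", ">=", "==", "!=")) {
            String op = next();
            double r = additive();
            switch (op) {
                case "<":  v = truth(v < r);  break;
                case ">":  v = truth(v > r);  break;
                case "<=": v = truth(v <= r); break;
                case ">=": v = truth(v >= r); break;
                case "==": v = truth(v == r); break;
                default:   v = truth(v != r); break;
            }
        }
        return v;
    }
    // + and -; the loop makes them left associative.
    private double additive() {
        double v = multiplicative();
        while (at("+", "-")) {
            String op = next();
            double r = multiplicative();
            v = op.equals("+") ? v + r : v - r;
        }
        return v;
    }
    // * and /.
    private double multiplicative() {
        double v = unary();
        while (at("*", "/")) {
            String op = next();
            double r = unary();
            v = op.equals("*") ? v * r : v / r;
        }
        return v;
    }
    // Highest precedence: unary minus, then parentheses and numbers.
    private double unary() {
        if (at("-")) { next(); return -unary(); }
        if (at("(")) { next(); double v = ternary(); next(); return v; }
        return Double.parseDouble(next());
    }
}
```

For example, PrecedenceSketch.eval("1 + 2 * 3") multiplies before adding, and PrecedenceSketch.eval("10 - 4 - 3") groups as (10 - 4) - 3 because each level consumes its operators in a loop from left to right.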
Sample Formulas
Here is a list of sample formulas:
a[2] = 3
b[3] = a[2]
c[1] = b[2] * a[1]
grade[1] = .3 * (midterm1[1] + midterm2[1]) + .4 * final[1]
grade[i=1,3-6,8] = .3 * (midterm1[i] + midterm2[i]) + .4 * final[i]
grade[i=1,3-6,8] = weight[1] * (midterm1[i] + midterm2[i]) + weight[2] * final[i]
total_hw[i=1-6] = sum(hw1[i], hw2[i], hw3[i])
total_purchases[2] = sum(total[2-6, 8])
tax[3] = income[3] < 20000 ? income[3] * .1 : income[3] * .2
grade[2] = midterm[2] > 90 and final[2] > 85 ? 100 : 90
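Row lists such as 1,3-6,8 and 2-6, 8 in the samples above name the rows a formula applies to. Your parser for this assignment does not have to expand them, but the following sketch may help you check your understanding of what a row list denotes. It assumes that ranges are inclusive and that every row reference is an integer (a decimal reference would already have been reported as a semantic error); the class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper; expanding row lists is not required by the assignment.
class RowListSketch {
    // Expand row-list text such as "1,3-6,8" into the individual row numbers.
    static List<Integer> expand(String rowList) {
        List<Integer> rows = new ArrayList<>();
        for (String ref : rowList.split(",")) {
            String[] ends = ref.trim().split("-");
            int first = Integer.parseInt(ends[0].trim());
            // A single NUMBER is treated as a range of length one.
            int last = ends.length > 1 ? Integer.parseInt(ends[1].trim()) : first;
            for (int r = first; r <= last; r++) rows.add(r);
        }
        return rows;
    }

    public static void main(String[] args) {
        System.out.println(expand("1,3-6,8"));
        System.out.println(expand("2-6, 8"));
    }
}
```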