In order to construct a parse tree you will need to declare a class for
each non-terminal and each production in your grammar. The class for each
non-terminal should be declared as abstract and each of the non-terminal's
productions should be declared as a subclass. If a terminal carries information,
such as a number or id, then that terminal should also have its own
top-level class. If a terminal has only one possible value or is a punctuation
character, there is no need to store it since you will know its value based
on its production. For example, if you have the production `E -> E - E`,
there is no need to store the minus sign since you will know that the
production represents a minus expression.

In each production's subclass you will need pointers to the nodes that represent the non-terminals on the right-hand side of the production. If there are terminals that carry information, such as numbers or ids, then you will need pointers to those nodes as well. Productions are subclasses of their left-hand side non-terminal because each one expands that non-terminal and therefore represents one of the potential subtrees rooted at it.

As an example of how you might construct a parse tree, consider the following grammar:

    Pgm -> Exp
    Exp -> number | Exp + Exp | Exp - Exp

The non-terminals are *Pgm* and *Exp*, so each gets an abstract class:

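    abstract class Exp {}
    abstract class Pgm {}

The terminal *number* carries a value, so it gets a class of its own, which here also serves as the subclass for the production `Exp -> number`: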

    class Number extends Exp {
        int number;
        public Number(String num) {
            // Convert the token text to an int.
            number = Integer.parseInt(num);
        }
    }

The production `Pgm -> Exp` becomes a subclass of *Pgm*:

    class PgmExpression extends Pgm {
        Exp child;
        public PgmExpression(Exp e) {
            child = e;
        }
    }

Note that this class has a pointer to an expression because the parse tree rooted at *Pgm* has an *Exp* subtree as its child.

Next we define the remaining two subclass productions for *Exp*:

    class MinusExp extends Exp {
        Exp child1;
        Exp child2;
        public MinusExp(Exp left, Exp right) {
            child1 = left;
            child2 = right;
        }
    }

    class PlusExp extends Exp {
        Exp child1;
        Exp child2;
        public PlusExp(Exp left, Exp right) {
            child1 = left;
            child2 = right;
        }
    }
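
As a quick check of these classes, the sketch below builds by hand the tree for `1 + 2 - 3` (grouped as `(1 + 2) - 3`) and prints it back out. The `TreeDemo` class and its `sample` and `show` methods are hypothetical helpers added for illustration; the node classes are repeated from the text (with `Integer.parseInt` used for the conversion) so that the sketch compiles on its own.

```java
// Node classes repeated from the text so the sketch compiles on its own.
abstract class Exp {}

class Number extends Exp {
    int number;
    Number(String num) { number = Integer.parseInt(num); }
}

class PlusExp extends Exp {
    Exp child1, child2;
    PlusExp(Exp left, Exp right) { child1 = left; child2 = right; }
}

class MinusExp extends Exp {
    Exp child1, child2;
    MinusExp(Exp left, Exp right) { child1 = left; child2 = right; }
}

// Hypothetical helper class, not part of the original example.
class TreeDemo {
    // Builds the tree for "1 + 2 - 3", grouped as (1 + 2) - 3.
    static Exp sample() {
        return new MinusExp(
            new PlusExp(new Number("1"), new Number("2")),
            new Number("3"));
    }

    // Renders a tree as a fully parenthesized string. The operator is
    // recovered from the node's class, since the node does not store it.
    static String show(Exp e) {
        if (e instanceof Number) {
            return Integer.toString(((Number) e).number);
        } else if (e instanceof PlusExp) {
            PlusExp p = (PlusExp) e;
            return "(" + show(p.child1) + " + " + show(p.child2) + ")";
        } else if (e instanceof MinusExp) {
            MinusExp m = (MinusExp) e;
            return "(" + show(m.child1) + " - " + show(m.child2) + ")";
        }
        throw new IllegalArgumentException("unknown node type");
    }

    public static void main(String[] args) {
        System.out.println(show(sample())); // prints ((1 + 2) - 3)
    }
}
```

Because the `+` and `-` symbols are never stored, `show` has to consult the class of each node to decide which operator to print, which is exactly the information the subclass carries.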

Now suppose that we want to use ANTLR to build a parse tree for strings that can be generated using this grammar. Here is how the ANTLR specification might be written:

    pgm returns [Pgm value]
        : e=exp { $value = new PgmExpression($e.value); }
        ;

    exp returns [Exp value]
        : n=NUM { $value = new Number($n.text); }
        | l=exp '+' r=exp { $value = new PlusExp($l.value, $r.value); }
        | l=exp '-' r=exp { $value = new MinusExp($l.value, $r.value); }
        ;

Often you want to abstract away parts of the parse tree that do
not add "meaning" to your program. In the above example, you actually started
constructing an abstract syntax tree, because the parse tree you constructed
did not include the operator symbols '+' and '-'. Instead, their meaning
was embedded in the two classes you created, `PlusExp` and
`MinusExp`. In the above example, you also were not concerned with
preserving the start non-terminal. Instead what you really wanted was an
expression tree for the expression input by the user. You could abstract
away the Pgm non-terminal by having its action return the expression tree,
rather than a Pgm object:

    pgm returns [Exp value]
        : e=exp { $value = $e.value; }
        ;

As another example, suppose we extended the above expression grammar by adding parenthesized expressions:

    Pgm -> Exp
    Exp -> number | Exp + Exp | Exp - Exp | ( Exp )

The parentheses are there to allow us to properly group the operands associated with operators, and can be discarded from the parse tree. Additionally, we do not need to define a subclass for the parenthesized expression. Instead we can simply return its expression tree, thus abstracting away the parentheses:

    exp returns [Exp value]
        : ... other productions ...
        | '(' e=exp ')' { $value = $e.value; }
        ;
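
To see the same idea outside of ANTLR, here is a minimal hand-written recursive-descent sketch in Java. It is not what ANTLR generates; `TinyParser` and its methods are hypothetical names chosen for illustration, and the sketch assumes unsigned integer literals, left-associative operators, and no error recovery. The point to notice is that the parenthesized case returns the inner tree directly, so no node is ever created for the parentheses.

```java
// Node classes repeated from the text so the sketch compiles on its own.
abstract class Exp {}

class Number extends Exp {
    int number;
    Number(String num) { number = Integer.parseInt(num); }
}

class PlusExp extends Exp {
    Exp child1, child2;
    PlusExp(Exp left, Exp right) { child1 = left; child2 = right; }
}

class MinusExp extends Exp {
    Exp child1, child2;
    MinusExp(Exp left, Exp right) { child1 = left; child2 = right; }
}

// Hypothetical hand-written parser standing in for the generated one.
class TinyParser {
    private final String src;
    private int pos = 0;

    TinyParser(String src) {
        this.src = src.replace(" ", ""); // drop spaces to keep the scanner trivial
    }

    // exp : primary (('+' | '-') primary)*   (left-associative)
    // Sketch only: input remaining after the expression is not checked.
    Exp parseExp() {
        Exp left = parsePrimary();
        while (pos < src.length()
                && (src.charAt(pos) == '+' || src.charAt(pos) == '-')) {
            char op = src.charAt(pos++);
            Exp right = parsePrimary();
            left = (op == '+') ? new PlusExp(left, right)
                               : new MinusExp(left, right);
        }
        return left;
    }

    // primary : number | '(' exp ')'
    private Exp parsePrimary() {
        if (pos < src.length() && src.charAt(pos) == '(') {
            pos++;                    // consume '('
            Exp inner = parseExp();
            expect(')');
            return inner;             // the parentheses leave no node behind
        }
        int start = pos;
        while (pos < src.length() && Character.isDigit(src.charAt(pos))) {
            pos++;
        }
        if (start == pos) {
            throw new IllegalStateException("number expected at position " + pos);
        }
        return new Number(src.substring(start, pos));
    }

    private void expect(char c) {
        if (pos >= src.length() || src.charAt(pos) != c) {
            throw new IllegalStateException("expected '" + c + "' at position " + pos);
        }
        pos++;
    }

    public static void main(String[] args) {
        MinusExp root = (MinusExp) new TinyParser("1 - (2 + 3)").parseExp();
        // The right operand is the PlusExp itself, not a "parenthesized" wrapper.
        System.out.println(root.child2.getClass().getSimpleName()); // prints PlusExp
    }
}
```

The grouping expressed by the parentheses survives in the shape of the tree: `2 + 3` becomes the right child of the `MinusExp`, which is all that later phases need.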