Building Parse Trees


Classes that Need to be Declared

In order to construct a parse tree you will need to declare a class for each non-terminal and each production in your grammar. The class for each non-terminal should be declared as abstract and each of the non-terminal's productions should be declared as a subclass. If a terminal carries information, such as a number or id, then that terminal should also have its own top-level class. If a terminal has only one possible value or is a punctuation character, there is no need to store it since you will know its value based on its production. For example, if you have the production E -> E - E, there is no need to store the minus sign since you will know that the production represents a minus expression.

In each production's subclass you will need pointers to the nodes that represent the non-terminals on the right hand side of the production. If there are terminals that carry information, such as numbers or ids, then you will need pointers to those nodes as well. Productions should be subclasses of their left hand side nonterminal because they expand that nonterminal and therefore represent one of the potential subtrees rooted at that nonterminal.

As an example of how you might construct a parse tree, consider the following grammar:

Pgm -> Exp
Exp -> number | Exp + Exp | Exp - Exp
The nonterminals are Pgm and Exp, so we need abstract classes for these two nonterminals:
abstract class Exp {}
abstract class Pgm {}
The terminal number carries a value, so it needs a class that stores that value. Because Exp -> number is itself a production of Exp, the class is declared as a subclass of Exp:
class Number extends Exp {
    int number; 

    public Number(Integer num) {
       number = num.intValue();
    }
}
Pgm has only one production so it has one subclass:
class PgmExpression extends Pgm {
    Exp child;
    public PgmExpression(Exp e) { child = e; }
}
Note that this class has a pointer to an expression because the parse tree rooted at Pgm will expand to an Exp node. Also note that the Exp node represents an abstract class and hence any production subclass of Exp can be passed as the child node.

Next we define the remaining two subclass productions for Exp:

class MinusExp extends Exp {
    Exp child1;
    Exp child2;

  public MinusExp(Exp left, Exp right) {
    child1 = left;
    child2 = right;
  }
}

class PlusExp extends Exp {
    Exp child1;
    Exp child2;

  public PlusExp(Exp left, Exp right) {
    child1 = left;
    child2 = right;
  }
}
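
To make the shape of these trees concrete, here is a minimal sketch that builds the tree for 1 + 2 - 3 by hand and prints it back with explicit parentheses. The node classes are repeated from above so the sketch compiles on its own. The TreeDemo class and its show and sample methods are hypothetical helpers introduced only for this illustration, and the sketch assumes the expression groups to the left.

```java
// Node classes as declared in the text, repeated so this sketch compiles alone.
abstract class Exp {}

class Number extends Exp {
    int number;
    public Number(Integer num) { number = num.intValue(); }
}

class PlusExp extends Exp {
    Exp child1, child2;
    public PlusExp(Exp left, Exp right) { child1 = left; child2 = right; }
}

class MinusExp extends Exp {
    Exp child1, child2;
    public MinusExp(Exp left, Exp right) { child1 = left; child2 = right; }
}

// Hypothetical helper class, not part of the original notes.
class TreeDemo {
    // Walks the tree recursively and renders it as a fully parenthesized string.
    static String show(Exp e) {
        if (e instanceof Number) {
            return Integer.toString(((Number) e).number);
        } else if (e instanceof PlusExp) {
            PlusExp p = (PlusExp) e;
            return "(" + show(p.child1) + " + " + show(p.child2) + ")";
        } else if (e instanceof MinusExp) {
            MinusExp m = (MinusExp) e;
            return "(" + show(m.child1) + " - " + show(m.child2) + ")";
        }
        throw new IllegalArgumentException("unknown node type");
    }

    // Tree for "1 + 2 - 3", assuming the operators group to the left.
    static Exp sample() {
        return new MinusExp(new PlusExp(new Number(1), new Number(2)),
                            new Number(3));
    }

    public static void main(String[] args) {
        System.out.println(show(sample())); // prints ((1 + 2) - 3)
    }
}
```

Note that the operator itself is never stored: show recovers the plus or minus sign purely from the class of the node, which is the point made earlier about terminals that have only one possible value.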


Building Parse Trees in JCUP

Now suppose that we want to use JCUP to build a parse tree for strings that can be generated using this grammar. Here is how the JCUP specification might be written:

parser code {:
    public static void main(String args[]) throws Exception {
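          // parse() returns a Symbol; its value field holds the RESULT of the
          // start production, which is the Pgm object at the root of the parse tree
          Pgm root = (Pgm) new parser(new Yylex(System.in)).parse().value;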
    }
:}

terminal ADD, MINUS;
terminal Integer NUM;
non terminal Exp expr;
non terminal Pgm pgm;

// associativity declarations for ADD, MINUS
precedence left ADD, MINUS;

pgm ::= expr:e {: RESULT = new PgmExpression(e); :};

expr ::= NUM:n {: RESULT = new Number(n); :}
      |  expr:l ADD expr:r {: RESULT = new PlusExp(l, r); :}
      |  expr:l MINUS expr:r {: RESULT = new MinusExp(l, r); :};
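
It can help to see the order in which the actions in this specification would run. The following is a rough hand simulation for the input 1 + 2 - 3, assuming the left-associativity declaration resolves the conflict by reducing the addition before MINUS is shifted. It does not use the generated parser or the CUP runtime at all: the ReductionTrace class and its value stack are hypothetical stand-ins, and each push or pop mirrors one action from the specification.

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Node classes as declared in the text, repeated so this sketch compiles alone.
abstract class Pgm {}
abstract class Exp {}

class Number extends Exp {
    int number;
    public Number(Integer num) { number = num.intValue(); }
}

class PlusExp extends Exp {
    Exp child1, child2;
    public PlusExp(Exp left, Exp right) { child1 = left; child2 = right; }
}

class MinusExp extends Exp {
    Exp child1, child2;
    public MinusExp(Exp left, Exp right) { child1 = left; child2 = right; }
}

class PgmExpression extends Pgm {
    Exp child;
    public PgmExpression(Exp e) { child = e; }
}

// Hypothetical hand simulation; the real stack lives inside the generated parser.
class ReductionTrace {
    static Pgm parseOnePlusTwoMinusThree() {
        Deque<Exp> values = new ArrayDeque<>();

        values.push(new Number(1));        // expr ::= NUM:n   (n = 1)
        values.push(new Number(2));        // expr ::= NUM:n   (n = 2)

        // expr ::= expr:l ADD expr:r, reduced before MINUS is shifted
        Exp r = values.pop();
        Exp l = values.pop();
        values.push(new PlusExp(l, r));

        values.push(new Number(3));        // expr ::= NUM:n   (n = 3)

        // expr ::= expr:l MINUS expr:r
        r = values.pop();
        l = values.pop();
        values.push(new MinusExp(l, r));

        // pgm ::= expr:e
        return new PgmExpression(values.pop());
    }

    public static void main(String[] args) {
        PgmExpression root = (PgmExpression) parseOnePlusTwoMinusThree();
        MinusExp top = (MinusExp) root.child;
        // The addition ends up as the left child of the subtraction.
        System.out.println(top.child1 instanceof PlusExp); // prints true
    }
}
```

Each RESULT in the specification corresponds to one value pushed here, and the labels l, r, n, and e correspond to the values popped for the symbols on the right hand side of the production being reduced.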