The Antlr Parsing Tool


Introduction

Antlr is a freely available software tool developed by Terence Parr to assist with the development of translators and compilers. Specifically, it allows a user to provide a lexical and syntactic description of a language using extended BNF grammar notation, and it then generates a top-down, recursive descent parser that recognizes the language. The allowable set of grammars is LL(*), which means that the generated parser can theoretically look ahead an arbitrary number of tokens to determine which production to select next. Antlr does not accept every grammar that the Yacc/Lex family of tools, which support LALR grammars, can handle (left-recursive rules, for example, must be rewritten), but it is powerful enough to express most languages of interest. It also provides a number of features/hacks that allow you to express some language features that would ordinarily require either an LALR grammar or a context-sensitive grammar.
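
To give a feel for what "top-down, recursive descent" means, here is a small hand-written sketch in Java. It is not code generated by Antlr, and the grammar, the class name RecursiveDescentSketch, and the method names are all hypothetical. It only illustrates the general shape of such a parser: one method per grammar rule, with each method peeking at the next token to decide which alternative to follow.

```java
import java.util.ArrayList;
import java.util.List;

// Hand-written sketch (NOT Antlr output) of a recursive descent parser
// for this hypothetical grammar:
//   expr   : term (('+' | '-') term)* ;
//   term   : factor (('*' | '/') factor)* ;
//   factor : INT | '(' expr ')' ;
class RecursiveDescentSketch {
    private final List<String> tokens;
    private int pos = 0;

    RecursiveDescentSketch(String input) {
        tokens = tokenize(input);
    }

    // A tiny lexer: splits the input into integer and one-character tokens.
    static List<String> tokenize(String input) {
        List<String> result = new ArrayList<>();
        int i = 0;
        while (i < input.length()) {
            char c = input.charAt(i);
            if (Character.isWhitespace(c)) {
                i++;
            } else if (Character.isDigit(c)) {
                int start = i;
                while (i < input.length() && Character.isDigit(input.charAt(i))) {
                    i++;
                }
                result.add(input.substring(start, i));
            } else if ("+-*/()".indexOf(c) >= 0) {
                result.add(String.valueOf(c));
                i++;
            } else {
                throw new IllegalArgumentException("unexpected character: " + c);
            }
        }
        return result;
    }

    // One token of lookahead; the empty string marks the end of the input.
    private String peek() {
        return pos < tokens.size() ? tokens.get(pos) : "";
    }

    // Consume and return the current token.
    private String next() {
        String t = peek();
        pos++;
        return t;
    }

    // expr : term (('+' | '-') term)* ;
    int expr() {
        int value = term();
        while (peek().equals("+") || peek().equals("-")) {
            String op = next();
            int right = term();
            value = op.equals("+") ? value + right : value - right;
        }
        return value;
    }

    // term : factor (('*' | '/') factor)* ;
    int term() {
        int value = factor();
        while (peek().equals("*") || peek().equals("/")) {
            String op = next();
            int right = factor();
            value = op.equals("*") ? value * right : value / right;
        }
        return value;
    }

    // factor : INT | '(' expr ')' ;
    int factor() {
        String t = next();
        if (t.equals("(")) {
            int value = expr();
            if (!next().equals(")")) {
                throw new IllegalArgumentException("expected ')'");
            }
            return value;
        }
        if (!t.isEmpty() && Character.isDigit(t.charAt(0))) {
            return Integer.parseInt(t);
        }
        throw new IllegalArgumentException("unexpected token: '" + t + "'");
    }

    // Parse a complete input string starting from the start rule, expr.
    static int evaluate(String input) {
        RecursiveDescentSketch parser = new RecursiveDescentSketch(input);
        int value = parser.expr();
        if (!parser.peek().isEmpty()) {
            throw new IllegalArgumentException("trailing input: " + parser.peek());
        }
        return value;
    }

    public static void main(String[] args) {
        System.out.println(evaluate("2 + 3 * (4 - 1)")); // prints 11
    }
}
```

This sketch only ever needs one token of lookahead. An LL(*) parser may scan further ahead when a single token is not enough to pick an alternative, but the one-method-per-rule structure is the same idea.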


Installation

To install Antlr on your computer, go to http://www.antlr.org and open the downloads page. From there you will want to download two software packages:

  1. Complete ANTLR 3.2 jar: Contains all the tools and the runtime environment.
  2. AntlrWorks Version 1.3: This is an IDE for Antlr and is optional. However, I strongly recommend it, as it provides a nice editor and debugger for creating grammars.


Useful Antlr Links

Here are some Antlr articles that will help you get started with Antlr and AntlrWorks:

I would not suggest getting the reference book for Antlr. It can be confusing for beginners. If you start to use Antlr on a regular basis in the future, then it is worth buying.


Developing an Antlr Parser

The way I go about developing an Antlr parser is as follows:

  1. I start by using AntlrWorks and I typically copy a pre-existing file and then start modifying it.
  2. I first perfect my grammar, and then I incrementally add actions (actions are code that gets performed as the elements in a production are recognized, such as storing an id in a hash table or retrieving the value of an id from a hash table).
  3. To compile the grammar, you go to the Generate menu and select the Generate_Code command. If Antlr succeeds in generating a parser, AntlrWorks pops up a dialog box announcing success and stating where it stored the parser files. Otherwise it pops up a dialog box with a diagnostic message.
  4. To test your grammar, go to the Run menu and select either the Run or Debug command. Both options will pop up a dialog box that asks for input. Once you enter the input, AntlrWorks will run the parser over it. By selecting the Debugger button at the bottom of the AntlrWorks window you can take a look at the generated parse tree, the input you provided, or the output that your parser generated. Antlr may not print error messages if your input is invalid, but you can determine whether there was an input error by looking at the parse tree: it will contain nodes that indicate that an error occurred.
  5. Tips and troubleshooting hints for working with AntlrWorks windows:

    1. Viewing large parse trees: The parse tree window has an upward sloping arrow in the upper right corner that allows you to convert it to an independent window that can be re-sized.
    2. Stepping through the input: The debugger allows you to step through the input tokens one by one and see what path the parser is taking through your grammar rules. It can help you see where the parser is getting "stuck" when you are perfecting your grammar.
    3. AntlrWorks "freezing" during debugging: When you select the debug option, you cannot run a new test case without first stopping the debugger. You can stop the debugger by clicking on the square icon just above the input window.
    4. Menu bar disappearing: If the menu bar disappears in AntlrWorks, leaving only the AntlrWorks menu, click the AntlrWorks option and select acknowledgements. The menu bar should return.
    5. Viewing a grammar production visually: Use the syntax diagram button and select a production to view it visually, as a finite state automaton.
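
The hash table actions mentioned in step 2 can be sketched in plain Java as follows. This is only an illustration of what such actions usually do. The class ActionSketch, its method names, and the productions named in the comments are hypothetical, and in a real grammar the bodies of these methods would be written inside the productions themselves rather than in a separate class.

```java
import java.util.HashMap;
import java.util.Map;

// Sketch of the kind of work grammar actions typically perform.
// All names here are hypothetical, chosen only for illustration.
class ActionSketch {
    // The hash table shared by the actions: maps each id to its value.
    private final Map<String, Integer> memory = new HashMap<>();

    // What an action on an assignment production (say, ID '=' INT)
    // might do once the id and the value have been recognized.
    void onAssignment(String id, int value) {
        memory.put(id, value);
    }

    // What an action on a production that references an id might do:
    // retrieve the stored value, or report an undefined variable.
    int onIdReference(String id) {
        Integer value = memory.get(id);
        if (value == null) {
            throw new IllegalStateException("undefined variable: " + id);
        }
        return value;
    }

    public static void main(String[] args) {
        ActionSketch actions = new ActionSketch();
        actions.onAssignment("x", 42);                  // as if "x = 42" was parsed
        System.out.println(actions.onIdReference("x")); // prints 42
    }
}
```

Keeping the grammar free of actions until it parses correctly, as suggested in step 2, means that a failure at this stage can be traced to the action code rather than to the grammar.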


Using Antlr from the Command Line

Unfortunately, AntlrWorks generates code that includes debugging code, and this code does not always work from the command line. This means you will have to do a number of things to get your grammar to work from the command line:

  1. You will need to create your own test driver. Here is a sample one for a grammar named Expr.g:
    import org.antlr.runtime.*;
    
    public class Test {
        public static void main(String[] args) throws Exception {
            // read input from stdin
            ANTLRInputStream input = new ANTLRInputStream(System.in);
            // have the lexer read from the input stream
            ExprLexer lexer = new ExprLexer(input);
            // have the lexer create a stream of tokens
            CommonTokenStream tokens = new CommonTokenStream(lexer);
            // create a parser that reads the stream of tokens as input 
            ExprParser parser = new ExprParser(tokens);
            // invoke the parser by calling the function associated with
            // the start symbol 
            parser.prog();
        }
    }
           
    There are a few things to note about this test file:

    1. For a grammar named X.g, Antlr generates a lexer class named XLexer and a parser class named XParser. You will need to modify your test file to reflect the name of the grammar that you use.
    2. The call that invokes the parser on the last line of the test file, parser.prog(), must match the name of the starting non-terminal in the grammar. In this case I am assuming that the name of the start non-terminal is prog. You should change the name of this invoking method to match the name of the start non-terminal in your grammar.

  2. You will need to recompile your grammar using antlr3 (this is an alias to Antlr's org.antlr.Tool tool). For example, if your grammar is named Expr.g, then your command will look like:
    antlr3 Expr.g

  3. To run your parser from the command line, you will first need to compile the files and then run your test driver:
    javac -cp .:..:/usr/share/java/antlr3.jar *.java
    java -cp .:..:/usr/share/java/antlr3.jar Test < inputfile