CpS 450 Notes

Parser Generation with ANTLR


In addition to generating scanners, ANTLR is also able to generate a parser component, given a language grammar written in an EBNF-like syntax.

Let's begin with a small grammar:

<exp> = number 
        | <add>;

<add> = l_par <exp> plus <exp> r_par;

 

An ANTLR specification would look much like this. Begin by removing < > brackets from around nonterminals and capitalizing tokens:

The token names must, of course, be defined in the Tokens section of the ANTLR specification file. Note that tokens must begin with a capital letter (I suggest using all caps for clarity), and nonterminals must begin with lowercase.

EBNF Features

ANTLR supports some EBNF capabilities through the use of the *, +, and ? operators (note: the standard EBNF notations [ ] and { } are not available).

ANTLR and Recursive Descent Parsing

ANTLR generates recursive descent parsers. It handles left recursion and common prefixes by refactoring grammars that contain these.

Grammar Ambiguity

Unlike most parser generators, ANTLR allows ambiguity in its grammars. It resolves the ambiguity using the first matching alternative. This can lead to incorrect syntax errors.

To help troubleshoot faulty syntax errors, review the stack trace, which reveals the path through the grammar taken by the parser.

Generated Code

ANTLR creates a recursive descent parser in a class named GrammarNameParser that includes one parsing method for each nonterminal in the grammar.