ParserGen

CpS 450 Notes

Parser Generation with ANTLR

In addition to generating scanners, ANTLR is also able to generate a parser component, given a language grammar written in an EBNF-like syntax.

Let's begin with a small grammar:

<exp> = number 
        | <add>;

<add> = l_par <exp> plus <exp> r_par;

An ANTLR specification would look much like this. Begin by removing < > brackets from around nonterminals and capitalizing tokens:

exp : NUMBER 
    | add;

add : L_PAR exp PLUS exp R_PAR;

The token names must, of course, be defined in the Tokens section of the ANTLR specification file. Note that tokens must begin with a capital letter (I suggest using all caps for clarity), and nonterminals must begin with lowercase.

EBNF Features

ANTLR supports some EBNF capabilities through the use of the *, +, and ? operators (note: the standard EBNF notations [ ] and { } are not available).

Use * to specify 0 or more repetitions of an element (+ for 1 or more repetitions):

id_list : ID id_list_tail* ; id_list_tail = ‘,’ ID ;

This would generate a single id, or id comma id, or id comma id comma id, and so on. This would be similar to writing

id_list ::= id { id_list_tail }

in an EBNF notation.
Use ? to specify optionality (0 or 1 occurrence of the element):

function_decl : ‘function’ ID L_PAR arg_list? R_PAR; Here, the arg_list is marked optional. This would be similar to writing

function_decl ::=function id l_par [arg_list ]r_parin an EBNF notation.

ANTLR and Recursive Descent Parsing

ANTLR generates recursive descent parsers. It handles left recursion and common prefixes by refactoring grammars that contain these.

Grammar Ambiguity

Unlike most parser generators, ANTLR allows ambiguity in its grammars. It resolves the ambiguity using the first matching alternative. This can lead to incorrect syntax errors.

To help troubleshoot faulty syntax errors, review the stack trace, which reveals the path through the grammar taken by the parser.

Generated Code

ANTLR creates a recursive descent parser in a class named GrammarNameParser that includes one parsing method for each nonterminal in the grammar.