In addition to generating scanners, ANTLR is also able to generate a parser component, given a language grammar written in an EBNF-like syntax.
Let's begin with a small grammar:
<exp> = number | <add>; <add> = l_par <exp> plus <exp> r_par;
An ANTLR specification would look much like this. Begin by removing < > brackets from around nonterminals and capitalizing tokens:
exp : NUMBER | add; add : L_PAR exp PLUS exp R_PAR;
The token names must, of course, be defined in the Tokens section of the ANTLR specification file. Note that tokens must begin with a capital letter (I suggest using all caps for clarity), and nonterminals must begin with lowercase.
ANTLR supports some EBNF capabilities through the use of the *, +, and ? operators (note: the standard EBNF notations [ ] and { } are not available).
Use * to specify 0 or more
repetitions of an element (+ for 1 or more repetitions):
id_list
: ID id_list_tail* ;
id_list_tail = ‘,’ ID ;
This
would generate a single id, or id comma id, or id comma id comma id,
and so on. This would be similar to writing
id_list
::= id {
id_list_tail }
in
an EBNF notation.
Use ? to specify optionality (0 or 1 occurrence of the
element):
function_decl : ‘function’
ID L_PAR arg_list? R_PAR;
Here, the arg_list
is marked optional. This would be similar to
writing
function_decl ::=
function id l_par [
arg_list ]
r_par
in
an EBNF notation.
ANTLR generates recursive descent parsers. It handles left recursion and common prefixes by refactoring grammars that contain these.
Unlike most parser generators, ANTLR allows ambiguity in its grammars. It resolves the ambiguity using the first matching alternative. This can lead to incorrect syntax errors.
To help troubleshoot faulty syntax errors, review the stack trace, which reveals the path through the grammar taken by the parser.
ANTLR creates a recursive descent parser in a class named GrammarNameParser that includes one parsing method for each nonterminal in the grammar.