A parser determines whether a sequence of tokens is legal given a grammar
LL and LR parsing: The first L means token sequence is parsed left to right; the second letter means whether a leftmost or rightmost parse is produced. A number in parentheses represents the number of lookahead symbols used by the parser to make parsing choices; 1 is common.
We use a parse stack instead of the implicit recursive stack used in Recursive Descent parsing to keep track of the parse.
Parsing algorithm: (see example pp. 98-100)
To start, Stack.push(EOF) and Stack.push(start).
While stack is not empty,
If Stack.Top is a token, pop it and match it with the current input token (syntax error if match fails)
If Stack.Top is a nonterminal N, pop it off, select the production with N on the LHS (if more than one, see below for how to pick the right one) and push the RHS onto the stack, from right to left
1. |
start |
::= stmt_list |
2. |
stmt_list |
::= stmt ';' stmt_tail |
3. |
stmt_tail |
::= stmt_list |
5. |
stmt |
::= read '(' id ')' |
8. |
expr |
::= intlit |
LL Parse:
Stack |
Input |
Action |
$ start |
read(a); b := 5; $ |
Apply Production #1 |
$ stmt_list |
read(a); b := 5; $ |
Apply Production #2 |
$ stmt_tail ';' stmt |
read(a); b := 5; $ |
Apply Production #5 |
$ stmt_tail ';' ')' id '(' read |
read(a); b := 5; $ |
Match(read) |
$ stmt_tail ';' ')' id '(' |
(a); b := 5; $ |
match '(' |
$ stmt_tail ';' ')' id |
a); b := 5; $ |
match id |
$ stmt_tail ';' ')' |
); b := 5; $ |
match ) |
$ stmt_tail ';' |
; b := 5; $ |
match ; |
$ stmt_tail |
b := 5; $ |
Apply production #3 |
$ stmt_list |
b := 5; $ |
Apply production #2 |
$ stmt_tail ';' stmt |
b := 5; $ |
Apply production #7 |
$ stmt_tail ';' expr ':=' id |
b := 5; $ |
match b |
$ stmt_tail ';' expr ':=' |
:= 5; $ |
match := |
$ stmt_tail ';' expr |
5; $ |
Apply production #8 |
$ stmt_tail ';' intlit |
5; $ |
match intlit |
$ stmt_tail ';' |
; $ |
match ; |
$ stmt_tail |
$ |
Apply production #4 |
$ |
$ |
match EOF |
(Basically the same as a Pushdown Machine Transition Table, omitting rows indexed by terminals for brevity)
An LL(1) parse table is used to determine which production is selected in step 2(b) of the LL(1) algorithm above. The rows are indexed by nonterminals, and the columns by terminals. At each row N and column T is a single production number X, indicating that when N is at the top of the stack and T is the current input token, production X should be selected.
For a simple grammar (def. p. 97), construct the Parse Table as shown on pp. 99-100, skipping step 3 (since we omit rows indexed with terminals for brevity).
More generally, the LL(1) parse table LLPT is constructed using the First and Follow sets:
For each nonterminal A and production A → V*
For each token a in First(V*), add that production to LLPT[A, a]
If First(V*) contains ε, for each element b in Follow(A), add that production to LLPT[A, b]
Note that, if there is more than one production in a cell in the table, the grammar is not LL(1).
A correct, leftmost parse is guaranteed
All grammars in the LL(1) class are unambiguous
All LL(1) parsers operate in linear time and at most linear space