In order for a recursive descent parse to proceed when there are multiple alternatives for a given LHS, the parser must be able to use the current input token to predict which production to use to expand a nonterminal. This requires the parser to know First and Follow sets.
First(V) is the set of tokens that can begin a sentence derivable from V.
If V is a terminal or ε , the set is { V }
If V is a nonterminal, the set must be computed using an algorithm discussed below
Follow(V) is the set of tokens that can follow a nonterminal V in a sentential form.
Example from this grammar:
First(stmt) = {read, write, id}
Follow(stmt) = { ; }
Two cases to consider.
The grammar contains no nullable productions
The grammar contains nullable productions
Note that, in general, it's impossible to compute the First set for an isolated nonterminal; we usually must compute the First set for all nonterminals.
Here's a description of the more complicated case, to compute First(A), where A is a nonterminal:
For each production A ::= V1 V2 V3 ... Vm
One at a time, from left to right, add the First sets of each vocabulary symbol on the RHS to First(A), leaving out ε if it isn't already in First(A), until you get to a nonnullable vocabulary symbol (doesn't have ε in its First set). If all the vocabulary symbols on the RHS have ε in their First sets, or the RHS is empty (ε ), put ε in First(A)
Iterate over this algorithm until no First sets change.
Example Exercise: Compute First(this grammar)
|
First |
start |
read, write, id |
stmt_list |
read, write, id |
stmt_tail |
read, write, id, ε |
stmt |
read, write, id |
expr |
intlit, id |
Example Exercise: Compute First of the following grammar:
1. |
A |
::= B C D |
2. |
B |
::= c E b |
3. |
C |
::= a E D |
5. |
D |
::= a D |
7. |
E |
::= B b D |
|
First |
A |
c a ε |
B |
c ε |
C |
a c ε |
D |
a ε |
E |
c, b |
When constructing an LLPT, we must compute the set of symbols that can begin a string derivable from the right-hand side of each production. For arbitrary LL(1) grammars, this can be done only after the First( ) sets are computed for all of the nonterminals.
See p. 113 (Step 5) for the technique.
Need to know what terminal symbols can follow a nonterminal Vn in some sentential form
Note that in practice, for programming language grammars, ε is never in Follow(Vn), because Follow(Start) is $ (EOF).
To compute the Follow sets for all nonterminals in a grammar,
Follow[Start] = {$}
While there are changes,
For each production,
For each nonterminal Vn on the RHS,
Compute the First(rest of symbols past Vn on RHS) and add it to Follow[Vn] (except for ε if it appears)
If ε appears in First(rest of symbols past Vn on RHS), add Follow[LHS] to Follow[Vn]
Example Exercise: Compute Follow(this grammar)
|
Follow |
start |
$ |
stmt_list |
$ |
stmt_tail |
$ |
stmt |
; |
expr |
) ; |
Example Exercise: Compute Follow(this grammar)
|
Follow |
A |
$ |
B |
a b c $ |
C |
a $ |
D |
a b $ |
E |
a b $ |
Important to know which nonterminals are nullable (can derive ε)
Theorem: A is nullable iff ε ∈ First(A)