CpS 450 Lecture Notes

Syntactic Analysis


A parser determines whether a sequence of tokens is legal given a grammar

Four classes of grammars:

Context-Free Grammars

A context-free grammar consists of

Example:

 

start

::= exp

 

exp
exp
exp

::= exp op exp
::= '(' exp ')'
::= number

 

op
op
op

::= '+'
::= '-'
::= '*'

A language is a set of legal token strings which can be derived in zero or more steps from the start symbol of a grammar. Note symbolic definition: L(G) = { s | start Þ * s }

A derivation involves a series of replacements of nonterminals by their grammar definitions. A derivation step is one such replacement. See p. 74 (bottom).

A parse of a token string involves performing a derivation beginning with the start symbol that results in the input token string (if possible).

Repetition

Ex: L({E ::= E + a | a}) => {a, a+a, a+a+a, ...}

General cases:

This shows context-free grammars can express everything regular expressions can, and more (ex: nested parentheses)

Practice: Write a grammar that matches a comma-delimited list of argument declarations type1 name1, type2 name2, ..., such as might appear in a function definition.

Epsilon Productions (aka "Poof Productions")

It's possible for a production to have 0 symbols on the RHS. This allows for optional areas in the grammar without having to resort to common prefixes. Consider two if-then grammars:

 

stmt

::= if-stmt

 

if-stmt

::= if '(' exp ')' stmt
::= if '(' exp ')' stmt else stmt

 

exp

::= 0
::= 1

Using Poof Productions:

 

stmt

::= if-stmt

 

if-stmt

::= if '(' exp ')' stmt if-trailer

 

if-trailer

::= else stmt
::=

 

exp

::= 0
::= 1

 

Parse Trees

In order for translation to occur, we need to know more than just the answer to "is the input string in the language?" Need to know the structure of the string.

A parse tree is constructed as follows:

Note that a parse tree contains more information than is actually needed to perform translation. An abstract syntax tree (sometimes just called "syntax tree") is a parse tree with nonessential info removed.

Note: Your author uses the term "derivation tree" instead of "parse tree". I prefer "parse tree" to avoid confusing with "derivation".

Ambiguity

Ambiguous grammars permit the construction of multiple parse trees from the same input.

This results when, given a derivation order (leftmost or rightmost), there is more than one rule that can be used to expand a given nonterminal. See p. 74.