delimiters -- whitespace and look ahead - new
tokens
Principle of longest substring
xtemp=ytemp
Syntactic Analysis and BNF
The legal organization of tokens
into
statements are described by a context
free grammar (CFG).
BNF (Backus-Naur Form) is a
meta syntax for expressing CFGs. Usually used
as the notation for a programming
language's grammar.
Set of productions (also
called rules)
<expr> ::= <expr> +
<term>
|
<term>
Terminal symbols
if + begin
Non terminal symbols
<statement> <compilation-unit>
Example
- Grammar for a Small Language
This is an example of a
grammar that generates expressions with
infix
operators.
Expression consist of operands and
operators.
<program> ::=
begin
<stmt_list> end <stmt_list> ::=
<stmt>
| <stmt> ;
<stmt_list> <stmt> ::=
<var> :=
<expression> <var> ::=
A
| B
| C <expression> ::=
<var> +
<var>
|
<var> -
<var>
| <var>
Derivation
of
begin A := A + B end
<program>
=> begin <stmt_list>
end
=>
begin <stmt>
end =>
begin <var> := <expression>end =>
begin A := <expression>end =>
begin A := <var>
+ <var>
end =>
begin A := A + <var>
end =>
begin A := A + B end
This is a leftmost
derivation.
Example of top down
parsing.
Each string in the
derivation is called a sentential form.
Example from Sebesta 96
Parse
Tree is a Visualization of a Derivation
show
the analysis
Abstract
Syntax Tree
Can be produced directly by
a Parser.
Shows only result
Eliminates redundant
information.
Each node's degree depends
on the arityof
the
operator.
Aside:
Traversing an expression tree in preorder
yields a prefix
expression.
Traversing an expression tree in postorder
yields a postfix
expression.
Traversing an expression tree inorder
yields an infix
expression,
but every subexpression must be parenthesized.
A Rule
for
<if-statement>
<stmt> ::=
…|<if_stmt> <if_stmt> ::=
if <logic_expr>
then <stmt>
| if <logic_expr>
then <stmt>
else <stmt>
What
is the parse tree for the sentential form:
if
<logic_expr> then if
<logic_expr> then <stmt> else
<stmt> ?
When there exist
two different parse tree for the same
sentential form the grammar is ambiguous
Sometimes languages are
ambiguous
Why is this a problem?
How can this problem be
handled?
C handles the ambiguity by
adding a disambiguating rule:
each else
matches the closest unmatched if
Modula-2 removes the
ambiguity by changing the grammar:
<stmt>
::= if
<logic_expr> then
<stmt>end
| if
<logic_expr> then
<stmt> else
<stmt> end
There are 2 different
derivations that derive different strings
<stmt> => if
<logic_expr1) then <stmt>end
=> if
<logic_expr1) then if
<logic-expr2>
then
<stmt1>
else
<stmt2> endend
VS
<stmt> => if
<logic_expr1) then <stmt>
else <stmt2>
end
=> if
<logic_expr1) then if
<logic-expr2>
else
<stmt1> end
<stmt2>end
A
more general solution:
<stmt> ::=<matched>
|
<unmatched> <matched> ::=
if <logic_expr> then
<matched> else <matched>
|
any non-if statement <unmatched> ::=if
<logic_expr> then
<stmt>
|
if <logic_expr> then <matched> else
<unmatched>
There is only one
possible parse tree. Draw it for
yourself.
Aside: Some languages are inherently
ambiguous. i.e.
All grammars for the language are ambiguous.
{anbncmdm
: n,m >
0 } union {anbmcmdn:n,m >
0
}
Operators
What is an operator?
Operator are functions
with special symbols i.e.
+ & ^
Can be
part of the grammar : C
defined in a library : Haskell
What should the order of symbols and
variables be for an operator?
Most languages use infix
notation
5 + 4
Forth uses postfix notation
5 4 +
Lisp uses prefix notation
+ 5 4
Infix notation
In infix
notation needs associativity,
precedence and
parenthesesto
resolve ambiguities.
Which operator does "b"
belong to?
a + b * c
Operator Precedence
Another
ambiguous grammar: G1.
E ::=
E +
E |
E * E |
C |
V |
(E)
C ::=
0 |
1
V ::=
a |
b |
c
a + b * c ??
Unambiguous
equivalent grammar: G2
E ::=
E + F
|
F
F ::=
F * G |
G
G ::=
C |
V |
( E )
C ::=
0 |
1
V ::=
a |
b |
c
a + b * c ??
What would you do to the
grammar to add the '-' and '\' operator?
Associativity of
Operators:
A
:= A + B + C
Don't confuse with
(semantic) associative rules in mathematics
Consider the following
grammar:
<assign> ::=
<id> :=
<expr>
<id>::=
A |
B |
C
<expr> ::=
<expr> +
<term> |
<term>
<term> ::=
<id>
A rule is left recursive if
LHS appears at the beginning of RHS.
Left recursive yields left
associativity
Right recursive rule yields right associativity.
Note: Prefix and postfix
notations do not produce ambiguous
expressions and therefore precedence and associativity are
unnecessary.
Evaluation
policies
Expressions are evaluated by
repeated application of rewrite rules
Each application of a
rewrite rule is called a reduction
An expression to which a
rewrite rule can be applied is a reducible
expression
(or redex)
Example : redex of
75/5 +
3*2
are
75/5 and 3*2
When an expression contains
two or more redexes, the choice of
the redex(es) to be reduced is governed by an evaluation
policy
leftmost
rightmost
parallel
innermost
outermost
Can be specified in the
language or left to the compiler writer