Fundamentals of Computer Science
SMU CSE 5311/7311
Fall 1997
Chapter 11: Grammars
Recursive Description of Patterns (11)
Context-Free Grammars
- A grammar is a recursive definition of a pattern
- More powerful than regular expressions/automata
- Used to specify the syntax of programming languages
Terminology
- Syntactic Category: A symbol that represents
the set of strings being defined.
- Metasymbol: Symbols that play special
roles in the construction
- Terminal: A string that will be
defined elsewhere or a simple character
- Production: A ``substitution'' rule comprised of
three parts
- Head: The syntactic category on the left
- --> : A metasymbol meaning ``can
be composed of''
- Body:: 0 or more syntactic categories or
terminals on the right
- Start Symbol: The syntactic category from which
to start generating strings
Languages from Grammars (11.3)
Languages
- For a syntactic category < S > the language of strings
in < S >, L(< S >) is formed by repeatedly using
the productions to generate terminals
- Can find by repeatedly making rounds, expanding the
productions in all possible ways
Parse Trees (11.4)
- Generation of a string s in L(< S >) is accomplished
by applying productions
- Can illustrate this in a parse tree
- Each interior node is a syntactic category
- Each leaf is a terminal
- A node and its children represent a production
- The leaf nodes read left-to-right represent the
string s. This is called the yield of the
tree
- Can be shown that the yields of parse trees for < S > is
exactly L(< S >)
Ambiguity (11.5)
Ambiguous Grammars
- It is possible to have grammars such that two
or more parse trees have the same yield
< B > --> (< B >) | < B > < B > | e
- These grammars are called ambiguous. Those
that do not have this property are called
unambiguous
- Ambiguous grammars can lead to problems. For example,
when used for expressions, can lead to wrong answers
- Sometimes grammars can be made unambiguous by
adding additional syntactic categories
Constructing Parse Trees (11.6)
Recursive Descent
- For certain grammars can automatically create program
to construct parse trees
- Many different ways to do this, but one simple (but
not the most powerful) technique is called
recursive descent
- Basic Structure:
- Create a procedure for each syntactic category
- Have ``lookahead'' symbol for input
- The procedure for each syntactic category attempts
to perform productions based on next input symbol
- This requires that next input symbol be unique for
all productions
Table-Driven Parsing (11.7)
- Can actually condense recursive descent into table
- As before, requires that production can be determined
from next input symbol
- Can perform left factoring on the grammar to
remove some ambiguities
Grammars vs. Regular Expressions (10.8)
Regular Expressions to Grammars
- Can show that for any regular expression an equivalent
grammar can be constructed
- Do this inductively -- show grammars for the operands and
each of the operators
Grammars to Regular Expressions
- Can show that the language for a given grammar
cannot be generated by a regular expression
< S > --> 0 < S > 1
< S > --> 01
- Proof relies on pigeonhole principle
Review: Chapter 10
- What does a node in a state machine represent?
- What is an arc labeled with?
- T or F: There can be only one accepting state.
- What is the main distinction between determinism and nondeterminism?
- How is ``acceptance'' handled in a nondeterministic automaton?
- T or F: Every deterministic automaton is nondeterministic.
- What is subset construction used for?
- What does one state in the DFSA represent in the NFSA when
using subset construction?
- What is simulation using a FSA entail?
- What are the basic operators for regular expressions?
- What the the operands?
- T or F: If R = (ab | ba)^* then the string ``abababa'' is
in L(R).
- What does the regular expression ``[a-m]+[0-9]*'' match?
- What does the regular expression ``a..b'' match?
- T or F: A DFSA can have epsilon transitions.
Note: These notes are based on the text
Foundations of Computer Science, C Edition
A. Aho and J. Ullman
W.H. Freeman and Company, 1995
ISBN 1-7167-8284-7
and are copyright © 1998 Matthew Diaz, Southern Methodist University.
These notes may be freely copied and distributed so long as this
notice is included.