A context-free grammar (CFG) is a set of recursive rewriting rules (or productions) used to generate patterns of strings. A CFG consists of the following components:
a set of terminal symbols, which are the characters of the alphabet that appear in the strings generated by the grammar;
a set of nonterminal symbols, which are placeholders for the patterns of terminal symbols that they can generate;
a set of productions, which are rules for replacing (or rewriting) nonterminal symbols (on the left side of the production) in a string with other nonterminal or terminal symbols (on the right side of the production);
a start symbol, which is a special nonterminal symbol that appears in the initial string generated by the grammar.
To generate a string of terminal symbols from a CFG, we:
1. Begin with a string consisting of the start symbol;
2. Apply one of the productions with the start symbol on the left-hand side, replacing the start symbol with the right-hand side of the production;
3. Repeat the process of selecting nonterminal symbols in the string, and replacing them with the right-hand side of some corresponding production, until all nonterminals have been replaced by terminal symbols.
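The generation procedure can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the toy grammar S --> a S b | epsilon, the dictionary representation, and the names GRAMMAR and generate are hypothetical and do not come from the text.

```python
import random

# Hypothetical toy grammar (not from the text): S --> a S b | epsilon.
# Nonterminals are the dictionary keys; every other symbol is a terminal.
GRAMMAR = {"S": [["a", "S", "b"], []]}


def generate(grammar, start, rng, max_steps=1000):
    """Rewrite nonterminals until only terminal symbols remain."""
    form = [start]  # begin with the string consisting of the start symbol
    for _ in range(max_steps):
        positions = [i for i, sym in enumerate(form) if sym in grammar]
        if not positions:  # no nonterminals left: the derivation is complete
            return "".join(form)
        i = rng.choice(positions)           # select a nonterminal
        rhs = rng.choice(grammar[form[i]])  # select one of its productions
        form[i:i + 1] = rhs                 # replace it with the right-hand side
    raise RuntimeError("derivation did not finish within max_steps")


rng = random.Random(0)
samples = [generate(GRAMMAR, "S", rng) for _ in range(5)]
print(samples)
```

Every sample consists of some number of a's followed by the same number of b's. The max_steps guard is there because a random choice of productions is not guaranteed to terminate for an arbitrary grammar.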
As an example, consider the following CFG for arithmetic expressions:
1. <expression> --> number
2. <expression> --> ( <expression> )
3. <expression> --> <expression> + <expression>
4. <expression> --> <expression> - <expression>
5. <expression> --> <expression> * <expression>
6. <expression> --> <expression> / <expression>
The only nonterminal symbol in this grammar is <expression>, which is also the start symbol. The terminal symbols are {+,-,*,/,(,),number}. (We will interpret "number" to represent any valid number.) The first rule (or production) states that an <expression> can be rewritten as (or replaced by) a number. In other words, a number is a valid expression. The second rule says that an <expression> enclosed in parentheses is also an <expression>. Note that this rule defines an expression in terms of expressions, an example of the use of recursion in the definition of context-free grammars. The remaining rules say that the sum, difference, product, or quotient of two <expression>s is also an <expression>.
We begin with the string consisting of the start symbol, <expression>. Using rule 5 we can choose to replace this nonterminal, producing the string:
<expression> * <expression>
We now have two nonterminals to replace. We can apply rule 3 to the first nonterminal, producing the string:
<expression> + <expression> * <expression>
We can apply rule 2 to the first nonterminal in this string to produce:
(<expression>) + <expression> * <expression>
If we apply rule 1 to the remaining nonterminals (the recursion must end somewhere!), we get:
(number) + number * number
converted by Web2PDFConvert.com
This is a valid arithmetic expression, as generated by the grammar. When applying the rules above, we often face a choice as to which production to choose. Different choices will typically result in different strings being generated. Given a grammar G with start symbol S, if there is some sequence of productions that, when applied to the initial string S, results in the string s, then s is in L(G), the language of the grammar.
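The derivation above can be replayed mechanically. The sketch below assumes the rules are numbered 1 to 6 in the order described (number, parentheses, then +, -, *, /). The text itself only uses rules 1, 2, 3 and 5, so the numbering of rules 4 and 6 is inferred, and the names RULES and apply_rule are illustrative.

```python
# Productions of the arithmetic grammar, numbered as in the text
# (rules 4 and 6 are inferred from the order "sum, difference, product, division").
E = "<expression>"
RULES = {
    1: ["number"],
    2: ["(", E, ")"],
    3: [E, "+", E],
    4: [E, "-", E],
    5: [E, "*", E],
    6: [E, "/", E],
}


def apply_rule(form, rule, which=0):
    """Replace the `which`-th nonterminal (counting from 0) using `rule`."""
    positions = [i for i, sym in enumerate(form) if sym == E]
    i = positions[which]
    return form[:i] + RULES[rule] + form[i + 1:]


form = [E]
form = apply_rule(form, 5)  # <expression> * <expression>
form = apply_rule(form, 3)  # <expression> + <expression> * <expression>
form = apply_rule(form, 2)  # ( <expression> ) + <expression> * <expression>
while E in form:            # rule 1 for every remaining nonterminal
    form = apply_rule(form, 1)
print(" ".join(form))       # ( number ) + number * number
```

Choosing a different rule or a different nonterminal at any step yields a different string of L(G), which is the choice the paragraph above describes.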
As another example, consider a grammar for strings of balanced parentheses, with a single nonterminal P and the productions P --> epsilon, P --> ( P ), and P --> P P. We begin with the string P. We can replace P with epsilon, in which case we have generated the empty string (which does have balanced parentheses). Alternatively, we can generate a string of balanced parentheses within a pair of balanced parentheses, which must result in a string of balanced parentheses. Alternatively, we can concatenate two strings of balanced parentheses, which again must result in a string of balanced parentheses. This grammar is equivalent to:
P --> ( P ) | P P | epsilon
We use the notational shorthand '|', which can be read as "or", to represent multiple rewriting rules within a single line.
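One way to see that P --> ( P ) | P P | epsilon generates exactly the balanced strings is to test membership directly from the three alternatives and compare the result with an ordinary depth counter. The following is a minimal sketch; the function names derives and balanced are illustrative.

```python
from functools import lru_cache
from itertools import product


@lru_cache(maxsize=None)
def derives(s):
    """True if P can derive s using P --> ( P ) | P P | epsilon."""
    if s == "":  # P --> epsilon
        return True
    if s[0] == "(" and s[-1] == ")" and derives(s[1:-1]):  # P --> ( P )
        return True
    # P --> P P, trying every split into two non-empty parts
    return any(derives(s[:k]) and derives(s[k:]) for k in range(1, len(s)))


def balanced(s):
    """Counter-based check, written independently of the grammar."""
    depth = 0
    for ch in s:
        depth += 1 if ch == "(" else -1
        if depth < 0:
            return False
    return depth == 0


# Compare both checks on every string of parentheses up to length 8.
for n in range(9):
    for chars in product("()", repeat=n):
        s = "".join(chars)
        assert derives(s) == balanced(s)

print(derives("(()())"), derives("())("))  # True False
```

Restricting the P P case to non-empty parts does not change the language, since a split with an empty part reproduces the same string, and it keeps the recursion finite.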
CFG Examples
A CFG describing strings of letters with the word "main" somewhere in the string:
<program> --> <letter*> m a i n <letter*>
<letter*> --> <letter> <letter*> | epsilon
<letter> --> A | B | ... | Z | a | b | ... | z
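This particular grammar describes a language simple enough to be checked with a regular expression. A minimal sketch, assuming <letter> covers exactly A-Z and a-z as listed above (the names PROGRAM and in_language are illustrative):

```python
import re

# <letter*> m a i n <letter*>, with <letter> ranging over A-Z and a-z.
PROGRAM = re.compile(r"[A-Za-z]*main[A-Za-z]*")


def in_language(s):
    """True if s is a string of letters containing the word "main"."""
    return PROGRAM.fullmatch(s) is not None


print(in_language("main"), in_language("remainder"),
      in_language("ma1n"), in_language("Main"))  # True True False False
```

The last case fails because the grammar spells the word with the lowercase terminals m a i n.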
0. Applying at most zero productions, we cannot generate any strings.
1. Applying at most one production (starting with the start symbol) we can generate {wcd<S>, b<L>e, s}. Only one of these strings consists entirely of terminal symbols, so the set of terminal strings we can generate using at most one production is {s}.
2. Applying at most two productions, we can generate all the strings we can generate with one production, plus any additional strings we can generate with an additional production.
{wcdwcd<S>, wcdb<L>e, wcds, b<S>e, b<L>;<S>e, s}
The set of terminal strings we can generate with at most two productions is therefore {s, wcds}.
3. Applying at most three productions, we can generate:
{wcdwcdwcd<S>, wcdwcdb<L>e, wcdwcds, wcdb<L>;<S>e, wcdb<S>e, bwcd<S>e, bb<L>ee, bse, b<L>;<S>;<S>e, b<S>;<S>e, b<L>;wcd<S>e, b<L>;b<L>ee, b<L>;se}
The set of terminal strings we can generate with at most three productions is therefore {s, wcds, wcdwcds, bse}. We can repeat this process for an arbitrary number of steps N, and find all the strings the grammar can generate by applying at most N productions.
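The step-by-step enumeration can be automated with a breadth-first expansion. The sketch below is written under an assumption: the productions are inferred from the sets listed above (<S> --> w c d <S> | b <L> e | s and <L> --> <L> ; <S> | <S>) rather than quoted from a grammar definition, and the names step and terminal_strings are illustrative.

```python
# Grammar inferred from the sets listed above (an assumption):
#   <S> --> w c d <S> | b <L> e | s
#   <L> --> <L> ; <S> | <S>
GRAMMAR = {
    "<S>": [["w", "c", "d", "<S>"], ["b", "<L>", "e"], ["s"]],
    "<L>": [["<L>", ";", "<S>"], ["<S>"]],
}


def step(forms):
    """All strings reachable from `forms` by applying exactly one production."""
    result = set()
    for form in forms:
        for i, sym in enumerate(form):
            for rhs in GRAMMAR.get(sym, []):
                result.add(form[:i] + tuple(rhs) + form[i + 1:])
    return result


def terminal_strings(max_productions):
    """Terminal strings derivable from <S> with at most `max_productions` steps."""
    frontier = {("<S>",)}
    found = set()
    for _ in range(max_productions):
        frontier = step(frontier)
        found |= {"".join(form) for form in frontier
                  if not any(sym in GRAMMAR for sym in form)}
    return found


for n in range(4):
    print(n, sorted(terminal_strings(n)))
# 0 []
# 1 ['s']
# 2 ['s', 'wcds']
# 3 ['bse', 's', 'wcds', 'wcdwcds']
```

The frontier grows quickly with N, so this brute-force listing is practical only for small N.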
If RE is null, don't add a production.
Assume the RE is R1R2. Add to G the production
<RE> --> <R1> <R2>
and create productions for regular expressions R1 and R2.
Assume the RE is R1 | R2. Add to G the production
<RE> --> <R1> | <R2>
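The construction rules can be sketched as a recursive function over a small regular-expression tree. Several things here are assumptions and not taken from the text: the tuple encoding of the tree, the scheme of naming each nonterminal after its subexpression, the operand rule <a> --> a, and in particular the rule for R*, written as <R*> --> <R> <R*> | epsilon, which is a standard choice that the rules above do not state.

```python
def show(node):
    """Text of a regular-expression node, used to name its nonterminal."""
    kind = node[0]
    if kind == "sym":
        return node[1]
    if kind == "cat":
        return show(node[1]) + show(node[2])
    if kind == "alt":
        return "(" + show(node[1]) + "|" + show(node[2]) + ")"
    inner = show(node[1])  # kind == "star"
    return (inner if node[1][0] != "cat" else "(" + inner + ")") + "*"


def name(node):
    return "<" + show(node) + ">"


def build(node, grammar):
    """Add productions for `node` and its subexpressions to `grammar`."""
    if node is None:  # null RE: don't add a production (top level only)
        return
    lhs = name(node)
    if lhs in grammar:
        return
    kind = node[0]
    if kind == "sym":     # assumed operand rule: <a> --> a
        grammar[lhs] = [[node[1]]]
    elif kind == "cat":   # <RE> --> <R1> <R2>
        grammar[lhs] = [[name(node[1]), name(node[2])]]
    elif kind == "alt":   # <RE> --> <R1> | <R2>
        grammar[lhs] = [[name(node[1])], [name(node[2])]]
    elif kind == "star":  # assumed rule: <R*> --> <R> <R*> | epsilon
        grammar[lhs] = [[name(node[1]), lhs], []]
    for child in node[1:]:
        if isinstance(child, tuple):
            build(child, grammar)


zero, one = ("sym", "0"), ("sym", "1")
star = ("star", ("alt", zero, one))
regex = ("cat", ("cat", ("cat", star, one), one), one)  # (0|1)*111

grammar = {}
build(regex, grammar)
for lhs, alternatives in grammar.items():
    print(lhs, "-->", " | ".join(" ".join(rhs) or "epsilon" for rhs in alternatives))
```

For (0|1)*111 this produces seven nonterminals, from <(0|1)*111> down to <0> and <1>. A hand-built grammar may name its nonterminals differently.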
Example: RE to CFG
We will build a CFG G for the RE (0|1)*111. First the operands:
<0> --> 0
<1> --> 1