
Formal Languages and Automata

José Antonio Rodríguez Melquiades

September 12, 2016

José Antonio Rodríguez Melquiades, Formal Languages and Automata, September 12, 2016


Context-Free Grammars

1 Parsing and Ambiguity
   Parsing
   Ambiguity
   Parsing Algorithm
2 Exercises


Parsing

A grammar can be used in two ways:

Using the grammar to generate strings of the language.

Using the grammar to recognize strings.

Parsing a string means finding a derivation (or a derivation tree) for that string.

Parsing a string is therefore a way of recognizing it: the only realistic way to recognize a string of a context-free grammar is to parse it.


Exhaustive Search Parsing

The basic idea of exhaustive search parsing is this: to parse a string w, generate all strings in L and check whether w is among them. A problem arises when L is an infinite language. A systematic approach is therefore needed, both to guarantee that no strings are overlooked and to stop after a finite number of steps.

More precisely, exhaustive search parsing for a string w generates all strings of length no greater than $|w|$ and checks whether w is among them.

If the grammar is restricted so that it has no $\lambda$-rules ($A \to \lambda$) and no unit rules ($A \to B$), any string $w \in L$ can be generated in at most $2|w| - 1$ derivation steps: each step increases either the length of the sentential form or its number of terminals, and neither can exceed $|w|$.

Exhaustive search parsing is inefficient: it requires time exponential in $|w|$.

There are ways to further restrict context-free grammars so that strings can be parsed in linear time. No linear-time algorithm is known for parsing strings of an arbitrary context-free grammar.
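A minimal Python sketch of exhaustive search parsing, assuming the grammar has no λ-rules and no unit rules, that every variable is a single character used as a dictionary key, and that its right-hand sides are given as strings. The name `exhaustive_search_parse` is hypothetical. The sketch expands only leftmost variables (every derivable string has a leftmost derivation with the same number of steps) and discards sentential forms longer than |w|, which is safe because without λ-rules forms never shrink.

```python
def exhaustive_search_parse(grammar, start, w):
    """Return True iff w is derivable from start.

    Assumes no lambda-rules (A -> λ) and no unit rules (A -> B), so every
    derivation of w takes at most 2|w| - 1 steps.
    """
    forms = {start}
    for _ in range(2 * len(w) - 1):          # at most 2|w| - 1 rounds
        next_forms = set()
        for form in forms:
            # index of the leftmost variable; terminal strings are skipped
            k = next((i for i, ch in enumerate(form) if ch in grammar), None)
            if k is None:
                continue
            for rhs in grammar[form[k]]:
                new = form[:k] + rhs + form[k + 1:]
                if new == w:
                    return True
                if len(new) <= len(w):       # longer forms can never shrink to w
                    next_forms.add(new)
        forms = next_forms
    return False
```

For the grammar $S \to aSS \mid b$, `exhaustive_search_parse({"S": ["aSS", "b"]}, "S", "aababbb")` returns `True`. The number of rounds is bounded, but the set of sentential forms can still grow exponentially with |w|.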


Leftmost derivation

A derivation is called leftmost (Spanish: derivación más a la izquierda, dmi) if at each step the leftmost variable of the sentential form is expanded.

Example:
Let the CFG be given by
$G = (V, \Sigma, R, S) = (\{S\}, \{a, b\}, R, S)$

where $R: S \to aSS \mid b$.
A leftmost derivation of the sentence aababbb is:

$$
\begin{aligned}
S &\Rightarrow aSS \Rightarrow aaSSS \Rightarrow aabSS \Rightarrow aabaSSS \\
  &\Rightarrow aababSS \Rightarrow aababbS \Rightarrow aababbb
\end{aligned}
$$
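To make "expand the leftmost variable" concrete, here is a small Python sketch that replays a leftmost derivation from a list of chosen right-hand sides. It assumes single-character variables as dictionary keys; the name `leftmost_derivation` is hypothetical.

```python
def leftmost_derivation(grammar, start, choices):
    """Replay a leftmost derivation.

    grammar maps each single-character variable to its right-hand sides;
    choices lists, in order, the right-hand side applied at each step.
    Returns the list of sentential forms, beginning with start.
    """
    forms = [start]
    for rhs in choices:
        form = forms[-1]
        # index of the leftmost variable in the current sentential form
        k = next((i for i, ch in enumerate(form) if ch in grammar), None)
        if k is None:
            raise ValueError(f"{form!r} has no variable left to expand")
        if rhs not in grammar[form[k]]:
            raise ValueError(f"{form[k]} -> {rhs} is not a rule of the grammar")
        forms.append(form[:k] + rhs + form[k + 1:])
    return forms


G1 = {"S": ["aSS", "b"]}
steps = ["aSS", "aSS", "b", "aSS", "b", "b", "b"]
print(" => ".join(leftmost_derivation(G1, "S", steps)))
# S => aSS => aaSSS => aabSS => aabaSSS => aababSS => aababbS => aababbb
```

The same function replays the next example as well, with `{"S": ["aB", "bA"], "A": ["aS", "bAA", "a"], "B": ["bS", "aBB", "b"]}` as the grammar.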
Example:

Let the CFG be given by

$G = (V, \Sigma, R, S) = (\{S, A, B\}, \{a, b\}, R, S)$

where
$$
\begin{aligned}
R:\; S &\to aB \mid bA \\
A &\to aS \mid bAA \mid a \\
B &\to bS \mid aBB \mid b
\end{aligned}
$$
A leftmost derivation of the sentence aaabbabbba is:
$$
\begin{aligned}
S &\Rightarrow aB \Rightarrow aaBB \Rightarrow aaaBBB \Rightarrow aaabBB \Rightarrow aaabbB \\
  &\Rightarrow aaabbaBB \Rightarrow aaabbabB \Rightarrow aaabbabbS \Rightarrow aaabbabbbA \Rightarrow aaabbabbba
\end{aligned}
$$


Rightmost derivation

A derivation is called rightmost (Spanish: derivación más a la derecha, dmd) if at each step the rightmost variable of the sentential form is expanded.

Example:

Let the CFG be given by

$G = (V, \Sigma, R, S) = (\{S\}, \{a, b\}, R, S)$

where $R: S \to aSS \mid b$.
A rightmost derivation of the sentence aababbb is:

$$
\begin{aligned}
S &\Rightarrow aSS \Rightarrow aSb \Rightarrow aaSSb \Rightarrow aaSaSSb \\
  &\Rightarrow aaSaSbb \Rightarrow aaSabbb \Rightarrow aababbb
\end{aligned}
$$
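The rightmost case differs only in which variable is rewritten. A minimal Python sketch, under the same assumptions (single-character variables as dictionary keys; `rightmost_derivation` is a hypothetical name), that replays the derivation above:

```python
def rightmost_derivation(grammar, start, choices):
    """Replay a rightmost derivation.

    grammar maps each single-character variable to its right-hand sides;
    choices lists, in order, the right-hand side applied at each step.
    Returns the list of sentential forms, beginning with start.
    """
    forms = [start]
    for rhs in choices:
        form = forms[-1]
        # index of the rightmost variable in the current sentential form
        k = max((i for i, ch in enumerate(form) if ch in grammar), default=None)
        if k is None:
            raise ValueError(f"{form!r} has no variable left to expand")
        if rhs not in grammar[form[k]]:
            raise ValueError(f"{form[k]} -> {rhs} is not a rule of the grammar")
        forms.append(form[:k] + rhs + form[k + 1:])
    return forms


G1 = {"S": ["aSS", "b"]}
steps = ["aSS", "b", "aSS", "aSS", "b", "b", "b"]
print(" => ".join(rightmost_derivation(G1, "S", steps)))
# S => aSS => aSb => aaSSb => aaSaSSb => aaSaSbb => aaSabbb => aababbb
```

Both this and the leftmost derivation of aababbb use the same multiset of rules; only the order in which variables are rewritten changes.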
Example:

Let the CFG be given by

$G = (V, \Sigma, R, S) = (\{S, A, B\}, \{a, b\}, R, S)$

where
$$
\begin{aligned}
R:\; S &\to aB \mid bA \\
A &\to aS \mid bAA \mid a \\
B &\to bS \mid aBB \mid b
\end{aligned}
$$
A rightmost derivation of the sentence aaabbabbba is:
$$
\begin{aligned}
S &\Rightarrow aB \Rightarrow aaBB \Rightarrow aaBbS \Rightarrow aaBbbA \Rightarrow aaBbba \\
  &\Rightarrow aaaBBbba \Rightarrow aaaBbbba \Rightarrow aaabSbbba \Rightarrow aaabbAbbba \Rightarrow aaabbabbba
\end{aligned}
$$


Ambiguity

The structure of the derivation tree (Spanish: árbol de derivación, AD) is often used to assign meaning to the sentences of a language, much as the grammatical analysis of Spanish sentences identifies subject, verb, predicate, etc.

If a sentence can be decomposed in more than one way during analysis, it may have several meanings. Such a sentence is called ambiguous: it is generated by more than one distinct derivation tree (equivalently, by more than one distinct leftmost derivation).

A CFG is unambiguous if every sentence it generates has exactly one leftmost derivation and exactly one rightmost derivation.


Theorem:

Let G be a CFG. If $w \in L(G)$ and there are two or more leftmost derivations of w in G, then there are also two or more rightmost derivations of w in G.

Formally:
A CFG is called ambiguous when some sentence it generates has more than one derivation tree.


Example
Let the CFG be given by
$G = (V, \Sigma, R, S) = (\{S\}, \{a, +, \ast, (, )\}, R, S)$

where $R: S \to S + S \mid S \ast S \mid (S) \mid a$

Deriving the sentence $a + a + a$ yields two derivation trees.
Indeed:
(a) Derivation tree 1 (AD1), read as $a + (a + a)$:
$S \Rightarrow S + S \Rightarrow a + S \Rightarrow a + S + S \Rightarrow a + a + S \Rightarrow a + a + a$


(b) Derivation tree 2 (AD2), read as $(a + a) + a$:
$S \Rightarrow S + S \Rightarrow S + S + S \Rightarrow a + S + S \Rightarrow a + a + S \Rightarrow a + a + a$


Deriving the word $a + a \ast a$ likewise yields two derivation trees.
Indeed:
(a) Derivation tree 1 (AD1), read as $a + (a \ast a)$:
$S \Rightarrow S + S \Rightarrow a + S \Rightarrow a + S \ast S \Rightarrow a + a \ast S \Rightarrow a + a \ast a$


(b) Derivation tree 2 (AD2), read as $(a + a) \ast a$:
$S \Rightarrow S \ast S \Rightarrow S + S \ast S \Rightarrow a + S \ast S \Rightarrow a + a \ast S \Rightarrow a + a \ast a$

Therefore the grammar is ambiguous: it has two different derivation trees for $a + a + a$, and two for $a + a \ast a$.


Example
Let the CFG be given by

$G = (V, \Sigma, R, S) = (\{E, F, T\}, \{+, \ast, (, ), a\}, R, E)$

where
$$
\begin{aligned}
R:\; E &\to T + E \mid T \\
F &\to (E) \mid a \\
T &\to F \ast T \mid F
\end{aligned}
$$

This grammar is unambiguous. As an example, derive the word $a \ast (a + a)$.
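As a worked illustration, here is a leftmost derivation of $a \ast (a + a)$ in this grammar. At each step only one rule for the leftmost variable can still lead to the target string, which is consistent with the claim that the grammar is unambiguous (a single example does not prove it):

$$
\begin{aligned}
E &\Rightarrow T \Rightarrow F \ast T \Rightarrow a \ast T \Rightarrow a \ast F \Rightarrow a \ast (E) \\
  &\Rightarrow a \ast (T + E) \Rightarrow a \ast (F + E) \Rightarrow a \ast (a + E) \\
  &\Rightarrow a \ast (a + T) \Rightarrow a \ast (a + F) \Rightarrow a \ast (a + a)
\end{aligned}
$$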


Formally, we have the following results:

Theorem: A context-free grammar G is ambiguous if and only if there is a string $w \in L(G)$ that can be derived by two distinct leftmost derivations.

That is:

A CFG is ambiguous if and only if some sentence it generates has more than one leftmost derivation.

A CFG is ambiguous if and only if some sentence it generates has more than one rightmost derivation.


Example
Let the CFG be given by
$G = (V, \Sigma, R, S) = (\{S\}, \{a, +, \ast, (, )\}, R, S)$

where $R: S \to S + S \mid S \ast S \mid (S) \mid a$

Is G ambiguous?
Indeed:
(a) Leftmost derivation:
$S \Rightarrow S + S \Rightarrow a + S \Rightarrow a + S + S \Rightarrow a + a + S \Rightarrow a + a + a$


(b) Leftmost derivation:
$S \Rightarrow S + S \Rightarrow S + S + S \Rightarrow a + S + S \Rightarrow a + a + S \Rightarrow a + a + a$

Deriving the word $a + a + a$ shows that G is ambiguous, since it has two different leftmost derivations and hence two different derivation trees.


The word $a + a + a$ also has two distinct rightmost derivations, so the rightmost characterization likewise shows that G is ambiguous.
Indeed:
(a) Rightmost derivation:
$S \Rightarrow S + S \Rightarrow S + S + S \Rightarrow S + S + a \Rightarrow S + a + a \Rightarrow a + a + a$


(b) Rightmost derivation:
$S \Rightarrow S + S \Rightarrow S + a \Rightarrow S + S + a \Rightarrow S + a + a \Rightarrow a + a + a$
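Since ambiguity amounts to some string having more than one leftmost derivation, small cases can be checked mechanically. The following Python sketch counts the distinct leftmost derivations of a string. It assumes the grammar has no λ-rules and no unit rules (so the search is finite), single-character variables as dictionary keys, and ASCII `*` in place of ∗. The name `count_leftmost_derivations` is hypothetical. This is a brute-force check for examples, not a decision procedure for ambiguity in general.

```python
from functools import lru_cache


def count_leftmost_derivations(grammar, start, w):
    """Count distinct leftmost derivations of w from start.

    Assumes no lambda-rules and no unit rules, so sentential forms never
    shrink and every step makes progress; the recursion therefore ends.
    """
    @lru_cache(maxsize=None)
    def count(form):
        # position of the leftmost variable, or None for a terminal string
        k = next((i for i, ch in enumerate(form) if ch in grammar), None)
        if k is None:
            return 1 if form == w else 0
        # prune: too long, or the terminal prefix already disagrees with w
        if len(form) > len(w) or not w.startswith(form[:k]):
            return 0
        return sum(count(form[:k] + rhs + form[k + 1:]) for rhs in grammar[form[k]])

    return count(start)


G = {"S": ["S+S", "S*S", "(S)", "a"]}
print(count_leftmost_derivations(G, "S", "a+a+a"))  # 2
```

A count greater than 1 for any string is enough to show that the grammar is ambiguous.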


Example
Let the CFG be given by
$G = (V, \Sigma, R, S) = (\{S\}, \{a, b\}, R, S)$

where $R: S \to SbS \mid a$
By deriving the word abababa, determine whether G is ambiguous.
Indeed
...


Remarks:

Detecting and eliminating ambiguity is very important.

Determining whether a CFG is ambiguous is an undecidable problem. Why?

Two kinds of syntax analyzers (parsers)³ generated from a CFG are:

(a) Bottom-up: starts from the program and applies the rules in reverse, building a derivation tree from the leaves toward the root; the derivation traced in the process is a rightmost derivation, constructed in reverse.

(b) Top-down: starts from the start symbol S of the CFG and builds a derivation tree from the root toward the leaves; the derivation traced in the process is a leftmost derivation.

³ A parser is a program whose purpose is to determine whether a program is syntactically well formed.
Parsing algorithm
Derivations in a context-free grammar provide a mechanism for generating the strings of the language of the grammar. The language of the Backus-Naur definition of Java is the set of syntactically correct Java programs. An important question remains: how can we determine whether a sequence of Java code is a syntactically correct program?

The syntax is correct if the string is derivable from the start symbol using the rules of the grammar. Algorithms must therefore be designed to generate derivations for strings in the language of the grammar. When an input string is not in the language, these procedures should discover that no derivation exists. A procedure that performs this function is called a parsing algorithm or parser.

We study simple parsing algorithms. These parsers are variations of classical algorithms for traversing directed graphs. The parsing algorithms are valid but incomplete: the answers they produce are correct, but they may enter a nonterminating computation and fail to produce an answer. This potential incompleteness is a consequence of the occurrence of certain types of derivations allowed by the grammar.
Two distinct strategies may be employed to find a derivation of w from S. The search can begin with the node S and attempt to find the string w (top-down), or it can begin with w and apply the rules in reverse, working back toward S (bottom-up).

(a) Top-down parser.
   (a1) A breadth-first top-down parser
   (a2) A depth-first top-down parser

(b) Bottom-up parser.
   (b1) A breadth-first bottom-up parser
   (b2) A depth-first bottom-up parser

The objective of a parser is to determine whether an input string is derivable from the rules of a grammar.


A breadth-first top-down parser
A top-down parser constructs derivations by applying rules to the leftmost variable of a sentential form.

The parsing algorithms use the terminal prefix of the derived string to identify dead-ends. A dead-end is a string that the parser can determine does not occur in a derivation of the input string.

Parsing is an inherently nondeterministic process. In constructing a derivation, there may be several rules that can be applied to a sentential form.

It is not known in advance whether the application of a particular rule will lead to a derivation of the input string, a dead-end, or an unending computation. A parse is said to terminate successfully when it produces a derivation of the input string.

Paths beginning with S in the graph of a grammar represent the leftmost derivations of the grammar. The arcs emanating from a node represent the possible rule applications.
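The terminal-prefix test can be sketched in a few lines of Python. The sketch assumes single-character variables. The helper names `terminal_prefix` and `is_dead_end` are hypothetical, and treating a terminal string other than p as a dead end is an additional assumption (such a string cannot occur in a derivation of p).

```python
def terminal_prefix(form, variables):
    """Terminals to the left of the leftmost variable (whole form if none)."""
    for k, ch in enumerate(form):
        if ch in variables:
            return form[:k]
    return form


def is_dead_end(form, p, variables):
    """True if form can be recognized as unable to occur in a derivation of p."""
    prefix = terminal_prefix(form, variables)
    if prefix == form:                 # form is a terminal string
        return form != p
    return not p.startswith(prefix)    # terminal prefix must be a prefix of p


V = {"S", "A", "T"}
print(is_dead_end("b+T", "(b+b)", V))    # True: "b+" is not a prefix of "(b+b)"
print(is_dead_end("A+T+T", "(b+b)", V))  # False: the empty prefix always matches
```

The second call shows the limitation of prefix matching: forms such as A+T+T have an empty terminal prefix, so this test alone never rules them out.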


Algorithm

Input: context-free grammar G = (V, Σ, R, S), string p ∈ Σ*, queue Q

(1)  Initialize T with root S
     INSERT(S, Q)
(2)  REPEAT
(3)    q := REMOVE(Q)
(4)    i := 0
(5)    done := false
       Let q = uAv, where A is the leftmost variable in q
(6)    REPEAT
(7)      IF (there is no A rule numbered greater than i) THEN
           done := true
(8)      IF (NOT done) THEN
           Let A → w be the first A rule with number greater than i,
           and let j be the number of this rule
(9)        IF (uwv ∉ Σ* and the terminal prefix of uwv matches a prefix of p) THEN
(10)         INSERT(uwv, Q)
(11)         Add node uwv to T. Set a pointer from uwv to q
           END IF
           i := j
         END IF
       UNTIL done or p = uwv
     UNTIL EMPTY(Q) or p = uwv
(12) IF (p = uwv) THEN
       accept
     ELSE reject


The queue is used to implement the first-in, first-out memory management strategy required for a breadth-first graph traversal. The queue Q is maintained by three functions:
(a) INSERT(x, Q) places the string x at the rear of the queue.

(b) REMOVE(Q) returns the item at the front and deletes it from the queue.

(c) EMPTY(Q) is a boolean function that returns true if the queue is empty, false otherwise.

The search tree is initialized with root S, since a top-down algorithm attempts to find a derivation of p from S. The repeat-until loop at line (6) generates the successors of the node with sentential form q in the order specified by the numbering of the rules.

The process of generating the successors of a node and adding them to the search tree is called expanding the node. Because the queue is first-in, first-out, nodes are expanded level by level, resulting in the breadth-first construction of T.


Example:

Let the CFG be $G = (V, \Sigma, R, S) = (\{S, A, T\}, \{b, +, (, )\}, R, S)$, with rules

$$
\begin{aligned}
R:\; S &\to A \\
A &\to T \mid A + T \\
T &\to b \mid (A)
\end{aligned}
$$
The search tree constructed by the algorithm when parsing (b + b) is given in the following figure:

[Figure: breadth-first top-down search tree for the input (b + b); image not reproduced]


The sentential forms that are generated but not added to the search tree because of the prefix-matching condition are indicated by dotted lines. The comparison in step (9) matches the terminal prefix of the sentential form generated by the parser against the input string. To obtain the information required for the match, the parser reads the input string as it builds the derivation.

A parser must not only be able to generate derivations for strings in the language; it must also determine when strings are not in the language. The bottom branch of the search tree can potentially grow forever: repeatedly applying A → A + T produces A + T, A + T + T, and so on, whose terminal prefixes are empty and therefore always match the input.
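A minimal Python sketch of the breadth-first top-down parser, assuming single-character variables as dictionary keys and that each variable's rules are numbered by their position in its list. The names `bfs_top_down` and `leftmost_variable`, and the `max_nodes` cutoff, are assumptions of this sketch. The cutoff is not part of the algorithm; it only keeps the sketch from running forever on inputs such as those affected by the left-recursive rule A → A + T. On success the sketch follows the parent pointers back to S and returns the leftmost derivation.

```python
from collections import deque


def leftmost_variable(form, grammar):
    """Index of the leftmost variable in form, or None if form is terminal."""
    return next((k for k, ch in enumerate(form) if ch in grammar), None)


def bfs_top_down(grammar, start, p, max_nodes=10_000):
    """Breadth-first top-down parse; returns the derivation of p or None."""
    tree = [(start, -1)]          # search tree T: (sentential form, parent index)
    queue = deque([0])            # Q holds indices into tree, first-in first-out
    while queue and len(tree) < max_nodes:
        node = queue.popleft()                    # q := REMOVE(Q)
        q = tree[node][0]
        k = leftmost_variable(q, grammar)         # q = uAv
        u, A, v = q[:k], q[k], q[k + 1:]
        for w in grammar[A]:                      # A rules in numbering order
            new = u + w + v
            if new == p:                          # p = uwv: accept
                path, i = [new], node
                while i != -1:                    # follow pointers back to S
                    path.append(tree[i][0])
                    i = tree[i][1]
                return path[::-1]
            k_new = leftmost_variable(new, grammar)
            # keep non-terminal forms whose terminal prefix matches p
            if k_new is not None and p.startswith(new[:k_new]):
                tree.append((new, node))          # add node, pointer to q
                queue.append(len(tree) - 1)       # INSERT(uwv, Q)
    return None                                   # reject (or cutoff reached)


AE = {"S": ["A"], "A": ["T", "A+T"], "T": ["b", "(A)"]}
print(bfs_top_down(AE, "S", "(b+b)"))
# ['S', 'A', 'T', '(A)', '(A+T)', '(T+T)', '(b+T)', '(b+b)']
```

Spaces are omitted from the input string because each character is treated as one symbol. For an input such as `b+`, which is not in the language, the sketch stops only because of `max_nodes`.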


A depth-first top-down parser

A depth-first search of a graph avoids the combinatorial problems associated with a breadth-first search. The traversal moves through the graph examining a single path.

In a graph defined by a grammar, this corresponds to exploring a single derivation at a time. When a node is expanded, only one successor is generated and added to the search structure. The choice of the descendant added to the path is arbitrary, and the alternative chosen may not produce a derivation of the input string.

The possibility of incorrectly choosing a successor adds two complications to a depth-first parser. First, the algorithm must be able to determine that an incorrect choice has been made. Second, when this occurs, the parser must be able to backtrack and generate the alternative derivations.


Algorithm

Input: context-free grammar G = (V, Σ, R, S), string p ∈ Σ*, stack S

(1)  PUSH([S, 0], S)
(2)  REPEAT
(3)    [q, i] := POP(S)
(4)    deadend := false
(5)    REPEAT
         Let q = uAv, where A is the leftmost variable in q
(6)      IF (u is not a prefix of p) THEN
           deadend := true
(7)      IF (there are no A rules numbered greater than i) THEN
           deadend := true
(8)      IF (NOT deadend) THEN
           Let A → w be the first A rule with number greater than i,
           and let j be the number of this rule
(9)        PUSH([q, j], S)
(10)       q := uwv
(11)       i := 0
         END IF
       UNTIL deadend or q ∈ Σ*
     UNTIL q = p or EMPTY(S)
(12) IF (p = q) THEN
       accept
     ELSE reject
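A minimal Python sketch of the depth-first top-down parser with backtracking, following the numbered steps above. It assumes single-character variables as dictionary keys, with each variable's rules numbered 1..n by their position in its list. The name `dfs_top_down` and the `max_steps` budget are assumptions of this sketch. The budget is not part of the algorithm; it bounds the search, which can otherwise continue indefinitely on left-recursive grammars. Because the stack always holds the current path, it yields the leftmost derivation directly on success.

```python
def dfs_top_down(grammar, start, p, max_steps=10_000):
    """Depth-first top-down parse; returns the derivation of p or None."""
    stack = [(start, 0)]                     # (1) PUSH([S, 0], S)
    steps = 0
    q = start
    while stack and steps < max_steps:       # (2) ... UNTIL q = p or EMPTY(S)
        q, i = stack.pop()                   # (3) [q, i] := POP(S)
        dead_end = False                     # (4)
        while steps < max_steps:             # (5) ... UNTIL dead_end or q in Σ*
            steps += 1
            k = next(idx for idx, ch in enumerate(q) if ch in grammar)
            u, A, v = q[:k], q[k], q[k + 1:] # q = uAv, A the leftmost variable
            rules = grammar[A]
            if not p.startswith(u):          # (6) u is not a prefix of p
                dead_end = True
            if i >= len(rules):              # (7) no A rule numbered > i
                dead_end = True
            if not dead_end:                 # (8) first A rule numbered > i
                j = i + 1
                stack.append((q, j))         # (9) PUSH([q, j], S)
                q, i = u + rules[j - 1] + v, 0   # (10), (11)
            if dead_end or all(ch not in grammar for ch in q):
                break
        if q == p:                           # (12) accept
            return [form for form, _ in stack] + [q]
    return None                              # reject (or step budget used up)


AE = {"S": ["A"], "A": ["T", "A+T"], "T": ["b", "(A)"]}
print(dfs_top_down(AE, "S", "(b+b)"))
# ['S', 'A', 'T', '(A)', '(A+T)', '(T+T)', '(b+T)', '(b+b)']
```

With this rule order, A → T is tried before the left-recursive A → A + T, so the search reaches (b+b) after a few backtracks. For strings outside the language, the left-recursive alternatives keep producing new paths, and only `max_steps` ends the search.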


A stack S is maintained using the procedures PUSH, POP and EMPTY. A stack
element is an ordered pair [u, i], where u is a sentential form and i the number of
the rule applied to u to generate the subsequent node in the path.

(a) PUSH([u, i] , S) places the stack item [u, i] on the top of the stack S.

(b) POP(S) returns the top item and deletes it from the stack.

(c) EMPTY (S) is a boolean function that returns true if the stack is empty,
false otherwise.

A stack provides a lastin, firstout memory management strategy. The


algorithm consists of two repeatuntil loops. The interior loop (5) extends the
current derivation by expanding the final node of the path. This loop terminates
when the most recently constructed node completes the derivations of a terminal
string or is determined to be a deadend. There are three ways in which
deadends can be detected:
The terminal prefix of q may not match an initial substring of p, line (6).
There may be no rules to apply to the leftmost variable in the q, line (7).
This occurs when all the appropriate rules have been examined and have
produced deadends.

Jos Antonio Rodrguez Melquiades Lenguajes Formales y Autmatas 12 de septiembre de 2016 34 / 49


A stack S is maintained using the procedures PUSH, POP and EMPTY. A stack
element is an ordered pair [u, i], where u is a sentential form and i is the number
of the rule applied to u to generate the subsequent node in the path.

(a) PUSH([u, i] , S) places the stack item [u, i] on the top of the stack S.

(b) POP(S) returns the top item and deletes it from the stack.

(c) EMPTY (S) is a boolean function that returns true if the stack is empty,
false otherwise.
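The three stack procedures map directly onto a Python list, whose append and pop both work on the same end of the list. The class and method names below are ours, chosen only to mirror PUSH, POP and EMPTY; this is an illustrative sketch, not part of the original algorithm.

```python
class Stack:
    """LIFO stack offering the PUSH, POP and EMPTY operations (class name is ours)."""

    def __init__(self):
        self._items = []

    def push(self, item):
        """PUSH(item, S): place item on the top of the stack."""
        self._items.append(item)

    def pop(self):
        """POP(S): return the top item and delete it from the stack."""
        return self._items.pop()

    def empty(self):
        """EMPTY(S): true if the stack is empty, false otherwise."""
        return not self._items


s = Stack()
s.push(("S", 0))   # items are [u, i] pairs: sentential form, rule number
s.push(("A", 2))
print(s.pop(), s.empty())
```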

A stack provides a last-in, first-out memory management strategy. The
algorithm consists of two repeat-until loops. The interior loop (5) extends the
current derivation by expanding the final node of the path. This loop terminates
when the most recently constructed node completes the derivation of a terminal
string or is determined to be a dead-end. When the interior loop ends without
deriving p, the exterior loop (2) pops the most recent item [q, i], line (3), and
the search resumes from q with the rules numbered greater than i. There are
three ways in which dead-ends can be detected:
(i) The terminal prefix of q may not match an initial substring of p, line (6).
(ii) There may be no rules to apply to the leftmost variable in q, line (7).
This occurs when all the appropriate rules have been examined and have
produced dead-ends.
(iii) A terminal string other than p may be derived.
Example:

Let G = (V, Σ, R, S) = ({S, A, T}, {b, +, (, )}, R, S) be the CFG with rules

R: S → A
   A → T / A + T
   T → b / (A)

Parsing (b + b) with this algorithm constructs the search tree shown in the figure.



The figure shows the sentential forms constructed by the depth-first parse of the
string (b + b) in the graph of the grammar. The sentential forms connected by
dotted lines are those that have been generated and determined to be dead-ends.

The parser always tries the lowest-numbered applicable rule first. For T this is
T → b; when this choice results in a dead-end, the next T rule, T → (A), is applied.
Using the prefix-matching condition, the parser eventually determines that the rule
A → T applied to (A) cannot lead to a derivation of (b + b). At this point the search
returns to the node (A) and constructs derivations using the A rule A → A + T.



A derivation of the input string can be constructed from the items on the stack
when the search terminates successfully. In each item, the first element is the
sentential form being expanded and the second is the number of the rule applied
to generate its successor.

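As a sketch of this reconstruction: applying the rule recorded in each item to the leftmost variable of its sentential form yields the next form, and the top item yields p itself. The stack contents below are our own hand trace of the depth-first parse of (b + b), assuming single-character symbols and the rule numbering 1: S → A, 2: A → T, 3: A → A + T, 4: T → b, 5: T → (A); they are not copied from the figure.

```python
# Sketch: rebuild the leftmost derivation from the [form, rule] stack items.
# Assumptions (ours): one-character symbols, rules numbered 1..5 as below.
VARIABLES = {"S", "A", "T"}
RULES = {1: ("S", "A"), 2: ("A", "T"), 3: ("A", "A+T"), 4: ("T", "b"), 5: ("T", "(A)")}


def derivation_from_stack(stack, rules=RULES):
    """stack lists the bottom item first; returns the derivation as a list of forms."""
    derivation = [stack[0][0]]
    for form, j in stack:
        if form != derivation[-1]:
            raise ValueError(f"item {form!r} does not continue the derivation")
        k = next(i for i, c in enumerate(form) if c in VARIABLES)  # leftmost variable
        lhs, rhs = rules[j]
        if form[k] != lhs:
            raise ValueError(f"rule {j} does not rewrite {form[k]!r}")
        derivation.append(form[:k] + rhs + form[k + 1:])  # successor of this item
    return derivation


# Hand-traced stack for the parse of (b + b), bottom item first (an assumption).
FINAL_STACK = [("S", 1), ("A", 2), ("T", 5), ("(A)", 3),
               ("(A+T)", 2), ("(T+T)", 4), ("(b+T)", 4)]
print(" => ".join(derivation_from_stack(FINAL_STACK)))
```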


A breadth-first bottom-up parser

A derivation of a string p is obtained by building a path from the start
symbol S in the graph of a grammar. When the explicit search structure
begins instead with the input string p and works back toward S, the resulting
algorithm is known as a bottom-up parser. The bottom-up parsers considered
here examine rightmost derivations.

By beginning the search with the input string p, the only derivations that are
produced by a bottom-up parser are those that can generate p. This serves
to focus the search and limit the size of the search tree generated by the
parser.

Bottom-up parsing may be considered to be a search of an implicit graph
consisting of all the sentential forms that derive p by rightmost derivations.



Algorithm

Input: context-free grammar G = (V, Σ, R, S), string p ∈ Σ*, queue Q

(1) Initialize T with root p
    INSERT(p, Q)
(2) Repeat
      q := REMOVE(Q)
(3)   FOR (each rule A → w in R)
(4)     FOR (each decomposition uwv of q with v ∈ Σ*)
(5)       INSERT(uAv, Q)
(6)       Add node uAv to T. Set a pointer from uAv to q
        END FOR
      END FOR
    Until q = S or EMPTY(Q)
(7) IF (q = S)
      accept
    ELSE reject

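A minimal Python sketch of the breadth-first bottom-up parser, assuming single-character symbols and the grammar of the example that follows (S → A, A → T / A + T, T → b / (A)). The tree T is kept as a list of (form, pointer) pairs, so a sentential form reached along two different paths becomes two separate nodes, as in the algorithm. Unlike the pseudocode, the sketch tests q = S as soon as q is removed from the queue, which gives the same accept/reject answer. Function names are ours.

```python
from collections import deque

# Minimal sketch of the breadth-first bottom-up parser.
# Assumptions (ours): one-character symbols, rules listed in the slides' order.
VARIABLES = {"S", "A", "T"}
RULES = [("S", "A"), ("A", "T"), ("A", "A+T"), ("T", "b"), ("T", "(A)")]


def is_terminal_string(s):
    return all(c not in VARIABLES for c in s)


def bfs_bottom_up(p, rules=RULES, start="S"):
    """Return the rightmost derivation of p (S first) or None if p is rejected."""
    tree = [(p, None)]             # (1) search tree T with root p: (form, pointer)
    queue = deque([0])             #     INSERT(p, Q); the queue holds node indices
    while queue:                   # (2) repeat ... until q = S or EMPTY(Q)
        node = queue.popleft()     #     q := REMOVE(Q)
        q = tree[node][0]
        if q == start:             # (7) accept: follow the pointers from S back to p
            path = []
            while node is not None:
                path.append(tree[node][0])
                node = tree[node][1]
            return path
        for lhs, rhs in rules:                            # (3) each rule A -> w
            for pos in range(len(q) - len(rhs) + 1):      # (4) each q = uwv ...
                end = pos + len(rhs)
                if q[pos:end] == rhs and is_terminal_string(q[end:]):  # ... v terminal
                    tree.append((q[:pos] + lhs + q[end:], node))  # (6) node uAv -> q
                    queue.append(len(tree) - 1)                    # (5) INSERT(uAv, Q)
    return None


print(bfs_bottom_up("(b+b)"))
```

For (b+b) the returned path is S, A, T, (A), (A+T), (A+b), (T+b), (b+b), i.e. the rightmost derivation read from S down to the input.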


Step (5) of the algorithm inserts reductions into the queue. Bottom-up parsers
are designed to generate only rightmost derivations, so a reduction of uwv to uAv
is added to the search tree only if v is a terminal string. In this way the bottom-up
parser builds the derivation in reverse, from p back to S.

Example:

Let G = (V, Σ, R, S) = ({S, A, T}, {b, +, (, )}, R, S) be the CFG with rules

R: S → A
   A → T / A + T
   T → b / (A)

Parsing (b + b) with this algorithm constructs the search tree shown in the figure.



The path from S to (b + b) yields the rightmost derivation.



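A depth-first bottom-up parser

The reductions are generated by the shift-and-compare technique described
for the breadth-first algorithm: input symbols are shifted one at a time, and
after each shift the right end of the shifted part is compared with the
right-hand sides of the rules.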

The order in which the reductions are processed is determined by the number
of shifts required to produce the match and the ordering of the rules.
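One way to read this ordering, as a sketch: shift symbols from v onto u one at a time and, after each shift, compare the right end of u with every right-hand side in rule order. This is our interpretation of shift-and-compare, since the slides do not spell it out. It assumes single-character symbols and the example grammar with rules numbered 1 to 5, and it skips positions where the unshifted part v still contains a variable, so that only rightmost reductions are listed.

```python
# Sketch of shift-and-compare ordering (our interpretation, not from the slides).
# Assumptions: one-character symbols, rules numbered 1..5 in this order.
VARIABLES = {"S", "A", "T"}
RULES = [("S", "A"), ("A", "T"), ("A", "A+T"), ("T", "b"), ("T", "(A)")]


def ordered_reductions(q, rules=RULES):
    """List (shifts, rule number, reduced form) ordered by shifts, then rule number."""
    found = []
    for shifts in range(len(q) + 1):
        u, v = q[:shifts], q[shifts:]          # u: shifted part, v: unshifted rest
        if any(c in VARIABLES for c in v):     # keep rightmost reductions only
            continue
        for n, (lhs, rhs) in enumerate(rules, start=1):
            if rhs and u.endswith(rhs):        # right end of u matches w
                found.append((shifts, n, u[:len(u) - len(rhs)] + lhs + v))
    return found


print(ordered_reductions("(b+b)"))
```

For (b+b) this lists the reduction to (T+b), found after two shifts, before the reduction to (b+T), found after four shifts; both use rule 4 (T → b).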

The following algorithm is designed for parsing strings in a context-free
grammar in which the start symbol is nonrecursive. The start symbol is
nonrecursive if it does not occur on the right-hand side of any rule.

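The nonrecursive condition is a simple check on the rules. The sketch below assumes one-character symbols, so each right-hand side is a plain string; the function name is ours. The example grammar passes the check, while the grammar of exercise (1) (S → aS / Sb / ab / SS) does not.

```python
# Sketch, assuming one-character symbols so that a right-hand side is a plain string.
def start_is_nonrecursive(rules, start="S"):
    """True if the start symbol occurs on the right-hand side of no rule."""
    return all(start not in rhs for _, rhs in rules)


EXAMPLE = [("S", "A"), ("A", "T"), ("A", "A+T"), ("T", "b"), ("T", "(A)")]
EXERCISE_1 = [("S", "aS"), ("S", "Sb"), ("S", "ab"), ("S", "SS")]
print(start_is_nonrecursive(EXAMPLE), start_is_nonrecursive(EXERCISE_1))
```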
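A reduction using a rule S → w in a grammar with a nonrecursive start
symbol is a dead-end unless u = w and v = λ, where u is the part of the input
already shifted and reduced and v is the part not yet shifted. In this case,
the reduction successfully terminates the parse.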



Algorithm

Input: context-free grammar G = (V, Σ, R, S) with nonrecursive start
symbol, string p ∈ Σ*, stack S

(1) PUSH([λ, 0, p], S)
(2) Repeat
(3)   [u, i, v] := POP(S)
(4)   deadend := false
(5)   Repeat
        Find the first j > i with rule number j that satisfies
          i)  A → w with u = qw and A ≠ S, or
          ii) S → w with u = w and v = λ
(6)     IF (there is such a j)
(7)       PUSH([u, j, v], S)
(8)       u := qA
(9)       i := 0
        END IF
(10)    IF (there is no such j and v ≠ λ)
(11)      shift(u, v)   (move the first symbol of v to the end of u)
(12)      i := 0
        END IF
(13)    IF (there is no such j and v = λ)
          deadend := true
      Until (u = S) or deadend
    Until (u = S) or EMPTY(S)
(14) IF EMPTY(S)
       reject
     ELSE accept

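A minimal Python sketch of this algorithm, assuming single-character symbols, the example grammar with rules numbered 1: S → A, 2: A → T, 3: A → A + T, 4: T → b, 5: T → (A) (our numbering), and no λ-rules, since an empty right-hand side would match every u. The stack is again a Python list, here of (u, i, v) triples; find_reduction is our name for the search in step (5).

```python
# Minimal sketch of the depth-first bottom-up parser.
# Assumptions (ours): one-character symbols; rules numbered 1..5 in this order;
# no λ-rules; the start symbol S is nonrecursive, as the algorithm requires.
VARIABLES = {"S", "A", "T"}
RULES = [("S", "A"), ("A", "T"), ("A", "A+T"), ("T", "b"), ("T", "(A)")]


def find_reduction(u, v, i, rules, start):
    """First rule number j > i allowed by condition i) or ii), else None."""
    for j in range(i + 1, len(rules) + 1):
        lhs, rhs = rules[j - 1]
        if lhs != start and u.endswith(rhs):          # i) A -> w, u = qw, A != S
            return j
        if lhs == start and u == rhs and v == "":     # ii) S -> w, u = w, v = λ
            return j
    return None


def depth_first_bottom_up(p, rules=RULES, start="S"):
    stack = [("", 0, p)]                        # (1) PUSH([λ, 0, p], S)
    while True:                                 # (2)
        u, i, v = stack.pop()                   # (3)
        deadend = False                         # (4)
        while True:                             # (5)
            j = find_reduction(u, v, i, rules, start)
            if j is not None:                   # (6)
                stack.append((u, j, v))         # (7)
                lhs, rhs = rules[j - 1]
                u = u[:len(u) - len(rhs)] + lhs # (8) u := qA
                i = 0                           # (9)
            elif v:                             # (10) no reduction, input left
                u, v = u + v[0], v[1:]          # (11) shift one symbol from v to u
                i = 0                           # (12)
            else:                               # (13) no reduction, nothing to shift
                deadend = True
            if u == start or deadend:
                break
        if u == start or not stack:
            break
    return u == start                           # (14) same answer as "accept iff stack non-empty"


print(depth_first_bottom_up("(b+b)"), depth_first_bottom_up("(b+"))
```

With this grammar the sketch accepts (b+b), b and b+b, and rejects (b+ once the stack is empty.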




Exercises

(1) Consider the CFG

G = (V, Σ, R, S) = ({S}, {a, b}, R, S)

where R: S → aS / Sb / ab / SS
(1.1) Give a regular expression for L(G).
(1.2) Construct two leftmost derivations of the word aabb.
(1.3) Construct the derivation trees for question (1.2).

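To check answers to (1.2), the sketch below enumerates the leftmost derivations of a word by exhaustive search with prefix matching. Two or more leftmost derivations of the same word correspond to two or more derivation trees, which is what makes a grammar ambiguous. The function name is ours; the sketch assumes one-character symbols and that every rule lengthens the sentential form, which holds for grammar (1) and guarantees the search halts, but not for grammars (2) and (3), whose λ-rules would need extra care.

```python
# Sketch: enumerate every leftmost derivation of w by exhaustive search.
# Assumption: every rule lengthens the sentential form (true for grammar (1)).
EX1_RULES = [("S", "aS"), ("S", "Sb"), ("S", "ab"), ("S", "SS")]


def leftmost_derivations(w, rules, variables, start="S"):
    results = []

    def expand(form, history):
        if form == w:
            results.append(history)
            return
        k = next((i for i, c in enumerate(form) if c in variables), -1)
        # stop on a terminal string other than w, a form longer than w,
        # or a terminal prefix that does not match w
        if k == -1 or len(form) > len(w) or not w.startswith(form[:k]):
            return
        for lhs, rhs in rules:
            if lhs == form[k]:
                new_form = form[:k] + rhs + form[k + 1:]
                expand(new_form, history + [new_form])

    expand(start, [start])
    return results


for d in leftmost_derivations("aabb", EX1_RULES, {"S"}):
    print(" => ".join(d))
```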


(2) Consider the CFG
G = (V, Σ, R, S) = ({S, A, B}, {a, b}, R, S)

where R: S → ASB / ab / SS
         A → aA / λ
         B → bB / λ
(2.1) Give a regular expression for L(G).
(2.2) Construct leftmost and rightmost derivations of the word aaabb.
(2.3) Show that G is ambiguous.



(3) Consider the CFG
G = (V, Σ, R, S) = ({S, A, B}, {a, b, c, d}, R, S)

where R: S → aSb / aAb
         A → cAd / B
         B → aBb / λ
(3.1) Give a formal (set-theoretic) definition of L(G).
(3.2) Show that G is ambiguous.

(4) Using the grammar of C++, derive a reserved word of this language and
construct its derivation tree.

