
Navarro, Jose

6.806 PSET 4
Problem 1(a):
We’re given a CFG G′ = (N, Σ, R, S, P), with rule probabilities P, where G′ is not necessarily in Chomsky normal form, and we want to convert it into an equivalent grammar in CNF. To do so, we need to make sure that every rule has one of the two forms:

 X → Y1 Y2 for X ∈ N and Y1, Y2 ∈ N
 X → Y for X ∈ N and Y ∈ Σ
Consider any non-terminal X in G′ and take first the case of a rule X → Y1 … Yn where
n > 2. We can inductively apply the following:
Introduce a new non-terminal Z1 and a new rule X → Z1 Yn, where Z1 is now responsible for
deriving Y1 … Yn−1. If n − 1 > 2 we continue the process on Z1 (i.e. Z1 → Z2 Yn−1, where Z2
derives Y1 … Yn−2), and so on. At every step we have the guarantee that the non-terminal X
or Zi, when mapping to non-terminals, maps to exactly two of them.
Note that each step “removes” one non-terminal, in the sense that we replace every
non-terminal except the last one with a single new non-terminal Zi, so the right-hand side
shrinks by one symbol at a time. The process therefore eventually reaches a rule whose
right-hand side contains exactly two non-terminals; at that point we stop introducing new
non-terminals, since such a rule already adheres to CNF. We are left either with X → Y1 Y2
(the original rule was already binary and we changed nothing) or, after applying the process,
with Zi → Y1 Y2.
All of the rules in our G′ are guaranteed to be of one of the forms:
 X → Y1 … Yn for X ∈ N, Yi ∈ N and n ≥ 2
 X → Y where Y ∈ Σ
The procedure above takes care of the first case, and since a rule of the second form already
maps to a single terminal, we do not need to update the terminal rules at all. For every new
non-terminal we added, assign probability 1 to its corresponding rule and keep the
probabilities of the original rules of G′ unchanged; this way the probability of every
derivation, and hence the intended distribution, remains the same. A small sketch of this
conversion is given below.
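Purely as an illustration (not part of the assignment), here is a minimal sketch of the binarization step described above, assuming rules are stored as (lhs, rhs-tuple, probability) triples; the function and variable names are made up for this sketch.

```python
# Minimal sketch (illustrative only): binarize long PCFG rules X -> Y1 ... Yn (n > 2)
# into CNF, assigning probability 1.0 to the helper rules and keeping the original
# probability on the top rule.

def binarize(rules):
    """rules: list of (lhs, rhs_tuple, prob). Returns an equivalent CNF rule list."""
    out = []
    counter = 0
    for lhs, rhs, prob in rules:
        if len(rhs) <= 2:            # already X -> Y1 Y2 or X -> terminal
            out.append((lhs, rhs, prob))
            continue
        remaining = list(rhs)
        current_lhs, current_prob = lhs, prob
        # Peel off the last symbol one at a time: X -> Z1 Yn, Z1 -> Z2 Yn-1, ...
        while len(remaining) > 2:
            counter += 1
            z = f"Z{counter}"
            out.append((current_lhs, (z, remaining[-1]), current_prob))
            remaining = remaining[:-1]
            current_lhs, current_prob = z, 1.0   # helper rules get probability 1
        out.append((current_lhs, tuple(remaining), current_prob))
    return out

# Example: VP -> Vt NP PP (0.2) becomes VP -> Z1 PP (0.2) and Z1 -> Vt NP (1.0)
print(binarize([("VP", ("Vt", "NP", "PP"), 0.2)]))
```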
1(b).
Now we apply the above construction to transform the given PCFG. The resulting CNF rules
and their probabilities are:
S → NP VP | 0.7
S → Z1 VP | 0.3
Z1 → NP VP | 1.0
VP → Vt NP | 0.8
VP → Z2 PP | 0.2
Z2 → Vt NP | 1.0
NP → Z3 NN | 0.3
NP → NP PP | 0.7
Z3 → DT NN | 1.0
PP → IN NP | 1.0
Vt → saw | 1.0
NN → man | 0.7
NN → woman | 0.2
NN → telescope | 0.1
DT → the | 1.0
IN → with | 0.5
IN → in | 0.5
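As a quick sanity check (not required by the problem), the converted grammar can be written down as data and we can verify that the rule probabilities for each left-hand side still sum to 1; the dictionary layout below is only an illustrative assumption.

```python
from collections import defaultdict

# Converted grammar from 1(b), stored as (lhs, rhs) -> probability.
# This representation is an assumption made for illustration only.
cnf_rules = {
    ("S", ("NP", "VP")): 0.7,  ("S", ("Z1", "VP")): 0.3,
    ("Z1", ("NP", "VP")): 1.0,
    ("VP", ("Vt", "NP")): 0.8, ("VP", ("Z2", "PP")): 0.2,
    ("Z2", ("Vt", "NP")): 1.0,
    ("NP", ("Z3", "NN")): 0.3, ("NP", ("NP", "PP")): 0.7,
    ("Z3", ("DT", "NN")): 1.0,
    ("PP", ("IN", "NP")): 1.0,
    ("Vt", ("saw",)): 1.0,
    ("NN", ("man",)): 0.7, ("NN", ("woman",)): 0.2, ("NN", ("telescope",)): 0.1,
    ("DT", ("the",)): 1.0,
    ("IN", ("with",)): 0.5, ("IN", ("in",)): 0.5,
}

totals = defaultdict(float)
for (lhs, _), p in cnf_rules.items():
    totals[lhs] += p
assert all(abs(t - 1.0) < 1e-9 for t in totals.values())  # each lhs sums to 1
```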
Problem 2.
We consider two parse trees for the sentence: “John snored on Wednesday in a park under a
bush”. We’ll denote them tree C and tree N.
First note that tree N is binary in structure (other than the leaves). A con of tree N is that the
modifiers of “snored” are nested: N implies that the snoring happened on Wednesday, but then
that “snoring on Wednesday” happened “in a park”, and that “snoring on Wednesday in a
park” happened “under a bush”. This nesting may not match the intended semantics. Tree C,
on the other hand, avoids this problem, since each modifier directly modifies “snored”, which
is the intended meaning. Hence, in general, if a sentence attaches more than two modifiers to a
single word or verb, the nested structure can distort the meaning. A pro of N is that it may
make analysis with the CYK algorithm easier, since a binary tree corresponds more directly to
a grammar in CNF.
A pro of tree C is that it requires less memory to maintain than tree N. Since C does not need
to maintain a binary structure, it has fewer of the redundant intermediate nodes and edges
found in N (in this example the difference is small, but it becomes more significant as the
trees grow). However, when a sentence has a very long sequence of PPs attached to a single
word or verb, the parent node in C must have an arbitrarily large number of children (one per
modifier PP). The two attachment structures are sketched below.
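Purely as an illustration of the structural difference (the bracketings below are my reading of the two trees described above, not the figures from the handout):

```python
# Tree N (my reading, illustrative only): binary, each new PP wraps the previous VP,
# so "under a bush" ends up modifying "snored on Wednesday in a park" as a whole.
tree_N = ("S", "John",
          ("VP", ("VP", ("VP", "snored", ("PP", "on", "Wednesday")),
                  ("PP", "in", "a park")),
           ("PP", "under", "a bush")))

# Tree C (my reading, illustrative only): flat, every PP attaches directly to "snored".
tree_C = ("S", "John",
          ("VP", "snored",
           ("PP", "on", "Wednesday"),
           ("PP", "in", "a park"),
           ("PP", "under", "a bush")))
```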
Problem 3.
(a). We’ll make use of a DP table π(i, j, k, X) that stores the highest probability of any parse
tree spanning the words from i to j whose root is the non-terminal X with head word w_k.
Define
v1 = max over l, P, Q, m of { π(i, l, k, P) · π(l+1, j, m, Q) · Pr(X(w_k) → P(w_k) Q(w_m)) }
v2 = max over l, P, Q, m of { π(i, l, m, P) · π(l+1, j, k, Q) · Pr(X(w_k) → P(w_m) Q(w_k)) }
π(i, j, k, X) = max{ v1, v2 }
where i ≤ l < j, P, Q ∈ N, and m ranges over the word positions of the non-head child’s span
(l+1 ≤ m ≤ j in v1, and i ≤ m ≤ l in v2). In v1 the head word w_k is passed to the left child,
and in v2 to the right child. The base case is:
π(i, i, i, Q) = Pr(Q(w_i) → w_i) for every word w_i in the sentence.
We populate this table and then backtrack through it to obtain the optimal parse tree.
Given an input sentence x_0, x_1, …, x_n, recursively solve and memoize π(0, n, k, S) for
every head position k in the sentence, using the update rules above, and take the k that
maximizes it. In addition, we also want to know which children were optimal; since we find
this while computing each entry anyway, we memoize it as well. Create an additional
dictionary of backpointers that maps each subproblem tuple, e.g. (0, n, k, S) → (P, Q, k, m, l),
recording that P and Q were found to be the optimal children by the update rule, with the
parent’s head word w_k passed to the left child P and some other optimal head word w_m
chosen for Q (the v1 case; the v2 case is symmetric). The P(w_k) child subtree then spans
words (0, l) and the Q(w_m) subtree spans words (l+1, n), each with optimal probability.
Once the recursion is done, we can build the tree node by node, starting at the root:
We first take the largest subproblem π(0, n, k, S), look up the tuple (0, n, k, S) in our
dictionary, and obtain (P, Q, k, m, l). We then find the children by considering π(0, l, k, P)
for the left child and π(l+1, n, m, Q) for the right child. We repeat the process, repeatedly
looking up the optimal choice for each node, and output the resulting tree as the final result.
A sketch of this procedure is given below.
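A compact sketch of the procedure described above, written bottom-up (equivalent to the memoized recursion). All names, the rule-dictionary layout, and the simplification that rule probabilities are keyed only on the categories and head side (rather than also on the head words) are assumptions made for this sketch, not part of the assignment.

```python
from collections import defaultdict

def lex_cky(words, rules, lex, start="S"):
    """
    words : list of tokens w_0 ... w_n
    rules : dict {(X, P, Q, side): prob}, read as X(h) -> P(h) Q(m) when side == "L"
            (head passed to the left child) and X(h) -> P(m) Q(h) when side == "R".
            A fully lexicalized grammar would also condition on the head words; that
            only changes this dictionary lookup.
    lex   : dict {(X, word): prob} for preterminal rules X(w) -> w.
    Returns (best probability, best tree) for the sentence (assumed parseable).
    """
    n = len(words)
    pi = defaultdict(float)   # pi[(i, j, k, X)] = best probability
    back = {}                 # backpointers: (i, j, k, X) -> (P, Q, h_left, h_right, l)

    # Base case: single-word spans, whose head word is the word itself.
    for i, w in enumerate(words):
        for (X, word), p in lex.items():
            if word == w:
                pi[(i, i, i, X)] = p

    # Fill the table over increasing span lengths.
    for span in range(2, n + 1):
        for i in range(n - span + 1):
            j = i + span - 1
            for l in range(i, j):                          # split point
                for (X, P, Q, side), rp in rules.items():
                    for hl in range(i, l + 1):             # head of left child
                        for hr in range(l + 1, j + 1):     # head of right child
                            left = pi.get((i, l, hl, P), 0.0)
                            right = pi.get((l + 1, j, hr, Q), 0.0)
                            if left == 0.0 or right == 0.0:
                                continue
                            k = hl if side == "L" else hr  # head of the parent
                            score = rp * left * right
                            if score > pi[(i, j, k, X)]:
                                pi[(i, j, k, X)] = score
                                back[(i, j, k, X)] = (P, Q, hl, hr, l)

    def build(i, j, k, X):
        """Reconstruct the tree for subproblem (i, j, k, X) from the backpointers."""
        if i == j:
            return (X, words[i])
        P, Q, hl, hr, l = back[(i, j, k, X)]
        return (X, build(i, l, hl, P), build(l + 1, j, hr, Q))

    # Choose the best head word for the start symbol over the whole sentence.
    best_k = max(range(n), key=lambda k: pi[(0, n - 1, k, start)])
    return pi[(0, n - 1, best_k, start)], build(0, n - 1, best_k, start)
```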

Question 3(b): Say the query sentence has length L.
Runtime of the algorithm: the DP table has L^2 · |N| · L = L^3·|N| subproblems (the span
endpoints i and j, the head-word position k, and the non-terminal X). For each subproblem we
must search over the split point l (L possibilities), the two child non-terminals (|N|^2
possibilities), and the head-word position of the non-head child (L possibilities), i.e.
O(L^2·|N|^2) work per subproblem.
In total we have (L^3·|N|) · (L^2·|N|^2) = O(L^5·|N|^3) as our runtime.
