
Answers

The document provides solutions to homework problems about formal languages and automata theory. It includes: 1) context-free grammars and pushdown automata for four languages; 2) the conversion of a context-free grammar for regular expressions to Chomsky normal form, parse trees obtained with the CYK algorithm, and an unambiguous grammar; 3) a classification of four languages as regular, context-free but not regular, or not context-free, justified with DFAs, CFGs, or pumping lemmas; 4) a context-free grammar modeling a fragment of English.

Uploaded by

Sean Tuason
Copyright
© Attribution Non-Commercial (BY-NC)

CSCI 3130: Formal Languages and Automata Theory
The Chinese University of Hong Kong, Fall 2011

Homework 3 Solutions

Problem 1
For each of the following languages, give a context-free grammar and a pushdown automaton. Give a short explanation of how the PDA works.
(a) $L_1 = \{wyw^R : \text{the length of } y \text{ is even}\}$, $\Sigma = \{a, b\}$. (Recall that $w^R$ is the reverse of $w$.)
(b) $L_2 = \{w : w \text{ has the same number of } a\text{s as } b\text{s and } c\text{s together}\}$, $\Sigma = \{a, b, c\}$.
(c) $L_3 = \{a^ib^jc^k : i > j \text{ or } j > k, \text{ where } i, j, k \ge 0\}$, $\Sigma = \{a, b, c\}$.
(d) (Extra credit) $L_4 = \{xy : |x| = |y| \text{ and } x \ne y\}$, $\Sigma = \{a, b\}$.

Solution
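(a) This is a trick question: $L_1$ is the set $(\Sigma^2)^*$ of all strings of even length. (Every string $wyw^R$ has length $2|w| + |y|$, which is even; conversely, any even-length string $s$ can be written as $wyw^R$ with $w = \varepsilon$ and $y = s$.) So $L_1$ is described by the CFG $S \to abS \mid aaS \mid baS \mid bbS \mid \varepsilon$. You can also draw a 2-state PDA (based on the DFA for $L_1$). Alternatively, you can follow the pattern in the definition of $L_1$ and come up with the following CFG: $S \to aSa \mid bSb \mid Y$, $Y \to abY \mid bbY \mid baY \mid aaY \mid \varepsilon$.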
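
The claim that $L_1$ consists of exactly the even-length strings can be sanity-checked by brute force. The following is a minimal Python sketch (the helper names `in_L1` and `check_even_length_claim` are hypothetical, chosen for illustration). It tests membership straight from the definition by trying every possible length of $w$, then compares the result with the parity of the length for all strings up to a small bound. A finite check like this supports the argument above but does not replace it.

```python
from itertools import product


def strings(alphabet, n):
    """All strings of length exactly n over the alphabet."""
    return ("".join(t) for t in product(alphabet, repeat=n))


def in_L1(s):
    """Membership in L1 = {w y w^R : |y| even}, checked from the definition."""
    for k in range(len(s) // 2 + 1):          # k = |w|
        w, y, tail = s[:k], s[k:len(s) - k], s[len(s) - k:]
        if tail == w[::-1] and len(y) % 2 == 0:
            return True
    return False


def check_even_length_claim(max_len):
    """True if, for every length up to max_len, L1 contains exactly the even-length strings."""
    return all(in_L1(s) == (n % 2 == 0)
               for n in range(max_len + 1)
               for s in strings("ab", n))


print(check_even_length_claim(10))  # True
```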

(b) To write a CFG, let $Y$ represent a $b$ or a $c$. If the string starts with an $a$, then this $a$ can be matched with a $Y$ somewhere so that the remaining segments have the same number of $a$s and $Y$s. Similarly, if it starts with a $Y$, this $Y$ can be matched with a later $a$. So we can generate the strings with the same number of $a$s and $Y$s via the rule $S \to aSYS \mid YSaS \mid \varepsilon$.

Finally, we add the rule $Y \to b \mid c$. For the PDA, we use an $X$ on the stack to record the excess number of $a$s, and a $Y$ to record the excess joint number of $b$s and $c$s. When we see an $a$, we have a choice: either push an $X$ or pop a $Y$ (if one is on top of the stack). When we see a $b$ or a $c$, we can either pop an $X$ or push a $Y$. Every move preserves the invariant that (number of $X$s on the stack) $-$ (number of $Y$s on the stack) equals (number of $a$s read) $-$ (number of $b$s and $c$s read). So if nothing is left on the stack at the end, the input has as many $b$s and $c$s together as $a$s. Conversely, if it does, there is always a way to run the PDA so that nothing is left on the stack: pop whenever the top of the stack allows it, and push otherwise. So at the end, the PDA simply checks that nothing is left on the stack (only the bottom marker $\$$ remains).
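
The nondeterministic PDA for $L_2$ can be simulated by tracking the set of all stack contents reachable after each input symbol. In the sketch below, the names are hypothetical and the bottom marker $\$$ is left implicit, so an empty Python string stands for a stack holding only $\$$. The function `pda_accepts` follows the moves described above, and its answers are compared with a direct count of the symbols.

```python
from itertools import product


def pda_accepts(s):
    """Explore every branch of the PDA for L2.

    A stack is a string over {X, Y} whose last character is the top;
    the bottom marker $ is implicit (the empty string means only $ remains).
    """
    stacks = {""}
    for ch in s:
        nxt = set()
        for st in stacks:
            if ch == "a":
                nxt.add(st + "X")             # record an excess a
                if st.endswith("Y"):
                    nxt.add(st[:-1])          # cancel an earlier b or c
            else:                             # ch is b or c
                nxt.add(st + "Y")             # record an excess b/c
                if st.endswith("X"):
                    nxt.add(st[:-1])          # cancel an earlier a
        stacks = nxt
    return "" in stacks                       # accept iff some branch empties the stack


def in_L2(s):
    """Membership in L2 straight from the definition."""
    return s.count("a") == s.count("b") + s.count("c")


def check(max_len):
    return all(pda_accepts(s) == in_L2(s)
               for n in range(max_len + 1)
               for s in map("".join, product("abc", repeat=n)))


print(check(6))  # True
```

Keeping all branches in a set is exponential in the worst case, but stacks are deduplicated, which is plenty for short test strings.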

(c) There are two different possibilities: $i > j$ or $j > k$. $A$ and $C$ generate the strings $a^*$ and $c^*$, respectively. $X$ generates $a^ib^j$ where $i > j$, so $XC$ represents the strings $a^ib^jc^k$ where $i > j$. $Y$ generates $b^jc^k$ where $j > k$, so $AY$ represents the strings $a^ib^jc^k$ where $j > k$. Thus the union of $XC$ and $AY$ represents $L_3$:
$S \to XC \mid AY$, $X \to aXb \mid aX \mid a$, $Y \to bYc \mid bY \mid b$, $A \to Aa \mid \varepsilon$, $C \to Cc \mid \varepsilon$.
Similarly, to obtain a PDA, we draw two different PDAs for the two possibilities and connect them via $\varepsilon$-transitions.
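
One way to gain confidence in the grammar for $L_3$ is to enumerate every string it derives up to a small length and compare the result with the definition. The sketch below assumes single-character variable names (uppercase) and terminals (lowercase), and writes $\varepsilon$ as the empty string. It uses a hypothetical helper `generate` that explores leftmost derivations breadth-first and discards sentential forms that already contain too many terminals. Terminals are never erased, so this pruning is safe.

```python
from collections import deque
from itertools import product

# Grammar for L3 from the solution; "" stands for epsilon.
GRAMMAR = {
    "S": ["XC", "AY"],
    "X": ["aXb", "aX", "a"],
    "Y": ["bYc", "bY", "b"],
    "A": ["Aa", ""],
    "C": ["Cc", ""],
}


def generate(grammar, start, max_len):
    """All terminal strings of length <= max_len derivable from start."""
    seen, queue, words = {start}, deque([start]), set()
    while queue:
        form = queue.popleft()
        pos = next((i for i, ch in enumerate(form) if ch in grammar), None)
        if pos is None:                       # no variables left: a terminal string
            words.add(form)
            continue
        for rhs in grammar[form[pos]]:        # expand the leftmost variable
            new = form[:pos] + rhs + form[pos + 1:]
            if sum(ch not in grammar for ch in new) <= max_len and new not in seen:
                seen.add(new)
                queue.append(new)
    return words


def in_L3(s):
    """Membership in L3 = {a^i b^j c^k : i > j or j > k} from the definition."""
    i = len(s) - len(s.lstrip("a"))
    rest = s[i:]
    j = len(rest) - len(rest.lstrip("b"))
    k = len(rest) - j
    if rest[j:] != "c" * k:                   # not of the form a*b*c*
        return False
    return i > j or j > k


MAX = 7
expected = {w for n in range(MAX + 1)
            for w in map("".join, product("abc", repeat=n)) if in_L3(w)}
print(generate(GRAMMAR, "S", MAX) == expected)  # True
```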

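(d) Let's first try to understand what the strings in $L_4$ look like. We know that $w \in L_4$ if and only if it can be written in the form $xy$ where $|x| = |y|$ but $x \ne y$. If $x$ and $y$ are different, then they must differ in some position, say $i$ (counting from 0). Then the string $w$ must look like

$$\underbrace{(a+b)^i\,a\,(a+b)^j}_{x\text{ part}}\;\underbrace{(a+b)^i\,b\,(a+b)^j}_{y\text{ part}} \quad\text{or}\quad \underbrace{(a+b)^i\,b\,(a+b)^j}_{x\text{ part}}\;\underbrace{(a+b)^i\,a\,(a+b)^j}_{y\text{ part}}.$$

Since $(a+b)^j(a+b)^i$ describes the same strings as $(a+b)^i(a+b)^j$, we can rewrite each of these as

$$\underbrace{(a+b)^i\,a\,(a+b)^i}_{u\text{ part}}\;\underbrace{(a+b)^j\,b\,(a+b)^j}_{v\text{ part}} \quad\text{or}\quad \underbrace{(a+b)^i\,b\,(a+b)^i}_{u\text{ part}}\;\underbrace{(a+b)^j\,a\,(a+b)^j}_{v\text{ part}}.$$

Conversely, every string of either form splits into two halves of equal length that differ at position $i$. So we can write the language $L_4$ as $L_aL_b \cup L_bL_a$, where $L_a = \{uaw : u, w \in \{a,b\}^*,\ |u| = |w|\}$ and $L_b$ is defined similarly. Clearly $L_a$ and $L_b$ are context-free, as they are described by the CFGs
$S_a \to US_aU \mid a$, $S_b \to US_bU \mid b$, $U \to a \mid b$.
To obtain a CFG for $L_4$, we add the starting production $S \to S_aS_b \mid S_bS_a$.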
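
The identity $L_4 = L_aL_b \cup L_bL_a$ can be checked exhaustively for short strings. In this sketch (hypothetical helper names), membership in $L_a$ reduces to "odd length with $a$ in the middle", and membership in a concatenation is tested by trying every split point.

```python
from itertools import product


def words(max_len):
    """All strings over {a, b} of length at most max_len."""
    for n in range(max_len + 1):
        for t in product("ab", repeat=n):
            yield "".join(t)


def in_L4(s):
    """L4 = {xy : |x| = |y| and x != y}, from the definition."""
    half, odd = divmod(len(s), 2)
    return odd == 0 and s[:half] != s[half:]


def in_center(s, c):
    """Membership in L_c = {u c w : |u| = |w|}: odd length with c in the middle."""
    return len(s) % 2 == 1 and s[len(s) // 2] == c


def in_concat(s, first, second):
    """Is s = uv with u in L_first and v in L_second?"""
    return any(in_center(s[:k], first) and in_center(s[k:], second)
               for k in range(len(s) + 1))


def in_decomposition(s):
    return in_concat(s, "a", "b") or in_concat(s, "b", "a")


print(all(in_L4(s) == in_decomposition(s) for s in words(10)))  # True
```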

We can design a PDA following a similar principle. First, we design a PDA for $L_a$. The PDA pushes an $x$ onto the stack for each input symbol until it nondeterministically guesses the middle $a$. Then it pops an $x$ for each remaining symbol until the bottom of the stack is reached, and it accepts if the bottom of the stack is reached exactly at the end of the input. The PDA for $L_b$ is very similar. To obtain PDAs for $L_aL_b$ and $L_bL_a$, we combine the two PDAs back-to-back. Finally, the PDA for $L_4$ is obtained by combining these two via $\varepsilon$-transitions. Here is a somewhat simplified version of the PDA obtained by this construction.
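[Figure: state diagram of the simplified PDA for $L_4$, with start state $q_0$, states $q_1, q_{2a}, q_{3a}, q_{2b}, q_{3b}, q_4, q_5$, and transitions labeled "a or b, ε/x", "a or b, x/ε", "a, ε/ε", "b, ε/ε", "ε, ε/$", "ε, $/$" and "ε, $/ε".]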
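
The construction described above can also be simulated directly. The stack only ever holds copies of $x$ above the bottom marker $\$$, so the sketch below represents a configuration by three things: the branch ($L_aL_b$ or $L_bL_a$), the current phase, and the number of $x$s on the stack. It explores all nondeterministic choices at once. The names are hypothetical, and the code is written from the prose description rather than transcribed from the state diagram.

```python
from itertools import product


def pda_accepts(s):
    """Explore all branches of the back-to-back PDA for L_a L_b and L_b L_a.

    A configuration is (first, phase, height): `first` is the middle symbol
    of the first half, `height` is the number of x's above $.
    Phases: 0 = pushing (first half), 1 = popping (first half),
            2 = pushing (second half), 3 = popping (second half).
    """
    configs = {(first, 0, 0) for first in "ab"}
    for ch in s:
        nxt = set()
        for first, phase, h in configs:
            second = "b" if first == "a" else "a"
            if phase in (0, 2):
                nxt.add((first, phase, h + 1))          # push x for this symbol
                middle = first if phase == 0 else second
                if ch == middle:
                    nxt.add((first, phase + 1, h))      # guess: this is the middle symbol
            elif h > 0:
                nxt.add((first, phase, h - 1))          # pop x for this symbol
        # epsilon move: when the first half has emptied the stack, start the second half
        nxt |= {(f, 2, 0) for f, p, h in nxt if p == 1 and h == 0}
        configs = nxt
    return any(p == 3 and h == 0 for _, p, h in configs)


def in_L4(s):
    half, odd = divmod(len(s), 2)
    return odd == 0 and s[:half] != s[half:]


print(all(pda_accepts("".join(t)) == in_L4("".join(t))
          for n in range(11) for t in product("ab", repeat=n)))  # True
```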

Problem 2
Consider the following context-free grammar $G$ that describes (nontrivial) regular expressions over the alphabet $\{0, 1\}$:
$R \to (R) \mid R+R \mid RR \mid R* \mid 0 \mid 1 \mid e$
The alphabet of $G$ consists of the symbols (, ), +, *, 0, 1, and e. Here + and * describe the union and star operators, while e describes the empty string.
(a) Convert $G$ to Chomsky normal form.
(b) Apply the Cocke-Younger-Kasami algorithm (algorithm 2 from lecture 9) to obtain parse trees for the following strings: (1+0)*, 0+01, (1+e)1*. Some of these expressions have several parse trees; which ones describe the intended meaning of the expression?
(c) Give a CFG $G'$ that describes the same language as $G$ but is not ambiguous. Moreover, each parse tree in $G'$ should describe the intended meaning of the corresponding regular expression.

Solution
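(a) We notice that the original CFG has no unit productions and no $\varepsilon$-productions (e is a terminal symbol, so $R \to e$ is not an $\varepsilon$-production). So we only need to replace the terminals in long right-hand sides by new variables and break up long sequences with new variables. We obtain the following grammar:
$R \to XB \mid YR \mid RR \mid RM \mid 0 \mid 1 \mid e$
$X \to AR$, $Y \to RP$, $A \to ($, $B \to )$, $P \to +$, $M \to *$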

(b) On input (1+0)*, the CYK algorithm finds $R$ in the top cell of the table.
[Figure: CYK table for (1+0)*]
The table yields the following unique parse tree, written in bracket notation:
[R [R [X [A (] [R [Y [R 1] [P +]] [R 0]]] [B )]] [M *]]

On input 0+01, the CYK algorithm does the following:
[Figure: CYK table for 0+01]
There are two parse trees for 0+01:
[R [Y [R 0] [P +]] [R [R 0] [R 1]]]
and
[R [R [Y [R 0] [P +]] [R 0]] [R 1]]
Only the first one describes the intended meaning of the expression, the union of 0 and 01. The second one describes the language obtained by first taking the union of 0 and 0 and then concatenating with 1, which we would write as (0+0)1.

On input (1+e)1*, the CYK algorithm does the following:
[Figure: CYK table for (1+e)1*]
There are two parse trees for this expression:
[R [R [X [A (] [R [Y [R 1] [P +]] [R e]]] [B )]] [R [R 1] [M *]]]
and
[R [R [R [X [A (] [R [Y [R 1] [P +]] [R e]]] [B )]] [R 1]] [M *]]
Only the first one describes the intended meaning, the concatenation of (1+e) with 1*. The second one applies the star to the whole expression (1+e)1.
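
The CYK runs above can be reproduced mechanically. The sketch below assumes the CNF grammar from part (a) with single-character terminals, and the function name `cyk_count` is hypothetical. A standard CYK table only records which variables derive each substring. Here each cell instead counts the parse trees for each variable: the sum, over split points and rules $A \to BC$, of the product of the counts for $B$ and $C$. A count above 1 in the top cell exposes the ambiguity.

```python
from collections import defaultdict

# CNF grammar from part (a): binary rules (head, left, right) and terminal rules.
BINARY = [
    ("R", "X", "B"), ("R", "Y", "R"), ("R", "R", "R"), ("R", "R", "M"),
    ("X", "A", "R"), ("Y", "R", "P"),
]
TERMINAL = {"0": ["R"], "1": ["R"], "e": ["R"],
            "(": ["A"], ")": ["B"], "+": ["P"], "*": ["M"]}


def cyk_count(s, start="R"):
    """table[i][j][V] = number of parse trees of V for the substring s[i..j]."""
    n = len(s)
    if n == 0:
        return 0
    table = [[defaultdict(int) for _ in range(n)] for _ in range(n)]
    for i, ch in enumerate(s):
        for var in TERMINAL.get(ch, []):
            table[i][i][var] += 1
    for length in range(2, n + 1):
        for i in range(n - length + 1):
            j = i + length - 1
            for k in range(i, j):                 # split into s[i..k] and s[k+1..j]
                left, right = table[i][k], table[k + 1][j]
                for head, b, c in BINARY:
                    if left[b] and right[c]:
                        table[i][j][head] += left[b] * right[c]
    return table[0][n - 1][start]


for expr in ["(1+0)*", "0+01", "(1+e)1*"]:
    print(expr, cyk_count(expr))
# (1+0)* 1
# 0+01 2
# (1+e)1* 2
```

The counts agree with the number of parse trees found above for each expression.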

(c) To disambiguate the grammar and ensure that each parse tree has the intended meaning, we have to take the order of precedence of the operators (union, concatenation, and star) into account. Union has the lowest precedence, so we write a regular expression $R$ as a union of one or more terms (which we represent by $T$). Next in line is concatenation, so each term is a concatenation of one or more factors ($F$). Each factor can be the star of another factor, or it can be a whole regular expression, provided it is parenthesized. Finally, the atomic factors are the constants 0, 1, and e, which we represent by $C$.
$R \to R+T \mid T$
$T \to TF \mid F$
$F \to (R) \mid F* \mid C$
$C \to 0 \mid 1 \mid e$
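
Because this grammar is unambiguous, a parser can produce exactly one tree per expression. The recursive-descent sketch below is one possible implementation, not part of the original solution. Recursive descent cannot handle the left-recursive rules $R \to R+T$ and $T \to TF$ directly, so it replaces them by loops that build left-associative trees, which gives trees with the same structure. The tuple encoding of trees is an arbitrary choice for illustration.

```python
def parse(s):
    """Recursive-descent parser for
        R -> R+T | T,   T -> TF | F,   F -> (R) | F* | C,   C -> 0 | 1 | e.
    Returns nested tuples ("union", l, r), ("concat", l, r), ("star", x),
    or a single constant symbol.
    """
    pos = 0

    def peek():
        return s[pos] if pos < len(s) else None

    def expect(ch):
        nonlocal pos
        if peek() != ch:
            raise ValueError(f"expected {ch!r} at position {pos}")
        pos += 1

    def parse_R():                      # union of one or more terms
        tree = parse_T()
        while peek() == "+":
            expect("+")
            tree = ("union", tree, parse_T())
        return tree

    def parse_T():                      # concatenation of one or more factors
        tree = parse_F()
        while peek() in ("(", "0", "1", "e"):   # another factor starts here
            tree = ("concat", tree, parse_F())
        return tree

    def parse_F():                      # parenthesized R or constant, then stars
        nonlocal pos
        if peek() == "(":
            expect("(")
            tree = parse_R()
            expect(")")
        elif peek() in ("0", "1", "e"):
            tree = peek()
            pos += 1
        else:
            raise ValueError(f"unexpected {peek()!r} at position {pos}")
        while peek() == "*":
            expect("*")
            tree = ("star", tree)
        return tree

    tree = parse_R()
    if pos != len(s):
        raise ValueError(f"trailing input at position {pos}")
    return tree


print(parse("0+01"))     # ('union', '0', ('concat', '0', '1'))
print(parse("(1+e)1*"))  # ('concat', ('union', '1', 'e'), ('star', '1'))
```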

Problem 3
Consider the following languages. For each of the languages, say whether the language is (1) regular, (2) context-free but not regular, or (3) not context-free. Explain your answer (e.g. give a DFA or argue why one exists, give a CFG, apply the appropriate pumping lemma).
(a) $L_1 = \{a^nb^na^nb^n : n \ge 0\}$, $\Sigma = \{a, b\}$.
(b) $L_2 = \{w^R\#z : w \text{ is a substring of } z,\ w, z \in \{a, b\}^*\}$, $\Sigma = \{a, b, \#\}$.
(c) $L_3 = \{w\#z : w \text{ is a substring of } z,\ w, z \in \{a, b\}^*\}$, $\Sigma = \{a, b, \#\}$.
(d) $L_4 = \{x{+}y{=}z : x + y = z \text{ in unary, where } x, y, z \in 11^*\}$, $\Sigma = \{1, =, +\}$. For example, $1{+}11{=}111 \in L_4$, but $+1{=}1 \notin L_4$ and $1{+}1{=}111 \notin L_4$.

Solution
(a) $L_1$ is not context-free. Suppose it is. By the pumping lemma, there is a pumping length $p$ such that every $w \in L_1$ of length at least $p$ can be written as $w = uvxyz$ so that $|vy| > 0$, $|vxy| \le p$, and $uv^ixy^iz \in L_1$ for every $i \ge 0$. Let $w = a^pb^pa^pb^p$, which is in $L_1$, and consider any way of writing $w$ as $uvxyz$. We consider two cases:
Case 1: $vy$ is of the form $a^*$ or $b^*$. Then, since $|vxy| \le p$, $vxy$ must be contained in a single block of $a$s or a single block of $b$s. Suppose it is a block of $a$s. Then $uv^2xy^2z$ has different numbers of $a$s in its two blocks of $a$s, so it is not in $L_1$. If it is a block of $b$s, the analysis is analogous.
Case 2: $vy$ contains both $a$s and $b$s. If $v$ or $y$ itself contains both $a$s and $b$s, then $uv^2xy^2z$ is not of the form $a^*b^*a^*b^*$, so it is not in $L_1$. Otherwise, since $|vxy| \le p$, $v$ and $y$ lie in two adjacent blocks, so pumping lengthens those two blocks but not the other two, and again $uv^2xy^2z \notin L_1$.
In both cases we get a contradiction with the assumption that $L_1$ is context-free.

(b) $L_2$ is context-free but not regular. We first give a CFG. The strings in $L_2$ have the form $w^R\#uwz$, where $u, w, z \in \{a, b\}^*$. A string of this form can be written as $Az$, where $z \in \{a, b\}^*$ and $A$ has the form $w^R\#uw$. We can write $A$ as $w^RBw$, where $B$ has the form $\#(a+b)^*$ and $w \in \{a, b\}^*$. This gives the following CFG:
$S \to Sa \mid Sb \mid A$, $A \to aAa \mid bAb \mid B$, $B \to Ba \mid Bb \mid \#$.
To show that $L_2$ is not regular, for pumping length $n$ we choose $s = a^n\#a^n$, which is in $L_2$. For any partition $s = xyz$ where $y \ne \varepsilon$ and $|xy| \le n$, $y$ must consist only of $a$s before the $\#$. Suppose $y = a^j$, where $0 < j \le n$. Then we can let $i = 2$ and get $xy^iz = xy^2z = a^{n-j}a^{2j}\#a^n = a^{n+j}\#a^n$, which is not in $L_2$. Hence $L_2$ is not regular.

(c) $L_3$ is not context-free. We prove this using the pumping lemma. Consider an arbitrary pumping length $n$, and choose $s = a^nb^n\#a^nb^n$. Then $s \in L_3$, but we will show that no matter how we write $s = uvwxy$ with $|vwx| \le n$ and $|vx| > 0$, we can pump it out of $L_3$. We look at three cases:
Case 1: $vwx$ is in the first half (before the $\#$) of $a^nb^n\#a^nb^n$. We choose $i = 2$, so $uv^2wx^2y$ looks like $z\#z'$ where $|z| > |z'|$, and $z$ cannot be a substring of $z'$. Hence $uv^2wx^2y$ is not in $L_3$.
Case 2: $vwx$ is in the second half (after the $\#$). We choose $i = 0$, so $uv^0wx^0y$ looks like $a^nb^n\#a^ib^j$ where $i < n$ or $j < n$ (or both), and $a^nb^n$ cannot be a substring of $a^ib^j$.
Case 3: $vwx$ contains the $\#$ (so, since $|vwx| \le n$, it does not intersect the initial block of $a$s or the final block of $b$s). If $vx$ contains the $\#$, we choose $i = 0$; then $uv^0wx^0y$ contains no $\#$, so it is not in $L_3$. Otherwise, $v$ must lie in the $b$s to the left of the $\#$ and $x$ in the $a$s to the right of the $\#$. If $v \ne \varepsilon$, we choose $i = 2$; then $uv^2wx^2y$ has more $b$s to the left of the $\#$ than to the right. If $v = \varepsilon$, then $x \ne \varepsilon$ and we choose $i = 0$; then $uv^0wx^0y$ has more $a$s to the left of the $\#$ than to the right. In both cases, $uv^iwx^iy$ is not in $L_3$, since the string to the left of the $\#$ cannot be a substring of the string to the right.
This covers all the cases, so by the pumping lemma for context-free languages, $L_3$ is not context-free.
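
The case analyses in (a) to (c) can be sanity-checked by brute force for one small pumping length. The sketch below uses hypothetical helper names. It enumerates every decomposition allowed by the relevant pumping lemma and reports whether each one is pumped out of the language by $i = 0$ or $i = 2$. This only illustrates the case $p = 3$. The proofs must work for every pumping length, and no finite search can establish that.

```python
def in_L1(s):
    """L1 = {a^n b^n a^n b^n : n >= 0}."""
    n = len(s) // 4
    return s == "a" * n + "b" * n + "a" * n + "b" * n


def in_L2(s):
    """L2 = {w^R # z : w is a substring of z}."""
    parts = s.split("#")
    return len(parts) == 2 and parts[0][::-1] in parts[1]


def in_L3(s):
    """L3 = {w # z : w is a substring of z}."""
    parts = s.split("#")
    return len(parts) == 2 and parts[0] in parts[1]


def cfl_pump_fails(s, p, member, powers=(0, 2)):
    """True if every split s = uvxyz with |vxy| <= p and |vy| > 0
    is pumped out of the language by some i in `powers`."""
    n = len(s)
    for i1 in range(n + 1):
        for i4 in range(i1, min(n, i1 + p) + 1):      # vxy = s[i1:i4]
            for i2 in range(i1, i4 + 1):
                for i3 in range(i2, i4 + 1):
                    u, v, x, y, z = s[:i1], s[i1:i2], s[i2:i3], s[i3:i4], s[i4:]
                    if not v and not y:
                        continue
                    if all(member(u + v * i + x + y * i + z) for i in powers):
                        return False                    # this split survives pumping
    return True


def regular_pump_fails(s, p, member, powers=(0, 2)):
    """True if every split s = xyz with |xy| <= p and |y| > 0
    is pumped out of the language by some i in `powers`."""
    for i in range(p + 1):
        for j in range(i + 1, p + 1):
            x, y, z = s[:i], s[i:j], s[j:]
            if all(member(x + y * k + z) for k in powers):
                return False
    return True


p = 3
print(cfl_pump_fails("a" * p + "b" * p + "a" * p + "b" * p, p, in_L1))        # True
print(regular_pump_fails("a" * p + "#" + "a" * p, p, in_L2))                   # True
print(cfl_pump_fails("a" * p + "b" * p + "#" + "a" * p + "b" * p, p, in_L3))  # True
```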
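(d) $L_4$ is context-free but not regular. The following grammar generates this language. Each $1$ of $z$ is generated together with a matching $1$ of $x$ or $y$:
$S \to 1S1 \mid 1{+}B1$, $B \to 1B1 \mid 1{=}1$.
We prove that $L_4$ is not regular using the pumping lemma. Consider an arbitrary pumping length $n$, and choose $s = 1^n{+}1^n{=}1^{2n}$. For any partition $s = xyz$ where $y \ne \varepsilon$ and $|xy| \le n$, $y$ must consist only of $1$s before the $+$. Suppose $y = 1^j$, where $0 < j \le n$. Then we can let $i = 2$ and get $xy^2z = 1^{n+j}{+}1^n{=}1^{2n}$, which is not in $L_4$ because $(n + j) + n \ne 2n$. Hence $L_4$ is not regular.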
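
For (d), a similar brute-force check can test a grammar against the definition of $L_4$ and replay the pumping argument for a small $n$. The sketch below uses the grammar $S \to 1S1 \mid 1{+}B1$, $B \to 1B1 \mid 1{=}1$, in which every $1$ of $z$ is generated together with a $1$ of $x$ or of $y$. The helper names are hypothetical, and the enumeration only covers strings of length at most 9.

```python
from collections import deque
from itertools import product

# Every 1 of z is produced together with a 1 of x (via S) or of y (via B).
GRAMMAR = {"S": ["1S1", "1+B1"], "B": ["1B1", "1=1"]}


def generate(grammar, start, max_len):
    """Terminal strings of length <= max_len derivable from start (leftmost BFS)."""
    seen, queue, words = {start}, deque([start]), set()
    while queue:
        form = queue.popleft()
        pos = next((i for i, ch in enumerate(form) if ch in grammar), None)
        if pos is None:
            words.add(form)
            continue
        for rhs in grammar[form[pos]]:
            new = form[:pos] + rhs + form[pos + 1:]
            if sum(ch not in grammar for ch in new) <= max_len and new not in seen:
                seen.add(new)
                queue.append(new)
    return words


def is_unary(t):
    """True for a nonempty string of 1s."""
    return len(t) > 0 and set(t) == {"1"}


def in_L4(s):
    """L4 = {x+y=z : x, y, z nonempty unary strings with |x| + |y| = |z|}."""
    if s.count("+") != 1 or s.count("=") != 1 or s.index("+") > s.index("="):
        return False
    x, rest = s.split("+")
    y, z = rest.split("=")
    return is_unary(x) and is_unary(y) and is_unary(z) and len(x) + len(y) == len(z)


def regular_pump_fails(s, p, member, powers=(0, 2)):
    """True if every split s = xyz with |xy| <= p, |y| > 0 is pumped out by some i."""
    for i in range(p + 1):
        for j in range(i + 1, p + 1):
            x, y, z = s[:i], s[i:j], s[j:]
            if all(member(x + y * k + z) for k in powers):
                return False
    return True


MAX = 9
expected = {w for n in range(MAX + 1)
            for w in map("".join, product("1+=", repeat=n)) if in_L4(w)}
print(generate(GRAMMAR, "S", MAX) == expected)     # True

p = 3
s = "1" * p + "+" + "1" * p + "=" + "1" * (2 * p)
print(in_L4(s), regular_pump_fails(s, p, in_L4))   # True True
```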


Problem 4
Context-free grammars are sometimes used to model natural languages. In this problem you will model a fragment of the English language using context-free grammars. Consider the following English sentences:
The girl is pretty.
The girl that the boy likes is pretty.
The girl that the boy that the clerk pushed likes is pretty.
The girl that the boy that the clerk that the girl knows pushed likes is pretty.

This is a special type of sentence built from a subject (the girl), a relative pronoun (that) followed by another sentence, a verb (is), and an adjective (pretty).
(a) Give a context-free grammar $G$ that models this special type of sentence. Your terminals should be words or sequences of words like "pretty" or "the girl".
(b) Is the language of $G$ regular? If so, write a regular expression for it. If not, prove it using the pumping lemma for regular languages.
(c) Can you give an example of a sentence that is in $G$ but does not make sense in common English?

Solution
(a) The following grammar G describes this type of sentence:

$\text{SENTENCE} \to \text{SUBJ REL VERB ADJ}$
$\text{REL} \to \text{that SUBJ REL VERB} \mid \varepsilon$
$\text{SUBJ} \to \text{the girl} \mid \text{the boy} \mid \text{the clerk}$
$\text{VERB} \to \text{is} \mid \text{likes} \mid \text{pushed} \mid \text{knows}$
$\text{ADJ} \to \text{pretty}$

(b) This language is not regular. We prove this via the pumping lemma for regular languages, treating each terminal (such as "the girl" or "that") as a single symbol. For any given $n$, the string $z = \text{the girl (that the boy)}^n\,(\text{knows})^n\ \text{is pretty}$ is in $L(G)$; it can be generated via the recursive rule for REL. Let $uvw$ be any splitting of $z$ where $|uv| \le n$ and $|v| > 0$. If $v$ consists only of a single copy of "that", then $uv^2w$ is not in $L(G)$, since it contains the pattern "that that", which is not allowed by the rules. Otherwise $v$ contains at least one subject ("the girl" or "the boy") but no verb. Then $uv^2w$ has more subjects than verbs, which is not allowed by the rules of the grammar, where every subject must have a matching verb.
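
The grammar in (a) is small enough to enumerate. The sketch below uses hypothetical names, builds sentences as word lists, and joins them with spaces. It generates all sentences with a given number of nested relative clauses. It can be used to check the example sentences and the invariant used in (b), that every sentence has as many subjects as verbs.

```python
SUBJ = ["the girl", "the boy", "the clerk"]
VERB = ["is", "likes", "pushed", "knows"]


def rel(depth):
    """Word lists derivable from REL using the recursive rule exactly `depth` times."""
    if depth == 0:
        return [[]]                                   # REL -> epsilon
    return [["that", s] + inner + [v]                 # REL -> that SUBJ REL VERB
            for s in SUBJ for inner in rel(depth - 1) for v in VERB]


def sentences(depth):
    """SENTENCE -> SUBJ REL VERB ADJ, with REL nested `depth` times."""
    return [" ".join([s] + r + [v, "pretty"])
            for s in SUBJ for r in rel(depth) for v in VERB]


print(len(sentences(0)), len(sentences(1)))                         # 12 144
print("the girl that the boy likes is pretty" in sentences(1))      # True
print("the girl pushed pretty" in sentences(0))                     # True
```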


(c) You can give different kinds of examples. For instance, "The girl pushed pretty" does not make sense in English. Neither does "The boy that the boy is is pretty." There are also examples that make grammatical sense but that we would hardly use in practice, like "The girl that the boy that the clerk that the girl knows pushed likes is pretty." This illustrates some of the difficulties in trying to design formal grammars for natural languages.
