CS103, Spring 2019
Cynthia Lee and Keith Schwarz
Problem Set 8
In this problem set, you’ll transition away from the regular languages to the context-free languages and to the realm of Turing machines. This will be your first foray beyond the limits of what computers can ever hope to accomplish, and we hope that you find this as exciting as we do!
As always, please feel free to drop by office hours or ask on Piazza if you have any questions.
We'd be happy to help out.
Good luck, and have fun!
Due Friday, May 31st at 2:30PM.
Problem One: Designing CFGs
For each of the following languages, design a CFG for that language. Please use our online tool to design, test, and submit the CFGs in this problem. To use it, visit the CS103 website and click the “CFG Editor” link under the “Resources” header. Only one member of each team should submit your grammars; tell us who this person is when you submit the rest of the problems through Gradescope.
i. Given Σ = {a, b, c}, write a CFG for the language { w ∈ Σ* | w contains aa as a substring }.
For example, the strings aa, baac, and ccaabb are all in the language, but aba is not.
ii. Given Σ = {a, b}, write a CFG for the language { w ∈ Σ* | w is not a palindrome }, the language of strings that are not the same when read forwards and backwards. For example, aab ∈ L and baabab ∈ L, but aba ∉ L, bb ∉ L, and ε ∉ L.
Don’t try solving this one by starting with the CFG for palindromes and making modifications to it. In general, there’s no way to mechanically turn a CFG for a language L into a CFG for its complement L̄, since the context-free languages aren’t closed under complementation. However, looking at the first and last characters of a given string might still be a good idea.
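When testing a grammar in the CFG Editor, it can help to have an independent way to decide membership. The Python sketch below is not a CFG and is not something to submit; it simply checks the definitions of languages (i) and (ii) directly against the examples given above, with ε written as the empty string. The function names are hypothetical.

```python
def contains_aa(w: str) -> bool:
    """Language (i): w contains aa as a substring."""
    return "aa" in w


def is_not_palindrome(w: str) -> bool:
    """Language (ii): w reads differently forwards and backwards."""
    return w != w[::-1]


# Examples from the problem statement: each pair is (string, expected membership).
examples_i = [("aa", True), ("baac", True), ("ccaabb", True), ("aba", False)]
examples_ii = [("aab", True), ("baabab", True), ("aba", False), ("bb", False), ("", False)]

for w, expected in examples_i:
    assert contains_aa(w) == expected
for w, expected in examples_ii:
    assert is_not_palindrome(w) == expected
```

A checker like this can be used to spot-check the strings your grammar derives, though it cannot tell you whether the grammar misses some strings.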
iii. Let Σ be an alphabet containing these symbols:
Ø ℕ { } , ∪
We can form strings from these symbols which represent sets. Here are some examples:
Ø {Ø, ℕ} ∪ ℕ ∪ Ø {Ø} ∪ ℕ ∪ {ℕ} {Ø, Ø, Ø}
{{ℕ, Ø} ∪ {Ø}} ℕ ∪ {ℕ, Ø} {} {ℕ}
{Ø, {Ø, {Ø}}} {{{{ℕ}}}} ℕ {Ø, {}}
Notice that some of these sets, like {Ø, Ø}, are syntactically valid but redundant, and others, like {}, are syntactically valid but not the cleanest way of writing things. Here are some examples of strings that don't represent sets or aren't syntactically valid:
ε }Ø{ Ø{ℕ} {{}
ℕ, Ø, {Ø} {, ℕ} { ℕ Ø }, {,}
{Ø }} ℕ { Ø, Ø, Ø, } {ℕ, , , Ø}
Write a CFG for the language { w ∈ Σ* | w is a syntactically valid string representing a set }.
When using the CFG tool, please use the letters n, u, and o in place of ℕ, ∪, and Ø, respectively.
Fun fact: The starter files for Problem Set One contain a parser that’s designed to take as input a string representing a set and to reconstruct what set that is. The logic we wrote to do that parsing was based on a CFG we wrote for sets and set theory. Take CS143 if you’re curious how to go from a grammar to a parser!
Test your CFG thoroughly! In Fall 2017, a quarter of the submissions we received weren’t able to derive the string {Ø, Ø, Ø}.
As a hint, as is often the case with CFGs, we recommend that you use different nonterminals to represent different components of the string. For example, the structure of a comma-separated list is very different from the structure of an expression combining multiple sets together.
Problem Two: The Complexity of Addition
This problem explores the following question:
How hard is it to add two numbers?
Suppose that we want to check whether x + y = z, where x, y, and z are all natural numbers. If we want to phrase this as a question about strings and languages, we will need to find some way to standardize our notation. In this problem, we will be using the unary number system, a number system in which the number n is represented by writing out n 1's. For example, the number 5 would be written as 11111, the number 7 as 1111111, and the number 12 as 111111111111.
Given the alphabet Σ = {1, +, =}, we can consider strings encoding x + y = z by writing out x, y, and z
in unary. For example:
4 + 3 = 7 would be encoded as 1111+111=1111111
7 + 1 = 8 would be encoded as 1111111+1=11111111
0 + 1 = 1 would be encoded as +1=1
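To make the encoding concrete, here is a minimal Python sketch of it. The function name `encode` is hypothetical; it just writes x, y, and z in unary around the + and = symbols, and reproduces the three encodings listed above.

```python
def encode(x: int, y: int, z: int) -> str:
    """Encode the statement x + y = z over the alphabet {1, +, =} in unary."""
    return "1" * x + "+" + "1" * y + "=" + "1" * z


print(encode(4, 3, 7))  # 1111+111=1111111
print(encode(7, 1, 8))  # 1111111+1=11111111
print(encode(0, 1, 1))  # +1=1
```

Note that the number 0 is encoded as the empty string, which is why the last example begins directly with +.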
Consider the alphabet Σ = {1, +, =} and the following language, which we’ll call ADD:
$\{\, 1^m{+}1^n{=}1^{m+n} \mid m, n \in \mathbb{N} \,\}$
For example, the strings 111+1=1111 and +1=1 are in the language, but 1+11=11 is not, nor is the
string 1+1+1=111.
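For testing purposes, a direct membership check for ADD can be handy. The Python sketch below (the names `SHAPE` and `in_add` are hypothetical) first uses `re.fullmatch` to confirm the string has the shape 1…1+1…1=1…1 with exactly one + and one =, and then compares the lengths of the three blocks of 1's. It is only a membership test; it is not a CFG, and on its own it says nothing about whether ADD is regular.

```python
import re

# A block of 1's, then +, then a block of 1's, then =, then a block of 1's.
SHAPE = re.compile(r"(1*)\+(1*)=(1*)")


def in_add(s: str) -> bool:
    """Return True exactly when s has the form 1^m+1^n=1^(m+n)."""
    match = SHAPE.fullmatch(s)
    if match is None:
        return False  # wrong shape, e.g. two + signs or stray symbols
    m, n, k = (len(block) for block in match.groups())
    return k == m + n


for s in ["111+1=1111", "+1=1", "1+11=11", "1+1+1=111"]:
    print(s, in_add(s))
```

The first two strings are reported as members and the last two are not, matching the examples in the problem statement.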
i. Prove or disprove: the language ADD defined above is regular.
ii. Write a context-free grammar for ADD, showing that ADD is context-free. (Please submit your
CFG online.)
You may find it easier to solve this problem if you first build a CFG for this language where you’re allowed to have as many numbers added together as you’d like. Once you have that working, think about how you’d modify it so that you have exactly two numbers added together on the left-hand side of the equation.
Problem Three: The Complexity of RNA Hairpins
RNA strands consist of strings of nucleotides, molecules which encode genetic information. Computational biologists typically represent each RNA strand as a string made from four different letters, A, C, G, and U, each of which represents one of the four possible nucleotides.
Each of the four nucleotides has an affinity for a specific other nucleotide. Specifically:
A has an affinity for U (and vice versa).
C has an affinity for G (and vice versa).
This can cause RNA strands to fold over and bind with themselves. Consider this RNA strand:
G A U U A C A G G U A A U C
If you perfectly fold this RNA strand in half, you get the following:
G A U U A C A
C U A A U G G
Notice that each nucleotide, except for the A and the G on the far right, is attracted to the corresponding nucleotide on the other side of the RNA strand. Because of the natural affinities of the nucleotides in the RNA strand, the RNA strand will be held in this shape. This is an example of an RNA hairpin, a structure with important biological roles.
For the purposes of this problem, we'll say that an RNA strand forms a hairpin if
• it has even length (so that it can be cleanly folded in half);
• it has length at least ten (there are at least four pairs holding the hairpin shut); and
• all of its nucleotides, except for the middle two, have an affinity for their corresponding nucleotides when folded over. (The middle two nucleotides in a hairpin might coincidentally have an affinity for one another, but it's not required. For example, CCCCAUGGGG forms a hairpin.)
This problem explores the question
How hard is it to determine whether an RNA strand forms a hairpin?
Let Σ = {A, C, G, U} and let L_RNA = { w ∈ Σ* | w represents an RNA strand that forms a hairpin }. For example, the strings UGACCCGUCA, GUACAAGUAC, UUUUUUUUUAAAAAAAAA, and CCAACCUUGG are all in L_RNA, but the strings AU, AAAAACUUUUU, GGGC, and GUUUUAAAAG are not in L_RNA.
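The three hairpin conditions translate directly into a membership test, which can be useful for checking the strings your grammar generates. The Python sketch below is an illustration only (it is not a CFG, and the names `PAIRS` and `forms_hairpin` are hypothetical): it checks the length conditions and then compares each nucleotide with its mirror position, skipping the middle two.

```python
# Affinity pairs from the problem statement: A-U and C-G, in both directions.
PAIRS = {("A", "U"), ("U", "A"), ("C", "G"), ("G", "C")}


def forms_hairpin(strand: str) -> bool:
    """Check the three hairpin conditions for a string over {A, C, G, U}."""
    n = len(strand)
    if n % 2 != 0 or n < 10:
        return False  # must have even length and length at least ten
    half = n // 2
    # Position i faces position n-1-i when folded; i = half-1 is the
    # middle pair, which is allowed (but not required) to bind.
    return all((strand[i], strand[n - 1 - i]) in PAIRS for i in range(half - 1))


# Examples from the problem statement.
for s in ["UGACCCGUCA", "GUACAAGUAC", "UUUUUUUUUAAAAAAAAA", "CCAACCUUGG",
          "CCCCAUGGGG", "GAUUACAGGUAAUC"]:
    assert forms_hairpin(s)
for s in ["AU", "AAAAACUUUUU", "GGGC", "GUUUUAAAAG"]:
    assert not forms_hairpin(s)
```

The example strand GAUUACAGGUAAUC passes because its only mismatched pair, the A and the G, is the middle pair.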
Design a CFG for L_RNA, which proves that the language is context-free. Please submit your grammar online. (This language turns out not to be regular, though the proof of that result using the Myhill-Nerode theorem is heavy on details and light on intuition, so we won’t ask you to do that here.)
Problem Four: Equivalence Classes and Regular Languages, Part Two
On Problem Set Seven, you explored the indistinguishability relation for L, denoted ≡L, defined as
x ≡L y if ∀w ∈ Σ*. (xw ∈ L ↔ yw ∈ L).
You specifically proved that for any language L, the relation ≡L is an equivalence relation and that any
DFA for L must have at least I(≡L) states. In this problem, you’re going to prove an amazing result:
Theorem: If L is a language where I(≡L) is finite, then L is regular.
In other words, if you know absolutely nothing about a language other than that there are finitely many equivalence classes of the ≡L relation, then somewhere out there, there must be a DFA for L!
Let L be an arbitrary language over some alphabet Σ where I(≡L) is finite. We are going to prove that L
is regular by defining a 5-tuple (Q, Σ, δ, q₀, F) for this language L. The key insight behind this proof is
how to choose Q. Specifically, we will choose Q to be the set of equivalence classes of ≡L:
Q = { [w]≡L | w ∈ Σ* }.
It might seem strange to have the states of a DFA be sets, but then again, you’ve seen something like
this before. In Problem Set Six, when working through the subset construction, you created a DFA
whose states literally were sets of states of some particular NFA.
i. Explain why Q is finite. This should take you at most a sentence or two.
We now need to figure out how to pick a start state and wire up our transitions. Our goal will be to define q₀ and δ so that our DFA has the following property: if you run w through this DFA, the state you end up in corresponds to [w]≡L. It turns out that choosing q₀ and δ as follows makes this work:
q₀ = [ε]≡L
δ([x]≡L, a) = [xa]≡L.
Of course, you shouldn’t take our word for it. You should prove that these choices make everything
work!
ii. Prove that for any string w ∈ Σ*, we have δ*(w) = [w]≡L.
Need help with the definition of δ*? Check the pset8 web page for a link to a 1-pager with a definition
and key related theorems.
To seal the deal, we need to choose our set of accepting states. We’ll define F as follows:
F = { [w]≡L | ∃x ∈ [w]≡L. x ∈ L }.
In other words, F is the set of all equivalence classes containing at least one string in L.
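It can be reassuring to watch this construction run on a concrete language. The Python sketch below is only an illustration under stated assumptions, not part of any proof. It picks a hypothetical example language, the strings over {a, b} containing an even number of a's, and approximates each class [x]≡L by a "signature" recording whether xw ∈ L for every suffix w of length at most three. In general such a bounded test can merge classes that the true relation keeps apart; for this particular language it happens to be exact. Starting from [ε], it then builds q₀, δ, and F exactly as defined above, with F searched over a finite sample of strings.

```python
from collections import deque
from itertools import product

SIGMA = "ab"
SUFFIX_BOUND = 3  # assumption: suffixes this short suffice for the example language


def in_L(w: str) -> bool:
    """Hypothetical example language: strings with an even number of a's."""
    return w.count("a") % 2 == 0


def strings_up_to(n: int):
    """Yield every string over SIGMA of length at most n, shortest first."""
    for k in range(n + 1):
        for letters in product(SIGMA, repeat=k):
            yield "".join(letters)


def signature(x: str) -> tuple:
    """Stand-in for [x]: which short suffixes w put xw in L."""
    return tuple(in_L(x + w) for w in strings_up_to(SUFFIX_BOUND))


def build_dfa():
    start = signature("")                  # q0 = [ε]
    representative = {start: ""}           # one string per discovered class
    delta = {}
    queue = deque([""])
    while queue:
        x = queue.popleft()
        state = signature(x)
        for a in SIGMA:
            target = signature(x + a)      # δ([x], a) = [xa]
            delta[(state, a)] = target
            if target not in representative:
                representative[target] = x + a
                queue.append(x + a)
    # F = { [w] | some x in [w] is in L }, searched over a finite sample.
    accepting = {signature(x) for x in strings_up_to(6) if in_L(x)}
    return set(representative), start, delta, accepting


def accepts(dfa, w: str) -> bool:
    _, start, delta, accepting = dfa
    q = start
    for a in w:  # computes δ*(w) one character at a time
        q = delta[(q, a)]
    return q in accepting


dfa = build_dfa()
print(len(dfa[0]))  # 2
```

For this language the construction discovers exactly two states, one for "even number of a's so far" and one for "odd," and the resulting DFA agrees with in_L on every string tried.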
iii. In the δ* 1-page explainer document (see pset8 web page), you saw that we can formally define
ℒ(D) = { w ∈ Σ* | δ*(w) ∈ F }. Prove that with this choice of F, we have ℒ(D) = L.
There is a ton of formal notation here, but at the end of the day, this question is just asking you to prove
that two sets are equal. Think way back to Problem Set One. What’s the easiest way to do this?
Your proof should use the formal definitions provided here rather than higher-level concepts like “the DFA accepts w” or “run the DFA on w.” Also, perhaps a result from Problem Set Seven would be useful here?
By combining the two theorems you’ve explored about indistinguishability – the one you proved last
time, and the one from above – we get this fundamental result:
Theorem (Myhill-Nerode): A language L is regular if and only if I(≡L) is finite.
Furthermore, if I(≡L) is finite, the smallest possible DFA for L has exactly I(≡L) states.
This result formalizes the intuition we’ve had about regular languages corresponding to problems you can solve with only finite memory. The “memory” you need corresponds to remembering which equivalence class the string you’ve seen so far happens to fall into.
If you talk to CS theory folk and mention “the Myhill-Nerode theorem,” they’ll assume you’re talking
about the above theorem! The version we saw in lecture is just a special case of this more general one.
Problem Five: TMs, Formally
Just as it’s possible to formally define a DFA using a 5-tuple, it’s possible to formally define a TM as an
8-tuple (Q, Σ, Γ, B, q₀, Y, N, δ) where
• Q is a finite set of states, which can be anything;
• Σ is a finite, nonempty set called the input alphabet;
• Γ is a finite, nonempty set called the tape alphabet, where Σ ⊆ Γ;
• B ∈ Γ – Σ is the blank symbol;
• q₀ is the start state, where q₀ ∈ Q;
• Y ⊆ Q is the set of accepting states;
• N ⊆ Q is the set of rejecting states, where Y ∩ N = Ø; and
• δ is the transition function, described below.
Remember that the definition is the official arbiter of what is legal and what isn’t, so if the definition
doesn’t preclude something, it’s legal regardless of how counterintuitive or weird it might be. This
question explores some aspects of the definition.
i. Is it possible to have a TM with no states? Justify your answer.
ii. Is it possible to have a TM with no accepting states? Justify your answer.
iii. Is it possible to have a TM with no rejecting states? Justify your answer.
iv. Why is the restriction Y ∩ N = Ø there? Justify your answer.
v. Is it possible to have a TM where Σ = Γ? Justify your answer.
Now, let’s talk about the transition function. As with DFAs, the transition function of a Turing machine
is what formally defines the transitions. If q is a state in a TM that isn’t an accepting state or a rejecting
state and a is a symbol that can appear on the TM’s tape, then
δ(q, a) = (r, b, D)
where r is the new state to transition into, b is the symbol to write back to the tape, and D is either L for
“move left” or R for “move right.” Because TMs immediately stop running after entering an accepting
or rejecting state, the δ function should not be defined for any state q that’s either accepting or reject-
ing. Aside from this, δ should be defined for every combination of a (non-accepting, non-rejecting)
state q and any symbol a that can appear on the tape.
vi. Based on the above description of δ, what should the domain of δ be? What should its codomain be? Answer this question by filling in the following blanks, and briefly justify your answer.
δ : _______________ → _______________
In class, we said that any missing transitions in a TM implicitly reject. By that, we didn’t mean that the
TM’s transition function can be undefined on certain inputs. Instead, it means “if we don’t draw in a
transition in a picture representing the TM, it means that the transition does exist and goes to a reject-
ing state, but we were just too lazy to draw it in.” So you should assume that every transition not
drawn in a picture of a TM really is there and really goes to some rejecting state.
Also, take a moment to appreciate the fact that you can read the notation in this question and understand what it means! Could you imagine doing that at the start of the quarter?
Problem Six: Regular and Decidable Languages
In class, we alluded to the fact that REG (the class of all regular languages) is a subset of R (the class
of all decidable languages), but we never actually justified this claim.
Describe a construction that, given a DFA D, produces a decider D’ where ℒ(D) = ℒ(D’). Briefly justify why the TM D’ you construct is a decider and why it accepts precisely the strings that D accepts. Illustrate your construction by applying it to a small DFA D of your choice.
Although you have a formal 5-tuple definition of a DFA and a formal 8-tuple definition of a TM at your
disposal, we’re not expecting you to write your solution at that level of detail.
Remember that DFAs and TMs work completely differently with regard to accepting and rejecting states and that the transitions in TMs have a very different structure than the transitions in DFAs!
Problem Seven: What Does it Mean to Solve a Problem?
Let L be a language over Σ and M be a TM with input alphabet Σ. Here are three potential traits of M:
1. M halts on all inputs.
2. For any string w ∈ Σ*, if M accepts w, then w ∈ L.
3. For any string w ∈ Σ*, if M rejects w, then w ∉ L.
At some level, for a TM to claim to solve a problem, it should have at least some of these properties.
Interestingly, though, just having two of these properties doesn't say much.
i. Prove that if L is any language over Σ, then there is a TM M that satisfies properties (1) and (2).
ii. Prove that if L is any language over Σ, then there is a TM M that satisfies properties (1) and (3).
iii. Prove that if L is any language over Σ, then there is a TM M that satisfies properties (2) and (3).
iv. Suppose that L is a language over Σ for which there is a TM M that satisfies properties (1), (2),
and (3). What can you say about L? Prove it.
The whole point of this problem is to show that you have to be extremely careful about how you define
“solving a problem,” since if you define it incorrectly then you can “solve” a problem in a way that
bears little resemblance to what we’d think of as solving a problem. Keep this in mind as you work
through this one.
Optional Fun Problem: TMs and Regular Languages (Extra Credit)
Let M be a TM with the following property: there exists a natural number k such that after M is run on
any string w, M always halts after at most k steps. Prove that ℒ(M) is regular.