Thursday, January 29, 2015

Reusing Data from Privacy

Vitaly Feldman gave a talk at Georgia Tech earlier this week on his recent paper Preserving Statistical Validity in Adaptive Data Analysis with Cynthia Dwork, Moritz Hardt, Toniann Pitassi, Omer Reingold and Aaron Roth. This work uses tools from differential privacy to attack the problem of reusing cross-validation data in statistical inference and machine learning.

Many machine learning algorithms have a parameter that specifies the generality of the model, for example the number of clusters in a clustering algorithm. If the model is too simple it cannot capture the full complexity of what it is learning. If the model is too general it may overfit, fitting the vagaries of this particular data too closely.

One way to tune the parameters is by cross-validation, running the algorithm on fresh data to see how well it performs. However, if you always cross-validate with the same data you may end up overfitting the cross-validation data.

Feldman's paper shows how to reuse the cross-validation data safely. They show how to get an exponential (in the dimension of the data) number of adaptive uses of the same data without significant degradation. Unfortunately their algorithm takes exponential time but sometimes time is much cheaper than data. They also have an efficient algorithm that allows a quadratic amount of reuse.

The intuition and proof ideas come from differential privacy where one wants to make it hard to infer individual information from multiple database queries. A standard approach is to add some noise in the responses and the same idea is used by the authors in this paper.
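
To convey the flavor (this is the textbook Laplace mechanism, not the paper's actual algorithm, and the function below is my own illustrative sketch): answer each statistical query about the reused data with a little calibrated noise.

    import numpy as np

    def noisy_query(data, query, epsilon):
        # Answer a statistical query -- the average of a [0,1]-valued
        # function over the sample -- with Laplace noise calibrated to
        # the query's sensitivity 1/n (the standard Laplace mechanism).
        n = len(data)
        true_answer = np.mean([query(x) for x in data])
        return true_answer + np.random.laplace(scale=1.0 / (epsilon * n))

The point is that the analyst only ever sees noisy answers, which limits how much any sequence of adaptive queries can overfit to the particular sample.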

All of the above is pretty simplified and you should read the paper for details. This is one of my favorite kinds of paper where ideas developed for one domain (differential privacy) have surprising applications in a seemingly different one (cross-validation).


Monday, January 26, 2015

A nice problem from a Romanian Math Problem Book


(All of the math for this problem is here.)
My Discrete Math Honors TA Ioana showed me a Romanian math problem book
(she is Romanian) and told me the following problem:


(All a_i in this post are assumed to be natural numbers.)

Show that for all n ≥ 6 there exists (a_1,...,a_n) such that 1/a_1^2 + ... + 1/a_n^2 = 1.

(sum of reciprocals squared)

Normally my curiosity exceeds my ego and I would look up the answer.
But it was in Romanian! Normally I would ask her to read the answer to me.
But I was going out of town! Normally I would look up the answer on the
web. But this is not the kind of thing the web is good at!

So I did the obvious thing- worked on it while watching the first four
episodes of Homeland Season 2. And I solved it! Either try to solve it yourself
OR go to the link.

Some possibly open questions come out of this:

1) I also prove that for all k there is an n_0 = n_0(k) such that for all n ≥ n_0 there exists (a_1,...,a_n) such that 1/a_1^k + ... + 1/a_n^k = 1.


(sum of reciprocal kth powers)

We showed above that n_0(2) ≤ 6; it's easy to show n_0(2) ≥ 6, so n_0(2) = 6.

Obtain upper and lower bounds on n_0(k).

2) What is the complexity of the following problem:

Given k and n, find out if there exists (a_1,...,a_n) such that 1/a_1^k + ... + 1/a_n^k = 1.

If so, then find the values (a_1,...,a_n).

(We know of an example where the Greedy method does not work.)
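
If you want to experiment with question 2, here is a minimal brute-force search (my own sketch, not from the book; the cap bound on the a_i is an arbitrary choice, so a return of None only means there is no solution with denominators up to that cap):

    from fractions import Fraction

    def find_tuple(n, k, lo=2, rem=Fraction(1), bound=60):
        # Search for lo <= a_1 <= ... <= a_n with sum of 1/a_i^k = rem.
        # Exponential time in general; meant for small n and k only.
        if n == 0:
            return [] if rem == 0 else None
        if rem <= 0:
            return None
        for a in range(lo, bound + 1):
            term = Fraction(1, a ** k)
            if n * term < rem:   # even n copies of this term are too small
                break
            rest = find_tuple(n - 1, k, a, rem - term, bound)
            if rest is not None:
                return [a] + rest
        return None

    print(find_tuple(6, 2))   # finds [2, 2, 2, 3, 3, 6]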

3) What is the complexity of the following problem: Just as above
but now we want to know HOW MANY solutions.

4) Meta question: How hard are these questions? The original one was
on the level of a high school or college math competition. The rest
might be easy or hard. I suspect that getting an exact formula for
n_0(k) is hard. I also suspect that proving that this is hard
will be hard.


Thursday, January 22, 2015

There Should be an Algorithm

My high school daughter Molly was reading her Kindle and said "You know how you can choose a word and the Kindle will give you a definition. There should be an algorithm that chooses the right definition to display depending on the context". She was reading a book that took place in the 60's that referred to a meter man. This was not, as the first definition of "meter" would indicate, a 39 inch tall male. A meter man is the person who records the electric or gas meter at your house. Today we would use a gender neutral term like "meter reader" if technology hadn't made them obsolete.

Molly hit upon a very difficult natural language processing challenge known as word-sense disambiguation; the most successful approaches use supervised machine learning techniques. If anyone from Amazon is reading this post, the Kindle dictionary would make an excellent application of word-sense disambiguation. You don't need perfection, anything better than choosing the first definition would be welcome. Small tweaks to the user interface where the reader can indicate the appropriate definition would give more labelled data to produce better algorithms.
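
To see how little it would take to beat "always show the first definition", here is a sketch of the classic simplified Lesk heuristic: pick the sense whose dictionary gloss shares the most words with the surrounding text. (The glosses below are made up for illustration; a real system would also strip stop words and use training data.)

    def pick_sense(context_words, senses):
        # senses: dict mapping sense name -> dictionary gloss.
        # Return the sense whose gloss overlaps the context the most.
        ctx = set(w.lower() for w in context_words)
        return max(senses,
                   key=lambda s: len(ctx & set(senses[s].lower().split())))

    senses = {
        "unit":   "a metric unit of length equal to 39.37 inches",
        "device": "a device that records the quantity of gas or electricity used",
    }
    print(pick_sense("the meter man came to read the gas meter".split(), senses))
    # -> 'device'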

And to Molly: Keep saying "There should be an algorithm". Someday there might be. Someday you might be the one to discover it.

Monday, January 19, 2015

The two definitions of Chomsky Normal Form

I have eight  textbooks on Formal Lang Theory on my desk.  Six of them define a CFG to be in Chomsky Normal Form if every production is of the form either A-->BC or A--> σ (σ a single letter). With that definition one can show that every e-free grammar can be put in Chomsky Normal Form, and using that, show that CFL ⊆ P. There is a very minor issue of what to do if the CFL has e in it.

Two of the books (Sipser and Floyd-Beigel) define a CFG to be in Chomsky Normal Form if every rule is A-->BC or A-->σ OR S-->e, and also that S cannot appear as one of the two nonterminals on the right-hand side of a production of the form A-->BC. With this definition you can get that every CFL (even those with e in them) has a grammar in Chomsky Normal Form.

The definitions are NOT equivalent mathematically, but they are equivalent in spirit and both aim towards the same thing: getting all CFL's in P. (That's what I use them for. What did Chomsky use them for?)
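
To see concretely why Chomsky Normal Form gives CFL ⊆ P, here is a minimal sketch of the CYK dynamic programming algorithm (my illustration, using the first definition, so the grammar has no e productions):

    def cyk(word, start, rules):
        # CYK membership test for a CNF grammar. rules is a list of
        # (head, body) pairs where body is a single terminal or a pair
        # (B, C) of nonterminals. Time O(n^3 * |rules|), so CFL is in P.
        n = len(word)
        # table[i][l-1] = set of nonterminals deriving word[i:i+l]
        table = [[set() for _ in range(n)] for _ in range(n)]
        for i, ch in enumerate(word):
            for head, body in rules:
                if body == ch:
                    table[i][0].add(head)
        for length in range(2, n + 1):
            for i in range(n - length + 1):
                for split in range(1, length):
                    for head, body in rules:
                        if (isinstance(body, tuple)
                                and body[0] in table[i][split - 1]
                                and body[1] in table[i + split][length - split - 1]):
                            table[i][length - 1].add(head)
        return start in table[0][n - 1]

    # {a^n b^n : n >= 1}: S --> AB | AX, X --> SB, A --> a, B --> b
    rules = [('S', ('A', 'B')), ('S', ('A', 'X')), ('X', ('S', 'B')),
             ('A', 'a'), ('B', 'b')]
    print(cyk("aabb", 'S', rules))  # True
    print(cyk("abab", 'S', rules))  # False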

The first definition is correct historically- it's what Chomsky used (I am assuming this since it's in the earlier books). The second one could be argued to be better since when you are done you don't have to deal with the e. I still like the first one, but it's six of one, half a dozen of the other. One person's floor function is another person's ceiling function.

I don't have a strong opinion about any of this, but I will say that if you use the second definition then you should at least note that there is another definition that is used for the same purpose. Perhaps make a homework problem out of this.

There are many other cases where definitions change and the new one leads to more elegant theorems and a better viewpoint than the old one. There are even cases where the new definition IS equivalent to the old one but is better. IMHO the (∃) definition of NP is better than the definition that uses nondeterministic Poly Time TM's since this leads to the definition of the poly hierarchy. One drawback- if you want to define NL then I think you DO need nondeterministic TM's.
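
For concreteness, the (∃) definition: L is in NP iff there is a polynomial p and a poly-time decidable relation R such that

    x ∈ L ⟺ (∃ y, |y| ≤ p(|x|)) R(x,y)

and alternating the quantifiers gives the hierarchy, e.g. L is in Σ_2^p iff

    x ∈ L ⟺ (∃ y_1)(∀ y_2) R(x,y_1,y_2)

with both quantified strings of poly length.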

The only problem I see with changing definitions is if you are reading an old paper and don't quite know which definition they are using.

What examples of a definition changing (or an equivalent one being more used) do you approve of? Disapprove of?


Thursday, January 15, 2015

The Impact Factor Disease

The Institute for Scientific Information (ISI) was founded in 1960 to help index the ever-growing collection of scientific journals. The founder of ISI, Eugene Garfield, developed a simple impact factor to give a rough estimate of quality and help highlight the more important journals. Roughly, the impact factor of a journal in year x is the average number of citations each article from years x-1 and x-2 receives in year x.
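
In symbols, for a journal J (this is just the standard two-year formula restated for concreteness):

    IF(J, x) = (citations in year x to articles J published in years x-1 and x-2)
               / (number of articles J published in years x-1 and x-2)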

Thomson Scientific bought out ISI in 1992 and turned the data collection into a business. Impact factors are not only being used to measure the quality of journals but of authors (i.e. researchers) and institutions as well. In many parts of the world a person's research quality is being measured strongly or sometimes solely by their ability to publish in high impact factor journals.

This is bad news for computer science since conference proceedings in our field have historically carried more prestige than journals. We mitigate the ISI factors pretty well in the US but in many other countries this puts computer science at a disadvantage. The need for impact factor publications is one of the reasons conferences are experimenting with a hybrid model.

In a new trend, I get many announcements from journals highlighting their ISI impact factors, mostly very general journals previously unknown to me. Our old friends WSEAS say "The ISI Journals (with Impact Factor from Thomson Reuters) that publish the accepted papers from our Conferences are now 32" in the subject line of their email.

It's the vanity journal with a twist: publish with us and you'll raise your official numerical prestige. So we get a set of journals whose purpose is to raise the value of researchers who should have their value lowered by publishing in these journals.

Raise your research standing the old fashioned way: Do great research that gets published in the best venues. The numbers will take care of themselves.

Tuesday, January 13, 2015

Barriers for Matrix Mult Lower Bounds


Matrix Mult:

The usual algorithm is O(n^3).

Strassen surprised people by showing an O(n^{2.81}) algorithm. (Now it's so well known that it's not surprising. I try to teach it to people who don't already know it's true so they'll be surprised.)
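
For the curious, here is a minimal sketch of Strassen's seven-product recursion (assuming the dimension is a power of 2; a real implementation would pad and would cut over to the naive algorithm on small blocks):

    import numpy as np

    def strassen(A, B):
        # Seven recursive multiplications instead of eight gives
        # T(n) = 7T(n/2) + O(n^2), i.e. O(n^{log2 7}) = O(n^{2.81}).
        n = A.shape[0]
        if n == 1:
            return A * B
        h = n // 2
        A11, A12, A21, A22 = A[:h, :h], A[:h, h:], A[h:, :h], A[h:, h:]
        B11, B12, B21, B22 = B[:h, :h], B[:h, h:], B[h:, :h], B[h:, h:]
        M1 = strassen(A11 + A22, B11 + B22)
        M2 = strassen(A21 + A22, B11)
        M3 = strassen(A11, B12 - B22)
        M4 = strassen(A22, B21 - B11)
        M5 = strassen(A11 + A12, B22)
        M6 = strassen(A21 - A11, B11 + B12)
        M7 = strassen(A12 - A22, B21 + B22)
        return np.block([[M1 + M4 - M5 + M7, M3 + M5],
                         [M2 + M4, M1 - M2 + M3 + M6]])

    A = np.random.randint(0, 10, (4, 4))
    B = np.random.randint(0, 10, (4, 4))
    assert (strassen(A, B) == A @ B).all()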

Over the years the exponent came down, though there was a long time between Coppersmith and Winograd's exponent of 2.3755 and the recent improvements.

Are these improvements going to lead to an exponent of 2+epsilon?

Alas the answer is prob no :-(

In Fast Matrix Mult: Limitations of the Laser Method, Ambainis, Filmus, and Le Gall show that the methods that led to the algorithms above will NOT lead to an exponent of 2+epsilon.

How to react to news like this? Barriers should not make us give up; however, they should make us look for new techniques, perhaps guided by the barrier. I would like to think that if you know where NOT to look then it helps you know where TO look.

There have been barriers that were broken (IP=PSPACE didn't relativize, the recent 2-prover PIR used techniques NOT covered by the barrier result, and the Erdos distance problem was proven after it was known that the current techniques `wouldn't work'). I am sure there are other examples; feel free to leave some in the comments.

Will this result help guide researchers in the right direction? Lets hope so!

Of course, it's possible that 2+epsilon is false and the barrier result is a first step in that direction.

Which one am I rooting for? Neither- I am rooting for finding out!

bill g.

Thursday, January 08, 2015

The History of the History of the History of Computer Science

In 2007, the science historian Martin Campbell-Kelly wrote an article The History of the History of Software, where he describes how he initially wrote histories of the technical aspects of computer software back in the 70's but has since evolved into writing more about the applications and implications of software technologies. He argues that the whole field of the history of software has moved in the same direction.

Donald Knuth made an emotional argument against this trend last May in his Stanford Kailath lecture Let's Not Dumb Down the History of Computer Science. If you can find an hour, this is a video well worth watching.

In the January CACM Thomas Haigh gave his thoughts in The Tears of Donald Knuth. Haigh argues that Knuth conflates the History of Computer Science with the History of Computing. Haigh says that historians focus on the latter and the History of Computer Science doesn't get enough emphasis.

Let me mention two recent examples in that History of Computing category. The Imitation Game gives a great, though slightly fictionalized, portrait of the computing and computer science pioneer Alan Turing focusing on his time at Bletchley Park breaking the Enigma code. Walter Isaacson, whose histories of Franklin, Einstein and Jobs I thoroughly enjoyed, writes The Innovators: How a Group of Hackers, Geniuses, and Geeks Created the Digital Revolution, which tells the stories of computers from Ada Lovelace to Google (oddly stopping before social networks).

But what can we do about the History of Computer Science, particularly for theoretical computer science? We live in a relatively young field where most of the great early researchers still roam among us. We should take this opportunity to learn and record how our field developed. I've dabbled a bit myself, talking to several of the pioneers, writing (with Steve Homer) a short history of computational complexity in the 20th Century and a history chapter in The Golden Ticket.

But I'm not a historian. How do we collect the stories and memories of the founders of the field and tell their tales while we still have a chance?

Monday, January 05, 2015

Why do students do this?

Before my midterm in ugrad Theory of Computation I gave the students a sheet of practice problems to do, which I would go over before the midterm.

One of them was: Let L be in DTIME(T(n)). Give an algorithm for L*. Try to make it efficient. What is the time complexity of your algorithm? (I had shown in class earlier in the term that if L is in P then L* is in P.)

My intention was that they do the dynamic programming solution. Since it wasn't being collected I didn't have to worry about what would happen if they did it by brute force. When I went over it in class I did the dynamic programming solution, which runs in roughly T(n)^3 time.
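
For concreteness, here is a sketch of that dynamic programming solution, assuming a black-box decider in_L for membership in L:

    def in_L_star(w, in_L):
        # reachable[j] is True iff the prefix w[:j] splits into blocks,
        # each in L. Makes O(n^2) calls to the decider, so about
        # n^2 * T(n) steps -- within the roughly T(n)^3 bound from class.
        n = len(w)
        reachable = [False] * (n + 1)
        reachable[0] = True   # the empty prefix is zero blocks
        for j in range(1, n + 1):
            reachable[j] = any(reachable[i] and in_L(w[i:j])
                               for i in range(j))
        return reachable[n]

    print(in_L_star("abab", lambda s: s == "ab"))  # True:  ab|ab
    print(in_L_star("abba", lambda s: s == "ab"))  # False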

I allow my students to bring in a sheet of notes that they make up themselves.

On the exam was the problem: Let L_1 be in DTIME(T_1(n)) and L_2 be in DTIME(T_2(n)).
Give an algorithm for L_1 L_2. What is the time complexity of your algorithm?

Of my 20 students, 5 gave me, word for word, the dynamic programming solution to the L, L* problem.

Why would they do this? Speculations:
  1. They just copied it off of their cheat sheet with no understanding.
  2. They wanted pity points (they didn't get any and I told the class that if a similar thing happens on the final I will give them LESS THAN zero on the problem).
  3. They so hoped that the L, L* problem would be on the exam (possibly because it was on their cheat sheet) that they misread the problem.
  4. They thought `Dr. Gasarch wouldn't have put it on the practice exam unless it was on the real exam' (or something like that), so they misread it. 
The five students were not very good  (they did poorly on other problems as well, and on the HW), so it was not a matter of good students being confused or getting nervous.

But I ask-- (1) is this kind of thing common? For my Sophomore Discrete Math yes, but I was very slightly surprised to see it in my senior course. (2) Do you have examples? I am sure that you do, but my point is NOT to do student-bashing, it's to ask WHY they do this.

Monday, December 29, 2014

2014 Complexity Year in Review

Theorem of the year goes to 2-Server PIR with Sub-polynomial Communication by Zeev Dvir and Sivakanth Gopi. In Private Information Retrieval you want to access information copied in multiple databases in a way so that no database knows what question you asked. In 1995 Chor, Kushilevitz, Goldreich and Sudan showed how to use n^{1/3} bits of communication with two databases. Razborov and Yekhanin proved that using current techniques the bound could not be broken. Dvir and Gopi developed new techniques to break that barrier using n^{O(√(log log n/log n))} bits with two databases, less than n^δ for any δ > 0. Bill posted more on this result back in August.

And of course lots of other great work on extended formulations, circuits, algorithms, communication complexity and many other topics. We also had another round of favorite theorems for the past decade.

2014 will go down as the year computer science exploded. With a big convergence of machine learning/big data, the connectedness of everything, the sharing economy, automation and the move to mobile, we have a great demand for computer scientists and a great demand from students to become computer scientists or have enough computing education to succeed in whatever job they get. Enrollments are booming, CS departments are hiring and demand far outstrips the supply. A great time for computer science and a challenging one as well.

We say goodbye to G.M. Adelson-Velsky, Alberto Bertoni, Ed Blum, Ashok Chandra, Alexey Chervonenkis, Eugene Dynkin, Clarence "Skip" Ellis, Alexander Grothendieck, Ferran Hurtado, Mike Stilman, Ivan Stojmenovic, Berthold Vöcking, Ann Yasuhara, Microsoft Research-Silicon Valley and The New York Times Chess Column.

Thanks to our contributors Andrew Childs, Brittany Terese Fasy and MohammadTaghi Hajiaghayi.

Looking ahead, 2015 brings the centenary of the man we know for balls and distance and the fiftieth anniversary of the paper that brought us the title of this blog. Have a great New Year's and remember, in a complex world it's best to keep it simple.

Monday, December 22, 2014

Undergraduate Research

I just received the Cornell Math Matters, dedicated to the memory of Eugene Dynkin, who passed away on November 14 at the age of 90. In my freshman year at Cornell, Dynkin recruited me into his undergraduate research seminar, building on the success he had with a similar seminar he ran in Russia. I didn't last long, making the conscious choice not to work on research as an undergrad but to focus on enjoying the college experience. I missed out on a great opportunity but I don't regret that decision.

Reluctantly I wouldn't give that advice to today's undergrads. Getting into a good graduate program has become much more competitive and even a small amount of research experience may make a large difference in your application. I encourage any undergrad who may consider a PhD in their future to talk to some professors and get started in a research program. But don't let it run your life, make sure you enjoy your time at college. You'll have plenty of time to spend every waking moment on research once you start graduate school.

Thursday, December 18, 2014

The NIPS Experiment

The NIPS (machine learning) conference ran an interesting experiment this year. They had two separate and disjoint program committees with the submissions split between them. 10% (166) of the submissions were given to both committees. If either committee accepted one of those papers it was accepted to NIPS.

According to an analysis by Eric Price, of those 166, about 16 (10%) were accepted by both committees, 43 (26%) by exactly one of the committees and 107 (64%) were rejected by both. Price notes that of the accepted papers, over half (57%) would not have been accepted with a different PC. On the flip side, 83% of the rejected papers would still have been rejected. More details of the experiment here.
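
(A back-of-the-envelope check, assuming the 43 singly-accepted papers split evenly between the committees: a given committee accepted roughly 16 + 21.5 = 37.5 of the shared papers, and 21.5 of those 37.5, about 57%, would have been rejected by the other committee. Likewise it rejected 107 + 21.5 = 128.5 papers, of which 107, about 83%, would also have been rejected by the other committee.)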

No one who has ever served on a program committee should be surprised by these results. Nor is there anything really wrong or bad going on here. A PC will almost always accept the great papers and almost always reject the mediocre ones, but the papers in the middle are at a similar quality level and personal tastes come into play. There is no objective perfect ordering of the papers and that's why we task a program committee to make those tough choices. The only completely fair committees would either accept all the papers or reject all the papers.

These results can lead to a false sense of self-worth. If your paper is accepted you might think you had a great submission; more likely you had a good submission and got lucky. If your paper was rejected, you might think you had a good submission and were unlucky; more likely you had a mediocre paper that would never get in.

In the few days since NIPS announced these results, I've already seen people try to use them not only to trash program committees but for many other subjective decision making. In the end we have to make choices on who to hire, who to promote and who to give grants. We need to make subjective decisions and those done by our peers aren't always consistent but they work much better than the alternatives. Even the machine learning conference doesn't use machine learning to choose which papers to accept.

Monday, December 15, 2014

Joint Center for Quantum Information and Computer Science

(Guest post by Andrew Childs who is now at the Univ of MD at College Park)



We have recently launched a new Joint Center for Quantum Information and Computer Science (QuICS) at the University of Maryland. This center is a partnership
with the National Institute of Standards and Technology, with the support and participation of the Research Directorate of the National Security Agency/Central Security Service. QuICS will foster research on quantum information and computation.

We are pleased to announce opportunities for Hartree Postdoctoral Fellowships
(deadline: December 30, 2014) and Lanczos Graduate Fellowships. Outstanding postdoctoral and graduate researchers with an interest in quantum information processing are encouraged to apply.

QuICS complements a strong program in quantum physics research at the Joint Quantum Institute. Maryland is also home to a new Quantum Engineering Center.
It's an exciting time for quantum information here.

Thursday, December 11, 2014

The Ethics of Saving Languages

The linguist John McWhorter wrote an NYT opinion piece entitled Why Save a Language? where he argues why we should care about saving dying languages, basically that language gives us a window into culture. As a computer scientist I appreciate the scientific value of studying languages, but perhaps the question is not whether we should care but whether it is ethical to save languages.

Languages developed on geographical and geopolitical boundaries. Even as new methods of communication came along such as postal mail, the printing press, the telephone and television there never was a strong reason to learn multiple languages save for some small European nations and some professions such as academics and politics.

Then came the Internet and location mattered less but language barriers still persist. I've certainly noticed a marked increase in the number of young people around the world who know basic conversational English, much from the content they consume online. There's also a sizable amount of content in all the major languages.

But if you speak a small language where all the other native speakers are geographically very close to you, you lose this networked connection to the rest of humanity. Your only hope is to learn a second language; that second language might become a first language, and so many of these small languages start to disappear.

I understand the desire of linguists and social scientists to want to keep these languages active, but to do so may make it harder for them to take advantage of our networked society. Linguists should study languages but they shouldn't interfere with the natural progression. Every time a language dies, the world gets more connected and that's not a bad thing.

Tuesday, December 09, 2014

Godel and Knuth Prize Nominations due soon. Which would you rather win? Or ...

(Alg Decision Theory conference in Kentucky: here.)

Knuth Prize Nominations are due Jan 20, 2015.
For info on the prize see here, if you want to nominate someone
go here.

Godel Prize Nominations are due Jan 31, 2015.
For info on the prize see here, if you want to nominate someone
go here.

Would you rather:

  1. Win a Godel Prize
  2. Win a Knuth Prize
  3. Have a prize named after you when you are dead
  4. Have a prize named after you when you are alive
I pick 4; however, I doubt I'll have any of 1, 2, 3, 4 happen to me.

How about you?

Monday, December 01, 2014

Cliques are nasty but Cliques are nastier

BILL: Today we will show that finding large cliques is likely a nasty problem

STUDENT: Yes! In my High School the Cliques were usually at most six people and they gossiped about everyone else. They were very nasty.

BILL: Um, yes. Picture in your school that everyone is a node in a graph and that two nodes are connected if they are friends. In your school a clique of size 6 would be 6 people who all liked each other.

STUDENT: Actually, the people in a clique secretly hated each other and sometimes belonged to other cliques that would gossip about people in the first clique.

BILL: We might need  the Hypergraph version to model your school.


Computer Scientists and Graph Theorists call a set of nodes that are all connected to each other a CLIQUE- pronounced CLEEK.

High School Kids call a group of people who hang out together a CLIQUE- pronounced CLICK.
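
Back to the graph-theory meaning: here is a brute-force sketch of clique-finding (exponential time, which is exactly the nastiness Bill is warning the class about):

    from itertools import combinations

    def is_clique(nodes, edges):
        # True iff every pair of the given nodes is connected.
        return all((u, v) in edges or (v, u) in edges
                   for u, v in combinations(nodes, 2))

    def max_clique(vertices, edges):
        # Try all subsets, largest first -- 2^n subsets in the worst case.
        for size in range(len(vertices), 0, -1):
            for cand in combinations(vertices, size):
                if is_clique(cand, edges):
                    return cand
        return ()

    # Friendship graph: 1, 2, 3 form a triangle; 4 knows only 3.
    edges = {(1, 2), (1, 3), (2, 3), (3, 4)}
    print(max_clique([1, 2, 3, 4], edges))  # (1, 2, 3)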

Which term came first? Why are they pronounced differently when they are quite related to each other? Do the members of a high school clique really hate each other?

Sunday, November 23, 2014

Guest Post about Barbie `I can be an engineer' -- Sounds good but it's not.


There is now an I can be an engineer Barbie. That sounds good! It's not. Imagine how this could be turned around and made sexist. What you are imagining might not be as bad as the reality. Depends on your imagination.

Guest Blogger Brittany Terese Fasy explains:

Remember the controversy over the Barbie doll that said
"Math class is tough!"?  Well, Barbie strikes again.

If you haven't heard about I can be a computer engineer, it is a story about how Barbie, as a "computer engineer," designs a game but cannot code it herself. She enlists the help of her two friends, Steven and Brian, to do it for her. Then she gets a computer virus and naively shares it with her sister. Again, Steven and Brian must come to the rescue. Somehow, in the end, she takes credit for all of their work and says that she can be a computer engineer. Gender issues aside, she does not embody a computer engineer in this book. For more details, please see here.

Children need role models. Naturally, parents are their first role models. And not everyone's parent is a computer engineer / computer scientist. So books exploring different career choices provide children the much-needed opportunity to learn about something new and to have a role model (even if that role model is fictional). In principle, this book is fantastic; however, it fails to convey the right message. That is why I started a petition to Random House to pull this book off the market. The petition is here.

Progress was made as Barbie issued an apology: here. And Amazon and Barnes and Noble removed the book from their catalogs. However, neither Random House nor the author of the book has issued a statement, and it is still available at Walmart.

Until the book is completely off the market we should not stop! And maybe one day, we'll see
Barbie: I can be a Computational Geometer on the shelves.

Thursday, November 20, 2014

A November to Remember

The Imitation Game starring Benedict Cumberbatch as Alan Turing opens in the US on November 28. If you read this blog you should see that movie. If one biography of a challenged British scientist is not enough for you, The Theory of Everything with Eddie Redmayne as Stephen Hawking opened earlier this month. This month also has the physics-laden Interstellar and the nerd-hero robot adventure Big Hero 6. Science rules the box office, though likely to be clobbered by a Mockingjay.

Speaking of Turing, the ACM Turing Award will now come with a $1 million prize, up from $250K, now on par with the Nobel Prize. Thanks, Google.

Madhu Sudan will receive the 2014 Infosys Prize in Mathematics. Nimrod Megiddo wins the 2014 John von Neumann Theory Prize.

Harvard Computer Science gets 12 endowed faculty lines from former Microsoft CEO Steve Ballmer. That's about the number of former Microsoft Research theorists now on the job market. Just saying.

Sanjeev Arora asks for comments on the potential changes to STOC/FOCS discussed at the recent FOCS. Boaz Barak has set up a new CS Theory jobs site. On that note, the November CRA News has 75 pages of faculty job ads, up from 50 a year ago.

Terry Tao talked twin, cousin and sexy primes on Colbert last week. The new result he quoted is that the Generalized Elliott-Halberstam conjecture implies that there are infinitely many pairs of primes at most six apart.

Not all happy news as we lost the great mathematician Alexander Grothendieck. Tributes by Luca and Ken.

Monday, November 17, 2014

A Fly on the wall for a Harvard Faculty meeting: Not interesting for Gossip but interesting for a more important reason

I was recently in Boston for Mikefest (which Lance talked about here) and found time to talk to my adviser Harry Lewis at Harvard (adviser? Gee, I already finished my PhD. Former adviser? That doesn't quite sound right. I'll stick with adviser, kind of like when they refer to Romney as Governor Romney, or Palin as half-governor Palin). He also invited me to go to a Harvard Faculty meeting.

NO, I didn't see anything worth gossiping about. NO, I am not going to quote Kissinger: ``Academic battles are so fierce because the stakes are so low.'' NO, I am not going to say that under the veneer of cordiality you could tell there were deep-seated tensions. It was all very civilized. Plus there was a free lunch.

The topic was (roughly) which courses count in which categories in computer science for which requirements. Why is this interesting? (Hmmm- IS it interesting? You'd prob rather hear that Harry Lewis stabbed Les Valiant with a fork in a dispute about whether there should be an ugrad learning theory course.) Because ALL comp sci depts face these problems. At Mikefest and other conferences I've heard the following issues brought up:

Should CS become a service department? Math did that with Calculus many years ago. PRO: they get to tell the dean `we need to hire more tenure track faculty to teach calculus'. CON: they have to have their tenure track faculty teach calculus. (I know it's more complicated than that.)

What should a non-majors course have in it?

What should CS1, CS2, CS3 have in them (often misconstrued as the question ``what is a good first language?'', which misses the point of what you are trying to accomplish)? For that matter, should it be a 3-course intro sequence (it is at UMCP)?

Can our majors take the non-majors courses?  (At UMCP our non majors course on web design has material in it that is NOT in any majors course.)

When new courses come about (comp-bio, programming hand-held devices, Computational flavor-of-the-month) what categories do they fit into? (For an argument in favor of Machine Learning see Daume's post.) What should the categories be anyway? And what about the functors?

Which courses were at one point important but aren't any more? UMCP no longer requires a Hardware course-- is that bad?  (Yes- when I tell my students that PARITY can't be solved by a constant depth poly sized circuit, they don't know what a circuit is!)

I don't have strong opinions on any of these questions (except that, despite my best efforts, we do not require all students to learn Ramsey Theory), but I note that all depts face these questions (or need to- I wonder if some depts are still teaching FORTRAN and COBOL- and even that's not such a bad thing since there is so much legacy code out there).

I have this notion (perhaps `the grass is always greener on the other side') that MATH (and most other majors) don't have these problems. At UMCP there have only been TWO new math courses introduced on the ugrad level since 1985: Crypto (which is cross-listed with CS) and Chaos Theory. CS has new courses, new emphases, new requirements every few years. Oddly enough, when I tell this to Math Profs they ENVY that we CAN change our courses so much. Which is better, chaos or stability?

When I saw Back to the Future 2 in 1989 I noticed that its depiction of academic computer science in 2015 was that Comp Sci Depts across the country had agreed on what was important and become similar (as I imagine math is). Instead the opposite has happened- these things are still in flux. (If you can't trust a Science Fiction movie starring Michael J Fox, what can you trust?) As a sign of that, the advanced GRE in CS never really worked and has now been discontinued.

So- will CS settle down by 2015? We still have a year to go, but I doubt it.  2025? Before P vs NP is solved?

and is it OKAY if it doesn't?


Thursday, November 13, 2014

From Homework Solution to Research Paper

Inspired by the Dantzig story, I occasionally put an open problem on a class assignment. It never worked, though I did have a student get a research paper from solving a homework question the hard way.

Teaching in the early 90's, I showed Valiant's proof that computing the permanent of a 0-1 matrix was #P-complete, including showing that the 0-1 permanent was in #P, the class of functions representable as the number of accepting paths of a nondeterministic polynomial-time Turing machine.
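
For concreteness, the permanent is the determinant without the signs; a brute-force sketch:

    from itertools import permutations
    from math import prod

    def permanent(A):
        # perm(A) = sum over permutations s of prod_i A[i][s(i)].
        # For a 0-1 matrix this counts perfect matchings of the
        # associated bipartite graph; n! terms, so only for tiny n.
        n = len(A)
        return sum(prod(A[i][s[i]] for i in range(n))
                   for s in permutations(range(n)))

    print(permanent([[1, 1], [1, 1]]))  # 2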

I gave a homework assignment to show that the permanent of a matrix with non-negative integer entries was in #P. The answer I expected was to construct an appropriate NP machine whose number of accepting paths equalled the permanent, and some students came up with such a proof.

One of the students, Viktória Zankó, took a different approach, creating a reduction that mapped an integer matrix A to a 0-1 matrix B such that permanent(A) = permanent(B). A fine solution, reducing the problem to a previously solved case.

So what's the rub? Such a reduction was an open problem and simplified Valiant's paper. Valiant only had the reduction for integer matrices A with small entries and needed a mod trick to show the 0-1 permanent is #P-complete. Zankó's construction eliminated the need for the mod trick.

And that's how Viktória Zankó got a research paper from solving a homework problem.