Ruminations on computational geometry, algorithms, theoretical computer science and life
Sunday, April 03, 2005
Numb3rs Review: "Identity Crisis"
Plot Summary: Don discovers a crime with an M.O. identical to a case that he thought he had solved a year ago. He gets Charlie to tell him that the likelihood of a coincidence is very small, and then goes after the real killer. Along the way, Charlie insults a fingerprint specialist, but she gets her revenge, with a "butcher's hook" (no no, not that kind).
What I found more interesting was a real life Numb3rs story (with no crime though), via the New Yorker, starring the brothers Chudnovsky, and a delicately tampered rug. And if you're tired of such seriousness, check out this profile of the most beautiful mathematics grad student on TV, Navi Rawat (courtesy Sepia Mutiny).
Tags: Numb3rs, review
The /. translation
- I'll put it in StarCraft terms: you're spending your minerals on upgrading your Zealots, and failing to invest in the pylons and tech structures that would allow you to build a whole frickin' fleet of Protoss Carriers
- You can't cut Counter Strike research! What will I play now?
- All useful CS research was done in the 70s
- We don't need no stinkin' guvmint: a teenage hacker can do better
- Why should DARPA fund academic research ?
- Why do CS researchers need tons of money anyway ? All they do is think !
Saturday, April 02, 2005
The role of the university in the current funding crisis
Although it is not clear that the dot-com bust had anything to do with increased submission rates to conferences/funding agencies, it is fair to say that the dot-com boom had a lot to do with this.
I recall that in the late 90s and early 00s, universities were desperately trying to increase the sizes of their CS departments, allegedly in response to increased undergraduate enrollment and the consequent increased demand for teachers.
Needless to say, when you hire tenure-track professors, you get more teaching, but you also get a heck of a lot more research, as you have more slaves running the tenure treadmill as fast as their legs can go. That inevitably leads to more submissions to conferences and more requests for money.
But tenure-track jobs often lead to tenure, i.e. permanent jobs. These professors are not likely to stop publishing and stop requesting funding, but the underlying supply of students appears to be drying up (at least at the graduate level, though probably not at the undergraduate level). What happens then ?
This appears reminiscent of what has happened in biology. The PCR revolution led to an explosion in the number of researchers in biology and genetics. The NIH has historically been generous in funding graduate work in biology. However, funding for PIs and post-docs is far more scarce, creating a supply-demand bottleneck that means long years toiling at postdocs for ever-elusive faculty jobs.
In computer science, we are far from that, partly because industry jobs are to an extent still a viable option for researchers. But as industrial research labs cut back, and funding for basic research dries up, we could easily be facing the same kind of funding crisis that biology faces today. And to draw another lesson from biology, what is true for rats is true for humans: as resources dwindle, organisms become more ferocious towards each other. If you think your grant reviews were harsh before, watch out !
Dwindling research funding in the U.S.
The Defense Advanced Research Projects Agency at the Pentagon - which has long underwritten open-ended "blue sky" research by the nation's best computer scientists - is sharply cutting such spending at universities, researchers say, in favor of financing more classified work and narrowly defined projects that promise a more immediate payoff.
Others have already commented on this article: it is interesting to read it in conjunction with the latest David Patterson letter in CACM. One of the most fascinating tables in this letter is a chart outlining the evolution (in terms of money, and where the research was done) of many of the most important technologies available today. Technology transfer is a rather overused buzzword, but this chart really shows how technology moves between academia and industrial research labs (in both directions), and then to commercial settings. If ever one wanted proof of the value of academic "blue-sky" research, this is it.

As always, there are a number of related trends:
- In a previous letter, David Patterson lamented the increasing submission loads to various conferences. If you look at the chart of NSF funding requests and acceptances over the past few years, it looks strikingly similar.
- Research is becoming more conservative; again, David Patterson in his two letters cites the lack of 'out-there' conference papers as well as the 'chilling effect' of funding on truly innovative grant proposals.
Being conservative about accepting 'out-there' papers is not necessarily a bad thing; however, I'd argue that one cannot afford to be conservative about funding new ideas: to use a rather abhorrent analogy, funding is like the VC seed capital for a company, and conference/journal peer review is like the market place; if you cut off the source of innovation, there is no fuel for good ideas to emerge.
One has to ask though: How much is ACM itself responsible for this state of affairs ? I have read more interesting articles in CACM in the past three months than I had in the prior 7 years, and I am truly grateful that David Patterson is raising issues that are familiar to all of us in research. However, the ACM, over the past many years, has morphed from one of the truly representative computer science organizations (many foundational research results were published in CACM in days gone by), to an IT trade organization. The ACM does not speak to me anymore, at a time when an active lobbying group for basic computer science research has never been more important.
Part of this, for better or for worse, has been the explosion in the IT industry over the last ten years. With the kind of money pouring into the IT world, it probably made sense to cater to the needs of IT professionals. However, companies are fickle creatures; one of the main problems we face right now is the lack of industrial funding for basic research, something that used to be a big part of the computer science landscape.
Let's be clear here: DARPA is right that if they are
devoting more attention to "quick reaction" projects that draw on the fruits of earlier science and technology to produce useful prototypes as soon as possible.
then they don't need to fund university research (and in fact academic researchers wouldn't want to work on these problems). Their mistake of course is not seeing the value of long-term fundamental research even now, but you have to ask: when the flagship magazine of the most well known computer science organization cannot be distinguished from a random trade magazine, who's to tell the difference between academic research and corporate software efforts ?
Hat tip to Chandra Chekuri for pointing out the above articles.
Friday, April 01, 2005
Theory Day at Maryland
Here's the program:
- Seffi Naor (Technion): Balanced Metric Labeling
- Yossi Azar (Tel-Aviv University): The Price of Routing Unsplittable Flow
- Moses Charikar (Princeton): Aggregating Inconsistent Information: Ranking and Clustering
- Sanjeev Khanna (Univ. of Pennsylvania): New Algorithms and Hardness Results for Disjoint Paths Problems
No wifi please, we're Japanese...
Todd Ogasawara went to Japan for business and discovered that unlike in the US, where hotels are tripping over themselves to offer WiFi, over there it’s mighty rough trying to find a hotel with any sort of broadband access (whether wired or wireless) for guests. Not that there isn’t WiFi in Japan or anything like that, but apparently most hotels assume that you’ll have some sort of high-speed access via your cellphone and so don’t feel compelled to offer WiFi.
Looks like blogging will be greatly reduced on my upcoming trip :)
Proofs
The article distinguishes two kinds of computer proof: one where the computer is used to enumerate and check a large number of subcases in a proof, and one where the computer is used to generate a proof via an automated theorem-proving process. The article argues that
It is possible that mathematicians will trust computer-based results more if they are backed up by transparent logical steps, rather than the arcane workings of computer code, which could more easily contain bugs that go undetected.
I am skeptical whether mathematicians will be willing to trust a computer over their own intuition. The dark secret at the heart of mathematics is that most proofs are established by ultimately appealing to the reader's intuition. In fact, it is more likely that I can convince a listener of my proof by intuitive arguments than by pure formal methods. This is a sliding scale of course; my intuitively obvious claim is another person's completely nontrivial lemma, and it is through the give and take among mathematicians in the community that a proof is 'stabilized' or 'confirmed'.
Ultimately, it's worth remembering that part of the beauty of mathematics is that rigid formal truths have beautiful intuitive interpretations, and it is these intuitive metaphors that are much more lasting than the proofs themselves.
A quote from the last paragraph of the Economist article is fitting, and would make Timothy Gowers proud:
Why should the non-mathematician care about things of this nature? The foremost reason is that mathematics is beautiful, even if it is, sadly, more inaccessible than other forms of art. The second is that it is useful, and that its utility depends in part on its certainty, and that that certainty cannot come without a notion of proof.
Tags: Economist, proof
Monday, March 28, 2005
Open Problems
It's worth pointing out that there are open problems, and there are open problems. Anyone who writes a paper can come up with open problems; these are merely the questions you didn't get time to address, and are either too bored or too unmotivated to work on yourself. These are not really that interesting. Good open problems are not necessarily easy, and are not too hard either (P?=NP is an open problem in the true sense of the word, but is not particularly useful as a proposal for this kind of forum).
A good open problem thus has some intrigue, has some surprise, and should tantalize the reader; the solution should appear to be just over the horizon, rather than indistinctly fading away. Any thoughts on good candidates ?
Friday, March 25, 2005
New (to me) book on randomization
Cambridge University Press has cornered the market on randomized algorithms. After publishing the truly excellent Randomized Algorithms by Rajeev Motwani and Prabhakar Raghavan (ed: wasn't Rajeev your advisor ? me: that has nothing, nothing at all to do with it), they have now published Probability and Computing by Michael Mitzenmacher and Eli Upfal. I caught a brief look at the book the other day, and it seems to have a slightly higher focus on probabilistic tools and techniques and possibly a slightly lesser focus on randomized algorithms, but I could be wrong here.
All in all, looks like a great book to have; it's definitely on my list of books-to-get.
Tuesday, March 22, 2005
Blogging from FSU
The latest issue of Salon (you can read the article if you sit thru an ad) reviews two books on Gödel that were mentioned here earlier: Rebecca Goldstein's Incompleteness and Palle Yourgrau's World Without Time...
Monday, March 21, 2005
New Abel Prize(s)
for his groundbreaking contributions to the theory and application of partial differential equations and to the computation of their solutions
An interesting section from the long popularized explanation of his work:
Lax considers himself both a pure and an applied mathematician. His advice to young mathematicians is summarized in 'I heartily recommend that all young mathematicians try their skill in some branch of applied mathematics. It is a gold mine of deep problems whose solutions await conceptual as well as technical breakthroughs. It displays an enormous variety, to suit every style; it gives mathematicians a chance to be part of the larger scientific and technological enterprise. Good hunting!'
I really like the fact that the Abel Prize committee supplies these popular explanations of the work of the awardees. The ACM has a detailed press release for its awardees, but a popular exposition of the work would be a great addition.
In other news, the Abel Prize Committee is funding a new prize called the Ramanujan award. From the news blurb on the Abel Prize site:
The Prize will be awarded annually to a researcher from a developing country less than 45 years of age at the time of the award, who has conducted outstanding research in a developing country. The first winner will be announced in 2005.
The award amount will be $10,000.
Wednesday, March 16, 2005
Mmmm, ice cream
There is, or at least was briefly, an ice cream flavor created in honor of physicist Richard Feynman. Devised by Toscanini's Ice Cream, it was inspired by the same story that gave the classic book Surely You're Joking, Mr. Feynman! its title.
On that note,
The Newton: comforting and familiar, with a rich chocolate sauce. But it ends, and you are sad.
The Heisenberg: Every time you think you can nail it down, it escapes you.
The Einstein: Mindbending, dude !
The Godel: I could describe the flavor, but I need another ice cream...
Tuesday, March 15, 2005
Clickety click
Friday, March 11, 2005
Numb3rs Review: "Counterfeit Reality"
After a brief hiatus, Numb3rs is back, with another kidnapping episode. As usual, spoilers abound..
Plot Summary: Series of store robberies and two murdered teens lead to a gang of counterfeiters, who kidnapped an artist and forced her into doing note sketches for them. If Don, Charlie, and the rest of their merry men can't find her in time, it's lights out !
This is a good "character development episode", with backstory about Don's love life being revealed. I guess the producers didn't have the courage to go all CSI/Law and Order/Law and Order SVU/Law and Order CI/Law and Order WillYouStopWithAllTheSpinoffs on us and dispense with the characters' backgrounds. Pity...(Yes, I know this is a frequent rant, but ...)
As I always say, if you want character development, don't read this blog ! My heart races when they say 'wavelets', which in fact they did. Local cop refers to a 'video-enhancement algorithm' that Charlie has developed, and I was wondering when wavelets would come up. As it turns out, there are scenes with pictures of what appear to be wavelet basis functions (though they didn't look like Haar wavelets at least).
I have now seen 7 episodes of Numb3rs, and in these seven episodes, Charlie uses statistical modelling, attacks P vs NP, does cryptography, understands advanced number theory, knows civil engineering, is a mean hacker and consults on string theory problems. Not to mention having tenure. So my question is: What could Charlie's field of specialization possibly be ? I mean, what on earth could he have written a thesis in ? The 'professor of applied math' part could cover the modelling and the civil engineering, but the rest ? And does he not know that at certain universities, math department professors wrinkle their noses when a professor of 'applied math' walks by ? I have certain departments in mind, but I'm not naming names ;)
Best/Most inane line of the episode: Charlie's grad student (who appears to be a wavelet expert as well as a combinatoricist), says 'In combinatorics, sometimes we vary the angle of attack'.
In algorithms, sometimes we eat the sugared candy...
- Next episode review
- Previous episode review
Thursday, March 10, 2005
Gödel, Original Sin, and Sausage...
"Communist takeover of mathematics" in which individuality and intuition would be subjugated, for the common good, to logical rules.This tidbit is funny.
Yet, Gödel is routinely deployed by people with antirationalist agendas as a stick to whack any offending piece of science that happens by. A typical recent article, "Why Evolutionary Theories Are Unbelievable," claims, "Basically, Gödel's theorems prove the Doctrine of Original Sin, the need for the sacrament of penance, and that there is a future eternity."To that, I have only one response: the Sacred Sausage Invocation. Amen.
Tags: Gödel
Wednesday, March 09, 2005
The answer...
It becomes clear fairly quickly once you start playing with examples that the thing to do is create some kind of bipartite graph, where the left side is the "bad answer" and the right side is the "good answer". Since picking all the vertices on a side is a valid cover, and you have to pick all of at least one side, it is clear that the smaller side is the optimal solution. So the goal is to make an example with more nodes on the left than on the right, and force the algorithm to pick something from the left each time (if you could direct the edges, to force picking from one side, then it would be easier, but then you've just created a SET COVER instance).
The catch is that just by averaging, if the left side has more nodes than the right, then it seems that the nodes on the right should have higher degree and should be picked first. How does one avoid this ? This is where the "trick" is used. Gronwall's theorem bounds the sum of the divisors of n: roughly, the sum of the divisors of n behaves like n ln ln n as n goes to infinity. So here's the construction:
Let 1 = d1 < d2 < ... < dk = n be the divisors of n. Create n nodes on the right side. For each i, create di nodes on the left side. Of these di nodes, connect the first to the first n/di nodes on the right, and the second to the next n/di, and so on. There are (asymptotically) n ln ln n nodes on the left side. The maximum degree on the left side is n/d1 (= dk!), and the maximum degree on the right side is k, since each node on the right connects to only one node in each group on the left. It is easy to check that dk is always greater than k, so the algorithm always picks the first node from the left side. Repeat, use induction, clean your hands, and voila! you end up picking n ln ln n nodes instead of n: a ln ln n approximation gap.
This is not the best solution possible. With a bit more work based on this solution alone you can go up to c ln n. I'll leave that as an exercise ;).
Tags: VertexCover, algorithms, numbertheory
Tuesday, March 08, 2005
An old question, a new approach
One of the most obvious attacks on this problem is to use a greedy approach. Pick the vertex of highest degree, and remove all the adjacent edges. Repeat this till all edges have been removed. Clearly the set of vertices picked form a cover. But how good is this ?
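For concreteness, the greedy procedure in a few lines of Python (my own sketch, with the graph given as a set of vertex-pair edges):

```python
# Greedy vertex cover: repeatedly pick a vertex of maximum remaining degree.
def greedy_vertex_cover(edges):
    remaining = set(edges)
    cover = set()
    while remaining:
        degree = {}
        for u, v in remaining:
            degree[u] = degree.get(u, 0) + 1
            degree[v] = degree.get(v, 0) + 1
        best = max(degree, key=degree.get)   # ties broken arbitrarily
        cover.add(best)
        remaining = {e for e in remaining if best not in e}
    return cover

print(greedy_vertex_cover({(1, 2), (2, 3), (3, 4), (4, 1)}))   # a 4-cycle: two vertices suffice
```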
The answer is, 'not very good', and one of the first homework problems in a graduate class on algorithms is to show why, by coming up with a counter-example. There are some standard (but not trivial) constructions that can be used, and you can amuse yourself with thinking about this.
However, this is not the point of this post. Amit Chakrabarti is teaching a graduate algorithms class at Dartmouth, and Khanh Do Ba, a junior taking the class, came up with an ingenious solution that I at least had not heard of before. I will not tell you the solution just yet, but I will give you the hint that Amit gave me when he told me about it: the student used Gronwall's Theorem.
Happy pondering. Tune in tomorrow for the answer.
Tags: VertexCover, algorithms, numbertheory
Sp311ing
“Oh no, a crime!”
“Maybe we can solve it – with spelling!”
“Applesauce. A-P-P-L-E-S-A-U-C-E. Applesauce.”
“So, you are the American dog who spells for the Great Satan, eh? Well, now you will spell for the jihad!”
“Never!”
“We’ll never disarm this bomb in time! It’s impossible!”
“You can’t spell ‘impossible’ without ‘possible’! Let’s go, team!”
“Australopithecine. A-U-S-T-R-A…”
Tags: Numb3rs
Monday, March 07, 2005
Alpher, Bethe, Gamow
From Quark Soup:
Here is Gamow's own account, from his popular science book _The Creation of the Universe_:
"The results of these calculations were first announced in a letter to _The Physical Review_, April 1, 1948. This was signed Alpher, Bethe, and Gamow, and is often referred to as the 'alphabetical article.' It seemed unfair to the Greek alphabet to have the article signed by Alpher and Gamow only, and so the name of Dr. Hans A. Bethe (_in absentia_) was inserted in preparing the manuscript for print. Dr. Bethe, who received a copy of the manuscript, did not object, and, as a matter of fact, was quite helpful in subsequent discussions. There was, however, a rumor that later, when the alpha, beta, gamma theory went temporarily on the rocks, Dr. Bethe seriously considered changing his name to Zacharias.
"The close fit of the calculated curve and the observed abundances is shown in Fig. 15, which represents the results of later calculations carried out on the electronic computer of the National Bureau of Standards by Ralph Alpher and R. C. Herman (who stubbornly refuses to change his name to Delter.)"
Tags: Bethe
First we wrestle, then we ooze
How is the modern professoriate like a slime mold? Consider this description from Steven Johnson:
[T]hey solve problems by drawing on masses of relatively stupid elements, rather than a single, intelligent "executive branch." They are bottom-up systems, not top-down. They get their smarts from below. In a more technical language, they are complex adaptive systems that display emergent behavior. [Emergence: Scribner, 2001]
Read the article: it's interesting.
(HT: Three-Toed Sloth)
Tags: academe
Saturday, March 05, 2005
updating web pages..
for many academics, the practice of keeping one's web page up to date with both their long term and short term research interests is something that gets prioritized just below rabid-squirrel-wrestling.
And just for the heck of it:
Tags: rabid, squirrel, wrestling
Friday, March 04, 2005
Wake up, ACM.
Current case in point: RSS. A recent issue of CACM covered the blogosphere in great detail, so clearly the ACM is familiar with the notion of RSS. However, there are still no RSS feeds for tables of contents for journals. ACM instead provides a TOCalert email service, something that was in vogue quite some years ago.
There is now a standard for RSS feeds from journals. Nature Publishing Group, which publishes most of the Nature:XX journals, provides RSS feeds for all of its journals. The IEEE provides feeds for its journals (though not in the PRISM standard format). The ArXiv provides feeds for all subject categories. Why can't the ACM ?
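For comparison, consuming such a feed takes only a few lines; here is a sketch using the feedparser package (the arXiv feed URL below is an assumption on my part - check it before relying on it):

```python
# Pull a subject/journal RSS feed and print the latest titles.
# Assumes the third-party "feedparser" package is installed.
import feedparser

feed = feedparser.parse("http://arxiv.org/rss/cs.CG")   # assumed arXiv comp. geom. feed URL
for entry in feed.entries[:10]:
    print(entry.title)
    print("   ", entry.link)
```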
The reason I bring this up has to do with CiteULike. One of the things you can do with CiteULike is track RSS feeds of many journals archived there. There are numerous CS journals, including many in the area of machine learning. However, since the ACM doesn't provide RSS feeds, these can't be tracked inside CiteULike.
Even if ACM can't provide feeds in a standard format, any feed would be useful. Maybe some of you readers know people who know people who know people.... inside ACM ?
Wednesday, March 02, 2005
On Grace Murray Hopper
Her messages to us:
If you want to get something done, do it first. Apologize (if apologies are necessary) later.
Don't let anyone tell you you can't succeed, especially if you are a woman.
She gave us lengths of wire cut to the distance light travels in a nanosecond, and handed out pepper-grain-sized wires that she called picoseconds for the same reason.
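The lengths themselves are a one-line calculation (a back-of-the-envelope check, nothing more):

```python
# How long is a "nanosecond" (and a "picosecond") of wire?
c = 299_792_458                               # speed of light in m/s (in vacuum)
print(c * 1e-9 * 100, "cm per nanosecond")    # ~30 cm: Hopper's famous wire length
print(c * 1e-12 * 1000, "mm per picosecond")  # ~0.3 mm: roughly a grain of pepper
```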
Her talk was less about doing computer science and more about having the will to succeed against all odds. Very inspirational.
CS Awards
Jennifer Rexford has won the Grace Murray Hopper Award.
With a mixture of pride and regret I point out that all three are AT&T Labs alumni.
If you want to know more about what these awards are given for, visit the ACM Awards page. The Kanellakis prize commemorates an achievement that spans theory and practice (it has theoretical heft and has been shown to be effective in practice). The Gödel Prize acknowledges a single journal paper that has had great impact in the last seven years. The Grace Murray Hopper Award is a 'young achiever' award, given to researchers below the age of 35. The Turing Award, of course, is the computer science "Nobel" for lifetime achievement. The Knuth prize is the theory lifetime achievement award.
Gödel, Turing and Knuth need no introduction. Paris Kanellakis was a professor at Brown who died tragically in a plane crash in 1995. He was one of the pioneers in database theory. Grace Murray Hopper was one of the first programmers of modern computers, the first compiler designer, and is often credited with coining the term 'bug' to describe an error in a program.
Now that's the way to say it:
A university is not a Walmart -- Daniel Lemire.
Monday, February 28, 2005
a different set of numbers...
87,000+ words
70,000+ visits
113,000+ page views.
....
and a tired yet satisfied blogger, 1 year old, and plugging away.
Ah, the good old days...
[ brief pause while I am overcome with nostalgia...]
It should come as no surprise to the Indians who read this that the Punjabi section was one of the top swear picks. Yiddish also seems like a good choice, but I am surprised at the inclusion of French in the top picks list.
Tags: insults
The Road to Reality
The perfect reader for ''The Road to Reality,'' I fantasized, would be someone comfortable traversing the highest planes of abstract reasoning, yet who had missed some of the most important landmarks of scientific history -- a being, perhaps, from another place and time.
I don't know. It sounds an awful lot like a mathematician to me.
Sunday, February 27, 2005
SODA Outtakes: Market Equilibria
This outtake is much easier, because there is a well-written survey I can point to. The topic is market equilibria, or the problem of determining when equilibrium sets in when traders exchange goods at various prices, and how fast this can be determined.
More formally, the problem is: given an initial allocation of goods to traders (a "supply" vector, where each "dimension" represents a single commodity), and a "preference" function for each trader (I have a high affinity for gold and cookies, you have an aversion to silver, but really like beer), determine a set of prices for goods, and a "demand" vector for each trader so that if each trader sells what they have and buys what they want (at the given prices), their preference is maximized.
Here, the preference function component for any commodity is a concave function of its amount. Note that you want a concave function in order to model "diminishing returns". In addition, having concave utilities helps ensure unique optimal solutions (with strictly concave utilities, the average of two distinct optimal solutions would have strictly higher utility, a contradiction).
There are also sanity conditions (you can only buy with whatever money you have from selling, and the total amount of goods is fixed - this does not model producer economies).
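To make the setup concrete, here is a toy tâtonnement-style sketch for a two-trader, two-good exchange economy with Cobb-Douglas utilities (my own illustration, not one of the algorithms from the survey; Cobb-Douglas is chosen because the demands have a closed form):

```python
# Toy exchange economy: trader i has utility u_i(x) = prod_j x_j^(a_ij), a_ij >= 0, rows sum to 1.
# At prices p, trader i's demand for good j is x_ij = a_ij * budget_i / p_j, where
# budget_i = p . endowment_i. Prices are nudged in the direction of excess demand.
import numpy as np

endow = np.array([[1.0, 0.0],     # trader 0 starts with all of good 0
                  [0.0, 1.0]])    # trader 1 starts with all of good 1
alpha = np.array([[0.3, 0.7],     # preference weights
                  [0.6, 0.4]])

p = np.ones(2)
for _ in range(5000):
    budgets = endow @ p                               # money each trader gets by selling
    demand = alpha * budgets[:, None] / p[None, :]    # Cobb-Douglas demands
    excess = demand.sum(axis=0) - endow.sum(axis=0)   # excess demand per good
    p = np.maximum(p + 0.01 * excess, 1e-9)           # tatonnement step
    p /= p.sum()                                      # prices matter only up to scale

print("equilibrium prices:", p)
print("market clears?     ", np.allclose(excess, 0, atol=1e-4))
```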
This SIGACT News survey by Codenotti, Pemmaraju and Varadarajan lays out the terrain of market equilibria very nicely, outlining the history, the techniques (some beautiful methods) and what is currently known.
It's a good example of an area that has been studied extensively (in economics) but where the computer science angle brings in questions that were not necessarily of concern earlier: issues of computational feasibility, efficiency, approximations that are characteristic of the algorithmic way of approaching a problem.
Tags: market, equilibrium, algorithms
Friday, February 25, 2005
Numb3rs Review: "Sabotage"
Plot summary: Series of trains are derailed. At each site, a paper containing the same set of numbers is left behind. Everyone thinks it's a code, but it actually isn't (at least not in the traditional sense). Some good old sleuthing yields the answer.
It was funny when the NTSB official asked Charlie if he knew any cryptography. Although technically, isn't cryptanalysis what the code breakers do ? Larry the loony physicist had some hilarious quotes:
- "Our evening was primal, in the carnal sense, not the mathematical."
- "The math department, the least libidinous place on campus"
The only real math nuggets came at the end, when Charlie was trying to explain to the female FBI agent that math appears in nature. His example: The Fibonacci series ! Bzzttt...
As it turns out, Rudbeckia Hirta just today posted a note about the exaggerated claims concerning the Fibonacci series; that the Greeks referred to the "golden ratio" (no), that the pyramids and the Parthenon somehow magically encode this ratio (only if you round off heavily), and even how the ratio is supposed to be aesthetically pleasing (no evidence). Keith Devlin has more on this.
Oh well: if Dan Brown can do it...
Tags: numb3rs, reviews
p.s I understand that we need some kind of romantic interest to give the series some spice (although CSI and Law and Order manage perfectly well without it), but why on earth are they setting up an advisor-student relationship ? Allegedly at some distant point in the past this used to be fairly common in the humanities, but nowadays ? My only question is: how long before this becomes a "Big Issue" in the academic blogosphere ?
Meta: recent comments
Comments were broken for a while because I forgot to update the hack that allows me comments in the first place. They should be working again.
SODA Outtakes: Lloyd's algorithm and analysing heuristics
[Aside: in the theoryCS community, a "heuristic" usually refers to a procedure that has no provable performance guarantees; this could mean either no clear running time analysis, or no clear quality of optimization guarantee. The term "algorithm" is usually reserved for a procedure that has well-defined steps and ends in a well-defined state after a definite (finite) number of steps. One will occasionally find the term 'algorithm' used to refer to a heuristic in non-theory areas; beware !]
Doing such analysis, although maybe not as "sexy" as coming up with a new algorithm, is also important for "sociological" reasons. Although it might technically be correct to argue that a particular heuristic is not reliable and thus should not be used, the resolute theoretician is often faced with a mountain of empirical evidence in support of said heuristic, compared with the possibly complex and hard-to-understand algorithm that she proposes.
Some of the best known heuristics show up in clustering, especially since this is a problem that transcends many different application domains, including areas that are not as familiar with theoretical algorithmic methods. One of the best known methods for clustering with k centers is Lloyd's algorithm (or the k-means algorithm), which works essentially as a discrete variant of expectation-maximization. The algorithm itself is very simple:
- Choose k "centers"
- Assign each point to its nearest center.
- Compute the k centroids of points assigned to a given center. These are the new k "centers".
- Go back to step 2.
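A bare-bones rendering of these steps (assuming numpy; initialization and empty clusters are handled naively, purely for illustration):

```python
# Lloyd's algorithm / k-means: alternate between assigning points to the nearest
# center and moving each center to the centroid of its assigned points.
import numpy as np

def lloyd(points, k, max_iters=100, seed=0):
    rng = np.random.default_rng(seed)
    centers = points[rng.choice(len(points), size=k, replace=False)]
    for _ in range(max_iters):
        dists = np.linalg.norm(points[:, None, :] - centers[None, :, :], axis=2)
        labels = dists.argmin(axis=1)                    # nearest center for each point
        new_centers = np.array([points[labels == j].mean(axis=0)
                                if np.any(labels == j) else centers[j]
                                for j in range(k)])      # keep a center if its cluster empties
        if np.allclose(new_centers, centers):
            break                                        # converged
        centers = new_centers
    return centers, labels

pts = np.random.default_rng(1).standard_normal((500, 2))
centers, labels = lloyd(pts, k=3)
```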
The paper in question proves, among other results, that:
- In 1D, and in d dimensions (on the grid), the number of iterations of the k-means algorithm is polynomial in the number of points, the dimension, and the "spread" (the ratio of the maximum to the minimum interpoint distance).
- Even in 1D, there are example point sets on which the k-means algorithm takes a linear number of iterations to converge.
They go on to present empirical evidence in support of their claim that these algorithms generate answers of similar quality (in similar time) to k-means. This part is still a work in progress: there are very fast implementations of k-means that they don't compare against (but the data structures might be used to speed up their approach as well).
It is not clear to me that their paper answers to any degree the question of why Lloyd's algorithm works as well as it appears to. However, the fact that they can prove bounds for their alternate algorithms suggests that maybe this is a line of attack to take when analyzing Lloyd's method. These alternates are also quite simple, and so could conceivably be used as k-means replacements if one wished guaranteed running time bounds.
One final point: the authors have their code available online. Would that more would do the same !
Tuesday, February 22, 2005
CiteULike
CiteULike is a similar site, for archiving papers. Here's how it works. If you find a paper you are interested in, you click on a bookmark link, and the paper is added to a web-based collection under your name. However, what makes this unique is that if the paper link is on a "well known" site like the ACM DL, Citeseer, arxiv, or many other places, bibliographic information is automatically added to the entry (this is because the server parses pages etc and extracts the necessary bibliographic information)
Subsequently, when you want to extract bib entries for all these papers, one click gets you a .bib file for all the papers you want. This is very handy (for example) when you are doing a literature search and reading/downloading papers by the dozen.
You can also associate tags with the entries if you choose to. This gets into the "social bookmarking" aspect of CiteULike: the idea being that if you tag a bunch of papers as being related to "artgallery", and so does someone else, you will learn what the other person has found and vice versa. Personally, I am either too old-fashioned or too clueless to have made effective use of this feature, but the archiving/.bib features above are enough for me to try it out.
Like any respectable web-based service, there are RSS feeds for any view of the collection (by tag, by person, by personal tag), and there are also watch lists, so if you want to keep track of when a new page (tag or person) adds a new paper, you can do so.
Here's my paper feed. It will be an interesting experiment.
If you think your committee is rough on you...
Einstein’s conclusions were the product of pure thought, proceeding from the most austere assumptions about nature. In the century since he derived them, they have been precisely confirmed by experiment after experiment. Yet his June, 1905, paper on relativity was rejected when he submitted it as a dissertation. (He then submitted his April paper, on the size of atoms, which he thought would be less likely to startle the examiners; they accepted it only after he added one sentence to meet the length threshold.)
The article dwells at length on Gödel's interpretation of time (something I had mentioned a while ago). Read the whole thing: it's fascinating.
(HT: Libertarianoid)
Tags: Gödel, Einstein, relativity, time
Monday, February 21, 2005
A great new resource for molecular modelling
ZINC -- which, in the free software tradition of recursive acronyms, stands for ZINC Is Not Commercial -- is a free database of compounds for "virtual screening." That is, ZINC provides 3D models of chemical compounds in a standard "docking" format used in chemistry and biochemistry software, allowing researchers to assemble and test new chemical compositions on their computers.
This is really great. When I was doing my Ph.D, I mucked around with molecular structures for docking and matching purposes, and I had access only to structures that our collaborators at Pfizer provided us. Having such a large database to play with will really help geometry folks who do molecular modelling, not to mention all the medical researchers who probably already use this.
Things we take for granted..
How we take certain things for granted...
Condensing 500 pages into five paragraphs is a fool's errand, but here goes anyway.
The belief that only the community of scholars has the final authority to determine what counts as valid research or permissible speech has deep roots in the history of the university, going all the way back to its origins in medieval Europe. But it was extremely slow to develop in colonial and antebellum America, which had few institutions of higher learning that were anything but outgrowths of religious denominations.
Saturday, February 19, 2005
Numb3rs Review: "Prime Suspect"
I agree with much of what Scott says. I think the problem is a little different though: some of the random phrases attributed to mathematicians (what mathematician refers to an open problem as a "math problem" ?) actually make sense if a non-mathematician uses them, and when Don (the FBI agent) uses them, it doesn't sound incongruous at all. It's just that the writers need to switch gears a little when writing dialogue for the mathematical characters.
They've done a good job getting a sense of the mathematics; just less of a sense of the mathematician.
Plot Summary: Mathematician claims to have proof of Riemann Hypothesis. Hackers kidnap daughter, demanding proof (and algorithm), which they will then use to crack cryptography.
Major beef: there is no direct link between a solution to the Riemann Hypothesis and factoring. It is undoubtedly true that a solution will shed a lot of light on the structure of primes, but it's not the kind of immediate corollary that is commonly implied in the popular press (take this, for example).
But credit goes where credit is due: I was wondering how they would make the conceptual jump from a theorem to an actual algorithm (assuming said connection). At least they tried to make it credible, with Charlie pointing out that an algorithm could take years to develop.
A point that I didn't really consider, but both Scott and my wife noticed, was that by using the Riemann Hypothesis, the plot line introduced the idea of a problem that has remained unsolved for 150 years, and that it's good for people to realize that "math problems" can take a while to solve. In fact, this point is reinforced when Don expects Charlie to solve it in a few hours :).
Trivia note: the mathematician father was played by Neil Patrick Harris, the "original" child prodigy from Doogie Howser, M.D.
Tags: numb3rs, review
Tuesday, February 15, 2005
Erik Demaine and Origami
- The article has no quotes that make him look like an absent-minded professor.
- It does a beautiful job of capturing both the art and the science of origami.
- It has no real clangers in terms of the mathematical descriptions.
Monday, February 14, 2005
"Laws of Chance"
A recent /. post links to an article on RedNova News about a "Random Event Generator" that purports to predict world events:
One of these new technologies was a humble-looking black box known as a Random Event Generator (REG). This used computer technology to generate two numbers - a one and a zero - in a totally random sequence, rather like an electronic coin-flipper.
Now it's all very well to generate randomness from a computer, and in fact there have been suggestions that certain kinds of electrical fluctuations on a chip can be recorded and used for randomness generation. But to claim that this is "totally random" by virtue of using the magical "computer technology" is a bit of a stretch.
But wait, it gets worse:
The pattern of ones and noughts - 'heads' and 'tails' as it were - could then be printed out as a graph. The laws of chance dictate that the generators should churn out equal numbers of ones and zeros - which would be represented by a nearly flat line on the graph. Any deviation from this equal number shows up as a gently rising curve.
I presume they mean that they plot |H-T| against n. But as any book on probability will tell you, you do expect to see random fluctuations (this is after all a binomial process), and the standard deviation is around sqrt(N), so bumps are to be expected: a flat line is what would be suspicious.
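A thirty-second simulation makes the point (a fair coin, simulated with numpy; no minds involved):

```python
# The drift |#heads - #tails| of a fair coin typically grows like sqrt(N);
# a persistently flat line at zero is what should raise eyebrows.
import numpy as np

rng = np.random.default_rng(0)
for n in (10**3, 10**5, 10**7):
    heads = rng.integers(0, 2, size=n).sum()
    print(f"N = {n:>8}:  |H - T| = {abs(2 * heads - n):>6},  sqrt(N) ~ {int(n ** 0.5)}")
```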
During the late 1970s, Prof Jahn decided to investigate whether the power of human thought alone could interfere in some way with the machine's usual readings. He hauled strangers off the street and asked them to concentrate their minds on his number generator. In effect, he was asking them to try to make it flip more heads than tails.
It was a preposterous idea at the time. The results, however, were stunning and have never been satisfactorily explained.
Well it sounds like the results haven't been stated either. What exactly were these "stunning" results ? Ah, here we go:
"All the known laws of science" can't explain an unequal number of heads and tails. In other words, if I just managed to get a streak of 10 heads with a coin, I'm psychic ?Again and again, entirely ordinary people proved that their minds could influence the machine and produce significant fluctuations on the graph, 'forcing it' to produce unequal numbers of 'heads' or 'tails'.
According to all of the known laws of science, this should not have happened - but it did. And it kept on happening.
Aieee......
Saturday, February 12, 2005
Theory folks in the press
In a paper titled "Pricing Via Processing, or Combating Junk MailThis is remarkably prescient: in 1992 I was grateful to have access to email, let alone worry about junk mail. On an irrelevant note, I am impressed that the NYT has started providing real links in their articles.," two computer scientists, Cynthia Dwork and Moni Naor, came up with a way to force a sender to pay every time a message was sent - payment not in money, but in time, by applying the computer's resources to a computational puzzle, devised on the fly for that particular message.
[...snip...]
Ms. Dwork and her colleagues have named this the Penny Black Project.
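The Dwork-Naor paper constructs its puzzles rather more carefully, but purely to illustrate the "pay in computation" idea, here is a hashcash-style toy along the same lines (my sketch, not their scheme):

```python
# A toy proof-of-work "stamp": the sender must find a nonce whose hash has `bits`
# leading zero bits. Finding one costs ~2^bits hash evaluations; checking costs one.
import hashlib
from itertools import count

def stamp(message: str, bits: int = 18) -> int:
    target = 1 << (256 - bits)
    for nonce in count():
        h = hashlib.sha256(f"{message}:{nonce}".encode()).digest()
        if int.from_bytes(h, "big") < target:
            return nonce

def verify(message: str, nonce: int, bits: int = 18) -> bool:
    h = hashlib.sha256(f"{message}:{nonce}".encode()).digest()
    return int.from_bytes(h, "big") < (1 << (256 - bits))

nonce = stamp("to: someone@example.com, 2005-02-12")
print(nonce, verify("to: someone@example.com, 2005-02-12", nonce))
```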
Tags: spam, cryptography
SODA Outtakes: Approximation algorithms for metric embeddings.
This set off a flurry of research into questions of the form "What is the best distortion for embedding spaces of type A into spaces of type B". There is no way I can survey all the results in this area; a new CRC book chapter by Indyk and Matoušek should be able to cover most of the terrain, and the excellent book by Deza and Laurent has much of the technical background. Everything started (in a sense) from the result by Bourgain that any metric space can be embedded into l1 with distortion O(log n) (albeit in O(log^2 n) dimensions).
Other questions that were studied along the way:
- what is the minimum distortion that one must face when embedding A into B (metrics derived from expanders proved to be useful gadgets in this regard)
- Can one reduce the dimensionality of a space without distorting distances too much ? (See the random-projection sketch after this list.) This question has numerous implications for all kinds of problems in clustering, data mining, database search, you name it...
- Are there general subclasses of metrics that have low-distortion embeddings ? A fundamental open question here involves metrics on planar graphs. It is still open as to whether they can be embedded in l1 with constant distortion (again, this has implications for sparsest cut in planar graphs)
- Metric "Ramsey theory": if I can't embed the entire metric space, can I embed a large fraction of the points/edges with smaller distortion ? Yair Bartal just gave a nice talk at AT&T on partial embeddings (the "edge" variant).
It was shown early on that one can find the optimal distortion into l2 (up to additive error) in polynomial time using semi-definite programming. For l1, the corresponding question is NP-hard. What I have been seeing is a slew of interesting recent papers that explore hardness results and approximation algorithms for embedding problems:
- Kenyon/Rabani/Sinclair (STOC 2004): They consider bijections between two n-point metric spaces, and present a constant factor approximation in the case when the metrics are derived from a line (as long as OPT is small).
- Papadimitriou/Safra (SODA 2005): They show that the minimum distortion bijection between two 3D point sets is hard to approximate to within a factor of 3.
- Badoiu, Dhamdhere, Gupta, Rabinovich, Raecke, Ravi, and Sidiropoulos (SODA 2005): A collection of results on embedding metrics in 1D and 2D. One main result is an O(sqrt{n})-approximation for embedding a metric on the line.
- Badoiu, Chuzhoy, Indyk, Sidiropoulos (STOC 2005): I don't have a copy of the paper, but the title is indicative enough: "Low-Distortion Embeddings of General Metrics Into the Line"
p.s usual disclaimers apply: this is a blog post, not a survey, and hence is opinionated and far from comprehensive. Pointers to relevant add ons are welcome.
Turing Train Terminal
This is by far a more enjoyable variant.
Via BoingBoing.
Friday, February 11, 2005
Numb3rs review: "Structural Corruption"
Charlie was using proofs that naturalize, which won't work because of the results of Razborov and Rudich (Judd Hirsch: So use unnatural techniques)
Last night's episode dealt with calculations involving the structural integrity of a building. Not much actual math shown in this episode, and there were some curious points:
- Charlie is a "professor of applied mathematics", and yet in the first episode crazy string theorist wants his help with some calculations. I guess string theory could be viewed as applied math in some circles...
- When trying to figure out why the dead student had come to him for help with calculations (is he a math prof or a human calculator?), Charlie says "We can only infer that it had something to do with math". Uh, yeah......
- Charlie, "professor of applied math" and string theorist par excellence, can also mock up a structural integrity demo, with GUI, 3D graphics and all, in one night. Sheesh; give him a job at Google, people...
Tags: numb3rs, reviews
Thursday, February 10, 2005
Football and Dynamic Programming.
On Ph.D Theses
- your advisor
- your committee
- your friends (to see if they've been acknowledged)
- your parents (who don't necessarily read it so much as keep it as a trophy/proof that you can now get a "real job").
So if there are any graduate students reading this, my recommendation to them is: take the time to expound on your area in an introduction. You are (or should be) an expert on this topic by now, and your voice is authoritative enough, even if you may not feel that way. The reward, of having someone read your work and appreciate it, is something that can be surprisingly elusive.
Using the GPU
- OS X: "Quartz Extreme uses a supported graphics card built into your Mac to relieve the main PowerPC chip of on screen calculations. This dramatically improves system performance, making Panther much more responsive."
- Longhorn: "Longhorn will feature a graphics subset called WGF (Windows Graphic Foundation). Its goal is to unify 2D and 3D graphics operation in one and will bring 2D Windows drawing and 3D operations together. Nowadays, 3D is done using a Direct X subset with the current version 9.0c. Longhorn will also use 3D menus and 3D interfaces and will require at least Shader 2.0 compliant cards, so it will make the graphics card an important part of your PC."
- And late to the party, but getting there, Unix/X (via Thane Plambeck): " David Reveman, who became a Novell employee a couple of weeks ago, has been writing a new X server on OpenGL/Glitz called Xgl. Because Xgl is built on GL primitives it naturally gets the benefit of hardware acceleration. For example, window contents get rendered directly into textures (actually they get copied once in video memory for now), and so you get the benefit of the 3d hardware doing the compositing when you move semi-opaque windows or regions around."
Wednesday, February 09, 2005
Heal thyself ?
C'mon guys, you can do better than that.
p.s take the tour. Try searching for 'free wifi 19130': excellent !!
Monday, February 07, 2005
"To chop a tea kettle"
Scott McLemee at Inside Higher Ed has an interesting column on modern academic interactions, and how they relate to the blogosphere. The article is worth reading in its entirety, (as is the Crooked Timber note that pointed me there), but I couldn't resist excerpting the following, on discussions at conferences:
At conferences, scholars would stand up and read their papers, one by one. Then the audience would "ask questions," as the exercise is formally called. What that often meant, in practice, was people standing up to deliver short lectures on the papers they would have liked to have heard, instead -- and presumably would have delivered, had they been invited.
Hypothetically, if everyone on a panel read one another's papers beforehand, they might be able to get some lively cross-talk going. This does happen in some of the social sciences, but it seems never to occur among humanities scholars. The whole process seems curiously formal, and utterly divorced from any intent to communicate. A routine exercise, or rather perhaps an exercise in routinism. A process streamlined into grim efficiency, yielding one more line on the scholar's vita.
[...snip...]
The inner dynamic of these gatherings is peculiar, but not especially difficult to understand. They are extremely well-regulated versions of what Erving Goffman called "face work" -- an "interaction ritual" through which people lay claim to a given social identity. Thanks to the steady and perhaps irreversible drive to "professionalization," the obligation to perform that ritual now comes very early in a scholar's career.
And so the implicit content of many a conference paper is not, as one might think, "Here is my research." Rather, it is: "Here am I, qualified and capable, performing this role, which all of us here share, and none of us want to question too closely. So let's get it over with, then go out for a drink afterwards."
And the title of my post ? Read the whole article...
Friday, February 04, 2005
Numb3rs Review: "Vector"
I have to say that the writers spent a lot more time getting the math details right than getting the biology right. Why on earth would you compare geometric structures of DNA, when you could just look at the sequence ? My wife the biologist had two comments on this, her first episode.
- "This is not a TV show, it's a horror movie"
- "I've been beaten senseless"
There was an amusing incident with the eccentric string theorist where he forgot which way he was headed. I could be wrong, but I think this refers to an anecdote involving Hermann Weyl at Princeton.
There was more Numb3rs PR in the math press. Keith Devlin (the Math Guy) had an interesting article on Numb3rs at MAA Online. He does an occasional segment on Weekend Edition with Scott Simon. His interviews make for interesting listening (a recent one was on the continuum hypothesis).
Tags: numb3rs, review
And the wrath of God thunders down upon the unbelievers
He was known as an architect of the evolutionary or modern synthesis, an intellectual watershed when modern evolutionary biology was born. The synthesis, which has been described by Dr. Stephen Jay Gould of Harvard as "one of the half-dozen major scientific achievements in our century," revived Darwin's theories of evolution and reconciled them with new findings in laboratory genetics and in field work on animal populations and diversity.
Polynomial hierarchy collapse
As it turns out, this is a trivial corollary of an older result by Boppana, Håstad and Zachos that if co-NP has short interactive proofs, then PH collapses. Specifically, if Unknotting is NP-complete, then Knotting is co-NP-complete, and since it is in AM = IP(2), this implies that co-NP is in IP(2), and everything comes crashing down.
It would be nice if there was some way to augment the Complexity Zoo to make such results easier to find.
Unknotting (proof translation)
The paper relies on a construction (due to Haken?) of a triangulated 3-dimensional manifold M(K) from a given knot K, such that M(K) is homeomorphic to the 3-sphere if and only if K is unknotted. Specifically, M(K) consists of two copies of S3-T(K), glued along their boundaries (which are toruses), where T(K) is a tubular neighborhood of the knot K.
So one algorithm for deciding whether a knot K is trivial is to construct M(K) and check whether M(K) is the 3-sphere. The 3-sphere recognition problem is in NP. Hass et al. used a slightly different technique to prove that UNKNOTTING is in NP, but both NP-proofs rely on something called "normal surface theory", introduced by Kneser in 1929 and used by Haken in the early 1960s to boil everything down to integer programming(!). (For a brief outline of normal surface theory, read the introduction of Benjamin Burton's Ph.D thesis.)
Hara, Tani, and Yamamoto describe an interactive proof protocol for KNOTTING based on this characterization, similar to the interactive protocol for graph non-isomorphism. The prover is trying to convince the verifier that her knot is nontrivial. The verifier constructs two triangulated 3-manifolds—the manifold M(K) described above, and a particular triangulation S(K) of the 3-sphere. After randomly permuting the vertex labels, the verifier presents the prover with one of the two triangulations, chosen uniformly at random, without telling the prover which it is. The prover then (supposedly) tells the verifier whether his given manifold is the 3-sphere.
(1) If the prover says that S(K) is the 3-sphere, the verifier learns nothing.
(2) If the prover says that S(K) is NOT the 3-sphere, the verifier knows he's lying and assumes the knot is trivial.
(3) If the prover says that M(K) is the 3-sphere, the verifier assumes the knot is trivial.
(4) if the prover says that M(K) is NOT the 3-sphere, the verifier assumes the knot is nontrivial.
If the knot is trivial, then no prover can convince this verifier with probability better than 1/2. If the knot is nontrivial, an honest prover can convince this verifier with probability 1/2. Running this protocol twice inflates the probabilities to 1/4 and 3/4.
END PROOF
As an aside, the original two-round interactive proof for graph nonisomorphism is a beautiful example of the power of interactive proofs. It was first demonstrated by Goldreich, Micali and Wigderson in 1986, and I reproduce the proof here in full (for definitions, refer to the original paper).
1. Given two graphs G1, G2 on n vertices, the verifier randomly picks i ∈ {1,2} and a random permutation π of [1..n].
2. The verifier then sends π(Gi) to the prover.
3. The prover returns j ∈ {1,2}, and the verifier accepts iff j = i.
Proof:
If G1 and G2 are nonisomorphic, the prover can figure out which is which by comparing the graph sent by the verifier to G1 and G2, and thus will always return the correct answer.
If G1 and G2 are isomorphic, the prover cannot tell the difference, and thus can "fool" the verifier with probability 1/2.
QED.
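The protocol is easy to simulate on toy graphs; in the sketch below the all-powerful prover is just a brute-force isomorphism check (fine at this size, and purely illustrative):

```python
# Simulate the GMW graph-nonisomorphism protocol on small graphs.
import random
from itertools import permutations

def relabel(edges, pi):
    # apply the vertex relabeling pi (a dict) to a graph given as a frozenset of frozenset edges
    return frozenset(frozenset(pi[v] for v in e) for e in edges)

def prover_says(received, G1, n):
    # an unbounded prover: brute-force whether the received graph is isomorphic to G1
    for pi in permutations(range(n)):
        if relabel(G1, dict(enumerate(pi))) == received:
            return 1
    return 2

def one_round(G1, G2, n):
    i = random.choice((1, 2))                          # verifier's secret coin
    pi = dict(enumerate(random.sample(range(n), n)))   # random relabeling
    sent = relabel(G1 if i == 1 else G2, pi)
    return prover_says(sent, G1, n) == i               # accept iff the prover guesses i

n = 4
path = frozenset(frozenset(e) for e in [(0, 1), (1, 2), (2, 3)])
star = frozenset(frozenset(e) for e in [(0, 1), (0, 2), (0, 3)])
for name, (A, B) in [("non-isomorphic", (path, star)), ("isomorphic", (path, path))]:
    rate = sum(one_round(A, B, n) for _ in range(2000)) / 2000
    print(f"{name}: prover wins a round with probability ~{rate:.2f}")   # ~1.0 vs ~0.5
```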
Thursday, February 03, 2005
Unknotting...
Exhibit 1, in no particular order, is the following result, by Hara, Tani and Yamamoto:
Unknotting is one of the more fundamental problems in knot theory. If we think of a (tame) knot as a (really tangled) loop in three dimensions, then much of simple knot theory focuses on the question: Given two knots, are they equivalent ? (where equivalence is expressed in terms of homeomorphisms).
The trivial knot (or the "unknot") is the simple circle in 3D, and thus the unknotting problem is: given a knot, decide whether it is equivalent to the unknot.
It was first shown in 1961 by Haken that Unknotting was decidable. After a brief gap of 36 years, Hass, Lagarias and Pippenger showed that Unknotting was in fact in NP (!), and conjectured that it was in NP ∩ co-NP. Other results along the way showed that Genus (is the genus of a knot less than g ?) was in PSPACE, and ManifoldGenus (a generalization to a given arbitrary manifold) was NP-Complete.
The new result speaks for itself. Formally, the authors show that Knotting (the complement of Unknotting) is in AM, by presenting a two-round interactive protocol for it. They also mention a corollary that if Unknotting is NP-Complete, then PH collapses to the second level. If I had paid better attention in my complexity class I would have figured out how this follows, but alas... I'd be grateful for any enlightenment. (Update: here it is)
The tragedy for me is that although it is clear that this result is interesting, the proof itself is rather impenetrable, requiring a knowledge of combinatorial topology above and beyond what I can muster up. This is why translators are needed ! (Update: here it is)
Knot theory is a good example of a (relatively arcane) subfield of mathematics that has suddenly become profoundly important in "real life". For more on this, read what well-known theoretical computer scientist John Baez has to say :).
On a more frivolous note, you can do knot art ! Calling our resident mathematical knot knitter...
Frank Harary, 1921-2005.
Timothy Gowers and the study of mathematics
Ever since then, I've looked for expositors who could extract the same kind of beauty out of mathematics and computer science, distilling its essence in a way that might seem obvious to those of us who practice it, but that allows the core concepts to be revealed to a non-technical audience in a way that captures the imagination.
Timothy Gowers, the Fields Medalist, is well known for his expository articles on a variety of topics. He gave a speech in 2000 in Paris at an event sponsored by the Clay Mathematics Institute. The speech was titled 'The Importance of Mathematics', and I highly encourage watching it.
What struck me the most about this lecture was his eloquent defense of mathematics, and how he conveyed a sense of the aesthetic beauty of the field (why mathematicians refer to a theorem as 'beautiful'). With beautiful examples, he systematically laid out the landscape of a mathematical life, not in terms of topics, but in terms of how mathematical work is done.
He firmly dismissed the idea of mathematics needing to be useful (something we have had our own controversies over in theoryCS), and distinguished between the two statements
(1) Mathematics is not a useful subject
(2) A typical mathematician does not try to be useful.
He argued that (1) is false even though (2) is usually true, and that it should be this way.
A simple way of expressing some of the ideas in his lecture would be:
Mathematics is surprisingly useful, because complexity arises from simplicity and beauty reveals itself to be important.
I just bought his book 'Mathematics: A Very Short Introduction', which brings out these ideas in much more detail, and also puts to rest some of the usual canards associated with mathematics (the 30yr old rule, the lack of women, etc).
To some degree, the description of the daily activities of an algorithms researcher are similar to that of a mathematician, but in many ways they are different. We need a similar exposition for theoryCS !