
Saturday, June 19, 2010

The Shape of Shape Analysis Research, Part III

(a brief series on shape analysis: for earlier episodes, click here)

Shape matching research in computational geometry is fundamentally distance-based. In other words, we start with a distance function, and then design algorithms to compute it, or minimize it under transformations, or approximate it, and so on.

There's an important problem with this point of view. While computing the distance between two shapes is an important tool in shape analysis, it's not the only problem. Other equally important problems include:
  • Finding a shape similar to a query shape
  • Matching pieces of shapes together
  • Organizing shapes into groups (i.e., clustering)
And so the problem with the distance-based viewpoint is that all you get at the end is an abstract metric space. You can compute d(x,y) in a reasonable amount of time (maybe), but you lack all the additional structure needed to solve these other problems efficiently. With our modern knowledge of metric embeddings, it's always possible to ask whether these distances can be embedded in a more tractable space, but it turns out that for the measures of interest (Hausdorff, Frechet, earthmover), this cannot be done without incurring huge errors.

The idea of shape spaces turns this process around. Rather than starting with the distance, and trying to find a space to embed it in, shape-space based methods start with a mapping that takes a shape to a single point in a (usually curved) space, and use an induced metric (usually some kind of geodesic) as the distance.

By at least one unsourced account, this view of shape dates back to Riemann, but the modern formulation of this approach started with David Kendall, in the 70s. His idea was extremely elegant.

Consider a collection of closed simply connected regions of the plane (the shapes), each shape described by k points on its boundary. Each of these points can be described by the two coordinates (x,y), which we will write as the complex number x+iy. By a translation, we can ensure that the centroid of each shape lies at the origin. This loses one (complex) degree of freedom, yielding a k-1 dimensional complex vector.

Next, consider what it means to rotate the shape around the origin. In the complex plane, this corresponds to multiplying each coordinate by a unit complex number z = exp(i theta), and rescaling the shape corresponds to multiplying by a positive real. Quotienting by all nonzero complex multiples (the appropriate projective identification, which removes both rotation and scale) means that we can identify a shape with a single point in k-2 dimensional complex projective space.
The distance between two shapes is now defined as the geodesic distance between the corresponding points in this space.
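
To make this concrete, here is a minimal numerical sketch of the construction (my own illustration, not taken from Kendall's papers), assuming the standard formula arccos|&lt;z,w&gt;| for the geodesic distance on complex projective space; the function names and example shapes below are hypothetical.

```python
import numpy as np

def preshape(points):
    """Map k planar points (given as complex numbers) to a centered,
    unit-norm vector: the 'pre-shape' of the configuration."""
    z = np.asarray(points, dtype=complex)
    z = z - z.mean()               # remove translation (centroid at the origin)
    return z / np.linalg.norm(z)   # remove scale

def kendall_distance(points_a, points_b):
    """Geodesic distance in Kendall's shape space: arccos |<z, w>|.
    Taking the absolute value of the Hermitian inner product quotients
    out rotation (multiplication by exp(i*theta))."""
    z, w = preshape(points_a), preshape(points_b)
    c = abs(np.vdot(z, w))                  # |<z, w>|, invariant under rotation
    return np.arccos(np.clip(c, 0.0, 1.0))

# Sanity check: a rotated, scaled, shifted copy of a shape should be at
# (numerically) zero distance from the original.
square = [0 + 0j, 1 + 0j, 1 + 1j, 0 + 1j]
moved  = [p * 2j + (3 + 4j) for p in square]     # scale by 2, rotate 90 deg, translate
print(kendall_distance(square, moved))           # ~0.0
print(kendall_distance(square, [0j, 2 + 0j, 3 + 1j, 1 + 3j]))  # > 0
```

The point to notice is that invariance is baked into the representation: a translated, rotated and rescaled copy of a shape maps to the same point of the space, so its distance to the original is zero by construction.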

There are a few important points to note here:
  1. Each shape of k points is mapped to a single point in a k-2 dimensional space.
  2. All shapes are assumed to have the same number of points, which correspond across shapes. 
  3. The space is constructed by quotienting the original representation (the k-dimensional complex vector) by the group of similarity transformations (translations, rotations, and scalings).
This last point is particularly crucial: the invariance under transformations is folded directly into the representation, rather than being something to "solve" via minimization.

The general program outlined by Kendall (map shapes to points on a manifold quotiented by a suitable set of transformations) has led to many other constructions, among the more notable being Bookstein's shape space and the Michor-Mumford representation for planar closed curves invariant under diffeomorphisms (which bears a strong resemblance to a summed variant of the Frechet distance). These methods have (for reasons unknown to me) taken up residence primarily in the computer vision community.

A Critique.

There is much to like about the shape space approach to shape analysis. Fundamentally, by embedding shapes in a space with structure, it gives us both a distance measure and a geometry to play with, and this is invaluable. However, there are serious limitations to the ideas developed thus far.
  • Computation: It's all very well to come up with a mathematically elegant formulation of a distance as a geodesic, but it's a lot harder to actually compute these distances. In practice, researchers often resort to heuristics with no guarantees beyond local convergence. To me, this is like building a beautiful mansion in a pit of mud: you can't get in or out without a lot of dirt and pain. 
  • Scalability: the mathematical complexity also makes it harder to do scalable computations on shapes.
  • Global vs local features: I'll have more to say about this later, but these approaches (generally speaking) construct a global signature for a shape, which limits one's ability to do partial matching. 
  • Correspondences: The Kendall method, at least, requires explicit correspondences between the points of each shape. Finding correspondences is one of the most annoying parts of shape analysis (and it affects most methods for comparing shapes). 
Next: We examine the problem of hearing shape, or how the Laplacian starts to figure in. 

Monday, June 14, 2010

The Shape of Shape Analysis Research: Part II

(a brief series on shape analysis: for earlier episodes, click here)

Shape analysis in the geometry community follows a fairly standard pattern. It goes something like this:
  1. Fix class of shapes (points or curves, usually)
  2. Define distance between two shapes
  3. Minimize distance under transformations (rotations, translations, sometimes scaling)
  4. Approximate distance if necessary
  5. Study distance for special classes of shapes.
There are many distances that have been studied in this manner for point sets, including the bottleneck matching distance, the Hausdorff distance, the RMS matching distance and the earthmover distance. For curves, the list is much shorter. The Frechet distance is pretty much the only game in town, with a brief cameo by its first cousin, the dynamic time warping distance.
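
To give a flavor of what these measures look like computationally, here's a minimal sketch (not tied to any particular paper) of the discrete Frechet distance between two polygonal curves, via the classic quadratic-time dynamic program; the function name and input format are my own.

```python
import math

def discrete_frechet(P, Q):
    """Discrete Frechet distance between polygonal curves P and Q, given as
    lists of (x, y) vertices. d[i][j] is the smallest achievable 'leash
    length' for coupling the first i+1 points of P with the first j+1
    points of Q; the recurrence takes O(len(P) * len(Q)) time."""
    def dist(p, q):
        return math.hypot(p[0] - q[0], p[1] - q[1])

    m, n = len(P), len(Q)
    d = [[0.0] * n for _ in range(m)]
    for i in range(m):
        for j in range(n):
            cost = dist(P[i], Q[j])
            if i == 0 and j == 0:
                d[i][j] = cost
            elif i == 0:
                d[i][j] = max(d[i][j - 1], cost)
            elif j == 0:
                d[i][j] = max(d[i - 1][j], cost)
            else:
                # either curve (or both) advances by one vertex
                d[i][j] = max(min(d[i - 1][j], d[i][j - 1], d[i - 1][j - 1]), cost)
    return d[m - 1][n - 1]

print(discrete_frechet([(0, 0), (1, 0), (2, 0)], [(0, 1), (1, 1), (2, 1)]))  # 1.0
```

The continuous Frechet distance needs the free space machinery mentioned below, but the discrete version already shows the structure these measures share: minimize, over all couplings, the maximum matched-pair distance.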

This process has brought forth a number of interesting ideas and tools - among them
  • free space decompositions and equivalence classes of regions with respect to combinatorial structure in the solution (so you can enumerate solution types)
  • how to approximate spaces of transformations via carefully chosen grids in order to get provable approximations for distance estimation (a brute-force version of this idea is sketched right after this list)
  • connections between geometric matching and string matching via different kinds of hashing tricks.
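
To illustrate the second item, here is a hedged brute-force sketch (my own toy example, not a specific published algorithm) that approximately minimizes the Hausdorff distance under rotation by sweeping a uniform grid of angles; the provable versions choose the grid spacing as a function of the error parameter and the diameter of the input.

```python
import math

def hausdorff(A, B):
    """Symmetric Hausdorff distance between two finite point sets."""
    def directed(X, Y):
        return max(min(math.dist(x, y) for y in Y) for x in X)
    return max(directed(A, B), directed(B, A))

def rotate(points, theta):
    c, s = math.cos(theta), math.sin(theta)
    return [(c * x - s * y, s * x + c * y) for (x, y) in points]

def min_hausdorff_under_rotation(A, B, steps=360):
    """Approximately minimize the Hausdorff distance over rotations of B
    about the origin by trying a uniform grid of angles. Choosing the grid
    spacing relative to the diameter of B is what turns this brute force
    into a provable approximation."""
    return min(hausdorff(A, rotate(B, 2 * math.pi * k / steps)) for k in range(steps))

A = [(1, 0), (0, 1), (-1, 0), (0, -1)]
B = rotate(A, 0.3)                         # a rotated copy of A
print(hausdorff(A, B))                     # > 0
print(min_hausdorff_under_rotation(A, B))  # ~0 (up to grid resolution)
```
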
I'd go so far as to argue that these tools are more important than the measures themselves, because of their applicability. 

But while shape matching is a rich area of research within CompGeom, it's less clear what influence this research has had in the larger shape analysis environment. Some of the obstacles (some self-inflicted, some not) are:

Definitions involving 'max' terms.
Combinatorially, it's easier to define distance measures where the distance is governed by a max over some quantity (usually a max distance over pairs of points). It's easier to define the space of possible solutions, and make the requisite combinatorial arguments. But such measures are highly non-robust, since any one 'outlier' can cause the distance measure to report a large distance.
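
Here's a tiny numerical illustration of that fragility (made-up point sets, and directed one-sided distances for brevity): a single stray point blows up the max-based, Hausdorff-style distance, while an averaged 'sum'-style variant barely moves.

```python
import math

def directed_max_dist(A, B):
    """Directed Hausdorff-style distance: max over a in A of min over b in B."""
    return max(min(math.dist(a, b) for b in B) for a in A)

def directed_avg_dist(A, B):
    """A 'sum'-style variant: average over a in A of min over b in B."""
    return sum(min(math.dist(a, b) for b in B) for a in A) / len(A)

A = [(i, 0.0) for i in range(100)]
B = [(i, 0.1) for i in range(100)]          # B is A shifted up by 0.1
B_outlier = B + [(50.0, 100.0)]             # ...plus one stray point

print(directed_max_dist(B, A), directed_avg_dist(B, A))                  # 0.1   0.1
print(directed_max_dist(B_outlier, A), directed_avg_dist(B_outlier, A))  # 100.0 ~1.09
```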

This problem is usually fixed by introducing 'outlier-sensitive' variants of the measure under consideration, which leaves some combinatorial structure intact (at a price), or by replacing 'max' measures by 'sum' measures, which can often be inelegant, and usually destroys most of the algorithmic tools developed for the original case.

Reliance on the 'expensive exact, cheap approximate' framework.
This might require some explanation. Most of the above shape matching measures can be computed for two fixed shapes relatively easily. But if you have to compute them under transformation groups, things get hairy really quickly. Running times in the realm of n^7 or n^8 for point sets in three dimensions are not unusual.

What's worse though is the kind of impracticality involved in these algorithms (yes I know, n^7 doesn't need further beating, but still...). They usually involve finding intersections of surfaces defined by medium-to-high-degree polynomials, and then walking through the arrangements thus defined.

There's no way on God's green earth that anyone in their right mind will implement these algorithms, so we usually apply the standard bait-and-switch: "if you love these expensive exact methods, you're REALLY going to like our fast and tasty approximations!!". There have been some really creative tools designed to solve shape matching problems approximately, but what they often do is hide high complexity in terms involving the error epsilon, while looking well behaved in n.

It's difficult to overstate the depth and beauty that approximation methods bring to the field of computational geometry as a whole, and you should read Sariel's soon-to-be-book on this topic to learn more. But in the narrow realm of shape analysis, this two-step 'expensive exact, sort-of-cheap approximation' routine has the following problems:
  • Designing exact algorithms with humongous running times only confuses the very people you'd like to have use your algorithms. They'll just turn around and use some other measure. Coming along afterwards and saying, "but wait, we can APPROXIMATE IT" doesn't help, because no one's vested enough in the measure to even care at this point.
  • Hiding expensive dependencies in the error term is problematic. Theoretically, it's a sound form of analysis - isolating the dependency from the dominant 'input size' term. But practically speaking, a bound that's cubic in 1/epsilon is not a whole lot better than a bound that's (say) quadratic in n, for reasonable values of epsilon: at epsilon = 0.01, (1/epsilon)^3 is already a million, which is exactly what n^2 gives you at n = 1000. Which is to say, they're both terrible. You can of course protest that in practice things will work a lot better (and they often do!), but again, you've lost your audience, who wasn't really vested in the distance measure in the first place!
Missing the forest for the trees. 
This is an odd kind of objection to level at theoretical research. But here's the problem as I see it. If your focus is the primitive 'compute distance between two shapes', then you use that as the platform and layer on things like 'minimize under transformations; find near neighbors', and so on.

The problem is that this approach focuses on the 'tree' (the distance measure) while in my mind missing the 'forest': namely, the larger set of problems of shape analysis, of which computing the distance measure is but one. I'll develop this idea as I talk about other approaches to shape analysis - the point I want to make is that you have to view the particular distance measure between shapes in the context of a large set of possible applications. A measure that looks nice and clean and well founded on its own may be less well suited for the rigors of supporting a class of analysis problems than a different measure that's maybe less elegant, but more flexible when seen in context.


A coda. 
I was discussing some of these issues with someone, and I think a clarification is in order.

I'm not trying to dictate which distance measures people can or cannot study. There are all kinds of interesting geometric puzzles that have cropped up while studying shape matching (subquadratic algorithm for the Frechet distance?).

But when we look at the CompGeom shape matching literature through the lens of "Is this a way of designing the theoretical foundations of the larger shape analysis research program", then the above objections become more relevant.

In the next installment, I'll switch over to the line of shape analysis research that originated with David Kendall and others, and present a similar critique of that body of work.

Thursday, May 13, 2010

The Shape of Shape Analysis Research: Part I

Shape analysis is a topic that is almost a killer app for computational geometry. Where the word 'almost' comes in is an interesting story about the tension between computational power and mathematical elegance.

Shape analysis is defined by the generic problem:
Given two (or more) shapes, determine whether they are similar.
Simple, no? I don't need to list the applications for this problem. Or maybe I should: computer vision, computational biology, graphics, imaging, statistics, computer-aided design, and so many more.

It would not be an exaggeration to say that in many ways, shape analysis is as fundamental a problem as clustering. It's a problem that people have been studying for a very long time (I heard a rumor that Riemann once speculated on the manifold structure of shapes). Especially in the realm of biology, shape analysis is not merely a way to organize biological structures like proteins: it's a critical part of thinking about their very function.

Like any problem of such richness and depth, shape analysis has spawned its own ecology of concepts. You have the distances, the transformation groups, the problem frameworks, the feature representations, the algorithms, and the databases. You now even have the large data sets of very large shapes, and this brings nontrivial computational issues to the forefront.

Shape analysis (or shape matching) has been a core part of the computational geometry problem base for a long time. I've seen papers on point pattern matching from the late 70s. There's been a steady stream of work all through the past 30 years, introducing new concepts, distances and algorithm design principles.

In parallel, and mostly within the computer vision community, there have been other efforts, focused mainly on designing more and more elegant distance measures. Computer graphics got in on the action more recently as well, with a focus on different kinds of measures and problems.

What I think is unfortunate (and this is entirely my own opinion) is that there's a strong disconnect between the developments happening in the computational geometry community, and the parallel efforts in the more 'applied' communities.

I'm not merely talking about lack of awareness of results and ideas. I believe there are fundamentally different ways in which people go about attacking the problems of shape analysis, and I think all sides could benefit greatly from understanding the strengths of the others.

Specifically, I think that within our community, we've focused for far too long on measures that are easy to define and relatively easy to compute (while not being trivial to work with). On the more 'math/vision' side, researchers have focused much more on measures that have the 'right' kind of mathematical structures, but have bailed miserably on computational aspects.

Over the next post or two, I want to develop this argument further, and hopefully end with a set of problems that I think are worthy of study both from their intrinsic appeal and the unsolved computational issues they present.

Stay tuned....

p.s. The clustering series will also continue shortly. Oh do I love summer :)
