Showing posts with label algorithms.

Thursday, August 25, 2011

Compare-And-Set in Memcache

With the most recent release (1.5.3, last week) App Engine's Python API for Memcache has added a new feature, Compare-And-Set. This feature (with a different API) was already available in Java; it has also been available in the non-App-Engine pure-Python memcache client. In fact, I designed the App Engine Python API for this feature to be compatible with the latter, since most of the rest of the App Engine Python API also strives to be at least a superset of that package.

But what is it? There seems to be little information on how to use Compare-And-Set with memcache. It is also sometimes (incorrectly) referred to as Compare-And-Swap -- incorrect, because the cas() operation does not actually "swap" anything. The first response when we closed the bug requesting this feature was "Some examples of usage are appreciated." So here goes.

The basic use case for Compare-And-Set is when multiple requests that are being handled concurrently need to update the same memcache key in an atomic fashion. Let's assume you are managing a counter in memcache. (Actually, you could use the incr() and decr() operations to update 64-bit integer counters atomically, but just for argument's sake assume you cannot use those -- there are other data types for which the memcache service does not have built-in support.)

The naive code to update a counter would be something like this:

def init_counter(key):
. memcache.set(key, 0)

def bump_counter(key):
. counter = memcache.get(key)
. assert counter is not None, 'Uninitialized counter'
. memcache.set(key, counter+1)

(Aside: I don't want to have to think about how to get blogger to properly format Python code. I really don't. So just bear with the dots I use for indentation. Okay? Comments pointing me to solutions will be DELETED.)

(Aside 2: The assert is kind of naive; in practice you'll have to somehow deal with counter initialization. You also should implement a backup for your counter using the App Engine datastore, so that it can survive eviction by the memcache service. However interesting these details are on their own, I leave them for another time.)

Hopefully you can spot the problem in this version of bump_counter(): if two requests execute concurrently (on different instances of the same app), the sequence of operations might be as follows, labeling the two requests as A and B:

A: counter = memcache.get(key) # Reads 42
B: counter = memcache.get(key) # Reads 42
A: memcache.set(key, counter+1) # Writes 43
B: memcache.set(key, counter+1) # Writes 43

So even though two requests were executed, the counter only gets incremented by one. This is called a race condition. Various interleavings of these lines can have the same effect; a race condition occurs whenever B reads the counter before A has written it (or vice versa).

You could try to guard against this by reading the counter value back and checking that it was incremented by one; however this solution still has a race condition (see if you can figure it out for yourself). There are other solutions possible involving a separate "lock" variable, managed using the add() and delete() operations. However these in general require more server roundtrips, and it is pretty hard to manufacture a decent lock out of the basic memcache operations (try for yourself -- think about what would happen if your request was somehow aborted after acquiring the lock, without having a chance of releasing it).
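Just to make the fragility concrete, here is a rough sketch of what such a lock might look like; the lock key name and the 30-second expiration are made up for illustration, and the comments flag exactly the problems described above:

from google.appengine.api import memcache

def bump_counter_with_lock(key):
    lock_key = key + '.lock'  # hypothetical lock key derived from the counter key
    # add() only succeeds if the key does not exist yet, so it can act as a lock;
    # note that this busy-waits against the memcache service until the lock is free.
    while not memcache.add(lock_key, 1, time=30):
        pass
    try:
        counter = memcache.get(key)
        assert counter is not None, 'Uninitialized counter'
        memcache.set(key, counter + 1)
    finally:
        # If the request is aborted before reaching this line, the lock stays
        # held until the 30-second expiration -- exactly the problem noted above.
        memcache.delete(lock_key)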

Using the Compare-And-Set operation, writing a reliable bump_counter() function is a cinch:

def bump_counter(key):
. client = memcache.Client()
. while True: # Retry loop
. . counter = client.gets(key)
. . assert counter is not None, 'Uninitialized counter'
. . if client.cas(key, counter+1):
. . . break

The changes from the previous version are the Client object, the retry loop, and the use of gets() and cas() instead of get() and set(). There are several essential differences:
The Client object is required because the gets() operation actually squirrels away some hidden information that is used by the subsequent cas() operation. Because the memcache functions are stateless (meaning they don't alter any global values), these operations are only available as methods on the Client object, not as functions in the memcache module. (Apart from these two, the methods on the Client object are exactly the same as the functions in the module, as you can tell by comparing the documentation.)

The retry loop is necessary because this code doesn't actually avoid race conditions -- it just detects them! The memcache service guarantees that when used in the pattern shown here (i.e. using gets() instead of get() and cas() instead of set()), if two (or more) different client instances happen to be involved in a race condition like the one I showed earlier, only the first one to execute the cas() operation will succeed (return True), while the second one (and later ones) will fail (return False). Let's spell out the events that happen when a race condition occurs:

A: counter = client.gets(key) # Reads 42
B: counter = client.gets(key) # Reads 42
A: client.cas(key, counter+1) # Writes 43, returns True
B: client.cas(key, counter+1) # Returns False
B: counter = client.gets(key) # Reads 43
B: client.cas(key, counter+1) # Writes 44, returns True

Another refinement I've left out here for brevity is to set a limit on the number of retries, to avoid an infinite loop in worst-case scenarios where there is a lot of contention for the same counter (meaning more requests are trying to update the counter than the memcache service can process in real time). You can figure out how to code this for yourself. (UPDATE: see comments #1 and #2 for a note about busy-waiting.)
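For the record, a bounded version might look something like this; the retry count of 10 and the False return value are arbitrary choices, not part of the API, and it assumes the same App Engine memcache module as the snippets above:

from google.appengine.api import memcache

def bump_counter(key, retries=10):
    client = memcache.Client()
    for _ in range(retries):  # bounded retry loop
        counter = client.gets(key)
        assert counter is not None, 'Uninitialized counter'
        if client.cas(key, counter + 1):
            return True
    return False  # gave up; the caller decides how to handle the contention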

Now let me explain roughly how this actually works. For some people, knowing how something works helps them understand how to use it; that is often the case for me when I am trying to understand a new concept.

The gets() operation internally receives two values from the memcache service: the value stored for the key (in our example the counter value), and a timestamp (also known as the cas_id). The timestamp is an opaque number; only the memcache service knows what it means. The important thing is that each time the value associated with a memcache key is updated, the associated timestamp is changed. The gets() operation stores this timestamp in a Python dict on the Client object, using the key passed to gets() as the dict key.

The cas() operation internally adds the timestamp to the request it sends to the memcache service. The service then compares the timestamp received with a cas() operation to the timestamp currently associated with the key. If they match, it updates the value and the timestamp, and returns success. If they don't match, it leaves the value and timestamp alone, and returns failure. (By the way, it does not send the new timestamp back with a successful response. The only way to retrieve the timestamp is to call gets().)
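If it helps, here is a toy model of that service-side bookkeeping -- emphatically not App Engine's actual implementation, just a sketch of the compare-then-update step described above:

class ToyMemcacheShard:
    """A made-up, single-process stand-in for one memcache shard."""

    def __init__(self):
        self._data = {}  # key -> (value, timestamp)

    def set(self, key, value):
        _, stamp = self._data.get(key, (None, 0))
        self._data[key] = (value, stamp + 1)  # every write bumps the timestamp

    def gets(self, key):
        return self._data[key]  # the real client squirrels the timestamp away for you

    def cas(self, key, new_value, stamp):
        value, current = self._data[key]
        if current != stamp:
            return False  # someone else wrote since the gets(); leave it alone
        self._data[key] = (new_value, current + 1)
        return True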

Of course, there's one more important ingredient: the App Engine memcache service itself behaves atomically. That is, when two concurrent requests (for the same app id) use memcache, they will go to the same memcache service instance (for historic reasons called a shard), and the memcache service has enough internal locking so that concurrent requests for the same key are properly serialized. In particular this means that two cas() requests for the same key do not actually run in parallel -- the service handles the first request that came in until completion (i.e., updating the value and timestamp) before it starts handling the second request.

And that's Compare-And-Set in a nutshell. If you have questions please don't hesitate to ask!

(UPDATE: The memcache API defines batch versions of most of its APIs. For example, to get multiple keys in a single call, there is get_multi(); to set multiple keys, there is set_multi(). Corresponding to cas(), there is cas_multi(). But there is no gets_multi(): instead, you can use get_multi(keys, for_cas=True). Finally, there's cas_reset(), which clears the dict used to store timestamps. But I haven't figured out what to do with it yet. :-)
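For completeness, a sketch of bumping a whole batch of counters with those batch APIs might look like this; the helper name is made up, it silently ignores uninitialized counters, and (like the earlier loop) it retries without bound. If I read the docs right, cas_multi() returns the keys that were not updated:

from google.appengine.api import memcache

def bump_counters(keys):
    client = memcache.Client()
    while keys:
        counters = client.get_multi(keys, for_cas=True)  # batch equivalent of gets()
        mapping = dict((k, v + 1) for k, v in counters.items())
        # cas_multi() returns the keys that were NOT updated; retry just those.
        keys = client.cas_multi(mapping)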

Friday, June 3, 2011

The depth and breadth of Python

As of late I'm noticing a trend: I'm spending more time having in-person in-depth conversations, and less time coding. While I regret the latter, I really enjoy the former. Certainly more than weekly meetings, code reviews, or bikeshedding email threads. (I'm not all that excited about blogging either, as you may have guessed; but some things just don't fit in 140 characters.)

Two conversations with visitors I particularly enjoyed this week were both with very happy Python users, and yet they couldn't be more different. This to me is a confirmation of Python's enduring depth and breadth: it is as far from being a one-trick language as you can imagine.

My first visitor was Annie Liu, a professor of computer science (with a tendency to theory :-) at Stony Brook University in New York State. During an animated conversation that lasted nearly three hours (and still she had more to say :-) she explained to me the gist of her research, which appears to be writing small Python programs that implement fundamental algorithms using set comprehensions, and then optimizing the heck out of them using an automated approach she summarized as the three I's: Iterate, incrementalize, and implement. While her academic colleagues laugh at her for choosing such a non-theoretical language as Python, her students love it, and she seems to be having the last laugh, obtaining publication-worthy results that don't require advanced LaTeX skills, nor writing in a dead language like SETL (of which she is also a great fan, and which, via ABC, had some influence on Python -- see also below).

Annie told me an amusing anecdote about an inscrutable security standard produced by NIST a decade ago, with a fifty-page specification written in Z. She took a 12-page portion of it and translated it into a 120-line Python program, which was much more readable than the original, and in the process she uncovered some bugs in the spec!

Another anecdote she recounted had reached me before, but somehow I had forgotten about it until she reminded me. It concerns the origins of Python's use of indentation. The anecdote takes place long before Python was created. At an IFIP working group meeting in a hotel, one night the delegates could not agree about the best delimiters to use for code blocks. On the table were the venerable BEGIN ... END, the newcomers { ... }, and some oddities like IF ... FI and indentation. In desperation someone said the decision had to be made by a non-programmer. The only person available was apparently Robert Dewar's wife, who in those days traveled with her husband to these events. Despite the late hour, she was called down from her hotel room and asked for her independent judgement. Immediately she decided that structuring by pure indentation was the winner. Now, I've probably got all the details wrong here, but apparently Lambert Meertens was present, who went on to design Python's predecessor, ABC, though at the time he called it B (the italics meant that B was not the name of the language, but the name of the variable containing the name of the language). I checked my personal archives, and the first time I heard this was from Prof. Paul Hilfinger at Berkeley, who recounted a similar story. In his version, it was just Lambert Meertens and Robert Dewar, and Robert Dewar's wife chose indentation because she wanted to go to bed. Either way it is a charming and powerful story. (UPDATE: indeed the real story was quite different.)

Of course Annie had some requests as well. I'll probably go over these in more detail on python-ideas, but here's a quick rundown (of what I could remember):
  • Quantifiers. She is really longing for the "SOME x IN xs HAS pred" notation from ABC (and its sibling "EACH x IN xs HAS pred"), which superficially resemble Python's any() and all() functions, but have the added semantics of making x available in the scope executed when the test succeeds (or fails, in the case of EACH -- then x represents a counterexample). A rough emulation in today's Python is sketched after this list.
  • Type declarations. (Though I think she would be happy with Python 3 function annotations, possibly augmented with the attribute declarations seen in e.g. Django and App Engine's model classes.)
  • Pattern matching, a la Erlang. I have been eying these myself from time to time; it is hard to find a syntax that really shines, but it seems to be a useful feature.
  • Something she calls labels or yield points. It seems somewhat similar to yield statements in generators, but not quite.
  • She has only recently begun to look at distributed algorithms (she had some Leslie Lamport anecdotes as well) and might prefer sets to be immutable after all. Though that isn't so clear; her work so far has actually benefited from mutating sets to maintain some algorithmic invariant. (The "incrementalize" of the three I's actually refers to a form of "differentiation" of expressions that produce a new set for each input.)
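As promised above, here is a rough sketch of how one might emulate SOME and EACH in current Python; xs and pred are placeholders, and the sentinel trick is just one possible spelling:

_missing = object()  # sentinel, in case None is a legitimate element of xs

def some(xs, pred):
    """Return (True, witness) if pred holds for some x in xs, else (False, None)."""
    x = next((x for x in xs if pred(x)), _missing)
    return (True, x) if x is not _missing else (False, None)

def each(xs, pred):
    """Return (True, None) if pred holds for every x, else (False, counterexample)."""
    x = next((x for x in xs if not pred(x)), _missing)
    return (True, None) if x is _missing else (False, x)

ok, x = some([1, 3, 4, 7], lambda n: n % 2 == 0)  # ok == True, x == 4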
The contrast with my visitor the next day couldn't be greater. Through a former colleague I got an introduction to Drew Houston, co-founder and CEO of the vastly successful start-up company Dropbox. Dropbox currently has 25 million users, stores petabytes of data on Amazon S3, is profitable, and is not for sale. Drew is an easygoing MIT graduate who is equally comfortable discussing custom memory allocators, the world of venture capitalism, and how to keep engineers happy; he likes hard problems and winning.

Python plays an important role in Dropbox's success: the Dropbox client, which runs on Windows, Mac and Linux (!), is written in Python. This is key to its portability: everything except the UI is cross-platform. (The UI uses a Python-ObjC bridge on Mac, and wxPython on the other platforms.) Performance has never been a problem -- with the understanding that a small number of critical pieces were written in C, including a custom memory allocator used for a certain type of objects whose pattern of allocation involves allocating 100,000s of them and then releasing all but a few. Before you jump in to open up the Dropbox distro and learn all about how it works, beware that the source code is not included and the bytecode is obfuscated. Drew's no fool. And he laughs at the poor competitors who are using Java.

Next Monday I'm having lunch with another high-tech entrepreneur, a Y Combinator start-up founder using (and contributing to) App Engine. Maybe I should just cancel all weekly meetings, sign off from all mailing lists, and focus on two things: meeting Python users and coding. That's the life!

Monday, April 27, 2009

Final Words on Tail Calls

A lot of people remarked that in my post on Tail Recursion Elimination I confused tail self-recursion with other tail calls, which proper Tail Call Optimization (TCO) also eliminates. I now feel more educated: tail calls are not just about loops. I started my blog post when someone pointed out several recent posts by Pythonistas playing around with implementing tail self-recursion through decorators or bytecode hacks. In the eyes of the TCO proponents those were all amateurs, and perhaps that's so.

The one issue on which TCO advocates seem to agree with me is that TCO is a feature, not an optimization. (Even though in some compiled languages it really is provided by a compiler optimization.) We can argue over whether it is a desirable feature. Personally, I think it is a fine feature for some languages, but I don't think it fits Python: The elimination of stack traces for some calls but not others would certainly confuse many users, who have not been raised with tail call religion but might have learned about call semantics by tracing through a few calls in a debugger.

The main issue here is that I expect that in many cases tail calls are not of a recursive nature (neither direct nor indirect), so the elimination of stack frames doesn't do anything for the algorithmic complexity of the code, but it does make debugging harder. For example, if you have a function ending in something like this:
if x > y:
    return some_call(z)
else:
    return 42
and you end up in the debugger inside some_call() whereas you expected to have taken the other branch, with TCO as a feature your debugger can't tell you the values of x and y, because the stack frame has been eliminated.

(I'm sure at this point someone will bring up that the debugger should be smarter. Sure. I'm expecting your patch for CPython any minute now.)

The most interesting use case brought up for TCO is the implementation of algorithms involving state machines. The proponents of TCO claim that the only alternative to TCO is a loop with lots of state, which they consider ugly. Now, apart from the observation that since TCO essentially is a GOTO, you can write spaghetti code using TCO just as easily, Ian Bicking gave a solution that is as simple as it is elegant. (I saw it in a comment to someone's blog that I can't find right now; I'll add a link if someone adds it in a comment here.) Instead of this tail call:
return foo(args)
you write this:
return foo, (args,)
which doesn't call foo() but just returns it and an argument tuple, and embed everything in a "driver" loop like this:
func, args = ...initial func/args pair...
while True:
    func, args = func(*args)
If you need an exit condition you can use an exception, or you could invent some other protocol to signal the end of the loop (like returning None).
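Here is a small, self-contained sketch of the whole pattern using the "return None to stop" protocol; the two-state machine is made up purely for illustration:

def even_state(n):
    if n == 0:
        return None                 # signal the driver loop to stop
    return odd_state, (n - 1,)      # "tail call" odd_state(n - 1)

def odd_state(n):
    if n == 0:
        return None
    return even_state, (n - 1,)

def run(func, args):
    # The driver loop: call, unpack the next (func, args) pair, repeat.
    while True:
        result = func(*args)
        if result is None:
            break
        func, args = result

run(even_state, (1000000,))         # never grows the stack, however large n is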

And here it ends. One other thing I learned is that some in the academic world scornfully refer to Python as "the Basic of the future". Personally, I rather see that as a badge of honor, and it gives me an opportunity to plug a book of interviews with language designers to which I contributed, side by side with the creators of Basic, C++, Perl, Java, and other academically scorned languages -- as well as those of ML and Haskell, I hasten to add. (Apparently the creators of Scheme were too busy arguing whether to say "tail call optimization" or "proper tail recursion." :-)

Thursday, January 29, 2009

Detecting Cycles in a Directed Graph

I needed an algorithm for detecting cycles in a directed graph. I came up with the following. It's probably something straight from a textbook, but I couldn't find a textbook that had one, so I came up with this myself. I like the simplicity. I also like that there's a well-defined point in the algorithm where you can do any additional processing on each node once you know it is not part of a cycle.

The function makes few assumptions about the representation of the graph; instead of a graph object, it takes in two function arguments that are called to describe the graph:
  • def NODES(): an iterable returning all nodes
  • def EDGES(node): an iterable returning all nodes reached via node's outgoing edges
In addition it takes a third function argument which is called once for each node:
  • def READY(node): called when we know node is not part of any cycles
The function returns None upon success, or a list containing the members of the first cycle found otherwise. Here's the algorithm:
def find_cycle(NODES, EDGES, READY):
    todo = set(NODES())
    while todo:
        node = todo.pop()
        stack = [node]
        while stack:
            top = stack[-1]
            for node in EDGES(top):
                if node in stack:
                    return stack[stack.index(node):]
                if node in todo:
                    stack.append(node)
                    todo.remove(node)
                    break
            else:
                node = stack.pop()
                READY(node)
    return None
Discussion: The EDGES() function may be called multiple times for the same node, and the for loop does some duplicate work in that case. A straightforward fix for this inefficiency is to maintain a parallel stack of iterators that is pushed and popped at the same times as the main stack, and whose entries are iterators over the corresponding nodes' outgoing edges (i.e. iter(EDGES(node))). I'll leave that version as an exercise.
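For those who don't want to work it out, one possible sketch of that variant, under the same NODES/EDGES/READY conventions, might look like this:

def find_cycle_iter(NODES, EDGES, READY):
    todo = set(NODES())
    while todo:
        node = todo.pop()
        stack = [node]
        iters = [iter(EDGES(node))]       # parallel stack: one edge iterator per node
        while stack:
            for node in iters[-1]:        # resumes wherever this iterator left off
                if node in stack:
                    return stack[stack.index(node):]
                if node in todo:
                    stack.append(node)
                    iters.append(iter(EDGES(node)))
                    todo.remove(node)
                    break
            else:                         # iterator exhausted: no cycle through this node
                iters.pop()
                READY(stack.pop())
    return None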

Update: Fixed a typo in the algorithm (EDGES(top)) and renamed all to todo.