Dave's Data

Posts

Showing posts with the label hashing

Optimistic Cuckoo Hashing for concurrent, read-intensive applications

March 11, 2013

Our FAWN team has been spending a lot of time looking at memory-efficient data and algorithms structures for various things (with a lot of emphasis on key-value stores, as per our first and second papers on FAWN-KV and SILT , respectively). Coming up at NSDI'13 , Bin Fan has a new paper in which we substantially improve the throughput of the memcached distributed DRAM cache. One of the core techniques we use is a new multiple-reader, single-writer concurrent variant of Cuckoo hashing that we call optimistic cuckoo hashing. It combines the refinement of a technique we introduced in SILT ("partial-key cuckoo hashing"), a new way of moving items during cuckoo insertion, and an optimistic variant of lock-striping to create a hash table that is extremely compact and supports extremely high read throughput, while still allowing one thread to update it at high speed (about 2M updates/second in our tests). We've released the code on github to this and you should...

Coming up at ALENEX 2013

November 30, 2012

Coming up at ALENEX 2013, in which Dave discovers that perfect hashing and friends can be a really intellectually engaging problem. This is the algorithm engineering paper about the most memory-efficient index we developed for SILT, with more analysis and depth about the index itself vs. focusing on the big system. Practical Batch-Updatable External Hashing with Sorting Hyeontaek Lim, David G. Andersen, Michael Kaminsky The key thing about this paper is that we've achieved the best-yet overhead for static external perfect hashing. That's a mouthful: Concretely, we use 2.5 bits of DRAM to tell you exactly where to look on your SSD to find a stored key-value pair. The data is stored densely on the SSD, wasting no space on the storage. The prior best approaches were Botelho's EPH scheme , which used about 3.8 bits per key, and the "EM" variant of EPH, which used about 3.1--3.3 bits/key. As a systems geek, I also really like that Hyeontaek's algorithm u...