Database Scalability: Jonathan Ellis
Jonathan Ellis
Data
money
Performance
Latency
Throughput
Caching
Memcached, Ehcache, etc.
(Diagram: a cache in front of the DB)
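A minimal cache-aside sketch of the read path in that diagram. It assumes a plain dict stands in for Memcached or Ehcache and that `load_from_db` is a hypothetical function standing in for the database query. Neither is a real client API.

```python
from typing import Any, Callable, Dict


class CacheAside:
    """Check the cache first; on a miss, read the DB and populate the cache."""

    def __init__(self, load_from_db: Callable[[str], Any]):
        self.cache: Dict[str, Any] = {}   # stand-in for Memcached/Ehcache (assumption)
        self.load_from_db = load_from_db  # hypothetical DB query function
        self.db_reads = 0                 # counts how often we fell through to the DB

    def get(self, key: str) -> Any:
        if key in self.cache:
            return self.cache[key]        # hit: the DB is not touched
        self.db_reads += 1
        value = self.load_from_db(key)    # miss: read from the DB...
        self.cache[key] = value           # ...and remember it for next time
        return value


db = {"user:13": {"name": "alice"}}
c = CacheAside(lambda k: db[k])
c.get("user:13")
c.get("user:13")
print(c.db_reads)  # 1: the second read is served from the cache
```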
Cache invalidation
Implicit
Explicit
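One possible reading of the two styles, sketched under assumptions. "Implicit" is taken to mean entries simply expire after a time-to-live (TTL), and "explicit" means the writer deletes the cached entry when it updates the database. The `TTLCache` class is hypothetical, and the clock is passed in as `now` so the example is deterministic.

```python
from typing import Any, Dict, Optional, Tuple


class TTLCache:
    def __init__(self, ttl: float):
        self.ttl = ttl
        self.entries: Dict[str, Tuple[Any, float]] = {}  # key -> (value, expiry time)

    def set(self, key: str, value: Any, now: float) -> None:
        self.entries[key] = (value, now + self.ttl)

    def get(self, key: str, now: float) -> Optional[Any]:
        item = self.entries.get(key)
        if item is None:
            return None
        value, expires_at = item
        if now >= expires_at:           # implicit invalidation: entry has aged out
            del self.entries[key]
            return None
        return value

    def delete(self, key: str) -> None:
        self.entries.pop(key, None)     # explicit invalidation: the writer removes it


db = {"price:10": 5}
cache = TTLCache(ttl=60)
cache.set("price:10", db["price:10"], now=0)

db["price:10"] = 7                       # DB changes; the cache is not told
stale = cache.get("price:10", now=30)    # still 5 until the TTL runs out
expired = cache.get("price:10", now=61)  # None: implicitly invalidated

cache.set("price:10", db["price:10"], now=61)
db["price:10"] = 9
cache.delete("price:10")                     # explicit: invalidate on write
fresh_miss = cache.get("price:10", now=62)   # None: the next read goes to the DB
```

The trade-off shows up directly: implicit invalidation needs no coordination but tolerates stale reads for up to one TTL, while explicit invalidation closes that window at the cost of every writer knowing which keys to delete.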
Set invalidation
prefix = get('cart_prefix:13')
get(prefix + ':10:10')
del('cart_prefix:13')
http://www.aminus.org/blogs/index.php/2007/1
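The three calls above can be fleshed out into a runnable sketch. It assumes a dict stands in for memcached and that `cart_prefix:13` is the key holding the current prefix for one cart's group of cached entries. The helper names (`set_prefix`, `get_member`, `put_member`, `invalidate_set`) and the use of a random UUID as the prefix are illustrative choices, not part of the original. Deleting the single prefix key invalidates the whole set at once, because later reads build keys from a fresh prefix.

```python
import uuid
from typing import Any, Dict, Optional

cache: Dict[str, Any] = {}  # stand-in for a memcached-like key/value store (assumption)


def set_prefix(set_key: str) -> str:
    """Return the current prefix for a set, creating a fresh one if absent."""
    prefix = cache.get(set_key)
    if prefix is None:
        prefix = uuid.uuid4().hex   # new generation => new key namespace
        cache[set_key] = prefix
    return prefix


def get_member(set_key: str, member: str) -> Optional[Any]:
    return cache.get(set_prefix(set_key) + member)


def put_member(set_key: str, member: str, value: Any) -> None:
    cache[set_prefix(set_key) + member] = value


def invalidate_set(set_key: str) -> None:
    # Deleting one key orphans every member key built from the old prefix;
    # a real cache would eventually evict those entries (e.g. by LRU).
    cache.pop(set_key, None)


put_member("cart_prefix:13", ":10:10", {"qty": 2})
before = get_member("cart_prefix:13", ":10:10")  # {'qty': 2}
invalidate_set("cart_prefix:13")
after = get_member("cart_prefix:13", ":10:10")   # None
```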
Replication
Types of replication
Master/slave
Master/slave, with slaves replicating to other slaves
Master/master
Multi-master
Types of replication (continued)
Synchronous
Asynchronous
Synchronous
Synchronous = slow(er)
Complexity (e.g. 2PC)
PGCluster, Oracle
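To make the "slow(er)" and "complexity" points concrete, here is a toy, in-memory sketch of two-phase commit (2PC). The `Replica` class and its `will_vote_yes` flag are hypothetical, and a real implementation also needs durable logging, timeouts, and coordinator recovery, all omitted here. Every write costs two round trips to every replica before it is acknowledged, and one replica that votes no (or cannot be reached) aborts the write everywhere.

```python
from typing import List, Optional


class Replica:
    def __init__(self, name: str, will_vote_yes: bool = True):
        self.name = name
        self.will_vote_yes = will_vote_yes   # hypothetical knob to simulate a "no" vote
        self.staged: Optional[str] = None
        self.data: List[str] = []

    def prepare(self, write: str) -> bool:
        # Phase 1: stage the write and vote.
        if not self.will_vote_yes:
            return False
        self.staged = write
        return True

    def commit(self) -> None:
        # Phase 2a: make the staged write visible.
        self.data.append(self.staged)
        self.staged = None

    def abort(self) -> None:
        # Phase 2b: discard any staged write.
        self.staged = None


def two_phase_commit(replicas: List[Replica], write: str) -> bool:
    """Coordinator: commit only if every replica votes yes in phase 1."""
    votes = [r.prepare(write) for r in replicas]   # round trip 1 to every replica
    if all(votes):
        for r in replicas:
            r.commit()                             # round trip 2 to every replica
        return True
    for r in replicas:
        r.abort()
    return False


ok = two_phase_commit([Replica("a"), Replica("b")], "x=1")
replicas = [Replica("a"), Replica("b", will_vote_yes=False)]
failed = two_phase_commit(replicas, "x=2")
print(ok, failed)  # True False
```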
Asynchronous master/slave
Easiest
Failover
MySQL replication
Slony, Londiste, WAL shipping
Tungsten
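A toy model of asynchronous master/slave replication. It assumes an in-memory dict per node and an ordered change log (loosely analogous to the MySQL binlog or the PostgreSQL WAL) that the slave replays at its own pace. The class names and the `max_entries` parameter are hypothetical. The master acknowledges a write once it is in its own log, so reads from the slave can lag behind.

```python
from typing import Dict, List, Tuple


class Master:
    def __init__(self):
        self.data: Dict[str, str] = {}
        self.log: List[Tuple[str, str]] = []  # ordered change log

    def write(self, key: str, value: str) -> None:
        self.data[key] = value
        self.log.append((key, value))          # acknowledged without waiting for slaves


class Slave:
    def __init__(self):
        self.data: Dict[str, str] = {}
        self.applied = 0                       # position replayed so far in the master's log

    def catch_up(self, master: Master, max_entries: int) -> None:
        # Replay up to max_entries log records, in order.
        end = min(len(master.log), self.applied + max_entries)
        for key, value in master.log[self.applied:end]:
            self.data[key] = value
        self.applied = end


m, s = Master(), Slave()
m.write("a", "1")
m.write("b", "2")
s.catch_up(m, max_entries=1)
lagging_read = s.data.get("b")   # None: the slave has not replayed "b" yet
s.catch_up(m, max_entries=10)
caught_up = s.data.get("b")      # "2"
```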
Asynchronous multi-master
Conflict resolution
O(N³) or O(N²) as you add nodes
http://research.microsoft.com/~gray/replicas.ps
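Conflict resolution needs a policy. A common, simple one (not necessarily what any particular product uses) is last-writer-wins. The sketch below assumes each write carries a timestamp and the writing node's id, and all names are illustrative. It makes two masters converge to the same state, but it does so by silently discarding one of the concurrent updates.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass(frozen=True)
class Versioned:
    value: str
    timestamp: float   # wall-clock time of the write (assumes roughly synced clocks)
    node_id: str       # tie-breaker so every node picks the same winner


def lww_merge(a: Versioned, b: Versioned) -> Versioned:
    """Last-writer-wins: keep the version with the larger (timestamp, node_id)."""
    return max(a, b, key=lambda v: (v.timestamp, v.node_id))


def reconcile(x: Dict[str, Versioned], y: Dict[str, Versioned]) -> Dict[str, Versioned]:
    merged = dict(x)
    for key, theirs in y.items():
        mine = merged.get(key)
        merged[key] = theirs if mine is None else lww_merge(mine, theirs)
    return merged


master_a = {"cart:13": Versioned("2 items", 100.0, "A")}
master_b = {"cart:13": Versioned("3 items", 101.5, "B")}
merged = reconcile(master_a, master_b)
print(merged["cart:13"].value)  # 3 items: A's concurrent update is discarded
```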
Achtung!
Asynchronous replication can lose data if the master fails
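The data-loss window can be shown with a minimal failover sketch. It assumes in-memory nodes and log shipping as a toy model; `Node`, `write_async`, and `ship` are hypothetical names. Anything the master acknowledged but had not yet shipped is gone once a slave is promoted.

```python
from typing import Dict, List, Tuple


class Node:
    def __init__(self, name: str):
        self.name = name
        self.data: Dict[str, str] = {}
        self.log: List[Tuple[str, str]] = []


def write_async(master: Node, key: str, value: str) -> str:
    master.data[key] = value
    master.log.append((key, value))
    return "ok"   # the client hears "ok" before any slave has the write


def ship(master: Node, slave: Node, upto: int) -> None:
    # Replicate the master's log entries that the slave lacks, up to index `upto`.
    for key, value in master.log[len(slave.log):upto]:
        slave.data[key] = value
        slave.log.append((key, value))


master, slave = Node("m"), Node("s")
write_async(master, "order:1", "paid")
ship(master, slave, upto=1)               # the first write reaches the slave
write_async(master, "order:2", "paid")    # acknowledged, not yet shipped

# The master crashes here; failover promotes the slave.
new_master = slave
lost = set(master.data) - set(new_master.data)
print(lost)  # {'order:2'}
```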
Architecture
Primarily about how you cope with failure scenarios
Scaling writes
Partitioning aka sharding
Key / horizontal
Vertical
Directed
Partitioning
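A minimal sketch of key (horizontal) partitioning. It assumes placement by hashing the key and taking the result modulo the node count, one common scheme that the slides do not prescribe. MD5 is used only because it is stable across processes, unlike Python's built-in `hash` for strings. The second function shows one reason growing is hard with this scheme: going from 4 to 5 nodes reassigns roughly 80% of keys, far more than the 20% the new node should end up owning.

```python
import hashlib
from typing import List


def node_for(key: str, num_nodes: int) -> int:
    """Key partitioning: hash the key, then take it modulo the node count."""
    digest = hashlib.md5(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_nodes


def fraction_moved(keys: List[str], old_n: int, new_n: int) -> float:
    """Share of keys whose owning node changes when the cluster is resized."""
    moved = sum(1 for k in keys if node_for(k, old_n) != node_for(k, new_n))
    return moved / len(keys)


keys = [f"user:{i}" for i in range(10_000)]
print(node_for("user:13", 4))
print(round(fraction_moved(keys, 4, 5), 2))  # roughly 0.8
```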
Vertical partitioning
Tables on separate nodes
Often the table that is too big to keep with the other tables eventually gets too big for a single node
Growing is hard
Directed partitioning
Central db that knows what server owns a key
Makes adding machines easier
Single point of failure
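A toy directory for directed partitioning. It assumes the central database is just an in-memory key-to-server map and that new keys go to the least-loaded server, a placement policy chosen only for illustration. All names are hypothetical. Adding a machine only appends it to the server list, and keys can then be migrated one at a time by rewriting their entries. Every lookup still goes through this one directory, which is why it is a single point of failure.

```python
from typing import Dict, List


class Directory:
    """Central lookup recording which server owns each key."""

    def __init__(self, servers: List[str]):
        self.servers = list(servers)
        self.owner: Dict[str, str] = {}

    def assign(self, key: str) -> str:
        # Place a new key on the server that currently owns the fewest keys.
        counts = {s: 0 for s in self.servers}
        for s in self.owner.values():
            counts[s] += 1
        server = min(self.servers, key=lambda s: counts[s])
        self.owner[key] = server
        return server

    def lookup(self, key: str) -> str:
        return self.owner[key]

    def add_server(self, server: str) -> None:
        # Adding a machine moves nothing by itself.
        self.servers.append(server)

    def move(self, key: str, server: str) -> None:
        # Migration is just a directory update (plus copying the data, not shown).
        if server not in self.servers:
            raise ValueError(f"unknown server {server!r}")
        self.owner[key] = server


d = Directory(["db1", "db2"])
for k in ["a", "b", "c", "d"]:
    d.assign(k)
d.add_server("db3")
d.move("a", "db3")
print(d.lookup("a"))  # db3
```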
Partitioning
To summarize
Scaling reads sucks
Scaling writes sucks more
Distributed databases
Data is automatically partitioned
Transparent to application
Add capacity without downtime
Failure tolerant
Two approaches
Bigtable: How can we build a distributed database on top of GFS?
Dynamo: How can we build a distributed hash table appropriate for the data center?
Bigtable architecture
Lookup in Bigtable
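In the Bigtable paper, tablet locations are found through a three-level hierarchy: a Chubby file points to a root tablet, which indexes METADATA tablets, which in turn index user tablets. The sketch below collapses that to a single level and assumes an in-memory, sorted list of tablet end row keys with hypothetical server names. It shows the step repeated at each level: a binary search for the first tablet whose end row key is at or after the requested row.

```python
import bisect
from typing import List, Tuple


class TabletIndex:
    """Maps a row key to the tablet (contiguous row range) that holds it.

    Each tablet is identified by the last row key it contains.
    """

    def __init__(self, tablets: List[Tuple[str, str]]):
        # tablets: (end_row_key, server) pairs, sorted by end_row_key
        self.end_keys = [end for end, _ in tablets]
        self.servers = [server for _, server in tablets]

    def locate(self, row_key: str) -> str:
        i = bisect.bisect_left(self.end_keys, row_key)  # first tablet with end >= row_key
        if i == len(self.end_keys):
            raise KeyError(f"{row_key!r} is past the last tablet")
        return self.servers[i]


index = TabletIndex([
    ("g", "tabletserver1"),
    ("p", "tabletserver2"),
    ("\uffff", "tabletserver3"),   # sentinel end key: sorts after ordinary row keys
])
print(index.locate("kiwi"))  # tabletserver2
```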
Dynamo
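Dynamo places data with consistent hashing: nodes and keys hash onto the same ring, and a key is stored on the first N distinct nodes clockwise from its position (its preference list). A minimal sketch, assuming MD5 for ring positions and a fixed number of virtual nodes per physical node; the names and the `vnodes` default are illustrative. Unlike modulo placement, adding a node only takes over keys that now fall just before its points on the ring.

```python
import bisect
import hashlib
from typing import List, Tuple


def ring_position(name: str) -> int:
    # Stable 64-bit ring position derived from MD5.
    return int.from_bytes(hashlib.md5(name.encode("utf-8")).digest()[:8], "big")


class Ring:
    def __init__(self, nodes: List[str], vnodes: int = 8):
        # Each physical node owns several points ("virtual nodes") on the ring.
        self.points: List[Tuple[int, str]] = sorted(
            (ring_position(f"{node}#{v}"), node) for node in nodes for v in range(vnodes)
        )

    def preference_list(self, key: str, n: int) -> List[str]:
        """The first n distinct physical nodes clockwise from the key's position."""
        start = bisect.bisect_right(self.points, (ring_position(key), ""))
        result: List[str] = []
        for i in range(len(self.points)):
            node = self.points[(start + i) % len(self.points)][1]
            if node not in result:
                result.append(node)
                if len(result) == n:
                    break
        return result


nodes = ["node-a", "node-b", "node-c", "node-d"]
ring = Ring(nodes)
prefs = ring.preference_list("cart:13", 3)   # three distinct replicas for this key
```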
Eventually consistent
Amazon: http://www.allthingsdistributed.com/2008/12
eBay: http://queue.acm.org/detail.cfm?id=139412
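Dynamo exposes the consistency trade-off as three settings: N replicas, R replicas read, and W replicas written. Every read set is guaranteed to overlap every write set only when R + W > N. The brute-force check below confirms that rule for N = 3. The function names are illustrative. The Dynamo paper cites (N, R, W) = (3, 2, 2) as a common configuration, while settings such as (3, 1, 1) favor latency and can return stale data. Dynamo's sloppy quorums and hinted handoff weaken even the R + W > N case, which is part of why it is described as eventually consistent.

```python
from itertools import combinations


def quorums_always_overlap(n: int, r: int, w: int) -> bool:
    """Brute force: does every R-replica read set share a replica with every W-replica write set?"""
    replicas = range(n)
    return all(
        set(read_set) & set(write_set)
        for read_set in combinations(replicas, r)
        for write_set in combinations(replicas, w)
    )


def overlap_rule(n: int, r: int, w: int) -> bool:
    """Closed-form condition: R + W > N."""
    return r + w > n


for r in range(1, 4):
    for w in range(1, 4):
        assert quorums_always_overlap(3, r, w) == overlap_rule(3, r, w)

print(overlap_rule(3, 2, 2), overlap_rule(3, 1, 1))  # True False
```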
Cassandra
(Diagram: memtable / SSTable, disk, commit log)
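The write path in that diagram can be sketched as a toy log-structured store. It assumes in-memory lists stand in for the on-disk commit log and SSTables, and that the memtable is flushed after a fixed number of keys (`memtable_limit`, a hypothetical knob). Compaction, bloom filters, and crash recovery are omitted. It suggests why writes are cheap (an append plus an in-memory update, with no read) while a read may have to consult several SSTables.

```python
import bisect
from typing import Dict, List, Optional, Tuple


class LSMStore:
    """Toy store: commit log + memtable + immutable sorted SSTables."""

    def __init__(self, memtable_limit: int = 3):
        self.commit_log: List[Tuple[str, str]] = []        # append-only (on disk in reality)
        self.memtable: Dict[str, str] = {}                 # in memory, mutable
        self.sstables: List[Tuple[List[str], List[str]]] = []  # (sorted keys, values); newest last
        self.memtable_limit = memtable_limit

    def write(self, key: str, value: str) -> None:
        self.commit_log.append((key, value))   # sequential append for durability
        self.memtable[key] = value             # in-memory update
        if len(self.memtable) >= self.memtable_limit:
            self.flush()

    def flush(self) -> None:
        # Write the memtable out as an immutable, sorted SSTable.
        items = sorted(self.memtable.items())
        self.sstables.append(([k for k, _ in items], [v for _, v in items]))
        self.memtable = {}
        self.commit_log = []                   # flushed entries no longer need the log

    def read(self, key: str) -> Optional[str]:
        if key in self.memtable:
            return self.memtable[key]
        for keys, values in reversed(self.sstables):  # newest SSTable first
            i = bisect.bisect_left(keys, key)
            if i < len(keys) and keys[i] == key:
                return values[i]
        return None


store = LSMStore(memtable_limit=2)
store.write("a", "1")
store.write("b", "2")   # memtable full: flushed to the first SSTable
store.write("a", "3")   # newer value for "a" lives in the memtable
print(store.read("a"), store.read("b"))  # 3 2
```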
ColumnFamilies
keyA: column1, column2, column3
keyC: column1, column7, column11
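As a simplification, the layout above can be modeled as a map from row key to a sorted map of column name to value. This sketch assumes plain dicts and string column names compared as strings. The helper names are hypothetical, not Cassandra's API. It shows that rows in the same ColumnFamily need not share columns, and that column names sort as strings, so `column11` falls between `column1` and `column2`.

```python
from typing import Dict

# A ColumnFamily viewed as: row key -> (column name -> value).
# Rows are sparse: each row stores only the columns it actually has.
ColumnFamily = Dict[str, Dict[str, str]]


def insert(cf: ColumnFamily, row_key: str, column: str, value: str) -> None:
    cf.setdefault(row_key, {})[column] = value


def column_slice(cf: ColumnFamily, row_key: str, start: str, finish: str) -> Dict[str, str]:
    """Columns of one row whose names fall in [start, finish], in name order."""
    row = cf.get(row_key, {})
    return {name: row[name] for name in sorted(row) if start <= name <= finish}


cf: ColumnFamily = {}
for col in ["column1", "column2", "column3"]:
    insert(cf, "keyA", col, "v")
for col in ["column1", "column7", "column11"]:
    insert(cf, "keyC", col, "v")

print(list(column_slice(cf, "keyC", "column1", "column2")))  # ['column1', 'column11']
```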
Cassandra
~0.12 ms write
~15 ms read
Achtung!
Data
Questions