Database Scalability: Jonathan Ellis

This document discusses database scalability and techniques for scaling databases. It covers three key points: 1) Scaling reads is difficult because disk seeks are slow, requiring techniques like caching; scaling writes is even harder and requires partitioning (sharding) data across multiple servers. 2) There are two main types of replication: asynchronous, which is easiest but can lose data if the master fails, and synchronous, which is more complex but prevents data loss; replication does not help scale writes. 3) Partitioning data across multiple servers by a partitioning key is necessary to scale writes; this allows adding capacity without downtime, but requires an architecture that handles failures and maintains data integrity.

Copyright
© Attribution Non-Commercial (BY-NC)

Database scalability

Jonathan Ellis

Classic RDBMS persistence

[diagram: index structure on disk pointing into the data file]

Disk is the new tape*


~8ms to seek
~4ms on expensive 15k rpm disks

What scaling means

performance / money

Performance
Latency
Throughput

Two kinds of operations


Reads
Writes

Caching
Memcached, Ehcache, etc.

[diagram: application → cache → DB; reads hit the cache first and fall through to the DB]

Cache invalidation
Implicit
Explicit
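The implicit flavor can be sketched as a read-through cache with a TTL, so entries invalidate themselves by expiring. A minimal sketch: a dict stands in for memcached, and `fetch_cart` / `load_cart_from_db` are illustrative names, not APIs from the deck.

```python
import time

cache = {}   # key -> (value, expires_at); stand-in for memcached
TTL = 60     # seconds; implicit invalidation via expiry

def load_cart_from_db(cart_id):
    # placeholder for the real DB query
    return {"cart_id": cart_id, "items": []}

def fetch_cart(cart_id):
    key = f"cart:{cart_id}"
    hit = cache.get(key)
    if hit is not None and hit[1] > time.time():
        return hit[0]                           # cache hit, no DB round trip
    value = load_cart_from_db(cart_id)          # cache miss: go to the DB
    cache[key] = (value, time.time() + TTL)
    return value
```

Explicit invalidation instead deletes or overwrites the key whenever the underlying row changes, which is the harder case the next slides address.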

Cache set invalidation


get_cached_cart(cart=13, offset=10, limit=10)
  → get('cart:13:10:10') ?

Set invalidation 2

prefix = get('cart_prefix:13')
get(prefix + ':10:10')
del('cart_prefix:13')

http://www.aminus.org/blogs/index.php/2007/1
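The prefix trick above can be sketched as follows: every paginated key embeds a per-cart generation token, so deleting that one token invalidates the whole set of pages at once. A dict stands in for memcached; names like `cart_prefix` and `invalidate_cart` are illustrative.

```python
import uuid

cache = {}   # stand-in for memcached

def cart_prefix(cart_id):
    # one generation token per cart; regenerating it orphans every old page key
    key = f"cart_prefix:{cart_id}"
    prefix = cache.get(key)
    if prefix is None:
        prefix = uuid.uuid4().hex
        cache[key] = prefix
    return prefix

def get_cached_cart(cart_id, offset, limit):
    key = f"{cart_prefix(cart_id)}:{offset}:{limit}"
    page = cache.get(key)
    if page is None:
        # placeholder for the real paginated DB query
        page = f"items {offset}..{offset + limit} of cart {cart_id}"
        cache[key] = page
    return page

def invalidate_cart(cart_id):
    # one delete implicitly invalidates cart pages :0:10, :10:10, :20:10, ...
    cache.pop(f"cart_prefix:{cart_id}", None)
```

The orphaned page entries are never deleted explicitly; they simply become unreachable and fall out of the cache under LRU or TTL pressure.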

Replication

Types of replication
Master → slave
Master → slave → other slaves
Master ↔ master
multi-master

Types of replication 2
Synchronous
Asynchronous

Synchronous
Synchronous = slow(er)
Complexity (e.g. 2PC)
PGCluster, Oracle

Asynchronous master/slave
Easiest
Failover
MySQL replication; Slony, Londiste, WAL shipping; Tungsten

Asynchronous multi-master
Conflict resolution
O(N³) or O(N²) as you add nodes

http://research.microsoft.com/~gray/replicas.ps

Bucardo, MySQL Cluster

Achtung!
Asynchronous replication can lose data if the master fails

Architecture
Primarily about how you cope with failure scenarios

Replication does not scale writes

Scaling writes
Partitioning, aka sharding
Key / horizontal
Vertical
Directed

Partitioning

Key based partitioning


PK of root table controls destination
e.g. user id

Retains referential integrity

Example: blogger.com
Users, Blogs, Comments

Example: blogger.com
Users, Blogs, Comments, Comments'
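A minimal sketch of key-based partitioning as in the blogger.com example: the PK of the root table (the user id) picks the destination, so a user's blogs and comments land on the same node and referential integrity within one user survives. The node names and integer ids are hypothetical.

```python
# Hypothetical node names; in reality these would be connection strings.
SHARDS = ["db1", "db2", "db3", "db4"]

def shard_for(user_id):
    # the partitioning key of the root table (Users) controls the destination;
    # all rows hanging off this user go to the same node
    return SHARDS[user_id % len(SHARDS)]
```

Naive modulo placement also illustrates why growing is hard: adding a fifth node changes `len(SHARDS)` and remaps most existing keys, which is what directed partitioning (and, later, Dynamo-style schemes) tries to avoid.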

Vertical partitioning
Tables on separate nodes
Often a table too big to keep with the other tables; eventually it gets too big for even a single node

Growing is hard

Directed partitioning
A central DB knows which server owns each key
Makes adding machines easier
Single point of failure

Partitioning

Partitioning with replication

What these have in common


Ad hoc
Error-prone
Manpower-intensive

To summarize
Scaling reads sucks Scaling writes sucks more

Distributed databases*
Data is automatically partitioned
Transparent to the application
Add capacity without downtime
Failure tolerant

*Like Bigtable, not Lotus Notes

Two famous papers


Bigtable: A Distributed Storage System for Structured Data (2006)
Dynamo: Amazon's Highly Available Key-value Store (2007)

The world doesn't need another half-assed key/value store


(See also Olin Shivers' 100% and 80% solutions)

Two approaches
Bigtable: How can we build a distributed database on top of GFS?
Dynamo: How can we build a distributed hash table appropriate for the data center?

Bigtable architecture

Lookup in Bigtable

Dynamo

Eventually consistent

Amazon: http://www.allthingsdistributed.com/2008/12

eBay: http://queue.acm.org/detail.cfm?id=139412

Consistency in a BASE world


If W + R > N, you are 100% consistent
W=1, R=N
W=N, R=1
W=Q, R=Q, where Q = N/2 + 1
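The rule above can be checked mechanically: with N replicas, if every write waits for W acknowledgements and every read consults R replicas, then W + R > N forces the read set and the write set to overlap, so some replica in every read has the latest write. A minimal sketch:

```python
def is_consistent(n, w, r):
    # W + R > N means the W-replica write set and R-replica read set
    # must intersect by pigeonhole, so reads see the latest write
    return w + r > n

N = 3
Q = N // 2 + 1   # quorum: 2 of 3 replicas
```

W=1, R=1 with N=3 fails the test: that configuration is only eventually consistent.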

Cassandra

Memtable / SSTable

[diagram: a write is appended to the commit log on disk and applied to the in-memory memtable]

ColumnFamilies

keyA: column1, column2, column3
keyC: column1, column7, column11

Column
Byte[] name
Byte[] value
I64 timestamp

LSM write properties


No reads
No seeks
Fast
Atomic within a ColumnFamily
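A toy sketch of the log-structured write path (not Cassandra's actual code): a write appends to the commit log and updates the in-memory memtable, and the memtable flushes to a sorted, immutable run (an "SSTable") when it grows too big. Nothing on the write path reads or seeks.

```python
commit_log = []      # append-only; sequential I/O, replayed after a crash
memtable = {}        # in-memory; sorted only at flush time
sstables = []        # immutable sorted runs standing in for on-disk SSTables
MEMTABLE_LIMIT = 4   # flush threshold, tiny for illustration

def flush():
    global memtable
    if memtable:
        sstables.append(sorted(memtable.items()))  # one sorted immutable run
        memtable = {}

def write(key, value):
    commit_log.append((key, value))  # 1. durability: sequential append, no seek
    memtable[key] = value            # 2. in-memory update, no read needed
    if len(memtable) >= MEMTABLE_LIMIT:
        flush()

def read(key):
    # reads check the memtable first, then SSTables from newest to oldest
    if key in memtable:
        return memtable[key]
    for run in reversed(sstables):
        for k, v in run:
            if k == key:
                return v
    return None
```

This is why the benchmark below shows writes far cheaper than reads: writes are pure appends, while reads may have to consult several runs.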

vs MySQL with 50GB of data

MySQL: ~300ms write, ~350ms read
Cassandra: ~0.12ms write, ~15ms read

Achtung!

Classic RDBMS persistence

[diagram: index structure on disk pointing into the data file]

Questions
