Database Scalability: Jonathan Ellis

This document discusses database scalability and different techniques for scaling databases. It covers three key points: 1) Scaling reads is difficult as disk seeks are slow, requiring techniques like caching. Scaling writes is even harder and requires partitioning or sharding data across multiple servers. 2) There are two main types of replication: asynchronous which is easiest but can lose data if the master fails, and synchronous which is more complex but prevents data loss. Replication does not help scale writes. 3) Partitioning or sharding data across multiple servers based on a partitioning key is necessary to scale writes. This allows adding capacity without downtime but requires an architecture to handle failures and maintain data integrity.

Database scalability

Jonathan Ellis

Classic RDBMS persistence

Index

Data

Disk is the new tape*

~8ms to seek
~4ms on expensive 15k rpm disks
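
A rough back-of-the-envelope sketch of what those seek times imply, assuming every random read costs exactly one seek on a single spindle (the function name is illustrative, not from the talk):

```python
def max_random_reads_per_second(seek_ms: float) -> float:
    """Upper bound on random reads per second if each read costs one seek."""
    return 1000.0 / seek_ms

commodity = max_random_reads_per_second(8.0)  # ~8 ms seek
fast_15k = max_random_reads_per_second(4.0)   # ~4 ms seek on 15k rpm disks

print(f"~{commodity:.0f} seeks/s vs ~{fast_15k:.0f} seeks/s")
```

Either way the disk manages only a few hundred random operations per second, which is why the rest of the talk is about avoiding seeks.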

What scaling means

[Figure: performance plotted against money]

Performance
Latency
Throughput

Two kinds of operations

Reads
Writes

Caching
Memcached
Ehcache
etc.

cache

Cache invalidation
Implicit
Explicit

Cache set invalidation

get_cached_cart(cart=13, offset=10, limit=10)
get('cart:13:10:10')
?

Set invalidation 2
prefix = get('cart_prefix:13')
get(prefix + ':10:10')
del('cart_prefix:13')

http://www.aminus.org/blogs/index.php/2007/1
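
The prefix trick on the slide can be sketched as follows. This is a minimal illustration, not the talk's code: `FakeCache` is an in-memory stand-in for a memcached-style client, and `load_cart_page`, `cart_prefix` and `invalidate_cart` are hypothetical helper names.

```python
import itertools

class FakeCache:
    """In-memory stand-in for a memcached-style client (get/set/delete only)."""
    def __init__(self):
        self._data = {}
    def get(self, key):
        return self._data.get(key)
    def set(self, key, value):
        self._data[key] = value
    def delete(self, key):
        self._data.pop(key, None)

cache = FakeCache()
_versions = itertools.count(1)  # source of fresh, never-reused prefixes

def load_cart_page(cart, offset, limit):
    # Hypothetical database read; returns placeholder rows.
    return [f"item-{i}" for i in range(offset, offset + limit)]

def cart_prefix(cart):
    """Return the current prefix for a cart, creating one if it was deleted."""
    key = f"cart_prefix:{cart}"
    prefix = cache.get(key)
    if prefix is None:
        prefix = f"cart:{cart}:v{next(_versions)}"
        cache.set(key, prefix)
    return prefix

def get_cached_cart(cart, offset, limit):
    """Every page of a cart is cached under the cart's current prefix."""
    key = f"{cart_prefix(cart)}:{offset}:{limit}"
    page = cache.get(key)
    if page is None:
        page = load_cart_page(cart, offset, limit)
        cache.set(key, page)
    return page

def invalidate_cart(cart):
    """One delete invalidates every cached page of the cart."""
    cache.delete(f"cart_prefix:{cart}")
```

After `invalidate_cart(13)`, the next read gets a new prefix, so all old pages become unreachable without enumerating them. In a real cache the orphaned entries would age out through normal eviction; this toy dictionary never evicts.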

Replication

Types of replication
Master → slave
Master → slave → other slaves

Master ↔ master
multi-master

Types of replication 2
Synchronous
Asynchronous

Synchronous
Synchronous = slow(er)
Complexity (e.g. 2PC)
PGCluster
Oracle

Asynchronous master/slave
Easiest
Failover
MySQL replication
Slony, Londiste, WAL shipping
Tungsten

Asynchronous multi-master
Conflict resolution
O(N^3) or O(N^2) as you add nodes

http://research.microsoft.com/~gray/replicas.ps

Bucardo
MySQL Cluster

Achtung!
Asynchronous replication can lose data if the master fails
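
A toy simulation of why this happens, assuming a master that acknowledges a write before shipping it to the replica. The class and method names are illustrative and do not correspond to any real replication tool.

```python
from collections import deque

class AsyncMaster:
    """Toy master that acknowledges writes before replicating them."""
    def __init__(self):
        self.data = {}
        self.pending = deque()  # acknowledged changes not yet shipped

    def write(self, key, value):
        self.data[key] = value
        self.pending.append((key, value))
        return "ack"  # the client sees success before any replica has the change

    def ship_one(self, replica):
        """Send the oldest pending change to the replica, if there is one."""
        if self.pending:
            key, value = self.pending.popleft()
            replica[key] = value

master = AsyncMaster()
replica = {}

master.write("a", 1)
master.write("b", 2)
master.ship_one(replica)  # only the first change reaches the replica

# The master dies here and the replica is promoted.
lost = set(master.data) - set(replica)
print("acknowledged but lost:", lost)
```

The write to `b` was acknowledged to the client yet exists nowhere after failover. Synchronous replication closes this window by not acknowledging until the replica has the change, at the cost of latency.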

Architecture
Primarily about how you cope with failure scenarios

Replication does not scale writes

Scaling writes
Partitioning aka sharding
Key / horizontal
Vertical
Directed

Partitioning

Key based partitioning

PK of root table controls destination
e.g. user id

Retains referential integrity
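
A minimal sketch of key-based routing, assuming four shards and hash-modulo placement. The shard names, the helper functions and the `user_id` / `blog_owner_id` fields are hypothetical; the talk only says that the root table's primary key controls the destination.

```python
import hashlib

SHARDS = ["db0", "db1", "db2", "db3"]  # hypothetical shard names

def shard_for_user(user_id: int) -> str:
    """The root table's primary key (the user id) picks the shard."""
    digest = hashlib.md5(str(user_id).encode()).digest()
    return SHARDS[int.from_bytes(digest[:8], "big") % len(SHARDS)]

def shard_for_blog(blog: dict) -> str:
    """Child rows follow their root row, so joins and foreign keys stay local."""
    return shard_for_user(blog["user_id"])

def shard_for_comment(comment: dict) -> str:
    # Assumption: a comment is stored with the owner of the blog it belongs to.
    return shard_for_user(comment["blog_owner_id"])

blog = {"id": 7, "user_id": 42}
comment = {"id": 99, "blog_id": 7, "blog_owner_id": 42}
print(shard_for_user(42), shard_for_blog(blog), shard_for_comment(comment))
```

Because everything hanging off user 42 lands on one node, referential integrity can still be enforced by that node's database. Note that with plain modulo placement, changing the number of shards remaps most keys.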

Example: blogger.com Users Blogs Comments

Example: blogger.com Users Blogs Comments Comments'

Vertical partitioning
Tables on separate nodes
Often a table that is too big to keep with the other tables eventually gets too big for a single node

Growing is hard

Directed partitioning
Central db that knows what server owns a key
Makes adding machines easier
Single point of failure
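
A small sketch of the directory idea, with an in-memory dictionary standing in for the central db. `ShardDirectory` and its least-loaded placement policy are assumptions made for illustration.

```python
class ShardDirectory:
    """Central lookup table mapping each key to the server that owns it."""
    def __init__(self, servers):
        self.servers = list(servers)
        self.owner = {}

    def add_server(self, server):
        # Existing keys keep their owner, so no data has to move.
        self.servers.append(server)

    def locate(self, key):
        if key not in self.owner:
            # Assumed policy: place new keys on the server owning the fewest keys.
            counts = {s: 0 for s in self.servers}
            for s in self.owner.values():
                counts[s] += 1
            self.owner[key] = min(self.servers, key=lambda s: counts[s])
        return self.owner[key]

directory = ShardDirectory(["db0"])
first = directory.locate("user:1")
directory.add_server("db1")
print(first, directory.locate("user:1"), directory.locate("user:2"))
```

Adding `db1` does not disturb `user:1`, which is what makes growth easier than with hash-modulo placement. Every request now depends on the directory, though, which is the single point of failure the slide warns about.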

Partitioning

Partitioning with replication

What these have in common

Ad hoc
Error-prone
Manpower-intensive

To summarize
Scaling reads sucks
Scaling writes sucks more

Distributed databases
Data is automatically partitioned
Transparent to application
Add capacity without downtime
Failure tolerant

*Like Bigtable, not Lotus Notes

Two famous papers

Bigtable: A Distributed Storage System for Structured Data, 2006
Dynamo: Amazon's Highly Available Key-value Store, 2007

The world doesn't need another half-assed key/value store

(See also Olin Shivers' 100% and 80% solutions)

Two approaches
Bigtable: How can we build a distributed database on top of GFS?
Dynamo: How can we build a distributed hash table appropriate for the data center?

Bigtable architecture

Lookup in Bigtable

Dynamo

Eventually consistent

Amazon: http://www.allthingsdistributed.com/2008/12

eBay: http://queue.acm.org/detail.cfm?id=139412

Consistency in a BASE world

If W + R > N, you are 100% consistent
W=1, R=N
W=N, R=1
W=Q, R=Q where Q = N / 2 + 1
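
The rule can be checked by brute force: a read is guaranteed to see the latest write only if every possible set of W written replicas overlaps every possible set of R read replicas. The sketch below assumes N / 2 means integer division; the function names are illustrative.

```python
from itertools import combinations

def quorum(n: int) -> int:
    """Q = N / 2 + 1, using integer division."""
    return n // 2 + 1

def always_intersect(n: int, w: int, r: int) -> bool:
    """True if every set of w replicas shares a node with every set of r replicas."""
    nodes = range(n)
    return all(set(ws) & set(rs)
               for ws in combinations(nodes, w)
               for rs in combinations(nodes, r))

n = 3
q = quorum(n)
for w, r in [(1, n), (n, 1), (q, q), (1, 1)]:
    print(f"N={n} W={w} R={r}  W+R>N: {w + r > n}  overlap guaranteed: {always_intersect(n, w, r)}")
```

The three configurations from the slide all guarantee overlap, while W=1, R=1 on three replicas does not, so a read there may miss the newest write.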

Cassandra

Memtable / SSTable

Disk
Commit log

ColumnFamilies
keyA: column1, column2, column3
keyC: column1, column7, column11

Column
byte[] name
byte[] value
i64 timestamp
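
One way to picture this data model is a map of row keys to sparse maps of columns. The sketch below is illustrative only: the type aliases and `put` helper are hypothetical, and the rule that the newest timestamp wins is an assumption about how the timestamp field is used, not something stated on the slide.

```python
from dataclasses import dataclass
from typing import Dict

@dataclass(frozen=True)
class Column:
    name: bytes
    value: bytes
    timestamp: int  # i64 on the slide

Row = Dict[bytes, Column]        # column name -> column
ColumnFamily = Dict[bytes, Row]  # row key -> row

def put(cf: ColumnFamily, key: bytes, col: Column) -> None:
    """Store a column; rows are sparse, so each key may hold different columns."""
    row = cf.setdefault(key, {})
    current = row.get(col.name)
    # Assumption: the column with the newest timestamp wins.
    if current is None or col.timestamp >= current.timestamp:
        row[col.name] = col

cf: ColumnFamily = {}
put(cf, b"keyA", Column(b"column1", b"v1", 1))
put(cf, b"keyA", Column(b"column2", b"v2", 1))
put(cf, b"keyC", Column(b"column7", b"v7", 1))
put(cf, b"keyA", Column(b"column1", b"stale", 0))  # older timestamp, ignored

print(sorted(cf[b"keyA"]), sorted(cf[b"keyC"]))
```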

LSM write properties

No reads
No seeks
Fast
Atomic within ColumnFamily
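
A toy log-structured store showing why such writes need no reads and no seeks: a write is one sequential append to the commit log plus an in-memory update. `TinyLSM` is a hypothetical teaching sketch, not Cassandra's implementation; for brevity the flushed tables stay in memory as sorted lists instead of being written out as immutable files.

```python
import bisect
import json
import os
import tempfile

class TinyLSM:
    """Toy commit log + memtable + sorted tables (illustrative only)."""
    def __init__(self, directory, memtable_limit=2):
        self.log = open(os.path.join(directory, "commit.log"), "a")
        self.memtable = {}
        self.sstables = []  # sorted [(key, value)] lists, newest last
        self.memtable_limit = memtable_limit

    def write(self, key, value):
        # Sequential append for durability: nothing is read, nothing is sought.
        self.log.write(json.dumps([key, value]) + "\n")
        self.log.flush()
        self.memtable[key] = value
        if len(self.memtable) >= self.memtable_limit:
            self.flush()

    def flush(self):
        """Turn the memtable into a sorted table and start a fresh memtable."""
        self.sstables.append(sorted(self.memtable.items()))
        self.memtable = {}

    def read(self, key):
        """Check the memtable first, then tables from newest to oldest."""
        if key in self.memtable:
            return self.memtable[key]
        for table in reversed(self.sstables):
            i = bisect.bisect_left(table, (key,))
            if i < len(table) and table[i][0] == key:
                return table[i][1]
        return None

    def close(self):
        self.log.close()

with tempfile.TemporaryDirectory() as d:
    db = TinyLSM(d, memtable_limit=2)
    db.write("a", "1")
    db.write("b", "2")  # memtable is full, so it is flushed to a sorted table
    db.write("a", "3")  # the newer value lives in the memtable
    latest = db.read("a")
    db.close()
print(latest)
```

The cost moves to the read path, which may have to consult several tables. That matches the shape of the numbers that follow, where writes are far cheaper than reads.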

vs MySQL with 50GB of data

MySQL
~300ms write
~350ms read

Cassandra
~0.12ms write
~15ms read

Achtung!

Classic RDBMS persistence

Index

Data

Questions
