DB Capacity Planning at eBay: Feng Qu, Sr MTS; Bass Chorng, Principal Capacity Engineer

This document provides an overview of database capacity planning at eBay. It discusses:
- eBay's site database traffic, which totals about 400 billion calls per day across NoSQL stores (Cassandra, MongoDB, Couchbase, PushVM) and RDBMS databases (MySQL, Oracle).
- The key steps of capacity planning: analyzing traffic, analyzing utilization, relating the two, forecasting growth, and converting resource needs to costs.
- The importance of understanding the technology platform, system overhead, workload characteristics, and bottlenecks for effective capacity planning.


DB Capacity Planning at eBay

Feng Qu, Sr MTS


Bass Chorng, Principal Capacity Engineer

#CassandraSummit2015    
Who Am I?

Bass Chorng – Principal Capacity Engineer @ eBay

Specializes in database performance, availability & scalability in a large website.

Established DB capacity team at eBay in 2003.

Loves mountain biking.

eBay Site DB Traffic At A Glance

NoSQL Total – 52 B/Day
    Cassandra – 15 B
    Mongo – 15 B
    CouchBase – 12 B
    PushVM – 10 B

RDBMS Total – 350 B
    MySQL – 10 B
    Oracle – 340 B

Peak Traffic – 8M/sec

[Chart: Billion SQL Calls per Day. Cassandra 15, Mongo 15, CouchBase 12, PushVM 10, MySQL 10, Oracle 340]

Site Total DB Calls – 400B/Day across 2,000 NoSQL Nodes + 450 Oracle Nodes
Hosting 800M Active items & 120M Active Users
Y-o-Y Growth – 30% ~ 35%
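
A quick worked check of the figures above, as a sketch only. It assumes the 400B/Day total and the 8M/sec peak describe the same traffic, and that the 30% ~ 35% growth compounds annually; the function name `project` is made up for illustration.

```python
SECONDS_PER_DAY = 86_400

daily_calls = 400e9   # site total DB calls per day (from the slide)
peak_per_sec = 8e6    # peak traffic (from the slide)

# Average rate implied by the daily total, and how far the peak sits above it.
avg_per_sec = daily_calls / SECONDS_PER_DAY
peak_to_avg = peak_per_sec / avg_per_sec

def project(daily, growth, years=1):
    """Compound a daily call volume forward by an annual growth rate."""
    return daily * (1 + growth) ** years

low = project(daily_calls, 0.30)
high = project(daily_calls, 0.35)

print(f"average: {avg_per_sec / 1e6:.2f}M calls/sec")
print(f"peak / average: {peak_to_avg:.2f}")
print(f"one year out: {low / 1e9:.0f}B to {high / 1e9:.0f}B calls/day")
```

Under these assumptions the peak is a bit under twice the daily average, which is one reason sizing on the daily total alone would undershoot.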

Capacity Planning - Simply Put
Ø  Analyze Traffic
o  Data
Ø  Analyze Utilization
o  Data
Ø  Analyze The Relationship Of The Above Two
o  Same Data
Ø  Forecast Growth
o  Simple Models, Then Impress Your Boss.
Ø  Convert Resource Need into $
o  A Calculator, Then Impress Your CIO

BTW, You Also Need To Know …

•  Platform Domain Knowledge – Server, DB Engine, IO Subsystem, Networks …


•  Relationship Between System Overhead & Utilization
•  Seasonality & Workload Characteristics
•  Bottlenecks – Components, Systems, Platforms, Architecture, Site & Apps
•  New Technologies
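
One way to "analyze the relationship" between traffic and utilization is a straight-line fit, sketched below. The sample numbers, the 70% ceiling, and the helper name `fit_line` are all hypothetical. The intercept is read here as system overhead at zero traffic and the slope as CPU cost per unit of traffic.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

# Hypothetical hourly samples from one database host.
traffic = [10, 20, 30, 40, 50]   # thousand calls/sec
cpu_pct = [13, 21, 29, 37, 45]   # CPU utilization, percent

slope, overhead = fit_line(traffic, cpu_pct)

# Traffic level at which this host would reach a planning ceiling.
CPU_CEILING = 70.0               # hypothetical ceiling, percent
max_traffic = (CPU_CEILING - overhead) / slope

print(f"CPU cost: {slope:.2f}% per thousand calls/sec")
print(f"overhead at zero traffic: {overhead:.1f}%")
print(f"traffic at {CPU_CEILING:.0f}% CPU: {max_traffic:.2f} thousand calls/sec")
```

A linear fit like this only makes sense for resources that scale roughly linearly with traffic; the deck itself restricts extrapolation to CPU utilization rather than IO time.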

Domain Knowledge Stack
aka Whom To Blame Stack

APPS
DB
UNIX
Bottom of food chain => STORAGE

[Diagram: the four layers are drawn as a stack, with the word CAPACITY written vertically alongside it]

Data
Ø  What To Collect?
o  Apps, Database, Sessions, CPU, Memory, Connections, IOPS, IO Time, NIC, HBA, Array
Ø  How To Collect?
o  Time Resolution, Aggregation Level, Retention
Ø  How To Use It?
o  Average, Max, 95th percentile, Dashboard, Reporting, Trending

[Charts: two daily time series, one from 5/1/2015 to 5/27/2015 (y-axis 0.0 to 4.0) and one from 1/26/2015 to 3/1/2015 (y-axis 0 to 40,000,000); series names are not recoverable]
Forecast
Ø  Model Traffic, Not Resources
Ø  Need One Year Trend
Ø  Forecast At Daily Level
Ø  Eliminate Outliers
Ø  No Data Is Better Than Wrong Data
Ø  Convert Traffic To Resource Usage
Ø  Linear Extrapolation Only (CPU Utilization, not IO Time)
Ø  Simple Excel Formula Works Well
Ø  For Long Term Resource Planning Only
Ø  Use Average, Not Max
Ø  Not All Workloads Are Predictable

[Chart: CATY Traffic Forecast. y-axis: Billion Calls (0 to 70); x-axis: 01/01/2012 to 01/01/2015; series: Forecast, Actual, Capacity]
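
A minimal sketch of the "eliminate outliers, then extrapolate linearly" idea from this slide. The data is hypothetical and covers only ten days for brevity (the slide asks for a one-year trend). The outlier rule used here, dropping points whose residual is more than three median absolute deviations from the median residual, is an assumption and not something the deck specifies.

```python
from statistics import median

def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

def drop_outliers(xs, ys, k=3.0):
    """Drop points whose residual from a first-pass fit lies more than
    k median absolute deviations away from the median residual."""
    slope, intercept = fit_line(xs, ys)
    residuals = [y - (slope * x + intercept) for x, y in zip(xs, ys)]
    center = median(residuals)
    mad = median([abs(r - center) for r in residuals])
    kept = [(x, y) for x, y, r in zip(xs, ys, residuals)
            if abs(r - center) <= k * mad]
    return [x for x, _ in kept], [y for _, y in kept]

# Hypothetical daily traffic (billion calls); day 5 is a bad data point.
days = list(range(10))
calls = [100, 102, 104, 106, 108, 200, 112, 114, 116, 118]

clean_days, clean_calls = drop_outliers(days, calls)
slope, intercept = fit_line(clean_days, clean_calls)
forecast_day_30 = slope * 30 + intercept

print(f"kept {len(clean_days)} of {len(days)} points")
print(f"trend: {slope:.2f}B calls/day per day, day-30 forecast: {forecast_day_30:.1f}B")
```

The same two steps map directly onto a spreadsheet (filter, then a linear trend formula), which is in the spirit of the slide's "Simple Excel Formula Works Well".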

Things To Watch For
Myths

Ø  More CPU Makes Apps Run Faster


Ø  More Data Makes Apps Run Slower
Ø  Apps Run Twice As Fast On CPU Twice The Speed
Ø  High Session = High Load

Pitfalls

Ø  Cause vs. Symptom


Ø  Time Resolution Masks Issues
Ø  Look At The Whole Picture
Ø  Slow Down In Order To Go Faster (Throttle)
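
To illustrate the "Time Resolution Masks Issues" pitfall, the sketch below averages the same hypothetical per-second CPU samples at three resolutions. The numbers and the helper name `downsample` are made up for illustration.

```python
def downsample(samples, window):
    """Average consecutive samples in fixed-size windows."""
    out = []
    for start in range(0, len(samples), window):
        chunk = samples[start:start + window]
        out.append(sum(chunk) / len(chunk))
    return out

# Hypothetical per-second CPU %: five minutes at 30%,
# except for a 10-second burst at 100%.
per_second = [30.0] * 300
per_second[120:130] = [100.0] * 10

one_minute = downsample(per_second, 60)
five_minute = downsample(per_second, 300)

print(f"max at 1-second resolution: {max(per_second):.1f}%")
print(f"max at 1-minute resolution: {max(one_minute):.1f}%")
print(f"max at 5-minute resolution: {max(five_minute):.1f}%")
```

The saturation that is obvious at one-second resolution shrinks to a modest bump at one minute and nearly vanishes at five minutes, so the collection interval decides which problems are visible at all.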

Challenges

Ø  Data Quality – Data Missing, Data Source Changes, F/O Data Residency, Data Errors …
Ø  Varieties of Data Formats & Resolutions
Ø  Data Collection In Secured Zones

Me: Everything NoSQL

Ø Prior to 2011: Worked on Oracle at DoubleClick/Yahoo/Intuit

Ø Worked on NoSQL at eBay Database Infrastructure team:


Ø Cassandra since 2011
Ø MongoDB since 2012
Ø Couchbase since 2014

Ø Cassandra Summit speaker for 2013, 2014, 2015

Ø DataStax Cassandra MVP for 2014, 2015

For Cassandra
Ø Capacity Measurements
Ø Throughput
Ø Latency
Ø E.g. 30,000 reads/sec with SLA of P99 at 5ms

Ø Hardware SKU Example


Ø CPU: 20 cores
Ø Memory: 128GB RAM
Ø Storage: 1.5TB local SSD
Ø Network: 10g NIC

Benchmarking
Ø Benchmarking for different hardware
Ø  High I/O SKU
Ø  High memory SKU
Ø  High storage SKU
Ø  Bare metal or cloud
Ø Benchmarking for different software releases
Ø Benchmarking for different workloads
Ø  100% Writes
Ø  50% Writes, 50% Reads
Ø  5% Writes, 95% Reads
Ø  100% Reads
Ø Benchmarking Tools
Ø  YCSB
Ø  Cassandra-stress
Ø Proactive, repeated process using near-real-time traffic in a production-like environment

Capacity Planning
Ø Key to avoiding surprises in production
Ø The concept behind capacity planning is simple, but the mechanics are harder.
Ø Business requirements may grow, so forecast how many resources must be added to the system to keep the user experience uninterrupted
Ø  Input: a clearly defined capacity goal from business requirements, plus a performance baseline from benchmark tests
Ø  Output: the resources to be added, such as memory, CPU, storage, I/O, network
Ø Always prepare for peak + headroom
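
The input/output relationship above, with "peak + headroom", can be sketched as a node-count calculation. The per-node figure reuses the deck's earlier example of 30,000 reads/sec, treated here as a per-node benchmark baseline (an assumption); the peak requirement, the 30% headroom, and the function name are hypothetical.

```python
import math

def nodes_for_throughput(peak_ops, per_node_ops, headroom=0.30):
    """Nodes needed to serve peak traffic plus headroom, given the
    throughput one node sustains within its latency SLA."""
    required = peak_ops * (1 + headroom)
    return math.ceil(required / per_node_ops)

per_node_baseline = 30_000     # reads/sec per node at the SLA (assumed per-node)
business_peak = 400_000        # reads/sec, hypothetical capacity goal

nodes = nodes_for_throughput(business_peak, per_node_baseline)
print(f"nodes needed: {nodes}")
```

This covers throughput only; the storage side needs its own check, and the larger of the two node counts wins.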

Capacity Planning Process
Ø Initial Sizing
Ø Storage size vs. data size
Ø Compaction overhead, compression ratio, RF, indexes
Ø Cost-effective configuration to meet capacity/latency SLA
Ø Routine Review
Ø System utilization on I/O, storage, network, CPU, memory etc
Ø Cassandra metrics on GC, compaction, latency, throughput etc
Ø nodetool compactionstats, cfhistograms, tpstats etc
Ø Forecasting
Ø Historical comparison
Ø Traffic projection
Ø Flex up or Flex down
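
For the Initial Sizing items above (storage size vs. data size, compaction overhead, compression ratio, RF, indexes), a rough storage calculation might look like this. Every input value is hypothetical except the 1.5TB disk, which comes from the hardware SKU example. Reserving half the disk as free space for compaction is an assumption, not a figure from the deck.

```python
import math

def storage_nodes(raw_tb, rf, compression_ratio, index_overhead,
                  disk_tb_per_node, compaction_free_fraction):
    """Estimate on-disk size and node count for a Cassandra cluster.

    compression_ratio is compressed size / uncompressed size.
    compaction_free_fraction is the share of each disk kept free.
    """
    on_disk_tb = raw_tb * rf * compression_ratio * (1 + index_overhead)
    usable_tb_per_node = disk_tb_per_node * (1 - compaction_free_fraction)
    return on_disk_tb, math.ceil(on_disk_tb / usable_tb_per_node)

on_disk_tb, nodes = storage_nodes(
    raw_tb=10,                     # hypothetical uncompressed data size
    rf=3,                          # replication factor
    compression_ratio=0.6,         # hypothetical
    index_overhead=0.10,           # hypothetical
    disk_tb_per_node=1.5,          # from the hardware SKU example
    compaction_free_fraction=0.5,  # assumed compaction reserve
)
print(f"on-disk size: {on_disk_tb:.1f} TB, nodes for storage: {nodes}")
```

The node count that matters is the larger of this storage-driven figure and the throughput-driven one.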

Scale Up vs. Scale Out
Ø Scale Up (vertical)
Ø  Pros
Ø Smaller data center footprint (space, power, cooling)
Ø Lower license cost
Ø  Cons
Ø Likely costs more because of proprietary hardware
Ø Less fault tolerant
Ø Limited upgradability in the future
Ø Scale Out (horizontal)
Ø  Pros
Ø Cheaper, using commodity hardware
Ø More fault tolerant
Ø (Unlimited) upgradability
Ø  Cons
Ø Bigger data center footprint
Ø Higher license cost
Ø Likely needs more network equipment

Questions ?

eBay is hiring experienced NoSQL professionals. Please send your resume to [email protected]
