eBay Cloud Configuration Management System
蒋旭 (Jiang Xu)
Architect, Platform Technology Department
eBay China Technology R&D Center
eBay Inc. Proprietary & Confidential
Agenda
• eBay Cloud Overview
– Why Does eBay Need Cloud?
– eBay Cloud Tech Overview
• CMS - Configuration Management System
– Architecture
– Try Me Page
– Functionality & Demo
• NoSQL in CMS
– Why Did CMS Choose NoSQL?
– Overcome NoSQL Design Challenges
– Resolve Open Source NoSQL Issues
Why does eBay need cloud?
eBay Scale
Data Analytics | Search Infrastructure | Front End
2B page views/day
500M live listings
14,000 application servers
9PB of data
5B queries/day
96M active users
44M lines of code
10M items added/day
75B database calls/day
eBay Utilization
Number of servers required based on utilization for 8 pools
eBay Global Brands
eBay Cloud Tech Overview
eBay Cloud Technology Stack
Service Catalog | REST APIs
Ticket-driven run book automation | Model-Driven Closed-Loop Automation
Monitoring | Complex Event Processing
Configuration Management Database (CMDB) | Distributed State Management
Chargeback | Pay As You Go
eBay Cloud Architecture Overview
[Architecture diagram. Components: Configuration Management Service, Cloud Manager, Monitoring, Infrastructure & Platform Mgt Services, Agent, Cloud Infrastructure. Interfaces: REST API, Queue API. Labeled flows: current/expected state, thresholds/topology, control, discovery, events & alerts, metrics.]
Model Driven Automation
• The desired configuration is specified in the expected state and persisted in CMS.
• Upon approval, the orchestration configures the site to reflect the desired configuration.
• The updated site configuration is discovered based on detection of configuration events.
• Reconciliation between the expected and current state verifies that the site is configured properly.
[Diagram: LB Pools with their Servers; the Expected State and Current State feed a Comparison (Reconciliation); Orchestration applies changes to the Site and Discovery reads them back.]
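The reconciliation step above can be illustrated with a minimal sketch. It assumes both states are plain dictionaries of the form {resource: {attribute: value}}; the `Drift` type, the `reconcile` function and the sample resource names are hypothetical and are not part of CMS.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Drift:
    """One difference between the expected and the current state."""
    resource: str
    attribute: str
    expected: object
    current: object


def reconcile(expected: dict, current: dict) -> list:
    """Compare two {resource: {attribute: value}} maps and list every drift."""
    drifts = []
    for resource in sorted(set(expected) | set(current)):
        exp_attrs = expected.get(resource, {})
        cur_attrs = current.get(resource, {})
        for attribute in sorted(set(exp_attrs) | set(cur_attrs)):
            exp_value = exp_attrs.get(attribute)
            cur_value = cur_attrs.get(attribute)
            if exp_value != cur_value:
                drifts.append(Drift(resource, attribute, exp_value, cur_value))
    return drifts


# Hypothetical sample data: one pool has lost a member on the site.
expected_state = {
    "lb-pool-1": {"members": 3, "algorithm": "round-robin"},
    "server-1": {"pool": "lb-pool-1", "status": "active"},
}
current_state = {
    "lb-pool-1": {"members": 2, "algorithm": "round-robin"},
    "server-1": {"pool": "lb-pool-1", "status": "active"},
}

drifts = reconcile(expected_state, current_state)
for drift in drifts:
    print(f"{drift.resource}.{drift.attribute}: "
          f"expected {drift.expected!r}, found {drift.current!r}")
```

An empty result means the site matches the desired configuration; any entry is a candidate for another orchestration pass.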
Configuration Management System (CMS)
CMS - Overview
• CMS (Configuration Management System) is a high-performance, metadata-driven
persistence and query service for configuration data, with a RESTful API and
client libraries (Java, Python).
• CMS is a generic system that can be used for cloud configuration as well as
for other software configuration needs.
• As a by-product, CMS can also serve as a persistence solution for real-time
state data.
• CMS supports multiple data repositories to provide data isolation.
CMS - Architecture
[Architecture diagram, top to bottom]
REST Request
REST API
Query Engine: Parser, Translator & Optimizer, Executor
Entity Manager: Branch Service, History Service, Entity Service, Entity Mapper
Metadata Service
Data Access Layer: Search Service, Persistence Service
MongoDB
CMS - Try Me Page
CMS Functionality & Demo
Metadata Model – Basic Feature
• The metadata model is based on the object-oriented paradigm and can support
graph/tree data models
– A MetaClass defines the meta type of runtime data (i.e. an entity)
– An entity represents one node in the graph
– A relationship between entities represents an edge in the graph
• Metadata can contain two types of field:
– An attribute field defines the payload of an entity
• String, Boolean, Double, Integer, Long, Date
• Json
– A relationship field defines a relationship between entities
• Reference
• Embedded
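To make the MetaClass / field distinction concrete, here is a small sketch of metadata-driven validation. It is not the CMS API: `MetaField`, `MetaClass`, `validate`, the `port` field and the mapping of Date to an ISO-8601 string are all assumptions made for illustration. Only the attribute type names and the two relationship kinds come from the slide.

```python
from dataclasses import dataclass, field

# Python types accepted for each attribute type named on the slide.
# Representing Date as an ISO-8601 string is an assumption of this sketch.
ATTRIBUTE_TYPES = {
    "String": str, "Boolean": bool, "Double": float,
    "Integer": int, "Long": int, "Date": str, "Json": (dict, list),
}


@dataclass
class MetaField:
    name: str
    kind: str        # "attribute", "reference" or "embedded"
    data_type: str   # attribute type name, or the target MetaClass name


@dataclass
class MetaClass:
    name: str
    fields: dict = field(default_factory=dict)

    def add(self, name, kind, data_type):
        self.fields[name] = MetaField(name, kind, data_type)
        return self  # returning self allows chained definitions


def validate(meta_class, entity):
    """Return a list of error messages; an empty list means the entity conforms."""
    errors = []
    for name, value in entity.items():
        meta_field = meta_class.fields.get(name)
        if meta_field is None:
            errors.append(f"{meta_class.name}.{name}: unknown field")
        elif meta_field.kind == "attribute":
            expected = ATTRIBUTE_TYPES[meta_field.data_type]
            # bool is a subclass of int in Python, so reject it explicitly.
            wrong_bool = isinstance(value, bool) and meta_field.data_type != "Boolean"
            if wrong_bool or not isinstance(value, expected):
                errors.append(f"{meta_class.name}.{name}: expected {meta_field.data_type}")
    return errors


# Hypothetical metadata: an ApplicationService that references ServiceInstances.
service_instance = MetaClass("ServiceInstance").add("name", "attribute", "String")
application_service = (
    MetaClass("ApplicationService")
    .add("name", "attribute", "String")
    .add("port", "attribute", "Integer")
    .add("serviceInstances", "reference", "ServiceInstance")
)

good = validate(application_service, {"name": "pool1", "port": 8080})
bad = validate(application_service, {"name": "pool1", "port": "8080", "owner": "x"})
print(good)
print(bad)
```

Relationship fields are only declared here, not checked; a fuller version would also verify that a referenced or embedded entity conforms to its target MetaClass.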
Metadata Model – Sample
Metadata Model – Advanced Feature
• Metadata Inheritance (parent & child)
• Referential Integrity (strong & weak)
• Index Support on Metadata (unique constraints & query optimizer)
• MongoDB Collection Split by Metadata (works around the limit of 64 indexes per collection)
Persistence Service – Basic Feature
• The persistence service provides a CRUD API for the runtime data (i.e. entities)
described by the metadata.
– Create
– Retrieve
– Update
– Delete
• An entity can be flat or embedded in structure, conforming to the
metadata definition
– For a reference relationship, the entity is flat
– For an embedded relationship, the entity is embedded
Persistence Service – Advanced Feature
• Branching (main & sub & merge)
• Audit Tracing (entity history)
• Referential Integrity (strong & weak)
• Conditional Update (version based optimistic locking)
• Security Access Control
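The conditional update item (version-based optimistic locking) can be sketched with an in-memory store. The class and method names below are hypothetical and do not reflect the CMS client API; the point is only the compare-version-then-write rule.

```python
class VersionConflict(Exception):
    """Raised when the caller's version is older than the stored one."""


class EntityStore:
    """In-memory stand-in for a persistence service (illustration only)."""

    def __init__(self):
        self._entities = {}  # entity_id -> (version, payload)

    def create(self, entity_id, payload):
        self._entities[entity_id] = (0, dict(payload))

    def get(self, entity_id):
        version, payload = self._entities[entity_id]
        return version, dict(payload)  # return a copy so callers cannot mutate the store

    def update(self, entity_id, expected_version, changes):
        """Apply changes only if nobody else updated the entity in between."""
        version, payload = self._entities[entity_id]
        if version != expected_version:
            raise VersionConflict(
                f"{entity_id}: expected version {expected_version}, found {version}")
        self._entities[entity_id] = (version + 1, {**payload, **changes})
        return version + 1


store = EntityStore()
store.create("server-1", {"status": "active"})

# Two clients read the same version, then both try to write.
version_a, _ = store.get("server-1")
version_b, _ = store.get("server-1")

store.update("server-1", version_a, {"status": "maintenance"})  # succeeds
try:
    store.update("server-1", version_b, {"status": "retired"})  # stale version
    conflict = False
except VersionConflict:
    conflict = True
print(conflict, store.get("server-1"))
```

The losing client is expected to re-read the entity and retry. Against a document store the same check is commonly expressed as a single update whose filter includes the version field, so that the comparison and the write happen atomically.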
Query Service – Basic Feature
• The query service provides an imperative-style query language that defines
the traversal path through the graph/tree data model.
• The query language supports Boolean filters, attribute selection and implicit
joins, and extracts a sub-tree result from the graph data set.
• For example, ApplicationService[@name = "pool1"].groups[@name =
"columns"].groups[@name = "col1"].serviceInstances returns the service
instances under column 1 of the pool1 application.
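As a rough illustration of how such a traversal path can be evaluated, the sketch below parses the example query and walks an in-memory tree of nested dictionaries. It covers only the `name[@attr = "value"]` step form shown above; the function names and sample data are hypothetical, and the real query engine (parser, optimizer, executor over MongoDB) is certainly more involved.

```python
import re

# One traversal step: a name, optionally followed by [@attr = "value"].
STEP = re.compile(r'(\w+)(?:\[@(\w+)\s*=\s*"([^"]*)"\])?')


def parse(query):
    """Split a query into (name, filter_attr, filter_value) steps."""
    steps, pos = [], 0
    while pos < len(query):
        match = STEP.match(query, pos)
        if match is None:
            raise ValueError(f"unexpected input at offset {pos}")
        steps.append(match.groups())
        pos = match.end()
        if pos < len(query):
            if query[pos] != ".":
                raise ValueError(f"expected '.' at offset {pos}")
            pos += 1
    return steps


def keep(node, attr, value):
    """Apply the optional equality filter of a step."""
    return attr is None or node.get(attr) == value


def run_query(query, roots_by_type):
    """Follow the path from the root type down through relationship fields."""
    (root_type, attr, value), *rest = parse(query)
    nodes = [n for n in roots_by_type.get(root_type, []) if keep(n, attr, value)]
    for field_name, attr, value in rest:
        nodes = [child
                 for node in nodes
                 for child in node.get(field_name, [])
                 if keep(child, attr, value)]
    return nodes


# Hypothetical data set; relationships are modeled as nested lists.
data = {
    "ApplicationService": [
        {"name": "pool1", "groups": [
            {"name": "columns", "groups": [
                {"name": "col1", "serviceInstances": [{"name": "si-1"}, {"name": "si-2"}]},
                {"name": "col2", "serviceInstances": [{"name": "si-3"}]},
            ]},
        ]},
        {"name": "pool2", "groups": []},
    ]
}

query = ('ApplicationService[@name = "pool1"].groups[@name = "columns"]'
         '.groups[@name = "col1"].serviceInstances')
result = run_query(query, data)
print([node["name"] for node in result])
```

Each step narrows the node set and then follows one relationship, which is the "implicit join" the slide refers to. With reference relationships the children would be looked up by id rather than read from a nested list.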
Query Service – Advanced Feature
• Query Optimizer (cost & hint)
• Result Pagination (sort / limit / skip)
• Full Table Scan Check (query filter & index info)
• Query Explanation (execution plan)
System Management
• Monitoring (approximate & accurate sliding window metrics)
• State Management (normal / maintenance / overload)
• Health Model (formula based on QPS & latency -> overload state)
• API Throttling (overload state -> priority throttling)
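The chain from sliding-window metrics to the overload state to priority throttling can be sketched as follows. The slide does not give the actual health formula, so the rule used here (overload when QPS or mean latency exceeds a threshold) and every name and threshold are assumptions. Timestamps are passed in explicitly to keep the example deterministic.

```python
from collections import deque


class SlidingWindow:
    """Accurate sliding window: keeps every (timestamp, latency) sample in range."""

    def __init__(self, window_seconds):
        self.window_seconds = window_seconds
        self._samples = deque()

    def record(self, timestamp, latency_ms):
        self._samples.append((timestamp, latency_ms))
        self._evict(timestamp)

    def _evict(self, now):
        # Drop samples that have fallen out of the window.
        while self._samples and self._samples[0][0] <= now - self.window_seconds:
            self._samples.popleft()

    def qps(self, now):
        self._evict(now)
        return len(self._samples) / self.window_seconds

    def mean_latency_ms(self, now):
        self._evict(now)
        if not self._samples:
            return 0.0
        return sum(latency for _, latency in self._samples) / len(self._samples)


def system_state(window, now, max_qps, max_latency_ms):
    """Assumed health formula: overload if either threshold is exceeded."""
    if window.qps(now) > max_qps or window.mean_latency_ms(now) > max_latency_ms:
        return "overload"
    return "normal"


def admit(priority, state, min_priority_when_overloaded):
    """Priority throttling: under overload, only high-priority calls pass."""
    return state != "overload" or priority >= min_priority_when_overloaded


window = SlidingWindow(window_seconds=10)
for i in range(50):                      # 50 fast requests over about 10 seconds
    window.record(i * 0.2, latency_ms=20)
state_before = system_state(window, now=9.8, max_qps=100, max_latency_ms=100)

for _ in range(5):                       # a burst of very slow requests
    window.record(9.9, latency_ms=2000)
state_after = system_state(window, now=9.9, max_qps=100, max_latency_ms=100)

print(state_before, state_after)
print(admit(1, state_after, 5), admit(9, state_after, 5))
```

An approximate window would replace the per-sample deque with fixed time buckets, trading accuracy for constant memory.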
Open Source Strategy
• We plan to open source the core functionality of CMS
• eBay-specific code (e.g. security) will be separated from the open source code
• Code contributions are welcome!
NoSQL in CMS
CMS Requirements
• The primary goal of CMS is to manage configuration data efficiently
• Characteristics of configuration data
– the data model is complex and flexible
– the access pattern is read-heavy (reads >> writes)
– very complex queries must be supported
• Non-functional requirements
– High Performance
– High Availability
– High Scalability
– Access Control
Relational DB vs. NoSQL DB
                    | RDB (e.g. MySQL)               | Document Store (e.g. MongoDB)                                 | Column Store (e.g. Cassandra)
DB Schema           | Rigid schema                   | Schema free                                                   | Flexible schema
Performance         | Too many joins for graph model | High read performance; potential write performance bottleneck | High write performance; fast key-based read & slow range query
Scalability         | Does not scale out             | Horizontally scalable                                         | Horizontally scalable
Metadata            | DB schema                      | No metadata                                                   | No metadata
Query               | SQL                            | Limited query language                                        | Limited query language
Consistency         | Transactional                  | Eventual consistency                                          | Eventual consistency
Security            | AuthZ & AuthN                  | Basic security                                                | Basic security
Concurrency Control | Locking or MVCC                | Database-level locking & atomic operation                     | Row-based atomic
Why did CMS choose MongoDB?
• High Performance
– In-Memory Storage (if the working set fits in memory)
– B-Tree Index
• High Availability & High Scalability
– Replica Set
• Flexible Schema
– JSON-Based Document Model
• Query Support
– Rich, document-based queries.
Overcome NoSQL Design Challenges
• No Metadata Management
– Metadata Driven
• Limited Query Language
– Imperative Query Language
• No Multi-Row Transaction
– Branching & Merge
• No Access Control
– Security Model
Resolve MongoDB Issues
• Open source software is great, but it is not bug-free.
• Sometimes we need to dig into the source code or the OS kernel to find the
root cause and make enhancements ourselves.
• Case Study
– Case 1: High system CPU under highly concurrent full table scan queries
– Case 2: High system CPU under highly concurrent large result set queries
Resolve MongoDB Issues – Case Study I
• Case 1: High system CPU under highly concurrent full table scan queries
• Symptom:
– When 100+ concurrent clients execute full table scans on a collection of 100K+
records, system CPU is 80%+.
• Analysis:
– gdb sampling shows that lots of samples are in pthread_mutex_lock &
pthread_mutex_unlock, called from mongo::ps::Rolling::access()
– strace sampling shows that 80%+ of syscalls are futex
– After studying the MongoDB code, we found that mongo::ps::Rolling::access() checks
whether the record is in memory; if it is not, it loads the record into memory.
– The problem is that mongo::ps::Rolling::access() acquires a pthread_mutex for
each record, which triggers high lock contention.
• Solution
– We added a "full table scan" check to the query engine, and we reject "full table scan"
queries when the system is in an unhealthy state
– We have opened a JIRA, CS-3969, with 10gen
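The "full table scan" check in the solution can be sketched like this. It assumes the query engine knows the filtered fields and the index definitions from metadata, and it uses the usual B-tree rule that an index helps only when the filter constrains its leading field. The function names, the exception and the state strings are hypothetical.

```python
class FullTableScanRejected(Exception):
    """Raised when a full table scan is refused in an unhealthy state."""


def uses_index(filter_fields, indexes):
    """True if the leading field of at least one index appears in the filter."""
    return any(index and index[0] in filter_fields for index in indexes)


def check_query(filter_fields, indexes, system_state):
    """Return whether the query is a full table scan; refuse it under load."""
    full_scan = not uses_index(filter_fields, indexes)
    if full_scan and system_state != "normal":
        raise FullTableScanRejected(
            f"no index covers {sorted(filter_fields)} while system is {system_state}")
    return full_scan


# Hypothetical index definitions: one single-field and one compound index.
indexes = [("name",), ("pool", "status")]

indexed = check_query({"name"}, indexes, "overload")        # served by an index
scan_when_healthy = check_query({"status"}, indexes, "normal")  # allowed, but flagged
try:
    check_query({"status"}, indexes, "overload")
    rejected = False
except FullTableScanRejected:
    rejected = True
print(indexed, scan_when_healthy, rejected)
```

Filtering on `status` alone is treated as a scan because `status` is not the leading field of the compound index.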
Resolve MongoDB Issues – Case Study II
• Case 2: High system CPU under highly concurrent large result set queries
• Symptom:
– When 100+ concurrent clients execute large queries that each return a result set of 1K+,
system CPU is 90%+.
• Analysis:
– gdb sampling shows that most samples are in socket recv() and many samples are on the malloc
mutex, which is taken when allocating strings for the query result.
– Since recv() is I/O-bound and should not cause high system CPU, we suspect the malloc
mutex (__lll_lock_wait_private())
– oprofile profiling shows that 95% of samples are in futex_wait & futex_wake
– Since the glibc mutex is implemented with futex, it is very likely that the malloc mutex causes
the high system CPU
• Solution
– We use Google tcmalloc to replace the default glibc ptmalloc via LD_PRELOAD. The
query latency is reduced from 3 seconds to 300 ms
– Since MongoDB 2.2 already uses tcmalloc as its default memory allocator, you can use
MongoDB 2.2 directly.
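For readers who have not used LD_PRELOAD, the sketch below shows one way to start a process with an alternative allocator preloaded. The library path and the mongod arguments are placeholders, not values from the talk, and the launch line is left commented out so the snippet has no side effects.

```python
import os


def with_preload(env, library_path):
    """Return a copy of env with library_path placed first in LD_PRELOAD."""
    new_env = dict(env)
    existing = new_env.get("LD_PRELOAD")
    # The dynamic loader accepts a colon-separated list of libraries.
    new_env["LD_PRELOAD"] = f"{library_path}:{existing}" if existing else library_path
    return new_env


# Hypothetical install location; the real path depends on the distribution.
TCMALLOC_PATH = "/usr/lib/libtcmalloc.so"

env = with_preload(os.environ, TCMALLOC_PATH)
print(env["LD_PRELOAD"])

# To launch the server with the preloaded allocator (placeholder arguments):
# import subprocess
# subprocess.Popen(["mongod", "--dbpath", "/data/db"], env=env)
```

Only the child process sees the modified environment, so the allocator swap does not affect the launching script itself.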
Q&A
Thanks!
Please visit us @eBayTech