Microservices at Netflix Scale

First Principles, Tradeoffs, Lessons Learned


Ruslan Meshenberg
@rusmeshenberg
Microservices:
all benefits, no costs?
Netflix is the world’s leading Internet television network
with over 81 million members in over 190 countries
enjoying more than 125 million hours of TV shows and
movies per day, including original series, documentaries
and feature films.
Ruslan Meshenberg
Director, Platform Engineering

• Runtime Systems

• Container Runtime

• Persistence and Databases

• Real Time Data Infrastructure


Netflix runs on microservices
Netflix journey
to microservices
Our journey took 7 years

https://media.netflix.com/en/company-blog/completing-the-netflix-cloud-migration
Data Center - Monolith

RDBMS
August 2008
First Principles
Buy vs. Build
● Use or contribute to OSS technologies first

● Only build what you have to


Services should be stateless*
● Must not rely on sticky sessions

● Prove by Chaos testing

*Except the Persistence / Caching layers


Scale out vs. scale up
● If you keep scaling up, you’ll hit a limit

● Horizontal scaling gives you a longer runway


Redundancy and Isolation
For Resiliency
● Make more than one of anything

● Isolate the blast radius for any given failure


Automate destructive testing
● Simian Army

● Started with Chaos Monkey


First Principles
In Action
Stateless services

[Diagram: Service A in front of five instances of Service B]

Verify stateless
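One way to picture "verify stateless" is a small simulation: keep all state in a shared store, terminate an instance partway through a run, and check that callers never notice. The sketch below is illustrative only. `Instance`, `call_service_b`, `chaos_check` and the dict used as a shared store are hypothetical names invented for this example; this is not Chaos Monkey or any Netflix tool.

```python
import random


class Instance:
    """One replica of a hypothetical stateless Service B."""

    def __init__(self, name):
        self.name = name
        self.alive = True

    def handle(self, user_id, shared_store):
        if not self.alive:
            raise ConnectionError(f"{self.name} is down")
        # All state lives in the shared store, never on the instance itself.
        shared_store[user_id] = shared_store.get(user_id, 0) + 1
        return shared_store[user_id]


def call_service_b(instances, user_id, shared_store, rng):
    """Try instances in random order; any live one can serve any request."""
    for inst in rng.sample(instances, len(instances)):
        try:
            return inst.handle(user_id, shared_store)
        except ConnectionError:
            continue  # no sticky session, so just try another replica
    raise RuntimeError("no live instance of Service B")


def chaos_check(n_instances=5, n_requests=20, seed=0):
    """Send requests for one user and kill one instance halfway through."""
    rng = random.Random(seed)
    instances = [Instance(f"b-{i}") for i in range(n_instances)]
    store = {}
    results = []
    for i in range(n_requests):
        if i == n_requests // 2:
            rng.choice(instances).alive = False  # terminate one instance mid-run
        results.append(call_service_b(instances, "user-1", store, rng))
    return results
```

If the per-user counter keeps increasing by one across the kill, no request depended on a particular instance. A service that kept the counter in instance memory would fail this check.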
Data – from RDBMS to Cassandra
● NoSQL at scale
● Open Source
● Multi-Regional
● Multi-directional

● Available
● Partition Tolerance
● Tunable Consistency*
Multi-Regional Replication
[Diagram: Region A and Region B, each with Zones A, B and C and a client using Local Quorum (typical); bi-directional replication between the regions, labeled 500ms]
Nightly compare & repair
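The "Local Quorum" and "Tunable Consistency" labels can be made concrete with a little arithmetic. In Cassandra a quorum is floor(RF / 2) + 1 replicas, and a read is guaranteed to overlap the latest write when read replicas plus write replicas exceed the replication factor. The sketch below assumes a replication factor of 3 per region (one replica per zone), which the slides suggest but do not state; the function names are made up for illustration.

```python
def quorum(replication_factor: int) -> int:
    """Number of replicas that form a quorum: floor(RF / 2) + 1."""
    return replication_factor // 2 + 1


def read_overlaps_write(replication_factor: int, write_acks: int, read_acks: int) -> bool:
    """True when every read set must intersect every write set (R + W > RF)."""
    return write_acks + read_acks > replication_factor


# Assumption: RF = 3 in the local region, one replica in each of Zones A, B, C.
LOCAL_RF = 3
local_quorum = quorum(LOCAL_RF)

# Quorum writes plus quorum reads overlap within the region.
strong = read_overlaps_write(LOCAL_RF, local_quorum, local_quorum)

# Single-replica reads and writes trade that guarantee for lower latency.
weak = read_overlaps_write(LOCAL_RF, 1, 1)
```

Under that assumption a local quorum is 2 of 3 replicas, so a request can succeed with one zone unavailable and without waiting on the other region.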
Last, but not least - Billing
Microservices –
Benefits
Our Priorities
1. Innovation
2. Reliability
3. Efficiency
Innovation:
tight coupling doesn’t work

[Diagram: Teams A, B, C, … all feed one Develop → Test → Release pipeline]
Innovation: Loose coupling
Team A: Develop, Test, Deploy, Support
Team B: Develop, Test, Deploy, Support
Team C: Develop, Test, Deploy, Support
End-end ownership
[Diagram: one cycle of Architect → Design → Develop → Review → Test → Deploy → Run → Support]
End-end ownership + velocity
[Diagram: the Architect → Design → Develop → Review → Test → Deploy → Run → Support cycle repeated many times side by side]
Separation of concerns
UI: Feature A, Feature B, Feature C
Personalization: Feature D, A/B Test E
Mid-tier: A/B Test F, Feature H
Infrastructure: Availability, Scalability, Security

[Diagram label: Leverage]


Microservices –
Costs
Microservices
Is an org change!

Org changes are hard!


Evolving the organization
Central infrastructure
investment
Migration doesn’t happen
overnight
● Living in the hybrid world

● Supporting 2 tech stacks

● Double the maintenance

● Multi-master data replication


Microservices -
Lessons Learned
IPC is crucial
for loose coupling
● Common language between the services

● Establishes the contract of interaction


Caching to protect DBs
[Diagram: a Client Application with a Request Cache and a Client Library (EVCache Client, Service Client), calling service instances (S) backed by DBs]

1. Read from Cache
2. On cache miss call service
3. Service calls DB and responds
4. Service updates the cache
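The four steps can be sketched in a few lines. This is a minimal illustration of the read path as listed, using plain dicts as stand-ins for the cache and the database. `Service`, `read` and the `db_reads` counter are hypothetical names for this example; it does not use the EVCache client API.

```python
class Service:
    """Hypothetical service that owns the database and refreshes the cache."""

    def __init__(self, db, cache):
        self.db = db
        self.cache = cache
        self.db_reads = 0  # counts how often the DB is actually hit

    def fetch(self, key):
        self.db_reads += 1
        value = self.db[key]     # step 3: service calls DB and responds
        self.cache[key] = value  # step 4: service updates the cache
        return value


def read(key, cache, service):
    """Client-side read path."""
    if key in cache:           # step 1: read from cache
        return cache[key]
    return service.fetch(key)  # step 2: on cache miss call service


# Usage: the second read is served from the cache and never reaches the DB.
db = {"title:1": "House of Cards"}
cache = {}
service = Service(db, cache)
first = read("title:1", cache, service)
second = read("title:1", cache, service)
```

Only the first read for a key reaches the database, so repeated reads are absorbed by the cache. A real deployment would also need expiry and invalidation, which the slide does not cover.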
Operational visibility matters

If you can’t see it, you can’t improve it


Will your Telemetry scale?

Observe → Orient → Decide → Act
Edge | Middle Tier & Platform
[Diagram components: ELB, Zuul, API, Playback, EVCache, Cassandra]
Reliability Matters
● We strive for 4 9’s of availability

● That leaves only 52 minutes of downtime per YEAR

● Netflix outages lead to…


Disappointment
Outrage
Withdrawal
Humor
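The 52-minute figure quoted for four nines follows directly from the number of minutes in a year. A short worked calculation, assuming a 365-day year (the function name is illustrative):

```python
MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600 minutes in a 365-day year


def downtime_minutes_per_year(nines: int) -> float:
    """Allowed downtime per year for an availability of N nines."""
    unavailability = 10 ** -nines  # four nines -> 0.0001
    return MINUTES_PER_YEAR * unavailability


four_nines = downtime_minutes_per_year(4)   # about 52.6 minutes
three_nines = downtime_minutes_per_year(3)  # about 8.8 hours
```

Each additional nine cuts the yearly downtime budget by a factor of ten.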
Cascading failures


99% availability × 99% availability × 99% availability × …

(99%)^500 ≈ 0.657%
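The compounding effect is easy to check numerically. The sketch below assumes that failures are independent and that a request succeeds only if every service in the chain is up. Both are simplifications made for this example, not claims from the slides.

```python
def chain_availability(per_service: float, n_services: int) -> float:
    """Availability of a request that needs all n services to be up."""
    return per_service ** n_services


three_services = chain_availability(0.99, 3)          # about 0.970
five_hundred_services = chain_availability(0.99, 500)  # about 0.0066, under 1%
```

With 500 dependencies at 99% each, the combined availability falls below 1%, which is why isolating failures matters more than improving any single service.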
Microservice failure

FIT
Fault-Injection
Test Framework
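The slides name FIT only as a fault-injection test framework and do not show its interface. As a rough sketch of the general technique, and not of FIT itself, a request can carry a set of dependencies to fail on purpose, and each call site checks that set before calling the real dependency. Every name below (`InjectedFailure`, `call_dependency`, the `inject_failure` key) is hypothetical.

```python
class InjectedFailure(Exception):
    """Raised in place of a real call when a failure is injected."""


def call_dependency(name, request_ctx, real_call, fallback):
    """Call a dependency, honouring any failure injected for this request."""
    try:
        if name in request_ctx.get("inject_failure", ()):
            raise InjectedFailure(name)
        return real_call()
    except (InjectedFailure, ConnectionError):
        # Injected and real outages share one path, so the test
        # exercises the same fallback that production would use.
        return fallback()


def personalized_rows():
    return ["row for you"]


def default_rows():
    return ["popular row"]


normal = call_dependency("personalization", {}, personalized_rows, default_rows)
injected = call_dependency(
    "personalization",
    {"inject_failure": {"personalization"}},
    personalized_rows,
    default_rows,
)
```

Scoping the injection to a single request context limits the blast radius of the test: only requests tagged for the experiment see the failure.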
Regional fail-over

A word on containers
● Containers change the level of encapsulation from VM to process

● Containers can help deliver a great developer experience

● To run containers in production at scale…

Requires something like this:
[Diagram: Titus container platform. Components shown: Titus UI, Rhea, Titus API and CI/CD; Titus Master (Job Management & Scheduler, Fenzo) with Cassandra and Zookeeper; Mesos Master; Docker Registry with S3; Titus Agent (mesos agent, Titus executor, docker, containers, logging agent, metrics agent, metadata proxy, zfs, docker driver); AWS Integration (VPC networking, EC2 Autoscaling); Amazon VMs]
Microservices -
Resources
http://netflix.github.com
Wrap up
● Microservices bring great value to development velocity, availability and other dimensions
● Microservices at scale require organizational change and centralized infrastructure investment
● Be aware of your situation and what works for you
Questions?

Ruslan Meshenberg
@rusmeshenberg
