AWS High Availability Design Guide
YOU: here to learn more about designing your applications for high
availability on AWS
TODAY: about best practices and things to think about when building a
highly available application on AWS
What is High Availability?
Availability: Percentage of time an application operates during its work cycle
Loss of availability is known as an outage or downtime
App is offline, unreachable, or partially available
App is slow to use
Planned and unplanned
Goal
No downtime
Always available
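As a rough illustration of the definition above, the sketch below turns an availability percentage into a downtime budget. The function names and the 365-day year are assumptions made for this example, not part of the original slides.

```python
# Illustrative arithmetic only: availability as a percentage of the work cycle.
MINUTES_PER_YEAR = 365 * 24 * 60  # assumes a 365-day year


def availability_percent(uptime_minutes: float, total_minutes: float) -> float:
    """Percentage of the work cycle during which the application operated."""
    if total_minutes <= 0:
        raise ValueError("total_minutes must be positive")
    return 100.0 * uptime_minutes / total_minutes


def downtime_budget_minutes(target_percent: float,
                            period_minutes: float = MINUTES_PER_YEAR) -> float:
    """Minutes of outage (planned or unplanned) that the target still allows."""
    return period_minutes * (1.0 - target_percent / 100.0)


if __name__ == "__main__":
    for target in (99.0, 99.9, 99.99):
        print(f"{target}% allows {downtime_budget_minutes(target):.1f} min/year")
```

Each additional "nine" cuts the yearly downtime budget by a factor of ten, which is why "no downtime" is a goal to approach rather than a number to promise.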
Availability is related to
Scalability
Ability of an application to accommodate growth without changing design
If app cannot scale, availability may be impacted
Scalability doesn't guarantee availability
Fault Tolerance
Built-in redundancy so apps can continue functioning when components fail
Fault tolerance is crucial to HA
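To see why built-in redundancy matters, here is a minimal sketch of how component availabilities combine. It assumes failures are independent, which real systems only approximate, and the numbers in the usage note are made up for illustration.

```python
from math import prod


def serial_availability(components):
    """Every component is required, so the availabilities multiply."""
    return prod(components)


def redundant_availability(component, copies):
    """Any one of `copies` independent replicas is enough to keep serving."""
    return 1.0 - (1.0 - component) ** copies


if __name__ == "__main__":
    print(serial_availability([0.99, 0.99]))   # two required tiers in a chain
    print(redundant_availability(0.99, 2))     # one tier with two replicas
```

Under these assumptions, chaining two 99% components lowers the overall figure, while duplicating one of them raises it.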
AWS GLOBAL INFRASTRUCTURE
AWS Regions and Availability Zones
[Diagram: user, DNS resolution of www.example.com via Route 53, Internet gateway, Elastic IP, RDS DB instance]
#1 DESIGN FOR FAILURE
"Everything fails, all the time."
Werner Vogels, CTO of Amazon
AVOID SINGLE POINTS OF FAILURE
[Diagram, repeated on three slides: user, DNS resolution of www.example.com via Route 53, Internet gateway, Elastic IP, RDS DB instance]
AMAZON EBS
ELASTIC BLOCK STORE
Elastic Block Store
High performance block storage device
1GB to 1TB in size
Mount as drives to instances
[Diagram labels: EC2, EBS, snapshot]

Feature                        Details
High performance file system   Mount EBS as drives and format as required
Flexible size                  Volumes from 1GB to 1TB in size
Secure                         Private to your instances
Performance                    Use provisioned IOPS to get desired level of IO performance
Available                      Replicated within an Availability Zone
Backups                        Volumes can be snapshotted for point in time restore
Monitoring                     Detailed metrics captured via CloudWatch
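The "point in time restore" row can be pictured as choosing the most recent snapshot taken at or before the moment you want to return to. The helper below is a hypothetical sketch of that selection logic only. It does not call any AWS API, and the function name is an assumption.

```python
from bisect import bisect_right
from datetime import datetime


def snapshot_for_restore(snapshot_times, restore_point):
    """Return the latest snapshot time at or before restore_point, or None."""
    ordered = sorted(snapshot_times)
    index = bisect_right(ordered, restore_point)
    return ordered[index - 1] if index else None


if __name__ == "__main__":
    taken = [datetime(2024, 1, 1), datetime(2024, 1, 2), datetime(2024, 1, 3)]
    print(snapshot_for_restore(taken, datetime(2024, 1, 2, 12, 0)))
```

The gap between the chosen snapshot and the failure is data you would lose, so snapshot frequency is part of the availability design.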
[Diagram, repeated on four slides: user, DNS resolution of www.example.com via Route 53, Internet gateway, Elastic IP, EBS, RDS DB instance]
AMAZON ELB
ELASTIC LOAD BALANCING
[Diagram, two slides: user, DNS resolution of www.example.com via Route 53, Internet gateway, RDS DB instance; the first slide shows an Elastic IP, the second shows Elastic Load Balancing in its place]
HEALTH CHECKS
[Diagram, repeated on three slides: user, DNS resolution of www.example.com via Route 53, Internet gateway, Elastic Load Balancing with Health Checks, RDS DB instance]
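The idea behind load balancer health checks can be sketched as a small simulation: a backend is taken out of rotation after a number of consecutive failed checks and returned after a number of consecutive passes. The class names and thresholds below are assumptions for illustration and do not reproduce the Elastic Load Balancing API.

```python
class Backend:
    """One server behind the load balancer, with simple health bookkeeping."""

    def __init__(self, name, unhealthy_threshold=2, healthy_threshold=2):
        self.name = name
        self.in_service = True
        self.unhealthy_threshold = unhealthy_threshold
        self.healthy_threshold = healthy_threshold
        self._failures = 0
        self._successes = 0

    def record_check(self, passed):
        """Update state from one health check result."""
        if passed:
            self._successes += 1
            self._failures = 0
            if not self.in_service and self._successes >= self.healthy_threshold:
                self.in_service = True
        else:
            self._failures += 1
            self._successes = 0
            if self.in_service and self._failures >= self.unhealthy_threshold:
                self.in_service = False


class LoadBalancer:
    """Round-robin over the backends that are currently in service."""

    def __init__(self, backends):
        self.backends = list(backends)
        self._next = 0

    def route(self):
        healthy = [b for b in self.backends if b.in_service]
        if not healthy:
            raise RuntimeError("no healthy backends")
        backend = healthy[self._next % len(healthy)]
        self._next += 1
        return backend.name
```

Requiring several consecutive results before changing state keeps a single slow response from flapping a server in and out of service.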
#2 MULTIPLE AVAILABILITY ZONES
AMAZON RDS
MULTI-AZ
Relational Database Service
Database-as-a-Service
No need to install or manage database instances
Scalable and fault tolerant configurations
[Diagram labels: RDS DB instance, RDS DB instance read replica, RDS DB instance standby (Multi-AZ)]

Feature              Details
Platform support     Create MySQL, SQL Server, PostgreSQL and Oracle RDBMS
Preconfigured        Get started instantly with sensible default settings
Automated patching   Keep your database platform up to date automatically
Backups              Automatic backups, point in time recovery and full DB backups
Provisioned IOPS     Specify IO throughput depending on requirements
Failover             Automated failover to a standby host in the event of a failure
Replication          Easily create read replicas of your data and seamlessly replicate data across Availability Zones
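The Failover and Replication rows can be illustrated with a toy model: every write goes to both copies before it is acknowledged, so the standby can take over without losing committed data. This is a simplified sketch with hypothetical names, not how RDS is implemented. In particular, swapping roles on failover is a shortcut made for the example.

```python
class MultiAZDatabase:
    """Toy model of a primary with a synchronously replicated standby."""

    def __init__(self, primary_az, standby_az):
        self.primary_az = primary_az
        self.standby_az = standby_az
        self._data = {primary_az: {}, standby_az: {}}

    def write(self, key, value):
        # Synchronous replication: both copies are updated before returning.
        self._data[self.primary_az][key] = value
        self._data[self.standby_az][key] = value

    def read(self, key):
        # Reads are served by whichever copy is currently the primary.
        return self._data[self.primary_az][key]

    def fail_over(self):
        """Promote the standby; in this sketch the roles simply swap."""
        self.primary_az, self.standby_az = self.standby_az, self.primary_az


if __name__ == "__main__":
    db = MultiAZDatabase("az-a", "az-b")
    db.write("order-1", "paid")
    db.fail_over()
    print(db.primary_az, db.read("order-1"))
```

Because the application addresses the database through one endpoint, it can reconnect after a failover without being told which zone is now the primary.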
[Diagram, three slides: user, DNS resolution of www.example.com via Route 53, Internet gateway, Elastic Load Balancing; RDS DB instance in Availability Zone A with synchronous replication to a slave in Availability Zone B]
AMAZON ELB AND
MULTIPLE AZs
[Diagram, repeated across several slides: user, DNS resolution of www.example.com via Route 53, Internet gateway, Elastic Load Balancing, RDS DB instance with synchronous replication to a slave; one slide adds the labels "AMI" and "Auto Scaling Policy fires"]
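What "Auto Scaling Policy fires" means can be sketched as a threshold rule on a metric. The function below is a hypothetical illustration. The CPU thresholds, step size and capacity bounds are made-up defaults, not values from the slides or from the Auto Scaling service.

```python
def desired_capacity(current, avg_cpu, *, scale_out_at=70.0, scale_in_at=30.0,
                     minimum=2, maximum=10, step=1):
    """Return the new instance count for a simple threshold-based policy."""
    if avg_cpu >= scale_out_at:
        target = current + step      # policy fires: add capacity
    elif avg_cpu <= scale_in_at:
        target = current - step      # load is low: remove capacity
    else:
        target = current             # inside the comfort band: do nothing
    # Never leave the configured bounds of the group.
    return max(minimum, min(maximum, target))


if __name__ == "__main__":
    print(desired_capacity(2, 85.0))
    print(desired_capacity(4, 50.0))
```

Keeping the minimum at two or more lets the group place at least one instance in each of two Availability Zones.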
Scale up to a larger server: the cr1.8xlarge has 32 vCPUs and 244 GB of RAM

SCALING READS
Scale out with one or more read servers (master-slave architecture)
Use cases: reporting and ETL

No practical limit on scalability
Operation complexity/sophistication
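Scaling reads with one or more read servers usually means the application sends writes to the master and spreads queries over the replicas. The router below is a minimal sketch of that split. The endpoint names are placeholders, and the "starts with SELECT" test is a deliberately naive assumption.

```python
import itertools


class ReadWriteRouter:
    """Send reads to replicas in turn and everything else to the master."""

    def __init__(self, master, replicas):
        self.master = master
        # cycle() would never yield on an empty list, so keep None instead.
        self._replicas = itertools.cycle(replicas) if replicas else None

    def endpoint_for(self, sql):
        is_read = sql.lstrip().lower().startswith("select")
        if is_read and self._replicas is not None:
            return next(self._replicas)
        return self.master


if __name__ == "__main__":
    router = ReadWriteRouter("master", ["replica-1", "replica-2"])
    print(router.endpoint_for("SELECT * FROM photos"))
    print(router.endpoint_for("INSERT INTO photos VALUES (1)"))
```

RDS read replicas are updated asynchronously, so a query routed this way may briefly see stale data. That tends to be acceptable for reporting and ETL, less so for reads that must reflect a write made a moment earlier.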
[Diagram, repeated across several slides: user, DNS resolution via Route 53, Internet gateway, Elastic Load Balancing, RDS DB instance with synchronous replication to a slave; one slide shows an Auto Scaling group launching an instance]
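Self-healing with an Auto Scaling group can be thought of as a reconciliation step: drop unhealthy instances, then launch replacements until the desired count is met, preferring the emptiest zone. The function below is a sketch under that assumption. The tuple layout and names are hypothetical, and it launches nothing for real.

```python
def reconcile(instances, desired, zones):
    """Return (healthy_instances, zones_to_launch_in) for one healing pass.

    instances: iterable of (instance_id, zone, is_healthy) tuples.
    """
    if not zones:
        raise ValueError("at least one zone is required")
    healthy = [(instance_id, zone) for instance_id, zone, ok in instances if ok]
    per_zone = {zone: 0 for zone in zones}
    for _, zone in healthy:
        per_zone[zone] = per_zone.get(zone, 0) + 1
    launches = []
    while len(healthy) + len(launches) < desired:
        # Pick the zone with the fewest instances to keep the group balanced.
        zone = min(zones, key=per_zone.__getitem__)
        per_zone[zone] += 1
        launches.append(zone)
    return healthy, launches


if __name__ == "__main__":
    fleet = [("i-1", "az-a", True), ("i-2", "az-b", False)]
    print(reconcile(fleet, desired=2, zones=["az-a", "az-b"]))
```

Running a pass like this on a schedule is what lets the application recover from an instance failure without an operator stepping in.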
AMAZON SQS

Feature    Details
Reliable   Messages stored redundantly across multiple Availability Zones
Simple     Simple APIs to send and receive messages
Scalable   Unlimited number of messages
[Diagram, repeated on three slides: user, S3 Bucket, Webservers / CMS]
[Diagram labels: SQS, Workers, "Message reappears in queue"]
1) User / browser posts photo to S3 and is redirected to form on webservers
2) User completes form for photo and submits
3) Message is sent to SQS
4) Worker long polling SQS grabs message and creates different size photo assets
5) Thumbs are uploaded to S3 bucket
6) Worker updates database with photo assets
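The "Message reappears in queue" label refers to what happens when a worker takes a message but never finishes: after a timeout the message becomes visible again and another worker can pick it up. The in-memory queue below is a simplified sketch of that behaviour. It is not the SQS API: real SQS deletes by receipt handle, and the class name and explicit `now` argument are assumptions made to keep the example deterministic.

```python
import itertools


class VisibilityQueue:
    """In-memory queue whose received messages are hidden for a timeout."""

    def __init__(self, visibility_timeout=30.0):
        self.visibility_timeout = visibility_timeout
        self._ids = itertools.count(1)
        self._messages = {}  # message_id -> [body, invisible_until]

    def send(self, body):
        message_id = next(self._ids)
        self._messages[message_id] = [body, 0.0]
        return message_id

    def receive(self, now):
        """Return (message_id, body) for the first visible message, or None."""
        for message_id, entry in self._messages.items():
            body, invisible_until = entry
            if now >= invisible_until:
                entry[1] = now + self.visibility_timeout  # hide while in flight
                return message_id, body
        return None

    def delete(self, message_id):
        """Called by the worker only after the job has fully succeeded."""
        self._messages.pop(message_id, None)


if __name__ == "__main__":
    queue = VisibilityQueue(visibility_timeout=30.0)
    queue.send("resize photo 42")
    print(queue.receive(now=0.0))    # a worker takes the job, then crashes
    print(queue.receive(now=10.0))   # still hidden from other workers
    print(queue.receive(now=31.0))   # the message has reappeared
```

Because a message can be delivered more than once, the worker's job (creating thumbnails, updating the database) should be safe to repeat.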
Photo CMS with SQS
[Diagram, two slides: www.example.com via Route 53, user, S3 Bucket, Webservers / CMS; the second slide shows a backlog of messages]
Pull: Kinesis Stream
[Diagram: user, S3 Bucket, Webservers / CMS, Lambda]
1) User / browser posts photo to S3 and is redirected to form on webservers
2) The redirected user completes form for photo and submits
3) At the same time as the redirect, S3 event notifications fire off and are received by Lambda
4) Lambda creates different size photo assets and uploads them to S3
5) Lambda updates database with photo assets
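Steps 3 and 4 can be sketched as a Lambda-style handler that reads the bucket and key from each S3 event record and plans the thumbnail objects to create. The thumbnail widths and the `thumbs/` key layout are hypothetical. The actual resizing, upload and database update are left out because they would need an image library and the AWS SDK.

```python
from urllib.parse import unquote_plus

THUMBNAIL_WIDTHS = (100, 400, 800)  # hypothetical sizes, in pixels


def thumbnail_keys(key, widths=THUMBNAIL_WIDTHS):
    """Object keys under which the resized copies would be stored."""
    return [f"thumbs/{width}/{key}" for width in widths]


def handler(event, context=None):
    """Plan the work for each uploaded photo named in an S3 event."""
    planned = []
    for record in event.get("Records", []):
        bucket = record["s3"]["bucket"]["name"]
        # Object keys arrive URL-encoded in S3 event notifications.
        key = unquote_plus(record["s3"]["object"]["key"])
        planned.append({
            "bucket": bucket,
            "source": key,
            "thumbnails": thumbnail_keys(key),
        })
    return planned


if __name__ == "__main__":
    sample = {"Records": [{"s3": {"bucket": {"name": "photos"},
                                  "object": {"key": "uploads/my+cat.jpg"}}}]}
    print(handler(sample))
```

Compared with the SQS version, there is no worker fleet to run or scale here: the upload itself triggers the processing.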
1. DESIGN FOR FAILURE
2. MULTIPLE AVAILABILITY ZONES
3. SCALING
4. SELF-HEALING
5. LOOSE COUPLING
YOUR GOAL
Applications should continue to function
IT'S ALL ABOUT CHOICE
BALANCE COST & AVAILABILITY REQUIREMENTS
AWS Architecture Center
http://aws.amazon.com/architecture
AWS Whitepapers
http://aws.amazon.com/whitepapers
AWS Blog
http://aws.amazon.com/blogs/aws