
AWS High Availability Design Guide

The document discusses designing applications for high availability on AWS. It emphasizes that availability is related to scalability and fault tolerance. The key principles for designing highly available systems on AWS are to design for failure, use multiple availability zones, implement scaling, enable self-healing capabilities, and maintain loose coupling between components. Some AWS services that are inherently highly available include S3, DynamoDB, and CloudFront, while other services like EC2, EBS, and RDS can be made highly available through proper architecture and configuration.


Design for Availability

Joel Williams, Solutions Architect, AWS


March 18, 2015
Designing for Availability
ME: Joel Williams, Solutions Architect at Amazon Web Services

YOU: here to learn more about designing your applications for high
availability on AWS

TODAY: about best practices and things to think about when building a
highly available application on AWS
What is High Availability?
Availability: Percentage of time an application operates during its work cycle
Loss of availability is known as an outage or downtime
App is offline, unreachable, or partially available
App is slow to use
Planned and unplanned
Goal
No downtime
Always available
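
As a rough illustration of the availability definition above, the following Python sketch converts an availability percentage into the downtime it allows per year. The function name, the 24x7 work cycle, and the 365-day year are assumptions made for the example, not figures from the talk.

```python
HOURS_PER_YEAR = 365 * 24  # assumption: non-leap year, app expected to run 24x7


def allowed_downtime_hours(availability_percent: float) -> float:
    """Return the hours of downtime per year implied by an availability target."""
    if not 0 <= availability_percent <= 100:
        raise ValueError("availability must be between 0 and 100")
    return HOURS_PER_YEAR * (1 - availability_percent / 100)


# Print the yearly downtime budget for a few example targets.
for target in (99.0, 99.9, 99.99):
    print(f"{target}% -> {allowed_downtime_hours(target):.2f} hours/year")
```

Each additional "nine" cuts the downtime budget by a factor of ten, which is one way to reason about the cost and availability trade-off.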

Availability is related to
Scalability
Ability of an application to accommodate growth without changing design
If app cannot scale, availability may be impacted
Scalability doesn't guarantee availability

Fault Tolerance
Built-in redundancy so apps can continue functioning when components fail
Fault tolerance is crucial to HA

AWS democratizes High Availability
Multiple servers, isolated redundant data centers, regions across the globe, Fault Tolerant services, etc.

AWS GLOBAL
INFRASTRUCTURE
Global Infrastructure
AWS Regions and Availability Zones

Customer Decides Where Applications and Data Reside


Reference Model
[Diagram, layers from top to bottom: Deployment & Administration; App Services; Compute, Storage, Database; Networking; AWS Global Infrastructure]
AWS BUILDING BLOCKS

Inherently Highly Available and Fault Tolerant Services
Amazon S3
Amazon DynamoDB
Amazon CloudFront
Amazon Route53
Elastic Load Balancing
Amazon SQS
Amazon SNS
Amazon SES
Amazon SWF

Highly Available with the right architecture
Amazon EC2
Amazon EBS
Amazon RDS
Amazon VPC
Principles of Designing for Availability

1. DESIGN FOR FAILURE


2. MULTIPLE AVAILABILITY ZONES
3. SCALING
4. SELF-HEALING
5. LOOSE COUPLING
LET'S BUILD A
HIGHLY AVAILABLE
SYSTEM
Compute: Elastic Compute Cloud (EC2)
** MANY NEW INSTANCE TYPES
Basic unit of compute capacity
Range of CPU, memory & local disk options
42 Instance types available from 16 different families
From $0.02/hr
[Diagram: Amazon EC2 instances, Vertical Scaling]

Feature: Details
Flexible: Run Windows or Linux distributions
Scalable: Wide range of instance types from micro to cluster compute
Machine Images: Configurations can be saved as machine images (AMIs) from which new instances can be created
Full control: Full root or administrator rights
Secure: Full firewall control via Security Groups
Monitoring: Publishes metrics to CloudWatch
Inexpensive: On-demand, Reserved and Spot instance types
VM Import/Export: Import and export VM images to transfer configurations in and out of EC2
[Diagram: Web Server (EC2)]
[Diagram: Web Server (EC2), RDS DB instance]
[Diagram: Internet gateway, Elastic IP, Web Server (EC2), RDS DB instance]
[Diagram: user, www.example.com, Route 53 DNS Resolution, Internet gateway, Elastic IP, Web Server (EC2), RDS DB instance]
#1
DESIGN FOR FAILURE

"Everything fails all the time"
Werner Vogels, CTO of Amazon
AVOID SINGLE POINTS OF FAILURE
ASSUME EVERYTHING FAILS, AND WORK BACKWARDS
YOUR GOAL
Applications should continue to function
[Diagram, shown on three consecutive slides: user, www.example.com, Route 53 DNS Resolution, Internet gateway, Elastic IP, Web Server (EC2), RDS DB instance. The third slide shows a second EC2 instance next to the web server.]
AMAZON EBS
ELASTIC BLOCK STORE
Storage: Elastic Block Store
High performance block storage device
1GB to 1TB in size
Mount as drives to instances
[Diagram: EC2, EBS, snapshot]

Feature: Details
High performance file system: Mount EBS as drives and format as required
Flexible size: Volumes from 1GB to 1TB in size
Secure: Private to your instances
Performance: Use provisioned IOPS to get desired level of IO performance
Available: Replicated within an Availability Zone
Backups: Volumes can be snapshotted for point in time restore
Monitoring: Detailed metrics captured via CloudWatch
[Diagram, shown on four consecutive slides: user, www.example.com, Route 53 DNS Resolution, Internet gateway, Elastic IP, Web Server (EC2) with EBS, RDS DB instance. The last two slides show a second EC2 instance next to the web server.]
AMAZON ELB
ELASTIC LOAD BALANCING
Compute: Elastic Load Balancing
Create highly scalable applications
Distribute load across EC2 instances in multiple availability zones
** NEW CONNECTION DRAINING AND NEW ACCESS LOGS
[Diagram: Elastic Load Balancing, EC2 instances, Auto Scaling Group]

Feature: Details
Auto-scaling: Automatically scales to handle request volume
Available: Load balance across instances in multiple availability zones
Health checks: Automatically checks health of instances and takes them in or out of service
Session stickiness: Route requests to the same instance
Secure sockets layer: Supports SSL offload from web and application servers with flexible cipher support
Monitoring: Publishes metrics to CloudWatch
[Diagram: user, www.example.com, Route 53 DNS Resolution, Internet gateway, Elastic IP, Web Server (EC2), RDS DB instance]
[Diagram: user, www.example.com, Route 53 DNS Resolution, Internet gateway, Elastic Load Balancing, three Web Servers (EC2), RDS DB instance]
HEALTH CHECKS
[Diagram, shown on four consecutive slides: user, www.example.com, Route 53 DNS Resolution, Internet gateway, Elastic Load Balancing, three Web Servers (EC2), RDS DB instance. The last slide adds the label "Health Checks" below Elastic Load Balancing.]
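
The health-check behaviour described for Elastic Load Balancing (instances are taken in or out of service automatically) can be sketched as a small state machine. This is a minimal simulation, not the ELB implementation: the class name, the instance names, and the consecutive-result thresholds are all assumptions chosen for illustration.

```python
from dataclasses import dataclass


@dataclass
class Target:
    """One instance behind the load balancer (names are hypothetical)."""
    name: str
    in_service: bool = True
    failures: int = 0
    successes: int = 0


def record_check(target: Target, passed: bool,
                 unhealthy_threshold: int = 2, healthy_threshold: int = 3) -> None:
    """Update a target after one health check, using consecutive-result thresholds."""
    if passed:
        target.successes += 1
        target.failures = 0
        if not target.in_service and target.successes >= healthy_threshold:
            target.in_service = True
    else:
        target.failures += 1
        target.successes = 0
        if target.in_service and target.failures >= unhealthy_threshold:
            target.in_service = False


def routable(targets: list[Target]) -> list[str]:
    """Names of the instances that should currently receive traffic."""
    return [t.name for t in targets if t.in_service]


fleet = [Target("web-1"), Target("web-2"), Target("web-3")]
record_check(fleet[1], passed=False)
record_check(fleet[1], passed=False)  # second consecutive failure
print(routable(fleet))
```

Requiring several consecutive results before changing state keeps a single slow response from removing a healthy instance.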
#2
MULTIPLE
AVAILABILITY ZONES

AMAZON RDS
MULTI-AZ
Database: Relational Database Service
Database-as-a-Service
No need to install or manage database instances
Scalable and fault tolerant configurations
[Diagram: RDS DB instance, RDS DB instance read replica, RDS DB instance standby (Multi-AZ)]

Feature: Details
Platform support: Create MySQL, SQL Server, Postgres and Oracle RDBMS
Preconfigured: Get started instantly with sensible default settings
Automated patching: Keep your database platform up to date automatically
Backups: Automatic backups and point in time recovery and full DB backups
Provisioned IOPS: Specify IO throughput depending on requirements
Failover: Automated failover to slave hosts in event of a failure
Replication: Easily create read-replicas of your data and seamlessly replicate data across availability zones
[Diagram: user, www.example.com, Route 53 DNS Resolution, Internet gateway, Elastic Load Balancing, three Web Servers (EC2), RDS DB instance in Availability Zone A with Synchronous Replication to an RDS DB Slave in Availability Zone B]
[Diagram: the same layout across Availability Zone A and Availability Zone B, showing a single RDS DB instance]
[Diagram: the same layout across Availability Zone A and Availability Zone B, showing an RDS DB instance and Synchronous Replication]
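
The Multi-AZ idea in these slides (a primary database with a synchronously replicated copy in a second Availability Zone, and automated failover) can be illustrated with a toy model. This is only a sketch under stated assumptions: the class, the zone names, and the promotion rule are hypothetical and do not describe how RDS works internally.

```python
class MultiAZDatabase:
    """Toy model: a primary in one zone and a synchronous standby in another."""

    def __init__(self) -> None:
        self.copies = {"zone-a": {}, "zone-b": {}}  # hypothetical zone names
        self.primary = "zone-a"
        self.failed: set[str] = set()

    def write(self, key: str, value: str) -> None:
        """Acknowledge a write only after every healthy copy has stored it."""
        if self.primary in self.failed:
            raise RuntimeError("primary unavailable; fail over first")
        for zone, data in self.copies.items():
            if zone not in self.failed:
                data[key] = value

    def read(self, key: str) -> str:
        """Read from whichever copy is currently the primary."""
        return self.copies[self.primary][key]

    def fail_zone(self, zone: str) -> None:
        """Mark a zone as failed and promote the standby if the primary was lost."""
        self.failed.add(zone)
        if zone == self.primary:
            survivors = [z for z in self.copies if z not in self.failed]
            if not survivors:
                raise RuntimeError("no healthy copy left")
            self.primary = survivors[0]


db = MultiAZDatabase()
db.write("order-1", "paid")
db.fail_zone("zone-a")          # simulate losing the primary's zone
print(db.primary, db.read("order-1"))
```

Because the write was stored in both zones before it was acknowledged, the promoted standby already holds it after the failover.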
AMAZON ELB AND
MULTIPLE AZs
[Diagram, shown on two consecutive slides: user, www.example.com, Route 53 DNS Resolution, Internet gateway, Elastic Load Balancing, three Web Servers (EC2), RDS DB instance with Synchronous Replication to an RDS DB Slave, across Availability Zone A and Availability Zone B. The second slide shows four web servers.]


#3
SCALING

[Diagram, shown on two consecutive slides: user, www.example.com, Route 53 DNS Resolution, Internet gateway, Elastic Load Balancing, four Web Servers (EC2), RDS DB instance with Synchronous Replication to an RDS DB Slave, across Availability Zone A and Availability Zone B]


AUTO SCALING
SCALE UP/DOWN EC2 CAPACITY
Compute: Auto Scaling
** NEW CONSOLE
Automatic re-sizing of compute clusters based upon demand
[Diagram: EC2 instances in an Auto Scaling Group]

# Create a group of 4 to 200 instances in one Availability Zone
as-create-auto-scaling-group MyGroup \
  --launch-configuration MyConfig \
  --availability-zones eu-west-1a \
  --min-size 4 \
  --max-size 200

Feature: Details
Control: Define minimum and maximum instance pool sizes and when scaling and cool down occurs
Integrated to CloudWatch: Use metrics gathered by CloudWatch to drive scaling
Instance types: Run auto scaling for on-demand instances and spot. Compatible with VPC
[Diagram sequence, nine slides: user, www.example.com, Route 53 DNS Resolution, Internet gateway, Elastic Load Balancing, four Web Servers (EC2), RDS DB instance with Synchronous Replication to an RDS DB Slave, across Availability Zone A and Availability Zone B]
[Step: the web servers are placed in an Auto Scaling Group]
[Step: "Auto Scaling Policy fires"; an AMI is shown]
[Step: six EC2 instances are shown, two of them marked "launching"]
[Step: six EC2 instances are in the Auto Scaling Group]
[Step: two EC2 instances are marked "terminating"]
[Step: four EC2 instances remain in the Auto Scaling Group]
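
The scale-out and scale-in sequence in these slides can be sketched as a simple step policy. The minimum and maximum sizes below come from the command on the Auto Scaling slide; the CPU thresholds, the step of two instances, and the function name are assumptions for illustration only.

```python
def desired_capacity(current: int, cpu_percent: float,
                     min_size: int = 4, max_size: int = 200,
                     scale_out_above: float = 70.0, scale_in_below: float = 30.0,
                     step: int = 2) -> int:
    """Return the new instance count for one evaluation of a simple step policy."""
    if cpu_percent > scale_out_above:
        current += step
    elif cpu_percent < scale_in_below:
        current -= step
    # Never leave the bounds configured on the group.
    return max(min_size, min(max_size, current))


size = 4
for cpu in (85.0, 50.0, 20.0, 20.0):
    size = desired_capacity(size, cpu)
    print(f"cpu={cpu}% -> {size} instances")
```

A real policy would also apply a cool down period between adjustments, as the feature table mentions; that is left out here to keep the sketch short.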


Scaling
Data Tier
RDS - Push-Button Scaling

Scale up or down to the desired instance class
Scale up to an 8-core server with 244 GB of RAM with the cr1.8xlarge
Scaling READS
Scale-out with one or more read servers
master-slave architecture

Use Cases
Reporting and ETL
Discrete read/write transactions (browsers vs buyers)

Scaling READS: Tech tips
Optimize master for OLTP and read slaves for table scans
Resize slaves as needed to boost reporting performance
Use short-term slaves to save cost during monthly reporting
Promote to standalone server.

NEW - Cross Region Read Replicas with MySQL
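
One way to picture the master-slave read scaling above is a small router that sends writes to the master and spreads reads across replicas. This is a hedged sketch: the class, the endpoint names, and the first-word SQL check are hypothetical simplifications, not part of any AWS SDK.

```python
import itertools


class ReplicaRouter:
    """Send writes to the master and spread reads across read replicas."""

    WRITE_VERBS = ("INSERT", "UPDATE", "DELETE")  # assumption: simple SQL only

    def __init__(self, master: str, replicas: list[str]) -> None:
        self.master = master
        # Fall back to the master when no replica is configured.
        self._readers = itertools.cycle(replicas or [master])

    def endpoint_for(self, sql: str) -> str:
        """Pick an endpoint based on the first word of the statement."""
        words = sql.split()
        verb = words[0].upper() if words else ""
        if verb in self.WRITE_VERBS:
            return self.master
        return next(self._readers)


router = ReplicaRouter("master.example.internal",
                       ["replica-1.example.internal", "replica-2.example.internal"])
print(router.endpoint_for("SELECT * FROM products"))
print(router.endpoint_for("UPDATE cart SET qty = 2 WHERE id = 7"))
```

Read replicas can lag behind the master, so reads that must see a just-committed write may still need to go to the master.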


Scaling for Writes on the Data Tier
At large scale, you may start to run into issues with your database
around contention on writes to the master.

How can you solve it?

Federation (splitting into multiple DBs based on function)

Sharding (splitting one data set up across multiple hosts)

Moving some functionality to other types of DBs (NoSQL)


Database Federation
Split up Databases by function/purpose
Harder to do cross function queries
Essentially delaying the need for something like sharding / NoSQL until much further down the line
Won't help with single huge functions/tables
[Diagram: ForumsDB, UsersDB, ProductsDB]
Sharded Horizontal Scaling
More complex at the application layer
ORM support can help
No practical limit on scalability
Operation complexity/sophistication
Shard by function or key space
RDBMS or NoSQL

User: ShardID
002345: A
002346: B
002347: C
002348: B
002349: A
[Diagram: shards A, B, C]
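
The User/ShardID table on the sharding slide amounts to a lookup from a key to a shard. The sketch below shows that lookup at the application layer; the directory entries are copied from the slide, while the hash-based fallback for unknown users and the function name are assumptions added for illustration.

```python
import hashlib

SHARDS = ("A", "B", "C")

# Directory-style mapping copied from the User/ShardID table on the slide.
DIRECTORY = {"002345": "A", "002346": "B", "002347": "C",
             "002348": "B", "002349": "A"}


def shard_for(user_id: str) -> str:
    """Look the user up in the directory, else fall back to a stable hash."""
    if user_id in DIRECTORY:
        return DIRECTORY[user_id]
    # Assumption: users not in the directory are placed by hashing their id.
    digest = hashlib.sha256(user_id.encode("utf-8")).digest()
    return SHARDS[digest[0] % len(SHARDS)]


for user in ("002345", "002347", "100001"):
    print(user, "->", shard_for(user))
```

A directory makes it easy to move individual users between shards, at the cost of an extra lookup; pure hashing avoids the lookup but makes rebalancing harder.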
#4
SELF-HEALING

HEALTH CHECKS
+
AUTO SCALING
[Diagram sequence, four slides: user, www.example.com, Route 53 DNS Resolution, Internet gateway, Elastic Load Balancing, four Web Servers (EC2) in an Auto Scaling Group, RDS DB instance with Synchronous Replication to an RDS DB Slave, across Availability Zone A and Availability Zone B. The third slide shows a fifth EC2 instance marked "launching"; the last slide shows four web servers again.]


HEALTH CHECKS
+
AUTO SCALING
=
SELF-HEALING
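
The equation above can be read as a reconciliation loop: drop whatever fails its health check, then launch replacements until the desired count is reached. The following is a minimal simulation of that idea; the function names and instance ids are hypothetical, and no AWS API is called.

```python
import itertools

_ids = itertools.count(1)


def launch_instance() -> str:
    """Stand-in for launching a new instance from a machine image."""
    return f"i-{next(_ids):04d}"


def reconcile(instances: dict[str, bool], desired: int) -> dict[str, bool]:
    """Drop instances that failed their health check and launch replacements.

    `instances` maps an instance id to its latest health-check result.
    """
    healthy = {iid: ok for iid, ok in instances.items() if ok}
    while len(healthy) < desired:
        healthy[launch_instance()] = True  # assume new instances start healthy
    return healthy


group = {launch_instance(): True for _ in range(4)}
failed = next(iter(group))
group[failed] = False                     # simulate a failed health check
group = reconcile(group, desired=4)
print(sorted(group))
```

The group ends with the same number of healthy instances it started with, without anyone having to intervene.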
DEGRADED MODE
AMAZON S3
STATIC WEBSITE
+
AMAZON ROUTE 53
DNS Failover
[Diagram, shown on two consecutive slides: an S3 Static Website for www.example.com alongside the main architecture: user, www.example.com, Route 53 DNS Resolution, Internet gateway, Elastic Load Balancing, four Web Servers (EC2) in an Auto Scaling Group, RDS DB instance with Synchronous Replication to an RDS DB Slave, across Availability Zone A and Availability Zone B]
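
The degraded-mode pattern (DNS failover from the main site to an S3 static website) boils down to a health-based choice between two endpoints. This is a sketch of that decision only: the endpoint names, the failure threshold, and the function names are assumptions, and it does not call Route 53.

```python
PRIMARY = "elb.example.com"            # hypothetical load balancer endpoint
SECONDARY = "static-site.example.com"  # hypothetical S3 static website endpoint


def primary_is_healthy(recent_checks: list[bool], failure_threshold: int = 3) -> bool:
    """Healthy unless the last `failure_threshold` checks all failed."""
    tail = recent_checks[-failure_threshold:]
    return not (len(tail) == failure_threshold and not any(tail))


def resolve(recent_checks: list[bool]) -> str:
    """Return the endpoint a failover DNS record would answer with."""
    return PRIMARY if primary_is_healthy(recent_checks) else SECONDARY


print(resolve([True, True, False]))          # still answering with the primary
print(resolve([True, False, False, False]))  # three failures in a row
```

Users reaching the static site get a reduced experience, but the application continues to respond while the primary recovers.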


#5
LOOSE
COUPLING

BUILD LOOSELY
COUPLED SYSTEMS
The looser they are coupled,
the bigger they scale,
the more fault tolerant they get
Services Oriented Architecture - SOA
Move services into their own tiers/modules. Treat each of these as 100% wholly separate pieces of your infrastructure and scale them independently.

Amazon.com and AWS do this extensively! It offers flexibility and greater understanding of each component.
Loose coupling sets you free!
The looser they're coupled, the bigger they scale
Independent components
Design everything as a black box
Decouple interactions
Favor services with built-in redundancy and scalability rather than building your own
AMAZON SQS
SIMPLE QUEUE SERVICE
Application Services: Amazon SQS
Reliable, highly scalable, queue service for storing messages as they travel between instances
[Diagram: one instance puts a message on the SQS queue and another instance gets the message. Further labels: publish to topic, Amazon SNS topic, notification, queue is subscribed.]

Feature: Details
Reliable: Messages stored redundantly across multiple availability zones
Simple: Simple APIs to send and receive messages
Scalable: Unlimited number of messages
Secure: Authentication of queues to ensure controlled access
[Diagram: three processing stages, RECEIVE, CREATE THUMBS, and PUBLISH & NOTIFY, shown twice; one version has SQS queues between the stages]
www.example.com
Photo CMS with SQS
[Diagram: user, Route 53, S3 Bucket, Webservers / CMS, SQS, Workers; numbered arrows 3 to 6 match the steps below]
1) User / browser posts photo to S3 and is redirected to form on webservers
2) User completes form for photo and submits
3) Message is sent to SQS
4) Worker long polling SQS grabs message and creates different size photo assets
5) Thumbs are uploaded to S3 bucket
6) Worker updates database with photo assets
VISIBILITY TIMEOUT
[Diagram sequence, three slides, same Photo CMS with SQS architecture and steps 1 to 6: a message is shown at the Workers; then the label "Message reappears in queue" is shown at SQS; then a message is shown at step 4, between SQS and the Workers]
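
The visibility timeout behaviour illustrated in these slides (a received message is hidden for a while and reappears if it is not deleted) can be simulated in a few lines. This is a toy in-memory model, not the SQS API: the class and method names are hypothetical, and time is passed in explicitly so the example is deterministic.

```python
class VisibilityQueue:
    """Toy queue with a visibility timeout; time is passed in explicitly."""

    def __init__(self, visibility_timeout: float = 30.0) -> None:
        self.visibility_timeout = visibility_timeout
        self._messages: dict[int, str] = {}
        self._invisible_until: dict[int, float] = {}
        self._next_id = 0

    def send(self, body: str) -> int:
        """Add a message that is immediately visible; return its id."""
        self._next_id += 1
        self._messages[self._next_id] = body
        self._invisible_until[self._next_id] = 0.0
        return self._next_id

    def receive(self, now: float):
        """Return (id, body) for the first visible message, or None."""
        for mid, body in self._messages.items():
            if self._invisible_until[mid] <= now:
                # Hide the message from other workers while it is processed.
                self._invisible_until[mid] = now + self.visibility_timeout
                return mid, body
        return None

    def delete(self, mid: int) -> None:
        """Called by a worker after it has finished processing the message."""
        self._messages.pop(mid, None)
        self._invisible_until.pop(mid, None)


queue = VisibilityQueue(visibility_timeout=30.0)
queue.send("resize photo-123")
first = queue.receive(now=0.0)      # worker 1 takes the message, then crashes
hidden = queue.receive(now=10.0)    # nothing visible yet
retry = queue.receive(now=31.0)     # the message has reappeared for worker 2
print(first, hidden, retry)
```

Because a worker deletes the message only after finishing its work, a crashed worker costs a delay rather than a lost photo.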
CLOUDWATCH METRICS
FOR AMAZON SQS
+
AUTO SCALING
www.example.com
Photo CMS Scaling with SQS
[Diagram: the Photo CMS with SQS architecture and steps 1 to 6, with a backlog of messages shown in SQS, and both the Webservers / CMS and the Workers placed in Auto Scaling Groups]
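
Scaling the workers from the queue backlog can be sketched as a sizing function. The backlog figure could come from a CloudWatch metric for the queue such as ApproximateNumberOfMessagesVisible; the per-worker target, the bounds, and the function name below are assumptions for illustration.

```python
import math


def workers_needed(backlog: int, messages_per_worker: int = 100,
                   min_workers: int = 1, max_workers: int = 20) -> int:
    """Size the worker group so each worker handles a bounded share of the backlog."""
    if messages_per_worker <= 0:
        raise ValueError("messages_per_worker must be positive")
    wanted = math.ceil(backlog / messages_per_worker)
    return max(min_workers, min(max_workers, wanted))


for backlog in (0, 250, 100_000):
    print(backlog, "messages ->", workers_needed(backlog), "workers")
```

Since the web servers only enqueue work, the two groups can scale on different signals: request volume for the web tier and queue depth for the workers.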
Compute: Lambda
Event driven compute
Connective tissue for AWS services
[Diagram: event sources for Lambda. Push: S3 Bucket event notification. Pull: DynamoDB Stream. Pull: Kinesis Stream.]

Feature: Details
Stateless: Request driven code called Lambda functions triggered by events
Easy: Fixed OS and language - JavaScript
Management: AWS owns and manages the infrastructure
Scaling: Implicit scaling; just make requests
www.example.com
Photo CMS with Lambda
[Diagram: user, Route 53, S3 Bucket, Webservers / CMS, Lambda; numbered arrows 3 to 5 match the steps below]
1) User / browser posts photo to S3 and is redirected to form on webservers
2) The redirected user completes form for photo and submits
3) At the same time as the redirect, S3 event notifications fire off and are received by Lambda
4) Lambda creates different size photo assets and uploads them to S3
5) Lambda updates database with photo assets
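
Steps 3 and 4 can be sketched as a function that reads the bucket and key from an S3 event notification and plans the thumbnails to create. The slide names JavaScript as the Lambda language; Python is used here only to match the other sketches in this document. The thumbnail widths and the `thumbs/` key layout are hypothetical, and the resize and upload are left as a comment.

```python
THUMBNAIL_WIDTHS = (64, 256, 1024)  # hypothetical sizes, not from the slide


def handler(event, context):
    """Plan thumbnail keys for every object named in an S3 event notification."""
    planned = []
    for record in event.get("Records", []):
        bucket = record["s3"]["bucket"]["name"]
        key = record["s3"]["object"]["key"]
        for width in THUMBNAIL_WIDTHS:
            # A real function would resize the image and upload it here.
            planned.append({"bucket": bucket, "source": key,
                            "thumbnail": f"thumbs/{width}/{key}"})
    return planned


sample_event = {"Records": [{"s3": {"bucket": {"name": "photo-uploads"},
                                    "object": {"key": "photos/cat.jpg"}}}]}
for item in handler(sample_event, None):
    print(item["thumbnail"])
```

Compared with the SQS version, there is no queue to poll and no worker fleet to size; the event itself triggers the work.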
1. DESIGN FOR FAILURE
2. MULTIPLE AVAILABILITY ZONES
3. SCALING
4. SELF-HEALING
5. LOOSE COUPLING
YOUR GOAL
Applications should continue to function
IT'S ALL ABOUT

CHOICE
BALANCE COST & AVAILABILITY REQUIREMENTS
AWS Architecture Center
http://aws.amazon.com/architecture

AWS Whitepapers
http://aws.amazon.com/whitepapers

AWS Blog
http://aws.amazon.com/blogs/aws

Thanks for attending!


- Joel Williams
