AWS LAMBDA FROM THE TRENCHES
what you should know before you go to production
hi, my name is Yan Cui
@theburningmonk
“lead time to someone saying
thank you is the only reputation
metric that matters.”
- Dan North
security
complexity
OUTSIDE the code
deployment
load balancing
caching
monitoring
config management
https://www.infoq.com/presentations/complexity-simplicity-esb
centralised logging
elastic scaling
setup server
THERE IS NO SERVER
automatic scaling
minimise
undifferentiated
heavy-lifting
simple, fast
deployment
cost saving
not paying for
idle servers
energy efficiency in DCs
easy to get
started
fuelling
the Yubl
platform
evolution
completely rebuilt search
Legacy Monolith, Amazon Kinesis, AWS Lambda, Amazon CloudSearch
Amazon API Gateway, AWS Lambda
analytics pipeline
Legacy Monolith, Amazon Kinesis, AWS Lambda, Google BigQuery
1 developer, 2 days
from design to production
(his 1st serverless project)
“nothing ever got done
this fast at Skype!”
- Chris Twamley
Facebook login
Amazon API Gateway, AWS Lambda, GrapheneDB
Amazon API Gateway, AWS Lambda, Facebook Graph API
and many more…
GET PRODUCTION-READY
USE A DEPLOYMENT FRAMEWORK
http://serverless.com
http://apex.run
https://github.com/claudiajs/claudia
TESTING
AWS Lambda
Amazon Kinesis
AWS IoT
“I thought of objects being like
biological cells and/or individual
computers on a network, only
able to communicate with
messages.”
- Alan Kay
“OOP to me means only
messaging, local retention and
protection and hiding of state-
process, and extreme late-
binding of all things.”
- Alan Kay
amzn.to/29Lxuzu
Level of Testing
1. Unit
   do our objects do the right thing?
   are they easy to work with?
2. Integration
   does our code work against code we can’t change?
   test by invoking the handler
3. Acceptance
   does the whole system work?
can do all 3 with Lambda
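One way to read "test by invoking the handler" is to call the function's handler directly with a sample event, the same way Lambda would. The sketch below is only an illustration in Python: `make_handler`, `find_user` and the event shape are hypothetical names, not taken from the talk. At unit level the lookup is an in-memory function we own; at integration level the same call would go through the real data-access code.

```python
import json


def make_handler(find_user):
    """Build a Lambda-style handler around an injectable lookup function (hypothetical)."""
    def handler(event, context):
        user = find_user(event["userId"])
        if user is None:
            return {"statusCode": 404, "body": json.dumps({"error": "not found"})}
        return {"statusCode": 200, "body": json.dumps(user)}
    return handler


def test_handler_returns_user():
    # Invoke the handler exactly as Lambda would: handler(event, context).
    handler = make_handler(lambda user_id: {"id": user_id, "name": "test"})
    response = handler({"userId": "42"}, None)
    assert response["statusCode"] == 200
    assert json.loads(response["body"])["id"] == "42"


def test_handler_reports_missing_user():
    handler = make_handler(lambda user_id: None)
    response = handler({"userId": "missing"}, None)
    assert response["statusCode"] == 404
```

The same test body can be reused for acceptance tests by swapping the direct call for an HTTP request against the deployed endpoint.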
“…We find that tests that mock external
libraries often need to be complex to
get the code into the right state for the
functionality we need to exercise.
The mess in such tests is telling us that
the design isn’t right but, instead of
fixing the problem by improving the
code, we have to carry the extra
complexity in both code and test…”
Don’t Mock Types You Can’t Change
“…The second risk is that we have to be
sure that the behaviour we stub or mock
matches what the external library will
actually do…
Even if we get it right once, we have to
make sure that the tests remain valid
when we upgrade the libraries…”
Don’t Mock Types You Can’t Change
Don’t Mock Services You Can’t Change
“…Wherever possible, an acceptance
test should exercise the system end-to-
end without directly calling its internal
code.
An end-to-end test interacts with the
system only from the outside: through
its interface…”
Testing End-to-End
Legacy Monolith, Amazon Kinesis, AWS Lambda, Amazon CloudSearch
Amazon API Gateway, AWS Lambda
Test Input
Validate
“…We prefer to have the end-to-end
tests exercise both the system and the
process by which it’s built and
deployed…
This sounds like a lot of effort (it is), but
has to be done anyway repeatedly
during the software’s lifetime…”
Testing End-to-End
Jenkins build config deploys and tests
unit + integration tests
deploy
acceptance tests
build.sh allows repeatable builds on both local & CI
TEAM WORK
shared environments
GOALS
easily propagate
environmental changes
PRO TIP
don’t ignore _meta
centralised config service
config service
goes here
APP SECRETS
GOALS
sensitive data are encrypted at rest
(credentials, connection strings, etc.)
has to work on CI
role-based access
hand-rolled with KMS
(encrypted at rest)
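A hand-rolled KMS approach could look roughly like the sketch below: the secret is stored as base64-encoded ciphertext and decrypted at runtime. This assumes boto3 and its KMS `decrypt` call; `decrypt_secret` and the way the ciphertext is stored are assumptions for illustration, not the setup described in the talk.

```python
import base64


def decrypt_secret(ciphertext_b64, kms_client=None):
    """Decrypt a base64-encoded KMS ciphertext and return the plaintext as text."""
    if kms_client is None:
        # Imported lazily so tests (and CI) can inject a fake client instead.
        import boto3
        kms_client = boto3.client("kms")
    response = kms_client.decrypt(CiphertextBlob=base64.b64decode(ciphertext_b64))
    return response["Plaintext"].decode("utf-8")
```

Role-based access then comes from IAM: only roles allowed to call `kms:Decrypt` on the key can read the secret.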
plug-ins
serverless-plugin-kmsvariables
serverless-secrets
serverless-meta-sync
centralised config service
DOCUMENTATION
set goals
choose a way
document (create project templates/scaffolds)
evaluate
share
LOGGING
2016-07-12T12:24:37.571Z 994f18f9-482b-11e6-8668-53e4eab441ae GOT is off air, what do I do now?
(UTC timestamp, API Gateway request ID, your log message)
organised by Function + Version
LOG OVERLOAD
centralise your logs
CloudWatch Logs, AWS Lambda, Logstash, Elasticsearch
Amazon Elasticsearch Service
Elastic Cloud
?
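The Lambda function in the middle of this pipeline receives CloudWatch Logs data as a base64-encoded, gzip-compressed JSON document under `event["awslogs"]["data"]`. A minimal sketch of the decoding step follows; `extract_log_events` is a hypothetical helper, and the shipping step to Logstash or Elasticsearch is deliberately left out.

```python
import base64
import gzip
import json


def extract_log_events(event):
    """Decode a CloudWatch Logs subscription event into a list of flat records."""
    compressed = base64.b64decode(event["awslogs"]["data"])
    payload = json.loads(gzip.decompress(compressed))
    return [
        {
            "logGroup": payload["logGroup"],
            "logStream": payload["logStream"],
            "timestamp": log_event["timestamp"],
            "message": log_event["message"],
        }
        for log_event in payload["logEvents"]
    ]


def handler(event, context):
    records = extract_log_events(event)
    # Forwarding `records` to the log aggregator would go here (not shown).
    return len(records)
```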
correlation IDs
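Correlation IDs could be handled along these lines: reuse the ID from the incoming request if there is one, otherwise create one, then attach it to every log line and every outgoing call. The header name `x-correlation-id` and the helper names are assumptions for this sketch; the talk does not prescribe them.

```python
import json
import uuid

CORRELATION_HEADER = "x-correlation-id"  # hypothetical header name


def get_correlation_id(event):
    """Reuse the caller's correlation ID if present, otherwise start a new one."""
    headers = event.get("headers") or {}
    for name, value in headers.items():
        if name.lower() == CORRELATION_HEADER:  # header names may arrive in any case
            return value
    return str(uuid.uuid4())


def log(correlation_id, message, **fields):
    """Write a structured log line that carries the correlation ID."""
    line = json.dumps({"correlationId": correlation_id, "message": message, **fields})
    print(line)
    return line


def outgoing_headers(correlation_id):
    """Headers to add to downstream HTTP calls so the ID follows the request."""
    return {CORRELATION_HEADER: correlation_id}
```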
MONITORING
PRO TIP
set up dashboards
PRO TIP
don’t forget to set
up alarms
PRO TIP
add application-level
metrics
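An application-level metric can be published with the CloudWatch `put_metric_data` API. The sketch below assumes boto3; the `MyApp` namespace, the metric name in the usage line and the `record_metric` helper are made up for illustration.

```python
def record_metric(name, value, unit="Count", namespace="MyApp", cloudwatch=None):
    """Publish one custom metric data point to CloudWatch."""
    if cloudwatch is None:
        # Imported lazily so tests can inject a fake client.
        import boto3
        cloudwatch = boto3.client("cloudwatch")
    cloudwatch.put_metric_data(
        Namespace=namespace,
        MetricData=[{"MetricName": name, "Value": value, "Unit": unit}],
    )
```

For example, `record_metric("SignUps", 1)` inside a handler records one sign-up; dashboards and alarms can then be built on that metric.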
ERROR HANDLING
“how do I return
HTTP error codes?”
{ "status" : 404, "errorMessage" : "oops" }
s-templates.json
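On the function side, one way to produce an error like this is to raise an exception whose message is the JSON document, so that API Gateway can pattern-match on it. A sketch in Python, assuming a non-proxy API Gateway integration that maps Lambda errors to status codes; `HttpError` and the event shape are hypothetical.

```python
import json


class HttpError(Exception):
    """Exception whose message is a JSON document that API Gateway can match on."""

    def __init__(self, status, message):
        super().__init__(json.dumps({"status": status, "errorMessage": message}))


def handler(event, context):
    if event.get("id") != "known":
        # Lambda reports str(exception) as the error message of the failed invocation.
        raise HttpError(404, "oops")
    return {"id": "known"}
```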
PRO TIP
map timeouts to 504
every Lambda function has
a timeout setting
use error regex to map it to
an HTTP 504
s-templates.json
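The error regex idea can be tried out locally before putting patterns into the template. The sketch below only imitates the matching with Python's `re` module; the patterns, the `status_for` helper and the 500 fallback are assumptions. It relies on Lambda's timeout error message containing the text "Task timed out".

```python
import re

# (pattern, HTTP status) pairs, checked in order. The pattern must match the whole message.
ERROR_PATTERNS = [
    (r'.*"status": 404.*', 404),
    (r".*Task timed out.*", 504),
]


def status_for(error_message, default=500):
    """Return the HTTP status whose pattern matches the Lambda error message."""
    for pattern, status in ERROR_PATTERNS:
        if re.fullmatch(pattern, error_message):
            return status
    return default  # assumed fallback for unmatched errors
```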
PRO TIP
avoid using 128 MB
setting for production
continuous timeout loop…
PRO TIP
proactively time out
your function
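Timing out proactively can be done by checking the remaining time on the Lambda context object and stopping while there is still time to fail cleanly. A sketch in Python, where `get_remaining_time_in_millis()` is the context method provided by the Lambda runtime; the safety margin value and the `process_items` helper are assumptions.

```python
SAFETY_MARGIN_MS = 500  # hypothetical budget reserved for cleanup and error reporting


def process_items(items, context, work):
    """Process items until done, or stop early when the time budget is nearly spent."""
    done = []
    for item in items:
        if context.get_remaining_time_in_millis() < SAFETY_MARGIN_MS:
            # Raise our own error instead of letting Lambda kill the invocation.
            raise TimeoutError(f"giving up after {len(done)} of {len(items)} items")
        done.append(work(item))
    return done
```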
“what’s the retry strategy with
Kinesis and SNS?”
“…If the invocation for one record
times out, is throttled, or
encounters any other error,
Lambda will retry until it
succeeds (or the record reaches
its 24-hour expiration) before
moving on to the next record…”
http://aws.amazon.com/lambda/faqs
effort (low to high)
• do nothing: retry forever
• swallow errors: no retry
• track retry count: retry N times
PRO TIP
use local state to track
no. of retries; move on
after N retries
PRO TIP
record CloudWatch
metrics for error count;
alarm if necessary
retried 3-5 times
KEEP
WARM
functions are unloaded if
idle for a while
noticeable cold start time
(package size matters)
CloudWatch Event, AWS Lambda
ping
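Inside the function, the scheduled ping can be recognised and answered early so that it does no real work. The sketch assumes the ping arrives as a CloudWatch scheduled event, which carries `"source": "aws.events"` and `"detail-type": "Scheduled Event"`; `do_real_work` is a placeholder for the function's actual logic.

```python
def is_keep_warm_ping(event):
    """True if the invocation came from a CloudWatch scheduled event."""
    return (
        event.get("source") == "aws.events"
        and event.get("detail-type") == "Scheduled Event"
    )


def do_real_work(event):
    # Placeholder for the function's real logic (hypothetical).
    return {"handled": event}


def handler(event, context):
    if is_keep_warm_ping(event):
        return "pong"  # return immediately; the container stays warm
    return do_real_work(event)
```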
HEALTH CHECKS?
even then…
functions are recycled
every few hours
PRO TIP
don’t make hard
assumptions about
function lifetime
KNOW YOUR LIMITS
max 50 MB deployment package size
max 75 GB total deployment package size*
* limit is per AWS region
Janitor Monkey
Janitor Lambda
max 5 mins execution time
max 6 MB request payload size*
max 6 MB response payload size
* for a request-response event type
default max 100 concurrent executions*
* soft-limit, can be raised via support ticket
looking ahead
.NET Core?
SQS support?
v1.0 (coming soon)
MULTI-CLOUD
FUTURE?
IBM OpenWhisk
AWS Lambda, Azure Functions
Google Cloud Functions
competition
faster innovation
lower prices
@theburningmonk
theburningmonk.com
github.com/theburningmonk
