PHIL BASFORD – HEAD OF SOLUTION ENGINEERING
Tuesday, 8 October 2019
MACHINE LEARNING
AT SCALE
@philipbasford
AGENDA
Baseline:
➤ Machine Learning, Serverless, DevOps, Well-Architected
Endpoint Architecture:
➤ Reference architecture
➤ Endpoint components
➤ Docker & standard algorithms
➤ Implications of Using Docker
Experiment:
➤ Lambda and serverless-artillery
➤ Results + Charts
➤ Support email
➤ Instance types and Gunicorn
➤ Findings
➤ Auto scaling – HA / Fault Tolerance
Operational excellence:
➤ SageMaker CloudWatch metrics
➤ Custom CloudWatch metrics
➤ CloudWatch dashboard
➤ X-Ray – inclusion, sample rate, and setup
➤ X-Ray – Service Map, Traces, and Analytics
DevOps:
➤ Deployment Types
➤ AWS SageMaker with AWS CodePipeline, AWS Lambda
and AWS Step Functions
➤ SAM
Questions
SERVERLESS
AWS Lambda: AWS's native, fully managed cloud service for running application code without the need to run servers.
API Gateway: the endpoint for your API; it has extensive security measures, logging, and API definition using OpenAPI/Swagger.
DynamoDB: a fully managed NoSQL cloud service from AWS. In machine learning it is typically used for reference data.
S3: highly durable object storage used for many things, including data lakes. For machine learning it is used to store training data sets and model artefacts.
SNS: pub + sub
SQS: queues
Fargate: containers
Step Functions: workflows
...and more
WELL-ARCHITECTED
Operational Excellence: monitoring, observing and alerting using CloudWatch and X-Ray; Infrastructure as Code with SAM and CloudFormation.
Security: least privilege, data encryption at rest, and data encryption in transit using IAM policies, resource policies, KMS, Secrets Manager, VPCs and security groups.
Performance: elastic scaling based on demand and meeting response times using Auto Scaling, serverless, and per-request managed services.
Cost Optimisation: serverless and fully managed services to lower TCO; resource-tag everything possible for cost analysis; right-size instance types for model hosting.
Reliability: fault tolerance and auto healing to meet a target availability using Auto Scaling, Multi-AZ, multi-Region, read replicas and snapshots.
Serverless Machine Learning application using an XGBoost model for predictions
REFERENCE ARCHITECTURE
[Diagram: within the AWS Cloud (Inawisdom RAMP), users call a Web API fronted by Amazon API Gateway and backed by AWS Lambda, which invokes an Amazon SageMaker endpoint and looks up reference data in Amazon DynamoDB; a data lake on Amazon S3, maintained with AWS Glue, serves the data science team.]
Logical components of an endpoint within Amazon SageMaker
AMAZON SAGEMAKER – COMPONENTS
All components are immutable: any configuration change requires new models and endpoint configurations. However, there is a specific SageMaker API to update instance count and variant weight.
[Diagram: clients send SigV4-signed requests via SDKs or REST to a named Endpoint. The Endpoint references an Endpoint Configuration, which defines one or more Production Variants. Each Production Variant names a Model, an initial instance count and weight, and an instance type; each Model consists of a primary container plus optional further containers, together with its VPC, S3 and KMS + IAM settings.]
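Because the components are immutable, "updating" an endpoint in practice means pointing it at a newly created endpoint configuration. A minimal boto3 sketch (the endpoint and configuration names here are illustrative):

import boto3

sm_client = boto3.client('sagemaker')

# Repoint the endpoint at a freshly created endpoint configuration;
# SageMaker swaps the underlying instances for you.
sm_client.update_endpoint(
    EndpointName='MyEndpoint',
    EndpointConfigName='my-new-endpoint-config',
)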
Docker containers host the inference engines. Inference engines can be written in any language, and endpoints can use more than one container; the primary container needs to implement a simple REST API.
Common Engines:
➤ 685385470294.dkr.ecr.eu-west-1.amazonaws.com/xgboost:1
➤ 520713654638.dkr.ecr.eu-west-1.amazonaws.com/sagemaker-tensorflow:1.11-cpu-py2
➤ 520713654638.dkr.ecr.eu-west-1.amazonaws.com/sagemaker-tensorflow:1.11-gpu-py2
➤ 763104351884.dkr.ecr.eu-west-1.amazonaws.com/tensorflow-inference:1.13-gpu
➤ 520713654638.dkr.ecr.eu-west-1.amazonaws.com/sagemaker-tensorflow-serving:1.11-cpu
AMAZON SAGEMAKER – INFERENCE ENGINES
Dockerfile:
FROM tensorflow/serving:latest
RUN apt-get update && apt-get install -y --no-install-recommends nginx git
RUN mkdir -p /opt/ml/model
COPY nginx.conf /etc/nginx/nginx.conf
ENTRYPOINT service nginx start | tensorflow_model_server --rest_api_port=8501 --model_config_file=/opt/ml/model/models.config
[Diagram: Amazon SageMaker pulls model.tar.gz from S3 and links it into the container at /opt/ml/model. Within the primary container, Nginx sits in front of Gunicorn, which hosts the model runtime. SageMaker calls the container on http://localhost:8080/invocations and health-checks it on http://localhost:8080/ping; custom metadata is carried in the X-Amzn-SageMaker-Custom-Attributes header.]
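To make that REST contract concrete, here is a minimal sketch of what a primary container must serve, written with Flask. It is illustrative only: the real AWS images wire this through Nginx and Gunicorn as in the diagram, and my_model is a hypothetical model object.

from flask import Flask, Response, request

app = Flask(__name__)

@app.route('/ping', methods=['GET'])
def ping():
    # Health check: SageMaker expects HTTP 200 once the container is ready
    return Response(status=200)

@app.route('/invocations', methods=['POST'])
def invocations():
    payload = request.get_data().decode('utf-8')
    result = my_model.predict(payload)  # hypothetical model object
    return Response(result, status=200, mimetype='text/csv')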
Using Docker immediately raises the following questions:
➤ How many Docker containers run on a single underlying EC2 instance?
➤ Is Kubernetes or ECS used? Do I have to become a Docker expert?
➤ How quickly are instances started and stopped?
➤ How do instances reside within the VPC and use network resources? For example, can the number of instances exhaust the network addresses of a VPC?
➤ How isolated are my models, given that Docker uses soft CPU and memory units?
➤ Will I suffer issues if containers are bin-packed or redistributed?
IMPLICATIONS OF USING DOCKER
To answer these questions, a series of experiments was carried out
THE EXPERIMENT
After VPC creation:
AZ          Available Addresses
EU-West-1a  4091
EU-West-1b  4091
EU-West-1c  4091

After notebook instance creation:
AZ          Available Addresses
EU-West-1a  4090
EU-West-1b  4091
EU-West-1c  4091

After endpoint creation:
AZ          Available Addresses
EU-West-1a  4090
EU-West-1b  4090
EU-West-1c  4090
Endpoint creation:
primary_container = {
    "Image": "685385470294.dkr.ecr.eu-west-1.amazonaws.com/xgboost:1",
    "ModelDataUrl": "s3://mybucket/mymodel/output/model.tar.gz",
}
create_model_response = sm_client.create_model(
    ModelName='load-test',
    ExecutionRoleArn=role,
    PrimaryContainer=primary_container,
    VpcConfig={
        "SecurityGroupIds": [
            "My SecurityGroupId"
        ],
        "Subnets": [
            "Subnet Id 1b",
            "Subnet Id 1c"
        ]
    }
)
create_endpoint_config_response = sm_client.create_endpoint_config(
    EndpointConfigName=endpoint_config_name,
    ProductionVariants=[{
        'InstanceType': 'ml.t2.medium',
        'InitialInstanceCount': 2,
        'InitialVariantWeight': 1,
        'ModelName': 'load-test',
        'VariantName': 'AllTraffic'
    }]
)
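The slide stops at the endpoint configuration. For completeness, the final call that actually provisions the instances is create_endpoint, sketched here with the same names as above:

create_endpoint_response = sm_client.create_endpoint(
    EndpointName='load-test',
    EndpointConfigName=endpoint_config_name,
)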
THE EXPERIMENT (CONTINUED)
Lambda:
import json

import boto3

session = boto3.Session()
client = session.client('sagemaker-runtime')
ENDPOINT_NAME = 'XGBoostEndpoint'

def lambda_handler(event, context):
    file_name = 'test_point.csv'
    with open(file_name, 'r') as f:
        payload = f.read().strip()
    print(payload)
    response = client.invoke_endpoint(
        EndpointName=ENDPOINT_NAME,
        Body=payload,
        ContentType='text/csv'
    )
    return {
        'statusCode': 200,
        'body': json.dumps('Hello!')
    }
Artillery:
# You can find great documentation of the possibilities at:
# https://artillery.io/docs/
config:
  target: "https://???.execute-api.us-west-2.amazonaws.com/Dev"
  phases:
    - duration: 3600
      arrivalRate: 200
      rampTo: 600
      name: "warm up"
    - duration: 3600
      arrivalRate: 600
      rampTo: 800
      name: "Max load"
scenarios:
  - flow:
      - post:
          url: "/"
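For reference, a hedged sketch of how such a script is typically driven (serverless-artillery picks up script.yml from the working directory by default; check the docs above for the exact flags):

npm install -g artillery serverless-artillery
artillery run script.yml   # dry-run the script locally first
slsart deploy              # deploy the Lambda-based load generator
slsart invoke              # start the load test from Lambda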
The CPU usage in AWS CloudWatch for a load test run
RESULTS
At 13:20 we saw the start of a drop in the CPU usage, and at 13:40 it stopped at 100%. Why was this? From the load script I configured, we know that this is when serverless-artillery entered the 2nd phase of sustained load.
There was a slow ramp-up for the first 15 minutes until we hit around the 200% CPU usage mark. 200% CPU usage means we were using more than the capacity of a single endpoint instance.
We then saw a return to 200% CPU usage 10 minutes later. At 14:40 we saw a complete stop in load; this is when the serverless-artillery job completed.
Investigation into what happened at 13:20; luckily AWS SageMaker sends logs to AWS CloudWatch
RESULTS
➤ During the entire run there were only three log streams, and each one had an 'instance id' as an identifier
➤ There were 2 instances at the start of the run, and at the end we still had only two instances
➤ There were no errors in the logs, and the last entry was at 13:24. This would imply none of the instances crashed due to a fatal error
➤ Instance i-010adxx started logging at 13:38 but we only saw the CPU start to increase at 13:45, a gap of 7 minutes
➤ The output from AWS CloudWatch Logs Insights confirms that each instance was producing 100-150 entries per second
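A sketch of the kind of CloudWatch Logs Insights query behind that last point; the assumption here is the usual /aws/sagemaker/Endpoints/<endpoint-name> log group:

fields @timestamp, @logStream
| stats count(*) as entries by bin(1s), @logStream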
SUPPORT EMAIL
Inside the XGBoost Docker image and its implementation
XGBOOST INFERENCE ENGINE
"Gunicorn relies on the operating system to provide all of the load balancing when handling requests. Generally we recommend (2 x num_cores) + 1 as the number of workers to start off with. While not overly scientific, the formula is based on the assumption that for a given core, one worker will be reading or writing from the socket while the other worker is processing a request" 2019 © http://docs.gunicorn.org/
The XGBoost inference engine is implemented using Gunicorn, which provides a lightweight REST API to the model located on each of the instances.
Five Gunicorn workers were created. This is very important: Python is effectively single-threaded per process, so you need to relate the number of workers to CPU cores.
The Docker image used was 685385470294.dkr.ecr.eu-west-1.amazonaws.com/xgboost:1
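A quick check of the Gunicorn formula against the instance type used; ml.t2.medium has 2 vCPUs, which would explain the five workers observed:

import multiprocessing

# Gunicorn's suggested starting point: (2 x num_cores) + 1
workers = (2 * multiprocessing.cpu_count()) + 1
print(workers)  # 5 on a 2-vCPU instance such as ml.t2.medium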
The key findings from the experiment were:
FINDINGS
We have proven from the endpoint creation process and from building a model that an endpoint uses Docker containers on EC2 instances.
Load was spread evenly over the 2 instances, and each instance can serve a large number of inference requests per second.
Instance i-01be56xxx took 5 minutes to terminate and instance i-010adxx took 7 minutes to start. Such timings are indicative of the time it takes to start EC2 instances and of the configured cooldown times.
Each instance, when deployed into a VPC, uses a single IP address. This means that if a reasonable subnet range is configured then the number of IP addresses will not be exhausted.
We have proven that if an instance crashes or terminates it will be auto healed.
Instance scaling using Application Auto Scaling
AUTO SCALING
➤ Application Auto Scaling is supported, with the usual controls.
➤ 'InitialInstanceCount' becomes DesiredInstanceCount
➤ Models can be scaled on CPU, disk and memory utilisation
➤ More importantly, they can be scaled on SageMakerVariantInvocationsPerInstance
➤ Make sure MinCapacity is greater than 1 for fault tolerance and high availability.
"SageMakerAutoScalingTarget": {
"Type": "AWS::ApplicationAutoScaling::ScalableTarget",
"Properties": {
"MaxCapacity": 4,
"MinCapacity": 2,
"ScalableDimension":
"sagemaker:variant:DesiredInstanceCount",
"ServiceNamespace": "sagemaker",
"RoleARN": "arn:aws:iam::xxxxx:role/sagemakerrole",
"ResourceId": "endpoint/MyEndpoint/variant/AllTraffic”
}
},
"SageMakerAutoScalingPolicy": {
"Type" : "AWS::ApplicationAutoScaling::ScalingPolicy",
"DependsOn": “SageMakerAutoScalingTarget",
"Properties": {
"PolicyName": "SageMakerAutoScalingPolicy",
"ServiceNamespace": "sagemaker",
"ResourceId": "endpoint/MyEndpoint/variant/AllTraffic”
"ScaleInCooldown": 300,
"ScaleOutCooldown": 300,
"ScalableDimension":"sagemaker:variant:DesiredInstanceCount",
"PolicyType": "TargetTrackingScaling",
"TargetTrackingScalingPolicyConfiguration": {
"TargetValue": 100,
"PredefinedMetricSpecification":{
"PredefinedMetricType":
"SageMakerVariantInvocationsPerInstance"
}
}
}
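The same scalable target and policy can also be registered programmatically. A minimal boto3 sketch using the same names as the CloudFormation above:

import boto3

client = boto3.client('application-autoscaling')

client.register_scalable_target(
    ServiceNamespace='sagemaker',
    ResourceId='endpoint/MyEndpoint/variant/AllTraffic',
    ScalableDimension='sagemaker:variant:DesiredInstanceCount',
    MinCapacity=2,
    MaxCapacity=4,
)

client.put_scaling_policy(
    PolicyName='SageMakerAutoScalingPolicy',
    ServiceNamespace='sagemaker',
    ResourceId='endpoint/MyEndpoint/variant/AllTraffic',
    ScalableDimension='sagemaker:variant:DesiredInstanceCount',
    PolicyType='TargetTrackingScaling',
    TargetTrackingScalingPolicyConfiguration={
        'TargetValue': 100.0,
        'PredefinedMetricSpecification': {
            'PredefinedMetricType': 'SageMakerVariantInvocationsPerInstance'
        },
        'ScaleInCooldown': 300,
        'ScaleOutCooldown': 300,
    },
)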
Amazon SageMaker exposes metrics to AWS CloudWatch
METRICS AND ALARMS
Used with the following metrics to provide a complete view:
➤ API Gateway 4XX and 5XX errors
➤ Lambda Latency
➤ Lambda Invocations
➤ Lambda Errors
Name                        Dimension     Statistic  Threshold  Time Period     Missing
Endpoint model latency      Milliseconds  Average    > 100      For 5 minutes   ignore
Endpoint model invocations  Count         Sum        > 10000    For 15 minutes  notBreaching
                                                     < 1000                     breaching
Endpoint disk usage         %             Average    > 90%      For 15 minutes  ignore
                                                     > 80%
Endpoint CPU usage          %             Average    > 90%      For 15 minutes  ignore
                                                     > 80%
Endpoint memory usage       %             Average    > 90%      For 15 minutes  ignore
                                                     > 80%
Endpoint 5XX errors         Count         Sum        > 10       For 5 minutes   notBreaching
Endpoint 4XX errors         Count         Sum        > 50       For 5 minutes
The metrics in AWS CloudWatch can then be used for alarms:
➤ Always pay attention to how to handle missing data
➤ Always test your alarms
➤ Look to level your alarms
➤ Make your alarms complement each other
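As an illustration, a hedged boto3 sketch of the first alarm in the table (the endpoint and variant names are carried over from the earlier slides; confirm the native unit of the ModelLatency metric before relying on the threshold):

import boto3

cw = boto3.client('cloudwatch')
cw.put_metric_alarm(
    AlarmName='endpoint-model-latency',
    Namespace='AWS/SageMaker',
    MetricName='ModelLatency',
    Dimensions=[
        {'Name': 'EndpointName', 'Value': 'MyEndpoint'},
        {'Name': 'VariantName', 'Value': 'AllTraffic'},
    ],
    Statistic='Average',
    Period=300,               # "For 5 minutes"
    EvaluationPeriods=1,
    Threshold=100.0,          # the slide's threshold; check the metric's unit
    ComparisonOperator='GreaterThanThreshold',
    TreatMissingData='ignore',
)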
AWS SAGEMAKER WITH CLOUDFORMATION
ADDITIONAL CUSTOM METRICS
Usage Plan Metrics:
➤ AWS API GW has no Usage Plan or Access Key metric in AWS CloudWatch. Therefore we added it!
➤ Runs at 0100 UTC daily and publishes the previous 24 hours' usage, with dimensions 'Usage Plan Id' and 'Key Id', for RemainingQuota, UsedQuota and PercentageUsed
Availability Metrics:
➤ There is no way of recording complete end-to-end availability, therefore we added it!
➤ The end-to-end availability check performs a 'meaningful' health check, involving all elements of the solution.
➤ Runs every minute and publishes a simple count.
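A sketch of how the health checker might publish its count; the namespace and metric name are illustrative, not the ones used in the deck:

import boto3

cw = boto3.client('cloudwatch')
cw.put_metric_data(
    Namespace='Custom/Availability',      # hypothetical namespace
    MetricData=[{
        'MetricName': 'HealthCheckSuccess',
        'Value': 1,
        'Unit': 'Count',
    }],
)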
A dashboard in AWS CloudWatch provides complete oversight of the inference process
MONITORING
[Dashboard widgets: API error and success rates; API Gateway response times using percentiles; Lambda executions; availability recorded from the health checker; API usage data for the Usage Plan]
Complete production observations using AWS X-Ray, including all services involved in the inference process
PRODUCTION OBSERVATIONS
Using AWS X-Ray from AWS Lambda is very easy
INTEGRATING X-RAY
➤ Download the X-Ray SDK via requirements.txt
➤ Monkey-patch the AWS SDK and common SDKs
➤ Use annotations on your own functions, but be careful!
➤ Use sample rates so you always capture in production
# Tracing
from aws_xray_sdk.core import patch_all
patch_all()

from aws_xray_sdk.core import xray_recorder

@xray_recorder.capture("## predict")
def __predict(self, req):
    ...
    return a
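Sample rates can be supplied as local rules. A sketch assuming the SDK's local sampling-rule format:

from aws_xray_sdk.core import xray_recorder

# Trace the first request each second, then 1% of the remainder
local_rules = {
    "version": 2,
    "default": {"fixed_target": 1, "rate": 0.01},
    "rules": [],
}
xray_recorder.configure(sampling_rules=local_rules)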
For target response times below 200 milliseconds, X-Ray traces can help you spot bottlenecks and costly areas of the code.
X-RAY : TRACES
New this year, Analytics allows you to see the distribution of traces
X-RAY : ANALYTICS
The following are the four ways to deploy new versions of models in Amazon SageMaker
DEPLOYMENT TYPES
[Diagram: each strategy moves traffic between endpoint configurations and their variants, e.g. a weighted canary variant alongside a full variant, or an old and a new variant.]
Rolling: the default option. SageMaker will start new instances and then, once they are healthy, stop the old ones.
Canary: done using two variants in the endpoint configuration and performed over two CloudFormation updates.
Blue/Green: requires two CloudFormation stacks, then changing the endpoint name in the AWS Lambda using an environment variable.
Linear: uses two variants in the endpoint configuration, with an AWS Step Function and AWS Lambda calling the UpdateEndpointWeightsAndCapacities API.
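A sketch of the call a Linear deployment makes to shift traffic (the variant names and weights are illustrative):

import boto3

sm_client = boto3.client('sagemaker')
sm_client.update_endpoint_weights_and_capacities(
    EndpointName='MyEndpoint',
    DesiredWeightsAndCapacities=[
        {'VariantName': 'OldVariant', 'DesiredWeight': 0.8},
        {'VariantName': 'NewVariant', 'DesiredWeight': 0.2},
    ],
)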
AWS SageMaker with AWS CodePipeline, AWS Lambda and AWS Step Functions
CONTINUOUS DEPLOYMENT OF ML MODELS
Using Infrastructure as Code to define the application with AWS SAM
DEPLOYMENT
Defining applications with SAM allows you to use SAM Local. SAM Local allows rapid changes to be made locally to the inference process without the need to constantly upload Lambda functions to AWS.
SAM helps to package the source code of your AWS Lambda functions into zip files (including dependencies). It then uploads the zip files to S3 and deploys them to AWS Lambda.
Using stages in API GW and aliases for AWS Lambda functions is brilliant for release control. This can be automated using SAM's AutoPublishAlias feature.
The AWS Toolkit for Visual Studio Code allows you to use the Python debugger to connect to AWS Lambda functions running within SAM Local.
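A minimal SAM template fragment showing AutoPublishAlias; the resource name and the deployment preference are illustrative:

Resources:
  InferenceFunction:
    Type: AWS::Serverless::Function
    Properties:
      Handler: app.lambda_handler
      Runtime: python3.7
      AutoPublishAlias: live          # publishes a new version and shifts the alias on deploy
      DeploymentPreference:
        Type: Canary10Percent5Minutes # 10% of traffic first, the rest after 5 minutes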
QUESTIONS
REFERENCES
Webinar:
➤ https://pages.awscloud.com/GLOBAL-PTNR-OE-IPC-AIML-Inawisdom-Oct-2019-reg-event.html
My blogs:
➤ https://www.inawisdom.com/machine-learning/amazon-sagemaker-endpoints-inference/
➤ https://www.inawisdom.com/machine-learning/machine-learning-performance-more-than-skin-deep/
Other:
➤ https://docs.aws.amazon.com/sagemaker/latest/dg/your-algorithms.html
➤ https://docs.aws.amazon.com/AmazonCloudWatch/latest/monitoring/AlarmThatSendsEmail.html#alarms-and-missing-data
020 3575 1337
info@inawisdom.com
Columba House,
Adastral Park, Martlesham Heath
Ipswich, Suffolk, IP5 3RE
www.inawisdom.com
@philipbasford
Editor's Notes
• #6 (Reference architecture): Above is our reference architecture for this blog. Here is a quick rundown of the components (each one could be the subject of its own blog): Amazon API Gateway exposes and manages the API and is our contract with the rest of an enterprise (or it could be consumed by a mobile client). Amazon DynamoDB stores any reference data. The Amazon SageMaker endpoint hosts the deployed ML model. AWS Lambda contains the application logic; it processes the API request, looks up the reference data, calls the model for a prediction and then formats the response. The target uptime for the reference architecture is 99.95% (the same as API Gateway), it is business critical, and a response time below 200ms at the 90th percentile is required. Everything is Multi-AZ.
• #10 (The experiment, setup): Mention why t3. I created a dedicated VPC in EU-West-1 with three /20 subnets (4091 addresses), one located in each of the three AZs. I deployed a notebook instance of Amazon SageMaker into the VPC in AZ 1a, downloaded an example notebook for XGBoost, and trained a model inside the notebook instance to use AWS SageMaker's BYOM. I changed the model configuration to use the VPC and changed the endpoint configuration to use more than one instance. I noted the addresses available in the subnets, then ran the "Create endpoint" method and waited until the endpoint was in service. Once in service, I again noted the addresses available in the VPC.
• #11 (The experiment, load): I then created a new AWS Lambda function that takes the body of an HTTP POST request and forwards it to the AWS SageMaker endpoint URL. Next I created an AWS API Gateway instance and used an API key for authentication. Within the AWS API Gateway I created a resource integrated with the AWS Lambda function. I downloaded, installed and configured serverless-artillery to hit my API Gateway. I started the load test from serverless-artillery and, every 15 minutes for 2 hours, recorded the available addresses in the VPC and the metrics from CloudWatch.
• #13 (Results): During the entire run there were only three log streams, and each one had an 'instance id' as an identifier. This, combined with the fact that we only ever saw two IP addresses in use in the subnets, means there were only ever three instances. There were 2 instances at the start of the run and at the end we still had only two instances. However, in between, at 13:24 one of them stopped making log entries, and at 13:38, just before 13:40 (the 100% CPU marker), we can see in CloudWatch that a new instance starts to record log entries. There are no errors in the logs and the last entry was at 13:24; this would imply the instance did not crash due to a fatal error. Instance i-010adxx started logging at 13:38 but we only saw the CPU start to increase at 13:45, i.e. a gap of 7 minutes to start up, which is in line with the default start-up of 5 minutes. The output from AWS CloudWatch Logs Insights confirms that each instance was producing 100-150 entries per second (one entry per request).
• #16 (Findings): Load was spread evenly over the 2 instances, and each instance can serve a large number of inference requests per second. This means that only a few instances are required, and this is reflected in the AWS SageMaker service limits. The failure of instance i-01be56xxx proves that high availability is possible and that instances can auto heal. Instance i-01be56xxx was an 'ml.t2.medium' and we can assume we used up all of its CPU credits, causing it to become overwhelmed and stop responding when there was not enough CPU to respond to health checks. Instance i-01be56xxx took 5 minutes to terminate and instance i-010adxx took 7 minutes to start; such timings are indicative of the time it takes to start EC2 instances (5 minutes or more). By comparison, Docker containers would normally take minutes and Lambda functions seconds. The 'InitialInstanceCount' was set to 2 as a hard limit: neither in the AWS CloudWatch outputs nor in the VPC monitoring did we ever see fewer or more than 2 instances launched at the same time. We did, however, prove that if an instance crashes or terminates it will be auto healed. We have proven from the endpoint creation process and from building a model that an endpoint uses Docker containers on EC2 instances. The load test confirms that the containers are invoked using requests. Each instance, if deployed into a VPC, will use a single IP address; this means that if a reasonable subnet range is used then the number of IP addresses will not be exhausted.
• #17 (Auto scaling): We could stop there, but let's think about 'InitialInstanceCount' again and discuss a top tip. From the findings in this blog it does not seem very elastic, but that is actually not correct: you can make the instances a SageMaker endpoint uses scale in and out with your load. The way to achieve this is an EC2 Application Auto Scaling group with the SageMaker endpoint as your target, which lets you scale it like a regular application auto-scaling group, including adjusting the scalable dimension. How to do it is hidden in the depths of the AWS documentation (AWS: Endpoint auto scaling, add policy). You can set up an auto-scaling group from the AWS Console, the CLI or using CloudFormation; a CloudFormation example is on the slide. (Terminology note: instance numbers scale "up or down"; instances scale "in or out".)
• #18 (Alarms): The options for treating missing data are breaching, notBreaching, ignore, and missing.
• #22 (Integrating X-Ray): In order to use AWS X-Ray, you need to download the AWS X-Ray SDK. The AWS X-Ray SDK works by monkey patching the main AWS SDK and a number of open source SDKs. Monkey patching is wrapping a class at runtime with another, recording its invocation and completion. AWS X-Ray takes these captures locally, then in the background sends them asynchronously, in batches, over UDP to the X-Ray Daemon. The X-Ray Daemon runs as a sidecar to your main application (note: for Lambda, running the X-Ray Daemon is handled for you by the Lambda runtime). The X-Ray Daemon then sends the data to the AWS X-Ray API.
• #23 (X-Ray top tips): X-Ray is not part of the AWS SDK, so you need to add it to your dependency management framework. This also applies to the Lambda runtime, which requires you to deploy the X-Ray SDK yourself; this seems a bit strange, as the X-Ray Daemon is available as a sidecar within the Lambda runtime, and it makes the X-Ray SDK ideal for putting into a Lambda Layer. Cold starts are shown in the AWS console in amber; look into optimising these as much as you can. Having monkey patched X-Ray, X-Ray will not record anything until you complete the following steps: enable X-Ray for your stage in API Gateway, and ensure any Lambda functions have IAM roles that include a policy with write access for the X-Ray API. Use AWS X-Ray everywhere (production, QA and development) and all of the time. X-Ray can load different sample rates at run-time, so in production the sample rate can be lowered to only 1% of your traffic. This can help in spotting issues and identifying changes in performance between deployments.
• #24 (Annotations): Annotations and multiple functions work as follows: calling an annotated function from another annotated function nests the captures; calling an annotated function more than once from another annotated function nests the multiple captures under the parent capture; and calling an annotated function from within a loop means each iteration will create a new capture under the calling annotated function.