SPN CI/CD journey on AWS
SPN Infra., CoreTech
Scott Miao
11/22/2017
Who am I
• Scott Miao
• RD, SPN Infra., TrendMicro
• OOAD system dev. 10+ years
• Hadoop ecosystem 6 years
• AWS for BigData 4 years
• @linkedIn
• @slideshare
Agenda
• Original services delivery process in SPN
• Dev/Ops
– DevOps goals V.S. our original way
• CI/CD on AWS
• An example service CI/CD on AWS
• DevOps goals V.S. our original way V.S. CI/CD
on AWS
• Lessons learned
Original services delivery process
in SPN
Roles: Developers, Operators, Infra. admin
Teams: Service team, Operation team, DCS team
Release portal; environments: Stg., PROD
1. Dev, utests,…
2. Source Repo
3. Back and forth
4. Trigger CI
5. Devices spec. for both Stg/PROD
6.1 Monitoring scripts
6.2 Puppet scripts
6.3 Operation guides
7. Trigger release build
8. Release artifacts
9. Stg resources ready
10. Deploy service & scripts
11. Deploy and monitor
12.1 Itests
12.2 Stress tests
12.3 UAT
13. Release artifacts
14. PROD resources ready
15. 16.
17. PROD release
Dev/Ops
DevOps is not a new technology or a
product. It’s an approach or culture of
software development that seeks stability
and performance at the same time that it
speeds software deliveries to the business.
── Andi Mann, CA Technology ──
Cited from: Derek Chen, RD, TrendMicro
https://www.slideshare.net/derekhound/devops-in-practice-78905911, p#15
Software Delivery
Stages: Plan, Code, Build, Test, Release, Deploy, Operate, Monitor
Practices: Agile Development, Continuous Integration, Continuous Delivery, Continuous Deployment, DevOps
Cited from: Derek Chen, RD, TrendMicro
https://www.slideshare.net/derekhound/devops-in-practice-78905911, p#23
DevOps goals V.S. our original way
• Faster time to market
– So complicated that steps are easily missed
– The service team has to follow up on every step itself
– Some steps need lead time (machine resources, etc.)
• Lower failure rate of new releases
– Manual steps lead to errors
• Shorten lead time between fixes
– Rolling upgrade
– Invasive
• Faster mean time to recovery
– Hard to deal with machine errors and peak loads
https://en.wikipedia.org/wiki/DevOps#Goals
“Very often, automation supports
this objective”
https://en.wikipedia.org/wiki/DevOps#Goals
Quoted from Wikipedia for DevOps goals
CI/CD on AWS
BOTH PURSUE THE SAME DEVOPS GOALS
DEVOPS FOCUSES ON ORGANIZATIONAL CHANGES
CI/CD FOCUSES ON TECHNICAL IMPLEMENTATIONS
Review for CI and CD
• Continuous Integration
– is the practice of merging all developer working
copies to a shared mainline (trunk) several times
a day
• Continuous Delivery
– produce software in short cycles, ensuring that
the software can be reliably released at any time
• Continuous Deployment
– means that every change is automatically
deployed to production
https://en.wikipedia.org/wiki/Continuous_integration
https://en.wikipedia.org/wiki/Continuous_delivery
Characteristics of Cloud Computing
• On-demand self-service
– A consumer can unilaterally provision computing capabilities
• Broad network access
– Capabilities are available over the network and accessed
through standard mechanisms
• Resource pooling
– The provider's computing resources are pooled to serve
multiple consumers using a multi-tenant model
• Rapid elasticity
– Capabilities can be elastically provisioned and released
• Measured service
– Cloud systems automatically control and optimize resource use
http://www.inforisktoday.com/5-essential-characteristics-cloud-computing-a-4189
https://en.wikipedia.org/wiki/Infrastructure_as_Code
(AWS)
DevOps
CI/CD
Automation
Cloud Computing
AWS managed services used by SPN
• AWS CloudFormation
– Gives developers and systems administrators an easy
way to create and manage a collection of related
AWS resources
– We use it to provision our service components
• Such as Load balancer (ALB), machines (EC2)
• AWS OpsWorks
– A configuration management service that uses Chef,
an automation platform that treats server
configurations as code
– We use it to deploy, configure and start up our
service components
https://aws.amazon.com/cloudformation/
https://aws.amazon.com/opsworks/
AWS CloudFormation + OpsWorks
Actors: User, CF (AWS CloudFormation), Ops (AWS OpsWorks)
AWS S3 holds: CF templates (main, IAM, ELB, OpsWorks), artifacts, Chef recipes
AWS CloudFormation stacks: main, IAM, ALB, OpsWorks; resources land in the AWS VPC
1. Put CF templates
2. Put artifacts
3. Put Chef recipes
4. Create CF W/ params, VPC ID, etc
5. Templates input
6. Create CF stacks
7. Provision AWS resources
8. Create OpsWorks
9. Artifacts/recipes input
10. Deploy/Config/start up service
Ready to serve
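Step 4 above (create the CF stack with parameters such as the VPC ID) could look roughly like the sketch below. It only assembles the request; nothing is sent to AWS. The parameter names (`VpcId`, `ArtifactVersion`), the template path and both helper functions are hypothetical. Only the `ParameterKey`/`ParameterValue` list shape and the `StackName`, `TemplateURL`, `Parameters` and `Capabilities` fields follow the CloudFormation CreateStack API.

```python
from typing import Dict, List


def to_cfn_parameters(values: Dict[str, str]) -> List[Dict[str, str]]:
    """Convert a plain dict into the Parameters list used by CloudFormation CreateStack."""
    return [
        {"ParameterKey": key, "ParameterValue": value}
        for key, value in sorted(values.items())
    ]


def build_create_stack_request(stack_name: str, template_url: str,
                               values: Dict[str, str]) -> Dict[str, object]:
    """Assemble keyword arguments for a create-stack call (no AWS call is made here)."""
    return {
        "StackName": stack_name,
        "TemplateURL": template_url,
        "Parameters": to_cfn_parameters(values),
        # Templates that create IAM resources require this acknowledgement.
        "Capabilities": ["CAPABILITY_IAM"],
    }


request = build_create_stack_request(
    stack_name="ae-dev-myAE",
    # Hypothetical template location inside the dev bucket.
    template_url="https://s3.amazonaws.com/dev-us-east-1/cf/main.template",
    values={"VpcId": "vpc-0123456789abcdef0", "ArtifactVersion": "1.19.AE_100"},
)
print(request["Parameters"])
```

With an SDK such as boto3, a dictionary like this would typically be passed as keyword arguments to the CloudFormation client's `create_stack` call.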
CoreTech DCS managed services
• GitHub Enterprise
– Just like the GitHub we use on the Internet
• CloudCI – Enterprise Circle CI
– A Docker container based CI solution
– Seamlessly integrated with GitHub
• JFrog Artifactory
– A CoreTech-wide shared artifact repo.
An example service CI/CD on AWS
ANALYTIC ENGINE
Analytic Engine is an API service for…
Common Big Data computation
service on Cloud (AWS)
https://www.slideshare.net/takeshi_miao/analytic-engine-a-common-big-data-computation-service-on-the-aws
AE High Level Architecture Design
[Architecture diagram: AE runs in Oregon (us-west-2) across AZa, AZb and AZc. A private ALB fronts the AE API servers; API servers, workers and Eureka run in Multi-AZ Auto Scaling groups, together with RDS, EMR clusters, cross-account S3 buckets and Amazon SNS. IDC services, Cloud Storage (isValidUser, CS output) and Splunk connect over VPN, VPC peering and HTTPS/HTTP Basic.]
This is what we really take care of
[The same AE high level architecture diagram, repeated to mark the components we take care of.]
Which components are in CI/CD scope
• In scope
– API, Worker, Eureka, Genie W/ auto-scaling group
• EC2; deploy, configure and start up component services
– AWS Application Load Balancer (ALB)
– AWS Simple Notification Service
• NOT in scope
– VPC/subnets/VPC peerings
• We use fixed VPC and subnets for both VPN connections and VPC peerings
– RDS MySQL DB
• Already pre-created
– EMR clusters
• Created by user API calls via the AWS Java SDK
CI/CD use cases
1. Developer edits/pushes code to GitHub
2. Developer deploys AE to Dev env. for tests
3. Developer terminates AE in Dev env. after tests
4. Developer deploys AE to Stg env. for integration tests/UAT
5. Developer deploys AE to PROD env.
6. Developer patches hotfixes and deploys to PROD
7. Monitor your service components
1. Developer edits/pushes code to GitHub
Feature branch workflow: https://www.atlassian.com/git/tutorials/comparing-workflows
Labels: Developers; branches master and AE-100; 1.19.0
Repo: spn/ae-saas, CI project: spn/ae-saas
S3: dev-us-east-1 (CF templates, Chef recipes, ae-1.19.AE_100.jars)
1. Push AE-100 branch
2. Trigger CI
3. build
4. utests
5. package
6. cp artifacts to S3
7. cp to S3
8. publish artifacts to mvn repo.
9. Publish artifacts to mvn repo.
Every commit will trigger this build
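The artifact names on this slide suggest a versioning rule: feature-branch builds embed the branch name (AE-100 becomes 1.19.AE_100), while master builds use 1.19.<buildNum>. A minimal sketch of such a rule follows. The function names are hypothetical, and replacing "-" with "_" is an assumption inferred from the example names (it keeps "-" free as the separator in deployment tags).

```python
import re


def artifact_version(base: str, branch: str, build_num: int) -> str:
    """Return the artifact version for a CI build.

    master builds get "<base>.<buildNum>"; any other branch gets
    "<base>.<branch>" with non-alphanumeric characters replaced by "_".
    """
    if branch == "master":
        return f"{base}.{build_num}"
    safe_branch = re.sub(r"[^A-Za-z0-9_]", "_", branch)
    return f"{base}.{safe_branch}"


def artifact_prefix(version: str) -> str:
    """Return the S3 name prefix of the jars for one version, e.g. ae-1.19.AE_100."""
    return f"ae-{version}"


print(artifact_prefix(artifact_version("1.19", "AE-100", 42)))
print(artifact_prefix(artifact_version("1.19", "master", 563)))
```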
2. Developer deploys AE to Dev env. for tests
Feature branch workflow; branches master and AE-100
Repo: spn/ae-saas, CI project: spn/ae-saas (Env. variables in CI)
S3: dev-us-east-1 (CF templates, Chef recipes, ae-1.19.AE_100.jars)
1. Git tag: c-1.19.AE_100-dev-us-east-1-myAE
2. Push tag
3. Trigger CI
4. Create CF
5. CF creating for stack: ae-dev-myAE
5.1 Templates input
6. Provision resources (AWS CF, Dev VPC)
7. Deploy/config/startup service
Ready for tests
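The deployment tags in these use cases follow one pattern: an action letter, the artifact version, the environment, the AWS region and an instance name, as in c-1.19.AE_100-dev-us-east-1-myAE, with the stack named ae-<env>-<name>. The sketch below shows how a CI job might parse such a tag. The regular expression, the class and the reading of c/d/u as create/delete/update are assumptions inferred from the slides, not the team's actual CI script.

```python
import re
from typing import NamedTuple

# Assumed meaning of the leading letter, inferred from "Create CF", "delete CF", "Update CF".
ACTIONS = {"c": "create", "d": "delete", "u": "update"}

TAG_PATTERN = re.compile(
    r"^(?P<action>[cdu])-"
    r"(?P<version>[^-]+)-"
    r"(?P<env>dev|stg|prod)-"
    r"(?P<region>[a-z]{2}-[a-z]+-\d+)-"
    r"(?P<name>.+)$"
)


class DeployTag(NamedTuple):
    action: str
    version: str
    env: str
    region: str
    name: str

    @property
    def stack_name(self) -> str:
        """CloudFormation stack name, e.g. ae-dev-myAE."""
        return f"ae-{self.env}-{self.name}"


def parse_deploy_tag(tag: str) -> DeployTag:
    """Split a deployment tag into its fields or raise ValueError."""
    match = TAG_PATTERN.match(tag)
    if match is None:
        raise ValueError(f"not a deployment tag: {tag!r}")
    fields = match.groupdict()
    fields["action"] = ACTIONS[fields["action"]]
    return DeployTag(**fields)


tag = parse_deploy_tag("c-1.19.AE_100-dev-us-east-1-myAE")
print(tag.action, tag.version, tag.region, tag.stack_name)
```

Because the version contains no "-", the region can be matched unambiguously even though it contains hyphens itself.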
3. Developer terminates AE in Dev env. after tests
Feature branch workflow; branch master
Repo: spn/ae-saas, CI project: spn/ae-saas
1. Git tag: d-1.19.AE_100-dev-us-east-1-myAE
2. Push tag
3. Trigger CI
4. delete CF
5. CF deleting for stack: ae-dev-myAE
6. Terminating resources (AWS CF, Dev VPC)
4. Developer deploys AE to Stg env. for integration tests/UAT (much like UC#2)
Feature branch workflow; branches master and AE-100; 1.19.563
Repo: spn/ae-saas, CI project: spn/ae-saas (Env. variables in CI)
S3: dev-us-east-1 and S3: stg-us-east-1 (CF templates, Chef recipes, ae-1.19.563.jars)
1. Merge feature branch: 1.19.<buildNum>
2. Git tag: c-1.19.563-stg-us-east-1-myAE
3. Push tag
4. Trigger CI
5. cp artifacts to stg S3
6. cp artifacts from dev to stg
6.1 copying
7. Create CF
8. Provision resources for stack: ae-stg-myAE (AWS CF, Dev VPC)
8.1 Deploy/config/startup service
9. Run itests on service
Ready for tests
5. Developer deploys AE to PROD env. (much like UC#4)
Git tag: c-1.19.563-prod-us-west-2-myAE
6. Developer patches hotfixes and deploys to PROD (1/2)
Feature branch workflow; branches master and AE-105; 1.19.570
Repo: spn/ae-saas, CI project: spn/ae-saas (Env. variables in CI)
S3: stg-us-east-1 and S3: prod-us-west-2 (CF templates, Chef recipes, ae-1.19.563.jars)
1. Git tag: u-1.19.570-prod-us-west-2-myAE
2. Push tag
3. Trigger CI
4. cp artifacts to prod S3
5. cp artifacts from stg to prod
5.1 copying
6. Update CF
7. Update CF stack: ae-prod-myAE (AWS CF, Dev VPC)
8.1 Re-Deploy/config/startup service
Ready to serve
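Artifacts are promoted bucket by bucket (dev-us-east-1, then stg-us-east-1, then prod-us-west-2) rather than rebuilt for each environment. A minimal sketch of that promotion rule is below. The bucket order comes from the slides; the key layout (`ae-<version>/<file>`) and the function names are hypothetical, and the sketch only plans the copies instead of calling S3.

```python
from typing import List, Tuple

# Bucket names as shown on the slides, in promotion order.
PROMOTION_CHAIN = ["dev-us-east-1", "stg-us-east-1", "prod-us-west-2"]


def promotion_source(target_bucket: str) -> str:
    """Return the bucket that artifacts are copied from when promoting to target_bucket."""
    index = PROMOTION_CHAIN.index(target_bucket)  # raises ValueError for unknown buckets
    if index == 0:
        raise ValueError(f"{target_bucket} is the first stage; CI uploads there directly")
    return PROMOTION_CHAIN[index - 1]


def copy_plan(target_bucket: str, version: str,
              file_names: List[str]) -> List[Tuple[str, str]]:
    """List (source, destination) S3 URIs for promoting one version."""
    source_bucket = promotion_source(target_bucket)
    prefix = f"ae-{version}"  # hypothetical key layout
    return [
        (f"s3://{source_bucket}/{prefix}/{name}", f"s3://{target_bucket}/{prefix}/{name}")
        for name in file_names
    ]


for src, dst in copy_plan("prod-us-west-2", "1.19.570", ["api.jar", "worker.jar"]):
    print(src, "->", dst)
```

Copying the exact bytes that passed tests in the previous environment means PROD runs what Stg verified.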
6. Developer patches hotfixes and deploys to PROD (2/2)
• Updating W/O SLA impact
– ALB plus an Auto Scaling group W/ AutoScalingReplacingUpdate configured in its UpdatePolicy attribute
• Better and more flexible auto-scaling
– EC2 Auto Scaling group + OpsWorks
• Deploy across regions as early as possible
– Minor configuration diffs
• A successful deploy to us-east-1 does not guarantee success in other regions…
– The AWS SDK default region is us-east-1
• You may forget to set the region in your code…
http://docs.aws.amazon.com/AWSCloudFormation/latest/UserGuide/aws-attribute-updatepolicy.html
https://aws.amazon.com/tw/blogs/devops/auto-scaling-aws-opsworks-instances/
(Auto-healing really sucks)
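The UpdatePolicy attribute mentioned on this slide sits on the Auto Scaling group resource in the CloudFormation template. The sketch below builds such a resource as a Python dict and prints it as JSON. The resource layout and the `AutoScalingReplacingUpdate`/`AutoScalingRollingUpdate` keys follow the CloudFormation documentation linked above; the function name, logical IDs, subnet IDs and sizes are made-up placeholders, not values from the AE templates.

```python
import json
from typing import Dict, List


def auto_scaling_group_resource(launch_config_ref: str, subnet_ids: List[str],
                                min_size: int, max_size: int,
                                replacing: bool = True) -> Dict[str, object]:
    """Build an AWS::AutoScaling::AutoScalingGroup resource with an UpdatePolicy."""
    if replacing:
        # Replace the whole group: a new group is created and the old one is
        # removed only after the new one comes up successfully.
        update_policy = {"AutoScalingReplacingUpdate": {"WillReplace": True}}
    else:
        # Alternative: replace instances in place, one batch at a time.
        update_policy = {
            "AutoScalingRollingUpdate": {
                "MinInstancesInService": min_size,
                "MaxBatchSize": 1,
                "PauseTime": "PT5M",
            }
        }
    return {
        "Type": "AWS::AutoScaling::AutoScalingGroup",
        "Properties": {
            "LaunchConfigurationName": {"Ref": launch_config_ref},
            "VPCZoneIdentifier": subnet_ids,
            "MinSize": str(min_size),
            "MaxSize": str(max_size),
        },
        "UpdatePolicy": update_policy,
    }


resource = auto_scaling_group_resource("ApiLaunchConfig", ["subnet-aaa", "subnet-bbb"], 2, 4)
print(json.dumps({"Resources": {"ApiAutoScalingGroup": resource}}, indent=2))
```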
7. Monitor your service components (1/2)
These are the practices we learned from other teams in Trend
• Visibility
– Operators can get timely system status anytime, anywhere
– Practice:
• CW metrics -> CW dashboard
• CloudWatchLog -> AWS Lambda -> Log management system
• Monitoring
– Operators can set a threshold on any metric as a monitor
– The monitor can then trigger corresponding actions to notify the operator
– Practice:
• [App logs -> CW agent -> | custom] CW metrics -> CW Alarm
• Auto-Recovery
– The system can recover by itself whenever a component fails
– Practice:
• EC2 auto-scaling group + OpsWorks
• CW metrics -> CW Alarm -> AWS Lambda -> AWS SDK -> AWS OpsWorks|AWS EC2
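The auto-recovery chain above (alarm -> AWS Lambda -> AWS SDK) could be wired up roughly as in the sketch below, assuming the alarm reaches Lambda through an SNS topic. The message field names (`NewStateValue`, `Trigger`, `Dimensions` with lowercase `name`/`value`) reflect the alarm notification format as I understand it and should be checked against a real notification. The function names and the "terminate" decision are illustrative assumptions, and the sketch only plans actions instead of calling EC2 or OpsWorks.

```python
import json
from typing import Dict, List


def plan_recovery(event: Dict) -> List[Dict[str, str]]:
    """Return recovery actions for alarm notifications delivered through SNS."""
    actions = []
    for record in event.get("Records", []):
        message = json.loads(record["Sns"]["Message"])
        if message.get("NewStateValue") != "ALARM":
            continue  # ignore OK and INSUFFICIENT_DATA transitions
        dimensions = {
            d["name"]: d["value"]
            for d in message.get("Trigger", {}).get("Dimensions", [])
        }
        instance_id = dimensions.get("InstanceId")
        if instance_id:
            actions.append({
                "action": "terminate_instance",
                "instance_id": instance_id,
                "alarm": message.get("AlarmName", ""),
            })
    return actions


def lambda_handler(event, context):
    """Lambda entry point; a real handler would call the AWS SDK for each action."""
    return {"planned": plan_recovery(event)}


sample_event = {"Records": [{"Sns": {"Message": json.dumps({
    "AlarmName": "ae-api-unhealthy",
    "NewStateValue": "ALARM",
    "Trigger": {"Dimensions": [{"name": "InstanceId", "value": "i-0abc"}]},
})}}]}
print(lambda_handler(sample_event, None))
```

Terminating the instance and letting the Auto Scaling group launch a replacement keeps the recovery logic small.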
7. Monitor your service components (2/2)
A high level architecture design
[Diagram with stages Input, Store, Process, Output:
Input: app components (custom metrics: CPU, mem, disk; app logs to CWLog) and managed services (default metrics)
Store: AWS CloudWatch (CW metrics) and AWS CloudWatchLog (metric filters)
Visibility: CW Dashboard; CloudWatchLog -> AWS Lambda -> Log management
Monitoring: CW Alarms -> AWS SNS -> Pager
Auto-recovery: CW Alarms -> AWS Lambda]
DevOps goals V.S. our original way V.S. CI/CD on AWS
• Faster time to market
– Original way: so complicated that steps are easily missed; the service team has to follow up on every step itself; some steps need lead time (machine resources, etc.)
– CI/CD: one-click delivery; only one role, “developer”; minutes of lead time for resources
• Lower failure rate of new releases
– Original way: manual steps lead to errors
– CI/CD: full automation
• Shorten lead time between fixes
– Original way: rolling upgrade; invasive
– CI/CD: replacing/rolling upgrade deployment; non-invasive
• Faster mean time to recovery
– Original way: hard to deal with machine errors and peak loads
– CI/CD: elasticity brought by the cloud computing platform
https://en.wikipedia.org/wiki/DevOps#Goals
Lessons learned
• Automate everything you can
– CloudFormation + EC2 Auto Scaling group + OpsWorks
– AWS::CloudFormation::CustomResource can also come to the rescue
• Consider splitting your service CF template
– Service infra. (RDS, SNS, KMS key, etc)
• You do not update your infra. often
– Service instances (EC2, etc)
• We update our service instances very often
• Do not only consider first-time creation
– How to update your services W/O impacting the SLA
• Monitor ! Monitor !! Monitor !!!
• TEST ! TEST !! TEST !!!
http://docs.aws.amazon.com/AWSCloudFormation/latest/UserGuide/aws-resource-cfn-customresource.html
Backups
Different types of Auto-scaling group
• OpsWorks, 24/7 (deploy: chef recipe)
– manual creation/deletion
– configure one instance for one AZ
• OpsWorks, time-based (deploy: chef recipe)
– can specify time slot(s) in hour units, every day or on any day of the week
– configure one instance for one AZ
• OpsWorks, load-based (deploy: chef recipe)
– can specify CPU/MEM/workload avg. based on an OPS layer
– UP: when to increase instances
– Down: when to decrease instances
– No max./min. # of instances setting
– configure one instance for one AZ
• EC2 (deploy: user-data)
– can set max./min. # of instances
– Multi-AZs support
Auto Recovery based on Monit
• OpsWorks already uses Monit for auto recovery
– Leverage the Monit on EC2
– We already have practice with it on-premises
• Auto Scaling group (API servers in AZ1 and AZ2)
– Instance check by CloudWatch
– Process check by Monit
• No process: restart process
• Process health check failed: terminate EC2
• Terminate EC2 -> Auto Scaling group launches new EC2
https://mmonit.com/monit/
11/22/2017 Confidential | Copyright 2014 TrendMicro Inc.
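The process-check rules on this slide reduce to a small decision function. This is a sketch of the logic only, with hypothetical names; in practice the rules would live in Monit's own configuration rather than in Python.

```python
from enum import Enum


class Action(Enum):
    NONE = "none"
    RESTART_PROCESS = "restart process"
    TERMINATE_INSTANCE = "terminate EC2"


def decide(process_running: bool, health_check_ok: bool) -> Action:
    """Map the two Monit checks to a recovery action."""
    if not process_running:
        # Cheap fix first: the process is gone, so just start it again.
        return Action.RESTART_PROCESS
    if not health_check_ok:
        # The process is up but unhealthy: terminate the instance and let
        # the Auto Scaling group launch a fresh one.
        return Action.TERMINATE_INSTANCE
    return Action.NONE


for running, healthy in [(False, False), (True, False), (True, True)]:
    print(running, healthy, decide(running, healthy).value)
```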
Small variances among AWS regions
• Impact
– The same automation scripts may not run successfully across regions, and sometimes not even within the same region
• Issues (service, regions, root cause)
– OpsWorks, same region (us-west-2): the accepted S3 URL format for the “Repository URL” property changed from “https://s3.amazonaws.com” to “https://s3-us-west-2.amazonaws.com”
– OpsWorks, us-west-2 V.S. us-east-1: still the “Repository URL” issue, “https://s3-us-west-2.amazonaws.com” V.S. “https://s3.amazonaws.com”
– EC2, us-west-2 V.S. us-east-1: the EC2 FQDN format is different, “ip-10-104-33-152.us-west-2.compute.internal” V.S. “ip-10-103-73-248.ec2.internal”
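One way to keep such regional differences out of the automation scripts is to compute the values from the region instead of hard-coding them. The sketch below covers only the two regions and the two formats listed in the issues above. The function names are hypothetical, and the dash-style S3 endpoint is the form from the slide; other regions may use a different endpoint format.

```python
def s3_endpoint(region: str) -> str:
    """Return the S3 endpoint in the style shown above for the given region."""
    if region == "us-east-1":
        return "https://s3.amazonaws.com"
    return f"https://s3-{region}.amazonaws.com"


def internal_fqdn(private_ip: str, region: str) -> str:
    """Return the private DNS name EC2 assigns to an instance in the given region."""
    host = "ip-" + private_ip.replace(".", "-")
    if region == "us-east-1":
        # us-east-1 uses the short ec2.internal suffix.
        return f"{host}.ec2.internal"
    return f"{host}.{region}.compute.internal"


print(s3_endpoint("us-west-2"))
print(internal_fqdn("10.104.33.152", "us-west-2"))
print(internal_fqdn("10.103.73.248", "us-east-1"))
```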
OpsWorks V.S. image-based deployment
• OpsWorks deployment
– This is what we currently use
– It takes too long to launch a service component
• E.g. it takes about 10 mins to launch a Genie node
• Image-based deployment
– Theoretically, it should take very little time to
launch a service component
– More responsive for peak workloads
– AMI (Amazon Machine Images) V.S. Docker images ?
How about API Gateway and ECS ?
• API Gateway
– Not a good fit because it is only accessible from the Internet
– Cold start
– RDB connection overflow
– CORS integration for web UI
• ECS
– Still need to run standby EC2 instances for peak…
– Only takes care of RESTful API services
– Kubernetes is more suitable for our use cases


20171122 aws usergrp_coretech-spn-cicd-aws-v01

  • 1. SPN CI/CD journey on AWS SPN Infra., CoreTech Scott Miao 11/22/2017 1
  • 2. Who am I • Scott Miao • RD, SPN Infra., TrendMicro • OOAD system dev. 10+ years • Hadoop ecosystem 6 years • AWS for BigData 4 years • @linkedIn • @slideshare 2
  • 3. Agenda • Original services delivery process in SPN • Dev/Ops – DevOps goals V.S. our original way • CI/CD on AWS • An example service CI/CD on AWS • DevOps goals V.S. our original way V.S. CI/CD on AWS • Lessons learned
  • 4. Original services delivery process in SPN
  • 5. Developers 2. Source Repo 1. Dev, utests,… 3. Back and forth 4. Trigger CI Release portal 7. Trigger Release build 8. Release artifacts Operators Infra. admin 5. Devices spec. For both Stg/PROD6.1 Monitoring scripts 6.2 Puppet scripts 6.3 Operation guides Release portal Stg. PROD Service team Operation team DCS team 9. Stg resources ready 11. Deploy and monitor 13. Release artifacts 12.1 Itests 12.2 Stress tests 12.3 UAT 15. 16. 17. PROD release 10. Deploy service &scripts 14. PROD resources ready
  • 8. 8 DevOps is not a new technology or a product. It’s an approach or culture of software development that seeks stability and performance at the same time that it speeds software deliveries to the business. ── Andi Mann, CA Technology ── Cited from: Derek Chen, RD, TrendMicro https://www.slideshare.net/derekhound/devops-in-practice-78905911, p#15
  • 9. 9 Software Delivery Plan Release Operat e Code Build DeployTest Monito r Agile Development Continuous Integration Continuous Delivery Continuous Deployment DevOps Cited from: Derek Chen, RD, TrendMicro https://www.slideshare.net/derekhound/devops-in-practice-78905911, p#23
  • 10. DevOps goals V.S. our original way • Faster time to market – Too complicated to miss steps – Service team needs to follow up themselves – Lead time needed steps (Machine resources, etc) • Lower failure rate of new releases – Manual steps lead to errors • Shorten lead time between fixes – Rolling upgrade – Invasive • Faster mean time to recovery – Hard to deal with machine errors and peak 2https://en.wikipedia.org/wiki/DevOps#Goals
  • 11. “Very often, automation supports this objective” https://en.wikipedia.org/wiki/DevOps#Goals Quoted from Wikipedia for DevOps goals
  • 12. CI/CD on AWS TWO ACHIEVE SAME DEVOPS GOALS DEVOPS FOCUSES ON ORGANIZATIONAL CHANGES CI/CD FOCUSES ON TECHNICAL IMPLEMENTATIONS
  • 13. Review for CI and CD • Continuous Integration – is the practice of merging all developer working copies to a shared mainline (trunk) several times a day • Continuous Delivery – produce software in short cycles, ensuring that the software can be reliably released at any time • Continuous Deployment – means that every change is automatically deployed to production https://en.wikipedia.org/wiki/Continuous_integration https://en.wikipedia.org/wiki/Continuous_delivery
  • 14. Characteristics of Cloud Computing • On-demand self-service – A consumer can unilaterally provision computing capabilities • Broad network access – Capabilities are available over the network and accessed through standard mechanisms • Resource pooling – The provider's computing resources are pooled to serve multiple consumers using a multi-tenant model • Rapid elasticity – Capabilities can be elastically provisioned and released • Measured service – Cloud systems automatically control and optimize resource use http://www.inforisktoday.com/5-essential-characteristics-cloud-computing-a-4189 https://en.wikipedia.org/wiki/Infrastructure_as_Code
  • 16. AWS managed services SPN used • AWS CloudFormation – Gives developers and systems administrators an easy way to create and manage a collection of related AWS resources – We use it to provision our service components • Such as Load balancer (ALB), machines (EC2) • AWS OpsWorks – A configuration management service that uses Chef, an automation platform that treats server configurations as code – We use it to deploy, configure and startup our service components https://aws.amazon.com/cloudformation/ https://aws.amazon.com/opsworks/
  • 17. AWS CloudFormation + OpsWorks user main IAM ELB OpsWorks AWS CloudFormation main IAM ALB OpsWorks AWS OpsWorks artifacts AWS S3 AWS VPC Chef recipes1. Put CF templates 2. Put artifacts 3. Put Chef recipes 4. Create CF W/ params, VPC ID, etc 5. Templates input 6. Create CF stacks 7. Provision AWS resources 8. Create OpsWorks 9. Artifacts/recipes input 10. Deploy/Config/start up service User CF Ops Ready to serve
  • 18. CoreTech DCS managed services • Enterprise github – Just like the github we use on Internet • CloudCI – Enterprise Circle CI – A Docker container based CI solution – Seamlessly integrated with github • JFrog Artifactory – A CoreTech wise shared artifacts repo.
  • 19. An example service CI/CD on AWS ANALYTIC ENGINE
  • 20. Analytic Engine is an API service for… Common Big Data computation service on Cloud (AWS) https://www.slideshare.net/takeshi_miao/analytic-engine-a-common-big-data-computation-service-on-the-aws
  • 21. IDC AE High Level Architecture Design AZb AE API servers RDS AZa AZb AZc AE API servers RDS services services services peering HTTPS EMR EMR Cross-account S3 buckets Auto Scaling group worker s worker sMulti-AZs Auto Scaling group Auto Scaling group Eureka Eureka VPN HTTPS/HTTP Basic Cloud Storagepeering isValidUser CS output HTTPS/HTTP Basic Amazon SNS Oregon (us-west-2) IDC VPN Splunk peering Private ALB
  • 22. IDC This is really what we taking care about AZb AE API servers RDS AZa AZb AZc AE API servers RDS services services services peering HTTPS EMR EMR Cross-account S3 buckets Auto Scaling group worker s worker sMulti-AZs Auto Scaling group Auto Scaling group Eureka Eureka VPN HTTPS/HTTP Basic Cloud Storagepeering isValidUser CS output HTTPS/HTTP Basic Amazon SNS Oregon (us-west-2) IDC VPN Splunk peering Private ALB
  • 23. What components in CI/CD scope • In scope – API, Worker, Eureka, Genie W/ auto-scaling group • EC2, deploy, configure and startup component services – AWS Elastic Application Load Balancer – AWS Simple Notification Service • NOT in scope – VPC/subnets/VPC peerings • We use fixed VPC and subnets for both VPN connections and VPC peerings – RDS MySQL DB • Already pre-created – EMR clusters • Create by user API calls via AWS Java SDK
  • 24. CI/CD Usecases 1. Developer edits/pushes codes to github 2. Developer deploys AE to Dev env. for tests 3. Developer terminates AE in Dev env. after tests 4. Developer deploys AE to Stg env. for integrated tests/UAT 5. Developer deploys AE to PROD env. 6. Developer patches hotfixes and deploys to PROD 7. Monitor your service components
  • 25. 1. Developer edits/pushes codes to github Developers master AE-100 Repo: spn/ae-saas Project: spn/ae-saas 1.19.0 3.build 4.utests 5.package 6.cp artifacts to S3 S3: dev-us-east-1 CF templates ae- 1.19.AE_100.jar s Chef recipes ae- 1.19.AE_100.jars 1. Push AE-100 branch 2. Trigger CI 7. cp to S3 8.publish artifacts to mvn repo. 9. Publish artifacts to mvn repo. Feature branch workflow https://www.atlassian.com/git/tutorials/comparing-workflows Every commit will trigger this build
  • 26. 2. Developer deploys AE to Dev env. for tests Developers Repo: spn/ae-saas Project: spn/ae-saas 4.Create CF S3: dev-us-east-1 CF templates ae- 1.19.AE_100.jars Chef recipes 1. Git tag: c-1.19.AE_100- dev-us-east-1-myAE 3. Trigger CI Feature branch workflow 2. Push tag Dev VPC AWS CF 5. CF creating for stack: ae-dev-myAE 5.1 Templates input 6. Provision resources 7. Deploy/config/s tartup service Ready for tests Env. variables in CImaster AE-100
  • 27. 3. Developer terminates AE in Dev env. after tests Developers Repo: spn/ae-saas Project: spn/ae-saas 4.delete CF 3. Trigger CI Feature branch workflow 2. Push tag Dev VPC AWS CF 5. CF deleting for stack: ae-dev-myAE 6. Terminating resources 1. Git tag: d-1.19.AE_100- dev-us-east-1-myAE master
  • 28. 8.1 Deploy/config/ startup service 4. Developer deploys AE to Stg env. for integrated tests/UAT (Much like UC#2) Developers Repo: spn/ae-saas Project: spn/ae-saas 7.Create CF S3: dev-us-east-1 CF templates ae-1.19.563.jars Chef recipes 2. Git tag: c-1.19.563-stg- us-east-1-myAE 4. Trigger CI Feature branch workflow 3. Push tag Dev VPC AWS CF 8. Provision resources for stack: ae-stg-myAE Ready for tests Env. variables in CImaster AE-100 1.19.563 1. Merge feature branch: 1.19.<buildNum> 5.cp artifacts to stg S3 ● ● ● 6.1 copying 6. cp artifacts from dev to stg 9.Run itests S3: stg-us-east-1 Run itests on service
  • 29. 5. Developer deploys AE to PROD env. (Much like UC#4) 29 Much like UC#4 Git tag: c-1.19.563-prod-us-west-2-myAE
  • 30. 6. Developer patches hotfixes and deploys to PROD (1/2) Developers Repo: spn/ae-saas Project: spn/ae-saas 6.Update CF S3: stg-us-east-1 CF templates ae-1.19.563.jars Chef recipes 1. Git tag: u-1.19.570- prod-us-west-2-myAE 3. Trigger CI Feature branch workflow 2. Push tag Dev VPC AWS CF 7. Update CF stack: ae- prod-myAE Ready to serve Env. variables in CImaster AE-105 1.19.570 4.cp artifacts to prod S3 ● ● ● 5.1 copying 5. cp artifacts from stg to prod S3: prod-us-west-2 8.1 Re- Deploy/config/ startup service
  • 31. 6. Developer patches hotfixes and deploys to PROD (2/2) • Updating W/O SLA impact – ALB W/ AutoScalingReplacingUpdate for UpdatePolicy Attribute configured • Better and flexible Auto-scaling – EC2 Auto-scaling group + Opsworks • Cross region deployment as early as possible – Minor configuration diffs • Deploy to us-east-1 successful does not assure on others… – AWS SDK default value is us-east-1 • You may forgot to set in your code… 31 http://docs.aws.amazon.com/AWSCloudFormation/latest/UserGuide/aws-attribute-updatepolicy.html https://aws.amazon.com/tw/blogs/devops/auto-scaling-aws-opsworks-instances/ (Auto-healing really sucks)
  • 32. 7. Monitor your service components (1/2) These are the practices we learned from other teams in Trend • Visibility – Operator can get the timely system status every time every where – Practice: • CW metrics -> CW dashboard • CloudWatchLog -> AWS Lambda -> Log management system • Monitoring – Operator can setup a threshold at specific point for any metrics as a monitor – Therefore, the monitor can trigger corresponding actions to notify operator – Practice: • [App logs -> WC agent -> | custom] WC metrics -> WC Alarm • Auto-Recovery – System can auto recovers itself for every component runs failed – Practice: • EC2 auto-scaling group + Opsworks • WC metrics -> WC Alarm -> AWS Lambda -> AWS SDK -> AWS Opsworks|AWS EC2 32
  • 33. 7. Monitor your service components (2/2) A high-level architecture design (stages: Input, Store, Process, Output) • Input: App components, Managed Services • Metrics: AWS CloudWatch default metrics and custom metrics (CPU, mem, disk) -> CW metrics -> CW Dashboard (Visibility) • Alarms: CW metrics -> CW Alarms -> Pager / AWS SNS -> AWS Lambda (Monitoring, Auto-recovery) • Logs: App logs to CWLog -> AWS CloudWatchLog -> Metric filters -> CW metrics; AWS CloudWatchLog -> AWS Lambda -> Log management (Visibility)
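The alarm-to-Lambda leg of this design can be sketched as a Lambda-style handler. The event shape below (an SNS record whose `Message` is a JSON string with `NewStateValue` and `Trigger.Dimensions`) is an assumption about how the alarm notification arrives, the instance ID is made up, and `recover` is a hypothetical stand-in for whatever AWS SDK call the real function would make.

```python
import json
from typing import Callable, Dict, List


def instances_in_alarm(event: Dict) -> List[str]:
    """Collect EC2 instance IDs from notifications that entered ALARM."""
    instance_ids = []
    for record in event.get("Records", []):
        message = json.loads(record["Sns"]["Message"])
        if message.get("NewStateValue") != "ALARM":
            continue
        for dim in message.get("Trigger", {}).get("Dimensions", []):
            if dim.get("name") == "InstanceId":
                instance_ids.append(dim["value"])
    return instance_ids


def handler(event: Dict, context=None,
            recover: Callable[[str], None] = print) -> int:
    """Lambda-style entry point; `recover` stands in for an AWS SDK call."""
    ids = instances_in_alarm(event)
    for instance_id in ids:
        recover(instance_id)
    return len(ids)


def _record(state: str, instance_id: str) -> Dict:
    # Helper that builds one assumed SNS record for the example below.
    message = {
        "AlarmName": "example-alarm",
        "NewStateValue": state,
        "Trigger": {"Dimensions": [{"name": "InstanceId", "value": instance_id}]},
    }
    return {"Sns": {"Message": json.dumps(message)}}


sample_event = {"Records": [_record("ALARM", "i-0123456789abcdef0"),
                            _record("OK", "i-0fedcba9876543210")]}
recovered: List[str] = []
count = handler(sample_event, recover=recovered.append)
print(count, recovered)
```

Passing the recovery action in as a parameter keeps the parsing logic testable without AWS credentials.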
  • 34. DevOps goals V.S. our original way V.S. CI/CD on AWS (goals from https://en.wikipedia.org/wiki/DevOps#Goals)
    Goals | Original way | CI/CD
    Faster time to market | Too complicated, easy to miss steps; service team needs to follow up themselves; steps that need lead time (machine resources, etc) | One-click delivery; only one role, "developer"; minutes of lead time for resources
    Lower failure rate of new releases | Manual steps lead to errors | Full automation
    Shorten lead time between fixes | Rolling upgrade; invasive | Replacing/rolling upgrade deployment; non-invasive
    Faster mean time to recovery | Hard to deal with machine errors and peaks | Elasticity brought by the Cloud Computing platform
  • 35. Lessons learned • Automate as much as you can – Cloudformation + EC2 Auto-scaling group + Opsworks – AWS::CloudFormation::CustomResource can also come to the rescue • Consider splitting your service CF template – Service infra. (RDS, SNS, KMS key, etc) • You do not update your infra. often – Service instances (EC2, etc) • We update our service instances very often • Do not consider only first-time creation – Also consider how to update your services W/O impacting the SLA • Monitor ! Monitor !! Monitor !!! • TEST ! TEST !! TEST !!! http://docs.aws.amazon.com/AWSCloudFormation/latest/UserGuide/aws-resource-cfn-customresource.html
  • 39. Different types of Auto-scaling group
    Service | Auto Scaling Group | Features | Deploy
    OpsWorks | 24/7 | manual creation/deletion; configure one instance for one AZ | chef recipe
    OpsWorks | time-based | can specify time slot(s) in hour units, every day or on any day of the week; configure one instance for one AZ | chef recipe
    OpsWorks | load-based | can specify CPU/MEM/workload avg. based on an OPS layer; UP: when to increase instances; Down: when to decrease instances; no max./min. # of instances setting; configure one instance for one AZ | chef recipe
    EC2 | - | can set max./min. # of instances; Multi-AZs support | user-data
  • 40. Auto Recovery based on Monit • OpsWorks already uses Monit for Auto Recovery – Leverage Monit on EC2 – We already have practice with it on-premise https://mmonit.com/monit/ • Auto Scaling group (API servers in AZ1 and AZ2) – Instance check by CloudWatch – Process check by Monit • No process: restart the process • Process health check failed: terminate the EC2 instance • Terminate EC2 -> Auto Scaling group launches a new EC2 instance
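The process-check rules on this slide reduce to a small decision function. The sketch below only models the decision; the names `Action` and `decide` are hypothetical, and the actual checks and restarts would be done by Monit and the Auto Scaling group.

```python
from enum import Enum


class Action(Enum):
    NONE = "none"
    RESTART_PROCESS = "restart process"
    TERMINATE_INSTANCE = "terminate EC2 instance"


def decide(process_running: bool, health_check_ok: bool) -> Action:
    """Map the two process checks from the slide to a recovery action."""
    if not process_running:
        # No process: restarting locally is the cheapest fix.
        return Action.RESTART_PROCESS
    if not health_check_ok:
        # Process is up but unhealthy: terminate the instance and let the
        # Auto Scaling group launch a replacement.
        return Action.TERMINATE_INSTANCE
    return Action.NONE


for running, healthy in [(False, False), (True, False), (True, True)]:
    print(running, healthy, "->", decide(running, healthy).value)
```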
  • 41. Little variances among AWS regions • Impact – The same automation scripts cannot run successfully across regions, sometimes not even within the same region • Issues
    Service | Regions | Root cause
    OpsWorks | Same region, us-west-2 | The accepted S3 URL spec. for property "Repository URL" changed from "https://s3.amazonaws.com" to "https://s3-us-west-2.amazonaws.com"
    OpsWorks | us-west-2 V.S. us-east-1 | Still the "Repository URL" issue: "https://s3-us-west-2.amazonaws.com" V.S. "https://s3.amazonaws.com"
    EC2 | us-west-2 V.S. us-east-1 | EC2 FQDN spec. is different: "ip-10-104-33-152.us-west-2.compute.internal" V.S. "ip-10-103-73-248.ec2.internal"
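One way to keep automation scripts region-aware is to derive these values from the region instead of hard-coding them. The helpers below are a hypothetical sketch that reproduces only the two patterns quoted on this slide (the dash-style S3 endpoint and the internal EC2 FQDN); they are not an AWS API.

```python
def s3_repository_endpoint(region: str) -> str:
    """S3 endpoint in the form quoted on the slide for each region."""
    if region == "us-east-1":
        return "https://s3.amazonaws.com"
    return f"https://s3-{region}.amazonaws.com"


def private_fqdn(private_ip: str, region: str) -> str:
    """Internal EC2 hostname; us-east-1 uses a different suffix."""
    host = "ip-" + private_ip.replace(".", "-")
    if region == "us-east-1":
        suffix = "ec2.internal"
    else:
        suffix = f"{region}.compute.internal"
    return f"{host}.{suffix}"


for region, ip in [("us-west-2", "10.104.33.152"), ("us-east-1", "10.103.73.248")]:
    print(region, s3_repository_endpoint(region), private_fqdn(ip, region))
```

Passing the region explicitly everywhere also guards against silently falling back to a default region.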
  • 42. OpsWorks V.S. image-based deployment • OpsWorks deployment – What we currently use – It takes too long to launch a service component • E.g. it takes about 10 mins to launch a Genie node • Image-based deployment – Theoretically, it should take a very short time to launch a service component – More responsive to peak workloads – AMI (Amazon Machine Images) V.S. Docker images ?
  • 43. How about API Gateway and ECS ? • API Gateway – Not a good fit because it is only accessible from the Internet – Cold start – RDB connection overflow – CORS integration for web UI • ECS – Still need to run standby EC2 instances for peaks… – Only takes care of RESTful API services – Kubernetes is more suitable for our use cases

Editor's Notes

  • #5: What’s our goal
  • #7: What’s our goal
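  • #9: DevOps is not really a new technology or a new product. It is a culture of software development that seeks a stable, high-quality way to deliver software quickly into the customers' hands. I find this statement very precise and to the point; it clearly defines the goal that DevOps pursues.
  • #10: Now let's talk about the role DevOps plays in software delivery. We just talked about Agile Development, whose core ideas revolve around the three stages Plan, Code, and Build. If we can automate Test, that is, once RD finishes development and checks in the source code, the Integration Server runs the Unit Tests and then automatically deploys to the Test Environment for more Integration Tests, this part is the Continuous Integration concept we often hear about. If we can automate Release, automatically deploying the code that passed the previous step to the Stage Environment for automated Acceptance Tests or Performance Tests, this part is the Continuous Delivery concept we often hear about. If we can automate Deploy, pushing the code that passed the previous step directly to our Production Environment, this part is the Continuous Deployment concept we often hear about. You will notice something interesting: these terms that keep popping up are all extensions of the earlier concepts. When you talk about DevOps, you are stretching automation even further, so that Operate and Monitor are automated as much as possible too.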
  • #13: What’s our goal
  • #20: What’s our goal
  • #39: What’s our goal
  • #44: Cross-Origin Resource Sharing