Confidential, Dynatrace, LLC
From 6 Months Waterfall to 1h Code Deploys
“It was a long journey!”
Andreas Grabner - May 2017
@grabnerandi
Before we get started …
How I explain DevOps (to my parents)
Ship the whole box!
Quality Control
Back to customer
24 “Features in a Box”
Very late feedback 
F r u s t r a t i o n !
1 “Feature at a Time”
Optimize Before Deploy
Immediate Customer Feedback
User-Driven Continuous Delivery through DevOps
WHO was Dynatrace in ~ 2011?
And what forced us to change?
2 major releases/year
customers deploy & operate on-prem
2011
~250 employees
#1 APM Enterprise Space
New (Stack/Cloud) Technologies
New & Faster Competition
Some Complaining Customers (Speed)
Status Quo
Market Challenge
Compuware Acquisition
~3600 employees
Rebranding / Integrations / Expansion
believe in the mission impossible
Bernd Greifeneder
CTO @ Dynatrace
80%
20%
Organization & Culture
Technology
His Goal: 1h from Dev to Ops
2 major releases/year
customers deploy & operate on-prem
26 feature releases/year
500 prod deployments/day
self-service online sales
SaaS & Managed
2011 2017
sprint releases (continuous-delivery)
1h: Code -> Prod
6 months
major/minor release
Step #1: Run Enterprise “SaaS-like”
NOC lessons learnt: TOO SLOW!
Step #2: Lift & Shift Enterprise Software into AWS
Developer will never do that!
Operator’s job
Anita Engleder
Dynatrace DevOps Lead
First Orchestration Engine to AWS
Lesson #1: Velocity uncovers new bottlenecks!
• Going from 6 to 1 Month Cycles
• Offered to: On-Premise Customers + SaaS-Deployments
• Challenge: 1GB Monolithic Download
• Impact: Error prone updates
• Solution: Componentize, Automate Rollout/Rollback Capability, A/B Rollout Model
Lesson #2: Need to Increase Sprint Quality
• Sprint Reviews Done on “dynaSprint”
• Daily Builds get deployed on “dynaDay”. Sprint builds to “dynaSprint”
• If you can only show it “on your dev machine” it's NOT DONE!
• Deploy Sprint Builds into our internal Production Environment
• We monitor Website, Support, Licensing, Community ... With Dynatrace
• If we break our own back office software we ALL feel the pain right away
Drinking a lot of our own Champagne…
Lesson #3: Essential End User Feedback Loop
• Which Features to Optimize? Which Features to „Phase Out“?
• Allows Reducing Technical and Business Debt
• Allowed us to “Call Out Sales!” for requested features nobody used!
Lesson #4: Automated Error Analysis
• Birth of “ARCHIE” our “Automated Log Archive Analyzer” integrated with JIRA
Lesson #5: We started to understand “The Cloud”
• What Cloud Services to use for which tasks!
• It SCALES but it AIN’T CHEAP if you make a mistake!
4x $$$ to IaaS
Step #3: Incubation on New Stack
Keep Innovating on Enterprise Stack
Incubate “Start Up” on New Stack
Redefining the DevOps Team’s Role
Acting as engineers & production managers
Dynatrace Managed/SaaS
Orchestration Layer
Dynatrace Pipeline Visualization
Deployment Timeline
Log Overview
using Dynatrace Log API
JIRA Integrations
Monitoring as Pipeline & Platform Feature
Dev Perf/Test Ops Biz
Faster Innovation with Quality Gates
Faster Acting on Feedback
Unit Perf
Cont. Perf
New Deploy
New Capability
CI CD Remove/Promote
Triage/Optimize
Update Tests
Innovate/Design
$$$
Lower Costs
Happy Users
Lesson #6: Pipeline quality + 10 min Builds
https://github.com/Dynatrace/ufo
dynatrace.com/ufo
3D print: dynatrace.github.io/ufo
Dev: Shift-Left - Architectural Regression Decisions
= Capturing Application Metrics
+ # of Images, # of JS, Load Time …
+ # of SQL, # of Logs, # of API Calls, # of Excepts ...
== Functional Passed / Failed
31k Unit/Int-Tests / hour
60h UI-Tests / Build
Dev: Shift-Left - Architectural Regression Decisions
Regression
Baseline Every Metric of every Test Stop the Pipeline Early!
https://github.com/Dynatrace/ufo
Perf / Test: Continuous Performance Validation
“Performance Signature” for Build Nov 16
“Performance Signature” for Build Nov 17
Lesson #7: Deploy, Fail & Recover Fast
Total Number of Users per User Experience
Conversion Rate
Biz: User Feedback Driven Decisions
New Features + Day # 1 of Mkt Push
Overall increase of Users!
Jump in Conversion Rate!
Biz: User Feedback Driven Decisions
Users keep growing
Increase # of “tolerating” users!
Lower Conversion than Day #1
Day #2 of Marketing Campaign
Biz: User Feedback Driven Decisions
Drop in Conversion Rate
Spikes in FRUSTRATED Users!
Hotfix Deployment was rolled out
Biz: User Feedback Driven Decisions
User Experience Back to Normal
Jump in Conversion Rate!
Fix of the Hotfix was rolled out
Biz: User Feedback Driven Decisions
Lesson #8: Make Alerts more Actionable
Be proud of your feature!
DevOps  (No)Ops
Step #4: Bringing it back together
Merging Development Teams
Applying things “that work” on each other's side!
Dynatrace Transformation by the numbers
26 Feature Releases / Year
500 Deployments / Day
31000 Unit & Int Tests / hour
60h UI Tests per Build
More Quality
~120 Code commits / day
340 Stories per sprint
More Agile
93% Production bugs found by Dev
More Stability
450 Global EC2 Instances
99.998% Global Availability
Raffle for DevOps Handbook + Echo
Tweet Creative Ideas for UFO Usage to
@grabnerandi
http://www.dynatrace.com
THANKS

DevOps Days Toronto: From 6 Months Waterfall to 1 hour Code Deploys


Editor's Notes

  • #4 My analogy for Waterfall: put many features into a single release, then ship it to some other entity that does quality control. The final product comes back very late, so it is hard to remember which features (photos) we created, and often we realize it's not what we wanted.
  • #5 This is the new way of delivering software: continuously, with small batch updates. I use the analogy of how my girlfriend takes pictures, one at a time. Quality control and optimization are in her own hands thanks to software that is “part of the delivery chain” (the photo app). She also controls what to push into production, i.e. what to post on Instagram / Facebook. She wants to make her users (friends & family) happy and is hoping for LIKES! If she gets dislikes she can remove an image. If she gets comments she can take another picture and deploy it within seconds. That is Continuous User-Driven Innovation.
  • #7 This is where we were in 2011
  • #8 Our CTO had a vision, plus pressure from customers and the market. He set the goal of being able to do 1h Code Deploys. The biggest challenge was the cultural change, but our CTO always believed in the Mission Impossible.
  • #9 Here is what we have achieved since then, before I go into details about how we got there.
  • #11 We tried to take our on-premise product and “Lift & Shift” it to a SaaS model, using our existing NOC team. We learned that we had a lot of processes in place that made frequent updates very painful, such as Change Request Meetings every week …
  • #13 The biggest challenge was that no developer wanted to take responsibility for Operations.
  • #21 We ran our own startup within our Company
  • #22 Our DevOps Team (initially 7 people, now only 3) is responsible for “The Delivery Pipeline and the DevOps Tool Chain”. Their customers are the different Dev Teams that want to push features through the pipeline into production.
  • #23 Our own transformation, plus what we hear from customers and the market, tells us EVERYONE WANTS to CHANGE, but the biggest challenge is Org / Culture, not Technology.
    More resources:
    DevOps Webinar with Bernd Greifeneder (CTO): https://info.dynatrace.com/apm_dtm_ops_17q3_wc_from_enterprise_tocloud_native_na_registration.html
    DevOps Webinar with Anita Engleder (DevOps Manager): https://info.dynatrace.com/17q3_wc_from_agile_to_cloudy_devops_na_registration.html
  • #24 Key Lessons Learned:
    Raise awareness of quality and of the impact each individual developer has on the bottom line, which is quality in production.
    “Eat our own dogfood” aka “Drink our own Champagne”: we install sprint builds into our internal systems.
    Visualize build and pipeline quality via UFOs.
    Make Devs look into production as well.
  • #28 A deployment can seem good because all features work and response time is the same as before. But if your resource consumption goes up like this, the deployment is NOT GOOD, as you are now paying a lot of money for that extra compute power. Dynatrace can look at key resource, performance, scalability and architectural metrics and trend them from build to build. If Dynatrace detects a regression it can notify the build pipeline (Jenkins, Bamboo, TFS, …) that the current code change should not be promoted to the next phase. Screenshot from Dynatrace AppMon.
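The build-to-build check described in this note can be sketched roughly as follows. This is a minimal illustration, not Dynatrace's implementation: the `MetricCheck` class, the metric names and the thresholds are all hypothetical, and a real pipeline would fetch baseline and current values from its monitoring tool rather than hard-code them.

```python
from dataclasses import dataclass


@dataclass
class MetricCheck:
    """One metric compared against its baseline (all names are hypothetical)."""
    name: str
    baseline: float      # value recorded for the last good build
    current: float       # value measured for the build under test
    max_increase: float  # allowed relative increase, e.g. 0.10 = 10%

    def regressed(self) -> bool:
        if self.baseline == 0:
            # Any growth from a zero baseline counts as a regression.
            return self.current > 0
        return (self.current - self.baseline) / self.baseline > self.max_increase


def evaluate_build(checks):
    """Return the names of all metrics that regressed beyond their threshold."""
    return [c.name for c in checks if c.regressed()]


def gate(checks) -> bool:
    """True if the build may be promoted to the next pipeline phase."""
    return not evaluate_build(checks)


checks = [
    MetricCheck("sql_statements", baseline=12, current=12, max_increase=0.0),
    MetricCheck("cpu_ms", baseline=100, current=180, max_increase=0.10),
    MetricCheck("response_time_ms", baseline=250, current=251, max_increase=0.05),
]
print(evaluate_build(checks))  # ['cpu_ms']
print(gate(checks))            # False
```

Here the functional result and the response time are unchanged, yet the gate still fails because CPU consumption rose by 80%, which is the situation the note warns about.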
  • #29 Continuous Performance Testing, or Continuous Performance Validation, is a good pipeline phase to have before deploying into a production environment. It is an environment running under continuous load. New builds of individual services or complete applications get deployed on a regular basis. The question is whether a new version of a service, application or component shows any degradation in performance, scalability or resource consumption. If so, it should not be promoted to the next phase before closer examination. Dynatrace automatically understands applications but, more importantly, services. Dynatrace also integrates with testing tools so that traffic on certain services can be associated with the test scenarios you run in your continuous performance environment. Based on this information it is possible to see any regressions between builds or different loads. In the example above it is easy to spot that the build from Nov 17 shows a significant performance regression. Instead of allowing this build into production, it is better to look into the differences between Build Nov 16 and Build Nov 17.
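One way to picture comparing two builds under the same continuous load is the sketch below. It is only an illustration under assumptions: the slides do not define what a “Performance Signature” contains, so the choice of median and 95th percentile, the 15% tolerance and the sample data are all made up for the example.

```python
import statistics


def signature(response_times_ms):
    """Summarize one build's response times (assumed signature: median and p95)."""
    # quantiles(n=20) returns 19 cut points; the last one is the 95th percentile.
    cuts = statistics.quantiles(response_times_ms, n=20, method="inclusive")
    return {"median": statistics.median(response_times_ms), "p95": cuts[18]}


def compare(previous, current, tolerance=0.15):
    """Return {metric: relative increase} for metrics that grew past the tolerance."""
    old, new = signature(previous), signature(current)
    regressions = {}
    for metric, old_value in old.items():
        increase = new[metric] / old_value - 1.0
        if increase > tolerance:
            regressions[metric] = increase
    return regressions


# Made-up samples: the second build is twice as slow on every request.
build_nov_16 = list(range(100, 120))
build_nov_17 = [2 * t for t in build_nov_16]

print(sorted(compare(build_nov_16, build_nov_17)))  # ['median', 'p95']
print(compare(build_nov_16, build_nov_16))          # {}
```

A non-empty result would be the signal to hold the build back for closer examination instead of promoting it.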
  • #30 After a deployment we see an issue with network connectivity and CPU utilization that impacts our end users. Dynatrace not only detects that issue but also shows us the complete problem evolution path, which lets us see which change actually caused the issue and how to remediate it!
  • #32 The next slides show a scenario that happened in our organization. This dashboard is used by our marketing and business teams to see how well visited our website is (total numbers in top chart), how user experience plays out (top chart with green/yellow/red) and how many people sign up for our free trial offering (conversion rate).
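The two dashboard measures, users per experience category and conversion rate, could be computed along the lines of the sketch below. The note does not say how Dynatrace classifies users, so this assumes Apdex-style thresholds (satisfied up to T, tolerating up to 4T, frustrated beyond that) with a hypothetical T of 3 seconds and made-up visit data.

```python
from collections import Counter


def classify(load_time_s, threshold_s=3.0):
    """Apdex-style bucket for one visit; the 3 s threshold is an assumption."""
    if load_time_s <= threshold_s:
        return "satisfied"
    if load_time_s <= 4 * threshold_s:
        return "tolerating"
    return "frustrated"


def summarize(visits, threshold_s=3.0):
    """visits: iterable of (load_time_s, signed_up) tuples.

    Returns (counts per experience bucket, conversion rate).
    """
    visits = list(visits)
    counts = Counter(classify(t, threshold_s) for t, _ in visits)
    conversions = sum(1 for _, signed_up in visits if signed_up)
    rate = conversions / len(visits) if visits else 0.0
    return counts, rate


visits = [(1.0, True), (2.5, False), (5.0, False), (20.0, False)]
counts, rate = summarize(visits)
print(dict(counts))  # {'satisfied': 2, 'tolerating': 1, 'frustrated': 1}
print(rate)          # 0.25
```

Tracking both numbers side by side is what makes a drop in conversion traceable to a spike in frustrated users in the scenario that follows.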
  • #33 May 1st was the push of a new release, and a marketing campaign started that promoted these features and tried to get people to sign up. Everything seemed to be working as expected.
  • #34 Day 2 started well, but we also saw that slower website performance (due to the heavy load) was impacting our end user experience and also the conversion rate.
  • #35 The Dev Team provided a hotfix to make the sign-up form faster.
    #1: It got deployed around noon.
    #2: The fix had a negative impact, as it broke the whole website due to a JavaScript problem on certain browsers.
    #3: The problem was immediately visible to both business (drop in conversion) and dev (they looked at the reported JavaScript problems and user experience).
  • #36 Due to the fast feedback from production, the Dev Team immediately fixed that regression, bringing the system back to where they wanted it to be in the first place.
  • #39 We learned that we need self-service in our pipeline: intuitive dashboards, ChatOps and VoiceOps that allow developers to proactively react to feedback from the pipeline.
  • #41 Number of EC2 instances: the exact number on 15 May 2017 was 439 instances in dev + sprint + prod (157). Prod instances increased by ~25; in parallel we reduced instances in dev and sprint to save costs, so in sum we still have ~450 instances, as we already had at the end of 2016.
    Deployments per working day: 5680 SaaS and 2410 Managed deployments within 28 days (= 20 working days). 8090 / 20 = 404 deployments per working day; 400 / 8h = 50 deployments per working hour; 50 / 60 min ≈ 0.83 deployments per minute, i.e. a new deployment roughly every 1m12s.
    Prod bugs found by dev: 1.1.2017 to 15.5.2017 = 92.52% ~ 93%.
    Global availability: 1.1.2017 to 15.5.2017 = 100%; 1.1.2016 to 15.5.2017 = 99.9956%.
    Commits per day: since the move to Git, commits per day went down from 200 to about 120. The reason is that devs first collect commits on their private branch before pushing to master.
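The deployment-rate arithmetic in this note can be re-derived from the raw counts it quotes (5680 SaaS plus 2410 Managed deployments, 20 working days, 8-hour working day). The function name below is just for illustration; without the intermediate rounding used in the note, the interval comes out at about 71 seconds.

```python
def deployment_rate(total_deployments, working_days, hours_per_day=8):
    """Break a deployment count down to per-day, per-hour and per-minute rates."""
    per_day = total_deployments / working_days
    per_hour = per_day / hours_per_day
    per_minute = per_hour / 60
    return {
        "per_day": per_day,
        "per_hour": per_hour,
        "per_minute": per_minute,
        "seconds_between": 60 / per_minute,
    }


rates = deployment_rate(5680 + 2410, working_days=20)
for name, value in rates.items():
    print(f"{name}: {value:.2f}")
```

With the note's rounded figure of 50 deployments per hour, the same calculation gives 3600 / 50 = 72 seconds between deployments.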