Site Reliability Engineering (SRE) Foundation℠ Blueprint

Site Reliability Engineering (SRE) is a discipline and a role that incorporates aspects of software engineering and applies them to infrastructure and operations problems to create ultra-scalable and highly reliable distributed software systems.

Toil Reduction: Reduce Non-Value-Added Work using Tooling, Automation, VSM, and Platform Engineering
Work Sharing: Address Technical Debt in Small Increments; Manage Load Percentage for Ops, Dev, and On-Call Work
SLAs/SLOs/SLIs: Metrics such as Availability, Latency, and Response Time, with Error Budgets and Monitoring
Deployments: Progressive Deployments using Blue/Green, A/B, and Canary Deployments; Automation Scripts; Testing
Measurements: Observability, Monitoring, Telemetry, Instrumentation, and AIOps
Performance Management: Monitoring, APM, Capacity Testing, Auto-Scaling, and AIOps
Culture: Reliability @ Scale, Shift-Left “Wisdom of Production”, Learn from Failure, and Continuous Learning
Anti-Fragility: Improve Resilience using Fire Drills, Chaos Engineering, Security, and Automation
Incident Management: Emergency Response, On-Call, and Blameless Retrospectives
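The SLAs/SLOs/SLIs topic in the blueprint pairs metrics with error budgets. The sketch below is a minimal illustration that assumes a request-based availability SLI, defined as successful requests divided by total requests over a window. Under that assumption, the error budget is the share of requests the SLO allows to fail, 1 − SLO. The helper name `error_budget`, the report fields, and the sample numbers are hypothetical and were chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass
class ErrorBudgetReport:
    sli: float                 # measured availability: good / total requests
    allowed_failures: float    # failures the SLO permits in this window
    actual_failures: int       # failures observed in this window
    remaining_fraction: float  # share of the budget left (negative = overspent)


def error_budget(slo: float, total_requests: int, failed_requests: int) -> ErrorBudgetReport:
    """Compute an availability SLI and how much of the error budget is spent.

    Hypothetical helper: assumes a request-based availability SLI and an SLO
    given as a fraction, e.g. 0.999 for "99.9% of requests succeed".
    """
    if not 0.0 < slo < 1.0:
        raise ValueError("slo must be a fraction strictly between 0 and 1")
    if total_requests <= 0:
        raise ValueError("total_requests must be positive")
    sli = (total_requests - failed_requests) / total_requests
    allowed = (1.0 - slo) * total_requests       # the error budget, in requests
    remaining = 1.0 - failed_requests / allowed  # 1.0 = untouched, < 0 = exhausted
    return ErrorBudgetReport(sli, allowed, failed_requests, remaining)


if __name__ == "__main__":
    # A 99.9% SLO over 1,000,000 requests allows about 1,000 failed requests.
    report = error_budget(slo=0.999, total_requests=1_000_000, failed_requests=250)
    print(f"SLI: {report.sli:.4%}")
    print(f"Allowed failures: {report.allowed_failures:.0f}")
    print(f"Budget remaining: {report.remaining_fraction:.0%}")
```

A negative `remaining_fraction` means the window's budget is overspent. A common SRE practice is to slow down or pause risky releases until the budget recovers, but the exact policy is something each team agrees on.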
Continuous Integration (CI) Pipeline and Continuous Delivery / Deployment (CD):
Backlog → Plan & Design → Code & Test → Commit & Merge → Build & Test Artifacts → SAT & UAT → Approve Release → Deploy to Prod → Post-Prod Tests → Operate
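The blueprint lists canary deployments as one form of progressive deployment, which applies at the Deploy to Prod stage of the pipeline. The sketch below shows one possible promotion gate and rests on several assumptions that the blueprint does not specify. The canary is judged only on error rate. It must serve a minimum number of requests before any decision is made. It fails if its error rate exceeds a fixed multiple of the baseline's error rate. The traffic steps (`TRAFFIC_STEPS`), the 1.5× ratio, the 0.1% floor, and every function name are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class CohortStats:
    requests: int
    errors: int

    @property
    def error_rate(self) -> float:
        return self.errors / self.requests if self.requests else 0.0


# Hypothetical traffic steps for a progressive rollout (percent of traffic).
TRAFFIC_STEPS = [1, 5, 25, 50, 100]


def canary_verdict(baseline: CohortStats, canary: CohortStats,
                   max_ratio: float = 1.5, min_requests: int = 500) -> str:
    """Return "promote", "hold", or "rollback" for one canary step.

    Assumed rule (illustrative only): hold until the canary has served
    `min_requests`, then roll back if its error rate exceeds `max_ratio`
    times the baseline error rate.
    """
    if canary.requests < min_requests:
        return "hold"  # not enough traffic yet to judge the canary
    # A zero-error baseline would make any canary error fatal, so apply a
    # small absolute floor (0.1%, an assumed value) to the threshold.
    threshold = max(baseline.error_rate * max_ratio, 0.001)
    return "rollback" if canary.error_rate > threshold else "promote"


def next_step(current_percent: int) -> int:
    """Advance to the next traffic step, staying at 100 once fully rolled out."""
    later = [p for p in TRAFFIC_STEPS if p > current_percent]
    return later[0] if later else 100


if __name__ == "__main__":
    baseline = CohortStats(requests=100_000, errors=200)  # 0.2% errors
    canary = CohortStats(requests=2_000, errors=5)        # 0.25% errors
    verdict = canary_verdict(baseline, canary)
    print(verdict)
    if verdict == "promote":
        print("next traffic step:", next_step(5), "%")
```

Production rollout tools typically combine several signals, such as latency, saturation, and the error-budget burn rate, when they make this decision. The error-rate-only rule here keeps the gating logic visible and is not meant as a recommended policy.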
© PeopleCert | DevOps Institute [Link]/devops