Site Reliability Engineer Nanodegree Program Syllabus
Learning Objectives
• Use proactive and reactive SRE strategies (monitoring, postmortem, team building, etc.) to identify
reliability risks through evaluating systems and processes.
• Develop customer-centric SLOs (such as percentile targets for availability, latency, and correctness) and
set up corresponding monitoring and risk mitigation measures to ensure customer happiness.
• Create and deploy automated self-healing architectures and other technologies to make the environment
more maintainable.
• Design and implement organizational processes and culture that enhance product reliability, including
outage/postmortem review, quarterly state of production presentation, and production readiness review.
Prerequisites
• Write basic functions in an object-oriented language (Python or Java) using constructs such as for loops, conditionals,
control flow, and Python or Java methods.
• Write basic shell scripts in Bash or PowerShell, which could include for loops, conditionals, scripting, etc.
• Exercise networking skills including knowledge of virtual networks, DNS, subnets, and basic network troubleshooting
techniques.
• Perform DevOps tasks, such as setting up monitoring, doing feature rollout, and troubleshooting production issues (ideally
for large systems).
• Work with Kubernetes and basic kubectl, such as kubectl apply, kubectl create, kubectl config.
Required Hardware/Software
There are no software and version requirements to complete this Nanodegree program. All coursework and projects can be
completed via Student Workspaces in the Udacity online classroom.
*The length of this program is an estimation of total hours the average student may take to complete all required
coursework, including lecture and project time. If you spend about 5-10 hours per week working through the program, you
should finish within the time provided. Actual hours may vary.
Foundations of Observability
In this course, learners will focus on what observability requires in terms of people and tools. To begin with, we will introduce
SRE, its roles and responsibilities, and how those differ from other teams (DevOps, SysAdmin, Development). With that
foundation, learners will see how SRE helps an enterprise improve and discuss the costs associated with SRE. Learners will then
look at the roles that make up an SRE team, and end with the tool set that an SRE team may use to be successful.
Course Project
Lesson 4
Monitoring System Performance
• Create a dashboard for host metrics (latency, errors, resource utilization: CPU, RAM, disk I/O), observability dashboards,
and site reliability metrics.
• Install and configure a synthetic monitoring solution.
Course 2
Deploying HA Infrastructure
In this project, learners will design highly available (HA) infrastructure and deploy it to AWS with Terraform.
They will start by defining SLOs and SLIs and create a dashboard in Grafana for those objectives. Next, they
will create a disaster recovery plan and define their high-availability infrastructure. Learners will then turn
that design into Terraform code that deploys the infrastructure to multiple AWS regions. Finally, they will
deploy replicated databases to AWS through Terraform code.
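As a rough illustration of the SLO and SLI step described above, the following Python sketch computes an availability SLI from request counts and the share of error budget still remaining against an SLO target. The function names, the request counts, and the 99.9% target are illustrative assumptions, not values taken from the project.

```python
def availability_sli(good_requests: int, total_requests: int) -> float:
    """Return the fraction of requests that were served successfully."""
    if total_requests == 0:
        # No traffic means nothing failed; treat the SLI as fully met.
        return 1.0
    return good_requests / total_requests


def error_budget_remaining(sli: float, slo_target: float) -> float:
    """Return the share of error budget left.

    1.0 means the budget is untouched, 0.0 means it is exhausted,
    and a negative value means the SLO has been violated.
    """
    if not 0.0 < slo_target < 1.0:
        raise ValueError("slo_target must be strictly between 0 and 1")
    allowed_failure = 1.0 - slo_target   # failure rate the SLO tolerates
    actual_failure = 1.0 - sli           # failure rate actually observed
    return 1.0 - actual_failure / allowed_failure


# Hypothetical numbers: 100,000 requests, 50 failures, 99.9% SLO target.
sli = availability_sli(good_requests=99_950, total_requests=100_000)
budget = error_budget_remaining(sli, slo_target=0.999)
print(f"SLI: {sli:.4%}, error budget remaining: {budget:.1%}")
```

With these assumed numbers, half of the allowed failures have been used, so about half of the error budget remains. A dashboard panel could plot the same two quantities over a rolling window.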
Course 3
Self-Healing Architecture
Learn how to deploy microservices or cloud architecture that is resilient enough to withstand failures and predictable enough
to resolve issues via automation without human intervention. This framework is known as self-healing architecture. Begin
by learning some self-healing system design fundamentals such as single points of failure and three-tier architecture. Then
we will show some self-healing deployment strategies, implementation steps, and use cases. Finally, we’ll cover some cloud
automation that learners can use to increase the resiliency of systems, such as auto-scaling automation.
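The idea of resolving issues through automation rather than human intervention can be sketched as a small reconcile loop: check each service's health and restart any that are failing. This is a simplified, in-memory simulation; the `Service` class and `reconcile` function are hypothetical names used only for illustration, and a real system would rely on its platform's health checks and restart mechanisms instead.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Service:
    """A stand-in for a deployed service with a health flag."""
    name: str
    healthy: bool = True
    restarts: int = 0

    def restart(self) -> None:
        # In this simulation a restart always restores health.
        self.healthy = True
        self.restarts += 1


def reconcile(services: List[Service]) -> List[str]:
    """Restart every unhealthy service and return the names that were healed."""
    healed = []
    for svc in services:
        if not svc.healthy:
            svc.restart()
            healed.append(svc.name)
    return healed


# Hypothetical fleet: the database has failed its health check.
fleet = [Service("web"), Service("db", healthy=False)]
print(reconcile(fleet))  # names of the services that were restarted
print(reconcile(fleet))  # a second pass finds nothing left to heal
```

Running the loop on a schedule is what removes the human from the recovery path: the second pass does nothing because the first pass already restored the desired state.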
Course Project
Deployment Roulette
Play the role of an engineer at a growing consulting firm. Applications left by a departing team are in an
undocumented, unknown state. Identify failing applications and implement fixes to resolve the problems.
Create an architecture diagram that communicates the status of the cloud environment to improve the
onboarding of future developers.
Lesson 2 • Describe multiple deployment strategies and their benefits and drawbacks.
Course 4
Nathan is a Certified Six Sigma Black Belt and has 10+ years of experience in IT in multiple
industries. He is also the instructor for two other Udacity courses: Ensuring Quality Releases and
Azure Performance.
Travis Scotto
Site Reliability Engineer
Travis Scotto has worked in technology for 10 years. He has worked in various infrastructure roles:
virtualization, databases, and monitoring. As an SRE, he employs automation and monitoring daily.
He has also taught IT classes as an adjunct instructor for 4.5 years.
Emmanuel Apau
CTO at Mechanicode.io
Emmanuel is cofounder of the Black Code Collective and DC’s Technical.ly RealLIST Engineer
award recipient. An AWS Certified DevSecOps specialist with 12 years of experience, he has
spent his career developing innovative solutions using DevSecOps and site reliability best
practices.
Sonny Sevin
Site Reliability Engineer
Sonny is an SRE with a varied background. He dabbled in research at Lawrence Berkeley
National Labs before moving into site reliability engineering to take on a more hands-on role. He
has been published in several computing journals and has taught introductory programming
courses.
• Project review cycle creates a feedback loop with multiple opportunities for
improvement—until the concept is mastered.
• Project reviewers leverage industry best practices and provide pro tips.
• Unlimited access to mentors means help arrives when it’s needed most.
• An average question response time of 2 hours or less keeps skills development on track.
Empower job-readiness.
• Access to a GitHub portfolio review that can give you an edge by highlighting your
strengths and demonstrating your value to employers.*
• Get help optimizing your LinkedIn and establishing your personal brand so your profile
ranks higher in searches by recruiters and hiring managers.
Mentor Network
• Mentors work across more than 30 different industries and have often completed a Nanodegree
program themselves.
01.13.22 | V1.0