Understanding the SRE Role (Part 1): Reflections on Reading SRE实战手册

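This article introduces the role and responsibilities of SRE (Site Reliability Engineering) and stresses its importance for improving system stability and availability. Drawing on Google's SRE practice, it explains the concepts of SLI (Service Level Indicator), SLO (Service Level Objective), and error budget, and describes how SLIs and SLOs are used to set and measure stability targets. It also discusses what to consider when putting SLOs into practice, including identifying core request paths, analyzing service dependencies, and choosing verification strategies, so that the core business stays stable while the overall system is operated efficiently.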

1. Introduction

What is SRE? SRE stands for Site Reliability Engineering. The same abbreviation is also used for the role itself, Site Reliability Engineer.

The concept of SRE comes from Google. "Systems Engineer, Site Reliability Engineering" is the title of one of Google's job postings, so let's look at what the role requires:

Job description:

Site Reliability Engineering (SRE) combines software and systems engineering to build and run large-scale, massively distributed, fault-tolerant systems. SRE ensures that Google’s services—both our internally critical and our externally-visible systems—have reliability, uptime appropriate to users’ needs and a fast rate of improvement. Additionally SRE’s will keep an ever-watchful eye on our systems capacity and performance. Much of our software development focuses on optimizing existing systems, building infrastructure and eliminating work through automation.

On the SRE team, you’ll have the opportunity to manage the complex challenges of scale which are unique to Google, while using your expertise in coding, algorithms, complexity analysis and large-scale system design.
SRE’s culture of diversity, intellectual curiosity, problem solving and openness is key to its success. Our organization brings together people with a wide variety of backgrounds, experiences and perspectives. We encourage them to collaborate, think big and take risks in a blame-free environment. We promote self-direction to work on meaningful projects, while we also strive to create an environment that provides the support and mentorship needed to learn and grow.

To learn more: check out our books on Site Reliability Engineering, watch a recorded Hangout on Air to meet some of our SREs, or read a career profile about why a Software Engineer chose to join SRE.

Behind everything our users see online is the architecture built by the Technical Infrastructure team to keep it running. From developing and maintaining our data centers to building the next generation of Google platforms, we make Google’s product portfolio possible. We’re proud to be our engineers’ engineers and love voiding warranties by taking things apart so we can rebuild them. We keep our networks up and running, ensuring our users have the best and fastest experience possible.

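Google SRE is currently regarded as the best practice in the field of system stability. After the introduction of microservices, containers, and other distributed technologies and products, systems with complex architectures…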
