Site reliability engineering (SRE) is an essential part of engineering at Google. It’s a mindset, and a set of practices, metrics, and prescriptive ways to ensure systems reliability. But not everyone knows the best places to start to implement SRE in their own organizations. Here are our top resources at Google Cloud for getting started.

1. Do you have an SRE team yet? How to start and assess your journey

We’re often asked what implementing SRE means in practice, since our customers face challenges quantifying their success when setting up their own SRE practices. In this post, we share a couple of checklists to be used by members of an organization responsible for any high-reliability services. These will be useful when you’re trying to move your team toward an SRE model. Implementing this model at your organization can benefit both your services and teams through higher service reliability, lower operational cost, and higher-value work for everyone on the team.

2. SRE fundamentals: SLIs, SLAs and SLOs

Core to the definition of SRE is the idea that metrics should be closely tied to business objectives. Thus, a big part of the day-to-day of SREs is establishing and monitoring these service-level metrics. At Google, we use several essential measurements (SLO, SLA and SLI) in SRE planning and practice. This post gives you an overview of what each of these acronyms stands for, what each one means, and how to incorporate them.

3. How SRE teams are organized, and how to get started

You know what SREs do and understand which best practices should be implemented at various levels of SRE maturity. Now you’re ready to take the next step by setting up your own SRE team. In this post, we’ll cover how different implementations of SRE teams establish boundaries to achieve their goals. We describe six different implementations that we’ve experienced, and what we have observed to be their most important pros and cons.

4. Meeting reliability challenges with SRE principles

Through years of work using SRE principles, we’ve found there are a few common challenges that teams face, and some important ways to meet or avoid those challenges. Learn what we at Google think are the three top sources of production stress and how we recommend addressing them.

5. Transitioning a typical engineering ops team into an SRE powerhouse

Perpetually adding engineers to ops teams to meet customer growth doesn’t scale. Google’s SRE principles can help, bringing software engineering solutions to operational problems. In this post, we’ll take a look at how we transformed our global network ops team by abandoning traditional network engineering orthodoxy and replacing it with SRE. You’ll learn how Google’s production networking team tackled this problem and consider how you might incorporate SRE principles in your own organization.

Lots more to read

Can’t wait to read more about SRE? We wrote an entire book on SRE to help you get started (actually, we’ve written more than one). You can also find all our DevOps and SRE blog content or follow our columns on Customer Reliability Engineering.
Source: Google Cloud Platform