DevOpsDays TOKYO 2024 の登壇資料です。 https://confengine.com/conferences/devopsdays-tokyo-2024/proposal/19703/erroralertrunbook-centralized-management-of-erroralertrunbook-to-minimize-operational-costs-using-automated-code-generation
![自動生成を活用した、運用保守コストを抑える Error/Alert/Runbook の一元集約管理 / Centralized management of Error/Alert/Runbook to minimize operational costs using automated code generation](https://cdn-ak-scissors.b.st-hatena.com/image/square/33875fe9c1815c71fe1f83f6c092e33d4f09c8f0/height=288;version=1;width=512/https%3A%2F%2Ffiles.speakerdeck.com%2Fpresentations%2Fdbe26d3ac71e472ea1ed049f6e712ddc%2Fslide_0.jpg%3F29754265)
We encounter a lot of organizations talking about or attempting to implement SRE as part of our consulting at Real Kinetic. We’ve even discussed and debated ourselves, ad nauseam, how we can apply it at our own product company, Witful. There’s a brief, unassuming section in the SRE book tucked away towards the tail end of chapter 32, “The Evolving SRE Engagement Model.” Between the SLIs and SLOs,
How They SRE How They SRE is a curated knowledge repository of Site Reliability Engineering (SRE) best practices, tools, techniques, and culture adopted by leading technology or tech-savvy organizations. Numerous organizations frequently share their insights and expertise, encompassing best practices, tools, and techniques that shape their engineering culture. They do this through various public p
School of SRE Site Reliability Engineers (SREs) sits at the intersection of software engineering and systems engineering. While there are potentially infinite permutations and combinations of how infrastructure and software components can be put together to achieve an objective, focusing on foundational skills allows SREs to work with complex systems and software, regardless of whether these syste
The Art of SLOsは、GoogleのCustomer Reliability Engineeringチームによって開発されたワークショップです。このワークショップの目的は、Googleがサービスの信頼性を計測する方法 サービスレベル指標(SLI) とサービスレベル目標 (SLO)を参加者に紹介し、実際にこれらの計測方法を作成することを体験してもらうことです。これらは重要で土台となる概念です。サービスの信頼性を客観的に測定する方法があれば、サービスの信頼性について有意義な会話をすることがはるかに簡単になります。 ワークショップの理論編では、開発チームと運用チームの間でしばしば生じる組織的な緊張を、サービスの望ましい信頼性を表す目標値を設定することで解決する方法を学びます。また、SLOとエラーバジェットを使って、データ駆動で、客観的、かつユーザー重視の方法でサービスの信頼性を測定・
In cloud operations, we often hear about the benefits of microservices over monolithic architecture. Indeed, microservices help manage hardware being abstracted away and push developers towards resilient, distributed designs. However, many enterprises still have monolithic architectures which they need to maintain. For this post, we’ll use Wikipedia’s definition of a monolith: “A single-tiered sof
How SRE teams are organized, and how to get started At Google, Site Reliability Engineering (SRE) is our practice of continually defining reliability goals, measuring those goals, and working to improve our services as needed. We recently walked you through a guided tour of the SRE workbook. You can think of that guidance as what SRE teams generally do, paired with when the teams tend to perform t
Google has released a new book: The Site Reliability Workbook — Practical Ways to Implement SRE. It's the second book in their SRE series. How is it different than the previous Site Reliability Engineering book? David Rensin, a SRE at Google, says: It's a whole new book. It's designed to sit next to the original on the bookshelf and for folks to bounce between them -- moving between principle and
リリース、障害情報などのサービスのお知らせ
最新の人気エントリーの配信
処理を実行中です
j次のブックマーク
k前のブックマーク
lあとで読む
eコメント一覧を開く
oページを開く