[B! sre] daikikohara's bookmarks

Centralized management of Error/Alert/Runbook to minimize operational costs using automated code generation

These are the slides from a talk at DevOpsDays TOKYO 2024. https://confengine.com/conferences/devopsdays-tokyo-2024/proposal/19703/erroralertrunbook-centralized-management-of-erroralertrunbook-to-minimize-operational-costs-using-automated-code-generation

daikikohara 2024/04/18

GitHub - OneUptime/oneuptime: OneUptime is an open-source complete observability platform.

daikikohara 2023/01/20

https://services.google.com/fh/files/misc/state-of-devops-2021.pdf

daikikohara 2021/11/27

SRE Doesn’t Scale

We encounter a lot of organizations talking about or attempting to implement SRE as part of our consulting at Real Kinetic. We’ve even discussed and debated ourselves, ad nauseam, how we can apply it at our own product company, Witful. There’s a brief, unassuming section in the SRE book tucked away towards the tail end of chapter 32, “The Evolving SRE Engagement Model.” Between the SLIs and SLOs,

daikikohara 2021/10/08

sre

Actuating Google Production: How Google’s Site Reliability Engineering Team Uses Go

daikikohara 2021/04/14

GitHub - upgundecha/howtheysre: A curated collection of publicly available resources on how technology and tech-savvy organizations around the world practice Site Reliability Engineering (SRE)

How They SRE How They SRE is a curated knowledge repository of Site Reliability Engineering (SRE) best practices, tools, techniques, and culture adopted by leading technology or tech-savvy organizations. Numerous organizations frequently share their insights and expertise, encompassing best practices, tools, and techniques that shape their engineering culture. They do this through various public p

daikikohara 2021/02/16

An awesome-list style repository for SRE. It's great that the resources are organized by company.

sre
awesome

School Of SRE

School of SRE Site Reliability Engineers (SREs) sit at the intersection of software engineering and systems engineering. While there are potentially infinite permutations and combinations of how infrastructure and software components can be put together to achieve an objective, focusing on foundational skills allows SREs to work with complex systems and software, regardless of whether these syste

daikikohara 2020/12/07

SRE principles in practice for business continuity | Google Cloud Blog

daikikohara 2020/06/12

sre
google

SRE Classroom: The Art of SLOs - Google

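The Art of SLOs is a workshop developed by Google's Customer Reliability Engineering team. Its goal is to introduce participants to how Google measures service reliability, namely service level indicators (SLIs) and service level objectives (SLOs), and to give them hands-on experience creating these measurements. These are important, foundational concepts: once you have an objective way to measure service reliability, it becomes far easier to have meaningful conversations about it. In the theory part of the workshop, participants learn how to resolve the organizational tension that often arises between development and operations teams by setting a target value that represents the desired reliability of a service. They also learn how to use SLOs and error budgets to take a data-driven, objective, and user-focused approach to measuring and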

daikikohara 2020/03/26
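The workshop description above centers on SLIs, SLOs, and error budgets. As a minimal sketch of how these relate, assuming a request-based SLI (good events divided by total events over a fixed window), the snippet below derives the error budget from an SLO target and reports how much of it remains. The function `evaluate_slo` and the `SloReport` fields are hypothetical names for illustration and are not taken from the workshop materials.

```python
from dataclasses import dataclass


@dataclass
class SloReport:
    sli: float               # observed fraction of good events in the window
    budget_events: float     # bad events the SLO allows in the window (the error budget)
    bad_events: int          # bad events actually observed
    budget_remaining: float  # fraction of the error budget left; negative means the SLO was missed


def evaluate_slo(good_events: int, total_events: int, slo_target: float) -> SloReport:
    """Hypothetical helper: evaluate a request-based SLI against an SLO target."""
    if total_events <= 0:
        raise ValueError("total_events must be positive")
    if not 0.0 < slo_target < 1.0:
        raise ValueError("slo_target must be strictly between 0 and 1")
    if not 0 <= good_events <= total_events:
        raise ValueError("good_events must be between 0 and total_events")

    sli = good_events / total_events
    bad_events = total_events - good_events
    # The error budget is the share of events the SLO permits to fail.
    budget_events = (1.0 - slo_target) * total_events
    budget_remaining = 1.0 - bad_events / budget_events
    return SloReport(sli, budget_events, bad_events, budget_remaining)


if __name__ == "__main__":
    # Assumed example: 1,000,000 requests, 500 failures, against a 99.9% SLO.
    report = evaluate_slo(good_events=999_500, total_events=1_000_000, slo_target=0.999)
    print(f"SLI: {report.sli:.4%}, budget remaining: {report.budget_remaining:.0%}")
```

With a 99.9% target, 1,000 failed requests are allowed per million, so 500 failures consume half the budget. Expressing the budget as a remaining fraction is one way to make the "objective conversation" between development and operations concrete: a negative value signals the SLO has been missed for the window.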

SRE for single-tiered software applications | Google Cloud Blog

In cloud operations, we often hear about the benefits of microservices over monolithic architecture. Indeed, microservices help manage hardware being abstracted away and push developers towards resilient, distributed designs. However, many enterprises still have monolithic architectures which they need to maintain. For this post, we’ll use Wikipedia’s definition of a monolith: “A single-tiered sof

daikikohara 2020/02/24

sre
google

Incident Management in the Age of DevOps & SRE

daikikohara 2020/01/28

incident
sre

[SRE Next 2020] Roundup of presentation slides - Qiita

daikikohara 2020/01/25

https://cloudskills.fm/058

daikikohara 2020/01/16

SRE at Google: How to structure your SRE team | Google Cloud Blog

How SRE teams are organized, and how to get started At Google, Site Reliability Engineering (SRE) is our practice of continually defining reliability goals, measuring those goals, and working to improve our services as needed. We recently walked you through a guided tour of the SRE workbook. You can think of that guidance as what SRE teams generally do, paired with when the teams tend to perform t

daikikohara 2020/01/06

sre

DEV, meet Site Reliability Engineering

daikikohara 2019/10/23

sre
dev.to

The service mesh era: Using Istio and Stackdriver to build an SRE service | Google Cloud Blog

daikikohara 2019/03/07

istio
sre

File not found - Google

File not found - 404 The document you requested does not exist, or is unavailable at this time. Please check the URL for typos.

daikikohara 2018/07/26

Superb.

Google's New Book: The Site Reliability Workbook - High Scalability -

Google has released a new book: The Site Reliability Workbook — Practical Ways to Implement SRE. It's the second book in their SRE series. How is it different than the previous Site Reliability Engineering book? David Rensin, an SRE at Google, says: It's a whole new book. It's designed to sit next to the original on the bookshelf and for folks to bounce between them -- moving between principle and