Look out, honey, ’cause I’m using technology
Ain’t got time to make no apology
The Stooges, “Search and Destroy”

For the last year or so, I’ve made a conscious effort to stop apologizing for bugs in my code. Apologizing for bugs is very tempting. I used to do it a lot. When my code was involved in a failure that screwed up a coworker’s day or caused a user-facing problem, I’d say “Whoops! Sorry! I
One of the first jobs I took on at HashiCorp was to create a training document for our corps of Incident Commanders. It was a super interesting task, because it gave me an opportunity to synthesize a whole bunch of thoughts I’ve been exposed to during my many years of responding to incidents. Below is the training document I wrote, mostly unedited. I hope it can be of some use to you, whether you’
Every ops team has some manual procedures that they haven’t gotten around to automating yet. Toil can never be totally eliminated. Very often, the biggest toil center for a team at a growing company will be its procedure for modifying infrastructure or its procedure for provisioning user accounts. Partial instructions for the latter might look like this:
Create an SSH key pair for the user.
Commit