Liz Fong-Jones (方禮真) Jan 25, 2019

But he's an SRE not a historian. His job is to help people make their services more reliable, and help people understand what reliable actually means. Reliability is very important. #devopsdaysNYC

Liz Fong-Jones (方禮真) Jan 25, 2019

Use of the phrase has been dramatically on the rise since the mid-1980s according to the Google Books Ngram project [ed: one of my favorite things to have been an SRE for!]. Why? Because everything is a Service. IaaS, PaaS, DBaaS, etc. #devopsdaysNYC

Liz Fong-Jones (方禮真) Jan 25, 2019

We need a new language around service reliability. What does our stack of reliability primitives look like? First, we have service level indicators that are metrics that define how well a service is operating (e.g. ratio of good events to total events). #devopsdaysNYC

Liz Fong-Jones (方禮真) Jan 25, 2019

It's important to measure from your users' perspective. The SLO sets a threshold on the SLIs. Nothing is ever 100% reliable, so SLOs let us pick a more reasonable number. And finally, the error budget calculates how our SLO has performed over time. #devopsdaysNYC
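
To make the SLI and SLO relationship concrete, here is a minimal sketch in Python. It assumes the SLI is the ratio of good events to total events, as described in the thread; the function names and the 99.9% target are illustrative assumptions, not something from the talk.

```python
def sli(good_events: int, total_events: int) -> float:
    """Ratio of good events to total events.

    Assumption: a window with no traffic is treated as fully good.
    """
    if total_events == 0:
        return 1.0
    return good_events / total_events


def meets_slo(good_events: int, total_events: int, slo_target: float) -> bool:
    """The SLO is a threshold on the SLI: True when the SLI is at or above it."""
    return sli(good_events, total_events) >= slo_target


# Hypothetical numbers: 10,000 requests measured against a 99.9% target.
print(meets_slo(9_995, 10_000, 0.999))  # SLI is 0.9995, so the target is met
print(meets_slo(9_980, 10_000, 0.999))  # SLI is 0.998, so the target is missed
```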

Liz Fong-Jones (方禮真) Jan 25, 2019

e.g. "you can have 43 bad minutes per 30 days" rather than thinking in terms of nines. Finally, the SLA implies that there's some kind of contract or compensation involved. Less important to us as SREs than SLOs. #devopsdaysNYC
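
The "43 bad minutes per 30 days" figure is consistent with a 99.9% SLO over a 30-day window; that target is my assumption, since the thread does not state it. A small sketch of the arithmetic, with hypothetical function names:

```python
def allowed_bad_minutes(slo_target: float, window_days: int = 30) -> float:
    """Error budget in minutes: the fraction of the window allowed to be bad."""
    window_minutes = window_days * 24 * 60  # 30 days is 43,200 minutes
    return (1.0 - slo_target) * window_minutes


def remaining_budget(slo_target: float, bad_minutes: float, window_days: int = 30) -> float:
    """Minutes of budget left; a negative value means the SLO was missed."""
    return allowed_bad_minutes(slo_target, window_days) - bad_minutes


print(round(allowed_bad_minutes(0.999), 1))     # 43.2
print(round(remaining_budget(0.999, 30.0), 1))  # 13.2
```

Expressing the budget in minutes gives teams a number they can spend, which is easier to reason about than a count of nines.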

Liz Fong-Jones (方禮真) Jan 25, 2019

SLIs are really overlooked compared to SLAs, SLOs, and error budgets but we need quality SLIs. #devopsdaysNYC

Liz Fong-Jones (方禮真) Jan 25, 2019

This results in happier users, rather than having your stack crumble and catch on fire. #devopsdaysNYC

Liz Fong-Jones (方禮真) Jan 25, 2019

And hopefully your engineers will be happier too because they'll stop getting paged about things that are not end user experience problems. #devopsdaysNYC

Liz Fong-Jones (方禮真)

And hopefully your product teams and stakeholders will be happier too. Your service has one job: to be dependable. Your users define your reliability and dependability, not your own internal metrics. #devopsdaysNYC

Liz Fong-Jones (方禮真) Jan 25, 2019

How does this actually look? We don't just want to verify the service is running; we have to make sure that it's available to users, performant enough, and returning correct results. #devopsdaysNYC

Liz Fong-Jones (方禮真) Jan 25, 2019

There are a lot of things to care about, so how can we measure only a few things and still cover everything? If you start from the most complex thing (correctness), you automatically get availability etc., because a request can't return a correct result if the service is unavailable. You still have to measure responsiveness separately. #devopsdaysNYC
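
One way to picture "correctness covers availability, responsiveness is separate" is to compute two SLIs from the same request records. This is only a sketch: the `Request` record, its fields, and the 300 ms threshold are hypothetical choices for illustration.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass
class Request:
    correct: bool      # False for wrong answers and for failed/unavailable requests
    latency_ms: float  # time the user waited for the response


def correctness_sli(requests: Sequence[Request]) -> float:
    """Share of requests that returned a correct result (implies availability)."""
    if not requests:
        return 1.0
    return sum(r.correct for r in requests) / len(requests)


def latency_sli(requests: Sequence[Request], threshold_ms: float = 300.0) -> float:
    """Share of requests answered within the latency threshold."""
    if not requests:
        return 1.0
    return sum(r.latency_ms <= threshold_ms for r in requests) / len(requests)


sample = [
    Request(correct=True, latency_ms=120.0),
    Request(correct=True, latency_ms=450.0),   # correct but slow
    Request(correct=False, latency_ms=90.0),   # fast but wrong or failed
    Request(correct=True, latency_ms=200.0),
]
print(correctness_sli(sample))  # 0.75
print(latency_sli(sample))      # 0.75
```

In this sample both SLIs come out at 0.75, but for different requests, which is why the two have to be tracked separately.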

Liz Fong-Jones (方禮真) Jan 25, 2019

If you do this, users will have their experiences measured better, engineers will get paged less, and product teams will have better metrics for their products. #devopsdaysNYC

Liz Fong-Jones (方禮真) Jan 25, 2019

Take an example shopping website service: if we just check whether we can login, that doesn't validate whether people can add items to the cart. You need to look at the *entire* user journey. #devopsdaysNYC
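
A black-box check of the whole journey might be structured roughly like this. It is a sketch under assumptions: the step names and the stand-in lambdas are hypothetical, and a real probe would drive the site the way a user does instead of returning canned values.

```python
from typing import Callable, Optional, Sequence, Tuple

# A journey step is a name plus a check that returns True on success.
Step = Tuple[str, Callable[[], bool]]


def first_failing_step(steps: Sequence[Step]) -> Optional[str]:
    """Run the steps in order; return the first one that fails, or None."""
    for name, check in steps:
        try:
            ok = check()
        except Exception:
            ok = False  # an exception counts as a failed step
        if not ok:
            return name
    return None


# Stand-in checks: login works, but adding to the cart is broken.
journey = [
    ("login", lambda: True),
    ("add_to_cart", lambda: False),
    ("checkout", lambda: True),
]
print(first_failing_step(journey))      # add_to_cart
print(first_failing_step(journey[:1]))  # None: a login-only check misses the problem
```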

Liz Fong-Jones (方禮真) Jan 25, 2019

It absolutely is more work to measure black-box from users' perspectives [ed: or RUM], but you can measure your service the way users experience it. It's worth it. #devopsdaysNYC

Liz Fong-Jones (方禮真) Jan 25, 2019

The things people generally measure today like API uptime, error rates, or database query latency don't tell you anything about whether users can log in, buy things, or search for items. Take a step aside and think from your users' perspective. [fin] #devopsdaysNYC

Liz Fong-Jones (方禮真) Jan 25, 2019

[ed: he was brilliant as always and I miss working with him!] #devopsdaysNYC
