terra tauri

Staff Engineer, Platform Network @ Grafana Labs

🏳️‍⚧️👩🏼‍🌾👩🏼‍💻🏳️‍🌈

The talk for KubeCon EU was accepted! I'm going to London in April!

Here's another good one:

Platforms create their value through leverage, and one aspect of leverage is efficiency—supporting substantially more scale without needing to hire more people into the platform team. However, as this chapter’s introductory quote suggests, this is in conflict with the fact that systems often run into new problems just because of scale, particularly operationally. This means constant-sized teams supporting scaling platforms can wind up in “operational hell,” where neglected operational problems start having ongoing acute business impact, eroding customer trust. As the system is handling critical load at scale, it can take months to remediate the acute impact and years to address the core issues, and all the while new product features are stalled. To avoid this, platform teams need to routinely invest in operational practices, even when times are good.

A lot of fascinating insights into how companies should be hiring and promoting in here. It will be interesting to see what people end up discussing in this section.

Reading Platform Engineering again and I came across this interesting quote:

If you only promote people who solve big technical problems, you’re going to have a hard time retaining the people who do the work to smooth out the usability edges, actively listen to the customer teams, and adjust their work priorities to fix the stuff that is causing the most pain. So, look closely at what you are celebrating, compensating, and promoting, and make sure you are including work that makes the product better, whatever that looks like, even if it isn’t the hardest technical bits. You may even want to reevaluate your engineering ladder to make sure the expectations at each level reflect all of the skills you now demand. Remember, this is a cultural change, and cultural changes that don’t involve changes to what is valued (as seen by what you recognize and reward) are destined to fail.

Might be something that people find interesting for no particular reason.

ha the first chapter is like "maybe skip this if ur already a platform engineer", nice.

I love fundamentals, though. I watched old SICP lectures every year for the first 6-7 years of my career to keep my fundamentals sharp.

Fundamentals are so important and far-reaching, so I really like to revisit them and keep them sharp.

Oh good, there's a "how to read this book" section and the reading schedule I arbitrarily decided just happened to line up with what is recommended. 😅

I love when books have this; especially when I'm doing a book club. It's nice to have a "you should understand TOPIC at a high level by the time you're done."

@kelseyhightower is a gem and I love his quote in the praise section.

I like the framing of platform engineering as being somewhere between the chaos of a free-for-all (no-ops) and a fully centralized ops team.

I tend to think of it as a different way of offering centralized tooling. It's not just the way the platform team works, it's also about how the organization operates around the team.

I am thankful to work in an organization that empowers teams, so our platform engineering team works very closely to how @TeamTopologies describes a platform team.

So far, so good! I'm excited to read more. 😽

I am super excited to read the first two chapters of Platform Engineering by @skamille.

I'll be leading a little book club at work this week. Tonight is the last night I can reasonably prepare, so I'll be live tooting my hot takes as I read through the first two chapters to prepare myself to facilitate discussions.

Platform Engineering

Until recently, infrastructure was the backbone of organizations operating software they developed in-house. But now that cloud vendors run the computers, companies can finally bring the benefits of agile custom-centricity … - Selection from Platform Engineering [Book]

O’Reilly Online Learning

I have been asking cloud providers for support on why nodes died in strange ways for over half a decade.

I never WANT to ask this, but sometimes it must be asked.

I have tried:
- asking nicely
- providing evidence of the failure
- pleading a case for why redundancy in the scenario is not possible

... and nothing ever really worked. I would always get hand-wavy answers.

This week, I learned the magic spell: "this ticket will remain open until we have a root cause analysis and a list of steps that will be taken to prevent it in the future."