Hot take: the default "we'll just run our own Kubernetes" in startup life stems from the fact that we are afraid of architecture. In the spirit of "nobody ever got fired for using...", we've added this behemoth of complexity and overhead.
In the late 2010s, I made a not insignificant amount of money by helping startups either avoid k8s entirely or move off of it. Maybe it was moving to managed k8s, or maybe it was moving to something more "old skool" like ec2. The deployment process becomes clearer, the development process becomes less hairy, and we don't have to have multiple teams of SREs just trying to keep our infrastructure up.
My hypothesis is that we've gotten here because we think that agile processes mean we don't do quite as much design. In some cases, that's absolutely true. Agile processes help us pivot quickly. However, agile is for the manufacture of software, not its design. We still have to consider architecture, even if only to acknowledge there is nothing novel about our design. Oftentimes, skipping design means that we skip over the one or two _absolutely_ novel things about our software, which leads to problems.

@iamtherockstar I helped move Roblox to a containerized architecture, and I wrote the training program for helping employees spin up new services and deploy them.

Roblox is actually big enough to justify k8s, but we didn't roll it out. Mostly based on the SRE pushback and capacity. But the devs really wanted it.

When I drilled into what the devs really wanted, it was less specifically about k8s and more about the notion that they wanted specific features that were supported by the k8s tooling. Not just orchestration, but things like Istio as well.

Istio and Service Mesh would have made devs' lives far easier. And the Platform team's as well. But it was also "free" to them because managing it becomes an "ops problem".

In our case it was less about architecture and more about moving faster by offloading work.

Ultimately, as you noted, that cost is real. And most orgs really can't staff that. So they just end up handing it off to a cloud.

@gatesvp Also, I can see Roblox actually having a scale issue where k8s is helpful, e.g. on-demand autoscaling, resource management, etc. I suspect that your architecture ended up being more purpose-built for your application(s), which probably meant you spent more time in design than the folks who just take k8s off the shelf and run with it.

If a business evaluates things seriously and goes with k8s, cool. It's the k8s-as-default-infrastructure that is an issue.

@iamtherockstar Oh, we absolutely had the scale to justify such a thing. But we notably made it that far without actually having such a thing.

The nature of my research on that front made it pretty clear that k8s was powerful, but had a high bar for entry. And this research was literally part of my job.

This was back in the 2020-2022 range, and I concluded back then that the vast majority of applications were best served using some form of hosted platform, like Heroku, Azure Serverless, AWS Serverless, etc. There were already so many tools that could run your services for you while offering really high-availability databases/queues/caches.

If I were at a start-up tomorrow, I would be leveraging those as much as possible. Even hosted k8s would be way down on my priority list. There are so many steps on that staircase before k8s.

@gatesvp Just gonna say this here, publicly, for everyone to see: this is the kind of engineering that won't be replaced by AI, the kind of research, the thoughtful conclusion that is less about tech and more about the whole picture. It's disappointing that it's so rare in our field these days, but it's a crucial job and crucial viewpoint that isn't appreciated enough.