We’re excited to welcome Lucerne University of Applied Sciences and Arts (HSLU) as a Bronze Sponsor of DevOpsDays Zürich 2026!

HSLU stands for practice-driven education and applied research, exactly the mindset our community thrives on: learning by doing, sharing real stories, and building what works in the field.

Thanks, HSLU, we’re proud to have you with us!

#DevOpsDays #DevOpsDaysZurich #DevOps #PlatformEngineering #SRE

Romano Roth 19h ago

Just wrapped reviewing #DevOpsDays Zurich 2026 proposals 🤯

387 talks
47 workshops
28 ignites

That’s a tidal wave of ideas from our community 🌊🙌 Huge thanks to everyone who submitted!

Program drops soon — stay tuned 👀

#DevOpsDaysZurich #DevOps #PlatformEngineering #SRE

strickvl 1d ago

If you're building this right now, I'd sanity-check 3 questions:

Can jobs queue (not crash) when capacity is short? Adept hit this gap with orchestration vs Slurm.
Do you measure GPU busy vs allocated, not just "utilisation"?
Is preemption a documented promise users can plan around?

GPU scheduling is a governance system pretending to be a data structure.

#MLOps #PlatformEngineering #MLInfrastructure #GPU
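
One way to picture the "busy vs allocated" question is the rough sketch below. It assumes you can already collect one record per GPU per sampling interval. The `GpuSample` record and the `summarize` helper are hypothetical names made up for illustration, not part of Slurm or any other scheduler's API.

```python
from dataclasses import dataclass


@dataclass
class GpuSample:
    """One sampling interval for one GPU (hypothetical record shape)."""
    allocated: bool  # a job currently holds this GPU
    busy: bool       # the GPU was actually executing work


def summarize(samples):
    """Return allocated, busy, and busy-given-allocated fractions."""
    total = len(samples)
    if total == 0:
        return {"allocated": 0.0, "busy": 0.0, "busy_given_allocated": 0.0}
    allocated = sum(s.allocated for s in samples)
    # Count a GPU as busy only if it is also allocated.
    busy = sum(s.allocated and s.busy for s in samples)
    return {
        "allocated": allocated / total,
        "busy": busy / total,
        "busy_given_allocated": busy / allocated if allocated else 0.0,
    }


# Illustrative data: three GPUs held by jobs, only one doing work.
samples = [
    GpuSample(allocated=True, busy=True),
    GpuSample(allocated=True, busy=False),
    GpuSample(allocated=True, busy=False),
    GpuSample(allocated=False, busy=False),
]
report = summarize(samples)
```

In this made-up sample the pool looks 75% "utilised" by allocation, while only a third of the allocated GPUs are doing work. A single utilisation number hides that gap.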

strickvl 2d ago

The solution class isn't "pick the perfect scheduler."

It's to make the allocation model legible:
→ Explicit baselines (quotas) so planning is possible
→ Borrowing of idle capacity so utilisation doesn't tank
→ Priority tiers with preemption contracts
→ Shared unit economics so finance and engineering argue from the same facts

Priority queues work when people believe the system is fair.
That belief is governance. The scheduler just enforces it.

#MLOps #PlatformEngineering #FinOps
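
The first three arrows in the post above could be sketched as a toy model, under heavy assumptions: a fixed pool whose capacity equals the sum of team quotas, and just two tiers (guaranteed work within quota, and preemptible borrowed work beyond it). The `Team` and `Pool` classes are hypothetical names for illustration and do not mirror any real scheduler.

```python
from dataclasses import dataclass


@dataclass
class Team:
    quota: int           # explicit baseline: GPUs this team can always get
    guaranteed: int = 0  # GPUs in use within quota
    borrowed: int = 0    # GPUs in use beyond quota (preemptible)


class Pool:
    def __init__(self, teams):
        self.teams = teams
        # Assumption: capacity is exactly the sum of all quotas.
        self.capacity = sum(t.quota for t in teams.values())

    def idle(self):
        used = sum(t.guaranteed + t.borrowed for t in self.teams.values())
        return self.capacity - used

    def request(self, name, n):
        """Grant up to n GPUs to a team; return how many were granted."""
        team = self.teams[name]
        granted = 0
        for _ in range(n):
            within_quota = team.guaranteed < team.quota
            if self.idle() == 0:
                # Preemption contract: only in-quota requests may
                # reclaim GPUs that another team has borrowed.
                if not within_quota or not self._preempt_one(exclude=name):
                    break
            if within_quota:
                team.guaranteed += 1
            else:
                team.borrowed += 1
            granted += 1
        return granted

    def _preempt_one(self, exclude):
        for other_name, other in self.teams.items():
            if other_name != exclude and other.borrowed > 0:
                other.borrowed -= 1
                return True
        return False


teams = {"a": Team(quota=4), "b": Team(quota=4)}
pool = Pool(teams)
got_a = pool.request("a", 6)      # 4 guaranteed + 2 borrowed from idle
got_b = pool.request("b", 4)      # reclaims the 2 GPUs that "a" borrowed
extra_b = pool.request("b", 1)    # nothing idle and over quota: not granted
```

A request that comes back short is where a real system would queue the job rather than fail it. The sketch leaves that part out, along with releasing GPUs.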

strickvl 3d ago

Each phase optimises for something different. The Wild West optimises for local speed. Static quotas optimise for local safety. Flexible borrowing optimises for global throughput and, maybe more importantly, legitimacy. (Everyone understands the rules.)

I'll be writing more about queuing and priority over the next couple of weeks. There's a lot to unpack here.

#MLOps #MachineLearning #PlatformEngineering #GPU

strickvl 4d ago

If you're in one of these roles: you're doing something undeservedly difficult. The technical complexity alone is immense. Doing it while navigating organisational politics, competing priorities, and limited recognition? That takes something special.

Hats off to you. Your work matters... even when nobody says it!

#MLOps #PlatformEngineering #MachineLearning

Sidero Labs 4d ago

New in Talos: Multi-doc config for networking. No more bricking a remote server just because there’s a single typo in a 200-line configuration file.

→ https://www.siderolabs.com/blog/talos-omni-q4-2025-updates?utm_source=mastodon&utm_medium=social&utm_campaign=q4-2025&utm_content=networking

#K8s #BareMetal #PlatformEngineering #DevOps #SRE

From staged networking to cluster imports: What's new in Q4 2025

Talos Linux and Omni were purpose-built to solve the difficulty of bare metal. Our latest updates replace the high-stakes risk of physical deployments with cloud-like automated guardrails, enabling staged provisioning, seamless remote management, and stability rooted in automated governance rather than reactive troubleshooting. The result is infrastructure that is self-healing, immutable, and resistant to human […]

Sidero Labs

The Linux Foundation 5d ago

🛠️ Take your LLM skills to production! Join Rust for Interfacing with Language Models (LFWS309) and build real-world LLM apps in Rust with hands-on labs and expert guidance.

Focus on what matters in production:
🔹 Type-safe integrations
🔹 Streaming responses
🔹 Structured outputs
🔹 RAG pipelines
🔹 Agent workflows

Next session starts soon — enroll now: https://training.linuxfoundation.org/training/rust-for-interfacing-with-language-models-lfws309/

#Rust #AIEngineering #LLMs #PlatformEngineering

Rust for Interfacing with Language Models (LFWS309) | Linux Foundation Education

Create real-world LLM applications in Rust with the PAIML stack, optimized for performance, scalability, and reliability.

Linux Foundation - Education

The Linux Foundation 5d ago

CNCF survey data shows cloud native technologies established as foundational infrastructure, with Kubernetes widely used in production.
Operational maturity increasingly shapes how organizations run AI workloads.

Watch the video highlights for key findings from the data and learn more at https://www.linuxfoundation.org/research/cncf-2025-annual-survey

#CloudNative #OpenSource #Kubernetes #CNCF #PlatformEngineering #DevOps #Infrastructure #AI

Mark Gardner Jan 31

At some point I should write up on https://PhoenixTrap.com a segmented walkthrough of my #SelfHosted setup, including the things that the #Mastodon documentation leaves out about data retention and backups.

Someday.

#SelfHosting #MastoAdmin #FediAdmin #DevOps #PlatformEngineering

The Phoenix Trap: Code, music, philosophy, etc.

Personal blog of Mark Gardner. I help Perl developers build modern, disciplined applications by writing easy-to-maintain code with confidence.

The Phoenix Trap