
🚀📕 The GPU + Kubernetes book is finally here. After six months of rabbit holes, I now understand why this problem is so hard.

When I started, I thought GPUs were just fancy parallel processors. Mount the device, set some resource limits, and you're done. Then I learned that GPUs can't even pause a running compute kernel. Once computation starts, it runs to completion: no preemption, no time-slicing in the CPU sense, nothing. The hardware was designed this way for maximum throughput, and no amount of software can change it.

This fundamental difference breaks every assumption Kubernetes makes about resources. The Linux kernel sees and controls every CPU cycle and memory page. But GPU operations? They happen in a black box managed by the NVIDIA driver, and the Linux kernel is completely blind to them.

So I wrote this book. Six chapters that trace the problem from hardware to orchestration:

1️⃣ Why containers work beautifully for CPUs (syscalls, cgroups, namespaces) and why GPUs break every one of these assumptions. You'll understand exactly how device plugins trick Kubernetes into accepting GPUs it can't actually manage.

2️⃣ How traditional Kubernetes isolation completely fails for GPUs. When two pods share a GPU, there's no cgroup enforcement, no memory isolation, nothing. One pod can crash everything.

3️⃣ The truth about "GPU sharing" tools. KAI-Scheduler and NVIDIA's "time-slicing" don't share anything; they just orchestrate turn-taking. Your pods still wait in line for exclusive GPU access.

4️⃣ MIG vs HAMi vs vGPU. When you actually need hardware partitioning (spoiler: probably never), and why seven T4s might serve you better than one H100 with MIG.

5️⃣ Why nvidia-smi lies to you, Kubernetes metrics lie differently, and DCGM reveals that 60-70% of your GPU budget is wasted on idle resources.

6️⃣ How to share GPU clusters across teams without namespace chaos. Virtual clusters give each team its own control plane while efficiently sharing the underlying hardware.

Download the free book here: https://ku.bz/gpu-k8s

💡 If you want to go deeper, join me for a live discussion this Wednesday, where I will answer your GPU questions and explain how the book came to be: https://ku.bz/g8gXCKW12