Alright, next up is "Evicted! All the Ways Kubernetes Kills Your Pods (and How To Avoid Them)" by Ahmet Alp Balkan (of kubectx/kubens fame)

#kubecon #kubernetes

We're ignoring application-level errors/crashes. There are 8 different ways to kill a pod (to his knowledge) (but maybe he just learned another).

What's the most important Kubernetes component for reliability? Is it the apiserver? CoreDNS? The kubelet?

....

Nope.

Rather, it's a property: inertia. Objects that are running should stay running.

"Kubernetes was not designed with stateful systems in mind."

Shhhhhh Ahmet you're not supposed to say that part out loud!!!

#kubernetes #kubecon

LinkedIn runs on bare metal, so they care a LOT about not moving things around. You literally see every single failure that happens when a pod gets moved.

There are extremely few knobs in Kubernetes to manage evictions, and most of them are "on/off" knobs; you don't get any fine-grained policies or configs around disruption.

(AND half the core kubernetes controllers ignore the controls that exist in the first place, 😡 )

#kubecon #kubernetes

First way to evict a pod is the pod delete API. Kubernetes didn't terminate your pod, you terminated your pod!

It doesn't do PDB checks!!!! The Deployment controller uses this API, so even rollouts don't check your PDBs!

Second way to evict a pod is the pod eviction API, which does respect PDBs.

This is the last time you will hear about the eviction API. Nobody uses it. Which is a real &$*%ing shame because everything should.

Can you write a webhook for the eviction API?

🤔

We're going to come back to this.
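For reference (my addition, not from the talk): a minimal PDB, and the Eviction object that `kubectl drain` POSTs to the pod's `eviction` subresource, look roughly like this — all names here are made up:

```yaml
# A PodDisruptionBudget: "keep at least 2 pods matching app=web running"
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: web-pdb          # hypothetical name
spec:
  minAvailable: 2
  selector:
    matchLabels:
      app: web
---
# The object the eviction API consumes; kubectl drain POSTs this to
# /api/v1/namespaces/default/pods/web-0/eviction, and the apiserver
# rejects it (HTTP 429) if it would violate the PDB above.
apiVersion: policy/v1
kind: Eviction
metadata:
  name: web-0            # the pod to evict
  namespace: default
```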

Third way to evict a pod: kubelet-initiated evictions. Kubelet is trying to protect the node as well as its own integrity. Actually quite a few ways that it does so.

Node pressure evictions: (disk/memory/inodes/PIDs)

Kubelet starts killing pods before the node/OS/kernel does, hopefully to save things.

Kubelet dgaf about your PDB.

When a pod gets put into a terminated state (e.g. by the kubelet), it's stuck there. Ain't nobody going to restart your pod.

Deployment controller, replicaset controller don't restart your pod, they create a new pod somewhere else.
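For context (my sketch, not the speaker's slides): the thresholds that trigger node-pressure eviction live in the KubeletConfiguration, roughly like this — the values are illustrative:

```yaml
apiVersion: kubelet.config.k8s.io/v1beta1
kind: KubeletConfiguration
# Hard thresholds: kubelet evicts immediately, no grace period.
evictionHard:
  memory.available: "200Mi"
  nodefs.available: "10%"
  nodefs.inodesFree: "5%"
  pid.available: "5%"
# Soft thresholds: eviction only after the matching grace period elapses.
evictionSoft:
  memory.available: "500Mi"
evictionSoftGracePeriod:
  memory.available: "1m30s"
```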

Next is kubelet admission checks: the kubelet will deny pods on its own and transition them to a Failed state.
Restarting the kubelet in place is unsafe, because it can start failing pods due to state mismatch. But the kubelet can also just crash, lose its own state, and bring down your website.

Next mode is kubelet local storage evictions, a rather new feature in Kubernetes: if you set ephemeral storage constraints on your pods and a pod exceeds them, the kubelet will evict it.

Again, no PDBs here.

Kubelet is the honey badger of Kubernetes. It DGAF.
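Concretely (my example, hypothetical names): the constraint in question is the pod's `ephemeral-storage` request/limit — write past the limit and the kubelet evicts the pod, PDB or not:

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: logger                        # hypothetical
spec:
  containers:
  - name: app
    image: example.com/app:latest     # placeholder image
    resources:
      requests:
        ephemeral-storage: "1Gi"
      limits:
        ephemeral-storage: "2Gi"      # exceed this -> kubelet evicts the pod
```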

#kubernetes #kubelet #kubecon

Next eviction path is the scheduler: if there's no room in the cluster, it will evict lower-priority pods to make room for higher-priority pods.

It DOES actually take PDBs into account, but it's BEST EFFORT. It will still preempt lower-priority pods even if it would violate a PDB.
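To make the priority mechanics concrete (my sketch): priorities come from PriorityClass objects. You can opt a class out of preempting *others*, but nothing opts your pod out of *being* preempted except having a higher priority:

```yaml
apiVersion: scheduling.k8s.io/v1
kind: PriorityClass
metadata:
  name: important-web        # hypothetical
value: 100000                # higher value wins at scheduling time
globalDefault: false
description: "May preempt lower-priority pods to get scheduled."
---
apiVersion: scheduling.k8s.io/v1
kind: PriorityClass
metadata:
  name: polite-batch         # hypothetical
value: 1000
preemptionPolicy: Never      # never preempts others; can still BE preempted
globalDefault: false
```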

Next eviction path is taint-based eviction: the dreaded NoExecute taint (the unreachable and not-ready taints). After the toleration period expires, if your pod doesn't tolerate the node's taints, it gets evicted.
You can actually do something about this now: taint eviction (the old TaintManager) has been split out of the node lifecycle controller into its own taint-eviction controller.
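The knob here (my sketch, not from the slides) is `tolerationSeconds` on the built-in NoExecute taints — how long your pod rides out an unreachable or not-ready node before taint eviction kicks in:

```yaml
# In a pod spec: tolerate the built-in NoExecute taints for 10 minutes
# instead of the 5-minute default that the DefaultTolerationSeconds
# admission plugin injects.
tolerations:
- key: node.kubernetes.io/unreachable
  operator: Exists
  effect: NoExecute
  tolerationSeconds: 600
- key: node.kubernetes.io/not-ready
  operator: Exists
  effect: NoExecute
  tolerationSeconds: 600
```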

"Who here has accidentally deleted all your pods?"

drmorr raises his hand.

PodGC controller is the next eviction path: the pods are orphaned and something needs to clean them up.

Example: if you delete the node object (not the physical node, but the node object in etcd), the pods become orphaned and the podgc controller will clean up the pods (even if they're still actually running).

There's a new KEP-4563 that lets you intercept some of these eviction paths.

Actions you can take:

- look into kubelet eviction threshold settings
- disaster recovery drills
- tolerations for stateful apps
- admission controls for evictions
- understand what happens when a pod fails
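On the "admission controls for evictions" point — and the earlier webhook question — the eviction API does go through admission, because an eviction is a CREATE on the pod's `eviction` subresource. A validating webhook registration could look roughly like this (all names hypothetical):

```yaml
apiVersion: admissionregistration.k8s.io/v1
kind: ValidatingWebhookConfiguration
metadata:
  name: eviction-guard                 # hypothetical
webhooks:
- name: eviction-guard.example.com     # hypothetical
  admissionReviewVersions: ["v1"]
  sideEffects: None
  failurePolicy: Ignore                # don't block drains if the webhook is down
  rules:
  - apiGroups: [""]
    apiVersions: ["v1"]
    operations: ["CREATE"]             # evictions are CREATEs on a subresource
    resources: ["pods/eviction"]
  clientConfig:
    service:
      namespace: kube-system
      name: eviction-guard
      path: /validate
```

Note this only catches the eviction API path — it does nothing about plain pod deletes or kubelet-initiated kills, which is exactly the thread's complaint.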

@drmorr I am not sure, but I think that the information is not accurate and therefore misleading.
We run >1000 Postgres clusters in Kubernetes. We use AWS as the instance provider, so it's a bit different, but not much from the Kubernetes PoV.
We use PDBs, taints, and drain or eviction.
We have a global controller that executes cluster updates and a DaemonSet that handles node readiness and termination.
My colleague talked about how we execute updates. It's for sure a bit outdated, but the core concept is still running.
https://youtu.be/1xHmCrd8Qn8?si=W07S8rak20kyywW5
Continuously Deliver your Kubernetes Infrastructure - Mikkel Larsen, Zalando SE

@sszuecs

> the information is not accurate and therefore misleading.

Sorry, what information do you think is inaccurate? I can confirm that all of the things I mentioned in the thread are in fact ways that kubernetes might evict a pod.

@drmorr an easy one to explain: it was mentioned that Kubernetes is not made for stateful applications because pods get a new name.
This is not true for StatefulSet pods.
Of course, you likely need a controller to get real advantages from running stateful applications in Kubernetes. So I didn't want to say it's wrong, but it's also not completely true.