I've been a little rough and irresponsible with my #baremetal #Kubernetes cluster, especially when it comes to randomly rebooting nodes. Today I fixed that.
I'm running a bunch of somewhat delicate workloads, including database clusters with CSIs like #Longhorn and #OpenEBS. Checking that everything is in working order has been a demanding task, and often one I've skipped before rebooting or upgrading nodes - occasionally with horrific results.
Last night I finally took the time and wrote a pretty thorough script that checks that everything is working and healthy, before politely cordoning off a node, draining it and applying upgrades.
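For anyone curious what that flow looks like, here is a minimal sketch in Python. It is not my actual script: the function names (`unhealthy_nodes`, `upgrade_plan`, `safe_upgrade`), the SSH-based apt upgrade step and the timeout value are all assumptions made for illustration. The only health check shown is the node `Ready` condition, where a real check would also cover storage volumes and database replicas. Rebooting and waiting for the node to come back are left out.

```python
import json
import subprocess


def unhealthy_nodes(node_list: dict) -> list[str]:
    """Return the names of nodes whose Ready condition is not "True".

    `node_list` is the parsed output of `kubectl get nodes -o json`.
    """
    bad = []
    for node in node_list.get("items", []):
        conditions = node.get("status", {}).get("conditions", [])
        ready = any(
            c.get("type") == "Ready" and c.get("status") == "True"
            for c in conditions
        )
        if not ready:
            bad.append(node["metadata"]["name"])
    return bad


def upgrade_plan(node: str) -> list[list[str]]:
    """Build the ordered commands for one node (hypothetical plan)."""
    return [
        # Stop new pods from being scheduled onto the node.
        ["kubectl", "cordon", node],
        # Evict running pods; DaemonSet pods are left in place.
        ["kubectl", "drain", node, "--ignore-daemonsets",
         "--delete-emptydir-data", "--timeout=600s"],
        # Assumption: the node is reachable over SSH under its node name.
        ["ssh", node, "sudo apt-get update && sudo apt-get -y full-upgrade"],
        # Allow scheduling again once the upgrade is done.
        ["kubectl", "uncordon", node],
    ]


def safe_upgrade(node: str, run=subprocess.run) -> None:
    """Refuse to touch `node` unless every node in the cluster is Ready."""
    result = run(
        ["kubectl", "get", "nodes", "-o", "json"],
        check=True, capture_output=True, text=True,
    )
    bad = unhealthy_nodes(json.loads(result.stdout))
    if bad:
        raise RuntimeError(f"refusing to upgrade, unhealthy nodes: {bad}")
    for cmd in upgrade_plan(node):
        run(cmd, check=True)  # abort the sequence on the first failure
```

The `run` parameter is there so the sequence can be dry-run with a stub instead of touching a live cluster. Calling `safe_upgrade` for one node at a time keeps at most one node drained at any moment.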
I felt so confident today that I tested it by running this new safe upgrade script for all the nodes in the cluster - and it worked! All nodes are now fully upgraded and running kernel 6.12.73 on Debian 13.
This also fixes the outstanding issue caused by #Hetzner no longer supporting obtaining IP addresses through DHCP.

