Let's continue the Proxmox + Tofu + Talos + Cilium adventure, with two little footnotes. "The devil is in the details!"

First: Talos "inlineManifests" behavior.

When you add some inlineManifests to your Talos MachineConfig and push that MachineConfig, the manifests get applied immediately. Yay!

However, when you update or remove some inlineManifests and push the MachineConfig ... Nothing happens. Talos does a full (potentially destructive!) reconcile only when executing a cluster upgrade. (This is pretty well explained in the Talos docs[1].)

This means that our initial installation of Cilium will work immediately, but subsequent configuration changes won't take effect (the YAML won't be applied) until we run a "talosctl upgrade-k8s". (Pro-tip: make sure to specify "--to" with the current k8s version; otherwise it'll execute a "real" upgrade, which implies downloading new images and restarting the whole control plane one component at a time - which takes a while.)

So, are we there yet?

Not quite!

The second issue: each time I'd run a "tofu plan", it would tell me that something had changed, which is kind of annoying. If you don't change your Tofu configuration, variables, etc., you'd normally expect "tofu plan" to reassure you with:

No changes. Your infrastructure matches the configuration.

So, what is going on? 🤔

[1] https://docs.siderolabs.com/kubernetes-guides/advanced-guides/inlinemanifests#how-talos-handles-manifest-resources

#terraform #talos #opentofu #homelab #kubernetes #cilium


Also, I want the K8S cluster to support IPv6, which meant replacing Talos' default CNI (Flannel) with Cilium.

(OK, it might be possible to support IPv6 with Flannel on Talos, but the Talos docs say very little about how to customize Flannel, and I wanted Cilium for other reasons too - e.g. LoadBalancer support with L2 announcements, replacing kube-proxy...)

This means declaring "cni: none" in the Talos machine config, and then either:

1) manually installing Cilium after provisioning the cluster

2) finding a way to automatically install Cilium when the cluster is provisioned.

Of course I went for option 2, right :-)
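For reference, the "cni: none" part is just a machine config patch. Here's a minimal sketch in Tofu (the field names follow the Talos machine config schema; the kube-proxy line is optional and my own addition, so adapt to your setup):

```hcl
# Sketch: a Talos machine config patch that disables the default CNI.
# (Illustrative only; wire it into your own config_patches plumbing.)
locals {
  no_default_cni_patch = yamlencode({
    cluster = {
      network = {
        cni = {
          name = "none" # don't deploy Flannel; we'll bring Cilium ourselves
        }
      }
      # Optional: disable kube-proxy if you enable Cilium's replacement.
      proxy = {
        disabled = true
      }
    }
  })
}
```

This local can then be passed in the config_patches list of the talos_machine_configuration data source.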

Which leads us to a rabbit hole of multiple options:

1) wait for the cluster to be up (=K8S API is functional) and then use the Helm provider to create a helm_release resource on the cluster

Problem: there is no easy and clean way to wait for the cluster to be up.

Talos has a talos_cluster_health resource, but this one waits for all nodes to be "Ready", which isn't going to happen since the CNI hasn't been deployed yet. (There is a skip_kubernetes_checks option but it doesn't seem to help.)

Declaring something like a kubernetes_nodes resource in Tofu sort of works ... until you reprovision the cluster. Then you realize that you can't even do a "tofu plan", because Tofu tries to refresh that resource's status, which requires the cluster to be up. So, this is a non-starter.
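For the record, the health-check attempt looked roughly like this (a sketch; argument names follow the siderolabs/talos provider, so double-check them against your provider version):

```hcl
# Sketch: wait for cluster health before installing Cilium via helm_release.
# On a CNI-less cluster this blocks forever, because nodes never become Ready.
data "talos_cluster_health" "this" {
  client_configuration = talos_machine_secrets.this.client_configuration
  endpoints            = ["10.0.0.10"] # illustrative IPs
  control_plane_nodes  = ["10.0.0.10"]

  # In theory this should relax the Ready check, but it didn't seem to help:
  # skip_kubernetes_checks = true
}
```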

2) use Talos' "inlineManifests" feature, which instructs Talos to apply a bunch of YAML to the cluster when it's provisioned

Problem: this requires Cilium's YAML manifests, and the way I typically install Cilium is with its Helm chart.

Solution: use a helm_template data source to do the equivalent of the "helm template" command, and render the Cilium chart into ready-to-apply YAML manifests.

Next problem: the Cilium Helm chart is very sophisticated and depends on Capabilities.KubeVersion - in other words, when we invoke the helm_template data source, we need to pass it the correct kube_version.

Next solution: that version is available in talos_machine_configuration resources.
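Putting the pieces together, a sketch of the whole chain (chart versions, IPs and names are illustrative; the "set" block syntax follows Helm provider v2; and in this sketch the Kubernetes version lives in a shared local, to avoid a dependency cycle between the two data sources):

```hcl
locals {
  kubernetes_version = "1.31.0" # illustrative; keep consistent everywhere
}

# Render the Cilium chart to plain YAML, like "helm template" would.
data "helm_template" "cilium" {
  name       = "cilium"
  namespace  = "kube-system"
  repository = "https://helm.cilium.io"
  chart      = "cilium"
  version    = "1.16.0" # illustrative chart version

  # The chart checks Capabilities.KubeVersion, so render against the
  # Kubernetes version the cluster will actually run.
  kube_version = local.kubernetes_version

  set {
    name  = "ipam.mode"
    value = "kubernetes"
  }
}

# Embed the rendered YAML as an inline manifest in the machine config.
data "talos_machine_configuration" "controlplane" {
  cluster_name       = "homelab" # illustrative
  cluster_endpoint   = "https://10.0.0.10:6443"
  machine_type       = "controlplane"
  machine_secrets    = talos_machine_secrets.this.machine_secrets
  kubernetes_version = local.kubernetes_version

  config_patches = [
    yamlencode({
      cluster = {
        inlineManifests = [{
          name     = "cilium"
          contents = data.helm_template.cilium.manifest
        }]
      }
    }),
  ]
}
```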

And with that (and a good amount of Cilium configuration!) our cluster comes up fully functional!

#kubernetes #talos #proxmox #cilium #opentofu

Stop wasting hours hardening Linux for Kubernetes. 🛑

Running K8s on Ubuntu means battling OS patches and config drift. Plus, shared cloud VMs throttle your I/O.

Move to Immutable Bare Metal:
✅ Talos Linux (No SSH, purely API-driven)
✅ 3-Node HA & strict etcd quorum
✅ Cilium eBPF native L2 routing

Ditch the hypervisor tax. ⚡
🔗 https://www.servermo.com/howto/deploy-talos-linux-kubernetes-bare-metal/

#Kubernetes #TalosLinux #BareMetal #DevOps #eBPF #Cilium #Linux

I need to move off ingress-nginx because it's mothballed. I'm already using #Cilium for CNI, so I figured I'd switch to that.

Yesterday I discovered that Cilium's Gateway API implementation doesn't play nice with MetalLB. Since Cilium's BGP control plane can perform the same function, I figured I might as well simplify a bit and replace MetalLB, so I did that this evening. It went pretty smoothly (except for a *weird* choice they made in their API wrt selectors, but at least it's documented), so I went back to the service I was experimentally moving to the Gateway API.

But it turns out it doesn't support filters yet, so I can't migrate the integrated authentication 😭 It looks like that's been on the todo list for a few years without moving, so it's a tossup now whether to wait a couple of releases to see if it shows up, or use something else. #kubernetes

My #homelab #k3s cluster made further progress: it now runs #cilium for networking.

Why? Because I like Cilium and eBPF.

Compared to my full-blown #k8s cluster on 10 bare-metal Supermicro systems, this was tremendously easier. In total I spent about two hours getting k3s and Cilium to play nice.

All of this is Ansible-based: I am reusing the official k3s orchestration, plus a personalized Cilium role.

Dabbled with enabling #IPv6 on my #cilium based #k3s cluster this morning. Seems that it /is/ possible to enable without a full cluster/node rebuild*.

Mostly went fine: prefix and prefix mask set, masq set to off. After poking a couple of the Cilium pods, new pods got an IPv6 addr ... but couldn't ping anything. Traffic made it out, based on what Hubble was showing, but not the reverse.

Enabled v6 masquerading, and it all started to work, yay. Suspect I'd need to set up a static route on my router for it to work without masquerading.

I have a couple of pods with quirky networking, so they got unhappy: v6 IP assigned, DNS queries replying with AAAA records, but no dice, as they really only have v4 connectivity.

Back off for now but promising that it could work.

*.spec.podCIDR(s) are immutable on v1.Node resources, but Cilium in its default configuration doesn't get its pod CIDRs from there.


An experiment – Enable Cilium native routing on Azure Kubernetes Service BYOCNI – Part 3 https://www.danielstechblog.io/an-experiment-enable-cilium-native-routing-on-azure-kubernetes-service-byocni-part-3/ #Azure #AKS #Kubernetes #Cilium

Cilium deprecated external workloads? Deploy HAProxy Ingress in a DMZ w/ BGP+BIRD. Pod CIDR export, firewalld hardening, AlmaLinux-ready. Secure & tested! 👇

https://devopstales.github.io/kubernetes/k8s-dmz-bgp-external-haproxy/

#Kubernetes #BGP #HAProxy #NetworkSecurity #DevOps #Cilium

An experiment – Enable Cilium native routing on Azure Kubernetes Service BYOCNI – Part 2 https://www.danielstechblog.io/an-experiment-enable-cilium-native-routing-on-azure-kubernetes-service-byocni-part-2/ #Azure #AKS #Kubernetes #Cilium