Also, I wanted the K8S cluster to support IPv6, which meant replacing Talos' default CNI (Flannel) with Cilium.
(OK, it might be possible to support IPv6 with Flannel on Talos, but the Talos docs say very little about how to customize Flannel, and I wanted Cilium for other reasons too - e.g. LoadBalancer support with L2 announcements, replacing kube-proxy...)
This means declaring "cni: none" in the Talos machine config (the cluster.network.cni.name setting), and then either:
1) manually installing Cilium after provisioning the cluster
2) finding a way to automatically install Cilium when the cluster is provisioned.
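Either way, the machine config change itself is small. Here is a minimal sketch of it as a config patch written in Tofu; the local name is made up for illustration, and disabling the built-in kube-proxy only makes sense if Cilium is going to replace it.

```hcl
# Sketch only: "cni_none_patch" is a hypothetical name, not from my actual setup.
locals {
  cni_none_patch = yamlencode({
    cluster = {
      network = {
        # Tell Talos not to deploy Flannel (or any other CNI).
        cni = { name = "none" }
      }
      # Assumption: Cilium will take over kube-proxy's job, so Talos
      # should not deploy kube-proxy either.
      proxy = { disabled = true }
    }
  })
}
```

A patch like this would typically be passed in the config_patches list of the Talos machine configuration.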
Of course I went for option 2, right :-)
Which leads us to a rabbit hole of multiple options:
1) wait for the cluster to be up (=K8S API is functional) and then use the Helm provider to create a helm_release resource on the cluster
Problem: there is no easy and clean way to wait for the cluster to be up.
The Talos provider has a talos_cluster_health data source, but this one waits for all nodes to be "Ready", which isn't going to happen since the CNI hasn't been deployed yet. (There is a skip_kubernetes_checks option but it doesn't seem to help.)
Declaring something like a kubernetes_nodes data source in Tofu sort of works, ... until you reprovision the cluster. Then you realize that you can't even do a "tofu plan" because Tofu tries to refresh its status, which requires the cluster to be up. So, this is a non-starter.
2) use Talos' "inlineManifests" feature, which instructs Talos to apply a bunch of YAML to the cluster when it's provisioned
Problem: this requires Cilium YAML manifests, and I typically install Cilium with its Helm chart.
Solution: use a helm_template data source to do the equivalent of the "helm template" command, and render the Cilium chart into ready-to-apply YAML manifests.
Next problem: the Cilium Helm chart is very sophisticated, and depends on Capabilities.KubeVersion - in other words, when we invoke the helm_template data source, we need to pass it the correct kube_version.
Next solution: that version is available in the talos_machine_configuration data source.
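Put together, option 2 could look roughly like the sketch below. This is not my exact configuration: the cluster name, endpoint, variable name and Cilium values are placeholders, and the values shown are nowhere near a complete Cilium setup for Talos (check the Talos and Cilium docs for the full list). To keep the example short, it feeds the same Kubernetes version to both data sources from one variable instead of reading it back from the Talos one.

```hcl
terraform {
  required_providers {
    talos = { source = "siderolabs/talos" }
    helm  = { source = "hashicorp/helm" }
  }
}

# Hypothetical variable: one place to set the Kubernetes version.
variable "kubernetes_version" {
  type = string
}

# Equivalent of "helm template": renders the chart without talking to a cluster.
data "helm_template" "cilium" {
  name         = "cilium"
  namespace    = "kube-system"
  repository   = "https://helm.cilium.io/"
  chart        = "cilium"
  kube_version = var.kubernetes_version # feeds Capabilities.KubeVersion

  # Illustrative values only (assumptions, not a complete configuration).
  values = [yamlencode({
    ipam                 = { mode = "kubernetes" }
    ipv6                 = { enabled = true }
    kubeProxyReplacement = true
    # Assumes the Talos KubePrism local API endpoint on its default port.
    k8sServiceHost = "localhost"
    k8sServicePort = 7445
  })]
}

resource "talos_machine_secrets" "this" {}

data "talos_machine_configuration" "controlplane" {
  cluster_name       = "example"                      # placeholder
  cluster_endpoint   = "https://cluster.example:6443" # placeholder
  machine_type       = "controlplane"
  machine_secrets    = talos_machine_secrets.this.machine_secrets
  kubernetes_version = var.kubernetes_version

  config_patches = [yamlencode({
    cluster = {
      network = { cni = { name = "none" } }
      proxy   = { disabled = true }
      # Talos applies these manifests itself when the cluster is bootstrapped.
      inlineManifests = [{
        name     = "cilium"
        contents = data.helm_template.cilium.manifest
      }]
    }
  })]
}
```

The nice part of this design is that nothing here needs the Kubernetes API to be reachable at plan time: the chart is rendered locally, and Talos does the applying.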
And with that (and a good amount of Cilium configuration!) our cluster comes up fully functional!
#kubernetes #talos #proxmox #cilium #opentofu