One of my projects this week is to bring up a K8S cluster on our Proxmox homelab, to perhaps eventually migrate EphemeraSearch to it.

EphemeraSearch currently runs on a 7-node K8S cluster at Hetzner.

I'm going to drop some notes in this thread, to perhaps consolidate them into a blog post or something later.

#kubernetes #homelab #selfhosted #proxmox

The whole thing is provisioned with Tofu; and one of my favorite things to do is to verify that the end-to-end provisioning works fine.

So that means a lot of "tofu destroy" + "tofu apply".

However, the TF configuration includes the Talos disk images used by the cluster, and I didn't want to re-download them every single time.

My first intention was to use "tofu taint" on the virtual machines. But they are declared in a for_each block; and you can't use "tofu taint" or "tofu plan -replace" on a for_each resource (unless you enumerate each resource individually).

However, you can do a targeted destroy:

tofu plan -destroy -target proxmox_virtual_environment_vm.k8s_nodes

And destroy will follow dependencies (if you destroy a resource, the resources that depend on it will automatically be destroyed), so in my case I could also do e.g.:

tofu plan -destroy -target talos_machine_secrets.this

(Because pretty much every Talos-related resource depends on this directly or indirectly).
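
To illustrate why the targeted destroy spares the downloaded image, here is a minimal sketch of the dependency direction, assuming the bpg/proxmox provider. The variable names, datastore IDs, and attribute values are placeholders for illustration, not the actual configuration:

```hcl
terraform {
  required_providers {
    proxmox = { source = "bpg/proxmox" }
  }
}

# Hypothetical inputs, for illustration only.
variable "talos_image_url" {
  type = string
}

variable "proxmox_node" {
  type = string
}

variable "node_names" {
  type    = set(string)
  default = []
}

# Downloaded once. It depends on nothing below, so destroying only the
# VMs leaves this resource (and the downloaded file) in place.
resource "proxmox_virtual_environment_download_file" "talos_image" {
  content_type = "iso"
  datastore_id = "local"
  node_name    = var.proxmox_node
  url          = var.talos_image_url
}

# The VMs reference the image, so the dependency points from the VMs
# to the image, not the other way around.
resource "proxmox_virtual_environment_vm" "k8s_nodes" {
  for_each  = var.node_names
  name      = each.key
  node_name = var.proxmox_node

  disk {
    datastore_id = "local-zfs"
    interface    = "virtio0"
    file_id      = proxmox_virtual_environment_download_file.talos_image.id
  }
}
```

With this shape, "tofu plan -destroy -target proxmox_virtual_environment_vm.k8s_nodes" selects every for_each instance of the VM resource plus anything that depends on the VMs, but not the image they depend on.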

#terraform #opentofu #talos #kubernetes #homelab #selfhosted

Also, I want the K8S cluster to support IPv6, which meant replacing Talos' default CNI (Flannel) with Cilium.

(OK, it might be possible to support IPv6 with Flannel on Talos, but the Talos docs say very little about how to customize Flannel, and I wanted Cilium for other reasons too - e.g. LoadBalancer support with L2 announcements, replacing kube-proxy...)

This means declaring "cni: none" in the Talos machine config, and then either:

1) manually installing Cilium after provisioning the cluster

2) finding a way to automatically install Cilium when the cluster is provisioned.

Of course I went for option 2, right :-)

Which leads us to a rabbit hole of multiple options:

1) wait for the cluster to be up (=K8S API is functional) and then use the Helm provider to create a helm_release resource on the cluster

Problem: there is no easy and clean way to wait for the cluster to be up.

Talos has a talos_cluster_health resource, but this one waits for all nodes to be "Ready", which isn't going to happen since the CNI hasn't been deployed yet. (There is a skip_kubernetes_checks option but it doesn't seem to help.)

Declaring something like a kubernetes_nodes resource in Tofu sort of works, ... until you reprovision the cluster. Then you realize that you can't even do a "tofu plan" because Tofu tries to refresh that resource's status, which requires the cluster to be up. So, this is a non-starter.

2) use the Talos "inlineManifests" feature, which instructs Talos to apply a bunch of YAML to the cluster when it's provisioned

Problem: this requires Cilium YAML manifests; and the way I install it is typically with the Helm chart.

Solution: use a helm_template data source to do the equivalent of the "helm template" command, and render the Cilium chart into ready-to-apply YAML manifests.

Next problem: the Cilium Helm chart is very sophisticated, and depends on Capabilities.KubeVersion - in other words, when we invoke the helm_template data source, we need to pass it the correct kube_version.

Next solution: that version is available in talos_machine_configuration resources.

And with that (and a good amount of Cilium configuration!) our cluster comes up fully functional!
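
A minimal sketch of the rendering step, assuming the hashicorp/helm provider. The Kubernetes version is passed through a hypothetical variable here (rather than read from the Talos machine configuration), and the Cilium values are a small illustrative subset, not the full configuration a Talos cluster needs:

```hcl
terraform {
  required_providers {
    helm = { source = "hashicorp/helm" }
  }
}

# Hypothetical variable; the value must match the version Talos deploys.
variable "kubernetes_version" {
  type = string
}

# Equivalent of "helm template": renders the chart without touching a cluster.
data "helm_template" "cilium" {
  name         = "cilium"
  namespace    = "kube-system"
  repository   = "https://helm.cilium.io"
  chart        = "cilium"
  kube_version = var.kubernetes_version
  # No chart version is pinned here; pinning one keeps the rendering stable.

  values = [yamlencode({
    ipam                 = { mode = "kubernetes" }
    kubeProxyReplacement = true
    ipv6                 = { enabled = true }
    # Assumption: the API server is reached through Talos' local KubePrism endpoint.
    k8sServiceHost = "localhost"
    k8sServicePort = 7445
  })]
}

locals {
  # Talos config patch: no built-in CNI, no kube-proxy, Cilium as inline manifest.
  cilium_patch = yamlencode({
    cluster = {
      network = { cni = { name = "none" } }
      proxy   = { disabled = true }
      inlineManifests = [{
        name     = "cilium"
        contents = data.helm_template.cilium.manifest
      }]
    }
  })
}
```

The resulting local.cilium_patch would then go into the config patches of the control plane MachineConfig.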

#kubernetes #talos #proxmox #cilium #opentofu

Let's continue the Proxmox + Tofu + Talos + Cilium adventure, with two little footnotes. "Devil is in the details!"

First: Talos "inlineManifests" behavior.

When you add some inlineManifests to your Talos MachineConfig and push that MachineConfig, the manifests get applied immediately. Yay!

However, when you update or remove some inlineManifests and push the MachineConfig ... Nothing happens. Talos does a full (potentially destructive!) reconcile only when executing a cluster upgrade. (This is pretty well explained in the Talos docs[1])

This means that our initial installation of Cilium will work immediately, but subsequent configuration changes won't work (the YAML won't be applied) until we run a "talosctl upgrade-k8s". (Pro-tip: make sure to specify "--to" with the current k8s version, otherwise it'll execute a "real" upgrade which implies downloading new images and restarting the whole control plane one component at a time - which takes a while.)

So, are we there yet?

Not quite!

The second issue: each time I'd do a "tofu plan", it would tell me that something had changed. Which is kind of annoying. If you don't change your Tofu configuration, variables, etc, normally, you'd expect "tofu plan" to tell you a reassuring:

No changes. Your infrastructure matches the configuration.

So, what is going on?

[1] https://docs.siderolabs.com/kubernetes-guides/advanced-guides/inlinemanifests#how-talos-handles-manifest-resources

#terraform #talos #opentofu #homelab #kubernetes #cilium

When looking in the "tofu plan" output, we'd actually see a *huge* change. That huge change is the YAML rendering of the Cilium Helm chart. And since that YAML gets included in our Talos MachineConfigs... Yeah, that was annoying, because each "tofu apply" would repush a new MachineConfig to our Talos nodes.

(That push turns out to be mostly a no-op, but still. Unclean! Boo!)

My first intuition was: "the Cilium Helm chart is probably generating some UUID, secrets, keys, whatever". And, yes, that's exactly it! Cilium generates its own internal CA, and then uses it to issue a couple of certificates.

This is not a problem when using Helm "normally" (i.e. "helm upgrade --install ...") because the Cilium Helm chart is sophisticated enough to do this conditionally, only on the initial install.

However, when rendering the chart YAML "out of the box" (as we do here with Tofu and Talos, or as one would do with e.g. Flux or Argo), the Helm renderer has no access to the Kubernetes API, and doesn't know that there is already a certificate and that it shouldn't generate a new one.

Thankfully, the solution is fairly straightforward:

- generate that certificate (e.g. with the Tofu "tls" provider, it boils down to a couple of resources of a few lines each)

- pass that certificate (and associated key) in the Cilium Helm chart values

- also set a couple of values to tell Cilium to generate the other certificates later (instead of generating them from within Helm)

...And with that, our "tofu plan" now tells us the expected message:

No changes. Your infrastructure matches the configuration.
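
A minimal sketch of the certificate approach, assuming the hashicorp/tls provider. The Cilium value names below (tls.ca.cert, tls.ca.key, hubble.tls.auto.method) and the base64 encoding are assumptions to verify against the values reference of the chart version in use:

```hcl
# CA key, generated once and kept in the Tofu state (which therefore
# contains a secret and must be protected accordingly).
resource "tls_private_key" "cilium_ca" {
  algorithm = "RSA"
  rsa_bits  = 4096
}

# Self-signed CA certificate, stable across "tofu plan" runs.
resource "tls_self_signed_cert" "cilium_ca" {
  private_key_pem       = tls_private_key.cilium_ca.private_key_pem
  is_ca_certificate     = true
  validity_period_hours = 87600 # roughly 10 years
  allowed_uses          = ["cert_signing", "crl_signing"]

  subject {
    common_name = "Cilium CA"
  }
}

locals {
  # Extra Helm values to merge into the Cilium chart values.
  cilium_tls_values = {
    tls = {
      ca = {
        # Assumption: the chart expects base64-encoded PEM here.
        cert = base64encode(tls_self_signed_cert.cilium_ca.cert_pem)
        key  = base64encode(tls_private_key.cilium_ca.private_key_pem)
      }
    }
    # Assumption: defer the Hubble certificates to an in-cluster job
    # instead of generating them at template time.
    hubble = { tls = { auto = { method = "cronJob" } } }
  }
}
```

Because the CA now lives in the Tofu state instead of being regenerated by the Helm renderer, the rendered YAML is identical from one plan to the next.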

Next up for today: setting up the Proxmox CSI plugin [1] so that our K8S cluster has a StorageClass - actually two StorageClasses; we want our users to be able to request fast, efficient local volumes (using Proxmox local-zfs) as well as distributed, resilient ones (using Ceph)!

[1] https://github.com/sergelogvinov/proxmox-csi-plugin

... And we have the CSI provider. This makes it possible to create a PVC (PersistentVolumeClaim) in the Kubernetes cluster, and this will automatically create a volume in Proxmox and attach it to a Kubernetes node.

That part was both the easiest and the hardest.

The easiest because there wasn't much to do (install a Helm chart, so we repeat the templating technique used earlier with Cilium; and create a Proxmox user and associated token) but also the hardest because there are many little variations possible here.
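
A minimal sketch of the two StorageClasses, rendered as YAML that could be applied like any other manifest. The class names and the "ceph" storage ID are hypothetical; the provisioner name and the "storage" parameter follow the Proxmox CSI plugin README as far as I know, so check them against the plugin version in use:

```hcl
locals {
  # StorageClass name => Proxmox storage ID (both hypothetical).
  storage_classes = {
    "proxmox-local-zfs" = "local-zfs"
    "proxmox-ceph"      = "ceph"
  }

  # One multi-document YAML string with both StorageClasses.
  storage_class_manifests = join("---\n", [
    for name, storage in local.storage_classes : yamlencode({
      apiVersion  = "storage.k8s.io/v1"
      kind        = "StorageClass"
      metadata    = { name = name }
      provisioner = "csi.proxmox.sinextra.dev"
      parameters = {
        storage                     = storage
        "csi.storage.k8s.io/fstype" = "ext4"
      }
      reclaimPolicy        = "Delete"
      allowVolumeExpansion = true
      # Wait until a pod is scheduled, so the volume is created on the
      # hypervisor where the pod's node actually runs.
      volumeBindingMode = "WaitForFirstConsumer"
    })
  ])
}

output "storage_class_manifests" {
  value = local.storage_class_manifests
}
```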

Example: the Proxmox CSI plugin [1] needs to have the well-known labels topology.kubernetes.io/region and zone. "Region" here means "Proxmox cluster" - which allows us to have a Kubernetes cluster spanning multiple Proxmox clusters; and "Zone" means "Proxmox hypervisor". This is used by the CSI plugin to know where volumes should be created.

But there are many ways to set these labels!

1) through Talos MachineConfiguration [2]

2) by installing the Proxmox CCM [3]

3) by installing something like topomatik [4]

For now, I went with the first option, because I'm already generating MachineConfigurations in the TF configuration, so adding a few lines there was trivial.

But in the long run, I might settle for topomatik, as I believe it would behave correctly if I end up migrating worker nodes from one hypervisor to another. (I don't have any plans to do that at the moment, though!)
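
For the first option, a minimal sketch of a per-node Talos config patch setting both labels through machine.nodeLabels. The variable names and the shape of the node map are hypothetical:

```hcl
# Hypothetical inputs: the Proxmox cluster name, and the hypervisor
# hosting each Kubernetes node.
variable "proxmox_cluster" {
  type = string
}

variable "node_hypervisors" {
  type    = map(string) # Kubernetes node name => Proxmox hypervisor name
  default = {}
}

locals {
  # One Talos config patch per node.
  node_label_patches = {
    for node, hypervisor in var.node_hypervisors : node => yamlencode({
      machine = {
        nodeLabels = {
          "topology.kubernetes.io/region" = var.proxmox_cluster
          "topology.kubernetes.io/zone"   = hypervisor
        }
      }
    })
  }
}
```

Since the labels are baked into each node's MachineConfig, they would not follow a VM migrated to another hypervisor, which is the reason for considering topomatik later.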

[1] https://github.com/sergelogvinov/proxmox-csi-plugin

[2] https://docs.siderolabs.com/kubernetes-guides/advanced-guides/node-labels

[3] https://github.com/sergelogvinov/proxmox-cloud-controller-manager

[4] https://github.com/enix/topomatik

Next steps:

- clean up the Terrafu configuration, probably shove it into a module so that I can create multiple clusters as cleanly as possible (right now, a lot of things are hardcoded for a single cluster)

- push the module to a public forge

- maybe blog about all this?

Of course, I can't be trusted to take notes properly, so I haven't properly documented progress on this thread.

But here is what happened since last time...

I moved all that to a module that I intend to publish. This led me into investigating how to set default computed values for the module inputs. For instance, I want to be able to specify IPv4 and IPv6 subnets, but if no subnet is specified, I want to pick a random one.
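
One possible way to get such computed defaults is a nullable variable combined with the hashicorp/random provider and coalesce(). This is a sketch under assumptions (variable names, 10.x.0.0/16 and ULA /48 ranges), not necessarily what the module does:

```hcl
# Hypothetical inputs: null means "pick one for me".
variable "ipv4_subnet" {
  type    = string
  default = null
}

variable "ipv6_subnet" {
  type    = string
  default = null
}

# Random values are stored in the state, so they stay stable across runs.
resource "random_integer" "ipv4_octet" {
  min = 0
  max = 255
}

resource "random_id" "ula" {
  byte_length = 5 # 40 random bits, as in an IPv6 unique local address prefix
}

locals {
  ula_hex = random_id.ula.hex # 10 hexadecimal characters

  # coalesce() returns its first non-null argument.
  ipv4_subnet = coalesce(
    var.ipv4_subnet,
    "10.${random_integer.ipv4_octet.result}.0.0/16",
  )
  ipv6_subnet = coalesce(
    var.ipv6_subnet,
    "fd${substr(local.ula_hex, 0, 2)}:${substr(local.ula_hex, 2, 4)}:${substr(local.ula_hex, 6, 4)}::/48",
  )
}
```

The rest of the module would then reference local.ipv4_subnet and local.ipv6_subnet instead of the variables directly.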

I also added a bunch of documentation.

And then I tested everything using a Proxmox token instead of SSH access, and ... of course it broke, because I was importing a disk image (downloading a raw disk image from the Talos image factory) and that requires SSH access. Because the Proxmox API is annoying like that.

(I didn't think that'd be an issue because that particular feature wasn't listed in the bpg provider under "stuff that requires SSH access".)

So I'm now refactoring everything to install from an ISO image instead (since that doesn't require SSH access), but of course, yak shaving happened: when installing from the Talos ISO image, once the VM reboots, instead of using the static IP address passed by Proxmox in the "nocloud" payload, it now obtains an address from DHCP. Which means that cluster bootstrap doesn't work anymore.

I'm now pondering options:

- switching back to raw disk provisioning (and requiring SSH access for my module to work)

- passing IP addresses in the Talos MachineConfig (that should definitely work, right?)

- finding out if there is a way to tell Talos to use the nocloud payload even when rebooting (actually kexec-ing) the disk install

#proxmox #talos #yakshaving

Good news, everyone (especially me): the issue turned out to be both logical and... not so logical.

Formerly, we were booting the Talos nodes on a disk image coming from the Talos factory. That disk image had all the configuration we wanted; in particular, it had the "nocloud" flavor, meaning: "hey, I'm going to give you a bunch of information - including your IP address - through a particular way - in this case, a tiny filesystem on a virtual block device."

But now, we're booting from an ISO image. We can't *run* from an ISO image (although, technically, since Talos is immutable... I guess we should be able to? I wonder if that's possible?), so in the Talos MachineConfig, we pass an "install" block to say, "hey, install Talos on this particular disk". And here, there is an "image" parameter, to tell which image you want to use.

Naively, I thought that omitting this parameter meant "infer the image from the ISO" (i.e., use a nocloud image). I was wrong! It picks a different image. In this case, the "metal" image. And the metal image doesn't give a damn about the nocloud metadata, and just does DHCP in that case. So it makes sense!

...But also... Since I booted from a *nocloud* image, why can't it default to a *nocloud* installer? No idea.

Anyways, I changed my MachineConfig template to include the correct image and now we're back in business. Clusters are up and running.
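
A minimal sketch of what such an install block could look like as a Talos config patch. The variable names and the install disk are hypothetical, and the "nocloud-installer" image path is an assumption about the Image Factory naming that should be checked against the factory's own output for your schematic:

```hcl
# Hypothetical inputs.
variable "talos_version" {
  type = string # for example "v1.x.y"
}

variable "schematic_id" {
  type = string # schematic ID obtained from the Talos Image Factory
}

locals {
  install_patch = yamlencode({
    machine = {
      install = {
        disk = "/dev/vda" # hypothetical install disk
        # Explicit nocloud installer, so that the installed system keeps
        # reading the "nocloud" payload instead of falling back to DHCP.
        image = "factory.talos.dev/nocloud-installer/${var.schematic_id}:${var.talos_version}"
      }
    }
  })
}
```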

So now I can go back to writing docs and perhaps publishing this module, ... but also I need to pack for my trip to Tennessee. So we'll see :)

#talos #terraform #kubernetes

My module to deploy Kubernetes clusters on Proxmox using Talos is now documented, and published on GitHub:

https://github.com/jpetazzo/taloprox/

Last step, perhaps write a blog post about all this?

#kubernetes #talos #proxmox #homelab #selfhosted

Would love to hear your thoughts on how we can make talos easier to adopt