I'm going to admit that I am doing something immature in my #homelab and I'm looking for opinions. I've got multiple #XCPng hosts, all using local storage. I have no NFS or iSCSI storage. That's kinda silly. Shared storage is super useful and I'm literally not using it.

Unless I go to some serious effort to make a high-performance SAN, I expect network storage performance to be so-so for VM storage, but maybe I'm too pessimistic. I currently only have copper gigabit in the rack. No fiber, no 2.5G copper or anything like that. I'm not sure if that's going to be viable for NFS or iSCSI.

I could dedicate a host to running TrueNAS Core with a bunch of storage. But what has always bugged me about this is that my storage host becomes a single point of failure for all the compute nodes. #TrueNAS is super reliable but everything has to reboot once in a while, and these stupid enterprise-grade servers take anywhere from 4-8 minutes to boot. If I had a single storage node, and I needed to reboot it for an OS upgrade, everything would hang for a while. That's no good. Not updating the OS on the storage system is also not good.

So what am I supposed to be doing for shared storage on a #Xen cluster? How do I avoid a storage host becoming a single point of failure? How do you update and reboot a storage node, without disrupting everything that depends on it?

#selfhosting #san #storage

@paco How is local storage configured?
@nicholasburns Each node has 5 drives in a ZFS pool. So each has about 7-10 TB of local storage. I just brought in a new 20 TB node, which I thought I might turn into a storage node. But, again, the 5-minute reboot duration is a real drag.

@paco SATA or SAS drives? If the latter, that's a pretty resilient and performant local storage solution, and not 'immature' at all. I might be more worried about disaster recovery depending on how that looks.

I also think the point @unixorn makes about distinguishing between the storage location of virtual system disks versus non-system disks is important.

Next questions would be about what kind of non-system-disk data, specifically, you think you could be storing more effectively and why. Not to mention thorough benchmarking of your current local storage, to even coherently compare it to non-local options. Otherwise a storage network for the sake of a storage network is just that.

@nicholasburns Most of it is SAS. But where local storage bit me was in migrating VMs from one host to another as I upgraded XCP-ng 8.2.1 to 8.3. When you have shared storage, moving a VM from one compute node to another is trivial. With only local storage, migrating VMs has been really difficult: either time-consuming (20-120 minutes to move a big VM) or even impossible (version-to-version migration on XCP-ng has been difficult with only local storage).

#homelab #selfhosting

@unixorn

@paco Fair point. I have admittedly zero hands-on Xen experience, which could very well be *the* deciding factor.

As an aside, I think some people have strong opinions about how Proxmox handles HA by comparison.

@paco @nicholasburns @unixorn Just for what it’s worth, 10G networking makes live-migrating VMs a lot faster and NAS-based storage more viable. If you’re looking for a reason to upgrade the network 😂
@paco Following to watch for smart people's responses

@paco
@kiraso

I want to see what smarter people are doing too. I was planning to set up iSCSI and NFS on my Synology this weekend, but just for data, not VM disks, so I'm less worried about speed.

In the end, it's just a #homelab and I'm sure it'll be performant enough to support two users.

@unixorn For me, same as for @paco, there is a concern of single point of failure. And my #homelab is partially a #homeprod, hosting services and data that my family is using daily. Performance is less of the concern.

@paco
I'm running two Proxmox hosts and one ESXi host over NFS to a box running Alma 9 and ZFS. I know ESXi NFS is synchronous, which means it waits for verification before a write returns. You'll want to have some kind of very fast write cache. I just have one SSD, since I can live with a few seconds of data loss if needed.
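On ZFS that fast write cache is a SLOG (separate intent log) device, and attaching one is a single command. A sketch, with the pool name and device path as placeholders:

```shell
# Attach a fast NVMe as a SLOG so synchronous NFS writes can return once
# they land on the SSD instead of waiting on the spinning mirrors.
# "tank" and the device path are placeholders for your pool and SSD.
zpool add tank log /dev/disk/by-id/nvme-example
zpool status tank   # the device now shows up under a "logs" section
```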

I probably patch the SAN once or twice a month. I wrote a script to suspend all the VMs, reboot the SAN, then reboot the VM hosts. The VMs come up automatically. You could probably script the whole process to run during your maintenance window.
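A stripped-down version of that routine might look like this. Host names and the update command are placeholders, the VM-suspend step only covers the Proxmox side (ESXi would need its own), and `DRY_RUN` defaults to 1 so it just prints what it would do:

```shell
#!/usr/bin/env bash
# Sketch of a "patch the SAN" maintenance script. Hosts are placeholders,
# and DRY_RUN defaults to 1 so the script only prints its commands.
set -euo pipefail

DRY_RUN="${DRY_RUN:-1}"
SAN="san.example.lan"
HOSTS=("pve1.example.lan" "pve2.example.lan")

run() {
  # Execute a command, or just echo it when dry-running.
  if [ "$DRY_RUN" = "1" ]; then echo "+ $*"; else "$@"; fi
}

wait_for() {
  # Block until the host answers ping again after a reboot.
  run bash -c "until ping -c1 -W2 $1 >/dev/null 2>&1; do sleep 5; done"
}

# 1. Suspend every running VM on each hypervisor (Proxmox qm CLI).
for h in "${HOSTS[@]}"; do
  run ssh "$h" "qm list | awk '\$3 == \"running\" {print \$1}' | xargs -rn1 qm suspend"
done

# 2. Patch and reboot the SAN, then wait for it to come back.
run ssh "$SAN" "dnf -y update && systemctl reboot"
wait_for "$SAN"

# 3. Reboot the hypervisors; VMs resume/start automatically on boot.
for h in "${HOSTS[@]}"; do
  run ssh "$h" systemctl reboot
  wait_for "$h"
done
```

Running it with `DRY_RUN=1` first is a cheap way to sanity-check the sequence before letting it loose in a maintenance window.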

My SAN box is an antique Supermicro (X8 series) with two X5600 CPUs and 48 GB RAM. There are sixteen 14 TB Seagate Exos drives in 8 mirrors and one NVMe SSD. Performance is good enough for me with around 35 VMs. Oh, and two 1 GbE copper links in LACP, and I can saturate one of them easily but with no real degradation.

@paco I’d look at a distributed storage system so the storage layer itself isn’t a single point of failure. Something like Ceph could work: you can run a small Ceph cluster on different hypervisor nodes (either directly on the hosts or in VMs with passthrough disks) and then expose it back to XCP-ng as NFS/iSCSI. That way you can reboot one node at a time without taking all your VM storage down, and you also get nice extras like S3/object storage.

I am a bit worried about performance though...
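For a cephadm-managed cluster, the bootstrap is roughly this. Host names and IPs are placeholders, and the exact subcommand flags vary between Ceph releases, so treat it as a sketch rather than a recipe:

```shell
# Rough sketch of standing up a small Ceph cluster with cephadm.
# Hosts/IPs are placeholders; flags differ across Ceph releases.
cephadm bootstrap --mon-ip 10.0.0.11          # first node becomes a monitor
ceph orch host add node2 10.0.0.12            # enroll the other nodes
ceph orch host add node3 10.0.0.13
ceph orch apply osd --all-available-devices   # turn spare disks into OSDs
ceph fs volume create vmstore                 # CephFS, exportable over NFS
ceph nfs cluster create vmstore-nfs           # Ganesha-backed NFS gateway
```

The NFS export from that last step is what you'd point XCP-ng at as a shared SR.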

@paco I’m not a Ceph expert, especially when it’s running inside VMs, but I’ve really fallen in love with distributed storage in my homelab. I’m currently using Longhorn with 3 replicas plus external backups with X days’ retention, and I basically don’t stress about my data anymore. I can delete or reinstall a node with no impact on the workloads.
@mydoomfr Would it actually work to set up 3 VMs and make them be a Ceph cluster and point Proxmox at it? I ask because I have 2 Proxmox servers... one has 50 TB of ZFS space on it and the other has very little disk, and I'm pondering the best way to distribute that disk to the other system in a shared fashion.

@skryking I’m not a Ceph expert, but in your scenario I’d aim for 3 Proxmox nodes and try to spread the storage across them. Then you can run Ceph, keeping in mind that it’s quite resource-hungry.

If you’re stuck with 50 TB on a single node, I’d rather spin up a TrueNAS VM, passthrough the disks, and export NFS/iSCSI to both Proxmox servers. It still leaves you with a big SPOF on that one box, but it’s simpler and more in line with the hardware you have right now?

@skryking Using TrueNAS as a VM on the big box also lets you use Proxmox Backup Server to back up that VM. In theory, you could reinstall the Proxmox node, restore the TrueNAS VM, reattach the passthrough disks, and everything should come back up reasonably cleanly.

That said, I haven’t really worked with VM setups for about two years (I’m mostly on bare-metal K8s now), so I don’t have real hands-on experience with this exact approach, but it’s probably the path I’d follow in your case. :)

@mydoomfr So I actually ended up adding the kernel NFS modules to the big box and sharing the disk space to the other box over NFS. At the same time I shared it to all the VMs on a private network internal to the host, so that I could get faster speeds.
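For reference, a kernel NFS export like that could look something like this. The pool path, host name, and internal-bridge subnet are made up:

```shell
# /etc/exports on the big box -- path, host, and subnet are placeholders
/tank/shared  pve-small.lan(rw,sync,no_subtree_check)
/tank/shared  10.10.10.0/24(rw,sync,no_subtree_check)  # internal VM bridge
```

Then `exportfs -ra` re-reads the exports without restarting the NFS server.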