I was hoping to find which LXCs are causing this, but they all have similar disk I/O graphs.
Well, shit.
#homelab #ProxmoxCluster #HighDiskUsage #zfs #mystery
Ok, that was a kinda premature and stupid panic.
The #Beszel agent was not reporting disk I/O until the recent update. All my agents update automatically, and that happened right around March 28-30. After that, the data began to flow.
In fact, the sudden disk issues are limited to a single #Proxmox node, and it's clearly visible on the I/O pressure stall graph. The spikes before May 30 are backups; after that, it went crazy.
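For the record, that graph is just Linux's pressure stall information (PSI), and you can read the raw numbers yourself from /proc/pressure/io. A minimal sketch, assuming a PSI-enabled kernel (4.20+):

```python
# Read Linux pressure stall information (PSI) for block I/O.
# Each line looks like:
#   some avg10=0.00 avg60=0.00 avg300=0.00 total=12345
# "some" = at least one task stalled on I/O;
# "full" = all non-idle tasks stalled at once.
def read_io_pressure(path="/proc/pressure/io"):
    pressure = {}
    with open(path) as f:
        for line in f:
            kind, *fields = line.split()
            pressure[kind] = {
                k: float(v) for k, v in (field.split("=") for field in fields)
            }
    return pressure

if __name__ == "__main__":
    psi = read_io_pressure()
    print(f"I/O stall (some, 10s avg): {psi['some']['avg10']}%")
```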
So I moved all the LXCs to another #Proxmox node. The problematic drive's usage has returned to normal, while the node the containers moved to is still fine, so it's the disk, not the workload.
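The move itself is just pct migrate per container on the Proxmox CLI. A rough sketch of the batch, wrapped in Python; the container IDs and target node name here are made up:

```python
# Hypothetical sketch: restart-mode migrate a batch of LXCs to another
# Proxmox node. pct is the stock Proxmox container CLI; IDs and node
# name below are placeholders, not my actual cluster.
import subprocess

CONTAINERS = [101, 102, 103]   # hypothetical LXC IDs
TARGET_NODE = "pve2"           # hypothetical node name

for vmid in CONTAINERS:
    subprocess.run(
        ["pct", "migrate", str(vmid), TARGET_NODE, "--restart"],
        check=True,
    )
```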
Looks like the issue is the WD Green SSD in that node. It took only 1429 power-on hours to retire it.
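The hours figure comes straight out of SMART. If you want to check yours, smartctl from smartmontools will print it; a quick sketch assuming smartmontools 7.0+ for JSON output, with an example device path:

```python
# Pull power-on hours from SMART via smartctl's JSON output.
# Needs smartmontools 7.0+ for --json; /dev/sda is just an example.
import json
import subprocess

def power_on_hours(device="/dev/sda"):
    # Not using check=True: smartctl uses nonzero exit bits to flag
    # SMART conditions even when output is perfectly usable.
    out = subprocess.run(
        ["smartctl", "--json", "-A", device],
        capture_output=True, text=True,
    ).stdout
    data = json.loads(out)
    # Most drives report this under power_on_time; vendors vary.
    return data.get("power_on_time", {}).get("hours")

if __name__ == "__main__":
    print(f"Power-on hours: {power_on_hours()}")
```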
I'm running #diskscan on it right now. No idea why, since I'm going to replace it with a spare NVMe drive I have anyway.