Cursed Homelab Server Upgrade - Part 7
The bootination!
So the server is on the shelf, and I'm rushing to put server #2 on top of it, hook up all the cables, put the QNAP box on top of that, and hook it all up before Linux finishes booting and Ceph starts expecting the drives to be there.
Except it hadn't booted.
Turns out that UEFI was still expecting to boot off a SATA drive and wasn't finding one. That gave us a second problem: UEFI doesn't like my KVM's keyboard thing and wouldn't respond to it, and the other USB socket on the back was hooked up to the PiKVM, which is functionally offline right now (though I could have gotten in that way if I needed to). So I grabbed a spare USB keyboard, plugged it into the front, and started debugging.
The obvious solution here is to boot off a thumb drive, chroot into Linux, and run grub-install and efibootmgr to update UEFI's boot list.
But this is a server, and the firmware people have obviously been here before: you can browse the EFI partition of any GPT-partitioned drive and just boot whatever you want.
And of course I ran into the usual issue of grub-install thinking the mirrored EFI partitions are an MBR disk, so there were still some efibootmgr shenanigans, but it's now booting fine.
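For anyone wondering what the efibootmgr shenanigans amount to, here is a rough sketch that only builds and prints the commands rather than running them. The disk names, partition number, labels, and loader path are all hypothetical placeholders, not what this server actually uses; the only things taken as given are efibootmgr's `--create`, `--disk`, `--part`, `--label`, and `--loader` options.

```python
import shlex


def efibootmgr_create_cmd(disk, part, label, loader):
    """Build (but do not run) an efibootmgr command that adds one boot entry."""
    return [
        "efibootmgr",
        "--create",
        "--disk", disk,        # whole-disk device holding the EFI partition
        "--part", str(part),   # partition number of the EFI partition
        "--label", label,      # name shown in the UEFI boot menu
        "--loader", loader,    # path to the loader, relative to the EFI partition
    ]


# Hypothetical values: one boot entry per disk in the mirrored pair.
MIRROR_DISKS = ["/dev/nvme0n1", "/dev/nvme1n1"]
LOADER = r"\EFI\debian\grubx64.efi"  # assumption: depends on the distro

commands = [
    efibootmgr_create_cmd(disk, 1, f"Linux ({disk})", LOADER)
    for disk in MIRROR_DISKS
]

for cmd in commands:
    # Print a copy-pasteable shell line instead of touching NVRAM.
    print(shlex.join(cmd))
```

Adding one entry per disk means UEFI still has something to boot if either half of the mirror dies, which is the whole point of mirroring the EFI partitions in the first place.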
Next step: networking. Ripped everything out of the three bridges, brought all the interfaces up, then noted which ones went down when I unplugged their cables. Hooked them up, rebooted for good measure, and we're booted.
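The unplug-and-see-what-drops trick can be sketched in a few lines. This assumes a Linux box where each interface exposes its link state in `/sys/class/net/<name>/operstate`; the function names and the interface names in the example are made up for illustration.

```python
from pathlib import Path


def read_link_states(sys_net=Path("/sys/class/net")):
    """Return {interface name: True if its operstate is 'up'}."""
    states = {}
    for iface in sorted(sys_net.iterdir()):
        try:
            states[iface.name] = (iface / "operstate").read_text().strip() == "up"
        except OSError:
            # Not every entry here is an interface with a readable operstate.
            continue
    return states


def went_down(before, after):
    """List interfaces that were up in `before` and are not up in `after`."""
    return sorted(
        name for name, up in before.items()
        if up and not after.get(name, False)
    )


# Hypothetical snapshots taken before and after pulling one cable.
before = {"eno1": True, "eno2": True, "ens1f0": True}
after = {"eno1": True, "eno2": False, "ens1f0": True}
print(went_down(before, after))
```

On the real machine you would call `read_link_states()` once, pull a cable, call it again, and feed both results to `went_down()` to learn which port that cable belongs to.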
Then to undo all the "64GB of server in 24GB" stuff.
And then the fans slowly spun up to 100% while the server sat there doing nearly nothing (load of 2-3).
I'd been worried previously about iLO getting paranoid about stuff and running the fans too fast, but this was different: this was 100%, not 30% when 20% will do. Scoured the internet, poked around in all sorts of help articles, found people who'd hacked iLO 4 to enable manual fan control (this server has always run iLO 5), and otherwise found nothing.
HPE's article on this said that it'll spin up the fans if a sensor is within 10 degrees of its caution temperature, and the only sensor that close was "76-AHCI HD Max", which was claimed to be at the front of the case, in the drive cages. Er. So it's in the drive backplanes!?!?!?
OK, fair enough, let's check that. iLO claims it isn't even plugged in (I'll get back to this *) but is more than happy to tell me about the external drives on the non-HPE SATA adapter (and the NVMe drives too).
Weird. Ok, can I just change the cooling scheme? How about "Enhanced CPU Cooling"? iLO resets and the fans spin down, then slowly go back up to 100%.
But while that happens, that AHCI HD Max sensor goes away, and I came across an article saying that if AMSd isn't running, iLO runs the fans at full speed. (I also came across an article claiming that the conservative limits on this sensor were a conspiracy by HPE to make people buy their drives.)
So what happens if I just switch AMSd off for a moment?
And the fans spun down to ~30% and stayed there.
My best guess at this point is that AMSd is reading the sensors in the chips in the NVMe drives (76 degrees and happy), averaging them with the external drives (30-40 degrees), and presenting the resulting temperature (54 degrees) to iLO, which then freaks out because it's near the 60 degree caution level.
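The arithmetic behind that guess is easy to check. This is only a sketch of my theory, not of what AMSd actually does: it assumes a plain equal-weight average of one NVMe reading and one external-drive reading, and it takes the "within 10 degrees of caution" rule from the HPE article at face value.

```python
NVME_TEMP_C = 76                  # NVMe reading from the post
EXTERNAL_RANGE_C = range(30, 41)  # external drives: 30-40 degrees
CAUTION_C = 60                    # caution level for the AHCI HD Max sensor
MARGIN_C = 10                     # fans ramp when within this margin of caution


def blended(nvme, external):
    """Assumed behaviour: equal-weight average of the two readings."""
    return (nvme + external) / 2


def fans_ramp(reading, caution=CAUTION_C, margin=MARGIN_C):
    """True if the reading is within `margin` degrees of the caution level."""
    return reading >= caution - margin


# Blended reading for every external temperature in the observed range.
results = {ext: blended(NVME_TEMP_C, ext) for ext in EXTERNAL_RANGE_C}

for ext, temp in results.items():
    print(f"external {ext} C -> blended {temp} C, fans ramp: {fans_ramp(temp)}")
```

With the external drives anywhere between 30 and 40 degrees, the blended value lands between 53 and 58, so it always sits inside the 10 degree margin. The 54 that iLO saw corresponds to the external drives reading about 32.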
And that's where I am now: it's all working well (*) and I'm reasonably happy. (Also, the internet is working flawlessly now.)