Cursed Homelab Server Upgrade - Part 7

The bootination!

So the server is on the shelf, and I'm rushing to put server #2 on top of it, hook up all the cables, put the QNAP box on top of that, and hook it all up before Linux finishes booting and Ceph starts expecting the drives to be there.

Except it hadn't booted.

Turns out that UEFI was still expecting a SATA drive to boot from and wasn't finding one. So we had a problem: UEFI doesn't like my KVM's keyboard thing and wouldn't respond to it, and the other USB socket on the back was hooked up to the PiKVM, which is functionally offline right now (though I could have gotten in if I needed to). So I grabbed a spare USB keyboard, plugged it into the front, and started debugging.

The obvious solution here is to boot off a thumb drive, chroot into Linux, and run grub-install and efibootmgr to update UEFI's boot list.

But this is a server, and whoever wrote the firmware has obviously been here before you: you can browse the EFI partition of any GPT-partitioned drive and just boot whatever you want.

And of course I ran into the usual issue of grub-install thinking the mirrored EFI partitions are an MBR disk, so there were still some efibootmgr shenanigans, but it's now booting fine.

Next step: networking. Ripped everything out of the three bridges, brought all the interfaces up, then noted which ones went down when I unplugged their cables. Hooked them up, rebooted for good measure, and we're booted.
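For anyone wanting to do the unplug-and-watch trick without squinting at link lights, here's a rough Python sketch of the same idea. It assumes Linux's /sys/class/net/<iface>/carrier files, and the function names are made up for illustration; it's not what I actually ran.

```python
from pathlib import Path


def link_states(sysfs_net="/sys/class/net"):
    """Map interface name -> True/False carrier state (None if unreadable)."""
    states = {}
    for iface in sorted(Path(sysfs_net).iterdir()):
        try:
            states[iface.name] = (iface / "carrier").read_text().strip() == "1"
        except OSError:
            # Reading carrier fails while the interface is administratively
            # down, which is why the interfaces get brought up first.
            states[iface.name] = None
    return states


def links_that_dropped(before, after):
    """Interfaces that had carrier in `before` but no longer do in `after`."""
    return [name for name, up in before.items() if up and not after.get(name)]
```

Take one snapshot with link_states(), pull a cable, take another, and links_that_dropped(before, after) should name the port that cable belonged to.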

Then to undo all the "64GB of server in 24GB" stuff.

And then the fans slowly spun up to 100% while the server sat there doing nearly nothing (load of 2-3).

I'd been worried previously about iLO getting paranoid about stuff and running the fans too fast, but this was different: this was 100%, not 30% when 20% would do. I scoured the internet, poked around in all sorts of help articles, and found people who'd hacked iLO 4 to enable manual fan control (this server has always run iLO 5), but nothing that helped.

HPE's article on this said that iLO will spin up the fans if a sensor is within 10 degrees of its caution temperature, and the only sensor that qualified was "76-AHCI HD Max", which was claimed to be at the front of the case, in the drive cages. Er. So it's in the drive backplanes!?!?!?

Ok, fair enough, let's check that. iLO claims it isn't even plugged in (I'll get back to this *) but is more than happy to tell me about the external drives on the non-HPE SATA adapter (and the NVMe drives too).

Weird. Ok, can I just change the cooling scheme? How about "Enhanced CPU Cooling"? iLO resets and the fans spin down, then slowly go back up to 100%.

But while that happens, that AHCI HD Max sensor goes away, and I came across an article saying that if AMSd isn't running, iLO runs the fans at full speed. (I also came across an article claiming that the conservative limits on this sensor were a conspiracy by HPE to make people buy their drives.)

So what happens if I just switch AMSd off for a moment?

And the fans spun down to ~30% and stayed there.

My best guess at this point is that AMSd is reading the sensors in the chips in the NVMe drives (76 degrees and happy), averaging them with the external drives (30-40 degrees), and presenting the resulting temperature (54 degrees) to iLO, which then freaks out because it's near the 60 degree caution level.
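To sanity-check that guess, here's the arithmetic as a tiny Python sketch. Big caveats: the averaging itself is only my theory, the drive count and exact per-drive temperatures below are made up to land on 54, and reading "within 10 degrees" as "at or above caution minus 10" is my assumption about how iLO applies the rule.

```python
from statistics import mean

CAUTION_C = 60  # caution level for the sensor, as mentioned above
MARGIN_C = 10   # HPE's article: fans ramp up within 10 degrees of caution


def reported_temp(drive_temps):
    """Guessed behaviour: one 'max drive temp' that is really an average."""
    return mean(drive_temps)


def fans_ramp_up(temp_c, caution_c=CAUTION_C, margin_c=MARGIN_C):
    """True if the temperature is inside the margin below the caution level."""
    return temp_c >= caution_c - margin_c


# Hypothetical mix: two NVMe drives at 76 and two external drives at 32.
mixed = reported_temp([76, 76, 32, 32])
external_only = reported_temp([32, 32])

print(mixed, fans_ramp_up(mixed))
print(external_only, fans_ramp_up(external_only))
```

With those made-up numbers the average is 54, which is inside the 10 degree margin, while the external drives on their own would be nowhere near it.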

And that's where I am now, it's all working well (*) and I'm reasonably happy. (Also internet is working flawlessly now)

#homelab #cursedhomelab

Ok, so what the heck happened to the onboard AHCI controller?

I swear that somewhere deep inside iLO or UEFI it's completely switched the thing off. The only storage devices that show up in lspci are the two NVMe drives and the controller for the external drives.

I think it's expecting that the only storage controllers that will get plugged into this are one of:
1. HPE PCIe expansion cards
2. HPE mezzanine expansion cards
3. NVMe drives

And the first two mean that you're going to be plugging the internal cables up to those controllers (for the mezzanine cards, I believe they share connectors with the on-board AHCI controller) so no need for the AHCI controller. Right?

I think it thinks my QNAP brand external SATA card is all the storage controller it needs, therefore it doesn't need to spin up the internal "s100" AHCI controller.

So I believe this means that the 8 SATA/SAS ports in the front are now dark.

But I have no need to plug anything into them right now so I'll cross this bridge when I get some 2.5" drives for this server.