Mastodawn

Lennart Poettering Feb 10, 2025

Fun little thing I have been working on: teach systemd to boot directly into a disk image downloaded via HTTP within the initrd.

In v257 systemd learnt the ability to download disk images at boot via systemd-import-generator, both DDIs and tarballs, and place them in /var/lib/machines/, /var/lib/portables/, /var/lib/confexts/ or /var/lib/extensions/. The goal was to provide a way to provision any of these resources automatically at boot. But now that we have this, we can take it a step further:

Lennart Poettering Feb 10, 2025

download the root disk image itself with this. There were a bunch of missing bits to make this nice though:

First of all, for raw disk images we need to attach them to a loopback block device, to make them mountable. Easy-peasy, systemd-dissect --attach already delivers that.

Then, for tar disk images we need to bind mount the downloaded and unpacked image to /sysroot/ (which is where the rootfs goes before we transition into it).

Lennart Poettering Feb 10, 2025

Then, to make this nicer, it makes sense to allow deriving the rootfs image URL directly from the UEFI HTTP boot URL. In other words: if you point your UEFI to boot a UKI from some URL (e.g. http://example.com/somedir/myimage.efi), then that UKI's initrd is smart enough to derive from that same URL a different URL for the rootfs (by replacing the final component, so that it becomes http://example.com/somedir/myimage.raw.xz).
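
The "replace the final component" step can be sketched in a few lines of Python. This is only an illustration of the idea, not systemd's actual code; the function name `derive_rootfs_url` is made up for this example, and it assumes the boot URL has no query string or fragment worth preserving.

```python
import posixpath
from urllib.parse import urlsplit, urlunsplit


def derive_rootfs_url(boot_url: str, image_name: str) -> str:
    """Swap the final path component of boot_url for image_name.

    Hypothetical helper for illustration; scheme, host and port are kept,
    any query string or fragment is dropped.
    """
    parts = urlsplit(boot_url)
    directory = posixpath.dirname(parts.path)          # "/somedir"
    new_path = posixpath.join(directory, image_name)   # "/somedir/myimage.raw.xz"
    return urlunsplit((parts.scheme, parts.netloc, new_path, "", ""))


# The UKI URL from the post, turned into the rootfs URL from the post.
print(derive_rootfs_url("http://example.com/somedir/myimage.efi", "myimage.raw.xz"))
```

The host and directory stay the same, so a single HTTP server directory holding both files is enough.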

Lennart Poettering Feb 10, 2025

Net result of this: I can now point my UEFI to a single URL where it will load the UKI from. A few seconds later the initrd will pick up the rootfs from the same source, and boot it up. Magic!

Why all this though?

Lennart Poettering Feb 10, 2025

It's mostly to tighten my test loop a bit, for physical devices. So here's what this entails:

1. You build your image with mkosi on your development machine, and ask it to serve your image via HTTP. In other words: `mkosi -f serve`.

2. You boot into the target machine once, and register an EFI variable that enables HTTP boot from your development machine. Simply do `kernel-bootcfg --add-uri=http://192.168.47.11:8081/image.efi --title=testloop --boot-order=0`, using @kraxel's wonderful tool.

Lennart Poettering Feb 10, 2025

3. You simply reboot that target machine. It will now fetch the UKI kernel, which then fetches the root disk image. And every time you reboot this happens again. The target machine's local disks are unaffected.

4. …

5. Profit!!

Lennart Poettering

Sounds simple? That's because it is.

(Well of course, you wonder where the magic sauce is. It's here: you need to build your UKIs a certain way: i.e. add to the kernel cmdline: `rd.systemd.pull=verify=no,machine,blockdev,bootorigin,raw:root:image.raw rootflags=x-systemd.device-timeout=infinity ip=any`)
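
To make the shape of that `rd.systemd.pull=` value easier to read, here is a small Python sketch that splits it apart. It assumes the value is a colon-separated triplet (comma-separated options, then a local name, then the source to download), which is my reading of the format; `PullSpec` and `parse_pull` are hypothetical names, and this is not how systemd itself parses the option.

```python
from typing import NamedTuple


class PullSpec(NamedTuple):
    options: dict     # flag -> True, or key -> value for "key=value" items
    local_name: str   # name the downloaded image gets locally
    source: str       # URL, or a file name relative to the boot origin


def parse_pull(value: str) -> PullSpec:
    """Split 'opt,opt=val,...:name:source' into its three parts (illustrative only)."""
    # Split at most twice so that colons inside a full URL stay intact.
    opts_str, local_name, source = value.split(":", 2)
    options = {}
    for item in opts_str.split(","):
        key, sep, val = item.partition("=")
        options[key] = val if sep else True
    return PullSpec(options, local_name, source)


spec = parse_pull("verify=no,machine,blockdev,bootorigin,raw:root:image.raw")
print(spec.options)
print(spec.local_name, spec.source)
```

Read this way, the example from the post asks for an unverified raw image called `image.raw`, stored locally under the name `root`.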

Lennart Poettering Feb 10, 2025

So, two take-aways here:

1. Really nice test loop now for testing immutable, modern OSes on physical devices, with onboard tooling

2. Yeah, you can frickin' boot into a damn tarball now, with just a UKI.

Lennart Poettering Feb 10, 2025

WIP PR for all of this is here:

https://github.com/systemd/systemd/pull/36314

[WIP] Support booting from rootfs acquired via HTTP by poettering · Pull Request #36314 · systemd/systemd

This extends systemd-import-generator to not only download a disk image at boot, but also attach it to a loopback device, so that we can boot from it. We have most of the pieces already in place, t...


Lennart Poettering Feb 10, 2025

oh, and one more comment: this will only work on systems that are relatively high on the systemd adoption scale: you definitely need a systemd-based initrd for this. For deriving the rootfs URL from the UEFI network boot URL you need a systemd-stub based UKI.

Lennart Poettering Feb 10, 2025

and even one more comment:

next steps: instead of downloading root fs via http, access it via nvme-over-tcp.

Benefit: better performance (no ahead of time download, but download as needed), and even better: persistency!

Henri Feb 10, 2025

@pid_eins How about WebDAV?

Eric Curtin Feb 10, 2025

@pid_eins a lot of people still default to iSCSI because it's been around a long time. But NVMe/TCP is what people should really be defaulting to these days for this kinda solution.

Account: Computers Feb 10, 2025

@pid_eins It all sounded very good until the last moment. The whole point of downloading the whole thing is to let the thing be stored compressed or shared in unlimited ways. Once you start downloading block-by-block, you're throwing it all out the window. Might as well just back the root with that image on a translucent (CoW) filesystem or something.

Patrick Lang Feb 10, 2025

@pid_eins any thoughts on preventing tampering yet? Or restricting an image to a specific machine?

David Runge Feb 10, 2025

@pid_eins what would be needed for a verification (verify=yes)?

Overall this sounds really cool and a somewhat interesting replacement scenario for PXE in some cases 🤔

Lennart Poettering Feb 10, 2025

@dvzrv right now verify=yes means gpg (specifically: SHA256SUMS signed with some key whose public key is baked into the initrd). We really want to get away from gpg though, hence I hope to add pkcs7 or so eventually, and maybe other stuff.
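
The checksum half of that scheme can be sketched as follows. This is a simplified illustration, not systemd's implementation: it assumes the usual `sha256sum` manifest layout (hex digest, whitespace, file name) and deliberately leaves out the gpg signature check on the manifest, which is the part that actually establishes trust. The function names are hypothetical.

```python
import hashlib


def parse_sha256sums(text: str) -> dict:
    """Map file name -> hex digest from sha256sum-style manifest text."""
    sums = {}
    for line in text.splitlines():
        if not line.strip():
            continue
        digest, name = line.split(maxsplit=1)
        # sha256sum marks binary-mode entries with a leading '*'.
        sums[name.lstrip("*")] = digest.lower()
    return sums


def matches_manifest(data: bytes, name: str, manifest: str) -> bool:
    """True if data hashes to the digest listed for name in the manifest."""
    expected = parse_sha256sums(manifest).get(name)
    return expected is not None and hashlib.sha256(data).hexdigest() == expected


# Stand-in data; a real image would be read from the downloaded file.
image = b"example image bytes"
manifest = f"{hashlib.sha256(image).hexdigest()}  image.raw\n"
print(matches_manifest(image, "image.raw", manifest))
```

Without a verified signature on the manifest itself, this only catches corruption in transit, not tampering.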

Lennart Poettering Feb 10, 2025

@dvzrv and of course: just use DDIs, i.e. signed verity-enabled disk images. way better security, and you simply don't have to bother about download-time verification, because you have something much better: continuous use-time verification.

David Runge Feb 10, 2025

@pid_eins looking forward to opening that up with VOA then 😅