Looking at the reports of some systems failing to boot after the latest UEFI DBX update and wondering whether it's another case of https://mjg59.dreamwidth.org/22855.html
mjg59 | Samsung laptop bug is not Linux specific

We ended up writing some hilariously crappy workarounds for Linux to prevent this kind of thing, where firmware fails to boot if the UEFI variable store is too full. We check whether the write would leave under 5KB of free space (as reported by the firmware), and refuse the write if it would. Easy! Except some firmware would never actually increase the available space count if a variable was deleted, so after a while we'd just never be able to write any more variables
I can't remember how I figured this out, but the affected machines would trigger garbage collection if we tried to create a variable bigger than the available space, so Linux just tries to create a giant variable and then deletes it again to force the firmware to actually update the free space counter
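The trick described above can be sketched as a toy simulation. The 5KB threshold comes from the post; the firmware model, struct, and function names are all invented for illustration, and the oversized write attempt here stands in for the real kernel's create-a-dummy-variable-then-delete-it dance:

```c
#include <stdbool.h>

#define EFI_MIN_RESERVE (5 * 1024)  /* refuse writes leaving <5KB free */

/* Toy model of a buggy variable store: actual_free is the real free
 * space; reported_free is what the firmware advertises, and on the
 * affected machines it never goes back up when a variable is deleted. */
struct fake_firmware {
    int actual_free;
    int reported_free;
};

static bool fw_set_variable(struct fake_firmware *fw, int size)
{
    if (size > fw->reported_free) {
        /* An oversized request triggers garbage collection, which
         * finally resynchronizes the advertised free space. The
         * write itself still fails. */
        fw->reported_free = fw->actual_free;
        return false;
    }
    fw->actual_free -= size;
    fw->reported_free -= size;
    return true;
}

/* The workaround: when the advertised space looks too tight, attempt
 * an impossible write to kick the garbage collector, then re-check. */
static bool safe_set_variable(struct fake_firmware *fw, int size)
{
    if (fw->reported_free - size < EFI_MIN_RESERVE) {
        fw_set_variable(fw, fw->reported_free + 1); /* force GC */
        if (fw->reported_free - size < EFI_MIN_RESERVE)
            return false; /* genuinely out of space */
    }
    return fw_set_variable(fw, size);
}
```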
Fucking computers, man
@mjg59 Fucking UEFI. The ring -1 (or is it -42 now? does anyone even know or care?) shit nobody asked for.
@dalias @mjg59 UEFI and ACPI solve the problem of booting the same image on a wide variety of machines. From a distro point of view, that's a huge win. Whether it is worth the huge number of tradeoffs is another question.
@alwayscurious @mjg59 Device tree solves that problem. UEFI and ACPI both address a much larger-scope problem (which a lot of us don't want) of having a persistent layer under your trusted OS that you also have to trust, that continues execution after control was supposed to be passed to the OS, and that the OS is forced to interact with to access important functionality.
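For contrast, a device tree is pure data: the board file declares what hardware exists and where, and the OS's own drivers bind to it. A minimal illustrative fragment (board name, addresses, and values all hypothetical):

```dts
/dts-v1/;

/ {
    compatible = "acme,example-board";
    #address-cells = <1>;
    #size-cells = <1>;

    serial@10000000 {
        compatible = "ns16550a";      /* driver binds on this string */
        reg = <0x10000000 0x100>;     /* MMIO base and size */
        clock-frequency = <24000000>;
        interrupts = <10>;
    };
};
```

Unlike ACPI's AML, there is no code in this description to execute, or to keep trusting after boot.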
@dalias @mjg59 Device tree would be a solution if all of the board-specific code reached mainline, but it doesn't. See @mjg59's commentary on the subject.

@dalias @mjg59 If there was a way to force vendors to upstream all of their board support code, then device tree would be just as good as UEFI + ACPI for portability, but right now there is no such way.

I fully agree that UEFI + ACPI are security disasters and that device tree is much better in that regard, but it is also one of the reasons that one can boot Linux on x86 systems that were never intended to run it and usually have a lot of stuff work out of the box without someone having to write drivers first. I'm not aware of good solutions that are also economically feasible in the present market and regulatory environment.

@alwayscurious @mjg59 The only reason it "works" on x86 is that everyone's essentially running black box proprietary substrate drivers for a bunch of critical stuff.

In theory these could just put things in a working state then wipe themselves out and pass execution permanently to the real OS. But they don't.

@dalias @mjg59 the root cause of all of this is that hardware vendors don't write and upstream open source drivers. Even if somehow vendors could be compelled to do that, I'd rather Linux not be the upstream for all this code. Put it in a library that any OS can use.
@alwayscurious @mjg59 The root cause is that they don't properly document hardware interfaces. A proper document is worth way more than the garbage-quality drivers they write. Making the hardware work minimally then ends up being as simple as hard-coding a sequence of register writes.
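What "hard-coding a sequence of register writes" looks like, as a sketch: a hypothetical 16550-style UART whose registers are simulated as an array here so the snippet runs anywhere; on real hardware `REG()` would resolve to a fixed physical address, and the offsets and values would come straight from the datasheet.

```c
#include <stdint.h>

/* Simulated register file standing in for memory-mapped I/O. All
 * offsets and magic values below are illustrative, not a real part. */
static uint32_t regs[16];
#define REG(off) regs[(off) / 4]

enum { UART_DLL = 0x00, UART_FCR = 0x08, UART_LCR = 0x0c };

/* With a documented register map, minimal bring-up really is just a
 * fixed sequence of writes like this. */
static void uart_init(void)
{
    REG(UART_LCR) = 0x80; /* open the divisor latch */
    REG(UART_DLL) = 13;   /* baud divisor for ~115200 from a 24 MHz clock */
    REG(UART_LCR) = 0x03; /* 8 data bits, no parity, 1 stop; latch closed */
    REG(UART_FCR) = 0x01; /* enable FIFOs */
}
```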
@dalias @mjg59 what about documenting it in a machine-readable form that can be used to autogenerate a driver?
@alwayscurious @mjg59 Theoretically that would be lovely. But a sufficiently expressive form would essentially become a programming language/virtual machine (like ACPI) that doesn't actually express how to use the hardware to a human except "execute this code on the virtual machine and then the thing comes out, or have fun reverse engineering it if you actually want to know what's happening".
@dalias @mjg59 did you know that power management nowadays involves hard-realtime control loops that are run on a separate processor?

@alwayscurious @mjg59 That's good, it means it should operate independently with no control channels between the power management processor and the domain that contains any user data or code except a simple channel for setting power management parameters.

Right? [insert Padme meme]

@dalias @mjg59 Nah, that code should be open source and must be trusted. See Plundervolt for why.
@alwayscurious @mjg59 What attacks do you have in mind if it has no communication channels, and how would baked-in intentional breakage here be any different from baked-in intentional breakage in the cpu that you also can't see?
@dalias @mjg59 Fault injection, stealing crypto secrets via power analysis, maybe others. There is no difference between that and a CPU backdoor, but what is the advantage of moving this to a separate processor?

@alwayscurious @mjg59 I claim it's far weaker than a cpu backdoor because you can't target it. It doesn't have enough information to know when it wants to break things, and it doesn't have any channel to exfiltrate anything; it would have to break things in a way that causes the cpu malfunction to double as exfiltration.

Advantage of having it on a separate processor is that you get hard realtime without having a hard-realtime rootkit below ring 0 on the real cpu, where it *would* have access to all the context to mount attacks.

@dalias @mjg59 Ah, that makes sense. Also, it doesn't interfere with OS scheduling.

I do want power management to be under OS control but that requires major changes in OS architecture.