My annual plea for a thing: I want a type 1 hypervisor that just has a small isolated VM and then passes through the rest of the hardware to the main VM which runs Linux. The small VM is intended to be used to run small pieces of code that the main OS should not be able to interfere with. Does such a thing exist? (Think Xen, but with a Dom0 that can't see into DomUs)
@mjg59 you missed "in an ssh-agent".
Hafnium - Hafnium architecture

@mjg59 AVF with pKVM is also effectively this but the hypervisor is a split off part of the Linux kernel, so not exactly type 1.
@rinon @mjg59 It’s not like “type 1” versus “type 2” is a real technical distinction.
@rinon @mjg59
how about this qualcomm gunyah thing?
https://github.com/quic/gunyah-hypervisor
GitHub - quic/gunyah-hypervisor: Gunyah is a Type-1 hypervisor designed for strong security, performance and modularity.

@rinon That's broadly what I want, but is ARM only
@mjg59 doesn't quite sound like what Qubes OS is doing
@mjg59 sounds like something you'd need Secure Encrypted Virtualization for https://www.amd.com/en/developer/sev.html
@hyc No, once you're in SEV-land you're not really in a good place to do hardware passthrough
@mjg59 hm, that's a tough one then, maintaining isolation.
@hyc I'm fine with the hypervisor being able to see what's happening in arbitrary guests, but there needs to be isolation between the primary VM and the security VM (Hyper-V manages this fine in Windows land)
@mjg59 @hyc does one know how it manages this? Does it just pretend?
@fl0_id @hyc it's a hypervisor, it simply imposes a barrier between the resources? This isn't a conceptually complicated situation, modern CPUs support it just fine
@mjg59 @hyc sure, but I just meant if the hv can technically see into all guests, who enforces the rules for security vm? The cpu or the hv or both? If the hv, this is likely more easily overridden.
@mjg59 @hyc
Curious: what kind of hardware should the security VM need to access?
(I can only guess TPM? For state bootstrap or something?)

@baloo @mjg59 @hyc

I suspect this is a continuation of the fingerprint issue Matthew was writing about a couple of months(?) ago.

EDIT: This post https://nondeterministic.computer/@mjg59/111456696748600420

Matthew Garrett (@mjg59@nondeterministic.computer)

https://blackwinghq.com/blog/posts/a-touch-of-pwn-part-i/ is some very nice research, with some terrifying takeaways: 1) Microsoft developed a secure communications path between the OS and any biometric devices 2) One vendor used the same backing store for both the secure and insecure path, allowing enrollment of fingerprints via the insecure path that were then trusted in the secure path 3) Another vendor used their own fucked up TLS-based implementation rather than the Microsoft one 4) *Microsoft* didn't use their own protocol

Nondeterministic Computer
@baloo @hyc Potentially the TPM, but otherwise nothing - just CPU, RAM, and some sort of simple inter-VM communication channel.

@mjg59 @hyc
I know you already dismissed SEV, but https://github.com/project-oak/oak seems vaguely related?

This is a VM inside the main OS, but the binary inside the TEE is available over grpc.

GitHub - project-oak/oak: Meaningful control of data in distributed systems.

@baloo @hyc Right, you can do it the other way around with SEV, but that then leaves you with very restricted hardware support at the moment
@mjg59 @hyc yeah definitely. You will need a piece of code in the main os to make the bridge for any hardware resource you might need.
@mjg59 @hyc Why can you not use SEV-SNP for the security VM, with the main OS running directly on the bare metal?
@mjg59 @hyc Ah, you want to carve the TPM away from the main OS?
@noodles @hyc Some form of secret manager, at least
@noodles @hyc SEV is pretty much exclusive to server parts, and I have a laptop
@mjg59 sounds pretty close to Jailhouse?
@agraf My recollection is that Jailhouse does static partitioning and no scheduling, ie you need to give it a CPU? It also starts from Linux which makes it harder to sequester secrets that Linux can't get at.
@mjg59 I'm not sure how much both of these are embedded into its architecture or just artifacts of how its main users consume it.
@agraf I'm pretty sure the lack of scheduling is a design choice that would need to be retrofitted. Launching from Linux is more about how it's managed, so that's probably an easier thing to fix.
@mjg59 true, it doesn't seem to support any scheduling at all. That said, I'd expect a simple round robin scheduler may not be super difficult to implement. Either way, not an off the shelf solution for your use case.
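
For illustration only, the kind of simple round-robin scheduling mentioned above can be sketched in a few lines. This is not Jailhouse code. The vCPU labels and the `round_robin` function are hypothetical names, and a real hypervisor scheduler would also have to deal with timer interrupts, blocking, and per-CPU run queues.

```python
from collections import deque


def round_robin(vcpus, slices):
    """Return the order in which vCPUs would receive time slices.

    vcpus:  labels of runnable vCPUs (hypothetical names)
    slices: total number of time slices to hand out
    """
    queue = deque(vcpus)
    order = []
    for _ in range(slices):
        if not queue:
            break                  # nothing runnable
        current = queue.popleft()  # take the vCPU at the head of the queue
        order.append(current)      # "run" it for one slice
        queue.append(current)      # put it back at the tail
    return order


if __name__ == "__main__":
    # Two vCPUs of the main VM share one physical CPU with the security VM.
    print(round_robin(["main-vm/0", "main-vm/1", "security-vm/0"], 5))
```

The point of the sketch is only that the policy itself is small. The harder part of retrofitting it is saving and restoring guest state on a CPU that was previously handed to a single partition.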
@mjg59 is this like, kind of a secure enclave/hsm-equiv situation you're looking for?
@munin Yeah, like Windows does with Credential Guard
@mjg59 @munin Which is based on their Krypton minimal hypervisor.
@mjg59 there's work in progress by @l0kod but don't think it's merged yet: https://lore.kernel.org/all/2024050313[email protected]/
[RFC PATCH v3 0/5] Hypervisor-Enforced Kernel Integrity - CR pinning - Mickaël Salaün

@bluca @mjg59 @l0kod

It always somewhat amazes me that something that reads like it is super complicated is accomplished with a couple hundred lines of code.

@bluca @l0kod Not quite the same - you still have Linux with the ability to see everything, I think?
@mjg59 @bluca kind of, Heki is the equivalent of Windows's Virtualization Based Security (foundation of Credential Guard and other security mechanisms) for Linux (with KVM or Hyper-V). The host/VMM is part of the TCB like the hypervisor, but the Linux guest VM requests the hypervisor to protect itself (guest). For now this is only CR-pinning (v3) and memory permissions (v2). We could probably implement the same mechanism with Jailhouse, but that would remove a lot of VM use cases
@mjg59 Like Proxmox? Or maybe I have it backwards.
@mjg59 maybe check on kata and firecracker.
These are container engines and not really made for your use case, but they do run a minimal Linux system, and then run your applications in isolated mini VMs.
Maybe some of their tech can be adapted

@mjg59 I don't think it's in a usable state yet (at least for x86 hosts, according to their FAQ), but I think seL4-as-hypervisor would fit the bill otherwise from my understanding

cf. https://docs.sel4.systems/projects/sel4/frequently-asked-questions.html#how-good-is-sel4-at-supporting-virtual-machines
& https://sel4.systems/About/seL4-whitepaper.pdf

Frequently Asked Questions on seL4 | seL4 docs

@mjg59 So Qubes one step further?
@mjg59 Sorta like what m1n1 does for #asahilinux ? https://github.com/AsahiLinux/m1n1
Commits · AsahiLinux/m1n1

A bootloader and experimentation playground for Apple Silicon - Commits · AsahiLinux/m1n1


@mjg59 So basically "a programmable HSM" like a less-locked-down version of apple's secure enclave? I honestly think trying to achieve secure isolation on the same CPU as the rest of the OS is a fool's game, and the only way to ensure isolation is to physically isolate things onto independent cores via a mailbox interface.

(I've wanted something similar for literally ever...)

@becomethewaifu Hypervisors are "good enough", given that we haven't seen multi-tenant cloud turn into a complete disaster
@mjg59 a concept like SGX enclaves / LSASS isolation but actually accessible and convenient to use would be very nice.
@mjg59 acrn? https://projectacrn.org but also I think xen can also do this.
Home - Project ACRN™

@mjg59 It’s more or less how I use/dogfood Muen¹. What probably makes it unusable in your case is that it relies on static partitioning and is highly dependent on the target hardware.
Would you mind elaborating a bit what the rest of your envisioned system looks like?
__
¹ https://muen.sk
Muen | SK for x86/64

@mjg59 To be sure I understand, you want a small VM and a big VM. The big VM gets all the hardware minus what’s needed to run the hypervisor and the small VM. Communications between the big VM and the small VM are strictly controlled in both directions such that neither can interfere with the other.

What sort of thing are you trying to do with this small VM?

This sounds kind of like what a TPM is for, or maybe a BMC/SMC/LOM.

@bob_zim Manage secrets in ways that the TPM can't (eg, the TPM can't establish a secure communications channel with a biometric reader)

@mjg59 So the small VM would own the physical link to the biometric reader, then provide its own attestation about the biometric reader’s attestation that it was presented with an authentic biometric?

Hmm. I’m not sure I know of a way to do that in software. Decent biometric readers should already use asymmetric keys, though. It should be possible to get a secure element like a TPM or smart card to only unlock a stored key when presented with a valid signature from the reader’s private key.

@bob_zim No need for a physical link (eg, TLS is secure without you having to trust the physical link, modern biometric devices implement equivalent functionality). It is not possible to use a TPM in this way given the hardware that exists.
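
As a rough sketch of the idea being discussed here (release a stored key only when the reader presents a valid, fresh attestation, without trusting the link in between), consider the following. It rests on loud assumptions: `READER_KEY`, `reader_attest`, and `KeyStore` are hypothetical names, and an HMAC with a shared key stands in for the reader's asymmetric signature because the Python standard library has no signature primitives. It does not describe any real TPM or reader protocol.

```python
import hashlib
import hmac
import secrets

# Assumption: a shared symmetric key stands in for the reader's key pair.
READER_KEY = secrets.token_bytes(32)


def reader_attest(nonce: bytes, matched: bool) -> bytes:
    """Hypothetical reader: authenticates its match result over a fresh nonce."""
    result = b"match" if matched else b"no-match"
    return hmac.new(READER_KEY, nonce + result, hashlib.sha256).digest()


class KeyStore:
    """Hypothetical key holder: releases a key only on a valid attestation."""

    def __init__(self, reader_key: bytes, stored_key: bytes):
        self._reader_key = reader_key
        self._stored_key = stored_key
        self._nonce = None

    def challenge(self) -> bytes:
        # A fresh nonce per attempt prevents replay of an old attestation.
        self._nonce = secrets.token_bytes(16)
        return self._nonce

    def unlock(self, attestation: bytes):
        if self._nonce is None:
            return None  # no outstanding challenge
        expected = hmac.new(
            self._reader_key, self._nonce + b"match", hashlib.sha256
        ).digest()
        self._nonce = None  # each nonce is usable once
        if hmac.compare_digest(expected, attestation):
            return self._stored_key
        return None


if __name__ == "__main__":
    store = KeyStore(READER_KEY, b"disk-encryption-key")
    nonce = store.challenge()
    print(store.unlock(reader_attest(nonce, True)) is not None)
```

Because the attestation covers a nonce chosen by the key holder, whatever carries the messages (here, the untrusted main OS) can drop them but cannot forge or replay them.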

@mjg59 That’s what I’m getting at: decent biometric readers should already use asymmetric keys. You may not be able to hook that directly to an off-the-shelf TPM, though I thought some had firmware allowing them to trust external public keys for exactly this reason. Might require writing custom RoT firmware like Oxide has done.

A guest can never keep a secret from the hypervisor under which it runs. The host always has full control over the guest, including the ability to inspect and change stack frames. At that point, you’re guaranteed to have a single piece of software which can get at both the clear key material from the small VM’s RAM and the data the key controls from the big VM.

Assuming that’s what you’re trying to prevent, I don’t know of any software system I would trust to provide sufficient isolation between the guests, even at a theoretical level. Computer-in-a-computer stuff like a TPM is it.

@bob_zim I can't rewrite the firmware for my TPM, so that's not a viable approach. I also trust the hypervisor. What I want is to not trust Linux.
@mjg59 @bob_zim The goal is not to hide secrets from the hypervisor, but for the small VM to hide secrets from the big VM, using the trusted hypervisor.
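
To make that goal concrete, here is a minimal model of it, in the spirit of the ssh-agent comparison earlier in the thread: the small VM holds a key and answers narrow requests, and the big VM only ever sees the results. `SecurityVM` and the `MAC:` message format are hypothetical names invented for this sketch. In Python nothing actually stops the caller from reading `_key`; in the real design that barrier would be the hypervisor's memory isolation.

```python
import hashlib
import hmac
import secrets


class SecurityVM:
    """Hypothetical stand-in for the small VM: holds a key, exposes one operation."""

    def __init__(self):
        # The key is generated inside the small VM and is never sent out.
        self._key = secrets.token_bytes(32)

    def handle(self, request: bytes) -> bytes:
        """Process one message from the main VM and return the reply."""
        # The only request understood is b"MAC:" followed by a payload.
        if not request.startswith(b"MAC:"):
            return b"ERR:unsupported"
        payload = request[len(b"MAC:"):]
        tag = hmac.new(self._key, payload, hashlib.sha256).digest()
        return b"OK:" + tag


if __name__ == "__main__":
    vm = SecurityVM()
    # The main VM can have data authenticated...
    print(vm.handle(b"MAC:login-request").startswith(b"OK:"))
    # ...but there is no request that returns the key itself.
    print(vm.handle(b"GETKEY"))
```

Keeping the request format this narrow is the design choice that matters: the main VM can use the secret but has no message that extracts it.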
@nicolas17 @mjg59 Then just use dom0 instead of the small VM. That’s easy. The hypervisor can keep secrets from a guest. That’s an obvious solution, though, which is why I wanted to clarify the requirements.
@bob_zim @nicolas17 But then it's massively harder to plumb all my hardware (including ACPI) into a domu
@mjg59 @nicolas17 So is the goal something like a software secure element for workstation use?