So... codex wrote a qcow2 block device driver for the kernel in Rust. It's at least 10% faster than nbd/qemu-nbd mounts. It did so in about one day of prompting. All xfstests pass across a variety of image settings, and it supports backing images.
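
(To give a sense of what a driver like this has to handle, here's a minimal userspace sketch of parsing the fixed qcow2 header - the field offsets come from the published qcow2 spec, but the code itself is purely my illustration, not the driver's. A nonzero backing_file_offset is what wires up backing images.)

```rust
use std::convert::TryInto;
use std::fs::File;
use std::io::{self, Read, Seek, SeekFrom};

const QCOW2_MAGIC: u32 = 0x514649fb; // "QFI\xfb"

/// Fixed part of the qcow2 v2/v3 header; all on-disk fields are big-endian.
#[derive(Debug)]
struct Qcow2Header {
    version: u32,
    backing_file_offset: u64, // 0 when the image has no backing file
    backing_file_size: u32,
    cluster_bits: u32,        // cluster size is 1 << cluster_bits
    size: u64,                // virtual disk size in bytes
    l1_table_offset: u64,     // first level of the two-level cluster map
}

fn read_header(f: &mut File) -> io::Result<Qcow2Header> {
    let mut buf = [0u8; 48];
    f.seek(SeekFrom::Start(0))?;
    f.read_exact(&mut buf)?;

    let be32 = |o: usize| u32::from_be_bytes(buf[o..o + 4].try_into().unwrap());
    let be64 = |o: usize| u64::from_be_bytes(buf[o..o + 8].try_into().unwrap());

    if be32(0) != QCOW2_MAGIC {
        return Err(io::Error::new(io::ErrorKind::InvalidData, "not a qcow2 image"));
    }
    Ok(Qcow2Header {
        version: be32(4),
        backing_file_offset: be64(8),
        backing_file_size: be32(16),
        cluster_bits: be32(20),
        size: be64(24),
        l1_table_offset: be64(40),
    })
}

fn main() -> io::Result<()> {
    // Hypothetical image path, purely for illustration.
    let mut f = File::open("disk.qcow2")?;
    let h = read_header(&mut f)?;
    println!("{:?} (cluster size {} bytes)", h, 1u64 << h.cluster_bits);
    Ok(())
}
```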

I haven't touched a single line of code.

I'm really not sure how to feel about this 😔

@vegard I'd want to check that you're comparing the same caching, writeback, and ordering semantics; in some of the more careful cache modes the QEMU code is careful not to tell the guest a write is complete until it has reached the host OS or the physical disk (which one depends on the cache mode), and that can be a source of some perf differences. Still, if it's actually 10% faster, finding out why would be interesting and worth telling @stefanha!
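
To make that concrete, a userspace sketch (my illustration, nothing from either implementation): a write acknowledged once it's in the page cache returns far faster than one that isn't acknowledged until fsync says it's on stable storage, and if the two sides of a benchmark differ on this, a 10% gap is easy to manufacture.

```rust
use std::fs::OpenOptions;
use std::io::Write;
use std::time::Instant;

fn main() -> std::io::Result<()> {
    // Hypothetical scratch file, just for timing the two completion policies.
    let mut f = OpenOptions::new()
        .create(true)
        .write(true)
        .open("/tmp/flush-demo.dat")?;
    let buf = vec![0u8; 64 * 1024];

    // "Writeback"-style semantics: report completion once the data
    // is in the host page cache.
    let t = Instant::now();
    f.write_all(&buf)?;
    println!("write only:    {:?}", t.elapsed());

    // "Writethrough"-style semantics: don't report completion until
    // the data is on stable storage - what the careful modes do.
    let t = Instant::now();
    f.write_all(&buf)?;
    f.sync_all()?; // fsync(2)
    println!("write + fsync: {:?}", t.elapsed());
    Ok(())
}
```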
@penguin42 @vegard qcow2 has been implemented in the kernel in C before. It wasn't merged because there wasn't a community around it with an interest in maintaining the code.
@penguin42 @vegard QEMU has a bunch of storage features - snapshots, backups, incremental backups, live storage migration - that are exposed through the QMP monitor protocol. Integrating a kernel qcow2 driver with QEMU would require a bunch of new ioctl/io_uring/netlink interfaces, and it wouldn't be portable to other host OSes. The qcow2-in-the-kernel approach hasn't taken hold, but it would be nice if it does in the future.
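
To show what that surface looks like from the outside, here's a minimal QMP client sketch - the socket path, device name, and snapshot path are made up for illustration, but qmp_capabilities and blockdev-snapshot-sync are real QMP commands:

```rust
use std::io::{BufRead, BufReader, Write};
use std::os::unix::net::UnixStream;

fn main() -> std::io::Result<()> {
    // Assumes QEMU was started with something like:
    //   -qmp unix:/tmp/qmp.sock,server=on,wait=off   (hypothetical path)
    let stream = UnixStream::connect("/tmp/qmp.sock")?;
    let mut reader = BufReader::new(stream.try_clone()?);
    let mut writer = stream;
    let mut line = String::new();

    // QMP sends a greeting on connect, and capabilities must be
    // negotiated before any other command is accepted.
    reader.read_line(&mut line)?;
    writer.write_all(br#"{"execute": "qmp_capabilities"}"#)?;
    writer.write_all(b"\n")?;
    line.clear();
    reader.read_line(&mut line)?;

    // Take an external snapshot - one of the features that lives on the
    // QEMU side of the fence, not in any kernel qcow2 driver. "drive0"
    // and the snapshot file path are hypothetical.
    writer.write_all(br#"{"execute": "blockdev-snapshot-sync", "arguments": {"device": "drive0", "snapshot-file": "/tmp/snap.qcow2", "format": "qcow2"}}"#)?;
    writer.write_all(b"\n")?;
    line.clear();
    reader.read_line(&mut line)?;
    println!("{}", line.trim_end());
    Ok(())
}
```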
@penguin42 @vegard Regarding performance, there is also the QEMU FUSE export (it does not use NBD). Brian Song is currently working on FUSE-over-io_uring support, and it would be interesting to compare the performance:
https://lore.kernel.org/qemu-devel/202[email protected]/
[PATCH v4 0/7] add fuse-over-io_uring support - Brian Song
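
In the meantime, a crude way to compare export paths is to stream the same image through each and time it. The paths below are hypothetical, and a serious comparison should use fio with O_DIRECT so the page cache doesn't hide the difference - but as a sketch:

```rust
use std::fs::File;
use std::io::Read;
use std::time::Instant;

// Hypothetical paths: the same qcow2 image exposed via qemu-nbd
// and via the QEMU FUSE export (whose mountpoint is a single file).
const PATHS: &[&str] = &["/dev/nbd0", "/mnt/image.fuse"];

fn main() -> std::io::Result<()> {
    let mut buf = vec![0u8; 1 << 20]; // 1 MiB sequential reads
    for path in PATHS {
        let mut f = File::open(path)?;
        let mut total = 0u64;
        let t = Instant::now();
        loop {
            let n = f.read(&mut buf)?;
            if n == 0 { break; }
            total += n as u64;
        }
        let secs = t.elapsed().as_secs_f64();
        println!("{}: {:.1} MiB/s", path, total as f64 / (1024.0 * 1024.0) / secs);
    }
    Ok(())
}
```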