Rust's unsafe has bothered me for a long time in a way that I finally managed to articulate. It's not really possible (or, at least, not easy) to do the right thing on non-#CHERI hardware, so it's not so bad, but on sensible ISAs it is.

There are really two core issues:

  • Unsafety is a property of code, not data.
  • Unsafety isn't binary.

Rust already has some nice examples of this in the standard library. UTF-8 strings in Rust can be:

  • Things where the type system guarantees that they are definitely sequences of bytes that you may choose to treat as UTF-8 (and it might even be sensible)
  • Things where they are definitely valid UTF-8 sequences (no null bytes in the middle, no invalid values, and so on).
  • Things where they are definitely valid UTF-8 sequences that have been normalised according to the Unicode canonicalisation rules.

These are all useful things for a type system to be able to assert. The first two are each slight relaxations of the type system guarantees of the latter two, which are essential for performance / interoperability. If you read a stream of bytes from a file descriptor, the only thing that you can guarantee is they are in the first state. If you read up to a null, you may be somewhere between the first and second states.

An unsafe object is one where some of the type-system guarantees may be weaker. On a non-CHERI system, it's one where every byte is some unspecified value. Rust's unsafe captures this because, once you have such an object and you interpret some of those bytes as pointers then all bets are off: the unsafety has propagated to the whole of the rest of the system. And this is why you shouldn't use computers.

On a CHERI system, you have memory safety from the ground up. An unsafe object has some guarantees. It has bytes that are either pointers or not-pointer types. You can then do validation: If the static type is an enumeration, is it in bounds? If a field is a pointer, is it a pointer to something of a valid size? If so, and it's a pointer type that has a vtable (or some other inline type information) is the dynamic type a subtype of the static type? If these are true, does recursively validating the types of the fields succeed?

With a system like that, you now have two kinds of unsafe object (not code block: ones where you can still make Rust's guarantees about concurrent mutation and so can do in-place validation to cast away the unsafe and ones where you can't guarantee no concurrent mutation and so can cast away the unsafe only by copying (otherwise you have TOCTOU issues).

This is a far more principled way of thinking about unsafety in a safe language, but it isn't really feasible in a language that wants to run on existing hardware. And this is why one of my talks at #PLISS last week was arguing that people who were interested in language design and implementation should co-design them with hardware.

My plan to spend #PLISS relaxing in the garden was foiled by all of the other speakers giving talks that I wanted to attend (for logistical reasons, I could not skip my own talk). Thanks to @ltratt and Jan Vitek for inviting me and Lucie Lerch for her incredible organisation.

I thought Jan's talk was a great finale, but Olaf (pictured below) didn't seem to agree. I guess he'd seen it a few times before.

So many interesting talks! #PLISS

Oh no I talked too soon

#PLISS