“Invariant inversion” in memory-unsafe languages

One way of seeing the difference between memory-safe and memory-unsafe languages is that in a memory-safe language, the invariants used to uphold memory safety only “lean on” invariants that are enforced entirely by the language, compiler, and runtime, while in a memory-unsafe language the invariants used to uphold memory safety can “lean on” programmer-created (and thus programmer-breakable) invariants. This latter case can lead to a weird situation that I call “invariant inversion”, where code breaks a safe-looking logical invariant and ends up creating subtle memory unsafety issues.

PACIBSP security

@sha1lan Hi Shallan, some coworkers and I have been discussing this post. Nice job on it!

We do disagree with the conclusions in the blog post and some of the methodology.

In short: Rust would make this bug much harder to miss!

@sha1lan
If Rust had a `fread` function, it would accept a `&mut [u8]` as input. You can't ordinarily convert a `&mut DataSet` into a `&mut [u8]`. That implicit conversion to `void*` and passing it to an unrestricted-byte-writing function is where the bug lies in C++.

@sha1lan
In order to do that conversion, you either:

- Use `unsafe`, at which point reviewers' eyes can hyperfocus on whether all usage of the conversion is valid, or
- Use safe, statically-checked conversions like https://docs.rs/zerocopy/latest/zerocopy/trait.IntoBytes.html#method.as_mut_bytes, which would fail because this library knows that `&mut bool` cannot be safely converted into `&mut [u8]`.
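To make the point about `&mut [u8]` concrete, here is a minimal sketch using only the standard library. The `DataSet` struct and its four `bool` flags are hypothetical stand-ins for the type in the blog post, and `read_data_set` is a name invented for illustration. `std::io::Read::read_exact` only accepts a `&mut [u8]`, so the raw bytes land in a plain byte buffer and each one has to be validated before it becomes a `bool`:

```rust
use std::io::{self, Read};

// Hypothetical stand-in for the struct discussed in the post.
pub struct DataSet {
    pub flags: [bool; 4],
}

// Reads four bytes and converts them to bools, rejecting anything but 0 or 1.
// There is no safe way to hand `&mut DataSet` to `read_exact` directly.
pub fn read_data_set<R: Read>(reader: &mut R) -> io::Result<DataSet> {
    let mut raw = [0u8; 4];
    reader.read_exact(&mut raw)?;

    let mut flags = [false; 4];
    for (flag, byte) in flags.iter_mut().zip(raw) {
        *flag = match byte {
            0 => false,
            1 => true,
            other => {
                return Err(io::Error::new(
                    io::ErrorKind::InvalidData,
                    format!("invalid bool byte {other}"),
                ))
            }
        };
    }
    Ok(DataSet { flags })
}
```

Reading from a byte slice such as `&[1, 2, 0, 0]` returns an error here instead of producing a `bool` with an invalid bit pattern.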

@sha1lan
What I would do in practice is replace the `bool` array with an array of `#[repr(transparent)] struct OpenBool(u8);` and just accept any non-zero as true. Then I could manipulate the bytes safely with zerocopy and with minimal checks.
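A rough sketch of that idea, without the zerocopy derives (which would need the external crate). The accessor names `get` and `set` and the helper `load_flags` are assumptions made for illustration:

```rust
// A byte-sized boolean where every bit pattern is valid:
// zero is false, anything else is true.
#[repr(transparent)]
#[derive(Clone, Copy, Default)]
pub struct OpenBool(u8);

impl OpenBool {
    // Any non-zero byte counts as true, so no validation step is needed.
    pub fn get(self) -> bool {
        self.0 != 0
    }

    // Stores the canonical 0 or 1 representation.
    pub fn set(&mut self, value: bool) {
        self.0 = value as u8;
    }
}

// Hypothetical helper: wraps raw bytes (e.g. freshly read from a file)
// without any per-byte checks.
pub fn load_flags(bytes: [u8; 4]) -> [OpenBool; 4] {
    bytes.map(OpenBool)
}
```

Because every `u8` value is a valid `OpenBool`, overwriting the array with arbitrary bytes cannot create an invalid value, which is what lets a byte-level conversion be checked statically.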

@sha1lan

> in a memory-safe language, the invariants used to uphold memory safety only “lean on” invariants that are enforced entirely by the language, compiler, and runtime...

Rust sits at this interesting boundary where it has a checked superset of the language that is memory unsafe. This escape hatch can be combined with the type and safety system to create entirely *new* safety invariants that are programmer-designed but language-enforced. This is how indexing in a Vec works: inside, it does a bounds check against its length field, and then an unsafe memory access. It's sound since it's memory safe for all valid inputs. The designers of Vec know this because they designed the `data` pointer to have an invariant that it points to `capacity * size_of::<T>()` bytes of owned, aligned memory, of which the first `len` elements are initialized. Breaking that invariant outside of `Vec`'s definition itself requires `unsafe`.
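As a toy illustration of a programmer-designed but language-enforced invariant (this is not how `Vec` is actually implemented; the `Stack` type and its methods are made up for the example):

```rust
// A tiny fixed-capacity stack. The fields are private, so code outside
// this module cannot break the invariant without using `unsafe`.
pub struct Stack {
    items: [u32; 8],
    // Invariant: len <= items.len()
    len: usize,
}

impl Stack {
    pub fn new() -> Self {
        Stack { items: [0; 8], len: 0 }
    }

    // Returns false when the stack is full; preserves the invariant.
    pub fn push(&mut self, value: u32) -> bool {
        if self.len == self.items.len() {
            return false;
        }
        self.items[self.len] = value;
        self.len += 1;
        true
    }

    pub fn get(&self, index: usize) -> Option<u32> {
        if index < self.len {
            // SAFETY: index < len, and the invariant guarantees
            // len <= items.len(), so the access is in bounds.
            Some(unsafe { *self.items.get_unchecked(index) })
        } else {
            None
        }
    }
}
```

The unchecked access in `get` leans on the `len` invariant, but only code inside the module can touch `len`, so reviewers have a bounded amount of code to audit.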

@sha1lan In any case, I'm always happy to see more folks invested in memory safety. I'm glad to discuss further if you'd like. 🦀

(unrelatedly, I love your name!)

@kupiakos Thank you! And thanks for the detailed thoughts!

I think I basically agree with everything you said. I'm having a hard time seeing what in the post you disagree with, would you mind elaborating a bit? I certainly agree that Rust would make the bug harder to miss!