RFC: Enforcing Bounds Safety in C (-fbounds-safety)

Summary: We propose -fbounds-safety, a C extension to enforce bounds safety to prevent out-of-bounds (OOB) memory accesses, which remain a major source of security vulnerabilities in C. -fbounds-safety aims to eliminate this class of bugs by turning OOB accesses into deterministic traps. The -fbounds-safety extension offers bounds annotations that programmers can use to attach bounds to pointers. For example, programmers can add the __counted_by(N) annotation to parameter ptr, indicating that t...

LLVM Discussion Forums

@fay59 @regehr

Is that also supposed to support C99 VLA-style function parameters (`void fun(size_t n, float arr[n])`) without further annotations? Would be useful, I guess?

@Doomed_Daniel @regehr it does interpret this syntax as a pointer with the counted_by annotation!
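To make the reply concrete, here is a minimal sketch that assumes the behavior described above: the C99 array-parameter form and the explicit `__counted_by(n)` form should give the callee the same bounds. The names `sum_vla` and `sum_counted` are hypothetical. The fallback macro is also an assumption: with -fbounds-safety the real annotation is expected to come from `<ptrcheck.h>`, and the fallback only lets the sketch compile as plain C.

```c
#include <stddef.h>

/* Assumption: when building with -fbounds-safety, include <ptrcheck.h>
   first for the real annotation. This fallback only lets the sketch
   compile as plain C, where it checks nothing. */
#ifndef __counted_by
#define __counted_by(N)
#endif

/* Hypothetical helper using C99 array-parameter syntax: `arr` is really a
   `float *`, and per the reply above -fbounds-safety treats `n` as its count. */
float sum_vla(size_t n, float arr[n]) {
    float total = 0.0f;
    for (size_t i = 0; i < n; i++)
        total += arr[i]; /* under -fbounds-safety: checked against 0..<n */
    return total;
}

/* Hypothetical helper spelling the same bounds explicitly. */
float sum_counted(size_t n, float *__counted_by(n) arr) {
    float total = 0.0f;
    for (size_t i = 0; i < n; i++)
        total += arr[i];
    return total;
}
```

Called on a three-element array, for example `sum_vla(3, v)`, both versions should behave the same; a caller that passes a count larger than the array it hands over is where -fbounds-safety would be expected to trap.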
@fay59 @regehr
Wasn't clear to me, but I only skimmed the post (and searched for "VLA"), so maybe I missed it. Either way, I'm glad to hear that it's supported!

@Doomed_Daniel @fay59 @regehr This will be extremely useful for future standardization because I have a slew of proposals I need to write to support sized annotations for parameters using `static` and friends with `void*` pointers, so we can get byte-level safety that can be automated by -fbounds-safety:

void* memcpy( void restrict dest[static count], const void restrict src[static count], size_t count );

and similar crimes.
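For comparison, arrays of `void` such as `void dest[static count]` are not valid C today, since an array's element type must be complete. A rough approximation of the byte-level intent with the proposed extension is `__sized_by`, the RFC's byte-count counterpart to `__counted_by`. This sketch is hypothetical: `copy_bytes` is an illustrative wrapper, `count` is moved first so it is in scope for the annotations that use it, and the fallback macro is an assumption (under -fbounds-safety the real one is expected to come from `<ptrcheck.h>`).

```c
#include <stddef.h>
#include <string.h>

/* Assumption: when building with -fbounds-safety, include <ptrcheck.h>
   first for the real annotation. This fallback makes it a no-op. */
#ifndef __sized_by
#define __sized_by(N)
#endif

/* Hypothetical wrapper: both buffers are declared to span at least `count`
   bytes. Under -fbounds-safety, passing a shorter buffer is expected to
   trap at the call instead of overrunning inside memcpy. */
void *copy_bytes(size_t count, void *__sized_by(count) dest,
                 const void *__sized_by(count) src) {
    return memcpy(dest, src, count);
}
```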

@thephd @fay59 @regehr

this sounds pretty neat :)

thanks for making C better!

@thephd @regehr @fay59 @Doomed_Daniel we have Anil Madhavapeddy’s bounds checker for GCC 3, which can do precisely that. For example:
z-archive-cvs/string.h,v at a0f4b088e8b05b263f9714e6a8784bab03014c54 · MirBSD/z-archive-cvs

Arctic archival of /var/anoncvs/cvs/ (current repository) - z-archive-cvs/string.h,v at a0f4b088e8b05b263f9714e6a8784bab03014c54 · MirBSD/z-archive-cvs

GitHub
@mirabilos @thephd @regehr @fay59
makes you wonder why that wasn't mainlined in GCC back then
@regehr @Doomed_Daniel @fay59 @thephd probably FSF politics, though they did integrate the other large patch (ProPolice) somewhat.

@fay59 Bikeshedding: I dislike that `__single` implicitly allows null where most of the others do not. Did you consider a non-null `__single` alongside a new `__single_or_null`?

How do these interact with the existing nullability attributes such as `nonnull`, `_Nonnull`, and `_Nullable`? (Examples: Does `__single` + `nonnull` optimize away null checks? Does `__counted_by` + `_Nullable` convert to `__counted_by_or_null`, or generate a compiler error?)

@gparker you’re right that __single isn’t the same as counted_by(1) in that regard; it’s effectively counted_by_or_null(1). It’s probably best for me to keep design questions and answers on the Discourse thread, though, because that makes it easier for everyone else to see what’s going on.

(Even if you don’t bring it up yourself, I’m almost certain that somebody will)
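A small sketch of what this implies for callees, assuming `__single` really does behave like `counted_by_or_null(1)` as stated above: a function taking a `__single` parameter still has to handle null, and index 0 is the only element it may touch. `value_or` is a hypothetical helper, and the fallback macro is an assumption (under -fbounds-safety the real one is expected to come from `<ptrcheck.h>`).

```c
#include <stddef.h>

/* Assumption: when building with -fbounds-safety, include <ptrcheck.h>
   first for the real annotation. This fallback makes it a no-op. */
#ifndef __single
#define __single
#endif

/* Hypothetical helper: returns *p, or `fallback` when p is null. */
int value_or(const int *__single p, int fallback) {
    if (p == NULL)
        return fallback; /* null is a legal __single value */
    return *p;           /* p[0] is the only in-bounds access */
}
```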

@fay59 it's missing a mention of this, which is already used throughout the Windows SDK https://learn.microsoft.com/en-us/cpp/code-quality/understanding-sal?view=msvc-170
Understanding SAL


@fay59 Couple of questions:

* Are bounds dynamic expressions like VLAs? Can I write __counted_by(2 * size)?
* Can I split an allocation?
* Performance?
@saagar
* you cannot dereference pointers (with exceptions for out-pointer parameters) or call functions in count expressions, because we don’t create storage for them (whereas VLAs get a hidden size variable) and we need to be able to evaluate them multiple times without side effects. “Simple” expressions (like 2*size) are fine
* yes, if you make an array of 200 elements and assign the two halves to counted_by(100) pointers, you can’t underflow/overflow from one half to the other.
* very variable, and better discussed on Discourse. I can write (new) code that has all the right patterns such that *all* the implicit checks are optimized away, but that’s rarely the case in existing code that adopts -fbounds-safety
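The first two answers can be sketched together, assuming they hold as stated: a count can be a simple expression over another field, such as `2 * npairs`, and two slices of one allocation each carry their own bounds. `struct pairs`, `sum_slice`, and `sum_halves` are hypothetical names, and the fallback macro is an assumption (under -fbounds-safety the real one is expected to come from `<ptrcheck.h>`).

```c
#include <stddef.h>

/* Assumption: when building with -fbounds-safety, include <ptrcheck.h>
   first for the real annotation. This fallback makes it a no-op. */
#ifndef __counted_by
#define __counted_by(N)
#endif

/* Hypothetical type holding (x, y) pairs: the count is simple arithmetic
   on a sibling field, with no calls or dereferences. */
struct pairs {
    int *__counted_by(2 * npairs) coords;
    size_t npairs;
};

/* Sums one slice; under -fbounds-safety, accesses are checked against 0..<n. */
long sum_slice(size_t n, const int *__counted_by(n) slice) {
    long total = 0;
    for (size_t i = 0; i < n; i++)
        total += slice[i];
    return total;
}

/* Splits one 200-element array into two 100-element views. Each view
   carries its own bounds, so neither can index into the other half. */
long sum_halves(void) {
    int storage[200];
    for (int i = 0; i < 200; i++)
        storage[i] = i;
    return sum_slice(100, storage) + sum_slice(100, storage + 100);
}
```

Per the first answer, a count that calls a function or dereferences a pointer (outside the out-parameter exception) would be rejected, while simple arithmetic like the one in `struct pairs` is accepted.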

@fay59 So glad this is hitting the shelves, great job to everyone involved!!

NOW I'M GONNA STANDARDIZE STUFF SO MUCH HARDER FOR SAFETY LET'S GOOOOOOOOOOOOOOOOOOOOO

@thephd @fay59 going all hulk hogan on C safety

OOOH YEAHHH

@thephd @fay59 we're trying to figure out whether this supports array lists (or whatever the C equivalent of Rust's Vec is), and we don't think we understand it...

does it just... not handle those at all?

@SoniEx2 @thephd @fay59 C as a language does not have any form of resizable array at all. C really doesn’t have anything except for pointers and fixed size arrays. When you build your own type, you can add appropriate bounds safety annotations.
@porglezomp @thephd @fay59 what would those look like?
@SoniEx2 @porglezomp @thephd I’m not fluent enough in Rust to be sure what you mean, and I’m not sure you’re fluent enough in C to understand the situation? C’s way of storing elements of the same type contiguously is the array, and if you want it to be resizable, you use bare pointers and malloc/realloc/free as needed.
@fay59 @porglezomp @thephd so... you don't get bounds checking on (resizable) pointers?
@SoniEx2 @porglezomp @thephd C doesn’t bounds check anything on its own, which is why we came up with this language extension.
@fay59 @porglezomp @thephd yeah we understand that, but you're saying you can't use this with a resizable pointer? that seems unfortunate, we were really hoping you could...
@SoniEx2 @porglezomp @thephd the proposed extension does bounds check things coming out of malloc/calloc/etc

@fay59 @porglezomp @thephd we saw that but we're confused about how this would work with something like

struct growbuf {
    int *data;
    int size;
};

where all you have is a struct growbuf *.

(not really a proper size field but we digress)

how do you update the pointer and size when growing?

@SoniEx2 @fay59 @thephd you’d write that type as

struct growbuf {
    int *__counted_by(size) data;
    int size;
};

and then it’s able to check that your grow function correctly updated the pointer and size in sync when you reallocate.

@SoniEx2 @porglezomp @thephd the data field needs to be spelled “int *__counted_by(size) data”: without the annotation, you can only dereference index 0. If you update either the data or size field, you also need to update the other at the same time (without side effects between the two assignments). There is a bounds check at assignment that verifies the assigned pointer has at least size elements. When you use the data pointer, the index is bounds checked against 0..<size.
@fay59 @porglezomp @thephd ooh so you can just update them one by one with normal assignments? that's cool!
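A minimal sketch of the grow function this exchange describes, assuming the rules stated above: the pointer and its count are assigned back to back with no side effects between them, so -fbounds-safety can check that the new pointer covers `size` elements. `growbuf_resize` is a hypothetical helper, `realloc` stands in for whatever allocator real code uses, and the fallback macro is an assumption (under -fbounds-safety the real one is expected to come from `<ptrcheck.h>`).

```c
#include <stdlib.h>

/* Assumption: when building with -fbounds-safety, include <ptrcheck.h>
   first for the real annotation. This fallback makes it a no-op. */
#ifndef __counted_by
#define __counted_by(N)
#endif

struct growbuf {
    int *__counted_by(size) data;
    int size;
};

/* Hypothetical helper: resizes `buf` to `new_size` elements.
   Returns 0 on success, or -1 on failure with `buf` left unchanged. */
int growbuf_resize(struct growbuf *buf, int new_size) {
    if (new_size <= 0)
        return -1;
    int *p = realloc(buf->data, (size_t)new_size * sizeof *p);
    if (p == NULL)
        return -1;
    /* Pointer and count are updated together with nothing in between;
       under -fbounds-safety this is where p is checked to hold at least
       new_size ints. */
    buf->data = p;
    buf->size = new_size;
    return 0;
}
```

Keeping both failure paths before either assignment means a failed resize leaves `data` and `size` consistent with each other.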