Proposing that C++ acknowledge that there are exactly 8 bits in a byte.
I’d love feedback, especially from embedded / DSP folks who think this is a terrible idea.
https://isocpp.org/files/papers/P3477R0.html
P3477R0: There are exactly 8 bits in a byte

@jfbastien How are we going to run C++26 code on our PDP-10s, then?
@miodvallat upgrade to PDP-11 😁
@jfbastien That's actually a downgrade (in word size).
@miodvallat it’s one more. It goes to 11.
@jfbastien I said word size, not marketing value!
@jfbastien You can only go up to 11 on a Spınal Tap workstation, not a Digital one.
@jfbastien @miodvallat Why mess around? Crank it up one more to PDP-12, and enjoy those lovely 12-bit words!!! And 12-bit address space, while you are at it!
@jfbastien
I'm no longer in embedded but the only situation I can think of is when you have some crazy PIC microcontroller which has 12/14 bit instructions but is an otherwise 8 bit processor and uses a Harvard architecture.
@dominikg @jfbastien I’ve worked on Unisys mainframes with 9-bit bytes.
@mwyman @dominikg cool story 😎
@jfbastien @dominikg got pretty used to reading octal at the time.
@jfbastien In a way, tying C/C++ together might make this decision worse. A world where the C standard doesn't support a lot of very standard DSP architectures seems unlikely or at least very odd, whereas a world where the same is only true for C++ doesn't actually seem that odd. However, I might just have an incorrect view about the penetration of C++ into those markets.
@zwarich I think C is unlikely to follow. @rcs can prove me wrong!
@jfbastien @zwarich I also think it is unlikely C will follow anytime soon
@jfbastien or all those pesky PDP-8 programmers
@scottearle they’re too busy saying “well actually CHAR_BIT is…” to notice the paper. I will free them of this burden and unleash them onto the world.
@jfbastien This sounds like it would disallow my totally serious architecture with a 63 bit byte and 64, 65, 66 and 67 bits for short, int, long and long long
@jfbastien @siracusa Are there embedded processors designed and made today that don’t use 8-bit bytes or do you mean people who for some reason have to keep supporting a chip from the 1970s?
@0x1ac @siracusa there are still processors made today with bytes that are not 8 bits.
@jfbastien I'd agree that the most recent CHAR_BIT=24 device I worked with, the Sigmatel STMP3500, and its 56000-based relatives, are not relevant to modern C++, nor vice versa. But there is no sane way to program those devices except in C, so the only sentence of your proposal that gives me pause is "Ideally, [the C and C++] committees would be aligned" -- it would not be ideal if C dropped support for such architectures, though I guess they can hardly _retroactively_ drop it from C99, C11 etc.
@TalesFromTheArmchair right it wouldn’t be old versions that change! That’s not a thing that would be doable. Only newer language versions
@jfbastien If someone asks how complicated the C++ community is, you can simply show them this and say “It’s 2024 and they haven’t even settled this yet.”
@jfbastien I don't see why the existence of int8_t guarantees the existence of int64_t. Certainly all modern compilers could implement int64_t by compiling it down to successive int8_t, but should they have to?
@jfbastien …ah, I see int_least64_t was already required, and it would be super unlikely for someone to omit int64_t but support int_least64_t. And even more unlikely to have int64_t but not int32_t.
@jfbastien but then how will i feel really clever when using i/CHAR_BIT, i%CHAR_BIT in yet another reimplementation of a bit set. years of useless standardeeze trivia down the drain
@aDot I put exactly this clever personality in the paper ☺️
Forever immortalized.
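For anyone who hasn't written it, the `i / CHAR_BIT`, `i % CHAR_BIT` idiom being joked about looks like this (a minimal sketch, not from the paper). With `CHAR_BIT` pinned at 8, these could just as well be `i >> 3` and `i & 7`:

```cpp
#include <climits>  // CHAR_BIT
#include <cstddef>
#include <vector>

// Minimal bit set using the classic i / CHAR_BIT, i % CHAR_BIT split to
// locate the byte holding bit i and the position of that bit within it.
class BitSet {
public:
    explicit BitSet(std::size_t bits)
        : bytes_((bits + CHAR_BIT - 1) / CHAR_BIT, 0) {}

    void set(std::size_t i) {
        bytes_[i / CHAR_BIT] |= static_cast<unsigned char>(1u << (i % CHAR_BIT));
    }
    bool test(std::size_t i) const {
        return bytes_[i / CHAR_BIT] & (1u << (i % CHAR_BIT));
    }

private:
    std::vector<unsigned char> bytes_;
};
```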
@jfbastien 12 bits is the new 8 bits 🙂
@jfbastien What about weirdass old systems, too? I guess that's similar to embedded.
@IceWolf they use C++26?

@jfbastien Good point, still though.

It also limits anyone wanting to build new Weird Computer Architectures.

@jfbastien Though, on the other paw, if specifying it gets compiler makers to stop making weird assumptions around it for ✨OPTIMIZATION✨...

@jfbastien but like, ideally the fix for that is more to get compiler makers to not do quite so much "it's undefined behavior so we can do whatever we want!", and less "you're not allowed to use C++ on wonky architectures". I would think.

But we've never even used C++ yet (only C), never ran into the weird UB things people complain about, and so yeah, no personal experience with this stuff.

@IceWolf compilers don't optimize on this. It's hard-coded pretty strongly in the frontend, so there's zero ambiguity about this value from the compiler's perspective.

@jfbastien Oh huh. That makes more sense; why specify it at the spec level, then? There's no benefit except for shafting weird architectures.

Unless there is a benefit somehow?

@IceWolf the "why bother?" part of the paper. I don't think it's world-shattering. Others think it is. So I wrote the paper.
@jfbastien I dunno if C++ has a hosted/freestanding distinction like C does, but if so, a very good compromise would be mandating an 8-bit byte for hosted. Then the DSP folks can do their freestanding thing. A full stdlib without an 8-bit byte is bs, IMO.
@dalias oh that’s a great point. Will add to the paper (yes C++ has this distinction).

@jfbastien As someone who's been doing C++ embedded development for quite a few years, I can't remember the last time I saw a non 8 bit byte.

Certainly not in the last decade.

Very much in favor of mandating CHAR_BIT == 8.

@jfbastien (for that matter I would also like to formalize that integers are twos complement and signed overflow wraps, which is how literally every mainstream ISA works anyway. It should not be UB)
@jfbastien Or, worst case, make it implementation defined and provide a standard mechanism to determine the actual behavior (saturate / wrap / trap) on overflow.
@azonenberg @jfbastien iirc integers are two's complement already, but signed overflow is still UB

@whitequark @jfbastien Yeah, and I am very much in favor of making the C++ standard reflect the realities of the 99.99% of hardware everyone actually uses, and not leave things UB because some architecture from the 1970s does it differently.

For example, defining that sizeof(&P) == sizeof(&Q) for any P and Q. And, as a consequence, allowing printf("%p") to be used directly on any pointer type.

Make the actual size implementation defined, by all means. As long as it's the same for any two pointer types.

@azonenberg @whitequark
Yeah I wrote that paper and gave a talk about the outcome https://youtu.be/JhUxIVf1qok?si=aSPEivvr84c27pVk
CppCon 2018: JF Bastien “Signed integers are two's complement”

@jfbastien @whitequark An opt-in overflowing integer type would be fine by me (i.e. "int32_overflow_t" or similar). As long as there's a way to do it in a well-defined manner when I want to.
@azonenberg @jfbastien @whitequark 1 for each of the different possible behaviors, IMO: int32_wrap_t, int32_sat_t, and int32_trap_t
@egallager @jfbastien @whitequark I would not be opposed to that. And then software emulate any that aren't natively supported by hardware (with some means of querying if this is being done).
@azonenberg @egallager @jfbastien soooo... Rust's integer types, more or less?

@whitequark @egallager @jfbastien Similar. It looks like in Rust saturation is a method (saturating_add) rather than a type "integer with saturating operations".

I don't have strong feelings on how it's done in C++ other than wanting a standard way to get overflowing, saturating, or trapping behavior on demand.

@azonenberg @whitequark @egallager @jfbastien @ and ^ have room for a saturating operator syntax, e.g. "a +^ b". But I'm concerned about how all of this interacts with integer promotion. A saturating type might be easier to define new promotion rules for? Or maybe "a +@16 b", i.e. explicit width?
@azonenberg @whitequark @jfbastien sizeof(&P) can be != sizeof(&Q) though.. member function pointers, overloaded operator& come to mind