today i learned that if you don't end a C file with a newline, the compiler is free to steal your apes
@wingo it's really funny isn't it
@whitequark very serious language

@wingo @whitequark 😌Ritchie probably spins in his grave.

These standardization fubars commonly happen when implementors with commercial interests seek to "compromise" their way out of being buggy.

Update: @david_chisnall points to this probably just being a misbehaviour of earlier tool chains. See below.

@icing @wingo @whitequark

This has nothing to do with commercial vendors. The original Lex / YACC could not handle token streams that did not terminate with a separator. The very first C compiler would not correctly handle files that did not end with a newline (or, I think, other whitespace) and so the UNIX convention was to always end a file with a newline.

This limitation was picked up by most early C compilers, because they used the same tooling.

Modern compilers tend not to use this kind of tooling (and newer versions can handle this corner case), but the standard didn't want to exclude them.

As a QoI issue, most compilers will simply diagnose this and interpret the file as if it had a newline at the end. Early compilers would, as I recall, simply stop in the middle of parsing (I think. A parser generator I used 20+ years ago had this issue, but I don't remember the details).

@david_chisnall @icing @wingo @whitequark Yes, but this doesn't justify UB at all. UB means "the compiler can do whatever it wants".
The missing newline could be specified as "mandatory, else the compilation will fail" or as "if missing, the compilation might fail".

I guess they defined it as UB to allow for treating multiple files at once, concatenated without separation, meaning the last line of one file could be completed by the first line of another...

@tdelmas @icing @wingo @whitequark

Not ending a header with a newline means some C compilers will join the last token of the header with the first of the next file, which leads to completely unpredictable outcomes. Not terminating the compilation unit with a newline causes those same compilers to fail to terminate during compilation.

Both of those were existing behaviours when ANSI C was standardised in 1989.

EDIT: UB wasn't meant to mean 'the compiler can do whatever it wants', it was meant to mean 'compilers cannot (for technical reasons) diagnose this and may do unexpected things'. Division by zero traps on a lot of architectures; adding a branch-if-zero before every divide would fix it but cause a lot of performance problems, so compilers may assume it doesn't happen. Use after free can't be statically detected in C, so compilers may assume it doesn't happen. A file not ending with a newline was probably detectable.
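To make the "branch-if-zero before every divide" cost concrete, here is a minimal sketch in C. The helper name `checked_div` is hypothetical and not from any toolchain discussed here. The extra `INT_MIN / -1` test is an addition beyond the post: that quotient overflows `int`, which is also undefined.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical helper (illustration only): performs the division only
 * after explicit checks, i.e. the branches a compiler would have to emit
 * before every divide if division by zero had defined behaviour.
 * Returns false and leaves *quot untouched when the division is invalid. */
bool checked_div(int num, int den, int *quot)
{
    assert(quot != NULL);
    if (den == 0)
        return false;               /* the branch-if-zero from the post */
    if (num == INT_MIN && den == -1)
        return false;               /* quotient would overflow int */
    *quot = num / den;
    return true;
}
```

Calling `checked_div(7, 2, &q)` stores 3 in `q` and returns true; `checked_div(1, 0, &q)` returns false without touching `q`. A plain `num / den` has neither branch, which is the performance argument being made.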

@tdelmas @david_chisnall @icing @wingo @whitequark If this wasn't UB, and the toolchain with this problem happened to produce output in this case for some random reason, it would be noncompliant. The point of UB is that there are no constraints on the implementer.

@david_chisnall @icing @wingo @whitequark I hit this around 2008 in the PIC microcontroller toolchain (I assume MPLAB). Symptom was the last function in the file couldn't be linked to - as it wasn't compiled.

On the plus side, leveraging loosely remembered minutiae to suggest "try adding a new line at the end of the file" makes you look like a magician. (When it was just a lucky guess).

I hope that's been fixed in that toolchain now, but haven't used PIC since then.

@Scatterdemic @icing @wingo @whitequark

The PIC C toolchain supported a very exciting version of C. As I recall, there were no variadics in the language and printf was handled specially by the compiler.

It's a safety feature against truncated code files.
@nafnlaus.bsky.social @wingo in what way does allowing the compiler to do anything whatsoever when it encounters a truncated file make anything safer?
Undefined behavior throws a warning in the compiler.
@nafnlaus.bsky.social @wingo yeah, that's not the case

@nafnlaus.bsky.social @wingo it does happen to do that on a specific compiler or a few. but we aren't talking about specific compilers, we're talking about the standard
Undefined behavior is considered harmful. Any compiler worth its salt will warn you about any undefined behavior it can readily detect at compile time (not all fits into this category, but newlines do). C is designed to be a "close to the metal" language (like Fortran), but also (like Fortran)...
cross-platform. The result of these two things is that you have to allow for undefined behavior, you can't be forcing checks and logic to prevent it because that pushes you "away from the metal", e.g. slowdowns / memory increases, which is anathema to the reason people use C/C++
So the C/C++ standards include the concept of "undefined behavior" and leave it up to the compiler to figure out how best to handle it, so as not to hinder the compiler developer's ability to coax every last bit of performance out of the code, so long as the developer follows the standard.
Now of course trailing newlines don't have to be put into the "undefined behavior" category, there's no reason that couldn't be just a general mandated warning or error, and putting it as undefined behavior is again just giving the compiler freedom to decide how best to deal with the user...
violating coding standards. One might argue it's better to have a mandated way to handle *that* particular easy-to-detect case (as opposed to something harder to deal with, like accessing outside of array bounds or whatnot). But in general, C encourages compiler competition rather than discouraging.
E.g. that whatever compiler manages to find a way to get the best performance, the most useful warnings / error messages, etc, while still adhering to the standard, becomes the most popular. It doesn't try to be a monolith like, say, Python.
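How easy the trailing-newline case is to detect can be sketched in a few lines of C. The function name `source_ends_properly` is hypothetical. The rule it encodes follows the C standard's translation-phase wording: a non-empty source file shall end in a new-line character that is not immediately preceded by a backslash. This is only a sketch of the check, not how any particular compiler implements its diagnostic.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical check (illustration only): returns true if the source
 * buffer satisfies the "ends in a newline" requirement. An empty file
 * is allowed; a non-empty one must end in '\n', and that '\n' must not
 * be preceded by a backslash. */
bool source_ends_properly(const char *src, size_t len)
{
    assert(src != NULL || len == 0);
    if (len == 0)
        return true;                /* empty files are exempt */
    if (src[len - 1] != '\n')
        return false;               /* missing final newline */
    if (len >= 2 && src[len - 2] == '\\')
        return false;               /* final line ends in backslash-newline */
    return true;
}
```

The check only looks at the last two bytes of the file, which is why a mandated warning or error for this case would cost next to nothing.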

@whitequark @nafnlaus.bsky.social @wingo
I mean, that's a problem with undefined behaviour in general, isn't it

allowing the compiler to do anything whatsoever when it encounters a particular class of error doesn't make anything safer, whether that's a truncated file or a null pointer dereference or an infinite loop that makes no progress

@wingo UB for compile-time (not run-time) conditions is bonkers (there are more like this).
@amonakov @wingo does anyone know which compiler couldn't handle eof as a line ending marker? I assume there must have been some implementation that caused it to be designated as UB?
@dotstdy @amonakov i had just assumed that it was the famous bundler and transpiler, `cat`
@dotstdy I've definitely used such compilers in the past. Their apparent attitude was, "#include includes exactly the bytes of that file into the translation unit at this point", so if the included file didn't end with a newline, it effectively concatenated the next input line onto the end of the final "line" of the included file. And because most header files end with "#endif", and most compilers ignore any junk on the same line following an "#endif", it effectively ignored the next input line completely, causing confusing errors.
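The byte-for-byte behaviour described above can be modelled with a small sketch in C. `splice_include` is a hypothetical function that simply concatenates a header's text with the rest of the including file, the way such a preprocessor was said to behave; it is not taken from any real compiler.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical model (illustration only) of a byte-for-byte #include:
 * copies the header text, then the rest of the including file, into
 * `out` with nothing in between. Returns the number of bytes written,
 * not counting the terminating NUL, or 0 if `out` is too small. */
size_t splice_include(const char *header, const char *rest,
                      char *out, size_t out_size)
{
    assert(header != NULL && rest != NULL && out != NULL);
    size_t hlen = strlen(header);
    size_t rlen = strlen(rest);
    if (hlen + rlen + 1 > out_size)
        return 0;
    memcpy(out, header, hlen);
    memcpy(out + hlen, rest, rlen + 1);   /* also copies the NUL */
    return hlen + rlen;
}
```

With a header ending in `#endif // H` and no final newline, followed by `int x;\nint y;\n`, the spliced text has `#endif // Hint x;` as one line, so the declaration of `x` ends up on the `#endif` line, inside the trailing comment. If the header ends with a newline, `int x;` stays on its own line.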
@wingo Submitting a patch to gcc that auto-enables -Ofast if there’s no newline at the end of the file
@dpk @wingo Is this what people call malicious compliance
@wingo I remember some C compilers throwing warnings if you did not end the file with a newline, when I worked with TI DSPs more than a decade ago, so I tend to end all my C/H files with a newline since then. Didn't know this is officially a cause for undefined behavior 🤦‍♂️
@wingo No diagnostic required?
@tessarakt @wingo nothing required at all. Undefined behavior means compilers can do as they wish. Fail with diagnostic, fail silently, do what you expect them to do, or something else entirely.
@wingo let's throw in a triple negative, just in case somebody was about to understand the sentence.
Yay, nasal demons!
I wonder how safe would it be for a compiler to pass every file with undefined behavior into a linter first. Or would somebody start targeting the linter itself for exploits?
@wingo C compiler but if you do a UB then it will rm -rf /
@wingo likewise if they end in two or more newlines, if I read that right
@wingo I find files that don’t end with a newline annoying, and I’m not even a C compiler.
@wingo @ahihi funny enough I have run into this compiler failure before. It was only when compiling on OpenStep for Intel though. For whatever reason the M68K version of the compiler did not complain.