> [..] whenever possible, compiler writers refuse to take responsibility for the...

dataflow · on Aug 3, 2024

No, I think you're just speaking past each other here. You're using "bug" in reference to the source code. They're using "bug" in reference to the generated program. With UB it's often the case that the source code is buggy but the generated program is still correct. Later the compiler authors introduce a new optimization that generates a buggy program based on UB in the source code, and the finger-pointing starts.

Edit: What nobody likes to admit is that all sides share responsibility to the users here, and that is hard to deal with. People just want a single entity to offload the responsibility to, but reality doesn't care. To give an extreme analogy to get the point across: if your battery caught fire just because your CRUD app dereferenced NULL, nobody (well, nobody sane) would point the finger at the app author for forgetting to check for NULL. The compiler, OS, and hardware vendors would be held accountable for their irresponsibly-designed products, "undefined behavior" in the standard be damned. Everyone in the supply chain shares a responsibility to anticipate how their products can be misused and handle them in a reasonable manner. The apportionment of the responsibility depends on the situation and isn't something you can just determine by just asking "was this UB in the ISO standard?"

dathinab · on Aug 3, 2024

> just speaking past each other here

no I'm not

if your program has UB it's broken and it doesn't matter if it currently happen to work correct under a specific compiler version, it's also fully your fault

sure there is shared responsibility through the stack, but _one of the most important aspects when you have something like a supply chain is to know who supplies what under which guarantees taking which responsibilities_

and for C/C++ its clearly communicated that it's soly your responsibility to avoid UB (in the same way that for batteries it's the batteries vendors responsibility to produce batteries which can't randomly cough on fire and the firmware vendors responsibility for using the battery driver/chagrin circuit correctly and your OS responsibility so that a randoms program faulting can't affect the firmware etc.)

> be misused and handle them in a reasonable manner

For things provided B2B its in general only the case in context of it involving end user, likely accidents and similar.

Instead it's the responsibility of the supplier to be clear about what can be done with the product and what not and if you do something outside of the spec it's your responsibility to continuously make sure it's safe (or in general ask the supply for clarifying guarantees wrt. your usage).

E.g. if you buy capacitors rate for up to 50C environmental temperature but happen to work for up to 80C then you still can't use them for 80C because there is 0% guarantee that even other capacitors from the same batch will also work for 80C. In the same way compilers are only "rate"(1) to behave as expected for programs without UB.

If you find it unacceptable because it's to easy to end up with accidental UB, then you should do what anyone in a supply chain with a too risky to use component would do:

Replace it with something less risky to use.

There is a reason the ONCD urged developers to stop using C/C++ and similar where viable, because that is pretty much just following standard supply chain management best-practice.

(1: just for the sake of wording. Through there are certified, i.e. ~rated, compilers revisions)

gavinhoward · on Aug 4, 2024

> your program has UB it's broken and it doesn't matter if it currently happen to work correct under a specific compiler version, it's also fully your fault

Except that compiler writers essentially decide what's UB. Which is a conflict of interest.

And they add UB, making previously non-UB code fall under UB. Would you call such code buggy?

dzaima · on Aug 4, 2024

> Except that compiler writers essentially decide what's UB.

No, the C/C++ standards specify what is UB. So, as long as you don't switch targeted standard versions, the brokenness of your code never changes.

Compilers may happen to previously have never made optimizations around some specific UB, but, unless you read in the compiler's documentation that it won't, code relying on it was always broken. It's a bog standard "buggy thing working once doesn't mean it'll work always".

gavinhoward · on Aug 4, 2024

> No, the C/C++ standards specify what is UB.

And the compiler writers have a stranglehold on the standards bodies. They hold more than 50% of the voting power last time I checked.

So yeah, compiler writers decide what's UB.

dzaima · on Aug 4, 2024

The vast majority of UB usually considered problematic has been in the standards for decades, long before compilers took as much advantage of it as they do now (and the reasons for including said UB back then were actual hardware differences, not appeasing compiler developers).

Are there even that many UB additions? The only thing I can remember is realloc with size zero going from implementation-defined to undefined in C23.

gavinhoward · on Aug 4, 2024

Yes, but that does not change the fact that compilers writers have control of the standard, have had that control since probably C99, and have introduced new UB along with pushing the 00UB worldview.

dzaima · on Aug 4, 2024

What introduced UB are you thinking of? I'll admit I don't know how much has changed, but the usually-complained-about things (signed overflow, null pointer dereferencing, strict aliasing) are clearly listed as UB in some C89 draft I found.

C23's introduced stdc_trailing_zeros & co don't even UB on 0, even though baseline x86-64's equivalent instructions are literally specified to leave their destination undefined on such!

00UB is something one can argue about, but I can't think of a meaningful way to define UB that doesn't impose significant restrictions on even basic compilers, without precisely defining how UB-result values are allowed to propagate.

e.g. one might expect that 'someFloat == (float)(int8_t)someFloat' give false on an input of 1000, but guaranteeing that takes intentional effort - namely, on hardware whose int↔float conversions only operate on ≥32-bit integers (i.e. everything - x86, ARM, RISC-V), there'd need to be an explicit 8-to-32-bit sign-extend, and the most basic compiler just emitting the two f32→i32 & i32→f32 instructions would fail (but is imo pretty clearly within "ignoring the situation completely with unpredictable results" that the C89 draft contains). Sure it doesn't summon cthulhu, but it'll quite likely break things very badly anyway. (whether it'd be useful to not have UB here in the first place is a separate question)

Even for 'x+100 < x' one can imagine a similar case where the native addition & comparison instructions operate on inputs wider than int; using such for assuming-no-signed-wrap addition always works, but would mean that the comparison wouldn't detect overflow. Though here x86-64, aarch64, and RISC-V all do provide instructions for 32-bit arith, matching their int. This would be a bigger thing if it were possible to have sub-int-sized arith.

pertymcpert · on Aug 4, 2024

Which UB upsets you? Can you be specific so we can revert it?

gavinhoward · on Aug 4, 2024

All of it. But especially anything added after C89 that was not already there implicitly.

Edit: okay, not all of it. I was hyperbolic. Race conditions and data races should be UB. But anything that can be implementation-defined should be.

dzaima · on Aug 4, 2024

So your issue is not at all any specific thing or action anyone took, but just in general having UB in places not strictly necessary. And "Especially anything [different from The Golden Days]", besides being extremely cliche, is a completely arbitrary cutoff point.

A given compiler is free to define specific behavior for UB (and indeed you can add compiler flags to do that for many things); the standard explicitly acknowledges that with "Possible undefined behavior ranges from […], to behaving during translation or program execution in a documented manner characteristic of the environment".

gavinhoward · on Aug 4, 2024

Sigh...yes, I don't want any UB where it's not necessary.

But if you must have a concrete example, how about realloc?

In C89 [1] (page 155), realloc with a 0 size and a non-NULL pointer was defined as free:

> If size is zero and ptr is not a null pointer, the object it points to is freed.

In C99 [2] (page 314), that sentence was removed, making it undefined behavior when it wasn't before. This is a pure example of behavior becoming undefined when it was not before.

In C11 [3] (page 349), that sentence remains gone.

In C17 [4] (page 254), we get an interesting addition:

> If size is zero and memory for the new object is not allocated, it is implementation-defined whether the old object is deallocated. If the old object is not deallocated, its value shall be unchanged.

So the behavior switches from undefined to implementation-defined.

In C23 [5] (page 357), the wording completely changes to:

> ...or if the size is zero, the behavior is undefined.

So WG14 made it UB again after making implementation-defined.

SQLite targets C89, but people compile it with modern compilers all the time, and those modern compilers generally default to at least C99, where the behavior is UB. I don't know if SQLite uses realloc that way, but if it does, are you going to call it buggy just because the authors stick to C89 and their users use later standards?

[1]: https://web.archive.org/web/20200909074736if_/https://www.pd...

[2]: https://www.open-std.org/jtc1/sc22/wg14/www/docs/n1256.pdf

[3]: https://www.open-std.org/jtc1/sc22/wg14/www/docs/n1570.pdf

[4]: https://web.archive.org/web/20181230041359if_/http://www.ope...

[5]: https://www.open-std.org/jtc1/sc22/wg14/www/docs/n3047.pdf

dzaima · on Aug 4, 2024

If SQLite wants exactly C89, it can just require -std=c89, and then people compiling it with a different standard target are to blame. This is just standard backwards incompatibility, nothing about UB (in other languages requiring specific compiler/language versions is routine). Problems would arise even if it was changed from being a defined 'free(x)' to being a defined 'printf("here's the thing you realloc(x,0)'d: %p",x)'. (whether the C standard should always be backwards compatible is a more interesting question, but is orthogonal to UB)

I do remember reading somewhere that a real platform in fact not handling size 0 properly (or having explicitly-defined behavior going against what the standard allowed?) being an argument for changing the standard requirement. It's certainly not because compiler developers had big plans for optimizing around it, given that both gcc and clang don't: https://godbolt.org/z/jjcGYsE7W. and I'm pretty sure there's no way this could amount to any optimization on non-extremely-contrived examples anyway.

I had edited one of my parent comments to mention realloc, so if we both landed on the same example, there's probably not that many significant other cases.

gavinhoward · on Aug 4, 2024

> If SQLite wants exactly C89, it can just require -std=c89, and then people compiling it with a different standard target are to blame.

Backwards compatibility? I thought that was a target for WG14.

> This is just standard backwards incompatibility, nothing about UB

But UB is insidious and can bite you with implicit compiler settings, like the default to C99 or C11.

> whether the C standard should always be backwards compatible is more interesting, but is a question orthogonal to UB

If it's a target, then it should be.

And on the contrary, UB is not orthogonal to backwards compatibility.

Any UB could have been made implementation-defined and still be backwards compatible. But it's backwards-incompatible to make anything UB that wasn't UB. These count as examples of WG14 screwing over its users.

> I do remember some mention somewhere of a real platform in fact not handling size 0 properly being an argument for reducing the standard requirement.

So WG14 just decides to screw over users from other platforms? Just keep it implementation-defined! It already was! And that's still a concession from the pure defined behavior of C89!

> I had edited one of my parent comments to mention realloc, so if we both landed on the same example, there's probably not that many significant other cases.

I beg to differ. Any case where UB was implicit just because it wasn't defined in the standard could have easily been made implementation-defined instead.

Anytime WG14 adds UB that doesn't need to be UB, it is screwing over users.

dzaima · on Aug 4, 2024

> Backwards compatibility? I thought that was a target for WG14.

C23 removed K&R function declarations. Indeed backwards-compatibility is important for them, but it's not the be-all end-all.

Having a standard state exact possible behavior is meaningless if in practice it isn't followed. And it wasn't just implementation-defined, it had a specific set of options for what it could do.

> Any case where UB was implicit just because it wasn't defined in the standard could have easily been made implementation-defined instead. Any UB could have been made implementation-defined and still be backwards compatible. But anything that wasn't UB that now is counts as an example of WG14 screwing over its users.

If this is such a big issue for you, you could just name another example. It'd take, like, 5 words to say another feature in question unnecessarily changed. I'll happily do the research on how it changed over time.

It's clear that you don't like UB, but I don't think you've said anything more than that. I quite like that my compiler will optimize out dead null comparisons or some check that collapses to a 'a + C1 < a' after inlining/constant propagation. I think it's quite neat that not being able to assume signed wrapping means that one can run sanitizers that warn on such, without heaps of false-positives from people doing wrapping arith with it. If anything, I'd want some unsigned types with no unsigned wrapping (though I'd of course still want some way to do wrapping arith where needed)

gavinhoward · on Aug 4, 2024

> Having a standard state exact possible behavior is meaningless if in practice it isn't followed.

No, it means that the bug is documented to be in the platform, not the program.

> If this is such a big issue for you, you could just name another example. It'd take, like, 5 words to say another feature in question unnecessarily changed.

Okay, how about `signal()` being called in a multi-threaded program? Why couldn't they define it in C11 such that it could be called? Obviously, such a thing didn't really exist in C99, but it did in POSIX, and in POSIX, it wasn't, and still isn't, undefined. Why couldn't WG14 have simply made it implementation-defined?

> I quite like that my compiler will optimize out dead null comparisons or some check that collapses to a 'a + C1 < a' after inlining/constant propagation.

I'd rather not be forced to be a superhuman programmer.

dzaima · on Aug 5, 2024

> No, it means that the bug is documented to be in the platform, not the program.

Yes, it means that the platform is buggy, but that doesn't help anyone wanting to write portable-in-practice code. The standard specifying specific behavior is just giving a false sense of security.

> Okay, how about `signal()` being called in a multi-threaded program? Why couldn't they define it in C11 such that it could be called?

This is even more definitely not a case of compiler developer conflict of interest. And it's not a case of previously-defined behavior changing, so that set remains still at just realloc. (I wouldn't be surprised if there are more, but if it's not a thing easily listed off I find it hard to believe it's a real significant worry)

But POSIX defines it anyway; and as signals are rather pointless without platform-specific assumptions, it's not like it matters for portability. Honestly, having signals as-is in the C standard feels rather useless to me in general. And 'man 2 signal' warns to not use 'signal()', recommending the non-standard sigaction instead.

And, as far as I can tell, implementation-defined vs undefined barely matters, given that a platform may choose to define the implementation-defined thing as doing arbitrary things anyway, or, conversely, indeed document specific behavior for undefined things. The most significant thing I can tell from the wording is that implementation-defined requires the behavior to be documented, but I am fairly sure there are many C compilers that don't document everything implementation-defined.

> I'd rather not be forced to be a superhuman programmer.

All you have to do is not use signed integers for doing modular/bitwise arithmetic just as much as you don't use integers for doing floating-point arithmetic. It's not much to ask. And the null pointer thing isn't even an issue for userspace code (i.e. what 99.99% of programmers write).

I do think think configuring behavior of various things should be more prevalent & nicer to do; even in cases where a language/platform does define specific behavior, it may nevertheless be undesired (e.g. a+1<a might not work for overflow checking if signed addition was implementation-defined (and, say, a platform defines it as saturating), and so portable projects still couldn't use it for such).

andrewaylett · on Aug 4, 2024

If you want a programming language without undefined behaviour, you want something that's not C.

gavinhoward · on Aug 4, 2024

Correct. Which is why I made my own. But C is still better than other languages because it is small.

andrewaylett · on Aug 7, 2024

It looks small, but it's not really -- the C abstract machine differs too much from the actual hardware it's running on.

You could write a "CVM", akin to the JVM, that runs C code in a virtual environment that matches the abstract machine. Or you can let your compiler deal with the differences, which leads to unhappiness such as is exhibited in this discussion thread and the article it's discussing.

account42 · on Aug 5, 2024

Suppliers generally decide what guarantees they are able and willing to give, yes.

RandomThoughts3 · on Aug 3, 2024

> if your battery caught fire just because your CRUD app dereferenced NULL, nobody (well, nobody sane) would point the finger at the app author for forgetting to check for NULL.

I think pretty much anyone sane would and would be right to do so. Incorrect code is, well, incorrect and safety critical code shouldn’t use UB. Plus, it’s your duty as a software producer to use an appropriate toolchain and validate the application produced. You can’t offload the responsibility of your failure to do so to a third party (doesn’t stop people for trying all the time with either their toolchains or a library they use but that shouldn’t be tolerated and be pointed as the failure to properly test and validate it is).

I would be ashamed if fingers were pointed towards a compiler provider there unless said provider certified that its compiler wouldn’t do that and somehow lied (but even then, still a testing failure on the software producer part).

dataflow · on Aug 3, 2024

> I think pretty much anyone sane would and would be right to do so. Incorrect code is, well, incorrect and safety critical code shouldn’t use UB

You missed the whole point of the example. I gave CRUD app as an example for a reason. We weren't talking safety-critical code like battery firmware here.

RandomThoughts3 · on Aug 3, 2024

Because your exemple isn’t credible. But even then I don’t think I missed the point, no. You are responsible for what your application does (be it a CRUD app or any others). If it causes damage because you fail to test properly, it is your responsibility. The fact that so many programmers fail to grasp this - which is taken as evidence in pretty much any other domain - is why the current quality of the average piece of software is so low.

Anyway, I would like to know by which magic you think a CRUD app could burn a battery? There is a whole stack of systems to prevent that from ever happening.

pritambaral · on Aug 3, 2024

> There is a whole stack of systems to prevent that from ever happening.

You've almost got the point your parent is trying to make. That the supply chain shares this responsibility, as they said.

> I would like to know by which magic you think a CRUD app could burn a battery?

I don't know about batteries, but there was a time when Dell refused to honour their warranty on their Inspiron series laptops if they found VLC to be installed. Their (utterly stupid) reasoning? That VLC allows the user to raise the (software) volume higher than 100%. It was their own damn fault for using poor quality speakers and not limiting allowable current through them in their (software or hardware) drivers.

RandomThoughts3 · on Aug 3, 2024

> You've almost got the point your parent is trying to make. That the supply chain shares this responsibility, as they said.

Deeply disagree. Failsafe doesn’t magically remove your responsibility.

I’m so glad I started my career in a safety critical environment with other engineers working on the non software part. The amount of software people who think they can somehow absolve themselves of all responsibility for shipping garbage still shock me after 15 years in the field.

> It was their own damn fault for using poor quality speakers

Yes, exactly, I’m glad to see we actually agree. It’s Dell’s fault - not the speaker manufacturer’s fault, not the subcontractor who designed the sound part’s fault - Dell’s fault because they are the one who actually shipped the final product.

pritambaral · on Aug 6, 2024

>> ... shares this responsibility

> Deeply disagree. ... doesn't magically remove your responsibility.

??

Literally no-one in this thread is talking about "removing responsibility", except you.

> I'm so glad ... in the field.

I don't know which demon you're trying beat back here, nor why.

> It's Dell's fault - not ...

That it is Dell's fault is not under question, but it also does not automatically absolve the speaker manufacturer or the subcontractor. Hold on, isn't that exactly the drum you've been trying to beat here?

You and I have no idea what actually went down. Maybe the speaker was wrongly rated as being able to take a higher current than it actually could. Or maybe there was a bug in the driver. Either would make someone other than Dell also responsible for the failure.

And that's what we've been trying to tell you. That responsibility is shared.

HippoBaro · on Aug 3, 2024

I think the author knows very well what UB is and means. But he’s thinking critically about the whole system.

UB is meant to add value. It’s possible to write a language without it, so why do we have any UB at all? We do because of portability and because it gives flexibility to compilers writers.

The post is all about whether this flexibility is worth it when compared with the difficulty of writing programs without UB.

The author makes the case that (1) there seem to be more money lost on bugs than money saved on faster bytecode and (2) there’s an unwillingness to do something about it because compiler writers have a lot of weight when it comes to what goes into language standards.

twoodfin · on Aug 3, 2024

Even stipulating that part of the argument, the author then goes on a tear about optimizations breaking constant-time evaluation, which doesn’t have anything to do with UB.

The real argument seems to be that C compilers had it right when they really did embody C as portable assembly, and everything that’s made that mapping less predictable has been a regression.

dathinab · on Aug 3, 2024

But C never had been portable assembly.

Which I think is somewhat the core of the problem. People treating things in C in ways they just are not. Weather that is C is portable assembly or C the "it's just bit's in memory" view of things (which often is double wrong ignoring stuff like hardware caching). Or stuff like writing const time code based on assuming that the compiler probably, hopefully can't figure out that it can optimize something.

> The real argument seems to be that C compilers had it right when they really did embody C as portable assembly

But why would you use such a C. Such a C would be slow compared to it's competition while still prone to problematic bugs. At the same time often people seem to forgot that part of UB is rooted in different hardware doing different things including having behavior in some cases which isn't just a register/mem address having an "arbitrary value" but more similar to C UB (like e.g. when it involves CPU caches).

mpweiher · on Aug 3, 2024

> But C never had been portable assembly.

The ANSI C standards committee disagrees with you.

"Committee did not want to force programmers into writing portably, to preclude the use of C as a “high-level assembler:”

https://www.open-std.org/JTC1/SC22/WG14/www/docs/n897.pdf

p 2, line 39. (p10 of the PDF)

"C code can be portable. "

line 30

dathinab · on Aug 3, 2024

The full quote is:

> Although it strove to give programmers the opportunity to write truly portable programs, the C89 Committee did not want to force programmers into writing portably, to preclude the use of C as a “high-level assembler:” the ability to write machine-specific code is one of the strengths of C. It is this principle which largely motivates drawing the distinction between strictly conforming program and conforming program (§4).

This doesn't say that C is a high-level assembly.

It just says that the committee doesn't (at that point in time) wants to force the usage of "portable" C as a mean to prevent the usage of C as high-level assembler. But just because some people use something as high level assembler doesn't mean it is high level assembly (like I did use a spoon as a fork once, it's still a spoon).

Furthermore the fact that they explicitly mention forcing portable C with the terms "to preclude" and not "to break compatibility" or similar I think says a lot about weather or not the committee thought of C as high level assembly.

Most importantly the quote is about the process of making the first C standard which had to make sure to ease the transition from various non standardized C dialects to "standard C" and I'm pretty sure that through the history there had been C dialects/compiler implementations which approached C as high level assembly, but C as in "standard C" is not that.

mpweiher · on Aug 4, 2024

It specifically says that the use of C as a "portable assembler" is a use that the standards committee does not want to preclude.

Not sure how much clearer this can be.

HeroicKatora · on Aug 4, 2024

That statement means the comittee does not want to stop it from being developed. The question is, has it? They mean a specific implementation could work as portable assembler, mirroring djb's request for an 'unsurprising' C compiler. Another interpretation would be in the context of CompCert, which has been developed to achieve semantic preservation between assembly and its source. Interestingly this of course hints at verifying an assembled snippet coming from some other source as well. Then that alternate source for the critical functions frees the rest of compiler internals from the problems of preserving constant-timeness and leakfreedom through their passes.

mpweiher · on Aug 5, 2024

No.

C already existed prior to the ANSI standardization process, so there was nothing "to be developed", though a few changes were made to the language, in particular function prototypes.

C was being used in this fashion, and the ANSI standards committee made it clear that it wanted the standard to maintain that use-case.

HeroicKatora · on Aug 4, 2024

These are aspiration statements, not a factual judgment of what that standard or its existing implementations actually are. At least they do not cover all implementations nor define precisely what they cover. Note the immediate next statement: "C code can be non-portable."

In my opinion, C has tried to serve two masters and they made a screw-hammer in the process.

The rest of the field has moved on significantly. We want portable behavior, not implementation-defined vomit that will leave you doubting whether porting introduces new UB paths that you haven't already fully checked against (by, e.g. varying the size of integers in such a way some promotion is changed to something leading to signed overflow; or bounds checking is ineffective).

The paragraph further down about explicitly and swiftly rejecting a validation test suite should also read as a warning. Not only would the proposal of modern software development without a test suite get you swiftly fired today, but they're explicitly acknowledging the insurmountable difficulties in producing any code with consistent cross-implementation behavior. But in the time since then, other languages have demonstrated you can reap many of the advantages of close-to-the-metal without compromising on behavior consistency in cross-target behavior, at least for many relevant real-word cases.

They really knew what they were building, a compromise. But that gets cherry-picked into absurdity such as stating C is portable in present-tense or that any inherent properties make it assembly-like. It's neither.

mpweiher · on Aug 5, 2024

These are statements of intent. And the intent is both stated explicitly and also very clear in the standard document that the use as a "portable assembler" is one of the use cases that is intended and that the language should not prohibit.

That does not mean that C is a portable assembly language to the exclusion of everything and anything else, but it also means the claim that it is definitely in no way a portable assembly language at all is also clearly false. Being a portable assembly (and "high level" for the time) is one of the intended use-cases.

> In my opinion, C has tried to serve two masters and they made a screw-hammer in the process.

Yes. The original intent for which it was designed and in which role it works well.

> The rest of the field has moved on significantly. We want portable behavior, not implementation-defined vomit that will leave you doubting whether porting introduces new UB paths that you haven't already fully checked against

Yes, that's the "other" direction that deviates from the original intent. In this role, it does not work well, because, as you rightly point out, all that UB/IB becomes a bug, not a feature.

For that role: pick another language. Because trying to retrofit C to not be the language it is just doesn't work. People have tried. And failed.

Of course what we have now is the worst of both worlds: instead of either (a) UB serving its original purpose of letting C be a fairly thin and mostly portable shell above the machine, or (b) eliminating UB in order to have stable semantics, compiler writers have chosen (c): exploiting UB for optimization.

Now these optimizations alter program behavior, sometimes drastically and even impacting safety (for example by eliminating bounds checks that the programmer explicitly put in!), despite the fact that the one cardinal rule of program optimization is that it must not alter program behavior (except for execution speed).

The completely schizophrenic "reasoning" for this altering of program behavior being somehow OK is that, at the same time that we are using UB to optimize all over the place, we are also free to assume that UB cannot and never does happen. This despite the fact that it is demonstrably untrue. After all UB is all over the C standard, and all over real world code. And used for optimization purposes, while not existing.

> They really knew what they were building, a compromise.

Exactly. And for the last 3 decades or so people have been trying unsuccessfully to unpick that compromise. And the result is awful.

The interests driving this are also pretty clear. On the one hand a few mega-corps for whom the tradeoff of making code inscrutable and unmanageable for The Rest of Us™ is completely worth it as long as it shaves off 0.02% running time in the code they run on tens or hundreds of data centers and I don't know how many machines. On the other hand, compiler researchers and/or open-source compiler engineers who are mostly financed by those few megacorps (the joy of open-source!) and for whom there is little else in terms of PhD-worthy or paid work to do outside of that constellation.

I used to pay for my C compiler, thus there was a vendor and I was their customer and they had a strong interest in not pissing me off, because they depended on me and my ilk for their livelihood. This even pre-dated the first ANSI-C standard, so all the compiler's behavior was UB. They still didn't pull any of the shenanigans that current C compilers do.

pjmlp · on Aug 3, 2024

Back in 1989, when C abstract machine semantics were closer to being a portable macro processor, and stuff like the register keyword was actually something compilers cared about.

SAI_Peregrinus · on Aug 3, 2024

And even then there was no notion of constant-time being observable behavior to the compiler. You cannot write reliably constant-time code in C because execution time is not a property the C language includes in its model of computation.

mpweiher · on Aug 3, 2024

But having a straightforward/predictable mapping to the underlying machine and its semantics is included in the C model of computation.

And that is actually not just compatible with the C "model of computation" being otherwise quite incomplete, these two properties are really just two sides of the same coin.

The whole idea of an "abstract C machine" that unambiguously and completely specifies behavior is a fiction.

trealira · on Aug 4, 2024

> But having a straightforward/predictable mapping to the underlying machine and its semantics is included in the C model of computation.

While you can often guess what the assembly will be from looking at C code given that you're familiar with the compiler, exactly how C is to be translated into assembly isn't well-specified.

For example, you can't expect that all uses of the multiplication operator "*" results in an actual x86 mul instruction. Many users expect constant propagation, so you can write something like "2 * SOME_CONSTANT" without computing that value at runtime; there is no guarantee of this behavior, though. Also, for unsigned integers, when optimizations are turned on, many expect compilers to emit left shift instructions when multiplying by a constant power of two, but again, there's no guarantee of this. That's not to say this behavior couldn't be part of a specification, but it's just an informal expectation right now.

What I think people might want is some readable, well-defined set of attribute grammars[0] for translation of C into assembly for varying optimization levels - then, you really would be able to know exactly how some piece of C code under some context would be translated into assembly. They've already been used for writing code generator generators in compilers, but what I'm thinking is something more abstract, not as concrete as a code generation tool.

[0]: https://en.wikipedia.org/wiki/Attribute_grammar

mpweiher · on Aug 5, 2024

> exactly how C is to be translated into assembly isn't well-specified.

Exactly! It's not well-specified so the implementation is not prevented from doing a straightforward mapping to the machine by some part of the spec that doesn't map well to the actual machine.

dathinab · on Aug 4, 2024

> But having a straightforward/predictable mapping to the underlying machine and its semantics is included in the C model of computation.

not rally, or at least not in a way which would count as "high level assembler". If it would the majority of optimizations compilers do today would not be standard conform.

Like there is a mapping to behavior but not a mapping to assembly.

Which is where the abstract C machine as a hypothetical machine formed from the rules of the standard comes in. Kinda as a mind model which runs the behavior mappings instead of running any specific assembly. But then it not being ambiguous and complete doesn't change anything about C not being high level assembly, actually it makes C even less high level assembly.

mpweiher · on Aug 4, 2024

> If it would the majority of optimizations compilers do today would not be standard conform.

They aren't.

pjmlp · on Aug 4, 2024

So you can easily tell, just by looking to the C source code, if plain Assembly instructions are being used from four books of ISA manual, if the compiler is able to automatically vectorize a code region including which flavour of vector instructions, or completely replace specific math code patterns for a single opcode.

davrosthedalek · on Aug 4, 2024

Nobody says that implementation-defined behavior must be sane or safe. The crux of the issue is that a compiler can assume that UB never happens, while IB is allowed to. Does anyone have an example where the assumption that UB never happens actually makes the program faster and better, compared to UB==IB?

layer8 · on Aug 3, 2024

The issue is that you’d have to come up with and agree on an alternative language specification without (or with less) UB. Having the compiler implementation be the specification is not a solution. And such a newly agreed specification would invariably either turn some previously conforming programs nonconforming, or reduce performance in relevant scenarios, or both.

That’s not to say that it wouldn’t be worth it, but given the multitude of compiler implementations and vendors, and the huge amount of existing code, it’s a difficult proposition.

What traditionally has been done, is either to define some “safe” subset of C verified by linters, or since you probably want to break some compatibility anyway, design a separate new language.

zajio1am · on Aug 3, 2024

> UB is meant to add value. It’s possible to write a language without it, so why do we have any UB at all? We do because of portability and because it gives flexibility to compilers writers.

Implementation-defined behavior is here for portability for valid code. Undefined behavior is here so that compilers have leeway with handling invalid conditions (like null pointer dereference, out-of-bounds access, integer overflows, division by zero ...).

What does it mean that a language does not have UBs? There are several cases how to handle invalid conditions:

1) eliminate them at compile time - this is optimal, but currently practical just for some classes of errors.

2) have consistent, well-defined behavior for them - platforms may have vastly different way how to handle invalid conditions

3) have consistent, implementation-defined behavior for them - usable for some classes of errors (integer overflow, division by zero), but for others it would add extensive runtime overhead.

4) have inconsistent behavior (UB) - C way

wyager · on Aug 4, 2024

> It’s possible to write a language without it

Whenever you do that, programmers deride the language for being "excessively academic" or something