Hacker News
Do C compilers disprove Fermat's Last Theorem? (regehr.org)
76 points by jlangenauer on May 1, 2010 | hide | past | favorite | 26 comments


Welcome to the world of compiler optimization. A compiler optimization has to preserve all side effects of a program. This is pretty similar to partial correctness.

Partial correctness means: this code computes the right thing whenever it terminates. It is weaker than total correctness, which states: given the precondition, this code terminates, and it computes the right thing when it does.

In his case, the first loop had a simple set of side effects: none at all. Thus it was a correct optimization to replace the code with nothing. Fixes would include declaring a variable volatile, returning the variables as he did, printing something, ...


Termination is an observable side-effect of a program. It is not a correct compiler optimization to turn a non-terminating program into a terminating one. Turing completeness is the black box into which a compiler's optimizer cannot see, and it must, for correctness' sake, give up when it cannot prove termination.


Termination is an observable side-effect of a program

Not per the C99 spec, at least (5.1.2.3: "Accessing a volatile object, modifying an object, modifying a file, or calling a function that does any of those operations are all side effects, which are changes in the state of the execution environment.")

Termination is "observable", but so is runtime, memory consumption, code size, and myriad other properties that a compiler is at liberty to manipulate.

Turing completeness is the black box into which a compiler's optimizer cannot see, and it must, for correctness' sake, give up when it cannot prove termination.

Correctness according to what standard?


Re C99: the existence of an execution environment is certainly observable (it disappears, with respect to the program, when the program terminates). Unless, that is, you want to argue that exit(0) has no observable side-effects according to the C standard, and thus that we should expect code following exit(0) to also be executed, at which point I think we are entering absurdity.

Re correctness: that the compiler's optimizer cannot see past the halting problem is a general fact about compiler analysis, familiar from most introductions to compiler theory. An optimizer that assumes, without proof, a solution to the halting problem for some sub-program is an unreliable translator, because it is making unsound assumptions.

Of course, if you define your language such that it's OK to make unsound assumptions, that's another matter.


That an optimizer assumes without proof a solution to the halting problem for some sub-program would mean that it's an unreliable translator

The optimizer doesn't need to solve the halting problem for this example. It merely needs to prove that any iteration of the loop has no side effects; hence, an unbounded number of loop iterations also has no side effects, and the loop can be removed.

The real issue is whether termination is considered a side effect. The canon is the C spec; if you can point to some language in the spec that would prevent the optimization, I'd love to see it.

Update: on reflection, I think you're right. The point isn't that the loop has side-effects (it obviously doesn't); rather, by changing the termination behavior of the loop, the compiler is likely to induce additional side effects (by running the subsequent code).


But a loop that never exits will never fall through to the rest of the program, whose side-effects will therefore never occur.

This loop has no side-effects:

    for (;;)
        /* do nothing */;
So the compiler, according to the spec, can remove it, right?

    for (;;)
        /* do nothing */;
    destroy_the_world(); /* has side-effects */
So will the world be destroyed? Or is the following code dead? In which case, it's not OK for the compiler to remove loops that have no side-effects.


The C spec states[1] that code without side-effects may be removed. The C spec does not list non-termination as a side-effect. The C spec also states that anything not explicitly stated is undefined behaviour. That means that in the abstract machine that the C spec defines, it is valid to remove the loop.

If you don't want the loop to be removed, then you are working outside of the defined abstract machine, which means it's up to you to make sure it operates as desired. The embedded-systems guys mentioned in the article did this by compiling the necessary code without optimisations (on the LLVM bug tracker they also said it "wasn't much of an issue" for them, because they did, in fact, tell the compiler what they wanted, whereas the author stated that it was an issue). Another valid approach is to tell the compiler that the loop may, in fact, have side-effects which cannot be known from the context of the code, by declaring the variables volatile (the C spec defines reading volatile variables as a side-effect). If you see non-termination as a side-effect, then you should tell your compiler so somehow.

The C programming language runs in a well-defined abstract machine. Anything a compiler does which is not defined by the abstract machine (so long as it doesn't contradict the abstract machine, i.e. the abstract machine functions as stated) is not a bug, but is undefined behaviour and cannot be relied upon - therefore relying on an infinite loop which contains no side-effects is undefined behaviour.

--

As for whether a programming language in general (rather than C specifically) should remove infinite loops... I think it's up to each language to define this. If a language states that non-termination is a side-effect, then compilers cannot optimise away code which may not terminate, unless the user invokes undefined compiler-specific behaviour by telling it to (eg compiler options), or changes the code to make the intent obvious to the compiler. Or solves the halting problem.

So, in summary: the C compilers are NOT wrong to eliminate the side-effect-free infinite loops. Other languages MAY be wrong to do so - it depends on what the language spec states. You CAN get around this by telling the compiler what your intent is, either by changing the code (eg by introducing side-effects) or by invoking non-standard behaviour (eg through compiler switches).

[1] 5.1.2.3 (Execution Environment), paragraphs 2 and 3.


Every optimization can potentially have a side effect:

    {
        int i;
        time ta0 = getTime();
        for (i = 0; i < N1; i++) { /* do something 1 */ }
        time ta1 = getTime();

        time tb0 = getTime();
        for (i = 0; i < N2; i++) { /* do something 2 */ }
        time tb1 = getTime();

        if (ta1 - ta0 < tb1 - tb0)
            destroy_the_world(); /* has side-effects */
    }
The result depends on the speed of execution of each loop. Should optimizations really be allowed only if they don't have any side effect?


  if you can point to some language in the spec that would  
  prevent the optimization, I'd love to see it.
The author added an update that says, among other things:

  In contrast, the Java language definition is quite clear
  that infinite loops may not be terminated by the JVM.
(with a link to the JLS)


The alternate behavior, while initially non-intuitive, seems more useful to me, in usual scenarios. And the grandparent post's taxonomy including the idea of "partial correctness" also seems useful. So you'll have to justify your "must" a bit more.

In particular: a programmer who wants an infinite loop can get one easily enough. Code that is complex, and has no side-effects other than affecting termination, and never terminates is likely to be a bug. As long as users of an optimizer know that it makes this choice -- that this particular pedantic kind of 'correctness' is waived in the face of otherwise side-effect-less code -- it seems an implementor's choice. And lots of competent implementors seem to have chosen the path you say they "must" not do.


If you accept the grandparent's argument at face value, you should accept that a compiler should be able to remove other do-nothing infinite loops, like the example I mentioned elsewhere:

    for (;;)
        /* do nothing */;
    destroy_the_world(); /* has side-effects */
Why should a compiler remove the loop in the OP's case, but not remove the loop above? Do you accept that removing the loop changes the side-effects of the program?


It does change the effects -- but in a predictable and possibly-useful way. A loop that doesn't exit or do any work itself is far more likely a bug than an intentional behavior; removing it like other do-nothing code may benefit far more programs seeking "optimization" than the alternative.


That is correct. However, in reality, a program whose only side effect is to terminate (or not terminate) is boring enough to be considered an edge case and changed.


I don't think "boring" is the right technical term; are you suggesting that the halting problem itself is boring, since it explicitly only relates to whether programs terminate or not?

The compiler is not privileged enough to state what is boring and what isn't. The example program calculates something meaningful - if it terminated, it would mean that there was a solution found, contrary to Fermat's theorem.


No, I think he's suggesting that knowledge, understanding, and science as a whole depend on observable side effects. For example, I have a program that does in fact provide a counterexample to Fermat's last theorem. Unfortunately, it does not print the result.

If you deny observable side effects as a requirement, you must accept that I do have a program that does that.

Most CS curricula cover propositional logic. If you recall the truth table for implication, the false case always evaluates to true. My instructor used to call it the "who cares" case.

If you don't demand a witness for the proposition, the compiler is under no obligation to provide one.


I find it somewhat difficult to believe we're having this conversation with any seriousness.

Consider this program fragment:

    for (;;)
        /* do nothing */;
    destroy_the_world(); /* has side-effects */
Would you say that a compiler that removed the loop changed the side-effects of the program?


Yes, but in what vaguely reasonable circumstances would you write dead code whose deadness depends on the fact that an infinite loop precedes it? The fact that the C standard doesn't consider changing termination properties from nonterminating to terminating to be an impermissible optimization seems reasonably practical.

The only thing I can guess at would be something in embedded-land that's using nonterminating loops as some sort of control-flow construct.


Reasonable has nothing much to do with it, IMO; and I'm not really talking about C (it is only an example), but about compiler optimizers.


In the end, this only means the following:

  - The C spec permits this behaviour; the Java spec doesn't
  - As a programmer, know the language and the spec that you use
The destroy_the_world() example only shows that the programmer needs to know the language he is using. The optimization is, in my opinion, practical and reasonable in most cases.

Every programmer who bases his program's workflow and important decisions on the termination of an infinite loop should be bitch slapped to hell. Do you have a more reasonable example? The only use for an infinite loop I can think of is the event loop of some network code. But I fail to see how any such code over a few lines has no side effects.


You're absolutely right. Strict evaluation requires proof of termination. I should never comment right before bed.


Compare with the reddit thread and the other comments. I define boring as "without side effects", that is, without IO access, without access to volatile variables and such, according to the C spec. Boring is also language-dependent, of course, as different languages have different side effects.

And yes, as such, I consider the theoretically interesting halting problem practically boring, because I only care about the result my program computes and outputs. If there is no output, there is no output -- not even the possibility of output, no matter whether my program runs for less than a second, an hour, a year or infinitely long.

Yes, technically they are different programs. Practically, I run the program, notice "What the... how is it so fast?", notice the missing printf in there, add it (while being happy about the change in semantics, which saved me several minutes), see the result and do something useful.


The Halting problem is primarily practically useful in automated program analysis, of which the most common instance is compiler optimizers.

I'm loath to reference Wikipedia as an authority, but it's even quoted there as part of a proof of why compilers cannot optimize programs as much as might be possible:

http://en.wikipedia.org/wiki/Compiler_optimization

However, optimizing compilers are by no means perfect. There is no way that a compiler can guarantee that, for all program source code, the fastest (or smallest) possible equivalent compiled program is output; such a compiler is fundamentally impossible because it would solve the halting problem.

This may be proven by considering a call to a function, foo(). This function returns nothing and does not have side effects (no I/O, does not modify global variables and "live" data structures, etc.). The fastest possible equivalent program would be simply to eliminate the function call. However, if the function foo() in fact does not return, then the program with the call to foo() would be different from the program without the call; the optimizing compiler will then have to determine this by solving the halting problem.


Proggit had a lively discussion on this earlier today: http://www.reddit.com/r/programming/comments/byilp/c_compile...


I can't say I've tried it, but shouldn't this be enough to avoid the optimization?

    volatile int i = 1;
    while (i) { }


Being volatile, i can change in ways which cannot be deduced from the program code.

Thus, the loop loops until something which cannot be deduced from the program code happens.

Thus, yes.


Make sure to read the first comment of the post; the poster wrote a number of nonsensical things ;)

The compiler does not try to analyze the loop to determine whether it can exit, because it can do that only for very trivial cases. Instead, since the variables are not referenced after the loop, the loop is removed: it does not matter what would happen inside it, because the outcome of the computation is never used, so it can be skipped entirely.



