Hacker News
Tail Call Improvements in .NET Framework 4 (extended64.com)
50 points by budu on March 9, 2010 | hide | past | favorite | 17 comments


on x64 you should see shorter call stacks in optimized code because the JIT generated more tail calls (don’t worry this optimization is turned off for debug code)

I think it used to be that there was little perf difference between DEBUG and RELEASE builds, so for simplicity we always just built DEBUG for internal apps.

The above quote suggests that these days, they really are doing worthwhile optimizations in RELEASE builds. Does anybody know more about that?


This sounds like a bad idea.

TCO is not just a perf difference. Once you add it to a language, you've changed the semantics. The same piece of code will throw a StackOverflowException in DEBUG mode, yet function perfectly in RELEASE. Scheme's spec requires TCO for this very reason.


If you think a feature that could potentially behave differently in RELEASE and DEBUG modes is problematic, you'll want to avoid floats and doubles in .NET. Runtime optimizations that do calculations in the processor's 80-bit FPU can alter the results of float/double calculations. My personal example: http://stackoverflow.com/questions/2225503/clr-jit-optimizat...
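The 80-bit FPU issue itself is hard to reproduce portably, but the underlying reason such optimizations are observable — floating-point addition is not associative, so reordering or widening intermediates changes results — is easy to demonstrate (a Python sketch, not .NET, but IEEE-754 doubles behave the same):

```python
# Floating-point addition is not associative, which is why an
# optimizer that reorders operations (or keeps intermediates in
# wider FPU registers) can change observable results.
a = (0.1 + 0.2) + 0.3
b = 0.1 + (0.2 + 0.3)
print(a == b)  # prints False: a is 0.6000000000000001, b is 0.6
```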


If you want identical results regardless of optimization level or machine characteristics, you should really be using fixed-point instead.
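For illustration, a minimal fixed-point sketch in Python (the names `to_fixed`/`fixed_mul` and the scale factor are made up for this example; real code would use scaled integers or a decimal type the same way). Because everything is integer arithmetic, results are bit-identical regardless of optimization level or FPU width:

```python
# Hypothetical sketch: fixed-point arithmetic via scaled integers,
# with 4 decimal digits of precision (scale factor 10**4).
SCALE = 10 ** 4

def to_fixed(x: str) -> int:
    # Parse from a decimal string so no float rounding sneaks in.
    whole, _, frac = x.partition(".")
    frac = (frac + "0000")[:4]
    sign = -1 if whole.startswith("-") else 1
    return sign * (abs(int(whole)) * SCALE + int(frac))

def fixed_mul(a: int, b: int) -> int:
    # Multiply two fixed-point values, truncating toward zero.
    return (a * b) // SCALE

# 0.1 * 0.3 is exactly 0.03 in fixed-point, on every machine.
result = fixed_mul(to_fixed("0.1"), to_fixed("0.3"))
```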


You and I have a different definition of "semantics" when it comes to programming languages. Your code still achieves the same result, just in a different way. That requirement - equivalent outcome - is how compilers decide what optimizations they can and can't perform. This is often stated as "optimizations can't change program semantics."


If an "optimization" determines whether an algorithm reaches the desired result at all, it isn't just an optimization anymore.

  def fact(n, r=1):
    if n <= 1: return r
    return fact(n-1, r * n)
Regardless of how much stack space you have, try calculating fact(1000000) with the above method (which can be tail-call optimized). If that's the requirement, then without TCO this implementation is incorrect.
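For comparison, here is the loop that a tail-call-optimizing compiler effectively produces from that definition (a hedged Python sketch; the name `fact_iter` is mine). The "call" becomes a jump, so stack usage stays constant no matter how large n gets:

```python
def fact_iter(n, r=1):
    # Equivalent loop form of the tail-recursive fact above:
    # each "recursive call" is just a reassignment of (n, r).
    while n > 1:
        n, r = n - 1, r * n
    return r

# 5000 nested calls would exceed CPython's default recursion limit
# (1000, since CPython does no TCO), but the loop has no depth limit.
x = fact_iter(5000)
```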


Indeed. Though you have to exclude preservation of timing (and memory usage) effects from your definition of semantics to make any optimization possible.


Sounds like a bad idea to me too ... but apparently X64 TCO on CLR 2 was considered an optimization, not something that changes the semantics ... so the DEBUG/RELEASE split has always been in there, and now it's a matter of backwards compatibility ... since it can change the stack-trace and in DEBUG-mode you expect the full trace.


I don't think that they're talking about adding this at the language level. Nothing is changing in the C# spec for this, for example. This is just talking about the kinds of code that the optimizer and JITter are allowed to generate on the back end.


However the stack space is ‘recycled’ such that if you have a tail recursive algorithm, the first tail call that isn’t ‘easy’ will erect the TailCallHelper stack frame, and subsequent non-easy tail calls may need to grow that frame to accommodate all the arguments. Hopefully once the algorithm has gone through a full cycle of recursion the TailCallHelper stack frame has grown to the maximum size needed by all the non-easy calls involved in the recursion, and then never grows, and thus prevents a stack overflow.

- does this mean that it's possible to use TailCallHelper but still not be able to guarantee no stack overflow?


Yes, but I don't think that's more true of TailCallHelper than of any function calls using a stack. This sounds like it's saying that even with tail call optimization it's possible for the arguments to a function to be too big for the stack. For example, if the arguments you pass to the recursive function call are stack allocated and grow each time then each call will increase stack usage, even though it's reusing the space from the previous call.


It sounds to me like the stack space required by TailCallHelper is proportional to the largest amount of stack space ever consumed by a single call of the tail recursive function(s) it's assisting, whereas without it, the stack space required would be proportional to the depth of recursive nesting. So using TailCallHelper you should be able to guarantee arbitrarily deep tail recursion.


It seems to me that the TailCallHelper is just a trampoline ... the method wanting to do a tail call places on the stack a reference to the next method to be called + its arguments, returns, and then TailCallHelper calls it.

So yeah ... you have a guarantee, but it's also using stack space, so you have to watch out for big argument lists, because those may overflow the stack.
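The trampoline idea can be sketched in a few lines of Python (this shows the general technique only, not the actual CLR TailCallHelper mechanics; `trampoline` and the thunk convention are made up for illustration):

```python
def trampoline(fn, *args):
    # Keep the stack flat: as long as the function hands back a
    # zero-argument thunk, call it; a non-callable value is the answer.
    result = fn(*args)
    while callable(result):
        result = result()
    return result

def fact(n, r=1):
    if n <= 1:
        return r
    # Instead of recursing directly, return a thunk describing
    # the next call; the trampoline loop makes it, so each step
    # runs in a fresh, shallow stack frame.
    return lambda: fact(n - 1, r * n)

big = trampoline(fact, 5000)  # well past CPython's recursion limit
```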


If I'm reading this correctly this only applies to the .NET 4 runtime and not the old .NET 2.0 runtime (also powering .NET 3.0 and 3.5 solutions).

In that case I guess if you are running tail-recursion heavy algorithms on x64, just recompiling the same old code for the .NET 4 runtime should yield runtime improvements.

Would be interesting if someone had any actual measurements for this.


I'm no CLR JIT expert, but I'm not sure you even need to recompile. It sounds to me that you just need to run your existing tail-calling code on the .NET 4 x64 CLR, presuming your current code has "tail." opcodes.


Appears to require a small amount of magic: http://stackoverflow.com/questions/896629/how-to-run-clr-2-a...


Filed under: BetIWontNoticeAnyDifference



