Friday, July 8, 2011

JIT/Compiler Optimization of Fibonacci calculation

[intermediate-high level]
Hi All,


Today I want to consider the following case:
In my application I want to use some Fibonacci numbers (this is just a hypothetical example :)). I want to see if I can make the JIT/C++ Compiler to put the actual number instead of calling the calculation method. I made two implementations of the calculation but I will show only one, recursive and horribly slow implementation (don't ever use it or even consider of using it!):

    static int FibonacciRecursive(int level)
    {
      if (level == 0)
        return 0;
      if (level == 1)
        return 1;
      return FibonacciRecursive(level - 1) + FibonacciRecursive(level - 2);
    }
And my code for the tests is:
    static void Main(string[] args)
    {
      Console.ReadLine();
      int result = FibonacciRecursive(45);
      Console.Out.WriteLine(result);
    }
I added the ReadLine so that I will have time to attach a debugger when I run the code. The 45th Fibonacci number, if you were wondering, is 1134903170 and not long after that it overflows int :).
I will save you the reading time and not try my code in debug mode. I will first run my code from Visual Studio by pressing "Run" and here is the result (I added comments in case your assembly is a little rusty):

;      int result = FibonacciRecursive(45);
00000022  mov         ecx,2Dh ;put 45 in the ecx register (this is how methods in .net expect their first argument)
00000027  call        dword ptr ds:[006C1F34h] ;call the FibonacciRecursive method 
0000002d  mov         dword ptr [ebp-0Ch],eax ;get the return value from eax register and put it in the "result" variable
00000030  mov         eax,dword ptr [ebp-0Ch] 
00000033  mov         dword ptr [ebp-8],eax 

As you can see there was no optimization in this case, in fact we can see the two lines 
0000002d  mov         dword ptr [ebp-0Ch],eax 
00000030  mov         eax,dword ptr [ebp-0Ch] 
Which copy the return value of the function (stored in eax) to the result variable and then copy it back to eax.

Now I will run the same code but attach the debugger at a later stage (after JIT generated the code) here is the result (again I added some comments):
00000013  mov         ecx,2Ch ;put 44 in ecx register
00000018  call        dword ptr ds:[001A37F8h] ;call the FibonacciRecursive method
0000001e  mov         esi,eax ;put the result in esi register
00000020  mov         ecx,2Bh ;put 43 in the ecx register
00000025  call        dword ptr ds:[001A37F8h] ;call the FibonacciRecursive method
0000002b  add         eax,esi ;add the results of the two calls into eax register

As you can see the code is much more optimized. Which means JIT generates different code if it sees a debugger attached so even when you compile in Release with optimization and run in VS you don't see the most optimized code. 
You can also see that the JIT compiler unwinded the recursion and put its code inline but again the JIT compiler called the method and did not generate the result directly as I wanted. If you are beginning to doubt this is at all possible, keep reading.

Now I will change the call to 
int result = FibonacciRecursive(1);
 and run it without VS. Here is the result:
00000019  mov         edx,1 
Just as we wanted! Instead of calling the method it simply put 1 as the result.
Lets try running it with 
int result = FibonacciRecursive(2);
The result is quite interesting:
00000013  mov         ecx,1 ;put 1 in the ecx register
00000018  call        dword ptr ds:[001937F8h] ;call FibonacciRecursive
0000001e  mov         esi,eax ;put the result in esi register
00000020  mov         ecx,0 ;put 0 in the ecx register
00000025  call        dword ptr ds:[001937F8h] ;call FibonacciRecursive
0000002b  add         eax,esi ;add the results of the two calls
This is really nice. Now we have two calls which are non recursive. This makes me wonder why it didn't run the same logic as it did in the previous run and replaced the method calls with the actual values. My guess would be to speed up the compilation time as it is done in runtime.

These are all my tries in .NET and as you can see I failed to achieve my goal. One could say that this is not possible but I want to look at a little C++ trick which can achieve just what I want. This trick is taken from a programming paradigm known as Generic Programming. This is the C++ code:

#include <iostream>

template <int N>
struct Fibonacci
{
  static const long result = Fibonacci<N-1>::result+Fibonacci<N-2>::result;
};

template <>
struct Fibonacci<0>
{
  static const long result = 0;
};

template <>
struct Fibonacci<1>
{
  static const long result = 1;
};


int main()
{
  std::cout << Fibonacci<45>::result;
  return 0;  
}

and when run, it compiles into this:

01001000  mov         ecx,dword ptr [__imp_std::cout (1002048h)] 
01001006  push        43A53F82h  ; 1134903170 decimal == 43A53F82 hexadecimal
0100100B  call        dword ptr [__imp_std::basic_ostream<char,std::char_traits<char> >::operator<< (1002044h)]  

This is exactly what I wanted. The compiler simply pushes the 45th Fibonacci element onto the stack and calls cout. (In C++ arguments are passed using stdcall convention where all the arguments are pushed to the stack.)

That's it for this week.
Thank you for reading.

Boris.