Friday, May 20, 2011

Lambda expressions, anonymous classes and memory leaks.

[Intermediate level]
[This blog post was originally published on an internal company blog on February 20th 2011]


Hi All,



A couple of days ago I was writing code as usual and decided to use a lambda expression as an event handler. The code I wrote looked something like this:

    public void Moo()
    {
      int x = 10;
      SomeEvent += (i) => { Console.Out.WriteLine(x); };
    }
Then I stopped for a second. What is going on here? I just used a local variable in a method that is going to be called long after the stack frame where this local variable was defined is gone. This doesn't make any sense. Clearly, once the program counter (instruction pointer) leaves the method Moo, the variable x defined on the stack will be freed. I ran the code and everything worked perfectly: 10 was printed on the screen once SomeEvent was fired. Strange!
I decided to ask a friend... "What is going on here?" I asked. "Maybe it's like an anonymous class," he said. This actually made some sense, so I decided to dig deeper using my trusty (but no longer free) friend: Reflector.

Before I begin copy-pasting IL code, a word about anonymous classes. In .NET 3.5 Microsoft introduced LINQ, which allows easy queries on various types of collections. But they noticed a problem: you can run many different queries on the same collection, and the result record looks different every time. For example, say you have a DB table with the columns ID, Name and Title. Sometimes you want to retrieve only the name, sometimes only the title, and sometimes you want to count the number of rows for each name (so the result is (Count, Name)). Previously the user would have to define a new class or struct for each return type, but anonymous classes solve this with the help of the "var" keyword.
(Yes, this is the correct place to use "var" and not when you are too lazy to write the actual variable type!)
I will not give a concrete example of how to use LINQ, but if I want to define a new class that has two properties, Name and Count, I can do it like this:

var p = new { Name = "Boris", Count = 10 };

Now I can access p.Count or p.Name as usual. Anyway... I wrote a small class to see what IL the compiler generates. The content of the file is:

using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;
namespace ConsoleApplication3
{
  public delegate void HowItWorksHandler(int number);
  class Class1
  {
    public event HowItWorksHandler SomeEvent;
    public Class1()
    {
    }
    public void Moo()
    {
      int x = 10;
      SomeEvent += (i) => { Console.Out.WriteLine(x); };
    }
    public void Honey()
    {
      var p = new { Name = "Boris", Count = 10 };
    }
    public void Sheep()
    {
      SomeEvent += (i) => { Console.Out.WriteLine(i + 1); };
    }
    public void Memory()
    {
      MemoryHolder mh = new MemoryHolder();
      SomeEvent += (x) => { mh.BigMemory = 10;};
    }
    void Class1_How(int number)
    {
      throw new NotImplementedException();
    } 
  }
  public class MemoryHolder
  {
    public int BigMemory;
    public MemoryHolder()
    {
      Console.Out.WriteLine("Memory holder created");
    }
    ~MemoryHolder()
    {
      Console.Out.WriteLine("Memory holder free");
    }
  }
}
Now let's start from the end... For the method Honey, where I used an anonymous class, a new class was created as expected. It was located in the dll but with no namespace, and it had quite a full definition:
[CompilerGenerated, DebuggerDisplay(@"\{ Name = {Name}, Count = {Count} }", Type="<Anonymous Type>")]
internal sealed class <>f__AnonymousType0<<Name>j__TPar, <Count>j__TPar>
{
    // Fields
    [DebuggerBrowsable(DebuggerBrowsableState.Never)]
    private readonly <Count>j__TPar <Count>i__Field;
    [DebuggerBrowsable(DebuggerBrowsableState.Never)]
    private readonly <Name>j__TPar <Name>i__Field;

    // Methods
    [DebuggerHidden]
    public <>f__AnonymousType0(<Name>j__TPar Name, <Count>j__TPar Count);
    [DebuggerHidden]
    public override bool Equals(object value);
    [DebuggerHidden]
    public override int GetHashCode();
    [DebuggerHidden]
    public override string ToString();

    // Properties
    public <Count>j__TPar Count { get; }
    public <Name>j__TPar Name { get; }
}
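To make the generated class a bit more tangible, here is a minimal sketch of how such an anonymous class behaves from the calling side. It is my own illustration, not part of the decompiled output: the names AnonymousTypeDemo, Names, CountPerName and SameValuesAreEqual are hypothetical, and the string array just stands in for the Name column of the DB table mentioned earlier.

```csharp
using System;
using System.Linq;

public static class AnonymousTypeDemo
{
    // Hypothetical sample data standing in for the Name column of a table.
    public static readonly string[] Names = { "Boris", "Anna", "Boris" };

    // Counts the rows per name; each result row is an anonymous (Name, Count) object,
    // formatted here through the compiler-generated ToString() override.
    public static string[] CountPerName()
    {
        var rows = Names
            .GroupBy(n => n)
            .Select(g => new { Name = g.Key, Count = g.Count() });
        return rows.Select(r => r.ToString()).ToArray();
    }

    // Two anonymous objects with the same property names, types and order
    // compare by value through the generated Equals/GetHashCode overrides,
    // even though they are two distinct instances.
    public static bool SameValuesAreEqual()
    {
        var a = new { Name = "Boris", Count = 10 };
        var b = new { Name = "Boris", Count = 10 };
        return a.Equals(b)
            && a.GetHashCode() == b.GetHashCode()
            && !ReferenceEquals(a, b);
    }

    public static void Main()
    {
        foreach (string line in CountPerName())
            Console.Out.WriteLine(line);
        Console.Out.WriteLine(SameValuesAreEqual());
    }
}
```

The point of the sketch is only that the Equals, GetHashCode and ToString overrides listed in the decompiled class are what make these throwaway result types usable without writing any class by hand.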

Nothing of much interest here, but note that the compiler overrides the default methods (Equals, GetHashCode and ToString) with less generic implementations. The really interesting method, though, is Moo. Let's look at the IL there:

.method public hidebysig instance void Moo() cil managed
{
    .maxstack 4
    .locals init (
        [0] class ConsoleApplication3.Class1/<>c__DisplayClass1 CS$<>8__locals2)
    L_0000: newobj instance void ConsoleApplication3.Class1/<>c__DisplayClass1::.ctor()
    L_0005: stloc.0 
    L_0006: nop 
    L_0007: ldloc.0 
    L_0008: ldc.i4.s 10
    L_000a: stfld int32 ConsoleApplication3.Class1/<>c__DisplayClass1::x
    L_000f: ldarg.0 
    L_0010: ldloc.0 
    L_0011: ldftn instance void ConsoleApplication3.Class1/<>c__DisplayClass1::<Moo>b__0(int32)
    L_0017: newobj instance void ConsoleApplication3.HowItWorksHandler::.ctor(object, native int)
    L_001c: call instance void ConsoleApplication3.Class1::add_SomeEvent(class ConsoleApplication3.HowItWorksHandler)
    L_0021: nop 
    L_0022: nop 
    L_0023: ret 
}
Look at that: a new class named <>c__DisplayClass1 was defined. It has a field x and a method called <Moo>b__0. When Moo runs, a new instance of that class is created (L_0000), the value 10 is stored in the instance field x instead of in a stack local (L_000a), and the method that is added to the event is <Moo>b__0 of this new instance (L_0010 - L_0017). Now let's look at the class code:

[CompilerGenerated]
private sealed class <>c__DisplayClass1
{
    // Fields
    public int x;

    // Methods
    public void <Moo>b__0(int i)
    {
        Console.Out.WriteLine(this.x);
    }
}
What a nice surprise: the body of the compiler-generated method <Moo>b__0 is exactly the same as our code inside the lambda expression :)
Now all the code makes sense! To make sure, I added a method named Sheep which does not use a local variable. In this case a new method was added to our existing class (Class1) and it looks like this:

[CompilerGenerated]
private static void <Sheep>b__3(int i)
{
    Console.Out.WriteLine((int) (i + 1));
}
No surprises here.
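To convince myself of the Moo case, the compiler's rewrite can be approximated by hand. This is only a sketch of what the decompiled code suggests, not the compiler's actual output: MooClosure, Body, LastPrinted, ClosurePublisher and Fire are hypothetical names, and LastPrinted exists only so the result can be checked.

```csharp
using System;

public delegate void HowItWorksHandler(int number);

// Hand-written stand-in for the compiler-generated <>c__DisplayClass1.
public sealed class MooClosure
{
    public int x;            // the former local variable, now a heap field
    public int LastPrinted;  // hypothetical helper field, only for checking

    // Stand-in for <Moo>b__0: the lambda body, reading this.x.
    public void Body(int i)
    {
        LastPrinted = x;
        Console.Out.WriteLine(x);
    }
}

public class ClosurePublisher
{
    public event HowItWorksHandler SomeEvent;

    // Roughly what Moo() turns into: allocate the closure object,
    // store 10 in its field, subscribe its instance method.
    public MooClosure Moo()
    {
        MooClosure locals = new MooClosure();
        locals.x = 10;
        SomeEvent += locals.Body;
        return locals; // returned only so the caller can inspect it
    }

    public void Fire(int number)
    {
        if (SomeEvent != null) SomeEvent(number);
    }

    public static void Main()
    {
        ClosurePublisher p = new ClosurePublisher();
        MooClosure closure = p.Moo();
        p.Fire(0);        // Moo's stack frame is long gone, x is still readable
        closure.x = 42;   // the handler reads the field, not a copy of the value
        p.Fire(0);
    }
}
```

The event's delegate holds a reference to the MooClosure instance, which is why x survives after Moo returns, and also why anything stored in that object stays alive as long as the subscription does.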
If you made it to this point then you should be wondering... if this is what happens when I use a local variable in a lambda expression isn't that a huge potential for a memory leak?
Well, the answer is YES! This is a huge potential for a memory leak.
What would happen in this code:

      using (StreamReader reader = new StreamReader("MyFile"))
      {
        SomeEvent += (x) => { string s = reader.ReadToEnd(); };
      }
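The reference to the stream reader was copied into a compiler-generated class which is stored god knows where, but the reader itself was closed as soon as the using statement ended. When SomeEvent fires later, the lambda calls ReadToEnd on a reader that has already been disposed, which throws an ObjectDisposedException. An even worse case is when you think the variable will get garbage collected but it is actually held by some anonymous method. See the method Memory in the original code, which simulates a memory-holding class: as long as the Class1 instance and its event subscription are alive, the finalizer of the MemoryHolder class is never called after the Memory method is called, no matter how many times you call GC.Collect().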
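The using case is easy to reproduce in isolation. The sketch below is an assumption-laden variant of the snippet above: it swaps StreamReader for StringReader so that no file is needed, uses Action<int> instead of HowItWorksHandler, and the names DisposedCaptureDemo, Subscribe and FireAndCheckDisposed are hypothetical.

```csharp
using System;
using System.IO;

public static class DisposedCaptureDemo
{
    public static event Action<int> SomeEvent;

    // Subscribes a lambda that captures the reader, then leaves the using block.
    public static void Subscribe()
    {
        using (StringReader reader = new StringReader("file contents"))
        {
            SomeEvent += (x) => { string s = reader.ReadToEnd(); };
        } // reader.Dispose() runs here; the lambda still holds the reference
    }

    // Fires the event and reports whether the handler hit a disposed reader.
    public static bool FireAndCheckDisposed()
    {
        try
        {
            if (SomeEvent != null) SomeEvent(0);
            return false;
        }
        catch (ObjectDisposedException)
        {
            return true;
        }
    }

    public static void Main()
    {
        Subscribe();
        Console.Out.WriteLine(FireAndCheckDisposed());
    }
}
```

The compiler accepts this without complaint, which is what makes it dangerous: the mistake only shows up when the event is fired.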
Conclusion
1) Using anonymous classes and methods or lambda expressions has some overhead: the compiler generates extra classes and methods behind your back.
2) Using local variables in lambda expressions can cause memory leaks and other issues.
3) It seems that anonymous classes and local variables captured in lambda expressions are not implemented the same way.
4) Using "var" instead of an actual type name doesn't make your code cool.
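Point 2 can be observed, and worked around, with a weak reference. The following is a rough sketch under my own assumptions rather than a recipe: LeakDemo, Payload, Subscribe and AliveWhileSubscribed are hypothetical names, Action<int> replaces HowItWorksHandler, and the workaround shown (keeping the delegate so it can be unsubscribed with -=) is just one option. What the garbage collector does after unsubscribing depends on the runtime and build settings, so the second printed value is deliberately left uncommented.

```csharp
using System;
using System.Runtime.CompilerServices;

public class Payload
{
    public int BigMemory;
}

public class LeakDemo
{
    public event Action<int> SomeEvent;

    // Subscribes a lambda that captures a fresh Payload. Returns the handler
    // (needed to unsubscribe later); 'tracker' lets the caller observe whether
    // the Payload is still reachable without keeping it alive.
    [MethodImpl(MethodImplOptions.NoInlining)]
    public Action<int> Subscribe(out WeakReference tracker)
    {
        Payload mh = new Payload();
        tracker = new WeakReference(mh);
        Action<int> handler = (x) => { mh.BigMemory = 10; };
        SomeEvent += handler;
        return handler;
    }

    // True if the captured Payload survives a full collection while the
    // publisher (and therefore the subscription) is still alive.
    [MethodImpl(MethodImplOptions.NoInlining)]
    public static bool AliveWhileSubscribed()
    {
        LeakDemo publisher = new LeakDemo();
        WeakReference tracker;
        publisher.Subscribe(out tracker);
        GC.Collect();
        GC.WaitForPendingFinalizers();
        GC.Collect();
        bool alive = tracker.IsAlive;
        GC.KeepAlive(publisher); // keep the publisher reachable up to this point
        return alive;
    }

    public static void Main()
    {
        Console.Out.WriteLine(AliveWhileSubscribed());

        // Possible workaround: keep the delegate and remove it when done.
        LeakDemo publisher = new LeakDemo();
        WeakReference tracker;
        Action<int> handler = publisher.Subscribe(out tracker);
        publisher.SomeEvent -= handler;
        handler = null;
        GC.Collect();
        GC.WaitForPendingFinalizers();
        GC.Collect();
        Console.Out.WriteLine(tracker.IsAlive);
        GC.KeepAlive(publisher);
    }
}
```

Note that a lambda written inline in the += statement cannot be removed later, because there is no reference left to pass to -=; that is why Subscribe hands the delegate back to the caller.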
On a personal note, I have been using .NET since version 1.1 and I have the feeling that in each version they add more and more "AutoMagical" (http://en.wiktionary.org/wiki/automagical) concepts which make your code shorter but not very maintainable. I think the best example of an "AutoMagical" feature is Binding in WPF. You write some string in a XAML file and some magic changes the value in the ViewModel. I personally try to avoid any "AutoMagical" behavior and no one has convinced me otherwise yet (maybe I am just used to Delphi, where you could debug the code all the way down to the assembler instructions :)). My honest advice for you is to consider the same.
Thanks for reading,
Boris.