Friday, November 18, 2011

Memory leaks when implementing GetHashCode()

 

Hi All,

If you have been following my blog, then you know I am trying to list as many causes of memory leaks in managed code as I can. Most developers, when told about the garbage collection mechanism in .NET, think that memory leaks cannot happen (at least in pure managed code), but as we have seen this is not true. The most common cause of such leaks is static (or singleton) events, but as I have shown before, using lambda expressions incorrectly may also keep objects from being collected in a timely manner.

Consider a case where a simple object overrides the GetHashCode method so that it can later be used as a key in a dictionary. In our case the object (named Person) will be used as a key for a much larger data chunk (named LargeDataObject). Being a seasoned .NET developer, I know that if I override the GetHashCode method I also have to override the Equals method, and so I do:

    public class Person
    {
        public int Age { get; set; }

        public override int GetHashCode()
        {
            return Age.GetHashCode();
        }

        public override bool Equals(object obj)
        {
            if (!(obj is Person))
                return false;

            return (obj as Person).Age == this.Age;
        }
    }

In my implementation lies the key to the memory leak (pun intended). Let's examine the following use of the Person class:

    Dictionary<Person, LargeDataObject> table = new Dictionary<Person, LargeDataObject>();
    Person p1 = new Person() { Age = 10 };
    table.Add(p1, new LargeDataObject());

    Console.Out.WriteLine("Contains P1(10) = {0}", table.ContainsKey(p1));

So far so good. This application prints: Contains P1(10) = True

But now I add the following code:

    p1.Age = 20;
    Console.Out.WriteLine("Contains P1(20) = {0}", table.ContainsKey(p1));

I changed the age of the person, thus changing the hash code of the Person instance. The GetHashCode and Equals methods are still in sync, but the dictionary filed p1 under the hash code it had when it was added and is never told about the change, so p1 now sits in the wrong bucket within the dictionary! In this case the application prints: Contains P1(20) = False
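To make the changing hash code visible, here is a minimal sketch that repeats the Person class and prints its hash code before and after the mutation. The HashDemo class and its HashBeforeAndAfter method are names I made up for illustration.

```csharp
using System;

public class Person
{
    public int Age { get; set; }

    public override int GetHashCode()
    {
        return Age.GetHashCode();
    }

    public override bool Equals(object obj)
    {
        Person other = obj as Person;
        return other != null && other.Age == Age;
    }
}

// Hypothetical helper class, not part of the original example.
public static class HashDemo
{
    // Returns the hash code of the same instance before and after Age changes.
    public static int[] HashBeforeAndAfter()
    {
        Person p = new Person { Age = 10 };
        int before = p.GetHashCode();

        p.Age = 20; // same instance, different hash code from now on
        int after = p.GetHashCode();

        return new[] { before, after };
    }

    public static void Main()
    {
        int[] hashes = HashBeforeAndAfter();
        Console.WriteLine("Hash before = {0}, hash after = {1}", hashes[0], hashes[1]);
    }
}
```

The two values differ, yet the dictionary keeps using the first one, because it was computed once when the key was added.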

In fact, you can get to the original p1 instance only by iterating over the keys of this dictionary (but let's face it, you didn't use a dictionary in the first place to iterate over all the keys). What you probably have is something like this in your code:

    Person p2 = new Person() { Age = 10 };
    LargeDataObject dataObject = new LargeDataObject();
    if (table.ContainsKey(p2))
        table[p2] = dataObject;
    else
        table.Add(p2, dataObject);

If we run this code after the previous code, then another Person (and, most importantly, another LargeDataObject) will be added to the dictionary. The same would happen if I tried to add a new Person with Age = 20, and the reason lies in what MSDN tells us (quite correctly) to do:

“If two objects compare as equal, the GetHashCode method for each object must return the same value. However, if two objects do not compare as equal, the GetHashCode methods for the two object do not have to return different values.”

What happens when you look for a Person with Age = 10: The dictionary goes to the bucket for hash code 10 and looks there. It finds p1 and checks whether it equals the Person being looked up. The age of p1 is now 20, so the two are not equal. Nothing found.

What happens when you look for a Person with Age = 20: The dictionary goes to the bucket where the hash code is 20 and looks there. It finds no items. Nothing found.
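The two lookups described above can be reproduced with a short console program. This is only a sketch: LargeDataObject is a stand-in I defined with a 1 MB buffer (the real class is not shown in this post), and LeakDemo and Run are hypothetical names.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public class Person
{
    public int Age { get; set; }

    public override int GetHashCode()
    {
        return Age.GetHashCode();
    }

    public override bool Equals(object obj)
    {
        Person other = obj as Person;
        return other != null && other.Age == Age;
    }
}

// Assumption: a stand-in for the post's LargeDataObject.
public class LargeDataObject
{
    public byte[] Payload = new byte[1024 * 1024];
}

// Hypothetical demo class, not part of the original example.
public static class LeakDemo
{
    public static List<string> Run()
    {
        var log = new List<string>();
        var table = new Dictionary<Person, LargeDataObject>();

        Person p1 = new Person { Age = 10 };
        table.Add(p1, new LargeDataObject());

        p1.Age = 20; // mutate the key while it is inside the dictionary

        // Stored hash (10) matches, but Equals fails because p1.Age is now 20.
        log.Add("ContainsKey(Age 10) = " + table.ContainsKey(new Person { Age = 10 }));
        // Hash 20 does not match the stored hash 10, so Equals is never called.
        log.Add("ContainsKey(Age 20) = " + table.ContainsKey(new Person { Age = 20 }));
        // Even the original instance cannot remove its own entry.
        log.Add("Remove(p1) = " + table.Remove(p1));
        // The entry is still there; only enumeration can reach it.
        log.Add("Found by iteration = " + table.Keys.Any(k => ReferenceEquals(k, p1)));

        // The "add or replace" code from earlier now adds a second entry.
        table[new Person { Age = 10 }] = new LargeDataObject();
        log.Add("Count = " + table.Count);

        return log;
    }

    public static void Main()
    {
        foreach (string line in Run())
            Console.WriteLine(line);
    }
}
```

Note that Remove fails as well, so the stranded entry and its LargeDataObject stay alive for as long as the dictionary does, unless you clear the dictionary or find the entry by enumerating it.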

Conclusion: If you are implementing GetHashCode and Equals, make sure they do not depend on mutable properties of the object. Otherwise the object (if used as a key in a hash-based collection) may become unreachable through your code, and neither it nor the value it maps to will be garbage collected while the collection is alive.
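One way to follow this advice is to make the key immutable. The following is a sketch of that idea rather than the only possible fix: ImmutablePerson and ImmutableKeyDemo are names I invented, and the string value stands in for LargeDataObject to keep the example short.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical immutable variant of Person: Age is fixed at construction,
// so the hash code can never change while the object is used as a key.
public sealed class ImmutablePerson : IEquatable<ImmutablePerson>
{
    private readonly int age;

    public ImmutablePerson(int age)
    {
        this.age = age;
    }

    public int Age
    {
        get { return age; }
    }

    public bool Equals(ImmutablePerson other)
    {
        return other != null && other.age == age;
    }

    public override bool Equals(object obj)
    {
        return Equals(obj as ImmutablePerson);
    }

    public override int GetHashCode()
    {
        return age.GetHashCode();
    }
}

// Hypothetical demo class, not part of the original example.
public static class ImmutableKeyDemo
{
    public static Dictionary<ImmutablePerson, string> Run()
    {
        var table = new Dictionary<ImmutablePerson, string>();
        var key = new ImmutablePerson(10);
        table.Add(key, "large data");

        // "Changing" the age means re-keying: remove the old entry first,
        // then add the value again under a new key.
        string value = table[key];
        table.Remove(key);
        table.Add(new ImmutablePerson(20), value);

        return table;
    }

    public static void Main()
    {
        var table = Run();
        Console.WriteLine("Count = {0}", table.Count);
        Console.WriteLine("Contains Age 20 = {0}", table.ContainsKey(new ImmutablePerson(20)));
    }
}
```

If the key type has to stay mutable, the same remove, change, add sequence applies: take the entry out of the dictionary before touching any property that feeds GetHashCode.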

Thanks for reading,

Boris.