Saturday, January 21, 2012

Weak Referencing – Building a weakly referenced cache.


Hi All,

Today let’s consider the following case: There are some external data objects inserted into your system by some external process. For each object the user may request to see additional data that takes some time to fetch or generate. To save this time you would like to build a cache for the generated data but there is one major issue. You don’t control the insertion rate of these external objects. An object may be valid (referenced) for several seconds or for several hours. Furthermore, the insertion rate may vary for each object. I will try to give a concrete example (which might not reflect real life accurately but will explain the scenario). Suppose you have a list of system alerts which you show the user in some table. The user may double click an alert to see some more details on it. To fetch this extra data you need to make a web service call to a slow server that generates this data. The problem is that the alerts may not be valid anymore because they are removed in the alert manager that manages the collection of the alerts. I would like to discuss only the cache part of this problem.

The most straightforward way to implement such cache would be to use a dictionary which maps an alert to the cached data. When the alert is removed from the alert manager collection (let’s hope it has an event for that you can subscribe to) you can remove this alert from your dictionary. This solution is in most cases too strongly coupled with the implementation of the alert manager. The alert manager will assume that once it removes an alert from its internal collection it will be freed by the garbage collection mechanism but if someone accidently keeps this alert in own dictionary or collection then it might remain there forever causing a memory leak.

The most common solution to this problem is using a weak dictionary. In C# this structure can be implemented using the WeakReference class provided by the framework. The WeakReference class allows you to create a reference that would not prevent from the garbage collection to collect this object. In the next section I provide a skeleton implementation of such a dictionary.

The WeakDictionary

To build a weak dictionary let's recap shortly how dictionary (hash table) data structure works. The dictionary contains pairs of objects, a key and a value. The value can be anything and is not important for the inner workings of the data structure. What’s important is the key (pun intended). The key class must implement two methods: GetHashCode() which generates a non-unique code that identifies the specific instance and Equals(other) which compares the current instance to another instance and returns true if they are equal. The dictionary keeps a list of “buckets”, in each bucket is a Set of key objects with the same GetHashCode result. When a key is inserted the dictionary looks at the hash code of that key and tries to insert it into a bucket with this hash code. In a bucket it iterates over all the keys already in the bucket to see if it is already contained in this bucket if no suck key is found it inserts a new key value pair in that bucket otherwise it updates the value of the existing key. The retrieval of an item works just the same.

We want our keys to be the user defined objects so that the internal implementation will not be visible for the user so we will build an internal object that we will use as a key. This object will hold a weak reference to the original key provided by the user and therefore will not prevent its collection by the garbage collector. Here is a possible implementation of such a class:

  1. public class WeakDictionary<TKey, TValue> where TKey : class
  2. {
  3.   private class WeakPair
  4.   {
  5.     private WeakReference _keyReference;
  7.     private TValue _value;
  8.     public TValue Value
  9.     {
  10.       get { return _value; }
  11.       set { _value = value; }
  12.     }
  14.     public TKey Key
  15.     {
  16.       get
  17.       {
  18.         TKey result = _keyReference.Target as TKey;
  19.         return result;
  20.       }
  21.     }
  23.     private int _innerHashCode;
  25.     public WeakPair(TKey key, TValue value)
  26.     {
  27.       _keyReference = new WeakReference(key);
  28.       _value = value;
  29.       _innerHashCode = key.GetHashCode();
  30.     }
  32.     public override int GetHashCode()
  33.     {
  34.       return _innerHashCode;
  35.     }
  37.     public override bool Equals(object obj)
  38.     {
  39.       WeakPair other = obj as WeakPair;
  40.       if (other == null)
  41.         return false;
  43.       object key1 = _keyReference.Target;
  44.       object key2 = other._keyReference.Target;
  46.       if (key1 == null && key2 == null)
  47.         return true;
  48.       if (key1 == null)
  49.         return false;
  50.       if (key2 == null)
  51.         return false;
  53.       return key1.Equals(key2);
  54.     }
  55.   }

The most interesting method here is the Equals method where we delegate to the Equals method of the original class but checking if it was released by the garbage collection beforehand.

Now for the dictionary implementation itself. Since this is only a skeleton implementation I omitted anything not relevant to understanding the implementation (such as null checks). Our weak dictionary will hold an internal regular dictionary but we will manage the buckets ourselves.

  1. private Dictionary<int, List<WeakPair>> _data = new Dictionary<int, List<WeakPair>>();

Each entry in this dictionary maps a hash code (int) to the data itself held by our weak pair class. I have implemented the three most important methods of the weak dictionary: Set, TryGet and CleanUp (which removes dead links from the dictionary). Here is my implementation:

  1. public void Set(TKey key, TValue value)
  2. {
  3.   WeakPair pair = new WeakPair(key, value);
  4.   int hashCode = key.GetHashCode();
  5.   if (!_data.ContainsKey(hashCode))
  6.     _data.Add(hashCode, new List<WeakPair>());
  7.   List<WeakPair> dataList = _data[hashCode];
  8.   for (int i = 0; i < dataList.Count; i++)
  9.   {
  10.     if (dataList[i].Equals(pair))
  11.     {
  12.       dataList[i] = pair;
  13.       return;
  14.     }
  15.   }
  17.   dataList.Add(pair);
  18. }
  20. public bool TryGet(TKey key, ref TValue value)
  21. {
  22.   int hashCode = key.GetHashCode();
  23.   if (!_data.ContainsKey(hashCode))
  24.   {
  25.     return false;
  26.   }
  28.   WeakPair pair = new WeakPair(key, default(TValue));
  29.   List<WeakPair> dataList = _data[hashCode];
  30.   for (int i = 0; i < dataList.Count; i++)
  31.   {
  32.     if (dataList[i].Equals(pair))
  33.     {
  34.       value = dataList[i].Value;
  35.       return true;
  36.     }
  37.   }
  39.   return false;
  40. }
  42. public void CleanUp()
  43. {
  44.   foreach (int key in _data.Keys)
  45.   {
  46.     List<WeakPair> pairs = _data[key];
  47.     for (int i = 0; i < pairs.Count; )
  48.     {
  49.       if (pairs[i].Key == null)
  50.         pairs.RemoveAt(i);
  51.       else
  52.         i++;
  53.     }
  54.   }
  55. }

The implementation is pretty straightforward. The interesting method is CleanUp which you need to call from time to time to remove the dead references from the dictionary. This implementation achieves the goal we set. If we use this dictionary as the cache for our alert objects they will not be considered as referenced by the cache for the garbage collection thus removing the need to register on the events of the alerts manager.

The key disadvantages of this implementation are:

  1. You still need to do the maintenance of the dictionary yourself. It is not entirely clear when to call the CleanUp method as this call may be costly in some cases.
  2. We use WeakReferences but how much overhead does it cause in reality? In terms of space the overhead is not severe (except maybe for the dead references that remain) but what about the time consumption? I ran a simple test to check how much time it takes to allocate a WeakReference as opposed to a regular object and here is the result:
    p1               As you can see, the mere allocation of a weak reference takes a lot more time than allocating a simple object. I won’t go into details on this one but if you check out the implementation using ILSpy you will see that inside it creates a GCHandle and uses some unmanaged code. Also note that each WeakReference object you add to the heap is registered within the CLR internal structures which are examined during the garbage collection run making it a bit more slower.
  3. The cached data itself is linked with the WeakReference object and not with the original object and therefore it will be garbage collected only after you run CleanUp.

So what can you do? Well the solution is simpler than it seems because .Net framework 4 provides a class for you that will do all I have mentioned above but without the disadvantages. This class is the ConditionalWeakTable. This data structure will weakly hold references to both the keys and the values.

Here is a little test program that compares the two structures:

  1. public class Alert
  2. {
  3.   static int Count = 0;
  4.   private int _data;
  5.   public Alert()
  6.   {
  7.     _data = Count++;
  8.   }
  10.   ~Alert()
  11.   {
  12.     Console.Out.WriteLine(String.Format("* Finalizing Alert {0}", _data));
  13.   }
  14. }
  16. public class CacheData
  17. {
  18.   static int Count = 0;
  19.   private int _data;
  20.   public CacheData()
  21.   {
  22.     _data = Count++;
  23.   }
  25.   ~CacheData()
  26.   {
  27.     Console.Out.WriteLine(String.Format("*** Finalizing CacheData {0}", _data));
  28.   }
  29. }

Here I defined two data objects that correspond to my example. Every time an object is created it is given a unique ID which is printed out on the finalizer of that object.

  1. Console.Out.WriteLine("WeakDictionary without cleanup");
  2. List<Alert> alerts = new List<Alert>();
  3. WeakDictionary<Alert, CacheData> dictionary = new WeakDictionary<Alert, CacheData>();
  4. for (int i = 1; i <= 10; i++)
  5. {
  6.   Alert alert = new Alert();
  7.   if (i % 2 == 0)
  8.     alerts.Add(alert);
  9.   dictionary.Set(alert, new CacheData());
  10. }
  12. GC.Collect();
  13. GC.WaitForPendingFinalizers();
  14. Console.Out.WriteLine("------------------");
  15. Console.Out.WriteLine("WeakDictionary after cleanup");
  17. dictionary.CleanUp();
  19. GC.Collect();
  20. GC.WaitForPendingFinalizers();
  21. Console.Out.WriteLine("------------------");
  22. Console.Out.WriteLine("Clearing the main list of alerts");
  24. alerts.Clear();
  26. GC.Collect();
  27. GC.WaitForPendingFinalizers();
  28. Console.Out.WriteLine("------------------");
  29. Console.Out.WriteLine("ConditionalWeakTable without cleanup");
  31. ConditionalWeakTable<Alert, CacheData> table = new ConditionalWeakTable<Alert, CacheData>();
  32. for (int i = 1; i <= 10; i++)
  33. {
  34.   Alert alert = new Alert();
  35.   if (i % 2 == 0)
  36.     alerts.Add(alert);
  37.   table.Add(alert, new CacheData());
  38. }
  40. GC.Collect();
  41. GC.WaitForPendingFinalizers();
  42. Console.Out.WriteLine("------------------");

and the output is:

WeakDictionary without cleanup
* Finalizing Alert 8
* Finalizing Alert 0
* Finalizing Alert 6
* Finalizing Alert 4
* Finalizing Alert 2
WeakDictionary after cleanup
*** Finalizing CacheData 2
*** Finalizing CacheData 4
*** Finalizing CacheData 6
*** Finalizing CacheData 0
*** Finalizing CacheData 8
Clearing the main list of alerts
* Finalizing Alert 7
* Finalizing Alert 3
* Finalizing Alert 1
* Finalizing Alert 5
ConditionalWeakTable without cleanup
* Finalizing Alert 9
*** Finalizing CacheData 18
* Finalizing Alert 18
*** Finalizing CacheData 16
* Finalizing Alert 16
*** Finalizing CacheData 14
* Finalizing Alert 14
*** Finalizing CacheData 12
* Finalizing Alert 12
*** Finalizing CacheData 10
* Finalizing Alert 10

As you can see when running garbage collection in the first example the even numbered alerts are freed because they are only in the weak dictionary and not in the list. The cache data is collected only after CleanUp is called. On the other hand in the ConditionalWeakTable both the alerts and the data of even ids are cleared immediately after the collection just as we wanted.

Thank you for reading,