Saturday, March 3, 2012

Implementing CacheDictionary–A Dictionary with a limited number of items that can be used as cache.

 

Hi All,

Today we consider the following scenario: There is a process that needs to generate many resources. The generation process takes a long time and each generated resource is taking a considerable amount of memory (you don’t want to store more than a few such resources in the memory at a time). After analysis it is evident that when the process runs it often generates the same resource over and over again such that while the total number of generated resources is quite large, all these resources are the same. The solution would be to generate each resource only once and store it in some dictionary. Each subsequent request to generate this resource would simply retrieve it from this dictionary. To solve the issue I have implemented a CacheDictionary which holds only a specified amount of items and has a versatile removal strategy (when trying to add an item over the allowed limit).

It is possible to solve this problem using WeakReferenced values using a regular dictionary. In this solution a regular dictionary holds the resources as WeakReferences and therefore the GC may collect them at any time. The main advantage here is that you allow the CLR to manage the memory itself but you lose the ability to tailor the memory allocation and freeing behavior to your exact needs. At some cases this solution is probably very good but for other cases we need the CacheDictionary.

The CacheDictionary

My implementation encapsulates (and not inherits from) a regular dictionary. I have implemented the IDictionary<TKey,TValue> interface delegating the methods to an internal instance of the dictionary. The constructor requires two arguments: The maximum number of items allowed in the CacheDictionary and the RemovalStrategy. The RemovalStrategy is a class which controls which key is removed from the CacheDictionary when the Add method is called but a full capacity is reached. This class implements the following interface:

  1. public interface ICacheDictionaryRemovalStrategy<TKey>
  2. {
  3.   /// <summary>
  4.   /// Initialize the strategy and pass the maximum number of allowed items
  5.   /// </summary>
  6.   /// <param name="maxSize">The maximum number of allowed items</param>
  7.   void Initialize(int maxSize);
  8.  
  9.   /// <summary>
  10.   /// Notify the strategy that a key was added to the base collection
  11.   /// </summary>
  12.   /// <param name="key">The key that was added</param>
  13.   void KeyAdded(TKey key);
  14.  
  15.   /// <summary>
  16.   /// Notify the strategy that a key was removed from the base collection
  17.   /// </summary>
  18.   /// <param name="key">The key that was removed</param>
  19.   void KeyRemoved(TKey key);
  20.  
  21.   /// <summary>
  22.   /// Notify the strategy that a key was accessed (retrieved by the user) in the base collection
  23.   /// </summary>
  24.   /// <param name="key">The key that was retrieved</param>
  25.   void KeyAccessed(TKey key);
  26.  
  27.   /// <summary>
  28.   /// Notify the strategy that the base collection was cleared
  29.   /// </summary>
  30.   void Clear();
  31.  
  32.   /// <summary>
  33.   /// Get the most appropriate key to remove, this is called when the base collection runs out of space
  34.   /// </summary>
  35.   /// <returns>The key that the base collection will remove</returns>
  36.   TKey GetKeyToRemove();
  37. }

Most of the methods are used to notify the strategy about changes in the underlying dictionary with two exceptions. Initialize method is called in the constructor of the CacheDictionary to allow the strategy run its initialization. The GetKeyToRemove is called when the user tries to add an item to the CacheDictionary but the items count is equal to the maxSize passed in the constructor. The CacheDictionary expects to get an existing key or an exception will be thrown. A noteworthy method is KeyAccessed. In my implementation this method is called whenever the user retrieves some value by using a key (see next section for details).

Tracking Access

As mentioned above, I define access to the CacheDictionary as retrieving some value by its key. The KeyAccessed method is therefore called in the following locations:

  • TryGetValue method if the underlying call to TryGetValue was successful.
  • The indexer property.
  • The Current property of the custom enumerator.

The third point is worth elaborating upon. I am implementing the IDictionary interface and therefore I must provide a way to enumerate over the Key-Value pairs inside the dictionary. If I would just delegate these calls to the regular enumerator then I would miss an access done by the enumerator itself. For this reason I have added a wrapper class (CacheDictionaryEnumerator) which wraps the calls to the underlying enumerator but calls KeyAccessed method when needed.

The full implementation is therefore:

  1. public class CacheDictionary<TKey, TValue> : IDictionary<TKey, TValue>
  2. {
  3.   private Dictionary<TKey, TValue> _data;
  4.   private int _maxSize;
  5.   private ICacheDictionaryRemovalStrategy<TKey> _removalStrategy;
  6.  
  7.   public CacheDictionary(int maxSize, ICacheDictionaryRemovalStrategy<TKey> removalStrategy)
  8.   {
  9.     if (removalStrategy == null)
  10.       throw new ArgumentNullException("removalStrategy");
  11.     if (maxSize == 0)
  12.       throw new ArgumentException("maxSize must be a positive integer value");
  13.     _maxSize = maxSize;
  14.     _removalStrategy = removalStrategy;
  15.     _data = new Dictionary<TKey, TValue>();
  16.  
  17.     _removalStrategy.Initialize(maxSize);
  18.   }
  19.  
  20.   #region IDictionaty Implementation
  21.  
  22.   public void Add(TKey key, TValue value)
  23.   {
  24.     if (_data.ContainsKey(key))
  25.       _data.Add(key, value); //I want to throw the same exception as the internal dictionary for this case.
  26.  
  27.     if (_data.Count == _maxSize)
  28.     {
  29.       TKey keyToRemove = _removalStrategy.GetKeyToRemove();
  30.       if (_data.ContainsKey(keyToRemove))
  31.         _data.Remove(keyToRemove);
  32.       else
  33.         throw new Exception(String.Format("Could not find a valid key to remove from cache, key = {0}", key == null ? "null" : key.ToString()));
  34.     }
  35.     _data.Add(key, value);
  36.     _removalStrategy.KeyAdded(key);
  37.   }
  38.  
  39.   public bool ContainsKey(TKey key)
  40.   {
  41.     return _data.ContainsKey(key);
  42.   }
  43.  
  44.   public ICollection<TKey> Keys
  45.   {
  46.     get {return _data.Keys;}
  47.   }
  48.  
  49.   public bool Remove(TKey key)
  50.   {
  51.     bool result = _data.Remove(key);
  52.     if (result)
  53.       _removalStrategy.KeyRemoved(key);
  54.     return result;
  55.   }
  56.  
  57.   public bool TryGetValue(TKey key, out TValue value)
  58.   {
  59.     bool result = _data.TryGetValue(key, out value);
  60.     if (result)
  61.       _removalStrategy.KeyAccessed(key);
  62.     return result;
  63.   }
  64.  
  65.   public ICollection<TValue> Values
  66.   {
  67.     get { return _data.Values; }
  68.   }
  69.  
  70.   public TValue this[TKey key]
  71.   {
  72.     get
  73.     {
  74.       TValue result = _data[key];
  75.       _removalStrategy.KeyAccessed(key);
  76.       return result;
  77.     }
  78.     set
  79.     {
  80.       _data[key] = value;
  81.       _removalStrategy.KeyAccessed(key);
  82.     }
  83.   }
  84.  
  85.   public void Add(KeyValuePair<TKey, TValue> item)
  86.   {
  87.     Add(item.Key, item.Value);
  88.   }
  89.  
  90.   public void Clear()
  91.   {
  92.     _data.Clear();
  93.     _removalStrategy.Clear();
  94.   }
  95.  
  96.   public bool Contains(KeyValuePair<TKey, TValue> item)
  97.   {
  98.     return _data.Contains(item);
  99.   }
  100.  
  101.   public void CopyTo(KeyValuePair<TKey, TValue>[] array, int arrayIndex)
  102.   {
  103.     ((IDictionary<TKey, TValue>)_data).CopyTo(array, arrayIndex);
  104.   }
  105.  
  106.   public int Count
  107.   {
  108.     get { return _data.Count; }
  109.   }
  110.  
  111.   public bool IsReadOnly
  112.   {
  113.     get { return ((IDictionary<TKey, TValue>)_data).IsReadOnly; }
  114.   }
  115.  
  116.   public bool Remove(KeyValuePair<TKey, TValue> item)
  117.   {
  118.     bool result = ((IDictionary<TKey, TValue>)_data).Remove(item);
  119.     if (result)
  120.       _removalStrategy.KeyRemoved(item.Key);
  121.     return result;
  122.   }
  123.  
  124.   public IEnumerator<KeyValuePair<TKey, TValue>> GetEnumerator()
  125.   {
  126.     return new CacheDictionaryEnumerator(_data.GetEnumerator(), _removalStrategy);
  127.   }
  128.  
  129.   System.Collections.IEnumerator System.Collections.IEnumerable.GetEnumerator()
  130.   {
  131.     return new CacheDictionaryEnumerator(_data.GetEnumerator(), _removalStrategy);
  132.   }
  133.   #endregion
  134.  
  135.   public class CacheDictionaryEnumerator : IEnumerator<KeyValuePair<TKey, TValue>>
  136.   {
  137.     private IEnumerator<KeyValuePair<TKey, TValue>> _innerEnumerator;
  138.     private ICacheDictionaryRemovalStrategy<TKey> _removalStrategy;
  139.  
  140.     internal CacheDictionaryEnumerator(IEnumerator<KeyValuePair<TKey, TValue>> innerEnumerator, ICacheDictionaryRemovalStrategy<TKey> removalStrategy)
  141.     {
  142.       if (innerEnumerator == null)
  143.         throw new ArgumentNullException("innerEnumerator");
  144.       if (removalStrategy == null)
  145.         throw new ArgumentNullException("removalStrategy");
  146.  
  147.       _innerEnumerator = innerEnumerator;
  148.       _removalStrategy = removalStrategy;
  149.     }
  150.  
  151.     public KeyValuePair<TKey, TValue> Current
  152.     {
  153.       get
  154.       {
  155.         KeyValuePair<TKey, TValue> result = _innerEnumerator.Current;
  156.         _removalStrategy.KeyAccessed(result.Key);
  157.         return result;
  158.       }
  159.     }
  160.  
  161.     public void Dispose()
  162.     {
  163.       _innerEnumerator.Dispose();
  164.       _innerEnumerator = null;
  165.     }
  166.  
  167.     object System.Collections.IEnumerator.Current
  168.     {
  169.       get {return this.Current;}
  170.     }
  171.  
  172.     public bool MoveNext()
  173.     {
  174.       return _innerEnumerator.MoveNext();
  175.     }
  176.  
  177.     public void Reset()
  178.     {
  179.       _innerEnumerator.Reset();
  180.     }
  181.   }
  182. }

Strategy Implementations

For the solution to be useable out of the box (or out of the downloaded zip file) I have added three implementation of the ICacheDictionaryRemovalStrategy<TKey> interface. The three implementations are:

  1. EmptyRemovalStrategy<TKey> – Removes the first item in it’s internal collection. Does not track access in any way.
  2. MruRemovalStrategy<TKey> – Removes the most recently used (most accessed) item in the CacheDictionary.
  3. LruRemovalStrategy<TKey> – Removes the least recently used (least accessed) item in the CacheDictionary.

These three implementation should allow you a basic use of the CacheDictionary right after download.

As usual you can download all the needed data from my skydrive (link below). The 7zip contains a small demo program on how to use the provided class. If you find bugs/improvements to my implementations I would be more than happy if you share those in the comments.

Thank you for reading,

Boris.