Saturday, February 11, 2012

WinDbg 101 – A step-by-step guide to finding a simple memory leak in your .NET application

 

Hi,

If you always wanted to learn how to use WinDbg to debug your .NET application but never had the courage to try, this guide is for you. In this post I will give step-by-step instructions for finding a very simple memory leak in a deployed application using WinDbg and SOS.

Let’s consider the following case: You come to work in the morning and the QA engineer is rushing towards you. “I left the MemoryLeakTest application running overnight, and when I came in this morning it had crashed!” he says in a disturbed tone. What should we do?

Step 1 – Get a crash dump

When a .NET application crashes, it (usually) doesn’t close entirely. Instead, a window pops up that lets you debug the crash or report it to Microsoft (if error reporting is enabled). Reporting the error to Microsoft will usually not help resolve the problem. Before closing the window, we will create a crash dump of the application. I assume that debugging the application directly on the QA machine is impossible because it simulates a customer machine. Creating a crash dump is very simple in Windows Vista and above (on older OSes it is a bit trickier; I advise you to use ADPlus, read here how). To create a crash dump, open the Task Manager, right-click the crashed process, select Create Dump File, and wait. A dialog will pop up with the path to the created dump file.


Important – You have to pay special attention to the bitness of the process. I am using a 64-bit OS but the application I want to debug is 32-bit (I can see this by the *32 marker next to the process name). If I create the dump file using the standard Task Manager, it will be a 64-bit crash dump, which cannot be used for the analysis below. I need to use the 32-bit Task Manager instead. It is located at C:\<Windows Dir>\SysWOW64\taskmgr.exe. When running it you will see no indication that this is the 32-bit version, so be careful.

Step 2 – Installing WinDbg on your machine

If you already have a 32-bit WinDbg installed, please skip this step. WinDbg comes as a component of the Microsoft Windows SDK for Windows 7 and .NET Framework 4 (this is the ISO version, which I find more convenient). From the site at the link above you can download GRMSDK_EN_DVD.iso (the 32-bit version), but don’t install it fully. Mount the image using your favorite mounting tool (I use VirtualCloneDrive), then install only \Setup\WinSDKDebuggingTools\dbg_x86.msi.

Step 3 – Mount the crash dump

Start WinDbg x86 and select File->Open Crash Dump (Ctrl+D), then select the dump file. At this point you can simply run !analyze -v and get all the information of this step, but I will guide you through what actually happens in this command. First, you need to load SOS (the WinDbg extension that allows us to debug .NET applications). To do so, type the command .loadby sos clr (or .loadby sos mscorwks for .NET 3.5 and below). This command tries to load the SOS extension from the directory of the CLR module, so that the SOS version matches the CLR version. Note that this might fail if your machine doesn’t have the exact same version of the CLR as the remote machine that ran the application. In that case you need to copy the SOS DLL from the correct framework folder on the machine that ran the application and load it explicitly with the .load command. If the command succeeds, you will see no output.

Now we want to see which exception caused the application to crash. To do so, use the !pe command (all SOS commands begin with “!”). Here is what I got when I ran it on my application:

0:000> !pe
Exception object: 7c93a534
Exception type: System.OutOfMemoryException
Message: <none>
InnerException: <none>
StackTrace (generated):
    SP       IP       Function
    0025E784 04E6C56A mscorlib_ni!System.Collections.Generic.List`1[[System.Byte, mscorlib]]..ctor(Int32)+0x1a
    0025E790 006104D9 MemoryLeakTest!MemoryLeakTest.DataHolder..ctor()+0x31
    0025E7A4 0061039C MemoryLeakTest!MemoryLeakTest.MainWindow.Button_Click(System.Object, System.Windows.RoutedEventArgs)+0x64
    0025E7DC 538571C8 PresentationCore_ni!System.Windows.RoutedEventHandlerInfo.InvokeHandler(System.Object, System.Windows.RoutedEventArgs)+0x78
<the rest is omitted for brevity>

We can see two important things here:

  1. The exception is indeed an OutOfMemoryException, so the problem could be a memory leak.
  2. The OutOfMemoryException happened when the constructor of the DataHolder type in my application tried to allocate a list of bytes (List<byte>).

We should try to narrow things down to see what might be filling up the memory.

Step 4 – Memory analysis

The first thing we want to do is to see what is consuming so much memory. SOS provides a command called dumpheap that allows us to do just that, but if we run it without any flags we will get too much information. I run the following command: !dumpheap -stat. The -stat flag causes the command to print only the statistics for each object type: the type’s MT (method table address), the number of objects of that type in memory, and the total amount of memory they consume. The list is sorted in ascending order by total memory consumed, so the biggest consumers are at the bottom. In our case the last lines look like this:

049e1cf0      544        10880 System.RuntimeType
54f731a4      192        11520 System.Windows.Markup.BamlAttributeInfoRecord
54f67bf4      496        13888 System.Windows.FrameworkPropertyMetadata
56ef2058      391        17204 System.Windows.DependencyProperty
049e32c0       58        35232 System.Collections.Hashtable+bucket[]
049e2d0c      214        59980 System.Int32[]
049e0b70     2719       143968 System.String
0029efc0       29      4066792      Free
049b4340      766      4284024 System.Object[]
00429b9c   811569     12985104 MemoryLeakTest.DataHolder
049c4034   811569     19477656 System.Collections.Generic.List`1[[System.Byte, mscorlib]]
049d9ddc   811634     25972288 System.EventHandler
049e35e0   811590   1227092812 System.Byte[]
Total 3259762 objects

Notice the last four lines. There is DataHolder again, and there are exactly as many DataHolder objects as List<byte> objects, so each DataHolder probably holds a list of bytes. This also accounts for the huge number of byte arrays, since a List<byte> stores its data internally as an array of bytes, and these arrays consume about 1.2 GB, which is where the memory went. But something is out of place here. There are a lot of EventHandler objects, far more than a typical application would keep alive at the same time, and their number is very close to the number of DataHolder objects. This leads us to suspect that each DataHolder registers to an event of a long-lived object (a static or singleton, for example) but never unregisters, which prevents the DataHolder objects from ever being garbage collected.
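When the statistics list is long, the count comparison described above can also be done mechanically. The following is a minimal Python sketch, not part of WinDbg or SOS: it assumes the !dumpheap -stat output has been copied into a string, and the names parse_dumpheap_stat and similar_counts as well as the 1% tolerance are made up for illustration.

```python
import re
from typing import List, NamedTuple


class HeapStat(NamedTuple):
    mt: str          # method table address (first column)
    count: int       # number of instances on the heap
    total_size: int  # total bytes used by all instances
    type_name: str


# One statistics row: MT, count, total size, type name.
ROW = re.compile(r"^\s*([0-9a-fA-F]{8,16})\s+(\d+)\s+(\d+)\s+(.+?)\s*$")


def parse_dumpheap_stat(text: str) -> List[HeapStat]:
    """Parse the rows of a !dumpheap -stat listing, skipping any other line."""
    stats = []
    for line in text.splitlines():
        match = ROW.match(line)
        if match:
            mt, count, size, name = match.groups()
            stats.append(HeapStat(mt, int(count), int(size), name))
    return stats


def similar_counts(stats, type_name, tolerance=0.01):
    """Return the types whose instance count is within `tolerance` of `type_name`'s."""
    target = next(s for s in stats if s.type_name == type_name)
    return [
        s for s in stats
        if s is not target
        and abs(s.count - target.count) <= tolerance * target.count
    ]


SAMPLE = """\
049e0b70     2719       143968 System.String
0029efc0       29      4066792      Free
049b4340      766      4284024 System.Object[]
00429b9c   811569     12985104 MemoryLeakTest.DataHolder
049c4034   811569     19477656 System.Collections.Generic.List`1[[System.Byte, mscorlib]]
049d9ddc   811634     25972288 System.EventHandler
049e35e0   811590   1227092812 System.Byte[]
Total 3259762 objects
"""

stats = parse_dumpheap_stat(SAMPLE)
for s in similar_counts(stats, "MemoryLeakTest.DataHolder"):
    # The average size per instance shows which type carries the bulk of the memory.
    print(f"{s.type_name}: {s.count} objects, about {s.total_size // s.count} bytes each")
```

On the listing above this flags List<byte>, EventHandler and Byte[] as moving together with DataHolder, and the per-instance averages show that the byte arrays (roughly 1.5 KB each) are what actually fills the heap.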

Let’s investigate further!

Step 5 – Finding the Root cause

To confirm our suspicion and to see which singleton holds the DataHolder objects in memory, we need to find an instance of DataHolder and see who holds it (since there are so many, there is a high probability we will find the problem on the first try). To do so I will use the dumpheap command again, this time with the -mt flag, which lists all objects of a certain type. I can see from the previous command that the MT of the DataHolder type is 00429b9c (see the first column in the previous results). The command I run is therefore !dumpheap -mt 00429b9c. (Note that it is also possible to use the -type flag, but in this case it is important to be exact about the name of the type. The dumpheap command uses a “contains” condition when searching for type names, so if you write “!dumpheap -type String”, for example, it will show all the types with the word String in their name, such as StringBuilder. Run !help dumpheap for more details.) At this point WinDbg prints a long list of all the instances of the DataHolder class. Since I am not looking for one particular instance, I stop the printing using Ctrl+Break and pick one instance to analyze. I pick the instance: 7c0a9fe4 00429b9c       16

Now I probe the specific object we found in the previous command. First let’s get the details of that object by running the DumpObj command (!do for short) with the address of the object. I run !do 7c0a9fe4:

0:000> !do 7c0a9fe4
Name: MemoryLeakTest.DataHolder
MethodTable: 00429b9c
EEClass: 00620d68
Size: 16(0x10) bytes
(C:\Projects\MemoryLeakTest\bin\Debug\MemoryLeakTest.exe)
Fields:
      MT    Field   Offset                 Type VT     Attr    Value Name
00000000  4000007        4                       0 instance 7c0a9ff4 Bytes
049e2dbc  4000008        8         System.Int32  1 instance       -1 <Id>k__BackingField
049e2dbc  4000006       24         System.Int32  1   static   811569 Counter

This is indeed the correct object. I can see its inner structure, but to see which root is holding it I need to run !gcroot 7c0a9fe4 and wait until the command finishes. WinDbg shows the following output (note that in the general case there may be several roots holding the same object):

Scan Thread 0 OSTHread 1618
ESP:25dd78:Root:02785bf8(MemoryLeakTest.App)->
0278c26c(MemoryLeakTest.MainWindow)->
7c93a4e0(System.EventHandler)->
03b5a7f8(System.Object[])->
7c0aa5f4(System.EventHandler)->
7c0a9fe4(MemoryLeakTest.DataHolder)

This is a chain of references (hence the -> markers) that leads from a system root to the DataHolder instance. It looks like our assumption was correct: a method of the DataHolder is indeed registered to some event of the MainWindow class (it could be worse in the case of a lambda expression).
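If you have many such chains to read, the interesting part (who publishes the event and who is kept alive by it) can be pulled out of the text. Below is a small Python sketch under the assumption that the !gcroot output was pasted into a string; parse_gcroot_chain and event_link are hypothetical helper names, and the regular expression only covers the simple "address(type)" form shown above.

```python
import re

# One link of a !gcroot chain: an object address followed by its type in parentheses.
LINK = re.compile(r"([0-9a-fA-F]{8,16})\(([^()]+)\)")


def parse_gcroot_chain(text):
    """Return the (address, type name) pairs of the chain, root first."""
    return LINK.findall(text)


def event_link(chain, delegate_type="System.EventHandler"):
    """Return (publisher type, subscriber type) around the delegate links, or None."""
    positions = [i for i, (_, name) in enumerate(chain) if name == delegate_type]
    if not positions or positions[0] == 0 or positions[-1] == len(chain) - 1:
        return None
    # The object just before the first delegate owns the event; the object just
    # after the last delegate is the handler's target.
    return chain[positions[0] - 1][1], chain[positions[-1] + 1][1]


SAMPLE = """\
Scan Thread 0 OSTHread 1618
ESP:25dd78:Root:02785bf8(MemoryLeakTest.App)->
0278c26c(MemoryLeakTest.MainWindow)->
7c93a4e0(System.EventHandler)->
03b5a7f8(System.Object[])->
7c0aa5f4(System.EventHandler)->
7c0a9fe4(MemoryLeakTest.DataHolder)
"""

chain = parse_gcroot_chain(SAMPLE)
print(" -> ".join(name for _, name in chain))
print("publisher, subscriber:", event_link(chain))
```

The Object[] sitting between the two EventHandler objects is most likely the invocation list of a multicast delegate: the first EventHandler is the event field of MainWindow, and each entry of the array is one registered handler whose target is a DataHolder.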

Now it is the right time to find the problem in the code.

Step 6 – Finding the problem in the code

Only at this step do I open the code of the application. I am looking for an event registration in the main window. A simple search for the += operator leads me to the following lines in the code:

  DataHolder holder = new DataHolder();
  ReportDeath += holder.ReportDeath;  // registers the holder; no matching -= exists
  holders.Add(holder);

There is a ReportDeath event in the MainWindow class, and we register a method of the DataHolder class to that event. A simple search in the code shows that we never unregister from that event, and therefore all the DataHolder instances are left “hanging” from the MainWindow ReportDeath event through their ReportDeath method.
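The retention mechanism itself is not specific to .NET, and it can be reproduced in a few lines. What follows is only an analogy written in Python, not the MemoryLeakTest code: Publisher stands in for MainWindow, Subscriber for DataHolder, and a weak reference is used to observe whether the subscriber was reclaimed. All names are invented for this sketch.

```python
import gc
import weakref


class Publisher:
    """Stands in for MainWindow: it keeps every subscribed handler in a list."""

    def __init__(self):
        self.handlers = []

    def subscribe(self, handler):
        self.handlers.append(handler)

    def unsubscribe(self, handler):
        self.handlers.remove(handler)


class Subscriber:
    """Stands in for DataHolder."""

    def report_death(self):
        pass


def is_alive_after_release(unsubscribe):
    """Subscribe an object, drop our own reference, and report whether it survives."""
    publisher = Publisher()
    subscriber = Subscriber()
    probe = weakref.ref(subscriber)  # observes the object without keeping it alive
    publisher.subscribe(subscriber.report_death)  # the bound method references the subscriber
    if unsubscribe:
        publisher.unsubscribe(subscriber.report_death)
    del subscriber
    gc.collect()
    return probe() is not None


print("still alive without unsubscribing:", is_alive_after_release(unsubscribe=False))
print("still alive after unsubscribing:", is_alive_after_release(unsubscribe=True))
```

As long as the publisher lives and the handler stays registered, the subscriber is reachable through the handler, which mirrors the MainWindow -> EventHandler -> DataHolder chain that !gcroot printed.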

Memory leak found!

Conclusion

I have shown how to use WinDbg to find a memory leak in the simplest possible scenario. In reality you would probably need to work harder to pinpoint what is causing the memory leak, but your investigation will not be much different from the steps I took in this post. WinDbg is a powerful tool and a great complement to Visual Studio. Knowing how to use it is a very important skill for both managed and unmanaged developers. I hope you enjoyed reading this post. Please leave a comment if you think I missed something or if you have something to add. The full source code of the MemoryLeakTest application can be found on my SkyDrive here:

Thank you for reading,

Boris