If you have always wanted to learn how to use WinDbg to debug your .NET application but never had the courage to try, this guide is for you. In this post I will give step-by-step instructions for finding a very simple memory leak in a deployed application using WinDbg and SOS.
Let’s consider the following case: you come to work in the morning and a QA engineer rushes towards you. “I left the application MemoryLeakTest running overnight and when I came in this morning it had crashed!” he says in a disturbed tone. What should we do?
Step 1 – Get a crash dump
When a .NET application crashes it (usually) doesn’t close entirely. Instead, a window pops up that allows you to debug the crash or report it to Microsoft (if error reporting is enabled). Reporting the error to Microsoft will usually not contribute to resolving the problem. Before closing the window we will create a crash dump of the application. I assume that debugging the application directly on the QA machine is impossible because it simulates a customer machine. Creating a crash dump is very simple in Windows Vista and above (for older OSes this is a bit trickier; I advise you to use ADPlus, read here how). To create a crash dump, open the Task Manager, right-click the crashed process, select Create Dump File and wait. A dialog will pop up with the path to the created dump file.
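For the older-OS case, a typical ADPlus invocation looks roughly like the sketch below. Treat the exact flags as an assumption to verify against the Debugging Tools documentation for your version; the process name and output folder here are of course examples:

```
adplus -crash -pn MemoryLeakTest.exe -o C:\dumps
```

Here -crash attaches in crash mode (a dump is written when an unhandled exception occurs), -pn selects the target process by name, and -o sets the folder where the dump files are written.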
Important – you have to pay special attention to the bitness of the process. I am using a 64-bit OS but the application I want to debug is 32-bit (I can see this by the *32 marker near the process name). If I create a dump file using the standard Task Manager, it will be a 64-bit crash dump which is completely unusable here. I need to use the 32-bit Task Manager, located at <Windows Dir>\SysWOW64\taskmgr.exe. When running it you will see no indication that it is the 32-bit version, so be careful.
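If you have access to the application’s code, you can also double-check the bitness from inside the process itself. A minimal sketch using the standard Environment properties (available from .NET 4 on; on older frameworks you can check IntPtr.Size instead, where 4 means 32-bit):

```csharp
using System;

class BitnessCheck
{
    static void Main()
    {
        // True when the current process runs as 64-bit, regardless of the OS bitness.
        Console.WriteLine("Process: " + (Environment.Is64BitProcess ? "64-bit" : "32-bit"));

        // True when the OS itself is 64-bit. A 32-bit process on a 64-bit OS
        // (the exact situation described above) runs under WOW64.
        Console.WriteLine("OS:      " + (Environment.Is64BitOperatingSystem ? "64-bit" : "32-bit"));
    }
}
```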
Step 2 – Installing WinDbg on your machine
If you already have a 32-bit WinDbg installed, skip this step. WinDbg comes as a component of the Microsoft Windows SDK for Windows 7 and .NET Framework 4 (this is the ISO version, which I find more convenient). From the site at the link above you can download GRMSDK_EN_DVD.iso (the 32-bit version), but you don’t need to install all of it. Mount the image using your favorite mounting tool (I use VirtualCloneDrive) and then install only \Setup\WinSDKDebuggingTools\dbg_x86.msi.
Step 3 – Mount the crash dump
Start WinDbg x86 and select File->Open Crash Dump (Ctrl+D), then select the dump file. At this point you could simply run !analyze -v and get all the information of this step at once, but I will walk you through what actually happens in this command.

First, you need to load SOS (the WinDbg extension that allows us to debug .NET applications). To do so, type the command .loadby sos clr (or .loadby sos mscorwks for .NET 3.5 and below). This command tries to load the SOS extension matching the CLR version the application was running. Note that this might fail if you don’t have the exact same version of SOS as the CLR that ran the application on the remote machine. In that case you need to copy the SOS dll from the correct framework folder of the machine that ran the application. If the command succeeds, you will see no output.

Now we want to see which exception caused the application to crash. To do so, use the !pe (PrintException) command (all SOS commands begin with “!”). Here is what I got when I ran it on my application:
Exception object: 7c93a534
Exception type: System.OutOfMemoryException
SP IP Function
0025E784 04E6C56A mscorlib_ni!System.Collections.Generic.List`1[[System.Byte, mscorlib]]..ctor(Int32)+0x1a
0025E790 006104D9 MemoryLeakTest!MemoryLeakTest.DataHolder..ctor()+0x31
0025E7A4 0061039C MemoryLeakTest!MemoryLeakTest.MainWindow.Button_Click(System.Object, System.Windows.RoutedEventArgs)+0x64
0025E7DC 538571C8 PresentationCore_ni!System.Windows.RoutedEventHandlerInfo.InvokeHandler(System.Object, System.Windows.RoutedEventArgs)+0x78
<the rest is omitted for brevity>
We can see two important things here:
- The exception is indeed an OutOfMemoryException, so the problem could be a memory leak.
- The OutOfMemoryException happened when the constructor of the DataHolder type in my application tried to allocate a list of bytes (List<byte>).
We should try to narrow things down to see what might cause the memory to get full.
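From the stack trace alone we can already sketch the shape of the code involved. The type and method names below come from the trace; the bodies and the list capacity are assumptions for illustration only:

```csharp
using System.Collections.Generic;
using System.Windows;

// Hypothetical reconstruction based on the stack trace above.
public class DataHolder
{
    // The trace shows the DataHolder constructor calling List<byte>..ctor(Int32),
    // i.e. allocating a list of bytes with some initial capacity.
    private readonly List<byte> _bytes = new List<byte>(1024); // capacity is a guess

    public int Id { get; private set; }
}

public partial class MainWindow
{
    // The trace shows Button_Click constructing a DataHolder, which is where
    // the allocation finally failed with OutOfMemoryException.
    private void Button_Click(object sender, RoutedEventArgs e)
    {
        var holder = new DataHolder();
    }
}
```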
Step 4 – Memory analysis
The first thing we want to do is see what is consuming so much memory. SOS provides a command called dumpheap that allows us to do just that, but if we run it without any flags we will get too much information. I run the following command: !dumpheap -stat. The -stat flag causes the command to print only the statistics for each object type: the type’s MT (MethodTable), the number of such objects in memory and the total amount of memory all objects of that type consume. The list is ordered by total memory consumed, and in our case the last lines look like this:
049e1cf0 544 10880 System.RuntimeType
54f731a4 192 11520 System.Windows.Markup.BamlAttributeInfoRecord
54f67bf4 496 13888 System.Windows.FrameworkPropertyMetadata
56ef2058 391 17204 System.Windows.DependencyProperty
049e32c0 58 35232 System.Collections.Hashtable+bucket
049e2d0c 214 59980 System.Int32
049e0b70 2719 143968 System.String
0029efc0 29 4066792 Free
049b4340 766 4284024 System.Object
00429b9c 811569 12985104 MemoryLeakTest.DataHolder
049c4034 811569 19477656 System.Collections.Generic.List`1[[System.Byte, mscorlib]]
049d9ddc 811634 25972288 System.EventHandler
049e35e0 811590 1227092812 System.Byte[]
Total 3259762 objects
Notice the last four lines. There is the DataHolder again, and there are as many DataHolder instances as there are List<byte> instances, so each DataHolder probably holds a list of bytes. This also accounts for the huge amount of memory in byte arrays, since a List<byte> stores its data in an array of bytes internally. But something is out of place here: there are a lot of EventHandler objects, far too many to be alive at the same time in any reasonable application, and their number is very close to the number of DataHolder objects. This leads us to suspect that someone registers a DataHolder to a static/singleton event but never unregisters, which prevents the DataHolder objects from ever being garbage collected.
Let’s investigate further!
Step 5 – Finding the Root cause
To confirm our suspicion and to see which singleton holds the DataHolder objects in memory we will do the following: we need to find an instance of a DataHolder object and see who holds it (since there are so many, there is a high probability we will find the problem on the first try). To do so I use the dumpheap command again, but this time with the -mt flag, which lists all the objects with a given MethodTable. I can see from the previous command that the MT of the DataHolder type is 00429b9c (the first column in the previous results). The command I run is therefore !dumpheap -mt 00429b9c. (Note that it is also possible to use the -type flag, but in that case it is important to be exact about the name of the type. The dumpheap command uses a “contains” condition when searching for type names, so, for example, !dumpheap -type String will show all types with the word String in their name, such as StringBuilder. Run !help dumpheap for more details.) At this point WinDbg prints a long list of all the instances of the DataHolder class. Since I am not looking for one particular instance, I stop the printing using Ctrl+Break and pick one instance to analyze. I pick this one: 7c0a9fe4 00429b9c 16
Now I probe the specific object we found with the previous command. First, let’s get the details of that object by running the DumpObject command with the object’s address. I run !do 7c0a9fe4:
0:000> !do 7c0a9fe4
Size: 16(0x10) bytes
MT Field Offset Type VT Attr Value Name
00000000 4000007 4 0 instance 7c0a9ff4 Bytes
049e2dbc 4000008 8 System.Int32 1 instance -1 <Id>k__BackingField
049e2dbc 4000006 24 System.Int32 1 static 811569 Counter
This is indeed the correct object. I can see its inner structure, but to see which root is holding it I need to run !gcroot 7c0a9fe4 and wait until the command finishes. WinDbg shows the following output (note that in the general case there may be many roots holding the same object):
Scan Thread 0 OSTHread 1618
This is a linked list (hence the -> markers) describing a path that leads from a system root to the DataHolder instance. It looks like our assumption was correct: there is indeed a registration of some event of the MainWindow class to a method of the DataHolder (it could be even harder to spot had the handler been a lambda expression).
Now it is the right time to find the problem in the code.
Step 6 – Finding the problem in the code
Only at this step do I open the code of the application. I am looking for an event registration in the main window. A simple search for the += symbol leads me to the following line in the code:
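The line looks roughly like this. This is a reconstruction, not the actual source: the event and handler names are taken from the surrounding description, and the receiver variable is an assumption:

```csharp
// Inside the DataHolder constructor (hypothetical reconstruction):
// subscribe this instance's ReportDeath handler to the MainWindow's ReportDeath event.
mainWindow.ReportDeath += this.ReportDeath;
```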
There is a ReportDeath event on the MainWindow class, and we register a method of the DataHolder class to that event. A simple search in the code shows that we never unregister from that event, and therefore all the DataHolder instances are now “hanging” by their ReportDeath method on the MainWindow’s ReportDeath event.
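A minimal sketch of one possible fix, assuming DataHolder can be made IDisposable and that callers dispose it when they are done with it (the names are the ones used above; the overall structure is an assumption):

```csharp
using System;

public class DataHolder : IDisposable
{
    private readonly MainWindow _mainWindow;

    public DataHolder(MainWindow mainWindow)
    {
        _mainWindow = mainWindow;
        // Subscribing a named method (not a lambda) so we can unsubscribe later.
        _mainWindow.ReportDeath += ReportDeath;
    }

    private void ReportDeath(object sender, EventArgs e)
    {
        // ... handle the event
    }

    public void Dispose()
    {
        // Without this line the long-lived MainWindow keeps every DataHolder alive.
        _mainWindow.ReportDeath -= ReportDeath;
    }
}
```

A lambda registration (`ReportDeath += (s, e) => ...`) cannot be unsubscribed this way, which is why the lambda case mentioned earlier is worse. In WPF, WeakEventManager is another option when the subscriber’s lifetime is shorter than the publisher’s.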
Memory leak found!
I have shown how to use WinDbg to find a memory leak in the simplest possible scenario. In reality you will probably need to work harder to pinpoint what is causing the memory leak, but your investigation will not be much different from the steps in this post. WinDbg is a powerful tool and a great complement to Visual Studio. Knowing how to use it is an important skill for both managed and unmanaged developers. I hope you enjoyed reading this post. Please leave a comment if you think I missed something or if you have something to add. The full source code of the MemoryLeakTest application can be found on my SkyDrive here:
Thank you for reading,