Hi All,
Today I want to talk about a nice and rarely mentioned feature of the .NET Framework: string interning. I will start with a small riddle. What do you think the output of the following application is?

    using System;
    using System.Threading;
    using System.Threading.Tasks;

    class Program
    {
        static void Main(string[] args)
        {
            Task t0 = new Task(() => { PrintNumber(0); });
            Task t1 = new Task(() => { PrintNumber(1); });
            t0.Start();
            t1.Start();
            Task.WaitAll(t0, t1);
        }

        public static void PrintNumber(int number)
        {
            // Both tasks lock on the same string literal.
            lock ("Monkey")
            {
                for (int i = 0; i < 10; i++)
                {
                    Console.Out.Write(number);
                    Thread.Sleep(200);
                }
            }
        }
    }

If you said a series of 0s followed by a series of 1s (or vice versa), you are right. But why? When we lock on the string “Monkey”, shouldn’t each call get its own instance of this string? After all, the literal appears inside the method body, just like a local variable would.
As you probably know, the String class in .NET is immutable, which means that once an instance of this class is created it cannot be edited in any way. Any operation that changes the content of a string always creates a new instance that reflects the change, and the original stays untouched. This behavior is not cemented by some sort of language directive; it is more of a “word of mouth”, an agreement if you wish, between the framework and the coders.
Nevertheless, this immutability allows the .NET Framework to treat strings in a special way. If the compiler can infer the string content at compile time (a string literal, for example), the string is not allocated on the local heap. It is allocated on the System AppDomain and added to an interning hash table on the System Domain (called the intern pool). Every time such a literal is loaded, the runtime searches the intern pool to check whether an instance of this string already exists, and if so it returns a reference to that instance. Note that this happens automatically only for strings the compiler can infer during compilation; strings built at run time are not looked up in the pool, but you can force a string into the intern pool by calling the String.Intern method.
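To make the difference between compile-time and run-time strings concrete, here is a minimal sketch. The class name InternDemo and the variable names are made up for illustration; String.IsInterned and String.Intern are the real framework methods. The run-time string is built from a char array so that it is guaranteed to be a fresh instance.

```csharp
using System;

class InternDemo
{
    static void Main()
    {
        string literal = "Hello World";

        // Both operands are compile-time constants, so the C# compiler folds
        // the concatenation into the single literal "Hello World".
        string folded = "Hello " + "World";

        // Built at run time from a char array: same content, new instance.
        string runtime = new string("Hello World".ToCharArray());

        Console.WriteLine(Object.ReferenceEquals(literal, folded));   // True
        Console.WriteLine(Object.ReferenceEquals(literal, runtime));  // False
        Console.WriteLine(literal == runtime);                        // True (value comparison)

        // String.IsInterned returns the pooled instance if one exists, otherwise null.
        Console.WriteLine(Object.ReferenceEquals(literal, String.IsInterned(runtime)));  // True

        // String.Intern adds the string to the pool if needed and returns the pooled instance.
        Console.WriteLine(Object.ReferenceEquals(literal, String.Intern(runtime)));      // True
    }
}
```

Note that == on strings compares content, so it cannot tell you whether two variables point at the same instance; Object.ReferenceEquals can.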
Here is a simple program to list some interesting cases of string interning:

    using System;
    using System.Text;

    class Program
    {
        static void Main(string[] args)
        {
            string s1 = "Hello World";
            string s2 = "Hello World";              // same literal, same interned instance
            string s3 = s2;
            StringBuilder sb1 = new StringBuilder(s1);
            string s4 = sb1.ToString();             // new instance built at run time
            string s5 = string.Intern(s4);          // returns the pooled instance
            string s6 = s1.Clone() as string;       // Clone returns the same instance
            Console.WriteLine("s1 == s2 - {0}", s1 == s2);
            Console.WriteLine("Object.ReferenceEquals(s1,s2) - {0}", Object.ReferenceEquals(s1, s2));
            Console.WriteLine("Object.ReferenceEquals(s1,s3) - {0}", Object.ReferenceEquals(s1, s3));
            Console.WriteLine("Object.ReferenceEquals(s1,s4) - {0}", Object.ReferenceEquals(s1, s4));
            Console.WriteLine("Object.ReferenceEquals(s1,s5) - {0}", Object.ReferenceEquals(s1, s5));
            Console.WriteLine("Object.ReferenceEquals(s1,s6) - {0}", Object.ReferenceEquals(s1, s6));

            // Build a long string at run time, then intern it twice.
            StringBuilder sb2 = new StringBuilder();
            for (int i = 0; i < 20000; i++)
                sb2.Append("a");
            string[] strings = new string[2];
            for (int i = 0; i < 2; i++)
            {
                strings[i] = String.Intern(sb2.ToString());
            }
            Console.WriteLine("Object.ReferenceEquals(strings[0],strings[1]) - {0}", Object.ReferenceEquals(strings[0], strings[1]));
        }
    }

And the output is:

    s1 == s2 - True
    Object.ReferenceEquals(s1,s2) - True
    Object.ReferenceEquals(s1,s3) - True
    Object.ReferenceEquals(s1,s4) - False
    Object.ReferenceEquals(s1,s5) - True
    Object.ReferenceEquals(s1,s6) - True
    Object.ReferenceEquals(strings[0],strings[1]) - True

A couple of things to notice:
- String operations that do not change the string (such as Clone or ToString) return the same instance.
- An interned string that is allocated on the System AppDomain is never released (only when the CLR is terminated).
- StringBuilder.ToString generates a new, non-interned instance of the string (the compiler can’t infer what the StringBuilder contains).
I am not sure where interning can actually help (although I have heard that there are some use cases in ASP.NET where this behavior is beneficial), but you should be aware that it exists, and you should never use a string in a lock statement. Who knows where that string comes from?
This is what MSDN has to say about the lock statement:
lock("myLock") is a problem because any other code in the process using the same string, will share the same lock.
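The sharing that MSDN describes can be demonstrated with a small sketch. ComponentA and ComponentB are hypothetical, unrelated classes invented for this example; the only thing they have in common is the literal "myLock". Monitor.TryEnter with a zero timeout is used so that the second component reports a failure instead of blocking.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical component that holds a lock on a string literal for a while.
class ComponentA
{
    public static void HoldLock(ManualResetEventSlim acquired, ManualResetEventSlim release)
    {
        lock ("myLock")
        {
            acquired.Set();   // signal that the lock is now held
            release.Wait();   // keep holding it until told to let go
        }
    }
}

// Hypothetical, unrelated component that happens to use the same literal.
class ComponentB
{
    public static bool TryWork()
    {
        bool taken = false;
        try
        {
            // Zero timeout: return immediately instead of waiting for the lock.
            Monitor.TryEnter("myLock", 0, ref taken);
            return taken;
        }
        finally
        {
            if (taken) Monitor.Exit("myLock");
        }
    }
}

class Program
{
    static void Main()
    {
        var acquired = new ManualResetEventSlim();
        var release = new ManualResetEventSlim();

        Task holder = Task.Run(() => ComponentA.HoldLock(acquired, release));
        acquired.Wait();

        // A holds the lock on another thread, so B cannot take it.
        Console.WriteLine("B got the lock while A holds it: {0}", ComponentB.TryWork());    // False

        release.Set();
        holder.Wait();

        // Once A has released the lock, B succeeds.
        Console.WriteLine("B got the lock after A released it: {0}", ComponentB.TryWork()); // True
    }
}
```

Neither class references the other, yet B is blocked by A, because both literals resolve to the same interned instance.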
I hope now you understand why.
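For completeness, here is a sketch of the usual alternative: lock on a private object that no other code can reach. This is the riddle program with only the lock target changed; the field name PrintLock is my own choice, and Task.Run is used for brevity (it assumes .NET 4.5 or later).

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

class Program
{
    // Private lock object: code outside this class cannot get a reference to it,
    // so nobody else can accidentally share (or block) this lock.
    private static readonly object PrintLock = new object();

    static void Main()
    {
        Task t0 = Task.Run(() => PrintNumber(0));
        Task t1 = Task.Run(() => PrintNumber(1));
        Task.WaitAll(t0, t1);
        Console.WriteLine();
    }

    static void PrintNumber(int number)
    {
        lock (PrintLock)
        {
            for (int i = 0; i < 10; i++)
            {
                Console.Write(number);
                Thread.Sleep(200);
            }
        }
    }
}
```

The output is still a run of one digit followed by a run of the other, but now the serialization is something this class controls rather than a side effect of interning.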
I want to thank Ilya Kosaev for helping me with this blog post.
Thanks for reading,
Boris.