Today I am going to consider the following case: You want to transmit a class over some sort of a protocol and of course you want to minimize the serialization size and maximize the serialization speed of your class. Since every class is essentially a set of primitive types we can simply consider the cost of serializing these primitive types to simple byte arrays (buffers) which can be sent over some medium.
My wife did an extensive research on this case at work and eventually came down with two possible methods (I am summarizing her work as there were many possible solutions to this case)
Method 1 – Using List<byte>
Method 2 – Using MemoryStream
At first glance these two methods should not differ by much. Perhaps Method 1 should be a bit slower but here are the results of this run:
ArrayFormatter time = 3439 value = 3 length = 400000
MemoryStreamFormatter time = 343 value = 3 length = 400000
Method 1 is ten times slower than Method 2. When my wife told me about this result I said she probably measured something wrong. I simply couldn’t believe it so I ran the analysis myself and got the following results:
First, we look at the results of Method 2. We can see that the MemoryStream was initialized with a single allocation of 400000 bytes just as we would expect. But there is something nice happening here. The result of the method GetBytes is taken directly from the memory stream buffer by using the GetBuffer method (the memory stream is disposed when we leave the method) so no additional allocations are required.
Now we look at Method 2. Oh the horror… The constructor of the List allocates 400000 bytes just as we expected. We did it so that AddRange would not need to allocate additional bytes and therefore relocate the entire collection in memory. Next we see that BitConverter.GetBytes allocated 100,000 arrays of 4 bytes each, the result of converting a single int into byte array but since an Array is a class we paid a great penalty here. For every class allocated in 32bit architecture we pay 8 bytes of system managed space (two pointers, one for the syncblock and one for the MT of that class) and probably each instance of Array contains a single int variable that contains its size (I am not sure about this if you know otherwise please leave a comment). All and all each such intermediate array takes 16 bytes and therefore there are 1600000 bytes allocated by this method. The big surprise comes with the AddRange method. You would expect that no allocations would be done in this method since we pre-allocated all the needed space in the List constructor but it seems that for each call to AddRange a temporary array is allocated (the number of byte allocated is the same as in the BitConverter.GetBytes calls). Lets look at the code using ILSpy -
Look at lines 136 – 138! Our assumption was correct. There is an allocation of a temporary array in the middle of AddRange but why? I found a cryptic answer in StackOverflow. In simple words the answer is this: Since the input collection is cast to ICollection<T> we have to use it's CopyTo method to copy the elements because it may have a custom implementation. The CopyTo method expects we send it the full array and not only the area to which it should copy the elements. If we would send the internal array of the List<T> (this._items) we would expose it to an external method of unknown implementation, this is very dangerous because it can change other elements in this array. As a rule of thumb it’s a good practice to not send reference type (mutable) private members to methods of unknown implementation. The solution here is to create a temporary array with the exactly needed size and make the input collection copy all the members there and then use the standard implementation of CopyTo to copy the elements to the internal array.
All things combined we get ten times more bytes allocated in Method 1 over Method 2 which accounts for the difference in performance. Mystery solved.Thanks for reading,