Serialization possibilities in the .NET Framework have been significantly augmented with the advent of .NET 3.0. Traditionally, you had a choice between the BinaryFormatter used by .NET Remoting (or its slow and deprecated companion, the SoapFormatter) – producing binary data in a completely proprietary format, using Reflection as the underlying engine for serialization discovery – and the XmlSerializer – employing code generation to produce well-formed plain-text XML, as long as your object graph didn’t have cycles, didn’t try anything out of the ordinary, exposed every serializable piece of data through a public get/set property, and didn’t use derived types where base types were expected. (My bitterness about the XmlSerializer boiled down to implementing a SerializableDictionary<K,V> and SerializableList<T> to allow derived types where base types were expected…)
The DataContractSerializer introduced with WCF is designed to address the compatibility issues of the BinaryFormatter, while still allowing for plain-text XML or binary output, for cycles in the object graph, for data not explicitly exposed through a get/set property, etc. As a side note, it also features significantly better serialization performance.
This is all fun and games, so why not measure it? And while we’re at it, why measure serialization alone when we can measure the entire WCF communication stack against .NET Remoting? So let’s get going. What we’re going to measure is WCF versus .NET Remoting, on two parameters: the amount of data transferred, and the kind of data transferred. As far as size is concerned, we’ll stick to powers of 2 between, say, a tiny 256-byte packet and a gigantic 4MB chunk. As for kinds of data, we’ll try a simple array of bytes (byte[]), a simple data structure, and a complex data structure aggregating additional data structures (introducing a relatively complex object graph). This simple benchmark produces the following (surprising?) results with 100 invocations at each data-size tick. (Note that the scale is logarithmic on both axes.)
So… what on earth is that? WCF, with all its promised optimizations, beats Remoting easily with an array of bytes or a simple data structure, yet fails so badly with a slightly more complicated object graph? (You might argue that proper data contracts should not exhibit complicated object graphs. Tough luck, welcome to the real world.)
Now, what’s so complex about that complex data structure? Here’s the definition (data contract) of what we’re transmitting over the wire:
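The actual data contract ships with the attached benchmark solution; a sketch along these lines captures its shape (the type and member names here are illustrative, not the original source): a container holding a payload and aggregating child containers, which is what makes the graph interesting.

```csharp
using System.Collections.Generic;
using System.Runtime.Serialization;

// Illustrative reconstruction of the benchmark's "complex" data contract.
// [Serializable] is included so the same types work with the BinaryFormatter.
[DataContract]
[Serializable]
public class Data
{
    [DataMember]
    public byte[] Payload { get; set; }
}

[DataContract]
[Serializable]
public class Container
{
    [DataMember]
    public Data Data { get; set; }

    // Containers aggregate further containers -- this self-reference
    // is what produces the "relatively complex object graph".
    [DataMember]
    public List<Container> Children { get; set; }
}
```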
And here’s the way it’s initialized:
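The initialization code is likewise part of the attached solution; the essential point, sketched below with illustrative names, is that a single Data instance and a single grandchild container are shared throughout the graph:

```csharp
using System.Collections.Generic;

// Ill-behaved initialization (sketch): one shared Data instance, and one
// shared grandchild container referenced from several parents.
var data = new Data { Payload = new byte[1024] };

var kiddo = new Container
{
    Data = data,                       // same Data as everyone else
    Children = new List<Container>()
};

var root = new Container
{
    Data = data,                       // shared again
    // The same "Kiddo Container" instance appears multiple times:
    Children = new List<Container> { kiddo, kiddo, kiddo }
};
```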
Hmm, so what we’re effectively creating is an object graph along the following lines:
Everyone’s pointing at the same data, plus there are lots of references to the same grandchild container ("Kiddo Container"). Let’s try something else entirely – why don’t we change the graph so that every container gets its own copy of the data:
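One way to build the reshaped graph (again, an illustrative sketch rather than the attached solution's exact code) is to construct every container and payload afresh, so nothing is shared:

```csharp
using System.Collections.Generic;
using System.Linq;

// Well-behaved initialization (sketch): each container owns a distinct
// Data instance, and no child container is referenced by two parents.
static Container MakeContainer(int payloadSize, int childCount)
{
    return new Container
    {
        Data = new Data { Payload = new byte[payloadSize] },
        // Each child is created from scratch -- no shared grandchildren.
        Children = Enumerable.Range(0, childCount)
                             .Select(_ => MakeContainer(payloadSize, 0))
                             .ToList()
    };
}
```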
The object graph looks much better now that everyone has their own copy of the data, and children aren’t duplicated across the graph. By the way, it’s clearly larger in size because there are more copies of the data. Let’s see how this affects performance:
Back to what we used to see! So what could possibly be so wrong with WCF to fail in the previous scenario so badly? It must have to do with serialization, because it’s fairly unlikely for the transport mechanism to behave differently with respect to the data being transferred.
Several bloggers have previously mentioned the merits of the little-known "preserveObjectReferences" flag that can be passed when constructing the DataContractSerializer. With this flag set, the DCS uses non-interoperable extensions to serialize object references as references, instead of reproducing the entire object in the serialized output. The BinaryFormatter has this behavior built-in because it doesn’t care about producing interoperable XML; WCF, designed for interoperability, doesn’t give you this non-standard extension by default. Which means that in our ill-behaved scenario, each "Data" object is laid out separately in the serialized stream, causing a terrible bloat compared to the BinaryFormatter. Compare and contrast the stream sizes produced by the two serializers for objects of various sizes:
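Measuring the stream sizes is straightforward; a sketch along these lines works, assuming the .NET 3.0/3.5 DataContractSerializer constructor overload that exposes the preserveObjectReferences flag directly (the helper names are mine):

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Runtime.Serialization;
using System.Runtime.Serialization.Formatters.Binary;

static class StreamSizes
{
    // Serialized size with the DataContractSerializer; the eight-argument
    // constructor lets us toggle preserveObjectReferences.
    public static long DcsSize(object graph, bool preserveObjectReferences)
    {
        var dcs = new DataContractSerializer(
            graph.GetType(), "root", "", new List<Type>(),
            int.MaxValue,             // maxItemsInObjectGraph
            false,                    // ignoreExtensionDataObject
            preserveObjectReferences, // the flag in question
            null);                    // no data contract surrogate
        using (var ms = new MemoryStream())
        {
            dcs.WriteObject(ms, graph);
            return ms.Length;
        }
    }

    // Serialized size with the BinaryFormatter, which preserves
    // object references out of the box.
    public static long BinaryFormatterSize(object graph)
    {
        using (var ms = new MemoryStream())
        {
            new BinaryFormatter().Serialize(ms, graph);
            return ms.Length;
        }
    }
}
```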
How about the "normal" scenario, where objects in the graph are distinct and the BinaryFormatter doesn’t have the advantage of properly serializing references?
Much more legitimate, and also explains the results!
If you’ve read this far, you’re probably wondering what good all this stuff is. After all, you can’t always modify the object graph so that it’s suitable for the DataContractSerializer. And on the other hand, you can’t really control the DataContractSerializer used by WCF. Or can you? Of course you can!
We can apply an operation behavior derived from DataContractSerializerOperationBehavior to our WCF operation contract. In that custom behavior we will specify the "preserveObjectReferences" flag for the DCS that we create. Once we do that, WCF will use our DCS and produce a significantly more compact serialization in the ill-behaved scenario. The code here is mostly boiler-plate – it’s attached to this post as part of the benchmark solution. But here are the results:
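The shape of that boiler-plate, sketched under the assumption that we override both CreateSerializer overloads to pass preserveObjectReferences = true (the class name is mine; the full version is in the attached solution):

```csharp
using System;
using System.Collections.Generic;
using System.Runtime.Serialization;
using System.ServiceModel.Description;
using System.Xml;

// Custom operation behavior that makes WCF build its serializers
// with preserveObjectReferences turned on.
public class PreserveReferencesOperationBehavior
    : DataContractSerializerOperationBehavior
{
    public PreserveReferencesOperationBehavior(OperationDescription operation)
        : base(operation) { }

    public override XmlObjectSerializer CreateSerializer(
        Type type, string name, string ns, IList<Type> knownTypes)
    {
        return new DataContractSerializer(type, name, ns, knownTypes,
            int.MaxValue, false, /* preserveObjectReferences: */ true, null);
    }

    public override XmlObjectSerializer CreateSerializer(
        Type type, XmlDictionaryString name, XmlDictionaryString ns,
        IList<Type> knownTypes)
    {
        return new DataContractSerializer(type, name, ns, knownTypes,
            int.MaxValue, false, true, null);
    }
}
```

To wire it up, replace the default DataContractSerializerOperationBehavior in each OperationDescription's Behaviors collection (or apply an attribute implementing IOperationBehavior that does the same).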
WCF wins again! (Bear in mind that we’ve adopted a solution that kills any chance of interoperability with non-.NET code, but then again that’s what .NET Remoting is all about.)
As a final note, if you’re looking to minimize the message footprint and enjoying WCF nonetheless, there’s another option not measured here – the NetDataContractSerializer. It behaves similarly to the BinaryFormatter, but can be used by WCF’s serialization mechanism (so it’s possible to change the operation behavior in the benchmark to return a new instance of the NetDataContractSerializer). I haven’t performed extensive measurements, but WCF with NetDataContractSerializer seems to outperform .NET Remoting in every scenario out of the box.
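Since the NetDataContractSerializer derives from XmlObjectSerializer, swapping it in amounts to returning it from the same overridden factory methods, along these lines (a sketch, reusing the hypothetical behavior class above):

```csharp
using System;
using System.Collections.Generic;
using System.Runtime.Serialization;

// Inside a class derived from DataContractSerializerOperationBehavior:
// hand WCF a NetDataContractSerializer instead of the default DCS.
public override XmlObjectSerializer CreateSerializer(
    Type type, string name, string ns, IList<Type> knownTypes)
{
    // Embeds .NET type information in the stream, like the
    // BinaryFormatter -- and is just as non-interoperable.
    return new NetDataContractSerializer();
}
```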
(You can download the benchmark sources from here, as a Visual Studio 2008 solution.)