TPL Data Flow
I first encountered TPL Dataflow (part of .NET 4.5, TPL DTF for short) at the Build conference I attended last year, but I must say I didn’t get it then. It sounded too much like Rx, which I was already familiar with. Recently, during a project that required high-performance, CPU-bound processing with high throughput and low latency, my colleague Alon gave me the idea: why not use TPL Dataflow? From then on my world changed forever. It’s amazing how many systems this technology suits, and for the project I was working on it was perfect. For those of you who are not yet familiar with this technology, I suggest reading the article from here and seeing this amazing example of Kinect and TPL DTF here. You’ll thank me later.
So I created some crazy TPL Dataflow networks at runtime (~100k different networks) and it worked, hitting 100% CPU most of the time, leveraging all the cores and getting really good performance. But it was really hard to debug and to understand the flow from the code. It was Alon again who suggested writing a debugger visualizer, so I did. These are my insights from the process.
Writing a Debugger Visualizer
I don’t know how many of you have had the chance to implement a debugger visualizer, but it is really straightforward if you only need a view of a simple object: just follow 5-6 steps (read here). Finally, put your assembly in the visualizers folder (e.g. for VS 11, C:\Program Files (x86)\Microsoft Visual Studio 11.0\Common7\Packages\Debugger\Visualizers) and you’re done. This is very simple if your debugging target is simple and serializable. I, on the other hand, had a few issues.
The Issues I encountered during the implementation
- Debugger visualizer target cannot be an interface
This really surprised me. The target I wanted was IDataflowBlock, the common interface that all TPL Dataflow blocks implement, and I discovered that this is impossible: the target must be a type (it can be a common base class, but not an interface). My solution was not perfect: I added multiple attributes, one for each of the built-in blocks as a target. It’s not that bad, because the built-in blocks are a good start, and as long as you start from a built-in block (which is what you normally do) the visualizer will discover the rest of the linked blocks even if they are not built-in blocks. In addition, implementing IDataflowBlock on your own is not recommended, since there is a lot to take care of: locking, synchronization, buffering and more. If any of you Microsoft guys are listening, I think support for an interface as the target of a debugger visualizer would be very handy.
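To make the "multiple attributes" workaround concrete, here is a minimal sketch of how such a registration could look. The namespace and the two classes DataflowVisualizer and DataflowObjectSource are hypothetical placeholders, not the types from my project; in a real visualizer they would derive from DialogDebuggerVisualizer and VisualizerObjectSource, which live in the Visual Studio visualizer assembly and are left out here so the snippet compiles on its own. It also assumes a reference to the TPL Dataflow assembly, and the four block types listed are only a sample of the built-in blocks.

```csharp
using System;
using System.Collections.Generic;
using System.Diagnostics;
using System.Reflection;
using System.Threading.Tasks.Dataflow;

// One attribute per built-in block type, because Target must be a concrete
// type and cannot be an interface. Open generic types are used as targets.
[assembly: DebuggerVisualizer(
    typeof(VisualizerSketch.DataflowVisualizer),
    typeof(VisualizerSketch.DataflowObjectSource),
    Target = typeof(BufferBlock<>), Description = "Dataflow network (sketch)")]
[assembly: DebuggerVisualizer(
    typeof(VisualizerSketch.DataflowVisualizer),
    typeof(VisualizerSketch.DataflowObjectSource),
    Target = typeof(ActionBlock<>), Description = "Dataflow network (sketch)")]
[assembly: DebuggerVisualizer(
    typeof(VisualizerSketch.DataflowVisualizer),
    typeof(VisualizerSketch.DataflowObjectSource),
    Target = typeof(TransformBlock<,>), Description = "Dataflow network (sketch)")]
[assembly: DebuggerVisualizer(
    typeof(VisualizerSketch.DataflowVisualizer),
    typeof(VisualizerSketch.DataflowObjectSource),
    Target = typeof(BroadcastBlock<>), Description = "Dataflow network (sketch)")]

namespace VisualizerSketch
{
    // Hypothetical placeholders: a real visualizer derives these from
    // DialogDebuggerVisualizer and VisualizerObjectSource respectively.
    public class DataflowVisualizer { }
    public class DataflowObjectSource { }

    public static class Program
    {
        // Returns the names of all block types registered as visualizer targets.
        public static List<string> RegisteredTargets()
        {
            var names = new List<string>();
            foreach (DebuggerVisualizerAttribute attribute in
                     Assembly.GetExecutingAssembly().GetCustomAttributes<DebuggerVisualizerAttribute>())
            {
                names.Add(attribute.Target.Name);
            }
            return names;
        }

        public static void Main()
        {
            foreach (string name in RegisteredTargets())
            {
                Console.WriteLine(name);
            }
        }
    }
}
```

The repetition is ugly, but it is the price of not being able to point a single attribute at the interface.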
- DataFlow blocks are not serializable
As I wrote in the previous paragraph, writing a debugger visualizer is simple (in most cases…). Somehow the running process (the debuggee) needs to transfer the debugged object to the IDE (the debugger), and this is done by serialization. The thing is, none of the IDataflowBlock implementations is serializable. To resolve this I derived a class from VisualizerObjectSource and overrode the GetData method, which gets the object and an output stream and writes the object to the stream; the debugger can then read the stream on its own side. So I created a serializable object that holds the needed debug info, serialized it to the stream and deserialized it on the debugger side. It worked OK until I tested more complicated networks containing shared blocks, meaning two or more blocks are connected to the same block. After serialization I got different references on the debugger side. The solution to this problem was the DataContract attribute:
[DataContract(IsReference=true)] and we get the same reference.
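Here is a small, self-contained sketch of the shared-reference problem and the fix. BlockInfo is a hypothetical snapshot class, not the one from my project, and WriteSnapshot only mirrors the shape of the GetData override (object in, stream out) without depending on the Visual Studio assembly. It assumes DataContractSerializer on both sides of the stream.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Runtime.Serialization;

// Hypothetical serializable snapshot of one block. IsReference = true makes
// the serializer write an id the first time an instance is seen and a
// reference to that id afterwards, instead of a second full copy.
[DataContract(IsReference = true)]
public class BlockInfo
{
    [DataMember] public string Name { get; set; }
    [DataMember] public List<BlockInfo> LinkedTargets { get; set; } = new List<BlockInfo>();
}

public static class SharedReferenceDemo
{
    // Debuggee side: same shape as GetData(object target, Stream outgoingData).
    public static void WriteSnapshot(BlockInfo root, Stream outgoingData)
    {
        new DataContractSerializer(typeof(BlockInfo)).WriteObject(outgoingData, root);
    }

    // Debugger side: rebuild the snapshot graph from the stream.
    public static BlockInfo ReadSnapshot(Stream incomingData)
    {
        return (BlockInfo)new DataContractSerializer(typeof(BlockInfo)).ReadObject(incomingData);
    }

    // Builds root -> (a, b) -> shared and checks that after a round trip
    // both a and b still point at the very same "shared" instance.
    public static bool RoundTripPreservesSharing()
    {
        var shared = new BlockInfo { Name = "shared" };
        var a = new BlockInfo { Name = "a" };
        var b = new BlockInfo { Name = "b" };
        a.LinkedTargets.Add(shared);
        b.LinkedTargets.Add(shared);
        var root = new BlockInfo { Name = "root" };
        root.LinkedTargets.Add(a);
        root.LinkedTargets.Add(b);

        using (var stream = new MemoryStream())
        {
            WriteSnapshot(root, stream);
            stream.Position = 0;
            BlockInfo copy = ReadSnapshot(stream);
            return ReferenceEquals(
                copy.LinkedTargets[0].LinkedTargets[0],
                copy.LinkedTargets[1].LinkedTargets[0]);
        }
    }

    public static void Main()
    {
        Console.WriteLine(RoundTripPreservesSharing());
    }
}
```

Without IsReference the shared block is written once per parent, so the debugger side ends up with two unrelated copies and the drawn network shows a block twice.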
- Retrieving the information from the IDataflowBlock block
This experience taught me a lot about the internal implementation of the built-in blocks. If you look at these blocks you get a lot of data, like linked blocks, the input queue, etc. This data is not part of the block’s properties; apparently each block has a private inner class called DebugView that holds it. This made me use some hard-core reflection (and made me understand how powerful reflection is). One more thing was the internal implementation of joins and links. When you link to a Join block, the link actually goes to an internal block called JoinBlockTarget<,>, which has no data about the linked targets but holds a reference to the owning Join block. Another example: adding a link with a filter adds another block called FilterLinkPropagator (see pic above).
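The snippet below is a rough sketch of the kind of reflection involved, not the code from the visualizer itself. SampleBlock<T> is a hypothetical stand-in that imitates the pattern of a block with a private nested DebugView class wired up through DebuggerTypeProxy; the assumption that a real block exposes its debug view exactly this way (same attribute, same constructor shape) is mine, and internals like these can change between versions.

```csharp
using System;
using System.Collections.Generic;
using System.Diagnostics;
using System.Linq;
using System.Reflection;

// Hypothetical stand-in for a built-in block: the interesting state is only
// reachable through a private nested debug view class.
[DebuggerTypeProxy(typeof(SampleBlock<>.DebugView))]
public class SampleBlock<T>
{
    private readonly Queue<T> _inputQueue = new Queue<T>();

    public void Post(T item) { _inputQueue.Enqueue(item); }

    private sealed class DebugView
    {
        private readonly SampleBlock<T> _block;
        public DebugView(SampleBlock<T> block) { _block = block; }
        public IEnumerable<T> InputQueue { get { return _block._inputQueue; } }
    }
}

public static class DebugViewReader
{
    private const BindingFlags AnyInstance =
        BindingFlags.Instance | BindingFlags.Public | BindingFlags.NonPublic;

    // Creates the block's debugger proxy through reflection and returns the
    // proxy's property values keyed by property name.
    public static IDictionary<string, object> Read(object block)
    {
        var values = new Dictionary<string, object>();
        Type blockType = block.GetType();
        DebuggerTypeProxyAttribute proxy =
            blockType.GetCustomAttributes<DebuggerTypeProxyAttribute>().FirstOrDefault();
        if (proxy == null) return values; // no debug view declared on this type

        Type proxyType = Type.GetType(proxy.ProxyTypeName, throwOnError: true);
        if (proxyType.IsGenericTypeDefinition)
        {
            // The nested view shares the block's type parameters, so close it
            // with the same type arguments as the block instance.
            proxyType = proxyType.MakeGenericType(blockType.GetGenericArguments());
        }

        object view = Activator.CreateInstance(
            proxyType, AnyInstance, null, new[] { block }, null);
        foreach (PropertyInfo property in proxyType.GetProperties(AnyInstance))
        {
            values[property.Name] = property.GetValue(view);
        }
        return values;
    }

    public static void Main()
    {
        var block = new SampleBlock<int>();
        block.Post(1);
        block.Post(2);
        foreach (KeyValuePair<string, object> entry in Read(block))
        {
            Console.WriteLine(entry.Key);
        }
    }
}
```

Reading the proxy type from the attribute avoids hard-coding the nested class name, but any approach like this still depends on private implementation details.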
So start using it now…
Go to http://dataflowdebuggerview.codeplex.com/, download the source/binaries and start using it. I’ll be happy to get your comments and suggestions.
VS 2012 RC Update: TPL Dataflow was moved to NuGet (http://nuget.org/packages/Microsoft.Tpl.Dataflow/4.5.1-rc). Pay attention: you have to add a reference to System.Threading.Tasks.dll (under C:\Program Files (x86)\Reference Assemblies\Microsoft\Framework\.NETCore\v4.5).