Real-life story: Blocking Collection
This post discusses a real-life story that uncovers a non-trivial (yet logical) behavior of Parallel.ForEach when it is combined with BlockingCollection<T>.
I will explain why it happens and how to handle it correctly.
It all started when Guy Eden from ITG found that the following code seemed to leak memory:
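The original listing is not reproduced here. As a minimal sketch of the pattern, assuming the code simply handed the collection's consuming enumerable to Parallel.ForEach, the program below leaves the collection empty and samples the process thread count once per second. The names IdleConsumerDemo and SampleThreadCounts are made up for illustration.

```csharp
using System;
using System.Collections.Concurrent;
using System.Diagnostics;
using System.Threading;
using System.Threading.Tasks;

public static class IdleConsumerDemo
{
    // Samples the process thread count once per second while Parallel.ForEach
    // waits on a BlockingCollection<int> that never receives an item.
    public static int[] SampleThreadCounts(int seconds)
    {
        var queue = new BlockingCollection<int>();

        // Parallel.ForEach does not return until the enumerable completes,
        // so it runs on its own task here.
        Task consumer = Task.Run(() =>
            Parallel.ForEach(queue.GetConsumingEnumerable(),
                             item => Console.WriteLine(item)));

        var samples = new int[seconds];
        for (int i = 0; i < seconds; i++)
        {
            Thread.Sleep(1000);
            using (Process process = Process.GetCurrentProcess())
            {
                samples[i] = process.Threads.Count;
            }
        }

        // Nothing was ever added; completing the collection lets the loop end.
        queue.CompleteAdding();
        consumer.Wait();
        return samples;
    }

    public static void Main()
    {
        foreach (int count in SampleThreadCounts(5))
        {
            Console.WriteLine($"threads: {count}");
        }
    }
}
```

The sampled values depend on the runtime version and the machine, so none are shown here. The point is only that the loop is running while no item is ever added.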
It was leaking even though the blocking collection was empty (nothing was adding items to the collection).
While monitoring this code we can see a steep memory curve that keeps climbing.
I found it very surprising, because it seems like quite a reasonable scenario and BlockingCollection<T> is a prime member of the new TPL concurrent collection family.
So I decided to ask Stephen Toub about this behavior and, as always, Stephen enlightened me about what is really happening in this scenario.
The following is a snippet from Stephen's response:
"The Parallel.ForEach uses the ThreadPool. ForEach doesn’t use a fixed number of threads, but instead will use whatever threads the pool will make available to it. And the ThreadPool has a starvation mechanism, which will introduce additional threads if all threads in the pool are blocked and not making forward progress."
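In other words, the workers block while waiting on the empty collection, the pool reads that as starvation and injects more threads, and Parallel.ForEach takes them, so the thread count (and the memory that comes with it) keeps growing.
There are a couple of options to cope with this issue: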
The first is fairly simple: set MaxDegreeOfParallelism, as shown in the following snippet:
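The original snippet is not reproduced here. A minimal sketch, assuming the limit is passed through ParallelOptions, might look like the following. BoundedConsumerDemo, Run, and the producer task are hypothetical and exist only to make the example complete.

```csharp
using System;
using System.Collections.Concurrent;
using System.Threading;
using System.Threading.Tasks;

public static class BoundedConsumerDemo
{
    // Consumes itemCount integers with a capped degree of parallelism and
    // returns their sum plus the highest number of bodies seen running at once.
    public static (long Sum, int Peak) Run(int maxDegreeOfParallelism, int itemCount)
    {
        var queue = new BlockingCollection<int>();
        var options = new ParallelOptions
        {
            MaxDegreeOfParallelism = maxDegreeOfParallelism
        };

        // Hypothetical producer: adds the items, then marks the collection complete.
        Task producer = Task.Run(() =>
        {
            for (int i = 0; i < itemCount; i++) queue.Add(i);
            queue.CompleteAdding();
        });

        long sum = 0;
        int running = 0, peak = 0;
        Parallel.ForEach(queue.GetConsumingEnumerable(), options, item =>
        {
            int now = Interlocked.Increment(ref running);
            int seen;
            // Raise the recorded peak if this body pushed concurrency higher.
            while (now > (seen = Volatile.Read(ref peak)))
            {
                Interlocked.CompareExchange(ref peak, now, seen);
            }
            Interlocked.Add(ref sum, item);
            Interlocked.Decrement(ref running);
        });

        producer.Wait();
        return (sum, peak);
    }

    public static void Main()
    {
        var result = Run(maxDegreeOfParallelism: 2, itemCount: 100);
        Console.WriteLine($"sum={result.Sum}, peak concurrency={result.Peak}");
    }
}
```

With the cap in place the loop can no longer absorb every thread the pool injects, but the threads it does use are still pool threads that sit blocked while the collection is empty.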
But this solution makes inefficient use of threads, because we are obviously targeting a long-running scenario, where it is better not to tie up threads from the ThreadPool.
The other option is to use a custom scheduler (actually, this is one of the few scenarios where a custom scheduler seems right).
The following snippet shows a simplified version of the custom scheduler:
The scheduler takes maxDegreeOfParallelism as a constructor parameter (a better name might be poolSize) and constructs a pool of long-running tasks.
The tasks are synchronized using ManualResetEventSlim, whose Wait API accepts a cancellation token as a parameter (AutoResetEvent has no slim version and does not accept a cancellation token) (see lines 4, 5, 20).
The scheduler executes the tasks on non-ThreadPool threads and therefore won't interfere with the ThreadPool heuristics.
The following snippet shows how to use this scheduler:
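The original listings are not reproduced here. As a rough sketch of the idea (not the original code), the block below pairs a small scheduler with its use from Parallel.ForEach. DedicatedThreadScheduler, SchedulerDemo, and the producer task are hypothetical names. The scheduler assumes that TaskCreationOptions.LongRunning on the default scheduler yields dedicated non-pool threads, and it omits error handling and full cleanup.

```csharp
using System;
using System.Collections.Concurrent;
using System.Collections.Generic;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical scheduler: runs queued tasks on a fixed set of long-running workers.
public sealed class DedicatedThreadScheduler : TaskScheduler, IDisposable
{
    [ThreadStatic] private static DedicatedThreadScheduler t_owner;

    private readonly ConcurrentQueue<Task> _tasks = new ConcurrentQueue<Task>();
    private readonly ManualResetEventSlim _signal = new ManualResetEventSlim(false);
    private readonly CancellationTokenSource _cts = new CancellationTokenSource();
    private readonly int _poolSize;

    public DedicatedThreadScheduler(int poolSize)
    {
        _poolSize = poolSize;
        for (int i = 0; i < poolSize; i++)
        {
            Task.Factory.StartNew(WorkerLoop, CancellationToken.None,
                TaskCreationOptions.LongRunning, TaskScheduler.Default);
        }
    }

    public override int MaximumConcurrencyLevel => _poolSize;

    protected override void QueueTask(Task task)
    {
        _tasks.Enqueue(task);
        _signal.Set(); // wake the workers
    }

    // Inline only on this scheduler's own workers, so callers never run the work.
    protected override bool TryExecuteTaskInline(Task task, bool taskWasPreviouslyQueued)
        => ReferenceEquals(t_owner, this) && TryExecuteTask(task);

    protected override IEnumerable<Task> GetScheduledTasks() => _tasks.ToArray();

    // Cancels the workers' wait so the loops exit.
    public void Dispose() => _cts.Cancel();

    private void WorkerLoop()
    {
        t_owner = this;
        try
        {
            while (true)
            {
                _signal.Wait(_cts.Token); // throws once Dispose cancels the token
                if (_tasks.TryDequeue(out Task task))
                {
                    TryExecuteTask(task);
                    continue;
                }
                _signal.Reset();
                // Re-check: a task may have been queued just before the Reset.
                if (!_tasks.IsEmpty) _signal.Set();
            }
        }
        catch (OperationCanceledException) { }
    }
}

public static class SchedulerDemo
{
    // Returns the sum of the consumed items and how many bodies ran on pool threads.
    public static (long Sum, int PoolThreadHits) Run(int poolSize, int itemCount)
    {
        var queue = new BlockingCollection<int>();
        using (var scheduler = new DedicatedThreadScheduler(poolSize))
        {
            var options = new ParallelOptions
            {
                TaskScheduler = scheduler,
                MaxDegreeOfParallelism = poolSize
            };
            Task producer = Task.Run(() =>
            {
                for (int i = 0; i < itemCount; i++) queue.Add(i);
                queue.CompleteAdding();
            });

            long sum = 0;
            int poolThreadHits = 0;
            Parallel.ForEach(queue.GetConsumingEnumerable(), options, item =>
            {
                if (Thread.CurrentThread.IsThreadPoolThread)
                    Interlocked.Increment(ref poolThreadHits);
                Interlocked.Add(ref sum, item);
            });
            producer.Wait();
            return (sum, poolThreadHits);
        }
    }

    public static void Main()
    {
        var result = Run(poolSize: 3, itemCount: 100);
        Console.WriteLine($"sum={result.Sum}, bodies on pool threads={result.PoolThreadHits}");
    }
}
```

Refusing to inline on foreign threads is a deliberate choice in this sketch: the thread that calls Parallel.ForEach then only waits, and the loop bodies stay on the dedicated workers.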
You should be aware that Parallel.ForEach assumes short-running actions and tries to use whatever threads the pool will make available to it.
If you want to use it for potentially long-running actions, you had better use a scheduler that offloads the work to non-ThreadPool threads, or at least set MaxDegreeOfParallelism.