
Zuker On Foundations

The realm of .NET (WPF, WCF and all around)

March 2010 - Posts

Parallel Extensions or Asynchronous Invocation with IO-based operations

I've been playing around with the Parallel Extensions shipped as part of .NET 4.0 and Visual Studio 2010 RC.

Making immutable, atomic, self-contained CPU-bound operations run in parallel is pretty easy. However, as soon as you drill down to more complex executions, where issues such as memory sharing, allocations, delegates, false sharing, and synchronization mechanisms come to mind, you may have trouble finding the optimal parallel execution model for your code.

I was interested in testing a specific scenario to see if the Parallel Extensions could be beneficial.

Consider the following example:
I have a method which gets a list of service endpoint addresses which I need to invoke.
I need to make the invocations concurrently and then wait until all of the results are completed and ready.

Prior to the arrival of Parallel Extensions, I used the standard asynchronous invocation pattern (Begin/EndXXX), and used the “WaitHandle.WaitAll” to wait for all the results to be ready.

In this specific case, I make a simple operation which basically does only IO-bound work.
I was wondering what would be the result of using the Parallel Extensions in this case.

Note that the results of testing parallelism depend on the code you're executing and on the machine configuration, and you should ensure that other running processes don't affect your test.

Expectation:
Because I'm testing an atomic IO-bound operation, I expect the Parallel Extensions to perform neither better nor worse than the asynchronous invocation pattern.

Test Detail:

The test runs 8 service calls concurrently. I describe it in terms of four execution patterns: asynchronous invocation, Tasks (TPL), Parallel.For, and PLINQ.
The test was executed on my machine, which has a quad-core CPU, 4 GB of RAM, and Windows 7 Ultimate 64-bit.

For each execution pattern, I run a dummy call first (to satisfy JIT and initialization) and then I monitor 5 executions, each making these 8 concurrent service calls.

The service operation implementation only calls “Thread.Sleep” for 2 seconds, which means that the whole client test should take 2 seconds on average per run.

The following list represents the service calls my operation needs to make. (For the sake of the example, it's just a list of integers; I make one service call for each item.)

static List<int> _serviceCalls = Enumerable.Repeat(1, 8).ToList();

I repeated the entire invocation of all the patterns two more times: once right after the first pass, and once more after 1 minute of sleep.
I did this to demonstrate that the Parallel Extensions incur some initialization overhead when they are used only once in a while (you'll see that in the results).

Invocation Patterns:

Asynchronous Invocation:

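static void CallServicesAsync(IMyService client)
{
    // Start every call without blocking and collect one wait handle per call.
    WaitHandle[] handles = _serviceCalls
        .Select(c => client.BeginDo(null, null).AsyncWaitHandle)
        .ToArray();

    // Block until all calls have signaled completion.
    // Note: EndDo is never called here, so results and exceptions are not observed.
    WaitHandle.WaitAll(handles);
}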

This pattern utilizes the asynchronous invocation pattern with the standard .NET thread pool.
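The snippet above waits on the handles but never ends the calls. As a minimal sketch of the full Begin/End round trip, the following uses a hypothetical `StubClient` (not the post's `IMyService` proxy) whose `BeginDo` simulates a 200 ms call with a task, since `Task` implements `IAsyncResult`. The class and method names here are assumptions made for illustration only.

```csharp
using System;
using System.Linq;
using System.Threading;
using System.Threading.Tasks;

public static class AsyncInvocationSketch
{
    // Hypothetical stand-in for the service proxy; a real WCF proxy would
    // expose BeginDo/EndDo through its asynchronous contract instead.
    public class StubClient
    {
        public IAsyncResult BeginDo(AsyncCallback callback, object state)
        {
            // Task implements IAsyncResult, so it can play the pending call.
            Task call = Task.Factory.StartNew(() => Thread.Sleep(200));
            if (callback != null)
            {
                call.ContinueWith(t => callback(t));
            }
            return call;
        }

        public void EndDo(IAsyncResult result)
        {
            // Waits for completion and rethrows any failure of the call.
            ((Task)result).Wait();
        }
    }

    // Starts callCount calls, waits for all of them, then ends each one.
    public static int CallServicesAsync(StubClient client, int callCount)
    {
        IAsyncResult[] pending = Enumerable.Range(0, callCount)
            .Select(i => client.BeginDo(null, null))
            .ToArray();

        // WaitAll needs between 1 and 64 handles and is not supported on STA threads.
        WaitHandle.WaitAll(pending.Select(r => r.AsyncWaitHandle).ToArray());

        // Ending each call is what surfaces its result or exception.
        foreach (IAsyncResult result in pending)
        {
            client.EndDo(result);
        }
        return pending.Length;
    }
}
```

Calling `AsyncInvocationSketch.CallServicesAsync(new AsyncInvocationSketch.StubClient(), 8)` mirrors the 8 concurrent calls of the test, with the stub's sleep standing in for the real service.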

Tasks: (TPL)

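static void CallServicesTasks(IMyService client)
{
    // Queue one task per call; each task runs the synchronous Do on a pool thread.
    Task[] tasks = _serviceCalls
        .Select(c => Task.Factory.StartNew(client.Do))
        .ToArray();

    // Block until every task has finished.
    Task.WaitAll(tasks);
}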

This pattern utilizes the tasks infrastructure included in the Parallel Extensions.
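A related option, which is not one of the patterns measured in this post, is to wrap the Begin/End pair in a task with `Task.Factory.FromAsync`, so the task represents the pending call rather than a pool thread blocked inside `Do`. The sketch below assumes a hypothetical `StubClient` with `BeginDo`/`EndDo` methods; the stub itself still sleeps on a pool thread purely to simulate the service.

```csharp
using System;
using System.Linq;
using System.Threading;
using System.Threading.Tasks;

public static class FromAsyncSketch
{
    // Hypothetical stand-in for the service proxy (simulates a 200 ms call).
    public class StubClient
    {
        public IAsyncResult BeginDo(AsyncCallback callback, object state)
        {
            Task call = Task.Factory.StartNew(() => Thread.Sleep(200));
            if (callback != null)
            {
                // FromAsync relies on this callback to learn that the call is done.
                call.ContinueWith(t => callback(t));
            }
            return call;
        }

        public void EndDo(IAsyncResult result)
        {
            ((Task)result).Wait();
        }
    }

    // Wraps each Begin/End pair in a Task and waits for all of them.
    public static int CallServicesFromAsync(StubClient client, int callCount)
    {
        Task[] tasks = Enumerable.Range(0, callCount)
            .Select(i => Task.Factory.FromAsync(client.BeginDo, client.EndDo, null))
            .ToArray();

        Task.WaitAll(tasks);

        // Count the calls that ended without an exception.
        return tasks.Count(t => t.Status == TaskStatus.RanToCompletion);
    }
}
```

Whether this performs differently from `StartNew(client.Do)` against a real service is not something the results in this post cover; it would need to be measured separately.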

Parallel.For:

Parallel.For(0, _serviceCalls.Count,
    // new ParallelOptions { MaxDegreeOfParallelism = calls }, -> not needed (unlike PLINQ)
    i =>
    {
        client.Do();
    });

This pattern utilizes the parallel iteration loops included in the Parallel Extensions.

PLINQ:

_serviceCalls.AsParallel()
    .WithDegreeOfParallelism(_serviceCalls.Count)
    .ForAll(c => client.Do());

This pattern utilizes PLINQ infrastructure.

Test Results:

[Image: timing results for each invocation pattern across the test runs]

The most consistent pattern at yielding an average of almost 2 seconds is by far the asynchronous invocation pattern, which did not meet my expectation.
This may change when the thread pool of the Parallel Extensions is integrated as the standard .NET thread pool, but currently, if I am facing a simple atomic IO-bound operation that I would like to run concurrently, I might as well keep using the standard asynchronous invocation pattern.

Following are the key issues that led me to this conclusion:

  1. Initialization overhead: you can see in the results that when you access the Parallel Extensions infrastructure only once in a while, you may pay some overhead for initializing the related context. This is noticeable in all of the Parallel Extensions infrastructures (Parallel.For, Tasks, and PLINQ); whichever ran first after the 1-minute pause incurred the overhead.
    Unfortunately, this basically means that you get the 2-second average only when the calls are made relatively close together in time. (You may see the overhead in the asynchronous pattern too from time to time, but it is usually the lowest.)

  2. PLINQ: a specific thing about PLINQ is that it uses 1 thread per core by default.
    This means that in my case, where the machine has 4 cores, you would see an average of 4 seconds! (4 cores, 4 threads, 4 concurrent calls at a time, so the 8 calls run in two rounds of 2 seconds each.)
    This is why I used the “WithDegreeOfParallelism” directive on the PLINQ query. It is extremely important to address this when dealing with IO-bound operations using PLINQ.
    I just hope the everyday developer will keep that in mind :)
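The arithmetic behind the PLINQ numbers can be sketched as a small helper. It assumes an idealized model in which every call blocks for the same duration and there is no scheduling or initialization overhead; `ParallelismEstimate` and `EstimateSeconds` are names made up for this illustration.

```csharp
using System;

public static class ParallelismEstimate
{
    // Ideal wall-clock time when `calls` equally long blocking operations
    // run on at most `degree` threads: they finish in ceil(calls / degree) rounds.
    public static double EstimateSeconds(int calls, int degree, double secondsPerCall)
    {
        if (calls < 0) throw new ArgumentOutOfRangeException("calls");
        if (degree < 1) throw new ArgumentOutOfRangeException("degree");

        int rounds = (calls + degree - 1) / degree; // integer ceiling
        return rounds * secondsPerCall;
    }
}
```

Under this model, `EstimateSeconds(8, 4, 2.0)` gives 4 seconds (the default of one thread per core on a quad-core machine), while `EstimateSeconds(8, 8, 2.0)` gives 2 seconds, which is what setting the degree of parallelism to the number of calls aims for.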

In spite of these findings, you may still want to consider using the Parallel Extensions to parallelize simple atomic IO-bound operations if you don't mind the issues above. That way you get what people may consider more readable code, and you rely on a promising infrastructure and its future optimizations.

Feel free to download the code and play with it yourself. (enter the specific post page to get the attachment link)

Update 01/04: Be sure to read the second take post.

Posted Wednesday, March 31, 2010 4:33 PM by Amir Zuker | 1 comment(s)

WCF Service Throttling

Throttling is an important behavior of your WCF service that you should address before publishing your service to clients.

The throttling behavior holds the configuration for 3 limits that control the amount of resources your service host can allocate to deal with client requests, which lets you manage resource usage and balance the performance load.

It is crucial that you set the behavior appropriately because the default limits of this service behavior are considered quite low.
These defaults were chosen to block DoS (Denial of Service) attacks on your service. However, most of the services we build are on-premises services within our enterprise, where it is more appropriate to support more clients than to block malicious attacks of some kind. (The general goal is to find the right balance between the two, and the default values usually don't provide it.)

Why am I blogging about it just now?
Well, one of the lesser-known changes made in WCF 4.0 is the change to the default values of the service throttling behavior.

Microsoft realized that the prior default values weren’t practical and were seldom applied that way.

The following summarizes the throttling behavior elements and their default values.
Note that "ProcessorCount" indicates the number of processors on the machine.

  1. MaxConcurrentCalls
    Defines the total number of concurrent calls that the service will accept.

    Prior to .NET 4.0 the default value is 16, whereas in .NET 4.0 the default value is 16 * ProcessorCount.

    In practice, the number of calls your service will actually handle depends on the concurrency mode your service is configured with.

  2. MaxConcurrentSessions
    Defines the total number of sessionful channels that the service will accept.

    Prior to .NET 4.0 the default value is 10, whereas in .NET 4.0 the default value is 100 * ProcessorCount.

    This setting affects only sessionful channels, where a session is represented by each proxy created by the client.

    If a client call requires a new sessionful channel and the limit has already been reached, the request is queued.

  3. MaxConcurrentInstances
    Defines the total number of service instances that will be created for servicing requests.

    Prior to .NET 4.0 the default value is 26, whereas in .NET 4.0 the default value is 116 * ProcessorCount. (the sum of the previous two)

    In practice, the number of service instances that will actually be created depends on the instance context mode your service is configured with.

    If your service is configured as per-session, the maximum number of instances and sessions will be the lower of the MaxConcurrentSessions and MaxConcurrentInstances values.

    If your service is configured as per-call, the setting limits the number of active service instances. If exceeded, new requests will be queued.

    If your service is configured as single, this setting is ignored.
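To make the .NET 4.0 formulas concrete, here is a small sketch that evaluates them for a given processor count. It only restates the multipliers listed above; `ThrottlingDefaults` is a hypothetical helper written for this example, not a WCF type.

```csharp
using System;

public static class ThrottlingDefaults
{
    // .NET 4.0 default for MaxConcurrentCalls, as listed above.
    public static int MaxConcurrentCalls(int processorCount)
    {
        return 16 * processorCount;
    }

    // .NET 4.0 default for MaxConcurrentSessions, as listed above.
    public static int MaxConcurrentSessions(int processorCount)
    {
        return 100 * processorCount;
    }

    // Sum of the two values above, i.e. 116 * processorCount.
    public static int MaxConcurrentInstances(int processorCount)
    {
        return MaxConcurrentCalls(processorCount) + MaxConcurrentSessions(processorCount);
    }
}
```

For a quad-core machine this works out to 64 calls, 400 sessions, and 464 instances; passing `Environment.ProcessorCount` gives the values for the machine the code runs on.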

Finally, WCF 4.0 has surely made the default values a bit more practical. In spite of that, I still recommend reviewing this behavior and setting it differently if needed.
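As a rough sketch of what overriding the defaults can look like in the service configuration file, the fragment below sets all three limits through the `serviceThrottling` behavior element. The behavior name and the numbers are placeholders chosen for illustration, not recommended values.

```xml
<!-- Sketch of an app.config / web.config fragment.
     "ThrottledBehavior" and the three numbers are illustrative placeholders. -->
<system.serviceModel>
  <behaviors>
    <serviceBehaviors>
      <behavior name="ThrottledBehavior">
        <serviceThrottling maxConcurrentCalls="64"
                           maxConcurrentSessions="400"
                           maxConcurrentInstances="464" />
      </behavior>
    </serviceBehaviors>
  </behaviors>
</system.serviceModel>
```

A service picks the behavior up by referencing it from its `service` element with `behaviorConfiguration="ThrottledBehavior"`. The right numbers depend on your load and hosting environment, so treat these as a starting point to measure against.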

Posted Sunday, March 21, 2010 4:05 PM by Amir Zuker | 13 comment(s)


WCF Contrib v2.1 Mar07

A new release has been published: WCF Contrib v2.1 Mar07.

This release is the final version of the v2.1 Beta that was published on February 10th. You can find the complete list of updates since v2.0 further in this post.

You can distinguish between this release and the v2.1 Beta Feb10 by checking the assembly file version, which was incremented to v2.1.0.1 in this release.

Changes from v2.1 Feb10 Beta:
There is one small change, which is a breaking change for those who used the asynchronous invocation pattern through "InvokeChannelAsync" or "InvokeAsync" (protected methods of ClientChannel).
This pattern is seldom used, so the chances of it affecting you are pretty minimal.
The change is very simple: instead of working with AsyncServiceInvokeCallback, these methods now take the general AsyncCallback, which they then call with a ChannelInvokeAsyncResult that contains all the information needed.

Posted Sunday, March 07, 2010 12:08 PM by Amir Zuker | with no comments
