Bnaya Eshet

Rx - DistinctUntilChanged

this post will focus on the simple yet very useful DistinctUntilChanged operator.

sometimes a data stream produces the same value for a while. you can see this in a stock exchange stream, where the price of a specific stock may stay steady for some time.

the observer can reduce its computation load by ignoring repeated values (consecutive repeats only; for non-consecutive repeats you can use the Distinct operator).

DistinctUntilChanged has the following overloads:

Code Snippet
    IObservable<TSource> DistinctUntilChanged<TSource>();
    IObservable<TSource> DistinctUntilChanged<TSource, TKey>(
        Func<TSource, TKey> keySelector);
    IObservable<TSource> DistinctUntilChanged<TSource>(
        IEqualityComparer<TSource> comparer);
    IObservable<TSource> DistinctUntilChanged<TSource, TKey>(
        Func<TSource, TKey> keySelector,
        IEqualityComparer<TKey> comparer);

I have hidden the first parameter of each extension method, which is: this IObservable<TSource> source.

you can use the first overload when you are dealing with a simple data stream (of primitives or simple value types),
but real-life data streams are often more complex.
for complex scenarios you may prefer one of the other 3 overloads, where you can define the comparison yourself.
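
to make the key selector overload concrete, here is a minimal sketch that compares consecutive items by a single property. it assumes the System.Reactive package; the Quote type and its values are hypothetical and exist only for this illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Reactive.Linq;

public static class DistinctUntilChangedSketch
{
    // Hypothetical message type, used only for this illustration.
    public sealed class Quote
    {
        public string Symbol { get; set; }
        public decimal Price { get; set; }
    }

    public static List<decimal> Run()
    {
        var quotes = new[]
        {
            new Quote { Symbol = "ABC", Price = 10m },
            new Quote { Symbol = "ABC", Price = 10m }, // consecutive repeat: dropped
            new Quote { Symbol = "ABC", Price = 11m },
            new Quote { Symbol = "ABC", Price = 10m }, // not consecutive: kept
            new Quote { Symbol = "ABC", Price = 10m }, // consecutive repeat: dropped
        };

        var seen = new List<decimal>();
        quotes.ToObservable()
              .DistinctUntilChanged(q => q.Price)   // compare by Price only
              .Subscribe(q => seen.Add(q.Price));
        return seen;                                // 10, 11, 10
    }
}
```

notice that the second 10 is delivered again: only consecutive repeats are suppressed, which is the difference from Distinct.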

a stock exchange scenario may look something like the following snippet:

Code Snippet
    public IObservable<Stock> AlertStream(IObservable<Stock> provider)
    {
        return from oldAndCurrent in provider
                   .DistinctUntilChanged(stock => stock.Price)
                   .Buffer(2, 1)
               // when the stream completes, the last buffer holds a single item
               where oldAndCurrent.Count == 2
               let old = oldAndCurrent[0]
               let current = oldAndCurrent[1]
               where old.Price < current.Price &&
                     old.Volume + FACTOR < current.Volume
               select current;
    }

the sample uses a sliding buffer of 2 items that advances by 1 item,
you can read more about the Buffer operator here.

the AlertStream method gets a stock stream.

it filters out steady prices by using the DistinctUntilChanged operator with a lambda that selects the Price property as the comparison key.

after that it buffers the current value together with the previous value (without the distinct step, the buffer would also contain pairs with the same price).

the last thing the snippet does before returning the filtered stream is to filter out old / current pairs that do not match the criteria.

Summary

as simple as DistinctUntilChanged is, it is a useful operator which you should not ignore.


Tpl Dataflow walkthrough - Part 5

this post is a complete walkthrough of a web crawler sample that was built purely with Tpl Dataflow.

it was built on .NET 4.5 / C# 5 (on a virtual machine using VS 11).

I will analyze each part of this sample, discussing both the Dataflow blocks and the patterns in use.

the sample code is available here (it is a VS 11 project).

during the walkthrough you will see the following Tpl Dataflow blocks:

  • TransformBlock
  • TransformManyBlock
  • ActionBlock
  • BroadcastBlock

you will see how the async / await signature of the Dataflow blocks is better for executing IO bound operations (without blocking a ThreadPool worker thread).

I should also mention that this post is part of the Tpl Dataflow series, which you should read before reading this one.

Disclaimer: the web crawler sample is for educational purposes only (running a web crawler application may be forbidden by the law of your country).

The sample topology:

a Tpl Dataflow application is usually a collection of agents which are linked together in order to compose a complete solution. each agent has its own responsibilities and concerns. the following diagram presents the agent topology for this sample:

[diagram: agent topology of the crawler sample]

agent block types and responsibilities

Downloader: the responsibility of the downloader is to download the html of a web page. it uses a TransformBlock<Tin, Tout>, which belongs to the executor block family. the transform block gets a url as its input message and produces the page's html as its output.

the transform block is constructed from:

  • input buffer (for url)
  • task (do the transformation)
  • output buffer (for the downloaded html)

the task takes one message at a time from the input buffer, transforms it with a Func<Tin, Tout> delegate (which the block gets as a constructor parameter) and puts the result in the output buffer, where it is available for other blocks to consume.
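
the input buffer / task / output buffer flow can be seen in a minimal sketch with a synchronous delegate. it assumes the System.Threading.Tasks.Dataflow package; the block name and the string length transformation are made up for this illustration and are not part of the crawler.

```csharp
using System;
using System.Threading.Tasks.Dataflow;

public static class TransformBlockSketch
{
    public static int[] Run()
    {
        // input buffer -> Func<string, int> -> output buffer
        var lengthOf = new TransformBlock<string, int>(text => text.Length);

        // Post places the messages in the input buffer.
        lengthOf.Post("a");
        lengthOf.Post("abc");

        // Receive blocks until a result is available in the output buffer;
        // results come out in the same order as the inputs.
        return new[] { lengthOf.Receive(), lengthOf.Receive() }; // 1, 3
    }
}
```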

later we will see that our crawler transformation actually takes a Func<Tin, Task<Tout>>, which is a better signature for IO bound operations (I will discuss it later).

the transform block is a propagator block, which means that it is exposed both as a target block and as a source block. it implements IPropagatorBlock<Tin, Tout>.

the following snippet shows that IPropagatorBlock is simply a combination of ITargetBlock and ISourceBlock.

Code Snippet
    public interface IPropagatorBlock<in TInput, out TOutput>
        : ITargetBlock<TInput>,
          ISourceBlock<TOutput>,
          IDataflowBlock
    {
    }

Start crawling

Code Snippet
  1. var downloader = new TransformBlock<string, string>(
  2.     async (url) =>
  3.     {
  4.         // using IOCP the thread pool worker thread does return to the pool
  5.         WebClient wc = new WebClient();
  6.         string result = await wc.DownloadStringTaskAsync(url);
  7.         return result;
  8.     }, downloaderOptions);

as I mentioned earlier, the downloader constructor gets a Func<Tin, Task<Tout>>, therefore we can pass an async lambda expression (line 2). the code awaits the download (at line 6).

if you are not familiar with the async / await concept you can read this post or more posts here.

while awaiting the download (DownloadStringTaskAsync), the block's task returns its worker thread to the ThreadPool and takes advantage of IOCP (IO Completion Ports). this is an IO bound operation, which means that no CPU resources are needed while the network card fetches the data from the network.
it is important to understand that while the network card is handling the request, the agent's task does not fetch another message from the buffer; it resumes when the data becomes available.

analyzing the html

the crawler uses 2 agents to analyze the downloaded html:

  • link parser (which looks for link elements <a href="..."/>)
  • image parser (which looks for image elements <img src="..."/>)

both agents should be linked to the downloader agent.
the problem is that linking both agents directly to the downloader agent will result in starvation of one of them.
unlike Rx, most blocks forward each message to the first linked target that accepts it and ignore the other linked targets, which means that each message will be handled by a single agent.

broadcast behavior can be achieved by using a BroadcastBlock<T>, which is part of the pure buffer family.
the broadcast block is constructed from:

  • input buffer
  • task
  • output buffer of single item.

the task fetches a message from the input buffer and places it in the output buffer; from the output buffer the message is offered to the linked blocks.

the broadcast block gets a Func<T,T> delegate as a constructor parameter. the idea behind it is cloning (which enables separation of the messages).
if you pass a reference type message to multiple agents without cloning, changes that are made by one agent will be visible to all the other agents.

the broadcast block uses the cloning delegate before sending the message to the linked agents.
the cloning pattern ensures that only a single block processes a message instance at a time; this maintains message ownership and avoids the need for data synchronization for thread safety.
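
a minimal sketch of the broadcast behavior: every linked target gets its own copy of each message. it assumes the System.Threading.Tasks.Dataflow package; the two ActionBlock targets and the message texts are hypothetical stand-ins for the parser agents.

```csharp
using System.Collections.Concurrent;
using System.Threading.Tasks;
using System.Threading.Tasks.Dataflow;

public static class BroadcastSketch
{
    public static async Task<(int first, int second)> RunAsync()
    {
        var first = new ConcurrentQueue<string>();
        var second = new ConcurrentQueue<string>();

        // strings are immutable, so the identity function is a valid "clone"
        var broadcaster = new BroadcastBlock<string>(s => s);
        var a = new ActionBlock<string>(s => first.Enqueue(s));
        var b = new ActionBlock<string>(s => second.Enqueue(s));

        // propagate completion so the targets finish once the source is done
        var options = new DataflowLinkOptions { PropagateCompletion = true };
        broadcaster.LinkTo(a, options);
        broadcaster.LinkTo(b, options);

        broadcaster.Post("page-1");
        broadcaster.Post("page-2");
        broadcaster.Complete();

        await Task.WhenAll(a.Completion, b.Completion);
        return (first.Count, second.Count); // both targets saw both messages
    }
}
```

the targets here are unbounded, so they accept every message immediately; a target that postpones messages (for example, one with a bounded capacity) may miss messages that the broadcast block has already overwritten.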

the crawler uses the following block definition for broadcasting:

Code Snippet
    var contentBroadcaster = new BroadcastBlock<string>(s => s);

in our case the html content is a string, which is immutable, therefore no real cloning is needed.

the crawler links the agents (blocks) to each other after the construction of all the relevant blocks; right now we are focusing on the agents themselves.

Link parser

the link parser uses the following regular expression to find all the links (<a href="..."/>) in the html and extract each link's url.

Code Snippet
    private const string LINK_REGEX_HREF =
        "\\shref=('|\\\")?(?<LINK>http\\://.*?(?=\\1)).*>";
    private static readonly Regex _linkRegexHRef =
        new Regex(LINK_REGEX_HREF);

unlike the downloader agent, which gets a single input (url) and produces a single output (html),
the link parser produces multiple outputs (links) for each input (html).
you could use the transform block and set the output type to an array of links, but Tpl Dataflow has a better block for this scenario.
because the processing of each link is independent of the other links, it is better if the output buffer contains flattened link objects rather than a collection of link arrays.

the crawler uses the TransformManyBlock<Tin,Tout>. this block is similar to the transform block with only one difference: the delegate passed to the constructor is one of the following delegates:

  • Func<Tin, IEnumerable<Tout>>
  • Func<Tin, Task<IEnumerable<Tout>>>

the block's task enumerates the returned results and puts each of them, separately, in the output buffer.
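
the flattening can be seen in a minimal sketch that is unrelated to html parsing. it assumes the System.Threading.Tasks.Dataflow package; the comma splitting delegate is only an illustration.

```csharp
using System.Collections.Generic;
using System.Threading.Tasks.Dataflow;

public static class TransformManySketch
{
    public static List<string> Run()
    {
        // one input message may produce zero, one or many output messages
        var splitter = new TransformManyBlock<string, string>(
            line => line.Split(','));

        splitter.Post("a,b");
        splitter.Post("c");

        // the output buffer holds 3 separate messages, not 2 arrays
        return new List<string>
        {
            splitter.Receive(),
            splitter.Receive(),
            splitter.Receive()
        }; // a, b, c
    }
}
```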

this is the code for the link parser agent:

Code Snippet
    var linkParser = new TransformManyBlock<string, string>(
           (html) =>
           {
               var output = new List<string>();
               var links = _linkRegexHRef.Matches(html);
               foreach (Match item in links)
               {
                   var value = item.Groups["LINK"].Value;
                   output.Add(value);
               }
               return output;
           });

it is very straightforward: parse each html page by using the regex and return a list of results, which the block will unpack into the output buffer.

Image parser

the image parser is quite similar to the link parser.
the only difference is that it uses a different regular expression, which extracts the image's url.

the regex part is:

Code Snippet
    private const string IMG_REGEX =
        "<\\s*img [^\\>]*src=('|\")?(?<IMG>http\\://.*?(?=\\1)).*>\\s*([^<]+|.*?)?\\s*</a>";
    private static readonly Regex _imgRegex =
        new Regex(IMG_REGEX);

and the parser agent code is:

Code Snippet
    var imgParser = new TransformManyBlock<string, string>(
            (html) =>
            {
                var output = new List<string>();
                var images = _imgRegex.Matches(html);
                foreach (Match item in images)
                {
                    var value = item.Groups["IMG"].Value;
                    output.Add(value);
                }
                return output;
            });

writer agent

the last operational agent is the writer agent, which downloads an image from a url and saves it to the local disk.

the writer uses an action block, which is a simple executor block that has an input buffer and a task.

the task fetches messages from the buffer and executes a delegate which is given as a constructor parameter.
the delegate signature can be either Action<T> or Func<T, Task>. the latter is great for IO bound operations (for the same reasons discussed earlier when we looked at the transform block signature).

because the writer is doing 2 IO bound operations:

  • download the image from the web
  • write the image to the file system

the crawler uses the Func<T, Task> signature.
the writer code is:

Code Snippet
  1. var writer = new ActionBlock<string>(async url =>
  2. {
  3.     WebClient wc = new WebClient();
  4.     // using IOCP the thread pool worker thread does return to the pool
  5.     byte[] buffer = await wc.DownloadDataTaskAsync(url);
  6.     string fileName = Path.GetFileName(url);
  7.  
  8.     string name = @"Images\" + fileName;
  9.  
  10.     using (Stream srm = File.OpenWrite(name))
  11.     {
  12.         await srm.WriteAsync(buffer, 0, buffer.Length);
  13.     }
  14. });

the first await (line 5) waits until the network card signals that the download has completed,
and the second await (line 12) waits until the file system signals that the write has completed.

you may have noticed that the second await is within a using block; you can read more about this topic in this post.

link it together

right now we have most of our building blocks and it is time to define the data-flow by linking the blocks to each other.

the downloader should be linked to the content broadcaster, which in turn should be linked to both the image parser and the link parser. the image parser should be linked to the writer, and the link parser should be linked back to the downloader (so it can crawl further).

but there is one last issue.
it happens that some web pages have links that target an image. this leads us to more complex linking, where the link parser should be linked to the downloader and also have a conditional link to the writer for those urls that have an image suffix.
as we discussed earlier, having a direct link from the link parser to both the downloader and the writer will result in starvation of one of those agents.
we need a final broadcast block which will handle this distribution task.

Code Snippet
    var linkBroadcaster = new BroadcastBlock<string>(s => s);

the link parser will be linked to the broadcaster and the broadcaster will be linked to both the downloader and the writer.

we have spoken of the conditional link from the link parser to the writer, but it will be more effective if the link to the downloader also passes only those pages that most likely have useful data, like php, aspx, htm, etc.

Filtering linked messages

the following predicates will be used in order to filter linked messages:

Code Snippet
  1. StringComparison comparison = StringComparison.InvariantCultureIgnoreCase;
  2. Predicate<string> linkFilter = link =>
  3.     link.IndexOf(".aspx", comparison) != -1 ||
  4.     link.IndexOf(".php", comparison) != -1 ||
  5.     link.IndexOf(".htm", comparison) != -1 ||
  6.     link.IndexOf(".html", comparison) != -1;
  7. Predicate<string> imgFilter = url =>
  8.     url.EndsWith(".jpg", comparison) ||
  9.     url.EndsWith(".png", comparison) ||
  10.     url.EndsWith(".gif", comparison);

the first predicate (line 2) filters the messages that target the downloader agent, and the second (line 7) filters the link parser results that target the writer agent.

compose the data-flow

finally we got to the agent composition.

Code Snippet
  1. IDisposable disposeAll = new CompositeDisposable(
  2.     // from [downloader] to [contentBroadcaster]
  3.     downloader.LinkTo(contentBroadcaster),
  4.     // from [contentBroadcaster] to [imgParser]
  5.     contentBroadcaster.LinkTo(imgParser),
  6.     // from [contentBroadcaster] to [linkParser]
  7.     contentBroadcaster.LinkTo(linkParser),
  8.     // from [linkParser] to [linkBroadcaster]
  9.     linkParser.LinkTo(linkBroadcaster),
  10.     // conditional link from [linkBroadcaster] to [downloader]
  11.     linkBroadcaster.LinkTo(downloader, linkFilter, true),
  12.     // from [linkBroadcaster] to [writer]
  13.     linkBroadcaster.LinkTo(writer, imgFilter, true),
  14.     // from [imgParser] to [writer]
  15.     imgParser.LinkTo(writer));

each LinkTo operation returns a disposable instance which can be used to dispose the link when it is no longer needed. the crawler composes all those disposables into a single disposable called disposeAll by using the CompositeDisposable, which is part of the Rx library.

you can see the conditional LinkTo at lines 11 and 13.
it is very important to set the last parameter of LinkTo to true if you don't want the link to be disposed when a message doesn't match the filter.
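
the LinkTo overload with a boolean flag appears to come from the preview build this sample was written against. as far as I know, the released System.Threading.Tasks.Dataflow package instead offers LinkTo overloads that take a DataflowLinkOptions and a Predicate<T>. the following is a minimal sketch under that assumption; the block names, filters and urls are hypothetical and much simpler than the crawler's.

```csharp
using System.Collections.Concurrent;
using System.Threading.Tasks;
using System.Threading.Tasks.Dataflow;

public static class FilteredLinkSketch
{
    public static async Task<(string[] pages, string[] images)> RunAsync()
    {
        var pages = new ConcurrentQueue<string>();
        var images = new ConcurrentQueue<string>();

        var links = new BroadcastBlock<string>(s => s);
        var downloader = new ActionBlock<string>(url => pages.Enqueue(url));
        var writer = new ActionBlock<string>(url => images.Enqueue(url));

        var options = new DataflowLinkOptions { PropagateCompletion = true };
        // each target only receives the messages that match its predicate
        links.LinkTo(downloader, options, url => url.EndsWith(".html"));
        links.LinkTo(writer, options, url => url.EndsWith(".png"));

        links.Post("http://example.com/a.html");
        links.Post("http://example.com/b.png");
        links.Post("http://example.com/c.zip"); // matches no filter
        links.Complete();

        await Task.WhenAll(downloader.Completion, writer.Completion);
        return (pages.ToArray(), images.ToArray());
    }
}
```

with a broadcast source, a message that no filter accepts is simply not delivered. with a non-broadcast source, such a message stays at the head of the output buffer and blocks the messages behind it; a common way out is a final unfiltered link to DataflowBlock.NullTarget<T>().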

summary

this post was a walkthrough of a web crawler sample.
the complete sample, which is available here (VS 11), also has exception handling, agent termination after x seconds, prevention of processing the same url twice and more. for simplicity, the code within this post is a simplified version.


Rx - Aggregate vs. Scan

this post will focus on 2 Rx operators: Aggregate and Scan.

both Aggregate and Scan deal with event stream accumulation. the only difference is that Aggregate produces a single result (upon the stream's completion),
while Scan presents an ongoing runtime accumulation which reacts to each OnNext.

both operators have 2 overloads with the same signatures:

Code Snippet
  1. IObservable<TSource> Aggregate<TSource>(
  2.     this IObservable<TSource> source,
  3.     Func<TSource, TSource, TSource> accumulator);
  4.  
  5. IObservable<TSource> Scan<TSource>(
  6.     this IObservable<TSource> source,
  7.     Func<TSource, TSource, TSource> accumulator);
  8.  
  9. IObservable<TAccumulate> Aggregate<TSource, TAccumulate>(
  10.     this IObservable<TSource> source,
  11.     TAccumulate seed,
  12.     Func<TAccumulate, TSource, TAccumulate> accumulator);
  13.  
  14. IObservable<TAccumulate> Scan<TSource, TAccumulate>(
  15.     this IObservable<TSource> source,
  16.     TAccumulate seed,
  17.     Func<TAccumulate, TSource, TAccumulate> accumulator);

the first overload (lines 1, 5) gets a simple accumulation Func<T,T,T>, which gets the previous accumulated value and the current value as parameters and should return the new accumulated value (the first item of the stream is used as the initial accumulated value).

the second overload defines a seed value for the first accumulation and a Func<TAccumulate, TSource, TAccumulate>, which gets the previous accumulated value and the current value as parameters and should return the new accumulated value.
notice that the type of the accumulated value can be different from the type of the stream's values.
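
a minimal sketch of the seeded overloads, where the stream carries int values and the accumulated value is a string. it assumes the System.Reactive package; the string concatenation is only an illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Reactive.Linq;

public static class SeededAccumulationSketch
{
    public static (List<string> scan, string aggregate) Run()
    {
        var xs = Observable.Range(1, 3);

        // Scan reports every intermediate accumulation (the seed itself is not emitted)
        var running = new List<string>();
        xs.Scan("", (acc, i) => acc + i)
          .Subscribe(s => running.Add(s));        // "1", "12", "123"

        // Aggregate reports only the final accumulation, on completion
        string final = null;
        xs.Aggregate("", (acc, i) => acc + i)
          .Subscribe(s => final = s);             // "123"

        return (running, final);
    }
}
```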

for example the following stream:

Code Snippet
    var xs = Observable.Range(1, 10);

    var result = xs.Aggregate((acc, i) => acc + i);
    result.ForEach(item => Console.WriteLine(item));

will produce a single result (55).

while the Scan version:

Code Snippet
    var xs = Observable.Range(1, 10);

    var result = xs.Scan((acc, i) => acc + i);
    result.ForEach(item => Console.WriteLine(item));

will produce each intermediate accumulation:

1
3
6
10
15
21
28
36
45
55

both operators can become very handy in combination with the Window operator.

for more information about the Window operator see this post.

for example, you may want to accumulate a stream of customers who enter a store on a per hour basis.

you can use the Window operator combined with the Aggregate operator to get a per hour report,
or use the Window operator combined with the Scan operator to get a continuous report per hour (it lets you react immediately to live data; for example, you can react when more than 100 customers have entered the store within an hour or less).

the following code demonstrates the aggregate scenario, but I should warn you: you are now stepping into some dark art code (which is the result of some concurrency behavior that I personally hope the Rx team will address in the future in a more intuitive way).

I am considering adding a few operators to a future version of Rx Contrib which will handle this task more intuitively.

and I will also post a walkthrough series on how to use the Rx Contrib libraries.

what you will see is not the most intuitive code snippet, but it is what you need in order to get the job done.

Code Snippet
  1. var storeStreamMock = Observable.Generate<Random, Unit>(
  2.     new Random(),   // random object
  3.     rnd => true,    // continue forever (exit term)
  4.     rnd => rnd,     // next iteration value (ignored)
  5.     rnd => Unit.Default, // projection (always project Unit.Default)
  6.     rnd => TimeSpan.FromMilliseconds(rnd.Next(10, 100))); // delay between iterations
  7.  
  8. IObservable<Task<int>> accStream =
  9.     from win in storeStreamMock.Window(TimeSpan.FromSeconds(1))
  10.     select win.Aggregate(0, (acc, cur) => acc + 1).ToTask();
  11.  
  12. accStream
  13.     .ObserveOn(Scheduler.TaskPool)
  14.     .ForEach(item =>
  15.         Console.WriteLine(item.Result));

 

lines 1-6 generate a mock of the store observable by using the Generate factory; you can completely ignore this part.

at line 9 we define a window of 1 second.

line 10 defines the aggregation and exports the aggregated value into a Task (TPL).

this is part of the dark art; otherwise we would end up with blocking and contention.

the last part of the dark art is that you should process the results within the subscription in parallel (line 13).

you can find different suggestions of how to complete such a task in this thread.

Summary

both Scan and Aggregate are very useful operators,

but you should be careful when using them within a Window.


Rx - Exception Handling

this post will discuss exception handling within the Rx arena.

handling event stream exceptions is not trivial.
for example, an observable should delegate exceptions to its subscribers through the OnError operation and cancel the subscription.
on the other hand, the subscriber may want to respond to the OnError state by renewing its subscription or by falling back to an alternative stream.

it is true that the Rx design guidelines suggest that a faulted stream should not continue to produce data,
but real-world implementations such as a stock exchange stream (or other hot streams) may ignore this recommendation.

if you design such a stream, you can consider wrapping the fault info within the OnNext message (as a data property) instead of sending an OnError state, leaving the OnError state for fatal faults which the stream cannot recover from.
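
one possible shape for such a message is sketched below. it assumes the System.Reactive package; the StockMessage type, its properties and the fault text are hypothetical and only illustrate the idea of carrying a recoverable fault as data.

```csharp
using System;
using System.Collections.Generic;
using System.Reactive.Subjects;

public static class FaultAsDataSketch
{
    // Hypothetical message type: either a price or a recoverable fault description.
    public sealed class StockMessage
    {
        public decimal Price { get; set; }
        public string Fault { get; set; }   // null when the message carries a price
        public bool IsFault { get { return Fault != null; } }
    }

    public static List<string> Run()
    {
        var log = new List<string>();
        var feed = new Subject<StockMessage>();

        feed.Subscribe(
            m => log.Add(m.IsFault ? "fault: " + m.Fault : "price: " + m.Price),
            ex => log.Add("fatal: " + ex.Message)); // reserved for unrecoverable errors

        feed.OnNext(new StockMessage { Price = 10m });
        feed.OnNext(new StockMessage { Fault = "provider timeout" }); // the stream stays alive
        feed.OnNext(new StockMessage { Price = 11m });
        return log;
    }
}
```

because the fault travels through OnNext, the subscription is not terminated and the price that follows the fault is still delivered.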

So how can you handle a fault state?

Rx has a few operators that respond to OnError.

the first one is Retry, which re-subscribes (forever or for a specific number of attempts).

for the demonstration I will use the following observable (which produces OnError after the second OnNext):

Code Snippet
    var observable = Observable.Create<int>(
        obs =>
        {
            obs.OnNext(1);
            obs.OnNext(2);
            obs.OnError(new SystemException());
            return Disposable.Empty;
        });

the following code subscribes up to 3 times in total (the original subscription plus 2 retries) before it surrenders to the evil exception.

Code Snippet
    observable
        .Retry(3)
        .Subscribe(
            item => Console.WriteLine(item),
            (ex) => Console.WriteLine(ex.Message),
            () => Console.WriteLine("Complete"));

this scenario may be suitable for an observable which downloads data from the network and responds with an error when the network is not available (consider an unreliable network).

the output will look like the following:

1
2
1
2
1
2
System error.

sometimes it is not enough to re-subscribe and you have to define an alternative fallback stream.

consider the stock exchange scenario: whenever a specific stock provider fails to supply the data, you may want to switch and subscribe to a different provider.

you can do so using the Catch operator.

just like try / catch, you can specify a fallback for a specific exception type or have a generic fallback strategy.

given the following fallback streams:

Code Snippet
    var fallback1 = Observable.Create<int>(
        obs =>
        {
            for (int i = 0; i < 10; i++)
                obs.OnNext(i);
            return Disposable.Empty;
        });
    var fallback2 = Observable.Create<int>(
        obs =>
        {
            for (int i = 20; i < 23; i++)
                obs.OnNext(i);
            return Disposable.Empty;
        });
    var fallback3 = Observable.Create<int>(
        obs =>
        {
            for (int i = 30; i < 33; i++)
                obs.OnNext(i);
            return Disposable.Empty;
        });

you can map the fallback using the following Rx code:

Code Snippet
  1. observable
  2.     .Catch((NullReferenceException ex) => fallback1)
  3.     .Catch((SystemException ex) => fallback2)
  4.     .Catch(fallback3)
  5.     .Subscribe(
  6.         item => Console.WriteLine(item),
  7.         (ex) => Console.WriteLine(ex.Message),
  8.         () => Console.WriteLine("Complete"));

lines 2-4 map different fallbacks to different exceptions.

it will generate the following output:

1
2
20
21
22

a SystemException was thrown, therefore the stream continues with the fallback that starts at 20.

finally we can discuss the 3rd option.

sometimes you do not care whether the stream has stopped because it has completed or because it was faulted; all you really care about is cleaning up some resources.
in this case you can use the Finally operator, which is triggered in both scenarios: normal completion or a faulted state.

the following code demonstrates this API:

Code Snippet
    observable
        .Finally(() => {/* clear some resources */})
        .Subscribe(
            item => Console.WriteLine(item),
            (ex) => Console.WriteLine(ex.Message),
            () => Console.WriteLine("Complete"));

Summary

Rx has some very useful operators which respond to the OnError state: you can re-subscribe, switch to a fallback stream or just handle the finalization state.


async / await, some reasoning

this post tries to explain some of the reasoning behind the .NET 4.5 / C# 5 await keyword.

I will begin with a quiz.

how long will it take the following method to produce the value 42?

Code Snippet
    async Task<int> Execute()
    {
        await Task.Delay(1000);
        await Task.Delay(1000);
        return 42;
    }

you should remember that conceptually the await keyword translates to a continuation.

the above code can be compared to the following TPL 4 code snippet:

Code Snippet
    Task<int> Execute()
    {
        Task t1 = Task.Factory.StartNew(
            () => Thread.Sleep(1000));
        Task t2 = t1.ContinueWith(t1_ =>
            {
                Thread.Sleep(1000);
            });
        Task<int> t3 = t2.ContinueWith(t2_ => 42);
        return t3;
    }

whatever comes after the await is compiled into a continuation closure.

therefore the value 42 will be produced after 2 seconds.

How can we await multiple tasks?

there are a couple of ways to await multiple tasks, but the most recommended one is to use Task.WhenAll (you can also use Task.WhenAny to continue after the completion of the first task).

do not confuse Task.WhenAll with Task.WaitAll: WhenAll returns a task that completes when all the tasks have completed (so you can await it as a continuation), while WaitAll is a blocking API which blocks the calling thread until all the tasks have completed.

the following snippet demonstrates the Task.WhenAll usage.

Code Snippet
    async Task<int> Execute()
    {
        Task t1 = Task.Delay(1000);
        Task t2 = Task.Delay(1000);
        await Task.WhenAll(t1, t2);
        return 42;
    }

the value 42 will now be produced after 1 second.
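
if you want to check the two answers yourself, the following minimal sketch measures both variants with a Stopwatch. the class and method names are hypothetical, and the delays are shortened to 200 milliseconds to keep the run short.

```csharp
using System;
using System.Diagnostics;
using System.Threading.Tasks;

public static class AwaitTimingSketch
{
    // the second delay only starts after the first one has completed
    public static async Task<TimeSpan> SequentialAsync()
    {
        var sw = Stopwatch.StartNew();
        await Task.Delay(200);
        await Task.Delay(200);
        return sw.Elapsed;   // roughly 400 ms
    }

    // both delays are started before the first await, so they overlap
    public static async Task<TimeSpan> ConcurrentAsync()
    {
        var sw = Stopwatch.StartNew();
        Task t1 = Task.Delay(200);
        Task t2 = Task.Delay(200);
        await Task.WhenAll(t1, t2);
        return sw.Elapsed;   // roughly 200 ms
    }
}
```

the exact numbers depend on the timer resolution and the load of the machine, so treat them as approximations.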

it is roughly equivalent to the following TPL 4 snippet:

Code Snippet
    Task<int> Execute()
    {
        Task t1 = Task.Factory.StartNew(
            () => Thread.Sleep(1000));
        Task t2 = Task.Factory.StartNew(
            () => Thread.Sleep(1000));
        Task<int> t3 = Task.Factory.ContinueWhenAll(
            new[]{t1, t2}, tsks => 42);
        return t3;
    }

to wrap it up, let's think about how long it will take the following snippet to produce a value.

Code Snippet
    async Task<int> Execute()
    {
        for (int i = 0; i < 10; i++)
        {
            await Task.Delay(1000);
        }
        return 42;
    }

the right answer is 10 seconds: each iteration awaits its own delay, and the rest of the loop (the following iterations) runs as a continuation of that await.

this can be translated to something like the following snippet (TPL 4).

Code Snippet
  1. Task<int> Execute()
  2. {
  3.     var stateMachine = new StateMachine();
  4.     return stateMachine.OnNext();
  5. }
  6.  
  7. class StateMachine
  8. {
  9.     private int _i;
  10.     private TaskCompletionSource<int> _semanticTask =
  11.         new TaskCompletionSource<int>();
  12.  
  13.     public Task<int> OnNext()
  14.     {
  15.         Interlocked.Increment(ref _i);
  16.  
  17.         Task t = Task.Factory.StartNew(() =>
  18.             Thread.Sleep(1000));
  19.         if (_i <= 10)
  20.             t.ContinueWith(t_ => OnNext());
  21.         else
  22.             _semanticTask.SetResult(42);
  23.  
  24.         return _semanticTask.Task;
  25.     }
  26. }

the Execute method creates a state machine which chains task continuations 10 times and then sets the value 42.
the TaskCompletionSource represents the semantics of a task (TAP - Task-based Asynchronous Pattern).

TaskCompletionSource does not produce any concurrency (it isn't attached to any thread); it just presents a task which can be completed with a result, an exception or a cancellation.
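
a minimal sketch of that behavior: the task below is completed by the caller itself, without any thread running on its behalf. the class name is hypothetical.

```csharp
using System.Threading.Tasks;

public static class CompletionSourceSketch
{
    public static (bool before, bool after, int result) Run()
    {
        var tcs = new TaskCompletionSource<int>();
        Task<int> task = tcs.Task;

        bool before = task.IsCompleted; // false: nobody has produced a value yet
        tcs.SetResult(42);              // completes the task from the calling thread
        bool after = task.IsCompleted;  // true

        return (before, after, task.Result);
    }
}
```

SetException and SetCanceled complete the same task as faulted or canceled instead.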

the OnNext method immediately returns a semantic task, which is signaled as complete at line 22 (the exit condition of the recursion).

Summary

await represents a continuation. each time the code hits an await that has not completed yet, the rest of the method is registered as a continuation.


Using async / await

this post will discuss parallel disposal.

whenever we want to dispose the resources of a parallel execution upon its completion, we can't use the convenient using keyword.

for example, the following code may dispose the command before the execution completes:

Very bad Code Snippet
    using (var conn = new SqlConnection(CONN_STR))
    using (var cmd = new SqlCommand("Select * from Employee", conn))
    {
        conn.Open();
        // Begin / End calls must be paired (BeginExecuteNonQuery with EndExecuteNonQuery)
        cmd.BeginExecuteNonQuery(ar =>
            {
                int affected = cmd.EndExecuteNonQuery(ar);
            }, null);
    }

the using is absolutely wrong for the above sample.

what should be done is:

Code Snippet
    var conn = new SqlConnection(CONN_STR);
    var cmd = new SqlCommand("Select * from Employee", conn);
    conn.Open();
    // Begin / End calls must be paired (BeginExecuteNonQuery with EndExecuteNonQuery)
    cmd.BeginExecuteNonQuery(ar =>
    {
        int affected = cmd.EndExecuteNonQuery(ar);
        cmd.Dispose();
        conn.Dispose();
    }, null);

we can write it slightly friendlier by using the TPL FromAsync wrapper:

Code Snippet
    var conn = new SqlConnection(CONN_STR);
    var cmd = new SqlCommand("Select * from Employee", conn);
    conn.Open();
    Task<int> t = Task.Factory.FromAsync<int>(
        cmd.BeginExecuteNonQuery,
        cmd.EndExecuteNonQuery,
        null);
    t.ContinueWith(tsk =>
        {
            int affected = tsk.Result;
            cmd.Dispose();
            conn.Dispose();
        });

but what if the compiler can rewrite our code?

in that case we could write code which is similar to the first code snippet and get it compiled into something like the code snippet above.

this is exactly what happens in .NET 4.5 / C# 5 (the async / await pattern).

when we write the following code:

Code Snippet
  1. async Task<int> ExecuteNonQueryAsync()
  2. {
  3.     int affected = 0;
  4.     using (var conn = new SqlConnection(CONN_STR))
  5.     using (var cmd = new SqlCommand("Select * from Employee", conn))
  6.     {
  7.         conn.Open();
  8.         affected = await Task.Factory.FromAsync<int>(
  9.             cmd.BeginExecuteNonQuery,
  10.             cmd.EndExecuteNonQuery,
  11.             null);
  12.     }
  13.     return affected;  
  14. }

the compiler does rewrite this async method.

everything that follows the await keyword (line 8) will be put into a continuation state machine, including the closing of the curly brackets (of the using).
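
the disposal order can be observed without a database. the following is a minimal sketch, assuming a hypothetical TrackedResource class which records the moment of its disposal, and Task.Delay as a stand-in for the real async operation:

```csharp
using System;
using System.Collections.Generic;
using System.Threading.Tasks;

static class UsingAwaitDemo
{
    // hypothetical resource which records the moment of its disposal
    sealed class TrackedResource : IDisposable
    {
        private readonly List<string> _log;
        public TrackedResource(List<string> log) { _log = log; }
        public void Dispose() { _log.Add("disposed"); }
    }

    public static async Task<List<string>> RunAsync()
    {
        var log = new List<string>();
        using (new TrackedResource(log))
        {
            await Task.Delay(50);    // the method returns to its caller here
            log.Add("after await");  // continuation, still inside the using scope
        }                            // Dispose runs here, as part of the continuation
        return log;
    }
}
```

the returned log holds "after await" followed by "disposed", which means that the resource stays alive until the awaited operation completes.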

Summary

using the new async / await pattern will dispose our resources on time.

the using keyword is a very clean syntax and now we can apply it to parallel execution.


async \ await and Exception Handling

this post will discuss how async / await handles exceptions.

as mentioned in the previous post about the async / await concept, await is all about continuation.

before .NET 4.5, the exceptions of a parallel execution had to be handled separately from the synchronous handling.

for example:

handling ThreadPool execution:

Code Snippet
  1. void Foo()
  2. {
  3.     try
  4.     {
  5.         Console.WriteLine("Synchronic");
  6.         ThreadPool.QueueUserWorkItem(state =>
  7.             {
  8.                 try
  9.                 {
  10.                     Console.WriteLine("Parallel");
  11.                 }
  12.                 catch (Exception exAsync)
  13.                 {
  14.                     EventLog.WriteEntry("application", exAsync.ToString());
  15.                 }
  16.             }, null);
  17.     }
  18.     catch (Exception ex)
  19.     {
  20.         EventLog.WriteEntry("application", ex.ToString());
  21.     }
  22. }

as you can see, we have to handle the parallel exception (line 12) separately from the synchronous handling (line 18).

TPL has brought a new option for handling parallel exceptions, now you can use ContinueWith with TaskContinuationOptions.OnlyOnFaulted (line 8).
but you still have to handle the parallel exception separately from the synchronous one:

Code Snippet
  1. void Foo()
  2. {
  3.     try
  4.     {
  5.         Console.WriteLine("Synchronic");
  6.         Task t = Task.Factory.StartNew(() => Console.WriteLine("Parallel"));
  7.         t.ContinueWith(tsk => EventLog.WriteEntry("application", tsk.Exception.ToString()),
  8.             TaskContinuationOptions.OnlyOnFaulted);
  9.     }
  10.     catch (Exception ex)
  11.     {
  12.         EventLog.WriteEntry("application", ex.ToString());
  13.     }
  14. }

async / await pattern

using the async / await pattern we can handle both the synchronous and the parallel exceptions in the same place:

Code Snippet
async void Foo()
{
    try
    {
        Console.WriteLine("Synchronic");
        await Task.Factory.StartNew(() => Console.WriteLine("Parallel"));
    }
    catch (Exception ex)
    {
        // handling both synchronic and parallel exceptions
        EventLog.WriteEntry("application", ex.ToString());
    }
}

when we are using the async / await pattern, the compiler converts our code into a continuation state machine at compile time.
therefore the compiler can take the code within the catch area and apply it to both the synchronous and the parallel execution.
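
the single catch can be verified with a small experiment. the following is a minimal sketch (FooAsync and its failInParallel flag are hypothetical names used only to simulate the two failure points); note that await re-throws the original exception and not an AggregateException:

```csharp
using System;
using System.Threading.Tasks;

static class AwaitExceptionDemo
{
    // failInParallel selects where the simulated failure happens
    public static async Task<string> FooAsync(bool failInParallel)
    {
        try
        {
            if (!failInParallel)
                throw new InvalidOperationException("synchronous failure");

            Action work = () => { throw new InvalidOperationException("parallel failure"); };
            await Task.Factory.StartNew(work);
            return "no failure";
        }
        catch (InvalidOperationException ex)
        {
            // the same handler serves both the synchronous and the parallel part
            return ex.Message;
        }
    }
}
```

FooAsync(false) completes with "synchronous failure" and FooAsync(true) completes with "parallel failure", both coming out of the same catch block.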

Summary

the async / await pattern simplifies the exception handling. we write our exception handling once and it applies to both the synchronous and the parallel execution.


Rx - Sample

this post will focus on the Rx Sample operator.

the Sample operator samples the observable stream and forwards a less intensive stream of the sampled data.

it can prove very useful for scenarios like handling
an accelerometer stream which can produce 60 values per second. in some cases we don't need such intensity and
our machine resources may be happier to handle only 10 values per second.
the same may apply to video stream analytics and many other scenarios.

this is how the marble diagram looks:

[image: marble diagram of the Sample operator]

as you can see, it is nothing special, just time-based filtering.

the Sample's API is:

Code Snippet
var xs = Observable.Interval(TimeSpan.FromMilliseconds(0.1));
var ys = xs.Sample(TimeSpan.FromMilliseconds(30));

the TimeSpan parameter defines the sampling rate.
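
the time-based filtering can be modeled without Rx. the following is a rough sketch of the semantics only (SampleModel is a hypothetical helper over pre-recorded (time, value) pairs, it is not how Rx implements Sample): for each period that saw at least one value, only the latest value is forwarded.

```csharp
using System;
using System.Collections.Generic;

static class SampleModel
{
    // input: (timeMs, value) pairs ordered by time
    // output: the last value of each period which saw at least one value
    public static List<long> Sample(
        IEnumerable<KeyValuePair<int, long>> timedValues, int periodMs)
    {
        var result = new List<long>();
        int currentPeriod = -1;
        bool hasValue = false;
        long last = 0;

        foreach (var item in timedValues)
        {
            int period = item.Key / periodMs;
            if (hasValue && period != currentPeriod)
                result.Add(last);   // a period boundary was crossed: forward the latest value
            currentPeriod = period;
            last = item.Value;
            hasValue = true;
        }
        if (hasValue)
            result.Add(last);       // the last (possibly partial) period
        return result;
    }
}
```

for values 0 to 5 which arrive at 0, 10, 20, 30, 40 and 50 milliseconds, a period of 30 milliseconds forwards only the values 2 and 5.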

besides the TimeSpan overload, the Sample operator
can be triggered by custom timing.

custom triggering is done by using an overload which accepts an IObservable as its sampling trigger.

the following example is using a random stream of data as the Sample trigger:

Code Snippet
  1. var rndStream = Observable.Create<Unit>(obs =>
  2.     {
  3.         var unsub = new BooleanDisposable();
  4.         Task.Factory.StartNew(() =>
  5.             {
  6.                 var rand = new Random();
  7.                 while (!unsub.IsDisposed)
  8.                 {
  9.                     Thread.Sleep(rand.Next(4000));
  10.                     obs.OnNext(Unit.Default);
  11.                 }
  12.             });
  13.         return unsub;
  14.     });
  15. var xs = Observable.Interval(TimeSpan.FromMilliseconds(0.1));
  16. var ys = xs.Sample(rndStream);
  17. ys.Subscribe(item => Console.Write("{0}, ", item));

as you can see, at line 16, the xs stream will be sampled at a random rate.

Summary

Sample is a very straightforward operator which brings the ability to reduce the stream intensity in cases where a lower data rate is better.


Rx - Window

continuing with the Rx series, this post will discuss the Window operator.

in the previous post I discussed the Buffer operator which enables buffering of an Rx data stream into chunks.

as good and useful as the Buffer operator is, it doesn't fit every single scenario.

let's consider a scenario of tracing the highest and the lowest value within a time period. for example, hourly tracking of a service monitor which produces a value every second.

technically we can use the Buffer operator for handling this scenario. the problem is that buffering an hour of data for each of our services will end up with 3600 long-living items for each of the services.

this will lead us to:

  • items which survive the GC Gen 0 and Gen 1 collections.
  • large object heap allocations in case of large chunks.

the point is that in this scenario buffering the data isn't needed at all, because calculating the highest and the lowest value can be done on the fly by comparing the current value against the extremes seen so far.
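
the on-the-fly calculation can be modeled without Rx. the following is a minimal sketch (WindowedMinMax is a hypothetical helper over IEnumerable, with count-based windows instead of time periods): it holds only two numbers per window, no matter how many items pass through.

```csharp
using System;
using System.Collections.Generic;

static class WindowedMinMax
{
    // yields one (min, max) pair per 'count' items while holding only two numbers
    public static IEnumerable<Tuple<long, long>> MinMaxPerWindow(
        IEnumerable<long> source, int count)
    {
        long min = 0, max = 0;
        int seen = 0;
        foreach (long value in source)
        {
            if (seen == 0) { min = value; max = value; }
            else { min = Math.Min(min, value); max = Math.Max(max, value); }

            if (++seen == count)
            {
                yield return Tuple.Create(min, max);  // the window is closed
                seen = 0;
            }
        }
        if (seen > 0)
            yield return Tuple.Create(min, max);      // trailing partial window
    }
}
```

for the values 3, 1, 2, 9, 7, 8, 5 and windows of 3 items it yields (1, 3), (7, 9) and (5, 5).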

The right operator for this scenario

so we need some kind of window (either of a time period, of a fixed item count or of the combination of the 2) which will project the values within its scope without buffering.

Rx does have such an operator, which is the Window operator.

in contrast to the Buffer operator which returns IObservable<IList<T>>, the Window operator returns IObservable<IObservable<T>>.

at first glance it seems a rather odd signature, but I am about to show how powerful this concept is and how you can take advantage of it.

but before we get into the implementation details, let's take a look at the differences between the Buffer and the Window operators.

the Buffer marble diagram:

[image: marble diagram of the Buffer operator]

the Buffer accumulates the items internally until the end of its buffering period and then projects the accumulated values as IList<T>.

the Window marble diagram:

[image: marble diagram of the Window operator]

unlike the Buffer, which caches the items internally, the Window does not cache the items at all, each item is immediately projected (OnNext) through the inner IObservable<T>.

Window supports sliding windows and the custom periods API, just like the Buffer operator does (read more on that at the Buffer operator post).

a sliding Window will share the same item in multiple windows:

[image: marble diagram of a sliding Window]

Better memory utilization

back to our initial goal, the following code is using the Aggregate operator over a Window and calculates the min / max values for each period.

I will use the following aggregation class:

Code Snippet
class MinMaxItem
{
    public long? Min { get; private set; }
    public long? Max { get; private set; }

    public static MinMaxItem Calc(MinMaxItem instance, long value)
    {
        if (!instance.Min.HasValue)
        {
            instance.Min = value;
            instance.Max = value;
        }
        else
        {
            instance.Min = Math.Min(instance.Min.Value, value);
            instance.Max = Math.Max(instance.Max.Value, value);
        }
        return instance;
    }
}

and the following code demonstrates the plumbing:

Code Snippet
  1. var xs = Observable.Interval(TimeSpan.FromMilliseconds(0.1));
  2.  
  3. IObservable<MinMaxItem> minMax =
  4.     from win in xs.Window(10)
  5.         from item in win.Aggregate(
  6.             new MinMaxItem(),
  7.             MinMaxItem.Calc)
  8.         select item;
  9.  
  10. minMax.Subscribe (item => Console.WriteLine(
  11.     "Min = {0} \tMax = {1}",
  12.     item.Min.Value, item.Max.Value));

the above code snippet is using the from keyword twice (line 4 and 5), which is in fact a use of the SelectMany operator. it is used to extract the inner IObservable<T> out of the IObservable<IObservable<T>>.

another alternative is to use the Switch operator, which also extracts the inner IObservable<T> out of the IObservable<IObservable<T>>.

in this case the code will look like:

Code Snippet
  1. IObservable<IObservable<MinMaxItem>> minMax =
  2.     from win in xs.Window(10)
  3.     select win.Aggregate(
  4.         new MinMaxItem(),
  5.         MinMaxItem.Calc);
  6.  
  7. minMax.Switch().Subscribe(item => Console.WriteLine(
  8.     "Min = {0} \tMax = {1}",
  9.     item.Min.Value, item.Max.Value));

notice the Switch operator at line 7.

Summary

I have shown the Window operator and its capabilities.

because IObservable<IObservable<T>> is quite a complex API, Window usually comes with other operators like Aggregate, SelectMany or other operators which I will discuss in later posts.

the Switch operator is very handy when you want to flatten the Window output (it forwards the items of the most recent inner observable stream into a single result, which fits here because the windows do not overlap).

an alternative to Switch is the SelectMany operator which can handle each of the Window streams separately (it can be used directly or by the nested from syntax).

you can read more about the Window memory benefit in James Miles's post about using the Rx Window in a stock exchange scenario.

and be aware of some parallelism issues involved with Window, Aggregate and SelectMany. you can read more about them in the Rx forum thread which discusses some of the pitfalls, an API request and an alternative API for the window aggregation scenario.

finally, you should use the operator that is most appropriate for your scenario. for example, the Buffer operator is great for balancing IO operations, and the Window operator is great for reducing memory pressure upon aggregation.


the concept of async \ await

in this post I will survey the new .NET 4.5 / C# 5 concept of async / await.

I will focus on understanding what really happens behind the new async / await syntax.

What's it all about?

the new async / await syntax relies on the C# compiler to generate an async operation from code that looks very much like synchronous code.

but before we start we should discuss the new C# 5 syntax.

the syntax includes 2 keywords:

  • async - which is only a marker for an async method.
  • await - which indicates a callback boundary.

Code Snippet
static async Task Execute()
{
    Console.WriteLine("run on calling thread");

    await Task.Factory.StartNew(() => Thread.Sleep(1000));

    Console.WriteLine("run on callback thread");
}

so how should we understand what was written in the above code?

actually it is a different way to represent a continuation (you can read more about the continuation concept in the TPL - Continuation post).

the above code is roughly equivalent to the following TPL 4 code:

Code Snippet
static Task Execute()
{
    Console.WriteLine("run on calling thread");
    Task t = Task.Factory.StartNew(() => Thread.Sleep(1000));
    return t.ContinueWith(tsk =>
        {
            Console.WriteLine("run on callback thread");
        });
}

the compiler will translate the code below the await keyword into a continuation state machine, which is logically (not technically) identical to the above code.

Point of interest:

you may have noticed that the async method returns a Task even though there is no return statement within the method block.
looking at the TPL 4 code snippet we can understand that the async method actually returns to the caller immediately after Task.Factory.StartNew starts the task, and the rest of the code is actually a continuation callback.

what we get back from the async method is a task which represents the async part of the method.
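
the early return can be observed by logging the order of execution. the following is a minimal sketch, assuming a console application without a synchronization context (AsyncReturnDemo and the gate task are hypothetical, the gate stands in for the real parallel work):

```csharp
using System.Collections.Generic;
using System.Threading.Tasks;

static class AsyncReturnDemo
{
    static async Task ExecuteAsync(Task gate, List<string> log)
    {
        log.Add("before await");  // runs synchronously on the calling thread
        await gate;               // gate is not complete yet: a Task returns to the caller
        log.Add("after await");   // the continuation
    }

    public static List<string> Run()
    {
        var log = new List<string>();
        var gate = new TaskCompletionSource<bool>();

        Task t = ExecuteAsync(gate.Task, log);
        log.Add("caller continues");  // t is still incomplete at this point
        gate.SetResult(true);         // lets the continuation run
        t.Wait();                     // wait for the async part of the method
        return log;
    }
}
```

the log order is "before await", "caller continues", "after await": the caller gets its Task back before the code below the await has executed.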

async / await with return value

async / await can also represent a continuation callback which accepts the result of the async operation.

Code Snippet
static async Task<DateTime> Execute()
{
    DateTime result = await Task.Factory.StartNew(() => DateTime.Now);

    return result.AddDays(1);
}

the above code will logically translate to:

Code Snippet
static Task<DateTime> Execute()
{
    Task<DateTime> t = Task.Factory.StartNew(() => DateTime.Now);
    return t.ContinueWith(tsk =>
        {
            return tsk.Result.AddDays(1);
        });
}

you may have noticed that the return value (on the left side of the await) was unwrapped (DateTime instead of Task<DateTime>).

Which thread is running?

normally, when the method isn't invoked from the UI thread, everything before the await line will run synchronously on the caller thread.
the Task.Run will naturally be scheduled on a different thread, and everything under the await will be scheduled on a different thread than the caller thread. it may be the same thread as the Task.Run or any other ThreadPool thread (when there is only a single continuation it will probably be the same thread as the Task.Run).

Async and UI

whenever the async method invocation is coming from the UI thread (or to be more precise, from a thread under a synchronization context) the continuation returns back to the synchronization context thread.

this is quite similar to the following TPL code (.NET 4):

Code Snippet
TaskScheduler scheduler = TaskScheduler.FromCurrentSynchronizationContext();
Task t = Task.Factory.StartNew(() => Trace.WriteLine("in parallel"));
t.ContinueWith(tsk => Trace.WriteLine("UI thread"), scheduler);

or to the more legacy code which is using the synchronization context directly:

Code Snippet
Action a = () => Trace.WriteLine("in parallel");
SynchronizationContext sc = SynchronizationContext.Current;
a.BeginInvoke(ar =>
    {
        sc.Post(state => Trace.WriteLine("UI thread"), null);
    }, null);

async / await is aware of the synchronization context of the caller, and if there is one it schedules the await callback on this context.

Summary

the compiler translates the async / await syntax into a state machine which handles the continuation flow after a parallel operation.

there is much more to it and I will discuss it in future posts.

you can see its performance characteristics in this post.


TPL - Continuation

this post will discuss TPL Continuation.

TPL continuation can chain tasks into a pipeline.

when dealing with dependencies between parallel work units, like [encoding -> compression -> encryption], continuation is the API for scheduling a work unit upon the completion of another work unit.

the general idea is quite similar to the callback of the old APM pattern (BeginXxx, EndXxx).

basic completion

the syntax of a continuation:

Code Snippet
Task tsk = Task.Factory.StartNew(() => {/* do something */});
tsk.ContinueWith(t => {/* continue when something completes */});

the continuation API is fairly straightforward:

we can define a continuation action upon task completion (the t in the lambda represents the completed task).

completion of Task<T>

continuation also supports Task<T>, this way we can handle a task's result upon the task's completion.

the following sample shows the concept of the [encoding -> encryption -> send] pipeline:

Code Snippet
  1. Task<byte[]> tskEncoding = Task.Factory.StartNew(() =>
  2. {
  3.     return Encoding.UTF8.GetBytes(data);
  4. });
  5. Task<Byte[]> tskEncrypt  = tskEncoding.ContinueWith(t =>
  6. {
  7.     return Encrypt(t.Result);
  8. });
  9. Task tskSend = tskEncrypt.ContinueWith(t =>
  10. {
  11.     Send(t.Result);
  12. });

the first task (line 1-4) returns the encoded data (asynchronously).

the first continuation task (line 5-8) gets the encoded data from the result of the completed task and encrypts it. the encrypted data is returned as the task's result.

the second continuation (line 9-12) will be scheduled upon the completion of the encryption task.

multiple completions

the completion API is not limited to a single completion per task. the completion represents a callback and we can set as many callbacks as we need for any Task (or Task<T>).

Code Snippet
Task tsk = Task.Factory.StartNew(() => {/* do something */});
tsk.ContinueWith(t => {/* callback 1 */});
tsk.ContinueWith(t => {/* callback 2 */});

Continue when all/any

we can also set a continuation which will be triggered upon the completion of multiple tasks.

Code Snippet
Task tsk1 = Task.Factory.StartNew(() => {/* do something */});
Task tsk2 = Task.Factory.StartNew(() => {/* do something else */});
Task[] tsks = new Task[] { tsk1, tsk2 };
Task.Factory.ContinueWhenAll(tsks, tskArr => {/* callback 1 */});

or continue upon the completion of the first among multiple tasks.

Code Snippet
Task tsk1 = Task.Factory.StartNew(() => {/* do something */});
Task tsk2 = Task.Factory.StartNew(() => {/* do something else */});
Task[] tsks = new Task[] { tsk1, tsk2 };
Task.Factory.ContinueWhenAny(tsks, firstTask => {/* callback 1 */});

Parent / Child

as you may know, TPL supports a parent / child execution model.
when you start a task within the scope of an executing task you can set the task behavior to accept the parent / child paradigm.
the TPL infrastructure is aware when a task has children and behaves accordingly (Wait will wait for the completion of all the task's children, cancelling with a token which is shared by the parent and its children will affect all of them, the debugger's Parallel Tasks window can present the task hierarchy).

Code Snippet
Task.Factory.StartNew(() =>
{
    Task.Factory.StartNew(() =>
        {
            // ...
        }, TaskCreationOptions.AttachedToParent);
    // ...
});

Parent / Child and continuation

when it comes to continuation, the continuation callback will occur only after the completion of all the task's children.

Code Snippet
var t = Task.Factory.StartNew(() =>
{
    Task t1 = Task.Factory.StartNew(() =>
    {
        Thread.Sleep(1000);
        Console.WriteLine("child1");
    }, TaskCreationOptions.AttachedToParent);
});
t.ContinueWith(tsk => Console.WriteLine("Complete !!!"));

the above code demonstrates a simple continuation upon a parent / child task.

the continuation will occur after the completion of t1.

Parent / child with nested continuation

let's take another scenario where both the parent and the child task have a continuation.

Code Snippet
var t = Task.Factory.StartNew(() =>
{
    Task t1 = Task.Factory.StartNew(() =>
    {
        Thread.Sleep(1000);
        Console.WriteLine("child1");
    }, TaskCreationOptions.AttachedToParent);
    t1.ContinueWith(tsk =>
        {
            Thread.Sleep(1000);
            Console.WriteLine("child continuation");
        });
});

t.ContinueWith(tsk => Console.WriteLine("parent continuation"));

let's think about the above code. will the parent continuation complete before or after the child continuation?

the answer is: the parent continuation will ignore the child continuation and complete first.

the parent continuation will be aware of the child continuation only if we mark the child continuation with TaskContinuationOptions.AttachedToParent.

Code Snippet
var t = Task.Factory.StartNew(() =>
{
    Task t1 = Task.Factory.StartNew(() =>
    {
        Thread.Sleep(1000);
        Console.WriteLine("child1");
    }, TaskCreationOptions.AttachedToParent);
    t1.ContinueWith(tsk =>
        {
            Thread.Sleep(1000);
            Console.WriteLine("child continuation");
        }, TaskContinuationOptions.AttachedToParent);
});

t.ContinueWith(tsk => Console.WriteLine("parent continuation"));

now the parent continuation will complete after the completion of the child's task continuation.

Conditional continuation

till now we have seen many of the continuation scenarios. the last scenario which I want to present is the cool ability to tune the continuation to occur only when the execution ends with a specific status.

you can set the continuation to occur only on success, failure or cancellation.

Code Snippet
var cancellation = new CancellationTokenSource();
Task t = Task.Factory.StartNew(() =>
    {
        if (Environment.TickCount % 2 == 0)
            throw new Exception();
        else
            Console.WriteLine("pass");
    }, cancellation.Token);

t.ContinueWith(tsk => Console.WriteLine("OK"),
    TaskContinuationOptions.OnlyOnRanToCompletion);
t.ContinueWith(tsk => Console.WriteLine("Cancelled"),
    TaskContinuationOptions.OnlyOnCanceled);
t.ContinueWith(tsk => Console.WriteLine("Failed"),
    TaskContinuationOptions.OnlyOnFaulted);

TaskContinuationOptions is a flags (bitwise) enum, so you can specify multiple options for a single continuation.
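
combining options is done with the | operator. the following is a minimal sketch (ContinuationOptionsDemo and its fail flag are hypothetical): the continuation is limited to the faulted case and is also asked to run synchronously. when the antecedent task succeeds the continuation is canceled, which surfaces as an AggregateException on Result.

```csharp
using System;
using System.Threading.Tasks;

static class ContinuationOptionsDemo
{
    public static string Run(bool fail)
    {
        Task t = Task.Factory.StartNew(() =>
        {
            if (fail) throw new InvalidOperationException("boom");
        });

        // two options combined for a single continuation
        Task<string> onFault = t.ContinueWith(
            tsk => "failed: " + tsk.Exception.InnerException.Message,
            TaskContinuationOptions.OnlyOnFaulted |
            TaskContinuationOptions.ExecuteSynchronously);

        try
        {
            return onFault.Result;
        }
        catch (AggregateException)
        {
            // the antecedent did not fault, so the continuation was canceled
            return "continuation was canceled";
        }
    }
}
```

for a continuation which should occur on either failure or cancellation, TaskContinuationOptions.NotOnRanToCompletion is the option to use.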

Summary

continuation is one of the most powerful features of the new TPL infrastructure.

it has more features and it is simpler to use than the old APM pattern.

using the continuation pattern we can manage complex parallelism with regard to dependencies.

finally, as I will describe in a later post, the new async feature (of .NET 4.5 / C# 5) is all about the continuation concept.


Rx - SP1

the Rx release has its first service pack.

The Service Pack release doesn't include any new API-level functionality and fixes a few minor bugs (all of which were already fixed in the Experimental Releases in the v1.1 band):

  • Scheduler.TaskPool now guarantees the use of the task pool. See this forum post for more info.
  • SkipUntil now propagates errors of the source sequence, even when the "until" sequence hasn't fired yet.
  • ToQbservable now accepts an IScheduler parameter, mirroring its ToObservable brother.
  • Take(0) is now supported, resulting in an overload that accepts an IScheduler to produce the OnCompleted message.

In addition to those fixes, this (supported) release includes support for Silverlight 5 and Windows Phone 7.5, so you'll find the Rx assemblies in the Add New Reference dialog for those project types.

When using the MSI installer, you'll notice the installer performs an in-place update of any existing Rx SDK v1.0.10621 installation you may have on your machine. If you don't have the v1.0 SDK installed yet, you can simply use the new MSI to perform a clean install as well.

Assembly version numbers (used by the CLR) continue to be 1.0.10621.0, hence you don't need to recompile applications that use Rx v1.0 but you can simply service the Rx binaries. The file version number (used by installers to upgrade files) of the assemblies has bumped to 1.0.11221.5 (reflecting the build date, i.e. December 21st, precisely six months after the initial release). Also, the version number of the MSI package and NuGet packages will reflect the 11221 build number.

Rx - Buffer

this post is one of a series of posts about Rx (Reactive Extensions). in this one I will discuss the Buffer operator.

no doubt that one of the most useful Rx operators is the Buffer.

the Buffer operator enables us to reduce throughput pressure and gain better utilization of our resources.

let's take a scenario of monitoring a data stream and persisting the data into a database (or sending it across a network boundary).

assuming the data rate is 1 item per millisecond, databases are not typically designed to work well with round-trips of such frequency,
but if we can buffer a chunk of data each second (or more) we can save those chunks at a much lower frequency (maybe by using bulk insert).

this is how we can gain better utilization of our system.

this is exactly what the Buffer operator does.
it can create chunks of data from an observable either by time or by count (or even by the combination of both).

it presents those chunks as an observable of IList<T>, which means that if we are dealing with a high frequency IObservable<T> we can transform it into a less frequent IObservable<IList<T>>.

the Buffer operator is very simple to use and very flexible in terms of the batch size.

Buffer API

the following example demonstrates reducing pressure of 1 millisecond frequency observable:

Code Snippet
var xs = Observable.Interval(TimeSpan.FromMilliseconds(1));
var bufferedStream = xs.Buffer(TimeSpan.FromSeconds(1));
bufferedStream.Subscribe(items => {/* do bulk insert */});

the same can be done by buffering every n items:

Code Snippet
var xs = Observable.Interval(TimeSpan.FromMilliseconds(1));
var bufferedStream = xs.Buffer(1000);
bufferedStream.Subscribe(items => {/* do bulk insert */});

there is even an API for the combination of both, which means that the buffer will be closed either after n items or after a duration has elapsed:

Code Snippet
var xs = Observable.Interval(TimeSpan.FromMilliseconds(1));
var bufferedStream = xs.Buffer(TimeSpan.FromSeconds(1), 1000);
bufferedStream.Subscribe(items => {/* do bulk insert */});

buffers can overlap, which means that more than one buffer can coexist at a time (the same item will be captured by multiple buffers):

Code Snippet
var xs = Observable.Interval(TimeSpan.FromMilliseconds(1));
// a buffer window of 1 second will be opened every 0.1 second
var bufferedStream = xs.Buffer(TimeSpan.FromSeconds(1),
                            TimeSpan.FromSeconds(0.1));
bufferedStream.Subscribe(items => {/* do bulk insert */});

the marble diagram of it is:

[image: marble diagram of overlapping buffers]

you can see that each buffer holds 4 items and the same item can be included in multiple buffers (IList<T>).
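
the overlap can be modeled without Rx. the following is a rough, count-based sketch of the idea (SlidingBufferModel is a hypothetical helper over IEnumerable, it is not the Rx implementation, and it simply drops buffers which are still partial when the source ends):

```csharp
using System.Collections.Generic;

static class SlidingBufferModel
{
    // opens a new buffer every 'skip' items and closes each buffer
    // once it holds 'count' items
    public static List<List<T>> Buffers<T>(IEnumerable<T> source, int count, int skip)
    {
        var open = new List<List<T>>();
        var closed = new List<List<T>>();
        int index = 0;

        foreach (T item in source)
        {
            if (index % skip == 0)
                open.Add(new List<T>());   // time to open another buffer
            index++;

            foreach (var buffer in open)
                buffer.Add(item);          // the same item lands in every open buffer

            if (open.Count > 0 && open[0].Count == count)
            {
                closed.Add(open[0]);       // the oldest buffer is full
                open.RemoveAt(0);
            }
        }
        return closed;                     // partial buffers are dropped in this sketch
    }
}
```

for the items 1, 2, 3, 4 with count = 2 and skip = 1 the closed buffers are [1, 2], [2, 3] and [3, 4], so the items 2 and 3 are each captured twice.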

Advanced Buffer API

actually you can gain even better control over the opening and closing of the buffering windows.

the Buffer is designed to accept an observable as the opening trigger of a buffering window and a corresponding observable factory for signaling the window's closing trigger.

even though this API looks a bit odd, it is very powerful.

with this API you can activate a buffer as a response to an external situation. let's think of a stream which should buffer a car's engine performance within separate geographical regions. we can start buffering each time the car's GPS indicates a region border and close the buffer window on exiting the region.

real-life scenarios often need this kind of granularity.

the API for this feature goes like this:

Code Snippet
  1. var regionBorderStream = Observable.Create<Unit>(obs =>
  2.     {
  3.         // read and analyze gps data
  4.         return Disposable.Empty;
  5.     });
  6. var carEngineStream = Observable.Interval(TimeSpan.FromMilliseconds(1));
  7. var bufferdStream = carEngineStream.Buffer(regionBorderStream,
  8.     region => regionBorderStream.Where(item =>  item == region));
  9. bufferdStream.Subscribe(item => {/* do bulk insert */});

regionBorderStream (at line 1) represents a stream of region passing notifications.

carEngineStream (at line 6) represents a car engine information stream.

and the buffered stream (at line 7) is buffering the engine stream. it opens a buffer each time the car enters a region, which means whenever the regionBorderStream produces a value. the buffer will be closed when the car exits the region.

you may have noticed that the above code supports multiple buffers at a time. you may enter a city region and then enter a sub-region (for example New York and Central Park, or Beijing and the Forbidden City area).

Summary

buffering is a very powerful technique.

I will survey other operators in future posts and we will discuss the advantages and disadvantages of the Buffer operator in comparison with some of the other operators.

Tpl Dataflow (IDataflowBlock) - Part 5

the previous post discussed the concept of ITargetBlock and ISourceBlock,
which are the TDF consumer / producer contracts.

you can find all the posts in this series under the TDF tag.

this post focuses on the IDataflowBlock contract, which is the lifetime management contract for all the dataflow blocks.

IDataflowBlock

the IDataflowBlock defines a single property and 2 methods:

Code Snippet
public interface IDataflowBlock
{
    Task Completion { get; }

    void Complete();
    void Fault(Exception exception);
}

ending the processing of a dataflow block is done either by calling Complete() or by calling Fault(), in cases where the dataflow should exit into a faulted state.

in case of Complete the block will finish the processing of all the messages that are already in its inner buffer and decline (DecliningPermanently) any incoming messages.

Fault puts the block into a faulted state and it does not schedule any more messages from its inner buffer.

the Completion property is using the TAP (Task-based Asynchronous Pattern) concept as its API for monitoring the block state.

the Completion property returns a Task which enables waiting (Wait), continuation (ContinueWith) or await.

the block will complete this task when it finishes processing all the messages within its inner buffer (or when it finishes the currently executing messages in case of a faulted state).

because the block is using the task semantics, it's easy to handle block exceptions.

the following code demonstrates the management of an ActionBlock lifetime.

Code Snippet
  1. var ab = new ActionBlock<int>(i =>
  2. {
  3.     Thread.Sleep(1000);
  4.     Console.WriteLine(i);
  5. });
  6.  
  7. ab.Completion.ContinueWith(t =>
  8. {
  9.     if (t.Status == TaskStatus.Faulted)
  10.         Console.WriteLine("Failure: {0}", t.Exception);
  11.     else
  12.         Console.WriteLine("Complete");
  13. });
  14.  
  15. Console.WriteLine("Processing");
  16.  
  17. ab.Post(1);
  18. ab.Post(2);
  19. ab.Post(3);
  20.  
  21. if (Console.ReadKey().KeyChar == 'f')
  22.     (ab as IDataflowBlock).Fault(new Exception("wrong tick"));
  23. else
  24.     ab.Complete();

At line 7 we register a continuation callback which checks the completion status and writes the completion information.
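The same lifetime handling can also be written with await instead of ContinueWith. The following is only a minimal sketch, assuming a reference to the System.Threading.Tasks.Dataflow package and a compiler that supports async Main (C# 7.1 or later); the class and method names are hypothetical.

```csharp
using System;
using System.Threading.Tasks;
using System.Threading.Tasks.Dataflow; // assumed package reference

static class BlockLifetimeSketch
{
    // Posts two items, ends the block (normally or with a fault)
    // and reports how the Completion task finished.
    public static async Task<string> RunAsync(bool fault)
    {
        var block = new ActionBlock<int>(i => Console.WriteLine(i));

        block.Post(1);
        block.Post(2);

        if (fault)
            ((IDataflowBlock)block).Fault(new InvalidOperationException("wrong tick"));
        else
            block.Complete();

        // once the block was completed or faulted it declines new messages
        bool accepted = block.Post(3);
        Console.WriteLine("Accepted after the end: {0}", accepted);

        try
        {
            // awaiting a faulted Completion task rethrows the failure
            await block.Completion;
            return "Complete";
        }
        catch (Exception ex)
        {
            return "Failure: " + ex.Message;
        }
    }

    static async Task Main()
    {
        Console.WriteLine(await RunAsync(fault: false));
        Console.WriteLine(await RunAsync(fault: true));
    }
}
```

With await the fault surfaces as a regular exception, so a plain try/catch replaces the status check of the continuation.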

Points of interest

As we discussed in the previous post, a dataflow block has an internal task which is responsible for its execution (by default it is one task per block, but the block can be configured to work with multiple tasks). This task is released whenever the block becomes idle; when the block becomes active again it creates a new task.

It is important to understand that the completion task is a semantic task and not one of the worker tasks.

If you want to create your own custom block, you can use TaskCompletionSource<T> in order to expose the TAP (Task-based Asynchronous Pattern) semantics.

You can learn more about the TAP concept here.
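As a rough illustration of the custom block idea, the sketch below implements only the IDataflowBlock lifetime contract on top of a TaskCompletionSource<object>. The LifetimeOnlyBlock type is hypothetical: it does not buffer or process any message, and a real block would also implement ITargetBlock<T> and/or ISourceBlock<T>.

```csharp
using System;
using System.Threading.Tasks;
using System.Threading.Tasks.Dataflow; // assumed package reference

// Hypothetical block which models only the lifetime contract.
sealed class LifetimeOnlyBlock : IDataflowBlock
{
    // the semantic task, it is not bound to any worker task
    private readonly TaskCompletionSource<object> _tcs =
        new TaskCompletionSource<object>();

    public Task Completion => _tcs.Task;

    // Try* is used because a task can reach a final state only once
    public void Complete() => _tcs.TrySetResult(null);

    public void Fault(Exception exception)
    {
        if (exception == null)
            throw new ArgumentNullException(nameof(exception));
        _tcs.TrySetException(exception);
    }
}

static class LifetimeOnlyBlockDemo
{
    static void Main()
    {
        var block = new LifetimeOnlyBlock();
        block.Fault(new Exception("wrong tick"));
        block.Complete(); // ignored, the block is already faulted
        Console.WriteLine(block.Completion.Status); // Faulted
    }
}
```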

Summary

Every dataflow block implements the IDataflowBlock contract, which makes it possible to control and monitor the block's lifetime.

Task != Thread

Whenever I teach the TPL Task subject, I keep repeating the mantra that "a task is the metadata/context of an execution and is not really responsible for the actual execution".

A Task is a data structure which holds information about a code execution: the delegate that will be executed, status, state, result, exception, synchronization object, etc.
The responsibility for the execution actually belongs to the Task Scheduler.

As a matter of fact, a task can be executed synchronously.

Code Snippet
    // both lines print the same thread id
    Console.WriteLine(Thread.CurrentThread.ManagedThreadId);
    Task t = new Task(() =>
        Console.WriteLine(Thread.CurrentThread.ManagedThreadId));
    t.RunSynchronously();

By default a task is executed on a thread pool worker thread, but it can be executed on a non-thread-pool thread (using the TaskCreationOptions.LongRunning option).

Code Snippet
    Task.Factory.StartNew(() =>
        Console.WriteLine(Thread.CurrentThread.IsThreadPoolThread));
    Task.Factory.StartNew(() =>
            Console.WriteLine(Thread.CurrentThread.IsThreadPoolThread),
            TaskCreationOptions.LongRunning);

A task can also benefit from IOCP (IO Completion Ports) in order to avoid thread pool starvation when performing IO operations.
IO operations are not CPU-bound operations; they are performed by the hard disk controller, the network card, etc.

We can apply task semantics to an IO operation which uses the APM pattern by using Task.Factory.FromAsync:

Code Snippet
    // Task<int> returns the number of bytes read
    Task<int> taskAsyncRead = Task<int>.Factory.FromAsync(
            file.BeginRead,  // begin invoke delegate
            file.EndRead,    // end invoke delegate
            buffer,          // read buffer
            0,               // start index
            buffer.Length,   // length
            null);           // optional state
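The snippet above assumes that file and buffer already exist. A self-contained version might look like the following sketch; the temporary file, the buffer size and the ReadChunkAsync helper are assumptions made for the demo only.

```csharp
using System;
using System.IO;
using System.Threading.Tasks;

static class FromAsyncSketch
{
    // Wraps the APM pair BeginRead/EndRead into a Task<int>
    // which completes with the number of bytes read.
    public static Task<int> ReadChunkAsync(FileStream file, byte[] buffer)
    {
        return Task<int>.Factory.FromAsync(
            file.BeginRead, file.EndRead, buffer, 0, buffer.Length, null);
    }

    static void Main()
    {
        string path = Path.GetTempFileName(); // sample file for the demo
        File.WriteAllText(path, "hello");

        // useAsync: true opens the file for asynchronous (overlapped) IO
        using (var file = new FileStream(path, FileMode.Open, FileAccess.Read,
                                         FileShare.Read, 4096, useAsync: true))
        {
            var buffer = new byte[16];
            // blocking on Result only to keep the demo short
            int read = ReadChunkAsync(file, buffer).Result;
            Console.WriteLine("{0} bytes read", read);
        }

        File.Delete(path);
    }
}
```

No thread is dedicated to waiting for the read; the task only represents the pending operation until EndRead reports its result.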

This leads us to the most general abstraction of the task semantics, which is TaskCompletionSource<T>.

TaskCompletionSource<T> is a class that helps us apply TAP (Task-based Asynchronous Pattern) semantics.

In general, we can use this class to apply TAP semantics to any operation.

For example, the following code applies TAP semantics to a WebClient download:

Code Snippet
    private static Task<string> DownloadAsync(string address)
    {
        var uri = new Uri(address);

        // present TAP semantics
        var tcs = new TaskCompletionSource<string>();

        var proxy = new WebClient();
        // handling the download async result
        proxy.DownloadStringCompleted += (s, e) =>
            {
                if (e.Cancelled)
                    tcs.SetCanceled();
                else if (e.Error != null)
                    tcs.SetException(e.Error);
                else
                    tcs.SetResult(e.Result);

                proxy.Dispose();
            };
        // start downloading asynchronously
        proxy.DownloadStringAsync(uri);

        return tcs.Task; // does not wait for completion
    }

TaskCompletionSource<T> has the following methods:

  • SetCanceled: applies cancellation semantics.
  • SetException: applies fault semantics.
  • SetResult: applies completion semantics.

It also has a Task property, which encapsulates the TAP semantics and can be handed back to the caller.
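The effect of the three methods on the exposed Task can be seen in the following small sketch; the StatusAfter helper is hypothetical and exists only to keep the demo short.

```csharp
using System;
using System.Threading.Tasks;

static class TcsOutcomes
{
    // Creates a TaskCompletionSource, lets the caller finish it
    // and returns the status of the task handed to consumers.
    public static TaskStatus StatusAfter(
        Action<TaskCompletionSource<string>> finish)
    {
        var tcs = new TaskCompletionSource<string>();
        finish(tcs);
        return tcs.Task.Status;
    }

    static void Main()
    {
        Console.WriteLine(StatusAfter(t => t.SetResult("done")));
        // RanToCompletion
        Console.WriteLine(StatusAfter(t =>
            t.SetException(new InvalidOperationException("boom"))));
        // Faulted
        Console.WriteLine(StatusAfter(t => t.SetCanceled()));
        // Canceled

        // a task can be finished only once: a second Set* call throws
        // InvalidOperationException, while the Try* variants return false
        var once = new TaskCompletionSource<string>();
        once.SetResult("first");
        Console.WriteLine(once.TrySetResult("second")); // False
    }
}
```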

IOCP

Because the WebClient async operation uses IOCP, by wrapping it with TAP semantics we gain a task which runs over IOCP.

Can we take it further?

Suppose we want a Task which is triggered by a FileSystemWatcher. It is actually very feasible and easy when we use TaskCompletionSource<T>.

The following extension method returns a task which completes when the user deletes a text file.

Code Snippet
    public static Task<string> ToTask(
        this FileSystemWatcher instance, Action action)
    {
        var tcs = new TaskCompletionSource<string>();
        instance.Filter = "*.txt";
        instance.Deleted += (s, e) =>
            {
                action();
                // a task completes only once; TrySetResult ignores
                // further delete events instead of throwing
                tcs.TrySetResult(e.FullPath);
            };
        // start raising events only after the handler is attached
        instance.EnableRaisingEvents = true;

        return tcs.Task;
    }

Then it can be called from anywhere:

Code Snippet
    static void Main(string[] args)
    {
        var fsw = new FileSystemWatcher(".");
        Task<string> t = fsw.ToTask(() =>
            Console.WriteLine("Executing"));
        // keep the continuation so the loop ends only after it has printed
        Task done = t.ContinueWith(tsk =>
            {
                Console.WriteLine();
                Console.WriteLine(tsk.Result);
            });

        while (!done.IsCompleted)
        {
            Console.Write(".");
            Thread.Sleep(100);
        }
    }

Summary

Task != Thread: by default a Task is scheduled on a thread pool thread, but that is the responsibility of the task scheduler.

The TAP abstraction makes it possible to apply its semantics to almost any operation.

In a future post I will show how to apply the FileSystemWatcher to the async/await pattern.
