Demystify Async and Await (Part 1 of 2)

2016/03/29


After seeing so much confusion spread across the industry, I decided to write this post and demystify the behavior of async and await.

 

Too many people hold misconceptions and wrong theories about how async and await work.

This leads to great confusion and bad practices.

In order to explain async and await, I must first go back to basics and demystify Task.

 

Part 1 of this two-part series explains what a Task really is, and clears up the confusion almost everybody in the industry has about its real nature.

Part 2 of this series will demystify async and await.

I strongly recommend reading part 1 before part 2.


 

So what is a Task?

A Task is nothing more than a data structure.

Really, I’m not joking.

As simple as it seems, this is where its strength comes from.

Many developers think of a Task as a Thread, but this is a misconception which leads to many mistakes.

A Task is nothing more than a data structure which represents the context of an execution.

It has the following structure:

Status: the stage of the execution which the Task represents, for example Created, WaitingToRun, Running, RanToCompletion, Canceled or Faulted.

Pre-execution data:

* AsyncState: an arbitrary object which can be stored at construction time.

* Delegate: a method which may be invoked by a task scheduler.
   The decision of when and how to invoke the delegate is usually taken by the Task Scheduler (which is external to the Task) and never by the Task itself.
   As a matter of fact, in some cases there is no delegate at all.

* CreationOptions: flags which may be read by the Task Scheduler.

Post-execution data:

* Result (in the case of Task<T>): holds the execution result in case of success.

* Exception: the failure information, in case the Task represents a faulted execution.
   A Task can actually represent a failed execution without ever running or having a delegate.
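The fields listed above can be observed on a live Task. The following is only a minimal sketch: the class name TaskAnatomyDemo and the "hello" state object are made up for the illustration, and RunSynchronously is used simply because it keeps the example deterministic.

```csharp
using System;
using System.Threading.Tasks;

public static class TaskAnatomyDemo // hypothetical name, for illustration only
{
    public static void Main()
    {
        // Pre-execution data: a delegate, a state object and creation options.
        var task = new Task<int>(state => ((string)state).Length,
                                 "hello",
                                 TaskCreationOptions.None);

        Console.WriteLine(task.Status);          // Created (nothing was scheduled yet)
        Console.WriteLine(task.AsyncState);      // hello
        Console.WriteLine(task.CreationOptions); // None

        // Hand the Task over to the current scheduler, which runs the
        // delegate here on the calling thread.
        task.RunSynchronously();

        // Post-execution data.
        Console.WriteLine(task.Status);            // RanToCompletion
        Console.WriteLine(task.Result);            // 5
        Console.WriteLine(task.Exception == null); // True
    }
}
```

Note that the Task object exists, and can be inspected, before anything has executed at all.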

 

Why is it so confusing? Why does a Task seem like a Thread?

The code snippets ahead use the following method to write the current thread’s information to the console.

Code Snippet
  // requires: using System; using System.Threading;
  private static void WriteThreadInfo(object title)
  {
      var currentThread = Thread.CurrentThread;
      int id = currentThread.ManagedThreadId;
      bool isPool = currentThread.IsThreadPoolThread;
      Console.WriteLine($"{title}: Id = {id}: Is Pool = {isPool}"); // C# 6 syntax
  }

 

It is so confusing because, at first glance, Task functionality does seem equivalent to a Thread (or the Thread Pool).

Considering the following code snippet, this is the most logical conclusion.

Code Snippet
  // requires: using System; using System.Threading; using System.Threading.Tasks;
  static void Main(string[] args)
  {
      WriteThreadInfo("Main");

      var trd = new Thread(WriteThreadInfo);
      trd.Start("Thread");

      Task t1 = new Task(WriteThreadInfo, "Task 1");
      t1.Start();

      ThreadPool.QueueUserWorkItem(WriteThreadInfo, "ThreadPool");

      Task t2 = Task.Factory.StartNew(WriteThreadInfo, "Task 2", TaskCreationOptions.LongRunning);

      Task t3 = Task.Factory.StartNew(WriteThreadInfo, "Task 3");

      // the threads used by the tasks and the pool are background threads:
      // keep the process alive until they have had a chance to print
      Console.ReadLine();
  }

Considering the code above, you’re likely to come to the conclusion that a Task equals a Thread.

So why do I argue that a Task is not equivalent to a Thread?

Like a good magician, Task is hiding something, which causes the illusion of a Task / Thread duality.

It so happens that the overloads shown in the snippet above are hiding the Task Scheduler.

When you don’t pass a Task Scheduler to a Task, you’re actually using TaskScheduler.Current (not TaskScheduler.Default, which leads to a different confusion that I may write about in a future post).

 

The following snippet shows the missing piece:

Code Snippet
  1. Task t1 = new Task(WriteThreadInfo, "Task 1");
  2. t1.Start(TaskScheduler.Current); // with the default scheduler: run on the thread pool
  3.  
  4. Task t2 = new Task(WriteThreadInfo, "Task 2", TaskCreationOptions.LongRunning);
  5. t2.Start(TaskScheduler.Current); // with the default scheduler: run on a new thread
  6.  
  7.  
  8. Task t3 = Task.Factory.StartNew(WriteThreadInfo, "Task 3",
  9.                             CancellationToken.None,
  10.                             TaskCreationOptions.None,
  11.                             TaskScheduler.Current);

Lines 2, 5 and 11 suggest that the execution of the Task may not be its own responsibility at all.

The scheduler is what actually runs the Task.

The Task’s responsibility is to reflect the execution context.

It can answer questions like:

– Has it run?

– Did it complete with a value?

– Did it fail with an exception?

– Was it canceled?

 

Line 2 may run the Task’s delegate on the thread pool, while line 5 may run it on a new (non-pooled) thread.

The Task Scheduler can read the Task’s data and decide to open a new thread rather than use the thread pool (for a Task that is marked with TaskCreationOptions.LongRunning).
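To make the division of responsibility more tangible, here is a minimal sketch of a custom scheduler. InlineScheduler is a hypothetical class written for this illustration (it is not part of .NET); it runs every Task on the thread that starts it, while the very same kind of Task started on TaskScheduler.Default ends up on a pool thread. A ManualResetEventSlim is used for the second Task instead of Wait(), because Wait() is allowed to execute a not-yet-started Task on the waiting thread, which would blur the picture.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical scheduler, for illustration only: executes each task
// immediately on the thread that queues it and never uses the thread pool.
public sealed class InlineScheduler : TaskScheduler
{
    protected override void QueueTask(Task task)
    {
        TryExecuteTask(task); // the scheduler, not the Task, invokes the delegate
    }

    protected override bool TryExecuteTaskInline(Task task, bool taskWasPreviouslyQueued)
    {
        return TryExecuteTask(task);
    }

    protected override IEnumerable<Task> GetScheduledTasks()
    {
        return Enumerable.Empty<Task>(); // nothing is ever left waiting in a queue
    }
}

public static class SchedulerDemo
{
    public static void Main()
    {
        int callerId = Thread.CurrentThread.ManagedThreadId;

        int inlineId = -1;
        var inlineTask = new Task(() => inlineId = Thread.CurrentThread.ManagedThreadId);
        inlineTask.Start(new InlineScheduler()); // executes inside Start
        inlineTask.Wait();

        bool onPool = false;
        using (var done = new ManualResetEventSlim(false))
        {
            var pooledTask = new Task(() =>
            {
                onPool = Thread.CurrentThread.IsThreadPoolThread;
                done.Set();
            });
            pooledTask.Start(TaskScheduler.Default);
            done.Wait(); // block without giving Wait() a chance to inline the task
        }

        Console.WriteLine($"Inline scheduler used the caller's thread: {inlineId == callerId}"); // True
        Console.WriteLine($"Default scheduler used a pool thread: {onPool}");                    // True
    }
}
```

Both Tasks are identical data structures; only the scheduler they were handed to differs.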

 

Furthermore, a Task is a data structure and it may never run at all.

Take a look at the following code snippet.

Code Snippet
  // completed task which never ran
  Task<int> t4 = Task.FromResult(1);

  var tcs5 = new TaskCompletionSource<int>();
  tcs5.TrySetResult(1);
  // completed task which never ran
  Task<int> t5 = tcs5.Task;

  var tcs6 = new TaskCompletionSource<int>();
  tcs6.TrySetException(new ArgumentException("Execution failure"));
  // completed task which never ran
  // and represents a failure state
  Task<int> t6 = tcs6.Task;

  var tcs7 = new TaskCompletionSource<int>();
  tcs7.TrySetCanceled();
  // canceled task which never ran
  Task<int> t7 = tcs7.Task;

  var tcs8 = new TaskCompletionSource<int>();
  // non-completed task which never ran
  Task<int> t8 = tcs8.Task;

None of the Tasks above has a delegate. They only indicate an execution state, even though no execution happened at all.
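Printing the Status of such delegate-less Tasks shows that they are ordinary Tasks in every other respect. A small sketch (the class and variable names are made up for the illustration):

```csharp
using System;
using System.Threading.Tasks;

public static class DelegateLessTasksDemo // hypothetical name, for illustration only
{
    public static void Main()
    {
        Task<int> completed = Task.FromResult(1);

        var failedSource = new TaskCompletionSource<int>();
        failedSource.TrySetException(new ArgumentException("Execution failure"));

        var canceledSource = new TaskCompletionSource<int>();
        canceledSource.TrySetCanceled();

        var pendingSource = new TaskCompletionSource<int>();

        Console.WriteLine(completed.Status);           // RanToCompletion
        Console.WriteLine(failedSource.Task.Status);   // Faulted
        Console.WriteLine(failedSource.Task.Exception.InnerException.Message); // Execution failure
        Console.WriteLine(canceledSource.Task.Status); // Canceled
        Console.WriteLine(pendingSource.Task.Status);  // WaitingForActivation

        // The pending Task completes when whoever owns the source says so;
        // no delegate and no thread is involved.
        pendingSource.TrySetResult(42);
        Console.WriteLine(pendingSource.Task.Status);  // RanToCompletion
        Console.WriteLine(pendingSource.Task.Result);  // 42
    }
}
```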

 

What’s so special about Task being a data structure?

Being a data structure enables the use of Task in a broader context, far beyond simple threading.

For example, take a common cache scenario, where you have to make a real asynchronous call the first time and return cached data on the following calls (until the data becomes stale).

Task as a data structure helps you abstract this functionality away from the client.

The following snippet demonstrates the concept:

Code Snippet
  // requires: using System.Net.Http; using System.Threading.Tasks;
  public class Proxy
  {
      // shared client: disposing the HttpClient right after starting
      // the request would cancel the pending download
      private static readonly HttpClient _http = new HttpClient();
      private readonly string _url;
      private Task<byte[]> _localCache;

      public Proxy(string url)
      {
          _url = url;
      }

      public Task<byte[]> GetData()
      {
          if (_localCache == null)
          {
              _localCache = _http.GetByteArrayAsync(_url);
          }
          return _localCache;
      }
  }

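On the second call, GetData does no asynchronous work of its own: it immediately returns the cached Task, so once the download has completed the caller gets the result right away, yet it still receives a Task.

The client doesn’t have to be aware of the caching implemented within the method.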
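The caching idea can be tried without a network. In the sketch below, CachingProxy is a hypothetical variation of the Proxy class in which the download is replaced by an injected loader delegate, so the names, the 50 ms delay and the sample bytes are all assumptions made for the demonstration. Like the original snippet, it ignores thread safety and staleness.

```csharp
using System;
using System.Threading.Tasks;

// Hypothetical variation of Proxy: the real download is replaced by an
// injected loader so that the example runs without a network.
public class CachingProxy
{
    private readonly Func<Task<byte[]>> _loader;
    private Task<byte[]> _localCache;

    public CachingProxy(Func<Task<byte[]>> loader)
    {
        _loader = loader;
    }

    public Task<byte[]> GetData()
    {
        if (_localCache == null)
        {
            _localCache = _loader(); // first call: start the real asynchronous work
        }
        return _localCache;          // later calls: hand back the very same Task
    }
}

public static class CacheDemo
{
    public static void Main()
    {
        int loads = 0;
        var proxy = new CachingProxy(() =>
        {
            loads++;
            // Stands in for a slow remote call.
            return Task.Delay(50).ContinueWith(_ => new byte[] { 1, 2, 3 });
        });

        Task<byte[]> first = proxy.GetData();
        Task<byte[]> second = proxy.GetData();
        byte[] data = second.Result; // blocking is acceptable in a console demo

        Console.WriteLine(ReferenceEquals(first, second)); // True
        Console.WriteLine(loads);                          // 1
        Console.WriteLine(data.Length);                    // 3
    }
}
```

Both callers hold the same Task instance, so neither of them can tell whether it triggered the load or merely received the cached one.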

 

Parent / Child semantics is another feature where Task as a data structure shines.

Traditionally, if you start a new thread from a running thread, it doesn’t have any affinity to the thread it was started from. The new thread is a new execution root which isn’t aware of its origin.

Task, on the other hand, does enable Parent / Child semantics. Because a Task is a data structure, it can maintain the affinity to its child Tasks.

As a matter of fact, Task doesn’t limit you to direct descendants.

You can have as many levels of descendants as you like: parent / child / grandchild / etc.

The following code demonstrates the idea:

Code Snippet
  1. Task t = Task.Factory.StartNew(() =>
  2. {
  3.     Task t1 = Task.Factory.StartNew(() =>
  4.     {
  5.         Task t2 = Task.Factory.StartNew(() =>
  6.         {
  7.             Thread.Sleep(1500);
  8.         }, TaskCreationOptions.AttachedToParent);
  9.         Thread.Sleep(1000);
  10.     }, TaskCreationOptions.AttachedToParent);
  11.     Thread.Sleep(500); // don't use sleep from task in real-life code
  12.                         // it will cause ThreadPool starvation (use await Task.Delay)
  13. });
  14. t.ContinueWith(parent => Trace.WriteLine(parent.Exception),
  15.                 TaskContinuationOptions.OnlyOnFaulted);

The Tasks created at lines 3 and 5 attach to their parents (through the option at lines 10 and 8).

This means that the Task t completes only after roughly 1500 milliseconds (the duration of its longest-running descendant), so the ContinueWith at line 14 cannot be triggered any earlier.

Another benefit of the data structure, shown in the snippet above, is the ability to schedule conditional execution. As you remember, a Task is not responsible for its own execution; that is the responsibility of the TaskScheduler. On line 15, we make the continuation conditional.

When the origin Task completes, its state is checked and the continuation is scheduled only if the origin Task is in the Faulted state.
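The snippet above never actually faults, so its continuation never runs. The following sketch (all names are made up for the illustration) lets an attached child throw, which shows both ideas at once: the failure of the child surfaces through the parent, and only the continuation whose condition matches gets executed.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

public static class ParentChildDemo // hypothetical name, for illustration only
{
    public static void Main()
    {
        Task parent = Task.Factory.StartNew(() =>
        {
            Task.Factory.StartNew(() =>
            {
                Thread.Sleep(200); // sleep is used only to keep the demo short
                throw new InvalidOperationException("child failed");
            }, TaskCreationOptions.AttachedToParent);
        });

        Task onFaulted = parent.ContinueWith(
            p => Console.WriteLine("Faulted: " + p.Exception.Flatten().InnerException.Message),
            TaskContinuationOptions.OnlyOnFaulted);

        Task onSuccess = parent.ContinueWith(
            p => Console.WriteLine("Succeeded"),
            TaskContinuationOptions.OnlyOnRanToCompletion);

        onFaulted.Wait(); // prints "Faulted: child failed"
        try { onSuccess.Wait(); }
        catch (AggregateException) { /* waiting on a canceled Task throws */ }

        Console.WriteLine(parent.Status);    // Faulted
        Console.WriteLine(onSuccess.Status); // Canceled
    }
}
```

The parent's own delegate returns immediately and without error; it is the attached child that turns the parent into a faulted Task.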

 

Be aware that replacing Task.Factory.StartNew with Task.Run in the snippet above won’t be a good idea, because Task.Run denies the attaching of child Tasks (more about the difference between Task.Run and Task.Factory.StartNew in a future post).
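The difference can be observed with a small experiment. In this sketch the helper ParentCompletesBeforeChild and the 500 ms timeout are assumptions chosen for the demonstration: the child blocks on a gate, and the helper reports whether the parent managed to complete while the child was still blocked.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

public static class DenyChildAttachDemo // hypothetical name, for illustration only
{
    // Starts a parent whose child blocks until the gate is set, and reports
    // whether the parent completed while the child was still blocked.
    public static bool ParentCompletesBeforeChild(Func<Action, Task> startParent)
    {
        using (var gate = new ManualResetEventSlim(false))
        {
            Task child = null;
            Task parent = startParent(() =>
            {
                child = Task.Factory.StartNew(() => gate.Wait(),
                                              TaskCreationOptions.AttachedToParent);
            });

            bool completedEarly = parent.Wait(TimeSpan.FromMilliseconds(500));
            gate.Set();    // release the child
            parent.Wait(); // in both variations the parent is done by now or soon after
            child.Wait();  // make sure the child no longer uses the gate
            return completedEarly;
        }
    }

    public static void Main()
    {
        // StartNew: the child attaches, so the parent keeps waiting for it.
        Console.WriteLine(ParentCompletesBeforeChild(body => Task.Factory.StartNew(body))); // False

        // Task.Run: the attach request is ignored, so the parent completes on its own.
        Console.WriteLine(ParentCompletesBeforeChild(body => Task.Run(body)));              // True
    }
}
```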

 

Summary

Task is the foundation for modern async execution on .NET.

This post demystifies Task, while the next post will go one step further and demystify async / await.
