Why Concurrency Is Hard (Or: TimedLock Can Get You in Trouble)

January 27, 2009

7 comments

I’ve just noticed a post by Guy Kolbis discussing a possible solution for deadlocks – ensuring that all locks are taken with a timeout.  To do so, Guy cites the TimedLock struct, originally introduced by Ian Griffiths.

The general idea is that instead of using a standard lock{…} block, you wrap your critical section in the following statement:

using (TimedLock.Lock(…))

{

}

Ian even introduces a refinement of this idea by including a sentinel reference type in the struct, in Debug builds, so that if the lock isn’t explicitly freed, the sentinel’s finalizer will be called and raise an assertion – so that the developer is immediately notified that there was a lock she forgot to release.

Unfortunately, there is a subtle yet fatal flaw with this approach.  Consider the proposed constructor of TimedLock:

public static TimedLock Lock (object o, TimeSpan timeout)

{

    TimedLock tl = new TimedLock (o);

    if (!Monitor.TryEnter (o, timeout))

    {

#if DEBUG

        System.GC.SuppressFinalize(tl.leakDetector);

#endif

        throw new LockTimeoutException ();

    }

 

    return tl;

Can you spot the problem?

Here’s a hint – what happens if an exception occurs after the lock was successfully acquired, but before the method returns?

Let’s step back for a second – how can an exception possibly occur between these two lines?  After all, if the TryEnter method returned successfully, there are no other operations in this method that could possibly introduce an exception.  Or maybe there are?

Of course there are!  If your thread is asynchronously aborted, e.g. using the Thread.Abort method, or as a result of an asynchronous exception such as OutOfMemoryException, then the exception can be potentially introduced between any two instructions in your code, including the most innocuous ones.

And what happens then?  Well, because the exception occurs before the method has returned, the using statement does not actually dispose the lock because the reference to the lock is never assigned in the caller’s code.  If you’re running in Debug, you would probably catch it thanks to the sentinel reference type – but if this happens to you in a production scenario, you have just leaked an acquired lock, which is never going to be released.

You can easily emulate this by putting a trace statement in the Dispose method, introducing a short sleep into the static Lock method (between the two problematic lines we discussed above), and then inducing an asynchronous exception from a different thread while the Lock method is executing – e.g. by aborting that thread.

So essentially, while the TimedLock apparently solves the problem of deadlocks, it does so by reintroducing potential deadlocks.  Not much gained here.

Nitpicker’s Corner

You, Dear Reader, might have a completely legitimate question.  Namely:  Why is the lock keyword immune from this problem?

As you probably know, a lock statement is compiled to a call to Monitor.Enter, and then a try…finally block.  In the finally block, Monitor.Exit is called.

The JIT code generation for both x86 and x64 ensures that a thread abort cannot occur between a Monitor.Enter call and a try block that immediately follows it.  This always worked that way for x86/Release (because there were no instructions between the call and the subsequent instruction within the try block), and was actually fixed in VS2008 for x64/Release.  (What we have in our case is a try block that is quite a few instructions, including a function call return, away from the Monitor.Enter call.)

Add comment
facebook linkedin twitter email

Leave a Reply

Your email address will not be published. Required fields are marked *

You may use these HTML tags and attributes: <a href="" title=""> <abbr title=""> <acronym title=""> <b> <blockquote cite=""> <cite> <code> <del datetime=""> <em> <i> <q cite=""> <strike> <strong>

7 comments

  1. Rupert BenbrookJanuary 27, 2009 ב 10:18 PM

    I think I’m right in thinking that the concurrency problem you describe could be solved by putting the TryEnter, throw, and return inside a Constrained Execution Region (CER).
    http://msdn.microsoft.com/en-us/library/system.runtime.compilerservices.runtimehelpers.prepareconstrainedregions.aspx

    Reply
  2. Sasha GoldshteinJanuary 27, 2009 ב 11:18 PM

    Unfortunately, you can’t acquire a lock inside a CER. This is well-documented.

    http://msdn.microsoft.com/en-us/library/ms228973.aspx

    Reply
  3. Rupert BenbrookJanuary 28, 2009 ב 10:50 AM

    Ah yes, I missed that bit in the docs.

    Reply
  4. SebastianJanuary 29, 2009 ב 6:37 PM

    We tried to do almost the same thing in our applications.

    During development we used one implementation that records the lock order and reports errors if different code paths aquire the locks in a different order (ie A, C, B in one call path and B, D, A in another is usually an error) and in production we changed that code into functions that simply called Monitor.Enter/Leave. But after changing all the locking calls into using(…), we noticed that the CRL wasn’t smart enough to inline our calls to the method that called Monitored.Enter/Leave. We measured a ten times performance drop in a program that was almost CPU bound already so we had to revert the changes.

    During that time I also checked the machine code generated when using:
    Monitor.Enter…
    try {

    } finally {
    Monitor.Leave…
    }

    but that code was also more inefficient than the standard lock-block and caused a performance drop.

    Reply
  5. Michael MacJanuary 30, 2009 ב 2:05 PM

    Sasha Goldshtein wrote

    “So essentially, while the TimedLock apparently solves the problem of deadlocks, it does so by reintroducing potential deadlocks. Not much gained here.”

    Although technically I agree let’s look at it from a different perspective.

    1. Do normal applications use Thread.Abort? It is not recommended to do so, because of the out-of order exceptions. It’s hard to do it properly.

    2. Do developers write code to recover from OutOfMemoryException or StackOverflowException? Is is it always possible?

    3. Will finnally always be called? http://msdn.microsoft.com/en-us/magazine/cc163716.aspx

    4. Out of order exceptions can cause state corruption, for example:
    lock (o)
    {
    a++;
    // out-of order exception
    a++;
    }

    Using this technique we’ll exit the lock, however our state would be corrupted.
    What is better: orphaned lock or corrupted state? It depends.

    All in all, I think that TimedLock is better in most cases. It solves the problem of the most common deadlocks. If we get out-of order exception we probably don’t handle it and can’t recover from it. However, as I said “It depends”.

    Reply
  6. Michael MacJanuary 30, 2009 ב 7:15 PM

    Sasha Goldshtein wroter

    “So essentially, while the TimedLock apparently solves the problem of deadlocks, it does so by reintroducing potential deadlocks. Not much gained here.”

    Although technically I agree let’s look at it from a different perspective.

    1. Do normal applications use Thread.Abort? It is not recommended to do so, because of the out-of order exceptions. It’s hard to do it properly.

    2. Do developers write code to recover from OutOfMemoryException or StackOverflowException? Is it always possible?

    3. Will finnally always be called? http://msdn.microsoft.com/en-us/magazine/cc163716.aspx

    4. Out of order exceptions can cause state corruption, for example:
    lock (o)
    {
    a++;
    // out-of order exception
    a++;
    }

    Using this technique we’ll exit the lock, however out state would be corrupted.
    What is better: orphaned lock or corrupted state? It depends.

    All in all, I think that TimedLock is better in most cases. It solves the problem of the most common deadlocks. If we get out-of order exception we probably don’t handle it and cann’t recover from it. However, as I said “It depends”.

    Reply
  7. Sasha GoldshteinFebruary 8, 2009 ב 9:33 PM

    @Michael:
    Of course it depends, but I’ve seen more than one application that needed (unfortunately) to rudely abort threads. Some hosts require reliability to such a degree that you can’t just wave it off by saying that out-of-order exceptions are rare and no one really bothers to handle them…

    Reply