Analyzing Monitor-Based Deadlocks with SOS.dll

October 17, 2009

one comment

Over a year ago, I blogged about the SOSEX debugging extension (which has been updated since!) and how its !dlk command can be used to diagnose a managed deadlock that involves Monitor locks.

There’s also the obvious question of what to do when SOSEX can’t be used for this purpose, i.e. how to diagnose this kind of deadlock with SOS alone.

First of all, you need to see the sync blocks involved. The SOS !syncblk command (and the !dumpheap -thinlock command) will show you information about the sync blocks in the debugged process or dump. (Sync blocks are the CLR mechanism behind the lock statement and the Monitor class.)

0:000> !syncblk
Index         SyncBlock MonitorHeld Recursion Owning Thread Info          SyncBlock Owner
    1 0000000002551e88            3         1 00000000025509a0   720   3   0000000002983cd8 ADeadlockOfSorts.BankAccount
    2 0000000002551ed0            3         1 000000000251e580  1608   0   0000000002983cc0 ADeadlockOfSorts.BankAccount
Total           2
CCW             0
RCW             0
ComClassFactory 0
Free            0

What you see in this output is two sync blocks, both belonging to objects of type BankAccount (this is the object instance that was passed to the lock statement or the Monitor.Enter method). You can also see the owning thread information (Windows thread ID and debugger thread number) for each of the sync blocks.

Now we need to know what each of the application threads are waiting for. We know about the locks they hold, but we don’t know about the locks they are trying to acquire. First, let’s take a look at all the managed threads using the !threads command:

0:000> !threads
ThreadCount: 3
UnstartedThread: 0
BackgroundThread: 2
PendingThread: 0
DeadThread: 0
Hosted Runtime: no
                                              PreEmptive                                                Lock
       ID OSID        ThreadOBJ     State   GC     GC Alloc Context                  Domain           Count APT Exception
   0    1 1608 000000000251e580   200a020 Enabled  0000000002983df0:0000000002983fd0 0000000000382df0     1 MTA
   2    2 159c 0000000002521d90      b220 Enabled  0000000000000000:0000000000000000 0000000000382df0     0 MTA (Finalizer)
   3    3  720 00000000025509a0   380b220 Enabled  0000000002984090:0000000002985fd0 0000000000382df0     1 MTA (Threadpool Worker)

So we have three threads here capable of executing managed code—the main application thread (ID 0), the finalizer (ID 2) and a thread-pool thread (ID 3). What are they up to?

0:000> ~*e !CLRStack
OS Thread Id: 0x1608 (0)
Child-SP         RetAddr          Call Site
000000000027e7f0 000007fee146d502 ADeadlockOfSorts.Program.Main(System.String[])
OS Thread Id: 0x159c (2)
Failed to start stack walk: 80004005
OS Thread Id: 0x720 (3)
Child-SP         RetAddr          Call Site
000000001b21f4b0 000007fee0142bbb ADeadlockOfSorts.Program+<>c__DisplayClass3.<Main>b__2(System.Object)
000000001b21f510 000007fee01b7411 System.Threading.ExecutionContext.Run(System.Threading.ExecutionContext, System.Threading.ContextCallback, System.Object)
000000001b21f560 000007fee01b724f System.Threading._ThreadPoolWaitCallback.PerformWaitCallbackInternal(System.Threading._ThreadPoolWaitCallback)
000000001b21f5b0 000007fee146d502 System.Threading._ThreadPoolWaitCallback.PerformWaitCallback(System.Object)

So we have three threads in the output, but the finalizer is not executing any code. However, it’s not obvious from what we see here what threads 0 and 3 are doing. (On a 32-bit machine, BTW, the stack would explicitly contain Monitor.Enter.)

Therefore, we have to resort to a native stack dump using the kb command:

0:000> ~*kb

.  0  Id: 1504.1608 Suspend: 1 Teb: 000007ff`fffde000 Unfrozen
RetAddr           : Args to Child                                                           : Call Site
000007fe`fdad2fa2 : 00000000`00344b30 00000000`00000010 000000d0`006a000a 00000000`003487d0 : ntdll!NtWaitForMultipleObjects+0xa
00000000`77782363 : 00000000`0027e1a8 00000000`0027e1a0 00000000`00000000 00000000`00000208 : KERNELBASE!WaitForMultipleObjectsEx+0xe8
000007fe`e12a65f9 : 00000000`ffffffff 00000000`00000000 00000000`0251e580 00000000`00000001 : KERNEL32!WaitForMultipleObjectsExImplementation+0xb3
000007fe`e12abac5 : ffffffff`00000001 00000000`02551ea0 00000000`0251e580 000007fe`e12be1b4 : mscorwks!WaitForMultipleObjectsEx_SO_TOLERANT+0xc1
000007fe`e12ebe9d : 00000000`00000000 00000000`02551ea0 00000000`00000000 00000000`ffffffff : mscorwks!Thread::DoAppropriateAptStateWait+0x41
000007fe`e12c75c0 : 00000000`0027e810 000007fe`e187b2e8 00000000`02551ea0 00000000`00000000 : mscorwks!Thread::DoAppropriateWaitWorker+0x191
000007fe`e1399afa : ffffffff`fffffffe 00000000`00000001 00000000`00000fc8 00000000`00000000 : mscorwks!Thread::DoAppropriateWait+0x5c
000007fe`e13dc2b1 : 00000000`02551e88 00000000`0027e810 00000000`0251e580 00000000`0251e580 : mscorwks!CLREvent::WaitEx+0xbe
000007fe`e127c006 : 00000000`02551e88 00000000`0027e810 00000000`ffffffff 00000000`00000000 : mscorwks!AwareLock::EnterEpilog+0xc9
000007fe`e180a9c5 : 00000000`00000001 000007fe`e189f200 00000000`0251e580 00000000`0251e580 : mscorwks!AwareLock::Enter+0x72
000007ff`001802d0 : 00000000`000003e8 00000000`00000000 00000000`02983cbc 00000000`0027e7d0 : mscorwks!JIT_MonEnterWorker_Portable+0xf5
000007fe`e146d502 : 00000000`02983bf8 000007fe`e134240a 000007fe`e1215890 00000000`00000000 : 0x7ff`001802d0
000007fe`e1329fd3 : 00000000`00000000 000007fe`e027ec90 00000000`00000000 00000000`00000000 : mscorwks!CallDescrWorker+0x82
000007fe`e133a3af : 00000000`0027ea38 00000000`00000008 00000000`0027ec50 00000000`0027ea38 : mscorwks!CallDescrWorkerWithHandler+0xd3
000007fe`e12adc7f : 00000000`00000000 00000000`00000000 00000000`00000000 00000000`0027eeb0 : mscorwks!MethodDesc::CallDescr+0x24f
000007fe`e1291c74 : 00000000`00000000 00000000`00000000 00000004`0016003a 00000000`00000000 : mscorwks!ClassLoader::RunMain+0x22b
000007fe`e12c9955 : 00000000`0027f4a0 00000000`00000000 00000000`02523068 00000000`00000200 : mscorwks!Assembly::ExecuteMainMethod+0xbc
000007fe`e13ddb07 : 00000000`0036a860 000007fe`fb5e0000 00000000`00000000 000007fe`e1407762 : mscorwks!SystemDomain::ExecuteMainMethod+0x491
000007fe`e129855c : 00000000`00000000 00000000`00000000 00000000`00000000 00000000`0036a860 : mscorwks!ExecuteEXE+0x47
000007fe`fb5f015e : ffffffff`ffffffff 00000000`0251e580 000007fe`fb5f2fd0 000007fe`fb763a18 : mscorwks!CorExeMain+0xac

…snipped the finalizer thread for brevity…

   3  Id: 1504.720 Suspend: 1 Teb: 000007ff`fffd7000 Unfrozen
RetAddr           : Args to Child                                                           : Call Site
000007fe`fdad2fa2 : 00000000`025509a0 00000000`00000000 ffffffff`fffffffe 00000000`00000000 : ntdll!NtWaitForMultipleObjects+0xa
00000000`77782363 : 00000000`1b21ee68 00000000`1b21ee60 00000000`00000000 000007fe`e11fcbe0 : KERNELBASE!WaitForMultipleObjectsEx+0xe8
000007fe`e12a65f9 : 00000000`ffffffff 00000000`00000000 00000000`025509a0 00000000`00000001 : KERNEL32!WaitForMultipleObjectsExImplementation+0xb3
000007fe`e12abac5 : ffffffff`00000001 00000000`02551ee8 00000000`025509a0 000007fe`e12be1b4 : mscorwks!WaitForMultipleObjectsEx_SO_TOLERANT+0xc1
000007fe`e12ebe9d : 00000000`00000000 00000000`02551ee8 00000000`00000000 00000000`ffffffff : mscorwks!Thread::DoAppropriateAptStateWait+0x41
000007fe`e12c75c0 : 00000000`00000000 00000000`00000000 00000000`02551ee8 00000000`00000000 : mscorwks!Thread::DoAppropriateWaitWorker+0x191
000007fe`e1399afa : ffffffff`fffffffe 00000000`00000001 00000000`025509a0 00000000`00000000 : mscorwks!Thread::DoAppropriateWait+0x5c
000007fe`e13dc2b1 : 00000000`02551ed0 00000000`1b21f4d0 00000000`025509a0 00000000`025509a0 : mscorwks!CLREvent::WaitEx+0xbe
000007fe`e127c006 : 00000000`02551ed0 00000000`1b21f4d0 00000000`ffffffff 00000000`00000000 : mscorwks!AwareLock::EnterEpilog+0xc9
000007fe`e180a9c5 : 00000000`00000001 000007fe`e189f200 00000000`025509a0 00000000`025509a0 : mscorwks!AwareLock::Enter+0x72
000007ff`00180535 : 00000000`000003e8 00000000`00000000 00000000`02983fe8 00000000`00000000 : mscorwks!JIT_MonEnterWorker_Portable+0xf5
000007fe`e0142bbb : 00000000`02983ca0 00000000`00000000 00000000`1b21f3f0 00000000`00000000 : 0x7ff`00180535
000007fe`e01b7411 : 00000000`1b21f3e8 00000000`064c784b 00000000`02983d30 00000000`128d1030 : mscorlib_ni+0x2f2bbb
000007fe`e01b724f : 00000000`00000000 00000000`00000000 00000000`02983dd4 00000000`02528020 : mscorlib_ni+0x367411
000007fe`e146d502 : 00000000`00000000 000007fe`e1902760 ffffffff`fffffffe 000007fe`e187b2e8 : mscorlib_ni+0x36724f
000007fe`e1329fd3 : 00000000`00000173 000007fe`e02c3698 00000000`00383148 00000000`00000000 : mscorwks!CallDescrWorker+0x82
000007fe`e1321c3a : 00000000`1b21f870 000007fe`e133d901 ffffffff`fffffffe 00000000`00000001 : mscorwks!CallDescrWorkerWithHandler+0xd3
000007fe`e12dd0bf : 000007fe`dfed7ac8 000007fe`dfe51000 000007fe`e1a8f568 000007fe`e13154ec : mscorwks!DispatchCallDebuggerWrapper+0x3e
000007fe`e13e2f17 : 00000000`1b21f9e0 00000000`025509a0 00000000`025509a0 00000000`025509a0 : mscorwks!DispatchCallNoEH+0x5f
000007fe`e12ac1a0 : 00000000`00000002 00000000`00000000 00000000`00000000 000007fe`e132d1de : mscorwks!QueueUserWorkItemManagedCallback+0x83

This call stack explicitly reveals that both threads are waiting on a lock; we can see the parameters to mscorwks!AwareLock::Enter, and the third parameter on each call stack is the sync block itself that the thread is trying to lock. Note that we don’t see the object instance here, but we certainly do see the sync block—and correlating it with the !syncblk output above is enough to reach a conclusion.

What we have here is thread 0 waiting for sync block 0000000002551e88 which is held by thread 3 which waits for sync block 0000000002551ed0 which is held by thread 0. We have a wait chain with a cycle in it, and therefore a deadlock.

Nitpicker’s corner: Even if we didn’t see the sync block parameter on the stack, we could use .frame /r 9 to see the register values on the AwareLock::Enter frame and deduce the sync block address from the ECX register:

0:000> .frame /r 9
09 00000000`0027e5f0 000007fe`e180a9c5 mscorwks!AwareLock::Enter+0x72
rax=000000000251e580 rbx=0000000002551e88 rcx=000000000251e580
rdx=0000000000000001 rsi=000000000027e6b8 rdi=000000000251e580
rip=000007fee127c006 rsp=000000000027e5f0 rbp=000000000027e810
r8=0000000002551ea0  r9=0000000000000000 r10=0000000000000000
r11=0000000000000202 r12=0000000000000000 r13=0000000000000000
r14=0000000000000000 r15=000007fee146bc60
iopl=0         nv up ei pl zr na po nc
cs=0033  ss=002b  ds=002b  es=002b  fs=0053  gs=002b             efl=00000244
000007fe`e127c006 ebc6            jmp     mscorwks!AwareLock::Enter+0x3a (000007fe`e127bfce)

But if you have a call stack with valid parameters, it’s much better.

Add comment
facebook linkedin twitter email

Leave a Reply

Your email address will not be published.

You may use these HTML tags and attributes: <a href="" title=""> <abbr title=""> <acronym title=""> <b> <blockquote cite=""> <cite> <code> <del datetime=""> <em> <i> <q cite=""> <s> <strike> <strong>


one comment

  1. Steve JohnsonOctober 17, 2009 ב 7:45 PM

    I have one additional suggestion for easier sync block troubleshooting. I have adopted the convention of only locking on strings with meaningful names. This way, the purpose of a given lock is always obvious in the debugger.