Synchronization Objects and Vista's Wait Chain Traversal
Debugging issues which have to do with synchronization objects, such as deadlocks and other types of hangs, has traditionally been a very difficult task. Normally left to consultants, it was a great source of income too.
How does Windows actually keep track of synchronization objects? What does Vista have to do with this (as the title of the post suggests)? If this floats your boat, read on.
The Win32 synchronization objects, as well as their managed counterparts (such as the .NET Monitor, EventWaitHandle and others) are merely a convenient API wrapper around synchronization primitives provided by the OS kernel. Specifically, user mode synchronization is based on kernel mode dispatcher objects, which give us the ability to wait for an object and to be notified when the object becomes signaled.
Note that waiting for an object to become signaled is accomplished through a limited API subset (e.g. WaitForSingleObject, WaitForMultipleObjects etc.), but signaling an object is entirely dependant on the object's semantics. For example, a thread becomes signaled when it terminates; an event becomes signaled when someone calls SetEvent; a file becomes signaled when an overlapped I/O operation completes; and so on.
What does the operating system need to do to maintain the semantics of synchronization objects? It has to do two basic things, which are scattered all across the Windows kernel:
- When a thread attempts to wait on a synchronization object, if the object is already signaled the thread is released immediately. If the object is not signaled, the thread enters a wait state and is therefore removed from the CPU (a context switch is performed to a ready thread selected by the dispatcher, or the idle thread if no other thread is ready to run).
- When a thread signals a synchronization object, any threads waiting for that object are removed from the wait state and enter the ready state. It doesn't mean that these threads are immediately scheduled for execution - it only means that now they are ready to execute. (For nitpickers: some synchronization objects will only wake up a single thread when signaled, such as auto-reset events, a.k.a. synchronization events.)
If you still remain unconvinced that synchronization in the waiting sense cannot be achieved in user mode alone, go ahead and try stepping into the disassembly of a Win32 WaitForSingleObject call. You will find yourself stepping through WaitForSingleObjectEx (from kernel32.dll) and then NtWaitForSingleObject (from ntdll.dll). Finally, you will find yourself looking at the following piece of code (x64 here):
Go ahead and try stepping into the syscall instruction with Visual Studio. Nothing happens, and when the wait is satisfied you will hit the ret instruction immediately following it. The next thing you know, you are unwinding the stack back to the original call to WaitForSingleObject.
The syscall instruction is the gate to the system service dispatcher, which is a kernel mode component in charge of executing system calls. It transitions the CPU from user mode to kernel mode and executes the corresponding system service (in this case, system service number 1). Without a kernel debugger it will be impossible to single-step the system service call.
So what have we learned so far? Synchronization mechanisms aren't implemented in user mode. In fact, in view of the above it would be impossible to implement them in user mode because context switching is implemented in kernel mode. This already means that debugging synchronization objects is going to be more difficult than debugging normal user mode issues, because we do not have the facilities for actually viewing the state of these objects without a kernel debugger.
To understand what's going on under the covers, we need to familiarize ourselves with several data structures used by the operating system to keep track of which thread is waiting for which synchronization object. Some of these data structures are actually viewable in the Windows header files, which are distributed with the WDK (formerly DDK).
The first data structure is the DISPATCHER_HEADER structure, which is a part of the kernel data structure for any synchronization object:
This data structure contains the synchronization object's type, its signal state (whether it's signaled or non-signaled), and a LIST_ENTRY structure. This LIST_ENTRY is just the head of a linked list of other data structures, which represent the threads waiting for the synchronization object.
What are these data structures representing the threads waiting for the object? They are called KWAIT_BLOCKs:
What's notable here? First of all, there's a LIST_ENTRY again which contains forward and backward pointers in the linked list of wait blocks. Next, there's a pointer to a KTHREAD data structure, which represents a thread. Following it is a PVOID (void*) to the actual synchronization object.
To understand the subsequent three fields, a quick API reminder is in place. If you recall, it is possible to use the WaitForMultipleObjects Win32 function to wait for several objects at once. Furthermore, it's also possible to wait for any of the objects to become signaled (in which case your thread will return to the ready state once any single one of the objects becomes signaled). To make book-keeping easier, the structure contains a pointer to the next wait block (NextWaitBlock - representing the next object the thread is waiting for), a wait key (WaitKey - representing the index of this wait block in the array of handles passed to WaitForMultipleObjects), and a wait type (WaitType - whether to wait for all objects to become signaled or just for any one of them).
Finally, out of sheer curiosity we might take a look at the KTHREAD structure. It contains an awful lot of fields, so I snipped most of them:
Note the DISPATCHER_HEADER sitting there innocently as the first field of the KTHREAD data structure. Since a thread can also be synchronized upon, it contains a DISPATCHER_HEADER like any other synchronization object. Next we find an array of KWAIT_BLOCKs which are arranged in a union with some other data stored there as long as the wait blocks aren't needed. The reason for these four wait blocks sitting in the KTHREAD structure is merely an optimization: it's fairly common for a thread to wait on a synchronization object, so allocating and releasing the memory for the KWAIT_BLOCK structure every time becomes tedious and expensive. Therefore, there are four pre-allocated wait blocks waiting for the thread to use them (pun intended).
Next is a depiction of the relationships between these various data structures in a graphical form (following the above structure definitions in the textual form is brave nevertheless). Assume we have two threads, A and B, and two synchronization objects, X and Y. Thread A is waiting for X and Y; thread B is waiting for Y only. Here's what it looks like:
Aha, now you see why debugging synchronization objects gets out of hand so easily. Look at the enormous amount of data that has to be maintained by the system and traversed by us debuggers to understand what's going on. And we're talking about two objects and two threads; that's several orders of magnitude less than what's going on on a typical system.
Still, once we know what's under the covers we are able to appreciate how debugging this kind of entanglement would be like. Take the KWAIT_BLOCKs of all your threads; traverse the wait relationships to discover which objects your threads are waiting for; and so on.
This highly tedious task has finally been given to us on a silver platter as part of Windows Vista and Windows Server 2008 (NT 6.0). We now have a user-mode API, called Wait Chain Traversal (WCT), which does the error-prone and difficult job of enumerating through these synchronization objects and reports to us the wait chain that is formed by our threads and objects.
What is a wait chain then? It is an alternating directed graph of threads and synchronization objects. An edge from a thread to an object in the graph means the thread is waiting for that object; an edge from an object to a thread means the thread currently owns that object. (Nitpickers again: not every synchronization relationship can be expressed using these abstractions. And indeed, WCT doesn't support every kind of synchronization object.)
For example, if you have thread A waiting for a mutex X owned by a thread B waiting for a mutex Y, you have the following wait chain:
But what happens if mutex Y is owned by thread A? It would form a cycle in the chain:
And that cycle means one thing and one thing only: DEADLOCK. Has it ever been easier to diagnose one?
On the to API. John Robbins has written a great article in his Bugslayer column in the July 2007 MSDN Magazine, so I won't be repeating everything time and again. The basics are:
There is more to be talked about, such as the asynchronous callbacks WCT provides, COM-related synchronization callbacks, and other advanced diagnostic scenarios, but I will leave exploring them to the reader. The MSDN reference on WCT is a great place to start.
Have I whetted your appetite for finally taking the next step to Vista and Server 2008? . . .