February 2008 - Posts - All Your Base Are Belong To Us

All Your Base Are Belong To Us

Mostly .NET internals and other kinds of gory details


Screencast: What's New in Visual Studio 2008?

A few days ago I recorded a 14-minute pilot screencast (in Hebrew) titled "What's New in Visual Studio 2008?", aiming to show developers and decision makers what kind of new functionality to expect in the recently released Visual Studio.  In other words, what's in the box?

My expression on the left (which BTW is a static photo, unfortunately) is very grim, but don't worry, I get more vivid during the screencast.

image

Among the things I demonstrate:

  • Multi-targeting support
  • Multi-threaded debugging improvements
  • Code metrics
  • First steps in LINQ

Feel free to watch and, as always, let me know what you think.  Thanks to Sela and you-niversity for making this possible.

TechEd 2008 (Israel): Next Generation Production Debugging

TechEd Israel 2008 is going to take place on April 6-8, in Eilat (as usual).  By the way, if you haven't registered yet, there might still be some room so hurry up!

At the upcoming conference, I will be speaking about production debugging in a session cleverly titled "Next Generation Production Debugging".  I am just about to finalize the list of topics and demos that I will be talking about.

To help you decide, some talented people at Sela and you-niversity helped me film a short promo video (about 5 minutes long) introducing myself and the TechEd session.  Feel free to watch it (Hebrew warning...) and let me know what you think.

image

I will probably show you some tools and techniques that you haven't seen before; demonstrate a theoretical approach to practical problems and show why the theory matters; and last but not least, have some fun with my favorite debugger in front of all of you!  :-)

If you liked my DevAcademy session, you are probably going to like this one as well.  If you haven't been to my DevAcademy session, but are interested in understanding what's going on with your code at run time, and in learning about new tools and concepts which reveal bugs and facilitate finding problems, then I expect to see you at my session.

Interception and Attributes: A Design-By-Contract Sample

The Design-By-Contract programming paradigm requires that classes and methods declare pre- and post-conditions which must hold when entering and leaving the class code.  For example, the following Eiffel snippet is a counter class which can be incremented, decremented and reset to 0.  Note the ensure, require and invariant clauses sprinkled across the definition:

class TINY_COUNTER
feature
    item: INTEGER
feature
    increment is
        do
            item := item + 1
        ensure
            item = old item + 1
        end
    decrement is
        require
            item > 0
        do
            item := item - 1
        ensure
            item = old item - 1
        end
    reset is
        do
            item := 0
        ensure
            item = 0
        end
invariant
    item >= 0
end

This kind of framework can be made directly available in C# through the use of interception and attributes.  What we need in our implementation is a set of mechanisms for specifying pre- and post-conditions on methods, and an engine to generate the necessary code and call it.  There are at least three approaches:

  1. Explicit approach - call explicit methods when entering a method and when leaving a method to ensure that the invariants hold (for class-level invariants, this might be a little more difficult);
  2. Implicit approach #1 - use IL weaving or any other post-compilation mechanism (e.g. PostSharp) to plant the pre- and post-conditions into a method which requires it.  The method might ask for pre- and post-conditions through the use of attributes;
  3. Implicit approach #2 - use code generation to create a proxy which will check the pre- and post-conditions prior to dispatching the call to the actual target (this is obviously limited if only code generation is used, e.g. only interfaces can be proxied).
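
To make the first (explicit) approach concrete, here is a minimal sketch of the counter with its contracts checked by hand.  It is written in C++ rather than C#, and the ContractViolation type and the check helper are names invented for this illustration, not part of the sample discussed in this post:

```cpp
#include <cassert>
#include <stdexcept>

// Hypothetical exception type for this sketch; not part of the original sample.
struct ContractViolation : std::logic_error {
    using std::logic_error::logic_error;
};

// Throws if a contract clause does not hold.
inline void check(bool condition, const char* clause) {
    if (!condition) throw ContractViolation(clause);
}

// C++ counterpart of the Eiffel counter, with every contract checked explicitly.
class TinyCounter {
public:
    int item() const { return item_; }

    void increment() {
        const int old_item = item_;              // capture "old item"
        item_ = item_ + 1;
        check(item_ == old_item + 1, "ensure: item = old item + 1");
        check_invariant();
    }

    void decrement() {
        check(item_ > 0, "require: item > 0");   // pre-condition
        const int old_item = item_;
        item_ = item_ - 1;
        check(item_ == old_item - 1, "ensure: item = old item - 1");
        check_invariant();
    }

    void reset() {
        item_ = 0;
        check(item_ == 0, "ensure: item = 0");
        check_invariant();
    }

private:
    // Class invariant, re-checked at the end of every public method.
    void check_invariant() const { check(item_ >= 0, "invariant: item >= 0"); }

    int item_ = 0;
};
```

Calling decrement on a fresh counter throws before the state is touched, because the require clause fails.  The obvious downside is visible too: every method has to repeat the bookkeeping, which is exactly what the implicit approaches try to avoid.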

I've chosen to demonstrate the third approach.  Assume we are looking at an implementation of a string-replace algorithm, which takes a string, an old substring and a new substring to replace the old one with.  Here's the way to express the expectations we have from our interface:

public interface IStringReplace
{
    [ParameterNotNull("orig")]
    [ParameterNotNull("what")]
    [ParameterNotNull("withWhat")]
    [ParameterConstraint("what.Length <= orig.Length")]
    [ReturnValueConstraint("__ReturnValue.IndexOf(what) == -1")]
    [ReturnValueNotNull]
    string Replace(string orig, string what, string withWhat);
}

Note how fluently we can express our requirements!  I'm saying that the three parameters can't be null, the return value can't be null, the length of the string to replace can't exceed the length of the original string, and after the replacement the string to be replaced shouldn't be found in the return value.  These checks might have just saved me quite some parameter verification, and saved my callers lots and lots of checks when calling my method and not knowing what to expect.  The power of this approach is addictive; it is almost masochistically expressive.

In this sample, I implemented two "generic" conditions - the ParameterConstraint pre-condition and the ReturnValueConstraint post-condition.  Additionally, I implemented two specific conditions - the ParameterNotNull and ReturnValueNotNull conditions.  (There's lots of more interesting work to be done here, such as opting in and out of the checks, combining checks for multiple methods, specifying class invariants which must hold true for every method, customizing the behavior in case of failure, and many other interesting aspects.  By the way, with slight modifications this framework can be used to automatically generate unit tests as well!)  For now, let's take a look at the interface implementation and usage example:

public class StringReplaceImpl : IStringReplace
{
    public string Replace(string orig, string what, string withWhat)
    {
        return orig.Replace(what, withWhat);
    }
}

class Program
{
    static void Main(string[] args)
    {
        IStringReplace replacer = 
            DbcActivator.CreateInstance<StringReplaceImpl, IStringReplace>();
        string result = replacer.Replace("Blah", "B", "A");
    }
}

What happens if the method implementation is incorrect, or if the parameters passed by the caller do not meet the pre-conditions?  An exception is thrown by the DBC interception layer outlined below, specifying the cause of failure:

image

image

So who is this DbcActivator?  It's the initial interception point where all the magic happens.  By requiring the pre- and post-conditions to be specified on the interface and not the actual implementation, we can intercept the calls and perform pre- and post-condition verification in a proxy which wraps the actual invocation target.  The general architecture here is:

image 

Briefly speaking, the client asks the DbcActivator to return a proxy instance which implements the same interface the client expects from the target class.  The proxy wraps the target class and delegates all calls to it, but checks pre-conditions before making the call and post-conditions after making the call but before returning to the client.  Since the proxy code is only generated once, and the pre- and post-condition checks are code-generated and not determined at runtime, this is almost as efficient as we can get.  (IL weaving has more potential, but is far more complex without a big framework in place.)
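
The shape of such a proxy can be sketched by hand.  The following is a C++ illustration only: the sample in this post generates its proxy from the attributes, whereas here the checks are simply written out, and the DbcStringReplaceProxy class and the CreateInstance function template are hypothetical stand-ins.  Null checks have no direct equivalent for std::string, so the sketch assumes a non-empty "what" instead:

```cpp
#include <cassert>
#include <memory>
#include <stdexcept>
#include <string>
#include <utility>

// C++ stand-in for the IStringReplace interface.
struct IStringReplace {
    virtual ~IStringReplace() = default;
    virtual std::string Replace(const std::string& orig,
                                const std::string& what,
                                const std::string& withWhat) = 0;
};

// The target: replaces every occurrence of `what` in a single pass.
// It relies on the proxy to reject an empty `what`.
struct StringReplaceImpl : IStringReplace {
    std::string Replace(const std::string& orig, const std::string& what,
                        const std::string& withWhat) override {
        std::string result = orig;
        std::size_t pos = 0;
        while ((pos = result.find(what, pos)) != std::string::npos) {
            result.replace(pos, what.size(), withWhat);
            pos += withWhat.size();   // continue after the inserted text
        }
        return result;
    }
};

// Hand-written proxy: same interface, checks around the delegated call.
class DbcStringReplaceProxy : public IStringReplace {
public:
    explicit DbcStringReplaceProxy(std::unique_ptr<IStringReplace> target)
        : target_(std::move(target)) {}

    std::string Replace(const std::string& orig, const std::string& what,
                        const std::string& withWhat) override {
        // Pre-conditions (the non-empty check is an assumption of this sketch).
        if (what.empty())
            throw std::invalid_argument("pre-condition failed: what is empty");
        if (what.size() > orig.size())
            throw std::invalid_argument("pre-condition failed: what.Length <= orig.Length");

        std::string result = target_->Replace(orig, what, withWhat);

        // Post-condition: the replaced text must not appear in the result.
        if (result.find(what) != std::string::npos)
            throw std::logic_error("post-condition failed: result still contains what");
        return result;
    }

private:
    std::unique_ptr<IStringReplace> target_;
};

// Hypothetical counterpart of the activator: hands out the proxy, not the target.
template <typename Impl>
std::unique_ptr<IStringReplace> CreateInstance() {
    return std::unique_ptr<IStringReplace>(
        new DbcStringReplaceProxy(std::unique_ptr<IStringReplace>(new Impl())));
}
```

With this in place, CreateInstance<StringReplaceImpl>()->Replace("Blah", "B", "A") goes through both sets of checks before the caller sees the result, while replacing "X" with "XX" trips the post-condition even though the target did what it was told.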

The implementation is not particularly efficient or well-designed, but if you're looking for a sample of code generation, interception and runtime services provided through the use of attributes, this is a great place to start.  Here's the sample as a single C# file.

.NET to C++ Bridge

Most people have encountered the need for interoperability between managed and unmanaged code.  There are plenty of patterns and tutorials which explain every detail of writing managed code which can call into unmanaged code.  The techniques we can use, from most common to least explored, are:

  1. Straightforward P/Invoke (static extern, [DllImport], sprinkle a [MarshalAs] or two and we're set - there are even tools to help);
  2. COM interop (import the type library and you're good to go);
  3. C++/CLI wrapper class;
  4. Calling an unmanaged function directly (CALLI instruction with Reflection.Emit).

The opposite way around, however, is something many people struggle with because it's not as sexy and common.  How can we call new managed code from our old legacy native code?  Well, there are again several ways to do it:

  1. Reverse P/Invoke (has to start from a .NET delegate passed as a callback, so this is only good if the "action" begins in your .NET code);
  2. COM interop (every .NET class can also be a COM object, with or without explicit interfaces);
  3. C++/CLI wrapper classes.

What I want to focus on is the third approach - generating C++/CLI wrapper classes to allow pure unmanaged code interaction with our managed code.  This has to be done in a tiered approach, because there's no direct way for native C++ code to call into managed code.  What we need to go through is the following mechanical process:

  1. Open a C++/CLI class library project and change the settings so it generates an import lib (under Linker->Output);
  2. Write a C++ class (not a .NET reference/value type, i.e. not a ref/value class) which wraps the methods of the original .NET class.  This means that this C++ class has to be compiled to IL, contain a reference to the .NET object (using the gcroot<> template) and delegate all calls to the .NET object.
  3. Write a native C++ class (using #pragma unmanaged, so it's not compiled to IL) which wraps the IL bridge written in step 2 and delegates all calls to it.

image

That last C++ class will also be decorated with __declspec(dllexport), so we can use the resulting class library as a normal native DLL.  Note that marshaling decisions (converting unmanaged types to managed types and vice versa) are made at the IL bridge class, which is aware of both unmanaged and managed code.
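
The layering itself can be mimicked in plain C++ without the CLR.  The sketch below is only an illustration of the delegation chain: ManagedCalculatorStandIn, BridgeCalculator and NativeCalculator are made-up names, and the "managed" tier is an ordinary class standing in for a .NET type that the real bridge would hold through gcroot<>:

```cpp
#include <cassert>

// What the native client sees: only a forward declaration of the bridge,
// so the client never needs the bridge's (managed-aware) definition.
class BridgeCalculator;

class NativeCalculator {
public:
    NativeCalculator();
    ~NativeCalculator();
    // The wrapper owns a raw pointer, so copying is disabled in this sketch.
    NativeCalculator(const NativeCalculator&) = delete;
    NativeCalculator& operator=(const NativeCalculator&) = delete;

    int Add(int first, int second);

private:
    BridgeCalculator* bridge_;
};

// Everything below would live in the wrapper library's source file.

// Stand-in for the managed class.
class ManagedCalculatorStandIn {
public:
    int Add(int first, int second) { return first + second; }
};

// Stand-in for the IL bridge: owns the "managed" object and is the place
// where type conversions between the two worlds would happen.
class BridgeCalculator {
public:
    int Add(int first, int second) { return impl_.Add(first, second); }

private:
    ManagedCalculatorStandIn impl_;
};

NativeCalculator::NativeCalculator() : bridge_(new BridgeCalculator) {}
NativeCalculator::~NativeCalculator() { delete bridge_; }
int NativeCalculator::Add(int first, int second) {
    return bridge_->Add(first, second);   // pure delegation
}
```

A client only ever writes NativeCalculator calc; calc.Add(2, 3); and never learns what sits behind the pointer.  Disabling copies is a choice made for this sketch, because two copies of the wrapper would otherwise delete the same bridge twice.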

This flow seems very complicated and might also appear to have a negative effect on performance.  However, realistically, even though we have a complex flow, most of the path is simple delegation and therefore a good candidate for inlining.  For example, if we have a C# class A, IL bridge B, native C++ class C and a native client D, we're likely to have two extra function calls only: D->C (because of DLL boundaries, but if D lives in the same DLL as C, inlining is likely again; or if PGO is employed, inlining is an option), and then C->B performs the unmanaged to managed transition (which is most of the cost anyway).  After that, the code in the IL bridge is likely to be inlined with the original .NET class if the method on that class is small.

These steps are highly mechanical and annoyingly similar across various classes, so I wanted to see if I could devise an automatic tool for generating these wrappers.  It seems fairly simple once you have a good code generation framework in place; without one, I was able to bake some sample-quality code which takes a managed type and wraps it with an IL bridge and a native C++ class.  It lacks in many areas (such as support for recursively converting structures and other non-primitive types), but I still decided to attach it because I am not at all sure if I will have the time to wrap it up.  So if anyone feels like picking it up from here, or contributing parts of the work, it would be great.

Without further ado, here's a piece of sample output from the tool.  With the following class in place:

public class Calculator
{
    public int Add(int first, int second)
    {
        return first + second;
    }
    public string FormatAsString(float i)
    {
        return i.ToString();
    }
}

Here's the IL bridge generated for this class:

#pragma once
#pragma managed

#include <vcclr.h>

class ILBridge_CppCliWrapper_Calculator {
private:
    //Aggregating the managed class
    gcroot<CppCliWrapper::Calculator^> __Impl;
public:
    ILBridge_CppCliWrapper_Calculator() {
        __Impl = gcnew CppCliWrapper::Calculator;
    }
    int Add(int first, int second) {
        System::Int32 __Param_first = first;
        System::Int32 __Param_second = second;
        System::Int32 __ReturnVal = __Impl->Add(__Param_first, __Param_second);
        return __ReturnVal;
    }
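    wchar_t* FormatAsString(float i) {
        System::Single __Param_i = i;
        //Managed strings are reference types, hence the ^ handle
        System::String^ __ReturnVal = __Impl->FormatAsString(__Param_i);
        //marshal_to<> is a helper that is not shown in this listing
        wchar_t* __MarshaledReturnVal = marshal_to<wchar_t*>(__ReturnVal);
        return __MarshaledReturnVal;
    }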
};

And here's the native exported header and source for the class.  Note that the exported header is callable by any C++ client - that C++ client doesn't have to be compiled with /CLR or even know what .NET is.

//This is the .h file
#pragma once
#pragma unmanaged

#ifdef THISDLL_EXPORTS
#define THISDLL_API __declspec(dllexport)
#else
#define THISDLL_API __declspec(dllimport)
#endif

//Forward declaration for the bridge
class ILBridge_CppCliWrapper_Calculator;

class THISDLL_API NativeExport_CppCliWrapper_Calculator {
private:
    //Aggregating the bridge
    ILBridge_CppCliWrapper_Calculator* __Impl;
public:
    NativeExport_CppCliWrapper_Calculator();
    ~NativeExport_CppCliWrapper_Calculator();
    int Add(int first, int second);
    wchar_t* FormatAsString(float i);
};

//This is the .cpp file
#pragma managed
#include "ILBridge_CppCliWrapper_Calculator.h"
#pragma unmanaged
#include "NativeExport_CppCliWrapper_Calculator.h"

NativeExport_CppCliWrapper_Calculator::NativeExport_CppCliWrapper_Calculator() {
    __Impl = new ILBridge_CppCliWrapper_Calculator;
}
NativeExport_CppCliWrapper_Calculator::~NativeExport_CppCliWrapper_Calculator()
{
    delete __Impl;
}
int NativeExport_CppCliWrapper_Calculator::Add(int first, int second) {
    int __ReturnVal = __Impl->Add(first, second);
    return __ReturnVal;
}
wchar_t* NativeExport_CppCliWrapper_Calculator::FormatAsString(float i) {
    wchar_t* __ReturnVal = __Impl->FormatAsString(i);
    return __ReturnVal;
}
 
The very preliminary sample code used to generate these classes can be downloaded from here as a Visual Studio 2005 solution.  If you play with it please let me know.

Synchronization Objects and Vista's Wait Chain Traversal

Debugging issues which have to do with synchronization objects, such as deadlocks and other types of hangs, has traditionally been a very difficult task.  Normally left to consultants, it was a great source of income too.

How does Windows actually keep track of synchronization objects?  What does Vista have to do with this (as the title of the post suggests)?  If this floats your boat, read on.

The Win32 synchronization objects, as well as their managed counterparts (such as the .NET Monitor, EventWaitHandle and others) are merely a convenient API wrapper around synchronization primitives provided by the OS kernel.  Specifically, user mode synchronization is based on kernel mode dispatcher objects, which give us the ability to wait for an object and to be notified when the object becomes signaled.

Note that waiting for an object to become signaled is accomplished through a limited API subset (e.g. WaitForSingleObject, WaitForMultipleObjects etc.), but signaling an object is entirely dependent on the object's semantics.  For example, a thread becomes signaled when it terminates; an event becomes signaled when someone calls SetEvent; a file becomes signaled when an overlapped I/O operation completes; and so on.

What does the operating system need to do to maintain the semantics of synchronization objects?  It has to do two basic things, which are scattered all across the Windows kernel:

  1. When a thread attempts to wait on a synchronization object, if the object is already signaled the thread is released immediately.  If the object is not signaled, the thread enters a wait state and is therefore removed from the CPU (a context switch is performed to a ready thread selected by the dispatcher, or the idle thread if no other thread is ready to run).
  2. When a thread signals a synchronization object, any threads waiting for that object are removed from the wait state and enter the ready state.  It doesn't mean that these threads are immediately scheduled for execution - it only means that now they are ready to execute.  (For nitpickers: some synchronization objects will only wake up a single thread when signaled, such as auto-reset events, a.k.a. synchronization events.)
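
The two rules above can be captured in a toy model.  This is a single-threaded simulation of the bookkeeping only, with no real scheduling; ToyThread and ToyEvent are names invented for the sketch and do not correspond to kernel types:

```cpp
#include <cassert>
#include <cstddef>
#include <deque>

enum class ThreadState { Running, Waiting, Ready };

struct ToyThread {
    ThreadState state = ThreadState::Running;
};

// Toy dispatcher object: a signal state plus the threads waiting on it.
class ToyEvent {
public:
    explicit ToyEvent(bool autoReset) : autoReset_(autoReset) {}

    // Rule 1: a wait on a signaled object is satisfied immediately;
    // otherwise the thread enters the wait state and is queued.
    void Wait(ToyThread& thread) {
        if (signaled_) {
            if (autoReset_) signaled_ = false;   // the wait consumes the signal
            return;
        }
        thread.state = ThreadState::Waiting;
        waiters_.push_back(&thread);
    }

    // Rule 2: signaling moves waiters to the ready state
    // (ready to run, not necessarily running).
    void Signal() {
        if (waiters_.empty()) {
            signaled_ = true;                    // remember it for a later wait
            return;
        }
        if (autoReset_) {
            // Auto-reset: wake a single waiter, object stays non-signaled.
            waiters_.front()->state = ThreadState::Ready;
            waiters_.pop_front();
            return;
        }
        // Manual-reset: wake everybody and stay signaled.
        for (ToyThread* waiter : waiters_) waiter->state = ThreadState::Ready;
        waiters_.clear();
        signaled_ = true;
    }

    std::size_t WaiterCount() const { return waiters_.size(); }
    bool IsSignaled() const { return signaled_; }

private:
    bool autoReset_;
    bool signaled_ = false;
    std::deque<ToyThread*> waiters_;
};
```

Two threads waiting on a manual-reset ToyEvent both become Ready after one Signal call, whereas on an auto-reset one only the first waiter does and the second stays in the Waiting state.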

If you still remain unconvinced that synchronization in the waiting sense cannot be achieved in user mode alone, go ahead and try stepping into the disassembly of a Win32 WaitForSingleObject call.  You will find yourself stepping through WaitForSingleObjectEx (from kernel32.dll) and then NtWaitForSingleObject (from ntdll.dll).  Finally, you will find yourself looking at the following piece of code (x64 here):

image

Go ahead and try stepping into the syscall instruction with Visual Studio.  Nothing happens, and when the wait is satisfied you will hit the ret instruction immediately following it.  The next thing you know, you are unwinding the stack back to the original call to WaitForSingleObject.

The syscall instruction is the gate to the system service dispatcher, which is a kernel mode component in charge of executing system calls.  It transitions the CPU from user mode to kernel mode and executes the corresponding system service (in this case, system service number 1).  Without a kernel debugger it will be impossible to single-step the system service call.

So what have we learned so far?  Synchronization mechanisms aren't implemented in user mode.  In fact, in view of the above it would be impossible to implement them in user mode because context switching is implemented in kernel mode.  This already means that debugging synchronization objects is going to be more difficult than debugging normal user mode issues, because we do not have the facilities for actually viewing the state of these objects without a kernel debugger.

To understand what's going on under the covers, we need to familiarize ourselves with several data structures used by the operating system to keep track of which thread is waiting for which synchronization object.  Some of these data structures are actually viewable in the Windows header files, which are distributed with the WDK (formerly DDK).

The first data structure is the DISPATCHER_HEADER structure, which is a part of the kernel data structure for any synchronization object:

image 

This data structure contains the synchronization object's type, its signal state (whether it's signaled or non-signaled), and a LIST_ENTRY structure.  This LIST_ENTRY is just the head of a linked list of other data structures, which represent the threads waiting for the synchronization object.

What are these data structures representing the threads waiting for the object?  They are called KWAIT_BLOCKs:

image

What's notable here?  First of all, there's a LIST_ENTRY again which contains forward and backward pointers in the linked list of wait blocks.  Next, there's a pointer to a KTHREAD data structure, which represents a thread.  Following it is a PVOID (void*) to the actual synchronization object.

To understand the subsequent three fields, a quick API reminder is in place.  If you recall, it is possible to use the WaitForMultipleObjects Win32 function to wait for several objects at once.  Furthermore, it's also possible to wait for any of the objects to become signaled (in which case your thread will return to the ready state once any single one of the objects becomes signaled).  To make book-keeping easier, the structure contains a pointer to the next wait block (NextWaitBlock - representing the next object the thread is waiting for), a wait key (WaitKey - representing the index of this wait block in the array of handles passed to WaitForMultipleObjects), and a wait type (WaitType - whether to wait for all objects to become signaled or just for any one of them).

Finally, out of sheer curiosity we might take a look at the KTHREAD structure.  It contains an awful lot of fields, so I snipped most of them:

image

Note the DISPATCHER_HEADER sitting there innocently as the first field of the KTHREAD data structure.  Since a thread can also be synchronized upon, it contains a DISPATCHER_HEADER like any other synchronization object.  Next we find an array of KWAIT_BLOCKs which are arranged in a union with some other data stored there as long as the wait blocks aren't needed.  The reason for these four wait blocks sitting in the KTHREAD structure is merely an optimization: it's fairly common for a thread to wait on a synchronization object, so allocating and releasing the memory for the KWAIT_BLOCK structure every time becomes tedious and expensive.  Therefore, there are four pre-allocated wait blocks waiting for the thread to use them (pun intended).

Next is a graphical depiction of the relationships between these data structures (although following the structure definitions above in textual form is a brave effort nevertheless).  Assume we have two threads, A and B, and two synchronization objects, X and Y.  Thread A is waiting for X and Y; thread B is waiting for Y only.  Here's what it looks like:

image
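
The same relationships can be written down as a heavily simplified model.  The types below are toy stand-ins, not the real WDK layouts: a std::vector replaces the LIST_ENTRY list, the next-wait-block chain is null-terminated for simplicity, and whether thread A waits for all or for any of its objects is an assumption made by the caller:

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

struct ToyThread;
struct ToyWaitBlock;

// Simplified stand-in for DISPATCHER_HEADER.
struct ToyDispatcherHeader {
    int type = 0;
    bool signaled = false;
    std::vector<ToyWaitBlock*> waitList;   // plays the role of the LIST_ENTRY head
};

enum class WaitType { WaitAll, WaitAny };

// Simplified stand-in for KWAIT_BLOCK: one (thread, object) wait relationship.
struct ToyWaitBlock {
    ToyThread* thread = nullptr;
    ToyDispatcherHeader* object = nullptr;
    ToyWaitBlock* nextWaitBlock = nullptr;  // next object the same thread waits for
    unsigned short waitKey = 0;             // index in the caller's handle array
    WaitType waitType = WaitType::WaitAny;
};

// Simplified stand-in for KTHREAD.
struct ToyThread {
    ToyDispatcherHeader header;             // a thread can itself be waited upon
    ToyWaitBlock waitBlocks[4];             // the pre-allocated wait blocks
    std::size_t usedBlocks = 0;
};

// Registers a wait of `thread` on up to four objects using its built-in blocks.
inline bool WaitFor(ToyThread& thread,
                    const std::vector<ToyDispatcherHeader*>& objects,
                    WaitType type) {
    if (objects.empty() || objects.size() > 4) return false;
    thread.usedBlocks = objects.size();
    for (std::size_t i = 0; i < objects.size(); ++i) {
        ToyWaitBlock& block = thread.waitBlocks[i];
        block.thread = &thread;
        block.object = objects[i];
        block.waitKey = static_cast<unsigned short>(i);
        block.waitType = type;
        block.nextWaitBlock =
            (i + 1 < objects.size()) ? &thread.waitBlocks[i + 1] : nullptr;
        objects[i]->waitList.push_back(&block);   // link into the object's list
    }
    return true;
}

// Walks an object's wait list and reports the threads blocked on it.
inline std::vector<ToyThread*> WaitersOf(const ToyDispatcherHeader& object) {
    std::vector<ToyThread*> threads;
    for (const ToyWaitBlock* block : object.waitList) threads.push_back(block->thread);
    return threads;
}
```

For the scenario above, WaitFor(A, {&X, &Y}, ...) followed by WaitFor(B, {&Y}, ...) leaves X with one waiter and Y with two, and thread A's first wait block points to its second one.  Even in this toy form, answering "who is waiting for what" already means chasing pointers in two directions.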

Aha, now you see why debugging synchronization objects gets out of hand so easily.  Look at the enormous amount of data that has to be maintained by the system and traversed by us debuggers to understand what's going on.  And we're talking about two objects and two threads; that's several orders of magnitude less than what's going on on a typical system.

Still, once we know what's under the covers we are able to appreciate what debugging this kind of entanglement would be like.  Take the KWAIT_BLOCKs of all your threads; traverse the wait relationships to discover which objects your threads are waiting for; and so on.

This highly tedious task has finally been given to us on a silver platter as part of Windows Vista and Windows Server 2008 (NT 6.0).  We now have a user-mode API, called Wait Chain Traversal (WCT), which does the error-prone and difficult job of enumerating through these synchronization objects and reports to us the wait chain that is formed by our threads and objects.

What is a wait chain then?  It is an alternating directed graph of threads and synchronization objects.  An edge from a thread to an object in the graph means the thread is waiting for that object; an edge from an object to a thread means the thread currently owns that object.  (Nitpickers again: not every synchronization relationship can be expressed using these abstractions.  And indeed, WCT doesn't support every kind of synchronization object.)

For example, if you have thread A waiting for a mutex X owned by a thread B waiting for a mutex Y, you have the following wait chain:

image

But what happens if mutex Y is owned by thread A?  It would form a cycle in the chain:

image

And that cycle means one thing and one thing only: DEADLOCK.  Has it ever been easier to diagnose one?
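
Detecting such a cycle is simple once the chain is available.  Here is a minimal sketch over a toy graph, assuming each thread waits for at most one object and each object has at most one owner (as with mutexes); WaitGraph and HasDeadlock are names invented for the illustration and have nothing to do with the WCT API itself:

```cpp
#include <cassert>
#include <map>
#include <set>
#include <string>

// Toy wait chain: edges thread -> object (waits for) and object -> thread (owned by).
struct WaitGraph {
    std::map<std::string, std::string> waitsFor;  // thread -> object it waits for
    std::map<std::string, std::string> ownedBy;   // object -> thread that owns it
};

// Follows thread -> object -> owner -> ... starting at `start`.
// Returns true if the chain comes back to a thread it has already visited.
inline bool HasDeadlock(const WaitGraph& graph, const std::string& start) {
    std::set<std::string> visited;
    std::string current = start;
    while (visited.insert(current).second) {
        auto wait = graph.waitsFor.find(current);
        if (wait == graph.waitsFor.end()) return false;   // thread is not blocked
        auto owner = graph.ownedBy.find(wait->second);
        if (owner == graph.ownedBy.end()) return false;   // object has no owner
        current = owner->second;
    }
    return true;   // reached a thread seen before: the chain is a cycle
}
```

With A waiting for X, X owned by B and B waiting for Y, HasDeadlock reports false; as soon as Y is recorded as owned by A, the same call reports true.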

On to the API.  John Robbins has written a great article in his Bugslayer column in the July 2007 MSDN Magazine, so I won't be repeating everything time and again.  The basics are:

There is more to be talked about, such as the asynchronous callbacks WCT provides, COM-related synchronization callbacks, and other advanced diagnostic scenarios, but I will leave exploring them to the reader.  The MSDN reference on WCT is a great place to start.

Have I whetted your appetite for finally taking the next step to Vista and Server 2008? . . .

Cross-AppDomain Workflow Local Services

I stumbled across an interesting issue today that I thought might be worth sharing.  The design of my application server requires hosting multiple workflow types, potentially of multiple versions (the workflows are exposed as Workflow Services, using the WorkflowServiceHost class).  Since the server is long-running, it is feasible that several versions of the same workflow will be deployed and active on the same server, and several instances of each version will be created.

The only way to provide this behavior cleanly is through hosting each workflow in a separate AppDomain.  In this way, the server can be truly long-running, with workflow types being loaded and unloaded without requiring a restart.  Additionally, some of the workflow's local services are required to be shared across AppDomains, because they hold common state and information the workflows might require.

However, there is an interesting corner case involved when trying to host a workflow in a separate AppDomain and providing a local service that resides in a different AppDomain.  The corner case occurs when you combine the above with a persistence local service.

Without further ado, here's the cross-domain service:

image

And here's the initialization code that takes place in a separate AppDomain:

image

As you see, we have combined the custom CrossDomainCalculatorService (which is marshal-by-ref, so it lives in the creator's AppDomain) with the SqlWorkflowPersistenceService.  The connection string is invalid, but it's irrelevant because we will not get to the point where the persistence service complains about it.

Finally, here's the code to create the HostWorkflowInSeparateAppDomain thunk and call its Initialize method, causing the action to take place:

image

Tying these pieces together and running the resulting code produces an unexpected exception:

 image

Well, SqlWorkflowPersistenceService is not marked as serializable.  So bloody what?  I am not trying to marshal the persistence service across AppDomain boundaries, I'm trying to marshal the CrossDomainCalculatorService which is marshal-by-ref and should work...  Looking at the call stack we discover:

image

From here it's obvious that the exception occurs in the secondary AppDomain, and gets marshaled to the point of invocation in the primary AppDomain.  However, why does Object.Equals produce a SerializationException saying that SqlWorkflowPersistenceService is not marked as serializable?

If you can see it already, good for you; if not, let's take a look at what Reflector has to show us in the implementation of WorkflowServiceBehavior.ApplyDispatchBehavior (I have omitted the irrelevant parts):

image

OK, so if there is a workflow persistence service, then the Workflow runtime has to be stopped if it's started.  After that, the persistence service is removed, wrapped in an instance of SkipUnloadOnFirstIdleWorkflowPersistenceService (which is a private class in the WorkflowServiceBehavior - all it does is prevent workflow unload while the runtime is restarting), and added back to the workflow runtime.  Finally, the workflow runtime is restarted if it was stopped earlier.

Can you see it now?  Read again: "...the persistence service is removed..." - WorkflowRuntime.RemoveService is called.  It all boils down to this:

image

So we're removing from a list.  What's all the fuss about?  Well, List<T>.Contains is next, and what it does is call Equals using the default equality comparer for the type.  In our case (as in most cases), it just calls Object.Equals.  Right, you could have told me that from looking at the call stack a couple of screens above.  What else is new?

Let's think again about that Object.Equals call.  While we are looking for the service to remove, we are essentially checking whether the Object.Equals call returns true when we pass to it the service we want to remove.  Now, what happens when we invoke Object.Equals on our cross-AppDomain custom service and pass to it the SqlWorkflowPersistenceService?

What we have then is a cross-AppDomain call!  And we're trying to marshal the SqlWorkflowPersistenceService in that call.  Bingo.

What can we do to remedy the situation?  Well, it doesn't seem like there is much to be done except for not using a cross-AppDomain custom service or not using a workflow persistence service (because it's the combination of both that creates this corner case).  What I chose to do is to wrap the cross-AppDomain service in a class that will be created locally inside the secondary AppDomain, and will contain a reference to the cross-AppDomain service.  It could be a simple wrapper or a delegating wrapper (if the service implemented an interface), for example:

image

And then adding that wrapper as a workflow service instead of the cross-AppDomain service:

image
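
The mechanism and the fix can be imitated outside of .NET.  The following C++ toy is only an analogy under stated assumptions: there are no AppDomains here, RemoteServiceProxy merely pretends that every call on it has to marshal its arguments, and all type and function names are invented for the sketch:

```cpp
#include <cassert>
#include <stdexcept>
#include <vector>

// Toy model of a service held by the runtime.
struct Service {
    virtual ~Service() = default;
    virtual bool IsSerializable() const { return false; }
    // Default equality is reference equality and runs locally.
    virtual bool Equals(const Service& other) const { return this == &other; }
};

// Stand-in for the persistence service: local and not serializable.
struct PersistenceService : Service {};

// Stand-in for the proxy to a service living in another domain.
// Every call on it, Equals included, has to marshal its argument.
struct RemoteServiceProxy : Service {
    bool Equals(const Service& other) const override {
        if (&other == this) return true;
        if (!other.IsSerializable())
            throw std::runtime_error("argument is not marked as serializable");
        return false;   // toy: a marshaled copy is never the same object
    }
};

// The fix: a wrapper created locally which merely references the remote
// service, so equality checks on it never leave the local domain.
struct LocalWrapper : Service {
    explicit LocalWrapper(RemoteServiceProxy& remote) : remote_(remote) {}
    RemoteServiceProxy& Remote() const { return remote_; }

private:
    RemoteServiceProxy& remote_;
};

// Mimics the list lookup: each stored element is asked whether it equals the target.
inline bool RemoveService(std::vector<Service*>& services, const Service& target) {
    for (auto it = services.begin(); it != services.end(); ++it) {
        if ((*it)->Equals(target)) {
            services.erase(it);
            return true;
        }
    }
    return false;
}
```

If the list holds the remote proxy ahead of the persistence service, RemoveService throws while merely looking for the persistence service, because the proxy's Equals is reached first.  Put the LocalWrapper in the list instead and the very same lookup succeeds.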

Another mystery solved.