April 2008 - Posts - All Your Base Are Belong To Us

All Your Base Are Belong To Us

Mostly .NET internals and other kinds of gory details

April 2008 - Posts

WMI Provider Extensions in .NET 3.5 - Publishing Events and Advanced Topics

The previous posts in this series described various mechanisms of communication from the WMI consumer to the WMI provider, including read-only properties, read-write properties and methods. However, publishing events from a WMI provider is the only scalable way to deliver changing contextual information as it occurs. Having every consumer pull the information on demand is not an option because it doesn’t scale.

In this post, we will use the BaseEvent class or the [InstrumentationClass] attribute to publish events from a WMI provider.

Publishing events from a WMI provider is not supported by the WMI Provider Extensions for .NET 3.5. It is only supported by the .NET 2.0 implementation. Therefore, in order to publish events from your application, the [Instrumented] assembly-level attribute from .NET 2.0 must be placed on the assembly, and another installer class derived from DefaultManagementProjectInstaller (as opposed to DefaultManagementInstaller) must be added to the project. The following code demonstrates the assembly-level attribute and installer class required to support events from a WMI provider.

[assembly: Instrumented(@"root\MyApplication")]

[RunInstaller(true)]
public class OldInstaller :
    DefaultManagementProjectInstaller {}

Note that using the [WmiConfiguration] assembly-level attribute and DefaultManagementInstaller-derived installer (which we used in the previous posts) is not enough to publish events. To properly publish events to the required namespace, the [Instrumented] assembly-level attribute and DefaultManagementProjectInstaller-derived installer are required. It is possible to mix both approaches in the same assembly.

An event that can be published from a WMI provider is represented as a simple managed class. The class contains properties which constitute the event data. The following code demonstrates an event class that can be published whenever the CPU temperature rises above a certain threshold.

public class CpuTemperatureAboveThresholdEvent : BaseEvent
{
    public float ActualTemperature { get; private set; }

    private CpuTemperatureAboveThresholdEvent(
        float actualTemperature)
    {
        ActualTemperature = actualTemperature;
    }

    public static void Publish(float actualTemperature)
    {
        new CpuTemperatureAboveThresholdEvent(
            actualTemperature).Fire();
    }
}

To publish this event, we can use the static Publish method we have just written:

CpuTemperatureAboveThresholdEvent.Publish(85.0f);

An alternative to deriving the event class from the BaseEvent type is using the [InstrumentationClass] attribute and specifying that the instrumentation type for the class is an event. The following code demonstrates how the event can be rewritten using this paradigm:

[InstrumentationClass(InstrumentationType.Event)]
public class CpuTemperatureAboveThresholdEvent
{
    public float ActualTemperature { get; private set; }

    private CpuTemperatureAboveThresholdEvent(
        float actualTemperature)
    {
        ActualTemperature = actualTemperature;
    }

    public static void Publish(float actualTemperature)
    {
        Instrumentation.Fire(
            new CpuTemperatureAboveThresholdEvent(
                actualTemperature));
    }
}

Note that the only changes are that the class doesn’t derive from BaseEvent, and that to publish the class the Instrumentation.Fire method must be called instead of the Fire method inherited from BaseEvent.

Visual Studio’s Server Explorer supports subscribing to events published by a WMI provider. Right-click the Management Events node and select Add Event Query. In the dialog displayed, search for your class name (in our case, CpuTemperatureAboveThresholdEvent). A sub-node appears under the Management Events node; this node is the container for the events consumed by Server Explorer. These events should also appear in the Output window as soon as they are published. In a WinForms application or any other designer surface, it’s possible to drag and drop the event query node to create and configure an instance of the ManagementEventWatcher class, enabling you to consume the event at runtime.

The ManagementEventWatcher class can be used directly without Server Explorer’s mediation. The following code demonstrates subscribing to the CpuTemperatureAboveThresholdEvent described in the previous section and outputting the CPU temperature when the event is published.

ManagementEventWatcher watcher =
    new ManagementEventWatcher(@"root\MyApplication",
        "SELECT * FROM CpuTemperatureAboveThresholdEvent");

watcher.EventArrived += (o, e) =>
    Console.WriteLine(e.NewEvent["ActualTemperature"]);

watcher.Start();
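The subscription shown here stays active until the process exits. As a minimal sketch (not part of the original sample), the watcher can also filter events with a WQL WHERE clause and be stopped and disposed deterministically. The TemperatureWatcher class, its method names and the threshold filter are assumptions made for illustration; only ManagementEventWatcher and the event class come from the text above.

```csharp
using System;
using System.Globalization;
using System.Management;

static class TemperatureWatcher
{
    // Builds the WQL event query. The WHERE filter is an illustrative
    // addition; the original sample selects every event.
    public static string BuildQuery(float threshold)
    {
        return string.Format(CultureInfo.InvariantCulture,
            "SELECT * FROM CpuTemperatureAboveThresholdEvent " +
            "WHERE ActualTemperature > {0}", threshold);
    }

    // Prints matching events until a key is pressed, then unsubscribes.
    public static void WatchUntilKeyPress(float threshold)
    {
        using (ManagementEventWatcher watcher = new ManagementEventWatcher(
            @"root\MyApplication", BuildQuery(threshold)))
        {
            watcher.EventArrived += (o, e) =>
                Console.WriteLine(e.NewEvent["ActualTemperature"]);
            watcher.Start();
            Console.ReadKey();
            watcher.Stop();   // cancels the subscription
        }                     // Dispose releases the watcher's resources
    }
}
```

Calling TemperatureWatcher.WatchUntilKeyPress(90.0f) from a console application would then print only the events whose temperature exceeds 90, assuming the provider is registered in the root\MyApplication namespace.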

It is time to outline some of the advanced topics that are outside the scope of this series.

References between WMI Objects

A reference between WMI objects has to be established using a property decorated with a [ManagementReference] attribute. The attribute’s parameter specifies the type of the referenced object. A standard object reference will not work with WMI without this attribute.

Enumerators

WMI classes can implement a static enumeration method to dynamically discover or create WMI object instances. This can be accomplished by decorating a method with the [ManagementEnumerator] attribute and returning an IEnumerable. This static method can then be called instead of issuing a standard object query; it enables dynamically creating the WMI instances instead of publishing them in advance.
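To make the enumerator idea more concrete, here is a rough sketch of a process-information class that discovers its instances on demand instead of having them published up front. The ProcessSnapshot class and its member names are hypothetical, and the sketch assumes that a static [ManagementBind] method accepts [ManagementName] parameters the same way a binding constructor does.

```csharp
using System;
using System.Collections;
using System.Diagnostics;
using System.Management.Instrumentation;

[ManagementEntity]
public class ProcessSnapshot
{
    private readonly int _id;
    private readonly int _threadCount;

    private ProcessSnapshot(int id, int threadCount)
    {
        _id = id;
        _threadCount = threadCount;
    }

    // Binds a single instance by key when one is requested by Id.
    [ManagementBind]
    public static ProcessSnapshot GetInstance(
        [ManagementName("Id")] int id)
    {
        using (Process p = Process.GetProcessById(id))
        {
            return new ProcessSnapshot(p.Id, p.Threads.Count);
        }
    }

    // Creates the instances lazily, one per running process.
    [ManagementEnumerator]
    public static IEnumerable EnumerateInstances()
    {
        foreach (Process p in Process.GetProcesses())
        {
            using (p)
            {
                yield return new ProcessSnapshot(p.Id, p.Threads.Count);
            }
        }
    }

    [ManagementKey]
    public int Id { get { return _id; } }

    [ManagementProbe]
    public int ThreadCount { get { return _threadCount; } }
}
```

With a class shaped like this, the hosting process would not need to call InstrumentationManager.Publish for every process; the instances are materialized only when the enumerator runs.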

Create and Remove Methods

Instance methods on a WMI class can create a new instance of the WMI class. These methods must have the same signature as the class binding constructor (decorated with [ManagementBind]) and be decorated with the [ManagementCreate] attribute.

Similarly, a method can be denoted as a cleanup method for a WMI class instance. This method must not have any parameters, must return void, and must be decorated with the [ManagementRemove] attribute.

Dynamic Registration

Instead of performing registration with the InstallUtil.exe tool, it is possible to dynamically register and unregister assemblies or types containing WMI provider information. The InstrumentationManager.RegisterAssembly and InstrumentationManager.UnregisterAssembly methods provide dynamic registration and unregistration for all the types in the specified assembly. The InstrumentationManager.RegisterType and InstrumentationManager.UnregisterType methods provide dynamic registration and unregistration for the specified type.
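As a sketch of how dynamic registration might be wrapped, the following helper registers an assembly for the duration of a host's work and unregisters it afterwards, even if the work throws. The ProviderHost class and RunRegistered method are hypothetical names; the sketch also assumes the process has sufficient privileges to register with WMI.

```csharp
using System;
using System.Management.Instrumentation;
using System.Reflection;

static class ProviderHost
{
    // Registers every WMI provider type in the assembly, runs the host's
    // work, and always unregisters again when the work completes or fails.
    public static void RunRegistered(Assembly providerAssembly, Action hostBody)
    {
        InstrumentationManager.RegisterAssembly(providerAssembly);
        try
        {
            hostBody();
        }
        finally
        {
            InstrumentationManager.UnregisterAssembly(providerAssembly);
        }
    }
}
```

A decoupled host could call ProviderHost.RunRegistered(typeof(SomeProvider).Assembly, () => Console.ReadLine()), where SomeProvider stands for any provider class in the assembly, and publish its instances inside the delegate.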

WMI Provider Extensions in .NET 3.5 - Read-Write and Method Provider

In the previous post in this series, we have looked into implementing a rudimentary WMI provider which exposes read-only information.  In this post, we will make our provider more interesting by exposing read-write information.

Read-write information is exposed by defining a read-write property and decorating it with the [ManagementConfiguration] attribute. The [ManagementEntity], [ManagementBind] and [ManagementKey] requirements described in the previous post must still hold.

The following code demonstrates a WMI provider which exposes a read-write property controlling the brightness of the screen. For demonstration purposes, the code to modify screen brightness has been omitted from the sample.

[ManagementEntity]
public class BrightnessController
{
    [ManagementBind]
    public BrightnessController()
    {
        Id = 1;
        BrightnessLevel = 500;
    }

    [ManagementKey]
    public int Id { get; private set; }

    [ManagementConfiguration]
    public int BrightnessLevel { get; set; }
}

Due to limitations in the current WMI Provider Extensions for .NET implementation, a singleton WMI provider cannot expose any updateable properties. Attempting to update a property exposed by a singleton WMI provider will result in an ExecutionEngineException at runtime.

The wmic tool has the capability of writing values to writeable properties of a WMI provider. The following command executed from a command prompt will modify the BrightnessLevel value depicted in the previous section.

wmic /namespace:\\root\MyApplication path BrightnessController.Id=1 set BrightnessLevel=400

Note the use of the .Id=1 syntax to specify which brightness controller instance we are updating. Alternatively, query syntax can also be used:

wmic /namespace:\\root\MyApplication path BrightnessController where Id=1 set BrightnessLevel=400

Modifying updateable information from managed code generated by Visual Studio is as easy as writing a value to a writeable property. After generating the managed class, the following code will update the brightness controller’s brightness level:

foreach (ROOT.MYAPPLICATION.BrightnessController controller in
    ROOT.MYAPPLICATION.BrightnessController.GetInstances("Id = 1"))
{
    controller.BrightnessLevel = 400;
}

Control Methods

In the previous section, we have exposed read-write properties from a WMI provider. Setting a property value is similar to invoking a method; however, similarly to the semantic difference between properties and methods in .NET, there is a difference between properties and methods in the WMI taxonomy.

In this section, we will define a method exposed by our WMI provider by decorating it with the [ManagementTask] attribute. The [ManagementEntity], [ManagementBind] and [ManagementKey] requirements described in the previous post must still hold.

The following code demonstrates a WMI provider which exposes a PrintToScreen method which outputs the specified string to the console. This would clearly be a bad idea for an in-process provider (which has no console), but will work just fine for a decoupled provider hosted in a console application.

[ManagementEntity]
public class Printer
{
    [ManagementBind]
    public Printer([ManagementName("Id")] int id)
    {
        Id = id;
    }

    [ManagementKey]
    public int Id;

    [ManagementTask]
    public void PrintToScreen(string s)
    {
        Console.WriteLine(s);
    }
}

Due to limitations in the current WMI Provider Extensions for .NET implementation, a singleton WMI provider cannot expose any methods. Attempting to call a method exposed by a singleton WMI provider will result in an ExecutionEngineException at runtime.

The wmic tool has the capability of invoking methods exposed by a WMI provider. The following command executed from a command prompt will invoke the PrintToScreen method depicted in the previous section.

wmic /namespace:\\root\MyApplication path Printer.Id=1 call PrintToScreen Hello!

Invoking provider methods from managed code generated by Visual Studio is as easy as calling a method on an object. After generating the managed class, the following code will invoke the PrintToScreen method:

foreach (ROOT.MYAPPLICATION.Printer printer in
    ROOT.MYAPPLICATION.Printer.GetInstances("Id = 2"))
{
    printer.PrintToScreen("Hello!");
}

In the next (final) post in the series, we will look into publishing events from a WMI provider and briefly mention some advanced mechanisms.

WMI Provider Extensions in .NET 3.5 - Introduction and Read-Only Provider

Windows Management Instrumentation (WMI) is a command-and-control (C&C) infrastructure that is integrated within Windows. It provides three primary capabilities:

  • Exposing state information regarding a configurable entity
  • Invoking control methods on a configurable entity
  • Publishing events from a configurable entity

These facilities are a complete instrumentation solution for any Windows application, and multiple system components expose information through the use of WMI providers. This information can be consumed from a multitude of languages and technologies by WMI consumers, using a standard query language (WQL).

Managed code support for WMI providers has been extended in .NET 3.5, enabling every capability a native WMI provider can support. This functionality resides in the System.Management and System.Management.Instrumentation assemblies, and is generally referred to as the WMI Provider Extensions for .NET.

The primary advantage of using WMI over the multitude of communication technologies that abound is that WMI is a standardized C&C mechanism which can be consumed by numerous existing C&C frameworks. Another advantage is that most Windows components expose C&C information using WMI, and it is preferable that a single C&C framework is used instead of reinventing a C&C framework for each individual component. This makes a single C&C tool suitable for a variety of configurable and controllable entities.

There is a significant number of resources demonstrating what kind of tasks can be accomplished using WMI: WMI Tasks for Scripts and Applications, WMI and .NET in the MSDN Magazine, WMI Code Creator Spotlight . . .  However, there is no single resource outlining the exact details of creating every kind of provider supported in the WMI Provider Extensions for managed code.

This series of posts provides guidance and walkthrough steps for implementing a WMI provider in managed code and consuming the information exposed by that provider from managed code.

Hosting Model

WMI providers can be hosted in a separate application, or within the WMI host service. Separately hosted providers are also called decoupled providers. Providers hosted within the WMI host service are also called in-process providers.

Decoupled providers expose their information only as long as the application in which they are hosted is running. In-process providers expose their information as long as the WMI infrastructure on the machine is functioning properly.

Basic Setup Steps

An assembly that contains a WMI provider must be decorated with the [WmiConfiguration] attribute, specifying the WMI namespace for the provider and the hosting model the provider chooses to use.

The following code demonstrates setting up an assembly that exposes a WMI provider in the “root\MyApplication” namespace, with a decoupled hosting model:

[assembly: WmiConfiguration(@"root\MyApplication",
    HostingModel = ManagementHostingModel.Decoupled)]

The following code demonstrates setting up an assembly that exposes a WMI provider in the “root\MyApplication” namespace, with an in-process hosting model requiring the provider to run under the NetworkService account:

[assembly: WmiConfiguration(@"root\MyApplication",
    HostingModel = ManagementHostingModel.NetworkService)]

To inform the WMI infrastructure that the assembly contains a WMI provider, the assembly must contain an installer class that can be run by the InstallUtil.exe tool. This installer class must derive from the DefaultManagementInstaller class and be decorated with the [RunInstaller] attribute. It does not require any additional functionality.

The following code demonstrates the installer class that must be present in the WMI provider’s assembly:

[RunInstaller(true)]
public class MyApplicationManagementInstaller :
    DefaultManagementInstaller { }

With these setup steps in place, all that is left is to implement the actual WMI provider (described in the subsequent sections). Once the assembly is complete, it must be registered using the InstallUtil.exe tool. From a Visual Studio command prompt, the following command registers the assembly with the WMI infrastructure:

InstallUtil MyAssembly.dll

To unregister the assembly, the following command can be executed from a Visual Studio command prompt:

InstallUtil /u MyAssembly.dll

Read-Only Information

The most rudimentary kind of WMI provider is a WMI provider that exposes read-only information regarding a configurable entity.

The managed model for this kind of WMI provider consists of implementing a class that exposes the information as read-only properties. The class itself must be decorated with the [ManagementEntity] attribute, and the properties must be decorated with the [ManagementProbe] attribute.

When applying the [ManagementEntity] attribute, you can choose whether the WMI class you are exposing will be a singleton entity, or whether it can have multiple instances. If your WMI class can have multiple instances, one or more of the read-only properties that you provide must be decorated with the [ManagementKey] attribute. These properties will allow WMI consumers to determine a consistent identity for the instances of your class that they are querying.

A multi-instance WMI provider which does not expose at least one property decorated with the [ManagementKey] attribute will not function properly at runtime. Any attempt to access the provider will result in an ExecutionEngineException in the hosting process. The exception details can be examined in the Application Event Log.

A singleton WMI provider which attempts to expose a property decorated with the [ManagementKey] attribute will fail to register during the installation phase.

Finally, the constructor of your class or a static method on your class must be decorated with the [ManagementBind] attribute. This specifies that the creation of object instances, when requested by the WMI infrastructure, will be directed at the constructor or static method, respectively.

The following code demonstrates a singleton WMI class that exposes the number of processors available on the machine as a read-only property.

[ManagementEntity(Singleton=true)]
[ManagementQualifier("Description",
    Value="Obtain processor information.")]
public class ProcessorInformation
{
    [ManagementBind]
    public ProcessorInformation()
    {
    }

    [ManagementProbe]
    [ManagementQualifier("Description",
        Value="The number of processors.")]
    public int ProcessorCount
    {
        get { return Environment.ProcessorCount; }
    }
}

To publish the instance to the WMI infrastructure, the hosting process can use the following code:

InstrumentationManager.Publish(new ProcessorInformation());

This simple class can now be registered using InstallUtil.exe and then queried from a WMI consumer. Note the use of the [ManagementQualifier] attribute to add description information to the class and property levels.

The following code demonstrates a multi-instance WMI class that exposes the thread count in each process running on the system as a read-only property.

[ManagementEntity]
public class ProcessInformation
{
    Process _theProcess;

    [ManagementBind]
    public ProcessInformation(
        [ManagementName("Id")] int id)
    {
        _theProcess = Process.GetProcessById(id);
    }

    [ManagementKey]
    public int Id
    {
        get { return _theProcess.Id; }
    }

    [ManagementProbe]
    public int ThreadCount
    {
        get { return _theProcess.Threads.Count; }
    }
}

To publish the process instances to the WMI infrastructure, the hosting process can use the following code:

foreach (Process p in Process.GetProcesses())
{
    InstrumentationManager.Publish(
        new ProcessInformation(p.Id));
}

Changes to object instances made by InstrumentationManager.Publish (publishing an object instance) or by InstrumentationManager.Revoke (revoking an object instance) often do not take effect immediately, but only after a delay of a few seconds. When consuming WMI instances, a retry or timeout mechanism should be implemented around the first query after an object is published to accommodate this delay.
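One way to absorb this delay is a small polling helper on the consumer side. The sketch below is only an illustration: the Retry class and its Until method are hypothetical names, and the attempt count and delay should be tuned to your environment.

```csharp
using System;
using System.Threading;

static class Retry
{
    // Calls probe until it returns true or maxAttempts is exhausted,
    // sleeping for the given delay between consecutive attempts.
    // Returns true if the probe eventually succeeded.
    public static bool Until(Func<bool> probe, int maxAttempts, TimeSpan delay)
    {
        for (int attempt = 1; attempt <= maxAttempts; attempt++)
        {
            if (probe())
            {
                return true;
            }
            if (attempt < maxAttempts)
            {
                Thread.Sleep(delay);
            }
        }
        return false;
    }
}

// Hypothetical usage against the provider from this post, assuming a
// reference to System.Management:
//
//   bool visible = Retry.Until(
//       () => new ManagementObjectSearcher(@"root\MyApplication",
//                 "SELECT * FROM ProcessInformation").Get().Count > 0,
//       10, TimeSpan.FromMilliseconds(500));
```

The helper deliberately knows nothing about WMI, so the same loop can wrap a query for a newly published instance or a check that a revoked instance has disappeared.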

Consuming the information exposed by the provider developed in the previous section can be performed using the ManagementObjectSearcher, ManagementObject and other types. Additionally, there are numerous tools to aid in the development and testing of WMI providers, such as the built-in wmic tool, or the GUI WMI Tools package. The Visual Studio Server Explorer can also generate a managed class wrapping a WMI provider.

Opening an administrator command prompt and executing the following command will yield a detailed list of ProcessInformation instances defined in the previous section.

wmic /namespace:\\root\MyApplication path ProcessInformation

The wmic tool supports query syntax for specifying conditions, as well:

wmic /namespace:\\root\MyApplication path ProcessInformation where Id=4

Alternatively, the following managed code can output the ProcessInformation objects:

ManagementObjectSearcher searcher =
    new ManagementObjectSearcher(@"root\MyApplication",
        "SELECT * FROM ProcessInformation");

foreach (ManagementObject obj in searcher.Get())
    Console.WriteLine("Id: " + obj["Id"] +
        ", ThreadCount: " + obj["ThreadCount"]);

If you’re looking for a strongly-typed accessor, there’s no need to generate it manually. Within Visual Studio’s Server Explorer, expand your server’s node, right-click the Management Classes node and choose Add Classes. Navigate to the root\MyApplication namespace and choose the ProcessInformation class, click Add and dismiss the dialog. Now right-click the ProcessInformation node and select Generate Managed Class. Visual Studio automatically adds to your project a Component-derived class that enables strongly-typed access to the WMI provider. Now the following code can replace the ManagementObjectSearcher-based approach:

foreach (ROOT.MYAPPLICATION.ProcessInformation info in
    ROOT.MYAPPLICATION.ProcessInformation.GetInstances())
{
    Console.WriteLine("Id: " + info.Id +
        ", ThreadCount: " + info.ThreadCount);
}

Note that the underlying accessor still uses the weakly-typed ManagementObject, so using the accessor does not provide performance benefits – only correctness and ease of use. Throughout the rest of this series, the weakly-typed approach will not be demonstrated at all for brevity purposes.

In the next post, we will look into making our provider more interesting by exposing configurable (read-write) data and methods.  In the final post of this series we will also add support for publishing events and outline some advanced topics.

Don't Blindly Count on a Finalizer

Finalization is one of the most complicated and obscure areas of .NET.  Most developers don't actually bother writing finalizers, but if you're designing and implementing any framework of considerable size, you are probably going to stumble across a well-defined set of issues.  And yes, it's not that simple to get it right - performance issues, concurrency issues, throttling issues, GC issues - there is a plethora of rocks to wreck your boat on.

There are many resources out there that assist in getting finalization right, so I won't be repeating what others have written.  However, I would like to address an area that is just as important - the question of whether you can count on your finalizer.  To be more specific, whether you can count on your finalizer getting executed.

Finalization is a mechanism for releasing unmanaged resources.  As such, it is often quite important that the finalizer execute - or else a resource leak of some sort might arise.  Some of these leaks are restricted to the boundaries of a single process, and will go away once the process terminates.  Other leaks are not restricted to a single process or even a single machine, and have the potential of virally bringing an entire system down.

For example, consider a resource such as a file handle.  A file handle is an OS resource that is restricted to a single process, so if you forget to close a file handle, it will be closed for you automatically when the process terminates.  On the other hand, the file that you created does not go away automatically with your process (with the exception of files opened with FILE_FLAG_DELETE_ON_CLOSE, which are deleted when their last handle is closed) - it is a resource on your file system that outlives a single process, and if you have a leak of temporary files getting created without being deleted, you might run out of disk space, which is a system-wide problem.

So once we have established that finalizers are important and that their execution could be critical for the proper operation of a system, why would a finalizer not get called?  What can prevent your carefully crafted resource disposal mechanism from executing?

Well, there are at least four scenarios in which the finalizer won't get called.  Some of them might be regarded as corner cases, others might be more common.  However, the primary thing to keep in mind is that if the resource you're protecting has the potential of bringing the system down, then you must consider a form of protection outside finalization to proactively defend yourself against the case when the finalizer won't get called.  An example of a proactive defensive approach would be what .NET remoting uses to implement distributed garbage collection.  When a remote object is instantiated by a client, the server infrastructure could keep that object alive as long as the client didn't send an explicit "destroy object" request.  However, if the client forgets to destroy the object, or the message gets lost, or the network connection fails, the server object would then remain leaked forever.  And that's exactly why .NET remoting implements a lifetime management mechanism with leases and sponsors, making everything significantly more complicated but addressing the potential leak.

So without further ado, why would a finalizer not get called?

Well, the simplest way of all to prevent finalizers from executing is to get the finalizer thread to do something else.  Indefinitely.  There is just one finalizer thread, so if you nailed it down, no one else is there to run finalizers on your behalf.  This is significantly less complicated than it sounds, because all you need to do is put a Thread.Sleep(Timeout.Infinite) in your rogue object's finalizer, and no other finalizer will ever get called.  This scenario has been extensively covered by Tess and myself, and can be diagnosed quite easily using SOS.  (Obviously there are also more realistic and less malicious ways to get the same effect, such as waiting for some resource that is never available, waiting for a message from the network that never arrives, etc.)

Another way to prevent a finalizer from executing is rudely aborting the application.  This can be requested by your CLR host, another process, or even a rogue piece of code inside your process.  For example, the Environment.FailFast method rudely aborts the process, without allowing finalizers to run.  Alternatively, the TerminateProcess function rudely terminates the process without any cleanup mechanisms.  Not only finalizers, but even finally blocks won't run in these cases.

A third scenario where a finalizer won't run is a corner case condition where the application experiences an OutOfMemoryException.  Consider what happens - if the application is supposed to terminate because of an OutOfMemoryException, surely I would like my finalizer to run.  However, it might well be the case that this is the first time ever my finalizer wants to run in that execution.  So if it's the first time (and I haven't used NGEN), then the finalizer has to be JIT-compiled first before it can be executed.  But to JIT-compile it, memory must be allocated.  But hey! - we're in an out of memory situation.  Bummer.  No finalizer for you - come back one year.  This specific scenario can be alleviated by deriving your finalizable resource class from CriticalFinalizerObject.  But it kind of starts showing you that ensuring that some code executes in every possible scenario is significantly harder than it seems.
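Here is a minimal sketch of what deriving from CriticalFinalizerObject might look like for the temporary file example used earlier.  The TempFile class and its members are hypothetical.  Note that this only asks the CLR to prepare the finalizer ahead of time; it does not make the File.Delete call inside it failure-proof.

```csharp
using System;
using System.IO;
using System.Runtime.ConstrainedExecution;

// Owns a temporary file and deletes it on Dispose or, failing that,
// in a critical finalizer.
sealed class TempFile : CriticalFinalizerObject, IDisposable
{
    private readonly string _path;

    public TempFile()
    {
        // Creates a uniquely named, zero-byte file in the temp directory.
        _path = Path.GetTempFileName();
    }

    public string FilePath { get { return _path; } }

    public void Dispose()
    {
        Delete();
        GC.SuppressFinalize(this);
    }

    ~TempFile()
    {
        Delete();
    }

    private void Delete()
    {
        try
        {
            File.Delete(_path);
        }
        catch (IOException) { }                 // file is locked; best effort only
        catch (UnauthorizedAccessException) { } // no permission; best effort only
    }
}
```

Typical use would be a using block around the TempFile instance, so that the finalizer remains only a safety net for callers who forget to dispose.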

A final scenario (there might be more but that's all I could think of right now) in which a finalizer won't run is a specific case of process termination or AppDomain unload.  When the process terminates or the AppDomain unloads, the CLR attempts to run all finalizers for objects that requested finalization.  However, in this specific scenario, it imposes a limit on the time allotted to these finalizers.  Each individual finalizer gets up to 2 seconds to run, and all finalizers combined get up to 40 seconds.  So no matter what you do, if your finalization work takes more than 2 seconds, or if the combined finalization work of all objects takes more than 40 seconds, some finalizers are not guaranteed to run, and some others might be interrupted in the middle.  You can program defensively against this situation using Environment.HasShutdownStarted and AppDomain.IsFinalizingForUnload - but this is another demonstration of how non-trivial it is to ensure that something happens in every possible scenario.
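To illustrate the defensive check, the following sketch skips slow cleanup work when the finalizer runs during process shutdown or AppDomain unload, and performs only the quick, essential release.  The BufferedLog class and its two flags are hypothetical stand-ins for real flush and release logic.

```csharp
using System;

sealed class BufferedLog : IDisposable
{
    // Stand-ins for real work: a slow flush and a quick handle release.
    public bool Flushed { get; private set; }
    public bool HandleReleased { get; private set; }

    private static bool RuntimeIsGoingAway()
    {
        return Environment.HasShutdownStarted ||
               AppDomain.CurrentDomain.IsFinalizingForUnload();
    }

    public void Dispose()
    {
        Cleanup(true);
        GC.SuppressFinalize(this);
    }

    ~BufferedLog()
    {
        // During shutdown or unload the finalizer is on a strict time
        // budget, so only the essential work is attempted.
        Cleanup(!RuntimeIsGoingAway());
    }

    private void Cleanup(bool allowSlowWork)
    {
        if (allowSlowWork)
        {
            Flushed = true;      // hypothetical slow flush of buffered data
        }
        HandleReleased = true;   // quick release that always runs
    }
}
```

Whether skipping the flush is acceptable depends on the resource; the point is only that the finalizer can tell the two situations apart and budget its work accordingly.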

Summing things up, finalization is not a bad feature.  It is a must in the world of unmanaged resources which we live in.  However, there are significant pitfalls associated with finalization that we must be aware of - and one of them is that we can't rely on finalization one hundred percent.  So if after reading this post you no longer take it for granted that finalizers will take care of all the troubles of the world for you, you are already in a better position than you were in the beginning.

Parallelism and CPU Affinity

Contrary to popular belief, Windows actually does a good job scheduling threads for execution across the available CPUs.  (Yes, this was a controversial first sentence, but bear with me.)

More often than not, smart developers tend to try to "outsmart" the operating system or framework that they happen to be using.  In the particular case of thread scheduling, this "outsmarting" normally tends to fall in one of two categories:

  1. Not trusting the OS with thread CPU affinity.  I.e., if I know that I have 4 cores, I will explicitly create 4 threads and assign each thread to its very own CPU.  Now I feel in control.
  2. Not trusting the OS to schedule work items for execution.  I.e., if I know that I have 100,000 independent work items to execute, and 4 cores to execute them on, then I will explicitly create 4 threads and assign to each of them a queue, and enqueue 25,000 work items into each queue, . . .  (Not to mention the attempt to create 100,000 threads, one for each work item, a classical mistake which belongs to "Threading and Concurrency 101.")

I'd like to address affinity in this post, leaving thread pooling and thread pool management to a (hypothetical) future post.

One of the primary parallelism blockers is CPU affinity.  If I have a task that is affined to one CPU, then if that CPU is busy but another is available, I have no means of executing the task.  One of the classical cases of CPU affinity was the NDIS interrupt affinity, binding the network card to a single CPU which can process its incoming packets.  If that CPU was particularly busy processing other interrupts, network receive-side processing was adversely affected.  Furthermore, if multiple CPUs were available to perform receive-side processing, only one of them would do the actual work.

A slightly more complicated example has to do with scheduling threads with varying priorities on a multi-core system.  Assume we have the following threads:

          Priority  Affinity  Current CPU
Thread A  8         CPU 0, 1  0
Thread B  10        CPU 1     In a wait
Thread C  12        CPU 0, 1  1

Assume that thread B comes out of its wait.  The Windows scheduler now has to decide what to do with that thread.  Since it's only willing to run on CPU 1, and that CPU is currently running a higher-priority thread, thread B will have to wait.  The scheduler won't go out of the way to shuffle threads around so that both thread B and thread C can run (thread B on CPU 1 and thread C on CPU 0), because it means that all executing threads must be preempted, cache locality negatively affected, etc.  So toying around with affinity has dire consequences for thread B in this case.

By the way, oftentimes CPU affinity is not intentional.  It's not necessarily the case that the developer was smart enough to actually use the thread CPU affinity API; it could also be the result of a specific framework or a specific scenario within a framework.

For example, Win32 executables have an optional PE header flag under the Image Characteristics section indicating that the executable should be executed on a single-processor machine only.  To execute them on a multi-processor machine, a single CPU is chosen in a round-robin fashion for that executable.

Thread affinity and CPU affinity, by the way, are not the same.  CPU affinity means that a specific thread is bound to a specific CPU.  Thread affinity means that a set of tasks is bound to a specific thread.

For example, a COM object residing in a single-threaded apartment (STA) will be marshaled as a proxy outside the apartment.  The proxy will ensure that all calls to the object are marshaled to its original apartment and executed on a single thread, even if they originate at multiple threads.  There's your thread affinity.
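The same idea can be sketched in managed code without COM.  The SingleThreadDispatcher class below is a hypothetical helper, not anything COM or the CLR provides: it funnels every posted action onto one dedicated thread, which is roughly what the proxy achieves for calls into the apartment.

```csharp
using System;
using System.Collections.Concurrent;
using System.Threading;

// Hypothetical helper: runs every posted action on one dedicated thread,
// loosely mimicking how calls into an STA object end up on a single thread.
public sealed class SingleThreadDispatcher : IDisposable
{
    private readonly BlockingCollection<Action> _queue =
        new BlockingCollection<Action>();
    private readonly Thread _worker;

    public SingleThreadDispatcher()
    {
        _worker = new Thread(() =>
        {
            // Drain the queue until CompleteAdding is called.
            foreach (Action work in _queue.GetConsumingEnumerable())
                work();
        });
        _worker.IsBackground = true;
        _worker.Start();
    }

    // Blocks the caller until the action has run on the dispatcher thread.
    // An exception thrown by the action is rethrown on the calling thread.
    public void Invoke(Action action)
    {
        Exception failure = null;
        using (var done = new ManualResetEventSlim(false))
        {
            _queue.Add(() =>
            {
                try { action(); }
                catch (Exception ex) { failure = ex; }
                finally { done.Set(); }
            });
            done.Wait();
        }
        if (failure != null)
            throw failure;
    }

    public void Dispose()
    {
        _queue.CompleteAdding();
        _worker.Join();
        _queue.Dispose();
    }
}
```

No matter how many threads call Invoke, the actions themselves all observe the same Thread.CurrentThread.  The price is the one described above for CPU affinity, one level up: if that single thread is busy, the work waits.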

The .NET runtime is slightly different with regard to fooling around with thread CPU affinities, as far as developers are concerned.  Unlike the native execution model, which assumes direct control over threads, the CLR provides the notion of an abstract "task" which is not necessarily the same as the underlying OS thread.  The default CLR host maps these tasks to OS threads, so there is a one-to-one correspondence between the "managed" thread and the "unmanaged" or "physical" thread.  It is therefore possible to make certain assumptions and modify the CPU affinity of the physical thread as a means to modify the CPU affinity of the managed (logical) thread.  This is fragile because the behavior is subject to change in a future CLR version, but we can probably rely on it for the next couple of years.

However, non-default CLR hosts (such as SQL Server 2005) are welcome to implement the abstract "task" notion as something that doesn't directly map to an OS thread (for example, using cooperative scheduling, fibers, longjmp, or whatnot).  The common reasons for doing that are performance (refraining from creating many physical OS threads) and reliability (exercising tighter control over task execution).

When dealing with non-default CLR hosts (or future versions of the default CLR host, for that matter), our code can no longer assume anything regarding a correspondence between physical and logical threads.  Therefore, modifying the affinity of the physical thread is downright wrong in these environments, because it might affect logical tasks that are completely unrelated to the current logical task.

(As a side note, if our managed code specifically requires the same underlying physical thread during the execution of some operation, the Thread.BeginThreadAffinity and Thread.EndThreadAffinity APIs can be used to advise the CLR host of our intent.)
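A minimal sketch of how these two calls are typically paired follows.  The ThreadAffinityScope class and its Run method are names I made up for illustration; only Thread.BeginThreadAffinity and Thread.EndThreadAffinity are real APIs.  Under the default host, where managed and physical threads already map one-to-one, the calls should make no visible difference.

```csharp
using System;
using System.Threading;

// Hypothetical wrapper around the two Thread affinity notifications.
public static class ThreadAffinityScope
{
    // Runs the given work while advising the CLR host that the code
    // depends on the identity of the current physical OS thread.
    public static T Run<T>(Func<T> work)
    {
        Thread.BeginThreadAffinity();
        try
        {
            return work();
        }
        finally
        {
            // Always end the region, even if the work throws.
            Thread.EndThreadAffinity();
        }
    }
}
```

Anything that touches thread-bound native state (a physical thread's CPU affinity, for instance) would go inside the delegate passed to Run.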

Design for Performance Up-Front?

There has been a lively discussion on Oren's blog with regard to whether it's appropriate to design for performance up-front when building an application.  I don't want to rephrase what Oren is saying (he makes an excellent case in these two posts), but it essentially boils down to the following three statements:

  1. You can always refactor for performance later.
  2. If an operation takes around the magnitude of a microsecond, then it is insignificant.
  3. Hotspots where abstractions must be removed are rare.

As always, the correctness of the above statements highly depends on the environment you're in and the application you are working on.  It's true that some applications can be constructed without considering performance up-front, and yet yield sufficient results.  For example, most of my demo applications that I use when teaching .NET are constructed without considering performance up-front, and yet everyone is perfectly happy about using them.  On the other hand, sometimes you'd rewrite code multiple times just because you didn't design for performance up-front.  And yes, it's fun to do (the rewriting) but sometimes you just can't afford doing it.

By the way, if refactoring and rewriting is a valid approach, and the time required to perform these activities is insignificant, then I could argue the following:

It is often not important to design an application for correctness up-front.  We could implement the incorrect behavior (which is easier to code, maybe?), and if something goes wrong then we could always rewrite it.  Besides, it might just be the case that the customer really wanted the incorrect behavior.  (Oh, and we have a decoupled approach so it's easy to replace just the component that yields the incorrect behavior.)

Can you spot the absurdity?  Can you spot the similarity?  I sure can, and that's why I constantly fail to understand how performance is so different from correctness as far as design, development, testing - the whole lifecycle - is concerned.  Performance is just another aspect of correctness.  If a user clicks a button and the application crashes, it's a problem.  If a user clicks a button and the application takes 5 seconds to respond when it should be unnoticeable, it's a problem.  It's the same problem.

As for abstractions and hotspots where these abstractions must be removed, I couldn't disagree more.  It doesn't occur to us often because we rely on frameworks so much without ever looking under the engine hood, but lots of the infrastructure code you'd find in a framework can be categorized as a hotspot where you should be giving up abstraction and decoupling for the sake of performance.

Consider adding floating point numbers.  Yes, I could design a mockable, testable approach with interfaces.  Or I could make it a hardware intrinsic.  Guess what, processors with floating point emulation in software are nowhere to be seen.  Extinct.  For a reason.

Consider taking a spinlock in the operating system code.  Yes, I could design a mockable, testable approach with interfaces.  Or I could make it a single DWORD and employ BTS (or XCHG) to do the work.  And I could compile a different flavor of the kernel for the single-processor case, where just raising the IRQL is enough.  Hey, it's a spinlock.  I'm taking millions of these spinlocks per second at times.  I don't care if it's testable (I'm pretty sure 30 years of programming have gotten spinlocks right), I want it to be fast.  And yes, if it takes a microsecond then it is SO SLOW that my best hope would be using my OS as a Commodore64 emulator.  And don't forget that any decently-sized system has billions of operations each taking under a microsecond.  These things tend to add up.
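For a feel of how little there is to such a lock, here is a managed approximation.  It is only a sketch: TinySpinLock is a made-up name, Interlocked.Exchange stands in for the XCHG instruction, and a real kernel spinlock (or the framework's own System.Threading.SpinLock) deals with far more than this.

```csharp
using System.Threading;

// Minimal illustrative spinlock; not a replacement for System.Threading.SpinLock
// or for the kernel's implementation.
public sealed class TinySpinLock
{
    private int _taken; // 0 = free, 1 = held

    public void Enter()
    {
        SpinWait spinner = new SpinWait();
        // Interlocked.Exchange is an atomic swap: write 1 and look at the
        // previous value. A previous value of 0 means we now own the lock.
        while (Interlocked.Exchange(ref _taken, 1) != 0)
        {
            // Spin briefly; SpinWait eventually yields the processor.
            spinner.SpinOnce();
        }
    }

    public void Exit()
    {
        // Atomically mark the lock as free again.
        Interlocked.Exchange(ref _taken, 0);
    }
}
```

The whole lock is one integer and one atomic instruction on the fast path, which is the point: there is no room left for an interface and a virtual call.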

Consider adding elements to a queue.  Yes, I could design a mockable, testable approach with interfaces.  Or I could use a lock-free queue where I don't have the Count operation on a queue because it doesn't align with the non-blocking implementation.  How on earth do you refactor an application so that it doesn't use the Count operation on a queue, after you have written it?!

I could go on with the examples, and what I do recognize is that for each example you could come up with a counter-example.  You could also try undermining the validity of my examples.  But ignore the examples - look at the underlying theme.  The theme is that performance is not less important than correctness.  It has to be considered through every phase of the development lifecycle.  It cannot be neglected, it cannot be ignored, it cannot be "refactored into the app" later.

We all have performance built into our mindset.  A good programmer makes performance-related decisions (design or coding time) automatically.  It is effectively impossible to ignore performance considerations, just as it is impossible to write incorrect code when you know that it's not going to do what you want it to.

Automatically Converting Exceptions to WCF Faults

One of the annoyances in service design is that you have to dedicate lots of thinking to your error propagation strategy.  Any application framework or even utility class should be well-defined with regard to exceptions flowing to the outside world; however, when services are concerned, this is a matter of utmost importance.

The reason for the distinction is the following simple notion: If you're dealing with an object, as the object's client you are very strongly coupled to its exception model.  You are well-aware of the fact that exceptions can propagate from the object and you have the facilities for catching those exceptions and acting accordingly.  However, if you're dealing with a service, as the service's client you are decoupled from its exception model.  In fact, you don't care what it uses internally (be it exceptions, Win32 error codes, HRESULTs or longjmp's), as long as there is a well-defined contract that exposes these errors externally.

In WCF, errors are exposed externally through the use of a fault contract.  An operation can specify that certain kinds of faults can escape it - this is similar to an exception specification in C++ or in Java.  While it is possible for standard .NET exceptions to cross service boundaries, a fault contract is the cleanest approach, and one that clients can discover in advance as part of the service description.

The problem begins when you want to map .NET exceptions (which are the most straightforward way of dealing with application errors) to WCF faults.  This basically means that your top-level service methods have to take the following form:

public void SomeOperation()
{
    try
    {
        DoSomething();
        DoSomethingElse();
        DoSomethingElseEntirely();
    }
    catch (InvalidOperationException iopEx)
    {
        //FaultException<T> is constructed from the fault detail object.
        throw new FaultException<InvalidOperationFault>(
            new InvalidOperationFault { ErrorDetails = iopEx.ToString() });
    }
    catch (SecurityException)
    {
        throw new FaultException<SecurityFault>(new SecurityFault());
    }
    catch (Exception ex)
    {
        throw new FaultException(ex.Message);
    }
}

Note that in this particular case, there is a well-defined mapping the application has established between particular types of exceptions and particular types of faults (the InvalidOperationFault and SecurityFault classes were made up for this specific purpose).  However, this mapping is not automatic and therefore very error-prone.  And you don't want your exception propagation to be error-prone, right?

I was looking for an automatic approach to this scenario, and just like with everything else, WCF has an extensibility point to provide exactly what we're looking for.  There are two well-defined ways to provide fault messages corresponding to an exception.  The first is through the use of a FaultConverter, which is what you use if you're authoring a custom channel.  However, authoring a custom channel (or even a binding element) for the sole purpose of translating exceptions to faults is overkill.

The second approach is implementing the IErrorHandler interface and registering your object to the channel dispatcher's error handlers' collection.  This can be done as a service behavior, in about 20 lines of code.  Providing the mapping from exceptions to faults is not very difficult either:

public static class ExceptionToFaultConverter
{
    //Maps an exception type to the delegate producing its fault detail.
    private static Dictionary<Type, Delegate> _converters =
        new Dictionary<Type, Delegate>();

    public static void RegisterConverter<TException, TFault>(
        Func<TException, TFault> converter)
    {
        _converters.Add(typeof(TException), converter);
    }

    //Returns null if no converter is registered for the exact exception type.
    public static object ConvertExceptionToFault(Exception ex)
    {
        Delegate converter;
        if (!_converters.TryGetValue(ex.GetType(), out converter))
            return null;

        return converter.DynamicInvoke(ex);
    }
}

With this simple framework in place, when we have a new exception type to map to a new fault contract, we can go ahead and do it:

ExceptionToFaultConverter.RegisterConverter<
    InvalidOperationException, InvalidOperationFault>(
        x => new InvalidOperationFault {ErrorDetails=x.ToString()});
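One thing to keep in mind is that the lookup in ConvertExceptionToFault uses ex.GetType() as the dictionary key, so a converter registered for a base exception type is not used for its derived types.  If that matters, a variant along the following lines could walk up the inheritance chain.  This is only a sketch: HierarchicalFaultConverter is a made-up name, and overwriting an existing registration instead of throwing is my own choice.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical variant of the converter registry that also matches
// exceptions derived from a registered exception type.
public static class HierarchicalFaultConverter
{
    private static readonly Dictionary<Type, Delegate> _converters =
        new Dictionary<Type, Delegate>();

    public static void RegisterConverter<TException, TFault>(
        Func<TException, TFault> converter)
        where TException : Exception
    {
        // The indexer replaces an earlier registration for the same type.
        _converters[typeof(TException)] = converter;
    }

    public static object ConvertExceptionToFault(Exception ex)
    {
        // Walk from the most derived type up to System.Exception and use
        // the first converter registered along the way.
        for (Type t = ex.GetType();
             t != null && typeof(Exception).IsAssignableFrom(t);
             t = t.BaseType)
        {
            Delegate converter;
            if (_converters.TryGetValue(t, out converter))
                return converter.DynamicInvoke(ex);
        }
        return null; // nothing registered anywhere in the chain
    }
}
```

With a converter registered for ArgumentException, an ArgumentNullException now produces a fault too, while unrelated exception types still yield null.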

We could also filter by the actual operation, the contract name, the fault contracts on the operation, or whatever we want really, because this is what the point of invocation looks like for our extension:

class MyErrorHandler : IErrorHandler
{
    public bool HandleError(Exception error)
    {
        return false;
    }

    public void ProvideFault(Exception error, MessageVersion version,
        ref Message fault)
    {
        object faultDetail =
            ExceptionToFaultConverter.ConvertExceptionToFault(error);
        //No converter registered for this exception: leave the fault as is.
        if (faultDetail == null)
            return;

        //Find fault contracts for current operation.
        OperationContext ctx = OperationContext.Current;
        ServiceDescription hostDesc = ctx.Host.Description;
        ServiceEndpoint endpoint =
            hostDesc.Endpoints.Find(
                ctx.IncomingMessageHeaders.To);
        if (endpoint == null)
            return;

        //Assumes the action has the form
        //<contract namespace><contract name>/<operation name>.
        string operationName =
            ctx.IncomingMessageHeaders.Action.Replace(
                endpoint.Contract.Namespace +
                endpoint.Contract.Name + "/",
                "");
        OperationDescription operation =
            endpoint.Contract.Operations.Find(operationName);
        if (operation == null)
            return;

        foreach (FaultDescription faultDesc in operation.Faults)
        {
            if (faultDesc.DetailType.IsAssignableFrom(
                faultDetail.GetType()))
            {
                //Found a match:
                fault = Message.CreateMessage(
                    version,
                    FaultCode.CreateSenderFaultCode(
                        faultDesc.Name, faultDesc.Namespace),
                    faultDetail.ToString(),
                    faultDetail,
                    faultDesc.Action
                    );
                break;
            }
        }
    }
}

To register this error handler, we need a service behavior attached, like the following one:

class ErrorBehaviorAttribute : Attribute, IServiceBehavior
{
    Type _handlerType;

    public ErrorBehaviorAttribute(Type handlerType)
    {
        _handlerType = handlerType;
    }

    #region IServiceBehavior Members

    public void AddBindingParameters(ServiceDescription serviceDescription, ServiceHostBase serviceHostBase, System.Collections.ObjectModel.Collection<ServiceEndpoint> endpoints, BindingParameterCollection bindingParameters)
    {
    }

    public void ApplyDispatchBehavior(ServiceDescription serviceDescription, ServiceHostBase serviceHostBase)
    {
        IErrorHandler errorHandler = (IErrorHandler)
            Activator.CreateInstance(_handlerType);
        foreach (ChannelDispatcherBase chanDisp in serviceHostBase.ChannelDispatchers)
        {
            //Skip dispatchers that are not ChannelDispatcher instances.
            ChannelDispatcher disp = chanDisp as ChannelDispatcher;
            if (disp != null)
                disp.ErrorHandlers.Add(errorHandler);
        }
    }

    public void Validate(ServiceDescription serviceDescription, ServiceHostBase serviceHostBase)
    {
    }

    #endregion
}

And finally, we need to specify that behavior on our service:

[ErrorBehavior(typeof(MyErrorHandler))]
class ServiceImpl : IService

This gives us the automatic facilities for mapping .NET exceptions to WCF faults.  This is a simplistic sample, but it shows what the possibilities really are.

Next Generation Production Debugging: Demo 6

This is the last in a series of posts summarizing my TechEd 2008 presentation titled "Next Generation Production Debugging".  Previous posts in the series:

After spending some quality time with the debugger, analyzing an invalid handle situation, I approached the final demo.  In this particular case, the application is requested to perform some heavy processing operation on a set of images (I called that operation "Batch Get Average Color").

Customers were highly displeased with the performance of that operation, particularly since they were running the service side of the application on powerful multi-processor servers and not seeing all the cores being utilized for processing purposes.  Therefore, we implemented a parallelized version of the same operation.  By the way, implementing it in parallel only required a tiny change to the code.  These are the two versions side by side:

public List<PictureColor> BatchGetAverageColor(List<string> pictureFileNames)
{
    List<PictureColor> avgColorList = new List<PictureColor>();
    pictureFileNames.Aggregate(avgColorList,
        (list, picFn) => { list.Add(GetAverageColor(picFn)); return list; });
    return avgColorList;
}

public List<PictureColor> ParallelBatchGetAverageColor(List<string> pictureFileNames)
{
    List<PictureColor> avgColorList = new List<PictureColor>();

    //Could be rewritten as an aggregate with PLINQ:
    //pictureFileNames.AsParallel().Aggregate(...)
    Parallel.ForEach(pictureFileNames, picFn =>
    {
        PictureColor c = GetAverageColor(picFn);
        lock (avgColorList) avgColorList.Add(c);
    });

    return avgColorList;
}

Parallel.ForEach comes from the Parallel Extensions for .NET CTP, and makes parallelizing a heavy processing operation across multiple CPUs a trivial task.
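For reference, the PLINQ route hinted at in the code comment could look roughly like the sketch below.  It is written against the PLINQ API as it later shipped in .NET 4 (the CTP's surface may differ), it uses a projection instead of the aggregate for brevity, and PlinqBatch.ParallelMap is a made-up helper in which the compute delegate stands in for GetAverageColor.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical helper showing a PLINQ alternative to the explicit lock.
public static class PlinqBatch
{
    public static List<TResult> ParallelMap<TItem, TResult>(
        IEnumerable<TItem> items, Func<TItem, TResult> compute)
    {
        return items
            .AsParallel()    // let PLINQ partition the work across cores
            .AsOrdered()     // keep results in the same order as the input
            .Select(compute) // the per-item work, e.g. GetAverageColor
            .ToList();
    }
}
```

Compared to the Parallel.ForEach version there is no shared list to lock, and AsOrdered keeps the output order stable, which the lock-based version does not guarantee.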

However, when using the new version of the application, performance not only failed to improve, but even seemed to degrade slightly in the multiprocessor scenario.  How is this possible?  I've mentioned multiple reasons for parallelism blockers before (such as cache coherency and contention), but in this particular case the operation seems to be 100% CPU-bound.

The easiest way of getting a bird's-eye view of what's going on in the system is using the Windows Performance Toolkit I've blogged about before the conference.  I enabled it, ran the application's batch processing again, and got the following results on the CPU and I/O activity graphs for the service process:

[Image: CPU and I/O activity graphs for the service process]

We can see that the operation is hardly using the CPU; the disk, on the other hand, is working at full power.  Subsequently, I asked xperf to show me a detailed graph of I/O activity and what I saw was a huge number of disk accesses to temporary files at random disk offsets during the time of the batch processing operation!

[Image: detailed xperf graph of disk I/O activity during the batch operation]

Since the operation is clearly I/O bound, increasing the number of processors on which it runs is not going to improve performance whatsoever (if the disk bandwidth is completely saturated).

So thanks again for coming to my presentation or for reading this brief summary of the demos.  When the session recording hits the web, I'll be sure to let you know.  In the meantime, if you're looking for more material, you can tune in to my DevAcademy session recording, or read my post on debugging and investigation tools.

Next Generation Production Debugging: Demo 5

After looking at managed and native deadlock diagnosis, we transitioned to a state of banging our heads against the table, which is a state familiar to many developers from their debugging all-nighters.  How did we get in such a state?  By issuing the modest "Batch Move" command on a set of pictures we wanted to move to a separate folder.  The application responds in a way we have already seen before - it gets completely stuck.

If you try deadlock diagnosis using the techniques I've shown earlier (WCT or the SOSEX !dlk extension), you won't find anything useful.  Our last resort is trying to understand what our threads are doing, and see if we can come up with anything.

So at the client side (after attaching WinDbg and loading SOS), we have three threads, as shown in the debugger output of the !threads command:

[Image: output of the !threads command on the client side]

Thread #0 is most likely the WinForm main GUI thread, #2 is the finalizer (we can see that in the Exception column not depicted in the above screenshot), and #6 is a thread we're not entirely sure about.

So let's go ahead and see what threads #0 and #6 are up to (the finalizer is of little interest unless other threads are waiting for finalization).  By switching into the thread and issuing the !clrstack and k commands we can see the managed and native call stack for the thread:

~0s
!clrstack

The debugger produces the following output (I've marked interesting frames in red and bold and cleaned the output a little bit by removing method parameters to make sure we can see what's going on):

001be510 76ffa69d [NDirectMethodFrameStandalone: 001be510] System.Net.UnsafeNclNativeMethods+OSSOCK.recv
001be528 7a5ff523 System.Net.Sockets.Socket.Receive
001be554 7a5ff401 System.Net.Sockets.Socket.Receive
001be570 7a5d3798 System.Net.Sockets.NetworkStream.Read
001be5a0 7a5ab17a System.Net.PooledStream.Read
001be5b8 7a5b01d3 System.Net.Connection.SyncRead
001be60c 7a5b0089 System.Net.Connection.PollAndRead
001be624 7a5b52fb System.Net.ConnectStream.PollAndRead
001be630 7a582278 System.Net.HttpWebRequest.EndWriteHeaders
001be65c 7a580be5 System.Net.HttpWebRequest.WriteHeadersCallback
001be66c 7a5b52bc System.Net.ConnectStream.WriteHeaders
001be6bc 7a5820d2 System.Net.HttpWebRequest.EndSubmitRequest
001be6e8 7a57f61b System.Net.HttpWebRequest.CheckDeferredCallDone
001be6f8 7a57f842 System.Net.HttpWebRequest.GetResponse
001be738 50971909 System.ServiceModel.Channels.HttpChannelFactory+HttpRequestChannel+HttpChannelRequest.WaitForReply
001be774 5092e8c6 System.ServiceModel.Channels.RequestChannel.Request
001be7f0 50b81bd5 System.ServiceModel.Dispatcher.RequestChannelBinder.Request
001be800 50b5e105 System.ServiceModel.Channels.ServiceChannel.Call
001be930 50b5df23 System.ServiceModel.Channels.ServiceChannel.Call
001be950 50b63ab1 System.ServiceModel.Channels.ServiceChannelProxy.InvokeService
001be978 5091291b System.ServiceModel.Channels.ServiceChannelProxy.Invoke
001be9bc 79374dc3 System.Runtime.Remoting.Proxies.RealProxy.PrivateInvoke
001bec58 79f98b43 [TPMethodFrame: 001bec58] PictureViewer.Client.PictureViewerService.IPictureViewer.BatchMovePictures
001bec6c 00332e13 PictureViewer.Client.PictureViewerForm.batchMoveToolStripMenuItem_Click

What we can make out of the preceding output is the propagation of a service call to the top of the stack, where it finally simply waits for a response.  So this is an innocent thread all in all.  Let's take a look at #6 then:

05fbf780 76ffaec5 [HelperMethodFrame_1OBJ: 05fbf780] System.Threading.WaitHandle.WaitMultiple
05fbf84c 7940347e System.Threading.WaitHandle.WaitAny
05fbf868 7a5d2712 System.Net.TimerThread.ThreadProc
05fbf8b4 793b0d1f System.Threading.ThreadHelper.ThreadStart_Context
05fbf8bc 79373ecd System.Threading.ExecutionContext.Run
05fbf8d4 793b0c68 System.Threading.ThreadHelper.ThreadStart
05fbfafc 79e7c74b [GCFrame: 05fbfafc]

This isn't even a thread we control - it doesn't have a single frame related to our application.  It's running some sort of timer thread, so we'll leave it alone for now.  We still can't reach any definitive conclusions, but it seems like the service side is a better place to start looking for the deadlock.

If we look at the service side, we have thread #0 doing an innocent Console.ReadLine (that's the service hosting code):

002feed8 758002b4 [NDirectMethodFrameStandaloneCleanup: 002feed8] System.IO.__ConsoleStream.ReadFile
002feef4 7948d2bb System.IO.__ConsoleStream.ReadFileNative
002fef20 7948d1ed System.IO.__ConsoleStream.Read
002fef40 793a3350 System.IO.StreamReader.ReadBuffer
002fef50 793aaa2f System.IO.StreamReader.ReadLine
002fef64 79497b5a System.IO.TextReader+SyncTextReader.ReadLine
002fef6c 001e01f1 PictureViewer.Service.Host.Main
002ff1ac 79e7c74b [GCFrame: 002ff1ac]

And then we have thread #5 left.  After a moment of dramatic suspense, let's see what that thread is up to (I've snipped most of the frames from the bottom of the output):

0552ed78 76ffa69d [NDirectMethodFrameStandalone: 0552ed78] <Module>.WaitForSingleObject(Void*, UInt32)
0552ed88 001e7882 PictureViewer.NativeHelpers.PictureUtils.BatchMovePictures
0552ed8c 001e77f2 PictureViewer.Service.PictureViewerService.BatchMovePictures
0552eda0 00c00693 DynamicClass.SyncInvokeBatchMovePictures
0552edb8 50b8d90b System.ServiceModel.Dispatcher.SyncMethodInvoker.Invoke
0552ee30 50b6d245 System.ServiceModel.Dispatcher.DispatchOperationRuntime.InvokeBegin
0552ee84 509137ad System.ServiceModel.Dispatcher.ImmutableDispatchRuntime.ProcessMessage5
 

This is a WCF service-side stack, invoking the BatchMovePictures method, which is exactly what we should expect.  It ends up calling <Module>.WaitForSingleObject, which is something that might be a little bit more difficult to come to good terms with.  As a general rule, whenever you see <Module> in your call stack, you can be pretty much sure that you're dealing with C++/CLI.  And as for WaitForSingleObject, that's just a Win32 API from kernel32.dll, so we're transitioning to native code.  Time for the native call stack (again, I snipped most of the bottom frames):

0:005> kb
ChildEBP RetAddr  Args to Child             
0552ecbc 757e1220 000004b4 00000000 00000000 ntdll!ZwWaitForSingleObject+0x15
0552ed2c 757e1188 000004b4 ffffffff 00000000 KERNEL32!WaitForSingleObjectEx+0xbe
0552ed40 00ba05d5 000004b4 ffffffff 4ecacae1 KERNEL32!WaitForSingleObject+0x12
WARNING: Frame IP not in any known module. Following frames may be wrong.
0552edac 50b8d90b 02b8e948 0552f054 00000000 0xba05d5

That's a pretty standard wait call stack.  What we would like to understand now is why the pictures aren't getting moved, and why we're instead waiting for something that clearly isn't happening.

The highlighted parameter (000004b4) is a handle to an object.  That's the synchronization object we're waiting for, so it could be nice to see what it is.  We can always use the !handle extension:

0:005> !handle 4b4
Could not duplicate handle 4b4, error 6

Error 6?  For those of us who don't have the entire Win32 error code list memorized, there's the !error extension:

0:005> !error 6
Error code: (Win32) 0x6 (6) - The handle is invalid.

Now it's time for a little common sense.  We know that the handle is invalid.  However, if it were invalid at the time we called WaitForSingleObject, then we wouldn't be waiting right now - instead, WaitForSingleObject would simply have returned with a last error indicating that the handle is invalid.  Therefore, we must deduce that the handle was closed after the WaitForSingleObject call was made - someone pulled the handle from right under our feet!

In a case like this, where a handle suddenly becomes invalid, our best resort is Application Verifier.  It is a fantastic tool (which I have mentioned before) that can enable a whole suite of verifications on your application's code, without you having to modify a single line.  So fire up Application Verifier, and add the service application with the suite of checks you're interested in.  In this case, we're interested in the checks related to handle usage:

[Image: Application Verifier with the handle checks enabled for the service application]

With that configured, we launch the service, attach a debugger (to ensure Application Verifier is loaded we can issue the !avrf command), and reproduce the scenario.  This time, instead of getting stuck, we have an exception in the debugger, also known as a verifier stop:

=======================================
VERIFIER STOP 00000300 : pid 0x2B54: Invalid handle exception for current stack trace.

    C0000008 : Exception code.
    0722F4EC : Exception record. Use .exr to display it.
    0722F53C : Context record. Use .cxr to display it.
    00000000 : Not used.

=======================================
This verifier stop is continuable.
After debugging it use `go' to continue.

=======================================

What this tells us is that Application Verifier has detected an invalid handle being passed to a system API.  Since we're looking for invalid handles, this is an interesting one to look at.  Note that we are certainly before the point where the application gets completely stuck, because we have seen earlier that the WaitForSingleObject call doesn't fail.  Therefore, some other code fails when trying to use the handle.  Where are we?  (Formatted for clarity.)

0:009> kb
ChildEBP RetAddr  Args to Child             
0722f1cc 71603933 63408066 0079f6f8 0079f6f0 ntdll!DbgBreakPoint
0722f3d0 6f9a3001 6f9a7ba8 00000300 c0000008 vrfcore!VerifierStopMessageEx+0x4bd
0722f3f4 6f998f5b 00000300 6f993204 c0000008 vfbasics!VfBasicsStopMessage+0xd1
0722f428 7700aa43 0722f440 0722f4ec 0722f4ec vfbasics!AVrfpVectoredExceptionHandler+0x9b
0722f450 76feb852 00000000 0079f6f0 77090180 ntdll!RtlpCallVectoredHandlers+0x57
0722f464 76feb5d1 0722f4ec 0722f53c 00000000 ntdll!RtlCallVectoredExceptionHandlers+0x15
0722f4d4 76fcee57 0722f4ec 0722f53c 0722f4ec ntdll!RtlDispatchException+0x19
0722f4d4 76ffa7ba 0722f4ec 0722f53c 0722f4ec ntdll!KiUserExceptionDispatcher+0xf
0722f804 6f99a14e 000006f8 00000002 0722f820 ntdll!NtQueryObject+0x12
0722f8b0 6f99a221 000006f8 00000000 0722f8d4 vfbasics!AVrfpCheckObjectType+0x4e
0722f8c0 6f99a4c3 000006f8 00000000 76ffa778 vfbasics!AVrfpHandleSanityChecks+0x51
0722f8d4 757e18e1 000006f8 00000000 0722f8f0 vfbasics!AVrfpNtSetEvent+0x23
0722f8e4 715e2d48 000006f8 0722f930 6f99544f KERNEL32!SetEvent+0x10
0722f8f0 6f99544f 00000000 7dbad029 00000000 PictureViewer_NativeHelpers!PictureViewer::NativeHelpers::BatchMoveWorkerThread+0x18
0722f930 758519f1 002cd258 0722f97c 7702d109 vfbasics!AVrfpStandardThreadFunction+0x6f
0722f93c 7702d109 002cd258 07224433 00000000 KERNEL32!BaseThreadInitThunk+0xe
0722f97c 00000000 6f9953e0 002cd258 00000000 ntdll!_RtlUserThreadStart+0x23

This is our thread (BatchMoveWorkerThread) trying to call the SetEvent API.  Note that there's no managed code on this stack - using !clrstack will only reveal that this is not a managed thread.  So we're trying to use the SetEvent API, and the handle is invalid (by the way, we already know that the handle's intended usage scenario was as an event).  Let's take a look at the handle with the !htrace extension, which has the ability of tracing through every handle operation if Application Verifier handle tracking is enabled:

0:009> !htrace 000006f8
--------------------------------------
Handle = 0x000006f8 - *** BAD REFERENCE ***
Thread ID = 0x00000378, Process ID = 0x00002b54

0x76e5037a: +0x76e5037a
--------------------------------------
Handle = 0x000006f8 - CLOSE
Thread ID = 0x00000378, Process ID = 0x00002b54

0x76e5036a: +0x76e5036a
--------------------------------------
Handle = 0x000006f8 - OPEN
Thread ID = 0x00002a7c, Process ID = 0x00002b54

0x76e506fa: +0x76e506fa
--------------------------------------
Handle = 0x000006f8 - CLOSE
Thread ID = 0x000029d4, Process ID = 0x00002b54

0x76e5036a: +0x76e5036a
--------------------------------------
Handle = 0x000006f8 - OPEN
Thread ID = 0x000029d4, Process ID = 0x00002b54

0x76e5039a: +0x76e5039a

--------------------------------------
Parsed 0xBD6 stack traces.
Dumped 0x5 stack traces.

If we trace through the list of operations on the handle, we see that it was opened by threads 29d4 and 2a7c, and closed by threads 29d4 and 378.  Finally, thread 378 tried to reference that handle (this is the moment of time when we walked in) and that was an invalid reference that caused a verifier stop.  This gives us every reason to go and look at the code for the current thread, because before using the SetEvent API it somehow caused the handle to be closed.

But wait, this isn't the thread calling WaitForSingleObject - that thread was a managed thread.  What if the thread calling WaitForSingleObject is waiting for the event (and the wait was issued before the handle was closed), and now the event was supposed to get signaled through SetEvent but the handle was already invalid?

Looking at the managed threads with !threads, !clrstack and kb again, we find the thread waiting for that very handle, in a wait that is unlikely to ever be satisfied...

0:006> kb
ChildEBP RetAddr  Args to Child             
06d3ef38 6f99d6e5 000006f8 00000000 00000000 ntdll!ZwWaitForSingleObject+0x15
06d3ef50 757e1220 000006f8 00000000 00000000 vfbasics!AVrfpNtWaitForSingleObject+0x25
06d3efc0 757e1188 000006f8 ffffffff 00000000 KERNEL32!WaitForSingleObjectEx+0xbe
06d3efd4 6f99d3c2 000006f8 ffffffff 00000000 KERNEL32!WaitForSingleObject+0x12
06d3eff0 011305d5 000006f8 ffffffff 30d379f5 vfbasics!AVrfpWaitForSingleObject+0xc2
WARNING: Frame IP not in any known module. Following frames may be wrong.
06d3f05c 50b8d90b 040fe948 06d3f304 00000000 0x11305d5

By the way, if you think this was impressive, there are so many other options in Application Verifier yet to be explored.  I strongly recommend using this tool!

Next Generation Production Debugging: Demo 4

After utilizing WinDbg and SOS to diagnose a memory leak in our application, I shifted focus to a whole different category of problems - deadlocks.

By issuing the "Move" command on a particular picture in the client application, the user ends up with a non-responsive UI.  We can't tell for sure whether the reason for the hang is in the UI or in the WCF service being called without forcing our way in with a debugger.

However, there's a basic way of diagnosing deadlocks on Windows Vista and Windows Server 2008, which is built into the operating system and requires no external tools whatsoever.  This mechanism is called Wait Chain Traversal, and it enables us to traverse the threads and the synchronization objects these threads are waiting on.  It will only work for the synchronization mechanisms that WCT tracks (such as a mutex, a critical section, an ALPC port, a COM call, a SendMessage call or a wait on a process or thread), and if they have names we will benefit from knowing exactly what object we're talking about.

To make this easier, I've written a simple tool called WCT Deadlock Detector which provides a very thin wrapper on top of the WCT APIs (the tool itself is available in the source code for the session I uploaded earlier).

If you run the tool and choose a process from the left, you get the output from the WCT API:

image

Analyzing what we see on the right, the first and most obvious thing is the big "DEADLOCK" we just detected between the two threads we are looking at.  The basic scenario is that thread A is waiting for mutex 1, which is currently held by thread B.  Thread B, on the other hand, is waiting for mutex 2, which is currently held by thread A.  This results in two threads waiting for each other to release the mutex in order to proceed, and so the system cannot make any forward progress - it is in a state of deadly embrace.

Since we have the mutex names, all we have to do now is go back to the source code and find why the threads are entering this kind of deadly embrace.  A typical technique for fixing deadlocks of this kind is defining a lock leveling strategy.
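To make the lock leveling idea concrete, here is a minimal sketch in C#.  It is not taken from the demo code: the LeveledLock and LockLeveling types, the lock names and the level numbers are all hypothetical.  The only rule it illustrates is that every thread acquires the locks it needs in ascending level order, no matter in which order it asked for them, so a cycle of waits cannot form.

```csharp
using System;
using System.Linq;
using System.Threading;

// A mutex paired with a fixed level. The type and its levels are hypothetical.
public sealed class LeveledLock
{
    public int Level { get; }
    public Mutex Mutex { get; }

    public LeveledLock(int level)
    {
        Level = level;
        Mutex = new Mutex();
    }
}

public static class LockLeveling
{
    // Acquires the requested locks in ascending level order, runs the action,
    // and releases whatever was acquired in reverse order.
    public static void WithLocks(Action action, params LeveledLock[] locks)
    {
        LeveledLock[] ordered = locks.OrderBy(x => x.Level).ToArray();
        int acquired = 0;
        try
        {
            foreach (LeveledLock item in ordered)
            {
                item.Mutex.WaitOne();
                acquired++;
            }
            action();
        }
        finally
        {
            for (int i = acquired - 1; i >= 0; i--)
            {
                ordered[i].Mutex.ReleaseMutex();
            }
        }
    }

    // Two threads ask for the same two locks in opposite order.
    // Without leveling this is the classic deadly embrace; with it both finish.
    public static int RunDemo(int iterations)
    {
        var pictures = new LeveledLock(1); // hypothetical "mutex 1"
        var albums = new LeveledLock(2);   // hypothetical "mutex 2"
        int counter = 0;

        var a = new Thread(() =>
        {
            for (int i = 0; i < iterations; i++)
                WithLocks(() => counter++, pictures, albums);
        });
        var b = new Thread(() =>
        {
            for (int i = 0; i < iterations; i++)
                WithLocks(() => counter++, albums, pictures);
        });

        a.Start();
        b.Start();
        a.Join();
        b.Join();
        return counter;
    }
}
```

Calling LockLeveling.RunDemo(1000) should return 2000, because every increment happens while both mutexes are held.  The design choice here is to centralize the ordering in one helper, so individual call sites cannot get it wrong.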

Let's try something else.  The user, hoping to somehow disentangle the deadlock, tries performing the "Move" operation with the "Overwrite" checkbox checked.  However, the application is still stuck, and the WCT Deadlock Detector doesn't show any useful output except for the fact that the threads are blocked waiting for something to happen.

In that case, we have to resort to breaking in with a debugger and trying to analyze what's going on.  Since we don't know whether the service or the client is responsible for the hang, we have to guess.  So let's guess at the service (as the author of the buggy application I can make a really educated guess here).  After attaching and loading SOS, we can use the !threads and !clrstack commands to see what the individual threads are doing.  From the debugger spew we can establish the following two call stacks for the only threads that seem related to the message processing of the client request:

image

image

So both threads are waiting for a .NET Monitor (which is the underlying mechanism for the C# lock keyword).  What are they waiting on and do we have a deadlock here?  The built-in SOS commands will make it difficult for us, even though we could make use of the !syncblk extension.  However, the entire process can be fully automated by using another community debugger extension called SOSEX (so it's really a debugger extension extension).  Once we have it loaded (using .load) we can use the !dlk command to see the following deadlock beautifully detected:

image

We can see the exact objects on which the two threads are waiting for a lock, and we can see the exact point in code where each thread is stuck waiting to acquire it.  From here we can proceed to the source to fix the problem, which is very similar to the previously discussed native case (where we had kernel synchronization objects, mutexes, causing the deadlock).

Next Generation Production Debugging: Demo 2 and Demo 3

After seeing what can be done without a debugger, it was time to dive in and start experimenting with actual production debugging techniques.  I briefly explained what debugging symbols are (and how you configure your debugger to download symbols for Microsoft products automatically - just set the _NT_SYMBOL_PATH environment variable to srv*C:\Symbols*http://msdl.microsoft.com/download/symbols), and continued to demonstrate how a dump file can be generated.  The one thing many people don't know yet is that on any Windows version you can generate a dump with the tools out of the box, without resorting to any external debugging package.

On NT 5.2 and below (Windows Server 2003, Windows XP, Windows 2000 and so on) we have the built-in NT debugger, ntsd.  If you're really addicted to it you can get it to work on Windows Vista and Windows Server 2008.  You can use the following command line to capture a dump using ntsd:

ntsd -p <ProcessId> -g -c ".dump MyDump.dmp"

On NT 6.0 and above (Windows Vista and Windows Server 2008), the built-in Task Manager can capture a dump for you.  All you need to do is right-click the process and choose "Create Dump File", as in the following screenshot:

image

After you've captured the dump, it can be analyzed further on the production machine (if you have the tools) or on a development machine which is usually better equipped for the task.  Again, there aren't many people who have used Visual Studio to open a dump file, which is a pity, because if you have symbols and sources all lined up, you can get a fantastic experience.  Just fire up Visual Studio, choose File -> Open -> Project or Solution, load the dump and hit F5.  What you have now is the exact state of the application at the moment you captured the dump, as if you were debugging it live:

image

After having demonstrated that, I proceeded to show a more real-life scenario.  In that scenario, the client application issues "Batch Delete" requests for pictures, and the application's memory usage seems to go up with each such request.  This can be established by using Performance Monitor to monitor the native and managed memory usage for the client process.  Subsequently, I fired up WinDbg (from the Debugging Tools for Windows package) and attached to the process by choosing File -> Attach to Process, or using the F6 shortcut key.  (The same analysis could just as well be done by opening a dump file.)

After attaching, you get a bunch of debugger output (a.k.a. debugger spew), and a command line at the bottom which you can use to input your commands.  WinDbg also has some floating windows (such as call stack, processes and threads, memory etc.) but for our scenario they will prove less useful.

The next thing I did was load the SOS debugging extension, which is the primary debugging extension for .NET applications.  It ships with the .NET framework, so you can expect to find it on any production machine, and of course on your development machine if you have .NET installed.  Since we need to load the right version of SOS depending on the .NET version we have in the process, I issued the .loadby command:

.loadby sos mscorwks

Nothing is printed to the console, which means the debugger is happy and the extension is loaded.  This can be verified by typing !help and seeing a list of commands offered by the SOS extension.

The first command I used was !dumpheap, which proceeds to give you a summary listing of all objects on your GC heap.  This is an extremely useful command because to track down a memory leak you first need to know which objects are occupying the memory that is not being released.

!dumpheap -stat

The debugger output in this case was significantly more elaborate.  It is sorted by total object size in ascending order:

image

The first column here is the method table address, the second column is the number of objects we have from that type, the third column is the total amount of memory these objects occupy, and the final column is the human-readable type name.  So what we know now is that we have 660 arrays of bytes occupying 8.7MB of memory.  If we try the "Batch Delete" command again, we will see even more such arrays.

How do we look at a list of specific array objects?  Well, issue the !dumpheap command again, with the method table address for the System.Byte[] entry (you could also use -type System.Byte[] but it's less accurate because it does string matching against the type name):

!dumpheap -mt 7912dae8

This produces the following output, sorted by object size in ascending order again:

image

So, there are some objects that particularly stand out.  I have performed a "Batch Delete" operation on four pictures, and there are four large arrays of bytes that are for some reason not garbage collected.

Who is keeping these arrays from being garbage collected?  To determine that, we need to take the addresses of these individual arrays and issue the !gcroot command (we can also write a loop in the debugger to automate the work for us):

!gcroot 04476ed0

The output for any one of these four objects will look very similar to the following:

image

What we see here is that a stack location (reported under ESP) is holding a reference to an Application.ThreadContext object, which holds a reference to the PictureViewerForm (which is the primary form for our application).  The form holds a reference to an EventHandler, which holds a reference to an array of objects, which holds a reference to an EventHandler, which holds a reference to a PictureViewerForm.NotifyOnDeleteSubscriber object, which holds the array of bytes we were looking for in the first place.

This surely looks like an event handler that was wired up to an event and never removed when appropriate, and we can even find out which event it is.  Let's remember the address of the first EventHandler in this list; in this case it's 02eabf44.  Additionally, let's issue a !do (dump object) command on the form's address:

!do 02e23c08

The result is:

image

If you examine this listing, you'll see the address I asked you to remember as the address of the instance field called NotifyOnDelete.  This is certainly an event handler.  (For many more walkthroughs on SOS, I suggest you stop whatever you're doing right now and go read the fantastic blog at http://blogs.msdn.com/tess by an Escalation Engineer working for Microsoft Sweden and doing some fantastic work utilizing and evangelizing WinDbg and SOS.)
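The root chain we just walked through boils down to a very small pattern, which the following C# sketch tries to reproduce.  The Publisher and Subscriber types are hypothetical stand-ins for the form and the NotifyOnDeleteSubscriber object (the real demo code is not shown here), and the 1MB buffer size is arbitrary.  As long as the subscriber's handler stays registered, the publisher's delegate keeps the subscriber and its byte array reachable.

```csharp
using System;
using System.Runtime.CompilerServices;

// Hypothetical stand-in for the long-lived form that exposes the event.
public sealed class Publisher
{
    public event EventHandler NotifyOnDelete;

    public int SubscriberCount
    {
        get { return NotifyOnDelete == null ? 0 : NotifyOnDelete.GetInvocationList().Length; }
    }
}

// Hypothetical stand-in for a short-lived subscriber that owns a large buffer.
public sealed class Subscriber
{
    private readonly byte[] _pictureData;

    public Subscriber(int size) { _pictureData = new byte[size]; }

    public int Size { get { return _pictureData.Length; } }

    public void OnDelete(object sender, EventArgs e) { }
}

public static class EventLeakDemo
{
    // Registers a subscriber, optionally unregisters it, forces a GC and
    // reports whether the subscriber object is still alive afterwards.
    public static bool SubscriberSurvivesGc(Publisher publisher, bool unsubscribe)
    {
        WeakReference weak = Subscribe(publisher, unsubscribe);
        GC.Collect();
        GC.WaitForPendingFinalizers();
        GC.Collect();
        bool alive = weak.IsAlive;
        GC.KeepAlive(publisher); // the publisher itself must outlive the check
        return alive;
    }

    // Kept in a separate, non-inlined method so that no local variable in the
    // caller holds a strong reference to the subscriber.
    [MethodImpl(MethodImplOptions.NoInlining)]
    private static WeakReference Subscribe(Publisher publisher, bool unsubscribe)
    {
        var subscriber = new Subscriber(1024 * 1024);
        publisher.NotifyOnDelete += subscriber.OnDelete;
        if (unsubscribe)
        {
            publisher.NotifyOnDelete -= subscriber.OnDelete; // the fix
        }
        return new WeakReference(subscriber);
    }
}
```

With unsubscribe set to false the method returns true, because the event's delegate still references the subscriber.  With unsubscribe set to true the subscriber is normally collected, although the exact moment depends on the runtime and build configuration, so I wouldn't rely on it in an automated check.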

To automate the process and make it significantly more visual, you can use another obscure extension command called !traverseheap.  This command generates a log file (that can be binary or XML formatted), which you can then feed to CLR Profiler or your own XML reader which analyzes and visualizes the output.

!traverseheap HeapStatus.log

When you launch CLR Profiler and tell it to open the log file, you get a gray dialog with a "Show Heap Graph" button on the bottom right:

image

After clicking that, you get a nice visual representation of what's going on in your application.  Here's what it looks like:

image

If you continue following the graph along the thickest line, you'll eventually find the same objects we were looking at earlier, to get a visual confirmation of where our memory leak stems from:

image

If you are still uncertain, or want to try a different approach, there's a very cool tool called Hawkeye.  For WinForms applications, it provides a runtime object explorer and editor, allowing you to view and edit object properties, invoke methods, view event registrations and lots of other interesting things.  So if we just launch Hawkeye, we get the following dialog, with a large button waiting to be dragged to a form or control within a form:

image

If we drag it to our application's main form, we get the following window, displaying what's going on in our form (the magic works by injecting a DLL into our process):

image

If we click on the events tab, we see the list of events in our client application, and can view the registered subscribers for each event:

image

If you right-click one of the subscribers, you can analyze the tree further by viewing all properties to see where the byte array actually joins in:

image

Since I mentioned SOS, it's also important to note another community tool called SOSAssist, which has the potential of providing a visual user interface on top of SOS commands.  The current state of the tool is rather rudimentary, but with enough community support it can be picked up and extended into the state-of-the-art Production Debugging Studio we all deserve.

A quick note regarding SOSAssist on 64-bit systems: if you try attaching to a process on a 64-bit system, SOSAssist will crash regardless of whether the target process is 32-bit or 64-bit (the latter is currently not supported one way or another).  The reason for that is that the SOSAssist main executable is compiled as IL only without a bitness preference, so on a 64-bit system it will be loaded as a 64-bit process.  However, some of the controls it uses are within 32-bit assemblies which will then fail to load.  This can be easily mitigated by using the corflags tool:

corflags SOSAssist.exe /32bit+

... and voila, SOSAssist works on a 64-bit system!  (This is a case of production debugging a production debugging tool, which really demonstrates that even production debugging is not without its sense of irony.)

Next Generation Production Debugging: Slides, Code and Demo Transcripts (Demo 1)

To each and every one of you who attended my TechEd session - thanks!  There are so many interesting talks and I appreciate the fact you have chosen mine.

As I promised, this post is a summary of slides, demo code and the transcripts of each demo I've shown throughout the session.  (As soon as the session recording becomes available, I will update this post with a link to it.)

I divided the demo transcripts into a series of posts because they are fairly long.  You can find everything I've written regarding the TechEd at the TechEd08 blog tag.

First things first - the slides (PPT 2003) and the demo code solution.  (You can build the demos as 64-bit or 32-bit, they work either way.)

My primary objectives in this session have been to:

  1. Make you aware of the business importance of production debugging;
  2. Introduce you to tools that make production debugging as fun and as easy as it can be;
  3. Emphasize the importance of understanding the theory behind the tools and the frameworks that we're using.

The issues we have looked at included (in chronological order, not necessarily in order of complexity):

  1. Using the Application Compatibility Toolkit to fake an OS version to an application so that it agrees to run;
  2. Capturing a dump file (using Task Manager, ADPlus, and a variety of other tools) and opening it in Visual Studio;
  3. Dissecting a managed memory leak using WinDbg, SOS, CLR Profiler and Hawkeye;
  4. Analyzing native and managed deadlocks using Vista's WCT and SOSEX's deadlock detector;
  5. Disentangling a thread waiting for an invalid handle using WinDbg, the !htrace extension and Application Verifier;
  6. Understanding the activity of an application from a bird's eye view and in a detailed graph using the Windows Performance Toolkit (xperf).

I've blogged about some of those tools in the past in a comprehensive post titled Debugging and Investigation Tools.

The first demo I've shown has to do with the options we have before diving in and looking at debugger spew.  In the first demo, when I launched the PictureViewer client, I encountered the following screen:

image

Clearly, this application has got to be kidding.  I'm on Vista!  What is this nonsense about upgrading to Windows 2000 or Windows XP?

Well, it appears that this application is somehow detecting the operating system version, and treating that information incorrectly.  If only the original application developers (that's myself) had used a future-OS emulating tool such as Application Verifier...  Oh well, what can you do.
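To illustrate what "treating that information incorrectly" can look like, here is a small hypothetical sketch in C#.  It is an assumption about the kind of check such an application might contain, not the actual PictureViewer code: the OsVersionCheck class and both method names are made up for this example.  The buggy variant accepts only the exact versions the developer knew about at the time, so a newer OS is rejected.

```csharp
using System;

public static class OsVersionCheck
{
    // Hypothetical buggy check: accepts only Windows 2000 (5.0) and
    // Windows XP (5.1), so Windows Vista (6.0) is rejected.
    public static bool IsSupportedBuggy(Version os)
    {
        return os.Major == 5 && (os.Minor == 0 || os.Minor == 1);
    }

    // A more future-proof check: anything at or above Windows 2000 passes.
    public static bool IsSupported(Version os)
    {
        return os >= new Version(5, 0);
    }
}
```

In a real application the argument would typically be Environment.OSVersion.Version.  Passing new Version(6, 0) to the buggy check returns false, while the minimum-version check returns true, which is the behavior we would have wanted on Vista.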

To understand how the application is detecting the OS version we have two paths to follow.  The first alternative is using Process Monitor to see if the application is accessing the registry or some file paths to determine the OS version.  The second alternative is using APIMon or API Logger to determine whether the application is calling an API such as GetVersionEx.

Once we have established the source of the application's fallacy, we can resolve the "problem" immediately.  The Application Compatibility Toolkit features the ability to produce a compatibility shim which is basically a big bunch of lies that Windows can tell to your application.  (By the way, the toolkit won't install on a 64-bit OS.)

One of these lies is a version lie - the compatibility shim can emulate the GetVersionEx API or the relevant registry keys to make sure the application really thinks it's running on a different version of Windows.  There are many cool compatibility shims and much more work going on at Microsoft in this direction, but the one we are looking for in this case is either the VirtualRegistry shim (to redirect registry access) or the other version lie shims.

What you'll need to do is open the Compatibility Administrator and add a compatibility database for the PictureViewer client application.  The compatibility fix you will need to add is the VirtualRegistry fix, and the command line for that fix can be "NT51" (meaning the registry keys relevant for OS version detection will emulate Windows XP).  When you're done, you can save the compatibility database to an SDB file.  (Just looking at the list of preinstalled compatibility fixes makes you realize the power and potential of the AppCompat shims.)

One way or another, once we have the shim ready as an SDB file, all we need to do is install it on the target machine using the sdbinst command line utility:

image

After doing that, the client application launches successfully.

(Stay tuned as more posts are added with the complete set of demos I've shown at my session.  You can also subscribe to my TechEd08 tag RSS.)

TechEd Eilat 2008: Keynote and Brainteasers

This TechEd's keynote featured .NET brainteasers, an idea originally conceived by Amir Shevat.  The idea is to interrupt the keynote once in a while to ask a short technical question.  The answer seems obvious at first, but there's always a catch.  Following today's keynote, here are the two brainteasers I presented and the explanations.

Brainteaser #1

What does the following code output in the Release build?  (Note the emphasis.)

using System;
using System.Threading;

class Program
{
    static void Main(string[] args)
    {
        Timer t = new Timer(
            delegate { Console.WriteLine(
                DateTime.Now.ToString()); },
            null, 0, 100);
        Console.ReadLine();
    }
}

The possible answers were:

  1. Does nothing because the local variable is garbage collected;
  2. Displays the current date indefinitely until the user presses Enter;
  3. Displays the current date for a limited number of times;
  4. Crashes because the timer's finalizer gets called.

The correct answer is #3, which is pretty shocking for most developers.  During the session I showed the above code (with a slight modification) running in the debug build and in the release build.  In debug, it proceeds to run indefinitely; in release, it stops after a certain number of times.

Why, you ask?  Well, the timer stops at the first garbage collection that occurs in the application.  So first things first - why would a GC occur?

In the delegate registered to the timer we have a memory allocation in the DateTime.Now.ToString() call (which might seem invisible to the trained managed developer's eyes).  This memory allocation (sooner or later) causes a GC.

Now, why is the timer local variable being collected?  It seems like the main method is still using it...  But in fact, it isn't.  As soon as we get to the Console.ReadLine() line, the local variable is no longer an active root which is considered by the GC when building the object graph.  It makes sense, too, because we want to reclaim memory as soon as it is not referenced by our application, and once we get to the Console.ReadLine(), the timer is indeed not referenced by our application.  (In the debug build the JIT extends the lifetime of every local variable to the end of the method, which is why the timer keeps running there.)  (For a deeper explanation of GC roots and the subtleties of the .NET garbage collector, you can refer to Jeffrey Richter's excellent CLR via C# book, or my course at Sela titled .NET Performance.)
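The usual way to avoid this surprise is to keep the timer referenced for as long as it is needed, for example by wrapping it in a using block (the Dispose call at the end of the block counts as a use) or by calling GC.KeepAlive(t) after Console.ReadLine().  The following is a minimal sketch of the first option; the TimerLifetime class, the CountTicksAfterGc method and the 10 ms period are my own choices for the example and did not appear in the keynote.

```csharp
using System;
using System.Threading;

public static class TimerLifetime
{
    // Starts a 10 ms timer, forces a garbage collection, and returns how many
    // ticks were observed AFTER the collection during the given wait.
    public static int CountTicksAfterGc(int waitMilliseconds)
    {
        int ticks = 0;

        // The using block references the timer until Dispose runs at the end,
        // so the GC cannot treat it as unreachable in the meantime.
        using (var timer = new Timer(_ => Interlocked.Increment(ref ticks), null, 0, 10))
        {
            GC.Collect();
            GC.WaitForPendingFinalizers();

            int before = Volatile.Read(ref ticks);
            Thread.Sleep(waitMilliseconds);
            return Volatile.Read(ref ticks) - before;
        }
    }
}
```

Calling TimerLifetime.CountTicksAfterGc(300) should return a positive number in both the debug and the release build, because the timer is still reachable when the collection happens.  The exact count depends on thread pool scheduling, so I am deliberately not quoting one.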

Brainteaser #2

Which is the faster way to iterate over a random-access collection (such as List<T> or an array) - "for" or "foreach"?

The possible answers were:

  1. Always "for" because "foreach" creates an enumerator object, and it's garbage;
  2. Always "foreach", because the enumerator can pre-fetch elements from the collection before they are being accessed;
  3. "for" for most collections, but the same performance for arrays because of compiler optimizations;
  4. "foreach" for most collections, but the same performance for arrays because of compiler optimizations.

The correct answer is again #3, which is again quite unexpected.  This is a really fundamental issue, which questions our understanding of the .NET platform.  So why is "for" faster for a collection like a List<T>, and why is the performance identical for arrays?

Let's start with the easy part.  The C# compiler emits semantically identical IL whether you use "for" or "foreach" to iterate an array.  Two copies of identical code will give you the same performance footprint.  So that's great, and makes perfect sense.

However, for a List<T> or any other random-access collection you might come up with, the code generation is a little bit different.  Let's take a look at the conceptual steps we should go through when iterating over a List<T> using "for" and "foreach".

Starting with "for":

List<int> list = new List<int>();
int sum = 0;
for (int i = 0; i < list.Count; ++i)
{
    sum += list[i]; //sum += list.get_Item(i)
}

So if we look at the list[i] line, we see that there's an underlying method call to access the element.  Let's take a look at "foreach":

List<int> list = new List<int>();
int sum = 0;
var enumerator = list.GetEnumerator();
while (enumerator.MoveNext())
{
    sum += enumerator.Current;  //sum += enumerator.get_Current()
}

This time, we have two method calls for each iteration - that's the implementation of .NET enumerators.

I can already hear your objections - the single most important optimization the JIT compiler can offer us is method inlining.  But in this particular case, inlining doesn't close the gap!  Why not?  This could be a brainteaser of its own, but I'll kindly give the answer away.  When a collection is enumerated through the IEnumerable<T> and IEnumerator<T> interfaces, MoveNext and Current are not explicitly virtual (you won't see the virtual keyword), but they are implicitly virtual because the methods are part of an interface.  And since the methods are virtual and are part of an interface, they can't be inlined.  List<T> softens this by exposing a public GetEnumerator method that returns a struct enumerator, so "foreach" directly over a List<T> variable avoids the interface dispatch; even then, every iteration still runs MoveNext (which also checks that the list wasn't modified) followed by Current, which is more work than a single indexer call.  Either way, we can't get performance equal to a built-in array, and "foreach" is slower because it incurs two method calls per iteration as opposed to "for" which incurs a single method call.
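If you want to check this on your own machine, a rough way of doing it is sketched below.  The IterationTiming class and its method names are hypothetical, and a Stopwatch loop like this is only a crude measurement: the results depend heavily on the runtime version, the JIT and the build configuration, so treat it as a starting point rather than as a proper benchmark.

```csharp
using System;
using System.Collections.Generic;
using System.Diagnostics;

public static class IterationTiming
{
    // One indexer call (get_Item) per iteration.
    public static long SumWithFor(List<int> list)
    {
        long sum = 0;
        for (int i = 0; i < list.Count; ++i)
        {
            sum += list[i];
        }
        return sum;
    }

    // Uses the struct enumerator returned by List<T>.GetEnumerator().
    public static long SumWithForeach(List<int> list)
    {
        long sum = 0;
        foreach (int item in list)
        {
            sum += item;
        }
        return sum;
    }

    // Forces enumeration through the IEnumerable<int> interface.
    public static long SumViaInterface(IEnumerable<int> items)
    {
        long sum = 0;
        foreach (int item in items)
        {
            sum += item;
        }
        return sum;
    }

    // Runs the body once as a warm-up (so JIT time is excluded) and then
    // times the requested number of repetitions.
    public static TimeSpan Measure(Func<long> body, int repetitions)
    {
        body();
        Stopwatch watch = Stopwatch.StartNew();
        for (int i = 0; i < repetitions; i++)
        {
            body();
        }
        watch.Stop();
        return watch.Elapsed;
    }
}
```

A typical use is to fill a List<int> with a few million elements, call Measure once per method with the same repetition count, and compare the elapsed times in a release build without a debugger attached.  All three methods return the same sum, so the only thing that differs is the iteration mechanism.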

This was a brief summary of the brainteasers at today's keynote.  I hope you enjoyed it, I sure did, and I am looking forward to future TechEd keynotes where this idea could be expanded to an even better format.  A big thank you goes to Microsoft Israel for making this fantastic event possible!

TechEd Eilat 2008: Less Than Two Hours Left

The TechEd Eilat 2008 is getting closer - the Developers Keynote is less than two hours ahead.  I'm here in Eilat since yesterday afternoon, making sure everything works for my lecture and assisting Yochay (our keynote speaker) with final preparations for the keynote.

image

If you couldn't make it to the TechEd, stay tuned with the TechEd website, featuring live content, blogs, pictures and videos from Eilat.

image

The Live It! spirit is everywhere, and there are so many people working hard for several days to make this huge event possible with the minimal number of glitches.  To all of them I extend my huge blanket thank you!  I'm sure it will all go well beyond expectations, so see you around in Eilat, and don't forget to come to my session - DEV444: Next Generation Production Debugging (link to video promo).

Tales from an Interview

For the past couple of years, my side of Tales from an Interview (from "The Daily WTF") has been from the interviewer's position.  Sela Technology Center and Sela College are constantly looking for talented professionals to fill a variety of positions - developers, architects, to-be-consultants, lecturers and a combination of these jobs.

I have my own share of funny stories from interviews.  Probably the funniest one lately was a guy with several years of .NET experience who kept tackling even the most trivial of questions with answers like:

  • If I had to do <something>, I'd go look it up on the web.
  • If I had to do <something>, you can rest assured it's going to be the best <something> you have ever seen!
  • If I had to do <something>, it would be a <something> that would sell for millions and there would be no bugs in it because I would test it so well!
  • If I had to do <something>, I would instead do <something else entirely> because <something> is not interesting at all.

Now replace <something> with a random simple question like putting objects in a collection, loading information from a database table, or serving a dynamic web page to a visitor, and you get the picture.

Anyway, for now, I'll just give you my list of questions that I ask in an interview.  This is by no means a full list, just some of the stuff I have been asking recently.  Some of these questions are fairly difficult; I obviously ask them in ascending order of complexity.  BTW, you won't see any ASP.NET/WPF/WCF/TLA questions here, because I wanted to show the bare minimum.

If you can answer every single question without so much as a blink of an eye, I'll certainly love to know where you live and when we can send a SWAT team to seize you for a job at Sela ;-)

For .NET developers:

  • What are the methods of System.Object?  (Name at least 3, and as you name them, explain what their job is.)
  • What does Object.GetHashCode return?  What are the requirements of an object's hash code?
  • How are hash codes implemented differently for value types and reference types?
  • What are the primary differences between value types and reference types?  (Internal representation, allocation, destruction...)
  • What facilities does the System.Type class offer?
  • What is reflection?  Discuss using reflection for obtaining values of internal or private members, or invoking internal or private methods.  Discuss the security issues behind reflection.
  • How would you implement object serialization?  What are the obstacles?  What different kinds of serialization does .NET offer?
  • How would you implement a singleton?  Discuss possible problems with the implementation.
  • When does a .NET static constructor get called?
  • What's the purpose of the .NET GC?  How does it work?  Are there any memory leaks in .NET?  How do you find them?
  • What's the purpose of a finalizer in a .NET class?  Why do we need it?  How do we release unmanaged resources deterministically?
  • What kind of advantages can the JIT have over pre-runtime compilation?
  • What IPC technologies have you used in .NET?  Compare and contrast.
  • What's an AppDomain?  What does it enable?  How do you use it?
  • How are events implemented by the compiler?
  • How do you load an assembly dynamically?  What's the difference between loading by name and loading by path?

For C++ developers:

  • What are the uses of the const keyword?  Why is it important?
  • How is a const method implemented?  (I.e., what stops me from modifying state.)
  • How would you express an interface to a class in C++?
  • How do virtual functions get called (at assembly level)?
  • What is inlining?  How do you achieve inlining?  How do you achieve inlining across translation units?
  • What are templates?  Show some common usages of templates.  Discuss uses of meta-programming.
  • What are STL iterators all about?  What's the advantage compared to using plain old pointers?
  • How would you implement a string class, encapsulating a string of characters?  (With several operators like +, +=, ==, =.)
  • What are exceptions?  How would you deal with resource management when every single line of code can throw an exception?
  • How would you diagnose memory leaks at the language level?

General questions:

  • How do you sharpen your technological skills?  What are the last 3 books, blogs, magazines that you have read, and what have you learned from them?
  • Have you been playing around with the latest bits of <insert here>?
  • What's a thread?  Why would you use a thread?  What kind of problems ensue?
  • How do multiple threads appear to run simultaneously on a single processor?  Who maintains the illusion?  Discuss the illusion.
  • What enables me to use more memory in my applications than I have physically installed on my machine?
  • How do you address a situation where you have more client load than your single server can handle?  Discuss distribution scenarios and technologies.

There are certainly more that I can't remember, but the above list as it is doesn't fit in an hour of an interview anyway.
