June 2010 - Posts - All Your Base Are Belong To Us

All Your Base Are Belong To Us

Mostly .NET internals and other kinds of gory details


STA Objects and the Finalizer Thread: Tale of a Deadlock

Here’s a non-trivial deadlock that arises from combining a non-pumping wait API with a finalizer. It is another example of why finalizers are a dangerous cleanup mechanism and why you should avoid them at all costs.

Let’s say that you have an STA COM object called SimpleComObject that your managed application is using, and you wrap the COM object with a class called FinalizableResource. This class has a finalizer that cleans up resources associated with the COM object, either by calling a cleanup method on it or even by forcing the release of the underlying COM object with Marshal.FinalReleaseComObject.

Note that the object is STA, which means that if you created it on an application thread, the finalizer thread can’t call it directly: it has to send a Windows message to the object’s STA thread and have that thread perform the call. This completes the picture of a possible deadlock. If the STA thread is blocked in a wait that does not pump messages, waiting for something the finalizer thread is supposed to do, and the finalizer thread makes a COM call into the STA, the two threads end up waiting for one another.

Fortunately, most .NET synchronization APIs use the moral equivalent of MsgWaitForMultipleObjects (or CoWaitForMultipleHandles), APIs that pump messages while waiting. However, if you resort to native, non-pumping wait APIs such as WaitForSingleObject (for example, if your STA thread calls into unmanaged code that waits on a handle), you might encounter this deadlock.
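For contrast, here is a minimal sketch of waiting through the managed API instead of a P/Invoke. The StaFriendlyWait class and its WaitWithPumping method are hypothetical names used only for illustration; the real work is done by WaitHandle.WaitOne. On an STA thread on Windows, the CLR performs this wait in a way that keeps pumping COM messages, so an incoming cross-apartment call (such as one from the finalizer thread) can still be dispatched. On an MTA thread, or on other platforms, it is simply an ordinary wait.

```csharp
using System;
using System.Threading;

// Hypothetical helper: waits through the CLR rather than calling
// kernel32!WaitForSingleObject directly via P/Invoke.
static class StaFriendlyWait
{
    // Returns true if the handle was signaled before the timeout elapsed.
    // When called on an STA thread (on Windows), the CLR pumps COM messages
    // during this wait, so cross-apartment calls into this thread can proceed.
    public static bool WaitWithPumping(WaitHandle handle, TimeSpan timeout)
    {
        if (handle == null) throw new ArgumentNullException(nameof(handle));
        return handle.WaitOne(timeout);
    }
}
```

In the repro that follows, this corresponds to calling waitOn.WaitOne(...) instead of the P/Invoke to WaitForSingleObject.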

Here is some sample code that reproduces the problem (assuming, of course, that you have an STA COM object called SimpleComObject on hand).

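using System;
using System.Runtime.InteropServices;
using System.Threading;
// Assumes an interop assembly (for example, one generated by tlbimp)
// that exposes SimpleComObject and ISimpleComObject.

namespace ManagedApp
{
  class FinalizableResource
  {
    ISimpleComObject _obj;
    EventWaitHandle _signalWhenDone;

    public FinalizableResource(EventWaitHandle signalWhenDone)
    {
      // Created on the main (STA) thread, so the object lives in that apartment.
      _obj = new SimpleComObject();
      _signalWhenDone = signalWhenDone;
    }

    ~FinalizableResource()
    {
      // Runs on the finalizer thread, which is not in the object's STA, so the
      // release requires a cross-apartment call into the main thread.
      // Deadlock here:
      Marshal.FinalReleaseComObject(_obj);

      _signalWhenDone.Set();
    }
  }

  class Program
  {
    [DllImport("kernel32.dll")]
    static extern uint WaitForSingleObject(
        IntPtr handle, uint timeout);

    [STAThread]
    static void Main(string[] args)
    {
      ManualResetEvent waitOn = new ManualResetEvent(false);
      FinalizableResource r = new FinalizableResource(waitOn);
      r = null;
      GC.Collect();      // The finalizer will be called soon

      // Non-pumping native wait: the main thread never dispatches the
      // finalizer thread's call, so the event is never set.
      // Deadlock here:
      WaitForSingleObject(waitOn.SafeWaitHandle.DangerousGetHandle(), 100000);
    }
  }
}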

(Note that the “r = null” line might seem redundant because the local variable is never used after the line where it is declared. However, in Debug builds, local variables are reported as GC roots until the end of the method, so without that line the object would not be collected.)

Here’s what it looks like in the debugger:

0:000> kc 20
ntdll!NtWaitForSingleObject
KERNELBASE!WaitForSingleObjectEx
KERNEL32!WaitForSingleObjectExImplementation
KERNEL32!WaitForSingleObject
0x0
clr!CallDescrWorker
clr!SigParser::GetElemType
clr!MetaSig::MetaSig
0x0
clr!MethodDesc::GetSigFromMetadata

~0s0:002> kc 20
ntdll!NtWaitForSingleObject
KERNELBASE!WaitForSingleObjectEx
KERNEL32!WaitForSingleObjectExImplementation
KERNEL32!WaitForSingleObject
ole32!GetToSTA
ole32!CRpcChannelBuffer::SwitchAptAndDispatchCall
ole32!CRpcChannelBuffer::SendReceive2
ole32!CAptRpcChnl::SendReceive
ole32!CCtxComChnl::SendReceive
ole32!NdrExtpProxySendReceive
RPCRT4!NdrpProxySendReceive
RPCRT4!NdrClientCall2
ole32!ObjectStublessClient
ole32!ObjectStubless
ole32!CObjectContext::InternalContextCallback
ole32!CObjectContext::ContextCallback
clr!CtxEntry::EnterContext
clr!RCW::ReleaseAllInterfacesCallBack
clr!RCW::Cleanup
clr!RCW::FinalExternalRelease
clr!MarshalNative::FinalReleaseComObject
mscorlib_ni
clr!MethodTable::SetObjCreateDelegate
clr!MethodTable::SetObjCreateDelegate
clr!MethodTable::CallFinalizer
clr!WKS::CallFinalizer
clr!WKS::GCHeap::TraceGCSegments
clr!WKS::GCHeap::TraceGCSegments
clr!WKS::GCHeap::FinalizerThreadWorker
clr!Thread::DoExtraWorkForFinalizer
clr!Thread::ShouldChangeAbortToUnload
clr!Thread::ShouldChangeAbortToUnload

That is, the main thread is blocked in a direct call to WaitForSingleObject, which does not pump messages, while the finalizer thread, in its attempt to release the COM object, needs to make a cross-apartment call into the STA thread. Each thread is waiting for the other.
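A way out that follows the advice at the top of this post is to drop the finalizer and release the COM object deterministically, on the STA thread that created it, so no cross-apartment call is needed. Here is a minimal sketch under stated assumptions: DisposableResource is a hypothetical name, and the release logic is passed in as an Action so the example runs without a COM object. In the repro, that action would be something like () => Marshal.FinalReleaseComObject(_obj), and Main would call Dispose (or use a using block) before waiting.

```csharp
using System;
using System.Threading;

// Hypothetical replacement for FinalizableResource. It has no finalizer,
// so cleanup never runs on the finalizer thread; the owner calls Dispose
// on the thread that created the COM object.
sealed class DisposableResource : IDisposable
{
    // In real code, e.g. () => Marshal.FinalReleaseComObject(comObject).
    private Action _release;

    public DisposableResource(Action release)
    {
        _release = release ?? throw new ArgumentNullException(nameof(release));
    }

    public void Dispose()
    {
        // Take the callback exactly once, so repeated Dispose calls are harmless.
        Action release = Interlocked.Exchange(ref _release, null);
        release?.Invoke();
    }
}
```

Because Dispose runs on whichever thread calls it, the caller remains responsible for calling it from the object’s STA thread.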

MDA: Callback On Garbage Collected Delegate

One day I’m going to write a long, detailed post about an incredible tool called Managed Debugging Assistants (MDAs). But today is not that day. Instead, I would like to ignite your interest in MDAs by showing you how one of them immediately pinpoints the cause of a non-trivial bug.

Oren writes:

[…] if you run [this code] on a background thread and continue to do additional operations […] it will crash, sometimes with a null reference exception, sometimes with attempt to write to protected memory, etc.

There is a very subtle bug here, can you figure out what it is?

Luckily, this bug is not subtle enough to escape automatic detection. But first things first. I wrote a simple test case that is fairly similar in spirit to the bug Oren is referring to. Then, I ran the executable and received the following exception:

[Screenshot: the access violation exception]

An access violation, all right, but what’s the precise location that caused the problem? The DoWork method is a P/Invoke call, but is it responsible for the memory corruption?

Opening the application’s crash dump yields the following call stack for the crash:

0:000> kn
0x120b92
NativeDll!DoWork+0x41 [nativedll.cpp @ 10]
0x3d0195
clr!CallDescrWorker+0x33

0:000> u 00120b92
00120b92 0000            add     byte ptr [eax],al
00120b94 0000            add     byte ptr [eax],al
00120b96 0000            add     byte ptr [eax],al
00120b98 9f              lahf
00120b99 f79d6a286a00    neg     dword ptr [ebp+6A286Ah]
00120b9f 1c01            sbb     al,1
00120ba1 0000            add     byte ptr [eax],al
00120ba3 0000            add     byte ptr [eax],al

The code that DoWork calls into looks like a random sequence of instructions (mostly zero bytes), which suggests that DoWork jumped through a function pointer that no longer points at valid code. Still, it’s not immediately evident from what we’ve seen so far where the culprit lies. However, let’s see what happens if we run the application from within Visual Studio, with the debugger attached:

[Screenshot: the CallbackOnCollectedDelegate MDA message in Visual Studio]

Visual Studio hands us the error on a silver platter. To be precise, the CallbackOnCollectedDelegate MDA hands us this bug: it is an automatic diagnostic tool that pops up in the middle of your debugging session and shows you the error of your ways. The delegate passed to unmanaged code must be kept alive for as long as unmanaged code might call through it, which means at least until the P/Invoke call returns, and longer if the native side stores the function pointer. If the delegate is garbage collected and unmanaged code then calls through it, havoc ensues.
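The usual fix is to make sure managed code holds a reference to the delegate for as long as unmanaged code may call it, for example by storing it in a field (or by calling GC.KeepAlive after a synchronous call returns). The sketch below needs no native DLL: it uses Marshal.GetFunctionPointerForDelegate to obtain the kind of function pointer a P/Invoke marshals for a delegate argument, and keeps the delegate rooted in a field for the lifetime of the registration. The CallbackRegistration type and the ProgressCallback signature are illustrative assumptions, not the code from Oren’s post.

```csharp
using System;
using System.Runtime.InteropServices;

// Illustrative signature of a callback that unmanaged code would invoke.
[UnmanagedFunctionPointer(CallingConvention.Cdecl)]
delegate void ProgressCallback(int percent);

// Keeps the delegate in a field so the GC cannot collect it while
// unmanaged code still holds the function pointer.
sealed class CallbackRegistration : IDisposable
{
    private ProgressCallback _callback;   // the root that keeps the thunk valid

    public IntPtr FunctionPointer { get; private set; }

    public CallbackRegistration(ProgressCallback callback)
    {
        _callback = callback ?? throw new ArgumentNullException(nameof(callback));
        // The pointer a P/Invoke would hand to native code for a delegate argument.
        FunctionPointer = Marshal.GetFunctionPointerForDelegate(_callback);
    }

    public void Dispose()
    {
        // Drop the reference only once native code can no longer call back.
        _callback = null;
        FunctionPointer = IntPtr.Zero;
    }
}
```

The key design point is the lifetime: the registration object (and thus the delegate) must outlive every possible call from the native side, not merely the P/Invoke that hands over the pointer.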

CallbackOnCollectedDelegate has many other MDA friends lurking in the Visual Studio “Exceptions” dialog. You can enable and disable MDAs during your Visual Studio debugging session, or in advance using a special configuration file.

As I said, I’m looking forward to writing a couple more posts on MDAs. Until then, you might find the following resources useful:

Assembly.ReflectionOnlyLoad Ignores Assembly Binding Redirects

This is a short post to make you aware of the fact that Assembly.ReflectionOnlyLoad does not honor assembly binding redirects.

Assembly binding redirection allows you to specify, at the machine or application level, that if an application attempts to load a certain version of an assembly, it should load another version instead. For example, I don’t have .NET 1.0 installed on my system, but .NET 1.0 applications can run successfully because there are binding redirects in place. Specifically, when an application requests version 1.0.3300.0 of System.Data.dll, a binding redirect gives it version 4.0.0.0.

This is what a Fusion log for this redirect looks like (edited for brevity):

=== Pre-bind state information ===
LOG: DisplayName = System.Data, Version=1.0.3300.0, Culture=neutral, PublicKeyToken=b77a5c561934e089
(Fully-specified)
===
LOG: This bind starts in default load context.
LOG: No application configuration file found.
LOG: Using host configuration file:
LOG: Using machine configuration file from C:\Windows\Microsoft.NET\Framework\v4.0.30319\config\machine.config.
LOG: Version redirect found in framework config: 1.0.3300.0 redirected to 4.0.0.0.
LOG: Post-policy reference: System.Data, Version=4.0.0.0, Culture=neutral, PublicKeyToken=b77a5c561934e089
LOG: Reusing an assembly instance that was previously loaded (C:\Windows\Microsoft.Net\assembly\GAC_32\System.Data\v4.0_4.0.0.0__b77a5c561934e089\System.Data.dll).

On the other hand, Assembly.ReflectionOnlyLoad disregards binding redirects and loads only the specific assembly version that you requested. Therefore, trying to load the 1.0.3300.0 System.Data.dll using a reflection-only load produces this Fusion log and an exception:

=== Pre-bind state information ===
LOG: DisplayName = System.Data, Version=1.0.3300.0, Culture=neutral, PublicKeyToken=b77a5c561934e089
(Fully-specified)
===
LOG: This is an inspection only bind.
LOG: No application configuration file found.
LOG: Using host configuration file:
LOG: Using machine configuration file from C:\Windows\Microsoft.NET\Framework\v4.0.30319\config\machine.config.
LOG: GAC Lookup was unsuccessful.

LOG: All probing URLs attempted and failed.

Note that after trying the GAC, the loader also looks for the assembly in the AppDomain’s base directory and probing path, which can lead to pretty surprising exception messages. Hopefully, this behavior won’t bite you now that you know it is by design.

[By the way, this behavior affected me because I had some code that loaded assemblies for reflection only and pre-loaded their references (references are not loaded automatically for reflection-only assemblies). The pre-loading failed whenever a reference required an assembly binding redirect to resolve.]
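If you need redirected references during reflection-only loading, one possible workaround is to apply binding policy yourself before loading. The sketch below assumes .NET Framework, where AppDomain.ApplyPolicy returns an assembly display name after policy has been applied. I have not verified that it covers every kind of redirect (for example, the framework-config redirect shown in the first Fusion log), so check it against your own scenario. The ReflectionOnlyLoader class is a hypothetical name. On .NET Core and later, reflection-only loading is not supported at all, and Assembly.ReflectionOnlyLoad throws PlatformNotSupportedException.

```csharp
using System;
using System.Reflection;

// Hypothetical helper that applies binding policy manually, because
// Assembly.ReflectionOnlyLoad ignores binding redirects.
static class ReflectionOnlyLoader
{
    // Display name the regular loader would bind to after policy is applied
    // (assumption: ApplyPolicy reflects the redirects you care about).
    public static string GetPolicyAppliedName(string displayName)
    {
        return AppDomain.CurrentDomain.ApplyPolicy(displayName);
    }

    // Loads an assembly for inspection only, after applying policy. .NET Framework only.
    public static Assembly LoadForInspection(string displayName)
    {
        return Assembly.ReflectionOnlyLoad(GetPolicyAppliedName(displayName));
    }

    // Pre-loads the references of a reflection-only assembly (they are not
    // loaded automatically), applying policy to each reference first.
    public static void PreloadReferences(Assembly inspected)
    {
        foreach (AssemblyName reference in inspected.GetReferencedAssemblies())
        {
            LoadForInspection(reference.FullName);
        }
    }
}
```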

My New Desktop

After almost three years of using the same desktop PC, I finally replaced the whole package. With Alon’s help and recommendations, I came up with this spec:

This is my biggest impression from the first few days of working on this rig: an SSD is an incredible improvement. It demonstrates ever so clearly just how much of a bottleneck I/O is in today’s PCs. The system boots in under 15 seconds. It takes less than a second to get a fully functional desktop from the moment I finish typing my password at the login screen. Programs launch instantaneously; it’s as if they were never closed. This is absolutely amazing.

Some screenshots:

[Screenshot: system information]
(Nothing too shabby: eight logical processors and eight gigs of memory.)

[Screenshot: disk transfer rate measurement]
(The measurement was performed while the system was running and many applications were open; a clean test yielded an average transfer rate of 190 MB/s.)

[Screenshot: performance subscores]
(I, too, found it hard to believe, but the SSD got the lowest subscore.)

Viewing Persisted Workflow State

A few weeks ago at work I was toying around with implementing a viewer for persisted workflow instances (with WF 3.5). While there is a Microsoft sample (Workflow Monitor) that displays tracking information recorded for a workflow instance accompanied by a visual designer, there is no API or tool to view the state of a persisted workflow—the property values and object references that are serialized to the workflow persistence store.

It’s always possible to write a custom persistence service that will serialize workflows in any way you deem fit, but I decided to do something simpler. I wrote a trivial serialization framework that hardly does anything but:

  • Query public properties and traverse references recursively;
  • Handle cyclic references by storing a set of visited objects;
  • Serialize primitive types using their simple string representation.
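Here is a minimal sketch of such a serializer, written for illustration; it is not the code from the downloadable solution, and the StateSerializer, ReferenceComparer, and Node names are hypothetical. It follows the three rules above: it walks public readable instance properties via reflection, keeps a reference-equality set of visited objects to cut cycles, and writes leaf values with their string representation. As a simplification, it treats every value type (not just primitives) and strings as leaves, which also prevents runaway recursion through struct properties such as DateTime.Date, and it does not special-case collections or getters that throw.

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;
using System.Reflection;
using System.Runtime.CompilerServices;
using System.Xml.Linq;

// Compares objects by identity, so cycles are detected even if Equals is overridden.
sealed class ReferenceComparer : IEqualityComparer<object>
{
    public static readonly ReferenceComparer Instance = new ReferenceComparer();
    bool IEqualityComparer<object>.Equals(object x, object y) => ReferenceEquals(x, y);
    int IEqualityComparer<object>.GetHashCode(object obj) => RuntimeHelpers.GetHashCode(obj);
}

static class StateSerializer
{
    public static XElement Serialize(object root) =>
        SerializeValue("Root", root, new HashSet<object>(ReferenceComparer.Instance));

    private static XElement SerializeValue(string name, object value, HashSet<object> visited)
    {
        var element = new XElement(name);
        if (value == null)
        {
            element.SetAttributeValue("null", "true");
            return element;
        }

        Type type = value.GetType();
        element.SetAttributeValue("type", type.Name);

        // Leaves: strings and value types are stored as simple strings.
        if (type.IsValueType || value is string)
        {
            element.Value = Convert.ToString(value, CultureInfo.InvariantCulture);
            return element;
        }

        // Already visited (cycle or shared reference): mark it instead of recursing.
        if (!visited.Add(value))
        {
            element.SetAttributeValue("visited", "true");
            return element;
        }

        // Traverse public, readable, non-indexed instance properties recursively.
        foreach (PropertyInfo property in type.GetProperties(BindingFlags.Public | BindingFlags.Instance))
        {
            if (!property.CanRead || property.GetIndexParameters().Length > 0) continue;
            element.Add(SerializeValue(property.Name, property.GetValue(value, null), visited));
        }
        return element;
    }
}

// Sample state type used only to demonstrate the serializer.
sealed class Node
{
    public string Name { get; set; }
    public int Count { get; set; }
    public Node Next { get; set; }
}
```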

The result of this serialization is an XML document. Next, I wrote a custom persistence service that derives from SqlWorkflowPersistenceService and invokes my serialization framework in its implementation of SaveWorkflowInstanceState. The serialization result is stored in a versioned table, which makes it possible to query multiple versions of the same workflow instance.

Here’s the really simple viewer:

[Screenshot: the persisted workflow state viewer]

Although this is just a proof-of-concept, it shows that it’s very much possible to view persisted workflow information without implementing a full-blown custom persistence service. It’s easy to envision how the serialization can be turned on or off completely, or how it can be optimized to improve performance (binary form, compression, etc.).
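As an example of the compression idea, here is a sketch that gzips the serialized XML before it is written to the table and restores it on the way back. The XmlCompression helper is hypothetical and not part of the downloadable solution; whether compression pays off depends on the size of your workflow state.

```csharp
using System.IO;
using System.IO.Compression;
using System.Text;
using System.Xml.Linq;

// Hypothetical helper for storing serialized workflow state in compressed form.
static class XmlCompression
{
    // Compresses the XML text of a serialized state with GZip.
    public static byte[] Compress(XElement state)
    {
        byte[] raw = Encoding.UTF8.GetBytes(state.ToString(SaveOptions.DisableFormatting));
        using (var output = new MemoryStream())
        {
            using (var gzip = new GZipStream(output, CompressionMode.Compress))
            {
                gzip.Write(raw, 0, raw.Length);
            } // disposing the GZipStream flushes the final compressed block
            return output.ToArray(); // ToArray still works after the stream is closed
        }
    }

    // Reverses Compress: decompresses the bytes and parses the XML.
    public static XElement Decompress(byte[] data)
    {
        using (var input = new GZipStream(new MemoryStream(data), CompressionMode.Decompress))
        using (var reader = new StreamReader(input, Encoding.UTF8))
        {
            return XElement.Parse(reader.ReadToEnd());
        }
    }
}
```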

If you care to see how it’s done, you can download the whole thing as a Visual Studio solution. Note that you will need to create an appropriate persistence database with a custom table called PersistedWorkflowState (see the LINQ to SQL data context).