All Your Base Are Belong To Us

Mostly .NET internals and other kinds of gory details

June 2008 - Posts

The ABC of Blue-Screen Dump Analysis

Kernel-mode crash dump analysis, affectionately called "Blue-Screen Analysis" thanks to the manifestation of kernel-mode crashes in Windows, is an extremely complicated topic to master.  Analyzing user-mode crash dumps is hard enough, plagued by missing information, mismatched symbols, dump corruption and inability to reproduce the live problem.  Kernel-mode crash dumps add a new dimension of complexity due to the interaction of multiple components (drivers, user-mode processes, Windows core services and components), which is often the root cause of the crash.  Additionally, analyzing a dump of significant complexity requires a great amount of knowledge about Windows system mechanisms and kernel-mode programming in general (interrupts, DPCs, APCs, thread scheduling and many other areas well covered by our Windows Internals course).

I've looked into kernel-mode crash analysis in the past, as part of the voluminous "Debugging and Investigation Tools" post, where I demonstrate isolating and pinpointing a faulty driver through the use of Driver Verifier.  For now, though, I would like to focus on the ABC of Blue-Screen Dump Analysis - the steps any of us can take at home to determine why our favorite laptop is giving us the blue-screen goodness with every meal.

Step A - Send Your Error Reports to Microsoft

The easiest way to get your problem diagnosed, and resolved if at all possible, is to send the error report to Microsoft Online Crash Analysis.  After the system recovers from a blue-screen, it will ask you to send the information to Microsoft, and you should do so.  More often than not, a solution for your problem will become available, either shortly afterwards or a few days or weeks later:

image

If this kind of automatic diagnosis is not enough for your needs; if you're not getting a prompt solution to the problem; if you're curious what happens behind the scenes of a crash dump... then read on.

Last week one of my acquaintances was kind enough to give me the exact material necessary for this kind of ABC post - a collection of 18 blue-screen crash dumps from his laptop, collected across a period of 3 months.  To begin with, where do you actually find this kind of information?

Step B - Dumps Live at %SYSTEMROOT%\Minidump

Try looking at the %SYSTEMROOT%\Minidump folder right now to find out if you've had any blue-screens lately.  On my laptop, from the last 1.5 years, all I have are a measly 5 dumps:

image

As you see, a kernel crash dump is something you can easily send over the Internet to a curious colleague or, as we have already seen, to... Microsoft Online Crash Analysis.  And of course you can open it yourself to see what's lurking inside.
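
If you would rather check for dumps from code, the lookup is a few lines.  This is a minimal sketch: the MinidumpFinder class and its method names are made up for illustration, and reading the Minidump folder may require an elevated process.

```csharp
using System;
using System.IO;

public static class MinidumpFinder
{
    // Default location Windows uses for small memory dumps.
    public static string DefaultFolder()
    {
        return Environment.ExpandEnvironmentVariables(@"%SYSTEMROOT%\Minidump");
    }

    // Returns the full paths of the *.dmp files in the folder, oldest first.
    // An empty array means the folder is missing or holds no dumps.
    public static string[] FindDumps(string folder)
    {
        if (!Directory.Exists(folder))
        {
            return new string[0];
        }

        string[] dumps = Directory.GetFiles(folder, "*.dmp");
        Array.Sort(dumps, (a, b) =>
            File.GetLastWriteTimeUtc(a).CompareTo(File.GetLastWriteTimeUtc(b)));
        return dumps;
    }
}
```

Calling MinidumpFinder.FindDumps(MinidumpFinder.DefaultFolder()) on the machine in question gives you the list of files to open in the debugger, in the order the crashes happened.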

Step C - Bugs Fear WinDbg The Most

The single best tool for diagnosing kernel-mode crash dumps is WinDbg, part of the Debugging Tools for Windows package which I have extensively covered in the past.  It's a free download from Microsoft, and its facilities for analyzing kernel-mode and user-mode problems are truly endless.

All you need to do to get some meaningful information out of a blue-screen dump with WinDbg is configure symbols (File -> Symbol File Path -> srv*C:\SymbolCache*http://msdl.microsoft.com/download/symbols) and choose File -> Open Crash Dump.  After a delay while your system downloads symbols from the web, you'll see output closely resembling the following:

Loading Dump File [D:\Temp\Dumps\Mini032308-01.dmp]

Mini Kernel Dump File: Only registers and stack trace are available

Symbol search path is: srv*D:\Symbols*http://msdl.microsoft.com/download/symbols

Executable search path is:

Windows Kernel Version 6001 (Service Pack 1) MP (2 procs) Free x86 compatible

Product: WinNt, suite: TerminalServer SingleUserTS

Built by: 6001.18000.x86fre.longhorn_rtm.080118-1840

Kernel base = 0x81c1a000 PsLoadedModuleList = 0x81d31c70

Debug session time: Sun Mar 23 18:29:59.623 2008 (GMT+3)

System Uptime: 0 days 0:09:01.883

Loading Kernel Symbols

......................................................................................................................................................

Loading User Symbols

Loading unloaded module list

.....

Use !analyze -v to get detailed debugging information.

BugCheck 1A, {4000, 8655d188, 80000000, 17e05c}

 

Probably caused by : memory_corruption ( nt!MiDeleteVirtualAddresses+7ef )

Followup: MachineOwner

---------

The interesting parts are the machine information on the "Windows Kernel Version" line (Vista SP1, 32-bit, 2 CPUs), the system uptime (just 9 minutes!) and the probable cause, which is right in front of us on the "Probably caused by" line.  The debugger thinks it's a memory corruption, and suggests that we use the !analyze -v command for more detailed information.  Let's have a look:

1: kd> !analyze -v

MEMORY_MANAGEMENT (1a)

    # Any other values for parameter 1 must be individually examined.

Arguments:

Arg1: 00004000, The subtype of the bugcheck.

Arg2: 8655d188

Arg3: 80000000

Arg4: 0017e05c

 

Debugging Details:

------------------

BUGCHECK_STR:  0x1a_4000

CUSTOMER_CRASH_COUNT:  1

DEFAULT_BUCKET_ID:  VISTA_DRIVER_FAULT

PROCESS_NAME:  svchost.exe

CURRENT_IRQL:  0

LAST_CONTROL_TRANSFER:  from 81c584cf to 81ce7163

 

STACK_TEXT: 

a9e7baa4 81c584cf 0000001a 00004000 8655d188 nt!KeBugCheckEx+0x1e

a9e7bbd8 81cab82c 0e430002 0f586fff 8ddec810 nt!MiDeleteVirtualAddresses+0x7ef

a9e7bca8 81caadc5 8ddec810 84751ad8 84574d78 nt!MiRemoveMappedView+0x4aa

a9e7bcd0 81e3eb9d 84574d78 00000000 ffffffff nt!MiRemoveVadAndView+0xe3

a9e7bd34 81e3ecee 8ddec810 0e430000 00000000 nt!MiUnmapViewOfSection+0x265

a9e7bd54 81c71a7a ffffffff 0e430000 043eed4c nt!NtUnmapViewOfSection+0x55

a9e7bd54 77909a94 ffffffff 0e430000 043eed4c nt!KiFastCallEntry+0x12a

WARNING: Frame IP not in any known module. Following frames may be wrong.

043eed4c 00000000 00000000 00000000 00000000 0x77909a94

 

STACK_COMMAND:  kb

 

FOLLOWUP_IP:

nt!MiDeleteVirtualAddresses+7ef

81c584cf cc              int    3

 

SYMBOL_STACK_INDEX:  1

SYMBOL_NAME:  nt!MiDeleteVirtualAddresses+7ef

FOLLOWUP_NAME:  MachineOwner

MODULE_NAME: nt

DEBUG_FLR_IMAGE_TIMESTAMP:  47918b12

IMAGE_NAME:  memory_corruption

FAILURE_BUCKET_ID:  0x1a_4000_nt!MiDeleteVirtualAddresses+7ef

BUCKET_ID:  0x1a_4000_nt!MiDeleteVirtualAddresses+7ef

Followup: MachineOwner

---------

Note that we have no specifics regarding the user-mode stack that caused the crash because it's a kernel-only minidump (no user-mode information was captured).  However, we see that the memory_corruption indication is pretty consistent.  Looking this up on the web we see multiple recommendations:

  • Run some memory diagnostic tools
  • Use tools like DebugWiz to further diagnose the problem
  • Send the hardware to the manufacturer for inspection

Let's take a look at another dump (we have 18 of them, so no need to use them sparingly):

BugCheck 1000008E, {c0000005, 81e63829, aea91860, 0}

Probably caused by : ntkrpamp.exe ( nt!PfGetCompletedTrace+138 )

Followup: MachineOwner

---------

 

1: kd> !analyze -v

 

KERNEL_MODE_EXCEPTION_NOT_HANDLED_M (1000008e)

Arguments:

Arg1: c0000005, The exception code that was not handled

Arg2: 81e63829, The address that the exception occurred at

Arg3: aea91860, Trap Frame

Arg4: 00000000

 

Debugging Details:

------------------

EXCEPTION_CODE: (NTSTATUS) 0xc0000005 - The instruction at 0x%08lx referenced memory at 0x%08lx. The memory could not be %s.

 

FAULTING_IP:

nt!PfGetCompletedTrace+138

81e63829 894804          mov    dword ptr [eax+4],ecx

 

TRAP_FRAME:  aea91860 -- (.trap 0xffffffffaea91860)

ErrCode = 00000002

eax=00000000 ebx=00000001 ecx=81d341a4 edx=da84a000 esi=81d341c0 edi=81d341b4

eip=81e63829 esp=aea918d4 ebp=aea91928 iopl=0        nv up ei ng nz na pe cy

cs=0008  ss=0010  ds=0023  es=0023  fs=0030  gs=0000            efl=00010287

nt!PfGetCompletedTrace+0x138:

81e63829 894804          mov    dword ptr [eax+4],ecx ds:0023:00000004=????????

Resetting default scope

 

CUSTOMER_CRASH_COUNT:  1

DEFAULT_BUCKET_ID:  VISTA_DRIVER_FAULT

BUGCHECK_STR:  0x8E

PROCESS_NAME:  svchost.exe

CURRENT_IRQL:  0

LAST_CONTROL_TRANSFER:  from 81e62c63 to 81e63829

 

STACK_TEXT: 

aea91928 81e62c63 01240000 00004000 aea91d30 nt!PfGetCompletedTrace+0x138

aea919a0 81e6e0ca 00000000 adb85501 aea91d30 nt!PfQuerySuperfetchInformation+0x204

aea91d4c 81c8ca7a 0000004f 012bf370 00000014 nt!NtQuerySystemInformation+0x2201

aea91d4c 77629a94 0000004f 012bf370 00000014 nt!KiFastCallEntry+0x12a

WARNING: Frame IP not in any known module. Following frames may be wrong.

012bf598 00000000 00000000 00000000 00000000 0x77629a94

 

STACK_COMMAND:  kb

FOLLOWUP_IP:

nt!PfGetCompletedTrace+138

81e63829 894804          mov    dword ptr [eax+4],ecx

 

SYMBOL_STACK_INDEX:  0

SYMBOL_NAME:  nt!PfGetCompletedTrace+138

FOLLOWUP_NAME:  MachineOwner

MODULE_NAME: nt

IMAGE_NAME:  ntkrpamp.exe

DEBUG_FLR_IMAGE_TIMESTAMP:  47918b12

FAILURE_BUCKET_ID:  0x8E_nt!PfGetCompletedTrace+138

BUCKET_ID:  0x8E_nt!PfGetCompletedTrace+138

Followup: MachineOwner

---------

This one sure looks different.  This time the module that takes the blame is not the generic memory_corruption, but the very specific ntkrpamp.exe, which is the Windows kernel itself!  Examining the stack trace, we see a very innocent-looking stack related to SuperFetch, the memory caching and preloading feature built into the OS, triggering an access violation: eax is 0, so the faulting instruction writes to address 0x00000004.  A random write bug in the kernel is possible but unlikely, especially since we have seen traces of memory corruption in the previous dump, and SuperFetch is one of those services accessing memory quite heavily.  Let's take a look at another one:

BugCheck 50, {fb400428, 1, 81e71d60, 0}

Probably caused by : win32k.sys ( win32k!vSolidFillRect1+107 )

Followup: MachineOwner

---------

 

0: kd> !analyze -v

 

PAGE_FAULT_IN_NONPAGED_AREA (50)

Arguments:

Arg1: fb400428, memory referenced.

Arg2: 00000001, value 0 = read operation, 1 = write operation.

Arg3: 81e71d60, If non-zero, the instruction address which referenced the bad memory address.

Arg4: 00000000, (reserved)

 

Debugging Details:

------------------

FAULTING_IP:

nt!RtlFillMemoryUlong+10

81e71d60 f3ab            rep stos dword ptr es:[edi]

 

MM_INTERNAL_CODE:  0

CUSTOMER_CRASH_COUNT:  1

DEFAULT_BUCKET_ID:  VISTA_DRIVER_FAULT

BUGCHECK_STR:  0x50

PROCESS_NAME:  devenv.exe

CURRENT_IRQL:  0

 

TRAP_FRAME:  8e00f840 -- (.trap 0xffffffff8e00f840)

ErrCode = 00000002

eax=00f0f0f0 ebx=00000202 ecx=00000011 edx=00000011 esi=fb200008 edi=fb400428

eip=81e71d60 esp=8e00f8b4 ebp=8e00f8e8 iopl=0        nv up ei pl nz na pe nc

cs=0008  ss=0010  ds=0023  es=0023  fs=0030  gs=0000            efl=00010206

nt!RtlFillMemoryUlong+0x10:

81e71d60 f3ab            rep stos dword ptr es:[edi]  es:0023:fb400428=????????

Resetting default scope

 

LAST_CONTROL_TRANSFER:  from 81e78bb4 to 81ec3155

 

STACK_TEXT: 

8e00f828 81e78bb4 00000001 fb400428 00000000 nt!MmAccessFault+0x10a

8e00f828 81e71d60 00000001 fb400428 00000000 nt!KiTrap0E+0xdc

8e00f8b4 961106f7 fb400428 00000044 00f0f0f0 nt!RtlFillMemoryUlong+0x10

8e00f8e8 9610bcc7 8e00fb44 00000001 fb200008 win32k!vSolidFillRect1+0x107

8e00fa88 9610b8b7 961105f0 8e00fb44 fda2dac8 win32k!vDIBSolidBlt+0x102

8e00faf4 960ded53 ffa81008 00000000 00000000 win32k!EngBitBlt+0x18e

8e00fb60 9609947b fda2da5c fda2dac8 181f35b1 win32k!ExtTextOutRect+0x1cf

8e00fbc8 960f8775 8e00fd0c 7ffdf2e4 006ce26c win32k!GreBatchTextOutRect+0xcb

8e00fd34 81e75a1c 00000099 0020ee6c 0020ee90 win32k!NtGdiFlushUserBatch+0x134

8e00fd44 77309a94 badb0d00 0020ee6c 00000000 nt!KiFastCallEntry+0xcc

WARNING: Frame IP not in any known module. Following frames may be wrong.

8e00fd48 badb0d00 0020ee6c 00000000 00000000 0x77309a94

8e00fd4c 0020ee6c 00000000 00000000 00000000 0xbadb0d00

8e00fd50 00000000 00000000 00000000 00000000 0x20ee6c

 

STACK_COMMAND:  kb

 

FOLLOWUP_IP:

win32k!vSolidFillRect1+107

961106f7 8b55f4          mov    edx,dword ptr [ebp-0Ch]

 

SYMBOL_STACK_INDEX:  3

SYMBOL_NAME:  win32k!vSolidFillRect1+107

FOLLOWUP_NAME:  MachineOwner

MODULE_NAME: win32k

IMAGE_NAME:  win32k.sys

DEBUG_FLR_IMAGE_TIMESTAMP:  47c78851

FAILURE_BUCKET_ID:  0x50_W_win32k!vSolidFillRect1+107

BUCKET_ID:  0x50_W_win32k!vSolidFillRect1+107

Followup: MachineOwner

---------
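
A side note on the trap frames: both of them so far report ErrCode = 00000002.  On x86 this is the hardware page-fault error code, and its low three bits say what kind of access faulted.  The decoder below is a small sketch for reading that value; the class and method names are made up for illustration.

```csharp
using System;

public static class PageFaultErrorCode
{
    // Low bits of the x86 page-fault error code (ErrCode in the trap frame).
    const uint Present = 0x1;   // 0 = page not present, 1 = protection violation
    const uint Write   = 0x2;   // 0 = read access, 1 = write access
    const uint User    = 0x4;   // 0 = kernel (supervisor) mode, 1 = user mode

    public static string Describe(uint errCode)
    {
        string access = (errCode & Write) != 0 ? "write" : "read";
        string mode   = (errCode & User) != 0 ? "user mode" : "kernel mode";
        string cause  = (errCode & Present) != 0 ? "protection violation" : "page not present";
        return access + " from " + mode + ", " + cause;
    }
}
```

PageFaultErrorCode.Describe(2) returns "write from kernel mode, page not present", which agrees with Arg2 = 1 (write operation) in the PAGE_FAULT_IN_NONPAGED_AREA arguments above.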

This time, it's win32k.sys (the built-in windowing and graphics driver) taking the blame for the crash, as part of some code that appears to be filling memory.  The originating process this time is devenv.exe (Visual Studio itself).  Again, it's highly unlikely that the win32k code is indeed at fault here - either it's a physical memory corruption, or some faulty driver is overwriting memory it doesn't own.  Let's take a look at a final, fourth dump before we start coming up with action items:

BugCheck 1A, {4000, 8d6a3678, 80000000, 17dfed}

Probably caused by : memory_corruption ( nt!MiDeleteVirtualAddresses+7ef )

Followup: MachineOwner

---------

 

0: kd> !analyze -v

 

Debugging Details:

------------------

BUGCHECK_STR:  0x1a_4000

CUSTOMER_CRASH_COUNT:  1

DEFAULT_BUCKET_ID:  VISTA_DRIVER_FAULT

PROCESS_NAME:  iexplore.exe

CURRENT_IRQL:  0

LAST_CONTROL_TRANSFER:  from 81e8e4cf to 81f1d163

 

STACK_TEXT: 

c2da4b5c 81e8e4cf 0000001a 00004000 8d6a3678 nt!KeBugCheckEx+0x1e

c2da4c94 81ee236e 0e770000 0ed41fff 07a4b321 nt!MiDeleteVirtualAddresses+0x7ef

c2da4d2c 81ea7a7a ffffffff 0e33ee50 0e33ee44 nt!NtFreeVirtualMemory+0x652

c2da4d2c 77469a94 ffffffff 0e33ee50 0e33ee44 nt!KiFastCallEntry+0x12a

WARNING: Frame IP not in any known module. Following frames may be wrong.

0e33ed9c 00000000 00000000 00000000 00000000 0x77469a94

 

STACK_COMMAND:  kb

FOLLOWUP_IP:

nt!MiDeleteVirtualAddresses+7ef

81e8e4cf cc              int    3

 

SYMBOL_STACK_INDEX:  1

SYMBOL_NAME:  nt!MiDeleteVirtualAddresses+7ef

FOLLOWUP_NAME:  MachineOwner

MODULE_NAME: nt

DEBUG_FLR_IMAGE_TIMESTAMP:  47918b12

IMAGE_NAME:  memory_corruption

FAILURE_BUCKET_ID:  0x1a_4000_nt!MiDeleteVirtualAddresses+7ef

BUCKET_ID:  0x1a_4000_nt!MiDeleteVirtualAddresses+7ef

Followup: MachineOwner

---------

Ah, it's our friend memory_corruption again, this time with iexplore.exe (Internet Explorer) as the current process.  Time to wrap it up.

Conclusion: We are either looking at a machine with defective physical memory, overclocked physical memory or some other kind of hardware problem, or a misbehaving driver that is randomly corrupting memory as part of its normal operation.  In the former case, we can run memory diagnostic tools and send the machine to the manufacturer for replacement; in the latter case, we are looking at a long story of downloading latest versions of all drivers, ensuring that no rogue or irrelevant drivers are installed, enabling Driver Verifier on suspect drivers and waiting to reproduce the problem and catch the faulty component in the act.
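
With 18 dumps on hand, the pattern can also be checked mechanically instead of by eye.  The following is a minimal sketch, assuming you have saved the two summary lines WinDbg prints when it loads each dump ("BugCheck ..." and "Probably caused by ...") as text; the DumpSummary class and its method names are hypothetical.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class DumpSummary
{
    // Matches e.g. "BugCheck 1A, {4000, 8655d188, 80000000, 17e05c}"
    static readonly Regex BugCheckLine =
        new Regex(@"^\s*BugCheck\s+([0-9A-Fa-f]+),");

    // Matches e.g. "Probably caused by : win32k.sys ( win32k!vSolidFillRect1+107 )"
    static readonly Regex CauseLine =
        new Regex(@"^\s*Probably caused by\s*:\s*(\S+)");

    // Parses the hexadecimal bugcheck code; null if this is not a BugCheck line.
    public static uint? ParseBugCheckCode(string line)
    {
        Match m = BugCheckLine.Match(line);
        if (!m.Success)
        {
            return null;
        }
        return Convert.ToUInt32(m.Groups[1].Value, 16);
    }

    // Counts how often each "Probably caused by" image name appears.
    public static Dictionary<string, int> CountCauses(IEnumerable<string> lines)
    {
        var counts = new Dictionary<string, int>();
        foreach (string line in lines)
        {
            Match m = CauseLine.Match(line);
            if (!m.Success)
            {
                continue;
            }
            string cause = m.Groups[1].Value;
            int seen;
            counts.TryGetValue(cause, out seen);   // seen stays 0 for a new cause
            counts[cause] = seen + 1;
        }
        return counts;
    }
}
```

Fed the summary lines of the four dumps shown in this post, CountCauses reports memory_corruption twice and ntkrpamp.exe and win32k.sys once each.  A blame list scattered across unrelated modules like this is what you would expect from bad memory or a driver corrupting memory at random.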

The Case of RegSvr32 and the Haunted DLL

Last week I resolved a simple "debugging" case by phone, and figured it might be worth putting online.  Here's the approximate outline of the call:

Customer: Sasha, we have a COM component registration that is failing because regsvr32 says it can't find the DLL.

Myself: Where is it looking for that DLL?

Customer: It doesn't matter where it's looking.  We put the component in the System32 directory, but it still complained that it can't find it.

Myself: [With a flash of psychic debugging] Is it a 64-bit system and you're trying to register a 32-bit component?  You should put the component in the SysWOW64 directory.

Customer: Yes, how did you know that?

This is one of those psychic debugging cases, where you can see the answer in a split second if you've encountered the situation before.  When a 32-bit application (be it an installer or the 32-bit regsvr32.exe) thinks it's looking for a file in the System32 directory, it's actually looking for the file in the SysWOW64 directory.

The reason for this behavior is plain old compatibility.  32-bit applications running on top of the Windows-on-Windows64 layer should be none the wiser regarding the location of system DLLs.  Therefore, file system redirection ensures that any accesses (even hard-coded accesses) to the System32 folder are routed to SysWOW64.  Something similar applies to the Program Files directory: on a 64-bit system, 32-bit applications are pointed at Program Files (x86) instead.  And finally, if you're not careful, the same issue can bite you with registry redirection - there is a separate view of the registry for 32-bit applications on 64-bit Windows, and only a small number of keys are reflected across both registry views for interoperability scenarios.

This can lead to the very frustrating situation where you're repeatedly trying to copy the file to System32 and run the registration code, only to be told that the file could not be found.
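
If the registration is scripted, the choice of directory and of regsvr32.exe can be made explicit rather than left to redirection.  Here is a minimal sketch; the Regsvr32Locator class is hypothetical, and it assumes the resulting paths are used from a native (non-redirected) process such as a 64-bit command prompt.

```csharp
using System;

public static class Regsvr32Locator
{
    // Name of the system directory that holds binaries of the component's
    // bitness, as seen by a native process.  On 64-bit Windows, System32
    // holds the 64-bit binaries and SysWOW64 the 32-bit ones.
    public static string SystemDirectoryFor(bool componentIs32Bit, bool osIs64Bit)
    {
        return (osIs64Bit && componentIs32Bit) ? "SysWOW64" : "System32";
    }

    // Full path of the regsvr32.exe matching the component's bitness.
    // windowsDir is the Windows directory, e.g. C:\Windows.
    public static string Regsvr32For(bool componentIs32Bit, bool osIs64Bit, string windowsDir)
    {
        return windowsDir.TrimEnd('\\') + "\\" +
               SystemDirectoryFor(componentIs32Bit, osIs64Bit) + "\\regsvr32.exe";
    }
}
```

For the customer's case, Regsvr32Locator.Regsvr32For(true, true, @"C:\Windows") yields C:\Windows\SysWOW64\regsvr32.exe, and the 32-bit DLL belongs in that same SysWOW64 directory.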

If you're keen enough on porting your applications to 64-bit Windows, you're probably not going to port every single line of code you've ever written - at the same time.  .NET applications are easiest to port, but native code takes time.  Therefore, you are probably going to end up running 32-bit applications on 64-bit Windows and getting 32-bit processes to talk to 64-bit processes.  This can be challenging, and we at Sela have prepared a 2-day course for addressing the new 64-bit architecture, improvements and practical porting issues for managed and native applications, and writing high-performance concurrent applications on top of the newest versions of Windows.

Non-Paging CLR Host

I've just uploaded a new open-source project called "Non-Paged CLR Host" to CodePlex, in collaboration with Alon Fliess.  This project features a custom CLR host that can be used for executing any existing .NET application with little or no modification.  This custom CLR host ensures that all memory given out to the CLR is locked into physical memory if possible, thus eliminating paging completely.

This can provide two important and distinct advantages to server and client applications alike:

  1. Applications will benefit from no paging during normal operation.  Even if other applications are actively allocating memory, allocations performed under the non-paged CLR host will be locked into physical memory.
  2. No paging will occur when the application is idle, providing a great benefit to low-latency processes such as GUI applications (even if the user has fallen asleep in front of the monitor).  The normal working set management scheme employed by Windows will not affect processes running under the non-paged CLR host.

The non-paged CLR host is available in x86 (32-bit) and x64 (64-bit) builds.  You can compile your own flavor for Windows 2000 and above on any supported processor.  Please note that this is a preliminary version that has not been extensively tested, so we strongly recommend that you use it in a controlled environment for testing purposes only.

Using the non-paged CLR host is extremely simple.  The current host consists of a console application that executes an assembly passed to it via the command line.  The only constraint imposed on the code to be executed is that it must reside in a static method which returns an int and accepts a string as a parameter (note: not string[]), such as the following method:

#pragma warning disable 0028

        public static int Main(string str)

#pragma warning restore 0028

        {

            return str.Length;

        }

(Warning CS0028 indicates that you have a Main method with a mismatched signature.)  Assuming that this method is placed in a class called Program that resides in an assembly called TestHost.dll, the following command line can be used to execute it:

AweClrHost_Win32_Release.exe TestHost.dll Program Main

You can also pass the parameter to the Main method on the same command line.  If omitted, it defaults to the minimum working set size that the host has been able to reserve for the process (as a string).

From an implementation perspective, the non-paged CLR host uses the SetProcessWorkingSetSize, SetProcessWorkingSetSizeEx (on Windows Server 2003 and above) and VirtualLock APIs to ensure that memory allocated by it is locked into physical memory.  Note that using these APIs does not guarantee with absolute certainty that no paging will occur; instead, it limits paging to very exceptional scenarios, which are really unlikely in practice.  In some load tests I conducted, even when the system as a whole was starved of physical memory, no page faults were observed in the process using the non-paged CLR host.  In future posts, we will (hopefully) delve into the implementation highlights of a custom CLR host and some of the surprises that lurk along the way.
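
To give a feel for how the APIs named above fit together, here is a rough P/Invoke sketch.  It is not the project's actual code (the host itself is native): the LockedMemory class, its method names and the use of VirtualAlloc to obtain the region are assumptions made for illustration, and the calls only work on Windows.

```csharp
using System;
using System.ComponentModel;
using System.Diagnostics;
using System.Runtime.InteropServices;

public static class LockedMemory
{
    const uint MEM_COMMIT = 0x1000;
    const uint MEM_RESERVE = 0x2000;
    const uint PAGE_READWRITE = 0x04;

    [DllImport("kernel32.dll", SetLastError = true)]
    static extern bool SetProcessWorkingSetSize(IntPtr process, UIntPtr min, UIntPtr max);

    [DllImport("kernel32.dll", SetLastError = true)]
    static extern IntPtr VirtualAlloc(IntPtr address, UIntPtr size, uint type, uint protect);

    [DllImport("kernel32.dll", SetLastError = true)]
    static extern bool VirtualLock(IntPtr address, UIntPtr size);

    // Rounds a byte count up to a whole number of pages.
    public static long RoundUpToPages(long bytes, int pageSize)
    {
        return (bytes + pageSize - 1) / pageSize * pageSize;
    }

    // Windows only.  Raises the working set limits first, because the number
    // of pages a process may lock is bounded by its minimum working set,
    // then commits a region and locks it into physical memory.
    public static IntPtr AllocateLocked(long bytes, long minWorkingSet, long maxWorkingSet)
    {
        IntPtr process = Process.GetCurrentProcess().Handle;
        if (!SetProcessWorkingSetSize(process,
                (UIntPtr)(ulong)minWorkingSet, (UIntPtr)(ulong)maxWorkingSet))
        {
            throw new Win32Exception();   // picks up the last Win32 error
        }

        UIntPtr size = (UIntPtr)(ulong)RoundUpToPages(bytes, Environment.SystemPageSize);
        IntPtr region = VirtualAlloc(IntPtr.Zero, size, MEM_COMMIT | MEM_RESERVE, PAGE_READWRITE);
        if (region == IntPtr.Zero)
        {
            throw new Win32Exception();
        }
        if (!VirtualLock(region, size))
        {
            throw new Win32Exception();
        }
        return region;
    }
}
```

The order of the calls is the point of the sketch: VirtualLock fails once the locked pages exceed what the minimum working set allows, so the working set has to be enlarged before anything is locked.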

Wrapping this up, Alon and I will be very glad to hear about your experiences with this preliminary release.  There are many interesting scenarios that can benefit from using the non-paging CLR host, and we would appreciate any such scenarios you might have encountered or any other kind of feedback.  You can contact me through the blog, as usual.  Alon can be contacted through his blog as well.

Forwarding Context-ful Messages

Workflow Services (introduced in .NET 3.5) are based on a simple convention for passing the workflow instance identifier from the client to the workflow and from the workflow to any services it invokes.  This convention revolves around the use of context-ful bindings (BasicHttpContextBinding, WSHttpContextBinding, NetTcpContextBinding and others) and a simple dictionary which contains a key called "instanceId" and a value that contains the workflow instance identifier.

This information is passed out-of-band to facilitate cleaner interfaces - it's passed in a SOAP header called ContextMessageHeader (which is an internal WCF class), and can be accessed from the channel's context manager or a message property called ContextMessageProperty.  (For completeness it's also worth stating that the context information can be passed in an HTTP cookie - which is the approach taken by the BasicHttpContextBinding - but we will focus on the SOAP header approach.)

If a client wants to communicate with a specific workflow instance, then the following plumbing code will ensure that the right instance will service the request:

//Extension for applying instance id to context

public static void ApplyInstanceId(this IContextChannel proxy, Guid id)

{

    IContextManager ctxManager = proxy.GetProperty<IContextManager>();

    IDictionary<string,string> ctx = ctxManager.GetContext();

    //Indexer rather than Add, so an existing "instanceId" entry is replaced instead of throwing

    ctx["instanceId"] = id.ToString();

    ctxManager.SetContext(ctx);

}

This extension method can be used on the proxy to the workflow endpoint:

IWorkflow workflowProxy = ChannelFactory<IWorkflow>.CreateChannel(

    Common.Binding, new EndpointAddress(Common.WorkflowAddress));

 

((IContextChannel)workflowProxy).ApplyInstanceId(workflow.InstanceId);

This causes the message to include a context message header with the workflow instance identifier embedded in it.  The message will appear similar to the following (note the Context element in the header):

<s:Envelope

    xmlns:s="http://www.w3.org/2003/05/soap-envelope"

    xmlns:a="http://www.w3.org/2005/08/addressing">

    <s:Header>

        <a:Action s:mustUnderstand="1">

            http://tempuri.org/IWorkflow/Echo

        </a:Action>

        <a:MessageID>

            urn:uuid:610c88d9-3365-4aa5-ac8c-3e7949621c80

        </a:MessageID>

        <a:ReplyTo>

            <a:Address>

                http://www.w3.org/2005/08/addressing/anonymous

            </a:Address>

        </a:ReplyTo>

        <Context xmlns="http://schemas.microsoft.com/ws/2006/05/context">

            <Property name="instanceId">

                dec52f54-c51b-4025-888d-58f29507f572

            </Property>

        </Context>

        <a:To s:mustUnderstand="1">

            http://localhost:9092/Intermediary/RR

        </a:To>

    </s:Header>

    <s:Body>

        <Echo xmlns="http://tempuri.org/">

            <message>Hello</message>

        </Echo>

    </s:Body>

</s:Envelope>

On the other hand, when receiving a callback from a workflow, a service has to use the following plumbing code to determine which specific workflow instance is responsible for the call:

//Extension for extracting instance id from message properties

public static string GetInstanceId(this MessageProperties properties)

{

    ContextMessageProperty property;

    if (!ContextMessageProperty.TryGet(properties, out property))

    {

        throw new InvalidOperationException("No ContextMessageProperty");

    }

    return property.Context["instanceId"];

}

This extension method can be used on the incoming message properties:

string workflowInstanceId =

    OperationContext.Current.IncomingMessageProperties.GetInstanceId();

This is all fairly straightforward and well-covered by existing technology samples.  The challenge is to forward context-ful messages through an intermediary (such as the intermediary covered in previous posts).  The naive approach is not going to work for two primary reasons.

To begin with, the IContextManager approach is incompatible with the ContextMessageProperty: if the outgoing channel created within the router has context management enabled and the outgoing message properties contain the context message property, an exception will be thrown.  We therefore have to either disable context management on the outgoing channel or remove the context message property before forwarding the message.  Either option is feasible, so here's how to disable context management (the alternative is left as an exercise for the reader):

//Extension for disabling context management on the channel

public static void DisableContextManagement(this IContextChannel channel)

{

    IContextManager ctxManager = channel.GetProperty<IContextManager>();

    ctxManager.Enabled = false;

}

This can be used on any outgoing channel.  For example, in the sample project featured in this post, the router code for request-reply messages now becomes:

public Message ActionRR(Message request)

{

    //Forward to workflow

    IGenericRR proxy = ChannelFactory<IGenericRR>.CreateChannel(

        Common.Binding, new EndpointAddress(Common.WorkflowAddress));

    ((IContextChannel)proxy).DisableContextManagement();

    return proxy.ActionRR(request);

}

Unfortunately, this still isn't going to work.  In fact, if you try this code out, you'll find that there's no exception thrown, but the message doesn't reach the workflow.  It's silently swallowed and there's absolutely no indication that anything went wrong in the process.

(Fast-forward countless hours of frustrating debugging.)  It appears that the context header present in the incoming message prevents the message from being successfully forwarded.  The context header is added again when the message is dispatched by the outgoing channel, and the header's presence somehow causes the message to be lost.  Therefore, we must remove the context header explicitly:

//Extension for removing the context header

public static void RemoveContext(this MessageHeaders headers)

{

    headers.RemoveAll(ContextHeaderName, ContextHeaderNamespace);

}

 

const string ContextHeaderName = "Context";

const string ContextHeaderNamespace = "http://schemas.microsoft.com/ws/2006/05/context";

This code can now be incorporated into the message-forwarding logic outlined above:

public Message ActionRR(Message request)

{

    //Forward to workflow

    request.Headers.RemoveContext();

    IGenericRR proxy = ChannelFactory<IGenericRR>.CreateChannel(

        Common.Binding, new EndpointAddress(Common.WorkflowAddress));

    ((IContextChannel)proxy).DisableContextManagement();

    return proxy.ActionRR(request);

}

This must be done regardless of whether the intermediary is processing a message directed at a workflow or a message originating from a workflow.  Without removing the context header, the message will fail to be processed.

To connect all these seemingly disconnected pieces of code, you can download a sample project demonstrating a scenario where a client communicates to a workflow through an intermediary and the workflow proceeds to send a one-way notification to the client through the same intermediary.

image

Constructing an Empty WCF Reply Message

We're continuing the series of posts arising from the implementation intricacies of a WCF router.  Today's post features a seemingly simple task: Constructing an empty WCF reply message, which is the equivalent of the message sent in response to an operation which returns void (but is not one-way).

The motivation for this could be the following: A router needs to dispatch a message to multiple subscribers.  However, the router itself is unaware of the data contract - it is willing to work on an untyped Message-based contract.  The operation doesn't have a return type - it's void.  On the other hand, the operation is not one-way for some reason - for example, it might require transactional semantics to assure that the message is delivered and made durable as part of the client transaction.  This would require the router to construct the semantic equivalent of the message that would be sent in response to the operation if the service was working with the original contract, and not the untyped Message-based contract the router is familiar with.

Let's observe the request and reply messages for a simple operation called Hello which takes a string parameter.  The request message looks like this:

<s:Envelope xmlns:s="http://www.w3.org/2003/05/soap-envelope"

            xmlns:a="http://www.w3.org/2005/08/addressing">

  <s:Header>

    <a:Action s:mustUnderstand="1">

        http://tempuri.org/IHello/Hello

    </a:Action>

    <a:MessageID>urn:uuid:2ec62f21-b334-4e50-85e0-586decdef121</a:MessageID>

    <a:ReplyTo>

      <a:Address>http://www.w3.org/2005/08/addressing/anonymous</a:Address>

    </a:ReplyTo>

    <a:To s:mustUnderstand="1">http://localhost:9090/Hello</a:To>

  </s:Header>

  <s:Body>

    <Hello xmlns="http://tempuri.org/">

      <str>Hello World!</str>

    </Hello>

  </s:Body>

</s:Envelope>

And the response message looks like this:

<s:Envelope xmlns:s="http://www.w3.org/2003/05/soap-envelope"

            xmlns:a="http://www.w3.org/2005/08/addressing">

    <s:Header>

        <a:Action s:mustUnderstand="1">

            http://tempuri.org/IHello/HelloResponse

        </a:Action>

        <a:RelatesTo>

            urn:uuid:2ec62f21-b334-4e50-85e0-586decdef121

        </a:RelatesTo>

    </s:Header>

    <s:Body>

        <HelloResponse xmlns="http://tempuri.org/" />

    </s:Body>

</s:Envelope>

This seems simple enough to fake.  The RelatesTo header is not critical (it provides the correlation between the request and response messages, but we can live without it).  The Action header must be present, and the response body has to be present because that's what the operation formatter on the client side expects.

I wasn't able to find an elegant way to generate the response - all relevant classes which implement IDispatchOperationFormatter are internal, and considered an implementation detail.  Therefore, I had to resort to the following (highly fragile) code to construct the response message:

XmlDictionaryReader requestReader =

    message.GetReaderAtBodyContents();

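//LocalName rather than Name, so a prefixed body element does not carry its prefix into the reply

string bodyElemName = requestReader.LocalName;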

string bodyNS = requestReader.NamespaceURI;

requestReader.Close();

 

XmlDictionaryReader bodyReader =

    XmlDictionaryReader.CreateDictionaryReader(

        XmlReader.Create(

            new StringReader(

                "<" + bodyElemName + "Response " +

                "xmlns='" + bodyNS + "' />"

                )));

 

Message reply = Message.CreateMessage(

    message.Version,

    message.Headers.Action + "Response",

    bodyReader);

It works, but if there's a better way that you can think of, I'd be delighted to hear about it.

Obtaining an Untyped WCF Message from a Typed Service Operation

Most WCF services operate on a typed message contract.  In other words, the underlying Message object is not available because it is parsed by the default WCF operation invoker into the data contract that the service operation expects.

However, oftentimes you need access to the underlying Message object even though your typed service does not directly consume it.  For example, you might want to automatically serialize the message, or pass it to other services that expose an untyped message contract (such as routing services or notification publishing services outlined in my previous posts).

This can be accomplished by making a copy of the incoming message and later obtaining it from within the typed service operation.  The most intuitive interception point for making such a copy is an implementation of IDispatchMessageInspector.  The most intuitive location for storing the message is a message property, because it is transient and does not get serialized into subsequent message calls.  (Note that storing the message in thread-local storage is an appealing option, but the service call is not guaranteed to be performed on the same thread that the message inspector is called on.  In fact, from experience, this is rarely the case with one-way operations.)

Consequently, what we need is a message property for caching the message, which can install itself onto the current operation context's incoming message properties collection, and retrieve itself from that collection:

public sealed class MessageCacheProperty
{
    public const string Name = "MessageCacheProperty";

    public Message Message { get; private set; }

    public MessageCacheProperty(Message message)
    {
        Message = message;
    }

    public static Message GetContextMessage()
    {
        OperationContext ctx = OperationContext.Current;
        MessageCacheProperty messageProperty =
            (MessageCacheProperty)
            ctx.IncomingMessageProperties[Name];
        return messageProperty.Message;
    }

    public static void Install(Message message)
    {
        OperationContext ctx = OperationContext.Current;
        ctx.IncomingMessageProperties.Add(
            Name, new MessageCacheProperty(message));
    }
}

This property can now be installed using an implementation of IDispatchMessageInspector that is installed on our service's endpoints (through an IEndpointBehavior or an IServiceBehavior extension):

public sealed class MessageCacheInspectorAttribute :
    IDispatchMessageInspector
{
    public object AfterReceiveRequest(
        ref Message request,
        IClientChannel channel,
        InstanceContext instanceContext)
    {
        MessageBuffer copy =
            request.CreateBufferedCopy(int.MaxValue);
        MessageCacheProperty.Install(copy.CreateMessage());
        request = copy.CreateMessage();

        return null;
    }

    //The operation is one-way,
    //so this won't be called anyway
    public void BeforeSendReply(
        ref Message reply,
        object correlationState)
    {
    }
}

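Note that we create two messages from the buffered copy, because copying the original message consumes it and makes it unusable for subsequent processing: one message is cached in the property, and the other replaces the request so that the rest of the pipeline can still read it.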

The above code makes the message available in any service operation that passes through this inspector, e.g.:

Message toSend = MessageCacheProperty.GetContextMessage();

Parallel Extensions and Native Code?

In my recent post series on the June '08 Parallel Extensions CTP we have looked at a multitude of new features for concurrent programming in managed code.  When the framework is finally released, we will have a lightning-fast scheduler, a fully functional parallel execution model for LINQ queries, new concurrent collection classes, new synchronization primitives featuring better performance, pipeline and out-of-order task execution models, and so much more.

But what about native code?  What about us native developers?  Surely there must be an ongoing effort to provide concurrent programming libraries for native code?

There surely is!  The Native Concurrency MSDN blog, started just a few days ago, is the first indicator of the concurrent programming effort in native code.  New libraries and constructs together with new C++0x language features such as lambdas can provide a concurrent programming experience that is a good match for the managed PFX.  The very first post on the Native Concurrency blog shows one example of this with regard to matrix multiplication, which lends itself easily to parallelization.  Instead of:

void MatrixMult(int size, double** m1,
                double** m2, double** result)
{
    for (int i = 0; i < size; i++)
    {
        for (int j = 0; j < size; j++)
        {
            for (int k = 0; k < size; k++)
            {
                result[i][j] += m1[i][k] * m2[k][j];
            }
        }
    }
}

...which is the non-concurrent example, the following is very similar to the managed Parallel.For and achieves parallelization:

void MatrixMult(int size, double** m1,
                double** m2, double** result)
{
    parallel_for (0,size,1,[&](int i)
        {
            for (int j = 0; j < size; j++)
            {
                for (int k = 0; k < size; k++)
                {
                    result[i][j] += m1[i][k] * m2[k][j];
                }
            }
        });
}

Note the bizarre [&] syntax, which introduces a lambda expression in the proposed C++0x syntax and captures the enclosing variables by reference.  This effectively parallelizes the outer loop across multiple threads and processors if available.
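To make the idea a bit more concrete, here is a minimal sketch of how a parallel_for-style helper could be built on top of standard C++ threads.  The name simple_parallel_for and the one-chunk-per-thread strategy are assumptions made for illustration; this is not how the Microsoft native concurrency library is implemented, and a real scheduler would balance the work far more cleverly.

```cpp
#include <algorithm>
#include <thread>
#include <vector>

// Hypothetical helper: runs body(i) for every i in [first, last),
// giving each worker thread one contiguous chunk of the range.
template <typename Body>
void simple_parallel_for(int first, int last, Body body)
{
    if (first >= last) return;
    unsigned hw = std::thread::hardware_concurrency();
    int workers = std::min<int>(hw == 0 ? 2 : hw, last - first);
    int chunk = (last - first + workers - 1) / workers;

    std::vector<std::thread> threads;
    for (int w = 0; w < workers; ++w)
    {
        int begin = first + w * chunk;
        int end = std::min(last, begin + chunk);
        if (begin >= end) break;
        threads.emplace_back([=, &body] {
            for (int i = begin; i < end; ++i) body(i);
        });
    }
    for (std::thread& t : threads) t.join();
}

// The same multiplication as above. Each row of the result is written
// by exactly one thread, so no synchronization is needed.
void MatrixMultSketch(int size, double** m1,
                      double** m2, double** result)
{
    simple_parallel_for(0, size, [&](int i) {
        for (int j = 0; j < size; j++)
            for (int k = 0; k < size; k++)
                result[i][j] += m1[i][k] * m2[k][j];
    });
}
```

The loop body is only safe to run this way because the iterations of the outer loop are independent of each other, which is exactly the property that makes matrix multiplication such a convenient example.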

However, none of this is very new.  Many years before the managed Parallel Extensions were introduced, the OpenMP standard defined C and C++ extensions for parallelizing code at the compilation level using #pragma directives, and Microsoft has implemented them in Visual Studio.  If you have Visual Studio 2005 or Visual Studio 2008, you can compile with /openmp (under Project Properties -> C++ -> Language), link to vcomp.lib and try the following C++ code right away:

void MatrixMult(int size, double** m1,
                double** m2,double** result)
{
    #pragma omp parallel for
    for (int i = 0; i < size; i++){
        for (int j = 0; j < size; j++){
            for (int k = 0; k < size; k++) {
                result[i][j] += m1[i][k] * m2[k][j];
            }
        }
    }
}

This effectively parallelizes the outer loop, and has been available in Visual Studio for quite some time.  In fact, OpenMP is also supported for C++/CLI code (with /CLR), rendering it a feasible alternative to the managed Parallel Extensions.

Other constructs of interest in OpenMP include #pragma omp parallel sections for executing multiple regions in parallel (similar to Parallel.Invoke), #pragma omp critical and #pragma omp atomic for ensuring mutually exclusive access, #pragma omp single for running a block on only one thread of the team, environment routines such as omp_set_num_threads for controlling the degree of parallelism, and other advanced scheduling features.
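As a rough illustration of two of these constructs, the following sketch uses #pragma omp parallel sections to sum the two halves of a vector in parallel, and #pragma omp critical (the standard OpenMP directive for mutual exclusion) to protect a shared counter.  The function names are made up for this example.  If the code is compiled without OpenMP support, the pragmas are simply ignored and the code runs serially with the same results.

```cpp
#include <cstddef>
#include <vector>

// Sums a vector in two halves. With OpenMP enabled, the two sections
// may run on different threads; each section writes its own variable.
double SumInTwoSections(const std::vector<double>& values)
{
    const std::size_t mid = values.size() / 2;
    double low = 0.0, high = 0.0;

    #pragma omp parallel sections
    {
        #pragma omp section
        {
            for (std::size_t i = 0; i < mid; ++i) low += values[i];
        }
        #pragma omp section
        {
            for (std::size_t i = mid; i < values.size(); ++i) high += values[i];
        }
    }
    return low + high;
}

// Counts the even numbers in [0, n). The critical section serializes
// updates to the shared counter.
int CountEven(int n)
{
    int count = 0;
    #pragma omp parallel for
    for (int i = 0; i < n; ++i)
    {
        if (i % 2 == 0)
        {
            #pragma omp critical
            ++count;
        }
    }
    return count;
}
```

A critical section inside a hot loop like this is a poor choice for real code (an OpenMP reduction would be the idiomatic tool), but it keeps the example short.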

If you haven't tried OpenMP before, you could get your hands on it right away, without waiting for C++0x and for the next release of the Microsoft native concurrency framework.

Waltzing Through the Parallel Extensions June CTP: Known Issues

In the previous posts in this series, we have looked at a multitude of features provided by the PFX June CTP, including synchronization mechanisms, task-related features and new collection classes.  However, there's also a large list of known issues with this release - it's obviously not production-ready, but nonetheless is a great milestone by the Parallel Extensions team.  The most interesting issues mentioned are:

  • TPL threads are not cleanly shut down when run in the Visual Studio test host.  This effectively means that it's difficult to write unit tests for code that uses the Parallel Extensions.  The workaround is to explicitly create a TaskManager instance and explicitly dispose of it before letting the unit test terminate.
  • Some PLINQ operations exhibit meaningless behavior when order preservation is not enabled (it is enabled using the AsOrdered() method).  For example, the Skip(N) operator without order-preservation will skip N elements in the randomly-ordered input, and not the first N elements.  This is, of course, by design.
  • Multi-core machines running a dedicated application or two might benefit from switching to the Server GC flavor if there's a significant amount of GC going on (the Server GC flavor performs garbage collections in parallel on all available processors, using dedicated GC threads, one per processor, yielding a potential significant collection speedup).  The GC flavor used by the default CLR host is the Workstation Concurrent GC, and switching to Server GC is a wise recommendation even for applications that aren't using the Parallel Extensions at all.  (Although there's a variety of good reasons why this is not the default.)

Waltzing Through the Parallel Extensions June CTP: Collection Classes

In the previous posts in this series, we have looked at the new synchronization mechanisms and the new task-related features in the PFX June CTP.  This post features a brief overview of the new collection classes introduced in the CTP.

In the new System.Threading.Collections namespace we find three new classes which facilitate concurrent programming.  These collections do not yet represent the wealth of concurrent and non-blocking collections that might be implemented in the future, but they are certainly a good sign.

A concurrent collection in the PFX nomenclature is what we recognize by the names of a non-blocking or a wait-free or a lock-free collection, i.e. a collection that does not incur a kernel-mode transition (wait) when items are added or removed.  (In my DevAcademy2 session last year I have demonstrated the performance and scalability benefits of using lock-free collections.)  Concurrent collections in PFX implement the IConcurrentCollection<T> interface, which extends IEnumerable<T> and ICollection:

public interface IConcurrentCollection<T> :
    IEnumerable<T>, ICollection, IEnumerable
{
    bool Add(T item);
    bool Remove(out T item);
    T[] ToArray();
}

The concurrent (i.e. lock-free) collections featured in this release are the ConcurrentQueue<T> and ConcurrentStack<T> collections.  These are classic implementations of a lock-free queue and a lock-free stack, which rely on spinning internally when there's significant contention for adding or removing elements.  They also support enumeration semantics, with the caveat that if the collection is modified during enumeration, no exception is thrown and the enumeration can proceed and retrieve stale information.  Since the collection is lock-free, there is no way to synchronize enumeration with add or remove operations, because there's no way to make enumeration over N elements an atomic coordinated operation without true locking.

Note that both ConcurrentQueue<T> and ConcurrentStack<T> are based on a singly-linked list.  This is significantly less efficient than an array-based implementation for two reasons: First, there is significantly more garbage created by the add operations, because a node must be allocated (and the node is a reference type, adding to the overhead).  Second, traversing a linked list is slower than traversing an array because an array features explicit locality.  There's work in progress in the direction of alleviating these concerns, and we'll see where it leads.
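For readers who haven't seen one, here is a minimal sketch of a lock-free stack over a singly-linked list, written in standard C++.  It is not the PFX implementation; the class and member names are invented for this example.  One assumption deserves loud labeling: popped nodes are parked on a "retired" list and only freed in the destructor, which sidesteps the memory reclamation problem that native lock-free code has to solve and that the garbage collector solves for free in managed code.

```cpp
#include <atomic>
#include <utility>

template <typename T>
class SimpleLockFreeStack
{
    struct Node
    {
        T value;
        Node* next = nullptr;         // link within the live stack
        Node* retiredNext = nullptr;  // link within the retired list
        explicit Node(T v) : value(std::move(v)) {}
    };

    std::atomic<Node*> head_{nullptr};
    std::atomic<Node*> retired_{nullptr};

public:
    ~SimpleLockFreeStack()
    {
        for (Node* n = head_.load(); n != nullptr; )
        {
            Node* next = n->next; delete n; n = next;
        }
        for (Node* n = retired_.load(); n != nullptr; )
        {
            Node* next = n->retiredNext; delete n; n = next;
        }
    }

    void Push(T value)
    {
        // This allocation per element is the garbage mentioned above.
        Node* node = new Node(std::move(value));
        node->next = head_.load();
        // On failure, node->next is refreshed with the current head.
        while (!head_.compare_exchange_weak(node->next, node)) { }
    }

    bool TryPop(T& out)
    {
        Node* node = head_.load();
        // On failure, node is refreshed with the current head.
        while (node != nullptr &&
               !head_.compare_exchange_weak(node, node->next)) { }
        if (node == nullptr) return false;
        out = std::move(node->value);
        // Park the node instead of deleting it: another thread may
        // still be reading node->next in its own retry loop.
        node->retiredNext = retired_.load();
        while (!retired_.compare_exchange_weak(node->retiredNext, node)) { }
        return true;
    }
};
```

Both operations are a compare-and-swap retry loop on the head pointer, which is where the spinning under contention comes from.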

The most interesting class of the bunch is the BlockingCollection<T> adapter class, which wraps a concurrent collection (any implementation of IConcurrentCollection<T>, defaulting to ConcurrentQueue<T>).  It provides the facilities for blocking when elements are removed from an empty collection, or added to a collection that has reached its bound.  For example, if the collection is bounded to 1,000 elements, then when the bound is reached the add operation will block.  Alternatively, if the collection is currently empty then the remove operation will block.  Internally, the blocking collection adapter is implemented with two slim semaphores.
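The two-semaphore design is a classic one, and can be sketched in a few lines of standard C++.  This is an illustration of the general technique and not the PFX code: the class names are invented, the semaphore is a simple mutex-and-condition-variable stand-in for a slim semaphore, and the underlying collection is a plain locked deque rather than a lock-free queue.

```cpp
#include <condition_variable>
#include <cstddef>
#include <deque>
#include <mutex>
#include <thread>
#include <utility>

// Tiny counting semaphore (stand-in for a slim semaphore).
class SimpleSemaphore
{
    std::mutex mutex_;
    std::condition_variable available_;
    std::size_t count_;
public:
    explicit SimpleSemaphore(std::size_t initial) : count_(initial) {}
    void Release()
    {
        { std::lock_guard<std::mutex> guard(mutex_); ++count_; }
        available_.notify_one();
    }
    void Acquire()
    {
        std::unique_lock<std::mutex> lock(mutex_);
        available_.wait(lock, [this] { return count_ > 0; });
        --count_;
    }
};

template <typename T>
class SimpleBlockingQueue
{
    SimpleSemaphore freeSlots_;  // counts the remaining capacity
    SimpleSemaphore usedSlots_;  // counts the stored items
    std::mutex mutex_;           // protects items_
    std::deque<T> items_;
public:
    explicit SimpleBlockingQueue(std::size_t capacity)
        : freeSlots_(capacity), usedSlots_(0) {}

    void Add(T item)  // blocks while the bound is reached
    {
        freeSlots_.Acquire();
        {
            std::lock_guard<std::mutex> guard(mutex_);
            items_.push_back(std::move(item));
        }
        usedSlots_.Release();
    }

    T Remove()  // blocks while the queue is empty
    {
        usedSlots_.Acquire();
        std::unique_lock<std::mutex> lock(mutex_);
        T item = std::move(items_.front());
        items_.pop_front();
        lock.unlock();
        freeSlots_.Release();
        return item;
    }
};

// Usage: one producer and one consumer sharing a bounded queue.
long ProduceAndConsume(int itemCount, std::size_t capacity)
{
    SimpleBlockingQueue<int> queue(capacity);
    std::thread producer([&] {
        for (int i = 0; i < itemCount; ++i) queue.Add(i);
    });
    long sum = 0;
    for (int i = 0; i < itemCount; ++i) sum += queue.Remove();
    producer.join();
    return sum;
}
```

One semaphore counts free slots and makes the add operation block when the bound is reached; the other counts stored items and makes the remove operation block when the collection is empty.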

To streamline scenarios such as those where the blocking collection is used as a work-item coordination queue, there are also the static AddAny and RemoveAny methods, which can add an element to, or remove an element from, any blocking collection in a set of blocking collections.  For example, if we implement a producer-consumer scenario with multiple producers and multiple consumers, then the producers can use the AddAny method to add the work item to any of the consumers' blocking queues, and the consumers can use the RemoveAny method to remove a work item from any of the producers' blocking queues.

The following code demonstrates the above scenario in a very rudimentary fashion - work items are enqueued to any available blocking collection, and retrieved from any available blocking collection:

BlockingCollection<Action>[] queues =
    new BlockingCollection<Action>[4];
for (int i = 0; i < queues.Length; ++i)
{
    queues[i] = new BlockingCollection<Action>(5);
}
for (int i = 0; i < queues.Length; ++i)
{
    new Thread(() =>
    {
        while (true)
        {
            Action action;
            if (BlockingCollection<Action>.RemoveAny(
                queues, out action) != -1)
            {
                action();
            }
        }
    }).Start();
}
for (int i = 0; i < 100; ++i)
{
    BlockingCollection<Action>.AddAny(queues, () =>
        {
            Thread.Sleep(100);
            Console.WriteLine(
                Thread.CurrentThread.ManagedThreadId);
        });
}
Console.ReadLine();

Waltzing Through the Parallel Extensions June CTP: Tasks

In the previous post in this series, we have looked at the new synchronization primitives offered by the PFX June CTP.  In this post, we will look at task-related features and at the new task scheduler.

Task Continuation

Another interesting feature in the CTP is the task continuation paradigm, allowing us to specify what should happen when a task completes.  This is accomplished through the use of the ContinueWith method on the Task and Future classes.  Among other things, this mechanism can be used for chaining multiple asynchronous operations in an ordered pipeline of execution.  Since most of the PFX is focused on out-of-order execution of independent work items, this is a welcome addition that streamlines pipeline processing of dependent tasks that should still utilize multiple processors.  The difference between the two paradigms is best illustrated by the following diagram:

 image

For example, the following code schedules a pipeline of three dependent work items.  Each work item depends on the execution result of the previous work item, producing a single final value:

Future<int> f = Future.Create(() => 5)
    .ContinueWith(a => a.Value - 1)
    .ContinueWith(b => b.Value - 1);
Console.WriteLine(f.Value);
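For comparison, the same three-stage pipeline can be approximated in standard C++ with std::future and std::async.  The Then helper below is a made-up name, not a standard library function, and it is only a sketch of the continuation idea: each stage occupies a thread that blocks until the previous stage is done, whereas a real continuation mechanism would schedule the next stage only when its input is ready.

```cpp
#include <future>
#include <utility>

// Hypothetical helper: returns a future for continuation(previous.get()).
template <typename T, typename F>
auto Then(std::future<T> previous, F continuation)
    -> std::future<decltype(continuation(std::declval<T>()))>
{
    return std::async(std::launch::async,
        [](std::future<T> prev, F next) { return next(prev.get()); },
        std::move(previous), std::move(continuation));
}

// Three dependent work items, mirroring the pipeline above:
// produce 5, subtract 1, subtract 1 again.
int RunPipeline()
{
    std::future<int> first =
        std::async(std::launch::async, [] { return 5; });
    std::future<int> second =
        Then(std::move(first), [](int a) { return a - 1; });
    std::future<int> third =
        Then(std::move(second), [](int b) { return b - 1; });
    return third.get();
}
```

Each stage receives the result of the previous one, so the final value of the pipeline is 3.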

Task.WaitAny

In the case of multiple tasks (or futures) executing concurrently, it's possible that we're interested in the execution result of only one of them.  The classic example is having several algorithms that could compute a result, but not knowing in advance which algorithm will be the fastest to compute it.  The following example launches multiple calculations at once, and waits for the first one to complete.  When it completes, the rest of the calculations can be canceled.

Future<int>[] calculations =
    new Future<int>[] {
        Future.Create(() => 5),
        Future.Create(() => 6),
        Future.Create(() => 7)
    };
int calcIndex = Task.WaitAny(calculations);
Array.ForEach(calculations, c => c.Cancel());
Console.WriteLine(calculations[calcIndex].Value);
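The "first one wins" part of this pattern can be sketched in standard C++ as follows.  The function name RunAndWaitAny is invented for this example, and there is one important difference from the managed code above: standard C++ threads cannot be canceled, so this sketch records the first result and then simply waits for the remaining calculations to run to completion.

```cpp
#include <condition_variable>
#include <cstddef>
#include <functional>
#include <mutex>
#include <thread>
#include <vector>

// Runs all calculations concurrently. Returns the index of the first
// one to finish and stores its result in firstResult (-1 if none).
int RunAndWaitAny(
    const std::vector<std::function<int()>>& calculations,
    int& firstResult)
{
    if (calculations.empty()) return -1;

    std::mutex mutex;
    std::condition_variable finished;
    int winner = -1;
    int winnerValue = 0;

    std::vector<std::thread> threads;
    for (std::size_t i = 0; i < calculations.size(); ++i)
    {
        threads.emplace_back([&, i] {
            int value = calculations[i]();
            std::lock_guard<std::mutex> guard(mutex);
            if (winner == -1)  // only the first finisher is recorded
            {
                winner = static_cast<int>(i);
                winnerValue = value;
                finished.notify_all();
            }
        });
    }

    int index;
    {
        std::unique_lock<std::mutex> lock(mutex);
        finished.wait(lock, [&] { return winner != -1; });
        index = winner;
        firstResult = winnerValue;
    }
    // No cancellation available: let the other calculations finish.
    for (std::thread& t : threads) t.join();
    return index;
}
```

Which calculation wins depends on scheduling, so the returned index is not deterministic even when all the calculations are trivial.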

The New Scheduler

This CTP features a new revamped scheduler that is used by the Task Parallel Library (TPL) and Parallel LINQ to schedule work items for execution.  This scheduler is by and large undocumented, and consists of dozens of internal classes that strive to perform cooperative scheduling in user-mode, without resorting to the operating system or to the .NET thread pool.  This has its advantages (potentially, could be lightning-fast) but also has its disadvantages.  For example, blocking tasks can potentially result in a scheduler thread exhaustion, rendering additional tasks unschedulable.  This is a classic concurrency scenario familiar to anyone who ever tried to implement a thread pool:  If there is a dependency between work items that are waiting for execution and work items that are already executing, then an unresolvable deadlock might occur.  E.g., consider the scenario where I have 4 thread pool threads executing 4 distinct work items.  After performing their work, these work items block waiting for a fifth work item to complete - but for that fifth work item to complete, it must be scheduled for execution, and it can't be scheduled as long as the 4 threads are blocking on the previous work items.  The .NET thread pool can alleviate such scenarios by dynamically expanding the pool of worker threads - expect this to be addressed in future releases of the PFX as well.

The underlying scheduler ought to be discussed in more detail sometime in the future, when more relevant information becomes available.

Waltzing Through the Parallel Extensions June CTP: Synchronization Primitives

Just a few days ago, the Parallel Extensions team released a new CTP of the Parallel Extensions for .NET 3.5, a.k.a. PFX.  This new CTP is not just a bunch of bug fixes - it's packed with new functionality for us to explore.  (I've written some introduction bits on the December '07 CTP in the past, so you might want to read them if you haven't played with the PFX yet.)

In this post series, we will look at most of the interesting new functionality.

Synchronization Primitives

This release contains a significant number of synchronization-related primitives, providing better performance and scalability when compared to the existing .NET mechanisms.  Let's quickly review the new APIs.

First and foremost, almost every single new mechanism introduced in this CTP features the ability to spin (i.e., burn CPU cycles in a loop) before trying to acquire the synchronization primitive.  Spinning is generally frowned upon as a means of achieving synchronization, but a small number of spins is significantly faster than a system call to acquire a kernel synchronization mechanism.  These spinning facilities are provided through the SpinWait class, which we can use when constructing our own synchronization mechanisms.  (As a side note, spinning before acquiring is not something invented by the PFX team - the critical section Win32 API features the ability to spin before acquiring the critical section through the InitializeCriticalSectionAndSpinCount function.)

The SpinLock class implements a synchronization mechanism closely related to the SpinWait class.  The general idea here is that a spinlock is not backed by any kernel synchronization mechanism - a thread that wants to acquire a spinlock will spin until the spinlock becomes available, indefinitely if need be (spinlocks have been used in the Windows kernel from the very beginning, and it is fairly easy to write one in user-mode).  The synchronization is provided by the Interlocked.CompareExchange primitive.  Note that a spinlock cannot be acquired recursively - a LockRecursionException is thrown if you attempt to do so.
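To show just how little code a user-mode spinlock takes, here is a minimal sketch in standard C++, where compare_exchange plays the role of Interlocked.CompareExchange.  It is not the PFX SpinLock: the class name is invented, there is no recursion detection (locking twice on the same thread would spin forever rather than throw), and the occasional yield is my own addition to be polite to other threads.

```cpp
#include <atomic>
#include <thread>
#include <vector>

// Minimal compare-and-swap spinlock (illustrative only).
class SimpleSpinLock
{
    std::atomic<bool> locked_{false};
public:
    void lock()
    {
        int spins = 0;
        for (;;)
        {
            bool expected = false;
            // Take the lock only if it is currently free.
            if (locked_.compare_exchange_weak(
                    expected, true, std::memory_order_acquire))
                return;
            // Spin on a plain read until the lock looks free,
            // yielding the processor every now and then.
            while (locked_.load(std::memory_order_relaxed))
            {
                if (++spins % 64 == 0) std::this_thread::yield();
            }
        }
    }

    void unlock()
    {
        locked_.store(false, std::memory_order_release);
    }
};

// Usage: several threads increment a shared counter under the lock.
long CountWithSpinLock(int threadCount, int iterations)
{
    SimpleSpinLock lock;
    long counter = 0;
    std::vector<std::thread> threads;
    for (int t = 0; t < threadCount; ++t)
    {
        threads.emplace_back([&] {
            for (int i = 0; i < iterations; ++i)
            {
                lock.lock();
                ++counter;
                lock.unlock();
            }
        });
    }
    for (std::thread& th : threads) th.join();
    return counter;
}
```

A lock like this only pays off when the protected region is very short, as it is here; otherwise the waiting threads burn CPU cycles that a kernel wait would have given to someone else.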

The CountdownEvent class is a synchronization mechanism that is initialized with a specified counter, and provides facilities for incrementing and decrementing the counter.  When the counter reaches zero, the synchronization mechanism becomes signaled, thus releasing any waiting threads.  This is an extremely useful facility that previously had to be implemented using a combination of Interlocked.Decrement and a ManualResetEvent.  For example, the following code spins off four distinct tasks which decrement the event's counter until it reaches zero and the main thread is released:

CountdownEvent countEvent = new CountdownEvent(4);
Parallel.Invoke(
    () => { ...; countEvent.Decrement(); },
    () => { ...; countEvent.Decrement(); },
    () => { ...; countEvent.Decrement(); },
    () => { ...; countEvent.Decrement(); }
);
countEvent.Wait();
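The "interlocked decrement plus event" combination that such a class replaces is easy to sketch.  Here it is in standard C++, with an atomic counter standing in for Interlocked.Decrement and a condition variable standing in for the ManualResetEvent.  The class name and its members are assumptions for illustration, and unlike the real CountdownEvent it does not support incrementing.

```cpp
#include <atomic>
#include <condition_variable>
#include <mutex>
#include <thread>
#include <vector>

class SimpleCountdownEvent
{
    std::atomic<int> count_;
    std::mutex mutex_;
    std::condition_variable signaled_;
public:
    explicit SimpleCountdownEvent(int initial) : count_(initial) {}

    // Returns true if this call brought the counter to zero.
    bool Decrement()
    {
        if (count_.fetch_sub(1) == 1)
        {
            // Taking the mutex before notifying guarantees that a
            // waiter that has just checked the counter is not missed.
            std::lock_guard<std::mutex> guard(mutex_);
            signaled_.notify_all();
            return true;
        }
        return false;
    }

    // Blocks until the counter reaches zero.
    void Wait()
    {
        std::unique_lock<std::mutex> lock(mutex_);
        signaled_.wait(lock, [this] { return count_.load() <= 0; });
    }
};

// Usage: wait for a number of workers, mirroring the code above.
int RunCountdownDemo(int workers)
{
    SimpleCountdownEvent countEvent(workers);
    std::atomic<int> finished{0};
    std::vector<std::thread> threads;
    for (int i = 0; i < workers; ++i)
    {
        threads.emplace_back([&] {
            ++finished;              // the "work"
            countEvent.Decrement();
        });
    }
    countEvent.Wait();
    int seen = finished.load();      // all workers are done by now
    for (std::thread& t : threads) t.join();
    return seen;
}
```

Only the thread that performs the final decrement touches the signaling machinery; all the others pay for a single atomic operation.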

The ManualResetEventSlim and SemaphoreSlim classes feature revised implementations of the familiar event and semaphore concepts.  This revised implementation relies on spinning and on using a Monitor internally, and creating a kernel synchronization primitive only as the last resort (for example, when the WaitAll or WaitAny methods are used, or when a WaitHandle is explicitly requested).  This should provide better performance for the vast majority of applications using these synchronization primitives.
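The spin-first, block-later idea can be sketched in standard C++ as follows.  This is only an illustration of the general approach, not the PFX implementation: the class name and the default spin count are invented, and a mutex with a condition variable stands in for the Monitor.

```cpp
#include <atomic>
#include <condition_variable>
#include <mutex>
#include <thread>

class SimpleSlimEvent
{
    std::atomic<bool> set_{false};
    std::mutex mutex_;
    std::condition_variable changed_;
public:
    void Set()
    {
        std::lock_guard<std::mutex> guard(mutex_);
        set_.store(true);
        changed_.notify_all();
    }

    bool IsSet() const { return set_.load(); }

    void Wait(int spinCount = 100)
    {
        // Fast path: poll the flag a bounded number of times.
        for (int i = 0; i < spinCount; ++i)
        {
            if (set_.load()) return;
            std::this_thread::yield();
        }
        // Slow path: block until Set() signals the condition.
        std::unique_lock<std::mutex> lock(mutex_);
        changed_.wait(lock, [this] { return set_.load(); });
    }
};

// Usage: the main thread waits for a worker to publish a value.
bool WaitForWorker()
{
    SimpleSlimEvent ready;
    int data = 0;
    std::thread worker([&] { data = 42; ready.Set(); });
    ready.Wait();
    bool ok = (data == 42);
    worker.join();
    return ok;
}
```

If the event is set shortly after the wait begins, the waiting thread never blocks at all, which is the case these slim primitives are optimized for.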

Another category of synchronization mechanisms is featured by the LazyInit<T> and WriteOnce<T> structures.  The LazyInit<T> structure supports the lazy initialization paradigm in a thread-safe manner.  (This is suspiciously similar to the one-time initialization mechanism introduced in Windows Vista.)  The WriteOnce<T> structure supports a mechanism for ensuring that a variable is written to at most once, in a thread-safe manner.

The LazyInit<T> structure takes a factory function that performs the initialization when the value is first accessed, and is well-suited for lazily initializing a value that is expensive to initialize eagerly. It supports multiple modes of lazy initialization, exposed by the LazyInitMode enum.  The available options are:

  • EnsureSingleExecution - if multiple threads attempt to access the value concurrently, one of them will execute the factory function to initialize the value, and the rest will wait until the initialization completes.  (This is similar to the synchronous one-time initialization native API, with InitOnceBeginInitialize and InitOnceComplete.)
  • AllowMultipleExecution - if multiple threads attempt to access the value concurrently, all of them will begin executing the factory function to initialize the value, and the first one to succeed will signal the rest that initialization has completed and that this first value should be used.  (This is similar to the asynchronous one-time initialization native API, with the INIT_ONCE_ASYNC flag.)
  • ThreadLocal - if multiple threads attempt to access the value, each thread will execute the factory function to obtain a thread-local value that will be used on that thread only.

In the following example, the value is only initialized when accessed - this can be observed by setting a breakpoint on the second Console.WriteLine line and ensuring that the "Initializing" print-out is only executed when the value is accessed.

LazyInit<string> lazyInit = new LazyInit<string>(
    () => {
        Console.WriteLine("Initializing");
        return "Hello";
    });
Console.WriteLine(lazyInit.Value);
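Native developers can get the EnsureSingleExecution behavior described above from std::call_once in standard C++.  The following is a minimal sketch under a few assumptions: the SimpleLazy name is invented, the value type must be default-constructible, and only the single-execution mode is covered (a thread_local variable would be the natural counterpart of the ThreadLocal mode).

```cpp
#include <atomic>
#include <functional>
#include <mutex>
#include <string>
#include <thread>
#include <utility>
#include <vector>

template <typename T>
class SimpleLazy
{
    std::function<T()> factory_;
    std::once_flag once_;
    T value_{};
public:
    explicit SimpleLazy(std::function<T()> factory)
        : factory_(std::move(factory)) {}

    const T& Value()
    {
        // The factory runs on exactly one thread; any other thread
        // arriving in the meantime waits until it completes.
        std::call_once(once_, [this] { value_ = factory_(); });
        return value_;
    }
};

// Usage: many threads race to read the value, and the function
// returns how many times the factory was actually executed.
int CountFactoryCalls(int threadCount)
{
    std::atomic<int> calls{0};
    SimpleLazy<std::string> lazy([&calls] {
        ++calls;
        return std::string("Hello");
    });
    std::vector<std::thread> threads;
    for (int i = 0; i < threadCount; ++i)
        threads.emplace_back([&lazy] { lazy.Value(); });
    for (std::thread& t : threads) t.join();
    return calls.load();
}
```

As in the managed example, nothing is initialized until the value is first requested.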

The WriteOnce<T> structure is similar to a nullable type that can be set only once.  Any further attempts to set the value will result in an InvalidOperationException.  It also features the TryGetValue and TrySetValue methods, which work in a thread-safe fashion to ensure that you're setting the value only once.
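The Try-style half of this contract boils down to a compare-and-swap on a small state machine.  Here is a minimal sketch in standard C++; the SimpleWriteOnce name and the three-state design are my own assumptions, the value type must be default-constructible, and the throwing setter is left out.

```cpp
#include <atomic>

template <typename T>
class SimpleWriteOnce
{
    enum State { Empty, Writing, Written };
    std::atomic<int> state_{Empty};
    T value_{};
public:
    // Returns true only for the single caller that gets to write.
    bool TrySetValue(const T& value)
    {
        int expected = Empty;
        if (!state_.compare_exchange_strong(expected, Writing))
            return false;  // somebody else has already claimed it
        value_ = value;
        // Publish the value to readers.
        state_.store(Written, std::memory_order_release);
        return true;
    }

    // Returns true and copies the value out if it has been written.
    bool TryGetValue(T& out) const
    {
        if (state_.load(std::memory_order_acquire) != Written)
            return false;
        out = value_;
        return true;
    }
};
```

The intermediate Writing state keeps readers from observing a half-written value while the winning thread is still copying it in.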

Debugging Aids

All of the new synchronization primitives feature debugging views that facilitate understanding the internal state of the synchronization mechanism while debugging in Visual Studio.  For example, here's the debugger view of a countdown event:

image

And here's the debugger view of a lazily initialized variable (yes, there are still some bugs in this CTP):

image

MSDN Pulse June 2008 Is Here!

With a few days' delay, I wanted to let you know that the MSDN Pulse issue for June 2008 has been released.  This month's bulletin features yours truly as the Blogger of the Month - indeed, April and May have been quite productive, blogging-wise.

The primary announcement in this issue is the release of the Visual Studio 2008 Service Pack 1 Beta, which contains lots and lots of new goodies.  I would add to that the recent release of a new Parallel Extensions CTP, which I hope to cover on my blog in a couple of days.

Additionally, this issue features my Visual Studio-related tip regarding the Visual Studio "Command Window".

Tip of the Month

Few people are aware of the Command Window, and even fewer people actually use it, but it can streamline fairly annoying tasks that require multiple mouse clicks through a single command console.  Additionally, the Command Window features auto-completion for commands, so if you don't remember where a command is to be found, you can type a part of the name into the Command Window and it will take care of it for you.

Single WCF Generic Endpoint for One-Way and Request-Reply Call Forwarding

Implementing a WCF forwarding router consists of providing an endpoint with a generic contract which accepts untyped messages.  These messages can then be forwarded to another endpoint, which can in turn use a specific typed or a generic untyped contract.

An example of this scenario would be a bus (or router) which coordinates multiple services.  These services might expose specific contracts, such as IOrderService, IShippingService, IPaymentService and many others.  The intermediary in this case can expose a generic untyped contract to the clients, who will be none the wiser: They can still use the specific contract and communicate to the untyped intermediary which will perform the forwarding to the requested service.  The intermediary doesn't have to be aware of all the different service contracts: It can communicate to the services through the generic contract.  This is best (or my best) illustrated by the following diagram:

image

One of the challenges with implementing such an endpoint is that request-reply and one-way calls must be treated differently.  A typical generic contract for the request-reply message exchange pattern will look like this:

[ServiceContract]
internal interface IGenericContract
{
    [OperationContract(Action = "*", ReplyAction = "*")]
    Message Action(Message message);
}

On the other hand, the same generic contract for the one-way MEP will look like this:

[ServiceContract]
internal interface IOneWayGenericContract
{
    [OperationContract(Action = "*", IsOneWay = true)]
    void Action(Message message);
}

Since these are two contracts, we can't trivially expose them on the same endpoint.  This is annoying because it means the client now has to use a different endpoint (or Via address) for R-R and one-way message exchanges.  Furthermore, since a single contract might contain both R-R and one-way operations, it's highly inconvenient for a client to use different endpoints for different operations on the same contract.

A naive attempt to remedy this might involve trying to mix the two MEPs in the same generic contract, like so:

[ServiceContract]
internal interface ICombinedGenericContract
{
    [OperationContract(Action = "*", ReplyAction = "*")]
    Message Action(Message message);

    [OperationContract(Action = "*", IsOneWay = true)]
    void OneWayAction(Message message);
}

However, this will not work because at service load-time, the contract will fail to validate: It is exposing two different operations with the Action set to *, so WCF has no means of automatically choosing one of them.

What we need to do is to give WCF the ability to disambiguate the two operations.  This means we are not going to use Action="*", but instead provide an IDispatchOperationSelector which will be in charge of choosing the appropriate routing method.  The interface will look like this (note the absence of Action="*" in the operation contracts):

[ServiceContract]
internal interface IFinalGenericContract
{
    [OperationContract(ReplyAction = "*")]
    Message Action(Message message);

    [OperationContract(IsOneWay = true)]
    void OneWayAction(Message message);
}

The operation selector will have to know whether the incoming message is targeted at a one-way operation or not.  This can be accomplished in several ways - for example, if we know the set of specific target contracts, then we can enumerate their contract descriptions at load time and build a cache of which operations are one-way.  Alternatively, we can just check if the incoming message has a ReplyTo SOAP element.  If it doesn't, it's a one-way call.

The following is a simple implementation of IDispatchOperationSelector which also implements IEndpointBehavior to install itself on the relevant dispatch runtimes.  (It's also possible to make it a service behavior and specify it on the service itself.)

sealed class GenericDispatchOperationSelector :
    IDispatchOperationSelector, IEndpointBehavior
{
    #region IEndpointBehavior Members

    public void AddBindingParameters(
        ServiceEndpoint endpoint,
        BindingParameterCollection bindingParameters)
    {
    }

    public void ApplyClientBehavior(
        ServiceEndpoint endpoint,
        ClientRuntime clientRuntime)
    {
    }

    public void ApplyDispatchBehavior(
        ServiceEndpoint endpoint,
        EndpointDispatcher endpointDispatcher)
    {
        endpointDispatcher.ContractFilter =
            new MatchAllMessageFilter();
        DispatchRuntime runtime =
            endpointDispatcher.DispatchRuntime;
        runtime.OperationSelector = this;
    }

    public void Validate(ServiceEndpoint endpoint)
    {
    }

    #endregion

    #region IDispatchOperationSelector Members

    public string SelectOperation(ref Message message)
    {
        //Headers.ReplyTo is an EndpointAddress, which is
        //null when the message has no ReplyTo header
        EndpointAddress replyTo = message.Headers.ReplyTo;
        if (replyTo == null)
            return "OneWayAction";
        //else
        return "Action";
    }

    #endregion
}

With this in place, we can move on to the implementation of the intermediary itself.  We will need to install the endpoint behavior on the host's endpoint which exposes the generic contract.  For example:

ServiceHost host =
    new ServiceHost(typeof(Intermediary));
ServiceEndpoint endpoint =
    host.AddServiceEndpoint(
        typeof(IFinalGenericContract), ...);
endpoint.Behaviors.Add(
    new GenericDispatchOperationSelector());
host.Open();

The final bit that needs to fall in its place is the forwarding logic.  It might appear as if we can use the IFinalGenericContract devised above to communicate with one-way or R-R services alike.  However, this isn't the case!  If we use this generic contract, we effectively forward the need to use an operation selector and impose it on the services, which is not what we need.  Instead, we need a pair of interfaces - the IGenericContract and IOneWayGenericContract outlined above, one for each MEP:

internal sealed class Intermediary : IFinalGenericContract
{
    #region IFinalGenericContract Members

    public Message Action(Message message)
    {
        //Request-reply case:
        IGenericContract proxy = ...;
        return proxy.Action(message);
    }

    public void OneWayAction(Message message)
    {
        //One-way case:
        IOneWayGenericContract proxy = ...;
        proxy.Action(message);
    }

    #endregion
}

This is all we need to implement forwarding logic in one place for the one-way and R-R patterns.  The client code is exactly the same and the same endpoint can be used for both MEPs.  Extending this sample to support the duplex MEP is left as an exercise for the reader.  :-)