June 2008 - Posts
Kernel-mode crash dump analysis, affectionately called "Blue-Screen Analysis" thanks to the manifestation of kernel-mode crashes in Windows, is an extremely complicated topic to master. Analyzing user-mode crash dumps is hard enough, plagued by missing information, mismatched symbols, dump corruption and inability to reproduce the live problem. Kernel-mode crash dumps add a new dimension of complexity due to the interaction of multiple components (drivers, user-mode processes, Windows core services and components), which is often the root cause of the crash. Additionally, analyzing a dump of significant complexity requires a great amount of knowledge about Windows system mechanisms and kernel-mode programming in general (interrupts, DPCs, APCs, thread scheduling and many other areas well covered by our Windows Internals course).
I've looked into kernel-mode crash analysis in the past, as part of the voluminous "Debugging and Investigation Tools" post, where I demonstrated isolating and pinpointing a faulty driver through the use of Driver Verifier. For now, though, I would like to focus on the ABC of Blue-Screen Dump Analysis - the steps any of us can take at home to determine why our favorite laptop is giving us the blue-screen goodness with every meal.
Step A - Send Your Error Reports to Microsoft
The easiest way of getting your problem diagnosed and, if at all possible, resolved is to send the error report to Microsoft Online Crash Analysis. After the system recovers from a blue-screen, it will ask you to send the information to Microsoft, and you should do so. More often than not, shortly afterwards or a few days or weeks later, there will be a solution available for your problem.
If this kind of automatic diagnosis is not enough for your needs; if you're not getting a prompt solution to the problem; if you're curious what happens behind the scenes of a crash dump... then read on.
Last week one of my acquaintances was kind enough to give me the exact material necessary for this kind of ABC post - a collection of 18 blue-screen crash dumps from his laptop, collected across a period of 3 months. To begin with, where do you actually find this kind of information?
Step B - Dumps Live at %SYSTEMROOT%\Minidump
Try looking at the %SYSTEMROOT%\Minidump folder right now to find out if you've had any blue-screens lately. On my laptop, over the last 1.5 years, I have accumulated a measly 5 dumps.
Minidumps are small, so a kernel crash dump is something you can easily send over the Internet to a curious colleague or, as we have already seen, to... Microsoft Online Crash Analysis. And of course you can open it yourself to see what's lurking inside.
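If you'd rather script this check than browse the folder, here is a minimal C# sketch that lists the dump files in a folder, newest first. The DumpFinder class and its methods are hypothetical names for illustration; the default folder is simply %SYSTEMROOT%\Minidump expanded through the environment.

```csharp
using System;
using System.IO;
using System.Linq;

static class DumpFinder
{
    // Expands %SYSTEMROOT%\Minidump, e.g. C:\Windows\Minidump (hypothetical helper).
    public static string DefaultMinidumpFolder() =>
        Path.Combine(Environment.ExpandEnvironmentVariables("%SYSTEMROOT%"), "Minidump");

    // Returns the *.dmp files in the folder, most recently written first.
    // A missing folder is treated as "no dumps".
    public static FileInfo[] ListDumps(string folder)
    {
        var dir = new DirectoryInfo(folder);
        if (!dir.Exists)
            return new FileInfo[0];
        return dir.GetFiles("*.dmp")
                  .OrderByDescending(f => f.LastWriteTimeUtc)
                  .ToArray();
    }
}
```

Calling DumpFinder.ListDumps(DumpFinder.DefaultMinidumpFolder()) and printing each file's Name and Length gives you the same overview as the Explorer window, ready to be copied somewhere for analysis.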
Step C - Bugs Fear WinDbg The Most
The single best tool for diagnosing kernel-mode crash dumps is WinDbg, part of the Debugging Tools for Windows package which I have extensively covered in the past. It's a free download from Microsoft, and its facilities for analyzing kernel-mode and user-mode problems are truly endless.
All you need to do with a blue-screen dump to get some meaningful information from WinDbg is configure symbols (File -> Symbol File Path -> srv*C:\SymbolCache*http://msdl.microsoft.com/download/symbols) and File -> Open Crash Dump. The next thing you'll see will closely resemble the following, and it will occur after an unspecified delay while your system is downloading symbols from the web:
Loading Dump File [D:\Temp\Dumps\Mini032308-01.dmp]
Mini Kernel Dump File: Only registers and stack trace are available
Symbol search path is: srv*D:\Symbols*http://msdl.microsoft.com/download/symbols
Executable search path is:
Windows Kernel Version 6001 (Service Pack 1) MP (2 procs) Free x86 compatible
Product: WinNt, suite: TerminalServer SingleUserTS
Built by: 6001.18000.x86fre.longhorn_rtm.080118-1840
Kernel base = 0x81c1a000 PsLoadedModuleList = 0x81d31c70
Debug session time: Sun Mar 23 18:29:59.623 2008 (GMT+3)
System Uptime: 0 days 0:09:01.883
Loading Kernel Symbols
......................................................................................................................................................
Loading User Symbols
Loading unloaded module list
.....
Use !analyze -v to get detailed debugging information.
BugCheck 1A, {4000, 8655d188, 80000000, 17e05c}
Probably caused by : memory_corruption ( nt!MiDeleteVirtualAddresses+7ef )
Followup: MachineOwner
---------
The interesting parts are the machine information (Vista SP1, 32-bit, 2 CPUs), the system uptime (just 9 minutes!) and the probable cause, right in front of us. The debugger thinks it's a memory corruption, and suggests that we use the !analyze -v command for more detailed information. Let's have a look:
1: kd> !analyze -v
MEMORY_MANAGEMENT (1a)
# Any other values for parameter 1 must be individually examined.
Arguments:
Arg1: 00004000, The subtype of the bugcheck.
Arg2: 8655d188
Arg3: 80000000
Arg4: 0017e05c
Debugging Details:
------------------
BUGCHECK_STR: 0x1a_4000
CUSTOMER_CRASH_COUNT: 1
DEFAULT_BUCKET_ID: VISTA_DRIVER_FAULT
PROCESS_NAME: svchost.exe
CURRENT_IRQL: 0
LAST_CONTROL_TRANSFER: from 81c584cf to 81ce7163
STACK_TEXT:
a9e7baa4 81c584cf 0000001a 00004000 8655d188 nt!KeBugCheckEx+0x1e
a9e7bbd8 81cab82c 0e430002 0f586fff 8ddec810 nt!MiDeleteVirtualAddresses+0x7ef
a9e7bca8 81caadc5 8ddec810 84751ad8 84574d78 nt!MiRemoveMappedView+0x4aa
a9e7bcd0 81e3eb9d 84574d78 00000000 ffffffff nt!MiRemoveVadAndView+0xe3
a9e7bd34 81e3ecee 8ddec810 0e430000 00000000 nt!MiUnmapViewOfSection+0x265
a9e7bd54 81c71a7a ffffffff 0e430000 043eed4c nt!NtUnmapViewOfSection+0x55
a9e7bd54 77909a94 ffffffff 0e430000 043eed4c nt!KiFastCallEntry+0x12a
WARNING: Frame IP not in any known module. Following frames may be wrong.
043eed4c 00000000 00000000 00000000 00000000 0x77909a94
STACK_COMMAND: kb
FOLLOWUP_IP:
nt!MiDeleteVirtualAddresses+7ef
81c584cf cc int 3
SYMBOL_STACK_INDEX: 1
SYMBOL_NAME: nt!MiDeleteVirtualAddresses+7ef
FOLLOWUP_NAME: MachineOwner
MODULE_NAME: nt
DEBUG_FLR_IMAGE_TIMESTAMP: 47918b12
IMAGE_NAME: memory_corruption
FAILURE_BUCKET_ID: 0x1a_4000_nt!MiDeleteVirtualAddresses+7ef
BUCKET_ID: 0x1a_4000_nt!MiDeleteVirtualAddresses+7ef
Followup: MachineOwner
---------
Note that we have no specifics regarding the user-mode stack that caused the crash because it's a kernel-only minidump (no user-mode information was captured). However, we see that the memory_corruption indication is pretty consistent. Looking this up on the web we see multiple recommendations:
- Run some memory diagnostic tools
- Use tools like DebugWiz to further diagnose the problem
- Send the hardware to the manufacturer for inspection
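With 18 dumps on hand, it helps to triage them by bugcheck code before reading each one in full. Here is a minimal C# sketch, assuming you have saved the "BugCheck ..." summary line that WinDbg prints for each dump; the BugCheckLine type and its members are hypothetical names, and the parser only understands the exact line format shown above (hexadecimal values without a 0x prefix).

```csharp
using System;
using System.Globalization;
using System.Linq;
using System.Text.RegularExpressions;

sealed class BugCheckLine
{
    public uint Code { get; private set; }
    public ulong[] Args { get; private set; }

    // Matches e.g. "BugCheck 1A, {4000, 8655d188, 80000000, 17e05c}".
    static readonly Regex Pattern = new Regex(
        @"^BugCheck\s+([0-9A-Fa-f]+),\s*\{([^}]*)\}");

    public static BugCheckLine Parse(string line)
    {
        Match m = Pattern.Match(line.Trim());
        if (!m.Success)
            throw new FormatException("Not a BugCheck summary line: " + line);
        return new BugCheckLine
        {
            Code = uint.Parse(m.Groups[1].Value, NumberStyles.HexNumber),
            Args = m.Groups[2].Value
                .Split(',')
                .Select(a => ulong.Parse(a.Trim(), NumberStyles.HexNumber))
                .ToArray()
        };
    }

    // Grouping key, e.g. "0x1A" or "0x1000008E".
    public string Bucket => "0x" + Code.ToString("X");
}
```

Grouping the parsed lines by Bucket (for example with LINQ's GroupBy) shows at a glance how many of the crashes share the same bugcheck code, which is a quick way to spot a recurring pattern like the memory management bugchecks above.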
Let's take a look at another dump (we have 18 of them, so no need to use them sparingly):
BugCheck 1000008E, {c0000005, 81e63829, aea91860, 0}
Probably caused by : ntkrpamp.exe ( nt!PfGetCompletedTrace+138 )
Followup: MachineOwner
---------
1: kd> !analyze -v
KERNEL_MODE_EXCEPTION_NOT_HANDLED_M (1000008e)
Arguments:
Arg1: c0000005, The exception code that was not handled
Arg2: 81e63829, The address that the exception occurred at
Arg3: aea91860, Trap Frame
Arg4: 00000000
Debugging Details:
------------------
EXCEPTION_CODE: (NTSTATUS) 0xc0000005 - The instruction at 0x%08lx referenced memory at 0x%08lx. The memory could not be %s.
FAULTING_IP:
nt!PfGetCompletedTrace+138
81e63829 894804 mov dword ptr [eax+4],ecx
TRAP_FRAME: aea91860 -- (.trap 0xffffffffaea91860)
ErrCode = 00000002
eax=00000000 ebx=00000001 ecx=81d341a4 edx=da84a000 esi=81d341c0 edi=81d341b4
eip=81e63829 esp=aea918d4 ebp=aea91928 iopl=0 nv up ei ng nz na pe cy
cs=0008 ss=0010 ds=0023 es=0023 fs=0030 gs=0000 efl=00010287
nt!PfGetCompletedTrace+0x138:
81e63829 894804 mov dword ptr [eax+4],ecx ds:0023:00000004=????????
Resetting default scope
CUSTOMER_CRASH_COUNT: 1
DEFAULT_BUCKET_ID: VISTA_DRIVER_FAULT
BUGCHECK_STR: 0x8E
PROCESS_NAME: svchost.exe
CURRENT_IRQL: 0
LAST_CONTROL_TRANSFER: from 81e62c63 to 81e63829
STACK_TEXT:
aea91928 81e62c63 01240000 00004000 aea91d30 nt!PfGetCompletedTrace+0x138
aea919a0 81e6e0ca 00000000 adb85501 aea91d30 nt!PfQuerySuperfetchInformation+0x204
aea91d4c 81c8ca7a 0000004f 012bf370 00000014 nt!NtQuerySystemInformation+0x2201
aea91d4c 77629a94 0000004f 012bf370 00000014 nt!KiFastCallEntry+0x12a
WARNING: Frame IP not in any known module. Following frames may be wrong.
012bf598 00000000 00000000 00000000 00000000 0x77629a94
STACK_COMMAND: kb
FOLLOWUP_IP:
nt!PfGetCompletedTrace+138
81e63829 894804 mov dword ptr [eax+4],ecx
SYMBOL_STACK_INDEX: 0
SYMBOL_NAME: nt!PfGetCompletedTrace+138
FOLLOWUP_NAME: MachineOwner
MODULE_NAME: nt
IMAGE_NAME: ntkrpamp.exe
DEBUG_FLR_IMAGE_TIMESTAMP: 47918b12
FAILURE_BUCKET_ID: 0x8E_nt!PfGetCompletedTrace+138
BUCKET_ID: 0x8E_nt!PfGetCompletedTrace+138
Followup: MachineOwner
---------
This one sure looks different. This time the module that takes the blame is not the generic memory_corruption, but the very specific ntkrpamp.exe, which is the Windows kernel itself! Examining the stack trace, it seems like a very innocent stack related to SuperFetch, the memory caching and preloading feature built into the OS, which triggers an access violation: the trap frame shows eax=00000000, so the faulting instruction mov dword ptr [eax+4],ecx attempts to write to address 00000004. A genuine bug in this kernel code path is possible but unlikely, especially since we have seen traces of memory corruption in the previous dump, and SuperFetch is one of those services that access memory quite heavily. Let's take a look at another one:
BugCheck 50, {fb400428, 1, 81e71d60, 0}
Probably caused by : win32k.sys ( win32k!vSolidFillRect1+107 )
Followup: MachineOwner
---------
0: kd> !analyze -v
PAGE_FAULT_IN_NONPAGED_AREA (50)
Arguments:
Arg1: fb400428, memory referenced.
Arg2: 00000001, value 0 = read operation, 1 = write operation.
Arg3: 81e71d60, If non-zero, the instruction address which referenced the bad memory address.
Arg4: 00000000, (reserved)
Debugging Details:
------------------
FAULTING_IP:
nt!RtlFillMemoryUlong+10
81e71d60 f3ab rep stos dword ptr es:[edi]
MM_INTERNAL_CODE: 0
CUSTOMER_CRASH_COUNT: 1
DEFAULT_BUCKET_ID: VISTA_DRIVER_FAULT
BUGCHECK_STR: 0x50
PROCESS_NAME: devenv.exe
CURRENT_IRQL: 0
TRAP_FRAME: 8e00f840 -- (.trap 0xffffffff8e00f840)
ErrCode = 00000002
eax=00f0f0f0 ebx=00000202 ecx=00000011 edx=00000011 esi=fb200008 edi=fb400428
eip=81e71d60 esp=8e00f8b4 ebp=8e00f8e8 iopl=0 nv up ei pl nz na pe nc
cs=0008 ss=0010 ds=0023 es=0023 fs=0030 gs=0000 efl=00010206
nt!RtlFillMemoryUlong+0x10:
81e71d60 f3ab rep stos dword ptr es:[edi] es:0023:fb400428=????????
Resetting default scope
LAST_CONTROL_TRANSFER: from 81e78bb4 to 81ec3155
STACK_TEXT:
8e00f828 81e78bb4 00000001 fb400428 00000000 nt!MmAccessFault+0x10a
8e00f828 81e71d60 00000001 fb400428 00000000 nt!KiTrap0E+0xdc
8e00f8b4 961106f7 fb400428 00000044 00f0f0f0 nt!RtlFillMemoryUlong+0x10
8e00f8e8 9610bcc7 8e00fb44 00000001 fb200008 win32k!vSolidFillRect1+0x107
8e00fa88 9610b8b7 961105f0 8e00fb44 fda2dac8 win32k!vDIBSolidBlt+0x102
8e00faf4 960ded53 ffa81008 00000000 00000000 win32k!EngBitBlt+0x18e
8e00fb60 9609947b fda2da5c fda2dac8 181f35b1 win32k!ExtTextOutRect+0x1cf
8e00fbc8 960f8775 8e00fd0c 7ffdf2e4 006ce26c win32k!GreBatchTextOutRect+0xcb
8e00fd34 81e75a1c 00000099 0020ee6c 0020ee90 win32k!NtGdiFlushUserBatch+0x134
8e00fd44 77309a94 badb0d00 0020ee6c 00000000 nt!KiFastCallEntry+0xcc
WARNING: Frame IP not in any known module. Following frames may be wrong.
8e00fd48 badb0d00 0020ee6c 00000000 00000000 0x77309a94
8e00fd4c 0020ee6c 00000000 00000000 00000000 0xbadb0d00
8e00fd50 00000000 00000000 00000000 00000000 0x20ee6c
STACK_COMMAND: kb
FOLLOWUP_IP:
win32k!vSolidFillRect1+107
961106f7 8b55f4 mov edx,dword ptr [ebp-0Ch]
SYMBOL_STACK_INDEX: 3
SYMBOL_NAME: win32k!vSolidFillRect1+107
FOLLOWUP_NAME: MachineOwner
MODULE_NAME: win32k
IMAGE_NAME: win32k.sys
DEBUG_FLR_IMAGE_TIMESTAMP: 47c78851
FAILURE_BUCKET_ID: 0x50_W_win32k!vSolidFillRect1+107
BUCKET_ID: 0x50_W_win32k!vSolidFillRect1+107
Followup: MachineOwner
---------
This time, it's win32k.sys (the built-in windowing and graphics driver) taking the blame for the crash, as part of code that fills memory (nt!RtlFillMemoryUlong, called from win32k!vSolidFillRect1) and faults while writing to address fb400428. The originating process this time is devenv.exe (Visual Studio itself). Again, it's highly unlikely that the win32k code is indeed at fault here - either it's a physical memory corruption, or some faulty driver is running over memory. Let's take a look at a final, fourth dump before we start coming up with action items:
BugCheck 1A, {4000, 8d6a3678, 80000000, 17dfed}
Probably caused by : memory_corruption ( nt!MiDeleteVirtualAddresses+7ef )
Followup: MachineOwner
---------
0: kd> !analyze -v
Debugging Details:
------------------
BUGCHECK_STR: 0x1a_4000
CUSTOMER_CRASH_COUNT: 1
DEFAULT_BUCKET_ID: VISTA_DRIVER_FAULT
PROCESS_NAME: iexplore.exe
CURRENT_IRQL: 0
LAST_CONTROL_TRANSFER: from 81e8e4cf to 81f1d163
STACK_TEXT:
c2da4b5c 81e8e4cf 0000001a 00004000 8d6a3678 nt!KeBugCheckEx+0x1e
c2da4c94 81ee236e 0e770000 0ed41fff 07a4b321 nt!MiDeleteVirtualAddresses+0x7ef
c2da4d2c 81ea7a7a ffffffff 0e33ee50 0e33ee44 nt!NtFreeVirtualMemory+0x652
c2da4d2c 77469a94 ffffffff 0e33ee50 0e33ee44 nt!KiFastCallEntry+0x12a
WARNING: Frame IP not in any known module. Following frames may be wrong.
0e33ed9c 00000000 00000000 00000000 00000000 0x77469a94
STACK_COMMAND: kb
FOLLOWUP_IP:
nt!MiDeleteVirtualAddresses+7ef
81e8e4cf cc int 3
SYMBOL_STACK_INDEX: 1
SYMBOL_NAME: nt!MiDeleteVirtualAddresses+7ef
FOLLOWUP_NAME: MachineOwner
MODULE_NAME: nt
DEBUG_FLR_IMAGE_TIMESTAMP: 47918b12
IMAGE_NAME: memory_corruption
FAILURE_BUCKET_ID: 0x1a_4000_nt!MiDeleteVirtualAddresses+7ef
BUCKET_ID: 0x1a_4000_nt!MiDeleteVirtualAddresses+7ef
Followup: MachineOwner
---------
Ah, it's our friend memory_corruption again, this time with iexplore.exe (Internet Explorer) as the current process. Time to wrap it up.
Conclusion: We are looking at either a hardware problem (defective physical memory, overclocked physical memory or some other faulty component) or a misbehaving driver that randomly corrupts memory as part of its normal operation. If it's hardware, we can run memory diagnostic tools and send the machine to the manufacturer for replacement; if it's a driver, we are looking at a long story of downloading the latest versions of all drivers, ensuring that no rogue or irrelevant drivers are installed, enabling Driver Verifier on suspect drivers and waiting to reproduce the problem and catch the faulty component in the act.
Last week I resolved a simple "debugging" case by phone, and figured that it might be worth putting online. Here's the approximate outline of the call:
Customer: Sasha, we have a COM component registration that is failing because regsvr32 says it can't find the DLL.
Myself: Where is it looking for that DLL?
Customer: It doesn't matter where it's looking. We put the component in the System32 directory, but it still complained that it can't find it.
Myself: [With a flash of psychic debugging] Is it a 64-bit system and you're trying to register a 32-bit component? You should put the component in the SysWOW64 directory.
Customer: Yes, how did you know that?
This is one of those psychic debugging cases, where you can see the answer in a split second if you've encountered the situation before. When a 32-bit application (be it an installer or the 32-bit regsvr32.exe) thinks it's looking for a file in the System32 directory, it's actually looking for the file in the SysWOW64 directory.
The reason for this behavior is plain old compatibility. 32-bit applications running on top of the Windows-on-Windows64 (WOW64) layer should be none the wiser regarding the location of system DLLs. Therefore, file system redirection ensures that any accesses (even hard-coded accesses) to the System32 folder are routed to SysWOW64. A similar split exists for the Program Files directory: on a 64-bit system, 32-bit applications that ask for its location (for example, through the ProgramFiles environment variable) are pointed at Program Files (x86) instead. And finally, if you're not careful, the same issue can bite you with registry redirection - there is a separate view of the registry for 32-bit applications on 64-bit Windows, and only a small number of keys are reflected across both registry views for interoperability scenarios.
This can lead to the very frustrating situation where you're repeatedly trying to copy the file to System32 and run the registration code, only to be told that the file could not be found.
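When in doubt about where a DLL belongs, you can check its bitness before copying it. Here is a minimal C# sketch that reads the Machine field of the PE file header (0x014C for x86 and 0x8664 for x64, as defined by the PE/COFF specification) and picks the physical system directory a 64-bit tool such as Explorer should copy the DLL into. PeBitness and SystemFolderFor are hypothetical names, and the sketch only handles these two machine types.

```csharp
using System;
using System.IO;
using System.Text;

static class PeBitness
{
    public const ushort MachineI386 = 0x014C;
    public const ushort MachineAmd64 = 0x8664;

    // Reads the Machine field of the COFF file header of a PE image.
    public static ushort ReadMachine(Stream image)
    {
        using (var reader = new BinaryReader(image, Encoding.UTF8, leaveOpen: true))
        {
            image.Seek(0, SeekOrigin.Begin);
            if (reader.ReadUInt16() != 0x5A4D)        // "MZ"
                throw new InvalidDataException("Missing MZ signature");
            image.Seek(0x3C, SeekOrigin.Begin);
            int peHeaderOffset = reader.ReadInt32();   // e_lfanew
            image.Seek(peHeaderOffset, SeekOrigin.Begin);
            if (reader.ReadUInt32() != 0x00004550)     // "PE\0\0"
                throw new InvalidDataException("Missing PE signature");
            return reader.ReadUInt16();                // Machine
        }
    }

    // Physical system directory for a component, as seen by a 64-bit process.
    // (A 32-bit process asking for System32 is redirected to SysWOW64 anyway.)
    public static string SystemFolderFor(ushort machine, bool osIs64Bit) =>
        osIs64Bit && machine == MachineI386 ? "SysWOW64" : "System32";
}
```

For a real file, open it with File.OpenRead and pass Environment.Is64BitOperatingSystem (available from .NET 4.0) as the second argument.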
If you're keen enough on porting your applications to 64-bit Windows, you're probably not going to port every single line of code you've ever written - at the same time. .NET applications are easiest to port, but native code takes time. Therefore, you are probably going to end up running 32-bit applications on 64-bit Windows and getting 32-bit processes to talk to 64-bit processes. This can be challenging, and we at Sela have prepared a 2-day course for addressing the new 64-bit architecture, improvements and practical porting issues for managed and native applications, and writing high-performance concurrent applications on top of the newest versions of Windows.
I've just uploaded a new open-source project called "Non-Paged CLR Host" to CodePlex, in collaboration with Alon Fliess. This project features a custom CLR host that can be used for executing any existing .NET application with little or no modification. This custom CLR host ensures that all memory given out to the CLR is locked into physical memory if possible, thus eliminating paging in all but very exceptional cases.
This can provide two important and distinct advantages to server and client applications alike:
- Applications will benefit from no paging during normal operation. Even if other applications are actively allocating memory, allocations performed under the non-paged CLR host will be locked into physical memory.
- No paging will occur when the application is idle, providing a great benefit to low-latency processes such as GUI applications (even if the user has fallen asleep in front of the monitor). The normal working set management scheme employed by Windows will not affect processes running under the non-paged CLR host.
The non-paged CLR host is available in x86 (32-bit) and x64 (64-bit) builds. You can compile your own flavor for Windows 2000 and above on any supported processor. Please note that this is a preliminary version that has not been extensively tested, so we strongly recommend that you use it in a controlled environment for testing purposes only.
Using the non-paged CLR host is extremely simple. The current host consists of a console application that executes an assembly passed to it via the command line. The only constraint imposed on the code to be executed is that it must reside in a static method which returns an int and accepts a string as a parameter (note: not string[]), such as the following method:
#pragma warning disable 0028
public static int Main(string str)
#pragma warning restore 0028
{
return str.Length;
}
(Warning CS0028 indicates that you have a Main method with a mismatched signature.) Assuming that this method is placed in a class called Program that resides in an assembly called TestHost.dll, the following command line can be used to execute it:
AweClrHost_Win32_Release.exe TestHost.dll Program Main
You can also pass the parameter to the Main method on the same command line. If you omit it, it defaults to the minimum working set size that the host has been able to reserve for the process (formatted as a string).
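The static int Method(string) constraint matches the signature required by the unmanaged hosting API ICLRRuntimeHost::ExecuteInDefaultAppDomain, which suggests (but this post does not confirm) that the host uses it to start the target code. For illustration only, here is a managed C# sketch that locates and invokes a method of the same shape via reflection. EntryPointInvoker and SampleProgram are hypothetical names, and this is not the host's actual code.

```csharp
using System;
using System.Reflection;

static class EntryPointInvoker
{
    // Loads an assembly from disk and runs typeName.methodName(argument).
    // typeName must be the full name, including any namespace.
    public static int Run(string assemblyPath, string typeName,
                          string methodName, string argument)
    {
        Type type = Assembly.LoadFrom(assemblyPath).GetType(typeName, true);
        return Invoke(type, methodName, argument);
    }

    // Finds a static method with the signature int Method(string) and invokes it.
    public static int Invoke(Type type, string methodName, string argument)
    {
        MethodInfo method = type.GetMethod(
            methodName,
            BindingFlags.Public | BindingFlags.NonPublic | BindingFlags.Static,
            null, new[] { typeof(string) }, null);
        if (method == null || method.ReturnType != typeof(int))
            throw new MissingMethodException(type.FullName, methodName);
        return (int)method.Invoke(null, new object[] { argument });
    }
}

// Stand-in for the Program class in TestHost.dll.
static class SampleProgram
{
    public static int Entry(string str) => str.Length;
}
```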
From an implementation perspective, the non-paged CLR host uses the SetProcessWorkingSetSize, SetProcessWorkingSetSizeEx (on Windows Server 2003 and above) and VirtualLock APIs to ensure that memory allocated by it is locked into physical memory. Note that using the above APIs does not guarantee with absolute certainty that no paging will occur; instead, it confines paging to very exceptional scenarios, which are really unlikely in practice. In some load tests I conducted, even when the system as a whole was starved for physical memory, no page faults were observed in the process using the non-paged CLR host. In future posts, we will (hopefully) delve into the implementation highlights of a custom CLR host and some of the surprises that lurk along the way.
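The Win32 documentation for VirtualLock states that the number of pages a process can lock is bounded by its minimum working set size, which is why the two APIs go together. Here is a minimal, Windows-only C# sketch of that general pattern: grow the working set by the size of a region, then lock the region. The P/Invoke declarations follow the documented kernel32 signatures; the WorkingSetLock class is hypothetical and is not taken from the host's sources. It assumes the address is page-aligned (for example, memory obtained from VirtualAlloc).

```csharp
using System;
using System.ComponentModel;
using System.Runtime.InteropServices;

static class WorkingSetLock
{
    [DllImport("kernel32.dll")]
    static extern IntPtr GetCurrentProcess();

    [DllImport("kernel32.dll", SetLastError = true)]
    [return: MarshalAs(UnmanagedType.Bool)]
    static extern bool GetProcessWorkingSetSize(
        IntPtr hProcess, out UIntPtr minimum, out UIntPtr maximum);

    [DllImport("kernel32.dll", SetLastError = true)]
    [return: MarshalAs(UnmanagedType.Bool)]
    static extern bool SetProcessWorkingSetSize(
        IntPtr hProcess, UIntPtr minimum, UIntPtr maximum);

    [DllImport("kernel32.dll", SetLastError = true)]
    [return: MarshalAs(UnmanagedType.Bool)]
    static extern bool VirtualLock(IntPtr address, UIntPtr size);

    // Rounds value up to the next multiple of alignment (e.g. the page size).
    public static long AlignUp(long value, long alignment) =>
        (value + alignment - 1) / alignment * alignment;

    // Windows-only: grows the working set minimum (and the maximum, if needed)
    // by the page-rounded region size, then locks the region.
    public static void Lock(IntPtr address, long size)
    {
        ulong bytes = (ulong)AlignUp(size, Environment.SystemPageSize);
        IntPtr self = GetCurrentProcess();
        UIntPtr min, max;
        if (!GetProcessWorkingSetSize(self, out min, out max))
            throw new Win32Exception(Marshal.GetLastWin32Error());
        ulong newMin = min.ToUInt64() + bytes;
        ulong newMax = Math.Max(max.ToUInt64(), newMin);
        if (!SetProcessWorkingSetSize(self, new UIntPtr(newMin), new UIntPtr(newMax)))
            throw new Win32Exception(Marshal.GetLastWin32Error());
        if (!VirtualLock(address, new UIntPtr(bytes)))
            throw new Win32Exception(Marshal.GetLastWin32Error());
    }
}
```

As the post notes, even this does not give an absolute guarantee; it only makes paging of the locked region very unlikely.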
Wrapping this up, Alon and I will be very glad to hear about your experiences with this preliminary release. There are many interesting scenarios that can benefit from using the non-paged CLR host, and we would appreciate any such scenarios you might have encountered or any other kind of feedback. You can contact me through the blog, as usual. Alon can be contacted through his blog as well.
Workflow Services (introduced in .NET 3.5) are based on a simple convention for passing the workflow instance identifier from the client to the workflow and from the workflow to any services it invokes. This convention revolves around the use of context-ful bindings (BasicHttpContextBinding, WSHttpContextBinding, NetTcpContextBinding and others) and a simple dictionary which contains a key called "instanceId" and a value that contains the workflow instance identifier.
This information is passed out-of-band to facilitate cleaner interfaces - it's passed in a SOAP header named Context (represented by ContextMessageHeader, an internal WCF class), and can be accessed from the channel's context manager or through a message property called ContextMessageProperty. (For completeness it's also worth stating that the context information can be passed in an HTTP cookie - which is the approach taken by the BasicHttpContextBinding - but we will focus on the SOAP header approach.)
If a client wants to communicate with a specific workflow instance, then the following plumbing code will ensure that the right instance will service the request:
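//Extension for applying instance id to context
//(must be declared in a static class)
public static void ApplyInstanceId(this IContextChannel proxy, Guid id)
{
IContextManager ctxManager = proxy.GetProperty<IContextManager>();
IDictionary<string,string> ctx = ctxManager.GetContext();
//Use the indexer rather than Add, so that an existing instanceId
//is overwritten instead of throwing an ArgumentException
ctx["instanceId"] = id.ToString();
ctxManager.SetContext(ctx);
}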
This extension method can be used on the proxy to the workflow endpoint:
IWorkflow workflowProxy = ChannelFactory<IWorkflow>.CreateChannel(
Common.Binding, new EndpointAddress(Common.WorkflowAddress));
((IContextChannel)workflowProxy).ApplyInstanceId(workflow.InstanceId);
This causes the message to include a context message header with the workflow instance identifier embedded in it. The message will appear similar to the following (note the Context element in the header):
<s:Envelope
xmlns:s="http://www.w3.org/2003/05/soap-envelope"
xmlns:a="http://www.w3.org/2005/08/addressing">
<s:Header>
<a:Action s:mustUnderstand="1">
http://tempuri.org/IWorkflow/Echo
</a:Action>
<a:MessageID>
urn:uuid:610c88d9-3365-4aa5-ac8c-3e7949621c80
</a:MessageID>
<a:ReplyTo>
<a:Address>
http://www.w3.org/2005/08/addressing/anonymous
</a:Address>
</a:ReplyTo>
<Context xmlns="http://schemas.microsoft.com/ws/2006/05/context">
<Property name="instanceId">
dec52f54-c51b-4025-888d-58f29507f572
</Property>
</Context>
<a:To s:mustUnderstand="1">
http://localhost:9092/Intermediary/RR
</a:To>
</s:Header>
<s:Body>
<Echo xmlns="http://tempuri.org/">
<message>Hello</message>
</Echo>
</s:Body>
</s:Envelope>
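When inspecting logged messages, it can be handy to pull the instance identifier straight out of the envelope text. Here is a minimal C# sketch using LINQ to XML, assuming the SOAP 1.2 envelope and Context header layout shown above; ContextHeaderReader is a hypothetical helper for offline inspection, not a WCF API.

```csharp
using System;
using System.Linq;
using System.Xml.Linq;

static class ContextHeaderReader
{
    static readonly XNamespace Soap12 = "http://www.w3.org/2003/05/soap-envelope";
    static readonly XNamespace Ctx = "http://schemas.microsoft.com/ws/2006/05/context";

    // Returns the instanceId carried in the Context SOAP header, or null if absent.
    public static Guid? GetInstanceId(string envelopeXml)
    {
        XElement header = XDocument.Parse(envelopeXml).Root?.Element(Soap12 + "Header");
        XElement property = header?
            .Elements(Ctx + "Context")
            .Elements(Ctx + "Property")
            .FirstOrDefault(p => (string)p.Attribute("name") == "instanceId");
        return property == null ? (Guid?)null : Guid.Parse(property.Value.Trim());
    }
}
```

Note that the Property element inherits the context namespace from the default xmlns declared on Context, which is why both lookups use the same namespace.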
On the other hand, when receiving a callback from a workflow, a service has to use the following plumbing code to determine which specific workflow instance is responsible for the call:
//Extension for extracting instance id from message properties
public static string GetInstanceId(this MessageProperties properties)
{
ContextMessageProperty property;
if (!ContextMessageProperty.TryGet(properties, out property))
{
throw new InvalidOperationException("No ContextMessageProperty");
}
return property.Context["instanceId"];
}
This extension method can be used on the incoming message properties:
string workflowInstanceId =
OperationContext.Current.IncomingMessageProperties.GetInstanceId();
This is all fairly straightforward and well-covered by existing technology samples. The challenge is to forward context-ful messages through an intermediary (such as the intermediary covered in previous posts). The naive approach is not going to work for two primary reasons.
To begin with, the IContextManager approach is incompatible with the ContextMessageProperty: if the outgoing channel created within the router has context management enabled and the outgoing message properties contain the context message property, an exception will be thrown. Therefore, we have to either disable context management on the outgoing channel or remove the context message property before forwarding the message. Either option is feasible, so here's how to disable context management (the alternative is left as an exercise for the reader):
//Extension for disabling context management on the channel
public static void DisableContextManagement(this IContextChannel channel)
{
IContextManager ctxManager = channel.GetProperty<IContextManager>();
ctxManager.Enabled = false;
}
This can be used on any outgoing channel. For example, in the sample project featured in this post, the router code for request-reply messages now becomes:
public Message ActionRR(Message request)
{
//Forward to workflow
IGenericRR proxy = ChannelFactory<IGenericRR>.CreateChannel(
Common.Binding, new EndpointAddress(Common.WorkflowAddress));
((IContextChannel)proxy).DisableContextManagement();
return proxy.ActionRR(request);
}
Unfortunately, this still isn't going to work. In fact, if you try this code out, you'll find that there's no exception thrown, but the message doesn't reach the workflow. It's silently swallowed and there's absolutely no indication that anything went wrong in the process.
(Fast-forward countless hours of frustrating debugging.) It appears that the context header present in the incoming message prevents the message from being successfully forwarded. The context header is added again when the message is dispatched by the outgoing channel, and the header's presence somehow causes the message to be lost. Therefore, we must remove the context header explicitly:
//Extension for removing the context header
public static void RemoveContext(this MessageHeaders headers)
{
headers.RemoveAll(ContextHeaderName, ContextHeaderNamespace);
}
const string ContextHeaderName = "Context";
const string ContextHeaderNamespace = "http://schemas.microsoft.com/ws/2006/05/context";
This code can now be incorporated into the message-forwarding logic outlined above:
public Message ActionRR(Message request)
{
//Forward to workflow
request.Headers.RemoveContext();
IGenericRR proxy = ChannelFactory<IGenericRR>.CreateChannel(
Common.Binding, new EndpointAddress(Common.WorkflowAddress));
((IContextChannel)proxy).DisableContextManagement();
return proxy.ActionRR(request);
}
This must be done regardless of whether the intermediary is processing a message directed at a workflow or a message originating from a workflow. Without removing the context header, the message will fail to be processed.
To connect all these seemingly disconnected pieces of code, you can download a sample project demonstrating a scenario where a client communicates with a workflow through an intermediary and the workflow proceeds to send a one-way notification to the client through the same intermediary.

We're continuing the series of posts arising from the implementation intricacies of a WCF router. Today's post features a seemingly simple task: Constructing an empty WCF reply message, which is the equivalent of the message sent in response to an operation which returns void (but is not one-way).
The motivation for this could be the following: A router needs to dispatch a message to multiple subscribers. However, the router itself is unaware of the data contract - it is willing to work on an untyped Message-based contract. The operation doesn't have a return type - it's void. On the other hand, the operation is not one-way for some reason - for example, it might require transactional semantics to ensure that the message is delivered and made durable as part of the client transaction. This would require the router to construct the semantic equivalent of the message that would be sent in response to the operation if the service were working with the original contract, and not the untyped Message-based contract the router is familiar with.
Let's observe the request and reply messages for a simple operation called Hello which takes a string parameter. The request message looks like this:
<s:Envelope xmlns:s="http://www.w3.org/2003/05/soap-envelope"
xmlns:a="http://www.w3.org/2005/08/addressing">
<s:Header>
<a:Action s:mustUnderstand="1">
http://tempuri.org/IHello/Hello
</a:Action>
<a:MessageID>urn:uuid:2ec62f21-b334-4e50-85e0-586decdef121</a:MessageID>
<a:ReplyTo>
<a:Address>http://www.w3.org/2005/08/addressing/anonymous</a:Address>
</a:ReplyTo>
<a:To s:mustUnderstand="1">http://localhost:9090/Hello</a:To>
</s:Header>
<s:Body>
<Hello xmlns="http://tempuri.org/">
<str>Hello World!</str>
</Hello>
</s:Body>
</s:Envelope>
And the response message looks like this:
<s:Envelope xmlns:s="http://www.w3.org/2003/05/soap-envelope"
xmlns:a="http://www.w3.org/2005/08/addressing">
<s:Header>
<a:Action s:mustUnderstand="1">
http://tempuri.org/IHello/HelloResponse
</a:Action>
<a:RelatesTo>
urn:uuid:2ec62f21-b334-4e50-85e0-586decdef121
</a:RelatesTo>
</s:Header>
<s:Body>
<HelloResponse xmlns="http://tempuri.org/" />
</s:Body>
</s:Envelope>
This seems simple enough to fake. The RelatesTo header is not critical (it provides the correlation between the request and response messages, but we can live without it). The Action header must be present, and the response body has to be present because that's what the operation formatter on the client side expects.
I wasn't able to find an elegant way to generate the response - all relevant classes which implement IDispatchOperationFormatter are internal, and considered an implementation detail. Therefore, I had to resort to the following (highly fragile) code to construct the response message:
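//Read the name and namespace of the request's body wrapper element
//(this consumes the request message's body)
XmlDictionaryReader requestReader =
message.GetReaderAtBodyContents();
//Use LocalName rather than Name: Name may include a prefix (e.g. "a:Hello")
//that would be undeclared in the reply body below
string bodyElemName = requestReader.LocalName;
string bodyNS = requestReader.NamespaceURI;
requestReader.Close();
//By convention, the reply wrapper is <OperationResponse xmlns="..." />
XmlDictionaryReader bodyReader =
XmlDictionaryReader.CreateDictionaryReader(
XmlReader.Create(
new StringReader(
"<" + bodyElemName + "Response " +
"xmlns='" + bodyNS + "' />"
)));
//The reply action follows the same convention: request action + "Response"
Message reply = Message.CreateMessage(
message.Version,
message.Headers.Action + "Response",
bodyReader);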
It works, but if there's a better way that you can think of, I'd be delighted to hear about it.
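One way to make the body construction a little less fragile is to let LINQ to XML build the wrapper element instead of concatenating strings, so that the namespace URI is always escaped correctly. Here is a minimal C# sketch; EmptyReply is a hypothetical helper, and it still relies on WCF's default naming convention of appending "Response" to both the wrapper element name and the action.

```csharp
using System;
using System.Xml;
using System.Xml.Linq;

static class EmptyReply
{
    // Builds e.g. <HelloResponse xmlns="http://tempuri.org/" /> for a void operation.
    public static XElement BuildBody(string requestLocalName, string requestNamespace)
    {
        XNamespace ns = requestNamespace;
        return new XElement(ns + (requestLocalName + "Response"));
    }

    // Reply action by the default convention: request action + "Response".
    public static string BuildAction(string requestAction) => requestAction + "Response";

    // Wraps the element in a reader that Message.CreateMessage can consume.
    public static XmlDictionaryReader CreateBodyReader(XElement body) =>
        XmlDictionaryReader.CreateDictionaryReader(body.CreateReader());
}
```

In the router, the resulting reader and action would take the place of the string-built bodyReader and the concatenated action in the Message.CreateMessage call; it does not remove the dependency on the naming convention, which remains the fragile part.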
Most WCF services operate on a typed message contract. In other words, the underlying Message object is not available, because WCF's dispatch machinery (specifically, the operation formatter) deserializes it into the data contract that the service operation expects.
However, oftentimes you need access to the underlying Message object even though your typed service does not directly consume it. For example, you might want to automatically serialize the message, or pass it to other services that expose an untyped message contract (such as routing services or notification publishing services outlined in my previous posts).
This can be accomplished by making a copy of the incoming message and later obtaining it from within the typed service operation. The most intuitive interception point for making such a copy is an implementation of IDispatchMessageInspector. The most intuitive location for storing the message is a message property, because it is transient and does not get serialized into subsequent message calls. (Note that storing the message in thread-local storage is an appealing option, but the service call is not guaranteed to be performed on the same thread that the message inspector is called on. In fact, from experience, this is rarely the case with one-way operations.)
Consequently, what we need is a message property for caching the message, which can install itself onto the current operation context's incoming message properties collection, and retrieve itself from that collection:
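using System.ServiceModel;
using System.ServiceModel.Channels;
// Caches a copy of the incoming message as a property of the current
// operation context, so that typed operations can retrieve it later.
public sealed class MessageCacheProperty
{
    public const string Name = "MessageCacheProperty";
    public Message Message { get; private set; }
    public MessageCacheProperty(Message message)
    {
        Message = message;
    }
    // Retrieves the cached message from the current operation context
    public static Message GetContextMessage()
    {
        OperationContext ctx = OperationContext.Current;
        MessageCacheProperty messageProperty =
            (MessageCacheProperty)
            ctx.IncomingMessageProperties[Name];
        return messageProperty.Message;
    }
    // Stores the message in the current operation context's incoming properties
    public static void Install(Message message)
    {
        OperationContext ctx = OperationContext.Current;
        ctx.IncomingMessageProperties.Add(
            Name, new MessageCacheProperty(message));
    }
}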
This property can now be installed by an implementation of IDispatchMessageInspector that is attached to our service's endpoints (through an IEndpointBehavior or an IServiceBehavior extension):
using System.ServiceModel;
using System.ServiceModel.Channels;
using System.ServiceModel.Dispatcher;
public sealed class MessageCacheInspectorAttribute :
    IDispatchMessageInspector
{
    public object AfterReceiveRequest(
        ref Message request,
        IClientChannel channel,
        InstanceContext instanceContext)
    {
        // Buffering consumes the original request, so we create
        // one copy for the cache and another to continue processing
        MessageBuffer copy =
            request.CreateBufferedCopy(int.MaxValue);
        MessageCacheProperty.Install(copy.CreateMessage());
        request = copy.CreateMessage();
        return null;
    }
    //The operation is one-way,
    //so this won't be called anyway
    public void BeforeSendReply(
        ref Message reply,
        object correlationState)
    {
    }
}
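Note that we make two copies of the message: creating a buffered copy consumes the original request and makes it unusable for subsequent processing, so one copy goes into the cache and the other replaces the request for the rest of the pipeline.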
The above code makes the message available to any service operation whose requests pass through this inspector, e.g.:
Message toSend = MessageCacheProperty.GetContextMessage();
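Why the inspector needs two copies can be modeled in a few lines of C++. This is a hypothetical sketch, not WCF's implementation: Message and MessageBuffer here are simplified stand-ins. A Message body can be read only once, while the buffer consumes the original once, keeps the contents, and hands out as many fresh copies as needed.

```cpp
#include <cassert>
#include <stdexcept>
#include <string>
#include <utility>

// Hypothetical read-once message: reading the body consumes it.
class Message {
    std::string body_;
    bool consumed_ = false;
public:
    explicit Message(std::string body) : body_(std::move(body)) {}
    std::string readBody() {
        if (consumed_) throw std::logic_error("message already consumed");
        consumed_ = true;
        return std::move(body_);
    }
};

// Hypothetical buffer: consumes the original message once, then creates
// any number of independent, readable copies.
class MessageBuffer {
    std::string body_;
public:
    explicit MessageBuffer(Message& original) : body_(original.readBody()) {}
    Message createMessage() const { return Message(body_); }
};
```

Creating the buffer leaves the original unusable, which is exactly why the inspector has to put a fresh copy back into the pipeline.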
In my recent post series on the June '08 Parallel Extensions CTP we have looked at a multitude of new features for concurrent programming in managed code. When the framework is finally released, we will have a lightning-fast scheduler, a fully functional parallel execution model for LINQ queries, new concurrent collection classes, new synchronization primitives featuring better performance, pipeline and out-of-order task execution models, and so much more.
But what about native code? What about us native developers? Surely there must be an ongoing effort to provide concurrent programming libraries for native code?
There surely is! The Native Concurrency MSDN blog, started just a few days ago, is the first indicator of the concurrent programming effort in native code. New libraries and constructs together with new C++0x language features such as lambdas can provide a concurrent programming experience that is a good match for the managed PFX. The very first post on the Native Concurrency blog shows one example of this with regard to matrix multiplication, which lends itself easily to parallelization. Instead of:
void MatrixMult(int size, double** m1,
                double** m2, double** result)
{
    // Assumes 'result' has been zero-initialized by the caller
    for (int i = 0; i < size; i++)
    {
        for (int j = 0; j < size; j++)
        {
            for (int k = 0; k < size; k++)
            {
                result[i][j] += m1[i][k] * m2[k][j];
            }
        }
    }
}
...which is the sequential version, the following code, which is very similar to the managed Parallel.For, parallelizes the computation:
void MatrixMult(int size, double** m1,
                double** m2, double** result)
{
    parallel_for(0, size, 1, [&](int i)
    {
        for (int j = 0; j < size; j++)
        {
            for (int k = 0; k < size; k++)
            {
                result[i][j] += m1[i][k] * m2[k][j];
            }
        }
    });
}
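Note the unusual [&] syntax: it is the capture clause of the proposed C++0x syntax for lambda expressions, and it captures the variables used inside the lambda (size, m1, m2 and result) by reference. This effectively parallelizes the outer loop across multiple threads and processors, if available.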
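For readers without the prototype library at hand, the idea behind parallel_for can be approximated with plain std::thread. The helper below is a hypothetical stand-in that keeps the same (first, last, step, body) parameter order as the call above; it is not the library's implementation. It deals the outer-loop indices round-robin to one thread per hardware core and returns only after all threads finish.

```cpp
#include <algorithm>
#include <cassert>
#include <thread>
#include <vector>

// Hypothetical stand-in for parallel_for(first, last, step, body), assuming step > 0.
// Each index is executed by exactly one thread; the call returns after all threads join.
template <typename Body>
void parallel_for(int first, int last, int step, Body body) {
    const int count = (last - first + step - 1) / step;
    if (count <= 0) return;
    const int hw = static_cast<int>(std::thread::hardware_concurrency());
    const int workers = std::max(1, std::min(count, hw));  // hw may be 0
    std::vector<std::thread> threads;
    for (int w = 0; w < workers; ++w) {
        threads.emplace_back([=, &body] {
            for (int n = w; n < count; n += workers)  // round-robin partition
                body(first + n * step);
        });
    }
    for (auto& t : threads) t.join();
}

// Same algorithm as above: each outer iteration writes only row i of 'result',
// so threads never write the same element. 'result' must start zeroed.
void MatrixMult(int size, double** m1, double** m2, double** result) {
    parallel_for(0, size, 1, [&](int i) {
        for (int j = 0; j < size; j++)
            for (int k = 0; k < size; k++)
                result[i][j] += m1[i][k] * m2[k][j];
    });
}
```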
However, none of this is very new. Many years before the managed Parallel Extensions were introduced, the OpenMP standard defined C and C++ extensions for parallelizing code at the compilation level using #pragma directives, which Microsoft has implemented in Visual C++. If you have Visual Studio 2005 or Visual Studio 2008, you can compile with /openmp (under Project Properties -> C/C++ -> Language -> OpenMP Support), link to vcomp.lib and try the following C++ code right away:
void MatrixMult(int size, double** m1,
                double** m2, double** result)
{
    #pragma omp parallel for
    for (int i = 0; i < size; i++) {
        for (int j = 0; j < size; j++) {
            for (int k = 0; k < size; k++) {
                result[i][j] += m1[i][k] * m2[k][j];
            }
        }
    }
}
This effectively parallelizes the outer loop, and has been available in Visual Studio for quite some time. In fact, OpenMP is also supported for C++/CLI code (compiled with /clr), making it a feasible alternative to the managed Parallel Extensions.
Other constructs of interest in OpenMP include #pragma omp parallel sections for executing multiple regions in parallel (each region marked with #pragma omp section, similar to Parallel.Invoke), #pragma omp single for executing a block on only one thread of the team, execution environment routines such as omp_set_num_threads for controlling the degree of parallelism, and other advanced scheduling features.
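The effect of parallel sections (or Parallel.Invoke) can also be approximated in standard C++ without OpenMP. This hypothetical parallel_invoke runs each callable through std::async with std::launch::async and returns only after every one of them has finished; it assumes C++17 (for the fold expression) and callables that return void.

```cpp
#include <cassert>
#include <future>
#include <vector>

// Hypothetical Parallel.Invoke / "parallel sections" analog: every callable
// runs on its own asynchronous task, and the call waits for all of them.
template <typename... Fs>
void parallel_invoke(Fs... fs) {
    std::vector<std::future<void>> pending;
    (pending.push_back(std::async(std::launch::async, fs)), ...);  // start all
    for (auto& f : pending) f.get();  // wait; get() rethrows a section's exception
}
```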
If you haven't tried OpenMP before, you could get your hands on it right away, without waiting for C++0x and for the next release of the Microsoft native concurrency framework.
In the previous posts in this series, we have looked at a multitude of features provided by the PFX June CTP, including synchronization mechanisms, task-related features and new collection classes. However, there's also a large list of known issues with this release - it's obviously not production-ready, but nonetheless is a great milestone by the Parallel Extensions team. The most interesting issues mentioned are:
- TPL threads are not cleanly shut down when running in the Visual Studio test host. This effectively means that it's difficult to write unit tests for code that uses the Parallel Extensions. There's a workaround: explicitly create a TaskManager instance and explicitly dispose of it before letting the unit test terminate.
- Some PLINQ operations exhibit meaningless behavior unless order preservation is enabled (using the AsOrdered() method). For example, without order preservation the Skip(N) operator will skip some N elements of the nondeterministically ordered input, not necessarily the first N elements. This is, of course, by design.
- Multi-core machines running one or two dedicated applications might benefit from switching to the Server GC flavor if there's a significant amount of GC going on (the Server GC flavor performs garbage collections in parallel on all available processors, using one dedicated GC thread per processor, which can speed up collections significantly). The GC flavor used by the default CLR host is the Workstation Concurrent GC, and switching to Server GC is a wise recommendation even for applications that aren't using the Parallel Extensions at all. (Although there's a variety of good reasons why this is not the default.)
In the previous posts in this series, we have looked at the new synchronization mechanisms and the new task-related features in the PFX June CTP. This post features a brief overview of the new collection classes introduced in the CTP.
In the new System.Threading.Collections namespace we find three new classes which facilitate concurrent programming. These collections do not yet represent the wealth of concurrent and non-blocking collections that might be implemented in the future, but they are certainly a good sign.
A concurrent collection, in PFX nomenclature, is what is otherwise known as a non-blocking or lock-free collection, i.e. a collection that does not incur a kernel-mode transition (wait) when items are added or removed. (In my DevAcademy2 session last year I demonstrated the performance and scalability benefits of using lock-free collections.) Concurrent collections in PFX implement the IConcurrentCollection<T> interface, which extends IEnumerable<T> and ICollection:
public interface IConcurrentCollection<T> :
    IEnumerable<T>, ICollection, IEnumerable
{
    bool Add(T item);
    bool Remove(out T item);
    T[] ToArray();
}
The concurrent (i.e. lock-free) collections featured in this release are the ConcurrentQueue<T> and ConcurrentStack<T> collections. These are classic implementations of a lock-free queue and a lock-free stack, which rely on spinning internally when there's significant contention for adding or removing elements. They also support enumeration, with the caveat that if the collection is modified during enumeration, no exception is thrown and the enumeration can proceed and retrieve stale information. Since the collection is lock-free, there is no way to synchronize enumeration with add or remove operations: making enumeration over N elements an atomic, coordinated operation would require true locking.
Note that both ConcurrentQueue<T> and ConcurrentStack<T> are based on a singly-linked list. This is significantly less efficient than an array-based implementation for two reasons: First, the add operations create significantly more garbage, because a node must be allocated for every element (and the node is a reference type, adding to the overhead). Second, traversing a linked list is slower than traversing an array, because array elements are stored contiguously in memory and therefore enjoy better cache locality. There's work in progress in the direction of alleviating these concerns, and we'll see where it leads.
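To make the linked-list discussion concrete, here is a minimal Treiber-style lock-free stack in C++. It is a sketch of the general technique, not the PFX implementation: push and pop retry a compare-and-swap on the head pointer until it succeeds (the spinning-under-contention behavior described above), and every push allocates a node. To sidestep the ABA and memory-reclamation problems that real implementations must solve, this sketch simply keeps every node alive until the stack is destroyed.

```cpp
#include <atomic>
#include <cassert>
#include <optional>
#include <utility>

// Minimal Treiber-style lock-free stack (general technique, not ConcurrentStack<T>).
// Assumption: popped nodes are never freed while the stack is alive, which rules
// out ABA and use-after-free at the cost of memory; the destructor frees them all.
template <typename T>
class LockFreeStack {
    struct Node {
        T value;
        Node* next = nullptr;     // never modified after the node is published
        Node* allNext = nullptr;  // links every node ever allocated
    };
    std::atomic<Node*> head_{nullptr};
    std::atomic<Node*> allNodes_{nullptr};

public:
    LockFreeStack() = default;
    LockFreeStack(const LockFreeStack&) = delete;
    LockFreeStack& operator=(const LockFreeStack&) = delete;

    ~LockFreeStack() {
        Node* n = allNodes_.load();
        while (n != nullptr) {
            Node* following = n->allNext;
            delete n;
            n = following;
        }
    }

    void push(T value) {
        Node* node = new Node{std::move(value)};  // one allocation per push
        // Record the node for deferred deletion (a push-only list).
        node->allNext = allNodes_.load(std::memory_order_relaxed);
        while (!allNodes_.compare_exchange_weak(node->allNext, node,
                   std::memory_order_release, std::memory_order_relaxed)) {
        }
        // Publish on the stack; under contention the CAS fails and we retry.
        node->next = head_.load(std::memory_order_relaxed);
        while (!head_.compare_exchange_weak(node->next, node,
                   std::memory_order_release, std::memory_order_relaxed)) {
        }
    }

    std::optional<T> pop() {
        Node* top = head_.load(std::memory_order_acquire);
        // Retry until head moves from 'top' to 'top->next', or the stack is empty.
        while (top != nullptr &&
               !head_.compare_exchange_weak(top, top->next,
                   std::memory_order_acquire, std::memory_order_acquire)) {
        }
        if (top == nullptr) return std::nullopt;
        return top->value;  // the node itself stays allocated
    }
};
```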
The most interesting class of the bunch is the BlockingCollection<T> adapter class, which wraps a concurrent collection (any implementation of IConcurrentCollection<T>, defaulting to ConcurrentQueue<T>). It blocks callers when an operation cannot proceed: if the collection is bounded to 1,000 elements, for example, then once the bound is reached the add operation will block; and if the collection is currently empty, the remove operation will block. Internally, the blocking collection adapter is implemented with two slim semaphores.
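The two-semaphore design mentioned above can be sketched as follows. This is a hypothetical C++ model, not the PFX source: one semaphore counts free slots (so add blocks when the bound is reached), the other counts stored items (so remove blocks when the collection is empty), and a std::deque guarded by a mutex stands in for the wrapped concurrent collection. The Semaphore class is built from a mutex and a condition variable to avoid depending on C++20's std::counting_semaphore.

```cpp
#include <cassert>
#include <condition_variable>
#include <cstddef>
#include <deque>
#include <mutex>
#include <thread>
#include <utility>

// Hypothetical counting semaphore built from a mutex and a condition variable.
class Semaphore {
    std::mutex m_;
    std::condition_variable cv_;
    std::size_t count_;
public:
    explicit Semaphore(std::size_t initial) : count_(initial) {}
    void acquire() {
        std::unique_lock<std::mutex> lock(m_);
        cv_.wait(lock, [this] { return count_ > 0; });
        --count_;
    }
    void release() {
        {
            std::lock_guard<std::mutex> lock(m_);
            ++count_;
        }
        cv_.notify_one();
    }
};

// Hypothetical bounded blocking queue: 'freeSlots_' counts remaining capacity,
// 'items_' counts stored elements.
template <typename T>
class BoundedBlockingQueue {
    std::deque<T> queue_;  // stand-in for the wrapped concurrent collection
    std::mutex queueMutex_;
    Semaphore freeSlots_;
    Semaphore items_{0};
public:
    explicit BoundedBlockingQueue(std::size_t bound) : freeSlots_(bound) {}

    void add(T item) {
        freeSlots_.acquire();  // blocks while the queue is full
        {
            std::lock_guard<std::mutex> lock(queueMutex_);
            queue_.push_back(std::move(item));
        }
        items_.release();      // wakes one blocked remover, if any
    }

    T remove() {
        items_.acquire();      // blocks while the queue is empty
        std::unique_lock<std::mutex> lock(queueMutex_);
        T item = std::move(queue_.front());
        queue_.pop_front();
        lock.unlock();
        freeSlots_.release();  // wakes one blocked adder, if any
        return item;
    }
};
```

With a bound of 2, a producer thread adding 100 items and a consumer removing them will take turns blocking, which is the behavior the paragraph above describes.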
To streamline scenarios such as those where the blocking collection is used as a work-item coordination queue, there are also the static AddAny and RemoveAny methods, which add an element to, or remove an element from, any one of a set of blocking collections. For example, if we implement a producer-consumer scenario with multiple producers and multiple consumers, then the producers can use the AddAny method to add a work item to any of the consumers' blocking queues, and the consumers can use the RemoveAny method to remove a work item from any of those queues.
The following code demonstrates the above scenario in a very rudimentary fashion - work items are enqueued to any available blocking collection, and retrieved from any available blocking collection:
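// Four bounded queues, each holding at most 5 pending items
BlockingCollection<Action>[] queues =
    new BlockingCollection<Action>[4];
for (int i = 0; i < queues.Length; ++i)
{
    queues[i] = new BlockingCollection<Action>(5);
}
// One consumer thread per queue; each takes work from any queue
for (int i = 0; i < queues.Length; ++i)
{
    Thread consumer = new Thread(() =>
    {
        while (true)
        {
            Action action;
            if (BlockingCollection<Action>.RemoveAny(
                queues, out action) != -1)
            {
                action();
            }
        }
    });
    // Background threads let the process exit after Console.ReadLine
    consumer.IsBackground = true;
    consumer.Start();
}
// Enqueue 100 work items to whichever queue can accept them
for (int i = 0; i < 100; ++i)
{
    BlockingCollection<Action>.AddAny(queues, () =>
    {
        Thread.Sleep(100);
        Console.WriteLine(
            Thread.CurrentThread.ManagedThreadId);
    });
}
Console.ReadLine();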
In the previous post in this series, we have looked at the new synchronization primitives offered by the PFX June CTP. In this post, we will look at task-related features and at the new task scheduler.
Task Continuation
Another interesting feature in the CTP is the task continuation paradigm, allowing us to specify what should happen when a task completes. This is accomplished through the use of the ContinueWith method on the Task and Future classes. Among other things, this mechanism can be used for chaining multiple asynchronous operations in an ordered pipeline of execution. Since most of the PFX is focused on out-of-order execution of independent work items, this is a welcome addition that streamlines pipeline processing of dependent tasks that should still utilize multiple processors. The difference between the two paradigms is best illustrated by the following diagram:
For example, the following code schedules a pipeline of three dependent work items. Each work item depends on the execution result of the previous work item, producing a single final value:
Future<int> f = Future.Create(() => 5)
    .ContinueWith(a => a.Value - 1)
    .ContinueWith(b => b.Value - 1);
Console.WriteLine(f.Value);
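The same three-stage pipeline can be approximated with standard C++ futures. The continue_with helper below is hypothetical and differs from ContinueWith in two ways: it passes the antecedent's value (rather than the antecedent future) to the continuation, and it dedicates a thread to waiting for the antecedent, which a real scheduler would avoid.

```cpp
#include <cassert>
#include <future>
#include <utility>

// Hypothetical ContinueWith-like helper: starts 'next' asynchronously; it waits
// for 'antecedent' to finish and is then called with the antecedent's value.
template <typename T, typename F>
auto continue_with(std::future<T> antecedent, F next)
    -> std::future<decltype(next(std::declval<T>()))> {
    return std::async(std::launch::async,
        [a = std::move(antecedent), next]() mutable {
            return next(a.get());  // blocks this worker thread until 'a' is ready
        });
}
```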
Task.WaitAny
In the case of multiple tasks (or futures) executing concurrently, it's possible that we're interested in the execution result of only one of them. The classic example is having several algorithms that could compute a result, but not knowing in advance which algorithm will be the fastest to compute it. The following example launches multiple calculations at once, and waits for the first one to complete. When it completes, the rest of the calculations can be canceled.
Future<int>[] calculations =
    new Future<int>[] {
        Future.Create(() => 5),
        Future.Create(() => 6),
        Future.Create(() => 7)
    };
int calcIndex = Task.WaitAny(calculations);
Array.ForEach(calculations, c => c.Cancel());
Console.WriteLine(calculations[calcIndex].Value);
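A "wait for any" can be sketched in C++ with a promise that only the first finisher is allowed to fulfil. The wait_any function and the Calculation alias are hypothetical names. Cancellation here is cooperative, through a shared flag that each calculation may poll; this is an assumption about how the calculations are written, not a description of Future.Cancel.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <functional>
#include <future>
#include <thread>
#include <utility>
#include <vector>

// A calculation receives a cancellation flag it may poll.
using Calculation = std::function<int(const std::atomic<bool>&)>;

// Hypothetical WaitAny analog: runs every calculation on its own thread and
// returns {index, value} of the first one to finish, then requests cancellation.
std::pair<std::size_t, int> wait_any(const std::vector<Calculation>& calcs) {
    assert(!calcs.empty());  // otherwise nobody would ever fulfil the promise
    std::atomic<bool> cancel{false};
    std::atomic<bool> claimed{false};
    std::promise<std::pair<std::size_t, int>> winner;
    std::future<std::pair<std::size_t, int>> first = winner.get_future();
    std::vector<std::thread> workers;
    for (std::size_t i = 0; i < calcs.size(); ++i) {
        workers.emplace_back([&, i] {
            int value = calcs[i](cancel);
            if (!claimed.exchange(true))  // only the first finisher publishes
                winner.set_value(std::make_pair(i, value));
        });
    }
    std::pair<std::size_t, int> result = first.get();  // wait for the first
    cancel.store(true);  // ask the remaining calculations to stop early
    for (auto& worker : workers) worker.join();
    return result;
}
```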
The New Scheduler
This CTP features a new, revamped scheduler that is used by the Task Parallel Library (TPL) and Parallel LINQ to schedule work items for execution. This scheduler is by and large undocumented, and consists of dozens of internal classes that strive to perform cooperative scheduling in user-mode, without resorting to the operating system or to the .NET thread pool. This has its advantages (it could potentially be lightning-fast) but also its disadvantages. For example, blocking tasks can potentially exhaust the scheduler's threads, rendering additional tasks unschedulable.
This is a classic concurrency scenario familiar to anyone who has ever tried to implement a thread pool: If there is a dependency between work items that are waiting for execution and work items that are already executing, then an unresolvable deadlock might occur. E.g., consider the scenario where I have 4 thread pool threads executing 4 distinct work items. After performing their work, these work items block, waiting for a fifth work item to complete; but for that fifth work item to complete, it must be scheduled for execution, and it can't be scheduled as long as the 4 threads are blocked on the previous work items. The .NET thread pool can alleviate such scenarios by dynamically expanding the pool of worker threads; expect this to be addressed in future releases of the PFX as well.
The underlying scheduler ought to be discussed in more detail sometime in the future, when more relevant information becomes available.
Just a few days ago, the Parallel Extensions team released a new CTP of the Parallel Extensions for .NET 3.5, a.k.a. PFX. This new CTP is not just a bunch of bug fixes - it's packed with new functionality for us to explore. (I've written some introductory posts on the December '07 CTP in the past, so you might want to read them if you haven't played with the PFX yet.)
In this post series, we will look at most of the interesting new functionality.
Synchronization Primitives
This release contains a significant number of synchronization-related primitives, providing better performance and scalability when compared to the existing .NET mechanisms. Let's quickly review the new APIs.
First and foremost, almost every single new mechanism introduced in this CTP features the ability to spin (i.e., burn CPU cycles in a loop) before falling back to a blocking wait on the synchronization primitive. Spinning is generally frowned upon as a means of achieving synchronization, but when the primitive becomes available quickly, a small number of spins is significantly faster than a system call that waits on a kernel synchronization object. These spinning facilities are provided through the SpinWait class, which we can use when constructing our own synchronization mechanisms. (As a side note, spinning before acquiring is not something invented by the PFX team - the critical section Win32 API features the ability to spin before acquiring the critical section through the InitializeCriticalSectionAndSpinCount function.)
The SpinLock class implements a synchronization mechanism closely related to the SpinWait class. The general idea here is that a spinlock is not backed by any kernel synchronization object - a thread that wants to acquire a spinlock will spin indefinitely until the spinlock becomes available (spinlocks have been used in the Windows kernel from the very beginning, and it was fairly easy to write one in user-mode). The synchronization is provided by the Interlocked.CompareExchange primitive. Note that a spinlock cannot be acquired recursively - a LockRecursionException is thrown if you attempt to do so.
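A spinlock of this kind fits in a few lines of C++. This is a minimal sketch, not the PFX SpinLock: ownership is claimed with std::atomic::compare_exchange_weak (the C++ counterpart of Interlocked.CompareExchange), the owner's thread id is recorded so that recursive acquisition can be detected, and std::logic_error stands in for LockRecursionException. The yield inside the loop is this sketch's own design choice, to keep spinning threads polite on machines with few cores.

```cpp
#include <atomic>
#include <cassert>
#include <stdexcept>
#include <thread>

// Minimal spinlock sketch (not the PFX SpinLock). No kernel object is used:
// a waiting thread keeps retrying a compare-and-swap until it wins.
class SpinLock {
    // A default-constructed std::thread::id means "not owned".
    std::atomic<std::thread::id> owner_{std::thread::id()};
public:
    void enter() {
        const std::thread::id me = std::this_thread::get_id();
        // Only this thread can have stored its own id, so a relaxed read suffices.
        if (owner_.load(std::memory_order_relaxed) == me)
            throw std::logic_error("recursive SpinLock acquisition");  // stand-in for LockRecursionException
        std::thread::id expected;
        while (!owner_.compare_exchange_weak(expected, me,
                   std::memory_order_acquire, std::memory_order_relaxed)) {
            expected = std::thread::id();  // a failed CAS stored the current owner here
            std::this_thread::yield();     // spin politely
        }
    }
    void exit() {
        owner_.store(std::thread::id(), std::memory_order_release);
    }
};
```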
The CountdownEvent class is a synchronization mechanism that is initialized with a specified counter, and provides facilities for incrementing and decrementing the counter. When the counter reaches zero, the synchronization mechanism becomes signaled, thus releasing any waiting threads. This is an extremely useful facility that previously had to be implemented using a combination of Interlocked.Decrement and a ManualResetEvent. For example, the following code spins off four distinct tasks which decrement the event's counter until it reaches zero and the main thread is released:
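CountdownEvent countEvent = new CountdownEvent(4);
// Note: Parallel.Invoke itself returns only after all four
// delegates complete, so the Wait() below returns at once; the
// event pays off when the work is queued asynchronously instead.
Parallel.Invoke(
    () => { ...; countEvent.Decrement(); },
    () => { ...; countEvent.Decrement(); },
    () => { ...; countEvent.Decrement(); },
    () => { ...; countEvent.Decrement(); }
);
countEvent.Wait();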
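The countdown-event idea can be modeled in C++ with a mutex and a condition variable. The class below is a hypothetical sketch with method names loosely mirroring the CTP; unlike the PFX version it does not spin before blocking, and throwing when the counter is already zero is this sketch's own choice.

```cpp
#include <cassert>
#include <condition_variable>
#include <mutex>
#include <stdexcept>
#include <thread>
#include <vector>

// Hypothetical countdown event: signaled (and releasing all waiters) once
// the counter reaches zero.
class CountdownEvent {
    std::mutex m_;
    std::condition_variable zero_;
    int count_;
public:
    explicit CountdownEvent(int initial) : count_(initial) {}
    void increment() {
        std::lock_guard<std::mutex> lock(m_);
        if (count_ == 0) throw std::logic_error("event already signaled");
        ++count_;
    }
    void decrement() {
        std::lock_guard<std::mutex> lock(m_);
        if (count_ == 0) throw std::logic_error("counter already zero");
        if (--count_ == 0) zero_.notify_all();  // release every waiter
    }
    void wait() {
        std::unique_lock<std::mutex> lock(m_);
        zero_.wait(lock, [this] { return count_ == 0; });
    }
};
```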
The ManualResetEventSlim and SemaphoreSlim classes feature revised implementations of the familiar event and semaphore concepts. This revised implementation relies on spinning and on using a Monitor internally, and creating a kernel synchronization primitive only as the last resort (for example, when the WaitAll or WaitAny methods are used, or when a WaitHandle is explicitly requested). This should provide better performance for the vast majority of applications using these synchronization primitives.
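The spin-then-block strategy behind these slim primitives can be sketched as a C++ manual-reset event. This is a hypothetical model, not the PFX implementation: wait() first polls an atomic flag for a bounded number of iterations (the count of 1000 is an arbitrary assumption), and only then blocks on a condition variable, which here plays the role of the Monitor or kernel wait.

```cpp
#include <atomic>
#include <cassert>
#include <condition_variable>
#include <mutex>
#include <thread>

// Sketch of a "slim" manual-reset event (not the PFX ManualResetEventSlim):
// wait() spins on an atomic flag first and blocks only if it is still unset.
class ManualResetEventSlim {
    std::atomic<bool> set_{false};
    std::mutex m_;
    std::condition_variable cv_;
    static constexpr int kSpinCount = 1000;  // arbitrary assumption
public:
    void set() {
        {
            // Setting under the mutex prevents a lost wake-up for a waiter that
            // is between checking the flag and starting to block.
            std::lock_guard<std::mutex> lock(m_);
            set_.store(true, std::memory_order_release);
        }
        cv_.notify_all();
    }
    void reset() { set_.store(false, std::memory_order_release); }
    bool is_set() const { return set_.load(std::memory_order_acquire); }
    void wait() {
        for (int i = 0; i < kSpinCount; ++i) {  // fast path: no blocking
            if (is_set()) return;
            std::this_thread::yield();
        }
        std::unique_lock<std::mutex> lock(m_);  // slow path: block until set
        cv_.wait(lock, [this] { return is_set(); });
    }
};
```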
Another category of synchronization mechanisms is represented by the LazyInit<T> and WriteOnce<T> structures. The LazyInit<T> structure supports the lazy initialization paradigm in a thread-safe manner. (This is suspiciously similar to the one-time initialization mechanism introduced in Windows Vista.) The WriteOnce<T> structure supports a mechanism for ensuring that a variable is written to at most once, in a thread-safe manner.
The LazyInit<T> structure takes a factory function that performs the initialization when the value is first accessed, and is well-suited for lazily initializing a value that is expensive to initialize eagerly. It supports multiple modes of lazy initialization, exposed by the LazyInitMode enum. The available options are:
- EnsureSingleExecution - if multiple threads attempt to access the value concurrently, one of them will execute the factory function to initialize the value, and the rest will wait until the initialization completes. (This is similar to the synchronous one-time initialization native API, with InitOnceBeginInitialize and InitOnceComplete.)
- AllowMultipleExecution - if multiple threads attempt to access the value concurrently, all of them will begin executing the factory function to initialize the value, and the first one to succeed will signal the rest that initialization has completed and that this first value should be used. (This is similar to the asynchronous one-time initialization native API, with the INIT_ONCE_ASYNC flag.)
- ThreadLocal - if multiple threads attempt to access the value, each thread will execute the factory function to obtain a thread-local value that will be used on that thread only.
In the following example, the value is only initialized when accessed - this can be observed by setting a breakpoint on the second Console.WriteLine line and noticing that "Initializing" is printed only when that line accesses the value.
LazyInit<string> lazyInit = new LazyInit<string>(
    () => {
        Console.WriteLine("Initializing");
        return "Hello";
    });
Console.WriteLine(lazyInit.Value);
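Two of the LazyInit modes listed above map naturally onto standard C++. In this hypothetical sketch (LazySingle and LazyRacing are illustrative names, not CTP types), EnsureSingleExecution corresponds to std::call_once, while AllowMultipleExecution lets every racing thread run the factory and publishes the first result with a compare-and-swap. The ThreadLocal mode would simply be a thread_local variable.

```cpp
#include <atomic>
#include <cassert>
#include <functional>
#include <memory>
#include <mutex>
#include <utility>

// EnsureSingleExecution-style: exactly one caller runs the factory;
// concurrent callers block inside std::call_once until it finishes.
template <typename T>
class LazySingle {
    std::function<T()> factory_;
    std::once_flag once_;
    std::unique_ptr<T> value_;
public:
    explicit LazySingle(std::function<T()> f) : factory_(std::move(f)) {}
    const T& value() {
        std::call_once(once_, [this] { value_ = std::make_unique<T>(factory_()); });
        return *value_;
    }
};

// AllowMultipleExecution-style: racing callers may all run the factory;
// the first to publish wins and the others discard their results.
template <typename T>
class LazyRacing {
    std::function<T()> factory_;
    std::atomic<T*> value_{nullptr};
public:
    explicit LazyRacing(std::function<T()> f) : factory_(std::move(f)) {}
    ~LazyRacing() { delete value_.load(); }
    LazyRacing(const LazyRacing&) = delete;
    LazyRacing& operator=(const LazyRacing&) = delete;
    const T& value() {
        T* current = value_.load(std::memory_order_acquire);
        if (current != nullptr) return *current;
        T* candidate = new T(factory_());
        if (value_.compare_exchange_strong(current, candidate,
                std::memory_order_acq_rel, std::memory_order_acquire))
            return *candidate;
        delete candidate;  // another thread published first
        return *current;   // the failed CAS loaded the winner into 'current'
    }
};
```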
The WriteOnce<T> structure is similar to a nullable type that can be set only once. Any further attempts to set the value will result in an InvalidOperationException. It also features the TryGetValue and TrySetValue methods, which work in a thread-safe fashion to ensure that you're setting the value only once.
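A write-once cell can be built around a small atomic state machine. The C++ sketch below is hypothetical (the names and the three states are illustrative, not the CTP's internals): a thread may store the value only after winning the transition from empty to writing, readers see the value only after it has been published, and std::logic_error stands in for InvalidOperationException.

```cpp
#include <atomic>
#include <cassert>
#include <optional>
#include <stdexcept>
#include <utility>

// Hypothetical write-once cell. States: 0 = empty, 1 = being written, 2 = written.
template <typename T>
class WriteOnce {
    std::atomic<int> state_{0};
    std::optional<T> value_;
public:
    bool try_set(T v) {
        int expected = 0;
        if (!state_.compare_exchange_strong(expected, 1, std::memory_order_acquire))
            return false;                            // someone else got there first
        value_.emplace(std::move(v));
        state_.store(2, std::memory_order_release);  // publish the value
        return true;
    }
    void set(T v) {
        if (!try_set(std::move(v)))
            throw std::logic_error("value already set");  // stand-in for InvalidOperationException
    }
    bool try_get(T& out) const {
        if (state_.load(std::memory_order_acquire) != 2) return false;
        out = *value_;
        return true;
    }
};
```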
Debugging Aids
All of the new synchronization primitives feature debugging views that facilitate understanding the internal state of the synchronization mechanism while debugging in Visual Studio. For example, here's the debugger view of a countdown event:
And here's the debugger view of a lazily initialized variable (yes, there are still some bugs in this CTP):

With a few days' delay, I wanted to let you know that the MSDN Pulse issue for June 2008 has been released. This month's bulletin features yours truly as the Blogger of the Month - indeed, April and May have been quite productive, blogging-wise.
The primary announcement in this issue is the release of the Visual Studio 2008 Service Pack 1 Beta, which contains lots and lots of new goodies. I would add to that the recent release of a new Parallel Extensions CTP, which I hope to cover on my blog in a couple of days.
Additionally, this issue features my Visual Studio-related tip regarding the Visual Studio "Command Window".

Few people are aware of the Command Window, and even fewer people actually use it, but it can turn fairly annoying tasks that require multiple mouse clicks into a single typed command. Additionally, the Command Window features auto-completion for commands, so if you don't remember where a command is to be found, you can type a part of the name into the Command Window and it will complete it for you.
Implementing a WCF forwarding router consists of providing an endpoint with a generic contract which accepts untyped messages. These messages can then be forwarded to another endpoint, which can in turn use a specific typed or a generic untyped contract.
An example of this scenario would be a bus (or router) which coordinates multiple services. These services might expose specific contracts, such as IOrderService, IShippingService, IPaymentService and many others. The intermediary in this case can expose a generic untyped contract to the clients, who will be none the wiser: They can still use the specific contract and communicate with the untyped intermediary, which will forward their messages to the requested service. The intermediary doesn't have to be aware of all the different service contracts: It can communicate with the services through the generic contract. This is best (or my best) illustrated by the following diagram:
One of the challenges with implementing such an endpoint is that request-reply and one-way calls must be treated differently. A typical generic contract for the request-reply message exchange pattern will look like this:
[ServiceContract]
internal interface IGenericContract
{
    [OperationContract(Action = "*", ReplyAction = "*")]
    Message Action(Message message);
}
On the other hand, the same generic contract for the one-way MEP will look like this:
[ServiceContract]
internal interface IOneWayGenericContract
{
    [OperationContract(Action = "*", IsOneWay = true)]
    void Action(Message message);
}
Since these are two separate contracts, we can't trivially expose them on the same endpoint. This is annoying because it means the client now has to use a different endpoint (or Via address) for request-reply (R-R) and one-way message exchanges. Furthermore, since a single contract might contain both R-R and one-way operations, it's highly inconvenient for a client to use different endpoints for different operations on the same contract.
A naive attempt to remedy this might involve trying to mix the two MEPs in the same generic contract, like so:
[ServiceContract]
internal interface ICombinedGenericContract
{
    [OperationContract(Action = "*", ReplyAction = "*")]
    Message Action(Message message);
    [OperationContract(Action = "*", IsOneWay = true)]
    void OneWayAction(Message message);
}
However, this will not work because at service load-time, the contract will fail to validate: It is exposing two different operations with the Action set to *, so WCF has no means of automatically choosing one of them.
What we need to do is to give WCF the ability to disambiguate the two operations. This means we are not going to use Action="*", but instead provide an IDispatchOperationSelector which will be in charge of choosing the appropriate routing method. The interface will look like this (note the absence of Action="*" in the operation contracts):
[ServiceContract]
internal interface IFinalGenericContract
{
    [OperationContract(ReplyAction = "*")]
    Message Action(Message message);
    [OperationContract(IsOneWay = true)]
    void OneWayAction(Message message);
}
The operation selector will have to know whether the incoming message is targeted at a one-way operation or not. This can be accomplished in several ways - for example, if we know the set of specific target contracts, then we can enumerate their contract descriptions at load time and build a cache of which operations are one-way. Alternatively, we can just check whether the incoming message carries a ReplyTo (WS-Addressing) header. If it doesn't, it's a one-way call. (This check assumes a binding that uses WS-Addressing, so that request-reply messages actually carry a ReplyTo header.)
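Both selection strategies just described can be modeled independently of WCF. In this hypothetical C++ sketch, IncomingMessage is a stripped-down stand-in for a message and OperationSelector is an illustrative name: a cache of known one-way actions is consulted first, and the ReplyTo check serves as the fallback for actions the cache doesn't know about.

```cpp
#include <cassert>
#include <optional>
#include <string>
#include <unordered_map>

// Hypothetical stand-in for an incoming message: only what the selector inspects.
struct IncomingMessage {
    std::string action;
    std::optional<std::string> replyTo;  // absent => no reply expected
};

// Chooses the name of the generic operation that should handle a message.
class OperationSelector {
    std::unordered_map<std::string, bool> oneWayByAction_;  // built at load time
public:
    void registerAction(const std::string& action, bool isOneWay) {
        oneWayByAction_[action] = isOneWay;
    }
    std::string select(const IncomingMessage& msg) const {
        auto it = oneWayByAction_.find(msg.action);
        // Strategy 1: the cache; strategy 2 (fallback): no ReplyTo means one-way.
        bool oneWay = (it != oneWayByAction_.end()) ? it->second
                                                    : !msg.replyTo.has_value();
        return oneWay ? "OneWayAction" : "Action";
    }
};
```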
The following is a simple implementation of IDispatchOperationSelector which also implements IEndpointBehavior to install itself on the relevant dispatch runtimes. (It's also possible to make it a service behavior and specify it on the service itself.)
using System.ServiceModel;
using System.ServiceModel.Channels;
using System.ServiceModel.Description;
using System.ServiceModel.Dispatcher;
sealed class GenericDispatchOperationSelector :
    IDispatchOperationSelector, IEndpointBehavior
{
    #region IEndpointBehavior Members
    public void AddBindingParameters(
        ServiceEndpoint endpoint,
        BindingParameterCollection bindingParameters)
    {
    }
    public void ApplyClientBehavior(
        ServiceEndpoint endpoint,
        ClientRuntime clientRuntime)
    {
    }
    public void ApplyDispatchBehavior(
        ServiceEndpoint endpoint,
        EndpointDispatcher endpointDispatcher)
    {
        // Accept messages with any action; SelectOperation
        // below decides which operation handles each one
        endpointDispatcher.ContractFilter =
            new MatchAllMessageFilter();
        DispatchRuntime runtime =
            endpointDispatcher.DispatchRuntime;
        runtime.OperationSelector = this;
    }
    public void Validate(ServiceEndpoint endpoint)
    {
    }
    #endregion
    #region IDispatchOperationSelector Members
    public string SelectOperation(ref Message message)
    {
        // MessageHeaders.ReplyTo is an EndpointAddress,
        // which is null when the header is absent
        EndpointAddress replyTo = message.Headers.ReplyTo;
        if (replyTo == null)
            return "OneWayAction";
        //else
        return "Action";
    }
    #endregion
}
With this in place, we can move on to the implementation of the intermediary itself. We will need to install the endpoint behavior on the host's endpoint which exposes the generic contract. For example:
ServiceHost host =
    new ServiceHost(typeof(Intermediary));
ServiceEndpoint endpoint =
    host.AddServiceEndpoint(
        typeof(IFinalGenericContract), ...);
endpoint.Behaviors.Add(
    new GenericDispatchOperationSelector());
host.Open();
The final bit that needs to fall in its place is the forwarding logic. It might appear as if we can use the IFinalGenericContract devised above to communicate with one-way or R-R services alike. However, this isn't the case! If we use this generic contract, we effectively forward the need to use an operation selector and impose it on the services, which is not what we need. Instead, we need a pair of interfaces - the IGenericContract and IOneWayGenericContract outlined above, one for each MEP:
internal sealed class Intermediary : IFinalGenericContract
{
    #region IFinalGenericContract Members
    public Message Action(Message message)
    {
        //Request-reply case:
        IGenericContract proxy = ...;
        return proxy.Action(message);
    }
    public void OneWayAction(Message message)
    {
        //One-way case:
        IOneWayGenericContract proxy = ...;
        proxy.Action(message);
    }
    #endregion
}
This is all we need to implement forwarding logic in one place for the one-way and R-R patterns. The client code is exactly the same and the same endpoint can be used for both MEPs. Extending this sample to support the duplex MEP is left as an exercise for the reader. :-)