Cross-AppDomain Workflow Local Services
I stumbled across an interesting issue today that I thought might be worth sharing. The design of my application server requires hosting multiple workflow types, potentially in multiple versions (the workflows are exposed as Workflow Services, using the WorkflowServiceHost class). Since the server is long-running, it is likely that several versions of the same workflow will be deployed and active on the same server at the same time, with several instances of each version running.
The only way to provide this behavior cleanly is through hosting each workflow in a separate AppDomain. In this way, the server can be truly long-running, with workflow types being loaded and unloaded without requiring a restart. Additionally, some of the workflow's local services are required to be shared across AppDomains, because they hold common state and information the workflows might require.
However, there is an interesting corner case when you host a workflow in a separate AppDomain and give it a local service that resides in a different AppDomain. The corner case occurs when you combine this setup with a workflow persistence service.
Without further ado, here's the cross-domain service:
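A minimal sketch of what such a service might look like, assuming the .NET Framework. Only the class name and the fact that it is marshal-by-ref come from the text; the Add method and the lifetime override are assumptions made for illustration.

```csharp
using System;

// Marshal-by-ref: the instance stays in the AppDomain that created it,
// and other AppDomains only ever see a transparent proxy to it.
public class CrossDomainCalculatorService : MarshalByRefObject
{
    // Hypothetical operation; the real service's members are not known.
    public int Add(int a, int b)
    {
        return a + b;
    }

    // Assumption: the server is long-running, so the remoting lease is
    // disabled to keep the proxy from expiring. Returning null means
    // "infinite lifetime".
    public override object InitializeLifetimeService()
    {
        return null;
    }
}
```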
And here's the initialization code that takes place in a separate AppDomain:
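A sketch of what that initialization might look like, assuming .NET Framework 3.5 with System.WorkflowServices referenced. The Initialize signature, the workflowType parameter and the Add method are assumptions; endpoints are assumed to come from configuration.

```csharp
using System;
using System.ServiceModel;
using System.ServiceModel.Description;
using System.Workflow.Runtime.Hosting;

public class CrossDomainCalculatorService : MarshalByRefObject
{
    public int Add(int a, int b) { return a + b; } // hypothetical member
}

// Created inside the secondary AppDomain; the primary AppDomain talks to it
// through a proxy.
public class HostWorkflowInSeparateAppDomain : MarshalByRefObject
{
    private WorkflowServiceHost _host;

    // 'calculator' arrives here as a transparent proxy to the primary AppDomain.
    public void Initialize(Type workflowType, CrossDomainCalculatorService calculator)
    {
        _host = new WorkflowServiceHost(workflowType);

        // WorkflowServiceHost exposes its WorkflowRuntime through this behavior.
        WorkflowRuntimeBehavior runtimeBehavior =
            _host.Description.Behaviors.Find<WorkflowRuntimeBehavior>();

        runtimeBehavior.WorkflowRuntime.AddService(calculator);

        // Deliberately invalid connection string; it is never reached.
        runtimeBehavior.WorkflowRuntime.AddService(
            new SqlWorkflowPersistenceService("invalid connection string"));

        // Opening the host is what applies the service behaviors.
        _host.Open();
    }
}
```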
As you can see, we have combined the custom CrossDomainCalculatorService (which is marshal-by-ref, so it lives in the creator's AppDomain) with the SqlWorkflowPersistenceService. The connection string is invalid, but that is irrelevant because we never get to the point where the persistence service complains about it.
Finally, here's the code to create the HostWorkflowInSeparateAppDomain thunk and call its Initialize method, causing the action to take place:
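A sketch of the calling side, assuming the .NET Framework (AppDomain.CreateDomain is not supported on .NET Core). The domain name is arbitrary, and the stub Initialize body only stands in for the real hosting code.

```csharp
using System;

public class CrossDomainCalculatorService : MarshalByRefObject
{
    public int Add(int a, int b) { return a + b; } // hypothetical member
}

// Stub standing in for the real host class.
public class HostWorkflowInSeparateAppDomain : MarshalByRefObject
{
    public void Initialize(CrossDomainCalculatorService calculator)
    {
        // The workflow host would be created and opened here.
        Console.WriteLine("Initializing in " + AppDomain.CurrentDomain.FriendlyName);
    }
}

public static class Program
{
    public static void Main()
    {
        AppDomain workflowDomain = AppDomain.CreateDomain("WorkflowDomain");

        // Instantiate the thunk inside the new AppDomain and get a proxy to it.
        var thunk = (HostWorkflowInSeparateAppDomain)workflowDomain.CreateInstanceAndUnwrap(
            typeof(HostWorkflowInSeparateAppDomain).Assembly.FullName,
            typeof(HostWorkflowInSeparateAppDomain).FullName);

        // The service is created here, in the primary AppDomain, and is passed
        // to the secondary AppDomain by reference.
        thunk.Initialize(new CrossDomainCalculatorService());

        AppDomain.Unload(workflowDomain);
    }
}
```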

Tying these pieces together and running the resulting code produces an unexpected exception:
Well, SqlWorkflowPersistenceService is not marked as serializable. So bloody what? I am not trying to marshal the persistence service across AppDomain boundaries, I'm trying to marshal the CrossDomainCalculatorService which is marshal-by-ref and should work... Looking at the call stack we discover:
From here it's obvious that the exception occurs in the secondary AppDomain, and gets marshaled to the point of invocation in the primary AppDomain. However, why does Object.Equals produce a SerializationException saying that SqlWorkflowPersistenceService is not marked as serializable?
If you can see it already, good for you; if not, let's take a look at what Reflector has to show us in the implementation of WorkflowServiceBehavior.ApplyDispatchBehavior (I have omitted the irrelevant parts):
OK, so if there is a workflow persistence service, the workflow runtime has to be stopped if it has been started. After that, the persistence service is removed, wrapped in an instance of SkipUnloadOnFirstIdleWorkflowPersistenceService (a private class in WorkflowServiceBehavior; all it does is prevent workflow unload while the runtime is restarting), and added back to the workflow runtime. Finally, the workflow runtime is restarted if it was stopped earlier.
Can you see it now? Read again: "...the persistence service is removed..." - WorkflowRuntime.RemoveService is called. It all boils down to this:
So we're removing from a list. What's all the fuss about? Well, List<T>.Contains is next, and what it does is call Equals using the default equality comparer for the type. In our case (as in most cases), it just calls Object.Equals. Right, you could have told me that from looking at the call stack a couple of screens above. What else is new?
Let's think again about that Object.Equals call. While we are looking for the service to remove, we are essentially checking whether the Object.Equals call returns true when we pass to it the service we want to remove. Now, what happens when we invoke Object.Equals on our cross-AppDomain custom service and pass to it the SqlWorkflowPersistenceService?
What we have then is a cross-AppDomain call! And we're trying to marshal the SqlWorkflowPersistenceService in that call. Bingo.
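The same effect can be reproduced without Workflow Foundation. The following is a minimal sketch, assuming the .NET Framework; RemoteService, LocalOnlyService and Worker are hypothetical names, and a plain List<object> stands in for the runtime's internal service list.

```csharp
using System;
using System.Collections.Generic;
using System.Runtime.Serialization;

// Plays the role of the cross-AppDomain custom service.
public class RemoteService : MarshalByRefObject { }

// Plays the role of the persistence service: neither serializable
// nor marshal-by-ref.
public class LocalOnlyService { }

public class Worker : MarshalByRefObject
{
    // Runs in the secondary AppDomain; 'remote' is a transparent proxy here.
    public string TryRemove(RemoteService remote)
    {
        var local = new LocalOnlyService();
        var services = new List<object> { remote, local };
        try
        {
            // Remove compares 'local' against each element in order. The first
            // comparison is proxy.Equals(local), which is a remoted call that
            // has to marshal 'local' to the primary AppDomain.
            services.Remove(local);
            return "removed without error";
        }
        catch (SerializationException ex)
        {
            return ex.GetType().Name + ": " + ex.Message;
        }
    }
}

public static class Program
{
    public static void Main()
    {
        AppDomain secondary = AppDomain.CreateDomain("Secondary");
        var worker = (Worker)secondary.CreateInstanceAndUnwrap(
            typeof(Worker).Assembly.FullName, typeof(Worker).FullName);

        Console.WriteLine(worker.TryRemove(new RemoteService()));
    }
}
```

Note that the order of the list matters in this sketch: the proxy has to be compared before the matching element is found.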
What can we do to remedy the situation? Well, there doesn't seem to be much to do short of giving up either the cross-AppDomain custom service or the workflow persistence service (because it's the combination of both that creates this corner case). What I chose to do is wrap the cross-AppDomain service in a class that is created locally inside the secondary AppDomain and holds a reference to the cross-AppDomain service. Because the wrapper is an ordinary local object, Equals calls on it never leave the secondary AppDomain. It could be a simple wrapper or a delegating wrapper (if the service implements an interface), for example:
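A sketch of such a wrapper, assuming the service exposes a single hypothetical Add method. The wrapper name and the ICalculatorService interface are illustrative assumptions, not the original code.

```csharp
using System;

// Hypothetical contract shared by the service and its wrapper.
public interface ICalculatorService
{
    int Add(int a, int b);
}

public class CrossDomainCalculatorService : MarshalByRefObject, ICalculatorService
{
    public int Add(int a, int b) { return a + b; }
}

// Deliberately NOT marshal-by-ref: instances are created inside the secondary
// AppDomain and stay there, so Object.Equals on the wrapper is a local
// reference comparison.
public class CalculatorServiceWrapper : ICalculatorService
{
    private readonly CrossDomainCalculatorService _remote;

    public CalculatorServiceWrapper(CrossDomainCalculatorService remote)
    {
        if (remote == null) throw new ArgumentNullException("remote");
        _remote = remote;
    }

    // Simple-wrapper style: expose the proxy directly.
    public CrossDomainCalculatorService Service
    {
        get { return _remote; }
    }

    // Delegating style: forward each call through the proxy.
    public int Add(int a, int b)
    {
        return _remote.Add(a, b);
    }
}
```

The wrapper should not override Equals or GetHashCode to forward to the proxy, since that would reintroduce the cross-AppDomain call.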
And then adding that wrapper as a workflow service instead of the cross-AppDomain service:
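A sketch of the registration, assuming the code runs inside the secondary AppDomain and that a WorkflowRuntime is already at hand. RuntimeSetup, AddServices and CalculatorServiceWrapper are hypothetical names.

```csharp
using System;
using System.Workflow.Runtime;
using System.Workflow.Runtime.Hosting;

public class CrossDomainCalculatorService : MarshalByRefObject
{
    public int Add(int a, int b) { return a + b; } // hypothetical member
}

// Local wrapper holding the proxy; see the remarks on Equals above.
public class CalculatorServiceWrapper
{
    private readonly CrossDomainCalculatorService _remote;

    public CalculatorServiceWrapper(CrossDomainCalculatorService remote)
    {
        _remote = remote;
    }

    public int Add(int a, int b) { return _remote.Add(a, b); }
}

public static class RuntimeSetup
{
    // Must be called from within the secondary AppDomain.
    public static void AddServices(WorkflowRuntime runtime,
                                   CrossDomainCalculatorService calculator)
    {
        // Register the local wrapper, not the proxy itself.
        runtime.AddService(new CalculatorServiceWrapper(calculator));

        // The persistence service can now be removed and re-added by
        // WorkflowServiceBehavior without any cross-AppDomain Equals call.
        runtime.AddService(
            new SqlWorkflowPersistenceService("invalid connection string"));
    }
}
```

Code that consumes the service then asks the runtime for CalculatorServiceWrapper instead of CrossDomainCalculatorService.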
Another mystery solved.