It has been a while since I’ve published a post on my blog. Over the past couple of months I’ve been working hard on writing the new official WCF 4 course for Microsoft and on a book about ASP.NET 4 together with Shay Fridman, but things are calming down and I’m getting back to business.
A few days ago Sasha started writing several posts about debugging distributed transactions.
One issue that developers need to be aware of is that the transaction coordinators must be configured on every machine that takes part in the transaction.
WCF supports both OLE transactions, which use MSDTC (Microsoft Distributed Transaction Coordinator), and WS-AtomicTransaction, which uses SOAP messages to carry the coordination messages.
By default, WCF uses OLE transactions and MSDTC, because distributed transactions are typically used within an intranet rather than over the Internet. If you need a distributed transaction over the Internet, you’ll probably want to switch to WS-AtomicTransaction (you’ll need a custom binding to do this), because MSDTC communicates over TCP ports that are likely to be blocked on the Internet.
Anyway, I’m not here to talk about WCF and distributed transactions; I will leave that to Sasha. I would like to share a case I faced today while trying to get the MSDTC service to run properly.
The scenario is as follows: a Windows 7 desktop is running a client application that uses a distributed transaction spanning SQL Server 2005 and Oracle 10g. The application isn’t using WCF, but the story would be the same if it were.
When the application tried to open a distributed transaction with both databases, the connection to the SQL Server failed with the following error:
“The MSDTC transaction manager was unable to pull the transaction from the source transaction manager due to communication problems. Possible causes are: a firewall is present and it doesn’t have an exception for the MSDTC process, the two machines cannot find each other by their NetBIOS names, or the support for network transactions is not enabled for one of the two transaction managers. (Exception from HRESULT: 0x8004D02B).”
We ruled out the firewall option, since the servers and the desktop are in a closed network with no firewall and there is no port blocking (checked with netstat -n). We also reduced the MSDTC transaction security settings to the minimum, but it still didn’t work.
The next step was to enable transaction tracing. The trace file showed the following error message: “SecureBuildContextWrapper call failed. This is usually due to security/network configuration issues”.
Because security was already checked, we went on to check the network. We already knew there was no firewall problem, so we decided to check whether all the coordinators could talk to each other. Pinging from the SQL Server machine to the Oracle machine worked fine (we used the ping command, but you can also use DTCPing), and vice versa, and the client was able to ping both machines, but when pinging from the SQL Server machine to the client machine, the ping failed!
A quick check revealed the reason: the SQL Server and Oracle machines were in a different domain from the client machine. In our organization, servers are in a different domain than the developers’ machines. All the developers’ machines have the servers’ DNS suffix configured on their network adapter, so we don’t need to use the full name of a server (machine name + domain suffix). However, none of the servers had a DNS suffix for the developers’ domain, so they could not resolve the client machine by its short name.
After adding the DNS suffix of the developers’ domain to the database machines, and verifying that they could ping client machines by name (without the domain suffix), the distributed transaction worked!
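The short-name check described above can also be scripted and run on each machine in turn. The following is only a minimal Python sketch, not the procedure we actually used (we simply used ping): the function name resolve_short_names and the host names in the usage note are hypothetical, and the script only tests name resolution, not MSDTC itself.

```python
import socket
from typing import Callable, Dict, Iterable, Optional


def resolve_short_names(
    names: Iterable[str],
    resolver: Callable[[str], str] = socket.gethostbyname,
) -> Dict[str, Optional[str]]:
    """Map each host name to its resolved IPv4 address, or None if the lookup fails."""
    results: Dict[str, Optional[str]] = {}
    for name in names:
        try:
            # With a short name (no domain suffix), the operating system
            # resolver applies the DNS suffix list configured on this machine.
            results[name] = resolver(name)
        except OSError:
            # socket.gaierror (lookup failure) is a subclass of OSError.
            results[name] = None
    return results
```

For example, calling resolve_short_names(["DEVCLIENT01"]) on the SQL Server machine (DEVCLIENT01 being a made-up client name) would have returned None for that name before the DNS suffix was added. Run it on every participant, listing the short names of all the other participants, because the lookup has to succeed in both directions.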
So to conclude, what we need to check when troubleshooting MSDTC exceptions is:
Make sure all machines have MSDTC installed and running. Actually, this was our first problem with the SQL Server machine, but it is easy to check and fix, so I didn’t mention it before.
Make sure no firewall is blocking the communication. Use either netstat -n or DTCPing to check for blocked ports while you try to establish a transaction.
Make sure network DTC access, with inbound and outbound communication, is enabled on all machines (compare the MSDTC settings between machines in Control Panel -> Administrative Tools -> Component Services).
Make sure all machines can reach each other by name, in both directions (ping).
If you still can’t get it to work, enable tracing and look through the resulting trace file for suspicious errors (don’t forget to remove the tracing settings afterwards, so they won’t affect performance).
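As a complement to the firewall item in the checklist above, here is a minimal Python sketch of a TCP reachability probe. It rests on one assumption worth stating: MSDTC is reached through RPC, and the RPC endpoint mapper listens on TCP port 135. The function name can_connect is hypothetical. A successful connection to port 135 does not prove that MSDTC will work, because the coordinators also talk over dynamically assigned RPC ports, so treat this as a quick first probe and use DTCPing for a real test.

```python
import socket

# Assumption: MSDTC is contacted via RPC, whose endpoint mapper listens on TCP 135.
RPC_ENDPOINT_MAPPER_PORT = 135


def can_connect(host: str, port: int = RPC_ENDPOINT_MAPPER_PORT, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and name resolution failures.
        return False
```

Because a name resolution failure also yields False, running can_connect with the short name of each of the other machines covers part of the name check in the list as well.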
I hope we won’t run into too many problems further down the road, but if we do, and we find out how to solve them (hopefully), I’ll be sure to let you know.