DCSIMG
Windows Service Shutdown Problem - .NET Geek

.NET Geek

"It is upon the Trunk that a gentleman works" - Confucius

Windows Service Shutdown Problem

In our current project we have a service that handles messages sent to the application over MSMQ. On startup the service initiates a configurable numbers of listeners that processes messages as they arrive. The main reason this listener initialization is bootstrapped through a Windows Service is because we must be able to “survive” a system shutdown/restart. After a restart the queue listeners must get back to their task of processing and endless stream of messages. I’m not going to go into too much detail on the queuing infrastructure we use (I’m saving that for another post), but I want to share some problems I had while debugging our Windows Service on system shutdown.

The problem: When the service is stopped as a result of a system shutdown the queue listeners are prematurely aborted.

A little background. Since the queue listening infrastructure is designed to gracefully stop on a stop request, I was a little puzzled that the service teardown behaved so differently in the case of a system shutdown. Using the helper class I blogged about last week I tested that the service starts and stops as it should even if there are messages currently being processed in the pipeline.

I have never had to debug shutdown problems before, but that was not my biggest hindering. The server process is a multithreaded server that, as I stated before, processes incoming messages. My heartbeat rate went up significantly only by the thought that I may be exposed to dead-locks or race conditions related to multithreading. Despite having a solid (IMHO) understanding of multithreading, I’m always anxious that complex multithreaded code doesn’t work as I expected. Race conditions and deadlocks are hard to debug under normal circumstances. The thought of having a threading related issue only during shutdown made me more than a little concerned. Looking back, the “fear” of a multithreaded issue severely slowed down my progress resolving the issue.

An obstacle that I didn’t account for when starting to work on this problem is that for every code change I made I had to reboot the computer in order to test it. Each reboot takes a few minutes. You don’t want to know how many times I rebooted my computer during the last two days of last week.

Instead of just letting you know what the issue was, I’ll try to walk through the process of resolving it. If I did something utterly stupid, please let me know. Anyhow, here we go.

My first step was to carefully walk through the multithreaded execution paths of the program. A few times I stopped and told myself that it should be ok since it works when I stop the service manually. I did spend some quality time with the inner workings of our MSMQ code, custom dispatchers and more. I could not find anything that I thought could be the cause of the problem.

My next stop was Google. I searched for various combinations of the following phrases: “Windows Service”, Shutdown, “.net”, MSMQ, abort, termination, multithreaded, and some more…

I was surprised by the sheer number of people who looked for a solution to Windows Service Shutdown related issues. Shortly though, it became apparent that most of these questions were related to people not getting the system shutdown notification. In order for .net to call an overridden OnShutdown(), the CanShutdown property must be set to true. Some misleading MSDN docs on the relationship between OnShutdown and OnStop flooded the search results and made it a little hard to parse the results.
No real progress on this path either…

Since the system does not have a user interface we utilize logging to a great extent. When you look at our log, it is actually like reading a story on the forward progress of the system as it works over time. In order to get a view into the shutdown phase, I added logging in about every place I could. The problem however is that there were almost no log records for the shutdown process. The service OnShutdown was called which in turn calls Stop. I had log records up to the point were the queue listeners started to shut down.
With 10 listeners it would look something like:
Shutting down listener: xxxxx1
Shutting down listener: xxxxx2
Shutting down listener: xxxxx3

And then the process went dark…

Ok, maybe I have a dependency that I didn’t specify. MSSQL and MSMQ were already specified as services our service is dependent on. Back to Google… MSMQ, “Windows Service”, Shutdown, dependencies… Google don’t let me down… But to no avail.

I’m not going to make this much longer, you got the point if you’re still reading.
I decided to shut down the system when all queues are empty. The reason for this, is that if the queues are empty, the queue listeners hang on a WaitHandle.WaitAny(…) until either a message arrives, a timeout occurs or a stop request is issued. In other words, I know exactly the execution path from that point to the end of the respective thread in the case a stop request was issued. I removed almost all code from that execution path (the system won’t work correctly like that, but that is besides the point). By the time I was done, there was almost only logging code left.

Let me say that again. By the time I was done, there was almost only logging code left.

And then it struck me as a lightening. The problem had to be related to the logging library we use. We use Enterprise Library 4.1 for logging and Exception handling. We log everything to the database and some things to the event log. I wanted to turn off all logging, but the problem with that is that then I won’t know what’s going on during shutdown. Since both the EventLog and MSSQL are services I decided to turn the logging to those off. I configured a file trace listener to track activity and shut down the system. It worked! I looked at the log file which contained a record for each stopped queue listener. I enabled the Database listener and tried to shut down the system again. It FAILED! It seems like the Database listener crashes the process when we try to log during shutdown. (I think I know the reason for this as well, but this post is getting long enough as it is. Another post to the backlog)

Ok, now that I knew what the problem was, I felt much better. I still had to figure out a solution to the database logging problem. There’s no way we can run the system without a database log.
I added a SystemShutdown() method on the Logging interface and in the Entlib logging wrapper I just remove the database listener. When OnShutdown() is called I call _logger.SystemShutdown(). Since the system is multithreaded I wanted to make sure that no other thread is in the middle of writing a log entry since I would be removing one of the listeners under their feet. Here’s the Entlib Logging wrapper.

public class EntlibLogger : IEngineLogger

{

    private const string LogCategoryException = "Exception";

    private const string LogCategoryTrace = "Trace";

    /// <summary>

    /// Ensure no-one is logging while we remove the database listeners during shutdown.

    /// </summary>

    private ReaderWriterLockSlim _readerWriterLock = new ReaderWriterLockSlim();

    public void Trace(string message)

    {

        _readerWriterLock.EnterReadLock();

        try

        {

            Microsoft.Practices.EnterpriseLibrary.Logging.Logger.Write(message, LogCategoryTrace);

        }

        catch (Exception ex)

        {

            WriteLogFailureToEventLog(ex);

        }

        finally

        {

            _readerWriterLock.ExitReadLock();

        }

    }

    public void Write(string message)

    {

        _readerWriterLock.EnterReadLock();

        try

        {

            Microsoft.Practices.EnterpriseLibrary.Logging.Logger.Write(message);

        }

        catch (Exception ex)

        {

            WriteLogFailureToEventLog(ex);

        }

        finally

        {

            _readerWriterLock.ExitReadLock();

        }

    }

    public void WriteException(Exception ex)

    {

        WriteException(ex, null);

    }

    public void WriteException(Exception ex, AbstractMessage message)

    {

        _readerWriterLock.EnterReadLock();

        try

        {

            LogEntry entry = new LogEntry();

            entry.Message = ex.ToString();

            entry.Categories.Add(LogCategoryException);

            entry.Priority = 10;

            entry.Severity = TraceEventType.Error;

            AddMessageToEntry(message, entry);

            Microsoft.Practices.EnterpriseLibrary.Logging.Logger.Write(entry);

        }

        catch (Exception logEx)

        {

            Exception exWrap = new Exception(logEx.Message, ex);

            WriteLogFailureToEventLog(exWrap);

        }

        finally

        {

            _readerWriterLock.ExitReadLock();

        }

     }

    private void AddMessageToEntry(AbstractMessage data, LogEntry entry)

    {

        if (data == null)

            return;

        entry.ExtendedProperties.Add("Message", data.ToXml());

    }

    /// <summary>

    /// If we for some reason can't log, write an entry to the event log.

    /// This is the lowest level of logging and should never throw as it is used to

    /// log exceptions.

    /// </summary>

    /// <param name="ex"></param>

    private static void WriteLogFailureToEventLog(Exception ex)

    {

        try

        {

            EventLog.WriteEntry("Engine.Logger", ex.ToString(), EventLogEntryType.Error);

        }

        catch

        {

            // Failed to write to log

            // Failed to write to event log

            // No more options...

            // I think this is the only place in the whole system where we eat an exception.

        }

    }

    public void SystemShutdown()

    {

        DisableDatabaseLogging();

    }

 

    /// <summary>

    /// Logging to the database while the system is shutting down crashes the process.

    /// </summary>

    private void DisableDatabaseLogging()

    {

        _readerWriterLock.EnterWriteLock();

        try

        {

            var sources = Microsoft.Practices.EnterpriseLibrary.Logging.Logger.Writer.TraceSources;

            foreach (var logSource in sources.Values)

            {

                logSource.Listeners.RemoveAll(listener => listener.Name == "Database Trace Listener");

            }

        }

        catch (Exception ex)

        {

            WriteLogFailureToEventLog(ex);

        }

        finally

        {

            _readerWriterLock.ExitWriteLock();

        }

    }

}

Any comments and/or improvements are welcome.

Happy Coding!

kick it on DotNetKicks.com

Comments

DotNetKicks.com said:

You've been kicked (a good thing) - Trackback from DotNetKicks.com

# January 11, 2009 12:21 AM

Rob Kraft said:

Thanks for this post; it helped confirm a similar problem I had with the logging block and nunit tests.  I noticed that you are using ReaderWriterLockSlim around the application block.  Is that one because you are modify the tracesources?  I'm sure the logging block takes its own internal locks before writing to the log.

# June 12, 2009 11:45 PM

Kim said:

@Rob, yes.

# June 13, 2009 9:17 PM

Kim said:

A little off-base, but this could help me trouble-shoot a problem here:

Do you always get the service shutdown from Windows?

We seem to be getting them intermittently in Vista and 7. Our service issues a Mute to our audio hardware to make sure there are no audio pops when power is turned off. I tested with Win7 multiple times and it always worked well, but now I go back and find that the service is staying on (and therefore no mute) right up until the power is shutdown.

Kim K.

# September 3, 2009 10:19 PM

Kim said:

Hmm, I see that our problem could be a dependency race problem.

Kim K.

# September 3, 2009 11:23 PM

a iance insurance said:

I am wondering exactly what Seth has to say with this :D

# December 10, 2010 10:31 PM

Akin said:

Hello mates, how is the whole thing, and what you wish for to say concerning this piece

of writing, in my view its actually amazing in support of me.

# March 9, 2013 7:42 PM
Leave a Comment

(required) 

(required) 

(optional)

(required) 


Enter the numbers above: