It was a bad day. I was coding away happily as we started getting alerts that one of our production farms is down. Our service seemed to be stuck on each one of our servers. Requests are coming in, but no responses appear. Restarting the service helped, only to get stuck again after a while. The logs were unhelpful, and we had to pull out windbg to figure out what was going on. When your day starts with windbg, you know it won’t be a good one.
The command ~*e!clrstack (=view stack trace of all managed threads) showed us that most of the threads are deadlocked in log4net’s AppenderSkeleton class. It looks something like this:
Lo and behold! There’s some locking here. I wasn’t even aware that log4net is locking on each call to Append, although it makes sense of course when your usual task is writing to files (one could suggest that there are better ways to approach that as well, and one would probably be right). Our own custom appender isn’t really writing to files, it sends log messages to Splunk (a pretty awesome system, albeit very expensive).
I know what you’re thinking. Why is your append method async? Well, since logging to Splunk is a networked operation, we don’t want it to be done on the current thread. So our own SplunkLogger just puts messages in a queue, and a dedicated thread will send them to Splunk. But we also want to know if and when the message was succesfully sent. Hence, an async task that only completes when the message is actually sent.
You’ve probably figured out right now what happended to our threads by now. If for whatever reason messages are stuck in the queue and not being sent, then the lock in DoAppend is never released! The Monitor.Exit() call won’t happen until the task returns, as the async/await system rightfully makes sure of that. Log calls from other threads will never be able to complete.
Once we figured this out, the solution was simple. Stop locking! Our SplunkLogger is thread-safe anyway, so this wasn’t required at all in the first place. Unfortunately this means we had to leave the conveinence of AppenderSkeleton and directly implement IAppender. You lose some filters and formatting support there, but luckily for us we weren’t really using these features in this appender.
Less luckily for us, there were more problems with this code. More on that in the next post, stay tuned.