The Asynchronous Queuing Pattern describes a classic way to improve service throughput in distributed applications. Over the years I have seen quite a few implementations of this pattern, from the use of MSMQ to ReactiveQueue, each with its own strengths and weaknesses. Windows Azure queue storage is designed for passing messages between applications in a persisted, scalable and controlled manner. With the above attributes, queue storage is a natural choice for enabling the Asynchronous Queuing Pattern, as described in detail in this MSDN magazine article.
A recent implementation I ran across at a client pushed Azure queue storage to its limits, especially when dealing with a large queue. Their initial implementation was too slow due to a design issue we quickly identified, but by then they were stuck with a queue containing millions of messages that they could not retrieve fast enough. I decided to measure the duration of the different queue operations they were using.
The code I used to measure the performance is very simple and can be found here so you can reproduce the tests for yourself. Keep these considerations in mind:
- We are using a public storage infrastructure that is prone to preemption by other applications.
- The Windows Azure storage infrastructure and API implementations are subject to change.
The following totals reflect 1000 iterations (minus the first 2 to remove the additional cost of the JIT compiler and other potential initialization overhead) of a standard consumer/producer use of Windows Azure queue storage:
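The measurement loop can be sketched roughly as follows. This is a minimal reconstruction, not the original benchmark: it assumes the legacy Microsoft.WindowsAzure.StorageClient library that exposes the methods discussed in this post, and the queue name `perftest` and the use of the development storage connection string are my own placeholders.

```csharp
using System;
using System.Diagnostics;
using Microsoft.WindowsAzure;
using Microsoft.WindowsAzure.StorageClient;

class QueueBenchmark
{
    const int Iterations = 1000;
    const int WarmupIterations = 2; // discarded to exclude JIT/initialization cost

    static void Main()
    {
        // "UseDevelopmentStorage=true" targets the local storage emulator;
        // swap in a real account connection string for a meaningful measurement.
        var account = CloudStorageAccount.Parse("UseDevelopmentStorage=true");
        CloudQueue queue = account.CreateCloudQueueClient().GetQueueReference("perftest");
        queue.CreateIfNotExist();

        var addWatch = new Stopwatch();
        var getWatch = new Stopwatch();
        var deleteWatch = new Stopwatch();

        for (int i = 0; i < Iterations; i++)
        {
            bool measured = i >= WarmupIterations;

            if (measured) addWatch.Start();
            queue.AddMessage(new CloudQueueMessage("message " + i));
            if (measured) addWatch.Stop();

            if (measured) getWatch.Start();
            CloudQueueMessage message = queue.GetMessage();
            if (measured) getWatch.Stop();

            if (measured) deleteWatch.Start();
            queue.DeleteMessage(message);
            if (measured) deleteWatch.Stop();
        }

        Console.WriteLine("AddMessage:    {0} ms", addWatch.ElapsedMilliseconds);
        Console.WriteLine("GetMessage:    {0} ms", getWatch.ElapsedMilliseconds);
        Console.WriteLine("DeleteMessage: {0} ms", deleteWatch.ElapsedMilliseconds);
    }
}
```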
The first thing we notice is that the message retrieval code can easily be improved. In the code above we used the GetMessage method to retrieve messages one by one. However, the Windows Azure Queue API also allows the retrieval of up to 32 messages at a time using the GetMessages method. As you can see in the results from the following run, message retrieval was over six times faster.
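Switching the consumer loop to the batch method looks roughly like this (a sketch against the same legacy StorageClient API; the 32-message maximum per request is imposed by the queue service):

```csharp
// Drain the queue in batches of up to 32 messages per round trip,
// instead of paying one round trip per message with GetMessage.
while (true)
{
    bool receivedAny = false;
    foreach (CloudQueueMessage message in queue.GetMessages(32))
    {
        receivedAny = true;
        // ... process the message here ...
        queue.DeleteMessage(message);
    }
    if (!receivedAny) break; // queue is (currently) empty
}
```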
Note: since I omitted the first two iterations of GetMessages, I also omitted the first 64 iterations of every other queue operation, so at the end of the day we are looking at 936 messages rather than 998, but still the improvement is clearly noticeable.
The next stop on our quest for throughput improvement is the deletion of messages from the queue after we retrieve them. The consumer has to perform this operation in order to clear the message from the queue and confirm that it has been processed. The call to DeleteMessage can also be easily improved. If you take a closer look at the code, you can see that we are using the DeleteMessage method, which is a synchronous call to the Azure Queue service. However, there is no real need to wait for this call to complete, so we can use its async implementation by calling BeginDeleteMessage. The results of this run (again for 1000 iterations minus 64) are shown here:
In our sample code we do not handle exceptions for BeginDeleteMessage (or for DeleteMessage), but we can easily do so by passing a callback to BeginDeleteMessage that calls the EndDeleteMessage method inside a try/catch block.
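A hedged sketch of that callback approach, again assuming the legacy StorageClient API (where failed storage calls surface as StorageClientException):

```csharp
// Fire the delete asynchronously and observe failures in the completion
// callback, so the consumer thread is not blocked on the round trip.
queue.BeginDeleteMessage(message, asyncResult =>
{
    try
    {
        queue.EndDeleteMessage(asyncResult);
    }
    catch (StorageClientException ex)
    {
        // If the delete fails, the message simply becomes visible again
        // after its visibility timeout, so logging and moving on is
        // usually acceptable here.
        Console.WriteLine("Delete failed: {0}", ex.Message);
    }
}, null);
```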
Until this point we have dramatically improved the consumer code for our queue, which, I must admit, is the easy part. The producer side is going to be a bit more problematic. Windows Azure Queue Storage exposes an APM-based API for adding messages to the queue (the BeginAddMessage/EndAddMessage methods). If you are adding to the queue from a client application, you can use this API to release the calling thread and let the network stack perform the majority of the heavy lifting.
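The producer side then mirrors the delete example, with the same caveats (a sketch against the legacy StorageClient API; the payload string is a placeholder):

```csharp
// Enqueue without blocking the calling thread; completion (and any
// failure) is observed in the callback via EndAddMessage.
var outgoing = new CloudQueueMessage("payload");
queue.BeginAddMessage(outgoing, asyncResult =>
{
    try
    {
        queue.EndAddMessage(asyncResult);
    }
    catch (StorageClientException ex)
    {
        // Unlike a failed delete, a failed add means the message was
        // never enqueued, so the producer must decide whether to retry.
        Console.WriteLine("Add failed: {0}", ex.Message);
    }
}, null);
```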
If you are adding to the queue from a WCF service, this will not be enough; you should also consider using an asynchronous service contract. More information about implementing asynchronous services (and asynchronous calls in WCF in general) can be found in this blog post by Wenlong Dong.
Windows Azure Queue Storage was created with the SOA Asynchronous Queuing Pattern in mind. By using its APM-based async APIs and the GetMessages batch method, we were able to improve its throughput and reduce the need for additional compute instances.