As server developers, we are used to a certain level of interactivity: our services get a request and usually return some kind of response. Lately I have found myself justifying the existence of HPC batch jobs. It is sometimes hard to grasp that classic HPC programs, like the Human Genome Project, rendering a full-length 3D animated movie, or simply operating a “civilian purposes” nuclear reactor, take a long time. And when you execute such programs for days, weeks, or months, interactivity is not even a consideration.
Batch jobs are the very essence of classic HPC applications. Still, Microsoft is trying to expose Windows HPC Server 2008 to new verticals, some of which would like to use it for more interactive tasks. For such purposes Windows HPC Server 2008 supports an SOA model exposed via WCF. Microsoft also released a cluster debugger that has some cool features such as:
- Cluster debugging
- Local debugging
- Running service code locally in a simulated Windows Azure environment
- Two project templates for creating both interactive and durable session clients
There is also a decent MSDN walkthrough that shows how to create and debug an HPC SOA application. I recommend running through it to get a feel for how HPC SOA applications are built. In this post, I would like to take a deeper look at both session types and some of the mechanisms and techniques they utilize, both on the client and in the cluster.
Sessions are an essential part of HPC SOA clients. There are two types of sessions: interactive and durable. Neither HPC client session type provides the same semantics as WCF sessions, where every call during the session is handled by the same instance of the service class on the server side. In fact, one of the roles of HPC sessions, when used correctly, is to ensure that calls made during the session are load-balanced between different compute nodes in the cluster. Durable sessions provide one more key capability, which I will describe below. To understand this process we need to look at two more components: the Job Scheduler and the Broker Node.
The Job Scheduler
The job scheduler is the main component that runs on the head node. It handles units of work called jobs. The job scheduler's main concern is allocating the necessary resources for a job and starting the job's sub-units, called tasks, on the allocated compute nodes. In an SOA application, the tasks that the job scheduler creates are called service tasks, and they host the services defined in the job.
Figure 1: A session starting service tasks using the job scheduler
In SOA applications the job scheduler has one more task: starting a broker node that will be used to load-balance all service calls between the service tasks.
The Broker Node
The broker node provides a few key capabilities for SOA applications. The first is exposing an endpoint for each service job; every call sent to the broker node through that endpoint is load-balanced between the service tasks in that job.
Figure 2: A Windows HPC Service Message Lifecycle
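An interactive-session client obtains the broker node's endpoint from the session and calls the service through it. Here is a minimal sketch; the head node name, the service name "EchoService", and the IEchoService contract are assumptions for illustration.

```csharp
using System;
using System.ServiceModel;
using Microsoft.Hpc.Scheduler.Session;

// Sketch of an interactive session client. "MyHeadNode",
// "EchoService" and IEchoService are assumptions.
SessionStartInfo info = new SessionStartInfo("MyHeadNode", "EchoService");

using (Session session = Session.CreateSession(info))
{
    // The session exposes the broker node's endpoint; every call
    // made through it is load-balanced between the service tasks.
    ChannelFactory<IEchoService> factory = new ChannelFactory<IEchoService>(
        new NetTcpBinding(SecurityMode.Transport, false),
        session.EndpointReference);

    IEchoService proxy = factory.CreateChannel();
    Console.WriteLine(proxy.Echo("hello"));
    factory.Close();
}
```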
And now that we understand the basic stuff we can look into two of the more powerful (and cool) scenarios HPC provides for SOA applications.
Durable Sessions
So far we have discussed mechanisms that work the same way in both interactive and durable sessions. The main difference between the two is (not surprisingly) durability. Durable sessions save response messages in an MSMQ queue on the broker node, where they can be retrieved by clients: either the initiating client or any other client with permission to attach itself to the session. Another difference is that in order to use durable sessions, you must use the BrokerClient class to send and receive messages defined as message contracts, as shown in the following snippet:
Grow/Shrink
Grow/Shrink is basically a scheduling policy, and while the job scheduler provides a few of these, Grow/Shrink is the most relevant for SOA applications. As its name suggests, Grow/Shrink allows administrators to add or remove a job's resources over time. This helps deal with peaks and can even be done using Windows Azure worker roles as additional compute nodes. The cool thing is that once you add resources to the job, the job scheduler notifies the broker node about them, and the broker node can then take them into account while load balancing.
SOA is still the ugly duckling in the world of HPC. In fact, the whole concept of using sessions to start a service on the cluster comes from the classic HPC paradigm of a single job that needs to be distributed across a cluster. Even so, you can start a service job directly on the server using the HPC 2008 Cluster Manager, or use some very simple patterns to start a session that is shared among all clients, transforming Windows HPC Server 2008 R2 into a lean, mean SOA machine.
There are many capabilities just waiting to be used in commercial SOA applications, and the hardware does not need to be a monstrosity (though that can definitely make things cooler). The constant improvement in Windows Azure support makes this type of solution ideal for dealing with uncommon or unpredictable peaks, since there is no need to buy hardware to support them.
So until next time, remember: real developers use big computers.