Dana Groff, Senior Program Manager on the ConcRT team, is going to talk about the new Concurrency Runtime (ConcRT), an abstraction on top of the underlying operating system that is supported from Windows XP through Windows Server 2008 R2.
The ConcRT Resource Manager is an abstraction over the hardware that lets vendors such as Microsoft and Intel (OpenMP, TBB) program at a higher layer and compose their platforms. ConcRT also offers a single set of concepts for expressing parallel code, such as tasks and task groups.
Dana uses a high-end AMD server with 48 cores (eight six-core processors, with eight NUMA nodes – the HP ProLiant DL785 G6). He uses it to scale a raw image processing application, and it scales almost linearly. When all the processors are working, the machine also generates a lot of noise that can be heard throughout the room 🙂
We think serially, so the first step is to decompose our algorithms into tasks, each of which runs serially on its own. Next, we group these tasks according to which ones depend on each other or work on the same data. Finally, we schedule these task groups together. The runtime cooperatively handles blocking events, dependencies and other things that are explicitly stated in the code.
A task can be a lambda, a pointer to a function or a function object: it is a unit of work that has to run serially. Under the covers there are task_handle objects and lightweight tasks. Lightweight tasks are used by agents and are fire-and-forget; task_handle objects can be controlled and managed in a richer way, and they are cancelable.
When a task blocks, the runtime is notified and can use that core for another task, which runs until it completes or blocks in turn. When tasks unblock, the ConcRT scheduler prefers to put them back on the core where they ran before blocking.
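To make the three task shapes concrete, here is a minimal sketch in portable standard C++. It is not ConcRT code: `TaskGroup` is a hypothetical stand-in built on `std::async` that only mimics the run-then-wait pattern of a task group, and `SumRange`, `count_task` and `sum_in_two_tasks` are names invented for this illustration.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <functional>
#include <future>
#include <numeric>
#include <utility>
#include <vector>

// Hypothetical stand-in for a task group (NOT the ConcRT class):
// run() starts a unit of work, wait() blocks until all started work is done.
class TaskGroup {
public:
    void run(std::function<void()> task) {
        pending_.push_back(std::async(std::launch::async, std::move(task)));
    }
    void wait() {
        for (auto& f : pending_) f.get();
        pending_.clear();
    }
private:
    std::vector<std::future<void>> pending_;
};

// Shape 1: a plain function, passed to the group as a function pointer.
std::atomic<int> tasks_completed{0};
void count_task() { ++tasks_completed; }

// Shape 2: a function object that carries its own inputs and output slot.
struct SumRange {
    const std::vector<int>* data;
    std::size_t begin, end;
    long* out;
    void operator()() const {
        assert(begin <= end && end <= data->size());
        *out = std::accumulate(data->begin() + begin, data->begin() + end, 0L);
    }
};

// Each task runs serially; the group is what runs them side by side.
long sum_in_two_tasks(const std::vector<int>& data) {
    long lo = 0, hi = 0;
    const std::size_t mid = data.size() / 2;
    TaskGroup group;
    group.run(SumRange{&data, 0, mid, &lo});  // function object
    group.run([&] {                           // Shape 3: a lambda
        hi = std::accumulate(data.begin() + mid, data.end(), 0L);
    });
    group.run(&count_task);                   // function pointer
    group.wait();                             // join before reading lo and hi
    return lo + hi;
}
```

The two halves write to separate variables, so the only synchronization needed is the final `wait()`. A real runtime would reuse a pool of worker threads rather than start one thread per task as this sketch does.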
Standard Threads and UMS
When using standard Windows threads, applications must use synchronization mechanisms provided by ConcRT to take advantage of cooperative scheduling. For example, there is an abstraction of an event that works differently from the Win32 event. When using Win32 synchronization mechanisms, there’s no way for ConcRT to know that its threads can run other tasks; when using ConcRT synchronization mechanisms, the blocking and switching is handled by ConcRT.
The cooperative blocking mechanisms include waiting on a task group, events, critical sections, reader-writer locks (RWLs), receives and waits in the agents library, and others.
UMS threads are available on 64-bit Windows 7 and Windows Server 2008 R2. User-mode scheduling makes ConcRT aware of kernel blocking, even when the code uses Win32 synchronization mechanisms. When a thread enters the kernel to perform a wait, ConcRT is notified and can schedule another user-mode thread on the same core. UMS threads look exactly like regular threads, which is a great advantage compared to fibers.
The user-mode context of the UMS thread is handled by ConcRT; the kernel facility communicates with ConcRT when rescheduling is needed. ConcRT tries hard to reschedule a user-mode context on the core where it ran before, or at least on the same NUMA node.
The promise of UMS is twofold. You can sometimes afford to be sloppy, for example by not creating enough threads or by oversubscribing some resources, and in some scheduling scenarios you get a performance win of up to 8%.
Another great advantage of ConcRT is that it seamlessly takes advantage of the 256-processor support in Windows 7 and Windows Server 2008 R2. The ConcRT team has done the proper testing in that kind of environment (with 128- and 256-core machines) so that you don't have to worry about what happens beneath the abstraction.
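The difference between a blocking wait and a cooperative one can be sketched with a toy model in portable C++. Everything here is an assumption made for illustration: `Scheduler` is a single "core" with a ready queue, `CooperativeEvent` is a hypothetical event that parks the rest of a task as a continuation, and neither is the ConcRT implementation or its API.

```cpp
#include <cassert>
#include <deque>
#include <functional>
#include <string>
#include <utility>
#include <vector>

using Task = std::function<void()>;

// Hypothetical single-core scheduler: runs ready tasks one after another.
struct Scheduler {
    std::deque<Task> ready;
    void spawn(Task t) { ready.push_back(std::move(t)); }
    void run_all() {
        while (!ready.empty()) {
            Task t = std::move(ready.front());
            ready.pop_front();
            t();
        }
    }
};

// Hypothetical cooperative event: a waiter hands over the rest of its work
// instead of blocking, so the scheduler stays free to run other tasks.
class CooperativeEvent {
public:
    explicit CooperativeEvent(Scheduler& s) : sched_(s) {}
    void wait_then(Task continuation) {
        assert(static_cast<bool>(continuation));
        if (signaled_) sched_.spawn(std::move(continuation));
        else waiters_.push_back(std::move(continuation));
    }
    void set() {
        signaled_ = true;
        for (auto& w : waiters_) sched_.spawn(std::move(w));  // make waiters ready
        waiters_.clear();
    }
private:
    Scheduler& sched_;
    bool signaled_ = false;
    std::vector<Task> waiters_;
};

// The consumer "blocks" first, yet the producer still gets to run on the
// same core and later wakes the consumer up.
std::vector<std::string> run_demo() {
    Scheduler sched;
    CooperativeEvent data_ready(sched);
    std::vector<std::string> log;
    sched.spawn([&] {
        log.push_back("consumer: waiting");
        data_ready.wait_then([&] { log.push_back("consumer: resumed"); });
    });
    sched.spawn([&] {
        log.push_back("producer: working");
        data_ready.set();
    });
    sched.run_all();
    return log;
}
```

If the consumer had waited on an ordinary kernel event instead, this one-core model would have stalled, because nothing would have told the scheduler that the core was free for the producer.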
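The placement preference described above (same core first, then same NUMA node) can be written as a small policy function. This is only a sketch of that preference order in portable C++; the `Core` struct and `pick_core` are hypothetical names, and the real scheduler's algorithm was not shown in the talk.

```cpp
#include <cassert>
#include <vector>

// Hypothetical description of a core as seen by the policy.
struct Core {
    int id;
    int numa_node;
    bool idle;
};

// Returns the id of the core to resume on, or -1 if no core is idle.
// Preference order: the core used last, then any idle core on the same
// NUMA node, then any idle core at all.
int pick_core(const std::vector<Core>& cores, int last_core_id) {
    assert(last_core_id >= 0);
    const Core* last = nullptr;
    for (const Core& c : cores) {
        if (c.id == last_core_id) last = &c;
    }
    if (last != nullptr && last->idle) return last->id;  // 1. same core
    if (last != nullptr) {
        for (const Core& c : cores) {                    // 2. same NUMA node
            if (c.idle && c.numa_node == last->numa_node) return c.id;
        }
    }
    for (const Core& c : cores) {                        // 3. anywhere idle
        if (c.idle) return c.id;
    }
    return -1;
}
```

Staying on the same core keeps the task's data warm in that core's caches, and staying on the same NUMA node at least keeps its memory local.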
[The next talk, Developing Applications for Scale-Up Servers Running Windows Server 2008 R2, will focus more on the UMS and the low-level APIs. Stay tuned.]