Contrary to popular belief, Windows actually does a good job scheduling threads for execution across the available CPUs. (Yes, this was a controversial first sentence, but bear with me.)
More often than not, smart developers try to “outsmart” the operating system or framework they happen to be using. In the particular case of thread scheduling, this “outsmarting” normally falls into one of two categories:
- Not trusting the OS with thread CPU affinity. I.e., if I know that I have 4 cores, I will explicitly create 4 threads and assign each thread to its very own CPU. Now I feel in control.
- Not trusting the OS to schedule work items for execution. I.e., if I know that I have 100,000 independent work items to execute, and 4 cores to execute them on, then I will explicitly create 4 threads, assign each of them a queue, and enqueue 25,000 work items into each queue… (Not to mention the attempt to create 100,000 threads, one for each work item, a classic mistake that belongs in “Threading and Concurrency 101.”)
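The alternative to hand-partitioning work across threads is to let all the workers drain a single shared queue, so an idle worker is never starved while another worker's private queue is full. Here is a minimal Python sketch of that idea (illustrative only; it is not how the Windows or CLR thread pool is implemented, and the counts are made up):

```python
import queue
import threading

NUM_WORKERS = 4
NUM_ITEMS = 10_000

work = queue.Queue()          # one shared queue instead of one queue per thread
results = []
results_lock = threading.Lock()

def worker():
    while True:
        item = work.get()
        if item is None:      # sentinel: no more work for this worker
            break
        with results_lock:
            results.append(item * 2)   # stand-in for the real work item

for i in range(NUM_ITEMS):
    work.put(i)

threads = [threading.Thread(target=worker) for _ in range(NUM_WORKERS)]
for t in threads:
    t.start()
for _ in threads:
    work.put(None)            # one sentinel per worker
for t in threads:
    t.join()

print(len(results))           # 10000 -- every item processed exactly once
```

Because each worker pulls the next item only when it is free, the load balances itself even if some items take much longer than others; with static 25,000-item partitions, one slow queue would leave three cores idle at the end.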
I’d like to focus on affinity in this post, leaving thread pooling and thread pool management to a (hypothetical) future post.
One of the primary parallelism blockers is CPU affinity. If I have a task that is affined to one CPU, then if that CPU is busy but another is available, I have no means of executing the task. One of the classic cases of CPU affinity was NDIS interrupt affinity, which bound the network card to a single CPU that could process its incoming packets. If that CPU was particularly busy processing other interrupts, network receive-side processing was adversely affected. Furthermore, even if multiple CPUs were available to perform receive-side processing, only one of them would do the actual work.
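On Win32, pinning a thread this way is done with SetThreadAffinityMask. For a runnable illustration of the same concept, here is a sketch using Python's os.sched_setaffinity, which is the Linux analogue (the CPU numbers are arbitrary; on Windows you would pass a bitmask such as 0x1 to SetThreadAffinityMask instead):

```python
import os

# 0 means "the calling thread/process" for these calls.
original = os.sched_getaffinity(0)
print("allowed CPUs before:", sorted(original))

# Pin to CPU 0 only -- the parallelism blocker described above:
# from this point on, the scheduler may run this thread only on CPU 0,
# even when every other CPU is sitting idle.
os.sched_setaffinity(0, {0})
print("allowed CPUs after:", sorted(os.sched_getaffinity(0)))

# Undo the pinning, restoring the full mask.
os.sched_setaffinity(0, original)
```

The default mask includes every CPU in the system, which is exactly what gives the scheduler the freedom to run the thread wherever capacity exists.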
A slightly more complicated example has to do with scheduling threads with varying priorities on a multi-core system. Assume we have the following threads:
| Thread | Priority | CPU Affinity | Currently Running On |
|--------|----------|--------------|----------------------|
| Thread A | 8 | CPU 0, 1 | CPU 0 |
| Thread B | 10 | CPU 1 | (in a wait) |
| Thread C | 12 | CPU 0, 1 | CPU 1 |
Assume that thread B comes out of its wait. The Windows scheduler now has to decide what to do with that thread. Since it’s only willing to run on CPU 1, and that CPU is currently running a higher-priority thread, thread B will have to wait. The scheduler won’t go out of its way to shuffle threads around so that both thread B and thread C can run (thread B on CPU 1 and thread C on CPU 0), because that would mean preempting all the executing threads, hurting cache locality, and so on. So toying around with affinity has dire consequences for thread B in this case.
By the way, oftentimes CPU affinity is not intentional. It’s not necessarily the case that the developer was smart enough to actually use the thread CPU affinity API; it could also be the result of a specific framework or a specific scenario within a framework.
For example, Win32 executables have an optional PE header flag, under the Image Characteristics section, indicating that the executable should be run on a single-processor machine only. To run such an executable on a multi-processor machine, a single CPU is chosen for it in round-robin fashion.
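The flag in question is IMAGE_FILE_UP_SYSTEM_ONLY (0x4000) in the Characteristics field of the COFF file header. As a sketch of how one could check for it by hand, here is a short Python snippet that parses the relevant offsets; it builds a tiny synthetic header rather than reading a real executable, purely for demonstration:

```python
import struct

IMAGE_FILE_UP_SYSTEM_ONLY = 0x4000  # "run only on a uniprocessor machine"

def is_uniprocessor_only(image: bytes) -> bool:
    """Test the UP-system-only bit in a PE image's COFF Characteristics."""
    assert image[:2] == b"MZ", "not a DOS/PE image"
    # e_lfanew (offset of the PE signature) lives at offset 0x3C.
    (e_lfanew,) = struct.unpack_from("<I", image, 0x3C)
    assert image[e_lfanew:e_lfanew + 4] == b"PE\x00\x00", "bad PE signature"
    # COFF file header layout: Machine(2) NumberOfSections(2)
    # TimeDateStamp(4) PointerToSymbolTable(4) NumberOfSymbols(4)
    # SizeOfOptionalHeader(2) Characteristics(2) -> offset 18 into the header.
    (characteristics,) = struct.unpack_from("<H", image, e_lfanew + 4 + 18)
    return bool(characteristics & IMAGE_FILE_UP_SYSTEM_ONLY)

# Synthetic image with the flag set, just to exercise the function:
e_lfanew = 0x40
image = bytearray(0x80)
image[:2] = b"MZ"
struct.pack_into("<I", image, 0x3C, e_lfanew)
image[e_lfanew:e_lfanew + 4] = b"PE\x00\x00"
struct.pack_into("<H", image, e_lfanew + 4 + 18, IMAGE_FILE_UP_SYSTEM_ONLY)

print(is_uniprocessor_only(bytes(image)))   # True
```

In practice you would inspect the flag with a tool such as dumpbin /headers rather than parsing the file yourself.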
Thread affinity and CPU affinity, by the way, are not the same. CPU affinity means that a specific thread is bound to a specific CPU. Thread affinity means that a set of tasks is bound to a specific thread.
For example, a COM object residing in a single-threaded apartment (STA) will be marshaled as a proxy outside the apartment. The proxy will ensure that all calls to the object are marshaled to its original apartment and executed on a single thread, even if they originate on multiple threads. There’s your thread affinity.
The .NET runtime is slightly different with regard to fooling around with thread CPU affinities, as far as developers are concerned. Unlike the native execution model, which assumes direct control over threads, the CLR provides the notion of an abstract “task” which is not necessarily the same as the underlying OS thread. The default CLR host maps these tasks to OS threads, so there is a one-to-one correspondence between the “managed” thread and the “unmanaged” or “physical” thread. It is therefore possible to make certain assumptions and modify the CPU affinity of the physical thread as a means to modify the CPU affinity of the managed (logical) thread. This is fragile because the behavior is subject to change in a future CLR version, but we can probably rely on it for the next couple of years.
However, non-default CLR hosts (such as SQL Server 2005) are welcome to implement the abstract “task” notion as something that doesn’t directly map to an OS thread (for example, using cooperative scheduling, fibers, longjmp, or whatnot). The common reasons for doing that are performance (refraining from creating many physical OS threads) and reliability (exercising tighter control over task execution).
When dealing with non-default CLR hosts (or future versions of the default CLR host, for that matter), our code can no longer assume anything regarding a correspondence between physical and logical threads. Therefore, modifying the affinity of the physical thread is downright wrong in these environments, because it might affect logical tasks that are completely unrelated to the current logical task.
(As a side note, if our managed code specifically requires the same underlying physical thread during the execution of some operation, the Thread.BeginThreadAffinity and Thread.EndThreadAffinity APIs can be used to advise the CLR host of our intent.)