Practical Concurrency Patterns: Spinlock

August 10, 2008

one comment

Previously in the series we have examined the performance differences between concurrency patterns based on kernel synchronization (critical sections, events, mutexes etc.) and concurrency patterns based on wait-free synchronization (such as the interlocked family of operations).  Kernel synchronization tends to be expensive if locks are frequently acquired and released because of the associated context switches, introducing anti-patterns such as lock convoys.  Wait-free synchronization tends to be expensive if locks are held for a long time because of the spinning which burns CPU without making progress.

Oftentimes neither approach is flexible enough to implement a practical scenario, and an intermediate mechanism is required.  Such an intermediate mechanism is available in the Win32 API in the form of a critical section with a spin count.

The InitializeCriticalSectionAndSpinCount API initializes a critical section and associates it with a spin count.  When a thread attempts to acquire the critical section and finds it locked, it will first spin the CPU for the specified number of iterations (i.e. the spin count) before transitioning to a kernel wait.  If the critical section is released while the thread is spinning, it will acquire the critical section without context switching and without a kernel wait, producing a significant performance boost.

Critical sections with spin counts are a specific example of the general spinlock pattern used in the Windows kernel and in user-mode frameworks such as the Parallel Extensions for .NET.

A typical kernel implementation of a spinlock is a single bit.  The spinlock has the following semantics:

  • When the spinlock is taken, the bit is set to 1.
  • When the spinlock is free, the bit is set to 0.
  • When a thread attempts to acquire the spinlock, it will spin indefinitely until the bit is set to 0 (using an atomic operation such as the x86 BTS or InterlockedCompareExchange).

A refinement on the spinlock pattern that is better suited for multi-processor machines is the queued spinlock.  A queued spinlock is typically implemented using a set of bits (queue), one for each CPU waiting for the spinlock.  Each CPU is also associated with a bit that it uses to spin while waiting for the spinlock.  When a spinlock is released, the releasing CPU hands it over to the next waiting CPU by setting its private bit and removing it from the queue.

This pattern minimizes inter-processor contention for memory because each CPU spins on a private bit which serves as its spinlock indicator.  It also introduces first-in-first-out order for acquiring the spinlock to guarantee fairness.

Side note:  In the Windows kernel, queued spinlocks are used on multi-processor systems.  On single-processor systems, spinlocks are not needed because spinlock synchronization is required on high IRQLs only.  On high IRQLs (above dispatch IRQL) a context switch cannot occur, so instead of spinning the acquiring thread can simply request an interrupt on the relevant IRQL and return; the interrupt will be masked until the releasing thread lowers the IRQL below the requested IRQL.

Add comment
facebook linkedin twitter email

Leave a Reply

Your email address will not be published.

You may use these HTML tags and attributes: <a href="" title=""> <abbr title=""> <acronym title=""> <b> <blockquote cite=""> <cite> <code> <del datetime=""> <em> <i> <q cite=""> <s> <strike> <strong>


one comment

  1. codekaizenSeptember 29, 2008 ב 12:27 AM

    Thanks for the description of queued spinlocks – I haven’t heard of them before now.

    However, I’m unclear on how the CPU which has the lock knows which CPU is next in the queue just by setting a single bit – how is this done? Is there some well-known global structure which is used? Also, you state, “the releasing CPU hands it over to the next waiting CPU by setting its private bit and removing it from the queue,” which CPU’s private bit is set, the locked or waiting CPU?