August 2008 - Posts
I’ve written about the perils of non-deterministic finalization before: It is a guaranteed way to get yourself in trouble.
One of the more-or-less known facts is that in the process shutdown scenario, finalizers are called for reachable objects as well. The total allotted time for this operation is approximately 40 seconds, and each individual finalizer gets approximately 2 seconds to execute.
It is extremely difficult to detect this scenario and debug it because we are talking about the CLR shutdown sequence.
Fortunately, there’s a registry key you can set at HKLM\SOFTWARE\Microsoft\.NETFramework to ensure that when the shutdown finalizer timeout occurs, a debug breakpoint will be raised from the process, allowing you the opportunity to debug it. Under this key, add a DWORD value called BreakOnFinalizeTimeout and set its value to 1. (On a 64-bit system, make sure you are using the regedit.exe suitable for the CLR process’ bitness.)
What happens next is that whenever a timeout occurs, you get a breakpoint in the debugger and can see what the finalizer thread is up to. The breakpoint itself occurs on the shutdown thread inside a function called FinalizerThreadWatchdog. If you switch to the finalizer thread, you will be able to see the actual code causing the problem:
ntdll!DbgBreakPoint:
00000000`7794fdf0 cc int 3
0:000> k
<snipped for clarity>
ntdll!DbgBreakPoint
mscorwks!`string'+0x1f52b
mscorwks!WKS::GCHeap::FinalizerThreadWatchDog+0x1e5
mscorwks!EEShutDownHelper+0x309
mscorwks!EEShutDown+0x63
<snipped for clarity>
0:000> !Threads
<snipped for clarity>
0 1 1870 MTA
2 2 172c MTA (Finalizer)
0:000> ~2s
ntdll!NtDelayExecution+0xa:
00000000`779505ba c3 ret
0:002> !CLRStack
<snipped for clarity>
FinalizationTimeout.Program.Finalize()
0:002> kb
<snipped for clarity>
ntdll!NtDelayExecution+0xa
KERNEL32!SleepEx+0x84
mscorwks!EESleepEx+0x31
mscorwks!Thread::UserSleep+0x79
mscorwks!ThreadNative::Sleep+0x116
FinalizationTimeout!FinalizationTimeout.Program.Finalize()+0x35 [C:\Users\Sasha\Documents\Visual Studio 2008\Projects\FinalizationTimeout\FinalizationTimeout\Program.cs @ 13]
mscorwks!FastCallFinalizeWorker+0x6
WKS::GCHeap::FinalizerThreadStart+0x133
<snipped for clarity>
So we have code in the Program class calling Thread.Sleep in the finalizer. That explains it.
By the way, the registry setting obviously affects all .NET applications on the machine. If you want to isolate the effect to a specific application without affecting the rest, you could use the Application Compatibility Toolkit to enable registry redirection to a special registry location for the specific application.
In this entry, we will examine the last two interesting new commands bundled in the Silverlight SOS.DLL – namely !FindRoots and !ListNearObj. (Previously in this series: Basic introduction to Silverlight SOS; The !AnalyzeOOM command; The !HeapStat and !GCWhere commands.)
The !FindRoots command is specifically oriented at understanding memory leaks and object promotion scenarios. It is a wrapper on top of existing functionality, making analysis more convenient.
The key scenario to using !FindRoots is as follows:
- You’re noticing that some objects are not being collected even though you think they are supposed to be.
- You determine that these objects are in some generation N (by using the !GCWhere command).
- You use !FindRoots –gen N to set a breakpoint when the GC completes a collection of generation N. (This is a one-shot breakpoint, by the way.)
- You use the !FindRoots command again with the object’s address to find all roots still keeping it alive (or, if the object is dead, you are satisfied that it has been collected).
The following is example output for the above scenario – I’m looking for the reason a specific object in gen1 is not being collected:
0:016> !GCWhere 0394aabc
Address Gen Heap segment begin allocated size
0394aabc 1 0 038e0000 038e1000 03986fb8 ...(40300)
0:016> !FindRoots -gen 1
0:016> g
<snipped for clarity>
CLR notification: GC - Performing a gen 1 collection. Determined surviving objects...
First chance exceptions are reported before any exception handling.
<snipped for clarity>
0:005> !FindRoots 0394aabc
older generations::Root: 048e4260(Object[])->
038e61a4(Dictionary`2...)->
038f8234(Dictionary`2+Entry...)->
038f0ca0(DebuggingSilverlight.Page)->
038f0de8(List`1[[System.Byte[], mscorlib]])->
0396b460(System.Byte[][])->
0394aabc(System.Byte[])
We have just determined that the specific byte array (at 0x0394aabc) is kept alive by a reference from an older generation. This reference is coming through a list of byte arrays which is turn referenced by the Silverlight page and so on.
The second !FindRoots command can be used to look at object roots only after the breakpoint induced by the first !FindRoots command has occurred. An alternative that is (almost) always available is the !GCRoot command (which lacks the ability to report “older generations” as a special kind of roots – it doesn’t treat older generations differently from any other reference root path).
For those of us (like me) still working on the desktop version of the CLR, an alternative to setting a breakpoint with !FindRoots is using the standard breakpoints mechanism in WinDbg. The key method to look for is mscorwks!WKS::GCHeap::RestartEE (for server GC, replace WKS with SVR); to determine which generation is currently being collected, inspect the static variable mscorwks!WKS::GCHeap::GcCondemnedGeneration.
To automate the process, you can set a conditional breakpoint to stop whenever gen1 has been collected:
0:005> bp mscorwks!WKS::GCHeap::RestartEE "j 0<poi(mscorwks!WKS::GCHeap::GcCondemnedGeneration) ''; g"
This is a somewhat dense format to learn, but the Debugging Tools for Windows documentation is quite helpful.
Finally, we’re up to the last command – !ListNearObj. From everything we’ve seen in the previous installments, you might have safely deduced that heaps don’t lie (even confirmed by Shakira). However, sometimes heaps do lie – or become corrupted – and when they do, the ability to take an arbitrary address and look at its surroundings to see what’s going on is priceless.
This is the purpose of the !ListNearObj command – when given an address, it will attempt to show you other objects in the area, even if the address points to the middle of an object. It does so using a heuristic looking for things that appear like a valid method table address (so if you accidentally have an integer field that looks like a valid method table address, the command reports wrong information). It also attempts to detect local heap corruption, as a bonus.
Here’s what !ListNearObj looks like when passed a rogue pointer into the middle of a byte array:
0:016> !ListNearObj 0393d500
Before: 0x393d430 24456 (0x5f88) System.Byte[]
After: 0x39433b8 42740 (0xa6f4) System.Byte[]
Heap local consistency confirmed.
It has found the heap locally consistent, and the address passed to it is in the middle of a large (24,000+ elements) array of bytes.
Another SOS.DLL command new in Silverlight is !HeapStat, featuring statistics on the sizes and available space in the various GC heaps and generations. (Previously in this series: Basic steps with Silverlight SOS; The !AnalyzeOOM command.)
Because Silverlight applications run within a browser, there’s typically going to be a single GC heap because the browser’s CLR host uses workstation GC. However, generations and the large object heap (LOH) are still of interest.
The following is the (edited) output of executing the !HeapStat command a couple of times in a Silverlight application that allocates memory once in a while:
0:017> !heapstat
Heap Gen0 Gen1 Gen2 LOH
Heap0 106508 176432 12 18640
Free space: Heap0 12 39576 0 144
SOH: 13% LOH: 0%
0:012> !heapstat
Heap Gen0 Gen1 Gen2 LOH
Heap0 2551768 8347676 15988516 8732248
Free space:
Heap0 12 1094504 1770688 1616
SOH: 10% LOH: 0%
Some of this information is also available with Windows performance counters; however, if you’re looking at a crash dump, this is extremely valuable.
Before you panic in view of the very low amount of free space in gen2 and the LOH, note that their sizes can extend – they are capped by the available virtual memory in the process. The !VMStat or !ProcInfo commands are more appropriate for reporting whether available memory is running low.
A somewhat related new command is !GCWhere, which provides information on the heap and generation to which a specific object belongs. This information can be (somewhat tediously) extracted from the output of !EEHeap –gc, but !GCWhere makes the information much more readily available.
The following is the output of !GCWhere on two objects – the first one belongs in the LOH (and has its generation reported as gen3), and the second one belongs in gen0. (Note that for invalid objects which still happen to fall within the GC heap’s memory range, the reported size will be 0.)
0:012> !gcwhere 0513b150
Address Gen Heap segment begin allocated size
0513b150 3 0 049d0000 049d1000 05224e58 ...(96248)
0:012> !GCWhere 06aa8db8
Address Gen Heap segment begin allocated size
06aa8db8 0 0 06270000 06271000 06cd5ff4 ...(8852)
In the conclusion of my previous post on the Silverlight SOS.DLL, I’ve mentioned that some new commands (not present in the desktop CLR version) have been added.
The first command we’ll talk about is !AnalyzeOOM, which (as its name implies) analyzes an out of memory condition and explains what went wrong. There are various reasons for out of memory exceptions, and this command can help you to pinpoint exactly what went wrong.
I’ve written a simple Silverlight application that allocates 1MB byte arrays in a loop and keeps them referenced forever. After a few seconds the application crashes with an OutOfMemoryException. Running the command exposes the following:
0:005> !PrintException
Exception object: 03991ad4
Exception type: System.OutOfMemoryException
<snipped for clarity>
0:005> !AnalyzeOOM
Managed OOM occured after GC #17 (Requested to allocate 1048592 bytes)
Reason: Didn't have enough memory to commit
Observing the process’ virtual address space reveals the following (edited):
0:005> !VMStat
TYPE MINIMUM MAXIMUM AVERAGE
Free:
Large 1,032K 16,004K 5,898K
<snipped for clarity>
Hmm. The largest available block of virtual memory is less than 16MB (which is less than the smallest GC segment size). This is why the allocation is failing.
The managed synchronization mechanisms, including Monitor, WaitHandle.WaitAny, ManualResetEvent, ReaderWriterLock, Thread.Join, GC.WaitForPendingFinalizers and the rest of the family are not just a thin platform adaptation layer on top of the Win32 API.
The CLR needs to know exactly which threads are currently waiting for a synchronization mechanisms for a variety of reasons. To mention two of them:
- In hosting scenarios, the CLR host might want to limit the ability of application threads to perform synchronization at all (to ensure reliability and control over threads).
- A waiting thread (in the WaitSleepJoin thread state) won’t be rudely aborted (by Thread.Abort) until it leaves the waiting state.
If all synchronization calls were straight P/Invoke-s to the Win32 services, these restrictions (among others) would be impossible to achieve.
An additional feature enabled by (most) of the managed synchronization mechanisms is message pumping. When you call one of the above APIs in an STA thread, the CLR takes care of message pumping (as of Windows 2000, simply by calling CoWaitForMultipleHandles which calls MsgWaitForMultipleObjects) to ensure that STA COM objects within that STA thread can process incoming calls, which are delivered through the Windows message loop.
If this were not the case, interesting deadlocks could ensue. For example, the finalizer thread might want to release an STA COM object, which requires marshaling the call to the STA thread. If the STA thread has just called GC.WaitForPendingFinalizers and does not pump messages, a nasty deadlock occurs. (There are numerous other examples of how this could be problematic, but I’ll omit them for brevity. Courageous readers are welcome to read Chris Brumme’s post on apartments and pumping.)
However, the CLR is also smart enough to realize that on an MTA thread, there’s no need to pump messages – so it doesn’t. A wait on an MTA thread is a true WaitForMultipleObjects, (almost) no strings attached.
For future reference, here’s what a call stack for an STA thread Thread.Join call looks like (.NET 3.5 x64, edited for brevity):
0:006> k
ntdll!NtWaitForMultipleObjects+0xa
KERNEL32!WaitForMultipleObjectsEx+0x10b
USER32!RealMsgWaitForMultipleObjectsEx+0x129
USER32!MsgWaitForMultipleObjectsEx+0x46
ole32!CCliModalLoop::BlockFn+0xbb
ole32!CoWaitForMultipleHandles+0x145
mscorwks!NT5WaitRoutine+0x77
mscorwks!MsgWaitHelper+0xed
mscorwks!Thread::DoAppropriateAptStateWait+0x67
mscorwks!Thread::DoAppropriateWaitWorker+0x195
mscorwks!Thread::DoAppropriateWait+0x5c
mscorwks!Thread::JoinEx+0xa5
mscorwks!ThreadNative::DoJoin+0xda
mscorwks!ThreadNative::Join+0xfa
0x642`80150b01
And here’s what a call stack for an MTA thread Thread.Join call looks like:
0:005> k
ntdll!NtWaitForMultipleObjects+0xa
KERNEL32!WaitForMultipleObjectsEx+0x10b
mscorwks!WaitForMultipleObjectsEx_SO_TOLERANT+0xc1
mscorwks!Thread::DoAppropriateAptStateWait+0x41
mscorwks!Thread::DoAppropriateWaitWorker+0x195
mscorwks!Thread::DoAppropriateWait+0x5c
mscorwks!Thread::JoinEx+0xa5
mscorwks!ThreadNative::DoJoin+0xda
mscorwks!ThreadNative::Join+0xfa
0x642`80150a86
Don’t you want to hug these MTA threads? They are SO_TOLERANT.
Oh, and one more thing to wind this up. You may have read that WaitHandle.WaitAll is not supported (at all) on an STA thread, and throws an exception if you attempt to call it. Why does this make sense?
(… Dramatic suspense …)
Well, if you call MsgWaitForMultipleObjects from an STA thread and use bWaitAll=TRUE, you’re essentially saying that you want to wait for all the handles to become signaled and for a message to arrive. This is clearly not the intent, and there’s hardly anything the CLR can do about it, so it forbids the whole situation.
There are workarounds (some described in Chris Brumme’s post which I cited above, some elsewhere), but they don’t fully address the situation. For example, one alternative (which I recently used in a project) is to spawn a new MTA thread, have it perform the WaitHandle.WaitAll, and use Thread.Join to wait for that thread to complete. However, if one of the handles has ownership semantics (e.g. a mutex), this breaks because the mutex will be owned by the wrong (and terminated – thus considered abandoned) thread.
However, one thing to keep in mind is this: The CLR does a whole lot of work to ensure managed synchronization works properly (and employs ruthless dark magic when necessary). The worst thing you can possibly do is calling into Win32 synchronization primitives directly, bypassing the CLR.
One (or three) of the first questions native developers ask when learning about the .NET garbage collector is this:
How do I control the garbage collector? How do I get a notification when a garbage collection occurs? What do I do to limit the garbage collector’s impact on my application’s performance?
I would be shooting myself in the foot by saying that these questions should not be addressed with the seriousness they deserve. The garbage collector is one of the primary factors affecting the performance of managed applications. It is difficult to control and predict its behavior. There is a variety of native (CLR hosting) and managed APIs for obtaining information from and controlling the GC’s heuristics, including the GC flavor, segment sizes, generation startup limits, collection notifications and allocation control.
As of .NET 3.5 SP1, another set of APIs joins the existing interfaces. In .NET 3.5 SP1, it is possible to receive a notification when a garbage collector is approaching and act accordingly if application needs dictate it. (These APIs are available only if concurrent GC is disabled – i.e. when using workstation non-concurrent GC or server GC.)
The basic model is as follows. If an application is very sensitive to garbage collection timing, it might want to receive a notification when GC is about to occur, and receive yet another notification when GC has completed. The MSDN example cites a server application processing low-latency requests across multiple server instances. If a full GC is imminent, the application might want to redirect requests to another server instance while the collection is taking place.
This scenario is plagued with caveats. Redirecting load to another server instance might take longer than the GC itself. Messing around with load balancing algorithms might be more expensive than violating the low-latency guarantee on a few requests. And the list goes on and on – so caveat lector.
Another example is an application that is sensitive to GC timing but has lots of idle periods with no significant work. Such an application might want to trigger garbage collections “manually” by calling GC.Collect during idle times if it is notified that GC is approaching. This is again risky because if your application’s allocation patterns are not uniform, you might be triggering a collection too early or too late.
However. You’re not here for my lame advice telling you that every single pattern out there is wrong. The new APIs have been added for a reason – they give you a choice. Let’s take a look at this choice.
- By calling GC.RegisterForFullGCNotification, an application indicates that it is willing to poll the CLR for a GC approaching.
- In a separate application thread, the application then calls GC.WaitForFullGCApproach which returns a value of the GCNotificationStatus enumeration (this wait can also be associated with a timeout). The status can be many things, among them GCNotificationStatus.Succeeded indicating that a full GC is near.
- The application now proceeds to perform a collection manually, balance the work load to other instances, notify the user – anything of interest, really – and then calls GC.WaitForFullGCComplete.
- The loop then repeats until at some point someone calls the GC.CancelFullGCNotification method and stops the polling thread.
This is shown in the following simple code snippet:
GC.RegisterForFullGCNotification(10, 10);
Thread monitor = new Thread(() =>
{
while (true)
{
GCNotificationStatus status = GC.WaitForFullGCApproach();
if (status != GCNotificationStatus.Succeeded)
{
//Canceled, invalid etc.
}
//Redirect load, call GC.Collect yourself etc.
status = GC.WaitForFullGCComplete();
if (status != GCNotificationStatus.Succeeded)
{
//Canceled, invalid etc.
}
});
monitor.Start();
DoAllocations();
Console.ReadLine();
GC.CancelFullGCNotification();
(The above code assumes that the DoAllocations method performs enough allocations to trigger a full GC.)
To test this, you will have to ensure that you’re running under non-concurrent GC (or else, attempting to register for notifications will throw an exception). The easiest way to do so is by creating an app.config file and adding to it the following configuration element:
<configuration>
<runtime>
<gcConcurrent enabled="false" />
</runtime>
</configuration>
Note that the GC.RegisterForFullGCNotification method takes two integer parameters, which can be in the range 1-99. These values (thresholds) affect the eagerness of the notification: a high value means you are more likely to be notified when a GC is approaching, but that more time might pass until the GC actually occurs; a low value means you might not get the notification in time to do your magic before the GC begins and suspends all threads (remember, this is non-concurrent-GC-land).
Why are there two values? The first one controls the notification eagerness for the small object heap (generation 2) and the second controls the notification eagerness for the large object heap.
Summing this up, as of .NET 3.5 SP1 there’s yet another way to interact with the .NET garbage collector, and it is not recommended to start messing around (as always :-)) unless you are absolutely certain that the performance characteristics of your application are highly sensitive to garbage collection timing.
I’ve written about SOS.DLL before – it is the ultimate Swiss-army knife for debugging managed applications. Memory leaks, unexpected crashes, high-CPU scenarios, hangs and deadlocks can all be pinpointed using SOS.
Silverlight 2.0 offers a model for managed code execution inside the browser, and that managed code will require the same debugging capabilities needed for standard managed applications. The Silverlight CLR (coreclr.dll) is a subset of the desktop CLR, but fortunately it ships with a version of SOS that can be used to debug Silverlight applications in the browser.
This special SOS is available as part of the Silverlight Developer Runtime (Windows version), and is installed by default to C:\Program Files\Microsoft Silverlight\<Version> (currently 2.0.30523.8).
To demonstrate some of the capabilities, I’ve written a trivial Silverlight application with three buttons, one for each debugging scenario:
Let’s click the “Leak Memory” button a few times. Now we can take a dump of or attach WinDbg to the web browser process, exactly like any other process. (I’m deliberately saying “web browser process” and not “Internet Explorer”, because every single one of these demos can be reproduced with Firefox, for example.)
Once the debugger is attached, we can load the Silverlight SOS into the process and execute the standard commands for diagnosing a memory leak:
0:014> !dumpheap -stat
total 11445 objects
Statistics:
MT Count TotalSize Class Name
<snipped for clarity>
03845670 228 25644 System.Object[]
03845f9c 913 46164 System.String
03ad3ad4 5855 70260 System.RuntimeTypeHandle
03ade400 17 13632144 System.Byte[]
Total 11445 objects
OK, so we have large byte arrays. Who is holding a reference to them?
0:014> .foreach (obj {!dumpheap -type System.Byte[] -short}) {!gcroot obj}
Note: Roots found on stacks may be false positives. Run "!help gcroot" for
more info.
Scan Thread 4 OSTHread d0
Scan Thread 12 OSTHread d40
Scan Thread 13 OSTHread eb4
<snipped for clarity>
04e10f34(DebuggingSilverlight.Page)->
04e10f84(...List`1[[System.Byte[], mscorlib]])->
04e20f10(System.Byte[][])->
06505990(System.Byte[])
<snipped for clarity>
04e10f34(DebuggingSilverlight.Page)->
04e10f84(...List`1[[System.Byte[], mscorlib]])->
04e20f10(System.Byte[][])->
06a05a30(System.Byte[])
We can conclude that our byte arrays are being held by the main Silverlight page, through a list of byte arrays. We can detect the name of the list field by inspecting the page object (04e10f34) and looking for the address of the list (04e10f84):
0:014> !do 04e10f34
Name: DebuggingSilverlight.Page
MethodTable: 03b137dc
EEClass: 03b11954
Size: 80(0x50) bytes
(DebuggingSilverlight, Version=1.0.0.0, Culture=neutral, PublicKeyToken=null)
Fields:
MT Field Offset Type VT Attr Value Name
<snipped for clarity>
0488ad68 4000009 44 ...yte[], mscorlib]] 0 instance 04e10f84 leakSource
Well, there we have it – there’s a field called leakSource (which is an instance of List<byte[]>) which has a reference to all the leaking byte arrays.
Let’s try another scenario. Clicking the “Crash” button causes the Silverlight application to throw an unhandled exception. When running without a debugger attached, this produces the “Error on page” icon and in the details dialog we can see the exception information:
When a debugger is attached, this causes a first-chance exception in the debugger, and gives us the opportunity to inspect the exception information:
(e54.d0): Access violation - code c0000005 (first chance)
First chance exceptions are reported before any exception handling.
This exception may be expected and handled.
<snipped for clarity>
We can examine the thread stack and see exactly where the exception is coming from:
0:004> !CLRStack
OS Thread Id: 0xd0 (4)
ESP EIP
0350f3a0 023204e7 DebuggingSilverlight.Page.ButtonCrash_Click(...)
0350f3c8 0491cced System.Windows.Controls.Primitives.ButtonBase.OnClick()
0350f3e0 0491cbf0 System.Windows.Controls.Button.OnClick()
<snipped for clarity>
All the standard SOS tricks apply – we can obtain the IL of the crashing method, view annotated disassembly, dump stack objects etc.
The final scenario (with the “Hang” button) causes the browser to stop responding. We can break in with a debugger and inspect thread stacks to find the problematic stack:
0:004> !CLRStack
OS Thread Id: 0xd0 (4)
ESP EIP
0350f35c 76e59184 [HelperMethodFrame: 0350f35c] System.Threading.Thread.SleepInternal(Int32)
0350f3a8 02261fef System.Threading.Thread.Sleep(Int32)
0350f3b4 033e04cc DebuggingSilverlight.Page.ButtonHang_Click(...)
0350f3c8 0491cced System.Windows.Controls.Primitives.ButtonBase.OnClick()
0350f3e0 0491cbf0 System.Windows.Controls.Button.OnClick()
<snipped for clarity>
Well, someone is calling Thread.Sleep which causes the hang. What’s the parameter? We can do the usual kb to see the parameter:
0:004> kb
ChildEBP RetAddr Args to Child
<snipped for clarity>
0350f2a4 6700a753 ffffffff ... kernel32!SleepEx+0x52
Well, it’s an infinite timeout (0xffffffff). The browser is not going to become responsive anytime soon.
What’s even more interesting is that the SOS included in the Silverlight Developer Runtime has more commands than the standard SOS bundled with the desktop CLR! But that’s best reserved for a future post.
Shortly after publishing my last post on converting native (C++ or Win32) exceptions to managed exceptions, I realized that some clarifications are in place.
Trapping every Win32 exception and indiscriminately translating it to a managed exception is extremely dangerous. There are various scenarios in which a native exception is thrown to be caught, processed and handled – interfering with this process through a vectored exception handler might result in disastrous consequences.
For example, when a managed application attempts to access an object through a null reference, the JIT-ted code causes a Win32 access violation by referencing a null pointer. This exception is caught by the CLR and converted to a NullReferenceException without our handler – we certainly wouldn’t want to interfere with this process.
Another example is the way thread stacks extend on Windows. When a thread is created, a stack space is reserved for it (the default stack size is 1MB). However, only a single page is actually committed – because most threads don’t need a full 1MB of stack. The second page is marked with PAGE_GUARD, and when the thread requires this page a page guard exception is raised and handled to commit the second page to memory. As part of handling this exception, the next page on the stack is marked with PAGE_GUARD, and so on. If we interfere with this process by “converting” the exception to something else or by swallowing it (e.g. using IsBadWritePtr), the thread’s stack won’t be able to grow without a real reason. It is an exception that is intentionally thrown and handled – it’s none of our business to “handle” it.
There are subtler issues with converting exceptions using this approach. It is fairly difficult to determine exactly which exceptions you are interested in and which exceptions require conversion. As I wrote in the previous post, this can be done by deriving all “interesting” exceptions from a common base class, using a custom Win32 exception code, or even inspecting the exception address to see whether it was raised from “interesting” code.
In other words – use this approach with caution, as always when messing around with internals and bringing together the disparate worlds of managed and native code.
In a layered application consisting of managed upper layers and unmanaged lower implementation layers, you might frequently encounter the need to call unmanaged code, handle any exceptional conditions and reflect them to the upper layers as managed exceptions.
This can be done manually, but fortunately as of Windows XP there is an approach that does not require wrapping every unmanaged call in an exception handler. This approach is vectored exception handling.
A vectored exception handler is stackless – it can be attached to an application and get a chance to process any exception before stack-based exception handlers process it. If you want to convert Win32 or C++ exceptions to managed exceptions, a vectored exception handler can do it for you. (For more background on Win32 SEH and vectored exception handling, I strongly recommend Matt Pietrek’s articles: ‘97 MSJ on SEH and ‘01 MSDN on vectored exception handlers.)
A vectored exception handler, like any other exception handler, takes an EXCEPTION_POINTERS structure which contains an EXCEPTION_RECORD. Inside the record is the Win32 exception code and additional exception information. For example, C++ exceptions have the magic 0xE06D7363 Win32 exception code, and the exception object is passed by pointer in the second element of the EXCEPTION_RECORD.ExceptionInformation array.
Finally, we will have to ensure that our vectored exception handler does not process CLR exceptions – or we wind up in an infinite loop. This is easily done because all CLR exceptions have the same Win32 exception code – 0xE0434F4D.
Some kind of policy must be in place with regard to what exceptions are handled and how Win32 exceptions are converted to managed exceptions. As part of our policy, we should probably decide on a base class for all C++ exceptions propagated from our unmanaged code (e.g., std::exception).
An exception converter that would do the job for now can be implemented like this:
ref class ExceptionConverter {
public:
static Exception^ GetException(
unsigned int ExceptionCode,
void* pExceptionObject) {
//More complicated logic can be added here
if (pExceptionObject == NULL)
return gcnew Win32Exception(ExceptionCode);
std::exception& rException =
*(std::exception*)pExceptionObject;
return gcnew ApplicationException(
gcnew String(rException.what()));
}
};
With this policy in place, we can register the following exception handler to do the conversion:
#define CLR_EXCEPTION_CODE 0xE0434F4D
#define CPP_EXCEPTION_CODE 0xE06D7363
LONG ConvertToManagedExceptionIfNeeded(
PEXCEPTION_POINTERS ExceptionInfo) {
void* pExceptionObject = NULL;
PEXCEPTION_RECORD ExceptionRecord =
ExceptionInfo->ExceptionRecord;
switch (ExceptionRecord->ExceptionCode){
case CLR_EXCEPTION_CODE:
//Don't touch CLR exceptions.
return EXCEPTION_EXECUTE_HANDLER;
case CPP_EXCEPTION_CODE:
pExceptionObject = (void*)
ExceptionRecord->ExceptionInformation[1];
default:
break;
}
Exception^ e = ExceptionConverter::GetException(
ExceptionRecord->ExceptionCode, pExceptionObject);
throw e;
}
#pragma unmanaged
LONG CALLBACK ConverterVectoredExceptionHandler(
PEXCEPTION_POINTERS ExceptionInfo){
return ConvertToManagedExceptionIfNeeded(ExceptionInfo);
}
Note that it’s comprised of an unmanaged handler (which we need to register using AddVectoredExceptionHandler) and a managed method that does the actual conversion.
With this in place, we can use the following test method to see our converter in place for handling Win32 exceptions (such as an access violation) and C++ exceptions (std::exception in this case):
#define INSTALL_HANDLER_FIRST 1
#define INSTALL_HANDLER_LAST 0
int main()
{
AddVectoredExceptionHandler(
INSTALL_HANDLER_FIRST,
ConverterVectoredExceptionHandler);
try{
//Trigger an AV:
AccessViolation();
//Raise a Win32 exception:
//RaiseException(15, 0, 0, NULL);
}
catch (System::Exception^ e){
Console::WriteLine(
"Got managed exception: {0}",
e->Message);
}
catch (...){
Console::WriteLine("unexpected exception");
}
try{
//Throw a C++ exception:
throw std::exception("This is a C++ exception.");
}
catch (System::Exception^ e){
Console::WriteLine(
"Got managed exception: {0}",
e->Message);
}
catch (...){
Console::WriteLine("unexpected exception");
}
return 0;
}
There is one thing to look out for. If the unmanaged C++ code is rich with exceptions being thrown and handled, then this handler is inappropriate – it will catch all such exceptions and immediately convert them to managed ones. Therefore, we need some way to differentiate exceptions that need to be converted and propagated from exceptions that are handled entirely in the unmanaged lower layers.
This can be done with a custom exception base class, some sort of flag, or even (hack!) by inspecting EXCEPTION_RECORD.ExceptionAddress to see where the exception occurred in code.
Thanks to the reader Brian Romanko for facilitating the discussion which inspired this post.
Two-phase initialization is an architectural pattern for artificially breaking and managing coupling between strongly coupled components. The motivation and implementation of this pattern are not always obvious, so I will give a couple of examples to demonstrate.
Let’s take an operating system as an example. Some of the components involved in the initialization of the operating system are the I/O manager, the memory manager, the object manager and many others. At runtime, the strong coupling between the various components is obvious and beneficial – they tend to use each other, all the time.
However, during system startup, these dependencies (especially if startup is performed synchronously) can lead to a dead end. For example:
- The memory manager initializes. It needs to create shared memory objects (section objects) to represent binary images being loaded during system startup. This requires a trip to the object manager.
- The object manager initializes. It needs to allocate memory for the system handle table and for the actual resources being created. This requires a trip to the memory manager.
Another example can be taken from an ESB infrastructure I have been implementing lately. The infrastructure services include a configuration service, a publish/subscribe service and a “DNS”-style service. These services are typically used by other system components, but they also need each other:
- The configuration service initializes. It needs to register itself in the “DNS” service to be accessible by other system components.
- The “DNS” service initializes. It needs to obtain its configuration and use the pub/sub service to register for configuration change notifications.
- The pub/sub service initializes. It needs to obtain its configuration and register itself in the “DNS” service to be accessible by other system components.
Disentangling these dependencies can be done in various ways. For example, we could say that the infrastructure services are not allowed to use each other – the pub/sub service will use local configuration, the “DNS” service will have a predefined list of registered endpoints, etc.
However, in an operating system we can’t resort to a solution in which the object manager manages its own memory, and the memory manager manages its own objects.
The only feasible alternative is two-phase initialization.
When using two-phase initialization, infrastructure components initialize in two phases. In the first phase, they do not rely on any other components to reach a stable state in which they are able to provide basic services to the rest of the system. In the second phase, they transition to a fully-functional state in which they rely on other components (which have not necessarily reached the second phase yet).
Using this model in our example, the “DNS” service can start with a predefined list of endpoints that will be used to communicate with the infrastructure services while they are in the first phase. In the second phase, these predefined endpoints will be replaced by the actual endpoints for the actual services. The pub/sub service can start with a local configuration during the first phase, and retrieve its configuration when the configuration service becomes available (enters the first phase), and so on.
Providing a generic implementation for all infrastructure and non-infrastructure services to account for two-phase initialization is exceptionally difficult, but achievable if the proper metadata is in place. Components must provide metadata regarding their explicit dependencies and ways to make forward progress while these dependent components are not yet available.
This sounds simple, but in reality it really isn’t. Multiple issues plague the two-phase initialization pattern, but do not undermine its principal validity:
- Transitioning between initialization phases might require a significant amount of work. For example, the pub/sub service might use a database to store the subscription information, and when transitioning to the second phase (by talking to the configuration service) the connection string to the database might have changed.
- Deadlocks can be introduced into the startup sequence if initialization is not carefully asynchronous.
- Terrible race conditions can be introduced into the startup sequence if it is not carefully synchronized for multiple threads of execution.
- Lots of noise is generated in the system while it’s restarting or when some components are being reinitialized.
The two-phase initialization approach is used by Windows. In the first phase (called phase 0), initialization proceeds in a single thread and bring up only the minimal services required for the second phase. In the second phase (called phase 1), system components can rely on other components being present to start transitioning into their fully-functional state.
To summarize, two-phase initialization is difficult to manage and implement, but in the real world where components circularly depend on each other there is rarely a better alternative.
Another very simple pattern builds on the foundation of the Safe-Unsafe Cache pattern. What is the easiest way to protect data from multi-threaded access and to incur the minimal performance cost while doing so? Making it read-only!
Read-only (immutable) objects are extremely convenient to use because there’s no need to enforce thread-safety – all accesses to read-only objects are thread-safe. Additionally, immutable objects decrease complexity – it’s impossible for one part of the code to modify an object that is used by another part of the code (even if not concurrently from different threads).
Due to these benefits, some types are immutable by definition – for example, the .NET string is immutable (unless you resort to ugly pointer-manipulation tricks). Other types can be made immutable at some point after they have been used – for example, the SafeUnsafeCache<T> can be locked to and switched to a read-only mode. WPF has a similar feature called freezable objects – objects derived from Freezable can be frozen so that they can be safely accessed and referenced from multiple threads and points in code.
Mutating an immutable object is an illusion: It always results in the creation of a new object that has some properties similar to the prototype. Eric Lippert’s series on immutable types in C# features the process of implementing an immutable type. The following is an excruciatingly simple example of an object that becomes immutable at some point (and guarantees thread safety from that moment on):
public sealed class FreezableEmployee
{
private string _name;
private int _salary;
private volatile bool _isFrozen;
public FreezableEmployee(string name, int salary)
{
_name = name;
_salary = salary;
}
public void Freeze()
{
_isFrozen = true;
}
public string Name
{
get { return _name; }
set
{
CheckIfFrozen();
_name = value;
}
}
public int Salary
{
get { return _salary; }
set
{
CheckIfFrozen();
_salary = value;
}
}
private void CheckIfFrozen()
{
if (_isFrozen)
throw new InvalidOperationException("Attempted to modify a frozen instance");
}
}
Yes, it’s amazingly dumb but hey – patterns are supposed to be simple! It is their simplicity that makes them usable.
While we’re on the topic of immutable objects, there’s another little corner to which we should attend. Generally speaking, constructing a read-only object and giving it away to other code is slightly dangerous in and of itself, because constructing an object means writing to it. If a partially constructed object is given away to other code, then that other code might observe changes to the object even though it is considered to be immutable.
Fortunately, under the memory model enforced by CLR 2.0, all writes are considered volatile writes, which ensures that reads cannot drift past them. This means that assigning a constructed object to a reference that can be used by calling code occurs after the object is fully constructed and all writes to its internal fields are visible. (Curiously enough, the JVM memory model provides this guarantee only for final fields.)
GC flavors are a static performance optimization for the .NET garbage collector. Under various circumstances, applications can opt-in (before runtime) to a runtime mode that affects the GC’s behavior to better suit the specific application in question. For a succinct yet complete description of GC flavors, Maoni’s blog is a great source of information.
.NET 3.5 (or .NET 2.0 SP1) adds an additional API that can change the GC flavor under some circumstances. It is available as the System.Runtime.GCSettings class, which has two properties: IsServerGC and LatencyMode.
The IsServerGC property is a read-only property that specifies whether the application is running under the server GC. It can't be used to opt into server GC at runtime – it only reflects the state of the application's configuration or the CLR host's GC flavor definition.
The LatencyMode property, on the other hand, takes the values of the GCLatencyMode enumeration, which are: Batch, Interactive and LowLatency. Batch corresponds to server GC (it is the default and the only possible value when server GC is enabled) or to non-concurrent workstation GC; Interactive corresponds to concurrent workstation GC. Note that if the application is running under workstation GC, the LatencyMode property can be used to switch between concurrent and non-concurrent GC at runtime.
The final, most interesting value of the GCLatencyMode enumeration is LowLatency, which is only available when using workstation GC. This value signals to the garbage collector that your code is currently in the middle of a short-term time-sensitive operation where a garbage collection might be harmful. This is not the value of choice if you're about to execute missile-guiding code for reasons to be seen shortly. It is useful, however, if you're in the middle of performing a UI animation, and a garbage collection will be disruptive for the user experience.
The low latency garbage collection mode implied by the LowLatency enumeration value instructs the garbage collector to refrain from performing full collections unless absolutely necessary – e.g., if the operating system is running low on physical memory (the effects of paging could be even worse than the effects of performing a full collection). Low latency does not mean the garbage collector is off – partial collections (of the low generations) will still be performed – but the garbage collector's share of the application's processing time will be significantly lower.
Using the low latency GC mode can be tricky because forgetting to reset it back to the previous value can result in disastrous consequences. Additionally, the amount of time you want to spend with a low latency GC mode must be kept to a minimum – the long-term effects once you exit the low latency mode and the GC aggressively begins to reclaim unused memory can hinder the application's performance. Finally, if you don't have full control of all allocations taking place within your process (e.g. if you're hosting plug-ins or have multiple background threads doing independent work), remember that switching to the low latency GC mode affects the entire process, and can cause undesired effects for other allocation paths.
The only safe way of using the low latency GC mode is within a constrained execution region (CER), because it is the only way of guaranteeing that the latency mode will revert to its previous value. The following code (snipped from Chris Lyon’s blog) demonstrates how this can be accomplished:
GCLatencyMode oldMode = GCSettings.LatencyMode;
RuntimeHelpers.PrepareConstrainedRegions();
try
{
GCSettings.LatencyMode = GCLatencyMode.LowLatency;
// perform time-sensitive actions here
}
finally
{
GCSettings.LatencyMode = oldMode;
}
When using the low latency GC mode, it is advisable to force garbage collections at safe points when it is known that the time-sensitive work has idled and can afford pausing to perform a collection. Staying in low latency mode without performing a collection for a long period of time might result in out of memory conditions. Generally speaking, if the application is sensitive to garbage collection timing, it's reasonable to force a collection during idle times to influence a biased redistribution of garbage collection overhead into the idle run time regions.
Previously in the series we have examined the performance differences between concurrency patterns based on kernel synchronization (critical sections, events, mutexes etc.) and concurrency patterns based on wait-free synchronization (such as the interlocked family of operations). Kernel synchronization tends to be expensive if locks are frequently acquired and released because of the associated context switches, introducing anti-patterns such as lock convoys. Wait-free synchronization tends to be expensive if locks are held for a long time because of the spinning which burns CPU without making progress.
Oftentimes neither approach is flexible enough to implement a practical scenario, and an intermediate mechanism is required. Such an intermediate mechanism is available in the Win32 API in the form of a critical section with a spin count.
The InitializeCriticalSectionAndSpinCount API initializes a critical section and associates it with a spin count. When a thread attempts to acquire the critical section and finds it locked, it will first spin the CPU for the specified number of iterations (i.e. the spin count) before transitioning to a kernel wait. If the critical section is released while the thread is spinning, it will acquire the critical section without context switching and without a kernel wait, producing a significant performance boost.
Critical sections with spin counts are a specific example of the general spinlock pattern used in the Windows kernel and in user-mode frameworks such as the Parallel Extensions for .NET.
A typical kernel implementation of a spinlock is a single bit. The spinlock has the following semantics:
- When the spinlock is taken, the bit is set to 1.
- When the spinlock is free, the bit is set to 0.
- When a thread attempts to acquire the spinlock, it will spin indefinitely until the bit is set to 0 (using an atomic operation such as the x86 BTS or InterlockedCompareExchange).
A refinement on the spinlock pattern that is better suited for multi-processor machines is the queued spinlock. A queued spinlock is typically implemented using a set of bits (queue), one for each CPU waiting for the spinlock. Each CPU is also associated with a bit that it uses to spin while waiting for the spinlock. When a spinlock is released, the releasing CPU hands it over to the next waiting CPU by setting its private bit and removing it from the queue.
This pattern minimizes inter-processor contention for memory because each CPU spins on a private bit which serves as its spinlock indicator. It also introduces first-in-first-out order for acquiring the spinlock to guarantee fairness.
Side note: In the Windows kernel, queued spinlocks are used on multi-processor systems. On single-processor systems, spinlocks are not needed because spinlock synchronization is required on high IRQLs only. On high IRQLs (above dispatch IRQL) a context switch cannot occur, so instead of spinning the acquiring thread can simply request an interrupt on the relevant IRQL and return; the interrupt will be masked until the releasing thread lowers the IRQL below the requested IRQL.
I gave a 3-hour presentation today on C++ debugging techniques, with a focus on production debugging. I’d like to share with you the demos I’ve shown during the session with a brief walkthrough so that you can repeat what I did in class. (I have intentionally omitted the debugger spew and any screenshots so that this still remains somewhat of an interesting challenge.)
First of all, download the demo solution (30KB), unzip and open with Visual Studio 2008 (the code should work on Visual Studio 2005 as well, so you can downgrade the project files manually if you’d like).
#1 – Runtime Checks
In this demo, we will see the effect of the Microsoft C++ compiler’s runtime checks on our application’s behavior. Uncomment the first line in the main method, compile the project in debug mode and run it.
The expected behavior is an assertion failure indicating that you have attempted to use an uninitialized variable. From reviewing the code it is evident that the variable i within the IsPrime method is indeed used before it is initialized. This is also a compiler warning, but is it caught at runtime by monitoring accesses to the variable and ensuring it is not accessed before it is initialized. The runtime check saved you from erroneously relying on whatever garbage that was on the stack at the time of the call.
Comment the first line in the main method and run the project again.
This time, you receive an assertion failure indicating that the ESP register was not preserved across a function call, which is likely to imply an improper calling convention. From reviewing the code you can see that the Add method is defined with the __stdcall calling convention, but called through a function pointer that has the default __cdecl calling convention. The runtime check saved you from corrupting the stack.
#2 – Advanced Breakpoints
In this demo, we will experiment with advanced breakpoints in Visual Studio. Compile the project and run it under the debugger.
You receive an access violation exception inside Func1 attempting to write to a null pointer. However, from reviewing the code of the main method the pointer (called g_pData) appears to have been initialized properly.
Launch the debugger again and choose Debug –> Breakpoints –> New Data Breakpoint. The expression to monitor should be &g_pData – you want to be notified immediately as the value of the pointer is modified. (Data breakpoints are a feature provided by the CPU, so if running in a virtual machine environment or on odd hardware you might not be able to set a data breakpoint or it might have no effect.)
Continue executing the program. It should stop inside Func2 when the global pointer is modified and set to null.
#3 – Static Code Analysis
In this demo we will use the C++ static code analysis feature to detect a bug that is easy to overlook. Compile the project and review the code.
There is a simple bug inside the GenerateString function that could easily be missed by the developer. However, static code analysis should be able to find it.
Right click the project in Solution Explorer, choose Properties and under the Code Analysis section toggle the last property to Yes (/analyze). Compile the project again.
The compiler was able to detect the fact that you are overrunning the dynamically allocated buffer by one character. Static code analysis has just saved you hours, days or weeks of debugging time trying to chase down this single ill-behaving character.
#4 – Taking a Dump (Simple Crash)
In this demo you will take a dump of a crashing process. Compile the project and run it.
When you press any key, the application crashes. Obtain a dump of the application when it crashes by running ADPlus from the Debugging Tools for Windows package with the following command line:
ADPlus –crash –p <AppProcessIdHere>
After the dump has been generated, open it in Visual Studio (File –> Open –> Project or Solution) and “Run” it, to see the faulting statement in the source code.
Alternatively, open the dump in WinDbg (from the Debugging Tools for Windows package) by choosing File –> Open Crash Dump, and witness the faulting statement in the source window.
#5 – Critical Sections Deadlock
In this demo we will experiment with a real debugger (WinDbg) from the Debugging Tools for Windows package. Compile the project and run it.
The application hangs doing nothing. Run WinDbg and select File –> Attach to Process (or hit F6). Choose the right process and attach to it.
Run the ~* command to see all the threads running in the process. The last thread is the thread talking to the debugger; the first three threads are application threads.
Execute the ~0s command and then the kb command to see the stack for thread 0 (the main thread). From the stack it is evident that the thread is waiting for multiple objects. Examine the parameters passed to the kernel32!WaitForMultipleObjects method – the first parameter is the count of handles in the array, and the second parameter is the array itself.
Use the dd command and pass it the address of the array to see the handles the main thread is waiting for. Use the !handle command and pass to it each of the two handles you see in the memory dump. Both handles are thread handles, so the main thread is waiting for two other threads to complete. It is reasonable to deduce that the main thread is waiting for the other two application threads.
Execute the ~1s command and then the kb command to see the stack for thread 1 (repeat the same with thread 2). Both threads are waiting for a critical section to become available – each is waiting for a different critical section.
Use the !locks command to see the currently locked critical sections within your process. Correlate the addresses and owning threads of these critical sections with the output from the previous commands and deduce that you have a deadlock involving two threads and two critical sections.
#6 – Kernel Objects Deadlock
In this demo we will examine a more complicated deadlock which cannot be resolved with the user-mode debugger alone. Compile the project and run it.
The application hangs doing nothing. Run WinDbg and select File –> Attach to Process (or hit F6). Choose the right process and attach to it.
Repeat the steps from walkthrough #5 to see that the main thread is likely waiting for the two application threads, and the application threads are waiting for a mutex object (use the !handle command to see the object type).
What is lacking in the above output is the mutex owner (for critical sections, the !locks command gave away what we needed to know). Obtaining this information is only possible with a kernel debugger, which is outside the scope of this walkthrough.
On Vista, a built-in mechanism called Wait Chain Traversal can assist with this kind of issue (I’ve covered it in detail in the past). Using WCT on this process yields a deadlock very similar to the previous walkthrough – the threads are effectively waiting for each other, and therefore can’t make any forward progress. (You might want to use my WCT helper application to streamline this process – it’s part of the demos from my TechEd 2008 presentation.)
#7 – Handles
In this demo we will examine a slightly more complicated problem which requires reproducing the issue at least once or twice. Compile the project and run it.
From the console output it appears that a secondary thread is waiting for a termination event. Press any key to let the main thread signal that event. The main thread now waits indefinitely for the thread to exit (judging by the console output, anyway), which doesn’t actually happen.
Run WinDbg and select File –> Attach to Process (or hit F6). Choose the right process and attach to it.
Repeat the steps from walkthroughs #5 and #6 to verify that the main thread is waiting for a thread handle (which is likely the secondary thread) and that the secondary thread is waiting for … an invalid handle? The !handle command will fail with error 6, and executing the !error 6 command reveals that the handle is invalid.
How can the handle be invalid if the thread is waiting for it? It must have become invalid after the wait has started, so someone must have closed it. On the other hand, if it were closed after the event was signaled, the secondary thread wouldn’t be waiting for it right now.
Run the application again and attach the debugger before allowing the main thread to signal the termination event. Use the bp command to set a pair of breakpoints on kernel32!CloseHandle and kernel32!SetEvent. Press any key in the console and observe the two breakpoints – the handle is closed before the event is signaled, which is the root cause of the problem. (To examine the parameters when the breakpoint is hit you can use the kb command and observe the call stack as well.)
#8 – Potential Memory Corruption
In this demo we will detect a memory corruption at the point where it occurs, and not later ten layers deep into our application code when it’s impossible to trace back to its source. Compile the project and run it.
It completes successfully on my system – your results might vary. Examine the code and detect the buffer overrun – the function is asking to allocate 120 bytes, but it’s accessing the 130th integer from the returned buffer (so it’s an overrun on both counts!). However, since the heap allocation comes from a pool of already committed virtual memory, no access violation occurs. This kind of corruption could be revealed only hours later, when some other application code attempts to use the corrupted piece of memory.
Run the GFlags application from the Debugging Tools for Windows package. Under the Image tab, enter the application name including the .exe suffix, and click the Tab key. Check the “Enable page heap” checkbox and click Apply.
Run the application again within the debugger. It breaks on the access violation where it previously worked just fine. The reason is that the page heap option pads the allocations with invalid (not committed) pages and ensures that you get an access violation if you try to overrun a buffer.
Another approach to the same problem would be to use Application Verifier with its set of heap checks enabled.
Summary
This concludes the set of walkthroughs. If you have any interesting ideas for additional walkthroughs, or would like to comment on the existing ones, please let me know.
Last time we have minimized contention by using lock-free operations instead of acquiring a lock on a work item queue (neither a standard lock nor a reader-writer lock offer nearly-linear scaling). On the other hand, a few weeks ago we’ve also seen a technique for eliminating contention altogether by keeping state in thread-local storage.
However, there’s another trick in the bag for using shared state across threads without any locks. It is applicable for the scenario where multiple consumers “subscribe” to a work item, such that all work items must be processed by all consumers. This is the classic publish/subscribe, within the scope of a single process and very high performance demands.
This pattern uses a cyclic buffer with a single counter indicating the next item to be consumed. The producers increment the counter and subsequently update the previous item; the consumers keep a local counter which specifies the last item they have read.
class CyclicQueue<T>
{
int _currentIndex = -1;
int _size;
volatile T[] _items;
public CyclicQueue(int size)
{
_items = new T[_size = size];
}
public void Enqueue(T item)
{
int prevIndex = Interlocked.Increment(ref _currentIndex);
_items[prevIndex % _size] = item;
}
public bool Dequeue(ref int threadPrivateIndex, out T value)
{
if (threadPrivateIndex == _currentIndex + 1)
{
value = default(T);
return false;
}
value = _items[threadPrivateIndex++ % _size];
return true;
}
}
As long as consumers don’t lag behind too much as to cause an overflow with loss of data, this pattern provides lock-less operations. The only synchronization required is around the single counter indicating the current index into the buffer, and this can get a bit tricky in the face of an overflow. To simplify, we use counter % BUFSIZE whenever accessing the data instead of taking care of wrapping in the case of overflow when incrementing the counter.
Note that consumers in this case are required to busy-wait around the Dequeue operation as long as there is no new data in the queue. If this busy-wait is known to be expensive, we can either resort to kernel synchronization or use a spinlock and fail back to kernel synchronization. Using spinlocks to minimize contention and context switches will be the subject of our next installment.
More Posts
Next page »