February 2012 - Posts - All Your Base Are Belong To Us

All Your Base Are Belong To Us

Mostly .NET internals and other kinds of gory details

Finalization Queue or F-Reachable Queue? Find Out with SOS

It’s 2012, so it’s time for another post related to finalization. This so-often-abused CLR feature has popped up here in the past.

This time, we’ll see how to determine whether a particular object is in the finalization queue (which means it hasn’t been scheduled for finalization yet) or in the f-reachable queue (which means it’s waiting for the finalizer thread to run its finalizer).

Let’s fire up our trusty WinDbg and SOS and look at some heap objects:

0:003> !dumpheap -stat
...snipped…
000007ff00023b78     1340        32160 MemoryLeak.Schedule
000007ff00023aa0     1340        32160 MemoryLeak.Employee
000007fef5ff7b08      435        44400 System.String
000007fef5fe58f8      310        67120 System.Object[]
00000000005387f0      577      2680608      Free
000007fef5fffb48     1345     13432600 System.Byte[]
Total 6231 objects

Okay, what are these schedules, employees, and byte arrays running around?

0:003> .foreach (obj {!dumpheap -mt 000007fef5fffb48 -short}) {!gcroot obj; .echo -----}
…edited for clarity…
Finalizer queue:Root:0000000002a021a8(MemoryLeak.Employee)->
0000000002a021c0(MemoryLeak.Schedule)->
0000000002a021d8(System.Byte[])
-----
Finalizer queue:Root:0000000002a07058(MemoryLeak.Employee)->
0000000002a07070(MemoryLeak.Schedule)->
0000000002a07088(System.Byte[])
-----
Finalizer queue:Root:0000000002a0bf08(MemoryLeak.Employee)->
0000000002a0bf20(MemoryLeak.Schedule)->
0000000002a0bf38(System.Byte[])
…many more of these snipped…

Now, are these Employee objects rooted at the finalization queue or the f-reachable queue? Unfortunately, !gcroot does not say. However, !FinalizeQueue shows the queue statistics, along with the address range of each queue segment:

0:003> !FinalizeQueue
SyncBlocks to be cleaned up: 0
MTA Interfaces to be released: 0
STA Interfaces to be released: 0
----------------------------------
generation 0 has 370 finalizable objects
  (0000000000d29030->0000000000d29bc0)
generation 1 has 4 finalizable objects
  (0000000000d29010->0000000000d29030)
generation 2 has 8 finalizable objects
  (0000000000d28fd0->0000000000d29010)
Ready for finalization 571 objects
  (0000000000d29bc0->0000000000d2ad98)
Statistics:
…snipped…

Note the information on finalizable objects vs. objects that are ready for finalization. The former are in the finalization queue, the latter are in the f-reachable queue.

But now suppose that you have an individual object and you want to determine whether it’s in the finalization queue or the f-reachable queue. All you need to do is check in which array it is contained by searching memory with the s command:

0:003> s -q 0000000000d29bc0 0000000000d2ad98 0000000002a0bf08
00000000`00d29da8  00000000`02a0bf08 00000000`02a10db8
0:003> s -q 0000000000d28fd0 0000000000d29bc0 0000000002a0bf08

Okay, so 0000000002a0bf08 is in the f-reachable queue, waiting for the finalizer thread.
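
The same range check can be written down as a few lines of code. This is only a sketch: the boundaries are copied by hand from the !FinalizeQueue output above, the names `ranges`, `slotCount` and `classifySlot` are made up for illustration (they are not part of SOS), and the 8-byte slot size assumes a 64-bit process.

```javascript
// Queue boundaries copied by hand from the !FinalizeQueue output above.
// BigInt literals keep the 64-bit addresses exact.
const ranges = [
    { queue: "finalization (gen 2)", from: 0x0000000000d28fd0n, to: 0x0000000000d29010n },
    { queue: "finalization (gen 1)", from: 0x0000000000d29010n, to: 0x0000000000d29030n },
    { queue: "finalization (gen 0)", from: 0x0000000000d29030n, to: 0x0000000000d29bc0n },
    { queue: "f-reachable",          from: 0x0000000000d29bc0n, to: 0x0000000000d2ad98n }
];

// Each slot holds one object pointer (8 bytes, assuming a 64-bit process),
// so the width of a range divided by 8 is the number of objects in it.
function slotCount(range) {
    return (range.to - range.from) / 8n;
}

// Hypothetical helper: given the address at which the memory search found
// the object pointer, report which queue segment contains that slot.
function classifySlot(slotAddress) {
    for (const r of ranges) {
        if (slotAddress >= r.from && slotAddress < r.to) return r.queue;
    }
    return "not in any queue";
}

console.log(classifySlot(0x0000000000d29da8n)); // f-reachable
console.log(slotCount(ranges[3]));              // 571n
```

The slot count doubles as a sanity check: the f-reachable range works out to 571 slots, which matches the "Ready for finalization 571 objects" line.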


I am posting short updates and links on Twitter as well as on this blog. You can follow me: @goldshtn
This blog post was also cross-posted to CodeProject.

Announcing the Jerusalem .NET/C++ User Group

I’m proud to announce the first meeting of the Jerusalem .NET/C++ User Group, which will take place on March 20! This user group is the product of collaboration between yours truly, SELA Group, and BrightSource Energy, and strives to be the meeting place for developers in and around Jerusalem working with .NET and C++ technologies.

Here’s the agenda for our first meeting – I recommend that you register quickly because the number of seats is very limited:

18:00-18:15 - Networking and Refreshments
18:15-18:20 - Introducing the User Group
18:20-19:00 - The New C++ Standard and Beyond (Sasha Goldshtein)
19:00-19:15 - Networking and Refreshments
19:15-19:55 - Debugging C++ Applications in Production (Noam Sheffer)
19:55-20:35 - C++ Tips for Realtime Applications (Vladimir Oster)

See you at the meeting, and I’ll post the slides and demos online after the talks are over. It’s really exciting to finally get this community rolling after years of driving to Tel-Aviv, Raanana, or Herzliya for a good tech talk! ;-)



How Understanding Assembly Language Helps Debug .NET Applications [CodeProject]

A quick plug to let you know that today I published an article on CodeProject called How Understanding Assembly Language Helps Debug .NET Applications.

It contains some scenarios previously covered on this blog and others that I worked on specifically for the article, and strives to explain why assembly language skills are beneficial even if you’re only working on .NET applications.

Here’s the TOC:

  • Analyzing a corrupted or incomplete call stack
  • Correlate crash location to source code line
  • Determine function arguments
  • Find the static root that references your object


Performance Profiling and Optimization Session at Microsoft

Yesterday I delivered a short closed session on performance profiling and optimization at Microsoft Raanana. Thanks to Maor David for hosting this session.

Thanks to a very attentive audience I was able to cover all of the following:

  • The pitfalls of micro-benchmarking
  • Performance measurement with the Visual Studio Profiler
  • Instrumentation for production profiling
  • The case of the 250ms delay
  • CPU cache optimization in general, cache collision on SMP
  • Parallelization with C++/AMP

The slides and demos are available online. The slides are quite short, but the demos might be useful even if you weren’t there :-)



SELA Developer Practice: Back in 2012!

This is going to be an exciting year because right now we have THREE conferences planned for 2012! The first one is very near, at the end of March, and features 20 full-day tutorials and a day of breakout sessions at the Crowne Plaza hotel. As always, we will have two distinguished keynote speakers from Microsoft: Guy Burstein and Maor David.

Among the topics you’ll find Windows 8 (three in-depth tutorials!), .NET Parallel Programming, C++ 11, Windows Phone Mango, HTML 5, Deep Dive into JavaScript, and many others.

Yours truly will deliver two sessions at the SDP: a keynote Introducing Windows 8 and a full-day tutorial Improving the Performance of .NET Applications. It’s the third time in a row I’m delivering this tutorial and the class is packed again and again, so I guess you guys are interested in profiling and optimizing your managed apps.

See you at the SDP, or if you can’t make this one, see you at the next!



Pinpointing a Static GC Root with SOS

NOTE: if you’re not familiar with SOS (a WinDbg extension for managed code) and leak detection with !gcroot, start by reading an introductory post on the subject.


A typical root reference chain for a managed object that is retained by a static GC root would have a pinned object array appear as the rooted object. Here is a typical reference chain:

0:010> !gcroot 0000000002bcaf58
…snipped…
DOMAIN(0000000000C1C5F0):HANDLE(Pinned):5017f8:Root:0000000012761018(System.Object[])->
00000000039b3c30(System.EventHandler)->
0000000002bcab38(System.Object[])->
0000000002bcf8d8(System.EventHandler)->
0000000002bcaf58(FileExplorer.MainForm+FileInformation)

This object array is ubiquitous; it would seem that all static root references stem from it. Indeed (and this is a CLR implementation detail), static fields are stored in this array, and as far as the GC is concerned they are kept alive through it.

This also makes it difficult to determine which static field of which class is responsible for the static reference. For example, in the reference chain above, it is apparent that there is a static EventHandler-typed field (which is likely an event) that retains the FileInformation instance – but it’s very desirable to find the details of that static field.

More than six years ago Doug Stewart wrote a short blog post outlining the general process in cases like these. This process generally works, but requires some adaptation in the 64-bit era, so here goes.

First, let’s take a look at that rooted array:

0:010> !do 0000000012761018
Name: System.Object[]
MethodTable: 000007fef68858f8
EEClass: 000007fef649eb78
Size: 8192(0x2000) bytes
Array: Rank 1, Number of elements 1020, Type CLASS
Element Type: System.Object
Fields:
None

OK, so it’s an array with 1020 elements, and one of these elements must be our event handler. Is it the case? Let’s see:

0:010> s -q 0000000012761018 L2000 00000000039b3c30
00000000`12762e10  00000000`039b3c30 00000000`0278b380

Sure enough, our event handler is one of the array elements, at the address 00000000`12762e10. Now there are two key observations:

  1. The EventHandler instance ended up in the array somehow. Maybe if we can find other references to this array address, we can find who put it there and then determine whose static field it is.
  2. There is a reference from that EventHandler instance to one of our application’s objects (eventually). Then there should be additional references to this array address, which shape the chain of references to our application’s object.

Frankly, both of these are long shots, because it might be the case that the address is calculated dynamically, but let’s give it a spin. Doug’s original guidance at this point is to launch a memory search for any references to the array location, which would complete in a few seconds for a 32-bit address space; not so much for a 64-bit address space!

However, we are looking for references in managed code only, so no need to traverse the entire address space. It suffices to look at the address ranges of modules in the current AppDomain:

0:010> !dumpdomain
…snipped…
--------------------------------------
Domain 1: 0000000000c1c5f0
LowFrequencyHeap: 0000000000c1c638
HighFrequencyHeap: 0000000000c1c6c8
StubHeap: 0000000000c1c758
Stage: OPEN
SecurityDescriptor: 0000000000c1de90
Name: FileExplorer.exe
Assembly: 0000000000c3cd80 [C:\Windows\assembly\GAC_64\mscorlib\2.0.0.0__b77a5c561934e089\mscorlib.dll]
ClassLoader: 0000000000c3ce40
SecurityDescriptor: 0000000000c3cc40
  Module Name
000007fef6461000 C:\Windows\assembly\GAC_64\mscorlib\2.0.0.0__b77a5c561934e089\mscorlib.dll
000007ff000f2568 C:\Windows\assembly\GAC_64\mscorlib\2.0.0.0__b77a5c561934e089\sortkey.nlp
000007ff000f2020 C:\Windows\assembly\GAC_64\mscorlib\2.0.0.0__b77a5c561934e089\sorttbls.nlp
Assembly: 0000000000c57480 [D:\courses\NET Debugging\Exercises\4_MemoryLeak\Binaries\FileExplorer.exe]
ClassLoader: 0000000000c57540
SecurityDescriptor: 0000000000c57390
  Module Name
000007ff000433d0 D:\courses\NET Debugging\Exercises\4_MemoryLeak\Binaries\FileExplorer.exe
…many more of these guys…

Now we have a couple of module addresses and can constrain our memory search. It seems safe to start at 7ff`00000000 and go through a few hundred megabytes looking for our address. Generally speaking, the proper WinDbg command here would be:

0:010> s -q 000007ff`00000000 L?00000000`40000000 00000000`12762e10

(…we are looking for a full QWORD.) The problem is that we might miss unaligned references to that address, which may occur if it is hardcoded into some instruction (e.g. a MOV). So instead we should search for the byte sequence itself, remembering that we are on a little-endian architecture, so the least significant byte comes first:

0:010> s -b 000007ff`00000000 L?00000000`40000000 10 2e 76 12
000007ff`001913d3  10 2e 76 12 00 00 00 00-48 8b 00 48 89 44 24 60  ..v.....H..H.D$`
000007ff`00191440  10 2e 76 12 00 00 00 00-48 8b d0 e8 60 c1 87 f7  ..v.....H...`...
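
The byte pattern passed to s -b above is simply the low four bytes of the slot address, least significant byte first. A small sketch of the conversion, where `toLittleEndianPattern` is a made-up helper name rather than anything WinDbg provides:

```javascript
// Hypothetical helper: render the low `byteCount` bytes of an address in
// little-endian order, formatted like the byte arguments of WinDbg's `s -b`.
function toLittleEndianPattern(address, byteCount) {
    const bytes = [];
    let value = BigInt(address);
    for (let i = 0; i < byteCount; ++i) {
        const low = value & 0xffn;                      // least significant byte first
        bytes.push(low.toString(16).padStart(2, "0"));
        value >>= 8n;                                   // move on to the next byte
    }
    return bytes.join(" ");
}

console.log(toLittleEndianPattern(0x12762e10n, 4)); // 10 2e 76 12
console.log(toLittleEndianPattern(0x12762e10n, 8)); // 10 2e 76 12 00 00 00 00
```

Searching for only four bytes is a looser match than the full eight; the search output above shows that both hits are followed by four zero bytes, so they do contain the complete 64-bit address.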

Voila! Two references to the array location, and now let’s take a look at them with the !u command to see if they are code:

0:010> !u 000007ff`001913d3
Normal JIT generated code
FileExplorer.MainForm+FileInformation..ctor(System.String)
Begin 000007ff001912d0, size 18d
…snipped…
000007ff`001913d0 90              nop
000007ff`001913d1 48b8102e761200000000 mov rax,12762E10h
…snipped…
000007ff`0019143e 48b9102e761200000000 mov rcx,12762E10h
000007ff`00191448 488bd0          mov     rdx,rax
…snipped…

They are both a match inside FileInformation’s constructor, which gives us an excellent clue where to look. (The rest of the analysis is not shown here – you would look at the constructor’s source code and identify the event in question.)

This analysis process is rather tedious, but in the absence of a profiler capable of performing this analysis for you, it’s yet another useful skill to add to your memory leak detection toolkit.


This blog post was also cross-posted to CodeProject.

HTML5 Web Workers: Classic Message Passing Concurrency

Most concurrency frameworks I write about on this blog consist of numerous layers of abstraction. Consider the Task Parallel Library, for instance: it’s a wrapper on top of the .NET Thread Pool, which is a wrapper on top of Windows threads. This cruft of low-level abstraction layers imposes certain expectations on the newer libraries: they must allow direct access to shared state and provide synchronization mechanisms, volatile variables, atomic synchronization primitives, …

JavaScript (HTML5), with its Web Workers standard, seems to enjoy the lack of abstraction cruft for threading. Because there are no underlying low-level libraries for multithreaded JavaScript computation (in the browser), Web Workers are free to reinvent not only the APIs, but also the concurrency style.

Web Workers provide a message-passing model, where scripts communicate only through well-defined messages that are copied rather than shared, do not share any data, and do not use synchronization mechanisms for signaling or data integrity. Indeed, Web Workers are not ridden with classic shared-state concurrency problems such as data races and lock-ordering deadlocks, simply because there is no shared state and there are no locks.
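
To make the "do not share any data" point concrete: postMessage hands the receiver a copy of the payload, produced by the structured clone algorithm. The sketch below uses the global structuredClone function to show the copy semantics without spinning up a worker. It assumes a runtime that provides structuredClone (recent browsers, Node.js 17 or later), and the payload object is made up for illustration.

```javascript
// structuredClone applies the structured clone algorithm, the same copying
// step that postMessage performs on a message payload.
const original = { start: 2, end: 100000, tags: ["primes"] };
const received = structuredClone(original);

// Mutating the receiver's copy leaves the sender's object untouched,
// including nested objects such as the tags array.
received.start = 50000;
received.tags.push("changed");

console.log(original.start);        // 2
console.log(original.tags.length);  // 1
console.log(received === original); // false
```

Because each side only ever sees its own copy, there is nothing for two threads to write to at the same time.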

Frankly, I’m a little jealous of JavaScript developers who can now leverage multithreading in their browser-side applications. The Web Workers API is minimalistic but done right. Below is a small example of a multithreaded prime number search with Web Workers.

First, the UI:

<input type="text" id="range_start"
       placeholder="start (e.g. 2)" /><br/>
<input type="text" id="range_end"
       placeholder="end (e.g. 100000)" /><br/>
<label for="dop">Degree of parallelism:</label>
<input type="range" id="dop"
       min="1" max="8" value="4" step="1" /><br/>
<input type="button" id="calculate" value="Calculate" />

Now the actual business. When the “Calculate” button is clicked, we spawn the specified number of worker threads to do the calculation in the background. The main thread passes each worker the range of numbers it will search for primes, and receives progress reports and the final count from the workers:

//Some parts of the code elided for clarity
$(document).ready(function () {
    $("#calculate").click(function (e) {
        e.preventDefault();
        // Pass the radix explicitly so the input is always parsed as decimal
        var rangeStart = parseInt($("#range_start").val(), 10);
        var rangeEnd = parseInt($("#range_end").val(), 10);
        var parallelism = parseInt($("#dop").val(), 10);
        createWorkers(parallelism, rangeStart, rangeEnd);
    });
});
function createWorkers(parallelism, start, end) {
    var range = end - start;
    var chunk = range / parallelism;
    var count = 0;
    var done = 0;
    for (var i = 0; i < parallelism; ++i) {
        var worker = new Worker("prime_finder.js");
        worker.onmessage = function(event) {
            if (event.data.type === 'DONE') {
                ++done;
                count += event.data.count;
                if (done == parallelism) ...
            } else if (event.data.type === 'PROGRESS') {
                var progress = event.data.value;
                ...
            }
        };
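        var init = {
            // Floor the boundaries so that every worker gets an integer range
            start: start + Math.floor(i * range / parallelism),
            end: start + Math.floor((i + 1) * range / parallelism),
            idx: i
        };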
        worker.postMessage(init);
    }
}
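
The bookkeeping inside the onmessage handler can also be pulled out into a function that knows nothing about workers, which makes it easy to exercise on its own. This is a sketch rather than the demo's actual code: `createAggregator` and its `onComplete` callback are hypothetical names, and only the message shapes (type, value, count, idx) come from the code above.

```javascript
// Hypothetical helper: build a message handler that tracks per-worker
// progress and sums the counts reported in DONE messages.
function createAggregator(parallelism, onComplete) {
    const progress = new Array(parallelism).fill(0);
    let done = 0;
    let total = 0;
    return function handleMessage(data) {
        if (data.type === 'PROGRESS') {
            progress[data.idx] = data.value;
        } else if (data.type === 'DONE') {
            progress[data.idx] = 100;
            total += data.count;
            // Fire the callback once every worker has reported completion
            if (++done === parallelism) onComplete(total, progress);
        }
    };
}

// Usage without real workers: feed the handler the messages workers would post.
const handle = createAggregator(2, function (total) {
    console.log("primes found: " + total);
});
handle({ type: 'PROGRESS', value: 50, idx: 0 });
handle({ type: 'DONE', count: 4, idx: 0 });
handle({ type: 'DONE', count: 4, idx: 1 }); // prints "primes found: 8"
```

In the browser, each worker would be wired up with something like worker.onmessage = function (event) { handle(event.data); }.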

Note the communication between the threads. The main thread uses Worker.postMessage to provide data to the worker thread, and receives from it status updates using the onmessage event. The worker thread runs the prime_finder.js script:

//The isPrime function elided for brevity
self.onmessage = function (event) {
    var start = event.data.start;
    var end = event.data.end;
    var size = end - start;
    var count = 0;
    for (var i = start; i < end; ++i) {
        if (isPrime(i)) ++count;
        if (i % 1000 === 0) {
            self.postMessage({
                type: 'PROGRESS',
                value: 100.0*((i-start)/size),
                idx: event.data.idx
            });
        }
    }
    self.postMessage({
        type: 'DONE',
        count: count,
        idx: event.data.idx
    });
};

Here we see the opposite direction – the worker thread periodically posts progress reports and eventually reports completion. The whole thing is triggered by the receipt of the initial message from the main thread.

image
(In this screenshot you can see how uneven the distribution of work turns out to be: the first thread finishes very quickly while the fourth thread lags far behind…)
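
One plausible explanation for the imbalance, assuming isPrime uses trial division (the post does not show it, so this is a guess): testing a large number takes more divisions than testing a small one, so equal-width chunks do not mean equal work. The sketch below uses a made-up `isPrimeWithCost` stand-in that counts the divisions it performs, and a made-up `workPerChunk` that totals them per chunk.

```javascript
// Hypothetical stand-in for the elided isPrime: trial division by 2 and then
// by odd numbers up to sqrt(n). It also reports how many divisions it tried.
function isPrimeWithCost(n) {
    if (n < 2) return { prime: false, steps: 0 };
    if (n % 2 === 0) return { prime: n === 2, steps: 1 };
    let steps = 1;
    for (let d = 3; d * d <= n; d += 2) {
        ++steps;
        if (n % d === 0) return { prime: false, steps: steps };
    }
    return { prime: true, steps: steps };
}

// Split [start, end) into equal-width chunks with integer boundaries and
// total the primes found and the divisions performed in each chunk.
function workPerChunk(start, end, parts) {
    const range = end - start;
    const result = [];
    for (let i = 0; i < parts; ++i) {
        const from = start + Math.floor(i * range / parts);
        const to = start + Math.floor((i + 1) * range / parts);
        let primes = 0, steps = 0;
        for (let n = from; n < to; ++n) {
            const r = isPrimeWithCost(n);
            if (r.prime) ++primes;
            steps += r.steps;
        }
        result.push({ from: from, to: to, primes: primes, steps: steps });
    }
    return result;
}

// Compare the `steps` totals of the first and last chunk
console.log(workPerChunk(2, 100000, 4));
```

With this stand-in the last chunk performs noticeably more divisions than the first, even though it contains fewer primes. Handing out interleaved or smaller chunks would be one way to even things out, but that goes beyond what the demo does.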

To experiment with this code, you can download the full demo, including a very small Node.js server that serves it.

If you have a C# or C++ development background, it all probably feels very unnatural to you. Where is the shared state? Where are the synchronization mechanisms? Where are the function pointers? – Indeed, it can be scary to write a multithreaded program using only asynchronous message passing – but it’s a much cleaner start than many of the libraries we have today.

