
All Your Base Are Belong To Us

Mostly .NET internals and other kinds of gory details

August 2011 - Posts

Dear Team Lead, You Are Not Doing Agile If…

…Your sprint planning meeting begins with a condescending description of what The Methodology looks like, and ends with “meet me here at 4PM – you will be assigned tasks and pairs”.

…You switch task management tools every week, never failing to surprise your developers and upper management. (Hint: more tools is not necessarily better. Bugs on the whiteboard, tasks in Excel, projects in TFS, and resource scheduling in a custom tool is confusing.)

…You have a heterogeneous team with young developers and you don’t do code reviews. (And still, you find the audacity to complain about bugs introduced due to lack of experience.)

…You do testing in production. (Oh, I mentioned this one before.)

…You have continuous integration and nightly builds, but the nightly builds never run and the CI build always fails. (“Only these 200 tests keep failing every time” – not an excuse. Commenting out failing tests – not an excuse.)

What do you say, dear readers? What are your favorite “You are not doing agile if…” moments?


I have recently been posting short updates and links on Twitter as well as on this blog. You can follow me: @goldshtn

Process Monitor Profiling Support

If you’re looking for a very simple profiler that will give you a general idea of how CPU utilization is distributed in your system, look no further than the latest version of Process Monitor.

Under Tools | Profiling Events, you can enable stack trace collection every second or every 100ms for all running processes.

image

The profiling events have an execution stack, exactly like any other event in Process Monitor, so you can quickly get an impression of where CPU time is spent and which call stacks are responsible for it.

Here’s an example call stack from a devenv.exe thread captured in the profiling mode:

image

And here is an aggregating view for all the activity in the log (under Tools | Stack Summary):

image

VMMap Allocation Profiling and Leak Detection

We have already seen VMMap as a tool for inspecting memory utilization and the layout of a process’ virtual address space.

The latest VMMap version (rush to download!) ships with a memory allocation profiler for VirtualAlloc() and HeapAlloc() calls. Additionally, the tool can display the allocating call stack for every heap block.

Sounds interesting? Here’s how to use it. You launch VMMap, and instead of inspecting an existing process, you tell it to launch and trace another process. (It uses Microsoft Detours to hook allocation APIs and log the information.)

image

Now you exercise your leaking application. The Timeline window gives you a good understanding of what type of memory is leaking:

image

The orange areas here are heap allocations, which are growing while everything else is fairly steady. When you’re convinced that there’s a leak, go to the detail view and start clicking suspicious heap areas. The “Heap Allocations” button at the bottom of the screen will become enabled, and clicking it provides a summary of allocations:

image

And now for each allocation you can obtain the stack trace:

image

Another approach is to click the “Trace” button, sort the output by bytes allocated, and start looking for suspicious call stacks:

image

I hope VMMap will be as useful to you as it is to me—and with memory allocation profiling, twice as useful.
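
If you want something to practice on before pointing VMMap at a real application, a deliberately leaky test program is handy. The following is only a sketch: `leak_blocks` and `last_block` are made-up names, and it assumes the C runtime routes malloc through the process heap (as the Microsoft CRT does), so that the leak shows up as growing heap memory in the timeline.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Keeps the most recent pointer observable so the compiler cannot
   optimize the allocations away. Illustrative name. */
static void *volatile last_block;

/* Allocates `count` blocks of `block_size` bytes and deliberately never
   frees them. Returns the number of bytes leaked. */
static size_t leak_blocks(size_t count, size_t block_size) {
    size_t leaked = 0;
    assert(block_size > 0);
    for (size_t i = 0; i < count; i++) {
        char *p = malloc(block_size);
        if (p == NULL)
            break;                      /* out of memory: stop leaking */
        memset(p, 0x61, block_size);    /* touch the block so it is really used */
        last_block = p;
        leaked += block_size;
        /* intentionally no free(p) */
    }
    return leaked;
}
```

Calling leak_blocks from a loop in main (with a short sleep between iterations) should give the heap a steady upward slope, and every leaked block should share the same allocating call stack.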

Code Smells and Other Problems

Here are some Bad Things™ I’ve come across during the last few days and felt like sharing with you to let out some steam. These aren’t Coding Horrors per se, but perhaps there is something for all of us to learn here.

Bad Naming

I tweeted yesterday about a method called WaitForAllRequestsToExecute which doesn’t wait for all the requests to execute. This is one of the many bad things you can do to the maintenance programmer, which may lead to an axe-hunting adventure. (“Always write code as if the maintenance programmer is an axe murderer who knows where you live.”)

Over-Implementation

Just the other day I was reviewing and fixing a class responsible for asynchronous work item execution. It is really full to the brim with features – you can queue a work item for execution, you can update it, you can cancel it, you can wait for all work items to complete, you can query for all unfinished work items, and so on. Under the covers this glorified thread pool uses the .NET thread pool (the project is targeting .NET 3.5, so the TPL is not available).

The application, however, uses just two of these features. Namely, it needs to queue a bunch of work items for execution from various places, and then in some other, unrelated place, wait for all these work items to complete. Honestly, this is not something for which you need this glorified thread pool with all the unused and untested features. A simple WaitForNoneEvent would suffice (pseudo-code follows). The funny thing is that the glorified thread pool actually uses one of these under the covers.

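class WaitForNoneEvent {
    // Signaled while no thread is inside.
    ManualResetEvent e = new ManualResetEvent(true);
    int threadsInside = 0;

    public void Enter() {
        // Interlocked.Increment returns the incremented value,
        // so the first thread in sees 1, not 0.
        if (Interlocked.Increment(ref threadsInside) == 1)
           e.Reset();
    }
    public void Exit() {
        // The last thread out brings the count back to 0.
        if (Interlocked.Decrement(ref threadsInside) == 0)
           e.Set();
    }
    public void Wait() { e.WaitOne(); }
}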
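
The counting logic is easy to get wrong by one, because Interlocked.Increment returns the incremented value: the first caller of Enter observes 1, and the last caller of Exit observes 0. Here is a small single-threaded model of just that logic in C. It is a sketch, not a port: the `wait_for_none` type and the `wfn_*` names are invented for illustration, and a plain `signaled` flag stands in for the ManualResetEvent.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Model of the counter logic only. `signaled` plays the role of the
   event state: true means nobody is inside and waiters may proceed. */
typedef struct {
    atomic_int threads_inside;
    bool signaled;
} wait_for_none;

static void wfn_init(wait_for_none *w) {
    atomic_init(&w->threads_inside, 0);
    w->signaled = true;
}

static void wfn_enter(wait_for_none *w) {
    /* atomic_fetch_add returns the OLD value; adding 1 yields the new
       value, which is what Interlocked.Increment would return. */
    int now = atomic_fetch_add(&w->threads_inside, 1) + 1;
    if (now == 1)               /* first one in: 0 -> 1 */
        w->signaled = false;
}

static void wfn_exit(wait_for_none *w) {
    int now = atomic_fetch_sub(&w->threads_inside, 1) - 1;
    assert(now >= 0);           /* Exit without a matching Enter is a bug */
    if (now == 0)               /* last one out: 1 -> 0 */
        w->signaled = true;
}
```

After init the flag is set, one wfn_enter clears it, and it only comes back once every wfn_enter has been matched by a wfn_exit. The model deliberately ignores the interleaving between updating the counter and updating the event, which a production version would need to think about.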

Leaving TODOs in the Code

This is a really popular practice – you write a couple of classes, maybe even a few tests, and then before going home to call it a day you put a TODO comment in some method, as a reminder that something needs to be fixed. So far so good, as long as you remember to fix it the next day!

Yesterday I came across a TODO comment in a class that has been “live” for a couple of months that said roughly the following:

//TODO: Make sure there are no other operations on this item now

The consequences of the premise not being met? Well, there is no thread-safety built-in, so if there are actually “other operations on this item”, I can’t vouch for the item’s health after they complete :-)

Now, this is not the kind of thing you leave as a TODO comment in your code and then go away for a few months. This is a showstopper bug that needs to be fixed right away, and at the very least you need to put a big Post-it in the middle of your screen before going home to remind you of it.

Testing by Production Deployment

How does the following sound as a testing plan? “We will deploy the application to production, let some users (not the important ones) play with it a little bit, and then see what bugs we find and fix them.”

The results were… interesting. First of all, shortly after the deployment, something failed and the production database was rendered unusable. But that’s fine, because we can always rename the production database to the “testing database” and see – now there’s no problem fiddling around with data in the production environment.

Next, there were so many bugs that the logs would become impossible to read with all the exceptions and error messages. But that’s also fine – you fix a bug, and then you test the fix by deploying it back to the production environment and letting users see if the problem went away.

The amazing thing here? The team has a build server, has a bunch of staging servers for these experiments, has well-written unit tests which run in the CI build and QA tests which run in the nightly build – and yet they have completely forsaken all these practices and gone berserk with the “test in production” mantra.

I can’t fully explain why this happened yet, but one obvious thing that comes to mind is that the CI build and the nightly build have been failing for a few weeks. Whenever this happens, you lose quite a bit of the confidence you have in the tests, and might be inclined to try the “time-tested” approach described above.

I am happy to report, however, that this testing in deployment practice has ceased and we are now working on unit tests and QA tests for all the “new” code that hasn’t been tested yet.

Walking the Stack Without Symbols and With FPO (Frame Pointer Omission)

In the previous post on stack corruptions, we discussed the case where the stack becomes corrupted but still contains a chain of EBP references which allows for manual reconstruction. (For background reading, see this article on EBP stack reconstruction and calling convention nightmares on x86.)

Below is a call stack from an application crash dump. The reported crash was an access violation inside a module called “HelperLibrary” for which we don’t have symbols or source code. The call stack doesn’t look promising:

0:000> kv
ChildEBP RetAddr  Args to Child             
WARNING: Stack unwind information not available. Following frames may be wrong.
0028fcec 74ba339a 7efde000 0028fd38 77479ed2
  HelperLibrary+0x1014
0028fcf8 77479ed2 7efde000 776a5346 00000000
  kernel32!BaseThreadInitThunk+0xe (FPO: [1,0,0])
0028fd38 77479ea5 011212b2 7efde000 00000000
  ntdll!__RtlUserThreadStart+0x70 (FPO: [SEH])
0028fd50 00000000 011212b2 7efde000 00000000
  ntdll!_RtlUserThreadStart+0x1b (FPO: [2,2,0])

There are no real frames here other than HelperLibrary+0x1014, but we’re pretty sure that there should be other code on the stack, such as the application’s main function :-)

To reconstruct something from this stack, you need to understand who called HelperLibrary+0x1014, even though you don’t have accurate symbols. Usually, it would be a matter of traversing EBP references, but if it were that easy, the debugger would already have done it!

OK, so what happened to EBP?

0:000> r ebp
ebp=0034fbfc
0:000> ln ebp
0:000> u ebp
0034fbfc 08fc            or      ah,bh
0034fbfe 3400            xor     al,0
0034fc00 9a33ba7400e0fd  call    FDE0:0074BA33
0034fc07 7e48            jle     0034fc51
0034fc09 fc              cld
0034fc0a 3400            xor     al,0
0034fc0c d29e477700e0    rcr     byte ptr [esi-1FFF88B9h],cl
0034fc12 fd              std

In case you haven’t noticed, this isn’t actual code—it’s a bunch of data which is interpreted as instructions. It is entirely possible that EBP has been corrupted to point to a totally unrelated location, but there is also another possibility: that the current code is using FPO.

What’s FPO? FPO (Frame Pointer Omission) is an optimization technique in which the compiler uses the EBP register as a scratch register for storing miscellaneous data, like any other general-purpose register. How are local variables and function parameters handled? Directly through ESP.

In other words, when using FPO (which you can enable with the /Oy compilation switch), the compiler is free to refrain from creating a “real” stack frame, with the previous EBP value. There is no linked list of stack frames starting at the current EBP value. Without FPO information, the debugger cannot reliably unwind such frames, and that information lives in the symbol files, which we don’t have.

This leaves us with disassembling HelperLibrary+0x1014 and trying to figure out manually where it returns. Let’s take a look at the vicinity of HelperLibrary+0x1014 (the offending instruction is the mov at 66951014):

0:000> u HelperLibrary+0x1014-0x14 L8
HelperLibrary+0x1000:
66951000 56              push    esi
66951001 ff157c209566    call    dword ptr [6695207c]
66951007 50              push    eax
66951008 c60061          mov     byte ptr [eax],61h
6695100b ff1578209566    call    dword ptr [66952078]
66951011 83c408          add     esp,8
66951014 c60661          mov     byte ptr [esi],61h
66951017 c3              ret

This looks, indeed, like a frame with FPO – there is no EBP to be seen. However, the subsequent ret instruction is going somewhere – so we can look at ESP and find the return address there:

0:000> dps esp L1
0034fb9c  66951040 HelperLibrary+0x1040

OK, so what’s at HelperLibrary+0x1040?

0:000> u HelperLibrary+0x1040-0x20 LC
HelperLibrary+0x1020:
66951020 56              push    esi
66951021 8bf0            mov     esi,eax
66951023 56              push    esi
66951024 ff157c209566    call    dword ptr [6695207c]
6695102a 50              push    eax
6695102b c60061          mov     byte ptr [eax],61h
6695102e ff1578209566    call    dword ptr [66952078]
66951034 03742410        add     esi,dword ptr [esp+10h]
66951038 83c408          add     esp,8
6695103b e8c0ffffff      call    HelperLibrary+0x1000
66951040 5e              pop     esi
66951041 c3              ret

Interesting. This frame doesn’t use EBP either, so we can expect the return address to be at ESP+4 relative to this function’s ESP (because of the pop instruction immediately prior to returning). But how do we figure out the value of ESP when in this function? Well, suppose that the previous function returns. It has removed four bytes (the return address) from the stack. The next value of ESP, then, is the current ESP+4, and we need to add another four bytes to account for the “pop esi” instruction.

0:000> dps esp+8 L1
0034fba4  6695106b HelperLibrary!ImportantFunction+0x1b

All right! We are making some real progress—we have a really small offset, and even though there are no symbols, ImportantFunction is probably an exported function, so we have its location in the DLL:

0:000> u HelperLibrary!ImportantFunction LD
HelperLibrary!ImportantFunction:
66951050 8b442404        mov     eax,dword ptr [esp+4]
66951054 85c0            test    eax,eax
66951056 7501            jne     HelperLibrary!ImportantFunction+0x9
66951058 c3              ret
66951059 837c240c00      cmp     dword ptr [esp+0Ch],0
6695105e 7e0e            jle     HelperLibrary!ImportantFunction+0x1e
66951060 8b4c2408        mov     ecx,dword ptr [esp+8]
66951064 49              dec     ecx
66951065 51              push    ecx
66951066 e8b5ffffff      call    HelperLibrary+0x1020
6695106b 83c404          add     esp,4
6695106e b801000000      mov     eax,1
66951073 c3              ret

This is another function with no trace of EBP use in it. Note that it has parameters, and it accesses these parameters using direct offsets from ESP—which is a tell-tale sign of FPO. Where does this function return to? Well, after the previous function returns, ESP is already at ESP+0xC from its current value. ImportantFunction adds another four bytes, and then returns—so we need to look at ESP+0x10:

0:000> dps esp+0x10 L1
0034fbac  00ed100f MainApp!wmain+0xf

Yes! We’re out of the DLL, and back to symbol-land! The reconstructed stack, therefore, looks like this:

HelperLibrary!…somefunction…
HelperLibrary!…someotherfunction…
HelperLibrary!ImportantFunction
MainApp!wmain
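
The offset arithmetic used in this walk is mechanical once you have read the disassembly, and can be sketched in a few lines of C. This is an illustration under stated assumptions, not a debugger extension: the `fpo_frame` type and `return_address_offsets` function are made-up names, and the per-frame cleanup sizes still have to be worked out by hand from each function’s epilogue.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* One FPO frame, described only by how many bytes the function removes
   from the stack between the point of interest and its `ret`
   (for example 4 for `pop esi` or `add esp,4`). */
typedef struct {
    uint32_t cleanup_bytes;
} fpo_frame;

/* Writes into out[i] the offset from the current ESP at which frame i's
   return address is stored. Frame 0 is the innermost (crashing) frame. */
static void return_address_offsets(const fpo_frame *frames, size_t count,
                                   uint32_t *out) {
    uint32_t offset = 0;
    assert(frames != NULL && out != NULL);
    for (size_t i = 0; i < count; i++) {
        offset += frames[i].cleanup_bytes;   /* skip what this frame pops */
        out[i] = offset;                     /* its return address is here */
        offset += 4;                         /* `ret` pops the address itself */
    }
}
```

For the three frames above the cleanup sizes are 0, 4 (pop esi) and 4 (add esp,4), which gives offsets 0, 8 and 0x10: exactly the dps esp, dps esp+8 and dps esp+0x10 commands used in the walk.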

None of this was provided by the debugger. For reference, here is the same call stack with symbols (which contain FPO information)—we were right on the money!

0:000> kv
ChildEBP RetAddr  Args to Child             
003dfee4 668e1040 00000001 668e106b 0000000f
  HelperLibrary!AnotherHelperFunction+0x14 (FPO: [0,0,0])
003dfeec 668e106b 0000000f 010c100f 010c20f4
  HelperLibrary!HelperFunction+0x20 (FPO: [1,0,4])
003dfef4 010c100f 010c20f4 00000010 00000001
  HelperLibrary!ImportantFunction+0x1b (FPO: [3,0,0])
003dff04 010c1191 00000001 00771b78 00771c10
  MainApp!wmain+0xf (FPO: [2,0,0])
003dff48 74ba339a 7efde000 003dff94 77479ed2
  MainApp!__tmainCRTStartup+0x122 (FPO: [Non-Fpo])
003dff54 77479ed2 7efde000 777db3e9 00000000
  kernel32!BaseThreadInitThunk+0xe (FPO: [1,0,0])
003dff94 77479ea5 010c12b2 7efde000 00000000
  ntdll!__RtlUserThreadStart+0x70 (FPO: [SEH])
003dffac 00000000 010c12b2 7efde000 00000000
  ntdll!_RtlUserThreadStart+0x1b (FPO: [2,2,0])

Updated Course: Developing Windows Concurrent Applications

During the last couple of months, I have been updating the materials of the Developing Windows Concurrent Applications course. It is now an up-to-date four-day ILT (instructor-led training) with lots of labs, demos, design patterns, and other practical materials to help C++ developers write their next great concurrent application for Windows.

The target audience is C++ programmers with 1-2 years of experience writing Windows applications, but who haven’t necessarily seen how to create threads, queue work to the thread pool, synchronize access to shared resources, minimize shared operations, throttle the amount of work, and many other things that are critical for an application running on a multicore system.

The theory and practice of “raw” Windows concurrent programming are also peppered with two leading concurrency frameworks—the steady but declining OpenMP, and the new and emerging ConcRT, along with a brief intro to C++11, the (very recently approved) C++ standard of the new millennium.

The course wouldn’t be at all useful for you if it weren’t for the labs, and here are just a few examples:

  • Designing and implementing a lock-free collection
  • Implementing synchronization mechanisms such as barrier, reader-writer lock, condition variable
  • Designing a custom thread pool based on APC or IOCP
  • Parallelizing sort, matrix sum, fold-and-scan and many other operations using OpenMP and ConcRT

I have recently delivered the updated version of the course for the first time, and it’s been a great deal of fun. I hope you will like it too.

Are Workflows Really That Bad? (Hint: Maybe They Aren’t)

I was thinking long and hard about whether Windows Workflow Foundation, or any other workflow tool for that matter, can really be used in a large production application for orchestrating large business processes. (Clarification: I’ve been writing a framework for developing such workflows using WF 3.5 and WF 4.0 for the past three years, so I’ll try to speak from experience.)


Is it possible to use workflows to orchestrate large business processes without landing in a mass of spaghetti a couple of years later?

I believe that the same techniques that need to be applied to code to prevent it from deteriorating into a slime ball need to be applied to workflows as well. It is, after all, “just code”.

For example, you really ought to review complex workflows you write. You need to come up with a framework for invoking one workflow from another, so that you can modularize your application. You need to learn to debug workflows and their code-behind, and train the activity developers on the internals of whatever workflow framework you’re using.

But wait, if it is just code, why don’t we just write the code? Surely a line of code flows from the keyboard much faster than an activity is dragged from the toolbox with the mouse…

In a nutshell: physical limitations to what you can do. When given a set of activities to choose from, with strict design-time validation on every operation, it’s much harder to go wrong than when working with APIs.

To some extent, this is a matter of personal preference—just the other day I saw a presenter demo a visual drag-and-drop tool for building certain UIs, and then dismiss it with “but you will write your UI in code anyway”. However, if you’re serious about writing a framework for workflow development, it’s not that hard to get to a state where the workflow developer is shielded very carefully from the bad choices, and the “right” application is almost magically generated from under his fingers.

Only developers can write workflow applications. Leave it to their personal preference whether they want workflows or code.

Other than the physical limitations mentioned above, there is a fallacy here that all the developers in the organization have similar skillsets. What if you have a group of 10 extremely talented developers who can write a framework for workflow applications, who can write the activities and their code-behind and services and whatnot, and a group of 100 inexperienced developers who are accustomed to perhaps a simple scripting language and have never maintained an enterprise application?

I propose that it’s easier for 10 talented developers to maintain a workflow-building framework used by 100 other developers, than to maintain a framework for writing code.

Can workflows work together with any framework I already have in place for process orchestration, pub/sub, bus, etc.?

I believe they can. It’s just a matter of the proper integration. For example, in the project I’ve been building, we use WCF and a custom bus with pub/sub for most communication. I’m pretty sure that using WF with WCF and AppFabric is an even more seamless experience.

Isn’t testing a PITA? It’s not just “create an object, call a method” anymore.

I propose that testing workflows is in fact much easier than testing code. The main reason is that workflows automatically promote proper dependency management, which leads to better testability. If you drag activities around, and these activities are properly designed, it’s almost impossible to introduce any dependencies other than the dependencies these activities have.

In turn, activities rely on local services (extensions in WF4) for any external work, or invoke WCF services. Either is very easy to mock, replace, or isolate.

In other words, workflow-driven development actually holds your hand all the way to testable code, which can hardly be said about code-based APIs.

There is, of course, the matter of building a framework for unit testing your workflows. Someone has to run them, wait for completion, report any errors—but this is a framework you write once, much as a framework for testing code.


To summarize, I have faith in workflow-building frameworks for orchestrating business processes. It does take discipline, and there is no magic involved.

For an alternative (in fact, completely opposite) point of view, I recommend reading Udi Dahan’s post, The Danger of Centralized Workflows.

Baby Steps in Windows Device Driver Development: Part 6, Hiding Processes

Last time around, we saw how to do something slightly useful in our driver. This time, we’ll simulate a technique used over ten years ago by Windows kernel rootkits to hide a process from tools such as Task Manager.

First, some background: the Windows scheduler doesn’t need process information to run code. The scheduler needs access only to threads—threads ready for execution are stored in a set of ready queues. When a thread enters a wait state, the system tracks its information using _KWAIT_BLOCK structures, which again don’t require access to processes.

Still, the system keeps track of the list of running processes, not the least for tools like Task Manager to display information on what’s going on in the system. Malicious software has been fond of subverting the information presented to Task Manager in order to hide processes; similar techniques exist for hiding files, registry entries, network connections, and other traces of malicious activity. (Needless to say, security software is detecting and preventing tricks of this sort as they emerge, and the arms race goes on.)

One way of hiding processes is called Direct Kernel Object Manipulation [pdf], which involves modifying the internal data structures the system uses to keep track of processes. Namely, there is a linked list of _EPROCESS structures which represent running processes. Unlinking an _EPROCESS structure from this list will cause the process to become invisible to tools like Task Manager.

Because the _EPROCESS structure is undocumented, and isn’t part of the WDK headers, you will need to find the embedded _LIST_ENTRY offset manually for each OS version you intend to support. On Windows XP 32-bit, this offset is 0x88. A _LIST_ENTRY is an entry in a doubly-linked list, with backward and forward pointers. This means that given one _EPROCESS you can find the rest by traversing these pointers—and you can find the first _EPROCESS by using PsGetCurrentProcess(), which is a documented WDK API.
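
To make the embedded-entry idea concrete, here is a minimal user-mode sketch in C. The `list_entry` and `record` types and the helper names are illustrative stand-ins rather than the WDK definitions. The sketch shows two things: recovering the enclosing record from a pointer to its embedded entry by subtracting the entry’s offset, and visiting every record on the ring by following forward pointers from any starting entry.

```c
#include <assert.h>
#include <stddef.h>

/* Same shape as a doubly-linked list entry: forward and backward links. */
typedef struct list_entry {
    struct list_entry *flink;
    struct list_entry *blink;
} list_entry;

/* Hypothetical record with the list entry embedded at a nonzero offset. */
typedef struct {
    int id;
    list_entry links;
} record;

/* Recover the enclosing record from a pointer to its embedded entry. */
#define RECORD_FROM_LINKS(p) \
    ((record *)((char *)(p) - offsetof(record, links)))

static void list_init(list_entry *head) {
    head->flink = head->blink = head;       /* empty ring points to itself */
}

static void list_insert_tail(list_entry *head, list_entry *e) {
    assert(head != NULL && e != NULL);
    e->flink = head;
    e->blink = head->blink;
    head->blink->flink = e;
    head->blink = e;
}

/* Walks the whole ring starting at `start`, summing the ids of all
   records and skipping the bare list head, which is not inside a record. */
static int sum_ids_from(list_entry *start, const list_entry *head) {
    int sum = 0;
    list_entry *e = start;
    do {
        if (e != head)
            sum += RECORD_FROM_LINKS(e)->id;
        e = e->flink;
    } while (e != start);
    return sum;
}
```

Note that the walk can begin at any entry, not just the head, which is why a single record is enough of a starting point to reach all the others.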

The following is a routine that hides the calling process by unlinking it from the aforementioned list:

VOID
HideCaller(
    VOID
    )
{
    ULONG eProcess;
    PLIST_ENTRY plist;
   
    eProcess = (ULONG)PsGetCurrentProcess();
    plist = (PLIST_ENTRY)(eProcess+FLINKOFFSET);

    *((ULONG*)plist->Blink) = (ULONG) plist->Flink;
    *((ULONG*)plist->Flink+1) = (ULONG) plist->Blink;

    plist->Flink = (PLIST_ENTRY) &(plist->Flink);
    plist->Blink = (PLIST_ENTRY) &(plist->Flink);
}

The last two lines ensure that the forward and backward links on the hidden _EPROCESS remain valid (pointing to itself), in case some API decides to use them. (This might happen, for instance, when the process exits and the Process Manager removes it from the list of processes.)

Next time: This post already wanders into grounds in which I’m not sure I’m willing to keep treading. We’ll see about next time.

Restart Windows and Restart All Registered Applications: shutdown -g

The Windows Restart Manager (introduced in Windows Vista) supports gracefully shutting down and restarting applications that registered for restart with the RegisterApplicationRestart API.

This functionality is used by Windows Update – thanks to the Restart Manager, when I come yawning to my desktop PC in the morning, even following a system restart, I have my Outlook, browser windows, OneNote, Visual Studio, and Messenger all lined up as they were when I went to bed.

Suppose you want to initiate one of these “automagically restart everything after restart” restarts. As of a few weeks ago, I had it in my head that you have to write a small app that uses the Restart Manager APIs (e.g. RmStartSession and RmShutdown) to do this.

And then it hit me that the shutdown command must have support for doing this. And indeed, it has:

shutdown /g