Debugging and Investigation Tools

August 11, 2007

2 comments

As part of the Windows Internals course I am teaching for Sela, I have compiled a list of tools that we are using throughout the course for demonstrations, and that I am recommending to students to use in their daily debugging and investigation needs.

In this post, I will introduce you to some of these tools, which I use on a regular basis.  Please note that I don’t own any of these, and (most of them) are freely distributed on the web.  For more information, you are more than welcome to contact the tool authors.

Process Explorer, Sysinternals (Microsoft)

Process Explorer is a “replacement Task Manager” application. However, it enables you to view much more diagnostic information on the system as far as processes, threads, loaded DLLs and handles are concerned. Process Explorer also features process highlighting for created/terminated processes, processes running within a job, protected system processes (on Windows Vista), services, and more. For each process, Process Explorer generates a wealth of information such as the security information, general image properties, and features a thread view with the ability to see thread stacks on a live system, if symbols are properly configured (_NT_SYMBOL_PATH or the internal Process Explorer symbol path). You will need Debugging Tools for Windows installed because it contains the correct version of DbgHelp.dll necessary for obtaining all of the information.

Sample scenarios:

  • See a list of processes and threads on the system and examine what they are doing
  • See a list of loaded modules (DLLs) for each process
  • See a list of handles and handle properties (object properties) for each process
  • See processes being created and terminated on a live system using the process highlighting features
  • Observe general system information such as virtual memory utilization, CPU utilization (user mode and kernel mode, for each CPU), paged and non-paged pool utilization and other data

Performance Monitor, Microsoft

Performance Monitor is the generic Windows mechanism for obtaining, displaying and logging information from Windows Performance Counters. Performance counters can be created programmatically for your application’s monitoring purposes, or you could look into the system-provided counters โ€“ system-wide and process-specific. Performance counters also feature the inherent ability to connect to another machine and display information from there, given the appropriate privileges; this makes the Performance Monitor a perfect remote-monitoring tool, that could be instructed to monitor the behavior of several machines, output a log of this behavior and generate alerts if certain thresholds are exceeded. The following is an incomplete list of interesting performance counters, for diagnostic and performance-monitoring purposes:

  • Process performance counters, enabling you to view the memory consumption, handle count, thread behavior, page faults and other statistics for a particular process
  • Processor performance counters, providing the CPU time the processor is using, the hardware and software (DPC) interrupts rate for that processor, and other useful information
  • .NET performance counters, featuring .NET memory, locks, interop and other statistics for any .NET application executing on the machine
  • Disk-, network- and protocol-oriented performance counters, providing I/O related information for the selected I/O subsystem

The following is a short demonstration of how you would export your own performance counter and observe it in the Performance Monitor.  First of all, you will have to create a performance counter category (a.k.a. “Performance Object” in Performance Monitor).  Inside the category, you will be able to create performance counters and export values from within your application (note: this is much easier to do from .NET than from native code).  The relevant code snippet is below, assuming you only wish to export a single category with a single counter in it:

This results in the following view in the Performance Monitor:

 

ADPlus, Microsoft

(and an introductory session in WinDbg)

ADPlus is a simple VBScript script, which is part of the Debugging Tools for Windows package.  It enables you to obtain a crash or hang dump from an application on a live production system.  ADPlus enables you to generate a dump and analyze it later on the same system (using any debugger, WinDbg included) or on another system, which has symbol information or even source code for the application in question.

You would use ADPlus when you encounter a problem in an application and don’t want to run it under a debugger; instead, you would generate a crash or hang dump for that application and later analyze it in a debugger (on the same or different machine).

This has been extensively covered in dozens of other debugging tutorials (I strongly recommend Tess’ blog or the Debugging Toolbox for this matter), but just for a quick demonstration: assume I have an application that hangs for no apparent reason.  I could use ADPlus to generate a hang dump, with no harm done to the application (as of Windows XP we have the possibility to detach from a debuggee!).  This dump can then be transferred to the developer’s computer or to any other machine with symbols and source code for the application, to facilitate finding the problem.

So here’s the application that hangs:

Yeah, that’s right.  Nothing to see here.  Move on.  Let’s generate a crash dump, by typing the following on a command line in the Debugging Tools for Windows directory:

ADPlus -hang -pn HangHag.exe

We could have used the process id as well.  ADPlus takes dozens of other command line options, accepts script files and XML configuration files – it’s a whole debugging toolbox on its own.  Anyway, let’s move on to the output.  It’s a dump file, generated in the Debugging Tools for Windows directory, so let’s open it up in WinDbg.  In my case, the directory name was Hang_Mode__Date_08-10-2007__Time_11-55-2929.

Inside WinDbg, just choose File -> Open Crash Dump, or click Ctrl+D and navigate to the file you want to open.  If you have symbols for the application and the file opens normally, issue the ~*kb command to see what all the threads were doing at the time the dump was taken.  Here’s the output from this particular case (note it’s a 64-bit system so all the pointers are longer and the output was, unfortunately, line-wrapped):

0:000> ~*kb

. 0 Id: 31c0.2990 Suspend: 1 Teb: 000007ff`fffde000 Unfrozen
RetAddr : Args to Child : Call Site
00000000`77aeed73 : 00000000`00000000 00000000`729ff397 00000000`001b7300 00000000`001b73fd : ntdll!NtWaitForMultipleObjects+0xa
00000000`77aeedf1 : 00000000`00000000 00000000`00000000 00000000`00000001 00000000`00000000 : kernel32!WaitForMultipleObjectsEx+0x10b
*** WARNING: Unable to verify checksum for HangHag.exe
00000001`400011e0 : 00000000`00000000 cccccccc`cccccccc 00000000`00000000 cccccccc`cccccccc : kernel32!WaitForMultipleObjects+0x11
00000001`400017cc : 00000001`00000001 00000000`007ce6d0 00000000`00000000 00000001`400025ce : HangHag!wmain+0xb0 [c:\users\sasha\documents\visual studio 2005\projects\hanghag\hanghag\hanghag.cpp @ 45]
00000001`4000161e : 00000000`00000000 00000000`00000000 00000000`00000000 00000000`00000000 : HangHag!__tmainCRTStartup+0x19c [f:\sp\vctools\crt_bld\self_64_amd64\crt\src\crtexe.c @ 594]
00000000`77aecdcd : 00000000`00000000 00000000`00000000 00000000`00000000 00000000`00000000 : HangHag!wmainCRTStartup+0xe [f:\sp\vctools\crt_bld\self_64_amd64\crt\src\crtexe.c @ 414]
00000000`77d0c6e1 : 00000000`00000000 00000000`00000000 00000000`00000000 00000000`00000000 : kernel32!BaseThreadInitThunk+0xd
00000000`00000000 : 00000000`00000000 00000000`00000000 00000000`00000000 00000000`00000000 : ntdll!RtlUserThreadStart+0x1d

1 Id: 31c0.331c Suspend: 1 Teb: 000007ff`fffdc000 Unfrozen
RetAddr : Args to Child : Call Site
00000000`77d0efc8 : 00000000`00000000 00000000`00000054 00000000`00000000 00000000`00000000 : ntdll!NtWaitForSingleObject+0xa
00000000`77d0ee8b : 00000000`00000000 00000000`00000000 00000000`00000000 00000001`40008190 : ntdll!RtlpWaitOnCriticalSection+0xd8
00000001`40001078 : 00000000`00000000 00000000`00000000 00000000`009aff50 00000000`00000000 : ntdll!RtlEnterCriticalSection+0xf4
00000000`77aecdcd : 00000000`00000000 00000000`00000000 00000000`00000000 00000000`00000000 : HangHag!Thread1+0x48 [c:\users\sasha\documents\visual studio 2005\projects\hanghag\hanghag\hanghag.cpp @ 16]
00000000`77d0c6e1 : 00000000`00000000 00000000`00000000 00000000`00000000 00000000`00000000 : kernel32!BaseThreadInitThunk+0xd
00000000`00000000 : 00000000`00000000 00000000`00000000 00000000`00000000 00000000`00000000 : ntdll!RtlUserThreadStart+0x1d

2 Id: 31c0.2cd4 Suspend: 1 Teb: 000007ff`fffda000 Unfrozen
RetAddr : Args to Child : Call Site
00000000`77d0efc8 : 00000000`00000000 00000000`0000003c 00000000`00000000 00000000`00000000 : ntdll!NtWaitForSingleObject+0xa
00000000`77d0ee8b : 00000000`00000000 00000000`00000000 00000000`00000000 00000001`400081b8 : ntdll!RtlpWaitOnCriticalSection+0xd8
00000001`400010f8 : 00000000`00000000 00000000`00000000 00000000`00aaff50 00000000`00000000 : ntdll!RtlEnterCriticalSection+0xf4
00000000`77aecdcd : 00000000`00000000 00000000`00000000 00000000`00000000 00000000`00000000 : HangHag!Thread2+0x48 [c:\users\sasha\documents\visual studio 2005\projects\hanghag\hanghag\hanghag.cpp @ 27]
00000000`77d0c6e1 : 00000000`00000000 00000000`00000000 00000000`00000000 00000000`00000000 : kernel32!BaseThreadInitThunk+0xd
00000000`00000000 : 00000000`00000000 00000000`00000000 00000000`00000000 00000000`00000000 : ntdll!RtlUserThreadStart+0x1d

I’ve highlighted the interesting things from this output so it will be easier to comprehend.  What we have here is three threads.  The first one looks like the main thread (we have the wmain function on the call stack).  It’s inside WaitForMultipleObjects.  The second and third one look like secondary application threads (entry points called Thread1 and Thread2).  They are both inside RtlEnterCriticalSection of ntdll.dll (which is the forwarded symbol for EnterCriticalSection of kernel32.dll).  Onwards on the stack we see that they are inside NtWaitForSingleObject, so the critical section was not available (if it were available, the threads wouldn’t go wait on a kernel object).

Now it’s time to examine the critical sections in the process.  Fortunately, the great !locks command gives us all the necessary output:

0:000> !locks

CritSec HangHag!g_cs1+0 at 00000001400081b8
WaiterWoken No
LockCount 1
RecursionCount 1
OwningThread 331c
EntryCount 0
ContentionCount 1
*** Locked

CritSec HangHag!g_cs2+0 at 0000000140008190
WaiterWoken No
LockCount 1
RecursionCount 1
OwningThread 2cd4
EntryCount 0
ContentionCount 1
*** Locked

Scanned 111 critical sections

So, all in all there are 111 critical sections in the process, but only two of them matched the application’s symbols so they are displayed here.  These are the g_cs1 and g_cs2 critical sections, and we also have their addresses.  Note that we also have their owning threads, in the OwningThread field: it appears that the first critical section is owned by thread 331c (which is the second thread in the above output) and the second critical section is owned by thread 2cd4 (which is the third thread in the above output).

Now let’s see what critical sections the threads are trying to acquire.  On a 32-bit system, ~*kb would have given us the function parameters because they are on the stack (and so we’ll be able to see the critical sections addresses).  In this 64-bit dump, we have to work a little harder to see the parameters because 64-bit uses a kind of fast-call convention that attempts to pass as much as possible in the registers.  So one possible approach would be to issue the r command, and look for the critical section address in the registers on each thread (we could switch thread contexts between the second and third thread by using the ~1s and ~2s commands).

Another approach is to look at the address of the synchronization object passed to NtWaitForSingleObject, and try match it with the semaphore object in one of the critical sections.  This is a more interesting approach, so let’s see where it gets us.  First of all, on the call stack for thread #1 we can see:

00000000`77d0efc8 : 00000000`00000000 00000000`00000054 00000000`00000000 00000000`00000000 : ntdll!NtWaitForSingleObject+0xa

Aha!  Here’s our synchronization object (it’s a handle, of course).  We can verify it using the !handle command:

0:002> !handle 00000000`00000054 7
Handle 0000000000000054
Type Event
Attributes 0
GrantedAccess 0x100003:
Synch
QueryState,ModifyState
HandleCount 2
PointerCount 4
Name <none>

So it’s an event…  Let’s see if it matches the handle in one of our critical sections we obtained with the !locks command earlier.  We’ll just pass the address of the critical section to the dt ntdll!_RTL_CRITICAL_SECTION command, which will dump the RTL_CRITICAL_SECTION structure.

Here’s the first one:

0:002> dt ntdll!_RTL_CRITICAL_SECTION 00000001400081b8
+0x000 DebugInfo : 0x00000000`77dd94c0 _RTL_CRITICAL_SECTION_DEBUG
+0x008 LockCount : -6
+0x00c RecursionCount : 1
+0x010 OwningThread : 0x00000000`0000331c
+0x018 LockSemaphore : 0x00000000`0000003c
+0x020 SpinCount : 0

Here’s the second one:

0:002> dt ntdll!_RTL_CRITICAL_SECTION 0000000140008190
+0x000 DebugInfo : 0x00000000`77dd94f0 _RTL_CRITICAL_SECTION_DEBUG
+0x008 LockCount : -6
+0x00c RecursionCount : 1
+0x010 OwningThread : 0x00000000`00002cd4
+0x018 LockSemaphore : 0x00000000`00000054
+0x020 SpinCount : 0

There’s our synchronization object!  The field in the data structure is called “LockSemaphore”, while it is in fact an event, but that’s common knowledge about critical sections.  If we repeat the same process for thread #2, we can see that it passes the 00000000`0000003c handle value to NtWaitForSingleObject.  That just happens to match the “LockSemaphore” field for the first critical section we examined.

So let’s put our findings in context:

  • Thread #1 acquires g_cs1 and tries to acquire g_cs2.
  • Thread #2 acquires g_cs2 and tries to acquire g_cs1.

This results in a deadlock.  By the way, what is the main thread (#0 in the display above) trying to do?  This one is a bit harder to decipher (even though we have a direct source code pointer so we can look at the source is we want).  Let’s issue a ~0s command to switch to the main thread, and then a !for_each_frame dv /v command to have a look at the local variables (note that most of the enumeration will fail because we don’t have private symbols for the system code, and without private symbols we won’t be able to see local variables).

0:000> ~0s
ntdll!NtWaitForMultipleObjects+0xa:
00000000`77d1082a c3 ret
0:000> !for_each_frame dv /v
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
00 00000000`0012fcf8 00000000`77aeed73 ntdll!NtWaitForMultipleObjects+0xa
Unable to enumerate locals, HRESULT 0x80004005
Private symbols (symbols.pri) are required for locals.
Type “.hh dbgerr005” for details.
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
01 00000000`0012fd00 00000000`77aeedf1 kernel32!WaitForMultipleObjectsEx+0x10b
Unable to enumerate locals, HRESULT 0x80004005
Private symbols (symbols.pri) are required for locals.
Type “.hh dbgerr005” for details.
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
02 00000000`0012fe10 00000001`400011e0 kernel32!WaitForMultipleObjects+0x11
Unable to enumerate locals, HRESULT 0x80004005
Private symbols (symbols.pri) are required for locals.
Type “.hh dbgerr005” for details.
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
03 00000000`0012fe50 00000001`400017cc HangHag!wmain+0xb0 [c:\users\sasha\documents\visual studio 2005\projects\hanghag\hanghag\hanghag.cpp @ 45]
00000000`0012fec0 argc = 1
00000000`0012fec8 argv = 0x00000000`007ce6d0
00000000`0012fe88 rghThreads = void *[2]
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
04 00000000`0012fec0 00000001`4000161e HangHag!__tmainCRTStartup+0x19c [f:\sp\vctools\crt_bld\self_64_amd64\crt\src\crtexe.c @ 594]
00000000`0012ff00 lock_free = 0x00000000`00000000
00000000`0012fef8 fiberid = 0x00000000`00130000
00000000`0012fef0 nested = 0
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
05 00000000`0012ff30 00000000`77aecdcd HangHag!wmainCRTStartup+0xe [f:\sp\vctools\crt_bld\self_64_amd64\crt\src\crtexe.c @ 414]
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
06 00000000`0012ff60 00000000`77d0c6e1 kernel32!BaseThreadInitThunk+0xd
Unable to enumerate locals, HRESULT 0x80004005
Private symbols (symbols.pri) are required for locals.
Type “.hh dbgerr005” for details.
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
07 00000000`0012ff90 00000000`00000000 ntdll!RtlUserThreadStart+0x1d
Unable to enumerate locals, HRESULT 0x80004005
Private symbols (symbols.pri) are required for locals.
Type “.hh dbgerr005” for details.
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
00 00000000`0012fcf8 00000000`77aeed73 ntdll!NtWaitForMultipleObjects+0xa

The highlighted rghThreads array contains two elements and has the element type void* (which is probably just a HANDLE).  It’s the only interesting local variable in the wmain function, so it’s probably the thing we should be looking at.  Taking a quick look at the memory reveals this:

0:000> dd 00000000`0012fe88
00000000`0012fe88 0000000c 00000000 00000024 00000000

So these indeed look like handles, let’s examine them with !handle:

0:000> !handle c
Handle 000000000000000c
Type Thread
0:000> !handle 24
Handle 0000000000000024
Type Thread

With further investigation it is possible to prove that the main thread is indeed issuing a wait on these two handles (which are the handles for the other two threads in the application, which are deadlocked, as you remember), etc., but for now I am pretty convinced that it’s the case, and for the sake of keeping you awake, we’ll move to the next tool ๐Ÿ™‚

ProcMon, Sysinternals (Microsoft)

ProcMon (short for Process Monitor) is an extremely powerful diagnostic utility that installs a filter driver to monitor registry and file system activity.  The utility also features good filtering capabilities, which allow you to see exactly what each individual process or the system as a whole is doing at every given moment, from the registry and file system aspects.

For example, here’s a screenshot of Process Monitor monitoring notepad.exe, for a brief couple of seconds.  It’s quite obvious that you can get easily overwhelmed with the information, unless you know what you’re looking for.

 

What’s most amazing is that you can actually see a call stack for the event in question, generated at the time of the event (if you have symbols set up correctly, of course).

You can unleash the power of the process monitor to see what valid/invalid accesses to the registry and file system your processes are attempting, to debug cryptic errors coming from processes you can’t debug directly, or even to generate a nightly log of what your machine is doing when you aren’t around ๐Ÿ™‚

For example, when presented with the following dialog box:

… you could start pulling hairs and trying to get the developer responsible for this hung and shot (not necessarily in that order), or you could run this application under Process Monitor, choose to filter only this process, and highlight operations that didn’t result in “SUCCESS”:

There’s a couple of these “NAME NOT FOUND” errors, let’s have a closer look:

There’s your attempt to access a non-existing file, stupid program!

Logger, Microsoft

Logger (API Logger) is very similar in purpose to the Process Monitor, in the sense that it monitors your application’s activity and reports it in a useful UI.  It is different from the Process Monitor because it does not monitor file system and registry activity, but instead it monitors the API calls that your application is making (by injecting a special DLL into the process address space of your application and hooking these API calls).  Note that you can use Logger from within a debugging session in WinDbg, as well as a stand-alone module that generates a useful report.  It’s also part of the Debugging Tools for Windows package.

You run Logger from the command line, specifying the application to monitor as the command line parameter.  You will then get a nice dialog asking you to specify which APIs you would like to monitor, and to break the analysis to particular sections such as memory, registry access etc. (note that you won’t be able to monitor kernel32.dll – this is a major shortcoming of Logger; the reason for it is that Logger itself uses the kernel32.dll functions so it won’t hook them).  After that, you will run your application normally until completion.  Then you’d run Logviewer.exe to see the report, which is stored by default to the Desktop\LogExts directory (actually, I found no way to change this default).  Here’s what the report looks like, when running Logger on notepad.exe:

Yes, that’s right: you see individual APIs being called, with all parameters (parsed from the Windows header files), return values and even a highlighting when one of the APIs returns an error value.  Definitely a useful utility!

GFlags, Microsoft

GFlags (also a part of the Debugging Tools for Windows package) is a simple UI application allowing you to modify the NtGlobalFlags kernel variable, affecting the behavior of various operating system components.  It also has a set of flags for a particular image on the system (“Image Execution Options”), for example allowing you to specify that a particular image always has to be launched under a particular debugger.  Bear in mind that using most of these flags adversely affects system performance, so you should only have them enabled for the duration of a single troubleshooting session.

Here are some interesting flags that you can set using GFlags:

  • Kernel-wide settings (as of Windows Vista, some don’t require a system restart)
    • Stop the system (bug check, blue-screen) on a user-mode exception
    • Maintain a list of objects for each type (facilitates finding object leaks)
    • Object reference tracing (includes a detailed call stack for every handle or reference creation and destruction, facilititates finding double-release or object leak patterns)
    • Heap-verification flags
  • Per-image settings
    • Heap-verification flags
    • Always execute the image under the specified debugger

Here’s a simple setting that demonstrates the “power” of GFlags.  Open GFlags, and navigate to the “Image File” tab.  Type “notepad.exe” into the edit box, and hit TAB.  Now type “calc.exe” into the debugger edit box below, and save the settings.  Your screen should like like the following:

Now, try launching Notepad by any possible means (from the Start menu, from the command line, from the Run… applet).  Anything you do should bring up the Calculator.  A great trick to play on your colleagues (note that this survives restarts and application reinstallations – it’s just a registry setting!).

Application Verifier, Microsoft

Application Verifier is a verification tool for user-mode applications.  It enables checking a particular application to detect compatibility, performance and correctness issues, at the cost of degraded performance for the course of the verification session.  Quite like the Logger utility, it injects its DLL into the application’s address space to actively monitor for problems; unlike the Logger utility, it can actively change the state of the application (e.g. simulating a low memory situation) and diagnose common pitfalls.

Application Verifier is extremely useful for simulating a low-resource or low-privileges environment for your application to ensure that it still functions correctly and doesn’t cause data loss; for diagnosing resource leaks and synchronization behavior; for diagnostic application compatibility with future or past versions of the OS by simulating a newer or older version of the OS.

For example, let’s configure Application Verifier to create a low resource simulation for our all-time favorite, Notepad.  The UI is rather intuitive, so here’s the end result:

After saving the settings, you’d simply run Notepad.  Interestingly, on my machine, Notepad simply crashes (!) immediately after starting up!  When you look at the Application Verifier logs for Notepad, nothing shows up, as the following screenshot shows:

Let’s try something else entirely.  Application Verifier features a whole set of “basic” checks that monitor API usage, memory, handles and objects, etc.  Let’s run a sample application with these verifications enabled, and see what it has to say…

First of all, when you enable verifications that will cause an exception in a debugger, you need to run the application with a debugger attached (preferably WinDbg, as it recognizes the Application Verifier’s stop codes and extensions).  The Application Verifier will warn you if that’s the case:

So let’s launch it in the debugger.  Immediately after starting the executable and passing the initial (loader) breakpoint, I get a breakpoint from the Application Verifier:

0:000> g

=======================================
VERIFIER STOP 0000000000000305 : pid 0x27AC: Incorrect object type for handle.

0000000000000084 : Handle value.
000000000012FE08 : Object type name. Use du to display it
0000000073023C38 : Expected object type name. Use du to display it
0000000000000000 : Not used.

=======================================
This verifier stop is continuable.
After debugging it use `go’ to continue.

=======================================

(27ac.3860): Break instruction exception – code 80000003 (first chance)
ntdll!DbgBreakPoint:
00000000`77d0fdf0 cc int 3

So, I guess I’m using a handle to the incorrect type of object when calling some kind of API…  Let’s see some details:

0:000> du 000000000012FE08
00000000`0012fe08 “Mutant”
0:000> du 0000000073023C38
00000000`73023c38 “Event”
0:000> !handle 0000000000000084
Handle 84
Type Mutant

So the type I passed was a mutant (mutex), and the type expected was an event.  The call stack itself reveals the API (and also the hooking that Application Verifier performs):

0:000> k
Child-SP RetAddr Call Site
00000000`0012f9b8 00000000`756737f2 ntdll!DbgBreakPoint
00000000`0012f9c0 00000000`73032ae3 vrfcore!VerifierStopMessageEx+0x7c6
00000000`0012fce0 00000000`7302a08c vfbasics!VfBasicsStopMessage+0x137
00000000`0012fd40 00000000`7302a109 vfbasics!AVrfpCheckObjectType+0xe4
00000000`0012fe40 00000000`7302a3a4 vfbasics!AVrfpHandleSanityChecks+0x51
00000000`0012fea0 00000000`77aeec1c vfbasics!AVrfpNtSetEvent+0x34
00000000`0012fed0 00000001`4000101f kernel32!SetEvent+0xc
00000000`0012ff00 00000001`40001200 AnnoyingProgram!wmain+0x1f [c:\users\sasha\documents\visual studio 2005\projects\annoyingprogram\annoyingprogram\annoyingprogram.cpp @ 14]
00000000`0012ff30 00000000`77aecdcd AnnoyingProgram!__tmainCRTStartup+0x120 [f:\sp\vctools\crt_bld\self_64_amd64\crt\src\crtexe.c @ 594]
00000000`0012ff60 00000000`77d0c6e1 kernel32!BaseThreadInitThunk+0xd
00000000`0012ff90 00000000`00000000 ntdll!RtlUserThreadStart+0x1d

Very nice, Application Verifier! ๐Ÿ™‚  Note that it really can detect a whole lot of conditions that might require lots of manual labor to diagnose correctly…  Here’s a snippet from the documentation (it’s just a partial list!):

Application Compatibility Toolkit 5.0 (a.k.a. Standard User Analyzer), Microsoft

ACT enables you to determine whether your application will function properly (is compatible) with a new version of Windows.  Formerly, the ACT was called the Standard User Analyzer, because it would mostly determine whether the application can run under a standard (non-administrative) account.  Currently, the Standard User Analyzer is just one part of ACT (other parts include, for example, an Internet Explorer Analysis Tool, a Setup Analysis Tool, and a management interface for all application compatibility concerns).

The minor (not really minor!) problem with ACT is that it can only be installed on 32-bit versions of Windows (as of right now).  Therefore, my demo of this tool will be from my Windows 2003 Server (32-bit) Virtual PC; I will only show you the Standard User Analyzer portion.

You’d launch the application from within the Standard User Analyzer (SUA), use it like you always do, and then close it to see the reports from within the SUA UI.  In this example, I have launched the GFlags application and modified one of the system-wide settings.

Let’s see the issues SUA was able to find in this case:

The SUA specifically (and the ACT tools in general) should definitely be used by developers willing to deploy their applications on the new Vista and 2008 Server Windows operating systems.  Issues such as UAC (User Account Control) and other security features will take you down unless you plan for them and analyze your application scrupulously.

Driver Verifier, Microsoft

The Driver Verifier (which is included with any Windows installation as the Verifier.exe application) is a driver verification utility used by Microsoft and device driver developers for driver testing and debugging purposes.  It can also be used on a live system to troubleshoot problematic drivers and catch errors originating from these drivers before they have a chance to corrupt system state and make diagnosing the problem impossible.  Driver Verifier features various verification options, all of which require a restart before having effect (this is due to the fact Driver Verifier replaces driver calls to kernel routines with its own versions of these routines, that perform additional verification, e.g. VerifierExAllocatePool; this idea should already be familiar from the application verifier).

Driver Verifier features the following kinds of useful diagnostics:

  • Diagnosing executive pool corruption by enabling the Special Pool option.  This causes every allocation performed by the driver to be “wrapped” by a couple of invalid pages and a signature (“cookie”).  If a driver corrupts memory, it will be caught at the moment of corruption, and not at some later point when another components attempts to use the corrupted memory.
  • Enforcing IRQL verification for a driver to ensure that it isn’t accessing paged memory at a high IRQL, or forgetting to lower the IRQL after raising it.
  • Detecting deadlocks a driver or a group of drivers are causing (a deadlock at a high IRQL effectively results in a non-responsive systme, because no user-mode threads can run).
  • Simulating a low-resource environment for a driver to ensure that it behaves correctly if memory allocations are failing (it should fail I/O requests, but not corrupt state or crash the entire system).

I find Driver Verifier quite useful even for “home” diagnostics of bug-checks, the infamous blue-screen condition that occurs when Windows can’t proceed normally and must shut down to prevent further corruption.  I’ll quickly demonstrate how Driver Verifier can be useful by using the Notmyfault utility by Mark Russinovich, which is essentially a UI front-end to a faulty driver that can perform various malicious operations.

So let’s begin with opening Notmyfault and asking it to crash the system through a user-mode high IRQL.  This means that Notmyfault will return to user-mode from a DeviceIoControl call that raises the IRQL within the driver’s code, but doesn’t lower it.  At a later point, an operation in user-mode will trigger a bug-check because the IRQL is high but paging or a context switch will occur.  (If you look at the code, it simply calls KeRaiseIrql and returns from the dispatch routine.)

The system doesn’t crash immediately; I can actually move the mouse and open an application or two (that’s the beauty of non-deterministic crashes), until it eventually crashes with the following nice blue-screen:

Note that the component that takes the blame is Win32k.sys, which the windowing and graphics driver.  It is a core part of Windows, so the odds of it being responsible for the crash are rather slim (especially since we know that Notmyfault.exe with its driver are responsible), but let’s have a look at the generated crash dump in WinDbg:

Microsoft (R) Windows Debugger Version 6.7.0005.0
Copyright (c) Microsoft Corporation. All rights reserved.

Loading Dump File [D:\Temp\MEMORY.DMP]
Kernel Complete Dump File: Full address space is available

Symbol search path is: srv*C:\Windows\Symbols*http://msdl.microsoft.com/download/symbols
Executable search path is:
Windows Server 2003 Kernel Version 3790 (Service Pack 1) UP Free x86 compatible
Product: Server, suite: Enterprise TerminalServer SingleUserTS
Built by: 3790.srv03_sp1_rtm.050324-1447
Kernel base = 0x80800000 PsLoadedModuleList = 0x808a8e48
Debug session time: Sat Aug 11 13:32:39.628 2007 (GMT+3)
System Uptime: 0 days 0:02:57.435
Loading Kernel Symbols
…………………………………………………………………………………
Loading User Symbols
………..
Loading unloaded module list
….
*******************************************************************************
* *
* Bugcheck Analysis *
* *
*******************************************************************************

Use !analyze -v to get detailed debugging information.

BugCheck D1, {e102b000, 2, 1, bf87aa1c}

*** ERROR: Module load completed but symbols could not be loaded for NotMyfault.exe
Probably caused by : win32k.sys ( win32k!PALLOCMEM+2b )

Followup: MachineOwner
———

Even !analyze -v points at Win32k.sys:

kd> !analyze -v
*******************************************************************************
* *
* Bugcheck Analysis *
* *
*******************************************************************************

DRIVER_IRQL_NOT_LESS_OR_EQUAL (d1)
An attempt was made to access a pageable (or completely invalid) address at an
interrupt request level (IRQL) that is too high. This is usually
caused by drivers using improper addresses.
If kernel debugger is available get stack backtrace.
Arguments:
Arg1: e102b000, memory referenced
Arg2: 00000002, IRQL
Arg3: 00000001, value 0 = read operation, 1 = write operation
Arg4: bf87aa1c, address which referenced memory

Debugging Details:
——————

WRITE_ADDRESS: e102b000 Paged pool

CURRENT_IRQL: 2

FAULTING_IP:
win32k!PALLOCMEM+2b
bf87aa1c f3ab rep stos dword ptr es:[edi]

DEFAULT_BUCKET_ID: INTEL_CPU_MICROCODE_ZERO

BUGCHECK_STR: 0xD1

PROCESS_NAME: NotMyfault.exe

TRAP_FRAME: f4114e10 — (.trap 0xfffffffff4114e10)
ErrCode = 00000002
eax=00000000 ebx=00003f34 ecx=00000fcd edx=e102b000 esi=00003f34 edi=e102b000
eip=bf87aa1c esp=f4114e84 ebp=f4114e8c iopl=0 nv up ei pl zr na pe nc
cs=0008 ss=0010 ds=0023 es=0023 fs=0030 gs=0000 efl=00010246
win32k!PALLOCMEM+0x2b:
bf87aa1c f3ab rep stos dword ptr es:[edi] es:0023:e102b000=????????
Resetting default scope

LAST_CONTROL_TRANSFER: from bf87aa1c to 80826493

STACK_TEXT:
f4114e10 bf87aa1c badb0d00 e102b000 000007ff nt!KiTrap0E+0x2a1
f4114e8c bf86d688 00003f34 35306847 f4114f10 win32k!PALLOCMEM+0x2b
f4114ea8 bf87635d 00003f34 00000005 00000001 win32k!AllocateObject+0x9a
f4114edc bf83f6d1 0000037c 00000000 00000000 win32k!SURFMEM::bCreateDIB+0x1ca
f4114f40 bf80b866 e15fc1d8 f4114f68 f4114f88 win32k!psSetupDstSurface+0x70
f4114fa0 bf80ba0c 00000000 f41150b4 00000000 win32k!EngGradientFill+0xc6
f4114ff4 bf80bab6 bf80b7d1 f4115070 e15fc1e8 win32k!OffGradientFill+0x92
f4115078 bf80af68 e15fc1e8 f41150b4 00000000 win32k!SpGradientFill+0x5e
f411518c bf80b6c5 00000000 f41151ac 00000002 win32k!GreGradientFill+0x44d
f41151ec bf80b20c 01010054 f4115514 00808080 win32k!FillGradient+0x75
f4115204 bf89e2f9 01010054 f4115514 00000000 win32k!FillCaptionGradient+0x5d
f41154bc bf89da89 bc639250 01010054 f4115514 win32k!xxxDrawCaptionTemp+0x384
f4115540 bf8a89f0 bc639250 01010054 0000100c win32k!xxxDrawCaptionBar+0x712
f4115564 bf872390 bc639250 0000100c 00000000 win32k!xxxDWP_DoNCActivate+0xc7
f41155c4 bf887242 bc639250 00000086 00000000 win32k!xxxRealDefWindowProc+0x76b
f41155dc bf887272 bc639250 00000086 00000000 win32k!xxxWrapRealDefWindowProc+0x16
f41155f8 bf8871f2 bc639250 00000086 00000000 win32k!NtUserfnDWORD+0x27
f4115630 8082337b 000100a0 00000086 00000000 win32k!NtUserMessageCall+0xc0
f4115630 7c82ed54 000100a0 00000086 00000000 nt!KiFastCallEntry+0xf8
0012fb38 7739ca95 773965fb 000100a0 00000086 ntdll!KiFastSystemCallRet
0012fb8c 7739637e 000100a0 00000086 00000000 USER32!NtUserMessageCall+0xc
0012fba8 773963e8 000100a0 00000086 00000000 USER32!RealDefWindowProcA+0x47
0012fbf0 00401980 000100a0 00000086 00000000 USER32!DefWindowProcA+0x72
WARNING: Stack unwind information not available. Following frames may be wrong.
0012fc14 00401980 000100a0 00000200 00000000 NotMyfault+0x1980
00000000 00000000 00000000 00000000 00000000 NotMyfault+0x1980

STACK_COMMAND: kb

FOLLOWUP_IP:
win32k!PALLOCMEM+2b
bf87aa1c f3ab rep stos dword ptr es:[edi]

SYMBOL_STACK_INDEX: 1

SYMBOL_NAME: win32k!PALLOCMEM+2b

FOLLOWUP_NAME: MachineOwner

MODULE_NAME: win32k

IMAGE_NAME: win32k.sys

DEBUG_FLR_IMAGE_TIMESTAMP: 434471b4

FAILURE_BUCKET_ID: 0xD1_W_win32k!PALLOCMEM+2b

BUCKET_ID: 0xD1_W_win32k!PALLOCMEM+2b

Followup: MachineOwner
———

So what do we do if this doesn’t make much sense?  We could have Driver Verifier configured on the system to track the IRQL for a particular driver.  It is normally recommended to enable the Driver Verifier for suspect (new or updated) drivers first, then for drivers that haven’t been built with your version of Windows, then for unsigned drivers, and finally for all drivers (probably in groups of 5-10 drivers so as to keep performance reasonable).  In our particular case, I know the problematic driver in advance, so I enable the “Force IRQL Checking” flag for the Myfault.sys driver.  After having done that, I reboot the system and ensure that my settings were propagated by asking the Driver Verifier to show me the list of drivers that are being verified:

Now I open Notmyfault again and choose the same bug.  A blue-screen occurs immediately this time, without any delay, and even the blue-screen already pinpoints the problem:

Note that this time, the explanation shows that a driver being verified has caused the problem.  This is great news: we have caught a driver red-handed!  Now let’s look at the crash dump and we’ll have the full picture:

kd> !analyze -v
*******************************************************************************
* *
* Bugcheck Analysis *
* *
*******************************************************************************

DRIVER_VERIFIER_IOMANAGER_VIOLATION (c9)
The IO manager has caught a misbehaving driver.
Arguments:
Arg1: 00000005, Irql not equal across call to the driver dispatch routine
Arg2: 823058d0, the device object associated with the offending driver
Arg3: 00000000, the Irql before the call
Arg4: 00000002, the Irql after the call

Debugging Details:
——————

BUGCHECK_STR: 0xc9_5

DRIVER_VERIFIER_IO_VIOLATION_TYPE: 5

PREVIOUS_IRQL: 0

CURRENT_IRQL: 2

DEVICE_OBJECT: 823058d0

DRIVER_OBJECT: 820081a0

DEBUG_FLR_IMAGE_TIMESTAMP: 0

FAULTING_MODULE: f7a5a000 myfault

DEFAULT_BUCKET_ID: CODE_CORRUPTION

PROCESS_NAME: NotMyfault.exe

LAST_CONTROL_TRANSFER: from 809cc5c0 to 80875d0e

STACK_TEXT:
b7d13c04 809cc5c0 000000c9 00000005 823058d0 nt!KeBugCheckEx+0x1b
b7d13c40 80853648 8090b989 b7d13c60 8090b989 nt!IovCallDriver+0x155
b7d13c4c 8090b989 832b4fd8 82095038 832b4f68 nt!IofCallDriver+0x13
b7d13c60 8090eb17 823058d0 832b4f68 82095038 nt!IopSynchronousServiceTail+0x10b
b7d13d00 8092e81d 00000064 00000000 00000000 nt!IopXxxControlFile+0x605
b7d13d34 8082337b 00000064 00000000 00000000 nt!NtDeviceIoControlFile+0x2a
b7d13d34 7c82ed54 00000064 00000000 00000000 nt!KiFastCallEntry+0xf8
0012f9c4 7c8213e4 77e416f1 00000064 00000000 ntdll!KiFastSystemCallRet
0012f9c8 77e416f1 00000064 00000000 00000000 ntdll!NtDeviceIoControlFile+0xc
0012fa2c 004018c2 00000064 83360008 00000000 kernel32!DeviceIoControl+0x137
WARNING: Stack unwind information not available. Following frames may be wrong.
0012fa6c 773957f9 01010055 005a0049 0012fa94 NotMyfault+0x18c2
0012faf8 77393422 01010055 00000043 0110007a USER32!ClientFrame+0xe0
0012fb1c 773a239c 01010055 0012fb38 00000000 USER32!DrawFocusRect+0x40
0012fb88 773ab490 7739c3b7 000400a0 00000111 USER32!xxxBNDrawText+0x3e9
0012fbc0 000400a0 00000111 00000001 000400a6 USER32!xxxDrawButton+0xbb
00401640 55530000 3c24ac8b 56000001 24bc8b57 0x400a0
00401644 3c24ac8b 56000001 24bc8b57 00000140 0x55530000
00401648 56000001 24bc8b57 00000140 0111fd81 0x3c24ac8b
0040164c 24bc8b57 00000140 0111fd81 870f0000 0x56000001
00401650 00000000 0111fd81 870f0000 00000263 0x24bc8b57

STACK_COMMAND: kb

CHKIMG_EXTENSION: !chkimg -lo 50 -d !hal
80a6d8f4-80a6d8fa 7 bytes – hal!KfRaiseIrql+24
[ e6 21 c1 e8 08 e6 a1:66 e7 21 c1 e8 08 90 ]
80a6d925-80a6d92b 7 bytes – hal!KeRaiseIrqlToSynchLevel+d (+0x31)
[ e6 21 c1 e8 08 e6 a1:90 90 90 90 90 90 90 ]
80a6d95b-80a6d961 7 bytes – hal!KfLowerIrql+1b (+0x36)
[ e6 21 c1 e8 08 e6 a1:66 e7 21 c1 e8 08 90 ]
80a6da87-80a6da8d 7 bytes – hal!HalEndSystemInterrupt+1b (+0x12c)
[ e6 21 c1 e8 08 e6 a1:66 e7 21 c1 e8 08 90 ]
80a725a2-80a725a8 7 bytes – hal!HalBeginSystemInterrupt+5e (+0x4b1b)
[ e6 21 c1 e8 08 e6 a1:66 e7 21 c1 e8 08 90 ]
80a725b6-80a725bf 10 bytes – hal!HalBeginSystemInterrupt+72 (+0x14)
[ b0 20 e6 a0 b0 62 e6 20:66 b8 20 62 66 e7 a0 90 ]
80a725f5-80a725fb 7 bytes – hal!HalDisableSystemInterrupt+21 (+0x3f)
[ e6 21 c1 e8 08 e6 a1:66 e7 21 c1 e8 08 90 ]
80a7266e-80a72674 7 bytes – hal!HalEnableSystemInterrupt+6a (+0x79)
[ e6 21 c1 e8 08 e6 a1:66 e7 21 c1 e8 08 90 ]
59 errors : !hal (80a6d8f4-80a72674)

MODULE_NAME: memory_corruption

IMAGE_NAME: memory_corruption

FOLLOWUP_NAME: memory_corruption

MEMORY_CORRUPTOR: LARGE

FAILURE_BUCKET_ID: MEMORY_CORRUPTION_LARGE

BUCKET_ID: MEMORY_CORRUPTION_LARGE

Followup: memory_corruption
———

We caught the driver red-handed, and it was the myfault driver, as you can see highlighted above.

That’s it for now …

I might follow this up with a couple of additional tools that I haven’t covered here, such as WMI-related tools (the WMI Console, the WMI Code Creator), PoolMon, KernRate and others.  Then again I might not follow-up, because I’ve written so much my wrists hurt ๐Ÿ™‚

Add comment
facebook linkedin twitter email

Leave a Reply

Your email address will not be published.

You may use these HTML tags and attributes: <a href="" title=""> <abbr title=""> <acronym title=""> <b> <blockquote cite=""> <cite> <code> <del datetime=""> <em> <i> <q cite=""> <s> <strike> <strong>

*

2 comments

  1. adlaimaAugust 15, 2007 ื‘ 6:50 AM

    I think it a bit of a miss that you didn’t tell about the memory leak that “Process Explorer” has.
    Or in short , don’t leave it runing on a production machine longer then 2 hours just because you want to see the progress bar over these two hours !
    I know it’s usefull and some times fasinating , but heck !

    Reply
  2. SaurabhJanuary 27, 2009 ื‘ 4:46 PM

    This is a great stuff….keep writing such articles. Thanks for your efforts.

    Reply