May 2011 - Posts
As part of the Windows Internals course at SELA, I recently designed a set of exercises that serve as an introduction to Windows device driver development. Their purpose is to provide a cursory familiarity with what it takes to build, deploy, and load a driver, and to explore some of the things available to kernel-mode components that make them way cooler than user-mode applications.
Some of this work can easily be turned into a series of blog posts, which you can enjoy outside the context of the course. However, if you’re looking for background on Windows subsystems and components, what it means to deliver DPCs and interrupts, how IRQLs limit driver execution, why threads are scheduled the way they are, how synchronization mechanisms work, how memory is allocated and how memory addresses are translated, and many other extremely important details on how Windows works, then Windows Internals is the course for you. (And so is “the book”: Windows Internals, 5th Edition.)
First and foremost, you need to set up an environment in which you will build, deploy, and load your driver. We will use a host machine on which we build and debug, and a target virtual machine to which the driver is deployed. My own setup is a Windows 7 64-bit physical host and a Windows XP 32-bit target VM running in VMware Workstation.
- Download the Debugging Tools for Windows (both 32- and 64-bit editions) and install them on the host machine.
- Download the Windows Driver Kit and install it on the host machine.
- Set up a virtual machine running Windows XP or a newer version. (The instructions below are applicable to Windows XP or Windows Server 2003, 32-bit editions.)
- Go to My Computer | Properties | Advanced and choose Startup and Recovery | Settings | Edit (this opens boot.ini in Notepad). You should see an OS boot entry similar to the following:
multi(0)disk(0)rdisk(0)partition(1)\WINDOWS="Windows XP Professional" /fastdetect
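- Append /debug /debugport=com1 to that line and save the file. The entry should now read:
multi(0)disk(0)rdisk(0)partition(1)\WINDOWS="Windows XP Professional" /fastdetect /debug /debugport=com1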
- In the virtual machine’s settings, redirect the COM1 port to a named pipe called \\.\pipe\com_1. (This step depends on your virtualization product. For example, in VMware Workstation you will need to add a new serial port that uses a named pipe.)
- Launch WinDbg and select File | Kernel Debug. On the COM tab, check Pipe and set Port to \\.\pipe\com_1, the pipe you configured in the previous step.
- Click “OK”.
- Start the virtual machine and make sure the debugger connection is established. You can press Ctrl+Break in WinDbg to break into the virtual machine, and then use any kernel-mode WinDbg command. (Try !process 0 0 for a process list.)
- Register at OSR Online, download the OSR Driver Loader, and copy the appropriate OSRLoader.exe to your target virtual machine. (For example, Windows XP 32-bit free build version is in the WXP\i386\FRE subdirectory of the ZIP archive you downloaded.)
In the next part, we will compile our first driver and load it onto the system using OSR Driver Loader.
During the last week of June, Sela is going to host 25 one-day sessions over 5 days, packed with the latest and greatest Microsoft technologies, agile and ALM tips, debugging and troubleshooting, cloud and web. This mini-conference, dubbed Sela Developer Days, opens for registration on Sunday, and I encourage you to take a look at the conference website to see which sessions will be available.
Yours truly is scheduled for four sessions. Instead of rehashing the abstracts from the conference website, here’s some more information on what I intend to do with my four days. I’ll be very happy to see you there, and if you have any questions feel free to use the contact form.
.NET Debugging
This one-day session is going to be very similar to what I delivered at the SDP a few months ago. The classroom was packed, and we discussed in detail how to troubleshoot some fairly advanced issues like deadlocks and memory leaks using WinDbg and SOS. If you’re an experienced .NET developer but haven’t had the chance to use troubleshooting tools outside of Visual Studio or to debug production issues and analyze dump files, this day is for you.
C++ Debugging
This one-day session is based on the new two-day C++ Debugging course I recently developed at Sela. The course has been delivered a few times already and I consider it a great success: even very experienced C++ developers discover new debugging tools and techniques, and gain a better understanding of how their application integrates with the system, through the course’s set of hands-on labs. In the one-day setting, I’m going to dive right into the actual debugging scenarios, which include memory leaks, deadlocks, stack and heap corruptions, and much more.
Improving the Performance of .NET Applications
This one-day session is an attempt to give .NET developers the tools and techniques for measuring application performance, but also the necessary understanding of .NET internals to improve performance and design high-performance applications. We will look at various profilers, discuss the inner workings of the GC, and see several scenarios where application performance can be improved without a major rewrite or redesign.
Windows Internals for Busy Developers
This one-day session is based on one of my favorite Sela courses, Windows Internals for Developers. Understanding how the operating system works and what can affect the performance and reliability of your applications is crucial for writing great software on Windows. Windows is also a huge software product, and we can learn a lot from its architecture and design—the memory manager, the thread scheduler, and the implementation of synchronization mechanisms all offer interesting lessons that can be applied in your applications; and the tools we’re going to use to unveil the secrets of Windows are useful for troubleshooting and understanding your own software as well.
Dima brought to my attention a nasty bug, probably caused by memory corruption. The bug usually manifests as an access violation in a completely unrelated piece of code, often resulting in an ExecutionEngineException.
This is an example of an access violation of the above variety (some of the output was snipped for brevity):
0:004> .loadby sos clr
0:004> g
(510.c88): Access violation - code c0000005 (first chance)
First chance exceptions are reported before any exception handling.
This exception may be expected and handled.
00742a11 8b4028 mov eax,dword ptr [eax+28h] ds:002b:0000002c=????????
0:000> !CLRStack
OS Thread Id: 0xc88 (0)
Child SP IP Call Site
004bedb8 00742a11 OverlappingObjects.Program.Main(System.String[]) [\OverlappingObjects\Program.cs @ 51]
004beff0 724221bb [GCFrame: 004beff0]
0:000> k
ChildEBP RetAddr
WARNING: Frame IP not in any known module. Following frames may be wrong.
004bedc4 724221bb 0x742a11
004bedd4 72444be2 clr!CallDescrWorker+0x33
004bee50 72444d84 clr!CallDescrWorkerWithHandler+0x8e
…
0:000> !u 0x742a11
Normal JIT generated code
OverlappingObjects.Program.Main(System.String[])
Begin 00742980, size d7
…
\OverlappingObjects\Program.cs @ 51:
00742a0c mov ecx,dword ptr [esi+4]
00742a0f mov eax,dword ptr [ecx]
>>> 00742a11 mov eax,dword ptr [eax+28h]
00742a14 call dword ptr [eax+8]
Hmm. The exception seems to be happening when calling a virtual function on an object stored in the ECX register. From inspecting the source code, this is a virtual function call to GetHashCode.
If this generates an access violation, ECX must be some invalid pointer. (Although if this were a simple null reference exception, we would fail at the previous instruction when trying to dereference ECX.)
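The exception line already tells us what EAX held: the failing instruction reads [eax+28h] and the faulting address is 0000002c, so EAX was 0x2c - 0x28 = 4. To see how a bogus first dword in the "object" turns into exactly that address, the C++ sketch below replays the four loads from the !u listing against a toy memory map. It is a model, not CLR code; it assumes that ESI holds the ReferenceHolder instance (026ae480, the address !dumpheap reports further down) and that only the dwords placed in the map are readable.

```cpp
#include <cassert>
#include <cstdint>
#include <map>

// Toy 32-bit address space: only dwords present in the map are readable;
// reading any other address is treated as an access violation.
using Memory = std::map<std::uint32_t, std::uint32_t>;

struct Replay {
    bool faulted;           // true if one of the loads hit unreadable memory
    std::uint32_t address;  // the faulting address, or the call target if no fault
};

// Replays the loads performed by the listing at 00742a0c..00742a14:
//   mov  ecx,dword ptr [esi+4]    ; load holder.TheReference (assuming ESI = holder)
//   mov  eax,dword ptr [ecx]      ; first dword of the object, normally its method table pointer
//   mov  eax,dword ptr [eax+28h]  ; pointer stored at offset 0x28 of the method table
//   call dword ptr [eax+8]        ; read the call target
// Each step reads the dword at (previous value + displacement).
Replay replayVirtualCall(const Memory& mem, std::uint32_t esi) {
    const std::uint32_t displacements[] = {0x4, 0x0, 0x28, 0x8};
    std::uint32_t value = esi;
    for (std::uint32_t disp : displacements) {
        const std::uint32_t address = value + disp;
        auto it = mem.find(address);
        if (it == mem.end()) return {true, address};
        value = it->second;
    }
    return {false, value};
}
```

With TheReference = 026ae494 and a first dword of 4 at that address, replayVirtualCall faults at 0x2c, matching the exception line.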
0:000> r ecx
ecx=026ae494
0:000> !do 026ae494
<Note: this object has an invalid CLASS field>
Invalid object
0:000> !gcwhere 026ae494
Address Gen Heap segment
026ae494 2 0 026a0000
Well, this is no null pointer. ECX is a reference into the GC heap, but for some reason calling a method through it fails. What does the method table look like?
0:000> dd 026ae494 L2
026ae494 00000004 00000000
Ow! This is not a method table. So far we have a reference into the GC heap that is not actually a reference to a valid object, so an attempt to call a virtual function on it fails. Let’s look around for some instances:
0:000> !dumpheap -type Overlapping
Address MT Size
026ae480 00373534 16
total 0 objects
Statistics:
MT Count TotalSize Class Name
00373534 1 16 OverlappingObjects.ReferenceHolder
Total 1 objects
0:000> !do 026ae480
Name: OverlappingObjects.ReferenceHolder
MethodTable: 00373534
EEClass: 0062126c
Size: 16(0x10) bytes
Fields:
MT Field Offset Type VT Attr Value Name
0060c12c 4000003 8 System.UInt32 1 instance 3405695742 Marker
003734b0 4000004 4 ...bjects.SomeObject 0 instance 026ae494 TheReference
0:000> !do 026ae494
<Note: this object has an invalid CLASS field>
Invalid object
0:000> !dumpheap 026ae494-100 026ae494+100
Address MT Size
026ae480 00373534 16
026ae490 001cdb38 16 Free
026ae4a0 005dc838 4012
total 0 objects
Statistics:
MT Count TotalSize Class Name
00373534 1 16 OverlappingObjects.ReferenceHolder
001cdb38 1 16 Free
005dc838 1 4012 System.Int32[]
Total 3 objects
What have we here? Our object lies in the range [026ae490…026ae4a0) which is attributed to a free object; i.e. the GC has reclaimed this memory for other uses (and we’re lucky not to see some other object already in this space!).
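The range reasoning above is simple interval arithmetic over the !dumpheap listing: find the entry whose [address, address + size) range contains the suspect reference, and compute the offset into it. A minimal C++ sketch of that lookup, fed with the three entries from the listing; the type and function names are illustrative and not part of SOS:

```cpp
#include <cassert>
#include <cstdint>
#include <optional>
#include <string>
#include <vector>

// One line of !dumpheap output: address, size in bytes, and the class name (or "Free").
struct HeapEntry {
    std::uint32_t address;
    std::uint32_t size;
    std::string name;
};

// Where an arbitrary address lands relative to the heap entries.
struct Location {
    const HeapEntry* entry;  // entry whose [address, address + size) range contains it
    std::uint32_t offset;    // distance from the start of that entry
};

// Linear scan over the entries; returns std::nullopt if no entry covers addr.
std::optional<Location> findContaining(const std::vector<HeapEntry>& heap, std::uint32_t addr) {
    for (const HeapEntry& e : heap) {
        if (addr >= e.address && addr - e.address < e.size) {
            return Location{&e, addr - e.address};
        }
    }
    return std::nullopt;
}
```

Looking up 026ae494 in those entries lands in the 16-byte Free entry at offset 4, which is the four-byte shift discussed next.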
Moreover, the reference we have is at a four-byte offset from the object’s former resting place—and we obtain this reference from a valid instance of the ReferenceHolder class. Now here is a likely scenario that explains this turn of events:
- The ReferenceHolder instance contains the only reference to our object.
- For some reason, e.g. a random memory overwrite, the reference is bumped four bytes forward, so it no longer references a valid object.
- The GC runs and there is no longer a valid reference to our object, so its memory is reclaimed.
- The ReferenceHolder instance still thinks it has a valid reference to our object, and when a method call on that reference is attempted, we get the nice access violation.
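The scenario above hinges on one fact: after the shift, nothing refers to the object's start address, so the mark phase never reaches it. A toy mark-and-sweep in C++ shows just that. It is deliberately simplistic and is not a model of the CLR's collector, which works very differently; the only assumption it shares with the scenario is that the shifted reference 026ae494 no longer identifies the object at 026ae490.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <set>
#include <vector>

// Toy heap: object start address -> the raw addresses stored in its reference fields.
using Heap = std::map<std::uint32_t, std::vector<std::uint32_t>>;

// Toy mark phase. A reference keeps an object alive only if it holds the
// object's exact start address; any other value marks nothing.
std::set<std::uint32_t> markReachable(const Heap& heap, const std::vector<std::uint32_t>& roots) {
    std::set<std::uint32_t> marked;
    std::vector<std::uint32_t> pending(roots);
    while (!pending.empty()) {
        const std::uint32_t ref = pending.back();
        pending.pop_back();
        auto it = heap.find(ref);
        if (it == heap.end() || !marked.insert(ref).second) continue;  // not an object start, or already marked
        for (std::uint32_t child : it->second) pending.push_back(child);
    }
    return marked;
}

// Toy sweep: start addresses of all unmarked objects, i.e. memory the GC would reclaim.
std::vector<std::uint32_t> reclaimed(const Heap& heap, const std::set<std::uint32_t>& marked) {
    std::vector<std::uint32_t> result;
    for (const auto& entry : heap) {
        if (marked.count(entry.first) == 0) result.push_back(entry.first);
    }
    return result;
}
```

With the holder (026ae480) as the only root, the object at 026ae490 survives while the holder's field contains 026ae490 and is reclaimed once the field contains 026ae494.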
I would like to reiterate that this access violation is a fairly optimistic outcome. Things could have been much, much worse if another valid object had been allocated in the free space: the GetHashCode call would then magically be invoked on that object. Another possibility is that a large object would occupy the space both before and after the reclaimed memory, and then the invalid reference would point into the middle of a valid object, producing the effect of objects overlapping in memory!
Below is the code required to reproduce this scenario. Because it relies on memory offsets that may change between CLR versions and OS flavors, reproducing it requires Windows 7 64-bit and compiling the code as a .NET 4.0 Release build targeting 32-bit (x86).
using System;
using System.Runtime.InteropServices;

namespace OverlappingObjects
{
    [StructLayout(LayoutKind.Sequential)]
    class SomeObject
    {
        public uint X;
        public uint Y;

        ~SomeObject()
        {
            Console.WriteLine("Finalizer");
        }
    }

    [StructLayout(LayoutKind.Sequential)]
    class ReferenceHolder
    {
        public uint Marker;
        public SomeObject TheReference;
    }

    class Program
    {
        // Adds 4 to the dword located OFFSET bytes past arr[0], i.e. past the end of arr.
        // On .NET 4.0 x86 this is the reference field of the object allocated right after arr.
        static void Dismantle(int[] arr)
        {
            GCHandle gch = GCHandle.Alloc(arr, GCHandleType.Pinned);
            IntPtr ptr = gch.AddrOfPinnedObject();
            const int OFFSET = 8 + 4000;
            Marshal.WriteInt32(ptr, OFFSET, Marshal.ReadInt32(ptr, OFFSET) + 4);
            gch.Free();
        }

        static void Main(string[] args)
        {
            Console.ReadLine(); // wait here so a debugger can be attached
            int[] arr = new int[1000];
            ReferenceHolder holder = new ReferenceHolder();
            holder.Marker = 0xCAFECAFE;
            holder.TheReference = new SomeObject { X = 0xDEADBEEF, Y = 0xBADF00D };
            int[] arr2 = new int[1000];

            Dismantle(arr);

            // The only valid reference to the SomeObject instance is gone, so it is finalized and reclaimed.
            GC.Collect();
            GC.WaitForPendingFinalizers();
            GC.Collect();

            holder.TheReference.GetHashCode();
            Console.WriteLine("MAIN DONE");
            Console.ReadLine();

            GC.KeepAlive(holder);
            GC.KeepAlive(arr);
            GC.KeepAlive(arr2);
        }
    }
}
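The constant OFFSET = 8 + 4000 in Dismantle is easier to follow with the layout arithmetic written out. One way to read it: AddrOfPinnedObject returns the address of arr[0]; skipping the 4000 bytes of elements lands on the sync block index of the next object, and skipping 8 more bytes (that object's sync block index and method table pointer) lands on its field at offset 4, which !do showed to be TheReference. The compile-time C++ sketch below encodes that arithmetic under an assumed 32-bit CLR 4.0 layout (4-byte slots, references pointing at the method table pointer, the next object allocated immediately after arr); the helper names are illustrative.

```cpp
#include <cassert>
#include <cstdint>

// Assumed 32-bit CLR object layout, consistent with the !do and !dumpheap output:
//   [sync block index][method table pointer][fields...]
// with every slot 4 bytes wide and object references pointing at the method table pointer.
constexpr std::uint32_t kSlot = 4;

// Size !dumpheap reports for an int[] of n elements: sync block + MT + length + elements.
constexpr std::uint32_t intArraySize(std::uint32_t n) { return 3 * kSlot + 4 * n; }

// What GCHandle.AddrOfPinnedObject returns for an array: its first element,
// which follows the method table pointer and the length.
constexpr std::uint32_t firstElement(std::uint32_t arrayRef) { return arrayRef + 2 * kSlot; }

// Assuming the next object was allocated right after the array, its reference
// lies intArraySize(n) bytes past the array's reference.
constexpr std::uint32_t nextObject(std::uint32_t arrayRef, std::uint32_t n) {
    return arrayRef + intArraySize(n);
}

constexpr std::uint32_t kOffset = 8 + 4000;  // OFFSET in Dismantle

// For any array address (0x1000 is arbitrary), the write lands 4 bytes past the
// next object's reference: the field at offset 4, holder.TheReference.
static_assert(intArraySize(1000) == 4012, "matches the System.Int32[] size in !dumpheap");
static_assert(firstElement(0x1000) + kOffset == nextObject(0x1000, 1000) + 4,
              "OFFSET reaches the field at offset 4 of the following object");
```

If the CLR version or bitness changes any of these slot sizes, the static_asserts stop describing the real layout, which is exactly why the repro is tied to .NET 4.0 x86.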
A year and a half ago I touched on the subject of debugging process startup, such as the startup of Windows services, using the GFlags utility (which sets the Image File Execution Options registry key).
The general idea is to rely on the Windows loader to launch a debugger instead of the debugged process, and trace your way through the process startup code. Unfortunately, this requires the debugged process to run in the same session as you; otherwise, you won’t be able to actually see the debugger.
Starting from Windows Vista, Windows services are isolated into a separate session to which you do not have access when you are logged onto the system. The debugger is launched within this session as well, which produces the undesired result of having the service stuck waiting for the debugger, and the debugger stuck waiting for your input which you cannot provide. (To learn more about Session 0 Isolation, check out the trusty Windows 7 Training Kit which covers several application compatibility topics with detailed code examples.)
What can you do to debug service startup on Windows Vista or newer OS versions? All you need to do is start a remote debugging server that debugs the service, and connect to its debugging session from a debugging client. Assuming that your Debugging Tools for Windows installation resides in C:\Debuggers, you can configure the following as the Debugger string in GFlags:
C:\Debuggers\ntsd.exe -server tcp:port=10000 -noio
When you start the service, you will notice an ntsd.exe instance launch in session 0; you’ll need to connect to the debugging session quickly by launching WinDbg (or NTSD), choosing File | Connect to Remote Session, and providing tcp:port=10000 as the transport. (Note that when debugging service startup, you might want to increase the service startup timeout to prevent the SCM from giving up on your service. Unfortunately, this is a global setting; another option is to request additional time from code.)
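If you would rather not click through GFlags on every test machine, the same setting can be captured in a .reg file: GFlags stores the Debugger string as the Debugger value under the Image File Execution Options key named after the image. The C++ sketch below builds such a file. MyService.exe is a hypothetical image name, and the escaping follows the .reg rule that backslashes and double quotes inside string values are preceded by a backslash.

```cpp
#include <cassert>
#include <string>

// .reg string values escape backslashes and double quotes with a backslash.
std::string escapeRegString(const std::string& value) {
    std::string out;
    for (char c : value) {
        if (c == '\\' || c == '"') out += '\\';
        out += c;
    }
    return out;
}

// Builds a .reg file that sets the Image File Execution Options "Debugger" value
// for imageName, i.e. the same setting as the Debugger box in GFlags.
std::string makeIfeoRegFile(const std::string& imageName, const std::string& debuggerCommand) {
    const std::string key =
        "HKEY_LOCAL_MACHINE\\SOFTWARE\\Microsoft\\Windows NT\\CurrentVersion\\"
        "Image File Execution Options\\" + imageName;
    return "Windows Registry Editor Version 5.00\r\n\r\n"
           "[" + key + "]\r\n"
           "\"Debugger\"=\"" + escapeRegString(debuggerCommand) + "\"\r\n";
}
```

For example, makeIfeoRegFile("MyService.exe", "C:\\Debuggers\\ntsd.exe -server tcp:port=10000 -noio") produces a file you can apply with reg import. Remember to delete the Debugger value when you are done; otherwise every launch of that image will go through ntsd.exe.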
This is where we are in the series so far:
- Calling a function
- Configuring breakpoints
- Tracing execution
- Execution control
- Displaying data, including STL collections
- Runtime application checking
- Miscellaneous commands
Today’s post is about configuring conditional breakpoints. Finally, this is one area where the Windows tools have feature parity with DBX, at least as far as the basic feature goes.
A conditional breakpoint is a breakpoint that is associated with a condition. If the condition is true, the debugger stops at the breakpoint; otherwise, it continues execution. The important question, then, is what kinds and shapes of conditional breakpoints are available, and how the expression evaluator copes with complex conditional expressions.
DBX
DBX makes it rather easy to specify a conditional breakpoint. Append -if and a condition to the end of any stop command, and you have yourself a conditional breakpoint whose condition can use anything in the scope of the breakpoint. For example:
stopped in main at line 15 in file "stl.cc"
15 global = 1;
(dbx) stop at 22 -if v.size() == 2
(7) stop at "stl.cc":22 -if v.size() == 2
(dbx) cont
stopped in main at line 22 in file "stl.cc"
22 m[5] = v;
(dbx) status
(6) stop infunction main
*(7) stop at "stl.cc":22 -if v.size() == 2
Visual Studio
Visual Studio has conditional breakpoint support—when a breakpoint is set, you can specify a condition that must be true for the debugger to stop. Unfortunately, these conditions are subject to even more stringent limitations than the expressions you can input in the Immediate Window. The trivial things, such as paramName > 13, still work, of course.
WinDbg
WinDbg circumvents the problem altogether by providing the ability to execute an arbitrary debugger command when a breakpoint is hit. Among the things that this debugger command may do is continue execution using the gc command; alternatively, it could display some output, stop, etc.
We have already seen an example of WinDbg’s conditional breakpoint support when implementing an equivalent of the DBX stop inobject command; I will not repeat it here.
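WinDbg’s model, a command attached to the breakpoint that decides whether to continue, subsumes DBX’s -if: a condition is just a command that issues gc when the condition is false. The toy C++ model below makes that relationship explicit; none of its names correspond to a WinDbg or DBX API, and the loop with lines 20 and 22 is made up for illustration, loosely echoing the DBX session where the breakpoint should fire only once v.size() == 2.

```cpp
#include <cassert>
#include <functional>
#include <vector>

// What a breakpoint command tells the (toy) debugger to do after a hit.
enum class Action { Stop, Continue };

// A breakpoint on a source line plus a command that runs on every hit,
// loosely mirroring WinDbg's bp Address "CommandString".
struct Breakpoint {
    int line;
    std::function<Action()> command;
};

// DBX-style "stop at LINE -if COND", expressed WinDbg-style: the command stops
// when the condition holds and otherwise continues (the role gc plays).
Breakpoint conditionalBreakpoint(int line, std::function<bool()> cond) {
    return {line, [cond] { return cond() ? Action::Stop : Action::Continue; }};
}

// One line of the debuggee: its line number and what executing it does.
struct Step {
    int line;
    std::function<void()> effect;
};

// Runs the program step by step. Before a line executes, the commands of the
// breakpoints on that line run; each Stop is recorded (a real debugger would
// wait for user input at that point) and execution then resumes.
std::vector<int> run(const std::vector<Step>& program, const std::vector<Breakpoint>& bps) {
    std::vector<int> stops;
    for (const Step& step : program) {
        for (const Breakpoint& bp : bps) {
            if (bp.line == step.line && bp.command() == Action::Stop) {
                stops.push_back(step.line);
                break;  // one recorded stop per executed line is enough
            }
        }
        step.effect();
    }
    return stops;
}
```

With three iterations that each push one element (line 20) and then reach line 22, the conditional breakpoint records a single stop, while an unconditional breakpoint on line 22 stops three times.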