Native Debugging Walk-Through Set

August 6, 2008


I gave a 3-hour presentation today on C++ debugging techniques, with a focus on production debugging.  I’d like to share the demos I showed during the session, along with a brief walkthrough so that you can repeat what I did in class.  (I have intentionally omitted the debugger spew and any screenshots so that this remains somewhat of an interesting challenge.)

First of all, download the demo solution (30KB), unzip it, and open it with Visual Studio 2008 (the code should work on Visual Studio 2005 as well, so you can downgrade the project files manually if you’d like).

#1 – Runtime Checks

In this demo, we will see the effect of the Microsoft C++ compiler’s runtime checks on our application’s behavior.  Uncomment the first line in the main method, compile the project in debug mode and run it.

The expected behavior is an assertion failure indicating that you have attempted to use an uninitialized variable.  From reviewing the code it is evident that the variable i within the IsPrime method is indeed used before it is initialized.  This is also a compiler warning, but it is caught at runtime as well, by monitoring accesses to the variable and ensuring it is not read before it is initialized.  The runtime check saved you from erroneously relying on whatever garbage was on the stack at the time of the call.
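The demo source is not reproduced in this post, so the following is only a guess at the shape of the bug: a loop counter that is declared without a value and then read.  Only the name IsPrime comes from the demo; the body is an assumption.  The sketch shows the fixed form, with the buggy form described in a comment.  In Visual C++ the check involved is /RTCu, which is part of /RTC1.

```cpp
// Hypothetical reconstruction; only the name IsPrime comes from the demo.
bool IsPrime(int n)
{
    if (n < 2)
        return false;

    // The buggy form would be `int i;` followed by a loop that reads i
    // without ever assigning it. In a debug build compiled with /RTCu,
    // the first read of i triggers the failure described above.
    int i = 2; // fix: give the counter a defined starting value

    for (; i <= n / i; ++i) // i <= n / i avoids overflow in i * i
    {
        if (n % i == 0)
            return false;
    }
    return true;
}
```

In a release build the runtime check is normally off, so the buggy form would silently start the loop from whatever value happened to be on the stack.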

Comment out the first line in the main method and run the project again.

This time, you receive an assertion failure indicating that the ESP register was not preserved across a function call, which is likely to imply an improper calling convention.  From reviewing the code you can see that the Add method is defined with the __stdcall calling convention, but called through a function pointer that has the default __cdecl calling convention.  The runtime check saved you from corrupting the stack.
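A minimal sketch of this mismatch and its fix might look like the following.  Only the name Add and the two calling conventions come from the demo; the DEMO_STDCALL macro, the AddFn typedef and CallAdd are hypothetical names introduced here.  The macro exists only so that the sketch also compiles on toolchains where __stdcall has no meaning.

```cpp
// Hypothetical sketch; only the name Add and the conventions come from the demo.
// __stdcall only changes the ABI on 32-bit x86 MSVC builds, so it is
// defined away elsewhere to keep the sketch compiling on other toolchains.
#if defined(_MSC_VER) && defined(_M_IX86)
#define DEMO_STDCALL __stdcall
#else
#define DEMO_STDCALL
#endif

int DEMO_STDCALL Add(int a, int b)
{
    return a + b;
}

// Mismatched pointer type (the bug): no convention given, so __cdecl.
//   typedef int (*BadAddFn)(int, int);
//   BadAddFn f = (BadAddFn)Add;  // the cast hides the mismatch
// With __stdcall the callee pops the arguments, and a __cdecl caller
// then pops them again, which leaves ESP off by the argument size.

// Matching pointer type (the fix): the convention is part of the type.
typedef int (DEMO_STDCALL *AddFn)(int, int);

int CallAdd(int a, int b)
{
    AddFn f = Add; // no cast needed once the types agree
    return f(a, b);
}
```

A cast is usually the reason such code compiles at all, so a C-style cast on a function pointer is worth a second look during review.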

#2 – Advanced Breakpoints

In this demo, we will experiment with advanced breakpoints in Visual Studio.  Compile the project and run it under the debugger.

You receive an access violation exception inside Func1 when it attempts to write through a null pointer.  However, from reviewing the code of the main method, the pointer (called g_pData) appears to have been initialized properly.

Launch the debugger again and choose Debug –> New Breakpoint –> New Data Breakpoint.  The expression to monitor should be &g_pData – you want to be notified as soon as the value of the pointer is modified.  (Data breakpoints are a feature provided by the CPU, so if running in a virtual machine environment or on odd hardware you might not be able to set a data breakpoint or it might have no effect.)

Continue executing the program.  It should stop inside Func2 when the global pointer is modified and set to null.
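The pattern being chased here can be sketched roughly as follows.  Only the names g_pData, Func1 and Func2 come from the demo; the bodies are assumptions, and the null guard in Func1 is added so that the sketch can run to completion instead of faulting.

```cpp
// Hypothetical sketch; only g_pData, Func1 and Func2 are names from the demo.
int* g_pData = nullptr; // nullptr needs C++11; code of that era would use NULL

void Func2()
{
    // The unexpected write. A data breakpoint on &g_pData stops the
    // debugger right after this store executes.
    g_pData = nullptr;
}

bool Func1()
{
    if (g_pData == nullptr) // the demo presumably lacks this guard and faults
        return false;
    *g_pData = 42;
    return true;
}
```

The breakpoint watches the address of the pointer variable itself, not the memory it points to, which is why the expression is &g_pData and not g_pData.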

#3 – Static Code Analysis

In this demo we will use the C++ static code analysis feature to detect a bug that is easy to overlook.  Compile the project and review the code.

There is a simple bug inside the GenerateString function that could easily be missed by the developer.  However, static code analysis should be able to find it.

Right click the project in Solution Explorer, choose Properties and under the Code Analysis section toggle the last property to Yes (/analyze).  Compile the project again.

The compiler was able to detect the fact that you are overrunning the dynamically allocated buffer by one character.  Static code analysis has just saved you hours, days or weeks of debugging time trying to chase down this single ill-behaving character.
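An off-by-one overrun of this kind often looks something like the sketch below.  Only the name GenerateString comes from the demo; the signature and body are assumptions, and the code shown is the corrected form with the buggy allocation described in a comment.

```cpp
#include <cstddef>

// Hypothetical sketch; only the name GenerateString comes from the demo.
// Returns a heap buffer holding `length` copies of 'x' plus a terminator.
// The caller releases it with delete[].
char* GenerateString(std::size_t length)
{
    // The off-by-one form would be `new char[length]`: the loop below
    // still fits, but the terminator lands one element past the end.
    char* buffer = new char[length + 1];

    for (std::size_t i = 0; i < length; ++i)
        buffer[i] = 'x';

    buffer[length] = '\0'; // in bounds only because of the + 1 above
    return buffer;
}
```

The analyzer can flag the buggy form because the allocation size and the index of the final write are both visible within one function.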

#4 – Taking a Dump (Simple Crash)

In this demo you will take a dump of a crashing process.  Compile the project and run it.

When you press any key, the application crashes.  To obtain a dump at the moment of the crash, attach ADPlus (from the Debugging Tools for Windows package) to the process before you press the key, using the following command line:

ADPlus -crash -p <AppProcessIdHere>

After the dump has been generated, open it in Visual Studio (File –> Open –> Project or Solution) and “Run” it, to see the faulting statement in the source code.

Alternatively, open the dump in WinDbg (from the Debugging Tools for Windows package) by choosing File –> Open Crash Dump, and witness the faulting statement in the source window.

#5 – Critical Sections Deadlock

In this demo we will experiment with a real debugger (WinDbg) from the Debugging Tools for Windows package.  Compile the project and run it.

The application hangs doing nothing.  Run WinDbg and select File –> Attach to Process (or hit F6).  Choose the right process and attach to it.

Run the ~* command to see all the threads running in the process.  The last thread is the thread talking to the debugger; the first three threads are application threads.

Execute the ~0s command and then the kb command to see the stack for thread 0 (the main thread).  From the stack it is evident that the thread is waiting for multiple objects.  Examine the parameters passed to the kernel32!WaitForMultipleObjects method – the first parameter is the count of handles in the array, and the second parameter is the array itself.

Use the dd command and pass it the address of the array to see the handles the main thread is waiting for.  Use the !handle command and pass to it each of the two handles you see in the memory dump.  Both handles are thread handles, so the main thread is waiting for two other threads to complete.  It is reasonable to deduce that the main thread is waiting for the other two application threads.

Execute the ~1s command and then the kb command to see the stack for thread 1 (repeat the same with thread 2).  Both threads are waiting for a critical section to become available – each is waiting for a different critical section.

Use the !locks command to see the currently locked critical sections within your process.  Correlate the addresses and owning threads of these critical sections with the output from the previous commands and deduce that you have a deadlock involving two threads and two critical sections.
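The lock-ordering problem behind this kind of deadlock, and one common way around it, can be sketched in portable C++.  This is not the demo's code: std::mutex stands in for the two critical sections, and all names here are hypothetical.

```cpp
#include <functional>
#include <mutex>
#include <thread>

// Sketch with std::mutex standing in for the demo's two critical sections.
std::mutex g_lockA;
std::mutex g_lockB;
int g_counter = 0;

// Deadlock-prone form: one thread locks A then B, the other B then A.
// If each takes its first lock before the other takes its second,
// both wait forever, which is the state !locks reveals.
// std::lock acquires both locks with a deadlock-avoidance algorithm,
// so the order of the arguments no longer matters.
void Worker(std::mutex& first, std::mutex& second)
{
    for (int i = 0; i < 1000; ++i)
    {
        std::lock(first, second);
        std::lock_guard<std::mutex> g1(first, std::adopt_lock);
        std::lock_guard<std::mutex> g2(second, std::adopt_lock);
        ++g_counter; // protected by both locks
    }
}

int RunWorkers()
{
    g_counter = 0;
    std::thread t1(Worker, std::ref(g_lockA), std::ref(g_lockB));
    std::thread t2(Worker, std::ref(g_lockB), std::ref(g_lockA)); // opposite order
    t1.join();
    t2.join();
    return g_counter;
}
```

With raw critical sections the usual equivalent is a fixed acquisition order that every thread follows.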

#6 – Kernel Objects Deadlock

In this demo we will examine a more complicated deadlock which cannot be resolved with the user-mode debugger alone.  Compile the project and run it.

The application hangs doing nothing.  Run WinDbg and select File –> Attach to Process (or hit F6).  Choose the right process and attach to it.

Repeat the steps from walkthrough #5 to see that the main thread is likely waiting for the two application threads, and the application threads are waiting for a mutex object (use the !handle command to see the object type).

What is lacking in the above output is the mutex owner (for critical sections, the !locks command gave away what we needed to know).  Obtaining this information is only possible with a kernel debugger, which is outside the scope of this walkthrough.

On Vista, a built-in mechanism called Wait Chain Traversal can assist with this kind of issue (I’ve covered it in detail in the past).  Using WCT on this process yields a deadlock very similar to the previous walkthrough – the threads are effectively waiting for each other, and therefore can’t make any forward progress.  (You might want to use my WCT helper application to streamline this process – it’s part of the demos from my TechEd 2008 presentation.)

#7 – Handles

In this demo we will examine a slightly more complicated problem which requires reproducing the issue at least once or twice.  Compile the project and run it.

From the console output it appears that a secondary thread is waiting for a termination event.  Press any key to let the main thread signal that event.  The main thread now waits indefinitely for the thread to exit (judging by the console output, anyway), which doesn’t actually happen.

Run WinDbg and select File –> Attach to Process (or hit F6).  Choose the right process and attach to it.

Repeat the steps from walkthroughs #5 and #6 to verify that the main thread is waiting for a thread handle (which is likely the secondary thread) and that the secondary thread is waiting for … an invalid handle?  The !handle command will fail with error 6, and executing the !error 6 command reveals that the handle is invalid.

How can the handle be invalid if the thread is waiting for it?  It must have become invalid after the wait has started, so someone must have closed it.  On the other hand, if it were closed after the event was signaled, the secondary thread wouldn’t be waiting for it right now.

Run the application again and attach the debugger before allowing the main thread to signal the termination event.  Use the bp command to set a pair of breakpoints on kernel32!CloseHandle and kernel32!SetEvent.  Press any key in the console and observe the two breakpoints – the handle is closed before the event is signaled, which is the root cause of the problem.  (To examine the parameters when the breakpoint is hit you can use the kb command and observe the call stack as well.)

#8 – Potential Memory Corruption

In this demo we will detect a memory corruption at the point where it occurs, and not later ten layers deep into our application code when it’s impossible to trace back to its source.  Compile the project and run it.

It completes successfully on my system – your results might vary.  Examine the code and detect the buffer overrun – the function is asking to allocate 120 bytes, but it’s accessing the 130th integer from the returned buffer (so it’s an overrun on both counts!).  However, since the heap allocation comes from a pool of already committed virtual memory, no access violation occurs.  This kind of corruption could be revealed only hours later, when some other application code attempts to use the corrupted piece of memory.
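The arithmetic behind "both counts" can be spelled out with a small helper.  IndexFits is a hypothetical name, not something from the demo.  Assuming a 4-byte int, which is typical for Windows builds, 120 bytes hold 30 integers (indices 0 through 29), while the 130th integer is index 129 at byte offset 516.

```cpp
#include <cstddef>

// Returns true when element `index` of an int array lies entirely
// inside an allocation of `bytes` bytes.
bool IndexFits(std::size_t bytes, std::size_t index)
{
    return index < bytes / sizeof(int);
}
```

So the access is out of range whether 130 is read as a byte offset or as an element index, and as an element index it lands almost 400 bytes past the end of the block.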

Run the GFlags application from the Debugging Tools for Windows package.  Under the Image tab, enter the application name including the .exe suffix, and press the Tab key.  Check the “Enable page heap” checkbox and click Apply.

Run the application again within the debugger.  It breaks on the access violation where it previously worked just fine.  The reason is that the page heap option places each allocation at the end of a page and follows it with an invalid (not committed) page, so you get an access violation as soon as you run past the end of a buffer.

Another approach to the same problem would be to use Application Verifier with its set of heap checks enabled.

Summary

This concludes the set of walkthroughs.  If you have any interesting ideas for additional walkthroughs, or would like to comment on the existing ones, please let me know.


4 comments

  1. Matt, August 6, 2008 at 10:24 AM

    I was looking forward to running through this exercise, but it seems the .zip file is corrupt.

  2. Rotem Bloom, August 7, 2008 at 4:31 AM

    That was a great lecture, Sasha. Even though I’m not a C++ person and I already know some of these tools, it was very interesting.
    Recommended for anyone who writes C++; it can save you hours of hunting for problems.

  3. Sasha Goldshtein, August 9, 2008 at 5:58 AM

    @Matt: I just downloaded the file and it was perfectly fine… Please let me know if it works for you, and if not, what exactly is corrupt. Thanks.

  4. JensC, August 21, 2008 at 9:32 AM

    Had problems downloading the demo files using Firefox and FlashGot. Clicking the link while pressing Alt and Shift to bypass FlashGot did it for me. Maybe this is helpful for you too.
