Manual Stack Walking
Corrupted stacks are no fun at all – when you get a crash dump or a live exception in an application, pretty much the first thing you do is take a look at the call stack. When the stack itself is corrupted, your primary investigation tool is taken away.
Still, it is sometimes possible to reconstruct the stack even in the face of corruption. I’ve been showing how in the .NET Debugging and C++ Debugging courses, and by popular demand I will show one example here as well.
You can follow along on your own with the dump file, symbol file, and sources from here.
Here we go. Opening the dump file in WinDbg (32-bit) produces the following output:
User Mini Dump File: Only registers, stack and portions of memory are available
. . .
This dump file has an exception of interest stored in it.
The stored exception information can be accessed via .ecxr.
(1ed0.870): Access violation - code c0000005 (first/second chance not available)
eax=00000000 ebx=00000001 ecx=73536122 edx=00000000 esi=002af37c edi=0000004e
eip=00000000 esp=002af1a8 ebp=00000000 iopl=0 nv up ei pl zr na pe nc
cs=0023 ss=002b ds=002b es=002b fs=0053 gs=002b efl=00010246
00000000 ?? ???
0:000> k
ChildEBP RetAddr
WARNING: Frame IP not in any known module. Following frames may be wrong.
002af1a4 00000000 0x0
This is already bad news – the current instruction is at address 0x00000000, which means the instruction pointer (EIP) has been corrupted. You can also see that EBP has been corrupted – its value is 0x00000000 as well, which is why the k command has nothing to report.
Fortunately, ESP seems to have a valid value. Well, we can’t really tell whether it’s valid just by looking at it, but we can try reading the memory it points to. If that read succeeds, ESP almost certainly still points to the stack, because this is a mini dump that contains (almost) only stack memory.
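Whether a read succeeds comes down to whether the address falls inside one of the memory ranges the dump captured. Here is a minimal sketch, assuming the captured regions are available as (start, size) pairs, which is roughly how a minidump's memory list describes them. The helper name and both ranges are hypothetical and chosen for illustration; they are not read from this dump.

```python
from bisect import bisect_right

def make_range_checker(ranges):
    """Build a predicate reporting whether a read of `size` bytes at `addr`
    falls entirely inside one captured (start, size) range.
    Assumes the ranges do not overlap."""
    ordered = sorted(ranges)
    starts = [start for start, _ in ordered]

    def is_captured(addr, size=4):
        i = bisect_right(starts, addr) - 1  # last range starting at or below addr
        if i < 0:
            return False
        start, length = ordered[i]
        return addr + size <= start + length

    return is_captured

# Hypothetical capture: the stack from 0x002AF1A0 up to 0x002B0000, plus one
# unrelated block. These ranges are illustrative only.
is_captured = make_range_checker([(0x002AF1A0, 0xE60), (0x00A20000, 0x1000)])

print(is_captured(0x002AF1A8))  # True: the slot ESP points to is readable
print(is_captured(0x002AF0FC))  # False: below the captured stack
```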
If ESP indeed points to the stack, we can examine the stack manually and look for something that resembles a return address. Immediately before the return address (at the next lower address) we should find a saved EBP value, unless the frame uses FPO, which I plan to discuss in a future post. This EBP value provides the foundation for walking further back up the stack, because EBP values are chained: in each frame, EBP points to the slot holding the caller’s saved EBP, that saved value points to the slot holding the caller’s caller’s saved EBP, and so on. (Refresh your memory on how an x86 stack is laid out.)
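The chaining rule can be sketched in a few lines. This is a minimal Python model rather than debugger code: memory is a dict from addresses to DWORDs, and every frame is assumed to use the standard push ebp / mov ebp, esp prologue (no FPO), so [EBP] holds the caller's saved EBP and [EBP+4] holds the return address. The function name, the frame limit, and the stopping rules are choices made for this sketch.

```python
def walk_ebp_chain(memory, ebp, max_frames=64):
    """Follow saved-EBP links starting at `ebp` and return
    (frame_ebp, return_address) pairs, innermost frame first.

    `memory` maps 4-byte-aligned addresses to DWORD values; a missing key
    stands for memory that is not in the dump.
    """
    frames = []
    while ebp and len(frames) < max_frames:
        if ebp not in memory or ebp + 4 not in memory:
            break  # this frame is outside the captured memory
        saved_ebp = memory[ebp]           # caller's EBP, pushed by the prologue
        return_address = memory[ebp + 4]  # pushed by the CALL into this function
        frames.append((ebp, return_address))
        if saved_ebp <= ebp:
            break  # the stack grows down, so a caller's frame must sit higher
        ebp = saved_ebp
    return frames

# Synthetic three-frame stack; the outermost frame saved an EBP of 0.
stack = {
    0x1000: 0x1010, 0x1004: 0x0040AAAA,
    0x1010: 0x1020, 0x1014: 0x0040BBBB,
    0x1020: 0x0000, 0x1024: 0x0040CCCC,
}
for frame_ebp, ret in walk_ebp_chain(stack, 0x1000):
    print(f"{frame_ebp:08x} {ret:08x}")
# prints: 00001000 0040aaaa / 00001010 0040bbbb / 00001020 0040cccc
```

The "saved EBP must be higher" check also keeps the loop from spinning forever if a corrupted value points back into the same frame.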
Here’s the raw stack contents from ESP (this would be a good time to set up the symbol path to include the folder which contains BatteryMeter.pdb):
0:000> dds ESP
002af1a8 00000000
002af1ac 002af120
002af1b0 00000000
002af1b4 014cfe90
002af1b8 002af0fc
002af1bc 742fd594 uxtheme!StreamInit+0x36
002af1c0 002af180
002af1c4 01850815
002af1c8 0000029e
002af1cc 00000000
002af1d0 00000000
002af1d4 737990fa
002af1d8 002af210
002af1dc 013719be BatteryMeter!RecurseDeep+0x4e [...\batterymeterdlg.cpp @ 135]
002af1e0 00000004
002af1e4 77dbc290 mfc100u!AfxDlgProc [...\dlgcore.cpp @ 22]
002af1e8 00000000
002af1ec 002af284
002af1f0 00000001
002af1f4 00a24a74
002af1f8 00a5ec90
002af1fc 00a24cf0
002af200 002af228
002af204 002af198
002af208 002af234
002af20c 73799332
002af210 002af248
002af214 013719be BatteryMeter!RecurseDeep+0x4e [...\batterymeterdlg.cpp @ 135]
First of all, it’s nice to see that the memory ESP points to is included in the dump, which means we are looking at the stack. Several values here might be return addresses, and the DWORDs stored immediately before them are saved-EBP candidates. To eliminate candidates, we can peek at the memory each candidate points to: if that memory is readable (and therefore part of the captured stack), the candidate is viable.
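The elimination step can be modeled in Python as well. The sketch transcribes the dds ESP listing into a list of DWORDs and treats a DWORD as a saved-EBP candidate when the very next slot holds a value that dds annotated with a symbol. Two assumptions are labeled in the code: the captured stack is taken to run from ESP up to a hypothetical limit of 0x002B0000, and a saved EBP is additionally required to point above its own slot, since callers' frames live at higher addresses.

```python
ESP = 0x002AF1A8
# DWORDs from the dds ESP listing, one per slot starting at ESP.
VALUES = [
    0x00000000, 0x002AF120, 0x00000000, 0x014CFE90, 0x002AF0FC, 0x742FD594,
    0x002AF180, 0x01850815, 0x0000029E, 0x00000000, 0x00000000, 0x737990FA,
    0x002AF210, 0x013719BE, 0x00000004, 0x77DBC290, 0x00000000, 0x002AF284,
    0x00000001, 0x00A24A74, 0x00A5EC90, 0x00A24CF0, 0x002AF228, 0x002AF198,
    0x002AF234, 0x73799332, 0x002AF248, 0x013719BE,
]
# Values that dds printed with a module!symbol annotation.
SYMBOLIZED = {0x742FD594, 0x013719BE, 0x77DBC290}
# Assumption: the dump captured the stack from ESP up to this hypothetical limit.
STACK_LIMIT = 0x002B0000

def is_readable(addr):
    return ESP <= addr < STACK_LIMIT

def saved_ebp_candidates(esp, values):
    """Yield (slot, value) for DWORDs that sit right before a symbolized value
    and point to readable stack memory above their own slot."""
    for i in range(1, len(values)):
        if values[i] not in SYMBOLIZED:
            continue
        slot = esp + 4 * (i - 1)
        value = values[i - 1]
        if is_readable(value) and value > slot:  # upward check is an assumption
            yield slot, value

for slot, value in saved_ebp_candidates(ESP, VALUES):
    print(f"{slot:08x} -> {value:08x}")
# prints: 002af1d8 -> 002af210 / 002af210 -> 002af248
```

The value 002af0fc is rejected because it lies below ESP, and 00000004 is rejected outright. The first hit, 002af210, is the candidate the dd check confirms; the second hit is the next link of the same chain, which is a good sign that the candidate is real.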
0:000> dd 002af0fc L1
002af0fc ????????
0:000> dd 002af210 L1
002af210 002af248
The first attempt failed, but the second succeeded, so we might have a saved EBP on our hands. We can now proceed with manual reconstruction: the saved EBP points to the next saved EBP further up the stack, and immediately after it we should find the next return address. Repeating this several times shows whether the result makes sense:
0:000> dds 002af210 L2
002af210 002af248
002af214 013719be BatteryMeter!RecurseDeep+0x4e [...\batterymeterdlg.cpp @ 135]
0:000> dds 002af248 L2
002af248 002af280
002af24c 013719be BatteryMeter!RecurseDeep+0x4e [...\batterymeterdlg.cpp @ 135]
0:000> dds 002af280 L2
002af280 002af2b8
002af284 013719be BatteryMeter!RecurseDeep+0x4e [...\batterymeterdlg.cpp @ 135]
0:000> dds 002af2b8 L2
002af2b8 002af2f0
002af2bc 013719be BatteryMeter!RecurseDeep+0x4e [...\batterymeterdlg.cpp @ 135]
0:000> dds 002af2f0 L2
002af2f0 002af304
002af2f4 013719f7 BatteryMeter!CBatteryMeterDlg::OnCPUSelectorChanged+0x27 [...\batterymeterdlg.cpp @ 142]
0:000> dds 002af304 L2
002af304 002af318
002af308 77d92c8c mfc100u!_AfxDispatchCmdMsg+0x58 [...\cmdtarg.cpp @ 112]
We could keep doing this for a while – reconstructing the stack (as long as we don’t run into an FPO frame) until we hit the bottom. So far we have the RecurseDeep function calling itself at least four times before we hit the stack corruption.
There is also a WinDbg command that can perform this reconstruction for us: k = BasePtr StackPtr InstructionPtr takes guesses for EBP, ESP, and EIP and constructs a plausible call stack from them. Our EBP guess can be the first saved EBP we found on the stack, our EIP guess can be the return address immediately following it, and our ESP guess can be the same as EBP, producing the following output:
0:000> k = 002af210 002af210 013719be
ChildEBP RetAddr
002af210 013719be BatteryMeter!RecurseDeep+0x4e
002af248 013719be BatteryMeter!RecurseDeep+0x4e
002af280 013719be BatteryMeter!RecurseDeep+0x4e
002af2b8 013719be BatteryMeter!RecurseDeep+0x4e
002af2f0 013719f7 BatteryMeter!RecurseDeep+0x4e
002af304 77d92c8c BatteryMeter!CBatteryMeterDlg::OnCPUSelectorChanged+0x27
002af318 77d92e51 mfc100u!_AfxDispatchCmdMsg+0x58
002af334 77dc6d36 mfc100u!CCmdTarget::OnCmdMsg+0x124
002af358 77e1c4cb mfc100u!CPropertySheet::OnCmdMsg+0x1d
002af388 77e1bc7f mfc100u!CWnd::OnNotify+0x7b
002af454 002af478 mfc100u!CWnd::OnWndMsg+0x9e
... source information and the rest of the stack snipped for brevity
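For the curious, here is a rough Python model of what k = BasePtr StackPtr InstructionPtr produces when every frame uses an EBP chain. The first row's function comes from the guessed EIP, and each later row's function comes from the previous row's return address. MEMORY holds only the values printed by the dds ... L2 commands earlier, and SYMBOLS only the names those commands showed, so the walk stops at 002af318. This is a simplification for illustration; WinDbg's own walker can also use symbol-provided frame data such as FPO records, which the model ignores.

```python
# Saved-EBP slots and return addresses transcribed from the dds <ebp> L2 outputs.
MEMORY = {
    0x002AF210: 0x002AF248, 0x002AF214: 0x013719BE,
    0x002AF248: 0x002AF280, 0x002AF24C: 0x013719BE,
    0x002AF280: 0x002AF2B8, 0x002AF284: 0x013719BE,
    0x002AF2B8: 0x002AF2F0, 0x002AF2BC: 0x013719BE,
    0x002AF2F0: 0x002AF304, 0x002AF2F4: 0x013719F7,
    0x002AF304: 0x002AF318, 0x002AF308: 0x77D92C8C,
}
SYMBOLS = {
    0x013719BE: "BatteryMeter!RecurseDeep+0x4e",
    0x013719F7: "BatteryMeter!CBatteryMeterDlg::OnCPUSelectorChanged+0x27",
    0x77D92C8C: "mfc100u!_AfxDispatchCmdMsg+0x58",
}

def k_equals(ebp, esp, eip, memory):
    """Rough model of k = ebp esp eip for frames that all use an EBP chain.
    `esp` mirrors the WinDbg arguments but is unused in a pure EBP walk."""
    rows = []
    current_ip = eip
    while ebp in memory and ebp + 4 in memory:
        ret = memory[ebp + 4]
        rows.append((ebp, ret, SYMBOLS.get(current_ip, f"0x{current_ip:x}")))
        next_ebp = memory[ebp]
        if next_ebp <= ebp:
            break  # callers live at higher addresses; anything else ends the chain
        current_ip, ebp = ret, next_ebp
    return rows

first_saved_ebp = 0x002AF210
guess_ebp = guess_esp = first_saved_ebp
guess_eip = MEMORY[first_saved_ebp + 4]  # return address right after the saved EBP
print(f"k = {guess_ebp:08x} {guess_esp:08x} {guess_eip:08x}")
for child_ebp, ret, function in k_equals(guess_ebp, guess_esp, guess_eip, MEMORY):
    print(f"{child_ebp:08x} {ret:08x} {function}")
```

The six rows it prints line up with the first six rows of the k output above, and five of them are RecurseDeep frames, which is four recursive calls.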
We have turned a seemingly impossible problem with very little information into a pretty decent call stack that points to the likely culprit for the stack corruption. Inspecting the sources for BatteryMeter!RecurseDeep drives the point home: the function corrupts the stack, but does so in a sneaky fashion. Instead of corrupting its own frame, it goes back several frames earlier on the stack and overwrites a small memory region with zeroes. If that region covered a frame’s saved EBP and return address, the function epilogue would have restored EBP as 0 and returned to address 0, which matches the register state we started with.