Tales from High Memory Scenarios: Part 2
In the first part, we looked at a scenario where fiddling with the in-memory field controlling a custom cache size and then externally triggering a garbage collection gave us a likely culprit for a high memory scenario.
After having disentangled the first problem, we faced a completely different issue. One of the apparently-leaking processes didn’t have that many objects in its managed heap. Inspecting the GC performance counters indicated that while memory utilization was around 900MB, only 150MB or so of managed objects were actually on the managed heap.
We ran the !eeheap -gc WinDbg command to take a look at segment sizes, and noticed the following: in each heap (this is an 8-core system running server GC, so 8 heaps) there was just a single ephemeral segment (gen0 and gen1), but it was HUGE. And I’m not talking about reserved virtual memory; I’m talking about >100MB of committed memory in a segment, with only about 10-20MB of it actually used to store managed objects.
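A quick back-of-the-envelope check suggests that these per-heap figures account for the gap the performance counters reported. The sketch below uses Python purely for illustration: the 100MB and 10-20MB inputs are the rough per-segment numbers quoted above rather than exact measurements, and estimate_totals is a name made up for this example.

```python
def estimate_totals(heaps, committed_mb_per_heap, used_mb_per_heap):
    """Return (total committed MB, total used MB) across all GC heaps."""
    return heaps * committed_mb_per_heap, heaps * used_mb_per_heap


HEAPS = 8  # server GC on an 8-core machine: one heap per core

# Each ephemeral segment held more than 100MB committed, so this is a lower bound.
committed_low, used_low = estimate_totals(HEAPS, 100, 10)
_, used_high = estimate_totals(HEAPS, 100, 20)

print(f"committed: at least {committed_low} MB")
print(f"used by managed objects: roughly {used_low}-{used_high} MB")
```

That puts committed memory at 800MB or more against 80 to 160MB of live data, which is in the same ballpark as the 900MB and 150MB readings from the counters.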
It was as if the memory from these segments was never being given back to the OS… Which sounds remotely familiar, doesn’t it? Well, it did to me – that’s exactly what the VM hoarding feature is all about. On a 64-bit system, it makes little sense to give virtual address space back to the OS, and it’s fairly likely that you would want to specify the STARTUP_HOARD_GC_VM flag if you’re writing your own CLR host and binding to the CLR.
But we’re not writing a CLR host here, now are we? Well, we aren’t, but ASP.NET is a custom host in and of itself. It already enables server GC for you even though you didn’t ask it to, so why wouldn’t it specify some more of that magic sauce in the hope of maximizing performance?
Well, it might, but how do we find out?
At this point, we were a little stuck, but fortunately, after crawling the Rotor (SSCLI) sources, Dima Zurbalev found an undocumented global flag called g_IGCHoardVM, which apparently indicated whether VM hoarding was on.
Luckily for us, this global flag was exported from mscorwks.dll, so we could simply inspect it at runtime with the WinDbg db command and see that it was set to 1! To confirm, we inspected its value in a simple console application that didn’t request VM hoarding, and it was set to 0 – so we could reach a reasonable conclusion that ASP.NET was indeed requesting VM hoarding.
What this basically means is that memory is never returned to the OS, so your application’s memory usage is dominated by its peak memory usage. This is not a memory leak (if memory were running out, these segments would likely be freed), but we couldn’t tell whether that was the case because the customer was never able to show us a system that actually crashed with an OOM condition.
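To make the "dominated by its peak" point concrete, here is a deliberately simplified toy model in Python. It is not how the CLR manages segments: the usage trace is hypothetical, committed_over_time is a made-up name, and the non-hoarding case assumes unused memory is released immediately, which a real GC does not do.

```python
def committed_over_time(usage_mb, hoard):
    """Return the committed size (MB) after each step of a usage trace.

    With hoarding, committed memory only grows: it tracks the running peak.
    Without hoarding, this toy model releases unused memory immediately.
    """
    committed = []
    peak = 0
    for used in usage_mb:
        peak = max(peak, used)
        committed.append(peak if hoard else used)
    return committed


trace = [20, 60, 120, 40, 15]  # hypothetical MB in use after each GC

print(committed_over_time(trace, hoard=True))   # [20, 60, 120, 120, 120]
print(committed_over_time(trace, hoard=False))  # [20, 60, 120, 40, 15]
```

With hoarding on, a single 120MB spike keeps the committed size at 120MB even after usage drops back to 15MB, which is the shape we were seeing in each ephemeral segment.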
Paradoxically, our recommendation in this case was to remove the automatic high-memory alert and recycling system they had in place (at least for one of their servers), and see if it crashed with an OOM condition. If it did, we would know for sure that VM hoarding was not the only culprit here.