At the beginning of December, the consulting team at Sela spent a day off-site for our annual hackathon, working on a variety of projects. The day was a blast, with lots of great energy and interesting work being done all around. My team (Avi Avni and I) focused on diagnostics tools, my favorite, and here are some preliminary results.
Real-time Win32 memory leak diagnoser
This is a project I’ve had on my todo list for a couple of years now. In a nutshell, Win32 memory leak analysis in production is quite painful because of the sheer amount of data that has to be collected. Traditional approaches, which I’ve used quite successfully in the past, require recording every single allocation and deallocation, and then cross-correlating them to find allocations that weren’t freed (e.g., in this post using xperf and WPA). While this generally works, for an application that allocates at a high frequency but leaks at a slow rate, collecting data over an hour, a day, or a week is simply impractical because the trace files grow too large.
A couple of years ago, I wrote a BPF-based tool called memleak, which uses Linux uprobes to record allocations (with their call stacks) and deallocations in a runtime data structure, without emitting data to files. I’ve already used this tool a couple of times to diagnose production issues.
The NativeLeakDetector project that Avi Avni built in just a few hours during the hackathon does the very same thing for Windows, using ETW events. It’s still a bit light on documentation, but it is quite simple in principle. It uses the TraceEvent library to record heap allocation and deallocation events in a given process, and keeps track of all outstanding allocations, with their call stacks, in a runtime map. When instructed to, the tool prints all the allocations that were not freed and the call stacks leading to them. There’s a bit of work remaining to make this tool production-ready, but the general skeleton is there and working quite well.
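To make the bookkeeping concrete, here is a minimal Python sketch of the "runtime map" idea. It is not NativeLeakDetector's code (which is C# on top of TraceEvent); the class, method names, and sample stacks are all hypothetical, and the events are fed in by hand rather than coming from ETW.

```python
from collections import Counter
from dataclasses import dataclass, field


@dataclass
class LeakTracker:
    """Tracks live heap allocations. All names here are hypothetical."""
    # address -> (size in bytes, call stack as a tuple of frame names)
    live: dict = field(default_factory=dict)

    def on_alloc(self, address, size, stack):
        # An allocation event arrived: remember it until it is freed.
        self.live[address] = (size, tuple(stack))

    def on_free(self, address):
        # A free event arrived: the block is no longer outstanding.
        # Frees of blocks allocated before tracing started are ignored.
        self.live.pop(address, None)

    def outstanding_by_stack(self):
        # Sum the un-freed bytes per allocating call stack, largest first.
        totals = Counter()
        for size, stack in self.live.values():
            totals[stack] += size
        return totals.most_common()


tracker = LeakTracker()
tracker.on_alloc(0x1000, 64, ["main", "load_config", "malloc"])
tracker.on_alloc(0x2000, 128, ["main", "handle_request", "malloc"])
tracker.on_alloc(0x3000, 128, ["main", "handle_request", "malloc"])
tracker.on_free(0x1000)

for stack, total in tracker.outstanding_by_stack():
    print(f"{total} bytes outstanding from {' -> '.join(stack)}")
```

The point of this design is that memory use is proportional to the number of live allocations, not to the total number of events seen, which is what makes long-running collection feasible.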
Process snapshotting support in CLRMD
Our second project, also contributed by Avi Avni, was to add process snapshotting support to the popular CLRMD debugging library. If you haven’t seen it yet, CLRMD provides a convenient C# API for attaching to a live process or opening a dump file and analyzing its contents. You can walk threads and call stacks, locate specific objects in memory, investigate the heap size and GC state, and handle numerous other scenarios. The only catch is that to use CLRMD, you have to choose one of the following modes:
- Create a dump file of the process and open the dump file. This allows you to capture an accurate snapshot of the process’s state, but creating the dump file can take a long time and use a lot of disk space.
- Attach to the process invasively, like a debugger. Again, this lets you inspect the process’s state, but if the process is a production service, you have just paused it completely.
- Inspect the process’s memory without suspending it. The process keeps running, which is great for production services, but it means you’re not seeing a consistent snapshot. For example, while you’re enumerating heap objects, a GC can occur and completely mess everything up.
Avi’s pull request adds another option: create a virtual address clone of the process using the Process Snapshotting API (essentially POSIX fork(), but without actually executing code in the child process), and then attach CLRMD to the clone. The original process can keep running, but we have an accurate snapshot of its state to analyze and then throw away. Best of all, the snapshotting API uses copy-on-write, so only pages modified by the original process are actually cloned (on demand) in physical memory.
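Since the snapshot is compared to fork() above, a small POSIX-only Python sketch can show the property that matters: the clone keeps seeing the state as of the snapshot while the original keeps changing. This is an analogy only, not the Process Snapshotting API or the CLRMD change. Unlike a real process snapshot, the child here does run code, purely so that it can report what it sees, and the function name is made up for the example.

```python
import os


def snapshot_demo():
    """Return (value seen by the clone, value seen by the original)."""
    state = {"counter": 1}
    go_r, go_w = os.pipe()      # parent -> child: "I have changed my state"
    out_r, out_w = os.pipe()    # child -> parent: "this is what I see"

    pid = os.fork()
    if pid == 0:
        # Child: a copy-on-write clone frozen at the moment of fork().
        os.close(go_w)
        os.close(out_r)
        os.read(go_r, 1)        # wait until the parent has modified its copy
        os.write(out_w, str(state["counter"]).encode())
        os._exit(0)

    # Parent: keeps running and modifies its own copy of the data.
    os.close(go_r)
    os.close(out_w)
    state["counter"] = 2
    os.write(go_w, b"x")
    seen_by_clone = int(os.read(out_r, 16))
    os.waitpid(pid, 0)
    os.close(go_w)
    os.close(out_r)
    return seen_by_clone, state["counter"]


if __name__ == "__main__":
    print(snapshot_demo())
```

The clone reports the old value even though it is asked only after the original has already moved on, which is the consistency guarantee a debugger needs when walking the heap.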
.NET Core real-time event tracer for Linux
Earlier this year, I wrote a couple of blog posts on tracing .NET Core runtime events on Linux, such as garbage collections, allocations, exceptions, and others. The tracing approach I’ve shown is based on recording LTTng events to a trace file, and analyzing the trace file later. While this has its merits, it’s not really suitable for real-time, continuous monitoring. So I set out to build a proof-of-concept script that captures a live trace of .NET Core events, aggregates them in real time, and produces interesting statistics.
The result is dntrace, a two-part tool: dntrace.sh, which turns on the LTTng events and records them, and dntrace.py, which parses them in real time and displays statistics. Currently, the Python part uses an extremely fragile approach, where the trace data is passed through Babeltrace and then parsed from strings back into structured events. Babeltrace 2.0 will introduce API support for parsing events from real-time sessions, at which point the dntrace.py script can be revisited and implemented in a less hacky way.
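For a sense of what "parsed from strings back into structured events" involves, here is a hedged sketch. It is not the code in dntrace.py. It assumes Babeltrace's default text layout of a bracketed timestamp, an optional delta, a host name, a provider:event name, and brace-delimited field groups; the sample line below is hand-written for illustration, not captured output.

```python
import re

# Assumed line shape:
# [time] (+delta) host provider:event: { ctx = 1 }, { Field = value, ... }
LINE_RE = re.compile(
    r"^\[(?P<timestamp>[^\]]+)\]\s+"           # [hh:mm:ss.nnnnnnnnn]
    r"(?:\(\+?[^)]*\)\s+)?"                    # optional (+delta)
    r"(?P<host>\S+)\s+"
    r"(?P<provider>[^:\s]+):(?P<event>[^:\s]+):\s*"
    r"(?P<rest>.*)$"
)
FIELD_RE = re.compile(r'(\w+)\s*=\s*("[^"]*"|[^,}\s]+)')


def parse_line(line):
    """Turn one text line into a dict, or return None if it does not match."""
    match = LINE_RE.match(line.strip())
    if match is None:
        return None
    fields = {}
    for key, raw in FIELD_RE.findall(match.group("rest")):
        if raw.startswith('"'):
            fields[key] = raw.strip('"')
        else:
            try:
                fields[key] = int(raw, 0)      # accepts decimal and 0x... forms
            except ValueError:
                fields[key] = raw              # leave anything else as text
    return {
        "timestamp": match.group("timestamp"),
        "provider": match.group("provider"),
        "event": match.group("event"),
        "fields": fields,
    }


# Illustrative input only; the values are made up.
sample = ("[13:02:11.123456789] (+0.000012345) myhost DotNETRuntime:GCStart_V2: "
          "{ cpu_id = 1 }, { Count = 3, Depth = 0, Reason = 0, Type = 0, ClrInstanceID = 0 }")
event = parse_line(sample)
print(event["event"], event["fields"])
```

In a live setting the same function could be applied to each line read from standard input as Babeltrace produces it. The fragility is plain to see: any change in the text layout silently breaks the regular expressions.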
It’s still not bad, though: you can get real-time GC information, including GC durations and generation sizes; printouts of any exceptions thrown; live allocation data; and other statistics. See the project repository for an example.
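As an illustration of how GC durations can be derived from an event stream, the sketch below pairs each GC start event with its matching end event and groups the durations by generation. It is an assumption-laden example rather than dntrace's implementation: the class name is hypothetical, timestamps are plain seconds, and it relies on the runtime's GC start and end events carrying a Count (GC number) and Depth (generation) field.

```python
from collections import defaultdict


class GcStats:
    """Aggregates GC durations per generation. Hypothetical helper."""

    def __init__(self):
        self.pending = {}                     # GC number -> (start time, generation)
        self.durations = defaultdict(list)    # generation -> [seconds, ...]

    def on_event(self, name, timestamp, fields):
        if name.startswith("GCStart"):
            self.pending[fields["Count"]] = (timestamp, fields["Depth"])
        elif name.startswith("GCEnd"):
            started = self.pending.pop(fields["Count"], None)
            if started is not None:           # ignore ends whose start we missed
                start_time, generation = started
                self.durations[generation].append(timestamp - start_time)

    def summary(self):
        # generation -> (number of GCs, average seconds, longest seconds)
        return {
            gen: (len(values), sum(values) / len(values), max(values))
            for gen, values in sorted(self.durations.items())
        }


stats = GcStats()
stats.on_event("GCStart_V2", 1.0, {"Count": 1, "Depth": 0})
stats.on_event("GCEnd_V1", 1.25, {"Count": 1, "Depth": 0})
stats.on_event("GCStart_V2", 2.0, {"Count": 2, "Depth": 0})
stats.on_event("GCEnd_V1", 2.5, {"Count": 2, "Depth": 0})
stats.on_event("GCEnd_V1", 9.0, {"Count": 7, "Depth": 2})   # start never seen

for gen, (count, average, longest) in stats.summary().items():
    print(f"gen{gen}: {count} GCs, avg {average:.3f}s, max {longest:.3f}s")
```

Because only running totals per generation are kept, the aggregation can run continuously without the state growing with the length of the trace.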
Bonus project: run a process in a Windows job object
During the day, I started working on another little tool, which I was only able to finish a few days later: jobrun. This tool runs a process inside a Windows job object, and lets you apply various limits to its behavior. You can restrict the process’s memory usage, CPU time, CPU affinity, scheduling priority, and scheduling weight, and apply additional quotas, all supported by the Windows job object API.
For me, this was a useful tool for testing how a process deals with scarce resources. What happens when I can’t commit more than 300 MB of memory? How long does it take for the application to start up when I only get 3% of CPU time per scheduling interval? Can a single batch job complete within a hard limit of 30 CPU seconds? Perhaps you’ll find some other uses for this tool, too.
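The three questions above map onto job object limits that are expressed in slightly unusual units, so a small worked conversion may help. This sketch is not taken from jobrun and makes no Windows calls. It only computes the raw values that would go into the documented structure fields: memory limits in bytes, CpuRate as a percentage times 100 (cycles per 10,000 cycles), and user-mode time limits in 100-nanosecond ticks. The Python names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class JobLimits:
    # JOBOBJECT_EXTENDED_LIMIT_INFORMATION.JobMemoryLimit, in bytes
    job_memory_limit: int
    # JOBOBJECT_CPU_RATE_CONTROL_INFORMATION.CpuRate, percentage times 100
    cpu_rate: int
    # JOBOBJECT_BASIC_LIMIT_INFORMATION.PerJobUserTimeLimit, in 100 ns ticks
    per_job_user_time_limit: int


def make_limits(memory_mb, cpu_percent, cpu_seconds):
    """Convert human-friendly limits into the raw job object values."""
    if not 0 < cpu_percent <= 100:
        raise ValueError("cpu_percent must be in the range (0, 100]")
    return JobLimits(
        job_memory_limit=memory_mb * 1024 * 1024,
        cpu_rate=round(cpu_percent * 100),
        per_job_user_time_limit=cpu_seconds * 10_000_000,
    )


# The scenarios from the text: 300 MB of memory, 3% CPU, 30 CPU seconds.
limits = make_limits(memory_mb=300, cpu_percent=3, cpu_seconds=30)
print(limits)
```

On Windows, values like these would be passed to SetInformationJobObject along with the matching limit flags before the process is assigned to the job.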