tl;dr I wrote a simple proof-of-concept tool called place-probe.py which helps place dynamic tracepoints on .NET methods. For example: place-probe.py $PID 'System.Threading.Thread::Sleep'.
Dynamic tracing is one of the Linux diagnostics superpowers. By adding dynamic tracepoints on arbitrary functions across the system, you can diagnose a variety of “impossible” bugs and performance problems on a live production application without having to add instrumentation, rebuild, and restart. The underlying kernel mechanism that makes dynamic tracing possible is called uprobes (for userspace) and kprobes (for kernel functions).
Unfortunately, uprobes can only be placed on code that is backed by an on-disk image. In other...
On Windows, .NET contains numerous very useful ETW events, which can be used for tracing garbage collections, assembly loading, exceptions thrown, object allocations, and other interesting scenarios. All events can come with a stack trace, which helps understand where they’re coming from. In fact, I’d say for some events, not getting the stack trace means the event is completely useless — e.g. what good is the ExceptionThrown event if you don’t have the exception stack trace?
On Linux, .NET Core doesn’t use ETW (Event Tracing for Windows, ya know). It uses LTTng instead, which is an awesome tracing framework but doesn’t have stack...
In the beginning of December, the consultants team at Sela had a day off-site for our annual hackathon to work on a variety of projects. This day was a blast, and there was a bunch of great energy and interesting work being done all around, but my team (Avi Avni and I) focused on diagnostics tools -- my favorite -- and here are some preliminary results.
Real-time Win32 memory leak diagnoser
This is a project I've had on my todo list for a couple of years now. In a nutshell, Win32 memory leak analysis in production is quite painful because of...
I hope you're outraged that your performance tools are lying to you. For quite a while, many Java sampling profilers have been known to blatantly misrepresent reality. In a nutshell, stack sampling using the documented JVMTI GetStackTrace method produces results that are biased towards safepoints, and not representative of the real CPU processing performed by your program.
Over the years, alternative profilers popped up, trying to fix this problem by using AsyncGetCallTrace, a less-documented API that doesn't wait for a safepoint, and can produce more accurate results. Simply calling AGCT from a timer signal handler gives you a fairly reliable way...
In my last post, I lamented the lack of call stack support for LTTng events in .NET Core. Fortunately, being open source, this is somewhat correctable -- so I set out to produce a quick-and-dirty patch that adds USDT support for CoreCLR's tracing events. This post explores some of the things that then become possible, and will hopefully become available in one form or another in CoreCLR in the future.
Very Brief USDT Primer
USDT (User Statically Defined Tracing) is a lightweight approach for embedding static trace markers into user-space libraries and applications. I've taken a closer look a year ago when...
After exploring the basic profiling story, let's turn to ETW events. On Windows, the CLR is instrumented with a myriad of ETW events, which can be used to tackle very hard problems at runtime. Here are some examples of these events:
Thread start/stop (including thread pool threads)
Exceptions thrown, caught, filtered
Methods compiled (JIT)
By collecting all of, or a subset of, these events, you can get a very nice picture of what your .NET application is doing. By combining these with Windows kernel events for CPU sampling, file accesses, process creations and more -- you have a fairly...
In the same vein of my previous post on analyzing core dumps of .NET Core applications on Linux, let's take a look at what it takes to do some basic performance profiling. When starting out, here are a few things I wrote down that would be nice to do:
CPU profiling (sampling) to see where the CPU bottlenecks are
Grabbing stacks for interesting system events (file accesses, network, forks, etc.)
Tracing memory management activity such as GCs and object allocations
Identifying blocked time and the block and wake-up reasons
With this task list in mind, let's get started!
Collecting Call Stacks of .NET Core Processes
Generally speaking, a...
Recently, I had to open a core dump of a .NET Core application on Linux. I thought this walkthrough might be useful if you find yourself in the same boat, because, to be quite honest, I didn't find it trivial.
Configure Linux to Generate Core Dumps
Before you begin, you need to configure your Linux box to generate core dumps in the first place. A lot of distros will have something preconfigured, but the simplest approach is to just put a file name in the /proc/sys/kernel/core_pattern file:
# echo core > /proc/sys/kernel/core_pattern
Additionally, there's a system limit maximum size for the generated core file. ulimit -c unlimited removes...
A lot of high-level languages, runtimes, and libraries people use on Linux have USDT probes embedded in them. In some cases, you have to compile with a specific flag to get the probes embedded (e.g. in Node.js), and in other cases they are part of the default package on most major distributions (e.g. in OpenJDK). Some examples of information these probes can provide include:
Garbage collection events and latencies in Java and Node
Method invocations and latencies in Python and Ruby
Object allocations in Ruby and Java
Thread start and stop events in Java
Class load events in Java and Ruby
Now that BCC has support for USDT probes, another thing I wanted to try is look at OpenJDK probes and extract some useful examples. To follow along, install a recent OpenJDK (I used 1.8) that has USDT probes enabled. On my Fedora 22, sudo dnf install java was just enough for everything.
Conveniently, OpenJDK ships with a set of .stp files that contain probe definitions. Here's an example -- and there are many more in your $JAVA_HOME/tapset directory:
probe hotspot.thread_start = process("/usr/lib/jvm/java-1.8.0-openjdk-220.127.116.11-1.b03.fc22.x86_64/jre/lib/amd64/server/libjvm.so").mark("thread__start")
name = "thread_start";
thread_name = user_string_n($arg1, $arg2);
id = $arg3;
native_id = $arg4;
is_daemon = $arg5;
probestr = sprintf("%s(thread_name='%s',id=%d,native_id=%d,is_daemon=%d)",