If you’re reading this, I hope you’re curious about your options for running JVM diagnostic tools on containerized applications. Generally, when it comes to containers, you can either shove all your diagnostic tools into the container image, or you can try running them from the host. This short post tries to explain what works, what doesn’t, and what can be done about it. Although it is focused on JVM tools (and HotSpot specifically), a lot of the same obstacles apply to other runtimes and languages.
As a very quick reminder, container isolation on Linux works by using namespaces. Containerized processes are placed in a PID namespace that gives them private process ids that aren’t shared with the host (although they also have process ids on the host); in a mount namespace that gives them their own view of mount points, and hence their own view of the filesystem; in a network namespace that gives them their own network interfaces; and so on. A lot of diagnostic tools aren’t namespace-aware, and will happily try to open files on the host using container paths, or try to attach to a process by using the container’s PID namespace, or exhibit any number of other failures.
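To make the PID mismatch concrete: on Linux 4.1 and later, /proc/&lt;pid&gt;/status on the host includes an NSpid line listing the process id in every nested PID namespace, outermost first. A minimal sketch of reading it (the helper names here are mine, not from any tool mentioned in this post):

```python
# Sketch: translate a host PID to the PID the process sees inside its
# own PID namespace, by parsing the NSpid line of /proc/<pid>/status
# (available on Linux 4.1+).

def nspid_from_status(status_text):
    """Return the list of PIDs (outermost namespace first) from an
    /proc/<pid>/status blob, or [] if the kernel doesn't report NSpid."""
    for line in status_text.splitlines():
        if line.startswith("NSpid:"):
            return [int(p) for p in line.split()[1:]]
    return []

def container_pid(host_pid):
    """PID of the process in its innermost (container) PID namespace."""
    with open("/proc/%d/status" % host_pid) as f:
        pids = nspid_from_status(f.read())
    return pids[-1] if pids else host_pid
```

For a containerized JVM, the NSpid line typically reads something like `4137 1`: PID 4137 on the host, PID 1 inside the container.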
Additionally, container resources are often limited by using control groups. This is not so much an isolation mechanism as it is a quota mechanism: the cpu control group restricts container CPU usage shares; the memory control group restricts user and kernel memory usage; the blkio control group restricts I/O throughput and operation count; and so on.
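You can see which control groups a process belongs to by reading /proc/&lt;pid&gt;/cgroup. A small parsing sketch (the helper name is my own; note that cgroup v2 collapses everything into a single `0::` line):

```python
# Sketch: parse /proc/<pid>/cgroup. Each line has the form
# "<hierarchy-id>:<controllers>:<path>"; cgroup v2 uses one "0::<path>"
# line for the whole unified hierarchy.

def parse_cgroups(cgroup_text):
    groups = {}
    for line in cgroup_text.splitlines():
        hier, controllers, path = line.split(":", 2)
        for c in (controllers.split(",") if controllers else ["unified"]):
            groups[c] = path
    return groups
```

For a Docker container, the paths typically look like `/docker/<container-id>`, which is also how some tools tell containerized processes apart from host processes.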
Finally, a lot of container runtimes (including Docker) use seccomp to restrict the set of syscalls containerized processes can make, to further isolate them from the host and avoid nasty surprises. Turns out, though, that some of these restricted syscalls are actually essential for diagnostic tools to work properly.
JVM diagnostic mechanisms
This is by no means a complete survey, but it’s worth just listing quickly the main JVM diagnostic mechanisms and how they work, before we can consider what happens in a containerized environment. (For more on this, including source links, check out Serviceability in Hotspot from the OpenJDK documentation.)
- JVM performance data: by default, the JVM emits binary performance data into a file in the temp directory named hsperfdata_$USER/$PID. This file contains statistics on garbage collection, class loading, JIT compilation, and other events. It is the data source for jstat, and is also how jps and jinfo discover information about running JVM processes.
- JVM attach interface: by default, the JVM reacts to a QUIT signal by looking for a file named .attach_pid$PID in its working directory (an attaching tool creates this file before sending the signal). If the file exists, the JVM creates a UNIX domain socket in the temp directory named .java_pid$PID, and spawns a thread that listens for commands on that socket. jmap, jstack, and jcmd are some of the tools that rely on the attach interface for heap dumps, thread dumps, obtaining VM information, and other facilities.
- Serviceability Agent: a component that runs in an external process and reads JVM data structures from the target, using ptrace (for a live process) or ELF parsing (for a core dump). This allows live diagnostics and core dump analysis to see thread states, heap objects, call stacks, and so on. HSDB, CLHSDB, and other tools rely on the Serviceability Agent API. Notably, the JDK version has to match exactly between the original JVM and the one used to analyze the core dump or live process.
- JVMTI: this tool interface allows an external agent library (.so) to be loaded with or attached to a JVM process and register for various interesting events, including class loading, thread start, garbage collection, monitor contention, and others. To load an agent with your process you use the -agentpath command-line argument; to attach an agent to a live process you use the JVM attach interface.
- JMX: the JDK runtime provides a basic set of managed beans for inspecting the GC heap, threads, and other components. Many additional managed beans exist in various application containers like Tomcat.
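Two of the mechanisms above lend themselves to a short sketch: where the performance data lives, and the attach handshake. This is a rough sketch only, and it assumes the client runs in the same mount and PID namespaces as the target JVM (i.e. inside the container) and the default java.io.tmpdir of /tmp; the helper names are my own, while attach_payload follows the HotSpot attach wire format (a NUL-terminated protocol version, command, and exactly three arguments):

```python
import os, signal, socket, time

# Where HotSpot publishes the binary performance data that jstat,
# jps, and jinfo read (can be disabled with -XX:-UsePerfData).
def hsperfdata_path(user, pid, tmpdir="/tmp"):
    return "%s/hsperfdata_%s/%d" % (tmpdir, user, pid)

# HotSpot attach wire format: protocol version "1", the command, then
# exactly three arguments, each NUL-terminated.
def attach_payload(cmd, *args):
    parts = [b"1", cmd.encode()] + [a.encode() for a in args]
    parts += [b""] * (5 - len(parts))
    return b"\x00".join(parts) + b"\x00"

# Rough sketch of an attach client (same-namespace case only).
def attach_and_run(pid, cmd, *args, tmpdir="/tmp"):
    attach_file = "/proc/%d/cwd/.attach_pid%d" % (pid, pid)
    sock_path = "%s/.java_pid%d" % (tmpdir, pid)
    open(attach_file, "w").close()         # 1. drop the trigger file
    os.kill(pid, signal.SIGQUIT)           # 2. nudge the JVM
    for _ in range(50):                    # 3. wait for the socket
        if os.path.exists(sock_path):
            break
        time.sleep(0.1)
    s = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
    s.connect(sock_path)
    s.sendall(attach_payload(cmd, *args))  # 4. e.g. cmd="threaddump"
    return s.makefile().read()             # first line is a status code
```

Something like `attach_and_run(pid, "threaddump")`, run inside the container, is roughly the dance jstack performs under the hood.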
Another important concept to consider is perf maps, used by the Linux perf tool to map JIT-compiled code addresses to Java methods. A common way of creating these is with a JVMTI agent (e.g. perf-map-agent), which writes a perf map to the default location, /tmp/perf-$PID.map. These maps are crucial for a lot of native Linux performance tools if you plan to use them with JVM processes.
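For reference, a perf map is a plain text file with one line per JIT-compiled method: a start address and a code size in hex, followed by the symbol name. A minimal sketch of emitting one line (the helper is mine):

```python
# One perf map line: hex start address, hex code size, symbol name.
def perf_map_line(start, size, name):
    return "%x %x %s" % (start, size, name)
```

Emitting one such line per compiled method, via the JVMTI compiled-method-load events, is roughly all an agent like perf-map-agent has to do.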
Running diagnostic tools from the host
If you look at the way some of the JVM tools are implemented, it is clear that running them from the host will present a set of interesting challenges. Here’s how to address these challenges in some cases:
- The JVM performance data store will usually not be accessible from the host. However, you can bind-mount the host’s temp directory into the container, which makes tools like jstat happy. (With Docker, this would be something like -v /tmp:/tmp, so the performance data lands in a directory the host tools can see.)
- The JVM attach interface has multiple points of failure: the containerized JVM thinks its process id is X, while the host tool thinks it’s Y; and of course the attach file and the UNIX domain socket will be in the wrong mount namespace. I recently added a namespace-awareness patch to Andrei Pangin’s jattach tool, which covers the functionality of jmap, jstack, jcmd, and jinfo in a single package, so you can now use jattach from the host with no additional flags.
- The Serviceability Agent API requires the full JDK to be available on the host, and requires a perfect match between the host and container JDK. This is not a likely scenario.
- Attaching a JVMTI agent to a containerized process can be done with jattach, provided that the agent library is accessible in the container. This can be done with bind-mounts.
- JMX beans can be accessed from the host by making the container expose them remotely using RMI. This StackOverflow question and answer thread covers it well.
- If you plan on using perf maps, you need to generate them inside the container (by attaching a JVMTI agent) and then make them accessible to the host tool with the right PID. This is taken care of automatically by some tools, and was recently added to perf as well.
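One trick worth spelling out, because it is what namespace-aware tools lean on: procfs exposes each process’s filesystem root at /proc/&lt;pid&gt;/root, so a sufficiently privileged host tool can reach paths inside the container’s mount namespace, including the attach socket and the perf map. A sketch under those assumptions (the helper names are mine; resolving these paths requires root or ptrace-level access to the target):

```python
import shutil

# Host-visible path to a file inside the container of process host_pid.
def in_container(host_pid, container_path):
    return "/proc/%d/root%s" % (host_pid, container_path)

# The map an in-container agent writes, named after the container PID...
def container_map_path(host_pid, ns_pid):
    return in_container(host_pid, "/tmp/perf-%d.map" % ns_pid)

# ...versus where host-side perf looks for it, named after the host PID.
def host_map_path(host_pid):
    return "/tmp/perf-%d.map" % host_pid

# Copy the container's map to where host-side perf expects it.
def publish_perf_map(host_pid, ns_pid):
    shutil.copyfile(container_map_path(host_pid, ns_pid),
                    host_map_path(host_pid))
```

For example, for a JVM that is PID 1 in its container and PID 4137 on the host, the attach socket is visible on the host at /proc/4137/root/tmp/.java_pid1, and publish_perf_map(4137, 1) would put the perf map where a host-side perf session looks for it.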
Running diagnostic tools from the container
Although I don’t particularly like the idea of bloating your container image with diagnostic tools, suppose you’ve done it anyway. Here are some of the likely problems:
- The Serviceability Agent API uses the ptrace syscall, which is disabled in Docker’s seccomp profile (and I imagine it would be disabled by other sensible container runtimes as well). You can use a custom seccomp profile, of course, if you understand the security consequences for your host.
- Using perf and perf-based tools inside the container requires the perf_event_open syscall, which is again blocked by Docker’s default seccomp profile.
Most diagnostic tools at our disposal today were not designed with containers in mind. You could say they are not container-aware, but then they are not aware of a bazillion other things, and those don’t break them. Unfortunately, most tools will not work out of the box for containerized JVM processes, although there are ways to make them work with fairly minimal effort.
You can also follow me on Twitter, where I put stuff that doesn’t necessarily deserve a full-blown blog post.