Probing the JVM with BPF/BCC

March 31, 2016

no comments

Now that BCC has support for USDT probes, another thing I wanted to try is look at OpenJDK probes and extract some useful examples. To follow along, install a recent OpenJDK (I used 1.8) that has USDT probes enabled. On my Fedora 22, sudo dnf install java was just enough for everything.

Conveniently, OpenJDK ships with a set of .stp files that contain probe definitions. Here’s an example — and there are many more in your $JAVA_HOME/tapset directory:

probe hotspot.thread_start = process("/usr/lib/jvm/java-1.8.0-openjdk-1.8.0.77-1.b03.fc22.x86_64/jre/lib/amd64/server/libjvm.so").mark("thread__start")
{
  name = "thread_start";
  thread_name = user_string_n($arg1, $arg2);
  id = $arg3;
  native_id = $arg4;
  is_daemon = $arg5;
  probestr = sprintf("%s(thread_name='%s',id=%d,native_id=%d,is_daemon=%d)",
                     name, thread_name, id, native_id, is_daemon);
}

This tells you to expect a probe called thread__start, with various arguments specifying the name of the thread, the OS thread id, and other details. I have 520 probes in the various .stp files under the tapset directory — so let’s experiment with a few. Some of the probes are always enabled, and some will require that you provide the (relatively expensive) -XX:+ExtendedDTraceProbes startup flag to your JVM.

First, let’s trace all thread start and stop events (I created a symlink to libjvm.so to make the following commands a bit shorter):

# ./trace.py 'u:/tmp/libjvm.so:thread__start "%s [%d]", arg1, arg4' 'u:/tmp/libjvm.so:thread__stop "%s [%d]", arg1, arg4'
TIME     PID    COMM         FUNC             -
06:55:24 32157  java         thread__start    Reference Handler [32157]
06:55:24 32158  java         thread__start    Finalizer [32158]
06:55:24 32159  java         thread__start    Signal Dispatcher [32159]
06:55:24 32160  java         thread__start    C2 CompilerThread0 [32160]
06:55:24 32161  java         thread__start    C2 CompilerThread1 [32161]
06:55:24 32162  java         thread__start    C1 CompilerThread2 [32162]
06:55:24 32163  java         thread__start    Service Thread [32163]
06:55:28 32159  java         thread__stop     Signal Dispatcher [32159]
^C

Nice! How about tracing which classes are being loaded?

# ./trace.py 'u:/tmp/libjvm.so:class__loaded "%s", arg1'
TIME     PID    COMM         FUNC             -
06:58:32 32209  java         class__loaded    java/lang/Object
06:58:32 32209  java         class__loaded    java/io/Serializable
06:58:32 32209  java         class__loaded    java/lang/Comparable
06:58:32 32209  java         class__loaded    java/lang/CharSequence
06:58:32 32209  java         class__loaded    java/lang/String
06:58:32 32209  java         class__loaded    java/lang/reflect/AnnotatedElement
06:58:32 32209  java         class__loaded    java/lang/reflect/GenericDeclaration
06:58:32 32209  java         class__loaded    java/lang/reflect/Type
06:58:32 32209  java         class__loaded    java/lang/Class
06:58:32 32209  java         class__loaded    java/lang/Cloneable
06:58:32 32209  java         class__loaded    java/lang/ClassLoader
06:58:32 32209  java         class__loaded    java/lang/System
06:58:32 32209  java         class__loaded    java/lang/Throwable
06:58:32 32209  java         class__loaded    java/lang/Error
06:58:32 32209  java         class__loaded    java/lang/ThreadDeath
06:58:32 32209  java         class__loaded    java/lang/Exception
^C

Very cool. Now let’s try one of the extended probes — let’s get some statistics about which methods are being executed often, from the method__entry probe:

# ./argdist.py -C 'u:/tmp/libjvm.so:method__entry():char*:arg4#top 5 methods' -T 5
[07:01:43]
top 5 methods
        COUNT      EVENT
        720        arg4 = arraycopy
        899        arg4 = get
        2263       arg4 = charAt
        4631       arg4 = <init>
        28897      arg4 = isPrime
[07:01:44]
top 5 methods
        COUNT      EVENT
        732        arg4 = arraycopy
        899        arg4 = get
        2266       arg4 = charAt
        4658       arg4 = <init>
        99999      arg4 = isPrime
^C

And finally, let’s take a look at which object sizes are prevalent during our application’s execution — this another extended probe called object__alloc:

# ./argdist.py -H 'u:/tmp/libjvm.so:object__alloc():u32:arg4*8#object sizes'
[07:04:52]                                                                            
     object sizes        : count     distribution                                     
         0 -> 1          : 0        |                                        |        
         2 -> 3          : 0        |                                        |        
         4 -> 7          : 0        |                                        |        
         8 -> 15         : 0        |                                        |        
        16 -> 31         : 0        |                                        |        
        32 -> 63         : 0        |                                        |        
        64 -> 127        : 0        |                                        |        
       128 -> 255        : 1962     |****************************************|        
       256 -> 511        : 1332     |***************************             |        
       512 -> 1023       : 802      |****************                        |        
      1024 -> 2047       : 279      |*****                                   |        
      2048 -> 4095       : 101      |**                                      |        
      4096 -> 8191       : 5        |                                        |        
      8192 -> 16383      : 8        |                                        |        
     16384 -> 32767      : 30       |                                        |        
     32768 -> 65535      : 264      |*****                                   |        
     65536 -> 131071     : 7        |                                        |        
    131072 -> 262143     : 114      |**                                      |        
    262144 -> 524287     : 185      |***                                     |        
    524288 -> 1048575    : 469      |*********                               |        
   1048576 -> 2097151    : 301      |******                                  |        
   2097152 -> 4194303    : 1        |                                        |        
^C

Admittedly, probing user-space Java libraries with a kernel-mode mechanism like BPF isn’t the most efficient thing in the world. And still, I would expect the slowdown introduced by the lightweight probes (those that don’t require -XX:+ExtendedDTraceProbes) to be fairly reasonable, most of the time. As always, though, YMMV.

Add comment
facebook linkedin twitter email

Leave a Reply

Your email address will not be published.

You may use these HTML tags and attributes: <a href="" title=""> <abbr title=""> <acronym title=""> <b> <blockquote cite=""> <cite> <code> <del datetime=""> <em> <i> <q cite=""> <s> <strike> <strong>

*