Analyzing a .NET Core Core Dump on Linux

February 26, 2017

5 comments

Recently, I had to open a core dump of a .NET Core application on Linux. I thought this walkthrough might be useful if you find yourself in the same boat, because, to be quite honest, I didn’t find it trivial.

Configure Linux to Generate Core Dumps

Before you begin, you need to configure your Linux box to generate core dumps in the first place. A lot of distros will have something preconfigured, but the simplest approach is to just put a file name in the /proc/sys/kernel/core_pattern file:

# echo core > /proc/sys/kernel/core_pattern

Additionally, there’s a system limit maximum size for the generated core file. ulimit -c unlimited removes that limit. Now, whenever your .NET Core process (or any other process) crashes, you’ll get a core file generated in the same directory. By the way, .NET Core on Linux x86_64 reserves a pretty gigantic address space, so expect your core files to be pretty big. But compression helps — I had a 6.5GB core dump compress into a 59MB gzip file.

Installing LLDB

To open the core dump, you’ll need LLDB built with the same architecture as your CoreCLR. Here’s how I found out what I needed:

$ find /usr/share/dotnet -name libsosplugin.so
/usr/share/dotnet/shared/Microsoft.NETCore.App/1.1.0/libsosplugin.so

$ ldd $(find /usr/share/dotnet -name libsosplugin.so) | grep lldb
liblldb-3.5.so.1 => /usr/lib/x86_64-linux-gnu/liblldb-3.5.so.1 (0x00007f0a6b2d8000)

Seeing that LLDB 3.5 was required, I installed it with sudo apt install lldb-3.5, but YMMV on other distros, of course.

Opening The Core File And Loading SOS

Now you’re ready to open the core file in LLDB. If you’re doing this on a different box, you’ll need the same version of .NET Core installed — that’s where the dotnet binary, SOS itself, and the DAC (debugger data access component) are coming from. You could also copy the /usr/share/dotnet/shared/Microsoft.NETCore.App/nnnn directory over, of course.

$ lldb $(which dotnet) --core ./core

Once inside LLDB, you’ll need to load the SOS plugin. It’s the one we found earlier:

(lldb) plugin load /usr/share/dotnet/shared/Microsoft.NETCore.App/1.1.1/libsosplugin.so

Now, if everything went well, the SOS plugin needs the DAC (libmscordaccore.so), so you’ll need to tell it where to look:

(lldb) setclrpath /usr/share/dotnet/shared/Microsoft.NETCore.App/1.1.1

With that, SOS should be loaded and ready for use.

Running Analysis

You’d think you can just start running the SOS commands you know and love, but there’s one final hurdle. Here’s what happened when I opened a core file generated from a crash, and tried to get the exception information (note that you should prefix SOS commands with ‘sos’):

(lldb) sos PrintException
The current thread is unmanaged

… which is kind of odd, right? Considering that the process crashed as a result of a managed exception. Looking at the docs, it looks like SOS and LLDB have trouble communicating around the current thread’s identity. So first, let’s find the thread that encountered an exception:

(lldb) sos Threads
ThreadCount:      13
UnstartedThread:  0
BackgroundThread: 11
PendingThread:    0
DeadThread:       0
Hosted Runtime:   no
                                                                                                        Lock
       ID OSID ThreadOBJ           State GC Mode     GC Alloc Context                  Domain           Count Apt Exception
XXXX    1 57ff 0000000000C2B380  2020020 Preemptive  (nil):(nil)                       0000000000C195C0 0     Ukn
XXXX    2 5807 0000000000CAAF80    21220 Preemptive  0x7f5ad2fcbc40:0x7f5ad2fcdae0     0000000000C195C0 0     Ukn (Finalizer)
XXXX    4 580a 0000000000DC2730    21220 Preemptive  (nil):(nil)                       0000000000C195C0 0     Ukn
XXXX    6 580d 0000000000EC1D70    21220 Preemptive  0x7f5ad576b4d0:0x7f5ad576cf58     0000000000C195C0 0     Ukn
XXXX    7 5a13 00007F5ABC0292A0  1021220 Preemptive  0x7f5ad5888d30:0x7f5ad5888fd0     0000000000C195C0 0     Ukn (Threadpool Worker)
XXXX    8 5a15 00007F5AC006A3F0    21020 Preemptive  0x7f5ad594dd10:0x7f5ad594ece8     0000000000C195C0 0     Ukn System.IO.FileNotFoundException 00007f5ad593fa80 (nested exceptions)
XXXX    9 5a16 00007F5AC00916A0    21220 Preemptive  (nil):(nil)                       0000000000C195C0 0     Ukn
XXXX   10 5a17 00007F5AC80015D0  1021220 Preemptive  0x7f5ad593a9a0:0x7f5ad593b978     0000000000C195C0 0     Ukn (Threadpool Worker)
XXXX    5 5a18 00007F5AC0814DF0    21220 Preemptive  0x7f5ad50ed1b8:0x7f5ad50eefd0     0000000000C195C0 0     Ukn
XXXX    3 5a19 00007F5C54000A00  1020220 Preemptive  (nil):(nil)                       0000000000C195C0 0     Ukn (Threadpool Worker)
XXXX   11 5a1a 00007F5C50019270  1021220 Preemptive  0x7f5ad58a5710:0x7f5ad58a6fd0     0000000000C195C0 0     Ukn (Threadpool Worker)
XXXX   12 5a1b 00007F5AC0831B80  1021220 Preemptive  0x7f5ad58fcf68:0x7f5ad58fd000     0000000000C195C0 0     Ukn (Threadpool Worker)
XXXX   13 5a1c 0000000000E8F720  1021220 Preemptive  0x7f5ad593bc80:0x7f5ad593d978     0000000000C195C0 0     Ukn (Threadpool Worker)

Thread #8 looks suspicious, what with the System.IO.FileNotFoundException in the Exception column. Now, let’s see all the LLDB threads:

(lldb) thread list
Process 0 stopped
* thread #1: tid = 0, 0x00007f5c5d83b7ef libc.so.6`__GI_raise(sig=2) + 159 at raise.c:58, name = 'dotnet', stop reason = signal SIGABRT
  thread #2: tid = 1, 0x00007f5c5e482510 libpthread.so.0`__pthread_cond_wait + 256, stop reason = signal SIGABRT
  thread #3: tid = 2, 0x00007f5c5d907d29 libc.so.6`syscall + 25, stop reason = signal SIGABRT
  thread #4: tid = 3, 0x00007f5c5d907d29 libc.so.6`syscall + 25, stop reason = signal SIGABRT
... more threads snipped for brevity ...

Here, it looks like thread 1 is the one with the exception being raised. So we have to map the OS thread ID from the first command, to the LLDB thread id from the second command:

(lldb) setsostid 5a15 1
Mapped sos OS tid 0x5a15 to lldb thread index 1

And now, we’re ready to roll:

(lldb) sos PrintException
Exception object: 00007f5ad593fa80
Exception type:   System.IO.FileNotFoundException
Message:          Could not load the specified file.
InnerException:   <none>
StackTrace (generated):    SP               IP               Function
    00007F5C45D227C0 00007F5BE37412E7 System.Private.CoreLib.ni.dll!System.Runtime.Loader.AssemblyLoadContext.ResolveUsingEvent(System.Reflection.AssemblyName)+0x20ab07
    00007F5C45D227F0 00007F5BE353664F System.Private.CoreLib.ni.dll!System.Runtime.Loader.AssemblyLoadContext.ResolveUsingResolvingEvent(IntPtr, System.Reflection.AssemblyName)+0x4f

StackTraceString: <none>
HResult: 80070002

Nested exception -------------------------------------------------------------
Exception object: 00007f5ad593dea0
Exception type:   System.InvalidOperationException
Message:          Authorization cannot be requested before logging in.
InnerException:   <none>
StackTrace (generated):
    SP               IP               Function
    00007F5C45D29890 00007F5BE63002FE kitt3ns.dll!WebApplication.Controllers.AuthorizationBackgroundWorker.VerifyAuthorized(System.String)+0xae
    00007F5C45D298D0 00007F5BE630022B kitt3ns.dll!WebApplication.Controllers.AuthorizationBackgroundWorker.RequestAuthorization()+0x2b
    00007F5C45D298E0 00007F5BE55BC31C kitt3ns.dll!WebApplication.Controllers.AuthorizationBackgroundWorker+<>c.<Authorize>b__0_0()+0x4c
    00007F5C45D29910 00007F5BE33BDF11 System.Private.CoreLib.ni.dll!System.Threading.ExecutionContext.Run(System.Threading.ExecutionContext, System.Threading.ContextCallback, System.Object)+0x111

StackTraceString: <none>
HResult: 80131509

(lldb) sos ClrStack
OS Thread Id: 0x5a15 (1)
        Child SP               IP Call Site
00007F5C45D272C8 00007f5c5d83b7ef [HelperMethodFrame: 00007f5c45d272c8]
00007F5C45D273E0 00007F5BE33BDF11 System.Threading.ExecutionContext.Run(System.Threading.ExecutionContext, System.Threading.ContextCallback, System.Object)
00007F5C45D29770 00007f5c5cbe9bad [HelperMethodFrame: 00007f5c45d29770]
00007F5C45D29890 00007F5BE63002FE WebApplication.Controllers.AuthorizationBackgroundWorker.VerifyAuthorized(System.String) [/home/vagrant/kitt3ns/Controllers/AccountController.cs @ 37]
00007F5C45D298D0 00007F5BE630022B WebApplication.Controllers.AuthorizationBackgroundWorker.RequestAuthorization() [/home/vagrant/kitt3ns/Controllers/AccountController.cs @ 30]
00007F5C45D298E0 00007F5BE55BC31C WebApplication.Controllers.AuthorizationBackgroundWorker+<>c.<Authorize>b__0_0() [/home/vagrant/kitt3ns/Controllers/AccountController.cs @ 24]
00007F5C45D29910 00007F5BE33BDE71 System.Threading.ExecutionContext.Run(System.Threading.ExecutionContext, System.Threading.ContextCallback, System.Object)
00007F5C45D29B50 00007f5c5cbfb207 [GCFrame: 00007f5c45d29b50] 
00007F5C45D29D30 00007f5c5cbfb207 [DebuggerU2MCatchHandlerFrame: 00007f5c45d29d30] 

This gives us the exception information and the thread’s current stack, if we want it. We could similarly inspect other threads by mapping the OS thread id to the LLDB thread id, but for a thread that didn’t have an exception, where do you get that clue that connects the OS thread id to the debugger thread ID? Well, it seems that GDB is using the same numbering as LLDB, but in GDB you can actually see the LWP id (on Linux, GDB LWP = kernel pid = thread) using ‘info threads’:

$ gdb $(which dotnet) --core ./core
...

(gdb) info threads
  Id   Target Id         Frame
* 1    Thread 0x7f5c45d2a700 (LWP 23061) __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:58
  2    Thread 0x7f5c5eaab740 (LWP 22527) 0x00007f5c5e482510 in pthread_cond_wait@@GLIBC_2.3.2 ()
    at ../sysdeps/unix/sysv/linux/x86_64/pthread_cond_wait.S:219
  3    Thread 0x7f5c5b411700 (LWP 22529) syscall () at ../sysdeps/unix/sysv/linux/x86_64/syscall.S:38
  4    Thread 0x7f5c5ac10700 (LWP 22530) syscall () at ../sysdeps/unix/sysv/linux/x86_64/syscall.S:38
  5    Thread 0x7f5c5a40f700 (LWP 22531) 0x00007f5c5d9020bd in poll () at ../sysdeps/unix/syscall-template.S:84
  6    Thread 0x7f5c59c0e700 (LWP 22532) 0x00007f5c5e485d8d in __pause_nocancel () at ../sysdeps/unix/syscall-template.S:84
  7    Thread 0x7f5c5940d700 (LWP 22533) 0x00007f5c5e482510 in pthread_cond_wait@@GLIBC_2.3.2 ()
    at ../sysdeps/unix/sysv/linux/x86_64/pthread_cond_wait.S:219
  8    Thread 0x7f5c589b2700 (LWP 22534) 0x00007f5c5e482510 in pthread_cond_wait@@GLIBC_2.3.2 ()
    at ../sysdeps/unix/sysv/linux/x86_64/pthread_cond_wait.S:219
  9    Thread 0x7f5c498ae700 (LWP 22535) 0x00007f5c5e4828b9 in pthread_cond_timedwait@@GLIBC_2.3.2 ()
    at ../sysdeps/unix/sysv/linux/x86_64/pthread_cond_timedwait.S:258
  10   Thread 0x7f5c454ef700 (LWP 22538) 0x00007f5c5e4856ed in __close_nocancel () at ../sysdeps/unix/syscall-template.S:84
  11   Thread 0x7f5ad2324700 (LWP 22540) 0x00007f5c5e4856ed in __close_nocancel () at ../sysdeps/unix/syscall-template.S:84
  12   Thread 0x7f5ad1b23700 (LWP 22541) syscall () at ../sysdeps/unix/sysv/linux/x86_64/syscall.S:38
  13   Thread 0x7f5ad2b25700 (LWP 23059) 0x00007f5c5e4828b9 in pthread_cond_timedwait@@GLIBC_2.3.2 ()
    at ../sysdeps/unix/sysv/linux/x86_64/pthread_cond_timedwait.S:258
... more output snipped for brevity ...

So, for example, suppose we wanted to know what managed thread #6 (OS thread id 0x580d from the ‘sos Threads’ output above) was doing when the dump file was generated. 0x580d = 22541, which is thread #12 in the output above. Going back to LLDB (note the hex notation for both thread ids):

(lldb) setsostid 580d c
Mapped sos OS tid 0x580d to lldb thread index 12

(lldb) clrstack
OS Thread Id: 0x580d (12)
        Child SP               IP Call Site
00007F5AD1B227F8 00007f5c5d907d29 [InlinedCallFrame: 00007f5ad1b227f8] Microsoft.AspNetCore.Server.Kestrel.Internal.Networking.Libuv+NativeMethods.uv_run(Microsoft.AspNetCore.Server.Kestrel.Internal.Networking.UvLoopHandle, Int32)
00007F5AD1B227F8 00007f5be45cea3a [InlinedCallFrame: 00007f5ad1b227f8] Microsoft.AspNetCore.Server.Kestrel.Internal.Networking.Libuv+NativeMethods.uv_run(Microsoft.AspNetCore.Server.Kestrel.Internal.Networking.UvLoopHandle, Int32)
00007F5AD1B227E0 00007F5BE45CEA3A DomainBoundILStubClass.(Microsoft.AspNetCore.Server.Kestrel.Internal.Networking.UvLoopHandle, Int32)
00007F5AD1B22890 00007F5BE45CE968 Microsoft.AspNetCore.Server.Kestrel.Internal.Networking.Libuv.run(Microsoft.AspNetCore.Server.Kestrel.Internal.Networking.UvLoopHandle, Int32)
00007F5AD1B228B0 00007F5BE45CBCFF Microsoft.AspNetCore.Server.Kestrel.Internal.KestrelThread.ThreadStart(System.Object)
00007F5AD1B22910 00007F5BE33BDE71 System.Threading.ExecutionContext.Run(System.Threading.ExecutionContext, System.Threading.ContextCallback, System.Object)
00007F5AD1B22B50 00007f5c5cbfb207 [GCFrame: 00007f5ad1b22b50]
00007F5AD1B22D30 00007f5c5cbfb207 [DebuggerU2MCatchHandlerFrame: 00007f5ad1b22d30]

Other SOS commands that don’t depend on thread context (e.g. listing assemblies, heap objects, finalization queues and so on) do not require any fiddling with thread ids, and you can just run them directly.

Summary

So, what we had to do in order to open a .NET Core core dump from a Linux system was:

  • Set up the Linux system to generate core dumps on crash
  • Copy or install the right version of .NET Core on the analysis machine
  • Install the version of LLDB matching your .NET Core’s SOS plugin
  • Load the SOS plugin in LLDB and tell it where to find the DAC
  • Set the debugger thread id for SOS thread-sensitive commands to work
  • Run sos PrintException or any other commands to analyze the crash

Fun fun fun.


You can also follow me on Twitter, where I put stuff that doesn’t necessarily deserve a full-blown blog post.

Add comment
facebook linkedin twitter email

Leave a Reply

Your email address will not be published.

You may use these HTML tags and attributes: <a href="" title=""> <abbr title=""> <acronym title=""> <b> <blockquote cite=""> <cite> <code> <del datetime=""> <em> <i> <q cite=""> <s> <strike> <strong>

*

5 comments

  1. Steve JohnsonFebruary 27, 2017 ב 12:40 AM

    Interesting article. Now that this will be more of a thing, I wonder what I would have to do to make SOSEX work under LLDB? I you have any insight, please share 🙂

    Reply
    1. Sasha Goldshtein
      Sasha GoldshteinFebruary 27, 2017 ב 12:28 PM

      Oh I doubt it would be any fun. But I really haven’t looked into the LLDB plugin model, and I also haven’t looked at how the cross-platform SOS talks to the DAC.

      Reply
      1. liurongweiMarch 1, 2017 ב 4:27 AM

        exec sos threads it gave me a error like this :

        lldb: symbol lookup error: /usr/share/dotnet/shared/Microsoft.NETCore.App/1.0.1/libsosplugin.so: undefined symbol: _ZN4lldb10SBFileSpec11SetFilenameEPKc

        Reply
  2. Mark RendleFebruary 27, 2017 ב 3:30 PM

    Thanks for this, most informative 🙂

    Re: “.NET Core on Linux x86_64 reserves a pretty gigantic address space”, how gigantic are we talking?

    Reply
    1. Sasha Goldshtein
      Sasha GoldshteinFebruary 27, 2017 ב 4:44 PM

      A “Hello World” ASP.NET Core process had >6GB of address space reserved. The real problem is that the core dump has it all on disk, or I wouldn’t care.

      Reply