Using (Modern) C++ in Driver Development

November 30, 2016


When most developers think of writing a driver, they think of hard-core C programming, with C99/C11 usage as a bonus, if they're lucky.

However, C++ can be used in driver development today, and not just for the ability to declare variables anywhere in a function (also available in C99), but also for more useful C++ features, both old and new, available with the C++11 and C++14 standards. In the kernel there is no standard library or C++ runtime library, which means most types used in user mode, such as std::string, std::vector and std::shared_ptr, are simply unavailable. Exception handling with try/catch constructs cannot be used either.

In this post, I'll give several examples of features that make driver code easier to write and maintain, and that help prevent errors such as memory leaks. The list is by no means complete, but it should give you some ideas.

1. Using RAII

The Resource Acquisition Is Initialization (RAII) idiom is a long-time favorite in C++ (although poorly named). It ties a resource to the lifetime of an object: the resource is acquired in the constructor and released in the destructor, so cleanup happens automatically, whether that resource is memory, a synchronization object, or something else. Many existing types in the standard library support RAII, and it's equally useful when dealing with kernel constructs.

Here's an example that uses a common synchronization primitive called a fast mutex. A fast mutex is not exposed to user mode, but it behaves much like a normal mutex (the kind user mode creates with CreateMutex and related functions). It is faster than a normal mutex because it does not support recursive acquisition, which means less management overhead. Another difference from a normal mutex is that acquiring a fast mutex raises the current IRQL to APC_LEVEL.

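struct FastMutex : FAST_MUTEX {
    FastMutex() {
        ExInitializeFastMutex(this);
    }
};

struct FastMutexLocker {
    FastMutexLocker(PFAST_MUTEX mutex) : m_pFastMutex(mutex) {
        ExAcquireFastMutex(m_pFastMutex);
    }
    ~FastMutexLocker() {
        ExReleaseFastMutex(m_pFastMutex);
    }

    PFAST_MUTEX m_pFastMutex;
};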

Here we see two things. First, a FastMutex structure is declared that inherits from the kernel's FAST_MUTEX structure, so the fast mutex is initialized automatically (initialization is easy to forget, especially when the mutex is part of another class). Second, the FastMutexLocker class automates acquiring and releasing the fast mutex by calling ExAcquireFastMutex in its constructor and ExReleaseFastMutex in its destructor. Here's an example usage:

bool SandBox::IsProcessInSandBox(HANDLE pid) {
    FastMutexLocker locker(&m_ProcessesLock);

    for (size_t i = 0; i < m_Processes.Size(); i++)
        if (m_Processes[i]->ProcessId == pid)
            return true;
    return false;
}

Assume m_Processes is an array that may be modified while the loop searches it for a process. We therefore need to protect the array from concurrent access, in this case with a fast mutex called m_ProcessesLock that is a member of the SandBox class. Notice how RAII means writing less code: the fast mutex is released automatically when locker goes out of scope, including on the early return from inside the loop, so there is no way to forget to release it.
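To watch the RAII behavior outside the kernel, here is a user-mode sketch. The FAST_MUTEX type and the three Ex* functions are hypothetical stand-ins that only track whether the mutex is held (the real structure has a different layout, and the real functions also change the IRQL). The sketch also deletes the locker's copy operations, since a copied locker would release the same mutex twice; that is an addition, not part of the code above.

```cpp
#include <cassert>

// Hypothetical user-mode stand-ins for the kernel API (not the real layout or behavior).
struct FAST_MUTEX { bool Held; };
using PFAST_MUTEX = FAST_MUTEX*;

void ExInitializeFastMutex(PFAST_MUTEX mutex) { mutex->Held = false; }
void ExAcquireFastMutex(PFAST_MUTEX mutex) {
    assert(!mutex->Held);   // a fast mutex cannot be acquired recursively
    mutex->Held = true;
}
void ExReleaseFastMutex(PFAST_MUTEX mutex) {
    assert(mutex->Held);
    mutex->Held = false;
}

struct FastMutex : FAST_MUTEX {
    FastMutex() { ExInitializeFastMutex(this); }
};

struct FastMutexLocker {
    explicit FastMutexLocker(PFAST_MUTEX mutex) : m_pFastMutex(mutex) {
        ExAcquireFastMutex(m_pFastMutex);
    }
    ~FastMutexLocker() {
        ExReleaseFastMutex(m_pFastMutex);
    }
    // A copy would release the same mutex a second time, so forbid copying.
    FastMutexLocker(const FastMutexLocker&) = delete;
    FastMutexLocker& operator=(const FastMutexLocker&) = delete;

private:
    PFAST_MUTEX m_pFastMutex;
};

// Hypothetical lookup that returns from inside the loop while holding the lock.
bool Contains(FastMutex& lock, const int* items, int count, int value) {
    FastMutexLocker locker(&lock);
    for (int i = 0; i < count; i++)
        if (items[i] == value)
            return true;    // the locker's destructor releases the mutex here...
    return false;           // ...and here
}
```

Calling Contains twice in a row would trip the stub's recursion check if the first call had left the mutex held.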


2. Memory Management

Memory management is critical in a driver, since kernel memory that is not freed is not reclaimed when the driver unloads; it stays allocated until the next system boot. Accessing invalid memory causes a bugcheck (blue screen). Clearly, managing memory correctly is essential.

In the kernel, general memory allocation is done with the ExAllocatePool* family of functions, and memory is freed with ExFreePool*. These functions resemble the user-mode malloc/free, but they take at least one extra parameter, the pool from which to allocate, which is usually PagedPool (memory that can be paged out to disk) or NonPagedPool (memory that is always resident in RAM). To help track allocations, an extra parameter called a tag can be supplied: a 4-byte value that identifies the allocation. Tags make it possible to track allocations with tools such as PoolMon (from the WDK) or my GUI variant, PoolMonEx (available on GitHub).
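A tag is usually written as a four-character literal such as 'dnaS'. Because x86/x64 stores the value little-endian, PoolMon shows the characters in memory order, so that literal appears as "Sand"; this is why tags are often written backwards in source. The sketch below builds the same value explicitly from characters in display order, which avoids relying on the implementation-defined value of multi-character literals. MakeTag and SandBoxTag are hypothetical names, not WDK definitions, and ULONG is aliased to a 32-bit integer so the sketch compiles outside the WDK.

```cpp
#include <cassert>
#include <cstdint>

using ULONG = std::uint32_t;   // stand-in for the Windows ULONG type (32 bits)

// Hypothetical helper: packs four characters, given in the order PoolMon displays
// them, into a tag value. The first character goes into the lowest byte, which is
// the first byte in memory on a little-endian CPU.
constexpr ULONG MakeTag(char c0, char c1, char c2, char c3) {
    return static_cast<ULONG>(static_cast<unsigned char>(c0)) |
           static_cast<ULONG>(static_cast<unsigned char>(c1)) << 8 |
           static_cast<ULONG>(static_cast<unsigned char>(c2)) << 16 |
           static_cast<ULONG>(static_cast<unsigned char>(c3)) << 24;
}

// Same value that MSVC gives the literal 'dnaS'.
constexpr ULONG SandBoxTag = MakeTag('S', 'a', 'n', 'd');
```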

One way to simplify calling these functions is to overload the C++ new and delete operators, with new accepting extra arguments for the pool type and the tag, like so:

void* __cdecl operator new(size_t size, POOL_TYPE pool, ULONG tag = 0) {
    return ExAllocatePoolWithTag(pool, size, tag);   // nullptr on failure
}

void __cdecl operator delete(void* p) {
    ExFreePool(p);
}

With these definitions in place, the new operator can be invoked like so:

auto sb = new (PagedPool) SandBox(m_SandBoxId);

In this case the tag was not supplied and defaulted to zero. Also, notice the auto keyword, which simplifies the code, since it's obvious (at least to the compiler) that sb is of type SandBox*.


3. Lambda Functions

Lambda functions can simplify and streamline the various registration functions that many types of drivers require. For example, suppose I'm writing a file system mini-filter driver; such a driver must register itself with the system and provide a host of callbacks for the operations it's interested in intercepting, such as read, write and create. Let's also suppose I have a global Driver class singleton to which I want these callbacks delegated, because that class holds state required when handling them.

Here's an example of filling an array of FLT_OPERATION_REGISTRATION structures, one of the substructures required when a mini-filter calls FltRegisterFilter in its DriverEntry function:

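const FLT_OPERATION_REGISTRATION Callbacks[] = {
    { IRP_MJ_CREATE, 0,
        [](auto data, auto objects, auto context) {
            return Driver::Get()->PreCreateOperation(data, objects, context);
        },
        nullptr },
    { IRP_MJ_OPERATION_END }
};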

The lambda is converted to a plain C function pointer (which is what the structure expects); this works because the lambda captures nothing. Also notice the liberal use of auto in the lambda parameters (a C++14 feature), which simplifies the delegation code. The actual function in the Driver class will, of course, list the real types, but since all the lambda does is delegate, the types are unimportant here.
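The delegation trick can be tried in user mode. In the sketch below, PreOpCallback, OperationRegistration and the Driver singleton are hypothetical stand-ins for the filter manager types and the Driver class described above; the point is only that a captureless generic lambda converts to a C function pointer, with the auto parameter types deduced from the pointer's signature.

```cpp
#include <cassert>

// Hypothetical stand-ins for the filter manager types (not the real declarations).
using PreOpCallback = int (*)(void* data, const void* objects, void** context);

struct OperationRegistration {
    unsigned char MajorFunction;
    PreOpCallback PreOperation;   // a plain C function pointer, as a C API expects
};

// Hypothetical singleton holding the state the callbacks need.
class Driver {
public:
    static Driver* Get() {
        static Driver instance;
        return &instance;
    }
    int PreCreateOperation(void* data, const void* objects, void** context) {
        (void)data; (void)objects; (void)context;
        return ++m_CreateCount;   // stands in for real per-driver work
    }
    int CreateCount() const { return m_CreateCount; }

private:
    int m_CreateCount = 0;
};

// The captureless generic lambda converts to PreOpCallback; data, objects and
// context are deduced as void*, const void* and void**.
const OperationRegistration Callbacks[] = {
    { 0 /* hypothetical "create" code */,
      [](auto data, auto objects, auto context) {
          return Driver::Get()->PreCreateOperation(data, objects, context);
      } },
};
```

A lambda with captures would not compile here, since only captureless lambdas have the conversion to a function pointer.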


4. Standard Library Replacements

Clearly missing are useful classes like strings and containers. Copying the existing containers from the user-mode standard library won't work, since they depend too much on user-mode facilities (as a simple example, they rely on the standard operator new, which is not available in the kernel). However, it's not too difficult to create our own. We don't need fancy iterators for containers and such, but we can still get decent usage patterns à la modern C++.

Let's start with a string class. The kernel uses UNICODE_STRING structures in most APIs, and it may be tempting to create a wrapper class for this structure. However, this is more difficult than it looks, because of ownership issues for the pointed-to string. For example, RtlInitUnicodeString doesn't allocate anything; it just sets the internal pointer to the provided string pointer (and sets the lengths). In this case there's no way of knowing whether the pointed-to string is statically allocated (like a string literal) or was dynamically allocated. Also, the rules for copying such strings vary depending on the API used. Therefore, I suggest discarding the idea of wrapping a UNICODE_STRING directly.
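The ownership problem is easy to demonstrate. The sketch below uses a hypothetical user-mode replica of UNICODE_STRING and a function that mimics what RtlInitUnicodeString does: it copies nothing and only records a pointer and byte lengths. (On Windows WCHAR is 2 bytes; wchar_t may be wider elsewhere, hence sizeof.)

```cpp
#include <cassert>
#include <cwchar>

// Hypothetical replica of UNICODE_STRING: lengths are in bytes, and Buffer
// need not be null-terminated.
struct UnicodeStringSketch {
    unsigned short Length;          // bytes in use, excluding any terminator
    unsigned short MaximumLength;   // bytes available in Buffer
    wchar_t* Buffer;
};

// Mimics RtlInitUnicodeString: no allocation, no copy; the structure simply
// points at the caller's characters.
void InitUnicodeStringSketch(UnicodeStringSketch* us, wchar_t* source) {
    us->Buffer = source;
    if (source) {
        auto bytes = std::wcslen(source) * sizeof(wchar_t);
        us->Length = static_cast<unsigned short>(bytes);
        us->MaximumLength = static_cast<unsigned short>(bytes + sizeof(wchar_t));
    }
    else {
        us->Length = us->MaximumLength = 0;
    }
}
```

Nothing in the resulting structure records whether Buffer came from a literal, a stack array or the pool, which is exactly why a wrapper cannot know who should free it.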

Instead, we'll create a separate string class and provide a way to get a UNICODE_STRING object when needed. Here are the declarations for the constructors, assignment operators and destructor:

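class string {
public:
    string(const wchar_t* str = nullptr, POOL_TYPE pool = PagedPool, ULONG tag = 0);
    string(PCUNICODE_STRING str, POOL_TYPE pool = PagedPool, ULONG tag = 0);
    string(const string& other);
    string& operator= (const string& other);
    string(string&& other);
    string& operator=(string&& other);
    ~string();

private:
    wchar_t* m_str;
    size_t m_Len;
    POOL_TYPE m_Pool;
    ULONG m_Tag;
};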


As with the new operator, we want to be able to choose which pool to allocate the string from, as well as a tag. We provide a way to initialize the string from a UNICODE_STRING as well as from a character pointer. We also provide a move constructor and a move assignment operator, since moving a string only transfers its buffer pointer instead of allocating and copying the characters. Other members of interest are the index operator[] for accessing characters, a Length() method, and comparison operator overloads (==, > etc.); fortunately, the kernel implements C standard functions such as wcscmp and wcslen.

Here are some examples for implementing some of the members:

string::string(const wchar_t* str, POOL_TYPE pool, ULONG tag) : m_Pool(pool), m_Tag(tag) {
    if (str) {
        m_Len = wcslen(str);
        if (!Allocate(m_Len, str))
            m_Len = 0;      // allocation failed; leave an empty string
    }
    else {
        m_str = nullptr;
        m_Len = 0;
    }
}

string::~string() {
    if (m_str)
        ExFreePoolWithTag(m_str, m_Tag);
}

string::string(string&& other) {
    m_Len = other.m_Len;
    m_str = other.m_str;
    m_Pool = other.m_Pool;
    m_Tag = other.m_Tag;    // the buffer must later be freed with its own tag
    other.m_str = nullptr;
    other.m_Len = 0;
}

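bool string::operator==(const string& other) const {
    // an empty string may have a null m_str, which wcscmp must not receive
    if (!m_str || !other.m_str)
        return m_Len == other.m_Len;
    return wcscmp(m_str, other.m_str) == 0;
}

UNICODE_STRING* string::GetUnicodeString(PUNICODE_STRING pUnicodeString) {
    // the result points at our buffer; it is valid only while this object is unchanged
    RtlInitUnicodeString(pUnicodeString, m_str);
    return pUnicodeString;
}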

wchar_t* string::Allocate(size_t chars, const wchar_t* src) {
    m_str = static_cast<wchar_t*>(ExAllocatePoolWithTag(m_Pool, sizeof(wchar_t) * (chars + 1), m_Tag));
    if (!m_str)
        return nullptr;
    if (src)
        memcpy(m_str, src, sizeof(wchar_t) * (chars + 1));   // includes the terminating null

    return m_str;
}
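The move assignment operator is declared but not shown. Here is one possible implementation, as a user-mode sketch: POOL_TYPE and the two pool functions are hypothetical stand-ins backed by malloc/free, copying is simply deleted, and Get() is a hypothetical accessor added for the example. The essentials are guarding against self-assignment, freeing the current buffer before taking over the other one, and leaving the source empty.

```cpp
#include <cassert>
#include <cstdlib>
#include <cstring>
#include <cwchar>
#include <utility>

// Hypothetical user-mode stand-ins for the kernel pool API.
enum POOL_TYPE { NonPagedPool, PagedPool };
using ULONG = unsigned long;
void* ExAllocatePoolWithTag(POOL_TYPE, std::size_t size, ULONG) { return std::malloc(size); }
void ExFreePoolWithTag(void* p, ULONG) { std::free(p); }

class string {
public:
    string(const wchar_t* str = nullptr, POOL_TYPE pool = PagedPool, ULONG tag = 0)
        : m_str(nullptr), m_Len(0), m_Pool(pool), m_Tag(tag) {
        if (str) {
            m_Len = std::wcslen(str);
            if (!Allocate(m_Len, str))
                m_Len = 0;
        }
    }
    string(const string&) = delete;             // copying omitted from this sketch
    string& operator=(const string&) = delete;

    string(string&& other)
        : m_str(other.m_str), m_Len(other.m_Len), m_Pool(other.m_Pool), m_Tag(other.m_Tag) {
        other.m_str = nullptr;
        other.m_Len = 0;
    }

    string& operator=(string&& other) {
        if (this != &other) {
            if (m_str)
                ExFreePoolWithTag(m_str, m_Tag);   // release our own buffer first
            m_str = other.m_str;                   // then take over the other one,
            m_Len = other.m_Len;
            m_Pool = other.m_Pool;                 // along with the pool and tag it
            m_Tag = other.m_Tag;                   // was allocated with
            other.m_str = nullptr;                 // the source no longer owns it
            other.m_Len = 0;
        }
        return *this;
    }

    ~string() {
        if (m_str)
            ExFreePoolWithTag(m_str, m_Tag);
    }

    const wchar_t* Get() const { return m_str; }
    std::size_t Length() const { return m_Len; }

private:
    wchar_t* Allocate(std::size_t chars, const wchar_t* src) {
        m_str = static_cast<wchar_t*>(
            ExAllocatePoolWithTag(m_Pool, sizeof(wchar_t) * (chars + 1), m_Tag));
        if (m_str && src)
            std::memcpy(m_str, src, sizeof(wchar_t) * (chars + 1));
        return m_str;
    }

    wchar_t* m_str;
    std::size_t m_Len;
    POOL_TYPE m_Pool;
    ULONG m_Tag;
};
```

Adopting the source's pool and tag is a design choice; it keeps the destructor correct, because the buffer must be freed with the tag it was allocated with.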

Another super useful type to have is a vector. Although the kernel provides the LIST_ENTRY type and a supporting API for doubly linked lists, it's clunky to use, and for iteration and indexed access it's slower than a dynamic array. Here's a basic definition of a templated vector:

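template<typename T>
struct vector {
    vector(size_t capacity = 0, POOL_TYPE pool = PagedPool, ULONG tag = 0) {
        if (capacity == 0)
            capacity = 4;
        m_Size = 0;
        m_Pool = pool;
        m_Tag = tag;
        m_array = Allocate(m_Capacity = capacity);
        if (!m_array)
            m_Capacity = 0;
    }

    vector(const vector&) = delete;
    vector& operator=(const vector&) = delete;

    ~vector() {
        if (m_array)
            ExFreePoolWithTag(m_array, m_Tag);
    }

    size_t Size() const {
        return m_Size;
    }

    size_t Capacity() const {
        return m_Capacity;
    }

    bool Add(const T& value) {
        NT_ASSERT(m_Size <= m_Capacity);
        if (m_Size == m_Capacity && !Resize(m_Capacity ? m_Capacity * 2 : 4))
            return false;   // the array could not grow; nothing was added
        m_array[m_Size++] = value;
        return true;
    }

private:
    // Elements are copied as raw bytes, so T should be trivially copyable (e.g. a pointer).
    T* Allocate(size_t count) {
        return static_cast<T*>(ExAllocatePoolWithTag(m_Pool, count * sizeof(T), m_Tag));
    }

    bool Resize(size_t newCapacity) {
        auto array = Allocate(newCapacity);
        if (!array)
            return false;
        if (m_array) {
            memcpy(array, m_array, m_Size * sizeof(T));
            ExFreePoolWithTag(m_array, m_Tag);
        }
        m_array = array;
        m_Capacity = newCapacity;
        return true;
    }
    T* m_array;
    size_t m_Size, m_Capacity;
    POOL_TYPE m_Pool;
    ULONG m_Tag;
};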


Notice that, for simplicity, I decided not to allow copying of the vector, by deleting the copy constructor and copy assignment operator with the C++11 = delete syntax. We provide a pool type and tag as with the string. Adding an item re-allocates the array, if needed, to twice its current capacity (Microsoft's std::vector implementation grows by a factor of 1.5). Not shown are a Remove method and an index operator[] for getting to a specific element.
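Here is a user-mode sketch of the vector that also fills in the two missing members. RemoveAt and operator[] are hypothetical versions (RemoveAt removes by index, which may differ from the Remove method mentioned above), and ExAllocatePoolWithTag/ExFreePoolWithTag are stand-ins backed by malloc/free. Add returns false when the array cannot grow, rather than writing past its end.

```cpp
#include <cassert>
#include <cstdlib>
#include <cstring>

// Hypothetical user-mode stand-ins for the kernel pool API.
enum POOL_TYPE { NonPagedPool, PagedPool };
using ULONG = unsigned long;
void* ExAllocatePoolWithTag(POOL_TYPE, std::size_t size, ULONG) { return std::malloc(size); }
void ExFreePoolWithTag(void* p, ULONG) { std::free(p); }

// Raw memory copies are used, so T should be trivially copyable (e.g. pointers).
template<typename T>
struct vector {
    vector(std::size_t capacity = 0, POOL_TYPE pool = PagedPool, ULONG tag = 0)
        : m_Size(0), m_Capacity(capacity ? capacity : 4), m_Pool(pool), m_Tag(tag) {
        m_array = Allocate(m_Capacity);
        if (!m_array)
            m_Capacity = 0;   // Add will try to allocate again
    }
    vector(const vector&) = delete;
    vector& operator=(const vector&) = delete;
    ~vector() {
        if (m_array)
            ExFreePoolWithTag(m_array, m_Tag);
    }

    std::size_t Size() const { return m_Size; }
    std::size_t Capacity() const { return m_Capacity; }

    bool Add(const T& value) {
        if (m_Size == m_Capacity && !Resize(m_Capacity ? m_Capacity * 2 : 4))
            return false;   // allocation failed; the vector is unchanged
        m_array[m_Size++] = value;
        return true;
    }

    // Hypothetical versions of the members that are not shown above.
    T& operator[](std::size_t index) { assert(index < m_Size); return m_array[index]; }
    const T& operator[](std::size_t index) const { assert(index < m_Size); return m_array[index]; }

    bool RemoveAt(std::size_t index) {
        if (index >= m_Size)
            return false;
        // Shift the following elements down by one slot.
        std::memmove(m_array + index, m_array + index + 1, (m_Size - index - 1) * sizeof(T));
        --m_Size;
        return true;
    }

private:
    T* Allocate(std::size_t count) {
        return static_cast<T*>(ExAllocatePoolWithTag(m_Pool, count * sizeof(T), m_Tag));
    }

    bool Resize(std::size_t newCapacity) {
        T* array = Allocate(newCapacity);
        if (!array)
            return false;
        if (m_array) {
            std::memcpy(array, m_array, m_Size * sizeof(T));
            ExFreePoolWithTag(m_array, m_Tag);
        }
        m_array = array;
        m_Capacity = newCapacity;
        return true;
    }

    T* m_array;
    std::size_t m_Size;
    std::size_t m_Capacity;
    POOL_TYPE m_Pool;
    ULONG m_Tag;
};
```

With the default capacity of 4, adding a fifth element doubles the capacity to 8.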

One nice feature of the standard std::vector class (and all standard containers) is support for the range-based for loop, which takes care of the iteration. We can get the same effect without implementing a full-fledged iterator by adding begin and end functions to the vector class that return raw pointers, like so:

T* begin() {
    return m_array;
}

const T* begin() const {
    return m_array;
}

T* end() {
    return m_array + m_Size;
}

const T* end() const {
    return m_array + m_Size;
}


With these definitions in place, we can write the loop using the C++11 range-based for (replacing the code in the RAII example):

bool SandBox::IsProcessInSandBox(HANDLE pid) {
    FastMutexLocker locker(&m_ProcessesLock);

    for (const auto& process : m_Processes)
        if (process->ProcessId == pid)
            return true;
    return false;
}
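Range-based for needs nothing more than begin() and end() returning something that supports !=, ++ and unary *, which raw pointers already do. Roughly, the compiler rewrites such a loop into the explicit pointer loop in the second function below. This user-mode sketch uses a hypothetical fixed-size PidList type in place of the vector and checks that both forms agree.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical container exposing raw-pointer begin/end, like the vector above.
struct PidList {
    int items[4] = { 4, 8, 12, 16 };
    std::size_t size = 4;

    int* begin() { return items; }
    int* end() { return items + size; }
    const int* begin() const { return items; }
    const int* end() const { return items + size; }
};

bool ContainsRangeFor(const PidList& list, int pid) {
    for (const auto& p : list)
        if (p == pid)
            return true;
    return false;
}

// Approximately what the compiler generates for the range-based loop above.
bool ContainsExpanded(const PidList& list, int pid) {
    for (auto it = list.begin(), last = list.end(); it != last; ++it) {
        const auto& p = *it;
        if (p == pid)
            return true;
    }
    return false;
}
```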



Kernel programming does not have to remain in the dark ages. Modern C++ can be used to simplify writing and maintaining code, and ultimately to reduce bugs. Reusable classes such as the string and vector shown here can be used in any kernel driver or library. Modern debuggers understand C++ well, so debugging is not made more difficult.

Welcome to the new age!
