September 2009 - Posts - Pavel's Blog

Pavel's Blog

Pavel is a software guy that is interested in almost everything
software related... way too much for too little time

September 2009 - Posts

Multiple Instance Windows Media Player

Published at Sep 23 2009, 12:21 PM by pavely

When Windows Media Player (WMP) is open, any attempt to open it again simply reactivates the existing WMP window. WMP runs as a single instance. It uses a relatively well-known method for this: creating a named mutex on startup and checking whether it already existed (by calling GetLastError and comparing with ERROR_ALREADY_EXISTS). WMP uses a mutex called "Microsoft_WMP_70_CheckForOtherInstanceMutex" and this name seems to be consistent between WMP versions (at least starting from Windows XP).

Mark Russinovich showed this mutex in the latest edition of Windows Internals. Let’s run an instance of WMP and look at Process Explorer’s lower pane when configured to show handles:

[Screenshot: Process Explorer’s lower pane showing the WMP mutex handle]

The session prefix indicates this object was created in the logged in user’s session (session 1 in Vista and up).

We can manually close the handle using Process Explorer again. Just right click the handle and select “Close Handle”. After this – you can open another WMP – that new WMP will try to create that named mutex – and will succeed.

Closing a handle like this may be catastrophic – the process doesn’t know the handle was closed “behind its back”, so any use of this handle will fail. Even worse, a new handle in that process may be created pointing to another object altogether, without the process realizing it. In this particular case, it’s benign.

What if you wanted to automate this, so that you could open multiple instances of WMP and play several video/audio files at the same time? Or pause one, play another, etc.? I think this has its uses.

How would we go about doing it? How can we close the correct handle like Process Explorer does?

Maybe we can call OpenMutex with the above name and close the handle twice… This won’t work: OpenMutex gives us a new handle in our own process, so closing it leaves WMP’s handle untouched, and after the first CloseHandle our handle is invalid, so the second CloseHandle simply fails.

What we need is to get to the handle inside the WMP process and close it from there. Easier said than done…

There are basically two ways we can go:

1. Write a driver that can access the EPROCESS kernel structure, find the handle table for that process, locate the handle and close it. Theoretically possible, but many issues are involved: EPROCESS is undocumented, except through the kernel debugger – not much fun to work with; installing a driver requires user consent if UAC is active – not so user friendly; a driver is by its nature much more dangerous to use (blue screen possible), …

2. Inject code into the WMP process that will scan the handle table from user mode, locate the handle and close it. This sounds better and easier, with just one caveat: no documented Windows API allows scanning handles and getting the names of the objects they’re pointing at.

So, we’ll go with option 2. Injecting code into a process is fairly well documented, e.g. by calling CreateRemoteThread pointed at LoadLibrary (this works because kernel32.dll is mapped at the same virtual address in every process of the same bitness during a given boot session), so that our DLL is loaded into the other process and does the deed in its DllMain. The complete details can be found in Jeffrey Richter’s book Windows Via C/C++ (5th edition), or look at the source code accompanying this post.

Getting our code to execute under a WMP process is possible. The only thing remaining is handle enumeration. Although there is no official Windows API to do this, there is a native API (inside ntdll.dll) called NtQueryObject. It’s partially documented in the Windows SDK and in the Windows Driver Kit (WDK) (under ZwQueryObject) – there is a lot of symmetry in the APIs from ntdll and the executive. This function allows getting information given a handle. So, we can scan the handles, starting from 4 (the first legal handle value) up to some limit and look for an object with the aforementioned name. When we find it – just CloseHandle it and we’re done.
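To get a feel for how small that scan really is, here is a minimal sketch that only generates the candidate handle values. It assumes, as described above, that handle values start at 4 and advance in steps of 4; the names firstHandle, handleStep, isCandidateHandle and candidateCount are made up for this illustration and are not part of any Windows API.

```cpp
#include <cstdint>

// Assumption taken from the text: handle values start at 4 and step by 4.
constexpr std::uintptr_t firstHandle = 4;
constexpr std::uintptr_t handleStep  = 4;

// True if 'value' is one of the values such a scan would probe.
constexpr bool isCandidateHandle(std::uintptr_t value) {
    return value >= firstHandle && value % handleStep == 0;
}

// Number of values probed when scanning from firstHandle up to and
// including 'limit'.
constexpr std::uintptr_t candidateCount(std::uintptr_t limit) {
    return limit < firstHandle ? 0 : (limit - firstHandle) / handleStep + 1;
}
```

With a limit of 0x120, candidateCount gives 72 probes, so the scan is cheap; the flip side is that a process holding more handles than that may need a larger limit.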

Here’s the prototype for NtQueryObject that we need:

 

typedef enum _OBJECT_INFORMATION_CLASS {
   ObjectBasicInformation, ObjectNameInformation, ObjectTypeInformation,
   ObjectAllInformation, ObjectDataInformation
} OBJECT_INFORMATION_CLASS;

typedef struct _UNICODE_STRING {
   USHORT Length;      // in bytes, not counting any terminating null
   USHORT MaxLength;   // in bytes
   PWSTR String;
} UNICODE_STRING, *PUNICODE_STRING;

typedef struct _OBJECT_NAME_INFORMATION {
   UNICODE_STRING Name;
   WCHAR NameBuffer[1];   // the name characters follow the header
} OBJECT_NAME_INFORMATION, *POBJECT_NAME_INFORMATION;

extern "C" NTSYSAPI LONG NTAPI NtQueryObject(
   __in_opt HANDLE  Handle,
   __in OBJECT_INFORMATION_CLASS  ObjectInformationClass,
   __out_bcount_opt(Length) PVOID  ObjectInformation,
   __in ULONG  Length,
   __out_opt PULONG  ReturnLength
);

 

We need to use ObjectNameInformation (which is undocumented).

Here’s the way we can find the correct handle and close it:

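const ULONG size = 1024;
OBJECT_NAME_INFORMATION* ninfo = (OBJECT_NAME_INFORMATION*)::calloc(1, size);

// run through all handles until we find it
for(ULONG_PTR h = 4; ninfo != NULL && h <= 0x120; h += 4) {
   // leave room for one WCHAR so we can terminate the name ourselves
   LONG status = NtQueryObject((HANDLE)h,
      ObjectNameInformation, ninfo, size - sizeof(WCHAR), NULL);
   // unnamed objects also succeed, but with an empty name
   if(status == 0 && ninfo->Name.String != NULL && ninfo->Name.Length > 0) {
      // Length is in bytes; terminate before using string functions
      ninfo->Name.String[ninfo->Name.Length / sizeof(WCHAR)] = L'\0';
      PWSTR name = ::wcsrchr(ninfo->Name.String, L'\\');
      if(name != NULL && ::lstrcmpW(name + 1, mutexName) == 0) {
         // found it!
         ::CloseHandle((HANDLE)h);
         break;
      }
   }
}
::free(ninfo);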

mutexName is the name we’re after.
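The name comparison is the part that is easiest to get subtly wrong, because a UNICODE_STRING carries its length in bytes and is not guaranteed to be null-terminated. The sketch below is a portable illustration of that matching step only; lastComponentEquals and sampleName are hypothetical names for this example, the sample path is illustrative (the session number varies), and wchar_t stands in for WCHAR (2 bytes on Windows). Unlike the loop above, it treats a name without any backslash as a single component.

```cpp
#include <cstddef>

// Returns true if the last path component of 'name' (the part after the
// final backslash) equals the null-terminated 'wanted'. 'byteLength' is the
// length of 'name' in bytes, the way UNICODE_STRING reports it; 'name'
// itself does not have to be null-terminated.
constexpr bool lastComponentEquals(const wchar_t* name, std::size_t byteLength,
                                   const wchar_t* wanted) {
    const std::size_t count = byteLength / sizeof(wchar_t);
    std::size_t start = 0;                 // index just past the last backslash
    for (std::size_t i = 0; i < count; ++i) {
        if (name[i] == L'\\') start = i + 1;
    }
    std::size_t j = 0;
    for (std::size_t i = start; i < count; ++i, ++j) {
        if (wanted[j] == L'\0' || wanted[j] != name[i]) return false;
    }
    return wanted[j] == L'\0';             // both strings must end together
}

// Illustrative object name only, in the session-prefixed form shown earlier.
constexpr wchar_t sampleName[] =
    L"\\Sessions\\1\\BaseNamedObjects\\Microsoft_WMP_70_CheckForOtherInstanceMutex";
constexpr std::size_t sampleBytes = sizeof(sampleName) - sizeof(wchar_t);
```

Calling lastComponentEquals(sampleName, sampleBytes, L"Microsoft_WMP_70_CheckForOtherInstanceMutex") yields true, while a mere prefix of the mutex name does not match.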

Attached is the source code and the executable. The DLL is stored as a resource in the EXE, and extracted at runtime to a temp folder. This trick makes it easy to distribute – only one file is required.

Enjoy Media Player multi instancing!

A Thread’s Stack

Published at Sep 16 2009, 01:24 PM by pavely

When creating threads, we don’t usually think about their stack size. In the native world, the CreateThread function accepts a stack size (second argument), which we usually pass as 0. In the managed world, the Thread class exposes a pair of constructors that accept a stack size argument (as a comment reminded me).

Why is this important? Creating threads has its costs. This is not only the added work the Windows scheduler must undertake or the data structures that must be allocated in the kernel to manage that thread (KTHREAD, ETHREAD, etc.). Even if the threads are mostly waiting, memory for their stacks is wasted.

When a thread is created, two stacks are actually created: one in user space (lower addresses) and one in kernel (system) space. The latter is very limited in size (12KB in x86, 24KB in x64) and almost always resides in RAM (the reason has to do with interrupt service routines and other high IRQL code, such as DPCs, that are beyond the scope of this post). This stack size cannot be changed in any documented way, and in any case it is only relevant for device driver programmers. We’ll concern ourselves with the user mode stack.

The Native World

When the stack size is specified as 0 in CreateThread, a default value embedded in the PE header is used, which is 1MB unless changed at link time. However, that 1MB is not committed in its entirety: only a single page (4KB) is committed, and the next page in memory is marked with the PAGE_GUARD protection attribute, which causes an exception to be generated when the stack tries to expand beyond that first page. Windows’ memory manager responds by automatically committing the next page and moving the guard page to the following page (technically all downwards, as Intel stacks grow down in addresses, not up). So, what’s the meaning of that 1MB? This is the reserved size – that is, the maximum contiguous memory that the thread’s stack can have. Trying to grow beyond that causes a stack overflow exception. (In this brief explanation, I’ve omitted some minor details for clarity.)
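The reserve/commit distinction can be made concrete with a deliberately simplified model. The sketch below assumes the 4KB page and 1MB reservation mentioned above, ignores the guard page itself and the other details omitted from the explanation, and uses made-up names (pageSize, reserveSize, committedFor); it is not how the memory manager is implemented, only the arithmetic of page-at-a-time growth.

```cpp
#include <cstddef>

constexpr std::size_t pageSize    = 4 * 1024;      // 4KB page, as in the text
constexpr std::size_t reserveSize = 1024 * 1024;   // default 1MB reservation

// Simplified model: bytes committed once the stack has grown to 'used' bytes.
// At least one page is always committed and growth happens a page at a time.
// Returns 0 to signal a stack overflow when 'used' exceeds the reservation.
constexpr std::size_t committedFor(std::size_t used) {
    return used > reserveSize ? 0
         : used == 0          ? pageSize
         : (used + pageSize - 1) / pageSize * pageSize;
}
```

In this model a thread that uses 5000 bytes of stack has 8KB committed, while the full 1MB (256 pages) stays reserved the whole time.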

Reserving memory is considered an inexpensive operation – no memory is committed, no RAM wasted, not even page file space; the only thing happening is the addition of a Virtual Address Descriptor (VAD) to indicate the fact that another address region needs to be described and marked “reserved”. But – and this is a big but – address space range is wasted. That means, other allocations (of any kind) have less address space to work with. This is mostly problematic for 32 bit processes, as they are limited to 2GB or 3GB (and sometimes 4GB on 64 bit system, look at my post for more details). 64 bit processes are mostly unaffected, as their address space is vast (~8TB).

Here’s a simple experiment we can try: how many threads can we create in a 32 bit process with 2GB user address space?

 

#include <windows.h>
#include <tchar.h>
#include <stdio.h>

DWORD WINAPI DoSleep(PVOID) {
   Sleep(INFINITE);
   return 0;
}

int _tmain(int argc, _TCHAR* argv[]) {
   int count = 0;
   DWORD id;
   do {
      // stack size 0 = use the default from the PE header
      HANDLE h = CreateThread(0, 0, DoSleep, 0, 0, &id);
      if(h == NULL)
         break;
      count++;
   } while(true);
   printf("Total threads: %d\n", count);
   getchar();

   return 0;
}

Running this on my system yields:

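[Screenshot: console output showing a total of 1456 threads]

When opening Task Manager and looking at the process memory we find:

[Screenshot: Task Manager showing the process memory, committed size marked with a red rectangle]

The red rectangle indicates the committed size (process-wide). This is definitely less than 1456 * 1MB! But the address space is pretty full (almost 1.5GB just for thread stacks!)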
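The arithmetic behind that observation is worth spelling out. This is only a back-of-the-envelope sketch with names I made up (reservedBytes, maxThreadsUpperBound); it counts stack reservations alone and ignores everything else that lives in the address space.

```cpp
#include <cstdint>

constexpr std::uint64_t KB = 1024ull;
constexpr std::uint64_t MB = 1024ull * KB;
constexpr std::uint64_t GB = 1024ull * MB;

// Address space reserved by 'threads' stacks of 'reservePerThread' bytes each.
constexpr std::uint64_t reservedBytes(std::uint64_t threads,
                                      std::uint64_t reservePerThread) {
    return threads * reservePerThread;
}

// Upper bound on the thread count if the address space held nothing but stacks.
constexpr std::uint64_t maxThreadsUpperBound(std::uint64_t addressSpace,
                                             std::uint64_t reservePerThread) {
    return addressSpace / reservePerThread;
}
```

For the run above, reservedBytes(1456, MB) is 1456MB, roughly 1.42GB, out of a theoretical ceiling of maxThreadsUpperBound(2 * GB, MB) = 2048 threads. With a 64KB reservation the same ceiling would be 32768; the real count will be lower, since the process’s code, heaps and other allocations share that space.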

Most applications do not require such a large stack (1MB), so we can change that. One way is to change the size globally using a linker option. This will set a different stack size for all threads that specify 0 for the second argument to CreateThread. Here’s the dialog in Visual Studio 2008:

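[Screenshot: Visual Studio 2008 linker property page]

The “Stack Reserve Size” is the relevant option. Let’s change this to 65536 (64KB):

[Screenshot: “Stack Reserve Size” set to 65536]

And run it again. This time the result is:

[Screenshot: console output showing the new thread count]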

Better than our previous 1456.

The issue is (of course) not the number of threads, as both numbers are ridiculously large. But the saving of address space allows more allocations of a “conventional” nature (malloc, new, VirtualAlloc, HeapAlloc, etc.).

The second argument to CreateThread allows changing the committed or the reserved size of that particular thread’s stack, overriding the default set by the linker option. By default, the change is in the initial committed size. To change the reserved size, one must specify the STACK_SIZE_PARAM_IS_A_RESERVATION constant (which I find to be a ridiculous and inconsistent name) in the flags argument (one before last).

What if you have a prebuilt EXE with no source code, and you suspect it’s creating too many threads with large stacks? You can use the editbin.exe tool (installed with Visual Studio) to manipulate PE header values, including this one. To do the same (reduce stack reservation to 64KB, for example) one could execute from a command prompt:

editbin /stack:65536 ThreadStack.Exe

The Managed World

In .NET, unless one of the Thread constructors that accept a stack size is used, the thread’s stack size is 1MB by default. The additional problem with .NET is that the 1MB is immediately committed (not just reserved)! This means it consumes memory right away, even if the stacks never need to grow to 1MB.

Here’s a test to verify this:

static void Main(string[] args) {
   int count = 1;
   do {
      Thread t = new Thread(() => Thread.Sleep(Timeout.Infinite));
      try {
         t.IsBackground = true;
         t.Start();
      }
      catch(OutOfMemoryException) {
         break;
      }
      catch(ThreadStartException) {
         break;
      }
      count++;
   } while(true);
   Console.WriteLine("Threads: {0}", count);
   Console.ReadLine();
}

The result of running this simple app is:

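[Screenshot: console output of the managed test]

Close to its native counterpart.

Looking at Task Manager reveals the big difference:

[Screenshot: Task Manager showing the managed process’s memory]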

Note the almost 1.5GB of committed memory!

Can we change this behaviour? Not in any way I could find.

Can we at least change the default 1MB? The VS property pages do not expose this option. However, using editbin.exe works on .NET executables as well as native ones, because both are PE files with the same basic header. This works even with signed assemblies, because this value lies in the part that is not hashed by the signing process.

Using this command line:

editbin /stack:131072 ManagedThreadStack.Exe

yields:

[Screenshot: console output after running editbin]

Definitely an improvement!

Conclusion

Thread stack sizes need to be taken into account, especially in heavily multithreaded applications. The CLR imposes limits on what we can do, but hopefully more control will be available in future versions of the CLR.

Intel vPro Conference

Published at Sep 14 2009, 01:04 PM by pavely

I attended the Intel vPro conference today. Although I didn’t stay through the entire event, I got the gist of it.

Intel talked about what vPro is, and how it can be used in remote IT management. Although vPro is not really new (launched in 2006), it has gone through some enhancements and improvements with the new 32nm technology.

What is vPro? It’s actually three components that interact with each other, regardless of the existence of an operating system on top of them: a CPU, a chipset and an Intel network chip. Those three allow remote access to a system that may be turned off, hibernated or otherwise disabled (e.g. by some hardware failure). With the right admin tools, that system can be brought back to life, turned off, put to sleep, patched and otherwise manipulated, even after a catastrophic hardware or OS failure (when appropriate).

One of the demos shown was using vPro technology with Microsoft’s SCCM (System Center Configuration Manager, the new version of SMS). The two technologies complement each other, creating an all around IT solution. Impressive.