I’ve been writing a new course on this technology, so I thought I’d share some of my experiences with the Windows Media Foundation.
What is Windows Media Foundation?
The Windows Media Foundation is technically the successor to DirectShow (which is still around and very much supported). It was introduced in Windows Vista and enhanced in Windows 7.
It’s a multimedia platform, capable of playing, analyzing, writing and otherwise transforming media (mostly video and audio, but technically it can be anything). It’s based on principles similar to DirectShow’s, such as interface-based programming using COM, which naturally lends itself to multiple implementations.
MF exposes a COM API, much like many native technologies these days. Some people run away almost immediately when they hear “COM” uttered around some API, but at its core COM mandates an interface-based programming model, which is a good thing. The “bad” thing is probably the apartment model, which most people using COM don’t fully understand, or at least hate.
This is understandable, as the model is not a simple one, albeit a necessary evil considering the time COM was conceived (around 1993). At that time, multithreading of any kind was a new concept in the Windows arena – Windows 3.x didn’t have any and many programmers were using VB 3.0/4.0/5.0 and later 6.0, which did not support the creation of threads. COM apartments were built to protect objects that couldn’t protect themselves.
If COM were invented today, there would be no apartments. It would be similar to the .NET CLR model: everyone must be aware of multithreading and its perils, and that’s that. There are “helpers”, such as synchronization contexts, but the model is basically multithreaded all the way. Programmers simply had to grow up (yes, even the VB.NET guys).
Media Foundation states that its objects live in an MTA, but that they are not “full COM objects”, which is not an official term. Either you’re a COM object or you’re not. You can’t technically be “part” COM. What MF means is that it does not support being handed a proxy that comes from an STA; such an interface pointer must first be marshaled to the MTA (or TNA) and only then handed over to MF. This is pretty strange, so I tried to get a better feel for what this means.
In a perfect COM world, there would be no apartments and no proxies (in process). This is in fact achievable, by aggregating the Free Threaded Marshaler (FTM). This object implements IMarshal and ensures that whenever an interface pointer on the object is marshaled to another apartment in the same process, the result is always a direct pointer, never a proxy. This sounds ideal, and it is, provided one avoids some potential “gotchas”, the main one being an FTM-based object that wants to hold interface pointers to non-FTM-based objects. This is problematic, because when the FTM-based object wants to call the other object it must do so with the correct pointer, be it a proxy or not, depending on the calling apartment. Although an FTM-based object is apartment agnostic, the other object isn’t.
The usual solution to this is to register the interface pointer upon reception in the Global Interface Table (GIT), getting back a DWORD cookie that is apartment agnostic. Then, the FTM-based object can get a correct interface pointer from any thread by getting it from the GIT, using it, and releasing it, while continuing to keep that cookie. If this explanation confuses you, dear reader, you’re not alone, and if you didn’t hate apartments up till now, you may start now.
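To make the GIT dance a little less abstract, here is a minimal sketch of the register / get / release / revoke cycle. The `GitCookie` class and the `RoundTripThroughGit` function are hypothetical names invented for this illustration; only the COM calls themselves (`CoCreateInstance`, `IGlobalInterfaceTable`, `CreateStreamOnHGlobal`) are real. For brevity it runs on a single thread and uses a memory stream as a stand-in for “the other object”; in real code `Register` would run in the apartment that received the pointer and `Get` in whichever apartment needs to call it.

```cpp
#include <cassert>

#ifdef _WIN32
#include <windows.h>
#include <objbase.h>
#include <ole2.h>
#pragma comment(lib, "ole32.lib")
#pragma comment(lib, "uuid.lib")

// Hypothetical helper (not part of COM or MF): owns one GIT cookie.
class GitCookie {
public:
    GitCookie() = default;
    GitCookie(const GitCookie&) = delete;
    GitCookie& operator=(const GitCookie&) = delete;
    ~GitCookie() { Revoke(); }

    // Call in the apartment where `p` is a valid pointer.
    HRESULT Register(IUnknown* p, REFIID iid) {
        assert(p != nullptr && git_ == nullptr);
        HRESULT hr = CoCreateInstance(CLSID_StdGlobalInterfaceTable, nullptr,
                                      CLSCTX_INPROC_SERVER,
                                      IID_IGlobalInterfaceTable,
                                      reinterpret_cast<void**>(&git_));
        if (FAILED(hr)) return hr;
        hr = git_->RegisterInterfaceInGlobal(p, iid, &cookie_);
        if (FAILED(hr)) { git_->Release(); git_ = nullptr; }
        return hr;
    }

    // Call from any apartment. The result is a pointer (direct or proxy)
    // that is valid for the calling apartment; the caller must Release() it.
    HRESULT Get(REFIID iid, void** ppv) const {
        assert(ppv != nullptr);
        if (!git_) return E_UNEXPECTED;
        return git_->GetInterfaceFromGlobal(cookie_, iid, ppv);
    }

    // Removes the entry from the GIT; the cookie is invalid afterwards.
    void Revoke() {
        if (git_) {
            git_->RevokeInterfaceFromGlobal(cookie_);
            git_->Release();
            git_ = nullptr;
        }
    }

private:
    IGlobalInterfaceTable* git_ = nullptr;
    DWORD cookie_ = 0;
};

// Returns 1 if the round trip through the GIT worked, 0 if a COM call failed.
int RoundTripThroughGit() {
    if (FAILED(CoInitializeEx(nullptr, COINIT_MULTITHREADED))) return 0;
    int ok = 0;
    IStream* stream = nullptr;  // stand-in for "the other object"
    if (SUCCEEDED(CreateStreamOnHGlobal(nullptr, TRUE, &stream))) {
        GitCookie cookie;
        if (SUCCEEDED(cookie.Register(stream, IID_IStream))) {
            IStream* again = nullptr;
            if (SUCCEEDED(cookie.Get(IID_IStream,
                                     reinterpret_cast<void**>(&again)))) {
                again->Release();  // use it, then release; keep the cookie
                ok = 1;
            }
        }
        stream->Release();
        // `cookie` is revoked by its destructor at the end of this block,
        // before COM is uninitialized.
    }
    CoUninitialize();
    return ok;
}
#else
// Not on Windows: COM is unavailable, so there is nothing to try.
int RoundTripThroughGit() { return -1; }
#endif
```

The point of the design is that the FTM-based object stores only the DWORD cookie, never the raw pointer, and asks the GIT for a fresh pointer each time it needs to make a call.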
Does MF aggregate the FTM?
I was pretty sure that MF objects aggregate the FTM. After all, it’s almost a perfect solution. However, querying various objects for IMarshal (the first sign to look for when checking for possible FTM aggregation) failed. MF objects don’t aggregate the FTM. Most of them are not created through the official COM CoCreateInstance API, but are created privately, which partially explains the “not full COM objects” statement. Checking some interfaces in the registry revealed that MF has its own proxy/stub DLL (it does not use the type library marshaler). This is a snapshot of one of the common interfaces in MF, IMFAttributes:

What does all this mean? I’m not entirely sure, and the documentation on this is (IMHO) too complex. The only sure thing is that client threads had better be in the MTA, and client objects had better be in the MTA as well or aggregate the FTM.
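For readers who want to repeat the IMarshal check themselves, the following is a rough sketch under a few assumptions: the function name `AttributesExposeIMarshal` is made up for this example, the thread joins the MTA as recommended above, and an attribute store from `MFCreateAttributes` is used as the test subject. It deliberately reports whatever the running system answers instead of asserting a particular result.

```cpp
#include <cassert>

#ifdef _WIN32
#include <windows.h>
#include <objbase.h>
#include <mfapi.h>
#include <mfobjects.h>
#pragma comment(lib, "ole32.lib")
#pragma comment(lib, "uuid.lib")
#pragma comment(lib, "mfplat.lib")

// Hypothetical helper (not an MF API).
// Returns  1 if the attribute store answers QueryInterface for IMarshal,
//          0 if it answers E_NOINTERFACE,
//         -1 if the check could not be performed.
int AttributesExposeIMarshal() {
    // Put this thread in the MTA, as MF clients are expected to be.
    if (FAILED(CoInitializeEx(nullptr, COINIT_MULTITHREADED))) return -1;

    int result = -1;
    if (SUCCEEDED(MFStartup(MF_VERSION, MFSTARTUP_LITE))) {
        IMFAttributes* attrs = nullptr;
        // Note: created through an MF factory function, not CoCreateInstance.
        if (SUCCEEDED(MFCreateAttributes(&attrs, 1))) {
            assert(attrs != nullptr);
            IMarshal* marshal = nullptr;
            HRESULT hr = attrs->QueryInterface(
                IID_IMarshal, reinterpret_cast<void**>(&marshal));
            if (SUCCEEDED(hr)) {
                marshal->Release();
                result = 1;
            } else if (hr == E_NOINTERFACE) {
                result = 0;
            }
            attrs->Release();
        }
        MFShutdown();
    }
    CoUninitialize();
    return result;
}
#else
// Not on Windows: Media Foundation is unavailable.
int AttributesExposeIMarshal() { return -1; }
#endif
```

Keep in mind that a positive answer would only be a hint: an object can implement IMarshal for custom marshaling without aggregating the FTM, so a result of 1 would call for a closer look, not a conclusion.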
After all this COM geek talk, what really is Media Foundation? I’ll talk about that in the next post in this series, starting with the TopoEdit tool, the equivalent (sort of) of DirectShow’s GraphEdit.