So, what can we do with Media Foundation? One of the simplest things, perhaps, is getting information on some media file, somewhat similar to what we see in Windows Explorer, but we can dig deeper if we like. Let’s get started.
First, we’ll create a simple Win32 console application named MediaInfo (I check the box to include ATL headers, since we’ll use ATL smart pointers). We then add some Media Foundation includes (e.g. in StdAfx.h):
#include <mfidl.h>
#include <mfapi.h>
These are the basic header files for MF (there are more). We’ll also need to link against the MF libraries. I prefer to do that with #pragma comment(lib, ...) instead of Project->Properties:
#pragma comment(lib, "mfplat.lib")
#pragma comment(lib, "Mfuuid.lib")
#pragma comment(lib, "mf.lib")
This helps ensure I get the libs no matter what: debug or release, 32/64 bit, if I lose the project file, etc. (a bit paranoid, I know).
In main, we need first to initialize COM and Media Foundation:
::CoInitializeEx(0, COINIT_MULTITHREADED);
::MFStartup(MF_VERSION);
The MFStartup function must be called before most MF operations. Next, we’ll create a local function called DisplayInfo that takes a file path and displays some info to the console. So, the entire main function looks like so:
int _tmain(int argc, _TCHAR* argv[]) {
    if(argc != 2) {
        cout << "Usage: MediaInfo <path>" << endl;
        return 1;
    }
    ::CoInitializeEx(0, COINIT_MULTITHREADED);
    ::MFStartup(MF_VERSION);
    HRESULT hr = DisplayInfo(argv[1]);
    if(FAILED(hr)) {
        cout << "Error: " << hex << hr << endl;
    }
    ::MFShutdown();
    ::CoUninitialize();
    return 0;
}
Now for the real stuff. How do we get information about a media file? We need a media source, an MF abstraction of some source for media data (in our case a file, but it can be anything, such as a live feed from a camera).
To get a media source for a file, we’ll need the help of another MF entity, the Source Resolver. This object can “resolve” a source file into a media source object (implementing the IMFMediaSource interface). If the source resolver fails, it’s safe to say that MF does not recognize the file’s format, perhaps because no handler for that format is installed:
HRESULT DisplayInfo(LPCWSTR url) {
    CComPtr<IMFSourceResolver> spResolver;
    CHECK_HR(::MFCreateSourceResolver(&spResolver));
    MF_OBJECT_TYPE type;
    CComPtr<IUnknown> spUnkSource;
    CHECK_HR(spResolver->CreateObjectFromURL(url, MF_RESOLUTION_MEDIASOURCE, NULL, &type, &spUnkSource));
    // Query the returned IUnknown for the media source interface
    CComQIPtr<IMFMediaSource> spSource(spUnkSource);
    if(!spSource)
        return E_NOINTERFACE;
The CHECK_HR is a simple macro that checks the returned HRESULT and backs out if it’s a failure code:
#define CHECK_HR(x) { HRESULT _hr = (x); if(FAILED(_hr)) return _hr; }
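One caveat with a braces-only macro like this: written as `if(a) CHECK_HR(x); else ...`, the trailing semicolon ends the if statement and the else no longer compiles. A common alternative is to wrap the body in do { } while(0). The sketch below shows that variant; HRESULT, FAILED, S_OK and E_FAIL are simplified stand-ins defined here only so the snippet builds without the Windows SDK (in the real project they come from the Windows headers), and Step and RunSteps are hypothetical names used for illustration.

```cpp
#include <cstdint>

// Stand-ins for the Windows SDK definitions (assumption: simplified so
// this sketch compiles anywhere; do not redefine these in a real project).
typedef std::int32_t HRESULT;
#define FAILED(hr) (static_cast<HRESULT>(hr) < 0)
const HRESULT S_OK = 0;
const HRESULT E_FAIL = static_cast<HRESULT>(0x80004005u);

// do/while(0) makes the macro behave like a single statement,
// so it is safe inside an unbraced if/else.
#define CHECK_HR(x) do { HRESULT _hr = (x); if(FAILED(_hr)) return _hr; } while(0)

// Hypothetical step that succeeds or fails on demand.
HRESULT Step(bool ok) {
    return ok ? S_OK : E_FAIL;
}

// Hypothetical caller: returns the first failure code, or S_OK.
HRESULT RunSteps(bool firstOk, bool secondOk) {
    if(firstOk)
        CHECK_HR(Step(firstOk));   // would not compile with the braces-only macro
    else
        return E_FAIL;
    CHECK_HR(Step(secondOk));      // backs out here if the second step fails
    return S_OK;
}
```

For the straight-line code in this post, either form of the macro behaves the same.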
First, we create the source resolver (MFCreateSourceResolver), and then we try to obtain a media source with the CreateObjectFromURL method. This call is synchronous, so there may be a short delay. That doesn’t matter in our case, but where it does (for example, when running on a UI thread), there are asynchronous alternatives (BeginCreateObjectFromURL and EndCreateObjectFromURL).
Now that we have a media source, we need something called a presentation descriptor. This object describes the “presentation”, an MF term for a set of media streams sharing a common timeline:
CComPtr<IMFPresentationDescriptor> spDesc;
CHECK_HR(spSource->CreatePresentationDescriptor(&spDesc));
A presentation descriptor holds stream descriptors. Each stream corresponds to some media data, such as audio or video. We now need to iterate over all stream descriptors looking for a video and/or audio stream, then describe it:
DWORD count;
CHECK_HR(spDesc->GetStreamDescriptorCount(&count));
for(DWORD i = 0; i < count; i++) {
    BOOL selected;
    CComPtr<IMFStreamDescriptor> spStreamDesc;
    CHECK_HR(spDesc->GetStreamDescriptorByIndex(i, &selected, &spStreamDesc));
    if(selected) {
        // analyze stream descriptor
    }
}
A stream descriptor describes a stream of data (e.g. video or audio). Only selected streams are of interest. An unselected stream means there is data, but the media source will not deliver it by default. Technically, we can select or deselect streams ourselves, but we’re just looking at the descriptors, not playing anything. So, we’ll stick with selected streams.
A stream may have one or more media types. For example, a video stream may be capable of providing data in more than one resolution. We need to get to those media types, look for the current one and display its properties:
CComPtr<IMFMediaTypeHandler> spHandler;
CHECK_HR(spStreamDesc->GetMediaTypeHandler(&spHandler));
CComPtr<IMFMediaType> spMediaType;
CHECK_HR(spHandler->GetCurrentMediaType(&spMediaType));
Once we have the media type, we can finally get some details.
The first thing we need to know is what kind of media this is. The typical result is audio or video. If it’s something else, we’ll ignore it:
GUID major;
CHECK_HR(spMediaType->GetMajorType(&major));
bool video;
if(major == MFMediaType_Audio)
    video = false;
else if(major == MFMediaType_Video)
    video = true;
else
    continue;
The GetMajorType method returns the actual type of the media (as a GUID). There are several predefined GUIDs for this in the MF headers. We could actually get the same value from the media type handler directly, since IMFMediaTypeHandler has a GetMajorType method of its own.
Next, we want to display information that is common to audio and video, and then get specific info for audio and video.
Information in MF is stored in attributes. An attribute store implements the IMFAttributes interface, and it’s a kind of property bag, where keys are always GUIDs and values may be of several types, such as UINT32, UINT64, WCHAR*, GUID, IUnknown*, double and a BLOB (the value itself is stored in a PROPVARIANT). The IMFMediaType interface inherits from IMFAttributes, so we can query it directly. The complete list of attributes for MF can be found here.
Common attributes for a media type can be found here. Let’s display some of them:
cout << "Stream Index: " << i << endl;
cout << "Media type: " << (video ? "Video" : "Audio") << endl;
UINT32 compressed;
HRESULT hr = spMediaType->GetUINT32(MF_MT_COMPRESSED, &compressed);
if(SUCCEEDED(hr))
    cout << "Compressed: " << (compressed ? "True" : "False") << endl;
GUID guid;
WCHAR buffer[128];  // receives the GUID in string form
if(SUCCEEDED(spMediaType->GetGUID(MF_MT_SUBTYPE, &guid))) {
    ::StringFromGUID2(guid, buffer, 128);
    wcout << "Subtype GUID: " << buffer << endl;
}
There is a list of possible subtypes for video and audio. We need some conversion from a GUID to a more readable description using some lookup table. This is left as an exercise for the reader. Now let’s turn our attention to video attributes:
if(video) {
    UINT32 num;
    // MF_MT_AVG_BITRATE is in bits per second
    if(SUCCEEDED(spMediaType->GetUINT32(MF_MT_AVG_BITRATE, &num)))
        cout << "Average bitrate: " << (num / 1000) << " Kbps" << endl;
    UINT32 width, height;
    if(SUCCEEDED(::MFGetAttributeSize(spMediaType, MF_MT_FRAME_SIZE, &width, &height)))
        cout << "Frame size: " << width << " X " << height << endl;
    // The frame rate is stored as a ratio: numerator / denominator
    UINT32 rateNum, rateDen;
    if(SUCCEEDED(::MFGetAttributeRatio(spMediaType, MF_MT_FRAME_RATE, &rateNum, &rateDen)) && rateDen != 0)
        cout << "Frame rate: " << rateNum / (float)rateDen << " FPS" << endl;
}
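The frame rate attribute is a ratio rather than a single number, which lets it express rates such as 30000/1001 (about 29.97 FPS) exactly. A small portable sketch of the conversion follows; FrameRate is a hypothetical helper name, not part of MF, and the zero-denominator guard is my own addition.

```cpp
#include <cstdint>

// Hypothetical helper: converts a numerator/denominator pair (as read
// from a ratio attribute) into frames per second.
// Returns 0.0 when the denominator is zero instead of dividing by zero.
double FrameRate(std::uint32_t numerator, std::uint32_t denominator) {
    if(denominator == 0)
        return 0.0;
    return static_cast<double>(numerator) / denominator;
}
```

For example, FrameRate(25, 1) gives 25.0, and FrameRate(30000, 1001) gives a value just above 29.97.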
There are many other attributes we can query. Let’s look at audio:
else {
    UINT32 num;
    if(SUCCEEDED(spMediaType->GetUINT32(MF_MT_AUDIO_BITS_PER_SAMPLE, &num)))
        cout << "Bits/sample: " << num << endl;
    if(SUCCEEDED(spMediaType->GetUINT32(MF_MT_AUDIO_NUM_CHANNELS, &num)))
        cout << "# Channels: " << num << endl;
    if(SUCCEEDED(spMediaType->GetUINT32(MF_MT_AUDIO_AVG_BYTES_PER_SECOND, &num)))
        cout << "Average bytes/sec: " << num << endl;
    if(SUCCEEDED(spMediaType->GetUINT32(MF_MT_AUDIO_SAMPLES_PER_SECOND, &num)))
        cout << "Samples/sec: " << num << endl;
}
cout << endl;
}
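As a possible starting point for turning a subtype GUID into something readable: to the best of my knowledge, most MF subtype GUIDs share a common tail (0000-0010-8000-00AA00389B71) and differ only in the first 32-bit field (Data1), which holds a FOURCC code for video or a wave format tag for audio. The sketch below works on that Data1 value alone and assumes the rest of the GUID has already been checked against the common tail. DescribeVideoSubtype and DescribeAudioSubtype are hypothetical helper names, and the audio table lists only a few tags I am confident about.

```cpp
#include <cctype>
#include <cstdint>
#include <string>

// Hypothetical helper: decodes Data1 of a video subtype GUID as a FOURCC.
// The first character lives in the lowest byte. Values that are not four
// printable characters (e.g. small numeric codes used for RGB formats)
// are reported as unknown.
std::string DescribeVideoSubtype(std::uint32_t data1) {
    std::string fourcc;
    for(int i = 0; i < 4; i++) {
        unsigned char ch = static_cast<unsigned char>((data1 >> (8 * i)) & 0xFF);
        if(!std::isprint(ch))
            return "Unknown (" + std::to_string(data1) + ")";
        fourcc += static_cast<char>(ch);
    }
    return fourcc;
}

// Hypothetical helper: maps Data1 of an audio subtype GUID (a wave
// format tag) to a name. The table is deliberately incomplete.
std::string DescribeAudioSubtype(std::uint32_t data1) {
    switch(data1) {
        case 0x0001: return "PCM";
        case 0x0003: return "IEEE float";
        case 0x0055: return "MP3";
        case 0x1610: return "AAC";
        default:     return "Unknown (" + std::to_string(data1) + ")";
    }
}
```

In DisplayInfo, guid.Data1 from the MF_MT_SUBTYPE attribute would be the value to pass in. A fuller solution would compare whole GUIDs against the constants defined in the MF headers.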
One thing we forgot is to show the media file’s duration… Let’s add that. This is not a stream-based attribute, but a file-based one (or, more precisely, presentation-based):
UINT64 duration;
// Not every source reports a duration, so don't treat failure as fatal
if(SUCCEEDED(spDesc->GetUINT64(MF_PD_DURATION, &duration))) {
    CTimeSpan span(duration / 10000000);
    wcout << "Duration: " << (LPCWSTR)span.Format(L"%H:%M:%S") << endl;
}
CTimeSpan is a small class shared between MFC and ATL (declared in atltime.h) that is used here for formatting purposes. The original value is in 100-nanosecond units, so dividing by 10,000,000 converts it to seconds before it is passed to CTimeSpan.
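If you would rather not depend on CTimeSpan, the same conversion can be done by hand. Here is a minimal portable sketch; FormatDuration is a hypothetical helper name, and unlike the CTimeSpan version it builds a narrow string and does not wrap the hours at 24.

```cpp
#include <cstdint>
#include <cstdio>
#include <string>

// Hypothetical helper: formats a duration given in 100-nanosecond units
// as HH:MM:SS. Fractions of a second are truncated.
std::string FormatDuration(std::uint64_t duration100ns) {
    const std::uint64_t totalSeconds = duration100ns / 10000000ULL;  // 10^7 units per second
    const unsigned long long hours = totalSeconds / 3600;
    const unsigned minutes = static_cast<unsigned>((totalSeconds / 60) % 60);
    const unsigned seconds = static_cast<unsigned>(totalSeconds % 60);
    char buf[32];
    std::snprintf(buf, sizeof(buf), "%02llu:%02u:%02u", hours, minutes, seconds);
    return buf;
}
```

For instance, a duration of 36,610,000,000 units is 3,661 seconds, which this helper formats as 01:01:01.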
Here’s an output for some video file I have:

Here’s an example of an MP3 file:

The entire project is attached.