See the previous posts for this article here: Part 1, Part 2.
Before I review the projects for this article, I would like to describe the basics: a few simple steps that will give you a C# experience while programming MSXML with C++.
Step 1: Import MSXML
There are a number of ways to import COM libraries into a C++ project. I think the simplest way is to add the following line to a common header (preferably a precompiled one).
#import <msxml6.dll> named_guids
This will generate the headers (with extensions .tli and .tlh) that we need to access the COM objects created by MSXML, and it will include them in your code automatically. These headers also include other header files that we will be using later (comdef.h and comip.h).
The 6 in <msxml6.dll> stands for version 6. If you have this version installed, it should be in your path (under system32), so specifying its name is sufficient. Microsoft recommends that you use MSXML version 6 or version 3 unless you need some specific feature from another version. Choose version 6 to get the best in performance and security. Choose 3 (replace the 6 with a 3) if you want to target the broadest audience. Both versions work for the projects in these posts. You can read more about MSXML versions here.
We will be using types from the MSXML2 namespace (yes, for msxml3 and msxml6 too) so I recommend you add the following line too:
using namespace MSXML2;
The named_guids keyword will allow you to refer to guids by their names later in the code.
Step 2: Enter Smart Pointers
Simply put, a smart pointer is a C++ class that has the semantics of a pointer to another class but does not need to be released explicitly.
Smart pointers are able to achieve this due to three powerful C++ features which I will review briefly:
- Reliable object lifetime management
- Operator overloading
- Templates
C++ manages the lifetime of an object by calling the object’s destructor when it goes out of scope or, for a member object, when the destructor of its containing object runs. The destructor is also called if an exception is thrown from within the scope of an object or from within a nested call made from that scope. In this sense the mechanism is reliable, so class destructors can be used to release resources reliably.
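To make the exception case concrete, here is a minimal sketch. The Tracer class, the events log and the RunDemo function are hypothetical names invented for this illustration; they have nothing to do with MSXML or COM. The sketch only records when objects are created and destroyed while an exception travels out of a nested scope.

```cpp
#include <cassert>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical log of lifetime events, used only for this illustration.
static std::vector<std::string> events;

// Records its own construction and destruction in the log.
class Tracer
{
    std::string m_name;
public:
    explicit Tracer(const std::string& name) : m_name(name)
    {
        events.push_back("acquire " + m_name);
    }
    ~Tracer()
    {
        events.push_back("release " + m_name);
    }
};

// Throws while two Tracer objects are alive.
void Work()
{
    Tracer outer("outer");
    {
        Tracer inner("inner");
        throw std::runtime_error("failure in nested scope");
    }
}

// Runs Work(), catches its exception and returns the recorded events.
std::vector<std::string> RunDemo()
{
    events.clear();
    bool caught = false;
    try
    {
        Work();
    }
    catch (const std::runtime_error&)
    {
        caught = true;
    }
    assert(caught); // Work() always throws in this sketch
    return events;
}
```

Calling RunDemo() yields four entries: both objects are acquired, then the inner one is released before the outer one, even though Work() never returned normally.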
C++ also supports overloading of the ‘->’ operator. This allows an object of a class to return a pointer to an object other than itself, thereby giving it the semantics of that pointer.
The method of releasing a pointer differs from domain to domain (for instance using the ‘delete’ operator for memory, or by calling some domain specific Release function). But often, within a domain, pointers of different types can be released in the same way. It would therefore seem rather cumbersome to have to write the same smart pointer logic for each pointer type in the domain.
C++ templates allow you to write a smart pointer once as a template for many classes in a domain. STL provides classic examples of smart pointer templates with its auto_ptr and shared_ptr classes. For COM objects, Microsoft has implemented a smart pointer template called ‘_com_ptr_t’. _com_ptr_t uses the specific COM mechanisms to manage any COM object’s lifetime and can be found in comip.h, which is automatically included in your code by the #import statement.
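The three features can be seen working together in a small sketch. This is not how _com_ptr_t is implemented; RefPtr and Widget are hypothetical names, and Widget is only a stand-in for a reference-counted object with AddRef and Release methods. One deliberate simplification: RefPtr adopts the reference it is given instead of adding its own.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical stand-in for a COM-style, reference-counted object.
class Widget
{
    long m_refs;
    static int s_alive;               // number of Widget objects currently alive
    ~Widget() { --s_alive; }          // private: only Release() may destroy
public:
    Widget() : m_refs(1) { ++s_alive; } // the creator owns the first reference
    void AddRef() { ++m_refs; }
    void Release() { if (--m_refs == 0) delete this; }
    int Value() const { return 42; }
    static int Alive() { return s_alive; }
};
int Widget::s_alive = 0;

// Written once as a template, usable for any type with AddRef/Release.
template <typename T>
class RefPtr
{
    T* m_p;
public:
    // Adopts the caller's reference (no AddRef here).
    explicit RefPtr(T* p = NULL) : m_p(p) {}
    // Copying shares the object and adds a reference.
    RefPtr(const RefPtr& other) : m_p(other.m_p)
    {
        if (m_p) m_p->AddRef();
    }
    RefPtr& operator=(const RefPtr& other)
    {
        if (other.m_p) other.m_p->AddRef(); // AddRef first, so self-assignment is safe
        if (m_p) m_p->Release();
        m_p = other.m_p;
        return *this;
    }
    // The destructor releases the reference automatically.
    ~RefPtr() { if (m_p) m_p->Release(); }
    // Overloaded '->' gives the wrapper the semantics of a T*.
    T* operator->() const { assert(m_p != NULL); return m_p; }
};
```

With this in place, a block can declare RefPtr<Widget> a(new Widget); and RefPtr<Widget> b(a);, call b->Value(), and simply leave scope. The Widget is destroyed when the last RefPtr goes away, with no explicit Release call in user code.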
As a convenience, for many COM interfaces, Microsoft also provides a type definition (typedef) that instantiates a smart pointer type for that interface. According to the naming convention for these types, they usually have a ‘Ptr’ suffix.
Moreover, MSXML offers two sets of interfaces for many objects. The raw interfaces use the ‘good’ old COM types (like VARIANT, BSTR and HRESULT) and ‘dumb’ pointers (you know what I mean – not smart pointers). The second set of interfaces wraps the raw interfaces and is defined in terms of wrapper types that wrap raw COM types and manage their resources safely. If you only want the raw interfaces, you can add the attribute “raw_interfaces_only” to the end of the #import statement above.
You may be asking yourself – why would I not want to import the non-raw interfaces? Why work so hard to manage resources safely, manage object lifetime, convert types safely and handle errors, if I can get it all for free? I will answer that in Part 5 when we review the SAXReader project.
Now, in order to make our C++ code look like code written in C#, we will use that second set of interfaces, and the smart pointers that are defined for them. We will also add our own type definitions to map the smart pointer types from the MSXML2 namespace to equivalent types in the System.Xml namespace.
typedef MSXML2::IXMLDOMNodePtr XmlNode;
typedef MSXML2::IXMLDOMDocument2Ptr XmlDocument;
typedef MSXML2::IXMLDOMElementPtr XmlElement;
typedef MSXML2::IXMLDOMAttributePtr XmlAttribute;
typedef MSXML2::IXMLDOMCommentPtr XmlComment;
typedef MSXML2::IXMLDOMNamedNodeMapPtr XmlNamedNodeMap;
typedef MSXML2::IXMLDOMNodeListPtr XmlNodeList;
typedef MSXML2::IXMLDOMDocumentFragmentPtr XmlDocumentFragment;
typedef MSXML2::IXMLDOMCDATASectionPtr XmlCDataSection;
typedef MSXML2::IXMLDOMProcessingInstructionPtr XmlProcessingInstruction;
typedef MSXML2::IXMLDOMSchemaCollectionPtr XmlSchemaCollection;
typedef MSXML2::IXMLDOMParseErrorPtr XmlParseError;
typedef MSXML2::IXSLProcessorPtr XslProcessor;
typedef MSXML2::IXSLTemplatePtr XslTemplate;
Feel free to remove some of these if you don’t need them or add more, similar types if you use other interfaces.
You may be asking why I explicitly specified the MSXML2 namespace in these definitions. Would it not suffice to include the ‘using’ directive from the previous step?
Well, one of the few differences between the Visual C++ 6.0 environment and that of Visual Studio 2008 with regard to MSXML is that in the latter, some of the COM smart pointers (on the left side of my typedefs) were redefined in the global namespace. As we specifically need those from the MSXML2 namespace, and to avoid an ambiguity compilation error, the namespace has to be specified explicitly. On the whole, that makes the left side pretty ugly, but this will be of no concern to you once you include the typedefs as I propose.
Step 3: Add Some Helper Classes
A CoUninitialize Helper
Applications must call CoInitialize in a thread before any other call to COM in that thread. They must also call CoUninitialize when COM is no longer needed. Forgetting to call CoUninitialize is not a problem in a single-threaded application, because when the process exits, any clean-up that needs to be done will be done for you. However, in multi-threaded applications, every thread that runs and exits without calling CoUninitialize generates a resource leak in your application.
Seasoned C++ programmers like us probably won’t forget to call CoUninitialize before exiting a thread, but remember, you have to make the call even if your thread exits due to an unhandled exception. Altogether, managing all cases can make your code a little messy – which is a big NO, NO :)
The simple solution for such problems in C++ is Resource Acquisition Is Initialization (RAII). RAII refers to the use of C++ object lifetime management to ensure that a resource is released automatically, as we would expect it to be.
The following class does the trick. Just instantiate a local variable of this type at the beginning of the outermost block in your thread and forget about CoUninitialize.
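class ComInit
{
    HRESULT m_hr; // result of CoInitialize (S_FALSE also counts as success)
public:
    ComInit() : m_hr(::CoInitialize(NULL)) {}
    // Only a successful CoInitialize has to be balanced by CoUninitialize.
    ~ComInit() { if (SUCCEEDED(m_hr)) ::CoUninitialize(); }
};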
An Error Handling Helper
Another aspect of COM programming that we must address is error management.
C++ supports structured error handling very well, but unfortunately, it’s mostly ‘do it yourself’ with COM. Most COM methods return the cryptic HRESULT, which immediately causes the following problems:
- HRESULT is not an enumerated type, so providing useful information to callers and users usually requires additional steps. Yes you could stay with the FAILED(hr) macro, but is that really enough information?
- When every line contains a call to a COM function returning an HRESULT, you have only a few options:
  - You can check the return code of every function, adding ~3 lines for each function call and rendering your code utterly unreadable (75% of the code deals with error handling).
  - You might take your chances and ignore some of the errors. A catastrophe waiting to happen.
  - You can use macros to check the return code and throw an exception, as in the MSDN code quoted in my first post in this article. Macros make code difficult to browse and debug.
Well, Microsoft defines a very useful class called ‘_com_error’ in the comdef.h include file. comdef.h is automatically included in your code by the #import statement. _com_error is a convenient class to throw when an HRESULT value indicates an error. It takes an HRESULT in its constructor and provides a formatted message string through the ErrorMessage method. As you probably know, some COM objects support the IErrorInfo interface, which provides more detailed error information. _com_error can optionally take one of those in its constructor too and provide easy access to that information.
So? Where does that get us? COM doesn’t throw this class.
Well, first of all, _com_error is thrown by the _com_ptr_t class, and by the wrapper methods that #import generates, when a COM call they make fails. Thus, by wrapping a COM object with a _com_ptr_t you can create a COM object in one line and use C++ try/catch syntax to handle errors in a structured way.
Second, you can use _com_error objects yourself to access more information about an HRESULT error.
But what about errors that occur in your application and are not generated by COM? Well, just for convenience, I added my own Error class that can optionally handle HRESULT errors by reusing _com_error. Nothing clever here. You can write your own class to wrap an error with an exception, but please do something, because structured exception handling is the way to go. Here is mine:
class Error
{
    char m_Message[512];
public:
    // Wraps a failed HRESULT. Assumes a non-Unicode (MBCS) build, where TCHAR is char.
    Error (HRESULT hr)
    {
        m_Message[0] = '\0';
        _com_error comError (hr);
        const TCHAR* message = comError.ErrorMessage();
        if (message)
            strncpy_s (m_Message, message, sizeof (m_Message) - 1); // leave room for the terminator
        m_Message[sizeof(m_Message)-1] = '\0';
    }
    // Builds the message from a printf-style format string.
    Error (const char* format, ...)
    {
        va_list args;
        va_start(args, format);
        vsprintf_s(m_Message, format, args);
        va_end(args);
    }
    Error(const Error& r)
    {
        strcpy_s (m_Message, r.m_Message);
    }
    operator const char*() const { return m_Message; }
};
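To show how an exception class like this keeps call sites short without macros, here is a portable sketch. Everything in it is a stand-in invented for illustration: HResult, kOk and kFail play the roles of HRESULT, S_OK and E_FAIL so that the code compiles without the Windows SDK, SketchError is a simplified substitute for the Error class above, and the three Step functions are hypothetical operations. In a real project, Check would take an HRESULT and throw Error(hr).

```cpp
#include <cassert>
#include <cstdio>
#include <string>

// Stand-ins so the sketch compiles without the Windows SDK (assumptions).
typedef int HResult;                  // plays the role of HRESULT
const HResult kOk = 0;                // plays the role of S_OK
const HResult kFail = -2147467259;    // same bit pattern as E_FAIL (0x80004005)

// Mirrors the FAILED macro: failure codes have the sign bit set.
inline bool Failed(HResult hr) { return hr < 0; }

// Simplified substitute for the Error class shown above.
class SketchError
{
    std::string m_message;
public:
    explicit SketchError(HResult hr)
    {
        assert(Failed(hr)); // only failure codes should be wrapped
        char buffer[64];
        std::snprintf(buffer, sizeof(buffer), "call failed, code 0x%08X",
                      static_cast<unsigned int>(hr));
        m_message = buffer;
    }
    const char* what() const { return m_message.c_str(); }
};

// An inline function instead of a macro: easy to step into and to browse.
inline void Check(HResult hr)
{
    if (Failed(hr))
        throw SketchError(hr);
}

// Hypothetical operations that report their status as a result code.
static int g_stepsRun = 0;
HResult OpenStep()  { ++g_stepsRun; return kOk; }
HResult ParseStep() { ++g_stepsRun; return kFail; }
HResult SaveStep()  { ++g_stepsRun; return kOk; }

// One line per call; error handling lives in a single catch block.
std::string RunSteps()
{
    g_stepsRun = 0;
    try
    {
        Check(OpenStep());
        Check(ParseStep());
        Check(SaveStep());
        return "ok";
    }
    catch (const SketchError& e)
    {
        return e.what();
    }
}
```

Here ParseStep fails, so SaveStep is never reached and RunSteps returns the message built by SketchError. The call sequence stays at one line per operation, which is the readability gain argued for above.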
Visual C++ 6.0 and Visual Studio 2008 Compatibility
Oh, and one last point. I used a few of the new safe CRT calls provided with Visual Studio 2008. So, for backward compatibility with Visual C++ 6.0, define the following.
#if _MSC_VER <= 1200 // Visual C++ 6.0 or earlier
#define strncpy_s(dest, src, size) strncpy (dest, src, size) // keep the copy bounded
#define strcpy_s strcpy
#define vsprintf_s vsprintf
#define wcsncpy_s wcsncpy
#endif
In each of the C++ projects (download here) you will find my implementation of Step 1 and Step 2 in ImportMSXML.h and my implementation of Step 3 in Utils.h.
In the next post(s) I will briefly describe each of the 5 project pairs (one in C# and one in C++) in more detail.
Stay tuned.