MSXML in C++ but as elegant as in C# (Part 3)

24 December 2008

See the previous posts for this article here: Part 1, Part 2.

Before I review the projects for this article, I would like to describe the basics: a few simple steps that will give you a C# experience while programming MSXML with C++.

Step 1: Import MSXML

There are a number of ways to import COM libraries into a C++ project. I think the simplest way is to add the following line to a common header (ideally the precompiled header).

#import <msxml6.dll> named_guids

This will create the headers (with extensions .tlh and .tli) that we need to access the COM objects created by MSXML. The generated headers are included at the point of the directive, so there is nothing to add to your project by hand. They in turn include other header files that we will be using later (comdef.h and comip.h).

The 6 in <msxml6.dll> stands for version 6. If you have this version installed, it should be in your path (under system32), so specifying its name is sufficient. Microsoft recommends that you use MSXML version 6 or version 3 unless you need some specific feature from another version. Choose version 6 to get the best in performance and security. Choose 3 (replace the 6 with a 3) if you want to target the broadest audience. Both versions work for the projects in these posts. You can read more about MSXML versions here.

We will be using types from the MSXML2 namespace (yes, for msxml3 and msxml6 too) so I recommend you add the following line too:

using namespace MSXML2;

The named_guids attribute lets you refer to GUIDs by their names later in the code (for example CLSID_DOMDocument60 when importing msxml6.dll).

Step 2: Enter Smart Pointers

Simply put, a smart pointer is a C++ class that has the semantics of a pointer to another class but does not need to be released explicitly.

Smart pointers are able to achieve this due to three powerful C++ features which I will review briefly:

  1. Reliable object lifetime management
  2. Operator overloading
  3. Templates

C++ manages the lifetime of an object by calling the object’s destructor when it goes out of scope or, for a member, when the destructor of its containing class runs. Destructors are also called when an exception propagates out of the object’s scope, whether it was thrown in that scope or in a nested call made from it. In this sense, the mechanism is reliable and ensures that class destructors can be used to release resources.
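To make the ordering concrete, here is a minimal sketch. Tracer and TraceUnwinding are hypothetical names invented for this illustration and have nothing to do with MSXML. Each object records its construction and destruction in a log, and an exception is thrown from the nested scope.

```cpp
#include <cassert>
#include <stdexcept>
#include <string>

// Hypothetical helper: records its construction and destruction in a log.
class Tracer
{
    std::string& m_log;
    char m_id;
public:
    Tracer(std::string& log, char id) : m_log(log), m_id(id) { m_log += m_id; }
    ~Tracer() { m_log += '~'; m_log += m_id; }
};

// Throws from a nested scope and returns the recorded order of events.
std::string TraceUnwinding()
{
    std::string log;
    try
    {
        Tracer outer(log, 'A');
        {
            Tracer inner(log, 'B');
            throw std::runtime_error("failure");
        }
    }
    catch (const std::runtime_error&)
    {
        // Both destructors have already run, innermost first.
        assert(log == "AB~B~A");
        log += '!';
    }
    return log;
}
```

The handler only runs after both destructors have finished, which is what makes destructors a safe place to release resources.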

C++ also supports overloading of the ‘->’ operator. This allows an object of a class to return a pointer to an object other than itself, thereby giving it the semantics of that pointer.
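A minimal sketch of that idea follows. Document and DocumentHandle are hypothetical types made up for this illustration. The handle does not own anything yet; it only shows how the overloaded operator forwards member access.

```cpp
#include <cassert>

struct Document                 // hypothetical target class
{
    int nodeCount;
    int CountNodes() const { return nodeCount; }
};

class DocumentHandle            // hypothetical wrapper with pointer semantics
{
    Document* m_p;
public:
    explicit DocumentHandle(Document* p) : m_p(p) { assert(p != 0); }
    Document* operator->() const { return m_p; }   // forwards member access
    Document& operator*()  const { return *m_p; }
};

int CountThroughHandle()
{
    Document doc = { 3 };
    DocumentHandle handle(&doc);
    return handle->CountNodes();   // reads exactly like a raw pointer call
}
```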

The method of releasing a pointer differs from domain to domain (for instance using the ‘delete’ operator for memory, or by calling some domain specific Release function). But often, within a domain, pointers of different types can be released in the same way. It would therefore seem rather cumbersome to have to write the same smart pointer logic for each pointer type in the domain.

C++ templates allow you to write a smart pointer once as a template for many classes in a domain. STL provides classic examples of smart pointer templates with its auto_ptr and shared_ptr classes. For COM objects, Microsoft has implemented a smart pointer template called ‘_com_ptr_t’. _com_ptr_t uses the COM reference counting mechanisms to manage any COM object’s lifetime and can be found in comip.h, which is automatically included in your code by the #import statement.
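The sketch below shows the general shape of such a template for a domain where objects are released by calling Release. It is not the _com_ptr_t implementation. FakeNode and RefPtr are hypothetical names, and FakeNode only imitates COM-style reference counting so that the example compiles without COM.

```cpp
#include <cassert>

// Hypothetical stand-in for a COM-style object: its lifetime is governed
// by AddRef/Release rather than by delete.
class FakeNode
{
    long m_refs;
public:
    static int s_alive;              // number of live objects, for the demo
    FakeNode() : m_refs(0) { ++s_alive; }
    ~FakeNode() { --s_alive; }
    void AddRef()  { ++m_refs; }
    void Release() { if (--m_refs == 0) delete this; }
    int Value() const { return 42; }
};
int FakeNode::s_alive = 0;

// One template serves every type in the domain that has AddRef/Release.
template <typename T>
class RefPtr
{
    T* m_p;
public:
    explicit RefPtr(T* p = 0) : m_p(p) { if (m_p) m_p->AddRef(); }
    RefPtr(const RefPtr& r) : m_p(r.m_p) { if (m_p) m_p->AddRef(); }
    ~RefPtr() { if (m_p) m_p->Release(); }
    RefPtr& operator=(const RefPtr& r)
    {
        if (r.m_p) r.m_p->AddRef();  // add first: safe for self-assignment
        if (m_p) m_p->Release();
        m_p = r.m_p;
        return *this;
    }
    T* operator->() const { assert(m_p != 0); return m_p; }
};

int UseRefPtr()
{
    int value = 0;
    {
        RefPtr<FakeNode> a(new FakeNode);
        RefPtr<FakeNode> b(a);       // shares ownership, count is now 2
        value = b->Value();
    }                                // both released here, object deleted
    return value + FakeNode::s_alive;   // s_alive is back to 0
}
```

Nothing in RefPtr mentions FakeNode, so the same template would work for any other class in the same domain.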

As a convenience, for many COM interfaces, Microsoft also provides a type definition (typedef) that instantiates the smart pointer template for that interface. According to the naming convention for these types, they usually have a ‘Ptr’ suffix.

Moreover, MSXML offers two sets of interfaces for many objects. The raw interfaces use the ‘good’ old COM types (like VARIANT, BSTR and HRESULT) and ‘dumb’ pointers (you know what I mean – not smart pointers). The second set of interfaces wraps the raw interfaces and is defined in terms of wrapper types (such as _variant_t and _bstr_t) that wrap the raw COM types and manage their resources safely. If you only want the raw interfaces, you can add the attribute “raw_interfaces_only” to the #import statement above.

You may be asking yourself – why would I not want to import the non-raw interfaces? Why work so hard to manage resources safely, manage object lifetime, convert types safely and handle errors, if I can get it all for free? I will answer that in Part 5 when we review the SAXReader project.
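The difference between the two styles can be sketched without COM. In the example below, Status, RawElement and Element are hypothetical stand-ins. The raw_ prefix mirrors the default prefix that #import gives to the low-level methods, but nothing here is generated MSXML code.

```cpp
#include <cassert>
#include <stdexcept>
#include <string>

typedef long Status;    // stand-in for HRESULT: negative means failure

struct RawElement       // hypothetical "raw" interface
{
    std::string name;

    // Raw style: status code as the return value, result via an out parameter.
    Status raw_GetName(std::string* out) const
    {
        if (!out) return -1;
        *out = name;
        return 0;
    }
};

struct Element : RawElement   // hypothetical wrapper layer
{
    // Wrapper style: result as the return value, failure as an exception.
    std::string GetName() const
    {
        std::string result;
        if (raw_GetName(&result) < 0)
            throw std::runtime_error("raw_GetName failed");
        return result;
    }
};

bool BothStylesAgree()
{
    Element e;
    e.name = "root";

    std::string viaRaw;
    Status status = e.raw_GetName(&viaRaw);   // raw style: check, then use
    assert(status == 0);
    (void)status;

    return viaRaw == e.GetName();             // wrapper style: just use
}
```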

Now, in order to make our C++ code look like code written in C#, we will use that second set of interfaces, and the smart pointers that are defined for them. We will also add our own type definitions to map the smart pointer types from the MSXML2 namespace to equivalent types in the System.Xml namespace.

typedef MSXML2::IXMLDOMNodePtr                  XmlNode;
typedef MSXML2::IXMLDOMDocument2Ptr             XmlDocument;
typedef MSXML2::IXMLDOMElementPtr               XmlElement;
typedef MSXML2::IXMLDOMAttributePtr             XmlAttribute;
typedef MSXML2::IXMLDOMCommentPtr               XmlComment;
typedef MSXML2::IXMLDOMNamedNodeMapPtr          XmlNamedNodeMap;
typedef MSXML2::IXMLDOMNodeListPtr              XmlNodeList;
typedef MSXML2::IXMLDOMDocumentFragmentPtr      XmlDocumentFragment;
typedef MSXML2::IXMLDOMCDATASectionPtr          XmlCDataSection;
typedef MSXML2::IXMLDOMProcessingInstructionPtr XmlProcessingInstruction;
typedef MSXML2::IXMLDOMSchemaCollectionPtr      XmlSchemaCollection;
typedef MSXML2::IXMLDOMParseErrorPtr            XmlParseError;
typedef MSXML2::IXSLProcessorPtr                XslProcessor;
typedef MSXML2::IXSLTemplatePtr                 XslTemplate;

Feel free to remove some of these if you don’t need them or add more, similar types if you use other interfaces.

You may be asking why I explicitly specified the MSXML2 namespace in these definitions. Would it not suffice to include the ‘using’ directive from the previous step?

Well, one of the few differences between the Visual C++ 6.0 environment and that of Visual Studio 2008 with regard to MSXML is that in the latter, some of the COM smart pointers (on the left side of my typedefs) were redefined in the global namespace. As we specifically need those from the MSXML2 namespace, and to avoid an ambiguous symbol compilation error, the namespace has to be specified explicitly. On the whole, that makes the left side pretty ugly, but this will be of no concern to you once you include the typedefs as I propose.
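The ambiguity is easy to reproduce in miniature. In this sketch, MSXML2Like, NodePtr and XmlNodeLike are hypothetical names chosen to mimic the situation; they are not the real MSXML declarations.

```cpp
#include <cassert>

// Hypothetical stand-ins: the same type name exists both in a namespace
// and in the global namespace.
namespace MSXML2Like { struct NodePtr { int origin() const { return 2; } }; }
struct NodePtr { int origin() const { return 1; } };

using namespace MSXML2Like;

// NodePtr node;   // would not compile: the unqualified name is ambiguous

// Explicit qualification picks the namespaced type once, in the typedef.
typedef MSXML2Like::NodePtr XmlNodeLike;

int WhichNodePtr()
{
    assert(::NodePtr().origin() == 1);   // the global type needs :: to be named
    XmlNodeLike node;
    return node.origin();
}
```

After the typedef, the rest of the code never has to spell out the namespace again.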

Step 3: Add Some Helper Classes

A CoUninitialize Helper

Applications must call CoInitialize in a thread before any other call to COM in that thread. They must also call CoUninitialize when COM is no longer needed. Forgetting to call CoUninitialize is not a problem in a single-threaded application, because when the process exits any clean-up that needs to be done will be done for you. However, in multi-threaded applications, every thread that runs and exits without calling CoUninitialize generates a resource leak in your application.

Seasoned C++ programmers like us probably won’t forget to call CoUninitialize before exiting a thread, but remember, you have to make the call even if your thread exits due to an unhandled exception. Altogether, managing all cases can make your code a little messy – which is a big NO, NO 🙂

The simple solution for such problems in C++ is Resource Acquisition Is Initialization (RAII). RAII refers to the use of C++ object lifetime management to ensure that a resource is released automatically, as we would expect it to be.

The following class does the trick. Just instantiate a local variable of this type at the beginning of the outermost block in your thread and forget about CoUninitialize.

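class ComInit
{
    HRESULT m_hr;   // result of CoInitialize, checked in the destructor

    ComInit(const ComInit&);              // not copyable: declared, not defined
    ComInit& operator=(const ComInit&);

public:
    ComInit()  : m_hr(::CoInitialize(NULL)) {}

    // Only a successful CoInitialize (S_OK or S_FALSE) must be balanced.
    ~ComInit() { if (SUCCEEDED(m_hr)) ::CoUninitialize(); }
};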
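The effect can be checked without COM. In the sketch below, FakeInitialize, FakeUninitialize, InitGuard and ThreadBody are hypothetical stand-ins that only count calls. The guard is the first statement of the "thread" function, and the calls stay balanced whether the function returns normally or exits through an exception.

```cpp
#include <cassert>
#include <stdexcept>

// Hypothetical stand-ins for CoInitialize/CoUninitialize that count calls.
static int g_initCalls = 0;
static int g_uninitCalls = 0;
long FakeInitialize()   { ++g_initCalls;   return 0; }   // 0 means success
void FakeUninitialize() { ++g_uninitCalls; }

class InitGuard
{
    long m_status;
    InitGuard(const InitGuard&);               // not copyable
    InitGuard& operator=(const InitGuard&);
public:
    InitGuard() : m_status(FakeInitialize()) {}
    ~InitGuard() { if (m_status >= 0) FakeUninitialize(); }
};

void ThreadBody(bool fail)
{
    InitGuard guard;                           // first statement in the thread
    assert(g_initCalls > g_uninitCalls);       // initialized inside this scope
    if (fail)
        throw std::runtime_error("work failed");
}

bool CallsAreBalanced()
{
    ThreadBody(false);                         // normal exit
    try { ThreadBody(true); }                  // exit by exception
    catch (const std::runtime_error&) {}
    return g_initCalls > 0 && g_initCalls == g_uninitCalls;
}
```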

An Error Handling Helper

Another aspect of COM programming that we must address is error management.

C++ supports structured error handling very well, but unfortunately, it’s mostly ‘do it yourself’ with COM. Most COM methods return the cryptic HRESULT, which immediately causes the following problems:

  1. HRESULT is not an enumerated type, so providing useful information to callers and users usually requires additional steps. Yes you could stay with the FAILED(hr) macro, but is that really enough information?
  2. When every line contains a call to a COM function returning an HRESULT, you have only a few options:
    1. You can check the return code of every function adding ~3 lines for each function call, rendering your code utterly unreadable. (75% of the code deals with error handling).
    2. You might take your chances and ignore some of the errors. A catastrophe waiting to happen.
    3. You can use macros to check the return code and throw an exception, as in the MSDN code quoted in my first post in this article. Macros make code difficult to browse and debug.
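As a rough illustration of the trade-off, the third option can also be written with an inline function instead of a macro. In this sketch Status, Failed, Check and the three store functions are hypothetical stand-ins for HRESULT, FAILED and real COM calls.

```cpp
#include <cassert>
#include <stdexcept>

typedef long Status;                          // stand-in for HRESULT
inline bool Failed(Status s) { return s < 0; }

// Hypothetical calls in the raw style: each returns a status code.
Status OpenStore()          { return 0; }
Status ReadItem(int* value) { assert(value != 0); *value = 7; return 0; }
Status CloseStore()         { return -1; }    // simulated failure

// An inline function rather than a macro: a debugger can step into it.
inline void Check(Status s)
{
    if (Failed(s))
        throw std::runtime_error("call failed");
}

int ReadWithChecks()
{
    int value = 0;
    try
    {
        Check(OpenStore());
        Check(ReadItem(&value));
        Check(CloseStore());                  // throws
        value = -1;                           // never reached
    }
    catch (const std::runtime_error&)
    {
        // one handler for the whole sequence
    }
    return value;
}
```

Each call still costs one line, and the error path lives in a single handler.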

Well, Microsoft defines a very useful class called ‘_com_error’ in the comdef.h include file, which is automatically included in your code by the #import statement. _com_error is the natural class to throw when an HRESULT value indicates some error. It takes an HRESULT in its constructor and provides a formatted message string through the ErrorMessage method. As you probably know, some COM objects support the IErrorInfo interface, which provides more detailed error information. _com_error can optionally take one of those in its constructor too and provides easy access to that information through methods such as Description.

So? Where does that get us? COM doesn’t throw this class.

Well, first of all, _com_error is what the _com_ptr_t smart pointers and the wrapper methods generated by #import throw when a COM call fails. Thus, by wrapping a COM object with a _com_ptr_t you can create a COM object in one line and use C++ try/catch syntax to handle errors in a structured way.

Second, you can use _com_error objects yourself to access more information about an HRESULT error.

But what about errors that occur in your application and are not generated by COM? Well, just for convenience, I added my own Error class that can optionally handle HRESULT errors by reusing _com_error. Nothing clever here. You can write your own class to wrap an error with an exception, but please do something, because structured error handling with exceptions is the way to go. Here is mine:

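#include <stdarg.h>   // va_list, va_start, va_end
#include <stdio.h>    // vsprintf_s
#include <string.h>   // strncpy_s, strcpy_s

// HRESULT, _com_error and _bstr_t come from the headers pulled in by #import.
class Error
{
    char m_Message[512];

public:

    // Builds the message from the system text for an HRESULT.
    Error (HRESULT hr)
    {
        m_Message[0] = '\0';

        _com_error comError (hr);

        // ErrorMessage returns TCHAR text. _bstr_t converts it to char in
        // both multi-byte and Unicode builds.
        _bstr_t message (comError.ErrorMessage());
        const char* text = (const char*) message;
        if (text)
            strncpy_s (m_Message, text, _TRUNCATE);

        m_Message[sizeof(m_Message)-1] = '\0';
    }

    // Builds the message printf-style.
    Error (const char* format, ...)
    {
        va_list args;
        va_start(args, format);

        vsprintf_s(m_Message, format, args);

        va_end(args);
    }

    Error(const Error& r)
    {
        strcpy_s (m_Message, r.m_Message);
    }

    operator char*() { return m_Message; }
};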
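For readers who want to try the throw and catch pattern outside Visual Studio, here is a portable sketch in the spirit of the Error class above. FormattedError, LoadSettings and CatchFormattedError are hypothetical names, and the standard vsnprintf replaces vsprintf_s, so this is an approximation rather than the class from Utils.h.

```cpp
#include <cassert>
#include <stdarg.h>
#include <stdio.h>
#include <string.h>

// Portable stand-in for the Error class; vsnprintf replaces vsprintf_s.
class FormattedError
{
    char m_message[64];
public:
    explicit FormattedError(const char* format, ...)
    {
        assert(format != 0);
        va_list args;
        va_start(args, format);
        // Truncates if needed and always null-terminates the buffer.
        vsnprintf(m_message, sizeof(m_message), format, args);
        va_end(args);
    }
    const char* what() const { return m_message; }
};

// Hypothetical application function that reports a non-COM error.
void LoadSettings(int line)
{
    throw FormattedError("parse error at line %d", line);
}

bool CatchFormattedError()
{
    try
    {
        LoadSettings(12);
    }
    catch (const FormattedError& e)
    {
        return strcmp(e.what(), "parse error at line 12") == 0;
    }
    return false;
}
```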

Visual C++ 6.0 and Visual Studio 2008 Compatibility

Oh, and one last point. I used a few of the new safe CRT calls provided with Visual Studio 2008. So, for backward compatibility with Visual C++ 6.0, define the following.

#if _MSC_VER <= 1200 // Visual C++ 6.0
    #define strncpy_s(dest, src, size) strcpy (dest, src)
    #define strcpy_s strcpy   // needed by the Error copy constructor
    #define vsprintf_s vsprintf
    #define wcsncpy_s wcsncpy
#endif

In each of the C++ projects (download here) you will find my implementation of Step 1 and Step 2 in ImportMSXML.h and my implementation of Step 3 in Utils.h.

In the next post(s) I will briefly describe each of the 5 project pairs (one in C# and one in C++) in more detail.

Stay tuned.

18 Comments

  1. Geoff, 22 January 2009 at 9:36

    After much gnashing of teeth and slapping of forehead I found this article to be simply the best ever on C++ and XML usage.

    I have a small problem though when compiling this for release.

    I get:
    Error 1 error C2665: 'strncpy_s' : none of the 2 overloads could convert all the argument types u:\projects\general\usbnotify\usbnotify\Utils.h 30 USBNotify

    and it happens on the line with:

    strncpy_s (m_Message, message, sizeof (m_Message));

    What am I doing wrong?

    Many Thanks.
    Geoff

  2. Geoff, 28 January 2009 at 13:35

    Don't sweat it fixed my own problem. Was not checking build config.

    Still a great article!

  3. David Sackstein, 29 January 2009 at 15:12

    Hi Geoff,

    Sorry for the delay in responding.
    strncpy_s is the "safe" form of strncpy provided with VS 2008.
    If you are using VC 6.0 you should be using strcpy.
    Please take a look in Utils.h and you will see that I defined the following:

    #if _MSC_VER <= 1200 // Visual Studio 6
    #define strcpy_s strcpy
    #define vsprintf_s vsprintf
    #define wcsncpy_s wcsncpy
    #endif

    Did you change that code? Which version of Visual Studio are you using?

    David

  4. Helen, 5 February 2009 at 11:57

    Thank you so much for these articles and the accompanying examples – they are exactly what I needed! I was having fits over the MSXML examples in the MSDN (goto?!), but your solution is wonderfully elegant. I particularly appreciate the way that your sample code includes both the c# and the c++ projects. You've saved my software team a lot of work this month!

    (we're currently porting our software over to Vista, which involves moving a lot of data out of the registry into XML data files. The catch is that the codebase we're porting is 10+ years-worth of software, including a lot of shared code between the PC-based tools and the embedded software run by our industrial instruments, and the data we're moving is used by multiple applications in four different languages…)

    Hope you continue the series – I'd love to read parts 4 and 5!

  5. Geoff, 6 February 2009 at 6:21

    Hi David,

    I used your code verbatim. The problem was in the release version of the build. I had not configured the character set to use multi-byte and therefore getting conversion errors on the parameters for strcpy_s. A noob mistake! I am using VS2008 and so in my final build have commented out the VS6 defines.

    This is a superb article!

    Many thanks,
    Geoff

  6. Steve, 2 March 2009 at 19:56

    I think utils.h is broken for unicode in VS2008. There is a mix and match of TCHAR and char that causes compilation errors.

  7. Duong, 14 April 2009 at 4:19

    This is a superb article!
    Many thanks
    Duong

  8. Hans Smit, 16 June 2009 at 16:39

    I'm very impressed. The article/code you provided is an excellent diving board into the deep murky waters called COM/MSXML.

    I'm currently busy developing a standard interface to xercesc/libxml2 and now you've given me the know how to include msxml.

    Like all the previous posts before me have said, and I will reiterate – many thanks.

  9. Gary, 14 July 2009 at 1:51

    I just downloaded the samples and tried to compile them. I got an error on the use of strcpy_s in Utils.h. In some previous posts David called attention to some definitions for VS6. Specifically:

    #if _MSC_VER <= 1200 // Visual Studio 6
    #define strcpy_s strcpy
    #define vsprintf_s vsprintf
    #define wcsncpy_s wcsncpy
    #endif

    In the version of Utils.h that I got the first define was not there and seemed to have been replaced with:

    #define strncpy_s(dest, src, size) strcpy (dest, src)

    I added the original define back and everything compiled correctly.

  10. Hans Smit, 25 July 2009 at 10:47

    Update: I have now successfully managed to create a wrapper interface class to msxml/libxml2+libxslt/xercesc+xalanc. Thanks to your code I managed to create the msxml wrapper. I can now switch my code to compile against one of the 3 xml libraries without having to change a single line of code. Excellent.

    BUT: I discovered a small bug in your code. It's a memory leak.

    In the file Utils.h, the StringWriter class is missing a destructor. The following code should be inserted:

    ~StringWriter () {
    stream->Release();
    }

    After inserting this code, the memory leak I detected disappeared.

  11. David Sackstein, 25 July 2009 at 20:23

    Hi Hans,
    Great. Thanks for the comment and the bug fix.
    C++ is not QUITE as elegant as C# : )

  12. Mark Jones, 24 August 2009 at 18:42

    David,
    I like this very much, but can i compile my target executable with /clr and use the smart pointers.
    This is very new to me and I have a native C++ project that needs to be moved to managed code.

  13. David Sackstein, 24 August 2009 at 20:02

    Hi Mark,
    If you are moving to managed code, I would recommend you use the classes in the System.Xml namespace.
    You will find excellent documentation and many examples on the net.
    Let me know if you need any assistance with that,
    David

  14. Mark Jones, 26 August 2009 at 19:32

    David,
    Thanks for your advice but I'm not entirely sure if I can do this due to the amount of work that may be required. I am trying to do this as a refactor against a set of bug fixes under the radar as my manager (recently promoted team member) is not prepared to allocate time to this.

    As Martin Fowler's Principles in Refactoring > What Do I Tell My Manager?
    "Of course, many people say they are driven by quality but are more driven by schedule. In these cases I give my more controversial advice: Don't tell!"
    He understands quality but is not prepared to put the time in.

  15. Montano, 5 October 2012 at 16:05

    Hello to every body, it's my first visit of this web site; this web site carries awesome and genuinely fine information for readers.

  16. Cruse, 11 October 2012 at 5:02

    Thanks a bunch for sharing this with all of us you really know what you're talking about!

  17. Varela, 5 January 2013 at 6:23

    What's up all, here every person is sharing such experience, thus it's good
    to read this web site, and I used to pay a quick visit this weblog all the time.

  18. Peak, 19 January 2013 at 16:26

    It's amazing to visit this site and reading the views of all colleagues regarding this paragraph, while I am also keen of getting knowledge.
