January 2012 - Posts - All Your Base Are Belong To Us

All Your Base Are Belong To Us

Mostly .NET internals and other kinds of gory details

January 2012 - Posts

Analysis of a Mobile Redirection Framework and Obfuscated Regular Expressions

I don’t often read Haaretz, but from time to time a friend shares an article on Facebook, or one comes up in search results, and I find myself on the Haaretz website. Often enough this happens on my mobile phone, and every time I am redirected to a very primitive version of the website. Compare for yourself:

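[Images: two screenshots of the Haaretz website, the desktop version on the left and the mobile version on the right]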
(Screenshot on the right obtained by changing the user agent in the Chrome Canary build. Very nice built-in feature.)

I was curious what criteria the Haaretz website uses to trigger this redirect, and started sniffing the traffic with Fiddler. After most of the Haaretz front page had been downloaded, the browser suddenly issued a request for g.watap.net/w2w/haaretz, which issued not one but two 302 redirects and eventually landed on the crippled mobile version.

[Image: screenshot of the captured traffic showing the redirects]

Interestingly, I tried more than one mobile user agent, and the resulting mobile website was pretty much the same (so I get the same experience with an iPhone, an Android phone, or a feature phone). I believe this is a poor choice on Haaretz’s part, so I started investigating a little.

I started by running a whois query on watap.net, and found that it’s registered through Go Daddy for PassCall Advanced Technologies. Then I turned my attention to PassCall, and found on their website that they provide a platform that adapts existing websites to mobile browsing. Indeed, I found Haaretz in their list of customers. From what I could tell, all the customers are Israeli companies, and the g.watap.net host resolves to an Israeli IP address, probably hosted by NetVision, a major Israeli ISP.

What is the precise process used by PassCall to determine whether or not to redirect my browser to the dumbed-down mobile version? I was brave enough to start reading through the ~8500 lines of HTML and script that is the Haaretz front page. Very close to the beginning there’s a copyright notice by PassCall with a minified script. I won’t paste the whole thing, but here’s a start:

var passcall_pcmdt={i$i:function(){try{eval(function(p,a,c,k,e,d){e=function(c){return(c<a?'':e(parseInt(c/a)))+((c=c%a)>35?String.fromCharCode(c+29):c.toString(36))};if(!''.replace(/^/,String)){while(c--){d[e(c)]=k[c]||e(c)}k=[function(e){return d[e]}];e=function(){return'\\w+'};c=1};while(c--){if(k[c]){p=p.replace(new RegExp('\\b'+e(c)+'\\b','g'),k[c])}}return p}('3 f=D(p,q,o,2,e){4(6.7.a(\'5=0\')>-1)m;3 b=s r();b.t(b.n()+B);3 d=s r();d.t(d.n()+1);4(9.c.a(\'y=1\')>-1){6.7=\'5=0; j=/; i=\'+d.l();m}E 4(9.c.a(\'G=1\')>-1){6.7=\'5=0; j=/; i=\'+b.l();m}3 8=6.7.a(\'5=1\')>-1;4(!8){3 v=p.h(k.g);3 x=q.h(k.g);3 u=!o.h(k.g);8=(v||x)&&u;6.7=\'5=\'+F(I(8))+"; j=/; i="+b.l()}4(8){2=9.c.w(9.H,2);4(C e!=\'z\'&&e.A){2+=2.a(\'?\')>-1?\'&\':\'?\';2=2.w(\'?&\',\'?\');2+=e}9.c=2}}',45,45,'||r\x65\x64\x69rt\x6f|\x76\x61r|\x69\x66|___\x70\x63\x6d\x64\x74___|\x64\x6f\x63\x75men\x74|\x63\x6f\x6f\x6b\x69\x65|\x72\x65\x64\x69\x72|l\x6f\x63\x61ti\x6fn|\x69n\x64\x65x\x4ff||\x68\x72\x65\x66|\x62\x62|p\x61r\x61ms||u\x73\x65rAge\x6e\x74|\x74\x65\x73\x74|\x65\x78\x70\x69\x72\x65\x73|\x70a\x74h|\x6e\x61\x76\x69\x67\x61t\x6fr|toU\x54\x43S\x74\x72in\x67|\x72e\x74\x75\x72\x6e|g\x65\x74D\x61te|\x723|\x721|r2|\x44\x61\x74\x65|\x6e\x65\x77|\x73\x65\x74D\x61\x…

I took this beauty to jsbeautifier.org where it got a much prettier shape. Here’s the first part, beautified:

var passcall_pcmdt = {
    i$i: function () {
        try {
            eval(function (p, a, c, k, e, d) {
                e = function (c) {
                    return (c < a ? '' : e(parseInt(c / a))) + ((c = c % a) > 35 ? String.fromCharCode(c + 29) : c.toString(36))
                };
                if (!''.replace(/^/, String)) {
                    while (c--) {
                        d[e(c)] = k[c] || e(c)
                    }
                    k = [function (e) {
                        return d[e]
                    }];
                    e = function () {
                        return '\\w+'
                    };
                    c = 1
                };
                while (c--) {
                    if (k[c]) {
                        p = p.replace(new RegExp('\\b' + e(c) + '\\b', 'g'), k[c])
                    }
                }
                return p
            }(…

Okay, this is obviously an unpacker – it even says function (p, a, c, k, e, d) right there. Thanks for the hint. So the first part is a slightly minified unpacker, and the hex-encoded strings (not shown here) are probably the actual code. Instead of trying to run the unpacking algorithm with a pen and paper, I simply put a breakpoint in the beginning of the script and started stepping in and out until I got this beautiful function, called f, which does the interesting part:

var f = function(r1, r2, r3, redirto, params) {
  if (document.cookie.indexOf('___pcmdt___=0') > -1)
    return;
  var b = new Date();
  b.setDate(b.getDate() + 360);
  var bb = new Date();
  bb.setDate(bb.getDate() + 1);
  if (location.href.indexOf('snopcmdt=1') > -1) {
    document.cookie = '___pcmdt___=0; path=/; expires=' + bb.toUTCString();
    return
  } else if (location.href.indexOf('nopcmdt=1') > -1) {
    document.cookie = '___pcmdt___=0; path=/; expires=' + b.toUTCString();
    return
  }
  var redir = document.cookie.indexOf('___pcmdt___=1') > -1;
  if (!redir) {
    var b1 = r1.test(navigator.userAgent);
    var b2 = r2.test(navigator.userAgent);
    var b3 = !r3.test(navigator.userAgent);
    redir = (b1 || b2) && b3;
    document.cookie = '___pcmdt___=' + parseInt(Number(redir)) + "; path=/; expires=" + b.toUTCString()
  }
  if (redir) {
    redirto = location.href.replace(location.host, redirto);
    if (typeof params != 'undefined' && params.length) {
      redirto += redirto.indexOf('?') > -1 ? '&' : '?';
      redirto = redirto.replace('?&', '?');
      redirto += params
    }
    location.href = redirto
  }
}

Note how this is no longer obfuscated and is perfectly readable. The script starts by checking whether a cookie already tells it to do the mobile redirect or to skip it. The interesting part (marked in bold in the original post) is the if (!redir) block, which runs when a new decision has to be made; the redirect itself simply replaces location.href with a new location. The whole redirect-or-not logic boils down to three regular expressions (r1, r2, r3). Let’s take a look at them. Here is r1:

^(((A|3|Q)l(v|5|c)a(t|3|2)e(l|3|x))|((X|6|E)Z(O|2|j)S)|((9|H|6)D(q|2|h)_(h|3|T))|((H|7|1)D_m(i|0|j)n)|((9|I|6)C(v|1|o)p(q|8|p)i(q|e|j)d(v|8|F)r(v|o|q)m(4|P|9)a(s|2|0)s(c|4|X)a(v|7|l)l(q|7|C)o(q|6|d)e(j|4|0)h(a|6|0)a(r|0|q)e(t|5|x)z)|((v|8|L)G(E|7|3)?[-/_])|((0|M|6)a(u|3|Q)i B(r|7|x)o(w|4|0)s(q|6|e)r)|((P|0|h)C(L|2|q)[4-6][4-6])|((q|6|S)E(h|C|Q)-)|((v|S|j)G(H|2|3)-)|((S|0|2)I(E|X|h)-)|((j|4|S)K_)|((q|6|S)O(j|4|N)I(M|x|q))|((8|S|4)e(0|n|4)d(j|o|Q))|((j|8|T)e(j|l|X)i(4|t|6))|((h|p|q)o(q|r|5)t(q|a|h)l(q|m|v)m(h|8|m)))

Looks like a bad-ass regular expression? Not at all. In fact, this is just a light attempt at obfuscating the regular expression without changing its meaning too much. Note that the whole thing is just a big disjunction over a bunch of strings. Here’s the first component:

((A|3|Q)l(v|5|c)a(t|3|2)e(l|3|x))

What could it possibly be? Obviously, it’s “Alcatel”:

(A|3|Q) l (v|5|c) a (t|3|2) e (l|3|x)   →   A l c a t e l

How about this guy?

((h|p|q)o(q|r|5)t(q|a|h)l(q|m|v)m(h|8|m))

This one is “portalmmm”, which apparently is a mobile user agent used by i-mode mobile browsers. Finally, what is this:

((9|I|6)C(v|1|o)p(q|8|p)i(q|e|j)d(v|8|F)r(v|o|q)m(4|P|9)a(s|2|0)s(c|4|X)a(v|7|l)l(q|7|C)o(q|6|d)e(j|4|0)h(a|6|0)a(r|0|q)e(t|5|x)z)

Fairly long to be a mobile user agent. Indeed, it becomes ICopiedFromPasscallCode?haaretz – which is a rudimentary copy-protection mechanism.
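To see why the decoy alternatives barely change what the expression matches, here is a minimal C++ sketch that feeds the first r1 component to std::regex, whose default grammar is ECMAScript, the same family as JavaScript's. The helper name and the sample strings are my own illustrations and are not part of PassCall's code.

```cpp
#include <cassert>
#include <regex>
#include <string>

// First component of r1, copied from the decoded script, together with the
// leading ^ anchor that applies to the whole r1 disjunction.
static const std::regex kAlcatel("^((A|3|Q)l(v|5|c)a(t|3|2)e(l|3|x))");

// Hypothetical helper: true if the user agent starts with a string that the
// obfuscated component accepts.
bool startsWithAlcatelComponent(const std::string& userAgent) {
    return std::regex_search(userAgent, kAlcatel);
}
```

Each of the four groups offers three alternatives, so the component accepts 81 different prefixes. Only "Alcatel" is a plausible user agent; a string such as "3l5a3e3" also matches but would never show up in practice, which is why the obfuscation does not really change the meaning.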

I can tell you right away that r2 is no different:

((a|3|Q)n(v|5|d)r(o|3|2)i(d|3|x))|((X|6|b)l(a|2|j)c(k|X|q)B(4|e|8)r(r|Q|0)y)|((G|x|h)T-P(v|9|1)0(q|0|x)0)|((H|7|Q)T(q|6|C))|((Q|6|H)u(6|a|5)w(X|5|e)i[u/-])|((i|1|0)p(h|5|a)d)|((q|i|v)p(h|9|0)o(h|6|n)e)|((j|2|m)o(5|t|8)o(4|r|8)o(l|2|q)a)|((4|M|7)O(T|7|0)[-_])|((8|n|6)o(k|3|x)i(h|2|a))|((s|Q|1)o(x|4|n)y(x|8|e)r(q|8|i)c(s|5|Q)s(o|3|Q)n)|((h|6|s)a(4|m|5)s(h|u|x)n(g|6|3))|((j|3|P)a(l|X|h)m)|((x|5|p)h(h|7|i)l(i|7|2)p(3|s|5))|((v|U|Q)P.(v|6|B)r(o|0|j)w(s|7|Q)e(r|3|q))|((w|5|2)i(7|n|5)d(q|3|o)w(6|s|4) (((9|p|5)h(o|X|q)n(e|x|q))|((h|c|q)e)))|((8|I|7)E(X|8|M)o(h|5|b)i(h|6|l)e)|((9|V|7)o(d|Q|j)a(f|8|Q)o(X|4|n)e)|((X|9|o)p(q|e|j)r(h|a|j) (v|m|q)o(v|b|x)i)|((4|o|5)p(e|0|2)r(v|4|a) (2|m|5)i(v|9|n)i)|((q|7|s)y(q|m|0)b(q|6|i)a(n|0|q))|((X|8|1)o(o|X|0)m)|( P(q|r|h)e[/])

This yields stuff like “android”, “ipad”, “iphone”, “windows phone”, “Xoom”, and many others. And then there’s r3, which will disqualify a user agent from redirecting to the mobile site:

((i|3|Q)p(v|5|a)d)|((q|9|V)i(7|e|8)w(j|P|X)a(Q|8|d))|((M|7|Q)Z(q|6|9)0(1|X|q))|((q|8|G)T-P(q|2|1)0(3|0|8)0)|((q|6|G)T-P(v|7|Q)5(v|8|0)0)|((j|5|X)o(3|o|5)m)

…and here we have “ipad” and “Xoom” again, which is quite silly because we just saw them in r2. Probably the obfuscation layer makes it hard for the PassCall developers to make changes :-)

All in all, here is a (partial) set of user agents that PassCall will redirect to the mobile website and a (partial) set of user agents that they won’t (the list is based on what I’ve seen on Haaretz’s website on January 31, 2012):

Will redirect if starts with: Alcatel, ICopiedFromPasscallCode?haaretz, LGE-, Maui Browser, SEC-, SGH-, SIE-, SK-, SONIM-, Sendo-, Telit-, portalmmm

Will redirect if contains: android, blackBerry, GT-P1000, HTC, Huawei, ipad, iphone, motorola, MOT-, nokia, sonyericsson, samsung, Palm, philips, UP.Browser, windows phone, windows ce, IEMobile, Vodafone, opera mobi, opera mini, symbian, Xoom, Pre

Will not redirect if contains: ipad, ViewPad, MZ601, GT-P1000, GT-P7500, Xoom
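The way the three lists combine can be sketched in a few lines of C++. The patterns below are simplified stand-ins built from a handful of the decoded names, not the full PassCall expressions, and the case-insensitive flag is my assumption (the decoded lowercase "android" would otherwise not match a real "Android" user agent). The function name is hypothetical.

```cpp
#include <cassert>
#include <regex>
#include <string>

// Simplified stand-ins for r1, r2 and r3 (assumption: case-insensitive).
static const auto kFlags = std::regex::ECMAScript | std::regex::icase;
static const std::regex kStartsWith("^(Alcatel|SEC-|SGH-|portalmmm)", kFlags);
static const std::regex kContains("android|ipad|iphone|nokia|xoom", kFlags);
static const std::regex kExcluded("ipad|ViewPad|MZ601|xoom", kFlags);

// Mirrors redir = (b1 || b2) && b3 from the decoded function f,
// where b3 is the negated r3 test.
bool shouldRedirect(const std::string& userAgent) {
    const bool b1 = std::regex_search(userAgent, kStartsWith);
    const bool b2 = std::regex_search(userAgent, kContains);
    const bool b3 = !std::regex_search(userAgent, kExcluded);
    return (b1 || b2) && b3;
}
```

With these stand-ins an iPhone user agent is redirected, while an iPad one is matched by the second pattern but then vetoed by the third, so listing it in both places is redundant rather than harmful.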

I am curious about other tablets, such as the Samsung Galaxy Tab, meeting the criteria for a mobile device. Indeed, with the Galaxy Tab user agent (“Mozilla/5.0 (Linux; U; Android 3.0; xx-xx; GT-P7100 Build/HRI83) AppleWebkit/534.13 (KHTML, like Gecko) Version/4.0 Safari/534.13”) we are getting the mobile version. And the most annoying thing? I don’t see a way on the mobile version to switch back to the desktop version if I’d like. And that’s the number one fallback you should have if you’re using blunt regular expressions to determine which website to show me.

To summarize:

  • Haaretz is using PassCall Advanced Technologies to redirect its mobile visitors to a crippled mobile version of the website
  • PassCall is using a set of regular expressions to determine whether a user’s user agent represents a mobile device and performs an unconditional redirect
  • PassCall provides the same mobile experience for a 2011 iPhone 4S and a 2005 Nokia feature-phone
  • There doesn’t seem to be a way to get back to the desktop version from the dumbed-down mobile one

I would greatly appreciate any comments and corrections. This research has been performed for personal purposes and does not represent the position of my employer.


I am posting short updates and links on Twitter as well as on this blog. You can follow me: @goldshtn

Managed-Unmanaged Interoperability in the 2010s: Part 3, C++/CLI and Other Options

What if P/Invoke is not enough? Hold on, why should it not be enough? What could we possibly need other than calling global C-style functions exported from DLLs?

Well, suppose you want to use a C++ class exported from a DLL. Or maybe the C++ class is not yet exported and you are looking for a way to make it available to managed code. The problem with a C++ class as opposed to a global function is that … it is not a global function.

For example, a typical C++ class with a constructor and a couple of instance methods would be compiled to roughly the following when you look at it as a bunch of C-style calls:

// C++-style class declaration
class Klass {
private:
    int _n;
public:
    Klass(int n) : _n(n) {}
    void Work(int m) ...
    virtual bool Sleep() ...
};
// C++-style class usage
Klass* k = new Klass(42);
k->Work(13);
bool b = k->Sleep();

// C-style class declaration
struct Klass {
    void* vfptr[1];
    int _n;
};
// Every instance method receives the object as a hidden first parameter
bool __Klass_Sleep(Klass* pThis) ...
void __Klass_Ctor(Klass* pThis, int n) {
    pThis->vfptr[0] = (void*)&__Klass_Sleep;
    pThis->_n = n;
}
void __Klass_Work(Klass* pThis, int m) ...
// C-style class usage
Klass* k = (Klass*)malloc(sizeof(Klass));
__Klass_Ctor(k, 42);
__Klass_Work(k, 13);
// Virtual call: fetch the function pointer from the table, then pass k as "this"
bool b = ((bool(*)(Klass*))k->vfptr[0])(k);

What am I saying here? I’m saying that using a C++ class from P/Invoke signatures would entail mimicking the C-style class usage shown above – i.e. declaring a P/Invoke signature for the constructor, for the instance methods, and calling virtual methods through the virtual function table. This is clearly intractable.
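For readers who want to run the C-style lowering, here is a small compilable sketch. The names KlassC, VTable and Klass_* are mine, the method bodies are arbitrary (Work returns a value only so the result can be observed), and the layout follows the common scheme of a hidden pointer to a per-class table; actual compilers choose their own layout and name mangling.

```cpp
#include <cassert>
#include <cstdlib>

// Hand-written C-style equivalent of a class with one virtual method.
struct KlassC;
struct VTable {
    bool (*Sleep)(KlassC* pThis);   // slot 0: the only virtual method
};
struct KlassC {
    const VTable* vfptr;            // hidden pointer to the virtual table
    int n;
};

bool Klass_Sleep(KlassC* pThis) { return pThis->n > 0; }      // arbitrary body
static const VTable kKlassVTable = { &Klass_Sleep };

void Klass_Ctor(KlassC* pThis, int n) {
    pThis->vfptr = &kKlassVTable;   // the constructor installs the table
    pThis->n = n;
}
int Klass_Work(KlassC* pThis, int m) { return pThis->n + m; } // arbitrary body

// Roughly what "k = new Klass(42); k->Work(13); k->Sleep();" turns into.
bool useKlass(int* workResult) {
    KlassC* k = static_cast<KlassC*>(std::malloc(sizeof(KlassC)));
    if (k == nullptr) return false;
    Klass_Ctor(k, 42);
    *workResult = Klass_Work(k, 13);   // direct call, "this" passed explicitly
    bool slept = k->vfptr->Sleep(k);   // indirect call through the table
    std::free(k);
    return slept;
}
```

Every one of these steps (allocation, constructor call, explicit this pointer, table lookup) would need its own hand-maintained P/Invoke declaration on the managed side.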

C++/CLI offers a reasonable approach that bridges .NET and C++ interoperability in both directions, and allows both sides to be the initiator (unlike P/Invoke). It has been covered well elsewhere, and I wrote a post on automating the reverse direction four years ago. Still, a brief recap is in order.

You create a new C++/CLI assembly that bridges the managed and unmanaged worlds. It can be a part of an existing C++ DLL (which will then contain parts compiled with /CLR), or a brand new one. In your new assembly, you can create three categories of types:

  1. Managed types – “ref class” or “value class”, which are fully privileged citizens of the managed world. These types can be exported from your assembly and form the façade to other managed code (e.g. C#).
  2. C++ classes compiled to IL – can be accessed directly by managed types within the same assembly, or by (3) below.
  3. C++ classes compiled to machine code – can be accessed directly by managed types within the same assembly, by (2) above, or by any native C++ code within the same DLL. They can also be exported from the DLL as C++ classes.

Now there are two interop directions:

  • Managed code calls into a managed type in the C++/CLI assembly, which calls into (2) or (3) or any other unmanaged API, including Win32.
  • Unmanaged code calls into (2) which calls into (1) or any managed type, including the whole .NET Framework.

For more information about C++/CLI, check out the post I mentioned above; there are also plenty of good docs on MSDN and a book (which, frankly, I haven’t read).


Is there anything else? Any other options for interop in the 2010s? Well, as a matter of fact, there are quite a few:

COM Interop – managed code can call COM objects very easily, and managed classes can just as easily act as COM objects. This allows bidirectional interoperability, but means you have to learn COM (most probably ATL) to expose COM objects from the unmanaged world. Verdict: Ugh.

C++/CX – the language extensions for C++ (/ZW) that make consuming and creating Windows 8 components very easy from C++. Not surprisingly, C++/CX is based on COM, but makes it much easier to create COM objects, which can be subsequently consumed by managed code. Verdict: Perhaps a feasible alternative when Visual Studio 11 ships.

CXXI – Mono’s new technology which extends gcc and creates an interoperability façade easily accessible from managed code. This works not only for global functions, like P/Invoke, but for complex C++ code as well. Verdict: This might just be the future.



Managed-Unmanaged Interoperability in the 2010s: Part 2, P/Invoke – Unmanaged to Managed

We’ve got P/Invoke doing some heavy lifting of managed-to-unmanaged signature translation. The opposite direction is surprisingly easy as well.

Typically, you would encounter the need for unmanaged code to call managed code as part of a callback. (There is the scenario where unmanaged code is the interop initiator, which we will discuss in another post.)

In the unmanaged world, callbacks are function pointers; in the managed world, callbacks are delegates. Recall that a delegate “knows” not only the method that needs to be called, but also the target object – and indeed, you can create delegates that reference an instance method on a specific object. On the other hand, a function pointer is just that – a pointer – you can’t use a single pointer to store information about the method and the target object.

Obviously, a conversion is in order, and this conversion is also handled automatically by P/Invoke. For example, suppose you want to call the EnumWindows Win32 API to enumerate all the windows on the system (and perhaps retrieve their text using the awesome GetWindowText wrapper we developed earlier). The function’s signature is:

BOOL WINAPI EnumWindows(
  __in  WNDENUMPROC lpEnumFunc,
  __in  LPARAM lParam
);

Let’s ignore the second parameter for now, and consider the first one. It’s a function pointer, specifically to a function that has the following signature:

BOOL CALLBACK EnumWindowsProc(
  __in  HWND hwnd,
  __in  LPARAM lParam
);

What’s the calling convention here? CALLBACK is a macro that expands to __stdcall, which is the standard Win32 calling convention.

We would like to pass a delegate to the EnumWindows function instead of a function pointer, or at least treat a managed delegate as a function pointer. There are two ways of doing this. The harder way would be to declare EnumWindows as a method that takes a function pointer for its first parameter, and then obtain manually that function pointer from a managed delegate:

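using System;
using System.Runtime.InteropServices;

[DllImport("user32")]
public static extern bool EnumWindows(
    IntPtr lpEnumFunc, IntPtr lParam);   // LPARAM is pointer-sized

// The attribute's constructor requires a calling convention argument
[UnmanagedFunctionPointer(CallingConvention.StdCall)]
public delegate bool EnumWindowsProc(
    IntPtr hWnd, IntPtr lParam);

EnumWindowsProc proc = new SomeClass().SomeMethod;
IntPtr fptr = Marshal.GetFunctionPointerForDelegate(proc);
EnumWindows(fptr, IntPtr.Zero);   // pass the raw function pointer, not the delegate
GC.KeepAlive(proc);               // keep the delegate alive until the call returns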

OK, so what’s going on here? First of all, our P/Invoke signature takes a pointer – very sad. Next, we have a delegate that matches the EnumWindowsProc signature. The [UnmanagedFunctionPointer] attribute is not strictly required in this case, but you should know that it can be used to customize the calling convention of the obtained delegate, as well as other things.

Finally, to call the actual entry point, we create a delegate and obtain a function pointer from it using Marshal.GetFunctionPointerForDelegate. That’s the function pointer we pass to EnumWindows. All that’s left is to exercise extra caution, and make sure the delegate is not garbage collected while the EnumWindows method executes. Fortunately, it executes synchronously, so the GC.KeepAlive call after the EnumWindows call would ensure that the delegate is not collected under any circumstances.

If this whole delegate-being-collected stuff seems nonsensical, take a look at the MDA that detects this kind of bug – CallbackOnCollectedDelegate.

You might be wondering how Marshal.GetFunctionPointerForDelegate overcomes the problem of converting a pair <method,target> to a single pointer. Indeed, what typically happens is that a small stub is generated on the fly – this stub’s address is the unmanaged function pointer, and what it does is call the method on the appropriate object, which is hardcoded into it.
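The CLR bakes the target object into a stub generated at run time, which portable C++ cannot do. As a rough native analogue of the same <method,target> problem, the sketch below passes the target through an opaque context parameter (the role an application-defined value such as lParam plays for EnumWindows) and uses a static trampoline as the plain function pointer. enumerateItems, ItemCallback and Collector are made-up names for illustration; this is not how the CLR stub is implemented.

```cpp
#include <cassert>
#include <vector>

// A C-style enumeration API in the spirit of EnumWindows: it knows only a
// plain function pointer plus an opaque context value.
typedef bool (*ItemCallback)(int item, void* context);

void enumerateItems(ItemCallback callback, void* context) {
    for (int item = 1; item <= 5; ++item) {
        if (!callback(item, context)) break;   // false stops the enumeration
    }
}

class Collector {
public:
    bool onItem(int item) {                    // the instance method to reach
        items_.push_back(item);
        return item < 3;                       // ask to stop after the third item
    }
    // Static trampoline: a plain function whose address can be handed to C
    // code. It recovers the target object from the context parameter.
    static bool trampoline(int item, void* context) {
        return static_cast<Collector*>(context)->onItem(item);
    }
    const std::vector<int>& items() const { return items_; }
private:
    std::vector<int> items_;
};

std::vector<int> collectFirstItems() {
    Collector collector;
    enumerateItems(&Collector::trampoline, &collector);
    return collector.items();
}
```

The trampoline needs the API to carry a context value; a generated stub with the target hardcoded into it works even when the native API offers no such parameter.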

The easier approach would be to let P/Invoke do the conversion from a managed delegate to the unmanaged function pointer, and would work just as well:

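using System;
using System.Runtime.InteropServices;

[DllImport("user32")]
public static extern bool EnumWindows(
    EnumWindowsProc lpEnumFunc, IntPtr lParam);   // LPARAM is pointer-sized

// The attribute's constructor requires a calling convention argument
[UnmanagedFunctionPointer(CallingConvention.StdCall)]
public delegate bool EnumWindowsProc(
    IntPtr hWnd, IntPtr lParam);

EnumWindowsProc proc = new SomeClass().SomeMethod;
EnumWindows(proc, IntPtr.Zero);   // P/Invoke converts the delegate for us
GC.KeepAlive(proc);               // keep the delegate alive until the call returns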

Still, it is the caller’s responsibility to make sure the delegate is not collected during the unmanaged call.



Managed-Unmanaged Interoperability in the 2010s: Part 1, P/Invoke – Managed to Unmanaged

With Windows 8 around the corner, managed code slowly taking over legacy systems written in C++, games developed in a mixture of .NET and C++, and the rest of this technology soup – I thought it would be a good time to provide a quick refresher of the available interoperability mechanisms between managed and unmanaged code. Nothing here is very new, but I get so many questions about it that at the very least I would have something to refer people to.


P/Invoke is best suited for managed code invoking global functions exported from C-style DLLs. For example, if your awesome WPF app needs just this one Win32 API, say AccessCheckByTypeResultListAndAuditAlarmByHandle, then you can declare a “managed” signature for it and call it as if it were a managed method.

Let’s take a look at a simpler example first. Suppose you want to call memcpy to copy around a chunk of memory (because you don’t trust Array.Copy). Well, you figure it’s exported from msvcrt.dll (the C runtime DLL), and then declare it like this:

// memcpy returns void* and takes a size_t, so both are pointer-sized
[DllImport("msvcrt",
           CallingConvention=CallingConvention.Cdecl)]
public static extern IntPtr memcpy(
           byte[] dst, byte[] src, UIntPtr len);

Now you can call this method from C# as if the unmanaged function indeed knew what managed byte arrays were. The P/Invoke layer creates the magic here: it takes managed byte arrays that reside on the GC heap, pins them (so that the GC can’t move them while they are accessed by memcpy), and passes a pair of pointers to memcpy. When memcpy returns, the P/Invoke layer unpins the managed byte arrays as if nothing happened.

A somewhat more complicated example, where the power of P/Invoke really shines, is when the memory allocation becomes tricky. For example, consider the GetWindowText Win32 API:

int WINAPI GetWindowText(
  __in   HWND hWnd,
  __out  LPTSTR lpString,
  __in   int nMaxCount
);

This function receives a string buffer that it is supposed to fill with the text retrieved from a particular window handle. Mapping the window handle to a managed type is easy – it’s just an opaque IntPtr returned by various window-management Win32 APIs. Now, the buffer itself is a pointer to a string that GetWindowText is supposed to fill out – however, recall that managed strings are immutable!

P/Invoke helps again here by accepting a StringBuilder where a mutable string is expected. In fact, the capacity of the StringBuilder instance is the buffer size we can pass as the third parameter, obtaining this managed code:

[DllImport("user32")]
public static extern int GetWindowText(
    IntPtr hWnd, StringBuilder lpString, int nMaxCount);

IntPtr hwnd = ...;
StringBuilder text = new StringBuilder(100);
GetWindowText(hwnd, text, 100);

Note that this time we didn’t specify the calling convention because GetWindowText is a standard Win32 API, and uses the stdcall calling convention, which is also P/Invoke’s default.

There are even more complex examples where you can use P/Invoke to manage the mapping of complex structures and in/out parameters. Nonetheless, the AccessCheckByTypeResultListAndAuditAlarmByHandle horror is not any less horrible – the function has 17 parameters, most of them pointers to various structures and arrays, and figuring out the mapping won’t be very easy.

As you see, P/Invoke does not absolve you from understanding the unmanaged function signature. It’s important that you understand the various function calling conventions, and how parameters can be mapped from managed types to unmanaged types. The great pinvoke.net community website is of immense assistance here, and can provide you with a ready-made P/Invoke signature for almost anything. And then there’s also the Microsoft-made P/Invoke Signature Generator, which I mentioned here a couple of years ago.



Aggressive Inlining in the CLR 4.5 JIT

Inlining is an important optimization that allows compilers to eliminate the cost of method calls in situations where the method call overhead is more significant than the method body itself. The CLR JIT uses inlining conservatively, but features some nice tricks such as interface method call inlining – this was one of the first things I covered on this blog, almost five years ago.

The limitations on JIT inlining are not known precisely, but some criteria have been announced previously (in 2004!). Namely, the JIT won’t inline:

  • Methods marked with MethodImplOptions.NoInlining
  • Methods larger than 32 bytes of IL
  • Virtual methods
  • Methods that take a large value type as a parameter
  • Methods on MarshalByRef classes
  • Methods with complicated flowgraphs
  • Methods meeting other, more exotic criteria

Today, however, I’d like to direct your attention towards a new flag in CLR 4.5, MethodImplOptions.AggressiveInlining. The documentation here is very brief:

The method should be inlined if possible.

Well, thanks. Interestingly, Mono introduced support for this option as well, committed on January 5th (two weeks ago!), and here’s the effect it has on deciding whether to inline a method, inside the function mono_method_check_inlining:

-  if (header.code_size >= inline_limit)
+  if (header.code_size >= inline_limit && !(method->iflags & METHOD_IMPL_ATTRIBUTE_AGGRESSIVE_INLINING))

(The inline_limit parameter is configurable by an environment variable, and defaults to 20.)

So, what is the effect of AggressiveInlining on the Microsoft JIT?

From what I checked, it seems to be similar to what it does in Mono. Namely, methods that are not inlined only because of code size are inlined when this attribute is applied to them.

Here are two methods that you can try yourself:

public static int SmallMethod(int i, int j)
{
    if (i > j)
        return i + j;
    else
        return i + 2 * j - 1;
}
[MethodImpl(MethodImplOptions.AggressiveInlining)]
public static int LargeMethod(int i, int j)
{
    if (i + 14 > j)
    {
        return i + j;
    }
    else if (j * 12 < i)
    {
        return 42 + i - j * 7;
    }
    else
    {
        return i % 14 - j;
    }
}

The code size for these methods is 16 and 34, respectively. Without the AggressiveInlining attribute, the first method is inlined and the second is not inlined. With the AggressiveInlining attribute, the second method is inlined as well.

However, methods that couldn’t be inlined previously because of other criteria are still not inlined. I checked the following, and none of these methods was inlined:

  • Recursive method
  • Virtual method (even if the static type of the receiver variable is sealed)
  • Method with exception handling (representing a “complicated flowgraph”)


P/Invoke Stack Imbalance MDA

More than a year after writing my first post touching on the subject of Managed Debugging Assistants (MDA) through the “Callback on Garbage Collected Delegate” case study, it’s time for a brief mention of another useful MDA – “P/Invoke Stack Imbalance”.

This MDA fires whenever a P/Invoke call causes an imbalance on the stack. What does a stack imbalance mean? The top of the thread’s stack is pointed to by the ESP register (RSP on x64), and a stack imbalance occurs if the ESP value before making a function call is not the same as the ESP value after the function returns.

Why would a P/Invoke call cause a stack imbalance? Well, recall that parameters are typically passed on the stack when doing P/Invoke calls, and there is no single calling convention that every API in the world adheres to. For more information on x86 calling conventions, here’s a great (and lengthy) read [less details]; the two most popular calling conventions are stdcall and cdecl, which differ on a very important aspect: who is responsible for removing the function parameters from the stack after the function call completes.

In the cdecl calling convention (which is the default for C/C++ functions to this day), the caller is responsible for removing parameters from the stack. In other words, whenever the compiler encounters a function call and the function uses the cdecl calling convention, it will emit assembly instructions to remove the parameters from the stack (e.g. “add esp, 8” to remove two integer parameters from the stack).

On the other hand, in the stdcall calling convention (which is used almost exclusively by Win32 APIs), the callee is responsible for removing parameters from the stack. When the compiler emits code for an stdcall function call, it does not include instructions for removing parameters from the stack, and relies instead on the called function to do so.

Observe that two things can go wrong here, even if we assume only two calling conventions. Suppose there is a function void f(int) using the cdecl calling convention and you call it (mistakenly!) with the stdcall calling convention. After calling the function, the stack contains an extra four bytes of garbage – which is undesired, but will not crash the program. Here is a conceptual view of the stack before the call:

local var of caller
local var of caller
saved EBP of caller
ret address of caller
param of caller
param of caller

During the call:

local var of f
saved EBP of f
ret address of f
param of f
local var of caller
local var of caller
saved EBP of caller
ret address of caller
param of caller
param of caller

After the call:

param of f (garbage!)
local var of caller
local var of caller
saved EBP of caller
ret address of caller
param of caller
param of caller

Now suppose that there is a function void g(int) using the stdcall calling convention and you call it with the cdecl calling convention. The function cleans up the four byte parameter from the stack, and so does the caller – now, if additional functions are called, the stack space previously reserved to the calling function may be reused for other data (e.g. return address for a function).
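The two mismatch scenarios come down to simple arithmetic on ESP, which the following toy model captures. It assumes 32-bit x86, a 4-byte return address, and nothing else on the stack; the enum and function names are invented for this sketch, and it simulates the bookkeeping rather than performing real calls.

```cpp
#include <cassert>

enum class Convention { Cdecl, Stdcall };

// Toy model of a 32-bit x86 call. Returns ESP after the whole call sequence
// minus ESP before the arguments were pushed. 0 means balanced; a negative
// value means leftover bytes on the stack; a positive value means the
// caller's own stack space was released. (The stack grows toward lower
// addresses.)
int espDeltaAfterCall(Convention calleeUses, Convention callerAssumes,
                      int argumentBytes) {
    int esp = 0;
    esp -= argumentBytes;          // caller pushes the arguments
    esp -= 4;                      // call pushes the return address
    esp += 4;                      // ret pops the return address
    if (calleeUses == Convention::Stdcall)
        esp += argumentBytes;      // stdcall callee: "ret N" also pops the arguments
    if (callerAssumes == Convention::Cdecl)
        esp += argumentBytes;      // cdecl caller: "add esp, N" after the call
    return esp;
}
```

For a single int parameter, a cdecl function called as stdcall yields -4 (the garbage parameter left behind), and a stdcall function called as cdecl yields +4 (four bytes of the caller's frame are now considered free).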

When using P/Invoke, there’s ample chance to create a stack imbalance, typically because of a calling convention mismatch. The [DllImport] attribute has an explicit CallingConvention field, which you must set to the appropriate value (the default is Winapi, which means stdcall on Windows!).

The “P/Invoke Stack Imbalance” MDA detects the condition where the stack becomes unbalanced after a P/Invoke call, and raises an exception in the debugger (as any MDA does). This gives you an immediate opportunity to fix the problematic P/Invoke signature.

[Image: screenshot of the P/Invoke Stack Imbalance MDA exception in the debugger]

To fix the signature, the best bet is to look up the documentation and the header files for the appropriate calling convention. If you can’t figure out the problem, try to call the native function from native code and look at the assembly generated at the call site. Perhaps you’ll be able to see a parameter size mismatch or an unfamiliar calling convention being used.



Garbage Collection in The Age of Smart Pointers

A few days ago I was asked on Twitter whether research into garbage collection is paying off, considering the super-smart-pointers introduced into C++ and other language and library tricks. Let’s take a look at some of the smart pointer facilities introduced in C++11, and then tackle them from a garbage collection perspective.

C++11 features three standard smart pointer classes:

  • unique_ptr<T> wraps a pointer to a value and guarantees that there is only one pointer at a time to that value. You cannot copy unique_ptr<T> instances around (only move them), and when the unique_ptr<T> instance is destroyed, the underlying pointer is deleted.
  • shared_ptr<T> wraps a pointer to a value and a reference count, allowing multiple locations in the program to point to the same value at the same time. You can copy shared_ptr<T> instances around, and when the last shared_ptr<T> instance with a given pointer is destroyed, the underlying pointer is deleted.
  • weak_ptr<T> wraps a pointer to a value and can be converted to a shared_ptr<T>, temporarily (using the lock() method). However, while you have only a weak_ptr<T> instance, the underlying pointer can be deleted (because there are no “strong” pointers to it), and then your attempt to convert it to a shared_ptr<T> will fail gracefully. (This is very similar to the implementation of short weak references in garbage collected languages, such as .NET’s WeakReference class.)
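
Here is a short sketch of the three classes in action. The Widget type and the function names are made up for this illustration; everything else is the standard <memory> API.

```cpp
#include <memory>
#include <string>
#include <utility>

// Hypothetical value type used only for this illustration.
struct Widget {
    std::string name;
    explicit Widget(std::string n) : name(std::move(n)) {}
};

// unique_ptr: ownership can be moved but not copied.
// Returns true if the source is empty and the target owns the Widget.
bool unique_ownership_moves() {
    std::unique_ptr<Widget> first(new Widget("a"));
    std::unique_ptr<Widget> second = std::move(first);  // transfer ownership
    return first == nullptr && second != nullptr;
}

// shared_ptr: every copy increments the shared reference count.
long shared_copies_count() {
    std::shared_ptr<Widget> a = std::make_shared<Widget>("b");
    std::shared_ptr<Widget> b = a;  // the count goes from 1 to 2
    return a.use_count();
}

// weak_ptr: lock() returns an empty shared_ptr once the object is gone.
bool weak_lock_fails_after_release() {
    std::weak_ptr<Widget> weak;
    {
        std::shared_ptr<Widget> strong = std::make_shared<Widget>("c");
        weak = strong;
        if (!weak.lock()) return false;  // the object is still alive here
    }  // last strong reference destroyed, so the Widget is deleted
    return weak.lock() == nullptr;
}
```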

From a garbage collection perspective, unique_ptrs are not that interesting because they wrap access to a uniquely owned pointer, which is not available to anyone else. However, shared_ptrs and weak_ptrs are more interesting, because they provide actual semantics for shared ownership of resources and their eventual reclamation. Moreover, the fundamental problem of reference counting GC, namely that of cycles (e.g. see Python GC treatment of this issue), is addressed by providing a fairly convenient weak pointer concept.

What are the benefits of using automatic garbage collection when you have shared pointers and weak pointers, and don’t have to call “delete” in your C++ programs anymore?

Reference counting is costly. Every copy of a reference (shared_ptr, in C++’s case) involves updating a reference count, and that update requires multiprocessor synchronization. Even worse, if an object is shared across multiple threads, the reference count updates may invalidate cache lines on multiple processors. In the C++ shared_ptr, only the control block will be invalidated, but in “legacy” reference counting environments, where the reference count is embedded in the object, the object itself may be invalidated, which has an even higher performance cost.
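
A small sketch of where those count updates come from (the function names are hypothetical). Passing a shared_ptr by value copies it, so the count is updated on entry and again on exit, and in multithreaded builds those updates are typically atomic operations. Passing by const reference leaves the count alone.

```cpp
#include <memory>

// By value: the parameter is a copy, so the reference count is
// incremented on entry and decremented when the function returns.
long count_by_value(std::shared_ptr<int> p) { return p.use_count(); }

// By const reference: no copy is made and the count is not touched.
long count_by_reference(const std::shared_ptr<int>& p) { return p.use_count(); }

long observed_count_by_value() {
    auto owner = std::make_shared<int>(42);
    return count_by_value(owner);      // owner plus the parameter copy
}

long observed_count_by_reference() {
    auto owner = std::make_shared<int>(42);
    return count_by_reference(owner);  // owner only
}
```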

Using weak_ptr is counter-intuitive and hard. As a programmer, I don’t want to be forced to analyze and break every reference cycle in my application. There are the obvious textbook examples with managers referencing their direct reports referencing back their managers, which can be broken with weak_ptrs; however, imagine a cycle that is not formed deterministically and that contains dozens of objects – even figuring out that you have a cycle is an impossible task.
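
The textbook manager/report case can be sketched as follows. The two Employee types and the live-object counters are hypothetical and exist only to make the effect visible. With strong pointers in both directions the two objects keep each other alive after the last outside reference is gone; with a weak back-pointer they are reclaimed.

```cpp
#include <memory>

// Counters of live objects, for illustration only.
static int g_strong_live = 0;
static int g_weak_live = 0;

// Both directions are strong: a manager and a report form a cycle.
struct StrongEmployee {
    std::shared_ptr<StrongEmployee> manager;
    std::shared_ptr<StrongEmployee> report;
    StrongEmployee() { ++g_strong_live; }
    ~StrongEmployee() { --g_strong_live; }
};

// The back-pointer to the manager is weak, so no ownership cycle forms.
struct WeakEmployee {
    std::weak_ptr<WeakEmployee> manager;
    std::shared_ptr<WeakEmployee> report;
    WeakEmployee() { ++g_weak_live; }
    ~WeakEmployee() { --g_weak_live; }
};

int alive_after_strong_cycle() {
    std::weak_ptr<StrongEmployee> observer;
    {
        auto m = std::make_shared<StrongEmployee>();
        auto r = std::make_shared<StrongEmployee>();
        m->report = r;
        r->manager = m;
        observer = m;
    }  // no outside references remain, yet nothing has been destroyed
    int alive = g_strong_live;
    // Break the cycle by hand so that this demo itself does not leak.
    if (auto m = observer.lock()) m->report.reset();
    return alive;
}

int alive_after_weak_cycle() {
    {
        auto m = std::make_shared<WeakEmployee>();
        auto r = std::make_shared<WeakEmployee>();
        m->report = r;
        r->manager = m;  // weak: does not keep the manager alive
    }
    return g_weak_live;
}
```

Two objects are easy. The hard part is doing this analysis for a cycle you did not know you had.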

Using smart pointers requires extraordinary consistency. C++ developers might be able to go on a “delete-elimination” crusade and convert their entire application to use smart pointers. I just don’t see this library feature ever being as convenient as standard C# references.

Reclaiming many objects at once might be more efficient than reclaiming each individual object. You may have heard horror stories about multi-second delays introduced by garbage collections (missiles diverted from their path, anyone?), but most garbage collections have sub-millisecond times, and most applications, even on fully-loaded servers, don’t suffer by default from garbage collection latencies. In fact, when you think about it, garbage collection costs (in the small object heap) are roughly linear in the number of live objects, so if most of your heap is garbage, collecting it is a fairly rapid task. On the other hand, if you reclaimed each object individually, you’d pay the same price for each piece of that garbage, instead of reclaiming it in one fell swoop.

Now, if you consider some of the “futuristic” advances in garbage collection, you might be even more inclined to prefer it over smart pointers. Take a look at Azul’s Zing pauseless garbage collection, which runs concurrently with the application and introduces only very rare and short pause times, even for huge (~100GB) heaps. If this is the future of garbage collection, we won’t need smart pointers.



Stack Unwind Does Not Occur When C++ Exceptions Propagate Into Managed Code

This is a bug Dima encountered several months ago, and I’ve been looking for an opportunity to document it ever since.

tl;dr – when unmanaged C++ code throws an exception which propagates through an interop boundary to managed code, C++ destructors are not called. To fix, don’t allow C++ exception propagation outside module boundaries, or compile with /EHa.

It was recorded before on this StackOverflow thread, where the conclusion is that “the CLR hooks into SEH handling to catch native exceptions and process them into CLR exceptions. […] the exception looks like an SEH exception to C++ and destructors are […] not run”.

There are several solutions, in decreasing order of “recommendedness”:

  • Do not allow C++ exceptions to propagate untouched across the interop boundary. This is generally a good idea, because C++ exception handling is really best handled by C++ code only, and allowing exceptions to propagate outside is dangerous in other settings as well (e.g. exporting a function from a DLL that throws a C++ exception). If this were an option, you wouldn’t be reading on, so here goes . . .
  • Compile your C++ code with /EHa to indicate that SEH exceptions may be thrown from C++ code. This will limit compiler optimizations w.r.t. stack unwind and destructor invocation, which is exactly what we need in this scenario.
  • Use the _set_se_translator function to translate SEH exceptions back to C++ exceptions (basically, just by throwing a C++ exception from the body of the function). This is a hack and probably expensive, performance-wise.
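
For the first option, here is a minimal sketch of what “not propagating” can look like. The function demo_do_work, the status codes, and the Guard type are all hypothetical. The exported function catches everything inside C++, where destructors run as usual, and reports failures through a return code that managed code can inspect.

```cpp
#include <stdexcept>

// Set by Guard's destructor so the effect of stack unwinding can be observed.
static bool g_guard_destroyed = false;

// Stand-in for any C++ object whose destructor must run during unwinding.
struct Guard {
    ~Guard() { g_guard_destroyed = true; }
};

// Hypothetical status codes returned across the interop boundary.
enum DemoStatus { DEMO_OK = 0, DEMO_INVALID_ARG = 1, DEMO_UNKNOWN = 2 };

// Internal C++ code, free to throw.
static int risky_work(int input) {
    Guard guard;
    if (input < 0) throw std::invalid_argument("negative input");
    return input * 2;
}

// Exported entry point: no C++ exception ever leaves this function.
extern "C" int demo_do_work(int input, int* result) noexcept {
    if (result == nullptr) return DEMO_INVALID_ARG;
    try {
        *result = risky_work(input);
        return DEMO_OK;
    } catch (const std::invalid_argument&) {
        return DEMO_INVALID_ARG;
    } catch (...) {
        return DEMO_UNKNOWN;
    }
}
```

On the managed side, the P/Invoke caller checks the returned status and throws its own exception if it wants one.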

A few years ago I demonstrated the interaction between the CLR and SEH by installing my own unhandled exception filter. This is just another demonstration of how brittle the state of affairs is between managed and unmanaged code.



Things Learned in 2011 and Plans for 2012

I keep telling junior software developers that the only way to maintain their expertise and to become better developers is a continuous learning process. 2011 has been a very productive year for me (and many others at SELA!), and I am looking forward to 2012, the year of Windows 8, in eager anticipation. Below are some of the things I learned in 2011 and some of the things planned for 2012.

Learned in 2011

For most of Q1 2011 I worked on the brand-new Parallel Programming in .NET 4.0 course with Bnaya Eshet. I had been using the TPL for a while before that, but writing slide decks, labs, and demos – culminating in actually teaching the course – is definitely the best way to learn a subject of this magnitude.

Later during the year I updated the Windows Concurrent Programming course (for C++/Win32 developers), including brand new content on ConcRT, synchronization and threading internals, and a bunch of new hands-on labs.

Back in 2010 I delivered a user group presentation on C++0x, which emerged as a final standard in late September. At the December SDP, Noam Sheffer and I delivered a whole day on the new C++11 language standard and standard library, which involved getting into all the nitty-gritty details of the new language syntax (such as perfect forwarding, which I find the hardest “feature” to teach).

Next up is JavaScript. Yes, seriously. If you ever talked to me for more than a few minutes you know that I hate UI development in general, and Web development in particular. However, this was a gaping hole in my understanding of Web-related performance problems and bugs, and a gaping hole in the “general education” of any software developer. So I forced myself to relearn HTML, CSS, and JavaScript, and used the opportunity to learn the new HTML5 JavaScript APIs, jQuery, and even node.js (which should be the subject of another post).

Another thing I was always very interested in but rarely had the time to invest in professionally is security research. Although my day job usually does not involve any serious security research, I like to keep my general knowledge up to date by subscribing to vulnerability disclosure lists, following notorious security researchers, practicing simple exploitation and reverse-engineering scenarios with modern tools, and so on. In 2011 I invested in my reverse engineering skills and some modern exploitation techniques such as ROP.

Any list summarizing 2011 will be woefully incomplete without mentioning Windows 8. At //build we saw a glimpse of what’s to come in 2012, but the Windows 8 Developer Preview is already an exciting consumer experience and an exciting target for software development. I wrote an article on Windows 8 security aspects [pdf, Hebrew] and presented Windows 8 at the December SDP keynote, but there’s obviously much more to learn here, and I’ll leave it to 2012.

Finally, here’s a grocery list of some smaller things I learned in 2011 (some of which I don’t understand yet in a professional capacity):

Planned for 2012

2012 is going to be remembered as the year of Windows 8. With a sharp turn to a new runtime, development framework, UI style, and form factors – it’s inevitably going to take quite a while to learn and practice. From what I’ve seen so far, it will be fun :-)

General-purpose GPU computing (GPGPU) is becoming mainstream, especially with an awesome framework like C++ AMP behind it. In 2012, I hope to find opportunities to use C++ AMP in a real project, and expand my understanding of the underlying GPU concepts.

Finally, on a less professional level, I plan to expand my horizons in Web application security, specifically new advances in HTML injection, smart XSS attacks, crypto weaknesses, and similar topics.

Now, if I had been asked in 2010 whether I would be spending time on Solaris or JavaScript in 2011, there’s no way I’d have said yes. I’m really looking forward to January 1, 2013 to see what actually happens in 2012 :-)

