I don’t often read Haaretz, but from time to time an article that a friend shares on Facebook, or that comes up in search results, lands me on the Haaretz website. Often enough this happens on my mobile phone, and every time I am redirected to a very primitive version of the website. Compare for yourself:

(Screenshot on the right obtained by changing the user agent in the Chrome Canary build. Very nice built-in feature.)
I was curious about the criteria the Haaretz website uses for this redirect, and started sniffing the traffic with Fiddler. After most of the Haaretz front page had been downloaded, the browser suddenly issued a request for g.watap.net/w2w/haaretz, which responds with not one but two 302 redirects and eventually lands on the crippled mobile version.

Interestingly, I tried more than one mobile user agent, and the resulting mobile website was pretty much the same (so I get the same experience with an iPhone, an Android phone, or a feature-phone). I believe this is a poor choice on Haaretz’s part, so I started investigating a little.
I started by running a whois query on watap.net, and found that it’s registered through Go Daddy for PassCall Advanced Technologies. Then I turned my attention to PassCall, and found on their website that they provide a platform that adapts existing websites to mobile browsing. Indeed, I found Haaretz in their list of customers. From what I could tell, all the customers are Israeli companies, and the g.watap.net host resolves to an Israeli IP address, probably hosted by NetVision, a major Israeli ISP.
What is the precise process used by PassCall to determine whether or not to redirect my browser to the dumbed-down mobile version? I was brave enough to start reading through the ~8500 lines of HTML and script that is the Haaretz front page. Very close to the beginning there’s a copyright notice by PassCall with a minified script. I won’t paste the whole thing, but here’s a start:
var passcall_pcmdt={i$i:function(){try{eval(function(p,a,c,k,e,d){e=function(c){return(c<a?'':e(parseInt(c/a)))+((c=c%a)>35?String.fromCharCode(c+29):c.toString(36))};if(!''.replace(/^/,String)){while(c--){d[e(c)]=k[c]||e(c)}k=[function(e){return d[e]}];e=function(){return'\\w+'};c=1};while(c--){if(k[c]){p=p.replace(new RegExp('\\b'+e(c)+'\\b','g'),k[c])}}return p}('3 f=D(p,q,o,2,e){4(6.7.a(\'5=0\')>-1)m;3 b=s r();b.t(b.n()+B);3 d=s r();d.t(d.n()+1);4(9.c.a(\'y=1\')>-1){6.7=\'5=0; j=/; i=\'+d.l();m}E 4(9.c.a(\'G=1\')>-1){6.7=\'5=0; j=/; i=\'+b.l();m}3 8=6.7.a(\'5=1\')>-1;4(!8){3 v=p.h(k.g);3 x=q.h(k.g);3 u=!o.h(k.g);8=(v||x)&&u;6.7=\'5=\'+F(I(8))+"; j=/; i="+b.l()}4(8){2=9.c.w(9.H,2);4(C e!=\'z\'&&e.A){2+=2.a(\'?\')>-1?\'&\':\'?\';2=2.w(\'?&\',\'?\');2+=e}9.c=2}}',45,45,'||r\x65\x64\x69rt\x6f|\x76\x61r|\x69\x66|___\x70\x63\x6d\x64\x74___|\x64\x6f\x63\x75men\x74|\x63\x6f\x6f\x6b\x69\x65|\x72\x65\x64\x69\x72|l\x6f\x63\x61ti\x6fn|\x69n\x64\x65x\x4ff||\x68\x72\x65\x66|\x62\x62|p\x61r\x61ms||u\x73\x65rAge\x6e\x74|\x74\x65\x73\x74|\x65\x78\x70\x69\x72\x65\x73|\x70a\x74h|\x6e\x61\x76\x69\x67\x61t\x6fr|toU\x54\x43S\x74\x72in\x67|\x72e\x74\x75\x72\x6e|g\x65\x74D\x61te|\x723|\x721|r2|\x44\x61\x74\x65|\x6e\x65\x77|\x73\x65\x74D\x61\x…
I took this beauty to jsbeautifier.org where it got a much prettier shape. Here’s the first part, beautified:
var passcall_pcmdt = {
    i$i: function () {
        try {
            eval(function (p, a, c, k, e, d) {
                e = function (c) {
                    return (c < a ? '' : e(parseInt(c / a))) + ((c = c % a) > 35 ? String.fromCharCode(c + 29) : c.toString(36))
                };
                if (!''.replace(/^/, String)) {
                    while (c--) {
                        d[e(c)] = k[c] || e(c)
                    }
                    k = [function (e) {
                        return d[e]
                    }];
                    e = function () {
                        return '\\w+'
                    };
                    c = 1
                };
                while (c--) {
                    if (k[c]) {
                        p = p.replace(new RegExp('\\b' + e(c) + '\\b', 'g'), k[c])
                    }
                }
                return p
            }(…
Okay, this is obviously an unpacker – it even says function (p, a, c, k, e, d) right there. Thanks for the hint. So the first part is a slightly minified unpacker, and the hex-encoded strings (not shown here) are probably the actual code. Instead of trying to run the unpacking algorithm with a pen and paper, I simply put a breakpoint in the beginning of the script and started stepping in and out until I got this beautiful function, called f, which does the interesting part:
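To get a feel for what the unpacker does without stepping through it, here is a minimal C++ sketch of its inner e function, which turns a dictionary index into the short token that appears in the packed string. The name encode_token is mine (not PassCall's), and the port assumes the JavaScript semantics shown above: digits 0-9 and a-z first, then characters starting at 'A'.

```cpp
#include <string>

// C++ port of the unpacker's inner function e(c): maps a dictionary index
// to its token in base `radix` (45 in the packed script above).
// encode_token is a hypothetical name chosen for this sketch.
std::string encode_token(int index, int radix) {
    std::string prefix;
    if (index >= radix) {
        // Higher-order "digits" come first, like the recursive JavaScript.
        prefix = encode_token(index / radix, radix);
    }
    int digit = index % radix;
    char symbol;
    if (digit > 35) {
        symbol = static_cast<char>(digit + 29);        // 36 -> 'A', 37 -> 'B', ...
    } else if (digit < 10) {
        symbol = static_cast<char>('0' + digit);       // same as toString(36)
    } else {
        symbol = static_cast<char>('a' + digit - 10);  // same as toString(36)
    }
    return prefix + symbol;
}
```

With a radix of 45, index 3 encodes to "3" and index 10 to "a". That matches the packed source: the dictionary entry at index 3 is the hex-escaped "var" and the one at index 10 is "indexOf", so the unpacker rewrites `3 f=` into `var f=` and `.a(` into `.indexOf(`.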
var f = function(r1, r2, r3, redirto, params) {
    if (document.cookie.indexOf('___pcmdt___=0') > -1)
        return;
    var b = new Date();
    b.setDate(b.getDate() + 360);
    var bb = new Date();
    bb.setDate(bb.getDate() + 1);
    if (location.href.indexOf('snopcmdt=1') > -1) {
        document.cookie = '___pcmdt___=0; path=/; expires=' + bb.toUTCString();
        return
    } else if (location.href.indexOf('nopcmdt=1') > -1) {
        document.cookie = '___pcmdt___=0; path=/; expires=' + b.toUTCString();
        return
    }
    var redir = document.cookie.indexOf('___pcmdt___=1') > -1;
    if (!redir) {
        var b1 = r1.test(navigator.userAgent);
        var b2 = r2.test(navigator.userAgent);
        var b3 = !r3.test(navigator.userAgent);
        redir = (b1 || b2) && b3;
        document.cookie = '___pcmdt___=' + parseInt(Number(redir)) + "; path=/; expires=" + b.toUTCString()
    }
    if (redir) {
        redirto = location.href.replace(location.host, redirto);
        if (typeof params != 'undefined' && params.length) {
            redirto += redirto.indexOf('?') > -1 ? '&' : '?';
            redirto = redirto.replace('?&', '?');
            redirto += params
        }
        location.href = redirto
    }
}
Note how this is no longer obfuscated, and perfectly readable. The script starts by checking whether a cookie already instructs it to do (or skip) the mobile redirect. The interesting part is the if (!redir) block, which runs when a new decision has to be made, and the redirect itself is simply replacing location.href with a new location. The whole redirect-or-not logic boils down to three regular expressions (r1, r2, r3). Let’s take a look at these regular expressions. Here is r1:
^(((A|3|Q)l(v|5|c)a(t|3|2)e(l|3|x))|((X|6|E)Z(O|2|j)S)|((9|H|6)D(q|2|h)_(h|3|T))|((H|7|1)D_m(i|0|j)n)|((9|I|6)C(v|1|o)p(q|8|p)i(q|e|j)d(v|8|F)r(v|o|q)m(4|P|9)a(s|2|0)s(c|4|X)a(v|7|l)l(q|7|C)o(q|6|d)e(j|4|0)h(a|6|0)a(r|0|q)e(t|5|x)z)|((v|8|L)G(E|7|3)?[-/_])|((0|M|6)a(u|3|Q)i B(r|7|x)o(w|4|0)s(q|6|e)r)|((P|0|h)C(L|2|q)[4-6][4-6])|((q|6|S)E(h|C|Q)-)|((v|S|j)G(H|2|3)-)|((S|0|2)I(E|X|h)-)|((j|4|S)K_)|((q|6|S)O(j|4|N)I(M|x|q))|((8|S|4)e(0|n|4)d(j|o|Q))|((j|8|T)e(j|l|X)i(4|t|6))|((h|p|q)o(q|r|5)t(q|a|h)l(q|m|v)m(h|8|m)))
Looks like a bad-ass regular expression? Not at all. In fact, this is just a light attempt at obfuscating the regular expression without changing its meaning too much. Note that the whole thing is just a big disjunction over a bunch of strings. Here’s the first component:
((A|3|Q)l(v|5|c)a(t|3|2)e(l|3|x))
What could it possibly be? Pick one alternative from each group (A, c, t, l), keep the literal letters between them (l, a, e), and it’s obviously “Alcatel”.
How about this guy?
((h|p|q)o(q|r|5)t(q|a|h)l(q|m|v)m(h|8|m))
This one is “portalmmm”, which apparently is a mobile user agent used by i-mode mobile browsers. Finally, what is this:
((9|I|6)C(v|1|o)p(q|8|p)i(q|e|j)d(v|8|F)r(v|o|q)m(4|P|9)a(s|2|0)s(c|4|X)a(v|7|l)l(q|7|C)o(q|6|d)e(j|4|0)h(a|6|0)a(r|0|q)e(t|5|x)z)
Fairly long for a mobile user agent. Indeed, it becomes ICoppiedFromPasscallCode?haaretz (double “p” included, as spelled in the regular expression) – which is a rudimentary copy-protection mechanism.
I can tell you right away that r2 is no different:
((a|3|Q)n(v|5|d)r(o|3|2)i(d|3|x))|((X|6|b)l(a|2|j)c(k|X|q)B(4|e|8)r(r|Q|0)y)|((G|x|h)T-P(v|9|1)0(q|0|x)0)|((H|7|Q)T(q|6|C))|((Q|6|H)u(6|a|5)w(X|5|e)i[u/-])|((i|1|0)p(h|5|a)d)|((q|i|v)p(h|9|0)o(h|6|n)e)|((j|2|m)o(5|t|8)o(4|r|8)o(l|2|q)a)|((4|M|7)O(T|7|0)[-_])|((8|n|6)o(k|3|x)i(h|2|a))|((s|Q|1)o(x|4|n)y(x|8|e)r(q|8|i)c(s|5|Q)s(o|3|Q)n)|((h|6|s)a(4|m|5)s(h|u|x)n(g|6|3))|((j|3|P)a(l|X|h)m)|((x|5|p)h(h|7|i)l(i|7|2)p(3|s|5))|((v|U|Q)P.(v|6|B)r(o|0|j)w(s|7|Q)e(r|3|q))|((w|5|2)i(7|n|5)d(q|3|o)w(6|s|4) (((9|p|5)h(o|X|q)n(e|x|q))|((h|c|q)e)))|((8|I|7)E(X|8|M)o(h|5|b)i(h|6|l)e)|((9|V|7)o(d|Q|j)a(f|8|Q)o(X|4|n)e)|((X|9|o)p(q|e|j)r(h|a|j) (v|m|q)o(v|b|x)i)|((4|o|5)p(e|0|2)r(v|4|a) (2|m|5)i(v|9|n)i)|((q|7|s)y(q|m|0)b(q|6|i)a(n|0|q))|((X|8|1)o(o|X|0)m)|( P(q|r|h)e[/])
This yields stuff like “android”, “ipad”, “iphone”, “windows phone”, “Xoom”, and many others. And then there’s r3, which will disqualify a user agent from redirecting to the mobile site:
((i|3|Q)p(v|5|a)d)|((q|9|V)i(7|e|8)w(j|P|X)a(Q|8|d))|((M|7|Q)Z(q|6|9)0(1|X|q))|((q|8|G)T-P(q|2|1)0(3|0|8)0)|((q|6|G)T-P(v|7|Q)5(v|8|0)0)|((j|5|X)o(3|o|5)m)
…and here we have “ipad” and “Xoom” again, which is quite silly because we just saw them in r2. Probably the obfuscation layer makes it hard for the PassCall developers to make changes :-)
All in all, here is a (partial) set of user agents that PassCall will redirect to the mobile website and a (partial) set of user agents that they won’t (the list is based on what I’ve seen on Haaretz’s website on January 31, 2012):
Will redirect if starts with: Alcatel, ICoppiedFromPasscallCode?haaretz, LG or LGE followed by -, / or _, Maui Browser, SEC-, SGH-, SIE-, SK_, SONIM, Sendo, Telit, portalmmm
Will redirect if contains: android, blackBerry, GT-P1000 (probably), HTC, Huawei, ipad, iphone, motorola, MOT-, nokia, sonyericsson, samsung, Palm, philips, UP.Browser, windows phone, windows ce, IEMobile, Vodafone, opera mobi, opera mini, symbian, Xoom, Pre
Will not redirect if contains: ipad, ViewPad, MZ601, Xoom
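The decision itself is easy to reproduce. Below is a minimal C++ sketch of the (b1 || b2) && b3 logic from f, using std::regex. The three patterns are heavily abbreviated, deobfuscated stand-ins for r1, r2 and r3 (not the full expressions), the function name should_redirect is hypothetical, and case-insensitive matching is my assumption, based on the lowercase “android” entry matching a user agent that says “Android”. The cookie handling is left out.

```cpp
#include <regex>
#include <string>

// Sketch of PassCall's redirect decision, under the assumptions stated above.
// The patterns are abbreviated stand-ins, not the real r1/r2/r3.
bool should_redirect(const std::string& user_agent) {
    static const auto flags = std::regex::ECMAScript | std::regex::icase;
    // r1: the user agent must START with one of these strings.
    static const std::regex r1("^(Alcatel|SEC-|SGH-|SIE-|portalmmm)", flags);
    // r2: the user agent must CONTAIN one of these strings.
    static const std::regex r2(
        "android|ipad|iphone|nokia|windows (phone|ce)|symbian|Xoom", flags);
    // r3: tablets that are excluded even if r1 or r2 matched.
    static const std::regex r3("ipad|ViewPad|MZ601|Xoom", flags);

    const bool b1 = std::regex_search(user_agent, r1);
    const bool b2 = std::regex_search(user_agent, r2);
    const bool b3 = !std::regex_search(user_agent, r3);
    return (b1 || b2) && b3;
}
```

Feeding this an iPad user agent returns false because r3 vetoes the r2 match, while anything that merely contains “Android” and is not on the r3 list is sent to the mobile site.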
I am curious about other tablets, such as the Samsung Galaxy Tab, meeting the criteria for a mobile device. Indeed, with the Galaxy Tab user agent (“Mozilla/5.0 (Linux; U; Android 3.0; xx-xx; GT-P7100 Build/HRI83) AppleWebkit/534.13 (KHTML, like Gecko) Version/4.0 Safari/534.13”) we are getting the mobile version. And the most annoying thing? I don’t see a way on the mobile version to switch back to the desktop version if I’d like. And that’s the number one fallback you should have if you’re using blunt regular expressions to determine which website to show me.
To summarize:
- Haaretz is using PassCall Advanced Technologies to redirect its mobile visitors to a crippled mobile version of the website
- PassCall is using a set of regular expressions to determine whether a user’s user agent represents a mobile device and performs an unconditional redirect
- PassCall provides the same mobile experience for a 2011 iPhone 4S and a 2005 Nokia feature-phone
- There doesn’t seem to be a way to get back to the desktop version from the dumbed-down mobile one
I would greatly appreciate any comments and corrections. This research has been performed for personal purposes and does not represent the position of my employer.
I am posting short updates and links on Twitter as well as on this blog. You can follow me: @goldshtn
What if P/Invoke is not enough? Hold on, why should it not be enough? What needs could we possibly have but calling global C-style functions exported from DLLs?
Well, suppose you want to use a C++ class exported from a DLL. Or maybe the C++ class is not yet exported and you are looking for a way to make it available to managed code. The problem with a C++ class as opposed to a global function is that … it is not a global function.
For example, a typical C++ class with a constructor and a couple of instance methods would be compiled to roughly the following when you look at it as a bunch of C-style calls:
// C++-style class declaration
class Klass {
private:
    int _n;
public:
    Klass(int n) : _n(n) {}
    void Work(int m) ...
    virtual bool Sleep() ...
};
// C++-style class usage
Klass* k = new Klass(42);
k->Work(13);
bool b = k->Sleep();
// C-style class declaration
struct Klass {
    void* vfptr[1];
    int _n;
};
bool __Klass_Sleep(Klass* pThis) ...
void __Klass_Ctor(Klass* pThis, int n) {
    pThis->vfptr[0] = (void*)&__Klass_Sleep;
    pThis->_n = n;
}
void __Klass_Work(Klass* pThis, int m) ...
// C-style class usage
Klass* k = (Klass*)malloc(sizeof(Klass));
__Klass_Ctor(k, 42);
__Klass_Work(k, 13);
bool b = ((bool(*)(Klass*))k->vfptr[0])(k); // the "this" pointer is passed explicitly
What am I saying here? I’m saying that using a C++ class through P/Invoke signatures would entail mimicking the C-style class usage above – i.e. declaring a P/Invoke signature for the constructor and for each instance method, and calling virtual methods through the virtual function table. This is clearly impractical.
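To make the cost concrete, here is a sketch of the flat, C-style surface you would have to write by hand (and then mirror with one P/Invoke signature per function) just to expose this one small class. The Klass_* function names are hypothetical, and the bodies of Work and Sleep are placeholders I made up, since the original elides them.

```cpp
// A stand-in for the class above; the method bodies are placeholders.
class Klass {
    int n_;
public:
    explicit Klass(int n) : n_(n) {}
    virtual ~Klass() = default;
    void Work(int m) { n_ += m; }            // placeholder body
    virtual bool Sleep() { return n_ > 0; }  // placeholder body
    int Value() const { return n_; }
};

// Hypothetical flat API: one exported function per constructor, method and
// destructor, each taking the object pointer explicitly.
extern "C" {
    Klass* Klass_Create(int n) { return new Klass(n); }
    void Klass_Work(Klass* self, int m) { self->Work(m); }
    int Klass_Sleep(Klass* self) { return self->Sleep() ? 1 : 0; }
    int Klass_Value(const Klass* self) { return self->Value(); }
    void Klass_Destroy(Klass* self) { delete self; }
}
```

The managed side would hold the returned pointer as an opaque IntPtr and must remember to call Klass_Destroy, which is exactly the kind of bookkeeping that does not scale to a real class hierarchy.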
C++/CLI offers a reasonable approach that bridges .NET and C++ interoperability in both directions, and allows both sides to be the initiator (unlike P/Invoke). It has been covered well elsewhere, and I wrote a post on automating the reverse direction four years ago. Still, a brief recap is in order.
You create a new C++/CLI assembly that bridges the managed and unmanaged worlds. It can be a part of an existing C++ DLL (which will then contain parts compiled with /CLR), or a brand new one. In your new assembly, you can create three categories of types:
1. Managed types – “ref class” or “value class”, which are fully privileged citizens of the managed world. These types can be exported from your assembly and form the façade to other managed code (e.g. C#).
2. C++ classes compiled to IL – can be accessed directly by managed types within the same assembly, or by (3) below.
3. C++ classes compiled to machine code – can be accessed directly by managed types within the same assembly, by (2) above, or by any native C++ code within the same DLL. They can also be exported from the DLL as C++ classes.
Now there are two interop directions:
- Managed code calls into a managed type in the C++/CLI assembly, which calls into (2) or (3) or any other unmanaged API, including Win32.
- Unmanaged code calls into (2) which calls into (1) or any managed type, including the whole .NET Framework.
For more information about C++/CLI, you can check out my post I mentioned above, and there’s plenty of good docs on MSDN and a book (which, frankly, I haven’t read).
Is there anything else? Any other options for interop in the 2010s? Well, as a matter of fact, there are quite a few:
COM Interop – managed code can very easily call COM objects and managed classes can act as COM objects very easily. This allows bidirectional interoperability, but means you have to learn COM (most probably ATL) to expose COM objects from the unmanaged world. Verdict: Ugh.
C++/CX – the language extensions for C++ (/ZW) that make consuming and creating Windows 8 components very easy from C++. Not surprisingly, C++/CX is based on COM, but makes it much easier to create COM objects, which can be subsequently consumed by managed code. Verdict: Perhaps a feasible alternative when Visual Studio 11 ships.
CXXI – Mono’s new technology which extends gcc and creates an interoperability façade easily accessible from managed code. This works not only for global functions, like P/Invoke, but for complex C++ code as well. Verdict: This might just be the future.
We’ve got P/Invoke doing some heavy lifting of managed-to-unmanaged signature translation. The opposite direction is surprisingly easy as well.
Typically, you would encounter the need for unmanaged code to call managed code as part of a callback. (There is the scenario where unmanaged code is the interop initiator, which we will discuss in another post.)
In the unmanaged world, callbacks are function pointers; in the managed world, callbacks are delegates. Recall that a delegate “knows” not only the method that needs to be called, but also the target object – and indeed, you can create delegates that reference an instance method on a specific object. On the other hand, a function pointer is just that – a pointer – you can’t use a single pointer to store information about the method and the target object.
Obviously, a conversion is in order, and this conversion is also handled automatically by P/Invoke. For example, suppose you want to call the EnumWindows Win32 API to enumerate all the windows on the system (and perhaps retrieve their text using the awesome GetWindowText wrapper we developed earlier). The function’s signature is:
BOOL WINAPI EnumWindows(
__in WNDENUMPROC lpEnumFunc,
__in LPARAM lParam
);
Let’s ignore the second parameter for now, and consider the first one. It’s a function pointer, specifically to a function that has the following signature:
BOOL CALLBACK EnumWindowsProc(
__in HWND hwnd,
__in LPARAM lParam
);
What’s the calling convention here? CALLBACK is a macro that expands to __stdcall, which is the default Win32 calling convention.
We would like to pass a delegate to the EnumWindows function instead of a function pointer, or at least treat a managed delegate as a function pointer. There are two ways of doing this. The harder way would be to declare EnumWindows as a method that takes a function pointer for its first parameter, and then manually obtain that function pointer from a managed delegate:
[DllImport("user32")]
public static extern bool EnumWindows(
    IntPtr lpEnumFunc, IntPtr lParam); // LPARAM is pointer-sized
[UnmanagedFunctionPointer(CallingConvention.StdCall)]
public delegate bool EnumWindowsProc(
    IntPtr hWnd, IntPtr lParam);
EnumWindowsProc proc = new SomeClass().SomeMethod;
IntPtr fptr = Marshal.GetFunctionPointerForDelegate(proc);
EnumWindows(fptr, IntPtr.Zero); // pass the function pointer, not the delegate
GC.KeepAlive(proc); // keep the delegate alive until EnumWindows returns
OK, so what’s going on here? First of all, our P/Invoke signature takes a pointer – very sad. Next, we have a delegate that matches the EnumWindowsProc signature. The [UnmanagedFunctionPointer] attribute is not strictly required in this case, but you should know that it can be used to customize the calling convention of the obtained delegate, as well as other things.
Finally, to call the actual entry point, we create a delegate and obtain a function pointer from it using Marshal.GetFunctionPointerForDelegate. That’s the function pointer we pass to EnumWindows. All that’s left is to exercise extra caution, and make sure the delegate is not garbage collected while the EnumWindows method executes. Fortunately, it executes synchronously, so the GC.KeepAlive call after the EnumWindows call would ensure that the delegate is not collected under any circumstances.
If this whole delegate-being-collected stuff seems nonsensical, take a look at the MDA that detects this kind of bug – CallbackOnCollectedDelegate.
You might be wondering how Marshal.GetFunctionPointerForDelegate overcomes the problem of converting a pair <method,target> to a single pointer. Indeed, what typically happens is that a small stub is generated on the fly – this stub’s address is the unmanaged function pointer, and what it does is call the method on the appropriate object, which is hardcoded into it.
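The same <method,target> problem exists in plain C++, where a pointer to an instance method cannot be passed as an ordinary function pointer. The sketch below shows the classic hand-written workaround: a static “thunk” that receives the target through a context parameter (the role lParam plays in EnumWindows). The enumerate_items function is a hypothetical stand-in for an EnumWindows-style API, and unlike the CLR’s generated stub, this thunk gets its target at call time instead of having it hardcoded.

```cpp
#include <cstddef>
#include <vector>

// A C-style callback: a bare function pointer plus an opaque context value.
using ItemCallback = bool (*)(int item, void* context);

// Hypothetical stand-in for an EnumWindows-style API: calls the callback for
// each item until the callback returns false.
void enumerate_items(const std::vector<int>& items, ItemCallback callback,
                     void* context) {
    for (int item : items) {
        if (!callback(item, context)) {
            break;
        }
    }
}

class Collector {
public:
    explicit Collector(std::size_t limit) : limit_(limit) {}

    // The instance method we actually want to be called back on.
    bool OnItem(int item) {
        seen_.push_back(item);
        return seen_.size() < limit_;  // stop once the limit is reached
    }

    // The thunk: a plain function that recovers the target object from the
    // context pointer and forwards the call to the instance method.
    static bool Thunk(int item, void* context) {
        return static_cast<Collector*>(context)->OnItem(item);
    }

    const std::vector<int>& seen() const { return seen_; }

private:
    std::size_t limit_;
    std::vector<int> seen_;
};
```

A caller would write enumerate_items(items, &Collector::Thunk, &collector), and is responsible for keeping collector alive for the duration of the call, which mirrors the GC.KeepAlive concern on the managed side.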
The easier approach would be to let P/Invoke do the conversion from a managed delegate to the unmanaged function pointer, and would work just as well:
[DllImport("user32")]
public static extern bool EnumWindows(
    EnumWindowsProc lpEnumFunc, IntPtr lParam); // LPARAM is pointer-sized
[UnmanagedFunctionPointer(CallingConvention.StdCall)]
public delegate bool EnumWindowsProc(
    IntPtr hWnd, IntPtr lParam);
EnumWindowsProc proc = new SomeClass().SomeMethod;
EnumWindows(proc, IntPtr.Zero);
GC.KeepAlive(proc);
Still, it is the caller’s responsibility to make sure the delegate is not collected during the unmanaged call.
With Windows 8 around the corner, managed code slowly taking over legacy systems written in C++, games developed in a mixture of .NET and C++, and the rest of this technology soup – I thought it would be a good time to provide a quick refresher of the available interoperability mechanisms between managed and unmanaged code. Nothing here is very new, but I get so many questions about it that at the very least I would have something to refer people to.
P/Invoke is best suited for managed code invoking global functions exported from C-style DLLs. For example, if your awesome WPF app needs just this one Win32 API, say AccessCheckByTypeResultListAndAuditAlarmByHandle, then you can declare a “managed” signature for it and call it as if it was a managed method.
Let’s take a look at a simpler example first. Suppose you want to call memcpy to copy around a chunk of memory (because you don’t trust Array.Copy). Well, you figure it’s exported from msvcrt.dll (the C runtime DLL), and then declare it like this:
[DllImport("msvcrt",
    CallingConvention=CallingConvention.Cdecl)]
public static extern IntPtr memcpy( // returns void* (the destination)
    byte[] dst, byte[] src, UIntPtr len); // size_t is pointer-sized
Now you can call this method from C# as if the unmanaged function indeed knew what managed byte arrays were. The P/Invoke layer creates the magic here: it takes managed byte arrays that reside on the GC heap, pins them (so that the GC can’t move them while they are accessed by memcpy), and passes a pair of pointers to memcpy. When memcpy returns, the P/Invoke layer unpins the managed byte arrays as if nothing happened.
A somewhat more complicated example, where the power of P/Invoke really shines, is when the memory allocation becomes tricky. For example, consider the GetWindowText Win32 API:
int WINAPI GetWindowText(
__in HWND hWnd,
__out LPTSTR lpString,
__in int nMaxCount
);
This function receives a string buffer that it is supposed to fill with the text retrieved from a particular window handle. Mapping the window handle to a managed type is easy – it’s just an opaque IntPtr returned by various window-management Win32 APIs. Now, the buffer itself is a pointer to a string that GetWindowText is supposed to fill out – however, recall that managed strings are immutable!
P/Invoke helps again here by accepting a StringBuilder where a mutable string is expected. In fact, the capacity of the StringBuilder instance is the buffer size we can pass as the third parameter, obtaining this managed code:
[DllImport("user32")]
public static extern int GetWindowText(
    IntPtr hWnd, StringBuilder lpString, int nMaxCount);
IntPtr hwnd = ...;
StringBuilder text = new StringBuilder(100);
GetWindowText(hwnd, text, text.Capacity);
Note that this time we didn’t specify the calling convention because GetWindowText is a standard Win32 API, and uses the stdcall calling convention, which is also P/Invoke’s default.
There are even more complex examples where you can use P/Invoke to manage the mapping of complex structures and in/out parameters. Nonetheless, the AccessCheckByTypeResultListAndAuditAlarmByHandle horror is not any less horrible – the function has 17 parameters, most of them pointers to various structures and arrays, and figuring out the mapping won’t be very easy.
As you see, P/Invoke does not absolve you from understanding the unmanaged function signature. It’s important that you understand the various function calling conventions, and how parameters can be mapped from managed types to unmanaged types. The great pinvoke.net community website is of immense assistance here, and can provide you a ready-made P/Invoke signature for almost anything. And then there’s also the Microsoft-made P/Invoke Signature Generator, which I mentioned here a couple of years ago.
Inlining is an important optimization that allows compilers to eliminate the cost of method calls in situations where the method call overhead is more significant than the method body itself. The CLR JIT uses inlining conservatively, but features some nice tricks such as interface method call inlining – this was one of the first things I covered on this blog, almost five years ago.
The limitations on JIT inlining are not known precisely, but some criteria have been announced previously (in 2004!). Namely, the JIT won’t inline:
- Methods marked with MethodImplOptions.NoInlining
- Methods larger than 32 bytes of IL
- Virtual methods
- Methods that take a large value type as a parameter
- Methods on MarshalByRef classes
- Methods with complicated flowgraphs
- Methods meeting other, more exotic criteria
Today, however, I’d like to direct your attention towards a new flag in CLR 4.5, MethodImplOptions.AggressiveInlining. The documentation here is very brief:
The method should be inlined if possible.
Well, thanks. Interestingly, Mono introduced support for this option as well, committed on January 5th (two weeks ago!), and here’s the effect it has on deciding whether to inline a method, inside the function mono_method_check_inlining:
- if (header.code_size >= inline_limit)
+ if (header.code_size >= inline_limit && !(method->iflags & METHOD_IMPL_ATTRIBUTE_AGGRESSIVE_INLINING))
(The inline_limit parameter is configurable by an environment variable, and defaults to 20.)
So, what is the effect of AggressiveInlining on the Microsoft JIT?
From what I checked, it seems to be similar to what it does in Mono. Namely, methods that are not inlined only because of code size are inlined when this attribute is applied to them.
Here are two methods that you can try yourself:
public static int SmallMethod(int i, int j)
{
    if (i > j)
        return i + j;
    else
        return i + 2 * j - 1;
}
[MethodImpl(MethodImplOptions.AggressiveInlining)]
public static int LargeMethod(int i, int j)
{
    if (i + 14 > j)
    {
        return i + j;
    }
    else if (j * 12 < i)
    {
        return 42 + i - j * 7;
    }
    else
    {
        return i % 14 - j;
    }
}
The code size for these methods is 16 and 34, respectively. Without the AggressiveInlining attribute, the first method is inlined and the second is not inlined. With the AggressiveInlining attribute, the second method is inlined as well.
However, methods that couldn’t be inlined previously because of other criteria are still not inlined. I checked the following, and none of these methods was inlined:
- Recursive method
- Virtual method (even if the static type of the receiver variable is sealed)
- Method with exception handling (representing a “complicated flowgraph”)
More than a year after writing my first post touching on the subject of Managed Debugging Assistants (MDA) through the “Callback on Garbage Collected Delegate” case study, it’s time for a brief mention of another useful MDA – “P/Invoke Stack Imbalance”.
This MDA fires whenever a P/Invoke call causes an imbalance on the stack. What does a stack imbalance mean? The top of the thread’s stack is pointed to by the ESP register (RSP on x64), and a stack imbalance occurs if the ESP value before making a function call is not the same as the ESP value after the function returns.
Why would a P/Invoke call cause a stack imbalance? Well, recall that parameters are typically passed on the stack when doing P/Invoke calls, and there is no single calling convention that every API in the world adheres to. For more information on x86 calling conventions, here’s a great (and lengthy) read, and a shorter one with fewer details. The two most popular calling conventions are stdcall and cdecl, which differ in one very important aspect: who is responsible for removing the function parameters from the stack after the function call completes.
In the cdecl calling convention (which is the default for C/C++ functions to this day), the caller is responsible for removing parameters from the stack. In other words, whenever the compiler encounters a function call and the function uses the cdecl calling convention, it will emit assembly instructions to remove the parameters from the stack (e.g. “add esp, 8” to remove two integer parameters from the stack).
On the other hand, in the stdcall calling convention (which is used almost exclusively by Win32 APIs), the callee is responsible for removing parameters from the stack. When the compiler emits code for an stdcall function call, it does not include instructions for removing parameters from the stack, and relies instead on the called function to do so.
Observe that two things can go wrong here, even if we assume only two calling conventions. Suppose there is a function void f(int) using the cdecl calling convention and you call it (mistakenly!) with the stdcall calling convention. After calling the function, the stack contains an extra four bytes of garbage – which is undesired, but will not crash the program. Here is a conceptual view of the stack before the call:
| local var of caller |
| local var of caller |
| saved EBP of caller |
| ret address of caller |
| param of caller |
| param of caller |
During the call:
| local var of f |
| saved EBP of f |
| ret address of f |
| param of f |
| local var of caller |
| local var of caller |
| saved EBP of caller |
| ret address of caller |
| param of caller |
| param of caller |
After the call:
| param of f (garbage!) |
| local var of caller |
| local var of caller |
| saved EBP of caller |
| ret address of caller |
| param of caller |
| param of caller |
Now suppose that there is a function void g(int) using the stdcall calling convention and you call it with the cdecl calling convention. The function cleans up the four byte parameter from the stack, and so does the caller – now, if additional functions are called, the stack space previously reserved to the calling function may be reused for other data (e.g. return address for a function).
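Both failure modes come down to simple arithmetic on the stack pointer. The following C++ sketch is a toy model, not real calling-convention code: it only tracks how many bytes each side adds back to a simulated ESP. The names Convention and stack_imbalance are made up for this illustration.

```cpp
enum class Convention { Cdecl, Stdcall };

// Toy model of one call: returns (ESP after the call) - (ESP before the call).
// Zero means the stack is balanced. The return address is ignored because the
// call/ret pair pushes and pops it symmetrically.
int stack_imbalance(Convention callee_actual, Convention caller_assumed,
                    int param_bytes) {
    int esp = 0;
    esp -= param_bytes;  // the caller pushes the parameters (stack grows down)
    if (callee_actual == Convention::Stdcall) {
        esp += param_bytes;  // an stdcall callee pops its own parameters
    }
    if (caller_assumed == Convention::Cdecl) {
        esp += param_bytes;  // a caller that assumes cdecl emits "add esp, N"
    }
    return esp;
}
```

For the cdecl function f called as stdcall, nobody cleans up and the result is -4: four bytes of garbage remain on the stack. For the stdcall function g called as cdecl, both sides clean up and the result is +4: ESP now points into space that still belongs to the caller.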
When using P/Invoke, there’s ample chance to create a stack imbalance, typically because of a calling convention mismatch. The [DllImport] attribute has an explicit property called CallingConvention, which you must set to the appropriate value (the default is Stdcall!).
The “P/Invoke Stack Imbalance” MDA detects the condition where the stack becomes unbalanced after a P/Invoke call, and raises an exception in the debugger (as any MDA does). This gives you an immediate opportunity to fix the problematic P/Invoke signature.

To fix the signature, the best bet is to look up the documentation and the header files for the appropriate calling convention. If you can’t figure out the problem, try to call the native function from native code and look at the assembly generated at the call site. Perhaps you’ll be able to see a parameter size mismatch or an unfamiliar calling convention being used.
A few days ago I was asked on Twitter whether research into garbage collection is paying off, considering the super-smart-pointers introduced into C++ and other language and library tricks. Let’s take a look at some of the smart pointer facilities introduced in C++11, and then tackle them from a garbage collection perspective.
C++11 features three standard smart pointer classes:
- unique_ptr<T> wraps a pointer to a value and guarantees that there is only one pointer at a time to that value. You cannot copy unique_ptr<T> instances around (only move them), and when the unique_ptr<T> instance is destroyed, the underlying pointer is deleted.
- shared_ptr<T> wraps a pointer to a value and a reference count, allowing multiple locations in the program to point to the same value at the same time. You can copy shared_ptr<T> instances around, and when the last shared_ptr<T> instance with a given pointer is destroyed, the underlying pointer is deleted.
- weak_ptr<T> wraps a pointer to a value and can be converted to a shared_ptr<T>, temporarily (using the lock() method). However, while you have only a weak_ptr<T> instance, the underlying pointer can be deleted (because there are no “strong” pointers to it), and then your attempt to convert it to a shared_ptr<T> will fail gracefully. (This is very similar to the implementation of short weak references in garbage collected languages, such as .NET’s WeakReference class.)
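The three behaviors described in the list can be seen in a few lines of standard C++11. This is only a small sketch; the Widget type and the function names are invented for the example.

```cpp
#include <memory>
#include <utility>

struct Widget { int value; };

// unique_ptr: ownership can be moved but never copied.
bool unique_ownership_moves() {
    std::unique_ptr<Widget> first(new Widget{1});
    std::unique_ptr<Widget> second = std::move(first);  // first gives up ownership
    return first == nullptr && second != nullptr && second->value == 1;
}

// shared_ptr: every copy shares ownership and bumps the reference count.
long owners_after_copy() {
    auto a = std::make_shared<Widget>(Widget{2});
    auto b = a;  // the copy increments the shared reference count
    return a.use_count();
}

// weak_ptr: lock() succeeds while a strong owner exists and fails gracefully
// (returns an empty shared_ptr) once the last strong owner is gone.
bool weak_pointer_expires() {
    auto strong = std::make_shared<Widget>(Widget{3});
    std::weak_ptr<Widget> weak = strong;
    const bool alive_before = (weak.lock() != nullptr);
    strong.reset();  // destroys the Widget: no strong owners remain
    const bool alive_after = (weak.lock() != nullptr);
    return alive_before && !alive_after;
}
```

Note that lock() hands back a temporary shared_ptr, so the object cannot disappear while you are using it, much like reading the Target of a .NET WeakReference into a local variable.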
From a garbage collection perspective, unique_ptrs are not that interesting because they wrap access to a uniquely owned pointer, which is not available to anyone else. Shared_ptrs and weak_ptrs, however, are more interesting, because they provide actual semantics for shared ownership of resources and their eventual reclamation. Moreover, the fundamental problem of reference counting GC, namely that of cycles (e.g. see Python GC treatment of this issue), is addressed by providing a fairly convenient weak pointer concept.
What are the benefits of using automatic garbage collection when you have shared pointers and weak pointers, and don’t have to call “delete” in your C++ programs anymore?
Reference counting is costly. Every copy of a reference (shared_ptr, in C++’s case), involves updating a reference count, and that update requires multiprocessor synchronization. Even worse, if an object is shared across multiple threads, the reference count updates may invalidate cache lines on multiple processors. In the C++ shared_ptr, only the control block will be invalidated, but in “legacy” reference counting environments, where the reference count is embedded in the object, the object itself may be invalidated, which has an even higher performance cost.
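A small sketch of where those count updates come from (the two helper functions are hypothetical names of mine): passing a shared_ptr by value copies it and therefore touches the shared control block, while passing it by const reference or moving it does not add an owner. In typical implementations that copy is an atomic increment, which is the synchronization cost described above.

```cpp
#include <memory>

// Taking the pointer by value makes a copy, so the count goes up for the call.
long count_seen_by_value(std::shared_ptr<int> p) {
    return p.use_count();
}

// Taking it by const reference does not copy and leaves the count untouched.
long count_seen_by_ref(const std::shared_ptr<int>& p) {
    return p.use_count();
}
```

With a single owner `p`, `count_seen_by_value(p)` observes two owners and `count_seen_by_ref(p)` observes one. This is why C++ code that cares about performance tends to pass shared pointers by reference, which is exactly the kind of discipline a garbage-collected reference never asks of you.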
Using weak_ptr is counter-intuitive and hard. As a programmer, I don’t want to be forced to analyze and break every reference cycle in my application. There are the obvious textbook examples with managers referencing their direct reports referencing back their managers, which can be broken with weak_ptrs; however, imagine a cycle that is not formed deterministically and that contains dozens of objects – even figuring out that you have a cycle is an impossible task.
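Here is the textbook manager/report cycle as a sketch. The type names, the `live` counters, and both functions are made up for this example. In the first version the two objects keep each other alive after every handle to them is gone; in the second, the back-reference is a weak_ptr and everything is reclaimed.

```cpp
#include <memory>
#include <utility>
#include <vector>

struct LeakyEmployee {
    static int live;                                      // objects currently alive
    std::shared_ptr<LeakyEmployee> manager;               // strong back-reference
    std::vector<std::shared_ptr<LeakyEmployee>> reports;
    LeakyEmployee() { ++live; }
    ~LeakyEmployee() { --live; }
};
int LeakyEmployee::live = 0;

struct Employee {
    static int live;
    std::weak_ptr<Employee> manager;                      // non-owning back-reference
    std::vector<std::shared_ptr<Employee>> reports;
    Employee() { ++live; }
    ~Employee() { --live; }
};
int Employee::live = 0;

// Returns how many objects are still alive after all handles went out of scope.
int live_after_cycle() {
    LeakyEmployee* raw_report = nullptr;
    {
        auto boss = std::make_shared<LeakyEmployee>();
        auto report = std::make_shared<LeakyEmployee>();
        boss->reports.push_back(report);
        report->manager = boss;                           // closes the cycle
        raw_report = report.get();
    }   // both handles are gone, but each object still owns the other
    int still_live = LeakyEmployee::live;
    // Break the cycle by hand so that the demo itself does not leak.
    std::shared_ptr<LeakyEmployee> last = std::move(raw_report->manager);
    return still_live;
}

int live_after_weak_backref() {
    {
        auto boss = std::make_shared<Employee>();
        auto report = std::make_shared<Employee>();
        boss->reports.push_back(report);
        report->manager = boss;                           // weak: adds no owner
    }
    return Employee::live;
}
```

With two objects the fix is obvious. The point of the paragraph above is that nothing in the language tells you which edge to weaken when the cycle runs through dozens of objects.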
Using smart pointers requires extraordinary consistency. C++ developers might be able to go on a “delete-elimination” crusade and convert their entire application to use smart pointers. I just don’t see this library feature ever being as convenient as standard C# references.
Reclaiming many objects at once might be more efficient than reclaiming each individual object. You may have heard horror stories about multi-second delays introduced by garbage collections (missiles diverted from their path, anyone?), but most garbage collections have sub-millisecond times, and most applications, even on fully-loaded servers, don’t suffer by default from garbage collection latencies. In fact, when you think about it, garbage collection costs (in the small object heap) are roughly linear in the number of live objects, so if most of your heap is garbage, collecting it is a fairly rapid task. On the other hand, if you reclaimed each object individually, you’d pay the same price for each piece of that garbage, instead of reclaiming it in one fell swoop.
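The "one fell swoop" idea can be illustrated in C++ with a bump-pointer arena. To be clear, this is not a garbage collector, just an analogy, and the `Arena` class and `Point` type are hypothetical. Allocation is a pointer bump, and reclaiming every object costs the same no matter how many were allocated.

```cpp
#include <cstddef>
#include <new>
#include <type_traits>
#include <vector>

class Arena {
public:
    explicit Arena(std::size_t bytes) : buffer_(bytes), used_(0) {}

    // Copies value into the arena; returns nullptr when the arena is full.
    template <typename T>
    T* create(const T& value) {
        static_assert(std::is_trivially_destructible<T>::value,
                      "the arena never runs destructors");
        // Round the current offset up to the alignment T requires.
        std::size_t start = (used_ + alignof(T) - 1) / alignof(T) * alignof(T);
        if (start + sizeof(T) > buffer_.size()) return nullptr;
        used_ = start + sizeof(T);
        return new (buffer_.data() + start) T(value);
    }

    // Reclaims every object at once: constant cost, however many there were.
    void reclaim_all() { used_ = 0; }

    std::size_t used() const { return used_; }

private:
    std::vector<unsigned char> buffer_;
    std::size_t used_;
};

struct Point { int x; int y; };
```

A generational collector gets a similar effect for free: the dead objects in the young generation are never visited individually, whereas reference counting has to run a release path for each one of them.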
Now, if you consider some of the “futuristic” advances in garbage collection, you might be even more inclined to prefer it over smart pointers. Take a look at Azul’s Zing pauseless garbage collection, which runs concurrently with the application and introduces only very rare and short pause times, even for huge (~100GB) heaps. If this is the future of garbage collection, we won’t need smart pointers.
I am posting short updates and links on Twitter as well as on this blog. You can follow me: @goldshtn
This is a bug Dima encountered several months ago, and I’ve been looking for an opportunity to document ever since.
tl;dr – when unmanaged C++ code throws an exception which propagates through an interop boundary to managed code, C++ destructors are not called. To fix, don’t allow C++ exception propagation outside module boundaries, or compile with /EHa.
It was recorded before on this StackOverflow thread, where the conclusion is that “the CLR hooks into SEH handling to catch native exceptions and process them into CLR exceptions. […] the exception looks like an SEH exception to C++ and destructors are […] not run”.
There are several solutions, in decreasing order of “recommendedness”:
- Do not allow C++ exceptions to propagate untouched across the interop boundary. This is generally a good idea, because C++ exception handling is really best handled by C++ code only, and allowing exceptions to propagate outside is dangerous in other settings as well (e.g. exporting a function from a DLL that throws a C++ exception). If this were an option, you wouldn’t be reading on, so here goes . . .
- Compile your C++ code with /EHa to indicate that SEH exceptions may be thrown from C++ code. This will limit compiler optimizations w.r.t. stack unwind and destructor invocation, which is exactly what we need in this scenario.
- Use the _set_se_translator function to translate SEH exceptions back to C++ exceptions (basically, just by throwing a C++ exception from the body of the function). This is a hack and probably expensive, performance-wise.
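The first option can be sketched in portable C++ as follows. All of the names here (`parse_positive`, `parse_positive_noexcept`, the `ErrorCode` values, and the `Guard` type) are hypothetical. The exported function catches everything and reports failures through a return code, so stack unwinding and destructor calls stay entirely inside C++ code.

```cpp
#include <stdexcept>
#include <string>

enum ErrorCode { kOk = 0, kInvalidArgument = 1, kUnknownFailure = 2 };

// Counts destructor runs so that we can check that unwinding really happened.
static int g_guards_released = 0;
struct Guard {
    ~Guard() { ++g_guards_released; }
};

// Ordinary C++ code, free to throw.
static int parse_positive(const std::string& text) {
    Guard guard;                    // must be released even when we throw
    int value = std::stoi(text);    // throws std::invalid_argument or std::out_of_range
    if (value <= 0) throw std::invalid_argument("value must be positive");
    return value;
}

// The function exported across the module (or interop) boundary.
// No exception ever leaves it; callers only see an error code.
extern "C" int parse_positive_noexcept(const char* text, int* result) noexcept {
    try {
        if (text == nullptr || result == nullptr) return kInvalidArgument;
        *result = parse_positive(text);
        return kOk;
    } catch (const std::invalid_argument&) {
        return kInvalidArgument;
    } catch (const std::out_of_range&) {
        return kInvalidArgument;
    } catch (...) {
        return kUnknownFailure;
    }
}
```

On the managed side you would then P/Invoke `parse_positive_noexcept` and turn the error code into a .NET exception yourself.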
A few years ago I demonstrated the interaction between the CLR and SEH by installing my own unhandled exception filter. This is just another demonstration of how brittle the state of affairs is between managed and unmanaged code.
I keep telling junior software developers that the only way to maintain their expertise and to become better developers is a continuous learning process. 2011 has been a very productive year for me (and many others at SELA!), and I am looking forward to 2012, the year of Windows 8, in eager anticipation. Below are some of the things I learned in 2011 and some of the things planned for 2012.
Learned in 2011
Most of Q1 2011 I was working on the brand-new Parallel Programming in .NET 4.0 course with Bnaya Eshet. I’ve been using the TPL for a while before that, but writing slide decks, labs, and demos – culminating in actually teaching the course – is definitely the best way to learn a subject of this magnitude.
Later during the year I updated the Windows Concurrent Programming course (for C++/Win32 developers), including brand new content on ConcRT, synchronization and threading internals, and a bunch of new hands-on labs.
Back in 2010 I delivered a user group presentation on C++0x, which emerged as a final standard in late September. At the December SDP, Noam Sheffer and I delivered a whole day on the new C++11 language standard and standard library, which involved getting all the nitty-gritty details of the new language syntax (such as perfect forwarding, which I find the hardest “feature” to teach).
Next up is JavaScript. Yes, seriously. If you ever talked to me for more than a few minutes you know that I hate UI development in general, and Web development in particular. However, this was a gaping hole in my understanding of Web-related performance problems and bugs, and a gaping hole in the “general education” of any software developer. So I forced myself to relearn HTML, CSS, and JavaScript, and used the opportunity to learn the new HTML5 JavaScript APIs, jQuery, and even node.js (which should be the subject of another post).
Another thing I was always very interested in but rarely had the time to invest in professionally is security research. Although my day job usually does not involve any serious security research, I like to keep my general knowledge up to date by subscribing to vulnerability disclosure lists, following notorious security researchers, practicing simple exploitation and reverse-engineering scenarios with modern tools, and so on. In 2011 I invested in my reverse engineering skills and some modern exploitation techniques such as ROP.
Any list summarizing 2011 will be woefully incomplete without mentioning Windows 8. At //build we saw a glimpse of what’s to come in 2012, but the Windows 8 Developer Preview is already an exciting consumer experience and an exciting target for software development. I wrote an article on Windows 8 security aspects [pdf, Hebrew] and presented Windows 8 at the December SDP keynote, but there’s obviously much more to learn here, and I’ll leave it to 2012.
Finally, here’s a grocery list of some smaller things I learned in 2011 (some of which I don’t understand yet in a professional capacity):
Planned for 2012
2012 is going to be remembered as the year of Windows 8. With a sharp turn to a new runtime, development framework, UI style, and form factors – it’s inevitably going to take quite a while to learn and practice. From what I’ve seen so far, it will be fun :-)
General-purpose GPU computing (GPGPU) is becoming mainstream, especially with an awesome framework like C++ AMP behind it. In 2012, I hope to find opportunities to use C++ AMP in a real project, and expand my understanding of the underlying GPU concepts.
Finally, on a less professional level, I plan to expand my horizons in Web application security, specifically new advances in HTML injection, smart XSS attacks, crypto weaknesses, and similar topics.
Now, if I had been asked in 2010 whether I would be spending time on Solaris or JavaScript in 2011, there’s no way I’d have said yes. I’m really looking forward to January 1, 2013 to see what actually happens in 2012 :-)
A few days ago I delivered a session on return-oriented programming, in the context of stack-based buffer overflow exploitation, at the Distributed Systems, Networking and Security seminar (HUJI).
Generally speaking, return-oriented programming (at least in limited form, such as return to libc, return to syscall) is not new at all. It is a very effective means of bypassing stack-based buffer overflow mitigations such as NX (non-executable stack) and W^X. The awesome thing about ROP is that code execution vulnerabilities don’t have to involve actual code being placed in memory – a carefully constructed sequence of stack words can lead to arbitrary code execution through pieces of code (ROP gadgets) located elsewhere in memory.
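To show the mechanics without touching a real stack, here is a toy simulation. Everything in it is invented for illustration: the `Machine` struct, the two "registers", and the gadget addresses. Each gadget is a tiny operation that ends in a simulated `ret`, and the "exploit" is nothing but a sequence of words that the loop pops one by one.

```cpp
#include <cstddef>
#include <cstdint>
#include <functional>
#include <map>
#include <utility>
#include <vector>

// A toy machine: a stack of attacker-controlled words and two registers.
struct Machine {
    std::vector<std::uint64_t> stack;
    std::size_t sp = 0;                 // index of the next word to pop
    std::uint64_t rax = 0;
    std::uint64_t rbx = 0;
    std::uint64_t pop() { return stack.at(sp++); }
};

using Gadget = std::function<void(Machine&)>;

// Made-up "addresses" of gadgets that already exist in the simulated process.
const std::uint64_t kPopRax = 0x1000;     // pop rax; ret
const std::uint64_t kPopRbx = 0x2000;     // pop rbx; ret
const std::uint64_t kAddRaxRbx = 0x3000;  // add rax, rbx; ret

// Executes the chain the way a series of ret instructions would and returns rax.
std::uint64_t run_chain(std::vector<std::uint64_t> chain) {
    const std::map<std::uint64_t, Gadget> gadgets = {
        {kPopRax, [](Machine& m) { m.rax = m.pop(); }},
        {kPopRbx, [](Machine& m) { m.rbx = m.pop(); }},
        {kAddRaxRbx, [](Machine& m) { m.rax += m.rbx; }},
    };
    Machine m;
    m.stack = std::move(chain);
    while (m.sp < m.stack.size()) {
        std::uint64_t target = m.pop();   // "ret": the next stack word is the next address
        auto it = gadgets.find(target);
        if (it == gadgets.end()) break;   // not a known gadget: stop the simulation
        it->second(m);
    }
    return m.rax;
}
```

The chain `{kPopRax, 40, kPopRbx, 2, kAddRaxRbx}` computes 40 + 2 without a single byte of new code: the attacker supplies only addresses and data, and the gadgets were in memory all along.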
However, my presentation was based mostly on results from a 2011 paper by Shacham et al., where they show that Linux and Solaris libc binaries contain more than enough ROP gadgets to enable arbitrary control flow, and develop an actual compiler for generating exploit stack structure from a C-like syntax. The paper is short, funny, and highly recommended.
If you’d like to read my short presentation instead, view my slides here.
Last Wednesday I delivered my last session at the SDP: Production Debugging of .NET Applications. After delivering a similar session in the June DevDays, I thought about how I could make it better by focusing on a smaller set of core debugging scenarios and making sure attendees get a chance to practice them first-hand.

Indeed, we had time to discuss and practice the following:
- Capturing crash dumps and hang dumps with ADPlus, Windows Task Manager, and Procdump
- Analyzing crash dumps in Visual Studio 2010 and WinDbg to find the exception that occurred and the stack of other threads
- Detecting memory leaks while the application is running or from an out-of-memory postmortem crash dump using ANTS Memory Profiler and .NET Memory Profiler
- Automatically looking for deadlocks using SOSEX’s !dlk command
If you’re looking for references and sample scenarios, make sure to check out my blog post: .NET and C++ Debugging Resources. I use it every time I teach my full .NET Debugging Workshop, which is packed with >20 hands-on debugging labs.
If you attended my session, thanks a lot for coming and I hope you’ve seen a glimpse of what production debugging can look like. The intimidating task of opening crash dumps or analyzing complex bugs can be fun if you have a set of core scenarios and patterns, and a great toolbox.
Noam and I delivered a joint keynote at the first day of the SDP, covering the new APIs and internal features of .NET 4.5. With my love for internals, I took the easy route of talking about CLR internals and C# 5 async methods, and Noam talked about WCF, Entity Framework, WPF, ASP.NET MVC, and plenty of other frameworks which have been updated in .NET 4.5.

As you probably know, .NET 4.5 is an in-place update for .NET 4.0, which means—as far as Microsoft is concerned—that it should work seamlessly where .NET 4.0 does, with full backwards compatibility. Indeed, I’ve taken the plunge and installed Visual Studio 11 Developer Preview (which ships with .NET 4.5) on my primary laptop and desktop, and haven’t run into any trouble during the last >2 months.
I spent a large part of my 20 minutes talking about C# 5 async methods, which are definitely on their way to becoming my favorite C# feature. It’s going to be very hard to imagine how we used to work directly with continuations and marshaling work back to the UI thread after the dust settles and Visual Studio 11 is released. I haven’t had time to show much more than the simple examples, but I’m pretty sure the potential was clear:
async void DownloadLargeFile(string url) {
    _cts = new CancellationTokenSource();
    try {
        // Pass the token of the same source that CancelDownload() cancels.
        byte[] result = await DownloadAsync(url, _cts.Token);
        PlayMovie(result);
    } catch (OperationCanceledException) {
        UpdateStatus("Download canceled.");
    }
}
void CancelDownload() { _cts.Cancel(); }
(Async method with cancellation support and exception handling, all covered by the C# 5 compiler. To run this sample, you will need the Visual Studio 11 Developer Preview.)
During the last few minutes of the talk I focused on the latest CLR performance improvements: concurrent background server GC, multi-core background JIT, automatic NGEN, and managed profile-guided optimization. Currently there aren’t many resources on these topics lying around, but the //build session is pretty good and detailed.
It’s the first CLR release in >5 years that brings serious news from the performance front, especially w.r.t. compilation time penalty. I am looking forward to testing the final bits on production systems to see how startup time and CPU utilization are affected by these impressive features.
The SDP started with my 40-minute keynote, Introducing Windows 8. I had been working on it for more than 3 weeks, and wasn’t completely sure what I wanted in it until only a few days before the conference. That was also when I decided to ditch the slides and go for a fresh idea: a Metro-style Windows 8 application that contains both the slides and interactive code demos for the session.


(The application’s tile and title page.)
My personal view of Windows 8, after letting the news sink in and playing with the system for a couple of months, can be summarized in the following three tenets:
- Building on the foundation of Windows 7: the new OS has plenty of new features, but it doesn’t shake the foundations and maintains full compatibility with existing Windows 7 applications, albeit in the desktop environment.
- Fresh experience for new form factors: the new Metro touch-centric UI targets the tablet form factor as well as hybrid laptops with touch screens. On these form factors, Metro is a revolution.
- New application model and development framework: it’s impossible to achieve good battery life, provide a fresh start for application development, and deliver a brand new UI experience without changing the way applications interact with Windows and with each other, and the fundamental API sets that they can use.
Most of my session focused on the various scenarios, features, and APIs that Windows 8 provides to Metro applications. I mentioned language projections to the various languages—C#, VB, C++, JavaScript—and how the WinRT APIs map naturally to the C# 5 “await” concept or to JavaScript promises.
Below are some of the code demos I’ve shown:
Updating live tiles from application code (as opposed to push notifications): I created a badge on the application’s tile and offered the user the option to pin a secondary tile to the start screen. This secondary tile would deep-link into the application, to a specific screen.
Uri logo = new Uri("ms-resource:images/Logo.png");
SecondaryTile secondaryTile = new SecondaryTile(
"SDPKeynoteSecondaryTile",
"SDP Keynote Tile",
"SDP Keynote - Tiles Deep Link",
"SecondaryTile",
TileDisplayAttributes.ShowName,
logo);
bool isPinned = await secondaryTile.RequestCreateAsync();
(Pinning a secondary tile to the start screen, pending the user’s permission.)
Next, I showed how pickers can be used to integrate with other applications and with Windows without implementing specific interfaces and being aware of all data providers on the system. Namely, I used the image picker to choose one of my Facebook photos (through the Socialite sample application) and use it in my app.

FileOpenPicker picker = new FileOpenPicker();
picker.FileTypeFilter.Add(".png");
picker.FileTypeFilter.Add(".jpg");
picker.SuggestedStartLocation =
PickerLocationId.PicturesLibrary;
picker.ViewMode = PickerViewMode.Thumbnail;
var result = await picker.PickSingleFileAsync();
var stream = await result.OpenAsync(
Windows.Storage.FileAccessMode.Read);
BitmapImage image = new BitmapImage();
image.SetSource(stream);
ResultImage.Source = image;
(Using the FileOpenPicker to ask Windows for a .jpg image from any data source the user will choose.)
Next, I talked about the sandbox model for Windows 8 Metro applications, how each application runs under its own user identity and cannot access the resources of other applications, and how access to privacy-affecting data (such as location or camera) requires the user’s explicit permission after declaring this intent in your application’s manifest.

CameraCaptureUI capture = new CameraCaptureUI();
var result = await capture.CaptureFileAsync(
CameraCaptureUIMode.Photo);
var stream = await result.OpenAsync(
Windows.Storage.FileAccessMode.Read);
(Using the CameraCaptureUI to capture a picture from the device’s camera.)
Next, I showed how contracts bridge together applications that are blissfully unaware of each other, and enable unforeseen sharing scenarios that will feel natural to every Windows 8 user.

Finally, I talked about LiveID and roaming settings, which is an incredibly easy way to set up your Windows 8 machine, but also an incredibly easy way for application developers to share application settings and data (game levels, achievements, favorites, history, last read book page, …) across devices. With Windows 8 on the desktop, laptop, and tablet, users will appreciate roaming-savvy applications that keep up with the multitude of devices.
I didn’t have much time to answer questions, although I did prepare a slide with FAQs. Later during the day, many people asked me if Silverlight is dead, what the porting process from other UI frameworks looks like, how WinRT is implemented, and what restrictions apply to Metro applications running in the background. Answers to all these questions—and many others—were provided by Tomer and Elad in their back-to-back Windows 8 full-day sessions, later in the SDP.
To paraphrase Steve Ballmer, these are exciting days for Windows developers, and an exciting time to become a Windows developer. I’m sure there will be much more Windows 8 news to come. In the meantime, thanks for coming to the SDP (or for reading this post), and make sure to share your Windows 8 experiences, too!
On Tuesday, Noam and I delivered a joint session called Everything New in C++ at the SELA Developer Practice. It’s been a really fun session to work on, even though it was also a cold reminder of how easy it is to forget “The C++ Way” when you stay away for a little while. The new C++ standard is not just a set of minor additions to the C++ language and libraries; it almost feels like a whole new language, what with the lambda functions, type inference, and rich concurrency libraries.

While we were planning this full-day C++ session, Noam and I decided to focus not only on the standard C++, but also on some of the emerging extensions and runtimes around it. Because this day was the only day aimed directly at C++ developers, we wanted to cover as much of the new ground as possible. And indeed, we did a lot:
First, I talked about the new C++ standard. We couldn’t discuss in depth all the features, but obviously the first to mention were automatic type inference (a.k.a. auto), lambda functions, and rvalue references. Explaining the motivation for rvalue references—especially around perfect forwarding—is always tricky, and it’s been great to see an understanding audience of seasoned C++ developers applaud the reasoning behind the new language features.
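// Example 1: capture by reference, so the lambda updates fib1 and fib2 in place.
int fib1 = 1, fib2 = 1;
auto next_step = [&]() {
    int temp = fib2; fib2 = fib2 + fib1; fib1 = temp;
};
for (int i = 0; i < 20; ++i) next_step();
cout << fib1 << " " << fib2 << endl;

// Example 2: capture by value; "mutable" lets the lambda modify its own copy of n.
int n = 10;
auto say_yes_n_times = [=]() mutable -> bool {
    return (--n > 0);
};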
(Two examples of lambda functions, the biggest feature—in my opinion—in the C++11 standard.)
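Since perfect forwarding is the part that is hardest to explain, here is the smallest example I know of. The `Tracker` type and the `make_unique_sketch` function are hypothetical names. A forwarding function template takes its arguments as `Args&&...` and passes them on with `std::forward`, so an lvalue argument arrives at the constructor as an lvalue and an rvalue arrives as an rvalue.

```cpp
#include <memory>
#include <string>
#include <utility>

// Records which constructor overload was selected.
struct Tracker {
    std::string how;
    explicit Tracker(const std::string&) : how("copied from lvalue") {}
    explicit Tracker(std::string&&) : how("moved from rvalue") {}
};

// A factory that forwards its arguments to T's constructor unchanged.
template <typename T, typename... Args>
std::unique_ptr<T> make_unique_sketch(Args&&... args) {
    // std::forward<Args> restores the original value category of each argument.
    return std::unique_ptr<T>(new T(std::forward<Args>(args)...));
}
```

Calling `make_unique_sketch<Tracker>(s)` with a named string `s` selects the copying constructor, while `make_unique_sketch<Tracker>(std::string("y"))` selects the moving one. Without `std::forward`, both calls would copy, because a named parameter is always an lvalue inside the function.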
Next, I talked about the “new STL”—the additions to the C++ standard library that made it through the standard, with a specific focus on concurrency (std::thread and its kin). I also mentioned regex support and unordered containers, which should have been part of the STL a long time ago.
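template <typename Future>
void wait_all(initializer_list<Future> l) {
    for (const auto& f : l) { f.wait(); }
}
template <typename RAIter>
void quick_sort(RAIter begin, RAIter end) {
    if (end - begin < 2) return;  // zero or one element: already sorted
    // Split into [< pivot][== pivot][> pivot]; the middle part is never empty,
    // so both recursive calls work on strictly smaller ranges.
    typedef typename std::iterator_traits<RAIter>::value_type value_t;
    const value_t pivot = *(begin + (end - begin) / 2);
    RAIter lo = std::partition(begin, end, [&](const value_t& x) { return x < pivot; });
    RAIter hi = std::partition(lo, end, [&](const value_t& x) { return !(pivot < x); });
    // share() turns each future into a copyable shared_future, which is what
    // allows the pair to be passed in an initializer list.
    auto left = std::async([=]() { quick_sort(begin, lo); }).share();
    auto right = std::async([=]() { quick_sort(hi, end); }).share();
    wait_all({left, right});
}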
(Naïve parallel QuickSort using lambda functions, the async library function, futures, and initializer lists.)
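The regex support and unordered containers mentioned above fit together nicely in a word counter. This is a minimal sketch, and the function name `count_words` and the word pattern are my own choices: `std::sregex_iterator` walks over the matches, and `std::unordered_map` is the hash table that the standard library lacked for so long.

```cpp
#include <regex>
#include <string>
#include <unordered_map>

// Counts how many times each alphabetic word occurs in the text.
std::unordered_map<std::string, int> count_words(const std::string& text) {
    static const std::regex word("[A-Za-z]+");
    std::unordered_map<std::string, int> counts;
    const std::sregex_iterator end;   // a default-constructed iterator marks the end
    for (std::sregex_iterator it(text.begin(), text.end(), word); it != end; ++it) {
        ++counts[it->str()];          // operator[] inserts 0 for a word not seen before
    }
    return counts;
}
```

Note that an `unordered_map` makes no promise about iteration order, so sort the results yourself if you need to print them alphabetically.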
Next, Noam talked about ConcRT—the Microsoft-specific concurrency runtime for C++ applications which shifts concurrency from being about threads to being about tasks and data-parallel algorithms. Noam used a couple of great textbook examples taken from the Win32 Concurrent Programming course that demonstrate exactly how easy it is to break recursive algorithms into tasks and to parallelize data-oriented algorithms with the data-parallel APIs.
combinable<int> sum;
parallel_for_each(matrix.begin(), matrix.end(), [&](row& r)
{
    // Iterate over the row object r (not the type name) and add to this thread's partial sum.
    for (int i : r) sum.local() += i;
});
int total_sum = sum.combine(
    [](int a, int b) { return a+b; });
(Using the parallel_for_each ConcRT algorithm to parallelize matrix summation with partial sums for each thread aggregated into a combinable<int>.)
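If you don’t have ConcRT at hand, the same partial-sums idea can be sketched with nothing but the standard library. This is not how ConcRT schedules work; the function name, the fixed worker count, and the interleaved row assignment are simplifying assumptions of mine. Each worker accumulates into its own slot, and the slots are combined only once all workers are done.

```cpp
#include <cstddef>
#include <numeric>
#include <thread>
#include <vector>

typedef std::vector<int> row;

// Sums all matrix elements using worker_count threads with one partial sum each.
long long parallel_matrix_sum(const std::vector<row>& matrix, unsigned worker_count = 4) {
    if (worker_count == 0) worker_count = 1;
    std::vector<long long> partial(worker_count, 0);   // one slot per worker
    std::vector<std::thread> workers;
    for (unsigned w = 0; w < worker_count; ++w) {
        workers.emplace_back([&, w]() {
            long long local = 0;                       // plays the role of sum.local()
            for (std::size_t r = w; r < matrix.size(); r += worker_count)
                for (int v : matrix[r]) local += v;
            partial[w] = local;                        // a single write per worker
        });
    }
    for (auto& t : workers) t.join();
    // Plays the role of sum.combine(): merge the partial results.
    return std::accumulate(partial.begin(), partial.end(), 0LL);
}
```

The important property is the same as with `combinable<int>`: no thread ever writes to a location that another thread is using, so the inner loop needs no locks or atomics.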
Finally, we moved to unexplored-land, the emerging-but-not-implemented-and-completely-nonstandard extensions to C++. Noam talked about C++/CX (a.k.a. /ZW), the language extensions for working with WinRT in Windows 8. These neat language extensions (very similar to C++/CLI) make consuming and exporting WinRT components very easy despite their COM-laden nature.
I had very little time left to talk about C++ AMP, a set of APIs and minor language extensions which make standard C++ code run on the GPU with as little extra work as a lambda function:
void MatrixMultiplyAMP(
    vector<float>& vC, const vector<float>& vA,
    const vector<float>& vB, int M, int N, int W) {
    array_view<const float,2> a(M,W,vA), b(W,N,vB);
    array_view<writeonly<float>,2> c(M,N,vC);
    parallel_for_each(c.grid,
        [=](index<2> idx) restrict(direct3d) {
            int row = idx[0]; int col = idx[1]; float sum = 0.0f;
            for(int i = 0; i < W; i++)
                sum += a(row, i) * b(i, col);
            c[idx] = sum;
        });
}
(Matrix multiplication using C++ AMP, non-tiled, adapted from Daniel Moth’s blog post.)
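For comparison, here is a plain CPU version of the same computation, assuming the row-major layout that the array_view wrappers impose on the flat vectors (the function name `MatrixMultiplyCPU` is mine). The body of the two outer loops is exactly what each GPU thread computes for its own (row, col) index in the AMP version.

```cpp
#include <vector>

// C (M x N) = A (M x W) * B (W x N), all stored row-major in flat vectors.
void MatrixMultiplyCPU(
    std::vector<float>& vC, const std::vector<float>& vA,
    const std::vector<float>& vB, int M, int N, int W) {
    for (int row = 0; row < M; ++row) {
        for (int col = 0; col < N; ++col) {
            float sum = 0.0f;
            for (int i = 0; i < W; ++i)
                sum += vA[row * W + i] * vB[i * N + col];   // a(row, i) * b(i, col)
            vC[row * N + col] = sum;                        // c(row, col)
        }
    }
}
```

Seen side by side, the appeal of C++ AMP is clear: the two outer loops disappear into `parallel_for_each`, and the inner loop stays the same.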
Thanks for attending this session, and we truly hope that you like the direction C++ is taking. It’s really the renaissance of C++ in the 2010s, starting with Visual Studio 2010 and the new C++ standard, and going forward with Windows 8 support, general purpose GPU programming, and game development. You can download some of the code demos and exercises we used during the day from here.
I apologize for the silence during the last two weeks—organizing the SDP and preparing three full-day sessions and two keynotes left no time to breathe :-)
On Monday I delivered a session called Improving the Performance of .NET Applications at the SELA Developer Practice. Here are some of the practical scenarios we covered:
- Measuring application memory usage and allocation sources
- Diagnosing memory leak sources with memory profilers
- Using sampling and instrumentation profilers to find CPU bottlenecks and methods with problematic cache access patterns
- Reading performance counter information as a lead into more intensive diagnostics
Additionally, we’ve had a couple of hours to talk about more “theoretical” things such as the memory layout of .NET reference types, boxing and its true implications, workstation and server GC flavors, and GC generations.

This is the second time I have delivered this session (the first was at the DevDays in June), and the room was packed again. I thought then, as I do now, that learning how to measure application performance is impossible without some practice time with the tools – so this time we created a unique format with 7 hours of frontal training followed by 2 hours of self-paced hands-on exercises in a computer classroom. I think this workshop format is the best way to experience performance measurement and optimization in a single-day setting.
If you participated in this day, thanks a lot for coming, for your constructive comments and for your interesting questions. If not – you can always take the full course, or else I’ll see you at the next SDP in May :-)
Other sessions delivered today included Windows Phone Mango (by Alex Golesh), HTML 5 (by Gil Fink), WCF Crash Course (by Erez Harari), and Windows Azure (by Manu Cohen-Yashar). Tomorrow Noam and I are up to talk about the new C++ standard and other C++ goodies—stay tuned!