Visual Studio 2012 C++ Auto-Parallelizer

June 17, 2012


As you might have gathered from the scarce reports on the Web and the initial list of new features in Visual Studio 2012, the new C++ compiler can automatically vectorize loop bodies (a feature I’ve already covered in an earlier post) and can also automatically parallelize them across multiple threads.

Here’s an example. Consider the classic prime number calculation loop, designed to count the number of primes in a given range:

__declspec(noinline) bool is_prime(int n) {
    for (int x = 2; x < n; ++x) {
        if (n % x == 0 && n != x) return false;
    }
    return true;
}

LONG count = 0;
for (int i = 3; i < N; ++i) {
    if (is_prime(i)) {
        ++count;
    }
}
printf("Count = %ld\n", count);

This is a classic, ripe candidate for parallelization—although we need to be a little careful with the shared count variable. With N=100000 the loop completes in ~1600ms on my desktop; perhaps the compiler can make it faster automatically.
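For readers who want to reproduce a measurement like this, here is a minimal sketch of a timing harness. It assumes std::chrono for the clock; the helper names count_primes, timed_count_ms and report are hypothetical and not part of the original project, and __declspec(noinline) is left out so the sketch also builds outside Visual C++.

```cpp
#include <cassert>
#include <chrono>
#include <cstdio>

// Trial-division test as in the post; __declspec(noinline) is omitted so the
// sketch builds with compilers other than Visual C++.
bool is_prime(int n) {
    for (int x = 2; x < n; ++x) {
        if (n % x == 0) return false;
    }
    return true;
}

// Hypothetical helper: counts the primes in [3, n) with the serial loop.
long count_primes(int n) {
    long count = 0;
    for (int i = 3; i < n; ++i) {
        if (is_prime(i)) {
            ++count;
        }
    }
    return count;
}

// Hypothetical helper: runs count_primes(n), stores the result in *count,
// and returns the elapsed wall-clock time in milliseconds.
long long timed_count_ms(int n, long* count) {
    assert(count != nullptr);
    auto start = std::chrono::steady_clock::now();
    *count = count_primes(n);
    auto stop = std::chrono::steady_clock::now();
    return std::chrono::duration_cast<std::chrono::milliseconds>(stop - start).count();
}

// Prints the count and the timing, in the spirit of the printf in the post.
void report(int n) {
    long count = 0;
    long long ms = timed_count_ms(n, &count);
    std::printf("Count = %ld, elapsed = %lld ms\n", count, ms);
}
```

Calling report(100000) from main gives a number comparable to the one quoted above, although the absolute value will depend on the machine, the compiler and the optimization settings.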

We go ahead and enable the /Qpar switch in the project properties. This allows the C++ compiler to perform automatic parallelization, but it still sometimes requires an explicit hint regarding the loops that might benefit from parallelization.


This hint is given in the form of a #pragma, which also indicates how many threads you recommend the runtime use:

#pragma loop(hint_parallel(4))
for (int i = 3; i < N; ++i) {
    if (is_prime(i)) {
        ++count;
    }
}

This still takes ~1600ms on my machine, and no parallelization is visible. What’s wrong? The shared variable, of course. The compiler notices that it would be unsafe to parallelize the loop body and refrains from doing it. Changing the loop to…

#pragma loop(hint_parallel(4))
for (int i = 3; i < N; ++i) {
    if (is_prime(i)) {
        InterlockedIncrement(&count);
    }
}

…suddenly works and brings the time down to ~450ms. Here are a representative call stack and the four threads, showing that the underlying engine is the same as in OpenMP (with its #pragma omp directives, introduced in Visual Studio 2005!):

Index  Function
*1      ParallelizingCompilerCpp.exe!wmain$par$1()
2      vcomp110.dll!_vcomp::C2VectParallelRegion::serialCallback(_vcomp::C2VectParallelRegion * c2pr, int)
3      vcomp110.dll!_vcomp::C2VectParallelRegion::parallelCallback_Guided(_vcomp::C2VectParallelRegion * c2pr=0x002af840)
4      vcomp110.dll!_vcomp::fork_helper_wrapper(void (…) *)
5      vcomp110.dll!_vcomp::ParallelRegion::HandlerThreadFunc(void * context=0x002af7dc, unsigned long index=0x00000000)
6      vcomp110.dll!InvokeThreadTeam(_THREAD_TEAM * ptm=0x002dddd8, void (void *, unsigned long) * pvContext=0x002af7dc, void *)
7      vcomp110.dll!_vcomp_fork(int if_test=0x00000001, int arg_count=0x00000001, void (…) * funclet=0x0f941d54, …)
8      vcomp110.dll!_vcomp::C2VectParallelRegion::Execute()
9      vcomp110.dll!C2VectParallel(int start=0x00000003, int end=0x000186a0, int stride=0x00000001, int inclusive=0x00000000, unsigned int numChunks=0x00000004, int schedule=0x00000003, void (int, int, …) * func=0x012018d0, int argcnt, …)
10     Demo.exe!wmain(int argc=0x00000001, wchar_t * * argv=0x002dc178)
11     Demo.exe!__tmainCRTStartup()
12     kernel32.dll!@BaseThreadInitThunk@12()
13     ntdll.dll!___RtlUserThreadStart@8()
14     ntdll.dll!__RtlUserThreadStart@8()

Index  Id     Name                                       Location
*1     2340   Main Thread                                wmain$par$1
2      8084   vcomp110.dll!_vcomp::PersistentThreadFunc  _RtlUserThreadStart@8
3      2444   vcomp110.dll!_vcomp::PersistentThreadFunc  @RtlpAllocateHeap@24
4      6764   vcomp110.dll!_vcomp::PersistentThreadFunc  _RtlUserThreadStart@8
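Since the call stack runs through the OpenMP runtime (vcomp110.dll), it may help to see roughly how the same loop could be written with explicit OpenMP directives. This is only a sketch of a hand-written equivalent, not what the auto-parallelizer actually emits. The helper name count_primes_omp is hypothetical, and schedule(guided) is a guess based on the parallelCallback_Guided frame above. A reduction clause stands in for the interlocked increment.

```cpp
#include <cassert>
#include <cstdio>

// Trial-division test as in the post; __declspec(noinline) is omitted so the
// sketch builds with compilers other than Visual C++.
bool is_prime(int n) {
    for (int x = 2; x < n; ++x) {
        if (n % x == 0) return false;
    }
    return true;
}

// Hypothetical helper: counts the primes in [3, n) with an explicit OpenMP
// parallel loop. reduction(+:count) gives each thread a private counter and
// sums the counters when the loop ends, so no interlocked operation is needed.
// When the compiler is run without OpenMP support, the pragma is ignored and
// the loop simply runs serially with the same result.
long count_primes_omp(int n) {
    long count = 0;
#pragma omp parallel for num_threads(4) schedule(guided) reduction(+:count)
    for (int i = 3; i < n; ++i) {
        if (is_prime(i)) {
            ++count;
        }
    }
    return count;
}
```

With Visual C++ this form needs the /openmp switch rather than /Qpar. The trade-off is that you state the sharing rules yourself instead of relying on the compiler to prove that the loop is safe.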

The documentation is now much better than it was in the Beta, and you can find more details online about the /Qpar compiler switch and the parallelization #pragmas.

I am posting short updates and links on Twitter as well as on this blog. You can follow me: @goldshtn
