Wishes for the CLR JIT in the 2020s

March 3, 2013


There have been some very interesting discussions at the MVP Summit about the CLR JIT: what we expect of it, and how it should evolve. I obviously can’t disclose any NDA material, but I can share my hopes and dreams for the JIT going forward. This is not a terribly popular subject, but there are some UserVoice suggestions related to the JIT, such as adding SIMD support to C#.

Today, the JIT is a fairly fast compiler that does a fairly poor job of optimization. It employs some tricks that are not available to statically compiled languages, such as interface method inlining via profiling, but it lags far behind state-of-the-art dynamic JITs. In my opinion, its biggest weaknesses include:

  • Almost complete lack of support for SIMD operations, both automatic and programmer-specified
  • Problems inlining methods that return or accept complex value types
  • Disparity between the x86 and x64 JITs in code quality and compilation speed
  • Lack of flexibility in inlining and “hot-spot” optimization decisions
  • No additional, breathtaking optimizations in NGen compared to the runtime JIT
  • Insufficient knobs for tweaking JIT behavior, aggressiveness, and memory utilization

Of all these, I think the biggest pain point is SIMD support. In Visual C++ 2012, the compiler can automatically vectorize loops over integers or floats and produce super-efficient vector operations using 128- and 256-bit registers. Auto-vectorization can yield speedups of up to 8x on modern hardware (a 256-bit register holds eight 32-bit floats). The biggest challenge with auto-vectorization is determining whether it is safe, i.e., making sure that no dependencies between loop iterations exist that would break the vectorized version. That kind of analysis is easier in C# than in C++, because safe C# code cannot perform arbitrary pointer arithmetic, which limits the aliasing cases the compiler has to rule out.
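To make the safety question concrete, here is a small C++ sketch. It is plain scalar code that only simulates a four-lane vector unit (no real SIMD instructions), and the function names are made up for illustration. In the loop `a[i] = a[i - 1] + b[i]`, each iteration reads the value the previous iteration just wrote. A naive vectorizer that loads four `a[i - 1]` values before storing four results computes a different answer, which is exactly the kind of dependency the compiler must rule out before vectorizing.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

constexpr std::size_t N = 8;
using Arr = std::array<int, N>;

// Sequential semantics of: for (i = 1; i < N; ++i) a[i] = a[i - 1] + b[i];
// Each iteration reads what the previous one wrote (loop-carried dependency).
Arr prefix_sequential(Arr a, const Arr& b) {
  for (std::size_t i = 1; i < N; ++i) a[i] = a[i - 1] + b[i];
  return a;
}

// Hypothetical naive 4-lane "vectorization": load four a[i - 1] values first,
// then store four results, so lanes never see each other's writes.
Arr prefix_naive_vector(Arr a, const Arr& b) {
  assert(N >= 1);
  std::size_t i = 1;
  for (; i + 4 <= N; i += 4) {
    int loaded[4];
    for (std::size_t lane = 0; lane < 4; ++lane) loaded[lane] = a[i + lane - 1];
    for (std::size_t lane = 0; lane < 4; ++lane) a[i + lane] = loaded[lane] + b[i + lane];
  }
  for (; i < N; ++i) a[i] = a[i - 1] + b[i];  // scalar remainder
  return a;
}
```

By contrast, a loop like `c[i] = a[i] + b[i]` has no such dependency, so processing four lanes at a time gives the same result as the scalar loop.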

Even without auto-vectorization, C++ compilers have offered intrinsics for years that let low-level developers optimize their loops, game engines, and math routines, at the cost of nastiness such as this SSE2 loop, which processes eight 16-bit integers per iteration:

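#include <emmintrin.h>  // SSE2 intrinsics

// Hypothetical wrapper around the loop: a[i] = (b[i] > 0) ? c[i] + 2 : d[i] + k
// Assumes size is a multiple of 8 and all arrays are 16-byte aligned.
void select_add(short* a, const short* b, const short* c, const short* d,
                int size, short k) {
  const __m128i vtwo = _mm_set1_epi16(2);
  const __m128i vk = _mm_set1_epi16(k);
  const __m128i vzero = _mm_setzero_si128();
  for (int i = 0; i < size; i += 8) {
    __m128i vb = _mm_load_si128((__m128i const*)&b[i]);
    __m128i vc = _mm_load_si128((__m128i const*)&c[i]);
    __m128i vd = _mm_load_si128((__m128i const*)&d[i]);
    vc = _mm_add_epi16(vc, vtwo);               // c + 2
    vd = _mm_add_epi16(vd, vk);                 // d + k
    __m128i mask = _mm_cmpgt_epi16(vb, vzero);  // all ones where b > 0
    vc = _mm_and_si128(vc, mask);               // keep c + 2 where b > 0
    vd = _mm_andnot_si128(mask, vd);            // keep d + k elsewhere
    __m128i vr = _mm_or_si128(vc, vd);          // merge both selections
    _mm_store_si128((__m128i*)&a[i], vr);
  }
}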
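For comparison, the intrinsics loop above computes a simple element-wise select. Below is a scalar C++ sketch of the same computation; since the snippet's setup is not spelled out, it assumes that `vtwo` holds 2 and `vk` holds some value `k` in every 16-bit lane and that `vzero` is all zeros. The function name `select_add_scalar` is hypothetical. A one-line loop body like this is the kind of code an auto-vectorizing compiler would ideally turn into the SSE2 sequence by itself.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical scalar reference, one element at a time:
// a[i] = (b[i] > 0) ? c[i] + 2 : d[i] + k
// Assumes vtwo = 2, vk = k, and vzero = 0 in every lane of the intrinsic version.
void select_add_scalar(short* a, const short* b, const short* c,
                       const short* d, std::size_t size, short k) {
  assert(size == 0 || (a && b && c && d));  // arrays must exist if non-empty
  for (std::size_t i = 0; i < size; ++i) {
    // The sum is computed in int and narrowed back to 16 bits; like
    // _mm_add_epi16 it wraps on overflow (guaranteed since C++20, and what
    // mainstream compilers did before that).
    a[i] = static_cast<short>(b[i] > 0 ? c[i] + 2 : d[i] + k);
  }
}
```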

Now, I’m not an advocate for exposing every single intrinsic in C# and having the JIT support it. In fact, I think that would be an ugly approach, albeit a relatively easy one. Another option is to mimic existing C++ vector libraries, using a small number of built-in types whose operations the JIT automatically translates into SIMD instructions. For example:

Vector4f a = new Vector4f(1.0f, 2.0f, 3.0f, 4.0f);
Vector4f b = a.Reverse();
Vector4f c = -a * b/2;
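To sketch what such a type could look like, here is a C++ approximation built on SSE intrinsics. `Vector4f` and `Reverse` come from the C# snippet above, which is itself a proposal rather than an existing API; the C++ struct below is a hypothetical illustration in which each operation maps to one or two SSE instructions (`_mm_mul_ps`, `_mm_div_ps`, `_mm_shuffle_ps`, and so on), the kind of mapping a JIT could perform for built-in vector types. It assumes an x86 target with SSE.

```cpp
#include <cassert>
#include <xmmintrin.h>  // SSE intrinsics (x86 only)

// Hypothetical 4-lane float vector mirroring the C# Vector4f snippet.
struct Vector4f {
  __m128 v;

  Vector4f(float x, float y, float z, float w) : v(_mm_setr_ps(x, y, z, w)) {}
  explicit Vector4f(__m128 m) : v(m) {}

  // Reverse lane order: (x, y, z, w) -> (w, z, y, x) with one shuffle.
  Vector4f Reverse() const {
    return Vector4f(_mm_shuffle_ps(v, v, _MM_SHUFFLE(0, 1, 2, 3)));
  }

  // Read one lane (for inspection only; not a fast path).
  float operator[](int i) const {
    assert(i >= 0 && i < 4);
    float out[4];
    _mm_storeu_ps(out, v);
    return out[i];
  }
};

// Unary minus: subtract every lane from zero.
inline Vector4f operator-(const Vector4f& a) {
  return Vector4f(_mm_sub_ps(_mm_setzero_ps(), a.v));
}

// Lane-wise multiply.
inline Vector4f operator*(const Vector4f& a, const Vector4f& b) {
  return Vector4f(_mm_mul_ps(a.v, b.v));
}

// Divide every lane by the same scalar (broadcast, then divide).
inline Vector4f operator/(const Vector4f& a, float s) {
  return Vector4f(_mm_div_ps(a.v, _mm_set1_ps(s)));
}
```

With these definitions, `Vector4f a(1.0f, 2.0f, 3.0f, 4.0f); Vector4f c = -a * a.Reverse() / 2;` yields the lanes (-2, -3, -3, -2); C++ and C# share the same precedence here: negate, multiply, then divide.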

But there are surely more elegant approaches that also decouple the source code from the processor, so that, for example, larger registers (such as 512-bit registers) are used automatically when they become available. This would likely require either fully automatic vectorization, where you write “a simple loop” and the compiler generates the appropriate instructions, or significant new syntax or attributes that the developer would use to indicate that automatic vectorization is possible. Perhaps a new kind of array, or a new array indexing syntax, would help enable this experience.
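One way to picture that decoupling: a vectorizing compiler strip-mines a simple loop into chunks of W lanes plus a scalar remainder, and only W depends on the hardware. The C++ sketch below is a hypothetical illustration of that transformation, not a CLR or C# feature; the inner lane loop stands in for what would be single SIMD instructions, and `scale_add` and the template parameter `W` are names made up for this example. Going from W = 4 (floats in 128-bit registers) to W = 16 (floats in 512-bit registers) changes nothing else in the source.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical strip-mined version of: for (i) y[i] += alpha * x[i];
// W is the lane count a compiler would choose for the target's registers.
template <std::size_t W>
void scale_add(std::vector<float>& y, const std::vector<float>& x, float alpha) {
  assert(x.size() == y.size());
  const std::size_t n = y.size();
  std::size_t i = 0;
  // Main part: whole chunks of W lanes (one SIMD operation each on real hardware).
  for (; i + W <= n; i += W)
    for (std::size_t lane = 0; lane < W; ++lane)
      y[i + lane] += alpha * x[i + lane];
  // Remainder: leftover elements that do not fill a whole chunk.
  for (; i < n; ++i) y[i] += alpha * x[i];
}
```

Calling `scale_add<4>` and `scale_add<16>` on the same inputs produces identical results, because each element is computed independently; only the chunking differs.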

My next item on the list after SIMD support would be extra work in NGen. Currently, running NGen on your application improves (almost) nothing but startup performance, because the code no longer needs to be compiled at runtime. However, I would be willing to sacrifice compilation time when running NGen, i.e., accept a much slower NGen turnaround, in exchange for more efficient code and additional optimizations.

You are welcome to chime in with your suggestions and thoughts in the comments, and be sure to visit UserVoice and post suggestions there (but first skim the existing ones to avoid diluting votes on suggestions that have already been posted).


I am posting short links and updates on Twitter as well as on this blog. You can follow me: @goldshtn


3 comments

  1. Guest, March 3, 2013 at 8:25 AM

    2020 ????
    I thought they were already working on this …
    http://www.compilerjobs.com/db/jobs_view.php?editid1=648

  2. tobi, March 3, 2013 at 2:31 PM

    Mostly games would benefit from SIMD. My #1 item would be all the general optimizations that apply to typical ASP.NET/WCF server applications. I believe we could easily get 20% more throughput there for free.

  3. tobi, March 3, 2013 at 2:33 PM

    I forgot to mention that escape analysis would be *very* useful for allocating small, temporary objects on the stack. I'm thinking of iterators (LINQ) and strings here. It's a tremendous optimization opportunity, especially as immutability is used more and more.
