Materials from TechDays Netherlands 2015

August 31, 2015

7 comments

Oops! This has been sitting in my queue for several months, and I just noticed it still needs to be published. Better late than never, I guess. Here goes:

I’ve been lucky enough to be invited to speak at TechDays Netherlands again this year. This time I was asked to do four talks on some of my favorite subjects: performance optimization, debugging, and diagnostics. Same as last year, the conference was impeccably organized.

I’m really looking forward to next year’s TechDays 🙂 In the meantime, here are the materials from my talks.

Making .NET Applications Faster [slides, demos]

My usual favorite on improving .NET application performance through better choice of collection types, reducing pressure on the garbage collector, and improving application startup times.

Mastering IntelliTrace in Development and Production [slides]

Another favorite on one of Visual Studio’s more powerful debugging features, which is unfortunately hidden behind the Ultimate edition’s paywall.

Visual Studio Diagnostic Hub [slides]

Visual Studio’s debugging and performance optimization experiences keep consolidating into the unified diagnostic hub (Alt+F2). This talk covered some of the good old features such as the sampling profiler, as well as new features introduced in Visual Studio 2015.

Making Software 10x Faster with Low-Level CPU Optimizations [slides, demos]

A new talk which explains how CPU caches, pipelines, stalls, and other low-level issues that apparently only hardware designers should understand can have a profound effect on application performance.

7 comments

  1. thomas, September 3, 2015 at 10:31 AM

    Wrong link for
    Making .NET Applications Faster => Demos
    (it’s the ppt link)

  2. thomas, September 3, 2015 at 10:37 AM

    Low-Level CPU Optimizations sample:
    I get
    An unhandled exception of type ‘System.TypeLoadException’ occurred in mscorlib.dll

    Additional information: Could not load type ‘System.Numerics.Vector`1’ from assembly ‘System.Numerics.Vectors, Version=4.0.0.0, Culture=neutral, PublicKeyToken=b03f5f7f11d50a3a’.

    1. Sasha Goldshtein, September 27, 2015 at 10:04 AM

      Yeah, that’s an old version of that NuGet package. You need to rip it out and replace with System.Numerics.Vectors, retarget for .NET 4.6, and make a few minor code changes (mostly change Vector.Length to Vector.Count).
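
As a rough illustration of the rename mentioned in the reply above, here is a minimal sketch of a vectorized sum written against the current System.Numerics.Vectors API, where the lane count is read from `Vector<float>.Count`. It assumes the project references System.Numerics.Vectors and targets .NET 4.6 or later. The class and method names (`VectorSumSketch`, `Sum`) are made up for this example and are not taken from the talk's demos.

```csharp
using System;
using System.Numerics;

public static class VectorSumSketch
{
    // Sums an array one SIMD vector at a time, then handles the leftover tail.
    public static float Sum(float[] data)
    {
        // Number of float lanes per vector; per the reply above, older
        // package versions exposed this as Length rather than Count.
        int lanes = Vector<float>.Count;
        Vector<float> acc = Vector<float>.Zero;

        int i = 0;
        for (; i <= data.Length - lanes; i += lanes)
        {
            acc += new Vector<float>(data, i); // loads `lanes` consecutive elements
        }

        float total = 0f;
        for (int lane = 0; lane < lanes; ++lane)
        {
            total += acc[lane]; // horizontal add of the accumulator lanes
        }
        for (; i < data.Length; ++i)
        {
            total += data[i]; // scalar tail for lengths not divisible by `lanes`
        }
        return total;
    }

    public static void Main()
    {
        var data = new float[1000];
        for (int i = 0; i < data.Length; ++i) data[i] = 1f;

        Console.WriteLine("Lanes: {0}, hardware accelerated: {1}",
            Vector<float>.Count, Vector.IsHardwareAccelerated);
        Console.WriteLine("Sum: {0}", Sum(data));
    }
}
```

If `Vector.IsHardwareAccelerated` prints `False`, the code still runs but falls back to a software implementation, which usually means the JIT in use does not support SIMD.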

  3. Matt Warren, September 7, 2015 at 1:16 PM

    I’m a bit confused about the Mod04_CacheExploration sample from the Low-Level CPU Optimisations talk.

    In it you are measuring this code (for strides of 2, 4, 8, 16, 32, 64, 128 and 256)


    for (int j = 0; j < REPETITIONS; ++j)
    {
        if (j == 1)
        {
            // discard first iteration
            sw.Start();
        }
        // i already advances by stride, so index with i directly
        for (int i = 0; i < memory.Length; i += stride)
        {
            memory[i] *= 3;
        }
    }

    But all that happens is that the inner loop runs fewer times as the stride increases. For instance, with stride = 4 the inner loop runs 4,194,304 times, but with a stride of 256 it only runs 65,536 times.

    I would think that the inner loop needs to look like this, so that it runs the same number of times regardless of the stride size:


    for (int i = 0; i < TOTAL_ARRAY_LENGTH / 256; i++) // 256 is the "max" cache stride tested
    {
        // Access memory locations that are "stride" bytes apart
        memory[i * stride] *= 3;
    }

    That way, what varies is which array locations are accessed each time, so that the effect of the stride is seen.

    On my machine I then get the following timings (which seem sensible as according to CPUID I have a 64-byte line size):

    *** CACHE LINE SIZES DEMONSTRATION ***
    Stride  Time (ms)
         1      3.840
         2      3.689
         4      3.686
         8      4.239
        16      4.562
        32     12.228
        64     16.489
       128     18.300
       256     18.960

    Or am I missing something?

    1. Matt Warren, September 7, 2015 at 1:20 PM

      Sorry, I pasted timings from a DEBUG build; here are the proper ones:


      Stride  Time (ms)
           1      1.157
           2      1.085
           4      1.532
           8      1.844
          16      3.304
          32      6.878
          64     11.205
         128     10.679
         256     11.555

    2. Sasha Goldshtein, September 27, 2015 at 10:04 AM

      You can see the result either way, actually, because when the strides are smaller than the cache line size, the perf differences are smaller than 2x when the stride increases by 2x.
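
For anyone who wants to try the fixed-access-count variant discussed in this thread, here is a minimal, self-contained sketch. It is not the Mod04_CacheExploration sample itself. The element type (`int`), the array size (16M elements), the repetition count, and all names (`StrideSketch`, `MeasureMs`, `Accesses`) are assumptions chosen for illustration. With 4-byte elements and a 64-byte cache line, strides below 16 elements touch the same line more than once, so the time per pass should grow slowly at first and then level off once every access lands on a different line.

```csharp
using System;
using System.Diagnostics;

public static class StrideSketch
{
    const int MaxStride = 256;       // largest stride tested, in elements
    const int Accesses = 64 * 1024;  // same number of accesses for every stride
    const int Repetitions = 100;     // assumed; tune for your machine

    // Returns the average time in milliseconds of one pass over the array,
    // touching `Accesses` elements that are `stride` elements apart.
    public static double MeasureMs(int[] memory, int stride)
    {
        var sw = new Stopwatch();
        for (int j = 0; j < Repetitions; ++j)
        {
            if (j == 1)
            {
                sw.Start(); // discard the first (warm-up) repetition
            }
            for (int i = 0; i < Accesses; ++i)
            {
                memory[i * stride] *= 3;
            }
        }
        sw.Stop();
        return sw.Elapsed.TotalMilliseconds / (Repetitions - 1);
    }

    public static void Main()
    {
        // Large enough that the biggest stride still stays in bounds.
        var memory = new int[Accesses * MaxStride];

        Console.WriteLine("Stride  Time (ms)");
        for (int stride = 1; stride <= MaxStride; stride *= 2)
        {
            Console.WriteLine("{0,6}  {1,9:F3}", stride, MeasureMs(memory, stride));
        }
    }
}
```

Run it from a Release build without a debugger attached; as the corrected timings earlier in the thread show, a DEBUG build changes the numbers considerably.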
