C++ Sink Parameter Passing

Thursday, August 21, 2014

C++ is the most complex language I know, and its parameter-passing rules have only become more arcane now that we have rvalue references in C++11. In this post I'd like to examine the specific scenario of passing parameters to a sink method, one that consumes its parameters. What do I mean by "consume"? If a parameter is movable and safe to move from (i.e., it's an rvalue), the sink should move from it; if a parameter is only copyable, it should copy it. Here are a couple of sink methods from the C++ Standard Library: std::vector's push_back method will move from...

Materials From My SDP 2014 Sessions and Workshops

Sunday, July 6, 2014

This year's first SDP has been a huge success, with over 1,200 developers signed up for a wide variety of workshops and talks. The snow didn't keep me from getting to Tel-Aviv this time, and I enjoyed the conference atmosphere, the talks, and some great conversations. View from one of the SDP rooms. Really hard to stay focused on developer stuff :) I'm also VERY MUCH behind on emails and everything else that isn't directly related to the conference -- so it's going to take me a while to recuperate. In the meantime, here are the materials...

.NET Native Performance and Internals

Monday, April 28, 2014

Introduction to .NET Native

.NET Native is a compilation and packaging technology that compiles .NET applications to native code. It uses the C++ optimizing compiler backend, removes the need for any JIT compilation at runtime, and drops the dependency on the .NET Framework installed on the target machine. Formerly known as "Project N", .NET Native is currently in public preview, and this post explores the internals of the compilation process and the resulting executable's runtime performance. At this time, .NET Native is only available for C# Windows Store apps compiled to x64 or ARM, so the experiments below are based on...

C# Vectorization with Microsoft.Bcl.Simd

Tuesday, April 22, 2014

tl;dr: A couple of weeks ago at Build, the .NET/CLR team announced a preview release of a library, Microsoft.Bcl.Simd, that exposes a set of JIT intrinsics on top of CPU vector instructions (a.k.a. SIMD). This library relies on RyuJIT, another preview technology that aims to replace the existing JIT compiler. When using Microsoft.Bcl.Simd, you program against a vector abstraction that is then translated at runtime to the appropriate SIMD instructions your processor supports. In this post, I'd like to take a look at what exactly this SIMD support is about, and show you some examples of what kind of...

Workshops at Sela Developer Practice, December 2013: Improving .NET Performance and .NET/C++ Interop Crash Course

Thursday, December 19, 2013

In addition to my three breakout sessions, I've also had the pleasure of delivering two workshops at the Sela Developer Practice: Improving .NET Performance and .NET/C++ Interop Crash Course. Although these workshops are quite time-tested, I always try to add new materials and tools to make them more interesting for both myself and the audience. There's also constant interest in these topics -- I had 110 people registered for the performance workshop and more than 40 people at the interop course. In the performance workshop, we cover various performance measurement tools. I always try to squeeze in new tools in...

Uneven Work Distribution and Oversubscription

Wednesday, October 23, 2013

A few days ago I was teaching our Win32 Concurrent Programming course and showed students an experiment with the std::thread class introduced in C++11. The experiment is designed to demonstrate how to partition work across multiple threads and coordinate their execution; the work to partition is simply counting the number of primes in a certain interval. You can find the whole benchmark here. The heart of the code is the parallelize_count function, below: void parallelize_count(unsigned nthreads, unsigned begin, unsigned end) { std::vector<std::thread> threads; unsigned...

On ‘stackalloc’ Performance and The Large Object Heap

Thursday, October 17, 2013

An interesting blog post is making the rounds on Twitter, 10 Things You Maybe Didn’t Know About C#. There are some nice points in there, such as using the FieldOffset attribute to create unions or specifying custom add/remove accessors for events. However, item #4 on the list claims that using stackalloc is no faster than allocating a standard array. The proof is given in the form of a benchmark program that allocates 10,000-element arrays – so far so good – and then proceeds to store values in them. The values are obtained by using Math.Pow. The benchmark results...

Talks from DevConnections 2013: Advanced Debugging with WinDbg and SOS, Task and Data Parallelism, and Garbage Collection Performance Tips

Thursday, October 10, 2013

I'm falling behind in documenting all my travels this fall :-) At the beginning of the month I flew out to Vegas for IT/DevConnections, which was my second Las Vegas conference this year. I was there for just 48 hours, but it was enough time to deliver three talks, meet fellow speakers, and even have a few meaningful chats with attendees about the future of .NET and production debugging techniques. You can find my presentations below -- the last couple of slides in each presentation have some additional references and books that might be useful if you want to expand...

Lock vs. Mutex

Friday, July 12, 2013

Here’s a quick brainteaser for you. Suppose you really want to find all the prime numbers in a certain range, and store them in a List<uint>. And also suppose that you want to parallelize that calculation to make it as quick as possible. You then need to synchronize access to the list so that it’s not corrupted by add operations performed in multiple threads. Would it be better to use a C# lock (CLR Monitor) or a Windows mutex to protect the list of primes? Parallel.For(2, 400000, n => { ...

Introduction to Performance Measurement Session

Wednesday, July 3, 2013

I delivered a short two-hour session today introducing performance measurement tools. We covered performance counters – including a demo of custom performance counters, the Visual Studio profiler (sampling, instrumentation, allocations, and concurrency), and finally capturing ETW information using PerfView. Introduction to .NET Performance Measurement from Sasha Goldshtein The slides and demos are available here. In the Allocations folder you’ll find an app that allocates memory rapidly because it uses string concatenation instead of StringBuilder. In the Leak folder you’ll find a classic memory leak. In the Concurrency folder you’ll find a naïve parallelization attempt...