October 2011 - Posts
I’m interrupting our scheduled programming for an important announcement: we will be hosting the SELA Developer Practice at the Crowne Plaza hotel (Tel Aviv) and the SELA headquarters on December 4-8, 2011!

The format is (again) slightly different—we will be having a day full of keynote sessions on Windows 8 and other //build announcements, including Visual Studio 11 and .NET Framework 4.5. Then, we will host 22 full-day tutorials on a wide variety of topics—old and new—parallel programming, Windows 8 development, TFS, Windows Phone Mango, HTML 5, .NET debugging, and many others.
The speakers are what makes or breaks a conference, and this time we have a lineup of six Microsoft MVPs, twelve SELA architects and senior consultants, and two special guests: Guy Burstein and Maor David-Pur from Microsoft Israel.
I will speak at the first keynote of the day, introducing Windows 8, the frameworks, and the development experience. Additionally, I will deliver three tutorial days: the several-times-successful .NET Performance, .NET Debugging, and a unique new tutorial with Noam Sheffer titled Everything New in C++, covering C++11, C++/CX, and the C++ AMP extensions.
This is the third time this year we are delivering a conference of this scale, and I’m sure it’s going to be a blast. Thanks for your continued support, and I hope you learn from this conference and enjoy its unique atmosphere!
One small tip to conclude this announcement: if you’re considering attending, I should advise you to register as quickly as possible. Last time around (in June) we had most tutorial days sell out several weeks before the conference.
I have been recently posting short updates and links on Twitter as well as on this blog. You can follow me: @goldshtn
How many times have you seen in code reviews a piece of code that calls a method, say Dictionary<K,V>.TryGetValue, and ignores the return value? We are going on a quest to find all such invocations and produce a warning.
We’re going to derive from SyntaxWalker (and not SyntaxRewriter), because we won’t be doing any rewriting—just issue detection*. There are two major cases we need to consider:
- The method is invoked without storing its result in a local variable or using it as part of an expression. Two examples:
dict.TryGetValue(1, out r);
for (int.TryParse(s, out i); ; ) …
- The method is invoked and its result is assigned to a local variable, but that local variable is never read throughout the rest of the method.
The skeleton of our SyntaxWalker is as follows—this should seem familiar, with the exception of the VisitMethodDeclaration method that doesn’t return a value (remember, we’re not doing any rewriting—just inspection).
class MyIgnoredBooleanReturnValueLocator : SyntaxWalker
{
    private readonly SemanticModel _semanticModel;
    private readonly HashSet<SyntaxKind> _assignExprs;
    public MyIgnoredBooleanReturnValueLocator(
        SemanticModel model)
    {
        _semanticModel = model;
        _assignExprs = new HashSet<SyntaxKind>(
            new[] {
                SyntaxKind.AndAssignExpression,
                SyntaxKind.AssignExpression,
                SyntaxKind.ExclusiveOrAssignExpression,
                SyntaxKind.OrAssignExpression
            });
    }
    protected override void VisitMethodDeclaration(
        MethodDeclarationSyntax node)
    {
    }
}
Why are we visiting the method declaration? The whole analysis seems appropriate at the method body level—this will make it easier to answer questions such as “is this local variable read throughout the rest of the method?”. To start, we need to look at all method invocations in the method we’re visiting:
foreach (InvocationExpressionSyntax invocation in
    node.DescendentNodes()
        .OfType<InvocationExpressionSyntax>())
{
    SemanticInfo methodInfo =
        _semanticModel.GetSemanticInfo(
            invocation.Expression);
    MethodSymbol methodSym = (MethodSymbol)
        methodInfo.Symbol;
    Symbol localVariableAssigned;
    SyntaxToken localVariableInitialized;
    bool needTrack = CheckInvocation(
        invocation,
        out localVariableAssigned,
        out localVariableInitialized);
    if (!needTrack)
        continue;
    //...
}
The CheckInvocation method checks whether the method invocation adheres to one of the suspicious patterns we have detected. This requires some work, as the tree structures of these patterns are quite different. We have the following four cases:
1. The parent of the InvocationExpressionSyntax is an ExpressionStatementSyntax, representing the case where the method is invoked without treating the result as an expression at all.
2. The parent of the InvocationExpressionSyntax is a ForStatementSyntax, in which case we need to check if the method invocation is a direct descendant of the Initializers or Incrementors collection on the ForStatementSyntax.
3. The parent of the InvocationExpressionSyntax is a BinaryExpressionSyntax, in which case the Left property must be an IdentifierNameSyntax representing a local variable and the expression's Kind must be one of the assignment expressions (=, &=, etc.). This represents a local variable being assigned the method’s return value.
4. The parent of the InvocationExpressionSyntax is an EqualsValueClauseSyntax whose parent is a VariableDeclaratorSyntax, representing a local variable being initialized with the method’s return value.
In cases #3 and #4 we want to return to the caller the local variable to track. There is another complication here, as in case #3 we have a symbol representing the local variable to speak of, whereas in case #4 we have only a SyntaxToken representing the identifier of the local variable. With that said, here’s the code:
private bool CheckInvocation(
    InvocationExpressionSyntax invocation,
    out Symbol localVariableAssigned,
    out SyntaxToken localVariableInitialized)
{
    localVariableAssigned = null;
    localVariableInitialized = NoToken;
    SemanticInfo info = _semanticModel.GetSemanticInfo(
        invocation);
    MethodSymbol methodSymbol = (MethodSymbol)info.Symbol;
    // Only Boolean-returning methods with "Try" in their name
    // are of interest.
    if (methodSymbol.ReturnType.SpecialType !=
        SpecialType.System_Boolean)
        return false;
    if (!methodSymbol.Name.Contains("Try"))
        return false;
    // Case 1: the invocation is a statement of its own.
    if (invocation.Parent is ExpressionStatementSyntax)
    {
        WARN("Invocation of {0} ignores its return value.",
            methodSymbol.Name);
        return false;
    }
    // Case 2: the invocation is an initializer or incrementor
    // of a 'for' statement.
    ForStatementSyntax forStmt =
        invocation.Parent as ForStatementSyntax;
    if (forStmt != null)
    {
        if (forStmt.Initializers.Contains(invocation) ||
            forStmt.Incrementors.Contains(invocation))
        {
            WARN("Invocation of {0} as part of the 'for' " +
                "statement ignores its return value.",
                methodSymbol.Name);
            return false;
        }
    }
    // Case 3: the result is assigned to a local variable.
    BinaryExpressionSyntax binaryExpr =
        invocation.Parent as BinaryExpressionSyntax;
    if (binaryExpr != null &&
        _assignExprs.Contains(binaryExpr.Kind))
    {
        IdentifierNameSyntax id =
            binaryExpr.Left as IdentifierNameSyntax;
        if (id != null)
        {
            Symbol symbol =
                _semanticModel.GetSemanticInfo(id).Symbol;
            if (symbol.Kind == SymbolKind.Local)
            {
                localVariableAssigned = symbol;
                return true;
            }
        }
    }
    // Case 4: the result initializes a local variable.
    EqualsValueClauseSyntax equalsClause =
        invocation.Parent as EqualsValueClauseSyntax;
    if (equalsClause != null)
    {
        VariableDeclaratorSyntax varDecl =
            equalsClause.Parent as
                VariableDeclaratorSyntax;
        if (varDecl != null)
        {
            SyntaxToken localVar = varDecl.Identifier;
            localVariableInitialized = localVar;
            return true;
        }
    }
    return false;
}
In the latter two cases, the caller needs to track the local variable being initialized or assigned throughout the rest of the method and determine whether it’s being used. For simplicity (and brevity!) we’ll ignore the case that the variable is being overwritten before it’s read, or being read conditionally—that’s not to say these cases can’t be treated, at least partially**.
How are we going to figure out whether the variable is being read? The naïve approach would be to examine every possible syntax node that might read the variable. This would be very difficult and error-prone. Instead, Roslyn offers a data-flow analysis API that can answer questions like “is this variable read/written in this block?” or “which variables are written outside this region?”. (To be fair, there’s also a control-flow analysis API, not shown here, which can answer questions like “what are all the locations where control leaves this block?” and “what are all the target locations to which control arrives in this block?”.)
Going back to our VisitMethodDeclaration method, we can complete the tracking code with the following:
RegionDataFlowAnalysis flow =
    _semanticModel.AnalyzeRegionDataFlow(
        TextSpan.FromBounds(invocation.FullSpan.End,
            node.Span.End));
if (localVariableAssigned != null)
{
    if (!flow.ReadInside.Contains(localVariableAssigned))
    {
        WARN("The local variable {0} assigned the " +
            "return value of {1} is never read.",
            localVariableAssigned.Name, methodSym.Name);
    }
}
else if (localVariableInitialized != NoToken)
{
    VariableDeclaratorSyntax varDecl =
        (VariableDeclaratorSyntax)invocation.Parent.Parent;
    if (!flow.ReadInside.Any(
        sym => sym.Name ==
            localVariableInitialized.ValueText &&
            sym.Kind == SymbolKind.Local &&
            sym.Locations.Any(loc =>
                loc.SourceSpan.IntersectsWith(varDecl.Span))))
    {
        WARN("The local variable {0} initialized with " +
            "the return value of {1} is never read.",
            localVariableInitialized.ValueText,
            methodSym.Name);
    }
}
This concludes our syntax analysis—we have detected with absolute certainty several cases in which the method’s return value is ignored. There are some cases we don’t detect—but hopefully they are a small minority.
* Even though it could be interesting to automatically insert code that checks the return value and throws an appropriate exception, this is probably a bad idea in most cases :-)
** Trying to solve the general problem with absolute precision is simply impossible. Compiler theorists simply can’t pass up on the opportunity to discuss an undecidable problem, so here goes:
The Halting Problem can be reduced to deciding the language
L = { <x,P>: the variable x is read in every execution of program P }
The reduction proceeds as follows. Given an input <T,w> we construct the following program P:
1. int x = 0
2. run T on w
3. print(x)
Now, P reads the value of x iff P reaches line #3 iff T halts on w. This completes the reduction, showing that the language is undecidable (because the Halting Problem is undecidable), and leaves only heuristics to speak of.
Last time around, we were replacing the 42 numeric literal with 43. This time let’s pretend to do something more useful. Suppose you really don’t like developers calling the Console.Write method and insist on using Console.WriteLine instead. You might be slightly reluctant to use find-and-replace, because—just like last time—you don’t want to modify Console.Write calls within comments, within string literals, or—and this is vicious—calls to the Console.Write method on something that is not the System.Console class from the mscorlib assembly, like maybe a property called Console!
The C# parser, which we met in its SyntaxTree incarnation, doesn’t bind MethodInvocationExpression instances to the actual method being invoked. All it cares about is the proper structure of the expression. For all it cares, Console could be a private class, and not the BCL one.
Enter the semantic model (SemanticModel class), which represents everything the compiler knows about your code after binding the syntax tree to symbols. In this case, the semantic model will give us a symbol for the Console.Write invocation expression, and we’ll be able to tell which Console.Write is being invoked and replace it accordingly.
To obtain a SemanticModel instance, we need to provide Roslyn with all the information to perform binding—i.e., the assembly references for our code. Recall that we could create a SyntaxTree without specifying those! This information is wrapped by a Compilation class instance, which can be used (eventually) to emit actual code.
Compilation compilation = Compilation.Create(
    "MyCompilation",
    CompilationOptions.Default,
    new SyntaxTree[] { tree },
    new MetadataReference[] {
        new AssemblyFileReference(
            typeof(object).Assembly.Location)
    },
    null, null);
SemanticModel model = compilation.GetSemanticModel(tree);
Now that we have the semantic model, we can pass it to our rewriter’s constructor:
/// <summary>
/// Replaces Console.Write calls with equivalent
/// Console.WriteLine calls.
/// </summary>
class MyConsoleWriteRewriter : SyntaxRewriter
{
    private readonly SemanticModel _semanticModel;
    public MyConsoleWriteRewriter(SemanticModel model)
    {
        _semanticModel = model;
    }
    protected override SyntaxNode
        VisitInvocationExpression(
            InvocationExpressionSyntax node)
    {
        SemanticInfo info =
            _semanticModel.GetSemanticInfo(node);
        MethodSymbol symbol = (MethodSymbol)info.Symbol;
        if (symbol.Name == "Write" &&
            symbol.ContainingType.Name == "Console" &&
            symbol.ContainingNamespace.Name == "System" &&
            symbol.ContainingAssembly.Name == "mscorlib")
        {
            MemberAccessExpressionSyntax old =
                (MemberAccessExpressionSyntax)
                    node.Expression;
            return node.ReplaceNode(
                old,
                old.Update(
                    old.Expression,
                    old.OperatorToken,
                    Syntax.IdentifierName("WriteLine")));
        }
        return node;
    }
}
To understand what’s going on here, let’s take a look at the structure of the MethodInvocationExpression node for a typical Console.Write call:

The MethodInvocationExpression, in this case, consists of a MemberAccessExpression, which specifies the method to invoke, and an ArgumentList that specifies the arguments. Because we trust Console.WriteLine to accept the same arguments Console.Write accepts, we don’t need to touch the ArgumentList node. Moreover, we don’t even need to touch the first IdentifierName under the MemberAccessExpression—all we need to replace is the second IdentifierName.
Therefore, we return a new node from our VisitInvocationExpression method whenever we have something to replace the existing node with. Specifically, we ask the semantic model to give us symbol information for the method invocation expression—if it matches the System.Console.Write method from the mscorlib assembly, we keep the entire expression except the method name identifier.
Of course, to apply this rewriter to our tree, we need to provide it with the SemanticModel instance retrieved earlier:
SyntaxNode newRoot =
new MyConsoleWriteRewriter(model).Visit(tree.Root);
This actually starts looking useful. Next time, we won’t bother with rewriting, but instead perform more complicated analysis of the syntax tree and semantic model, including data flow and control flow within a method.
To start doing something useful with Roslyn, we’re going to inspect a syntax tree, locate something interesting—and then modify it! The complex structure of a C# program’s syntax tree (SyntaxTree class) is exposed through a fairly intuitive object model, featuring three types of entities:
Nodes are the major elements of the language; for example, an IfStatementSyntax is a node representing an “if” statement and a LiteralExpressionSyntax is a node representing a literal expression.
Tokens are secondary elements (which are nonetheless very important) such as identifiers, string literals, and numeric literals. Tokens are always attached to a node. For example, an IfStatementSyntax node will have an ExpressionSyntax node as its Condition property, and that might turn out to be a BinaryExpressionSyntax node with Left and Right properties describing more ExpressionSyntax nodes and an OperatorToken token that describes the operation.
Trivia are all the rest—preprocessor directives, whitespace, comments—riding on top of tokens.
The syntax tree, of course, has parent-child relationships between nodes, and there’s a set of APIs for traversing these relationships, such as DescendantNodes and FirstAncestor.
Syntax trees can be created from source code very easily. It is also a very quick process, because no binding or code emission takes place—only the lexer and parser are involved in the construction of the tree. (In the next post we’ll look at the semantic model as well, which requires symbol construction and binding.)
SyntaxTree tree = SyntaxTree.ParseCompilationUnit(@"
    using System;
    public class MyClass {
        public static void MyMethod() {
            Console.Write(""Hello There {0}"", 42);
            Console.Write(42);
        }
    }
");
Console.WriteLine(tree.Root.GetFullText());
Inspecting the tree in the debugger visualizer (supplied with the Roslyn CTP as a sample) shows the following structure:

You can inspect and modify syntax trees directly, but the easier way would be to use a visitor class derived from SyntaxWalker or SyntaxRewriter. A quick code demo is better than a thousand words describing it, so here’s a rewriter that will modify numeric literals with the value 42 to the value 43:
/// <summary>
/// Replaces the numeric literal 42 with the
/// numeric literal 43.
/// </summary>
class MyLiteralRewriter : SyntaxRewriter
{
    protected override SyntaxNode VisitLiteralExpression(
        LiteralExpressionSyntax node)
    {
        if (node.Kind ==
            SyntaxKind.NumericLiteralExpression)
        {
            SyntaxToken token = node.Token;
            if (token.Value is int &&
                (int)token.Value == 42)
            {
                return node.ReplaceToken(
                    token, Syntax.Literal(
                        token.LeadingTrivia,
                        "43", 43,
                        token.TrailingTrivia));
            }
        }
        return node;
    }
}
Note that the VisitLiteralExpression method does not modify the node—it either returns the existing node, or returns a new node with a new literal token. The entire Roslyn API surface is like that—all objects are immutable, and you create new objects off existing ones.
How is this visitor applied to a syntax tree? To apply it, we need to give it the tree root, and it will return a new tree root. This new tree root can be compiled, analyzed, or simply … serialized to text:
SyntaxNode newRoot =
    new MyLiteralRewriter().Visit(tree.Root);
tree = SyntaxTree.Create(
    tree.FileName, (CompilationUnitSyntax)newRoot);
Console.WriteLine(tree.Root.GetFullText());
It goes without saying that this rewriter will detect only numeric literals—it will not match the number 42 when it appears in comments or in strings, as the more fallible regex-based approach may.
In the next post, we’ll look into a somewhat more complicated syntax rewriting visitor, which will require the semantic model of the code (i.e., symbols and their meanings) and not just the syntactic information.
The Roslyn project is the Microsoft implementation of C# and VB compilers-as-a-service. Roslyn provides a transparent view into the inner workings of the compiler, including syntax tree inspection and modification. An initial CTP of Roslyn was released for download a couple of days ago; it requires Visual Studio 2010 SP1 and the VS2010 SP1 SDK.
Some of the scenarios enabled by Roslyn are the following:
Refactoring. Refactoring tools no longer need to parse the original source code and then reconstruct it manually. Instead, a set of rewriting APIs is available on the syntax tree level, making possible classical refactorings—“make readonly”, “extract method”, “extract interface”—as well as incredible new things yet to be invented.
Code analysis. Performing static code analysis—especially when data flow and control flow are involved—is an extremely difficult task without the syntactic and semantic compiler engines. It becomes possible to answer questions like “is this local variable returned from the method?”, “is this parameter assigned in this specific block under that specific condition?”, “is this method invoked with a compile-time constant expression?”.
“Quick Fix”. Visual Studio extensions that run in the background can detect common source code issues and suggest a fix.
REPL (Read-Eval-Print Loop). Testing code quickly against a given API, or inventing a new API and then trying it out, is easier than ever with the C# Interactive Window in Visual Studio. It is more powerful than the Immediate Window, because you can use the full power of the language (including lambdas and query comprehensions) and because you can define new types or methods and immediately use them.
Scripting. Roslyn ships with a command-line tool called rcsi.exe that evaluates .csx files—script files written in C# that are clear of the “class Program … void Main” clutter and focus on exactly the task at hand. I’ll go as far as saying that this is serious competition for PowerShell!
Embedded compiler. It is now easier to embed the C# compiler in an application and provide an ambient context (host object model) implicitly to the compiled code. Think DSLs and rule engines that take advantage of the full power of C#.
In the next couple of posts we’ll perform some experimentation with the Roslyn APIs. If you are looking for more, right away, then go ahead and read the “Getting Started” documents and the samples that ship with the CTP.
.NET 4.5 has a little hidden gem up its sleeve – the ExceptionDispatchInfo class. It’s used by the Task Parallel Library to capture and rethrow exceptions when they are not aggregated – specifically, to support the await keyword. Luckily, the class is public and can be used by anyone to capture an exception that occurred in one context – say, a thread – and then rethrow it (selectively) in another context – say, on another thread, while maintaining the full fidelity of the original stack trace and exception information.
First, let’s take a look at a stack trace that uses ExceptionDispatchInfo to rethrow a captured exception. Given the following async method:
private static async void ThrowingMethod()
{
    try
    {
        await Task.Run(() => { throw new InvalidOperationException(); });
    }
    catch (Exception ex)
    {
        Console.WriteLine(ex);
    }
}
The exception that occurs in the task is marshaled to the continuation, where it is rethrown and caught by the catch block. And here’s the output:
System.InvalidOperationException: Operation is not valid due to the current state of the object.
at VanillaCSharpConsoleApp.Program.<ThrowingMethod>b__3()
at System.Threading.Tasks.Task`1.InnerInvoke()
at System.Threading.Tasks.Task.Execute()
--- End of stack trace from previous location where exception was thrown ---
at System.Runtime.CompilerServices.TaskAwaiter.ThrowForNonSuccess(Task task)
at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification(Task task)
at System.Runtime.CompilerServices.TaskAwaiter.GetResult()
at VanillaCSharpConsoleApp.Program.<ThrowingMethod>d__5.MoveNext()
Note the separator line (“--- End of stack trace from previous location where exception was thrown ---”) between the location where the exception was thrown and captured and the location where it was rethrown.
Now let’s try something similar in custom code – we’ll throw an exception on a thread pool thread, capture it, transfer it to the main thread, inspect it, and optionally rethrow it. It’s really easy – you use ExceptionDispatchInfo.Capture to create an ExceptionDispatchInfo instance, which can now be transferred from place to place (as long as it doesn’t cross AppDomain boundaries – the class is not serializable). Next, you inspect the captured exception using the SourceException property; and finally, you rethrow the captured exception using the Throw method.
var exceptions = new BlockingCollection<ExceptionDispatchInfo>();
ThreadPool.QueueUserWorkItem(_ =>
{
    try
    {
        Foo();
    }
    catch (Exception ex)
    {
        ExceptionDispatchInfo edi = ExceptionDispatchInfo.Capture(ex);
        exceptions.Add(edi);
    }
    exceptions.CompleteAdding();
});
foreach (ExceptionDispatchInfo edi in exceptions.GetConsumingEnumerable())
{
    try
    {
        if (edi.SourceException.Message.Contains("invalid"))
        {
            edi.Throw();
        }
    }
    catch (Exception ex)
    {
        Console.WriteLine(ex);
    }
}
The program’s output is:
System.InvalidOperationException: This operation is, well, invalid.
at VanillaCSharpConsoleApp.Program.Bar()
at VanillaCSharpConsoleApp.Program.Foo()
at VanillaCSharpConsoleApp.Program.<>c__DisplayClass1.<Main>b__0(Object _)
--- End of stack trace from previous location where exception was thrown ---
at System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw()
at VanillaCSharpConsoleApp.Program.Main(String[] args)
What does all of this mean for practical purposes? In any situation where an exception thrown in one place has to be rethrown in another, you would typically do that by wrapping the exception with another exception. This is no longer necessary now that the original exception can be rethrown while preserving the original stack trace and exception information.
The C++ compiler in Visual Studio 11 has another neat optimization feature up its sleeve. Unlike intrusive features, such as running code on the GPU using the AMP extensions, this one requires no additional compilation switches and no changes – even the slightest – to the code.
The new compiler will use SIMD (Single Instruction Multiple Data) instructions from the SSE/SSE2 and AVX families to "parallelize" loops. This is not the standard, thread-level parallelism, which runs different iterations of the loop on different threads. Rather, it is the processor’s inherent ability to apply a single operation to several data elements packed into one wide register at the same time.
The following trivial example illustrates the benefits of this optimization. Suppose you want to sum two vectors of floating-point numbers, element-by-element. The following C/C++ loop performs this task:
for (int i = 0; i < N; ++i)
    C[i] = A[i] + B[i];
The current VC++ compiler compiles this loop to the following 32-bit code with optimizations:
013E105A xor eax,eax
013E105C lea esp,[esp]
013E1060 fld dword ptr B[eax]
013E1067 add eax,28h
013E106A fadd dword ptr [ebp+eax-0FCCh]
013E1071 fstp dword ptr [ebp+eax-2F0Ch]
013E1078 fld dword ptr [ebp+eax-0FC8h]
013E107F fadd dword ptr [ebp+eax-1F68h]
013E1086 fstp dword ptr [ebp+eax-2F08h]
013E108D fld dword ptr [ebp+eax-0FC4h]
013E1094 fadd dword ptr [ebp+eax-1F64h]
013E109B fstp dword ptr [ebp+eax-2F04h]
013E10A2 fld dword ptr [ebp+eax-0FC0h]
013E10A9 fadd dword ptr [ebp+eax-1F60h]
013E10B0 fstp dword ptr [ebp+eax-2F00h]
013E10B7 fld dword ptr [ebp+eax-0FBCh]
013E10BE fadd dword ptr [ebp+eax-1F5Ch]
013E10C5 fstp dword ptr [ebp+eax-2EFCh]
013E10CC fld dword ptr [ebp+eax-0FB8h]
013E10D3 fadd dword ptr [ebp+eax-1F58h]
013E10DA fstp dword ptr [ebp+eax-2EF8h]
013E10E1 fld dword ptr [ebp+eax-0FB4h]
013E10E8 fadd dword ptr [ebp+eax-1F54h]
013E10EF fstp dword ptr [ebp+eax-2EF4h]
013E10F6 fld dword ptr [ebp+eax-0FB0h]
013E10FD fadd dword ptr [ebp+eax-1F50h]
013E1104 fstp dword ptr [ebp+eax-2EF0h]
013E110B fld dword ptr [ebp+eax-0FACh]
013E1112 fadd dword ptr [ebp+eax-1F4Ch]
013E1119 fstp dword ptr [ebp+eax-2EECh]
013E1120 fld dword ptr [ebp+eax-0FA8h]
013E1127 fadd dword ptr [ebp+eax-1F48h]
013E112E fstp dword ptr i[eax]
013E1135 cmp eax,0FA0h
013E113A jb wmain+60h (013E1060h)
Note the aggressive loop unrolling employed by the compiler: each iteration of the compiled loop performs 10 additions (add eax,28h advances the offset by 40 bytes, i.e., ten 4-byte floats).
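To make the unrolling concrete, here is a rough source-level sketch of what the listing above amounts to. It is only an illustration under assumptions: the function name AddUnrolledBy10 and its parameters are hypothetical, the compiler performs this transformation on the generated code rather than on your source, and the remainder loop is added here for generality (the compiled loop does not need one, because its bound of 0FA0h bytes, or 1,000 floats, is a multiple of ten).

```cpp
#include <cstddef>

// Hypothetical source-level equivalent of the compiler's unrolling:
// ten scalar additions per trip through the loop body.
void AddUnrolledBy10(const float* A, const float* B, float* C, std::size_t n)
{
    std::size_t i = 0;
    // Main loop: each trip handles a block of ten elements (40 bytes).
    for (; i + 10 <= n; i += 10)
    {
        C[i + 0] = A[i + 0] + B[i + 0];
        C[i + 1] = A[i + 1] + B[i + 1];
        C[i + 2] = A[i + 2] + B[i + 2];
        C[i + 3] = A[i + 3] + B[i + 3];
        C[i + 4] = A[i + 4] + B[i + 4];
        C[i + 5] = A[i + 5] + B[i + 5];
        C[i + 6] = A[i + 6] + B[i + 6];
        C[i + 7] = A[i + 7] + B[i + 7];
        C[i + 8] = A[i + 8] + B[i + 8];
        C[i + 9] = A[i + 9] + B[i + 9];
    }
    // Remainder loop: leftover elements when n is not a multiple of ten.
    for (; i < n; ++i)
        C[i] = A[i] + B[i];
}
```

Unrolling reduces the loop overhead (one compare and one branch per ten additions), but every addition is still a separate scalar instruction.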
The new VC++ compiler compiles the loop to the following 32-bit code with optimizations:
00381041 xor eax,eax
00381043 jmp wmain+50h (0381050h)
00381045 lea esp,[esp]
0038104C lea esp,[esp]
00381050 movups xmm1,xmmword ptr B[eax]
00381058 movups xmm0,xmmword ptr A[eax]
00381060 add eax,10h
00381063 addps xmm1,xmm0
00381066 movups xmmword ptr [ebp+eax-2EF4h],xmm1
0038106E cmp eax,0FA0h
00381073 jb wmain+50h (0381050h)
This time, each iteration of the loop performs 4 additions at once (add eax,10h advances the offset by 16 bytes), using the SIMD instructions MOVUPS and ADDPS. The first, MOVUPS, moves four packed single-precision floating-point values between memory and an XMM register, in either direction, without requiring the memory address to be aligned. The second, ADDPS, adds the four floating-point values packed in one register to the corresponding four values in another.
What's the performance difference? On my Intel i7-860 processor, there is exactly a 2x difference between the two compiler toolsets.*
The loop above is a silly example, but it shows the potential of automatic optimization. Using SIMD instructions from C++ programs – up to now – relied on dropping to low-level intrinsics such as _mm_add_ps, and low-level types such as __m128. I’m willing to bet that most C++ developers have never considered using these intrinsics in their programs. That’s why this is an important feature, and just a tiny step in the right direction.
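For comparison, this is roughly what the same loop looks like when vectorized by hand with those intrinsics. Treat it as a minimal sketch, not the code the compiler generates: the function name AddWithSse and its parameters are hypothetical, it assumes an x86/x64 target with SSE support, and it uses the unaligned load and store intrinsics to mirror the MOVUPS instructions in the listing above.

```cpp
#include <cstddef>
#include <xmmintrin.h>  // SSE: __m128, _mm_loadu_ps, _mm_add_ps, _mm_storeu_ps

// Hand-vectorized element-wise sum (hypothetical helper, for illustration).
void AddWithSse(const float* A, const float* B, float* C, std::size_t n)
{
    std::size_t i = 0;
    // Main loop: four floats (16 bytes) per trip.
    for (; i + 4 <= n; i += 4)
    {
        __m128 a = _mm_loadu_ps(A + i);  // unaligned load of 4 floats (MOVUPS)
        __m128 b = _mm_loadu_ps(B + i);  // unaligned load of 4 floats (MOVUPS)
        __m128 sum = _mm_add_ps(a, b);   // 4 additions in one instruction (ADDPS)
        _mm_storeu_ps(C + i, sum);       // unaligned store of 4 floats (MOVUPS)
    }
    // Scalar remainder when n is not a multiple of four.
    for (; i < n; ++i)
        C[i] = A[i] + B[i];
}
```

Even in this tiny example the intent of the original two-line loop is buried under loads, stores, and remainder handling, which is the burden the auto-vectorizer removes.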
* It’s worth mentioning that the VC++11 compiler can produce AVX instructions (operating on 256-bit YMMx registers), which should be even faster, but this is not the default. My first-generation i7 processor doesn’t support them – feel free to check them out on a Sandy Bridge processor and let me know if it helps.
A dream is coming true. A dream where all the debugging you’ll ever do on your developer box is going to be in a single tool – Visual Studio.
In a later post, I will discuss device driver development in Visual Studio 11, which is another dream come true. For now, let’s take a look at how Visual Studio can open kernel crash dumps and perform crash analysis with all the comfy tool windows and UI that we know and love.
To perform kernel crash analysis in Visual Studio 11, you will need to install the Windows Driver Kit (WDK) on top of Visual Studio. Go on, I’ll wait here.
First things first – you go to File | Open Crash Dump, and you’re good to go:

Visual Studio will load that dump file and open the initial analysis window – which is a new tool called Debugger Immediate Window.

Note that the Threads window displays processors, and you can switch between processors to examine their call stack in the Call Stack window. Finally, if you’re dead serious and want to run some real debugger commands, there’s command-line IntelliSense for debugger commands, complete with a documentation tooltip.

In case you’re wondering, there is still room for WinDbg as a standalone tool. The obvious difference between WinDbg and Visual Studio – other than usability – is installation size. You can copy WinDbg over to a machine or run it from a USB stick, which is amazing in a production environment. So no, WinDbg isn’t redundant yet, but Visual Studio has just earned itself some street cred in the most hardcore debugging circles.
Executive summary: When using the Windows debugger engine to debug optimized C++ code compiled with Visual Studio 11 you can step into inline functions and see local variables that are stored in CPU registers.
The current (Visual Studio 2010 compiler) state of affairs is that compiler optimizations are way smarter than the debugger engine, which lacks the information necessary to map a fully optimized binary back to the source code in a reliable manner. This is why C++ developers don’t like debugging optimized code: as if the compiler-introduced reorderings which take you from one line to a completely unrelated line at a whim aren’t enough, you often don’t see all local variables because they are stored in CPU registers at runtime, and you can’t step into or set breakpoints in functions that have been inlined.

The Visual Studio 11 C++ compiler helps by emitting additional debug information in the PDB. This information allows the debugger to map more accurately the optimized assembly stream back to the source. Currently, to get this to work, you need the Visual Studio 11 C++ compiler to build your code and you need to use the most recent Debugging Tools for Windows build – it contains the Windows debugger you must then use to work with this information*. Fortunately, you can use the Windows debugger engine to debug C++ code without leaving Visual Studio 11 – which means it ain’t all that bad.
The magic ensues when you add the /d2Zi+ undocumented compiler flag to your C++ compiler settings. (Obviously, this is something that will change before the release.) When you launch your program with WinDbg, or use the Windows debugger engine in Visual Studio to debug your code, you will see local variables and will be able to step into inline functions:


Finally, this also means that the debugger will be able to show inline functions on a call stack. For example, if you encounter an exception in an inline function and store a crash dump of the problem, you’ll have the debugger point at the inline function as the source of the crash. In WinDbg, a call stack would look similar to the following, complete with line numbers:
0:000> k
ChildEBP RetAddr
(Inline) -------- VanillaC__ConsoleApp!AddTwoNumbers+0x9 [d:\...\vanillac++consoleapp.cpp @ 9]
0035fae4 00901267 VanillaC__ConsoleApp!wmain+0x39 [d:\...\vanillac++consoleapp.cpp @ 25]
0035fb24 75f7339a VanillaC__ConsoleApp!__tmainCRTStartup+0xfd
0035fb30 77049ed2 kernel32!BaseThreadInitThunk+0xe
0035fb70 77049ea5 ntdll!__RtlUserThreadStart+0x70
0035fb88 00000000 ntdll!_RtlUserThreadStart+0x1b
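The source of the program behind this call stack is not shown here, so the following is only a hypothetical sketch of the kind of code that could produce an (Inline) frame. The AddTwoNumbers signature, the pointer parameters, and the SumOrCrash caller are all assumptions made for illustration.

```cpp
// Hypothetical small function that an optimizing compiler is likely to
// inline into its caller; whether it actually does so is up to the compiler.
inline int AddTwoNumbers(const int* a, const int* b)
{
    // Dereferencing a null pointer here would crash inside the inlined body.
    // With the extra debug information, the debugger can attribute the crash
    // to AddTwoNumbers instead of to the calling function.
    return *a + *b;
}

// Hypothetical caller: after inlining, no separate frame for AddTwoNumbers
// exists at runtime, yet the debugger can still present one as (Inline).
int SumOrCrash(const int* a, const int* b)
{
    return AddTwoNumbers(a, b);
}
```

Without inline-aware debug information, a crash in this code would be reported against the caller only, and you would have to work out by hand which inlined call was responsible.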
* Here’s to hoping that in the very near future we’ll see this supported by the built-in C++ debugger in Visual Studio, and not just the Windows debugger.
We are all so used to reformatting Windows boxes every couple of years, especially for not-so-technically-savvy relatives’ machines infested with malware. Refresh your PC is a refreshing feature of Windows 8 that maintains all the files and settings you have on your machine, but removes all applications (other than Windows Store apps).

A couple of days ago I had to perform a refresh on my Samsung Developer Tablet, after a ZTE USB modem I connected to it caused all Metro apps to fail at the splash screen. And there’s really not so much to it: a refresh is simply an in-place Windows upgrade which preserves your files, and even stores the old Windows and Program Files directories in C:\Windows.old (which is kind of annoying, actually, considering the tiny 64GB hard drive this tablet has).
Still, it’s a cool feature that many of us “sysadmins not by choice” are going to enjoy*. Internet Explorer is loading too slowly? Refresh. You’re getting annoying popups when logging in? Refresh. Office complains that the installation is corrupt? Refresh. And so on.
* At least until malware figures out a way to stay alive after a refresh. But there’s always Reset your PC for that.
