Lap Around Roslyn CTP: Syntax Analysis and Flow Analysis

Thursday, October 27, 2011

How many times have you seen, in a code review, code that calls a method such as Dictionary<K,V>.TryGetValue and ignores its return value? We are going on a quest to find all such invocations and produce a warning. We’re going to derive from SyntaxWalker (and not SyntaxRewriter), because we won’t be doing any rewriting, just issue detection*. There are two major cases we need to consider. The first is when the method is invoked without storing its result in a local variable or using it as part of an expression. Two examples: ...

Lap Around Roslyn CTP: Syntax Rewriting with Symbol Information

Tuesday, October 25, 2011

Last time around, we were replacing the numeric literal 42 with 43. This time, let’s pretend to do something more useful. Suppose you really don’t like developers calling the Console.Write method and insist that they use Console.WriteLine instead. You might be reluctant to use find-and-replace because, just like last time, you don’t want to modify Console.Write occurrences within comments or string literals, or (and this is vicious) calls to a Console.Write method on something that is not the System.Console class from the mscorlib assembly, such as a property named Console! The C# parser, which we met in its SyntaxTree incarnation, doesn’t bind...

Lap Around Roslyn CTP: Syntax Rewriting

Sunday, October 23, 2011

To start doing something useful with Roslyn, we’re going to inspect a syntax tree, locate something interesting, and then modify it! The complex structure of a C# program’s syntax tree (the SyntaxTree class) is exposed through a fairly intuitive object model featuring three types of entities. Nodes are the major elements of the language; for example, an IfStatementSyntax is a node representing an “if” statement, and a LiteralExpressionSyntax is a node representing a literal expression. Tokens are secondary (but nonetheless very important) elements, such as identifiers, string literals, and numeric literals. Tokens are always attached to a node. For...

Lap Around Roslyn CTP: Introduction

Friday, October 21, 2011

The Roslyn project is Microsoft’s implementation of the C# and VB compilers as a service. Roslyn provides a transparent view into the inner workings of the compiler, including syntax tree inspection and modification. An initial CTP of Roslyn was released for download a couple of days ago; it requires Visual Studio 2010 SP1 and the VS2010 SP1 SDK. ...

Writing a Compiler in C#: C Code Generation, Part 2

Wednesday, February 9, 2011

We’re ready to deal with control statements and top-level program structure. Let’s tackle these one after the other. The let statement has already been handled as part of the Assignment method in the previous installment. The while statement in Jack requires evaluating an expression and deciding whether to continue or to jump to the end of the loop. At the end of the while statement block, there should be an unconditional jump to the beginning of the loop. Something along the following lines: BEGINWHILE_0: evaluate condition expression ...
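The while-loop scheme described above (a begin label, a condition check that may jump to the end, and an unconditional jump back) can be sketched as a small C# helper that emits C using labels and goto. This is a hypothetical illustration, not the series’ actual code generator: the WhileEmitter class, the ENDWHILE_n label name (only BEGINWHILE_0 appears in the text), and the assumption that the condition and body arrive as already-generated C fragments are all choices made for this sketch.

```csharp
using System.Text;

// Hypothetical sketch (not the series' actual code generator): emits C for a
// Jack while statement using a begin label, a conditional exit, and a back jump.
public sealed class WhileEmitter
{
    // Each while statement gets a unique numeric suffix so that sibling or
    // nested loops never produce clashing labels.
    private int _nextLabelId;

    // conditionC and bodyC are assumed to be C fragments produced elsewhere
    // by the expression and statement generators.
    public string EmitWhile(string conditionC, string bodyC)
    {
        int id = _nextLabelId++;
        var sb = new StringBuilder();
        sb.AppendLine($"BEGINWHILE_{id}:");
        // Evaluate the condition; leave the loop when it is zero (false).
        sb.AppendLine($"    if (!({conditionC})) goto ENDWHILE_{id};");
        sb.Append(bodyC);
        // Unconditional jump back to re-evaluate the condition.
        sb.AppendLine($"    goto BEGINWHILE_{id};");
        // A C label must be followed by a statement, hence the empty ';'.
        sb.AppendLine($"ENDWHILE_{id}: ;");
        return sb.ToString();
    }
}
```

On a fresh emitter, EmitWhile("i < 10", "    i = i + 1;\n") yields a block that opens with BEGINWHILE_0: and closes with ENDWHILE_0: ;, and the next call uses the suffix 1. Allocating the suffix per call is just one simple way to keep labels unique.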

Writing a Compiler in C#: C Code Generation, Part 1

Sunday, January 2, 2011

After having discussed in some detail the lexical analysis and parsing phases, it’s time to get our hands dirty with actual code generation. Theoretically speaking, our parser emits an intermediate representation of the parsed program—the code-generator interface, shown below, can be used to construct an actual tree depicting the structure of the program. For the practical purpose of translating a Jack program to C or assembly language, there’s no need to maintain in memory a real parse tree. By using the symbol state and a small set of auxiliary data structures, we can implement a code generator that...

Writing a Compiler in C#: Parsing, Part 4

Sunday, December 12, 2010

That’s it. We’re ready for the full BNF of the Jack grammar, followed by the top-down parser of a complete Jack program. Here goes:

    class        ::= class cls-name { cls-var-decl* sub-decl* }
    cls-var-decl ::= ( static | field ) type var-name ( , var-name )* ;
    type         ::= int | char | boolean | cls-name
    sub-decl     ::= ( constructor | function | method ) ...
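In a top-down parser, each grammar rule becomes a method, and the rule’s alternatives and repetitions become if-tests and loops. Here is a hypothetical sketch of that translation for the cls-var-decl rule. It is not the series’ parser: the ClassVarDeclParser class, the use of plain strings as tokens (the real lexer produces typed tokens), and the tuple return value are assumptions made to keep it short.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical sketch: a recursive-descent method for one Jack grammar rule,
//   cls-var-decl ::= ( static | field ) type var-name ( , var-name )* ;
// Tokens are plain strings for brevity; a real lexer would produce typed tokens.
public sealed class ClassVarDeclParser
{
    private readonly List<string> _tokens;
    private int _pos;

    public ClassVarDeclParser(IEnumerable<string> tokens)
    {
        _tokens = new List<string>(tokens);
    }

    // Looks at the current token without consuming it (null at end of input).
    private string Peek() => _pos < _tokens.Count ? _tokens[_pos] : null;

    // Consumes and returns the current token.
    private string Next() =>
        _pos < _tokens.Count ? _tokens[_pos++] : throw new FormatException("Unexpected end of input");

    private void Expect(string expected)
    {
        string actual = Next();
        if (actual != expected)
            throw new FormatException($"Expected '{expected}' but found '{actual}'");
    }

    public (string Kind, string Type, List<string> Names) ParseClassVarDecl()
    {
        // ( static | field )
        string kind = Next();
        if (kind != "static" && kind != "field")
            throw new FormatException($"Expected 'static' or 'field' but found '{kind}'");

        string type = ParseType();

        // var-name ( , var-name )*
        var names = new List<string> { ParseIdentifier() };
        while (Peek() == ",")
        {
            Next(); // consume ','
            names.Add(ParseIdentifier());
        }

        Expect(";");
        return (kind, type, names);
    }

    // type ::= int | char | boolean | cls-name
    private string ParseType()
    {
        string t = Next();
        if (t == "int" || t == "char" || t == "boolean" || IsIdentifier(t)) return t;
        throw new FormatException($"Expected a type but found '{t}'");
    }

    private string ParseIdentifier()
    {
        string id = Next();
        if (!IsIdentifier(id))
            throw new FormatException($"Expected an identifier but found '{id}'");
        return id;
    }

    // Letters, digits and underscores, not starting with a digit.
    // Keyword checks are omitted to keep the sketch short.
    private static bool IsIdentifier(string s)
    {
        if (s.Length == 0 || char.IsDigit(s[0])) return false;
        foreach (char c in s)
            if (!char.IsLetterOrDigit(c) && c != '_') return false;
        return true;
    }
}
```

Parsing the tokens field int x , y ; returns the kind field, the type int, and the names x and y. The ( , var-name )* repetition maps onto a while loop that peeks for a comma, which is the same pattern the remaining rules would follow.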

Writing a Compiler in C#: Parsing, Part 3

Thursday, November 11, 2010

Last time we left off on the brink of finishing the parser for Jack expressions. We need only fill in the blanks for parsing subroutine calls. There are three forms of subroutine calls allowed in Jack:

    class C {
        constructor C new() { return this; }
        function void f() {
            var C c;
            var D d;
            var int i;
    ...

Writing a Compiler in C#: Parsing, Part 2

Thursday, November 4, 2010

Before we proceed to the full BNF of a Jack expression, we need to decide which operators we’re going to support. Our final implementation will have some additional operators, but for now we’ll settle for +, -, *, /, <, >, =, &, |, and !. One obvious question when dealing with arithmetic and relational operators is operator precedence. What’s the value of 5+3*2? Is it 16 or 11? What’s the value of 3<2+5? Is it 5, or 1, or something else entirely depending on the integer coercion of a Boolean value? ...
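To make the precedence question concrete, here is a minimal precedence-climbing evaluator for the operators listed above. It is a sketch under stated assumptions, not the parser this series builds: the C-like precedence table (* and / bind tightest, | loosest), treating ! as logical negation, and encoding comparison results as 1 and 0 are all choices made for illustration.

```csharp
using System;
using System.Collections.Generic;

// Precedence-climbing evaluator sketch for integer expressions over
// + - * / < > = & | plus unary ! and -. The precedence table is an assumption.
public static class PrecedenceEvaluator
{
    // Higher numbers bind tighter (assumed C-like ordering).
    private static readonly Dictionary<char, int> Precedence = new Dictionary<char, int>
    {
        ['|'] = 1,
        ['&'] = 2,
        ['<'] = 3, ['>'] = 3, ['='] = 3,
        ['+'] = 4, ['-'] = 4,
        ['*'] = 5, ['/'] = 5,
    };

    public static int Evaluate(string text)
    {
        int pos = 0;
        int result = ParseBinary(text, ref pos, 1);
        SkipSpaces(text, ref pos);
        if (pos != text.Length)
            throw new FormatException($"Unexpected '{text[pos]}' at position {pos}");
        return result;
    }

    // Parses operands joined by binary operators whose precedence is at least minPrec.
    private static int ParseBinary(string s, ref int pos, int minPrec)
    {
        int left = ParseUnary(s, ref pos);
        while (true)
        {
            SkipSpaces(s, ref pos);
            if (pos >= s.Length) return left;
            if (!Precedence.TryGetValue(s[pos], out int prec) || prec < minPrec) return left;
            char op = s[pos++];
            // Requiring prec + 1 on the right makes equal-precedence operators left-associative.
            int right = ParseBinary(s, ref pos, prec + 1);
            left = Apply(op, left, right);
        }
    }

    // Unary operators, parenthesized expressions, and integer constants.
    private static int ParseUnary(string s, ref int pos)
    {
        SkipSpaces(s, ref pos);
        if (pos >= s.Length) throw new FormatException("Unexpected end of input");
        char c = s[pos];
        if (c == '!') { pos++; return ParseUnary(s, ref pos) == 0 ? 1 : 0; }
        if (c == '-') { pos++; return -ParseUnary(s, ref pos); }
        if (c == '(')
        {
            pos++;
            int value = ParseBinary(s, ref pos, 1);
            SkipSpaces(s, ref pos);
            if (pos >= s.Length || s[pos] != ')') throw new FormatException("Expected ')'");
            pos++;
            return value;
        }
        if (!char.IsDigit(c)) throw new FormatException($"Unexpected '{c}' at position {pos}");
        int n = 0;
        while (pos < s.Length && char.IsDigit(s[pos])) n = n * 10 + (s[pos++] - '0');
        return n;
    }

    // Comparisons yield 1 for true and 0 for false (an assumption of this sketch).
    private static int Apply(char op, int a, int b) => op switch
    {
        '+' => a + b,
        '-' => a - b,
        '*' => a * b,
        '/' => a / b,
        '<' => a < b ? 1 : 0,
        '>' => a > b ? 1 : 0,
        '=' => a == b ? 1 : 0,
        '&' => a & b,
        '|' => a | b,
        _ => throw new FormatException($"Unknown operator '{op}'"),
    };

    private static void SkipSpaces(string s, ref int pos)
    {
        while (pos < s.Length && char.IsWhiteSpace(s[pos])) pos++;
    }
}
```

With this table, Evaluate("5+3*2") returns 11 and Evaluate("(5+3)*2") returns 16. Passing prec + 1 when parsing the right operand is what makes 10-4-3 evaluate left to right to 3 rather than 9.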

Writing a Compiler in C#: Parsing, Part 1

Sunday, October 17, 2010

In the previous installment we saw the core of a lexical analyzer, a module that generates from a stream of characters a set of tokens for symbols, identifiers, keywords, integer constants, and string constants. Today, we move to parsing. The parser’s job is to give semantic structure to the syntactic tokens bestowed upon it by the lexical analyzer. There are, as always, automatic tools like yacc that create from a BNF grammar a program that parses tokens in a certain language. However, it is often more efficient and certainly more educational to write a parser by hand. ...