Lap Around Roslyn CTP: Syntax Rewriting

October 23, 2011

To start doing something useful with Roslyn, we’re going to inspect a syntax tree, locate something interesting—and then modify it! The complex structure of a C# program’s syntax tree (SyntaxTree class) is exposed through a fairly intuitive object model, featuring three types of entities:

Nodes are the major elements of the language; for example, an IfStatementSyntax is a node representing an “if” statement and a LiteralExpressionSyntax is a node representing a literal expression.

Tokens are secondary but nonetheless very important elements, such as identifiers, string literals, and numeric literals. Tokens are always attached to a node. For example, an IfStatementSyntax node has an ExpressionSyntax node as its Condition property; that might turn out to be a BinaryExpressionSyntax node with Left and Right properties describing further ExpressionSyntax nodes and an OperatorToken token that describes the operator.

Trivia are all the rest—preprocessor directives, whitespace, comments—riding on top of tokens.

The syntax tree, of course, has parent-child relationships between nodes, and there’s a set of APIs for traversing these relationships, such as DescendantNodes and FirstAncestor.

Syntax trees can be created from source code very easily. It is also a very quick process, because no binding or code emission takes place—only the lexer and parser are involved in the construction of the tree. (In the next post we’ll look at the semantic model as well, which requires symbol construction and binding.)

SyntaxTree tree = SyntaxTree.ParseCompilationUnit(@"
using System;
public class MyClass {
    public static void MyMethod() {
        Console.Write(""Hello There {0}"", 42);
        Console.Write(42);
    }
}
");
Console.WriteLine(tree.Root.GetFullText());

Inspecting the tree in the debugger visualizer (supplied with the Roslyn CTP as a sample) shows the following structure:

[Image: syntax tree of the sample program in the Roslyn debugger visualizer]
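The same structure can also be explored in code. Here is a minimal sketch that walks the tree with DescendantNodes and prints every numeric literal it finds. The LiteralFinder class and its NumericLiterals helper are hypothetical names made up for this illustration, and the namespaces and the exact DescendantNodes signature are assumed from the CTP as used in this post; they may differ in other Roslyn drops.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using Roslyn.Compilers;          // assumed CTP namespace
using Roslyn.Compilers.CSharp;   // assumed CTP namespace

static class LiteralFinder
{
    // Hypothetical helper: returns the values of all numeric
    // literals in the tree, in document order.
    public static List<object> NumericLiterals(SyntaxTree tree)
    {
        return tree.Root
            .DescendantNodes()
            .OfType<LiteralExpressionSyntax>()
            .Where(l => l.Kind == SyntaxKind.NumericLiteralExpression)
            .Select(l => l.Token.Value)
            .ToList();
    }

    static void Main()
    {
        SyntaxTree tree = SyntaxTree.ParseCompilationUnit(@"
using System;
public class MyClass {
    public static void MyMethod() {
        Console.Write(""Hello There {0}"", 42);
        Console.Write(42);
    }
}
");
        // The string literal is a LiteralExpressionSyntax too, but its
        // Kind is not NumericLiteralExpression, so it is filtered out.
        foreach (object value in NumericLiterals(tree))
        {
            Console.WriteLine(value);
        }
    }
}
```

A LINQ query like this is fine for read-only inspection; once you want to produce a modified tree, a visitor is the more convenient tool.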

You can inspect and modify syntax trees directly, but the easier way would be to use a visitor class derived from SyntaxWalker or SyntaxRewriter. A quick code demo is better than a thousand words describing it, so here’s a rewriter that will modify numeric literals with the value 42 to the value 43:

/// <summary>
/// Replaces the numeric literal 42 with the
/// numeric literal 43.
/// </summary>
class MyLiteralRewriter : SyntaxRewriter
{
    protected override SyntaxNode VisitLiteralExpression(
        LiteralExpressionSyntax node)
    {
        if (node.Kind ==
            SyntaxKind.NumericLiteralExpression)
        {
            SyntaxToken token = node.Token;
            if (token.Value is int &&
                (int)token.Value == 42)
            {
                return node.ReplaceToken(
                    token, Syntax.Literal(
                           token.LeadingTrivia,
                           "43", 43,
                           token.TrailingTrivia));
            }
        }
        return node;
    }
}

Note that the VisitLiteralExpression method does not modify the node—it either returns the existing node, or returns a new node with a new literal token. The entire Roslyn API surface is like that—all objects are immutable, and you create new objects off existing ones.
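To make the immutability point concrete, here is a small sketch that replaces a token and then looks at the original node again. It uses only the calls that already appear in this post (ParseCompilationUnit, ReplaceToken, Syntax.Literal, GetFullText) plus DescendantNodes; the ImmutabilityDemo class and its helpers are hypothetical, and the namespaces are assumed from the CTP.

```csharp
using System;
using System.Linq;
using Roslyn.Compilers;          // assumed CTP namespace
using Roslyn.Compilers.CSharp;   // assumed CTP namespace

static class ImmutabilityDemo
{
    // Hypothetical helper: returns the first literal expression in the tree.
    public static LiteralExpressionSyntax FirstLiteral(SyntaxTree tree)
    {
        return tree.Root
            .DescendantNodes()
            .OfType<LiteralExpressionSyntax>()
            .First();
    }

    // Hypothetical helper: builds a copy of the node whose token is "43".
    public static SyntaxNode WithFortyThree(LiteralExpressionSyntax node)
    {
        SyntaxToken token = node.Token;
        return node.ReplaceToken(
            token, Syntax.Literal(
                   token.LeadingTrivia,
                   "43", 43,
                   token.TrailingTrivia));
    }

    static void Main()
    {
        SyntaxTree tree = SyntaxTree.ParseCompilationUnit(
            "class C { int F() { return 42; } }");

        LiteralExpressionSyntax original = FirstLiteral(tree);
        string before = original.GetFullText();

        SyntaxNode replaced = WithFortyThree(original);

        // The original node still has its old text; only the
        // newly returned node carries the new literal.
        Console.WriteLine(original.GetFullText() == before);
        Console.WriteLine(replaced.GetFullText());
        Console.WriteLine(ReferenceEquals(original, replaced));
    }
}
```

Because nothing is ever mutated in place, the old tree stays valid after a rewrite, and you can keep both versions around and compare them.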

How is this visitor applied to a syntax tree? To apply it, we need to give it the tree root, and it will return a new tree root. This new tree root can be compiled, analyzed, or simply … serialized to text:

SyntaxNode newRoot =
    new MyLiteralRewriter().Visit(tree.Root);
tree = SyntaxTree.Create(
    tree.FileName, (CompilationUnitSyntax)newRoot);
Console.WriteLine(tree.Root.GetFullText());

It goes without saying that this rewriter matches only numeric literals: it will not touch the number 42 when it appears in comments or in strings, as a more fallible regex-based approach might.
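For contrast, here is a quick sketch of what a purely textual approach does to the same kind of code. It needs nothing from Roslyn, only System.Text.RegularExpressions; the NaiveRewrite class and its Replace42 helper are hypothetical names made up for this illustration.

```csharp
using System;
using System.Text.RegularExpressions;

static class NaiveRewrite
{
    // Hypothetical helper: replaces every standalone "42" in the text,
    // with no knowledge of C# syntax whatsoever.
    public static string Replace42(string source)
    {
        return Regex.Replace(source, @"\b42\b", "43");
    }

    static void Main()
    {
        string source =
            "// The answer is 42\n" +
            "Console.Write(\"42 is {0}\", 42);";

        Console.WriteLine(Replace42(source));
        // Prints:
        // // The answer is 43
        // Console.Write("43 is {0}", 43);
    }
}
```

The comment and the string literal are both rewritten along with the actual numeric literal, which is exactly the kind of mistake the syntax-based rewriter cannot make.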

In the next post, we’ll look into a somewhat more complicated syntax rewriting visitor, which will require the semantic model of the code (i.e., symbols and their meanings) and not just the syntactic information.


I have been recently posting short updates and links on Twitter as well as on this blog. You can follow me: @goldshtn
