Lap Around Roslyn CTP: Syntax Analysis and Flow Analysis

October 27, 2011

no comments

How many times have you seen in code reviews a piece of code that calls a method, say Dictionary<K,V>.TryGetValue, and ignores the return value? We are going on a quest to find all such invocations and produce a warning.

We’re going to derive from SyntaxWalker (and not SyntaxRewriter), because we won’t be doing any rewriting—just issue detection*. There are two major cases we need to consider:

  • The method is invoked without storing its result in a local variable or using it as part of an expression. Two examples:

dict.TryGetValue(1, out r);
for (int.TryParse(s, out i); ; ) …

  • The method is invoked and its result is assigned to a local variable, but that local variable is never read throughout the rest of the method.

The skeleton of our SyntaxWalker is as follows—this should seem familiar, with the exception of the VisitMethodDeclaration method that doesn’t return a value (remember, we’re not doing any rewriting—just inspection).

class MyIgnoredBooleanReturnValueLocator : SyntaxWalker
{
    private readonly SemanticModel _semanticModel;
    private readonly HashSet<SyntaxKind> _assignExprs;

    public MyIgnoredBooleanReturnValueLocator(
        SemanticModel model)
    {
        _semanticModel = model;
        _assignExprs = new HashSet<SyntaxKind>(
            new[] {
                SyntaxKind.AndAssignExpression,
               
SyntaxKind.AssignExpression,
                SyntaxKind.ExclusiveOrAssignExpression,
                SyntaxKind.OrAssignExpression
            });
    }

    protected override void VisitMethodDeclaration(
        MethodDeclarationSyntax node)
    {
    }
}

Why are we visiting the method declaration? The whole analysis seems appropriate at the method body level—this will make it easier to answer questions such as “is this local variable read throughout the rest of the method?”. To start, we need to look at all method invocations in the method we’re visiting:

foreach (InvocationExpressionSyntax invocation in
         node.DescendentNodes()
             .OfType<InvocationExpressionSyntax>())
{
    SemanticInfo methodInfo =
        _semanticModel.GetSemanticInfo(
            invocation.Expression);
    MethodSymbol methodSym = (MethodSymbol)
        methodInfo.Symbol;

    Symbol localVariableAssigned;
    SyntaxToken localVariableInitialized;
    bool needTrack = CheckInvocation(
        invocation,
        out localVariableAssigned,
        out localVariableInitialized);
    if (!needTrack)
        continue;
    //…
}

The CheckInvocation method checks whether the method invocation adheres to one of the suspicious patterns we have detected. This requires some work, as the tree structures of these patterns are quite different. We have the following four cases:

  1. The parent of the InvocationExpressionSyntax is an ExpressionStatementSyntax, representing the case where the method is invoked without treating the result as an expression at all.
  2. The parent of the InvocationExpressionSyntax is a ForStatementSyntax, in which case we need to check if the method invocations is a direct descendant of the Initializers or Incrementors collection on the ForStatementSyntax.
  3. The parent of the InvocationExpressionSyntax is a BinaryExpressionSyntax, in which case the Left property must be an IdentifierNameSyntax representing a local variable and the ExpressionKind must be one of the assignment expressions (=, &&=, etc.). This represents a local variable being assigned the method’s return value.
  4. The parent of the InvocationExpressionSyntax is an EqualsValueClauseSyntax whose parent is a VariableDeclaratorSyntax, representing a local variable being initialized with the method’s return value.

In cases #3 and #4 we want to return to the caller the local variable to track. There is another complication here, as in case #3 we have a symbol representing the local variable to speak of, whereas in case #4 we have only a SyntaxToken representing the identifier of the local variable. With that said, here’s the code:

private bool CheckInvocation(
    InvocationExpressionSyntax invocation,
    out Symbol localVariableAssigned,
    out SyntaxToken localVariableInitialized)
{
    localVariableAssigned = null;
    localVariableInitialized = NoToken;

    SemanticInfo info = _semanticModel.GetSemanticInfo(
        invocation);
    MethodSymbol methodSymbol = (MethodSymbol)info.Symbol;
    if (methodSymbol.ReturnType.SpecialType !=
        SpecialType.System_Boolean)
        return false;
    if (!methodSymbol.Name.Contains("Try"))
        return false;

    if (invocation.Parent is ExpressionStatementSyntax)
    {
        WARN("Invocation of {0} ignores its return value.",
             methodSymbol.Name);
        return false;
    }
    ForStatementSyntax forStmt =
       
invocation.Parent as ForStatementSyntax;
    if (forStmt != null)
    {
        if (forStmt.Initializers.Contains(invocation) ||
            forStmt.Incrementors.Contains(invocation))
        {
            WARN("Invocation of {0} as part of the ‘for’" +
                 "statement ignores its return value.",
                 methodSymbol.Name);
            return false;
        }
    }
    BinaryExpressionSyntax binaryExpr =
        invocation.Parent as BinaryExpressionSyntax;
    if (binaryExpr != null &&
        _assignExprs.Contains(binaryExpr.Kind))
    {
        IdentifierNameSyntax id =
            binaryExpr.Left as IdentifierNameSyntax;
        if (id != null)
        {
            Symbol symbol =
               
_semanticModel.GetSemanticInfo(id).Symbol;
            if (symbol.Kind == SymbolKind.Local)
            {
                localVariableAssigned = symbol;
                return true;
            }
        }
    }
    EqualsValueClauseSyntax equalsClause =
        invocation.Parent as EqualsValueClauseSyntax;
    if (equalsClause != null)
    {
        VariableDeclaratorSyntax varDecl =
            equalsClause.Parent as
                         VariableDeclaratorSyntax;
        if (varDecl != null)
        {
            SyntaxToken localVar = varDecl.Identifier;
            localVariableInitialized = localVar;
            return true;
        }
    }

    return false;
}

In the latter two cases, the caller needs to track the local variable being initialized or assigned throughout the rest of the method and determine whether it’s being used. For simplicity (and brevity!) we’ll ignore the case that the variable is being overwritten before it’s read, or being read conditionally—that’s not to say these cases can’t be treated, at least partially**.

How are we going to figure out whether the variable is being read? The naïve approach would be to examine every possible syntax node that might read the variable. This would be very difficult and error-prone. Instead, Roslyn offers a data-flow analysis API that can answer questions like “is this variable read/written in this block?” or “which variables are written outside this region?”. (To be fair, there’s also a control-flow analysis API, not shown here, which can answer questions like “what are all the locations where control leaves this block?” and “what are all the target locations to which control arrives in this block?”.)

Going back to our VisitMethodDeclaration method, we can complete the tracking code with the following:

RegionDataFlowAnalysis flow =
    _semanticModel.AnalyzeRegionDataFlow(
        TextSpan.FromBounds(invocation.FullSpan.End,
                            node.Span.End));
if (localVariableAssigned != null)
{
    if (!flow.ReadInside.Contains(localVariableAssigned))
    {
        WARN("The local variable {0} assigned the " +
             "return value of {1} is never read.",
             localVariableAssigned.Name, methodSym.Name);
    }
}
else if (localVariableInitialized != NoToken)
{
    VariableDeclaratorSyntax varDecl =
        (VariableDeclaratorSyntax)invocation.Parent.Parent;
    if (!flow.ReadInside.Any(
         sym => sym.Name ==
             localVariableInitialized.ValueText &&
         sym.Kind == SymbolKind.Local &&
         sym.Locations.Any(loc =>
             loc.SourceSpan.IntersectsWith(varDecl.Span))))
    {
        WARN("The local variable {0} initialized with " +
             "the return value of {1} is never read.",
             localVariableInitialized.ValueText,
             methodSym.Name);
    }
}

This concludes our syntax analysis—we have detected with absolute certainty several cases in which the method’s return value is ignored. There are some cases we don’t detect—but hopefully they are a small minority.


* Even though it could be interesting to automatically insert code that checks the return value and throws an appropriate exception, this is probably a bad idea in most cases :-)

** Trying to solve the general problem with absolute precision is simply impossible. Compiler theorists simply can’t pass up on the opportunity to discuss an undecidable problem, so here goes:

The Halting Problem can be reduced to deciding the language

L = { <x,P>: the variable x is read in every execution of program P }

The reduction proceeds as follows. Given an input <T,w> we construct the following program P:

  1. int x = 0
  2. run T on w
  3. print(x)

Now, P reads the value of x iff P reaches line #3 iff T halts on w. This completes the reduction, showing that the language is undecidable (because the Halting Problem is undecidable), and leaves only heuristics to speak of.


 I have been recently posting short updates and links on Twitter as well as on this blog. You can follow me: @goldshtn

Add comment
facebook linkedin twitter email

Leave a Reply

Your email address will not be published. Required fields are marked *

You may use these HTML tags and attributes: <a href="" title=""> <abbr title=""> <acronym title=""> <b> <blockquote cite=""> <cite> <code> <del datetime=""> <em> <i> <q cite=""> <strike> <strong>