Writing a Compiler in C#: Parsing, Part 4

December 12, 2010

tags:
no comments

That’s it. We’re ready for the full BNF of the Jack grammar, followed by the top-down parser of a complete Jack program. Here goes:

class        ::= class cls-name { cls-var-decl* sub-decl* }
cls-var-decl ::= ( static | field ) type var-name
                
( , var-name )* ;
type         ::= int | char | boolean | cls-name
sub-decl     ::= ( constructor | function | method )
                 ( void | type ) sub-name
                
‘(‘ param-list)’ sub-body
sub-body     ::= { var-decl* statement* }
var-decl     ::= var type var-name ( , var-name )* ;
statement    ::= let-stmt | if-stmt | while-stmt |
                 do-stmt  | return-stmt
let-stmt     ::= let var-name ( [ expr ] )? = expr ;
if-stmt      ::= if(‘ expr)’ { statement* }
                
( else { statement* } )?
while-stmt   ::= while(‘ expr)’ { statement* }
do-stmt      ::= do sub-call ;
return-stmt  ::= return expr? ;

The rest is expr (expression) and sub-call, which we’ve already seen. Generating a parser for the above BNF is trivial, much easier than what we did with expressions:

private void ParseClass()
{
    Match(new Token(TokenType.Keyword, "class"));
           
    Token className = NextToken();
    if (className == null)
        Expected("class name");

    if (className.Type != TokenType.Ident)
        Expected("identifier");
           
    _currentClassName = className.Value;
    _codeGenerator.BeginClass(className.Value);

    Match(new Token(TokenType.Symbol, "{"));
    ParseClassVarDecls();
    ParseSubDecls();
    Match(new Token(TokenType.Symbol, "}"));
    _codeGenerator.EndClass();

    _currentClassName = String.Empty;
    _classSymTable.Clear();
}

private void ParseClassVarDecls()
{
    while (IsNextTokenClassVarDecl())
    {
        ParseClassVarDecl();
    }
}
private void ParseClassVarDecl()
{
    Token varKind = NextToken();
    bool isStatic = varKind.Value == "static";
    Token varType = NextToken();
    Token varName = NextToken();

    do
    {
        if (isStatic)
        {
            Symbol variable = new Symbol(
                varType.Value, varName.Value,
SymbolKind.Static, _currentClassName);
            _classSymTable.AddSymbol(variable);
            _codeGenerator.StaticDeclaration(variable);
        }
        else
        {
            Symbol variable = new Symbol(
                varType.Value, varName.Value,
SymbolKind.Field, _currentClassName);
            _classSymTable.AddSymbol(variable);
            _codeGenerator.FieldDeclaration(variable);
        }
    } while (LookAheadToken.Value == ",");

    Match(new Token(TokenType.Symbol, ";"));
}
private void ParseSubDecls()
{
    while (IsNextTokenSubDecl())
    {
        ParseSubDecl();
    }
}
private void ParseSubDecl()
{
    Contract.Requires(IsNextTokenSubDecl());

    Token subKind = NextToken();
    Token returnType = NextToken();
    Token subName = NextToken();

    if (subKind.Type != TokenType.Keyword ||
        !new[]{"constructor", "method", "function"}
.Contains(subKind.Value))
    {
        ThrowCompilationException(...);
    }
    if (returnType.Type != TokenType.Keyword &&
        returnType.Type != TokenType.Ident)
    {
        ThrowCompilationException(...);
    }
    if (subName.Type != TokenType.Ident)
    {
        ThrowCompilationException(...);
    }

    _currentSub = new Subroutine(
        (SubroutineKind) Enum.Parse(typeof (SubroutineKind),
subKind.Value, ignoreCase: true),
        _currentClassName, subName.Value, returnType.Value);

    if (_currentSub.Kind == SubroutineKind.Constructor &&
        _currentSub.ReturnType != _currentClassName)
    {
        ThrowCompilationException(...);
    }

    Match(new Token(TokenType.Symbol, "("));
    ParseFormalParamList();
    Match(new Token(TokenType.Symbol, ")"));


    ParseSubBody();

    _currentSub = null;
    _methodSymTable.Clear();
}
private void ParseFormalParamList()
{
    while (LookAheadToken.Value != ")")
    {
        Token paramType = NextToken();
        Token paramName = NextToken();

        if (paramType.Type != TokenType.Keyword &&
            paramType.Type != TokenType.Ident)
        {
            ThrowCompilationException(...);
        }
        if (paramName.Type != TokenType.Ident)
        {
            ThrowCompilationException(...);
        }

        Symbol parameter = new Symbol(paramType.Value,
paramName.Value, SymbolKind.Parameter,
_currentClassName);
        _currentSub.Parameters.Add(parameter);
        _methodSymTable.AddSymbol(parameter);

        if (LookAheadToken.Value == ",")
        {
            NextToken();//Skip the comma
        }
    }
}
private void ParseSubBody()
{
    Match(new Token(TokenType.Symbol, "{"));
    ParseLocalVarDecls();
           
    switch (_currentSub.Kind)
    {
        case SubroutineKind.Constructor:
            _codeGenerator.ConstructorDeclaration(_currentSub);
            break;
        case SubroutineKind.Function:
            _codeGenerator.FunctionDeclaration(_currentSub);
            break;
        case SubroutineKind.Method:
            _codeGenerator.MethodDeclaration(_currentSub);
            break;
        default:
            ThrowCompilationException(...);
            break;
    }
           
    ParseStatements();
           
    //Make sure the method ends with 'return'. If the
//method is void,
return an an arbitrary value. This
//could be left to the CG's
discretion as well.
    if (!_wasLastStatementReturn)
    {
        if (_currentSub.ReturnType == "void")
        {
            _codeGenerator.IntConst(0);
            _codeGenerator.Return();

        }
        else
        {
            ThrowCompilationException(...);
        }
    }
    _wasLastStatementReturn = false;
    _codeGenerator.EndSubroutine();

    Match(new Token(TokenType.Symbol, "}"));
}
private void ParseLocalVarDecls()
{
    while (IsNextTokenLocalVarDecl())
    {
        ParseLocalVarDecl();
    }
}
private void ParseLocalVarDecl()
{
    Contract.Requires(IsNextTokenLocalVarDecl());

    Match(new Token(TokenType.Keyword, "var"));
    Token varType = NextToken();

    while (true)
    {
        Token varName = NextToken();

        Symbol variable = new Symbol(
            varType.Value, varName.Value,
SymbolKind.Local, _currentClassName);

        //Locals can't clash with parameter names.
        if (_methodSymTable.HasSymbol(variable.Name))
        {
            ThrowCompilationException(...);
        }

        _methodSymTable.AddSymbol(variable);
        _currentSub.Locals.Add(variable);


        if (LookAheadToken.Value != ",")
        {
            break;
        }
        Match(new Token(TokenType.Symbol, ","));
    }

    Match(new Token(TokenType.Symbol, ";"));
}

private void ParseStatements()
{
    while (LookAheadToken.Value != "}")
    {
        ParseStatement();
    }
}
private void ParseStatement()
{
    _wasLastStatementReturn = false;
    Token keyword = LookAheadToken;
    if (keyword.Type != TokenType.Keyword)
    {
        ThrowCompilationException(...);
    }
    switch (keyword.Value)
    {
        case "let":
            ParseLetStatement();
            break;
        case "if":
            ParseIfStatement();
            break;
        case "while":
            ParseWhileStatement();
            break;
        case "do":
            ParseDoStatement();
            break;
        case "return":
            ParseReturnStatement();
            break;
        default:
            ThrowCompilationException(...);
            break;
    }
}
private void ParseReturnStatement()
{
    Match(new Token(TokenType.Keyword, "return"));
    if (LookAheadToken.Value != ";")
    {
        if (_currentSub.ReturnType == "void")
        {
            ThrowCompilationException(...);
        }
        //Constructors may only return 'this'.
        if (_currentSub.Kind == SubroutineKind.Constructor &&
            LookAheadToken.Value != "this")
        {
            ThrowCompilationException(...);
        }
        ParseExpression();
    }
    else
    {
        if (_currentSub.ReturnType != "void")
        {
            ThrowCompilationException(...);
        }
        //As a convention, void methods return 0.
        _codeGenerator.IntConst(0);
    }
    _codeGenerator.Return();
    Match(new Token(TokenType.Symbol, ";"));
    _wasLastStatementReturn = true;
}

private void ParseDoStatement()
{
    Match(new Token(TokenType.Keyword, "do"));
    ParseSubCall();
    Match(new Token(TokenType.Symbol, ";"));
    _codeGenerator.DiscardReturnValueFromLastCall();
}
private void ParseWhileStatement()
{
    Match(new Token(TokenType.Keyword, "while"));
    Match(new Token(TokenType.Symbol, "("));
    _codeGenerator.BeginWhile();
    ParseExpression();
    _codeGenerator.WhileCondition();
    Match(new Token(TokenType.Symbol, ")"));
    Match(new Token(TokenType.Symbol, "{"));
    ParseStatements();
    Match(new Token(TokenType.Symbol, "}"));
    _codeGenerator.EndWhile();
}
private void ParseIfStatement()
{
    Match(new Token(TokenType.Keyword, "if"));
    Match(new Token(TokenType.Symbol, "("));
    ParseExpression();
    _codeGenerator.BeginIf();
    Match(new Token(TokenType.Symbol, ")"));
    Match(new Token(TokenType.Symbol, "{"));
    ParseStatements();
    Match(new Token(TokenType.Symbol, "}"));
    _codeGenerator.PossibleElse();
    if (LookAheadToken.Value == "else")
    {
        Match(new Token(TokenType.Keyword, "else"));
        Match(new Token(TokenType.Symbol, "{"));
        ParseStatements();
        Match(new Token(TokenType.Symbol, "}"));
    }
    _codeGenerator.EndIf();
}

private Symbol GetClosestSymbol(string symName)
{
    if (_methodSymTable.HasSymbol(symName))
        return _methodSymTable.GetSymbol(symName);
    if (_classSymTable.HasSymbol(symName))
        return _classSymTable.GetSymbol(symName);
    return null;
}
private void ParseLetStatement()
{
    Match(new Token(TokenType.Keyword, "let"));
           
    Token varName = NextToken();
    if (GetClosestSymbol(varName.Value) == null)
    {
        ThrowCompilationException(...);
    }
    if (!_methodSymTable.HasSymbol(varName.Value))
    {
        Symbol variable =
_classSymTable.GetSymbol(varName.Value);
        if (variable.Kind == SymbolKind.Field &&
            _currentSub.Kind != SubroutineKind.Constructor &&
            _currentSub.Kind != SubroutineKind.Method)
        {
            ThrowCompilationException(...);
        }
    }

    bool withArrayIndex = false;
    if (LookAheadToken.Value == "[")
    {
        VerifyArrayAccessAllowed(varName);
        withArrayIndex = true;
        Match(new Token(TokenType.Symbol, "["));
        ParseExpression();
        Match(new Token(TokenType.Symbol, "]"));
    }
    Match(new Token(TokenType.Symbol, "="));
    ParseExpression();
    _codeGenerator.Assignment(varName, withArrayIndex);
    Match(new Token(TokenType.Symbol, ";"));
}

There’s a lot of code here, but it really writes itself. Now we have a fully functional parser that delegates to the code generator the actual … code generation. Up next, generating fully functional C code by implementing a code generator for this parser.

NOTE: If you’re only interested in getting to a working version of a compiler, feel free to wait for the next post. Otherwise, I’m assuming you’re ready for some theoretical discussion.

One area where our compiler enjoys a huge relaxation is error handling and recovery. So far, we have taken the approach of failing whenever something looked wrong—the liberal sprinkling of “ThrowCompilationException(…)” in the code is unmistakable evidence that we’re not actually doing any recovery.

Compare this behavior to that of a C# or C++ compiler. For example, the following clearly invalid program produces three compilation errors, not just one:

C:\temp>type invalid_program.cs

class Program {

    static int Main() {

        i = 5;

        foo();

        return "hello";

    }

}

C:\temp>csc invalid_program.cs

Microsoft (R) Visual C# 2010 Compiler version 4.0.30319.1

Copyright (C) Microsoft Corporation. All rights reserved.

invalid_program.cs(3,3): error CS0103: The name ‘i’ does not exist in the current context

invalid_program.cs(3,3): error CS0103: The name ‘foo’ does not exist in the current context

invalid_program.cs(3,10): error CS0029: Cannot implicitly convert type ‘string’ to ‘int’

How can we introduce a similar behavior into our compiler? The Dragon Book [section 4.1.4] suggests four possible methods of error recovery:

  • Panic-mode recovery—upon discovering an error, the parser should start discarding input symbols until it reaches a synchronizing token, such as ; or }. This is a fairly simple approach; in our example above, a compiler employing panic-mode recovery with ; as a synchronizing token will find all three errors in the program.
  • Phrase-level recovery—upon discovering an error, the parser should perform error correction on the input.
  • Error productions—modify the grammar to accommodate for common errors and issue appropriate error messages when the error is detected. Parsing can continue after the erroneous sentence has been processed, as there has been no error in the lexical sense.
  • Global correction—upon discovering an error, find the minimal sequence of corrections to the source program. Although there are general-purpose algorithms that do this, in practice this approach is too costly to incorporate into any real-world compiler.

Of these four approaches, it would make sense to implement panic-mode recovery in the Jack compiler. The synchronizing tokens would be the semicolon, a closing curly brace, and the entire set of statement-beginning keywords (do, let, et al.). The implementation, however, is left as an exercise to the reader, or at the very least—to a much later phase of our compiler development.

Add comment
facebook linkedin twitter email

Leave a Reply

Your email address will not be published.

You may use these HTML tags and attributes: <a href="" title=""> <abbr title=""> <acronym title=""> <b> <blockquote cite=""> <cite> <code> <del datetime=""> <em> <i> <q cite=""> <s> <strike> <strong>

*