Writing a Compiler in C#: C Code Generation, Part 2

February 9, 2011

tags:
4 comments

We’re ready to deal with control statements and top-level program structure. Let’s tackle these one after the other.

The let statement has already been handled as part of the Assignment method in the previous installment.

The while statement in Jack requires evaluating an expression and deciding whether to continue or to jump to the end of the loop. At the end of the while statement block, there should be an unconditional jump to the beginning of the loop. Something along the following lines:

BEGINWHILE_0:
evaluate condition expression
if the top of the stack is false, goto ENDWHILE_0
statement block
goto BEGINWHILE_0

For all these gotos we’ll need a set of unique labels. It’s fairly easy to generate these labels, but allowing nesting of labels is less trivial. In other words, the BeginWhile and EndWhile methods need to correlate on the label names according to their nesting. Consider the following nested fragment:

while (x < 0) {
    while (y < 0) {
       let y = y + 1;
       let x = x + 1;
    }
}

If we were to take the naive approach that EndWhile emits a goto in the same order of labels as BeginWhile generates them, we’ll end up with:

BEGINWHILE_0:

if …, goto ENDWHILE_0
BEGINWHILE_1:
if …, goto ENDWHILE_1
goto BEGINWHILE_0
goto BEGINWHILE_1

These last two statements are clearly out of order, so a LIFO structure (a stack) is in place. We can abstract a stack of unique labels with a certain prefix (such as BEGINWHILE_) behind a LabelStack class, and then we have:

public override void BeginWhile()
{
    Output.WriteLine(
"{0}:", _beginWhileLabels.Push());
}
public override void WhileCondition()
{
    Output.WriteLine(
"if (__POP() == 0) goto {0};",
_endWhileLabels.Push());
}

public override void EndWhile()
{
    Output.WriteLine(
"goto {0};", _beginWhileLabels.Pop());
    Output.WriteLine(
"{0}:", _endWhileLabels.Pop());
}

The if statement is slightly more complicated because we have the possibility of an else block. The following is a translation that works regardless of whether the else block is there or not:

evaluate condition expression

if the top of the stack is false, goto ELSE_0

statement block

goto ENDIF_0

ELSE_0:

optional statement block

ENDIF_0:

Implementing BeginIf, PossibleElse, and EndIf with this translation in mind is so trivial that I won’t reproduce it here.

The return statement is really trivial—we simply emit a C return statement with __POP() as the argument. (Recall that void subroutines in Jack are translated to C functions that return __WORD.)

Finally, the do statement, which contains a subroutine call expression, is translated to a set of __PUSH calls for parameter passing (including THIS if the subroutine is an instance method or constructor), followed by a simple C function call.

With all the statements covered, there’s just the general program structure that remains. We’re going to rely on the fact that C does not require function declarations and assumes that undeclared functions return int—which is conveniently our choice for __WORD, the unified data type. The only challenge is that we need to emit the C struct corresponding to the Jack class and fields before we emit the C functions that may manipulate it, and not when the EndClass method is called.

To do this, we need an auxiliary data structure storing the list of fields encountered so far. The Jack grammar requires that all fields be declared prior to declaring any subroutines; therefore, when we first encounter a subroutine, the field declarations are over, and we can emit the C struct representing the Jack class.

The following are the code generator fragments that deal with the class declaration, the field and static declarations, and subroutine declarations. This completes the code generator.

private void EmitLocalsAndParameters(Subroutine subroutine)
{
    foreach (Symbol localVar in subroutine.Locals)
    {
        Output.WriteLine("  __WORD {0};", localVar.Name);
    }
    foreach (Symbol param in subroutine.Parameters)
    {
        Output.WriteLine("  __WORD {0};", param.Name);
    }
}
private void ExtractParameters(Subroutine subroutine)
{
    foreach (Symbol param in subroutine.Parameters.Reverse())
    {
        Output.WriteLine("  {0} = __POP();", param.Name);
    }
}
private int GetClassSizeInWords()
{
    return _fields.Count;
}
private void EmitStructDeclarationIfNotEmitted()
{
    if (!_emittedStructDeclaration)
    {
        _emittedStructDeclaration = true;
        Output.WriteLine("typedef struct __{0} {{",
_currentClassName);
        foreach (Symbol field in _fields)
        {
            Output.WriteLine("  __WORD {0};", field.Name);
        }
        //NOTE: C requires that every struct declaration
//have at least
one member. If there are no declared
//members, we introduce a
dummy member.
        if (_fields.Count == 0)
        {
            Output.WriteLine("  __WORD __DUMMY;");
        }
        Output.WriteLine("}} {0};", _currentClassName);
    }
}

public override void StaticDeclaration(Symbol variable)
{
    Output.WriteLine(
        "static __WORD {0};",
        FormatStaticName(variable.Name));
}
public override void FieldDeclaration(Symbol variable)
{
    _fields.Add(variable);
}
public override void ConstructorDeclaration(
Subroutine subroutine)
{
    EmitStructDeclarationIfNotEmitted();
    Output.WriteLine("__WORD {0} () {{",
FormatSubroutineName(subroutine.Name));
    EmitLocalsAndParameters(subroutine);
    Output.WriteLine("  __WORD THIS;");
    //A constructor must allocate memory for its
//class and
assign the result to the THIS local
//variable.

    Output.WriteLine(
"THIS = (__WORD) __ALLOC({0}*sizeof(__WORD));",
GetClassSizeInWords());
    ExtractParameters(subroutine);
}

public override void FunctionDeclaration(
Subroutine subroutine)
{
    EmitStructDeclarationIfNotEmitted();
    Output.WriteLine("__WORD {0} () {{",
FormatSubroutineName(subroutine.Name));
    EmitLocalsAndParameters(subroutine);
    ExtractParameters(subroutine);
}

public override void MethodDeclaration(
Subroutine subroutine)
{
    EmitStructDeclarationIfNotEmitted();
    Output.WriteLine("__WORD {0} () {{",
FormatSubroutineName(subroutine.Name));
    EmitLocalsAndParameters(subroutine);
    //By convention, 'this' is the first parameter
//of the subroutine
call, so it's the last thing
//on the stack.

    Output.WriteLine("__WORD THIS;");
    Output.WriteLine("THIS = __POP();");
    ExtractParameters(subroutine);
}

public override void EndSubroutine()
{
    Output.WriteLine("}");
}

The final thing we need for the output to be a legal C program is a bootstrapper. If we require that every Jack program have a Main class with a main method, then the bootstrapper becomes:

public override void EmitBootstrapper()
{
    Output.WriteLine("int main() {");
    Output.WriteLine("  Main__main();");
    Output.WriteLine("  return 0;");
    Output.WriteLine("}");
}

We also referred to such macros as __POP and __PUSH, so we need to define them as well. These macros, as well as some built-in subroutines, are declared in EmitEnvironment:

public override void EmitEnvironment()
{
    Output.WriteLine("#include <stdlib.h>");
    Output.WriteLine("#include <stdio.h>");
    Output.WriteLine("typedef int __WORD;");
    Output.WriteLine("static __WORD __SCRATCH1;");
    Output.WriteLine("static __WORD __SCRATCH2;");
    Output.WriteLine("#define __ALLOC(n) malloc(n)");
    Output.WriteLine("#define __FREE(p) free(p)");
           
    //The evaluation stack, limited in size. Note that
//__PUSH and __POP
are macros and not functions to
//make sure that they are inlined.

    Output.WriteLine("#define STACK_SIZE 1000");
    Output.WriteLine("static int __SP = 0;");
    Output.WriteLine("static __WORD __STACK[STACK_SIZE];");
    Output.WriteLine(
"#define __PUSH(w) __STACK[__SP++] = w ");
    Output.WriteLine("#define __POP() (__STACK[--__SP]) ");

    //Memory allocation and deallocation:
    Output.WriteLine("static __WORD Array__new() {");
    Output.WriteLine(
"  return (__WORD)__ALLOC(__POP()*sizeof(__WORD));");
    Output.WriteLine("}");
    Output.WriteLine("__WORD Memory__free() {");
    Output.WriteLine("  __FREE((void*)__POP());");
    Output.WriteLine("  return 0;");
    Output.WriteLine("}");

    //Input and output:
    Output.WriteLine("static __WORD System__print() {");
    Output.WriteLine("  printf(\"%s\", (char*)__POP());");
    Output.WriteLine("  return 0;");
    Output.WriteLine("}");
    Output.WriteLine("static __WORD System__println() {");
    Output.WriteLine("  printf(\"\\n\");");
    Output.WriteLine("  return 0;");
    Output.WriteLine("}");
    Output.WriteLine("static __WORD System__printInt() {");
    Output.WriteLine("  printf(\"%d\", (int)__POP());");
    Output.WriteLine("  return 0;");
    Output.WriteLine("}");
    Output.WriteLine("static __WORD System__readInt() {");
    Output.WriteLine("  int i;");
    Output.WriteLine("  scanf_s(\"%d\", &i);");
    Output.WriteLine("  return i;");
    Output.WriteLine("}");
}

That’s it—we have a working code generator that translates a Jack program to C code. This is an appropriate time to give you the full source code of the compiler we developed so far. You can find it here.

What’s next? To tell you the truth, I don’t know yet. I’m writing these lines at least a month before I’m going to publish this post, and at this time I have a functional Jack-to-C compiler. There are some additional things of interests that you might read here:

  • Intermediate optimization of the emitted C code
  • A code generator that emits x86 assembly language instead of C
  • Extending the language with additional statements such as for and switch…case
  • Rudimentary support for type checking
  • A floating-point data type
  • break support in loop statements
  • Support for comments
  • …any suggestions from you, Dear Reader?
Add comment
facebook linkedin twitter email

Leave a Reply

Your email address will not be published.

You may use these HTML tags and attributes: <a href="" title=""> <abbr title=""> <acronym title=""> <b> <blockquote cite=""> <cite> <code> <del datetime=""> <em> <i> <q cite=""> <s> <strike> <strong>

*

4 comments

  1. Ben MorrisApril 19, 2011 ב 3:24 PM

    Great set of tutorials, really like the way you have divided them up. Gave them a quick read through on my lunch break but will go through them in depth.

    I like the idea of the next part supporting additional statements but particularly generating assembly, will be good to see the best way to reuse the code generator interface having parsed the same code.

    Reply
  2. AanchalAugust 2, 2011 ב 9:55 AM

    How to use this Compiler after compiling the source Code? How to use this compiler to convert a Jack program to C program ? Please Reply….

    Reply
  3. miguelJune 10, 2016 ב 8:22 PM

    Hello
    The link to “JackCompiler^_CCodeGenerator.zip” is down, can you re-upload?

    Thank you

    Reply
  4. JimJuly 8, 2016 ב 4:55 PM

    The link to the Jack Compiler from 2011 is broke. Can you provide a working link please? Thank you.

    Reply