November 2010 - Posts
The TechEd is near and so is my session on Tuesday, 11:15 at the Hilton big hall. When I wrote about it two weeks ago, I really had no idea how awesome it’s going to turn out!
Some cool things I’m going to show during my session:
- Customizing IntelliTrace events with declarative and programmable data queries
- SOS and PSSCor2 “better together” and DML support
- Visualizing managed object references using the Visual Studio 2010 architecture diagramming support
- Bleeding-edge profiler reports with sampling on cache misses, tier interactions, and concurrency visualization
- Customizing profiler guidance with thresholds and messages
Immediately after my session I will upload the full session slides, demos, and links to additional material. The session video recordings should be available at the Microsoft TechEd channel shortly afterwards.
Other Sela speakers at the TechEd:
Alex Golesh will talk about XNA game development for Xbox, PC, and Windows Phone 7; I’ve seen a dry run of this great session.
- Manu Cohen-Yashar will talk about the Windows Azure Storage engine, which is something I really need to learn more about.
- Gil Fink will talk about building n-tier applications with Entity
Framework 4, which looks like a very mature ORM. - Elad Katz will talk about Silverlight, WPF, and MVVM, a case study.
- Shai Raiten will talk about testing in Visual Studio 2010, which features a very interesting bunch of technologies for developers and QA alike.
- Shmulik Segal will talk about the TFS 2010 build and deploy processes, which I hope are easier than what they used to be in TFS 2008.
- Amit Marlov will talk about application and website compatibility, from Internet Explorer 6 to Internet Explorer 8.
Yesterday I delivered another Sela Open House in Haifa (kudos to Philips MS for hosting this session). The subject was “Design and Architecture of Concurrent Applications”, and indeed much to my surprise I managed to avoid firing up WinDbg or writing lots of code, and instead talk about high-level principles.
Some of the things we covered:
- Hardware trends—the increasing number of processors means new challenges such as eliminating false sharing, reducing synchronization to a minimum, coping with NUMA architectures
- Types of concurrent applications—partitioning, pipeline
- Trends in software concurrency frameworks—the move from threads to tasks
- Types of concurrent algorithms—task parallelism, data parallelism, algorithms that don’t yield easily to parallelization
You can download the slides and sample code from here (caters for C# developers, but there’s an abundance of C++ ConcRT samples on the web as well). I also said I will provide some links for the material we didn’t cover on concurrency design patterns and debugging, so here goes:
Last time we left off on the brink of finishing the parser for Jack expressions. We need only fill in the blanks for parsing subroutine calls.
There are three forms of subroutine calls allowed in Jack:
class C {
constructor C new() { return this; }
function void f() {
var C c;
var D d;
var int i;
let c = C.new(); //ctor call
let d = D.new(); //ctor call
let i = c.m(); //method call on another instance
let i = d.m(); //method call on another instance
let i = D.f(); //function call on another instance
}
method int m() {
var int i;
let i = x(); //method call on this instance
return i;
}
method int x() { return 42; }
}
class D {
constructor D new() { return this; }
method int m() { return 42; }
}
The easiest form is the dotless method call on the same instance, e.g. “x()”. This thing is characterized by the lack of the dot. The next two forms are slightly more difficult to tell apart—either a method call on a variable, e.g. “c.m()” or a function call on a class, e.g. “D.m()”.
To tell the latter two forms apart, we need to see if the first identifier is a variable name or not. For that, we need a symbol table, discussed later. To tell the first form apart from the latter two forms, we need to see if the next identifier after the first one is a dot or an opening parenthesis. This is the only case in the Jack grammar when double look-ahead is required, making Jack an LL(2) language. Fortunately, we can code a special case for parsing sub-call so that we don’t need to introduce backtracking of the look-ahead token.
When the ParseTerm method calls the ParseSubCall method, it passes to it the token already eaten by ParseTerm (which could be a method name, a variable name, or a class name). The look-ahead token is then the second token in the BNF definition of sub-call, so the ParseSubCall method needs to treat it accordingly:
private void ParseSubCall(Token firstPart)
{
if (firstPart.Type != TokenType.Ident)
{
ThrowCompilationException(...);
}
string classNameOfSubroutine = null;
string subroutineName = null;
bool isInstanceMethod = false;
Symbol classOrVarName = GetClosestSymbol(firstPart.Value);
if (classOrVarName == null)
{
if (LookAheadToken.Value == "(")
{
...
classNameOfSubroutine = _currentClassName;
subroutineName = firstPart.Value;
isInstanceMethod = true;
}
else if (LookAheadToken.Value == ".")
{
Match(new Token(TokenType.Symbol, "."));
Token subName = NextToken();
if (subName.Type != TokenType.Ident)
{
ThrowCompilationException(...);
}
classNameOfSubroutine = firstPart.Value;
subroutineName = subName.Value;
isInstanceMethod = false;
}
else
{
ThrowCompilationException(...);
}
}
else
{
//We must be able to tell the class name by looking
//at the variable's type.
classNameOfSubroutine = classOrVarName.Type;
if (Syntax.IsKeyword(classNameOfSubroutine))
{
ThrowCompilationException(
"Can't call methods on built-in types");
}
Match(new Token(TokenType.Symbol, "."));
Token subName = NextToken();
if (subName.Type != TokenType.Ident)
{
ThrowCompilationException(...);
}
subroutineName = subName.Value;
isInstanceMethod = true;
}
Match(new Token(TokenType.Symbol, "("));
//If this is an instance method, we need to supply
//'this' as the first parameter to be extracted on
//the other side.
if (isInstanceMethod)
{
if (classNameOfSubroutine == _currentClassName)
{ //'this' is actually our very own instance,
//as we're calling an instance method on ourselves.
_codeGenerator.This();
}
else
{ //'this' is actually the variable on which
//we're making the call, so we need that pushed.
_codeGenerator.VariableRead(firstPart, false);
}
}
ParseExpressionList();
Match(new Token(TokenType.Symbol, ")"));
_codeGenerator.Call(classNameOfSubroutine, subroutineName);
}
private void ParseExpressionList()
{
if (LookAheadToken.Value == ")")
{
return;//This is an empty list
}
while (true)
{
ParseBLOCKED EXPRESSION;
if (LookAheadToken.Value != ",")
{
break;
}
Match(new Token(TokenType.Symbol, ","));
}
}
Now we must come back to the subject of a symbol table.
When compiling the method call “c.m()” where c is a variable of class C, we need the type of c to be available to the compiler to emit the appropriate method call. In other words, we need a repository that maps variable names to their types.
In a similar vein, when compiling the method call “C.m()” where C is a class name, we need the compiler to know that C is not a variable name in any active scope, and therefore it’s a class name. Again, we need a repository that contains the set of variable names in the current scope.
The repository that takes the latter two responsibilities is called a symbol table, and in our case is implemented using a simple Dictionary<string,Symbol> where Symbol is a class with properties Name, Type, and Kind (local, parameter, field, or static).
Clearly, some part of the compiler has to populate the symbol table. In most cases, it’s the parser’s responsibility (although some smart lexical analyzers may also help). In our compiler, the parser populates the symbol table when it processes local variable declarations in methods and class-level variable declarations.
Unfortunately, one symbol table is not enough. Consider the following legal Jack program fragment:
class C {
static int i;
function void f() {
var int i;
let i = 15;
}
function void g() {
let i = 10;
do C.f();
do System.printInt(i);
}
}
After invoking C.g, the output should be 10, not 15! In other words, we need at least two symbol tables to parse properly a Jack program—a class-level symbol table and a method-level symbol table. Fortunately, we don’t allow nested blocks in Jack, so two tables are indeed enough and we don’t have to deal with scope lookup for an identifier.
One final note: the error-handling capabilities of our parser are significantly limited by the fact that Jack does not require subroutines to be declared prior to their use, and the fact that our compiler is a one-pass compiler. In other words, we might happily parse the following program, only to determine at a later time that the method calls can’t be resolved:
class C {
function void f() {
do C.questionable_method();
do D.questionable_method();
}
}
One alleviation is to require subroutines to be declared prior to their use, i.e. something along the lines of:
class C {
function void C.questionable_method() FORWARD;
function void D.questionable_method() FORWARD;
function void f() {
do C.questionable_method();
do D.questionable_method();
}
}
This will at the very least ensure that the developer really meant to invoke the declared method.
Another approach is to perform two passes over the source code. In the first pass, the compiler doesn’t look inside subroutine bodies—it looks only at the class and method declarations to know “what’s available”. In the second pass, parsing can proceed with full knowledge of all class and method declarations and can flag any errors that arise. (Incidentally, the former approach is taken by the C++ compiler, and the latter approach taken by the C# compiler. Which one is friendlier to the developer?)
Next time, the full BNF of a Jack program and the final parser implementation.
Finally! The “TechEd10” tag is here, and I’m already working hard on my session at TechEd Israel 2010 (dev track) that’s going to take place in Eilat at the end of the month (November 28-30). I really hope to see you at my session, but even if you don’t make it but are in the area, feel free to drop by and have a chat.
My session this year is called “Deep Dive: Performance and Debugging in Visual Studio 2010”. If you’ve been to any of my sessions, you know they’re usually level-400 talks with a lot of demos and hardcore internals. This session is going to be no different—I’m going to show you around the revamped Visual Studio Profiler, explain how to use IntelliTrace in the development and production environments, and plug a little SOS 4.0 glory to diagnose bugs that render Visual Studio helpless.
See you in Eilat!
Yesterday I delivered a 2.5-hour presentation on Visual Studio 2010 and .NET 4.0 as part of the Open Houses program we do at Sela. There was definitely not enough time to cover everything new in Visual Studio and .NET, so I decided to focus on IntelliTrace, Code Contracts, Profiling, and Parallel Programming.
The slides from the session are available for download here. Enjoy, and I hope to see you around at other sessions!
Before we proceed to the full BNF of a Jack expression, we need to decide which operators we’re going to support. Our final implementation will have some additional operators, but for now we’ll settle for +, –, *, /, <, >, =, &, |, and !.
One obvious question when dealing with arithmetic and relational operators is the question of operator precedence.
What’s the value of 5+3*2? Is it 16 or 11?
What’s the value of 3<2+5? Is it 6, or 1, or something else entirely depending on the integer coercion of a Boolean value?
You might be tempted to think that operator precedence is the parser’s problem, i.e. we must install special parsing code to handle multiplications before additions. Indeed, this problem can be solved by using two stacks for operators and operands (e.g. see here).
However, we can solve the problem of operator precedence at the BNF level, from which we create the parsing procedures. This is one of the neatest parsing tricks. The following BNF provides proper operator precedence for arithmetic operators:
expression ::= factor ( ('+' | '-') factor )*
factor ::= term ( ('*' | '/') term )*
term ::= digit+
From this thing the parser code becomes immediately self-evident:
procedure ParseExpression
ParseFactor()
while LookaheadToken <> ";"
op = NextToken()
assert op in [ "+", "-" ]
ParseFactor()
print op
end while
end procedure
procedure ParseFactor
ParseTerm()
while LookaheadToken <> ";"
op = NextToken()
assert op in [ "*", "/" ]
ParseTerm()
print op
end while
end procedure
procedure ParseTerm
term := NextToken()
print term
end procedure What if we wanted to add support for parentheses to modify explicitly the operator precedence in an expression? Again, before being tempted to add special cases to the parser, consider the following grammar modification:
term ::= digit+ | '(' expression ')'
The modification to the parser is also immediate:
procedure ParseTerm
if LookAheadToken = "("
ParseBLOCKED EXPRESSION
Match(")")
else
term := NextToken()
print term
end if
end procedure
Incredible! Now we’re ready to appreciate the full BNF of a Jack expression (with added operator precedence compared to the TECS version):
expression ::= relexpr ( ('&' | '|') relexpr )*
relexpr ::= addexpr ( ('<' | '>' | | '=') addexpr )*
addexpr ::= mulexpr ( ('+' | '-') mulexpr )*
mulexpr ::= term ( ('*' | '/') term )*
term ::= int-const | str-const | keyword-const |
var-name | var-name '[' expression ']' |
sub-call | '(' expression ')' |
('-' | '!') term
sub-call ::= sub-name '(' expr-list ')' |
(class-name | var-name) '.' sub-name
'(' expr-list ')'
expr-list ::= ( expression ( ',' expression )* )?
Whew! This thing recognizes four levels of operator precedence through the use of three intermediate BNF definitions—relexpr, addexpr, and mulexpr. Note that the unary operators have the highest precedence, which is guaranteed by their placement at the bottom-most term level.
As before, the parser code flows naturally and beautifully from this BNF. This time, instead of using “print term” as the placeholder for actual operations, we call the code generator whenever we have some interesting information about the parsed program. (Another alternative is to construct an intermediate representation such as a parse tree, and have the code generator analyze it in one shot.)
private void ParseBLOCKED EXPRESSION
{
ParseRelationalBLOCKED EXPRESSION;
while (IsNextTokenLogicalOp())
{
Token logicalOp = NextToken();
ParseRelationalBLOCKED EXPRESSION;
switch (logicalOp.Value)
{
case "&":
_codeGenerator.And();
break;
case "|":
_codeGenerator.Or();
break;
default:
ThrowCompilationException(...);
break;
}
}
}
private void ParseRelationalBLOCKED EXPRESSION
{
ParseAddBLOCKED EXPRESSION;
while (IsNextTokenRelationalOp())
{
Token relationalOp = NextToken();
ParseAddBLOCKED EXPRESSION;
switch (relationalOp.Value)
{
case "<":
_codeGenerator.Less();
break;
case ">":
_codeGenerator.Greater();
break;
case "=":
_codeGenerator.Equal();
break;
case "<=":
_codeGenerator.LessOrEqual();
break;
case ">=":
_codeGenerator.GreaterOrEqual();
break;
case "!=":
_codeGenerator.NotEqual();
break;
default:
ThrowCompilationException(...);
break;
}
}
}
private void ParseAddBLOCKED EXPRESSION
{
ParseMulBLOCKED EXPRESSION;
while (IsNextTokenAddOp())
{
Token addOp = NextToken();
ParseMulBLOCKED EXPRESSION;
switch (addOp.Value)
{
case "+":
_codeGenerator.Add();
break;
case "-":
_codeGenerator.Sub();
break;
default:
ThrowCompilationException(...);
break;
}
}
}
private void ParseMulBLOCKED EXPRESSION
{
ParseTerm();
while (IsNextTokenMulOp())
{
Token mulOp = NextToken();
ParseTerm();
switch (mulOp.Value)
{
case "*":
_codeGenerator.Mul();
break;
case "/":
_codeGenerator.Div();
break;
case "%":
_codeGenerator.Mod();
break;
default:
ThrowCompilationException(...);
break;
}
}
}
private void ParseTerm()
{
if (LookAheadToken.Type == TokenType.IntConst)
{
Token intConst = NextToken();
_codeGenerator.IntConst(int.Parse(intConst.Value));
}
else if (LookAheadToken.Type == TokenType.StrConst)
{
Token strConst = NextToken();
_codeGenerator.StrConst(strConst.Value);
}
else if (IsNextTokenKeywordConst())
{
Token keywordConst = NextToken();
switch (keywordConst.Value)
{
case "true":
_codeGenerator.True();
break;
case "false":
_codeGenerator.False();
break;
case "null":
_codeGenerator.Null();
break;
case "this":
...
_codeGenerator.This();
break;
default:
ThrowCompilationException(...);
break;
}
}
else if (LookAheadToken.Type == TokenType.Ident)
{
//This is either a variable name, an array access,
//or a subroutine call. We need to look ahead:
Token varName = NextToken();
if (LookAheadToken.Value == ".")
{
ParseSubCall(varName);
}
else
{
//We’ll discuss symbol tables later. For now,
//this thing only helps us to figure out if the
//variable has been defined previously.
Symbol symbol = GetClosestSymbol(varName.Value);
if (symbol == null)
{
ThrowCompilationException(
"Undefined identifier '" +
varName.Value + "'");
}
bool arrAccess = false;
if (LookAheadToken.Value == "[")
{
arrAccess = true;
Match(new Token(TokenType.Symbol, "["));
ParseBLOCKED EXPRESSION;
Match(new Token(TokenType.Symbol, "]"));
}
_codeGenerator.VariableRead(varName, arrAccess);
}
}
else if (LookAheadToken.Value == "(")
{
Match(new Token(TokenType.Symbol, "("));
ParseBLOCKED EXPRESSION;
Match(new Token(TokenType.Symbol, ")"));
}
else if (IsNextTokenUnaryOp())
{
Token unaryOp = NextToken();
ParseTerm();
switch (unaryOp.Value)
{
case "-":
_codeGenerator.Negate();
break;
case "!":
_codeGenerator.Not();
break;
default:
ThrowCompilationException(...);
break;
}
}
else
{
ThrowCompilationException(...);
}
}
I intentionally omitted the parsing procedure for sub-call because it poses an interesting challenge, up next.