December 2007 - Posts
This weekend I wrote a simple helper application that is supposed to divide a file with several SQL-scripts to several files. Nothing serious, but the work gave me two insights.
The first one is that it always, always takes me half a day to re-learn the regular expression syntax. No matter how many times I have used them, the next time I need them I will still stare at the MSDN examples trying to figure out what's going on. These magical strings are so goddamn incomprehensible.
The second one, and the subject of this post, is that I don't know how to write code without writing tests anymore. At first, I have to admit, I didn't mean to write tests at all. For this small a tool, I thought I would just spray some code and get it over with. But I couldn't. I was about to start coding when I thought: wait, if I don't write any tests, how would I know that the code even works? I wouldn't.
So I tried to recall how I used to work, in my pre-TDD days. I realized that what I did was: 1. Write all of the code, including the UI, without running it at all. 2. Start debugging. Now that seems rather silly. Why not write working code to begin with, and also gain tests that will accompany the code forever?
So I wrote the tests, and the code. At the end, everything worked on the first run. I didn't have to use a debugger at all. So what if this is just a small app? There just doesn't seem to be another way to work anymore. It's in your blood.
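To give a feel for what test-first looks like on a tool this small, here is a minimal sketch. It is not the actual helper: the class and method names are made up, and it assumes the scripts in the input are separated by lines containing only GO, which the real tool may not have done.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class SqlScriptSplitter
{
    // Assumption: a line holding only "GO" (any casing) separates two scripts.
    private static readonly Regex Separator =
        new Regex(@"^\s*GO\s*$", RegexOptions.Multiline | RegexOptions.IgnoreCase);

    public static List<string> Split(string input)
    {
        List<string> scripts = new List<string>();
        foreach (string part in Separator.Split(input))
        {
            string trimmed = part.Trim();
            if (trimmed.Length > 0)      // skip the empty pieces around separators
                scripts.Add(trimmed);
        }
        return scripts;
    }
}

public static class SqlScriptSplitterTests
{
    public static void Main()
    {
        // The tests come first; Split only has to make them pass.
        List<string> scripts = SqlScriptSplitter.Split("SELECT 1\nGO\nSELECT 2\ngo\n");
        Check(scripts.Count == 2, "expected two scripts");
        Check(scripts[0] == "SELECT 1", "first script");
        Check(scripts[1] == "SELECT 2", "second script");
        Check(SqlScriptSplitter.Split("").Count == 0, "empty input yields no scripts");
        Console.WriteLine("All tests passed");
    }

    private static void Check(bool condition, string message)
    {
        if (!condition) throw new Exception("Test failed: " + message);
    }
}
```

Writing each piece to its own file is left out on purpose, so the splitting logic can be tested without touching the disk.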
Partial methods are a feature I was completely unaware of until I started digging into Linq-to-SQL recently. After running the Linq class designer on the Northwind database, I looked into the generated code and saw something like this:
public partial class Category
{
    partial void OnCreated();

    public Category()
    {
        //Do stuff
        OnCreated();
    }
}
This is generated code for the Category entity, and you can see it includes a method definition for OnCreated, which does not have any body and is marked with the partial keyword. Now, if I wanted to add logic to the OnCreated method, I could do that in the second part of the partial class.
public partial class Category
{
    partial void OnCreated()
    {
        Console.WriteLine("Category was created");
    }
}
Since I can't safely change the generated code (it might get regenerated, wiping out my changes), the partial method syntax provides a hook that lets me inject my logic into the class construction while keeping it in a different file.
How is it implemented?
In order to understand what's going on, we should look at the Category class in Reflector, before and after we add the second partial (the one that implements "OnCreated").
Before:
public class Category
{
}
And After:
public class Category
{
    // Methods
    public Category()
    {
        this.OnCreated();
    }

    private void OnCreated()
    {
        Console.WriteLine("Category was created");
    }
}
As you can see, if we don't implement OnCreated, the method isn't even there! The compiler simply ignores its existence, so you will see neither the method definition nor any of the calls to it. Also note that the method OnCreated is private. Partial methods always are: you cannot add access modifiers (public, protected) to them.
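If you don't have Reflector at hand, plain reflection can confirm the same thing. The sketch below uses two made-up classes, WithHook and WithoutHook, standing in for a generated entity whose hook is and isn't implemented.

```csharp
using System;
using System.Reflection;

// "Generated" half: declares the hook and calls it.
public partial class WithHook
{
    public bool HookRan;
    partial void OnCreated();
    public WithHook() { OnCreated(); }
}

// Hand-written half: implements the hook.
public partial class WithHook
{
    partial void OnCreated() { HookRan = true; }
}

// Same shape, but nobody ever implements the hook.
public partial class WithoutHook
{
    partial void OnCreated();
    public WithoutHook() { OnCreated(); }   // this call is dropped by the compiler
}

public static class PartialMethodDemo
{
    public static bool HasPrivateMethod(Type type, string name)
    {
        return type.GetMethod(name, BindingFlags.Instance | BindingFlags.NonPublic) != null;
    }

    public static void Main()
    {
        Console.WriteLine(new WithHook().HookRan);                              // True
        Console.WriteLine(HasPrivateMethod(typeof(WithHook), "OnCreated"));     // True
        Console.WriteLine(HasPrivateMethod(typeof(WithoutHook), "OnCreated"));  // False
    }
}
```

Constructing a WithoutHook still works fine; there is simply nothing left to call.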
A question comes to mind: why not use events instead? Well, you could define a private OnCreated event, but a private event is a rather weird beast, isn't it? Since when does a class need events that no one else will ever get to see? Not to mention that you would have to remember to subscribe to that event. Also, consider that there might be a whole bunch of hooks such as OnCreated (and there are in the Linq-to-SQL generated code). Most of them will never get used, so why should they even appear in the class? Ignoring them completely makes the class a lot more lightweight.
All in all, this could be a pretty useful tool for all of you out there that generate code. On a side note: Man, I wish I could change the compiler every time I feel like I'm missing a feature. Those lucky Microsoft guys...
This is a small note for people who are trying out the MVC framework on Windows XP. If you dislike the ASP.NET development server that comes by default with Visual Studio as much as I do, you have probably switched your MVC application to run against IIS already. Windows XP runs IIS 5.1, and you'll notice that routing stops working once you make that switch.
For instance, a request for /Products/Categories, where Products is a controller and Categories is an action on the controller, will fail. The problem is that IIS never hands this request to ASP.NET. Since you are not asking for an .aspx page, IIS has no way of knowing that it needs to load ASP.NET at all. So you have to tell it to map all requests to ASP.NET. You do it like this:
- Right-click your application's virtual directory in inetmgr.exe.
- Go to Properties -> Virtual Directory tab -> Configuration.
- Add a new extension mapping. The extension should be .*, mapped to the executable C:\WINDOWS\Microsoft.NET\Framework\v2.0.50727\aspnet_isapi.dll, or the appropriate location on your computer (you can simply copy this from the mapping for .aspx files). On the mapping, uncheck "check that file exists".
- Click OK three times and you're good to go.
- If you want, you can apply this setting to all your web sites. In the first step, click on the "Default Web Site" node instead of your own virtual directory, and in the second step go to the "Home Directory" tab. The rest is the same.
Now you've mapped every request on the web site to ASP.NET. To check that it works, request a nonexistent HTML file on your web site and see that the error page you get has an ASP.NET footer.
The post I wrote yesterday about Expression Trees, and Jafar Husain's work, have inspired me to find some more cool usages for this feature. Consider this code:
public class PersonRepository
{
    public void Add(Person person, Context context)
    {
        if (person == null)
            throw new ArgumentNullException("person");
        if (context == null)
            throw new ArgumentNullException("context");
        //...Do the actual stuff
    }
}
The most annoying thing about this code is having to write the parameter name as a string, which can trip up refactoring: rename the parameter and the string silently goes stale. Anyway, nobody likes too many strings in their code, right? Well, with Expression Trees and Extension Methods we can have this:
public void Add(Person person, Context context)
{
    person.AssertNotNull( () => person );
    context.AssertNotNull( () => context );
    //...Do the actual stuff
}
You will note that the code is shorter and contains no strings. We can achieve this by writing the following extension method:
1  public static class AssertionExtensions
2  {
3      public static void AssertNotNull<T>(this T value, Expression<Func<T>> paramNameExpr)
4      {
5          string paramSymbol = ((MemberExpression)paramNameExpr.Body).Member.Name;
6
7          if (value == null)
8              throw new ArgumentNullException(paramSymbol);
9      }
10 }
This method accepts our value, and a lambda expression. The lambda expression should be a method that accepts nothing, and returns a value of the same type. "()=>person" is exactly that. From the lambda, we extract the name of the parameter to put in the exception message.
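To see the whole thing run end to end, here is a self-contained version. The Greet method and the ParamNameFor helper are made up for the demonstration; the extension method itself is the one from the listing.

```csharp
using System;
using System.Linq.Expressions;

public static class AssertionExtensions
{
    public static void AssertNotNull<T>(this T value, Expression<Func<T>> paramNameExpr)
    {
        // The lambda body is a member access on the compiler-generated closure;
        // the member carries the name of the captured parameter.
        string paramSymbol = ((MemberExpression)paramNameExpr.Body).Member.Name;

        if (value == null)
            throw new ArgumentNullException(paramSymbol);
    }
}

public static class AssertNotNullDemo
{
    // Hypothetical method guarded by the assertion.
    public static void Greet(string name)
    {
        name.AssertNotNull(() => name);
        Console.WriteLine("Hello, " + name);
    }

    // Runs the action and reports which parameter name the exception carried, if any.
    public static string ParamNameFor(Action action)
    {
        try { action(); return null; }
        catch (ArgumentNullException ex) { return ex.ParamName; }
    }

    public static void Main()
    {
        Greet("world");                                       // Hello, world
        Console.WriteLine(ParamNameFor(() => Greet(null)));   // name
    }
}
```

Rename the parameter with a refactoring tool and the lambda follows along, which is the whole point.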
Now, some of you might wonder why I'm casting to MemberExpression in line 5 here. Isn't "()=>person" a ConstantExpression? There is no class here whose member is "person"! Well, actually, there is. Since ()=>person is just a shortcut for an anonymous method, the compiler creates a hidden class behind the scenes to support the fact that we are referencing a variable from an outer scope, the PersonRepository.Add method. This is the class whose member we're accessing.
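You can check this for yourself by asking the expression what it is. The Describe helper below is made up for the purpose; the exact name of the hidden class is a compiler detail, so the sketch prints it rather than assuming it.

```csharp
using System;
using System.Linq.Expressions;

public static class ClosureInspection
{
    // Reports the node type of the lambda body and, for member access,
    // which class and member are being read.
    public static string Describe<T>(Expression<Func<T>> expr)
    {
        MemberExpression member = expr.Body as MemberExpression;
        if (member == null)
            return expr.Body.NodeType + " (no member access)";

        return member.NodeType + " on " + member.Member.DeclaringType.Name
            + ", member " + member.Member.Name;
    }

    public static void Main()
    {
        string person = "Alice";

        // A captured local: member access on the compiler-generated class.
        Console.WriteLine(Describe(() => person));

        // A literal really is a ConstantExpression, so the cast in line 5 would fail for it.
        Console.WriteLine(Describe(() => "literal"));
    }
}
```

The second call is a reminder that AssertNotNull only works when the lambda returns a captured variable or a member; anything else makes the cast throw an InvalidCastException.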
I definitely like this syntax. If only it weren't so slow to build the expressions...
This week I'll talk about what seems to me like the coolest features of C# 3.0: lambda expressions and expression trees.
Lambda Expressions
We'll start with the part that is easier to understand. Remember anonymous methods from C# 2.0? Useful little bastards they are, only sometimes not so pleasant on the eyes.
public void SortListIgnoreCase()
{
    List<string> list = new List<string>{"abc", "ADE", "dol"};
    list.Sort(delegate(string s1, string s2)
        {
            return s1.ToLower().CompareTo(s2.ToLower());
        }
    );
}
So here we are sorting a list by passing a delegate of type Comparison<string> to the Sort method, which compares the lowercase versions of the strings. Thing is, the syntax is not the prettiest. Here's how you'd write it in C# 3.0:
list.Sort((s1,s2) => s1.ToLower().CompareTo(s2.ToLower()));
The argument here is called a lambda expression. We are doing the exact same thing as before, only with fewer words. The compiler is now smart enough to infer the parameter types by itself. For anonymous methods as short as this one, I recommend using only this syntax from now on.
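For reference, here are the forms side by side in a small runnable sketch. The class and method names are only for the demonstration.

```csharp
using System;
using System.Collections.Generic;

public static class LambdaForms
{
    public static List<string> SortWith(IEnumerable<string> items, Comparison<string> comparison)
    {
        List<string> list = new List<string>(items);
        list.Sort(comparison);
        return list;
    }

    public static void Main()
    {
        string[] words = { "dol", "ADE", "abc" };

        // C# 2.0 anonymous method.
        Comparison<string> anonymous =
            delegate(string s1, string s2) { return s1.ToLower().CompareTo(s2.ToLower()); };

        // C# 3.0 lambda with explicit parameter types.
        Comparison<string> explicitTypes =
            (string s1, string s2) => s1.ToLower().CompareTo(s2.ToLower());

        // C# 3.0 lambda with inferred types and a statement body.
        Comparison<string> statementBody =
            (s1, s2) => { return s1.ToLower().CompareTo(s2.ToLower()); };

        // All three, plus the short inline form, produce the same ordering.
        Console.WriteLine(string.Join(", ", SortWith(words, anonymous).ToArray()));
        Console.WriteLine(string.Join(", ", SortWith(words, explicitTypes).ToArray()));
        Console.WriteLine(string.Join(", ", SortWith(words, statementBody).ToArray()));
        Console.WriteLine(string.Join(", ",
            SortWith(words, (s1, s2) => s1.ToLower().CompareTo(s2.ToLower())).ToArray()));
        // Each line prints: abc, ADE, dol
    }
}
```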
How is it Implemented?
Well, as you would expect. Reflector shows:
1  public void SortListIgnoreCase()
2  {
3      List<string> <>g__initLocal1 = new List<string>();
4      <>g__initLocal1.Add("abc");
5      <>g__initLocal1.Add("ADE");
6      <>g__initLocal1.Add("dol");
7      List<string> list = <>g__initLocal1;
8      if (CS$<>9__CachedAnonymousMethodDelegate3 == null)
9      {
10         CS$<>9__CachedAnonymousMethodDelegate3 = delegate (string s1, string s2) {
11             return s1.ToLower().CompareTo(s2.ToLower());
12         };
13     }
14     list.Sort(CS$<>9__CachedAnonymousMethodDelegate3);
15 }
The lines of interest are 8-14. In line 10 we can see how our lambda expression is changed to a standard anonymous method. We can also note that the compiler caches our delegate in a static field, so it is not created again on later runs of the same method.
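The caching can be observed from the outside, although it is a compiler implementation detail rather than a language guarantee. In the sketch below (names made up), a lambda that captures nothing typically yields the same delegate instance on every call, while one that captures a parameter needs a fresh closure each time.

```csharp
using System;

public static class DelegateCaching
{
    // Captures nothing, so the compiler is free to create the delegate once and reuse it.
    public static Comparison<string> NonCapturing()
    {
        return (s1, s2) => s1.ToLower().CompareTo(s2.ToLower());
    }

    // Captures ignoreCase, so every call allocates a new closure and a new delegate.
    public static Comparison<string> Capturing(bool ignoreCase)
    {
        return (s1, s2) => string.Compare(s1, s2, ignoreCase);
    }

    public static void Main()
    {
        // Usually True with current compilers, but not promised by the language.
        Console.WriteLine(ReferenceEquals(NonCapturing(), NonCapturing()));

        // False: two calls, two closures.
        Console.WriteLine(ReferenceEquals(Capturing(true), Capturing(true)));
    }
}
```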
Expression Trees
Now, say we wanted the Sort method to print the comparison algorithm to the console.
Specifically, we want code like this:
public void SortListIgnoreCase()
{
    List<string> list = new List<string>{"abc", "ADE", "dol"};
    list.SortWithLogging((s1,s2) => s1.ToLower().CompareTo(s2.ToLower()));
}
to print this:
Sorting with the following comparison: (s1, s2) => s1.ToLower().CompareTo(s2.ToLower())
Huh? Print a method? Yeah, you heard right, and it's damn easy to do with Expression Trees. Let's look at the Sort method we wrote last week, when we talked about extension methods:
public static IEnumerable<T> Sort<T>(this IEnumerable<T> collection)
{
    List<T> toSort = new List<T>(collection);
    toSort.Sort();
    return toSort;
}
We will now change our method to something like this:
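public static IEnumerable<T> SortWithLogging<T>(this IEnumerable<T> collection, Expression<Comparison<T>> comparison)
{
    // Print the textual form of the comparison before using it.
    Console.WriteLine("Sorting with the following comparison: " + comparison.ToString());
    // Turn the expression tree into a delegate we can actually invoke.
    Comparison<T> actualComparison = comparison.Compile();
    List<T> toSort = new List<T>(collection);
    toSort.Sort(actualComparison);
    // Return the sorted copy; otherwise the work is thrown away.
    return toSort;
}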
Now, let's see what we have here. We've added a parameter, Comparison<T>, to do the object comparison when sorting. But wait, the parameter is actually Expression<Comparison<T>>, what the hell? The Expression abstract class and its derivatives can represent a C# expression in a strongly typed manner. Once we declare a parameter of type Expression<Comparison<T>> and someone passes us a Comparison<T> lambda, the compiler builds an expression object for us instead of a delegate. We can easily print the expression to the screen by invoking its ToString method. We can then Compile the expression to receive the actual comparison delegate and use it.
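Here is the idea as one self-contained program you can paste into a console project. It is a sketch of the method discussed above, in a variation that returns the sorted copy as a List<T> so the result can be printed; the class names are made up.

```csharp
using System;
using System.Collections.Generic;
using System.Linq.Expressions;

public static class LoggingSortExtensions
{
    public static List<T> SortWithLogging<T>(
        this IEnumerable<T> collection, Expression<Comparison<T>> comparison)
    {
        // The expression knows how to describe itself.
        Console.WriteLine("Sorting with the following comparison: " + comparison);

        // Compile turns the tree into a real delegate.
        Comparison<T> actualComparison = comparison.Compile();

        List<T> toSort = new List<T>(collection);
        toSort.Sort(actualComparison);
        return toSort;
    }
}

public static class SortWithLoggingDemo
{
    public static void Main()
    {
        List<string> list = new List<string> { "dol", "ADE", "abc" };

        // Same lambda syntax as before; the parameter type decides that the
        // compiler emits an expression tree here instead of a delegate.
        List<string> sorted =
            list.SortWithLogging((s1, s2) => s1.ToLower().CompareTo(s2.ToLower()));

        Console.WriteLine(string.Join(", ", sorted.ToArray()));
    }
}
```

Note that the call site looks exactly like a call taking a plain delegate. The caller cannot tell the difference without looking at the signature.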
How is it implemented?
We call this feature expression trees, since every expression is in fact a tree of other Expressions. Let's see how the compiler translates our call to SortWithLogging:
ParameterExpression C1;
ParameterExpression C2;
list.SortWithLogging<string>
(
    Expression.Lambda<Comparison<string>>
    (
        Expression.Call
        (
            Expression.Call
            (
                C1 = Expression.Parameter(typeof(string), "s1"),
                (MethodInfo) methodof(string.ToLower),
                new Expression[0]
            ),
            (MethodInfo) methodof(string.CompareTo),
            new Expression[]
            {
                Expression.Call
                (
                    C2 = Expression.Parameter(typeof(string), "s2"),
                    (MethodInfo) methodof(string.ToLower), new Expression[0]
                )
            }
        ),
        new ParameterExpression[] { C1, C2 }
    )
);
Yes, WTF?! is supposed to be your reaction. But indeed, the compiler translates s1.ToLower().CompareTo(s2.ToLower()) into this entire tree of expressions, via repeated calls to static methods on the Expression class. I will not explain each and every call, but you should understand the concept: the compiler is creating an object representation of a method, including the calls it makes, its use of parameters, everything. The variables C1 and C2 represent our own two parameters, s1 and s2.
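The methodof operator in that listing is Reflector shorthand, not real C#. A hand-written equivalent, looking the methods up through reflection, might look like the sketch below. The class name is made up and this is my reconstruction, not compiler output.

```csharp
using System;
using System.Linq.Expressions;
using System.Reflection;

public static class ManualExpressionTree
{
    public static Expression<Comparison<string>> BuildIgnoreCaseComparison()
    {
        // The two lambda parameters.
        ParameterExpression s1 = Expression.Parameter(typeof(string), "s1");
        ParameterExpression s2 = Expression.Parameter(typeof(string), "s2");

        // Reflection stands in for Reflector's methodof(...).
        MethodInfo toLower = typeof(string).GetMethod("ToLower", Type.EmptyTypes);
        MethodInfo compareTo = typeof(string).GetMethod("CompareTo", new[] { typeof(string) });

        // s1.ToLower().CompareTo(s2.ToLower())
        MethodCallExpression body = Expression.Call(
            Expression.Call(s1, toLower),
            compareTo,
            Expression.Call(s2, toLower));

        return Expression.Lambda<Comparison<string>>(body, s1, s2);
    }

    public static void Main()
    {
        Expression<Comparison<string>> tree = BuildIgnoreCaseComparison();
        Console.WriteLine(tree);                         // the textual form of the lambda

        Comparison<string> compare = tree.Compile();
        Console.WriteLine(compare("ABC", "abd") < 0);    // True
    }
}
```

Building trees by hand like this is also how you would generate a comparison at runtime, for example from a property name chosen by the user.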
One thing that should be said is that building and compiling expressions is slow. Very slow. Running Sort 10000 times took 62.5 milliseconds. Running SortWithLogging, even without the Console.WriteLine call, took more than 8 seconds! That's a big difference right there. Also, note that the delegate caching is gone once we pass our lambda to a method that accepts an expression: the tree is rebuilt on every call. Apparently, this issue is affecting Linq to SQL in a bad way.
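One possible way to soften the hit, when the same expression is used over and over, is to build and compile it once and keep both around. This is a sketch of that idea under my own naming, not something I have measured against the numbers above.

```csharp
using System;
using System.Collections.Generic;
using System.Linq.Expressions;

public static class CachedComparisons
{
    // Built once, when the class is first touched, instead of on every call.
    public static readonly Expression<Comparison<string>> IgnoreCaseExpression =
        (s1, s2) => s1.ToLower().CompareTo(s2.ToLower());

    // Compiled once as well; static initializers run in textual order,
    // so the expression above already exists at this point.
    public static readonly Comparison<string> IgnoreCase = IgnoreCaseExpression.Compile();

    public static List<string> SortIgnoreCase(IEnumerable<string> items)
    {
        List<string> toSort = new List<string>(items);
        toSort.Sort(IgnoreCase);     // no tree building, no compiling here
        return toSort;
    }

    public static void Main()
    {
        // The expression is still available for logging...
        Console.WriteLine("Sorting with: " + IgnoreCaseExpression);

        // ...while repeated sorts reuse the compiled delegate.
        for (int i = 0; i < 3; i++)
        {
            List<string> sorted = SortIgnoreCase(new[] { "dol", "ADE", "abc" });
            Console.WriteLine(string.Join(", ", sorted.ToArray()));
        }
    }
}
```

The trade-off is that the lambda no longer sits at the call site, which takes away some of the charm.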
For more cool stuff you can do with expression trees, have a look at using it for strong-typing INotifyPropertyChanged, logging, and mocking. Just beware of that performance hit!
Consider the following. You need to sort an IEnumerable<T>. By default, Sort is only available for T[] (arrays) and List<T> (generic list). So you write something like this (probably not the most efficient implementation, this is just an example):
namespace Utils
{
    public static class CollectionUtils
    {
        public static IEnumerable<T> Sort<T>(IEnumerable<T> collection)
        {
            List<T> toSort = new List<T>(collection);
            toSort.Sort();
            return toSort;
        }
    }
}
And you'll use it like this:
IEnumerable<string> sorted = CollectionUtils.Sort(myCollection);
Which is OK, but not as pretty as myCollection.Sort(), a much cleaner syntax. With the new C# 3.0 feature of extension methods, it is now possible to achieve just that, with only one little change to our method signature:
public static IEnumerable<T> Sort<T>(this IEnumerable<T> collection)
Note the "this" keyword before the parameter. Now, in any code that imports the Utils namespace, we can have pretty syntax:
using Utils;
...
IEnumerable<string> sorted = (new string[]{"hello", "friend"}).Sort();
Actually, Microsoft created this syntax for the sake of Linq, which requires a lot of operations on enumerables. You will notice that whenever you use an IEnumerable<T> (with System.Linq imported) you now get a whole bunch of extra methods, such as Where, Select and OrderBy.
This is thanks to the System.Linq.Enumerable static class, which provides all these static methods on IEnumerable<T>. Still, it doesn't have a definition for Sort, or for some of the other methods you may have in your own CollectionUtils class (I'm sure we've all got one). So nothing stops you from going right now and changing all of the methods in your CollectionUtils/StringUtils/WhateverUtils to use the new syntax. Older code will compile just fine, since you can still call the methods with the direct syntax.
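Put together as a single file, with both call styles next to each other, it looks like this. The Demo namespace and the Joined helper are only there to make the sketch runnable.

```csharp
using System;
using System.Collections.Generic;

namespace Utils
{
    public static class CollectionUtils
    {
        // "this" on the first parameter is the only change from the original helper.
        public static IEnumerable<T> Sort<T>(this IEnumerable<T> collection)
        {
            List<T> toSort = new List<T>(collection);
            toSort.Sort();
            return toSort;
        }
    }
}

namespace Demo
{
    using Utils;   // without this import the extension syntax does not compile

    public static class ExtensionSyntaxDemo
    {
        public static string Joined(IEnumerable<string> items)
        {
            return string.Join(",", new List<string>(items).ToArray());
        }

        public static void Main()
        {
            string[] words = { "hello", "friend" };

            // New extension syntax.
            Console.WriteLine(Joined(words.Sort()));                  // friend,hello

            // Old direct syntax still compiles against the same method.
            Console.WriteLine(Joined(CollectionUtils.Sort(words)));   // friend,hello
        }
    }
}
```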
PowerCollections and Extension Methods
In fact, I liked the extension methods syntax so much that I had to go and change the PowerCollections library, which I use extensively. This is an open source library that contains a lot of collections and data structures that do not come with the framework by default. It also has an Algorithms static class, which falls beautifully into the "must be converted to use extension methods" category. And so I did. Now I have this:
using Wintellect.PowerCollections;
...
public void PowerCollectionsDemo()
{
    string[] array = new string[] { "G", "H", "A", "B" };
    array.Sort(); //Yey, can delete that from my own CollectionUtils
    array.StableSortInPlace(); //A bit more clever
    int result = array.LexicographicalCompare(new string[] {"B", "C", "D" });
    Console.WriteLine(array.Reverse().AsString());
    //... And a lot more
}
This is just an example of some of the methods you get with PowerCollections, and it all becomes even nicer with the new syntax. Sort is there, as you can see (and it works for any collection), along with dozens of other methods. Note the usage of the "AsString" method. This method recursively converts a collection to a string representation, and it used to be called ToString. I had to rename it to AsString, since calling array.ToString() would have invoked the Array type's own ToString member, which would have printed something like System.String[]. Whenever the compiler has to choose between a class method and an extension method, it chooses the class method.
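The conflict is easy to reproduce without PowerCollections. The sketch below defines its own toy extensions (made-up names, and a much simpler AsString than the library's recursive one) to show that an extension called ToString is never picked through instance syntax.

```csharp
using System;
using System.Collections.Generic;

public static class StringifyExtensions
{
    // Toy stand-in for the library method: joins the items inside braces.
    public static string AsString<T>(this IEnumerable<T> items)
    {
        List<string> parts = new List<string>();
        foreach (T item in items)
            parts.Add(Convert.ToString(item));
        return "{" + string.Join(",", parts.ToArray()) + "}";
    }

    // Legal to declare, but array.ToString() will never reach it.
    public static string ToString<T>(this IEnumerable<T> items)
    {
        return items.AsString();
    }
}

public static class InstanceWinsDemo
{
    public static void Main()
    {
        string[] array = { "G", "H" };

        Console.WriteLine(array.ToString());                      // System.String[]
        Console.WriteLine(array.AsString());                      // {G,H}

        // The extension is still reachable as a plain static call.
        Console.WriteLine(StringifyExtensions.ToString(array));   // {G,H}
    }
}
```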
Anyway, you can download the Binaries and Source for the changed PowerCollections right here.
How is it implemented?
Let's reflect on our last method.
public void PowerCollectionsDemo()
{
    string[] array = new string[] { "G", "H", "A", "B" };
    Algorithms.Sort<string>(array);
    Algorithms.StableSortInPlace<string>(array);
    int result = Algorithms.LexicographicalCompare<string>(array, new string[] { "B", "C", "D" });
    Console.WriteLine(Algorithms.AsString<string>(Algorithms.Reverse<string>(array)));
}
Well, not much to it. All our extension method calls were converted by the compiler into direct calls on the static Algorithms class. The one thing to always remember is that this is no more than syntactic sugar for static methods. Your extension methods should always behave as such; that is, they should not maintain state.
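One consequence of the sugar is worth seeing once: since the "instance" is really just the first argument of a static method, calling an extension method on a null reference does not throw by itself. The extension below is a made-up example.

```csharp
using System;

public static class NullSafeExtensions
{
    // value is an ordinary argument, so it may well be null.
    public static bool IsNullOrEmpty(this string value)
    {
        return string.IsNullOrEmpty(value);
    }
}

public static class SyntacticSugarDemo
{
    public static void Main()
    {
        string missing = null;

        // Looks like an instance call on null, yet no NullReferenceException.
        Console.WriteLine(missing.IsNullOrEmpty());                   // True

        // What the compiler actually emits.
        Console.WriteLine(NullSafeExtensions.IsNullOrEmpty("abc"));   // False
    }
}
```

Whether an extension method should tolerate null or throw ArgumentNullException is a design decision for each method, just as it is for any static helper.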
Until the next week, happy coding.