November 2008 - Posts
In the last edition of Visual Studio Magazine, a reader-mail entry stated that he preferred VB.Net over C# because C# is ugly. Without entering that discussion, I find C# pretty at times. That reminded me of an episode from high school. Our math teacher was ill and none of the regular replacement teachers were available. Instead we got a math professor from a university who stepped in on short notice. I don’t remember much, but one thing got burnt into my memory. During that class she stood by the blackboard and scribbled something that at the time looked like gibberish to me. For a few minutes she stood there with her back to the pupils. When she finally turned around, she gazed at us, back at the blackboard and then at us again. She said with a passionate voice: “Can you see the beauty?”
I couldn’t, and for years I didn’t even understand what can be beautiful in numbers or in source code.
This morning when the kids were still asleep I needed to compare two parse results and see if they differ.
private bool IdentifiersChanged(IEnumerable<ScopeIdentifier> parseResult)
{
    if (_scopeIdentifiers == null)
        return true;
    var firstModification = parseResult
        .Except(_scopeIdentifiers, new ScopeIdentifierComparer())
        .FirstOrDefault();
    return (firstModification != null);
}
Can you see the beauty?
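For anyone who wants to play with the idea, here is a minimal, self-contained sketch. The ScopeIdentifier shape (a Name and a Line) and the equality rule in the comparer are my assumptions for illustration only, and HasNewOrChanged is a made-up helper name. One thing worth keeping in mind: Except() only reports items in the first sequence that are missing from the second, so an identifier that was removed from the new parse result does not show up as a difference.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical stand-in for the ScopeIdentifier type used above.
public class ScopeIdentifier
{
    public string Name { get; set; }
    public int Line { get; set; }
}

// Assumed rule: two identifiers are equal when both name and line match.
public class ScopeIdentifierComparer : IEqualityComparer<ScopeIdentifier>
{
    public bool Equals(ScopeIdentifier x, ScopeIdentifier y)
    {
        if (ReferenceEquals(x, y)) return true;
        if (x == null || y == null) return false;
        return x.Name == y.Name && x.Line == y.Line;
    }

    public int GetHashCode(ScopeIdentifier obj)
    {
        // Must agree with Equals: equal items produce equal hash codes.
        return (obj.Name ?? string.Empty).GetHashCode() ^ obj.Line;
    }
}

public static class ParseResultDiff
{
    // True when 'current' contains at least one item that 'previous' lacks.
    public static bool HasNewOrChanged(
        IEnumerable<ScopeIdentifier> current,
        IEnumerable<ScopeIdentifier> previous)
    {
        return current.Except(previous, new ScopeIdentifierComparer()).Any();
    }
}
```

Using Any() instead of FirstOrDefault() plus a null check is just a matter of taste here; both stop at the first difference.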
We use Database Projects to keep multiple versions of our databases in sync (both schema and data). Lately I’ve encountered several annoying build errors. The cause for all of them seemed to be related to the fact that the validation database had gotten out of sync.
Partial error message (so that SE can pick it up): Error 7 TSD7031: An object with name already exists in the database project…
Solution:
- Close the database project.
- If the validation database is still present on the server, delete it. (The name of the database is ProjectName_GUID, where ProjectName is the name of the database project)
- Delete the ProjectName.dat file.
- Reopen the database project.
I just read a post “Entity Framework - Disappointment” where a decision was made to ditch the EF because of some deficiencies. I want to emphasize that I know nothing about that specific project and the point of this post is not by any means to “attack” their approach. It is not my intention to pick on that specific post since I’ve seen similar complaints elsewhere. It did catch my attention though. I started with a comment, but it quickly became too long.
My first reaction when I read the list was that I wanted to check how difficult it would be to deal with these issues in our repository implementation.
The list of deficiencies:
- No eager loading. Need to specify .Include(“”) for every query.
- No way to turn change tracking on/off except by changing every query.
- A bug that prevents simple queries where nvarchar columns are used in the criteria.
- A designer error with a cryptic error message.
Before I go over the list of issues, it is important to note that in our whole system, which has many hundreds of queries, there is only one place where an ObjectQuery<T> is created. What makes each query return different data are the specifications passed to the GetXXX() methods.
Every query in the repository is created by a method that looks similar to the following. The real implementation is a little bit more complex than this.
/// <summary>
/// Create a query for type T.
/// </summary>
private ObjectQuery<T> CreateQuery<T>()
{
    ObjectQuery<T> query;
    string entitySetName = GetEntitySetName(typeof(T));
    query = ObjectContext.CreateQuery<T>(entitySetName);
    return query;
}
No Eager Loading
This is an issue that has been criticized a lot since the initial release of the Entity Framework. Since I didn’t know how to achieve this, I’ll walk us through the exploratory steps towards a somewhat simplistic solution.
The eager loading requirements I defined were as follows.
- A property on the repository should allow the developer consuming the repository to opt in/out for eager loading for all subsequent queries.
- No additional parameters should be required. (No use of the IncludeBuilder mentioned in my previous post on the repository)
- The eager load will only load related entities one level deep.
As you might know, the Entity Framework ObjectContext contains metadata. I was convinced that by exploring the metadata I would find a way to extract a list of related entities based on the type requested from the repository. After spending a few minutes in the debugger looking at a loaded ObjectContext I found that the MetadataWorkspace contains all the information we need. Now we only need to write the code…
Here’s the method that retrieves the navigation values that we will need to insert into the calls to .Include(“”).
private IEnumerable<string> GetEntitySetNavigations(EntitySet entitySet)
{
    return entitySet.ElementType.NavigationProperties.Select(p => p.Name);
}
We can now change CreateQuery() to include the following.
if (EagerLoad)
{
    foreach (string navigation in GetEntitySetNavigations(entitySet))
    {
        query = query.Include(navigation);
    }
}
return query;
That’s it. Eager loading by setting a property on the repository.
No property to turn change tracking off
Since change tracking is controlled by a property (MergeOption) of the ObjectQuery<T> instance created in CreateQuery(), we can solve the tracking issue by adding a property to the repository as well.
public bool TrackChanges { get; set; }
We now need to check this property in the CreateQuery() method.
We’ll change the last line of CreateQuery() to:
…
if (!TrackChanges)
    query.MergeOption = MergeOption.NoTracking;
return query;
…
This effectively allows us to turn change tracking on and off as we see fit, either on a per-query or per-session basis.
Can’t use nvarchar in criteria
We have not encountered this issue. I read the MSDN forum thread referenced and I agree that this could be a showstopper.
Cryptic error messages by the designer.
I agree. The designer is not very helpful in telling you where things went wrong. In order to make things a little smoother, I usually add a few entities at a time. That way you narrow down where your issues are. Not that that makes dealing with the errors much fun, but I haven’t gotten hung up too much.
To summarize, the Entity Framework has a lot of power, and it does take some time to get familiar with it. But I guess that’s the case for any complex technology. I would have to disagree with the blanket statement that EF is a disappointment. It does have some rough edges, but that’s the case with many v1 technologies. If you don’t mind doing some plumbing code then EF can be a good choice IMHO.
BTW, It took much longer to write this post than to code changes to the repository. Maybe I should stick with coding. :-)
In my last post I explored a little about how we use the Entity Framework. One question that comes up a lot is how do you test your services with the data access layer without hitting the database. Not hitting the database during tests is not only a performance issue. Unless you build and tear down your data on every run you have to make sure the test data is consistent. So how do we do it?
The first step is to create an in-memory version of the repository. This implementation will keep data in memory instead of hitting the database. It is important to build this in-memory version early and evolve it with the real repository. By having an in-memory implementation of the repository you immediately get a warning flag if you try to exploit some esoteric feature of your ORM or other data access technology. Not that it is inherently bad to exploit such features, but data access technologies come and go so this should be done carefully. (L2S anyone?)
Note that the implementation below is not only for unit tests. You can inject this implementation instead of a real one in the application to run it with consistent data.
Here’s a refresher of the interface.
public interface IRepository : IDisposable
{
    T[] GetAll<T>();
    T[] GetAll<T>(Expression<Func<T, bool>> filter);
    T GetSingle<T>(Expression<Func<T, bool>> filter);
    T GetSingle<T>(Expression<Func<T, bool>> filter, List<Expression<Func<T, object>>> subSelectors);
    void Delete<T>(T entity);
    void Add<T>(T entity);
    int SaveChanges();
    DbTransaction BeginTransaction();
}
Now let’s jump to the incredibly complex implementation of the in-memory implementation of the repository. :-)
As you can see below, the code to implement an in-memory repository is remarkably slim. Although the code is not rocket science, let’s walk through some interesting parts. The filters are all implemented as Expression<> instances. In the in-memory implementation we don’t need to translate the expressions into anything else, so we compile them and pass them as specifications to Linq to Objects.
Another interesting point (IMHO) is how simple it has become to work with types. You just call .OfType<>() with the type you want and Linq does the rest.
The last point I would like to emphasize is the use of a NullTransaction. We don’t need transactions for our in-memory implementation, but the service code might call any of the methods on the DbTransaction object returned from BeginTransaction. By using a NullTransaction (see code below) we simulate transactions. (I guess you could use a mocking framework for some of this, but I’m not there yet…)
Let’s dissect how this works:
public T GetSingle<T>(Expression<System.Func<T, bool>> filter)
{
    var predicate = filter.Compile();
    return _storage.OfType<T>().Where(p => predicate(p)).FirstOrDefault();
}
The first step is to compile the expression so that we end up with a predicate we can use in our Linq query. On the next line, OfType<T>() narrows the storage down to the requested type T, which the caller supplied, and Where() applies the predicate. We then return either the first match or, if nothing matches, the default value for T (null for reference types). That’s it.
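To see those steps in isolation, here is a small stand-alone sketch that runs the same compile, OfType, Where, FirstOrDefault chain over a mixed list of objects. Invoice, Payment, FindFirst and CreateMixedStorage are names I made up for the illustration; they are not part of our repository.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Linq.Expressions;

// Hypothetical entity types, used only for this illustration.
public class Invoice
{
    public int InvoiceId { get; set; }
    public decimal Amount { get; set; }
}

public class Payment
{
    public int PaymentId { get; set; }
}

public static class ExpressionFilterDemo
{
    // Mirrors the GetSingle() steps: compile, narrow by type, filter, take first.
    public static T FindFirst<T>(IEnumerable<object> storage, Expression<Func<T, bool>> filter)
    {
        // Compile() turns the expression tree into an ordinary delegate.
        Func<T, bool> predicate = filter.Compile();

        // OfType<T>() skips every element that is not a T.
        return storage.OfType<T>().Where(predicate).FirstOrDefault();
    }

    // One list holding entities of different types, like _storage in the repository.
    public static List<object> CreateMixedStorage()
    {
        return new List<object>
        {
            new Invoice { InvoiceId = 1, Amount = 50m },
            new Payment { PaymentId = 7 },
            new Invoice { InvoiceId = 2, Amount = 150m }
        };
    }
}
```

Calling ExpressionFilterDemo.FindFirst<Invoice>(storage, i => i.Amount > 100m) on that storage gives back the invoice with InvoiceId 2, and a filter that matches nothing gives back null.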
Here’s the full implementation. (Note: The SaveChanges() just returns 1. If you have service logic that depends on the actual return value you will have to track changes yourself.)
public class InMemoryRepository : IRepository
{
    private List<object> _storage = new List<object>();

    #region IRepository Members
    public T[] GetAll<T>()
    {
        return _storage.OfType<T>().ToArray();
    }

    public T[] GetAll<T>(Expression<System.Func<T, bool>> filter)
    {
        var predicate = filter.Compile();
        return _storage.OfType<T>().Where(p => predicate(p)).ToArray();
    }

    public T GetSingle<T>(Expression<System.Func<T, bool>> filter)
    {
        var predicate = filter.Compile();
        return _storage.OfType<T>().Where(p => predicate(p)).FirstOrDefault();
    }

    public T GetSingle<T>(Expression<System.Func<T, bool>> filter, List<Expression<System.Func<T, object>>> subSelectors)
    {
        // no need for sub selectors in L2O.
        return GetSingle<T>(filter);
    }

    public void Delete<T>(T entity)
    {
        _storage.Remove(entity);
    }

    public void Add<T>(T entity)
    {
        _storage.Add(entity);
    }

    public int SaveChanges()
    {
        return 1;
    }

    public DbTransaction BeginTransaction()
    {
        return new NullTransaction();
    }
    #endregion

    #region IDisposable Members
    public void Dispose()
    {
        // nothing to dispose here
    }
    #endregion
}
Here is the implementation of the NullTransaction
public class NullTransaction : DbTransaction
{
    protected override DbConnection DbConnection
    {
        get { throw new NotImplementedException(); }
    }

    public override IsolationLevel IsolationLevel
    {
        get { throw new NotImplementedException(); }
    }

    public override void Commit()
    {
        // do nothing
    }

    protected override void Dispose(bool disposing)
    {
        base.Dispose(disposing);
    }

    public override void Rollback()
    {
        // do nothing
    }
}
Now you can easily pre-populate the in-memory repository with consistent data so that you can test your application code more easily.
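As a rough sketch of what pre-populating could look like, the example below seeds a repository with known data and hands it to a service. To keep it self-contained I use a cut-down copy of the repository (Add, GetAll and GetSingle only, no transactions). Product, PricingService and TestData are hypothetical names invented for this example.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Linq.Expressions;

// Hypothetical entity used only for this example.
public class Product
{
    public int ProductId { get; set; }
    public string Name { get; set; }
    public decimal Price { get; set; }
}

// Cut-down version of the in-memory repository shown above.
public class MiniInMemoryRepository
{
    private readonly List<object> _storage = new List<object>();

    public void Add<T>(T entity)
    {
        _storage.Add(entity);
    }

    public T[] GetAll<T>(Expression<Func<T, bool>> filter)
    {
        var predicate = filter.Compile();
        return _storage.OfType<T>().Where(predicate).ToArray();
    }

    public T GetSingle<T>(Expression<Func<T, bool>> filter)
    {
        var predicate = filter.Compile();
        return _storage.OfType<T>().Where(predicate).FirstOrDefault();
    }
}

// Hypothetical service under test; it only talks to the repository.
public class PricingService
{
    private readonly MiniInMemoryRepository _repository;

    public PricingService(MiniInMemoryRepository repository)
    {
        _repository = repository;
    }

    // Sums the price of every product that costs more than the threshold.
    public decimal TotalPriceOver(decimal threshold)
    {
        return _repository.GetAll<Product>(p => p.Price > threshold).Sum(p => p.Price);
    }
}

public static class TestData
{
    // Builds a repository holding the same known data on every run.
    public static MiniInMemoryRepository CreateSeeded()
    {
        var repository = new MiniInMemoryRepository();
        repository.Add(new Product { ProductId = 1, Name = "Keyboard", Price = 40m });
        repository.Add(new Product { ProductId = 2, Name = "Monitor", Price = 200m });
        repository.Add(new Product { ProductId = 3, Name = "Cable", Price = 5m });
        return repository;
    }
}
```

A test would then create the service with new PricingService(TestData.CreateSeeded()) and assert against values that follow directly from the seeded data, without a database anywhere in sight.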
A lot has been written about L2S and the Entity Framework over the last few weeks since the announcement that the Microsoft Data Team will focus their efforts on the Entity Framework. A lot has also been written about all the deficiencies the Entity Framework has and that it is not ready for prime time.
For us the Entity Framework has greatly simplified data access across the board.
Data access code is tedious, repetitive and boring. I don’t want to focus my energy on how to access data. It should be simple, fast and it should just work. While stored procedures still have their place in certain scenarios and certain environments, most of the time writing SQL by hand is a waste of your customer’s money. Most ORMs will do the work at least as well as you do, if not better. In edge cases you will have to tweak, but for mainstream scenarios the SQL generated by most ORM tools is good enough.
I want to point out that this is not an introductory post on how to use the Entity Framework. If you want information on how to work with it, we have assembled a good list on our site. So without further delay, let’s see how we use the Entity Framework at Renaissance.
We made an early decision that we don’t want the Entity Framework to leak too much into our services. For that reason we access our model using the fairly standard repository pattern.
Here’s the main repository interface.
public interface IRepository : IDisposable
{
    T[] GetAll<T>();
    T[] GetAll<T>(Expression<Func<T, bool>> filter);
    T GetSingle<T>(Expression<Func<T, bool>> filter);
    T GetSingle<T>(Expression<Func<T, bool>> filter, List<Expression<Func<T, object>>> subSelectors);
    void Delete<T>(T entity);
    void Add<T>(T entity);
    int SaveChanges();
    DbTransaction BeginTransaction();
}
As you can see from the interface, the getters return a generic type or an array of a generic type. Delete and Add affect the repository instance, but will not hit the database. To commit any pending changes in the repository we call the SaveChanges method. If you’re not familiar with C# 3.0 expressions, the code “Expression<Func<T, bool>> filter” might look a little cryptic. Don’t worry, from the consuming side you don’t have to deal with this. What these expressions allow us to do is use lambda expressions as specifications for our repository.
The parameter “List<Expression<Func<T, object>>> subSelectors” requires some additional explanation. The Entity Framework will not retrieve related entities when you request a top level entity. For example, if you have a customer that can have many orders you will not receive the orders when you query for a customer. The Entity Framework will only retrieve related data if you explicitly tell it to do so. I don’t have a huge issue with this, but I know a lot of others do. What did bother me tremendously though, was the fact that you have to specify the related entities using string literals.
For example: context.Customer.Include("SalesOrderHeader.SalesOrderDetail");
This was pretty much a showstopper for me, so we introduced the IncludeBuilder. The IncludeBuilder allows you to retrieve related entities as well, but strongly typed. More on that later.
Ok, enough background, let’s look at the code that consumes the repository.
Here’s a simple example of adding new data.
…
using (var repository = _factory.Create())
using (var tx = repository.BeginTransaction())
{
    Customer customer = new Customer();
    customer.FirstName = firstName;
    repository.Add<Customer>(customer);
    repository.SaveChanges();
    … Some more code that requires transaction management…
    tx.Commit();
}
…
The boilerplate code is generated by a custom CodeRush template “nrp” aka “New Repository” (I just had to insert something related to CodeRush :-) )
Here’s a standard example of retrieving a customer with an ID of 10.
Customer customer = repository.GetSingle<Customer>(p => p.CustomerId == 10);
It can’t get much simpler than that.
If we want to retrieve an entity with its related entities the code is a little more involved, but remember, the boilerplate is generated. (Snippets are a decent option if you don’t have CodeRush or R#.)
var builder = new IncludeBuilder<Customer>();
builder.Add(p => p.Orders);
customer = repository.GetSingle<Customer>(p => p.CustomerId == customerId, builder.Includes);
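For the curious, here is one way a strongly typed builder like this could work under the hood. This is a hypothetical sketch and not our actual IncludeBuilder: the class names IncludeBuilderSketch and IncludePath, and the Customer, Order and OrderLine shapes, are all assumptions made for the example. The idea is to collect the selector expressions and later walk each one to recover the dotted path that .Include() expects.

```csharp
using System;
using System.Collections.Generic;
using System.Linq.Expressions;

// Hypothetical entities, only here to give the selectors something to point at.
public class OrderLine { }

public class Order
{
    public int OrderId { get; set; }
    public List<OrderLine> Lines { get; set; }
}

public class Customer
{
    public List<Order> Orders { get; set; }
    public Order LatestOrder { get; set; }
}

// Collects strongly typed selectors, matching the subSelectors parameter type.
public class IncludeBuilderSketch<T>
{
    private readonly List<Expression<Func<T, object>>> _includes =
        new List<Expression<Func<T, object>>>();

    public List<Expression<Func<T, object>>> Includes
    {
        get { return _includes; }
    }

    public void Add(Expression<Func<T, object>> selector)
    {
        _includes.Add(selector);
    }
}

public static class IncludePath
{
    // Turns p => p.LatestOrder.Lines into the string "LatestOrder.Lines".
    public static string From<T>(Expression<Func<T, object>> selector)
    {
        Expression body = selector.Body;

        // Value-type members get wrapped in a Convert node (boxing); unwrap it.
        var unary = body as UnaryExpression;
        if (unary != null && unary.NodeType == ExpressionType.Convert)
            body = unary.Operand;

        // Walk from the outermost member access back towards the parameter.
        var names = new Stack<string>();
        var member = body as MemberExpression;
        while (member != null)
        {
            names.Push(member.Member.Name);
            member = member.Expression as MemberExpression;
        }

        if (names.Count == 0)
            throw new ArgumentException("Expected a property access such as p => p.Orders.", "selector");

        return string.Join(".", names.ToArray());
    }
}
```

A repository could then loop over builder.Includes and call query.Include(IncludePath.From(selector)) for each entry. Note that this simple walk only understands plain property chains; stepping through a collection to its children would need extra handling.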
If you often retrieve the same hierarchy of data you could always create a specialized repository that implements the IRepository interface and add a GetCustomerWithOrders() method. If you have many variations this will become a burden though.
To summarize, I am basically a happy camper with the Entity Framework. It has some rough edges, but nothing we haven’t been able to work around.
Despite the fact that my blog has been dominated by CodeRush stuff lately, I think this one has been anticipated enough to spend some extra ink… :-)
Rory Becker (DxSquad) just announced (5 minutes ago) that CodeRush 3.2.1 has been released. If you are a CodeRush/Refactor! Pro user you can log in to the Client Center and download version 3.2.1.