Entity Framework Bad Practices and solutions with EF-Prof

6 בספטמבר 2011

אין תגובות

Entity Framework Bad Practices – and solutions with EF-Prof

This week I had to find and fix an interesting bug. For some reason, when using Entity Framework, we’ve seen database queries that were causing a TimeoutException to be thrown. We used EF-Prof to find which query was giving us troubles, and found a few additional issues. Solving these issues has improved the performance of our server by a great deal.

While there are many ways to abuse ADO.Net Entity Framework, these were the two issues I’ve stumbled upon this week:

Setting the environment

In order to demonstrate the issues, I’m using an ultra-simple Entity Framework Model:

01-SimpleModel

The model is created in a Console Application.

Badly Interpreted Queries

The first issue we looked for was the TimeoutException. Consider the following code:

 using  (var  context = new  PubsEntities())
 {
     var  latestIds = from  pubs in  context.Pubs
                      where  pubs.City != null  && pubs != null
                      group  pubs by  pubs.City
                         into  cities
                         select  cities.Max(pub => pub.Id);
 
     var  latest = from  pub in  context.Pubs
                   where  latestIds.Contains(pub.Id)
                   select  pub;
 
     foreach  (var  pub in  latest)
     {
         Console .WriteLine("{0}:{1}" , pub.CityId, pub.Name);
     }
 }
 

Breakdown: This code finds the pub that has the largest Id in each city. Since the model uses auto incrementing identity, it’s safe to say that it finds the most recent pub added in each city. It does that by picking up the max Id of pubs in a city, and then gets all pubs, according to the latestIds.

The trivial problem is easy to see (because I painted it yellow Smile) – There’s a redundant check there, to verify that a row coming from the context is not null, which can’t really happen. The interesting thing is to check what’s going on in the generated SQL code:


SELECT [Id] AS [Id], [Name] AS [Name], [Owner] AS [Owner], 
       [NumberOfSeats] AS [NumberOfSeats], [Street] AS [Street], 
       [HouseNumber] AS [HouseNumber], [City] AS [City] 
FROM [Pub] 
WHERE EXISTS (
       SELECT 1 AS [C1] 
       FROM (  
              SELECT E3.[Id] AS [K1], E3.[Name] AS [K2], 
                     E3.[OtherUnusefulInformation] AS [K3], 
                     MAX([E2].[Id]) AS [A1] 
              FROM   [Pub] AS [E2] 
              LEFT OUTER JOIN [City] AS E3 
              ON     [E2].[City] = E3.[Id] 
              WHERE  ([E2].[City] IS NOT NULL) 
              AND    (cast(1 as bit) <> cast(0 as bit))
              GROUP BY E3.[Id], E3.[Name], E3.[OtherUnusefulInformation]) AS [GroupBy1] 
       WHERE [GroupBy1].[A1] = E1.[Id])

See the weird condition? It is evaluated for each record, because it can’t be cached by SQL-Server. That little piece of code slowed the query on a large enough dataset, and caused a timeout. By removing the nullity check, we reduced the query runtime from 30 sec to 34ms.

How did we find it? Whenever an error is thrown, the EF-Profiler shows an alert. Rest of the time, it was our most expensive query, according to the EF-Profiler.

Select N+1

Once the profiler was hooked and the timeout was gone, we went on and randomly clicked buttons on our application. We’ve found that a few of the methods had code that resembled the following:

 using  (var context = new PubsEntities())
 {
     var cities = context.Cities;
     foreach (var  city in  cities)
     {
         Console.WriteLine(city.Name);
         foreach (var  pub in  city.Pubs)
         {
             Console.WriteLine(“\t{0}”, pub.Name);
         }
     }
 }

Now, in an object oriented world, this code makes perfect sense – However, when using EF, this causes a lot of different SELECT statements – once for each pub being used. This is caused because the Pubs are Lazy-loaded. This issue is called “The Select N+1 data access anti-pattern”.

Solution? simple indicate to EF that the Cities collection should be loaded with pubs, like this:

 var cities = context.Cities.Include("Pubs");

The Include method causes the pubs collection to be eagerly loaded, thus only one call is made to the database. This reduces a lot of overhead. This issue was the most common alert given by EF-Prof.

Conclusions

1. When using an O/RM, such as ADO.Net’s entity framework, make sure to use a profiler to find any performance bottlenecks.

2. Don’t trust the generated queries blindly– go over the queries, and check their translation to SQL.

3. Using ADO.Net Entity Framework, or any O/RM for that matter, does not hide the database – it makes it much easier to use, but still – be aware that there’s a database under the hood, and do your best, to minimize the number of queries the database has to execute.

Happy Coding!

הוסף תגובה
facebook linkedin twitter email

כתיבת תגובה

האימייל לא יוצג באתר. שדות החובה מסומנים *