Select N+1 Problem – How to Decrease Your ORM Performance

August 18, 2010

3 comments

Select N+1 Problem – How to Decrease Your ORM Performance

Today one of the developers at my main customer showed me a codeSelect N+1 Problem – How to Decrease Your ORM Performance
snippet he wrote against Entity Framework and made me very pale. The code included the horrible select N+1 problem. This post will introduce the select N+1 pitfall and will explain how avoid it in Entity Framework.

What is Select N+1 Problem?

ORMs can help you to address the impedance mismatch between relational databases and object oriented models and by that make your life simpler. But not knowing about some of their pitfalls can decrease your performance dramatically. One of those pitfalls is the select N+1 problem. This problem is being caused mainly because most of the ORMs out there are enabling lazy loading behavior by default. When we have a parent-children relation the problem can raise its ugly head. The problem is happening when we are executing a single query and then N following queries (N is the number of parent entities) in order to query for something. As you can expect doing N+1 queries instead of a single one will flood your database with queries that we can and should avoid. This is very unacceptable.

Select N+1 Example

To explain the problem more properly lets look at an example. Lets say that we have the following model:
Entity Designer Diagram    

A department can hold 0 or more courses (a typical parent-children relation). Since EF4 enables the lazy loading behavior by default then the following code will raise the select N+1 problem:

using (var context = new SchoolEntities())
{
  foreach (var department in context.Departments)
  {
    foreach (var course in department.Courses)
    {
      Console.WriteLine("{0}: {1}", department.Name, course.Title);
    }
  }
}

And the result is:

Running Output 
All I wanted to do is to write to the output the titles of the courses and attach to them their parent department name. In the database I got one query to retrieve all the departments and then N queries to retrieve each and every one of the courses for that department. Since in my database there are only 4 departments then I got 5 queries (1 for departments and 4 for all the courses for each department). Now in real world scenario when there are many parents… you can figure the amount of queries you’ll be generating without even knowing you did that.

How to Avoid the Problem in Entity Framework?

One of the main solutions to the select N+1 problem in Entity Framework is to use the Include method.

The Include method is making an eager load for the children that you indicate to it. You give the method a path of all the children you like to load in the query (as long as you have a relation between the entities) and one query will be generated to bring back all the relevant entities. This isn’t a bullet proof solution! There are serious implications that you should understand when you use the Include method. The main implication is that it is doing a join between all the tables that you want to return and the data is retrieved in a flatten manner in order to materialize all the entities from it. Also the materialization process when having a lot of included entities can cause a downgrade of performance. So you will have to weight the balance between using Include or lazy loading. The following code will generate the same results as in the above figure but with only one query:

using (var context = new SchoolEntities())
{
  foreach (var department in context.Departments.Include("Courses"))
  {
    foreach (var course in department.Courses)
    {
      Console.WriteLine("{0}: {1}", department.Name, course.Title);
    }
  }
}

and take a look at the generated query:

SELECT   [Project1].[DepartmentID]  AS [DepartmentID],
         [Project1].[Name]          AS [Name],
         [Project1].[Budget]        AS [Budget],
         [Project1].[StartDate]     AS [StartDate],
         [Project1].[Administrator] AS [Administrator],
         [Project1].[C1]            AS [C1],
         [Project1].[CourseID]      AS [CourseID],
         [Project1].[Title]         AS [Title],
         [Project1].[Days]          AS [Days],
         [Project1].[Time]          AS [Time],
         [Project1].[Location]      AS [Location],
         [Project1].[Credits]       AS [Credits],
         [Project1].[DepartmentID1] AS [DepartmentID1]
FROM     (SELECT [Extent1].[DepartmentID]  AS [DepartmentID],
                 [Extent1].[Name]          AS [Name],
                 [Extent1].[Budget]        AS [Budget],
                 [Extent1].[StartDate]     AS [StartDate],
                 [Extent1].[Administrator] AS [Administrator],
                 [Extent2].[CourseID]      AS [CourseID],
                 [Extent2].[Title]         AS [Title],
                 [Extent2].[Days]          AS [Days],
                 [Extent2].[Time]          AS [Time],
                 [Extent2].[Location]      AS [Location],
                 [Extent2].[Credits]       AS [Credits],
                 [Extent2].[DepartmentID]  AS [DepartmentID1],
                 CASE 
                   WHEN ([Extent2].[CourseID] IS NULL) THEN CAST(NULL AS int)
                   ELSE 1
                 END AS [C1]
          FROM   [dbo].[Department] AS [Extent1]
                 LEFT OUTER JOIN [dbo].[Course] AS [Extent2]
                   ON [Extent1].[DepartmentID] = [Extent2].[DepartmentID]) AS [Project1]
ORDER BY [Project1].[DepartmentID] ASC,
         [Project1].[C1] ASC

Summary

There are pitfalls when we are using ORMs and one of them is the select N+1 problem. This isn’t a problem of Entity Framework only. This problem exists in other ORMs like NHibernate, LINQ to SQL and more. You should be aware of those problems when you develop with ORMs and avoid them whenever it is possible. One way to do that is the Include method in Entity Framework but this solution also can generate problems.

Add comment
facebook linkedin twitter email

Leave a Reply

Your email address will not be published.

*

You may use these HTML tags and attributes: <a href="" title=""> <abbr title=""> <acronym title=""> <b> <blockquote cite=""> <cite> <code> <del datetime=""> <em> <i> <q cite=""> <strike> <strong>

3 comments

  1. Luciano Evaristo Guerche (Gorše)August 18, 2010 ב 16:45

    Gil,

    Using or not using “Include” depends on scenario/context. I’d resort to lazy loading at user interfaces, where data is loaded/consumed on demand (on a grid displaying many items, only children related to those rows user clicked be loaded) and I’d resort to eager loading (Include) for reporting, where all data will be consume at all. If data being consumed is to be used for summarization, I’d rather resort to linq aggregation functions instead (count, sum, average, aggregate).

    Cheers.

    Reply
  2. Bret Ferrier @runxc1August 18, 2010 ב 20:09

    One of the items that scottgu recently tweeted about is that in upcoming versions of EF4 that you will be able to load items without doing the fuzzy name matching namely you will be able to write something like this.

    northwind.Where(p=>p.UnitsInStock > 9).Include(p=>p.Category)

    Using Include when needed can be very handy.

    Reply
  3. Pingback: Entity Framework, et le piège du Lazy Loading | Some DotNet Thoughts, with a bit of agile (or the contrary)