Select N+1 Problem – How to Decrease Your ORM Performance
Today one of the developers at my main customer showed me a code
snippet he wrote against Entity Framework and made me very pale. The code included the horrible select N+1 problem. This post will introduce the select N+1 pitfall and will explain how avoid it in Entity Framework.
What is Select N+1 Problem?
ORMs can help you to address the impedance mismatch between relational databases and object oriented models and by that make your life simpler. But not knowing about some of their pitfalls can decrease your performance dramatically. One of those pitfalls is the select N+1 problem. This problem is being caused mainly because most of the ORMs out there are enabling lazy loading behavior by default. When we have a parent-children relation the problem can raise its ugly head. The problem is happening when we are executing a single query and then N following queries (N is the number of parent entities) in order to query for something. As you can expect doing N+1 queries instead of a single one will flood your database with queries that we can and should avoid. This is very unacceptable.
Select N+1 Example
A department can hold 0 or more courses (a typical parent-children relation). Since EF4 enables the lazy loading behavior by default then the following code will raise the select N+1 problem:
And the result is:
All I wanted to do is to write to the output the titles of the courses and attach to them their parent department name. In the database I got one query to retrieve all the departments and then N queries to retrieve each and every one of the courses for that department. Since in my database there are only 4 departments then I got 5 queries (1 for departments and 4 for all the courses for each department). Now in real world scenario when there are many parents… you can figure the amount of queries you’ll be generating without even knowing you did that.
How to Avoid the Problem in Entity Framework?
One of the main solutions to the select N+1 problem in Entity Framework is to use the Include method.
The Include method is making an eager load for the children that you indicate to it. You give the method a path of all the children you like to load in the query (as long as you have a relation between the entities) and one query will be generated to bring back all the relevant entities. This isn’t a bullet proof solution! There are serious implications that you should understand when you use the Include method. The main implication is that it is doing a join between all the tables that you want to return and the data is retrieved in a flatten manner in order to materialize all the entities from it. Also the materialization process when having a lot of included entities can cause a downgrade of performance. So you will have to weight the balance between using Include or lazy loading. The following code will generate the same results as in the above figure but with only one query:
and take a look at the generated query:
There are pitfalls when we are using ORMs and one of them is the select N+1 problem. This isn’t a problem of Entity Framework only. This problem exists in other ORMs like NHibernate, LINQ to SQL and more. You should be aware of those problems when you develop with ORMs and avoid them whenever it is possible. One way to do that is the Include method in Entity Framework but this solution also can generate problems.