I've been trying to familiarize myself with the Entity Framework. Most of it seems straight forward, but I'm a bit confused on the difference between eager loading with the Include method and default lazy loading. Both seem like they load related entities, so on the surface it looks like they do the same thing. What am I missing?
Let's say you have two entities with a one-to-many relationship: Customer and Order, where each Customer can have multiple Orders.
When loading up a Customer entity, Entity Framework allows you to either eager load or lazy load the Customer's Orders collection. If you choose to eager load the Orders collection, when you retrieve a Customer out of the database Entity Framework will generate SQL that retrieves both the Customer's information and the Customer's Orders in one query. However, if you choose to lazy load the Orders collection, when you retrieve a Customer out of the database Entity Framework will generate SQL that only pulls the Customer's information (Entity Framework will then generate a separate SQL statement if you access the Customer's Orders collection later in your code).
Determining when to use eager loading and when to use lazy loading all comes down to what you expect to do with the entities you retrieve. If you know you only need a Customer's information, then you should lazy-load the Orders collection (so that the SQL query can be efficient by only retrieving the Customer's information). Conversely, if you know you'll need to traverse through a Customer's Orders, then you should eager-load the Orders (so you'll save yourself an extra database hit once you access the Customer's Orders in your code).
P.S. Be very careful when using lazy-loading as it can lead to the N+1 problem. For example, let's say you have a page that displays a list of Customers and their Orders. However, you decide to use lazy-loading when fetching the Orders. When you iterate over the Customers collection, then over each Customer's Orders, you'll perform a database hit for each Customer to lazy-load in their Orders collection. This means that for N customers, you'll have N+1 database hits (1 database hit to load up all the Customers, then N database hits to load up each of their Orders) instead of just 1 database hit had you used eager loading (which would have retrieved all Customers and their Orders in one query).
If you come from SQL world think about JOIN.
If you have to show in a grid 10 orders and the customer that put the order you have 2 choices:
1) LAZY LOAD ( = 11 queryes = SLOW PERFORMANCES)
EF will shot a query to retrieve the orders and a query for each order to retrieve the customer data.
Select * from order where order=1 + 10 x (Select * from customer where id = (order.customerId))
1) EAGER LOAD ( = 1 query = HIGH PERFORMANCES)
EF will shot a single query to retrieve the orders and customers with a JOIN.
Select * from orders INNER JOIN customers on orders.customerId=customer.Id where order=1
PS: When you retrieve an object from the db, the object is stored in a cache while the context is active. In the example that I made with LAZY LOAD, if all the 10 orders relate to the same customer you will see only 2 query because when you ask to EF to retrieve an object the EF will check if the object is in the cache and if it find it will not fire another SQL query to the DB.