What is the best practice for multiple "Include"-s in Entity Framework?

architecture c# data-access-layer ef-database-first entity-framework

Question

Let's say we have four entities in data model: Categories, Books, Authors and BookPages. Also assume Categories-Books, Books-Authors and Books-BookPages relationships are one-to-many.

If a category entity instance is retrieved from database - including "Books", "Books.BookPages" and "Books.Authors" - this will become a serious performance issue. Moreover, not including them will result in "Object reference is not set to an instance of an object" exception.

What is the best practice for using multiple Include method calls?

  • Write a single method GetCategoryById and include all items inside (performance issue)
  • Write a single method GetCategoryById and send a list of relationships to include (maybe, but still seems not elegant enough)
  • Write methods like GetCategoryByIdWithBooks, GetCategoryByIdWithBooksAndBooksPages and GetCategoryByIdWithBooksAndAuthors (not practical)

EDIT: By second option I meant something like this:

public static Category GetCategoryById(ModelEntities db, int categoryId, params string[] includeFields)
{
    var categories = db.Categories;

    foreach (string includeField in includeFields)
    {
        categories = categories.Include(includeField);
    }

    return categories.SingleOrDefault(i => i.CategoryId == categoryId);
}

When calling we need a code like this:

Category theCategory1 = CategoryHelper.GetCategoryById(db, 5, "Books");
Category theCategory2 = CategoryHelper.GetCategoryById(db, 5, "Books", "Books.Pages");
Category theCategory3 = CategoryHelper.GetCategoryById(db, 5, "Books", "Books.Authors");
Category theCategory4 = CategoryHelper.GetCategoryById(db, 5, "Books", "Books.Pages", "Books.Authors");

Are there any distinct disadvantages of this approach?

1
15
1/15/2014 12:30:22 PM

Accepted Answer

Write a single method GetCategoryById and send a list of relationships to include (maybe, but still seems not elegant enough)

Write methods like GetCategoryByIdWithBooks, GetCategoryByIdWithBooksAndBooksPages and GetCategoryByIdWithBooksAndAuthors (not practical)

A combination of these two is currently my approach. I know what properties I want to include for each context, so I rather hand-code them (as you said yourself, lazy-loading isn't always an option, and if it is, you'll repeat the same repetitive Include()-like syntax when mapping from data models to DTO's).

This separation causes you to think harder about what datasets you want to expose, given data-access-code like this is usually hidden beneath a service.

By utilizing a base class containing a virtual method you can override to run the required Include()s:

using System.Data.Entity;

public class DataAccessBase<T>
{
    // For example redirect this to a DbContext.Set<T>().
    public IQueryable<T> DataSet { get; private set; }

    public IQueryable<T> Include(Func<IQueryable<T>, IQueryable<T>> include = null)
    {
        if (include == null)
        {
            // If omitted, apply the default Include() method 
            // (will call overridden Include() when it exists) 
            include = Include;
        }

        return include(DataSet);
    }

    public virtual IQueryable<T> Include(IQueryable<T> entities)
    {
        // provide optional entities.Include(f => f.Foo) that must be included for all entities
        return entities;
    }
}

You can then instantiate and use this class as-is, or extend it:

using System.Data.Entity;

public class BookAccess : DataAccessBase<Book>
{
    // Overridden to specify Include()s to be run for each book
    public override IQueryable<Book> Include(IQueryable<Book> entities)
    {
        return base.Include(entities)
                   .Include(e => e.Author);
    }

    // A separate Include()-method
    private IQueryable<Book> IncludePages(IQueryable<Book> entities)
    {
        return entities.Include(e => e.Pages);
    }

    // Access this method from the outside to retrieve all pages from each book
    public IEnumerable<Book> GetBooksWithPages()
    {
        var books = Include(IncludePages);
    }
}

Now you can instantiate a BookAccess and call methods on it:

var bookAccess = new BookAccess();

var allBooksWithoutNavigationProperties = bookAccess.DataSet;
var allBooksWithAuthors = bookAccess.Include();
var allBooksWithAuthorsAndPages = bookAccess.GetBooksWithPages();

In your case, you might want to create separate IncludePages and GetBooksWithPages-alike method pairs for each view of your collection. Or just write it as one method, the IncludePages method exists for reusability.

You can chain these methods all the way you like, since each of them (as well as Entity Framework's Include() extension method) returns yet another IQueryable<T>.

8
7/8/2013 4:15:51 PM

Popular Answer

As @Colin mentioned in the comments, you need to use the virtual keyword when defining navigation properties in order for them to work with lazy loading. Assuming you're using Code-First, your Book class should look something like this:

public class Book
{
  public int BookID { get; set; }
  //Whatever other information about the Book...
  public virtual Category Category { get; set; }
  public virtual List<Author> Authors { get; set; }
  public virtual List<BookPage> BookPages { get; set; }
}

If the virtual keyword is not used, then the proxy class created by EF will not be able to lazy load the related entity/entities.

Of course, if you are creating a new Book, it isn't going to be able to do lazy loading and will just throw the NullReferenceException if you try to iterate over the BookPages. That's why you should do one of two things:

  1. define a Book() constructor that includes BookPages = new List<BookPage>(); (same for Authors) or
  2. make sure that the ONLY time you ever have "new Book()" in your code is when you are creating a new entry that you are immediately saving to the database and then discarding without trying to get anything from it.

I personally prefer the 2nd option, but I know that many others prefer the 1st.

<EDIT> I found a third option, which is to use the Create method of the DbSet<> class. This means you would call myContext.Books.Create() instead of new Book(). See this Q+A for more info: Ramifications of DbSet.Create versus new Entity() </EDIT>

Now, the other way that lazy loading can break is when it is turned off. (I'm assuming that ModelEntities is the name of your DbContext class.) To turn it off, you would set ModelEntities.Configuration.LazyLoadingEnabled = false; Pretty self explanatory, no?

Bottom line is that you shouldn't need to use Include() everywhere. It's really meant to be more of a means of optimization rather than a requirement for your code to function. Using Include() excessively results in very poor performance because you end up getting far more than you really need from the database, because Include() will always bring in all related records. Let's say that you are loading a Category and there are 1000 Books belonging to that Category. You can't filter it to only include fetch the Books written by John Smith when using the Include() function. You can however (with lazy loading enabled) just do the following:

Category cat = ModelEntities.Categorys.Find(1);
var books = cat.Books.Where(b => b.Authors.Any(a => a.Name == "John Smith"));

This will actually result in fewer records being returned from the database and is much simpler to understand.

Hope that helps! ;)



Related Questions





Related

Licensed under: CC-BY-SA with attribution
Not affiliated with Stack Overflow
Licensed under: CC-BY-SA with attribution
Not affiliated with Stack Overflow