LINQ + Entity Framework + "Contains" == Slow as molasses?

c# entity-framework linq

Question

I discovered a code block that was taking more than five seconds to run while trying to restructure some code that had just become very sluggish.

Two statements make up the code:

IEnumerable<int> StudentIds = _entities.Filters
                    .Where(x => x.TeacherId == Profile.TeacherId.Value && x.StudentId != null)
                    .Select(x => x.StudentId)
                    .Distinct<int>();

and

_entities.StudentClassrooms
                    .Include("ClassroomTerm.Classroom.School.District")
                    .Include("ClassroomTerm.Teacher.Profile")
                    .Include("Student")
                    .Where(x => StudentIds.Contains(x.StudentId)
                    && x.ClassroomTerm.IsActive
                    && x.ClassroomTerm.Classroom.IsActive
                    && x.ClassroomTerm.Classroom.School.IsActive
                    && x.ClassroomTerm.Classroom.School.District.IsActive).AsQueryable<StudentClassroom>();

It's a little disorganized, but I first get a distinct list of Ids from one Table (Filters), after which I query another Table using that list.

Despite the fact that they are tiny tables, the query time is still 5 or more seconds.

I entered this into LINQPad, and it revealed that it executed 1000 "distinct" queries after completing the bottom query.

I altered the "StudentIds" code on a whim by just appending.ToArray() to the end. It now takes just around 100ms to finish the identical query thanks to this 1000x performance improvement.

What's going on? Why am I misusing this?

1
16
4/9/2012 10:34:18 PM

Accepted Answer

One of the dangers of postponed execution in Linq is this: When you first tryStudentIds actually anIQueryable not a collection stored in memory. It will thus execute the second query on the database once again each time it is used.

forcing the initial query's execution usingToArray() makes StudentIds a memory collection, together with theContains A portion of your second query will run over this collection, which has a set order of objects in it. This is translated to a SQL-like object.where StudentId in (1,2,3,4) query.

Naturally, this question will be significantly quicker since you had calculated this sequence once, rather than every time theWhere clause is carried out. Your subsequent inquiry made withoutToArray() (I believe) would be translated to a SQL query with anwhere exists (...) a sub-query that is assessed for every row.

24
4/9/2012 10:58:34 PM

Popular Answer

ToArray() translates the first query into server memory materialization.

My best guess is that the expression cannot be parsed by the query provider.StudentIds.Contains(x.StudentId) . Therefore, it likely believes that thestudentIds is a loaded array in the memory. Therefore, during the parsing stage, it is presumably repeatedly contacting the database. Setting up the profiler is the only way to be sure.

Use a join rather than "contains" if you need to do this on the database server. You are probably lacking a surrogate primary key or a foreign key someplace if you need to utilize contains to solve what seems to be a join issue.

You may also statestudentIds instead of IEnumerable, as IQueryable. This might offer the query provider with the cue they need to understand thestudentIds as expression, or as data that hasn't been read into memory. It's worth a shot, albeit I have my doubts.

Use this if everything else fails.ToArray() This will bring up the openingstudentIds to the mind.



Related Questions





Related

Licensed under: CC-BY-SA with attribution
Not affiliated with Stack Overflow
Licensed under: CC-BY-SA with attribution
Not affiliated with Stack Overflow