I discovered a code block that was taking more than five seconds to run while trying to restructure some code that had just become very sluggish.
Two statements make up the code:
IEnumerable<int> StudentIds = _entities.Filters .Where(x => x.TeacherId == Profile.TeacherId.Value && x.StudentId != null) .Select(x => x.StudentId) .Distinct<int>();
_entities.StudentClassrooms .Include("ClassroomTerm.Classroom.School.District") .Include("ClassroomTerm.Teacher.Profile") .Include("Student") .Where(x => StudentIds.Contains(x.StudentId) && x.ClassroomTerm.IsActive && x.ClassroomTerm.Classroom.IsActive && x.ClassroomTerm.Classroom.School.IsActive && x.ClassroomTerm.Classroom.School.District.IsActive).AsQueryable<StudentClassroom>();
It's a little disorganized, but I first get a distinct list of Ids from one Table (Filters), after which I query another Table using that list.
Despite the fact that they are tiny tables, the query time is still 5 or more seconds.
I entered this into LINQPad, and it revealed that it executed 1000 "distinct" queries after completing the bottom query.
I altered the "StudentIds" code on a whim by just appending.ToArray() to the end. It now takes just around 100ms to finish the identical query thanks to this 1000x performance improvement.
What's going on? Why am I misusing this?
One of the dangers of postponed execution in Linq is this: When you first try
not a collection stored in memory. It will thus execute the second query on the database once again each time it is used.
forcing the initial query's execution using
a memory collection, together with the
A portion of your second query will run over this collection, which has a set order of objects in it. This is translated to a SQL-like object.
where StudentId in (1,2,3,4)
Naturally, this question will be significantly quicker since you had calculated this sequence once, rather than every time the
clause is carried out. Your subsequent inquiry made without
(I believe) would be translated to a SQL query with an
where exists (...)
a sub-query that is assessed for every row.
translates the first query into server memory materialization.
My best guess is that the expression cannot be parsed by the query provider.
. Therefore, it likely believes that the
is a loaded array in the memory. Therefore, during the parsing stage, it is presumably repeatedly contacting the database. Setting up the profiler is the only way to be sure.
Use a join rather than "contains" if you need to do this on the database server. You are probably lacking a surrogate primary key or a foreign key someplace if you need to utilize contains to solve what seems to be a join issue.
You may also state
instead of IEnumerable, as IQueryable. This might offer the query provider with the cue they need to understand the
as expression, or as data that hasn't been read into memory. It's worth a shot, albeit I have my doubts.
Use this if everything else fails.
This will bring up the opening
to the mind.