Join table with collection EntityFramework

c# entity-framework

Question

Maybe is it duplicate but I can't find proper way to do following correctly.

Generally I want to retrieve all data from Employee table which are related with List. Type MyEmployee contains EntitySourceID which I use to map with EmployeeID. So, I want to retrieve all Employees which have match EmployeeID with EntitySourceID in List collection.

Type MyEmployee looks like:

public class MyEmployee 
    {
        public long PersonID { get; set; }
        public string ConnectionString { get; set; }
        public long EntitySourceID { get; set; }
        public int EntitySourceTypeID { get; set; }
    }

My query looks like:

internal IEnumerable<Person> GetPersons(List<MyEmployee> myEmployees)
    {
            return (from p in _context.Employee
                join pList in myEmployees on p.EmployeeID equals pList.EntitySourceID
                select new Person 
                {
                    PersonID = pList.PersonID,
                    FirstName = p.FirstName,
                    LastName = p.LastName,
                    Name = p.Name,
                    Suffix = p.Suffix,
                    Title = p.Title
                }).ToList();
    }

You can see in query, when I create new Person object I used pList.PersonID from List myEmployees collection to fill Person's.

So, my question is how I can efficient retrieve data from Employee table which have match with List collection, and also use pList.PersonID (from collection) to create return result?

I use EF 6, database first approach.

Also, I didn't mention. This query produce following exception: Unable to create a constant value of type 'MyEmployee'. Only primitive types or enumeration types are supported in this context.

1
5
5/19/2016 8:11:13 PM

Accepted Answer

IQueryable vs IEnumerable

A good start for solving some of your deeper questions would be spending some time discovering the differences between

and maybe also between

because while they have similar forms they differ in purpose and behavior.

Now back to your question

For starters let's name a few things:

  • Let's call the RAM hosted collection of MyEmployee instances THE LIST
  • Let's call the database table (most probably entitled "Employee(s)") THE TABLE

Sadly you didn't specify a few very important details while writing your question. That leads me to proposing 4 different answers. The answers will be categorized based on the truth values of the following 2 questions:

  • Is THE LIST huge?
  • Is THE TABLE huge?

We have 4 very different cases:

  1. No, No
  2. No, Yes
  3. Yes, No
  4. Yes, Yes

You can imagine by now that the fourth is maybe the ugliest.

When THE LIST is not huge

In cases 1 and 2 you could think about your problem from a different perspective:

Say you need to fetch ONE (or zero) record(s) from the database based on Precisely 1 parameter which is an ID. Should you be performing a JOIN?

The answer is: absolutely NOT. Take a look at this code:

var query = from employee in _context.Employee
            where employee.EmployeeId == 23
            select employee;
var found = query.FirstOrDefault();

What if I wanted to fetch the records associated with Precisely 2 parameters? I could achieve that in a similar manner:

var query = from employee in _context.Employee
            where employee.EmployeeId == 23 || employee.EmployeeId == 24
            select employee;
var results = query.ToArray();

if (results.Length == 0)
   // didn't find anyone of the presumably existing records

else if (results.Length == 1) {
   if (results[0].EmployeeId == 23)
      // then we found the 23
   else
      // the other one

} else if (results.Length == 2)
   // found both, look inside to see which is which

I have intentionally written the finishing touches of the algorithms (the if part) in a silly manner in order to avoid extra confusion.

This would be a more humane approach for the finishing touches:

...
var results = ... got them (see above)

var map = results.ToDictionary(keySelector: x => x.EmployeeId);
var count = map.Count; // this gives you the number of results, same as results.Length
var have23 = map.ContainsKey(23); // this tells you whether you managed to fetch a certain id
var record23 = map[23]; // this actually gives you the record
foreach (var key in map.Keys) { .. } // will iterate over the fetched ids
foreach (var record in map.Values) { .. } // will iterate over the fetched values

Worry not about the ToDictionary extension method. It has NOTHING to do with EntityFramework (look it up by clicking on it).

Now.. back to our story: What if you wanted to bring the records associated with 15 ids? Stop. Where is this going? Am I asking you to hardcode a different query for each possible number of ids?

Of course not. As long as the number of ids is "relatively small" (meaning you are allowed by someone, or by yourself to bombard the database with that request magnitude) you could very well use an "column IN list of params" SQL construct.

How can you instruct LINQ to SQL, or EF to translate into an "x IN y" operation instead of an "x = y" operation, on the SQL side?

By using a primitive array of the respective type and the Contains method. In other words, get a load of:

var query = from employee in _context.Employee
            where listOfIds.Contains( employee.EmployeeId )
            select employee;
var results = query.ToArray();

But you need a "list of Ids" not a "list of MyEmployee instances". You could pull that off very easily like so:

List<MyEmployee> originalList = new List<MyEmployee>();
// ... say you populate this somehow, or you've received it from elsewhere

int[] listOfIds = (from employee in originalList
                   select employee.EntityId).ToArray();

// .. and then carry on with the EF query

Please note that queries on collections manifest as IEnumerable<T> instances, not as IQueryable<T> instances and have nothing to do with EF or LINQ to SQL or anyother DB or external data service.

IF THE TABLE IS NOT HUGE

Then you could refrain from actually using EF with complex queries, use it just for a "Full table fetch", temporarily store the results in your .NET process and use regular LINQ however you like.

The key to this story is fetching the entire table from the beginning. In your question you wrote:

return (from p in _context.Employee
            join pList in myEmployees on p.EmployeeID equals pList.EntitySourceID
            select new Person 
            {
                PersonID = pList.PersonID,
                FirstName = p.FirstName
                ... etc

Simply augment that with:

var entityList = _context.Employee.ToArray();

return (from p in entityList  // PLEASE NOTE THIS CHANGE ALSO
        join pList in myEmployees on p.EmployeeID equals pList.EntitySourceID
        select ...

TO WRAP IT UP

You can either:

  • instruct the database to do the work but in which case you can't send it fancy .NET instances in the process
  • do the work yourself, upstairs, in .NET

Either one side or the other (the database or the .NET process) needs to have all the cards (needs to have a clone of the other side) in order be able to perform the JOIN.

So it's just a game of compromise.

HOW ABOUT THE REMAINING CASE

If both THE TABLE and THE LIST are huge, then you're s****d. No- I'm just kidding.

Nobody heard of someone asking someone else to do wonders when they can't actually be done.

If this is the case, then you have to simplify the problem into a large number of smaller problems. I would suggest transforming into a TABLE HUGE + LIST NOT SO HUGE problem multiplied by N.

So how do you go about that?

List<MyEmployee> original = ...
// you take your list
// and you split it in sections of .. say 50 (which in my book is not huge for a database
// although be careful - the pressure on the database will be almost that of 50 selects running in parallel for each select)

// how do you split it?
// you could try this

public static IEnumerable<List<MyEmployee>> Split(List<MyEmployee> source, int sectionLength) {
    List<MyEmployee> buffer = new List<MyEmployee>();
    foreach (var employee in source) {
        buffer.Add(employee);
        if (buffer.Count == sectionLength) {
            yield return buffer.ToList(); // MAKE SURE YOU .ToList() the buffer in order to clone it
            buffer.Clear(); // or otherwise all resulting sections will actually point to the same instance which gets cleared and refilled over and over again
        }             
    }
    if (buffer.Count > 0)   // and if you have a remainder you need that too
       yield return buffer; // except for the last time when you don't really need to clone it
}

List<List<MyEmployee>> sections = Split(original, 50).ToList();

// and now you can use the sections
// as if you're in CASE 2 (the list is not huge but the table is)
// inside a foreach loop

List<Person> results = new List<Person>(); // prepare to accumulate results

foreach (var section in sections) {

    int[] ids = (from x in section select x.EntityID).ToArray();

    var query = from employee in _context.Employee
                where ids.Contains(employee.EmployeeId) 
                ... etc;

    var currentBatch = query.ToArray();

    results.AddRange(currentBatch);

}

Now you could say that this is simply a way of fooling the database into believing it has little work to do when in fact we're still flooding it with a lot of work and maybe making life hard for the other concurrent clients.

Well- yeah, but at least you could throttle down. You could Thread.Sleep between sections... You could use iterators (look them up) and actually not flood the RAM with records that will take a long time to process anyway but rather "stream things".

You have more control over the situation.

Good luck!

15
9/26/2014 11:44:48 PM

Popular Answer

I use following approach:

  • Extract IDs from myEmployees list;
  • Retrieve data for that IDs and put in var query;
  • Join mayEmployees and query result.

Like in following example:

   long[] myEmployeesIDs  = myEmployees.Select(p => p.EntitySourceID).ToArray();

       var query = (from e in _context.Employees
                 where myEmployeesIDs.Contains(e.EmployeID)
                 select new Person
                 {
                     PersonID = e.EmployeeID 
                 }).ToList();

       return  (from m in myEmployees
                join q in query on m.EntitySourceID equals q.PersonID
                select new Person
                {
                    PersonID  = i.PersonID,
                    ...
                }).ToList();


Related Questions





Related

Licensed under: CC-BY-SA with attribution
Not affiliated with Stack Overflow
Licensed under: CC-BY-SA with attribution
Not affiliated with Stack Overflow