Ignore duplicate key insert with Entity Framework

c# entity-framework

Question

I'm using ASP.NET MVC4 with Entity Framework Code First. I have a table called "users", with primary key "UserId". This table may have 200,000+ entries.

I need to insert another 50 users. I might do this like

foreach(User user in NewUsers){
    context.Add(user);
}
dbcontext.SaveChanges();

The problem is, one or more of those new users might already exist in the DB. If I add them and then try to save, it throws an error and none of the valid ones get added. I could modify the code to do this:

foreach(User user in NewUsers){
    if(dbcontext.Users.FirstOrDefault(u => u.UserId) == null)
    {
        dbcontext.Users.Add(user);
    }
}
dbcontext.SaveChanges();

which would work. The problem is, then it has to run a query 50 times on a 200,000+ entry table. So my question is, what is the most performance efficient method of inserting these users, ignoring any duplicates?

1
25
8/7/2013 8:44:46 PM

Accepted Answer

You can do this:

var newUserIDs = NewUsers.Select(u => u.UserId).Distinct().ToArray();
var usersInDb = dbcontext.Users.Where(u => newUserIDs.Contains(u.UserId))
                               .Select(u => u.UserId).ToArray();
var usersNotInDb = NewUsers.Where(u => !usersInDb.Contains(u.UserId));
foreach(User user in usersNotInDb){
    context.Add(user);
}

dbcontext.SaveChanges();

This will execute a single query in your database to find users which already exist, then filter them out of your NewUsers set.

9
8/7/2013 8:48:48 PM

Popular Answer

You can filter out the existing users with one query

foreach(User user in NewUsers.Where(us => !dbcontext.Users.Any(u => u.userId == us.userId)))
{
    dbcontext.Users.Add(user);
}
dbcontext.SaveChanges();

EDIT:

As pointed out in the comments the proposal above will result in an sql call for each element in the NewUsers collection. I could confirm that with SQL Server Profiler.

One intresting result of the profiling is the somewhat wierd sql generated by EF for each item(Model names are different than in the OP, but the query is the same):

exec sp_executesql N'SELECT 
CASE WHEN ( EXISTS (SELECT 
    1 AS [C1]
    FROM [dbo].[EventGroup] AS [Extent1]
    WHERE [Extent1].[EventGroupID] = @p__linq__0
)) THEN cast(1 as bit) WHEN ( NOT EXISTS (SELECT 
    1 AS [C1]
    FROM [dbo].[EventGroup] AS [Extent2]
    WHERE [Extent2].[EventGroupID] = @p__linq__0
)) THEN cast(0 as bit) END AS [C1]
FROM  ( SELECT 1 AS X ) AS [SingleRowTable1]',N'@p__linq__0 int',@p__linq__0=10

Quite a nice piece of code to do the job of a simple one-liner.

My point of view is that writing nice and readable declarative code and let the compiler and optimizer do the dirty job is a great attitude. This is one of the cases when the result of such a style is surprising and you have to go dirty.



Related Questions





Related

Licensed under: CC-BY-SA with attribution
Not affiliated with Stack Overflow
Licensed under: CC-BY-SA with attribution
Not affiliated with Stack Overflow