How to retrieve the primary keys after a SqlBulkCopy insert, when inserting many rows via EF is too slow?

entity-framework sql sqlbulkcopy

Question

In one of our use cases, a user action triggers the insertion of 100–1000 rows.
Once the rows are inserted, I need their primary key IDs in order to process the objects further and create other objects that reference the inserted rows as foreign keys.

So far this was done with EF in a foreach loop, but it was too slow, taking 15–20 seconds for 600 rows (and it blocks the user, which is bad :( ).

Original code (which also handles updates; update performance is not a concern here, since updates do not block the user):

foreach (Location updatedLoc in locationsLoaded)
{
    // find it in the collection from the database
    Location fromDb = existingLocations.SingleOrDefault(loc => loc.ExtId.Equals(updatedLoc.ExtId));

    // update or insert
    if (fromDb != null)
    {
        // link ids for update
        updatedLoc.Id = fromDb.Id;

        // set values for update 
        db.Entry(fromDb).CurrentValues.SetValues(updatedLoc);
    }
    else
    {
        System.Diagnostics.Trace.WriteLine("Adding new location: " + updatedLoc.Name, "loadSimple");

        // insert a new location <============ This is the bottleneck, takes about 20-40ms per row
        db.Locations.Add(updatedLoc);
    }
}

// This actually takes about 3 seconds for 600 rows, was actually acceptable
db.SaveChanges();

After researching on SO and the web, I discovered that I was using EF incorrectly for this scenario and needed to use SqlBulkCopy instead.

As a result, the code was modified, and what previously took 20 seconds now only takes 100 milliseconds (!).
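
The dataTable used below is a plain System.Data.DataTable whose columns mirror the Locations table minus the identity Id column. A minimal sketch of its setup; ExtId and Name are stand-ins for the actual 14 fields:

// hypothetical setup; ExtId and Name are placeholders for the real columns,
// and the identity Id column is deliberately left out
var dataTable = new System.Data.DataTable();
dataTable.Columns.Add("ExtId", typeof(string));
dataTable.Columns.Add("Name", typeof(string));
// ...the remaining Location columns, in table order...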

foreach (Location updatedLoc in locationsLoaded)
{
    // find it in the collection from the database
    Location fromDb = existingLocations.SingleOrDefault(loc => loc.ExtId.Equals(updatedLoc.ExtId));

    // update or insert
    if (fromDb != null)
    {
        // link ids for update
        updatedLoc.Id = fromDb.Id;

        // set values for update
        db.Entry(fromDb).CurrentValues.SetValues(updatedLoc);
    }
    else
    {
        System.Diagnostics.Trace.WriteLine("Adding new location: " + updatedLoc.Name, "loadSimple");

        // insert a new location
        dataTable.Rows.Add(new object[] { /* the 14 fields of the location... */ });
    }
}

System.Diagnostics.Trace.WriteLine("preparing to bulk insert", "loadSimple");

// perform the bulk insert
using (var bulkCopy = new System.Data.SqlClient.SqlBulkCopy(System.Configuration.ConfigurationManager.ConnectionStrings["bulk-inserter"].ConnectionString))
{
    bulkCopy.DestinationTableName = "Locations";

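    // shift each source column by one: destination column 0 is presumably
    // the identity Id, which SQL Server generates, so the copy skips it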
    for (int i = 0; i < dataTable.Columns.Count; i++)
    {
        bulkCopy.ColumnMappings.Add(i, i + 1);
    }

    bulkCopy.WriteToServer(dataTable);
}

// for update
db.SaveChanges();

After the bulk copy, the objects in the Locations collection of the EF context are not updated (that is OK and expected), but I need the inserted IDs to continue working on these objects.

A simple option would be to immediately select the data from the database again. Since I already have the data in memory, it is trivial to select it again into a separate collection.

However, that approach seems wrong; surely there is a way to get the IDs back as part of the insert?

The simple re-select approach works; see the accepted answer below for how to quickly sync the data back into EF.

Should I perhaps be using something other than SqlBulkCopy (I expect up to roughly 1000 rows, no more)?

Note that a few related SO questions and their answers all seem to move away from EF:

  1. Possible to get Primary Key IDs back after a SQL BulkCopy?
  2. Performance optimization for bulk inserts in Entity Framework
  3. (This one is about SaveChanges() performance when there are many pending inserts; it should be called every X inserts rather than once at the end of processing when thousands are pending; see the sketch below.)
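
A minimal sketch of that batching idea, assuming a newLocations collection and a batch size of 100 (both illustrative, not from the original code):

// flush the EF change tracker every batchSize inserts instead of once at the end
const int batchSize = 100;
int pending = 0;
foreach (Location loc in newLocations)
{
    db.Locations.Add(loc);
    if (++pending % batchSize == 0)
    {
        db.SaveChanges(); // keeps the change tracker from growing unbounded
    }
}
db.SaveChanges(); // persist the remainder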

Accepted Answer

SqlBulkCopy is the fastest approach available to you; plain SQL INSERTs are certainly slower. So all you have to do is read the Locations back: re-query them using MergeOption.OverwriteChanges, so the values from the database (including the new IDs) overwrite the entities cached in the context.
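
A minimal sketch of that re-query, assuming an EF 5/6 DbContext named db and a hypothetical extIds list holding the ExtId keys of the rows just inserted (MergeOption lives on ObjectQuery, so we drop down to the underlying ObjectContext):

using System.Collections.Generic;
using System.Data.Entity.Core.Objects;    // EF6 namespace; EF4/5 used System.Data.Objects
using System.Data.Entity.Infrastructure;
using System.Linq;

// get the ObjectContext behind the DbContext
ObjectContext objectContext = ((IObjectContextAdapter)db).ObjectContext;
ObjectSet<Location> locationSet = objectContext.CreateObjectSet<Location>();

// OverwriteChanges: values read from the database (including the identity
// IDs generated by the bulk insert) replace whatever the context has cached
locationSet.MergeOption = MergeOption.OverwriteChanges;

// materializing the query refreshes the tracked Locations, new IDs included
List<Location> refreshed = locationSet
    .Where(l => extIds.Contains(l.ExtId))
    .ToList();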


Popular Answer

If you are using SQL Server 2008 or later, you can accomplish this with a stored procedure. You would have to create a TYPE that matches your data table in SQL:

CREATE TYPE dbo.YourType AS TABLE (ID INT, Column1 INT, Column2 VARCHAR(5)...)

Then pass that type into a stored procedure:

CREATE PROCEDURE dbo.InsertYourType (@YourType dbo.YourType READONLY)
AS
    BEGIN
        DECLARE @ID TABLE (ID INT NOT NULL PRIMARY KEY)
        INSERT INTO YourTable (Column1, Column2...)
        OUTPUT inserted.ID INTO @ID
        SELECT  Column1, Column2...
        FROM    @YourType

        SELECT  *
        FROM    YourTable
        WHERE   ID IN (SELECT ID FROM @ID)

    END

This captures the IDs of the inserted rows via the OUTPUT clause and returns all of the newly inserted rows, provided your C# DataTable matches the format of dbo.YourType. You can pass the table the same way you would normally pass a parameter to a SqlCommand:

SqlCommand.Parameters.AddWithValue("@YourType", YourDataTable)

I realize this is similar to your suggestion of re-selecting the data, but because the select filters only on the identity column, it should be fast. And while this still uses a SQL INSERT rather than a bulk copy, you are moving back to a set-based approach, as opposed to EF's row-by-row procedural one. This is very close to the accepted answer in one of the links you posted, but using a table variable removes a couple of steps.
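
For completeness, a sketch of the full round trip; the connectionString variable, the locations DataTable (matching dbo.YourType column for column), and the ID column name are assumptions, not from the original answer:

using System.Data;
using System.Data.SqlClient;

using (var conn = new SqlConnection(connectionString))
using (var cmd = new SqlCommand("dbo.InsertYourType", conn))
{
    cmd.CommandType = CommandType.StoredProcedure;

    // pass the DataTable as a table-valued parameter
    SqlParameter tvp = cmd.Parameters.AddWithValue("@YourType", locations);
    tvp.SqlDbType = SqlDbType.Structured;
    tvp.TypeName = "dbo.YourType";

    conn.Open();

    // the procedure's final SELECT returns the inserted rows, IDs included
    using (SqlDataReader reader = cmd.ExecuteReader())
    {
        while (reader.Read())
        {
            int id = reader.GetInt32(reader.GetOrdinal("ID"));
            // ...map the returned row back onto the in-memory objects...
        }
    }
}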




Licensed under: CC-BY-SA with attribution
Not affiliated with Stack Overflow