Improving the performance of bulk inserts in Entity Framework

entity entity-framework

Question

I want to insert 20000 records in a table by entity framework and it takes about 2 min. Is there any way other than using SP to improve its performance. This is my code:

 foreach (Employees item in sequence)
 {
   t = new Employees ();
   t.Text = item.Text;
   dataContext.Employees.AddObject(t);                  
 }
 dataContext.SaveChanges();
2
111
10/29/2014 10:25:12 AM

Accepted Answer

There is opportunity for several improvements (if you are using DbContext):

Set:

yourContext.Configuration.AutoDetectChangesEnabled = false;
yourContext.Configuration.ValidateOnSaveEnabled = false;

Do SaveChanges() in packages of 100 inserts... or you can try with packages of 1000 items and see the changes in performance.

Since during all this inserts, the context is the same and it is getting bigger, you can rebuild your context object every 1000 inserts. var yourContext = new YourContext(); I think this is the big gain.

Doing this improvements in an importing data process of mine, took it from 7 minutes to 6 seconds.

The actual numbers... could not be 100 or 1000 in your case... try it and tweak it.

211
11/16/2017 12:20:12 PM

Expert Answer

There are two major performance issues with your code:

  • Using Add method
  • Using SaveChanges

Using Add method

The Add method becomes only slower and slower at each entity you add.

See: http://entityframework.net/improve-ef-add-performance

For example, adding 10,000 entities via:

  • Add (take ~105,000ms)
  • AddRange (take ~120ms)

Note: Entities has not been saved yet in the database!

The problem is that the Add method tries to DetectChanges at every entity added while AddRange does it once after all entities have been added to the context.

Common solutions are:

  • Use AddRange over Add
  • SET AutoDetectChanges to false
  • SPLIT SaveChanges in multiple batches

Using SaveChanges

Entity Framework has not been created for Bulk Operations. For every entity you save, a database round-trip is performed.

So, if you want to insert 20,000 records, you will perform 20,000 database round-trip which is INSANE!

There are some third-party libraries supporting Bulk Insert available:

  • Z.EntityFramework.Extensions (Recommended)
  • EFUtilities
  • EntityFramework.BulkInsert

See: Entity Framework Bulk Insert library

Be careful, when choosing a bulk insert library. Only Entity Framework Extensions support all kind of associations and inheritance, and it's the only one still supported.


Disclaimer: I'm the owner of Entity Framework Extensions

This library allows you to perform all bulk operations you need for your scenarios:

  • Bulk SaveChanges
  • Bulk Insert
  • Bulk Delete
  • Bulk Update
  • Bulk Merge

Example

// Easy to use
context.BulkSaveChanges();

// Easy to customize
context.BulkSaveChanges(bulk => bulk.BatchSize = 100);

// Perform Bulk Operations
context.BulkDelete(customers);
context.BulkInsert(customers);
context.BulkUpdate(customers);

// Customize Primary Key
context.BulkMerge(customers, operation => {
   operation.ColumnPrimaryKeyExpression = 
        customer => customer.Code;
});

EDIT: Answer Question in Comment

Is there a recommend max size for each bulk insert for the library you created

Not too high, not too low. There isn't a particular value that fit in all scenarios since it depends on multiple factors such as row size, index, trigger, etc.

It's normally recommended to be around 4000.

Also is there a way to tie it all in one transaction and not worry about it timing out

You can use Entity Framework transaction. Our library uses the transaction if one is started. But be careful, a transaction that takes too much time come also with problems such as some row/index/table lock.

5
8/26/2018 7:50:11 PM


Related Questions





Related

Licensed under: CC-BY-SA with attribution
Not affiliated with Stack Overflow
Licensed under: CC-BY-SA with attribution
Not affiliated with Stack Overflow