How to copy and validate data from one table (all varchar) to another (typed) in C#?

c# entity-framework sqlbulkcopy


[note: needs to be in code as can't use SSIS or similar]

I need to bulk copy data from one database to another using C# and EF probably - though this isn't cast in stone.

The problem is that the source data is all in varchar(max) and I want the destination in correct data types. The source is historical from an old ETL job that works very well and I can't get replaced. The most common issue I have seen is alpha's in numeric fields - e.g. "none" in a money field. These are fine in the source since it's all varchar.

I'd like to copy the data and validate it:
source -> validate -> destination
in the simplest way possible. If validation fails then I need to know the exact row that failed (and ideally WHAT failed) so that it can be manually fixed in the source, and the data re-copied.

There are around 50 tables ranging between 10 and 1.7M rows! So speed is important as well.

What would be a sensible way to approach this? Create DTO's, validation attributes and automap? Two EF entities and map across row by row and validate each? SPROC and manual insert?

2/27/2013 5:43:25 AM

Accepted Answer

Do it in T-SQL with a linked server.


--begin a transaction to wrap validation and load

--Validate that no tickets are set to closed without a completion date
FROM bigTableOnLocalServer with (TABLOCKX) -- prevent new rows
WHERE ticketState = '1' /* ticket closed */ and CompletionDate = 'open' 

--if validation fails, quit the transaction to release the lock

--if no rows in result set 1, execute the load
INSERT INTO RemoteServerName.RemoteServerDBName.RemoteSchema.RemoteTable (field1Int, Field2Money, field3text)
SELECT CAST(Field1 as int), 
    CASE Field2Money WHEN 'none' then null else CAST(Field2Money as money) END,
FROM bigTableOnLocalServer
WHERE recordID between 1 and 1000000

-- after complete, commit the transaction to release the lock

If you cannot communicate directly between the servers, still do the validation in SQL, but use a C# client to write the data to disk and hit the Bulk insert function on the destination server. Since the C# component would do nothing more than transport the data, I would just go straight to a format usable by BULK INSERT, à la CSV.

2/27/2013 5:25:07 AM

Related Questions


Licensed under: CC-BY-SA with attribution
Not affiliated with Stack Overflow
Licensed under: CC-BY-SA with attribution
Not affiliated with Stack Overflow