We are starting a new project where we need to store product and many product attributes in a database. The technology stack is MS SQL 2008 and Entity Framework 4.0 / LINQ for data access.
The products (and Products Table) are pretty straightforward (a SKU, manufacturer, price, etc..). However there are also many attributes to store with each product (think industrial widgets). These may range from color to certification(s) to pipe size. Every product may have different attributes, and some may have multiples of the same attribute (Ex: Certifications).
The current proposal is that we will basically have a name/value pair table with a FK back to the product ID in each row.
An example of the attributes Table may look like this:
ProdID AttributeName AttributeValue 123 Color Blue 123 FittingSize 1.25 123 Certification AS1111 123 Certification EE2212 123 Certification FM.3 456 Pipe 11 678 Color Red 999 Certification AE1111 ...
Note: Attribute name would likely come from a lookup table or enum.
So the main question here is: Is this the best pattern for doing something like this? How will the performance be? Queries will be based on a JOIN of the product and attributes table, and generally need many WHEREs to filter on specific attributes - the most common search will be to find a product based on a set of known/desired attributes.
If anyone has any suggestions or a better pattern for this type of data, please let me know.
You are about to re-invent the dreaded EAV model, Entity-Attribute-Value. This is notorious for having problems in real-life, for various reasons, many covered by Dave's answer.
Luckly the SQL Customer Advisory Team (SQLCAT) has a whitepaper on the topic, Best Practices for Semantic Data Modeling for Performance and Scalability. I highly recommend this paper. Unfortunately, it does not offer a panacea, a cookie cutter solution, since the problem has no solution. Instead, you'll learn how to find the balance between a fixed queryable schema and a flexible EAV structure, a balance that works for your specific case:
Semantic data models can be very complex and until semantic databases are commonly available, the challenge remains to find the optimal balance between the pure object model and the pure relational model for each application. The key to success is to understand the issues, make the necessary mitigations for those issues, and then test, test, and test. Scalability testing is a critical success factor if you are going to find that optimal design.
This is going to be problematic for a couple of reasons:
Your entity queries will be much harder to write. Transforming the results of those queries into something resembling a ViewModel when it comes time for presentation is going to be painful because it will involve a pivot for each product.
Understanding what your datatypes will be is going to be tough when it comes time to read certain types of data. Are you planning on storing this as strings? For example, DateTimes hold more data than the default .ToString() implementation writes to the string. You're also going to have issues if you try to store floating-point values.
Your objects' data integrity is at risk. There will be a temptation to put properties which should be just attributes of your main product tables in this "bucket o' data". Maybe the design will be semi-sane to begin with, but I guarantee you that after a certain amount of time, folks will start to just throw properties in the bag. It'll then be very tough to keep your objects' integrity with such a loosely defined structure.
Your indexes will most likely be suboptimal. Again think of a property which should be on your product table. Instead of being able to index on just one column, you will now be forced to make a potentially very large composite index on your "type" table.
Since you're apparently planning to throw out proper datatypes and use strings, the performance of range queries for numeric data will likely be poor.
Your table will get big, slowing backups and queries. Instead of an integer being 4 bytes, you're going to have to store far more for an integer of any size.
Better to normalize the table in a more "traditional" way using "IS-A" relationships. For example, you might have Pipes, which are a type of Product, but have a couple more attributes. You might have Stoves, which are a type of product, but have a couple more attributes still.
If you really have a generic database and all sorts of other properties which aren't going to be subject to data integrity rules, you very well may want to consider storing data in an XML column. It's hard to tell you what the correct design choice is unless I know a lot more about your business.
IMO this is a design antipattern. The siren song of this idea has lured many a developer onto the rocks of of an unmaintainable application.