The best SQL Server storage pattern for (product) characteristics

database database-design entity-framework sql sql-server

Question

In a database, we must store the product and many of its properties as part of a new project that we are commencing. MS SQL 2008 and Entity Framework 4.0/LINQ for data access make up the technological stack.

The items are rather simple, as is the products table (a SKU, manufacturer, price, etc..). However, each product must also be stored with a number of properties (think industrial widgets). These might include pipe size, certification(s), and color. Each product may have unique characteristics, while others may have many variations of the same characteristic (Ex: Certifications).

According to the current plan, we will essentially have a database of name/value pairs with an FK back to the product ID in each row.

An example of the characteristics Table would resemble this:

ProdID     AttributeName     AttributeValue
123        Color             Blue
123        FittingSize       1.25
123        Certification     AS1111
123        Certification     EE2212
123        Certification     FM.3
456        Pipe              11
678        Color             Red
999        Certification     AE1111
...

Note: An enum or lookup table would probably provide the attribute name.

The key query in this situation is therefore: Is this the optimal pattern for carrying out such an action? The performance's quality? The most frequent search will be to discover a product based on a set of known/desired characteristics. Queries will be based on a JOIN of the product and attributes database, and often require multiple WHEREs to filter on particular attributes.

Please let me know if you have any recommendations or a better pattern for this kind of data.

Thanks! -Ed

1
18
5/26/2010 12:29:01 AM

Accepted Answer

You're going to reinvent the terrible Entity-Attribute-Value (EAV) paradigm. Due to a variety of factors, many of which were addressed in Dave's response, this is infamous for having issues in real life.

Fortunately, the SQLCAT has a whitepaper on the subject titled Semantic Data Modeling Best Practices for Performance and Scalability. I heartily endorse this essay. Sadly, it does not provide a quick fix or a predetermined answer since the issue is unsolvable. Instead, you'll discover how to strike a balance that's appropriate for your particular situation between a rigid queryable schema and a flexible EAV structure:

Semantic data models can be very complex and until semantic databases are commonly available, the challenge remains to find the optimal balance between the pure object model and the pure relational model for each application. The key to success is to understand the issues, make the necessary mitigations for those issues, and then test, test, and test. Scalability testing is a critical success factor if you are going to find that optimal design.

18
11/6/2013 8:39:28 PM

Popular Answer

This will cause problems for the following reasons:

  • Writing entity queries for you will be significantly more challenging. When it comes time for presentation, converting the results of those queries into anything like a ViewModel will be difficult since it will need a pivot for each product.

  • When it comes time to read certain forms of data, it will be difficult to understand what your datatypes will be. Do you intend to save this as strings? For instance, DateTimes may store more information than the default. The string is written by the ToString() method. If you attempt to save floating-point values, you'll also run into problems.

  • The data integrity of your objects is at danger. There will be a desire to include characteristics in this "bucket of data" that should only be attributes of your primary product tables. It's possible that the design will initially seem somewhat rational, but I can assure you that after a while, people will start to simply toss things into the bag. The integrity of your objects will therefore be extremely difficult to maintain with such a vaguely specified structure.

  • Most certainly, your indices won't be at their best. Consider another property that belongs on your product table. You will now be compelled to create a possibly extremely big composite index on your "type" table as you were previously only permitted to index on a single column.

  • Range searches for numeric data will probably perform poorly since you seem to be aiming to abandon correct datatypes and use strings instead.

  • Your table will expand, causing backups and queries to run slowly. An integer of any size will need far more storage space than the standard 4 bytes.

It is preferable to use "IS-A" connections to normalize the table in a more "conventional" manner. For instance, you may have Pipes, a sort of Product that has a few more properties. Stoves are a form of merchandise, but they may also have a few other characteristics.

You may want to think about putting data in an XML column if you actually have a generic database and a variety of other features that won't be subject to data integrity requirements. Without learning much more about your company, it's difficult for me to advise you on the best design option.

I think this is an antipattern in design. Many developers have fallen victim to the seductive song of this concept and ended up with an unmaintainable program.



Related Questions





Related

Licensed under: CC-BY-SA with attribution
Not affiliated with Stack Overflow
Licensed under: CC-BY-SA with attribution
Not affiliated with Stack Overflow