This is my second time hosting the T-SQL Tuesday blog party. The party was started by Adam Machanic (b|t) in December of 2009.
This month’s invitation topic is on Data Modeling Gone Wrong. Being a Database Developer, I deal with bad database design decisions daily. One of my app-dev teammates loves to tell me that the bad decisions were made because I didn’t work there yet. (That makes me laugh.)
Surrogate Keys vs Natural Keys
The point of a surrogate key is to stand in for a complicated natural key as the primary key of the table. Both the surrogate key and the natural key uniquely identify a row. Sometimes the natural key is the entire row. When possible, it is better to use the natural key, since it is the true representation of the row. Unfortunately, this is not always practical. Let’s look at some examples.
In the employee table, it would take four fields to make a primary key from the natural key (first name, last name, social security number, and birthdate). Note: This assumes the table is only used in the US and that every employee has a social security number. The birthdate is also needed because social security numbers can be reused after someone has passed away. For the employee table, it makes sense to have a surrogate key, since it would be cumbersome to carry all four fields as foreign keys in other tables.
The StateList table (one row for each state in the United States) is a good example of using the natural key as the primary key. Each state has a two-character abbreviation, so a CHAR(2) column can serve as both the natural key and the primary key. This has the added benefit that other tables don’t need to join back to StateList to get the abbreviation, unless additional information about the state is needed. So what is the point of this table? By having it, you get referential integrity on the StateCode field through a foreign key back to the StateList table. You don’t have to worry that someone puts ZZ as a StateCode.
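Here is a minimal sketch of the StateList idea. So that it runs anywhere, it uses Python’s built-in sqlite3 module as a stand-in for SQL Server; the CustomerAddress table, its columns, and the sample rows are made up for illustration, and the DDL would need small adjustments for T-SQL.

```python
import sqlite3


def build_state_demo():
    """Create StateList keyed by its natural key, plus a table that references it."""
    conn = sqlite3.connect(":memory:")
    # SQLite only enforces foreign keys when this pragma is switched on.
    conn.execute("PRAGMA foreign_keys = ON")
    conn.executescript("""
        CREATE TABLE StateList (
            StateCode CHAR(2)     NOT NULL PRIMARY KEY,  -- natural key
            StateName VARCHAR(50) NOT NULL
        );
        -- Hypothetical referencing table, not from the original post.
        CREATE TABLE CustomerAddress (
            AddressId INTEGER PRIMARY KEY,
            City      VARCHAR(50) NOT NULL,
            StateCode CHAR(2)     NOT NULL REFERENCES StateList (StateCode)
        );
        INSERT INTO StateList VALUES ('WA', 'Washington'), ('OR', 'Oregon');
    """)
    return conn


def try_insert_address(conn, city, state_code):
    """Return True if the row was accepted, False if the foreign key rejected it."""
    try:
        conn.execute(
            "INSERT INTO CustomerAddress (City, StateCode) VALUES (?, ?)",
            (city, state_code),
        )
        return True
    except sqlite3.IntegrityError:
        return False


if __name__ == "__main__":
    conn = build_state_demo()
    print(try_insert_address(conn, "Seattle", "WA"))  # a known state code
    print(try_insert_address(conn, "Nowhere", "ZZ"))  # not in StateList
    # The abbreviation is already in the row, so no join to StateList is needed.
    print(conn.execute("SELECT City, StateCode FROM CustomerAddress").fetchall())
```

Because the two-character code is itself the key, the referencing table carries the value people actually want to read, and the foreign key is what keeps ZZ out.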
Danger, Will Robinson!
One of the problems I’ve seen with careless use of surrogate keys is the duplication of natural keys. It’s often overlooked that the natural key still needs a unique constraint. Without it, the reporting team ends up having to use MAX or DISTINCT to get the latest instance of the natural key, or SSIS packages are needed to clean up the duplicates. This can be compounded with many-to-many tables.
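To make the duplication problem concrete, here is a small sketch, again using Python’s sqlite3 module as a stand-in for SQL Server. The column definitions, the constraint name, and the sample employee are all assumptions for illustration. It inserts the same natural key twice into an Employee table, once without and once with a unique constraint on the natural key.

```python
import sqlite3

# Made-up sample row; the name and number are placeholders.
SAMPLE_EMPLOYEE = ("Pat", "Example", "000-00-0001", "1980-02-14")


def count_rows_after_double_insert(with_unique_constraint):
    """Insert the same natural key twice and report how many rows survive."""
    conn = sqlite3.connect(":memory:")
    unique_clause = (
        ", CONSTRAINT UQ_Employee_NaturalKey"
        " UNIQUE (FirstName, LastName, SSN, BirthDate)"
        if with_unique_constraint
        else ""
    )
    conn.execute(f"""
        CREATE TABLE Employee (
            EmployeeId INTEGER PRIMARY KEY,   -- surrogate key
            FirstName  VARCHAR(50) NOT NULL,
            LastName   VARCHAR(50) NOT NULL,
            SSN        CHAR(11)    NOT NULL,
            BirthDate  DATE        NOT NULL
            {unique_clause}
        )
    """)
    for _ in range(2):
        try:
            conn.execute(
                "INSERT INTO Employee (FirstName, LastName, SSN, BirthDate)"
                " VALUES (?, ?, ?, ?)",
                SAMPLE_EMPLOYEE,
            )
        except sqlite3.IntegrityError:
            # The unique constraint on the natural key rejected the duplicate.
            pass
    return conn.execute("SELECT COUNT(*) FROM Employee").fetchone()[0]


if __name__ == "__main__":
    print(count_rows_after_double_insert(with_unique_constraint=False))
    print(count_rows_after_double_insert(with_unique_constraint=True))
```

The surrogate key happily hands out a new EmployeeId for each insert, so on its own it does nothing to stop the same person from appearing twice. Only the unique constraint does that.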
A many-to-many table lets each row in one table relate to many rows in another table, and vice versa. An example can be seen in the car insurance industry. If multiple people are on the same insurance policy and they are registered to drive multiple cars, then a many-to-many table would be created to capture which person drives which car.
If a surrogate key is used on the many-to-many table to provide uniqueness and the natural key does not have a unique constraint, then duplicate natural key combinations can occur. This can be obscured if there is additional information in the table. Maybe the amount the car is insured for is also maintained in this table. Let’s take Victoria’s insurance as an example. If Victoria is in the table with her 1971 Corvette listed twice with two different insurance amounts, which one is the current one? The better pattern in this case would be to use the natural key (the person and car combination) as the primary key.
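One way the natural-key pattern could look for the insurance example is sketched below, using Python’s sqlite3 module as a stand-in for SQL Server. The table names (Driver, Car, DriverCar), the column names, and the dollar amounts are hypothetical. The many-to-many table uses the pair of foreign keys as its primary key, so Victoria and her Corvette can only appear together once.

```python
import sqlite3


def build_policy_demo():
    """Create Driver, Car, and a many-to-many table keyed by its natural key."""
    conn = sqlite3.connect(":memory:")
    # SQLite only enforces foreign keys when this pragma is switched on.
    conn.execute("PRAGMA foreign_keys = ON")
    conn.executescript("""
        CREATE TABLE Driver (
            DriverId   INTEGER PRIMARY KEY,
            DriverName VARCHAR(50) NOT NULL
        );
        CREATE TABLE Car (
            CarId       INTEGER PRIMARY KEY,
            Description VARCHAR(50) NOT NULL
        );
        CREATE TABLE DriverCar (
            DriverId      INTEGER NOT NULL REFERENCES Driver (DriverId),
            CarId         INTEGER NOT NULL REFERENCES Car (CarId),
            InsuredAmount NUMERIC NOT NULL,
            PRIMARY KEY (DriverId, CarId)   -- natural key of the relationship
        );
        INSERT INTO Driver VALUES (1, 'Victoria');
        INSERT INTO Car VALUES (1, '1971 Corvette');
    """)
    return conn


def insure(conn, driver_id, car_id, amount):
    """Return True if the row was accepted, False if the key rejected a duplicate."""
    try:
        conn.execute(
            "INSERT INTO DriverCar (DriverId, CarId, InsuredAmount) VALUES (?, ?, ?)",
            (driver_id, car_id, amount),
        )
        return True
    except sqlite3.IntegrityError:
        return False


if __name__ == "__main__":
    conn = build_policy_demo()
    print(insure(conn, 1, 1, 20000))  # first listing for Victoria's Corvette
    print(insure(conn, 1, 1, 35000))  # second listing for the same driver and car
    print(conn.execute("SELECT * FROM DriverCar").fetchall())
```

With this design a change in the insured amount has to be an UPDATE of the existing row, so there is never a question about which amount is the current one.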
Surrogate keys are very useful, but it should not be assumed that they should be used for all tables. The natural key should always be considered first. If the natural key is too complicated to be used as foreign keys in other tables, then the surrogate key is a good choice. Just remember to ALSO put a unique constraint on the natural key.
Thanks for all the fish
I had several people tell me on Twitter that they were going to write their first blog post for this T-SQL Tuesday blog party. I want to thank them ahead of time for taking the leap into the blogging world to share their experiences and expertise in their fields.