There are three approaches to implement a primary key:
Natural key:
1. A column or collection of columns that define the table uniqueness (existing column\columns from data table).
Surrogate primary key: an auto-generated keys without any business relationship:
2. Auto-increment column (Aka sequence) that store in an ID column.
3. Auto-generated key using the Global Unique Identifier(GUID).
|
No. |
Subject |
Natural primary key |
Sequence key |
GUID key |
|
1 |
Extra data size(tables and indexes) |
Big index data |
Add 4 bytes for int and 8 for long (peer row) |
Add 16 bytes for each row |
|
2 |
Key visibility |
Seen and understood to users |
Hide from users |
Hide from users |
|
3 |
Modifying Keys |
Difficult to update |
Easy to modify |
Easy to modify |
|
4 |
Quantity of joins |
Reducing the join between tables(In some cases) |
When we use a link table that contain 2 keys from A, B table to C table we must include A, B ,C in any join. |
When we use a link table that contain 2 keys from A, B table to C table we must include A, B, C in any join. |
|
5 |
Sql statement complexity |
Hard to implement, must contain all the primary columns in almost any statement |
The easiest option to implement(Only one primary column, and it's contains only numeric incrementally number) |
More easy from the natural primary key, but slightly more difficult from the sequence key |
|
6 |
Ensuring uniqueness when working disconnected |
Very complicated |
The uniqueness can be broken if the identity key is duplicated(We can solve this problem if we use some tricks like: working with negative number for the primary key in the disconnected entities and changing the primary key when merging to master database) |
The best option, because the primary key can't be duplicated*.
* The chance to get the same key is 1/(2^127) |
|
8 |
Migrating data to other databases |
The synchronization is a little bit complicated |
Requires some work, especially in the synchronization process of the Foreign Key tables |
Just copy the data… |
Who is the winner?
It's a good question; I don't think that we have only one answer. It's depends on our requirements.
For example:
- If we using a data warehouse were we need to provide high performance read-only access, the problem of the key modification is less severed. In this case we should use the natural primary keys.
- If we are working with disconnected data and database synchronization, we should choose the GUID keys.
- If we have an application that doesn't work with disconnected data and have regular DML operations, we should choose the sequence keys.