SURROGATE KEYS
“A surrogate key is an
artificial or synthetic key that is used as a substitute for a natural key”.
The surrogate key is an automatically generated
sequence number which will act as primary key for a dimension and also used in
referencing from fact tables.
In this example Org_key is Surrogate Key and Org_ID is natural key.
Org_key
|
Org_ID
|
Org_Name
|
HeadQuarters
|
1
|
101
|
XYZ Ltd
|
Sydney
|
2
|
102
|
PQR Ltd
|
Cape Town
|
3
|
103
|
ABC Ltd
|
Beijing
|
The question!!!! –What is use of generating a special key(surrogate)
when we already have a natural key coming in from source???.
Let us find the answer to this through the following scenario:
Scenario - 1 :
System1 and System2 have a products table which we integrate in our
Dim_Products table in the DWH.
Both the systems have their own unique identifiers for the Products
which is Product_ID column.
However, when we try to consolidate the data, we see a conflict in the
Natural keys;
For e.g. Product_ID=1 represents “Rice” in the System1 , but there is
also a Product_ID=1 in System2 (which represents a different product).
Here is pictorial
representation of the scenario(with and without surrogate keys)
*The use of Surrogate key, eliminates this confusion
and gives a us a clear-cut attribute, that can be used in referencing the
product table.
Scenario - 2 :
There are also other scenarios e.g.”98xxxxx256” is a
phone number associated with person A. Later person A has changed his number
and this number is allotted to person B and then person C; the surrogate key
comes into the picture helping us to maintain the history of that original
mapping.
Some recommended characteristics of a Surrogate key in
DWH:
·
System generated
(not composed of several values from different domains).
·
Unique
system-wide, hence never reused
·
Should not be not
editable by the user or application
Thank you very much! I totally understand the benefit of using surrogate keys now.
ReplyDelete