In a transaction system, many a times the change is overwritten and track of change is lost. However, a data warehouse needs to maintain all the history as the key benefit of a warehouse is to provide historical information to analyze the trend.
Below SCD Types are the most implemented methods to
handle these changing dimensions in a warehouse. Let’s understand these with
an example below
Example:
You have a dimensional table with Customer_ID ' C01' with marital status as 'single' mentioned below. Overtime, customer gets married and also moves to a new location.
Let’s see how this scenario is managed with different SCD types.
Initial
Data Record:
Surrogate Key(Surrogate Key)
|
Customer ID(Natural Key)
|
Date Valid
|
Marital Status
|
Date of Birth
|
City
|
100
|
C01
|
Jan
23, 2008
|
Single
|
Jan8,
1982
|
Palo
Alto
|
SCD Type1:
The Type 1 methodology overwrites old data with new
data, and therefore does not track historical data at all. This is obviously
done, when we are not analyzing the historical information.
Surrogate Key
|
Customer ID
|
Date Valid
|
Marital Status
|
Date of Birth
|
City
|
100
|
C01
|
July
7,2012
|
Married
|
Jan8,
1982
|
Francisco
|
The record
is simple over-written and no history is maintained here.
SCD Type 2:
The Type 2 method tracks historical data by creating
multiple records for a given natural key in the dimensional tables with separate surrogate
keys and/or different version numbers. With Type 2, we have unlimited
history preservation as a new record is inserted each time a change is made.
This is implemented using a version column or through effective date columns to know the active record.
Implemented through a Version Column
Surrogate Key
|
Customer ID
|
Date Valid
|
Marital Status
|
Date of Birth
|
City
|
Version
|
100
|
C01
|
Sept
23, 2004
|
Single
|
Jan8,
1982
|
Palo
Alto
|
0
|
101
|
C01
|
Sept
23, 2004
|
Married
|
Jan8,
1982
|
Francisco
|
1
|
Implemented through Effective Start and End
Date Columns
Surrogate Key
|
Customer ID
|
Date Valid
|
Marital Status
|
Date of Birth
|
City
|
Start-Date
|
End-Date
|
100
|
C01
|
Sept
23, 2004
|
Single
|
Jan8,
1982
|
Palo
Alto
|
01-Sep-2000
|
23-Sep-2004
|
101
|
C01
|
Sept
23, 2004
|
Married
|
Jan8,
1982
|
Francisco
|
24-Sep-2004
|
31-12-9999
|
A new record is added every time there is a change in
the source with the version or Effective Date columns updating accordingly..
SCD Type 3:
The Type 3 method tracks changes using separate
columns. Whereas Type 2 had unlimited history preservation, Type 3 has limited
history preservation, as it's limited to the number of columns designated for
storing historical data and will have only the recent historical change.
Where the original table structure in Type 1 and Type 2 was very similar, Type
3 adds additional columns to the tables
Implemented through additional Original Columns
Surrogate Key
|
Customer ID
|
Date Valid
|
Original Marital Status
|
Marital Status
|
Date of Birth
|
Original City
|
City
|
100
|
C01
|
Sept
23, 2004
|
Single
|
Married
|
Jan8,
1982
|
Palo
Alto
|
Francisco
|
Original Columns have been added to capture the most
recent historical change .
Additional SCDs which are occasionally used:
SCD Type 0:
· The Type 0 method is a passive approach to managing dimension value changes, in which no action is taken.
·
Values remain as they were at
the time of the dimension record was first entered.
SCD Type 4
· The Type 4 method is usually referred to as using "history tables", where one table keeps the current data,
·
An additional table is used to keep
a record of some or all changes.
SCD Type 6 / hybrid:
No comments:
Post a Comment