Slowly Changing dimensional in Informatica with example ( SCD -1, SCD -2 , SCD -3)
Dimensions
that change over time are called Slowly Changing Dimensions. For instance, a
product price changes over time; People change their names for some reason;
Country and State names may change over time. These are a few examples of
Slowly Changing Dimensions since some changes are happening to them over a
period of time.
slowly changing dimension in informatica |
Slowly Changing Dimensions are often
categorized into three types namely Type1, Type2 and Type3.
The following section deals with how to capture and handling these changes over
time.
The "Product" table
mentioned below contains a product named, Product1 with Product ID being the
primary key. In the year 2004, the price of Product1 was $150 and over the
time, Product1's price changes from $150 to $350. With this information, let us
explain the three types of Slowly Changing
Dimensions.
Product Price in 2004:
Product ID(PK)
|
Year
|
Product Name
|
Product Price
|
1
|
2004
|
Product1
|
$150
|
Type 1: Overwriting the old values.
In the year 2005, if the price of the product changes to $250, then the old values of the columns "Year" and "Product Price" have to be updated and replaced with the new values. In this Type 1, there is no way to find out the old value of the product "Product1" in year 2004 since the table now contains only the new price and year information.
In the year 2005, if the price of the product changes to $250, then the old values of the columns "Year" and "Product Price" have to be updated and replaced with the new values. In this Type 1, there is no way to find out the old value of the product "Product1" in year 2004 since the table now contains only the new price and year information.
Product
Product ID(PK)
|
Year
|
Product Name
|
Product Price
|
1
|
2005
|
Product1
|
$250
|
Type 2: Creating an another additional
record.
In this Type 2, the old values will not be replaced but a new row containing the new values will be added to the product table. So at any point of time, the difference between the old values and new values can be retrieved and easily be compared. This would be very useful for reporting purposes.
In this Type 2, the old values will not be replaced but a new row containing the new values will be added to the product table. So at any point of time, the difference between the old values and new values can be retrieved and easily be compared. This would be very useful for reporting purposes.
Product
Product ID(PK)
|
Year
|
Product Name
|
Product Price
|
1
|
2004
|
Product1
|
$150
|
1
|
2005
|
Product1
|
$250
|
The problem with the above mentioned
data structure is "Product ID" cannot store duplicate values of
"Product1" since "Product ID" is the primary key. Also, the
current data structure doesn't clearly specify the effective date and expiry
date of Product1 like when the change to its price happened. So, it would be
better to change the current data structure to overcome the above primary key
violation.
Product
Product ID(PK)
|
Effective
DateTime(PK) |
Year
|
Product Name
|
Product Price
|
Expiry
DateTime |
1
|
01-01-2004 12.00AM
|
2004
|
Product1
|
$150
|
12-31-2004 11.59PM
|
1
|
01-01-2005 12.00AM
|
2005
|
Product1
|
$250
|
|
In the changed Product table's Data
structure, "Product ID" and "Effective DateTime" are
composite primary keys. So there would be no violation of primary key
constraint. Addition of new columns, "Effective DateTime" and
"Expiry DateTime" provides the information about the product's
effective date and expiry date which adds more clarity and enhances the scope
of this table. Type2 approach may need additional space in the data base, since
for every changed record, an additional row has to be stored. Since dimensions
are not that big in the real world, additional space is negligible.
Type 3: Creating new fields.
In this Type 3, the latest update to the changed values can be seen. Example mentioned below illustrates how to add new columns and keep track of the changes. From that, we are able to see the current price and the previous price of the product, Product1.
In this Type 3, the latest update to the changed values can be seen. Example mentioned below illustrates how to add new columns and keep track of the changes. From that, we are able to see the current price and the previous price of the product, Product1.
Product
Product ID(PK)
|
Current
Year |
Product
Name |
Current
Product Price |
Old Product
Price |
Old Year
|
1
|
2005
|
Product1
|
$250
|
$150
|
2004
|
would not be able to see the complete
history of 2004 prices, since the old values would have been updated with 2005
product information.
Product
Product ID(PK)
|
Year
|
Product
Name |
Product
Price |
Old Product
Price |
Old Year
|
1
|
2006
|
Product1
|
$350
|
$250
|
2005
|
Comments
Post a Comment