`

Slowly Changing Dimensions Type 1 and Type 2(原创)

 
阅读更多

Slowly Changing Dimensions

The "Slowly Changing Dimension" problem is a common one particular to data warehousing. In a nutshell, this applies to cases where the attribute for a record varies over time. We give an example below:

Christina is a customer with ABC Inc. She first lived in Chicago, Illinois. So, the original entry in the customer lookup table has the following record:

Customer Key Customer_ID Name Birthday State
1001 999 Christina 1990-01-01 Illinois

At a later date, she moved to Los Angeles, California on January, 2003. How should ABC Inc. now modify its customer table to reflect this change? This is the "Slowly Changing Dimension" problem.

There are in general three ways to solve this type of problem, and they are categorized as follows:

Type 1: The new record replaces the original record. No trace of the old record exists.

Type 2: A new record is added into the customer dimension table. Therefore, the customer is treated essentially as two people.

Type 3: The original record is modified to reflect the change.

Type 1 Slowly Changing Dimension

When the source of a dimension value changes, and it is not necessary to preserve its history in the star schema, a type 1 response is employed. The dimension is simply overwritten with the new value. This technique is commonly employed in situations where a source data element is being changed to correct an error.
By overwriting the corresponding dimension in the star schema, the type 1 change obliterates the history of the data element. The star carries no hint that the column ever contained a different value. While this is generally the desired effect, it can also lead to confusion. If there were any associated facts before the change occurred, their historic context is retroactively altered.
When a record is updated in a dimension table, the context for existing facts is restated. This effect
can give rise to confusion. Steps can be taken to minimize the confusion caused by type 1 changes. Systems analysts responsible for supporting the data warehouse users must be aware of this phenomenon so they are prepared to address confusion. Developers of reports can place the query execution date within the report footer or cover page, signaling to readers the date as of which the report was current. Any reports that are pre-run and stored for users can be automatically updated on a regular basis so users do not inintentionally access “stale” data.

In our example, recall we originally have the following table:

Customer Key Customer_ID Name Birthday State
1001 999 Christina 1990-01-01 Illinois

Developers found Operating system record Christina's birthday incorrectly at the begining, and they correct the birtday value to 1990-02-01, the new information replaces the new record, and we have the following table:

Customer Key Customer_ID Name Birthday State
1001 999 Christina 1990-02-01 Illinois

Advantages:

- This is the easiest way to handle the Slowly Changing Dimension problem, since there is no need to keep track of the old information.

Disadvantages:

- All history is lost. By applying this methodology, it is not possible to trace back in history. For example, in this case, the company would not be able to know that Christina lived in Illinois before.

Preexisting Facts Have a New Context
When a record is updated in a dimension table, the context for existing facts is restated. This effect
can give rise to confusion.

Usage:

About 50% of the time.

When to use Type 1:

Type 1 slowly changing dimension should be used when it is not necessary for the data warehouse to keep track of historical changes.

Type 2 Slowly Changing Dimension

In Type 2 Slowly Changing Dimension, a new record is added to the table to represent the new information. Therefore, both the original and the new record will be present. The new record gets its own surrogate key.

In our example, recall we have corrected the birthday value, and Christina moved from Illinois to California this time, the following table:

Customer Key Customer_ID Name Birthday State
1001 999 Christina 1990-02-01 Illinois

The Order fact table shows as below

Customer Key day_key product_key quantity_ordered
1001 2322 10119 5

After Christina moved from Illinois to California, we add the new information as a new row into the table:

Dimension table

Customer Key Customer_ID Name Birthday State
1001 999 Christina 1990-02-01 Illinois
1005 999 Christina 1990-02-01 California

Fact table

Customer Key day_key product_key quantity_ordered
1001 2322 10119 5
1005 4377 20111 1

Historic Context of Facts Is Preserved
By creating multiple versions of the dimension, a type 2 response avoids restating the context of previously existing facts. Old facts can remain associated with the old row; new facts can be associated with the new row. This has the desired effect of preserving past history, while allowing new activity to be associated with the new value.

History of Dimension Is Partially Maintained
A type 2 change results in multiple dimension rows for a given natural key. While this serves to preserve the historic context of facts, it can trigger new forms of confusion. Users may be confused by the presence of duplicate values in the dimension tables. Designers may be lulled by a false sense that they are preserving dimensional history.
Type 2 changes can confuse end users because they cause duplicate values to appear in dimension tables. For example, after Christina's change of address, there are two rows in the dimension table for her customer key. If someone were to query the dimension table to get the name associated with Christina, both rows would be returned. This side effect can be avoided by issuing browse queries that select distinct values. A flag may also be added to indicate the current row for a given natural key value.

Although the type 2 change preserves the historic context of facts, it does not preserve history in the dimension. It is easy to see that a given natural key has taken on multiple representations in the dimension, but we do not know when each of these representations was correct. This information is only provided by way of a fact.
For example, after the change to Christina’s address has occurred, the dimension table in Figure 3-8 shows that there have been two versions of Christina, but it cannot tell us what Christina looked like on any given date. Where was she living on January 1, 2008? The dimension table does not carry this information. If there is an order for January 1, 2008, we are in luck, because the orders fact table will refer to the version of Sue that was correct at the time of the order. If there is not an order on that date, we are unable to determine what Christina looked like at that point in time.
It may be clear to you that this problem is easily rectified by adding a date stamp to each version of Christina. This technique allows the dimension to preserve both the history of facts and the history of dimensions. Another possibility is to build an additional fact table that associates versions of Sue with various dates.

Advantages:

- This allows us to accurately keep all historical information.

Disadvantages:

- This will cause the size of the table to grow fast. In cases where the number of rows for the table is very high to start with, storage and performance can become a concern.

- This necessarily complicates the ETL process.

- It's not easy to confirm which version is correct in dimension table without timestamp information.

Usage:

About 50% of the time.

When to use Type 2:

Type 2 slowly changing dimension should be used when it is necessary for the data warehouse to track historical changes.

In addition to the type 1 and type 2 techniques introduced in this chapter, additional responses to source data changes are possible. Options include the type 3 response, hybrid responses, and time-stamped variations. Though less common, these techniques meet additional analytic challenges that I plan to elobrate it by other post.

 

参考至:《Star Schema The Complete Reference》

                http://www.1keydata.com/datawarehousing/slowly-changing-dimensions.html

                http://www.1keydata.com/datawarehousing/slowly-changing-dimensions-type-1.html

                http://www.1keydata.com/datawarehousing/slowly-changing-dimensions-type-2.html

本文原创,转载请注明出处,作者

如有错误,欢迎指正

邮箱:czmcj@163.com

0
1
分享到:
评论

相关推荐

Global site tag (gtag.js) - Google Analytics