## Motivation

Because definitions change over time, and different parties define them differently, there is usually a lot of discussion about what exactly constitutes each type of history. This design pattern defines these history types in order to provide common ground for that discussion.

This is also known as:

* SCD; Slowly Changing Dimensions
* Type 1, 2, 3, 4 etc.

## Applicability

Every situation where historical data is needed or stored, or where a discussion about history arises.

Depending on the Data Warehouse architecture, this can be needed in a variety of situations, but typically these concepts are applied in the integration and presentation layers of the Data Warehouse.

## Structure

The following history types are defined, with a distinction made where multiple viable interpretations exist. All definitions can be valid coming from a specific background, so to cater for every situation some history types are tagged with a letter (for example 1-A and 1-B) indicating a slightly different approach.

**Type 0**. No change. While uncommon, it has to be mentioned that this passive approach is sometimes implemented when storage space has to be saved or when only the initial state has to be preserved.

**Type 1 – A**. Change only the latest record. This variant of Type 1 is used if there is limited interest in keeping a specific kind of history. A good example is a spelling error: only the latest record is updated (unless you are interested in the wrong spelling for data quality purposes).

An example of a Type 1-A change:

Old situation: records exist for the logical key CHS (Cheese). The attribute Name is defined as a Type 1(A) attribute.

DWH Key | Logical Key | Name | Colour | Start date | End date | Update date
--- | --- | --- | --- | --- | --- | ---
3 | CHS | Cheese | Golden | 05-01-2000 | 31-12-9999 | 05-01-2000
2 | CHS | Cheese | Yellow | 11-01-1996 | 04-01-2000 | 11-01-1996
1 | CHS | Cheese | Yellow | 07-03-1994 | 10-01-1996 | 10-01-1996

When at some point (on 24-06-2006) the name is changed to *Old Cheese*, and the Name attribute is defined as Type 1(A), the name is overwritten in the latest record only, resulting in the following:

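DWH Key | Logical Key | Name | Colour | Start date | End date | Update date
--- | --- | --- | --- | --- | --- | ---
3 | CHS | Old Cheese | Golden | 05-01-2000 | 31-12-9999 | 24-06-2006
2 | CHS | Cheese | Yellow | 11-01-1996 | 04-01-2000 | 11-01-1996
1 | CHS | Cheese | Yellow | 07-03-1994 | 10-01-1996 | 10-01-1996
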
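
The Type 1-A behaviour can be sketched in Python as follows. This is a minimal in-memory illustration under stated assumptions, not a Data Warehouse mapping: the `Row` class, the `type_1a_update` function and the use of the 31-12-9999 end date to identify the most recent record are all hypothetical choices made for the example.

```python
from dataclasses import dataclass, replace
from datetime import date

# Assumption: the open (most recent) record carries this high end date.
HIGH_DATE = date(9999, 12, 31)


@dataclass(frozen=True)
class Row:
    """Hypothetical in-memory stand-in for one row of the example table."""
    dwh_key: int
    logical_key: str
    name: str
    colour: str
    start_date: date
    end_date: date
    update_date: date


def type_1a_update(rows, logical_key, new_name, change_date):
    """Overwrite Name on the most recent record of logical_key only (Type 1-A)."""
    result = []
    for row in rows:
        is_latest = row.logical_key == logical_key and row.end_date == HIGH_DATE
        if is_latest and row.name != new_name:
            row = replace(row, name=new_name, update_date=change_date)
        result.append(row)
    return result


rows = [
    Row(3, "CHS", "Cheese", "Golden", date(2000, 1, 5), HIGH_DATE, date(2000, 1, 5)),
    Row(2, "CHS", "Cheese", "Yellow", date(1996, 1, 11), date(2000, 1, 4), date(1996, 1, 11)),
    Row(1, "CHS", "Cheese", "Yellow", date(1994, 3, 7), date(1996, 1, 10), date(1996, 1, 10)),
]
updated = type_1a_update(rows, "CHS", "Old Cheese", date(2006, 6, 24))
for row in updated:
    print(row.dwh_key, row.name, row.update_date)
# 3 Old Cheese 2006-06-24
# 2 Cheese 1996-01-11
# 1 Cheese 1996-01-10
```

Records 2 and 1 keep the old spelling, which is exactly the limited history that Type 1-A preserves.
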
**Type 1 – B**. Update the entire history based on the latest situation. Using the same old situation as in the Type 1-A example: when at some point (on 24-06-2006) the name is changed to *Old Cheese*, and the Name attribute is defined as Type 1(B), the name is overwritten in every record for the logical key, resulting in the following:

DWH Key | Logical Key | Name | Colour | Start date | End date | Update date
--- | --- | --- | --- | --- | --- | ---
3 | CHS | Old Cheese | Golden | 05-01-2000 | 31-12-9999 | 24-06-2006
2 | CHS | Old Cheese | Yellow | 11-01-1996 | 04-01-2000 | 24-06-2006
1 | CHS | Old Cheese | Yellow | 07-03-1994 | 10-01-1996 | 24-06-2006

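
A Type 1-B overwrite can be sketched the same way. Again this is a hypothetical in-memory illustration (`Row` and `type_1b_update` are invented names): every record of the logical key receives the new name, and rows that already hold it are skipped so their update date is not touched needlessly.

```python
from dataclasses import dataclass, replace
from datetime import date

HIGH_DATE = date(9999, 12, 31)  # assumption: marks the open record


@dataclass(frozen=True)
class Row:
    """Hypothetical in-memory stand-in for one row of the example table."""
    dwh_key: int
    logical_key: str
    name: str
    colour: str
    start_date: date
    end_date: date
    update_date: date


def type_1b_update(rows, logical_key, new_name, change_date):
    """Overwrite Name on every record of logical_key (Type 1-B).

    Rows that already hold new_name are left alone, so their update date
    is not changed by an event that did not affect them.
    """
    return [
        replace(row, name=new_name, update_date=change_date)
        if row.logical_key == logical_key and row.name != new_name
        else row
        for row in rows
    ]


rows = [
    Row(3, "CHS", "Cheese", "Golden", date(2000, 1, 5), HIGH_DATE, date(2000, 1, 5)),
    Row(2, "CHS", "Cheese", "Yellow", date(1996, 1, 11), date(2000, 1, 4), date(1996, 1, 11)),
    Row(1, "CHS", "Cheese", "Yellow", date(1994, 3, 7), date(1996, 1, 10), date(1996, 1, 10)),
]
updated = type_1b_update(rows, "CHS", "Old Cheese", date(2006, 6, 24))
for row in updated:
    print(row.dwh_key, row.name, row.colour, row.update_date)
# 3 Old Cheese Golden 2006-06-24
# 2 Old Cheese Yellow 2006-06-24
# 1 Old Cheese Yellow 2006-06-24
```

Note that the Colour history (Golden versus Yellow) is untouched; only the Type 1(B) attribute is rewritten.
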
**Type 2**, also known as SCD Type 2. The slowly changing dimension Type 2 concept tracks history by inserting a new record and closing the most recent corresponding record whenever a change occurs.

A new record is inserted in the Data Warehouse table:

DWH Key | Logical Key | Name | Colour | Start date | End date
--- | --- | --- | --- | --- | ---
1 | CHS | Cheese | Golden | 05-01-2000 | 31-12-9999

In this case you have basic information about a product; the name is the attribute that can change over time. This record was inserted on the 5th of January 2000 and is still active. Now, on the 20th of July 2008, the name changes to *Old Cheese*. This leads to a new record and an updated (closed) previous record for the same logical key:

DWH Key | Logical Key | Name | Colour | Start date | End date
--- | --- | --- | --- | --- | ---
2 | CHS | Old Cheese | Golden | 20-07-2008 | 31-12-9999
1 | CHS | Cheese | Golden | 05-01-2000 | 19-07-2008

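
The Type 2 mechanism (close the open record the day before the change, insert a new open record) can be sketched as below. The `Row` class, `type_2_update` and the "next free integer" surrogate key generation are hypothetical simplifications; a real Data Warehouse would typically use a sequence or identity column for the DWH key.

```python
from dataclasses import dataclass, replace
from datetime import date, timedelta

HIGH_DATE = date(9999, 12, 31)  # assumption: end date of the open record


@dataclass(frozen=True)
class Row:
    """Hypothetical in-memory stand-in for one row of the example table."""
    dwh_key: int
    logical_key: str
    name: str
    colour: str
    start_date: date
    end_date: date


def type_2_update(rows, logical_key, changes, change_date):
    """Close the open record of logical_key and insert a new version (SCD Type 2).

    `changes` maps attribute names (e.g. "name") to new values. Assumes exactly
    one open record per logical key. If nothing actually changes, no new
    history record is created.
    """
    current = next(
        r for r in rows if r.logical_key == logical_key and r.end_date == HIGH_DATE
    )
    new_version = replace(current, **changes)
    if new_version == current:
        return list(rows)
    # Hypothetical surrogate key generation: next free integer.
    next_key = max(r.dwh_key for r in rows) + 1
    closed = replace(current, end_date=change_date - timedelta(days=1))
    inserted = replace(
        new_version, dwh_key=next_key, start_date=change_date, end_date=HIGH_DATE
    )
    # Newest record first, matching the order of the example tables.
    return [inserted] + [closed if r is current else r for r in rows]


rows = [Row(1, "CHS", "Cheese", "Golden", date(2000, 1, 5), HIGH_DATE)]
rows = type_2_update(rows, "CHS", {"name": "Old Cheese"}, date(2008, 7, 20))
for row in rows:
    print(row.dwh_key, row.name, row.start_date, row.end_date)
# 2 Old Cheese 2008-07-20 9999-12-31
# 1 Cheese 2000-01-05 2008-07-19
```

Comparing the new version with the current one before inserting avoids creating history records for loads that deliver unchanged data.
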
**Type 3** history stores history in a separate attribute. As many attributes can be added to a record as there are previous states to capture, although typically only the previous state is recorded. An example would be:

A new record is inserted in the Data Warehouse table on 12-10-2009:

DWH Key | Logical Key | Name | Previous Name | Colour | Update date
--- | --- | --- | --- | --- | ---
1 | CHS | Cheese | NULL | Golden | 12-10-2009

When the name is changed to *Old Cheese* on 02-02-2010, it leads to the following result:

DWH Key | Logical Key | Name | Previous Name | Colour | Update date
--- | --- | --- | --- | --- | ---
1 | CHS | Old Cheese | Cheese | Golden | 02-02-2010

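
A Type 3 update only shifts values within the same record. The sketch below is a hypothetical illustration (`Row` and `type_3_update` are invented names) that keeps a single Previous Name attribute, as in the example; a further change therefore drops the oldest value.

```python
from dataclasses import dataclass, replace
from datetime import date
from typing import Optional


@dataclass(frozen=True)
class Row:
    """Hypothetical in-memory stand-in for one row of the Type 3 example."""
    dwh_key: int
    logical_key: str
    name: str
    previous_name: Optional[str]
    colour: str
    update_date: date


def type_3_update(row, new_name, change_date):
    """Shift the current Name into Previous Name and store the new value (Type 3)."""
    if new_name == row.name:
        return row  # no change, keep the record as-is
    return replace(row, previous_name=row.name, name=new_name, update_date=change_date)


row = Row(1, "CHS", "Cheese", None, "Golden", date(2009, 10, 12))
row = type_3_update(row, "Old Cheese", date(2010, 2, 2))
print(row.name, "|", row.previous_name, "|", row.update_date)
# Old Cheese | Cheese | 2010-02-02
```

Capturing more than one previous state would mean adding more attributes (for example a second "previous" column) and shifting each value one position per change.
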
**Type 4**. This history tracking mechanism operates by using separate tables to store the history. One table contains the most recent version of the record and the history table contains some or all history.
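
A minimal sketch of the Type 4 split, assuming an in-memory dictionary as the "current" table and a list as the "history" table (the `Type4Tables` class and its `upsert` method are hypothetical). This variant moves each superseded version to the history table, so the history table holds all past versions and the current table only the latest one.

```python
from dataclasses import dataclass, field
from datetime import date, timedelta


@dataclass
class Type4Tables:
    """Hypothetical in-memory stand-ins for a current table and a history table."""
    # logical key -> (attributes, valid_from) of the most recent version
    current: dict = field(default_factory=dict)
    # (logical key, attributes, valid_from, valid_to) of superseded versions
    history: list = field(default_factory=list)

    def upsert(self, logical_key, attributes, change_date):
        """Store a new version, moving the superseded one to the history table."""
        previous = self.current.get(logical_key)
        if previous is not None:
            old_attributes, valid_from = previous
            if old_attributes == attributes:
                return  # no change, nothing to record
            self.history.append(
                (logical_key, old_attributes, valid_from, change_date - timedelta(days=1))
            )
        self.current[logical_key] = (dict(attributes), change_date)


tables = Type4Tables()
tables.upsert("CHS", {"name": "Cheese", "colour": "Golden"}, date(2000, 1, 5))
tables.upsert("CHS", {"name": "Old Cheese", "colour": "Golden"}, date(2008, 7, 20))

attributes, valid_from = tables.current["CHS"]
print(attributes["name"], valid_from)  # Old Cheese 2008-07-20
for key, old, start, end in tables.history:
    print(key, old["name"], start, end)  # CHS Cheese 2000-01-05 2008-07-19
```

Queries for the current situation only touch the small current table, while historical questions go to the history table.
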

**Type 5**. The Type 5 method of tracking history stores a version of the table for every period in time, also known as ‘snapshotting’. No example is supplied since each version is basically a copy of the entire table.

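
Although no table example is given, the snapshot mechanism itself is easy to sketch. In the hypothetical Python below, `take_snapshot` stores an independent full copy of a table per period end date (month-end periods are an assumption), and `as_of` returns the latest snapshot taken on or before a given date.

```python
import copy
from datetime import date


def take_snapshot(snapshots, table, period_end):
    """Store a full, independent copy of `table` for the period ending on `period_end`."""
    snapshots[period_end] = copy.deepcopy(table)


def as_of(snapshots, moment):
    """Return the most recent snapshot taken on or before `moment`."""
    eligible = [period_end for period_end in snapshots if period_end <= moment]
    if not eligible:
        raise KeyError(f"no snapshot on or before {moment}")
    return snapshots[max(eligible)]


# Hypothetical table keyed by logical key; snapshots are taken at month end.
table = {"CHS": {"name": "Cheese", "colour": "Golden"}}
snapshots = {}
take_snapshot(snapshots, table, date(2008, 6, 30))
table["CHS"]["name"] = "Old Cheese"
take_snapshot(snapshots, table, date(2008, 7, 31))

print(as_of(snapshots, date(2008, 7, 15))["CHS"]["name"])  # Cheese
print(as_of(snapshots, date(2008, 8, 15))["CHS"]["name"])  # Old Cheese
```

The deep copy is what makes each snapshot immune to later changes, and it is also why this approach costs a full copy of the table per period.
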

**Type 6 / hybrid**. Also known as ‘twin time stamping’, the Type 6 approach combines the concepts of the Type 1-B, Type 2 and Type 3 mechanisms (1+2+3=6!). In the following example the attribute combination concerns the name, which is stored in two attributes: Name (the value valid for the period of the record) and Current Name (the latest value).

A new record is inserted in the Data Warehouse table:

DWH Key | Logical Key | Name | Current Name | Colour | Start date | End date
--- | --- | --- | --- | --- | --- | ---
1 | CHS | Cheese | Cheese | Golden | 05-01-2000 | 31-12-9999

After some time (on 20-07-2008) the name is changed to *Old Cheese*. This leads to an SCD Type 2 event where a new record is inserted and the old one is closed off. At the same time, the Type 3-style Current Name attribute of the existing records is overwritten by a Type 1-B event:

DWH Key | Logical Key | Name | Current Name | Colour | Start date | End date
--- | --- | --- | --- | --- | --- | ---
2 | CHS | Old Cheese | Old Cheese | Golden | 20-07-2008 | 31-12-9999
1 | CHS | Cheese | Old Cheese | Golden | 05-01-2000 | 19-07-2008

Now you can see the previous record, and all related facts can be reported against both the current and the historical name. When a new change occurs (the name changes to *A+ Cheese* on 13-03-2010), the following happens:

DWH Key | Logical Key | Name | Current Name | Colour | Start date | End date
--- | --- | --- | --- | --- | --- | ---
3 | CHS | A+ Cheese | A+ Cheese | Golden | 13-03-2010 | 31-12-9999
2 | CHS | Old Cheese | A+ Cheese | Golden | 20-07-2008 | 12-03-2010
1 | CHS | Cheese | A+ Cheese | Golden | 05-01-2000 | 19-07-2008

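
The two steps of a Type 6 change (a Type 2 insert-and-close on Name, then a Type 1-B overwrite of Current Name on all records of the logical key) can be sketched as below. `Row`, `type_6_update` and the "next free integer" key generation are hypothetical; the two calls reproduce the example tables above.

```python
from dataclasses import dataclass, replace
from datetime import date, timedelta

HIGH_DATE = date(9999, 12, 31)  # assumption: end date of the open record


@dataclass(frozen=True)
class Row:
    """Hypothetical in-memory stand-in for one row of the Type 6 example."""
    dwh_key: int
    logical_key: str
    name: str          # value valid during this record's period (Type 2 part)
    current_name: str  # latest value, repeated on every record (Type 1-B part)
    colour: str
    start_date: date
    end_date: date


def type_6_update(rows, logical_key, new_name, change_date):
    """Apply a name change as a Type 2 event plus a Type 1-B overwrite."""
    current = next(
        r for r in rows if r.logical_key == logical_key and r.end_date == HIGH_DATE
    )
    if current.name == new_name:
        return list(rows)
    next_key = max(r.dwh_key for r in rows) + 1  # hypothetical key generator
    result = [
        Row(next_key, logical_key, new_name, new_name, current.colour, change_date, HIGH_DATE)
    ]
    for r in rows:
        if r is current:
            # Type 2: close the previously open record.
            r = replace(r, end_date=change_date - timedelta(days=1))
        if r.logical_key == logical_key:
            # Type 1-B: overwrite Current Name across the whole history.
            r = replace(r, current_name=new_name)
        result.append(r)
    return result


rows = [Row(1, "CHS", "Cheese", "Cheese", "Golden", date(2000, 1, 5), HIGH_DATE)]
rows = type_6_update(rows, "CHS", "Old Cheese", date(2008, 7, 20))
rows = type_6_update(rows, "CHS", "A+ Cheese", date(2010, 3, 13))
for r in rows:
    print(r.dwh_key, r.name, "|", r.current_name, r.start_date, r.end_date)
# 3 A+ Cheese | A+ Cheese 2010-03-13 9999-12-31
# 2 Old Cheese | A+ Cheese 2008-07-20 2010-03-12
# 1 Cheese | A+ Cheese 2000-01-05 2008-07-19
```
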
* Corresponding records are identified by the logical key.
* Type 1-B, and the corresponding concept in Type 6, usually require separate mappings to update the entire history. These need special care from a performance perspective: the entire history should not be rewritten over and over again when only the latest situation for a logical key has actually changed. Such a mapping has to aggregate the dataset to the latest state per logical (natural) key, merge that with the target table, and run after the regular Type 2 processes.
* Avoid using NULL in the end date attribute of the most recent record to indicate an open / current record; use a high date such as 31-12-9999 instead, as in the examples above. Some databases have trouble handling NULL values, and it is best practice to avoid NULL values wherever possible, especially in dimensions.
* It is advised to add a ‘current record indicator’ for quick querying and easy understanding.
* Depending on the location in the Data Warehouse, either tables or attributes may be defined for a specific history type. For instance, defining a table as SCD Type 2 means that a change in any attribute will lead to a new record (and the closing of an old one). In Data Marts the common approach is often to specify a history type per attribute, so a change in one attribute may lead to an SCD Type 2 event while a change in another one causes the history to be overwritten.

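
The advice on the Type 1-B mapping, the high end date and the current record indicator can be combined in one small sketch. Everything here is hypothetical (`Row`, `latest_name_per_key`, `type_1b_merge`), and in practice this would usually be a set-based merge in the database rather than a Python loop. The mapping aggregates to the latest Name per logical key via the current record, then rewrites only rows that differ, so running it again on unchanged data touches nothing.

```python
from dataclasses import dataclass, replace
from datetime import date

HIGH_DATE = date(9999, 12, 31)  # used instead of NULL for the open record


@dataclass(frozen=True)
class Row:
    """Hypothetical in-memory stand-in for a dimension row."""
    dwh_key: int
    logical_key: str
    name: str
    start_date: date
    end_date: date

    @property
    def is_current(self) -> bool:
        # Derived here for brevity; in a table this would be a stored
        # 'current record indicator' column.
        return self.end_date == HIGH_DATE


def latest_name_per_key(rows):
    """Aggregate the dataset to the latest state (here: Name) per logical key."""
    return {r.logical_key: r.name for r in rows if r.is_current}


def type_1b_merge(rows):
    """Merge the latest state into the history, rewriting only rows that differ.

    Intended to run after the regular Type 2 processing, so the current
    record already holds the latest value. Returns (rows, rewritten_count).
    """
    latest = latest_name_per_key(rows)
    result, rewritten = [], 0
    for r in rows:
        # Keys without a current record are left as they are.
        target = latest.get(r.logical_key, r.name)
        if r.name != target:
            r = replace(r, name=target)
            rewritten += 1
        result.append(r)
    return result, rewritten


rows = [
    Row(3, "CHS", "Old Cheese", date(2000, 1, 5), HIGH_DATE),
    Row(2, "CHS", "Cheese", date(1996, 1, 11), date(2000, 1, 4)),
    Row(1, "CHS", "Cheese", date(1994, 3, 7), date(1996, 1, 10)),
]
rows, rewritten = type_1b_merge(rows)
print(rewritten)  # 2
rows, rewritten = type_1b_merge(rows)
print(rewritten)  # 0: nothing changed, so no history is rewritten
```
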
## Considerations and Consequences

None.

## Known Uses

Usually in the integration and presentation layer, but applicability depends on the architecture.