Commit 59fdd42

Updated history and source system interfacing patterns to correct MD and reviewed
1 parent 6922783 commit 59fdd42

2 files changed

+115
-110
lines changed

1000_Design_Patterns/Design Pattern - Generic - Types of History.md

Lines changed: 85 additions & 83 deletions
@@ -5,117 +5,119 @@ This design pattern describes the definitions for the commonly used history stor
55

66
## Motivation
77
Due to definitions changing over time and different definitions being made by different parties there usually is a lot of discussion about what exactly constitutes the different types of history. This design pattern aims to define these history types in order to provide the common ground for discussion.
8-
Also known as
9-
SCD / Slowly Changing Dimensions
10-
Type 1,2,3,4 etc.
8+
9+
This is also known as:
10+
* SCD; Slowly Changing Dimensions
11+
* Type 1,2,3,4 etc.
1112

1213
## Applicability
13-
Every situation where historical data is needed / stored or a discussion arises. Depending on the Data Warehouse architecture, this can be needed in a variety of situations. But typically these concepts are applied in the integration and presentation layer of the Data Warehouse.
14+
This pattern applies to every situation where historical data is needed or stored, or where a discussion about history types arises.
15+
16+
Depending on the Data Warehouse architecture, this can be needed in a variety of situations. But typically these concepts are applied in the integration and presentation layer of the Data Warehouse.
1417

1518
## Structure
1619
The following history types are defined. Where there are multiple viable explanations, a distinction is made: all definitions can be valid coming from a specific background, so in order to cater for every situation some history types are tagged with specific letters indicating a slightly different approach.
17-
Type 0. No change, while uncommon it has to be mentioned that this passive approach sometimes is implemented when storage space is to be saved or only the initial state has to be preserved.
18-
Type 1 – A. Change only the latest record. This implementation of type 1 is implemented if there is limited interest in keeping a specific kind of history. A good example is spelling errors; only the latest record is updated in that case (if you’re not interested in the wrong spelling for data quality purposes).
20+
21+
**Type 0**. No change. While uncommon, this passive approach is sometimes implemented when storage space has to be saved or only the initial state has to be preserved.
22+
23+
**Type 1 – A**. Change only the latest record. This variant of type 1 is used when there is limited interest in keeping a specific kind of history. A good example is spelling errors; only the latest record is updated in that case (if you’re not interested in the wrong spelling for data quality purposes).
24+
1925
An example of the first instance of a type 1-A change:
2026
Old situation; a record exists for the logical key CHS (Cheese). The attribute Name is defined as a type 1(A) attribute.
21-
DWH_KEY Logical Key Name Colour Start date End date Update date
22-
3 CHS
23-
Cheese
24-
Golden 05-01-2000 31-12-9999 05-01-2000
25-
2 CHS Cheese Yellow 11-01-1996 04-01-2000 11-01-1996
26-
3 CHS Cheese Yellow 07-03-1994 10-01-1996 10-01-1996
27-
28-
When at some point (at 24-06-2006) the name is changed to Old Cheese and the Name attribute is defined as type 1(A) the name is overwritten, resulting in the following:
29-
DWH_KEY Logical Key Name Colour Start date End date Update date
30-
3 CHS
31-
Old Cheese
32-
Golden 05-01-2000 31-12-9999 24-06-2006
33-
2 CHS Cheese Yellow 11-01-1996 04-01-2000 11-01-1996
34-
3 CHS Cheese Yellow 07-03-1994 10-01-1996 10-01-1996
35-
36-
Type 1 – B. Update the entire history based on the latest situation. The previous example for the second version of type 1 is as follows:
27+
28+
DWH Key | Logical Key | Name | Colour | Start date | End date | Update date
29+
--- | --- | --- | --- | --- | --- | ---
30+
3 | CHS | Cheese | Golden | 05-01-2000 | 31-12-9999 | 05-01-2000
31+
2 | CHS | Cheese | Yellow | 11-01-1996 | 04-01-2000 | 11-01-1996
32+
1 | CHS | Cheese | Yellow | 07-03-1994 | 10-01-1996 | 10-01-1996
33+
34+
When at some point (at 24-06-2006) the name is changed to *Old Cheese* and the Name attribute is defined as type 1(A) the name is overwritten, resulting in the following:
35+
36+
DWH Key | Logical Key | Name | Colour | Start date | End date | Update date
37+
--- | --- | --- | --- | --- | --- | ---
38+
3 | CHS | Old Cheese | Golden | 05-01-2000 | 31-12-9999 | 24-06-2006
39+
2 | CHS | Cheese | Yellow | 11-01-1996 | 04-01-2000 | 11-01-1996
40+
1 | CHS | Cheese | Yellow | 07-03-1994 | 10-01-1996 | 10-01-1996
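
The type 1-A overwrite shown above can be sketched in a few lines of Python. This is a minimal illustration, not the pattern's prescribed implementation: the dictionary layout, the `apply_type_1a` function name and the `OPEN_END` constant are assumptions introduced here.

```python
from datetime import date

# Assumption: the open (most recent) record carries the high end date
# used in the example tables.
OPEN_END = date(9999, 12, 31)


def apply_type_1a(rows, logical_key, new_name, change_date):
    """Overwrite Name on the most recent record only; older records are untouched."""
    for row in rows:
        if row["logical_key"] == logical_key and row["end"] == OPEN_END:
            row["name"] = new_name
            row["updated"] = change_date
    return rows


# The three CHS records from the example, newest first.
rows = [
    {"dwh_key": 3, "logical_key": "CHS", "name": "Cheese", "colour": "Golden",
     "start": date(2000, 1, 5), "end": OPEN_END, "updated": date(2000, 1, 5)},
    {"dwh_key": 2, "logical_key": "CHS", "name": "Cheese", "colour": "Yellow",
     "start": date(1996, 1, 11), "end": date(2000, 1, 4), "updated": date(1996, 1, 11)},
    {"dwh_key": 1, "logical_key": "CHS", "name": "Cheese", "colour": "Yellow",
     "start": date(1994, 3, 7), "end": date(1996, 1, 10), "updated": date(1996, 1, 10)},
]

apply_type_1a(rows, "CHS", "Old Cheese", date(2006, 6, 24))
```

Only the record with DWH key 3 changes; the closed records keep both their name and their update date.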
41+
42+
**Type 1 – B**. Update the entire history based on the latest situation. The previous example for the second version of type 1 is as follows:
3743
Old situation; a record exists for the logical key CHS (Cheese). The attribute Name is defined as a type 1(B) attribute.
38-
DWH_KEY Logical Key Name Colour Start date End date Update date
39-
3 CHS
40-
Cheese
41-
Golden 05-01-2000 31-12-9999 05-01-2000
42-
2 CHS Cheese Yellow 11-01-1996 04-01-2000 11-01-1996
43-
3 CHS Cheese Yellow 07-03-1994 10-01-1996 10-01-1996
44+
45+
DWH Key | Logical Key | Name | Colour | Start date | End date | Update date
46+
--- | --- | --- | --- | --- | --- | ---
47+
3 | CHS | Cheese | Golden | 05-01-2000 | 31-12-9999 | 05-01-2000
48+
2 | CHS | Cheese | Yellow | 11-01-1996 | 04-01-2000 | 11-01-1996
49+
1 | CHS | Cheese | Yellow | 07-03-1994 | 10-01-1996 | 10-01-1996
4450

4551
When at some point (at 24-06-2006) the name is changed to Old Cheese and the Name attribute is defined as type 1(B) the name is overwritten, resulting in the following:
46-
DWH_KEY Logical Key Name Colour Start date End date Update date
47-
3 CHS
48-
Old Cheese
49-
Golden 05-01-2000 31-12-9999 24-06-2006
50-
2 CHS Old Cheese Yellow 11-01-1996 04-01-2000 24-06-2006
51-
3 CHS Old Cheese Yellow 07-03-1994 10-01-1996 24-06-2006
52-
53-
Type 2 / also known as SCD-type2. The slowly changing dimension type 2 concept tracks history by inserting a new record and closing the most recent corresponding record whenever a change occurs.
52+
53+
DWH Key | Logical Key | Name | Colour | Start date | End date | Update date
54+
--- | --- | --- | --- | --- | --- | ---
55+
3 | CHS | Old Cheese | Golden | 05-01-2000 | 31-12-9999 | 24-06-2006
56+
2 | CHS | Old Cheese | Yellow | 11-01-1996 | 04-01-2000 | 24-06-2006
57+
1 | CHS | Old Cheese | Yellow | 07-03-1994 | 10-01-1996 | 24-06-2006
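
For comparison with type 1-A, the type 1-B update touches every record for the logical key. The sketch below is illustrative only; the row layout and the `apply_type_1b` function name are assumptions, not part of the pattern.

```python
from datetime import date


def apply_type_1b(rows, logical_key, new_name, change_date):
    """Overwrite Name on every record for the logical key, open or closed."""
    for row in rows:
        if row["logical_key"] == logical_key:
            row["name"] = new_name
            row["updated"] = change_date
    return rows


# The three CHS records from the example, newest first.
rows = [
    {"dwh_key": 3, "logical_key": "CHS", "name": "Cheese", "colour": "Golden",
     "start": date(2000, 1, 5), "end": date(9999, 12, 31), "updated": date(2000, 1, 5)},
    {"dwh_key": 2, "logical_key": "CHS", "name": "Cheese", "colour": "Yellow",
     "start": date(1996, 1, 11), "end": date(2000, 1, 4), "updated": date(1996, 1, 11)},
    {"dwh_key": 1, "logical_key": "CHS", "name": "Cheese", "colour": "Yellow",
     "start": date(1994, 3, 7), "end": date(1996, 1, 10), "updated": date(1996, 1, 10)},
]

apply_type_1b(rows, "CHS", "Old Cheese", date(2006, 6, 24))
```

The start and end dates are left alone, so the time periods of the records survive while the name history is lost.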
58+
59+
**Type 2** / also known as SCD-type2. The slowly changing dimension type 2 concept tracks history by inserting a new record and closing the most recent corresponding record whenever a change occurs.
5460
A new record is inserted in the Data Warehouse table.
55-
DWH_KEY Logical Key Name Colour Start date End date
56-
1 CHS
57-
Cheese
58-
Golden 05-01-2000 31-12-9999
61+
62+
DWH Key | Logical Key | Name | Colour | Start date | End date | Update date
63+
--- | --- | --- | --- | --- | --- | ---
64+
1 | CHS | Cheese | Golden | 05-01-2000 | 31-12-9999 | 05-01-2000
5965

6066
In this case you have basic information of a product; the name is the attribute that can change over time. This record has been inserted on the 5th of January 2000 and is still active. But now, on the 20th of July 2008, the name changes to Old Cheese. This will lead to a new record and an updated previous record for the same logical key.
61-
DWH_KEY Logical Key Name Colour Start date End date
62-
2 CHS
63-
Cheese
64-
Golden 20-07-2008 31-12-9999
65-
1 CHS
66-
Cheese
67-
Golden 05-01-2000 19-07-2008
68-
69-
Type 3 history stores history in a separate attribute. As many attributes can be added to a record as the previous states that need to be captured. Typically only the previous state is recorded in the separate attribute. An example would be:
67+
68+
DWH Key | Logical Key | Name | Colour | Start date | End date | Update date
69+
--- | --- | --- | --- | --- | --- | ---
70+
2 | CHS | Old Cheese | Golden | 20-07-2008 | 31-12-9999 | 20-07-2008
71+
1 | CHS | Cheese | Golden | 05-01-2000 | 19-07-2008 | 05-01-2000
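
The close-and-insert step of type 2 can be sketched as follows, applying the name change to Old Cheese described above. This is a simplified illustration that assumes an open record already exists for the logical key and that date periods are closed one day before the new start date; `apply_type_2` and `OPEN_END` are names introduced here.

```python
from datetime import date, timedelta

# Assumption: open records carry the high end date from the example tables.
OPEN_END = date(9999, 12, 31)


def apply_type_2(rows, logical_key, changes, change_date):
    """Close the open record and insert a new one when a tracked attribute changes."""
    current = next(r for r in rows
                   if r["logical_key"] == logical_key and r["end"] == OPEN_END)
    if all(current[attr] == value for attr, value in changes.items()):
        return rows  # nothing changed, so no new record is needed
    new_row = {**current, **changes,
               "dwh_key": max(r["dwh_key"] for r in rows) + 1,
               "start": change_date, "end": OPEN_END, "updated": change_date}
    current["end"] = change_date - timedelta(days=1)  # close the previous record
    rows.append(new_row)
    return rows


rows = [
    {"dwh_key": 1, "logical_key": "CHS", "name": "Cheese", "colour": "Golden",
     "start": date(2000, 1, 5), "end": OPEN_END, "updated": date(2000, 1, 5)},
]
apply_type_2(rows, "CHS", {"name": "Old Cheese"}, date(2008, 7, 20))
```

Calling the function again with the same name leaves the table unchanged, which keeps repeated loads from creating duplicate records.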
72+
73+
**Type 3** history stores history in a separate attribute. One attribute is added to the record for each previous state that needs to be captured. Typically only the previous state is recorded in the separate attribute. An example would be:
7074
A new record is inserted in the Data Warehouse table on 12-10-2009:
71-
DWH_KEY Logical Key Name Previous name Colour Update date
72-
1 CHS
73-
Cheese
74-
<empty> Golden 12-10-2009
75+
76+
DWH Key | Logical Key | Name | Previous Name | Colour | Update date
77+
--- | --- | --- | --- | --- | ---
78+
1 | CHS | Cheese | NULL | Golden | 12-10-2009
7579

7680
When the name is changed to Old Cheese on 2 February 2010, this leads to the following result:
77-
DWH_KEY Logical Key Name Previous name Colour Update date
78-
1 CHS
79-
Old Cheese
80-
Cheese Golden 02-02-2010
8181

82-
Type 4. This history tracking mechanism operates by using separate tables to store the history. One table contains the most recent version of the record and the history table contains some or all history.
82+
DWH Key | Logical Key | Name | Previous Name | Colour | Update date
83+
--- | --- | --- | --- | --- | ---
84+
1 | CHS | Old Cheese | Cheese | Golden | 02-02-2010
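
A type 3 change shifts the existing value into the previous-value attribute before overwriting it. The sketch below assumes a single Previous Name attribute, as in the example; the `apply_type_3` function name and the dictionary layout are illustrative.

```python
from datetime import date


def apply_type_3(row, new_name, change_date):
    """Move the current name into previous_name, then store the new name."""
    if row["name"] != new_name:
        row["previous_name"] = row["name"]
        row["name"] = new_name
        row["updated"] = change_date
    return row


row = {"dwh_key": 1, "logical_key": "CHS", "name": "Cheese",
       "previous_name": None, "colour": "Golden", "updated": date(2009, 10, 12)}

apply_type_3(row, "Old Cheese", date(2010, 2, 2))
```

With only one previous-value attribute, a second change would overwrite Cheese with Old Cheese, so anything older than the last state is lost.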
85+
86+
**Type 4**. This history tracking mechanism operates by using separate tables to store the history. One table contains the most recent version of the record and the history table contains some or all history.
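
Since no example is given for type 4, the following is one possible sketch of the two-table idea: a dictionary stands in for the current table and a list for the history table. The structure, the `apply_type_4` name and the dates (reused from the type 3 example) are assumptions for illustration only.

```python
from datetime import date


def apply_type_4(current_table, history_table, logical_key, new_values, change_date):
    """Archive the existing version to the history table, then overwrite the current one."""
    existing = current_table.get(logical_key)
    if existing is not None:
        history_table.append({"logical_key": logical_key, **existing})
    current_table[logical_key] = {**(existing or {}), **new_values, "updated": change_date}


current_table = {}   # most recent version per logical key
history_table = []   # earlier versions, oldest first

apply_type_4(current_table, history_table, "CHS",
             {"name": "Cheese", "colour": "Golden"}, date(2009, 10, 12))
apply_type_4(current_table, history_table, "CHS",
             {"name": "Old Cheese"}, date(2010, 2, 2))
```

Queries for the latest state read only the small current table, while the history table can grow independently.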
8387

84-
Type 5. The type 5 method of tracking history uses versions of tables for every period in time. Also known as ‘snapshotting’. No example is supplied since it’s basically a copy of the entire table.
88+
**Type 5**. The type 5 method of tracking history uses versions of tables for every period in time. Also known as ‘snapshotting’. No example is supplied since it’s basically a copy of the entire table.
8589

86-
Type 6 / hybrid. Also known as ‘twin time stamping’, the type 6 approach combines the concepts of type 1-B, type 2 and type 3 mechanisms (1+2+3=6!). In the following example the attribute combination is the name. It consists of two attributes.
90+
**Type 6 / hybrid**. Also known as ‘twin time stamping’, the type 6 approach combines the concepts of type 1-B, type 2 and type 3 mechanisms (1+2+3=6!). In the following example the name is tracked with two attributes: Name holds the value valid for the record's time period and Current Name holds the latest value.
8791
A new record is inserted in the Data Warehouse table.
88-
DWH_KEY Logical Key Name Current Name Colour Start date End date
89-
1 CHS
90-
Cheese
91-
Cheese Golden 05-01-2000 31-12-9999
92+
93+
DWH Key | Logical Key | Name | Current Name | Colour | Start date | End date
94+
--- | --- | --- | --- | --- | --- | ---
95+
1 | CHS | Cheese | Cheese | Golden | 05-01-2000 | 31-12-9999
9296

9397
After some time the name is changed to Old Cheese. This leads to an SCD2 event where a new record is inserted and the old one is closed off. At the same time, the history of the existing type 3 attribute is overwritten by a type 1-B event.
94-
DWH_KEY Logical Key Name Current Name Colour Start date End date
95-
2 CHS
96-
Old Cheese
97-
Old Cheese Golden 20-07-2008 31-12-9999
98-
1 CHS Cheese Old Cheese Golden 05-01-2000 19-07-2008
98+
99+
DWH Key | Logical Key | Name | Current Name | Colour | Start date | End date
100+
--- | --- | --- | --- | --- | --- | ---
101+
2 | CHS | Old Cheese | Old Cheese | Golden | 20-07-2008 | 31-12-9999
102+
1 | CHS | Cheese | Old Cheese | Golden | 05-01-2000 | 19-07-2008
99103

100104
Now you can see the previous record and all related facts against both the current and historical name. When a new change occurs, the following happens:
101-
DWH_KEY Logical Key Name Current Name Colour Start date End date
102-
3 CHS
103-
A+ Cheese
104-
A+ Cheese Golden 13-03-2010 31-12-9999
105-
2 CHS Old Cheese A+ Cheese Golden 20-07-2008 12-03-2010
106-
1 CHS Cheese A+ Cheese Golden 05-01-2000 19-07-2008
105+
106+
DWH Key | Logical Key | Name | Current Name | Colour | Start date | End date
107+
--- | --- | --- | --- | --- | --- | ---
108+
3 | CHS | A+ Cheese | A+ Cheese | Golden | 13-03-2010 | 31-12-9999
109+
2 | CHS | Old Cheese | A+ Cheese | Golden | 20-07-2008 | 12-03-2010
110+
1 | CHS | Cheese | A+ Cheese | Golden | 05-01-2000 | 19-07-2008
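
The two name changes above can be reproduced with a short sketch that combines the type 2 close-and-insert step with a type 1-B overwrite of Current Name. It assumes an open record exists for the logical key; `apply_type_6` and `OPEN_END` are names introduced for this illustration.

```python
from datetime import date, timedelta

# Assumption: open records carry the high end date from the example tables.
OPEN_END = date(9999, 12, 31)


def apply_type_6(rows, logical_key, new_name, change_date):
    """Insert a new record (type 2) and overwrite Current Name on all records (type 1-B)."""
    current = next(r for r in rows
                   if r["logical_key"] == logical_key and r["end"] == OPEN_END)
    new_row = {**current, "name": new_name,
               "dwh_key": max(r["dwh_key"] for r in rows) + 1,
               "start": change_date, "end": OPEN_END}
    current["end"] = change_date - timedelta(days=1)  # close the previous record
    rows.append(new_row)
    for row in rows:
        if row["logical_key"] == logical_key:
            row["current_name"] = new_name  # rewritten across the entire history
    return rows


rows = [
    {"dwh_key": 1, "logical_key": "CHS", "name": "Cheese", "current_name": "Cheese",
     "colour": "Golden", "start": date(2000, 1, 5), "end": OPEN_END},
]
apply_type_6(rows, "CHS", "Old Cheese", date(2008, 7, 20))
apply_type_6(rows, "CHS", "A+ Cheese", date(2010, 3, 13))
```

After both calls, Name differs per record while Current Name is identical on all three, matching the last table above.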
107111

108112
## Implementation Guidelines
109-
Obviously, corresponding records are identified by the logical key.
110-
Type 1-B and the corresponding concept in type 6 usually require separate mappings to update the entire history. Special care from a performance perspective because it has to be avoided that the entire history will be rewritten over and over again when really only the latest situation for that logical key. This mapping will have to aggregate the dataset to merge the latest state per natural key with the target table, and it will have to run after the regular type 2 processes.
111-
Never use NULL in the end date attribute of the most recent record to indicate an open / recent record date. Some databases have troubles handling NULL values and it is best practice to avoid NULL values wherever possible, especially in dimensions.
112-
It is advised to add an ‘actual record indicator’ for quick querying and easy understanding.
113-
Depending on the location in the Data Warehouse either tables or attributes may be defined for a specific history type. For instance, defining a table as SCD type 2 means that a change in every attribute will lead to a new record (and closing an old one). In Data Marts the common approach is often to specify a history type per attribute. So a change in one attribute may lead to an SCD type 2 event, but a change in another one may cause the history to be overwritten.
113+
* Obviously, corresponding records are identified by the logical key.
114+
* Type 1-B and the corresponding concept in Type 6 usually require separate mappings to update the entire history. Special care is needed from a performance perspective: the entire history should not be rewritten over and over again when only the latest situation for that logical key has changed. This mapping will have to aggregate the dataset to merge the latest state per natural key with the target table, and it will have to run after the regular Type 2 processes.
115+
* Avoid using NULL in the end date attribute of the most recent record to indicate an open / recent record date. Some databases have trouble handling NULL values and it is best practice to avoid NULL values wherever possible, especially in dimensions.
116+
* It is advised to add a ‘current record indicator’ for quick querying and easy understanding.
117+
* Depending on the location in the Data Warehouse either tables or attributes may be defined for a specific history type. For instance, defining a table as SCD Type 2 means that a change in every attribute will lead to a new record (and closing an old one). In Data Marts the common approach is often to specify a history type per attribute. So a change in one attribute may lead to an SCD Type 2 event, but a change in another one may cause the history to be overwritten.
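
Two of the guidelines above, the high end date instead of NULL and the current record indicator, can be illustrated with a small sketch. The row layout, the `'Y'`/`'N'` indicator values and the function names are assumptions made for this example.

```python
from datetime import date

# Assumption: 31-12-9999 is the agreed high end date for open records.
OPEN_END = date(9999, 12, 31)


def is_current(row):
    """A record is current when its end date equals the high end date."""
    return row["end"] == OPEN_END


def record_as_of(rows, logical_key, as_of):
    """Return the record valid on the given date, or None if there is none."""
    for row in rows:
        if row["logical_key"] == logical_key and row["start"] <= as_of <= row["end"]:
            return row
    return None


rows = [
    {"dwh_key": 1, "logical_key": "CHS", "start": date(2000, 1, 5), "end": date(2008, 7, 19)},
    {"dwh_key": 2, "logical_key": "CHS", "start": date(2008, 7, 20), "end": OPEN_END},
]

# Derive the indicator once at load time so queries can filter on it directly.
for row in rows:
    row["current_indicator"] = "Y" if is_current(row) else "N"
```

Because the open record has a real end date, the same range comparison works for historical and current lookups without a separate NULL check.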
114118

115119
## Considerations and Consequences
116-
None.
117-
Known uses
118-
Usually in the integration and presentation layer, but applicability is related to the architecture.
120+
Not applicable.
119121

120122
## Related Patterns
121123
* Design Pattern 011 – Kimball – Multiple SCD2 time periods.
