Commit f9a9a23

Update Design Pattern - Generic - Managing temporality by using Load, Event and Change dates.md
1 parent d33575e commit f9a9a23

1 file changed

+39
-28
lines changed

1000_Design_Patterns/Design Pattern - Generic - Managing temporality by using Load, Event and Change dates.md

Lines changed: 39 additions & 28 deletions
Original file line number | Diff line number | Diff line change
@@ -2,39 +2,50 @@
22

33
## Purpose
44
This Design Pattern describes how and when dates and timestamps are generated by the Data Warehouse.
5-
Motivation
5+
6+
## Motivation
67
Making sure that data is valid for a period of time is a core functionality of the Data Warehouse. Data is only accurate when a specific value in the period of time (interval) reflects the reality of events; this is a fundamental requirement for auditability and lineage. For this reason, care must be taken when using system-generated date/time values to indicate when a value (row) is valid and to track whether the value has been changed in the providing source systems.
8+
79
This is also known as:
8-
Effective and Expiry Dates.
9-
Start and End Dates.
10-
Slowly Changing Dimension valid period.
11-
Event Date/Time
12-
Temporality, Time-Variance
13-
Applicability
10+
* Effective and Expiry Dates.
11+
* Start and End Dates.
12+
* Slowly Changing Dimension valid period.
13+
* Event Date/Time
14+
* Temporality, Time-Variance
15+
16+
## Applicability
1417
This pattern is applicable to all ETL processes in the Data Warehouse, but is more standardised in the Staging Layer and the Integration Layer. Dates are typically inherited or re-calculated into the Presentation Layer as well, but since this is essentially a free-format layer, this decision is left open for individual projects.
15-
Structure
18+
19+
## Structure
1620
The essence of this Design Pattern is to identify a date/time for every data set in the Staging Area and use the same date/time value onwards through the Integration Layer processes. In other words, the registration of date/time in the Data Warehouse is decoupled from when the ETL is actually run for the rest of the Data Warehouse. There are three main date/times that can be considered as standard:
17-
The Load Date/Time, when the change was received by the Data Warehouse environment. This is logged at the first entry of data, usually the Staging or Persistent Staging Areas.
18-
The Event Date/Time, when the change was made in the source / feeding system.(also almost the same as CDC Date/Time, although there are cases where the actual event of change differs from the change detection). This is embedded into source-to-staging (CDC) loading patterns. The method to determine the best Event Date/Time varies from interface to interface but never is a source attribute (except for initial loads).
19-
The ETL Execution Date/Time, when the data integration process was run that touched the record(s). This is implemented in the ETL control Framework (DIRECT).
20-
It is important to clarify there are many other date/time values available which can be used to represent data (changes) over time, these are considered ‘business’ date/times and are treated as regular data attributes. In some cases, these can be identified as a calculated Business Change Date/Time attribute which can sometimes be used for standardisation.
21+
* The Load Date/Time, when the change was received by the Data Warehouse environment. This is logged at the first entry of data, usually the Staging or Persistent Staging Areas.
22+
* The Event Date/Time, when the change was made in the source / feeding system. This is almost the same as the CDC Date/Time, although there are cases where the actual event of change differs from the change detection. This is embedded into source-to-staging (CDC) loading patterns. The method to determine the best Event Date/Time varies from interface to interface but is never a source attribute (except for initial loads).
23+
* The ETL Execution Date/Time, when the data integration process was run that touched the record(s). This is implemented in the ETL control Framework (DIRECT).
24+
25+
It is important to clarify that there are many other date/time values available which can be used to represent data (changes) over time; these are considered ‘business’ date/times and are treated as regular data attributes. In some cases, these can be identified as a calculated Business Change Date/Time attribute which can sometimes be used for standardisation.
26+
2127
The Load Date/Time is the only value that can be systematically relied on to provide a deterministic date/time field which is decoupled from loading processes.
2228
As defined in the Staging Layer Design Pattern (015), one of the most important features of the Staging Layer is capturing the Load Date/Time and identifying the best Event Date/Time for each (type of) interface. As a technical data element, the Load Date/Time ensures that there is a unique version of the record for each time interval.
29+
2330
This is different from the regular ETL process execution and orchestration (DIRECT) controls which manage the information about the ETL process itself. Individual execution date/times are still available for the Data Warehouse by querying the control framework for the unique ETL execution ID associated with the records. By combining these concepts the Data Warehouse can identify both when information was loaded and the correct time interval for which the information is valid.
31+
2432
After the Staging Layer has been loaded the subsequent Integration Area ETL processes use the Load Date/Time value for managing effective and expiry dates (in all their occurrences).
25-
Implementation Guidelines
26-
Always use high precision date/time everywhere (not only date). This makes the implementation more scalable, for instance when moving from batch to intra-day or near real-time scheduling.
27-
The Load Date/Time can be implemented as a trigger / automatically updated field in the Staging Layer.
28-
The Load Date/Time acts as Effective Date/Time in the Integration Layer.
29-
CDC sources are able to provide a more accurate Event Date/Time than regular ETL processes (i.e. push or pull deltas) do. In this case the Event Date/Time for the CDC event, or the condensed (net changes) version of this (see Using CDC’ – Design Pattern 021) can be used as Event Date/Time since this represents a more detailed and uniform date/time for the record, based on the transaction log, to be presented to the Data Warehouse environment.
30-
End dates (Expiry Dates) are redundant and usually not required. These can be calculated from the Effective Date, if required. Given the fact that the Load Date/Time is fundamentally a technical value this is representing a time-variant view from the perspective of data arrival. Business users usually want to see information in a different way, which is why date/time series are often recalculated in the Presentation Layer following a different timeline.
31-
Persistent Staging Area ETL processes typically do not process end-dates for performance purposes. In this case the Load Date/Time is inserted with each record as time interval as part of the Primary Key (thus acting as effective date).
32-
Dependencies that are introduced by the loading strategy must be kept in mind. For instance: Integration Layer processes should be completed before the next Staging Layer ETL runs. If not, the new batch might overwrite any values in the Staging Area table and the wrong date/time will be used. This is one of the reasons why the standard Batch process includes both the Staging and Integration ETL processes.
33-
Decoupling Data Warehouse timelines with process control (system date/times) means that the ETL will provide the same results regardless of when it is executed.
34-
Design Pattern 008 – Data Vault – Loading Hub tables.
35-
Design Pattern 009 – Data Vault – Loading Satellite tables.
36-
Design Pattern 010 – Data Vault – Loading Link tables.
37-
Design Pattern 015 – Generic – Loading Staging Area tables.
38-
Design Pattern 017 – Generic – Loading History Area tables.
39-
Considerations
40-
Related patterns
33+
34+
## Implementation Guidelines
35+
* Always use high precision date/time everywhere (not only date). This makes the implementation more scalable, for instance when moving from batch to intra-day or near real-time scheduling.
36+
* The Load Date/Time can be implemented as a trigger / automatically updated field in the Staging Layer.
37+
* The Load Date/Time acts as Effective Date/Time in the Integration Layer.
38+
* CDC sources are able to provide a more accurate Event Date/Time than regular ETL processes (i.e. push or pull deltas) do. In this case the Event Date/Time for the CDC event, or the condensed (net changes) version of this (see ‘Using CDC’ – Design Pattern 021), can be used as Event Date/Time since it represents a more detailed and uniform date/time for the record, based on the transaction log, to be presented to the Data Warehouse environment.
39+
* End dates (Expiry Dates) are redundant and usually not required. These can be calculated from the Effective Date, if required. Because the Load Date/Time is fundamentally a technical value, it represents a time-variant view from the perspective of data arrival. Business users usually want to see information in a different way, which is why date/time series are often recalculated in the Presentation Layer following a different timeline.
40+
* Persistent Staging Area ETL processes typically do not process end-dates for performance purposes. In this case the Load Date/Time is inserted with each record as part of the Primary Key, marking the start of the time interval (thus acting as effective date).
41+
42+
## Considerations and consequences
43+
* Dependencies that are introduced by the loading strategy must be kept in mind. For instance: Integration Layer processes should be completed before the next Staging Layer ETL runs. If not, the new batch might overwrite any values in the Staging Area table and the wrong date/time will be used. This is one of the reasons why the standard Batch process includes both the Staging and Integration ETL processes.
44+
* Decoupling Data Warehouse timelines from process control (system date/times) means that the ETL will provide the same results regardless of when it is executed.
45+
46+
## Related Patterns
47+
* Design Pattern – Data Vault – Loading Hub tables.
48+
* Design Pattern – Data Vault – Loading Satellite tables.
49+
* Design Pattern – Data Vault – Loading Link tables.
50+
* Design Pattern – Generic – Loading Staging Area tables.
51+
* Design Pattern – Generic – Loading History Area tables.
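
The Implementation Guidelines in this change state that end dates are redundant because they can be calculated from the Effective Date, and that the Load Date/Time acts as the Effective Date/Time. A minimal Python sketch of that calculation follows. It is an illustration only, not part of the pattern: the field names `key`, `load_datetime` and `expiry_datetime`, the `HIGH_DATE` marker and the `derive_expiry` helper are all assumptions made for this example.

```python
from datetime import datetime
from itertools import groupby
from operator import itemgetter

# Hypothetical open-ended expiry marker; the pattern does not prescribe one.
HIGH_DATE = datetime(9999, 12, 31)


def derive_expiry(rows):
    """Add an expiry_datetime to each row, taken from the next load_datetime
    for the same key. The most recent version of a key gets HIGH_DATE.

    rows: iterable of dicts with 'key' and 'load_datetime' (assumed names).
    """
    # Order by key, then by Load Date/Time, so versions are adjacent.
    ordered = sorted(rows, key=itemgetter("key", "load_datetime"))
    result = []
    for _, group in groupby(ordered, key=itemgetter("key")):
        versions = list(group)
        # Pair each version with its successor (None for the last one).
        for current, following in zip(versions, versions[1:] + [None]):
            expiry = following["load_datetime"] if following else HIGH_DATE
            result.append({**current, "expiry_datetime": expiry})
    return result


# Example staged rows; the Load Date/Time was stamped on arrival in Staging.
staged = [
    {"key": "C1", "load_datetime": datetime(2024, 1, 1, 2, 0, 0), "city": "Utrecht"},
    {"key": "C1", "load_datetime": datetime(2024, 1, 5, 2, 0, 0), "city": "Leiden"},
    {"key": "C2", "load_datetime": datetime(2024, 1, 3, 2, 0, 0), "city": "Delft"},
]

timeline = derive_expiry(staged)
```

Because the intervals depend only on the stored Load Date/Time values and never on the system clock at execution time, re-running `derive_expiry` later, or on the same rows in a different order, yields the same timeline. This is consistent with the point under Considerations and consequences that the ETL provides the same results regardless of when it is executed.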
