---
uid: design-pattern-data-vault-link-satellite
---

# Design Pattern - Data Vault - Loading Link Satellite Tables
> [!WARNING]
> This design pattern requires a major update to refresh the content.

> [!NOTE]
> Depending on your philosophy on Data Vault implementation, Link Satellites may not be relevant or applicable.
> There are very viable considerations to implement a Data Vault model *without* Link-Satellites.
## Purpose

This design pattern describes how to load data into Link-Satellite tables within a Data Vault architecture. In Data Vault, Link-Satellite tables manage the change for relationships over time.
## Motivation

This pattern is only applicable for loading data to Link-Satellite tables from:

* The only difference to the specified ETL template is any business logic required in the mappings towards the Interpretation Area tables.
## Structure

Standard Link-Satellites use the Driving Key concept to manage the ending of ‘old’ relationships.
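As a minimal sketch of the Driving Key mechanism, the example below assumes a hypothetical customer–advisor Link-Satellite in which `customer_sk` is the driving key; all table and column names are illustrative. When a new relationship arrives for the same driving key, every older open record is end-dated with the effective date of the record that supersedes it:

```python
import sqlite3

# Hypothetical link-satellite; '9999-12-31' marks an open record.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE lsat_customer_advisor (
    customer_sk  INTEGER,   -- driving key
    advisor_sk   INTEGER,   -- non-driving key
    effective_dt TEXT,
    expiry_dt    TEXT DEFAULT '9999-12-31'
);
INSERT INTO lsat_customer_advisor (customer_sk, advisor_sk, effective_dt)
VALUES (1, 10, '2023-01-01'),
       (1, 20, '2023-06-01');  -- same driving key, new relationship
""")

# End-date every record that has a newer record for the same driving key.
conn.execute("""
UPDATE lsat_customer_advisor
SET expiry_dt = (
    SELECT MIN(newer.effective_dt)
    FROM lsat_customer_advisor AS newer
    WHERE newer.customer_sk = lsat_customer_advisor.customer_sk
      AND newer.effective_dt > lsat_customer_advisor.effective_dt
)
WHERE EXISTS (
    SELECT 1 FROM lsat_customer_advisor AS newer
    WHERE newer.customer_sk = lsat_customer_advisor.customer_sk
      AND newer.effective_dt > lsat_customer_advisor.effective_dt
);
""")

rows = conn.execute(
    "SELECT advisor_sk, expiry_dt FROM lsat_customer_advisor ORDER BY effective_dt"
).fetchall()
print(rows)  # the older relationship is closed with the newer effective date
```

The key design point is that the `WHERE` clauses compare records on the driving key only; the non-driving part of the relationship changing is precisely what triggers the end-dating.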
## Implementation Guidelines

Multiple passes of the same source table or file are usually required. The first pass will insert new keys in the Hub table; the other passes are needed to populate the Satellite and Link tables.

Select all records for the Link-Satellite which have more than one open effective date / current record indicator, but are not the most recent (because the most recent record does not need to be closed):

```sql
WITH MyCTE (<LinkSK>, <DrivingKeySK>, <Effective Date/Time>, <Expiry Date/Time>, RowVersion)
```
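The CTE fragment above is truncated in the source; one plausible shape of the selection it describes is sketched below, using `ROW_NUMBER()` per driving key to skip the most recent open record. Table and column names are placeholders mirroring the fragment (requires SQLite 3.25+ for window functions):

```python
import sqlite3

# Hypothetical link-satellite rows; '9999-12-31' marks an open record.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE lsat (
    link_sk        INTEGER,
    driving_key_sk INTEGER,
    effective_dt   TEXT,
    expiry_dt      TEXT
);
INSERT INTO lsat VALUES
    (100, 1, '2023-01-01', '9999-12-31'),  -- open, superseded
    (101, 1, '2023-06-01', '9999-12-31'),  -- open, most recent
    (200, 2, '2023-03-01', '9999-12-31');  -- open, only record
""")

# Number open records per driving key, newest first, and select every
# record except the most recent one -- these are the rows to close.
to_close = conn.execute("""
WITH MyCTE AS (
    SELECT link_sk, driving_key_sk, effective_dt, expiry_dt,
           ROW_NUMBER() OVER (
               PARTITION BY driving_key_sk
               ORDER BY effective_dt DESC
           ) AS RowVersion
    FROM lsat
    WHERE expiry_dt = '9999-12-31'          -- open records only
)
SELECT link_sk FROM MyCTE WHERE RowVersion > 1
""").fetchall()
print(to_close)  # only link 100 needs to be end-dated
```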
# Design Pattern - Data Vault - Missing Keys and Placeholders
## Purpose

This Design Pattern documents how to handle situations where there are mismatches with the source business keys, leading to values not being available in some cases. Due to the strict approach towards key lookups this would lead to errors in ETL; this is where placeholders are applied. The pattern assumes that source files are always first processed against Hub tables, including loading any transactional tables against the Hubs.
## Motivation

This pattern focuses on processing data from sources that contain NULL business keys. When a business key is NULL, it should be resolved to a placeholder (a dummy Surrogate Key).

The reasoning behind this is to prevent overcomplicated error handling while loading data into the (raw) Data Vault, supporting the goal to load everything exactly as the source system provides it while at the same time preventing the loss of any records.
Also known as:

* Early or late arriving data.
* Empty business keys.
## Applicability

This pattern is only applicable for loading data into the Integration Area tables.
## Structure

The Enterprise Data Warehouse architecture specifies that ‘hard’ business rules are implemented on the way into the Data Warehouse (the process from the Staging Area into the Integration Area), whereas ‘soft’ business rules are implemented from the Integration Layer to the Interpretation Area and/or the Presentation Layer (on the way out).

Using placeholders is a ‘hard’ business rule because no-one can interpret the meaning of a NULL value. SQL cannot deal with NULL values very well, and allowing NULL values therefore increases the complexity of the queries against the Integration Area (potentially requiring outer joins). This is why NULL values are remapped on the way into the Integration Area, and ultimately why this kind of (hard) business logic is allowed here.

For example, here are some reasons why NULL values can be presented instead of business keys:

* The source declares them as optional Foreign Keys; for instance, when ‘X’ is true the business key is populated, otherwise the business key remains NULL.
* The source declares them as required, but the declaration is broken or not enforced (there is an error in the source application that allows NULLs when it shouldn't).
## Implementation Guidelines

NULL, unknown or undefined business key values can be mapped to various placeholder surrogate key values (the -1 to -7 surrogate key values) with descriptions such as ‘Not Applicable’, ‘Unknown’ or anything that fits the business key domain. A taxonomy usable for most situations is as follows (not all values are applicable in all situations):

* Missing (-1): the root node and supertype of all ‘missing’ information; it encompasses:
  * Missing value (-2): supertype of all missing values. Can be ‘Unknown’ or ‘Not Applicable’:
    * Not Applicable (-3).
    * Unknown (-4).
  * Missing Attribute/Column (-5): supertype of all missing values due to missing attributes:
    * Missing Source Attribute (Non-recordable Source) (-6). Used when the source fails to supply the attribute/column.
    * Missing Target Attribute (Non-recordable DWH Attribute) (-7). Used for temporal data that falls before the deployment of the attribute.

Deciding between the various types of ‘unknown’ is a business question that is decided based on how the source database works.
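A minimal sketch of the key lookup this taxonomy implies is shown below. The hub contents and the names `hub_customer` and `lookup_customer_sk` are purely illustrative; the point is that a NULL or empty business key resolves to a placeholder surrogate key instead of failing the load:

```python
MISSING = -1  # root placeholder for all 'missing' information

# Illustrative hub lookup: business key -> surrogate key.
hub_customer = {"CUST-001": 12, "CUST-002": 13}

def lookup_customer_sk(business_key):
    """Resolve a business key to a surrogate key without failing the load."""
    if business_key is None or str(business_key).strip() == "":
        return MISSING  # NULL/empty business key maps to the placeholder
    # Source files are always first processed against the Hub tables,
    # so a non-NULL key is expected to resolve.
    return hub_customer[business_key]

print(lookup_customer_sk("CUST-001"))  # 12
print(lookup_customer_sk(None))        # -1
```

Which of the -1 to -7 values the fallback returns is, as noted above, a business decision driven by how the source database behaves.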
## Considerations and Consequences

The Hubs must be pre-populated with the placeholder values (records).
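One way to pre-populate a Hub with the placeholder records is sketched below; the hub layout is a hypothetical example, and the descriptions follow the -1 to -7 taxonomy from the Implementation Guidelines:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
CREATE TABLE hub_customer (
    customer_sk           INTEGER PRIMARY KEY,
    customer_business_key TEXT
)""")

# Placeholder records, inserted once before any regular loads run.
placeholders = [
    (-1, "Missing"),
    (-2, "Missing Value"),
    (-3, "Not Applicable"),
    (-4, "Unknown"),
    (-5, "Missing Attribute/Column"),
    (-6, "Missing Source Attribute"),
    (-7, "Missing Target Attribute"),
]
conn.executemany("INSERT INTO hub_customer VALUES (?, ?)", placeholders)

count = conn.execute(
    "SELECT COUNT(*) FROM hub_customer WHERE customer_sk < 0"
).fetchone()[0]
print(count)  # 7
```

Pre-populating the placeholders guarantees that key lookups resolving to -1 through -7 always find a matching Hub record, so referential integrity holds without outer joins.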
## Known Uses

This type of ETL process is to be used in all Hub or Surrogate Key tables in the Integration Area. The Interpretation Area Hub tables, if used, have similar characteristics, but the ETL process contains business logic.
## Related Patterns

Design Pattern 008 – Data Vault – Loading Hub tables.