
Commit 23a4ce7

committed
Adding warnings on old content.
1 parent bebf05e commit 23a4ce7

5 files changed

+43 -59 lines changed

design-patterns/design-pattern-data-vault-hub.md

Lines changed: 2 additions & 2 deletions
@@ -1,8 +1,8 @@
 ---
-uid: design-pattern-data-vault-hub-table
+uid: design-pattern-data-vault-hub
 ---
 
-# Design Pattern - Data Vault - Hub table
+# Design Pattern - Data Vault - Hub
 
 > [!WARNING]
 > This design pattern requires a major update to refresh the content.
Lines changed: 20 additions & 36 deletions
@@ -1,7 +1,19 @@
-# Design Pattern - Data Vault - Loading Link Satellite tables
+---
+uid: design-pattern-data-vault-link-satellite
+---
+
+# Design Pattern - Data Vault - Link Satellite
+
+> [!WARNING]
+> This design pattern requires a major update to refresh the content.
+
+> [!NOTE]
+> Depending on your philosophy on Data Vault implementation, Link Satellites may not be relevant or applicable.
+> There are very viable considerations to implement a Data Vault model *without* Link-Satellites.
 
 ## Purpose
-This Design Pattern describes how to load data into Link-Satellite tables within a ‘Data Vault’ EDW architecture. In Data Vault, Link-Satellite tables manage the change for relationships over time.
+
+This design pattern describes how to load process data for a Data Vault methodology Link-Satellite. In Data Vault, Link-Satellite tables manage the change for relationships over time.
 
 ## Motivation
 
@@ -16,43 +28,15 @@ This pattern is only applicable for loading data to Link-Satellite tables from:
 * The only difference to the specified ETL template is any business logic required in the mappings towards the Interpretation Area tables.
 
 ## Structure
-Standard Link-Satellites use the Driving Key concept to manage the ending of ‘old’ relationships.
+
+Standard Link-Satellites use the Driving Key concept to manage the ending of ‘old’ relationships.
 
 ## Implementation Guidelines
-Multiple passes of the same source table or file are usually required. The first pass will insert new keys in the Hub table; the other passes are needed to populate the Satellite and Link tables.
-Select all records for the Link Satellite which have more than one open effective date / current record indicator but are not the most recent (because that record does not need to be closed
-WITH MyCTE (<Link SK>, <Driving Key SK>, <Effective Date/Time>, <Expiry Date/Time>, RowVersion)
-AS (
-SELECT
-A.<Link SK>, B.<Driving Key SK>, A.<Effective Date/Time>, A.<Expiry Date/Time>,
-DENSE_RANK() OVER(PARTITION BY B.<Driving Key SK> ORDER BY B.<Link SK>, <Effective Date/Time> ASC) RowVersion
-FROM <Link Sat table> A
-JOIN <Link table> B ON A.<Link SK>=B.<Link SK>
-JOIN (
-SELECT <Driving Key SK>
-FROM <Link Sat table> A
-JOIN <Link table> B ON A.<Link SK>=B.<Link SK>
-WHERE A.<Expiry Date/Time> = '99991231'
-GROUP BY <Driving Key SK>
-HAVING COUNT(*) > 1
-) C ON B.<Driving Key SK> = C.<Driving Key SK>
-)
-SELECT
-BASE.<Link SK>
-,CASE WHEN LAG.<Effective Date/Time> IS NULL THEN '19000101' ELSE BASE.<Effective Date/Time> END AS <Effective Date/Time>
-,CASE WHEN LEAD.<Effective Date/Time> IS NULL THEN '99991231' ELSE LEAD.<Effective Date/Time> END AS <Expiry Date/Time>
-,CASE WHEN LEAD.<Effective Date/Time> IS NULL THEN 'Y' ELSE 'N' END AS <Current Row Indicator>
-FROM MyCTE BASE
-LEFT JOIN MyCTE LEAD ON BASE.<Driving Key SK> = LEAD.<Driving Key SK>
-AND BASE.RowVersion = LEAD.RowVersion-1
-LEFT JOIN MyCTE LAG ON BASE.<Driving Key SK> = LAG.<Driving Key SK>
-AND BASE.RowVersion = LAG.RowVersion+1
-WHERE BASE.<Expiry Date/Time> = '99991231'
 
 ## Considerations and Consequences
-Multiple passes on source data are likely to be required.
 
 ## Related Patterns
-* Design Pattern 006 – Using Start, Process and End Dates
-* Design Pattern 009 – Loading Satellite tables.
-* Design Pattern 010 – Loading Link tables.
+
+* Design Pattern - Using Start, Process and End Dates
+* Design Pattern - Satellite
+* Design Pattern - Link
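
For readers who want a feel for the Driving Key idea that the Link Satellite pattern above mentions (and that the removed SQL implemented), here is a minimal Python sketch. It is an illustration under stated assumptions, not the pattern's own implementation: the `LinkSatRow` type, its field names, and the `end_date_by_driving_key` function are hypothetical, and rows are ordered by effective date only, which simplifies the ordering used in the removed query.

```python
from dataclasses import dataclass
from datetime import date
from itertools import groupby

# Open-ended expiry date, mirroring the '99991231' literal in the removed SQL.
HIGH_DATE = date(9999, 12, 31)


@dataclass(frozen=True)
class LinkSatRow:
    link_sk: int          # hypothetical surrogate key of the Link record
    driving_key_sk: int   # hypothetical surrogate key of the driving Hub
    effective: date       # effective date of the relationship


def end_date_by_driving_key(rows):
    """Return (link_sk, effective, expiry, current_flag) tuples.

    Within each driving key, a row expires when the next row becomes
    effective; only the most recent row per driving key stays open.
    """
    result = []
    ordered = sorted(rows, key=lambda r: (r.driving_key_sk, r.effective))
    for _, group in groupby(ordered, key=lambda r: r.driving_key_sk):
        versions = list(group)
        # Pair every row with its successor; the last row has none.
        for current, following in zip(versions, versions[1:] + [None]):
            is_latest = following is None
            expiry = HIGH_DATE if is_latest else following.effective
            flag = "Y" if is_latest else "N"
            result.append((current.link_sk, current.effective, expiry, flag))
    return result


rows = [
    LinkSatRow(link_sk=10, driving_key_sk=1, effective=date(2020, 1, 1)),
    LinkSatRow(link_sk=11, driving_key_sk=1, effective=date(2021, 6, 1)),
    LinkSatRow(link_sk=20, driving_key_sk=2, effective=date(2020, 3, 1)),
]
closed_rows = end_date_by_driving_key(rows)
```

In this sketch the relationship with `link_sk=10` is closed on the date `link_sk=11` becomes effective, because both share driving key 1, while the single relationship for driving key 2 stays open.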

design-patterns/design-pattern-data-vault-link.md

Lines changed: 4 additions & 8 deletions
Original file line numberDiff line numberDiff line change
@@ -1,15 +1,11 @@
11
---
2-
uid: design-pattern-data-vault-link-table
2+
uid: design-pattern-data-vault-link
33
---
44

5-
# Design Pattern - Data Vault - Loading Link tables
5+
# Design Pattern - Data Vault - Link
66

7-
---
8-
**NOTE**
9-
10-
This design pattern requires a major update to refresh the content.
11-
12-
---
7+
> [!WARNING]
8+
> This design pattern requires a major update to refresh the content.
139
1410
## Purpose
1511

Lines changed: 12 additions & 8 deletions
Original file line numberDiff line numberDiff line change
@@ -1,9 +1,11 @@
11
# Design Pattern - Data Vault - Missing Keys and Placeholders
22

33
## Purpose
4+
45
This Design Pattern documents how to handle situations where there are mismatches with the source business keys leading to values not being available in some cases. Due to the strict approach towards key lookups this would lead to errors in ETL. This is where placeholders are applied. The pattern assumes that source files are always first processed against Hub tables including loading any transactional tables against the Hubs.
56

67
## Motivation
8+
79
This pattern focuses on processing data from dodgy sources that actually contain NULL business keys. When a business key is NULL this should be resolved to a placeholder (dummy Surrogate Key).
810
The reasoning behind this is to prevent overcomplicated error handling while loading data into the (raw) Data Vault; supporting the goal to load everything just as the source system provides it while at the same time preventing losing any records.
911
Also known as
@@ -12,25 +14,27 @@ Early or late arriving data.
 Empty business keys.
 
 ## Applicability
+
 This pattern is only applicable for loading data into the Integration Area tables.
 
 ## Structure
-The Enterprise Data Warehouse architecture specifies that ‘hard’ business rules are implemented on the way into the Data Warehouse (the process from the Staging Area into the Integration Area) whereas ‘soft’ business rules are implemented from the Integration Layer to the Interpretation Area and/or the Presentation Layer (on the way out).
-Using placeholders is a ‘hard’ business rule because no-one can interpret the meaning of a NULL value. SQL cannot deal with NULL values very well and because of this allowing NULL values increases the complexity of the queries against the Integration Area (potentially using outer joins). This is the reason why NULL values are remapped on the way into the Integration Area and ultimately why this kind of (hard) business logic is allowed here.
+
+The Enterprise Data Warehouse architecture specifies that ‘hard’ business rules are implemented on the way into the Data Warehouse (the process from the Staging Area into the Integration Area) whereas ‘soft’ business rules are implemented from the Integration Layer to the Interpretation Area and/or the Presentation Layer (on the way out).
+Using placeholders is a ‘hard’ business rule because no-one can interpret the meaning of a NULL value. SQL cannot deal with NULL values very well and because of this allowing NULL values increases the complexity of the queries against the Integration Area (potentially using outer joins). This is the reason why NULL values are remapped on the way into the Integration Area and ultimately why this kind of (hard) business logic is allowed here.
 
 For example, here are some reasons how NULL values can be presented instead of business keys:
-The source declares them as optional Foreign Keys; for instance when ‘X’ is true, then the business key is populated. Otherwise the business key remains NULL.
+The source declares them as optional Foreign Keys; for instance when ‘X’ is true, then the business key is populated. Otherwise the business key remains NULL.
 The source declares them as required but the declaration is broken or not enforced (there is an error in the source application that allows NULLS when it shouldn't).
 Implementation guidelines
-NULL/unknown/undefined business key values can be mapped to various placeholder surrogate key values (-1 to -7 surrogate key values) with descriptions like Not Applicable’, ‘Unknown or anything that fits the business key domain. The taxonomy usable for most situations is (not all values are applicable in all situations):
-Missing (-1): the root node and supertype of all missing information, it encompasses:
-Missing value (-2): supertype of all missing values. Can be Unknown or Not Applicable:
+NULL/unknown/undefined business key values can be mapped to various placeholder surrogate key values (-1 to -7 surrogate key values) with descriptions like Not Applicable’, ‘Unknown or anything that fits the business key domain. The taxonomy usable for most situations is (not all values are applicable in all situations):
+Missing (-1): the root node and supertype of all missing information, it encompasses:
+Missing value (-2): supertype of all missing values. Can be Unknown or Not Applicable:
 Not Applicable (-3).
 Unknown (-4).
 Missing Attribute/Column (-5): supertype of all missing values due to missing attributes:
 Missing Source Attribute (Non recordable Source) (-6). Used when source fails to supply attribute/column
 Missing Target Attribute (Non recordable DWH Attribute) (-7). Used for temporal data that falls before the deployment of the attribute.
-Deciding between the various types of unknown is a business question that is decided based on how the source database works.
+Deciding between the various types of unknown is a business question that is decided based on how the source database works.
 
 ## Considerations and Consequences
 The Hubs must be pre-populated with the placeholder values (records).
@@ -40,4 +44,4 @@ Known uses
 This type of ETL process is to be used in all Hub or Surrogate Key tables in the Integration Area. The Interpretation Area Hub tables, if used, have similar characteristics but the ETL process contains business logic.
 
 ## Related Patterns
-Design Pattern 008 Data Vault Loading Hub tables.
+Design Pattern 008 Data Vault Loading Hub tables.
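
As an illustration of the placeholder taxonomy in the Missing Keys and Placeholders pattern above, the following Python sketch resolves a NULL business key to a placeholder surrogate key instead of failing the lookup. The surrogate key values (-1 to -7) and their hierarchy come from the pattern text; everything else is an assumption made for the example: the `Placeholder` enum, the `PARENT` mapping, the `resolve_surrogate_key` function, the dictionary standing in for a Hub table, and the choice of Unknown (-4) as the default, which the pattern leaves as a business decision.

```python
from enum import IntEnum


class Placeholder(IntEnum):
    """Placeholder surrogate keys, using the values listed in the pattern."""
    MISSING = -1
    MISSING_VALUE = -2
    NOT_APPLICABLE = -3
    UNKNOWN = -4
    MISSING_ATTRIBUTE = -5
    MISSING_SOURCE_ATTRIBUTE = -6
    MISSING_TARGET_ATTRIBUTE = -7


# Supertype of each node in the taxonomy; MISSING is the root and has none.
PARENT = {
    Placeholder.MISSING_VALUE: Placeholder.MISSING,
    Placeholder.NOT_APPLICABLE: Placeholder.MISSING_VALUE,
    Placeholder.UNKNOWN: Placeholder.MISSING_VALUE,
    Placeholder.MISSING_ATTRIBUTE: Placeholder.MISSING,
    Placeholder.MISSING_SOURCE_ATTRIBUTE: Placeholder.MISSING_ATTRIBUTE,
    Placeholder.MISSING_TARGET_ATTRIBUTE: Placeholder.MISSING_ATTRIBUTE,
}


def resolve_surrogate_key(business_key, hub_lookup, default=Placeholder.UNKNOWN):
    """Map a business key to its Hub surrogate key.

    A NULL (None) business key resolves to a placeholder key rather than
    raising a lookup error, so the record is not lost. The default
    placeholder is an assumption of this sketch, not part of the pattern.
    """
    if business_key is None:
        return int(default)
    # The pattern assumes Hubs are loaded first, so a non-NULL key is
    # expected to exist; a missing key raises KeyError here.
    return hub_lookup[business_key]


# Hypothetical Hub contents: business key -> surrogate key.
hub_customer = {"CUST-001": 1, "CUST-002": 2}

known_key = resolve_surrogate_key("CUST-002", hub_customer)
null_key = resolve_surrogate_key(None, hub_customer)
optional_key = resolve_surrogate_key(
    None, hub_customer, default=Placeholder.NOT_APPLICABLE
)
```

Because the Hubs must be pre-populated with the placeholder records, any key returned by this sketch would still join to a Hub row.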

design-patterns/design-pattern-data-vault-satellite.md

Lines changed: 5 additions & 5 deletions
Original file line numberDiff line numberDiff line change
@@ -1,11 +1,11 @@
1-
# Design Pattern - Data Vault - Satellites table
2-
31
---
4-
**NOTE**
2+
uid: design-pattern-data-vault-satellite
3+
---
54

6-
This design pattern requires a major update to refresh the content.
5+
# Design Pattern - Data Vault - Satellite
76

8-
---
7+
> [!WARNING]
8+
> This design pattern requires a major update to refresh the content.
99
1010
## Purpose
1111
