Commit e4a4e36

committed
Further tidy-ups
1 parent d970392 commit e4a4e36

14 files changed: 189 additions, 286 deletions

docs/design-patterns/design-pattern-data-vault-hub.md

Lines changed: 1 addition & 1 deletion
@@ -57,7 +57,7 @@ By default the DISTINCT function is executed on database level to reserve resour
 The logic to create the initial (dummy) Satellite record can both be implemented as part of the Hub ETL process, as a separate ETL process which queries all keys that have no corresponding dummy or as part of the Satellite ETL process. This depends on the capabilities of the ETL software since not all are able to provide and reuse sequence generators or able to write to multiple targets in one process. The default and arguably most flexible way is to incorporate this concept as part of the Satellite ETL since it does not require rework when additional Satellites are associated with the Hub. This means that each Satellite ETL must perform a check if a dummy record exists before starting the standard process (and be able to roll back the dummy records if required).

-When modeling the Hub tables try to be conservative when defining the business keys. Not every foreign key in the source indicates a business key and therefore a Hub table. A true business key is a concept that is known and used throughout the organisation (and systems) and is self-standing and meaningful.
+When modeling the Hub tables try to be conservative when defining the business keys. Not every foreign key in the source indicates a business key and therefore a Hub table. A true business key is a concept that is known and used throughout the organisation (and systems) and is self-standing and meaningful.

 To cater for a situation where multiple Load Date / Time stamp values exist for a single business key, the minimum Load Date / Time stamp should be the value passed through with the HUB record. This can be implemented in ETL logic, or passed through to the database. When implemented at a database level, instead of using a SELECT DISTINCT, using the MIN function with a GROUP BY the business key can achieve both a distinct selection, and minimum Load Date / Time stamp in one step.
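The MIN with GROUP BY approach described in the context line above can be sketched as follows. This is a minimal illustration rather than the pattern's reference implementation: the table `stg_customer`, its columns, and the function `select_hub_candidates` are hypothetical names, and an in-memory SQLite database stands in for the real staging area.

```python
import sqlite3

# Hypothetical staging table; names and data are illustrative assumptions.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE stg_customer (customer_code TEXT, load_datetime TEXT);
INSERT INTO stg_customer VALUES
  ('C1', '2020-01-02 00:00:00'),
  ('C1', '2020-01-01 00:00:00'),
  ('C2', '2020-01-03 00:00:00');
""")


def select_hub_candidates(connection):
    """Return one row per business key with its earliest load date/time."""
    # GROUP BY gives the distinct business keys; MIN picks the earliest
    # Load Date / Time stamp for each key in the same pass.
    return connection.execute("""
        SELECT customer_code, MIN(load_datetime) AS load_datetime
        FROM stg_customer
        GROUP BY customer_code
        ORDER BY customer_code
    """).fetchall()


rows = select_hub_candidates(conn)
print(rows)
```

The timestamps are stored as ISO-8601 text here, so the textual minimum coincides with the chronological one; a real staging table would normally use a native date/time type.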

docs/design-patterns/design-pattern-data-vault-link-satellite-driving-key.md

Lines changed: 1 addition & 1 deletion
@@ -32,7 +32,7 @@ This pattern is applicable for processing data for a Link-Satellite table, or it
 ## Structure

-Standard Link-Satellites use the Driving Key concept to manage the ending of old relationships.
+Standard Link-Satellites use the Driving Key concept to manage the ending of old relationships.

 ## Implementation Guidelines

docs/design-patterns/design-pattern-data-vault-link-satellite.md

Lines changed: 1 addition & 1 deletion
@@ -29,7 +29,7 @@ This pattern is only applicable for loading data to Link-Satellite tables from:
 ## Structure

-Standard Link-Satellites use the Driving Key concept to manage the ending of old relationships.
+Standard Link-Satellites use the Driving Key concept to manage the ending of old relationships.

 ## Implementation Guidelines

docs/design-patterns/design-pattern-data-vault-missing-keys-and-placeholders.md

Lines changed: 8 additions & 8 deletions
@@ -19,22 +19,22 @@ This pattern is only applicable for loading data into the Integration Area table
 ## Structure

-The Enterprise Data Warehouse architecture specifies that hard business rules are implemented on the way into the Data Warehouse (the process from the Staging Area into the Integration Area) whereas soft business rules are implemented from the Integration Layer to the Interpretation Area and/or the Presentation Layer (on the way out).
-Using placeholders is a hard business rule because no-one can interpret the meaning of a NULL value. SQL cannot deal with NULL values very well and because of this allowing NULL values increases the complexity of the queries against the Integration Area (potentially using outer joins). This is the reason why NULL values are remapped on the way into the Integration Area and ultimately why this kind of (hard) business logic is allowed here.
+The Enterprise Data Warehouse architecture specifies that hard business rules are implemented on the way into the Data Warehouse (the process from the Staging Area into the Integration Area) whereas soft business rules are implemented from the Integration Layer to the Interpretation Area and/or the Presentation Layer (on the way out).
+Using placeholders is a hard business rule because no-one can interpret the meaning of a NULL value. SQL cannot deal with NULL values very well and because of this allowing NULL values increases the complexity of the queries against the Integration Area (potentially using outer joins). This is the reason why NULL values are remapped on the way into the Integration Area and ultimately why this kind of (hard) business logic is allowed here.

 For example, here are some reasons how NULL values can be presented instead of business keys:
-The source declares them as optional Foreign Keys; for instance when �X� is true, then the business key is populated. Otherwise the business key remains NULL.
+The source declares them as optional Foreign Keys; for instance when X is true, then the business key is populated. Otherwise the business key remains NULL.
 The source declares them as required but the declaration is broken or not enforced (there is an error in the source application that allows NULLS when it shouldn't).
 Implementation guidelines
-NULL/unknown/undefined business key values can be mapped to various placeholder surrogate key values (-1 to -7 surrogate key values) with descriptions like Not Applicable�, �Unknown or anything that fits the business key domain. The taxonomy usable for most situations is (not all values are applicable in all situations):
-Missing (-1): the root node and supertype of all missing information, it encompasses:
-Missing value (-2): supertype of all missing values. Can be Unknown or Not Applicable:
+NULL/unknown/undefined business key values can be mapped to various placeholder surrogate key values (-1 to -7 surrogate key values) with descriptions like Not Applicable, Unknown or anything that fits the business key domain. The taxonomy usable for most situations is (not all values are applicable in all situations):
+Missing (-1): the root node and supertype of all missing information, it encompasses:
+Missing value (-2): supertype of all missing values. Can be Unknown or Not Applicable:
 Not Applicable (-3).
 Unknown (-4).
 Missing Attribute/Column (-5): supertype of all missing values due to missing attributes:
 Missing Source Attribute (Non recordable Source) (-6). Used when source fails to supply attribute/column
 Missing Target Attribute (Non recordable DWH Attribute) (-7). Used for temporal data that falls before the deployment of the attribute.
-Deciding between the various types of unknown is a business question that is decided based on how the source database works.
+Deciding between the various types of unknown is a business question that is decided based on how the source database works.

 ## Considerations and Consequences
 The Hubs must be pre-populated with the placeholder values (records).
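The placeholder taxonomy in this hunk could be expressed in code roughly as below. This is a hedged sketch, not part of the committed documentation: the names `Placeholder`, `PARENT`, `is_a` and `resolve_hub_key` are hypothetical, the Hub lookup is modelled as a plain dictionary, and defaulting a NULL key to Unknown (-4) is only an assumption, since the text leaves that choice to the business.

```python
from enum import IntEnum
from typing import Mapping, Optional


class Placeholder(IntEnum):
    """Placeholder surrogate keys (-1 to -7) as listed in the taxonomy."""
    MISSING = -1
    MISSING_VALUE = -2
    NOT_APPLICABLE = -3
    UNKNOWN = -4
    MISSING_ATTRIBUTE = -5
    MISSING_SOURCE_ATTRIBUTE = -6
    MISSING_TARGET_ATTRIBUTE = -7


# Supertype of each placeholder; MISSING is the root and has no parent.
PARENT = {
    Placeholder.MISSING_VALUE: Placeholder.MISSING,
    Placeholder.NOT_APPLICABLE: Placeholder.MISSING_VALUE,
    Placeholder.UNKNOWN: Placeholder.MISSING_VALUE,
    Placeholder.MISSING_ATTRIBUTE: Placeholder.MISSING,
    Placeholder.MISSING_SOURCE_ATTRIBUTE: Placeholder.MISSING_ATTRIBUTE,
    Placeholder.MISSING_TARGET_ATTRIBUTE: Placeholder.MISSING_ATTRIBUTE,
}


def is_a(value: Placeholder, supertype: Placeholder) -> bool:
    """True if `value` equals `supertype` or sits below it in the taxonomy."""
    current: Optional[Placeholder] = value
    while current is not None:
        if current == supertype:
            return True
        current = PARENT.get(current)
    return False


def resolve_hub_key(
    business_key: Optional[str],
    hub_lookup: Mapping[str, int],
    default: Placeholder = Placeholder.UNKNOWN,  # assumption, a business decision
) -> int:
    """Return the surrogate key, remapping a NULL business key to a placeholder."""
    if business_key is None:
        return int(default)
    return hub_lookup[business_key]
```

For example, `resolve_hub_key(None, {"C1": 1})` yields the Unknown placeholder key, while `resolve_hub_key("C1", {"C1": 1})` returns the regular surrogate key. The placeholder rows themselves still have to exist in the Hub, as the context line above states.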
@@ -44,4 +44,4 @@ Known uses
 This type of ETL process is to be used in all Hub or Surrogate Key tables in the Integration Area. The Interpretation Area Hub tables, if used, have similar characteristics but the ETL process contains business logic.

 ## Related Patterns
-Design Pattern 008 Data Vault Loading Hub tables.
+Design Pattern 008 Data Vault Loading Hub tables.
