design-patterns/design-pattern-data-vault-hub.md (9 additions, 1 deletion)
@@ -1,3 +1,7 @@
+---
+uid: design-pattern-data-vault-hub-table
+---
+
# Design Pattern - Data Vault - Hub table

---
@@ -9,10 +13,12 @@ This design pattern requires a major update to refresh the content.

## Purpose

-This Design Pattern describes how to load data into Data Vault Hub style tables. It is a specification of the Hub ETL process.
+This design pattern describes how to define, and load data into, Data Vault Hub style tables.

## Motivation

+A Data Vault Hub is the physical implementation of a Core Business Concept. These are the key 'things' that can be meaningfully identified as part of an organization's business processes.
+
Loading data into Hub tables is a relatively straightforward process with a clearly defined location in the architecture: it is applied when loading data from the Staging Layer to the Integration Layer.

The Hub is a vital component of a Data Vault solution, making sure that Data Warehouse keys are distributed properly and at the right point in time.
@@ -40,6 +46,8 @@ During the selection the key distribution approach is implemented to make sure a

## Implementation Guidelines

+Hubs are core business concepts which must be immediately and uniquely identifiable through their name.
+
Loading a Hub table from a specific Staging Layer table is a single, modular, ETL process. This is a requirement for flexibility in loading information as it enables full parallel processing.

Multiple passes of the same source table or file are usually required for various tasks. The first pass will insert new keys in the Hub table; the other passes may be needed to populate the Satellite and Link tables.
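To make the Hub loading step above concrete, a minimal SQL sketch follows. All object names (`stg_customer`, `hub_customer`) and the pre-calculated hash key column are illustrative assumptions rather than part of the pattern text; the point is the "insert previously unseen business keys only" behaviour.

```sql
-- Hypothetical sketch: first pass over a Staging table, inserting only business
-- keys that are not yet present in the Hub. Names and hash key handling are assumed.
INSERT INTO hub_customer (customer_hash_key, customer_business_key, load_datetime, record_source)
SELECT
    stg.customer_hash_key,
    stg.customer_business_key,
    MIN(stg.load_datetime) AS load_datetime,   -- earliest sighting of the key in this load
    MIN(stg.record_source) AS record_source    -- arbitrary pick when multiple sources supply the key
FROM stg_customer stg
LEFT JOIN hub_customer hub
       ON hub.customer_hash_key = stg.customer_hash_key
WHERE hub.customer_hash_key IS NULL            -- key lookup: only keys not yet in the Hub
GROUP BY stg.customer_hash_key, stg.customer_business_key;
```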
# Design Pattern - Data Vault - Loading Link tables

+---
+**NOTE**
+
+This design pattern requires a major update to refresh the content.
+
+---

## Purpose

-This design pattern describes the loading process for 'Link' tables in the Data Vault concept.
+This design pattern describes how to define, and load data into, Data Vault Link style tables.

## Motivation

+A Link table in Data Vault is the physical implementation of a Natural Business Relationship. A Link uniquely identifies a relationship between Core Business Concepts (Hub tables in Data Vault).

The Link concept in Data Vault provides the flexibility of this data modeling approach. Links are sets of (hub) keys that indicate that a relationship between those Hubs has existed at some point in time.

A Link table is similar in concept to the Hub table, but only stores key pairs.

-The structure of the relationship tables (including the Link table) is documented in the Integration Framework A120 – Integration Layer document of the Outline Architecture.
-Link to file Eru Marumaru
-Also known As
-Relationship table

Even though the Data Vault concept allows for adding attributes in the Link table, it is strongly recommended (for flexibility reasons) to only store the generated hash key (meaningless key), the Hub surrogate hash keys and the date/time information. Doing this will ensure compatibility with both stationary facts (time-dependent facts such as balances) and pure transactions.
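As an illustration of this recommendation, a hypothetical DDL sketch of such a narrow Link table is shown below; the table and column names are invented for the example, and only the Link hash key, the Hub hash keys and the load metadata are stored.

```sql
-- Hypothetical sketch of a narrow Link table: no descriptive attributes,
-- only the generated hash key, the referenced Hub hash keys and load metadata.
CREATE TABLE lnk_customer_order (
    customer_order_hash_key  CHAR(32)     NOT NULL,  -- hash of the combined business keys
    customer_hash_key        CHAR(32)     NOT NULL,  -- surrogate hash key of hub_customer
    order_hash_key           CHAR(32)     NOT NULL,  -- surrogate hash key of hub_order
    load_datetime            TIMESTAMP    NOT NULL,  -- copied from the Staging Area
    record_source            VARCHAR(100) NOT NULL,
    PRIMARY KEY (customer_order_hash_key)
);
```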
## Applicability

This pattern is only applicable for loading processes from the Staging Area into the Integration Area and from the Integration Area to the Interpretation Area. The pattern varies slightly for the type of Link table specified (such as Transactional Link, Same-As Link, Hierarchical Link or Low-Value Links) or whether it contains a degenerate attribute. In most cases the Link table will contain the default attributes (Link Key, Hub Keys and metadata attributes), but in the case of a pure transactional Link table it can contain the transaction attributes as well.

## Structure

The ETL process can be described as an 'insert only' set of the unique combination of Data Warehouse keys. Depending on the type of source table, the process will do the following:

Source Area to Integration Area: the process executes a SELECT DISTINCT query on business keys and performs key lookups (outer join) on the corresponding Hub tables to obtain the Hub Data Warehouse keys. The resulting key combination is then checked, using a key lookup against the target Link table, to verify whether that specific combination of Data Warehouse keys already exists. If it exists, the row can be discarded; if not, it can be inserted.

Integration Area to Interpretation Area: the process executes a SELECT DISTINCT query on Data Warehouse keys (likely after combining multiple tables first) and performs a key lookup against the target Link table to verify whether that specific combination of Data Warehouse keys already exists. If it exists, the row can be discarded; if not, it can be inserted.
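A minimal SQL sketch of the Source Area to Integration Area variant is added here for clarity. The object names (`stg_order`, `hub_customer`, `hub_order`, `lnk_customer_order`) are assumptions; the logic shown is the SELECT DISTINCT on business keys, the outer-join key lookups against the Hubs, and the existence check against the target Link table.

```sql
-- Hypothetical sketch: insert only key combinations that do not yet exist in the Link.
INSERT INTO lnk_customer_order (customer_order_hash_key, customer_hash_key, order_hash_key, load_datetime, record_source)
SELECT DISTINCT
    stg.customer_order_hash_key,
    hub_c.customer_hash_key,
    hub_o.order_hash_key,
    stg.load_datetime,
    stg.record_source
FROM stg_order stg
LEFT JOIN hub_customer hub_c                             -- key lookup (outer join) on the Hub
       ON hub_c.customer_business_key = stg.customer_business_key
LEFT JOIN hub_order hub_o                                -- key lookup (outer join) on the Hub
       ON hub_o.order_business_key = stg.order_business_key
LEFT JOIN lnk_customer_order lnk                         -- does this key combination already exist?
       ON lnk.customer_order_hash_key = stg.customer_order_hash_key
WHERE lnk.customer_order_hash_key IS NULL;
```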
The maintenance of the Interpretation Area can also be done as part of an (external) process or through Master Data Management. In this context, Link tables between Integration and Interpretation Area tables are very similar to cross-referencing tables.

The following diagram displays the ETL process for Link tables:

Business Insights > Design Pattern 010 - Data Vault - Loading Link tables > image2015-4-29 16:24:14.png

-This image needs updating to reflect DV 2.0 (hash key) usage Eru Marumaru

In a pure relational Link it is required that a dummy key is available in each corresponding Link-Satellite to complete the timelines. This is handled as part of the Link-Satellite processing, as a Link can contain multiple Link-Satellites. Dummy records only need to be inserted for each driving key, as a view in time across the driving key is ultimately required; inserting a dummy record for every Link key will cause issues in the timeline. This is explained in more detail in the Link-Satellite Design Pattern.

## Implementation Guidelines

Use a single ETL process, module or mapping to load the Link table, thus improving flexibility in processing. Every ETL process should have a distinct function.

Multiple passes of the same source table or file are usually required. The first pass will insert new keys in the Link table; the other passes are needed to populate the Link Satellite tables (if any).

By default, create a sequence / meaningless key for each unique key combination in a Link table.

Link tables can be seen as the relationship equivalent of Hub tables; only distinct new key pairs are inserted.

Date/time information is copied from the Staging Area tables and not generated by the ETL process.

The logic to create the initial (dummy) Satellite record can be implemented as part of the Link ETL process, as a separate ETL process which queries all keys that have no corresponding dummy, or as part of the Link-Satellite ETL process. This depends on the capabilities of the ETL software, since not all tools are able to provide and reuse sequence generators or to write to multiple targets in one process.

The default and arguably most flexible way is to incorporate this concept as part of the Link-Satellite ETL, since it does not require rework when additional Link-Satellites are associated with the Link. This means that each Link-Satellite ETL must check whether a dummy record exists before starting the standard process (and be able to roll back the dummy records if required).
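A simplified sketch of that dummy-record check is included below. Table names are hypothetical, and the restriction of dummy creation to the driving key (described in the Link-Satellite pattern) is deliberately left out to keep the example short.

```sql
-- Hypothetical sketch, run at the start of a Link-Satellite ETL: create an initial
-- (dummy) record for every Link key that has no Link-Satellite record yet.
-- Note: the pattern restricts dummy creation to the driving key; that refinement is omitted here.
INSERT INTO lsat_customer_order (customer_order_hash_key, load_datetime, load_end_datetime, record_source)
SELECT
    lnk.customer_order_hash_key,
    TIMESTAMP '1900-01-01 00:00:00',   -- dummy start to complete the timeline
    TIMESTAMP '9999-12-31 00:00:00',   -- open end date, closed off by later loads
    'DUMMY'
FROM lnk_customer_order lnk
WHERE NOT EXISTS (
    SELECT 1
    FROM lsat_customer_order lsat
    WHERE lsat.customer_order_hash_key = lnk.customer_order_hash_key   -- no record for this key yet
);
```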
Depending on how the Link table is modelled (what kind of relationship it manages), the Link table may contain a relationship type attribute. If a Link table contains multiple, or changing, relationship types, this attribute is moved to the Link-Satellite table.

Ending / closing relationships is always done in the Link-Satellite table, typically using a separate ETL process.

## Considerations and Consequences

Multiple passes on source data are likely to be required. In extreme cases a single source table might be used (branched out) to Hubs, Satellites, Links and Link Satellites.

-Known Uses

This type of ETL process is to be used for loading all Link tables in both the Integration Area as well as the Interpretation Area. This is because the Link table is also used to relate raw (Integration Area) data and cleansed (Interpretation Area) data together.

## Related Patterns

-Design Pattern 006 – Generic – Using Start, Process and End Dates
-Design Pattern 008 – Data Vault – Loading Hub tables
-Design Pattern 009 – Data Vault – Loading Satellite tables
-Design Pattern 013 – Data Vault – Loading Link Satellite tables
-Discussion items (not yet to be implemented or used until final)
-None.
+* Design Pattern - Generic - Using Start, Process and End Dates
+* [Design Pattern - Data Vault - Hub tables](xref:design-pattern-data-vault-hub-table)
design-patterns/design-pattern-generic-types-of-history.md (16 additions, 14 deletions)
@@ -1,17 +1,19 @@
# Design Pattern - Generic - Types of History

## Purpose

This design pattern describes the definitions for the commonly used history storage concepts.

## Motivation

Due to definitions changing over time and different definitions being made by different parties, there usually is a lot of discussion about what exactly constitutes the different types of history. This design pattern aims to define these history types in order to provide a common ground for discussion.

This is also known as:
* SCD; Slowly Changing Dimensions
* Type 1, 2, 3, 4 etc.

## Applicability

Every situation where historical data is needed / stored or a discussion arises.

Depending on the Data Warehouse architecture, this can be needed in a variety of situations, but typically these concepts are applied in the integration and presentation layers of the Data Warehouse.
@@ -20,7 +22,7 @@ The following history types are defined, some distinction is made where there ar

**Type 0**. No change. While uncommon, it has to be mentioned that this passive approach is sometimes implemented when storage space is to be saved or only the initial state has to be preserved.

**Type 1 - A**. Change only the latest record. This implementation of type 1 is used if there is limited interest in keeping a specific kind of history. A good example is spelling errors; only the latest record is updated in that case (if you're not interested in the wrong spelling for data quality purposes).

An example of the first instance of a type 1-A change:

Old situation; a record exists for the logical key CHS (Cheese). The attribute Name is defined as a type 1(A) attribute.
@@ -39,7 +41,7 @@ DWH Key | Logical Key | Name | Colour | Start date | End date | Update date

When at some point (at 24-06-2006) the name is changed to Old Cheese and the Name attribute is defined as type 1(B), the name is overwritten, resulting in the following:

DWH Key | Logical Key | Name | Colour | Start date | End date | Update date
--- | --- | --- | --- | --- | --- | ---
@@ -85,41 +87,41 @@ DWH Key | Logical Key | Name | Previous Name | Colour | Update date

**Type 4**. This history tracking mechanism operates by using separate tables to store the history. One table contains the most recent version of the record and the history table contains some or all history.

**Type 5**. The type 5 method of tracking history uses versions of tables for every period in time. Also known as 'snapshotting'. No example is supplied since it's basically a copy of the entire table.

**Type 6 / hybrid**. Also known as 'twin time stamping', the type 6 approach combines the concepts of type 1-B, type 2 and type 3 mechanisms (1+2+3=6!). In the following example the attribute combination is the name. It consists of two attributes.

A new record is inserted in the Data Warehouse table.

DWH Key | Logical Key | Name | Current Name | Colour | Start date | End date

After some time the name is changed to Old Cheese. This leads to an SCD2 event where a new record is inserted and an old one is closed off. At the same time, the history of the existing type 3 attribute is overwritten by a type 1-B event.

DWH Key | Logical Key | Name | Current Name | Colour | Start date | End date
--- | --- | --- | --- | --- | --- | ---
2 | CHS | Old Cheese | Old Cheese | Golden | 20-07-2008 | 31-12-9999
1 | CHS | Cheese | Old Cheese | Golden | 05-01-2000 | 19-07-2008

Now you can see the previous record and all related facts against both the current and historical name. When a new change occurs, the following happens:

DWH Key | Logical Key | Name | Current Name | Colour | Start date | End date
* Obviously, corresponding records are identified by the logical key.
* Type 1-B and the corresponding concept in Type 6 usually require separate mappings to update the entire history. Special care is needed from a performance perspective, because it has to be avoided that the entire history is rewritten over and over again when really only the latest situation for that logical key needs to be applied. This mapping will have to aggregate the dataset to merge the latest state per natural key with the target table, and it will have to run after the regular Type 2 processes (see the sketch after this list).
* Avoid using NULL in the end date attribute of the most recent record to indicate an open / recent record date. Some databases have trouble handling NULL values and it is best practice to avoid NULL values wherever possible, especially in dimensions.
-* It is advised to add a 'current record indicator' for quick querying and easy understanding.
* Depending on the location in the Data Warehouse either tables or attributes may be defined for a specific history type. For instance, defining a table as SCD Type 2 means that a change in every attribute will lead to a new record (and closing an old one). In Data Marts the common approach is often to specify a history type per attribute, so a change in one attribute may lead to an SCD Type 2 event, but a change in another one may cause the history to be overwritten.
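To illustrate the separate Type 1-B / Type 6 mapping mentioned in the list above, a hypothetical SQL sketch follows. The dimension and column names are invented, the open records are assumed to carry the high end date 31-12-9999 used in the examples, and dialect details (such as updates with correlated subqueries) will vary per database.

```sql
-- Hypothetical sketch of the separate Type 1-B / Type 6 mapping: copy the Name of the
-- current (open) record per logical key onto all historical rows, but only touch rows
-- where the value actually differs, so the history is not rewritten on every run.
UPDATE dim_product
SET current_name = (
        SELECT cur.name
        FROM dim_product cur
        WHERE cur.logical_key = dim_product.logical_key
          AND cur.end_date = DATE '9999-12-31'          -- the open / most recent record
    )
WHERE current_name <> (
        SELECT cur.name
        FROM dim_product cur
        WHERE cur.logical_key = dim_product.logical_key
          AND cur.end_date = DATE '9999-12-31'
    );
```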