
Commit d90d9be

Updates

committed · 1 parent 2d3cbc4 · commit d90d9be

14 files changed: +120 / -141 lines changed

Design_Patterns/Design Pattern - Data Vault - Loading Hub tables.md

Lines changed: 12 additions & 13 deletions
@@ -1,25 +1,24 @@
  # Design Pattern - Data Vault - Loading Hub tables

  ## Purpose
- This Design Pattern describes how to load data into Data Vault Hub style entities.
+ This Design Pattern describes how to load data into Data Vault Hub style tables. It is a specification of the Hub ETL process.

  ## Motivation
- Loading data into Hub tables is a relatively straightforward process with a set location in the architecture: it is applied when loading data from the Staging Layer to the Integration Layer. It is a vital component of the Data Warehouse architecture, making sure that Data Warehouse keys are distributed properly and at the right point in time.
+ Loading data into Hub tables is a relatively straightforward process with a clearly defined location in the architecture: it is applied when loading data from the Staging Layer to the Integration Layer.

- Decoupling key distribution and historical information is an essential requirement for reducing dependencies in the loading process and enabling flexible storage design in the Data Warehouse.
+ The Hub is a vital component of a Data Vault solution, making sure that Data Warehouse keys are distributed properly and at the right point in time.

- This pattern specifies how the Hub ETL process works and why it is important to follow.
-
- In a Data Vault based Enterprise Data Warehouse solution, the Hub tables (and corresponding ETL) are the only places where Data Warehouse keys are distributed.
+ Decoupling key distribution and managing historical information (changes over time) is essential to reduce loading dependencies. It also simplifies (flexible) storage design in the Data Warehouse.

  Also known as:

+ - Core Business Concept (Ensemble modelling)
  - Hub (Data Vault modelling concept)
- - Surrogate Key (SK) or Hash Key (HSH) distribution
+ - Surrogate Key (SK) or Hash Key (HSH) distribution, as commonly used implementations of the concept
  - Data Warehouse key distribution

  ## Applicability
- This pattern is applicable for the process of loading from the Staging Layer into the Integration Area Hub tables. It is used in all Hub in the Integration Layer. Derived (Business Data Vault) Hub tables follow the same pattern, but with business logic applied.
+ This pattern is applicable for the process of loading from the Staging Layer into Hub tables. It is used in all Hubs in the Integration Layer. Derived (Business Data Vault) Hub ETL processes follow the same pattern.

  ## Structure
  A Hub table contains the unique list of business keys, and the corresponding Hub ETL process can be described as an ‘insert only’ of the unique business keys that are not yet in the target Hub.
@@ -44,15 +43,15 @@ The logic to create the initial (dummy) Satellite record can both be implemented
  When modeling the Hub tables try to be conservative when defining the business keys. Not every foreign key in the source indicates a business key and therefore a Hub table. A true business key is a concept that is known and used throughout the organisation (and systems) and is ‘self-standing’ and meaningful.

- To cater for a situation where multiple OMD_INSERT_DATETIME values exist for a single business key, the minimum OMD_INSERT_DATETIME should be the value passed through with the HUB record. This can be implemented in ETL logic, or passed through to the database. When implemented at a database level, instead of using a SELECT DISTINCT, using the MIN function with a GROUP BY the business key can achieve both a distinct selection, and minimum OMD_INSERT_DATETIME in one step.
+ To cater for a situation where multiple Load Date / Time stamp values exist for a single business key, the minimum Load Date / Time stamp should be the value passed through with the Hub record. This can be implemented in ETL logic, or pushed down to the database. When implemented at database level, using the MIN function with a GROUP BY on the business key (instead of a SELECT DISTINCT) achieves both a distinct selection and the minimum Load Date / Time stamp in one step.

  ## Considerations and Consequences
  Multiple passes on the same Staging Layer data set are likely to be required: once for the Hub table(s) but also for any corresponding Link and Satellite tables.

  Defining Hub ETL processes as atomic modules, as defined in this Design Pattern, means that many Staging Layer tables load data to the same central Hub table. All processes will be very similar with the only difference being the mapping between the Staging Layer business key attribute and the target Hub business key counterpart.

  ## Related Patterns
- Design Pattern 006 – Generic – Using Start, Process and End Dates
- Design Pattern 009 – Data Vault – Loading Satellite tables
- Design Pattern 010 – Data Vault – Loading Link tables
- Design Pattern 023 – Data Vault – Missing keys and placeholders
+ * Design Pattern 006 – Generic – Using Start, Process and End Dates
+ * Design Pattern 009 – Data Vault – Loading Satellite tables
+ * Design Pattern 010 – Data Vault – Loading Link tables
+ * Design Pattern 023 – Data Vault – Missing keys and placeholders
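The pattern above describes the Hub ETL as an ‘insert only’ of business keys that are not yet in the target Hub, carrying the minimum Load Date / Time stamp per key. A minimal SQL sketch of that idea follows; the table and column names (stg_customer, hub_customer, customer_number, load_dts, record_source) are hypothetical placeholders rather than part of the pattern, and a surrogate or hash key column would be added depending on the chosen key distribution approach.

-- Sketch only: insert-only Hub load, using MIN ... GROUP BY instead of SELECT DISTINCT.
INSERT INTO hub_customer (customer_number, load_dts, record_source)
SELECT
  stg.customer_number,
  MIN(stg.load_dts)      AS load_dts,        -- earliest Load Date / Time stamp per business key
  MIN(stg.record_source) AS record_source    -- deterministic pick if multiple sources occur
FROM stg_customer stg
WHERE NOT EXISTS (                           -- only business keys not yet present in the Hub
  SELECT 1
  FROM hub_customer hub
  WHERE hub.customer_number = stg.customer_number
)
GROUP BY stg.customer_number;                -- one row per business key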

Design_Patterns/Design Pattern - Data Vault - Loading Link Satellite tables.md

Lines changed: 17 additions & 20 deletions
@@ -5,57 +5,54 @@ This Design Pattern describes how to load data into Link-Satellite tables within
  ## Motivation

- Also known as
- Link-Satellite (Data Vault modelling concept).
- History or INT tables.
+ To provide a generic approach for loading Link Satellites.

  ## Applicability

  This pattern is only applicable for loading data to Link-Satellite tables from:
- The Staging Area into the Integration Area.
- The Integration Area into the Interpretation Area.
- The only difference to the specified ETL template is any business logic required in the mappings towards the Interpretation Area tables.
+ * The Staging Area into the Integration Area.
+ * The Integration Area into the Interpretation Area.
+ * The only difference to the specified ETL template is any business logic required in the mappings towards the Interpretation Area tables.

  ## Structure
  Standard Link-Satellites use the Driving Key concept to manage the ending of ‘old’ relationships.

  ## Implementation Guidelines
  Multiple passes of the same source table or file are usually required. The first pass will insert new keys in the Hub table; the other passes are needed to populate the Satellite and Link tables.
  Select all records for the Link Satellite which have more than one open effective date / current record indicator but are not the most recent (because that record does not need to be closed):
- WITH MyCTE (<Link SK>, <Driving Key SK>, OMD_EFFECTIVE_DATE, OMD_EXPIRY_DATE, RowVersion)
+ WITH MyCTE (<Link SK>, <Driving Key SK>, <Effective Date/Time>, <Expiry Date/Time>, RowVersion)
  AS (
  SELECT
- A.<Link SK>, B.<Driving Key SK>, A.OMD_EFFECTIVE_DATE, A.OMD_EXPIRY_DATE,
- DENSE_RANK() OVER(PARTITION BY B.<Driving Key SK> ORDER BY B.<Link SK>, OMD_EFFECTIVE_DATE ASC) RowVersion
+ A.<Link SK>, B.<Driving Key SK>, A.<Effective Date/Time>, A.<Expiry Date/Time>,
+ DENSE_RANK() OVER(PARTITION BY B.<Driving Key SK> ORDER BY B.<Link SK>, <Effective Date/Time> ASC) RowVersion
  FROM <Link Sat table> A
  JOIN <Link table> B ON A.<Link SK>=B.<Link SK>
  JOIN (
  SELECT <Driving Key SK>
  FROM <Link Sat table> A
  JOIN <Link table> B ON A.<Link SK>=B.<Link SK>
- WHERE A.OMD_EXPIRY_DATE = '99991231'
+ WHERE A.<Expiry Date/Time> = '99991231'
  GROUP BY <Driving Key SK>
  HAVING COUNT(*) > 1
  ) C ON B.<Driving Key SK> = C.<Driving Key SK>
  )
  SELECT
  BASE.<Link SK>
- ,CASE WHEN LAG.OMD_EFFECTIVE_DATE IS NULL THEN '19000101' ELSE BASE.OMD_EFFECTIVE_DATE END AS OMD_EFFECTIVE_DATE
- ,CASE WHEN LEAD.OMD_EFFECTIVE_DATE IS NULL THEN '99991231' ELSE LEAD.OMD_EFFECTIVE_DATE END AS OMD_EXPIRY_DATE
- ,CASE WHEN LEAD.OMD_EFFECTIVE_DATE IS NULL THEN 'Y' ELSE 'N' END AS OMD_CURRENT_RECORD_INDICATOR
+ ,CASE WHEN LAG.<Effective Date/Time> IS NULL THEN '19000101' ELSE BASE.<Effective Date/Time> END AS <Effective Date/Time>
+ ,CASE WHEN LEAD.<Effective Date/Time> IS NULL THEN '99991231' ELSE LEAD.<Effective Date/Time> END AS <Expiry Date/Time>
+ ,CASE WHEN LEAD.<Effective Date/Time> IS NULL THEN 'Y' ELSE 'N' END AS <Current Row Indicator>
  FROM MyCTE BASE
  LEFT JOIN MyCTE LEAD ON BASE.<Driving Key SK> = LEAD.<Driving Key SK>
  AND BASE.RowVersion = LEAD.RowVersion-1
  LEFT JOIN MyCTE LAG ON BASE.<Driving Key SK> = LAG.<Driving Key SK>
  AND BASE.RowVersion = LAG.RowVersion+1
- WHERE BASE.OMD_EXPIRY_DATE = '99991231'
+ WHERE BASE.<Expiry Date/Time> = '99991231'
  ## Considerations and Consequences
  Multiple passes on source data are likely to be required.
- Known uses

  ## Related Patterns
- Design Pattern 006 – Using Start, Process and End Dates
- Design Pattern 009 – Loading Satellite tables.
- Design Pattern 010 – Loading Link tables.
- Discussion items (not yet to be implemented or used until final)
- None.
+ * Design Pattern 006 – Using Start, Process and End Dates
+ * Design Pattern 009 – Loading Satellite tables.
+ * Design Pattern 010 – Loading Link tables.
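The Driving Key query above only selects the recalculated effective and expiry dates; writing them back to the Link-Satellite is left to the ETL process. A minimal sketch of that update step is shown below, using SQL Server style UPDATE ... FROM syntax and assuming a hypothetical Link-Satellite lsat_customer_contract plus a view vw_lsat_recalculated_expiry that wraps the selection above; none of these names are part of the pattern.

-- Sketch only: close the currently open Link-Satellite rows with the recalculated expiry dates.
UPDATE lsat
SET lsat.expiry_datetime = calc.expiry_datetime,
    lsat.current_row_indicator = calc.current_row_indicator
FROM lsat_customer_contract AS lsat
JOIN vw_lsat_recalculated_expiry AS calc     -- hypothetical view wrapping the Driving Key query above
  ON calc.link_sk = lsat.link_sk
WHERE lsat.expiry_datetime = '99991231'      -- only rows that are still open
  AND calc.expiry_datetime <> '99991231';    -- and that are superseded by a more recent relationship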

Design_Patterns/Design Pattern - Data Vault - Loading Satellite tables.md

Lines changed: 3 additions & 3 deletions
@@ -32,10 +32,10 @@ Figure 2: Dependencies
  Multiple passes of the same source table or file are usually required. The first pass will insert new keys in the Hub table; the other passes are needed to populate the Satellite and Link tables.
  The process in Figure 1 shows the entire ETL in one single process. For specific tools this way of developing ETL might be relatively inefficient. Therefore, the process can also be broken up into two separate mappings; one for inserts and one for updates. Logically the same actions will be executed, but physically two separate mappings can be used. This can be done in two ways:
  Follow the same logic, with the same selects, but place filters for the update and insert branches. This leads to an extra pass on the source table, at the possible benefit of running the processes in parallel.
- Only run the insert branch and automatically update the end dates based on the existing information in the Satellite. This process selects all records in the Satellite which have more than one open OMD_EXPIRY_DATE (this is the case after running the insert branch separately), sorts the records in order and uses the OMD_EFFECTIVE_DATE from the previous record to close the next one. This introduces a dependency between the insert and update branch, but will run faster. An extra benefit is that this also closes off any previous records that were left open. As sample query for this selection is:
- SELECT satellite.DWH_ID, satellite.OMD_EXPIRY_DATE
+ Only run the insert branch and automatically update the end dates based on the existing information in the Satellite. This process selects all records in the Satellite which have more than one open EXPIRY_DATE (this is the case after running the insert branch separately), sorts the records in order and uses the EFFECTIVE_DATE from the previous record to close the next one. This introduces a dependency between the insert and update branch, but will run faster. An extra benefit is that this also closes off any previous records that were left open. A sample query for this selection is:
+ SELECT satellite.DWH_ID, satellite.<Expiry Date/Time>
  FROM satellite
- WHERE ( satellite.OMD_EXPIRY_DATE IS NULL AND
+ WHERE ( satellite.<Expiry Date/Time> IS NULL AND
  2 <= (SELECT COUNT(DWH_ID)
  FROM satellite A WHERE a.DWH_ID = satellite.DWH_ID
  AND a.FIRM_LEDTS IS NULL)
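The sample selection above appears truncated (the outer parenthesis in the WHERE clause is never closed, and it still references a FIRM_LEDTS column that does not match the renamed attributes). A self-contained sketch of the same selection is shown below, assuming dwh_id and expiry_datetime as stand-ins for the Satellite key and the Expiry Date/Time attribute.

-- Sketch only: find keys that have more than one open (not yet end-dated) Satellite row.
SELECT s.dwh_id, s.expiry_datetime
FROM satellite AS s
WHERE s.expiry_datetime IS NULL              -- row is still open
  AND 2 <= (SELECT COUNT(*)                  -- the key has at least two open rows
            FROM satellite AS a
            WHERE a.dwh_id = s.dwh_id
              AND a.expiry_datetime IS NULL);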
