
Commit 7da42fd

Tidy-ups - continued
1 parent e77689d commit 7da42fd

37 files changed: +149 additions, -142 deletions

docs/design-patterns/design-pattern-data-vault-hub.md

Lines changed: 2 additions & 2 deletions
@@ -40,7 +40,7 @@ The process performs a distinct selection on the business key attribute(s) in th
 
 During the selection the key distribution approach is implemented to make sure a dedicated Data Warehouse key is created. This can be an integer value, a hash key (i.e. MD5 or SHA1) or a natural business key.
 
-## Implementation Guidelines
+## Implementation guidelines
 
 Hubs are core business concepts which must be immediately and uniquely identifiable through their name.
 
@@ -61,7 +61,7 @@ When modeling the Hub tables try to be conservative when defining the business k
 
 To cater for a situation where multiple Load Date / Time stamp values exist for a single business key, the minimum Load Date / Time stamp should be the value passed through with the HUB record. This can be implemented in ETL logic, or passed through to the database. When implemented at a database level, instead of using a SELECT DISTINCT, using the MIN function with a GROUP BY the business key can achieve both a distinct selection, and minimum Load Date / Time stamp in one step.
 
-## Considerations and Consequences
+## CConsiderations and consequences
 
 Multiple passes on the same Staging Layer data set are likely to be required: once for the Hub table(s) but also for any corresponding Link and Satellite tables.
 
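The Hub hunk above suggests replacing a SELECT DISTINCT with MIN over the Load Date / Time stamp plus a GROUP BY on the business key, and mentions MD5 as one option for the Data Warehouse key. A minimal sketch of both ideas, using Python's standard-library sqlite3 module as a stand-in for the target database. The table and column names (stg_customer, customer_code, load_dts) and the trim/upper-case step before hashing are assumptions for illustration, not part of the pattern.

```python
import hashlib
import sqlite3

# Hypothetical staging data: the same business key can arrive more than once.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE stg_customer (customer_code TEXT, load_dts TEXT)")
conn.executemany(
    "INSERT INTO stg_customer VALUES (?, ?)",
    [
        ("C001", "2024-01-02 10:00:00"),
        ("C001", "2024-01-01 09:00:00"),  # earlier arrival of the same key
        ("C002", "2024-01-03 08:30:00"),
    ],
)

# One statement yields the distinct business keys and, per key, the minimum
# Load Date / Time stamp (ISO-8601 text sorts chronologically in SQLite).
distinct_keys = conn.execute(
    """
    SELECT customer_code, MIN(load_dts)
    FROM stg_customer
    GROUP BY customer_code
    ORDER BY customer_code
    """
).fetchall()


def hash_key(business_key: str) -> str:
    """MD5 hash key; trimming and upper-casing first is an assumed convention."""
    return hashlib.md5(business_key.strip().upper().encode("utf-8")).hexdigest()


hub_rows = [(hash_key(code), code, load_dts) for code, load_dts in distinct_keys]
for row in hub_rows:
    print(row)
```

Doing the aggregation in the database keeps the distinct selection and the minimum timestamp in a single pass over the staging data, which is the point the guideline makes.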
docs/design-patterns/design-pattern-data-vault-link-satellite-driving-key.md

Lines changed: 2 additions & 2 deletions
@@ -34,11 +34,11 @@ This pattern is applicable for processing data for a Link-Satellite table, or it
 
 Standard Link-Satellites use the Driving Key concept to manage the ending of old relationships.
 
-## Implementation Guidelines
+## Implementation guidelines
 
 To avoid data redundancy, it is recommended to manage this process into the target table as opposed to using end-dating.
 
-## Considerations and Consequences
+## CConsiderations and consequences
 
 ## Related Patterns
 
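The Driving Key text above is brief, so the following is only one possible reading of "manage this process into the target table as opposed to using end-dating": an insert-only Link-Satellite where a change of relationship for a driving key appends a row marking the old relationship as ended, rather than updating an end date on the old row. The LsatRow structure, its is_active flag and the order/customer keys are hypothetical.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class LsatRow:
    driving_key: str  # e.g. an order in an order-customer Link (hypothetical)
    other_key: str    # the non-driving side of the relationship
    load_dts: str
    is_active: bool   # False marks the relationship as ended (assumed attribute)


def apply_driving_key(target: List[LsatRow], driving_key: str, other_key: str, load_dts: str) -> None:
    """Insert-only: rows are appended in load order and never updated."""
    history = [r for r in target if r.driving_key == driving_key]
    current = history[-1] if history else None
    if current is not None and current.is_active and current.other_key == other_key:
        return  # relationship unchanged, nothing to insert
    if current is not None and current.is_active:
        # End the previous relationship by inserting a closing row for it.
        target.append(LsatRow(driving_key, current.other_key, load_dts, False))
    target.append(LsatRow(driving_key, other_key, load_dts, True))


target: List[LsatRow] = []
apply_driving_key(target, "ORD1", "CUST_A", "2024-01-01")
apply_driving_key(target, "ORD1", "CUST_A", "2024-01-02")  # no change, no insert
apply_driving_key(target, "ORD1", "CUST_B", "2024-01-03")  # A ends, B starts
for row in target:
    print(row)
```

Because the current state is always read from the latest rows for the driving key, no existing record is ever rewritten.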
docs/design-patterns/design-pattern-data-vault-link-satellite.md

Lines changed: 2 additions & 2 deletions
@@ -31,9 +31,9 @@ This pattern is only applicable for loading data to Link-Satellite tables from:
 
 Standard Link-Satellites use the Driving Key concept to manage the ending of old relationships.
 
-## Implementation Guidelines
+## Implementation guidelines
 
-## Considerations and Consequences
+## CConsiderations and consequences
 
 ## Related Patterns
 
docs/design-patterns/design-pattern-data-vault-link.md

Lines changed: 2 additions & 2 deletions
@@ -39,7 +39,7 @@ Business Insights > Design Pattern 010 - Data Vault - Loading Link tables > imag
 
 In a pure relational Link it is required that a dummy key is available in each corresponding Link-Satellite to complete the timelines. This is handled as part of the Link-Satellite processing as a Link can contain multiple Link-Satellites. Dummy records are only required to be inserted for each driving key as a view in time across the driving key is ultimately required. Inserting a dummy record for every Link key will cause issues in the timeline. This is explained in more detail in the Link-Satellite Design Pattern.
 
-## Implementation Guidelines
+## Implementation guidelines
 
 Use a single ETL process, module or mapping to load the Link table, thus improving flexibility in processing. Every ETL process should have a distinct function.
 
@@ -56,7 +56,7 @@ The default and arguably most flexible way is to incorporate this concept as par
 Depending on how the Link table is modelled (what kind of relationship it manages) the Link table may contains a relationship type attribute. If a link table contains multiple, or changing, relationships (types) this attributes is moved to the Link-Satellite table.
 Ending /closing relationships is always done in the Link-Satellite table, typically using a separate ETL process.
 
-## Considerations and Consequences
+## CConsiderations and consequences
 
 Multiple passes on source data is likely to be required. In extreme cases a single source table might be used (branch out) to Hubs, Satellites, Links and Link Satellites.
 
docs/design-patterns/design-pattern-data-vault-missing-keys-and-placeholders.md

Lines changed: 1 addition & 1 deletion
@@ -36,7 +36,7 @@ Missing Source Attribute (Non recordable Source) (-6). Used when source fails to
 Missing Target Attribute (Non recordable DWH Attribute) (-7). Used for temporal data that falls before the deployment of the attribute.
 Deciding between the various types of unknown is a business question that is decided based on how the source database works.
 
-## Considerations and Consequences
+## CConsiderations and consequences
 The Hubs must be pre-populated with the placeholder values (records).
 ETL processes loading data into the Integration Area must automatically resolve NULL values to (potentially different) placeholders.
 Implementing a full taxonomy of potential unknown values as hard business rules must be weighed against extra complexity while loading Integration Area tables.
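The placeholder rules in this hunk can be sketched as a small resolution function: a NULL key becomes -7 (Missing Target Attribute) when the record falls before the attribute was deployed, and the Hub is pre-populated with the placeholder records. Only the two codes visible in this hunk are used; treating every other NULL as -6 is a simplification for this two-code sketch, and the function name and dates are hypothetical.

```python
from datetime import date
from typing import Optional

MISSING_SOURCE_ATTRIBUTE = -6  # Missing Source Attribute (Non recordable Source)
MISSING_TARGET_ATTRIBUTE = -7  # Missing Target Attribute (Non recordable DWH Attribute)

# The Hub is pre-populated with the placeholder records before any load runs.
hub_keys = {MISSING_SOURCE_ATTRIBUTE, MISSING_TARGET_ATTRIBUTE}


def resolve_key(value: Optional[int], record_date: date, attribute_deployed: date) -> int:
    """Map a NULL (None) key to a placeholder; real values pass through unchanged."""
    if value is not None:
        return value
    if record_date < attribute_deployed:
        # Temporal data that falls before the deployment of the attribute.
        return MISSING_TARGET_ATTRIBUTE
    # Simplification: all remaining NULLs use -6 in this two-code sketch.
    return MISSING_SOURCE_ATTRIBUTE


deployed = date(2015, 1, 1)  # hypothetical deployment date of the attribute
print(resolve_key(42, date(2016, 3, 1), deployed))
print(resolve_key(None, date(2010, 6, 1), deployed))
print(resolve_key(None, date(2016, 3, 1), deployed))
```

Since every placeholder already exists in the Hub, the resolved keys always find a matching Hub record when Links and Satellites are loaded.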

docs/design-patterns/design-pattern-data-vault-satellite.md

Lines changed: 2 additions & 2 deletions
@@ -28,7 +28,7 @@ The ETL process can be described as a slowly changing dimension / history update
 Load Date / Time Stamp (used for the target Effective Date / Time and potentially the Update Date / TimeE attributes).
 Source Row Id.
 
-## Implementation Guidelines
+## Implementation guidelines
 
 Multiple passes of the same source table or file are usually required. The first pass will insert new keys in the Hub table; the other passes are needed to populate the Satellite and Link tables.
 
@@ -53,7 +53,7 @@ If you have a Change Data Capture based source, the attribute comparison is not
 
 Use hash values to detect changes, instead of comparing attributes separately. The hash value is created from all attributes except the business key and ETL process control values.
 
-## Considerations and Consequences
+## CConsiderations and consequences
 
 Multiple passes on source data are likely to be required.
 
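The Satellite guideline above (hash all attributes except the business key and ETL process control values, then compare hashes instead of individual attributes) can be sketched as follows. The column names, the '|' delimiter, sorting the columns for a fixed order and rendering NULL as an empty string are all assumptions of this sketch.

```python
import hashlib
from typing import Optional

# Columns left out of the hash: the business key and ETL process control values.
# These names are hypothetical.
EXCLUDED_COLUMNS = {"customer_code", "load_dts", "source_row_id"}


def hash_diff(row: dict) -> str:
    """MD5 over all remaining attributes, in a fixed (sorted) column order."""
    values = [
        "" if row[col] is None else str(row[col]).strip()  # NULL as '' is an assumption
        for col in sorted(row)
        if col not in EXCLUDED_COLUMNS
    ]
    # A delimiter keeps ('ab', 'c') and ('a', 'bc') from producing the same input.
    return hashlib.md5("|".join(values).encode("utf-8")).hexdigest()


def needs_new_row(incoming: dict, current_hash: Optional[str]) -> bool:
    """Insert a Satellite row when there is no current row or the hash differs."""
    return current_hash is None or hash_diff(incoming) != current_hash


v1 = {"customer_code": "C001", "name": "ABC Corp", "city": "Sydney",
      "load_dts": "2024-01-01", "source_row_id": 1}
v2 = dict(v1, load_dts="2024-01-02", source_row_id=7)  # only control values differ
v3 = dict(v1, city="Melbourne")                          # a real attribute change
print(needs_new_row(v2, hash_diff(v1)))  # False
print(needs_new_row(v3, hash_diff(v1)))  # True
```

Excluding the control values is what stops a simple reload (new Load Date / Time Stamp, new Source Row Id) from being treated as a change.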
docs/design-patterns/design-pattern-dimensional-loading-from-the-persistent-staging-area.md

Lines changed: 12 additions & 12 deletions
@@ -43,12 +43,12 @@ Therefore, the logic is slightly more complex. Joining Persistent Staging Area t
 
 ### Example Datasets
 
-| HSTG Table 1 | Key | INSERT_DATETIME | Fund Code | Amount |
+| PSA Table 1 | Key | INSERT_DATETIME | Fund Code | Amount |
 |--------------|-----|-----------------|-----------|------------|
 | | 1 | 2012-01-01 | ABC | $1,000,000 |
 | | 2 | 2013-06-02 | ABC | $1,500,000 |
 
-| HSTG Table 2 | Key | INSERT_DATETIME | Fund Code | Short Name | Additional Amount |
+| PSA Table 2 | Key | INSERT_DATETIME | Fund Code | Short Name | Additional Amount |
 |--------------|-----|-----------------|-----------|------------|-------------------|
 | | 1 | 2012-04-05 | ABC | ABC Corp | $5,000 |
 | | 2 | 2013-07-07 | ABC | ABC Pty | $5,000 |
@@ -79,9 +79,9 @@ Therefore, the logic is slightly more complex. Joining Persistent Staging Area t
 ```sql
 -- Select all variations of the available time intervals
 WITH TimeIntervals AS (
-SELECT INSERT_DATETIME FROM HSTG_Table1
+SELECT INSERT_DATETIME FROM PSA_Table1
 UNION
-SELECT INSERT_DATETIME FROM HSTG_Table2
+SELECT INSERT_DATETIME FROM PSA_Table2
 ),
 
 -- Calculate the ranges (time intervals / slices) between available time intervals
@@ -99,32 +99,32 @@ Ranges AS (
 -- Connect source table 1
 Table1 AS (
 SELECT
-c.HSTG_Table1_SK,
+c.PSA_Table1_SK,
 c.Fundcode,
 c.Total_Amount,
 c.INSERT_DATETIME AS EFFECTIVE_DATETIME,
 COALESCE(MIN(c2.INSERT_DATETIME), CONVERT(DATETIME, '99991231')) AS EXPIRY_DATETIME
-FROM HSTG_Table1 c
-LEFT JOIN HSTG_Table1 c2 ON
+FROM PSA_Table1 c
+LEFT JOIN PSA_Table1 c2 ON
 c.Fundcode = c2.Fundcode AND
 c.INSERT_DATETIME < c2.INSERT_DATETIME
-GROUP BY c.HSTG_Table1_SK, c.Fundcode, c.Total_Amount, c.INSERT_DATETIME
+GROUP BY c.PSA_Table1_SK, c.Fundcode, c.Total_Amount, c.INSERT_DATETIME
 ),
 
 -- Connect source table 2
 Table2 AS (
 SELECT
-c.HSTG_Table2_SK,
+c.PSA_Table2_SK,
 c.Fundcode,
 c.Short_name,
 c.Additional_amount,
 c.INSERT_DATETIME AS EFFECTIVE_DATETIME,
 COALESCE(MIN(c2.INSERT_DATETIME), CONVERT(DATETIME, '99991231')) AS EXPIRY_DATETIME
-FROM HSTG_Table2 c
-LEFT JOIN HSTG_Table2 c2 ON
+FROM PSA_Table2 c
+LEFT JOIN PSA_Table2 c2 ON
 c.Fundcode = c2.Fundcode AND
 c.INSERT_DATETIME < c2.INSERT_DATETIME
-GROUP BY c.HSTG_Table2_SK, c.Fundcode, c.Short_Name, c.Additional_Amount, c.INSERT_DATETIME
+GROUP BY c.PSA_Table2_SK, c.Fundcode, c.Short_Name, c.Additional_Amount, c.INSERT_DATETIME
 )
 
 -- Join tables to time ranges
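The SQL in this file builds time slices in steps: take the union of all INSERT_DATETIME values, turn consecutive values into ranges, derive an EXPIRY_DATETIME per table row from the next INSERT_DATETIME (or the 9999-12-31 high date), and finally join the tables to the ranges. The same logic can be traced in plain Python against the example datasets. This sketch handles the single Fund Code ABC only (the SQL joins on Fundcode), the dictionary field names are assumptions, and the final join condition (a version valid at the start of the slice) is also an assumption because that part of the query is not in this diff.

```python
from datetime import date

HIGH_DATE = date(9999, 12, 31)  # plays the role of CONVERT(DATETIME, '99991231')

# The example datasets above, for the single Fund Code 'ABC'.
psa_table1 = [(date(2012, 1, 1), {"amount": 1_000_000}),
              (date(2013, 6, 2), {"amount": 1_500_000})]
psa_table2 = [(date(2012, 4, 5), {"short_name": "ABC Corp", "additional_amount": 5_000}),
              (date(2013, 7, 7), {"short_name": "ABC Pty", "additional_amount": 5_000})]


def with_expiry(rows):
    """EXPIRY = next INSERT_DATETIME for the same key, else the high date."""
    rows = sorted(rows, key=lambda r: r[0])
    ends = [eff for eff, _ in rows[1:]] + [HIGH_DATE]
    return [(eff, exp, attrs) for (eff, attrs), exp in zip(rows, ends)]


def time_slices(*tables):
    # TimeIntervals: the union of every INSERT_DATETIME across the tables.
    points = sorted({eff for table in tables for eff, _ in table})
    # Ranges: each point runs until the next one (or the high date).
    ranges = zip(points, points[1:] + [HIGH_DATE])
    versioned = [with_expiry(t) for t in tables]
    result = []
    for start, end in ranges:
        row = {"effective": start, "expiry": end}
        for table in versioned:
            for eff, exp, attrs in table:
                if eff <= start < exp:  # the version valid at the start of the slice
                    row.update(attrs)
        result.append(row)
    return result


for row in time_slices(psa_table1, psa_table2):
    print(row)
```

With the example data this yields four slices: 2012-01-01 (amount only, since PSA Table 2 has no row yet), 2012-04-05, 2013-06-02 and 2013-07-07 (open-ended to the high date).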

docs/design-patterns/design-pattern-dimensional-time-dimension.md

Lines changed: 2 additions & 2 deletions
@@ -19,10 +19,10 @@ Implementation guidelines
 Every separate source system has its own directory in the landing area.
 Every source directory has an archive directory.
 
-## Considerations and Consequences
+## CConsiderations and consequences
 The decision not to copy the data types from the file definitions but to check and explicitly convert these in the ETL process will mean that explicit checks and data type conversions will have to be added later.
 Known uses
 None.
 
 ## Related Patterns
-Design Pattern 015 Generic Loading Staging Area tables.
+Design Pattern 015 Generic Loading Staging Area tables.

docs/design-patterns/design-pattern-generic-control-framework.md

Lines changed: 2 additions & 2 deletions
@@ -12,8 +12,8 @@ This pattern is only applicable for every process in the data solution.
 
 ## Structure
 
-## Implementation Guidelines
+## Implementation guidelines
 
-## Considerations and Consequences
+## CConsiderations and consequences
 
 ## Related Patterns

docs/design-patterns/design-pattern-generic-data extraction from internal systems.md

Lines changed: 2 additions & 2 deletions
@@ -17,10 +17,10 @@ This Design Pattern describes the overarching concepts related to extracting dat
 * The standard data integration tool must be used to extract data from the source systems, unless another efficient data extract utility is provided as part of the application package (this may include using SQL for ETL)
 * Implement incremental extracts where possible, as this is more scalable.
 
-## Implementation Guidelines
+## Implementation guidelines
 
 
-## Considerations and Consequences
+## CConsiderations and consequences
 
 
 ## Related Patterns
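The guideline "Implement incremental extracts where possible" can be illustrated with a high-water-mark extract: only rows modified after the last recorded timestamp are selected, and the mark is advanced afterwards. This is one common way to extract incrementally, not necessarily what the pattern prescribes. The source_table schema and its modified_dts column are hypothetical, and the approach assumes the source maintains a reliable modification timestamp.

```python
import sqlite3


def extract_increment(conn: sqlite3.Connection, high_water_mark: str):
    """Return rows modified after the stored high-water mark, plus the new mark."""
    rows = conn.execute(
        "SELECT id, name, modified_dts FROM source_table "
        "WHERE modified_dts > ? ORDER BY modified_dts",
        (high_water_mark,),
    ).fetchall()
    # Advance the mark to the latest timestamp seen; keep it if nothing changed.
    new_mark = rows[-1][2] if rows else high_water_mark
    return rows, new_mark


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE source_table (id INTEGER, name TEXT, modified_dts TEXT)")
conn.executemany(
    "INSERT INTO source_table VALUES (?, ?, ?)",
    [(1, "a", "2024-01-01 00:00:00"), (2, "b", "2024-01-02 00:00:00")],
)

first, mark = extract_increment(conn, "1900-01-01 00:00:00")  # initial run
conn.execute("INSERT INTO source_table VALUES (3, 'c', '2024-01-03 00:00:00')")
second, mark = extract_increment(conn, mark)  # picks up only the new row
print(len(first), len(second), mark)
```

Each run reads only the delta since the previous run, which is why incremental extracts scale better than repeated full extracts as the source grows.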
