
Commit 64c218f

Minor tweaks and typos
1 parent e50306b commit 64c218f

7 files changed (+45 / -45 lines changed)

docs/design-patterns/design-pattern-generic-interfacing-from-an-operational-system.md

Lines changed: 12 additions & 1 deletion
@@ -1,9 +1,14 @@
# Design Pattern - Generic - Interfacing to an operational (source) system

+ > [!WARNING]
+ > This design pattern requires a major update to refresh the content.
+
## Purpose
+
This Design Pattern describes the generic requirements, rationale and approach when there is a need to obtain information from an operational (OLTP) system. From the perspective of the Data Warehouse this is considered a 'source' or 'feeding' system.

## Motivation
+
The interfaces between operational systems and the Data Warehouse are among the most complex and difficult areas to implement, because the solutions are in many cases dependent on various external factors including (but not limited to) funding, security, IT processes and controls, and technology. Still, to enable a full audit trail and prevent issues associated with incorrect or incomplete receipt of data, there are some fundamental requirements that every interface should comply with.
The scope of this pattern covers the access and retrieval of data delta (changes) into the Staging Layer of the Data Warehouse, including delta detection concepts such as Change Data Capture (CDC), Change Tracking (CT) and exposure of data via APIs.

@@ -12,10 +17,13 @@ The exact technical implementation of these concepts depends on an evaluation of
This area of focus is also known as 'Sourcing' or 'Interfacing' with other systems.

## Applicability
+
This Design Pattern is applicable for every data set that is presented to / loaded into the Data Warehouse (Staging Layer). This usually is a result of a project initiation, information analysis task or change request.

## Structure
+
The following list captures the fundamental requirements of an interface between an operational system and the Data Warehouse:
+
* Flexibility in expanding the scope of data access (adding additional artefacts / tables). The selected solution should aim to make it relatively easy to add additional information to the interface. This prevents all data from having to be staged as part of a single delivery, which would incur a maintenance overhead for data that may not (yet) be required. The best balance is usually found when more data can be added to an interface on an iterative basis.
* Support for a pull mechanism (pulling the data delta into the Staging Layer from the Data Warehouse perspective) as opposed to a push mechanism (scheduled extracts from the source / feeding system 'pushing' the data into the Staging Layer). This allows the Data Warehouse to manage the loading frequency (i.e. increase ETL frequency). CDC solutions need to ensure adequate resource governance to prevent impact on OLTP performance.
* Granular access to data (original raw, atomic data as opposed to aggregated information).
@@ -27,12 +35,15 @@ The following list captures the fundamental requirements of an interface between
* The interface needs to be able to detect and provide notice of record deletes. The Data Warehouse will store this information as a *logical delete*, which means the data row is understood to be closed in the source system (i.e. the most recent state of the record is 'deleted').

## Implementation guidelines
+
* Consider agreeing on a data interfacing contract / agreement (Service Level Agreement) where possible.
* Consider scalability in terms of the ability to add more data elements or tables to existing interfaces. In some cases significant effort may be required to add or modify interfaces, and this effort can be limited by increasing the scope of data included in a single change. This is especially the case when dealing with third-party systems. It may be better to retrieve all the data in one go, as in some cases the first contact is ‘free’, but subsequent efforts to add additional interfaces typically meet more resistance, suffer from politics or require additional funding. A downside to this approach is the extra effort, both in terms of maintenance and development, to load / stage (and perhaps integrate) all tables. However, in some cases this trade-off is a positive one in the longer term when no extra communication with the source system owners is required.
* An additional consideration is that information that is captured by the Data Warehouse can be archived in the Persistent Staging Area (PSA), which enables the solution to collect information early on in the development cycle (for use later on).

## Considerations and consequences
+
Not applicable.

## Related patterns
- * Design Pattern 006 - Generic - Managing Temporality by using Start, Process and End Dates.
+
+ * Design Pattern - Generic - Managing Temporality by using Start, Process and End Dates.
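The pull mechanism and delete detection listed under Structure above can be made concrete with a small sketch. It uses SQL Server Change Tracking, one of the delta detection options mentioned in this pattern; the table dbo.Customer, its columns and the stored synchronisation version are assumptions for illustration only.

```sql
-- Minimal sketch only: pull the change delta since the previous load, including deletes.
-- Table dbo.Customer, its columns and the stored version number are illustrative assumptions.
DECLARE @last_sync_version BIGINT = 1001;                      -- version persisted after the previous run
DECLARE @current_version   BIGINT = CHANGE_TRACKING_CURRENT_VERSION();

SELECT
    ct.CustomerId,
    ct.SYS_CHANGE_OPERATION                                    AS change_operation,        -- 'I', 'U' or 'D'
    CASE WHEN ct.SYS_CHANGE_OPERATION = 'D' THEN 1 ELSE 0 END  AS logical_delete_indicator,
    src.CustomerName,
    src.CustomerType
FROM CHANGETABLE(CHANGES dbo.Customer, @last_sync_version) AS ct
LEFT JOIN dbo.Customer AS src
       ON src.CustomerId = ct.CustomerId;                      -- deleted rows return NULL attributes

-- After a successful load into the Staging Layer, persist @current_version for the next pull.
```

Rows flagged with operation 'D' return no attribute values from the source table and can be recorded in the Staging Layer as logical deletes, in line with the requirement above.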
Lines changed: 19 additions & 34 deletions
@@ -1,13 +1,22 @@
# Design Pattern - Generic - Loading Landing Area tables using Record Condensing

+ > [!WARNING]
+ > This design pattern requires a major update to refresh the content.
+
## Purpose
+
This Design Pattern specifies how a data source that contains multiple changes for the same business key is processed, for instance when using ‘net changes’ within a Change Data Capture interval or when the source application supplies redundant records.
+
Motivation
- This process is optional for the Staging Area; its application depends on the specific (nature of the) data source itself. The reason to implement a ‘condense’ process in the Staging Area ETL is to prevent implementing this logic in multiple locations when loading data out of the Staging Area (to the History and Integration Areas). During this process no information is lost, only redundant records are removed. These are records that are, in reality, no changes at all in the Data Warehouse context.
+
+ This process is optional for the Staging Area; its application depends on the specific (nature of the) data source itself. The reason to implement a 'condense' process in the Staging Area ETL is to prevent implementing this logic in multiple locations when loading data out of the Staging Area (to the History and Integration Areas). During this process no information is lost, only redundant records are removed. These are records that are, in reality, no changes at all in the Data Warehouse context.
+
Also known as
Condensing Records
Net changes
- Applicability
+
+ ## Applicability
+
This pattern is applicable for loading processes from source systems or files to the Staging Area (of the Staging Layer) only. Also, this process should only be added to the Staging Area ETL when the data source shows this particular behaviour or when a history of changes is loaded in one run (catch-up for instance).
Structure
Depending on the nature of the source data, the following situation may occur. In this example these are the original records as they appear in the source system:
@@ -45,49 +54,25 @@ Cheese
This is a situation where the condensing process can be implemented so that the record will not be inserted into the Data Warehouse as a new record (without there being a change).
The process to do this is as follows:

+ ## Implementation guidelines

- Figure 1: Record condensing in STG
- Implementation guidelines
The condensation process should be part of the Staging Area ETL process.
Depending on the available ETL software this process can be defined as a reusable or generic object.
This Design Pattern attempts to avoid ETL design where you have to run a source which contains multiple intervals (typically days) of data multiple times to correctly record the history. With this concept the entire history can be loaded in one run.
If all changes from a CDC source are captured, this process is not required.
There is a performance overhead when processing larger deltas or when running an initial load.
This typically applies to CDC sources where not all changes are processed, but only the net changes for an interval; for instance, when only the last change per day should be processed.
Message sources can have the same issue when treated the same way (only the last record state per interval).
- Design Pattern 015 – Generic – Loading Staging Area tables.
- Design Pattern 021 – Generic – Using CDC.
- Consequences
- There is a performance overhead when processing larger deltas or when running an initial load.
- Known uses
- Typically CDC sources where not all changes are processed but only the net changes for an interval. For instance when only the last change per day should be processed.
- Message sources can have the same issue when treated the same way (only last record state per interval).
- Related patterns
- Design Pattern 015 – Generic – Loading Staging Area tables.Design Pattern 015 - Generic - Loading Staging Area Tables
- Design Pattern 021 – Generic – Using CDC.
- Discussion items (not yet to be implemented or used until final)
- None.

- ## Motivation
-
-
-
- ## Applicability
-
-
-
- ## Structure
-
-
-
- ## Implementation guidelines
-
-
-
- ## Considerations and consequences
+ ## Consequences and considerations

+ There is a performance overhead when processing larger deltas or when running an initial load.

+ This typically applies to CDC sources where not all changes are processed, but only the net changes for an interval; for instance, when only the last change per day should be processed.
+ Message sources can have the same issue when treated the same way (only the last record state per interval).

## Related patterns

- -
+ * Design Pattern - Generic - Loading Staging Area tables
+ * Design Pattern - Generic - Loading Staging Area Tables
+ * Design Pattern - Generic - Using CDC.
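The condensing process itself can be sketched with window functions: order the delta per business key and keep only the first record plus any record whose attribute values actually differ from the previous record. The table and column names (STG_Customer, CustomerId, EventDateTime, CustomerName, CustomerType) are illustrative assumptions, and NULL-safe comparisons are omitted for brevity.

```sql
-- Minimal sketch only: condense a staging delta by discarding records that do not
-- represent an actual change. Names are illustrative assumptions.
WITH ordered AS (
    SELECT
        CustomerId,
        EventDateTime,
        CustomerName,
        CustomerType,
        ROW_NUMBER() OVER (PARTITION BY CustomerId ORDER BY EventDateTime)        AS change_order,
        LAG(CustomerName) OVER (PARTITION BY CustomerId ORDER BY EventDateTime)   AS previous_name,
        LAG(CustomerType) OVER (PARTITION BY CustomerId ORDER BY EventDateTime)   AS previous_type
    FROM STG_Customer
)
SELECT CustomerId, EventDateTime, CustomerName, CustomerType
FROM ordered
WHERE change_order = 1                      -- always keep the first record per business key
   OR CustomerName <> previous_name         -- keep records where an attribute actually changed
   OR CustomerType <> previous_type;
```

Because the comparison is made per business key in event order, an entire history (for example a catch-up load) can be condensed in a single pass, which supports the guideline above about loading the full history in one run.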

docs/design-patterns/design-pattern-generic-logical-partition-keys.md

Lines changed: 3 additions & 0 deletions
@@ -1,5 +1,8 @@
# Design Pattern - Generic - Logical Partition Keys

+ > [!WARNING]
+ > This design pattern requires a major update to refresh the content.
+
## Purpose

This design pattern describes how to handle large data volumes by using logical partition keys. It is a technique that may help load large datasets faster, as an alternative approach to handling large data volumes.
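A logical partition key can be read as a derived value that splits a large data set into a fixed number of slices without physical partitioning. A minimal sketch is shown below; the hashing approach, the bucket count of 10 and the table and column names are assumptions, not something prescribed by the pattern.

```sql
-- Minimal sketch only: derive a logical partition key so a large load can be split into
-- smaller slices that can be processed (and parallelised) independently.
SELECT
    CustomerId,
    CustomerName,
    ABS(CHECKSUM(CustomerId)) % 10 AS logical_partition_key   -- 10 buckets, chosen arbitrarily
FROM STG_Customer;

-- A load process can then be scoped per slice, for example: WHERE logical_partition_key = 3.
```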

docs/design-patterns/design-pattern-generic-managing-multi-temporality.md

Lines changed: 3 additions & 0 deletions
@@ -1,5 +1,8 @@
# Managing temporality by using Load, Event and Change dates

+ > [!WARNING]
+ > This design pattern requires a major update to refresh the content.
+
## Purpose

This Design Pattern describes how and when dates and timestamps are generated by the Data Warehouse.
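As a hedged reading of the three date roles named in the title (Load, Event and Change), the sketch below shows them as columns on a hypothetical staging table. The column names, data types and the interpretation in the comments are assumptions only.

```sql
-- Minimal sketch only: one possible reading of the Load, Event and Change date roles.
CREATE TABLE STG_Customer (
    CustomerId      INT            NOT NULL,
    CustomerName    NVARCHAR(100)  NULL,
    LoadDateTime    DATETIME2(7)   NOT NULL,  -- generated by the Data Warehouse when the record is received
    EventDateTime   DATETIME2(7)   NULL,      -- when the change became effective in the business process
    ChangeDateTime  DATETIME2(7)   NULL       -- when the record was changed in the operational system
);
```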

docs/design-patterns/design-pattern-generic-referential-integrity.md

Lines changed: 4 additions & 9 deletions
@@ -1,17 +1,14 @@
# Design Pattern - Generic - Referential Integrity

- ## Purpose
-
+ > [!WARNING]
+ > This design pattern requires a major update to refresh the content.

+ ## Purpose

## Motivation

-
-
## Applicability

-
-
## Structure

Referential Integrity and constraints
@@ -33,12 +30,10 @@ Every Data Warehouse table contains a predefined set of metadata attributes, whi

## Implementation guidelines

-
-
## Considerations and consequences

TBD

## Related patterns

- N/A.
+ N/A

docs/design-patterns/design-pattern-logical-natural-business-relationship.md

Lines changed: 3 additions & 0 deletions
@@ -4,6 +4,9 @@ uid: design-pattern-logical-natural-business-relationship

# Design Pattern - Natural Business Relationship

+ > [!WARNING]
+ > This design pattern requires a major update to refresh the content.
+
## Purpose

This design pattern defines a Natural Business Relationship (NBR) entity.

docs/index.md

Lines changed: 1 addition & 1 deletion
@@ -18,7 +18,7 @@ Design and implementation of data solutions can be a labour-intensive activity t

Over time, as requirements change and demand for data increases, the architecture faces challenges in the complexity, consistency and flexibility of the design (and maintenance) of the data integration processes.

- These changes can include latency and availability requirements, a bigger variety of operational systems that generate data, and the need to expose information in different ways. At the same time, data tends to increate in volume, variety, and velocity.
+ These changes can include latency and availability requirements, a bigger variety of operational systems that generate data, and the need to expose information in different ways. At the same time, data tends to increase in volume, variety, and velocity.

These issues are compounded by an absence of agreed industry best practices, which in turn leads to various ad-hoc patterns being implemented based on an individual's experience (or lack thereof). Implications of (poor) design decisions are often not fully understood, and only become apparent when the investment in time and money has already been made.
