
Commit 64c218f

Minor tweaks and typos
1 parent e50306b commit 64c218f

7 files changed (+45 / -45 lines changed)

docs/design-patterns/design-pattern-generic-interfacing-from-an-operational-system.md

Lines changed: 12 additions & 1 deletion
@@ -1,9 +1,14 @@
# Design Pattern - Generic - Interfacing to an operational (source) system

+ > [!WARNING]
+ > This design pattern requires a major update to refresh the content.
+
## Purpose
+
This Design Pattern describes the generic requirements, rationale and approach when there is a need to obtain information from an operational (OLTP) system. From the perspective of the Data Warehouse this is considered a 'source' or 'feeding' system.

## Motivation
+
The interfaces between operational systems and the Data Warehouse are among the most complex and difficult areas to implement, because the solutions are in many cases dependent on various external factors including (but not limited to) funding, security, IT processes and controls, and technology. Still, to enable a full audit trail and prevent issues associated with incorrect or incomplete receipt of data, there are some fundamental requirements that every interface should comply with.
The scope of this pattern covers the access and retrieval of data delta (changes) into the Staging Layer of the Data Warehouse, including delta detection concepts such as Change Data Capture (CDC), Change Tracking (CT) and exposure of data via APIs.

@@ -12,10 +17,13 @@ The exact technical implementation of these concepts depends on an evaluation of
This area of focus is also known as 'Sourcing' or 'Interfacing' with other systems.

## Applicability
+
This Design Pattern is applicable for every data set that is presented to / loaded into the Data Warehouse (Staging Layer). This usually is a result of a project initiation, information analysis task or change request.

## Structure
+
The following list captures the fundamental requirements of an interface between an operational system and the Data Warehouse:
+
* Flexibility in expanding the scope of data access (adding additional artefacts / tables). The selected solution should aim to make it relatively easy to add additional information to the interface. This prevents all data from having to be staged as part of a single delivery, which would incur a maintenance overhead for data that may not (yet) be required. The best balance is usually found when more data can be added to an interface on an iterative basis.
* Support for a pull mechanism (pulling the data delta into the Staging Layer from the Data Warehouse perspective) as opposed to a push mechanism (scheduled extracts from the source / feeding system 'pushing' the data into the Staging Layer). This allows the Data Warehouse to manage the loading frequency (i.e. increase ETL frequency). CDC solutions need to ensure adequate resource governance to prevent impact on OLTP performance.
* Granular access to data (original raw, atomic data as opposed to aggregated information).
@@ -27,12 +35,15 @@ The following list captures the fundamental requirements of an interface between
* The interface needs to be able to detect and provide notice of record deletes. The Data Warehouse will store this information as a *logical delete*, which means the data row is understood to be closed in the source system (i.e. the most recent state of the record is 'deleted').

## Implementation guidelines
+
* Consider agreeing on a data interfacing contract / agreement (Service Level Agreement) where possible.
* Consider scalability in terms of the ability to add more data elements or tables to existing interfaces. In some cases significant effort may be required to add or modify interfaces, and this effort can be limited by increasing the scope of data included in a single change. This is especially the case when dealing with third-party systems. It may be better to retrieve all the data in one go, as in some cases the first contact is ‘free’, but subsequent efforts to add additional interfaces typically meet more resistance, suffer from politics or require additional funding. A downside to this approach is the extra effort, both in terms of maintenance and development, to load / stage (and perhaps integrate) all tables. However, in some cases this trade-off is a positive one in the longer term when no extra communication with the source system owners is required.
* An additional consideration is that information that is captured by the Data Warehouse can be archived in the Persistent Staging Area (PSA), which enables the solution to collect information early on in the development cycle (for use later on).

## Considerations and consequences
+
Not applicable.

## Related patterns
- * Design Pattern 006 - Generic - Managing Temporality by using Start, Process and End Dates.
+
+ * Design Pattern - Generic - Managing Temporality by using Start, Process and End Dates.
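The pull mechanism and delete detection listed under Structure above can be made concrete with a small sketch. It uses SQL Server Change Tracking, one of the delta detection options mentioned in this pattern; the table dbo.Customer, its columns and the stored synchronisation version are assumptions for illustration only.

```sql
-- Minimal sketch only: pull the change delta since the previous load, including deletes.
-- Table dbo.Customer, its columns and the stored version number are illustrative assumptions.
DECLARE @last_sync_version BIGINT = 1001;                      -- version persisted after the previous run
DECLARE @current_version   BIGINT = CHANGE_TRACKING_CURRENT_VERSION();

SELECT
    ct.CustomerId,
    ct.SYS_CHANGE_OPERATION                                    AS change_operation,        -- 'I', 'U' or 'D'
    CASE WHEN ct.SYS_CHANGE_OPERATION = 'D' THEN 1 ELSE 0 END  AS logical_delete_indicator,
    src.CustomerName,
    src.CustomerType
FROM CHANGETABLE(CHANGES dbo.Customer, @last_sync_version) AS ct
LEFT JOIN dbo.Customer AS src
       ON src.CustomerId = ct.CustomerId;                      -- deleted rows return NULL attributes

-- After a successful load into the Staging Layer, persist @current_version for the next pull.
```

Rows flagged with operation 'D' return no attribute values from the source table and can be recorded in the Staging Layer as logical deletes, in line with the requirement above.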
Lines changed: 19 additions & 34 deletions
@@ -1,13 +1,22 @@
# Design Pattern - Generic - Loading Landing Area tables using Record Condensing

+ > [!WARNING]
+ > This design pattern requires a major update to refresh the content.
+
## Purpose
+
This Design Pattern specifies how a data source that contains multiple changes for the same business key is processed, for instance when using ‘net changes’ within a Change Data Capture interval or when the source application supplies redundant records.
+
Motivation
- This process is optional for the Staging Area; its application depends on the specific (nature of the) data source itself. The reason to implement a ‘condense’ process in the Staging Area ETL is to prevent implementing this logic in multiple locations when loading data out of the Staging Area (to the History and Integration Areas). During this process no information is lost, only redundant records are removed. These are records that are, in reality, no changes at all in the Data Warehouse context.
+
+ This process is optional for the Staging Area; its application depends on the specific (nature of the) data source itself. The reason to implement a 'condense' process in the Staging Area ETL is to prevent implementing this logic in multiple locations when loading data out of the Staging Area (to the History and Integration Areas). During this process no information is lost, only redundant records are removed. These are records that are, in reality, no changes at all in the Data Warehouse context.
+
Also known as
Condensing Records
Net changes
- Applicability
+
+ ## Applicability
+
This pattern is applicable for loading processes from source systems or files to the Staging Area (of the Staging Layer) only. Also, this process should only be added to the Staging Area ETL when the data source shows this particular behaviour or when a history of changes is loaded in one run (catch-up for instance).
Structure
Depending on the nature of the source data, the following situation may occur. In this example these are the original records as they appear in the source system:
@@ -45,49 +54,25 @@ Cheese
This is a situation where the condensing process can be implemented so that the record will not be inserted into the Data Warehouse as a new record (without there being a change).
The process to do this is as follows:

+ ## Implementation guidelines

- Figure 1: Record condensing in STG
- Implementation guidelines
The condensation process should be part of the Staging Area ETL process.
Depending on the available ETL software this process can be defined as a reusable or generic object.
This Design Pattern attempts to avoid ETL design where you have to run a source which contains multiple intervals (typically days) of data multiple times to correctly record the history. With this concept the entire history can be loaded in one run.
If all changes from a CDC source are captured, this process is not required.
There is a performance overhead when processing larger deltas or when running an initial load.
This typically applies to CDC sources where not all changes are processed, but only the net changes for an interval; for instance, when only the last change per day should be processed.
Message sources can have the same issue when treated the same way (only the last record state per interval).
- Design Pattern 015 – Generic – Loading Staging Area tables.
- Design Pattern 021 – Generic – Using CDC.
- Consequences
- There is a performance overhead when processing larger deltas or when running an initial load.
- Known uses
- Typically CDC sources where not all changes are processed but only the net changes for an interval. For instance when only the last change per day should be processed.
- Message sources can have the same issue when treated the same way (only last record state per interval).
- Related patterns
- Design Pattern 015 – Generic – Loading Staging Area tables.Design Pattern 015 - Generic - Loading Staging Area Tables
- Design Pattern 021 – Generic – Using CDC.
- Discussion items (not yet to be implemented or used until final)
- None.

- ## Motivation
-
-
-
- ## Applicability
-
-
-
- ## Structure
-
-
-
- ## Implementation guidelines
-
-
-
- ## Considerations and consequences
+ ## Consequences and considerations

+ There is a performance overhead when processing larger deltas or when running an initial load.

+ This typically applies to CDC sources where not all changes are processed, but only the net changes for an interval; for instance, when only the last change per day should be processed.
+ Message sources can have the same issue when treated the same way (only the last record state per interval).

## Related patterns

- -
+ * Design Pattern - Generic - Loading Staging Area tables
+ * Design Pattern - Generic - Loading Staging Area Tables
+ * Design Pattern - Generic - Using CDC.
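The condensing process itself can be sketched with window functions: order the delta per business key and keep only the first record plus any record whose attribute values actually differ from the previous record. The table and column names (STG_Customer, CustomerId, EventDateTime, CustomerName, CustomerType) are illustrative assumptions, and NULL-safe comparisons are omitted for brevity.

```sql
-- Minimal sketch only: condense a staging delta by discarding records that do not
-- represent an actual change. Names are illustrative assumptions.
WITH ordered AS (
    SELECT
        CustomerId,
        EventDateTime,
        CustomerName,
        CustomerType,
        ROW_NUMBER() OVER (PARTITION BY CustomerId ORDER BY EventDateTime)        AS change_order,
        LAG(CustomerName) OVER (PARTITION BY CustomerId ORDER BY EventDateTime)   AS previous_name,
        LAG(CustomerType) OVER (PARTITION BY CustomerId ORDER BY EventDateTime)   AS previous_type
    FROM STG_Customer
)
SELECT CustomerId, EventDateTime, CustomerName, CustomerType
FROM ordered
WHERE change_order = 1                      -- always keep the first record per business key
   OR CustomerName <> previous_name         -- keep records where an attribute actually changed
   OR CustomerType <> previous_type;
```

Because the comparison is made per business key in event order, an entire history (for example a catch-up load) can be condensed in a single pass, which supports the guideline above about loading the full history in one run.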

docs/design-patterns/design-pattern-generic-logical-partition-keys.md

Lines changed: 3 additions & 0 deletions
@@ -1,5 +1,8 @@
# Design Pattern - Generic - Logical Partition Keys

+ > [!WARNING]
+ > This design pattern requires a major update to refresh the content.
+
## Purpose

This design pattern describes how to handle large data volumes by using logical partition keys. It is a technique that may help load large datasets faster, as an alternative approach to handling large data volumes.
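A logical partition key can be read as a derived value that splits a large data set into a fixed number of slices without physical partitioning. A minimal sketch is shown below; the hashing approach, the bucket count of 10 and the table and column names are assumptions, not something prescribed by the pattern.

```sql
-- Minimal sketch only: derive a logical partition key so a large load can be split into
-- smaller slices that can be processed (and parallelised) independently.
SELECT
    CustomerId,
    CustomerName,
    ABS(CHECKSUM(CustomerId)) % 10 AS logical_partition_key   -- 10 buckets, chosen arbitrarily
FROM STG_Customer;

-- A load process can then be scoped per slice, for example: WHERE logical_partition_key = 3.
```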

docs/design-patterns/design-pattern-generic-managing-multi-temporality.md

Lines changed: 3 additions & 0 deletions
@@ -1,5 +1,8 @@
# Managing temporality by using Load, Event and Change dates

+ > [!WARNING]
+ > This design pattern requires a major update to refresh the content.
+
## Purpose

This Design Pattern describes how and when dates and timestamps are generated by the Data Warehouse.
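As a hedged reading of the three date roles named in the title (Load, Event and Change), the sketch below shows them as columns on a hypothetical staging table. The column names, data types and the interpretation in the comments are assumptions only.

```sql
-- Minimal sketch only: one possible reading of the Load, Event and Change date roles.
CREATE TABLE STG_Customer (
    CustomerId      INT            NOT NULL,
    CustomerName    NVARCHAR(100)  NULL,
    LoadDateTime    DATETIME2(7)   NOT NULL,  -- generated by the Data Warehouse when the record is received
    EventDateTime   DATETIME2(7)   NULL,      -- when the change became effective in the business process
    ChangeDateTime  DATETIME2(7)   NULL       -- when the record was changed in the operational system
);
```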

docs/design-patterns/design-pattern-generic-referential-integrity.md

Lines changed: 4 additions & 9 deletions
@@ -1,17 +1,14 @@
# Design Pattern - Generic - Referential Integrity

- ## Purpose
-
+ > [!WARNING]
+ > This design pattern requires a major update to refresh the content.

+ ## Purpose

## Motivation

-
-
## Applicability

-
-
## Structure

Referential Integrity and constraints
@@ -33,12 +30,10 @@ Every Data Warehouse table contains a predefined set of metadata attributes, whi

## Implementation guidelines

-
-
## Considerations and consequences

TBD

## Related patterns

- N/A.
+ N/A

docs/design-patterns/design-pattern-logical-natural-business-relationship.md

Lines changed: 3 additions & 0 deletions
@@ -4,6 +4,9 @@ uid: design-pattern-logical-natural-business-relationship

# Design Pattern - Natural Business Relationship

+ > [!WARNING]
+ > This design pattern requires a major update to refresh the content.
+
## Purpose

This design pattern defines a Natural Business Relationship (NBR) entity.

docs/index.md

Lines changed: 1 addition & 1 deletion
@@ -18,7 +18,7 @@ Design and implementation of data solutions can be a labour-intensive activity t

Over time, as requirements change and demand for data increases, the architecture faces challenges in the complexity, consistency and flexibility of the design (and maintenance) of the data integration processes.

- These changes can include latency and availability requirements, a bigger variety of operational systems that generate data, and the need to expose information in different ways. At the same time, data tends to increate in volume, variety, and velocity.
+ These changes can include latency and availability requirements, a bigger variety of operational systems that generate data, and the need to expose information in different ways. At the same time, data tends to increase in volume, variety, and velocity.

These issues are compounded by an absence of agreed industry best practices, which in turn leads to various ad-hoc patterns being implemented based on an individual's experience (or lack thereof). Implications of (poor) design decisions are often not fully understood, and only become apparent when the investment in time and money has already been made.
