|
1 | 1 | # Design Pattern - Generic - Loading Landing Area tables using Record Condensing
|
2 | 2 |
|
| 3 | +> [!WARNING] |
| 4 | +> This design pattern requires a major update to refresh the content. |
| 5 | +
|
3 | 6 | ## Purpose
|
| 7 | + |
4 | 8 | This Design Pattern specifies how to process a data source that contains multiple changes for the same business key, for instance when using 'net changes' within a Change Data Capture interval or when the source application supplies redundant records.
|
| 9 | + |
5 | 10 | Motivation
|
6 |
| -This process is optional for the Staging Area; its application depends on the specific (nature of the) data source itself. The reason to implement a ‘condense’ process in the Staging Area ETL is to prevent implementing this logic in multiple locations when loading data out of the Staging Area (to the History and Integration Areas). During this process no information is lost, only redundant records are removed. These are records that are, in reality, no changes at all in the Data Warehouse context. |
| 11 | + |
| 12 | +This process is optional for the Staging Area; its application depends on the specific (nature of the) data source itself. The reason to implement a 'condense' process in the Staging Area ETL is to prevent implementing this logic in multiple locations when loading data out of the Staging Area (to the History and Integration Areas). During this process no information is lost, only redundant records are removed. These are records that are, in reality, no changes at all in the Data Warehouse context. |
| 13 | + |
7 | 14 | Also known as
|
8 | 15 | Condensing Records
|
9 | 16 | Net changes
|
10 |
| -Applicability |
| 17 | + |
| 18 | +## Applicability |
| 19 | + |
11 | 20 | This pattern is only applicable to loading processes from source systems or files to the Staging Area (of the Staging Layer). Also, this process should only be added to the Staging Area ETL when the data source shows this particular behaviour or when a history of changes is loaded in one run (catch-up for instance).
|
12 | 21 | Structure
|
13 | 22 | Depending on the nature of the source data, the following situation may occur. In this example these are the original records as they appear in the source system:
|
@@ -45,49 +54,25 @@ Cheese
|
45 | 54 | This is a situation where the condensing process can be implemented so that the record will not be inserted into the Data Warehouse as a new record (without there being a change).
|
46 | 55 | The process to do this is as follows:
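As an illustration only, the condensing step could be sketched in Python as below. This is a minimal sketch under stated assumptions, not the pattern's prescribed implementation: the function name `condense`, the column names (`product_id`, `change_ts`, `name`, `price`) and the sample rows are all hypothetical. The idea is to order the records per business key by change time and keep a record only when its attributes differ from the previous record for that key.

```python
def condense(records, key, order_by, attributes):
    """Drop records that repeat the attribute values of the preceding
    record for the same business key (hypothetical helper)."""
    # Order by business key, then by change time, so that each record
    # is compared with its direct predecessor for the same key.
    ordered = sorted(records, key=lambda r: (r[key], r[order_by]))
    condensed = []
    last_seen = {}  # business key -> attribute values of the last kept record
    for record in ordered:
        current = tuple(record[a] for a in attributes)
        if last_seen.get(record[key]) != current:
            condensed.append(record)
            last_seen[record[key]] = current
    return condensed


# Made-up sample data: the 09:00 record repeats the 08:00 values.
rows = [
    {"product_id": 1, "change_ts": "2024-01-01T08:00", "name": "Cheese", "price": 10},
    {"product_id": 1, "change_ts": "2024-01-01T09:00", "name": "Cheese", "price": 10},
    {"product_id": 1, "change_ts": "2024-01-01T10:00", "name": "Cheese", "price": 12},
    {"product_id": 2, "change_ts": "2024-01-01T08:30", "name": "Milk", "price": 3},
]

condensed = condense(rows, "product_id", "change_ts", ["name", "price"])
for row in condensed:
    print(row)
```

Because each record is compared only with its direct predecessor, a value that changes and later changes back (A, B, A) is kept as three records, so no real change is lost.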
|
47 | 56 |
|
| 57 | +## Implementation guidelines |
48 | 58 |
|
49 |
| - Figure 1: Record condensing in STG |
50 |
| -Implementation guidelines |
51 | 59 | The condensation process should be part of the Staging Area ETL process.
|
52 | 60 | Depending on the available ETL software this process can be defined as a reusable or generic object.
|
53 | 61 | This Design Pattern attempts to avoid ETL design where you have to run a source which contains multiple intervals (typically days) of data multiple times to correctly record the history. With this concept the entire history can be loaded in one run.
|
54 | 62 | If all changes from a CDC source are captured this process is not required.
|
55 | 63 | There is a performance overhead when processing larger deltas or when running an initial load.
|
56 | 64 | This typically applies to CDC sources where only the net changes for an interval are processed rather than all changes, for instance when only the last change per day should be processed.
|
57 | 65 | Message sources can have the same issue when treated the same way (only last record state per interval).
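The 'last change per day' case mentioned above could look roughly like the following Python sketch. It is an assumption-laden illustration: the function name `net_changes_per_day`, the column names (`order_id`, `change_ts`, `status`) and the sample events are hypothetical, and the interval is fixed to a calendar day. Unlike removing redundant records, this reduction discards intermediate states within the interval.

```python
from datetime import datetime


def net_changes_per_day(records, key, timestamp):
    """Keep only the last record per business key per calendar day
    (hypothetical helper; the interval is assumed to be one day)."""
    latest = {}  # (business key, date) -> latest record seen so far
    for record in records:
        ts = record[timestamp]
        slot = (record[key], ts.date())
        if slot not in latest or ts > latest[slot][timestamp]:
            latest[slot] = record
    # Return in a stable order: per key, oldest interval first.
    return sorted(latest.values(), key=lambda r: (r[key], r[timestamp]))


# Made-up sample events: order 1 changes twice on 1 January.
events = [
    {"order_id": 1, "change_ts": datetime(2024, 1, 1, 8, 0), "status": "new"},
    {"order_id": 1, "change_ts": datetime(2024, 1, 1, 17, 0), "status": "shipped"},
    {"order_id": 1, "change_ts": datetime(2024, 1, 2, 9, 0), "status": "delivered"},
    {"order_id": 2, "change_ts": datetime(2024, 1, 1, 12, 0), "status": "new"},
]

net = net_changes_per_day(events, "order_id", "change_ts")
for event in net:
    print(event["order_id"], event["change_ts"].isoformat(), event["status"])
```

Three days of such data can be reduced in a single pass, which matches the intent of loading a multi-interval history in one run.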
|
58 |
| -Design Pattern 015 – Generic – Loading Staging Area tables. |
59 |
| -Design Pattern 021 – Generic – Using CDC. |
60 |
| -Consequences |
61 |
| -There is a performance overhead when processing larger deltas or when running an initial load. |
62 |
| -Known uses |
63 |
| -Typically CDC sources where not all changes are processed but only the net changes for an interval. For instance when only the last change per day should be processed. |
64 |
| -Message sources can have the same issue when treated the same way (only last record state per interval). |
65 |
| -Related patterns |
66 |
| -Design Pattern 015 – Generic – Loading Staging Area tables.Design Pattern 015 - Generic - Loading Staging Area Tables |
67 |
| -Design Pattern 021 – Generic – Using CDC. |
68 |
| -Discussion items (not yet to be implemented or used until final) |
69 |
| -None. |
70 | 66 |
|
71 |
| -## Motivation |
72 |
| - |
73 |
| - |
74 |
| - |
75 |
| -## Applicability |
76 |
| - |
77 |
| - |
78 |
| - |
79 |
| -## Structure |
80 |
| - |
81 |
| - |
82 |
| - |
83 |
| -## Implementation guidelines |
84 |
| - |
85 |
| - |
86 |
| - |
87 |
| -## Considerations and consequences |
| 67 | +## Consequences and considerations |
88 | 68 |
|
| 69 | +There is a performance overhead when processing larger deltas or when running an initial load. |
89 | 70 |
|
| 71 | +This typically applies to CDC sources where only the net changes for an interval are processed rather than all changes, for instance when only the last change per day should be processed. |
| 72 | +Message sources can have the same issue when treated the same way (only last record state per interval). |
90 | 73 |
|
91 | 74 | ## Related patterns
|
92 | 75 |
|
93 |
| -- |
| 76 | +* Design Pattern - Generic - Loading Staging Area tables |
| 77 | +* Design Pattern - Generic - Loading Staging Area Tables |
| 78 | +* Design Pattern - Generic - Using CDC. |