Commit d9cd021

Clean-up part 2
1 parent 684f609 commit d9cd021

20 files changed

+421
-221
lines changed

Data Integration Framework - Introduction.md

Lines changed: 3 additions & 7 deletions
@@ -83,13 +83,9 @@ A full overview is provided below:
8383

8484
- The (reference) **Solution Architecture** documentation is composed of the following documents:
8585
- Data Integration – 1 – Overview. The current document, providing an overview of Data Integration components.
86-
- Data Integration – 2 – Reference Architecture. The reference architecture describes the elements that comprise the (enterprise) Data Warehouse and Business Intelligence foundations, with the details showing how these elements fit together. It also provides the principles and guidelines to enable the design and development of Business Intelligence applications together with a Data Warehouse foundation that is scaleable, maintainable and flexible to meet business needs. These high level designs and principles greatly influence and direct the technical implementation and components
87-
- Data Integration – 3 – Staging Layer. This document covers the specific requirements and design of the Staging Layer. The document specifies how to set up a Staging Area and History Area
88-
- Data Integration – 4 – Integration Layer. This document covers the specific requirements and design of the Integration Layer; the core Enterprise Data Warehouse
89-
- Data Integration – 5 – Presentation Layer. This document covers the specific requirements and design of the Data Marts in the Presentation Layer which supports the Business Intelligence front-end.
90-
- Data Integration – 6 – Metadata Model. This document covers the complete process of controlling the system, which ties in with every step in the architecture. All ETL processes make use of the metadata and this document provides the overview of the entire concept. The model can be deployed as a separate module
91-
- Data Integration - 7- Error handling and recycling process, which ties in with every step in the architecture. Elements of the error handling and recycling documentation can be used in a variety of situations
92-
- Data Integration – 8 – OMD Framework Detailed Design. This document provides detailed process descriptions for the ETL process control (Operational Meta Data model – OMD).
86+
- Data Integration – 2 – Staging Layer. This document covers the specific requirements and design of the Staging Layer. The document specifies how to set up a Staging Area and History Area
87+
- Data Integration – 3 – Integration Layer. This document covers the specific requirements and design of the Integration Layer; the core Enterprise Data Warehouse
88+
- Data Integration – 4 – Presentation Layer. This document covers the specific requirements and design of the Data Marts in the Presentation Layer which supports the Business Intelligence front-end.
9389

9490
- Design Patterns. Detailed backgrounds on design principles: the how-to’s. Design Patterns provide best-practice approaches to typical Data Warehouse challenges. At the same time the Design Patterns provide a template to document future design decisions.
9591
- Solution Patterns. Highly detailed implementation documentation for specific software platforms. Typically a single Design Patterns is referred to by multiple Solution Patterns, all of which document how to exactly implement the concept using a specific technology

Data Integration Framework - Reference Solution Architecture - 2 - Staging Layer.md

Lines changed: 27 additions & 29 deletions
Large diffs are not rendered by default.

Data Integration Framework - Reference Solution Architecture - 3 - Integration Layer.md

Lines changed: 25 additions & 27 deletions
Original file line numberDiff line numberDiff line change
@@ -1,4 +1,4 @@
1-
# Introduction
1+
# Integration Layer overview
22

33
The Integration Layer is the second layer in the reference Data Warehouse solution architecture. This Layer is not designed to be accessible by (end) users of the information but serves as the true Data Warehouse Layer, where information is maintained in such a way that it is both resilient and flexible. The Integration Layer sources its information from the Staging Area and stores it in a consistent and atomic way, without applying business logic. This data can then be presented in a consumable form in the Presentation Layer.
44

@@ -8,8 +8,6 @@ The design and approach for modelling the Integration Layer is a project specifi
88

99
If the Solution Architecture for a project is defined as ‘2-tiered’ – the classic Kimball approach – the Integration Layer is not implemented.
1010

11-
# Integration Layer overview
12-
1311
The Integration Layer, or the process from staging to integration, is comprised of two parts (or areas): the Integration Area and the Interpretation Area. The Integration Layer is a persistent Layer.
1412

1513
The Integration Area is the phase where data from the Staging Layer is re-modelled and changes in attributes are captured and tracked using the Slowly Changing Dimension (SCD) 2 technique. Surrogate keys for new records are also identified and assigned prior to the loading of the attributes.
@@ -52,19 +50,19 @@ The following metadata attributes are mandatory for the Surrogate Key tables:
5250
| **Column Name** | **Data Type** | **Reasoning** |
5351
| ----------------------------- | ---------------------------- | ------------------------------------------------------------ |
5452
| <entity>_SK | INTEGER or CHAR(32) / Hash | The Data Warehouse key; an unique identifier and also the primary key which is issued for each record in the table. It can be a meaningless key (sequence) or hashed value |
55-
| OMD_INSERT_MODULE_INSTANCE_ID | INTEGER | Default OMD; logging which process has inserted the record |
56-
| OMD_FIRST_SEEN_DATETIME | DATETIME (high precision) | This is the time that the record has been presented to the Data Warehouse environment. This is not the system date/time for insert however, but the original processing time for the records to be loaded into the Staging Area. The Insert Date/Time is the conceptual Event Date/Time; the date time when the source event was triggered or the change in the source has taken place. It can be the moment a user updated a record in a source system, or the trigger which caused a message to be sent. |
57-
| OMD_RECORD_SOURCE_ID | INTEGER | The relation to the OMD table which contains the identification of the source system that originally supplied the information. |
53+
| | INTEGER | Default; logging which process has inserted the record |
54+
| | DATETIME (high precision) | This is the time that the record has been presented to the Data Warehouse environment. This is not the system date/time for insert however, but the original processing time for the records to be loaded into the Staging Area. The Insert Date/Time is the conceptual Event Date/Time; the date time when the source event was triggered or the change in the source has taken place. It can be the moment a user updated a record in a source system, or the trigger which caused a message to be sent. |
55+
| Record Source Id | INTEGER | The relation to the ETL process control table which contains the identification of the source system that originally supplied the information. |
5856
| <business key> | Depending | The business key value |
5957

6058
The following attributes are optional for the Surrogate Key tables depending on the approach for Data Modelling:
6159

6260
| **Column Name** | **Data Type** | **Reasoning** |
6361
| ----------------------------- | ---------------------------- | ------------------------------------------------------------ |
64-
| OMD_EFFECTIVE_DATETIME | DATETIME (high precision) | Start of the validity period for the record. Equal to the OMD_INSERT_DATETIME; this is not the system date/time, but the information recorded during the Staging Area ETL process. |
65-
| OMD_EXPIRY_DATETIME | DATETIME (high precision) | The date time when the record was closed. Records are closes based on changes in the history (alteration or deletion). The value of this attribute is the value of the valid start date time of the previous related. The default value is 99991231 23:59:59. |
66-
| OMD_CURRENT_RECORD_INDICATOR | VARCHAR(100) | The flag (Y/N) whether this record is active. This makes selection and querying easier, but is essentially twice redundant. If possible use the Expiry Date/Time for this purpose. |
67-
| OMD_UPDATE_MODULE_INSTANCE_ID | INTEGER | The module ID of the ETL process which has updated the record. |
62+
| Effective date / time | DATETIME (high precision) | Start of the validity period for the record. Equal to the Load Date / Time; this is not the system date/time, but the information recorded during the Staging Area ETL process. |
63+
| Expiry date / time | DATETIME (high precision) | The date time when the record was closed. Records are closed based on changes in the history (alteration or deletion). The value of this attribute is the value of the valid start date time of the previous related record. The default value is 99991231 23:59:59. |
64+
| Current record indicator | VARCHAR(100) | The flag (Y/N) whether this record is active. This makes selection and querying easier, but is essentially twice redundant. If possible use the Expiry Date/Time for this purpose. |
65+
| ETL Process control Id | INTEGER | The module ID of the ETL process which has updated the record. |
6866

6967
The use of a ‘valid period of time’ (start and end date time) including the current record indicator is optional. There can be sound reasons for including these metadata attributes in a surrogate key table when source systems can reuse their own keys and specific logic has to be created to determine if a reused key is in fact a new instance of an entity or that an old one has been reopened.
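
As an illustration of the `CHAR(32) / Hash` option for the Data Warehouse key in the Surrogate Key table, the sketch below derives a 32-character key from a business key. The choice of MD5, the `|` delimiter, the whitespace trimming, the upper-casing, and the function name `hash_key` are all assumptions made for this example; the documentation only states that the key may be a hashed value stored as `CHAR(32)`.

```python
import hashlib


def hash_key(business_key_parts, delimiter="|"):
    """Return a 32-character hexadecimal digest for a (composite) business key.

    Assumptions for illustration only: MD5 as the hash function, '|' as the
    delimiter between key parts, and trimming of surrounding whitespace.
    """
    # Cast every part to a string and trim it so that padded source values
    # resolve to the same Data Warehouse key.
    normalised = [str(part).strip() for part in business_key_parts]
    # The delimiter keeps ("A", "B") distinct from ("AB",).
    joined = delimiter.join(normalised)
    # An MD5 digest is 128 bits, which is 32 hexadecimal characters.
    return hashlib.md5(joined.encode("utf-8")).hexdigest().upper()


if __name__ == "__main__":
    print(hash_key(["EMP001"]))
    print(hash_key(["EMP001", "HR"]))
```

A sequence-based `INTEGER` key avoids hashing altogether; the hashed variant is shown only because it can be computed without a lookup against the Surrogate Key table.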
7068

@@ -76,24 +74,24 @@ The following metadata attributes are mandatory for the history tables:
7674

7775
| **Column Name** | **Data Type** | **Reasoning** |
7876
| ----------------------------- | ---------------------------- | ------------------------------------------------------------ |
79-
| <entity>_SK | INTEGER or CHAR(32) / Hash | The Data Warehouse key; an unique identifier and also the primary key which is issued for each record in the table. It can be a meaningless key (sequence) or hashed value. This is inherited from the parent table as Foreign Key |
80-
| OMD_EFFECTIVE_DATETIME | DATETIME (high precision) | Start of the validity period for a record. Populated by the OMD_INSERT_DATETIME value from the Staging Area this is not the system date/time but the information recorded during the Staging Area ETL process. |
81-
| OMD_INSERT_MODULE_INSTANCE_ID | INTEGER | Default OMD attribute for any table for logging which process has inserted the record. |
82-
| OMD_UPDATE_MODULE_INSTANCE_ID | INTEGER | The module ID of the ETL process which has updated the record. |
83-
| OMD_RECORD_SOURCE_ID | INTEGER | The relation to the OMD table which contains the identification of the source system that originally supplied the information. |
84-
| OMD_SOURCE_ROW_ID | INTEGER | Copied from the Staging Area. The combination of OMD_INSERT_MODULE_INSTANCE_ID and OMD_SOURCE_ROW_ID always relate back to a single History Area record |
85-
| OMD_DELETED_RECORD_INDICATOR | VARCHAR(100) | This flag (Y/N) indicates that the record has been deleted from the source system. |
77+
| <entity>_<key> | INTEGER or CHAR(32) / Hash | The Data Warehouse key; an unique identifier and also the primary key which is issued for each record in the table. It can be a meaningless key (sequence) or hashed value. This is inherited from the parent table as Foreign Key |
78+
| Effective Date / Time | DATETIME (high precision) | Start of the validity period for a record. Populated by the Load Date / Time value from the Staging Area this is not the system date/time but the information recorded during the Staging Area ETL process. |
79+
| ETL process control Id | INTEGER | Default ETL process control attribute for any table for logging which process has inserted the record. |
80+
| ETL process control Id | INTEGER | The module ID of the ETL process which has updated the record. |
81+
| Record Source Id | INTEGER | The relation to the ETL process control table which contains the identification of the source system that originally supplied the information. |
82+
| Source Row Id | INTEGER | Copied from the Staging Area. The combination of ETL process control Id and Source Row Id always relate back to a single History Area record |
83+
| Deleted Record Indicator | VARCHAR(100) | This flag (Y/N) indicates that the record has been deleted from the source system. |
8684

8785
The following attributes are optional for the history tables in the Integration Layer:
8886

8987
| **Column Name** | **Data Type** | **Reasoning** |
9088
| ---------------------------- | ---------------------------- | ------------------------------------------------------------ |
91-
| OMD_EXPIRY_DATETIME | DATETIME (high precision) | The date time when the record was closed. Records are closes based on changes in the history (alteration or deletion). The value of this attribute is the value of the valid start date time of the previous related record minus 1 second. The default value is 99991231 23:59:59. |
92-
| OMD_CURRENT_RECORD_INDICATOR | VARCHAR(100) | The flag (Y/N) whether this record is active. This makes selection and querying easier. |
89+
| Expiry Date / Time | DATETIME (high precision) | The date time when the record was closed. Records are closed based on changes in the history (alteration or deletion). The value of this attribute is the value of the valid start date time of the previous related record minus 1 second. The default value is 99991231 23:59:59. |
90+
| Current Record Indicator | VARCHAR(100) | The flag (Y/N) whether this record is active. This makes selection and querying easier. |
9391
| | | |
94-
| OMD_HASH_FULL_RECORD | CHAR(32) | A checksum for record comparison requires storing a checksum value as an attribute. |
92+
| Hash Full Record | CHAR(32) | A checksum for record comparison requires storing a checksum value as an attribute. |
9593

96-
In history tables the Primary Key is composed of the <entity_SK> and the OMD_EXPIRY_DATETIME attributes.
94+
In history tables the Primary Key is composed of the <entity_SK> and the Expiry Date / Time attributes.
9795

9896
The optional attributes include all reference data which relates to the entity Data Warehouse key. In the example of an employee record the person ID would lead to the generation of a new surrogate key, while all descriptive attributes are placed in the history table. Depending on considerations regarding volume or width of the table (in terms of records, bytes) different history records can be placed in different history tables, but always with the same structure as described in the above table.
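
The optional Expiry Date / Time and Current Record Indicator attributes described in the table above can be derived from the Effective Date / Time values alone. The following is a minimal sketch under stated assumptions: it reads "the previous related record" as the record that supersedes the one being closed, it works on the history of a single Data Warehouse key, and the names `derive_validity` and `HIGH_DATE` are hypothetical.

```python
from datetime import datetime, timedelta

# Default expiry value from the documentation: 99991231 23:59:59.
HIGH_DATE = datetime(9999, 12, 31, 23, 59, 59)


def derive_validity(effective_datetimes):
    """Derive expiry date/time and current indicator for one key's history.

    Each record is closed one second before the next record for the same key
    becomes effective; the most recent record keeps the default high date.
    """
    ordered = sorted(effective_datetimes)
    rows = []
    for position, effective in enumerate(ordered):
        is_latest = position == len(ordered) - 1
        if is_latest:
            expiry = HIGH_DATE
        else:
            expiry = ordered[position + 1] - timedelta(seconds=1)
        rows.append(
            {
                "effective": effective,
                "expiry": expiry,
                "current": "Y" if is_latest else "N",
            }
        )
    return rows


if __name__ == "__main__":
    history = [datetime(2020, 1, 1), datetime(2020, 6, 1, 12, 0, 0)]
    for row in derive_validity(history):
        print(row)
```

Because the current record indicator follows directly from the expiry value, the sketch also shows why the documentation treats the indicator as redundant.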
9997

@@ -103,12 +101,12 @@ The relationship table structure is largely dependent on the applied modelling t
103101

104102
| **Column Name** | **Data Type** | **Reasoning** |
105103
| ---------------------------------------------- | ---------------------------- | ------------------------------------------------------------ |
106-
| <relationship_SK> | INTEGER or CHAR(32) / Hash | The Data Warehouse key; an unique identifier and also the primary key which is issued for each record in the table. It can be a meaningless key (sequence) or hashed value |
107-
| <entity>_SK (one side of the relationship) | INTEGER or CHAR(32) / Hash | A unique identifier; the Data Warehouse key obtained from the Surrogate Key table. |
108-
| <entity>_SK (other side of the relationship) | INTEGER or CHAR(32) / Hash | A unique identifier; the Data Warehouse key obtained from the Surrogate Key table. |
109-
| OMD_INSERT_MODULE_INSTANCE_ID | INTEGER | Default OMD; logging which process has inserted the record |
110-
| OMD_FIRST_SEEN_DATETIME | DATETIME (high precision) | This is the time that the record has been presented to the Data Warehouse environment. This is not the system date/time for insert however, but the processing time for the records to be moved into the Staging Area. |
111-
| OMD_RECORD_SOURCE_ID | INTEGER | The relation to the OMD table which contains the identification of the source system that originally supplied the information. |
104+
| <relationship>_<key> | INTEGER or CHAR(32) / Hash | The Data Warehouse key; an unique identifier and also the primary key which is issued for each record in the table. It can be a meaningless key (sequence) or hashed value |
105+
| <entity>_<key> (one side of the relationship) | INTEGER or CHAR(32) / Hash | A unique identifier; the Data Warehouse key obtained from the Surrogate Key table. |
106+
| <entity>_<key> (other side of the relationship) | INTEGER or CHAR(32) / Hash | A unique identifier; the Data Warehouse key obtained from the Surrogate Key table. |
107+
| ETL Process Control id | INTEGER | Default; logging which process has inserted the record |
108+
| Load Date / Time Stamp | DATETIME (high precision) | This is the time that the record has been presented to the Data Warehouse environment. This is not the system date/time for insert however, but the processing time for the records to be moved into the Staging Area. |
109+
| Source Row Id | INTEGER | The relation to the ETL process control table which contains the identification of the source system that originally supplied the information. |
112110

113111

114112
