100_Overview/Data Integration Framework - Overview.md
# Introduction
The Data Integration Framework provides a software- and methodology-independent, structured approach to data integration. The framework is designed to support a flexible and affordable development cycle.
Through fit-for-purpose pre-defined documents, templates, design decisions, built-in error handling, auditability and process control, the framework provides the consistency and structure for future-proof data integration and Data Warehouse design and development on any platform.
It is not a one-size-fits-all solution; everything is defined in a modular way, and different elements can be applied to suit the needs of each individual project. The Data Integration Framework has a variety of content components which can be used in conjunction with each other or as a stand-alone addition to existing management information solutions.
The fundamental principle of the Data Integration Framework is to design for change by decoupling 'warehouse logic' from 'business logic', and by ensuring every ETL process can run and recover at any point in time (in parallel) without impacting dependencies on other processes. The framework provides a standard to manage all data as an *asset* for the organisation and ensures that any new or changed requirements in reporting and information delivery can be met without re-engineering the underlying foundations. By decoupling the business logic it is possible to add new sources of information in a straightforward and consistent way without impacting existing reporting or analysis initiatives.
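The 'run and recover at any point in time' principle can be illustrated with a small sketch. Each ETL process is made idempotent: a rerun first removes anything written by an earlier attempt of the same run, so recovery never produces duplicates and never requires coordination with other processes. The structures and names below are illustrative only; the framework itself is tool- and platform-independent.

```python
def load_staging(staging, source_rows, run_id):
    """Idempotent load: remove any rows from a previous attempt of this
    run before inserting, so the process can be rerun at any point."""
    # Delete rows written by an earlier (possibly failed) execution of this run.
    staging["rows"] = [r for r in staging["rows"] if r["run_id"] != run_id]
    # Reload; every row is tagged with the run identifier for auditability.
    staging["rows"].extend({**row, "run_id": run_id} for row in source_rows)

# A failed first attempt followed by a rerun yields no duplicates.
staging = {"rows": []}
batch = [{"id": 1, "name": "a"}, {"id": 2, "name": "b"}]
load_staging(staging, batch, run_id="2024-01-01")  # first attempt
load_staging(staging, batch, run_id="2024-01-01")  # rerun after a failure
```

In a real implementation the same delete-then-insert (or merge) pattern would be applied per target table, keyed on the process-control run identifier.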
The Data Integration Framework does not break with established approaches and schools of thought, but defines an optimal combination of well-known and well-understood techniques to simplify data and information delivery.
## Why do we need a Data Integration Framework?
‘If we want better performance we can buy better hardware, unfortunately we cannot buy a more maintainable or more reliable system’.
These changes can include changes in latency, the bigger variety of sources or t…
A flexible ETL approach meets these challenges by providing structure, flexibility and scalability in the design of data integration flows.
Today’s BI architecture is typically designed to store structured data for strategic decision making, where a small number of (expert) users analyse (historical) data and reports. Data is typically extracted periodically, cleansed, integrated and transformed into a Data Warehouse from a heterogeneous set of sources. The focus for ETL has been on ‘correct functionality’ and ‘adequate performance’, but this focus misses key elements that are equally important for success. These elements, such as consistency, degree of atomicity, the ability to rerun, scalability and robustness, are addressed by the Data Integration Framework.
Future data solutions should, for example, be able to cater for sending cleansed or interpreted data back to the operational systems. They should also be able to cope with unstructured data alongside structured data, and must be able to respond quickly to changes in (business) requirements. Lastly, they will need to support a ‘feedback loop’ to incorporate changes made by (authorised) end-users in the front-end environments.
To be ready for future changes, the next generation of data integration and ETL designs must support a methodology which provides the foundation for a flexible approach. Without this structured approach to data integration design, the solution will ultimately risk becoming the ‘spaghetti of code and rules’ it was initially meant to replace. That is why we need a Data Integration Framework.
The Data Integration Framework provides a structured approach to data integration design for an easy, flexible and affordable development cycle. By providing architecture documents, mapping templates, design decisions and built-in error handling and process control, the framework delivers the consistency and structure for future-proof ETL on any platform.
## Key benefits
- Model-driven design; define the information model and expand your solution gradually and consistently from there. ETL is automatically generated from the model specifications
- ETL quality and consistency; template-driven ETL automation based on a conceptual framework provides a repeatable and dynamic development process which reduces the need for extensive documentation and delivers deterministic, high-quality ETL logic
- A documented and sound foundation for the Data Warehouse; the highly structured and complete documentation of all framework components provides a full picture from the high-level concepts all the way down to the technical implementation for a large variety of ETL platforms
- The Data Integration Framework provides the rules; only a focus on the necessary data (input) and the reporting (output) is required
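The model-driven and template-driven points above can be sketched in a few lines: the load statement is generated purely from model metadata, which is what makes the output deterministic and repeatable. The table names, column names and template below are illustrative assumptions, not part of the framework.

```python
# A fixed template plus model metadata yields deterministic ETL logic.
TEMPLATE = (
    "INSERT INTO {target} ({columns})\n"
    "SELECT {columns} FROM {source};"
)

def generate_load(mapping):
    """Generate a load statement from a model-driven mapping definition."""
    return TEMPLATE.format(
        target=mapping["target"],
        source=mapping["source"],
        columns=", ".join(mapping["columns"]),
    )

sql = generate_load({
    "source": "STG_CUSTOMER",     # illustrative staging table
    "target": "INT_CUSTOMER",     # illustrative integration table
    "columns": ["customer_id", "customer_name"],
})
print(sql)
```

Because every mapping is rendered from the same template, adding a new entity to the model automatically produces a consistent load process, with no hand-written ETL to document or review.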
## Intent and foundational principles
To accurately and quickly adapt to business needs, the intended data solution should…
# Data Integration Framework overview
## Components
The complete deployment of all components supports an industry standard flexible…
## Data Integration Framework documentation breakdown
The Data Integration Framework consists of the following documents:
- The (reference) **Solution Architecture** documentation is composed of the following documents:
  - Data Integration Framework – 1 – Overview. The current document, providing an overview of the Data Integration Framework components.
  - Data Integration Framework – 2 – Reference Architecture. The reference architecture describes the elements that comprise the (enterprise) Data Warehouse and Business Intelligence foundations, with the details showing how these elements fit together. It also provides the principles and guidelines to enable the design and development of Business Intelligence applications together with a Data Warehouse foundation that is scalable, maintainable and flexible to meet business needs. These high-level designs and principles greatly influence and direct the technical implementation and components.
  - Data Integration Framework – 3 – Staging Layer. This document covers the specific requirements and design of the Staging Layer. The document specifies how to set up a Staging Area and History Area.
  - Data Integration Framework – 4 – Integration Layer. This document covers the specific requirements and design of the Integration Layer; the core Enterprise Data Warehouse.
  - Data Integration Framework – 5 – Presentation Layer. This document covers the specific requirements and design of the Data Marts in the Presentation Layer, which supports the Business Intelligence front-end.
  - Data Integration Framework – 6 – Metadata Model. This document covers the complete process of controlling the system, which ties in with every step in the architecture. All ETL processes make use of the metadata, and this document provides the overview of the entire concept. The model can be deployed as a separate module.
  - Data Integration Framework – 7 – Error Handling and Recycling. This document covers the error handling and recycling process, which ties in with every step in the architecture. Elements of the error handling and recycling documentation can be used in a variety of situations.
  - Data Integration Framework – 8 – OMD Framework Detailed Design. This document provides detailed process descriptions for the ETL process control (Operational Meta Data model – OMD).
- Design Patterns. Detailed backgrounds on design principles: the how-to’s. Design Patterns provide best-practice approaches to typical Data Warehouse challenges. At the same time the Design Patterns provide a template to document future design decisions.
- Solution Patterns. Highly detailed implementation documentation for specific software platforms. Typically a single Design Pattern is referred to by multiple Solution Patterns, all of which document exactly how to implement the concept using a specific technology.
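The process control described in the Metadata Model and OMD documents registers every ETL execution. As a rough illustration of that idea (these structures and names are not the actual OMD schema), each process could log a start and end event so that every run is auditable and recoverable:

```python
import datetime

# Illustrative operational-metadata log; a real implementation would use
# process-control tables, not an in-memory list.
omd_log = []

def start_process(name):
    """Register the start of an ETL process execution."""
    event = {
        "process": name,
        "start": datetime.datetime.now(datetime.timezone.utc),
        "end": None,
        "status": "running",
    }
    omd_log.append(event)
    return event

def end_process(event, status="succeeded"):
    """Register the end of an ETL process execution with its outcome."""
    event["end"] = datetime.datetime.now(datetime.timezone.utc)
    event["status"] = status

run = start_process("stg_customer_load")
# ... the actual data movement would happen here ...
end_process(run)
```

A recovery routine can then inspect the log for executions still marked "running" or "failed" and restart only those, which is what makes independent rerun and recovery possible.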
The Reference Architecture and the corresponding Technical (Solution) Architecture…
For instance, the Reference Architecture (Staging Layer component) states that the loading of Flat Files should be broken into separate process steps wherever data type conversions must be performed. It also states that Flat Files should be archived after processing, and why.
In this example, the Design Pattern would refer to the ‘AGA Data Integration Framework – 3 – Staging Layer’ document and related Solution Patterns to define the necessary elements, such as storing the file creation date, unzipping and moving files, creating file lists and other necessary steps.
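A rough sketch of those flat-file handling steps (record the file creation date, build a file list, process each file, archive it afterwards), with hypothetical directory names and the actual data type conversion left out:

```python
import os
import shutil
import time

def stage_flat_files(landing_dir, archive_dir):
    """Process every flat file in the landing area and archive it."""
    os.makedirs(archive_dir, exist_ok=True)
    processed = []
    # Build a file list of everything waiting in the landing area.
    for name in sorted(os.listdir(landing_dir)):
        path = os.path.join(landing_dir, name)
        # Store the file (modification) date for auditability.
        created = time.ctime(os.path.getmtime(path))
        # (Data type conversion of the file contents would happen here,
        #  as a separate process step per the Reference Architecture.)
        # Archive the file after processing, as the architecture requires.
        shutil.move(path, os.path.join(archive_dir, name))
        processed.append({"file": name, "created": created})
    return processed
```

Each of these steps would normally be documented as its own Solution Pattern for the chosen ETL platform; the sketch only shows how they chain together.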
# Adoption
## Positioning
The Data Integration Framework should be viewed as one part of the larger (enterprise) architecture. Its purpose is to specify how the ETL and the data model can be configured for an optimal Enterprise Data Warehouse implementation. This is a detailed (albeit significant) component of the Data Warehouse architecture, which itself includes other components such as the system landscape, subject areas, and the Business Intelligence and Data domains.
The reference architecture serves as an outline to relate ETL examples and best-practices to. The main purpose is to create a common ground where every developer can use the same approach and background to contribute to a common integrated data repository.
Because of its nature as a reference architecture, not all components necessarily have to be deployed for an individual project. In some scenarios, components are integrated into existing solutions or structures. For this reason the solutions are designed to be as modular as possible, enabling the utilisation of specific components.
The framework Reference Architecture is a standard approach for Enterprise Data Warehouse and Business Intelligence design and implementation. The most important aspect is to understand which ETL and Data Warehousing concepts are used in what Layer of the architecture and the reasoning behind this.
The Reference Architecture also provides the basic structure for the documentation of the Data Integration Framework. Every component of the design and implementation relates back to this architecture, including tips, tricks and examples of implementation options. The implementation solutions for this architecture are designed to be as generic as possible without losing practical value.
## Executing a project
In principle, every project that contributes to the common integrated data model as executed within the AGA Data Integration Framework follows the same approach. At a high level this is as follows:
- Define the Solution Architecture and Technical Architecture based on the framework reference architecture. Effectively this defines how sources are interfaced and integrated into the central model: for instance, how to collect data deltas (Change Data Capture) following the framework principles, where data should be integrated in the structured or unstructured world, and so on.
- Define the Project Scope; this is a breakdown of a requirement in data terms: what data is needed to meet requirements in the broader sense (to answer this and similar questions). This scope of data becomes the input for bottom-up planning of data integration.
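As one illustration of collecting a data delta (CDC) when the source system offers no change log, a full extract can be compared against the previous extract by key. This full-extract comparison is only one of the interface options the framework allows, and the names are illustrative:

```python
def detect_delta(previous, current):
    """Compare two full extracts (keyed dicts) and classify the changes."""
    inserts = {k: v for k, v in current.items() if k not in previous}
    deletes = {k: v for k, v in previous.items() if k not in current}
    updates = {k: v for k, v in current.items()
               if k in previous and previous[k] != v}
    return {"insert": inserts, "update": updates, "delete": deletes}

# Key 1 changed, key 2 disappeared, key 3 is new.
delta = detect_delta(
    previous={1: "Alice", 2: "Bob"},
    current={1: "Alicia", 3: "Carol"},
)
```

The resulting insert/update/delete sets are what the Staging and History Areas would then record, preserving a full audit trail of source changes.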