Commit 4f776fe

Update README.md
1 parent 914bb36 commit 4f776fe

1 file changed

+20
-43
lines changed

README.md

Lines changed: 20 additions & 43 deletions
@@ -1,67 +1,46 @@
1-
# Data Integration framework - Overview
1+
# Data Solution Framework
22

33
## Introduction
44

5-
The Data Integration framework provides a software and methodology independent, structured approach to developing data processes.
5+
The Data Solution Framework provides a software and methodology independent, structured approach to developing data solutions.
66

7-
The framework is designed to facilitate a platform-independent, flexible and manageable development cycle. It contains pre-defined documents, templates, design decisions, implementation approaches as well as auditing and process control (orchestration).
7+
It is a collection of documents that can be used as a 'recipe' for creating and managing data solutions. The framework contains example solution designs, design patterns, and implementation approaches which can be adopted to create various types of data solutions, ideally leveraging the experience consolidated in these designs.
88

9-
The framework is defined in a modular way, allowing different elements to be selected to suit the needs individual data solutions. The individual components can be used in conjunction with each other or as stand-alone additions to existing data solutions.
9+
The framework is defined in a modular way, allowing different elements to be selected to suit the needs of individual data solutions. The individual components can be used in conjunction with each other, or as stand-alone additions to existing data solutions. As such, the framework is agnostic of technology and methodology, and designed to facilitate a platform-independent, flexible and manageable development cycle.
1010

11-
To enable collective maintenance of this body of knowledge these standards are developed and maintained using the MarkDown format on Github.
11+
On several occasions, the Data Solution Framework mentions a control framework for data logistics (also referred to as Extract, Transform, and Load processes, or ETL). Although other control frameworks can be added, the default option for this is the DIRECT framework as maintained in the [DIRECT repository](https://github.com/data-solution-automation-engine/DIRECT), which is part of the [engine for data solution automation](https://github.com/data-solution-automation-engine).
1212

13-
On several occasions, the Data Integration framework makes mention of the ETL process control framework. Although other control frameworks can be added, the default option for this is the DIRECT framework as maintained in the [DIRECT Github](https://github.com/RoelantVos/DIRECT) (private at the moment while being finalised).
13+
### Why use a framework?
1414

15-
## Why need a Data Integration framework?
15+
*‘If we want better performance we can usually buy better hardware, unfortunately we cannot buy a more maintainable or reliable system’.*
1616

17-
*‘If we want better performance we can buy better hardware, unfortunately we cannot buy a more maintainable or reliable system’.*
17+
Design and implementation of data solutions can be a labour-intensive activity that typically consumes large amounts of effort, time, and cost. In his book “Thank You for Being Late” (2016), Thomas Friedman describes and explains the drivers behind technology’s increasing rate of change and how it enables innovation to a point where human adaptability is outpaced.
1818

19-
Design and implementation of data integration can be a labour-intensive activity that typically consumes large amounts of effort in Data Warehouse and data integration projects.
19+
Over time, as requirements change and demand for data increases, the architecture faces challenges in the complexity, consistency and flexibility in the design (and maintenance) of the data integration processes.
2020

21-
Over time, as requirements change and enterprises become more data-driven, the architecture faces challenges in the complexity, consistency and flexibility in the design (and maintenance) of the data integration flows.
21+
These changes can include latency and availability requirements, a bigger variety of operational systems that generate data, and the need to expose information in different ways. At the same time, data tends to increase in volume, variety, and velocity.
2222

23-
These changes can include changes in latency and availability requirements, a bigger variety of sources or the need to expose information in different ways. This typically occurs when adoption of data and information products (i.e. BI, Analytics) matures within an organisation and the need to have up-to-date information becomes more mission critical.
23+
These issues are compounded by an absence of agreed industry best practices, which in turn leads to various ad-hoc design patterns being implemented based on an individual's experience (or lack thereof). Implications of (poor) design decisions are often not fully understood, and only become apparent when the investment in time and money has already been made.
2424

25-
Using a standard data integration approach will meet these challenges by providing structure, flexibility and scalability for the design of data flows.
25+
The framework aims to provide the means to address these challenges, by consolidating past experiences into standardized and community-managed design patterns. Without a high quality library of patterns, the design, implementation and testing of data solutions is often sporadic, and not sufficient to provide high-quality results.
2626

27-
In a more traditional configuration, data solutions are often designed to store structured data for strategic decision making. This type of solution allows a small number of (expert) users to analyse (historical) data and define reports.
27+
### How can the framework be used?
2828

29-
Data is typically periodically extracted, cleansed, integrated and transformed in a centralised Data Warehouse from a heterogeneous set of sources. The focus for ETL in these design is typically on ‘correct functionality’ and ‘adequate performance’ - but not necessarily on design elements that are equally important for success.
29+
The framework contains a reference solution architecture that provides an overview of the definitions and intent of the various layers and areas that can be considered in the design.
3030

31-
These elements, including consistency, degree of atomicity, ability to rerun, scalability and durability are addressed in the Data Integration framework.
32-
33-
For example, data solutions may be required to cater for sending back cleansed or interpreted data to the operational (feeding, or source) systems. They also may need to handle unstructured data in addition to the structured data, as well as being able to quickly respond to changes in (business) requirements. Lastly, they may need to support a ‘feedback loop’ to incorporate changes made by (authorised) end-users in the front-end environments.
34-
35-
The Data Integration framework intents to provide architecture patterns and templates, design decisions and guidelines for error handling and process control for a flexible and manageable development cycle.
36-
37-
## Key contents
38-
39-
The framework contains a reference Solution Architecture that provides an overview of the definitions and intent of the various layers and areas that can be considered in the design.
40-
41-
The core body of knowledge sits in the various *Design Patterns* (details of specific concepts) and *Solution Patterns* (implementation guides at technical level).
31+
The core body of knowledge sits in the various *Design Patterns* (details of specific concepts) and *Solution Patterns* (implementation guides at technical level).
4232

4333
The idea is that Design and Solution Patterns are continuously updated and added to. A typical solution design would select the relevant patterns to define the architecture, captured in the Solution Architecture design artefact.
4434

45-
## Data Integration framework components
35+
## Contents
4636

47-
The diagram below outlines the Data Integration framework components. These are all required to define a data solution that supports Data Warehouse Automation.
37+
At a high level, the framework contains the following artefacts:
4838

49-
The idea is to enable a standard and structured way for documenting decisions related to system design and operation.
50-
51-
![1547519339316](./Images/5C1547519339316.png)
52-
53-
* **Reference Solution Architecture**; a blueprint for a common data solution architecture such as Data Warehouses, Data Hubs etc. The corresponding documents outline the various layers and areas that define the data solution.
54-
* **Reference Technical Architecture**; capturing the technical details relevant to the Solution Architecture. The intent for this template is to capture the infrastructure and software specifics, as well as context for the physical data models and database / data platform configuration. The Technical Architecture also covers details around the implementation of security, encryption and retention approaches.
39+
* **Reference solution architecture**; a blueprint for common data solution architectures such as Data Warehouses, Data Hubs, Data Lakes etc. The corresponding documents outline the various layers and areas that define the data solution.
5540
* **Design Patterns**; documentation of key design decisions and backgrounds on design principles: the 'how-to's'. This includes the application of data integration and modelling concepts. Design Patterns follow a defined template and are centrally stored and managed.
5641
* **Solution Patterns**; the practical details on how to implement concepts explained in a Design Pattern for a given technology. Similar to Design Patterns, the Solution Patterns all follow the same template. In many cases a single Design Pattern is referred to by multiple Solution Patterns, all of which document how to implement the concept for a specific technology.
57-
* **Documentation templates**, standards and conventions; modelling and technical conventions.
58-
* **ETL templates & patterns**; technical templates that can be used as blueprints to generate data integration processes with or against.
59-
* **ETL mapping metadata**; approaches for managing the source-to-target mappings - vital ETL metadata to enable Data Warehouse Automation / ETL generation.
60-
* **ETL process control framework**; this is the runtime execution, logging and monitoring of data integration processes, including recovery and orchestration. This is further detailed in the DIRECT Github (Data Integration Runtime Execution and Control framework). DIRECT includes a repository for ETL control, integration hooks for ETL processes and automation scripts.
6142

62-
In short, the reference Solution Architecture and the corresponding Technical Architecture provide a common framework for all design and development effort. Design Patterns provide the details of how selected concepts are approaches, including considerations and pros and cons. Solution Patterns describe how these approaches are best translated in the selected technology.
63-
64-
## Standards for Design and Solution Patterns
43+
### Standards for Design and Solution Patterns
6544

6645
The pattern structure (Design and Solution Pattern layout) is always as follows:
6746

@@ -70,8 +49,6 @@ The pattern structure (Design and Solution Pattern layout) always is as follows:
7049
* **Motivation**, a short overview of the background and relevance of the pattern. Why is there a need?
7150
* **Applicability**, a listing of where this pattern can be expected to play a role.
7251
* **Structure**, the main section with the pattern details.
73-
* I**mplementation guidelines**, any references to how to implement this pattern (Design Patterns only). Note that the Solution Pattern is intended to explain the specifics in a technical context. This is meant to capture any generic topics.
52+
* **Implementation guidelines**, any references to how to implement this pattern (Design Patterns only). Note that the Solution Pattern is intended to explain the specifics in a technical context. This is meant to capture any generic topics.
7453
* **Considerations and consequences**, meant to offer some alternative views and experiences as to what it means to take a certain decision.
7554
* **Related patterns**, any references towards further reading and related content.
76-
77-
The Title is in Header 1 format, the sections are in Header 2 format.
