|
1 | 1 | # Introduction of DIRECT |
2 | 2 |
|
3 | | -DIRECT, the Data Integration & Execution Control Tool, is a data integration control and execution metadata model. It is a core and stand-alone component of the Data Integration Framework. |
| 3 | +DIRECT, the Data Integration & Execution Control Tool, is a metadata model for controlling and executing data integration processes ('processes'). A robust data logistics control framework is essential for many data solutions, and DIRECT can serve as such a framework. |
4 | 4 |
|
5 | | -Every Data Integration / Extract Transform and Load (ETL) process is linked to this model which provides the orchestration and management capabilities for data integration. |
| 5 | +Every data logistics process, such as data integration or Extract, Transform, and Load (ETL), can be registered in the DIRECT framework. DIRECT provides the orchestration and management capabilities needed to execute and control these processes.
6 | 6 |
|
7 | | -Data Integration in this context is a broad definition covering various implementation techniques such as ELT (Extract Load, Transform - push-down into SQL or underlying processing) and LETS (Load-Extract-Transform-Store). |
| 7 | +DIRECT features a database repository where each data logistics process is registered, and every runtime execution is tracked. This repository serves as a valuable source of information on platform performance, usage trends, and platform growth in terms of both time and size. At its core, DIRECT focuses on defining and orchestrating processes, while also offering advanced features like continuous and parallel processing, as well as transaction control. |
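As a rough illustration of the repository idea, here is a minimal sketch of a register-and-track schema, using SQLite from Python. The table and column names (`process`, `process_instance`, and so on) are assumptions made for illustration only, not DIRECT's actual model:

```python
# Sketch: one table registers each data logistics process, another logs
# every runtime execution (instance). Names are illustrative assumptions.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE process (
    process_id   INTEGER PRIMARY KEY,
    process_name TEXT NOT NULL UNIQUE
);
CREATE TABLE process_instance (
    instance_id  INTEGER PRIMARY KEY,
    process_id   INTEGER NOT NULL REFERENCES process (process_id),
    start_time   TEXT NOT NULL,
    end_time     TEXT,
    status       TEXT NOT NULL  -- e.g. 'Running', 'Succeeded', 'Failed'
);
""")

# Register the process once; insert one instance row per execution.
conn.execute("INSERT INTO process (process_name) VALUES ('stg_customer')")
conn.execute(
    "INSERT INTO process_instance (process_id, start_time, status) "
    "SELECT process_id, datetime('now'), 'Running' "
    "FROM process WHERE process_name = 'stg_customer'"
)
conn.commit()
```

Querying the `process_instance` table over time is what would surface the performance and growth trends the paragraph above mentions.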
8 | 8 |
|
9 | | -Data Integration in this document essentially covers all processes that 'touch' data. |
10 | | - |
11 | | -The DIRECT repository captures Data Integration process information, and is an invaluable source of information to monitor how the system is expanding (time, size) but also to drive and monitor processes - a fundamental requirement for parallel processing and transaction control. |
12 | | - |
13 | | -The objective of the DIRECT Framework is to provide a structured approach to describing and recording Data integration processes that can be made up of many separate components. This is to be done in such a way that they can be represented and managed as a coherent system. |
14 | | - |
15 | | -## Overview |
16 | | - |
17 | | -This document covers the design and specifications for the DIRECT metadata repository and the integration (events) for data integration processes. |
18 | | - |
19 | | -The DIRECT framework covers a broad variety of process details, including (but not limited to): |
20 | | - |
21 | | -* What process information will be stored and how. |
22 | | -* How a process is integrated into the various defined Layers and Areas. |
23 | | -* Of what entities the metadata model consists, |
24 | | -* The available procedures for managing the data solution. |
25 | | -* Concepts and principles. |
26 | | -* The logic which can be used to control the processes. |
27 | | -* Housekeeping functions. |
28 | | -* Reporting. |
29 | | - |
30 | | -The position of the control and execution framework in the overall architecture is: |
31 | | - |
32 | | - |
| 9 | +The primary goal of the DIRECT framework is to provide a structured approach to describing and recording data logistics processes, which may consist of many distinct components. This structure allows these processes to be represented and managed as a cohesive system. |
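The "many distinct components" can be pictured as a simple hierarchy. The sketch below assumes the two-level Batch/Module structure mentioned in the rollback section later in this document; the class shapes are illustrative only, not DIRECT's actual entities:

```python
# Sketch: a Batch groups individual Modules so that a multi-component
# data logistics process can be run and managed as one cohesive unit.
from dataclasses import dataclass, field


@dataclass
class Module:
    name: str

    def run(self) -> None:
        print(f"running module {self.name}")


@dataclass
class Batch:
    name: str
    modules: list[Module] = field(default_factory=list)

    def run(self) -> None:
        # Executing the batch executes every registered module in order.
        for module in self.modules:
            module.run()


batch = Batch("staging", [Module("stg_customer"), Module("stg_order")])
batch.run()
```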
33 | 10 |
|
34 | 11 | ## Concepts |
35 | 12 |
|
36 | 13 | ### Purpose |
37 | 14 |
|
38 | | -The process control framework supports the ability to trace back what data has been loaded, when and in what way for every individual data integration process. |
| 15 | +The framework provides the ability to trace back what data has been processed, when, and in what way, for every individual data logistics process.
39 | 16 |
|
40 | 17 | Any single data element (e.g. an attribute value in a table) should be auditable. It should be possible to track which processes have been run to produce the visible result.
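One common way to achieve this kind of auditability, shown here as a hypothetical sketch rather than DIRECT's actual design, is to stamp every loaded row with the identifier of the execution (instance) that produced it, so any value can be joined back to the run that loaded it:

```python
# Sketch: stamping each loaded row with the id of the process instance
# that inserted it makes any single data element traceable back to a run.
# Schema and names are illustrative assumptions.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
CREATE TABLE customer (
    customer_id INTEGER,
    name        TEXT,
    instance_id INTEGER  -- id of the execution that loaded this row
)
""")
conn.execute("INSERT INTO customer VALUES (1, 'Jane', 42)")

# Auditing a single element: which execution produced this value?
row = conn.execute(
    "SELECT instance_id FROM customer WHERE customer_id = 1"
).fetchone()
print(f"customer 1 was loaded by process instance {row[0]}")
```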
41 | 18 |
|
@@ -118,7 +95,7 @@ The following diagram illustrates the layers and technologies involved in this p |
118 | 95 |
|
119 | 96 | ## Rollback and re-processing |
120 | 97 |
|
121 | | -When processing errors occur (a data integration process fails), relevant information about the failure is recorded in the repository by the framework. This information can be used to recover from data loading errors and set the data solution back into the original state prior to the occurrence of the error. |
| 98 | +When processing errors occur (a process fails), relevant information about the failure is recorded in the repository by the framework. This information can be used to recover from data loading errors and restore the data solution to the state it was in before the error occurred.
122 | 99 |
|
123 | 100 | This 'rollback' can be configured at both Batch and Module level. |
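Continuing the row-stamping assumption from the earlier sketches, a Module-level rollback could, hypothetically, look like the following; Batch-level rollback would apply the same step to every Module in the Batch:

```python
def rollback_instance(conn, table: str, instance_id: int) -> None:
    """Remove the rows a failed execution inserted and mark it rolled back.

    Assumes the illustrative tables from the sketches above: target rows
    carry an instance_id column, and process_instance records each run.
    """
    # The table name is interpolated because SQL parameters cannot bind
    # identifiers; in real code it should come from trusted metadata only.
    conn.execute(f"DELETE FROM {table} WHERE instance_id = ?", (instance_id,))
    conn.execute(
        "UPDATE process_instance SET status = 'Rolled back' "
        "WHERE instance_id = ?",
        (instance_id,),
    )
    conn.commit()
```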
124 | 101 |
|
|