Commit fddfba2 — Merge pull request #43 from yuankang134/1.1.0
2 parents ee4ce24 + f69aaf0
120 files changed: +5274 −0 lines changed
FlowExecution
-------------------------

FlowExecution is the real-time workflow execution module. It provides the interface services for workflow execution, builds on the Entrance module of Linkis, and adapts several linkis-entrance classes by inheritance.

For example, it inherits PersistenceEngine to implement persistence of DSS workflow tasks, and overrides the EntranceExecutionJob class to implement workflow node execution, state transition, kill and other operations. Finally, the parsed and processed tasks are submitted to the Linkis service through linkis-computation-client.
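The adaptation pattern described above can be sketched as follows. This is a simplified illustration only: the `PersistenceEngine` interface, the `jobRequest` map, and the in-memory "table" are stand-ins for the real Linkis interfaces and the dss_workflow_task table, whose actual signatures differ.

```java
import java.util.HashMap;
import java.util.Map;

// Simplified stand-in for the Linkis persistence interface; real signatures differ.
interface PersistenceEngine {
    void persist(Map<String, Object> jobRequest);
}

// Mirrors the idea of WorkflowPersistenceEngine: override persist() to
// convert the incoming jobRequest into a workflow task row for DSS.
class WorkflowPersistenceEngine implements PersistenceEngine {
    final Map<Long, Map<String, Object>> workflowTaskTable = new HashMap<>();
    private long nextId = 1;

    @Override
    public void persist(Map<String, Object> jobRequest) {
        Map<String, Object> task = new HashMap<>();
        task.put("exec_id", jobRequest.get("execId"));
        task.put("um_user", jobRequest.get("user"));
        task.put("status", "Inited");
        // Stands in for an INSERT into the dss_workflow_task table.
        workflowTaskTable.put(nextId++, task);
    }
}
```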
### 1. Business architecture

User-facing function points:

| Component name | First-level module | Second-level module | Function point |
|---------------------|------------------|-----------------|-----------------|
| DataSphereStudio | Workflow | Workflow execution | Execute |
| | | | Selected execution |
| | | | Rerun failed nodes |
| | | | Execution history view |

![](images/workflow_execution_uml.png)
### 2. FlowExecution interfaces/classes

| Core interface/class | Core function |
|---------------------------|------------------------------|
| FlowEntranceRestfulApi | Provides the RESTful interface for workflow execution, such as task execution and status retrieval |
| WorkflowPersistenceEngine | Overrides the persist method of the Linkis PersistenceEngine; converts the jobRequest to a workflowTask and persists it to the DSS workflow_task table |
| WorkflowQueryService | Provides interface services such as workflow task creation and status update |
| WorkflowExecutionInfoService | Provides services such as creation and query of workflow execution information |
| FlowEntranceEngine | Executor inherited from Linkis; implements the execute method, which calls flowParser to parse the flow and uses it as the entry to the runJob method |
| FlowExecutionExecutorManagerImpl | Inherits the ExecutorManager of Linkis and overrides the createExecutor method so that the created executor is the FlowEntranceEngine of DSS |
| FlowExecutionParser | CommonEntranceParser inherited from Linkis; overrides the parseToJob method to return the FlowEntranceJob of DSS |
| DefaultFlowExecution | Provides the runJob() method, which moves all scheduled nodes of a FlowEntranceJob to the runnable state and adds the running nodes to a task queue; a timed thread pool polls the Linkis task status of each node and removes completed nodes from the queue |
| FlowEntranceJobParser | Defines the parse() method; its subclasses are the concrete parsing implementations, such as parsing the workflow from the jobRequest and parsing the params attributes of workflow nodes |
| FlowEntranceJob | EntranceExecutionJob inherited from Linkis; overrides run(), kill(), onStatusChanged() and other methods to provide the job execution entry and status callback handling |
| DefaultNodeRunner | The node task running thread; converts a node task to a LinkisJob and submits it to Linkis, and provides methods for querying task status and cancelling tasks on Linkis |
| NodeSkipStrategy | Strategy interface for deciding whether a node is skipped, with three implementations: execution, rerun on failure, and selected execution |
| FlowContext | Holds workflow context status information and provides methods such as getRunningNodes and getSucceedNodes to obtain the workflow's running and succeeded nodes |
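The three skip strategies listed for NodeSkipStrategy can be sketched as follows. The interface shape and class names here are illustrative, not the real DSS signatures.

```java
import java.util.Set;

// Simplified sketch of the NodeSkipStrategy idea; the real DSS interface differs.
interface NodeSkipStrategy {
    boolean isSkipped(String node, Set<String> succeededNodes, Set<String> selectedNodes);
}

// "Execution": run every node, skip none.
class ExecuteAllStrategy implements NodeSkipStrategy {
    public boolean isSkipped(String node, Set<String> succeeded, Set<String> selected) {
        return false;
    }
}

// "Rerun on failure": skip nodes that already succeeded in the previous run.
class RerunFailedStrategy implements NodeSkipStrategy {
    public boolean isSkipped(String node, Set<String> succeeded, Set<String> selected) {
        return succeeded.contains(node);
    }
}

// "Selected execution": only run the nodes the user checked.
class SelectedExecutionStrategy implements NodeSkipStrategy {
    public boolean isSkipped(String node, Set<String> succeeded, Set<String> selected) {
        return !selected.contains(node);
    }
}
```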
### 3. Workflow execution process

![](images/flowexecution.drawio.png)
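The polling model described for DefaultFlowExecution can be sketched as a queue of running nodes that is polled round by round. `statusOf` stands in for the real Linkis status query, and the real implementation uses a timed thread pool rather than a tight loop; this sketch assumes every node eventually reaches a terminal status.

```java
import java.util.ArrayDeque;
import java.util.Map;
import java.util.Queue;

// Sketch of the DefaultFlowExecution polling model: running nodes sit in a
// queue; each round asks "linkis" for each node's status and drops finished ones.
class FlowPollingSketch {
    static int pollUntilDone(Queue<String> runningNodes, Map<String, String> statusOf) {
        int polls = 0;
        while (!runningNodes.isEmpty()) {
            polls++;
            int n = runningNodes.size();
            for (int i = 0; i < n; i++) {
                String node = runningNodes.poll();
                String status = statusOf.get(node);
                if (!"Succeed".equals(status) && !"Failed".equals(status)) {
                    runningNodes.offer(node); // still running: keep polling it
                }
            }
        }
        return polls; // number of polling rounds performed
    }
}
```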
### 4. Data structure/storage design

Workflow execution information table:

```sql
CREATE TABLE `dss_workflow_execute_info` (
  `id` bigint(20) NOT NULL AUTO_INCREMENT,
  `task_id` bigint(20) NOT NULL COMMENT 'task id',
  `status` int(1) DEFAULT NULL COMMENT 'status, 0: failed, 1: success',
  `flow_id` bigint(20) NOT NULL COMMENT 'flow id',
  `version` varchar(200) DEFAULT NULL COMMENT 'workflow bml version number',
  `failed_jobs` text COMMENT 'failed nodes',
  `Pending_jobs` text COMMENT 'nodes not yet executed',
  `skipped_jobs` text COMMENT 'skipped nodes',
  `succeed_jobs` text COMMENT 'succeeded nodes',
  `createtime` datetime NOT NULL COMMENT 'create time',
  `running_jobs` text COMMENT 'running nodes',
  `updatetime` datetime DEFAULT NULL COMMENT 'update time',
  PRIMARY KEY (`id`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8;
```
Workflow task information table:

```sql
CREATE TABLE `dss_workflow_task` (
  `id` bigint(20) NOT NULL AUTO_INCREMENT COMMENT 'primary key, auto increment',
  `instance` varchar(50) DEFAULT NULL COMMENT 'instance of Entrance, consisting of the IP address and port of the entrance server',
  `exec_id` varchar(50) DEFAULT NULL COMMENT 'execution ID, consisting of jobID (generated by the scheduler), executeApplicationName, creator and instance',
  `um_user` varchar(50) DEFAULT NULL COMMENT 'user name',
  `submit_user` varchar(50) DEFAULT NULL COMMENT 'submit user name',
  `execution_code` text COMMENT 'run script; when exceeding 6000 lines, the script is stored in HDFS and its file path is stored in the database',
  `progress` float DEFAULT NULL COMMENT 'script execution progress, between zero and one',
  `log_path` varchar(200) DEFAULT NULL COMMENT 'file path of the log files',
  `result_location` varchar(200) DEFAULT NULL COMMENT 'file path of the result',
  `status` varchar(50) DEFAULT NULL COMMENT 'script execution status, one of: Inited, WaitForRetry, Scheduled, Running, Succeed, Failed, Cancelled, Timeout',
  `created_time` datetime DEFAULT NULL COMMENT 'creation time',
  `updated_time` datetime DEFAULT NULL COMMENT 'update time',
  `run_type` varchar(50) DEFAULT NULL COMMENT 'further refinement of execute_application_name, e.g. specifying whether to run pySpark or SparkR',
  `err_code` int(11) DEFAULT NULL COMMENT 'error code, generated when the execution of the script fails',
  `err_desc` text COMMENT 'error description, generated when the execution of the script fails',
  `execute_application_name` varchar(200) DEFAULT NULL COMMENT 'the service a user selects, e.g. Spark, Python, R, etc.',
  `request_application_name` varchar(200) DEFAULT NULL COMMENT 'parameter name for creator',
  `script_path` varchar(200) DEFAULT NULL COMMENT 'path of the script in the workspace',
  `params` text COMMENT 'configuration items of the parameters',
  `engine_instance` varchar(50) DEFAULT NULL COMMENT 'instance of the engine, consisting of the IP address and port of the engine server',
  `task_resource` varchar(1024) DEFAULT NULL,
  `engine_start_time` time DEFAULT NULL,
  `label_json` varchar(200) DEFAULT NULL COMMENT 'label json',
  PRIMARY KEY (`id`),
  KEY `created_time` (`created_time`),
  KEY `um_user` (`um_user`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4;
```
Orchestrator Architecture Design
-------------------------

Orchestrator is the orchestration module. It provides interface services for adding, deleting, modifying, querying, importing and exporting orchestrations under a project, and serves as the unified entry point for each orchestration implementation (such as workflow). It connects upward to the project service and downward to the concrete orchestration implementation (such as the workflow service).
### 1. Business architecture

User-facing function points:

| Component name | First-level module | Second-level module | Function point |
|---------------------|------------------|-----------------|-----------------|
| DataSphereStudio | Orchestration mode | Create orchestration mode | Create a new orchestration |
| | | Edit orchestration mode | Edit an orchestration's field information |
| | | Delete orchestration mode | Delete an orchestration |
| | | Open orchestration mode | Open an orchestration for drag-and-drop development of its nodes |
| | | View the orchestration version list | View the historical versions of an orchestration; a version can be opened and viewed, or rolled back to |
| | | Orchestration mode rollback | Roll back to a historical version of the orchestration (a new version is added each time the orchestration is published) |

![](images/orchestrator_uml.png)
### 2. Orchestrator architecture

![](images/orchestrator_arch.png)

### 3. Orchestrator module design
Introduction to the core classes of each second-level module:

**dss-orchestrator-core**

The core module of Orchestrator; defines top-level interfaces such as DSSOrchestrator, DSSOrchestratorContext, and DSSOrchestratorPlugin.

| Core top-level interface/class | Core function |
|---------------------------|------------------------------|
| DSSOrchestrator | Defines methods for obtaining an orchestration's properties, such as its orchestration mode, associated appconn, and context information |
| DSSOrchestratorContext | Defines the orchestrator's context information and provides methods such as obtaining orchestration plugin classes |
| DSSOrchestratorPlugin | The top-level interface of orchestration plugins; defines the init method, and its subclasses include the import and export plugin implementation classes |
| DSSOrchestratorRelation | Defines methods for obtaining an orchestration's associated properties, such as its orchestration mode and the appconn associated with it |
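The plugin/context relationship described for dss-orchestrator-core can be sketched roughly as follows; everything beyond the init method is illustrative, not the real DSS API.

```java
import java.util.ArrayList;
import java.util.List;

// Simplified sketch of the plugin/context relationship; the real DSS
// interfaces carry many more methods.
interface DSSOrchestratorPlugin {
    void init();
}

class DSSOrchestratorContext {
    private final List<DSSOrchestratorPlugin> plugins = new ArrayList<>();

    void register(DSSOrchestratorPlugin plugin) {
        plugin.init(); // each plugin is initialised once when registered
        plugins.add(plugin);
    }

    // Mirrors "obtaining the orchestration plugin class" from the context.
    <T extends DSSOrchestratorPlugin> T getPlugin(Class<T> type) {
        for (DSSOrchestratorPlugin p : plugins) {
            if (type.isInstance(p)) return type.cast(p);
        }
        return null;
    }
}
```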
**dss-orchestrator-db**

Defines the unified entry of the orchestration DAO-layer methods.

**dss-orchestrator-conversion-standard**

Defines the interface specification for converting orchestrations to third-party systems, including top-level interfaces such as ConversionOperation, ConversionRequestRef, and ConversionService.

| Core interface/class | Core function |
|--------------------- |------------------------------------------|
| ConversionOperation | Defines the core convert method; its input parameter is a ConversionRequestRef and it returns a ResponseRef |
| DSSToRelConversionRequestRef | Defines the basic parameters of a conversion request, such as userName, workspace, and dssProject |
| ConversionIntegrationStandard | Defines the core method getDSSToRelConversionService, used to convert a DSS orchestration into a workflow of the scheduling system |
| ConversionService | Defines methods for obtaining labels and obtaining the ConversionIntegrationStandard |
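The ConversionOperation contract can be sketched as follows; the field names and the string-based "conversion" are illustrative stand-ins for the real request/response refs.

```java
// Simplified sketch of the conversion specification: a ConversionOperation
// takes a request ref and returns a response ref. Names are illustrative,
// not the real DSS signatures.
class ConversionRequestRef {
    final String userName;
    final String orchestrationName;
    ConversionRequestRef(String userName, String orchestrationName) {
        this.userName = userName;
        this.orchestrationName = orchestrationName;
    }
}

class ResponseRef {
    final boolean succeed;
    final String message;
    ResponseRef(boolean succeed, String message) {
        this.succeed = succeed;
        this.message = message;
    }
}

interface ConversionOperation {
    ResponseRef convert(ConversionRequestRef ref);
}

// Illustrative implementation: "convert" a DSS orchestration into a
// scheduler-side workflow name.
class DSSToSchedulerConversionOperation implements ConversionOperation {
    public ResponseRef convert(ConversionRequestRef ref) {
        String schedulerFlow = ref.userName + "/" + ref.orchestrationName;
        return new ResponseRef(true, schedulerFlow);
    }
}
```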
**dss-orchestrator-loader**

Loads orchestration-related appconns, such as workflow-appconn, and loads the subclasses of DSSOrchestratorPlugin, such as ExportDSSOrchestratorPlugin.

| Core interface/class | Core function |
|--------------------- |---------------------------------------------|
| OrchestratorManager | Defines the getOrCreateOrchestrator method to load the appconn associated with an orchestrator; the appconn is cached after the first load to avoid repeated loading |
| LinkedAppConnResolver | Defines the interface for obtaining appconns by user |
| SpringDSSOrchestratorContext | Its initialization method loads all subclasses of DSSOrchestratorPlugin and caches them in memory |
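The load-once caching behaviour described for getOrCreateOrchestrator can be sketched with `computeIfAbsent`; the string value here is a stand-in for the real loaded AppConn object.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;

// Sketch of the load-once caching behaviour described for OrchestratorManager:
// the first call loads the orchestrator's appconn, later calls hit the cache.
class OrchestratorManagerSketch {
    private final Map<String, String> cache = new ConcurrentHashMap<>();
    final AtomicInteger loads = new AtomicInteger();

    String getOrCreateOrchestrator(String orchestratorName) {
        return cache.computeIfAbsent(orchestratorName, name -> {
            loads.incrementAndGet();      // the expensive appconn load happens once
            return "appconn-for-" + name; // stands in for the loaded appconn
        });
    }
}
```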
**dss-framework-orchestrator-server**

The Orchestrator framework service; provides the front-end interfaces for adding, deleting, modifying, querying, and rolling back orchestrations, as well as RPC services such as orchestration import and export.

**dss-framework-orchestrator-publish**

Provides publishing-related plugins, such as the orchestration import and export implementation classes, and the classes that generate and parse orchestration archives.

| Core interface/class | Core function |
|--------------------- |---------------------------------------------|
| ExportDSSOrchestratorPlugin | Defines the orchestration export interface |
| ImportDSSOrchestratorPlugin | Defines the orchestration import interface |
| MetaWriter | Writes the orchestration's table field information to the metadata file in a specific format |
| MetaReader | Parses the orchestration's metadata file to recover the table field content |
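The MetaWriter/MetaReader round trip can be sketched as follows. The one-line "name,type" format here is invented for illustration; DSS defines its own metadata layout.

```java
import java.util.ArrayList;
import java.util.List;

// Sketch of the MetaWriter/MetaReader round trip: table field information is
// written out line by line and parsed back. Format is illustrative only.
class MetaSketch {
    // Write each {name, type} field as a "name,type" line.
    static String write(List<String[]> fields) {
        StringBuilder sb = new StringBuilder();
        for (String[] f : fields) sb.append(f[0]).append(',').append(f[1]).append('\n');
        return sb.toString();
    }

    // Parse the metadata text back into {name, type} pairs.
    static List<String[]> read(String meta) {
        List<String[]> fields = new ArrayList<>();
        for (String line : meta.split("\n")) {
            if (!line.isEmpty()) fields.add(line.split(","));
        }
        return fields;
    }
}
```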
#### Sequence diagram for creating an orchestration (delete and edit operations are similar):

![](images/Create_an_orchestration_sequence_diagram.png)

#### Sequence diagram for importing an orchestration (the export operation is similar):

![](images/Import_Orchestration_Sequence_Diagram.png)
### 4. Data structure/storage design

Orchestrator information table:

```sql
CREATE TABLE `dss_orchestrator_info` (
  `id` bigint(20) NOT NULL AUTO_INCREMENT,
  `name` varchar(255) NOT NULL COMMENT 'orchestrator name',
  `type` varchar(255) NOT NULL COMMENT 'orchestrator type, e.g. workflow',
  `desc` varchar(1024) DEFAULT NULL COMMENT 'description',
  `creator` varchar(100) NOT NULL COMMENT 'creator',
  `create_time` datetime DEFAULT NULL COMMENT 'create time',
  `project_id` bigint(20) DEFAULT NULL COMMENT 'project id',
  `uses` varchar(500) DEFAULT NULL COMMENT 'uses',
  `appconn_name` varchar(1024) NOT NULL COMMENT 'appconn associated with the orchestration, e.g. workflow',
  `uuid` varchar(180) NOT NULL COMMENT 'uuid',
  `secondary_type` varchar(500) DEFAULT NULL COMMENT 'secondary type of the orchestration, e.g. workflow-DAG',
  `is_published` tinyint(1) NOT NULL DEFAULT '0' COMMENT 'whether it has been published',
  `workspace_id` int(11) DEFAULT NULL COMMENT 'workspace id',
  `orchestrator_mode` varchar(100) DEFAULT NULL COMMENT 'orchestrator mode; the value is the dic_key (parent_key=p_arrangement_mode) in dss_dictionary',
  `orchestrator_way` varchar(256) DEFAULT NULL COMMENT 'orchestrator way',
  `orchestrator_level` varchar(32) DEFAULT NULL COMMENT 'orchestrator level',
  `update_user` varchar(100) DEFAULT NULL COMMENT 'update user',
  `update_time` datetime DEFAULT CURRENT_TIMESTAMP COMMENT 'update time',
  PRIMARY KEY (`id`) USING BTREE,
  UNIQUE KEY `unique_idx_uuid` (`uuid`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 ROW_FORMAT=COMPACT;
```
Orchestrator version information table:

```sql
CREATE TABLE `dss_orchestrator_version_info` (
  `id` bigint(20) NOT NULL AUTO_INCREMENT,
  `orchestrator_id` bigint(20) NOT NULL COMMENT 'associated orchestration id',
  `app_id` bigint(20) DEFAULT NULL COMMENT 'id of the orchestration implementation, such as flowId',
  `source` varchar(255) DEFAULT NULL COMMENT 'source',
  `version` varchar(255) DEFAULT NULL COMMENT 'version',
  `comment` varchar(255) DEFAULT NULL COMMENT 'description',
  `update_time` datetime DEFAULT NULL COMMENT 'update time',
  `updater` varchar(32) DEFAULT NULL COMMENT 'updater',
  `project_id` bigint(20) DEFAULT NULL COMMENT 'project id',
  `content` varchar(255) DEFAULT NULL COMMENT '',
  `context_id` varchar(200) DEFAULT NULL COMMENT 'context id',
  `valid_flag` int(1) DEFAULT '1' COMMENT 'version valid flag, 0: invalid, 1: valid',
  PRIMARY KEY (`id`) USING BTREE
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 ROW_FORMAT=COMPACT;
```
Scheduling system orchestration association table:

```sql
CREATE TABLE `dss_orchestrator_ref_orchestration_relation` (
  `id` bigint(20) NOT NULL AUTO_INCREMENT COMMENT 'primary key id',
  `orchestrator_id` bigint(20) NOT NULL COMMENT 'orchestration mode id in dss',
  `ref_project_id` bigint(20) DEFAULT NULL COMMENT 'associated project id in the scheduling system',
  `ref_orchestration_id` int(11) DEFAULT NULL COMMENT 'id of the scheduling system workflow (the orchestrationId returned by the OrchestrationOperation service of SchedulerAppConn)',
  PRIMARY KEY (`id`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 ROW_FORMAT=COMPACT;
```
### 5. Interface design

### 6. Non-functional design

#### 6.1 Security

A special ID is carried in the cookie; the GateWay uses a dedicated decryption algorithm to identify it.

#### 6.2 Performance

Meets the performance requirements.

#### 6.3 Capacity

Not involved.

#### 6.4 High availability

Supports multi-active deployment.
# DSS-UserGuide module design

### Introduction

The DSS user guide module is a newly added function module of DSS 1.0 that provides user guidance for DSS users. It contains many problems encountered while using DSS together with their solutions, as well as descriptions of some function points, so users can search for solutions to their problems on their own. It can also be associated with error codes at a later stage: when an error code pops up, the user can be taken directly to the solution already recorded in the knowledge base. The guide module stores the documents as HTML in table fields, so Markdown files must be parsed and converted to HTML. Since some documents contain links that need to be navigable, a gitbook is built to display and manage these documents. To synchronize the dss-guide-user module efficiently, the files on GitLab are packaged, uploaded, and decompressed to a specified directory on the server where gitbook is located; guide-user then scans that directory periodically to complete synchronization.
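A minimal sketch of the md-to-html step described above: real DSS would use a full Markdown parser, while this handles only `#` headings and plain paragraphs, to show the shape of what ends up in the content_html fields.

```java
// Minimal, illustrative markdown-to-html conversion: '#' headings become
// <h1> elements and non-empty lines become <p> paragraphs. A real
// implementation would use a proper Markdown library.
class MarkdownSketch {
    static String toHtml(String markdown) {
        StringBuilder html = new StringBuilder();
        for (String line : markdown.split("\n")) {
            if (line.startsWith("# ")) {
                html.append("<h1>").append(line.substring(2)).append("</h1>");
            } else if (!line.isEmpty()) {
                html.append("<p>").append(line).append("</p>");
            }
        }
        return html.toString();
    }
}
```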
## Introduction to the main modules of dss_guide

The DSS_Guide module mainly contains the Restful, Service, Dao, and Entity definitions.

### GuideGroupService

Provides GuideGroup capabilities such as adding, querying, modifying, saving, and deleting, and also synchronizes SUMMARY.md. The guide module parses this file and then, according to the directory paths configured at each level within it, locates the files that need to be read and writes them to the database on a schedule, completing the synchronization. When the service runs on other servers, to avoid installing gitbook repeatedly, the guide module reads the IP of the server holding the configured files and automatically synchronizes the files to the server where the guide module is located for display.
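The SUMMARY.md parsing step described above can be sketched as follows. Gitbook summaries list chapters as `* [Title](path.md)`; the regex and the {title, path} record shape here are illustrative, not DSS's actual code.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Sketch of SUMMARY.md parsing: extract each "* [Title](path)" entry so the
// service knows which chapter files to read and store.
class SummaryParserSketch {
    static final Pattern ENTRY = Pattern.compile("\\*\\s*\\[([^\\]]+)\\]\\(([^)]+)\\)");

    static List<String[]> parse(String summary) {
        List<String[]> entries = new ArrayList<>();
        Matcher m = ENTRY.matcher(summary);
        while (m.find()) {
            entries.add(new String[]{m.group(1), m.group(2)}); // {title, path}
        }
        return entries;
    }
}
```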
### GuideContentService

Handles the save, query, update, and delete operations of GuideContent.

### GuideChapterService

Handles the concrete content of manual chapters, including chapter search, query by ID, deletion, saving, etc.

### GuideCatalogService

Synchronizes the knowledge base, supports batch insertion of directory content, and implements operations such as saving, deleting, and querying the directory structure classification.
### Core flow chart

![](./images/16559707626688.png)

### Data structure

![](./images/1653309930194.png)
### dss_guide_group

Defines the grouping of dss_guide, including group_id, path (access path), title, etc.

### dss_guide_chapter

Stores the detailed content of dss_guide chapters, including catalog_id, title, content, and content_html; associated with the content of dss_guide_catalog.

### dss_guide_content

Stores the description content of each group, planned under the corresponding group; contains title, type, content, content_html, etc.

### dss_guide_catalog

Classifies the content of dss_guide; equivalent to the directory structure of the knowledge base, with hierarchical directory relationships.