Commit 0d96dc2 (parent b924ef7): Documentation translated into English
35 files changed: +3651 −0 lines

# DSS-UserGuide module design

### Introduction

The DSS user guide module is a new function module in DSS 1.0 that provides guidance for DSS users. It collects the problems and solutions encountered while using DSS, along with descriptions of feature points, so that users can search for solutions on their own. It can later be associated with error codes, so that when an error code pops up, the solution already entered in the knowledge base can be located directly. The guide module stores files as HTML in table fields, so Markdown files must be parsed and converted to HTML. Because some files contain links that must remain navigable, a GitBook is built to display and manage these documents. To synchronize the dss-guide-user module efficiently, the files on GitLab are packaged, uploaded, and decompressed into a specified directory on the server where GitBook runs, and guide-user scans that directory periodically to keep the content in sync.
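The locate-and-sync flow described above can be sketched as follows. This is an illustrative sketch, not the actual DSS code: the class name `SummaryParser` and the GitBook-style `* [Title](path.md)` entry format are assumptions.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Illustrative sketch only: parses GitBook-style summary lines such as
//   * [Chapter Title](dir/chapter.md)
// into (level, title, path) entries that a sync job could use to locate
// the Markdown files to read and write into the database.
public class SummaryParser {

    // Indentation depth encodes the catalog level (2 spaces per level).
    private static final Pattern ENTRY =
            Pattern.compile("^(\\s*)[*-]\\s*\\[(.+?)\\]\\((.+?)\\)\\s*$");

    public static List<String[]> parse(List<String> lines) {
        List<String[]> entries = new ArrayList<>();
        for (String line : lines) {
            Matcher m = ENTRY.matcher(line);
            if (m.matches()) {
                int level = m.group(1).length() / 2;
                entries.add(new String[]{
                        String.valueOf(level), m.group(2), m.group(3)});
            }
        }
        return entries;
    }
}
```

A periodic task could call `parse` on the scanned summary file and compare the resulting paths against what is already stored, loading only changed files.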
## Introduction to the main modules of dss_guide

The dss_guide module mainly contains the Restful, Service, Dao, and Entity definitions.

### GuideGroupService

Provides the add, query, modify, save, and delete operations for GuideGroup, and is also responsible for synchronizing Summary.md. The guide module parses this file and, following the configured paths of the directory levels it describes, periodically locates the files that need to be read and writes them to the database to complete the synchronization. When the service runs on other servers, to avoid installing GitBook repeatedly, the guide module passes the IP of the server holding the configuration file and then automatically synchronizes the files to the server where the guide module is located for display.
### GuideContentService

Handles the save, query, update, and delete operations for GuideContent.

### GuideChapterService

Handles the concrete content of manual chapters, including chapter search, query by ID, deletion, saving, etc.

### GuideCatalogService

Synchronizes the knowledge base, supports batch insertion of catalog content, and implements saving, deleting, and querying the catalog structure classification.
### Core flow chart

![](./images/16559707626688.png)

### Data structure

![](./images/1653309930194.png)
### dss_guide_group

Defines the groups of dss_guide, including group_id, path (access path), title, etc.

### dss_guide_chapter

Stores the detailed content of dss_guide chapters, including catalog_id, title, content, and content_html. It is associated with the content of dss_guide_catalog.

### dss_guide_content

Stores the description content of each group, organized under the corresponding group. Contains title, type, content, content_html, etc.

### dss_guide_catalog

Classifies the content of dss_guide; it is equivalent to the directory structure of the knowledge base and has a hierarchical directory relationship.
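As a rough illustration of the four tables above, they could map to entity classes like the following. The field lists are taken from the descriptions above; the real DSS schema has additional columns (timestamps, flags, etc.).

```java
// Illustrative entity sketch for the dss_guide tables described above;
// not the actual DSS entity classes.
public class GuideEntities {

    // dss_guide_group: a group with an access path and title.
    public record GuideGroup(long id, String path, String title) {}

    // dss_guide_catalog: hierarchical catalog; parentId forms the tree.
    public record GuideCatalog(long id, Long parentId, String title) {}

    // dss_guide_chapter: chapter content, linked to a catalog entry.
    public record GuideChapter(long id, long catalogId, String title,
                               String content, String contentHtml) {}

    // dss_guide_content: description content under a group.
    public record GuideContent(long id, long groupId, String title, String type,
                               String content, String contentHtml) {}
}
```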
# DSS access scheduling system

## Background

Many batch scheduling systems are currently used in the big data field, such as Azkaban, DolphinScheduler, and Airflow. DataSphere Studio (DSS) supports publishing workflows designed by users to different scheduling systems; currently, publishing to Azkaban is supported by default. After the user has completed the DAG design of the workflow in DSS (including loading workflow resource files, setting workflow parameters, importing data, checking data quality, writing node code, designing visual reports, configuring email output, uploading node resource files, and setting run parameters), the workflow can be debugged and executed in DSS. Once all nodes are verified to execute correctly, the workflow is published to the scheduling system, which then schedules and executes it periodically according to the scheduled-task configuration.
### Release method

To integrate a new scheduling system, DSS accesses it through an AppConn. Users need to define a corresponding XXXSchedulerAppConn for each scheduling system; the conversion integration specification and the structure integration specification are defined in the SchedulerAppConn. The conversion integration specification covers the conversion of DSS project-level content and DSS workflow-level content to the third-party scheduling system. DSS can release to a scheduling system in the following two ways:

1. Project-level release

Converts all the workflows in a project and uploads the converted content to the scheduling system as a single package. The main interface is ProjectPreConversionRel, which defines the workflows in the project that need to be converted.

2. Workflow-level release

Converts at the granularity of a single workflow; only the content of that workflow is packaged and uploaded to the scheduling system. The current workflow definition of DSS is stored as Json in a BML file, and the metadata of the workflow is stored in the database.
## Main steps

### Parser

JsonToFlowParser converts the Json of a workflow into a Workflow, the standard format for operating on workflows in DSS. It includes the node information of the workflow, the edge information, the parent workflow, the child workflows, the workflow resource files, the workflow property files, the creation and update times, the workflow user, the workflow proxy user, and workflow metadata such as name, ID, description, type, and whether it is a root workflow. These are parsed from the Json content and converted into Workflow objects that DSS can operate on, such as AzkabanWorkflow and DolphinSchedulerWorkflow.
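A much-simplified sketch of this Parser step, with a `Map` standing in for the parsed Json (the real JsonToFlowParser reads the Json from BML and handles many more fields; the class and record names here are illustrative):

```java
import java.util.List;
import java.util.Map;

// Simplified sketch of the Parser step: turn a parsed Json structure
// (represented here as a Map) into a Workflow object with nodes and edges.
public class FlowParserSketch {

    public record Node(String id, String type) {}
    public record Edge(String source, String target) {}
    public record Workflow(String name, String user,
                           List<Node> nodes, List<Edge> edges) {}

    @SuppressWarnings("unchecked")
    public static Workflow parse(Map<String, Object> json) {
        List<Node> nodes = ((List<Map<String, String>>) json.get("nodes")).stream()
                .map(n -> new Node(n.get("id"), n.get("type")))
                .toList();
        List<Edge> edges = ((List<Map<String, String>>) json.get("edges")).stream()
                .map(e -> new Edge(e.get("source"), e.get("target")))
                .toList();
        return new Workflow((String) json.get("name"),
                            (String) json.get("user"), nodes, edges);
    }
}
```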
### Converter

Converts a DSS Workflow into a workflow that the target scheduling system can recognize; each scheduling system has its own workflow definition. For example, the nodes of a DSS workflow can be converted into Azkaban job files or into DolphinScheduler tasks. The conversion can also run in reverse, turning the scheduling system's workflow into one that DSS can load and display. The workflow dependencies and node connections are converted into the dependencies of the corresponding scheduling system. The Converter can also check whether workflow nodes under the project share the same name; for example, Azkaban's scheduling system does not allow nodes with the same name.

WorkflowConverter defines the output directory structure of the workflow conversion, including the workflow storage directory, the storage of workflow resource files, and the creation of sub-workflow storage directories. For Azkaban, the project-level conversion also creates a project conversion directory and builds a workflow conversion directory for each workflow in the project. In convertToRel, the Workflow is converted into a DolphinSchedulerWorkflow or ScheduleWorkFlow.

NodeConverter defines the output of node conversion. For example, Azkaban's ConvertNode converts the node content of a workflow into the corresponding Job file content, including the node's name, type, dependencies, execution command (parsed by linkis-jobtype), configuration parameters, labels, etc., and finally stores it in the format defined by the Job file. The DolphinScheduler Converter converts DSS nodes into DolphinScheduler tasks, builds the execution script of a Shell-type Task, and converts the DSS node content into the parameters required by the custom dss-dolphinscheduler-client.sh script:
```java
addLine.accept("LINKIS_TYPE", dssNode.getNodeType());          // workflow node type
addLine.accept("PROXY_USER", dssNode.getUserProxy());          // proxy user
addObjectLine.accept("JOB_COMMAND", dssNode.getJobContent());  // execution command
addObjectLine.accept("JOB_PARAMS", dssNode.getParams());       // node execution parameters
addObjectLine.accept("JOB_RESOURCES", dssNode.getResources()); // node execution resource files
addObjectLine.accept("JOB_SOURCE", sourceMap);                 // node source information
addLine.accept("CONTEXT_ID", workflow.getContextID());         // context ID
addLine.accept("LINKIS_GATEWAY_URL", Configuration.getGateWayURL()); // linkis gateway address
addLine.accept("RUN_DATE", "${system.biz.date}");              // run date variable
```

### Tuning

Performs the overall adjustment before the project is released. In the Azkaban implementation, it mainly sets the path of the project and the storage paths of the workflows. At this point the whole project => workflow => sub-workflow hierarchy is available, which makes it convenient to apply settings from the outside in: the storage of a workflow depends on the storage location of its project, and the storage of a child workflow depends on the location of its parent workflow. Child-node calculation is completed in FlowTuning, and the end node is added automatically.
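The automatic end-node addition mentioned above can be sketched as follows. This is an illustrative reconstruction, not the actual FlowTuning code: the added end node depends on every node that has no successor.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Sketch of the FlowTuning idea described above: after conversion, find the
// workflow's leaf nodes (nodes with no successors) so a synthetic "end" node
// can be attached that depends on all of them. Names are illustrative.
public class FlowTuningSketch {

    /** Returns the dependencies of the added end node: every node without a successor. */
    public static List<String> endNodeDependencies(List<String> nodes,
                                                   List<String[]> edges) {
        Set<String> hasSuccessor = new HashSet<>();
        for (String[] edge : edges) {
            hasSuccessor.add(edge[0]); // edge[0] -> edge[1]
        }
        List<String> leaves = new ArrayList<>();
        for (String node : nodes) {
            if (!hasSuccessor.contains(node)) {
                leaves.add(node);
            }
        }
        return leaves;
    }
}
```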
## Scheduler AppConn implementation

### AbstractSchedulerAppConn

The abstract class for scheduler AppConns. A new scheduling system's AppConn can directly inherit this abstract class. It implements the SchedulerAppConn interface and inherits AbstractOnlySSOAppConn to bridge SSO login between DSS and the scheduling system. The already integrated DolphinSchedulerAppConn and SchedulisAppConn both inherit this abstract class.

This abstract class contains two kinds of Standard:

The first is ConversionIntegrationStandard, which supports converting DSS orchestrations into the workflows of a scheduling system.

The second is SchedulerStructureIntegrationStandard, the structure integration specification for DSS and scheduling systems.
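The shape of a new scheduler AppConn can be sketched with heavily simplified stand-ins for the DSS types named above; the real interfaces have more methods and different signatures, so treat this only as a structural illustration.

```java
// Heavily simplified stand-ins for the DSS types named above, showing how a
// new scheduler AppConn exposes the two kinds of Standard. Not the real API.
public class SchedulerAppConnSketch {

    public interface ConversionIntegrationStandard {}
    public interface SchedulerStructureIntegrationStandard {}

    public abstract static class AbstractSchedulerAppConn {
        public abstract ConversionIntegrationStandard getConversionIntegrationStandard();
        public abstract SchedulerStructureIntegrationStandard getSchedulerStructureIntegrationStandard();
    }

    /** A hypothetical AppConn for a new scheduling system "MyScheduler". */
    public static class MySchedulerAppConn extends AbstractSchedulerAppConn {
        private final ConversionIntegrationStandard conversion =
                new ConversionIntegrationStandard() {};
        private final SchedulerStructureIntegrationStandard structure =
                new SchedulerStructureIntegrationStandard() {};

        @Override
        public ConversionIntegrationStandard getConversionIntegrationStandard() {
            return conversion;
        }

        @Override
        public SchedulerStructureIntegrationStandard getSchedulerStructureIntegrationStandard() {
            return structure;
        }
    }
}
```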
### ConversionIntegrationStandard

The conversion integration specification for scheduling systems. It includes DSSToRelConversionService, which converts a DSS orchestration into the workflow of a scheduling system. An interface is also reserved for converting a scheduling system's workflow back into a DSS orchestration.

### AbstractSchedulerStructureIntegrationStandard

The scheduling system's organizational structure integration specification, used specifically for the organizational structure management of the scheduling system. It mainly includes project services and orchestration services.
### ProjectService

* Implements unified creation, update, deletion, and duplicate checking of projects.
* Bridges the project system of DSS and the connected third-party application tools to realize collaborative management of projects.
* If a scheduling system needs to integrate its project system with DSS, it must implement all the interfaces of the project service in the structure integration specification.

### OrchestrationService

The orchestration service provides the unified orchestration specification of the scheduling system and has the following functions:

* A unified orchestration specification, used specifically to bridge the orchestration system of DSS and the SchedulerAppConn (scheduling system).
* For example: connecting a DSS workflow with a Schedulis workflow.
* Note that if the connected SchedulerAppConn system does not itself support managing workflows, this interface does not need to be implemented.
DSS-AppConn Design Documentation
------
## Introduction

The principle of the original AppJoint was to define a top-level interface, AppJoint. A third party implements this interface and stores its own connection information in a DSS table, and implements a "proxy service" in DSS that communicates with the third-party system. During initialization, an instance of the service is created through the reflection mechanism; using the connection information in the table, DSS can use the "proxy service" to establish HTTP communication with the third-party system and thereby invoke it. However, the AppJoint design had shortcomings: each connected application instance needed its own AppJoint instance, and different instances of the same application were not logically associated. In DSS 1.0, AppConn is the top-level interface for each system's application; DSS 1.0's own orchestration modes, workflows, single-task nodes, etc. are all AppConn instances. In addition, a third-party system that accesses DSS needs to implement the AppConn interface so that DSS can integrate with it and call the third-party application. Logically, AppConn is a higher abstraction than AppJoint: AppConn is analogous to a class, while AppJoint is analogous to an instance.
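The reflection mechanism described above can be sketched as follows; the class and method names are illustrative, not the actual DSS AppJoint API.

```java
// Minimal sketch of the mechanism described above: DSS stored the
// implementation class name alongside the connection information and
// instantiated the "proxy service" via reflection. Names are illustrative.
public class AppJointLoaderSketch {

    public interface AppJoint {
        void init(String baseUrl);
    }

    /** Demo implementation standing in for a third-party proxy service. */
    public static class DemoAppJoint implements AppJoint {
        String url;

        @Override
        public void init(String baseUrl) {
            this.url = baseUrl;
        }
    }

    /** Instantiates an AppJoint implementation from the class name stored in the DSS table. */
    public static AppJoint load(String className, String baseUrl) {
        try {
            AppJoint joint = (AppJoint) Class.forName(className)
                    .getDeclaredConstructor().newInstance();
            joint.init(baseUrl); // hand over the stored connection info
            return joint;
        } catch (ReflectiveOperationException e) {
            throw new RuntimeException("Failed to load AppJoint: " + className, e);
        }
    }
}
```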
### Introduction to related modules

| Level 1 module | Level 2 module | Function |
|----------------|----------------|----------|
| dss-appconn | appconns | Implementation code of the AppConn specifications for the applications integrated with DSS |
| | dss-appconn-core | AppConn interface and base class definitions |
| | dss-appconn-loader | Instantiation, loading, and assembly of the compiled AppConn packages of connected applications |
| | dss-appconn-manager | Interacts with the framework modules and manages related AppConn instance information |
| | dss-scheduler-appconn | Abstract AppConn definition for scheduling system implementations |
| | linkis-appconn-engineplugin | Implements the relevant Linkis AppConn specifications and bridges the interaction between DSS AppConn and Linkis |

| Core interface/class | Core function |
|----------------------|---------------|
| DSSDictionaryRestful, DSSDictionaryServiceImpl | Provide the dictionary information query interface; query the corresponding records from the dictionary table by key or parentKey |
| DSSWorkspacePrivRestful, DSSWorkspacePrivServiceImpl | Provide viewing and editing of the permission information of the workspace role's menu components |
| DSSWorkspaceRestful, DSSWorkspaceServiceImpl | Provide the basic workspace interfaces, such as creating a workspace, listing workspaces, and obtaining the permission information of menu components |
| DSSWorkspaceRoleRestful, DSSWorkspaceRoleServiceImpl | Provide query and creation interfaces for workspace roles |
| DSSWorkspaceUserRestful, DSSWorkspaceUserServiceImpl | Provide interfaces for adding, deleting, modifying, and querying workspace users |
### AppConn Architecture Diagram

![](./images/appconn_class_uml.png)
![](./images/appconn_structure.png)
![](./images/appconn_load_process.png)