
Commit 9f38010

Merge branch '1.8_release_3.10.x' into 1.8_release_3.10.x_jingdongfang
# Conflicts:
#	flinkx-rdb/flinkx-rdb-reader/src/main/java/com.dtstack.flinkx.rdb.inputformat/JdbcInputFormat.java
2 parents 2b01a06 + e64ee54 commit 9f38010

629 files changed: +27140 additions, −9941 deletions


.gitignore

Lines changed: 4 additions & 1 deletion
@@ -1,11 +1,14 @@
 # Created by .ignore support plugin (hsz.mobi)
 .idea/
 plugins/
+syncplugins/
 *.iml
 target/
 lib/
 jobs/
 nohup.out
 flinkconf/
 hadoopconf/
-/default_task_id_output
+/default_task_id_output
+/syncplugins
+flinkx-test/

.gitlab-ci.yml

Lines changed: 11 additions & 0 deletions
@@ -0,0 +1,11 @@
+build:
+  stage: test
+  script:
+    - sh ci/install_jars.sh
+    - mvn clean org.jacoco:jacoco-maven-plugin:0.7.8:prepare-agent package -Dmaven.test.failure.ignore=true -q
+    - mvn sonar:sonar -Dsonar.projectKey="dt-insight-engine/flinkx" -Dsonar.host.url=http://172.16.100.198:9000 -Dsonar.jdbc.url=jdbc:postgresql://172.16.100.198:5432/sonar -Dsonar.java.binaries=target/sonar -Dsonar.login=11974c5e9a29625efa09fdc3c3fdc031efb1aab1
+    - sh ci/sonar_notify.sh
+  only:
+    - 1.8_dev
+  tags:
+    - dt-insight-engine
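
The CI job above can be reproduced locally for debugging; a minimal sketch, assuming Maven and a JDK are on the PATH and the commands run from the repository root (the sonar and notification steps are omitted since they need the internal SonarQube server and DingTalk webhook):

```bash
# Install the JDBC drivers that are not published to Maven Central,
# then build with the JaCoCo agent exactly as the CI 'build' job does.
sh ci/install_jars.sh
mvn clean org.jacoco:jacoco-maven-plugin:0.7.8:prepare-agent package \
    -Dmaven.test.failure.ignore=true -q
```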

README.md

Lines changed: 90 additions & 9 deletions
@@ -25,6 +25,8 @@
 * Local mode: corresponds to the local mode of the Flink cluster
 * standalone mode: corresponds to the distributed mode of the Flink cluster
 * yarn mode: corresponds to the yarn mode of the Flink cluster
+* yarnPer mode: corresponds to the per-job mode of the Flink cluster
+
 
 ### 3.2 Execution environment
 
@@ -52,6 +54,7 @@ mvn clean package -Dmaven.test.skip
 * local: local mode
 * standalone: a flink cluster in standalone deployment mode
 * yarn: a flink cluster in yarn mode; a flink session must be started on yarn beforehand, using the default name "Flink session cluster"
+* yarnPer: a flink cluster in yarn mode; a dedicated flink session is started for the current job, using the default name "Flink per-job cluster"
 * Required: no
 * Default: local
 
@@ -61,15 +64,15 @@ mvn clean package -Dmaven.test.skip
 * Required: yes
 * Default: none
 
-* **plugin**
+* **pluginRoot**
 
-* Description: the plugin root directory, i.e. the plugins directory produced by packaging
+* Description: the plugin root directory, i.e. the pluginRoot directory produced by packaging
 * Required: yes
 * Default: none
 
 * **flinkconf**
 
-* Description: the directory containing the flink configuration files (not needed in local mode), e.g. /hadoop/flink-1.4.0/conf
+* Description: the directory containing the flink configuration files (not needed in local mode), e.g. /opt/dtstack/flink-1.8.1/conf
 * Required: no
 * Default: none
 
@@ -78,25 +81,79 @@ mvn clean package -Dmaven.test.skip
 * Description: the directory containing the Hadoop configuration files (both hdfs and yarn; not needed in local mode), e.g. /hadoop/etc/hadoop
 * Required: no
 * Default: none
+
+* **flinkLibJar**
+
+* Description: the directory containing the flink lib jars (not needed in local mode), e.g. /opt/dtstack/flink-1.8.1/lib
+* Required: no
+* Default: none
+
+* **confProp**
+
+* Description: flink-related parameters, e.g. {\"flink.checkpoint.interval\":200000}
+* Required: no
+* Default: none
+
+* **queue**
+
+* Description: the yarn queue, e.g. default
+* Required: no
+* Default: none
+
+* **pluginLoadMode**
+
+* Description: how plugins are loaded in yarnPer mode:
+  * classpath: plugin packages are not uploaded at submission time; they must already be deployed under the pluginRoot directory on each yarn node, but jobs start faster
+  * shipfile: the plugin packages under the pluginRoot directory are uploaded at submission time; yarn nodes need no pre-deployed plugin packages, and startup time depends on plugin package size and network conditions
+* Required: no
+* Default: classpath
 
 #### 3.4.2 Launching a data sync job
 
 * **Launch a data sync job in local mode**
 
 ```
-bin/flinkx -mode local -job /Users/softfly/company/flink-data-transfer/jobs/task_to_run.json -pluginRoot /Users/softfly/company/flink-data-transfer/plugins -confProp "{"flink.checkpoint.interval":60000,"flink.checkpoint.stateBackend":"/flink_checkpoint/"}" -s /flink_checkpoint/0481473685a8e7d22e7bd079d6e5c08c/chk-*
+bin/flinkx -mode local \
+           -job /Users/softfly/company/flink-data-transfer/jobs/task_to_run.json \
+           -plugin /Users/softfly/company/flink-data-transfer/plugins \
+           -confProp "{"flink.checkpoint.interval":60000,"flink.checkpoint.stateBackend":"/flink_checkpoint/"}" \
+           -s /flink_checkpoint/0481473685a8e7d22e7bd079d6e5c08c/chk-*
 ```
 
 * **Launch a data sync job in standalone mode**
 
 ```
-bin/flinkx -mode standalone -job /Users/softfly/company/flink-data-transfer/jobs/oracle_to_oracle.json -pluginRoot /Users/softfly/company/flink-data-transfer/plugins -flinkconf /hadoop/flink-1.4.0/conf -confProp "{"flink.checkpoint.interval":60000,"flink.checkpoint.stateBackend":"/flink_checkpoint/"}" -s /flink_checkpoint/0481473685a8e7d22e7bd079d6e5c08c/chk-*
+bin/flinkx -mode standalone \
+           -job /Users/softfly/company/flink-data-transfer/jobs/oracle_to_oracle.json \
+           -plugin /Users/softfly/company/flink-data-transfer/plugins \
+           -flinkconf /hadoop/flink-1.4.0/conf \
+           -confProp "{"flink.checkpoint.interval":60000,"flink.checkpoint.stateBackend":"/flink_checkpoint/"}" \
+           -s /flink_checkpoint/0481473685a8e7d22e7bd079d6e5c08c/chk-*
 ```
 
 * **Launch a data sync job in yarn mode**
 
 ```
-bin/flinkx -mode yarn -job /Users/softfly/company/flinkx/jobs/mysql_to_mysql.json -pluginRoot /opt/dtstack/flinkplugin/syncplugin -flinkconf /opt/dtstack/myconf/conf -yarnconf /opt/dtstack/myconf/hadoop -confProp "{"flink.checkpoint.interval":60000,"flink.checkpoint.stateBackend":"/flink_checkpoint/"}" -s /flink_checkpoint/0481473685a8e7d22e7bd079d6e5c08c/chk-*
+bin/flinkx -mode yarn \
+           -job /Users/softfly/company/flinkx/jobs/mysql_to_mysql.json \
+           -plugin /opt/dtstack/flinkplugin/syncplugin \
+           -flinkconf /opt/dtstack/myconf/conf \
+           -yarnconf /opt/dtstack/myconf/hadoop \
+           -confProp "{"flink.checkpoint.interval":60000,"flink.checkpoint.stateBackend":"/flink_checkpoint/"}" \
+           -s /flink_checkpoint/0481473685a8e7d22e7bd079d6e5c08c/chk-*
+```
+
+* **Launch a data sync job in perjob mode**
+
+```
+bin/flinkx -mode yarnPer \
+           -job /test.json \
+           -pluginRoot /opt/dtstack/syncplugin \
+           -flinkconf /opt/dtstack/flink-1.8.1/conf \
+           -yarnconf /opt/dtstack/hadoop-2.7.3/etc/hadoop \
+           -flinkLibJar /opt/dtstack/flink-1.8.1/lib \
+           -confProp {\"flink.checkpoint.interval\":200000} \
+           -queue c -pluginLoadMode classpath
 ```
 
 ## 4 Data sync job template
@@ -187,6 +244,25 @@ setting includes speed, errorLimit and dirty, which describe rate limiting, error
 
 For restore configuration, see [resumable transfer](docs/restore.md)
 
+#### 4.1.5 log
+
+```
+"log" : {
+  "isLogger": true,
+  "level" : "warn",
+  "path" : "/opt/log/",
+  "pattern":""
+}
+```
+* isLogger: whether to save logs to disk, `true`: yes; `false` (default): no;
+* level: log output level, `trace`, `debug`, `info` (default), `warn`, `error`;
+* path: log save path, defaults to `/tmp/dtstack/flinkx/`; the log file is named after the jobID of the current flink job, e.g. `97501729f8c44c260d889d099968cc74.log`
+* pattern: log output format
+  * default log4j pattern: `%d{yyyy-MM-dd HH:mm:ss.SSS} [%thread] %-5level %logger{60} %X{sourceThread} - %msg%n`;
+  * default logback pattern: `%d{yyyy-MM-dd HH:mm:ss,SSS} %-5p %-60c %x - %m%n`
+
+Note: this logging feature only records output from the `com.dtstack` package; to change that, modify the class parameter `DtLogger.LOGGER_NAME`
+
 ### 4.2 content
 
 ```
@@ -221,7 +297,7 @@ reader and writer consist of name and parameter, which denote the plugin name and plugin parameters
 
 ### 5.1 Reader plugins
 
-* [Relational database reader plugin](docs/rdbreader.md)
+* [Relational database reader plugin (Mysql, Oracle, Sqlserver, Postgresql, Db2, Gbase)](docs/rdbreader.md)
 * [Sharded database/table reader plugin](docs/rdbdreader.md)
 * [HDFS reader plugin](docs/hdfsreader.md)
 * [HBase reader plugin](docs/hbasereader.md)
@@ -234,11 +310,14 @@ reader and writer consist of name and parameter, which denote the plugin name and plugin parameters
 * [MySQL binlog reader plugin](docs/binlog.md)
 * [Kafka reader plugin](docs/kafkareader.md)
 * [Kudu reader plugin](docs/kudureader.md)
+* [Emqx reader plugin](docs/emqxreader.md)
+* [Oracle real-time capture plugin](docs/logminer.md)
+* [SqlServerCdc real-time capture plugin](docs/sqlservercdc.md)
 
 
 ### 5.2 Writer plugins
 
-* [Relational database writer plugin](docs/rdbwriter.md)
+* [Relational database writer plugin (Mysql, Oracle, Sqlserver, Postgresql, Db2, Gbase)](docs/rdbwriter.md)
 * [HDFS writer plugin](docs/hdfswriter.md)
 * [HBase writer plugin](docs/hbasewriter.md)
 * [Elasticsearch writer plugin](docs/eswriter.md)
@@ -251,6 +330,8 @@ reader and writer consist of name and parameter, which denote the plugin name and plugin parameters
 * [Kafka writer plugin](docs/kafkawriter.md)
 * [Hive writer plugin](docs/hivewriter.md)
 * [Kudu writer plugin](docs/kuduwriter.md)
+* [Emqx writer plugin](docs/emqxwriter.md)
+
 
 [Introduction to resumable transfer and real-time capture](docs/restore.md)
 
@@ -269,4 +350,4 @@ reader and writer consist of name and parameter, which denote the plugin name and plugin parameters
 
 ## 7. Recruiting
 
-1. Big data platform development engineer. For position details, add my WeChat ysqwhiletrue and mention recruiting; interested candidates should send a resume to [email protected]
+1. Big data platform development engineer. For position details, add my WeChat ysqwhiletrue and mention recruiting; interested candidates should send a resume to [email protected]
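
The perjob example in the README diff above uses `-pluginLoadMode classpath`, which assumes plugin packages are already deployed on every yarn node. A hypothetical variant using `shipfile` (paths and queue name are illustrative), trading startup speed for not having to pre-deploy plugins:

```bash
bin/flinkx -mode yarnPer \
           -job /test.json \
           -pluginRoot /opt/dtstack/syncplugin \
           -flinkconf /opt/dtstack/flink-1.8.1/conf \
           -yarnconf /opt/dtstack/hadoop-2.7.3/etc/hadoop \
           -flinkLibJar /opt/dtstack/flink-1.8.1/lib \
           -confProp {\"flink.checkpoint.interval\":200000} \
           -queue default -pluginLoadMode shipfile
```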

bin/install_jars.bat

Lines changed: 8 additions & 0 deletions
@@ -0,0 +1,8 @@
+call mvn install:install-file -DgroupId=com.ibm.db2 -DartifactId=db2jcc -Dversion=3.72.44 -Dpackaging=jar -Dfile=../jars/db2jcc-3.72.44.jar
+
+call mvn install:install-file -DgroupId=com.github.noraui -DartifactId=ojdbc8 -Dversion=12.2.0.1 -Dpackaging=jar -Dfile=../jars/ojdbc8-12.2.0.1.jar
+
+call mvn install:install-file -DgroupId=com.esen.jdbc -DartifactId=gbase -Dversion=8.3.81.53 -Dpackaging=jar -Dfile=../jars/gbase-8.3.81.53.jar
+
+call mvn install:install-file -DgroupId=dm.jdbc.driver -DartifactId=dm7 -Dversion=18.0.0 -Dpackaging=jar -Dfile=../jars/Dm7JdbcDriver18.jar
+

bin/install_jars.sh

Lines changed: 13 additions & 0 deletions
@@ -0,0 +1,13 @@
+#!/usr/bin/env bash
+
+## db2 driver
+mvn install:install-file -DgroupId=com.ibm.db2 -DartifactId=db2jcc -Dversion=3.72.44 -Dpackaging=jar -Dfile=../jars/db2jcc-3.72.44.jar
+
+## oracle driver
+mvn install:install-file -DgroupId=com.github.noraui -DartifactId=ojdbc8 -Dversion=12.2.0.1 -Dpackaging=jar -Dfile=../jars/ojdbc8-12.2.0.1.jar
+
+## gbase driver
+mvn install:install-file -DgroupId=com.esen.jdbc -DartifactId=gbase -Dversion=8.3.81.53 -Dpackaging=jar -Dfile=../jars/gbase-8.3.81.53.jar
+
+## dm driver
+mvn install:install-file -DgroupId=dm.jdbc.driver -DartifactId=dm7 -Dversion=18.0.0 -Dpackaging=jar -Dfile=../jars/Dm7JdbcDriver18.jar
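
Note the `-Dfile=../jars/...` paths: this script appears to assume it is invoked from inside `bin/`. A sketch of the assumed usage:

```bash
# Run from the bin/ directory so the relative ../jars paths resolve.
cd bin && sh install_jars.sh
```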

ci/install_jars.sh

Lines changed: 13 additions & 0 deletions
@@ -0,0 +1,13 @@
+#!/usr/bin/env bash
+
+## db2 driver
+mvn install:install-file -DgroupId=com.ibm.db2 -DartifactId=db2jcc -Dversion=3.72.44 -Dpackaging=jar -Dfile=jars/db2jcc-3.72.44.jar
+
+## oracle driver
+mvn install:install-file -DgroupId=com.github.noraui -DartifactId=ojdbc8 -Dversion=12.2.0.1 -Dpackaging=jar -Dfile=jars/ojdbc8-12.2.0.1.jar
+
+## gbase driver
+mvn install:install-file -DgroupId=com.esen.jdbc -DartifactId=gbase -Dversion=8.3.81.53 -Dpackaging=jar -Dfile=jars/gbase-8.3.81.53.jar
+
+## dm driver
+mvn install:install-file -DgroupId=dm.jdbc.driver -DartifactId=dm7 -Dversion=18.0.0 -Dpackaging=jar -Dfile=jars/Dm7JdbcDriver18.jar
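
This copy differs from `bin/install_jars.sh` only in its jar paths (`jars/...` instead of `../jars/...`), matching invocation from the repository root as in `.gitlab-ci.yml`. A quick way to confirm the artifacts landed in the local Maven repository (the default `~/.m2` location is assumed):

```bash
ls ~/.m2/repository/com/ibm/db2/db2jcc/3.72.44/
ls ~/.m2/repository/com/github/noraui/ojdbc8/12.2.0.1/
ls ~/.m2/repository/com/esen/jdbc/gbase/8.3.81.53/
ls ~/.m2/repository/dm/jdbc/driver/dm7/18.0.0/
```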

ci/sonar_notify.sh

Lines changed: 14 additions & 0 deletions
@@ -0,0 +1,14 @@
+#!/bin/bash
+# See the DingTalk docs: https://open-doc.dingtalk.com/microapp/serverapi2/qf2nxq
+sonarreport=$(curl -s http://172.16.100.198:8082/?projectname=dt-insight-engine/flinkx)
+curl -s "https://oapi.dingtalk.com/robot/send?access_token=e2718f7311243d2e58fa2695aa9c67a37760c7fce553311a32d53b3f092328ed" \
+  -H "Content-Type: application/json" \
+  -d "{
+    \"msgtype\": \"markdown\",
+    \"markdown\": {
+      \"title\":\"sonar code quality\",
+      \"text\": \"## sonar code quality report: \n
+> [sonar link](http://172.16.100.198:9000/dashboard?id=dt-insight-engine/flinkx) \n
+> ${sonarreport} \n\"
+    }
+  }"

docs/binlog.md

Lines changed: 9 additions & 4 deletions
@@ -1,6 +1,11 @@
 # MySQL binlog reader plugin (*reader)
 
-## 1. Sample configuration
+## 1. First, grant the user the privileges required to read the binlog
+
+GRANT SELECT, REPLICATION SLAVE, REPLICATION CLIENT
+ON *.* TO 'xxx'@'%' IDENTIFIED BY 'xxx';
+
+## 2. Sample configuration
 
 ```json
 {
@@ -9,7 +14,7 @@
 "reader": {
 "parameter": {
 "jdbcUrl" : "jdbc:mysql://127.0.0.1:3306/test?charset=utf8",
-"username" : "username"
+"username" : "username",
 "password" : "password",
 "host" : "127.0.0.1",
 "port": 3306,
@@ -164,10 +169,10 @@
 "table":"tb1",
 "ts":1231232,
 "ingestion":123213,
-"before_id":{
+"before":{
 "id":1
 },
-"after_id":{
+"after":{
 "id":2
 }
 }