
Commit d003953

[doc](ecosystem) add datatype mapping for flink spark connector (#3247)
## Versions
- [x] dev
- [x] 4.x
- [x] 3.x
- [x] 2.1

## Languages
- [x] Chinese
- [x] English

## Docs Checklist
- [ ] Checked by AI
- [ ] Test Cases Built
1 parent 20ddc5b commit d003953

File tree

16 files changed, +404 -80 lines changed


docs/ecosystem/flink-doris-connector.md

Lines changed: 22 additions & 1 deletion
@@ -901,7 +901,7 @@ After starting the Flink cluster, you can directly run the following command:
 | --multi-to-one-target | Used in combination with multi-to-one-origin, the configuration of the target table, for example: --multi-to-one-target "a\|b" |
 | --create-table-only | Whether to only synchronize the structure of the table. |

-### Type Mapping
+### Doris to Flink Data Type Mapping

 | Doris Type | Flink Type |
 | ---------- | ---------- |

@@ -928,6 +928,27 @@ After starting the Flink cluster, you can directly run the following command:
 | IPV4 | STRING |
 | IPV6 | STRING |

+### Flink to Doris Data Type Mapping
+| Flink Type | Doris Type |
+| ------------- | -------------- |
+| BOOLEAN | BOOLEAN |
+| TINYINT | TINYINT |
+| SMALLINT | SMALLINT |
+| INTEGER | INTEGER |
+| BIGINT | BIGINT |
+| FLOAT | FLOAT |
+| DOUBLE | DOUBLE |
+| DECIMAL | DECIMAL |
+| CHAR | CHAR |
+| VARCHAR | VARCHAR/STRING |
+| STRING | STRING |
+| DATE | DATE |
+| TIMESTAMP | DATETIME |
+| TIMESTAMP_LTZ | DATETIME |
+| ARRAY | ARRAY |
+| MAP | MAP/JSON |
+| ROW | STRUCT/JSON |
+
 ### Monitoring Metrics

 Flink provides multiple [Metrics](https://nightlies.apache.org/flink/flink-docs-master/docs/ops/metrics/#metrics) for monitoring the indicators of the Flink cluster. The following are the newly added monitoring metrics for the Flink Doris Connector.
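The new Flink-to-Doris table added above can be read as a simple lookup. As a minimal sketch (not part of the connector API; the dictionary and function names here are hypothetical), it might be consulted like this when converting a Flink SQL type string to its default Doris type:

```python
# Hypothetical sketch, not connector code: the Flink -> Doris mapping table
# from the diff above, expressed as a lookup. Where the table lists two
# targets (e.g. MAP -> MAP/JSON), only the first is kept here.
FLINK_TO_DORIS = {
    "BOOLEAN": "BOOLEAN",
    "TINYINT": "TINYINT",
    "SMALLINT": "SMALLINT",
    "INTEGER": "INTEGER",
    "BIGINT": "BIGINT",
    "FLOAT": "FLOAT",
    "DOUBLE": "DOUBLE",
    "DECIMAL": "DECIMAL",
    "CHAR": "CHAR",
    "VARCHAR": "VARCHAR",       # the table also allows STRING
    "STRING": "STRING",
    "DATE": "DATE",
    "TIMESTAMP": "DATETIME",    # both TIMESTAMP variants land on DATETIME
    "TIMESTAMP_LTZ": "DATETIME",
    "ARRAY": "ARRAY",
    "MAP": "MAP",               # the table also allows JSON
    "ROW": "STRUCT",            # the table also allows JSON
}

def doris_type_for(flink_type: str) -> str:
    """Map a Flink SQL type string (e.g. 'DECIMAL(10, 2)') to its default Doris type."""
    # Strip any precision/scale suffix before the lookup.
    base = flink_type.strip().upper().split("(")[0].strip()
    return FLINK_TO_DORIS[base]
```

For instance, `doris_type_for("DECIMAL(10, 2)")` resolves to `DECIMAL`, and both `TIMESTAMP` and `TIMESTAMP_LTZ` resolve to `DATETIME`.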

docs/ecosystem/spark-doris-connector.md

Lines changed: 29 additions & 9 deletions
@@ -21,6 +21,7 @@ Github: https://github.com/apache/doris-spark-connector

 | Connector | Spark | Doris | Java | Scala |
 |-----------|---------------------|-------------|------|------------|
+| 25.2.0 | 3.5 - 3.1, 2.4 | 1.0 + | 8 | 2.12, 2.11 |
 | 25.1.0 | 3.5 - 3.1, 2.4 | 1.0 + | 8 | 2.12, 2.11 |
 | 25.0.1 | 3.5 - 3.1, 2.4 | 1.0 + | 8 | 2.12, 2.11 |
 | 25.0.0 | 3.5 - 3.1, 2.4 | 1.0 + | 8 | 2.12, 2.11 |

@@ -39,7 +40,7 @@ Github: https://github.com/apache/doris-spark-connector
 <dependency>
     <groupId>org.apache.doris</groupId>
     <artifactId>spark-doris-connector-spark-3.5</artifactId>
-    <version>25.1.0</version>
+    <version>25.2.0</version>
 </dependency>
 ```

@@ -62,7 +63,7 @@ Starting from version 24.0.0, the naming rules of the Doris connector package ha

 When compiling, you can directly run `sh build.sh`, for details, please refer to here.

-After successful compilation, the target jar package will be generated in the `dist` directory, such as: spark-doris-connector-spark-3.5-25.1.0.jar. Copy this file to the `ClassPath` of `Spark` to use `Spark-Doris-Connector`. For example, for `Spark` running in `Local` mode, put this file in the `jars/` folder. For `Spark` running in `Yarn` cluster mode, put this file in the pre-deployment package.
+After successful compilation, the target jar package will be generated in the `dist` directory, such as: spark-doris-connector-spark-3.5-25.2.0.jar. Copy this file to the `ClassPath` of `Spark` to use `Spark-Doris-Connector`. For example, for `Spark` running in `Local` mode, put this file in the `jars/` folder. For `Spark` running in `Yarn` cluster mode, put this file in the pre-deployment package.
 You can also

 Execute in the source code directory:

@@ -71,21 +72,21 @@ Execute in the source code directory:

 Enter the Scala and Spark versions you need to compile according to the prompts.

-After successful compilation, the target jar package will be generated in the `dist` directory, such as: `spark-doris-connector-spark-3.5-25.1.0.jar`.
+After successful compilation, the target jar package will be generated in the `dist` directory, such as: `spark-doris-connector-spark-3.5-25.2.0.jar`.
 Copy this file to the `ClassPath` of `Spark` to use `Spark-Doris-Connector`.

 For example, if `Spark` is running in `Local` mode, put this file in the `jars/` folder. If `Spark` is running in `Yarn` cluster mode, put this file in the pre-deployment package.

-For example, upload `spark-doris-connector-spark-3.5-25.1.0.jar` to hdfs and add the Jar package path on hdfs to the `spark.yarn.jars` parameter
+For example, upload `spark-doris-connector-spark-3.5-25.2.0.jar` to hdfs and add the Jar package path on hdfs to the `spark.yarn.jars` parameter
 ```shell
-1. Upload `spark-doris-connector-spark-3.5-25.1.0.jar` to hdfs.
+1. Upload `spark-doris-connector-spark-3.5-25.2.0.jar` to hdfs.

 hdfs dfs -mkdir /spark-jars/
-hdfs dfs -put /your_local_path/spark-doris-connector-spark-3.5-25.1.0.jar /spark-jars/
+hdfs dfs -put /your_local_path/spark-doris-connector-spark-3.5-25.2.0.jar /spark-jars/

-2. Add the `spark-doris-connector-spark-3.5-25.1.0.jar` dependency in the cluster.
-spark.yarn.jars=hdfs:///spark-jars/spark-doris-connector-spark-3.5-25.1.0.jar
+2. Add the `spark-doris-connector-spark-3.5-25.2.0.jar` dependency in the cluster.
+spark.yarn.jars=hdfs:///spark-jars/spark-doris-connector-spark-3.5-25.2.0.jar

 ```

@@ -449,7 +450,7 @@ insert into your_catalog_name.your_doris_db.your_doris_table select * from your_
 | doris.filter.query | -- | Filter expression of the query, which is transparently transmitted to Doris. Doris uses this expression to complete source-side data filtering. |


-## Doris & Spark Column Type Mapping
+## Data Type Mapping from Doris to Spark

 | Doris Type | Spark Type |
 |------------|-------------------------|

@@ -474,6 +475,25 @@ insert into your_catalog_name.your_doris_db.your_doris_table select * from your_
 | HLL | DataTypes.StringType |
 | Bitmap | DataTypes.StringType |

+
+## Data Type Mapping from Spark to Doris
+
+| Spark Type | Doris Type |
+|----------------|----------------|
+| BooleanType | BOOLEAN |
+| ShortType | SMALLINT |
+| IntegerType | INT |
+| LongType | BIGINT |
+| FloatType | FLOAT |
+| DoubleType | DOUBLE |
+| DecimalType | DECIMAL |
+| StringType | VARCHAR/STRING |
+| DateType | DATE |
+| TimestampType | DATETIME |
+| ArrayType | ARRAY |
+| MapType | MAP/JSON |
+| StructType | STRUCT/JSON |
+
 :::tip

 Since version 24.0.0, the return type of the Bitmap type is string type, and the default return value is string value `Read unsupported`.
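The Spark-to-Doris table added above is what you would consult when writing the DDL for a Doris target table by hand from a Spark schema. As a minimal, hypothetical sketch (not the connector's API; every name below is illustrative), the table could drive a small helper that renders a Doris column list from (column name, Spark Catalyst type) pairs:

```python
# Hypothetical sketch, not connector code: the Spark -> Doris mapping table
# from the diff above. Where the table lists two targets (e.g. MapType ->
# MAP/JSON), only the first is kept here.
SPARK_TO_DORIS = {
    "BooleanType": "BOOLEAN",
    "ShortType": "SMALLINT",
    "IntegerType": "INT",
    "LongType": "BIGINT",
    "FloatType": "FLOAT",
    "DoubleType": "DOUBLE",
    "DecimalType": "DECIMAL",
    "StringType": "VARCHAR",    # the table also allows STRING
    "DateType": "DATE",
    "TimestampType": "DATETIME",
    "ArrayType": "ARRAY",
    "MapType": "MAP",           # the table also allows JSON
    "StructType": "STRUCT",     # the table also allows JSON
}

def doris_column_list(schema):
    """Render a Doris DDL column list for (name, spark_type) pairs."""
    return ", ".join(f"`{name}` {SPARK_TO_DORIS[t]}" for name, t in schema)
```

For example, `doris_column_list([("id", "LongType"), ("ts", "TimestampType")])` renders the column list for a `CREATE TABLE` whose `id` is `BIGINT` and whose `ts` is `DATETIME`.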

i18n/zh-CN/docusaurus-plugin-content-docs/current/ecosystem/flink-doris-connector.md

Lines changed: 22 additions & 1 deletion
@@ -902,7 +902,7 @@ The Flink Doris Connector integrates [Flink CDC](https://nightlies.apache.org/flink
 | --multi-to-one-target | Used together with multi-to-one-origin; the configuration of the target table, for example: --multi-to-one-target "a\|b" |
 | --create-table-only | Whether to synchronize only the table structure |

-### Type Mapping
+### Doris-to-Flink Data Type Mapping

 | Doris Type | Flink Type |
 | ---------- | ---------- |

@@ -929,6 +929,27 @@ The Flink Doris Connector integrates [Flink CDC](https://nightlies.apache.org/flink
 | IPV4 | STRING |
 | IPV6 | STRING |

+### Flink-to-Doris Data Type Mapping
+| Flink Type | Doris Type |
+| ------------- | -------------- |
+| BOOLEAN | BOOLEAN |
+| TINYINT | TINYINT |
+| SMALLINT | SMALLINT |
+| INTEGER | INTEGER |
+| BIGINT | BIGINT |
+| FLOAT | FLOAT |
+| DOUBLE | DOUBLE |
+| DECIMAL | DECIMAL |
+| CHAR | CHAR |
+| VARCHAR | VARCHAR/STRING |
+| STRING | STRING |
+| DATE | DATE |
+| TIMESTAMP | DATETIME |
+| TIMESTAMP_LTZ | DATETIME |
+| ARRAY | ARRAY |
+| MAP | MAP/JSON |
+| ROW | STRUCT/JSON |
+
 ### Monitoring Metrics

 Flink provides multiple [Metrics](https://nightlies.apache.org/flink/flink-docs-master/docs/ops/metrics/#metrics) for monitoring Flink cluster metrics; the following are the monitoring metrics newly added by the Flink Doris Connector.

i18n/zh-CN/docusaurus-plugin-content-docs/current/ecosystem/spark-doris-connector.md

Lines changed: 28 additions & 9 deletions
@@ -20,6 +20,7 @@ The Spark Doris Connector supports reading data stored in Doris through Spark

 | Connector | Spark | Doris | Java | Scala |
 |-----------|---------------------|-------------|------|------------|
+| 25.2.0 | 3.5 - 3.1, 2.4 | 1.0 + | 8 | 2.12, 2.11 |
 | 25.1.0 | 3.5 - 3.1, 2.4 | 1.0 + | 8 | 2.12, 2.11 |
 | 25.0.1 | 3.5 - 3.1, 2.4 | 1.0 + | 8 | 2.12, 2.11 |
 | 25.0.0 | 3.5 - 3.1, 2.4 | 1.0 + | 8 | 2.12, 2.11 |

@@ -37,7 +38,7 @@ The Spark Doris Connector supports reading data stored in Doris through Spark
 <dependency>
     <groupId>org.apache.doris</groupId>
     <artifactId>spark-doris-connector-spark-3.5</artifactId>
-    <version>25.1.0</version>
+    <version>25.2.0</version>
 </dependency>
 ```

@@ -60,28 +61,28 @@ The Spark Doris Connector supports reading data stored in Doris through Spark

 When compiling, you can directly run `sh build.sh`; for details, refer to here.

-After successful compilation, the target jar package will be generated in the `dist` directory, such as: spark-doris-connector-spark-3.5-25.1.0.jar. Copy this file to the `ClassPath` of `Spark` to use `Spark-Doris-Connector`. For `Spark` running in `Local` mode, put this file in the `jars/` folder; for `Spark` running in `Yarn` cluster mode, put this file in the pre-deployment package.
+After successful compilation, the target jar package will be generated in the `dist` directory, such as: spark-doris-connector-spark-3.5-25.2.0.jar. Copy this file to the `ClassPath` of `Spark` to use `Spark-Doris-Connector`. For `Spark` running in `Local` mode, put this file in the `jars/` folder; for `Spark` running in `Yarn` cluster mode, put this file in the pre-deployment package.
 You can also


 2. Execute in the source code directory:
 `sh build.sh`
 and enter the Scala and Spark versions you need when prompted.

-After successful compilation, the target jar package will be generated in the `dist` directory, such as: `spark-doris-connector-spark-3.5-25.1.0.jar`.
+After successful compilation, the target jar package will be generated in the `dist` directory, such as: `spark-doris-connector-spark-3.5-25.2.0.jar`.
 Copy this file to the `ClassPath` of `Spark` to use `Spark-Doris-Connector`.

 For `Spark` running in `Local` mode, put this file in the `jars/` folder; for `Spark` running in `Yarn` cluster mode, put this file in the pre-deployment package.

-For example, upload `spark-doris-connector-spark-3.5-25.1.0.jar` to hdfs and add the Jar package path on hdfs to the `spark.yarn.jars` parameter
+For example, upload `spark-doris-connector-spark-3.5-25.2.0.jar` to hdfs and add the Jar package path on hdfs to the `spark.yarn.jars` parameter
 ```shell
-1. Upload `spark-doris-connector-spark-3.5-25.1.0.jar` to hdfs.
+1. Upload `spark-doris-connector-spark-3.5-25.2.0.jar` to hdfs.

 hdfs dfs -mkdir /spark-jars/
-hdfs dfs -put /your_local_path/spark-doris-connector-spark-3.5-25.1.0.jar /spark-jars/
+hdfs dfs -put /your_local_path/spark-doris-connector-spark-3.5-25.2.0.jar /spark-jars/

-2. Add the `spark-doris-connector-spark-3.5-25.1.0.jar` dependency in the cluster.
-spark.yarn.jars=hdfs:///spark-jars/spark-doris-connector-spark-3.5-25.1.0.jar
+2. Add the `spark-doris-connector-spark-3.5-25.2.0.jar` dependency in the cluster.
+spark.yarn.jars=hdfs:///spark-jars/spark-doris-connector-spark-3.5-25.2.0.jar

 ```

@@ -449,7 +450,7 @@ insert into your_catalog_name.your_doris_db.your_doris_table select * from your_
 | doris.filter.query | -- | Filter expression for reading data, passed through to Doris. Doris uses this expression to filter data on the source side. |


-## Doris & Spark Column Type Mapping
+## Doris-to-Spark Column Type Mapping

 | Doris Type | Spark Type |
 |------------|-------------------------|

@@ -474,6 +475,24 @@ insert into your_catalog_name.your_doris_db.your_doris_table select * from your_
 | HLL | DataTypes.StringType |
 | Bitmap | DataTypes.StringType |

+## Spark-to-Doris Data Type Mapping
+
+| Spark Type | Doris Type |
+|----------------|----------------|
+| BooleanType | BOOLEAN |
+| ShortType | SMALLINT |
+| IntegerType | INT |
+| LongType | BIGINT |
+| FloatType | FLOAT |
+| DoubleType | DOUBLE |
+| DecimalType | DECIMAL |
+| StringType | VARCHAR/STRING |
+| DateType | DATE |
+| TimestampType | DATETIME |
+| ArrayType | ARRAY |
+| MapType | MAP/JSON |
+| StructType | STRUCT/JSON |
+
 :::tip

 Since version 24.0.0, reading the Bitmap type returns a string, with the default string value `Read unsupported`.

i18n/zh-CN/docusaurus-plugin-content-docs/version-2.1/ecosystem/flink-doris-connector.md

Lines changed: 22 additions & 1 deletion
@@ -902,7 +902,7 @@ The Flink Doris Connector integrates [Flink CDC](https://nightlies.apache.org/flink
 | --multi-to-one-target | Used together with multi-to-one-origin; the configuration of the target table, for example: --multi-to-one-target "a\|b" |
 | --create-table-only | Whether to synchronize only the table structure |

-### Type Mapping
+### Doris-to-Flink Data Type Mapping

 | Doris Type | Flink Type |
 | ---------- | ---------- |

@@ -929,6 +929,27 @@ The Flink Doris Connector integrates [Flink CDC](https://nightlies.apache.org/flink
 | IPV4 | STRING |
 | IPV6 | STRING |

+### Flink-to-Doris Data Type Mapping
+| Flink Type | Doris Type |
+| ------------- | -------------- |
+| BOOLEAN | BOOLEAN |
+| TINYINT | TINYINT |
+| SMALLINT | SMALLINT |
+| INTEGER | INTEGER |
+| BIGINT | BIGINT |
+| FLOAT | FLOAT |
+| DOUBLE | DOUBLE |
+| DECIMAL | DECIMAL |
+| CHAR | CHAR |
+| VARCHAR | VARCHAR/STRING |
+| STRING | STRING |
+| DATE | DATE |
+| TIMESTAMP | DATETIME |
+| TIMESTAMP_LTZ | DATETIME |
+| ARRAY | ARRAY |
+| MAP | MAP/JSON |
+| ROW | STRUCT/JSON |
+
 ### Monitoring Metrics

 Flink provides multiple [Metrics](https://nightlies.apache.org/flink/flink-docs-master/docs/ops/metrics/#metrics) for monitoring Flink cluster metrics; the following are the monitoring metrics newly added by the Flink Doris Connector.

i18n/zh-CN/docusaurus-plugin-content-docs/version-2.1/ecosystem/spark-doris-connector.md

Lines changed: 28 additions & 9 deletions
@@ -20,6 +20,7 @@ The Spark Doris Connector supports reading data stored in Doris through Spark

 | Connector | Spark | Doris | Java | Scala |
 |-----------|---------------------|-------------|------|------------|
+| 25.2.0 | 3.5 - 3.1, 2.4 | 1.0 + | 8 | 2.12, 2.11 |
 | 25.1.0 | 3.5 - 3.1, 2.4 | 1.0 + | 8 | 2.12, 2.11 |
 | 25.0.1 | 3.5 - 3.1, 2.4 | 1.0 + | 8 | 2.12, 2.11 |
 | 25.0.0 | 3.5 - 3.1, 2.4 | 1.0 + | 8 | 2.12, 2.11 |

@@ -37,7 +38,7 @@ The Spark Doris Connector supports reading data stored in Doris through Spark
 <dependency>
     <groupId>org.apache.doris</groupId>
     <artifactId>spark-doris-connector-spark-3.5</artifactId>
-    <version>25.1.0</version>
+    <version>25.2.0</version>
 </dependency>
 ```

@@ -60,28 +61,28 @@ The Spark Doris Connector supports reading data stored in Doris through Spark

 When compiling, you can directly run `sh build.sh`; for details, refer to here.

-After successful compilation, the target jar package will be generated in the `dist` directory, such as: spark-doris-connector-spark-3.5-25.1.0.jar. Copy this file to the `ClassPath` of `Spark` to use `Spark-Doris-Connector`. For `Spark` running in `Local` mode, put this file in the `jars/` folder; for `Spark` running in `Yarn` cluster mode, put this file in the pre-deployment package.
+After successful compilation, the target jar package will be generated in the `dist` directory, such as: spark-doris-connector-spark-3.5-25.2.0.jar. Copy this file to the `ClassPath` of `Spark` to use `Spark-Doris-Connector`. For `Spark` running in `Local` mode, put this file in the `jars/` folder; for `Spark` running in `Yarn` cluster mode, put this file in the pre-deployment package.
 You can also


 2. Execute in the source code directory:
 `sh build.sh`
 and enter the Scala and Spark versions you need when prompted.

-After successful compilation, the target jar package will be generated in the `dist` directory, such as: `spark-doris-connector-spark-3.5-25.1.0.jar`.
+After successful compilation, the target jar package will be generated in the `dist` directory, such as: `spark-doris-connector-spark-3.5-25.2.0.jar`.
 Copy this file to the `ClassPath` of `Spark` to use `Spark-Doris-Connector`.

 For `Spark` running in `Local` mode, put this file in the `jars/` folder; for `Spark` running in `Yarn` cluster mode, put this file in the pre-deployment package.

-For example, upload `spark-doris-connector-spark-3.5-25.1.0.jar` to hdfs and add the Jar package path on hdfs to the `spark.yarn.jars` parameter
+For example, upload `spark-doris-connector-spark-3.5-25.2.0.jar` to hdfs and add the Jar package path on hdfs to the `spark.yarn.jars` parameter
 ```shell
-1. Upload `spark-doris-connector-spark-3.5-25.1.0.jar` to hdfs.
+1. Upload `spark-doris-connector-spark-3.5-25.2.0.jar` to hdfs.

 hdfs dfs -mkdir /spark-jars/
-hdfs dfs -put /your_local_path/spark-doris-connector-spark-3.5-25.1.0.jar /spark-jars/
+hdfs dfs -put /your_local_path/spark-doris-connector-spark-3.5-25.2.0.jar /spark-jars/

-2. Add the `spark-doris-connector-spark-3.5-25.1.0.jar` dependency in the cluster.
-spark.yarn.jars=hdfs:///spark-jars/spark-doris-connector-spark-3.5-25.1.0.jar
+2. Add the `spark-doris-connector-spark-3.5-25.2.0.jar` dependency in the cluster.
+spark.yarn.jars=hdfs:///spark-jars/spark-doris-connector-spark-3.5-25.2.0.jar

 ```

@@ -449,7 +450,7 @@ insert into your_catalog_name.your_doris_db.your_doris_table select * from your_
 | doris.filter.query | -- | Filter expression for reading data, passed through to Doris. Doris uses this expression to filter data on the source side. |


-## Doris & Spark Column Type Mapping
+## Doris-to-Spark Column Type Mapping

 | Doris Type | Spark Type |
 |------------|-------------------------|

@@ -474,6 +475,24 @@ insert into your_catalog_name.your_doris_db.your_doris_table select * from your_
 | HLL | DataTypes.StringType |
 | Bitmap | DataTypes.StringType |

+## Spark-to-Doris Data Type Mapping
+
+| Spark Type | Doris Type |
+|----------------|----------------|
+| BooleanType | BOOLEAN |
+| ShortType | SMALLINT |
+| IntegerType | INT |
+| LongType | BIGINT |
+| FloatType | FLOAT |
+| DoubleType | DOUBLE |
+| DecimalType | DECIMAL |
+| StringType | VARCHAR/STRING |
+| DateType | DATE |
+| TimestampType | DATETIME |
+| ArrayType | ARRAY |
+| MapType | MAP/JSON |
+| StructType | STRUCT/JSON |
+
 :::tip

 Since version 24.0.0, reading the Bitmap type returns a string, with the default string value `Read unsupported`.
