17 changes: 17 additions & 0 deletions docs/en/connector-v2/sink/HdfsFile.md
@@ -88,12 +88,29 @@ Output data to hdfs file
| enable_header_write | boolean | no | false | Only used when file_format_type is text, csv.<br/> false: don't write header, true: write header. |
| encoding | string | no | "UTF-8" | Only used when file_format_type is json, text, csv, xml. |
| remote_user | string | no | - | The remote user name of HDFS. |
| schema_save_mode | string | no | CREATE_SCHEMA_WHEN_NOT_EXIST | Existing dir processing method |
| data_save_mode | string | no | APPEND_DATA | Existing data processing method |
| merge_update_event | boolean | no | false | Only used when file_format_type is canal_json, debezium_json or maxwell_json. When true, the UPDATE_BEFORE and UPDATE_AFTER events are merged into a single UPDATE event. |

### Tips

> If you use Spark/Flink, you must ensure your Spark/Flink cluster is already integrated with Hadoop before using this connector. The tested Hadoop version is 2.x. If you use SeaTunnel Engine, the Hadoop jar is integrated automatically when you download and install SeaTunnel Engine; you can check the jar packages under ${SEATUNNEL_HOME}/lib to confirm this.

### schema_save_mode [string]

Existing dir processing method; a minimal config sketch follows the list.
- RECREATE_SCHEMA: create the dir when it does not exist; delete and recreate it when it already exists
- CREATE_SCHEMA_WHEN_NOT_EXIST: create the dir when it does not exist; skip when it already exists
- ERROR_WHEN_SCHEMA_NOT_EXIST: report an error when the dir does not exist
- IGNORE: ignore the handling of the table
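
A minimal sink sketch, assuming an illustrative HDFS address and output path (neither is from this PR):

```hocon
sink {
  HdfsFile {
    fs.defaultFS = "hdfs://namenode:9000"  # illustrative cluster address
    path = "/tmp/seatunnel/output"         # illustrative output dir
    file_format_type = "text"
    # Recreate the target dir on every run: delete it if present, then create it
    schema_save_mode = "RECREATE_SCHEMA"
  }
}
```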

### data_save_mode [string]

Existing data processing method; see the sketch after this list.
- DROP_DATA: preserve the dir and delete the data files
- APPEND_DATA: preserve the dir and preserve the data files
- ERROR_WHEN_DATA_EXISTS: report an error when data files already exist
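
A minimal sketch, reusing the same illustrative address and path as above:

```hocon
sink {
  HdfsFile {
    fs.defaultFS = "hdfs://namenode:9000"  # illustrative cluster address
    path = "/tmp/seatunnel/output"         # illustrative output dir
    file_format_type = "text"
    # Keep the dir but delete data files left over from earlier runs
    data_save_mode = "DROP_DATA"
  }
}
```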

### merge_update_event [boolean]

Only used when file_format_type is canal_json, debezium_json or maxwell_json.
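
A sketch of a CDC-format sink using this option (source side omitted; address and path are illustrative):

```hocon
sink {
  HdfsFile {
    fs.defaultFS = "hdfs://namenode:9000"  # illustrative cluster address
    path = "/tmp/seatunnel/cdc"            # illustrative output dir
    file_format_type = "debezium_json"
    # Merge each UPDATE_BEFORE/UPDATE_AFTER pair into a single UPDATE event
    merge_update_event = true
  }
}
```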
17 changes: 17 additions & 0 deletions docs/zh/connector-v2/sink/HdfsFile.md
@@ -79,6 +79,8 @@ import ChangeLog from '../changelog/connector-file-hadoop.md';
| max_rows_in_memory | int | no | - | Only used when file_format is excel. The maximum number of data items that can be cached in memory when the file format is Excel. |
| sheet_name | string | no | Sheet${Random number} | Only used when file_format is excel. Write the sheet of the workbook under the specified sheet name. |
| remote_user | string | no | - | The remote user name of HDFS. |
| schema_save_mode | string | no | CREATE_SCHEMA_WHEN_NOT_EXIST | Existing dir processing method |
| data_save_mode | string | no | APPEND_DATA | Existing data processing method |
| merge_update_event | boolean | no | false | Only used when file_format_type is canal_json, debezium_json or maxwell_json. |

### Tips
@@ -87,6 +89,21 @@ import ChangeLog from '../changelog/connector-file-hadoop.md';
> 2.x. If you use SeaTunnel Engine, the Hadoop jar is integrated automatically when you download and install SeaTunnel Engine.
> You can check the jar packages under `${SEATUNNEL_HOME}/lib` to confirm this.

### schema_save_mode [string]

Existing dir processing method.
- RECREATE_SCHEMA: create the dir when it does not exist; delete and recreate it when it already exists
- CREATE_SCHEMA_WHEN_NOT_EXIST: create the dir when it does not exist; skip when it already exists
- ERROR_WHEN_SCHEMA_NOT_EXIST: report an error when the dir does not exist
- IGNORE: ignore the handling of the table

### data_save_mode [string]

Existing data processing method.
- DROP_DATA: preserve the dir and delete the data files
- APPEND_DATA: preserve the dir and preserve the data files
- ERROR_WHEN_DATA_EXISTS: report an error when data files already exist

### merge_update_event [boolean]

Only used when file_format_type is canal_json, debezium_json or maxwell_json.
21 changes: 19 additions & 2 deletions docs/zh/connector-v2/sink/LocalFile.md
@@ -72,8 +72,10 @@ import ChangeLog from '../changelog/connector-file-local.md';
| parquet_avro_write_timestamp_as_int96 | boolean | no | false | Only used when file_format is parquet |
| parquet_avro_write_fixed_as_int96 | array | no | - | Only used when file_format is parquet |
| enable_header_write | boolean | no | false | Only used when file_format_type is text, csv.<br/> false: don't write header, true: write header. |
| encoding | string | no | "UTF-8" | Only used when file_format_type is json, text, csv, xml |
| merge_update_event | boolean | no | false | Only used when file_format_type is canal_json, debezium_json or maxwell_json. |
| encoding | string | no | "UTF-8" | Only used when file_format_type is json, text, csv, xml |
| schema_save_mode | string | no | CREATE_SCHEMA_WHEN_NOT_EXIST | Existing dir processing method |
| data_save_mode | string | no | APPEND_DATA | Existing data processing method |
| merge_update_event | boolean | no | false | Only used when file_format_type is canal_json, debezium_json or maxwell_json. |

### path [string]

@@ -226,6 +228,21 @@ _root_tag [string]

Only used when file_format_type is json, text, csv, xml. The encoding of the file to write. This parameter is parsed by `Charset.forName(encoding)`.

### schema_save_mode [string]

Existing dir processing method; a combined sketch follows the data_save_mode list below.
- RECREATE_SCHEMA: create the dir when it does not exist; delete and recreate it when it already exists
- CREATE_SCHEMA_WHEN_NOT_EXIST: create the dir when it does not exist; skip when it already exists
- ERROR_WHEN_SCHEMA_NOT_EXIST: report an error when the dir does not exist
- IGNORE: ignore the handling of the table

### data_save_mode [string]

Existing data processing method; see the sketch after this list.
- DROP_DATA: preserve the dir and delete the data files
- APPEND_DATA: preserve the dir and preserve the data files
- ERROR_WHEN_DATA_EXISTS: report an error when data files already exist
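
A minimal LocalFile sink sketch showing both options together, assuming an illustrative local path:

```hocon
sink {
  LocalFile {
    path = "/tmp/seatunnel/output"  # illustrative local dir
    file_format_type = "csv"
    # Create the dir only if missing, and append to any data already there
    schema_save_mode = "CREATE_SCHEMA_WHEN_NOT_EXIST"
    data_save_mode = "APPEND_DATA"
  }
}
```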

### merge_update_event [boolean]

Only used when file_format_type is canal_json, debezium_json or maxwell_json.
@@ -129,6 +129,8 @@ public OptionRule optionRule() {
.optional(FileBaseSinkOptions.CREATE_EMPTY_FILE_WHEN_NO_DATA)
.optional(FileBaseSinkOptions.FILENAME_EXTENSION)
.optional(FileBaseSinkOptions.TMP_PATH)
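// Save-mode options added in this PR: control handling of existing dirs and data files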
.optional(FileBaseSinkOptions.SCHEMA_SAVE_MODE)
.optional(FileBaseSinkOptions.DATA_SAVE_MODE)
.build();
}
