Commit db1caee

[FLINK-38837][docs][pipeline-connector/hudi] Add documentation for Hudi connector and improve the configuration
This closes #4200.
1 parent 1da0567 commit db1caee

8 files changed

+477
-108
lines changed

Lines changed: 222 additions & 0 deletions
@@ -0,0 +1,222 @@
---
title: "Hudi"
weight: 10
type: docs
aliases:
- /connectors/pipeline-connectors/hudi
---
<!--
Licensed to the Apache Software Foundation (ASF) under one
or more contributor license agreements. See the NOTICE file
distributed with this work for additional information
regarding copyright ownership. The ASF licenses this file
to you under the Apache License, Version 2.0 (the
"License"); you may not use this file except in compliance
with the License. You may obtain a copy of the License at

http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing,
software distributed under the License is distributed on an
"AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
KIND, either express or implied. See the License for the
specific language governing permissions and limitations
under the License.
-->

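# Hudi Pipeline Connector

The Hudi Pipeline connector can be used as the *Data Sink* of a pipeline to write data to [Apache Hudi](https://hudi.apache.org). This document describes how to set up the Hudi Pipeline connector.

## What can the connector do?
* Automatically creates Hudi tables
* Automatically synchronizes schema changes
* Synchronizes data in real time

How to create a Pipeline
----------------

A pipeline that reads data from MySQL and writes it to Hudi can be defined as follows:
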
```yaml
source:
  type: mysql
  name: MySQL Source
  hostname: 127.0.0.1
  port: 3306
  username: admin
  password: pass
  tables: adb.\.*, bdb.user_table_[0-9]+
  server-id: 5401-5404

sink:
  type: hudi
  name: Hudi Sink
  path: /path/warehouse
  hoodie.table.type: MERGE_ON_READ

transform:
  - source-table: adb.\.*
    table-options: ordering.fields=ts1
  - source-table: bdb.user_table_[0-9]+
    table-options: ordering.fields=ts2

pipeline:
  name: MySQL to Hudi Pipeline
  parallelism: 4
```

Pipeline Connector Options
----------------
<div class="highlight">
<table class="colwidths-auto docutils">
   <thead>
      <tr>
        <th class="text-left" style="width: 25%">Option</th>
        <th class="text-left" style="width: 8%">Required</th>
        <th class="text-left" style="width: 7%">Default</th>
        <th class="text-left" style="width: 10%">Type</th>
        <th class="text-left" style="width: 50%">Description</th>
      </tr>
    </thead>
    <tbody>
    <tr>
      <td>type</td>
      <td>required</td>
      <td style="word-wrap: break-word;">(none)</td>
      <td>String</td>
      <td>Specifies the connector to use. This must be set to <code>'hudi'</code>.</td>
    </tr>
    <tr>
      <td>path</td>
      <td>required</td>
      <td style="word-wrap: break-word;">(none)</td>
      <td>String</td>
      <td>Base path of the target <code>hudi</code> table.</td>
    </tr>
    <tr>
      <td>name</td>
      <td>optional</td>
      <td style="word-wrap: break-word;">(none)</td>
      <td>String</td>
      <td>Name of the sink.</td>
    </tr>
    <tr>
      <td>hoodie.table.type</td>
      <td>optional</td>
      <td style="word-wrap: break-word;">(none)</td>
      <td>String</td>
      <td>Type of the target <code>hudi</code> table. Currently only <code>MERGE_ON_READ</code> is supported.</td>
    </tr>
    <tr>
      <td>index.type</td>
      <td>optional</td>
      <td style="word-wrap: break-word;">(none)</td>
      <td>String</td>
      <td>Index type of the target <code>hudi</code> table. Currently only <code>BUCKET</code> is supported.</td>
    </tr>
    <tr>
      <td>table.properties.*</td>
      <td>optional</td>
      <td style="word-wrap: break-word;">(none)</td>
      <td>String</td>
      <td>Passes options supported by <code>hudi</code> tables through to the pipeline. See <a href="https://hudi.apache.org/docs/configurations/#FLINK_SQL">Hudi table options</a>.</td>
    </tr>
    </tbody>
</table>
</div>

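As a rough illustration of how the options above fit together, the sink block below sets each listed option explicitly. It is a sketch under stated assumptions, not a verified configuration: the path is a placeholder, and the `table.properties.hoodie.bucket.index.num.buckets` entry assumes that the `table.properties.` prefix forwards the rest of the key as a Hudi Flink option. Check the linked Hudi table options for the exact key names before relying on it. Since only `MERGE_ON_READ` and `BUCKET` are currently supported, writing them out mainly documents intent.

```yaml
sink:
  type: hudi                        # required: selects the Hudi connector
  name: Hudi Sink                   # optional: name of the sink
  path: /path/warehouse             # required: base path (placeholder)
  hoodie.table.type: MERGE_ON_READ  # the only table type supported for now
  index.type: BUCKET                # the only index type supported for now
  # Assumed pass-through key; verify it against the Hudi table options.
  table.properties.hoodie.bucket.index.num.buckets: 8
```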
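Usage Notes
--------

* The source table must define a primary key. Users can override the primary key through <code>transform</code> rules, or configure Ordering Fields.
* Currently only the MERGE_ON_READ table type and the simple BUCKET index are supported.
* Exactly-once semantics are not supported yet. The connector achieves idempotent writes through at-least-once semantics combined with primary-key tables.
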
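The first note can be sketched as a `transform` rule. The table `adb.orders` and the columns `order_id` and `updated_at` are hypothetical. `primary-keys` is the Flink CDC transform field for overriding a table's primary key, and `ordering.fields` is passed through `table-options` in the same way as in the pipeline example in this document. The comment on how ordering fields are used reflects an assumption about Hudi's behavior, not something this document states.

```yaml
transform:
  - source-table: adb.orders
    # Hypothetical column to use as the primary key of the sink table.
    primary-keys: order_id
    # Hypothetical column; assumed to decide which record wins
    # when several records share the same key.
    table-options: ordering.fields=updated_at
```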
Data Type Mapping
----------------
<div class="wy-table-responsive">
<table class="colwidths-auto docutils">
    <thead>
      <tr>
        <th class="text-left">CDC type</th>
        <th class="text-left">Hudi type</th>
        <th class="text-left" style="width:60%;">NOTE</th>
      </tr>
    </thead>
    <tbody>
    <tr>
      <td>TINYINT</td>
      <td>TINYINT</td>
      <td></td>
    </tr>
    <tr>
      <td>SMALLINT</td>
      <td>SMALLINT</td>
      <td></td>
    </tr>
    <tr>
      <td>INT</td>
      <td>INT</td>
      <td></td>
    </tr>
    <tr>
      <td>BIGINT</td>
      <td>BIGINT</td>
      <td></td>
    </tr>
    <tr>
      <td>FLOAT</td>
      <td>FLOAT</td>
      <td></td>
    </tr>
    <tr>
      <td>DOUBLE</td>
      <td>DOUBLE</td>
      <td></td>
    </tr>
    <tr>
      <td>DECIMAL(p, s)</td>
      <td>DECIMAL(p, s)</td>
      <td></td>
    </tr>
    <tr>
      <td>BOOLEAN</td>
      <td>BOOLEAN</td>
      <td></td>
    </tr>
    <tr>
      <td>DATE</td>
      <td>DATE</td>
      <td></td>
    </tr>
    <tr>
      <td>TIME</td>
      <td>TIME</td>
      <td></td>
    </tr>
    <tr>
      <td>TIMESTAMP</td>
      <td>TIMESTAMP</td>
      <td></td>
    </tr>
    <tr>
      <td>BINARY</td>
      <td>BINARY</td>
      <td></td>
    </tr>
    <tr>
      <td>CHAR(n)</td>
      <td>CHAR(n)</td>
      <td></td>
    </tr>
    <tr>
      <td>VARCHAR(n)</td>
      <td>VARCHAR(n)</td>
      <td></td>
    </tr>
    </tbody>
</table>
</div>

{{< top >}}
