
Commit abb2e55

Merge branch 'master' into enable-inculde-comments
2 parents: 1729423 + 79e868b

119 files changed, +5844 −977 lines


README.md

Lines changed: 28 additions & 3 deletions
@@ -28,14 +28,41 @@ and elegance of data integration via YAML to describe the data movement and tran
 The Flink CDC prioritizes efficient end-to-end data integration and offers enhanced functionalities such as
 full database synchronization, sharding table synchronization, schema evolution and data transformation.
 
-![Flink CDC framework desigin](docs/static/fig/architecture.png)
+![Flink CDC framework design](docs/static/fig/architecture.png)
 
+### Quickstart Guide
 
+Flink CDC provides a CdcUp CLI utility to start a playground environment and run Flink CDC jobs.
+You will need a working Docker and Docker Compose environment to use it.
+
+1. Run `git clone https://github.com/apache/flink-cdc.git --depth=1` to retrieve a copy of the Flink CDC source code.
+2. Run `cd tools/cdcup/ && ./cdcup.sh init` to use the CdcUp tool to initialize a playground environment.
+3. Run `./cdcup.sh up` to boot up the Docker containers, and wait for them to be ready.
+4. Run `./cdcup.sh mysql` to open a MySQL session, and create at least one table.
+
+   ```sql
+   -- initialize db and table
+   CREATE DATABASE cdc_playground;
+   USE cdc_playground;
+   CREATE TABLE test_table (id INT PRIMARY KEY, name VARCHAR(32));
+
+   -- insert test data
+   INSERT INTO test_table VALUES (1, 'alice'), (2, 'bob'), (3, 'cicada'), (4, 'derrida');
+
+   -- verify if it has been successfully inserted
+   SELECT * FROM test_table;
+   ```
+
+5. Run `./cdcup.sh pipeline pipeline-definition.yaml` to submit the pipeline job. You may also edit the pipeline definition file for further configuration.
+6. Run `./cdcup.sh flink` to access the Flink Web UI.
 
 ### Getting Started
 
 1. Prepare an [Apache Flink](https://nightlies.apache.org/flink/flink-docs-master/docs/try-flink/local_installation/#starting-and-stopping-a-local-cluster) cluster and set up the `FLINK_HOME` environment variable.
 2. [Download](https://github.com/apache/flink-cdc/releases) the Flink CDC tarball, unzip it, and put the pipeline connector jars into the Flink `lib` directory.
+
+   > If you're using macOS or Linux, you may use `brew install apache-flink-cdc` to install Flink CDC and compatible connectors quickly.
+
 3. Create a **YAML** file to describe the data source and data sink; the following example synchronizes all tables under the MySQL app_db database to Doris:
 ```yaml
 source:
@@ -89,8 +116,6 @@ Try it out yourself with our more detailed [tutorial](docs/content/docs/get-star
 You can also see [connector overview](docs/content/docs/connectors/pipeline-connectors/overview.md) to view a comprehensive catalog of the
 connectors currently provided and understand more detailed configurations.
 
-
-
 ### Join the Community
 
 There are many ways to participate in the Apache Flink CDC community. The
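The `pipeline-definition.yaml` referenced in step 5 of the new Quickstart section follows the same source/sink/pipeline layout as the Getting Started example. A minimal sketch, assuming a MySQL-to-Doris playground pipeline; all hostnames, ports, and credentials are placeholders:

```yaml
# Hypothetical pipeline definition for the CdcUp playground; values are placeholders.
source:
  type: mysql
  hostname: localhost
  port: 3306
  username: root
  password: "123456"
  tables: app_db.\.*   # capture every table under app_db

sink:
  type: doris
  fenodes: 127.0.0.1:8030
  username: root
  password: ""

pipeline:
  name: Sync MySQL Database to Doris
  parallelism: 2
```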

docs/content.zh/docs/connectors/pipeline-connectors/maxcompute.md

Lines changed: 12 additions & 12 deletions
@@ -94,15 +94,15 @@ pipeline:
 <td>The name of the sink.</td>
 </tr>
 <tr>
-<td>accessId</td>
+<td>access-id</td>
 <td>required</td>
 <td style="word-wrap: break-word;">(none)</td>
 <td>String</td>
 <td>AccessKey ID of the Alibaba Cloud account or RAM user. You can visit the <a href="https://ram.console.aliyun.com/manage/ak">
 AccessKey management page</a> to obtain the AccessKey ID.</td>
 </tr>
 <tr>
-<td>accessKey</td>
+<td>access-key</td>
 <td>required</td>
 <td style="word-wrap: break-word;">(none)</td>
 <td>String</td>
@@ -126,65 +126,65 @@ pipeline:
 MaxCompute console</a> and obtain the MaxCompute project name on the Workspace > Project Management page.</td>
 </tr>
 <tr>
-<td>tunnelEndpoint</td>
+<td>tunnel.endpoint</td>
 <td>optional</td>
 <td style="word-wrap: break-word;">(none)</td>
 <td>String</td>
 <td>The connection address of the MaxCompute Tunnel service. Typically, this configuration is auto-routed based on the region where the specified project is located. Use it only in special network environments such as behind a proxy.</td>
 </tr>
 <tr>
-<td>quotaName</td>
+<td>quota.name</td>
 <td>optional</td>
 <td style="word-wrap: break-word;">(none)</td>
 <td>String</td>
 <td>The name of the exclusive resource group used for MaxCompute data transfer. If not specified, the shared resource group is used. For details, see <a href="https://help.aliyun.com/zh/maxcompute/user-guide/purchase-and-use-exclusive-resource-groups-for-dts">
 Using exclusive resource groups for MaxCompute</a></td>
 </tr>
 <tr>
-<td>stsToken</td>
+<td>sts-token</td>
 <td>optional</td>
 <td style="word-wrap: break-word;">(none)</td>
 <td>String</td>
 <td>Required when authenticating with a short-lived access token (STS Token) issued by a RAM role.</td>
 </tr>
 <tr>
-<td>bucketsNum</td>
+<td>buckets-num</td>
 <td>optional</td>
 <td style="word-wrap: break-word;">16</td>
 <td>Integer</td>
 <td>The number of buckets used when auto-creating MaxCompute Delta tables. For usage, see <a href="https://help.aliyun.com/zh/maxcompute/user-guide/transaction-table2-0-overview">
 Delta Table Overview</a></td>
 </tr>
 <tr>
-<td>compressAlgorithm</td>
+<td>compress.algorithm</td>
 <td>optional</td>
 <td style="word-wrap: break-word;">zlib</td>
 <td>String</td>
-<td>The data compression algorithm used when writing to MaxCompute. Currently supports <code>raw</code> (no compression), <code>zlib</code>, and <code>snappy</code>.</td>
+<td>The data compression algorithm used when writing to MaxCompute. Currently supports <code>raw</code> (no compression), <code>zlib</code>, <code>lz4</code>, and <code>snappy</code>.</td>
 </tr>
 <tr>
-<td>totalBatchSize</td>
+<td>total.buffer-size</td>
 <td>optional</td>
 <td style="word-wrap: break-word;">64MB</td>
 <td>String</td>
 <td>The size of the data buffered in memory, at partition level (table level for non-partitioned tables). Buffers for different partitions (tables) are independent, and data is written to MaxCompute when the threshold is reached.</td>
 </tr>
 <tr>
-<td>bucketBatchSize</td>
+<td>bucket.buffer-size</td>
 <td>optional</td>
 <td style="word-wrap: break-word;">4MB</td>
 <td>String</td>
 <td>The size of the data buffered in memory, at bucket level. Effective only when writing to Delta tables. Buffers for different buckets are independent, and a bucket's data is written to MaxCompute when the threshold is reached.</td>
 </tr>
 <tr>
-<td>numCommitThreads</td>
+<td>commit.thread-num</td>
 <td>optional</td>
 <td style="word-wrap: break-word;">16</td>
 <td>Integer</td>
 <td>The number of partitions (tables) that can be processed simultaneously during the checkpoint stage.</td>
 </tr>
 <tr>
-<td>numFlushConcurrent</td>
+<td>flush.concurrent-num</td>
 <td>optional</td>
 <td style="word-wrap: break-word;">4</td>
 <td>Integer</td>

docs/content.zh/docs/core-concept/transform.md

Lines changed: 4 additions & 4 deletions
@@ -92,10 +92,10 @@ Flink CDC uses [Calcite](https://calcite.apache.org/) to parse expressions and [
 |----------------------|-----------------------------|-----------------------------------------------------------------|
 | value1 = value2 | valueEquals(value1, value2) | Returns TRUE if value1 is equal to value2; returns FALSE if value1 or value2 is NULL. |
 | value1 <> value2 | !valueEquals(value1, value2) | Returns TRUE if value1 is not equal to value2; returns FALSE if value1 or value2 is NULL. |
-| value1 > value2 | value1 > value2 | Returns TRUE if value1 is greater than value2; returns FALSE if value1 or value2 is NULL. |
-| value1 >= value2 | value1 >= value2 | Returns TRUE if value1 is greater than or equal to value2; returns FALSE if value1 or value2 is NULL. |
-| value1 < value2 | value1 < value2 | Returns TRUE if value1 is less than value2; returns FALSE if value1 or value2 is NULL. |
-| value1 <= value2 | value1 <= value2 | Returns TRUE if value1 is less than or equal to value2; returns FALSE if value1 or value2 is NULL. |
+| value1 > value2 | greaterThan(value1, value2) | Returns TRUE if value1 is greater than value2; returns FALSE if value1 or value2 is NULL. |
+| value1 >= value2 | greaterThanOrEqual(value1, value2) | Returns TRUE if value1 is greater than or equal to value2; returns FALSE if value1 or value2 is NULL. |
+| value1 < value2 | lessThan(value1, value2) | Returns TRUE if value1 is less than value2; returns FALSE if value1 or value2 is NULL. |
+| value1 <= value2 | lessThanOrEqual(value1, value2) | Returns TRUE if value1 is less than or equal to value2; returns FALSE if value1 or value2 is NULL. |
 | value IS NULL | null == value | Returns TRUE if value is NULL. |
 | value IS NOT NULL | null != value | Returns TRUE if value is not NULL. |
 | value1 BETWEEN value2 AND value3 | betweenAsymmetric(value1, value2, value3) | Returns TRUE if value1 is greater than or equal to value2 and less than or equal to value3. |
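The right-hand column above compiles to Java expressions evaluated by Janino, and the renamed functions make the NULL handling explicit: any ordering comparison with a NULL operand returns FALSE rather than NULL. A minimal Java sketch of those semantics, assuming generic comparable operands (the actual Flink CDC runtime signatures may differ):

```java
// Illustrative null-safe comparison helpers mirroring the table above.
// NULL on either side yields FALSE, matching SQL-style filter semantics.
public final class ComparisonFunctions {
    public static <T extends Comparable<T>> boolean greaterThan(T a, T b) {
        return a != null && b != null && a.compareTo(b) > 0;
    }

    public static <T extends Comparable<T>> boolean greaterThanOrEqual(T a, T b) {
        return a != null && b != null && a.compareTo(b) >= 0;
    }

    public static <T extends Comparable<T>> boolean lessThan(T a, T b) {
        return a != null && b != null && a.compareTo(b) < 0;
    }

    public static <T extends Comparable<T>> boolean lessThanOrEqual(T a, T b) {
        return a != null && b != null && a.compareTo(b) <= 0;
    }
}
```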
Lines changed: 108 additions & 0 deletions
@@ -0,0 +1,108 @@
+---
+title: "CdcUp Quickstart Guide"
+weight: 3
+type: docs
+aliases:
+- /get-started/quickstart/cdc-up-quickstart-guide
+---
+<!--
+Licensed to the Apache Software Foundation (ASF) under one
+or more contributor license agreements. See the NOTICE file
+distributed with this work for additional information
+regarding copyright ownership. The ASF licenses this file
+to you under the Apache License, Version 2.0 (the
+"License"); you may not use this file except in compliance
+with the License. You may obtain a copy of the License at
+
+http://www.apache.org/licenses/LICENSE-2.0
+
+Unless required by applicable law or agreed to in writing,
+software distributed under the License is distributed on an
+"AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+KIND, either express or implied. See the License for the
+specific language governing permissions and limitations
+under the License.
+-->
+
+# Setting up a CDC Pipeline Environment with the CdcUp CLI
+
+Flink CDC now provides a CdcUp CLI utility to start a playground environment and run Flink CDC jobs quickly.
+
+You will need a working Docker and Docker Compose V2 environment to use it.
+
+## Step-by-step Guide
+
+1. Open a terminal and run `git clone https://github.com/apache/flink-cdc.git --depth=1` to retrieve a copy of the Flink CDC source code.
+
+2. Run `cd flink-cdc/tools/cdcup/` to enter the CdcUp directory, then run `./cdcup.sh` to print the usage of the "cdc-up" playground tool. You should see the following output:
+
+   ```
+   Usage: ./cdcup.sh { init | up | pipeline <yaml> | flink | mysql | stop | down | help }
+
+   Commands:
+   * init:
+       Initialize a playground environment, and generate configuration files.
+
+   * up:
+       Start docker containers. This may take a while before database is ready.
+
+   * pipeline <yaml>:
+       Submit a YAML pipeline job.
+
+   * flink:
+       Print Flink Web dashboard URL.
+
+   * mysql:
+       Open MySQL console.
+
+   * stop:
+       Stop all running playground containers.
+
+   * down:
+       Stop and remove containers, networks, and volumes.
+
+   * help:
+       Print this message.
+   ```
+
+3. Run `./cdcup.sh init` and wait for the base Docker image to be pulled; you should see the following output:
+
+   ```
+   🎉 Welcome to cdc-up quickstart wizard!
+      There are a few questions to ask before getting started:
+   🐿️ Which Flink version would you like to use? (Press ↑/↓ arrow to move and Enter to select)
+      1.17.2
+      1.18.1
+      1.19.1
+    ‣ 1.20.0
+   ```
+
+   Use the arrow keys to navigate and press Enter to select the desired Flink and CDC versions and the source and sink connectors.
+
+4. Run `./cdcup.sh up` to boot up the Docker containers, and wait for them to be ready. Some sink connectors (like Doris and StarRocks) need some time to initialize before they are ready to handle requests.
+
+5. If you chose MySQL as the data source, run `./cdcup.sh mysql` to open a MySQL session and create at least one table.
+
+   For example, the following SQL commands create a database and a table, insert some test data, and verify the result:
+
+   ```sql
+   -- initialize db and table
+   CREATE DATABASE cdc_playground;
+   USE cdc_playground;
+   CREATE TABLE test_table (id INT PRIMARY KEY, name VARCHAR(32));
+
+   -- insert test data
+   INSERT INTO test_table VALUES (1, 'alice'), (2, 'bob'), (3, 'cicada'), (4, 'derrida');
+
+   -- verify if it has been successfully inserted
+   SELECT * FROM test_table;
+   ```
+
+6. Run `./cdcup.sh pipeline pipeline-definition.yaml` to submit the pipeline job. You may also edit the pipeline definition file for further configuration.
+
+7. Run `./cdcup.sh flink` and navigate to the printed URL to access the Flink Web UI.
+
+   ```
+   $ ./cdcup.sh flink
+   🚩 Visit Flink Dashboard at: http://localhost:33448
+   ```
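Putting the steps together, an end-to-end CdcUp session looks roughly like this (every command comes from the guide above; the comments are annotations, and the dashboard port is whatever `./cdcup.sh flink` prints on your machine):

```sh
git clone https://github.com/apache/flink-cdc.git --depth=1
cd flink-cdc/tools/cdcup/

./cdcup.sh init                               # wizard: pick Flink/CDC versions, source, and sink
./cdcup.sh up                                 # start containers; Doris/StarRocks sinks take a while
./cdcup.sh mysql                              # create and populate source tables in the MySQL session
./cdcup.sh pipeline pipeline-definition.yaml  # submit the pipeline job
./cdcup.sh flink                              # prints the Flink Web UI URL
```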

docs/content/docs/connectors/pipeline-connectors/maxcompute.md

Lines changed: 12 additions & 12 deletions
@@ -94,15 +94,15 @@ pipeline:
 <td>The name of the sink.</td>
 </tr>
 <tr>
-<td>accessId</td>
+<td>access-id</td>
 <td>required</td>
 <td style="word-wrap: break-word;">(none)</td>
 <td>String</td>
 <td>AccessKey ID of the Alibaba Cloud account or RAM user. You can visit the <a href="https://ram.console.aliyun.com/manage/ak">
 AccessKey management page</a> to obtain the AccessKey ID.</td>
 </tr>
 <tr>
-<td>accessKey</td>
+<td>access-key</td>
 <td>required</td>
 <td style="word-wrap: break-word;">(none)</td>
 <td>String</td>
@@ -124,63 +124,63 @@ pipeline:
 <td>The name of the MaxCompute project. You can log in to the <a href="https://maxcompute.console.aliyun.com/">MaxCompute console</a> and obtain the MaxCompute project name on the Workspace > Project Management page.</td>
 </tr>
 <tr>
-<td>tunnelEndpoint</td>
+<td>tunnel.endpoint</td>
 <td>optional</td>
 <td style="word-wrap: break-word;">(none)</td>
 <td>String</td>
 <td>The connection address for the MaxCompute Tunnel service. Typically, this configuration can be auto-routed based on the region where the specified project is located. It is used only in special network environments such as when using a proxy.</td>
 </tr>
 <tr>
-<td>quotaName</td>
+<td>quota.name</td>
 <td>optional</td>
 <td style="word-wrap: break-word;">(none)</td>
 <td>String</td>
 <td>The name of the exclusive resource group for MaxCompute data transfer. If not specified, the shared resource group is used. For details, refer to <a href="https://help.aliyun.com/zh/maxcompute/user-guide/purchase-and-use-exclusive-resource-groups-for-dts">Using exclusive resource groups for MaxCompute</a></td>
 </tr>
 <tr>
-<td>stsToken</td>
+<td>sts-token</td>
 <td>optional</td>
 <td style="word-wrap: break-word;">(none)</td>
 <td>String</td>
 <td>When using a temporary access token (STS Token) issued by a RAM role for authentication, this parameter must be specified.</td>
 </tr>
 <tr>
-<td>bucketsNum</td>
+<td>buckets-num</td>
 <td>optional</td>
 <td style="word-wrap: break-word;">16</td>
 <td>Integer</td>
 <td>The number of buckets used when auto-creating MaxCompute Delta tables. For usage, refer to <a href="https://help.aliyun.com/zh/maxcompute/user-guide/transaction-table2-0-overview">Delta Table Overview</a></td>
 </tr>
 <tr>
-<td>compressAlgorithm</td>
+<td>compress.algorithm</td>
 <td>optional</td>
 <td style="word-wrap: break-word;">zlib</td>
 <td>String</td>
-<td>The data compression algorithm used when writing to MaxCompute. Currently supports <code>raw</code> (no compression), <code>zlib</code>, and <code>snappy</code>.</td>
+<td>The data compression algorithm used when writing to MaxCompute. Currently supports <code>raw</code> (no compression), <code>zlib</code>, <code>lz4</code>, and <code>snappy</code>.</td>
 </tr>
 <tr>
-<td>totalBatchSize</td>
+<td>total.buffer-size</td>
 <td>optional</td>
 <td style="word-wrap: break-word;">64MB</td>
 <td>String</td>
 <td>The size of the data buffer in memory, at partition level (table level for non-partitioned tables). Buffers for different partitions (tables) are independent, and data is written to MaxCompute when the threshold is reached.</td>
 </tr>
 <tr>
-<td>bucketBatchSize</td>
+<td>bucket.buffer-size</td>
 <td>optional</td>
 <td style="word-wrap: break-word;">4MB</td>
 <td>String</td>
 <td>The size of the data buffer in memory, at bucket level. This is effective only when writing to Delta tables. Buffers for different data buckets are independent, and the bucket data is written to MaxCompute when the threshold is reached.</td>
 </tr>
 <tr>
-<td>numCommitThreads</td>
+<td>commit.thread-num</td>
 <td>optional</td>
 <td style="word-wrap: break-word;">16</td>
 <td>Integer</td>
 <td>The number of partitions (tables) that can be processed simultaneously during the checkpoint stage.</td>
 </tr>
 <tr>
-<td>numFlushConcurrent</td>
+<td>flush.concurrent-num</td>
 <td>optional</td>
 <td style="word-wrap: break-word;">4</td>
 <td>Integer</td>
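These renames change only the option keys, not their semantics. A hypothetical partial sink block using the new keys (the `${...}` credential placeholders are illustrative, and options not shown in the diff above are omitted):

```yaml
sink:
  type: maxcompute
  name: MaxCompute Sink
  access-id: ${ACCESS_ID}       # formerly accessId
  access-key: ${ACCESS_KEY}     # formerly accessKey
  buckets-num: 16               # formerly bucketsNum
  compress.algorithm: snappy    # formerly compressAlgorithm; raw, zlib, lz4, or snappy
  total.buffer-size: 64MB       # formerly totalBatchSize
  bucket.buffer-size: 4MB       # formerly bucketBatchSize
  commit.thread-num: 16         # formerly numCommitThreads
  flush.concurrent-num: 4       # formerly numFlushConcurrent
```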
