Commit 5e85cf9

fix test and readme
1 parent ac8c70d commit 5e85cf9

File tree

13 files changed

+193
-10124
lines changed

docs/en/cdc/resume.md

Lines changed: 0 additions & 1 deletion
Original file line number | Diff line number | Diff line change
@@ -2,7 +2,6 @@
22

33
When starting a task, you can retrieve the latest task position from the previous execution based on the configuration, allowing you to continue the task without starting from scratch.
44

5-
65
## Supported Sources
76

87
- MySQL source

docs/en/snapshot/check.md

Lines changed: 42 additions & 30 deletions
@@ -1,28 +1,23 @@
11
# Data Check
22

3-
After data migration, you may want to compare the source and target data row by row and column by column. If the data volume is too large, you can perform sampling check. Please ensure that the tables to be checked have primary keys/unique keys.
3+
After data migration, you may want to compare the source and target data row by row and column by column. If the data volume is too large, you can perform a sampled check. Please ensure that the tables to be checked have primary keys/unique keys.
44

5-
Support comparison for MySQL/PG/Mongo.
5+
Supports comparison for MySQL, PostgreSQL, and MongoDB.
66

7-
Data check can be used with both snapshot and CDC tasks. For CDC tasks, keep `[checker]` enabled and set `extract_type=cdc`; the checker validates applied changes after they are sunk.
7+
Data check can be used with both snapshot and CDC tasks. For CDC tasks, keep `[checker]` enabled and set `extract_type=cdc`; the checker validates applied changes after they are written to the target.
88

9-
# Example: MySQL -> MySQL
9+
## Example: MySQL -> MySQL
1010

1111
Refer to [task templates](../../templates/mysql_to_mysql.md) and [tutorial](../tutorial/mysql_to_mysql.md)
1212

13-
## Sampling Check
13+
### Sampling Check
1414

15-
In the full check configuration, add `sample_interval` configuration. That is, sample 1 record for every 3 records.
15+
In the full check configuration, add `sample_interval` to the `[extractor]` section. For example, setting `sample_interval=3` checks every 3rd record.
1616
```
1717
[extractor]
1818
sample_interval=3
1919
```
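The sampling rule can be illustrated with a minimal Python sketch. It assumes `sample_interval=3` selects one record out of every three in extraction order; which record of each group is picked is not stated in the doc, so choosing the last one here is an assumption, and `sample_every` is a hypothetical helper, not part of ape-dts.

```python
from typing import Iterable, Iterator, TypeVar

T = TypeVar("T")


def sample_every(records: Iterable[T], sample_interval: int) -> Iterator[T]:
    """Yield one record out of every `sample_interval` records."""
    if sample_interval < 1:
        raise ValueError("sample_interval must be >= 1")
    for index, record in enumerate(records, start=1):
        # Assumption: the last record of each group is the one that is checked.
        if index % sample_interval == 0:
            yield record


# Ten records with sample_interval=3: records 3, 6 and 9 are checked.
sampled = list(sample_every(range(1, 11), 3))
print(sampled)
```

With an interval of 1 every record is checked, which corresponds to a full check.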
2020

21-
## Configuration
22-
23-
See [config.md](../config.md) for `[checker]` options and target selection rules. Use the task
24-
templates and tutorials for end-to-end examples.
25-
2621
## Limitations
2722

2823
- Data check is source-driven (validates Source ∈ Target) and cannot detect extra rows that exist only in the target. To catch such cases, consider setting up a [Reverse Check](#reverse-check) by swapping extractor and checker configurations.
@@ -34,7 +29,7 @@ In CDC + Check scenarios, the checker validates DELETE events: it queries the ta
3429

3530
# Check Results
3631

37-
The check results are written to the log in json format, including diff.log, miss.log, sql.log and summary.log. The logs are stored in the log/check subdirectory.
32+
The check results are written to the log in JSON format, including diff.log, miss.log, sql.log, and summary.log. The logs are stored in the `log/check` subdirectory.
3833

3934
## Difference Log (diff.log)
4035

@@ -46,13 +41,13 @@ Difference logs include database (schema), table (tb), primary key/unique key (i
4641
{"schema":"test_db_1","tb":"one_pk_no_uk","id_col_values":{"f_0":"6"},"diff_col_values":{"f_1":{"src":null,"dst":"1","src_type":"None","dst_type":"Short"}}}
4742
```
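As a rough illustration of how a diff.log entry could be consumed, the sketch below parses the sample line above and lists the columns whose `src_type` and `dst_type` differ. The field names come from the example; `type_mismatches` is a hypothetical helper, and the sketch assumes one JSON object per log line.

```python
import json
from typing import List, Tuple

# Sample entry copied from the diff.log example.
line = (
    '{"schema":"test_db_1","tb":"one_pk_no_uk","id_col_values":{"f_0":"6"},'
    '"diff_col_values":{"f_1":{"src":null,"dst":"1",'
    '"src_type":"None","dst_type":"Short"}}}'
)


def type_mismatches(diff_line: str) -> List[Tuple[str, str, str]]:
    """Return (column, src_type, dst_type) for columns whose types differ."""
    entry = json.loads(diff_line)
    result = []
    for col, detail in entry.get("diff_col_values", {}).items():
        src_type = detail.get("src_type")
        dst_type = detail.get("dst_type")
        # The type fields are only present when the types are inconsistent.
        if src_type is not None and dst_type is not None and src_type != dst_type:
            result.append((col, src_type, dst_type))
    return result


mismatches = type_mismatches(line)
print(mismatches)
```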
4843

49-
When the source and target types are different (such as Int32 vs Int64, or None vs Short), `src_type`/`dst_type` will appear under the corresponding column, clearly marking the type inconsistency. Mongo also applies this rule, and the difference log will output the BSON type name.
44+
When the source and target types are different (such as Int32 vs Int64, or None vs Short), `src_type`/`dst_type` will appear under the corresponding column, clearly marking the type inconsistency. MongoDB also applies this rule, and the difference log will output the BSON type name.
5045

51-
Only when the route renames the schema or table, the log will supplement `target_schema`/`target_tb` to identify the real destination database table; `schema`, `tb` still represent the source, facilitating troubleshooting.
46+
Only when the router renames the schema or table will the log include `target_schema`/`target_tb` to identify the real destination table. `schema` and `tb` still represent the source, facilitating troubleshooting.
5247

5348
## Missing Log (miss.log)
5449

55-
Missing logs include database (schema), table (tb) and primary/unique key (id_col_values). Since missing records do not have difference columns, `diff_col_values` will not be output.
50+
Missing logs include database (schema), table (tb), and primary/unique key (id_col_values). Since missing records do not have difference columns, `diff_col_values` will not be output.
5651

5752
```json
5853
{"schema":"test_db_1","tb":"no_pk_one_uk","id_col_values":{"f_1":"8","f_2":"1"}}
@@ -62,14 +57,14 @@ Missing logs include database (schema), table (tb) and primary/unique key (id_co
6257

6358
## Output Full Row
6459

65-
When the business needs full row content for troubleshooting exceptions, you can enable full row logging in `[checker]`:
60+
When you need full row content for troubleshooting, you can enable full row logging in `[checker]`:
6661

6762
```
6863
[checker]
6964
output_full_row=true
7065
```
7166

72-
After enabling, all diff.log will append `src_row` and `dst_row`, and miss.log will append `src_row` (currently only supports MySQL/PG/Mongo, Redis is not supported yet). Example:
67+
After enabling, all diff.log entries will append `src_row` and `dst_row`, and miss.log entries will append `src_row` (currently only supports MySQL/PG/Mongo; Redis is not supported yet). Example:
7368

7469
```json
7570
{
@@ -103,7 +98,7 @@ After enabling, all diff.log will append `src_row` and `dst_row`, and miss.log w
10398

10499
## Output Revise SQL
105100

106-
If the business needs to manually repair different data, you can enable SQL output in `[checker]`:
101+
If you need to manually repair inconsistent data, you can enable SQL output in `[checker]`:
107102

108103
```
109104
[checker]
@@ -112,9 +107,13 @@ output_revise_sql=true
112107
revise_match_full_row=true
113108
```
114109

115-
After enabling, `INSERT` statements for missing records and `UPDATE` statements for differing records will be written to `sql.log`. When `revise_match_full_row=true`, even if the table has a primary key, it will use the entire row data to generate the WHERE condition, so as to locate the target data through the full row value. If the route is not renamed, `target_schema`/`target_tb` will not be output, and these two fields are only needed to determine the table where the SQL should be executed when renaming.
110+
After enabling, `INSERT` statements for missing records and `UPDATE` statements for differing records will be written to `sql.log`.
111+
112+
When `revise_match_full_row=true`, the entire row data is used to generate the WHERE condition even if the table has a primary key, so that the target row is located by matching all column values.
113+
114+
If the router does not rename the schema or table, `target_schema`/`target_tb` will not appear in the log. These two fields are only needed to determine the destination table when routing renames are configured.
116115

117-
The generated SQL is essentially the SQL that the sinker needs to execute to correct the target data to be consistent with the source; it directly uses the real destination schema/table, so it can be executed directly at the target (refer to `target_schema`/`target_tb` to determine the final target object when routing renames).
116+
The generated SQL uses the real destination schema/table and can be executed directly at the target. When routing renames are configured, refer to `target_schema`/`target_tb` to determine the final target object.
118117

119118
Example:
120119

@@ -151,19 +150,29 @@ Missing record log example:
151150
INSERT INTO `test_db_1`.`test_table`(`id`,`name`,`age`,`email`) VALUES(3,'Charlie',35,'charlie@example.com');
152151
```
153152

154-
### Summary Log (summary.log)
153+
## Summary Log (summary.log)
154+
155155
The summary log contains the overall results of the check, such as start_time, end_time, is_consistent, and the miss and diff counts.
156156

157157
```json
158158
{"start_time": "2023-09-01T12:00:00+08:00", "end_time": "2023-09-01T12:00:01+08:00", "is_consistent": false, "miss_count": 1, "diff_count": 2, "sql_count": 3}
159159
```
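A script that gates a migration on the check result might read the summary like this. This is only a sketch: the field names are taken from the example above, and it assumes the summary is a single JSON object with ISO 8601 timestamps.

```python
import json
from datetime import datetime

# Sample entry copied from the summary.log example.
summary_line = (
    '{"start_time": "2023-09-01T12:00:00+08:00", '
    '"end_time": "2023-09-01T12:00:01+08:00", "is_consistent": false, '
    '"miss_count": 1, "diff_count": 2, "sql_count": 3}'
)

summary = json.loads(summary_line)

# Duration of the check, derived from the two timestamps.
elapsed = (
    datetime.fromisoformat(summary["end_time"])
    - datetime.fromisoformat(summary["start_time"])
)

# Total number of rows that need attention.
problem_rows = summary["miss_count"] + summary["diff_count"]

if not summary["is_consistent"]:
    print(f"check failed: {problem_rows} rows differ, took {elapsed.total_seconds()}s")
```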
160160

161-
162161
# Reverse Check
163162

164-
Swap the [extractor] and [checker] target configurations to perform reverse check.
163+
Data check is source-driven and only verifies that source rows exist in the target. To detect extra rows in the target that do not exist in the source, set up a reverse check by swapping the `[extractor]` and `[checker]` target configurations:
164+
165+
```
166+
# Original: source=A, target=B
167+
# Reverse: source=B, target=A
168+
[extractor]
169+
url=<original checker url>
165170
166-
# Checker Configuration Parameters
171+
[checker]
172+
url=<original extractor url>
173+
```
174+
175+
# Configuration
167176

168177
See [config.md](../config.md) for the full `[checker]` configuration list and target selection rules.
169178

@@ -174,10 +183,13 @@ When `max_retries > 0`, the checker automatically retries on inconsistency:
174183
- Detailed miss/diff logs are only written on the final check
175184
- Useful when target data synchronization is not yet complete
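The retry behavior described above can be sketched as a simple loop. This is an illustration of the idea, not the checker's actual implementation: `check_with_retries`, `run_check`, and `retry_interval_secs` are hypothetical names, and only `max_retries` comes from the doc.

```python
import time
from typing import Callable, List


def check_with_retries(
    run_check: Callable[[], List[str]],
    max_retries: int,
    retry_interval_secs: float = 0.0,
) -> List[str]:
    """Re-run the check while it reports issues; return only the final result."""
    issues = run_check()
    attempts_left = max_retries
    while issues and attempts_left > 0:
        # Intermediate results are discarded, so nothing is logged for them.
        time.sleep(retry_interval_secs)
        attempts_left -= 1
        issues = run_check()
    return issues


# Simulated target that catches up after two inconsistent checks.
results = iter([["row 1 missing"], ["row 1 missing"], []])
final = check_with_retries(lambda: next(results), max_retries=3)
print(final)
```

Only the issues from the last attempt are returned, which mirrors the rule that detailed miss/diff logs are written on the final check only.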
176185

177-
# Other Configurations
186+
## Router
187+
188+
Supports the `[router]` configuration section. Refer to [config details](../config.md) for details.
189+
190+
## Integration Test References
178191

179-
- Support [router], please refer to [config details](../config.md) for details.
180-
- Refer to task_config.ini of each type of integration test:
181-
- dt-tests/tests/mysql_to_mysql/check
182-
- dt-tests/tests/pg_to_pg/check
183-
- dt-tests/tests/mongo_to_mongo/check
192+
Refer to `task_config.ini` of each type of integration test:
193+
- dt-tests/tests/mysql_to_mysql/check
194+
- dt-tests/tests/pg_to_pg/check
195+
- dt-tests/tests/mongo_to_mongo/check

docs/en/structure/check.md

Lines changed: 12 additions & 12 deletions
@@ -1,40 +1,40 @@
1-
# Check structures
1+
# Structure Check
22

3-
After structure migration, you can choose from two methods for verification. One is provided by us, and the other is an open source tool called [Liquibase](./check_liquibase.md). This document primarily focuses on the former one.
3+
After structure migration, you can choose from two verification methods. One is the built-in checker provided by ape-dts, and the other is an open-source tool called [Liquibase](./check_by_liquibase.md). This document focuses on the built-in checker.
44

5-
Structure check is independent of CDC. CDC + checker applies to row-level data check (see data check docs).
5+
Structure check is independent of CDC. "CDC + checker" refers to row-level data check (see [data check docs](../snapshot/check.md)).
6+
7+
## Example: MySQL -> MySQL
68

7-
# Example: MySQL -> MySQL
89
Refer to [task templates](../../templates/mysql_to_mysql.md)
910

1011
# Results
1112

12-
Based on the source structures, the check results include **miss**, **diff**, and **summary**, all presented in JSON.
13-
`miss.log`, `diff.log` use the same JSON structure (`StructCheckLog`):
13+
Based on the source structures, the check results include **miss**, **diff**, and **summary**, all presented in JSON format.
14+
15+
`miss.log` and `diff.log` use the same JSON structure (`StructCheckLog`):
1416

1517
```json
1618
{
1719
"key": "type.schema.table", // e.g., table.db_name.tb_name or index.db.tb.idx
1820
"src_sql": "CREATE TABLE `table_name` (id INT PRIMARY KEY)", // appears in miss/diff
19-
"dst_sql": "CREATE TABLE `table_name` (id INT PRIMARY KEY)" // appears in diff
21+
"dst_sql": "CREATE TABLE `table_name` (id INT PRIMARY KEY)" // appears in diff only
2022
}
2123
```
2224

23-
- `miss.log` (Present in source but missing in destination)
25+
- `miss.log` (present in source but missing in target)
2426
```json
2527
{"key":"table.struct_check_test_1.not_match_miss","src_sql":"CREATE TABLE IF NOT EXISTS `not_match_miss` (`id` int NOT NULL PRIMARY KEY)"}
2628
{"key":"index.struct_check_test_1.not_match_index.i6_miss","src_sql":"CREATE INDEX `i6_miss` ON `not_match_index` (`col6`)"}
2729
```
2830

29-
- `diff.log` (Present in both but different; contains both src_sql and dst_sql)
31+
- `diff.log` (present in both but different; contains both src_sql and dst_sql)
3032
```json
3133
{"key":"index.struct_check_test_1.not_match_index","src_sql":"ALTER TABLE `not_match_index` ADD INDEX `idx_v1` (`col1`)","dst_sql":"ALTER TABLE `not_match_index` ADD INDEX `idx_v2` (`col1`)"}
3234
{"key":"table.struct_check_test_1.not_match_column","src_sql":"CREATE TABLE `not_match_column` (`id` int)","dst_sql":"CREATE TABLE `not_match_column` (`id` bigint)"}
3335
```
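Because `miss.log` and `diff.log` share one structure, a single parser can handle both. The sketch below is a hypothetical helper (`describe` is not part of ape-dts) that splits the `key` field and infers the entry kind from whether `dst_sql` is present, as stated in the field comments above.

```python
import json
from typing import Dict

miss_line = (
    '{"key":"index.struct_check_test_1.not_match_index.i6_miss",'
    '"src_sql":"CREATE INDEX `i6_miss` ON `not_match_index` (`col6`)"}'
)
diff_line = (
    '{"key":"table.struct_check_test_1.not_match_column",'
    '"src_sql":"CREATE TABLE `not_match_column` (`id` int)",'
    '"dst_sql":"CREATE TABLE `not_match_column` (`id` bigint)"}'
)


def describe(entry_line: str) -> Dict[str, str]:
    """Split the key into its parts and classify the entry as miss or diff."""
    entry = json.loads(entry_line)
    # The key has the form type.schema.name; index keys carry extra segments.
    kind, schema, name = entry["key"].split(".", 2)
    status = "diff" if "dst_sql" in entry else "miss"
    return {"kind": kind, "schema": schema, "name": name, "status": status}


miss_info = describe(miss_line)
diff_info = describe(diff_line)
print(miss_info)
print(diff_info)
```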
3436

35-
36-
37-
- `summary.log` (Overview of the check results)
37+
- `summary.log` (overview of the check results)
3838
```json
3939
{"start_time": "2023-10-01T10:00:00+08:00", "end_time": "2023-10-01T10:00:05+08:00", "is_consistent": false, "miss_count": 8, "diff_count": 5, "sql_count": 14}
4040
```

docs/zh/snapshot/check.md

Lines changed: 42 additions & 29 deletions
@@ -2,26 +2,22 @@
22

33
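After data migration completes, the source and target data need to be compared row by row and column by column. If the data volume is too large, you can run a sampled check. Make sure the tables to be checked have a primary key or unique key.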
44

5-
Supports comparison for MySQL/PG/Mongo.
5+
Supports comparison for MySQL, PostgreSQL, and MongoDB.
66

7-
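Data check can be used with both Snapshot and CDC tasks. For a CDC task, keep `[checker]` enabled and set `extract_type=cdc`; the checker validates the data after it lands in the database
7+
Data check can be used with both Snapshot and CDC tasks. For a CDC task, keep `[checker]` enabled and set `extract_type=cdc`; the checker validates the data after it is written to the target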
88

9-
# Example: MySQL -> MySQL
9+
## Example: MySQL -> MySQL
1010

1111
Refer to the [task templates](../../templates/mysql_to_mysql.md) and the [tutorial](../../en/tutorial/mysql_to_mysql.md)
1212

13-
## Sampling Check
13+
### Sampling Check
1414

15-
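In the full check configuration, add the `sample_interval` setting. That is, sample once for every 3 records.
15+
In the full check configuration, add the `sample_interval` setting under `[extractor]`. For example, `sample_interval=3` samples once for every 3 records.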
1616
```
1717
[extractor]
1818
sample_interval=3
1919
```
2020

21-
## Configuration
22-
23-
For `[checker]` options and target selection rules, see [config.md](../config.md). For end-to-end examples, see the templates and tutorials.
24-
2521
## Limitations
2622

2723
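- Data check is source-driven (it only validates Source ∈ Target) and cannot detect extra data in the target (ghost data). To detect extra target data, swap the extractor and checker configurations as described in [Reverse Check](#反向校验).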
@@ -33,7 +29,7 @@ sample_interval=3
3329

3430
# Check Results
3531

36-
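The check results are written to the log in json format, including diff.log, miss.log, sql.log and summary.log. The logs are stored in the log/check subdirectory.
32+
The check results are written to the log in JSON format, including diff.log, miss.log, sql.log, and summary.log. The logs are stored in the `log/check` subdirectory.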
3733

3834
## Difference Log (diff.log)
3935

@@ -45,9 +41,9 @@ sample_interval=3
4541
{"schema":"test_db_1","tb":"one_pk_no_uk","id_col_values":{"f_0":"6"},"diff_col_values":{"f_1":{"src":null,"dst":"1","src_type":"None","dst_type":"Short"}}}
4642
```
4743

48-
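When the source and target types differ (such as Int32 vs Int64, or None vs Short), `src_type`/`dst_type` appear under the corresponding column, explicitly marking the type inconsistency. This rule also applies to Mongo, and the difference log outputs the BSON type name.
44+
When the source and target types differ (such as Int32 vs Int64, or None vs Short), `src_type`/`dst_type` appear under the corresponding column, explicitly marking the type inconsistency. This rule also applies to MongoDB, and the difference log outputs the BSON type name.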
4945

50-
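Only when the router renames the schema or table does the log add `target_schema`/`target_tb` to identify the real destination table; `schema` and `tb` still represent the source, which makes troubleshooting easier.
46+
Only when the router renames the schema or table does the log add `target_schema`/`target_tb` to identify the real destination table; `schema` and `tb` still represent the source, which makes troubleshooting easier.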
5147

5248
## Missing Log (miss.log)
5349

@@ -61,14 +57,14 @@ sample_interval=3
6157

6258
## Output Full Row
6359

64-
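When the business needs full row content to troubleshoot exceptions, you can enable full row logging in `[checker]`:
60+
When full row content is needed for troubleshooting, you can enable full row logging in `[checker]`: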
6561

6662
```
6763
[checker]
6864
output_full_row=true
6965
```
7066

71-
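After enabling, all diff.log will append `src_row` and `dst_row`, and miss.log will append `src_row` (currently only MySQL/PG/Mongo are supported; Redis is still not supported). Example:
67+
After enabling, all diff.log entries will append `src_row` and `dst_row`, and miss.log entries will append `src_row` (currently only MySQL/PG/Mongo are supported; Redis is not supported yet). Example: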
7268

7369
```json
7470
{
@@ -102,7 +98,7 @@ output_full_row=true
10298

10399
## Output Revise SQL
104100

105-
If the business needs to manually repair differing data, you can enable SQL output in `[checker]`:
101+
To manually repair differing data, you can enable SQL output in `[checker]`:
106102

107103
```
108104
[checker]
@@ -111,9 +107,13 @@ output_revise_sql=true
111107
revise_match_full_row=true
112108
```
113109

114-
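After enabling, `INSERT` statements for missing records and `UPDATE` statements for differing records are written to `sql.log`. When `revise_match_full_row=true`, the entire row is used to generate the WHERE condition even if the table has a primary key, so that the target data is located by the full row value. If the router does not rename anything, `target_schema`/`target_tb` are not output; these two fields are only needed when renaming, to decide which table the SQL should be executed on.
110+
After enabling, `INSERT` statements for missing records and `UPDATE` statements for differing records are written to `sql.log`.
111+
112+
When `revise_match_full_row=true`, the entire row is used to generate the WHERE condition even if the table has a primary key, so that the target data is located by the full row value.
113+
114+
If the router does not rename the schema or table, `target_schema`/`target_tb` are not output. These two fields are only used when the router renames, to determine the target table where the SQL should be executed.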
115115

116-
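The generated SQL is essentially the SQL that the sinker would need to execute to make the target data consistent with the source; it directly uses the real destination schema/table, so it can be executed directly on the target (when the router renames, you can still refer to `target_schema`/`target_tb` to determine the final target object)
116+
The generated SQL directly uses the real destination schema/table and can be executed directly on the target. When the router renames, refer to `target_schema`/`target_tb` to determine the final target object.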
117117

118118
Example:
119119

@@ -150,33 +150,46 @@ UPDATE `target_db`.`target_tb` SET `f_1`='2' WHERE `f_0` = 4;
150150
INSERT INTO `test_db_1`.`test_table`(`id`,`name`,`age`,`email`) VALUES(3,'Charlie',35,'charlie@example.com');
151151
```
152152

153-
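### Summary Log (summary.log)
154-
The summary log contains the overall results of the check, such as start_time, end_time, is_consistent, and the miss and diff counts.
153+
## Summary Log (summary.log)
154+
155+
The summary log contains the overall results of the check, such as start_time, end_time, is_consistent, and the miss and diff counts.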
155156

156157
```json
157158
{"start_time": "2023-09-01T12:00:00+08:00", "end_time": "2023-09-01T12:00:01+08:00", "is_consistent": false, "miss_count": 1, "diff_count": 2, "sql_count": 3}
158159
```
159160

160-
161161
# Reverse Check
162162

163-
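Swap the [extractor] and [checker] target configurations to perform a reverse check.
163+
Data check is source-driven and only verifies that source data exists in the target. To detect extra data in the target (absent from the source), perform a reverse check by swapping the `[extractor]` and `[checker]` target configurations: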
164+
165+
```
166+
# Original: source=A, target=B
167+
# Reverse: source=B, target=A
168+
[extractor]
169+
url=<original checker url>
164170
165-
# Checker Configuration Parameters
171+
[checker]
172+
url=<original extractor url>
173+
```
174+
175+
# Configuration
166176

167177
For the full `[checker]` configuration and target selection rules, see [config.md](../config.md)
168178

169-
## Retry Mechanism Notes
179+
## Retry Mechanism
170180

171181
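When `max_retries > 0`, the checker automatically retries when it detects an inconsistency:
172182
- No logs are recorded during retries, to avoid noise
173183
- Detailed miss/diff logs are recorded only on the final check
174184
- Suitable when target data has not yet been fully synchronized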
175185

176-
# Other Configurations
186+
## Router
187+
188+
Supports the `[router]` configuration. See [config details](../config.md).
189+
190+
## Integration Test References
177191

178-
- Supports [router]; see [config details](../config.md).
179-
- Refer to the task_config.ini of each type of integration test:
180-
- dt-tests/tests/mysql_to_mysql/check
181-
- dt-tests/tests/pg_to_pg/check
182-
- dt-tests/tests/mongo_to_mongo/check
192+
Refer to the `task_config.ini` of each type of integration test:
193+
- dt-tests/tests/mysql_to_mysql/check
194+
- dt-tests/tests/pg_to_pg/check
195+
- dt-tests/tests/mongo_to_mongo/check
