Archive data from common databases into Databend with parallel sync (by key or time range).
| Data source | Supported |
|---|---|
| MySQL | Yes |
| PostgreSQL | Yes |
| TiDB | Yes |
| SQL Server | Yes |
| Oracle | Coming soon |
| CSV | Coming soon |
| NDJSON | Coming soon |

Download the binary from the release page.

Create `config/conf.json`.

Parameters (defaults are from code):

| Key | Required | Default | Notes |
|---|---|---|---|
| `databaseType` | No | `mysql` | `mysql`, `tidb`, `pg`, `mssql`, `oracle` |
| `sourceHost` | Yes | - | Source host |
| `sourcePort` | Yes | - | Source port |
| `sourceUser` | Yes | - | Source user |
| `sourcePass` | Yes | - | Source password |
| `sourceDB` | If no `sourceDbTables` | - | Source database |
| `sourceTable` | If no `sourceDbTables` | - | Source table |
| `sourceDbTables` | No | `[]` | Multi-table: `["dbRegex@tableRegex"]` |
| `sourceQuery` | No | - | Currently ignored |
| `sourceWhereCondition` | Yes | - | WHERE clause without `WHERE` |
| `sourceSplitKey` | If key split | - | Integer primary key |
| `sourceSplitTimeKey` | If time split | - | Time column |
| `timeSplitUnit` | If time split | `hour` | `minute`, `quarter`, `hour`, `day` |
| `sslMode` | No | `disable` | Postgres only |
| `databendDSN` | Yes | `localhost:8000` | Databend DSN |
| `databendTable` | Yes | - | Target table |
| `batchSize` | Yes | `1000` | Rows per batch |
| `batchMaxInterval` | No | `3` | Seconds between batches |
| `copyPurge` | No | `true` | Databend COPY option |
| `copyForce` | No | `false` | Databend COPY option |
| `disableVariantCheck` | No | `true` | Databend COPY option |
| `userStage` | No | `~` | Databend stage |
| `deleteAfterSync` | No | `false` | Deletes source rows |
| `maxThread` | No | `1` | Max concurrency |
| `oracleSID` | No | - | Oracle SID |
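
As a rough illustration of how the defaults in the table interact with a config file, the sketch below pre-fills a struct with the documented defaults and lets the JSON override them. The `Config` type, its field names, and `parseConfig` are hypothetical and cover only a subset of keys; only the JSON key names and default values come from the table, and this is not the tool's actual loader.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// Config mirrors a subset of the keys in the parameter table.
// The type and field names are hypothetical; only the JSON keys
// and the default values are taken from the table.
type Config struct {
	DatabaseType     string `json:"databaseType"`
	SourceHost       string `json:"sourceHost"`
	SourcePort       int    `json:"sourcePort"`
	DatabendDSN      string `json:"databendDSN"`
	BatchSize        int    `json:"batchSize"`
	BatchMaxInterval int    `json:"batchMaxInterval"`
	CopyPurge        bool   `json:"copyPurge"`
	MaxThread        int    `json:"maxThread"`
}

// parseConfig starts from the documented defaults and lets the JSON
// override them; keys absent from the JSON keep their default value.
func parseConfig(data []byte) (Config, error) {
	cfg := Config{
		DatabaseType:     "mysql",
		DatabendDSN:      "localhost:8000",
		BatchSize:        1000,
		BatchMaxInterval: 3,
		CopyPurge:        true,
		MaxThread:        1,
	}
	if err := json.Unmarshal(data, &cfg); err != nil {
		return Config{}, fmt.Errorf("parse config: %w", err)
	}
	return cfg, nil
}

func main() {
	raw := []byte(`{"sourceHost": "127.0.0.1", "sourcePort": 3306, "batchSize": 40000}`)
	cfg, err := parseConfig(raw)
	if err != nil {
		panic(err)
	}
	// batchSize comes from the JSON; databaseType and maxThread keep their defaults.
	fmt.Println(cfg.DatabaseType, cfg.BatchSize, cfg.MaxThread) // prints: mysql 40000 1
}
```

Pre-filling the struct before decoding keeps a default distinguishable from an explicit `false` or `0` only when the key is absent, which is enough for this illustration.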

Rules:

- `sourceWhereCondition` is always required; for time split use `t >= '...' and t < '...'` with `YYYY-MM-DD HH:MM:SS`.
- `sourceSplitKey` and `sourceSplitTimeKey` are mutually exclusive.
- For time split, `timeSplitUnit` is required.
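
The rules above can be expressed as a small pre-flight check. This is a minimal sketch, not the tool's own validation: the `SplitConfig` type and `validate` function are hypothetical names, and the check follows the rules as written (an empty `timeSplitUnit` is rejected for time split, even though the table lists `hour` as its default).

```go
package main

import (
	"errors"
	"fmt"
)

// SplitConfig holds only the keys the rules refer to (hypothetical type).
type SplitConfig struct {
	SourceWhereCondition string
	SourceSplitKey       string
	SourceSplitTimeKey   string
	TimeSplitUnit        string
}

// validate applies the three documented rules in order.
func validate(c SplitConfig) error {
	if c.SourceWhereCondition == "" {
		return errors.New("sourceWhereCondition is required")
	}
	if c.SourceSplitKey != "" && c.SourceSplitTimeKey != "" {
		return errors.New("sourceSplitKey and sourceSplitTimeKey are mutually exclusive")
	}
	if c.SourceSplitTimeKey != "" {
		switch c.TimeSplitUnit {
		case "minute", "quarter", "hour", "day":
			// One of the units listed in the parameter table.
		default:
			return fmt.Errorf("timeSplitUnit %q must be minute, quarter, hour, or day", c.TimeSplitUnit)
		}
	}
	return nil
}

func main() {
	ok := SplitConfig{SourceWhereCondition: "id > 0", SourceSplitKey: "id"}
	bad := SplitConfig{SourceWhereCondition: "id > 0", SourceSplitKey: "id", SourceSplitTimeKey: "t1"}
	fmt.Println(validate(ok))
	fmt.Println(validate(bad))
}
```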

Example (key split):

    {
      "databaseType": "mysql",
      "sourceHost": "127.0.0.1",
      "sourcePort": 3306,
      "sourceUser": "root",
      "sourcePass": "123456",
      "sourceDB": "mydb",
      "sourceTable": "test_table",
      "sourceWhereCondition": "id > 0",
      "sourceSplitKey": "id",
      "databendDSN": "databend://username:password@host:port?sslmode=disable",
      "databendTable": "mydb.test_table",
      "batchSize": 40000,
      "maxThread": 5
    }

Example (time split keys):

    {
      "sourceWhereCondition": "t1 >= '2024-06-01 00:00:00' and t1 < '2024-07-01 00:00:00'",
      "sourceSplitTimeKey": "t1",
      "timeSplitUnit": "hour"
    }

Run the archiver:

    ./bend-archiver -f config/conf.json

If `-f` is omitted, it loads `config/conf.json`.

Build from source:

    go build -o bend-archiver ./cmd

Run the tests:

    go test ./...

Tests in `cmd` and `source` expect local databases (Databend plus the source DBs in the tests).

Run from source:

    go run ./cmd -f config/conf.json

Notes:

- Multi-table sync uses regex in `sourceDbTables` (example: `["^mydb$@^test_table_.*$"]`).
- The MySQL driver reports BOOL as `TINYINT(1)`, so use `TINYINT` in Databend for boolean columns.
- COPY options reference: https://docs.databend.com/sql/sql-commands/dml/dml-copy-into-table#copy-options
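
The `dbRegex@tableRegex` form used by `sourceDbTables` can be tried out before a run with a small helper like the one below. It is a sketch under assumptions: `matchTable` is a hypothetical function, it splits the pattern at the first `@`, and it uses Go's `regexp` package (RE2 syntax), which may not match the tool's own handling in every detail.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

// matchTable reports whether db and table match a "dbRegex@tableRegex" pattern.
// The pattern is split at the first "@" (an assumption about the format).
func matchTable(pattern, db, table string) (bool, error) {
	dbExpr, tableExpr, ok := strings.Cut(pattern, "@")
	if !ok {
		return false, fmt.Errorf("pattern %q has no @ separator", pattern)
	}
	dbRe, err := regexp.Compile(dbExpr)
	if err != nil {
		return false, fmt.Errorf("database regex: %w", err)
	}
	tableRe, err := regexp.Compile(tableExpr)
	if err != nil {
		return false, fmt.Errorf("table regex: %w", err)
	}
	return dbRe.MatchString(db) && tableRe.MatchString(table), nil
}

func main() {
	pattern := "^mydb$@^test_table_.*$"
	for _, t := range []string{"test_table_01", "orders"} {
		ok, err := matchTable(pattern, "mydb", t)
		if err != nil {
			panic(err)
		}
		fmt.Printf("mydb.%s matches: %v\n", t, ok)
	}
}
```

Anchoring both halves with `^` and `$`, as the example pattern does, prevents a database named `mydb_backup` from matching by accident.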