Skip to content
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
1 change: 1 addition & 0 deletions docs/en/guides/40-load-data/01-load/00-stage.md
Original file line number Diff line number Diff line change
@@ -1,5 +1,6 @@
---
title: Loading from Stage
sidebar_label: Stage
---

Databend enables you to easily import data from files uploaded to either the user stage or an internal/external stage. To do so, you can first upload the files to a stage using [BendSQL](../../30-sql-clients/00-bendsql/index.md), and then employ the [COPY INTO](/sql/sql-commands/dml/dml-copy-into-table) command to load the data from the staged file. Please note that the files must be in a format supported by Databend, otherwise the data cannot be imported. For more information on the file formats supported by Databend, see [Input & Output File Formats](/sql/sql-reference/file-format-options).
Expand Down
1 change: 1 addition & 0 deletions docs/en/guides/40-load-data/01-load/01-s3.md
Original file line number Diff line number Diff line change
@@ -1,5 +1,6 @@
---
title: Loading from Bucket
sidebar_label: Bucket
---

When data files are stored in an object storage bucket, such as Amazon S3, it is possible to load them directly into Databend using the [COPY INTO](/sql/sql-commands/dml/dml-copy-into-table) command. Please note that the files must be in a format supported by Databend, otherwise the data cannot be imported. For more information on the file formats supported by Databend, see [Input & Output File Formats](/sql/sql-reference/file-format-options).
Expand Down
1 change: 1 addition & 0 deletions docs/en/guides/40-load-data/01-load/02-local.md
Original file line number Diff line number Diff line change
@@ -1,5 +1,6 @@
---
title: Loading from Local File
sidebar_label: Local
---

Uploading your local data files to a stage or bucket before loading them into Databend can be unnecessary. Instead, you can use [BendSQL](../../30-sql-clients/00-bendsql/index.md), the Databend native CLI tool, to directly import the data. This simplifies the workflow and can save you storage fees.
Expand Down
1 change: 1 addition & 0 deletions docs/en/guides/40-load-data/01-load/03-http.md
Original file line number Diff line number Diff line change
@@ -1,5 +1,6 @@
---
title: Loading from Remote File
sidebar_label: Remote
---

To load data from remote files into Databend, the [COPY INTO](/sql/sql-commands/dml/dml-copy-into-table) command can be used. This command allows you to copy data from a variety of sources, including remote files, into Databend with ease. With COPY INTO, you can specify the source file location, file format, and other relevant parameters to tailor the import process to your needs. Please note that the files must be in a format supported by Databend, otherwise the data cannot be imported. For more information on the file formats supported by Databend, see [Input & Output File Formats](/sql/sql-reference/file-format-options).
Expand Down
2 changes: 1 addition & 1 deletion docs/en/guides/40-load-data/01-load/_category_.json
Original file line number Diff line number Diff line change
@@ -1,3 +1,3 @@
{
"label": "Loading Data from Files"
"label": "Loading Files"
}
2 changes: 1 addition & 1 deletion docs/en/guides/40-load-data/01-load/index.md
Original file line number Diff line number Diff line change
@@ -1,5 +1,5 @@
---
title: Loading Data from Files
title: Loading from Files
---

import DetailsWrap from '@site/src/components/DetailsWrap';
Expand Down
2 changes: 1 addition & 1 deletion docs/en/guides/40-load-data/02-load-db/_category_.json
Original file line number Diff line number Diff line change
@@ -1,3 +1,3 @@
{
"label": "Loading Data with Tools"
"label": "Loading with Platforms"
}
118 changes: 1 addition & 117 deletions docs/en/guides/40-load-data/02-load-db/addax.md
Original file line number Diff line number Diff line change
Expand Up @@ -2,120 +2,4 @@
title: Addax
---

import FunctionDescription from '@site/src/components/FunctionDescription';

<FunctionDescription description="Introduced: v1.1.70"/>

[Addax](https://github.com/wgzhao/Addax), originally derived from Alibaba's [DataX](https://github.com/alibaba/DataX), is a versatile open-source ETL (Extract, Transform, Load) tool. It excels at seamlessly transferring data between diverse RDBMS (Relational Database Management Systems) and NoSQL databases, making it an optimal solution for efficient data migration.

For information about the system requirements, download, and deployment steps for Addax, refer to Addax's [Getting Started Guide](https://github.com/wgzhao/Addax#getting-started). The guide provides detailed instructions and guidelines for setting up and using Addax.

See also: [DataX](datax.md)

## DatabendReader & DatabendWriter

DatabendReader and DatabendWriter are integrated plugins of Addax, allowing seamless integration with Databend.

The DatabendReader plugin enables reading data from Databend. Databend provides compatibility with the MySQL client protocol, so you can also use the [MySQLReader](https://wgzhao.github.io/Addax/develop/reader/mysqlreader/) plugin to retrieve data from Databend. For more information about DatabendReader, see https://wgzhao.github.io/Addax/develop/reader/databendreader/

## Tutorial: Data Loading from MySQL

In this tutorial, you will load data from MySQL to Databend with Addax. Before you start, make sure you have successfully set up Databend, MySQL, and Addax in your environment.

1. In MySQL, create a SQL user that you will use for data loading and then create a table and populate it with sample data.

```sql title='In MySQL:'
mysql> create user 'mysqlu1'@'%' identified by '123';
mysql> grant all on *.* to 'mysqlu1'@'%';
mysql> create database db;
mysql> create table db.tb01(id int, col1 varchar(10));
mysql> insert into db.tb01 values(1, 'test1'), (2, 'test2'), (3, 'test3');
```

2. In Databend, create a corresponding target table.

```sql title='In Databend:'
databend> create database migrated_db;
databend> create table migrated_db.tb01(id int null, col1 String null);
```

3. Copy and paste the following code to a file, and name the file as *mysql_demo.json*:

:::note
For the available parameters and their descriptions, refer to the documentation provided at the following link: https://wgzhao.github.io/Addax/develop/writer/databendwriter/#_2
:::

```json title='mysql_demo.json'
{
"job": {
"setting": {
"speed": {
"channel": 4
}
},
"content": {
"writer": {
"name": "databendwriter",
"parameter": {
"preSql": [
"truncate table @table"
],
"postSql": [
],
"username": "u1",
"password": "123",
"database": "migrate_db",
"table": "tb01",
"jdbcUrl": "jdbc:mysql://127.0.0.1:3307/migrated_db",
"loadUrl": ["127.0.0.1:8000","127.0.0.1:8000"],
"fieldDelimiter": "\\x01",
"lineDelimiter": "\\x02",
"column": ["*"],
"format": "csv"
}
},
"reader": {
"name": "mysqlreader",
"parameter": {
"username": "mysqlu1",
"password": "123",
"column": [
"*"
],
"connection": [
{
"jdbcUrl": [
"jdbc:mysql://127.0.0.1:3306/db"
],
"driver": "com.mysql.jdbc.Driver",
"table": [
"tb01"
]
}
]
}
}
}
}
}
```

4. Run Addax:

```shell
cd {YOUR_ADDAX_DIR_BIN}
./addax.sh -L debug ./mysql_demo.json
```

You're all set! To verify the data loading, execute the query in Databend:

```sql
databend> select * from migrated_db.tb01;
+------+-------+
| id | col1 |
+------+-------+
| 1 | test1 |
| 2 | test2 |
| 3 | test3 |
+------+-------+
```
See [Addax](/guides/migrate/mysql#addax).
142 changes: 1 addition & 141 deletions docs/en/guides/40-load-data/02-load-db/datax.md
Original file line number Diff line number Diff line change
Expand Up @@ -2,144 +2,4 @@
title: DataX
---

import FunctionDescription from '@site/src/components/FunctionDescription';

<FunctionDescription description="Introduced: v1.1.70"/>

[DataX](https://github.com/alibaba/DataX) is an open-source data integration tool developed by Alibaba. It is designed to efficiently and reliably transfer data between various data storage systems and platforms, such as relational databases, big data platforms, and cloud storage services. DataX supports a wide range of data sources and data sinks, including but not limited to MySQL, Oracle, SQL Server, PostgreSQL, HDFS, Hive, HBase, MongoDB, and more.

:::tip
[Apache DolphinScheduler](https://dolphinscheduler.apache.org/) now has added support for Databend as a data source. This enhancement enables you to leverage DolphinScheduler for managing DataX tasks and effortlessly load data from MySQL to Databend.
:::

For information about the system requirements, download, and deployment steps for DataX, refer to DataX's [Quick Start Guide](https://github.com/alibaba/DataX/blob/master/userGuid.md). The guide provides detailed instructions and guidelines for setting up and using DataX.

See also: [Addax](addax.md)

## DatabendWriter

DatabendWriter is an integrated plugin of DataX, which means it comes pre-installed and does not require any manual installation. It acts as a seamless connector that enables the effortless transfer of data from other databases to Databend. With DatabendWriter, you can leverage the capabilities of DataX to efficiently load data from various databases into Databend.

DatabendWriter supports two operational modes: INSERT (default) and REPLACE. In INSERT Mode, new data is added while conflicts with existing records are prevented to maintain data integrity. On the other hand, the REPLACE Mode prioritizes data consistency by replacing existing records with newer data in case of conflicts.

If you need more information about DatabendWriter and its functionalities, you can refer to the documentation available at https://github.com/alibaba/DataX/blob/master/databendwriter/doc/databendwriter.md

## Tutorial: Data Loading from MySQL

In this tutorial, you will load data from MySQL to Databend with DataX. Before you start, make sure you have successfully set up Databend, MySQL, and DataX in your environment.

1. In MySQL, create a SQL user that you will use for data loading and then create a table and populate it with sample data.

```sql title='In MySQL:'
mysql> create user 'mysqlu1'@'%' identified by 'databend';
mysql> grant all on *.* to 'mysqlu1'@'%';
mysql> create database db;
mysql> create table db.tb01(id int, d double, t TIMESTAMP, col1 varchar(10));
mysql> insert into db.tb01 values(1, 3.1,now(), 'test1'), (1, 4.1,now(), 'test2'), (1, 4.1,now(), 'test2');
```

2. In Databend, create a corresponding target table.

:::note
DataX data types can be converted to Databend's data types when loaded into Databend. For the specific correspondences between DataX data types and Databend's data types, refer to the documentation provided at the following link: https://github.com/alibaba/DataX/blob/master/databendwriter/doc/databendwriter.md#33-type-convert
:::

```sql title='In Databend:'
databend> create database migrated_db;
databend> create table migrated_db.tb01(id int null, d double null, t TIMESTAMP null, col1 varchar(10) null);
```

3. Copy and paste the following code to a file, and name the file as *mysql_demo.json*. For the available parameters and their descriptions, refer to the documentation provided at the following link: https://github.com/alibaba/DataX/blob/master/databendwriter/doc/databendwriter.md#32-configuration-description

```json title='mysql_demo.json'
{
"job": {
"content": [
{
"reader": {
"name": "mysqlreader",
"parameter": {
"username": "mysqlu1",
"password": "databend",
"column": [
"id", "d", "t", "col1"
],
"connection": [
{
"jdbcUrl": [
"jdbc:mysql://127.0.0.1:3307/db"
],
"driver": "com.mysql.jdbc.Driver",
"table": [
"tb01"
]
}
]
}
},
"writer": {
"name": "databendwriter",
"parameter": {
"username": "databend",
"password": "databend",
"column": [
"id", "d", "t", "col1"
],
"preSql": [
],
"postSql": [
],
"connection": [
{
"jdbcUrl": "jdbc:databend://localhost:8000/migrated_db",
"table": [
"tb01"
]
}
]
}
}
}
],
"setting": {
"speed": {
"channel": 1
}
}
}
}
```

:::tip
The provided code above configures DatabendWriter to operate in the INSERT mode. To switch to the REPLACE mode, you must include the writeMode and onConflictColumn parameters. For example:

```json title='mysql_demo.json'
...
"writer": {
"name": "databendwriter",
"parameter": {
"writeMode": "replace",
"onConflictColumn":["id"],
"username": ...
```
:::

4. Run DataX:

```shell
cd {YOUR_DATAX_DIR_BIN}
python datax.py ./mysql_demo.json
```

You're all set! To verify the data loading, execute the query in Databend:

```sql
databend> select * from migrated_db.tb01;
+------+------+----------------------------+-------+
| id | d | t | col1 |
+------+------+----------------------------+-------+
| 1 | 3.1 | 2023-02-01 07:11:08.500000 | test1 |
| 1 | 4.1 | 2023-02-01 07:11:08.501000 | test2 |
| 1 | 4.1 | 2023-02-01 07:11:08.501000 | test2 |
+------+------+----------------------------+-------+
```
See [DataX](/guides/migrate/mysql#datax).
Loading