Commit 20ca21a

Merge pull request #10580 from BohuTANG/doc-data-transform-load
docs(transforming): add data load transform
2 parents cfdebfe + f788de9 commit 20ca21a

5 files changed

+94
-8
lines changed

docs/doc/12-load-data/00-stage.md

Lines changed: 2 additions & 2 deletions
@@ -1,6 +1,6 @@
11
---
2-
title: Tutorial - Load from an Internal Stage
3-
sidebar_label: Tutorial - Load from an Internal Stage
2+
title: Load from an Internal Stage
3+
sidebar_label: Load from an Internal Stage
44
description:
55
Load data from Databend stages.
66
---

docs/doc/12-load-data/01-s3.md

Lines changed: 2 additions & 2 deletions
@@ -1,6 +1,6 @@
11
---
2-
title: Tutorial - Load from an Amazon S3 Bucket
3-
sidebar_label: Tutorial - Load from an Amazon S3 Bucket
2+
title: Load from an Amazon S3 Bucket
3+
sidebar_label: Load from an Amazon S3 Bucket
44
description:
55
Load data from Amazon S3.
66
---

docs/doc/12-load-data/02-local.md

Lines changed: 2 additions & 2 deletions
@@ -1,6 +1,6 @@
11
---
2-
title: Tutorial - Load from a Local File
3-
sidebar_label: Tutorial - Load from a Local File
2+
title: Load from a Local File
3+
sidebar_label: Load from a Local File
44
description:
55
Load data from local file system.
66
---

docs/doc/12-load-data/04-http.md

Lines changed: 2 additions & 2 deletions
@@ -1,6 +1,6 @@
11
---
2-
title: Tutorial - Load from Remote File
3-
sidebar_label: Tutorial - Load from a Remote File
2+
title: Load from a Remote File
3+
sidebar_label: Load from a Remote File
44
description:
55
Load data from remote files.
66
---
Lines changed: 86 additions & 0 deletions
@@ -0,0 +1,86 @@
1+
---
2+
title: Transforming Data During a Load
3+
sidebar_label: Transforming Data During a Load
4+
description: Learn how to use Databend to transform data while loading it into a table using the COPY INTO <table> command.
5+
---
6+
7+
Databend supports transforming data while loading it into a table using the `COPY INTO <table>` command, which simplifies your ETL pipeline for basic transformations.
8+
9+
This feature helps you avoid the use of temporary tables to store pre-transformed data when reordering columns during a data load.
10+
11+
The `COPY` command supports:
12+
- Column reordering, column omission, and casts using a `SELECT` statement. Your data files do not need to have the same number or ordering of columns as your target table.
13+
14+
:::note
15+
Transformations are supported only for Parquet files in a stage.
16+
:::
17+
18+
## Load a Subset of Table Data
19+
20+
Load a subset of data into a table. The following example loads only the `id` and `name` columns of a staged Parquet file:
21+
22+
**Sample Data**
23+
```text
24+
id | name       | age
25+
---|------------|----
26+
1  | John Doe   | 35
27+
2  | Jane Smith | 28
28+
```
29+
30+
**Example**
31+
```sql
32+
-- create a table
33+
CREATE TABLE my_table(id int, name string);
34+
35+
COPY INTO my_table
36+
FROM (SELECT t.id, t.name FROM @mystage t)
37+
FILE_FORMAT = (type = parquet) PATTERN='.*parquet';
38+
```
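To check the result, one option is to query the target table after the load. This is a minimal sketch that assumes the stage `@mystage` holds only the sample Parquet file shown above; the `ORDER BY` is added just to make the output order stable.

```sql
-- Assumption: my_table was loaded by the COPY INTO statement above.
SELECT id, name
FROM my_table
ORDER BY id;
```

Under that assumption, the query should return the two sample rows with only `id` and `name`; the `age` column in the file is not loaded.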
39+
40+
## Reorder Columns During a Load
41+
42+
To reorder the columns from a staged Parquet file before loading it into a table, you can use the `COPY INTO` command with a `SELECT` statement. The following example reorders the columns `name` and `id`:
43+
44+
**Sample Data**
45+
```text
46+
id | name       | age
47+
---|------------|----
48+
1  | John Doe   | 35
49+
2  | Jane Smith | 28
50+
```
51+
52+
**Example**
53+
```sql
54+
CREATE TABLE my_table(name string, id int);
55+
56+
COPY INTO my_table
57+
FROM (SELECT t.name, t.id FROM @mystage t)
58+
FILE_FORMAT = (type = parquet) PATTERN='.*parquet';
59+
```
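Each example on this page creates a table named `my_table`, so running them back to back in one database would likely fail on the second `CREATE TABLE`. A minimal sketch of one way to reset between examples, assuming you do not need the previously loaded rows:

```sql
-- Assumption: my_table is a scratch table from an earlier example and can be discarded.
DROP TABLE IF EXISTS my_table;

-- Recreate it with the column order the reordering example expects.
CREATE TABLE my_table(name string, id int);
```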
60+
61+
## Convert Data Types During a Load
62+
63+
To convert staged data into other data types during a data load, you can use the appropriate conversion function in your `SELECT` statement.
64+
65+
The following example converts a timestamp into a date:
66+
67+
**Sample Data**
68+
```text
69+
id | name     | timestamp
70+
---|----------|--------------------
71+
1  | John Doe | 2022-03-15 10:30:00
72+
2  | Jane Doe | 2022-03-14 09:00:00
73+
```
74+
75+
**Example**
76+
```sql
77+
CREATE TABLE my_table(id int, name string, time date);
78+
79+
COPY INTO my_table
80+
FROM (SELECT t.id, t.name, to_date(t.timestamp) FROM @mystage t)
81+
FILE_FORMAT = (type = parquet) PATTERN='.*parquet';
82+
```
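Casts are not limited to dates. As a sketch of the same pattern, the statement below widens `id` and uses the standard `CAST` expression in place of `to_date`. The table name `my_table_cast` and the choice of `BIGINT` are illustrative assumptions, not part of the original example, and whether a given cast is accepted during a load may depend on your Databend version.

```sql
-- Hypothetical target table, for illustration only.
CREATE TABLE my_table_cast(id bigint, name string, time date);

-- Same staged file as above; each selected expression maps to one target column, in order.
COPY INTO my_table_cast
FROM (SELECT CAST(t.id AS BIGINT), t.name, CAST(t.timestamp AS DATE) FROM @mystage t)
FILE_FORMAT = (type = parquet) PATTERN='.*parquet';
```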
83+
84+
## Conclusion
85+
86+
Transforming data during a load lets you simplify your ETL pipeline and avoid temporary tables for basic transformations such as reordering, omitting, and casting columns, so you can focus on analyzing your data rather than on the mechanics of moving it around.
