| 1 | +--- |
| 2 | +title: Transforming Data During a Load |
| 3 | +sidebar_label: Transforming Data During a Load |
| 4 | +description: Learn how to use Databend to transform data while loading it into a table using the COPY INTO <table> command. |
| 5 | +--- |
| 6 | + |
| 7 | +Databend supports transforming data while loading it into a table using the `COPY INTO <table>` command, which simplifies your ETL pipeline for basic transformations. |
| 8 | + |
| 9 | +This feature lets you apply transformations such as reordering columns during a load, without first storing the pre-transformed data in a temporary table. |
| 10 | + |
| 11 | +The `COPY INTO` command supports: |
| 12 | +- Column reordering, column omission, and casts using a `SELECT` statement. Your data files do not need to have the same number or order of columns as the target table. |
| 13 | + |
| 14 | +:::note |
| 15 | +Transformation is supported only for staged files in Parquet format. |
| 16 | +::: |
| 17 | + |
| 18 | +## Load a Subset of Table Data |
| 19 | + |
| 20 | +You can load only some of the columns from a staged file. The following example loads the columns `id` and `name` from a staged Parquet file and omits `age`: |
| 21 | + |
| 22 | +**Sample Data** |
| 23 | +```text |
| 24 | +id | name       | age |
| 25 | +---|------------|---- |
| 26 | + 1 | John Doe   | 35 |
| 27 | + 2 | Jane Smith | 28 |
| 28 | +``` |
| 29 | + |
| 30 | +**Example** |
| 31 | +```sql |
| 32 | +-- create a table |
| 33 | +CREATE TABLE my_table(id int, name string); |
| 34 | + |
| 35 | +COPY INTO my_table |
| 36 | +FROM (SELECT t.id, t.name FROM @mystage t) |
| 37 | +FILE_FORMAT = (type = parquet) PATTERN='.*parquet'; |
| 38 | +``` |
| 39 | + |
| 40 | +## Reorder Columns During a Load |
| 41 | + |
| 42 | +To reorder the columns of a staged Parquet file as it is loaded into a table, use the `COPY INTO` command with a `SELECT` statement that lists the columns in the target order. The following example loads `name` before `id`, reversing their order in the file: |
| 43 | + |
| 44 | +**Sample Data** |
| 45 | +```text |
| 46 | +id | name       | age |
| 47 | +---|------------|---- |
| 48 | + 1 | John Doe   | 35 |
| 49 | + 2 | Jane Smith | 28 |
| 50 | +``` |
| 51 | + |
| 52 | +**Example** |
| 53 | +```sql |
| 54 | +CREATE TABLE my_table(name string, id int); |
| 55 | + |
| 56 | +COPY INTO my_table |
| 57 | +FROM (SELECT t.name, t.id FROM @mystage t) |
| 58 | +FILE_FORMAT = (type = parquet) PATTERN='.*parquet'; |
| 59 | +``` |
| 60 | + |
| 61 | +## Convert Data Types During a Load |
| 62 | + |
| 63 | +To convert staged data into other data types during a data load, you can use the appropriate conversion function in your `SELECT` statement. |
| 64 | + |
| 65 | +The following example converts a timestamp into a date: |
| 66 | + |
| 67 | +**Sample Data** |
| 68 | +```text |
| 69 | +id | name     | timestamp |
| 70 | +---|----------|-------------------- |
| 71 | + 1 | John Doe | 2022-03-15 10:30:00 |
| 72 | + 2 | Jane Doe | 2022-03-14 09:00:00 |
| 73 | +``` |
| 74 | + |
| 75 | +**Example** |
| 76 | +```sql |
| 77 | +CREATE TABLE my_table(id int, name string, time date); |
| 78 | + |
| 79 | +COPY INTO my_table |
| 80 | +FROM (SELECT t.id, t.name, to_date(t.timestamp) FROM @mystage t) |
| 81 | +FILE_FORMAT = (type = parquet) PATTERN='.*parquet'; |
| 82 | +``` |
| 83 | + |
| 84 | +## Conclusion |
| 85 | + |
| 86 | +Transforming data during a load lets you reorder columns, omit columns, and convert data types in a single `COPY INTO` statement, without storing intermediate results in temporary tables. This keeps basic transformations out of the rest of your ETL pipeline, so you can focus on analyzing your data rather than on the mechanics of moving it around. |