The FROM clause specifies the source location (user stage, internal stage, external stage, or external location) from which data will be loaded into the specified table using the COPY INTO command. You can also nest a SELECT ... FROM subquery to transform the data you want to load. For more information, see [Transforming Data on Load](/guides/load-data/transform/data-load-transform).
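As a sketch of loading through a subquery, the following loads two columns from staged Parquet files while transforming one of them on the way in (the table `mytable`, stage `@my_stage`, and column names are hypothetical):

```sql
-- Select from the stage and transform before loading.
COPY INTO mytable
FROM (
    SELECT t.id, UPPER(t.name)
    FROM @my_stage t
)
FILE_FORMAT = (TYPE = PARQUET);
```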
:::note
When you load data from a staged file and the stage path contains special characters such as spaces or parentheses, you can enclose the entire path in single quotes, as demonstrated in the following SQL statements:
```sql
COPY INTO mytable FROM 's3://mybucket/dataset(databend)/' ...
COPY INTO mytable FROM 's3://mybucket/dataset databend/' ...
```
:::
For the connection parameters available for accessing Amazon S3-like storage services, see [Connection Parameters](/00-sql-reference/51-connect-parameters.md).
- **PATTERN**: A [PCRE2](https://www.pcre.org/current/doc/html/)-based regular expression pattern string that specifies file names to match. See [Example 4: Filtering Files with Pattern](#example-4-filtering-files-with-pattern).
```sql
externalLocation ::=
  'cos://<bucket>[<path>]'
  CONNECTION = (
        <connection_parameters>
  )
```
## Format Type Options
For the connection parameters available for accessing Tencent Cloud Object Storage, see [Connection Parameters](/00-sql-reference/51-connect-parameters.md).
</TabItem>
The `FILE_FORMAT` parameter supports different file types, each with specific formatting options. Below are the available options for each supported file format:
<TabItem value="Remote Files" label="Remote Files">
### Common Options for All Formats
```sql
externalLocation ::=
  'https://<url>'
```
| Option | Description | Values | Default |
|--------|-------------|--------|---------|
| COMPRESSION | Compression algorithm for data files | AUTO, GZIP, BZ2, BROTLI, ZSTD, DEFLATE, RAW_DEFLATE, XZ, NONE | AUTO |
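As a minimal sketch, the `COMPRESSION` option is set inside `FILE_FORMAT`; the table and stage names below are hypothetical:

```sql
-- Load gzip-compressed CSV files from a named stage.
COPY INTO mytable
FROM @my_stage
FILE_FORMAT = (TYPE = CSV, COMPRESSION = GZIP);
```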
You can use glob patterns to specify more than one file. For example:

- `ontime_200{6,7,8}.csv` represents `ontime_2006.csv`, `ontime_2007.csv`, and `ontime_2008.csv`.
- `ontime_200[6-8].csv` represents `ontime_2006.csv`, `ontime_2007.csv`, and `ontime_2008.csv`.

### TYPE = CSV
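A glob pattern like those above can be used directly in a remote-file location; the URL below is hypothetical:

```sql
-- The brace pattern expands to the 2006-2008 files.
COPY INTO ontime
FROM 'https://example.com/dataset/ontime_200{6,7,8}.csv'
FILE_FORMAT = (TYPE = CSV, SKIP_HEADER = 1);
```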
| Option | Description | Default |
|--------|-------------|---------|
| RECORD_DELIMITER | Character(s) separating records | newline |
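A minimal sketch of overriding a CSV option (table and stage names are hypothetical):

```sql
-- CSV files that use '\r\n' as the record delimiter.
COPY INTO mytable
FROM @my_stage
FILE_FORMAT = (TYPE = CSV, RECORD_DELIMITER = '\r\n');
```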
FILES specifies one or more file names (separated by commas) to be loaded.
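For example, an explicit file list might look like this (table, stage, and file names are hypothetical):

```sql
-- Load exactly two named files from the stage.
COPY INTO mytable
FROM @my_stage
FILES = ('books.csv', 'authors.csv')
FILE_FORMAT = (TYPE = CSV);
```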
| Option | Description | Default |
|--------|-------------|---------|
| MISSING_FIELD_AS | How to handle missing fields | ERROR |
### PATTERN
### TYPE = ORC
A [PCRE2](https://www.pcre.org/current/doc/html/)-based regular expression pattern string, enclosed in single quotes, specifying the file names to match. For PCRE2 syntax, see http://www.pcre.org/current/doc/html/pcre2syntax.html. See [Example 4: Filtering Files with Pattern](#example-4-filtering-files-with-pattern) for examples and useful tips about filtering files with the PATTERN parameter.
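For instance, a pattern that matches only Parquet files under a given prefix might look like this (the table, stage, and prefix are hypothetical):

```sql
-- Load only files whose names end in .parquet under the sales/ prefix.
COPY INTO mytable
FROM @my_stage
PATTERN = 'sales/.*[.]parquet'
FILE_FORMAT = (TYPE = PARQUET);
```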
| Option | Description | Default |
|--------|-------------|---------|
| MISSING_FIELD_AS | How to handle missing fields | ERROR |
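As a sketch, assuming `FIELD_DEFAULT` is an accepted alternative to `ERROR` for this option (table and stage names are hypothetical):

```sql
-- Fill fields absent from the ORC files with column defaults instead of failing.
COPY INTO mytable
FROM @my_stage
FILE_FORMAT = (TYPE = ORC, MISSING_FIELD_AS = FIELD_DEFAULT);
```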
### FILE_FORMAT
### TYPE = AVRO
See [Input & Output File Formats](../../00-sql-reference/50-file-format-options.md) for details.
| Option | Description | Default |
|--------|-------------|---------|
| MISSING_FIELD_AS | How to handle missing fields | ERROR |
| SIZE_LIMIT | Specifies the maximum number of rows to be loaded for a given COPY statement. Defaults to `0`, meaning no limit. | Optional |
| PURGE | If `true`, the command purges the files in the stage after they are successfully loaded into the table. Default: `false`. | Optional |
| FORCE | COPY INTO ensures idempotence by automatically tracking and preventing the reloading of files for a default period of 12 hours. This period can be customized with the `load_file_metadata_expire_hours` setting.<br/>Defaults to `false`, meaning COPY INTO skips duplicate files when copying data. If `true`, duplicate files are not skipped. | Optional |
| DISABLE_VARIANT_CHECK | If `true`, invalid JSON data is replaced with null values during COPY INTO. If `false` (default), COPY INTO fails on invalid JSON data. | Optional |
| ON_ERROR | Decides how to handle a file that contains errors: `continue` to skip the file and proceed, `abort` (default) to terminate on error, `abort_N` to terminate once errors ≥ N. Note: `abort_N` is not available for Parquet files. | Optional |
| MAX_FILES | Sets the maximum number of files to load that have not been loaded already. The value can be set up to 15,000; any greater value is treated as 15,000. | Optional |
| RETURN_FAILED_ONLY | When set to `true`, only files that failed to load are returned in the output. Default: `false`. | Optional |
| COLUMN_MATCH_MODE | (Parquet only) Determines whether column name matching during COPY INTO is `case-sensitive` or `case-insensitive` (default). | Optional |
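A sketch combining several of these options (table, stage, and format are hypothetical):

```sql
-- Continue past files that contain errors instead of aborting,
-- and load at most 500 not-yet-loaded files per run.
COPY INTO mytable
FROM @my_stage
FILE_FORMAT = (TYPE = NDJSON)
ON_ERROR = CONTINUE
MAX_FILES = 500;
```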
:::tip
When importing large volumes of data, such as logs, set both `PURGE` and `FORCE` to `true`. This allows efficient data import without interaction with the Meta server (updating the copied-files set), but be aware that it may lead to duplicate data imports.
:::
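A bulk log import along those lines might look like the following sketch (table, stage, and pattern are hypothetical):

```sql
-- Always reload matched files and delete them from the stage after loading.
COPY INTO logs
FROM @log_stage
PATTERN = '.*[.]ndjson'
FILE_FORMAT = (TYPE = NDJSON)
PURGE = true
FORCE = true;
```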
COPY INTO provides a summary of the data loading results with these columns:
If `RETURN_FAILED_ONLY` is set to `true`, the output will only contain the files that failed to load.
## Distributed COPY INTO
The COPY INTO feature in Databend activates distributed execution automatically in cluster environments, enhancing data loading efficiency and scalability.