Commit 06e5058

docs: parquet compression options (#1725)
* Update 50-file-format-options.md
* Update 50-file-format-options.md
1 parent bc0908c commit 06e5058

1 file changed: +24 -26 lines changed

docs/en/sql-reference/00-sql-reference/50-file-format-options.md

Lines changed: 24 additions & 26 deletions
@@ -3,7 +3,7 @@ title: Input & Output File Formats
 ---
 import FunctionDescription from '@site/src/components/FunctionDescription';
 
-<FunctionDescription description="Introduced or updated: v1.2.530"/>
+<FunctionDescription description="Introduced or updated: v1.2.713"/>
 
 Databend accepts a variety of file formats both as a source and as a target for data loading or unloading. This page explains the supported file formats and their available options.
 
@@ -72,14 +72,10 @@ Separates fields in a record.
 
 **Default**: `,` (comma)
 
-### QUOTE
+### QUOTE (Load Only)
 
 Quotes strings in a CSV file. For data loading, the quote is not necessary unless a string contains the character of a [QUOTE](#quote), [ESCAPE](#escape), [RECORD_DELIMITER](#record_delimiter), or [FIELD_DELIMITER](#field_delimiter).
 
-:::note
-**Used for data loading ONLY**: This option is not available when you unload data from Databend.
-:::
-
 **Available Values**: `'`, `"`, or `(backtick)
 
 **Default**: `"`
@@ -92,49 +88,43 @@ Escapes a quote in a quoted string.
 
 **Default**: `''`
 
-### SKIP_HEADER
+### SKIP_HEADER (Load Only)
 
 Specifies how many lines to be skipped from the beginning of the file.
 
-:::note
-**Used for data loading ONLY**: This option is not available when you unload data from Databend.
-:::
-
 **Default**: `0`
 
-### NAN_DISPLAY
+### NAN_DISPLAY (Load Only)
 
 Specifies how "NaN" (Not-a-Number) values are displayed in query results.
 
 **Available Values**: Must be literal `'nan'` or `'null'` (case-insensitive)
 
 **Default**: `'NaN'`
 
-### NULL_DISPLAY
+### NULL_DISPLAY (Load Only)
 
 Specifies how NULL values are displayed in query results.
 
 **Default**: `'\N'`
 
-### ERROR_ON_COLUMN_COUNT_MISMATCH
+### ERROR_ON_COLUMN_COUNT_MISMATCH (Load Only)
 
 ERROR_ON_COLUMN_COUNT_MISMATCH is a boolean option that, when set to true, specifies that an error should be raised if the number of columns in the data file doesn't match the number of columns in the destination table. Setting it to true helps ensure data integrity and consistency during the loading process.
 
 **Default**: `true`
 
-### EMPTY_FIELD_AS
+### EMPTY_FIELD_AS (Load Only)
 
 Specifies the value that should be used when encountering empty fields, including both `,,` and `,"",`, in the CSV data being loaded into the table.
 
-**Available Values**:
-
-| Value | Description |
+| Available Values | Description |
 |------------------|-----------------------------------------------------------------------------------|
 | `null` (Default) | Interprets empty fields as NULL values. Applicable to nullable columns only. |
 | `string` | Interprets empty fields as empty strings (''). Applicable to String columns only. |
 | `field_default` | Uses the column's default value for empty fields. |
 
-### OUTPUT_HEADER
+### OUTPUT_HEADER (Unload Only)
 
 Specifies whether to include a header row in the CSV file when exporting data with the `COPY INTO <location>` command. Defaults to `false`.
 
@@ -146,9 +136,7 @@ Controls the binary encoding format during both data export and import operation
 
 Specifies the compression algorithm.
 
-**Available Values**:
-
-| Value | Description |
+| Available Values | Description |
 |------------------|-----------------------------------------------------------------|
 | `NONE` (Default) | Indicates that the files are not compressed. |
 | `AUTO` | Auto detect compression via file extensions |
@@ -209,7 +197,7 @@ Same as [the COMPRESSION option for CSV](#compression).
 
 ## NDJSON Options
 
-### NULL_FIELD_AS
+### NULL_FIELD_AS (Load Only)
 
 Specifies how to handle null values during data loading. Refer to the options in the table below for possible configurations.
 
@@ -218,7 +206,7 @@ Specifies how to handle null values during data loading. Refer to the options in
 | `NULL` (Default) | Interprets null values as NULL for nullable fields. An error will be generated for non-nullable fields. |
 | `FIELD_DEFAULT` | Uses the default value of the field for null values. |
 
-### MISSING_FIELD_AS
+### MISSING_FIELD_AS (Load Only)
 
 Determines the behavior when encountering missing fields during data loading. Refer to the options in the table below for possible configurations.
 
@@ -234,7 +222,7 @@ Same as [the COMPRESSION option for CSV](#compression).
 
 ## PARQUET Options
 
-### MISSING_FIELD_AS
+### MISSING_FIELD_AS (Load Only)
 
 Determines the behavior when encountering missing fields during data loading. Refer to the options in the table below for possible configurations.
 
@@ -243,9 +231,19 @@ Determines the behavior when encountering missing fields during data loading. Re
 | `ERROR` (Default)| Generates an error if a missing field is encountered. |
 | `FIELD_DEFAULT` | Uses the default value of the field for missing fields. |
 
+### COMPRESSION (Unload Only)
+
+Specifies the compression algorithm, which is used for compressing internal blocks of the file rather than the entire file, so the output remains in Parquet format.
+
+| Available Values | Description |
+|------------------|-----------------------------------------------------------------------------|
+| `ZSTD` (default) | Zstandard v0.8 (and higher) is supported. |
+| `SNAPPY` | Snappy is a popular and fast compression algorithm often used with Parquet. |
+
+
 ## ORC Options
 
-### MISSING_FIELD_AS
+### MISSING_FIELD_AS (Load Only)
 
 Determines the behavior when encountering missing fields during data loading. Refer to the options in the table below for possible configurations.
