Commit 95f1875

docs(self-hosted): add streaming load selected-columns tutorial (#3036)

* docs: add streaming load column tutorial
* docs: clarify streaming load column mapping
* docs: add examples for streaming load column mapping
* docs: clarify CSV non-adjacent column limitation
* docs: align optional city example with schema

1 parent 79e22cd commit 95f1875

File tree: 2 files changed (+140, -2)

docs/cn/guides/20-self-hosted/04-references/http-streaming-load.md

Lines changed: 70 additions & 1 deletion

@@ -53,6 +53,40 @@ FILE_FORMAT=(type=<format> [<options>...])
 X-Databend-SQL: insert into demo.people(name,age,city) values (?, ?, 'BJ') from @_databend_load file_format=(type=csv skip_header=1)
 ```
 
+### Column mapping rules
+
+- **No column list and no `VALUES`**: rows are written in the table's column-definition order (file fields map to table columns one by one).
+  - CSV header: `id,name,age`
+  - SQL:
+    ```text
+    X-Databend-SQL: insert into demo.people from @_databend_load file_format=(type=csv skip_header=1)
+    ```
+- **Column list, but no `VALUES`**: rows are written in column-list order (file fields map to the listed columns one by one).
+  - CSV header: `id,name`
+  - SQL:
+    ```text
+    X-Databend-SQL: insert into demo.people(id,name) from @_databend_load file_format=(type=csv skip_header=1)
+    ```
+- **Column list with `VALUES`**:
+  - Each target column corresponds to one expression in `VALUES`.
+  - Each `?` in `VALUES` consumes the next field from the uploaded file, in order.
+  - CSV header: `name,age`
+  - SQL:
+    ```text
+    X-Databend-SQL: insert into demo.people(name,age,city) values (?, ?, 'BJ') from @_databend_load file_format=(type=csv skip_header=1)
+    ```
+- **Columns not provided**:
+  - If the column has a `DEFAULT`, the default value is used;
+  - Otherwise `NULL` is written (which fails if the column is `NOT NULL`).
+- **Reading only some CSV fields (ignoring extra fields)**:
+  - By default, an error is raised if the file has more fields than the target column list.
+  - To ignore the extra fields, set `error_on_column_count_mismatch=false`:
+    ```text
+    X-Databend-SQL: insert into demo.people(id,name) from @_databend_load file_format=(type=csv skip_header=1 error_on_column_count_mismatch=false)
+    ```
+  - This only helps when you want the "first N fields". Streaming load maps CSV fields by position and does not support picking non-adjacent columns (for example, importing only `id` and `age` from `id,name,age`).
+  - Workarounds: preprocess the CSV locally so it contains only the needed columns, or upload the file to a stage first and project columns with something like `SELECT $1, $3 FROM @stage/file.csv`.
 
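The local-preprocessing workaround above can be sketched with standard tools; a minimal example, assuming a POSIX shell with `cut` available (file names and data are illustrative, not part of the tutorial):

```shell
# Full CSV with three fields; we only want id and age (fields 1 and 3).
cat > /tmp/people_full.csv << 'EOF'
id,name,age
1,Carol,25
2,Dave,52
EOF

# cut selects comma-separated fields by position, keeping the header row.
# Note: this naive split does not handle quoted commas inside fields.
cut -d, -f1,3 /tmp/people_full.csv > /tmp/people_id_age.csv
cat /tmp/people_id_age.csv
```

The resulting file contains only the selected columns and can then be uploaded with a plain column list such as `insert into demo.people(id,age) ...`.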
 **cURL template:**
 
 ```shell
@@ -85,6 +119,7 @@ CSV parsing rules are specified via `FILE_FORMAT=(...)`
 - `field_delimiter=','`: field delimiter (default `,`).
 - `quote='\"'`: quote character.
 - `record_delimiter='\n'`: record delimiter.
+- `error_on_column_count_mismatch=false`: allow a column-count mismatch and ignore the extra fields.
 
 Example:
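Before relying on `error_on_column_count_mismatch=false`, it can help to see how many fields each line of a file actually has; a small local sketch using `awk` (naive comma split, file name illustrative):

```shell
# A file whose second row has one field too many.
cat > /tmp/check.csv << 'EOF'
id,name
1,Carol,extra
EOF

# NF is awk's per-record field count; rows that disagree with the
# header stand out immediately.
awk -F, '{ print NR ": " NF " fields" }' /tmp/check.csv
```

Rows reported with more fields than the target column list are the ones the option would silently truncate.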

@@ -129,6 +164,8 @@ docker logs -f databend-streaming-load
 
 ### Step 2: Create the database and table
 
+Create a table with a `city` column (it is used in the optional step later):
+
 ```shell
 curl -sS -u databend:databend \
 -H 'Content-Type: application/json' \
@@ -137,7 +174,7 @@ curl -sS -u databend:databend \
 
 curl -sS -u databend:databend \
 -H 'Content-Type: application/json' \
--d '{"sql":"create or replace table demo.people (id int, name string, age int)"}' \
+-d '{"sql":"create or replace table demo.people (id int, name string, age int, city string)"}' \
 http://localhost:8000/v1/query/ >/dev/null
 ```
 
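When the SQL statement grows longer, assembling the JSON body in a variable keeps the quoting manageable; a sketch (variable names are illustrative, and this simple `printf` works only while the SQL contains no double quotes or backslashes, which would need JSON escaping):

```shell
# Build the JSON payload for the query endpoint from a plain SQL string.
sql='create or replace table demo.people (id int, name string, age int, city string)'
body=$(printf '{"sql":"%s"}' "$sql")
echo "$body"
```

The result can then be passed to curl as `-d "$body"` instead of an inline single-quoted literal.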

@@ -174,6 +211,38 @@ curl -sS -u databend:databend \
 http://localhost:8000/v1/query/
 ```
 
+### (Optional) Step 6: Load only some columns and fill the rest with `VALUES`
+
+This step shows how to upload a file that contains only some of the columns, filling the remaining columns with constants.
+
+1. Prepare a CSV that contains only `name` and `age`:
+
+```shell
+cat > people_name_age.csv << 'EOF'
+name,age
+Carol,25
+Dave,52
+EOF
+```
+
+2. Load it into `demo.people`, writing `city` as a constant:
+
+```shell
+curl -sS -u databend:databend \
+  -H "X-Databend-SQL: insert into demo.people(name,age,city) values (?, ?, 'BJ') from @_databend_load file_format=(type=csv skip_header=1)" \
+  -F "upload=@./people_name_age.csv" \
+  -X PUT "http://localhost:8000/v1/streaming_load"
+```
+
+3. Verify:
+
+```shell
+curl -sS -u databend:databend \
+  -H 'Content-Type: application/json' \
+  -d '{"sql":"select id,name,age,city from demo.people order by name"}' \
+  http://localhost:8000/v1/query/
+```
+
 ## Troubleshooting
 
 - `/v1/streaming_load` returns `404 Not Found`: use `datafuselabs/databend:nightly` (or build it yourself).
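The rows that the `values (?, ?, 'BJ')` mapping in Step 6 produces can be previewed locally; a sketch that applies the positional rule by hand with `awk` (this is not Databend itself, just an illustration of the mapping):

```shell
# Same file as in Step 6: two data columns, name and age.
cat > /tmp/people_name_age.csv << 'EOF'
name,age
Carol,25
Dave,52
EOF

# Skip the header, then emit (name, age, constant city) per row,
# mirroring how each ? consumes one file field in order.
tail -n +2 /tmp/people_name_age.csv | awk -F, '{ print $1 "," $2 ",BJ" }'
```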

docs/en/guides/20-self-hosted/04-references/http-streaming-load.md

Lines changed: 70 additions & 1 deletion

@@ -53,6 +53,40 @@ Example (load two columns from a CSV file and set a constant):
 X-Databend-SQL: insert into demo.people(name,age,city) values (?, ?, 'BJ') from @_databend_load file_format=(type=csv skip_header=1)
 ```
 
+### Column mapping rules
+
+- **No column list, no `VALUES`**: file fields map to table columns by table definition order.
+  - CSV header: `id,name,age`
+  - SQL:
+    ```text
+    X-Databend-SQL: insert into demo.people from @_databend_load file_format=(type=csv skip_header=1)
+    ```
+- **With column list, no `VALUES`**: file fields map to the listed columns in order.
+  - CSV header: `id,name`
+  - SQL:
+    ```text
+    X-Databend-SQL: insert into demo.people(id,name) from @_databend_load file_format=(type=csv skip_header=1)
+    ```
+- **With column list and `VALUES`**:
+  - Each target column gets the corresponding expression in `VALUES`.
+  - Each `?` consumes one field from the uploaded file, in order.
+  - CSV header: `name,age`
+  - SQL:
+    ```text
+    X-Databend-SQL: insert into demo.people(name,age,city) values (?, ?, 'BJ') from @_databend_load file_format=(type=csv skip_header=1)
+    ```
+- **Columns not provided**:
+  - Use the column's `DEFAULT` value if defined.
+  - Otherwise insert `NULL` (and fail if the column is `NOT NULL`).
+- **Read only part of a CSV (ignore extra fields)**:
+  - By default, Databend errors if the file has more fields than the target column list.
+  - To ignore extra fields, set `error_on_column_count_mismatch=false`:
+    ```text
+    X-Databend-SQL: insert into demo.people(id,name) from @_databend_load file_format=(type=csv skip_header=1 error_on_column_count_mismatch=false)
+    ```
+  - This only helps when you want the **first N fields**. Streaming load maps CSV fields by position and does not support selecting non-adjacent fields (for example, `id,name,age` → insert only `id` and `age`).
+  - Workaround: preprocess the CSV to keep only the needed columns, or load via stage and project columns (for example, `SELECT $1, $3 FROM @stage/file.csv`).
 
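The preprocessing workaround above can be done with standard tools; a minimal sketch, assuming a POSIX shell with `cut` (file names and data are illustrative):

```shell
# Full CSV with three fields; we only want id and age (fields 1 and 3).
cat > /tmp/people_full.csv << 'EOF'
id,name,age
1,Carol,25
2,Dave,52
EOF

# cut selects comma-separated fields by position, keeping the header row.
# Note: this naive split does not handle quoted commas inside fields.
cut -d, -f1,3 /tmp/people_full.csv > /tmp/people_id_age.csv
cat /tmp/people_id_age.csv
```

The reduced file can then be uploaded with a plain column list such as `insert into demo.people(id,age) ...`.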
 **cURL template:**
 
 ```shell
@@ -91,6 +125,7 @@ Common CSV options:
 - `field_delimiter=','`: Use a custom delimiter (default is `,`).
 - `quote='\"'`: Quote character.
 - `record_delimiter='\n'`: Line delimiter.
+- `error_on_column_count_mismatch=false`: Allow a column count mismatch and ignore extra fields.
 
 Examples:
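Before turning on `error_on_column_count_mismatch=false`, it can be worth checking how many fields each line of a file really has; a small local sketch with `awk` (naive comma split, file name illustrative):

```shell
# A file whose second row has one field too many.
cat > /tmp/check.csv << 'EOF'
id,name
1,Carol,extra
EOF

# NF is awk's per-record field count; mismatched rows stand out.
awk -F, '{ print NR ": " NF " fields" }' /tmp/check.csv
```

Rows reported with more fields than the target column list are the ones the option would silently truncate.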

@@ -135,6 +170,8 @@ docker logs -f databend-streaming-load
 
 ### Step 2. Create a Table
 
+Create a table with an extra `city` column (used later in the optional step):
+
 ```shell
 curl -sS -u databend:databend \
 -H 'Content-Type: application/json' \
@@ -143,7 +180,7 @@ curl -sS -u databend:databend \
 
 curl -sS -u databend:databend \
 -H 'Content-Type: application/json' \
--d '{"sql":"create or replace table demo.people (id int, name string, age int)"}' \
+-d '{"sql":"create or replace table demo.people (id int, name string, age int, city string)"}' \
 http://localhost:8000/v1/query/ >/dev/null
 ```
 
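For longer statements, building the JSON body in a variable keeps the quoting manageable; a sketch (variable names are illustrative, and this simple `printf` only works while the SQL contains no double quotes or backslashes, which would need JSON escaping):

```shell
# Build the JSON payload for the query endpoint from a plain SQL string.
sql='create or replace table demo.people (id int, name string, age int, city string)'
body=$(printf '{"sql":"%s"}' "$sql")
echo "$body"
```

The result can then be passed to curl as `-d "$body"` instead of an inline single-quoted literal.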

@@ -185,6 +222,38 @@ curl -sS -u databend:databend \
 http://localhost:8000/v1/query/
 ```
 
+### (Optional) Step 6. Load into selected columns with `VALUES`
+
+This step shows how to load only some columns from the uploaded file and fill the rest with constants.
+
+1. Prepare a CSV file that contains only `name` and `age`:
+
+```shell
+cat > people_name_age.csv << 'EOF'
+name,age
+Carol,25
+Dave,52
+EOF
+```
+
+2. Load the file into `demo.people`, setting `city` to a constant:
+
+```shell
+curl -sS -u databend:databend \
+  -H "X-Databend-SQL: insert into demo.people(name,age,city) values (?, ?, 'BJ') from @_databend_load file_format=(type=csv skip_header=1)" \
+  -F "upload=@./people_name_age.csv" \
+  -X PUT "http://localhost:8000/v1/streaming_load"
+```
+
+3. Verify:
+
+```shell
+curl -sS -u databend:databend \
+  -H 'Content-Type: application/json' \
+  -d '{"sql":"select id,name,age,city from demo.people order by name"}' \
+  http://localhost:8000/v1/query/
+```
+
 ## Troubleshooting
 
 - `404 Not Found` on `/v1/streaming_load`: use `datafuselabs/databend:nightly` (or build Databend from source).
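The rows that Step 6's `values (?, ?, 'BJ')` mapping produces can be previewed locally; a sketch that applies the positional rule by hand with `awk` (this is not Databend itself, just an illustration of the mapping):

```shell
# Same file as in Step 6: two data columns, name and age.
cat > /tmp/people_name_age.csv << 'EOF'
name,age
Carol,25
Dave,52
EOF

# Skip the header, then emit (name, age, constant city) per row,
# mirroring how each ? consumes one file field in order.
tail -n +2 /tmp/people_name_age.csv | awk -F, '{ print $1 "," $2 ",BJ" }'
```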
