Commit b791cc8
docs: add self-hosted HTTP streaming load (#3034)
1 parent 335aaba commit b791cc8

6 files changed: +351 -6 lines changed
docs/cn/guides/20-self-hosted/04-references/admin-users.md

Lines changed: 2 additions & 1 deletion
@@ -1,5 +1,6 @@
 ---
 title: 配置管理用户
+sidebar_position: 10
 ---
 import FunctionDescription from '@site/src/components/FunctionDescription';

@@ -30,4 +31,4 @@ Databend 不提供任何开箱即用的内置管理用户。在 Databend 启动
 
 ```shell
 echo -n "databend" | sha256sum
-```
+```
Lines changed: 165 additions & 0 deletions
@@ -0,0 +1,165 @@
---
title: HTTP Streaming Load (Local Files)
sidebar_label: HTTP Streaming Load
sidebar_position: 30
---

This page describes **HTTP Streaming Load** (the Databend Query HTTP handler) for **self-hosted Databend** deployments.

It lets you upload a local file over HTTP and write it into a table in the same request, without uploading the file to a stage first.

## Overview

HTTP Streaming Load is an endpoint built for "upload and load in one pass": the server receives a file stream uploaded as `multipart/form-data`, then executes an `INSERT` statement that reads from the special placeholder `@_databend_load` and writes the file content into the target table.

Typical use cases:

- You want to load a local file directly, without staging it first.
- The file is large and not well suited to being stored in object storage as a single object.

## API Usage

**Endpoint:** `PUT /v1/streaming_load`

### Request

- Authentication: HTTP Basic auth (same as the other HTTP handler endpoints).
- Headers:
  - `X-Databend-SQL` (required): an `INSERT` statement that reads from `@_databend_load`.
- Body:
  - `multipart/form-data` containing exactly one file field; the field name must be `upload`.

**SQL structure (required):**

```sql
INSERT INTO <db>.<table>
FROM @_databend_load
FILE_FORMAT=(type=<format> [<options>...])
```

**cURL template:**

```shell
curl -u "<user>:<password>" \
  -H "X-Databend-SQL: insert into <db>.<table> from @_databend_load file_format=(type=csv ...)" \
  -F "upload=@./file.csv" \
  -X PUT "http://<host>:8000/v1/streaming_load"
```
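
The SQL in `X-Databend-SQL` often contains single quotes and commas, which are easy to mangle inside a double-quoted shell string. A minimal sketch: keep the statement in a shell variable first (the `demo.people` table and `databend:databend` credentials are the ones used in the tutorial below):

```shell
# Keep the SQL in a variable so quotes inside FILE_FORMAT options stay readable.
LOAD_SQL="insert into demo.people from @_databend_load file_format=(type=csv field_delimiter=',' skip_header=1)"

curl -u "databend:databend" \
  -H "X-Databend-SQL: ${LOAD_SQL}" \
  -F "upload=@./people.csv" \
  -X PUT "http://localhost:8000/v1/streaming_load"
```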

### Response

On success, the response looks like:

```json
{"id":"<query_id>","stats":{"rows":<rows>,"bytes":<bytes>}}
```

### Notes and Limitations

- `@_databend_load` is only valid in `PUT /v1/streaming_load`; executing such a statement via `POST /v1/query` returns an error.
- Formats currently supported by streaming load: **CSV**, **TSV**, **NDJSON**, and **Parquet**.
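
For example, a tab-separated file with a header row only needs the format type changed. A sketch following the cURL template above (`people.tsv` is a hypothetical local file; `demo.people` is the table from the tutorial below):

```shell
# Same request shape as for CSV, only the FILE_FORMAT type differs.
curl -u "databend:databend" \
  -H "X-Databend-SQL: insert into demo.people from @_databend_load file_format=(type=tsv skip_header=1)" \
  -F "upload=@./people.tsv" \
  -X PUT "http://localhost:8000/v1/streaming_load"
```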

## CSV FILE_FORMAT options

CSV parsing rules are specified via `FILE_FORMAT=(...)`, using the same syntax as Databend's file format options. For more options, see [Input & Output File Formats](/sql/sql-reference/file-format-options).

Common options:

- `skip_header=1`: Skip the first (header) row.
- `field_delimiter=','`: Field delimiter (default `,`).
- `quote='\"'`: Quote character.
- `record_delimiter='\n'`: Record (line) delimiter.

Examples:

```text
X-Databend-SQL: insert into demo.people from @_databend_load file_format=(type=csv skip_header=1)
```

```text
X-Databend-SQL: insert into demo.people from @_databend_load file_format=(type=csv field_delimiter='|' quote='\"' skip_header=1)
```

## Tutorial

The following walks through the complete flow, from starting Databend to loading a local CSV file.

### Before You Start

- A running self-hosted `databend-query` (HTTP handler default port `8000`).
- `curl` installed locally.

### Step 1: Quick Start with Docker (for Verification)

:::note
`PUT /v1/streaming_load` is available in recent nightly builds. If the stable image you are using returns `404 Not Found`, switch to `:nightly` (or build Databend from source).
:::

```shell
docker run -d --name databend-streaming-load \
  -p 8000:8000 \
  -e MINIO_ENABLED=true \
  -e QUERY_DEFAULT_USER=databend \
  -e QUERY_DEFAULT_PASSWORD=databend \
  --restart unless-stopped \
  datafuselabs/databend:nightly
```

Wait for the service to become ready:

```shell
docker logs -f databend-streaming-load
```
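
Tailing the logs works for a manual check; a minimal polling sketch that waits until the query endpoint answers a trivial statement (credentials match the container above):

```shell
# Poll the HTTP handler until it accepts a trivial query.
until curl -fsS -u databend:databend \
  -H 'Content-Type: application/json' \
  -d '{"sql":"select 1"}' \
  http://localhost:8000/v1/query/ >/dev/null 2>&1; do
  echo "waiting for databend-query..."
  sleep 2
done
echo "databend-query is ready"
```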

### Step 2: Create the Database and Table

```shell
curl -sS -u databend:databend \
  -H 'Content-Type: application/json' \
  -d '{"sql":"create database if not exists demo"}' \
  http://localhost:8000/v1/query/ >/dev/null

curl -sS -u databend:databend \
  -H 'Content-Type: application/json' \
  -d '{"sql":"create or replace table demo.people (id int, name string, age int)"}' \
  http://localhost:8000/v1/query/ >/dev/null
```
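
To confirm the table exists before loading, a quick check is to describe it (a small sketch using the same query endpoint):

```shell
# Show the column definitions of demo.people; an error here means the
# previous step did not succeed.
curl -sS -u databend:databend \
  -H 'Content-Type: application/json' \
  -d '{"sql":"desc demo.people"}' \
  http://localhost:8000/v1/query/
```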

### Step 3: Prepare a Local CSV File

The sample CSV:

- The first row is a header (`id,name,age`).
- Fields are separated by commas (`,`).

```shell
cat > people.csv << 'EOF'
id,name,age
1,Alice,30
2,Bob,41
EOF
```

### Step 4: Upload and Load

```shell
curl -sS -u databend:databend \
  -H "X-Databend-SQL: insert into demo.people from @_databend_load file_format=(type=csv field_delimiter=',' skip_header=1)" \
  -F "upload=@./people.csv" \
  -X PUT "http://localhost:8000/v1/streaming_load"
```
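
Each request loads one file. If you have several CSV files with the same layout, a simple loop works; this is a sketch under that assumption:

```shell
# Load every *.csv in the current directory into demo.people, one request per file.
for f in ./*.csv; do
  echo "loading ${f}"
  curl -sS -u databend:databend \
    -H "X-Databend-SQL: insert into demo.people from @_databend_load file_format=(type=csv field_delimiter=',' skip_header=1)" \
    -F "upload=@${f}" \
    -X PUT "http://localhost:8000/v1/streaming_load"
  echo
done
```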

### Step 5: Verify the Result

```shell
curl -sS -u databend:databend \
  -H 'Content-Type: application/json' \
  -d '{"sql":"select * from demo.people order by id"}' \
  http://localhost:8000/v1/query/
```

## Troubleshooting

- `/v1/streaming_load` returns `404 Not Found`: use `datafuselabs/databend:nightly` (or build from source).
- `415 Unsupported Media Type`: the request must be `multipart/form-data`, and the file field must be named `upload`.
- Missing `X-Databend-SQL`: add the header and make sure the SQL contains `FROM @_databend_load`.
Lines changed: 3 additions & 2 deletions
@@ -1,3 +1,4 @@
 {
-  "label": "节点配置"
-}
+  "label": "节点配置",
+  "position": 20
+}

docs/en/guides/20-self-hosted/04-references/admin-users.md

Lines changed: 2 additions & 1 deletion
@@ -1,5 +1,6 @@
 ---
 title: Configuring Admin Users
+sidebar_position: 10
 ---
 import FunctionDescription from '@site/src/components/FunctionDescription';

@@ -30,4 +31,4 @@ To generate the **auth_string**, use cryptographic hash functions. Here's how yo
 
 ```shell
 echo -n "databend" | sha256sum
-```
+```
Lines changed: 176 additions & 0 deletions
@@ -0,0 +1,176 @@
---
title: HTTP Streaming Load (Local Files)
sidebar_label: HTTP Streaming Load
sidebar_position: 30
---

This page describes **HTTP Streaming Load** for **self-hosted Databend** (the Databend Query HTTP handler).

Use it to upload a local file and load it into a table **in the same request** (no staging step).

## Overview

HTTP Streaming Load is an HTTP endpoint that accepts a file upload (multipart) and executes an `INSERT` statement that reads from the special placeholder `@_databend_load`.

It is useful when you:

- Want to load a local file without uploading it to a stage first.
- Need to stream large files that should not be stored as a single object in object storage.

## API Usage

**Endpoint:** `PUT /v1/streaming_load`

### Request

- Auth: HTTP Basic auth (same as other HTTP handler endpoints).
- Headers:
  - `X-Databend-SQL` (required): an `INSERT` statement that reads from `@_databend_load`.
- Body:
  - `multipart/form-data` with a single file field named `upload`.

**SQL format (required):**

```sql
INSERT INTO <db>.<table>
FROM @_databend_load
FILE_FORMAT=(type=<format> [<options>...])
```

**cURL template:**

```shell
curl -u "<user>:<password>" \
  -H "X-Databend-SQL: insert into <db>.<table> from @_databend_load file_format=(type=csv ...)" \
  -F "upload=@./file.csv" \
  -X PUT "http://<host>:8000/v1/streaming_load"
```

Example (CSV with header row):

```text
X-Databend-SQL: insert into demo.people from @_databend_load file_format=(type=csv skip_header=1)
```

### Response

On success, returns JSON like:

```json
{"id":"<query_id>","stats":{"rows":<rows>,"bytes":<bytes>}}
```
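
In scripts it helps to fail fast when nothing was loaded. A minimal sketch, assuming `jq` is installed and using the table and credentials from the tutorial below, that captures the response and checks the reported row count:

```shell
# Capture the streaming-load response and stop if no rows were loaded.
resp=$(curl -sS -u databend:databend \
  -H "X-Databend-SQL: insert into demo.people from @_databend_load file_format=(type=csv skip_header=1)" \
  -F "upload=@./people.csv" \
  -X PUT "http://localhost:8000/v1/streaming_load")

rows=$(echo "${resp}" | jq -r '.stats.rows')
if [ "${rows}" = "0" ] || [ "${rows}" = "null" ]; then
  echo "load failed or loaded 0 rows: ${resp}" >&2
  exit 1
fi
echo "loaded ${rows} rows"
```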

### Notes & Limitations

- `@_databend_load` is only valid for `PUT /v1/streaming_load`. It is not accepted by `POST /v1/query`.
- Supported formats for streaming load: **CSV**, **TSV**, **NDJSON**, **Parquet**.
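
For example, loading an NDJSON file only changes the `type` in `FILE_FORMAT`. A sketch following the cURL template above (`demo.people` is the tutorial table; `people.ndjson` is a hypothetical local file whose objects carry `id`, `name`, and `age` keys):

```shell
curl -u "databend:databend" \
  -H "X-Databend-SQL: insert into demo.people from @_databend_load file_format=(type=ndjson)" \
  -F "upload=@./people.ndjson" \
  -X PUT "http://localhost:8000/v1/streaming_load"
```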

## CSV FILE_FORMAT options

CSV options are specified in the `FILE_FORMAT=(...)` clause, using the same syntax as Databend file format options. See [Input & Output File Formats](/sql/sql-reference/file-format-options).

Common CSV options:

- `skip_header=1`: Skip the first header row.
- `field_delimiter=','`: Use a custom delimiter (default is `,`).
- `quote='\"'`: Quote character.
- `record_delimiter='\n'`: Line delimiter.

Examples:

```text
X-Databend-SQL: insert into demo.people from @_databend_load file_format=(type=csv skip_header=1)
```

```text
X-Databend-SQL: insert into demo.people from @_databend_load file_format=(type=csv field_delimiter='|' quote='\"' skip_header=1)
```

## Tutorial

This tutorial uses a local CSV file and loads it into a table on a self-hosted Databend.

### Before You Start

- A running self-hosted `databend-query` with the HTTP handler enabled (default port `8000`).
- `curl` installed.

### Step 1. Start Databend with Docker (Quick Test)

:::note
`PUT /v1/streaming_load` is available in recent nightly builds. If you use a stable Docker tag and get `404 Not Found`, try the `:nightly` image (or build Databend from source).
:::

```shell
docker run -d --name databend-streaming-load \
  -p 8000:8000 \
  -e MINIO_ENABLED=true \
  -e QUERY_DEFAULT_USER=databend \
  -e QUERY_DEFAULT_PASSWORD=databend \
  --restart unless-stopped \
  datafuselabs/databend:nightly
```

Wait until it is ready:

```shell
docker logs -f databend-streaming-load
```

### Step 2. Create a Database and Table

```shell
curl -sS -u databend:databend \
  -H 'Content-Type: application/json' \
  -d '{"sql":"create database if not exists demo"}' \
  http://localhost:8000/v1/query/ >/dev/null

curl -sS -u databend:databend \
  -H 'Content-Type: application/json' \
  -d '{"sql":"create or replace table demo.people (id int, name string, age int)"}' \
  http://localhost:8000/v1/query/ >/dev/null
```

### Step 3. Prepare a Local CSV File

This tutorial uses a CSV file with:

- A header row (`id,name,age`)
- Comma delimiter (`,`)

```shell
cat > people.csv << 'EOF'
id,name,age
1,Alice,30
2,Bob,41
EOF
```

### Step 4. Upload and Load with HTTP Streaming Load

Send a `PUT /v1/streaming_load` request:

- The SQL must be provided in the `X-Databend-SQL` header.
- The file must be uploaded as `multipart/form-data` with the field name `upload`.

```shell
curl -sS -u databend:databend \
  -H "X-Databend-SQL: insert into demo.people from @_databend_load file_format=(type=csv field_delimiter=',' skip_header=1)" \
  -F "upload=@./people.csv" \
  -X PUT "http://localhost:8000/v1/streaming_load"
```

### Step 5. Verify the Data

```shell
curl -sS -u databend:databend \
  -H 'Content-Type: application/json' \
  -d '{"sql":"select * from demo.people order by id"}' \
  http://localhost:8000/v1/query/
```
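
The query endpoint returns a JSON envelope around the rows. If `jq` is available, you can print just the result set; this sketch assumes the response exposes the rows under a top-level `data` field:

```shell
# Print only the rows from the query response (assumes a top-level "data" field).
curl -sS -u databend:databend \
  -H 'Content-Type: application/json' \
  -d '{"sql":"select * from demo.people order by id"}' \
  http://localhost:8000/v1/query/ | jq '.data'
```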

## Troubleshooting

- `404 Not Found` on `/v1/streaming_load`: use `datafuselabs/databend:nightly` (or build Databend from source).
- `415 Unsupported Media Type`: send `multipart/form-data` and include exactly one file field named `upload`.
- `400 Missing required header X-Databend-SQL`: add the `X-Databend-SQL` header and make sure it contains `FROM @_databend_load`.
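
If you are unsure whether your running build is recent enough for `/v1/streaming_load`, a quick check (a sketch using the tutorial's credentials) is to ask the server for its version before debugging a `404`:

```shell
# Print the server version via the query endpoint.
curl -sS -u databend:databend \
  -H 'Content-Type: application/json' \
  -d '{"sql":"select version()"}' \
  http://localhost:8000/v1/query/
```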
Lines changed: 3 additions & 2 deletions
@@ -1,3 +1,4 @@
 {
-  "label": "Node Configurations"
-}
+  "label": "Node Configurations",
+  "position": 20
+}
