Commit 4febf28

guilload and fmassot authored
Document source configuration (#1053)
Co-authored-by: François Massot <[email protected]>
1 parent f3969a6 commit 4febf28

1 file changed, +117 -1 lines changed

docs/reference/source-config.md

Lines changed: 117 additions & 1 deletion
@@ -3,4 +3,120 @@ title: Source configuration
position: 4
---

-WIP on Notion.

Quickwit can insert data into an index from one or more sources. When creating an index, sources are declared in the [index config](index-config.md). Additional sources can be added later using the [CLI command](cli.md#source) `quickwit source add`.

A source is declared using an object called a source config, which uniquely identifies and defines the source. It consists of three parameters:

- source ID
- source type
- source parameters

*Source ID*

The source ID is a string that uniquely identifies the source within an index. It may only contain uppercase or lowercase ASCII letters, digits, hyphens (`-`), periods (`.`), and underscores (`_`). The source ID must start with a letter and must not be longer than 255 characters.

*Source type*

The source type designates the kind of source being configured. As of version 0.2, available source types are `file` and `kafka`.

*Source parameters*

The source parameters indicate how to connect to a data store and are specific to the type of source.

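The source ID rules described above can be checked before calling `quickwit source add`. The following is a minimal sketch, not Quickwit's own validation code: the function name `is_valid_source_id` is hypothetical, and the patterns are only a transcription of the rules stated in this section.

```bash
# Hypothetical helper: succeeds (status 0) if the argument satisfies the source ID
# rules described above, and fails (status 1) otherwise.
# The body runs in a subshell, so LC_ALL and `exit` do not affect the caller.
is_valid_source_id() (
  # Use the C locale so that letter ranges only cover ASCII letters.
  LC_ALL=C
  id=$1
  # At most 255 characters.
  [ "${#id}" -le 255 ] || exit 1
  # Must start with a letter (this also rejects the empty string).
  case $id in
    [A-Za-z]*) ;;
    *) exit 1 ;;
  esac
  # Only letters, digits, hyphens, periods, and underscores are allowed.
  case $id in
    *[!A-Za-z0-9._-]*) exit 1 ;;
  esac
  exit 0
)

if is_valid_source_id "my-source-id"; then
  echo "valid"
else
  echo "invalid"
fi
```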
## File source

A file source reads data from a local file. The file must consist of JSON objects separated by a newline. As of version 0.2, compressed files (bz2, gzip, ...) and remote files (Amazon S3, HTTP, ...) are not supported.

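To make the expected file layout concrete, the sketch below writes a two-line sample file and runs a rough shape check on it. The file name `sample.json`, the sample documents, and the helper `count_suspect_lines` are all hypothetical. The check relies on `grep` only, so it catches obvious problems such as a pretty-printed file but does not actually parse JSON.

```bash
# Write two sample documents, one JSON object per line (made-up data).
cat << 'EOF' > sample.json
{"id": 1, "body": "first document"}
{"id": 2, "body": "second document"}
EOF

# Hypothetical helper: prints the number of non-empty lines that do not start
# with '{' and end with '}'. This is a shape check, not a JSON parser.
count_suspect_lines() {
  grep -v '^[[:space:]]*$' "$1" | grep -cv '^[[:space:]]*{.*}[[:space:]]*$' || true
}

count_suspect_lines sample.json
```

A count of zero only means every line looks like a single object; a count above zero suggests the file should be rewritten with one object per line before it is used as a file source.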
### File source parameters

| Property | Description | Default value |
| --- | --- | --- |
| filepath | Path to a local file consisting of JSON objects separated by a newline. | |

*Declaring a file source in an [index config](index-config.md) (YAML)*

```yaml
# Version of the index config file format
version: 0

# Sources
sources:
  - source_id: my-source-id
    source_type: file
    params:
      filepath: path/to/local/file.json

# The rest of your index config here
# ...
```

*Adding a file source to an index with the [CLI](cli.md#source)*

```bash
quickwit source add --index my-index-id --source my-source-id --type file --params '{"filepath": "path/to/file.json"}'
```

Finally, note that the [CLI command](cli.md#index) `quickwit index ingest` allows ingesting data directly from a file or the standard input without creating a source beforehand.

## Kafka source

A Kafka source reads data from a Kafka stream. Each message in the stream must hold a JSON object.

### Kafka source parameters

The Kafka source consumes a `topic` using the client library [librdkafka](https://github.com/edenhill/librdkafka) and forwards the key-value pairs carried by the parameter `client_params` to the underlying librdkafka consumer. Common `client_params` options are bootstrap servers (`bootstrap.servers`), consumer group ID (`group.id`), and security protocol (`security.protocol`). Please refer to the [Kafka](https://kafka.apache.org/documentation/#consumerconfigs) and [librdkafka](https://github.com/edenhill/librdkafka/blob/master/CONFIGURATION.md) documentation pages for more advanced options.

| Property | Description | Default value |
| --- | --- | --- |
| topic | Name of the topic to consume. | |
| client_log_level | librdkafka client log level. Possible values are: debug, info, warn, error. | info |
| client_params | librdkafka client configuration parameters. | |

Note that the Kafka source manages commit offsets manually using Quickwit’s index checkpoint mechanism and always disables auto-commit.

*Declaring a Kafka source in an [index config](index-config.md) (YAML)*

```yaml
# Version of the index config file format
version: 0

# Sources
sources:
  - source_id: my-source-id
    source_type: kafka
    params:
      topic: my-topic
      client_params:
        bootstrap.servers: localhost:9092
        group.id: my-group-id
        security.protocol: SSL

# The rest of your index config here
# ...
```

*Adding a Kafka source to an index with the [CLI](cli.md#source)*

```bash
cat << EOF > my-kafka-source.json
{
  "topic": "my-topic",
  "client_params": {
    "bootstrap.servers": "localhost:9092",
    "group.id": "my-group-id",
    "security.protocol": "SSL"
  }
}
EOF
quickwit source add --index my-index-id --source my-source-id --type kafka --params my-kafka-source.json
```
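The Kafka examples in this section leave `client_log_level` at its default. Assuming this property sits next to `topic` and `client_params` in the source parameters, as the properties table suggests, a params file that raises the librdkafka log level might look like the sketch below. The file name `my-kafka-source-debug.json` is hypothetical.

```bash
# Hypothetical params file: same topic and brokers as the example in this section,
# with the librdkafka client log level raised from the default (info) to debug.
cat << EOF > my-kafka-source-debug.json
{
  "topic": "my-topic",
  "client_log_level": "debug",
  "client_params": {
    "bootstrap.servers": "localhost:9092",
    "group.id": "my-group-id"
  }
}
EOF
```

The resulting file can then be passed to `quickwit source add` through `--params`, in the same way as `my-kafka-source.json`.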

## Deleting a source from an index

A source can be removed from an index using the [CLI command](cli.md) `quickwit source delete`:

```bash
quickwit source delete --index my-index-id --source my-source-id
```

When deleting a source, the checkpoint associated with the source is also removed.
