
Commit d6bbdf3

Add storage uri and metastore uri (#1066)
* Add storage uri and metastore uri * Fix broken links
1 parent 3bd0b92 commit d6bbdf3

File tree

9 files changed

+178
-8
lines changed

docs/reference/index-config.md

Lines changed: 1 addition & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -1,6 +1,6 @@
11
---
22
title: Index configuration
3-
position: 3
3+
position: 4
44
---
55

66
This page describes how to configure an index.

docs/reference/metastore-config.md

Lines changed: 94 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,94 @@
1+
---
2+
title: Metastore configuration
3+
position: 3
4+
---
5+
6+
Quickwit needs a place to store meta-information about its indexes.
7+
8+
For instance:
9+
10+
- The index configuration.
11+
- Meta-information about its splits, such as their IDs, the number of documents they contain, their sizes, their min/max timestamps, and the set of tags present in each split.
12+
- The checkpoints of the different sources.
13+
- Some extra information such as the index creation time.
14+
15+
The metastore is entirely defined by a single URI. One can set it by editing the `metastore_uri` parameter of the [Quickwit configuration file](https://www.notion.so/Quickwit-configuration-MERGED-3fab5a181a9a43cba83db2fb25b46729) (often named `quickwit.yaml`).
16+
17+
Currently, Quickwit offers two implementations:
18+
19+
- **PostgreSQL**: recommended for distributed usage.
20+
- **File-backed implementation**.
21+
22+
# PostgreSQL Metastore
23+
24+
We recommend the PostgreSQL metastore for any distributed usage.
25+
26+
The PostgreSQL metastore can be configured by setting a PostgreSQL URI in the `metastore_uri` parameter of the Quickwit configuration file. The URI takes the following format:
27+
28+
```
29+
postgres://[user]:[password]@[host]:[port]/[dbname]
30+
```
31+
32+
Some of those parameters can be omitted. For instance, the following PostgreSQL URIs are valid:
33+
34+
```
35+
postgres://localhost/mydb
36+
postgres://user@localhost
37+
postgres://user:secret@localhost
38+
postgres://host1:123,host2:456/mydb
39+
```
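
To see which optional parts a given URI actually carries, here is a minimal Python sketch based on the standard library's `urllib.parse.urlsplit`. The helper name `describe_postgres_uri` is hypothetical and not part of Quickwit. It only illustrates the URI shape for single-host URIs, and it is not a description of how Quickwit itself parses the value.

```python
from urllib.parse import urlsplit


def describe_postgres_uri(uri: str) -> dict:
    """Split a single-host PostgreSQL URI into its optional parts.

    Hypothetical helper for illustration only. Multi-host URIs such as
    postgres://host1:123,host2:456/mydb are not handled here.
    """
    parts = urlsplit(uri)
    if parts.scheme != "postgres":
        raise ValueError(f"not a postgres URI: {uri!r}")
    return {
        "user": parts.username,      # None when omitted
        "password": parts.password,  # None when omitted
        "host": parts.hostname,
        "dbname": parts.path.lstrip("/") or None,  # None when omitted
    }
```

Calling it on `postgres://user@localhost` reports a user and a host, with the password and database name left as `None`.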
40+
41+
The database has to be created in advance.
42+
43+
On its first execution, Quickwit will transparently create the necessary tables.
44+
45+
Likewise, if you upgrade Quickwit to a version that includes changes to the PostgreSQL schema, Quickwit will transparently run the migration at startup.
46+
47+
# File-backed metastore
48+
49+
For convenience, Quickwit also makes it possible to store its metadata in files using a file-backed metastore. In that case, Quickwit will write one file per index.
50+
51+
The metastore is then configured by passing a [Storage URI](https://www.notion.so/Storage-URI-APPROVED-176d8befb8d144fb820bcd0df077a728) that will serve as the root of the metastore storage.
52+
53+
The metadata file associated with a given index will then be stored under
54+
55+
`[storage_uri]/[index_id]/metastore.json`
56+
57+
For the moment, Quickwit supports two types of storage:
58+
59+
- a local file system URI (e.g., `file:///opt/toto`). It is also valid to pass a file path directly, without `file://` (e.g., `/var/quickwit`). Relative paths are resolved with respect to the current working directory.
60+
- an S3-compatible storage URI (e.g., `s3://my-bucket/some-path`). See the [Storage URI](https://www.notion.so/Storage-URI-APPROVED-176d8befb8d144fb820bcd0df077a728) documentation to configure S3 or S3-compatible storage.
61+
62+
### Polling configuration
63+
64+
By default, the file-backed metastore is only read once, when you start a Quickwit process (searcher, indexer, ...).
65+
66+
You can also configure it to poll the File-Backed Metastore periodically to keep a fresh view of it. This is useful for a Searcher instance that needs to be aware of new splits published by an Indexer running in parallel.
67+
68+
To configure the polling interval (in seconds only), add a URI fragment to the storage URI like this: `s3://quickwit/my-indexes#polling_interval=30s`
69+
70+
<aside>
71+
👌 Amazon S3 charges $0.0004 per 1000 GET requests. Polling a metastore every 30 seconds will induce a cost of $0.04 per month and per index.
72+
73+
</aside>
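
The cost estimate above can be reproduced with a short back-of-the-envelope calculation. This is a minimal sketch that assumes a 30-day month and exactly one GET request per poll and per index. Both are assumptions made for illustration, and the function name is hypothetical.

```python
PRICE_PER_1000_GETS_USD = 0.0004        # figure quoted above
SECONDS_PER_MONTH = 30 * 24 * 60 * 60   # assumption: 30-day month


def monthly_polling_cost(interval_secs: int,
                         price_per_1000: float = PRICE_PER_1000_GETS_USD) -> float:
    """Estimated monthly GET cost (USD) for polling one index.

    Assumes one GET request per poll; real request counts may differ.
    """
    requests_per_month = SECONDS_PER_MONTH / interval_secs
    return requests_per_month / 1000 * price_per_1000


cost = monthly_polling_cost(30)  # 86,400 requests per month
print(f"${cost:.4f} per month and per index")
```

Under these assumptions, a 30-second interval comes out to roughly $0.035 per month and per index, which is in line with the rounded $0.04 figure.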
74+
75+
### Examples
76+
77+
For instance, the following file-backed metastore URIs are valid:
78+
79+
```markdown
80+
s3://my-indexes
81+
s3://quickwit/my-indexes
82+
s3://quickwit/my-indexes#polling_interval=30s
83+
file:///local/indices
84+
file:///local/indices#polling_interval=30s
85+
/local/indices
86+
./quickwit-metastores
87+
```
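
To make the role of the URI fragment concrete, here is a minimal Python sketch that separates the storage part of such a URI from its optional `polling_interval` fragment. The function name `split_polling_interval` is hypothetical, and the strictness of the checks is an assumption. Quickwit's own parsing may accept or reject other forms.

```python
from typing import Optional, Tuple


def split_polling_interval(metastore_uri: str) -> Tuple[str, Optional[int]]:
    """Return (uri_without_fragment, polling_interval_in_seconds_or_None).

    Hypothetical helper: only the `polling_interval=<N>s` fragment
    described above is recognized.
    """
    base, _, fragment = metastore_uri.partition("#")
    if not fragment:
        return base, None  # no polling: the metastore is read once
    key, _, value = fragment.partition("=")
    if key != "polling_interval" or not value.endswith("s"):
        raise ValueError(f"unsupported fragment: {fragment!r}")
    return base, int(value[:-1])  # strip the trailing "s"
```

For `s3://quickwit/my-indexes#polling_interval=30s`, this yields the root `s3://quickwit/my-indexes` and an interval of 30 seconds. For `/local/indices`, it yields the path unchanged and no interval.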
88+
89+
<aside>
90+
⛔ The file-backed metastore does not allow concurrent writes. For this reason, it should not be used in distributed settings.
91+
Running several indexer services on the same file-backed metastore can lead to the corruption of the metastore.
92+
Running several search services, on the other hand, is perfectly safe.
93+
94+
</aside>

docs/reference/ports.md

Lines changed: 1 addition & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -1,6 +1,6 @@
11
---
22
title: Quickwit's hosts and ports
3-
position: 7
3+
position: 8
44
---
55

66
When starting a quickwit search server, one important parameter that can be configured is

docs/reference/query-language.md

Lines changed: 1 addition & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -1,6 +1,6 @@
11
---
22
title: Query language
3-
position: 6
3+
position: 7
44
---
55

66
Quickwit uses a query mini-language which is used by providing a `query` parameter to the search endpoints.

docs/reference/quickwit-config.md

Lines changed: 1 addition & 1 deletion
@@ -27,7 +27,7 @@ A commented example is accessible here: [quickwit.yaml]([link](https://github.co
2727

2828
## Indexer configuration
2929

30-
This section contains the configuration options for an indexer. The split store is documented in the [indexing document](indexing.md#split-store).
30+
This section contains the configuration options for an indexer. The split store is documented in the [indexing document](../design/indexing.md#split-store).
3131

3232
| Property | Description | Default value |
3333
| --- | --- | --- |

docs/reference/rest-api.md

Lines changed: 1 addition & 1 deletion
@@ -1,6 +1,6 @@
11
---
22
title: Search REST API
3-
position: 5
3+
position: 9
44
---
55

66
## API version

docs/reference/source-config.md

Lines changed: 2 additions & 2 deletions
Original file line numberDiff line numberDiff line change
@@ -1,6 +1,6 @@
11
---
22
title: Source configuration
3-
position: 4
3+
position: 5
44
---
55

66
Quickwit can insert data into an index from one or multiple sources. When creating an index, sources are declared in the [index config](index-config.md). Additional sources can be added later using the [CLI command](cli.md#source) `quickwit source add`.
@@ -56,7 +56,7 @@ sources:
5656
quickwit source add --index my-index-id --source my-source-id --type file --params '{"filepath": "path/to/file.json"}'
5757
```
5858

59-
Finally, note that the [CLI command](clid.md#index) `quickwit index ingest` allows ingesting data directly from a file or the standard input without creating a source beforehand.
59+
Finally, note that the [CLI command](cli.md#index) `quickwit index ingest` allows ingesting data directly from a file or the standard input without creating a source beforehand.
6060

6161
## Kafka source
6262

docs/reference/storage-uri.md

Lines changed: 76 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,76 @@
1+
---
2+
title: Storage URI
3+
position: 6
4+
---
5+
6+
In Quickwit, Storage URIs refer to different kinds of storage.
7+
8+
Generally speaking, you can use a storage URI or a regular file path wherever a file path is expected.
9+
10+
11+
For instance
12+
13+
- when configuring the index storage. (Passed as the `index_uri` in the index command line.)
14+
- when configuring a file-backed metastore. (Passed as the `metastore_uri` in the Quickwit configuration file.)
15+
- when passing a config file in the command line. (You can store your `quickwit.yaml` on Amazon S3 if you want.)
16+
17+
Right now, only two types of storage are supported.
18+
19+
## Local file system
20+
21+
One can refer to the file system storage by using a file path directly, or a URI with the `file://` protocol. Relative file paths are allowed and are resolved relative to the current working directory (CWD). `~` can be used as a shortcut to refer to the current user's home directory.
22+
23+
The following are valid local file system URIs:
24+
25+
```markdown
26+
- /var/quickwit
27+
- file:///var/quickwit
28+
- /home/quickwit/data
29+
- ~/data
30+
- ./quickwit
31+
```
32+
33+
<aside>
34+
⚠️ When using the `file://` protocol, a third `/` is necessary to express an absolute path.
35+
36+
For instance, the following URI `file://home/quickwit/` is interpreted as `./home/quickwit`
37+
38+
</aside>
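
The reason for the third `/` shows up in generic URI parsing: with only two slashes, the first path segment is read as the host part of the URI. The minimal Python sketch below uses the standard library's `urllib.parse.urlsplit` to illustrate this. It shows generic URI splitting only, not Quickwit's own resolution logic, and the helper name is hypothetical.

```python
from urllib.parse import urlsplit


def file_uri_parts(uri: str) -> tuple:
    """Return the (host, path) pair a generic URI parser sees."""
    parts = urlsplit(uri)
    return parts.netloc, parts.path


# Three slashes: empty host, absolute path.
print(file_uri_parts("file:///var/quickwit"))
# Two slashes: "home" is taken as the host, not as a directory.
print(file_uri_parts("file://home/quickwit/"))
```

In the second call, `home` is no longer part of an absolute path, which is why the URI does not point to `/home/quickwit`.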
39+
40+
## Amazon S3
41+
42+
It is also possible to refer to Amazon S3 using an S3 URI. S3 URIs must follow this format:
43+
44+
```markdown
45+
s3://<bucket name>/<key>
46+
```
47+
48+
For instance
49+
50+
```markdown
51+
s3://quickwit-prod/quickwit-indexes
52+
```
53+
54+
The credentials, as well as the region or the custom endpoint, have to be configured separately, using the methods described below.
55+
56+
### S3 credentials
57+
58+
Quickwit will detect the S3 credentials using the first successful method in this list (order matters):
59+
60+
- check for environment variables (`AWS_ACCESS_KEY_ID` and `AWS_SECRET_ACCESS_KEY`)
61+
- check for the configuration in the `~/.aws/credentials` file
62+
- check for the [Amazon ECS environment](https://docs.aws.amazon.com/AmazonECS/latest/developerguide/task-iam-roles.html)
63+
- check the [EC2 instance metadata API](https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/instancedata-data-retrieval.html)
64+
65+
### Region
66+
67+
The region will be detected using the first successful method in this list (order matters):
68+
69+
- `AWS_DEFAULT_REGION` environment variable
70+
- `AWS_REGION` environment variable
71+
- [Amazon’s instance metadata API](https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/ec2-instance-metadata.html)
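
The "first successful method wins" order above can be sketched in a few lines of Python. This is an illustration of the lookup order only, not Quickwit's implementation. The function name `resolve_region` is hypothetical, and the instance metadata lookup is replaced by a plain argument so that the example stays self-contained.

```python
from typing import Mapping, Optional


def resolve_region(env: Mapping[str, str],
                   instance_metadata_region: Optional[str] = None) -> Optional[str]:
    """Return the first region found, following the order listed above.

    `instance_metadata_region` stands in for the value the instance
    metadata API would return (assumption made for this sketch).
    """
    for name in ("AWS_DEFAULT_REGION", "AWS_REGION"):
        value = env.get(name)
        if value:  # unset or empty variables are skipped
            return value
    return instance_metadata_region


env = {"AWS_DEFAULT_REGION": "us-east-1", "AWS_REGION": "eu-west-1"}
print(resolve_region(env))
```

With both variables set, `AWS_DEFAULT_REGION` takes precedence because it is checked first. In real use, `os.environ` would be passed as `env`.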
72+
73+
<aside>
74+
⚠️ Custom endpoints are not supported yet.
75+
76+
</aside>

docs/reference/telemetry.md

Lines changed: 1 addition & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -1,6 +1,6 @@
11
---
22
title: Telemetry
3-
position: 8
3+
position: 10
44
---
55

66
Quickwit Inc. collects anonymous data regarding general usage to help us drive our development. Privacy and transparency are at the heart of Quickwit values and we only collect the minimal useful data and don't use any third party tool for the collection.
