Remove split describe and extract from doc as it's ready from a user … (#1059)
* Remove split describe and extract from doc as it's ready from a user point of view. Add last examples.
* Fix guides.
* Fix fmt.
* Fix comments to be compatible with docusaurus.
* Fix broken links.
docs/administration/cloud-env.md
2 additions & 2 deletions
@@ -19,7 +19,7 @@ We recommend picking instances with high network performance to allow faster dow
 A final note on object storage request costs. These are [quite low](https://aws.amazon.com/s3/pricing/) actually, $0.0004 / 1000 requests for GET and $0.005 / 1000 requests for PUT on AWS S3.
 
 ### PUT requests
-
+=======
 During indexing, Quickwit uploads new splits on Amazon S3 and progressively merges them until they reach 10 million documents that we call “mature splits”. Such splits have a typical size between 1GB and 10GB and will usually require 2 PUT requests to be uploaded (1 PUT request / 5GB).
 
 With default indexing parameters `commit_timeout_secs` of 60 seconds and `merge_policy.merge_factor` of 10 and assuming you want to ingest 1 million documents every minute, this will cost you less than $1 / month.
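
As a rough check of the "less than $1 / month" figure, the arithmetic can be sketched in Python. This is a simplified model and not Quickwit's actual merge policy: it assumes one new split per commit uploaded with a single PUT, merges in groups of `merge_factor` until a split reaches 10 million documents, 2 PUT requests per mature split, and a 30-day month. The function name and every parameter other than `commit_timeout_secs` and `merge_factor` are hypothetical.

```python
PUT_PRICE_PER_1000 = 0.005  # USD per 1000 PUT requests, as quoted above for AWS S3


def monthly_put_cost(
    docs_per_minute=1_000_000,
    commit_timeout_secs=60,
    merge_factor=10,
    mature_docs=10_000_000,
    puts_per_small_split=1,   # assumption: a split below 5GB needs one PUT
    puts_per_mature_split=2,  # from the text: mature splits usually need 2 PUTs
    days=30,                  # assumption: 30-day month
):
    """Estimate the monthly PUT request cost in USD under the assumptions above."""
    # One new split is uploaded at every commit.
    splits = days * 24 * 3600 // commit_timeout_secs
    docs_per_split = docs_per_minute * commit_timeout_secs // 60
    total_puts = splits * puts_per_small_split

    # Each merge level combines `merge_factor` splits into one larger split.
    while docs_per_split < mature_docs:
        splits //= merge_factor
        docs_per_split *= merge_factor
        if docs_per_split >= mature_docs:
            total_puts += splits * puts_per_mature_split
        else:
            total_puts += splits * puts_per_small_split

    return total_puts * PUT_PRICE_PER_1000 / 1000


print(f"{monthly_put_cost():.4f}")  # 0.2592 under these assumptions
```

With the default parameters this model counts 43,200 commit uploads plus 4,320 mature splits at 2 PUTs each, about 51,840 PUT requests and roughly $0.26 per month, which is consistent with the figure above.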
@@ -29,7 +29,7 @@ With default indexing parameters `commit_timeout_secs` of 60 seconds and `merge_
 When querying, Quickwit needs to make multiple GET requests:
docs/get-started/quickstart.md
6 additions & 6 deletions
@@ -83,21 +83,21 @@ Now we can create the index with the command:
 ./quickwit index create --index-config ./wikipedia_index_config.yaml
 ```
 
-Check that a directory `./qwdata/wikipedia` has been created, Quickwit will write index files here and a `quickwit.json` which contains the [index metadata](../overview/architecture.md#index-metadata).
+Check that a directory `./qwdata/wikipedia` has been created, Quickwit will write index files here and a `quickwit.json` which contains the [index metadata](../design/architecture.md#index).
 You're now ready to fill the index.
 
 
 ## Let's add some documents
 
-Quickwit can index data from many [sources](./sources.md). We will use a new line delimited json [ndjson](http://ndjson.org/) datasets as our data source.
+Quickwit can index data from many [sources](../reference/source-config.md). We will use a new line delimited json [ndjson](http://ndjson.org/) datasets as our data source.
 Let's download [a bunch of wikipedia articles (10 000)](https://quickwit-datasets-public.s3.amazonaws.com/wiki-articles-10000.json) in [ndjson](http://ndjson.org/) format and index it.
-The index config defines four fields: `timestamp`, `severity_text`, `body`, and one object field
-for the nested values `resource.service` . It also sets the `default_search_fields`, the `tag_fields`, and the `timestamp_field`. The `timestamp_field` and `tag_fields` are used by Quickwit for [splits pruning](../overview/architecture.md) at query time to boost search speed. Check out the [index config docs](../reference/index-config.md) for more details.
+The index config defines five fields: `timestamp`, `tenant_id`, `severity_text`, `body`, and one object field
+for the nested values `resource.service` . It also sets the `default_search_fields`, the `tag_fields`, and the `timestamp_field`. The `timestamp_field` and `tag_fields` are used by Quickwit for [splits pruning](../design/architecture.md) at query time to boost search speed. Check out the [index config docs](../reference/index-config.md) for more details.
 We can now create the index with the `create` subcommand.
 
 ```bash
-./quickwit index create --index-config hdfslogs_index_config.yaml --config config.yaml
+./quickwit index create --index-config hdfs_logs_index_config.yaml --config config.yaml
 ```
 
 :::note
 
-This step can also be executed on your local machine. The `create` command creates the index locally and then uploads a json file `metastore.json` to your bucket at `s3://path-to-your-bucket/hdfslogs/metastore.json`.
+This step can also be executed on your local machine. The `create` command creates the index locally and then uploads a json file `metastore.json` to your bucket at `s3://path-to-your-bucket/hdfs-logs/metastore.json`.
 
 :::
 
 ## Index logs
-The dataset is a compressed [ndjson file](https://quickwit-datasets-public.s3.amazonaws.com/hdfs.logs.quickwit.json.gz). Instead of downloading and indexing the data in separate steps, we will use pipes to send a decompressed stream to Quickwit directly.
+The dataset is a compressed [ndjson file](https://quickwit-datasets-public.s3.amazonaws.com/hdfs-logs-multitenants.json.gz). Instead of downloading and indexing the data in separate steps, we will use pipes to send a decompressed stream to Quickwit directly.
 4GB of RAM is enough to index this dataset; an instance like `t4g.medium` with 4GB and 2 vCPU indexed this dataset in 20 minutes.
 
-This step can also be done on your local machine. The `ingest` subcommand generates locally [splits](../overview/architecture.md) of 10 million documents and will upload them on your bucket. Concretely, each split is a bundle of index files and metadata files.
+This step can also be done on your local machine. The `ingest` subcommand generates locally [splits](../design/architecture.md) of 10 million documents and will upload them on your bucket. Concretely, each split is a bundle of index files and metadata files.
 
 :::
 
 
 You can check it's working by using the `search` subcommand to look for `ERROR` in the `severity_text` field:
 ```bash
-./quickwit index search --index hdfslogs --config ./config.yaml --query "severity_text:ERROR"
+./quickwit index search --index hdfs-logs --config ./config.yaml --query "severity_text:ERROR"
 ```
 
 Now that we have indexed the logs and can search from one instance, it's time to configure and start a search cluster.
@@ -205,7 +207,7 @@ INFO quickwit_cluster::cluster: Joined. node_id="searcher-1" remote_host=Some(18
 Now we can query one of our instances directly by issuing HTTP requests to one of the nodes' REST API endpoints.
 You can see that this query has only 364 hits and that the server responds in 0.5 seconds.
 
-The index config shows that we can use the timestamp field parameters `startTimestamp` and `endTimestamp` and benefit from time pruning. Behind the scenes, Quickwit will only query [splits](../overview/architecture.md) that have logs in this time range. This can have a significant impact on speed.
+The index config shows that we can use the timestamp field parameters `startTimestamp` and `endTimestamp` and benefit from time pruning. Behind the scenes, Quickwit will only query [splits](../design/architecture.md) that have logs in this time range. This can have a significant impact on speed.
 
 
 ```bash
@@ -268,11 +270,11 @@ Returns 6 hits in 0.36 seconds.
 Let's do some cleanup by deleting the index:
 
 ```bash
-./quickwit index delete --index hdfslogs --config ./config.yaml
+./quickwit index delete --index hdfs-logs --config ./config.yaml
 ```
 
 Also remember to remove the security group to protect your EC2 instances. You can just remove the instances if you don't need them.
 
 Congratz! You finished this tutorial!
 
-To continue your Quickwit journey, check out the [search REST API reference](../reference/search-api.md) or the [query language reference](../reference/query-language.md).
+To continue your Quickwit journey, check out the [search REST API reference](../reference/rest-api.md) or the [query language reference](../reference/query-language.md).
-The index config defines four fields: `timestamp`, `severity_text`, `body`, and one object field
-for the nested values `resource.service` . It also sets the `default_search_fields`, the `tag_fields`, and the `timestamp_field`.The `timestamp_field` and `tag_fields` are used by Quickwit for [splits pruning](../overview/architecture.md) at query time to boost search speed. Check out the [index config docs](../reference/index-config.md) for more details.
+The index config defines five fields: `timestamp`, `tenant_id`, `severity_text`, `body`, and one object field
+for the nested values `resource.service` . It also sets the `default_search_fields`, the `tag_fields`, and the `timestamp_field`.The `timestamp_field` and `tag_fields` are used by Quickwit for [splits pruning](../design/architecture.md) at query time to boost search speed. Check out the [index config docs](../reference/index-config.md) for more details.
 
-```yaml title="hdfslogs_index_config.yaml"
+```yaml title="hdfs_logs_index_config.yaml"
 version: 0
 
 doc_mapping:
   field_mappings:
     - name: timestamp
       type: i64
       fast: true # Fast field must be present when this is the timestamp field.
+    - name: tenant_id
+      type: u64
+      fast: true
     - name: severity_text
       type: text
       tokenizer: raw # No tokenization.
@@ -65,8 +68,7 @@ doc_mapping:
         - name: service
           type: text
           tokenizer: raw # Text field referenced as tag must have the `raw` tokenizer.
-./quickwit index create --index-config hdfslogs_index_config.yaml
+./quickwit index create --index-config hdfs_logs_index_config.yaml
 ```
 
 You're now ready to fill the index.
 
 ## Index logs
-The dataset is a compressed [ndjson file](https://quickwit-datasets-public.s3.amazonaws.com/hdfs.logs.quickwit.json.gz). Instead of downloading it and then indexing the data, we will use pipes to directly send a decompressed stream to Quickwit.
+The dataset is a compressed [ndjson file](https://quickwit-datasets-public.s3.amazonaws.com/hdfs-logs-multitenants.json.gz). Instead of downloading it and then indexing the data, we will use pipes to directly send a decompressed stream to Quickwit.
 This can take up to 10 min on a modern machine, the perfect time for a coffee break.
 
 ```bash
-curl https://quickwit-datasets-public.s3.amazonaws.com/hdfs.logs.quickwit.json.gz | gunzip | ./quickwit index ingest --index hdfslogs
+curl https://quickwit-datasets-public.s3.amazonaws.com/hdfs-logs-multitenants.json.gz | gunzip | ./quickwit index ingest --index hdfs-logs
 ```
 
 You can check it's working by using the `search` subcommand to look for `ERROR` in the `severity_text` field:
 ```bash
-./quickwit index search --index hdfslogs --query "severity_text:ERROR"
+./quickwit index search --index hdfs-logs --query "severity_text:ERROR"
 ```
 
 :::note
 
-The `ingest` subcommand generates [splits](../overview/architecture.md) of 5 millions documents. Each split is a small piece of index represented by a file in which index files and metadata files are saved.
+The `ingest` subcommand generates [splits](../design/architecture.md) of 5 millions documents. Each split is a small piece of index represented by a file in which index files and metadata files are saved.
 
 :::
 
 
 ## Start your server
 
-The command `service run searcher` starts an http server which provides a [REST API](../reference/search-api.md).
+The command `service run searcher` starts an http server which provides a [REST API](../reference/rest-api.md).
 
 
 ```bash
@@ -123,7 +125,7 @@ The command `service run searcher` starts an http server which provides a [REST
 Let's execute the same query on field `severity_text` but with `cURL`:
-The index config shows that we can use the timestamp field parameters `startTimestamp` and `endTimestamp` and benefit from time pruning. Behind the scenes, Quickwit will only query [splits](../overview/architecture.md) that have logs in this time range.
+The index config shows that we can use the timestamp field parameters `startTimestamp` and `endTimestamp` and benefit from time pruning. Behind the scenes, Quickwit will only query [splits](../design/architecture.md) that have logs in this time range.
 
 Let's use these parameters with the following query:
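
The tutorial's own query is not part of this diff. As an illustration of how the two parameters fit into a request, here is a minimal Python sketch that builds a search URL. The host, port, and `/api/v1/<index>/search` path are assumptions, the helper name is hypothetical, and the timestamp values are placeholders; only `startTimestamp`, `endTimestamp`, the `hdfs-logs` index name, and the `severity_text:ERROR` query come from the text above. Check the REST API reference for the exact endpoint.

```python
from urllib.parse import urlencode


def build_search_url(base_url, index_id, query,
                     start_timestamp=None, end_timestamp=None):
    """Build a search URL with optional time-range parameters.

    The path layout is an assumption, not taken from the Quickwit docs.
    """
    params = {"query": query}
    # Only add the time bounds that were actually provided.
    if start_timestamp is not None:
        params["startTimestamp"] = start_timestamp
    if end_timestamp is not None:
        params["endTimestamp"] = end_timestamp
    return f"{base_url}/api/v1/{index_id}/search?{urlencode(params)}"


url = build_search_url(
    "http://127.0.0.1:8080",  # hypothetical searcher address
    "hdfs-logs",
    "severity_text:ERROR",
    start_timestamp=1442834249,  # placeholder values
    end_timestamp=1442900000,
)
print(url)
```

Restricting the time range lets Quickwit skip splits whose documents fall entirely outside it, which is where the speed-up described above comes from.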
-To continue your Quickwit journey, check out the [tutorial for distributed search](tutorial-hdfs-logs-distributed-search-aws-s3.md) or dig into the [search REST API](../reference/search-api.md) or [query language](../reference/query-language.md).
+To continue your Quickwit journey, check out the [tutorial for distributed search](tutorial-hdfs-logs-distributed-search-aws-s3.md) or dig into the [search REST API](../reference/rest-api.md) or [query language](../reference/query-language.md).