Commit 3bd0b92

Remove split describe and extract from doc as it's ready from a user point of view (#1059)

* Remove split describe and extract from doc as it's ready from a user point of view. Add last examples.
* Fix guides.
* Fix fmt.
* Fix comments to be compatible with docusaurus.
* Fix broken links.

1 parent 1e3d9c1 commit 3bd0b92
File tree

9 files changed: +117 −137 lines changed
config/tutorials/hdfs-logs/index-config.yaml

Lines changed: 2 additions & 0 deletions
@@ -11,6 +11,8 @@ doc_mapping:
     - name: timestamp
       type: i64
       fast: true
+    - name: tenant_id
+      type: u64
     - name: severity_text
       type: text
       tokenizer: raw

docs/administration/cloud-env.md

Lines changed: 2 additions & 2 deletions
@@ -19,7 +19,7 @@ We recommend picking instances with high network performance to allow faster dow
 A final note on object storage request costs. These are [quite low](https://aws.amazon.com/s3/pricing/) actually: $0.0004 / 1000 requests for GET and $0.005 / 1000 requests for PUT on AWS S3.

 ### PUT requests
-=======
+
 During indexing, Quickwit uploads new splits on Amazon S3 and progressively merges them until they reach 10 million documents; we call these "mature splits". Such splits have a typical size between 1GB and 10GB and will usually require 2 PUT requests to be uploaded (1 PUT request / 5GB).

 With default indexing parameters `commit_timeout_secs` of 60 seconds and `merge_policy.merge_factor` of 10, and assuming you want to ingest 1 million documents every minute, this will cost you less than $1 / month.
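The cost claim above can be checked with a quick back-of-the-envelope calculation. This is a sketch under stated assumptions, not Quickwit's actual accounting: it assumes one fresh split per commit and one extra upload per `merge_factor` splits merged, each costing one PUT.

```python
# Back-of-the-envelope PUT cost check (assumptions, not Quickwit internals):
# one fresh split uploaded per 60s commit, and every merge_factor (10) splits
# are merged into one bigger split that is re-uploaded with one more PUT.
PUT_PRICE_USD_PER_1000 = 0.005  # AWS S3 PUT pricing cited in the text

commits_per_month = 30 * 24 * 60         # one commit per minute -> 43,200
merge_uploads = commits_per_month // 10  # one merge upload per 10 fresh splits
total_puts = commits_per_month + merge_uploads

monthly_cost = total_puts / 1000 * PUT_PRICE_USD_PER_1000
print(f"{total_puts} PUTs -> ${monthly_cost:.2f} / month")  # well under $1
```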
@@ -29,7 +29,7 @@ With default indexing parameters `commit_timeout_secs` of 60 seconds and `merge_
 When querying, Quickwit needs to make multiple GET requests:

 ```jsx
-#num requests = #num splits * ((#num search fields * #num terms * 3) + #num fast fields)
+#num requests = #num splits * ((#num search fields * #num terms * 3) + 1 (timestamp fast field if present))
 ```

 The above formula assumes that the hotcache is already cached; it is loaded after the first query on every split.
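The formula above can be written as a small helper to estimate request counts. The example numbers below (20 splits, 2 search fields, a single-term query) are illustrative assumptions, not measurements.

```python
# num_requests = num_splits * ((num_search_fields * num_terms * 3) + timestamp fast field)
def num_get_requests(num_splits: int, num_search_fields: int, num_terms: int,
                     has_timestamp_fast_field: bool = True) -> int:
    per_split = num_search_fields * num_terms * 3
    if has_timestamp_fast_field:
        per_split += 1  # one extra GET for the timestamp fast field
    return num_splits * per_split

# e.g. 20 splits, 2 default search fields, a one-term query:
print(num_get_requests(20, 2, 1))  # 20 * (2*1*3 + 1) = 140
```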

docs/get-started/quickstart.md

Lines changed: 6 additions & 6 deletions
@@ -83,21 +83,21 @@ Now we can create the index with the command:
 ./quickwit index create --index-config ./wikipedia_index_config.yaml
 ```

-Check that a directory `./qwdata/wikipedia` has been created, Quickwit will write index files here and a `quickwit.json` which contains the [index metadata](../overview/architecture.md#index-metadata).
+Check that a directory `./qwdata/wikipedia` has been created. Quickwit will write index files there, along with a `quickwit.json` file that contains the [index metadata](../design/architecture.md#index).
 You're now ready to fill the index.

 ## Let's add some documents

-Quickwit can index data from many [sources](./sources.md). We will use a new line delimited json [ndjson](http://ndjson.org/) datasets as our data source.
+Quickwit can index data from many [sources](../reference/source-config.md). We will use a newline-delimited JSON ([ndjson](http://ndjson.org/)) dataset as our data source.
 Let's download [a bunch of wikipedia articles (10 000)](https://quickwit-datasets-public.s3.amazonaws.com/wiki-articles-10000.json) in [ndjson](http://ndjson.org/) format and index it.

 ```bash
 # Download the first 10_000 Wikipedia articles.
 curl -o wiki-articles-10000.json https://quickwit-datasets-public.s3.amazonaws.com/wiki-articles-10000.json

 # Index our 10k documents.
-./quickwit index ingest --index wikipedia --input-path ./wiki-articles-10000.json
+./quickwit index ingest --index wikipedia --input-path wiki-articles-10000.json
 ```

 Wait a second or two and check that it worked by using the `search` command:

@@ -111,7 +111,7 @@ It should return 10 hits. Now you're ready to serve our search API.

 ## Start the search service

-Quickwit provides a search [REST API](../reference/search-api.md) that can be started using the `service` subcommand.
+Quickwit provides a search [REST API](../reference/rest-api.md) that can be started using the `service` subcommand.

 ```bash
 ./quickwit service run searcher

@@ -165,7 +165,7 @@ curl -o wiki-articles-10000.json https://quickwit-datasets-public.s3.amazonaws.c

 ## Next tutorials

-- [Search on logs with timestamp pruning](../tutorials/tutorial-hdfs-logs.md)
-- [Setup a distributed search on AWS S3](../tutorials/tutorial-hdfs-logs-distributed-search-aws-s3.md)
+- [Search on logs with timestamp pruning](../guides/tutorial-hdfs-logs.md)
+- [Set up distributed search on AWS S3](../guides/tutorial-hdfs-logs-distributed-search-aws-s3.md)
docs/guides/add-full-text-search-to-your-olap-db.md

Lines changed: 6 additions & 11 deletions
@@ -54,25 +54,20 @@ doc_mapping:
     - name: id
       type: u64
       fast: true
-      stored: true
     - name: created_at
       type: i64
       fast: true
-      stored: true
     - name: event_type
       type: text
       tokenizer: raw
-      stored: true
     - name: title
       type: text
       tokenizer: default
       record: position
-      stored: true
     - name: body
       type: text
       tokenizer: default
       record: position
-      stored: true
 search_settings:
   default_search_fields: [title, body]

@@ -89,8 +84,8 @@ The dataset is a compressed [ndjson file](https://quickwit-datasets-public.s3.am
 Let's index it.

 ```bash
-curl https://quickwit-datasets-public.s3.amazonaws.com/gh-archive/gh-archive-2021-12-text-only.json.gz
-gunzip gh-archive-2021-12-text-only.json.gz | ./quickwit index ingest --index gh-archive
+wget https://quickwit-datasets-public.s3.amazonaws.com/gh-archive/gh-archive-2021-12-text-only.json.gz
+gunzip -c gh-archive-2021-12-text-only.json.gz | ./quickwit index ingest --index gh-archive
 ```

 You can check it's working by using the `search` command and looking for the word `tantivy`:

@@ -105,12 +100,12 @@ You can check it's working by using the `search` command and looking for `tantiv
 ./quickwit service run searcher
 ```

-This command will start an HTTP server with a [REST API](../reference/search-api.md). We are now
+This command will start an HTTP server with a [REST API](../reference/rest-api.md). We are now
 ready to fetch some ids with the search stream endpoint. Let's start by streaming them on a simple
 query and with a `CSV` output format.

 ```bash
-curl -v "http://0.0.0.0:8080/api/v1/gh-archive/search/stream?query=tantivy&outputFormat=Csv&fastField=id"
+curl "http://0.0.0.0:7280/api/v1/gh-archive/search/stream?query=tantivy&outputFormat=csv&fastField=id"
 ```

 We will use the `Clickhouse` binary output format in the following sections to speed up queries.

@@ -161,8 +156,8 @@ text. So it's better to insert it into Clickhouse, but if you don't have the tim
 `gh-archive-2021-12-text-only.json.gz` used for Quickwit.

 ```bash
-curl https://quickwit-datasets-public.s3.amazonaws.com/gh-archive/gh-archive-2021-12.json.gz
-gunzip gh-archive-2021-12.json.gz | clickhouse-client -d gh-archive --query="INSERT INTO github_events FORMAT JSONEachRow"
+wget https://quickwit-datasets-public.s3.amazonaws.com/gh-archive/gh-archive-2021-12.json.gz
+gunzip -c gh-archive-2021-12.json.gz | clickhouse-client -d gh-archive --query="INSERT INTO github_events FORMAT JSONEachRow"
 ```

 Let's check it's working:

docs/guides/tutorial-hdfs-logs-distributed-search-aws-s3.md

Lines changed: 19 additions & 17 deletions
@@ -40,20 +40,23 @@ cd quickwit-v*/

 ```bash
 # First, download the hdfs logs config from Quickwit repository.
-curl -o hdfslogs_index_config.yaml https://raw.githubusercontent.com/quickwit-inc/quickwit/main/config/tutorials/hdfs-logs/index-config.yaml
+curl -o hdfs_logs_index_config.yaml https://raw.githubusercontent.com/quickwit-inc/quickwit/main/config/tutorials/hdfs-logs/index-config.yaml
 ```

-The index config defines four fields: `timestamp`, `severity_text`, `body`, and one object field
-for the nested values `resource.service` . It also sets the `default_search_fields`, the `tag_fields`, and the `timestamp_field`. The `timestamp_field` and `tag_fields` are used by Quickwit for [splits pruning](../overview/architecture.md) at query time to boost search speed. Check out the [index config docs](../reference/index-config.md) for more details.
+The index config defines five fields: `timestamp`, `tenant_id`, `severity_text`, `body`, and one object field
+for the nested values `resource.service`. It also sets the `default_search_fields`, the `tag_fields`, and the `timestamp_field`. The `timestamp_field` and `tag_fields` are used by Quickwit for [split pruning](../design/architecture.md) at query time to boost search speed. Check out the [index config docs](../reference/index-config.md) for more details.

-```yaml title="hdfslogs_index_config.yaml"
+```yaml title="hdfs_logs_index_config.yaml"
 version: 0

 doc_mapping:
   field_mappings:
     - name: severity_text
       type: text
       tokenizer: raw
+    - name: tenant_id
+      type: u64
+      fast: true
     - name: body
       type: text
       tokenizer: default

@@ -64,8 +67,7 @@ doc_mapping:
     - name: service
       type: text
       tokenizer: raw
-  tag_fields: []
-  store_source: true
+  tag_fields: [tenant_id]

 indexing_settings:
   timestamp_field: timestamp
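To see why tagging `tenant_id` helps, here is a hypothetical sketch of tag-based split pruning. The split metadata shape and helper below are illustrative assumptions, not Quickwit's actual data structures: the idea is simply that each split records which tag values it contains, so a query filtering on a tag only needs to open matching splits.

```python
# Hypothetical split metadata: each split lists the tag values it contains.
splits = [
    {"id": "split-a", "tags": {"tenant_id:0", "tenant_id:1"}},
    {"id": "split-b", "tags": {"tenant_id:2", "tenant_id:3"}},
]

def splits_for_tag(splits, tag):
    """Keep only the splits whose tag set contains the queried tag value."""
    return [s["id"] for s in splits if tag in s["tags"]]

# A query like `tenant_id:1 AND severity_text:ERROR` only opens split-a:
print(splits_for_tag(splits, "tenant_id:1"))
```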
@@ -91,34 +93,34 @@ default_index_root_uri: ${S3_PATH}
 We can now create the index with the `create` subcommand.

 ```bash
-./quickwit index create --index-config hdfslogs_index_config.yaml --config config.yaml
+./quickwit index create --index-config hdfs_logs_index_config.yaml --config config.yaml
 ```

 :::note

-This step can also be executed on your local machine. The `create` command creates the index locally and then uploads a json file `metastore.json` to your bucket at `s3://path-to-your-bucket/hdfslogs/metastore.json`.
+This step can also be executed on your local machine. The `create` command creates the index locally and then uploads a JSON file `metastore.json` to your bucket at `s3://path-to-your-bucket/hdfs-logs/metastore.json`.

 :::

 ## Index logs
-The dataset is a compressed [ndjson file](https://quickwit-datasets-public.s3.amazonaws.com/hdfs.logs.quickwit.json.gz). Instead of downloading and indexing the data in separate steps, we will use pipes to send a decompressed stream to Quickwit directly.
+The dataset is a compressed [ndjson file](https://quickwit-datasets-public.s3.amazonaws.com/hdfs-logs-multitenants.json.gz). Instead of downloading and indexing the data in separate steps, we will use pipes to send a decompressed stream to Quickwit directly.

 ```bash
-curl https://quickwit-datasets-public.s3.amazonaws.com/hdfs.logs.quickwit.json.gz | gunzip | ./quickwit index ingest --index hdfslogs --config ./config.yaml
+curl https://quickwit-datasets-public.s3.amazonaws.com/hdfs-logs-multitenants.json.gz | gunzip | ./quickwit index ingest --index hdfs-logs --config ./config.yaml
 ```

 :::note

 4GB of RAM is enough to index this dataset; an instance like `t4g.medium` with 4GB and 2 vCPUs indexed this dataset in 20 minutes.

-This step can also be done on your local machine. The `ingest` subcommand generates locally [splits](../overview/architecture.md) of 10 million documents and will upload them on your bucket. Concretely, each split is a bundle of index files and metadata files.
+This step can also be done on your local machine. The `ingest` subcommand locally generates [splits](../design/architecture.md) of 10 million documents and uploads them to your bucket. Concretely, each split is a bundle of index files and metadata files.

 :::

 You can check it's working by using the `search` subcommand and looking for `ERROR` in the `severity_text` field:
 ```bash
-./quickwit index search --index hdfslogs --config ./config.yaml --query "severity_text:ERROR"
+./quickwit index search --index hdfs-logs --config ./config.yaml --query "severity_text:ERROR"
 ```

 Now that we have indexed the logs and can search from one instance, it's time to configure and start a search cluster.

@@ -205,7 +207,7 @@ INFO quickwit_cluster::cluster: Joined. node_id="searcher-1" remote_host=Some(18
 Now we can query one of our instances directly by issuing HTTP requests to one of the nodes' REST API endpoints.

 ```
-curl -v "http://${IP_NODE_2}:7280/api/v1/hdfslogs/search?query=severity_text:ERROR"
+curl -v "http://${IP_NODE_2}:7280/api/v1/hdfs-logs/search?query=severity_text:ERROR"
 ```

 ## Load balancing incoming requests

@@ -219,7 +221,7 @@ You can now play with your cluster, kill processes randomly, add/remove new inst
 Let's execute a simple query that returns only `ERROR` entries on field `severity_text`:

 ```bash
-curl -v 'http://your-load-balancer/api/v1/hdfslogs/search?query=severity_text:ERROR
+curl -v 'http://your-load-balancer/api/v1/hdfs-logs/search?query=severity_text:ERROR'
 ```

 which returns the json

@@ -253,7 +255,7 @@ which returns the json
 You can see that this query has only 364 hits and that the server responds in 0.5 seconds.

-The index config shows that we can use the timestamp field parameters `startTimestamp` and `endTimestamp` and benefit from time pruning. Behind the scenes, Quickwit will only query [splits](../overview/architecture.md) that have logs in this time range. This can have a significant impact on speed.
+The index config shows that we can use the timestamp field parameters `startTimestamp` and `endTimestamp` and benefit from time pruning. Behind the scenes, Quickwit will only query [splits](../design/architecture.md) that have logs in this time range. This can have a significant impact on speed.

 ```bash

@@ -268,11 +270,11 @@ Returns 6 hits in 0.36 seconds.
 Let's do some cleanup by deleting the index:

 ```bash
-./quickwit index delete --index hdfslogs --config ./config.yaml
+./quickwit index delete --index hdfs-logs --config ./config.yaml
 ```

 Also remember to remove the security group protecting your EC2 instances. You can simply remove the instances if you don't need them.

 Congrats! You finished this tutorial!

-To continue your Quickwit journey, check out the [search REST API reference](../reference/search-api.md) or the [query language reference](../reference/query-language.md).
+To continue your Quickwit journey, check out the [search REST API reference](../reference/rest-api.md) or the [query language reference](../reference/query-language.md).

docs/guides/tutorial-hdfs-logs.md

Lines changed: 19 additions & 17 deletions
@@ -38,20 +38,23 @@ Let's create an index configured to receive these logs.

 ```bash
 # First, download the hdfs logs config from Quickwit repository.
-curl -o hdfslogs_index_config.yaml https://raw.githubusercontent.com/quickwit-inc/quickwit/main/config/tutorials/hdfs-logs/index-config.yaml
+curl -o hdfs_logs_index_config.yaml https://raw.githubusercontent.com/quickwit-inc/quickwit/main/config/tutorials/hdfs-logs/index-config.yaml
 ```

-The index config defines four fields: `timestamp`, `severity_text`, `body`, and one object field
-for the nested values `resource.service` . It also sets the `default_search_fields`, the `tag_fields`, and the `timestamp_field`.The `timestamp_field` and `tag_fields` are used by Quickwit for [splits pruning](../overview/architecture.md) at query time to boost search speed. Check out the [index config docs](../reference/index-config.md) for more details.
+The index config defines five fields: `timestamp`, `tenant_id`, `severity_text`, `body`, and one object field
+for the nested values `resource.service`. It also sets the `default_search_fields`, the `tag_fields`, and the `timestamp_field`. The `timestamp_field` and `tag_fields` are used by Quickwit for [split pruning](../design/architecture.md) at query time to boost search speed. Check out the [index config docs](../reference/index-config.md) for more details.

-```yaml title="hdfslogs_index_config.yaml"
+```yaml title="hdfs_logs_index_config.yaml"
 version: 0

 doc_mapping:
   field_mappings:
     - name: timestamp
       type: i64
       fast: true # Fast field must be present when this is the timestamp field.
+    - name: tenant_id
+      type: u64
+      fast: true
     - name: severity_text
       type: text
       tokenizer: raw # No tokenization.

@@ -65,8 +68,7 @@ doc_mapping:
     - name: service
       type: text
       tokenizer: raw # A text field referenced as a tag must have the `raw` tokenizer.
-  tag_fields: [resource.service]
-  store_source: true
+  tag_fields: [tenant_id]

 indexing_settings:
   timestamp_field: timestamp

@@ -86,34 +88,34 @@ export QW_CONFIG=./config/quickwit.yaml
 ```

 ```bash
-./quickwit index create --index-config hdfslogs_index_config.yaml
+./quickwit index create --index-config hdfs_logs_index_config.yaml
 ```

 You're now ready to fill the index.

 ## Index logs
-The dataset is a compressed [ndjson file](https://quickwit-datasets-public.s3.amazonaws.com/hdfs.logs.quickwit.json.gz). Instead of downloading it and then indexing the data, we will use pipes to directly send a decompressed stream to Quickwit.
+The dataset is a compressed [ndjson file](https://quickwit-datasets-public.s3.amazonaws.com/hdfs-logs-multitenants.json.gz). Instead of downloading it and then indexing the data, we will use pipes to directly send a decompressed stream to Quickwit.
 This can take up to 10 min on a modern machine, the perfect time for a coffee break.

 ```bash
-curl https://quickwit-datasets-public.s3.amazonaws.com/hdfs.logs.quickwit.json.gz | gunzip | ./quickwit index ingest --index hdfslogs
+curl https://quickwit-datasets-public.s3.amazonaws.com/hdfs-logs-multitenants.json.gz | gunzip | ./quickwit index ingest --index hdfs-logs
 ```

 You can check it's working by using the `search` subcommand and looking for `ERROR` in the `severity_text` field:
 ```bash
-./quickwit index search --index hdfslogs --query "severity_text:ERROR"
+./quickwit index search --index hdfs-logs --query "severity_text:ERROR"
 ```

 :::note

-The `ingest` subcommand generates [splits](../overview/architecture.md) of 5 millions documents. Each split is a small piece of index represented by a file in which index files and metadata files are saved.
+The `ingest` subcommand generates [splits](../design/architecture.md) of 5 million documents. Each split is a small piece of the index, represented by a file that bundles index files and metadata files.

 :::

 ## Start your server

-The command `service run searcher` starts an http server which provides a [REST API](../reference/search-api.md).
+The command `service run searcher` starts an HTTP server which provides a [REST API](../reference/rest-api.md).

 ```bash

@@ -123,7 +125,7 @@ The command `service run searcher` starts an http server which provides a [REST
 Let's execute the same query on field `severity_text` but with `cURL`:

 ```bash
-curl -v "http://127.0.0.1:7280/api/v1/hdfslogs/search?query=severity_text:ERROR"
+curl "http://127.0.0.1:7280/api/v1/hdfs-logs/search?query=severity_text:ERROR"
 ```

 which returns the json

@@ -155,12 +157,12 @@ which returns the json
 }
 ```

-The index config shows that we can use the timestamp field parameters `startTimestamp` and `endTimestamp` and benefit from time pruning. Behind the scenes, Quickwit will only query [splits](../overview/architecture.md) that have logs in this time range.
+The index config shows that we can use the timestamp field parameters `startTimestamp` and `endTimestamp` and benefit from time pruning. Behind the scenes, Quickwit will only query [splits](../design/architecture.md) that have logs in this time range.

 Let's use these parameters with the following query:

 ```bash
-curl -v 'http://127.0.0.1:7280/api/v1/hdfslogs/search?query=severity_text:ERROR&startTimestamp=1442834249&endTimestamp=1442900000'
+curl -v 'http://127.0.0.1:7280/api/v1/hdfs-logs/search?query=severity_text:ERROR&startTimestamp=1442834249&endTimestamp=1442900000'
 ```

 It should return 6 hits faster, as Quickwit will query fewer splits.
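The time-pruning behavior described above can be sketched as follows. The per-split min/max timestamps below are made-up illustrative values, not the dataset's actual split layout.

```python
# Each split stores the min/max timestamp of its documents; a query with
# startTimestamp/endTimestamp only needs to open splits whose range overlaps.
splits = [
    {"id": "split-1", "min_ts": 1442000000, "max_ts": 1442834248},
    {"id": "split-2", "min_ts": 1442834249, "max_ts": 1442900000},
    {"id": "split-3", "min_ts": 1442900001, "max_ts": 1443000000},
]

def prune(splits, start_ts, end_ts):
    """Keep splits whose [min_ts, max_ts] range overlaps [start_ts, end_ts]."""
    return [s["id"] for s in splits
            if s["max_ts"] >= start_ts and s["min_ts"] <= end_ts]

# The tutorial query's time range only touches one split:
print(prune(splits, 1442834249, 1442900000))  # ['split-2']
```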
@@ -216,12 +218,12 @@ curl -v 'http://127.0.0.1:7280/api/v1/hdfs_logs/search?query=severity_text:ERROR
 Let's do some cleanup by deleting the index:

 ```bash
-./quickwit index delete --index hdfslogs
+./quickwit index delete --index hdfs-logs
 ```

 Congrats! You finished this tutorial!

-To continue your Quickwit journey, check out the [tutorial for distributed search](tutorial-hdfs-logs-distributed-search-aws-s3.md) or dig into the [search REST API](../reference/search-api.md) or [query language](../reference/query-language.md).
+To continue your Quickwit journey, check out the [tutorial for distributed search](tutorial-hdfs-logs-distributed-search-aws-s3.md) or dig into the [search REST API](../reference/rest-api.md) or [query language](../reference/query-language.md).
