From 3c675dc285c2c94746dcc962f125d796a498fa3f Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?Istv=C3=A1n=20Zolt=C3=A1n=20Szab=C3=B3?=
Date: Thu, 24 Jul 2025 11:22:41 +0200
Subject: [PATCH 1/9] [DOCS] Adds inline applies_to tags to semantic text docs.

---
 .../mapping-reference/semantic-text.md | 25 ++++++++++++++++---
 1 file changed, 22 insertions(+), 3 deletions(-)

diff --git a/docs/reference/elasticsearch/mapping-reference/semantic-text.md b/docs/reference/elasticsearch/mapping-reference/semantic-text.md
index fbea748bbe596..6b942547a487e 100644
--- a/docs/reference/elasticsearch/mapping-reference/semantic-text.md
+++ b/docs/reference/elasticsearch/mapping-reference/semantic-text.md
@@ -29,7 +29,7 @@ service. Using `semantic_text`, you won’t need to specify how to generate
 embeddings for your data, or how to index it. The {{infer}} endpoint
 automatically determines the embedding generation, indexing, and query to use.
 
-Newly created indices with `semantic_text` fields using dense embeddings will be
+{applies_to}`stack: 9.1, serverless` Newly created indices with `semantic_text` fields using dense embeddings will be
 [quantized](/reference/elasticsearch/mapping-reference/dense-vector.md#dense-vector-quantization)
 to `bbq_hnsw` automatically.
 
@@ -111,13 +111,13 @@ the [Create {{infer}} API](https://www.elastic.co/docs/api/doc/elasticsearch/ope
 to create the endpoint. If not specified, the {{infer}} endpoint defined by
 `inference_id` will be used at both index and query time.
 
-`index_options` {applies_to}`stack: ga 9.1`
+`index_options` {applies_to}`stack: ga 9.1, serverless`
 : (Optional, object) Specifies the index options to override default values
 for the field. Currently, `dense_vector` index options are supported.
 For text embeddings, `index_options` may match any allowed
 [dense_vector index options](/reference/elasticsearch/mapping-reference/dense-vector.md#dense-vector-index-options).
 
-`chunking_settings` {applies_to}`stack: ga 9.1`
+`chunking_settings` {applies_to}`stack: ga 9.1, serverless`
 : (Optional, object) Settings for chunking text into smaller passages.
 If specified, these will override the chunking settings set in the {{infer-cap}}
 endpoint associated with `inference_id`.
@@ -182,6 +182,8 @@ For more details on chunking and how to configure chunking settings,
 see [Configuring chunking](https://www.elastic.co/docs/api/doc/elasticsearch/group/endpoint-inference)
 in the Inference API documentation.
 
+{applies_to}`stack: ga 9.1, serverless`
+
 You can pre-chunk the input by sending it to Elasticsearch as an array of
 strings.
 Example:
@@ -295,6 +297,8 @@ specified. It enables you to quickstart your semantic search by providing
 automatic {{infer}} and a dedicated query so you don’t need to provide further
 details.
 
+{applies_to}`stack: 9.1, serverless`
+
 If you want to override those defaults and customize the embeddings that
 `semantic_text` indexes, you can do so by
 modifying [parameters](#semantic-text-params):
@@ -328,6 +332,21 @@ PUT my-index-000004
 }
 ```
 
+{applies_to}`stack: 9.0`
+
+In case you want to customize data indexing, use the
+[`sparse_vector`](/reference/elasticsearch/mapping-reference/sparse-vector.md)
+or [`dense_vector`](/reference/elasticsearch/mapping-reference/dense-vector.md)
+field types and create an ingest pipeline with an
+[{{infer}} processor](/reference/enrich-processor/inference-processor.md) to
+generate the embeddings.
+[This tutorial](docs-content://solutions/search/semantic-search/semantic-search-inference.md)
+walks you through the process. In these cases - when you use `sparse_vector` or
+`dense_vector` field types instead of the `semantic_text` field type to
+customize indexing - using the
+[`semantic_query`](/reference/query-languages/query-dsl/query-dsl-semantic-query.md)
+is not supported for querying the field data.
+
 ## Updates to `semantic_text` fields [update-script]
 
 For indices containing `semantic_text` fields, updates that use scripts have the

From 1d753433b61492db37fcf097c1732ac99431b5fd Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?Istv=C3=A1n=20Zolt=C3=A1n=20Szab=C3=B3?=
Date: Thu, 24 Jul 2025 12:11:15 +0200
Subject: [PATCH 2/9] More edits.

---
 .../elasticsearch/mapping-reference/semantic-text.md | 7 +++++--
 1 file changed, 5 insertions(+), 2 deletions(-)

diff --git a/docs/reference/elasticsearch/mapping-reference/semantic-text.md b/docs/reference/elasticsearch/mapping-reference/semantic-text.md
index 6b942547a487e..39db3c14c15bb 100644
--- a/docs/reference/elasticsearch/mapping-reference/semantic-text.md
+++ b/docs/reference/elasticsearch/mapping-reference/semantic-text.md
@@ -2,6 +2,9 @@
 navigation_title: "Semantic text"
 mapped_pages:
   - https://www.elastic.co/guide/en/elasticsearch/reference/current/semantic-text.html
+applies_to:
+  stack: ga 9.0
+  serverless: ga
 ---
 
 # Semantic text field type [semantic-text]
@@ -117,13 +120,13 @@ for the field. Currently, `dense_vector` index options are supported.
 For text embeddings, `index_options` may match any allowed
 [dense_vector index options](/reference/elasticsearch/mapping-reference/dense-vector.md#dense-vector-index-options).
 
-`chunking_settings` {applies_to}`stack: ga 9.1, serverless`
+`chunking_settings`
 : (Optional, object) Settings for chunking text into smaller passages.
 If specified, these will override the chunking settings set in the {{infer-cap}}
 endpoint associated with `inference_id`.
 If chunking settings are updated, they will not be applied to existing documents
 until they are reindexed.
-To completely disable chunking, use the `none` chunking strategy.
+{applies_to}`stack: ga 9.1, serverless` To completely disable chunking, use the `none` chunking strategy.
 
 **Valid values for `chunking_settings`**:
 

From 5bc4cae69d65e0a16869ba75f5351c86c342c366 Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?Istv=C3=A1n=20Zolt=C3=A1n=20Szab=C3=B3?=
Date: Thu, 24 Jul 2025 12:17:46 +0200
Subject: [PATCH 3/9] Fine-tunes tags.

---
 .../elasticsearch/mapping-reference/semantic-text.md | 10 +++++-----
 1 file changed, 5 insertions(+), 5 deletions(-)

diff --git a/docs/reference/elasticsearch/mapping-reference/semantic-text.md b/docs/reference/elasticsearch/mapping-reference/semantic-text.md
index 39db3c14c15bb..0e22c05ca8841 100644
--- a/docs/reference/elasticsearch/mapping-reference/semantic-text.md
+++ b/docs/reference/elasticsearch/mapping-reference/semantic-text.md
@@ -32,7 +32,7 @@ service. Using `semantic_text`, you won’t need to specify how to generate
 embeddings for your data, or how to index it. The {{infer}} endpoint
 automatically determines the embedding generation, indexing, and query to use.
 
-{applies_to}`stack: 9.1, serverless` Newly created indices with `semantic_text` fields using dense embeddings will be
+{applies_to}`stack: 9.1` Newly created indices with `semantic_text` fields using dense embeddings will be
 [quantized](/reference/elasticsearch/mapping-reference/dense-vector.md#dense-vector-quantization)
 to `bbq_hnsw` automatically.
 
@@ -114,7 +114,7 @@ the [Create {{infer}} API](https://www.elastic.co/docs/api/doc/elasticsearch/ope
 to create the endpoint. If not specified, the {{infer}} endpoint defined by
 `inference_id` will be used at both index and query time.
 
-`index_options` {applies_to}`stack: ga 9.1, serverless`
+`index_options` {applies_to}`stack: ga 9.1`
 : (Optional, object) Specifies the index options to override default values
 for the field. Currently, `dense_vector` index options are supported.
 For text embeddings, `index_options` may match any allowed
@@ -126,7 +126,7 @@ If specified, these will override the chunking settings set in the {{infer-cap}}
 endpoint associated with `inference_id`.
 If chunking settings are updated, they will not be applied to existing documents
 until they are reindexed.
-{applies_to}`stack: ga 9.1, serverless` To completely disable chunking, use the `none` chunking strategy.
+{applies_to}`stack: ga 9.1` To completely disable chunking, use the `none` chunking strategy.
 
 **Valid values for `chunking_settings`**:
 
@@ -185,7 +185,7 @@ For more details on chunking and how to configure chunking settings,
 see [Configuring chunking](https://www.elastic.co/docs/api/doc/elasticsearch/group/endpoint-inference)
 in the Inference API documentation.
 
-{applies_to}`stack: ga 9.1, serverless`
+{applies_to}`stack: ga 9.1`
 
 You can pre-chunk the input by sending it to Elasticsearch as an array of
 strings.
@@ -300,7 +300,7 @@ specified. It enables you to quickstart your semantic search by providing
 automatic {{infer}} and a dedicated query so you don’t need to provide further
 details.
 
-{applies_to}`stack: 9.1, serverless`
+{applies_to}`stack: 9.1`
 
 If you want to override those defaults and customize the embeddings that
 `semantic_text` indexes, you can do so by

From 334fbd46c555ecd829dccbe95581f1806a201c8f Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?Istv=C3=A1n=20Zolt=C3=A1n=20Szab=C3=B3?=
Date: Thu, 24 Jul 2025 12:30:40 +0200
Subject: [PATCH 4/9] Adds role.

---
 .../elasticsearch/mapping-reference/semantic-text.md | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/docs/reference/elasticsearch/mapping-reference/semantic-text.md b/docs/reference/elasticsearch/mapping-reference/semantic-text.md
index 0e22c05ca8841..174115521aca2 100644
--- a/docs/reference/elasticsearch/mapping-reference/semantic-text.md
+++ b/docs/reference/elasticsearch/mapping-reference/semantic-text.md
@@ -32,7 +32,7 @@ service. Using `semantic_text`, you won’t need to specify how to generate
 embeddings for your data, or how to index it. The {{infer}} endpoint
 automatically determines the embedding generation, indexing, and query to use.
 
-{applies_to}`stack: 9.1` Newly created indices with `semantic_text` fields using dense embeddings will be
+{applies_to}`stack: ga 9.1` Newly created indices with `semantic_text` fields using dense embeddings will be
 [quantized](/reference/elasticsearch/mapping-reference/dense-vector.md#dense-vector-quantization)
 to `bbq_hnsw` automatically.
 
@@ -300,7 +300,7 @@ specified. It enables you to quickstart your semantic search by providing
 automatic {{infer}} and a dedicated query so you don’t need to provide further
 details.
 
-{applies_to}`stack: 9.1`
+{applies_to}`stack: ga 9.1`
 
 If you want to override those defaults and customize the embeddings that
 `semantic_text` indexes, you can do so by
@@ -335,7 +335,7 @@ PUT my-index-000004
 }
 ```
 
-{applies_to}`stack: 9.0`
+{applies_to}`stack: ga 9.0`
 
 In case you want to customize data indexing, use the
 [`sparse_vector`](/reference/elasticsearch/mapping-reference/sparse-vector.md)

From 32f79f3c28ce36da0a033f005204f73da9ca394b Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?Istv=C3=A1n=20Zolt=C3=A1n=20Szab=C3=B3?=
Date: Thu, 24 Jul 2025 14:22:55 +0200
Subject: [PATCH 5/9] Addresses feedback.

---
 .../elasticsearch/mapping-reference/semantic-text.md | 7 ++++---
 1 file changed, 4 insertions(+), 3 deletions(-)

diff --git a/docs/reference/elasticsearch/mapping-reference/semantic-text.md b/docs/reference/elasticsearch/mapping-reference/semantic-text.md
index 174115521aca2..131205c40a95d 100644
--- a/docs/reference/elasticsearch/mapping-reference/semantic-text.md
+++ b/docs/reference/elasticsearch/mapping-reference/semantic-text.md
@@ -32,7 +32,8 @@ service. Using `semantic_text`, you won’t need to specify how to generate
 embeddings for your data, or how to index it. The {{infer}} endpoint
 automatically determines the embedding generation, indexing, and query to use.
 
-{applies_to}`stack: ga 9.1` Newly created indices with `semantic_text` fields using dense embeddings will be
+
+{applies_to}`stack: ga 9.1` Newly created indices with `semantic_text` fields using dense embeddings will be
 [quantized](/reference/elasticsearch/mapping-reference/dense-vector.md#dense-vector-quantization)
 to `bbq_hnsw` automatically.
 
@@ -120,13 +121,13 @@ for the field. Currently, `dense_vector` index options are supported.
 For text embeddings, `index_options` may match any allowed
 [dense_vector index options](/reference/elasticsearch/mapping-reference/dense-vector.md#dense-vector-index-options).
 
-`chunking_settings`
+`chunking_settings` {applies_to}`stack: ga 9.1`
 : (Optional, object) Settings for chunking text into smaller passages.
 If specified, these will override the chunking settings set in the {{infer-cap}}
 endpoint associated with `inference_id`.
 If chunking settings are updated, they will not be applied to existing documents
 until they are reindexed.
-{applies_to}`stack: ga 9.1` To completely disable chunking, use the `none` chunking strategy.
+To completely disable chunking, use the `none` chunking strategy.
 
 **Valid values for `chunking_settings`**:
 

From 6244c8d2cb70a236ff673b2176aa310f5fb75add Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?Istv=C3=A1n=20Zolt=C3=A1n=20Szab=C3=B3?=
Date: Thu, 24 Jul 2025 14:34:24 +0200
Subject: [PATCH 6/9] Adds sub-sections.

---
 .../mapping-reference/semantic-text.md | 14 ++++++++++----
 1 file changed, 10 insertions(+), 4 deletions(-)

diff --git a/docs/reference/elasticsearch/mapping-reference/semantic-text.md b/docs/reference/elasticsearch/mapping-reference/semantic-text.md
index 131205c40a95d..db9d0f36316f1 100644
--- a/docs/reference/elasticsearch/mapping-reference/semantic-text.md
+++ b/docs/reference/elasticsearch/mapping-reference/semantic-text.md
@@ -186,6 +186,12 @@ For more details on chunking and how to configure chunking settings,
 see [Configuring chunking](https://www.elastic.co/docs/api/doc/elasticsearch/group/endpoint-inference)
 in the Inference API documentation.
 
+Refer
+to [this tutorial](docs-content://solutions/search/semantic-search/semantic-search-semantic-text.md)
+to learn more about semantic search using `semantic_text`.
+
+### Pre-chunking [pre-chunking]
+
 {applies_to}`stack: ga 9.1`
 
 You can pre-chunk the input by sending it to Elasticsearch as an array of
@@ -234,10 +240,6 @@ PUT test-index/_doc/1
 * Others (such as `elastic` and `elasticsearch`) will automatically truncate
   the input.
 
-Refer
-to [this tutorial](docs-content://solutions/search/semantic-search/semantic-search-semantic-text.md)
-to learn more about semantic search using `semantic_text`.
-
 ## Extracting relevant fragments from semantic text [semantic-text-highlighting]
 
 You can extract the most relevant fragments from a semantic text field by using
@@ -301,6 +303,8 @@ specified. It enables you to quickstart your semantic search by providing
 automatic {{infer}} and a dedicated query so you don’t need to provide further
 details.
 
+### Customizing using `semantic_text` parameters [custom-by-parameters]
+
 {applies_to}`stack: ga 9.1`
 
 If you want to override those defaults and customize the embeddings that
@@ -336,6 +340,8 @@ PUT my-index-000004
 }
 ```
 
+### Customizing using ingest pipelines [custom-by-pipelines]
+
 {applies_to}`stack: ga 9.0`
 
 In case you want to customize data indexing, use the

From d4c40d9b99d6f2683b627cfb17f010561e6a5d28 Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?Istv=C3=A1n=20Zolt=C3=A1n=20Szab=C3=B3?=
Date: Thu, 24 Jul 2025 14:38:59 +0200
Subject: [PATCH 7/9] Positions the tags differently.

---
 .../elasticsearch/mapping-reference/semantic-text.md | 8 ++------
 1 file changed, 2 insertions(+), 6 deletions(-)

diff --git a/docs/reference/elasticsearch/mapping-reference/semantic-text.md b/docs/reference/elasticsearch/mapping-reference/semantic-text.md
index db9d0f36316f1..ffa106445da22 100644
--- a/docs/reference/elasticsearch/mapping-reference/semantic-text.md
+++ b/docs/reference/elasticsearch/mapping-reference/semantic-text.md
@@ -190,9 +190,7 @@ Refer
 to [this tutorial](docs-content://solutions/search/semantic-search/semantic-search-semantic-text.md)
 to learn more about semantic search using `semantic_text`.
 
-### Pre-chunking [pre-chunking]
-
-{applies_to}`stack: ga 9.1`
+### Pre-chunking {applies_to}`stack: ga 9.1` [pre-chunking]
 
 You can pre-chunk the input by sending it to Elasticsearch as an array of
 strings.
@@ -340,9 +338,7 @@ PUT my-index-000004
 }
 ```
 
-### Customizing using ingest pipelines [custom-by-pipelines]
-
-{applies_to}`stack: ga 9.0`
+### Customizing using ingest pipelines {applies_to}`stack: ga 9.0` [custom-by-pipelines]
 
 In case you want to customize data indexing, use the
 [`sparse_vector`](/reference/elasticsearch/mapping-reference/sparse-vector.md)

From eef3d7db5a22bbcadfedf2ac7ad3df0855d26d7f Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?Istv=C3=A1n=20Zolt=C3=A1n=20Szab=C3=B3?=
Date: Thu, 24 Jul 2025 14:57:20 +0200
Subject: [PATCH 8/9] Repositions applies to tags.

---
 .../elasticsearch/mapping-reference/semantic-text.md | 8 ++++++--
 1 file changed, 6 insertions(+), 2 deletions(-)

diff --git a/docs/reference/elasticsearch/mapping-reference/semantic-text.md b/docs/reference/elasticsearch/mapping-reference/semantic-text.md
index ffa106445da22..db9d0f36316f1 100644
--- a/docs/reference/elasticsearch/mapping-reference/semantic-text.md
+++ b/docs/reference/elasticsearch/mapping-reference/semantic-text.md
@@ -190,7 +190,9 @@ Refer
 to [this tutorial](docs-content://solutions/search/semantic-search/semantic-search-semantic-text.md)
 to learn more about semantic search using `semantic_text`.
 
-### Pre-chunking {applies_to}`stack: ga 9.1` [pre-chunking]
+### Pre-chunking [pre-chunking]
+
+{applies_to}`stack: ga 9.1`
 
 You can pre-chunk the input by sending it to Elasticsearch as an array of
 strings.
@@ -338,7 +340,9 @@ PUT my-index-000004
 }
 ```
 
-### Customizing using ingest pipelines {applies_to}`stack: ga 9.0` [custom-by-pipelines]
+### Customizing using ingest pipelines [custom-by-pipelines]
+
+{applies_to}`stack: ga 9.0`
 
 In case you want to customize data indexing, use the
 [`sparse_vector`](/reference/elasticsearch/mapping-reference/sparse-vector.md)

From 2ad6dfcf1adc05d7a550e23f4884362334933f34 Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?Istv=C3=A1n=20Zolt=C3=A1n=20Szab=C3=B3?=
Date: Thu, 24 Jul 2025 14:59:52 +0200
Subject: [PATCH 9/9] Annotates sections.

---
 .../mapping-reference/semantic-text.md | 15 +++++++++------
 1 file changed, 9 insertions(+), 6 deletions(-)

diff --git a/docs/reference/elasticsearch/mapping-reference/semantic-text.md b/docs/reference/elasticsearch/mapping-reference/semantic-text.md
index db9d0f36316f1..fba9c1b263420 100644
--- a/docs/reference/elasticsearch/mapping-reference/semantic-text.md
+++ b/docs/reference/elasticsearch/mapping-reference/semantic-text.md
@@ -191,8 +191,9 @@ to [this tutorial](docs-content://solutions/search/semantic-search/semantic-sear
 to learn more about semantic search using `semantic_text`.
 
 ### Pre-chunking [pre-chunking]
-
-{applies_to}`stack: ga 9.1`
+```{applies_to}
+stack: ga 9.1
+```
 
 You can pre-chunk the input by sending it to Elasticsearch as an array of
 strings.
@@ -304,8 +305,9 @@ automatic {{infer}} and a dedicated query so you don’t need to provide further
 details.
 
 ### Customizing using `semantic_text` parameters [custom-by-parameters]
-
-{applies_to}`stack: ga 9.1`
+```{applies_to}
+stack: ga 9.1
+```
 
 If you want to override those defaults and customize the embeddings that
 `semantic_text` indexes, you can do so by
@@ -341,8 +343,9 @@ PUT my-index-000004
 ```
 
 ### Customizing using ingest pipelines [custom-by-pipelines]
-
-{applies_to}`stack: ga 9.0`
+```{applies_to}
+stack: ga 9.0
+```
 
 In case you want to customize data indexing, use the
 [`sparse_vector`](/reference/elasticsearch/mapping-reference/sparse-vector.md)