From e280153ac719123421b026b11b89c88f0ba4e98c Mon Sep 17 00:00:00 2001
From: Kathleen DeRusso
Date: Wed, 25 Jun 2025 13:24:27 -0400
Subject: [PATCH 1/5] Update semantic text docs to suggest using index options for customization

---
 .../mapping-reference/semantic-text.md | 20 ++++++++-----------
 1 file changed, 8 insertions(+), 12 deletions(-)

diff --git a/docs/reference/elasticsearch/mapping-reference/semantic-text.md b/docs/reference/elasticsearch/mapping-reference/semantic-text.md
index 0e71155b94ce5..a59e582119b62 100644
--- a/docs/reference/elasticsearch/mapping-reference/semantic-text.md
+++ b/docs/reference/elasticsearch/mapping-reference/semantic-text.md
@@ -315,18 +315,14 @@ specified. It enables you to quickstart your semantic search by providing
 automatic {{infer}} and a dedicated query so you don’t need to provide further
 details.
 
-In case you want to customize data indexing, use the [
-`sparse_vector`](/reference/elasticsearch/mapping-reference/sparse-vector.md)
-or [`dense_vector`](/reference/elasticsearch/mapping-reference/dense-vector.md)
-field types and create an ingest pipeline with
-an [{{infer}} processor](/reference/enrich-processor/inference-processor.md) to
-generate the
-embeddings. [This tutorial](docs-content://solutions/search/semantic-search/semantic-search-inference.md)
-walks you through the process. In these cases - when you use `sparse_vector` or
-`dense_vector` field types instead of the `semantic_text` field type to
-customize indexing - using the [
-`semantic_query`](/reference/query-languages/query-dsl/query-dsl-semantic-query.md)
-is not supported for querying the field data.
+If you want to override those defaults and customize the embeddings that
+`semantic_text` stores, you can do so by modifying <>:
+
+- Use `index_options` to specify alternate index options such as specific
+  `dense_vector` quantization methods
+- Use `chunking_settings` to override the chunking strategy associated with the
+  {{infer}} endpoint, or completely disable chunking using the `none` type
 
 ## Updates to `semantic_text` fields [update-script]
 

From 2bed4569f04c9b58140cfd39a005a17e1d9563fb Mon Sep 17 00:00:00 2001
From: Kathleen DeRusso
Date: Wed, 25 Jun 2025 13:29:38 -0400
Subject: [PATCH 2/5] Correct type of index_options

---
 docs/reference/elasticsearch/mapping-reference/semantic-text.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/docs/reference/elasticsearch/mapping-reference/semantic-text.md b/docs/reference/elasticsearch/mapping-reference/semantic-text.md
index a59e582119b62..a1eaf37251653 100644
--- a/docs/reference/elasticsearch/mapping-reference/semantic-text.md
+++ b/docs/reference/elasticsearch/mapping-reference/semantic-text.md
@@ -112,7 +112,7 @@ to create the endpoint. If not specified, the {{infer}} endpoint defined by
 `inference_id` will be used at both index and query time.
 
 `index_options`
-: (Optional, string) Specifies the index options to override default values
+: (Optional, object) Specifies the index options to override default values
 for the field. Currently, `dense_vector` index options are supported.
 For text embeddings, `index_options` may match any allowed
 [dense_vector index options](/reference/elasticsearch/mapping-reference/dense-vector.md#dense-vector-index-options).
From 528ecb8fe47bc660cd78b8e7b86fc41eecbaf66b Mon Sep 17 00:00:00 2001
From: Kathleen DeRusso
Date: Wed, 25 Jun 2025 13:36:42 -0400
Subject: [PATCH 3/5] Move example

---
 .../mapping-reference/semantic-text.md | 45 ++++++++++---------
 1 file changed, 24 insertions(+), 21 deletions(-)

diff --git a/docs/reference/elasticsearch/mapping-reference/semantic-text.md b/docs/reference/elasticsearch/mapping-reference/semantic-text.md
index a1eaf37251653..6888be10703de 100644
--- a/docs/reference/elasticsearch/mapping-reference/semantic-text.md
+++ b/docs/reference/elasticsearch/mapping-reference/semantic-text.md
@@ -117,27 +117,6 @@ for the field. Currently, `dense_vector` index options are supported.
 For text embeddings, `index_options` may match any allowed
 [dense_vector index options](/reference/elasticsearch/mapping-reference/dense-vector.md#dense-vector-index-options).
 
-An example of how to set index_options for a `semantic_text` field:
-
-```console
-PUT my-index-000004
-{
-  "mappings": {
-    "properties": {
-      "inference_field": {
-        "type": "semantic_text",
-        "inference_id": "my-text-embedding-endpoint",
-        "index_options": {
-          "dense_vector": {
-            "type": "int4_flat"
-          }
-        }
-      }
-    }
-  }
-}
-```
-
 `chunking_settings`
 : (Optional, object) Settings for chunking text into smaller passages.
 If specified, these will override the chunking settings set in the {{infer-cap}}
@@ -324,6 +303,30 @@ parameters>>:
 - Use `chunking_settings` to override the chunking strategy associated with the
   {{infer}} endpoint, or completely disable chunking using the `none` type
 
+Here is an example of how to set these parameters for a text embedding endpoint:
+
+```console
+PUT my-index-000004
+{
+  "mappings": {
+    "properties": {
+      "inference_field": {
+        "type": "semantic_text",
+        "inference_id": "my-text-embedding-endpoint",
+        "index_options": {
+          "dense_vector": {
+            "type": "int4_flat"
+          }
+        },
+        "chunking_settings": {
+          "type": "none"
+        }
+      }
+    }
+  }
+}
+```
+
 ## Updates to `semantic_text` fields [update-script]
 
 For indices containing `semantic_text` fields, updates that use scripts have the

From 18beaef960973781b3e6cc7a1d97e8342aae95e9 Mon Sep 17 00:00:00 2001
From: Kathleen DeRusso
Date: Wed, 25 Jun 2025 15:01:48 -0400
Subject: [PATCH 4/5] PR feedback

---
 docs/reference/elasticsearch/mapping-reference/semantic-text.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/docs/reference/elasticsearch/mapping-reference/semantic-text.md b/docs/reference/elasticsearch/mapping-reference/semantic-text.md
index 6888be10703de..f4e7cabaa5a79 100644
--- a/docs/reference/elasticsearch/mapping-reference/semantic-text.md
+++ b/docs/reference/elasticsearch/mapping-reference/semantic-text.md
@@ -295,7 +295,7 @@ automatic {{infer}} and a dedicated query so you don’t need to provide further
 details.
 
 If you want to override those defaults and customize the embeddings that
-`semantic_text` stores, you can do so by modifying <>:
 
 - Use `index_options` to specify alternate index options such as specific

From 9a78ff342739a50bd09e3aed52a7cf4db30cea62 Mon Sep 17 00:00:00 2001
From: Kathleen DeRusso
Date: Wed, 25 Jun 2025 15:03:26 -0400
Subject: [PATCH 5/5] Copy warning fix

---
 docs/reference/elasticsearch/mapping-reference/semantic-text.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/docs/reference/elasticsearch/mapping-reference/semantic-text.md b/docs/reference/elasticsearch/mapping-reference/semantic-text.md
index f4e7cabaa5a79..06ea9cc9156e3 100644
--- a/docs/reference/elasticsearch/mapping-reference/semantic-text.md
+++ b/docs/reference/elasticsearch/mapping-reference/semantic-text.md
@@ -144,7 +144,7 @@ To completely disable chunking, use the `none` chunking strategy.
 or `1`. Required for `sentence` type chunking settings
 
 ::::{warning}
-If the input exceeds the maximum token limit of the underlying model, some
+When using the `none` chunking strategy, if the input exceeds the maximum token limit of the underlying model, some
 services (such as OpenAI) may return an error. In contrast, the `elastic` and
 `elasticsearch` services will automatically truncate the input to fit within the