From 2b806a0c30fb3fbb8ba1cf67254b07003a8abf4a Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?Istv=C3=A1n=20Zolt=C3=A1n=20Szab=C3=B3?=
Date: Wed, 13 Aug 2025 11:05:23 +0200
Subject: [PATCH 1/3] Explains that chunks stored as offsets.

---
 .../elasticsearch/mapping-reference/semantic-text.md | 6 ++++++
 1 file changed, 6 insertions(+)

diff --git a/docs/reference/elasticsearch/mapping-reference/semantic-text.md b/docs/reference/elasticsearch/mapping-reference/semantic-text.md
index 2655ae8bc8bd0..88767cf190846 100644
--- a/docs/reference/elasticsearch/mapping-reference/semantic-text.md
+++ b/docs/reference/elasticsearch/mapping-reference/semantic-text.md
@@ -223,6 +223,10 @@ generated from it. When querying, the individual passages will be automatically
 searched for each document, and the most relevant passage will be used to
 compute a score.
 
+Chunks are stored as start and end character offsets rather than as separate
+text strings. These offsets point to the exact location of each chunk within the
+original input text.
+
 For more details on chunking and how to configure chunking settings,
 see [Configuring chunking](https://www.elastic.co/docs/api/doc/elasticsearch/group/endpoint-inference)
 in the Inference API documentation.
@@ -232,6 +236,7 @@ to [this tutorial](docs-content://solutions/search/semantic-search/semantic-sear
 to learn more about semantic search using `semantic_text`.
 
 ### Pre-chunking [pre-chunking]
+
 ```{applies_to}
 stack: ga 9.1
 ```
@@ -283,6 +288,7 @@ PUT test-index/_doc/1
 the input.
 
 ## Retrieving indexed chunks
+
 ```{applies_to}
 stack: ga 9.2
 serverless: ga

From ea08d627164ec609162382ceb3bb732b7486e95a Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?Istv=C3=A1n=20Zolt=C3=A1n=20Szab=C3=B3?=
Date: Wed, 13 Aug 2025 15:29:06 +0200
Subject: [PATCH 2/3] Small changes.

---
 .../elasticsearch/mapping-reference/semantic-text.md | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/docs/reference/elasticsearch/mapping-reference/semantic-text.md b/docs/reference/elasticsearch/mapping-reference/semantic-text.md
index 88767cf190846..8e6e054808436 100644
--- a/docs/reference/elasticsearch/mapping-reference/semantic-text.md
+++ b/docs/reference/elasticsearch/mapping-reference/semantic-text.md
@@ -243,7 +243,8 @@ stack: ga 9.1
 
 You can pre-chunk the input by sending it to Elasticsearch as an array of
 strings.
-Example:
+
+For example:
 
 ```console
 PUT test-index
@@ -546,7 +547,6 @@ POST test-index/_search
 This will return verbose chunked embeddings content that is used to perform
 semantic search for `semantic_text` fields.
 
-
 ## Limitations [limitations]
 
 `semantic_text` field types have the following limitations:

From 8a0bc145a11978176d6bbea1b893cc2808dcd4f8 Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?Istv=C3=A1n=20Zolt=C3=A1n=20Szab=C3=B3?=
Date: Wed, 13 Aug 2025 15:46:20 +0200
Subject: [PATCH 3/3] Refines applies_to placement.

---
 .../reference/elasticsearch/mapping-reference/semantic-text.md | 3 ---
 1 file changed, 3 deletions(-)

diff --git a/docs/reference/elasticsearch/mapping-reference/semantic-text.md b/docs/reference/elasticsearch/mapping-reference/semantic-text.md
index 8e6e054808436..17903e6f94f05 100644
--- a/docs/reference/elasticsearch/mapping-reference/semantic-text.md
+++ b/docs/reference/elasticsearch/mapping-reference/semantic-text.md
@@ -107,7 +107,6 @@ PUT my-index-000003
 ```
 
 ### Using ELSER on EIS
-
 ```{applies_to}
 stack: preview 9.1
 serverless: preview
@@ -236,7 +235,6 @@ to [this tutorial](docs-content://solutions/search/semantic-search/semantic-sear
 to learn more about semantic search using `semantic_text`.
 
 ### Pre-chunking [pre-chunking]
-
 ```{applies_to}
 stack: ga 9.1
 ```
@@ -289,7 +287,6 @@ PUT test-index/_doc/1
 the input.
 
 ## Retrieving indexed chunks
-
 ```{applies_to}
 stack: ga 9.2
 serverless: ga
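
Note on the paragraph added in PATCH 1/3: the storage model it describes (chunks kept as start and end character offsets into the original input, not as separate strings) can be illustrated with a minimal Python sketch. The `ChunkOffsets` type and `chunk_text` helper are hypothetical names used only for illustration and are not part of Elasticsearch. The sketch also assumes zero-based offsets with an exclusive end, which the patch text does not specify.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ChunkOffsets:
    """Hypothetical record: where a chunk sits inside the original input text."""
    start: int  # index of the chunk's first character (assumed zero-based)
    end: int    # index one past the chunk's last character (assumed exclusive)


def chunk_text(original: str, offsets: ChunkOffsets) -> str:
    """Recover a chunk's text by slicing the original input with its offsets."""
    return original[offsets.start:offsets.end]


# The original text is stored once; each chunk is only a pair of offsets into it.
original = "Chunks are stored as offsets. The text itself is stored once."
chunks = [ChunkOffsets(0, 29), ChunkOffsets(30, 61)]
texts = [chunk_text(original, c) for c in chunks]
```

Under these assumptions, `texts` holds the two sentences of `original`. Because each chunk is only a pair of integers, no chunk text is duplicated alongside the source string.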