From 9ed90a5bb3dac3d908bf6c5d13f8f9d463b37f79 Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?Istv=C3=A1n=20Zolt=C3=A1n=20Szab=C3=B3?= Date: Thu, 24 Jul 2025 15:32:39 +0200 Subject: [PATCH] [DOCS] Adds inline applies_to tags to semantic text docs (#131814) * [DOCS] Adds inline applies_to tags to semantic text docs. * More edits. * Fine-tunes tags. * Adds role. * Addresses feedback. * Adds sub-sections. * Positions the tags differently. * Repositions applies to tags. * Annotates sections. --- .../mapping-reference/semantic-text.md | 42 ++++++++++++++++--- 1 file changed, 37 insertions(+), 5 deletions(-) diff --git a/docs/reference/elasticsearch/mapping-reference/semantic-text.md b/docs/reference/elasticsearch/mapping-reference/semantic-text.md index fbea748bbe596..fba9c1b263420 100644 --- a/docs/reference/elasticsearch/mapping-reference/semantic-text.md +++ b/docs/reference/elasticsearch/mapping-reference/semantic-text.md @@ -2,6 +2,9 @@ navigation_title: "Semantic text" mapped_pages: - https://www.elastic.co/guide/en/elasticsearch/reference/current/semantic-text.html +applies_to: + stack: ga 9.0 + serverless: ga --- # Semantic text field type [semantic-text] @@ -29,7 +32,8 @@ service. Using `semantic_text`, you won’t need to specify how to generate embeddings for your data, or how to index it. The {{infer}} endpoint automatically determines the embedding generation, indexing, and query to use. -Newly created indices with `semantic_text` fields using dense embeddings will be + +{applies_to}`stack: ga 9.1` Newly created indices with `semantic_text` fields using dense embeddings will be [quantized](/reference/elasticsearch/mapping-reference/dense-vector.md#dense-vector-quantization) to `bbq_hnsw` automatically. @@ -182,6 +186,15 @@ For more details on chunking and how to configure chunking settings, see [Configuring chunking](https://www.elastic.co/docs/api/doc/elasticsearch/group/endpoint-inference) in the Inference API documentation. 
+Refer
+to [this tutorial](docs-content://solutions/search/semantic-search/semantic-search-semantic-text.md)
+to learn more about semantic search using `semantic_text`.
+
+### Pre-chunking [pre-chunking]
+```{applies_to}
+stack: ga 9.1
+```
+
 You can pre-chunk the input by sending it to Elasticsearch as an array of
 strings. Example:
 
@@ -228,10 +241,6 @@ PUT test-index/_doc/1
 * Others (such as `elastic` and `elasticsearch`) will automatically truncate
   the input.
 
-Refer
-to [this tutorial](docs-content://solutions/search/semantic-search/semantic-search-semantic-text.md)
-to learn more about semantic search using `semantic_text`.
-
 ## Extracting relevant fragments from semantic text [semantic-text-highlighting]
 
 You can extract the most relevant fragments from a semantic text field by using
@@ -295,6 +304,11 @@ specified. It enables you to quickstart your semantic search by providing
 automatic {{infer}} and a dedicated query so you don’t need to provide further
 details.
 
+### Customizing using `semantic_text` parameters [custom-by-parameters]
+```{applies_to}
+stack: ga 9.1
+```
+
 If you want to override those defaults and customize the embeddings that
 `semantic_text` indexes, you can do so by modifying
 [parameters](#semantic-text-params):
@@ -328,6 +342,24 @@ PUT my-index-000004
 }
 ```
 
+### Customizing using ingest pipelines [custom-by-pipelines]
+```{applies_to}
+stack: ga 9.0
+```
+
+If you want to customize data indexing, use the
+[`sparse_vector`](/reference/elasticsearch/mapping-reference/sparse-vector.md)
+or [`dense_vector`](/reference/elasticsearch/mapping-reference/dense-vector.md)
+field types and create an ingest pipeline with an
+[{{infer}} processor](/reference/enrich-processor/inference-processor.md) to
+generate the embeddings.
+[This tutorial](docs-content://solutions/search/semantic-search/semantic-search-inference.md)
+walks you through the process. In these cases, when you use the `sparse_vector` or
+`dense_vector` field types instead of the `semantic_text` field type to
+customize indexing, the
+[`semantic` query](/reference/query-languages/query-dsl/query-dsl-semantic-query.md)
+is not supported for querying the field data.
+
 ## Updates to `semantic_text` fields [update-script]
 
 For indices containing `semantic_text` fields, updates that use scripts have the
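The ingest-pipeline route described in the "Customizing using ingest pipelines" subsection can be sketched in Python as three request bodies. The first is a pipeline with an `inference` processor. The second is an index with a `sparse_vector` field. The third is a `sparse_vector` search, which is one way to query that field, since the `semantic` query does not apply there. This is a minimal sketch and does not reproduce the linked tutorial's exact steps. It assumes an existing sparse-embedding (for example ELSER) inference endpoint. The endpoint ID `my-elser-endpoint`, pipeline ID `my-elser-pipeline`, index name `my-custom-index`, and the field names `content` and `content_embedding` are all hypothetical. The script only builds and prints the bodies and does not contact a cluster.

```python
import json

# Hypothetical identifiers, used only for illustration.
INFERENCE_ID = "my-elser-endpoint"  # assumed: an existing sparse-embedding inference endpoint
PIPELINE_ID = "my-elser-pipeline"
INDEX_NAME = "my-custom-index"


def pipeline_body(inference_id: str) -> dict:
    """Body for PUT _ingest/pipeline/<PIPELINE_ID>.

    The inference processor reads `content` and writes the generated
    embeddings to `content_embedding`.
    """
    return {
        "processors": [
            {
                "inference": {
                    "model_id": inference_id,
                    "input_output": {
                        "input_field": "content",
                        "output_field": "content_embedding",
                    },
                }
            }
        ]
    }


def index_body(pipeline_id: str) -> dict:
    """Body for PUT <INDEX_NAME>: a text source field plus a sparse_vector field.

    `index.default_pipeline` makes every indexing request run the pipeline.
    """
    return {
        "settings": {"index": {"default_pipeline": pipeline_id}},
        "mappings": {
            "properties": {
                "content": {"type": "text"},
                "content_embedding": {"type": "sparse_vector"},
            }
        },
    }


def search_body(query_text: str, inference_id: str) -> dict:
    """Body for GET <INDEX_NAME>/_search.

    Uses a sparse_vector query that embeds the query text with the same
    inference endpoint, instead of the `semantic` query.
    """
    return {
        "query": {
            "sparse_vector": {
                "field": "content_embedding",
                "inference_id": inference_id,
                "query": query_text,
            }
        }
    }


if __name__ == "__main__":
    examples = [
        (f"PUT _ingest/pipeline/{PIPELINE_ID}", pipeline_body(INFERENCE_ID)),
        (f"PUT {INDEX_NAME}", index_body(PIPELINE_ID)),
        (f"GET {INDEX_NAME}/_search", search_body("How do I reduce latency?", INFERENCE_ID)),
    ]
    for request_line, body in examples:
        print(request_line)
        print(json.dumps(body, indent=2))
```

For a `dense_vector` field, the search step would typically use a `knn` query with a `query_vector_builder` instead. That variant is not shown here.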