diff --git a/docs/reference/elasticsearch/mapping-reference/semantic-text.md b/docs/reference/elasticsearch/mapping-reference/semantic-text.md
index ca5fe4127ffd7..0f2b22a299397 100644
--- a/docs/reference/elasticsearch/mapping-reference/semantic-text.md
+++ b/docs/reference/elasticsearch/mapping-reference/semantic-text.md
@@ -359,6 +359,24 @@ PUT test-index
 
 1. Ensures that highlighting is applied exclusively to semantic_text fields.
 
+## Updates and partial updates for `semantic_text` fields [semantic-text-updates]
+
+When updating documents that contain `semantic_text` fields, it’s important to understand how inference is triggered:
+
+* **Full document updates**
+  When you perform a full document update, **all `semantic_text` fields re-run inference**, even if their values did not change. This keeps the embeddings consistent with the current document state but can increase ingestion costs.
+
+* **Partial updates using the Bulk API**
+  Partial updates that **omit `semantic_text` fields** and are submitted through the [Bulk API](https://www.elastic.co/docs/api/doc/elasticsearch/operation/operation-bulk) **reuse the existing embeddings** stored in the index. In this case, inference is **not triggered** for the omitted fields, which can significantly reduce processing time and cost.
+
+* **Partial updates using the Update API**
+  When you use the [Update API](https://www.elastic.co/docs/api/doc/elasticsearch/operation/operation-update) with a `doc` object that **omits `semantic_text` fields**, inference **still runs** on all `semantic_text` fields. This means embeddings are regenerated even though the field values did not change.
+
+To avoid unnecessary inference and keep existing embeddings:
+
+* Use **partial updates through the Bulk API**.
+* Omit from the `doc` object any `semantic_text` fields whose values did not change.
+
 ## Customizing `semantic_text` indexing [custom-indexing]
 
 `semantic_text` uses defaults for indexing data based on the {{infer}} endpoint
@@ -404,24 +422,6 @@ PUT my-index-000004
 }
 ```
 
-### Customizing using ingest pipelines [custom-by-pipelines]
-```{applies_to}
-stack: ga 9.0
-```
-
-In case you want to customize data indexing, use the
-[`sparse_vector`](/reference/elasticsearch/mapping-reference/sparse-vector.md)
-or [`dense_vector`](/reference/elasticsearch/mapping-reference/dense-vector.md)
-field types and create an ingest pipeline with an
-[{{infer}} processor](/reference/enrich-processor/inference-processor.md) to
-generate the embeddings.
-[This tutorial](docs-content://solutions/search/semantic-search/semantic-search-inference.md)
-walks you through the process. In these cases - when you use `sparse_vector` or
-`dense_vector` field types instead of the `semantic_text` field type to
-customize indexing - using the
-[`semantic_query`](/reference/query-languages/query-dsl/query-dsl-semantic-query.md)
-is not supported for querying the field data.
-
 ## Updates to `semantic_text` fields [update-script]
 
 For indices containing `semantic_text` fields, updates that use scripts have the
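As a minimal sketch of the two partial-update paths contrasted in the section this patch adds, the requests below update only a non-`semantic_text` field. They assume an index named `test-index` that maps at least one `semantic_text` field alongside a regular field named `title`. The `title` field, its value, and the document ID `1` are hypothetical and only for illustration. The expected inference behavior noted in the comments is taken from the added section, not independently verified.

```console
# Bulk API partial update: the doc omits every semantic_text field,
# so per the added section the stored embeddings are reused.
# "title" is a hypothetical non-semantic_text field.
POST _bulk
{ "update": { "_index": "test-index", "_id": "1" } }
{ "doc": { "title": "Updated title" } }

# Update API partial update with the same doc: per the added section,
# inference still runs on all semantic_text fields.
POST test-index/_update/1
{
  "doc": {
    "title": "Updated title"
  }
}
```

Both requests send an identical `doc` object, so any difference in inference cost comes only from which API carries the update.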