
Commit ee74efc

Clarify partial updates for semantic text (#132485)
This commit clarifies the behaviour of the semantic text field with partial updates. It also removes the reference to ingest pipeline since semantic text is fully customizable now.
1 parent ef9e390 commit ee74efc


1 file changed, +18 -18 lines changed

docs/reference/elasticsearch/mapping-reference/semantic-text.md

Lines changed: 18 additions & 18 deletions

````diff
@@ -359,6 +359,24 @@ PUT test-index
 
 1. Ensures that highlighting is applied exclusively to semantic_text fields.
 
+## Updates and partial updates for `semantic_text` fields [semantic-text-updates]
+
+When updating documents that contain `semantic_text` fields, it’s important to understand how inference is triggered:
+
+* **Full document updates**
+When you perform a full document update, **all `semantic_text` fields will re-run inference** even if their values did not change. This ensures that the embeddings are always consistent with the current document state but can increase ingestion costs.
+
+* **Partial updates using the Bulk API**
+Partial updates that **omit `semantic_text` fields** and are submitted through the [Bulk API](https://www.elastic.co/docs/api/doc/elasticsearch/operation/operation-bulk) will **reuse the existing embeddings** stored in the index. In this case, inference is **not triggered** for fields that were not updated, which can significantly reduce processing time and cost.
+
+* **Partial updates using the Update API**
+When using the [Update API](https://www.elastic.co/docs/api/doc/elasticsearch/operation/operation-update) with a `doc` object that **omits `semantic_text` fields**, inference **will still run** on all `semantic_text` fields. This means that even if the field values are not changed, embeddings will be re-generated.
+
+If you want to avoid unnecessary inference and keep existing embeddings:
+
+* Use **partial updates through the Bulk API**.
+* Omit any `semantic_text` fields that did not change from the `doc` object in your request.
+
 ## Customizing `semantic_text` indexing [custom-indexing]
 
 `semantic_text` uses defaults for indexing data based on the {{infer}} endpoint
````
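
The difference between a full document update and a Bulk API partial update can be made concrete with a minimal Python sketch (standard library only) that builds, but does not send, Bulk API request bodies. The sketch makes the following assumptions:

* It uses a hypothetical index `my-index` in which `content` is a `semantic_text` field and `price` is an ordinary field.
* `build_partial_doc`, `bulk_update_body`, and `bulk_index_body` are illustrative helpers, not part of any Elasticsearch client.

The NDJSON layout follows the Bulk API format: an action line, then a source or `{"doc": ...}` line, with a trailing newline.

```python
import json

# Hypothetical mapping used only for this sketch: "content" is assumed to be a
# semantic_text field; "price" is an ordinary (non-inference) field.
SEMANTIC_TEXT_FIELDS = {"content"}


def build_partial_doc(old: dict, new: dict, semantic_fields: set) -> dict:
    """Return the `doc` object for a partial update.

    Any semantic_text field whose value is unchanged is left out, so (per the
    docs text in this commit) a Bulk API partial update can reuse its stored
    embeddings instead of re-running inference.
    """
    doc = {}
    for field, value in new.items():
        unchanged = field in old and old[field] == value
        if field in semantic_fields and unchanged:
            continue  # omit unchanged semantic_text fields
        doc[field] = value
    return doc


def bulk_update_body(index: str, doc_id: str, doc: dict) -> str:
    """NDJSON for one Bulk API `update` action: action line + {"doc": ...}."""
    action = {"update": {"_index": index, "_id": doc_id}}
    # The Bulk API requires the request body to end with a newline.
    return json.dumps(action) + "\n" + json.dumps({"doc": doc}) + "\n"


def bulk_index_body(index: str, doc_id: str, full_doc: dict) -> str:
    """NDJSON for one Bulk API `index` action (a full document update).

    Per the docs text in this commit, a full document update re-runs
    inference on every semantic_text field, even unchanged ones.
    """
    action = {"index": {"_index": index, "_id": doc_id}}
    return json.dumps(action) + "\n" + json.dumps(full_doc) + "\n"


if __name__ == "__main__":
    old = {"content": "Wireless noise-cancelling headphones", "price": 199}
    new = {"content": "Wireless noise-cancelling headphones", "price": 179}

    doc = build_partial_doc(old, new, SEMANTIC_TEXT_FIELDS)
    # Prints:
    # {"update": {"_index": "my-index", "_id": "1"}}
    # {"doc": {"price": 179}}
    print(bulk_update_body("my-index", "1", doc), end="")
```

The resulting body would be sent to `POST /_bulk` with `Content-Type: application/x-ndjson`. According to the documentation added in this commit, sending the same `{"doc": {"price": 179}}` object through the Update API (`POST /my-index/_update/1`) would still re-run inference on all `semantic_text` fields. That is why the docs recommend the Bulk API for partial updates that should keep existing embeddings.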

````diff
@@ -404,24 +422,6 @@ PUT my-index-000004
 }
 ```
 
-### Customizing using ingest pipelines [custom-by-pipelines]
-```{applies_to}
-stack: ga 9.0
-```
-
-In case you want to customize data indexing, use the
-[`sparse_vector`](/reference/elasticsearch/mapping-reference/sparse-vector.md)
-or [`dense_vector`](/reference/elasticsearch/mapping-reference/dense-vector.md)
-field types and create an ingest pipeline with an
-[{{infer}} processor](/reference/enrich-processor/inference-processor.md) to
-generate the embeddings.
-[This tutorial](docs-content://solutions/search/semantic-search/semantic-search-inference.md)
-walks you through the process. In these cases - when you use `sparse_vector` or
-`dense_vector` field types instead of the `semantic_text` field type to
-customize indexing - using the
-[`semantic_query`](/reference/query-languages/query-dsl/query-dsl-semantic-query.md)
-is not supported for querying the field data.
-
 ## Updates to `semantic_text` fields [update-script]
 
 For indices containing `semantic_text` fields, updates that use scripts have the
````
