Commit abdea73

Clarify partial updates for semantic text
This commit clarifies the behaviour of the semantic text field with partial updates. It also removes the reference to ingest pipeline since semantic text is fully customizable now.
1 parent f91cc68 commit abdea73

1 file changed (+20, -18 lines changed)

docs/reference/elasticsearch/mapping-reference/semantic-text.md

Lines changed: 20 additions & 18 deletions
@@ -359,6 +359,26 @@ PUT test-index

 1. Ensures that highlighting is applied exclusively to semantic_text fields.

+## Updates and partial updates for `semantic_text` fields \[semantic-text-updates]
+
+When updating documents that contain `semantic_text` fields, it’s important to understand how inference is triggered:
+
+* **Full document updates**
+When you perform a full document update, **all `semantic_text` fields will re-run inference** even if their values did not change. This ensures that the embeddings are always consistent with the current document state but can increase ingestion costs.
+
+* **Partial updates using the Bulk API**
+Partial updates that **omit `semantic_text` fields** and are submitted through the [Bulk API](https://www.elastic.co/docs/api/doc/elasticsearch/operation/operation-bulk) will **reuse the existing embeddings** stored in the index. In this case, inference is **not triggered** for fields that were not updated, which can significantly reduce processing time and cost.
+
+* **Partial updates using the Update API**
+When using the [Update API](https://www.elastic.co/docs/api/doc/elasticsearch/operation/operation-update) with a `doc` object that **omits `semantic_text` fields**, inference **will still run** on all `semantic_text` fields. This means that even if the field values are not changed, embeddings will be re-generated.
+
+### Best practices
+
+* If you want to avoid unnecessary inference and keep existing embeddings:
+
+* Use **partial updates through the Bulk API**.
+* Omit any `semantic_text` fields that did not change from the `doc` object in your request.
+
 ## Customizing `semantic_text` indexing [custom-indexing]

 `semantic_text` uses defaults for indexing data based on the {{infer}} endpoint
@@ -404,24 +424,6 @@ PUT my-index-000004
 }
 ```

-### Customizing using ingest pipelines [custom-by-pipelines]
-```{applies_to}
-stack: ga 9.0
-```
-
-In case you want to customize data indexing, use the
-[`sparse_vector`](/reference/elasticsearch/mapping-reference/sparse-vector.md)
-or [`dense_vector`](/reference/elasticsearch/mapping-reference/dense-vector.md)
-field types and create an ingest pipeline with an
-[{{infer}} processor](/reference/enrich-processor/inference-processor.md) to
-generate the embeddings.
-[This tutorial](docs-content://solutions/search/semantic-search/semantic-search-inference.md)
-walks you through the process. In these cases - when you use `sparse_vector` or
-`dense_vector` field types instead of the `semantic_text` field type to
-customize indexing - using the
-[`semantic_query`](/reference/query-languages/query-dsl/query-dsl-semantic-query.md)
-is not supported for querying the field data.
-
 ## Updates to `semantic_text` fields [update-script]

 For indices containing `semantic_text` fields, updates that use scripts have the
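
As an illustration of the best practice described in the section this commit adds (partial updates through the Bulk API that omit unchanged `semantic_text` fields), the Python sketch below builds such a request body. It is a minimal sketch and is not part of the commit. The field names `title` and `content`, the document ID `1`, and the assumption that `content` is mapped as `semantic_text` are all hypothetical; the two helper functions are illustrative names, not an Elasticsearch client API. The code only constructs the newline-delimited JSON payload and does not send it.

```python
import json


def changed_fields(old_doc, new_doc):
    """Return only the fields of new_doc whose values differ from old_doc."""
    return {
        name: value
        for name, value in new_doc.items()
        if name not in old_doc or old_doc[name] != value
    }


def bulk_partial_update(index, doc_id, partial_doc):
    """Serialize one partial update as two NDJSON lines for the Bulk API."""
    action = {"update": {"_index": index, "_id": doc_id}}
    body = {"doc": partial_doc}
    # The Bulk API expects newline-delimited JSON that ends with a newline.
    return json.dumps(action) + "\n" + json.dumps(body) + "\n"


# Hypothetical documents: `content` is assumed to be a semantic_text field.
old = {"title": "Old title", "content": "A long passage that was already embedded."}
new = {"title": "New title", "content": "A long passage that was already embedded."}

# Only `title` changed, so `content` is left out of the partial document.
partial = changed_fields(old, new)
payload = bulk_partial_update("test-index", "1", partial)
print(payload, end="")
```

Because `content` is unchanged, it is left out of the `doc` object, so according to the added section the existing embeddings would be reused when this body is submitted through the Bulk API. Per the same section, sending the same `doc` object through the Update API would still re-run inference on all `semantic_text` fields.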
