Merged
36 changes: 18 additions & 18 deletions docs/reference/elasticsearch/mapping-reference/semantic-text.md
@@ -359,6 +359,24 @@ PUT test-index

1. Ensures that highlighting is applied exclusively to semantic_text fields.

## Updates and partial updates for `semantic_text` fields [semantic-text-updates]

Whether updating a document that contains `semantic_text` fields triggers inference depends on the type of update:

* **Full document updates**
When you perform a full document update, **all `semantic_text` fields will re-run inference** even if their values did not change. This ensures that the embeddings are always consistent with the current document state but can increase ingestion costs.

* **Partial updates using the Bulk API**
Partial updates that **omit `semantic_text` fields** and are submitted through the [Bulk API](https://www.elastic.co/docs/api/doc/elasticsearch/operation/operation-bulk) will **reuse the existing embeddings** stored in the index. In this case, inference is **not triggered** for fields that were not updated, which can significantly reduce processing time and cost.

* **Partial updates using the Update API**
When using the [Update API](https://www.elastic.co/docs/api/doc/elasticsearch/operation/operation-update) with a `doc` object that **omits `semantic_text` fields**, inference **will still run** on all `semantic_text` fields. This means that even if the field values are not changed, embeddings will be re-generated.

If you want to avoid unnecessary inference and keep existing embeddings:

* Use **partial updates through the Bulk API**.
* In the `doc` object of your request, omit any `semantic_text` fields that did not change.
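
As a minimal sketch of the recommendation above, the following Python helper builds an NDJSON body of partial `update` actions for the Bulk API that carries only the fields whose values changed, so unchanged `semantic_text` fields are left out of `doc`. The helper names, the index name `my-index`, and the field names `title` and `content` are hypothetical and used only for illustration. Sending the body to the `_bulk` endpoint is not shown.

```python
import json


def changed_fields(current, new):
    """Return only the fields in `new` whose values differ from `current`."""
    return {k: v for k, v in new.items() if k not in current or current[k] != v}


def bulk_partial_update_body(index, docs):
    """Build an NDJSON Bulk API body of partial `update` actions.

    `docs` maps a document ID to a (current, new) pair of field dicts.
    Unchanged fields, including unchanged `semantic_text` fields, are
    omitted from `doc`. Documents with no changes are skipped.
    """
    lines = []
    for doc_id, (current, new) in docs.items():
        doc = changed_fields(current, new)
        if not doc:
            continue
        # Action line, then the partial document on the next line.
        lines.append(json.dumps({"update": {"_index": index, "_id": doc_id}}))
        lines.append(json.dumps({"doc": doc}))
    # A bulk body must end with a newline.
    return "\n".join(lines) + "\n" if lines else ""


# Hypothetical example: `content` stands in for a `semantic_text` field
# whose value did not change, so it is left out of the request.
body = bulk_partial_update_body(
    "my-index",
    {"1": ({"title": "old", "content": "same text"},
           {"title": "new", "content": "same text"})},
)
print(body, end="")
```

Computing the difference on the client side is only one way to decide which fields to omit. Any approach works as long as unchanged `semantic_text` fields do not appear in `doc`.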

## Customizing `semantic_text` indexing [custom-indexing]

`semantic_text` uses defaults for indexing data based on the {{infer}} endpoint
@@ -404,24 +422,6 @@ PUT my-index-000004
}
```

### Customizing using ingest pipelines [custom-by-pipelines]
```{applies_to}
stack: ga 9.0
```

To customize data indexing, use the
[`sparse_vector`](/reference/elasticsearch/mapping-reference/sparse-vector.md)
or [`dense_vector`](/reference/elasticsearch/mapping-reference/dense-vector.md)
field types and create an ingest pipeline with an
[{{infer}} processor](/reference/enrich-processor/inference-processor.md) to
generate the embeddings.
[This tutorial](docs-content://solutions/search/semantic-search/semantic-search-inference.md)
walks you through the process. When you use `sparse_vector` or `dense_vector`
field types instead of the `semantic_text` field type to customize indexing,
querying the field data with the
[`semantic_query`](/reference/query-languages/query-dsl/query-dsl-semantic-query.md)
is not supported.

## Updates to `semantic_text` fields [update-script]

For indices containing `semantic_text` fields, updates that use scripts have the