Merged
Changes from 9 commits
28 changes: 25 additions & 3 deletions docs/reference/elasticsearch/mapping-reference/semantic-text.md
@@ -2,6 +2,9 @@
navigation_title: "Semantic text"
mapped_pages:
- https://www.elastic.co/guide/en/elasticsearch/reference/current/semantic-text.html
+applies_to:
+  stack: ga 9.0
+  serverless: ga
---

# Semantic text field type [semantic-text]
@@ -29,7 +32,7 @@ service.
Using `semantic_text`, you won’t need to specify how to generate embeddings for
your data, or how to index it. The {{infer}} endpoint automatically determines
the embedding generation, indexing, and query to use.
-Newly created indices with `semantic_text` fields using dense embeddings will be
+{applies_to}`stack: ga 9.1` Newly created indices with `semantic_text` fields using dense embeddings will be
[quantized](/reference/elasticsearch/mapping-reference/dense-vector.md#dense-vector-quantization)
to `bbq_hnsw` automatically.

@@ -117,13 +120,13 @@ for the field. Currently, `dense_vector` index options are supported.
For text embeddings, `index_options` may match any allowed
[dense_vector index options](/reference/elasticsearch/mapping-reference/dense-vector.md#dense-vector-index-options).

-`chunking_settings` {applies_to}`stack: ga 9.1`
+`chunking_settings`
: (Optional, object) Settings for chunking text into smaller passages.
If specified, these will override the chunking settings set in the {{infer-cap}}
endpoint associated with `inference_id`.
If chunking settings are updated, they will not be applied to existing documents
until they are reindexed.
-To completely disable chunking, use the `none` chunking strategy.
+{applies_to}`stack: ga 9.1` To completely disable chunking, use the `none` chunking strategy.

**Valid values for `chunking_settings`**:

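To illustrate the `none` strategy mentioned in this hunk, a mapping might look like the following minimal sketch. The index name `my-index-no-chunking` and the field name `content` are hypothetical placeholders and are not part of this change. The sketch assumes the default {{infer}} endpoint, so `inference_id` is omitted.

```console
# Hypothetical index and field names, for illustration only
PUT my-index-no-chunking
{
  "mappings": {
    "properties": {
      "content": {
        "type": "semantic_text",
        "chunking_settings": {
          "strategy": "none"
        }
      }
    }
  }
}
```

As the text above notes, changing `chunking_settings` on an existing field would not affect documents that are already indexed until they are reindexed.
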
@@ -182,6 +185,8 @@ For more details on chunking and how to configure chunking settings,
see [Configuring chunking](https://www.elastic.co/docs/api/doc/elasticsearch/group/endpoint-inference)
in the Inference API documentation.

+{applies_to}`stack: ga 9.1`
+
You can pre-chunk the input by sending it to Elasticsearch as an array of
strings.
Example:
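
The example itself falls outside this hunk. As a rough sketch of what pre-chunked input could look like, assuming an index whose `semantic_text` field has chunking disabled with the `none` strategy, each array element would be treated as one chunk. The index name `my-index-no-chunking`, the document ID, the field name `content`, and the chunk strings are all hypothetical.

```console
# Hypothetical index, document ID, and field name, for illustration only
PUT my-index-no-chunking/_doc/1
{
  "content": [
    "First passage, sent to the inference service as one chunk.",
    "Second passage, sent as a separate chunk."
  ]
}
```
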
@@ -295,6 +300,8 @@ specified. It enables you to quickstart your semantic search by providing
automatic {{infer}} and a dedicated query so you don’t need to provide further
details.

+{applies_to}`stack: ga 9.1`
+
If you want to override those defaults and customize the embeddings that
`semantic_text` indexes, you can do so by
modifying [parameters](#semantic-text-params):
@@ -328,6 +335,21 @@ PUT my-index-000004
}
```

+{applies_to}`stack: ga 9.0`
+
+In case you want to customize data indexing, use the
+[`sparse_vector`](/reference/elasticsearch/mapping-reference/sparse-vector.md)
+or [`dense_vector`](/reference/elasticsearch/mapping-reference/dense-vector.md)
+field types and create an ingest pipeline with an
+[{{infer}} processor](/reference/enrich-processor/inference-processor.md) to
+generate the embeddings.
+[This tutorial](docs-content://solutions/search/semantic-search/semantic-search-inference.md)
+walks you through the process. In these cases - when you use `sparse_vector` or
+`dense_vector` field types instead of the `semantic_text` field type to
+customize indexing - using the
+[`semantic_query`](/reference/query-languages/query-dsl/query-dsl-semantic-query.md)
+is not supported for querying the field data.
+
## Updates to `semantic_text` fields [update-script]

For indices containing `semantic_text` fields, updates that use scripts have the