42 changes: 37 additions & 5 deletions docs/reference/elasticsearch/mapping-reference/semantic-text.md
@@ -2,6 +2,9 @@
navigation_title: "Semantic text"
mapped_pages:
- https://www.elastic.co/guide/en/elasticsearch/reference/current/semantic-text.html
applies_to:
stack: ga 9.0
serverless: ga
---

# Semantic text field type [semantic-text]
@@ -29,7 +32,8 @@ service.
Using `semantic_text`, you won’t need to specify how to generate embeddings for
your data, or how to index it. The {{infer}} endpoint automatically determines
the embedding generation, indexing, and query to use.

{applies_to}`stack: ga 9.1` Newly created indices with `semantic_text` fields using dense embeddings will be
[quantized](/reference/elasticsearch/mapping-reference/dense-vector.md#dense-vector-quantization)
to `bbq_hnsw` automatically.
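
For illustration, a minimal mapping sketch that would pick up this default, assuming the preconfigured `.multilingual-e5-small-elasticsearch` dense-embedding endpoint and a field named `content`:

```console
PUT my-dense-index
{
  "mappings": {
    "properties": {
      "content": {
        "type": "semantic_text",
        "inference_id": ".multilingual-e5-small-elasticsearch"
      }
    }
  }
}
```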

@@ -182,6 +186,15 @@ For more details on chunking and how to configure chunking settings,
see [Configuring chunking](https://www.elastic.co/docs/api/doc/elasticsearch/group/endpoint-inference)
in the Inference API documentation.
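
For illustration, a sketch of creating an {{infer}} endpoint with custom chunking settings; the endpoint name, model, and values here are assumptions, so check the linked documentation for the supported parameters:

```console
PUT _inference/sparse_embedding/my-chunking-endpoint
{
  "service": "elasticsearch",
  "service_settings": {
    "model_id": ".elser_model_2",
    "num_allocations": 1,
    "num_threads": 1
  },
  "chunking_settings": {
    "strategy": "sentence",
    "max_chunk_size": 250,
    "sentence_overlap": 1
  }
}
```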

Refer
to [this tutorial](docs-content://solutions/search/semantic-search/semantic-search-semantic-text.md)
to learn more about semantic search using `semantic_text`.

### Pre-chunking [pre-chunking]
```{applies_to}
stack: ga 9.1
```

You can pre-chunk the input by sending it to Elasticsearch as an array of
strings.
Example:
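
A minimal sketch, assuming a field named `my_semantic_field` with its chunking strategy set to `none` so that each array element is kept as a single chunk (the field name and the `chunking_settings` syntax are assumptions):

```console
PUT test-index
{
  "mappings": {
    "properties": {
      "my_semantic_field": {
        "type": "semantic_text",
        "chunking_settings": {
          "strategy": "none"
        }
      }
    }
  }
}

PUT test-index/_doc/1
{
  "my_semantic_field": [
    "This is the first pre-chunked passage.",
    "This is the second pre-chunked passage."
  ]
}
```
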
@@ -228,10 +241,6 @@ PUT test-index/_doc/1
* Others (such as `elastic` and `elasticsearch`) will automatically truncate
the input.

## Extracting relevant fragments from semantic text [semantic-text-highlighting]

You can extract the most relevant fragments from a semantic text field by using
@@ -295,6 +304,11 @@ specified. It enables you to quickstart your semantic search by providing
automatic {{infer}} and a dedicated query so you don’t need to provide further
details.
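
For illustration, a sketch of that dedicated query, assuming an index named `my-index-000004` with a `semantic_text` field named `inference_field`:

```console
GET my-index-000004/_search
{
  "query": {
    "semantic": {
      "field": "inference_field",
      "query": "Best surfing places"
    }
  }
}
```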

### Customizing using `semantic_text` parameters [custom-by-parameters]
```{applies_to}
stack: ga 9.1
```

If you want to override those defaults and customize the embeddings that
`semantic_text` indexes, you can do so by
modifying [parameters](#semantic-text-params):
@@ -328,6 +342,24 @@ PUT my-index-000004
}
```

### Customizing using ingest pipelines [custom-by-pipelines]
```{applies_to}
stack: ga 9.0
```

If you want to customize data indexing, use the
[`sparse_vector`](/reference/elasticsearch/mapping-reference/sparse-vector.md)
or [`dense_vector`](/reference/elasticsearch/mapping-reference/dense-vector.md)
field types and create an ingest pipeline with an
[{{infer}} processor](/reference/enrich-processor/inference-processor.md) to
generate the embeddings.
[This tutorial](docs-content://solutions/search/semantic-search/semantic-search-inference.md)
walks you through the process. Note that when you use the `sparse_vector` or
`dense_vector` field types instead of `semantic_text` to customize indexing,
the [`semantic` query](/reference/query-languages/query-dsl/query-dsl-semantic-query.md)
is not supported for querying the field data.
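
For illustration, a sketch of an ingest pipeline with an {{infer}} processor that writes ELSER embeddings into a target field; the pipeline name, field names, and model ID are assumptions:

```console
PUT _ingest/pipeline/my-embeddings-pipeline
{
  "processors": [
    {
      "inference": {
        "model_id": ".elser_model_2",
        "input_output": [
          {
            "input_field": "content",
            "output_field": "content_embedding"
          }
        ]
      }
    }
  ]
}
```

In this sketch, `content_embedding` would be mapped as a `sparse_vector` field and queried with a `sparse_vector` query.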

## Updates to `semantic_text` fields [update-script]

For indices containing `semantic_text` fields, updates that use scripts have the