Commit f2cf76f

Update semantic text docs to suggest customization using index_options (#130028)
* Update semantic text docs to suggest using index options for customization
* Correct type of index_options
* Move example
* PR feedback
* Copy warning fix
1 parent f37c037 commit f2cf76f

1 file changed, +34 -35 lines changed

docs/reference/elasticsearch/mapping-reference/semantic-text.md

Lines changed: 34 additions & 35 deletions
@@ -112,32 +112,11 @@ to create the endpoint. If not specified, the {{infer}} endpoint defined by
 `inference_id` will be used at both index and query time.

 `index_options`
-: (Optional, string) Specifies the index options to override default values
+: (Optional, object) Specifies the index options to override default values
 for the field. Currently, `dense_vector` index options are supported.
 For text embeddings, `index_options` may match any allowed
 [dense_vector index options](/reference/elasticsearch/mapping-reference/dense-vector.md#dense-vector-index-options).

-An example of how to set index_options for a `semantic_text` field:
-
-```console
-PUT my-index-000004
-{
-  "mappings": {
-    "properties": {
-      "inference_field": {
-        "type": "semantic_text",
-        "inference_id": "my-text-embedding-endpoint",
-        "index_options": {
-          "dense_vector": {
-            "type": "int4_flat"
-          }
-        }
-      }
-    }
-  }
-}
-```
-
 `chunking_settings`
 : (Optional, object) Settings for chunking text into smaller passages.
 If specified, these will override the chunking settings set in the {{infer-cap}}
@@ -165,7 +144,7 @@ To completely disable chunking, use the `none` chunking strategy.
 or `1`. Required for `sentence` type chunking settings

 ::::{warning}
-If the input exceeds the maximum token limit of the underlying model, some
+When using the `none` chunking strategy, if the input exceeds the maximum token limit of the underlying model, some
 services (such as OpenAI) may return an
 error. In contrast, the `elastic` and `elasticsearch` services will
 automatically truncate the input to fit within the
@@ -315,18 +294,38 @@ specified. It enables you to quickstart your semantic search by providing
 automatic {{infer}} and a dedicated query so you don’t need to provide further
 details.

-In case you want to customize data indexing, use the [
-`sparse_vector`](/reference/elasticsearch/mapping-reference/sparse-vector.md)
-or [`dense_vector`](/reference/elasticsearch/mapping-reference/dense-vector.md)
-field types and create an ingest pipeline with
-an [{{infer}} processor](/reference/enrich-processor/inference-processor.md) to
-generate the
-embeddings. [This tutorial](docs-content://solutions/search/semantic-search/semantic-search-inference.md)
-walks you through the process. In these cases - when you use `sparse_vector` or
-`dense_vector` field types instead of the `semantic_text` field type to
-customize indexing - using the [
-`semantic_query`](/reference/query-languages/query-dsl/query-dsl-semantic-query.md)
-is not supported for querying the field data.
+If you want to override those defaults and customize the embeddings that
+`semantic_text` indexes, you can do so by modifying <<semantic-text-params,
+parameters>>:
+
+- Use `index_options` to specify alternate index options such as specific
+`dense_vector` quantization methods
+- Use `chunking_settings` to override the chunking strategy associated with the
+{{infer}} endpoint, or completely disable chunking using the `none` type
+
+Here is an example of how to set these parameters for a text embedding endpoint:
+
+```console
+PUT my-index-000004
+{
+  "mappings": {
+    "properties": {
+      "inference_field": {
+        "type": "semantic_text",
+        "inference_id": "my-text-embedding-endpoint",
+        "index_options": {
+          "dense_vector": {
+            "type": "int4_flat"
+          }
+        },
+        "chunking_settings": {
+          "type": "none"
+        }
+      }
+    }
+  }
+}
+```

 ## Updates to `semantic_text` fields [update-script]

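
As a rough illustration of the two parameters this change documents, the mapping body from the added example could be assembled programmatically. The sketch below is Python using only the standard library. The helper name `build_semantic_text_mapping` is hypothetical, and the field name, endpoint ID, option values, and key names are copied from the example in the diff rather than checked against a running cluster. It only builds and prints the JSON body; sending it as `PUT my-index-000004` is left to whichever client you use.

```python
import json


def build_semantic_text_mapping(inference_id, dense_vector_type=None, chunking_type=None):
    """Build a mapping body for one semantic_text field (hypothetical helper)."""
    # Required parts of the field definition, as in the diff's example.
    field = {"type": "semantic_text", "inference_id": inference_id}

    # Optional: index_options is an object, not a string (the type fix in this commit).
    if dense_vector_type is not None:
        field["index_options"] = {"dense_vector": {"type": dense_vector_type}}

    # Optional: override the endpoint's chunking settings; "none" disables chunking.
    if chunking_type is not None:
        field["chunking_settings"] = {"type": chunking_type}

    # The field name "inference_field" mirrors the example and is otherwise arbitrary.
    return {"mappings": {"properties": {"inference_field": field}}}


body = build_semantic_text_mapping("my-text-embedding-endpoint", "int4_flat", "none")
print(json.dumps(body, indent=2))
```

Leaving both optional arguments unset yields a body with only `type` and `inference_id`, which corresponds to relying on the defaults described in the surrounding text.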