Merged
96 commits
67b7623
Add index_options parameter to semantic_text field mapping
kderusso Jan 8, 2025
d822301
Cleanup & tests
kderusso Jan 10, 2025
251d22c
Update docs
kderusso Jan 10, 2025
8724cce
Update docs/changelog/119967.yaml
kderusso Jan 10, 2025
6445a44
Merge branch 'main' into kderusso/semantic-text-index-options
kderusso Jan 10, 2025
26de9d3
Merge branch 'main' into kderusso/semantic-text-index-options
kderusso Jan 10, 2025
948d596
Addressed some PR feedbak
kderusso Jan 13, 2025
860ebc4
Update yaml tests
kderusso Jan 14, 2025
07284f5
Refactoring
kderusso Jan 14, 2025
3c1f6e1
Cleanup
kderusso Jan 14, 2025
b4f45ea
Merge main into kderusso/semantic-text-index-options
kderusso Jan 15, 2025
5d2f48b
Fix some tests
kderusso Jan 15, 2025
47b4e23
Hack in inferring text_embedding task type from index options
kderusso Jan 16, 2025
6f72f00
[CI] Auto commit changes from spotless
Jan 16, 2025
1602b09
Fix error inferring model settings
kderusso Jan 17, 2025
c7f99c1
Merge branch 'main' into kderusso/semantic-text-index-options
kderusso Jan 17, 2025
d9db3b4
Update docs
kderusso Jan 17, 2025
4151cca
Update tests
kderusso Jan 17, 2025
d96dadc
Update docs/reference/mapping/types/semantic-text.asciidoc
kderusso Jan 21, 2025
701dab5
Address some minor PR feedback
kderusso Jan 21, 2025
453f132
Remove partial model_settings with inferred task type
kderusso Jan 21, 2025
e7744a7
Cleanup
kderusso Jan 21, 2025
14fde82
Remove unnecessary changes
kderusso Jan 21, 2025
53ab0ac
Merge from main
kderusso Mar 31, 2025
59bde7d
Fix errors from merge
kderusso Mar 31, 2025
0ce700e
[CI] Auto commit changes from spotless
Mar 31, 2025
73f7017
Cleanup
kderusso Mar 31, 2025
0ab598c
Checkpoint, saving changes before merge
kderusso Apr 4, 2025
560d6c2
Merge from main
kderusso Apr 4, 2025
498e6c9
Update parsing
kderusso Apr 4, 2025
228b308
[CI] Auto commit changes from spotless
Apr 4, 2025
5ebb84c
Stash changes
kderusso Apr 7, 2025
8ea0167
Merge from main
kderusso Apr 15, 2025
e41022e
Fix compile errors
kderusso Apr 15, 2025
959b1d3
[CI] Auto commit changes from spotless
Apr 15, 2025
8701954
Cleanup error
kderusso Apr 17, 2025
fe63309
fix test
kderusso Apr 17, 2025
8db6942
fix test
kderusso Apr 17, 2025
7affc70
Fix another test
kderusso Apr 18, 2025
df06595
A bit of cleanup
kderusso Apr 18, 2025
5bf9d2f
Merge from main
kderusso Apr 18, 2025
34b2153
Fix tests
kderusso Apr 18, 2025
be4136b
Spotless
kderusso Apr 18, 2025
216f8bc
Respect index options if set over defaults
kderusso Apr 18, 2025
651fede
Cleanup
kderusso Apr 18, 2025
3abdc75
[CI] Auto commit changes from spotless
Apr 18, 2025
27b4f9e
Support updating to compatible versions, add some cleanup and validation
kderusso Apr 18, 2025
9fb403b
Merge from main
kderusso Apr 30, 2025
7b4d424
Remove test that can't be done here - needs to be unit test
kderusso Apr 30, 2025
4b11083
Add validation
kderusso Apr 30, 2025
d65c011
Cleanup
kderusso Apr 30, 2025
4f933fc
Fix some yaml tests
kderusso May 2, 2025
07829e8
Merge from main
kderusso Jun 3, 2025
1d524b2
[CI] Auto commit changes from spotless
Jun 3, 2025
f9127eb
Happy path early index validation works now; edge cases surrounding d…
kderusso Jun 4, 2025
9912426
Always emit index options, even when using defaults
kderusso Jun 6, 2025
dca3e54
Minor cleanup
kderusso Jun 6, 2025
7a5a29a
Fix test compilation failures
kderusso Jun 9, 2025
279d4c2
Fix some tests
kderusso Jun 9, 2025
a452d8e
Continue to iterate on test failures
kderusso Jun 10, 2025
9a0ca94
Remove index options from inference field metadata as it is only need…
kderusso Jun 11, 2025
3e1c941
Fix some tests
kderusso Jun 11, 2025
e9bfdc8
Remove transport version, no longer needed
kderusso Jun 11, 2025
40ed2fd
Fix yaml tests
kderusso Jun 11, 2025
78bf4a2
Add tests
kderusso Jun 12, 2025
2cc4191
Merge main
kderusso Jun 12, 2025
c909699
IndexOptions don't need to implement Writeable
kderusso Jun 12, 2025
64b787f
[CI] Auto commit changes from spotless
Jun 12, 2025
bad0585
Refactor - move SemanticTextIndexOptions
kderusso Jun 12, 2025
3036adf
Remove writeable
kderusso Jun 12, 2025
66d65b5
Move index_options parsing to semantic text field mapper
kderusso Jun 13, 2025
2152926
Cleanup
kderusso Jun 13, 2025
26393a5
Fix test compilation issue
kderusso Jun 13, 2025
9390356
Cleanup
kderusso Jun 13, 2025
a1aaffc
Remove whitespace
kderusso Jun 13, 2025
f6d58a1
Remove writeables from index options
kderusso Jun 13, 2025
2ef8b1d
Disable merging null options?
kderusso Jun 13, 2025
b044184
Add docs
kderusso Jun 13, 2025
aa12294
[CI] Auto commit changes from spotless
Jun 13, 2025
1c4104a
Revert "Disable merging null options?"
kderusso Jun 13, 2025
1886969
Merge update from main
kderusso Jun 16, 2025
7c23a7c
Remove default serialization
kderusso Jun 16, 2025
aac9ab7
Merge from main
kderusso Jun 16, 2025
b08e2a1
Include default index option type to defaults
kderusso Jun 16, 2025
9e247c9
[CI] Auto commit changes from spotless
Jun 16, 2025
3be5eda
Go back to allowing null updateS
kderusso Jun 16, 2025
c94955e
Cleanup
kderusso Jun 16, 2025
e993671
Fix validation error
kderusso Jun 16, 2025
b843047
Revert "Include default index option type to defaults"
kderusso Jun 16, 2025
aedfafe
Update tests
kderusso Jun 16, 2025
062eeac
Revert "Update tests"
kderusso Jun 16, 2025
65a3d02
Better fix for null inputs
kderusso Jun 16, 2025
fa22c56
Merge branch 'main' into kderusso/semantic-text-index-options
kderusso Jun 17, 2025
57bd949
Remove redundant merge validation
kderusso Jun 17, 2025
2db090c
Merge branch 'main' into kderusso/semantic-text-index-options
kderusso Jun 17, 2025
a6c04a5
Merge branch 'main' into kderusso/semantic-text-index-options
kderusso Jun 17, 2025
5 changes: 5 additions & 0 deletions docs/changelog/119967.yaml
@@ -0,0 +1,5 @@
+pr: 119967
+summary: Add `index_options` to `semantic_text` field mappings
+area: Mapping
+type: enhancement
+issues: [ ]
55 changes: 45 additions & 10 deletions docs/reference/elasticsearch/mapping-reference/semantic-text.md
@@ -28,7 +28,7 @@ service.

Using `semantic_text`, you won’t need to specify how to generate embeddings for
your data, or how to index it. The {{infer}} endpoint automatically determines
-the embedding generation, indexing, and query to use.
+the embedding generation, indexing, and query to use.
Newly created indices with `semantic_text` fields using dense embeddings will be
[quantized](/reference/elasticsearch/mapping-reference/dense-vector.md#dense-vector-quantization)
to `bbq_hnsw` automatically.
@@ -111,6 +111,33 @@ the [Create {{infer}} API](https://www.elastic.co/docs/api/doc/elasticsearch/ope
to create the endpoint. If not specified, the {{infer}} endpoint defined by
`inference_id` will be used at both index and query time.

+`index_options`
+: (Optional, object) Specifies the index options to override default values
+for the field. Currently, `dense_vector` index options are supported.
+For text embeddings, `index_options` may match any allowed
+[dense_vector index options](/reference/elasticsearch/mapping-reference/dense-vector.md#dense-vector-index-options).
+
+An example of how to set `index_options` for a `semantic_text` field:
+
+```console
+PUT my-index-000004
+{
+  "mappings": {
+    "properties": {
+      "inference_field": {
+        "type": "semantic_text",
+        "inference_id": "my-text-embedding-endpoint",
+        "index_options": {
+          "dense_vector": {
+            "type": "int4_flat"
+          }
+        }
+      }
+    }
+  }
+}
+```

`chunking_settings`
: (Optional, object) Settings for chunking text into smaller passages.
If specified, these will override the chunking settings set in the {{infer-cap}}
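
The `index_options` parameter introduced in the hunk above takes an object keyed by `dense_vector`. As a rough illustration only, the following Python sketch builds the same create-index body as the console example and rejects unknown `type` values before a request is sent. The helper name `semantic_text_mapping` and the `DENSE_VECTOR_TYPES` set are hypothetical and not part of this PR; the set reflects the `dense_vector` index option types as I understand them and may be incomplete for a given Elasticsearch version.

```python
import json

# Assumption: index option types listed in the dense_vector mapping reference.
# This set may not match every Elasticsearch version.
DENSE_VECTOR_TYPES = {
    "hnsw", "int8_hnsw", "int4_hnsw", "bbq_hnsw",
    "flat", "int8_flat", "int4_flat", "bbq_flat",
}


def semantic_text_mapping(field, inference_id, vector_type):
    """Build a create-index body for a semantic_text field with index_options.

    Hypothetical client-side helper; it only assembles the JSON body.
    """
    if vector_type not in DENSE_VECTOR_TYPES:
        raise ValueError(f"unsupported dense_vector index type: {vector_type}")
    return {
        "mappings": {
            "properties": {
                field: {
                    "type": "semantic_text",
                    "inference_id": inference_id,
                    "index_options": {"dense_vector": {"type": vector_type}},
                }
            }
        }
    }


# Same field name, endpoint ID, and type as the console example in the diff.
body = semantic_text_mapping(
    "inference_field", "my-text-embedding-endpoint", "int4_flat"
)
print(json.dumps(body, indent=2))
```

The printed JSON can be sent as the body of `PUT my-index-000004`. Validating the type locally is a convenience in this sketch; the server remains the authority on which options are accepted.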
@@ -138,8 +165,10 @@ To completely disable chunking, use the `none` chunking strategy.
or `1`. Required for `sentence` type chunking settings

::::{warning}
-If the input exceeds the maximum token limit of the underlying model, some services (such as OpenAI) may return an
-error. In contrast, the `elastic` and `elasticsearch` services will automatically truncate the input to fit within the
+If the input exceeds the maximum token limit of the underlying model, some
+services (such as OpenAI) may return an
+error. In contrast, the `elastic` and `elasticsearch` services will
+automatically truncate the input to fit within the
model's limit.
::::

@@ -173,7 +202,8 @@ For more details on chunking and how to configure chunking settings,
see [Configuring chunking](https://www.elastic.co/docs/api/doc/elasticsearch/group/endpoint-inference)
in the Inference API documentation.

-You can pre-chunk the input by sending it to Elasticsearch as an array of strings.
+You can pre-chunk the input by sending it to Elasticsearch as an array of
+strings.
Example:

```console
@@ -203,15 +233,20 @@ PUT test-index/_doc/1
```

1. The text is pre-chunked and provided as an array of strings.
-Each element in the array represents a single chunk that will be sent directly to the inference service without further chunking.
+Each element in the array represents a single chunk that will be sent
+directly to the inference service without further chunking.

**Important considerations**:

-* When providing pre-chunked input, ensure that you set the chunking strategy to `none` to avoid additional processing.
-* Each chunk should be sized carefully, staying within the token limit of the inference service and the underlying model.
-* If a chunk exceeds the model's token limit, the behavior depends on the service:
-    * Some services (such as OpenAI) will return an error.
-    * Others (such as `elastic` and `elasticsearch`) will automatically truncate the input.
+* When providing pre-chunked input, ensure that you set the chunking strategy to
+  `none` to avoid additional processing.
+* Each chunk should be sized carefully, staying within the token limit of the
+  inference service and the underlying model.
+* If a chunk exceeds the model's token limit, the behavior depends on the
+  service:
+    * Some services (such as OpenAI) will return an error.
+    * Others (such as `elastic` and `elasticsearch`) will automatically truncate
+      the input.

Refer
to [this tutorial](docs-content://solutions/search/semantic-search/semantic-search-semantic-text.md)
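
The pre-chunking considerations in the hunk above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the documented workflow: `pre_chunk` is a hypothetical helper, the field name `my_semantic_field` is made up for the example, and counting whitespace-separated words is only a crude proxy for the model's real token limit, which depends on the tokenizer of the inference service.

```python
import json


def pre_chunk(text, max_words=100):
    """Split text into chunks of at most max_words whitespace-separated words.

    Hypothetical helper: word count only approximates a token limit.
    """
    if max_words < 1:
        raise ValueError("max_words must be at least 1")
    words = text.split()
    return [
        " ".join(words[i:i + max_words])
        for i in range(0, len(words), max_words)
    ]


# Mapping body with the chunking strategy set to "none", so each array
# element is sent to the inference service without further chunking.
mapping = {
    "mappings": {
        "properties": {
            "my_semantic_field": {
                "type": "semantic_text",
                "inference_id": "my-text-embedding-endpoint",
                "chunking_settings": {"strategy": "none"},
            }
        }
    }
}

# Document body: the field value is an array of strings, one per chunk.
document = {"my_semantic_field": pre_chunk("one two three four five", max_words=2)}
print(json.dumps(document))
# prints {"my_semantic_field": ["one two", "three four", "five"]}
```

Because services differ in how they handle oversized input (an error versus silent truncation), it is safer to pick `max_words` conservatively and verify chunk sizes against the actual tokenizer where one is available.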