Commit 474c9a7

Merge branch 'main' into indexing_pressure_bulk_inference
2 parents c85718d + 21d1c78 commit 474c9a7

136 files changed: 2344 additions & 728 deletions

CONTRIBUTING.md

Lines changed: 7 additions & 10 deletions
@@ -168,16 +168,13 @@ You can import the Elasticsearch project into IntelliJ IDEA via:

 #### Checkstyle

-If you have the [Checkstyle] plugin installed, you can configure IntelliJ to
-check the Elasticsearch code. However, the Checkstyle configuration file does
-not work by default with the IntelliJ plugin, so instead an IDE-specific config
-file is generated automatically after IntelliJ finishes syncing. You can
-manually generate the file with `./gradlew configureIdeCheckstyle` in case
-it is removed due to a `./gradlew clean` or other action.
-
-IntelliJ should be automatically configured to use the generated rules after
-import via the `.idea/checkstyle-idea.xml` configuration file. No further
-action is required.
+IntelliJ should automatically configure checkstyle. It does so by running
+`configureIdeCheckstyle` on import. That makes `.idea/checkstyle-idea.xml`
+configuration file. IntelliJ points checkstyle at that.
+
+Things like `./gradlew clean` or `git clean -xdf` can nuke the file. You can
+regenerate it by running `./gradlew -Didea.active=true configureIdeCheckstyle`,
+but generally shouldn't have to.

 #### Formatting

docs/changelog/127472.yaml

Lines changed: 6 additions & 0 deletions
@@ -0,0 +1,6 @@
+pr: 127472
+summary: Change queries ID to be the same as the async
+area: ES|QL
+type: feature
+issues:
+- 127187

docs/changelog/129150.yaml

Lines changed: 6 additions & 0 deletions
@@ -0,0 +1,6 @@
+pr: 129150
+summary: Add `none` chunking strategy to disable automatic chunking for inference
+  endpoints
+area: Machine Learning
+type: feature
+issues: []

docs/changelog/129176.yaml

Lines changed: 6 additions & 0 deletions
@@ -0,0 +1,6 @@
+pr: 129176
+summary: Adjust unpromotable shard refresh request validation to allow `RefreshResult.NO_REFRESH`
+area: Searchable Snapshots
+type: bug
+issues:
+- 129036

docs/changelog/129223.yaml

Lines changed: 5 additions & 0 deletions
@@ -0,0 +1,5 @@
+pr: 129223
+summary: Fix text similarity reranker does not propagate min score correctly
+area: Search
+type: bug
+issues: []

docs/changelog/129278.yaml

Lines changed: 5 additions & 0 deletions
@@ -0,0 +1,5 @@
+pr: 129278
+summary: Fix constant keyword optimization
+area: ES|QL
+type: bug
+issues: []

docs/changelog/129326.yaml

Lines changed: 6 additions & 0 deletions
@@ -0,0 +1,6 @@
+pr: 129326
+summary: Check positions on `MultiPhraseQueries` as well as phrase queries
+area: Search
+type: bug
+issues:
+- 123871

docs/changelog/129359.yaml

Lines changed: 5 additions & 0 deletions
@@ -0,0 +1,5 @@
+pr: 129359
+summary: Add min score linear retriever
+area: Search
+type: enhancement
+issues: []

docs/reference/elasticsearch/mapping-reference/semantic-text.md

Lines changed: 50 additions & 4 deletions
@@ -117,15 +117,16 @@ If specified, these will override the chunking settings set in the {{infer-cap}}
 endpoint associated with `inference_id`.
 If chunking settings are updated, they will not be applied to existing documents
 until they are reindexed.
+To completely disable chunking, use the `none` chunking strategy.

 **Valid values for `chunking_settings`**:

 `type`
-: Indicates the type of chunking strategy to use. Valid values are `word` or
+: Indicates the type of chunking strategy to use. Valid values are `none`, `word` or
 `sentence`. Required.

 `max_chunk_size`
-: The maximum number of works in a chunk. Required.
+: The maximum number of words in a chunk. Required for `word` and `sentence` strategies.

 `overlap`
 : The number of overlapping words allowed in chunks. This cannot be defined as
@@ -136,6 +137,12 @@ until they are reindexed.
 : The number of overlapping sentences allowed in chunks. Valid values are `0`
 or `1`. Required for `sentence` type chunking settings

+::::{warning}
+If the input exceeds the maximum token limit of the underlying model, some services (such as OpenAI) may return an
+error. In contrast, the `elastic` and `elasticsearch` services will automatically truncate the input to fit within the
+model's limit.
+::::
+
 ## {{infer-cap}} endpoint validation [infer-endpoint-validation]

 The `inference_id` will not be validated when the mapping is created, but when
@@ -166,10 +173,49 @@ For more details on chunking and how to configure chunking settings,
 see [Configuring chunking](https://www.elastic.co/docs/api/doc/elasticsearch/group/endpoint-inference)
 in the Inference API documentation.

+You can pre-chunk the input by sending it to Elasticsearch as an array of strings.
+Example:
+
+```console
+PUT test-index
+{
+  "mappings": {
+    "properties": {
+      "my_semantic_field": {
+        "type": "semantic_text",
+        "chunking_settings": {
+          "strategy": "none" <1>
+        }
+      }
+    }
+  }
+}
+```
+
+1. Disable chunking on `my_semantic_field`.
+
+```console
+PUT test-index/_doc/1
+{
+  "my_semantic_field": ["my first chunk", "my second chunk", ...] <1>
+  ...
+}
+```
+
+1. The text is pre-chunked and provided as an array of strings.
+Each element in the array represents a single chunk that will be sent directly to the inference service without further chunking.
+
+**Important considerations**:
+
+* When providing pre-chunked input, ensure that you set the chunking strategy to `none` to avoid additional processing.
+* Each chunk should be sized carefully, staying within the token limit of the inference service and the underlying model.
+* If a chunk exceeds the model's token limit, the behavior depends on the service:
+    * Some services (such as OpenAI) will return an error.
+    * Others (such as `elastic` and `elasticsearch`) will automatically truncate the input.
+
 Refer
 to [this tutorial](docs-content://solutions/search/semantic-search/semantic-search-semantic-text.md)
-to learn more about semantic search using `semantic_text` and the `semantic`
-query.
+to learn more about semantic search using `semantic_text`.

 ## Extracting Relevant Fragments from Semantic Text [semantic-text-highlighting]

docs/reference/elasticsearch/mapping-reference/term-vector.md

Lines changed: 2 additions & 0 deletions
@@ -14,6 +14,8 @@ Term vectors contain information about the terms produced by the [analysis](docs

 These term vectors can be stored so that they can be retrieved for a particular document.

+Refer to the [term vectors API examples](../rest-apis/term-vectors-examples.md) page for usage examples.
+
 The `term_vector` setting accepts:

 `no`
