
Commit a59a527

Merge branch 'main' into entitlements/serverless_tests
2 parents ac7d449 + a626d9c commit a59a527


157 files changed (+4746 / -1139 lines changed)


.github/workflows/gradle-wrapper-validation.yml

Lines changed: 2 additions & 2 deletions
@@ -10,5 +10,5 @@ jobs:
 if: github.repository == 'elastic/elasticsearch'
 runs-on: ubuntu-latest
 steps:
-- uses: actions/checkout@v2
-- uses: gradle/wrapper-validation-action@699bb18358f12c5b78b37bb0111d3a0e2276e0e2 # Release v2.1.1
+- uses: actions/checkout@v4
+- uses: gradle/actions/wrapper-validation@ac638b010cf58a27ee6c972d7336334ccaf61c96 # Release v4.4.1

benchmarks/src/main/java/org/elasticsearch/benchmark/_nightly/esql/ValuesSourceReaderBenchmark.java

Lines changed: 0 additions & 1 deletion
@@ -579,7 +579,6 @@ record ItrAndOrd(PrimitiveIterator.OfInt itr, int ord) {}
 pages.add(
 new Page(
 new DocVector(
-
 ShardRefCounted.ALWAYS_REFERENCED,
 blockFactory.newConstantIntBlockWith(0, size).asVector(),
 leafs.build().asBlock().asVector(),

docs/changelog/129369.yaml

Lines changed: 6 additions & 0 deletions
@@ -0,0 +1,6 @@
+pr: 129369
+summary: Support semantic reranking using contextual snippets instead of entire field
+text
+area: Relevance
+type: enhancement
+issues: []

docs/changelog/130279.yaml

Lines changed: 5 additions & 0 deletions
@@ -0,0 +1,5 @@
+pr: 130279
+summary: Fix missing removal of query cancellation callback in QueryPhase
+area: Search
+type: bug
+issues: [130071]

docs/changelog/131694.yaml

Lines changed: 5 additions & 0 deletions
@@ -0,0 +1,5 @@
+pr: 131694
+summary: Accept unsigned longs on MAX and MIN aggregations
+area: ES|QL
+type: enhancement
+issues: []

docs/changelog/131711.yaml

Lines changed: 5 additions & 0 deletions
@@ -0,0 +1,5 @@
+pr: 131711
+summary: Track & log when there is insufficient disk space available to execute merges
+area: Engine
+type: enhancement
+issues: []

docs/reference/elasticsearch/configuration-reference/security-settings.md

Lines changed: 9 additions & 9 deletions
Large diffs are not rendered by default.

docs/reference/elasticsearch/index-settings/index-modules.md

Lines changed: 1 addition & 1 deletion
@@ -267,5 +267,5 @@ $$$index-esql-stored-fields-sequential-proportion$$$
 `index.esql.stored_fields_sequential_proportion`
 : Tuning parameter for deciding when {{esql}} will load [Stored fields](/reference/elasticsearch/rest-apis/retrieve-selected-fields.md#stored-fields) using a strategy tuned for loading dense sequence of documents. Allows values between 0.0 and 1.0 and defaults to 0.2. Indices with documents smaller than 10kb may see speed improvements loading `text` fields by setting this lower.
 
-$$$index-dense-vector-hnsw-early-termination$$$ `index.dense_vector.hnsw_early_termination`
+$$$index-dense-vector-hnsw-early-termination$$$ `index.dense_vector.hnsw_early_termination` {applies_to}`stack: ga 9.2` {applies_to}`serverless: all`
 : Whether to apply _patience_ based early termination strategy to knn queries over HNSW graphs (see [paper](https://cs.uwaterloo.ca/~jimmylin/publications/Teofili_Lin_ECIR2025.pdf)). This is only applicable to `dense_vector` fields with `hnsw`, `int8_hnsw`, `int4_hnsw` and `bbq_hnsw` index types. Defaults to `false`.
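The `index.esql.stored_fields_sequential_proportion` entry above does not spell out the exact decision rule. The Java sketch below assumes, purely for illustration, that the choice compares the density of the requested doc IDs (how many IDs in the covered range are actually requested) against the configured proportion. `SequentialStoredFieldsDecision` and `useSequentialReader` are hypothetical names, not Elasticsearch APIs.

```java
// Hypothetical sketch: decide whether a batch of doc ids is "dense" enough to
// justify a sequential stored-fields reader. The density rule is an assumption,
// not the Elasticsearch implementation.
public final class SequentialStoredFieldsDecision {

    private SequentialStoredFieldsDecision() {}

    // sortedDocIds must be ascending; proportion mirrors the 0.0..1.0 setting.
    static boolean useSequentialReader(int[] sortedDocIds, double proportion) {
        if (proportion < 0.0 || proportion > 1.0) {
            throw new IllegalArgumentException("proportion must be in [0.0, 1.0]: " + proportion);
        }
        if (sortedDocIds.length == 0) {
            return false;
        }
        int first = sortedDocIds[0];
        int last = sortedDocIds[sortedDocIds.length - 1];
        int span = last - first + 1; // number of doc ids in the covered range
        double density = (double) sortedDocIds.length / span;
        return density >= proportion;
    }

    public static void main(String[] args) {
        int[] dense = {10, 11, 12, 13, 15};     // 5 of 6 ids in range
        int[] sparse = {0, 100, 200, 300, 400}; // 5 of 401 ids in range
        System.out.println(useSequentialReader(dense, 0.2));   // true
        System.out.println(useSequentialReader(sparse, 0.2));  // false
        System.out.println(useSequentialReader(sparse, 0.01)); // true: lower threshold
    }
}
```

Under this assumption, lowering the proportion lets the sequential strategy apply to sparser batches, which is consistent with the note that indices with small documents may benefit from a lower value.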

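The `index.dense_vector.hnsw_early_termination` entry refers to a _patience_-based early termination strategy. The Java sketch below illustrates the idea under an assumption: saturation is measured as the fraction of the current top-k results that were also in the previous top-k, and the search stops once that fraction stays at or above a threshold for `patience` consecutive steps. `PatienceTracker` and the threshold/patience values in the example are hypothetical; they are not the Elasticsearch or Lucene implementation or its defaults.

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical sketch of patience-based early termination for an HNSW search;
// not the Elasticsearch or Lucene implementation.
public final class PatienceTracker {
    private final double saturationThreshold; // fraction of top-k that must stay unchanged
    private final int patience;               // consecutive saturated steps before stopping
    private Set<Integer> previousTopK = Set.of();
    private int saturatedSteps = 0;

    PatienceTracker(double saturationThreshold, int patience) {
        this.saturationThreshold = saturationThreshold;
        this.patience = patience;
    }

    // Called after each graph step with the ids currently in the top-k queue.
    // Returns true once the queue has been saturated for `patience` consecutive steps.
    boolean shouldStop(Set<Integer> currentTopK) {
        if (!previousTopK.isEmpty() && !currentTopK.isEmpty()) {
            Set<Integer> unchanged = new HashSet<>(currentTopK);
            unchanged.retainAll(previousTopK);
            double overlap = (double) unchanged.size() / currentTopK.size();
            // Reset the counter as soon as the top-k changes too much.
            saturatedSteps = overlap >= saturationThreshold ? saturatedSteps + 1 : 0;
        }
        previousTopK = Set.copyOf(currentTopK);
        return saturatedSteps >= patience;
    }

    public static void main(String[] args) {
        PatienceTracker tracker = new PatienceTracker(1.0, 2);
        System.out.println(tracker.shouldStop(Set.of(1, 2, 3))); // false: first observation
        System.out.println(tracker.shouldStop(Set.of(1, 2, 4))); // false: top-k changed
        System.out.println(tracker.shouldStop(Set.of(1, 2, 4))); // false: 1 saturated step
        System.out.println(tracker.shouldStop(Set.of(1, 2, 4))); // true: 2 saturated steps
    }
}
```

The trade-off is recall for latency: a lower threshold or smaller patience stops the traversal earlier, at the risk of missing neighbors that would have been found later in the graph.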
docs/reference/elasticsearch/mapping-reference/dense-vector.md

Lines changed: 9 additions & 0 deletions
@@ -396,9 +396,18 @@ POST /my-bit-vectors/_search?filter_path=hits.hits
 
 To better accommodate scaling and performance needs, updating the `type` setting in `index_options` is possible with the [Update Mapping API](https://www.elastic.co/docs/api/doc/elasticsearch/operation/operation-indices-put-mapping), according to the following graph (jumps allowed):
 
+::::{tab-set}
+:::{tab-item} {{stack}} 9.1+
 ```txt
 flat --> int8_flat --> int4_flat --> bbq_flat --> hnsw --> int8_hnsw --> int4_hnsw --> bbq_hnsw
 ```
+:::
+:::{tab-item} {{stack}} 9.0
+```txt
+flat --> int8_flat --> int4_flat --> hnsw --> int8_hnsw --> int4_hnsw
+```
+:::
+::::
 
 For updating all HNSW types (`hnsw`, `int8_hnsw`, `int4_hnsw`, `bbq_hnsw`) the number of connections `m` must either stay the same or increase. For the scalar quantized formats `int8_flat`, `int4_flat`, `int8_hnsw` and `int4_hnsw` the `confidence_interval` must always be consistent (once defined, it cannot change).
 
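The update graphs and constraints in this hunk can be expressed as a small check. The Java sketch below is a hypothetical model, not Elasticsearch code: it treats each graph as an ordered list, allows any forward move (including jumps), rejects types missing from the 9.0 graph, and encodes the `m` and `confidence_interval` rules. It assumes the caller applies the `m` check only to HNSW types and the `confidence_interval` check only to the scalar quantized formats, and it assumes an undefined `confidence_interval` may be set later.

```java
import java.util.List;
import java.util.Objects;

// Hypothetical model of the `index_options.type` update rules for
// `dense_vector` fields; not Elasticsearch code.
public final class DenseVectorTypeUpdates {

    // 9.1+ graph: updates move left to right; skipping types ("jumps") is allowed.
    static final List<String> ORDER_9_1 = List.of(
            "flat", "int8_flat", "int4_flat", "bbq_flat",
            "hnsw", "int8_hnsw", "int4_hnsw", "bbq_hnsw");

    // 9.0 graph: same idea, but without the bbq_* types.
    static final List<String> ORDER_9_0 = List.of(
            "flat", "int8_flat", "int4_flat",
            "hnsw", "int8_hnsw", "int4_hnsw");

    private DenseVectorTypeUpdates() {}

    // True if `to` equals `from` or appears after it in the graph; unknown types are rejected.
    static boolean canUpdateType(List<String> order, String from, String to) {
        int fromIdx = order.indexOf(from);
        int toIdx = order.indexOf(to);
        return fromIdx >= 0 && toIdx >= 0 && toIdx >= fromIdx;
    }

    // HNSW types: the number of connections `m` may stay the same or increase.
    static boolean canUpdateM(int oldM, int newM) {
        return newM >= oldM;
    }

    // Scalar quantized types: once `confidence_interval` is defined it cannot change.
    // A null `oldInterval` means it was never defined (assumed to be settable later).
    static boolean canUpdateConfidenceInterval(Float oldInterval, Float newInterval) {
        return oldInterval == null || Objects.equals(oldInterval, newInterval);
    }

    public static void main(String[] args) {
        System.out.println(canUpdateType(ORDER_9_1, "int8_flat", "bbq_hnsw")); // true: forward jump
        System.out.println(canUpdateType(ORDER_9_1, "hnsw", "flat"));          // false: backwards
        System.out.println(canUpdateType(ORDER_9_0, "int4_flat", "bbq_flat")); // false: not in 9.0 graph
        System.out.println(canUpdateM(16, 12));                                // false: m decreased
        System.out.println(canUpdateConfidenceInterval(0.9f, 0.95f));          // false: changed
    }
}
```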
docs/reference/elasticsearch/mapping-reference/semantic-text.md

Lines changed: 45 additions & 11 deletions
@@ -2,6 +2,9 @@
 navigation_title: "Semantic text"
 mapped_pages:
 - https://www.elastic.co/guide/en/elasticsearch/reference/current/semantic-text.html
+applies_to:
+stack: ga 9.0
+serverless: ga
 ---
 
 # Semantic text field type [semantic-text]
@@ -29,7 +32,8 @@ service.
 Using `semantic_text`, you won’t need to specify how to generate embeddings for
 your data, or how to index it. The {{infer}} endpoint automatically determines
 the embedding generation, indexing, and query to use.
-Newly created indices with `semantic_text` fields using dense embeddings will be
+
+{applies_to}`stack: ga 9.1` Newly created indices with `semantic_text` fields using dense embeddings will be
 [quantized](/reference/elasticsearch/mapping-reference/dense-vector.md#dense-vector-quantization)
 to `bbq_hnsw` automatically.
 
@@ -111,13 +115,13 @@ the [Create {{infer}} API](https://www.elastic.co/docs/api/doc/elasticsearch/ope
 to create the endpoint. If not specified, the {{infer}} endpoint defined by
 `inference_id` will be used at both index and query time.
 
-`index_options`
+`index_options` {applies_to}`stack: ga 9.1`
 : (Optional, object) Specifies the index options to override default values
 for the field. Currently, `dense_vector` index options are supported.
 For text embeddings, `index_options` may match any allowed
 [dense_vector index options](/reference/elasticsearch/mapping-reference/dense-vector.md#dense-vector-index-options).
 
-`chunking_settings`
+`chunking_settings` {applies_to}`stack: ga 9.1`
 : (Optional, object) Settings for chunking text into smaller passages.
 If specified, these will override the chunking settings set in the {{infer-cap}}
 endpoint associated with `inference_id`.
@@ -127,8 +131,8 @@ To completely disable chunking, use the `none` chunking strategy.
 
 **Valid values for `chunking_settings`**:
 
-`type`
-: Indicates the type of chunking strategy to use. Valid values are `none`, `word` or
+`strategy`
+: Indicates the strategy of chunking strategy to use. Valid values are `none`, `word` or
 `sentence`. Required.
 
 `max_chunk_size`
@@ -144,7 +148,8 @@ To completely disable chunking, use the `none` chunking strategy.
 or `1`. Required for `sentence` type chunking settings
 
 ::::{warning}
-When using the `none` chunking strategy, if the input exceeds the maximum token limit of the underlying model, some
+When using the `none` chunking strategy, if the input exceeds the maximum token
+limit of the underlying model, some
 services (such as OpenAI) may return an
 error. In contrast, the `elastic` and `elasticsearch` services will
 automatically truncate the input to fit within the
@@ -181,6 +186,15 @@ For more details on chunking and how to configure chunking settings,
 see [Configuring chunking](https://www.elastic.co/docs/api/doc/elasticsearch/group/endpoint-inference)
 in the Inference API documentation.
 
+Refer
+to [this tutorial](docs-content://solutions/search/semantic-search/semantic-search-semantic-text.md)
+to learn more about semantic search using `semantic_text`.
+
+### Pre-chunking [pre-chunking]
+```{applies_to}
+stack: ga 9.1
+```
+
 You can pre-chunk the input by sending it to Elasticsearch as an array of
 strings.
 Example:
@@ -227,10 +241,6 @@ PUT test-index/_doc/1
 * Others (such as `elastic` and `elasticsearch`) will automatically truncate
 the input.
 
-Refer
-to [this tutorial](docs-content://solutions/search/semantic-search/semantic-search-semantic-text.md)
-to learn more about semantic search using `semantic_text`.
-
 ## Extracting relevant fragments from semantic text [semantic-text-highlighting]
 
 You can extract the most relevant fragments from a semantic text field by using
@@ -294,8 +304,14 @@ specified. It enables you to quickstart your semantic search by providing
 automatic {{infer}} and a dedicated query so you don’t need to provide further
 details.
 
+### Customizing using `semantic_text` parameters [custom-by-parameters]
+```{applies_to}
+stack: ga 9.1
+```
+
 If you want to override those defaults and customize the embeddings that
-`semantic_text` indexes, you can do so by modifying [parameters](#semantic-text-params):
+`semantic_text` indexes, you can do so by
+modifying [parameters](#semantic-text-params):
 
 - Use `index_options` to specify alternate index options such as specific
 `dense_vector` quantization methods
@@ -326,6 +342,24 @@ PUT my-index-000004
 }
 ```
 
+### Customizing using ingest pipelines [custom-by-pipelines]
+```{applies_to}
+stack: ga 9.0
+```
+
+In case you want to customize data indexing, use the
+[`sparse_vector`](/reference/elasticsearch/mapping-reference/sparse-vector.md)
+or [`dense_vector`](/reference/elasticsearch/mapping-reference/dense-vector.md)
+field types and create an ingest pipeline with an
+[{{infer}} processor](/reference/enrich-processor/inference-processor.md) to
+generate the embeddings.
+[This tutorial](docs-content://solutions/search/semantic-search/semantic-search-inference.md)
+walks you through the process. In these cases - when you use `sparse_vector` or
+`dense_vector` field types instead of the `semantic_text` field type to
+customize indexing - using the
+[`semantic_query`](/reference/query-languages/query-dsl/query-dsl-semantic-query.md)
+is not supported for querying the field data.
+
 ## Updates to `semantic_text` fields [update-script]
 
 For indices containing `semantic_text` fields, updates that use scripts have the
