Commit 6e5ec67

Merge branch 'main' into fix/134809-1
2 parents b24f7af + 59c3601 commit 6e5ec67

206 files changed: +4086 -2035 lines changed

build-tools-internal/src/integTest/groovy/org/elasticsearch/gradle/internal/transport/TransportVersionGenerationFuncTest.groovy

Lines changed: 10 additions & 2 deletions
@@ -90,6 +90,9 @@ class TransportVersionGenerationFuncTest extends AbstractTransportVersionFuncTes
         assertUpperBound("9.2", "new_tv,8124000")
     }
 
+    /*
+    temporarily muted, see https://github.com/elastic/elasticsearch/pull/135226
+
     def "invalid changes to a upper bounds should be reverted"() {
         given:
         transportVersionUpperBound("9.2", "modification", "9000000")
@@ -144,7 +147,7 @@ class TransportVersionGenerationFuncTest extends AbstractTransportVersionFuncTes
         assertReferableDefinitionDoesNotExist("test_tv")
         assertUpperBound("9.2", "existing_92,8123000")
         assertUpperBound("9.1", "existing_92,8012001")
-    }
+    }*/
 
     def "a reference can be renamed"() {
         given:
@@ -242,8 +245,11 @@ class TransportVersionGenerationFuncTest extends AbstractTransportVersionFuncTes
     def "unreferenced definitions are removed"() {
         given:
         referableTransportVersion("test_tv", "8124000,8012002")
+        /*
+        TODO: reset of upper bounds
         transportVersionUpperBound("9.2", "test_tv", "8124000")
         transportVersionUpperBound("9.1", "test_tv", "8012002")
+        */
 
         when:
         def result = runGenerateAndValidateTask().build()
@@ -406,6 +412,8 @@ class TransportVersionGenerationFuncTest extends AbstractTransportVersionFuncTes
         assertUpperBound("9.2", "new_tv,8124000")
     }
 
+    /*
+    TODO: reset of upper bounds
     def "deleted upper bounds files are restored"() {
         given:
         file("myserver/src/main/resources/transport/upper_bounds/9.2.csv").delete()
@@ -416,7 +424,7 @@ class TransportVersionGenerationFuncTest extends AbstractTransportVersionFuncTes
         then:
         assertGenerateAndValidateSuccess(result)
         assertUpperBound("9.2", "existing_92,8123000")
-    }
+    }*/
 
     def "upper bounds files must exist for backport branches"() {
         when:

build-tools-internal/src/main/java/org/elasticsearch/gradle/internal/transport/GenerateTransportVersionDefinitionTask.java

Lines changed: 2 additions & 1 deletion
@@ -103,7 +103,8 @@ public void run() throws IOException {
 
         getLogger().lifecycle("Generating transport version name: " + targetDefinitionName);
         if (targetDefinitionName.isEmpty()) {
-            resetAllUpperBounds(resources);
+            // TODO: resetting upper bounds needs to be done locally, otherwise it pulls in some (incomplete) changes from upstream main
+            // resetAllUpperBounds(resources);
         } else {
             List<TransportVersionId> ids = updateUpperBounds(resources, upstreamUpperBounds, targetUpperBoundNames, targetDefinitionName);
             // (Re)write the definition file.

docs/changelog/135247.yaml

Lines changed: 5 additions & 0 deletions
@@ -0,0 +1,5 @@
+pr: 135247
+summary: Fix alias id when drop all aggregates
+area: ES|QL
+type: bug
+issues: []

docs/changelog/135263.yaml

Lines changed: 5 additions & 0 deletions
@@ -0,0 +1,5 @@
+pr: 135263
+summary: Optimize `dotCount` in expanding dot parser
+area: "Mapping"
+type: enhancement
+issues: []

docs/changelog/135270.yaml

Lines changed: 5 additions & 0 deletions
@@ -0,0 +1,5 @@
+pr: 135270
+summary: Add .reindexed-v7-ml-anomalies-* to anomaly results template index pattern
+area: Machine Learning
+type: bug
+issues: []

docs/changelog/135299.yaml

Lines changed: 47 additions & 0 deletions
@@ -0,0 +1,47 @@
+pr: 135299
+summary: Release DiskBBQ (`bbq_disk`) index type for `dense_vector` fields
+area: Vector Search
+type: feature
+issues: []
+highlight:
+  title: Release DiskBBQ (`bbq_disk`) index type for `dense_vector` fields
+  body: |-
+    This provides a new index type called DiskBBQ (`bbq_disk`).
+    DiskBBQ is a cluster-based format that provides:
+    - Faster and cheaper indexing than HNSW
+    - Better behavior in low-memory environments (degrades linearly, not exponentially)
+    - QPS close to HNSW when the index is in memory
+
+    Current restrictions:
+    - Only floating-point values are currently allowed
+    - Quantization is to a single bit only, so it is not recommended for low-dimensionality vectors
+    - All other restrictions that exist for `dense_vector` fields still apply
+
+    Using the format is just like using any other index type:
+    [source,yaml]
+    ----------------------------
+    PUT vectors
+    {
+      "mappings": {
+        "properties": {
+          "vector": {"type": "dense_vector", "index_options": {"type": "bbq_disk"}}
+        }
+      }
+    }
+    ----------------------------
+    Querying works the same as for any other `dense_vector` field.
+    [source,yaml]
+    ----------------------------
+    POST vectors/_search
+    {
+      "query": {
+        "knn": {
+          "field": "vector",
+          "query_vector": <vector>,
+          "k": 3
+        }
+      }
+    }
+    ----------------------------
+    `num_candidates` can be used to tune the approximate nature of the search; for more granular control, set `visit_percentage` directly.
+  notable: true
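
The two console requests in the changelog body can be sanity-checked by building them as plain data and serializing them to JSON, which catches unbalanced braces early. This is a minimal Python sketch, not part of the commit. It assumes the index type name `bbq_disk` used in `dense-vector.md` in this commit; the helper names, the `dims` value, and the example query vector are illustrative assumptions.

```python
import json


def bbq_disk_mapping(field: str = "vector", dims: int = 768) -> dict:
    """Request body for `PUT vectors`: a dense_vector field indexed with bbq_disk.

    `dims` is an illustrative assumption; the changelog example omits it.
    """
    return {
        "mappings": {
            "properties": {
                field: {
                    "type": "dense_vector",
                    "dims": dims,
                    "index_options": {"type": "bbq_disk"},
                }
            }
        }
    }


def knn_search_body(query_vector: list, k: int = 3, field: str = "vector") -> dict:
    """Request body for `POST vectors/_search` using the knn query shown in the changelog."""
    return {"query": {"knn": {"field": field, "query_vector": query_vector, "k": k}}}


if __name__ == "__main__":
    # json.dumps always emits balanced, well-formed JSON for these dicts.
    print(json.dumps(bbq_disk_mapping(), indent=2))
    print(json.dumps(knn_search_body([0.0] * 768), indent=2)[:200])
```

Generating the bodies from data structures rather than hand-written text avoids the kind of brace mismatch that is easy to introduce in a changelog snippet.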

docs/reference/elasticsearch/mapping-reference/dense-vector.md

Lines changed: 12 additions & 5 deletions
@@ -341,18 +341,19 @@ $$$dense-vector-index-options$$$
 `type`
 :   (Required, string) The type of kNN algorithm to use. Can be either any of:
     * `hnsw` - This utilizes the [HNSW algorithm](https://arxiv.org/abs/1603.09320) for scalable approximate kNN search. This supports all `element_type` values.
-    * `int8_hnsw` - The default index type for some float vectors:
+    * `int8_hnsw` - The default index type for some float vectors:
       * {applies_to}`stack: ga 9.1` Default for float vectors with less than 384 dimensions.
       * {applies_to}`stack: ga 9.0` Default for float all vectors.
       This utilizes the [HNSW algorithm](https://arxiv.org/abs/1603.09320) in addition to automatically scalar quantization for scalable approximate kNN search with `element_type` of `float`. This can reduce the memory footprint by 4x at the cost of some accuracy. See [Automatically quantize vectors for kNN search](#dense-vector-quantization).
     * `int4_hnsw` - This utilizes the [HNSW algorithm](https://arxiv.org/abs/1603.09320) in addition to automatically scalar quantization for scalable approximate kNN search with `element_type` of `float`. This can reduce the memory footprint by 8x at the cost of some accuracy. See [Automatically quantize vectors for kNN search](#dense-vector-quantization).
     * `bbq_hnsw` - This utilizes the [HNSW algorithm](https://arxiv.org/abs/1603.09320) in addition to automatically binary quantization for scalable approximate kNN search with `element_type` of `float`. This can reduce the memory footprint by 32x at the cost of accuracy. See [Automatically quantize vectors for kNN search](#dense-vector-quantization).
-
+
       {applies_to}`stack: ga 9.1` `bbq_hnsw` is the default index type for float vectors with greater than or equal to 384 dimensions.
     * `flat` - This utilizes a brute-force search algorithm for exact kNN search. This supports all `element_type` values.
-    * `int8_flat` - This utilizes a brute-force search algorithm in addition to automatically scalar quantization. Only supports `element_type` of `float`.
-    * `int4_flat` - This utilizes a brute-force search algorithm in addition to automatically half-byte scalar quantization. Only supports `element_type` of `float`.
-    * `bbq_flat` - This utilizes a brute-force search algorithm in addition to automatically binary quantization. Only supports `element_type` of `float`.
+    * `int8_flat` - This utilizes a brute-force search algorithm in addition to automatic scalar quantization. Only supports `element_type` of `float`.
+    * `int4_flat` - This utilizes a brute-force search algorithm in addition to automatic half-byte scalar quantization. Only supports `element_type` of `float`.
+    * `bbq_flat` - This utilizes a brute-force search algorithm in addition to automatic binary quantization. Only supports `element_type` of `float`.
+    * {applies_to}`stack: ga 9.2` `bbq_disk` - This utilizes a variant of the [k-means clustering algorithm](https://en.wikipedia.org/wiki/K-means_clustering) in addition to automatic binary quantization to partition vectors and search subspaces rather than an entire graph structure, as with HNSW. Only supports `element_type` of `float`. This combines the benefits of BBQ quantization with partitioning to further reduce the required memory overhead compared with HNSW, and it can run effectively at the smallest possible RAM and heap sizes, where HNSW would otherwise cause swapping and grind to a halt. DiskBBQ scales largely linearly with the total RAM, and search performance improves at scale because only a subset of the total vector space is loaded.
 
 `m`
 :   (Optional, integer) The number of neighbors each node will be connected to in the HNSW graph. Defaults to `16`. Only applicable to `hnsw`, `int8_hnsw`, `int4_hnsw` and `bbq_hnsw` index types.
@@ -363,6 +364,12 @@ $$$dense-vector-index-options$$$
 `confidence_interval`
 :   (Optional, float) Only applicable to `int8_hnsw`, `int4_hnsw`, `int8_flat`, and `int4_flat` index types. The confidence interval to use when quantizing the vectors. Can be any value between and including `0.90` and `1.0` or exactly `0`. When the value is `0`, this indicates that dynamic quantiles should be calculated for optimized quantization. When between `0.90` and `1.0`, this value restricts the values used when calculating the quantization thresholds. For example, a value of `0.95` will only use the middle 95% of the values when calculating the quantization thresholds (e.g. the highest and lowest 2.5% of values will be ignored). Defaults to `1/(dims + 1)` for `int8` quantized vectors and `0` for `int4` for dynamic quantile calculation.
 
+`default_visit_percentage` {applies_to}`stack: ga 9.2`
+:   (Optional, integer) Only applicable to `bbq_disk`. Must be between 0 and 100. A value of 0 falls back to using `num_candidates` to calculate the percentage visited. Increasing `default_visit_percentage` tends to improve the accuracy of the final results. Defaults to ~1% per shard for every 1 million vectors.
+
+`cluster_size` {applies_to}`stack: ga 9.2`
+:   (Optional, integer) Only applicable to `bbq_disk`. The number of vectors per cluster. Smaller cluster sizes increase accuracy at the cost of performance. Defaults to `384`. Must be a value between `64` and `65536`.
+
 `rescore_vector` {applies_to}`stack: preview 9.0, ga 9.1`
 :   (Optional, object) An optional section that configures automatic vector rescoring on knn queries for the given field. Only applicable to quantized index types.
 :::::{dropdown} Properties of rescore_vector
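
The constraints documented for the two new `bbq_disk` options can be expressed as a small client-side check. This is a hedged Python sketch, not Elasticsearch's own validation. The function name is hypothetical, and treating both ranges as inclusive is an assumption, since the text only says "between".

```python
CLUSTER_SIZE_RANGE = (64, 65536)    # bounds quoted in the cluster_size entry
VISIT_PERCENTAGE_RANGE = (0, 100)   # bounds quoted in the default_visit_percentage entry
BBQ_DISK_ONLY = ("cluster_size", "default_visit_percentage")


def check_bbq_disk_options(index_options: dict) -> list:
    """Return a list of problems found in an index_options dict (empty if none).

    Assumption: both documented ranges are inclusive.
    """
    problems = []
    if index_options.get("type") != "bbq_disk":
        # The docs state both options are only applicable to bbq_disk.
        for name in BBQ_DISK_ONLY:
            if name in index_options:
                problems.append(f"{name} is only applicable to bbq_disk")
        return problems

    checks = (
        ("cluster_size", CLUSTER_SIZE_RANGE),
        ("default_visit_percentage", VISIT_PERCENTAGE_RANGE),
    )
    for name, (low, high) in checks:
        value = index_options.get(name)
        # An omitted option is fine: Elasticsearch applies its documented default.
        if value is not None and not low <= value <= high:
            problems.append(f"{name} must be between {low} and {high}, got {value}")
    return problems


if __name__ == "__main__":
    print(check_bbq_disk_options({"type": "bbq_disk", "cluster_size": 32}))
```

Returning a list of messages instead of raising on the first failure lets a caller report every problem in a mapping at once.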

docs/reference/elasticsearch/rest-apis/retrievers/knn-retriever.md

Lines changed: 6 additions & 0 deletions
@@ -41,6 +41,12 @@ A kNN retriever returns top documents from a [k-nearest neighbor search (kNN)](d
     The number of nearest neighbor candidates to consider per shard. Needs to be greater than `k`, or `size` if `k` is omitted, and cannot exceed 10,000. {{es}} collects `num_candidates` results from each shard, then merges them to find the top `k` results. Increasing `num_candidates` tends to improve the accuracy of the final `k` results. Defaults to `Math.min(1.5 * k, 10_000)`.
 
 
+`visit_percentage` {applies_to}`stack: ga 9.2`
+:   (Optional, float)
+
+    The percentage of vectors to explore per shard during kNN search with `bbq_disk`. Must be between 0 and 100. A value of 0 falls back to using `num_candidates` to calculate the percentage visited. Increasing `visit_percentage` tends to improve the accuracy of the final results. If `visit_percentage` is set for `bbq_disk`, `num_candidates` is ignored. Defaults to ~1% per shard for every 1 million vectors.
+
+
 `filter`
 :   (Optional, [query object or list of query objects](/reference/query-languages/querydsl.md))
 
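
To show how `visit_percentage` fits into a request, here is a minimal Python sketch that builds a kNN retriever body and enforces the documented 0 to 100 range. It is not part of the commit. The helper name is hypothetical, the inclusive range is an assumption, and `num_candidates` is left out when `visit_percentage` is given because the text above says it is ignored in that case.

```python
import json
from typing import Optional


def knn_retriever_body(
    field: str,
    query_vector: list,
    k: int,
    num_candidates: Optional[int] = None,
    visit_percentage: Optional[float] = None,
) -> dict:
    """Build a search body with a knn retriever for a bbq_disk field."""
    knn = {"field": field, "query_vector": query_vector, "k": k}

    if visit_percentage is not None:
        # Documented constraint: must be between 0 and 100 (inclusive is assumed).
        if not 0 <= visit_percentage <= 100:
            raise ValueError("visit_percentage must be between 0 and 100")
        # Documented precedence: when visit_percentage is set, num_candidates is ignored,
        # so this sketch does not send it at all.
        knn["visit_percentage"] = visit_percentage
    elif num_candidates is not None:
        knn["num_candidates"] = num_candidates

    return {"retriever": {"knn": knn}}


if __name__ == "__main__":
    body = knn_retriever_body("vector", [0.1, 0.2, 0.3], k=3, visit_percentage=5.0)
    print(json.dumps(body, indent=2))
```

Omitting both tuning parameters leaves the choice to the server-side defaults described above.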

docs/reference/query-languages/esql.md

Lines changed: 3 additions & 0 deletions
@@ -1,4 +1,7 @@
 ---
+applies_to:
+  stack:
+  serverless:
 navigation_title: "{{esql}}"
 mapped_pages:
   - https://www.elastic.co/guide/en/elasticsearch/reference/current/esql-language.html

docs/reference/query-languages/esql/_snippets/commands/layout/lookup-join.md

Lines changed: 1 addition & 0 deletions
@@ -1,5 +1,6 @@
 ```yaml {applies_to}
 stack: preview 9.0.0, ga 9.1.0
+serverless: ga
 ```
 
 `LOOKUP JOIN` enables you to add data from another index, AKA a 'lookup'
