Commit a9f6ac8

Merge branch 'main' into tests/TSDBDocValuesFormatSingleNodeTests
2 parents 8b4db30 + 8b48be1

File tree

89 files changed: +3851 −1398 lines


AGENTS.md

Lines changed: 15 additions & 0 deletions

@@ -95,4 +95,19 @@ The repository is organized into several key directories:
 ## Backwards compatibility
 - For changes to a `Writeable` implementation (`writeTo` and constructor from `StreamInput`), add a new `public static final TransportVersion <UNIQUE_DESCRIPTIVE_NAME> = TransportVersion.fromName("<unique_descriptive_name>")` and use it in the new code paths. Confirm the backport branches and then generate a new version file with `./gradlew generateTransportVersion`.
 
+### CI failure triage with Buildkite and Gradle Enterprise build scans
+- Prefer Gradle Enterprise build scans (`https://gradle-enterprise.elastic.co/s/<id>`) over raw logs for root-cause analysis when available.
+- If given a Buildkite link, use the Buildkite MCP server first.
+- First call `buildkite-list_annotations` and inspect `context=gradle-build-scans-failed` (failed jobs only). If needed, inspect `context=gradle-build-scans` (all jobs).
+- If annotations are incomplete, call `buildkite-get_build` and map failed job IDs to `meta_data` keys: `build-scan-<job_id>` and `build-scan-id-<job_id>`.
+- Buildkite UI fallback (when MCP is unavailable): Build page -> `Jobs` -> `Failures`, then open/copy the Gradle Enterprise build scan links shown per failed job.
+- If given a Gradle Enterprise build scan link directly, start from that link instead of searching Buildkite logs first.
+- If `dvcli` is available, use it to extract failed tasks, exact failed tests, primary assertion/error, and reproduction details.
+- If `dvcli` is unavailable, do not block: continue with Buildkite MCP logs (`buildkite-search_logs`, `buildkite-tail_logs`, `buildkite-read_logs`), artifacts, and annotations.
+- If either tool is missing, suggest installation to the user for faster future triage:
+  - `dvcli` / `develocity-cli-client`: `https://github.com/breskeby/develocity-cli-client`
+  - Buildkite MCP setup for AI tools: `https://buildkite.com/docs/apis/mcp-server/remote/configuring-ai-tools`
+- For Buildkite URLs that include `#<job_id>`, prioritize that specific job and resolve its corresponding `build-scan-<job_id>` entry.
+- In reports, list exact failed tests first, then failed tasks and related build scan URLs.
+
 Stay aligned with `CONTRIBUTING.md`, `BUILDING.md`, and `TESTING.asciidoc`; this AGENTS guide summarizes—but does not replace—those authoritative docs.

docs/changelog/143228.yaml

Lines changed: 5 additions & 0 deletions

@@ -0,0 +1,5 @@
+area: ES|QL
+issues: []
+pr: 143228
+summary: "Data sources: ZSTD, BZIP2"
+type: feature

docs/reference/search-connectors/release-notes.md

Lines changed: 6 additions & 0 deletions

@@ -13,6 +13,12 @@ If you are an Enterprise Search user and want to upgrade to Elastic 9.0, refer t
 It includes detailed steps, tooling, and resources to help you transition to supported alternatives in 9.x, such as Elasticsearch, the Open Web Crawler, and self-managed connectors.
 :::
 
+## 9.3.1 [connectors-9.3.1-release-notes]
+* Fixed an issue where MultiService would enter an unresponsive state instead of shutting down cleanly when a managed service crashed with an unhandled exception. ([#3940](https://github.com/elastic/connectors/pull/3940), [#3939](https://github.com/elastic/connectors/issues/3939))
+
+## 9.2.6 [connectors-9.2.6-release-notes]
+* Fixed an issue where MultiService would enter an unresponsive state instead of shutting down cleanly when a managed service crashed with an unhandled exception. ([#3940](https://github.com/elastic/connectors/pull/3940), [#3939](https://github.com/elastic/connectors/issues/3939))
+
 ## 9.3.0 [connectors-9.3.0-release-notes]
 
 ### Fixes [connectors-9.3.0-fixes]

libs/native/libraries/build.gradle

Lines changed: 1 addition & 1 deletion

@@ -19,7 +19,7 @@ configurations {
 }
 
 var zstdVersion = "1.5.7"
-var vecVersion = "1.0.41"
+var vecVersion = "1.0.42"
 
 repositories {
     exclusiveContent {

libs/simdvec/native/publish_vec_binaries.sh

Lines changed: 1 addition & 1 deletion

@@ -20,7 +20,7 @@ if [ -z "$ARTIFACTORY_API_KEY" ]; then
   exit 1;
 fi
 
-VERSION="1.0.41"
+VERSION="1.0.42"
 ARTIFACTORY_REPOSITORY="${ARTIFACTORY_REPOSITORY:-https://artifactory.elastic.dev/artifactory/elasticsearch-native/}"
 TEMP=$(mktemp -d)

libs/simdvec/native/src/vec/c/aarch64/vec_1.cpp

Lines changed: 156 additions & 67 deletions

@@ -355,7 +355,98 @@ static inline void sqri8_inner_bulk(
     const int32_t count,
     f32_t* results
 ) {
-    for (int c = 0; c < count; c++) {
+    const int blk = dims & ~15;
+    int c = 0;
+
+    // Process 4 vectors at a time; this helps the CPU scheduler/prefetcher.
+    // Loading multiple memory locations while computing gives the prefetcher
+    // information on where the data to load will be next, and keeps the CPU
+    // execution units busy.
+    // Our benchmarks show that this "hint" is more effective than using
+    // explicit prefetch instructions (e.g. __builtin_prefetch) on many ARM
+    // processors (e.g. Graviton)
+    for (; c + 3 < count; c += 4) {
+        const int8_t* a0 = a + mapper(c, offsets) * pitch;
+        const int8_t* a1 = a + mapper(c + 1, offsets) * pitch;
+        const int8_t* a2 = a + mapper(c + 2, offsets) * pitch;
+        const int8_t* a3 = a + mapper(c + 3, offsets) * pitch;
+
+        int32x4_t acc0 = vdupq_n_s32(0);
+        int32x4_t acc1 = vdupq_n_s32(0);
+        int32x4_t acc2 = vdupq_n_s32(0);
+        int32x4_t acc3 = vdupq_n_s32(0);
+        int32x4_t acc4 = vdupq_n_s32(0);
+        int32x4_t acc5 = vdupq_n_s32(0);
+        int32x4_t acc6 = vdupq_n_s32(0);
+        int32x4_t acc7 = vdupq_n_s32(0);
+
+        for (int i = 0; i < blk; i += 16) {
+            int8x16_t vb = vld1q_s8(b + i);
+
+            int8x16_t v0 = vld1q_s8(a0 + i);
+            int16x8_t d0_lo = vsubl_s8(vget_low_s8(v0), vget_low_s8(vb));
+            int16x8_t d0_hi = vsubl_s8(vget_high_s8(v0), vget_high_s8(vb));
+            acc0 = vmlal_s16(acc0, vget_low_s16(d0_lo), vget_low_s16(d0_lo));
+            acc1 = vmlal_s16(acc1, vget_high_s16(d0_lo), vget_high_s16(d0_lo));
+            acc0 = vmlal_s16(acc0, vget_low_s16(d0_hi), vget_low_s16(d0_hi));
+            acc1 = vmlal_s16(acc1, vget_high_s16(d0_hi), vget_high_s16(d0_hi));
+
+            int8x16_t v1 = vld1q_s8(a1 + i);
+            int16x8_t d1_lo = vsubl_s8(vget_low_s8(v1), vget_low_s8(vb));
+            int16x8_t d1_hi = vsubl_s8(vget_high_s8(v1), vget_high_s8(vb));
+            acc2 = vmlal_s16(acc2, vget_low_s16(d1_lo), vget_low_s16(d1_lo));
+            acc3 = vmlal_s16(acc3, vget_high_s16(d1_lo), vget_high_s16(d1_lo));
+            acc2 = vmlal_s16(acc2, vget_low_s16(d1_hi), vget_low_s16(d1_hi));
+            acc3 = vmlal_s16(acc3, vget_high_s16(d1_hi), vget_high_s16(d1_hi));
+
+            int8x16_t v2 = vld1q_s8(a2 + i);
+            int16x8_t d2_lo = vsubl_s8(vget_low_s8(v2), vget_low_s8(vb));
+            int16x8_t d2_hi = vsubl_s8(vget_high_s8(v2), vget_high_s8(vb));
+            acc4 = vmlal_s16(acc4, vget_low_s16(d2_lo), vget_low_s16(d2_lo));
+            acc5 = vmlal_s16(acc5, vget_high_s16(d2_lo), vget_high_s16(d2_lo));
+            acc4 = vmlal_s16(acc4, vget_low_s16(d2_hi), vget_low_s16(d2_hi));
+            acc5 = vmlal_s16(acc5, vget_high_s16(d2_hi), vget_high_s16(d2_hi));
+
+            int8x16_t v3 = vld1q_s8(a3 + i);
+            int16x8_t d3_lo = vsubl_s8(vget_low_s8(v3), vget_low_s8(vb));
+            int16x8_t d3_hi = vsubl_s8(vget_high_s8(v3), vget_high_s8(vb));
+            acc6 = vmlal_s16(acc6, vget_low_s16(d3_lo), vget_low_s16(d3_lo));
+            acc7 = vmlal_s16(acc7, vget_high_s16(d3_lo), vget_high_s16(d3_lo));
+            acc6 = vmlal_s16(acc6, vget_low_s16(d3_hi), vget_low_s16(d3_hi));
+            acc7 = vmlal_s16(acc7, vget_high_s16(d3_hi), vget_high_s16(d3_hi));
+        }
+        int32x4_t acc01 = vaddq_s32(acc0, acc1);
+        int32x4_t acc23 = vaddq_s32(acc2, acc3);
+        int32x4_t acc45 = vaddq_s32(acc4, acc5);
+        int32x4_t acc67 = vaddq_s32(acc6, acc7);
+
+        int32_t acc_scalar0 = vaddvq_s32(acc01);
+        int32_t acc_scalar1 = vaddvq_s32(acc23);
+        int32_t acc_scalar2 = vaddvq_s32(acc45);
+        int32_t acc_scalar3 = vaddvq_s32(acc67);
+        if (blk != dims) {
+            // scalar tail
+            for (int t = blk; t < dims; t++) {
+                const int8_t bb = b[t];
+                int32_t diff0 = a0[t] - bb;
+                int32_t diff1 = a1[t] - bb;
+                int32_t diff2 = a2[t] - bb;
+                int32_t diff3 = a3[t] - bb;

+                acc_scalar0 += diff0 * diff0;
+                acc_scalar1 += diff1 * diff1;
+                acc_scalar2 += diff2 * diff2;
+                acc_scalar3 += diff3 * diff3;
+            }
+        }
+        results[c + 0] = (f32_t)acc_scalar0;
+        results[c + 1] = (f32_t)acc_scalar1;
+        results[c + 2] = (f32_t)acc_scalar2;
+        results[c + 3] = (f32_t)acc_scalar3;
+    }
+
+    // Tail-handling: remaining vectors
+    for (; c < count; c++) {
         const int8_t* a0 = a + mapper(c, offsets) * pitch;
         results[c] = (f32_t)vec_sqri8(a0, b, dims);
     }
@@ -809,71 +900,6 @@ EXPORT int64_t vec_dotd1q4(const int8_t* a, const int8_t* query, const int32_t l
     return dotd1q4_inner(a, query, length);
 }
 
-EXPORT int64_t vec_dotd2q4(
-    const int8_t* a,
-    const int8_t* query,
-    const int32_t length
-) {
-    int64_t lower = dotd1q4_inner(a, query, length/2);
-    int64_t upper = dotd1q4_inner(a + length/2, query, length/2);
-    return lower + (upper << 1);
-}
-
-EXPORT int64_t vec_dotd4q4(const int8_t* a, const int8_t* query, const int32_t length) {
-    const int32_t bit_length = length / 4;
-    int64_t p0 = dotd1q4_inner(a + 0 * bit_length, query, bit_length);
-    int64_t p1 = dotd1q4_inner(a + 1 * bit_length, query, bit_length);
-    int64_t p2 = dotd1q4_inner(a + 2 * bit_length, query, bit_length);
-    int64_t p3 = dotd1q4_inner(a + 3 * bit_length, query, bit_length);
-    return p0 + (p1 << 1) + (p2 << 2) + (p3 << 3);
-}
-
-template <int64_t(*mapper)(const int32_t, const int32_t*)>
-static inline void dotd4q4_inner_bulk(
-    const int8_t* a,
-    const int8_t* query,
-    const int32_t length,
-    const int32_t pitch,
-    const int32_t* offsets,
-    const int32_t count,
-    f32_t* results
-) {
-    const int32_t bit_length = length / 4;
-
-    for (int c = 0; c < count; c++) {
-        const int8_t* a0 = a + mapper(c, offsets) * pitch;
-
-        int64_t p0 = dotd1q4_inner(a0 + 0 * bit_length, query, bit_length);
-        int64_t p1 = dotd1q4_inner(a0 + 1 * bit_length, query, bit_length);
-        int64_t p2 = dotd1q4_inner(a0 + 2 * bit_length, query, bit_length);
-        int64_t p3 = dotd1q4_inner(a0 + 3 * bit_length, query, bit_length);
-
-        results[c] = (f32_t)(p0 + (p1 << 1) + (p2 << 2) + (p3 << 3));
-    }
-}
-
-EXPORT void vec_dotd4q4_bulk(
-    const int8_t* a,
-    const int8_t* query,
-    const int32_t length,
-    const int32_t count,
-    f32_t* results
-) {
-    dotd4q4_inner_bulk<identity_mapper>(a, query, length, length, NULL, count, results);
-}
-
-EXPORT void vec_dotd4q4_bulk_offsets(
-    const int8_t* a,
-    const int8_t* query,
-    const int32_t length,
-    const int32_t pitch,
-    const int32_t* offsets,
-    const int32_t count,
-    f32_t* results
-) {
-    dotd4q4_inner_bulk<array_mapper>(a, query, length, pitch, offsets, count, results);
-}
-
 template <int64_t(*mapper)(const int32_t, const int32_t*)>
 static inline void dotd1q4_inner_bulk(
     const int8_t* a,
@@ -1013,6 +1039,15 @@ EXPORT void vec_dotd1q4_bulk_offsets(
     dotd1q4_inner_bulk<array_mapper>(a, query, length, pitch, offsets, count, results);
 }
 
+EXPORT int64_t vec_dotd2q4(
+    const int8_t* a,
+    const int8_t* query,
+    const int32_t length
+) {
+    int64_t lower = dotd1q4_inner(a, query, length/2);
+    int64_t upper = dotd1q4_inner(a + length/2, query, length/2);
+    return lower + (upper << 1);
+}
 
 template <int64_t(*mapper)(const int32_t, const int32_t*)>
 static inline void dotd2q4_inner_bulk(
@@ -1026,7 +1061,6 @@ static inline void dotd2q4_inner_bulk(
 ) {
     int c = 0;
     const int bit_length = length/2;
-    // TODO: specialised implementation
     for (; c < count; c++) {
         const int8_t* a0 = a + mapper(c, offsets) * pitch;
         int64_t lower = dotd1q4_inner(a0, query, bit_length);
@@ -1054,3 +1088,58 @@ EXPORT void vec_dotd2q4_bulk_offsets(
     f32_t* results) {
     dotd2q4_inner_bulk<array_mapper>(a, query, length, pitch, offsets, count, results);
 }
+
+EXPORT int64_t vec_dotd4q4(const int8_t* a, const int8_t* query, const int32_t length) {
+    const int32_t bit_length = length / 4;
+    int64_t p0 = dotd1q4_inner(a + 0 * bit_length, query, bit_length);
+    int64_t p1 = dotd1q4_inner(a + 1 * bit_length, query, bit_length);
+    int64_t p2 = dotd1q4_inner(a + 2 * bit_length, query, bit_length);
+    int64_t p3 = dotd1q4_inner(a + 3 * bit_length, query, bit_length);
+    return p0 + (p1 << 1) + (p2 << 2) + (p3 << 3);
+}
+
+template <int64_t(*mapper)(const int32_t, const int32_t*)>
+static inline void dotd4q4_inner_bulk(
+    const int8_t* a,
+    const int8_t* query,
+    const int32_t length,
+    const int32_t pitch,
+    const int32_t* offsets,
+    const int32_t count,
+    f32_t* results
+) {
+    const int32_t bit_length = length / 4;
+
+    for (int c = 0; c < count; c++) {
+        const int8_t* a0 = a + mapper(c, offsets) * pitch;
+
+        int64_t p0 = dotd1q4_inner(a0 + 0 * bit_length, query, bit_length);
+        int64_t p1 = dotd1q4_inner(a0 + 1 * bit_length, query, bit_length);
+        int64_t p2 = dotd1q4_inner(a0 + 2 * bit_length, query, bit_length);
+        int64_t p3 = dotd1q4_inner(a0 + 3 * bit_length, query, bit_length);
+
+        results[c] = (f32_t)(p0 + (p1 << 1) + (p2 << 2) + (p3 << 3));
+    }
+}
+
+EXPORT void vec_dotd4q4_bulk(
+    const int8_t* a,
+    const int8_t* query,
+    const int32_t length,
+    const int32_t count,
+    f32_t* results
+) {
+    dotd4q4_inner_bulk<identity_mapper>(a, query, length, length, NULL, count, results);
+}
+
+EXPORT void vec_dotd4q4_bulk_offsets(
+    const int8_t* a,
+    const int8_t* query,
+    const int32_t length,
+    const int32_t pitch,
+    const int32_t* offsets,
+    const int32_t count,
+    f32_t* results
+) {
+    dotd4q4_inner_bulk<array_mapper>(a, query, length, pitch, offsets, count, results);
+}

modules/reindex/src/main/java/org/elasticsearch/reindex/BulkByPaginatedSearchParallelizationHelper.java

Lines changed: 15 additions & 6 deletions

@@ -9,6 +9,7 @@
 
 package org.elasticsearch.reindex;
 
+import org.elasticsearch.Version;
 import org.elasticsearch.action.ActionListener;
 import org.elasticsearch.action.ActionType;
 import org.elasticsearch.action.admin.cluster.shards.ClusterSearchShardsRequest;
@@ -17,6 +18,7 @@
 import org.elasticsearch.action.search.SearchRequest;
 import org.elasticsearch.client.internal.Client;
 import org.elasticsearch.cluster.node.DiscoveryNode;
+import org.elasticsearch.core.Nullable;
 import org.elasticsearch.index.Index;
 import org.elasticsearch.index.mapper.IdFieldMapper;
 import org.elasticsearch.index.reindex.AbstractBulkByScrollRequest;
@@ -35,6 +37,7 @@
 import java.util.Map;
 import java.util.Optional;
 import java.util.Set;
+import java.util.function.Consumer;
 import java.util.stream.Collectors;
 
 import static org.elasticsearch.index.reindex.AbstractBulkByScrollRequest.AUTO_SLICES;
@@ -73,19 +76,24 @@ static <Request extends AbstractBulkByScrollRequest<Request>> void startSlicedAc
             task,
             request,
             client,
-            listener.delegateFailure((l, v) -> executeSlicedAction(task, request, action, l, client, node, workerAction))
+            listener.delegateFailure(
+                (l, v) -> executeSlicedAction(task, request, action, l, client, node, null, version -> workerAction.run())
+            )
         );
     }
 
     /**
     * Takes an action and a {@link BulkByScrollTask} and runs it with regard to whether this task is a
     * leader or worker.
    *
-    * If this task is a worker, the worker action in the given {@link Runnable} will be started on the local
-    * node. If the task is a leader (i.e. the number of slices is more than 1), then a subrequest will be
-    * created for each slice and sent.
+    * If this task is a worker, the worker action is invoked with the given {@code remoteVersion} (may be null
+    * for local reindex). If the task is a leader (i.e. the number of slices is more than 1), then a subrequest
+    * will be created for each slice and sent.
    *
    * This method can only be called after the task state is initialized {@link #initTaskState}.
+    *
+    * @param remoteVersion the version of the remote cluster when reindexing from remote, or null for local reindex
+    * @param workerAction invoked when this task is a worker, with the remote version (or null)
    */
    static <Request extends AbstractBulkByScrollRequest<Request>> void executeSlicedAction(
        BulkByScrollTask task,
@@ -94,12 +102,13 @@ static <Request extends AbstractBulkByScrollRequest<Request>> void executeSliced
        ActionListener<BulkByScrollResponse> listener,
        Client client,
        DiscoveryNode node,
-       Runnable workerAction
+       @Nullable Version remoteVersion,
+       Consumer<Version> workerAction
    ) {
        if (task.isLeader()) {
            sendSubRequests(client, action, node.getId(), task, request, listener);
        } else if (task.isWorker()) {
-           workerAction.run();
+           workerAction.accept(remoteVersion);
        } else {
            throw new AssertionError("Task should have been initialized at this point.");
        }
