[8.x] Metrics for incremental bulk splits (#116765) #117275

ankikuma · 2024-11-21T19:14:22Z

This will backport the following commit from main to 8.x:

Metrics for incremental bulk splits #116765

…) (elastic#116285) * Add support for bitwise inner-product in painless (elastic#116082) This adds bitwise inner product to painless. The idea here is: - For two bit arrays, which we determine to be a byte array whose dimensions match `dense_vector.dim/8`, we simply return bitwise `&` - For a stored bit array (remember, with `dense_vector.dim/8` bytes), sum up the provided byte or float array using the bit array as a mask. This is effectively supporting asynchronous quantization. A prime example of how this works is: https://github.com/cohere-ai/BinaryVectorDB Basically, you do your initial search against the binary space and then rerank with a differently quantized vector allowing for more information without additional storage space. closes: elastic#111232 * removing unnecessary task adjustment --------- Co-authored-by: Elastic Machine <[email protected]>
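The two cases described above can be sketched in Java as follows. This is a minimal illustration under stated assumptions, not the Painless implementation. The class and method names are hypothetical, the most-significant-bit-first packing is assumed, and "return bitwise `&`" is read as returning the count of bits set in the `&`.

```java
/**
 * Hypothetical sketch of the two bitwise inner products (not the Painless API).
 * Assumes bits are packed most-significant-bit first: dimension d lives in
 * byte d / 8 at bit position 7 - (d % 8).
 */
final class BitInnerProductSketch {

    /** Two packed bit vectors: count the dimensions set in both, i.e. popcount(a & b). */
    static int bitsAndBits(byte[] a, byte[] b) {
        if (a.length != b.length) {
            throw new IllegalArgumentException("bit vectors must have the same length");
        }
        int count = 0;
        for (int i = 0; i < a.length; i++) {
            count += Integer.bitCount((a[i] & b[i]) & 0xFF); // mask off sign extension
        }
        return count;
    }

    /** Stored bits used as a mask over a float query with 8 * bits.length dimensions. */
    static float bitsMaskFloats(byte[] bits, float[] query) {
        checkDims(bits, query.length);
        float sum = 0f;
        for (int d = 0; d < query.length; d++) {
            if (isSet(bits, d)) {
                sum += query[d];
            }
        }
        return sum;
    }

    /** Same masking for a byte query; the sum is accumulated in an int. */
    static int bitsMaskBytes(byte[] bits, byte[] query) {
        checkDims(bits, query.length);
        int sum = 0;
        for (int d = 0; d < query.length; d++) {
            if (isSet(bits, d)) {
                sum += query[d];
            }
        }
        return sum;
    }

    private static boolean isSet(byte[] bits, int d) {
        return ((bits[d >> 3] >> (7 - (d & 7))) & 1) == 1;
    }

    private static void checkDims(byte[] bits, int queryDims) {
        if (queryDims != bits.length * 8) {
            throw new IllegalArgumentException("query must have 8 * bits.length dimensions");
        }
    }
}
```

For example, with one stored byte `0b1000_0001` and the float query `{1, 2, 3, 4, 5, 6, 7, 8}`, only dimensions 0 and 7 pass the mask, so `bitsMaskFloats` returns `9.0`.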

…lastic#116288) * Align dot prefix validation with Serverless (elastic#116266) This aligns the deprecation warnings for on-prem dot-prefixed indices to be the same as the Serverless validation. It adds exemptions for the `.entities…` indices, and makes the list a dynamic setting. (cherry picked from commit 72aa17a) * Fix compilation --------- Co-authored-by: Elastic Machine <[email protected]>

…c#113713) (elastic#116347) * Adding inference endpoint validation for AzureAiStudioService * Run spotlessApple * Update docs/changelog/113713.yaml * Remove isInClusterService from InferenceService * Run spotless apply --------- Co-authored-by: Elastic Machine <[email protected]>

…c#116367) This fixes a test, actually in serverless Elasticsearch, that gets duplicate warnings. We'd like not to emit these duplicate warnings, but at this point it isn't worth it. So, for now, in some tests we allow duplicate warnings. In most of our tests we do not allow duplicate warnings so that we don't make *more* duplicate warnings without thinking about it.

elastic#115511) (elastic#116316) A long-running desired balance computation could delay the assignment of a newly created index's shards, because the computation has to finish before its assignments are published and the shards get assigned. This change adds a new setting for a maximum computation time, which applies when there are unassigned primary shards. This is similar to how a new cluster state causes early publishing of the desired balance. Closes ES-9616 Co-authored-by: Elastic Machine <[email protected]>
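The control flow described above can be sketched roughly as follows. This is a hedged illustration only: `DesiredBalanceSketch`, `Computation`, `Publisher`, and the "intermediate"/"converged" reasons are hypothetical names, not Elasticsearch classes or the actual setting. It shows the idea of publishing an intermediate result once the time budget is exceeded while primaries are still unassigned.

```java
import java.util.function.BooleanSupplier;
import java.util.function.LongSupplier;

/**
 * Hypothetical sketch of a desired-balance computation with a maximum compute time.
 * None of these names come from Elasticsearch; they only illustrate the control flow.
 */
final class DesiredBalanceSketch {

    /** One iteration of the balance computation; returns true once it has converged. */
    interface Computation { boolean step(); }

    /** Publishes the current (possibly intermediate) desired balance so shards can be assigned. */
    interface Publisher { void publish(String reason); }

    static void compute(Computation computation, LongSupplier clockNanos, long maxComputeNanos,
                        BooleanSupplier hasUnassignedPrimaries, Publisher publisher) {
        long windowStart = clockNanos.getAsLong();
        while (!computation.step()) {
            boolean overBudget = clockNanos.getAsLong() - windowStart >= maxComputeNanos;
            if (overBudget && hasUnassignedPrimaries.getAsBoolean()) {
                // Publish early so new primaries need not wait for full convergence.
                publisher.publish("intermediate");
                windowStart = clockNanos.getAsLong(); // start a new time window
            }
        }
        publisher.publish("converged");
    }
}
```

Without unassigned primaries the loop behaves like an ordinary run to convergence and publishes only once at the end.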

…lastic#116381) * Better sizing BytesRef for Strings in Queries (elastic#115655) * Better sizing BytesRefs for Strings in Queries * Update docs/changelog/115655.yaml * iter * added test * iter * extracted method * iter --------- Co-authored-by: Elastic Machine <[email protected]> (cherry picked from commit 9ebe95a) * iter

…ic#116248) Fixes elastic#114970 Added the warnings in the `RemoveStatsOverride` LogicalPlan rule, which is the same rule that removes the duplicates. Also fixed the groupings parser, which was assigning each stats grouping field the source of the full "grouping context" instead of its own. Without this fix, the warnings on groupings would, in some cases, say something like `Line 2:10: Field 'x' shadowed by field at line 2:10`. As there are already tests for these cases, I'm requiring the capability on them and updating their expected warnings. ## Notes I'm treating this as an enhancement rather than a bug. Since there's existing logic that removes duplicates, I assume this behavior was decided on at some point (a decision that may or may not still apply). Still, solving it this way is less risky and doesn't break compatibility. Co-authored-by: Elastic Machine <[email protected]>

* Esql Enable Date Nanos (elastic#117080) This enables date nanos support as tech preview. Basic operations, like reading values, binary comparisons, and functions that don't care about type should work, but some functions are not yet supported. Most notably, Bucket is not yet supported, although Date_Trunc is and can be used for grouping. See the docs for the full list of limitations. relates to elastic#109352 * Skip CATEGORIZE tests outside snapshot --------- Co-authored-by: Nik Everett <[email protected]>

elastic#117203) This adds `maxSim` functions, specifically dotProduct and InvHamming. Why these two, you might ask? Well, they are the best approximations of what's possible with Col* late-interaction type models. Effectively, you want a similarity metric where "greater == better". Regular `hamming` isn't exactly that, but inverting it (just like our `element_type: bit` index for dense_vectors) is a nice approximation with bit vectors and multi-vector scoring. Then, of course, dotProduct is another usage. We will allow dot-product between like elements (bytes -> bytes, floats -> floats) and of course, allow `floats -> bit`, where the stored `bit` elements are applied as a "mask" over the float queries. This allows for some nice asymmetric interactions. This is all behind a feature flag, and I need to write a mountain of docs in a separate PR.
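The late-interaction scoring described above follows the usual maxSim pattern: for each query vector, take the best similarity against any document vector, then sum those maxima. A minimal Java sketch, assuming hypothetical names (`MaxSimSketch` is not the Elasticsearch API) and assuming inverted Hamming is normalised as `(dims - hamming) / dims` so that identical vectors score 1.

```java
/**
 * Hypothetical late-interaction maxSim sketch (not the Elasticsearch API).
 * score = sum over query vectors q of max over document vectors d of sim(q, d).
 */
final class MaxSimSketch {

    static float dotProduct(float[] a, float[] b) {
        if (a.length != b.length) throw new IllegalArgumentException("dimension mismatch");
        float sum = 0f;
        for (int i = 0; i < a.length; i++) {
            sum += a[i] * b[i];
        }
        return sum;
    }

    /** Assumed normalisation: (dims - hamming) / dims, so "greater == better". */
    static float invHamming(byte[] a, byte[] b) {
        if (a.length != b.length) throw new IllegalArgumentException("dimension mismatch");
        int hamming = 0;
        for (int i = 0; i < a.length; i++) {
            hamming += Integer.bitCount((a[i] ^ b[i]) & 0xFF); // differing bits in this byte
        }
        int dims = a.length * 8;
        return (dims - hamming) / (float) dims;
    }

    static float maxSimDotProduct(float[][] query, float[][] doc) {
        requireNonEmpty(doc);
        float total = 0f;
        for (float[] q : query) {
            float best = Float.NEGATIVE_INFINITY;
            for (float[] d : doc) {
                best = Math.max(best, dotProduct(q, d));
            }
            total += best;
        }
        return total;
    }

    static float maxSimInvHamming(byte[][] query, byte[][] doc) {
        requireNonEmpty(doc);
        float total = 0f;
        for (byte[] q : query) {
            float best = Float.NEGATIVE_INFINITY;
            for (byte[] d : doc) {
                best = Math.max(best, invHamming(q, d));
            }
            total += best;
        }
        return total;
    }

    private static void requireNonEmpty(Object[] doc) {
        if (doc.length == 0) throw new IllegalArgumentException("document has no vectors");
    }
}
```

For instance, query vectors `{1, 0}` and `{0, 1}` against document vectors `{1, 0}` and `{0.5, 0.5}` score `1 + 0.5 = 1.5` with `maxSimDotProduct`.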

elastic#116998) (elastic#117215) This change loads all the modules and creates the module layers for plugins prior to entitlement checking during the 2nd phase of bootstrap initialization. This will allow us to know what modules exist for both validation and checking prior to actually loading any plugin classes (in a follow-up change). There are now two classes: PluginsLoader, which does the module loading and layer creation, and PluginsService, which uses a PluginsLoader to create the main plugin classes and start the plugins.

No need to have an `ActionType<>` here since we never register this as an action the `Client` can invoke. Also no need to use a dummy constructor parameter just to trick the injector into instantiating it, we can instantiate it ourselves like we do with all other subsidiary transport-only actions. Also fixes the parent task so the remote action is a child of the local action rather than a sibling.

…117184) (elastic#117262) Add tests on use of grouping functions in agg filters: check that reusing the BUCKET expression from grouping is allowed, but no other variation. Related: elastic#115521 (cherry picked from commit fefa0f0)

Add metrics to track incremental bulk request splits due to indexing pressure. Resolves ES-9612

github-actions · 2024-11-21T19:14:37Z

Documentation preview:

✨ Changed pages

elasticsearchmachine and others added 30 commits November 6, 2024 09:42

Mute org.elasticsearch.xpack.remotecluster.RemoteClusterSecurityReloa…

a58d437

…dCredentialsRestIT testFirstTimeSetupWithElasticsearchSettings elastic#116286

[DOCS] Fix typo in percentile-aggregation.asciidoc (elastic#116268) (e…

5d9ee17

…lastic#116304) (cherry picked from commit 8a98844)

Updates Connectors section page references (elastic#116239) (elastic#…

c58c94a

…116320) (cherry picked from commit 954ab8a)

Mute org.elasticsearch.xpack.security.operator.OperatorPrivilegesIT t…

8fbf9c6

…estEveryActionIsEitherOperatorOnlyOrNonOperator elastic#102992

Mute org.elasticsearch.xpack.deprecation.DeprecationHttpIT testDeprec…

78e9236

…atedSettingsReturnWarnings elastic#108628

Mute org.elasticsearch.search.basic.SearchWhileRelocatingIT testSearc…

037c362

…hAndRelocateConcurrentlyRandomReplicas elastic#116145

Mute org.elasticsearch.xpack.esql.qa.multi_node.EsqlSpecIT test {cate…

a4d1abb

…gorize.Categorize SYNC} elastic#113054

Make InternalCentroid leaner (elastic#116302) (elastic#116334)

2a51685

We are currently holding two fields to extract values; this commit makes them abstract methods so they don't use any heap.
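The change above swaps per-instance fields for abstract methods. A minimal Java sketch of the pattern, assuming hypothetical class names and a hypothetical `double[]` point representation (this is not the actual `InternalCentroid` code): in the "before" shape every instance carries two extractor references, while in the "after" shape the extraction logic lives in the subclass and instances carry no extra fields.

```java
import java.util.function.ToDoubleFunction;

/** Before (hypothetical): every instance stores two extractor references on the heap. */
abstract class CentroidWithFields {
    private final ToDoubleFunction<double[]> firstExtractor;
    private final ToDoubleFunction<double[]> secondExtractor;

    CentroidWithFields(ToDoubleFunction<double[]> first, ToDoubleFunction<double[]> second) {
        this.firstExtractor = first;
        this.secondExtractor = second;
    }

    double extractFirst(double[] point) { return firstExtractor.applyAsDouble(point); }

    double extractSecond(double[] point) { return secondExtractor.applyAsDouble(point); }
}

/** After (hypothetical): extraction is an abstract method, so instances carry no extra fields. */
abstract class CentroidWithMethods {
    protected abstract double extractFirst(double[] point);

    protected abstract double extractSecond(double[] point);
}

/** Hypothetical cartesian (x, y) subclass: the logic lives in the class, not in each instance. */
final class XYCentroidSketch extends CentroidWithMethods {
    @Override
    protected double extractFirst(double[] point) { return point[0]; }

    @Override
    protected double extractSecond(double[] point) { return point[1]; }
}
```

The saving scales with the number of instances, which matters when many aggregation results are alive at once.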

Clarify that MSSQL supports only SQL Server auth (elastic#116340) (el…

c02db50

…astic#116343) * Clarify that MSSQL supports only SQL Server auth * typo

[test-triage] Unmuting stale muted items

f201df0

Add ES|QL match operator (:) (elastic#114831) (elastic#116308)

2251f80

(cherry picked from commit f88f68d)

[ES|QL] Verify aggregation filter's type is boolean to avoid class_ca…

ef79a64

…st_exception (elastic#116274) (elastic#116356) * validate agg filter's type is boolean (cherry picked from commit 0e044d7)

Mute org.elasticsearch.xpack.inference.InferenceRestIT test {p0=infer…

99fcffc

…ence/40_semantic_text_query/Query a field that uses the default ELSER 2 endpoint} elastic#114376

Add missing header in put_data_lifecycle rest-api-spec (elastic#116292)…

e76f73b

… (elastic#116370)

Mute org.elasticsearch.xpack.core.security.authz.RoleDescriptorTests …

7641277

…testHasPrivilegesOtherThanIndex elastic#116376

Add ES|QL bit_length function (elastic#115792) (elastic#116378)

b7951c5

Aggs: Add real memory CB call when building internal aggregators in b…

22c0eab

…uckets (elastic#116329) (elastic#116393) Related to elastic#88128. This PR aims to reduce the potential for OOMs when building internal aggregations.

[DOCS] Fix boolean for native connectors (elastic#116394) (elastic#11…

8936459

…6395) (cherry picked from commit 22c55fa)

Add documentation for query rules retriever (elastic#115696) (elastic…

b24151a

…#116401)

Merge (elastic#116406)

4b03ef8

[DOCS] Use explicit link text in query rules retriever (elastic#116389)…

beb8f3c

… (elastic#116412) (cherry picked from commit c42b1ef)

Reuse metric names in TopMetricsAggregator (elastic#116296) (elastic#…

90dee80

…116410) This commit shares a unique instance between all InternalTopMetrics instances.
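Sharing one instance, as described above, can be sketched like this. The names are hypothetical (not the real `TopMetricsAggregator`/`InternalTopMetrics` classes), and the shared object is assumed to be an immutable list of metric names: the aggregator makes one immutable copy and every per-bucket result references it instead of holding its own copy.

```java
import java.util.List;

/** Hypothetical per-bucket result: references the metric names, never copies them. */
final class InternalTopMetricsSketch {
    final List<String> metricNames;
    final double[] values;

    InternalTopMetricsSketch(List<String> metricNames, double[] values) {
        this.metricNames = metricNames;
        this.values = values;
    }
}

/** Hypothetical aggregator: builds the name list once and hands the same instance to every bucket. */
final class TopMetricsAggregatorSketch {
    private final List<String> metricNames;

    TopMetricsAggregatorSketch(List<String> metricNames) {
        this.metricNames = List.copyOf(metricNames); // one immutable copy, safe to share
    }

    InternalTopMetricsSketch buildBucket(double[] values) {
        return new InternalTopMetricsSketch(metricNames, values);
    }
}
```

Because the list is immutable, sharing it across buckets cannot leak mutations from one result into another.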

Only mute the bad test (elastic#116409)

fd89e15

related to elastic#116134

nik9000 and others added 21 commits November 21, 2024 07:46

ESQL: Fix sorts containing _source (elastic#116980) (elastic#117191)

2e631d5

This fixes sorts containing a `_source` field. It can use the standard encoder for `BytesRef`s. You can't sort *by* a `_source` field, but that doesn't really make sense anyway.

ESQL: Test with a data node failure (elastic#117164) (elastic#117196)

cf05894

Adds a test that always fails on one of the data nodes and makes sure this comes back as a failure. When we build support for partial results we can use this test to simulate it.

Bump to version 8.18.0

5261ec3

Mute org.elasticsearch.oldrepos.OldRepositoryAccessIT testOldRepoAccess

ab5c017

elastic#115631

[ML] Update Deberta tokenizer (elastic#116358) (elastic#117194)

6cb39b1

* Was using byte position for end of offset, but it seems like using char position is correct * Update docs/changelog/116358.yaml * Update UnigramTokenizer.java --------- Co-authored-by: Elastic Machine <[email protected]>
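Byte and char positions diverge as soon as the text contains non-ASCII characters. A minimal Java illustration (the helper name is hypothetical and unrelated to `UnigramTokenizer`): "naïve" is 5 chars but 6 UTF-8 bytes, so an end offset expressed in bytes overshoots the Java `String`, which is indexed by UTF-16 char.

```java
import java.nio.charset.StandardCharsets;

/** Hypothetical helper showing why a UTF-8 byte position is not a char offset into a String. */
final class OffsetSketch {

    /** Number of UTF-8 bytes that precede the given char (UTF-16) offset. */
    static int utf8ByteOffset(String text, int charOffset) {
        return text.substring(0, charOffset).getBytes(StandardCharsets.UTF_8).length;
    }

    public static void main(String[] args) {
        String word = "na\u00efve"; // "naïve": 5 chars, 6 UTF-8 bytes
        int charEnd = word.length();
        int byteEnd = utf8ByteOffset(word, charEnd);
        System.out.println(charEnd + " chars, " + byteEnd + " bytes");
        // word.substring(0, byteEnd) would throw StringIndexOutOfBoundsException.
    }
}
```

For pure ASCII input the two offsets coincide, which is why a byte-based end offset can go unnoticed until the tokenizer sees accented or non-Latin text.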

Mute org.elasticsearch.xpack.esql.analysis.VerifierTests testCategori…

ded6196

…zeNestedGrouping elastic#116858

Mute org.elasticsearch.xpack.esql.analysis.VerifierTests testCategori…

8b640e5

…zeSingleGrouping elastic#116857

Mute org.elasticsearch.xpack.esql.analysis.VerifierTests testCategori…

4640fd9

…zeWithinAggregations elastic#116856

ESQL - match operator included in non-snapshot builds (elastic#116819) (

5a9c05e

elastic#117224)

Bump 8.x version (elastic#117232)

0bcc504

TODO: Verify what we miss in our automation

Adding missing json spec for allow_partial_search_results in point-in…

4d9f2e5

…-time (elastic#117121) (elastic#117248)

Check the real memory circuit breaker when building internal aggregat…

af60dce

…ions (elastic#117019) (elastic#117247) Periodically checks the real memory circuit breaker when allocating objects.
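Checking the breaker periodically, as described above, amortises the cost of the check over many allocations. A hedged Java sketch: `MemoryCheck` is a hypothetical stand-in for the real memory circuit breaker (not the Elasticsearch `CircuitBreaker` API), and the 1024-allocation interval is an assumed value, not the one used in Elasticsearch.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.IntFunction;

/** Hypothetical sketch: consult a memory breaker every CHECK_INTERVAL allocations. */
final class PeriodicBreakerCheck {

    /** Stand-in for a real-memory circuit breaker; implementations throw when memory is exhausted. */
    interface MemoryCheck { void check(String label); }

    static final int CHECK_INTERVAL = 1024; // assumed interval, for illustration only

    static <T> List<T> buildAll(int count, IntFunction<T> factory, MemoryCheck breaker) {
        List<T> results = new ArrayList<>(count);
        for (int i = 0; i < count; i++) {
            if (i % CHECK_INTERVAL == 0) {
                breaker.check("building internal aggregations"); // may throw and abort the build
            }
            results.add(factory.apply(i));
        }
        return results;
    }
}
```

Building 3000 objects with this sketch consults the breaker three times (at 0, 1024 and 2048), and a breaker that throws aborts the build before the remaining objects are allocated.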

ESQL: fix the column position in errors (elastic#117153) (elastic#117255

62dc839

) This fixes the off-by-one error of the column position in some of the error messages. (cherry picked from commit 21f206b)

[ML] Fix deberta tokenizer bug caused by bug in normalizer (elastic#1…

e0733f3

…17189) (elastic#117254) * Fix deberta tokenizer bug caused by bug in normalizer which caused offsets to be negative * Update docs/changelog/117189.yaml

Updating PivotConfig max_page_search_size deprecation warning to crit…

72388cd

…ical (elastic#117051) (elastic#117268) Backport elastic#117051

Metrics for incremental bulk splits (elastic#116765)

8902ba0

Add metrics to track incremental bulk request splits due to indexing pressure. Resolves ES-9612

ankikuma requested review from a team as code owners November 21, 2024 19:14

elasticsearchmachine added the needs:triage and v9.0.0 labels Nov 21, 2024

ankikuma closed this Nov 21, 2024

ankikuma reopened this Nov 21, 2024

ankikuma closed this Nov 21, 2024
