ankikuma
Contributor

This will backport the following commit from main to 8.x:

Metrics for incremental bulk splits #116765

elasticsearchmachine and others added 30 commits November 6, 2024 09:42
…dCredentialsRestIT testFirstTimeSetupWithElasticsearchSettings elastic#116286
…) (elastic#116285)

* Add support for bitwise inner-product in painless (elastic#116082)

This adds bitwise inner product to painless. 

The idea here is:

 - For two bit arrays, which we determine to be a byte array whose dimensions match `dense_vector.dim/8`, we simply return bitwise `&`
 - For a stored bit array (remember, with `dense_vector.dim/8` bytes), sum up the provided byte or float array using the bit array as a mask.

This is effectively supporting asymmetric quantization. A prime
example of how this works is:
https://github.com/cohere-ai/BinaryVectorDB

Basically, you do your initial search against the binary space and then
rerank with a differently quantized vector allowing for more information
without additional storage space. 
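
The two cases above can be sketched in Python as follows. This is an illustration only, not the Painless implementation: the function names are hypothetical, the bit/bit case is assumed to return the count of set bits in the bitwise `&`, and the bit order within each byte (most significant bit first) is also an assumption.

```python
from typing import Sequence


def bit_inner_product(a: bytes, b: bytes) -> int:
    """Bit/bit case: number of positions where both bit vectors hold a 1."""
    if len(a) != len(b):
        raise ValueError("bit vectors must have the same length in bytes")
    # AND each pair of bytes, then count the set bits of the result.
    return sum(bin(x & y).count("1") for x, y in zip(a, b))


def masked_sum(stored_bits: bytes, query: Sequence[float]) -> float:
    """Bit/float (or bit/byte) case: sum the query components whose stored bit is 1."""
    if len(query) != len(stored_bits) * 8:
        raise ValueError("query must have one component per stored bit")
    total = 0.0
    for i, value in enumerate(query):
        # Assumption: bit 0 of the vector is the most significant bit of byte 0.
        bit = (stored_bits[i // 8] >> (7 - i % 8)) & 1
        if bit:
            total += value
    return total


if __name__ == "__main__":
    print(bit_inner_product(bytes([0b10110000]), bytes([0b10010001])))
    print(masked_sum(bytes([0b10100000]), [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]))
```

The masked sum is what makes the asymmetric setup cheap: the stored side stays at one bit per dimension while the query keeps its full precision.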

closes:  elastic#111232

* removing unnecessary task adjustment

---------

Co-authored-by: Elastic Machine <[email protected]>
…lastic#116288)

* Align dot prefix validation with Serverless (elastic#116266)

This aligns the deprecation warnings for on-prem dot-prefixed indices to
be the same as the Serverless validation. It adds exemptions for the
`.entities…` indices, and makes the list a dynamic setting.

(cherry picked from commit 72aa17a)

* Fix compilation

---------

Co-authored-by: Elastic Machine <[email protected]>
We are currently holding on to fields to extract values. This commit makes them abstract methods so
we don't use any heap.
…astic#116343)

* Clarify that MSSQL supports only SQL Server auth

* typo
…c#113713) (elastic#116347)

* Adding inference endpoint validation for AzureAiStudioService

* Run spotlessApply

* Update docs/changelog/113713.yaml

* Remove isInClusterService from InferenceService

* Run spotless apply

---------

Co-authored-by: Elastic Machine <[email protected]>
…st_exception (elastic#116274) (elastic#116356)

* validate agg filter's type is boolean

(cherry picked from commit 0e044d7)
…ence/40_semantic_text_query/Query a field that uses the default ELSER 2 endpoint} elastic#114376
…c#116367)

This fixes a test, actually in serverless Elasticsearch, that gets
duplicate warnings. We'd like not to emit these duplicate warnings, but
at this point it isn't worth it. So, for now, in some tests we allow
duplicate warnings. In most of our tests we do not allow duplicate
warnings so that we don't make *more* duplicate warnings without
thinking about it.
elastic#115511) (elastic#116316)

A long-running desired balance computation could delay the assignment of a newly created index shard, because the computation has to finish before the assignments are published and the shards are assigned. This change adds a new setting that sets a maximum time for a computation when there are unassigned primary shards. Note that this is similar to how a new cluster state causes early publishing of the desired balance.
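
A minimal sketch of the idea, assuming an iterative computation that can stop between steps. The function and parameter names are hypothetical and do not correspond to the actual Elasticsearch setting or classes; the clock is injectable so the behavior can be checked deterministically.

```python
import time
from typing import Callable, Iterable, Tuple


def compute_desired_balance(
    steps: Iterable[Callable[[], None]],
    has_unassigned_primaries: bool,
    max_time_s: float,
    clock: Callable[[], float] = time.monotonic,
) -> Tuple[int, bool]:
    """Run balancing steps; stop early once the time budget is used up,
    but only while primary shards are still waiting for assignment.

    Returns (steps_run, converged). A non-converged result stands for an
    interim balance that would be published early.
    """
    start = clock()
    steps_run = 0
    for step in steps:
        step()
        steps_run += 1
        # The time limit only applies when unassigned primaries exist.
        if has_unassigned_primaries and clock() - start >= max_time_s:
            return steps_run, False
    return steps_run, True
```

With no unassigned primaries the loop always runs to completion, which matches the description: the limit exists only to unblock waiting primary shards.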

Closes ES-9616

Co-authored-by: Elastic Machine <[email protected]>
…lastic#116381)

* Better sizing BytesRef for Strings in Queries (elastic#115655)

* Better sizing BytesRefs for Strings in Queries

* Update docs/changelog/115655.yaml

* iter

* added test

* iter

* extracted method

* iter

---------

Co-authored-by: Elastic Machine <[email protected]>
(cherry picked from commit 9ebe95a)

* iter
…ic#116248)

Fixes elastic#114970

Added the warnings in the `RemoveStatsOverride` LogicalPlan rule, which is the same one that's removing the duplicates.

Also, fixed the groupings parser, which was assigning, to each stats grouping field, the source of the full "grouping context" instead. Without this fix, the warnings on groupings would, in some cases, say something like `Line 2:10: Field 'x' shadowed by field at line 2:10`.

As there are already tests for these cases, I'm requiring the capability on them, and updating their warnings expectations.

## Notes
I'm treating this as an enhancement rather than a bug. Since there is existing logic that removes duplicates, I assume this behavior was decided on at some point (a decision that may or may not still apply today).
Still, solving it this way is less risky and doesn't break compatibility.

Co-authored-by: Elastic Machine <[email protected]>
…uckets (elastic#116329) (elastic#116393)

Related to elastic#88128

This PR aims to reduce the potential OOMs encountered when building internal aggregations.
…116410)

This commit shares a unique instance between all InternalTopMetrics instances.
nik9000 and others added 21 commits November 21, 2024 07:46
This fixes sorts containing a `_source` field. It can use the
standard encoder for `BytesRef`s. You can't sort *by* a `_source` field,
but that doesn't really make sense anyway.
Adds a test that always fails on one of the data nodes and makes sure
this comes back as a failure. When we build support for partial results
we can use this test to simulate it.
* Esql Enable Date Nanos (elastic#117080)

This enables date nanos support as tech preview. Basic operations, like reading values, binary comparisons, and functions that don't care about type should work, but some functions are not yet supported. Most notably, Bucket is not yet supported, although Date_Trunc is and can be used for grouping. See the docs for the full list of limitations.

relates to elastic#109352

* Skip CATEGORIZE tests outside snapshot

---------

Co-authored-by: Nik Everett <[email protected]>
* Was using byte position for end of offset, but it seems like using char position is correct

* Update docs/changelog/116358.yaml

* Update UnigramTokenizer.java
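
To illustrate why the two kinds of position differ, here is a small Python sketch. It is not the `UnigramTokenizer` code, and the helper name is hypothetical. It computes a token's offsets both in characters and in UTF-8 bytes. Note that Python counts code points while Java counts UTF-16 `char`s, so this only shows the general byte versus character mismatch.

```python
from typing import Tuple


def token_offsets(text: str, token: str) -> Tuple[Tuple[int, int], Tuple[int, int]]:
    """Return ((start_char, end_char), (start_byte, end_byte)) for the first match."""
    start_char = text.index(token)
    end_char = start_char + len(token)
    # Byte offsets are measured in the UTF-8 encoding of the text.
    start_byte = len(text[:start_char].encode("utf-8"))
    end_byte = start_byte + len(token.encode("utf-8"))
    return (start_char, end_char), (start_byte, end_byte)


if __name__ == "__main__":
    text = "café au lait"
    chars, byte_positions = token_offsets(text, "café")
    # Slicing the string with the byte-based end offset overshoots the token.
    print(repr(text[chars[0]:chars[1]]), repr(text[byte_positions[0]:byte_positions[1]]))
```

For pure ASCII input both offset pairs are identical, which is why a bug of this kind only shows up with multi-byte characters.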

---------

Co-authored-by: Elastic Machine <[email protected]>
elastic#117203)

This adds `maxSim` functions, specifically dotProduct and InvHamming.
Why these two, you might ask? Well, they are the best approximations of
what's possible with Col* late interaction type models. Effectively, you
want a similarity metric where "greater == better". Regular `hamming`
isn't exactly that, but inverting that (just like our `element_type:
bit` index for dense_vectors), is a nice approximation with bit vectors
and multi-vector scoring.

Then, of course, dotProduct is another usage. We will allow dot-product
between like elements (bytes -> bytes, floats -> floats) and of course,
allow `floats -> bit`, where the stored `bit` elements are applied as a
"mask" over the float queries. This allows for some nice asymmetric
interactions.
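
A rough Python sketch of late-interaction scoring as described above. It assumes the usual definition of max-sim (for each query vector take the best similarity over all document vectors, then sum), and it assumes inverse Hamming is normalized as `(num_bits - hamming) / num_bits`. The function names are hypothetical and are not the Painless function names.

```python
from typing import Callable, Sequence, TypeVar

V = TypeVar("V")


def dot_product(a: Sequence[float], b: Sequence[float]) -> float:
    """Plain dot product between like elements."""
    return sum(x * y for x, y in zip(a, b))


def hamming(a: bytes, b: bytes) -> int:
    """Number of differing bits between two equally sized bit vectors."""
    return sum(bin(x ^ y).count("1") for x, y in zip(a, b))


def inv_hamming(a: bytes, b: bytes) -> float:
    """Inverted Hamming distance so that greater means more similar (assumed normalization)."""
    num_bits = len(a) * 8
    return (num_bits - hamming(a, b)) / num_bits


def max_sim(sim: Callable[[V, V], float], query_vectors: Sequence[V], doc_vectors: Sequence[V]) -> float:
    """Sum, over query vectors, of the best similarity against any document vector."""
    return sum(max(sim(q, d) for d in doc_vectors) for q in query_vectors)


if __name__ == "__main__":
    query = [[1.0, 0.0], [0.0, 1.0]]
    doc = [[1.0, 0.0], [0.5, 0.5]]
    print(max_sim(dot_product, query, doc))
    print(max_sim(inv_hamming, [b"\xff"], [b"\x0f", b"\xff"]))
```

Because `max_sim` takes the maximum per query vector, the similarity passed in must be one where a larger value is better, which is why plain Hamming distance has to be inverted first.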

This is all behind a feature flag, and I need to write a mountain of
docs in a separate PR.
elastic#116998) (elastic#117215)

This change loads all the modules and creates the module layers for plugins prior to entitlement 
checking during the 2nd phase of bootstrap initialization. This will allow us to know what modules exist 
for both validation and checking prior to actually loading any plugin classes (in a follow up change).

There are now two classes:

    PluginsLoader which does the module loading and layer creation
    PluginsService which uses a PluginsLoader to create the main plugin classes and start the plugins
No need to have an `ActionType<>` here since we never register this as
an action the `Client` can invoke. Also no need to use a dummy
constructor parameter just to trick the injector into instantiating it,
we can instantiate it ourselves like we do with all other subsidiary
transport-only actions. Also fixes the parent task so the remote action
is a child of the local action rather than a sibling.
TODO: Verify what we miss in our automation
…ions (elastic#117019) (elastic#117247)

Periodically checks the real memory circuit breaker when allocating objects.
)

This fixes the off-by-one error of the column position in some of the
error messages.

(cherry picked from commit 21f206b)
…17189) (elastic#117254)

* Fix deberta tokenizer bug caused by a bug in the normalizer which caused offsets to be negative

* Update docs/changelog/117189.yaml
…117184) (elastic#117262)

Add tests on use of grouping functions in agg filters: check that
reusing the BUCKET expression from grouping is allowed, but no other
variation.

Related: elastic#115521
(cherry picked from commit fefa0f0)
Add metrics to track incremental bulk request splits due to indexing pressure. Resolves ES-9612
@ankikuma ankikuma requested review from a team as code owners November 21, 2024 19:14
@elasticsearchmachine elasticsearchmachine added needs:triage Requires assignment of a team area label v9.0.0 labels Nov 21, 2024
@ankikuma ankikuma closed this Nov 21, 2024
@ankikuma ankikuma reopened this Nov 21, 2024
@ankikuma ankikuma closed this Nov 21, 2024