[8.x] [ESQL] Enable "any type" aggregations on Date Nanos #114438 #115557

not-napoleon · 2024-10-24T13:59:34Z

Resolves #110002
Resolves #110003
Resolves #110005

Enable Values, Count, CountDistinct, Min and Max aggregations on date nanos. In the course of addressing this, I had to make some changes to AggregateMapper where it maps types to string names. I tried to refactor this once before (#110841), but at the time we decided not to go ahead with it. That bit me while working on this, so I am trying again. This time I've made a more localized change, just replacing the cascading if block with a switch. That will cause a compile-time failure when new data types are added in the future, unless they correctly update this section.
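The compile-time guarantee comes from Java switch *expressions*: one without a `default` branch must cover every enum constant. A minimal sketch, assuming a hypothetical `DataType` enum and illustrative type names (not the actual `AggregateMapper` code):

```java
import java.util.Locale;

public class AggregateTypeNames {
    // Hypothetical stand-in for the ES|QL DataType enum; the real enum has many more constants.
    enum DataType { BOOLEAN, INTEGER, LONG, DOUBLE, KEYWORD, IP, DATETIME, DATE_NANOS }

    // A switch expression with no default branch must be exhaustive, so adding a
    // constant to DataType makes this method fail to compile until it is mapped here.
    // The returned names are illustrative, not the ones AggregateMapper actually uses.
    static String typeName(DataType type) {
        return switch (type) {
            case BOOLEAN -> "Boolean";
            case INTEGER -> "Int";
            // Both date types are backed by longs (millis vs nanos since the epoch).
            case LONG, DATETIME, DATE_NANOS -> "Long";
            case DOUBLE -> "Double";
            case KEYWORD, IP -> "BytesRef";
        };
    }

    public static void main(String[] args) {
        for (DataType t : DataType.values()) {
            System.out.println(t.name().toLowerCase(Locale.ROOT) + " -> " + typeName(t));
        }
    }
}
```

A classic `switch` statement over an enum gets no such check and compiles silently with missing constants, which is why the expression form without a `default` is what makes the build break.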

I've also done a small refactoring on the aggregators themselves, to make the supplier function consistent with the typeResolution.

…lastic#114150) (elastic#114724) With logsdb, another index mode is available, so the isTimeSeries parameter is limiting. Instead, we should just push down the index mode from the template to the index settings provider. Follow up from elastic#113451 Relates to elastic#113583

…lastic#114730)

…lastic#114737) Special values like `0.0.0.0` may resolve to multiple IP addresses just like hostnames, so the same considerations apply when using such values as a publish address. This commit spells this case out in the docs and cleans up the nearby wording a little.

…lastic#114731)

…14736) Today the overloads of `XContentBuilder#timeField` do two rather different things: one formats an object as a `String` representation of a time (where the object is either an unambiguous time object or else a `long`), and the other formats only a `long` as one or two fields depending on the `?human` flag. This is trappy in a number of ways:

- `long` means an absolute (epoch) time, but sometimes folks will mistakenly use this for time intervals too.
- `long` means only milliseconds; there is no facility to specify a different unit.
- the dependence on the `?human` flag in exactly one of the overloads is kinda weird.

This commit removes the confusion by dropping support for considering a `Long` as a valid representation of a time at all, and instead requiring callers to either convert it into a proper time object or else call a method that is explicitly expecting an epoch time in milliseconds.
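To illustrate the split between "format a time object" and "emit an epoch-millis value", here is a minimal sketch using `java.time`. `formatTime` and `epochMillisFields` are hypothetical helpers, not the methods this commit added to `XContentBuilder`, and the field names are placeholders:

```java
import java.time.Instant;
import java.time.format.DateTimeFormatter;
import java.util.LinkedHashMap;
import java.util.Map;

public class TimeFieldSketch {
    // Formats only unambiguous time objects; a bare long is deliberately not accepted here.
    static String formatTime(Instant time) {
        return DateTimeFormatter.ISO_INSTANT.format(time);
    }

    // The name states what the long means: an absolute epoch time in milliseconds.
    // The raw value is always emitted; the readable form only when `human` is set.
    static Map<String, Object> epochMillisFields(String rawName, String readableName, long epochMillis, boolean human) {
        Map<String, Object> fields = new LinkedHashMap<>();
        if (human) {
            fields.put(readableName, formatTime(Instant.ofEpochMilli(epochMillis)));
        }
        fields.put(rawName, epochMillis);
        return fields;
    }

    public static void main(String[] args) {
        System.out.println(epochMillisFields("start_time_in_millis", "start_time", 0L, true));
    }
}
```

Because neither helper takes an ambiguous `long` "time", a caller holding an interval has no overload to misuse.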

…existing inference endpoints (elastic#114457) (elastic#114734) * [Inference API] Introduce Update API to change some aspects of existing inference endpoints (elastic#114457) (cherry picked from commit 6b714e2) * Fix syntax error caused by old JDK?

If Apache sends an error mid-stream, forward it to the user rather than to the now-ignored listener.

* Add a query rules tester API call * Update docs/changelog/114168.yaml * Wrap client call in async with origin * Remove unused param * PR feedback * Remove redundant test * CI workaround - add ent-search as ml dependency so it can find node features

…stGetModel elastic#114657

…114744) Azure / Llama sends back fields we do not expect - rewriting the parser to better handle unknown fields (by dropping them).

…lastic#114751) This code refactors how the merge scheduler is configured to allow different engine implementations to configure different merge schedulers.

…ugs (elastic#114729) (elastic#114755)

…astic#114754) This appears to be dead code, so we're removing it.

Co-authored-by: Elastic Machine <[email protected]>

…lastic#114407) (elastic#114756) **Description:** This PR addresses the issue described in [elastic#114402](elastic#114402), where the `synthetic_source` feature does not correctly handle the `bit` type in `dense_vector` fields when `index` is set to `false`. The root cause was that the `bit` type was not properly accounted for, so the array was sized 8 times larger than the doc value actually holds (a `bit` vector stores `dims / 8` bytes). This mismatch causes an array out-of-bounds exception when reconstructing the document.

**Changes:**

- Adjusted the `synthetic_source` logic to correctly handle the `bit` type by ensuring the array size accounts for the 8x difference in dimensions.
- Added a YAML test to cover the `bit` type scenario in `dense_vector` fields with `index` set to `false`.

**Related Issues:**

- Closes [elastic#114402](elastic#114402)
- Introduced in [elastic#110059](elastic#110059)

Co-authored-by: Rassyan <[email protected]>
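The sizing rule behind this fix can be sketched as follows. This is a minimal illustration, assuming the three documented `element_type` options (`float`, `byte`, `bit`) and that `bit` vectors require `dims` to be a multiple of 8; `BitVectorLength` and its methods are hypothetical, not Elasticsearch code:

```java
public class BitVectorLength {
    // Simplified dense_vector element types (assumption: float, byte and bit).
    enum ElementType { FLOAT, BYTE, BIT }

    // Bytes needed to store one vector with `dims` dimensions.
    static int byteLength(ElementType type, int dims) {
        return switch (type) {
            case FLOAT -> dims * Float.BYTES;
            case BYTE -> dims;
            case BIT -> {
                if (dims % 8 != 0) {
                    throw new IllegalArgumentException("bit vectors need dims to be a multiple of 8, got " + dims);
                }
                yield dims / 8; // each byte packs 8 dimensions
            }
        };
    }

    // Copies a stored bit vector into an array sized from the element type. Sizing it as
    // `dims` bytes instead would request 8x more bytes than the doc value holds, and
    // System.arraycopy would then throw an out-of-bounds exception.
    static byte[] readBitVector(byte[] docValue, int dims) {
        byte[] vector = new byte[byteLength(ElementType.BIT, dims)];
        System.arraycopy(docValue, 0, vector, 0, vector.length);
        return vector;
    }

    public static void main(String[] args) {
        byte[] stored = new byte[] {(byte) 0b1010_0000, 0x01}; // 16 dims packed into 2 bytes
        System.out.println(readBitVector(stored, 16).length);
    }
}
```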

Google supports SSE for chat completion and sends the same payload as in its non-streaming calls, so we can reuse our existing parse function with the SSE parser. The downside is that Google requires a different URI, so we refactored away from the visitor pattern to allow a different URI to be created and set at request time rather than at model instantiation time.
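Choosing the URI per request can be sketched like this. The base URL and the `:generateContent` / `:streamGenerateContent?alt=sse` suffixes are assumptions modeled on Google's public Gemini REST API, not values taken from this PR, and `chatCompletionUri` is a hypothetical helper:

```java
import java.net.URI;

public class GoogleChatUri {
    // Assumed base URL modeled on Google's public Gemini REST API; not taken from this PR.
    static final String BASE_URL = "https://generativelanguage.googleapis.com/v1/models/";

    // Built per request instead of once per model, so a single model configuration
    // can serve both streaming (SSE) and non-streaming chat completion calls.
    static URI chatCompletionUri(String modelId, boolean stream) {
        String method = stream ? ":streamGenerateContent?alt=sse" : ":generateContent";
        return URI.create(BASE_URL + modelId + method);
    }

    public static void main(String[] args) {
        System.out.println(chatCompletionUri("gemini-1.5-flash", false));
        System.out.println(chatCompletionUri("gemini-1.5-flash", true));
    }
}
```

Computing the URI when the request is built, rather than storing it on the model, is the design change described above: the model stays the same while the endpoint depends on whether this particular call streams.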

…astic#114758) * [ML] Pick best model variant for the default elser endpoint (elastic#114690) # Conflicts: # x-pack/plugin/inference/src/main/java/org/elasticsearch/xpack/inference/InferencePlugin.java # x-pack/plugin/inference/src/test/java/org/elasticsearch/xpack/inference/services/elasticsearch/ElasticsearchInternalServiceTests.java # x-pack/plugin/src/yamlRestTest/resources/rest-api-spec/test/inference/inference_crud.yml * fix test * fix test

…lastic#114770) Co-authored-by: Max Hniebergall <[email protected]> Co-authored-by: Elastic Machine <[email protected]>

…lastic#114683) (elastic#114779)

* [ML] Stream Bedrock Completion (elastic#114732) Notes: - Adds a new API to the chatCompletionRequest to invoke the Bedrock Stream API - Create a StreamingChatProcessor that subscribes to streaming results from bedrock and handles the parsing on another thread. - There was no good way (that I could see) to extend the Provider-based CompletionRequestEntity, so they have been flattened into one RequestEntity that can be shared between ConverseRequest and ConverseStreamRequest. * Use jdk17 API
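The notes describe a processor that subscribes to streaming results and parses them on another thread. A minimal sketch of that pattern with the JDK's `java.util.concurrent.Flow` API; `ParsingSubscriber` and `parseAll` are hypothetical, the upper-casing stands in for real response parsing, and none of this is the Bedrock SDK's or the PR's actual API:

```java
import java.util.List;
import java.util.Locale;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Flow;
import java.util.concurrent.SubmissionPublisher;
import java.util.concurrent.TimeUnit;

public class StreamingParseSketch {
    // Receives raw chunks and parses each one on its own executor, so the thread that
    // delivers chunks never does the parsing itself.
    static final class ParsingSubscriber implements Flow.Subscriber<String> {
        private final ExecutorService parser = Executors.newSingleThreadExecutor();
        final List<String> parsed = new CopyOnWriteArrayList<>();
        final CountDownLatch done = new CountDownLatch(1);
        private volatile Flow.Subscription subscription;

        @Override
        public void onSubscribe(Flow.Subscription s) {
            subscription = s;
            s.request(1); // demand one chunk at a time
        }

        @Override
        public void onNext(String chunk) {
            parser.execute(() -> {
                parsed.add(chunk.trim().toUpperCase(Locale.ROOT)); // stand-in for real parsing
                subscription.request(1); // ask for the next chunk only once this one is parsed
            });
        }

        @Override
        public void onError(Throwable t) {
            parser.shutdown();
            done.countDown();
        }

        @Override
        public void onComplete() {
            // Queued behind any pending parse task on the single-threaded executor.
            parser.execute(() -> {
                parser.shutdown();
                done.countDown();
            });
        }
    }

    static List<String> parseAll(List<String> chunks) {
        ParsingSubscriber subscriber = new ParsingSubscriber();
        try (SubmissionPublisher<String> publisher = new SubmissionPublisher<>()) {
            publisher.subscribe(subscriber);
            chunks.forEach(publisher::submit);
        } // close() signals onComplete once buffered chunks have been delivered
        try {
            if (subscriber.done.await(5, TimeUnit.SECONDS) == false) {
                throw new IllegalStateException("stream did not complete in time");
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return List.copyOf(subscriber.parsed);
    }

    public static void main(String[] args) {
        System.out.println(parseAll(List.of(" hello ", " world ")));
    }
}
```

Requesting the next chunk only after the previous one is parsed keeps chunks in order and applies back-pressure to the producer.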

…) (elastic#114785) The same line already exists in [L543](https://github.com/ywangd/elasticsearch/blob/9f4a7927bdc366f8ca98c4652ac7d1102d9430f5/server/src/main/java/org/elasticsearch/node/Node.java#L543). It should have no practical impact, since AbstractLifecycleComponent#close short-circuits if its lifecycle is already closed. The original code meant to close IndicesMetrics; this PR adds that. Relates: elastic#113737
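The "no practical impact" argument relies on `close()` being idempotent. A minimal sketch of that short-circuit, assuming a simplified component that tracks closure with an `AtomicBoolean` (the real `AbstractLifecycleComponent` tracks a richer lifecycle state); `LifecycleComponent` and `CountingComponent` are hypothetical:

```java
import java.util.concurrent.atomic.AtomicBoolean;

public class IdempotentClose {
    // Simplified lifecycle component: doClose() runs at most once, however often close() is called.
    static abstract class LifecycleComponent implements AutoCloseable {
        private final AtomicBoolean closed = new AtomicBoolean(false);

        @Override
        public final void close() {
            if (closed.compareAndSet(false, true) == false) {
                return; // already closed: short-circuit
            }
            doClose();
        }

        protected abstract void doClose();
    }

    static final class CountingComponent extends LifecycleComponent {
        int closeCount = 0;

        @Override
        protected void doClose() {
            closeCount++;
        }
    }

    public static void main(String[] args) {
        CountingComponent c = new CountingComponent();
        c.close();
        c.close(); // a duplicated close call, like the repeated line described above
        System.out.println(c.closeCount);
    }
}
```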

…ithTrainedModelAndInference elastic#114023

…rce.mode` (elastic#114433) (elastic#114680) * Introduce `index.mapping.source.mode` setting to override `_source.mode` (elastic#114433) * feature: introduce index.mapping.source.mode setting Introduce a new `index.mapping.source.mode` setting which will be used to override the mapping-level `_source.mode`. For now the mapping-level setting will stay and be deprecated later with another PR. The setting always takes precedence. When it is not defined, the index mode is used, and it can be overridden by the mapping-level `_source.mode` definition. (cherry picked from commit edcabb8) * fix: replace return switch with switch case * fix: stored source mode not supported in 8.16 We also update a few error messages to account for a few minor differences. * Revert "fix: stored source mode not supported in 8.16" This reverts commit 2e523c3. * fix: stored source mode not supported in 8.16 We also update a few error messages to account for a few minor differences. * fix: update error message for time_series --------- Co-authored-by: Elastic Machine <[email protected]>
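The precedence order described in this commit (index setting, then mapping-level `_source.mode`, then the index mode's default) can be sketched as a small resolver. The enum names and the per-index-mode defaults (synthetic for `time_series` and `logsdb`, stored otherwise) are assumptions for illustration, not code from the PR:

```java
import java.util.Optional;

public class SourceModeResolution {
    enum SourceMode { STORED, SYNTHETIC, DISABLED }

    enum IndexMode { STANDARD, TIME_SERIES, LOGSDB }

    // Assumed per-index-mode defaults: synthetic for time_series and logsdb, stored otherwise.
    static SourceMode defaultFor(IndexMode mode) {
        return switch (mode) {
            case TIME_SERIES, LOGSDB -> SourceMode.SYNTHETIC;
            case STANDARD -> SourceMode.STORED;
        };
    }

    // Precedence: index.mapping.source.mode > mapping-level _source.mode > index-mode default.
    static SourceMode resolve(Optional<SourceMode> indexSetting, Optional<SourceMode> mappingMode, IndexMode indexMode) {
        return indexSetting.or(() -> mappingMode).orElseGet(() -> defaultFor(indexMode));
    }

    public static void main(String[] args) {
        // No index setting, mapping asks for stored source on a logsdb index.
        System.out.println(resolve(Optional.empty(), Optional.of(SourceMode.STORED), IndexMode.LOGSDB));
    }
}
```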

…ic#114482) (elastic#114793)

… (elastic#114792) Skip some csv tests that cannot be used in bwc tests before 8.13/8.14.

**Introduction**

> In order to make adoption of failure stores simpler for all users, we are introducing a new syntactical feature to index expression resolution: the selector.
>
> Selectors, denoted with a `::` followed by a recognized suffix, will allow users to specify which component of an index abstraction they would like to operate on within an API call. In this case, an index abstraction is a concrete index, data stream, or alias; any abstraction that can be resolved to a set of indices/shards. We define a component of an index abstraction to be some searchable unit of the index abstraction.
>
> To start, we will support two components: data and failures. Concrete indices are their own data components, while the data component for an index alias consists of all of the indices contained therein. For data streams, the data component corresponds to their backing indices. Data stream aliases mirror this, treating all backing indices of the data streams they correspond to as their data component.
>
> The failure component is only supported by data streams and data stream aliases. The failure component of these abstractions refers to the data streams' failure stores. Indices and index aliases do not have a failure component.

For more details and examples see elastic#113144. All this work has been cherry picked from there.

**Purpose of this PR**

This PR introduces a wrapper around the resolved expression, which used to be a `String`, to create the base on which the selectors are going to be added. The current PR is just a refactoring and does not and should not change any existing behaviour.

Co-authored-by: Elastic Machine <[email protected]>
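To show how a wrapper type gives selectors somewhere to live, here is a minimal sketch in which the resolved expression becomes a record carrying the resource name and an optional selector. `Selector`, `ResolvedExpression` and `parse` are hypothetical; the PR itself only introduces the wrapper, and its real shape may differ:

```java
public class ResolvedExpressionSketch {
    // The two components described above (hypothetical enum, not the PR's code).
    enum Selector {
        DATA("data"), FAILURES("failures");

        final String suffix;

        Selector(String suffix) {
            this.suffix = suffix;
        }

        static Selector fromSuffix(String suffix) {
            for (Selector s : values()) {
                if (s.suffix.equals(suffix)) {
                    return s;
                }
            }
            throw new IllegalArgumentException("unknown selector [::" + suffix + "]");
        }
    }

    // Wraps what used to be a bare String. `selector` is null when the expression has no "::" suffix.
    record ResolvedExpression(String resource, Selector selector) {
        static ResolvedExpression parse(String expression) {
            int idx = expression.indexOf("::");
            if (idx < 0) {
                return new ResolvedExpression(expression, null);
            }
            return new ResolvedExpression(expression.substring(0, idx), Selector.fromSuffix(expression.substring(idx + 2)));
        }
    }

    public static void main(String[] args) {
        System.out.println(ResolvedExpression.parse("logs-app-default::failures"));
        System.out.println(ResolvedExpression.parse("my-index"));
    }
}
```

Once every call site passes the record instead of a `String`, adding selector handling becomes a change to one type rather than to every resolver.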

…ic#114798) Fix elastic#114767. TopN didn't work in this scenario on old versions.

… (elastic#114799) * Guard second doc parsing pass with index setting (elastic#114649) * Guard second doc parsing pass with index setting * add test * updates * updates * merge (cherry picked from commit 98e0a4e) * Update 21_synthetic_source_stored.yml

…elastic#114294) (elastic#114802) * Extend timeout of test and add logging on fail * Unmute unstable test * Switch to using logger for output Keeps the forbiddenApis check happy * Switch to using assertion messages to display To display debug info * Adjust logic of previous step info preservation Add additional checks to ensure previous step info can't be cleared when auto retrying, only updated with new info. Also added logic to ensure previous step info is cleared when transitioning to a new action * Undo accidentally added lines from merge

… (elastic#114783) * Adding new bbq index types behind a feature flag (elastic#114439) New index types, `bbq_hnsw` and `bbq_flat`, which utilize the better binary quantization formats: a 32x reduction in memory, with nice recall properties. (cherry picked from commit 6c752ab) * spotless
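The 32x figure follows from going from 32 bits (one float32) to 1 bit per dimension. A small worked calculation, assuming 1024-dimensional vectors; it counts raw vector bytes only and ignores any per-vector metadata or graph structures, so the end-to-end saving is likely somewhat lower. `BbqMemoryEstimate` is hypothetical:

```java
import java.util.Locale;

public class BbqMemoryEstimate {
    // Raw vector storage only: float32 uses 32 bits per dimension, 1-bit quantization uses 1 bit.
    static long float32Bytes(int dims, long vectors) {
        return (long) dims * Float.BYTES * vectors;
    }

    static long oneBitBytes(int dims, long vectors) {
        return ((dims + 7) / 8) * vectors; // round up to whole bytes per vector
    }

    public static void main(String[] args) {
        int dims = 1024;
        long vectors = 1_000_000L;
        long raw = float32Bytes(dims, vectors);       // 4,096,000,000 bytes
        long quantized = oneBitBytes(dims, vectors);  // 128,000,000 bytes
        System.out.printf(Locale.ROOT, "%,d -> %,d bytes (%dx smaller)%n", raw, quantized, raw / quantized);
    }
}
```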

* ESQL: Fix test muting (elastic#115448) Fix the test muting on the test for grapheme clusters: it should only allow the test if we're on JVM 20+. Closes elastic#114536 * Change old explanation
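Gating a test on the JVM's major version can be sketched with `Runtime.version().feature()` (available since JDK 10). The threshold of 20 comes from the commit message; `JvmVersionGate` is hypothetical, and in a JUnit-based test the same check would typically feed an assumption such as JUnit's `assumeTrue`:

```java
public class JvmVersionGate {
    // Threshold taken from the commit message: the grapheme-cluster test should only run on JDK 20+.
    static final int MIN_FEATURE_VERSION = 20;

    static boolean shouldRunGraphemeClusterTest(int featureVersion) {
        return featureVersion >= MIN_FEATURE_VERSION;
    }

    public static void main(String[] args) {
        // Runtime.version().feature() is the major release number, e.g. 21 on JDK 21.
        int feature = Runtime.version().feature();
        if (shouldRunGraphemeClusterTest(feature) == false) {
            System.out.println("skipping grapheme cluster test on JDK " + feature);
            return;
        }
        System.out.println("running grapheme cluster test on JDK " + feature);
    }
}
```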

If a node has been removed from the cluster and the trained model assignment has not been updated, the GET stats action can have an inconsistent view in which it thinks a model is deployed on the removed node. The bug only affected nodes with failed deployments.

…ested objects (elastic#115275) (elastic#115467) * Apply workaround for synthetic source of object arrays inside nested objects (elastic#115275) (cherry picked from commit f04bf5c) # Conflicts: # rest-api-spec/build.gradle # rest-api-spec/src/yamlRestTest/resources/rest-api-spec/test/indices.create/21_synthetic_source_stored.yml * Fix merge

…astic#115447) (cherry picked from commit 254cedc)

…tic#115386) (elastic#115409) (cherry picked from commit 91a5a2e) # Conflicts: # muted-tests.yml

…astic#115320) (elastic#115493)

…ndexSettingsProvider (elastic#115437) (elastic#115495) * Use settings from LogsdbIndexModeSettingsProvider in SyntheticSourceIndexSettingsProvider * update (cherry picked from commit 6f7bd55)

…lastic#115469) Fixes some faulty assertions in an upgrade test. Test failures only manifest on the 8.16 branch since 9.x does not qualify for these upgrade tests, and the change is not backported to 8.17 yet (unrelated CI failures). I validated this works by running it locally from the 8.16 branch. Resolves: elastic#115410 Resolves: elastic#115411 Co-authored-by: Elastic Machine <[email protected]>

…15502) Backports elastic#115234

…tic#115389) (elastic#115507) **Introduction**

> In order to make adoption of failure stores simpler for all users, we are introducing a new syntactical feature to index expression resolution: the selector.
>
> Selectors, denoted with a `::` followed by a recognized suffix, will allow users to specify which component of an index abstraction they would like to operate on within an API call. In this case, an index abstraction is a concrete index, data stream, or alias; any abstraction that can be resolved to a set of indices/shards. We define a component of an index abstraction to be some searchable unit of the index abstraction.
>
> To start, we will support two components: data and failures. Concrete indices are their own data components, while the data component for an index alias consists of all of the indices contained therein. For data streams, the data component corresponds to their backing indices. Data stream aliases mirror this, treating all backing indices of the data streams they correspond to as their data component.
>
> The failure component is only supported by data streams and data stream aliases. The failure component of these abstractions refers to the data streams' failure stores. Indices and index aliases do not have a failure component.

For more details and examples see elastic#113144. All this work has been cherry picked from there.

**Purpose of this PR**

This PR introduces `::*` as a selector option in its own right, rather than as a combination of `::data` and `::failures`. The reason for this change is that we need to differentiate between:

- `my-index::*`, which should resolve to `my-index::data` only and not to `my-index::failures`, and
- a user explicitly requesting `my-index::data, my-index::failures`, which could potentially result in an error.
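Our reading of the distinction above can be sketched as an expansion step: `::*` means "every component this abstraction has", while an explicit `::failures` on something without a failure component is rejected. The enum, the `hasFailureComponent` flag and `expand` are hypothetical, not the PR's implementation:

```java
import java.util.List;

public class SelectorExpansion {
    // Hypothetical selector set: ::data, ::failures and the new ::* option.
    enum Selector { DATA, FAILURES, ALL_APPLICABLE }

    // `hasFailureComponent` is true for data streams and data stream aliases,
    // false for concrete indices and index aliases.
    static List<Selector> expand(Selector requested, boolean hasFailureComponent) {
        return switch (requested) {
            case DATA -> List.of(Selector.DATA);
            case FAILURES -> {
                if (hasFailureComponent == false) {
                    // An explicit ::failures on a plain index is treated as a user error.
                    throw new IllegalArgumentException("::failures is only supported on data streams and their aliases");
                }
                yield List.of(Selector.FAILURES);
            }
            // ::* only expands to components that exist, so it never fails on plain indices.
            case ALL_APPLICABLE -> hasFailureComponent
                ? List.of(Selector.DATA, Selector.FAILURES)
                : List.of(Selector.DATA);
        };
    }

    public static void main(String[] args) {
        System.out.println(expand(Selector.ALL_APPLICABLE, false)); // [DATA]
        System.out.println(expand(Selector.ALL_APPLICABLE, true));  // [DATA, FAILURES]
    }
}
```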

…elastic#115061) (elastic#115519) * simplify syntax of named parameter for identifier and pattern (cherry picked from commit 92ecd36)

…115525) Co-authored-by: David Kyle <[email protected]>

… testProcessFileChanges elastic#115280

We are seeing exceptions ~0.03% of the time in our integration tests:

```
org.apache.http.ConnectionClosedException: Connection closed unexpectedly
```

The `contentDecoder` does not always fully consume the body within `SimpleInputBuffer.consumeContent`. When we return to Apache, the rest of the body is never delivered, and the IOSession eventually times out and gets cleaned up. During that cleanup process, Apache calls our Consumer with the above exception.

If we read 0 bytes and return immediately, Apache has a better chance to load the rest of the body/footer, and it will call `consumeContent` again. This reduces the exception rate to ~0.001%.

Fix elastic#114105
Fix elastic#114232
Fix elastic#114327
Fix elastic#114385

…115530) (cherry picked from commit 28715b7) Co-authored-by: mspielberg <[email protected]>

Resolves elastic#110002 Resolves elastic#110003 Resolves elastic#110005 Enable Values, Count, CountDistinct, Min and Max aggregations on date nanos. In the course of addressing this, I had to make some changes to AggregateMapper where it maps types into string names. I tried to refactor this once before (elastic#110841) but at the time we decided not to go ahead with it. That bit me while working on this, and so I am trying again to refactor it. This time I've made a more localized change, just replacing the cascading if block with a switch. That will cause a compile time failure when future new data types are added, unless they correctly update this section. I've also done a small refactoring on the aggregators themselves, to make the supplier function consistent with the typeResolution. --------- Co-authored-by: Elastic Machine <[email protected]>

github-actions · 2024-10-24T13:59:48Z

Documentation preview:

✨ Changed pages

not-napoleon · 2024-10-24T14:00:31Z

oops, set the wrong target branch.

martijnvg and others added 30 commits October 15, 2024 02:39

[ML] Switch default chunking strategy to sentence (elastic#114453) (e…

fc63a61

…lastic#114730)

Don't close/recreate adaptive allocations metrics (elastic#114721) (e…

d82daf4

…lastic#114731)

[ML] Send mid-stream errors to users (elastic#114549) (elastic#114746)

1ffe41b

If Apache sends an error mid-stream, forward it to the user rather than to the now-ignored listener.

Mute org.elasticsearch.xpack.inference.integration.ModelRegistryIT te…

b0197c8

…stGetModel elastic#114657

[ML] Ignore unrecognized openai sse fields (elastic#114715) (elastic#…

3038a43

…114744) Azure / Llama sends back fields we do not expect - rewriting the parser to better handle unknown fields (by dropping them).

Refactor merge scheduling code to allow overrides (elastic#114547) (e…

737f803

…lastic#114751) This code refactors how the merge scheduler is configured to allow different engine implementations to configure different merge schedulers.

Test StDistance multivalue consistency and fixed two CartesianPoint b…

14ea56a

…ugs (elastic#114729) (elastic#114755)

Remove PushTopNToSource support for ExchangeExec (elastic#114637) (el…

844ae7d

…astic#114754) This appears to be dead code, so we're removing it.

Fixing test failure for elastic#114556 (elastic#114617) (elastic#114632)

6a00e91

Co-authored-by: Elastic Machine <[email protected]>

only return deprecation warning for elser service (elastic#114507) (e…

5babddf

…lastic#114770) Co-authored-by: Max Hniebergall <[email protected]> Co-authored-by: Elastic Machine <[email protected]>

[ML] Default inference endpoint for the multilingual-e5-small model (e…

ffcf87c

…lastic#114683) (elastic#114779)

Mute org.elasticsearch.xpack.inference.TextEmbeddingCrudIT testPutE5W…

e4a3fd6

…ithTrainedModelAndInference elastic#114023

Remove snapshot build restriction for match and qstr functions (elast…

581894a

…ic#114482) (elastic#114793)

ESQL: Add skips to tests that were added retroactively (elastic#114727)…

c87614c

… (elastic#114792) Skip some csv tests that cannot be used in bwc tests before 8.13/8.14.

Skip spatial.AirportsSortCityName before 8.13 (elastic#114795) (elast…

684c66a

…ic#114798) Fix elastic#114767. TopN didn't work in this scenario on old versions.

nik9000 and others added 17 commits October 24, 2024 08:50

ESQL: Fix test muting (elastic#115448) (elastic#115465)

548917a

* ESQL: Fix test muting (elastic#115448) Fix the test muting on the test for grapheme clusters: it should only allow the test if we're on JVM 20+. Closes elastic#114536 * Change old explanation

Separate tests for snapshot and release versions (elastic#115402) (el…

c3e5e4a

…astic#115447) (cherry picked from commit 254cedc)

Unmute SearchWithMinCompatibleSearchNodeIT tests muted for 7.17 (elas…

c85198e

…tic#115386) (elastic#115409) (cherry picked from commit 91a5a2e) # Conflicts: # muted-tests.yml

ES|QL: improve docs about escaping for GROK, DISSECT, LIKE, RLIKE (el…

5290630

…astic#115320) (elastic#115493)

Use settings from LogsdbIndexModeSettingsProvider in SyntheticSourceI…

b5bfb4f

…ndexSettingsProvider (elastic#115437) (elastic#115495) * Use settings from LogsdbIndexModeSettingsProvider in SyntheticSourceIndexSettingsProvider * update (cherry picked from commit 6f7bd55)

[DOCS] Resolves conflict. (elastic#115503)

4f3de83

Fix file settings service test on windows (elastic#115234) (elastic#1…

4a20067

…15502) Backports elastic#115234

[ES|QL] Simplify syntax of named parameter for identifier and pattern (…

b755d40

…elastic#115061) (elastic#115519) * simplify syntax of named parameter for identifier and pattern (cherry picked from commit 92ecd36)

[DOCS] Improve inference API documentation (elastic#115235) (elastic#…

36e95ca

…115525) Co-authored-by: David Kyle <[email protected]>

Mute org.elasticsearch.reservedstate.service.FileSettingsServiceTests…

5021d06

… testProcessFileChanges elastic#115280

Add documentation for minimum_should_match (elastic#113043) (elastic#…

1883db7

…115530) (cherry picked from commit 28715b7) Co-authored-by: mspielberg <[email protected]>

not-napoleon added >non-issue, backport, auto-merge-without-approval, test-release, :Analytics/ES|QL and v8.17.0 labels Oct 24, 2024

not-napoleon requested review from a team as code owners October 24, 2024 13:59

not-napoleon closed this Oct 24, 2024

elasticsearchmachine added the v9.0.0 label Oct 24, 2024
