Conversation

not-napoleon
Member

Resolves #110002
Resolves #110003
Resolves #110005

Enable Values, Count, CountDistinct, Min and Max aggregations on date nanos. In the course of addressing this, I had to make some changes to AggregateMapper where it maps types into string names. I tried to refactor this once before (#110841) but at the time we decided not to go ahead with it. That bit me while working on this, and so I am trying again to refactor it. This time I've made a more localized change, just replacing the cascading if block with a switch. That will cause a compile time failure when future new data types are added, unless they correctly update this section.
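To illustrate why the switch helps, here is a minimal sketch, not the actual AggregateMapper code. The `DataType` enum below is a hypothetical subset, and `TypeNames.nameOf` and the returned names are made up for the example. The point is only that a switch expression over an enum with no `default` branch must be exhaustive, so adding a new constant without a matching case fails at compile time.

```java
// Hypothetical subset of data types, for illustration only.
enum DataType { BOOLEAN, LONG, DOUBLE, DATETIME, DATE_NANOS }

final class TypeNames {
    private TypeNames() {}

    // Switch expression with no default branch: javac requires every enum
    // constant to be covered, so adding a constant to DataType without a
    // case here is a compile-time error rather than a silent fall-through.
    static String nameOf(DataType type) {
        return switch (type) {
            case BOOLEAN -> "Boolean";
            case LONG, DATETIME, DATE_NANOS -> "Long";
            case DOUBLE -> "Double";
        };
    }
}
```

A cascading `if`/`else if` block over the same enum compiles fine when a constant is missing, which is how a new type can slip through unnoticed.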

I've also done a small refactoring on the aggregators themselves, to make the supplier function consistent with the typeResolution.

martijnvg and others added 30 commits October 15, 2024 02:39
…lastic#114150) (elastic#114724)

With logsdb, another index mode is available, so the isTimeSeries parameter is limiting. Instead, we should just push down the index mode from template to index settings provider.

Follow up from elastic#113451
Relates to elastic#113583
…lastic#114737)

Special values like `0.0.0.0` may resolve to multiple IP addresses just
like hostnames, so the same considerations apply when using such values
as a publish address. This commit spells this case out in the docs and
cleans up the nearby wording a little.
…14736)

Today the overloads of `XContentBuilder#timeField` do two rather
different things: one formats an object as a `String` representation of
a time (where the object is either an unambiguous time object or else a
`long`) and the other formats only a `long` as one or two fields
depending on the `?human` flag.

This is trappy in a number of ways:

- `long` means an absolute (epoch) time, but sometimes folks will
  mistakenly use this for time intervals too.

- `long` means only milliseconds, there is no facility to specify a
  different unit.

- the dependence on the `?human` flag in exactly one of the overloads is
  kinda weird.

This commit removes the confusion by dropping support for considering a
`Long` as a valid representation of a time at all, and instead requiring
callers to either convert it into a proper time object or else call a
method that is explicitly expecting an epoch time in milliseconds.
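
As a rough sketch of the resulting shape (the class and method names below are hypothetical stand-ins, not the real `XContentBuilder` signatures), one method accepts only an unambiguous time object and the other states the epoch-milliseconds contract in its name:

```java
import java.time.Instant;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical stand-in for a builder; NOT the real XContentBuilder API.
final class TimeFields {
    private final Map<String, Object> fields = new LinkedHashMap<>();

    // Accepts only an unambiguous time object, rendered as an ISO-8601 string.
    TimeFields timestampField(String name, Instant time) {
        fields.put(name, time.toString());
        return this;
    }

    // The method name spells out both the epoch semantics and the unit, so a
    // caller holding a duration or a value in seconds has to notice the mismatch.
    TimeFields epochMillisField(String name, long epochMillis) {
        fields.put(name, Instant.ofEpochMilli(epochMillis).toString());
        return this;
    }

    Map<String, Object> build() {
        return fields;
    }
}
```

With no overload that treats a bare `Long` as a time, a call such as `timestampField("took", 1500L)` does not compile, which is the kind of mistake the commit is trying to rule out.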
…existing inference endpoints (elastic#114457) (elastic#114734)

* [Inference API] Introduce Update API to change some aspects of existing inference endpoints (elastic#114457)

(cherry picked from commit 6b714e2)

* Fix syntax error caused by old JDK?
If apache sends an error mid stream, forward it to the user rather than
the now-ignored listener.
* Add a query rules tester API call

* Update docs/changelog/114168.yaml

* Wrap client call in async with origin

* Remove unused param

* PR feedback

* Remove redundant test

* CI workaround - add ent-search as ml dependency so it can find node features
…114744)

Azure / Llama sends back fields we do not expect - rewriting the parser
to better handle unknown fields (by dropping them).
…lastic#114751)

This code refactors how the merge scheduler is configured to allow
different engine implementations to configure different merge schedulers.
…lastic#114407) (elastic#114756)

**Description:**

This PR addresses the issue described in [elastic#114402](elastic#114402), where the `synthetic_source` feature does not correctly handle the `bit` type in `dense_vector` fields when `index` is set to `false`. The root cause of the issue was that the `bit` type was not properly accounted for, leading to an array that is 8 times the size of what is actually stored in doc values for the given `dims`. This mismatch causes an array out-of-bounds exception when reconstructing the document.

**Changes:**

- Adjusted the `synthetic_source` logic to correctly handle the `bit` type by ensuring the array size accounts for the 8x difference in dimensions.
- Added yaml test to cover the `bit` type scenario in `dense_vector` fields with `index` set to `false`.
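
The 8x difference can be sketched as follows. This is an illustration under the assumption that a `bit` vector packs 8 dimensions into each stored byte; `ElementType` and `VectorSizes.storedLength` are hypothetical names, not the ones used in the actual fix.

```java
// Hypothetical element types, for illustration only.
enum ElementType { FLOAT, BYTE, BIT }

final class VectorSizes {
    private VectorSizes() {}

    // Number of stored values for a vector declared with `dims` dimensions.
    // For BIT, 8 dimensions are assumed to be packed into one byte, so sizing
    // the array by `dims` directly would make it 8 times too large.
    static int storedLength(ElementType type, int dims) {
        return switch (type) {
            case FLOAT, BYTE -> dims;
            case BIT -> {
                if (dims % Byte.SIZE != 0) {
                    throw new IllegalArgumentException("bit vectors need dims divisible by 8, got " + dims);
                }
                yield dims / Byte.SIZE;
            }
        };
    }
}
```

For example, `storedLength(ElementType.BIT, 64)` is 8, while the same `dims` for a `FLOAT` vector gives 64.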

**Related Issues:**

- Closes [elastic#114402](elastic#114402)
- Introduced in [elastic#110059](elastic#110059)

Co-authored-by: Rassyan <[email protected]>
Google supports SSE for chat completion and sends the same payload as
their non-streaming calls, so we can reuse the SSE parser with our
existing parse function.

The downside is that Google requires a different URI, so we refactored away
from the visitor pattern to allow a different URI to be created and set
at request time rather than at model instantiation time.
…astic#114758)

* [ML] Pick best model variant for the default elser endpoint (elastic#114690)

# Conflicts:
#	x-pack/plugin/inference/src/main/java/org/elasticsearch/xpack/inference/InferencePlugin.java
#	x-pack/plugin/inference/src/test/java/org/elasticsearch/xpack/inference/services/elasticsearch/ElasticsearchInternalServiceTests.java
#	x-pack/plugin/src/yamlRestTest/resources/rest-api-spec/test/inference/inference_crud.yml

* fix test

* fix test
* [ML] Stream Bedrock Completion (elastic#114732)

Notes:
- Adds a new API to the chatCompletionRequest to invoke the Bedrock
  Stream API
- Create a StreamingChatProcessor that subscribes to streaming results
  from bedrock and handles the parsing on another thread.
- There was no good way (that I could see) to extend the Provider-based
  CompletionRequestEntity, so they have been flattened into one
  RequestEntity that can be shared between ConverseRequest and
  ConverseStreamRequest.

* Use jdk17 API
…) (elastic#114785)

The same line already exists in
[L543](https://github.com/ywangd/elasticsearch/blob/9f4a7927bdc366f8ca98c4652ac7d1102d9430f5/server/src/main/java/org/elasticsearch/node/Node.java#L543).
It should have no practical impact since AbstractLifecycleComponent#close
short-circuits if its lifecycle is already closed. The original code
meant to close IndicesMetrics. This PR adds it.

Relates: elastic#113737
…rce.mode` (elastic#114433) (elastic#114680)

* Introduce `index.mapping.source.mode` setting to override `_source.mode` (elastic#114433)

* feature: introduce index.mapping.source.mode setting

Introduce a new `index.mapping.source.mode` setting which will be used
to override the mapping level `_source.mode`. For now the mapping
level setting will stay and be deprecated later with another PR.

The setting always takes precedence. When not defined
the index mode is used and can be overridden by the _source.mode
mapping level definition.
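
The precedence described above can be sketched like this. All names are hypothetical, and the per-index-mode defaults (`STORED` for standard, `SYNTHETIC` for time_series and logsdb) are an assumption of the sketch rather than something stated in this commit message.

```java
import java.util.Optional;

// Hypothetical enums, for illustration only.
enum SourceMode { STORED, SYNTHETIC, DISABLED }
enum IndexMode { STANDARD, TIME_SERIES, LOGSDB }

final class SourceModeResolver {
    private SourceModeResolver() {}

    // Assumed defaults per index mode; not taken from the commit message.
    static SourceMode defaultFor(IndexMode indexMode) {
        return switch (indexMode) {
            case STANDARD -> SourceMode.STORED;
            case TIME_SERIES, LOGSDB -> SourceMode.SYNTHETIC;
        };
    }

    // 1. The index-level setting wins whenever it is defined.
    // 2. Otherwise the mapping-level _source.mode applies, if present.
    // 3. Otherwise fall back to the default for the index mode.
    static SourceMode resolve(Optional<SourceMode> indexSetting,
                              Optional<SourceMode> mappingMode,
                              IndexMode indexMode) {
        return indexSetting
            .or(() -> mappingMode)
            .orElseGet(() -> defaultFor(indexMode));
    }
}
```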

(cherry picked from commit edcabb8)

* fix: replace return switch with switch case

* fix: stored source mode not supported in 8.16

We also update a few error messages to account
for a few minor differences.

* Revert "fix: stored source mode not supported in 8.16"

This reverts commit 2e523c3.

* fix: stored source mode not supported in 8.16

We also update a few error messages to account
for a few minor differences.

* fix: update error message for time_series

---------

Co-authored-by: Elastic Machine <[email protected]>
… (elastic#114792)

Skip some csv tests that cannot be used in bwc tests before 8.13/8.14.
**Introduction**

> In order to make adoption of failure stores simpler for all users, we
> are introducing a new syntactical feature to index expression
> resolution: The selector.
>
> Selectors, denoted with a :: followed by a recognized suffix will
> allow users to specify which component of an index abstraction they
> would like to operate on within an API call. In this case, an index
> abstraction is a concrete index, data stream, or alias; Any
> abstraction that can be resolved to a set of indices/shards. We define
> a component of an index abstraction to be some searchable unit of the
> index abstraction.
>
> To start, we will support two components: data and failures. Concrete
> indices are their own data components, while the data component for
> index aliases are all of the indices contained therein. For data
> streams, the data component corresponds to their backing indices. Data
> stream aliases mirror this, treating all backing indices of the data
> streams they correspond to as their data component.
>
> The failure component is only supported by data streams and data
> stream aliases. The failure component of these abstractions refer to
> the data streams' failure stores. Indices and index aliases do not
> have a failure component.

For more details and examples see
elastic#113144. All this work has
been cherry picked from there.

**Purpose of this PR**

This PR is introducing a wrapper around the resolved expression that
used to be a `String` to create the base on which the selectors are
going to be added.

The current PR is just a refactoring and does not and should not change
any existing behaviour.
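
A minimal sketch of what such a wrapper could look like, assuming a Java record; `ResolvedExpression`, its `resource` component, and `Resolver.wrap` are illustrative names and may not match the real classes:

```java
import java.util.List;
import java.util.Objects;

// Hypothetical wrapper around what used to be a plain String. A selector
// component could later be added here without touching every call site again.
record ResolvedExpression(String resource) {
    ResolvedExpression {
        Objects.requireNonNull(resource, "resource");
    }
}

final class Resolver {
    private Resolver() {}

    // Pure refactoring step: the same names go in and come out, only wrapped.
    static List<ResolvedExpression> wrap(List<String> names) {
        return names.stream().map(ResolvedExpression::new).toList();
    }
}
```

Since the record only carries the original name, code that previously consumed the `String` can call `resource()` and behave exactly as before.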

Co-authored-by: Elastic Machine <[email protected]>
… (elastic#114799)

* Guard second doc parsing pass with index setting (elastic#114649)

* Guard second doc parsing pass with index setting

* add test

* updates

* updates

* merge

(cherry picked from commit 98e0a4e)

* Update 21_synthetic_source_stored.yml
…elastic#114294) (elastic#114802)

* Extend timeout of test and add logging on fail

* Unmute unstable test

* Switch to using logger for output

Keeps the forbiddenApis check happy

* Switch to using assertion messages to display

To display debug info

* Adjust logic of previous step info preservation

Add additional checks to ensure previous step info can't be cleared
when auto retrying, only updated with new info.

Also added logic to ensure previous step info is cleared when
transitioning to a new action

* Undo accidentally added lines from merge
… (elastic#114783)

* Adding new bbq index types behind a feature flag (elastic#114439)

new index types of bbq_hnsw and bbq_flat which utilize the better binary quantization formats. A 32x reduction in memory, with nice recall properties.
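
The 32x figure is consistent with going from one 32-bit float per dimension to one bit per dimension. A back-of-the-envelope sketch of that arithmetic follows; `QuantizationMath` is a made-up helper, and it ignores any per-vector metadata the real format may store.

```java
// Illustrative arithmetic only; not part of the bbq implementation.
final class QuantizationMath {
    private QuantizationMath() {}

    // Raw float32 storage: 4 bytes per dimension.
    static long float32Bytes(int dims) {
        return (long) dims * Float.BYTES;
    }

    // One bit per dimension, rounded up to whole bytes.
    static long oneBitBytes(int dims) {
        return (dims + 7L) / 8L;
    }
}
```

For a 1024-dimensional vector this gives 4096 bytes versus 128 bytes, a ratio of 32.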

(cherry picked from commit 6c752ab)

* spotless
nik9000 and others added 17 commits October 24, 2024 08:50
* ESQL: Fix test muting (elastic#115448)

Fix the test muting on the test for grapheme clusters - it should only
allow the test if we're on the 20+ jvm.

Closes elastic#114536

* Change old explanation
If a node has been removed from the cluster and the trained model
assignment has not been updated, the GET stats action can have an
inconsistent view where it thinks a model is deployed on the removed
node. The bug only affected nodes with failed deployments.
…ested objects (elastic#115275) (elastic#115467)

* Apply workaround for synthetic source of object arrays inside nested objects (elastic#115275)

(cherry picked from commit f04bf5c)

# Conflicts:
#	rest-api-spec/build.gradle
#	rest-api-spec/src/yamlRestTest/resources/rest-api-spec/test/indices.create/21_synthetic_source_stored.yml

* Fix merge
…ndexSettingsProvider (elastic#115437) (elastic#115495)

* Use settings from LogsdbIndexModeSettingsProvider in SyntheticSourceIndexSettingsProvider

* update

(cherry picked from commit 6f7bd55)
…lastic#115469)

Fixes some faulty assertions in an upgrade test. Test failures only
manifest on the 8.16 branch since 9.x does not qualify for these upgrade
tests, and the change is not backported to 8.17 yet (unrelated CI
failures).

I validated this works by running it locally from the 8.16 branch.

Resolves: elastic#115410
Resolves: elastic#115411

Co-authored-by: Elastic Machine <[email protected]>
…tic#115389) (elastic#115507)

**Introduction**

> In order to make adoption of failure stores simpler for all users, we
> are introducing a new syntactical feature to index expression
> resolution: The selector.
>
> Selectors, denoted with a :: followed by a recognized suffix will
> allow users to specify which component of an index abstraction they
> would like to operate on within an API call. In this case, an index
> abstraction is a concrete index, data stream, or alias; Any
> abstraction that can be resolved to a set of indices/shards. We define
> a component of an index abstraction to be some searchable unit of the
> index abstraction.
>
> To start, we will support two components: data and failures. Concrete
> indices are their own data components, while the data component for
> index aliases are all of the indices contained therein. For data
> streams, the data component corresponds to their backing indices. Data
> stream aliases mirror this, treating all backing indices of the data
> streams they correspond to as their data component.
>
> The failure component is only supported by data streams and data
> stream aliases. The failure component of these abstractions refer to
> the data streams' failure stores. Indices and index aliases do not
> have a failure component.

For more details and examples see
elastic#113144. All this work has
been cherry picked from there.

**Purpose of this PR**

This PR introduces `::*` as another selector option rather than as a
combination of `::data` and `::failures`. The reason for this change is
that we need to differentiate between:

- `my-index::*`, which should resolve to `my-index::data` only and not to `my-index::failures`, and
- a user explicitly requesting `my-index::data, my-index::failures`, which could potentially result in an error.
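
The distinction above can be sketched as follows. The names are hypothetical, and the sketch assumes that an explicit `::failures` request on an abstraction without a failure store is rejected, which the description only says may potentially happen.

```java
import java.util.EnumSet;
import java.util.Set;

// Hypothetical types, for illustration only.
enum Component { DATA, FAILURES }
enum Selector { DATA, FAILURES, ALL }

final class Selectors {
    private Selectors() {}

    // `hasFailureStore` is true for data streams (and their aliases) and
    // false for concrete indices and index aliases.
    static Set<Component> resolve(Selector selector, boolean hasFailureStore) {
        return switch (selector) {
            case DATA -> EnumSet.of(Component.DATA);
            case FAILURES -> {
                // Explicit request for a component that does not exist.
                if (!hasFailureStore) {
                    throw new IllegalArgumentException("no failure component to select");
                }
                yield EnumSet.of(Component.FAILURES);
            }
            // `::*` means "whatever components exist", so it never fails.
            case ALL -> hasFailureStore
                ? EnumSet.allOf(Component.class)
                : EnumSet.of(Component.DATA);
        };
    }
}
```

Keeping `ALL` as its own selector, instead of expanding it to `DATA` plus `FAILURES` up front, is what lets the resolver tell a wildcard apart from an explicit request.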
…elastic#115061) (elastic#115519)

* simplify syntax of named parameter for identifier and pattern

(cherry picked from commit 92ecd36)
We are seeing exceptions ~0.03% of the time in our integration tests:
```
org.apache.http.ConnectionClosedException: Connection closed unexpectedly
```

The `contentDecoder` does not always fully consume the body within
`SimpleInputBuffer.consumeContent`. When we return back to Apache, the
rest of the body is never delivered, and the IOSession eventually times
out and gets cleaned up. During that cleanup process, Apache calls our
Consumer with the above exception.

If we read 0 bytes and return back immediately, Apache has a better
chance to load the rest of the body/footer, and it will call
`consumeContent` again. This reduces the exception rate
down to ~0.001%.

Fix elastic#114105
Fix elastic#114232
Fix elastic#114327
Fix elastic#114385
Resolves elastic#110002
Resolves elastic#110003
Resolves elastic#110005

Enable Values, Count, CountDistinct, Min and Max aggregations on date nanos. In the course of addressing this, I had to make some changes to AggregateMapper where it maps types into string names. I tried to refactor this once before (elastic#110841) but at the time we decided not to go ahead with it. That bit me while working on this, and so I am trying again to refactor it. This time I've made a more localized change, just replacing the cascading if block with a switch. That will cause a compile time failure when future new data types are added, unless they correctly update this section.

I've also done a small refactoring on the aggregators themselves, to make the supplier function consistent with the typeResolution.

---------

Co-authored-by: Elastic Machine <[email protected]>
@not-napoleon not-napoleon added >non-issue backport auto-merge-without-approval Automatically merge pull request when CI checks pass (NB doesn't wait for reviews!) test-release Trigger CI checks against release build :Analytics/ES|QL AKA ESQL v8.17.0 labels Oct 24, 2024
@not-napoleon not-napoleon requested review from a team as code owners October 24, 2024 13:59
Contributor

Documentation preview:

@not-napoleon
Member Author

oops, set the wrong target branch.

Labels

:Analytics/ES|QL AKA ESQL auto-merge-without-approval Automatically merge pull request when CI checks pass (NB doesn't wait for reviews!) backport >non-issue test-release Trigger CI checks against release build v8.17.0 v9.0.0
