seanstory
Owner

  • Have you signed the contributor license agreement?
  • Have you followed the contributor guidelines?
  • If submitting code, have you built your formula locally prior to submission with gradle check?
  • If submitting code, is your pull request against main? Unless there is a good reason otherwise, we prefer pull requests against main and will backport as needed.
  • If submitting code, have you checked that your submission is for an OS and architecture that we support?
  • If you are submitting this code for a class then read our policy for that.

mosche and others added 30 commits July 8, 2025 10:42
We specify the master node timeout from the REST request to avoid
waiting for the task indefinitely.

Resolves elastic#120389
These test failures looked like infra/CI blips to me.

Closes elastic#124518
…lastic#128917)

Part of elastic#124715 and
similar to elastic#128476.
Different from elastic#128476 in
that it takes a "LogicalPlan" approach to running a sub-query,
integrating its result back in the "main" LogicalPlan and continuing
running the query.
* Update [email protected]

* Update resources.yaml

* fix: explicitly map system.process.cpu.start_time to date

* Update [email protected]

* Update [email protected]

* Update [email protected]
…ic#129684)

In a follow-up (elastic#128993), the remaining lenient usage of booleans will be deprecated, with the goal of eventually removing everything except a few places that require lenient parsing by means of Booleans.parseBooleanLenient, which is a wrapper around Boolean.parseBoolean.
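
To illustrate the difference between strict and lenient boolean parsing mentioned above, here is a minimal sketch. The class and method names (`BooleanParsingSketch`, `parseStrict`, `parseLenient`) are hypothetical and are not the Elasticsearch `Booleans` API; only `Boolean.parseBoolean` is a real JDK call.

```java
// Hypothetical names for illustration only; not the Elasticsearch Booleans class.
final class BooleanParsingSketch {

    /** Strict parsing: only the exact strings "true" and "false" are accepted. */
    static boolean parseStrict(String value) {
        if ("true".equals(value)) {
            return true;
        }
        if ("false".equals(value)) {
            return false;
        }
        throw new IllegalArgumentException("cannot parse [" + value + "] as a boolean");
    }

    /** Lenient parsing: the JDK maps anything other than "true" (ignoring case) to false. */
    static boolean parseLenient(String value) {
        return Boolean.parseBoolean(value);
    }

    public static void main(String[] args) {
        System.out.println(parseLenient("yes"));  // false: silently treated as false
        System.out.println(parseLenient("TRUE")); // true: case is ignored
        try {
            parseStrict("yes");
        } catch (IllegalArgumentException e) {
            System.out.println("rejected: " + e.getMessage());
        }
    }
}
```

The lenient variant never fails, which is why a typo such as `"yes"` or `"ture"` can go unnoticed as `false`.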
---------

Co-authored-by: Moritz Mack <[email protected]>
This action solely needs the cluster state, it can run on any node.
Since this action is invoked across clusters, we need to be able to
(de)serialize requests and responses. We introduce a new
`RemoteClusterStateRequest` that wraps the existing
`ClusterStateRequest` and implements (de)serialization.
Add verification for LocalLogical plan
The verification is skipped if there is remote enrich, similar to how it is skipped for LocalPhysical plan optimization.
The skip only happens for LocalLogical and LocalPhysical plan optimizers.
* Add filtering for kNN vector indexer test scenarios

* [CI] Auto commit changes from spotless

---------

Co-authored-by: elasticsearchmachine <[email protected]>
Cleanup tracing header name constants
This commit fixes the Int7uScorerBenchmarkTests for running on Java 21, since scoring with heap segments is only supported on Java 22 and greater.
…st {p0=mtermvectors/10_basic/Tests catching other exceptions per item} elastic#122414
Fixes a bug during field loading where we could double-close blocks if
we failed to allocate memory during the un-shuffling portion of field
loading from single segments.
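
As a rough illustration of this class of bug, the sketch below shows one common way to guarantee that partially built resources are released exactly once when an allocation fails midway. All names (`CloseOnceSketch`, `Block`, `build`) are hypothetical and the failure is simulated; this is not the actual field-loading code.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch; the real block and field-loading classes differ.
final class CloseOnceSketch {

    /** A resource that detects double-close instead of silently tolerating it. */
    static final class Block implements AutoCloseable {
        private boolean closed;

        @Override
        public void close() {
            if (closed) {
                throw new IllegalStateException("block closed twice");
            }
            closed = true;
        }

        boolean isClosed() {
            return closed;
        }
    }

    /**
     * Builds {@code count} blocks. If the simulated allocation fails at index
     * {@code failAt}, every block built so far is released exactly once.
     * {@code allocated} only records what was created so callers can inspect it.
     */
    static List<Block> build(int count, int failAt, List<Block> allocated) {
        List<Block> built = new ArrayList<>();
        boolean success = false;
        try {
            for (int i = 0; i < count; i++) {
                if (i == failAt) {
                    throw new RuntimeException("simulated allocation failure");
                }
                Block block = new Block();
                allocated.add(block);
                built.add(block);
            }
            success = true;
            return built;
        } finally {
            if (success == false) {
                for (Block block : built) {
                    block.close();
                }
                // Drop the references so no later cleanup path can close them again.
                built.clear();
            }
        }
    }
}
```

The single `success` flag keeps ownership unambiguous: either the caller receives the blocks and must close them, or the builder closes them and nobody else does.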

Unit test incoming in the followup.

Closes elastic#130426
Closes elastic#130790
Closes elastic#130791
Closes elastic#130792
Closes elastic#130793
Closes elastic#130270
Closes elastic#130788
Closes elastic#130122
Closes elastic#130827
* Adding embedding type

* Adding more tests and cleaning up
For most of the usages of these methods, it made more sense to return a `ProjectMetadata` instead of a `ClusterState`.
We also don't need to specify a specific project ID; generating a random one inside the helper method saves some boilerplate code.
We should not build the sorted structure for the ordinal grouping 
operator if the requested position is larger than maxGroupId. This
situation occurs with nulls. We should benchmark the ordinal blocks and
consider removing the ordinal grouping operator if performance is
similar; otherwise, we need to integrate this operator with
GroupingAggregatorFunctionTestCase.

Relates elastic#130576
… instead of interacting with doc values api directly. (elastic#130854)

This pulls elastic#130845 into the serverless fix branch for patch deployment.  Original description:

Change match_only_text's value fetcher to use SortedBinaryDocValues instead of interacting with doc values api directly.

This way, via field data abstraction, the right doc values type is used, and the right conversions happen. Values of all field types will get converted to strings.

Co-authored-by: Martijn van Groningen <[email protected]>
This change modifies reindex behavior to always include vector fields, even if the target index omits embeddings from _source.
This prepares for scenarios where embeddings may be automatically excluded (elastic#130382).
smalyshev and others added 28 commits July 16, 2025 13:09
* Put shards failure under a cap flag
…DisruptionIT testDataStreamLifecycleDownsampleRollingRestart elastic#131394
With the ordinal grouping operator removed in elastic#131133, this PR removes 
the corresponding code path in the grouping aggregator function, as it
is no longer needed.

Relates elastic#131133
The new attribute generated by MV_EXPAND should remain in the original position. The projection added by ProjectAwayColumns does not respect the original order of attributes.

Make ProjectAwayColumns respect the order of attributes to fix this.
* ES|QL categorize options

* refactor options

* fix serialization

* polish

* add verifications

* better test coverage + polish code

* better test coverage + polish code
This PR migrates legacy rest tests in the x-pack autoscaling module
It's already part of the path parts, so duplicating it in query
parameters is not useful.
* Add Azure AI Rerank support

* address comments

* address comments

* refactor azure ai studio service

* update rerank task settings test

* add provider for rerank
Adds the `includeDiskInfo` parameter to the `cluster/allocation/explain`
`toString()` method, and adds tests.
Also add test to ensure the file has at least one entry for each region
so that it is easy to spot missing regions in future upgrades.

Relates: elastic#131050 Resolves: elastic#131392
* Refactoring google gemini streaming error handling

* Updating comments
* To prevent an implicit grant-all if storing node homes inside the Java temp dir, the temporary folder of ESTestCase is configured separately from the Java temp dir in internalClusterTests (by means of the system property tempDir, see TestRuleTemporaryFilesCleanup)

* Move ReloadingDatabasesWhilePerformingGeoLookupsIT from internalClusterTest to test, file permissions in internalClusterTest are stricter on the lucene tempDir
Correct response which had swapped "skipped" and "failed" shard counts.
* fix boosting for knn

* Fixing for match query

* fixing for match subquery

* fix for sparse vector query boost

* fix linting issues

* Update docs/changelog/129282.yaml

* update changelog

* Copy constructor with match query

* util function to create sparseVectorBuilder for sparse query

* util function for knn query to support boost

* adding unit tests for all intercepted query terms

* Adding yaml test for match,sparse, and knn

* Adding queryname support for nested query

* fix code styles

* Fix failed yaml tests

* Update docs/changelog/129282.yaml

* update yaml tests to expand test scenarios

* Updating knn to copy constructor

* adding yaml tests for multiple indices

* refactoring match query to adjust boost and queryname and move to copy constructor

* refactoring sparse query to adjust boost and queryname and move to copy constructor

* [CI] Auto commit changes from spotless

* Refactor sparse vector to adjust boost and queryname in the top level

* Refactor knn vector to adjust boost and queryname in the top level

* fix knn combined query

* fix unit tests

* fix lint issues

* remove unused code

* Update inference feature name

* Remove double boosting issue from match

* Fix double boosting in match test yaml file

* move to bool level for match semantic boost

* fix double boosting for sparse vector

* fix double boosting for sparse vector in yaml test

* fix knn combined query

* fix knn combined query

* fix sparse combined query

* fix knn yaml test for combined query

* refactoring unit tests

* linting

* fix match query unit test

* adding copy constructor for match query

* refactor copy match builder to interceptor

* [CI] Auto commit changes from spotless

* fix unit tests

* update yaml tests

* fix match yaml test

* fix yaml tests with 4 digits error margin

* unit tests are now more randomized

---------

Co-authored-by: Elastic Machine <[email protected]>
Co-authored-by: elasticsearchmachine <[email protected]>
When the Trained Model has been deployed through the Inference Endpoint
API, it can only be updated using the Inference Endpoint API.

When the Trained Model has been deployed and then attached to an
Inference Endpoint, it can only be updated using the Trained Model API.

Fix elastic#129999

Co-authored-by: elasticsearchmachine <[email protected]>
Co-authored-by: David Kyle <[email protected]>
In elastic#131314 we fixed match_only_text fields with ignore_above keyword
multi-fields in the case that the keyword multi-field is stored. However,
the issue is still present if the keyword field is not stored, but instead
has doc values.

This patch fixes that case.
Although blocks/vectors are immutable and safe to share between threads, 
their references are currently not thread-safe, which can lead to data 
races. Previously, blocks/vectors were exclusively owned by a single
thread, but this is no longer always the case with InlineJoin. We should
consider switching to AbstractRefCounted, which is thread-safe, and
benchmark it with many-fields use cases to ensure there is no
performance regression. As a temporary solution, this change clones the
values block in InlineJoin until thread-safe blocks/vectors are
available.
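
For context on why a plain counter is unsafe once references are shared across threads, here is a minimal sketch of an atomic reference counter. It is an assumption-laden illustration (the class `AtomicRefCountSketch` and its methods are hypothetical) and is not the `AbstractRefCounted` implementation or the block/vector code.

```java
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical sketch of thread-safe reference counting; not Elasticsearch code.
final class AtomicRefCountSketch {
    private final AtomicInteger refs = new AtomicInteger(1); // the creator holds one reference
    private final AtomicInteger closeCalls = new AtomicInteger();

    /** Takes an additional reference; fails if the resource was already released. */
    void incRef() {
        int prev;
        do {
            prev = refs.get();
            if (prev <= 0) {
                throw new IllegalStateException("already released");
            }
        } while (refs.compareAndSet(prev, prev + 1) == false);
    }

    /** Drops a reference; returns true for the single caller that released the last one. */
    boolean decRef() {
        int remaining = refs.decrementAndGet();
        if (remaining < 0) {
            throw new IllegalStateException("released too many times");
        }
        if (remaining == 0) {
            closeCalls.incrementAndGet(); // the underlying memory would be freed here
            return true;
        }
        return false;
    }

    int refCount() {
        return refs.get();
    }

    int closeCalls() {
        return closeCalls.get();
    }

    /** Runs incRef/decRef pairs from several threads and returns the shared counter. */
    static AtomicRefCountSketch hammer(int threads, int iterations) {
        AtomicRefCountSketch rc = new AtomicRefCountSketch();
        Thread[] workers = new Thread[threads];
        for (int t = 0; t < threads; t++) {
            workers[t] = new Thread(() -> {
                for (int i = 0; i < iterations; i++) {
                    rc.incRef();
                    rc.decRef();
                }
            });
            workers[t].start();
        }
        for (Thread worker : workers) {
            try {
                worker.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException(e);
            }
        }
        return rc;
    }
}
```

With a non-atomic `int` field, concurrent increments and decrements can be lost, so the count may reach zero too early or never. The atomic version costs a compare-and-set per operation, which is the overhead the commit message proposes to benchmark.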
…129108)

This commit adds support for implicit casting of aggregate_metric_double
when present with other numerics for a limited set of aggregation
functions:
- Max / MaxOverTime
- Min / MinOverTime
- Sum / SumOverTime
- Count / CountOverTime
- Avg / AvgOverTime

Attempting to use fields mapped to aggregate_metric_double in one index
but some other numeric in another index in any other context will still
require explicit casting with ToAggregateMetricDouble
I accidentally broke recall on flush by allowing vectors to be double
quantized. Additionally, we shouldn't use the first vector as a
centroid, because this can harm recall significantly when there is just
one centroid.
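
To make the centroid point concrete, the sketch below compares an arbitrary first vector with the arithmetic mean as the single representative of a set of vectors. It assumes the mean is the intended replacement, which the commit message does not state; the class and method names are hypothetical and this is not the IVF implementation.

```java
// Hypothetical sketch; assumes a mean centroid, which the source does not confirm.
final class CentroidSketch {

    /** Component-wise arithmetic mean of the given vectors. */
    static float[] meanCentroid(float[][] vectors) {
        if (vectors.length == 0) {
            throw new IllegalArgumentException("no vectors");
        }
        int dims = vectors[0].length;
        double[] sum = new double[dims];
        for (float[] vector : vectors) {
            for (int d = 0; d < dims; d++) {
                sum[d] += vector[d];
            }
        }
        float[] centroid = new float[dims];
        for (int d = 0; d < dims; d++) {
            centroid[d] = (float) (sum[d] / vectors.length);
        }
        return centroid;
    }

    /** Sum of squared Euclidean distances from every vector to the candidate centroid. */
    static double totalSquaredDistance(float[][] vectors, float[] centroid) {
        double total = 0;
        for (float[] vector : vectors) {
            for (int d = 0; d < centroid.length; d++) {
                double diff = vector[d] - centroid[d];
                total += diff * diff;
            }
        }
        return total;
    }

    public static void main(String[] args) {
        float[][] vectors = { { 0f, 0f }, { 2f, 0f }, { 4f, 6f } };
        float[] mean = meanCentroid(vectors);
        System.out.println(totalSquaredDistance(vectors, vectors[0])); // 56.0
        System.out.println(totalSquaredDistance(vectors, mean));       // 32.0
    }
}
```

The mean minimizes the total squared distance to the vectors, so residuals quantized relative to it tend to be smaller than residuals relative to an arbitrary member of the set.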

Recall before this change:

```
index_name                             index_type  num_docs  index_time(ms)  force_merge_time(ms)  num_segments
-------------------------------------  ----------  --------  --------------  --------------------  ------------
corpus-dbpedia-entity-E5-small-0.fvec         ivf   1000000           25820                     0            14
corpus-dbpedia-entity-E5-small-0.fvec         ivf   1000000               0                 41693             0

index_name                             index_type  n_probe  latency(ms)  net_cpu_time(ms)  avg_cpu_count     QPS  recall    visited  filter_selectivity
-------------------------------------  ----------  -------  -----------  ----------------  -------------  ------  ------  ---------  ------------------
corpus-dbpedia-entity-E5-small-0.fvec         ivf       50        13.05              0.00           0.00   76.61    0.63  285267.44                1.00
corpus-dbpedia-entity-E5-small-0.fvec         ivf      150        31.92              0.00           0.00   31.33    0.68  629033.22                1.00
corpus-dbpedia-entity-E5-small-0.fvec         ivf      200        34.79              0.00           0.00   28.74    0.69  679699.13                1.00
corpus-dbpedia-entity-E5-small-0.fvec         ivf      500        39.40              0.00           0.00   25.38    0.71  794375.05                1.00
corpus-dbpedia-entity-E5-small-0.fvec         ivf     1000        45.99              0.00           0.00   21.74    0.72  940493.52                1.00
corpus-dbpedia-entity-E5-small-0.fvec         ivf       50         1.52              0.00           0.00  655.74    0.74   24201.82                1.00
corpus-dbpedia-entity-E5-small-0.fvec         ivf      150         2.94              0.00           0.00  340.43    0.85   67943.31                1.00
corpus-dbpedia-entity-E5-small-0.fvec         ivf      200         3.81              0.00           0.00  262.81    0.87   89575.99                1.00
corpus-dbpedia-entity-E5-small-0.fvec         ivf      500         7.67              0.00           0.00  130.38    0.93  213586.44                1.00
corpus-dbpedia-entity-E5-small-0.fvec         ivf     1000        14.85              0.00           0.00   67.33    0.96  402628.11                1.00
```

With this fix:

```
index_name                             index_type  num_docs  index_time(ms)  force_merge_time(ms)  num_segments
-------------------------------------  ----------  --------  --------------  --------------------  ------------
corpus-dbpedia-entity-E5-small-0.fvec         ivf   1000000           25304                     0            15
corpus-dbpedia-entity-E5-small-0.fvec         ivf   1000000               0                 42110             0

index_name                             index_type  n_probe  latency(ms)  net_cpu_time(ms)  avg_cpu_count     QPS  recall    visited  filter_selectivity
-------------------------------------  ----------  -------  -----------  ----------------  -------------  ------  ------  ---------  ------------------
corpus-dbpedia-entity-E5-small-0.fvec         ivf       50        12.63              0.00           0.00   79.18    0.89  285527.22                1.00
corpus-dbpedia-entity-E5-small-0.fvec         ivf      150        32.49              0.00           0.00   30.77    0.94  619783.37                1.00
corpus-dbpedia-entity-E5-small-0.fvec         ivf      200        35.46              0.00           0.00   28.20    0.95  667903.47                1.00
corpus-dbpedia-entity-E5-small-0.fvec         ivf      500        40.38              0.00           0.00   24.76    0.97  781959.74                1.00
corpus-dbpedia-entity-E5-small-0.fvec         ivf     1000        48.62              0.00           0.00   20.57    0.98  931017.40                1.00
corpus-dbpedia-entity-E5-small-0.fvec         ivf       50         1.55              0.00           0.00  643.09    0.74   23595.57                1.00
corpus-dbpedia-entity-E5-small-0.fvec         ivf      150         2.98              0.00           0.00  335.29    0.85   66299.43                1.00
corpus-dbpedia-entity-E5-small-0.fvec         ivf      200         3.81              0.00           0.00  262.64    0.87   87416.15                1.00
corpus-dbpedia-entity-E5-small-0.fvec         ivf      500         8.80              0.00           0.00  113.64    0.93  209061.37                1.00
corpus-dbpedia-entity-E5-small-0.fvec         ivf     1000        16.18              0.00           0.00   61.81    0.96  394906.29                1.00
```
This concept is complicated.

Closes elastic#128991

Co-authored-by: Larisa Motova <[email protected]>
Co-authored-by: Liam Thompson <[email protected]>
@seanstory seanstory closed this Jul 17, 2025