Seanstory/increase mapping field meta char limit #3

seanstory · 2025-07-17T20:11:02Z


We specify the master node timeout from the REST request to avoid waiting for the task indefinitely. Resolves elastic#120389

These test failures looked like infra/CI blips to me. Closes elastic#124518

…lastic#128917) Part of elastic#124715 and similar to elastic#128476. Different from elastic#128476 in that it takes a "LogicalPlan" approach to running a sub-query, integrating its result back in the "main" LogicalPlan and continuing running the query.

* Update [email protected] * Update resources.yaml * fix: explicitly map system.process.cpu.start_time to date * Update [email protected] * Update [email protected] * Update [email protected]

…ic#129684) In a follow up (elastic#128993) remaining lenient usage of booleans will be deprecated, to eventually remove everything except for a few places requiring lenient parsing by means of Booleans.parseBooleanLenient - which is a wrapper around Boolean.parseBoolean. --------- Co-authored-by: Moritz Mack <[email protected]>
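To illustrate why lenient boolean parsing is being phased out, here is a minimal sketch contrasting `Boolean.parseBoolean` (the JDK method the commit message says is wrapped) with a strict alternative. The class `BooleanParsingSketch` and its `strict` method are hypothetical names used only for this example; they are not the Elasticsearch `Booleans` API.

```java
/** Hypothetical illustration; not the Elasticsearch Booleans utility. */
final class BooleanParsingSketch {

    /** Lenient: anything that is not "true" (ignoring case) silently becomes false. */
    static boolean lenient(String value) {
        return Boolean.parseBoolean(value);
    }

    /** Strict (assumed behavior for this sketch): only "true" and "false" are accepted. */
    static boolean strict(String value) {
        if ("true".equals(value)) {
            return true;
        }
        if ("false".equals(value)) {
            return false;
        }
        throw new IllegalArgumentException("Failed to parse value [" + value + "] as boolean");
    }

    public static void main(String[] args) {
        // A typo such as "ture" is silently treated as false by the lenient parser.
        System.out.println(lenient("ture"));
        // The strict parser surfaces the same typo as an error.
        try {
            strict("ture");
        } catch (IllegalArgumentException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

The lenient path hides configuration mistakes, which is presumably why the commit restricts it to the few places that require it.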

This action needs only the cluster state, so it can run on any node. Since this action is invoked across clusters, we need to be able to (de)serialize requests and responses. We introduce a new `RemoteClusterStateRequest` that wraps the existing `ClusterStateRequest` and implements (de)serialization.

…ic#130716) Remove threat detection example

Add verification for LocalLogical plan. The verification is skipped if there is remote enrich, similar to how it is skipped for LocalPhysical plan optimization. The skip only happens for LocalLogical and LocalPhysical plan optimizers.

* Add filtering for kNN vector indexer test scenarios * [CI] Auto commit changes from spotless --------- Co-authored-by: elasticsearchmachine <[email protected]>

elastic#130827

Cleanup tracing header name constants

This commit fixes the Int7uScorerBenchmarkTests for running on Java 21, since scoring with heap segments is only supported on Java 22 and greater.

…st {p0=mtermvectors/10_basic/Tests catching other exceptions per item} elastic#122414

Unmute yaml test fixed by elastic#130732 Closes elastic#130626, elastic#130661

Fixes a bug during field loading where we could double-close blocks if we failed to allocate memory during the un-shuffling portion of field loading from single segments. Unit test incoming in the followup. Closes elastic#130426 Closes elastic#130790 Closes elastic#130791 Closes elastic#130792 Closes elastic#130793 Closes elastic#130270 Closes elastic#130788 Closes elastic#130122 Closes elastic#130827

* Adding embedding type * Adding more tests and cleaning up

…elastic#130853

For most of the usages of these methods, it made more sense to return a `ProjectMetadata` instead of a `ClusterState`. We also don't need to specify a specific project ID; generating a random one inside the helper method saves some boilerplate code.

We should not build the sorted structure for the ordinal grouping operator if the requested position is larger than maxGroupId. This situation occurs with nulls. We should benchmark the ordinal blocks and consider removing the ordinal grouping operator if performance is similar; otherwise, we need to integrate this operator with GroupingAggregatorFunctionTestCase. Relates elastic#130576

… instead of interacting with doc values api directly. (elastic#130854) This pulls elastic#130845 into the serverless fix branch for patch deployment. Original description: Change match_only_text's value fetcher to use SortedBinaryDocValues instead of interacting with doc values api directly. This way, via field data abstraction, the right doc values type is used, and the right conversions happen. Values of all field types will get converted to strings. Co-authored-by: Martijn van Groningen <[email protected]>

…ializationPreMultiProject elastic#130872

…stic#130474)

This change modifies reindex behavior to always include vector fields, even if the target index omits embeddings from _source. This prepares for scenarios where embeddings may be automatically excluded (elastic#130382).

* Put shards failure under a cap flag

…DisruptionIT testDataStreamLifecycleDownsampleRollingRestart elastic#131394

With the ordinal grouping operator removed in elastic#131133, this PR removes the corresponding code path in the grouping aggregator function, as it is no longer needed. Relates elastic#131133

…rentUserAndGroup elastic#131412

The new attribute generated by MV_EXPAND should remain in the original position. The projection added by ProjectAwayColumns does not respect the original order of attributes. Make ProjectAwayColumns respect the order of attributes to fix this.

* ES|QL categorize options * refactor options * fix serialization * polish * add verifications * better test coverage + polish code * better test coverage + polish code

This PR migrates legacy rest tests in the x-pack autoscaling module

It's already part of the path parts, so duplicating it in query parameters is not useful.

…lastic#131411)

* Add Azure AI Rerank support * address comments * address comments * refactor azure ai studio service * update rerank task settings test * add provider for rerank

Adds the `includeDiskInfo` parameter to the `cluster/allocation/explain` `toString()` method, and adds tests.

Also add test to ensure the file has at least one entry for each region so that it is easy to spot missing regions in future upgrades. Relates: elastic#131050 Resolves: elastic#131392

* Refactoring google gemini streaming error handling * Updating comments

* To prevent an implicit grant-all if storing node homes inside the Java temp dir, the temporary folder of ESTestCase is configured separately from the Java temp dir in internalClusterTests (by means of the system property tempDir, see TestRuleTemporaryFilesCleanup) * Move ReloadingDatabasesWhilePerformingGeoLookupsIT from internalClusterTest to test, file permissions in internalClusterTest are stricter on the lucene tempDir

Correct response which had swapped "skipped" and "failed" shard counts.

…the centroids file (elastic#131421)

* fix boosting for knn * Fixing for match query * fixing for match subquery * fix for sparse vector query boost * fix linting issues * Update docs/changelog/129282.yaml * update changelog * Copy constructor with match query * util function to create sparseVectorBuilder for sparse query * util function for knn query to support boost * adding unit tests for all intercepted query terms * Adding yaml test for match,sparse, and knn * Adding queryname support for nested query * fix code styles * Fix failed yaml tests * Update docs/changelog/129282.yaml * update yaml tests to expand test scenarios * Updating knn to copy constructor * adding yaml tests for multiple indices * refactoring match query to adjust boost and queryname and move to copy constructor * refactoring sparse query to adjust boost and queryname and move to copy constructor * [CI] Auto commit changes from spotless * Refactor sparse vector to adjust boost and queryname in the top level * Refactor knn vector to adjust boost and queryname in the top level * fix knn combined query * fix unit tests * fix lint issues * remove unused code * Update inference feature name * Remove double boosting issue from match * Fix double boosting in match test yaml file * move to bool level for match semantic boost * fix double boosting for sparse vector * fix double boosting for sparse vector in yaml test * fix knn combined query * fix knn combined query * fix sparse combined query * fix knn yaml test for combined query * refactoring unit tests * linting * fix match query unit test * adding copy constructor for match query * refactor copy match builder to intercepter * [CI] Auto commit changes from spotless * fix unit tests * update yaml tests * fix match yaml test * fix yaml tests with 4 digits error margin * unit tests are now more randomized --------- Co-authored-by: Elastic Machine <[email protected]> Co-authored-by: elasticsearchmachine <[email protected]>

When the Trained Model has been deployed through the Inference Endpoint API, it can only be updated using the Inference Endpoint API. When the Trained Model has been deployed and then attached to an Inference Endpoint, it can only be updated using the Trained Model API. Fix elastic#129999 Co-authored-by: elasticsearchmachine <[email protected]> Co-authored-by: David Kyle <[email protected]>

In elastic#131314 we fixed match_only_text fields with ignore_above keyword multi-fields in the case that the keyword multi-field is stored. However, the issue is still present if the keyword field is not stored, but instead has doc values. This patch fixes that case.

…stic#131296)

Although blocks/vectors are immutable and safe to share between threads, their references are currently not thread-safe, which can lead to data races. Previously, blocks/vectors were exclusively owned by a single thread, but this is no longer always the case with InlineJoin. We should consider switching to AbstractRefCounted, which is thread-safe, and benchmark it with many-fields use cases to ensure there is no performance regression. As a temporary solution, this change clones the values block in InlineJoin until thread-safe blocks/vectors are available.
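As a rough illustration of what thread-safe reference counting looks like, the sketch below uses an `AtomicInteger` with a compare-and-set loop. `RefCountSketch` and its method names are hypothetical and written only for this example; this is not the `AbstractRefCounted` class mentioned above, and not the code in this change (which clones the values block instead).

```java
import java.util.concurrent.atomic.AtomicInteger;

/** Hypothetical sketch of a thread-safe reference counter; not Elasticsearch code. */
final class RefCountSketch {
    // Starts at 1: the creator owns the first reference.
    private final AtomicInteger refCount = new AtomicInteger(1);
    private final Runnable onRelease;

    RefCountSketch(Runnable onRelease) {
        this.onRelease = onRelease;
    }

    /** Takes a reference unless the resource has already been released. */
    boolean tryIncRef() {
        while (true) {
            int current = refCount.get();
            if (current <= 0) {
                return false; // already released, must not be revived
            }
            if (refCount.compareAndSet(current, current + 1)) {
                return true;
            }
            // Another thread changed the count; retry with the fresh value.
        }
    }

    /** Drops a reference; returns true if this call released the resource. */
    boolean decRef() {
        int remaining = refCount.decrementAndGet();
        if (remaining < 0) {
            throw new IllegalStateException("released more often than acquired");
        }
        if (remaining == 0) {
            onRelease.run();
            return true;
        }
        return false;
    }

    public static void main(String[] args) throws InterruptedException {
        AtomicInteger releases = new AtomicInteger();
        RefCountSketch block = new RefCountSketch(releases::incrementAndGet);
        Thread[] workers = new Thread[4];
        for (int i = 0; i < workers.length; i++) {
            workers[i] = new Thread(() -> {
                for (int j = 0; j < 10_000; j++) {
                    if (block.tryIncRef()) {
                        block.decRef();
                    }
                }
            });
            workers[i].start();
        }
        for (Thread worker : workers) {
            worker.join();
        }
        block.decRef(); // the owner drops the last reference
        System.out.println("releases = " + releases.get()); // prints "releases = 1"
    }
}
```

The atomic operations add a cost on every acquire and release, which is the overhead the commit message proposes to benchmark on many-fields use cases.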

…121914)" (elastic#131452) This reverts commit a6f0f6f.

…129108) This commit adds support for implicit casting of aggregate_metric_double when present with other numerics for a limited set of aggregation functions:
- Max / MaxOverTime
- Min / MinOverTime
- Sum / SumOverTime
- Count / CountOverTime
- Avg / AvgOverTime

Attempting to use fields mapped to aggregate_metric_double in one index but some other numeric in another index in any other context will still require explicit casting with ToAggregateMetricDouble.

I accidentally broke recall on flush by allowing vectors to be double quantized. Additionally, we shouldn't use the first vector as a centroid; this can harm recall significantly when there is just one centroid.

Recall before this change:

```
index_name                             index_type  num_docs  index_time(ms)  force_merge_time(ms)  num_segments
-------------------------------------  ----------  --------  --------------  --------------------  ------------
corpus-dbpedia-entity-E5-small-0.fvec  ivf          1000000           25820                     0            14
corpus-dbpedia-entity-E5-small-0.fvec  ivf          1000000               0                 41693             0

index_name                             index_type  n_probe  latency(ms)  net_cpu_time(ms)  avg_cpu_count     QPS  recall    visited  filter_selectivity
-------------------------------------  ----------  -------  -----------  ----------------  -------------  ------  ------  ---------  ------------------
corpus-dbpedia-entity-E5-small-0.fvec  ivf              50        13.05              0.00           0.00   76.61    0.63  285267.44                1.00
corpus-dbpedia-entity-E5-small-0.fvec  ivf             150        31.92              0.00           0.00   31.33    0.68  629033.22                1.00
corpus-dbpedia-entity-E5-small-0.fvec  ivf             200        34.79              0.00           0.00   28.74    0.69  679699.13                1.00
corpus-dbpedia-entity-E5-small-0.fvec  ivf             500        39.40              0.00           0.00   25.38    0.71  794375.05                1.00
corpus-dbpedia-entity-E5-small-0.fvec  ivf            1000        45.99              0.00           0.00   21.74    0.72  940493.52                1.00
corpus-dbpedia-entity-E5-small-0.fvec  ivf              50         1.52              0.00           0.00  655.74    0.74   24201.82                1.00
corpus-dbpedia-entity-E5-small-0.fvec  ivf             150         2.94              0.00           0.00  340.43    0.85   67943.31                1.00
corpus-dbpedia-entity-E5-small-0.fvec  ivf             200         3.81              0.00           0.00  262.81    0.87   89575.99                1.00
corpus-dbpedia-entity-E5-small-0.fvec  ivf             500         7.67              0.00           0.00  130.38    0.93  213586.44                1.00
corpus-dbpedia-entity-E5-small-0.fvec  ivf            1000        14.85              0.00           0.00   67.33    0.96  402628.11                1.00
```

With this fix:

```
index_name                             index_type  num_docs  index_time(ms)  force_merge_time(ms)  num_segments
-------------------------------------  ----------  --------  --------------  --------------------  ------------
corpus-dbpedia-entity-E5-small-0.fvec  ivf          1000000           25304                     0            15
corpus-dbpedia-entity-E5-small-0.fvec  ivf          1000000               0                 42110             0

index_name                             index_type  n_probe  latency(ms)  net_cpu_time(ms)  avg_cpu_count     QPS  recall    visited  filter_selectivity
-------------------------------------  ----------  -------  -----------  ----------------  -------------  ------  ------  ---------  ------------------
corpus-dbpedia-entity-E5-small-0.fvec  ivf              50        12.63              0.00           0.00   79.18    0.89  285527.22                1.00
corpus-dbpedia-entity-E5-small-0.fvec  ivf             150        32.49              0.00           0.00   30.77    0.94  619783.37                1.00
corpus-dbpedia-entity-E5-small-0.fvec  ivf             200        35.46              0.00           0.00   28.20    0.95  667903.47                1.00
corpus-dbpedia-entity-E5-small-0.fvec  ivf             500        40.38              0.00           0.00   24.76    0.97  781959.74                1.00
corpus-dbpedia-entity-E5-small-0.fvec  ivf            1000        48.62              0.00           0.00   20.57    0.98  931017.40                1.00
corpus-dbpedia-entity-E5-small-0.fvec  ivf              50         1.55              0.00           0.00  643.09    0.74   23595.57                1.00
corpus-dbpedia-entity-E5-small-0.fvec  ivf             150         2.98              0.00           0.00  335.29    0.85   66299.43                1.00
corpus-dbpedia-entity-E5-small-0.fvec  ivf             200         3.81              0.00           0.00  262.64    0.87   87416.15                1.00
corpus-dbpedia-entity-E5-small-0.fvec  ivf             500         8.80              0.00           0.00  113.64    0.93  209061.37                1.00
corpus-dbpedia-entity-E5-small-0.fvec  ivf            1000        16.18              0.00           0.00   61.81    0.96  394906.29                1.00
```

This concept is complicated. Closes elastic#128991 Co-authored-by: Larisa Motova <[email protected]> Co-authored-by: Liam Thompson <[email protected]>

github-actions · 2025-07-17T20:11:15Z

Documentation preview:

✨ Changed pages

mosche and others added 30 commits July 8, 2025 10:42

Disable entitlements for terminal & command tests (elastic#130690)

12accdc

Specify master timeout when submitting alias tasks (elastic#130733)

bdd6f51

We specify the master node timeout from the REST request to avoid waiting for the task indefinitely. Resolves elastic#120389

ES|QL: RRF is replaced by FUSE (elastic#130693)

8f21ade

Unmute elastic#124518 (elastic#130759)

3025f6c

These test failures looked like infra/CI blips to me. Closes elastic#124518

fix: enable date_detection for all apm data streams (elastic#130466)

a16822d

* Update [email protected] * Update resources.yaml * fix: explicitly map system.process.cpu.start_time to date * Update [email protected] * Update [email protected] * Update [email protected]

[DOCS]: Remove Example: Detect threats with EQL from reference (elast…

251479a

…ic#130716) Remove threat detection example

Entitle com.unboundid.ldap.listener as test package (elastic#130706)

fda7e56

Add filtering for kNN vector indexer test scenarios (elastic#130751)

c7c5d4a

* Add filtering for kNN vector indexer test scenarios * [CI] Auto commit changes from spotless --------- Co-authored-by: elasticsearchmachine <[email protected]>

Mute org.elasticsearch.xpack.esql.action.EnrichIT testFilterAfterEnrich

cbd8e41

elastic#130827

Cleanup tracing header name constants (elastic#130800)

83c0c12

Cleanup tracing header name constants

Use UTF-8 optimized read for index routing (elastic#130786)

415543d

Fix Int7uScorerBenchmarkTests for running on Java 21 (elastic#130731)

aa7372e

This commit fixes the Int7uScorerBenchmarkTests for running on Java 21, since scoring with heap segments is only supported on Java 22 and greater.

Mute org.elasticsearch.backwards.MixedClusterClientYamlTestSuiteIT te…

b16007b

…st {p0=mtermvectors/10_basic/Tests catching other exceptions per item} elastic#122414

Fix test setup to properly free context (elastic#130787)

2ed5fd0

Unmute yaml test (elastic#130803)

d443674

Unmute yaml test fixed by elastic#130732 Closes elastic#130626, elastic#130661

Simplified Linear and RRF Retrievers Docs (elastic#130559)

5e2f154

[ML] Custom Service add embedding type support (elastic#130141)

172637b

* Adding embedding type * Adding more tests and cleaning up

Mute org.elasticsearch.xpack.slm.SLMFileSettingsIT testSettingsApplied …

8f0e1f7

…elastic#130853

Fix Sparse Vector Query Interceptor Double Filtering (elastic#130829)

ad0c2b1

Mute org.elasticsearch.cluster.ClusterStateSerializationTests testSer…

f31bd69

…ializationPreMultiProject elastic#130872

Mute tests in SSLErrorMessageFileTests that rely on Entitlements (ela…

950e129

…stic#130474)

smalyshev and others added 28 commits July 16, 2025 13:09

Put shards failure under a cap flag (elastic#131371)

0411940

* Put shards failure under a cap flag

Mute org.elasticsearch.xpack.downsample.DataStreamLifecycleDownsample…

6832ca4

…DisruptionIT testDataStreamLifecycleDownsampleRollingRestart elastic#131394

[DOCS] Augment self-managed connector tutorials (elastic#131127)

f135998

Remove ordinal grouping path in aggregations (elastic#131307)

efd3110

With the ordinal grouping operator removed in elastic#131133, this PR removes the corresponding code path in the grouping aggregator function, as it is no longer needed. Relates elastic#131133

Mute org.elasticsearch.packaging.test.DockerTests test072RunEsAsDiffe…

adcb2a5

…rentUserAndGroup elastic#131412

ES|QL categorize options (elastic#131104)

ec7f77b

* ES|QL categorize options * refactor options * fix serialization * polish * add verifications * better test coverage + polish code * better test coverage + polish code

Migrate x-pack-autoscaling REST tests (elastic#131365)

32e50d0

This PR migrates legacy rest tests in the x-pack autoscaling module

Remove 'index' from snapshot clear_cache query params (elastic#131067)

8fb9fb5

It's already part of the path parts, so duplicating it in query parameters is not useful.

[DiskBBQ] Use PackedLongValues to hold offsets on heap while writing (e…

f9eee6c

…lastic#131411)

Add Azure AI Rerank support (elastic#129848)

d06b0c8

* Add Azure AI Rerank support * address comments * address comments * refactor azure ai studio service * update rerank task settings test * add provider for rerank

Add includeDiskInfo to toString() (elastic#131358)

df985e6

Adds the `includeDiskInfo` parameter to the `cluster/allocation/explain` `toString()` method, and adds tests.

Update regions_by_endpoint for AWS sdk upgrade. (elastic#131400)

fd971e8

Also add test to ensure the file has at least one entry for each region so that it is easy to spot missing regions in future upgrades. Relates: elastic#131050 Resolves: elastic#131392

[ML] Refactoring streaming error handling (elastic#131316)

3b1523a

* Refactoring google gemini streaming error handling * Updating comments

Fix bug in point in time response (elastic#131391)

f739673

Correct response which had swapped "skipped" and "failed" shard counts.

[DiskBBQ] Write the raw centroid on the posting list file instead of …

628828f

…the centroids file (elastic#131421)

apm-data: enable failure store for newly created APM datastreams (ela…

280793d

…stic#131296)

Revert "Support Fields API in conditional ingest processors (elastic#…

221998d

…121914)" (elastic#131452) This reverts commit a6f0f6f.

Split retrievers docs and redirect anchors (elastic#131385)

56477d8

Explain ignore_above better (elastic#129284)

6ed50e1

This concept is complicated. Closes elastic#128991 Co-authored-by: Larisa Motova <[email protected]> Co-authored-by: Liam Thompson <[email protected]>

Naively increase the meta field char limit 50->500

95e0a9d
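The commit title describes raising the length limit on mapping field `meta` values from 50 to 500 characters. A minimal sketch of what such a check might look like follows. `MetaLimitSketch`, `MAX_META_VALUE_LENGTH`, and `validateMeta` are hypothetical names for illustration; counting length in Unicode code points is an assumption of this sketch, not something the commit states.

```java
import java.util.Map;

/** Hypothetical sketch of a meta-value length check; not the Elasticsearch implementation. */
final class MetaLimitSketch {
    // The limit after this commit; the previous value was 50.
    static final int MAX_META_VALUE_LENGTH = 500;

    /** Throws if any meta value exceeds the limit (length counted in code points, by assumption). */
    static void validateMeta(Map<String, String> meta) {
        for (Map.Entry<String, String> entry : meta.entrySet()) {
            String value = entry.getValue();
            int length = value.codePointCount(0, value.length());
            if (length > MAX_META_VALUE_LENGTH) {
                throw new IllegalArgumentException(
                    "[meta] value for key [" + entry.getKey() + "] has length " + length
                        + ", which exceeds the limit of " + MAX_META_VALUE_LENGTH
                );
            }
        }
    }

    public static void main(String[] args) {
        // A 120-character description would fail a 50-character limit but passes at 500.
        validateMeta(Map.of("description", "d".repeat(120)));
        try {
            validateMeta(Map.of("description", "d".repeat(501)));
        } catch (IllegalArgumentException e) {
            System.out.println(e.getMessage());
        }
    }
}
```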

seanstory closed this Jul 17, 2025
