@parkertimmins
Contributor

@parkertimmins parkertimmins commented May 7, 2025

Field types that parse arrays themselves should not need to store values in _ignored_source when synthetic_source_keep=arrays. Because these types have custom array handling, storing multiple values of such a type in _ignored_source provides no benefit.

For example, consider the following document, where point is a field of type geo_point:

{
    "obj": { "point": [[2, 3], [4, 5]] }
}

This set of points is not subject to synthetic_source_keep=arrays because the geo_point mapper does its own array parsing. It therefore makes no sense to require the following equivalent document to use _ignored_source:

{
    "obj": [
        { "point": [2, 3] },
        { "point": [4, 5] }
    ]
}

Currently, the second document does use _ignored_source, which wastes space. With this PR, the second document behaves the same as the first and does not use _ignored_source.
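
To make the equivalence of the two documents concrete, here is a small Python sketch. It is only a toy model and not the Elasticsearch mapper code: the helper names `is_point` and `collect_points` are made up for illustration, and the sketch only handles points written as two-element numeric arrays. It walks each document along the path obj.point and gathers every point it finds, whether the array sits on the leaf field or on the enclosing object.

```python
from typing import Any, List


def is_point(value: Any) -> bool:
    """True for a two-element numeric array such as [2, 3] (hypothetical helper)."""
    return (
        isinstance(value, list)
        and len(value) == 2
        and all(isinstance(v, (int, float)) and not isinstance(v, bool) for v in value)
    )


def collect_points(node: Any, path: List[str]) -> List[list]:
    """Gather all points reachable at `path`, looking through arrays at any level."""
    if not path:
        # We are at the leaf field: either a single point or an array of points.
        if is_point(node):
            return [node]
        if isinstance(node, list):
            return [p for item in node for p in collect_points(item, path)]
        return []
    if isinstance(node, list):
        # An array of objects: descend into each element with the same path.
        return [p for item in node for p in collect_points(item, path)]
    if isinstance(node, dict) and path[0] in node:
        return collect_points(node[path[0]], path[1:])
    return []


# First document: the array is on the geo_point field itself.
doc_array_on_field = {"obj": {"point": [[2, 3], [4, 5]]}}
# Second document: the array is on the enclosing object.
doc_array_on_object = {"obj": [{"point": [2, 3]}, {"point": [4, 5]}]}

points_a = collect_points(doc_array_on_field, ["obj", "point"])
points_b = collect_points(doc_array_on_object, ["obj", "point"])
print(points_a == points_b)
```

Both documents yield the same two points for obj.point, which is why treating them differently with respect to _ignored_source buys nothing.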

@parkertimmins requested a review from martijnvg May 7, 2025 02:12
@elasticsearchmachine added the needs:triage and v9.1.0 labels May 7, 2025
@parkertimmins requested a review from lkts May 7, 2025 02:13
@parkertimmins added the >enhancement and :StorageEngine/Mapping labels May 7, 2025
@elasticsearchmachine
Collaborator

Pinging @elastic/es-storage-engine (Team:StorageEngine)

@elasticsearchmachine removed the needs:triage label May 7, 2025
@elasticsearchmachine
Collaborator

Hi @parkertimmins, I've created a changelog YAML for you.

@parkertimmins
Contributor Author

Fixes #126155

nielsbauman and others added 16 commits May 6, 2025 22:16
…tic#127693)

We replace usages of the time-sensitive
`DataStream#getDefaultBackingIndexName` with the retrieval of the name
via an API call. The problem with using the time-sensitive method is
that we can have test failures around midnight.

Relates elastic#123376

Currently, we don't run the coordinating can_match to skip unmatched
shards in field-caps. Most of the time, this is fine, but field-caps
currently fails when the target shards are unavailable, even if they
don't match the index filter. This change integrates the coordinating
can_match into field-caps to prevent failures in such cases.

Now that the security manager is gone, the policy files are no longer
needed. This commit removes the server, test and plugin specific policy
files.

)

Apache Lucene 10.2 exposes a new search strategy for executing filtered searches over HNSW graphs.

This PR switches to utilizing that strategy by default as it generally provides a much better recall/latency Pareto frontier than our regular HNSW fanout search.

Additionally, a new tech-preview setting is provided to potentially revert to the old fanout behavior if issues arise.

With the SecurityManager gone, the PrivilegedOperations class is no
longer needed; these operations can be called directly.

…sReference (elastic#127404)

We have a couple of places in the codebase where we transition from the stream
to the reference.
We can save some code and make this a little less error-prone by having a conversion
method with move-style semantics and enabling the use of try-with-resources.
This also enables a couple of optimizations down the line. In addition, unlinking the
list of pages and moving it to the reference, instead of nulling it out, is a bit
nicer to the CPU caches.

The SecureMockMaker is a way for Mockito to be allowed SM permissions
for proxying. Since the Security Manager is no longer used, secure mocks
are no longer needed. This commit removes them.

…out to request (elastic#126805)

* Fixing bug with listener and adding timeout

* Update docs/changelog/126805.yaml

* Fixing tests

* Fixing writeTo

elasticsearchmachine and others added 7 commits May 6, 2025 22:16
…ionAuthorizationIT testIndicesPrivilegesAreEnforcedForCcrRestoreSessionActions elastic#127782

* Avoid time-based expiry of channel stats or else `testHttpClientStats`
  will fail if running multiple iterations for more than 5m.

* Assert all bytes received in `testHttpClientStats`.

Now that entitlements are always enabled, there is no need for
duplicating PR jobs for entitlements enabled and not enabled. This
commit removes the buildkite setup for the entitlements jobs.

Since SecurityManager is no longer used, the custom subclass of
SecurityManager, SecureSM, is no longer needed.

…astic#123426)

Removed one unnecessary sort (the shards are already sorted in the same order) and ported two more to the JDK's list sort, which now beats Lucene's timsort by quite a margin: the mutations are done directly on the array backing the list, which saves a lot of indirection.
Also removed a needless intermediate copy to and from a set, and an unnecessary conditional.
This is all motivated by spacetime project benchmarking, which shows
this code as one of the biggest contributors to search transport
thread use.

@parkertimmins requested review from a team as code owners May 7, 2025 03:16
@parkertimmins deleted the parkertimmins/synthetic-source-geo-point-arrays branch May 7, 2025 03:24
@parkertimmins
Contributor Author

Oof, not sure what I did when trying to rebase 🤦‍♂️
Closing this PR and trying again
