Skip to content

Conversation

@jimczi
Copy link
Contributor

@jimczi jimczi commented Sep 30, 2024

Spinoff of #113036.

This change introduces optional source filtering directly within source loaders (both synthetic and stored). The main benefit is seen in synthetic source loaders, as synthetic fields are stored independently. By filtering while loading the synthetic source, generating the source becomes linear in the number of fields that match the filter.

This update also modifies the get document API to apply source filters earlier/directly through the source loader. The search API, however, is not affected in this change, since the loaded source is still used by other features (e.g., highlighting, fields, nested hits), and source filtering is always applied as the final step. A follow-up will be required to ensure careful handling of all search-related scenarios.

Spinoff of elastic#113036.

This change introduces optional source filtering directly within source loaders (both synthetic and stored).
The main benefit is seen in synthetic source loaders, as synthetic fields are stored independently.
By filtering while loading the synthetic source, generating the source becomes linear in the number of fields that match the filter.

This update also modifies the get document API to apply source filters earlier—directly through the source loader.
The search API, however, is not affected in this change, since the loaded source is still used by other features (e.g., highlighting, fields, nested hits),
and source filtering is always applied as the final step.
A follow-up will be required to ensure careful handling of all search-related scenarios.
@jimczi jimczi added >enhancement :StorageEngine/Mapping The storage related side of mappings v8.16.0 v9.0.0 labels Sep 30, 2024
@jimczi jimczi requested a review from kkrik-es September 30, 2024 22:17
@jimczi jimczi requested a review from a team as a code owner September 30, 2024 22:17
@elasticsearchmachine
Copy link
Collaborator

Pinging @elastic/es-storage-engine (Team:StorageEngine)

@elasticsearchmachine
Copy link
Collaborator

Hi @jimczi, I've created a changelog YAML for you.

protected SourceLoader.SyntheticFieldLoader syntheticFieldLoader(Stream<Mapper> mappers, boolean isFragment) {
var fields = mappers.sorted(Comparator.comparing(Mapper::fullPath))
.map(Mapper::syntheticFieldLoader)
SourceLoader.SyntheticFieldLoader syntheticFieldLoader(SourceFilter filter, Collection<Mapper> mappers, boolean isFragment) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: keep passing the stream to avoid reconverting the list to a stream?

I also wonder about the overhead here, shall we skip streams altogether and use good-old collections?

@kkrik-es kkrik-es requested review from lkts and martijnvg October 1, 2024 07:36
@kkrik-es
Copy link
Contributor

kkrik-es commented Oct 1, 2024

Looks good overall, added Martijn and Sasha to get another look, since this is a core part of synthetic source.

Do we have any benchmark results quantifying the impact of this?

Copy link
Member

@rjernst rjernst left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The xcontent changes LGTM.

Copy link
Contributor

@lkts lkts left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is great, synthetic source changes look good to me. I wonder if there is any performance impact when filtering is not used compared to current state.

Copy link
Member

@martijnvg martijnvg left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM 👍

@kkrik-es
Copy link
Contributor

@jimczi any chance you can get back to submitting this? It should also help with fetching fields from synthetic source in ES|QL, with proper block loader adjustments.

@jimczi
Copy link
Contributor Author

jimczi commented Dec 11, 2024

@kkrik-es I have updated the branch and can proceed with merging this PR as is, provided there are no objections. Please let me know if you have any concerns or additional feedback.

@kkrik-es
Copy link
Contributor

I think we've all approved, very excited to see it submitted.

@jimczi jimczi added the auto-backport Automatically create backport pull requests when merged label Dec 11, 2024
@jimczi jimczi merged commit b40a520 into elastic:main Dec 11, 2024
16 checks passed
@jimczi jimczi deleted the source_loader_with_filters branch December 11, 2024 13:17
@elasticsearchmachine
Copy link
Collaborator

💔 Backport failed

Status Branch Result
8.x Commit could not be cherrypicked due to conflicts

You can use sqren/backport to manually backport by running backport --upstream elastic/elasticsearch --pr 113827

jimczi added a commit to jimczi/elasticsearch that referenced this pull request Dec 11, 2024
This change introduces optional source filtering directly within source loaders (both synthetic and stored).
The main benefit is seen in synthetic source loaders, as synthetic fields are stored independently.
By filtering while loading the synthetic source, generating the source becomes linear in the number of fields that match the filter.

This update also modifies the get document API to apply source filters earlier—directly through the source loader.
The search API, however, is not affected in this change, since the loaded source is still used by other features (e.g., highlighting, fields, nested hits),
and source filtering is always applied as the final step.
A follow-up will be required to ensure careful handling of all search-related scenarios.
elasticsearchmachine pushed a commit that referenced this pull request Dec 11, 2024
This change introduces optional source filtering directly within source loaders (both synthetic and stored).
The main benefit is seen in synthetic source loaders, as synthetic fields are stored independently.
By filtering while loading the synthetic source, generating the source becomes linear in the number of fields that match the filter.

This update also modifies the get document API to apply source filters earlier—directly through the source loader.
The search API, however, is not affected in this change, since the loaded source is still used by other features (e.g., highlighting, fields, nested hits),
and source filtering is always applied as the final step.
A follow-up will be required to ensure careful handling of all search-related scenarios.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

auto-backport Automatically create backport pull requests when merged >enhancement :StorageEngine/Mapping The storage related side of mappings Team:StorageEngine v8.18.0 v9.0.0

Projects

None yet

Development

Successfully merging this pull request may close these issues.

7 participants