-
Notifications
You must be signed in to change notification settings - Fork 25.6k
Alternate approach to speed up ignored_source access #133839
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Alternate approach to speed up ignored_source access #133839
Conversation
Here are the results of running the INSIST_🐔 benchmark comparing against the current main (with the feature flag enabled, splitting up
Summary:
Looks like the new approach is comparable for limit_500 chicken_3, and chicken_3_with_where, slightly worse for chicken_1, and much worse for chicken_2 and chicken_4 |
Pinging @elastic/es-storage-engine (Team:StorageEngine) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM, thanks Jordan!
} | ||
|
||
return permissions.getFieldPermissions().filter(wrappedReader); | ||
var indexVersionCreated = searchExecutionContextProvider.apply(shardId).indexVersionCreated(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Passing down index version here is a little intrusive. However I don't think it is too bad. In other places of the code base passing down index version is a common pattern.
The alternative we talked about, which would store coalesced ignored source in a different field, would also be an intrusive change.
So I'm okay with this approach.
…d-fieldinfo-explosion
Follow-up to #133839 to remove the feature flag and enable the feature in production.
In #132142 and #132428 we split up ignored_source entries into distinct lucene fields, then added an optimized field visitor to speed up retrieving unmapped values for INSIST_🐔.
However, since this approach creates a unique lucene field for every ignored_source entry, we can very quickly have a lot of lucene fields if there are a lot of unique unmapped fields per document. This can cause significant slowdowns in indexing throughput and merge time.
This PR addresses those limitations by reverting back to keeping all ignored_source entries under the same lucene field. However, we still keep some of the speedups from that prior work by continuing to coalesce multiple ignored_source entries for the same field into a single entry, allowing the field visitor to exit early.
Unfortunately, we do lose some time compared to the original optimizations because now the field visitor cannot look at the fieldInfo to decide whether or not to visit a field, and it instead needs to actually visit and materialize each ignored_source entry before it can decide whether or not to keep it.