-
Notifications
You must be signed in to change notification settings - Fork 25.6k
Improve reading ignored source in esql #130676
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
a23fc3d
to
4f6bacf
Compare
* Parse sub field fields once in IgnoredSourceRowStrideReader instead of for each doc that gets read. * Introduce a dedicated StoredFieldLoader for ignored source (IgnoredSourceFieldLoader). Which optimizes reading stored fields just for ignored source by avoiding relatively expensive set stuff (see CustomFieldsVisitor) and make use aborting loading stored fields (`Status.STOP`) when ignored source is read.
2bbef1d
to
9e17d81
Compare
var result = IgnoredSourceFieldMapper.decodeIfMatch(value, potentialFieldsInIgnoreSource); | ||
if (result != null) { | ||
// TODO: can't do this in case multiple entries for the same field name. (objects, arrays etc.) | ||
// done = true; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Note this doesn't work in the case that multiple ignored source entries exist for the same field. This is possible with nested objects and arrays that share the same name.
Looks like the best way to achieve is for each ignored source entry to use a unique stored field name and not reuse _ignored_source
as stored field name. This shouldn't be too difficult, given that each ignored source entry is already stored separately, but just with the same name.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have been experimenting with this code and I've been encountering issues.
The main problem is that, for ESQL queries returning multiple stored values, the ValuesFromManyReader#fieldsMoved()
merges all the StoredFieldSpec
into a single spec for all the stored values requested in the query. This happens when requesting multiple values from _ignored_source, or when requesting some values from _ignored_source and other values out of doc_values: false, stored: true
fields.
This means, as written, we will only use our optimized IgnoredSourceFieldLoader
for queries retrieving a single value from _ignored_source and no values from stored fields. Otherwise, we will fall back to the default SourceLoader
.
Unfortunately, this default implementation no longer works because the FallbackSyntheticSourceBlockLoader#rowStrideStoredFieldSpec()
now returns a field spec for a field that does not actually exist (_ignored_source.<field_name>
when the field is still actually named _ignored_source
).
We could probably update the IgnoredSourceFieldLoader
to handle loading multiple values from _ignored_source, however that would not resolve the issue of loading a mix of values from _ignored_source and other stored fields.
I think the best solution here is to implement #130919 first. Then the fallback loader will still be looking for fields that actually exist, and won't return null.
@Override | ||
public StoredFieldsSpec rowStrideStoredFieldSpec() { | ||
return new StoredFieldsSpec(false, false, Set.of(IgnoredSourceFieldMapper.NAME)); | ||
return new StoredFieldsSpec(false, false, Set.of(IgnoredSourceFieldMapper.NAME + "." + fieldName)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This specified field doesn't actually exist. This is fine when we're using our custom IgnoredSourceFieldLoader, but sometimes we fall back to the default StoredFieldLoader (when we're loading more than one value from ignored source, or when we're loading some values from other stored fields).
static boolean supports(StoredFieldsSpec spec) { | ||
return spec.requiresSource() == false | ||
&& spec.requiresMetadata() == false | ||
&& spec.requiredStoredFields().size() == 1 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
When a query requests multiple values, the ValuesFromManyReader#fieldsMoved()
, will merge the multiple StoredFieldSpecs
into a single spec that specifies multiple fields. This check then fails (since it expects only 1 stored field in the spec), and we fall back to the default StoredFieldLoader
implementation.
for (Object value : ignoredSource) { | ||
IgnoredSourceFieldMapper.NameValue nameValue = IgnoredSourceFieldMapper.decode(value); | ||
if (fieldNames.contains(nameValue.name())) { | ||
IgnoredSourceFieldMapper.NameValue nameValue = (IgnoredSourceFieldMapper.NameValue) value; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If we fallback to the default StoredFieldLoader
instead of our custom IgnoredSourceFieldLoader
, this would cause a ClassCastException
(except that now our visitor is filtering for _ignored_source.<field_name>
instead of _ignored_source
, so the ignoredSource
variable is null and this whole block is skipped).
Status.STOP
) when ignored source is read.