Skip to content

Conversation

martijnvg
Copy link
Member

  • Parse sub field fields once in IgnoredSourceRowStrideReader instead of for each doc that gets read.
  • Introduce a dedicated StoredFieldLoader for ignored source (IgnoredSourceFieldLoader). Which optimizes reading stored fields just for ignored source by avoiding relatively expensive set stuff (see CustomFieldsVisitor) and make use aborting loading stored fields (Status.STOP) when ignored source is read.

@martijnvg martijnvg force-pushed the improve_reading_ignored_source branch 2 times, most recently from a23fc3d to 4f6bacf Compare July 7, 2025 07:10
martijnvg added 3 commits July 9, 2025 09:39
* Parse sub field fields once in IgnoredSourceRowStrideReader instead of for each doc that gets read.
* Introduce a dedicated StoredFieldLoader for ignored source (IgnoredSourceFieldLoader). Which optimizes reading stored fields just for ignored source by avoiding relatively expensive set stuff (see CustomFieldsVisitor) and make use aborting loading stored fields (`Status.STOP`) when ignored source is read.
@martijnvg martijnvg force-pushed the improve_reading_ignored_source branch from 2bbef1d to 9e17d81 Compare July 9, 2025 10:35
var result = IgnoredSourceFieldMapper.decodeIfMatch(value, potentialFieldsInIgnoreSource);
if (result != null) {
// TODO: can't do this in case multiple entries for the same field name. (objects, arrays etc.)
// done = true;
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note this doesn't work in the case that multiple ignored source entries exist for the same field. This is possible with nested objects and arrays that share the same name.

Looks like the best way to achieve is for each ignored source entry to use a unique stored field name and not reuse _ignored_source as stored field name. This shouldn't be too difficult, given that each ignored source entry is already stored separately, but just with the same name.

Copy link
Contributor

@jordan-powers jordan-powers left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have been experimenting with this code and I've been encountering issues.

The main problem is that, for ESQL queries returning multiple stored values, the ValuesFromManyReader#fieldsMoved() merges all the StoredFieldSpec into a single spec for all the stored values requested in the query. This happens when requesting multiple values from _ignored_source, or when requesting some values from _ignored_source and other values out of doc_values: false, stored: true fields.

This means, as written, we will only use our optimized IgnoredSourceFieldLoader for queries retrieving a single value from _ignored_source and no values from stored fields. Otherwise, we will fall back to the default SourceLoader.

Unfortunately, this default implementation no longer works because the FallbackSyntheticSourceBlockLoader#rowStrideStoredFieldSpec() now returns a field spec for a field that does not actually exist (_ignored_source.<field_name> when the field is still actually named _ignored_source).

We could probably update the IgnoredSourceFieldLoader to handle loading multiple values from _ignored_source, however that would not resolve the issue of loading a mix of values from _ignored_source and other stored fields.

I think the best solution here is to implement #130919 first. Then the fallback loader will still be looking for fields that actually exist, and won't return null.

@Override
public StoredFieldsSpec rowStrideStoredFieldSpec() {
return new StoredFieldsSpec(false, false, Set.of(IgnoredSourceFieldMapper.NAME));
return new StoredFieldsSpec(false, false, Set.of(IgnoredSourceFieldMapper.NAME + "." + fieldName));
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This specified field doesn't actually exist. This is fine when we're using our custom IgnoredSourceFieldLoader, but sometimes we fall back to the default StoredFieldLoader (when we're loading more than one value from ignored source, or when we're loading some values from other stored fields).

static boolean supports(StoredFieldsSpec spec) {
return spec.requiresSource() == false
&& spec.requiresMetadata() == false
&& spec.requiredStoredFields().size() == 1
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

When a query requests multiple values, the ValuesFromManyReader#fieldsMoved(), will merge the multiple StoredFieldSpecs into a single spec that specifies multiple fields. This check then fails (since it expects only 1 stored field in the spec), and we fall back to the default StoredFieldLoader implementation.

for (Object value : ignoredSource) {
IgnoredSourceFieldMapper.NameValue nameValue = IgnoredSourceFieldMapper.decode(value);
if (fieldNames.contains(nameValue.name())) {
IgnoredSourceFieldMapper.NameValue nameValue = (IgnoredSourceFieldMapper.NameValue) value;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we fallback to the default StoredFieldLoader instead of our custom IgnoredSourceFieldLoader, this would cause a ClassCastException (except that now our visitor is filtering for _ignored_source.<field_name> instead of _ignored_source, so the ignoredSource variable is null and this whole block is skipped).

@jordan-powers
Copy link
Contributor

Ok, I was able to address the issues I discussed by updating the CustomFieldVisitor to always load the _ignored_source field: #131885

It feels a bit hacky, and I still think it might be cleanest to implement #130919 first.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants