Improve reading ignored source in esql #130676

martijnvg · 2025-07-06T15:11:26Z

Parse sub field fields once in IgnoredSourceRowStrideReader instead of for each doc that gets read.
Introduce a dedicated StoredFieldLoader for ignored source (IgnoredSourceFieldLoader). Which optimizes reading stored fields just for ignored source by avoiding relatively expensive set stuff (see CustomFieldsVisitor) and make use aborting loading stored fields (Status.STOP) when ignored source is read.

* Parse sub field fields once in IgnoredSourceRowStrideReader instead of for each doc that gets read. * Introduce a dedicated StoredFieldLoader for ignored source (IgnoredSourceFieldLoader). Which optimizes reading stored fields just for ignored source by avoiding relatively expensive set stuff (see CustomFieldsVisitor) and make use aborting loading stored fields (`Status.STOP`) when ignored source is read.

martijnvg · 2025-07-09T13:17:02Z

server/src/main/java/org/elasticsearch/index/fieldvisitor/IgnoredSourceFieldLoader.java

+            var result = IgnoredSourceFieldMapper.decodeIfMatch(value, potentialFieldsInIgnoreSource);
+            if (result != null) {
+                // TODO: can't do this in case multiple entries for the same field name. (objects, arrays etc.)
+                // done = true;


Note this doesn't work in the case that multiple ignored source entries exist for the same field. This is possible with nested objects and arrays that share the same name.

Looks like the best way to achieve is for each ignored source entry to use a unique stored field name and not reuse _ignored_source as stored field name. This shouldn't be too difficult, given that each ignored source entry is already stored separately, but just with the same name.

jordan-powers

I have been experimenting with this code and I've been encountering issues.

The main problem is that, for ESQL queries returning multiple stored values, the ValuesFromManyReader#fieldsMoved() merges all the StoredFieldSpec into a single spec for all the stored values requested in the query. This happens when requesting multiple values from _ignored_source, or when requesting some values from _ignored_source and other values out of doc_values: false, stored: true fields.

This means, as written, we will only use our optimized IgnoredSourceFieldLoader for queries retrieving a single value from _ignored_source and no values from stored fields. Otherwise, we will fall back to the default SourceLoader.

Unfortunately, this default implementation no longer works because the FallbackSyntheticSourceBlockLoader#rowStrideStoredFieldSpec() now returns a field spec for a field that does not actually exist (_ignored_source.<field_name> when the field is still actually named _ignored_source).

We could probably update the IgnoredSourceFieldLoader to handle loading multiple values from _ignored_source, however that would not resolve the issue of loading a mix of values from _ignored_source and other stored fields.

I think the best solution here is to implement #130919 first. Then the fallback loader will still be looking for fields that actually exist, and won't return null.

jordan-powers · 2025-07-24T21:17:03Z

server/src/main/java/org/elasticsearch/index/mapper/FallbackSyntheticSourceBlockLoader.java

    @Override
    public StoredFieldsSpec rowStrideStoredFieldSpec() {
-        return new StoredFieldsSpec(false, false, Set.of(IgnoredSourceFieldMapper.NAME));
+        return new StoredFieldsSpec(false, false, Set.of(IgnoredSourceFieldMapper.NAME + "." + fieldName));


This specified field doesn't actually exist. This is fine when we're using our custom IgnoredSourceFieldLoader, but sometimes we fall back to the default StoredFieldLoader (when we're loading more than one value from ignored source, or when we're loading some values from other stored fields).

jordan-powers · 2025-07-24T21:20:15Z

server/src/main/java/org/elasticsearch/index/fieldvisitor/IgnoredSourceFieldLoader.java

+    static boolean supports(StoredFieldsSpec spec) {
+        return spec.requiresSource() == false
+            && spec.requiresMetadata() == false
+            && spec.requiredStoredFields().size() == 1


When a query requests multiple values, the ValuesFromManyReader#fieldsMoved(), will merge the multiple StoredFieldSpecs into a single spec that specifies multiple fields. This check then fails (since it expects only 1 stored field in the spec), and we fall back to the default StoredFieldLoader implementation.

jordan-powers · 2025-07-24T21:35:23Z

server/src/main/java/org/elasticsearch/index/mapper/FallbackSyntheticSourceBlockLoader.java

            for (Object value : ignoredSource) {
-                IgnoredSourceFieldMapper.NameValue nameValue = IgnoredSourceFieldMapper.decode(value);
-                if (fieldNames.contains(nameValue.name())) {
+                IgnoredSourceFieldMapper.NameValue nameValue = (IgnoredSourceFieldMapper.NameValue) value;


If we fallback to the default StoredFieldLoader instead of our custom IgnoredSourceFieldLoader, this would cause a ClassCastException (except that now our visitor is filtering for _ignored_source.<field_name> instead of _ignored_source, so the ignoredSource variable is null and this whole block is skipped).

jordan-powers · 2025-07-24T21:54:34Z

Ok, I was able to address the issues I discussed by updating the CustomFieldVisitor to always load the _ignored_source field: #131885

It feels a bit hacky, and I still think it might be cleanest to implement #130919 first.

elasticsearchmachine added the v9.2.0 label Jul 6, 2025

martijnvg force-pushed the improve_reading_ignored_source branch 2 times, most recently from a23fc3d to 4f6bacf Compare July 7, 2025 07:10

martijnvg added 3 commits July 9, 2025 09:39

Let IgnoredSourceFieldLoader.SFV decode NameValue

c376155

todo

9e17d81

martijnvg force-pushed the improve_reading_ignored_source branch from 2bbef1d to 9e17d81 Compare July 9, 2025 10:35

[CI] Auto commit changes from spotless

b407867

martijnvg commented Jul 9, 2025

View reviewed changes

martijnvg mentioned this pull request Jul 9, 2025

Improve accessing unmapped fields with logsdb #130885

Open

5 tasks

jordan-powers reviewed Jul 24, 2025

View reviewed changes

jordan-powers mentioned this pull request Jul 28, 2025

Special field visitor for _ignored_source #131885

Closed

martijnvg closed this Aug 18, 2025

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Improve reading ignored source in esql #130676

Improve reading ignored source in esql #130676

Uh oh!

martijnvg commented Jul 6, 2025

Uh oh!

martijnvg Jul 9, 2025

Uh oh!

jordan-powers left a comment

Uh oh!

jordan-powers Jul 24, 2025

Uh oh!

jordan-powers Jul 24, 2025

Uh oh!

jordan-powers Jul 24, 2025

Uh oh!

jordan-powers commented Jul 24, 2025

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

Improve reading ignored source in esql #130676

Improve reading ignored source in esql #130676

Uh oh!

Conversation

martijnvg commented Jul 6, 2025

Uh oh!

martijnvg Jul 9, 2025

Choose a reason for hiding this comment

Uh oh!

jordan-powers left a comment

Choose a reason for hiding this comment

Uh oh!

jordan-powers Jul 24, 2025

Choose a reason for hiding this comment

Uh oh!

jordan-powers Jul 24, 2025

Choose a reason for hiding this comment

Uh oh!

jordan-powers Jul 24, 2025

Choose a reason for hiding this comment

Uh oh!

jordan-powers commented Jul 24, 2025

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants