
Commit 4427964

ESQL: Speed loading stored fields
This speeds up loading from stored fields by opting more blocks into the "sequential" strategy. This really kicks in when loading stored fields like `text`, and when you need less than 100% of documents, but more than, say, 10%. It's most useful when you need something like 99.9% of the documents. Here are the perf numbers:

```
%100.0 {"took": 403 -> 401,"documents_found":1000000}
%099.9 {"took":3990 ->  436,"documents_found": 999000}
%099.0 {"took":4069 ->  440,"documents_found": 990000}
%090.0 {"took":3468 ->  421,"documents_found": 900000}
%030.0 {"took":1213 ->  152,"documents_found": 300000}
%020.0 {"took": 766 ->  104,"documents_found": 200000}
%010.0 {"took": 397 ->   55,"documents_found": 100000}
%009.0 {"took": 352 ->  375,"documents_found":  90000}
%008.0 {"took": 304 ->  317,"documents_found":  80000}
%007.0 {"took": 273 ->  287,"documents_found":  70000}
%005.0 {"took": 199 ->  204,"documents_found":  50000}
%001.0 {"took":  46 ->   46,"documents_found":  10000}
```

Let's explain this with an example. First, jump to `main` and load a million documents:

```
rm -f /tmp/bulk
for a in {1..1000}; do
    echo '{"index":{}}' >> /tmp/bulk
    echo '{"text":"text '$(printf %04d $a)'"}' >> /tmp/bulk
done
curl -s -uelastic:password -HContent-Type:application/json -XDELETE localhost:9200/test
for a in {1..1000}; do
    echo -n $a:
    curl -s -uelastic:password -HContent-Type:application/json -XPOST localhost:9200/test/_bulk?pretty --data-binary @/tmp/bulk | grep errors
done
curl -s -uelastic:password -HContent-Type:application/json -XPOST localhost:9200/test/_forcemerge?max_num_segments=1
curl -s -uelastic:password -HContent-Type:application/json -XPOST localhost:9200/test/_refresh
echo
```

Now query them all. Run this a few times until it's stable:

```
echo -n "%100.0 "
curl -s -uelastic:password -HContent-Type:application/json -XPOST 'localhost:9200/_query?pretty' -d'{
    "query": "FROM test | STATS SUM(LENGTH(text))",
    "pragma": {
        "data_partitioning": "shard"
    }
}' | jq -c '{took, documents_found}'
```

Now fetch 99.9% of documents:

```
echo -n "%099.9 "
curl -s -uelastic:password -HContent-Type:application/json -XPOST 'localhost:9200/_query?pretty' -d'{
    "query": "FROM test | WHERE NOT text.keyword IN (\"text 0998\") | STATS SUM(LENGTH(text))",
    "pragma": {
        "data_partitioning": "shard"
    }
}' | jq -c '{took, documents_found}'
```

This should spit out something like:

```
%100.0 {"took": 403,"documents_found":1000000}
%099.9 {"took":4098,"documents_found": 999000}
```

We're loading *fewer* documents but it's slower! What in the world?! If you dig into the profile you'll see that it's value loading:

```
$ curl -s -uelastic:password -HContent-Type:application/json -XPOST 'localhost:9200/_query?pretty' -d'{
    "query": "FROM test | STATS SUM(LENGTH(text))",
    "pragma": {
        "data_partitioning": "shard"
    },
    "profile": true
}' | jq '.profile.drivers[].operators[] | select(.operator | contains("ValuesSourceReaderOperator"))'
{
  "operator": "ValuesSourceReaderOperator[fields = [text]]",
  "status": {
    "readers_built": {
      "stored_fields[requires_source:true, fields:0, sequential: true]": 222,
      "text:column_at_a_time:null": 222,
      "text:row_stride:BlockSourceReader.Bytes": 1
    },
    "values_loaded": 1000000,
    "process_nanos": 370687157,
    "pages_processed": 222,
    "rows_received": 1000000,
    "rows_emitted": 1000000
  }
}
$ curl -s -uelastic:password -HContent-Type:application/json -XPOST 'localhost:9200/_query?pretty' -d'{
    "query": "FROM test | WHERE NOT text.keyword IN (\"text 0998\") | STATS SUM(LENGTH(text))",
    "pragma": {
        "data_partitioning": "shard"
    },
    "profile": true
}' | jq '.profile.drivers[].operators[] | select(.operator | contains("ValuesSourceReaderOperator"))'
{
  "operator": "ValuesSourceReaderOperator[fields = [text]]",
  "status": {
    "readers_built": {
      "stored_fields[requires_source:true, fields:0, sequential: false]": 222,
      "text:column_at_a_time:null": 222,
      "text:row_stride:BlockSourceReader.Bytes": 1
    },
    "values_loaded": 999000,
    "process_nanos": 3965803793,
    "pages_processed": 222,
    "rows_received": 999000,
    "rows_emitted": 999000
  }
}
```

It jumps from 370ms to almost four seconds, despite loading *fewer* values! The other big difference is in the `stored_fields` marker: in the first run it's `sequential: true` and in the second `sequential: false`. `sequential: true` uses Lucene's "merge" stored fields reader instead of the default one. It's much more optimized for decoding sequences of documents. Previously we only enabled this reader when loading compact sequences of documents, when the entire block looks like:

```
1, 2, 3, 4, 5, ... 1230, 1231
```

If there were any gaps we wouldn't enable it. That was a very conservative choice we made long ago without doing any experiments. We knew it was faster without any gaps, but not otherwise. It turns out it's a lot faster in a lot more cases: I've measured it as faster even with 99% gaps, at least on simple documents. I'm a bit worried that this is too aggressive, so I've made it configurable, with the default using the "merge" loader whenever we need at least one in every ten documents in the range. So we'd use the merge loader with a block like:

```
1, 11, 21, 31, ..., 1231, 1241
```
1 parent 49a9137 commit 4427964

File tree

10 files changed: +96 −19 lines changed


benchmarks/src/main/java/org/elasticsearch/benchmark/compute/operator/ValuesSourceReaderBenchmark.java

Lines changed: 4 additions & 1 deletion
```diff
@@ -25,6 +25,7 @@
 import org.apache.lucene.util.NumericUtils;
 import org.elasticsearch.common.breaker.NoopCircuitBreaker;
 import org.elasticsearch.common.lucene.Lucene;
+import org.elasticsearch.common.settings.Settings;
 import org.elasticsearch.common.util.BigArrays;
 import org.elasticsearch.compute.data.BlockFactory;
 import org.elasticsearch.compute.data.BytesRefBlock;
@@ -50,6 +51,7 @@
 import org.elasticsearch.index.mapper.MappedFieldType;
 import org.elasticsearch.index.mapper.NumberFieldMapper;
 import org.elasticsearch.search.lookup.SearchLookup;
+import org.elasticsearch.xpack.esql.plugin.QueryPragmas;
 import org.openjdk.jmh.annotations.Benchmark;
 import org.openjdk.jmh.annotations.BenchmarkMode;
 import org.openjdk.jmh.annotations.Fork;
@@ -336,7 +338,8 @@ public void benchmark() {
             List.of(new ValuesSourceReaderOperator.ShardContext(reader, () -> {
                 throw new UnsupportedOperationException("can't load _source here");
             })),
-            0
+            0,
+            QueryPragmas.STORED_FIELDS_SEQUENTIAL_PROPORTION.getDefault(Settings.EMPTY)
         );
         long sum = 0;
         for (Page page : pages) {
```

x-pack/plugin/esql/compute/src/main/java/org/elasticsearch/compute/lucene/ValuesSourceReaderOperator.java

Lines changed: 24 additions & 4 deletions
```diff
@@ -72,10 +72,18 @@ public class ValuesSourceReaderOperator extends AbstractPageMappingOperator {
      * @param shardContexts per-shard loading information
      * @param docChannel the channel containing the shard, leaf/segment and doc id
      */
-    public record Factory(List<FieldInfo> fields, List<ShardContext> shardContexts, int docChannel) implements OperatorFactory {
+    public record Factory(List<FieldInfo> fields, List<ShardContext> shardContexts, int docChannel, double storedFieldsSequentialProportion)
+        implements
+            OperatorFactory {
         @Override
         public Operator get(DriverContext driverContext) {
-            return new ValuesSourceReaderOperator(driverContext.blockFactory(), fields, shardContexts, docChannel);
+            return new ValuesSourceReaderOperator(
+                driverContext.blockFactory(),
+                fields,
+                shardContexts,
+                docChannel,
+                storedFieldsSequentialProportion
+            );
         }

         @Override
@@ -113,6 +121,7 @@ public record ShardContext(IndexReader reader, Supplier<SourceLoader> newSourceL
     private final List<ShardContext> shardContexts;
     private final int docChannel;
     private final BlockFactory blockFactory;
+    private final double storedFieldsSequentialProportion;

     private final Map<String, Integer> readersBuilt = new TreeMap<>();
     private long valuesLoaded;
@@ -125,11 +134,18 @@ public record ShardContext(IndexReader reader, Supplier<SourceLoader> newSourceL
      * @param fields fields to load
      * @param docChannel the channel containing the shard, leaf/segment and doc id
      */
-    public ValuesSourceReaderOperator(BlockFactory blockFactory, List<FieldInfo> fields, List<ShardContext> shardContexts, int docChannel) {
+    public ValuesSourceReaderOperator(
+        BlockFactory blockFactory,
+        List<FieldInfo> fields,
+        List<ShardContext> shardContexts,
+        int docChannel,
+        double storedFieldsSequentialProportion
+    ) {
         this.fields = fields.stream().map(f -> new FieldWork(f)).toArray(FieldWork[]::new);
         this.shardContexts = shardContexts;
         this.docChannel = docChannel;
         this.blockFactory = blockFactory;
+        this.storedFieldsSequentialProportion = storedFieldsSequentialProportion;
     }

     @Override
@@ -440,7 +456,11 @@ public void close() {
      */
     private boolean useSequentialStoredFieldsReader(BlockLoader.Docs docs) {
         int count = docs.count();
-        return count >= SEQUENTIAL_BOUNDARY && docs.get(count - 1) - docs.get(0) == count - 1;
+        if (count < SEQUENTIAL_BOUNDARY) {
+            return false;
+        }
+        int range = docs.get(count - 1) - docs.get(0);
+        return range * storedFieldsSequentialProportion < count - 1;
     }

     private void trackStoredFields(StoredFieldsSpec spec, boolean sequential) {
```

x-pack/plugin/esql/compute/src/main/java/org/elasticsearch/compute/operator/OrdinalsGroupingOperator.java

Lines changed: 12 additions & 2 deletions
```diff
@@ -63,7 +63,8 @@ public record OrdinalsGroupingOperatorFactory(
         int docChannel,
         String groupingField,
         List<Factory> aggregators,
-        int maxPageSize
+        int maxPageSize,
+        double storedFieldsSequentialProportion
     ) implements OperatorFactory {

         @Override
@@ -76,6 +77,7 @@ public Operator get(DriverContext driverContext) {
                 groupingField,
                 aggregators,
                 maxPageSize,
+                storedFieldsSequentialProportion,
                 driverContext
             );
         }
@@ -94,6 +96,7 @@ public String describe() {
     private final List<Factory> aggregatorFactories;
     private final ElementType groupingElementType;
     private final Map<SegmentID, OrdinalSegmentAggregator> ordinalAggregators;
+    private final double storedFieldsSequentialProportion;

     private final DriverContext driverContext;

@@ -111,6 +114,7 @@ public OrdinalsGroupingOperator(
         String groupingField,
         List<GroupingAggregator.Factory> aggregatorFactories,
         int maxPageSize,
+        double storedFieldsSequentialProportion,
         DriverContext driverContext
     ) {
         Objects.requireNonNull(aggregatorFactories);
@@ -122,6 +126,7 @@ public OrdinalsGroupingOperator(
         this.aggregatorFactories = aggregatorFactories;
         this.ordinalAggregators = new HashMap<>();
         this.maxPageSize = maxPageSize;
+        this.storedFieldsSequentialProportion = storedFieldsSequentialProportion;
         this.driverContext = driverContext;
     }

@@ -171,6 +176,7 @@ public void addInput(Page page) {
                 channelIndex,
                 aggregatorFactories,
                 maxPageSize,
+                storedFieldsSequentialProportion,
                 driverContext
             );
         }
@@ -485,6 +491,7 @@ boolean next() throws IOException {
     private static class ValuesAggregator implements Releasable {
         private final ValuesSourceReaderOperator extractor;
         private final HashAggregationOperator aggregator;
+        private final double storedFieldsSequentialProportion;

         ValuesAggregator(
             IntFunction<BlockLoader> blockLoaders,
@@ -495,13 +502,15 @@ private static class ValuesAggregator implements Releasable {
             int channelIndex,
             List<GroupingAggregator.Factory> aggregatorFactories,
             int maxPageSize,
+            double storedFieldsSequentialProportion,
             DriverContext driverContext
         ) {
             this.extractor = new ValuesSourceReaderOperator(
                 driverContext.blockFactory(),
                 List.of(new ValuesSourceReaderOperator.FieldInfo(groupingField, groupingElementType, blockLoaders)),
                 shardContexts,
-                docChannel
+                docChannel,
+                storedFieldsSequentialProportion
             );
             this.aggregator = new HashAggregationOperator(
                 aggregatorFactories,
@@ -513,6 +522,7 @@ private static class ValuesAggregator implements Releasable {
                 ),
                 driverContext
             );
+            this.storedFieldsSequentialProportion = storedFieldsSequentialProportion;
         }

         void addInput(Page page) {
```

x-pack/plugin/esql/compute/src/test/java/org/elasticsearch/compute/OperatorTests.java

Lines changed: 1 addition & 0 deletions
```diff
@@ -204,6 +204,7 @@ public String toString() {
                 gField,
                 List.of(CountAggregatorFunction.supplier().groupingAggregatorFactory(INITIAL, List.of(1))),
                 randomPageSize(),
+                0.1,
                 driverContext
             )
         );
```

x-pack/plugin/esql/src/main/java/org/elasticsearch/xpack/esql/enrich/AbstractLookupService.java

Lines changed: 2 additions & 1 deletion
```diff
@@ -409,7 +409,8 @@ private static Operator extractFieldsOperator(
             driverContext.blockFactory(),
             fields,
             List.of(new ValuesSourceReaderOperator.ShardContext(shardContext.searcher().getIndexReader(), shardContext::newSourceLoader)),
-            0
+            0,
+            0.1
         );
     }
```

x-pack/plugin/esql/src/main/java/org/elasticsearch/xpack/esql/planner/EsPhysicalOperationProviders.java

Lines changed: 10 additions & 3 deletions
```diff
@@ -103,16 +103,19 @@ public interface ShardContext extends org.elasticsearch.compute.lucene.ShardCont

     private final List<ShardContext> shardContexts;
     private final DataPartitioning defaultDataPartitioning;
+    private final double storedFieldsSequentialProportion;

     public EsPhysicalOperationProviders(
         FoldContext foldContext,
         List<ShardContext> shardContexts,
         AnalysisRegistry analysisRegistry,
-        DataPartitioning defaultDataPartitioning
+        DataPartitioning defaultDataPartitioning,
+        double storedFieldsSequentialProportion
     ) {
         super(foldContext, analysisRegistry);
         this.shardContexts = shardContexts;
         this.defaultDataPartitioning = defaultDataPartitioning;
+        this.storedFieldsSequentialProportion = storedFieldsSequentialProportion;
     }

     @Override
@@ -132,7 +135,10 @@ public final PhysicalOperation fieldExtractPhysicalOperation(FieldExtractExec fi
             IntFunction<BlockLoader> loader = s -> getBlockLoaderFor(s, attr, fieldExtractPreference);
             fields.add(new ValuesSourceReaderOperator.FieldInfo(getFieldName(attr), elementType, loader));
         }
-        return source.with(new ValuesSourceReaderOperator.Factory(fields, readers, docChannel), layout.build());
+        return source.with(
+            new ValuesSourceReaderOperator.Factory(fields, readers, docChannel, storedFieldsSequentialProportion),
+            layout.build()
+        );
     }

     private static String getFieldName(Attribute attr) {
@@ -278,7 +284,8 @@ public final Operator.OperatorFactory ordinalGroupingOperatorFactory(
             docChannel,
             attrSource.name(),
             aggregatorFactories,
-            context.pageSize(aggregateExec.estimatedRowSize())
+            context.pageSize(aggregateExec.estimatedRowSize()),
+            storedFieldsSequentialProportion
         );
     }
```

x-pack/plugin/esql/src/main/java/org/elasticsearch/xpack/esql/plugin/ComputeService.java

Lines changed: 8 additions & 6 deletions
```diff
@@ -535,6 +535,13 @@ public SourceProvider createSourceProvider() {
                 new EsPhysicalOperationProviders.DefaultShardContext(i, searchExecutionContext, searchContext.request().getAliasFilter())
             );
         }
+        EsPhysicalOperationProviders physicalOperationProviders = new EsPhysicalOperationProviders(
+            context.foldCtx(),
+            contexts,
+            searchService.getIndicesService().getAnalysis(),
+            defaultDataPartitioning,
+            context.configuration().pragmas().storedFieldsSequentialProportion()
+        );
         final List<Driver> drivers;
         try {
             LocalExecutionPlanner planner = new LocalExecutionPlanner(
@@ -550,12 +557,7 @@ public SourceProvider createSourceProvider() {
                 enrichLookupService,
                 lookupFromIndexService,
                 inferenceRunner,
-                new EsPhysicalOperationProviders(
-                    context.foldCtx(),
-                    contexts,
-                    searchService.getIndicesService().getAnalysis(),
-                    defaultDataPartitioning
-                ),
+                physicalOperationProviders,
                 contexts
             );
```

x-pack/plugin/esql/src/main/java/org/elasticsearch/xpack/esql/plugin/QueryPragmas.java

Lines changed: 33 additions & 0 deletions
```diff
@@ -73,6 +73,27 @@ public final class QueryPragmas implements Writeable {

     public static final Setting<ByteSizeValue> FOLD_LIMIT = Setting.memorySizeSetting("fold_limit", "5%");

+    /**
+     * Tuning parameter for deciding when to use the "merge" stored field loader.
+     * Think of it as "how similar to a sequential block of documents do I have to
+     * be before I'll use the merge reader?" So a value of {@code 1} means I have to
+     * be <strong>exactly</strong> a sequential block, like {@code 0, 1, 2, 3, .. 1299, 1300}.
+     * A value of {@code .1} means we'll use the sequential reader even if we only
+     * need one in ten documents.
+     * <p>
+     * The default value of this was experimentally derived using a
+     * <a href="https://gist.github.com/nik9000/4e84ff5a76b86890e540d4a381606a55">script</a>.
+     * And a little paranoia. A lower default value was looking good locally, but
+     * I'm concerned about the implications of effectively using this all the time.
+     * </p>
+     */
+    public static final Setting<Double> STORED_FIELDS_SEQUENTIAL_PROPORTION = Setting.doubleSetting(
+        "stored_fields_sequential_proportion",
+        0.10,
+        0,
+        1
+    );
+
     public static final Setting<MappedFieldType.FieldExtractPreference> FIELD_EXTRACT_PREFERENCE = Setting.enumSetting(
         MappedFieldType.FieldExtractPreference.class,
         "field_extract_preference",
@@ -120,6 +141,18 @@ public int taskConcurrency() {
         return TASK_CONCURRENCY.get(settings);
     }

+    /**
+     * Tuning parameter for deciding when to use the "merge" stored field loader.
+     * Think of it as "how similar to a sequential block of documents do I have to
+     * be before I'll use the merge reader?" So a value of {@code 1} means I have to
+     * be <strong>exactly</strong> a sequential block, like {@code 0, 1, 2, 3, .. 1299, 1300}.
+     * A value of {@code .1} means significant gaps are allowed and we'll still use the
+     * sequential reader.
+     */
+    public double storedFieldsSequentialProportion() {
+        return STORED_FIELDS_SEQUENTIAL_PROPORTION.get(settings);
+    }
+
     /**
      * Size of a page in entries with {@code 0} being a special value asking
      * to adaptively size based on the number of columns in the page.
```

x-pack/plugin/esql/src/test/java/org/elasticsearch/xpack/esql/optimizer/PhysicalPlanOptimizerTests.java

Lines changed: 1 addition & 1 deletion
```diff
@@ -7681,7 +7681,7 @@ private LocalExecutionPlanner.LocalExecutionPlan physicalOperationsFromPhysicalP
             null,
             null,
             null,
-            new EsPhysicalOperationProviders(FoldContext.small(), List.of(), null, DataPartitioning.AUTO),
+            new EsPhysicalOperationProviders(FoldContext.small(), List.of(), null, DataPartitioning.AUTO, 0.1),
             List.of()
         );
```

x-pack/plugin/esql/src/test/java/org/elasticsearch/xpack/esql/planner/LocalExecutionPlannerTests.java

Lines changed: 1 addition & 1 deletion
```diff
@@ -257,7 +257,7 @@ private Configuration config() {
     }

     private EsPhysicalOperationProviders esPhysicalOperationProviders(List<EsPhysicalOperationProviders.ShardContext> shardContexts) {
-        return new EsPhysicalOperationProviders(FoldContext.small(), shardContexts, null, DataPartitioning.AUTO);
+        return new EsPhysicalOperationProviders(FoldContext.small(), shardContexts, null, DataPartitioning.AUTO, 0.1);
     }

     private List<EsPhysicalOperationProviders.ShardContext> createShardContexts() throws IOException {
```
