Store ignored source in unique stored fields per entry #132142
Conversation
martijnvg
left a comment
Thanks @jordan-powers! At a high level this looks good.
I wonder what the impact is on the elastic/logs with unmapped fields (insist_chicken) benchmark. Did you already have a chance to check this?
I also think we should introduce a bwc test suite for synthetic source with many unmapped fields that would use synthetic source / logsdb. I think that is a bit under-tested at the moment. Something like LogsdbIndexingRollingUpgradeIT but with unmapped fields.
    return fieldsToLoadForSyntheticSource;
}

public enum IgnoredFieldsLoader {
👍 to encapsulating synthesizing logic here
    };
}

return new CustomFieldsVisitor(fields, loadSource);
With this per field ignored source this can be improved in a followup.
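For illustration, here is a minimal, self-contained sketch of the kind of improvement this enables (this is not the actual follow-up; the class name and the collected-map shape are assumptions). With one stored field per ignored-source entry, a Lucene StoredFieldVisitor can decide from the field name alone whether an entry is needed, without decoding any values:

    import org.apache.lucene.index.FieldInfo;
    import org.apache.lucene.index.StoredFieldVisitor;

    import java.util.HashMap;
    import java.util.Map;
    import java.util.Set;

    // Hypothetical visitor: needsField filters on the "_ignored_source.<field>" name,
    // so entries for unrequested fields are never materialized.
    class PerFieldIgnoredSourceVisitor extends StoredFieldVisitor {
        private static final String PREFIX = "_ignored_source.";
        private final Set<String> requestedFields;
        private final Map<String, byte[]> collected = new HashMap<>();

        PerFieldIgnoredSourceVisitor(Set<String> requestedFields) {
            this.requestedFields = requestedFields;
        }

        @Override
        public Status needsField(FieldInfo fieldInfo) {
            if (fieldInfo.name.startsWith(PREFIX)) {
                // Keep only the entries for fields we actually need.
                return requestedFields.contains(fieldInfo.name.substring(PREFIX.length())) ? Status.YES : Status.NO;
            }
            return Status.NO;
        }

        @Override
        public void binaryField(FieldInfo fieldInfo, byte[] value) {
            // Decoding of the entry itself is left out of this sketch.
            collected.put(fieldInfo.name.substring(PREFIX.length()), value);
        }

        Map<String, byte[]> collected() {
            return collected;
        }
    }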
IgnoredSourceFieldMapper.NameValue nameValue = IgnoredSourceFieldMapper.decode(value);
if (fieldNames.contains(nameValue.name())) {
    valuesForFieldAndParents.computeIfAbsent(nameValue.name(), k -> new ArrayList<>()).add(nameValue);
if (indexCreatedVersion.onOrAfter(IndexVersions.IGNORED_SOURCE_FIELDS_PER_ENTRY)) {
Maybe it makes sense to have two IgnoredSourceRowStrideReader implementations? Something like LegacySingleIgnoredSourceRowStrideReader and PerFieldIgnoredSourceRowStrideReader?
I agree that it's messy to check the IndexVersion here. But instead of creating a second IgnoredSourceRowStrideReader implementation, I instead opted to re-use the IgnoredFieldsLoader I added for the SourceLoader.
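To make the difference between the two loading strategies concrete, here is a small self-contained sketch (the toy "name:value" encoding and method names are assumptions, not the ES implementation): the legacy single-field format has to decode every entry under "_ignored_source" and filter by name, while the per-field format can address "_ignored_source.<field>" directly.

    import java.util.ArrayList;
    import java.util.List;
    import java.util.Map;

    final class IgnoredFieldsLoaderSketch {

        // Legacy single stored field: decode every value and keep only the ones we want.
        static List<String> loadLegacy(Map<String, List<String>> storedFields, String field) {
            List<String> values = new ArrayList<>();
            for (String encoded : storedFields.getOrDefault("_ignored_source", List.of())) {
                String name = encoded.substring(0, encoded.indexOf(':')); // toy decoding
                if (name.equals(field)) {
                    values.add(encoded.substring(encoded.indexOf(':') + 1));
                }
            }
            return values;
        }

        // Per-field stored fields: a direct lookup, no decoding of unrelated entries.
        static List<String> loadPerField(Map<String, List<String>> storedFields, String field) {
            return storedFields.getOrDefault("_ignored_source." + field, List.of());
        }
    }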
Ok, I've run the insist_🐔 benchmarks comparing this branch to baseline, here are the results: baseline: 68eff34

Seems to be already an overall positive change, likely because this change allows the
Pinging @elastic/es-storage-engine (Team:StorageEngine)
lkts
left a comment
I haven't reviewed in depth (e.g. I am not familiar with FieldSubsetReader) but it looks good to me.
Looks like chicken_4 actually regressed btw? Do you know why?
}

private static class IgnoredSourceRowStrideReader<T> implements RowStrideReader {
    // Contains name of the field and all its parents
nit: move the comment together with the line it comments
    return NAME + "." + fieldName;
}

static BytesRef encodeMulti(List<NameValue> values) {
nit: encodeMultipleValuesForField?
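For context, a hedged sketch of what coalescing could look like: multiple already-encoded entries for one field are packed into a single stored-field payload with a count and length prefixes. The actual encoding used by IgnoredSourceFieldMapper may differ; this is only an illustration of the idea.

    import java.io.ByteArrayOutputStream;
    import java.io.DataOutputStream;
    import java.io.IOException;
    import java.util.List;

    final class IgnoredSourceEncodingSketch {
        // Coalesce several encoded values for the same field into one byte[] payload.
        static byte[] encodeMulti(List<byte[]> encodedValues) throws IOException {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            try (DataOutputStream out = new DataOutputStream(bytes)) {
                out.writeInt(encodedValues.size());   // number of coalesced entries
                for (byte[] value : encodedValues) {
                    out.writeInt(value.length);       // length prefix
                    out.write(value);                 // the already-encoded entry
                }
            }
            return bytes.toByteArray();
        }
    }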
@Override
public void writeIgnoredFields(Collection<NameValue> ignoredFieldValues) {
    throw new UnsupportedOperationException();
nit: is it possible to split the write and read code paths and avoid the UnsupportedOperationException?
It's not immediately obvious to me how to split it up. Maybe I can change the UnsupportedOperationException into an assert false, so that in production it's just a no-op instead of a fatal error?
I don't really have a preference, I think it's surprising for the user of this API in either case. Thanks for considering it; let's move on if it's not viable.
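For reference, the alternative discussed above looks roughly like this (a minimal sketch with illustrative names, not the actual class): an assert trips in tests run with assertions enabled but degrades to a no-op in production, whereas throwing fails hard in both.

    import java.util.Collection;
    import java.util.List;

    interface IgnoredFieldsWriter {
        void writeIgnoredFields(Collection<String> ignoredFieldValues);
    }

    class ReadOnlyIgnoredFields implements IgnoredFieldsWriter {
        @Override
        public void writeIgnoredFields(Collection<String> ignoredFieldValues) {
            // Fails loudly in tests (-ea), silently ignored in production builds.
            assert false : "this format is read-only; writes are not supported";
        }

        public static void main(String[] args) {
            new ReadOnlyIgnoredFields().writeIgnoredFields(List.of("field_a"));
        }
    }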
Thanks for the review, Sasha!
Looking at the nightlies, chicken_4 is really noisy, with 50th percentile latency ranging from 86,605ms to 816,208ms over the past 30 days. So I think the regression here is just noise.
martijnvg
left a comment
Two minor comments, otherwise LGTM.
this.reader = reader;
this.fieldName = fieldName;
this.ignoredSourceFormat = ignoredSourceFormat;
this.fieldPaths = splitIntoFieldPaths(fieldName);
Nice, now we do this once per field and shard instead of once per field and segment.
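As an illustration of that precomputation (the helper name and exact output are assumptions, not the ES code), splitting a dotted field name into itself plus all of its parent paths once, at construction time, could look like:

    import java.util.ArrayList;
    import java.util.List;

    final class FieldPathsSketch {
        static List<String> splitIntoFieldPaths(String fieldName) {
            List<String> paths = new ArrayList<>();
            paths.add(fieldName);
            int dot = fieldName.lastIndexOf('.');
            while (dot > 0) {
                fieldName = fieldName.substring(0, dot);
                paths.add(fieldName);            // e.g. "a.b.c" -> ["a.b.c", "a.b", "a"]
                dot = fieldName.lastIndexOf('.');
            }
            return paths;
        }

        public static void main(String[] args) {
            // Prints [parent.child.leaf, parent.child, parent]
            System.out.println(splitIntoFieldPaths("parent.child.leaf"));
        }
    }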
    Map<String, List<Object>> storedFields
);

public abstract Map<String, List<IgnoredSourceFieldMapper.NameValue>> loadSingleIgnoredField(
++ to making the distinction between loading everything and loading just one field (which works better for ES|QL)
{
- Automaton automaton = Automatons.patterns(Arrays.asList("fieldA", IgnoredSourceFieldMapper.NAME));
+ Automaton automaton = Automatons.patterns(Arrays.asList("fieldA", IgnoredSourceFieldMapper.ignoredFieldName("*")));
Maybe randomly set the index version in these tests so that we keep unit testing both the single and per-field ignored source formats here?
}

public static IgnoredSourceFormat ignoredSourceFormat(IndexVersion indexCreatedVersion) {
    return indexCreatedVersion.onOrAfter(IndexVersions.IGNORED_SOURCE_FIELDS_PER_ENTRY)
Maybe add a feature flag here for per-field ignored source? The idea would be to let nightly benchmarks run the new code for ~1 week to get a better picture of the effect on other benchmarks that use synthetic source, and after that period remove the feature flag.
I know this is more work (removing the FF and then adding another index version), but I lean towards being careful. Wdyt?
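A self-contained sketch of the gating being proposed (all names, the version constant, and the system-property mechanism are illustrative stand-ins for ES's IndexVersion and feature-flag machinery): the new per-field format is chosen only when the index version is new enough and the flag opts in.

    final class IgnoredSourceFormatGateSketch {

        enum Format { SINGLE_IGNORED_SOURCE, PER_FIELD_IGNORED_SOURCE }

        // Stand-in for IndexVersions.IGNORED_SOURCE_FIELDS_PER_ENTRY.
        static final int IGNORED_SOURCE_FIELDS_PER_ENTRY_VERSION = 9_000_001;

        static Format ignoredSourceFormat(int indexCreatedVersion, boolean featureFlagEnabled) {
            return featureFlagEnabled && indexCreatedVersion >= IGNORED_SOURCE_FIELDS_PER_ENTRY_VERSION
                ? Format.PER_FIELD_IGNORED_SOURCE
                : Format.SINGLE_IGNORED_SOURCE;
        }

        public static void main(String[] args) {
            // Hypothetical flag, off by default; older indices always keep the single-field format.
            boolean flagEnabled = Boolean.getBoolean("per_field_ignored_source_feature_flag");
            System.out.println(ignoredSourceFormat(9_000_002, flagEnabled));
        }
    }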
var document = reader.storedFields().document(0);
Set<String> storedFieldNames = new LinkedHashSet<>(document.getFields().stream().map(IndexableField::name).toList());
- assertThat(storedFieldNames, contains("_ignored_source"));
+ assertThat(storedFieldNames, contains("_ignored_source.parent.field"));
Maybe also randomly run with an index version from before this change and with this change, to test both single and multiple ignored source stored fields?
This is an integration test, so it doesn't seem that I'm able to specify an index version when creating the index.
This reverts commit 6d6b9f0.
* upstream/main: (32 commits)
  Speed up loading keyword fields with index sorts (elastic#132950)
  Mute org.elasticsearch.index.mapper.LongFieldMapperTests testSyntheticSourceWithTranslogSnapshot elastic#132964
  Simplify EsqlSession (elastic#132848)
  Implement WriteLoadConstraintDecider#canAllocate (elastic#132041)
  Mute org.elasticsearch.test.rest.yaml.CcsCommonYamlTestSuiteIT test {p0=search/400_synthetic_source/_doc_count} elastic#132965
  Switch to PR-based benchmark pipeline defined in ES repo (elastic#132941)
  Breakdown undesired allocations by shard routing role (elastic#132235)
  Implement v_magnitude function (elastic#132765)
  Introduce execution location marker for better handling of remote/local compatibility (elastic#132205)
  Mute org.elasticsearch.cluster.ClusterInfoServiceIT testMaxQueueLatenciesInClusterInfo elastic#132957
  Unmuting simulate index data stream mapping overrides yaml rest test (elastic#132946)
  Remove CrossClusterCancellationIT.createLocalIndex() (elastic#132952)
  Mute org.elasticsearch.index.mapper.LongFieldMapperTests testFetch elastic#132956
  Fix failing UT by adding a required capability (elastic#132947)
  Precompute the BitsetCacheKey hashCode (elastic#132875)
  Adding simulate ingest effective mapping (elastic#132833)
  Mute org.elasticsearch.index.mapper.LongFieldMapperTests testFetchMany elastic#132948
  Rename skipping logic to remove hard link to skip_unavailable (elastic#132861)
  Store ignored source in unique stored fields per entry (elastic#132142)
  Add random tests with match_only_text multi-field (elastic#132380)
  ...
This PR does the following:
* Stores each _ignored_source entry in a unique stored field called _ignored_source.<field_name>
* Coalesces multiple entries for the same field name into a single Lucene stored field
* Adds the WildcardFieldMaskingReader so that when running synthetic source roundtrip tests, we can ignore differences in fields that match the pattern _ignored_source.*
For now, these changes are disabled by default behind a feature flag.
…-stats
* upstream/main: (36 commits)
  Fix reproducability of builds against Java EA versions (elastic#132847)
  Speed up loading keyword fields with index sorts (elastic#132950)
  Mute org.elasticsearch.index.mapper.LongFieldMapperTests testSyntheticSourceWithTranslogSnapshot elastic#132964
  Simplify EsqlSession (elastic#132848)
  Implement WriteLoadConstraintDecider#canAllocate (elastic#132041)
  Mute org.elasticsearch.test.rest.yaml.CcsCommonYamlTestSuiteIT test {p0=search/400_synthetic_source/_doc_count} elastic#132965
  Switch to PR-based benchmark pipeline defined in ES repo (elastic#132941)
  Breakdown undesired allocations by shard routing role (elastic#132235)
  Implement v_magnitude function (elastic#132765)
  Introduce execution location marker for better handling of remote/local compatibility (elastic#132205)
  Mute org.elasticsearch.cluster.ClusterInfoServiceIT testMaxQueueLatenciesInClusterInfo elastic#132957
  Unmuting simulate index data stream mapping overrides yaml rest test (elastic#132946)
  Remove CrossClusterCancellationIT.createLocalIndex() (elastic#132952)
  Mute org.elasticsearch.index.mapper.LongFieldMapperTests testFetch elastic#132956
  Fix failing UT by adding a required capability (elastic#132947)
  Precompute the BitsetCacheKey hashCode (elastic#132875)
  Adding simulate ingest effective mapping (elastic#132833)
  Mute org.elasticsearch.index.mapper.LongFieldMapperTests testFetchMany elastic#132948
  Rename skipping logic to remove hard link to skip_unavailable (elastic#132861)
  Store ignored source in unique stored fields per entry (elastic#132142)
  ...
In #132142 and #132428 we split up ignored_source entries into distinct Lucene fields, then added an optimized field visitor to speed up retrieving unmapped values for INSIST_🐔. However, since this approach creates a unique Lucene field for every ignored_source entry, we can very quickly end up with a lot of Lucene fields if there are many unique unmapped fields per document. This can cause significant slowdowns in indexing throughput and merge time. This PR addresses those limitations by reverting to keeping all ignored_source entries under the same Lucene field. However, we still keep some of the speedups from that prior work by continuing to coalesce multiple ignored_source entries for the same field into a single entry, allowing the field visitor to exit early. Unfortunately, we do lose some time compared to the original optimizations because the field visitor can no longer look at the fieldInfo to decide whether or not to visit a field; it instead needs to actually visit and materialize each ignored_source entry before it can decide whether or not to keep it.
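To illustrate the "exit early" point, here is a hedged, self-contained sketch (the class name and the decode step are placeholders, not the real encoding or visitor): because coalescing guarantees at most one entry per field, a visitor can stop scanning stored fields as soon as every requested field has been seen, even though all entries share the single _ignored_source field name.

    import org.apache.lucene.index.FieldInfo;
    import org.apache.lucene.index.StoredFieldVisitor;

    import java.nio.charset.StandardCharsets;
    import java.util.HashMap;
    import java.util.HashSet;
    import java.util.Map;
    import java.util.Set;

    class EarlyExitIgnoredSourceVisitor extends StoredFieldVisitor {
        private final Set<String> remaining;
        private final Map<String, byte[]> collected = new HashMap<>();

        EarlyExitIgnoredSourceVisitor(Set<String> requestedFields) {
            this.remaining = new HashSet<>(requestedFields);
        }

        @Override
        public Status needsField(FieldInfo fieldInfo) {
            if (remaining.isEmpty()) {
                return Status.STOP; // every requested field already found; skip the rest
            }
            // All entries share the "_ignored_source" field name, so we must materialize
            // the value itself to learn which field an entry belongs to.
            return "_ignored_source".equals(fieldInfo.name) ? Status.YES : Status.NO;
        }

        @Override
        public void binaryField(FieldInfo fieldInfo, byte[] value) {
            String fieldName = decodeFieldName(value); // placeholder for the real decoding
            if (remaining.remove(fieldName)) {
                collected.put(fieldName, value);
            }
        }

        private static String decodeFieldName(byte[] value) {
            // Toy "name:value" format, only for this sketch.
            return new String(value, StandardCharsets.UTF_8).split(":", 2)[0];
        }
    }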