Alternate approach to speed up ignored_source access #133839

jordan-powers · 2025-08-29T16:50:42Z

In #132142 and #132428 we split up ignored_source entries into distinct Lucene fields, then added an optimized field visitor to speed up retrieving unmapped values for INSIST_🐔.

However, since this approach creates a unique Lucene field for every ignored_source entry, the number of Lucene fields grows quickly when documents contain many distinct unmapped fields. This can cause significant slowdowns in indexing throughput and merge time.
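To illustrate the field-count growth, here is a minimal sketch that counts distinct stored-field names under both naming schemes. It is a plain Java model, not Elasticsearch or Lucene code: `FieldCountSketch` and its methods are hypothetical names, and the `_ignored_source.<name>` pattern is assumed from the `_ignored_source.*` fields mentioned in the benchmark comment.

```java
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

class FieldCountSketch {
    /** Distinct stored-field names when every unmapped field gets its own stored field. */
    static Set<String> perEntryFields(List<List<String>> docs) {
        Set<String> names = new TreeSet<>();
        for (List<String> unmappedFields : docs) {
            for (String field : unmappedFields) {
                names.add("_ignored_source." + field);
            }
        }
        return names;
    }

    /** Distinct stored-field names when all entries share a single stored field. */
    static Set<String> sharedField(List<List<String>> docs) {
        boolean anyUnmapped = docs.stream().anyMatch(d -> !d.isEmpty());
        return anyUnmapped ? Set.of("_ignored_source") : Set.of();
    }

    public static void main(String[] args) {
        // Each inner list holds the unmapped field names of one document.
        List<List<String>> docs = List.of(List.of("a", "b"), List.of("b", "c"), List.of("d"));
        System.out.println("per-entry fields: " + perEntryFields(docs).size());
        System.out.println("shared field:     " + sharedField(docs).size());
    }
}
```

In this model the per-entry count tracks the number of distinct unmapped names across all documents, while the shared-field count stays at one.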

This PR addresses those limitations by reverting to keeping all ignored_source entries under the same Lucene field. However, we still keep some of the speedups from that prior work by continuing to coalesce multiple ignored_source entries for the same field into a single entry, which allows the field visitor to exit early.
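A rough sketch of the coalescing step, assuming entries are simple (field name, value) pairs. The real entries are binary-encoded and the actual implementation differs; `CoalesceSketch` and `Entry` are hypothetical names used only for illustration.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class CoalesceSketch {
    /** One raw ignored-source value, tagged with the unmapped field it belongs to. */
    record Entry(String fieldName, String value) {}

    /** Groups values by field name so each field appears exactly once, in first-seen order. */
    static Map<String, List<String>> coalesce(List<Entry> entries) {
        Map<String, List<String>> byField = new LinkedHashMap<>();
        for (Entry e : entries) {
            byField.computeIfAbsent(e.fieldName(), k -> new ArrayList<>()).add(e.value());
        }
        return byField;
    }

    public static void main(String[] args) {
        List<Entry> raw = List.of(new Entry("a", "1"), new Entry("b", "2"), new Entry("a", "3"));
        // Three raw entries collapse into two coalesced entries, one per field name.
        System.out.println(coalesce(raw));
    }
}
```

Because each field name occurs at most once after coalescing, a reader that has found every requested name knows there is nothing more to look for.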

Unfortunately, we do lose some time compared to the original optimizations because now the field visitor cannot look at the fieldInfo to decide whether or not to visit a field, and it instead needs to actually visit and materialize each ignored_source entry before it can decide whether or not to keep it.
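The read-side trade-off can be sketched as follows. This is a simplified model rather than the Lucene `StoredFieldVisitor` API: `StoredValue`, `Result`, `load`, and the `name=value` payload encoding are all assumptions made for the example. It shows that the field name alone no longer identifies an entry, so each payload has to be decoded before it can be accepted or rejected, and that the scan can stop once every requested name has been found.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

class IgnoredSourceScanSketch {
    /** A stored value as the reader sees it: the stored-field name plus an opaque payload. */
    record StoredValue(String luceneField, String payload) {}

    /** The values found, plus how many payloads had to be decoded to find them. */
    record Result(Map<String, String> values, int decodedCount) {}

    static Result load(List<StoredValue> doc, Set<String> wanted) {
        Map<String, String> found = new LinkedHashMap<>();
        int decoded = 0;
        for (StoredValue sv : doc) {
            if (!sv.luceneField().equals("_ignored_source")) {
                continue; // other stored fields can still be skipped by name alone
            }
            // Every ignored-source entry shares one field name, so the payload
            // must be decoded before we know which unmapped field it holds.
            String[] nameAndValue = sv.payload().split("=", 2);
            decoded++;
            if (wanted.contains(nameAndValue[0])) {
                found.put(nameAndValue[0], nameAndValue[1]);
                if (found.size() == wanted.size()) {
                    break; // one entry per name after coalescing, so we can exit early
                }
            }
        }
        return new Result(found, decoded);
    }

    public static void main(String[] args) {
        List<StoredValue> doc = List.of(
            new StoredValue("_id", "x"),
            new StoredValue("_ignored_source", "a=1"),
            new StoredValue("_ignored_source", "b=2"),
            new StoredValue("_ignored_source", "c=3"));
        System.out.println(load(doc, Set.of("b")));
    }
}
```

In this model a lookup for a name that is absent decodes every entry, which is the cost the paragraph above describes.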

jordan-powers · 2025-09-02T16:05:36Z

Here are the results of running the INSIST_🐔 benchmark comparing against the current main (with the feature flag enabled, splitting up _ignored_source into multiple _ignored_source.* fields).

|                                                        Metric |                            Task |         Baseline |        Contender |             Diff |   Unit |   Diff % |
|--------------------------------------------------------------:|--------------------------------:|-----------------:|-----------------:|-----------------:|-------:|---------:|
|                    Cumulative indexing time of primary shards |                                 |    785.479       |    764.648       |    -20.8316      |    min |   -2.65% |
|             Min cumulative indexing time across primary shard |                                 |      0           |      0           |      0           |    min |    0.00% |
|          Median cumulative indexing time across primary shard |                                 |      8.35517     |      8.57839     |      0.22322     |    min |   +2.67% |
|             Max cumulative indexing time across primary shard |                                 |    181.718       |    162.163       |    -19.5545      |    min |  -10.76% |
|           Cumulative indexing throttle time of primary shards |                                 |      0           |      0           |      0           |    min |    0.00% |
|    Min cumulative indexing throttle time across primary shard |                                 |      0           |      0           |      0           |    min |    0.00% |
| Median cumulative indexing throttle time across primary shard |                                 |      0           |      0           |      0           |    min |    0.00% |
|    Max cumulative indexing throttle time across primary shard |                                 |      0           |      0           |      0           |    min |    0.00% |
|                       Cumulative merge time of primary shards |                                 |    225.36        |    224.233       |     -1.12738     |    min |   -0.50% |
|                      Cumulative merge count of primary shards |                                 |    406           |    403           |     -3           |        |   -0.74% |
|                Min cumulative merge time across primary shard |                                 |      0           |      0           |      0           |    min |    0.00% |
|             Median cumulative merge time across primary shard |                                 |      1.41965     |      1.33246     |     -0.08719     |    min |   -6.14% |
|                Max cumulative merge time across primary shard |                                 |     59.218       |     59.0345      |     -0.1835      |    min |   -0.31% |
|              Cumulative merge throttle time of primary shards |                                 |     68.2949      |     66.5924      |     -1.70247     |    min |   -2.49% |
|       Min cumulative merge throttle time across primary shard |                                 |      0           |      0           |      0           |    min |    0.00% |
|    Median cumulative merge throttle time across primary shard |                                 |      0.395342    |      0.3402      |     -0.05514     |    min |  -13.95% |
|       Max cumulative merge throttle time across primary shard |                                 |     17.6527      |     19.0972      |      1.44452     |    min |   +8.18% |
|                     Cumulative refresh time of primary shards |                                 |      9.08145     |      8.82248     |     -0.25897     |    min |   -2.85% |
|                    Cumulative refresh count of primary shards |                                 |   6254           |   6110           |   -144           |        |   -2.30% |
|              Min cumulative refresh time across primary shard |                                 |      0           |      0           |      0           |    min |    0.00% |
|           Median cumulative refresh time across primary shard |                                 |      0.049325    |      0.05675     |      0.00743     |    min |  +15.05% |
|              Max cumulative refresh time across primary shard |                                 |      2.57865     |      2.64482     |      0.06617     |    min |   +2.57% |
|                       Cumulative flush time of primary shards |                                 |    142.579       |    140.728       |     -1.85113     |    min |   -1.30% |
|                      Cumulative flush count of primary shards |                                 |   5805           |   5662           |   -143           |        |   -2.46% |
|                Min cumulative flush time across primary shard |                                 |      6.66667e-05 |      6.66667e-05 |      0           |    min |    0.00% |
|             Median cumulative flush time across primary shard |                                 |      1.83419     |      1.79902     |     -0.03517     |    min |   -1.92% |
|                Max cumulative flush time across primary shard |                                 |     26.2029      |     27.0796      |      0.87673     |    min |   +3.35% |
|                                       Total Young Gen GC time |                                 |    273.829       |    267.461       |     -6.368       |      s |   -2.33% |
|                                      Total Young Gen GC count |                                 |  15053           |  19602           |   4549           |        |  +30.22% |
|                                         Total Old Gen GC time |                                 |      0           |      0           |      0           |      s |    0.00% |
|                                        Total Old Gen GC count |                                 |      0           |      0           |      0           |        |    0.00% |
|                                                  Dataset size |                                 |     52.3843      |     52.3382      |     -0.04602     |     GB |   -0.09% |
|                                                    Store size |                                 |     52.3843      |     52.3382      |     -0.04602     |     GB |   -0.09% |
|                                                 Translog size |                                 |      4.09782e-06 |      4.09782e-06 |      0           |     GB |    0.00% |
|                                        Heap used for segments |                                 |      0           |      0           |      0           |     MB |    0.00% |
|                                      Heap used for doc values |                                 |      0           |      0           |      0           |     MB |    0.00% |
|                                           Heap used for terms |                                 |      0           |      0           |      0           |     MB |    0.00% |
|                                           Heap used for norms |                                 |      0           |      0           |      0           |     MB |    0.00% |
|                                          Heap used for points |                                 |      0           |      0           |      0           |     MB |    0.00% |
|                                   Heap used for stored fields |                                 |      0           |      0           |      0           |     MB |    0.00% |
|                                                 Segment count |                                 |    969           |    886           |    -83           |        |   -8.57% |
|                                   Total Ingest Pipeline count |                                 |      4.88622e+08 |      4.88622e+08 |      0           |        |    0.00% |
|                                    Total Ingest Pipeline time |                                 |      1.64616e+07 |      1.76306e+07 |      1.16897e+06 |     ms |   +7.10% |
|                                  Total Ingest Pipeline failed |                                 |      0           |      0           |      0           |        |    0.00% |
|                                                Min Throughput |                insert-pipelines |      6.23734     |      6.46784     |      0.2305      |  ops/s |   +3.70% |
|                                               Mean Throughput |                insert-pipelines |      6.23734     |      6.46784     |      0.2305      |  ops/s |   +3.70% |
|                                             Median Throughput |                insert-pipelines |      6.23734     |      6.46784     |      0.2305      |  ops/s |   +3.70% |
|                                                Max Throughput |                insert-pipelines |      6.23734     |      6.46784     |      0.2305      |  ops/s |   +3.70% |
|                                      100th percentile latency |                insert-pipelines |   2352.06        |   2270.18        |    -81.8831      |     ms |   -3.48% |
|                                 100th percentile service time |                insert-pipelines |   2352.06        |   2270.18        |    -81.8831      |     ms |   -3.48% |
|                                                    error rate |                insert-pipelines |      0           |      0           |      0           |      % |    0.00% |
|                                                Min Throughput |                      insert-ilm |     17.4942      |     19.5736      |      2.07938     |  ops/s |  +11.89% |
|                                               Mean Throughput |                      insert-ilm |     17.4942      |     19.5736      |      2.07938     |  ops/s |  +11.89% |
|                                             Median Throughput |                      insert-ilm |     17.4942      |     19.5736      |      2.07938     |  ops/s |  +11.89% |
|                                                Max Throughput |                      insert-ilm |     17.4942      |     19.5736      |      2.07938     |  ops/s |  +11.89% |
|                                      100th percentile latency |                      insert-ilm |     56.2021      |     50.0109      |     -6.19124     |     ms |  -11.02% |
|                                 100th percentile service time |                      insert-ilm |     56.2021      |     50.0109      |     -6.19124     |     ms |  -11.02% |
|                                                    error rate |                      insert-ilm |      0           |      0           |      0           |      % |    0.00% |
|                                      100th percentile latency | update-custom-package-templates |     24.8715      |     13.6423      |    -11.2291      |     ms |  -45.15% |
|                                 100th percentile service time | update-custom-package-templates |     24.8715      |     13.6423      |    -11.2291      |     ms |  -45.15% |
|                                                    error rate | update-custom-package-templates |      0           |      0           |      0           |      % |    0.00% |
|                                                Min Throughput |                      bulk-index |   1022.43        |   1143.53        |    121.098       | docs/s |  +11.84% |
|                                               Mean Throughput |                      bulk-index |  38045           |  39286.2         |   1241.2         | docs/s |   +3.26% |
|                                             Median Throughput |                      bulk-index |  37923           |  39170           |   1246.96        | docs/s |   +3.29% |
|                                                Max Throughput |                      bulk-index |  39995.6         |  43144           |   3148.47        | docs/s |   +7.87% |
|                                       50th percentile latency |                      bulk-index |   1331.6         |   1283.84        |    -47.756       |     ms |   -3.59% |
|                                       90th percentile latency |                      bulk-index |   2203.3         |   2183.56        |    -19.7393      |     ms |   -0.90% |
|                                       99th percentile latency |                      bulk-index |   3556.75        |   3432.9         |   -123.855       |     ms |   -3.48% |
|                                     99.9th percentile latency |                      bulk-index |   8592.96        |   9080.04        |    487.081       |     ms |   +5.67% |
|                                    99.99th percentile latency |                      bulk-index |  12180.6         |  12142.9         |    -37.7469      |     ms |   -0.31% |
|                                      100th percentile latency |                      bulk-index |  16025.5         |  16206.7         |    181.283       |     ms |   +1.13% |
|                                  50th percentile service time |                      bulk-index |   1335.2         |   1282.53        |    -52.6771      |     ms |   -3.95% |
|                                  90th percentile service time |                      bulk-index |   2203.11        |   2181.83        |    -21.2785      |     ms |   -0.97% |
|                                  99th percentile service time |                      bulk-index |   3556.65        |   3385.56        |   -171.092       |     ms |   -4.81% |
|                                99.9th percentile service time |                      bulk-index |   8581.63        |   9086.45        |    504.821       |     ms |   +5.88% |
|                               99.99th percentile service time |                      bulk-index |  12210.8         |  12132.9         |    -77.9286      |     ms |   -0.64% |
|                                 100th percentile service time |                      bulk-index |  16025.5         |  16206.7         |    181.283       |     ms |   +1.13% |
|                                                    error rate |                      bulk-index |      0           |      0           |      0           |      % |    0.00% |
|                                                Min Throughput |                       limit_500 |      7.65503     |     15.9407      |      8.28565     |  ops/s | +108.24% |
|                                               Mean Throughput |                       limit_500 |     13.6682      |     15.9407      |      2.27252     |  ops/s |  +16.63% |
|                                             Median Throughput |                       limit_500 |     13.6682      |     15.9407      |      2.27252     |  ops/s |  +16.63% |
|                                                Max Throughput |                       limit_500 |     19.6813      |     15.9407      |     -3.74061     |  ops/s |  -19.01% |
|                                       50th percentile latency |                       limit_500 |     16.3158      |     15.9017      |     -0.41408     |     ms |   -2.54% |
|                                       90th percentile latency |                       limit_500 |     19.9405      |     21.5721      |      1.63163     |     ms |   +8.18% |
|                                       99th percentile latency |                       limit_500 |     24.5963      |     25.2077      |      0.61139     |     ms |   +2.49% |
|                                      100th percentile latency |                       limit_500 |     26.6793      |     25.8568      |     -0.8225      |     ms |   -3.08% |
|                                  50th percentile service time |                       limit_500 |     16.3158      |     15.9017      |     -0.41408     |     ms |   -2.54% |
|                                  90th percentile service time |                       limit_500 |     19.9405      |     21.5721      |      1.63163     |     ms |   +8.18% |
|                                  99th percentile service time |                       limit_500 |     24.5963      |     25.2077      |      0.61139     |     ms |   +2.49% |
|                                 100th percentile service time |                       limit_500 |     26.6793      |     25.8568      |     -0.8225      |     ms |   -3.08% |
|                                                    error rate |                       limit_500 |      0           |      0           |      0           |      % |    0.00% |
|                                                Min Throughput |                       chicken_1 |      6.70301     |      6.78201     |      0.079       |  ops/s |   +1.18% |
|                                               Mean Throughput |                       chicken_1 |      6.70301     |      6.78201     |      0.079       |  ops/s |   +1.18% |
|                                             Median Throughput |                       chicken_1 |      6.70301     |      6.78201     |      0.079       |  ops/s |   +1.18% |
|                                                Max Throughput |                       chicken_1 |      6.70301     |      6.78201     |      0.079       |  ops/s |   +1.18% |
|                                       50th percentile latency |                       chicken_1 |     43.1182      |     47.1608      |      4.0426      |     ms |   +9.38% |
|                                      100th percentile latency |                       chicken_1 |     47.8871      |     52.0452      |      4.15809     |     ms |   +8.68% |
|                                  50th percentile service time |                       chicken_1 |     43.1182      |     47.1608      |      4.0426      |     ms |   +9.38% |
|                                 100th percentile service time |                       chicken_1 |     47.8871      |     52.0452      |      4.15809     |     ms |   +8.68% |
|                                                    error rate |                       chicken_1 |      0           |      0           |      0           |      % |    0.00% |
|                                                Min Throughput |                       chicken_2 |      0.0034697   |      0.00193971  |     -0.00153     |  ops/s |  -44.10% |
|                                               Mean Throughput |                       chicken_2 |      0.00349247  |      0.00194296  |     -0.00155     |  ops/s |  -44.37% |
|                                             Median Throughput |                       chicken_2 |      0.00349608  |      0.00194315  |     -0.00155     |  ops/s |  -44.42% |
|                                                Max Throughput |                       chicken_2 |      0.00350761  |      0.00194476  |     -0.00156     |  ops/s |  -44.56% |
|                                       50th percentile latency |                       chicken_2 | 282029           | 512244           | 230214           |     ms |  +81.63% |
|                                      100th percentile latency |                       chicken_2 | 282287           | 514959           | 232672           |     ms |  +82.42% |
|                                  50th percentile service time |                       chicken_2 | 282029           | 512244           | 230214           |     ms |  +81.63% |
|                                 100th percentile service time |                       chicken_2 | 282287           | 514959           | 232672           |     ms |  +82.42% |
|                                                    error rate |                       chicken_2 |      0           |      0           |      0           |      % |    0.00% |
|                                                Min Throughput |                       chicken_3 |      0.00186763  |      0.00198938  |      0.00012     |  ops/s |   +6.52% |
|                                               Mean Throughput |                       chicken_3 |      0.00187159  |      0.00202095  |      0.00015     |  ops/s |   +7.98% |
|                                             Median Throughput |                       chicken_3 |      0.00187049  |      0.00201753  |      0.00015     |  ops/s |   +7.86% |
|                                                Max Throughput |                       chicken_3 |      0.00187785  |      0.00206066  |      0.00018     |  ops/s |   +9.73% |
|                                       50th percentile latency |                       chicken_3 | 538622           | 518768           | -19854.4         |     ms |   -3.69% |
|                                      100th percentile latency |                       chicken_3 | 539738           | 523966           | -15771.6         |     ms |   -2.92% |
|                                  50th percentile service time |                       chicken_3 | 538622           | 518768           | -19854.4         |     ms |   -3.69% |
|                                 100th percentile service time |                       chicken_3 | 539738           | 523966           | -15771.6         |     ms |   -2.92% |
|                                                    error rate |                       chicken_3 |      0           |      0           |      0           |      % |    0.00% |
|                                                Min Throughput |            chicken_3_with_where |     27.5639      |     31.0218      |      3.45789     |  ops/s |  +12.55% |
|                                               Mean Throughput |            chicken_3_with_where |     27.5639      |     31.0218      |      3.45789     |  ops/s |  +12.55% |
|                                             Median Throughput |            chicken_3_with_where |     27.5639      |     31.0218      |      3.45789     |  ops/s |  +12.55% |
|                                                Max Throughput |            chicken_3_with_where |     27.5639      |     31.0218      |      3.45789     |  ops/s |  +12.55% |
|                                       50th percentile latency |            chicken_3_with_where |     11.006       |      8.50661     |     -2.49936     |     ms |  -22.71% |
|                                      100th percentile latency |            chicken_3_with_where |     12.363       |     16.4753      |      4.11232     |     ms |  +33.26% |
|                                  50th percentile service time |            chicken_3_with_where |     11.006       |      8.50661     |     -2.49936     |     ms |  -22.71% |
|                                 100th percentile service time |            chicken_3_with_where |     12.363       |     16.4753      |      4.11232     |     ms |  +33.26% |
|                                                    error rate |            chicken_3_with_where |      0           |      0           |      0           |      % |    0.00% |
|                                                Min Throughput |                       chicken_4 |      0.0146495   |      0.00609788  |     -0.00855     |  ops/s |  -58.37% |
|                                               Mean Throughput |                       chicken_4 |      0.0148707   |      0.00611548  |     -0.00876     |  ops/s |  -58.88% |
|                                             Median Throughput |                       chicken_4 |      0.014836    |      0.00611294  |     -0.00872     |  ops/s |  -58.80% |
|                                                Max Throughput |                       chicken_4 |      0.0151862   |      0.00613525  |     -0.00905     |  ops/s |  -59.60% |
|                                       50th percentile latency |                       chicken_4 |  61668.4         | 161353           |  99684.6         |     ms | +161.65% |
|                                      100th percentile latency |                       chicken_4 |  82734.9         | 168012           |  85277.5         |     ms | +103.07% |
|                                  50th percentile service time |                       chicken_4 |  61668.4         | 161353           |  99684.6         |     ms | +161.65% |
|                                 100th percentile service time |                       chicken_4 |  82734.9         | 168012           |  85277.5         |     ms | +103.07% |
|                                                    error rate |                       chicken_4 |      0           |      0           |      0           |      % |    0.00% |

Summary:

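|                   Metric | limit_500 | chicken_1 | chicken_2 | chicken_3 | chicken_3_with_where | chicken_4 |
|-------------------------:|----------:|----------:|----------:|----------:|---------------------:|----------:|
|  50th percentile latency |    -2.54% |    +9.38% |   +81.63% |    -3.69% |              -22.71% |  +161.65% |
| 100th percentile latency |    -3.08% |    +8.68% |   +82.42% |    -2.92% |              +33.26% |  +103.07% |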

Looks like the new approach is comparable for limit_500, chicken_3, and chicken_3_with_where, slightly worse for chicken_1, and much worse for chicken_2 and chicken_4.
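For reading the percentages: the Diff % values in the table are consistent with (contender - baseline) / baseline, so a positive latency diff means the contender is slower. A small sketch under that assumption, using rounded values from the table (`DiffPercentSketch` and its methods are illustrative names, not part of the benchmark tooling):

```java
import java.util.Locale;

class DiffPercentSketch {
    /** Relative change of the contender against the baseline, in percent. */
    static double diffPercent(double baseline, double contender) {
        return (contender - baseline) / baseline * 100.0;
    }

    /** Formats with an explicit sign and two decimals, matching the table's style. */
    static String format(double percent) {
        return String.format(Locale.ROOT, "%+.2f%%", percent);
    }

    public static void main(String[] args) {
        // 50th percentile latency in ms, baseline then contender, as shown in the table.
        System.out.println("chicken_2: " + format(diffPercent(282029, 512244)));
        System.out.println("limit_500: " + format(diffPercent(16.3158, 15.9017)));
    }
}
```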

elasticsearchmachine · 2025-09-04T01:48:00Z

Pinging @elastic/es-storage-engine (Team:StorageEngine)

martijnvg

LGTM, thanks Jordan!

martijnvg · 2025-09-05T09:29:04Z

...va/org/elasticsearch/xpack/core/security/authz/accesscontrol/SecurityIndexReaderWrapper.java

            }

-            return permissions.getFieldPermissions().filter(wrappedReader);
+            var indexVersionCreated = searchExecutionContextProvider.apply(shardId).indexVersionCreated();


Passing down the index version here is a little intrusive, but I don't think it is too bad: passing the index version down is a common pattern elsewhere in the codebase.

The alternative we talked about, which would store coalesced ignored source in a different field, would also be an intrusive change.

So I'm okay with this approach.
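As a rough illustration of why the created index version might be needed at this point: the assumption here, not stated above, is that it tells the reader which ignored-source layout an index was written with. Every name in this sketch is hypothetical, including the `Layout` enum and the `COALESCED_SINCE` constant, and the version id is made up.

```java
class LayoutByIndexVersionSketch {
    /** Hypothetical layouts: one entry per value (older indices) or one coalesced entry per field. */
    enum Layout { LEGACY_UNCOALESCED, COALESCED }

    /** Made-up version id at which coalescing starts; not the real Elasticsearch constant. */
    static final int COALESCED_SINCE = 9_100_000;

    /** Picks the layout a reader should expect for an index created at the given version id. */
    static Layout forIndexVersion(int indexVersionCreatedId) {
        return indexVersionCreatedId >= COALESCED_SINCE ? Layout.COALESCED : Layout.LEGACY_UNCOALESCED;
    }

    public static void main(String[] args) {
        System.out.println(forIndexVersion(COALESCED_SINCE - 1));
        System.out.println(forIndexVersion(COALESCED_SINCE));
    }
}
```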

jordan-powers added 2 commits August 29, 2025 12:39

Store all ignored source stored fields under same name

9cddc15

Fix tests

f11cc4d

jordan-powers requested a review from martijnvg August 29, 2025 16:50

jordan-powers self-assigned this Aug 29, 2025

jordan-powers added >non-issue Team:StorageEngine :StorageEngine/Mapping The storage related side of mappings v9.2.0 labels Aug 29, 2025

Fix SecurityIndexReaderWrapperUnitTests

f6a68e9

jordan-powers added 2 commits September 3, 2025 15:26

Move field subset filtering into IgnoredSourceFormat

f5861a0

Merge remote-tracking branch 'upstream/main' into ignored-source-avoid-fieldinfo-explosion

eb7a83a

jordan-powers marked this pull request as ready for review September 4, 2025 01:47

elasticsearchmachine added the serverless-linked Added by automation, don't add manually label Sep 4, 2025

martijnvg approved these changes Sep 5, 2025

Merge remote-tracking branch 'upstream/main' into ignored-source-avoid-fieldinfo-explosion

c907f97

jordan-powers enabled auto-merge (squash) September 5, 2025 17:40

jordan-powers merged commit 84470e9 into elastic:main Sep 5, 2025
33 checks passed

This was referenced Sep 8, 2025

Special field visitor for _ignored_source #131885

Closed

Use an unique stored field name for each ignored source entry #130919

Closed

Investigate using specialized field visitor when loading ignored source. #130886

Closed

jordan-powers mentioned this pull request Sep 18, 2025

Remove feature flag for coalesced ignored source #135039

Merged

jordan-powers added a commit that referenced this pull request Sep 26, 2025

Remove feature flag for coalesced ignored source (#135039)

322e5fe

Follow-up to #133839 to remove the feature flag and enable the feature in production.

jordan-powers deleted the ignored-source-avoid-fieldinfo-explosion branch October 1, 2025 15:02
