jordan-powers
Contributor

In #132142 and #132428 we split up ignored_source entries into distinct Lucene fields, then added an optimized field visitor to speed up retrieving unmapped values for INSIST_🐔.

However, because that approach creates a unique Lucene field for every ignored_source entry, documents with many distinct unmapped fields can quickly produce a very large number of Lucene fields. This can significantly slow down indexing throughput and merge time.
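To make the field-count problem concrete, here is a toy model in plain Java (no Lucene dependency). It only counts distinct field names under the two layouts; the class and method names are hypothetical, and the `_ignored_source.<path>` naming simply follows the `_ignored_source.*` pattern mentioned in this PR.

```java
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

/** Toy model, not Elasticsearch code: counts distinct field names per layout. */
final class FieldCountSketch {
    static final String IGNORED_SOURCE = "_ignored_source";

    // Per-entry layout: one field name per distinct unmapped path.
    static Set<String> perEntryFieldNames(List<List<String>> docs) {
        Set<String> names = new LinkedHashSet<>();
        for (List<String> unmappedPaths : docs) {
            for (String path : unmappedPaths) {
                names.add(IGNORED_SOURCE + "." + path);
            }
        }
        return names;
    }

    // Single-field layout: every entry is stored under the same field name.
    static Set<String> singleFieldNames(List<List<String>> docs) {
        Set<String> names = new LinkedHashSet<>();
        for (List<String> unmappedPaths : docs) {
            if (!unmappedPaths.isEmpty()) {
                names.add(IGNORED_SOURCE);
            }
        }
        return names;
    }

    public static void main(String[] args) {
        // Three documents with partially overlapping unmapped paths.
        List<List<String>> docs = List.of(List.of("a", "b"), List.of("b", "c"), List.of("d"));
        System.out.println("per-entry fields: " + perEntryFieldNames(docs).size());
        System.out.println("single-field fields: " + singleFieldNames(docs).size());
    }
}
```

In this model the per-entry count grows with the number of distinct unmapped paths across all documents, while the single-field count stays at one.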

This PR addresses those limitations by going back to keeping all ignored_source entries under the same Lucene field. We still keep some of the speedups from that prior work by continuing to coalesce multiple ignored_source entries for the same field into a single entry, which allows the field visitor to exit early.

Unfortunately, we do lose some time compared to the original optimizations: the field visitor can no longer look at the fieldInfo to decide whether to visit a field. Instead, it has to visit and materialize each ignored_source entry before it can decide whether to keep it.
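The trade-off can be sketched in plain Java. This is a simplified model under stated assumptions, not the actual Elasticsearch implementation: the byte encoding (`[int nameLength][name][value]`), the comma-joined coalescing, and all class and method names are hypothetical. It shows why coalescing allows an early exit (each field name occurs at most once per document) and why every entry seen before that exit still has to be decoded.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Simplified model, not Elasticsearch code. */
final class IgnoredSourceSketch {
    // Hypothetical entry encoding: [int nameLength][name bytes][value bytes].
    static byte[] encode(String name, String value) {
        byte[] n = name.getBytes(StandardCharsets.UTF_8);
        byte[] v = value.getBytes(StandardCharsets.UTF_8);
        return ByteBuffer.allocate(4 + n.length + v.length).putInt(n.length).put(n).put(v).array();
    }

    static String decodeName(byte[] entry) {
        int len = ByteBuffer.wrap(entry).getInt();
        return new String(entry, 4, len, StandardCharsets.UTF_8);
    }

    static String decodeValue(byte[] entry) {
        int len = ByteBuffer.wrap(entry).getInt();
        return new String(entry, 4 + len, entry.length - 4 - len, StandardCharsets.UTF_8);
    }

    // Index time: merge all values of the same field into one entry,
    // so each field name appears at most once in the stored list.
    static List<byte[]> coalesce(List<Map.Entry<String, String>> pairs) {
        Map<String, StringBuilder> byName = new LinkedHashMap<>();
        for (Map.Entry<String, String> pair : pairs) {
            StringBuilder sb = byName.computeIfAbsent(pair.getKey(), k -> new StringBuilder());
            if (sb.length() > 0) {
                sb.append(',');
            }
            sb.append(pair.getValue());
        }
        List<byte[]> entries = new ArrayList<>();
        byName.forEach((name, values) -> entries.add(encode(name, values.toString())));
        return entries;
    }

    // Read time: all entries share one field name, so the name inside each
    // entry must be decoded before it can be kept or skipped. Returns how
    // many entries were decoded before all wanted fields were found.
    static int collect(List<byte[]> entries, Set<String> wanted, Map<String, String> out) {
        Set<String> remaining = new HashSet<>(wanted);
        int decoded = 0;
        for (byte[] entry : entries) {
            if (remaining.isEmpty()) {
                break; // early exit: safe because entries are coalesced
            }
            decoded++;
            String name = decodeName(entry);
            if (remaining.remove(name)) {
                out.put(name, decodeValue(entry));
            }
        }
        return decoded;
    }

    public static void main(String[] args) {
        List<byte[]> entries = coalesce(List.of(
            Map.entry("a", "1"), Map.entry("b", "2"), Map.entry("a", "3"), Map.entry("c", "4")));
        Map<String, String> out = new HashMap<>();
        int decoded = collect(entries, Set.of("b"), out);
        System.out.println("decoded " + decoded + " entries, found " + out);
    }
}
```

In this model, asking for a field that is stored late in the list, or that is absent from the document, means decoding every entry before it. That is the cost the per-field layout avoided by filtering on the field name alone.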

@jordan-powers
Contributor Author

Here are the results of running the INSIST_🐔 benchmark comparing against the current main (with the feature flag enabled, splitting up _ignored_source into multiple _ignored_source.* fields).

|                                                        Metric |                            Task |         Baseline |        Contender |             Diff |   Unit |   Diff % |
|--------------------------------------------------------------:|--------------------------------:|-----------------:|-----------------:|-----------------:|-------:|---------:|
|                    Cumulative indexing time of primary shards |                                 |    785.479       |    764.648       |    -20.8316      |    min |   -2.65% |
|             Min cumulative indexing time across primary shard |                                 |      0           |      0           |      0           |    min |    0.00% |
|          Median cumulative indexing time across primary shard |                                 |      8.35517     |      8.57839     |      0.22322     |    min |   +2.67% |
|             Max cumulative indexing time across primary shard |                                 |    181.718       |    162.163       |    -19.5545      |    min |  -10.76% |
|           Cumulative indexing throttle time of primary shards |                                 |      0           |      0           |      0           |    min |    0.00% |
|    Min cumulative indexing throttle time across primary shard |                                 |      0           |      0           |      0           |    min |    0.00% |
| Median cumulative indexing throttle time across primary shard |                                 |      0           |      0           |      0           |    min |    0.00% |
|    Max cumulative indexing throttle time across primary shard |                                 |      0           |      0           |      0           |    min |    0.00% |
|                       Cumulative merge time of primary shards |                                 |    225.36        |    224.233       |     -1.12738     |    min |   -0.50% |
|                      Cumulative merge count of primary shards |                                 |    406           |    403           |     -3           |        |   -0.74% |
|                Min cumulative merge time across primary shard |                                 |      0           |      0           |      0           |    min |    0.00% |
|             Median cumulative merge time across primary shard |                                 |      1.41965     |      1.33246     |     -0.08719     |    min |   -6.14% |
|                Max cumulative merge time across primary shard |                                 |     59.218       |     59.0345      |     -0.1835      |    min |   -0.31% |
|              Cumulative merge throttle time of primary shards |                                 |     68.2949      |     66.5924      |     -1.70247     |    min |   -2.49% |
|       Min cumulative merge throttle time across primary shard |                                 |      0           |      0           |      0           |    min |    0.00% |
|    Median cumulative merge throttle time across primary shard |                                 |      0.395342    |      0.3402      |     -0.05514     |    min |  -13.95% |
|       Max cumulative merge throttle time across primary shard |                                 |     17.6527      |     19.0972      |      1.44452     |    min |   +8.18% |
|                     Cumulative refresh time of primary shards |                                 |      9.08145     |      8.82248     |     -0.25897     |    min |   -2.85% |
|                    Cumulative refresh count of primary shards |                                 |   6254           |   6110           |   -144           |        |   -2.30% |
|              Min cumulative refresh time across primary shard |                                 |      0           |      0           |      0           |    min |    0.00% |
|           Median cumulative refresh time across primary shard |                                 |      0.049325    |      0.05675     |      0.00743     |    min |  +15.05% |
|              Max cumulative refresh time across primary shard |                                 |      2.57865     |      2.64482     |      0.06617     |    min |   +2.57% |
|                       Cumulative flush time of primary shards |                                 |    142.579       |    140.728       |     -1.85113     |    min |   -1.30% |
|                      Cumulative flush count of primary shards |                                 |   5805           |   5662           |   -143           |        |   -2.46% |
|                Min cumulative flush time across primary shard |                                 |      6.66667e-05 |      6.66667e-05 |      0           |    min |    0.00% |
|             Median cumulative flush time across primary shard |                                 |      1.83419     |      1.79902     |     -0.03517     |    min |   -1.92% |
|                Max cumulative flush time across primary shard |                                 |     26.2029      |     27.0796      |      0.87673     |    min |   +3.35% |
|                                       Total Young Gen GC time |                                 |    273.829       |    267.461       |     -6.368       |      s |   -2.33% |
|                                      Total Young Gen GC count |                                 |  15053           |  19602           |   4549           |        |  +30.22% |
|                                         Total Old Gen GC time |                                 |      0           |      0           |      0           |      s |    0.00% |
|                                        Total Old Gen GC count |                                 |      0           |      0           |      0           |        |    0.00% |
|                                                  Dataset size |                                 |     52.3843      |     52.3382      |     -0.04602     |     GB |   -0.09% |
|                                                    Store size |                                 |     52.3843      |     52.3382      |     -0.04602     |     GB |   -0.09% |
|                                                 Translog size |                                 |      4.09782e-06 |      4.09782e-06 |      0           |     GB |    0.00% |
|                                        Heap used for segments |                                 |      0           |      0           |      0           |     MB |    0.00% |
|                                      Heap used for doc values |                                 |      0           |      0           |      0           |     MB |    0.00% |
|                                           Heap used for terms |                                 |      0           |      0           |      0           |     MB |    0.00% |
|                                           Heap used for norms |                                 |      0           |      0           |      0           |     MB |    0.00% |
|                                          Heap used for points |                                 |      0           |      0           |      0           |     MB |    0.00% |
|                                   Heap used for stored fields |                                 |      0           |      0           |      0           |     MB |    0.00% |
|                                                 Segment count |                                 |    969           |    886           |    -83           |        |   -8.57% |
|                                   Total Ingest Pipeline count |                                 |      4.88622e+08 |      4.88622e+08 |      0           |        |    0.00% |
|                                    Total Ingest Pipeline time |                                 |      1.64616e+07 |      1.76306e+07 |      1.16897e+06 |     ms |   +7.10% |
|                                  Total Ingest Pipeline failed |                                 |      0           |      0           |      0           |        |    0.00% |
|                                                Min Throughput |                insert-pipelines |      6.23734     |      6.46784     |      0.2305      |  ops/s |   +3.70% |
|                                               Mean Throughput |                insert-pipelines |      6.23734     |      6.46784     |      0.2305      |  ops/s |   +3.70% |
|                                             Median Throughput |                insert-pipelines |      6.23734     |      6.46784     |      0.2305      |  ops/s |   +3.70% |
|                                                Max Throughput |                insert-pipelines |      6.23734     |      6.46784     |      0.2305      |  ops/s |   +3.70% |
|                                      100th percentile latency |                insert-pipelines |   2352.06        |   2270.18        |    -81.8831      |     ms |   -3.48% |
|                                 100th percentile service time |                insert-pipelines |   2352.06        |   2270.18        |    -81.8831      |     ms |   -3.48% |
|                                                    error rate |                insert-pipelines |      0           |      0           |      0           |      % |    0.00% |
|                                                Min Throughput |                      insert-ilm |     17.4942      |     19.5736      |      2.07938     |  ops/s |  +11.89% |
|                                               Mean Throughput |                      insert-ilm |     17.4942      |     19.5736      |      2.07938     |  ops/s |  +11.89% |
|                                             Median Throughput |                      insert-ilm |     17.4942      |     19.5736      |      2.07938     |  ops/s |  +11.89% |
|                                                Max Throughput |                      insert-ilm |     17.4942      |     19.5736      |      2.07938     |  ops/s |  +11.89% |
|                                      100th percentile latency |                      insert-ilm |     56.2021      |     50.0109      |     -6.19124     |     ms |  -11.02% |
|                                 100th percentile service time |                      insert-ilm |     56.2021      |     50.0109      |     -6.19124     |     ms |  -11.02% |
|                                                    error rate |                      insert-ilm |      0           |      0           |      0           |      % |    0.00% |
|                                      100th percentile latency | update-custom-package-templates |     24.8715      |     13.6423      |    -11.2291      |     ms |  -45.15% |
|                                 100th percentile service time | update-custom-package-templates |     24.8715      |     13.6423      |    -11.2291      |     ms |  -45.15% |
|                                                    error rate | update-custom-package-templates |      0           |      0           |      0           |      % |    0.00% |
|                                                Min Throughput |                      bulk-index |   1022.43        |   1143.53        |    121.098       | docs/s |  +11.84% |
|                                               Mean Throughput |                      bulk-index |  38045           |  39286.2         |   1241.2         | docs/s |   +3.26% |
|                                             Median Throughput |                      bulk-index |  37923           |  39170           |   1246.96        | docs/s |   +3.29% |
|                                                Max Throughput |                      bulk-index |  39995.6         |  43144           |   3148.47        | docs/s |   +7.87% |
|                                       50th percentile latency |                      bulk-index |   1331.6         |   1283.84        |    -47.756       |     ms |   -3.59% |
|                                       90th percentile latency |                      bulk-index |   2203.3         |   2183.56        |    -19.7393      |     ms |   -0.90% |
|                                       99th percentile latency |                      bulk-index |   3556.75        |   3432.9         |   -123.855       |     ms |   -3.48% |
|                                     99.9th percentile latency |                      bulk-index |   8592.96        |   9080.04        |    487.081       |     ms |   +5.67% |
|                                    99.99th percentile latency |                      bulk-index |  12180.6         |  12142.9         |    -37.7469      |     ms |   -0.31% |
|                                      100th percentile latency |                      bulk-index |  16025.5         |  16206.7         |    181.283       |     ms |   +1.13% |
|                                  50th percentile service time |                      bulk-index |   1335.2         |   1282.53        |    -52.6771      |     ms |   -3.95% |
|                                  90th percentile service time |                      bulk-index |   2203.11        |   2181.83        |    -21.2785      |     ms |   -0.97% |
|                                  99th percentile service time |                      bulk-index |   3556.65        |   3385.56        |   -171.092       |     ms |   -4.81% |
|                                99.9th percentile service time |                      bulk-index |   8581.63        |   9086.45        |    504.821       |     ms |   +5.88% |
|                               99.99th percentile service time |                      bulk-index |  12210.8         |  12132.9         |    -77.9286      |     ms |   -0.64% |
|                                 100th percentile service time |                      bulk-index |  16025.5         |  16206.7         |    181.283       |     ms |   +1.13% |
|                                                    error rate |                      bulk-index |      0           |      0           |      0           |      % |    0.00% |
|                                                Min Throughput |                       limit_500 |      7.65503     |     15.9407      |      8.28565     |  ops/s | +108.24% |
|                                               Mean Throughput |                       limit_500 |     13.6682      |     15.9407      |      2.27252     |  ops/s |  +16.63% |
|                                             Median Throughput |                       limit_500 |     13.6682      |     15.9407      |      2.27252     |  ops/s |  +16.63% |
|                                                Max Throughput |                       limit_500 |     19.6813      |     15.9407      |     -3.74061     |  ops/s |  -19.01% |
|                                       50th percentile latency |                       limit_500 |     16.3158      |     15.9017      |     -0.41408     |     ms |   -2.54% |
|                                       90th percentile latency |                       limit_500 |     19.9405      |     21.5721      |      1.63163     |     ms |   +8.18% |
|                                       99th percentile latency |                       limit_500 |     24.5963      |     25.2077      |      0.61139     |     ms |   +2.49% |
|                                      100th percentile latency |                       limit_500 |     26.6793      |     25.8568      |     -0.8225      |     ms |   -3.08% |
|                                  50th percentile service time |                       limit_500 |     16.3158      |     15.9017      |     -0.41408     |     ms |   -2.54% |
|                                  90th percentile service time |                       limit_500 |     19.9405      |     21.5721      |      1.63163     |     ms |   +8.18% |
|                                  99th percentile service time |                       limit_500 |     24.5963      |     25.2077      |      0.61139     |     ms |   +2.49% |
|                                 100th percentile service time |                       limit_500 |     26.6793      |     25.8568      |     -0.8225      |     ms |   -3.08% |
|                                                    error rate |                       limit_500 |      0           |      0           |      0           |      % |    0.00% |
|                                                Min Throughput |                       chicken_1 |      6.70301     |      6.78201     |      0.079       |  ops/s |   +1.18% |
|                                               Mean Throughput |                       chicken_1 |      6.70301     |      6.78201     |      0.079       |  ops/s |   +1.18% |
|                                             Median Throughput |                       chicken_1 |      6.70301     |      6.78201     |      0.079       |  ops/s |   +1.18% |
|                                                Max Throughput |                       chicken_1 |      6.70301     |      6.78201     |      0.079       |  ops/s |   +1.18% |
|                                       50th percentile latency |                       chicken_1 |     43.1182      |     47.1608      |      4.0426      |     ms |   +9.38% |
|                                      100th percentile latency |                       chicken_1 |     47.8871      |     52.0452      |      4.15809     |     ms |   +8.68% |
|                                  50th percentile service time |                       chicken_1 |     43.1182      |     47.1608      |      4.0426      |     ms |   +9.38% |
|                                 100th percentile service time |                       chicken_1 |     47.8871      |     52.0452      |      4.15809     |     ms |   +8.68% |
|                                                    error rate |                       chicken_1 |      0           |      0           |      0           |      % |    0.00% |
|                                                Min Throughput |                       chicken_2 |      0.0034697   |      0.00193971  |     -0.00153     |  ops/s |  -44.10% |
|                                               Mean Throughput |                       chicken_2 |      0.00349247  |      0.00194296  |     -0.00155     |  ops/s |  -44.37% |
|                                             Median Throughput |                       chicken_2 |      0.00349608  |      0.00194315  |     -0.00155     |  ops/s |  -44.42% |
|                                                Max Throughput |                       chicken_2 |      0.00350761  |      0.00194476  |     -0.00156     |  ops/s |  -44.56% |
|                                       50th percentile latency |                       chicken_2 | 282029           | 512244           | 230214           |     ms |  +81.63% |
|                                      100th percentile latency |                       chicken_2 | 282287           | 514959           | 232672           |     ms |  +82.42% |
|                                  50th percentile service time |                       chicken_2 | 282029           | 512244           | 230214           |     ms |  +81.63% |
|                                 100th percentile service time |                       chicken_2 | 282287           | 514959           | 232672           |     ms |  +82.42% |
|                                                    error rate |                       chicken_2 |      0           |      0           |      0           |      % |    0.00% |
|                                                Min Throughput |                       chicken_3 |      0.00186763  |      0.00198938  |      0.00012     |  ops/s |   +6.52% |
|                                               Mean Throughput |                       chicken_3 |      0.00187159  |      0.00202095  |      0.00015     |  ops/s |   +7.98% |
|                                             Median Throughput |                       chicken_3 |      0.00187049  |      0.00201753  |      0.00015     |  ops/s |   +7.86% |
|                                                Max Throughput |                       chicken_3 |      0.00187785  |      0.00206066  |      0.00018     |  ops/s |   +9.73% |
|                                       50th percentile latency |                       chicken_3 | 538622           | 518768           | -19854.4         |     ms |   -3.69% |
|                                      100th percentile latency |                       chicken_3 | 539738           | 523966           | -15771.6         |     ms |   -2.92% |
|                                  50th percentile service time |                       chicken_3 | 538622           | 518768           | -19854.4         |     ms |   -3.69% |
|                                 100th percentile service time |                       chicken_3 | 539738           | 523966           | -15771.6         |     ms |   -2.92% |
|                                                    error rate |                       chicken_3 |      0           |      0           |      0           |      % |    0.00% |
|                                                Min Throughput |            chicken_3_with_where |     27.5639      |     31.0218      |      3.45789     |  ops/s |  +12.55% |
|                                               Mean Throughput |            chicken_3_with_where |     27.5639      |     31.0218      |      3.45789     |  ops/s |  +12.55% |
|                                             Median Throughput |            chicken_3_with_where |     27.5639      |     31.0218      |      3.45789     |  ops/s |  +12.55% |
|                                                Max Throughput |            chicken_3_with_where |     27.5639      |     31.0218      |      3.45789     |  ops/s |  +12.55% |
|                                       50th percentile latency |            chicken_3_with_where |     11.006       |      8.50661     |     -2.49936     |     ms |  -22.71% |
|                                      100th percentile latency |            chicken_3_with_where |     12.363       |     16.4753      |      4.11232     |     ms |  +33.26% |
|                                  50th percentile service time |            chicken_3_with_where |     11.006       |      8.50661     |     -2.49936     |     ms |  -22.71% |
|                                 100th percentile service time |            chicken_3_with_where |     12.363       |     16.4753      |      4.11232     |     ms |  +33.26% |
|                                                    error rate |            chicken_3_with_where |      0           |      0           |      0           |      % |    0.00% |
|                                                Min Throughput |                       chicken_4 |      0.0146495   |      0.00609788  |     -0.00855     |  ops/s |  -58.37% |
|                                               Mean Throughput |                       chicken_4 |      0.0148707   |      0.00611548  |     -0.00876     |  ops/s |  -58.88% |
|                                             Median Throughput |                       chicken_4 |      0.014836    |      0.00611294  |     -0.00872     |  ops/s |  -58.80% |
|                                                Max Throughput |                       chicken_4 |      0.0151862   |      0.00613525  |     -0.00905     |  ops/s |  -59.60% |
|                                       50th percentile latency |                       chicken_4 |  61668.4         | 161353           |  99684.6         |     ms | +161.65% |
|                                      100th percentile latency |                       chicken_4 |  82734.9         | 168012           |  85277.5         |     ms | +103.07% |
|                                  50th percentile service time |                       chicken_4 |  61668.4         | 161353           |  99684.6         |     ms | +161.65% |
|                                 100th percentile service time |                       chicken_4 |  82734.9         | 168012           |  85277.5         |     ms | +103.07% |
|                                                    error rate |                       chicken_4 |      0           |      0           |      0           |      % |    0.00% |

Summary:

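|                   Metric | limit_500 | chicken_1 | chicken_2 | chicken_3 | chicken_3_with_where | chicken_4 |
|-------------------------:|----------:|----------:|----------:|----------:|---------------------:|----------:|
|  50th percentile latency |    -2.54% |    +9.38% |   +81.63% |    -3.69% |              -22.71% |  +161.65% |
| 100th percentile latency |    -3.08% |    +8.68% |   +82.42% |    -2.92% |              +33.26% |  +103.07% |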

Looks like the new approach is comparable for limit_500, chicken_3, and chicken_3_with_where, slightly worse for chicken_1, and much worse for chicken_2 and chicken_4.
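For reference, the Diff % figures can be reproduced from the Baseline and Contender columns as (contender - baseline) / baseline. The helper below is a hypothetical sketch for checking the numbers, not part of the benchmark tooling; because the table values are rounded, results can differ in the last digit.

```java
import java.util.Locale;

/** Hypothetical helper for re-deriving the Diff % column. */
final class DiffPercentSketch {
    // Relative change of the contender with respect to the baseline, in percent.
    static double diffPercent(double baseline, double contender) {
        return (contender - baseline) / baseline * 100.0;
    }

    public static void main(String[] args) {
        // 50th percentile latency values (ms) taken from the table above.
        System.out.println(String.format(Locale.ROOT, "chicken_2: %+.2f%%", diffPercent(282029, 512244)));
        System.out.println(String.format(Locale.ROOT, "chicken_4: %+.2f%%", diffPercent(61668.4, 161353)));
        System.out.println(String.format(Locale.ROOT, "chicken_3_with_where: %+.2f%%", diffPercent(11.006, 8.50661)));
    }
}
```

Read as a ratio, the chicken_2 median latency is about 1.8 times the baseline and the chicken_4 median latency is about 2.6 times the baseline.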

@jordan-powers jordan-powers marked this pull request as ready for review September 4, 2025 01:47
@elasticsearchmachine
Collaborator

Pinging @elastic/es-storage-engine (Team:StorageEngine)

@elasticsearchmachine elasticsearchmachine added the serverless-linked Added by automation, don't add manually label Sep 4, 2025
Member

@martijnvg martijnvg left a comment

LGTM, thanks Jordan!

}

return permissions.getFieldPermissions().filter(wrappedReader);
var indexVersionCreated = searchExecutionContextProvider.apply(shardId).indexVersionCreated();
Member

Passing down index version here is a little intrusive. However I don't think it is too bad. In other places of the code base passing down index version is a common pattern.

The alternative we talked about, which would store coalesced ignored source in a different field, would also be an intrusive change.

So I'm okay with this approach.

@jordan-powers jordan-powers enabled auto-merge (squash) September 5, 2025 17:40
@jordan-powers jordan-powers merged commit 84470e9 into elastic:main Sep 5, 2025
33 checks passed
jordan-powers added a commit that referenced this pull request Sep 26, 2025
Follow-up to #133839 to remove the feature flag and enable the feature in
production.
@jordan-powers jordan-powers deleted the ignored-source-avoid-fieldinfo-explosion branch October 1, 2025 15:02

Labels

>non-issue serverless-linked Added by automation, don't add manually :StorageEngine/Mapping The storage related side of mappings Team:StorageEngine v9.2.0

3 participants