
Conversation

@jordan-powers
Contributor

This patch moves building the set of `fieldName` prefixes in the `FallbackSyntheticSourceBlockLoader` into the constructor. This set does not change between invocations (since `fieldName` is final), so we can do the work once when constructing the `BlockLoader` instead of once per document.

Resolves #130887
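
For illustration, here is a minimal sketch of the hoisting pattern described above, assuming a dot-separated field path. The class and method names are hypothetical stand-ins, not the actual Elasticsearch implementation:

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical sketch of the optimization pattern; this is not the real
// FallbackSyntheticSourceBlockLoader code, just the shape of the change.
class PrefixHoistingSketch {
    private final String fieldName;
    // Built once here, since fieldName is final and the prefix set can
    // never change between invocations.
    private final Set<String> fieldNamePrefixes;

    PrefixHoistingSketch(String fieldName) {
        this.fieldName = fieldName;
        this.fieldNamePrefixes = buildPrefixes(fieldName);
    }

    // Before the patch, work equivalent to this ran once per document,
    // which is why HashSet.add and String.split showed up in profiles.
    private static Set<String> buildPrefixes(String fieldName) {
        Set<String> prefixes = new HashSet<>();
        StringBuilder prefix = new StringBuilder();
        for (String part : fieldName.split("\\.")) {
            if (prefix.length() > 0) {
                prefix.append('.');
            }
            prefix.append(part);
            prefixes.add(prefix.toString()); // e.g. "a", "a.b", "a.b.c"
        }
        return prefixes;
    }
}
```

Hoisting the computation into the constructor trades a single up-front allocation for eliminating the per-document `HashSet` and `String.split` churn seen in the original profile.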

@elasticsearchmachine
Collaborator

Pinging @elastic/es-storage-engine (Team:StorageEngine)

@jordan-powers
Contributor Author

jordan-powers commented Jul 22, 2025

Here are the results of the latest benchmark:

|                                                        Metric |                 Task |         Baseline |        Contender |         Diff |   Unit |   Diff % |
|--------------------------------------------------------------:|---------------------:|-----------------:|-----------------:|-------------:|-------:|---------:|
|                    Cumulative indexing time of primary shards |                      |    726.581       |    664.553       |    -62.0272  |    min |   -8.54% |
|             Min cumulative indexing time across primary shard |                      |      2.37642     |      2.10365     |     -0.27277 |    min |  -11.48% |
|          Median cumulative indexing time across primary shard |                      |      8.16167     |      7.20188     |     -0.95978 |    min |  -11.76% |
|             Max cumulative indexing time across primary shard |                      |    150.58        |    147.443       |     -3.13755 |    min |   -2.08% |
|           Cumulative indexing throttle time of primary shards |                      |      0           |      0           |      0       |    min |    0.00% |
|    Min cumulative indexing throttle time across primary shard |                      |      0           |      0           |      0       |    min |    0.00% |
| Median cumulative indexing throttle time across primary shard |                      |      0           |      0           |      0       |    min |    0.00% |
|    Max cumulative indexing throttle time across primary shard |                      |      0           |      0           |      0       |    min |    0.00% |
|                       Cumulative merge time of primary shards |                      |    218.303       |    219.45        |      1.1471  |    min |   +0.53% |
|                      Cumulative merge count of primary shards |                      |    407           |    539           |    132       |        |  +32.43% |
|                Min cumulative merge time across primary shard |                      |      0.2705      |      0.32085     |      0.05035 |    min |  +18.61% |
|             Median cumulative merge time across primary shard |                      |      1.54745     |      1.45553     |     -0.09192 |    min |   -5.94% |
|                Max cumulative merge time across primary shard |                      |     53.3357      |     58.4563      |      5.1206  |    min |   +9.60% |
|              Cumulative merge throttle time of primary shards |                      |     64.5853      |     72.5918      |      8.00655 |    min |  +12.40% |
|       Min cumulative merge throttle time across primary shard |                      |      0.0678      |      0.0751167   |      0.00732 |    min |  +10.79% |
|    Median cumulative merge throttle time across primary shard |                      |      0.389333    |      0.479883    |      0.09055 |    min |  +23.26% |
|       Max cumulative merge throttle time across primary shard |                      |     16.9648      |     18.6675      |      1.70272 |    min |  +10.04% |
|                     Cumulative refresh time of primary shards |                      |      9.88703     |      7.54207     |     -2.34497 |    min |  -23.72% |
|                    Cumulative refresh count of primary shards |                      |   6125           |   6139           |     14       |        |   +0.23% |
|              Min cumulative refresh time across primary shard |                      |      0.0117833   |      0.03235     |      0.02057 |    min | +174.54% |
|           Median cumulative refresh time across primary shard |                      |      0.0574167   |      0.07385     |      0.01643 |    min |  +28.62% |
|              Max cumulative refresh time across primary shard |                      |      2.76693     |      1.83095     |     -0.93598 |    min |  -33.83% |
|                       Cumulative flush time of primary shards |                      |    145.41        |    125.858       |    -19.5528  |    min |  -13.45% |
|                      Cumulative flush count of primary shards |                      |   5702           |   5590           |   -112       |        |   -1.96% |
|                Min cumulative flush time across primary shard |                      |      0.505133    |      0.519267    |      0.01413 |    min |   +2.80% |
|             Median cumulative flush time across primary shard |                      |      1.95388     |      1.75958     |     -0.1943  |    min |   -9.94% |
|                Max cumulative flush time across primary shard |                      |     26.3399      |     22.6715      |     -3.66847 |    min |  -13.93% |
|                                       Total Young Gen GC time |                      |    249.625       |    160.537       |    -89.088   |      s |  -35.69% |
|                                      Total Young Gen GC count |                      |  20598           |  20161           |   -437       |        |   -2.12% |
|                                         Total Old Gen GC time |                      |      0           |      0           |      0       |      s |    0.00% |
|                                        Total Old Gen GC count |                      |      0           |      0           |      0       |        |    0.00% |
|                                                  Dataset size |                      |     52.4946      |     51.8767      |     -0.61794 |     GB |   -1.18% |
|                                                    Store size |                      |     52.4946      |     51.8767      |     -0.61794 |     GB |   -1.18% |
|                                                 Translog size |                      |      3.99537e-06 |      3.99537e-06 |      0       |     GB |    0.00% |
|                                        Heap used for segments |                      |      0           |      0           |      0       |     MB |    0.00% |
|                                      Heap used for doc values |                      |      0           |      0           |      0       |     MB |    0.00% |
|                                           Heap used for terms |                      |      0           |      0           |      0       |     MB |    0.00% |
|                                           Heap used for norms |                      |      0           |      0           |      0       |     MB |    0.00% |
|                                          Heap used for points |                      |      0           |      0           |      0       |     MB |    0.00% |
|                                   Heap used for stored fields |                      |      0           |      0           |      0       |     MB |    0.00% |
|                                                 Segment count |                      |    946           |   1078           |    132       |        |  +13.95% |
|                                   Total Ingest Pipeline count |                      |      4.88622e+08 |      4.8861e+08  | -12000       |        |   -0.00% |
|                                    Total Ingest Pipeline time |                      |      1.65978e+07 |      1.71165e+07 | 518754       |     ms |   +3.13% |
|                                  Total Ingest Pipeline failed |                      |      0           |      0           |      0       |        |    0.00% |
|                                                Min Throughput |            limit_500 |     16.7281      |     10.9373      |     -5.79079 |  ops/s |  -34.62% |
|                                               Mean Throughput |            limit_500 |     16.7281      |     17.9326      |      1.20457 |  ops/s |   +7.20% |
|                                             Median Throughput |            limit_500 |     16.7281      |     17.9326      |      1.20457 |  ops/s |   +7.20% |
|                                                Max Throughput |            limit_500 |     16.7281      |     24.928       |      8.19992 |  ops/s |  +49.02% |
|                                       50th percentile latency |            limit_500 |     15.0181      |     14.3872      |     -0.63085 |     ms |   -4.20% |
|                                       90th percentile latency |            limit_500 |     19.3995      |     17.8794      |     -1.5201  |     ms |   -7.84% |
|                                       99th percentile latency |            limit_500 |     22.621       |     19.9339      |     -2.68712 |     ms |  -11.88% |
|                                      100th percentile latency |            limit_500 |     24.5074      |     20.6024      |     -3.90491 |     ms |  -15.93% |
|                                  50th percentile service time |            limit_500 |     15.0181      |     14.3872      |     -0.63085 |     ms |   -4.20% |
|                                  90th percentile service time |            limit_500 |     19.3995      |     17.8794      |     -1.5201  |     ms |   -7.84% |
|                                  99th percentile service time |            limit_500 |     22.621       |     19.9339      |     -2.68712 |     ms |  -11.88% |
|                                 100th percentile service time |            limit_500 |     24.5074      |     20.6024      |     -3.90491 |     ms |  -15.93% |
|                                                    error rate |            limit_500 |      0           |      0           |      0       |      % |    0.00% |
|                                                Min Throughput |            chicken_1 |      7.86896     |      8.42789     |      0.55894 |  ops/s |   +7.10% |
|                                               Mean Throughput |            chicken_1 |      7.86896     |      8.42789     |      0.55894 |  ops/s |   +7.10% |
|                                             Median Throughput |            chicken_1 |      7.86896     |      8.42789     |      0.55894 |  ops/s |   +7.10% |
|                                                Max Throughput |            chicken_1 |      7.86896     |      8.42789     |      0.55894 |  ops/s |   +7.10% |
|                                       50th percentile latency |            chicken_1 |     40.2074      |     40.8175      |      0.61005 |     ms |   +1.52% |
|                                      100th percentile latency |            chicken_1 |     47.4822      |     44.8914      |     -2.59075 |     ms |   -5.46% |
|                                  50th percentile service time |            chicken_1 |     40.2074      |     40.8175      |      0.61005 |     ms |   +1.52% |
|                                 100th percentile service time |            chicken_1 |     47.4822      |     44.8914      |     -2.59075 |     ms |   -5.46% |
|                                                    error rate |            chicken_1 |      0           |      0           |      0       |      % |    0.00% |
|                                                Min Throughput |            chicken_2 |      0.00166012  |      0.00191644  |      0.00026 |  ops/s |  +15.44% |
|                                               Mean Throughput |            chicken_2 |      0.00166321  |      0.00192946  |      0.00027 |  ops/s |  +16.01% |
|                                             Median Throughput |            chicken_2 |      0.00166267  |      0.00193378  |      0.00027 |  ops/s |  +16.31% |
|                                                Max Throughput |            chicken_2 |      0.00166679  |      0.00193727  |      0.00027 |  ops/s |  +16.23% |
|                                       50th percentile latency |            chicken_2 | 605252           | 534437           | -70814.8     |     ms |  -11.70% |
|                                      100th percentile latency |            chicken_2 | 606348           | 536243           | -70105.1     |     ms |  -11.56% |
|                                  50th percentile service time |            chicken_2 | 605252           | 534437           | -70814.8     |     ms |  -11.70% |
|                                 100th percentile service time |            chicken_2 | 606348           | 536243           | -70105.1     |     ms |  -11.56% |
|                                                    error rate |            chicken_2 |      0           |      0           |      0       |      % |    0.00% |
|                                                Min Throughput |            chicken_3 |      0.00123918  |      0.00135812  |      0.00012 |  ops/s |   +9.60% |
|                                               Mean Throughput |            chicken_3 |      0.00124023  |      0.00136034  |      0.00012 |  ops/s |   +9.68% |
|                                             Median Throughput |            chicken_3 |      0.00123986  |      0.00135923  |      0.00012 |  ops/s |   +9.63% |
|                                                Max Throughput |            chicken_3 |      0.00124248  |      0.00136437  |      0.00012 |  ops/s |   +9.81% |
|                                       50th percentile latency |            chicken_3 | 807223           | 739102           | -68120.9     |     ms |   -8.44% |
|                                      100th percentile latency |            chicken_3 | 812655           | 741483           | -71172.6     |     ms |   -8.76% |
|                                  50th percentile service time |            chicken_3 | 807223           | 739102           | -68120.9     |     ms |   -8.44% |
|                                 100th percentile service time |            chicken_3 | 812655           | 741483           | -71172.6     |     ms |   -8.76% |
|                                                    error rate |            chicken_3 |      0           |      0           |      0       |      % |    0.00% |
|                                                Min Throughput | chicken_3_with_where |     34.6184      |     26.4964      |     -8.12203 |  ops/s |  -23.46% |
|                                               Mean Throughput | chicken_3_with_where |     34.6184      |     26.4964      |     -8.12203 |  ops/s |  -23.46% |
|                                             Median Throughput | chicken_3_with_where |     34.6184      |     26.4964      |     -8.12203 |  ops/s |  -23.46% |
|                                                Max Throughput | chicken_3_with_where |     34.6184      |     26.4964      |     -8.12203 |  ops/s |  -23.46% |
|                                       50th percentile latency | chicken_3_with_where |      7.3965      |      7.71256     |      0.31607 |     ms |   +4.27% |
|                                      100th percentile latency | chicken_3_with_where |      9.35541     |     10.2378      |      0.88239 |     ms |   +9.43% |
|                                  50th percentile service time | chicken_3_with_where |      7.3965      |      7.71256     |      0.31607 |     ms |   +4.27% |
|                                 100th percentile service time | chicken_3_with_where |      9.35541     |     10.2378      |      0.88239 |     ms |   +9.43% |
|                                                    error rate | chicken_3_with_where |      0           |      0           |      0       |      % |    0.00% |
|                                                Min Throughput |            chicken_4 |      0.00813094  |      0.00570994  |     -0.00242 |  ops/s |  -29.78% |
|                                               Mean Throughput |            chicken_4 |      0.00832372  |      0.00578681  |     -0.00254 |  ops/s |  -30.48% |
|                                             Median Throughput |            chicken_4 |      0.00833839  |      0.00579975  |     -0.00254 |  ops/s |  -30.45% |
|                                                Max Throughput |            chicken_4 |      0.00846962  |      0.00588182  |     -0.00259 |  ops/s |  -30.55% |
|                                       50th percentile latency |            chicken_4 | 106918           | 179663           |  72745.5     |     ms |  +68.04% |
|                                      100th percentile latency |            chicken_4 | 137511           | 189260           |  51749.4     |     ms |  +37.63% |
|                                  50th percentile service time |            chicken_4 | 106918           | 179663           |  72745.5     |     ms |  +68.04% |
|                                 100th percentile service time |            chicken_4 | 137511           | 189260           |  51749.4     |     ms |  +37.63% |
|                                                    error rate |            chicken_4 |      0           |      0           |      0       |      % |    0.00% |

Summary:

|                 Task | Median Throughput | Median Latency |
|---------------------:|------------------:|---------------:|
|            limit_500 |            +7.20% |         -4.20% |
|            chicken_1 |            +7.10% |         +1.52% |
|            chicken_2 |           +16.01% |        -11.70% |
|            chicken_3 |            +9.63% |         -8.44% |
| chicken_3_with_where |           -23.46% |         +4.27% |
|            chicken_4 |           -30.45% |        +68.04% |

(I'm not sure how useful throughput is as a metric here; from what I understand, latency is the more important metric for search queries.)

At first glance, this is not the across-the-board improvement I was expecting. While the results are promising for limit_500, chicken_1, chicken_2, and chicken_3, they look much worse for chicken_3_with_where and chicken_4.
However, after spending some time looking at the dashboard, it seems those results might just be noisy.

I'm going to run the benchmark, collect a profile, and generate a flamegraph to double-check that we're not seeing as much time spent in HashSet#add() as we saw in the original profile.

(I tried to collect a profile during the last benchmark run, but it turns out I collected it during the indexing step, when we're actually concerned with the search steps, so I need to re-run and re-collect.)

@parkertimmins
Contributor

LGTM

@jordan-powers
Contributor Author

[flamegraph screenshot]

OK, with this change I'm not seeing `HashSet.add` or `String.split` at all in the flamegraph anymore.

@jordan-powers jordan-powers enabled auto-merge (squash) July 23, 2025 16:24
@jordan-powers jordan-powers merged commit a76f56b into elastic:main Jul 23, 2025
33 checks passed
@jordan-powers jordan-powers deleted the unmapped-fields-block-loader-optimization branch October 1, 2025 15:05

Development

Successfully merging this pull request may close these issues.

Improve reading FallbackSyntheticSourceBlockLoader