
Conversation

@jordan-powers
Contributor

This patch moves building the set of `fieldName` prefixes in the `FallbackSyntheticSourceBlockLoader` into the constructor. This set does not change between invocations (since `fieldName` is final), so we can do the work once when constructing the `BlockLoader` instead of once per document.

Resolves #130887
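
For illustration, here is a minimal sketch of the hoisting pattern described above, assuming a dot-separated field path. The class and method names are hypothetical stand-ins, not the actual Elasticsearch implementation:

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical sketch of the optimization pattern; this is not the real
// FallbackSyntheticSourceBlockLoader code, just the shape of the change.
class PrefixHoistingSketch {
    private final String fieldName;
    // Built once here, since fieldName is final and the prefix set can
    // never change between invocations.
    private final Set<String> fieldNamePrefixes;

    PrefixHoistingSketch(String fieldName) {
        this.fieldName = fieldName;
        this.fieldNamePrefixes = buildPrefixes(fieldName);
    }

    // Before the patch, work equivalent to this ran once per document,
    // which is why HashSet.add and String.split showed up in profiles.
    private static Set<String> buildPrefixes(String fieldName) {
        Set<String> prefixes = new HashSet<>();
        StringBuilder prefix = new StringBuilder();
        for (String part : fieldName.split("\\.")) {
            if (prefix.length() > 0) {
                prefix.append('.');
            }
            prefix.append(part);
            prefixes.add(prefix.toString()); // e.g. "a", "a.b", "a.b.c"
        }
        return prefixes;
    }
}
```

Hoisting the computation into the constructor trades a single up-front allocation for eliminating the per-document `HashSet` and `String.split` churn seen in the original profile.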

@elasticsearchmachine
Collaborator

Pinging @elastic/es-storage-engine (Team:StorageEngine)

@jordan-powers
Contributor Author

jordan-powers commented Jul 22, 2025

Here are the results of the latest benchmark:

|                                                        Metric |                 Task |         Baseline |        Contender |         Diff |   Unit |   Diff % |
|--------------------------------------------------------------:|---------------------:|-----------------:|-----------------:|-------------:|-------:|---------:|
|                    Cumulative indexing time of primary shards |                      |    726.581       |    664.553       |    -62.0272  |    min |   -8.54% |
|             Min cumulative indexing time across primary shard |                      |      2.37642     |      2.10365     |     -0.27277 |    min |  -11.48% |
|          Median cumulative indexing time across primary shard |                      |      8.16167     |      7.20188     |     -0.95978 |    min |  -11.76% |
|             Max cumulative indexing time across primary shard |                      |    150.58        |    147.443       |     -3.13755 |    min |   -2.08% |
|           Cumulative indexing throttle time of primary shards |                      |      0           |      0           |      0       |    min |    0.00% |
|    Min cumulative indexing throttle time across primary shard |                      |      0           |      0           |      0       |    min |    0.00% |
| Median cumulative indexing throttle time across primary shard |                      |      0           |      0           |      0       |    min |    0.00% |
|    Max cumulative indexing throttle time across primary shard |                      |      0           |      0           |      0       |    min |    0.00% |
|                       Cumulative merge time of primary shards |                      |    218.303       |    219.45        |      1.1471  |    min |   +0.53% |
|                      Cumulative merge count of primary shards |                      |    407           |    539           |    132       |        |  +32.43% |
|                Min cumulative merge time across primary shard |                      |      0.2705      |      0.32085     |      0.05035 |    min |  +18.61% |
|             Median cumulative merge time across primary shard |                      |      1.54745     |      1.45553     |     -0.09192 |    min |   -5.94% |
|                Max cumulative merge time across primary shard |                      |     53.3357      |     58.4563      |      5.1206  |    min |   +9.60% |
|              Cumulative merge throttle time of primary shards |                      |     64.5853      |     72.5918      |      8.00655 |    min |  +12.40% |
|       Min cumulative merge throttle time across primary shard |                      |      0.0678      |      0.0751167   |      0.00732 |    min |  +10.79% |
|    Median cumulative merge throttle time across primary shard |                      |      0.389333    |      0.479883    |      0.09055 |    min |  +23.26% |
|       Max cumulative merge throttle time across primary shard |                      |     16.9648      |     18.6675      |      1.70272 |    min |  +10.04% |
|                     Cumulative refresh time of primary shards |                      |      9.88703     |      7.54207     |     -2.34497 |    min |  -23.72% |
|                    Cumulative refresh count of primary shards |                      |   6125           |   6139           |     14       |        |   +0.23% |
|              Min cumulative refresh time across primary shard |                      |      0.0117833   |      0.03235     |      0.02057 |    min | +174.54% |
|           Median cumulative refresh time across primary shard |                      |      0.0574167   |      0.07385     |      0.01643 |    min |  +28.62% |
|              Max cumulative refresh time across primary shard |                      |      2.76693     |      1.83095     |     -0.93598 |    min |  -33.83% |
|                       Cumulative flush time of primary shards |                      |    145.41        |    125.858       |    -19.5528  |    min |  -13.45% |
|                      Cumulative flush count of primary shards |                      |   5702           |   5590           |   -112       |        |   -1.96% |
|                Min cumulative flush time across primary shard |                      |      0.505133    |      0.519267    |      0.01413 |    min |   +2.80% |
|             Median cumulative flush time across primary shard |                      |      1.95388     |      1.75958     |     -0.1943  |    min |   -9.94% |
|                Max cumulative flush time across primary shard |                      |     26.3399      |     22.6715      |     -3.66847 |    min |  -13.93% |
|                                       Total Young Gen GC time |                      |    249.625       |    160.537       |    -89.088   |      s |  -35.69% |
|                                      Total Young Gen GC count |                      |  20598           |  20161           |   -437       |        |   -2.12% |
|                                         Total Old Gen GC time |                      |      0           |      0           |      0       |      s |    0.00% |
|                                        Total Old Gen GC count |                      |      0           |      0           |      0       |        |    0.00% |
|                                                  Dataset size |                      |     52.4946      |     51.8767      |     -0.61794 |     GB |   -1.18% |
|                                                    Store size |                      |     52.4946      |     51.8767      |     -0.61794 |     GB |   -1.18% |
|                                                 Translog size |                      |      3.99537e-06 |      3.99537e-06 |      0       |     GB |    0.00% |
|                                        Heap used for segments |                      |      0           |      0           |      0       |     MB |    0.00% |
|                                      Heap used for doc values |                      |      0           |      0           |      0       |     MB |    0.00% |
|                                           Heap used for terms |                      |      0           |      0           |      0       |     MB |    0.00% |
|                                           Heap used for norms |                      |      0           |      0           |      0       |     MB |    0.00% |
|                                          Heap used for points |                      |      0           |      0           |      0       |     MB |    0.00% |
|                                   Heap used for stored fields |                      |      0           |      0           |      0       |     MB |    0.00% |
|                                                 Segment count |                      |    946           |   1078           |    132       |        |  +13.95% |
|                                   Total Ingest Pipeline count |                      |      4.88622e+08 |      4.8861e+08  | -12000       |        |   -0.00% |
|                                    Total Ingest Pipeline time |                      |      1.65978e+07 |      1.71165e+07 | 518754       |     ms |   +3.13% |
|                                  Total Ingest Pipeline failed |                      |      0           |      0           |      0       |        |    0.00% |
|                                                Min Throughput |            limit_500 |     16.7281      |     10.9373      |     -5.79079 |  ops/s |  -34.62% |
|                                               Mean Throughput |            limit_500 |     16.7281      |     17.9326      |      1.20457 |  ops/s |   +7.20% |
|                                             Median Throughput |            limit_500 |     16.7281      |     17.9326      |      1.20457 |  ops/s |   +7.20% |
|                                                Max Throughput |            limit_500 |     16.7281      |     24.928       |      8.19992 |  ops/s |  +49.02% |
|                                       50th percentile latency |            limit_500 |     15.0181      |     14.3872      |     -0.63085 |     ms |   -4.20% |
|                                       90th percentile latency |            limit_500 |     19.3995      |     17.8794      |     -1.5201  |     ms |   -7.84% |
|                                       99th percentile latency |            limit_500 |     22.621       |     19.9339      |     -2.68712 |     ms |  -11.88% |
|                                      100th percentile latency |            limit_500 |     24.5074      |     20.6024      |     -3.90491 |     ms |  -15.93% |
|                                  50th percentile service time |            limit_500 |     15.0181      |     14.3872      |     -0.63085 |     ms |   -4.20% |
|                                  90th percentile service time |            limit_500 |     19.3995      |     17.8794      |     -1.5201  |     ms |   -7.84% |
|                                  99th percentile service time |            limit_500 |     22.621       |     19.9339      |     -2.68712 |     ms |  -11.88% |
|                                 100th percentile service time |            limit_500 |     24.5074      |     20.6024      |     -3.90491 |     ms |  -15.93% |
|                                                    error rate |            limit_500 |      0           |      0           |      0       |      % |    0.00% |
|                                                Min Throughput |            chicken_1 |      7.86896     |      8.42789     |      0.55894 |  ops/s |   +7.10% |
|                                               Mean Throughput |            chicken_1 |      7.86896     |      8.42789     |      0.55894 |  ops/s |   +7.10% |
|                                             Median Throughput |            chicken_1 |      7.86896     |      8.42789     |      0.55894 |  ops/s |   +7.10% |
|                                                Max Throughput |            chicken_1 |      7.86896     |      8.42789     |      0.55894 |  ops/s |   +7.10% |
|                                       50th percentile latency |            chicken_1 |     40.2074      |     40.8175      |      0.61005 |     ms |   +1.52% |
|                                      100th percentile latency |            chicken_1 |     47.4822      |     44.8914      |     -2.59075 |     ms |   -5.46% |
|                                  50th percentile service time |            chicken_1 |     40.2074      |     40.8175      |      0.61005 |     ms |   +1.52% |
|                                 100th percentile service time |            chicken_1 |     47.4822      |     44.8914      |     -2.59075 |     ms |   -5.46% |
|                                                    error rate |            chicken_1 |      0           |      0           |      0       |      % |    0.00% |
|                                                Min Throughput |            chicken_2 |      0.00166012  |      0.00191644  |      0.00026 |  ops/s |  +15.44% |
|                                               Mean Throughput |            chicken_2 |      0.00166321  |      0.00192946  |      0.00027 |  ops/s |  +16.01% |
|                                             Median Throughput |            chicken_2 |      0.00166267  |      0.00193378  |      0.00027 |  ops/s |  +16.31% |
|                                                Max Throughput |            chicken_2 |      0.00166679  |      0.00193727  |      0.00027 |  ops/s |  +16.23% |
|                                       50th percentile latency |            chicken_2 | 605252           | 534437           | -70814.8     |     ms |  -11.70% |
|                                      100th percentile latency |            chicken_2 | 606348           | 536243           | -70105.1     |     ms |  -11.56% |
|                                  50th percentile service time |            chicken_2 | 605252           | 534437           | -70814.8     |     ms |  -11.70% |
|                                 100th percentile service time |            chicken_2 | 606348           | 536243           | -70105.1     |     ms |  -11.56% |
|                                                    error rate |            chicken_2 |      0           |      0           |      0       |      % |    0.00% |
|                                                Min Throughput |            chicken_3 |      0.00123918  |      0.00135812  |      0.00012 |  ops/s |   +9.60% |
|                                               Mean Throughput |            chicken_3 |      0.00124023  |      0.00136034  |      0.00012 |  ops/s |   +9.68% |
|                                             Median Throughput |            chicken_3 |      0.00123986  |      0.00135923  |      0.00012 |  ops/s |   +9.63% |
|                                                Max Throughput |            chicken_3 |      0.00124248  |      0.00136437  |      0.00012 |  ops/s |   +9.81% |
|                                       50th percentile latency |            chicken_3 | 807223           | 739102           | -68120.9     |     ms |   -8.44% |
|                                      100th percentile latency |            chicken_3 | 812655           | 741483           | -71172.6     |     ms |   -8.76% |
|                                  50th percentile service time |            chicken_3 | 807223           | 739102           | -68120.9     |     ms |   -8.44% |
|                                 100th percentile service time |            chicken_3 | 812655           | 741483           | -71172.6     |     ms |   -8.76% |
|                                                    error rate |            chicken_3 |      0           |      0           |      0       |      % |    0.00% |
|                                                Min Throughput | chicken_3_with_where |     34.6184      |     26.4964      |     -8.12203 |  ops/s |  -23.46% |
|                                               Mean Throughput | chicken_3_with_where |     34.6184      |     26.4964      |     -8.12203 |  ops/s |  -23.46% |
|                                             Median Throughput | chicken_3_with_where |     34.6184      |     26.4964      |     -8.12203 |  ops/s |  -23.46% |
|                                                Max Throughput | chicken_3_with_where |     34.6184      |     26.4964      |     -8.12203 |  ops/s |  -23.46% |
|                                       50th percentile latency | chicken_3_with_where |      7.3965      |      7.71256     |      0.31607 |     ms |   +4.27% |
|                                      100th percentile latency | chicken_3_with_where |      9.35541     |     10.2378      |      0.88239 |     ms |   +9.43% |
|                                  50th percentile service time | chicken_3_with_where |      7.3965      |      7.71256     |      0.31607 |     ms |   +4.27% |
|                                 100th percentile service time | chicken_3_with_where |      9.35541     |     10.2378      |      0.88239 |     ms |   +9.43% |
|                                                    error rate | chicken_3_with_where |      0           |      0           |      0       |      % |    0.00% |
|                                                Min Throughput |            chicken_4 |      0.00813094  |      0.00570994  |     -0.00242 |  ops/s |  -29.78% |
|                                               Mean Throughput |            chicken_4 |      0.00832372  |      0.00578681  |     -0.00254 |  ops/s |  -30.48% |
|                                             Median Throughput |            chicken_4 |      0.00833839  |      0.00579975  |     -0.00254 |  ops/s |  -30.45% |
|                                                Max Throughput |            chicken_4 |      0.00846962  |      0.00588182  |     -0.00259 |  ops/s |  -30.55% |
|                                       50th percentile latency |            chicken_4 | 106918           | 179663           |  72745.5     |     ms |  +68.04% |
|                                      100th percentile latency |            chicken_4 | 137511           | 189260           |  51749.4     |     ms |  +37.63% |
|                                  50th percentile service time |            chicken_4 | 106918           | 179663           |  72745.5     |     ms |  +68.04% |
|                                 100th percentile service time |            chicken_4 | 137511           | 189260           |  51749.4     |     ms |  +37.63% |
|                                                    error rate |            chicken_4 |      0           |      0           |      0       |      % |    0.00% |

Summary:

|                 Task | Median Throughput | Median Latency |
|---------------------:|------------------:|---------------:|
|            limit_500 |            +7.20% |         -4.20% |
|            chicken_1 |            +7.10% |         +1.52% |
|            chicken_2 |           +16.01% |        -11.70% |
|            chicken_3 |            +9.63% |         -8.44% |
| chicken_3_with_where |           -23.46% |         +4.27% |
|            chicken_4 |           -30.45% |        +68.04% |

(I'm not sure how useful throughput is as a metric here; from what I understand, latency is the more important metric for search queries.)

At first glance, this is not the across-the-board improvement I was expecting. While the results are promising for limit_500, chicken_1, chicken_2, and chicken_3, they look much worse for chicken_3_with_where and chicken_4.
However, after spending some time looking at the dashboard, it seems those results might just be noisy.

I'm going to run the benchmark, collect a profile, and generate a flamegraph to double-check that we're not seeing as much time spent in HashSet#add() as we saw in the original profile.

(I tried to collect a profile during the last benchmark run, but it turns out I collected it during the indexing step, when we're actually concerned with the search steps, so I need to re-run and re-collect.)

@parkertimmins
Contributor

LGTM

@jordan-powers
Contributor Author

[flamegraph screenshot]

OK, with this change I'm not seeing `HashSet.add` or `String.split` at all in the flamegraph anymore.

@jordan-powers jordan-powers enabled auto-merge (squash) July 23, 2025 16:24
@jordan-powers jordan-powers merged commit a76f56b into elastic:main Jul 23, 2025
33 checks passed
@jordan-powers jordan-powers deleted the unmapped-fields-block-loader-optimization branch October 1, 2025 15:05

Development

Successfully merging this pull request may close these issues.

Improve reading FallbackSyntheticSourceBlockLoader