
Conversation

@masseyke (Member) commented Mar 19, 2025

We noticed that when running reindex on a large index on a large cluster, a single node (the one running TransportReindexAction) would have 100% CPU usage, while the other nodes would all be at ~15%. It turns out that we execute all slices for the reindex on the same node. So if there is any pipeline, all processors for all documents are executed on the single node, before the shard-specific indexing requests are sent out to the nodes where the shards live.
This change makes BulkByScrollParallelizationHelper round-robin slice requests across all of the ingest nodes so that the pipeline work is spread more evenly. In practice, we have seen significant performance improvements when running reindex with pipelines over large indices on large (10-node) clusters.
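For context, here is a condensed sketch of the selection logic (the class and method names are illustrative; the counter, field name, and node types mirror the diff below). Each slice request picks the next ingest node from a counter that starts at a random offset, so coordinating nodes do not all target the same ingest node first.

import java.util.concurrent.atomic.AtomicInteger;
import org.elasticsearch.cluster.node.DiscoveryNode;
import org.elasticsearch.common.Randomness;

// Condensed sketch of the round-robin selection (illustrative names, not the actual class).
final class SliceIngestNodeRoundRobin {
    // Random starting offset so coordinating nodes don't all send their first slice to the same ingest node.
    private static final AtomicInteger ingestNodeOffsetGenerator = new AtomicInteger(Randomness.get().nextInt(2048));

    static DiscoveryNode nextIngestNode(DiscoveryNode[] ingestNodes) {
        return ingestNodes[Math.floorMod(ingestNodeOffsetGenerator.incrementAndGet(), ingestNodes.length)];
    }
}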
Relates #125171

@masseyke masseyke changed the title Sending BulkByScrollParallelizationHelper to different nodes to improve performance Sending slice requests to different nodes in BulkByScrollParallelizationHelper to improve performance Mar 19, 2025
@masseyke masseyke added >enhancement :Distributed Indexing/Reindex Issues relating to reindex that are not caused by issues further down auto-backport Automatically create backport pull requests when merged v8.18.1 v8.19.0 v9.0.1 labels Mar 21, 2025
@elasticsearchmachine (Collaborator)

Hi @masseyke, I've created a changelog YAML for you.

@masseyke masseyke marked this pull request as ready for review March 21, 2025 20:08
@elasticsearchmachine elasticsearchmachine added the Team:Distributed Indexing Meta label for Distributed Indexing team label Mar 21, 2025
@elasticsearchmachine (Collaborator)

Pinging @elastic/es-distributed-indexing (Team:Distributed Indexing)

@masseyke masseyke requested a review from a team March 24, 2025 18:17
@henningandersen (Contributor) left a comment

I am slightly unsure about the need for this, hope you can provide more data on it.

I would like to also have an IT demonstrating that the slices are indeed handled on different nodes.

And we may need to do a more exhaustive search for local node expectations.

DiscoveryNode ingestNode = ingestNodes[Math.floorMod(ingestNodeOffsetGenerator.incrementAndGet(), ingestNodes.length)];
logger.debug("Sending request for slice to {}", ingestNode.getName());
transportService.sendRequest(
    ingestNode,
@henningandersen (Contributor)

I think there are expectations of this running on the local node, for instance here

@masseyke (Member Author)

I didn't realize that we had a rethrottle API, and that it worked on the assumption that all subtasks were local. That would be a much bigger change to handle (I assume we'd have to put information about where each child task is running into the LeaderBulkByScrollTaskState?). So I think I'll close this for now, and maybe revisit it if we see evidence of this causing performance problems in the wild.
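Purely as an illustration of the extra bookkeeping being alluded to (nothing like this exists in the codebase; the class and method names are hypothetical): the leader task would need to remember which node each slice subtask was sent to, so that actions such as rethrottle could be routed to the right nodes.

import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical bookkeeping, not part of LeaderBulkByScrollTaskState today:
// remember which node each slice subtask was sent to, keyed by slice id.
final class SliceLocations {
    private final Map<Integer, String> nodeIdBySlice = new ConcurrentHashMap<>();

    void recordSlice(int sliceId, String nodeId) {
        nodeIdBySlice.put(sliceId, nodeId);
    }

    String nodeForSlice(int sliceId) {
        return nodeIdBySlice.get(sliceId);
    }
}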

    client.execute(action, requestForSlice, sliceListener);
} else {
    /*
     * Indexing will potentially run a pipeline for each document. If we run all slices on the same node (locally), that
@henningandersen (Contributor)

Pipelines that hog the CPU that much during reindex sound problematic; I wonder if that is worth looking into instead? Perhaps you have more detail to share (privately is good too).

@masseyke (Member Author)

It doesn't even have to be a really CPU-heavy pipeline. The mere existence of a trivial set processor, for example, means that the data has to be deserialized and reserialized, and that serialization alone slows things down a good bit (without a pipeline, the data is never deserialized before being sent to the correct node for indexing). There might be something smarter we can do on the pipeline side, but this seemed like an easy workaround to spread out that work (although, taking the comment below about rethrottling into account, it is no longer an easy workaround).
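To make the overhead concrete, here is a minimal sketch (using Jackson rather than Elasticsearch's XContent machinery, so none of these class names are what reindex actually uses): with any pipeline, even a trivial set processor, each document's source has to be parsed, mutated, and re-serialized before the shard-level index request can be built, whereas with no pipeline the source bytes are forwarded untouched.

import com.fasterxml.jackson.databind.ObjectMapper;
import java.util.Map;

public class SetProcessorCostSketch {
    private static final ObjectMapper MAPPER = new ObjectMapper();

    public static void main(String[] args) throws Exception {
        String source = "{\"user\":\"kimchy\",\"message\":\"reindex me\"}";

        // With no pipeline, these bytes would be forwarded to the shard as-is.
        // With any pipeline, even a trivial set processor, the node running the slice must:
        Map<String, Object> doc = MAPPER.readValue(source, Map.class); // 1. deserialize the source
        doc.put("migrated", true);                                     // 2. apply the processor
        String rewritten = MAPPER.writeValueAsString(doc);             // 3. re-serialize before indexing

        System.out.println(rewritten);
    }
}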

* The following is incremented in order to keep track of the current round-robin position for ingest nodes that we send sliced requests
* to. We randomize where it starts so that all nodes don't begin by sending data to the same node.
*/
private static final AtomicInteger ingestNodeOffsetGenerator = new AtomicInteger(Randomness.get().nextInt(2048));
@henningandersen (Contributor)

It would be good to get rid of the static here; perhaps this could be kept on the action instead and passed in?
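A hedged sketch of one way that could look (class, field, and constructor names are illustrative, not an actual change): the counter becomes instance state that the action owns and hands to the helper, instead of a static on BulkByScrollParallelizationHelper.

import java.util.concurrent.atomic.AtomicInteger;
import org.elasticsearch.cluster.node.DiscoveryNode;

// Illustrative only: a small picker owned by the transport action and passed into the helper.
final class SliceIngestNodePicker {
    private final DiscoveryNode[] ingestNodes;
    private final AtomicInteger offset;

    SliceIngestNodePicker(DiscoveryNode[] ingestNodes, int randomStart) {
        this.ingestNodes = ingestNodes;
        this.offset = new AtomicInteger(randomStart);
    }

    DiscoveryNode next() {
        return ingestNodes[Math.floorMod(offset.incrementAndGet(), ingestNodes.length)];
    }
}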

@masseyke (Member Author)

> I am slightly unsure about the need for this, hope you can provide more data on it.

This came out of some testing that @parkertimmins and I did. We noticed that when we ran reindex on a 10-node cluster with a very simple pipeline (setting a single field) against an index big enough to be split into many slices, one node would be pegged at 100% CPU running the pipeline, while the rest sat at a much lower utilization (maybe ~10-15%) doing the indexing.
As an experiment to see what would happen if we spread the ingest pipeline work out, we ran with the code in this PR and found that the total reindex time went down from ~11 minutes to ~3.5 minutes. It's definitely possible that there's a better way to get the same gains (for example, improving serialization/deserialization in the ingest node).

@masseyke (Member Author)

Closing because this solution is not compatible with the rethrottle action.

@masseyke masseyke closed this Apr 21, 2025