
Conversation

@idegtiarenko (Contributor)

This adds the possibility to limit the number of nodes a single query sends requests to at once.

This could be useful:

  • if we want to prevent a single query from utilizing the entire cluster's resources at once (e.g. make it slower to allow other things to happen in the meanwhile)
  • it could be a first step towards allowing certain queries to sample results from a subset of nodes and return quickly, rather than wait for all results at once

nodeToShardIds.computeIfAbsent(selectedNode, unused -> new ArrayList<>()).add(shard.shardId);

if (concurrentRequests == null || concurrentRequests.tryAcquire()) {
if (nodePermits.get(node).tryAcquire()) {
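For context, the permit check above can be sketched in isolation as follows. This is a hypothetical sketch (the class and method names are invented, not the actual DataNodeRequestSender code): a null semaphore models "no limit", and one permit is taken per in-flight node request.

```java
import java.util.concurrent.Semaphore;

// Hypothetical sketch: a null semaphore means "no limit", mirroring the
// `concurrentRequests == null || concurrentRequests.tryAcquire()` check above.
class NodeRequestLimiter {
    private final Semaphore concurrentRequests; // null when unlimited

    NodeRequestLimiter(int maxConcurrentNodes) {
        this.concurrentRequests = maxConcurrentNodes > 0 ? new Semaphore(maxConcurrentNodes) : null;
    }

    /** Try to start a request to one more node; returns false once the cap is reached. */
    boolean tryStartRequest() {
        return concurrentRequests == null || concurrentRequests.tryAcquire();
    }

    /** Release the permit once the node responded, letting the next pending node proceed. */
    void onRequestFinished() {
        if (concurrentRequests != null) {
            concurrentRequests.release();
        }
    }
}
```

With a limit of 2, the third `tryStartRequest()` fails until a prior request finishes and releases its permit.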
Contributor Author:

I wonder if instead we want to check all pending requests here before attempting a new one?
Or possibly implement a more complex strategy

@idegtiarenko idegtiarenko added >non-issue Team:Analytics Meta label for analytical engine team (ESQL/Aggs/Geo) :Analytics/ES|QL AKA ESQL labels Feb 19, 2025
@idegtiarenko idegtiarenko marked this pull request as ready for review February 19, 2025 10:48
@elasticsearchmachine (Collaborator)

Pinging @elastic/es-analytical-engine (Team:Analytics)

# Conflicts:
#	x-pack/plugin/esql/src/main/java/org/elasticsearch/xpack/esql/plugin/DataNodeComputeHandler.java
@nik9000 (Member) left a comment

Sounds neat. I'm not deep enough in the request sender to give a good response about how right it is. It'd take a ton of reading. @dnhatn, can you comment?

if (exchangeSource.isCompleted()) {
nodeListener.onResponse(new DataNodeComputeResponse(List.of(), Map.of()));
return;
}
Contributor Author:

This part prevents us from sending a query to remaining data nodes if we collected enough results

Contributor:

Nit: There's one thing here: we'll "skip" it with onSkip(), but the Sender will still continue processing all shards. From what I see, it will continue calling this after every node finishes.

Should we instead pass something to the sender so it stops calling sendRequest()? I don't think it matters computationally, but it feels like we're doing "too much" when we could short-circuit instead (?)

Member:

It looks like we won't send more requests because we do:

                if (skipRemaining) {
                    DataNodeRequestSender.this.skipRemaining = true;
                }

So we'll only count the number of shards we skip and that's it. I think.

Contributor Author:

It looks like we won't send more requests because we do:

Correct, this was added today in: a5084ab

So we'll only count the number of shards we skip and that's it. I think.

The total skipped count consists of the shards we skipped already (skippedShards) plus the shards we have not processed yet (the ones remaining in pendingShardIds), please see
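A minimal sketch of that accounting (hypothetical class and method names; the real code tracks these counters inside DataNodeRequestSender):

```java
import java.util.Set;

// Hypothetical sketch: shards skipped explicitly plus shards that were never
// attempted (still pending when the query terminated early) both count as skipped.
class SkippedShardAccounting {
    static int totalSkipped(int skippedShards, Set<?> pendingShardIds) {
        return skippedShards + pendingShardIds.size();
    }
}
```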

Member:

Since these computations should not be expensive, I wonder if we should skip only here, without short-circuiting in other places. The reason is that we might need to be more careful not to short-circuit elsewhere when allow_partial_results=true.

Contributor Author:

Sounds good. It would also simplify the change. We can always add it back later if we see it is needed.

@dnhatn (Member) left a comment

Thanks @idegtiarenko. I have some minor comments. The early-termination part will be useful, but I couldn't find a use case for limiting concurrent nodes per cluster, except for testing.

}

public boolean isCompleted() {
return completed;
Member:

Can we use buffer.isFinished here instead of adding a new variable?

this.esqlExecutor = esqlExecutor;
this.rootTask = rootTask;
this.allowPartialResults = allowPartialResults;
this.concurrentRequests = concurrentRequests > 0 ? new Semaphore(concurrentRequests) : null;
Member:

Should we initialize the Semaphore for the -1 case with new Semaphore(Integer.MAX_VALUE)?

Contributor Author:

I made it a special case so we do not keep the semaphore at all when there is no limit (the most common case). But I can do that as well.
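The two options discussed here behave the same from the caller's perspective. A sketch of the Integer.MAX_VALUE alternative (hypothetical helper name, not the merged code):

```java
import java.util.concurrent.Semaphore;

// Hypothetical sketch: instead of a nullable field, "no limit" becomes a
// semaphore with effectively infinite permits, so callers can always tryAcquire().
class SemaphoreFactory {
    static Semaphore forLimit(int maxConcurrentNodes) {
        return new Semaphore(maxConcurrentNodes > 0 ? maxConcurrentNodes : Integer.MAX_VALUE);
    }
}
```

The trade-off is one unconditional `tryAcquire()`/`release()` pair on every request versus a null check; the merged code keeps the null check so the unlimited path carries no semaphore at all.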

# Conflicts:
#	x-pack/plugin/esql/src/main/java/org/elasticsearch/xpack/esql/plugin/DataNodeRequestSender.java
#	x-pack/plugin/esql/src/test/java/org/elasticsearch/xpack/esql/plugin/DataNodeRequestSenderTests.java
@ivancea (Contributor) left a comment

LGTM!

/**
* The maximum number of nodes to be queried at once by this query. This is a safeguard to avoid overloading the cluster.
*/
public int maxConcurrentNodePerCluster() {
Contributor:

Suggested change
public int maxConcurrentNodePerCluster() {
public int maxConcurrentNodesPerCluster() {


});
safeGet(future);
assertThat(sent.size(), equalTo(5));
assertThat(maxConcurrentRequests.get(), equalTo(2));
Contributor:

I would randomize this 2 if possible.
At least, to test some edge cases, like 1 and 5.
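Randomizing the limit as suggested could look roughly like this simulation (hypothetical names, using plain java.util.Random rather than the randomizedtesting helpers the actual DataNodeRequestSenderTests use):

```java
import java.util.concurrent.Semaphore;

// Hypothetical sketch: with N pending nodes and a concurrency limit, the number
// of requests that can be started in the first batch is min(N, limit).
class ConcurrencyLimitSimulation {
    static int firstBatchSize(int pendingNodes, int limit) {
        Semaphore permits = new Semaphore(limit);
        int started = 0;
        for (int i = 0; i < pendingNodes; i++) {
            if (permits.tryAcquire()) {
                started++; // request sent; no permit is released until a node responds
            }
        }
        return started;
    }
}
```

The edge cases mentioned above fall out directly: with a limit of 1 only one request goes out at a time, and with a limit equal to the node count the limiter never blocks anything.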

assertThat(sent.size(), equalTo(2));// onResponse() + onSkip()
assertThat(response.totalShards, equalTo(5));
assertThat(response.successfulShards, equalTo(5));
assertThat(response.failedShards, equalTo(0));
Contributor Author:

I wonder if we should count skipped shards as successful or as org.elasticsearch.xpack.esql.plugin.ComputeResponse#skippedShards

@idegtiarenko idegtiarenko requested review from dnhatn and nik9000 March 10, 2025 12:10
@nik9000 (Member) left a comment

I'm good. Let's get Nhat's 👍 too.

if (skipRemaining) {
DataNodeRequestSender.this.skipRemaining = true;
}
onAfter(List.of());
Member:

I think we should clear the shard failures for shards that are skipped; otherwise, we will still report failures.
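A sketch of the cleanup being suggested (hypothetical helper; shard ids are plain strings here for brevity, not the real ShardId type):

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch: once shards are counted as skipped, drop any failures
// previously recorded for them so the response does not report both states.
class ShardFailureCleanup {
    static Map<String, Exception> withoutSkipped(Map<String, Exception> failures, Set<String> skippedShardIds) {
        Map<String, Exception> remaining = new HashMap<>(failures);
        remaining.keySet().removeAll(skippedShardIds);
        return remaining;
    }
}
```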


@idegtiarenko idegtiarenko requested a review from dnhatn March 10, 2025 17:28
@dnhatn (Member) left a comment

LGTM. Thanks @idegtiarenko

@idegtiarenko idegtiarenko merged commit 8d11dd2 into elastic:main Mar 12, 2025
17 checks passed
@idegtiarenko idegtiarenko deleted the limit_concurrent_node_requests branch March 12, 2025 09:02
albertzaharovits pushed a commit to albertzaharovits/elasticsearch that referenced this pull request Mar 13, 2025
jfreden pushed a commit to jfreden/elasticsearch that referenced this pull request Mar 13, 2025
idegtiarenko added a commit that referenced this pull request May 6, 2025

Labels

:Analytics/ES|QL AKA ESQL >non-issue Team:Analytics Meta label for analytical engine team (ESQL/Aggs/Geo) v8.19.0 v9.1.0
