
Commit 36c2d4a

hellcatCS and Aleksandr Semikin authored
[rollout] fix: correct heap-based load balancing in AsyncLLMServerManager (verl-project#4505)
### What does this PR do?

This PR fixes a load-balancing issue in `AsyncLLMServerManager`: the heap-based server selection broke ties with hash values instead of list indices, so the server selection order after shuffling was unpredictable.

Problem:

- The original implementation used `hash(server)` as the secondary sort key in the heap.
- When all servers had the same request count (0), the heap selected the server with the smallest hash value, not the first server in the shuffled list.
- This resulted in poor load distribution and defeated the purpose of random shuffling.

Solution:

- Replace `hash(server)` with explicit indices in the heap entries.
- The heap now orders entries by `[request_count, index, server]` instead of `[request_count, (hash, server)]`.
- Selection is deterministic: when request counts are equal, the server with the lowest index (the first in the shuffled list) is always chosen.

Example:

```python
# Before (❌ Broken):
server_handles = [Server_A, Server_B, Server_C]
random.shuffle(server_handles)  # e.g. → [Server_C, Server_A, Server_B]
weighted_servers = [[0, (hash(s), s)] for s in server_handles]
# Heap might select Server_A first (min hash), not Server_C!

# After (✅ Fixed):
server_handles = [Server_A, Server_B, Server_C]
random.shuffle(server_handles)  # e.g. → [Server_C, Server_A, Server_B]
weighted_servers = [[0, idx, s] for idx, s in enumerate(server_handles)]
# Heap correctly selects Server_C first (idx=0)
```

Co-authored-by: Aleksandr Semikin <aesemikin@alice-a100.sas.yp-c.yandex.net>
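The tie-breaking difference described in the PR can be reproduced with plain Python. In this sketch `FakeServer` is a hypothetical stand-in for a Ray actor handle, given a fixed `__hash__` so the outcome is deterministic, and the shuffled order `[C, A, B]` is hard-coded instead of calling `random.shuffle`.

```python
import heapq


class FakeServer:
    """Hypothetical stand-in for a ray.actor.ActorHandle (illustration only)."""

    def __init__(self, name: str, fake_hash: int):
        self.name = name
        self._fake_hash = fake_hash

    def __hash__(self) -> int:
        # Fixed hash so the example is deterministic.
        return self._fake_hash

    def __repr__(self) -> str:
        return self.name


a, b, c = FakeServer("A", 30), FakeServer("B", 10), FakeServer("C", 20)
shuffled = [c, a, b]  # pretend random.shuffle produced this order

# Before: [count, (hash, server)]; ties on count are broken by hash value.
before = [[0, (hash(s), s)] for s in shuffled]
heapq.heapify(before)
print(before[0][1][1])  # prints B: smallest hash, not the first shuffled server

# After: [count, index, server]; ties on count are broken by shuffled position.
after = [[0, idx, s] for idx, s in enumerate(shuffled)]
heapq.heapify(after)
print(after[0][2])  # prints C: index 0, the first shuffled server
```

Because Python compares lists element by element, the second field decides every tie on request count, which is exactly why swapping the hash for the index changes which server is picked first.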
1 parent: ec14a87

1 file changed: +2 -2 lines

verl/experimental/agent_loop/agent_loop.py

Lines changed: 2 additions & 2 deletions
```diff
@@ -70,7 +70,7 @@ def __init__(self, config: DictConfig, server_handles: list[ray.actor.ActorHandl
         random.shuffle(self.server_handles)
 
         # Least requests load balancing
-        self.weighted_serveres = [[0, (hash(server), server)] for server in server_handles]
+        self.weighted_serveres = [[0, idx, server] for idx, server in enumerate(self.server_handles)]
         heapq.heapify(self.weighted_serveres)
 
         # LRU cache to map request_id to server
```
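A side effect of the new entry layout, not stated in the PR but following from how Python compares lists: because indices are unique, every tie on request count is settled at the index, so the heap never has to compare two server handles. The minimal sketch below uses plain `object()` instances, which define no `<`, as hypothetical stand-ins for handles; it makes no assumption about whether a real `ActorHandle` defines an ordering.

```python
import heapq

# Hypothetical stand-ins for actor handles: plain objects define no `<`.
handles = [object() for _ in range(4)]

# Fixed layout: [request_count, index, server]. Unique indices settle every
# tie on request_count, so heapify never compares two handles.
weighted = [[0, idx, h] for idx, h in enumerate(handles)]
heapq.heapify(weighted)

# Entries built in index order with equal counts already form a valid
# min-heap, so heapify leaves the order unchanged.
assert [idx for _, idx, _ in weighted] == [0, 1, 2, 3]

# Old layout: [request_count, (hash, server)]. If two hashes were ever equal,
# the comparison would fall through to the handles themselves.
old_style_raises = False
try:
    _ = (0, handles[0]) < (0, handles[1])
except TypeError:
    old_style_raises = True
print("old-style tie raises TypeError:", old_style_raises)  # prints ... True
```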
```diff
@@ -81,7 +81,7 @@ def _choose_server(self, request_id: str) -> ray.actor.ActorHandle:
         if request_id in self.request_id_to_server:
             return self.request_id_to_server[request_id]
 
-        server = self.weighted_serveres[0][1][1]
+        _, _, server = self.weighted_serveres[0]
         self.weighted_serveres[0][0] += 1
         heapq.heapreplace(self.weighted_serveres, self.weighted_serveres[0])
         self.request_id_to_server[request_id] = server
```
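To see how the fixed `_choose_server` distributes requests, here is a simplified mirror of its selection logic. `LeastRequestsBalancer` is hypothetical and not part of verl: it uses a plain `dict` where the source mentions an LRU cache, uses strings in place of actor handles, and never decrements counts, since the diff does not show where that happens.

```python
import heapq


class LeastRequestsBalancer:
    """Hypothetical, simplified mirror of the selection logic in the diff."""

    def __init__(self, server_handles):
        # [request_count, index, server]; index breaks ties by list position.
        self.weighted_servers = [[0, idx, s] for idx, s in enumerate(server_handles)]
        heapq.heapify(self.weighted_servers)
        # Plain dict here; the source uses an LRU cache for this mapping.
        self.request_id_to_server = {}

    def choose_server(self, request_id):
        # Sticky routing: a known request_id goes back to the same server.
        if request_id in self.request_id_to_server:
            return self.request_id_to_server[request_id]
        # The heap root is the least-loaded server (lowest index on ties).
        _, _, server = self.weighted_servers[0]
        self.weighted_servers[0][0] += 1
        # Re-seat the mutated root so the heap invariant holds again.
        heapq.heapreplace(self.weighted_servers, self.weighted_servers[0])
        self.request_id_to_server[request_id] = server
        return server


balancer = LeastRequestsBalancer(["C", "A", "B"])  # already "shuffled"
picks = [balancer.choose_server(f"req-{i}") for i in range(5)]
print(picks)  # prints ['C', 'A', 'B', 'C', 'A']
print(balancer.choose_server("req-0"))  # prints C (cached assignment)
```

With all counts tied at zero, the first three requests follow the shuffled order exactly; after that, each new request goes to the server with the fewest assignments, falling back to the lowest index on ties.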
