[rollout] fix: correct heap-based load balancing in AsyncLLMServerManager (verl-project#4505)

hellcatCS · Aleksandr Semikin · web-flow · commit 36c2d4a1a70c · 2025-12-15T23:03:19.000+08:00
### What does this PR do?

This PR fixes the load balancing issue in AsyncLLMServerManager where
the heap-based server selection was using hash values instead of
indices, causing unpredictable server selection order after shuffling.

Problem:

- The original implementation used hash(server) as the secondary sort key
  in the heap.
- When all servers had the same request count (0), the heap would select
  the server with the minimum hash value, not the first server in the
  shuffled list.
- This resulted in poor load distribution and defeated the purpose of
  random shuffling.
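
The tie-breaking behaviour described above can be reproduced in isolation. The following is a minimal sketch, not code from this repository: `FakeServer` is a hypothetical stand-in for a ray actor handle, and its hash values are hand-picked so the outcome is predictable.

```python
import heapq


class FakeServer:
    """Hypothetical stand-in for an actor handle with a hand-picked hash."""

    def __init__(self, name, fixed_hash):
        self.name = name
        self._fixed_hash = fixed_hash

    def __hash__(self):
        return self._fixed_hash


# Pretend this is the list after random.shuffle().
shuffled = [FakeServer("C", 30), FakeServer("A", 10), FakeServer("B", 20)]

# Old key: (request_count, hash). With equal counts, the smallest hash wins.
by_hash = [[0, hash(s), s] for s in shuffled]
heapq.heapify(by_hash)
first_by_hash = by_hash[0][2].name

# New key: (request_count, index). With equal counts, the shuffled order wins.
by_index = [[0, idx, s] for idx, s in enumerate(shuffled)]
heapq.heapify(by_index)
first_by_index = by_index[0][2].name

print(first_by_hash, first_by_index)  # A C
```

With the hash key, the root of the heap is "A" even though "C" is first in the shuffled list; with the index key, the root is "C".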

Solution:

- Replace hash(server) with explicit indices in the heap structure.
- Heap now sorts by (request_count, index, server) instead of
  (request_count, hash, server).
- Ensures deterministic selection: when request counts are equal, the
  server with the lowest index (first in the shuffled list) is always
  chosen.
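
As a rough illustration of how the (request_count, index, server) key behaves over several requests, here is a simplified sketch modelled on the selection logic in the diff. The helper names `make_heap` and `choose` are hypothetical, plain strings stand in for server handles, and the request-id cache is omitted.

```python
import heapq


def make_heap(servers):
    """Build a min-heap of [request_count, index, server] entries."""
    heap = [[0, idx, server] for idx, server in enumerate(servers)]
    heapq.heapify(heap)
    return heap


def choose(heap):
    """Return the least-loaded server and bump its request count."""
    _, _, server = heap[0]
    heap[0][0] += 1
    # Re-insert the updated root so the heap invariant is restored.
    heapq.heapreplace(heap, heap[0])
    return server


# Assume the shuffle produced this order.
heap = make_heap(["Server_C", "Server_A", "Server_B"])
picks = [choose(heap) for _ in range(6)]
print(picks)
# ['Server_C', 'Server_A', 'Server_B', 'Server_C', 'Server_A', 'Server_B']
```

Because each index is unique, the comparison never falls through to the server object itself, so the handles do not need to be orderable.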

Example: 

```python
# Before (❌ Broken):
server_handles = [Server_A, Server_B, Server_C]
random.shuffle(server_handles)  # → [Server_C, Server_A, Server_B]
weighted_servers = [[0, hash(s), s] for s in server_handles]
# Heap might select Server_A first (min hash), not Server_C!

# After (✅ Fixed):
server_handles = [Server_A, Server_B, Server_C]
random.shuffle(server_handles)  # → [Server_C, Server_A, Server_B]
weighted_servers = [[0, idx, s] for idx, s in enumerate(server_handles)]
# Heap correctly selects Server_C first (idx=0)
```

Co-authored-by: Aleksandr Semikin <aesemikin@alice-a100.sas.yp-c.yandex.net>
diff --git a/verl/experimental/agent_loop/agent_loop.py b/verl/experimental/agent_loop/agent_loop.py
@@ -70,7 +70,7 @@ def __init__(self, config: DictConfig, server_handles: list[ray.actor.ActorHandl
         random.shuffle(self.server_handles)
 
         # Least requests load balancing
-        self.weighted_serveres = [[0, (hash(server), server)] for server in server_handles]
+        self.weighted_serveres = [[0, idx, server] for idx, server in enumerate(self.server_handles)]
         heapq.heapify(self.weighted_serveres)
 
         # LRU cache to map request_id to server
@@ -81,7 +81,7 @@ def _choose_server(self, request_id: str) -> ray.actor.ActorHandle:
         if request_id in self.request_id_to_server:
             return self.request_id_to_server[request_id]
 
-        server = self.weighted_serveres[0][1][1]
+        _, _, server = self.weighted_serveres[0]
         self.weighted_serveres[0][0] += 1
         heapq.heapreplace(self.weighted_serveres, self.weighted_serveres[0])
         self.request_id_to_server[request_id] = server