[Serve] O(1) replica lookup in record_request_routing_info via ReplicaStateContainer index#60842

Open
abrarsheikh wants to merge 2 commits into master from 60680-abrar-request_route

@abrarsheikh abrarsheikh commented Feb 8, 2026

record_request_routing_info() is called once per replica on every control-loop tick. The previous implementation found the target replica by scanning the list returned by ReplicaStateContainer.get(), and get() itself concatenates every per-state list, which is already O(R). For R replicas, each call therefore cost O(R), and the R calls made in a single tick cost O(R^2) in aggregate.

What

Add a _replica_id_index: Dict[ReplicaID, DeploymentReplica] to ReplicaStateContainer, maintained on add() and pop(). A new get_by_id() method provides O(1) lookup. record_request_routing_info() now uses it instead of the linear scan.
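The shape of the change can be sketched as follows. This is a simplified, hypothetical stand-in rather than the actual Ray Serve code: the ReplicaID and DeploymentReplica dataclasses, the string-valued states, and the pop() signature are assumptions made so the example is self-contained. Only the names _replica_id_index, add(), pop(), get(), and get_by_id() come from the description above.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass(frozen=True)
class ReplicaID:
    """Hypothetical hashable replica identifier."""
    unique_id: str


@dataclass
class DeploymentReplica:
    """Hypothetical minimal replica record."""
    replica_id: ReplicaID
    state: str


class ReplicaStateContainer:
    """Simplified container: per-state lists plus an id -> replica index."""

    def __init__(self) -> None:
        self._replicas: Dict[str, List[DeploymentReplica]] = defaultdict(list)
        self._replica_id_index: Dict[ReplicaID, DeploymentReplica] = {}

    def add(self, state: str, replica: DeploymentReplica) -> None:
        # Keep the index in sync with the per-state lists on every add.
        replica.state = state
        self._replicas[state].append(replica)
        self._replica_id_index[replica.replica_id] = replica

    def pop(self, states: Optional[List[str]] = None) -> List[DeploymentReplica]:
        # Remove all replicas in the given states (all states if None),
        # and drop them from the index as well.
        states = list(self._replicas) if states is None else states
        popped: List[DeploymentReplica] = []
        for state in states:
            popped.extend(self._replicas.pop(state, []))
        for replica in popped:
            self._replica_id_index.pop(replica.replica_id, None)
        return popped

    def get(self) -> List[DeploymentReplica]:
        # O(R): concatenates every per-state list.
        return [r for replicas in self._replicas.values() for r in replicas]

    def get_by_id(self, replica_id: ReplicaID) -> Optional[DeploymentReplica]:
        # O(1) average case: a single dict lookup.
        return self._replica_id_index.get(replica_id)
```

The key invariant is that every path that adds or removes a replica also updates the index; any mutation that bypasses add() or pop() would leave get_by_id() stale.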

Benchmark results

Micro-benchmark averaging 2,000 random lookups per replica count:

     R      old (us)    new (us)   speedup   index memory
    16        2.41        0.23      10x          640 B
    64        7.18        0.24      29x        2,272 B
   256       25.83        0.27      94x        9,312 B
  1024       99.11        0.33     298x       36,960 B
  4096      404.21        0.44     915x      147,552 B
  • Latency: old grows roughly linearly with R; new stays nearly flat (0.23 to 0.44 us). 915x faster at 4096 replicas.
  • Memory: ~36 bytes per replica at the larger sizes; about 148 KB at 4096 replicas, which is negligible.
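A comparison of this kind could be reproduced with a small script along the following lines. This is only a sketch under stated assumptions, not the script that produced the table above: replicas are modeled as plain strings, the state names are hypothetical, and absolute timings will differ by machine.

```python
import random
import time
from typing import Callable, Dict, List, Optional, Tuple

STATES = ["STARTING", "RUNNING", "STOPPING"]  # hypothetical state names


def make_replicas(n: int) -> Dict[str, List[str]]:
    """Spread n string replica ids across the per-state lists."""
    per_state: Dict[str, List[str]] = {s: [] for s in STATES}
    for i in range(n):
        per_state[STATES[i % len(STATES)]].append(f"replica-{i}")
    return per_state


def scan_lookup(per_state: Dict[str, List[str]], replica_id: str) -> Optional[str]:
    """Old approach: concatenate every per-state list, then scan linearly."""
    all_replicas = [r for replicas in per_state.values() for r in replicas]
    for replica in all_replicas:
        if replica == replica_id:
            return replica
    return None


def avg_us(fn: Callable[[str], Optional[str]], ids: List[str]) -> float:
    """Average wall-clock time per call, in microseconds."""
    start = time.perf_counter()
    for replica_id in ids:
        fn(replica_id)
    return (time.perf_counter() - start) / len(ids) * 1e6


def run(n: int, lookups: int = 2000, seed: int = 0) -> Tuple[float, float]:
    """Return (old_us, new_us) for n replicas and `lookups` random lookups."""
    rng = random.Random(seed)
    per_state = make_replicas(n)
    # New approach: a dict index built once, then O(1) lookups.
    index = {r: r for replicas in per_state.values() for r in replicas}
    ids = [f"replica-{rng.randrange(n)}" for _ in range(lookups)]
    old = avg_us(lambda rid: scan_lookup(per_state, rid), ids)
    new = avg_us(index.get, ids)
    return old, new


if __name__ == "__main__":
    for n in (16, 64, 256, 1024, 4096):
        old, new = run(n)
        print(f"{n:>6} {old:>10.2f} {new:>10.2f}")
```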

Related to #60680

…ntainer index

Signed-off-by: abrar <abrar@anyscale.com>
@abrarsheikh abrarsheikh requested a review from a team as a code owner February 8, 2026 02:32
@abrarsheikh abrarsheikh added the "go" label (add ONLY when ready to merge, run all tests) Feb 8, 2026
@gemini-code-assist gemini-code-assist bot left a comment

Code Review

This pull request introduces a significant performance optimization for replica lookups in record_request_routing_info. By adding a _replica_id_index to ReplicaStateContainer, the lookup time is reduced from O(R) to O(1), which is a great improvement, especially for deployments with a large number of replicas. The implementation is clean, and the index is correctly maintained during add and pop operations. The changes are well-contained and easy to follow. My only suggestion is to add a unit test for the new get_by_id method to ensure its long-term correctness.
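The suggested test might look something like the sketch below. FakeContainer is a hypothetical stand-in keyed by string ids, used only so the example runs on its own; a real test would presumably exercise ReplicaStateContainer with the project's existing test fixtures, whose details are not shown in this PR.

```python
from typing import Dict, List, Optional


class FakeContainer:
    """Hypothetical stand-in for ReplicaStateContainer, keyed by string ids."""

    def __init__(self) -> None:
        self._by_state: Dict[str, List[str]] = {}
        self._index: Dict[str, str] = {}

    def add(self, state: str, replica_id: str) -> None:
        self._by_state.setdefault(state, []).append(replica_id)
        self._index[replica_id] = replica_id

    def pop(self, state: str) -> List[str]:
        popped = self._by_state.pop(state, [])
        for replica_id in popped:
            del self._index[replica_id]
        return popped

    def get_by_id(self, replica_id: str) -> Optional[str]:
        return self._index.get(replica_id)


def test_get_by_id_returns_added_replica() -> None:
    container = FakeContainer()
    container.add("RUNNING", "r1")
    assert container.get_by_id("r1") == "r1"


def test_get_by_id_returns_none_for_unknown_id() -> None:
    container = FakeContainer()
    assert container.get_by_id("missing") is None


def test_get_by_id_returns_none_after_pop() -> None:
    container = FakeContainer()
    container.add("RUNNING", "r1")
    container.add("STOPPING", "r2")
    assert container.pop("RUNNING") == ["r1"]
    # The popped replica leaves the index; the other one stays.
    assert container.get_by_id("r1") is None
    assert container.get_by_id("r2") == "r2"
```

The third case is the one most likely to catch a regression, since it checks that removal keeps the index and the per-state lists consistent.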

Signed-off-by: abrar <abrar@anyscale.com>