Commit a05cf74

[Blog] SGLang router integration and disaggregated inference roadmap
Added multi-node replicas to roadmap
1 parent ea6a5e6 commit a05cf74

1 file changed, 1 addition, 0 deletions

docs/blog/posts/sglang-router.md

Lines changed: 1 addition & 0 deletions
@@ -161,6 +161,7 @@ Looking ahead, this integration also shapes our roadmap. Over the coming release
 
 * Enabling prefill and decode worker separation for full disaggregation (today, only standard workers are supported).
 * Introducing auto-scaling based on TTFT (Time to First Token) and ITL (Inter-Token Latency), complementing the current requests-per-second scaling metric.
+* Supporting multi-node replicas, enabling a single replica to span multiple nodes instead of being limited to one.
 * Extending native support to more emerging inference stacks.
 
 ## What's next?
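
For readers unfamiliar with the two latency metrics named in the roadmap, the sketch below shows one common way to compute TTFT and ITL from per-token arrival timestamps, and how a scaler might compare them against targets. It is only an illustration, not the project's implementation: the function names, the use of a mean for ITL, and the threshold values are all assumptions.

```python
from statistics import mean


def ttft(request_sent: float, token_times: list[float]) -> float:
    """Time to First Token: delay between sending the request and the first token."""
    if not token_times:
        raise ValueError("no tokens were generated")
    return token_times[0] - request_sent


def itl(token_times: list[float]) -> float:
    """Inter-Token Latency: mean gap between consecutive tokens (an assumed definition)."""
    if len(token_times) < 2:
        raise ValueError("need at least two tokens to measure a gap")
    gaps = [later - earlier for earlier, later in zip(token_times, token_times[1:])]
    return mean(gaps)


def needs_scale_up(
    ttft_s: float,
    itl_s: float,
    ttft_target_s: float = 0.5,   # hypothetical target, not taken from the source
    itl_target_s: float = 0.05,   # hypothetical target, not taken from the source
) -> bool:
    """Return True when either latency metric exceeds its target."""
    return ttft_s > ttft_target_s or itl_s > itl_target_s
```

With timestamps in seconds, `ttft(0.0, [0.25, 0.5, 0.75, 1.0])` measures the wait before the first token, while `itl([0.25, 0.5, 0.75, 1.0])` averages the gaps between the remaining tokens. Unlike requests per second, both metrics reflect what a single user experiences.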
