[TRTLLM-10029][scheduler] Re-implement MicroBatchScheduler and CapacityScheduler in Python #10273
base: main
Signed-off-by: junq <[email protected]>
Signed-off-by: Lanyu Liao <[email protected]>
/bot run --disable-fail-fast
PR_Github #29796 [ run ] triggered by Bot. Commit:
📝 Walkthrough
These changes extend the Python bindings for the C++ `GenLlmReq` and `KVCacheManager` classes, add an environment variable that enables Python-based scheduling, and introduce a Python scheduling framework with capacity and micro-batch scheduling policies as an alternative to the C++ scheduler components.
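A minimal sketch of how an opt-in environment variable like this might select between the two implementations. It assumes the flag is read once at startup and that only the value `"1"` enables the Python path; `use_python_scheduler` and `select_scheduler_impl` are hypothetical helpers, not code from this PR.

```python
import os
from typing import Mapping, Optional


def use_python_scheduler(env: Optional[Mapping[str, str]] = None) -> bool:
    """Return True if TLLM_USE_PYTHON_SCHEDULER is set to "1" (assumed semantics)."""
    env = os.environ if env is None else env
    return env.get("TLLM_USE_PYTHON_SCHEDULER", "0") == "1"


def select_scheduler_impl(env: Optional[Mapping[str, str]] = None) -> str:
    # Hypothetical dispatch: the C++ scheduler stays the default unless the flag is set.
    return "python" if use_python_scheduler(env) else "cpp"


print(select_scheduler_impl({"TLLM_USE_PYTHON_SCHEDULER": "1"}))  # python
print(select_scheduler_impl({}))  # cpp
```

Passing an explicit mapping keeps the check testable without mutating the process environment.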
Sequence Diagram(s)

```mermaid
sequenceDiagram
    autonumber
    participant Executor as Executor
    participant Sched as SimpleUnifiedScheduler
    participant Capacity as PyCapacityScheduler
    participant MicroBatch as PyMicroBatchScheduler
    participant KVCache as KVCacheManager
    participant Policy as SchedulerPolicy

    Executor->>Sched: schedule(pending_requests, running_requests, kv_cache_manager)
    activate Sched

    Sched->>Capacity: schedule(pending, running, kv_cache_manager)
    activate Capacity
    Capacity->>Policy: get_new_request_ids(pending)
    activate Policy
    Policy->>Capacity: filtered_request_ids
    deactivate Policy
    loop For each candidate request
        Capacity->>KVCache: find_new_context_block(unique_tokens, request)
        KVCache->>Capacity: context_block_info
        Capacity->>Capacity: fit_request_to_blocks()
    end
    Capacity->>Sched: scheduled_requests, paused_requests
    deactivate Capacity

    Sched->>MicroBatch: schedule(scheduled_requests, kv_cache_manager)
    activate MicroBatch
    MicroBatch->>MicroBatch: compute_chunk_sizes()
    rect rgb(200, 220, 255)
        note right of MicroBatch: Encoder phase
        MicroBatch->>KVCache: scheduling_has_free_blocks()
        KVCache->>MicroBatch: has_free
    end
    rect rgb(220, 240, 220)
        note right of MicroBatch: Context phase
        MicroBatch->>MicroBatch: select_requests_for_context()
    end
    rect rgb(255, 240, 200)
        note right of MicroBatch: Generation phase
        MicroBatch->>MicroBatch: select_requests_for_generation()
    end
    MicroBatch->>Sched: SchedulerOutput (batches, tokens)
    deactivate MicroBatch

    Sched->>Executor: SchedulerOutput
    deactivate Sched
```
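The two-stage flow in the diagram (capacity admission, then micro-batching) can be sketched in plain Python. This is a toy model under stated assumptions, not the PR's `PyCapacityScheduler` or `PyMicroBatchScheduler`. The `Request` record is hypothetical and needs a fixed number of KV-cache blocks. Admission is strict first-come-first-served with no eviction. Generation requests cost one token each (beam width 1) and are budgeted before context chunks are sized. `max_batch_size` and `max_num_tokens` are illustrative limits.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Request:
    # Hypothetical request record; field names are illustrative only.
    req_id: int
    prompt_len: int      # prompt tokens still to prefill (0 once generating)
    blocks_needed: int   # KV-cache blocks the request must reserve


def capacity_schedule(running: List[Request], pending: List[Request],
                      free_blocks: int) -> Tuple[List[Request], List[Request]]:
    """Keep running requests, then admit pending ones in arrival order.

    Running requests are assumed to already hold their blocks. Admission
    stops at the first pending request that does not fit (strict FCFS).
    """
    scheduled = list(running)
    for i, req in enumerate(pending):
        if req.blocks_needed > free_blocks:
            return scheduled, pending[i:]
        free_blocks -= req.blocks_needed
        scheduled.append(req)
    return scheduled, []


def micro_batch_schedule(requests: List[Request], max_batch_size: int,
                         max_num_tokens: int) -> Tuple[List[Tuple[int, int]], List[int]]:
    """Split admitted requests into (req_id, chunk_len) context work and generation ids."""
    context: List[Tuple[int, int]] = []
    generation: List[int] = []
    budget = max_num_tokens
    # Generation steps first: one token per request under the batch-size cap.
    for req in requests:
        if req.prompt_len == 0 and len(generation) < max_batch_size and budget >= 1:
            generation.append(req.req_id)
            budget -= 1
    # Context requests then receive chunks of whatever token budget is left.
    for req in requests:
        if req.prompt_len > 0 and len(context) + len(generation) < max_batch_size and budget > 0:
            chunk = min(req.prompt_len, budget)
            context.append((req.req_id, chunk))
            budget -= chunk
    return context, generation


running = [Request(0, 0, 2), Request(1, 0, 2)]
pending = [Request(2, 6, 3), Request(3, 10, 4)]
scheduled, deferred = capacity_schedule(running, pending, free_blocks=5)
ctx, gen = micro_batch_schedule(scheduled, max_batch_size=4, max_num_tokens=5)
print(ctx, gen, [r.req_id for r in deferred])  # [(2, 3)] [0, 1] [3]
```

Request 3 stays queued because it needs 4 blocks and only 2 remain. Request 2's 6-token prompt is chunked down to the 3 tokens left after the two generation steps.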
Estimated code review effort: 🎯 4 (Complex) | ⏱️ ~50 minutes
Pre-merge checks and finishing touches
❌ Failed checks (2 warnings)
✅ Passed checks (1 passed)
PR_Github #29985 [ run ] triggered by Bot. Commit:
PR_Github #29985 [ run ] completed with state
Signed-off-by: Lance Liao <[email protected]>
/bot run --disable-fail-fast
PR_Github #29996 [ run ] triggered by Bot. Commit:
PR_Github #29996 [ run ] completed with state
/bot run --disable-fail-fast
PR_Github #30006 [ run ] triggered by Bot. Commit:
PR_Github #30006 [ run ] completed with state
/bot run --disable-fail-fast
PR_Github #30008 [ run ] triggered by Bot. Commit:
PR_Github #30008 [ run ] completed with state
/bot run --disable-fail-fast
PR_Github #30009 [ run ] triggered by Bot. Commit:
PR_Github #30009 [ run ] completed with state
/bot run --disable-fail-fast
PR_Github #30012 [ run ] triggered by Bot. Commit:
PR_Github #30012 [ run ] completed with state
/bot run --disable-fail-fast
PR_Github #30013 [ run ] triggered by Bot. Commit:
/LLM/main/L0_MergeRequest_PR pipeline #23090: CI passed with the Python scheduler enabled. The latest commit removes the environment variables, so the C++ scheduler is used by default (no impact on the main branch).
PR_Github #30013 [ run ] completed with state
/bot run --disable-fail-fast
PR_Github #30019 [ run ] triggered by Bot. Commit:
PR_Github #30019 [ run ] completed with state
/bot run --disable-fail-fast
PR_Github #30026 [ run ] triggered by Bot. Commit:
PR_Github #30026 [ run ] completed with state
As titled. This PR is the first step in refactoring the scheduler; we will use it to analyze the host overhead of the Python scheduler.
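A minimal harness for the host-overhead analysis might look like the following. It assumes each scheduler step can be wrapped in a zero-argument callable and that wall-clock CPU time per call is the metric of interest. `measure_host_overhead_us` is hypothetical and not part of this PR.

```python
import statistics
import time
from typing import Callable, Dict


def measure_host_overhead_us(schedule_fn: Callable[[], object], iters: int = 1000,
                             warmup: int = 10) -> Dict[str, float]:
    """Time repeated calls of a scheduling callable; return median and p99 in microseconds.

    Warmup calls are discarded so one-time costs (lazy imports, caches)
    do not skew the measurement.
    """
    for _ in range(warmup):
        schedule_fn()
    samples = []
    for _ in range(iters):
        start = time.perf_counter_ns()
        schedule_fn()
        samples.append((time.perf_counter_ns() - start) / 1_000)
    samples.sort()
    p99 = samples[min(len(samples) - 1, int(0.99 * len(samples)))]
    return {"median_us": statistics.median(samples), "p99_us": p99}


# Stand-in workload; a real comparison would wrap the Python and C++
# scheduler calls on identical request sets.
stats = measure_host_overhead_us(lambda: sorted(range(1000)), iters=200)
print(stats)
```

Reporting a tail percentile alongside the median matters here, because sporadic slow scheduling calls stall the whole iteration even when the typical call is cheap.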
Summary by CodeRabbit
Release Notes
New Features
… `TLLM_USE_PYTHON_SCHEDULER=1` environment variable) with policy-based scheduling strategies.
Chores