
Conversation

@lancelly
Collaborator

@lancelly commented Dec 24, 2025

As titled. This PR is our first step in refactoring the scheduler. We will analyze the host overhead of the Python scheduler based on this PR.

Summary by CodeRabbit

Release Notes

  • New Features

    • Introduced a Python-based unified scheduler option (enabled via the TLLM_USE_PYTHON_SCHEDULER=1 environment variable) with policy-based scheduling strategies.
    • Exposed new methods for querying unique tokens and encoder tokens.
    • Added cache block scheduling utilities for improved memory management.
  • Chores

    • Updated initialization to support Python scheduler configuration.


QiJune and others added 22 commits December 17, 2025 13:37
@lancelly requested review from a team as code owners December 24, 2025 09:20
@lancelly requested a review from HuiGao-NV December 24, 2025 09:20
@lancelly
Collaborator Author

/bot run --disable-fail-fast

@lancelly requested review from QiJune and litaotju December 24, 2025 09:22
@tensorrt-cicd
Collaborator

PR_Github #29796 [ run ] triggered by Bot. Commit: 411c254

@coderabbitai
Contributor

coderabbitai bot commented Dec 24, 2025

📝 Walkthrough

These changes extend Python bindings for GenLlmReq and KVCacheManager C++ classes, add an environment variable to enable Python-based scheduling, and introduce a comprehensive Python scheduling framework with capacity and micro-batch scheduling policies as an alternative to C++ scheduler components.

Changes

C++ Pybind/Nanobind Bindings (cpp/tensorrt_llm/pybind/batch_manager/bindings.cpp, cpp/tensorrt_llm/nanobind/batch_manager/bindings.cpp): Expose new GenLlmReq Python methods get_unique_tokens(beam) and get_unique_tokens(), plus get_encoder_unique_tokens() returning an optional VecUniqueTokens, and adjust the binding chain on use_draft_model to enable additional chained bindings.
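
A rough Python-side sketch of how the newly exposed methods might be called; `req` stands for any bound request object obtained from the batch manager, and everything except the three get_*_tokens methods is hypothetical:

```python
# Illustrative only: exercising the newly bound GenLlmReq methods from Python.
# `req` is assumed to be a bound request object; the function itself is not from this PR.
def collect_unique_tokens(req, beam: int = 0):
    per_beam_tokens = req.get_unique_tokens(beam)     # overload taking a beam index
    all_tokens = req.get_unique_tokens()              # overload without arguments
    encoder_tokens = req.get_encoder_unique_tokens()  # optional VecUniqueTokens; may be None
    return per_beam_tokens, all_tokens, encoder_tokens
```
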
KV Cache Manager Bindings (cpp/tensorrt_llm/pybind/batch_manager/kvCacheManager.cpp, cpp/tensorrt_llm/nanobind/batch_manager/kvCacheManager.cpp): Add Python bindings for find_new_context_block(unique_tokens, llm_request) on BaseKVCacheManager and scheduling_has_free_blocks(num_required, window_size) on KVCacheManager, both delegating to the underlying C++ implementations.
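
A hedged sketch of the kind of query a Python scheduler could issue through these bindings; every name other than the two bound methods is a placeholder, and the return-type handling is an assumption:

```python
# Illustrative only: querying the KV cache manager through the new bindings.
# `kv_cache_manager`, `request`, `unique_tokens`, `num_required_blocks`, and
# `window_size` are placeholder names, not identifiers from the PR.
def kv_cache_admission_check(kv_cache_manager, request, unique_tokens,
                             num_required_blocks, window_size):
    # Delegates to the underlying C++ implementation (per the binding description above).
    context_block = kv_cache_manager.find_new_context_block(unique_tokens, request)

    # Checks whether enough free blocks remain for the given attention window.
    has_free = kv_cache_manager.scheduling_has_free_blocks(num_required_blocks,
                                                           window_size)
    return context_block, has_free
```
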
Scheduler Initialization & Configuration (tensorrt_llm/__init__.py, tensorrt_llm/_torch/pyexecutor/_util.py): Set the TLLM_USE_PYTHON_SCHEDULER=1 environment variable on startup, and add conditional logic in create_py_executor_instance that selects SimpleUnifiedScheduler when the flag is enabled and otherwise retains the existing C++ scheduler selection.
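
A minimal sketch of the opt-in check described above; the function name and factory callables are hypothetical, and the actual branching inside create_py_executor_instance may be shaped differently:

```python
import os

def select_scheduler(make_cpp_scheduler, make_python_scheduler):
    # Opt-in path added by this PR: TLLM_USE_PYTHON_SCHEDULER=1 selects the Python
    # SimpleUnifiedScheduler; anything else keeps the existing C++ scheduler path.
    if os.environ.get("TLLM_USE_PYTHON_SCHEDULER") == "1":
        return make_python_scheduler()
    return make_cpp_scheduler()
```
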
Python Scheduling Framework (tensorrt_llm/_torch/pyexecutor/scheduler.py): Introduce a comprehensive Python-based scheduling system: PyCapacityScheduler (an orchestrator with policy-based fitting), PyMicroBatchScheduler (encoder/context/generation batching), and SimpleUnifiedScheduler (a composite runner). Add SchedulerPolicyBase with MaxRequestsPolicy, GuaranteedNoEvictPolicy, and MaxUtilizationPolicy implementations, block-tracking managers, a ChunkingPolicy enum, and state/prioritization logic mirroring the C++ behavior.

Sequence Diagram(s)

sequenceDiagram
    autonumber
    participant Executor as Executor
    participant Sched as SimpleUnifiedScheduler
    participant Capacity as PyCapacityScheduler
    participant MicroBatch as PyMicroBatchScheduler
    participant KVCache as KVCacheManager
    participant Policy as SchedulerPolicy

    Executor->>Sched: schedule(pending_requests, running_requests, kv_cache_manager)
    activate Sched
    
    Sched->>Capacity: schedule(pending, running, kv_cache_manager)
    activate Capacity
    
    Capacity->>Policy: get_new_request_ids(pending)
    activate Policy
    Policy->>Capacity: filtered_request_ids
    deactivate Policy
    
    loop For each candidate request
        Capacity->>KVCache: find_new_context_block(unique_tokens, request)
        KVCache->>Capacity: context_block_info
        Capacity->>Capacity: fit_request_to_blocks()
    end
    
    Capacity->>Sched: scheduled_requests, paused_requests
    deactivate Capacity
    
    Sched->>MicroBatch: schedule(scheduled_requests, kv_cache_manager)
    activate MicroBatch
    
    MicroBatch->>MicroBatch: compute_chunk_sizes()
    
    rect rgb(200, 220, 255)
        note right of MicroBatch: Encoder phase
        MicroBatch->>KVCache: scheduling_has_free_blocks()
        KVCache->>MicroBatch: has_free
    end
    
    rect rgb(220, 240, 220)
        note right of MicroBatch: Context phase
        MicroBatch->>MicroBatch: select_requests_for_context()
    end
    
    rect rgb(255, 240, 200)
        note right of MicroBatch: Generation phase
        MicroBatch->>MicroBatch: select_requests_for_generation()
    end
    
    MicroBatch->>Sched: SchedulerOutput (batches, tokens)
    deactivate MicroBatch
    
    Sched->>Executor: SchedulerOutput
    deactivate Sched

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~50 minutes

Pre-merge checks and finishing touches

❌ Failed checks (2 warnings)
  • Description check ⚠️ Warning: The PR description is minimal and vague ('As titled. This PR is our first step to refactor the scheduler. We will analyze the host overhead of python scheduler based on this PR.') and is missing required sections such as Description, Test Coverage, and PR Checklist. Resolution: expand the description to explain the problem being solved, the solution approach, the affected components, and test coverage, and confirm completion of the PR checklist items.
  • Docstring Coverage ⚠️ Warning: Docstring coverage is 52.00%, which is below the required threshold of 80.00%. Resolution: run @coderabbitai generate docstrings to improve docstring coverage.
✅ Passed checks (1 passed)
  • Title check ✅ Passed: The title clearly and specifically describes the main change (re-implementing MicroBatchScheduler and CapacityScheduler in Python), with the JIRA ticket properly referenced.

@tensorrt-cicd
Collaborator

PR_Github #29985 [ run ] triggered by Bot. Commit: bc443d0

@tensorrt-cicd
Collaborator

PR_Github #29985 [ run ] completed with state SUCCESS. Commit: bc443d0
/LLM/main/L0_MergeRequest_PR pipeline #23066 completed with status: 'FAILURE'

⚠️ Action Required:

  • Please check the failed tests and fix your PR
  • If you cannot view the failures, ask the CI triggerer to share details
  • Once fixed, request an NVIDIA team member to trigger CI again

@lancelly
Collaborator Author

/bot run --disable-fail-fast

@tensorrt-cicd
Collaborator

PR_Github #29996 [ run ] triggered by Bot. Commit: 7a26528

@tensorrt-cicd
Collaborator

PR_Github #29996 [ run ] completed with state SUCCESS. Commit: 7a26528
/LLM/main/L0_MergeRequest_PR pipeline #23077 completed with status: 'FAILURE'

⚠️ Action Required:

  • Please check the failed tests and fix your PR
  • If you cannot view the failures, ask the CI triggerer to share details
  • Once fixed, request an NVIDIA team member to trigger CI again

@lancelly
Collaborator Author

/bot run --disable-fail-fast

@tensorrt-cicd
Collaborator

PR_Github #30006 [ run ] triggered by Bot. Commit: 7a26528

@tensorrt-cicd
Collaborator

PR_Github #30006 [ run ] completed with state SUCCESS. Commit: 7a26528
/LLM/main/L0_MergeRequest_PR pipeline #23084 completed with status: 'FAILURE'

⚠️ Action Required:

  • Please check the failed tests and fix your PR
  • If you cannot view the failures, ask the CI triggerer to share details
  • Once fixed, request an NVIDIA team member to trigger CI again

@lancelly
Collaborator Author

/bot run --disable-fail-fast

@tensorrt-cicd
Collaborator

PR_Github #30008 [ run ] triggered by Bot. Commit: 7a26528

@tensorrt-cicd
Collaborator

PR_Github #30008 [ run ] completed with state SUCCESS. Commit: 7a26528
/LLM/main/L0_MergeRequest_PR pipeline #23086 completed with status: 'FAILURE'

⚠️ Action Required:

  • Please check the failed tests and fix your PR
  • If you cannot view the failures, ask the CI triggerer to share details
  • Once fixed, request an NVIDIA team member to trigger CI again

@lancelly
Collaborator Author

/bot run --disable-fail-fast

@tensorrt-cicd
Collaborator

PR_Github #30009 [ run ] triggered by Bot. Commit: 7a26528

@tensorrt-cicd
Collaborator

PR_Github #30009 [ run ] completed with state SUCCESS. Commit: 7a26528
/LLM/main/L0_MergeRequest_PR pipeline #23087 completed with status: 'FAILURE'

⚠️ Action Required:

  • Please check the failed tests and fix your PR
  • If you cannot view the failures, ask the CI triggerer to share details
  • Once fixed, request an NVIDIA team member to trigger CI again

@lancelly
Collaborator Author

/bot run --disable-fail-fast

@tensorrt-cicd
Collaborator

PR_Github #30012 [ run ] triggered by Bot. Commit: 7a26528

@tensorrt-cicd
Collaborator

PR_Github #30012 [ run ] completed with state SUCCESS. Commit: 7a26528
/LLM/main/L0_MergeRequest_PR pipeline #23090 completed with status: 'SUCCESS'

@lancelly
Collaborator Author

/bot run --disable-fail-fast

@tensorrt-cicd
Collaborator

PR_Github #30013 [ run ] triggered by Bot. Commit: 4b65790

@lancelly
Collaborator Author

/LLM/main/L0_MergeRequest_PR pipeline #23090: CI passed when using the Python scheduler. With the latest commit we remove the environment variable so that the C++ scheduler is used by default (no impact on the main branch).

@tensorrt-cicd
Collaborator

PR_Github #30013 [ run ] completed with state SUCCESS. Commit: 4b65790
/LLM/main/L0_MergeRequest_PR pipeline #23091 completed with status: 'FAILURE'

⚠️ Action Required:

  • Please check the failed tests and fix your PR
  • If you cannot view the failures, ask the CI triggerer to share details
  • Once fixed, request an NVIDIA team member to trigger CI again

@lancelly
Collaborator Author

/bot run --disable-fail-fast

@tensorrt-cicd
Collaborator

PR_Github #30019 [ run ] triggered by Bot. Commit: 4b65790

@tensorrt-cicd
Collaborator

PR_Github #30019 [ run ] completed with state SUCCESS. Commit: 4b65790
/LLM/main/L0_MergeRequest_PR pipeline #23097 completed with status: 'FAILURE'

⚠️ Action Required:

  • Please check the failed tests and fix your PR
  • If you cannot view the failures, ask the CI triggerer to share details
  • Once fixed, request an NVIDIA team member to trigger CI again

@lancelly
Collaborator Author

/bot run --disable-fail-fast

@tensorrt-cicd
Collaborator

PR_Github #30026 [ run ] triggered by Bot. Commit: 4b65790

@tensorrt-cicd
Collaborator

PR_Github #30026 [ run ] completed with state SUCCESS. Commit: 4b65790
/LLM/main/L0_MergeRequest_PR pipeline #23104 completed with status: 'SUCCESS'
