
Conversation

@lancelly
Collaborator

@lancelly commented Dec 24, 2025

As titled. This PR is our first step in refactoring the scheduler. We will analyze the host overhead of the Python scheduler based on this PR.

Summary by CodeRabbit

Release Notes

  • New Features

    • Introduced a Python-based unified scheduler option (enabled via the TLLM_USE_PYTHON_SCHEDULER=1 environment variable) with policy-based scheduling strategies.
    • Exposed new methods for querying unique tokens and encoder tokens.
    • Added cache block scheduling utilities for improved memory management.
  • Chores

    • Updated initialization to support Python scheduler configuration.


QiJune and others added 22 commits December 17, 2025 13:37
@lancelly requested review from a team as code owners December 24, 2025 09:20
@lancelly requested a review from HuiGao-NV December 24, 2025 09:20
@lancelly
Collaborator Author

/bot run --disable-fail-fast

@lancelly requested review from QiJune and litaotju December 24, 2025 09:22
@tensorrt-cicd
Collaborator

PR_Github #29796 [ run ] triggered by Bot. Commit: 411c254

@coderabbitai
Contributor

coderabbitai bot commented Dec 24, 2025

📝 Walkthrough

These changes extend Python bindings for GenLlmReq and KVCacheManager C++ classes, add an environment variable to enable Python-based scheduling, and introduce a comprehensive Python scheduling framework with capacity and micro-batch scheduling policies as an alternative to C++ scheduler components.

Changes

C++ Pybind/Nanobind Bindings (cpp/tensorrt_llm/pybind/batch_manager/bindings.cpp, cpp/tensorrt_llm/nanobind/batch_manager/bindings.cpp): Expose new GenLlmReq Python methods get_unique_tokens(beam) and get_unique_tokens(), plus get_encoder_unique_tokens() returning an optional VecUniqueTokens, and adjust the binding chain on use_draft_model to enable additional chained bindings.
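
A rough Python-side sketch of how the newly exposed methods might be called; `req` stands for any bound request object obtained from the batch manager, and everything except the three get_*_tokens methods is hypothetical:

```python
# Illustrative only: exercising the newly bound GenLlmReq methods from Python.
# `req` is assumed to be a bound request object; the function itself is not from this PR.
def collect_unique_tokens(req, beam: int = 0):
    per_beam_tokens = req.get_unique_tokens(beam)     # overload taking a beam index
    all_tokens = req.get_unique_tokens()              # overload without arguments
    encoder_tokens = req.get_encoder_unique_tokens()  # optional VecUniqueTokens; may be None
    return per_beam_tokens, all_tokens, encoder_tokens
```
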
KV Cache Manager Bindings (cpp/tensorrt_llm/pybind/batch_manager/kvCacheManager.cpp, cpp/tensorrt_llm/nanobind/batch_manager/kvCacheManager.cpp): Add Python bindings for find_new_context_block(unique_tokens, llm_request) on BaseKVCacheManager and scheduling_has_free_blocks(num_required, window_size) on KVCacheManager, both delegating to the underlying C++ implementations.
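
A hedged sketch of the kind of query a Python scheduler could issue through these bindings; every name other than the two bound methods is a placeholder, and the return-type handling is an assumption:

```python
# Illustrative only: querying the KV cache manager through the new bindings.
# `kv_cache_manager`, `request`, `unique_tokens`, `num_required_blocks`, and
# `window_size` are placeholder names, not identifiers from the PR.
def kv_cache_admission_check(kv_cache_manager, request, unique_tokens,
                             num_required_blocks, window_size):
    # Delegates to the underlying C++ implementation (per the binding description above).
    context_block = kv_cache_manager.find_new_context_block(unique_tokens, request)

    # Checks whether enough free blocks remain for the given attention window.
    has_free = kv_cache_manager.scheduling_has_free_blocks(num_required_blocks,
                                                           window_size)
    return context_block, has_free
```
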
Scheduler Initialization & Configuration (tensorrt_llm/__init__.py, tensorrt_llm/_torch/pyexecutor/_util.py): Set the TLLM_USE_PYTHON_SCHEDULER=1 environment variable on startup, and add conditional logic in create_py_executor_instance that selects SimpleUnifiedScheduler when the flag is enabled and otherwise retains the existing C++ scheduler selection.
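
A minimal sketch of the opt-in check described above; the function name and factory callables are hypothetical, and the actual branching inside create_py_executor_instance may be shaped differently:

```python
import os

def select_scheduler(make_cpp_scheduler, make_python_scheduler):
    # Opt-in path added by this PR: TLLM_USE_PYTHON_SCHEDULER=1 selects the Python
    # SimpleUnifiedScheduler; anything else keeps the existing C++ scheduler path.
    if os.environ.get("TLLM_USE_PYTHON_SCHEDULER") == "1":
        return make_python_scheduler()
    return make_cpp_scheduler()
```
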
Python Scheduling Framework (tensorrt_llm/_torch/pyexecutor/scheduler.py): Introduce a comprehensive Python-based scheduling system: PyCapacityScheduler (an orchestrator with policy-based fitting), PyMicroBatchScheduler (encoder/context/generation batching), and SimpleUnifiedScheduler (a composite runner). Add SchedulerPolicyBase with MaxRequestsPolicy, GuaranteedNoEvictPolicy, and MaxUtilizationPolicy implementations, block-tracking managers, a ChunkingPolicy enum, and state/prioritization logic mirroring the C++ behavior.

Sequence Diagram(s)

sequenceDiagram
    autonumber
    participant Executor as Executor
    participant Sched as SimpleUnifiedScheduler
    participant Capacity as PyCapacityScheduler
    participant MicroBatch as PyMicroBatchScheduler
    participant KVCache as KVCacheManager
    participant Policy as SchedulerPolicy

    Executor->>Sched: schedule(pending_requests, running_requests, kv_cache_manager)
    activate Sched
    
    Sched->>Capacity: schedule(pending, running, kv_cache_manager)
    activate Capacity
    
    Capacity->>Policy: get_new_request_ids(pending)
    activate Policy
    Policy->>Capacity: filtered_request_ids
    deactivate Policy
    
    loop For each candidate request
        Capacity->>KVCache: find_new_context_block(unique_tokens, request)
        KVCache->>Capacity: context_block_info
        Capacity->>Capacity: fit_request_to_blocks()
    end
    
    Capacity->>Sched: scheduled_requests, paused_requests
    deactivate Capacity
    
    Sched->>MicroBatch: schedule(scheduled_requests, kv_cache_manager)
    activate MicroBatch
    
    MicroBatch->>MicroBatch: compute_chunk_sizes()
    
    rect rgb(200, 220, 255)
        note right of MicroBatch: Encoder phase
        MicroBatch->>KVCache: scheduling_has_free_blocks()
        KVCache->>MicroBatch: has_free
    end
    
    rect rgb(220, 240, 220)
        note right of MicroBatch: Context phase
        MicroBatch->>MicroBatch: select_requests_for_context()
    end
    
    rect rgb(255, 240, 200)
        note right of MicroBatch: Generation phase
        MicroBatch->>MicroBatch: select_requests_for_generation()
    end
    
    MicroBatch->>Sched: SchedulerOutput (batches, tokens)
    deactivate MicroBatch
    
    Sched->>Executor: SchedulerOutput
    deactivate Sched

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~50 minutes

Pre-merge checks and finishing touches

❌ Failed checks (2 warnings)
  • Description check ⚠️ Warning: The PR description is minimal and vague ('As titled. This PR is our first step to refactor the scheduler. We will analyze the host overhead of python scheduler based on this PR.') and is missing required sections such as Description, Test Coverage, and PR Checklist. Resolution: expand the description to explain the problem being solved, the solution approach, the affected components, and test coverage, and confirm completion of the PR checklist items.
  • Docstring Coverage ⚠️ Warning: Docstring coverage is 52.00%, which is below the required threshold of 80.00%. Resolution: run @coderabbitai generate docstrings to improve docstring coverage.
✅ Passed checks (1 passed)
  • Title check ✅ Passed: The title clearly and specifically describes the main change (re-implementing MicroBatchScheduler and CapacityScheduler in Python), with the JIRA ticket properly referenced.

@tensorrt-cicd
Collaborator

PR_Github #29985 [ run ] triggered by Bot. Commit: bc443d0

@tensorrt-cicd
Collaborator

PR_Github #29985 [ run ] completed with state SUCCESS. Commit: bc443d0
/LLM/main/L0_MergeRequest_PR pipeline #23066 completed with status: 'FAILURE'

⚠️ Action Required:

  • Please check the failed tests and fix your PR
  • If you cannot view the failures, ask the CI triggerer to share details
  • Once fixed, request an NVIDIA team member to trigger CI again

@lancelly
Collaborator Author

/bot run --disable-fail-fast

@tensorrt-cicd
Collaborator

PR_Github #29996 [ run ] triggered by Bot. Commit: 7a26528

@tensorrt-cicd
Collaborator

PR_Github #29996 [ run ] completed with state SUCCESS. Commit: 7a26528
/LLM/main/L0_MergeRequest_PR pipeline #23077 completed with status: 'FAILURE'

⚠️ Action Required:

  • Please check the failed tests and fix your PR
  • If you cannot view the failures, ask the CI triggerer to share details
  • Once fixed, request an NVIDIA team member to trigger CI again

@lancelly
Collaborator Author

/bot run --disable-fail-fast

@tensorrt-cicd
Collaborator

PR_Github #30006 [ run ] triggered by Bot. Commit: 7a26528

@tensorrt-cicd
Collaborator

PR_Github #30006 [ run ] completed with state SUCCESS. Commit: 7a26528
/LLM/main/L0_MergeRequest_PR pipeline #23084 completed with status: 'FAILURE'

⚠️ Action Required:

  • Please check the failed tests and fix your PR
  • If you cannot view the failures, ask the CI triggerer to share details
  • Once fixed, request an NVIDIA team member to trigger CI again

@lancelly
Collaborator Author

/bot run --disable-fail-fast

@tensorrt-cicd
Collaborator

PR_Github #30008 [ run ] triggered by Bot. Commit: 7a26528

@tensorrt-cicd
Collaborator

PR_Github #30008 [ run ] completed with state SUCCESS. Commit: 7a26528
/LLM/main/L0_MergeRequest_PR pipeline #23086 completed with status: 'FAILURE'

⚠️ Action Required:

  • Please check the failed tests and fix your PR
  • If you cannot view the failures, ask the CI triggerer to share details
  • Once fixed, request an NVIDIA team member to trigger CI again

@lancelly
Collaborator Author

/bot run --disable-fail-fast

@tensorrt-cicd
Collaborator

PR_Github #30009 [ run ] triggered by Bot. Commit: 7a26528

@tensorrt-cicd
Collaborator

PR_Github #30009 [ run ] completed with state SUCCESS. Commit: 7a26528
/LLM/main/L0_MergeRequest_PR pipeline #23087 completed with status: 'FAILURE'

⚠️ Action Required:

  • Please check the failed tests and fix your PR
  • If you cannot view the failures, ask the CI triggerer to share details
  • Once fixed, request an NVIDIA team member to trigger CI again

@lancelly
Collaborator Author

/bot run --disable-fail-fast

@tensorrt-cicd
Collaborator

PR_Github #30012 [ run ] triggered by Bot. Commit: 7a26528

@tensorrt-cicd
Collaborator

PR_Github #30012 [ run ] completed with state SUCCESS. Commit: 7a26528
/LLM/main/L0_MergeRequest_PR pipeline #23090 completed with status: 'SUCCESS'

@lancelly
Collaborator Author

/bot run --disable-fail-fast

@tensorrt-cicd
Collaborator

PR_Github #30013 [ run ] triggered by Bot. Commit: 4b65790

@lancelly
Collaborator Author

/LLM/main/L0_MergeRequest_PR pipeline #23090: CI passed when using the Python scheduler. With the latest commit we remove the environment variable so that the C++ scheduler is used by default (no impact on the main branch).

@tensorrt-cicd
Collaborator

PR_Github #30013 [ run ] completed with state SUCCESS. Commit: 4b65790
/LLM/main/L0_MergeRequest_PR pipeline #23091 completed with status: 'FAILURE'

⚠️ Action Required:

  • Please check the failed tests and fix your PR
  • If you cannot view the failures, ask the CI triggerer to share details
  • Once fixed, request an NVIDIA team member to trigger CI again

@lancelly
Collaborator Author

/bot run --disable-fail-fast

@tensorrt-cicd
Collaborator

PR_Github #30019 [ run ] triggered by Bot. Commit: 4b65790

@tensorrt-cicd
Collaborator

PR_Github #30019 [ run ] completed with state SUCCESS. Commit: 4b65790
/LLM/main/L0_MergeRequest_PR pipeline #23097 completed with status: 'FAILURE'

⚠️ Action Required:

  • Please check the failed tests and fix your PR
  • If you cannot view the failures, ask the CI triggerer to share details
  • Once fixed, request an NVIDIA team member to trigger CI again

@lancelly
Collaborator Author

/bot run --disable-fail-fast

@tensorrt-cicd
Collaborator

PR_Github #30026 [ run ] triggered by Bot. Commit: 4b65790

@tensorrt-cicd
Collaborator

PR_Github #30026 [ run ] completed with state SUCCESS. Commit: 4b65790
/LLM/main/L0_MergeRequest_PR pipeline #23104 completed with status: 'SUCCESS'
