[Feat][Diffusion]: Implement Component-Level VRAM Quota and Resource Domain Isolation#1582
Open · Flink-ddd wants to merge 3 commits into vllm-project:main
Signed-off-by: vensen <vensenmu@gmail.com>
Purpose
This PR resolves the Stage-0 initialization deadlock identified in Issue #1574.
Root Cause Resolved: Previously, the Diffusion Worker (Stage-1) silently allocated ~27 GB of VRAM via global torch calls without reporting its budget to the Orchestrator. The LLM (Stage-0) therefore hit a "Memory Blind Spot" during its profiling phase and failed with `ValueError: No available memory (0.0 GiB reported)`.
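The blind-spot arithmetic can be sketched with hypothetical numbers for an A100-80GB (the 27 GB figure is from this PR; the utilization fraction is an illustrative assumption):

```python
# Hypothetical numbers illustrating the "Memory Blind Spot" on an A100-80GB.
total_vram_gb = 80.0
diffusion_reserved_gb = 27.0   # Stage-1 allocates this silently via global torch calls
gpu_memory_utilization = 0.9   # assumed Stage-0 budget fraction (illustrative)

# Stage-0 profiling assumes it may use utilization * total VRAM...
stage0_budget_gb = gpu_memory_utilization * total_vram_gb    # 72.0 GB
# ...but the diffusion worker's allocation is invisible to it, so the
# memory actually left for the KV cache can collapse to zero:
actually_free_gb = total_vram_gb - diffusion_reserved_gb     # 53.0 GB
kv_cache_gb = max(0.0, actually_free_gb - stage0_budget_gb)  # 0.0 GB -> ValueError

print(f"budget={stage0_budget_gb:.1f} GB, free={actually_free_gb:.1f} GB, "
      f"kv_cache={kv_cache_gb:.1f} GB")
```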
Key Innovations:
Heuristic Budget Pre-audit: Introduced a static method `predict_resource_usage` on DiffusionWorker that estimates the VRAM footprint from model metadata (parameter count, dtype, resolution) before the worker is spawned.
Resource Domain Isolation: Implemented an Orchestrator-level coordinator that adjusts Stage-0's gpu_memory_utilization using a Dynamic Utilization Boost algorithm to compensate for concurrent modal loads.
RFC #1316 Alignment: This serves as a critical prerequisite for the Sleep Mode ACK mechanism. It provides the "Logical Baseline" required to audit physical VRAM reclamation during diffusion Level 1/2 sleep transitions.
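A minimal sketch of what such a pre-audit heuristic could look like. The method name mirrors the PR, but the metadata fields, constants, and formula here are illustrative assumptions, not the actual implementation:

```python
from dataclasses import dataclass

DTYPE_BYTES = {"float32": 4, "bfloat16": 2, "float16": 2}

@dataclass
class ModelMeta:
    num_params: float              # total parameter count, e.g. 7e9
    dtype: str                     # weight dtype
    height: int                    # target image resolution
    width: int
    activation_factor: float = 1.25  # heuristic multiplier for activations/workspace

def predict_resource_usage(meta: ModelMeta) -> float:
    """Heuristic VRAM footprint (GiB) from model metadata, before spawning."""
    weight_bytes = meta.num_params * DTYPE_BYTES[meta.dtype]
    # Latent/activation memory scales with resolution; constants are illustrative.
    latent_bytes = meta.height * meta.width * 4 * DTYPE_BYTES[meta.dtype] * 64
    total_bytes = (weight_bytes + latent_bytes) * meta.activation_factor
    return total_bytes / (1024 ** 3)

# e.g. a 7B bfloat16 model generating at 1024x1024
budget_gb = predict_resource_usage(ModelMeta(7e9, "bfloat16", 1024, 1024))
```

The point is only that the budget is computable from static metadata, so the Orchestrator can reserve it before any CUDA context exists in the child worker.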
Test Plan
Validated using the Bagel (7B-MoT) multi-modal pipeline on an NVIDIA A100-80GB PCIe GPU.
Test Command:
pytest tests/e2e/offline_inference/test_bagel_text2img.py::test_bagel_text2img_shared_memory_connector -s 2>&1 | tee test_results.log

Test Result
VRAM Resource Model Comparison
Integration with Sleep Mode ACK (RFC #1316)
This PR establishes the foundation for the upcoming Sleep Mode ACK PR:
Logical Audit: The predict_resource_usage method provides the expected_freed_gb value used in ACK signals.
Dynamic Ledger: By tracking total_reserved_gb in the Orchestrator, we enable a "Logic vs. Physical" dual-audit mechanism. When an ACK confirms a physical release (Level 2 Sleep), the Orchestrator can dynamically decrease the reserved budget, enabling deterministic KV Cache expansion for the active LLM worker.
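The ledger interaction described above can be sketched as follows. The field name `total_reserved_gb` and the ACK flow mirror the PR; the class shape and the specific boost formula are assumptions for illustration only:

```python
class ResourceLedger:
    """Illustrative Orchestrator-side ledger; a sketch, not the actual vLLM code."""

    def __init__(self, total_vram_gb: float):
        self.total_vram_gb = total_vram_gb
        self.total_reserved_gb = 0.0  # sum of all non-LLM worker budgets

    def reserve(self, gb: float) -> None:
        """Record a pre-audited budget (e.g. from predict_resource_usage)."""
        self.total_reserved_gb += gb

    def on_sleep_ack(self, expected_freed_gb: float) -> None:
        # A Level-2 sleep ACK confirms a physical release; shrink the logical
        # reservation so the LLM's KV cache can deterministically expand.
        self.total_reserved_gb = max(0.0, self.total_reserved_gb - expected_freed_gb)

    def effective_utilization(self, base_util: float) -> float:
        # Rescale Stage-0's fraction so its absolute budget never overlaps
        # the reserved domain; when reservations shrink, utilization rises back.
        free_fraction = 1.0 - self.total_reserved_gb / self.total_vram_gb
        return min(base_util, free_fraction)

ledger = ResourceLedger(total_vram_gb=80.0)
ledger.reserve(27.0)                          # diffusion worker's pre-audited budget
util_awake = ledger.effective_utilization(0.9)   # clamped below 0.9 while reserved
ledger.on_sleep_ack(expected_freed_gb=27.0)      # diffusion enters Level-2 sleep
util_asleep = ledger.effective_utilization(0.9)  # back to the full 0.9
```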
Verified Logs (Success Case)
All E2E tests passed on Device 0; full logs are attached to the PR.