
[Feature]: Support cfg kv-cache transfer in multi-stage#1422

Merged
hsliuustc0106 merged 32 commits into vllm-project:main from princepride:feature/cfg-multi-stage on Mar 2, 2026

Conversation

@princepride
Collaborator

@princepride princepride commented Feb 21, 2026

Purpose

Related: #1419

User Request req_0
    │
    ├─── prompt_expand_func ──→ companion req_0__cfg_text
    │
    ▼ (Stage-0: AR/LLM)
    ├── req_0 inference complete ──→ pending in _pending_parent_results
    └── req_0__cfg_text inference complete ──→ cfg_companion_done flag
                                                   │
                                                   ▼ all companions done?
                                                   │
    _forward_parent_with_cfg ◄─────────────────────┘
    │
    │  sp_next.cfg_kv_request_ids = {"cfg_text": "req_0__cfg_text"}
    │
    ▼ (Connector: SharedMemory)
    │
    ▼ (Stage-1: Diffusion/DiT)
    │
    receive_multi_kv_cache:
    │  1. receive_kv_cache(req_0)         → gen KV
    │  2. collect_cfg_kv_caches(cfg_ids)  → cfg_text KV
    │
    ▼ pipeline_bagel.forward:
       gen_context["past_key_values"]      = gen KV
       cfg_text_context["past_key_values"] = cfg_text KV
       cfg_img_context["past_key_values"]  = gen KV (text2img reuse)
       │
       ▼ 3-branch CFG DiT denoising
       │
       ▼ Output Image
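The expansion step at the top of the diagram can be sketched as a small hook. This is an illustrative sketch only: the names `expand_cfg_prompts` and `CompanionPrompt` are hypothetical, while the `__cfg_text` suffix and the default negative prompt come from the PR discussion.

```python
from dataclasses import dataclass

# Suffix and default negative prompt as described in this PR; the rest
# of the names here are hypothetical, not the actual vllm-omni API.
CFG_TEXT_SUFFIX = "__cfg_text"
DEFAULT_NEG_PROMPT = "<|im_start|><|im_end|>"

@dataclass
class CompanionPrompt:
    request_id: str  # e.g. "req_0__cfg_text"
    role: str        # e.g. "cfg_text"
    prompt: str

def expand_cfg_prompts(request_id: str, prompt: dict) -> list[CompanionPrompt]:
    """Expand one user request into its CFG companion requests."""
    if "image" not in prompt.get("modalities", []):
        return []  # text-only tasks need no CFG branches
    negative = prompt.get("negative_prompt") or DEFAULT_NEG_PROMPT
    return [CompanionPrompt(request_id + CFG_TEXT_SUFFIX, "cfg_text", negative)]
```

For `req_0` with an image modality this yields a single `cfg_text` companion, matching the diagram; a text-only request expands to nothing.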

Test Plan

Multi-stage inference now generates outputs with the same high quality as running the DiT stage alone.

Multi-Stage Test:

python3 examples/offline_inference/bagel/end2end.py --prompts "A cute cat" --modality text2img

Diffusion Test:

from vllm_omni.entrypoints.omni_diffusion import OmniDiffusion
from vllm_omni.inputs.data import OmniDiffusionSamplingParams

def main():
    pipeline = OmniDiffusion(
        model="../models/BAGEL-7B-MoT",
    )
    prompts = {
        "prompt": "A cute cat",
    }

    result = pipeline.generate(
        prompts,
        OmniDiffusionSamplingParams(seed=52),
    )
    result[0].images[0].save("bagel_i2i_output.png")

if __name__ == "__main__":
    main()

Test Result

Before:
(image)

After:
(image)

Signed-off-by: princepride <wangzhipeng628@gmail.com>
@princepride
Collaborator Author

@natureofnature @hsliuustc0106 @ZJY0516 Can you take a look after the holiday ends? 😊

@chatgpt-codex-connector chatgpt-codex-connector bot left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 6cde323408

ℹ️ About Codex in GitHub

Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".

@hsliuustc hsliuustc left a comment

PR Review Summary

Overview

This PR adds CFG (Classifier-Free Guidance) KV-cache transfer for multi-stage inference in vllm-omni. It enables high-quality 3-branch CFG in Bagel's AR → Diffusion pipeline without degrading image quality.

Stats: 12 files changed, +487 / -18 lines


Critical Issues (Must Fix)

1. Race condition in _forward_parent_with_cfg (P1) 🚨

Location: vllm_omni/entrypoints/omni.py:~1046

Problem: When multiple CFG-enabled requests are in flight, _forward_parent_with_cfg recomputes next_inputs from shared mutable state (self.stage_list[0].engine_outputs) which can be overwritten by a different request's Stage-0 output. This causes the diffusion stage to receive token IDs from the wrong parent request.

Recommended Fix: Use the saved parent_result["engine_outputs"] instead of recomputing from shared state.
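The fix can be illustrated with a toy model of the orchestrator state; `Stage`, `complete_stage0`, and `forward_parent_with_cfg` below are hypothetical stand-ins, not the actual vllm-omni code:

```python
class Stage:
    """Toy stand-in for a pipeline stage with shared mutable output state."""
    def __init__(self):
        self.engine_outputs = None  # overwritten by every completing request

def complete_stage0(stage, pending, req_id, outputs):
    # Snapshot this parent's outputs at completion time.
    stage.engine_outputs = outputs
    pending[req_id] = {"engine_outputs": outputs}

def forward_parent_with_cfg(pending, req_id):
    # Read the saved snapshot, not stage.engine_outputs, which by now
    # may belong to a different request that completed later.
    return pending.pop(req_id)["engine_outputs"]
```

If `req_1` finishes Stage-0 after `req_0` but before `req_0`'s companions complete, the shared `engine_outputs` holds `req_1`'s tokens, while the snapshot in `pending` still returns `req_0`'s.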


2. Missing error handling for companion request failures 🚨

Location: vllm_omni/entrypoints/omni.py ~806-825

Problem: If a CFG companion request fails at Stage-0, the parent request will wait indefinitely → deadlock.

Recommended Fix: Add timeout and error propagation.


3. Memory leak potential with _pending_parent_results 🚨

Location: vllm_omni/entrypoints/omni.py ~935-940

Problem: Failed companion requests are never cleaned up.

Recommended Fix: Implement cleanup on error/timeout paths.
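Issues 2 and 3 can be addressed together by a periodic reaper over the pending-parent table; this is an illustrative sketch with assumed names (`pending`, `failed`, `pending_since`), not the merged implementation:

```python
import time

def reap_expired_parents(pending, failed, timeout_s=120.0, now=None):
    """Drop parents whose CFG companions never finished.

    Removing the entry fixes the leak in _pending_parent_results-style
    state; recording the failure lets the scheduling loop count the
    request as done instead of deadlocking. Names are illustrative.
    """
    now = time.monotonic() if now is None else now
    expired = [pid for pid, rec in pending.items()
               if now - rec["pending_since"] > timeout_s]
    for pid in expired:
        pending.pop(pid)
        failed.add(pid)  # surfaced to the caller as a failed request
    return expired
```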


Important Issues (Should Fix)

4. Missing validation in collect_cfg_kv_caches

vllm_omni/model_executor/stage_input_processors/bagel.py lines 107-139

5. Batch size limitation without documentation

vllm_omni/model_executor/stage_configs/bagel.yaml line 7

6. No handling for img2img case

vllm_omni/model_executor/stage_input_processors/bagel.py line 76


Positive Aspects ✅

  • Clean separation of concerns
  • Extensible hook-based design
  • Backward compatible
  • CI checks passing

Overall Assessment

The core design is sound, but the race condition is a must-fix before merge. Once the critical issues are addressed, this will be a valuable addition.

Action items: Fix race condition, add error handling, implement cleanup for memory leaks.

Contributor

@lishunyang12 lishunyang12 left a comment

Nice approach to CFG KV-cache transfer. I have a few concerns around error handling and concurrency.

…add companion timeout

Signed-off-by: princepride <wangzhipeng628@gmail.com>
Signed-off-by: princepride <wangzhipeng628@gmail.com>
Contributor

@lishunyang12 lishunyang12 left a comment

\

@hsliuustc0106
Collaborator

@vllm-omni-reviewer

@github-actions

🤖 VLLM-Omni PR Review

Code Review: Support CFG KV-Cache Transfer in Multi-Stage

1. Overview

This PR implements CFG (Classifier-Free Guidance) KV-cache transfer for multi-stage inference in VLLM-Omni. The key insight is that BAGEL's 3-branch CFG requires multiple prompts through the AR stage (gen, cfg_text, cfg_img), and their KV caches need to be transferred to the diffusion stage.

Key Changes:

  • Added prompt expansion mechanism to generate CFG companion requests
  • Implemented multi-KV cache collection and transfer
  • Updated BAGEL pipeline to use multiple KV caches for CFG
  • Added model-specific processor functions for BAGEL

Overall Assessment: Positive - The implementation is well-structured and addresses a real limitation. The before/after images demonstrate clear quality improvement.

2. Code Quality

Strengths

  • Good separation of concerns with model-specific functions in bagel.py
  • Comprehensive logging for debugging
  • Proper error handling with try/except blocks
  • Clear docstrings in the new processor module

Issues

vllm_omni/entrypoints/omni.py:906 - Shallow copy concern:

sp_next = copy.copy(sampling_params_list[next_stage_id])

This is a shallow copy. If OmniDiffusionSamplingParams contains nested mutable objects, modifications could affect the original. Consider using copy.deepcopy() or ensuring the class implements __copy__ properly.
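The pitfall is easy to demonstrate with a minimal stand-in for the sampling-params class (the `Params` class and field names below are illustrative, not the real `OmniDiffusionSamplingParams`):

```python
import copy
from dataclasses import dataclass, field

@dataclass
class Params:  # minimal stand-in for OmniDiffusionSamplingParams
    seed: int = 0
    cfg_kv_request_ids: dict = field(default_factory=dict)

base = Params()
shallow = copy.copy(base)               # copies field references only
shallow.cfg_kv_request_ids["k"] = "v"   # base.cfg_kv_request_ids sees this too

original = Params()
deep = copy.deepcopy(original)          # nested dict is duplicated
deep.cfg_kv_request_ids["k"] = "v"      # original stays untouched
```

Rebinding a field (e.g. `sp_next.cfg_kv_request_ids = {...}`) is safe either way; only in-place mutation of a nested object leaks back through a shallow copy.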

vllm_omni/diffusion/models/bagel/pipeline_bagel.py:339-358 - Repeated getattr calls:

cfg_text_kv = getattr(req.sampling_params, "cfg_text_past_key_values", None)
# ... later ...
cfg_text_metadata = getattr(req.sampling_params, "cfg_text_kv_metadata", None)

Consider extracting sampling_params to a local variable and using getattr once per attribute.

vllm_omni/model_executor/stage_input_processors/bagel.py:152 - Magic string for default negative prompt:

return "<|im_start|><|im_end|>"

Consider making this a constant or configurable value.

3. Architecture & Design

Strengths

  • Callback injection pattern: Loading prompt_expand_func and cfg_kv_collect_func from config is elegant and extensible
  • Race condition mitigation: The source_outputs_override parameter addresses the race condition where deferred requests read stale outputs
  • Configurable timeout: VLLM_CFG_PENDING_TIMEOUT_S environment variable for safety timeout

Concerns

vllm_omni/entrypoints/omni_stage.py:236-242 - Dynamic function loading security:

def _load_func_from_config(stage_config: Any, attr_name: str):
    func_path = getattr(stage_config, attr_name, None)
    if not func_path:
        return None
    module_path, func_name = func_path.rsplit(".", 1)
    module = importlib.import_module(module_path)
    return getattr(module, func_name)

This dynamically imports and calls functions based on config. If config files can be user-controlled, this is a potential security vector. Consider:

  1. Adding a whitelist of allowed modules/functions
  2. Documenting that config files should be trusted

vllm_omni/entrypoints/omni.py - The orchestrator logic has grown significantly complex. Consider extracting CFG-related logic into a separate helper class:

class CFGCompanionManager:
    """Manages CFG companion request lifecycle."""
    
    def __init__(self, prompt_expand_func, timeout_s: float = 120.0):
        self.companion_map: dict[str, dict[str, str]] = {}
        self.companion_ids: set[str] = set()
        self.companion_done: dict[str, set[str]] = {}
        self.pending_parents: dict[str, Any] = {}
        self.failed_parents: set[str] = set()
    
    def expand_prompts(self, request_id_to_prompt, sampling_params): ...
    def on_companion_complete(self, req_id): ...
    def check_pending_timeouts(self): ...

4. Security & Safety

Resource Management

  • vllm_omni/entrypoints/omni_stage.py:654-664: Good use of try/finally to restore engine_outputs after temporary modification

Input Validation

  • vllm_omni/distributed/omni_connectors/kv_transfer_manager.py:502-504: The exception handling silently continues after logging. Consider whether this should propagate or return a partial result indicator:
except Exception:
    logger.exception("Failed to collect CFG KV caches for %s", request_id)

Potential Issues

  • Memory: Multiple KV caches per request could increase memory usage significantly. Consider adding memory monitoring or limits.

5. Testing & Documentation

Test Coverage

  • The test plan demonstrates functional correctness with before/after images
  • Missing: unit tests for the new processor functions
  • Missing: edge case tests (companion failure, timeout, empty prompts)

Documentation

  • Good docstrings in bagel.py
  • Missing: documentation for the new config options (prompt_expand_func, cfg_kv_collect_func)
  • Missing: explanation of the CFG companion request flow in architecture docs

Suggested Test Cases

# Test prompt expansion
def test_expand_cfg_prompts_text2img():
    prompt = {"prompt": "A cat", "modalities": ["image"]}
    result = expand_cfg_prompts(prompt, mock_sampling_params)
    assert len(result) == 1
    assert result[0].role == "cfg_text"

# Test timeout handling
def test_cfg_companion_timeout():
    # Verify parent request is properly cleaned up after timeout
    pass

# Test companion failure propagation
def test_cfg_companion_failure_propagates():
    # Verify parent fails when companion fails
    pass

6. Specific Suggestions

vllm_omni/entrypoints/omni.py:772

remaining_by_stage: list[int] = [len(request_prompts) + len(cfg_companion_ids)] + [0] * (num_stages - 1)

Consider adding a comment explaining why companion IDs are counted in stage-0 but not in total_requests.

vllm_omni/entrypoints/omni.py:799-821

The companion failure handling logic is duplicated in two places (error result and timeout). Extract to a helper:

def _handle_companion_failure(self, parent_id: str, reason: str):
    _cfg_failed_parents.add(parent_id)
    logger.error(f"[{self._name}] {reason}")
    if parent_id in _pending_parent_results:
        _pending_parent_results.pop(parent_id)
        # ... rest of cleanup

vllm_omni/diffusion/data.py:382

cfg_kv_collect_func: Any | None = None

Consider using Callable | None with proper signature:

from typing import Callable
CfgKvCollectFunc = Callable[[str, dict[str, str], Any, torch.device | None], dict[str, Any]]
cfg_kv_collect_func: CfgKvCollectFunc | None = None

vllm_omni/model_executor/stage_input_processors/bagel.py:20

The constant CFG_TEXT_SUFFIX is good. Consider adding other suffixes as constants for future img2img support:

CFG_TEXT_SUFFIX = "__cfg_text"
CFG_IMG_SUFFIX = "__cfg_img"  # Reserved for img2img

vllm_omni/inputs/data.py:232-237

Consider grouping CFG-related fields with a comment:

# [Omni] Multi-KV for CFG: populated by model-specific cfg_kv_collect_func
# These fields store companion KV caches for 3-branch CFG
cfg_text_past_key_values: Any | None = None
cfg_img_past_key_values: Any | None = None
cfg_text_kv_metadata: dict[str, Any] | None = None
cfg_img_kv_metadata: dict[str, Any] | None = None
cfg_kv_request_ids: dict[str, str] | None = None  # role -> request_id mapping

7. Approval Status

LGTM with suggestions

The PR is well-designed and achieves its stated goal of enabling CFG in multi-stage inference. The before/after results clearly demonstrate the quality improvement. The architecture is extensible and the error handling is reasonable.

Minor suggestions to address before merge:

  1. Consider using copy.deepcopy or verifying shallow copy is safe
  2. Add documentation for new config options
  3. Consider extracting CFG management logic into a helper class for maintainability
  4. Add unit tests for the processor functions

These are not blocking issues - the core functionality is solid and the code is production-ready. The suggestions are for long-term maintainability.


This review was generated automatically by the VLLM-Omni PR Reviewer Bot
using glm-5.

Signed-off-by: princepride <wangzhipeng628@gmail.com>
Signed-off-by: princepride <wangzhipeng628@gmail.com>
Signed-off-by: princepride <wangzhipeng628@gmail.com>
Collaborator

@hsliuustc0106 hsliuustc0106 left a comment

Overall the design is solid — the hook-based approach keeps model-specific CFG logic out of the orchestrator core, and the companion request paradigm is a reasonable way to handle multi-branch CFG across disaggregated stages.

Main concerns:

  1. Verify request accounting (completed_requests) is correct in all paths (normal, timeout, companion failure) — bugs there would cause hangs.
  2. O(n) parent lookup for companion requests could be improved with a reverse index.
  3. source_outputs_override temporary mutation of shared state is not thread-safe.
  4. Unused utility functions (is_cfg_companion_request, get_parent_request_id) should be removed or deferred.

See inline comments for details.
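The reverse-index fix for point 2 amounts to inverting the existing companion map once during expansion; the function name below is illustrative (the PR calls the resulting dict cfg_companion_to_parent):

```python
def build_companion_index(companion_map):
    """Invert parent_id -> {role: companion_id} into
    companion_id -> parent_id, replacing the O(parents x roles)
    scan with an O(1) dict lookup."""
    return {companion_id: parent_id
            for parent_id, roles in companion_map.items()
            for companion_id in roles.values()}
```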

princepride and others added 8 commits February 24, 2026 14:15
The parameter is passed as a plain int (value-copied), so any mutation
inside the method would never propagate back to the caller's loop.
The method never uses it either.  Remove to avoid misleading future
maintainers into thinking they can increment it here.

Co-authored-by: Cursor <cursoragent@cursor.com>
Signed-off-by: princepride <wangzhipeng628@gmail.com>
Replace two O(parents × roles) linear scans in the error-handling and
companion-completion paths with a pre-built cfg_companion_to_parent
dict populated during the expansion phase.

Co-authored-by: Cursor <cursoragent@cursor.com>
Signed-off-by: princepride <wangzhipeng628@gmail.com>
The temporary mutation of stage_list[].engine_outputs is safe today
because the orchestrator loop is single-threaded, but would race
under concurrency. Add a NOTE comment to flag this for future work.

Co-authored-by: Cursor <cursoragent@cursor.com>
Signed-off-by: princepride <wangzhipeng628@gmail.com>
Neither function is called anywhere in the codebase. Removing them
keeps the diff focused; they can be re-added when actually needed.

Co-authored-by: Cursor <cursoragent@cursor.com>
Signed-off-by: princepride <wangzhipeng628@gmail.com>
The resolved function object is never used in the orchestrator process;
only the raw string travels to the worker for re-resolution.  Add a
comment explaining this is intentional early-validation of the import.

Co-authored-by: Cursor <cursoragent@cursor.com>
Signed-off-by: princepride <wangzhipeng628@gmail.com>
An empty string negative_prompt is not meaningful for Bagel CFG
guidance. Change `if neg is not None` to `if neg` so that both
None and "" fall through to the default token pair.

Co-authored-by: Cursor <cursoragent@cursor.com>
Signed-off-by: princepride <wangzhipeng628@gmail.com>
Stage-0 max_batch_size=2 assumes single-prompt inference (1 user + 1
CFG companion). For multi-prompt batches it should scale accordingly.

Co-authored-by: Cursor <cursoragent@cursor.com>
Signed-off-by: princepride <wangzhipeng628@gmail.com>
@princepride princepride force-pushed the feature/cfg-multi-stage branch from c4ef389 to 6cca43d on February 24, 2026 06:51
Signed-off-by: princepride <wangzhipeng628@gmail.com>
@hsliuustc0106
Collaborator

@vllm-omni-reviewer

@github-actions

🤖 VLLM-Omni PR Review

Code Review: Support CFG KV-Cache Transfer in Multi-Stage

1. Overview

This PR implements Classifier-Free Guidance (CFG) support across disaggregated multi-stage inference pipelines through a "companion request" paradigm. The implementation:

  1. Expands prompts in Stage 0 (AR/LLM) to generate companion requests (e.g., negative prompts)
  2. Transfers KV caches for both primary and companion requests across stage boundaries
  3. Collects and injects companion KV caches in downstream diffusion stages for CFG computation

The approach is well-designed, allowing CFG to work efficiently without redundant text encoder evaluations on diffusion workers.

Overall Assessment: Positive - The implementation is solid, well-documented, and follows the existing architecture patterns. A few suggestions for robustness and maintainability are noted below.


2. Code Quality

Strengths

  • Clean separation of model-specific logic into bagel.py
  • Good error handling for companion failures with timeout mechanism
  • Comprehensive logging for debugging
  • Backward compatible - existing configs work without changes

Concerns

Magic Strings: Several string literals are used throughout that should be constants:

# In omni.py and bagel.py
"cfg_text", "cfg_img", "__cfg_text"

Type Annotations: Some functions in bagel.py use Any excessively, which reduces type safety benefits.

Complex State Management: The orchestrator in omni.py now manages multiple dictionaries for CFG state tracking. While functional, this adds complexity.


3. Architecture & Design

Strengths

  • Configuration-driven design: New hooks (prompt_expand_func, cfg_kv_collect_func) are specified in YAML configs
  • Extensibility: Easy to add support for other models by creating new processor modules
  • Consistent patterns: Follows existing stage configuration and worker initialization patterns

Suggestions

Thread Safety Acknowledgment: The comment in omni_stage.py:658-662 correctly identifies a potential issue:

# NOTE: This relies on the orchestrator being single-threaded.
# If concurrency is introduced, replace with a per-call context
# or a thread-local to avoid racing on shared mutable state.

Consider adding a docstring or assertion to document this assumption explicitly.


4. Security & Safety

Dynamic Function Loading

The _load_func_from_config function dynamically imports functions based on config values. This is consistent with existing patterns but could be a concern if untrusted configs are loaded:

vllm_omni/entrypoints/omni_stage.py:236-243

def _load_func_from_config(stage_config: Any, attr_name: str):
    """Dynamically import a function referenced by a dotted path in stage config."""
    func_path = getattr(stage_config, attr_name, None)
    if not func_path:
        return None
    module_path, func_name = func_path.rsplit(".", 1)
    module = importlib.import_module(module_path)
    return getattr(module, func_name)

Suggestion: Consider validating that the function path starts with an allowed prefix (e.g., vllm_omni.) to prevent arbitrary code execution from malicious configs.

Resource Management

The timeout mechanism for pending parents is good:

vllm_omni/entrypoints/omni.py:1028-1042

However, the default timeout of 120 seconds may be too long for some use cases. Consider documenting this environment variable more prominently.
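A sketch of how the variable is presumably read (the helper name is hypothetical; the 120 s default and the VLLM_CFG_PENDING_TIMEOUT_S name come from this PR):

```python
import os

def cfg_pending_timeout_s(default: float = 120.0) -> float:
    """Read VLLM_CFG_PENDING_TIMEOUT_S (seconds), falling back to the
    default on absence or malformed values. Illustrative sketch."""
    raw = os.environ.get("VLLM_CFG_PENDING_TIMEOUT_S")
    if raw is None:
        return default
    try:
        return float(raw)
    except ValueError:
        return default
```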


5. Testing & Documentation

Documentation

Documentation is comprehensive and well-written. The architecture overview clearly explains the CFG companion flow.

Testing

The test changes update reference pixels to match new CFG-enabled output, but there are no unit tests for:

  • expand_cfg_prompts function
  • collect_cfg_kv_caches function
  • CFG companion failure scenarios
  • Timeout handling

Suggestion: Add unit tests for the new processor functions in bagel.py.


6. Specific Suggestions

vllm_omni/model_executor/stage_input_processors/bagel.py

Line 21: Consider defining constants for role names:

# Suggestion
ROLE_CFG_TEXT = "cfg_text"
ROLE_CFG_IMG = "cfg_img"
CFG_TEXT_SUFFIX = "__cfg_text"

Line 45-47: The logic for determining when to expand could be clearer:

# Current
if "image" not in modalities and "img2img" not in modalities:
    return []

# Suggestion - add comment explaining why
# Only expand for image generation tasks (text2img, img2img)
# Text-only tasks don't need CFG expansion
if "image" not in modalities and "img2img" not in modalities:
    return []

Line 138-142: The fallback default for negative prompt could be documented:

# Suggestion - add comment
# Bagel's default unconditional prompt is the empty chat template
# This produces the text-unconditional branch for CFG
return "<|im_start|><|im_end|>"

vllm_omni/entrypoints/omni.py

Line 680-684: Consider extracting CFG state into a dataclass for clarity:

# Suggestion
@dataclass
class CFGState:
    companion_map: dict[str, dict[str, str]]  # parent_id -> {role: companion_id}
    companion_ids: set[str]
    companion_done: dict[str, set[str]]  # parent_id -> set of completed companion_ids
    companion_to_parent: dict[str, str]  # reverse index

Line 826-852: The companion handling logic is nested deeply. Consider extracting to a helper method:

# Suggestion - extract to method
def _handle_completed_companion(self, req_id: str, companion_parent_id: str | None, ...):
    """Handle a completed CFG companion request at Stage-0."""
    ...

Line 1028-1042: The timeout loop iterates over all pending parents each cycle. For efficiency with many pending requests:

# Current - O(n) check every iteration
for parent_id in list(_pending_parent_results.keys()):
    pending_since = _pending_parent_results[parent_id].get("pending_since", _now)

# Suggestion - use a heap or sorted structure for O(log n) expiry checks
# Or check only periodically rather than every iteration
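The heap-based alternative could look like this sketch, assuming a `(deadline, parent_id)` entry is pushed when a parent starts waiting (names and structure are illustrative):

```python
import heapq

def pop_expired(expiry_heap, pending, now):
    """O(log n) expiry checks: pop (deadline, parent_id) entries whose
    deadline has passed. Entries for parents that already completed and
    left `pending` are skipped lazily, so no per-iteration O(n) sweep
    is needed. Illustrative sketch."""
    expired = []
    while expiry_heap and expiry_heap[0][0] <= now:
        _, parent_id = heapq.heappop(expiry_heap)
        if parent_id in pending:      # still waiting -> genuinely timed out
            pending.pop(parent_id)
            expired.append(parent_id)
    return expired
```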

vllm_omni/diffusion/models/bagel/pipeline_bagel.py

Line 337-367: The KV cache injection logic handles multiple cases. Consider adding a docstring explaining the CFG KV structure:

# Suggestion - add docstring or comment
# CFG requires 3 KV caches for Bagel:
# 1. gen (conditional): user prompt KV
# 2. cfg_text (text-unconditional): negative/empty prompt KV
# 3. cfg_img (image-unconditional): for text2img, same as gen

vllm_omni/entrypoints/omni_stage.py

Line 658-671: The temporary swap of engine_outputs is clever but fragile. Consider a safer approach:

# Suggestion - pass override explicitly through the call chain
# rather than mutating shared state
return self.custom_process_input_func(
    stage_list, engine_input_source, prompt, self.requires_multimodal_data,
    source_outputs_override=source_outputs_override
)

This would require updating the custom processor signature, but would be safer.

vllm_omni/distributed/omni_connectors/kv_transfer_manager.py

Line 501-503: Good error handling, but consider logging the specific exception:

# Current
except Exception:
    logger.exception("Failed to collect CFG KV caches for %s", request_id)

# Already good - logger.exception includes the traceback

7. Approval Status

LGTM with suggestions

The PR is well-designed and implements an important feature for multi-modal CFG. The architecture is sound, documentation is thorough, and the implementation follows existing patterns. The suggestions above are primarily for:

  1. Maintainability: Constants for magic strings, extracting complex logic
  2. Robustness: Unit tests for new functions, safer state management
  3. Future-proofing: Thread safety documentation

None of the suggestions block merging - they can be addressed in follow-up PRs if preferred. The core functionality is solid and the test results demonstrate the feature works correctly.


This review was generated automatically by the VLLM-Omni PR Reviewer Bot
using glm-5.

Signed-off-by: princepride <wangzhipeng628@gmail.com>
Signed-off-by: princepride <wangzhipeng628@gmail.com>
Contributor

@lishunyang12 lishunyang12 left a comment

Final Review Pass

All my original concerns have been addressed:

  1. Deadlock risk -- Resolved. Companion failure propagation (error path + timeout expiry at VLLM_CFG_PENDING_TIMEOUT_S) covers all three failure scenarios.

  2. Linear scan -- Resolved. cfg_companion_to_parent reverse index gives O(1) lookup.

  3. Race condition on shared engine_outputs -- Resolved. _forward_parent_with_cfg passes saved parent_result["engine_outputs"] via source_outputs_override. Single-threaded assumption NOTE is appreciated.

  4. receive_kv_cache_for_request definition -- Confirmed existing in the codebase.

  5. img2img CFG skip -- Understood; img2img bypasses CFG expansion by design.

  6. Hard failure for connector -- Agreed; KV transfer is required for CFG, RuntimeError is correct.

  7. Batch size comment -- Added and clear.

Also noted that feedback from @hsliuustc0106 was addressed: unused completed_requests param removed, empty-string negative prompt handled, unused utility functions removed, clarifying comments added.

Minor suggestions for follow-up (non-blocking):

  • Unit tests for expand_cfg_prompts and collect_cfg_kv_caches would improve confidence in edge cases
  • Role strings ("cfg_text", "cfg_img") could be extracted to constants alongside CFG_TEXT_SUFFIX
  • cfg_kv_collect_func typing in diffusion/data.py could use Callable instead of Any

LGTM -- approving.

Signed-off-by: 汪志鹏 <wangzhipeng628@gmail.com>
Signed-off-by: princepride <wangzhipeng628@gmail.com>
Signed-off-by: princepride <wangzhipeng628@gmail.com>
Signed-off-by: princepride <wangzhipeng628@gmail.com>
Signed-off-by: princepride <wangzhipeng628@gmail.com>
Signed-off-by: 汪志鹏 <wangzhipeng628@gmail.com>
@princepride
Collaborator Author

@hsliuustc0106 @tzhouam I encapsulated the CFG prompt expansion and companion tracking logic in the Omni orchestrator, PTAL

Signed-off-by: 汪志鹏 <wangzhipeng628@gmail.com>
Contributor

@lishunyang12 lishunyang12 left a comment

The refactoring into CfgCompanionTracker is a big improvement. One issue: if forward_parent_with_cfg hits an exception in process_engine_inputs, the parent was already popped from _pending_parents via pop_pending_parent. That means the request is orphaned -- it won't be caught by timeout, and completed_requests never increments, so the scheduling loop hangs.

Signed-off-by: princepride <wangzhipeng628@gmail.com>
Signed-off-by: princepride <wangzhipeng628@gmail.com>
Signed-off-by: princepride <wangzhipeng628@gmail.com>
@princepride
Collaborator Author

> If process_engine_inputs raises here, the parent was already popped from _pending_parents at the call site but never counted as completed. This silently orphans the request and hangs the loop. Either re-insert the parent into _pending_parents on failure, or return a success/failure indicator so the caller can handle it.

Excellent suggestion! I looked at the original code, and it seems that the exception handling in the connector section doesn't update completed_requests either. I will update both the original code and the CFG code.

try:
    # Derive inputs for the next stage, record preprocess time
    with metrics.stage_postprocess_timer(stage_id, req_id):
        next_inputs = next_stage.process_engine_inputs(
            self.stage_list, [request_id_to_prompt[req_id]]
        )
except Exception as e:
    logger.exception(
        f"[{self._name}] Process engine inputs error for req {req_id}"
        f" at stage {next_stage_id}: {e}",
    )
    continue

@hsliuustc0106 @tzhouam I think @lishunyang12's suggestion is reasonable. Our original code did not properly handle data that failed to be sent to the next stage.

Contributor

@lishunyang12 lishunyang12 left a comment

All three concerns addressed. The completed_requests accounting fix in the original code path is a good catch too. LGTM.

@hsliuustc0106
Collaborator

Review Summary

PR Type: Feature - CFG KV-cache transfer for multi-stage inference

Key Observations:

  • No MRO issues (uses composition, not mixins with nn.Module)
  • Clean separation of concerns with CfgCompanionTracker class
  • Visual quality improvement demonstrated in PR description

Suggestions:

  • Add unit tests for CFG logic (prompt expansion, companion tracking)
  • Use specific exception handling instead of bare except Exception:

@hsliuustc0106 hsliuustc0106 added the "ready" label (to trigger Buildkite CI) on Mar 2, 2026
Collaborator

@hsliuustc0106 hsliuustc0106 left a comment

lgtm

@hsliuustc0106 hsliuustc0106 enabled auto-merge (squash) March 2, 2026 03:04
Signed-off-by: princepride <wangzhipeng628@gmail.com>
@hsliuustc0106 hsliuustc0106 merged commit e37a89f into vllm-project:main Mar 2, 2026
6 of 7 checks passed

Labels: ready (label to trigger Buildkite CI)

5 participants