Description
Motivation.
Currently, in the multi-stage pipeline, Stage-0 (autoregressive) sends only a single conditional KV cache to Stage-1 (diffusion). The BagelPipeline in Stage-1 detects the missing multi-branch context and forcibly disables CFG by setting the scales to 1.0. The resulting images are lower in quality than those from the standalone DiT pipeline, which uses a 3-branch CFG approach.
To resolve this without polluting the core vLLM-Omni framework (orchestrator, stage workers, and KV transfer managers) with model-specific CFG logic, this feature introduces generic hooks. By completely decoupling CFG prompt generation and KV cache reception from the framework, we allow current (Bagel) and future models to seamlessly adapt to multi-branch inference paradigms without requiring changes to the underlying multi-stage orchestration logic.
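To make concrete why setting the scales to 1.0 amounts to disabling CFG, here is a minimal sketch using the generic two-branch classifier-free guidance formula. This is an illustration only: the function name `cfg_combine` and the toy values are assumptions, and the actual 3-branch combination used by BagelPipeline may differ.

```python
def cfg_combine(cond, uncond, scale):
    """Generic classifier-free guidance: uncond + scale * (cond - uncond).

    Illustrative formula only; not taken from the Bagel implementation.
    """
    return [u + scale * (c - u) for c, u in zip(cond, uncond)]


# Toy per-element predictions from the conditional and unconditional branches.
cond = [1.0, -0.5, 2.0]
uncond = [0.5, 0.25, 1.0]

# With scale 1.0 the unconditional branch cancels out, so guidance is a no-op.
no_guidance = cfg_combine(cond, uncond, scale=1.0)

# With a larger scale the output is pushed away from the unconditional branch.
guided = cfg_combine(cond, uncond, scale=4.0)

print(no_guidance)
print(guided)
```

With a scale of 1.0 the result equals the conditional prediction, so the extra branches contribute nothing, which is why Stage-1 needs the additional KV caches before guidance can have any effect.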
Architecture
flowchart TD
subgraph framework [Framework - Generic]
Orch[Orchestrator]
KVMgr[KV Transfer Manager]
end
subgraph modelSpecific [Model-Specific - bagel.py]
ExpandFn["expand_cfg_prompts()"]
CollectFn["collect_cfg_kv_caches()"]
end
subgraph yaml [YAML Config]
YamlCfg["prompt_expand_func: ...bagel.expand_cfg_prompts\ncfg_kv_collect_func: ...bagel.collect_cfg_kv_caches"]
end
YamlCfg -->|"loaded by"| Orch
Orch -->|"calls"| ExpandFn
ExpandFn -->|"returns expanded prompts"| Orch
Orch -->|"submits all prompts"| Stage0[Stage-0 AR]
Stage0 -->|"KV caches via SHM"| KVMgr
KVMgr -->|"calls"| CollectFn
CollectFn -->|"returns organized KVs"| Stage1[Stage-1 DiT / BagelPipeline]
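The flow in the diagram can be sketched roughly as follows. This is a minimal illustration, not the proposed implementation: the hook signatures, the branch labels (`cond`, `cfg_text`, `cfg_img`), the `#`-suffixed request IDs, the `resolve_hook` helper, and the stand-in module `demo_model_hooks` are all assumptions made for the example. Only the config keys `prompt_expand_func` and `cfg_kv_collect_func` and the two hook names come from the diagram.

```python
import importlib
import sys
import types
from typing import Any, Callable

# Hypothetical branch labels for a 3-branch CFG setup.
BRANCHES = ("cond", "cfg_text", "cfg_img")


def expand_cfg_prompts(prompt: dict[str, Any]) -> list[dict[str, Any]]:
    """Hypothetical hook: turn one request into one prompt per CFG branch.

    A real hook would also alter the prompt content per branch; here each
    copy is only tagged with its branch and a derived request ID.
    """
    return [
        {**prompt, "request_id": f"{prompt['request_id']}#{branch}", "branch": branch}
        for branch in BRANCHES
    ]


def collect_cfg_kv_caches(received: dict[str, Any]) -> dict[str, dict[str, Any]]:
    """Hypothetical hook: group per-branch KV caches under the parent request."""
    grouped: dict[str, dict[str, Any]] = {}
    for request_id, kv in received.items():
        parent_id, _, branch = request_id.rpartition("#")
        grouped.setdefault(parent_id, {})[branch] = kv
    return grouped


# Stand-in for the model-specific module so the dotted paths below resolve.
hooks_module = types.ModuleType("demo_model_hooks")
hooks_module.expand_cfg_prompts = expand_cfg_prompts
hooks_module.collect_cfg_kv_caches = collect_cfg_kv_caches
sys.modules["demo_model_hooks"] = hooks_module


def resolve_hook(dotted_path: str) -> Callable[..., Any]:
    """Framework side: import a callable from a 'package.module.func' string."""
    module_name, _, attr = dotted_path.rpartition(".")
    return getattr(importlib.import_module(module_name), attr)


# Equivalent of the YAML config once parsed into a dict.
stage_config = {
    "prompt_expand_func": "demo_model_hooks.expand_cfg_prompts",
    "cfg_kv_collect_func": "demo_model_hooks.collect_cfg_kv_caches",
}

expand = resolve_hook(stage_config["prompt_expand_func"])
collect = resolve_hook(stage_config["cfg_kv_collect_func"])

# Orchestrator: expand the prompt and submit every branch to Stage-0.
prompts = expand({"request_id": "req-0", "text": "a red bicycle"})

# Stage-0 stand-in: pretend each prompt produced a KV cache (a string here).
received = {p["request_id"]: f"kv<{p['branch']}>" for p in prompts}

# KV transfer manager: hand the received caches to the model-specific hook.
organized = collect(received)
print(organized)
```

The framework code in this sketch only knows about dotted paths and opaque callables, so a different model could supply its own expansion and collection logic by changing the config alone.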
Proposed Change.
CC List.
@hsliuustc0106 @ZJY0516 @natureofnature @nussejzz