[Feature]: Support cfg kv-cache transfer in multi-stage #1422
Merged: hsliuustc0106 merged 32 commits into vllm-project:main from princepride:feature/cfg-multi-stage on Mar 2, 2026
Commits
6cde323 cfg multi-stage (princepride)
4112f54 _forward_parent_with_cfg saved in parent_result['engine_outputs'] && … (princepride)
428e774 only check whether parent have complete (princepride)
a66e55f change to deepcopy (princepride)
a32a813 update bagel e2e test (princepride)
883549d Merge branch 'main' into feature/cfg-multi-stage (princepride)
321e5a4 add docs (princepride)
3ae335f Merge branch 'main' into feature/cfg-multi-stage (princepride)
b233bb5 Merge branch 'main' into feature/cfg-multi-stage (princepride)
f6b0195 Remove unused completed_requests param from _forward_parent_with_cfg (princepride)
0156ff7 Build reverse index for O(1) companion-to-parent lookup (princepride)
18ccae3 Add single-threaded assumption note for source_outputs_override swap (princepride)
c0a1eb7 Remove unused is_cfg_companion_request and get_parent_request_id (princepride)
cea6766 Clarify why cfg_kv_collect_func is resolved on the parent side (princepride)
9dba25f Fix _get_negative_prompt to treat empty string as absent (princepride)
6cca43d Add comment explaining max_batch_size assumption in bagel.yaml (princepride)
9b9f06e fix pre-commit (princepride)
a3a10f2 remove omni cfg_kv_collect_func (princepride)
06f2165 move load_func_from_config to stage_utils (princepride)
5ff56f6 Merge branch 'main' into feature/cfg-multi-stage (princepride)
3f90cdd wrap cfg processor (princepride)
a694646 wrap cfg processor (princepride)
30b5db4 wrap cfg processor (princepride)
b50abab wrap cfg processor (princepride)
a6fc71d Merge branch 'main' into feature/cfg-multi-stage (princepride)
ebbc36f Merge branch 'main' into feature/cfg-multi-stage (princepride)
6a04e70 correct process completed_requests if transfer data to next stage failed (princepride)
c2eb553 idiomatic return value (princepride)
8ccf529 if not sent_via_connector, throw error (princepride)
9c688d6 add test_cfg_companion_tracker and fix bagel exception handlers (princepride)
46ef8c5 Merge branch 'main' into feature/cfg-multi-stage (princepride)
753a19a test kv_transfer_manager methods expand (princepride)
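
Commit 0156ff7 mentions building a reverse index for O(1) companion-to-parent lookup. The PR page does not show that change, so the following is only a minimal sketch of the general idea, assuming companion IDs are formed as the parent ID plus a suffix (as the test in this PR suggests). The class `CompanionIndex` and its method names are hypothetical and are not part of vllm_omni.

```python
from __future__ import annotations

from typing import Optional


class CompanionIndex:
    """Hypothetical two-way index between parent and companion request IDs."""

    def __init__(self) -> None:
        # Forward map: parent_id -> {role: companion_id}
        self._by_parent: dict[str, dict[str, str]] = {}
        # Reverse map: companion_id -> parent_id, so lookups avoid a scan
        self._parent_of: dict[str, str] = {}

    def register(self, parent_id: str, role: str, suffix: str) -> str:
        """Record a companion and return its derived request ID."""
        companion_id = f"{parent_id}{suffix}"  # assumed naming scheme
        self._by_parent.setdefault(parent_id, {})[role] = companion_id
        self._parent_of[companion_id] = parent_id
        return companion_id

    def parent_of(self, companion_id: str) -> Optional[str]:
        """Single dict lookup instead of iterating over every parent."""
        return self._parent_of.get(companion_id)

    def companions_of(self, parent_id: str) -> dict[str, str]:
        """Return a copy of the role -> companion_id map for a parent."""
        return dict(self._by_parent.get(parent_id, {}))


index = CompanionIndex()
companion_id = index.register("req1", "cfg_text", "__cfg_text")
print(companion_id, index.parent_of(companion_id))
```

Keeping both maps in sync on registration trades a little memory for constant-time lookups in either direction.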
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| | | `@@ -0,0 +1,114 @@` |
| | | `import time` |
| | | `from types import SimpleNamespace` |
| | | |
| | | `import pytest` |
| | | |
| | | `from vllm_omni.entrypoints.cfg_companion_tracker import CfgCompanionTracker` |
| | | |
| | | `pytestmark = [pytest.mark.core_model, pytest.mark.cpu]` |
| | | |
| | | |
| | | `def dummy_expand_func(prompt, sp0):` |
| | | `    if prompt == "expand_me":` |
| | | `        return [SimpleNamespace(prompt={"prompt": "neg"}, role="cfg_text", request_id_suffix="__cfg_text")]` |
| | | `    return []` |
| | | |
| | | |
| | | `@pytest.fixture` |
| | | `def tracker():` |
| | | `    sp0 = SimpleNamespace()` |
| | | `    return CfgCompanionTracker(prompt_expand_func=dummy_expand_func, stage0_sampling_params=sp0, timeout_s=0.1)` |
| | | |
| | | |
| | | `def test_companion_tracker_initialization(tracker):` |
| | | `    assert not tracker.is_active` |
| | | `    assert tracker.num_companions == 0` |
| | | |
| | | |
| | | `def test_expand_prompts_registers_companions(tracker):` |
| | | `    request_id_to_prompt = {"req1": "expand_me", "req2": "do_not_expand"}` |
| | | |
| | | `    pairs = tracker.expand_prompts(request_id_to_prompt)` |
| | | |
| | | `    assert len(pairs) == 1` |
| | | `    companion_id, prompt = pairs[0]` |
| | | `    assert companion_id == "req1__cfg_text"` |
| | | `    assert prompt == {"prompt": "neg"}` |
| | | |
| | | `    assert tracker.is_active` |
| | | `    assert tracker.num_companions == 1` |
| | | `    assert tracker.is_companion("req1__cfg_text")` |
| | | `    assert not tracker.is_companion("req2__cfg_text")` |
| | | `    assert tracker.has_companions("req1")` |
| | | `    assert not tracker.has_companions("req2")` |
| | | |
| | | `    comp_map = tracker.get_companion_request_ids("req1")` |
| | | `    assert comp_map == {"cfg_text": "req1__cfg_text"}` |
| | | |
| | | |
| | | `def test_companion_lifecycle_success(tracker):` |
| | | `    request_id_to_prompt = {"req1": "expand_me"}` |
| | | `    tracker.expand_prompts(request_id_to_prompt)` |
| | | |
| | | `    # Defer parent` |
| | | `    engine_outputs = {"out": 123}` |
| | | `    tracker.defer_parent("req1", engine_outputs, stage_id=0)` |
| | | |
| | | `    # Initially not done` |
| | | `    assert not tracker.all_companions_done("req1")` |
| | | |
| | | `    # Companion completes` |
| | | `    parent_id = tracker.on_companion_completed("req1__cfg_text")` |
| | | |
| | | `    # Parent should be returned since all companions are done and it is pending` |
| | | `    assert parent_id == "req1"` |
| | | `    assert tracker.all_companions_done("req1")` |
| | | |
| | | `    # Pop pending parent` |
| | | `    popped = tracker.pop_pending_parent("req1")` |
| | | `    assert popped is not None` |
| | | `    assert popped["engine_outputs"] == engine_outputs` |
| | | `    assert popped["stage_id"] == 0` |
| | | |
| | | |
| | | `def test_companion_lifecycle_failure(tracker):` |
| | | `    request_id_to_prompt = {"req1": "expand_me"}` |
| | | `    tracker.expand_prompts(request_id_to_prompt)` |
| | | |
| | | `    tracker.defer_parent("req1", {"out": 123}, stage_id=0)` |
| | | |
| | | `    # Companion fails` |
| | | `    parent_id, aborted = tracker.on_companion_error("req1__cfg_text")` |
| | | |
| | | `    assert parent_id == "req1"` |
| | | `    assert aborted is True` |
| | | `    assert tracker.is_parent_failed("req1")` |
| | | |
| | | `    # Parent should be removed from pending list` |
| | | `    assert tracker.pop_pending_parent("req1") is None` |
| | | |
| | | `    # Consume failure` |
| | | `    tracker.consume_parent_failure("req1")` |
| | | `    assert not tracker.is_parent_failed("req1")` |
| | | |
| | | |
| | | `def test_companion_lifecycle_timeout(tracker):` |
| | | `    request_id_to_prompt = {"req1": "expand_me"}` |
| | | `    tracker.expand_prompts(request_id_to_prompt)` |
| | | |
| | | `    tracker.defer_parent("req1", {"out": 123}, stage_id=0)` |
| | | |
| | | `    # Initially no timeouts` |
| | | `    timeouts = tracker.check_timeouts()` |
| | | `    assert len(timeouts) == 0` |
| | | |
| | | `    # Wait for timeout` |
| | | `    time.sleep(0.15)` |
| | | |
| | | `    # Check timeouts again` |
| | | `    timeouts = tracker.check_timeouts()` |
| | | `    assert len(timeouts) == 1` |
| | | `    assert timeouts[0] == "req1"` |
| | | |
| | | `    # Should be removed from pending` |
| | | `    assert tracker.pop_pending_parent("req1") is None` |
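
The tests above exercise three outcomes for a deferred parent request: release once its companion completes, abort when a companion errors, and expiry after `timeout_s`. The tracker's implementation is not visible on this page, so the sketch below is a hypothetical stand-in named `ToyCompanionTracker` that reproduces only the behavior the assertions describe. The `register` method and the injectable `clock` parameter are assumptions added for illustration; the real `CfgCompanionTracker` may differ in structure and semantics.

```python
from __future__ import annotations

import time
from typing import Any, Callable, Optional


class ToyCompanionTracker:
    """Hypothetical stand-in; NOT the vllm_omni CfgCompanionTracker."""

    def __init__(self, timeout_s: float, clock: Callable[[], float] = time.monotonic) -> None:
        self.timeout_s = timeout_s
        self._clock = clock  # injectable so tests need not sleep (assumption)
        self._parent_of: dict[str, str] = {}            # companion -> parent
        self._waiting: dict[str, set[str]] = {}         # parent -> unfinished companions
        self._pending: dict[str, dict[str, Any]] = {}   # deferred parents
        self._failed: set[str] = set()

    def register(self, parent_id: str, companion_id: str) -> None:
        self._parent_of[companion_id] = parent_id
        self._waiting.setdefault(parent_id, set()).add(companion_id)

    def defer_parent(self, parent_id: str, engine_outputs: Any, stage_id: int) -> None:
        # Hold the parent's outputs until its companions finish.
        self._pending[parent_id] = {
            "engine_outputs": engine_outputs,
            "stage_id": stage_id,
            "deferred_at": self._clock(),
        }

    def on_companion_completed(self, companion_id: str) -> Optional[str]:
        # Return the parent ID only when it is pending and nothing is left to wait for.
        parent_id = self._parent_of[companion_id]
        self._waiting[parent_id].discard(companion_id)
        ready = not self._waiting[parent_id] and parent_id in self._pending
        return parent_id if ready else None

    def on_companion_error(self, companion_id: str) -> tuple[str, bool]:
        # Mark the parent failed; report whether a deferred parent was dropped.
        parent_id = self._parent_of[companion_id]
        self._failed.add(parent_id)
        aborted = self._pending.pop(parent_id, None) is not None
        return parent_id, aborted

    def check_timeouts(self) -> list[str]:
        # Remove and return parents deferred for longer than timeout_s.
        now = self._clock()
        expired = [p for p, e in self._pending.items() if now - e["deferred_at"] > self.timeout_s]
        for parent_id in expired:
            del self._pending[parent_id]
        return expired

    def pop_pending_parent(self, parent_id: str) -> Optional[dict[str, Any]]:
        return self._pending.pop(parent_id, None)
```

Using `time.monotonic` as the default clock avoids spurious timeouts if the wall clock is adjusted, and injecting the clock lets a test advance time deterministically instead of calling `time.sleep`.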