forked from vllm-project/vllm
PD heterogenous TP #77
Closed
Commits
19 commits
c35cf98 one remote agent per remote rank (NickLucche)
ec84817 tp_size in metadata and handshake with rank0 first (NickLucche)
60ab197 todos (NickLucche)
792bacd dst_num_blocks is engine_id only (NickLucche)
d2ea8c8 fixes (NickLucche)
f17092f block_len is tp dependent (NickLucche)
ddf4c8e wip (NickLucche)
00392ce refactor remote kv_cache splitting and ditch tp_multiplier (NickLucche)
eb0bdd2 2-handshake model with vertical kv cache split (NickLucche)
44db464 still broken (NickLucche)
52d2325 minor (NickLucche)
8080346 revert config changes (NickLucche)
f216e03 split kv_cache along head dim (NickLucche)
72a4c14 fix descr indexing (NickLucche)
522f647 clean up (NickLucche)
e4e4749 format (NickLucche)
d2ce96a format (NickLucche)
ca0e15f type (NickLucche)
6868c9a change remote worker selection indexing; test ptp2-dtp4 (NickLucche)
For MLA we replicate the KV cache across TP ranks, so in this case the prefiller would need to send the same blocks to all decoders. The same holds when the TP size is greater than the number of KV heads, since the heads are then replicated across ranks rather than split.
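To make the replicated case concrete, here is a minimal sketch of how a decode rank could work out which prefill rank to read from and which slice of that rank's local KV heads it needs. This is not the code in this PR: `global_head_range` and `plan_transfer` are hypothetical names, and the sketch assumes the decode TP size is a multiple of the prefill TP size, that head counts and TP sizes divide evenly, and that decode rank `d` reads from prefill rank `d // (decode_tp // prefill_tp)`.

```python
from typing import Optional, Tuple


def global_head_range(rank: int, tp_size: int, num_kv_heads: int) -> Tuple[int, int]:
    """Global [start, stop) KV head range held by `rank` (hypothetical helper).

    Assumes tp_size and num_kv_heads divide each other evenly.
    """
    if tp_size <= num_kv_heads:
        # Heads are split: each rank owns a contiguous group of heads.
        per_rank = num_kv_heads // tp_size
        start = rank * per_rank
        return start, start + per_rank
    # More ranks than heads: each head is replicated on several ranks.
    replicas = tp_size // num_kv_heads
    head = rank // replicas
    return head, head + 1


def plan_transfer(
    decode_rank: int,
    prefill_tp: int,
    decode_tp: int,
    num_kv_heads: int,
    use_mla: bool = False,
) -> Tuple[int, Optional[Tuple[int, int]]]:
    """Return (prefill_rank, local_head_slice) for one decode rank.

    local_head_slice is None when the whole block is needed (MLA), otherwise
    a [start, stop) range relative to the prefill rank's local heads.
    Assumption: decode_tp is a multiple of prefill_tp.
    """
    ratio = decode_tp // prefill_tp
    src = decode_rank // ratio
    if use_mla:
        # The KV cache is replicated across TP ranks: every decoder needs
        # the same full blocks, so there is nothing to slice.
        return src, None
    d_start, d_stop = global_head_range(decode_rank, decode_tp, num_kv_heads)
    p_start, _ = global_head_range(src, prefill_tp, num_kv_heads)
    return src, (d_start - p_start, d_stop - p_start)


if __name__ == "__main__":
    # Prefill TP=2, decode TP=4, as in the "ptp2-dtp4" test commit.
    for d in range(4):
        print(d, plan_transfer(d, prefill_tp=2, decode_tp=4, num_kv_heads=8))
```

With 8 KV heads each decode rank gets a distinct half of its prefill rank's heads. With 2 KV heads and decode TP=4, decode ranks 0 and 1 end up with the same source rank and the same slice, which is the "send the same blocks to several decoders" situation described above. With `use_mla=True` every rank gets `None`, meaning the full block.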