Enable LMCache for CPU offloading, LMCache Docker support, enable LMCache #64
Conversation
@hsubramony, please fix the DCO.
Signed-off-by: Harish Subramony <[email protected]>
Force-pushed from f1b6f55 to 1eb7eb5
Signed-off-by: Harish Subramony <[email protected]>
Force-pushed from 59e80e2 to e6edddc
Signed-off-by: Harish Subramony <[email protected]>
Outdated review comments (resolved) on:
examples/lmcache/disagg_prefill_lmcache_v1/configs/lmcache-decoder-config.yaml
examples/lmcache/disagg_prefill_lmcache_v1/configs/lmcache-prefiller-config.yaml
examples/lmcache/disagg_prefill_lmcache_v1/disagg_vllm_launcher.sh
examples/lmcache/disagg_prefill_lmcache_v1/disagg_vllm_launcher.sh
I think only the example/lmcache/hpu folder is needed; the CUDA example can be removed, right?
Also, please do rebase: vLLM upstream just updated a general class API and just fixed the crash issue for vllm-gaudi, so you'll need to rebase to make it pass.
Signed-off-by: Harish Subramony <[email protected]>
@@ -1760,7 +1815,11 @@ def execute_model(
    spec_token_ids=None,
    prompt_logprobs_dict=prompt_logprobs_dict,  # type: ignore[arg-type]
    pooler_output=[],
    #finished_sending=finished_sending,
    #finished_recving=finished_recving,
remove these two lines
I think there should be a new parameter: kv_connector_output=kv_connector_output?
please check here:
https://github.com/vllm-project/vllm/blob/main/vllm/v1/outputs.py#L114
kv_connector_output: Optional[KVConnectorOutput] = None is optional
I see what you're saying:
class KVConnectorOutput:
    # [req_ids]
    finished_sending: Optional[set[str]] = None
    finished_recving: Optional[set[str]] = None
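Reading these comments together: instead of the commented-out finished_sending / finished_recving keyword arguments, the runner would wrap the two sets in the new dataclass and pass a single kv_connector_output. A minimal sketch, assuming the KVConnectorOutput and ModelRunnerOutput definitions linked above; the helper name make_kv_connector_output and the surrounding call are illustrative, not the PR's actual code:

from typing import Optional

from vllm.v1.outputs import KVConnectorOutput


def make_kv_connector_output(
    finished_sending: Optional[set[str]],
    finished_recving: Optional[set[str]],
) -> Optional[KVConnectorOutput]:
    # Wrap the per-step request-id sets, or return None when there is
    # nothing to report for this step.
    if not finished_sending and not finished_recving:
        return None
    return KVConnectorOutput(
        finished_sending=finished_sending,
        finished_recving=finished_recving,
    )


# In execute_model(), the ModelRunnerOutput construction would then gain
# one keyword argument instead of the two removed ones, roughly:
#     return ModelRunnerOutput(
#         ...,
#         pooler_output=[],
#         kv_connector_output=make_kv_connector_output(finished_sending,
#                                                      finished_recving),
#     )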
output = copy.copy(EMPTY_MODEL_RUNNER_OUTPUT)
output.finished_sending = finished_sending
output.finished_recving = finished_recving
I think it is now output.kv_connector_output instead of finished_sending / finished_recving?
please check here:
https://github.com/vllm-project/vllm/blob/main/vllm/v1/outputs.py#L114
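For the hunk quoted above, the suggested change would then look roughly like this. A sketch assuming the current vllm/v1/outputs.py layout; the placeholder request-id sets stand in for the runner's local variables:

import copy

from vllm.v1.outputs import EMPTY_MODEL_RUNNER_OUTPUT, KVConnectorOutput

# Placeholder request-id sets standing in for the runner's local variables.
finished_sending = {"req-1"}
finished_recving = {"req-2"}

# Copy the empty template, then attach a KVConnectorOutput instead of setting
# the removed finished_sending / finished_recving attributes directly.
output = copy.copy(EMPTY_MODEL_RUNNER_OUTPUT)
output.kv_connector_output = KVConnectorOutput(
    finished_sending=finished_sending,
    finished_recving=finished_recving,
)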
Since the LMCache upstream PR is pending and we can't add CI at the moment, we might need to wait for the upstream PR to merge first and then continue with this one.
Enable LMCache, ported from HabanaAI/vllm-fork#1590.
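For background on the "LMCache for CPU offloading" part of this PR, here is a minimal sketch of how LMCache CPU offloading is typically enabled in vLLM, modeled on vLLM's public cpu_offload_lmcache example. The model name, environment-variable values, and sizes below are illustrative assumptions, not taken from this PR:

import os

from vllm import LLM, SamplingParams
from vllm.config import KVTransferConfig

# Illustrative LMCache settings (assumed, not from this PR): cache KV in
# 256-token chunks in a local CPU buffer of up to 5 GB.
os.environ["LMCACHE_CHUNK_SIZE"] = "256"
os.environ["LMCACHE_LOCAL_CPU"] = "True"
os.environ["LMCACHE_MAX_LOCAL_CPU_SIZE"] = "5.0"

# Route KV-cache transfers through the LMCache v1 connector.
ktc = KVTransferConfig(kv_connector="LMCacheConnectorV1", kv_role="kv_both")

llm = LLM(
    model="meta-llama/Llama-3.1-8B-Instruct",  # assumed model, any supported model works
    kv_transfer_config=ktc,
    max_model_len=8000,
    gpu_memory_utilization=0.8,
)

outputs = llm.generate(["Hello, my name is"],
                       SamplingParams(temperature=0.0, max_tokens=32))
for out in outputs:
    print(out.outputs[0].text)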