Conversation

hsubramony

Enable LMCache, ported from HabanaAI/vllm-fork#1590

@xuechendi
Collaborator

@hsubramony, please fix DCO.

@xuechendi
Collaborator

I think only the example/lmcache/hpu folder is needed; the CUDA example can be removed, right?

@xuechendi
Collaborator

Also, please rebase: vllm upstream just updated a general class API, which fixed the crash issue for vllm-gaudi, so you'll need to rebase to make it pass.

@@ -1760,7 +1815,11 @@ def execute_model(
spec_token_ids=None,
prompt_logprobs_dict=prompt_logprobs_dict, # type: ignore[arg-type]
pooler_output=[],
#finished_sending=finished_sending,
#finished_recving=finished_recving,

Collaborator

remove these two lines

Collaborator

I think there should be a new parameter: kv_connector_output=kv_connector_output?
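
For illustration only, a minimal, self-contained sketch of what the call site in the diff above might look like with that extra argument. The dataclasses below are stand-ins carrying only the fields visible in this thread, not the real vllm classes, and the variable values are placeholders:

# Stand-in types so this sketch runs on its own; in vllm-gaudi the real
# classes come from vllm (vllm/v1/outputs.py).
from dataclasses import dataclass
from typing import Any, Optional


@dataclass
class KVConnectorOutput:
    finished_sending: Optional[set[str]] = None
    finished_recving: Optional[set[str]] = None


@dataclass
class ModelRunnerOutput:
    spec_token_ids: Optional[list[list[int]]] = None
    prompt_logprobs_dict: Optional[dict[str, Any]] = None
    pooler_output: Optional[list[Any]] = None
    kv_connector_output: Optional[KVConnectorOutput] = None


# Placeholder inputs standing in for what execute_model() already has.
prompt_logprobs_dict: dict[str, Any] = {}
kv_connector_output = KVConnectorOutput(finished_sending=None,
                                        finished_recving=None)

# Instead of the commented-out finished_sending / finished_recving kwargs,
# pass a single kv_connector_output argument.
model_runner_output = ModelRunnerOutput(
    spec_token_ids=None,
    prompt_logprobs_dict=prompt_logprobs_dict,
    pooler_output=[],
    kv_connector_output=kv_connector_output,
)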

Author

kv_connector_output: Optional[KVConnectorOutput] = None  is optional

Author

I see what you're saying:
class KVConnectorOutput:
    # [req_ids]
    finished_sending: Optional[set[str]] = None
    finished_recving: Optional[set[str]] = None


output = copy.copy(EMPTY_MODEL_RUNNER_OUTPUT)
output.finished_sending = finished_sending
output.finished_recving = finished_recving
Collaborator

@xuechendi Aug 13, 2025

I think it is now output.kv_connector_output instead of finished_sending / finished_recving?
Please check here:
https://github.com/vllm-project/vllm/blob/main/vllm/v1/outputs.py#L114
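
For reference, a minimal sketch of that copy-and-assign pattern with the newer field, assuming the layout of KVConnectorOutput quoted earlier in this thread; the stand-in definitions are included only so the snippet runs on its own and are not the real vllm classes:

import copy
from dataclasses import dataclass
from typing import Optional


@dataclass
class KVConnectorOutput:
    # [req_ids]
    finished_sending: Optional[set[str]] = None
    finished_recving: Optional[set[str]] = None


@dataclass
class ModelRunnerOutput:
    kv_connector_output: Optional[KVConnectorOutput] = None


EMPTY_MODEL_RUNNER_OUTPUT = ModelRunnerOutput()

# Placeholder request ids standing in for the runner's bookkeeping.
finished_sending = {"req-1"}
finished_recving = {"req-2"}

# Previously the finished sets were assigned directly on the output; with the
# kv_connector_output field they are wrapped in a KVConnectorOutput instead.
output = copy.copy(EMPTY_MODEL_RUNNER_OUTPUT)
output.kv_connector_output = KVConnectorOutput(
    finished_sending=finished_sending,
    finished_recving=finished_recving,
)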

@xuechendi
Collaborator

Since the LMCache upstream PR is still pending and we can't add CI at this moment, this might need to wait for the upstream PR to merge first before continuing with this one.
CI will be needed for this PR; otherwise an upstream update may break this feature.
