Enable LMCache for CPU offloading, LMCache Docker support, enable LMCache #64
Conversation
@hsubramony, please fix the DCO.
Signed-off-by: Harish Subramony <[email protected]>
Force-pushed from f1b6f55 to 1eb7eb5
Signed-off-by: Harish Subramony <[email protected]>
Force-pushed from 59e80e2 to e6edddc
Signed-off-by: Harish Subramony <[email protected]>
Outdated review comments (resolved) on:
examples/lmcache/disagg_prefill_lmcache_v1/configs/lmcache-decoder-config.yaml
examples/lmcache/disagg_prefill_lmcache_v1/configs/lmcache-prefiller-config.yaml
examples/lmcache/disagg_prefill_lmcache_v1/disagg_vllm_launcher.sh
examples/lmcache/disagg_prefill_lmcache_v1/disagg_vllm_launcher.sh
I think only the example/lmcache/hpu folder is needed; the CUDA example can be removed, right?
Also, please do rebase: vLLM upstream just updated a general class API and just fixed the crash issue for vllm-gaudi, so you'll need to rebase to make it pass.
Signed-off-by: Harish Subramony <[email protected]>
@@ -1760,7 +1815,11 @@ def execute_model(
    spec_token_ids=None,
    prompt_logprobs_dict=prompt_logprobs_dict,  # type: ignore[arg-type]
    pooler_output=[],
    #finished_sending=finished_sending,
    #finished_recving=finished_recving,
remove these two lines
I think there should be a new parameter: kv_connector_output=kv_connector_output?
please check here:
https://github.com/vllm-project/vllm/blob/main/vllm/v1/outputs.py#L114
kv_connector_output: Optional[KVConnectorOutput] = None is optional
I see what you're saying:
class KVConnectorOutput:
    # [req_ids]
    finished_sending: Optional[set[str]] = None
    finished_recving: Optional[set[str]] = None
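Reading these comments together: instead of the commented-out finished_sending / finished_recving keyword arguments, the runner would wrap the two sets in the new dataclass and pass a single kv_connector_output. A minimal sketch, assuming the KVConnectorOutput and ModelRunnerOutput definitions linked above; the helper name make_kv_connector_output and the surrounding call are illustrative, not the PR's actual code:

from typing import Optional

from vllm.v1.outputs import KVConnectorOutput


def make_kv_connector_output(
    finished_sending: Optional[set[str]],
    finished_recving: Optional[set[str]],
) -> Optional[KVConnectorOutput]:
    # Wrap the per-step request-id sets, or return None when there is
    # nothing to report for this step.
    if not finished_sending and not finished_recving:
        return None
    return KVConnectorOutput(
        finished_sending=finished_sending,
        finished_recving=finished_recving,
    )


# In execute_model(), the ModelRunnerOutput construction would then gain
# one keyword argument instead of the two removed ones, roughly:
#     return ModelRunnerOutput(
#         ...,
#         pooler_output=[],
#         kv_connector_output=make_kv_connector_output(finished_sending,
#                                                      finished_recving),
#     )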
output = copy.copy(EMPTY_MODEL_RUNNER_OUTPUT)
output.finished_sending = finished_sending
output.finished_recving = finished_recving
I think it is now output.kv_connector_output instead of finished_sending / finished_recving?
please check here:
https://github.com/vllm-project/vllm/blob/main/vllm/v1/outputs.py#L114
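For the hunk quoted above, the suggested change would then look roughly like this. A sketch assuming the current vllm/v1/outputs.py layout; the placeholder request-id sets stand in for the runner's local variables:

import copy

from vllm.v1.outputs import EMPTY_MODEL_RUNNER_OUTPUT, KVConnectorOutput

# Placeholder request-id sets standing in for the runner's local variables.
finished_sending = {"req-1"}
finished_recving = {"req-2"}

# Copy the empty template, then attach a KVConnectorOutput instead of setting
# the removed finished_sending / finished_recving attributes directly.
output = copy.copy(EMPTY_MODEL_RUNNER_OUTPUT)
output.kv_connector_output = KVConnectorOutput(
    finished_sending=finished_sending,
    finished_recving=finished_recving,
)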
Since the LMCache upstream PR is pending and we can't add CI at the moment, we might need to wait for the upstream PR to merge first and then continue with this one.
Enable LMCache, ported from HabanaAI/vllm-fork#1590.
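For background on the "LMCache for CPU offloading" part of this PR, here is a minimal sketch of how LMCache CPU offloading is typically enabled in vLLM, modeled on vLLM's public cpu_offload_lmcache example. The model name, environment-variable values, and sizes below are illustrative assumptions, not taken from this PR:

import os

from vllm import LLM, SamplingParams
from vllm.config import KVTransferConfig

# Illustrative LMCache settings (assumed, not from this PR): cache KV in
# 256-token chunks in a local CPU buffer of up to 5 GB.
os.environ["LMCACHE_CHUNK_SIZE"] = "256"
os.environ["LMCACHE_LOCAL_CPU"] = "True"
os.environ["LMCACHE_MAX_LOCAL_CPU_SIZE"] = "5.0"

# Route KV-cache transfers through the LMCache v1 connector.
ktc = KVTransferConfig(kv_connector="LMCacheConnectorV1", kv_role="kv_both")

llm = LLM(
    model="meta-llama/Llama-3.1-8B-Instruct",  # assumed model, any supported model works
    kv_transfer_config=ktc,
    max_model_len=8000,
    gpu_memory_utilization=0.8,
)

outputs = llm.generate(["Hello, my name is"],
                       SamplingParams(temperature=0.0, max_tokens=32))
for out in outputs:
    print(out.outputs[0].text)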