vLLM prefix caching correctness (+ "test") #346

joecummings · 2025-10-08T16:01:27Z

Context

Are we correctly clearing our prefix cache? Hard to know without proper testing. Hence, this PR.

What changed?

Modified policy to 1) just call clear prefix cache on scheduler and 2) attach num tokens cached to metadata
Add "metadata" to Completions
Add test that checks that the number of tokens used in the cache are expected

Open questions

Do you find this test to be sufficient proof that we are using the prefix cache from vLLM "correctly"?

JenniferWang · 2025-10-08T17:21:25Z

tests/integration_tests/test_vllm_policy_correctness.py

+    According to the vLLM docs (https://docs.vllm.ai/en/v0.9.0/api/vllm/outputs.html#vllm.outputs.RequestOutput),
+    this is the number of tokens with a prefix cache hit. So, the logic is that if we run one generation,
+    then run another generation with the same start, we should see the number of cached tokens == the length of the prefix.
+


JenniferWang · 2025-10-08T20:01:11Z

src/forge/actors/policy.py

        logger.info(f"Weight update completed (now v{self.policy_version})")

+    @endpoint
+    async def _reset_prefix_cache(self):


Maybe we can have this convention of

Suggested change

async def _reset_prefix_cache(self):

async def _test_reset_prefix_cache(self):

In this case, I actually don't mind exposing to the end user! It could be used if someone wants to do something custom with their Policy setup.

Jack-Khuu

Thanks for the test

Jack-Khuu · 2025-10-08T20:03:36Z

src/forge/actors/policy.py

        logger.info(f"Weight update completed (now v{self.policy_version})")

+    @endpoint
+    async def _reset_prefix_cache(self):


Another example of needing
meta-pytorch/monarch#1455

Jack-Khuu · 2025-10-08T20:07:06Z

tests/integration_tests/test_vllm_policy_correctness.py

+        )
+        expected_cached_tokens = 0
+        async for res in vllm_model.generate(
+            first_prompt, sampling_params, request_id="first_16"


Does the request_id matter here?

Nah, just for logging.

Test vLLM prefix caching correctness

7021611

meta-cla bot added the CLA Signed This label is managed by the Meta Open Source bot. label Oct 8, 2025

Add metadata to Completion

1ab1f5a

joecummings changed the title ~~Test vLLM prefix caching correctness~~ vLLM prefix caching correctness (+ "test") Oct 8, 2025

joecummings marked this pull request as ready for review October 8, 2025 18:32

joecummings added 2 commits October 8, 2025 11:37

Merge remote-tracking branch 'upstream/main' into kv-caching-test

a9c4110

Remove need to instantiate metric logger

e187c29

joecummings requested review from JenniferWang and pbontrager October 8, 2025 18:42

lint

d3fd8d3

JenniferWang approved these changes Oct 8, 2025

View reviewed changes

Jack-Khuu approved these changes Oct 8, 2025

View reviewed changes

joecummings merged commit 3dd4853 into meta-pytorch:main Oct 8, 2025
8 checks passed

joecummings deleted the kv-caching-test branch October 8, 2025 21:45

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

vLLM prefix caching correctness (+ "test") #346

vLLM prefix caching correctness (+ "test") #346

Uh oh!

joecummings commented Oct 8, 2025 •

edited

Loading

Uh oh!

JenniferWang Oct 8, 2025

Uh oh!

JenniferWang Oct 8, 2025

Uh oh!

joecummings Oct 8, 2025

Uh oh!

Jack-Khuu left a comment

Uh oh!

Jack-Khuu Oct 8, 2025

Uh oh!

Jack-Khuu Oct 8, 2025

Uh oh!

joecummings Oct 8, 2025

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

	async def _reset_prefix_cache(self):
	async def _test_reset_prefix_cache(self):

vLLM prefix caching correctness (+ "test") #346

vLLM prefix caching correctness (+ "test") #346

Uh oh!

Conversation

joecummings commented Oct 8, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Context

What changed?

Open questions

Uh oh!

JenniferWang Oct 8, 2025

Choose a reason for hiding this comment

Uh oh!

JenniferWang Oct 8, 2025

Choose a reason for hiding this comment

Uh oh!

joecummings Oct 8, 2025

Choose a reason for hiding this comment

Uh oh!

Jack-Khuu left a comment

Choose a reason for hiding this comment

Uh oh!

Jack-Khuu Oct 8, 2025

Choose a reason for hiding this comment

Uh oh!

Jack-Khuu Oct 8, 2025

Choose a reason for hiding this comment

Uh oh!

joecummings Oct 8, 2025

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

joecummings commented Oct 8, 2025 •

edited

Loading