[Test] add test for prefix cache feature of deepseek by HuaJiaHeng · Pull Request #3733 · vllm-project/vllm-ascend

HuaJiaHeng · 2025-10-25T01:21:53Z

What this PR does / why we need it?

This PR adds a prefix cache case for nightly test for DeepSeek-r1-0528-W8A8 on A3, we need test them daily.

Does this PR introduce any user-facing change?

No

How was this patch tested?

By running the test

vLLM version: v0.11.0rc3
vLLM main: vllm-project/vllm@17c540a

gemini-code-assist

Code Review

This pull request introduces a new end-to-end test for the prefix cache feature of the DeepSeek model. The test correctly verifies that using a prefix cache improves the Time To First Token (TTFT) by comparing performance on datasets with different prefix-sharing characteristics. My review focuses on improving the maintainability of the new test code. I've identified significant code duplication in how the test cases are executed and have suggested refactoring this logic into a helper function to make the test cleaner and easier to maintain.

gemini-code-assist · 2025-10-25T01:23:07Z

tests/e2e/nightly/features/test_prefix_cache_deepseek_r1_0528_w8a8.py

+    with RemoteOpenAIServer(model,
+                            server_args,
+                            server_port=port,
+                            env_dict=env_dict,
+                            auto_port=False):
+        run_aisbench_cases(model, port, aisbench_warm_up)
+        result = run_aisbench_cases(model, port, aisbench_cases0)
+        TTFT0 = get_TTFT(result)
+    with RemoteOpenAIServer(model,
+                            server_args,
+                            server_port=port,
+                            env_dict=env_dict,
+                            auto_port=False):
+        run_aisbench_cases(model, port, aisbench_warm_up)
+        result = run_aisbench_cases(model, port, aisbench_cases75)
+        TTFT75 = get_TTFT(result)


The test logic for running the benchmark and getting the TTFT is duplicated for aisbench_cases0 and aisbench_cases75. This includes starting the server, warming up, and running the case. This duplication makes the code harder to read and maintain. Any changes to the test setup would need to be applied in two places, increasing the risk of inconsistencies.

To improve this, you can extract the common logic into a helper function. This will make the test cleaner, more readable, and easier to maintain.

def _run_and_get_ttft(aisbench_case: list) -> float: """Helper to start a server, run a benchmark case, and return TTFT.""" with RemoteOpenAIServer(model, server_args, server_port=port, env_dict=env_dict, auto_port=False): run_aisbench_cases(model, port, aisbench_warm_up) result = run_aisbench_cases(model, port, aisbench_case) return get_TTFT(result) TTFT0 = _run_and_get_ttft(aisbench_cases0) TTFT75 = _run_and_get_ttft(aisbench_cases75)

github-actions · 2025-10-25T01:26:41Z

👋 Hi! Thank you for contributing to the vLLM Ascend project. The following points will speed up your PR merge:‌‌

A PR should do only one thing, smaller PRs enable faster reviews.
Every PR should include unit tests and end-to-end tests ‌to ensure it works and is not broken by other future PRs.
Write the commit message by fulfilling the PR description to help reviewer and future developers understand.

If CI fails, you can run linting and testing checks locally according Contributing and Testing.

github-actions · 2025-10-25T01:27:52Z

This pull request has conflicts, please resolve those before we can evaluate the pull request.

Signed-off-by: root <root@hostname-2pbfv.foreman.pxe>

HuaJiaHeng · 2025-10-25T03:32:24Z

has passed CI.

jiangyunfan1 · 2025-10-25T06:06:14Z

LGTM

### What this PR does / why we need it? This PR adds a prefix cache case for nightly test for DeepSeek-r1-0528-W8A8 on A3, we need test them daily. ### Does this PR introduce _any_ user-facing change? No ### How was this patch tested? By running the test - vLLM version: v0.11.0rc3 - vLLM main: vllm-project/vllm@17c540a --------- Signed-off-by: root <root@hostname-2pbfv.foreman.pxe> Co-authored-by: root <root@hostname-2pbfv.foreman.pxe> Signed-off-by: luolun <luolun1995@cmbchina.com>

### What this PR does / why we need it? This PR adds a prefix cache case for nightly test for DeepSeek-r1-0528-W8A8 on A3, we need test them daily. ### Does this PR introduce _any_ user-facing change? No ### How was this patch tested? By running the test - vLLM version: v0.11.0rc3 - vLLM main: vllm-project/vllm@17c540a --------- Signed-off-by: root <root@hostname-2pbfv.foreman.pxe> Co-authored-by: root <root@hostname-2pbfv.foreman.pxe> Signed-off-by: hwhaokun <haokun0405@163.com>

### What this PR does / why we need it? This PR adds a prefix cache case for nightly test for DeepSeek-r1-0528-W8A8 on A3, we need test them daily. ### Does this PR introduce _any_ user-facing change? No ### How was this patch tested? By running the test - vLLM version: v0.11.0rc3 - vLLM main: vllm-project/vllm@17c540a --------- Signed-off-by: root <root@hostname-2pbfv.foreman.pxe> Co-authored-by: root <root@hostname-2pbfv.foreman.pxe> Signed-off-by: nsdie <yeyifan@huawei.com>

### What this PR does / why we need it? This PR adds a prefix cache case for nightly test for DeepSeek-r1-0528-W8A8 on A3, we need test them daily. ### Does this PR introduce _any_ user-facing change? No ### How was this patch tested? By running the test - vLLM version: v0.11.0rc3 - vLLM main: vllm-project/vllm@17c540a --------- Signed-off-by: root <root@hostname-2pbfv.foreman.pxe> Co-authored-by: root <root@hostname-2pbfv.foreman.pxe>

gemini-code-assist bot reviewed Oct 25, 2025

View reviewed changes

github-actions bot added the merge-conflicts label Oct 25, 2025

github-actions bot added the module:tests label Oct 25, 2025

HuaJiaHeng force-pushed the main branch from 4f2f60b to fa1220e Compare October 25, 2025 03:19

github-actions bot removed the merge-conflicts label Oct 25, 2025

HuaJiaHeng force-pushed the main branch from fa1220e to b6a166d Compare October 25, 2025 03:22

root added 2 commits October 25, 2025 03:23

add test for prefix cache feature of deepseek

36871cb

Signed-off-by: root <root@hostname-2pbfv.foreman.pxe>

change to the new workflow

14afe9a

Signed-off-by: root <root@hostname-2pbfv.foreman.pxe>

HuaJiaHeng force-pushed the main branch from b6a166d to 14afe9a Compare October 25, 2025 03:23

wangxiyuan approved these changes Oct 25, 2025

View reviewed changes

wangxiyuan merged commit 11f7588 into vllm-project:main Oct 25, 2025
20 checks passed

MrZ20 mentioned this pull request Mar 2, 2026

[Nightly][Refactor]Migrate nightly single-node model tests from .py to .yaml #6503

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

[Test] add test for prefix cache feature of deepseek#3733

[Test] add test for prefix cache feature of deepseek#3733
wangxiyuan merged 2 commits intovllm-project:mainfrom
HuaJiaHeng:main

HuaJiaHeng commented Oct 25, 2025 •

edited by github-actions bot

Loading

Uh oh!

gemini-code-assist bot left a comment

Uh oh!

gemini-code-assist bot Oct 25, 2025

Uh oh!

github-actions bot commented Oct 25, 2025

Uh oh!

github-actions bot commented Oct 25, 2025

Uh oh!

HuaJiaHeng commented Oct 25, 2025

Uh oh!

jiangyunfan1 commented Oct 25, 2025

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

Conversation

HuaJiaHeng commented Oct 25, 2025 • edited by github-actions bot Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

What this PR does / why we need it?

Does this PR introduce any user-facing change?

How was this patch tested?

Uh oh!

gemini-code-assist bot left a comment

Choose a reason for hiding this comment

Code Review

Uh oh!

gemini-code-assist bot Oct 25, 2025

Choose a reason for hiding this comment

Uh oh!

github-actions bot commented Oct 25, 2025

Uh oh!

github-actions bot commented Oct 25, 2025

Uh oh!

HuaJiaHeng commented Oct 25, 2025

Uh oh!

jiangyunfan1 commented Oct 25, 2025

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

HuaJiaHeng commented Oct 25, 2025 •

edited by github-actions bot

Loading