Commit 859f2c2

[Nightly][Refactor]Migrate nightly single-node model tests from .py to .yaml (#6503)
### What this PR does / why we need it?

This PR refactors the nightly single-node model tests by migrating test configurations from Python scripts to a more maintainable YAML-based format.

| Original PR | Python (`.py`) | YAML (`.yaml`) |
| :--- | :--- | :--- |
| [#3568](#3568) | `test_deepseek_r1_0528_w8a8_eplb.py` | `DeepSeek-R1-0528-W8A8.yaml` |
| [#3631](#3631) | `test_deepseek_r1_0528_w8a8.py` | `DeepSeek-R1-0528-W8A8.yaml` |
| [#5874](#5874) | `test_deepseek_r1_w8a8_hbm.py` | `DeepSeek-R1-W8A8-HBM.yaml` |
| [#3908](#3908) | `test_deepseek_v3_2_w8a8.py` | `DeepSeek-V3.2-W8A8.yaml` |
| [#5682](#5682) | `test_kimi_k2_thinking.py` | `Kimi-K2-Thinking.yaml` |
| [#4111](#4111) | `test_mtpx_deepseek_r1_0528_w8a8.py` | `MTPX-DeepSeek-R1-0528-W8A8.yaml` |
| [#3733](#3733) | `test_prefix_cache_deepseek_r1_0528_w8a8.py` | `Prefix-Cache-DeepSeek-R1-0528-W8A8.yaml` |
| [#6543](#6543) | `test_qwen3_235b_w8a8.py` | `Qwen3-235B-A22B-W8A8.yaml` |
| [#6543](#6543) | `test_qwen3_235b_a22b_w8a8_eplb.py` | `Qwen3-235B-A22B-W8A8.yaml` |
| [#3973](#3973) | `test_qwen3_30b_w8a8.py` | `Qwen3-30B-A3B-W8A8.yaml` |
| [#3541](#3541) | `test_qwen3_32b_int8.py` | `Qwen3-32B-Int8.yaml` |
| [#3757](#3757) | `test_qwq_32b.py` | `QwQ-32B.yaml` |
| [#5616](#5616) | `test_qwen3_next_w8a8.py` | `Qwen3-Next-80B-A3B-Instruct-W8A8.yaml` |
| [#3541](#3541) | `test_qwen2_5_vl_7b.py` | `Qwen2.5-VL-7B-Instruct.yaml` |
| [#5301](#5301) | `test_qwen2_5_vl_7b_epd.py` | `Qwen2.5-VL-7B-Instruct-EPD.yaml` |
| [#3707](#3707) | `test_qwen2_5_vl_32b.py` | `Qwen2.5-VL-32B-Instruct.yaml` |
| [#3676](#3676) | `test_qwen3_32b_int8_a3_feature_stack3.py` | `Qwen3-32B-Int8-A3-Feature-Stack3.yaml` |
| [#3709](#3709) | `test_prefix_cache_qwen3_32b_int8.py` | `Prefix-Cache-Qwen3-32B-Int8.yaml` |
| [#5395](#5395) | `test_qwen3_next.py` | `Qwen3-Next-80B-A3B-Instruct-A2.yaml` |
| [#3474](#3474) | `test_qwen3_32b.py` | `Qwen3-32B.yaml` |
| [#3541](#3541) | `test_qwen3_32b_int8.py` | `Qwen3-32B-Int8-A2.yaml` |

### Does this PR introduce _any_ user-facing change?

### How was this patch tested?

- vLLM version: v0.15.0
- vLLM main: https://github.com/vllm-project/vllm/commit/v0.15.0

---------

Signed-off-by: MrZ20 <2609716663@qq.com>
1 parent a0a904a commit 859f2c2

51 files changed: +2255 -2326 lines changed

.github/workflows/_e2e_nightly_single_node.yaml

Lines changed: 29 additions & 7 deletions
@@ -28,7 +28,10 @@ on:
         type: string
         default: "swr.cn-southwest-2.myhuaweicloud.com/base_image/ascend-ci/cann:8.5.1-910b-ubuntu22.04-py3.11"
       tests:
-        required: true
+        required: false
+        type: string
+      config_file_path:
+        required: false
         type: string
       name:
         required: false
@@ -44,12 +47,12 @@ defaults:
 # only cancel in-progress runs of the same workflow
 # and ignore the lint / 1 card / 4 cards test type
 concurrency:
-  group: ascend-nightly-${{ github.workflow_ref }}-${{ github.ref }}-${{ inputs.tests }}
+  group: ascend-nightly-${{ github.workflow_ref }}-${{ github.ref }}-${{ inputs.config_file_path || inputs.tests }}
   cancel-in-progress: true

 jobs:
   e2e-nightly:
-    name: ${{ inputs.tests }}
+    name: ${{ inputs.name || inputs.config_file_path || inputs.tests }}
     runs-on: ${{ inputs.runner }}
     timeout-minutes: 600
     container:
@@ -114,14 +117,33 @@ jobs:
           update-alternatives --install /usr/bin/clang clang /usr/bin/clang-15 20
           update-alternatives --install /usr/bin/clang++ clang++ /usr/bin/clang++-15 20

-      - name: Run vllm-project/vllm-ascend test
+      - name: Validate Inputs
+        run: |
+          if [[ -z "${{ inputs.tests }}" && -z "${{ inputs.config_file_path }}" ]]; then
+            echo "Error: Either 'tests' or 'config_file_path' must be provided."
+            exit 1
+          fi
+
+      - name: Run Pytest (py-driven)
+        if: ${{ inputs.tests != '' }}
         env:
           VLLM_WORKER_MULTIPROC_METHOD: spawn
           VLLM_USE_MODELSCOPE: True
           VLLM_CI_RUNNER: ${{ inputs.runner }}
-          BENCHMARK_HOME: /vllm-workspace/vllm-ascend/benchmark
         working-directory: /vllm-workspace/vllm-ascend
         run: |
-          # ignore test_dispatch_ffn_combine until the test is fixed
-          pytest -sv ${{ inputs.tests }} \
+          echo "Running pytest with tests path: ${{ inputs.tests }}"
+          pytest -sv "${{ inputs.tests }}" \
             --ignore=tests/e2e/nightly/single_node/ops/singlecard_ops/test_fused_moe.py
+
+      - name: Run Pytest (YAML-driven)
+        if: ${{ always() && inputs.config_file_path != '' }}
+        env:
+          VLLM_WORKER_MULTIPROC_METHOD: spawn
+          VLLM_USE_MODELSCOPE: True
+          VLLM_CI_RUNNER: ${{ inputs.runner }}
+          CONFIG_YAML_PATH: ${{ inputs.config_file_path }}
+        working-directory: /vllm-workspace/vllm-ascend
+        run: |
+          echo "Running YAML-driven test with config: ${{ inputs.config_file_path }}"
+          pytest -sv tests/e2e/nightly/single_node/models/scripts/test_single_node.py
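
The step gating in this workflow can be mirrored in a few lines of Python for reasoning about which steps run for a given pair of inputs. This is only a sketch of the conditions visible in the diff: `select_steps` and `job_name` are hypothetical helper names, and the sketch ignores failure propagation (the `always()` on the YAML-driven step means it still runs when the py-driven step fails).

```python
from typing import List


def select_steps(tests: str = "", config_file_path: str = "") -> List[str]:
    """Return the pytest steps that would run for the given workflow inputs."""
    # "Validate Inputs": the job fails when neither input is provided.
    if not tests and not config_file_path:
        raise ValueError("Either 'tests' or 'config_file_path' must be provided.")

    steps = []
    # if: ${{ inputs.tests != '' }}
    if tests != "":
        steps.append("Run Pytest (py-driven)")
    # if: ${{ always() && inputs.config_file_path != '' }}
    if config_file_path != "":
        steps.append("Run Pytest (YAML-driven)")
    return steps


def job_name(name: str = "", config_file_path: str = "", tests: str = "") -> str:
    """Emulate `inputs.name || inputs.config_file_path || inputs.tests`."""
    # Like `||` in GitHub expressions, `or` yields the first non-empty operand.
    return name or config_file_path or tests
```

Note that nothing in the conditions makes the two modes mutually exclusive: a caller that passes both `tests` and `config_file_path` gets both pytest steps.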

.github/workflows/schedule_nightly_test_a2.yaml

Lines changed: 24 additions & 10 deletions
@@ -49,15 +49,6 @@ jobs:
       fail-fast: false
       matrix:
         test_config:
-          - name: qwen3-next
-            os: linux-aarch64-a2b3-4
-            tests: tests/e2e/nightly/single_node/models/test_qwen3_next.py
-          - name: qwen3-32b
-            os: linux-aarch64-a2b3-4
-            tests: tests/e2e/nightly/single_node/models/test_qwen3_32b.py
-          - name: qwen3-32b-in8-a2
-            os: linux-aarch64-a2b3-4
-            tests: tests/e2e/nightly/single_node/models/test_qwen3_32b_int8.py
           - name: test_custom_op
             os: linux-aarch64-a2b3-1
             tests: tests/e2e/nightly/single_node/ops/singlecard_ops
@@ -71,10 +62,33 @@ jobs:
       name: ${{ matrix.test_config.name }}
       image: 'swr.cn-southwest-2.myhuaweicloud.com/base_image/ascend-ci/vllm-ascend:nightly-a2'

+  single-node-yaml-tests:
+    name: single-node
+    if: always() && (github.event_name == 'schedule' || github.event_name == 'workflow_dispatch')
+    strategy:
+      fail-fast: false
+      matrix:
+        test_config:
+          - name: qwen3-32b
+            os: linux-aarch64-a2b3-4
+            config_file_path: Qwen3-32B.yaml
+          - name: qwen3-next-80b-a3b-instruct
+            os: linux-aarch64-a2b3-4
+            config_file_path: Qwen3-Next-80B-A3B-Instruct-A2.yaml
+          - name: qwen3-32b-int8
+            os: linux-aarch64-a2b3-4
+            config_file_path: Qwen3-32B-Int8-A2.yaml
+    uses: ./.github/workflows/_e2e_nightly_single_node.yaml
+    with:
+      runner: ${{ matrix.test_config.os }}
+      image: 'swr.cn-southwest-2.myhuaweicloud.com/base_image/ascend-ci/vllm-ascend:nightly-a2'
+      config_file_path: ${{ matrix.test_config.config_file_path }}
+      name: ${{ matrix.test_config.name }}
+
   multi-node-tests:
     name: multi-node
     if: always() && (github.event_name == 'schedule' || github.event_name == 'workflow_dispatch')
-    needs: single-node-tests
+    needs: [single-node-tests, single-node-yaml-tests]
     strategy:
       fail-fast: false
       max-parallel: 1
.github/workflows/schedule_nightly_test_a3.yaml

Lines changed: 55 additions & 45 deletions
@@ -109,73 +109,83 @@ jobs:
   single-node-tests:
     name: single-node
     if: always() && (github.event_name == 'schedule' || github.event_name == 'workflow_dispatch')
-    needs: multi-node-tests
+    needs: [multi-node-tests]
     strategy:
       fail-fast: false
       matrix:
         test_config:
-          - name: qwen3-32b-in8-a3
-            os: linux-aarch64-a3-4
-            tests: tests/e2e/nightly/single_node/models/test_qwen3_32b_int8.py
-          - name: qwen3-32b-int8-a3-feature-stack3
+          - name: qwen3-30b-acc
             os: linux-aarch64-a3-4
-            tests: tests/e2e/nightly/single_node/models/test_qwen3_32b_int8_a3_feature_stack3.py
-          - name: qwen3-235b-a22b-w8a8-eplb
+            tests: tests/e2e/weekly/single_node/models/test_qwen3_30b_acc.py
+    uses: ./.github/workflows/_e2e_nightly_single_node.yaml
+    with:
+      runner: ${{ matrix.test_config.os }}
+      image: 'swr.cn-southwest-2.myhuaweicloud.com/base_image/ascend-ci/vllm-ascend:nightly-a3'
+      tests: ${{ matrix.test_config.tests }}
+      name: ${{ matrix.test_config.name }}
+
+  single-node-yaml-tests:
+    name: single-node
+    if: always() && (github.event_name == 'schedule' || github.event_name == 'workflow_dispatch')
+    needs: [multi-node-tests]
+    strategy:
+      fail-fast: false
+      matrix:
+        test_config:
+          # YAML-driven tests
+          - name: deepseek-r1-0528-w8a8
+            os: linux-aarch64-a3-16
+            config_file_path: DeepSeek-R1-0528-W8A8.yaml
+          - name: deepseek-r1-w8a8-hbm
             os: linux-aarch64-a3-16
-            tests: tests/e2e/nightly/single_node/models/test_qwen3_235b_a22b_w8a8_eplb.py
-          - name: deepseek-r1-w8a8-eplb
+            config_file_path: DeepSeek-R1-W8A8-HBM.yaml
+          - name: deepseek-v3-2-w8a8
+            os: linux-aarch64-a3-16
+            config_file_path: DeepSeek-V3.2-W8A8.yaml
+          - name: kimi-k2-thinking
             os: linux-aarch64-a3-16
-            tests: tests/e2e/nightly/single_node/models/test_deepseek_r1_0528_w8a8_eplb.py
-          - name: deepseek-r1-w8a8-mtpx
+            config_file_path: Kimi-K2-Thinking.yaml
+          - name: mtpx-deepseek-r1-0528-w8a8
             os: linux-aarch64-a3-16
-            tests: tests/e2e/nightly/single_node/models/test_mtpx_deepseek_r1_0528_w8a8.py
+            config_file_path: MTPX-DeepSeek-R1-0528-W8A8.yaml
+          - name: qwen3-235b-a22b-w8a8
+            os: linux-aarch64-a3-16
+            config_file_path: Qwen3-235B-A22B-W8A8.yaml
+          - name: qwen3-30b-a3b-w8a8
+            os: linux-aarch64-a3-4
+            config_file_path: Qwen3-30B-A3B-W8A8.yaml
+          - name: qwen3-next-80b-a3b-instruct-w8a8
+            os: linux-aarch64-a3-4
+            config_file_path: Qwen3-Next-80B-A3B-Instruct-W8A8.yaml
+          - name: qwq-32b
+            os: linux-aarch64-a3-4
+            config_file_path: QwQ-32B.yaml
+          - name: qwen3-32b-int8
+            os: linux-aarch64-a3-4
+            config_file_path: Qwen3-32B-Int8.yaml
           - name: qwen2-5-vl-7b
             os: linux-aarch64-a3-4
-            tests: tests/e2e/nightly/single_node/models/test_qwen2_5_vl_7b.py
+            config_file_path: Qwen2.5-VL-7B-Instruct.yaml
           - name: qwen2-5-vl-7b-epd
             os: linux-aarch64-a3-4
-            tests: tests/e2e/nightly/single_node/models/test_qwen2_5_vl_7b_epd.py
+            config_file_path: Qwen2.5-VL-7B-Instruct-EPD.yaml
           - name: qwen2-5-vl-32b
             os: linux-aarch64-a3-4
-            tests: tests/e2e/nightly/single_node/models/test_qwen2_5_vl_32b.py
+            config_file_path: Qwen2.5-VL-32B-Instruct.yaml
+          - name: qwen3-32b-int8-a3-feature-stack3
+            os: linux-aarch64-a3-4
+            config_file_path: Qwen3-32B-Int8-A3-Feature-Stack3.yaml
           - name: qwen3-32b-int8-prefix-cache
             os: linux-aarch64-a3-4
-            tests: tests/e2e/nightly/single_node/models/test_prefix_cache_qwen3_32b_int8.py
-          - name: deepseek-r1-0528-w8a8
-            os: linux-aarch64-a3-16
-            tests: tests/e2e/nightly/single_node/models/test_deepseek_r1_0528_w8a8.py
+            config_file_path: Prefix-Cache-Qwen3-32B-Int8.yaml
           - name: deepseek-r1-0528-w8a8-prefix-cache
             os: linux-aarch64-a3-16
-            tests: tests/e2e/nightly/single_node/models/test_prefix_cache_deepseek_r1_0528_w8a8.py
-          - name: qwq-32b-a3
-            os: linux-aarch64-a3-4
-            tests: tests/e2e/nightly/single_node/models/test_qwq_32b.py
-          - name: qwen3-30b-w8a8
-            os: linux-aarch64-a3-2
-            tests: tests/e2e/nightly/single_node/models/test_qwen3_30b_w8a8.py
-          - name: qwen3-235b-w8a8
-            os: linux-aarch64-a3-16
-            tests: tests/e2e/nightly/single_node/models/test_qwen3_235b_w8a8.py
-          - name: qwen3-next-w8a8
-            os: linux-aarch64-a3-4
-            tests: tests/e2e/nightly/single_node/models/test_qwen3_next_w8a8.py
-          - name: kimi-k2-thinking
-            os: linux-aarch64-a3-16
-            tests: tests/e2e/nightly/single_node/models/test_kimi_k2_thinking.py
-          - name: deepseek-r1-w8a8-hbm
-            os: linux-aarch64-a3-16
-            tests: tests/e2e/nightly/single_node/models/test_deepseek_r1_w8a8_hbm.py
-          - name: deepseek3_2-w8a8
-            os: linux-aarch64-a3-16
-            tests: tests/e2e/nightly/single_node/models/test_deepseek_v3_2_w8a8.py
-          - name: qwen3-30b-acc
-            os: linux-aarch64-a3-4
-            tests: tests/e2e/weekly/single_node/models/test_qwen3_30b_acc.py
+            config_file_path: Prefix-Cache-DeepSeek-R1-0528-W8A8.yaml
     uses: ./.github/workflows/_e2e_nightly_single_node.yaml
     with:
       runner: ${{ matrix.test_config.os }}
       image: 'swr.cn-southwest-2.myhuaweicloud.com/base_image/ascend-ci/vllm-ascend:nightly-a3'
-      tests: ${{ matrix.test_config.tests }}
+      config_file_path: ${{ matrix.test_config.config_file_path }}
       name: ${{ matrix.test_config.name }}

   custom-ops-tests:
Lines changed: 94 additions & 0 deletions
@@ -0,0 +1,94 @@
+# ==========================================
+# Shared Configurations
+# ==========================================
+
+_envs: &envs
+  OMP_NUM_THREADS: "10"
+  OMP_PROC_BIND: "false"
+  HCCL_BUFFSIZE: "1024"
+  PYTORCH_NPU_ALLOC_CONF: "expandable_segments:True"
+  SERVER_PORT: "DEFAULT_PORT"
+
+_server_cmd: &server_cmd
+  - "--quantization"
+  - "ascend"
+  - "--data-parallel-size"
+  - "2"
+  - "--tensor-parallel-size"
+  - "8"
+  - "--enable-expert-parallel"
+  - "--port"
+  - "$SERVER_PORT"
+  - "--seed"
+  - "1024"
+  - "--max-model-len"
+  - "36864"
+  - "--max-num-batched-tokens"
+  - "4096"
+  - "--max-num-seqs"
+  - "16"
+  - "--trust-remote-code"
+  - "--gpu-memory-utilization"
+  - "0.9"
+  - "--speculative-config"
+  - '{"num_speculative_tokens": 1, "method": "mtp"}'
+  - "--additional-config"
+  - '{"enable_weight_nz_layout": true}'
+
+_benchmarks_acc: &benchmarks_acc
+  acc:
+    case_type: accuracy
+    dataset_path: vllm-ascend/gsm8k-lite
+    request_conf: vllm_api_general_chat
+    dataset_conf: gsm8k/gsm8k_gen_0_shot_cot_chat_prompt
+    max_out_len: 32768
+    batch_size: 32
+    baseline: 95
+    threshold: 5
+
+_benchmarks_perf: &benchmarks_perf
+  perf:
+    case_type: performance
+    dataset_path: vllm-ascend/GSM8K-in3500-bs400
+    request_conf: vllm_api_stream_chat
+    dataset_conf: gsm8k/gsm8k_gen_0_shot_cot_str_perf
+    num_prompts: 400
+    max_out_len: 1500
+    batch_size: 1000
+    baseline: 1
+    threshold: 0.97
+
+# ==========================================
+# ACTUAL TEST CASES
+# ==========================================
+
+test_cases:
+  - name: "DeepSeek-R1-0528-W8A8-single"
+    model: "vllm-ascend/DeepSeek-R1-0528-W8A8"
+    envs:
+      <<: *envs
+    server_cmd: *server_cmd
+    server_cmd_extra:
+      - "--enforce-eager"
+    benchmarks:
+
+  - name: "DeepSeek-R1-0528-W8A8-aclgraph"
+    model: "vllm-ascend/DeepSeek-R1-0528-W8A8"
+    envs:
+      <<: *envs
+    server_cmd: *server_cmd
+    benchmarks:
+      <<: *benchmarks_acc
+      <<: *benchmarks_perf
+
+  - name: "DeepSeek-R1-0528-W8A8-EPLB"
+    model: "vllm-ascend/DeepSeek-R1-0528-W8A8"
+    envs:
+      <<: *envs
+      DYNAMIC_EPLB: "true"
+    server_cmd: *server_cmd
+    server_cmd_extra:
+      - "--additional-config"
+      - '{"enable_weight_nz_layout": true, "eplb_config": {"dynamic_eplb": "true", "expert_heat_collection_interval": 1000, "algorithm_execution_interval": 50, "eplb_policy_type": 3}}'
+    benchmarks:
+      <<: *benchmarks_acc
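
The test cases in this file share a base `envs` mapping and `server_cmd` list through YAML anchors, then layer per-case overrides (`DYNAMIC_EPLB`, `server_cmd_extra`) on top. One plausible way a harness could turn a parsed case into a final environment and argument list is sketched below. It assumes, without confirmation from the visible diff, that `server_cmd_extra` is appended to `server_cmd`, that the `DEFAULT_PORT` placeholder is replaced by a port chosen at run time, and that `$SERVER_PORT` is expanded from the case's environment; `merge_env` and `build_server_args` are hypothetical names.

```python
from string import Template
from typing import Dict, List


def merge_env(shared: Dict[str, str], overrides: Dict[str, str]) -> Dict[str, str]:
    """Same result as `<<: *envs` plus explicit keys: explicit keys win."""
    return {**shared, **overrides}


def build_server_args(case: dict, port: int) -> List[str]:
    """Concatenate server_cmd and server_cmd_extra, expanding $VARS from envs."""
    env = dict(case.get("envs") or {})
    # Assumption: "DEFAULT_PORT" is a placeholder filled in by the harness.
    if env.get("SERVER_PORT") == "DEFAULT_PORT":
        env["SERVER_PORT"] = str(port)

    args = list(case.get("server_cmd") or []) + list(case.get("server_cmd_extra") or [])
    # safe_substitute leaves unknown $names and JSON braces untouched.
    return [Template(arg).safe_substitute(env) for arg in args]


case = {
    "envs": merge_env({"SERVER_PORT": "DEFAULT_PORT"}, {"DYNAMIC_EPLB": "true"}),
    "server_cmd": ["--quantization", "ascend", "--port", "$SERVER_PORT"],
    "server_cmd_extra": ["--enforce-eager"],
}
print(build_server_args(case, 8000))
```

Under plain concatenation, the EPLB case would pass `--additional-config` twice, once from the shared list and once from `server_cmd_extra`. Which occurrence takes effect depends on how the real harness and the server's argument parser handle repeated flags, which this excerpt does not show.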
Lines changed: 42 additions & 0 deletions
@@ -0,0 +1,42 @@
+# ==========================================
+# ACTUAL TEST CASES
+# ==========================================
+
+test_cases:
+  - name: "DeepSeek-R1-W8A8-HBM-single"
+    model: "vllm-ascend/DeepSeek-R1-W8A8"
+    envs:
+      HCCL_BUFFSIZE: "1024"
+      SERVER_PORT: "DEFAULT_PORT"
+    server_cmd:
+      - "--quantization"
+      - "ascend"
+      - "--port"
+      - "$SERVER_PORT"
+      - "--data-parallel-size"
+      - "8"
+      - "--data-parallel-size-local"
+      - "8"
+      - "--data-parallel-rpc-port"
+      - "13389"
+      - "--tensor-parallel-size"
+      - "2"
+      - "--enable-expert-parallel"
+      - "--seed"
+      - "1024"
+      - "--max-num-seqs"
+      - "32"
+      - "--max-model-len"
+      - "6000"
+      - "--max-num-batched-tokens"
+      - "6000"
+      - "--trust-remote-code"
+      - "--gpu-memory-utilization"
+      - "0.92"
+      - "--no-enable-prefix-caching"
+      - "--reasoning-parser"
+      - "deepseek_r1"
+      - "--enforce-eager"
+      - "--additional-config"
+      - '{"ascend_scheduler_config": {"enabled": false}, "torchair_graph_config": {"enabled": false, "enable_multistream_shared_expert": false}}'
+    benchmarks:
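
This case ends with a `benchmarks:` key that has no value. In YAML an empty value is null, so a Python loader hands back `None` rather than an empty mapping, and code that iterates benchmarks needs to tolerate that. A minimal sketch of such a guard follows; `iter_benchmarks` is a hypothetical helper, not something taken from `test_single_node.py`.

```python
from typing import Iterator, Tuple


def iter_benchmarks(case: dict) -> Iterator[Tuple[str, dict]]:
    """Yield (name, config) pairs, treating a missing or null `benchmarks` as empty."""
    # `benchmarks:` with no value loads as None, so fall back to an empty dict.
    benchmarks = case.get("benchmarks") or {}
    for name, conf in benchmarks.items():
        yield name, conf


# A case shaped like the one above: the server is started but nothing is benchmarked.
hbm_case = {"name": "DeepSeek-R1-W8A8-HBM-single", "benchmarks": None}
for bench_name, bench_conf in iter_benchmarks(hbm_case):
    print(bench_name, bench_conf["case_type"])
```

With this guard, a case without benchmarks simply runs zero benchmark iterations instead of raising `AttributeError` on `None.items()`.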
